صفحه 1:
Tecniche di Data Mining Fosca Giannotti and Dino Pedreschi Pisa KDD Lab, CNUCE-CNR & Univ. Pisa http://www-kdd.cnuce.cnrit/ DIPARTIMENTO DI INFORMATICA - Universita di Pisa anno accademico 2002/2003

صفحه 2:
Tecniche di Data Mining 1 ®@CPO Tevacke di deta whic | Corsi d beers Gperidisics tt Toforwuires ‏اس‎ ‎‘Porwutck= ‎0 6۳100۶0 Bust di dat ‏سای :رهب موه و‎ di dota ‏هو بو‎ Pocutst det dat © Corse dt Lowes to ToPorwwutce (quiquecme, verchic prices) Bante det det ed epteuzicae ‏میس ول‎ 1 Opreo dh Lewes Gpecidisice tt FoPorwutce per (Ervarwia pe per !@zenk Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 3:
Tecniche di Data Mining 1 Acronimo: TDM ‏:واعقء0 اش(‎ Lunedi 11-13 aula E, 610۷601 14-6 aula B | Docente: Fosca Giannotti, CNUCE-CNR, f.giannotti@cnuce.cnr.it | Corso Integrativo: Dino Pedreschi, Dipartimento di Informatica, pedre@di.unipi.it Ricevimento: Mercoledi 14-16 | ISTI, Area Ricerca CNR, localita San Cataldo, Pisa Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 4:
Tecniche di Data Mining I RP erioent bolo Pet %deavet Lor, Dicker (Comber, Osta ‏امه سوه( رت(‎ Meche, Dorcas (ener Pubbshers, 0 ‏هط سوام واه منوا‎ 16۵ 0266660- 60-6 KO. Parnn, ©. Pricisty-Shopiry, ®. Gah, 7. Dirnveuny (editors). dbenwer in Komwlecke dovovery ond chia wisien, OTD Press, (890. ead J. Meret, WLebht Dorada, Padhrais Seo, Priecipbes oP Dot ‏رم‎ OW Prove, DOA. ©, Ckohrebart, ren he (Deb: Disrovertiy ‏اوه‎ Prox ‏مجو رلا"‎ Wer0, Dorgaa keaParenm, SBD (-SSE80-PSF-F, CDOS D4 foot waltz zat celle leziodt sarap resi dspoubt utrwersr ft sie web del corsv: ‏ون یی اج( سسمبا ند‎ Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 5:
ONATLO I mMessaggio e-mail con subject:CorsoTDM. | Contenuto | NomeeCognome... 7 e-mail " annoimmatricofazione: 5 Corso diCawrea :, © Corsi dibasidi dati: ۲۰ Frequentati neiprecedenti semestri: (+ mmquestosemestre: annoaccadenicasoos/2008 Introduzione Giannotti & Pedreschi

صفحه 6:
Contenuti del corso I IntroductionandBasicconcepts(2 ore) > Leapplicaziont > Iprocessodi knowledge discovery 1 DataConsolidation &Data Preparation (4 +2 esercitazione) + Nozionibasiche di Data Warehousing + Nozioni basiche dianalisimultidimensionale dei dati 1 Regole Associatives +4 esercitazione) * Regoleintra-attributo,inter-attributo ~ Calcolo efficiente di regoleLassociazione:algoritmo Apriorie varianti * Estensioni del concetto di regola dassociazione:tassonomie,regole quantitative, regolepredittive, Regoleassociativeefattore Tempo: RAA Cicliche eCalendriche > Pattern SequenzialieSerieTemporali 2 Basket Market Analysisutilizzando RIA Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 7:
Contenuti del corso Classificazione con alberi di decisione (6 ore +2 esercitazione) > Principalitecniche di classificazione > Classificatori bayesiant * Aberi di decisione > Rassegnadialtrimetodi > Applicazioneal rilevamento di frodi 1 Clustering (2 ore+z2 esercitazione) © Principalitecniche di clustering > Apylicazioneal Customer segmentation ۲ Web Mining (gore) 1 Temi avanzati (6 oreseminari) Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 8:
Modalit divalutazione | €sercizi durante il corso(oOrale):30% | Seminario (oProgetto):70%? | students shouldpairupinteams. They will receive thesame credit as their partner. Division of Caborisuptothem. Presentations show dtake s5ominutes, including 10minutes for discussion. Apresentation normally covers twoor three closely relatedpapers Transparencies showldbemadeavailable tothe rest of the class---preferablyin PDF or HIML format. Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 9:
Course Outline 1 Imtroductionandbasic concepts | mMotivations,applications, the KDD process, the techniques | DeeperintoDM technology 1 Association Rules and Market Basket Analysis 1 Decision TreesandFraudDetection | ClusteringandCustomer Segmentation | DeeperintoData Preparation 1 Basicnotion of Datawarefiouse | Selectionandpreprocessing 1 AdvancedTopics 1 ScalableDM algorithms 1 Datamining query languages 1 Miningon Web = Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi —

صفحه 10:
fromdatamanagement to data analysis | 1960s: | Data coltection, database creation IMS andnetworkDBMS. | 4970s: | Relational datamodel,relational DBMS implementation. | 4980s: ۱ (extended-relational,00,deductive,etc.) andapplication-oriented DBMS (spatial, scientific, engineering, etc.). “49908: | Datamininganddatawarehousing, multimedia databasesand Web technology. 70 Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 11:
“Necessity isthe Mother of Invention” 1 Data explosion problem: | automateddata collection toolsmature database technologyand internet Ceadtotremendous amounts of data storedin databases, data warehouses andother information repositories. | Wearedrowning in information, but starving for knowledge! ohn ‘Naisbett) 9 ‏وم مه‎ 1 On-Cineanalytical processing | Extraction ofinteresting knowledge (rules, regularities, patterns,constraints)fromdatainlargedatabases. ” Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 12:
20 ott watt ons 5 DM | Abundance of businessandindustry data " Competitive focus - Knowledge Management " Imexpensive,powerful computing engines “ Strong theoretical/mathematical foundations ' machinelearning & Cogic ' statistics ' databasemanagement systems 92 Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 13:
Sources 0 ‘Data I Business Transactions | widespreaduse of bar codes => storage ofmillions of transactions daily (e.g.,Walmart:2000 stores => 20M transactions per day) | most important problem:effectiveuse of thedatainareasonable timeframe for competitive decision-making | u SclentificData | datageneratedthroughmultitude of experimentsandobservations | examples, geological data,satelliteimaging data, NASAcarth observations ۱ ateofdata collection far exceedsthespeedbywhichweanalyse the ita Financial Data | companyinformation | economic data (GNP,priceindexes,etc.) | stockmarkets a annoaccadenicascesiso0» Introduzione Giannotti & Pedreschi

صفحه 14:
ا ‎Sources of Data‏ I Personal / Statistical Data government census medical histories customer profiles demographic data dataandstatistics about sportsandathletes orld Wide Web andOnline Repositories email, news,messages Web documents, images, video, etc. Cink structure of of the hypertext frommiltions of Web sites Webusage data (fromserver Cogs, network traffic.anduser registrations) online databases,and digital libraries د 7 Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 15:
Classes of applications I Databaseanalysisanddecision support | Market analysis * target marketing,customer relation management, market basket anaCysis,cross sefCing, market segmentation, | Riskanalysis * Forecasting, customer retention, improvedunderwriting, quality control, competitiveanalysis. 0 | Other Applications | Text (news group,email,documents)and Web analysis. ! Intelligent Query Answering Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi 0

صفحه 16:
Market analysts I Wherearethe data sourcesfor analysis? ! Credit cardtransactions,Coyalty cards, discount coupons, customer complaint calls,plus (public) lifestyle studies. ‎Target marketing‏ ا ‎| Find clusters of “model” customers who share the same characteristics:interest,income level, spending fabits,etc. ‎“ Determine customer purchasing patterns over time ! Conversion ofsingletoafoint bankaccount:marriage,etc. “ Cross-market analysis ! Associations/co-reCations between product sales | Prediction basedon the associationinformation. ‎Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 17:
OO Market Analysis (2) | Customer profiling | dataminingcan tell youwhat types of customers buy what products (clustering or classification). | Identifying customer requirements ! identifying the best products for different customers | useprediction tofindwhatfactorswill attract new customers | Provides summary information | yariousmultidimensional summary reports! | statistical summary information (data central tendency andvariation) Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 18:
se amatysis | Financeplanningandasset evaluation: | cashflow analysisandprediction | contingent claimanalysistoevaluateassets ! cross-sectional andtimeseriesanalysis (financial-ratio,trend analysis,etc.) | Resourceplanning: ! swmmarizeandcompare the resourcesandspending | Competition: | monitor competitorsandmarket directions (CI: competitive intelligence). | group customersintoclassesandclass-hasedpricing procedures | setpricing strategy ina highly competitivemarket Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 19:
FraudDetecti on “ Applications: | widelyusedin healthcare, retail,credit cardservices, teLecommunications (phonecardfraud),etc. “Approach: | usehistorical data to buildmodels of fraudulent behavior anduse datamining to help identify similar instances. “Examples: | autoinswurance: detect a group of people whostageaccidentsto collect oninsurance | money laundering: detect suspicious money transactions (US ‘Treasury's Financial Crimes Enforcement Network) | medicalinsurance: detect professional patientsandring of doctorsandring of references Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 20:
FraudDetection (2) “ Moreexamples: | Detectinginappropriatemedical treatment: | Australian HealthInsurance Commission identifies thatin many cases blanket screening tests were requested (save Australian $7m/yr). | Detecting telephonefraud: 0 Telephone call model: destination of the call, duration, time of day or week. Analyzepatterns that deviatefromanexpected norm. 0 British TeLecomidentifieddiscretegroupsof callerswith frequent intra-group calls, especially mobile phones,and brokea multimillion dollar fraud. | Retail: Analystsestimate that 38 %of retail shrinkis due to dishonest employees, Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi ‏سح‎

صفحه 21:
Other applications | Sports 1 IBM AdvancedScout analyzed NBA game statistics (shots Blocked, assists,andfouls) to gain competitiveadvantagefor New YorRKnicksandMiami Heat. 1 Astronomy | yrrandthe Palomar Observatory discovered 22 quasars with the help of datamining | Internet Web Surf-Aid 1 IBM Surf-Aidapplies data mining algorithms to Web access Cogsfor market-related pages to discover customer preference and behavior pages, analyzing effectiveness of Webmarketing, improving Web site organization, etc. | Watchfor the PRIVACY pitfa(t! Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 22:
SS... Whatis KDD? Aprocess! | Theselectionandprocessing of datafor: ' theidentification of novel,accurate,and useful patterns,and ' themodeling of real-worldphenomena. ‎Datamining isamajor component of the‏ لأ ‎KDD process - automated discovery of‏ ‎patternsandthe development of predictive‏ ‎andexplanatory models.‏ ‎22 ‎annoaccadenicascesiso0» Introduzione Giannotti & Pedreschi

صفحه 23:
the XDD process Interpretatio: nd Evaluatio) $election an Rreprocessing 3 “ 2 E atterns | | Models + Prepared. Data eg ge Data Sources Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 24:
1 xDD stepscanbemergedor combined ! Data Cleaning +DataIntegration=Data Preprocessing | Data Sefection+Data Transformation =Data Consolidation | KDDisandIterative Process | art +engineering rather than science 24 Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 25:
The virtuous cycle Identify Problemor Qpportunit Act on Knowledge Measure effect Strategy of Action Results 25 Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 26:
The steps of the KDD process Learning theapplication domain: ! relevant prior Rnowledgeand goals ofapplication Dataconsolidation:Creatinga target dataset Selectionand Preprocessing | Data cleaning: (may take 60%of effort!) | Datareductionandprofection: | finduseful features dimensionality /variable reduction, invariant ‘representation. Choosing functions of datamining summarization, classification, regression,association, clustering, Choosing themining algorithms) Datamining:searchfor patterns ofinterest Interpretationandevaluation:analysis of results. | visualization, transformation, removing redundant patterns, .« ‘Use of discoveredknowledge Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 27:
Domain Experts Mining Specialists Data Administrator] “ Annoaccadenicascezi3000 Introduzione Giannotti & Pedreschi ‏ری‎

صفحه 28:
تأنالا 01011 عانتمالا ععنالات وبأ نأناء عأنااناً 55ع نا اناأكطا5 إل Data Mining Operational Business Business ‏و‎ ‎Data ۲ 1 ~ Basket Analysis Data iformation ۳ Fraud Detection Werehouse Warehouse Target Marketing ga BN. v Business Queries Extraction/ Replication Hs “ae Data Cleaning Meta Data Management Giannotti & Pedreschi ‏سح‎ Co Annoaccademicasoos/2000 Introduzione

صفحه 29:
The xDD process Interpretati d Evaluatio * Patterns & : Models anne Date SOUTCES » troduzione Giannotti & Pedrescht

صفحه 30:
Data consolidationand preparation GarbageinGarbage out I The quality of results relates directly to quality of thedata | 50%-70%of KDD process effort is spent on data consolidation andpreparation | Mafor justification for a corporate data warehouse 30 Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 31:
5 7 ‏سم‎ on Fromdata sources toconsolidateddata repository S| — ‏ا‎ Sse Object/Relation DBMS Multidimensional DBMS ‘ ۱ Deductive Datei’ ۱ 2 Flat files 6 Giannotti & Pedreschi Annoaccademicasoos/2000 Introduzione

صفحه 32:
11( 0:60:05 00012011 | Determinepreliminary list ofattributes | Consolidate dataintoworking database ' InternalandExternal sources | Eliminate or estimate missing values | Remove outliers (obvious exceptions) | Determineprior probabilities of categoriesand deal with volume bias و Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 33:
The XDD process Interpretat: d Evaluatio; wd av. ۸ annoacadeicascoines Introduzione Giannotti & Pedreschi bb

صفحه 34:
Data selectionandpreprocessing I Generateaset ofexamples | choosesampling method ' consider sample complexity | dealwithvolumebiasissues | Reduceattribute dimensionality | remove redundant and/or correlating attributes | comBineattributes (sum,multiply, difference) | Reduceattributevalueranges ! group symbolic discretevatues 1! quantify continuous numericvatues | Transformdata | de-correlateandnormatizevalues | map time-series data tostaticrepresentation ۱ tools play key role 7 Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 35:
The XDD process Interpretatio: d Evaluatio; election and reprocessing وه Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 36:
Datamining tasks andmethods | DirectedKnowledgeDiscovery ! Purpose: Explainvalue of somefieldin terms of all the others (goal-oriented) | Method: select the target fieldbasedon some hypothesis about thedatalaskthealgorithmtotellus how to predict or classify new instances ۱ Examples: | what products show increasedsalewhen cream cheeseis discounted ‏لقع تدع نه 07[ ونه طز قاع بن1سه :010 ع كننا 0 :نه :0111161 6 بقاع نه سد ذأ‎ ‘user ‏اما ها متام‎ 2 Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 37:
OO Datamining tasks andmethods ‎undirected Knowledge Discovery‏ ذأ ‎(Explorative Methods)‏ ‎| Purpose: Findpatternsin the data that may be interesting (notarget filed) ۱3 rules (affinity grouping) ! Examples: J whichproductsin the catalog often sell together ‎| market segmentation (groups of customers/users withsimilar characteristics) ‎37 ‎Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 38:
Alternatively :Dataminingtasksandmethods | Automated Exploration/Discovery ' ¢g.. discovering new market segments | clusteringanalysis | Prediction/Classification | e.g..forecasting gross sales given current factors regression, newral networks,geneticalgorithms, decision trees ft rey \ | Explanation/Description ' eg..characterizing customers by demographics andpurchase history 5 | decision trees,association rules ‏انا‎ ‎andincome < $35k then... و Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 39:
Automatedexplorationand discovery I Clustering:partitioninga set of dataintoa set of classes, called clusters,whosemembers sharesome interesting common properties, ‎Distance-basednumerical clustering‏ لأ ‎| metricgrouping of examples (K-NN) ' graphical visualization can beused ۱ ‏ونتانأمت تأكنانك بلاتصياىء ينه‎ ‎| searchfor the number of classes which result in best fit of aprobability distributiontothedata ‎! AutoClass (NASA) one of best examples ‎39 ‎Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 40:
Predictionandclassification | cearning apredictivemodel ‏لأ‎ Classification of a new case/sample | Many methods: ! Artificial neural networks ! Inductive decisiontreeandrule systems 1١ Geneticalgorithms ! Nearest neighbor clustering algorithms ! Statistical (parametric,andnon-parametric) 40 Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 41:
Generalizationandregression | The objective of learning is toachieve good generalizationtonew unseen cases. “ Generalization can be definedasamathematical interpolationor regressionover a set of training points “ Models can bevalidatedwithapreviouslyunseen test set or using cross-validation metfods a Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 42:
02320001 andprediction ! Classify data basedon thevalues ofa target attribute,e.g.,classify countries basedon climate, or classify cars basedon gasmileage. ! Alseobtainedmodel topredict someunknown or missing attributevalues basedon other information. a2 Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 43:
Summarizing:inductive modeling > ۵ Objective: Developageneralmodel or hypothesis from specific examples 5 الال تنه وماقمس ووه مم5 0 | Classification (concept ‏ا‎ ‎recognition) x 43 Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 44:
Explanationand description | Learna generalized hypothesis (model) from selecteddata | Description/Interpretation of model provides new knowledge | Affinity Grouping | Methods: ' Inductive decision treeandrufe systems ! Association rule systems ' LinkAnalysis 1 a Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 45:
Exception/deviation detection ‎Generateamodel of normal activity‏ لا ‎Deviation frommodel causes alert‏ “ ‎"Methods:‏ ‎| Artificial newral networks ‎۱ Inductive decision treeandrule systems ‎| Statisticalmethods ‎۱ Visualization tools ‎۶ ‎Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi {7

صفحه 46:
Outlier andexception data analysis | Time-series analysis (trendanddeviation): ! Trendanddeviation analysis: regression, sequential pattern, similar sequences,trendand deviation,e.g., stockanalysis. ! Similarity-basedpattern-directedanalysis | Pull vs. partial periodicity analysis | Otherpattern-directedor statisticalanalysis 46 Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 47:
“7 Example: Moviegoer Database fmoviegoer_ID movie _10 Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 48:
SELECT moviegoers.name, moviegoers.sex, moviegoers.age, sources.source, movies.name FROM movies, sources, moviegoers WHERE sources.source_ID = moviegoers.source ID AND movies.movie ID = moviegoers.movie_ID ORDER BY moviegoers. name; سس — ی سس ‎Day‏ مس ‎oP Obert‏ م ‎Oo‏ ‏موه 1۵ ‎Obert‏ 06 - مه مه 7 ‎ow Obert‏ - هه Ome 3 90 Obert ‏امس‎ اوه باه سق 1 ‎Obert‏ 06 م ‎Owe‏ ‏مه مه ‎vt 0 90 Obert‏ Gob 2 60 ‏.تمصت‎ Scharber’ Let Gram - ee Obert Guger Oop ‎Obert ddr‏ « 0 بسن ‎Caw 3 es Obert ‏و‎ ‎O.Pines The Orde‏ 90 م ون ‎Obert rem‏ ۵ مه ‎Cut 3 6 ‏بسسد ۵ لهه‎ Da) Want - €o ‏هه‎ ۳ Cre 0 29) D.Odwe Prepon 1 ‎Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi 1 ‎ ‎ ‎ ‎ ‎

صفحه 49:
Example: Moviegoer Da tabase | Classification | determine sex basedonage,source,andmovies seen | determine source basedon sex,age,andmovies seen | determinemost recent movie basedon past movies, age, sex,and sowrce | Estimation | forpredict, needa continuousvariable(e.g,,“age”) | predict ageasafunction of source,sex,andpast movies | ifwehada “rating’ fieldfor eachmoviegoer,wecowdpredict therating a new moviegoer gives toa movie basedon age, sex, past movies,etc. a3 Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 50:
Example: Moviegoer Database U Clustering | findgroupings of movies that are often seen by the same people findgroupings of people that tend toseethesame movies clustering might reveal relationships that arenot necessarily recordedin the data (e.g.,wemay finda cluster that is dominated by people with young children {or a cluster of movies that correspondtoa particular genre) 50 annoaccadenicascesiso0» Introduzione Giannotti & Pedreschi

صفحه 51:
Example: Moviegoer Database I Association Rules | market basket analysis (MBA): “whichmoviesgotogether?” | needtocreate transactions’ for eachmoviegoer containing movies seen by that moviegoer: MO ‏مس‎ مر روط سم ۵۵0 سانا مهو رم ۰ مه 6 ‎O08‏ ‎Dey, (reps)‏ سم من متا ©0006 004: (Prerepemn, Cchanber's Les} | mayresult in association rules sucha: {“Phenomenon”, “The Birdcage”} {“Trainspotting”} {“Trainspotting”, “The Birdcage”} ==> {sex = “f"} 1111 a7 Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 52:
Example: Moviegoer Database I sequence Analysis ۱ similar to MBA, but order inwhichitemsappearin thepatternisimportant | e.g.,peoplewho rent “The Birdcage” during avisit tendtorent “Trainspotting” in the next visit. 32 Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

صفحه 53:
The XDD process Interpretatio: nd Evaluatio; Dp and Warehousing \ w ‏ل‎ 5 ia annoaccadenicascesiso0» Introduzione Giannotti & Pedreschi ort

صفحه 54:
Areall the discoveredpatterninteresting? | Adatamining system/query may generate thousands of patterns, not all of themareinteresting. “ Imterestingnessmeasures: | easifyunderstoodby humans | validon new or test data withsome degree of certainty. | potentiallyuseful | novel, orvalidates some hypothesis that auser seeks toconfirm | Objectivevs. subjectiveinterestingnessmeasures ! Offective:basedonstatisticsandstructures of patterns,e.g,, support, confidence, etc. | Subjective:basedonuser’s beliefsin the data,e.g.,unexpectedness, novelty,etc. annoaccadenicascesiso0» Introduzione Giannotti & Pedreschi

صفحه 55:
Interpretationandevaluation Evaluation | Statistical validationandsignificance testing | Qualitative review by expertsinthefield | Pilot surveys toevaluate model accuracy Interpretation | Inductivetreeandrulemodelscanbereaddirectly | Clustering results can begraphedandtabled | Codecanbeautomatically generated by some systems (IDTs, Regression models) 55 Annoaccademicasoos/2000 Introduzione Giannotti & Pedreschi

جهت مطالعه ادامه متن، فایل را دریافت نمایید.
34,000 تومان