Page 1:
Data Mining Techniques (Tecniche di Data Mining)
Fosca Giannotti and Dino Pedreschi
Pisa KDD Lab, CNUCE-CNR & Univ. Pisa
http://www-kdd.cnuce.cnr.it/
DIPARTIMENTO DI INFORMATICA - Università di Pisa
Academic year 2002/2003
Page 2:
Data Mining Techniques
Degree programmes for which the course is offered (the course names listed under each programme are illegible in the source):
- Corso di Laurea Specialistica in Informatica
- Corso di Laurea in Informatica (five-year, old curriculum)
- Corso di Laurea Specialistica in Informatica per l'Economia e per l'Azienda
Academic year 2002/2003, Introduction, Giannotti & Pedreschi
Page 3:
Data Mining Techniques
- Acronym: TDM
- Schedule: Monday 11-13, room E; Thursday 14-16, room B
- Lecturer: Fosca Giannotti, CNUCE-CNR, f.giannotti@cnuce.cnr.it
- Supplementary course: Dino Pedreschi, Dipartimento di Informatica, pedre@di.unipi.it
- Office hours: Wednesday 14-16, ISTI, Area Ricerca CNR, località San Cataldo, Pisa
Page 4:
Data Mining Techniques
Bibliographic references
- Jiawei Han, Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2000. ISBN 1-55860-489-8.
- U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy (editors). Advances in Knowledge Discovery and Data Mining. MIT Press, 1996.
- David J. Hand, Heikki Mannila, Padhraic Smyth. Principles of Data Mining. MIT Press, 2001.
- S. Chakrabarti. Mining the Web: Discovering Knowledge from Hypertext Data. Morgan Kaufmann, ISBN 1-55860-754-4, 2002.
The slides used in the lectures will be made available on the course website (URL illegible in the source).
Page 5:
- E-mail message with subject: CorsoTDM
- Content:
  - First name and last name
  - E-mail
  - Year of enrolment
  - Degree programme
  - Database courses:
    - attended in previous semesters
    - attended in this semester
Page 6:
Course contents
- Introduction and Basic concepts (2 hours)
  - Applications
  - The knowledge discovery process
- Data Consolidation & Data Preparation (4 hours + 2 hours of exercises)
  - Basic notions of Data Warehousing
  - Basic notions of multidimensional data analysis
- Association Rules (lecture hours illegible in the source, + 4 hours of exercises)
  - Intra-attribute and inter-attribute rules
  - Efficient computation of association rules: the Apriori algorithm and its variants
  - Extensions of the concept of association rule: taxonomies, quantitative rules, predictive rules
  - Association rules and the time factor: cyclic and calendric association rules
  - Sequential patterns and time series
  - Market Basket Analysis using association rules
Page 7:
Course contents
- Classification with decision trees (6 hours + 2 hours of exercises)
  - Main classification techniques
  - Bayesian classifiers
  - Decision trees
  - Survey of other methods
  - Application to fraud detection
- Clustering (2 hours + 2 hours of exercises)
  - Main clustering techniques
  - Application to customer segmentation
- Web Mining (number of hours illegible in the source)
- Advanced topics (6 hours of seminars)
Page 8:
Assessment
- Exercises during the course (or oral exam): 30%
- Seminar (or project): 70%?
- Students should pair up in teams. They will receive the same credit as their partner. Division of labor is up to them.
- Presentations should take [duration illegible in the source] minutes, including 10 minutes for discussion. A presentation normally covers two or three closely related papers.
- Transparencies should be made available to the rest of the class, preferably in PDF or HTML format.
Page 9:
Course Outline
- Introduction and basic concepts
  - Motivations, applications, the KDD process, the techniques
- Deeper into DM technology
  - Association Rules and Market Basket Analysis
  - Decision Trees and Fraud Detection
  - Clustering and Customer Segmentation
- Deeper into Data Preparation
  - Basic notion of Data Warehouse
  - Selection and preprocessing
- Advanced Topics
  - Scalable DM algorithms
  - Data mining query languages
  - Mining on the Web
Page 10:
From data management to data analysis
- 1960s:
  - Data collection, database creation, IMS and network DBMS.
- 1970s:
  - Relational data model, relational DBMS implementation.
- 1980s:
  - Advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.).
- 1990s:
  - Data mining and data warehousing, multimedia databases and Web technology.
Page 11:
"Necessity is the Mother of Invention"
- Data explosion problem:
  - Automated data collection tools, mature database technology and the internet lead to tremendous amounts of data stored in databases, data warehouses and other information repositories.
- "We are drowning in information, but starving for knowledge!" (John Naisbitt)
- Data warehousing and data mining:
  - On-line analytical processing
  - Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases.
Page 12:
Motivations for DM
- Abundance of business and industry data
- Competitive focus: Knowledge Management
- Inexpensive, powerful computing engines
- Strong theoretical/mathematical foundations
  - machine learning & logic
  - statistics
  - database management systems
Page 13:
Sources of Data
- Business Transactions
  - widespread use of bar codes => storage of millions of transactions daily (e.g., Walmart: 2000 stores => 20M transactions per day)
  - most important problem: effective use of the data in a reasonable time frame for competitive decision-making
- Scientific Data
  - data generated through a multitude of experiments and observations
  - examples: geological data, satellite imaging data, NASA earth observations
  - the rate of data collection far exceeds the speed at which we analyse the data
- Financial Data
  - company information
  - economic data (GNP, price indexes, etc.)
  - stock markets
Page 14:
Sources of Data
- Personal / Statistical Data
  - government census
  - medical histories
  - customer profiles
  - demographic data
  - data and statistics about sports and athletes
- World Wide Web and Online Repositories
  - email, news, messages
  - Web documents, images, video, etc.
  - link structure of the hypertext from millions of Web sites
  - Web usage data (from server logs, network traffic, and user registrations)
  - online databases and digital libraries
Page 15:
Classes of applications
- Database analysis and decision support
  - Market analysis
    - target marketing, customer relation management, market basket analysis, cross selling, market segmentation.
  - Risk analysis
    - forecasting, customer retention, improved underwriting, quality control, competitive analysis.
- Other Applications
  - Text (news group, email, documents) and Web analysis.
  - Intelligent Query Answering
Page 16:
Market analysis
- Where are the data sources for analysis?
  - Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies.
- Target marketing
  - Find clusters of "model" customers who share the same characteristics: interest, income level, spending habits, etc.
- Determine customer purchasing patterns over time
  - Conversion of a single to a joint bank account: marriage, etc.
- Cross-market analysis
  - Associations/correlations between product sales
  - Prediction based on the association information.
Page 17:
Market Analysis (2)
- Customer profiling
  - data mining can tell you what types of customers buy what products (clustering or classification).
- Identifying customer requirements
  - identifying the best products for different customers
  - use prediction to find what factors will attract new customers
- Provides summary information
  - various multidimensional summary reports
  - statistical summary information (data central tendency and variation)
Page 18:
Risk analysis
- Finance planning and asset evaluation:
  - cash flow analysis and prediction
  - contingent claim analysis to evaluate assets
  - cross-sectional and time series analysis (financial-ratio, trend analysis, etc.)
- Resource planning:
  - summarize and compare the resources and spending
- Competition:
  - monitor competitors and market directions (CI: competitive intelligence).
  - group customers into classes and class-based pricing procedures
  - set pricing strategy in a highly competitive market
Page 19:
Fraud Detection
- Applications:
  - widely used in health care, retail, credit card services, telecommunications (phone card fraud), etc.
- Approach:
  - use historical data to build models of fraudulent behavior and use data mining to help identify similar instances.
- Examples:
  - auto insurance: detect a group of people who stage accidents to collect on insurance
  - money laundering: detect suspicious money transactions (US Treasury's Financial Crimes Enforcement Network)
  - medical insurance: detect professional patients and ring of doctors and ring of references
Page 20:
Fraud Detection (2)
- More examples:
  - Detecting inappropriate medical treatment:
    - Australian Health Insurance Commission identifies that in many cases blanket screening tests were requested (save Australian $1m/yr).
  - Detecting telephone fraud:
    - Telephone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm.
    - British Telecom identified discrete groups of callers with frequent intra-group calls, especially mobile phones, and broke a multimillion dollar fraud.
  - Retail: analysts estimate that 38% of retail shrink is due to dishonest employees.
Page 21:
Other applications
- Sports
  - IBM Advanced Scout analyzed NBA game statistics (shots blocked, assists, and fouls) to gain competitive advantage for New York Knicks and Miami Heat.
- Astronomy
  - JPL and the Palomar Observatory discovered 22 quasars with the help of data mining
- Internet Web Surf-Aid
  - IBM Surf-Aid applies data mining algorithms to Web access logs for market-related pages to discover customer preference and behavior pages, analyzing effectiveness of Web marketing, improving Web site organization, etc.
- Watch for the PRIVACY pitfall!
Page 22:
What is KDD? A process!
- The selection and processing of data for:
  - the identification of novel, accurate, and useful patterns, and
  - the modeling of real-world phenomena.
- Data mining is a major component of the KDD process: automated discovery of patterns and the development of predictive and explanatory models.
Page 23:
The KDD process
[Figure: the KDD process. Recoverable labels: Data Sources, Selection and Preprocessing, Prepared Data, Patterns & Models, Interpretation and Evaluation]
Page 24:
- KDD steps can be merged or combined
  - Data Cleaning + Data Integration = Data Preprocessing
  - Data Selection + Data Transformation = Data Consolidation
- KDD is an Iterative Process
  - art + engineering rather than science
Page 25:
The virtuous cycle
[Figure: the virtuous cycle. Recoverable labels: Identify Problem or Opportunity, Knowledge, Act on Knowledge, Strategy, Results, Measure effect of Action]
Page 26:
The steps of the KDD process
- Learning the application domain:
  - relevant prior knowledge and goals of application
- Data consolidation: creating a target data set
- Selection and Preprocessing
  - Data cleaning (may take 60% of effort!)
  - Data reduction and projection:
    - find useful features, dimensionality/variable reduction, invariant representation.
- Choosing functions of data mining
  - summarization, classification, regression, association, clustering.
- Choosing the mining algorithm(s)
- Data mining: search for patterns of interest
- Interpretation and evaluation: analysis of results.
  - visualization, transformation, removing redundant patterns, ...
- Use of discovered knowledge
Page 27:
[Figure labels: Domain Experts, Mining Specialists, Data Administrator]
Page 28:
[Figure, title illegible in the source. Recoverable labels: Operational Data, Extraction/Replication, Data Cleaning, Meta Data Management, Data Warehouse, Business Information Warehouse, Business Queries, Data Mining (Basket Analysis, Fraud Detection, Target Marketing)]
Page 29:
The KDD process
[Figure: the KDD process, repeated. Recoverable labels: Data Sources, Patterns & Models, Interpretation and Evaluation]
Page 30:
Data consolidation and preparation
Garbage in, garbage out
- The quality of results relates directly to the quality of the data
- 50%-70% of KDD process effort is spent on data consolidation and preparation
- Major justification for a corporate data warehouse
Page 31:
From data sources to consolidated data repository
[Figure. Recoverable labels: Object/Relation DBMS, Multidimensional DBMS, Deductive Database, Flat files]
Page 32:
Data consolidation
- Determine preliminary list of attributes
- Consolidate data into working database
  - Internal and External sources
- Eliminate or estimate missing values
- Remove outliers (obvious exceptions)
- Determine prior probabilities of categories and deal with volume bias
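Three of the consolidation steps listed above (estimating missing values, removing outliers, determining prior probabilities of categories) can be sketched in a few lines of Python. This is only an illustrative sketch: the slides do not prescribe a method, so median imputation, a z-score cut-off of 2 and relative-frequency priors are assumptions, as are the function names and the sample data.

```python
from collections import Counter
from statistics import mean, median, stdev

def impute_missing(values):
    """Replace None entries with the median of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def remove_outliers(values, max_z=2.0):
    """Keep values within max_z sample standard deviations of the mean (assumed rule)."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return list(values)
    return [v for v in values if abs(v - mu) / sigma <= max_z]

def class_priors(labels):
    """Estimate prior probabilities of categories as relative frequencies."""
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}

# Hypothetical data: customer ages with one missing value and one obvious exception.
ages = [25, 31, None, 28, 30, 27, 29, 26, 250]
filled = impute_missing(ages)
cleaned = remove_outliers(filled)
priors = class_priors(["fraud", "ok", "ok", "ok"])
```

Skewed priors such as the ones in `priors` are one way to notice the volume bias mentioned on the slide: a rare category may need re-sampling or re-weighting before mining.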
Page 33:
The KDD process
[Figure: the KDD process, repeated. Recoverable labels: Interpretation and Evaluation]
Page 34:
Data selection and preprocessing
- Generate a set of examples
  - choose sampling method
  - consider sample complexity
  - deal with volume bias issues
- Reduce attribute dimensionality
  - remove redundant and/or correlating attributes
  - combine attributes (sum, multiply, difference)
- Reduce attribute value ranges
  - group symbolic discrete values
  - quantize continuous numeric values
- Transform data
  - de-correlate and normalize values
  - map time-series data to static representation
- Tools play a key role
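Two of the transformations above, normalizing values and quantizing continuous numeric values, can be illustrated as follows. This is a minimal sketch assuming min-max normalization and equal-width binning, neither of which is named on the slide; the function names and the income figures are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def equal_width_bins(values, n_bins):
    """Map each value to the index of one of n_bins equally wide intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical yearly incomes in thousands.
incomes = [20, 35, 50, 80, 120]
normalized = min_max_normalize(incomes)
bins = equal_width_bins(incomes, 4)
```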
Page 35:
The KDD process
[Figure: the KDD process, repeated. Recoverable labels: Selection and Preprocessing, Interpretation and Evaluation]
Page 36:
Data mining tasks and methods
- Directed Knowledge Discovery
  - Purpose: explain the value of some field in terms of all the others (goal-oriented)
  - Method: select the target field based on some hypothesis about the data; ask the algorithm to tell us how to predict or classify new instances
  - Examples:
    - what products show increased sales when cream cheese is discounted
    - [second example illegible in the source]
Page 37:
Data mining tasks and methods
- Undirected Knowledge Discovery (Explorative Methods)
  - Purpose: find patterns in the data that may be interesting (no target field)
  - Method: association rules (affinity grouping)
  - Examples:
    - which products in the catalog often sell together
    - market segmentation (groups of customers/users with similar characteristics)
Page 38:
Alternatively: data mining tasks and methods
- Automated Exploration/Discovery
  - e.g., discovering new market segments
  - clustering analysis
- Prediction/Classification
  - e.g., forecasting gross sales given current factors
  - regression, neural networks, genetic algorithms, decision trees
- Explanation/Description
  - e.g., characterizing customers by demographics and purchase history
  - decision trees, association rules
[Figure residue: rule fragment "if ... and income < $35k then ..."]
Page 39:
Automated exploration and discovery
- Clustering: partitioning a set of data into a set of classes, called clusters, whose members share some interesting common properties.
- Distance-based numerical clustering
  - metric grouping of examples (K-NN)
  - graphical visualization can be used
- Bayesian clustering
  - search for the number of classes which results in the best fit of a probability distribution to the data
  - AutoClass (NASA) is one of the best examples
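Distance-based clustering as described above can be illustrated with k-means. Note that k-means is not named on the slide; it is used here only as a familiar example of grouping points by a metric. The initialization (the first k points), the fixed iteration count and the sample points are assumptions chosen to keep the sketch deterministic.

```python
import math

def kmeans(points, k, iterations=10):
    """Group points around k centroids using Euclidean distance."""
    # Naive deterministic initialization: the first k points (an assumption).
    centroids = [points[i] for i in range(k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return assignment, centroids

# Hypothetical 2-D examples forming two visibly separate groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
assignment, centroids = kmeans(points, k=2)
```

Here the number of clusters is fixed in advance; the Bayesian approach mentioned on the slide instead searches for the number of classes that best fits the data.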
Page 40:
Prediction and classification
- Learning a predictive model
- Classification of a new case/sample
- Many methods:
  - Artificial neural networks
  - Inductive decision tree and rule systems
  - Genetic algorithms
  - Nearest neighbor clustering algorithms
  - Statistical (parametric and non-parametric)
Page 41:
Generalization and regression
- The objective of learning is to achieve good generalization to new unseen cases.
- Generalization can be defined as a mathematical interpolation or regression over a set of training points
- Models can be validated with a previously unseen test set or using cross-validation methods
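The cross-validation idea mentioned above can be sketched as follows: the data are split into k folds, and each fold in turn is held out as the unseen test set. The helper names are hypothetical, and the "model" is deliberately trivial (it predicts the mean of the training targets) so that only the validation scheme is on display.

```python
def k_fold_splits(n_samples, k):
    """Return k (train_indices, test_indices) pairs covering every sample once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        splits.append((train_idx, test_idx))
        start += size
    return splits

def cross_validated_mse(ys, k):
    """Mean squared error of a mean-predictor, measured only on held-out folds."""
    errors = []
    for train_idx, test_idx in k_fold_splits(len(ys), k):
        prediction = sum(ys[i] for i in train_idx) / len(train_idx)
        errors.extend((ys[i] - prediction) ** 2 for i in test_idx)
    return sum(errors) / len(errors)

splits = k_fold_splits(10, 3)
```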
Page 42:
Classification and prediction
- Classify data based on the values of a target attribute, e.g., classify countries based on climate, or classify cars based on gas mileage.
- Use the obtained model to predict some unknown or missing attribute values based on other information.
Page 43:
Summarizing: inductive modeling
- Objective: develop a general model or hypothesis from specific examples
  - [first item illegible in the source]
  - Classification (concept recognition)
Page 44:
Explanation and description
- Learn a generalized hypothesis (model) from selected data
- Description/Interpretation of model provides new knowledge
- Affinity Grouping
- Methods:
  - Inductive decision tree and rule systems
  - Association rule systems
  - Link Analysis
Page 45:
Exception/deviation detection
- Generate a model of normal activity
- Deviation from model causes alert
- Methods:
  - Artificial neural networks
  - Inductive decision tree and rule systems
  - Statistical methods
  - Visualization tools
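As a small illustration of the statistical route listed above, the sketch below builds a model of normal activity from historical values and raises an alert when a new value deviates too far from it. The model (mean and standard deviation), the threshold of three standard deviations, the function names and the call durations are all assumptions made for the example.

```python
from statistics import mean, stdev

def fit_normal_model(history):
    """Summarize normal activity by its mean and sample standard deviation."""
    return mean(history), stdev(history)

def is_alert(value, model, threshold=3.0):
    """Flag a value that lies more than `threshold` standard deviations from the mean."""
    mu, sigma = model
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical call durations (minutes) observed for one customer.
call_minutes = [4, 6, 5, 7, 5, 6, 4, 5]
model = fit_normal_model(call_minutes)
alerts = [m for m in [5, 6, 45] if is_alert(m, model)]
```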
Page 46:
Outlier and exception data analysis
- Time-series analysis (trend and deviation):
  - Trend and deviation analysis: regression, sequential pattern, similar sequences, trend and deviation, e.g., stock analysis.
  - Similarity-based pattern-directed analysis
  - Full vs. partial periodicity analysis
- Other pattern-directed or statistical analysis
Page 47:
Example: Moviegoer Database
[Figure: database schema. Recoverable labels: moviegoer_ID, movie_ID]
Page 48:
SELECT moviegoers.name, moviegoers.sex, moviegoers.age,
       sources.source, movies.name
FROM movies, sources, moviegoers
WHERE sources.source_ID = moviegoers.source_ID AND
      movies.movie_ID = moviegoers.movie_ID
ORDER BY moviegoers.name;
[Query result table with columns name, sex, age, source, movie; rows illegible in the source]
Page 49:
Example: Moviegoer Database
- Classification
  - determine sex based on age, source, and movies seen
  - determine source based on sex, age, and movies seen
  - determine most recent movie based on past movies, age, sex, and source
- Estimation
  - for prediction, we need a continuous variable (e.g., "age")
  - predict age as a function of source, sex, and past movies
  - if we had a "rating" field for each moviegoer, we could predict the rating a new moviegoer gives to a movie based on age, sex, past movies, etc.
Page 50:
Example: Moviegoer Database
- Clustering
  - find groupings of movies that are often seen by the same people
  - find groupings of people that tend to see the same movies
  - clustering might reveal relationships that are not necessarily recorded in the data (e.g., we may find a cluster that is dominated by people with young children, or a cluster of movies that correspond to a particular genre)
Page 51:
Example: Moviegoer Database
- Association Rules
  - market basket analysis (MBA): "which movies go together?"
  - need to create "transactions" for each moviegoer containing movies seen by that moviegoer:
    [Transaction list illegible in the source]
  - may result in association rules such as:
    {"Phenomenon", "The Birdcage"} ==> {"Trainspotting"}
    {"Trainspotting", "The Birdcage"} ==> {sex = "f"}
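A rule such as {"Phenomenon", "The Birdcage"} ==> {"Trainspotting"} is usually judged by its support and confidence. The sketch below computes both over a handful of transactions. The movie titles come from the slide, but the transactions themselves are invented for illustration (the original transaction list is not legible), and the function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    base = support(transactions, antecedent)
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base if base > 0 else 0.0

# Hypothetical transactions: the set of movies seen by each moviegoer.
transactions = [
    {"Phenomenon", "The Birdcage", "Trainspotting"},
    {"Phenomenon", "The Birdcage"},
    {"The Birdcage", "Trainspotting"},
    {"Phenomenon", "The Birdcage", "Trainspotting"},
]
rule_support = support(transactions, {"Phenomenon", "The Birdcage", "Trainspotting"})
rule_confidence = confidence(
    transactions, {"Phenomenon", "The Birdcage"}, {"Trainspotting"}
)
```

On this toy data the rule holds in 2 of 4 transactions (support 0.5), and in 2 of the 3 transactions that contain the antecedent (confidence about 0.67).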
Page 52:
Example: Moviegoer Database
- Sequence Analysis
  - similar to MBA, but the order in which items appear in the pattern is important
  - e.g., people who rent "The Birdcage" during a visit tend to rent "Trainspotting" in the next visit.
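The difference from market basket analysis, namely that order matters, can be made concrete by counting how often one movie is rented in a visit and another in the very next visit. This is a simplified sketch, not a full sequential pattern mining algorithm; the visit histories and the function name are hypothetical.

```python
from collections import Counter

def next_visit_pairs(visit_histories):
    """Count ordered pairs (a, b): a rented in one visit, b in the following visit."""
    counts = Counter()
    for visits in visit_histories:
        for current, following in zip(visits, visits[1:]):
            for a in current:
                for b in following:
                    counts[(a, b)] += 1
    return counts

# Hypothetical histories: one list of visits per customer, each visit a set of rentals.
histories = [
    [{"The Birdcage"}, {"Trainspotting"}],
    [{"The Birdcage", "Phenomenon"}, {"Trainspotting"}],
    [{"Trainspotting"}, {"The Birdcage"}],
]
pairs = next_visit_pairs(histories)
```

Because the pairs are ordered, ("The Birdcage", "Trainspotting") and ("Trainspotting", "The Birdcage") are counted separately, which is exactly what an unordered itemset would hide.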
Page 53:
The KDD process
[Figure: the KDD process, repeated. Recoverable labels: Warehousing, Interpretation and Evaluation]
Page 54:
Are all the discovered patterns interesting?
- A data mining system/query may generate thousands of patterns; not all of them are interesting.
- Interestingness measures:
  - easily understood by humans
  - valid on new or test data with some degree of certainty
  - potentially useful
  - novel, or validates some hypothesis that a user seeks to confirm
- Objective vs. subjective interestingness measures
  - Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.
  - Subjective: based on user's beliefs in the data, e.g., unexpectedness, novelty, etc.
Page 55:
Interpretation and evaluation
- Evaluation
  - Statistical validation and significance testing
  - Qualitative review by experts in the field
  - Pilot surveys to evaluate model accuracy
- Interpretation
  - Inductive tree and rule models can be read directly
  - Clustering results can be graphed and tabled
  - Code can be automatically generated by some systems (IDTs, Regression models)