Page 22:
Impact of redundant data
• Hardware (CPU, disks) and software (program maintenance) costs incurred as a result of uncontrolled redundant data
• Extra time it takes to reconcile inconsistencies
• Extra resources needed to reconcile inconsistencies
• Unwise business decisions made due to [illegible] and inconsistent data
• Lost opportunities due to unreliable data
• Overcharging or overpayment for products
• Duplicate shipping of products
• [illegible]
Page 23:
[Table not recoverable: information development cost analysis]
Page 24:
Dirty data - How did it happen?
[Figure not recoverable]
Page 25:
Wrong priority on project constraints!
Industrial Age:
• Cheaper, faster, better
• [illegible] as possible
Page 26:
[Title illegible]
• Everyone on the business side and in IT wants quality, but rarely is the extra time given or taken to achieve it.
Quality and time are polarized constraints.
• The higher the quality, the more effort (time) it takes to deliver.
• Companies are driven by shorter and shorter schedules.
Page 27:
[Title illegible]
• Data Warehouse technology
• Customer Relationship Management
• Enterprise Resource Planning
• Enterprise Application Integration
• Knowledge Management
Ineffective Technology Solutions
Page 28:
Data Warehouse
[Two-column list, “The Promise” versus “The Reality”; items not recoverable]
If it sounds too good to be true, it is too good to be true.
Page 29:
Customer Relationship Management
CRM delivers ...
... creating competitive advantage through customer service excellence.
[Two-column list, “The Promise” versus “The Reality”; items not recoverable]
If it sounds too good to be true, it is too good to be true.
Page 30:
Enterprise Resource Planning
ERP delivers ...
[Two-column list, “The Promise” versus “The Reality”; items not recoverable]
If it sounds too good to be true, it is too good to be true.
Page 31:
Enterprise Application Integration
EAI delivers ...
[Two-column list, “The Promise” versus “The Reality”; items not recoverable]
If it sounds too good to be true, it is too good to be true.
Page 32:
The Promise: / Reality of KM:
[Two-column list; items not recoverable]
If it sounds too good to be true, it is too good to be true.
Page 33:
[Title illegible]
You cannot keep doing
what you have always done
and expect the results to be different.
Not even with
new technology.
“That wouldn't be [illegible]”
Spock, Star Trek
Page 34:
What do we have to change?
1. Assess the current state of data quality at your company
2. Understand and fix the root causes for data [illegible]
3. Perform data audits regularly (monthly, quarterly)
4. Stop working in isolated “swim lanes”
» Stop recreating data
5. Centrally manage your data like a business asset
(Enterprise Information Management [EIM])
» Assemble data as needed from the data inventory (enterprise data model and meta data)
» Standardize and reconcile data transformations for BI/DW applications (coordinated ETL staging area)
6. Cut down project scopes to incorporate data quality and EIM [illegible]
7. Embed data quality and EIM activities in all projects
Page 35:
... is a cross-organizational discipline
and an enterprise architecture
for an integrated collection of
operational as well as decision support [illegible],
which provide the business community
easy access to their business data, and
allow them to make accurate business decisions.
... is not business as usual
Page 36:
20%
Data Delivery Management
Provide intuitive access to business information
Get control over the [illegible]
Data Reengineering
(Enterprise Information Management)
Page 38:
Industrial-age mental model
[Figure not recoverable]
Page 40:
Information Age:
• Reassemble the [illegible]
• Reuse assets from inventory
[illegible] proposition
Page 41:
[illegible] “Refactoring”
- Kent Beck
[Figure not recoverable]
Page 42:
Software release concept (2)
• Requirements can be tested, and implemented [illegible]
• Scope is very small and manageable
• Technology infrastructure can be tested and proven
• Data volumes (per release) are relatively small
• Project schedules are easier to estimate because the [illegible]
• [illegible] activities can be iteratively refined, honed, and adapted
And:
The quality of the release deliverables (and ultimately the quality of the applications) will be higher!
Page 43:
BI/DW Development Steps
1. Business Case Assessment ................... Cross-organizational
2.A Enterprise Technical Infrastructure ....... Cross-organizational
2.B Enterprise Non-Technical Infrastructure ... Cross-organizational
3. Project Planning ........................... Project-specific
4. Project Requirements Definition ............ Project-specific
5. Data Analysis .............................. Cross-organizational
6. Application Prototyping .................... Project-specific
7. Meta Data Repository Analysis .............. Cross-organizational
8. Database Design ............................ Cross-organizational
Page 44:
[Slide largely illegible. Recoverable fragments: commitment to data quality embedded in the methodology (principle); a common information architecture, i.e. an enterprise data model (policy); the requirements definition step; data models; developmental ETL processes (enforcement); meta data]
Page 46:
EIM responsibilities
[Diagram: Discover, Coordinate, Integrate, Document, Control. Surrounding labels: process models, databases, business meta data, policy, procedures]
Page 47:
Data stewardship
• Guardians of the data while it is being [illegible] or maintained by them
• Create standards and procedures to ensure that policies and business rules are known and followed
• Enforce adherence to policies and business rules that govern the data while the data is in their custody
• Periodically monitor (audit) the quality of the data in their custody
• [illegible]
• Can be a business person or an IT person
“One who manages another's property.”
Page 48:
Data ownership
• Authority to establish policies and set business rules for the data under their control
• Decide what the official enterprise definition and domain is for the data [illegible]
• [illegible] other end users on proper usage of their data
• Frequently, but not always, the data originator
• Can be a person or a [illegible]
“One who has the legal right to the [illegible]”
Page 50:
[Diagram: data model with entities including Payment Method, Product, Payment, Salesperson, Existing Customer, Potential Customer]
Page 51:
Find the [illegible]!
• Contradicting values
• Violation of business rules
• Reused primary keys
• Non-unique primary keys
• Missing data relationships
• Inappropriate data relationships
• Dummy values
• “Intelligent” dummy values
• Missing values
• Multi-purpose fields
• Cryptic values
• Free-form address lines
Page 52:
[Title illegible]
• [illegible]
• It may not be worth the time and money to cleanse every data element
• Not all data is equally [illegible]
• Not all data can be cleansed
• How do you know what to cleanse?
... that is the question
Page 53:
[illegible] questions (1)
• Can the data be cleansed?
Does the correct data exist anywhere?
Is it easily accessible?
• Should the data be cleansed?
How extensive is the problem?
How elaborate will the cleansing process be?
Is it cost-effective?
Page 54:
[illegible] questions (2)
• Why are we building the application?
What business questions cannot be answered today?
• Why are we not able to answer the business questions?
Is it because of this dirty data?
Is it because of these missing relationships?
• Will the [illegible] the cost of the [illegible]?
Page 55:
Categories of data [illegible]
• Critical data
– Not all data is [illegible] to all end users
– All critical data must be cleansed
• [illegible]
– [illegible] to the [illegible], but not absolutely critical
– [illegible] as time allows
– Those that cannot be cleansed should be bumped to critical for the next release
• [illegible]
Page 56:
[illegible] prevention
• Where should the dirty data be cleansed?
In the staging area of the BI application?
In the source (legacy) files?
• When should it be cleansed?
Retroactively?
At data entry time?
• How should it be cleansed?
Use data cleansing or ETL tools?
Write procedural (COBOL/C++) code?
• What will we do to prevent dirty data in the future?
[illegible] Data Reengineering ...
Total [Data] Quality Management (TQM)
Page 57:
Coordinated ETL staging
[Figure not recoverable]
Page 60:
ETL tie-outs: record counts
[Diagram: INPUT RECORDS → PROCESS MODULE → OUTPUT RECORDS]
# Input Records = # Output Records
Page 61:
[Title illegible]
Records Per First Output Domain
+ Records Per Second Output Domain
+ Records Per Third Output Domain
+ Rejected Data Values
Page 62:
[Title illegible]
[Diagram: INPUT AMOUNTS → PROCESS → OUTPUT AMOUNTS and REJECTED AMOUNTS]
Output side:
Total $ Per First Output Amount
Total $ Per Second Output Amount
Total $ Rejected Amounts
Input side:
Total $ Input Amounts
Total $ Per First Input Amount
+ Total $ Per Second Input Amount
+ Total $ Per Rejected Amounts
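The record-count and amount tie-outs on the preceding slides can be sketched as a small check run after each ETL step. This is only an illustration under stated assumptions: the function name `reconcile`, the `amount` field, and the choice to count rejected rows on the output side of the record-count balance are hypothetical and not taken from the slides.

```python
from decimal import Decimal


def reconcile(inputs, outputs, rejects, amount_field="amount"):
    """Compare record counts and dollar totals across one ETL step.

    Assumption: every input row ends up either in `outputs` or in `rejects`.
    """

    def total(rows):
        # Decimal avoids float rounding noise when summing money.
        return sum((Decimal(str(r[amount_field])) for r in rows), Decimal("0"))

    return {
        "record_count_ok": len(inputs) == len(outputs) + len(rejects),
        "amount_total_ok": total(inputs) == total(outputs) + total(rejects),
    }


# Hypothetical sample data: three rows in, two loaded, one rejected.
inputs = [{"amount": "10.00"}, {"amount": "5.50"}, {"amount": "1.25"}]
outputs = inputs[:2]
rejects = inputs[2:]
result = reconcile(inputs, outputs, rejects)
```

If either flag comes back false, rows or dollars were lost or duplicated somewhere in the step, which is the condition the tie-out is meant to surface.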
Page 63:
[Slide illegible; recoverable fragments mention data owners]
Page 64:
Data quality maturity
[Figure not recoverable: data quality (DQ) maturity levels]
Page 65:
DQ capability maturity model (1)
(Source: Larry English)
CMM Level 1: Uncertainty - Unconscious and [illegible]
» Data quality problems are denied.
» No formal data quality processes [illegible].
» Data quality initiatives are ad hoc and chaotic.
» [illegible]
CMM Level 2: Awakening - The big “Aha!” and lip service
» Data quality problems are acknowledged.
» Data problems are attacked as they come up.
» [illegible] quality [illegible], rather than the organization.
Page 66:
(Source: Larry English)
CMM Level 3: Enlightenment - [illegible]
» [illegible]
» Enterprise-wide data [illegible]
» Data quality problems are corrected at the source (where possible).
» Data quality improvement process is institutionalized.
CMM Level 4: Wisdom - [illegible]
» [illegible] responsibility for data quality.
» Data quality [illegible] reports to a chief officer (CIO, CKO, [illegible]).
» Data quality correction changes to data defect prevention.
» [illegible]
Page 67:
(Source: Larry English)
CMM Level 5: Certainty - Nirvana
» Data defect prevention is the main focus.
» Data quality is an integral part of the business processes.
» [illegible] processes.
» The culture of [illegible]
Page 68:
Organizational impact
• Cross-organizational tasks and responsibilities are not well defined
• Data quality responsibility is not [illegible]
• Value of data is not understood or appreciated
• Projects are often cost justified using the industrial-age mental model
• Resource requirements are not well defined
• Impact on application development empire
• No reward for data sharing
• Resistance to change
• [illegible]
Page 69:
• [illegible] and IT [illegible]
• [illegible] and business collaboration (“partnership”)
• IT and IT collaboration (“partnership”)
• [illegible]
Page 70:
[Figure not recoverable: collaboration diagram]
Page 71:
12 steps to [DQ] recovery (1)
Understand the root causes for your current data chaos.
2. Accept responsibility
• “Yes, it is our fault” for being in this mess.
• Accepting responsibility is a prerequisite for change.
Page 72:
12 steps to [DQ] recovery (2)
3. [illegible]
• [illegible] “know better”, the decision is yours: Stay stuck or change.
• There can be no more false hopes for any silver bullet [illegible]
4. [illegible] causes
• What are the specific root causes for [illegible] in your organization?
• Some root causes are known, some are not.
Page 73:
12 steps to [DQ] recovery (3)
5. [illegible]
• It doesn't matter “whose fault” it is that the [illegible] exist.
• IT must collaborate with the business community to effect change.
• [illegible] with business
6. Identify change [illegible]
• Who will be the [illegible]?
• Changes must be systemic and holistic, not isolated and sporadic.
Page 74:
12 steps to [DQ] recovery (4)
7. Spread the word
• To embrace changes, there must be “something in it” for everybody.
• Otherwise, changes trigger anxiety and [illegible] results in resistance or [illegible]
8. [illegible]
• [illegible]
• [illegible] people in change planning.
• Cross-organizational [illegible] are phased in.
Page 75:
12 steps to [DQ] recovery (5)
9. Prioritize changes
• Some changes are easier to implement than others.
• [illegible] a higher payback.
10. Implement changes
• Everyone affected by the changes must have an opportunity to review and approve the plan before implementation.
Page 76:
12 steps to [DQ] recovery (6)
11. [illegible]
• [illegible]
• Are the changes affecting anyone adversely?
12. [illegible]
• Nothing is perfect the [illegible]
• What might work in one organization may not work in [illegible]
Page 77:
Bibliography
• Adelman, Sid, and Larissa Terpeluk Moss. Data Warehouse Project Management. Boston, MA: Addison-Wesley, 2000.
• Aiken, Peter H. Data Reverse Engineering: Slaying the Legacy Dragon. New York: McGraw-Hill, 1995.
• Brackett, Michael H. Data Resource Quality: Turning Bad Habits into Good Practices. Boston, MA: Addison-Wesley, 2000.
• Brackett, Michael H. The Data Warehouse Challenge: Taming Data Chaos. New York: John Wiley & Sons, 1996.
• English, Larry P. Improving Data Warehouse and Business Information Quality: Methods for Reducing Costs and Increasing Profits. New York: John Wiley & Sons, 1999.
• Hoberman, Steve. Data Modeler’s Workbench: Tools and Techniques for Analysis and Design. New York: John Wiley & Sons, 2001.
• Huang, Kuan-Tsae, Yang W. Lee, and Richard Y. Wang. Quality Information and Knowledge Management. Upper Saddle River, NJ: Prentice Hall, 1998.
• Marco, David. Building and Managing the Meta Data Repository: A Full Lifecycle Guide. New York: John Wiley & Sons, 2000.
• Moss, Larissa T., and Shaku Atre. Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications. Boston, MA: Addison-Wesley, 2003.
• Reingruber, Michael C., and William W. Gregory. The Data Modeling Handbook: A Best-Practice Approach to Building Quality Data Models. New York: John Wiley & Sons, 1994.
• Ross, Ronald G. The Business Rule Concepts. Houston, TX: Business Rule Solutions, Inc., 1998.
• Simsion, Graeme. Data Modeling Essentials: Analysis, Design, and Innovation.
Improving Data Quality:
Why is it so difficult?
presented by
Larissa T. Moss
President, Method Focus, Inc.
DAMA
Oakland, CA
May 7, 2003
Copyright 2003, Larissa T. Moss, Method Focus, Inc.
Larissa T. Moss
Method Focus Inc. www.methodfocus.com methodfocus@earthlink.net (626) 355-8167
Ms. Moss is founder and president of Method Focus Inc., a company specializing in
improving the quality of business information systems. She frequently speaks at
Data Warehouse, Business Intelligence, CRM, and Information Quality
conferences around the world on the topics of information asset management, data
quality, data modeling, project management, and organizational realignment.
She lectures worldwide on the BI topics of spiral development methodology, data
modeling, data audit and control, project management, as well as organizational
issues. Her articles are frequently published in DM Review, TDWI Journal of Data
Warehousing, Cutter IT Journal, Analytic Edge, and The Navigator. She coauthored the books: Data Warehouse Project Management, Addison Wesley 2000,
Impossible Data Warehouse Situations, Addison Wesley 2002, and Business
Intelligence Roadmap: The Complete Project Lifecycle for Decision Support
Applications, Addison Wesley 2003. Ms. Moss is a member of the IBM Gold Group, a
Friend of Teradata, a senior consultant at the Cutter Consortium, and a
contributing member of Ask The Experts on www.dmreview.com. She has been a
lecturer at DCI, TDWI, MISTI, and at the Extension of the California Polytechnic
University, Pomona. She can be reached at lmoss@methodfocus.com.
Presentation Outline
• What do we mean by data quality?
Dirty data categories
• How are we addressing it today?
Ineffective technology solutions
• What do we have to change?
Approaches and techniques
• How do we change?
12 steps to [DQ] recovery
What do we mean by data quality?
• Data is correct
• Data is accurate
• Data is consistent
• Data is complete
• Data is integrated
• Data values follow the business rules
• Data corresponds to established domains
• Data is well defined and understood
Symptoms of poor-quality data
• Do your programs abend with data exceptions?
• Are your users confused about the meaning of data?
• Is some of your data too stale for reporting?
• Is your data being shared? Is it sharable?
• Are reports inconsistent?
• Does it take your IT staff or the end users hours to reconcile inconsistent reports?
• Does merging data often cause the system to fail?
• Do beepers go off at night?
Dirty data categories
• Dummy (default) values
• “Intelligent” dummy values
• Missing values
• Multi-purpose fields
• Cryptic values
• Free-form address lines
• Contradicting values
• Violation of business rules
• Reused primary keys
• Non-unique primary keys
• Missing data relationships
• Inappropriate data relationships
Dummy (default) values
• Defaults for mandatory fields
SSN 999-99-9999
Age 999
Zip 99999
Income 9,999,999.99
Inability to determine customer profiles
Inability to determine customer demographics
“Intelligent” dummy values
• Defaults with meaning
SSN 888-88-8888 → Non-resident alien
Income 999,999.99 → Employee
Age 000 → Corporate customer
Source Code ‘FF’ → Account closed prior to 1991
Inability to write straightforward queries without knowing how to filter data
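One way to make such defaults visible before they distort a report is to classify each raw value against a list of known sentinels. The sketch below uses the example values from the two dummy-value slides; the field names, the `classify` function, and the idea of keeping the rules in plain dictionaries are assumptions for illustration, not a method described in the presentation.

```python
# Plain dummies: the value only says "nothing real was entered".
DUMMY_VALUES = {
    "ssn": {"999-99-9999"},
    "age": {"999"},
    "zip": {"99999"},
    "income": {"9,999,999.99"},
}

# "Intelligent" dummies: the sentinel secretly encodes a business fact.
INTELLIGENT_DUMMIES = {
    ("ssn", "888-88-8888"): "non-resident alien",
    ("income", "999,999.99"): "employee",
    ("age", "000"): "corporate customer",
    ("source_code", "FF"): "account closed prior to 1991",
}


def classify(field, raw):
    """Return (category, hidden_meaning) for one raw field value."""
    value = raw.strip()
    if (field, value) in INTELLIGENT_DUMMIES:
        return ("intelligent_dummy", INTELLIGENT_DUMMIES[(field, value)])
    if value in DUMMY_VALUES.get(field, set()):
        return ("dummy", None)
    if value == "":
        return ("missing", None)
    return ("ok", None)
```

A query that filters on `classify(...)[0] == "ok"` no longer depends on every end user remembering which magic values to exclude.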
Missing Values
• Operational systems do not always require
informational or demographic data
Gender
Ethnicity
Age
Income
Referring Source
Inability to analyze marketing channels
Multi-purpose fields
• ONE field explicitly has MANY meanings
» Which business unit enters the data
» At what time in history it was entered
» A value in one or more other fields
Appraisal Amount
redefined as Advertised Amount
redefined as Sold Date
redefined as Loan Type Code
redefined as ...
25 redefines = 25 attributes !
Not mutually exclusive !
Only the value of one is known for each record !
Inability to judge product profitability
Cryptic values (1)
• Often found in “Kitchen Sink” fields
» Usually one byte (if not one bit)
» Highly cryptic (A, B, C, 1, 2, 3, ...)
» Non-intelligent, non-intuitive codes
» Often not mutually exclusive
Inability to empower end users to write their own
queries
Cryptic values (2)
• ONE field implicitly has MANY meanings
Master_Cd {A, B, C, D, E, F, G, H, I}
{A, B, C} Type of customer
{D, E, F} Type of supplier
{G, H, I} Regional constraints
Need a CODE TRANSLATION list
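A code translation list for the Master_Cd example could look like the following sketch. The three target attribute names and the function `split_master_cd` are hypothetical; only the code ranges {A, B, C}, {D, E, F}, and {G, H, I} come from the slide.

```python
# Hypothetical translation list: each cryptic code maps to the one
# attribute it really describes.
CODE_TRANSLATION = {
    **{c: "customer_type" for c in "ABC"},
    **{c: "supplier_type" for c in "DEF"},
    **{c: "regional_constraint" for c in "GHI"},
}


def split_master_cd(master_cd):
    """Route one overloaded code into three separate, nullable attributes."""
    record = {
        "customer_type": None,
        "supplier_type": None,
        "regional_constraint": None,
    }
    code = master_cd.strip().upper()
    if code not in CODE_TRANSLATION:
        # Unknown codes are surfaced rather than silently passed through.
        raise ValueError(f"unknown Master_Cd value: {master_cd!r}")
    record[CODE_TRANSLATION[code]] = code
    return record
```

Because the source field holds a single code, only one of the three attributes is populated per record, which is the information loss the slide warns about.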
Free-form address lines
• Unstructured text
» no discernible pattern
» cannot be parsed
address-line-1: ROSENTHAL, LEVITZ, A
address-line-2: TTORNEYS
address-line-3: 10 MARKET, SAN FRANC
address-line-4: ISCO, CA 95111
Inability to perform market analysis
Contradicting values
• Values in one field are inconsistent with values in another related field
1488 Flatbush Avenue
New York, NY 75261 ← Texas Zip
Type of real property: Single Family Residence
Number of rental units: four ← Income property
Inability to make reliable business decisions
Violation of business rules
• Business Rule: Adjustable Rate Mortgages must have
» Maximum Interest Rate (Ceiling)
» Minimum Interest Rate (Floor)
• Business Rule: A Ceiling is always higher than a Floor
ceiling-interest-rate: 8.25
floor-interest-rate: 14.75 ← switched ?
Inability to calculate product profitability
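The two business rules on this slide translate directly into a record-level check. The following is a minimal sketch; the dictionary keys and the function name `check_arm_rates` are assumptions made for the example.

```python
from decimal import Decimal


def check_arm_rates(loan):
    """Return the business-rule violations found in one ARM record."""
    violations = []
    ceiling = loan.get("ceiling_interest_rate")
    floor = loan.get("floor_interest_rate")
    # Rule 1: an adjustable rate mortgage must have both a ceiling and a floor.
    if ceiling is None:
        violations.append("missing ceiling")
    if floor is None:
        violations.append("missing floor")
    # Rule 2: the ceiling is always higher than the floor.
    if ceiling is not None and floor is not None:
        if Decimal(str(ceiling)) <= Decimal(str(floor)):
            violations.append("ceiling not higher than floor")
    return violations


# The record from the slide: the two rates look switched.
slide_record = {"ceiling_interest_rate": "8.25", "floor_interest_rate": "14.75"}
slide_violations = check_arm_rates(slide_record)
```

The check can only report that the rule is broken; whether the values were switched or one of them is simply wrong still has to be decided by someone who knows the loan.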
Reused primary keys
• Little history, if any, stored in operational files
» primary keys are customarily re-used
» may have a different rollup structure
January ‘94: branch 501 = San Francisco Main
region 1
area SW
August ‘97: branch 501 = San Luis Obispo
region 2
area SW
Inability to evaluate organizational performance
Non-unique primary keys
• Duplicate identification numbers
» Multiple customer numbers
Customer Name        Phone Number    Cust. Num
Philip K. Sherman    818.357.5166    960601
Philip K. Sherman    818.357.7711    960105
Philip K. Sherman    818.357.8911    960003
» Multiple employee numbers
Date            Employee Name    Department    Empl. Number
July 1995       Bob Smith        213 (HR)      21304762
January 1996    Bob Smith        432 (SRV)     43218221
August 1999     Bob Smith        206 (MKT)     20684762
Inability to determine customer relationships
Inability to analyze employee benefits trends
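Candidate duplicates of this kind can be surfaced by grouping identification numbers under a normalized name. The sketch below is an assumption-laden illustration: matching on name alone only produces candidates for review, since two different people can share a name.

```python
from collections import defaultdict


def candidate_duplicates(customers):
    """Group customer numbers by normalized name; keep names with several numbers.

    customers: iterable of (name, phone, customer_number) tuples.
    """
    numbers_by_name = defaultdict(list)
    for name, _phone, customer_number in customers:
        numbers_by_name[name.strip().lower()].append(customer_number)
    return {name: nums for name, nums in numbers_by_name.items() if len(nums) > 1}
```

Applied to the three Philip K. Sherman rows, this returns one name with three customer numbers, which a data steward would then confirm or reject.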
Missing data relationships
• Data that should be related to other data in a dependent
(parent-child) relationship
Branch → Employee → Benefit
» Branch number 0765 does not exist in the
BRANCH table
Inability to produce accurate rollups
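A missing parent can be found by comparing the foreign keys of the child rows against the set of parent keys. A minimal sketch, assuming rows are plain dictionaries and a hypothetical branch_no column:

```python
def find_orphans(child_rows, parent_keys, foreign_key):
    """Return the child rows whose foreign key has no matching parent key."""
    known = set(parent_keys)
    return [row for row in child_rows if row[foreign_key] not in known]
```

For example, an employee row with branch_no "0765" is returned as an orphan when the BRANCH table holds no such branch number.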
Inappropriate data relationships
• Data that is inadvertently related, but should not be
» two entity types with the same key values
Purchaser: Jackie Schmidt 837221
Seller: Robert Black 837221
Inability to determine customer or vendor
relationships
Impact of erroneous data
• Extra time it takes to correct data problems
• Extra resources needed to correct data problems
• Time and effort required to re-run jobs that abend
• Time wasted arguing over inconsistent reports
• Lost business opportunities due to unavailable data
• Unable to demonstrate business potential in a buyout
• Fines may be paid for noncompliance with government regulations
• Shipping products to the wrong customers
• Bad public relations with customers
– leads to alienated and lost customers
Cost of erroneous data
© Larry English, Improving DW and BI Quality
Direct Costs of Non-Quality Information: Marketing Campaign

                                       Per           Number of    Total Number   Total Cost
                                       Instance      Instances    Per Year       Per Year
Time: ($60/hour loaded rate)
  Creating redundant occurrence        2.4 min       167,141       1             $  401,138
  Researching correct address          10 min        5,000/mo     12             $  600,000
  Correcting address errors            0.3 min       6,000/mo     12             $   21,600
  Handling complaints from customers   5.5 min       974/yr        1             $    5,357
  Mail preparation                     0.1 min       393,273       4             $  157,309
Materials, Facilities, Equipment:
  Marketing brochure                   $1.96         393,273       4             $3,083,260
  Postage                              $0.52         393,273       4             $  818,008
  Warehouse storage                    $0.01         393,273       4             $   15,731
  Shipping equipment and maintenance   $5,000/yr     36%           1             $    1,800
Computing resources:
  CPU transactions                     $0.02/trans   393,273       4             $   31,462
  Data storage                         $0.001/mo     393,273      12             $    4,719
  Data backup                          $0.005/mo     393,273      12             $   23,596
Total Annual Costs                                                               $5,163,980
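The arithmetic behind this table can be reproduced as unit cost × number of instances × occurrences per year. The sketch below uses the figures from the slide; the row layout and function names are assumptions, and time is converted at the $60/hour loaded rate ($1 per minute).

```python
RATE_PER_MINUTE = 60 / 60  # $60/hour loaded rate, expressed per minute

# (item, unit cost in dollars, number of instances, occurrences per year)
ROWS = [
    ("Creating redundant occurrence", 2.4 * RATE_PER_MINUTE, 167_141, 1),
    ("Researching correct address", 10 * RATE_PER_MINUTE, 5_000, 12),
    ("Correcting address errors", 0.3 * RATE_PER_MINUTE, 6_000, 12),
    ("Handling complaints from customers", 5.5 * RATE_PER_MINUTE, 974, 1),
    ("Mail preparation", 0.1 * RATE_PER_MINUTE, 393_273, 4),
    ("Marketing brochure", 1.96, 393_273, 4),
    ("Postage", 0.52, 393_273, 4),
    ("Warehouse storage", 0.01, 393_273, 4),
    ("Shipping equipment and maintenance", 5_000, 0.36, 1),  # 36% share
    ("CPU transactions", 0.02, 393_273, 4),
    ("Data storage", 0.001, 393_273, 12),
    ("Data backup", 0.005, 393_273, 12),
]


def annual_cost(unit_cost, instances, per_year):
    """Annual cost of one line item, rounded to whole dollars."""
    return round(unit_cost * instances * per_year)


def total_annual_cost(rows):
    """Sum of the rounded line-item costs."""
    return sum(annual_cost(unit, n, per_year) for _, unit, n, per_year in rows)
```

Summing the rounded line items reproduces the total annual cost shown on the slide.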
Impact of redundant data
• Hardware (CPU, disks) and software (program maintenance) costs incurred
as a result of uncontrolled redundant data
• Extra time it takes to reconcile inconsistencies
• Extra resources needed to reconcile inconsistencies
• Unwise business decisions made due to redundant
and inconsistent data
• Lost opportunities due to unreliable data
• Overcharging or overpayment for products
• Duplicate shipping of products
• Money wasted on sending redundant marketing
material
Cost of redundant data
© Larry English, Improving DW and BI Quality
Information Development Cost Analysis

                                          Portfolio   Relative   Average Unit   Total          Total Expenses   % of
                                          Total       Weight     Dev/Maint      Dev/Maint      by Category      Budget
Category                                  Number      Factor*    Costs          Expenses**
Infrastructure Basis:
  Enterprise architected DBs                200       0.75       $ 15,000       $ 3,000,000
  Enterprise reusable
    create/update programs +                300       1.50       $ 30,000       $ 9,000,000
  Total Infrastructure expenses                                                                $12,000,000       24%
Value Basis:
  Total retrieve equivalent pgms +          300       1.00       $ 20,000       $ 6,000,000
  Total value-adding expenses                                                                  $ 6,000,000       12%
Cost-adding Basis:
  Redundant create/update pgms              500       1.50       $ 30,000       $15,000,000
  Interface/extract programs                400       1.00       $ 20,000       $ 8,000,000
  Redundant database files                  600       0.75       $ 15,000       $ 9,000,000
  Total cost-adding expenses              1,500                                                $32,000,000       64%
Lifetime Total **                         3,800                                                $50,000,000      100%

* Determine relative effort to develop average unit of each category using effort to develop a retrieve program as “1.00”
+ For programs that retrieve some data and create/update other data, determine the percent of retrieve only attributes and percent of create/update attributes (e.g., to retrieve customer data to create an order)
** Based on 3,800 application programs and database files in portfolio and $50 Million in development
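The category totals and budget percentages follow from number of units × average unit cost. A minimal sketch using the slide's figures; the dictionary layout and function names are assumptions.

```python
# (number of units, average unit dev/maint cost in dollars) per category
PORTFOLIO = {
    "infrastructure": [(200, 15_000), (300, 30_000)],
    "value-adding": [(300, 20_000)],
    "cost-adding": [(500, 30_000), (400, 20_000), (600, 15_000)],
}


def category_totals(portfolio):
    """Total dev/maint expenses per category."""
    return {
        category: sum(units * unit_cost for units, unit_cost in rows)
        for category, rows in portfolio.items()
    }


def budget_shares(totals):
    """Each category's share of the overall budget, in whole percent."""
    grand_total = sum(totals.values())
    return {category: round(100 * total / grand_total) for category, total in totals.items()}
```

With these inputs, cost-adding work (redundant programs, interfaces, and files) accounts for 64% of the $50 million, compared with 12% for value-adding work.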
Dirty data – How did it happen?
[Organization chart: a Chief Executive Officer (?) over a Chief Operating Officer heading the Business Units and a Chief Information Officer heading the Information Technology Units, each with their business managers and technology managers. Every business client (Sales, Inventory, Distribution, Customer Support, Product Pricing, Financial (AP & AR), Marketing, ...) is paired with its own IT unit.]
Result:
• data redundancy
• process redundancy
• dirty data
Major cause for data deficiencies
Project Constraints    Priority (highest to lowest)
TIME                   1
SCOPE                  2
BUDGET                 3
PEOPLE                 4
QUALITY                5
Wrong priority on project constraints!
Cost-based
value proposition
Industrial Age:
• Cheaper, faster, better
• Automate as quickly as possible
Time is getting shorter – scope is getting bigger
• Everyone on the business side and in IT wants quality, but rarely
is the extra time given or taken to achieve it.
Quality and time are polarized constraints.
• The higher the quality the more effort (time) it takes to deliver.
• Companies are driven by shorter and shorter schedules.
[Figure: SCOPE versus TIME]
How are we addressing it today?
• Data Warehousing
• Customer Relationship Management
• Enterprise Resource Planning
• Enterprise Application Integration
• Knowledge Management
Why can’t
technology
fix this?
Ineffective Technology Solutions
Data Warehousing
DW delivers...
a collection of integrated data used to support the strategic decision
making process for the enterprise.
The Promise:
data integration
no redundancy
consistency
historical data
ad-hoc reporting
trend analysis reporting
faster data delivery
faster data access
The Reality:
stove pipe marts
departmental views
swim lane development approach
too time consuming to integrate
too costly to cleanse data
increased data redundancy
If it sounds too good to be true, it is too good to be true.
Customer Relationship Management
CRM delivers …
seamless coordination between back-office systems, front-office systems and the Web.
the organizational lifeline, creating competitive advantage
through customer service excellence.
The Promise:
data integration
data quality
customer intimacy
customer wallet share
product pricing customization
knowing your competition
geographic market potential
The Reality:
more stovepipe systems
departmental views
dirty customer data
purchased packages not integrated
focus is too narrow
privacy issues
If it sounds too good to be true, it is too good to be true.
Enterprise Resource Planning
ERP delivers...
a collection of functional modules used to integrate
operational data to support seamless operational
business processes for the enterprise.
The Promise:
data integration
no redundancy
consistency
data quality
easy reporting
easy maintenance
Y2K compliance
The Reality:
system conversion, not cross-organizational analysis
same dirty data
operational focus
poor quality (unusable) reports
one-size-fits-all data warehouse
too costly
If it sounds too good to be true, it is too good to be true.
Enterprise Application Integration
EAI delivers ...
integration of disparate applications into a unified set
of business processes through centrally managed rules
and middleware technologies.
The Promise:
fast & automated integration
leverage existing data
bridge islands of automation
easy cross-system reporting
faster data delivery
faster data access
The Reality:
dirty data
no true integration
still data redundancy
still islands of automation
easier access to the current data mess
If it sounds too good to be true, it is too good to be true.
Knowledge Management
KM delivers ...
a process for capturing, editing, verifying (for accuracy),
disseminating, and utilizing tacit and explicit
information about the organization.
The Promise:
utilize organizational info
data integration
historical data
faster data delivery
faster data access
first & only customer contact
reduction of customer calls
less re-solving same problems
Reality of KM:
too difficult to build
too time consuming
too costly
technology challenges
non-sharing culture
isolated applications
difficult to disseminate information
If it sounds too good to be true, it is too good to be true.
What’s the lesson?
You cannot keep doing
what you have always done
and expect the results to be different.
Not even with
new technology.
“That wouldn’t be logical”
Spock, Star Trek
What do we have to change?
1. Assess the current state of data quality at your company
2. Understand and fix the root causes for data contamination
3. Perform data audits regularly (monthly, quarterly)
4. Stop working in isolated “swim lanes”
> Stop recreating data
5. Centrally manage your data like a business asset
(Enterprise Information Management [EIM])
> Assemble data as needed from the data inventory (enterprise data
model and meta data)
> Standardize and reconcile data transformations for BI/DW
applications (coordinated ETL staging area)
6. Scale down project scopes to incorporate data quality and EIM
activities
7. Embed data quality and EIM activities in all projects
Business intelligence …
… is a cross-organizational discipline and an enterprise architecture for an integrated collection of operational as well as decision support applications and databases, which provide the business community easy access to their business data and allow them to make accurate business decisions.
… is not business as usual
BI goals and objectives
80% Data Management: Get control over the existing data chaos
    Data Reengineering (Enterprise Information Management)
20% Data Delivery: Provide intuitive access to business information
Proliferation of data quality problems
“LegaMarts”
(Doug Hackney)
[Figure: Legacy sources are loaded (L) directly into separate data warehouses (DW) and data marts (DM) for Marketing, Finance, Customer Support, Product Sales, and Engineering users. Transformation? Cleansing? Is this BI?]
Industrial-age mental model
Project Constraints    Priority (highest to lowest)
TIME                   1
SCOPE                  2
BUDGET                 3
PEOPLE                 4
QUALITY                5
[Figure: Business Units (clients: Sales, Inventory, Distribution, Product Pricing, Financial (AP & AR), Marketing, Customer Support), each paired with its own unit among the Information Technology Units]
Scrap and rework
The game has changed
…but our mental model has not
1. Enormous degree of complexity (John Zachman)
2. Extremely high rate of change
Cheaper, faster, better !!!
But how?
Don’t scrap and rework.
Reuse what you already have.
Information-age mental model
Project Constraints    Priority (highest to lowest)
QUALITY                1
BUDGET                 2
PEOPLE                 3
TIME                   4
SCOPE                  5
Investment-based value proposition
Reassemble reusable components
Information Age:
• Reassemble the entire enterprise
• Reuse assets from inventory
Software release concept (1)
“Extreme scoping” - Larissa Moss
“Refactoring” - Kent Beck
Projects: First Release, Second Release, Third Release, Fourth Release, Fifth Release, ... Final Release
Application: Reusable & Expanding
Project ≠ Application
Software release concept (2)
• Requirements can be tested, and implemented in small increments
• Scope is very small and manageable
• Technology infrastructure can be tested and proven
• Data volumes (per release) are relatively small
• Project schedules are easier to estimate because the scope is very small
• Development activities can be iteratively refined, honed, and adapted
AND:
The quality of the release deliverables (and ultimately
the quality of the applications) will be higher!
Cross-organizational development approach (1)
(© Larissa Moss and Shaku Atre, “Business Intelligence Roadmap”)
Data Quality Touch Points
BI/DW Development Steps
1. Business Case Assessment                      Cross-organizational
2.A Enterprise Technical Infrastructure          Cross-organizational
2.B Enterprise Non-Technical Infrastructure      Cross-organizational
3. Project Planning                              Project-specific
4. Project Requirements Definition               Project-specific
5. Data Analysis                                 Cross-organizational
6. Application Prototyping                       Project-specific
7. Meta Data Repository Analysis                 Cross-organizational
8. Database Design                               Cross-organizational
9. ETL Design                                    Cross-organizational
Cross-organizational development approach (2)
• Commitment to data quality embedded in the methodology
• Cross-organizational program management
• Enterprise information management group
• Standards that include a common information architecture
(enterprise data model)
Involving down-stream information consumers in the
requirements definition step
Involving data owners in the data analysis step
Involving business representatives from all business
units to ratify the data models and meta data
• Coordinating the development/ETL processes
Disallowing stovepipe development
Extracting and cleansing source data only once
Reconciling data transformations and storing the
reconciliation totals as meta data
Enterprise information management
[Figure: Business Units (clients: Sales, Inventory, Product Pricing, Financial (AP & AR), Marketing, Distribution, Customer Support) and their Information Technology Units, coordinated through Enterprise Information Management, which spans the Operational Environment (Operational Systems) and the Decision Support Environment (BI/DW Databases: ODS, OM, EDW, DM)]
Discover, Coordinate, Integrate, Document, Control
EIM responsibilities
• Business architecture inventory
Process models
Data models
• Application inventory
Programs
Databases
• Meta data inventory
Business meta data
Technical meta data
• Policy inventory
Standards
Procedures
Guidelines
…
IT asset inventory management: Discover, Coordinate, Integrate, Document, Control
Stewards, Architects, Managers
Data stewardship
• Guardians of the data while it is being created or
maintained by them
• Create standards and procedures to ensure that policies and
business rules are known and followed
• Enforce adherence to policies and business rules that govern
the data while the data is in their custody
• Periodically monitor (audit) the quality of the data in their custody
• Also known as custodians
• Can be a business person or an IT person
“One who manages another’s property.”
Data ownership
• Authority to establish policies and set business rules for
the data under their control
• Decide what the official enterprise definition and domain
is for the data under their control
• Monitor and advise other end users on proper usage of
their data
• Frequently, but not always, the data originator
• Can be a person or a committee
“One who has the legal right to the possession of a property.”
Enterprise architecture
Business Architecture
Mission and Objective
Business Principles
Business Functions
Program Management
Information Architecture (Content)
Enterprise Data Model
- Data Standardization
- Data Integration
- Data Reconciliation
- Data Quality
Application Architecture (Storage & Presentation)
Operational Applications
Data Access Applications
Data Analysis Applications
Application Databases
Technology Architecture
Technology Platform
Network
Middleware
DBMS, Tools
1. Data Management
• data integration
• data cleansing
2. Data Delivery
• data access
• data manipulation
Enterprise data model (data inventory)
[Figure: enterprise data model with the entities Customer, Existing Customer, Potential Customer, Account, Payment, Payment Method, Order, Product, Product Category, Part, Supplier, Shipment, Warehouse, Salesperson, Salaried Salesperson, Commissioned Salesperson, Org Unit, and Org Structure]
Top-Down
Supported by common data definitions, domains, and business rules.
Source data analysis
Domain Violations:
• Dummy values
• Intelligent dummy values
• Missing values
• Multi-purpose fields
• Cryptic values
• Free-form address lines
Integrity Violations:
• Contradicting values
• Violation of business rules
• Reused primary keys
• Non-unique primary keys
• Missing data relationships
• Inappropriate data relationships
Bottom-Up
To cleanse or not to cleanse …
• You probably cannot cleanse it all (takes too long)
• It may not be worth the time and money to cleanse every data element
• Not all data is equally significant
• Not all data can be cleansed
• How do you know what to cleanse?
…that is the question
Triaging questions (1)
• Can the data be cleansed?
Does the correct data exist anywhere?
Is it easily accessible?
• Should the data be cleansed?
How extensive is the problem?
How elaborate will the cleansing process be?
Is it cost-effective?
Triaging questions (2)
• Why are we building the application?
What business questions cannot be answered today?
• Why are we not able to answer the business
questions?
Is it because of this dirty data?
Is it because of these missing relationships?
• Will the benefits of cleansing outweigh the cost
of the effort?
Categories of data significance
• Critical data
Business decision!
– Not all data is equally critical to all end users
– All critical data must be cleansed
– Usually includes amount fields
• Important data
– Important to the organization, but not absolutely critical
– Further prioritize important data elements
– Cleanse as many as time allows
– Those that cannot be cleansed should be bumped to critical for the
next release
• Insignificant data
– Informational data, which is nice to have
– Cleansing is optional if time allows
Cleansing – repairing – prevention
• Where should the dirty data be cleansed?
In the staging area of the BI application?
In the source (legacy) files?
• When should it be cleansed?
Retroactively?
At data entry time?
• How should it be cleansed?
Use data cleansing or ETL tools?
Write procedural (COBOL/C++) code?
• What will we do to prevent dirty data in the future?
Source Data Reengineering …
Total [Data] Quality Management (TQM)
Coordinated ETL staging
[Figure: Legacy sources (operational reports) are loaded (L) through a daily staging area (cleansing, transformations) into the Operational Data Store (ODS) and operational marts (OM) for tactical reports, and through a monthly staging area (cleansing, transformations) into the Enterprise Data Warehouse (EDW), data marts (DM), and EXW for strategic reports. Operational clients: Customer Support, CRM, Product Pricing. Analytical clients: Finance, CRM, Marketing, Engineering, Legal. The whole flow rests on the Enterprise Architecture & Meta Data Repository.]
ETL process flow
– coordinated –
[Figure: process flow through the stages Extract; Cleanse, Transform, Prepare; and Load.
Files: Sales File, Account Tran File, Customer Info File, Customer Master, Prospects, New Sales, New Accounts, Account Errors, Sorted Accounts, All Customers, Sorted Customers.
Process modules: Extract New Sales, Extract Accounts, Filter Accounts, Sort Accts, Associate Accounts, Match Accounts, Extract Customers, Merge Customers, Sort Customers, Profile Customers, Extract Prospects, Merge Prospects.]
ETL Reconciliation
[Figure: load files from the monthly staging area are loaded (L) into the ODS (daily), the EDW (monthly), and the data marts (DM, monthly)]
ETL tie-outs: record counts
INPUT RECORDS → PROCESS MODULE → OUTPUT RECORDS + REJECTED RECORDS
# Input Records = # Output Records + # Rejected Records
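The record-count tie-out can be automated at the end of each process module. A minimal sketch, assuming the three counts are collected independently (for example from file trailers or load logs); the function name is hypothetical.

```python
def record_count_difference(n_input, n_output, n_rejected):
    """Return input minus (output + rejected); zero means the counts tie out."""
    return n_input - (n_output + n_rejected)
```

A positive result points to records lost inside the module, and a negative result points to records duplicated or created along the way. The counts themselves can be stored as meta data for the reconciliation totals.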
ETL tie-outs: domain counts
INPUT CODES → PROCESS MODULE → OUTPUT CODES (three output domains) + REJECTED CODES
# Records Per Input Domain = # Records Per First Output Domain
+ # Records Per Second Output Domain
+ # Records Per Third Output Domain
+ # Rejected Data Values
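The same idea applies when one input domain is split into several output domains. This sketch assumes the values of each output domain and the rejected values are available as lists; all names are hypothetical.

```python
def domain_tie_out(input_codes, output_domains, rejected_codes):
    """Return per-domain record counts and the unexplained difference.

    output_domains maps an output domain name to the list of values
    written to that domain.
    """
    counts = {name: len(codes) for name, codes in output_domains.items()}
    counts["rejected"] = len(rejected_codes)
    difference = len(input_codes) - sum(counts.values())
    return counts, difference
```

A difference of zero means every input value was either written to exactly one output domain or rejected.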
ETL tie-outs: amount counts
INPUT AMOUNTS → PROCESS MODULE → OUTPUT AMOUNTS + REJECTED AMOUNTS
Total $ Input Amounts = Total $ Per First Output Amount
+ Total $ Per Second Output Amount
+ Total $ Rejected Amounts
Total $ Per First Input Amount
+ Total $ Per Second Input Amount
+ Total $ Per Rejected Amounts
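Amount tie-outs are best computed with exact decimal arithmetic so that rounding noise is not mistaken for lost money. A minimal sketch, assuming amounts arrive as strings and that one input amount is distributed over several output amount columns; the function names are hypothetical.

```python
from decimal import Decimal


def total(amounts):
    """Exact sum of amounts given as strings such as "100.10"."""
    return sum((Decimal(a) for a in amounts), Decimal("0"))


def amount_difference(input_amounts, output_amount_columns, rejected_amounts):
    """Return input total minus (all output totals + rejected total)."""
    total_out = sum((total(col) for col in output_amount_columns), Decimal("0"))
    return total(input_amounts) - (total_out + total(rejected_amounts))
```

A result of zero means the totals reconcile; any other value is the dollar amount that still has to be explained.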
Data quality improvements
• Source data repairs
• Increased program edits
• Enhanced data entry procedures
• Improved data quality training
• Regular data audits
• Data usage monitoring
• Enterprise-wide end user surveys
• Continuous validation of enterprise data model
• Continuous validation of meta data, especially definitions and domains
• Involvement of data owners, information consumers,
and business sponsors
Data quality maturity
At what level of DQ maturity is your organization?
Scale of 1 .. 5 (short term to long term)
1. Discovery by accident: program abends
2. Limited data analysis: data profiling, data cleansing during ETL
3. Addressing root causes: repairing source data and programs
4. Proactive prevention: enterprise-wide DQ methods & techniques
5. Optimization: continuous DQ process improvements
DQ capability maturity model (1)
(Source: Larry English)
CMM Level 1. Uncertainty - Unconscious and unaware
» Data quality problems are denied.
» No formal data quality processes defined.
» Data quality initiatives are ad hoc and chaotic.
» Any success is dependent on individual efforts.
CMM Level 2. Awakening - The big Aha! and lip service
» Data quality problems are acknowledged.
» Major problems are attacked as they come up.
» Minimum funding for a formal data quality initiative.
» Capability is a characteristic of the individual rather than the organization.
DQ capability maturity model (2)
(Source: Larry English)
CMM Level 3. Enlightenment - Let’s do something
» Data quality initiative takes off.
» Enterprise-wide data quality assessment is performed.
» Data quality problems are corrected at the source (where possible).
» Data quality improvement process is institutionalized.
CMM Level 4. Wisdom - Making a difference
» Management accepts personal responsibility for data quality.
» Data quality group reports to a chief officer (CIO, CKO, COO).
» Data quality correction changes to data defect prevention.
» All business areas are involved.
DQ capability maturity model (3)
(Source: Larry English)
CMM Level 5. Certainty - Nirvana
» Data defect prevention is the main focus.
» Data quality is an integral part of the business processes.
» All business areas are continuously improving the processes.
» The culture of the organization has changed.
Organizational impact
• Cross-organizational tasks and responsibilities are not
well defined
• Data quality responsibility is not clear or ignored
• Value of data is not understood or appreciated
• Projects are often cost justified using the industrial-age
mental model
• Resource requirements are not well defined
• Impact on application development empire
• No reward for data sharing
• Resistance to change
Organizational changes
• Business and IT collaboration (“partnership”)
• Business and business collaboration (“partnership”)
• IT and IT collaboration (“partnership”)
• Increased end user involvement
• Cross-organizational activities
• Architecture and standardization
• Software release concept
• New charge-back system
• New incentives
• New leadership
New leadership
[Organization chart: CEO at the top, with CFO, COO, LOB Execs, CKO (Chief Knowledge Officer), CTO, ... and IT Execs linked by collaboration. Enterprise Information Management (EIM) comprises EA, DA, DQA, and MDA.]
How do we change?
12 steps to [DQ] recovery (1)
1. Become aware
• Every cultural transformation process begins
with an “Aha”.
• Understand the root causes for your current data
chaos.
2. Accept responsibility
• “Yes, it is our fault” for being in this mess.
• Accepting responsibility is a prerequisite for change.
12 steps to [DQ] recovery (2)
3. Decide to change
• Now that “you know better”, the decision is
yours: Stay stuck or change.
• There can be no more false hopes for any
silver bullet technology solutions.
4. Identify root causes
• What are the specific root causes for non-quality data
in your organization?
• Some root causes are common, some are not.
12 steps to [DQ] recovery (3)
5. Collaborate
• It doesn’t matter “whose fault” it is that the
root causes exist.
• IT must collaborate with the business
community to effect changes.
• Business community must also collaborate
with business community.
6. Identify change agents
• Who will be the couriers?
• Changes must be systemic and holistic, not
isolated and sporadic.
12 steps to [DQ] recovery (4)
7. Spread the word
• To embrace changes, there must be “something in it”
for everybody.
• Otherwise, changes trigger anxiety and anxiety
results in resistance or rejection.
8. Plan changes
• Big changes do not get implemented in one “Big
Bang”.
• Involve people in change planning.
• Cross-organizational changes are phased in.
12 steps to [DQ] recovery (5)
9. Prioritize changes
• Some changes are easier to implement than
others.
• Some changes have a higher payback.
10. Implement changes
• Everyone affected by the changes must have an
opportunity to review and approve the plan before
implementation.
12 steps to [DQ] recovery (6)
11. Measure effectiveness
• Solicit feedback from “the trenches”.
• Are the changes affecting anyone adversely?
12. Refine changes
• Nothing is perfect the first time around.
• What might work in one organization may not work in
another.
Bibliography
•Adelman, Sid, and Larissa Terpeluk Moss. Data Warehouse Project Management.
Boston, MA: Addison-Wesley, 2000.
•Aiken, Peter H. Data Reverse Engineering: Slaying the Legacy Dragon. New York:
McGraw-Hill, 1995.
•Brackett, Michael H. Data Resource Quality: Turning Bad Habits into Good
Practices. Boston, MA: Addison-Wesley, 2000.
•Brackett, Michael H. The Data Warehouse Challenge: Taming Data Chaos. New
York: John Wiley & Sons, 1996.
•English, Larry P. Improving Data Warehouse and Business Information Quality:
Methods for Reducing Costs and Increasing Profits. New York: John Wiley & Sons,
1999.
•Hoberman, Steve. Data Modeler’s Workbench: Tools and Techniques for Analysis
and Design. New York: John Wiley & Sons, 2001.
•Huang, Kuan-Tsae, Yang W. Lee, and Richard Y. Wang. Quality Information and
Knowledge Management. Upper Saddle River, NJ: Prentice Hall, 1998.
•Marco, David. Building and Managing the Meta Data Repository: A Full Lifecycle
Guide. New York: John Wiley & Sons, 2000.
•Moss, Larissa T., and Shaku Atre. Business Intelligence Roadmap: The Complete
Lifecycle for Decision-Support Applications. Boston, MA: Addison-Wesley, 2003.
•Reingruber, Michael C., and William W. Gregory. The Data Modeling Handbook: A
Best-Practice Approach to Building Quality Data Models. New York: John Wiley &
Sons, 1994.
•Ross, Ronald G. The Business Rule Concepts. Houston, TX: Business Rule Solutions,
Inc., 1998.
•Simsion, Graeme. Data Modeling Essentials: Analysis, Design, and Innovation.