صفحه 1:
۱ Duta © ‏:قاس‎ ‎Oh & it so + reseed by ‏مورا‎ T. Ovss EL 06 0 6 ۳ 09۷/۶, , 09 © Coprakt OOOO, Larexa T. Doss, Debad Poor, “ker.

صفحه 2:
Larissa T. Moss Method FocusTnc.+ www .methodfocus.com+ methodfocus@earthlink.net *(626) 355-8167 Ms. Mossisfounderandpresident of Method FocusInc.,a company speciaCizingin improving the quality of business information systems. Shefrequently speaksat Data Warehouse, Business IntelCigence, CRM,andInformation Quality conferencesaround the worldon the topics of information asset management, data quality, data modeling project management, andorganizational realignment. SheCecturesworldwide on the-BI topics of spiral development methodology, data modeling data audit andcontrof, project management, aswelCas organizational issues. Her articlesare frequently publishedin DM Review, TDWI Journal ofData ‘Warehousing Cutter IT Journal, AnaCytictdge and The Navigator. She co- authored the books: Data Warehouse Project Management, Addison Wesley 2000, Impossi6leData Warehouse Situations Addison Wesley 2002, and Business IntelLigence Roadmap: The Complete Project Lifecycle for Decision Support Applications Addison Wesley 2003. Ms. Mossisa member of theIBM GoldGroup,a Friendof Teradata,a senior consultant at theCut ter Consortiumanda contributing member of Ask The Expertson www .dmreview.com. Shefasbeen a (ecturer at DCI TDWI, MISTIandat the Extension of the California Polytechnic ‘University, Pomona .Shecan be reachedat (moss@met hodfocus.com. © Cop rnft ODO, Leesa T. Does, Detod Pome, “ac.

صفحه 3:
@resectatiod Oulice ° Oko do we wed by dota quelip? ‏مه بل ربو6‎ © Wow we ‏وله جر‎ it today? act Pevive tevharby soho ۰ Oko do we howe to chore? ‏عديسفيواننه یسیون‎ ° Wow do we chop? 09 ‏سسب [كائهع سبد‎ ‎Pome, ‘har. 9‏ بل ۰ ما ,9069 مرول 6 ‎

صفحه 4:
ای ح )۰ سح ع )۰ اوه و وو()) * ٠ Outs is ‏عواموست‎ © Data is ioteyruted * Osta values Polow the bustcess rules * Oota vorrespouds tv established dowd ۰ Ont ts wel dePiced ood understood © Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her.

صفحه 5:
Gyxopiows ‏الوم خن‎ dota * Op pour proqravs obec wi dota exceptions? ° Ore pou were vodhused ubout weaciay oP data? * 4s sowe DP pour date is tov stoke Por reporticc? ۰ Is pour dota being shored? Is it sharuble? ° Ore reports wowwistent? ۰ ‏و()‎ it take pour TT stoPP or the eo users kours ‏صا‎ ‎recounts tosvesisieat reports? * Opes werriey dato ped couse the syste to Pal? * Op beepers w of P ot cight? 7 © Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her.

صفحه 6:
Oiny dota ‏ابص‎ © Deen (dePoull) okies ۰ ‏رود ماو‎ votes . © Deer cher not just * Oulepurpose Pickle ۲8 ‏ی‎ data entry + Cree fore kev ‏صا‎ errors * Ovatireadiotey voles ۰ Oiptaica oF busicess rules ۰ ‏موق‎ priwary key 8 Oocrusique privary key © Osstay dota ‏ماو‎ * Ceappropriae dota ‏روا‎ 6 ‏لدب ۲ مورا ,9005 لسن‎ Pome, ‘Ia.

صفحه 7:
Overy (dePod) voor A ۶ DePouls Por woedatory Pies SSN 999-99-9 Age 999 Zip 99999 Income 9,999,999.99 2070956120232 ‏ی مد وا رو‎ profes ۴ ‏ره ار‎ 4 هجو لعج ۲ مورا 305 ملسون 9

صفحه 8:
“4 ۷ ” ۱ ۱ A ۶ DePauls wi ‏یی‎ SSN 888-88-8888 <X_ Ooererbuceu Income 999,999.99 <X ‏سوه‎ Age 000 <i Onrpg newer Source Code FFX ‏اج تسد سوه‎ 0 بت ‏ قت حی ‎ae oe‏ اس ور 9 هجو لعج ۲ مورا 305 ملسون 9

صفحه 9:
Oisstay Odes ° Opentivcd systews do ut days require inPorwotivod or dewoyraphic dott Gender Ethnicity 9 Age Income Referring Source j 7 Gee. ‏صا تتامو‎ code worketoy cocacks © Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her.

صفحه 10:
۳ و یمن( © OG Pek! exphoty hos DPOY weave » Obick busicess vei euters the dota » Ot udtsat teve to history it was otered » OD uahue ta poe or wore other ‏جلاعا‎ ‎Appraisal Amount 0 25 redefines = 25 attributes ! Advertised Amo 7 Ons = = aw ‏ای از‎ Not mutually exclusive ! Only the value of one Sold Date is known for each record ! Loan Type Code redePed us... } ‏وود‎ WG erably to idee product ‏رام‎ © Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her. 0

صفحه 11:
Crppic udues ((1) © OR tec Pou it “(itched Gio” Pies » Osrxrily oor byte (iP oot oo bit) » Ish over (B, B, O, 4, ©, 9, ...) > Oocrnteliqest, csrictulive poder » Oto ‏سيج اه‎ exchusive [ ‏موز‎ mere el wm wnt br ‏مسد‎ © Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her. ad

صفحه 12:
عي ۱ Master Cd {AB CDEFGH I} tA, B,C} ‏سعد سد گک>‎ {D, E, FB <& ‏مرهج‎ ‎{G HI} XX ‏صمسود یمه‎ Ned a CODE TRANSLATION Liat ‎Pome, ‘Ia. 6‏ لدب ۲ مورا ,9005 لسن 6 ‎

صفحه 13:
* Dustrurtured text » aw disvercoble potera » pect be parsed address-line-1: ROSENTHAL, LEVITZ, A address-line-2: | TTORNEYS address-line-3: 10 MARKET, SAN FRANC address-line-4: ISCO, CA 95111 وسح ملحب وممصم دا ‎i DRESS. Aecblay‏ 6 112 جم" لحماه 0 ,0 ۲ مورا ,9003 ناسون 6

صفحه 14:
Coctradiniiag values ۶ ‏ون‎ to coe Piel one topvosistedt wi values fe carter ‏لا لاه‎ 1488 Flatbush Avenue New York, NY 75261 CX exw Ly Type of real property: Single Family Residence Number of rental units: ‏سیک تاه‎ EISEN, ‏سا اه لیات‎ heer © Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her. OF

صفحه 15:
Otkiion oP bustess rues A * @usitess Rule? Odustuble Rute Oortqayes wust hove » ‏موه(‎ Ieterest Rute ( Crile) » ‏منم‎ Ioterest Rute ( Poor) * Ousicess Rule: @ Orthay is thas higher thoa a Plo ceiling-interest-rate: 8.25 floor-interest-rate: ‏و‎ } ‏جوا ]نوتیز‎ ey to ‏لت‎ proce proPacbiny © Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her. 6

صفحه 16:
‎A‏ جوا روم تهج ‎° bith history, Poy, stoned io pperciosral Pies ‎« primary hes ure vusiowarily re-used » way kee ‏مه ما محص ”ال د‎ ‎January ‘94: branch 501 = San Francisco Main region 1 area SW ‎August ‘97: branch 501 = San Luis Obispo region 2 area SW ‎۳ | ‏© هجو لعج ۲ مورا 305 ملسون 9 ‎

صفحه 17:
Oowrudique privary heps ‏ای متام ).و‎ « ‏ای موه تال‎ Customer Name Phone Number Cust. Nui Philip K. Sherman 818.357.5166 960601 Philip K. Sherman 818.357.7711 960105 Philip K. Sherman 818.357.8911 960003 >» Duliple ewpbyer ‏ای‎ ‎Employee Name Department Empl. Number July 1995: Bob Smith 213 (HR) 21304762 January 1996: Bob Smith 432 (SRV) 43218221 August 1999: Bob Smith 206 (MKT) 20684762 DIES WEES. —tecblay to code ewploper beurPts reuds © Conrad OOOO, Leresa TP. Doss, Detod Pores, “ow. a

صفحه 18:
Oisstoy data rektiocships A ° Oats that should be related to other data igo depeudedt (pareutchid) ‏مشاه‎ اس با موه یبا ‎®rack‏ » Branch number 0765 does wit exist ta the @RBOOW table سوت سس وتو © Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her. 6

صفحه 19:
۱ ant be » tuo eutiy types wit the sowe hey vohes Purchaser: Jackie Schmidt 837221 Seller: Robert Black 837221 © Conrad OOOO, Leresa TP. Doss, Detod Pores, “ow. 6

صفحه 20:
Ikppact oP enoveves data 7 ۰ ‏له از جوا مه‎ to correct dota problews ۰ ‏مج مق‎ ceeded to correct data problews ° Doe ced ‏تلاو‎ required to re-run jobs thot obec 0 Die wosted arguing pver iaoousistedt reports ° Lost busicess vpportuaties due to ucravuituble data * Ouuble to dewousirate busicess potecticl ao bupout * ices wo be pod Por ware w place wits (pvernnved requicicas * Ghippieg products to the wrod vustiowers * Oa public relives with vustowers steeds to ‏روص وا جه مها‎ © Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her. ‏مع‎

صفحه 21:
ea Oly Bon “hapa DD end 1 Cy, Om On ver موه و سمه 5 موه و $s ogee $077,000 s9,009,00q $00,000] $ ao/eo0 ممه و موه و ‎see‏ $9,999,000 = 9 مج ‎OOK 2‏ ‎ken‏ 00 4 و « 308,070 9 اج موه ® 0 همه 03 دجم ممه 0 Pome, “ec. lo ‏مس‎ ‎0 wm suse $0.68 ‏و‎ ‏و‎ 0 ‘S000 $0.00 he "Phew (GMM bak rae) Cran cot oor لله مرجت يسا سسا اك او شم مس سوه ری شا مس رای سس هن ماس سا ,3005 باون 6

صفحه 22:
‎redueckrat dct 1‏ ی و ‎© Werkoor (ORO, disks) oad ‏یامه‎ (progres ‎wortteodure) costs ured us ores oP vorvoicted reduadaat dete ‎۰ Cntr tive itches to recoil toccusisteuries ‎*) Cxtea resources ueeded to repounle ‏وود‎ ‎* Onwise busicess devisioes wade due to ‏الیل‎ ‎ied ioovasisteat data ‎° Lost vpportuciies due to uoretable dat ‎© Overckargicy or pverpopswed Por products ‎* Duplicate shippicagy oP products ‏وی تذل لو مه لوب رو( ۰ اوه ‎© Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her. ee ‎

صفحه 23:
Ober Orn ‘Porwutios Oevelopwedt Ovst Badbstr ‘hap Deed Cay Tod Omer ‏سید‎ ‏أسمية | میم‎ Oa rad | ‏سفسسسين‎ | 8+ Ted | ‏عم | سفسية إستويمة | سسوية | سه‎ Ome ‏میم | تسه‎ | Ome | ‏اه‎ Gere | ‏تست‎ ‏0ك‎ ‎Gearon sheeted De ‏ممه‎ 0.00 2000 | 2000/00 Cane ei cermin pnp + ‏مود‎ 0 ‏مس ممم ميو و‎ — OF Ode Oe ardrereve eat ge + tow aon — genowo | 5 eoa0,hro aed carn ome ‏دض لم‌مموه و‎ Pedr coma ‏بين تعقه‎ ۳0 uso | seman $16,000} 000 an Polat oe ۳0۰ ۱ ‏ی‎ aed ee00oop ‏سمه .نمت‎ ‏سينا‎ 46 fe0,000,00p «00% dat arate prenpony LD” ert onan ol * Oeerane rome oferty decby aerng nd eck caer ‏رس‎ ‎0 empame ‏خدج ل‎ sou dan ‏ليق اعماج لجن‎ chew dao, err be pov ‏يي‎ cabs ese, resne peter daw rma nore) ‘hens HDD phan np lad he pend rd SED Cd vent © Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her. 9

صفحه 24:
Oirny data — Wow did t koppeo? 7 S| é ‏بع اه‎ 8 8 S| 5/85 1 1 alr ۳ 2 ar 7 7 ‏ل‎ للملا مد ۰ 0 ۰ ‏تلم وم‎ ody doo © Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her. er

صفحه 25:
Wrong priority on project constraints! dadusttdl Bye: ۰ Cheuper, Poster, better ‏رای و‎ os possible © Conrad OOOO, Leresa TP. Doss, Detod Pores, “ow. es

صفحه 26:
۳ ۶ 277۶ - و رباص ‎Pie ts‏ یبا * Guero ‏نصا ها مه‎ side ood fo IP wants quotiy, but rarely ip the extea five quedo or toheo ty uchieve it. Quali odd eve oe polorized poostranis. ۰ Dke higher the quoliy the wore ePPon (tieve) i tokes to deliver. ۰ Cowpusies ure dived by shorter und shorter schedules. A= moe 72 YOu O00 © Conrad OOOO, Leresa TP. Doss, Detod Pores, “ow. ea

صفحه 27:
Cy can’t ۰ Mate ‏بوص حاصو(‎ technology * Cxstower Retaivaship Qocagesect ° Coterprise Resvurve Phocaiccy 6 ° Coterprise Opplcaiogn lotecpotica 3 ۰ ‏امیرمه() بلیوسی)‎ La et Peowe Veckwby COohioe مع ‎Pome, ‘Ia.‏ لدب ۲ مورا ,9005 لسن 6

صفحه 28:
Outs ‏هه(‎ .رف 00۵ ممصمل ع موت مدلا مره با ای و تم وگو ‎the eoiryprea‏ و مس رای عمج بو سود ] مرس مد 7 Dw recheck D deported views مسر مومس | مرو و ] ‏مك مسا‎ Deo tere COED] 7 Ace D chor reporter 1 read crafts reporieny Dee ost to chores car 0 Pewter chacs celery D racrecsed cata recherche وه مب و it sounds too good to be true, it is to good to be true. © Oaprntt ODO, Laresa T. Doss, Drtad Pome, ‘har. ee

صفحه 29:
Custower Rehtiowship Queagewedt CRO deters ... اجه مه سم سیر انیت هقی ومد ike pryonizatiocd bPetoe, oreatay cowpetiive odverctcce thous customer service excellence. The Rou: | syste O departed views 7 ‏سوه وق‎ chats او بو موم املسم 7 سس ‎Pore te too‏ [1 ] ‏رمرم‎ ones Phe Prower! 0 cet ‏مرس‎ ‎7 ‏مهل‎ que رسد 7 اه أدب -وصخاصد 1 1 ‏هه بو سم‎ D keeway pour ooxopeti اور لنوت موس 17 it sounds too good to be true, it is to good to be true. ‎Pome, ‘Ia. 9‏ لدب ۲ مورا ,9005 لسن 6 ‎ ‎ ‎

صفحه 30:
CGuterprise Qespurve locate ERE delivers... رت سای ماوت هط ار مهو و و موه موه و وا ام 1 ‏وس‎ The Red: 0 chet tater Days oowersion wl ores 7 ‏مرو‎ ۳ 1 Deno dey chat 0 chat quay D oper ction Pores ] ‏(جاطصجى) بلسي سدم [ صوص رصم‎ reports ‏اب ول آمسیی تا یمه ] و رو‎ 1 60 ‏اجه سا له‎ it sounds too good to be true, it is to good to be true. ‎Pome, ‘Ia. 90‏ لدب ۲ مورا ,9005 لسن 6 ‎ ‎ ‎

صفحه 31:
CGuterprise @pphouion ٩۱ EO delivers ... ‎rt‏ ی و با مرو ‎ob deporte‏ مهو اه توص روط ار ‏تحسم ای و ات لو ‎Te Pedy:‏ 2 سمل روف 7 | ‎beverage extoiery chats Dow re kteqroios‏ لول ‎2D sll cara‏ موسیگ لوا ‎bch‏ ‏مه ‎oP‏ جلجماطا لاد 1 مس رومججومو روم 7 ‎wrest he marred cht‏ و 7 ‎Paster chats delivery‏ ‎OO Rarer dhe arrears core‏ ‎ ‎ ‎ ‎it sounds too good to be true, it is to good to be true. ‎Pome, ‘Ia. 90‏ لدب ۲ مورا ,9005 لسن 6 ‎ ‎ ‎

صفحه 32:
Phe Prowise: Realy of KD: 7 ‏سس موف‎ rab Deo ‏اه کل‎ 7 ‏هم ] موه مك‎ coe RTI 11 Kistoraced cat Dee cosy 17 Pewter char delve ] ‏جم مسلاصات بوجاس ادس‎ ‏سفه ولمم 1 و وا موی‎ ] ‏مس‎ & oy meter coat ‏رو توا‎ 1[ ‏هه هه ۶و وحاص لجر‎ Odin n chert D hess re-sobieny scxre problears ‏ی مر‎ ‘it sounds too good to be true, it is to good to be true. ‎Pome, ‘Ia. 9‏ لدب ۲ مورا ,9005 لسن 6 ‎ ‎ ‎

صفحه 33:
Oks the ‏یط‎ You cannot keep doing what you have always done and expect the results to be different. Not even with new technology. “Dot wordt’ be ‏"لسيجا‎ ‎Operk, Grr Trek ‎Pome, ‘Ia. 59‏ لدب ۲ مورا ,9005 لسن 6 ‎

صفحه 34:
Oke do we howe to choo? . Ossess the curred state oP data qualiy of pour posepooy 8. Ocderstacd ced Pre the root causes Por data orotic ©. PerPorer dota oudiis requkedy (covethy, quarter) €. Gtop worker to isvlated “suie kices” > Gtop revreutey data S. Cearaly wacage pour dota the o bustuess usset (Coterprise IePorevation Dacageweut [(B10]) > @ssewble dota us ceeded Proc the data iweciory (euterprise dota wodel cod wweta data) > Grordandize ced recvorie date trocesPorewativas Por BVOO upphicaives (coordrated (PL steric area) ©. Gre dows project scopes ty tacorpornte data quel ood EID hve ©. Cwobed dota qualiy ced C10 untivites to of projets © Conrad ODO, Leesa P. Doss, Detod Pores, “ow. oF

صفحه 35:
...i¢ u cross-organizational discipline und ua enterprise architecture Por co tote<rated ovllertioa oF vperdivcd as well us devision support whick provide the bustuess oreremuciy easy access to their busicess data, word ull the to woke accurate business decisions. ... & wt bosiess we wud ‎Pome, ‘Ia. 56‏ لدب ۲ مورا ,9005 لسن 6 ‎ ‎

صفحه 36:
20% Data Delivery ~ Management ۱ Provide 9 Get control intuitive access over the to business | information _ Data Reengineering (Enterprise Information Management) © Oaprntt ODO, Laresa T. Doss, Drtad Pome, ‘har. 56

صفحه 37:
© Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her. ‏مو‎

صفحه 38:
Tedusindbage weet wodet 2 وج مهس سطاهسلتا ۱ ۳۵ مسب »| عد ‎Seats‏ | 25/ ,9 2 5 مس | [ ‎ol Lee = Fe‏ 0 > Gaw ad wrk ‏حر‎ © Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her. 56

صفحه 39:
‎Pome, ‘Ia. 59‏ لدب ۲ مورا ,9005 لسن 6 ‎

صفحه 40:
4ePorwoivd Bye: * Qeussewhle the eure ‏ووو مجامج‎ ° Qewe usets row ieciory vk proposilios © Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her. 0

صفحه 41:
etree “RePavtoricg” Ie - Kent Beck rogvil= @pplicaiva © Oaprntt ODO, Laresa T. Doss, Drtad Pome, ‘har. 60

صفحه 42:
GoPiwore reuse vowept (C) ٠ Requreweus ooo be tested, ood koplewedted ‏و‎ sev ‏جل موسوم‎ * Goope ts very soot ‏جاطامج موی اجه‎ ° Devkowlogpy icProstructure ooo be tested ocd proves * Onte uokaves (per release) ore reliively scott * @rviett schedules ure pusier tv estwote becouse the ‏ای رو وا وود‎ ۰ ‏مجممممرج()‎ uniiviies coo be iteroively rePiced, koued, ued udapted 000: The quoliy oP the release delvercbles (ord uticvatehy the qualiy oP the opphicaives) wil be higher! © Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her.

صفحه 43:
(0 enn ol Toei deg BI/DW Development Steps 1. Business Case Cross- Assessment ... = organizational 2.A Enterprise Technical Cross- Infrastructure .. organizational Data 28 Enterprise N Non-Technical Cross- Gane Infrastructure .. organizational ‘palais 3. Project Project-specific Planning .... .. Project-specific 4. Project Requirements Cross- Definition 5 organizational 5. Data Project-specific Analysis ...Cross- 6. Application organizational rototyping .. 9 Cross- 7. Meta Data Repository organizational analysis. Cross- ‏و1‎ Database organizational Design

صفحه 44:
(©) © Coexvitoent ty dats qucliy ecobedded ta the ‏ا‎ ‎۰ ‏ا‎ < principle ۰ ‏صجب وم مس مرس‎ > 6 © Oreck ho take @ corxooa kPoreioa achierkns FESOUTCES (voter prise dota come!) < policy ‏ام[‎ dowrstireu iPorordion cores to the requireweuts dePioiiza step Develo data ‏دحل جما جا ميت‎ ovate Step ‏وه دمص امه[‎ ‏لا رتم و لین‎ data wodels cad ‏ملك ی‎ ۰ ‏سا نون‎ devebpemal®TL processes 1 ‏اس مماسطط همست‎ < enforcement ‏بای ال مه رای لو تمه‎ oor ‏مجحب لعج یسوط مب رما‎ the ‏تا مر‎ os weta dott © Cop rnft ODO, Leesa T. Does, Detod Pome, “ac. CE

صفحه 45:
سوه ها ‎Base‏ ; اسر ۵9/۵۵ ‏سس‎ Ca اي © Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her. eS تسه

صفحه 46:
10 respowwtbiliies Orovess wodels ‏ناك‎ ‎rch ‎@ ee Discover, —_,| Coordinate, | © Outabases Integrate, 1 © Dera daa ‏ام‎ Document, | ۶ ae = Control Ovsivess weta dota 4 ‏مهن حك‎ ٠ Policy esiory ۵ AP corel ‏مره‎ ‎Procedures ‏ی یت‎ هم ‎Doss, Drtad Pome, “ler.‏ ۰ مورا ,0009 برممً

صفحه 47:
Quatre stewardship © Bvordoos oP the dota while ‏دز اذ‎ beta ‏لو‎ vr wwotctatced by thea * Create sterdacds ced procedures ty eusure thot policies cord busteess nudes ore koowe ood Poised * CoPorve udkereue ty poles ced bustcess rules tho! qoverct the data while the data is to their mustody © Periodical) wouter (cudt) tke qualiy oP the dota ‏جز‎ their mustody: * Ov ‏منم‎ os ‏هه‎ * Cua be u busivess persva pr aa VP persva “One whe wanages carters property.” ممم ‎Pome, ‘Ia.‏ لدب ۲ مورا ,9005 لسن 6

صفحه 48:
وه 0۵۰) * @uhoniy te establish policies ood set bustuess rules Por the dota verde thee patrol ° Oevide what the oP Pind euterprise dePicitica ood doouict ‏يز‎ Por the dota ‏ددمت عفجكلا علد‎ ۰) ‏له لجه‎ her cod users vo proper usage of their dota, © Crequedy, but at uboaps, the data prigtcrator * Cac be o person pr a power “Oue who bree the kecpal right tothe px © Orprntt ODO, Lassa T. Ores, Deed Pore, “kar. © oo of ۰

صفحه 49:
وان لمی مت و ۱ ‎Deranged‏ ] مس رومام ‎YY‏ ‏سس ‏سل ‎OOOO, Poets‏ و ‎Pome, ‘Ia.‏ لدب ۲ مورا ,9005 لسن 6

صفحه 50:
ای ۳ لعج سعط طم معط Payment Method Product, = r ‏موی‎ TS Payment} lee ‏كه‎ ‎۲۳-6 ‎۳0 ‎7 ‎= ‎at ‎= ‎۳17 ‎Salespersor ¥ 2 ۳ ‏تس‎ © Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her. Existing Customer Potential Customer

صفحه 51:
Find the! Ovrtradctey yokes Otehttiva oP busters rules Qeused privary keys QOow-ucique primary keys ‏تا ول بو(‎ وا ال و ی ] + Overy ves حصا ومد مولیت ۰ لو ‎Opry‏ ۰ ‏مومسم‎ Piekks 0 ۰ PreePorw uddess heer © Cop rnft ODO, Leesa T. Does, Detod Pome, “ac. 90

صفحه 52:
۱ ۰ ‏)دز ماه اس رتاو نب‎ tov bron) ۰ ‏رون‎ wt be work the toe ord woe to eos every dota elect © Dot ol dota is equally ‏ادع مياه‎ * Oot dota cod be ceodsed * Yew de pou koow what to dec? ... ۳ ts the quesiod © Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her. Se

صفحه 53:
‎questions (()‏ میم ‎ ‎* Can the dota be deoused? nag Does the correct data exist anywhere? Is it easily accessible? ‎© Ghowd the dota be cleused? How extensive is the problem? How elaborate will the cleansing process be? Is it cost-effective? ‎9, boresa T. Doss, Detod Poms, “her. 99 ‎ ‎

صفحه 54:
(©) حمصاوصب بجوت !]> rag © OW oF we buldoy the upplicaica? What business questions cannot be answered today? © OW we we wt oble ‏نوی و‎ the business questions? Is it because of this dirty data? Is it because of these missing relationships? © Oil the ‏هی تلم و ناس‎ the vost oP the (۶ 9, boresa T. Doss, Detod Poms, “her. 9

صفحه 55:
Cotexpries vP data ‏مه‎ ‎Owes devel © Orica ‏مل‎ = Dot ol deta ‏هه راو‎ to ol pod users = Ol ontcdl data cast be ‏لمان‎ ‎- ‏مه نارامج(‎ Brekke ۱ ‏مرو‎ to the ‏نا رتم‎ wot ubsvhiely oical = Panther ‏ييا بعل يصوي شعو‎ ‏وان‎ us ‏بو‎ os tive claws — Vkose thot cocont be dlecased should be bucoped to oriicdl Por the: ued release حافك امم موص و عمهذا صا عله صا نت رل اوه حك صدة خا جره نا يسصميا0 - 112 جم" لحماه 0 ,0 ۲ مورا ,9003 ناسون 6

صفحه 56:
)0 — ‏واو‎ — preveciod © Ohkere should the dtp dota be cleuased? In the staging area of the BI application? In the source (legacy) files? © Okeo should it be ‏#لدصصمصاء‎ ‎Retroactively? ‎At data entry time? * Wow should it be cleused? Use data cleansing or ETL tools? Write procedural (COBOL/C++) code? * Oke ull we ‏حك‎ & preved dey date to he Ptr? Cowce Ona Rewragweriag ... Ted [Dad] Quay Oorrgeoed (TAO) © Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her. 90

صفحه 57:
Coordirated BTL stacy ‎Guerre Ons 295 0-5‏ یمرن ‎Onaga‏ مسا ‏سوت ‎Or Ow Gore! rev‏ ‎Cw)‏ تا موجن ان ان ‎rite‏ هت تن فا پر نو وه قار ‏س0 ‏ا ‎ ‎Oo) ‎ ‎ ‎ ‏سحي سس +09 اسر ‎ ‏سس ‏مب( ۵0۵ مسق ‎al‏ ‏سب ‏یه 60-۲ ‏مو ‏1 جد لحماه !ا ,حدم( ‎1١.‏ ددرا _ 0005© يوون © ‎

صفحه 58:

صفحه 59:
© Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her. 99

صفحه 60:
CT te-vus! record vows , OUTPUT |# Output Records PROCESS SS ‏ها‎ = ۳ + INPUT MODULE = RECORDS Ss # Input Records = © Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her. ‏مه‎

صفحه 61:
ات مرول وس را۳) lecords Per First Output Domain + [fecords Per Second Output Domain + kecords Per Third Output Domain + ‘ejected Data Values © Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her. ea

صفحه 62:
ات اون طسو را۳) ‎Per First Output Amount‏ $ سح ‎OUTPUT.‏ ‎MOUNTS,‏ ‎- ‎aml ‎AMOUNT! ‎= ‎tal $ Per Second Output Amount ‎EJECTEL} tal $ Rejected Amounts ‎MOUNTS ‎ ‎ ‎ ‎|_, PROCES INPUT ||| MOUNTS} ‎_ ‎Total $ Input Amounts ‎Total $ Per First Input Amount ‎+ ‎Total $ Per Second Input Amount + Total $ Per Rejected Amounts ‎© Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her. ‎

صفحه 63:
مزا ‎data, eppevicly‏ ۱ حول مه ۰ ‏مور‎ oF dota pwoers, 0۲۹ 09 سوه ووم‌صو لجن 9 هجو لعج ۲ مورا 305 ملسون 9

صفحه 64:
Outs qualiy wouturtiy Cron Dt wb level oF 4 a. OQ wotuniiy is Deter proPibe 7 on a ‏تسيا‎ 5 cherry BND 9 ‎rs kn‏ ها ‎a aed progres‏ : یی ‏يك ‎short OQ wrtuds &‏ هم اب ل | سدس | © ‎ ‎ ‎ ‎ ‎ ‎Crowe 7 ‏هی‎ ‏تست | 3 امد‎ 0 fl a “| Opkeotration ‎ ‎ ‎6 ‏لدب ۲ مورا ,9005 لسن‎ Pome, ‘Ia. Or ‎ ‎ ‎ ‎ ‎ ‎ ‎ ‎ ‎ ‎ ‎ ‎ ‎ ‎ ‎ ‎ ‎ ‎ ‎

صفحه 65:
OQ vepebliy weturiy woe ((1) (Corer: Lory Corboh) COO Level 0. Ouvertaiaty - Ooovesrious ond ‏بستكي‎ > — Data quay problews are devied. > Op Porerd dota qualiy processes ‏سحام‎ > Data quali) faltaives ane ad hor oad chai. واه الط ‎oo‏ ول و ‎success‏ وق« 0000 ‏مرب .9 امصا‎ - Nhe big Bho! ood lip service > Dats que) problews ane achar hedged. > Dao problews arr otached oF they coe up. > Orman Puan ‏مدل ام و و‎ quai iii, ‎raker thaw the orcrizatioc.‏ مره موه وه رون« ‎© Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her. 96 ‎

صفحه 66:
(Corer: Lory Corboh) COO Level 9. ‏ایو ولمم‎ > ‏مرف رو وت‎ toes DPR. < Caerprerwide dota ‏لم وج موه رل‎ > Dota qual problews ane corrented ut he source (where possible). > Data quahy KeproUEeU process ir Ketiirratzed. COO Lae &. Disdow - Oching ‏ماد ه‎ ‏و و موم(‎ resprasbiy Por ‏مدل‎ quit. > Data gay (uP reports to a chieP oPPicer (O10, CKO, COC). » Dot gun) porreniva chomes to data dePert preven. « Ol eters array one ‏لام‎ © Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her. ‏وه‎

صفحه 67:
(Corer: Lory Corboh) COO Level S. Certainty - Dirvoca « ota dePert prevertion i the wats Pore. > Data quay i oa iaiegral port oP the bustaess processes. > Ol ‏بو رام مج و مه توص‎ processes. > De ature of ‏اوه رهم‎ © Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her. ‏مو‎

صفحه 68:
ee Orqeatzaiodd tezpact * Cross-or~pizeiocd tusks cod respousibities we wt well dePiced ٠ Date qualiy respousibiliy is oot ‏له له‎ ۰ Ocha oP date is ot vedersiood or appreciated © Crvjects ore oPted vost justified usiay the ‏جه ماص لوا‎ sweat wodet * Resour reqineweus ore ut well dePiced ° Ikvpert oo uppicaion developwedt ewpire * Op reward Por data shoriery * Resistowe to chore و اس © Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her.

صفحه 69:
© ‏نی‎ oad VP ‏)مشاه‎ ( ۰) ced busttess vlaborativa (“portcership”) ٠ ID aed VP ootaboraivs (“partcership”) © Ceara” ead User eure ‎Pome, ‘Ia. 9‏ لدب ۲ مورا ,9005 لسن 6 ‎

صفحه 70:
مان PO cpllaboratiop collaboration] ‏“*[م»ه ] 000 * * معن‎ 00 ‎Ober‏ |‘ 1 سور یات ‎Ccterpree‏ ‏۳ 000 ۵0 ۵۵ دص ی ‎ ‎ ‎ ‏هجو لعج ۲ مورا 305 ملسون 9 ‎ ‎ ‎ ‎ ‎ ‎ ‎ ‎ ‎ ‎

صفحه 71:
dS steps to [oa] 1۱۲۶ (a) Onderstoad the root causes Por pour correo dott chavs. 8. @vvept respowsbilyy © “Ves, itis pur Pout” Por betcry tc this wees. * Ooveptiog respousibiiy is a prerequisite Por choo. © Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her. 20

صفحه 72:
49 steps to [0G] recovery (©) e /\ من ©. ‏ما ن طب‎ ٠ )( ‏نج افجلا بيج‎ keow better”, the devision is pours: Gtay stuck or chop. ٠ Phere cod be ww wore Pose hopes Por oy siiver bullet ‏امه یاوه‎ 60. ‏سس رید‎ causes © Oko we the specific root causes Por ‏ال اوه‎ fo pour or<proizaina? * Cowes root couses Ure DOWN, Sow Ure wt. © Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her. ‏هم‎

صفحه 73:
dS steps tw [0G] revovery (9) e 6. ‏ما0‎ ce * Adoesd't water “whose Pout” itis trot the ۲۳۱ ‏هه‎ Exist. ۰ IP west colborde wit the busicess vo wweuniy to oP et chor. واه موه نمی موه وونی) ۰ امین ‎wih busicess‏ /\ 9. edly change ‏یه‎ ‎٠ Oho will be the cowiers? ۰ Chooges wost be systesnio ocd holistic, ut isvhited aod sporedic. © Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her. ‏هم‎

صفحه 74:
49 steps to [0G] recovery )( من 2 Gpred te word © Do ebro chanes, here wust be “sowetkiog in ® Por everybody. * Otherwise, chooges tiqyer condety cod unde results in resistoae or ‏و‎ موه ۳ .6 وه و لیوا ‎yet‏ وه و موه رز ۰ 9 ۰ ‏لور‎ people in chore phasic. * Cross-oryenizeivcd ‏روط‎ one phased ic. © Conrad OOOO, Leresa TP. Doss, Detod Pores, “ow. PP

صفحه 75:
49 steps to [0G] recovery (S) e من /\ 9. Cronies changes ٠ Gowe chooges ore eusier to iopleswedt thot vikers. ۰ ‏عم من موق‎ o higher puyback. (0. ‘koplewedt changes * Gvervocr oPPevted by the changes wost hove oo Dpportaily to review ond upprove the pho bePore iwplexpectaiivd. © Conrntt OOOO, Laresa T. Oss, Detod Pome, ‘her. ‏هم‎

صفحه 76:
49 steps to [0G] recovery (©) e /\ من وج ۳۳و فحت( .00 وه لا مد له اوق ‎٠‏ ‎Ore the chooges ofPeciiag cour whersely?‏ © ا له و اجو ‎Ootkiey is perPect the‏ * ۰ Oke wight work io vor orqecizeive wap uot work ia ‏اس‎ هم ,9009 لسن 6

صفحه 77:
O®ibiograpky *Adelman, Sid, and Larissa Terpeluk Moss. Data Warehouse Project Management. Boston, MA: Addison-Wesley, 2000. *Aiken, Peter H. Data Reverse Engineering: Slaying the Legacy Dragon. New York: McGraw-Hill, 1995. *Brackett, Michael H. Data Resource Quality: Turning Bad Habits into Good Practices. Boston, MA: Addison-Wesley, 2000. *Brackett, Michael H. The Data Warehouse Challenge: Taming Data Chaos. New York: John Wiley & Sons, 1996. *English, Larry P. Improving Data Warehouse and Business Information Quality: Methods for Reducing Costs and Increasing Profits. New York: John Wiley & Sons, 1999. *Hoberman, Steve. Data Modeler’s Workbench: Tools and Techniques for Analysis and Design. New York: John Wiley & Sons, 2001. +Kuan-Tsae, Huang, Yang W. Lee, and Richard Y. Wang. Quality Information and Knowledge Management. Upper Saddle River, NJ: Prentice Hall, 1998. *Marco, David. Building and Managing the Meta Data Repository: A Full Lifecycle Guide. New York: John Wiley & Sons, 2000. *Moss, Larissa T., and Shaku Atre. Business Intelligence Roadmap: The Complete Lifecycle for Decision-Support Applications. Boston, MA: Addison-Wesley, 2003. *Reingruber, Michael C., and William W. Gregory. The Data Modeling Handbook: A Best-Practice Approach to Building Quality Data Models. New York: John Wiley & Sons, 1994. *Ross, Ronald G. The Business Rule Concepts. Houston, TX: Business Rule Solutions, Inc., 1998. *Simsion, Graemex,aje Mtanleling.EssentialsAnalysis,Dasign, and Innovation. ۵

Improving Data Quality: Why is it so difficult? presented by Larissa T. Moss President, Method Focus, Inc. DAMA Oakland, CA May 7, 2003  Copyright 2003, Larissa T. Moss, Method Focus, Inc. Larissa T. Moss Method Focus Inc.  www.methodfocus.com  methodfocus@earthlink.net  (626) 355-8167 Ms. Moss is founder and president of Method Focus Inc., a company specializing in improving the quality of business information systems. She frequently speaks at Data Warehouse, Business Intelligence, CRM, and Information Quality conferences around the world on the topics of information asset management, data quality, data modeling, project management, and organizational realignment. She lectures worldwide on the BI topics of spiral development methodology, data modeling, data audit and control, project management, as well as organizational issues. Her articles are frequently published in DM Review, TDWI Journal of Data Warehousing, Cutter IT Journal, Analytic Edge, and The Navigator. She coauthored the books: Data Warehouse Project Management, Addison Wesley 2000, Impossible Data Warehouse Situations, Addison Wesley 2002, and Business Intelligence Roadmap: The Complete Project Lifecycle for Decision Support Applications, Addison Wesley 2003. Ms. Moss is a member of the IBM Gold Group, a Friend of Teradata, a senior consultant at the Cutter Consortium, and a contributing member of Ask The Experts on www.dmreview.com. She has been a lecturer at DCI, TDWI, MISTI, and at the Extension of the California Polytechnic University, Pomona . She can be reached at lmoss@ methodfocus.com. © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 2 Presentation Outline • What do we mean by data quality? Dirty data categories • How are we addressing it today? Ineffective technology solutions • What do we have to change? Approaches and techniques • How do we change? 12 steps to [DQ] recovery © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 3 What do we mean by data quality? • • • • • • • • Data is correct #1 Data is accurate Data is consistent Data is complete Data is integrated Data values follow the business rules Data corresponds to established domains Data is well defined and understood © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 4 Symptoms of poor-quality data • • • • • • Do your programs abend with data exceptions? Are your users confused about meaning of data? Is some of your data is too stale for reporting? Is your data being shared? Is it sharable? Are reports inconsistent? Does it take your IT staff or the end users hours to reconcile inconsistent reports? • Does merging data often cause the system to fail? • Do beepers go off at night? © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 5 Dirty data categories • • • • • • • • • • • • Dummy (default) values “Intelligent” dummy values Missing values Multi-purpose fields Cryptic values Free-form address lines Contradicting values Violation of business rules Reused primary key Non-unique primary key Missing data relationships Inappropriate data relationships © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 6 Dummy (default) values • Defaults for mandatory fields SSN 999-99-9999 Age 999 Zip 99999 Income 9,999,999.99 Inability to determine customer profiles Inability to determine customer demographics © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 7 “Intelligent” dummy values • Defaults with meaning SSN 888-88-8888 Income 999,999.99 Age 000 Source Code ‘FF’ Non-resident alien Employee Corporate customer Account closed prior to 1991 Inability to write straight forward queries without knowing how to filter data © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 8 Missing Values • Operational systems do not always require informational or demographic data Gender Ethnicity Age Income Referring Source Inability to analyze marketing channels © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 9 Multi-purpose fields • ONE field explicitly has MANY meanings » Which business unit enters the data » At what time in history it was entered » A value in one or more other fields Appraisal Amount redefined as 25 redefines = 25 attributes ! 25 redefines = 25 attributes ! Advertised Amount redefined as Sold Date Loan Type Code Not mutually exclusive ! Only the value of one is known for each record ! redefined as ... Inability to judge product profitability © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 10 Cryptic values (1) • Often found in “Kitchen Sink” fields » Usually one byte (if not one bit) » Highly cryptic (A, B, C, 1, 2, 3, ...) » Non-intelligent, non-intuitive codes » Often not mutually exclusive Inability to empower end users to write their own queries © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 11 Cryptic values (2) • ONE field implicitly has MANY meanings Master_Cd {A, B, C, D, E, F, G, H, I} {A, B, C} {D, E, F} {G, H, I} Type of customer Type of supplier Regional constraints © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 12 Free-form address lines • Unstructured text » no discernable pattern » cannot be parsed address-line-1: A address-line-2: address-line-3: FRANC address-line-4: ROSENTHAL, LEVITZ, TTORNEYS 10 MARKET, SAN ISCO, CA 95111 Inability to perform market analysis © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 13 Contradicting values • Values in one field are inconsistent with values in another related field 1488 Flatbush Avenue New York, NY 75261 Texas Zip Type of real property: Single Family Residence Number of rental units: four Income property Inability to make reliable business decisions © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 14 Violation of business rules • Business Rule: Adjustable Rate Mortgages must have » » Maximum Interest Rate ( Ceiling) Minimum Interest Rate ( Floor) • Business Rule: A Ceiling is always higher than a Floor ceiling-interest-rate: floor-interest-rate: 8.25 switched ? 14.75 Inability to calculate product profitability © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 15 Reused primary keys • Little history, if any, stored in operational files » primary keys are customarily re-used » may have a different rollup structure January ‘94: branch 501 = San Francisco Main region 1 area SW August ‘97: branch 501 = San Luis Obispo region 2 area SW Inability to evaluate organizational performance © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 16 Non-unique primary keys • Duplicate identification numbers » Multiple customer numbers Customer Name Philip K. Sherman Philip K. Sherman Philip K. Sherman Phone Number Cust. Num 818.357.5166 960601 818.357.7711 960105 818.357.8911 960003 » Multiple employee numbers Employee Name Department Empl. Number July 1995: Bob Smith 213 (HR) 21304762 January 1996: Bob Smith 432 (SRV) 43218221 August 1999: Bob Smith 206 (MKT) 20684762 Inability to determine customer relationships Inability to analyze employee benefits trends © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 17 Missing data relationships • Data that should be related to other data in a dependent (parent-child) relationship Branch Employee Benefit » Branch number 0765 does not exist in the BRANCH table Inability to produce accurate rollups © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 18 Inappropriate data relationships • Data that is inadvertently related, but should not be » two entity types with the same key values Purchaser: Jackie Schmidt 837221 Seller: Robert Black 837221 Inability to determine customer or vendor relationships © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 19 Impact of erroneous data • • • • • • Extra time it takes to correct data problems Extra resources needed to correct data problems Time and effort required to re-run jobs that abend Time wasted arguing over inconsistent reports Lost business opportunities due to unavailable data Unable to demonstrate business potential in a buyout • Fines may be paid for noncompliance with government regulations • Shipping products to the wrong customers • Bad public relations with customers – leads to alienated and lost customer © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 20 Cost of erroneous data © Larry English, Improving DW and BI Quality Direct Costs of Non-Quality Information Marketing Campaign Time: ($60/hour loaded rate) Creating redundant occurrence Researching correct address Correcting address errors Handling complaints from customers Mail preparation Materials, Facilities, Equipment: Marketing brochure Postage Warehouse storage Shipping equipment and maintenance Computing resources: CPU transactions Data storage Data backup Per Instance Number of Instances 2.4 min 167,141 10 min 5,000/mo 0.3 min 6,000/mo 5.5 min 974/yr 0.1 min 393,273 $1.96 $0.52 $0.01 $5,000/yr 393,273 393,273 393,273 36% $0.02/trans 393,273 $0.001/mo 393,273 $0.005/mo 393,273 Total Number Per Year 1 12 12 1 4 Total Cost Per Year $ 401,138 $ 600,000 $ 21,600 $ 5,357 $ 157,309 4 4 4 1 $ 4 12 12 Total Annual Costs © Copyright 2003, Larissa T. Moss, Method Focus, Inc. $3,083,260 $ 818,008 $ 15,731 1,800 $ 31,462 $ 4,719 $ 23,596 $5,163,980 21 Impact of redundant data • Hardware (CPU, disks) and software (program maintenance) costs incurred as a result of uncontrolled redundant data • Extra time it takes to reconcile inconsistencies • Extra resources needed to reconcile inconsistencies • Unwise business decisions made due to redundant and inconsistent data • Lost opportunities due to unreliable data • Overcharging or overpayment for products • Duplicate shipping of products • Money wasted on sending redundant marketing material © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 22 Cost of redundant data © Larry English, Improving DW and BI Quality Information Development Cost Analysis Portfolio Total Number Category Infrastructure Basis: Enterprise architected DBs Enterprise reusable create/update programs + Total Infrastructure expenses Value Basis: Total retrieve equivalent pgms + Total value-adding expenses Cost-adding Basis: Redundant create/update pgms Interface/extract programs Redundant database files Total cost-adding expenses Lifetime Total ** 200 300 300 500 400 600 1,500 Relative Weight Factor* 0.75 1.50 1.00 1.50 1.00 0.75 Average Unit Dev/Maint Costs $ 15,000 $ 30,000 $ 20,000 $ 30,000 $ 20,000 $ 15,000 Total Dev/Maint Expenses** Total Infrastructure Value-adding Cost-adding Expenses % of Budget Expenses $ 3,000,000 $ 9,000,000 $12,000,000 24% $ 6,000,000 $ 6,000,000 12% $15,000,000 $ 8,000,000 $ 9,000,000 $32,000,000 64% $50,000,000 100% 3,800 * Determine relative effort to develop average unit of each category using effort to develop a retrieve program as “1.00” + For programs that retrieve some data and create/update other data, determine the percent of retrieve only attributes and percent of create/update attributes (e.g., to retrieve customer data to create an order) **Based on 3.800 application programs and database files in portfolio and $50 Million in development © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 23 Dirty data – How did it happen? Chief Business Units Business Manager Technology Manager ... Technology Manager Client Sales Business Manager ... Client Inventory ... Client Distribution Chief Information Officer Client Customer Support Chief Operating Officer Technology Client Product Pricing Business Client Financial (AP & AR) Officer Client Marketing Executive ? ... IT paired with IT IT IT IT IT IT Information Technology Units • data redundancy • process redundancy • dirty data © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 24 Major cause for data deficiencies highest to lowest priority Project Constraints Priority TIME SCOPE BUDGE T PEOPLE QUALIT Y 1 5 2 3 4      Wrong priority on project constraints! Cost-based value proposition Industrial Age: • Cheaper, faster, better • Automate as quickly as possible © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 25 Time is getting shorter – scope is getting bigger • Everyone on the business side and in IT wants quality, but rarely is the extra time given or taken to achieve it. Quality and time are polarized constraints. • The higher the quality the more effort (time) it takes to deliver. • Companies are driven by shorter and shorter schedules. SCOPE YAH TIME DDD © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 26 How are we addressing it today? • • • • • Data Warehousing Customer Relationship Management Enterprise Resource Planning Enterprise Application Integration Knowledge Management Why can’t technology fix this? Ineffective Technology Solutions © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 27 Data Warehousing DW delivers... a collection of integrated data used to support the strategic decision making process for the enterprise. The Promise:  data integration  no redundancy  consistency  historical data  ad-hoc reporting  trend analysis reporting  faster data delivery  faster data access The Reality:  stove pipe marts  departmental views  swim lane development approach  too time consuming to integrate  too costly to cleanse data  increased data redundancy it sounds too good to be true, it is to good to be true. © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 28 Customer Relationship Management CRM delivers … seamless coordination between back-office systems, front-office systems and the Web. the organizational lifeline, creating competitive advantage through customer service excellence. The Promise:  data integration  data quality  customer intimacy  customer wallet share  product pricing customization  knowing your competition  geographic market potential The Reality:  more stovepipe systems  departmental views  dirty customer data  purchased packages not integrated  focus is too narrow  privacy issues it sounds too good to be true, it is to good to be true. © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 29 Enterprise Resource Planning ERP delivers... a collection of functional modules used to integrate operational data to support seamless operational business processes for the enterprise. The Promise:  data integration  no redundancy  consistency  data quality  easy reporting  easy maintenance  Y2K compliance The Reality:  system conversion not crossorganizational analysis  same dirty data  operational focus  poor quality (unusable) reports  one-size-fits-all data warehouse  too costly it sounds too good to be true, it is to good to be true. © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 30 Enterprise Application Integration EAI delivers ... integration of disparate applications into a unified set of business processes through centrally managed rules and middleware technologies. The Promise:  fast & automated integration  leverage existing data  bridge islands of automation  easy cross-system reporting  faster data delivery  faster data access The Reality:  dirty data  no true integration  still data redundancy  still islands of automation  easier access to the current data mess it sounds too good to be true, it is to good to be true. © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 31 Knowledge Management KM delivers ... a process for capturing, editing, verifying (for accuracy), disseminating, and utilizing tacit and explicit information about the organization. The Promise:  utilize organizational info  data integration  historical data  faster data delivery  faster data access  first & only customer contact  reduction of customer calls  less re-solving same problems Reality of KM:  too difficult to build  too time consuming  too costly  technology challenges  non-sharing culture  isolated applications  difficult to disseminate information f it sounds too good to be true, it is to good to be true. © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 32 What’s the lesson? You cannot keep doing what you have always done and expect the results to be different. Not even with new technology. “That wouldn’t be logical” Spock, Star Trek © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 33 What do we have to change? 1. Assess the current state of data quality at your company 2. Understand and fix the root causes for data contamination 3. Perform data audits regularly (monthly, quarterly) 4. Stop working in isolated “swim lanes” > Stop recreating data 5. Centrally manage your data like a business asset (Enterprise Information Management [EIM]) > Assemble data as needed from the data inventory (enterprise data model and meta data) > Standardize and reconcile data transformations for BI/DW applications (coordinated ETL staging area) 6. Scale down project scopes to incorporate data quality and EIM activities 7. Embed data quality and EIM activities in all projects © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 34 Business intelligence … …is …isaacross-organizational cross-organizationaldiscipline discipline and andan anenterprise enterprisearchitecture architecture for foran anintegrated integratedcollection collectionof of operational operationalas aswell wellas asdecision decisionsupport support applications applicationsand anddatabases, databases, which whichprovide providethe thebusiness businesscommunity community easy easyaccess accesstototheir theirbusiness businessdata, data,and and allows allowsthem themtotomake makeaccurate accuratebusiness businessdecisions decisions. . … is not business as usual © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 35 BI goals and objectives 80% 20% Data Management Get control over the existing data chaos Data Delivery Provide intuitive access to business information Data Reengineering (Enterprise Information Management) © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 36 Proliferation of data quality problems “LegaMarts” (Doug Hackney) Legacy BI ? L L L L transformation ? cleansing? Data Warehouses Data Marts Users DM Marketing DM Finance DW Customer Support DM Product Sales DM © Copyright 2003, Larissa T. Moss, Method Focus, Inc. Engineering 37 Industrial-age mental model Business Units Project Constraints    Client Client Sales QUALIT Y 5 Client Inventory PEOPLE 4 Client Distribution BUDGET 3 Client Product Pricing SCOPE 2 Client Financial (AP & AR) TIME 1 Client Marketing Priority Customer Support highest to lowest priority   IT IT IT IT IT IT IT Information Technology Units Scrap and rework © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 38 The game has changed …but our mental model has not 1. Enormous degree of complexity (John Zachman) 2. Extremely high rate of change Cheaper, faster, better !!! But how? Don’t scrap and rework. Reuse what you already have. © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 39 Information-age mental model Project Constraints Priority QUALIT Y BUDGET PEOPLE TIME SCOPE Investment-based value proposition highest to lowest priority 1 2 3 4 5     Reassemble reusable components  Information Age: • Reassemble the entire enterprise • Reuse assets from inventory © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 40 Software release concept (1) “Extreme scoping” Projects First Release - Larissa Moss Second Release Final Release Application Reusable & Expanding Third Release Fifth Release Fourth Release “Refactoring” - Kent Beck Project /= Application © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 41 Software release concept (2) • Requirements can be tested, and implemented in increments • Scope is very small and manageable small • Technology infrastructure can be tested and proven • Data volumes (per release) are relatively small • Project schedules are easier to estimate because scope is very small • Development activities can be iteratively refined, and adapted the honed, AND: The quality of the release deliverables (and ultimately the quality of the applications) will be higher! © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 42 Cross-organizational development approach (© Larissa Moss and Shaku Atre, “Business Intelligence Roadmap”) (1) Data Quality Touch Points BI/DW Development Steps Cross1. Business Case organizational Assessment ........................... Cross2.A Enterprise Technical organizational Infrastructure ........... Cross2.B Enterprise Non-Technical organizational Infrastructure ... Project-specific 3. Project Planning ........................................... Project-specific Cross4. Project Requirements organizational Definition .................. Project-specific 5. Data Analysis ...............................................Crossorganizational 6. Application CrossPrototyping ............................... organizational 7. Meta Data Repository CrossAnalysis ................... organizational 8. Database CrossDesign .......................................... © Copyright 2003, Larissa T. Moss, Method Focus, Inc.organizational 43 9. ETL Design ....................................... Cross-organizational development approach (2) • Commitment to data quality embedded in the methodology • Cross-organizational program management • Enterprise information management group • Standards that include a common information architecture (enterprise data model)  Involving down-stream information consumers in the requirements definition step  Involving data owners in the data analysis step  Involving business representatives from all business units to ratify the data models and meta data • Coordinating the development/ETL processes  Disallowing stovepipe development  Extracting and cleansing source data only once  Reconciling data transformations and storing the reconciliation totals as meta data © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 44 Enterprise information management Business Units Client IT IT IT Client Sales Client Inventory IT Client Product Pricing Financial (AP & AR) Marketing IT Client Distribution Client Customer Support Client IT IT Information Technology Units Discover, Coordinate, Integrate, Document, Control ODS Operational Environment Operational Systems Enterprise Information Management OM EDW DM BI/DW Databases © Copyright 2003, Larissa T. Moss, Method Focus, Inc. Decision Support Environment 45 EIM responsibilities • Business architecture inventory Process models Data models • Application inventory • Meta data inventory Business meta data Technical meta data Discover, Coordinate, Integrate, Document, Control Stewards Programs Databases Architects Managers • Policy inventory Standards IT asset inventory Procedures management Guidelines … © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 46 Data stewardship • Guardians of the data while it is being created maintained by them or • Create standards and procedures to ensure that policies and business rules are known and followed • Enforce adherence to policies and business rules that govern the data while the data is in their custody • Periodically monitor (audit) the quality of the data in their custody • Also known as custodians • Can be a business person or an IT person “One who manages another’s property.” © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 47 Data ownership • Authority to establish policies and set business rules for the data under their control • Decide what the official enterprise definition and domain is for the data under their control • Monitor and advise other end users on proper usage of their data • Frequently, but not always, the data originator • Can be a person or a committee “One who has the legal right to the possession of a property.” © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 48 Enterprise architecture Mission Missionand andObjective Objective Business BusinessPrinciples Principles Business BusinessFunctions Functions Program Management Program Management Enterprise Data Model Enterprise Data Model - Data Standardization - Data Standardization - Data Integration - Data Integration - Data Reconciliation - Data Reconciliation - Data Quality - Data Quality Storage & Presentation Business Architecture Information Architecture Application Architecture Technology Architecture Operational OperationalApplications Applications Data Access Data AccessApplications Applications Data Analysis Applications Data Analysis Applications Application ApplicationDatabases Databases Technology TechnologyPlatform Platform Network Network Middleware Middleware DBMS, DBMS,Tools Tools Content 2. Data Delivery 1. Data Management • data integration• data access • data cleansing • data manipulation © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 49 Enterprise data model (data inventory) Custome r Account Payment Payment Method Account Custome r Product Order Product Part Product Existing Customer Potential Customer Salesper son TopDown Supported by common data definitions, domains, and business rules. Product Category Part Salaried Salesperson Org Unit Supplier Shipment Commissioned Salesperson Org Structure Warehouse © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 50 Source data analysis Domain Violations: • Dummy values • Intelligent dummy values • Missing values • Multi-purpose fields • Cryptic values • Free-form address lines Integrity Violations: • Contradicting values • Violation of business rules • Reused primary keys • Non-unique primary keys • Missing data relationships • Inappropriate data relationships BottomUp © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 51 To cleanse or not to cleanse … • You probably cannot cleanse it all (takes too long) • It may not be worth the time and money to cleanse every data element • Not all data is equally significant • Not all data can be cleansed • How do you know what to cleanse? …that is the question © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 52 Triaging questions (1) • Can the data be cleansed? Does the correct data exist anywhere? Is it easily accessible? • Should the data be cleansed? How extensive is the problem? How elaborate will the cleansing process be? Is it cost-effective? © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 53 Triaging questions (2) • Why are we building the application? What business questions cannot be answered today? • Why are we not able to answer the business questions? Is it because of this dirty data? Is it because of these missing relationships? • Will the benefits of cleansing outweigh the cost of the effort? © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 54 Categories of data significance • Critical data Business decision! – Not all data is equally critical to all end users – All critical data must be cleansed – Usually includes amount fields • Important data – Important to the organization, but not absolutely critical – Further prioritize important data elements – Cleanse as many as time allows – Those that cannot be cleansed should be bumped to critical for the next release • Insignificant data – Informational data, which is nice to have – Cleansing is optional if time allows © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 55 Cleansing – repairing – prevention • Where should the dirty data be cleansed? In the staging area of the BI application? In the source (legacy) files? • When should it be cleansed? Retroactively? At data entry time? • How should it be cleansed? Use data cleansing or ETL tools? Write procedural (COBOL/C++) code? • What will we do to prevent dirty data in the future? Source Data Reengineering … Total [Data] Quality Management (TQM) © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 56 Coordinated ETL staging Legacy Operat’l reports Staging Area Cleansing Transform’s Operational Data Store/ Oper Marts Tactical rpts Staging Area Cleansing Transform’s Enterprise Data Warehouse Data Marts Strategic rpts Strategic rpts OM L Customer Support DM L Daily StA ODS L CRM Operational Clients Clients Product Pricing Finance Mo StA EDW CRM DM Marketing Analytical Transformation Cleansing DM EXW EXW Engineering Legal Enterprise EnterpriseArchitecture Architecture&&Meta MetaData DataRepository Repository © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 57 ETL process flow Sales Sales File File Account Account Tran Tran File File Extract New Sales Extract New Sales Extract Accounts Extract Accounts New New Sales Sales Associate Associate Accounts Accounts Filter Accounts Filter Accounts Accounts Accounts New New Accounts Accounts Sort Accts Sort Accts Account Account Errors Errors Merge Customers Merge Customers Customer Customer Info File Info File Prospects Prospects All All Customers Customers Sort Customers Sort Customers Sorted Sorted Customers Customers Prospects Prospects Profile Profile Customers Customers – coordinated – Extract Extract 2 Customers Customers Merge Prospects Merge Prospects Extract Prospects Extract Prospects Sorted Sorted Accounts Accounts Match Match Accounts Accounts Customer Customer Master Master 1 Cleanse Cleanse Transform Transform Prepare Prepare © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 3 Load Load 58 ETL Reconciliation L L L Monthly Staging Area DM Load Files DM L DM ODS (daily) EDW (monthly) © Copyright 2003, Larissa T. Moss, Method Focus, Inc. (monthly) DM 59 ETL tie-outs: record counts INPUT RECORDS # Input Records PROCESS PROCESS MODULE MODULE = OUTPUT # Output Records RECORDS + REJECTED RECORDS # Rejected Records © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 60 ETL tie-outs: domain counts # Records Per First Output Domain OUTPUT CODES + INPUT CODES OUTPUT# Records Per Second Output Domain CODES + PROCESS PROCESS MODULE MODULE OUTPUT # Records Per Third Output Domain CODES # Records Per Input Domain = + REJECTED # Rejected Data Values CODES © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 61 ETL tie-outs: amount counts Total $ Per First Output Amount OUTPUT AMOUNTS + INPUT AMOUNTS OUTPUT Total $ Per Second Output Amount AMOUNTS PROCESS PROCESS MODULE MODULE Total $ Input Amounts + = REJECTEDTotal $ Rejected Amounts AMOUNTS Total $ Per First Input Amount + Total $ Per Second Input Amount + Total $ Per Rejected Amounts © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 62 Data quality improvements • Source data repairs • Increased program edits • Enhanced data entry procedures • Improved data quality training • Regular data audits • Data usage monitoring • Enterprise-wide end user surveys • Continuous validation of enterprise data model • Continuous validation of meta data, especially definitions and domains • Involvement of data owners, information consumers, and business sponsors © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 63 Data quality maturity At what level of DQ maturity is your organization? Program abends 1 Data profiling Data cleansing during ETL Discovery by accident 2 Limited data analysis 3 short term Repairing source data and programs Enterprise-wide DQ methods & techniques Addressing root causes 4 Proactive prevention Scale of 1 .. 5 5 long term Continuous DQ process improvements Optimization © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 64 DQ capability maturity model (1) (Source: Larry English) CMM Level 1. Uncertainty - Unconscious and unaware » » » » Data quality problems are denied. No formal data quality processes defined. Data quality initiatives are ad hoc and chaotic. Any success is dependent on individual efforts. CMM Level 2. Awakening - The big Aha! and lip service » » » » Data quality problems are acknowledged. Major problems are attacked as they come up. Minimum funding for a formal data quality initiative. Capability is a characteristic of the individual rather than the © Copyright 2003, Larissa T. Moss, Method Focus, Inc. organization. 65 DQ capability maturity model (2) (Source: Larry English) CMM Level 3. Enlightenment - Let’s do something » Data quality initiative takes off. » Enterprise-wide data quality assessment is performed. » Data quality problems are corrected at the source (where possible). » Data quality improvement process is institutionalized. CMM Level 4. Wisdom - Making a difference » » » » Management accepts personal responsibility for data quality. Data quality group reports to a chief officer (CIO, CKO, COO). Data quality correction changes to data defect prevention. All business areas are involved. © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 66 DQ capability maturity model (3) (Source: Larry English) CMM Level 5. Certainty - Nirvana » Data defect prevention is the main focus. » Data quality is an integral part of the business processes. » All business areas are continuously improving the processes. » The culture of the organization has changed. © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 67 Organizational impact • Cross-organizational tasks and responsibilities well defined • Data quality responsibility is not clear or ignored are not • Value of data is not understood or appreciated • Projects are often cost justified using the industrial-age mental model • Resource requirements are not well defined • Impact on application development empire • No reward for data sharing • Resistance to change © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 68 Organizational changes • Business and IT collaboration (“partnership”) • Business and business collaboration (“partnership”) • IT and IT collaboration (“partnership”) • Increased end user involvement • Cross-organizational activities • Architecture and standardization • Software release concept • New charge-back system • New incentives • New leadership © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 69 New leadership CEO collaboration collaboration CFO COO LOB Execs Enterprise Information Management CKO EIM CTO ...EA IT Execs DA DQA MDA Chief Knowledge Officer © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 70 How do we change? 12 steps to [DQ] recovery (1) 1. Become aware • Every cultural transformation process begins with an “Aha”. • Understand the root causes for your current data chaos. 2. Accept responsibility • “Yes, it is our fault” for being in this mess. • Accepting responsibility is a prerequisite for change. © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 71 12 steps to [DQ] recovery (2) 3. Decide to change • Now that “you know better”, the decision is yours: Stay stuck or change. • There can be no more false hopes for any silver bullet technology solutions. 4. Identify root causes • What are the specific root causes for non-quality data in your organization? • Some root causes are common, some are not. © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 72 12 steps to [DQ] recovery (3) 5. Collaborate • It doesn’t matter “whose fault” it is that the root causes exist. • IT must collaborate with the business community to affect changes. • Business community must also collaborate with business community. 6. Identify change agents • Who will be the couriers? • Changes must be systemic and holistic, not isolated and sporadic. © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 73 12 steps to [DQ] recovery (4) 7. Spread the word • To embrace changes, there must be “something in it” for everybody. • Otherwise, changes trigger anxiety and anxiety results in resistance or rejection. 8. Plan changes • Big changes do not get implemented in one “Big Bang”. • Involve people in change planning. • Cross-organizational changes are phased in. © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 74 12 steps to [DQ] recovery (5) 9. Prioritize changes • Some changes are easier to implement than others. • Some changes have a higher payback. 10. Implement changes • Everyone affected by the changes must have an opportunity to review and approve the plan before implementation. © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 75 12 steps to [DQ] recovery (6) 11. Measure effectiveness • Solicit feedback from “the trenches”. • Are the changes affecting anyone adversely? 12. Refine changes • Nothing is perfect the first time around. • What might work in one organization may not work in another. © Copyright 2003, Larissa T. Moss, Method Focus, Inc. 76 Bibliography •Adelman, Sid, and Larissa Terpeluk Moss. Data Warehouse Project Management. Boston, MA: Addison-Wesley, 2000. •Aiken, Peter H. Data Reverse Engineering: Slaying the Legacy Dragon. New York: McGraw-Hill, 1995. •Brackett, Michael H. Data Resource Quality: Turning Bad Habits into Good Practices. Boston, MA: Addison-Wesley, 2000. •Brackett, Michael H. The Data Warehouse Challenge: Taming Data Chaos. New York: John Wiley & Sons, 1996. •English, Larry P. Improving Data Warehouse and Business Information Quality: Methods for Reducing Costs and Increasing Profits. New York: John Wiley & Sons, 1999. •Hoberman, Steve. Data Modeler’s Workbench: Tools and Techniques for Analysis and Design. New York: John Wiley & Sons, 2001. •Kuan-Tsae, Huang, Yang W. Lee, and Richard Y. Wang. Quality Information and Knowledge Management. Upper Saddle River, NJ: Prentice Hall, 1998. •Marco, David. Building and Managing the Meta Data Repository: A Full Lifecycle Guide. New York: John Wiley & Sons, 2000. •Moss, Larissa T., and Shaku Atre. Business Intelligence Roadmap: The Complete Lifecycle for Decision-Support Applications. Boston, MA: Addison-Wesley, 2003. •Reingruber, Michael C., and William W. Gregory. The Data Modeling Handbook: A Best-Practice Approach to Building Quality Data Models. New York: John Wiley & Sons, 1994. •Ross, Ronald G. The Business Rule Concepts. Houston, TX: Business Rule Solutions, Inc., 1998. •Simsion, Graeme. Data 2003, Modeling Essentials: Analysis, and Innovation. 77 © Copyright Larissa T. Moss, Method Focus,Design, Inc.

51,000 تومان