Data Warehousing

Page 1:
Chapter 11: Data Warehousing
Modern Database Management
Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden

Page 2:
Objectives
• Definition of terms
• Reasons for information gap between information needs and availability
• Reasons for need of data warehousing
• Describe three levels of data warehouse architectures
• List four steps of data reconciliation
• Describe two components of star schema
• Estimate fact table size
• Design a data mart
Iowa State University

Page 3:
Definition
• Data Warehouse:
  - A subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes
  - Subject-oriented: e.g. customers, patients, students, products
  - Integrated: consistent naming conventions, formats, encoding structures; from multiple data sources
  - Time-variant: can study trends and changes
  - Non-updatable: read-only, periodically refreshed
• Data Mart:
  - A data warehouse that is limited in scope

Page 4:
Need for Data Warehousing
• Integrated, company-wide view of high-quality information (from disparate databases)
• Separation of operational and informational systems and data (for improved performance)

Table 11-1 Comparison of Operational and Informational Systems
Characteristic | Operational Systems | Informational Systems
Primary purpose | Run the business on a current basis | Support managerial decision making
Type of data | Current representation of state of the business | Historical point-in-time (snapshots) and predictions
Primary users | Clerks, salespersons, administrators | Managers, business analysts, customers
Scope of usage | Narrow, planned, and simple updates and queries | Broad, ad hoc, complex queries and analysis
Design goal | Performance: throughput, availability | Ease of flexible access and use
Volume | Many, constant updates and queries on one or a few table rows | Periodic batch updates and queries requiring many or all rows

Page 5:
Table 11-2 Data Warehouse Versus Data Mart
Aspect | Data Warehouse | Data Mart
Scope | Application independent; centralized, possibly enterprise-wide; planned | Specific DSS application; decentralized by user area; organic, possibly not planned
Data | Historical, detailed, and summarized; lightly denormalized | Some history, detailed, and summarized; highly denormalized
Subjects | Multiple subjects | One central subject of concern to users
Sources | Many internal and external sources | Few internal and external sources
Other characteristics | Flexible; data-oriented; long life; large; single complex structure | Restrictive; project-oriented; short life; start small, becomes large; multi, semi-complex structures, together complex
Adapted from Strange (1997)

Page 6:
Data Warehouse Architectures
• Generic Two-Level Architecture
• Independent Data Mart
• Dependent Data Mart and Operational Data Store
• Logical Data Mart and Active Warehouse
• Three-Layer architecture
All involve some form of extraction, transformation and loading (ETL)

Page 7:
Figure 11-2: Generic two-level data warehousing architecture
[Figure: source data systems feed a data staging area (extract, transform, load), which feeds the data and metadata storage area used by end-user presentation tools such as ad hoc query tools, report writers, end-user applications, and modeling/mining and visualization tools.]
One, company-wide warehouse
Periodic extraction → data is not completely current in warehouse

Page 8:
Figure 11-3 Independent data mart data warehousing architecture
[Figure: source data systems feed a data staging area, which loads several separate data marts used by end-user presentation tools.]
Data marts: mini-warehouses, limited in scope
Separate ETL for each independent data mart
Data access complexity due to multiple data marts

Page 9:
Figure 11-4 Dependent data mart with operational data store: a three-level architecture
[Figure: source data systems feed a data staging area and operational data store (ODS), which loads the enterprise data warehouse; dependent data marts are loaded from the warehouse and used by end-user presentation tools.]
ODS provides option for obtaining current data
Single ETL for enterprise data warehouse (EDW)
Dependent data marts loaded from EDW
Simpler data access

Page 10:
Figure 11-5 Logical data mart and real time warehouse architecture
[Figure: source data systems feed a data staging area; the real-time data warehouse also serves as the operational data store and is used directly by end-user presentation tools.]
ODS and data warehouse are one and the same
Near real-time ETL for data warehouse
Data marts are NOT separate databases, but logical views of the data warehouse → easier to create new data marts

Page 11:
Figure 11-7 Example of DBMS log entry
Data Characteristics: Status vs. Event Data
[Figure: a before image (status), an update event (a withdrawal), and an after image (status) for one record; the field values are not legible in this extraction.]
Event = a database action (create/update/delete) that results from a transaction

Page 12:
Figure 11-8 Transient operational data
Data Characteristics: Transient vs. Periodic Data
[Figure: Table X shown as of 10/05, 10/06, and 10/07, with rows changed in place between dates; the cell values are not legible in this extraction.]
With transient data, changes to existing records are written over previous records, thus destroying the previous data content

Page 13:
Figure 11-9: Periodic warehouse data
Data Characteristics: Transient vs. Periodic Data
[Figure: Table X shown on successive dates with a date column and an action column added to each row; the cell values are not legible in this extraction.]
Periodic data are never physically altered or deleted once they have been added to the store
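
The contrast between transient and periodic data can be sketched in a few lines of Python. This is only an illustration under assumed names: the `transient` and `periodic` stores, the `apply_change` helper, the keys, and the dates are all hypothetical and do not come from the figures.

```python
from datetime import date

# Transient store: one row per key; an update overwrites the previous values.
transient = {}

# Periodic store: append-only list of (key, date, values, action) rows.
periodic = []

def apply_change(key, when, values, action):
    """Apply one change to both stores so the difference is visible."""
    if action == "delete":
        transient.pop(key, None)      # previous content is destroyed
    else:
        transient[key] = values       # create, or overwrite in place
    periodic.append((key, when, values, action))  # history is only ever added

apply_change("001", date(2024, 10, 5), ("a", "b"), "create")
apply_change("001", date(2024, 10, 6), ("a", "c"), "update")
apply_change("001", date(2024, 10, 7), None, "delete")

print(transient)      # {}
print(len(periodic))  # 3
```

After the three changes the transient store has lost every trace of key 001, while the periodic store still holds all three dated rows, which is what makes trend analysis possible.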

Page 14:
Other Data Warehouse Changes
• New descriptive attributes
• New business activity attributes
• New classes of descriptive attributes
• Descriptive attributes become more refined
• Descriptive data are related to one another
• New source of data

Page 15:
The Reconciled Data Layer
• Typical operational data is:
  - Transient: not historical
  - Not normalized (perhaps due to denormalization for performance)
  - Restricted in scope: not comprehensive
  - Sometimes poor quality: inconsistencies and errors
• After ETL, data should be:
  - Detailed: not summarized yet
  - Historical: periodic
  - Normalized: 3rd normal form or higher
  - Comprehensive: enterprise-wide perspective
  - Timely: data should be current enough to assist decision-making
  - Quality controlled: accurate with full integrity

Page 16:
The ETL Process
• Capture/Extract
• Scrub or data cleansing
• Transform
• Load and Index
ETL = Extract, transform, and load

Page 17:
Figure 11-10: Steps in data reconciliation
[Figure: operational systems → Capture/Extract → staging area (Scrub/Cleanse, Transform) → Load and Index → enterprise data warehouse or operational data store, with messages about rejected data produced along the way.]
Capture/Extract: obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse
Static extract = capturing a snapshot of the source data at a point in time
Incremental extract = capturing changes that have occurred since the last static extract
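
A minimal sketch of the two extract styles, assuming each source row carries a last-modified timestamp. That timestamp column, the sample rows, and the function names are assumptions made for illustration; the slides do not say how changes are detected, and real systems often read the DBMS log instead.

```python
from datetime import datetime

# Hypothetical source rows: (key, value, last_modified).
source = [
    ("001", "a", datetime(2024, 10, 5, 9, 0)),
    ("002", "b", datetime(2024, 10, 6, 14, 30)),
    ("003", "c", datetime(2024, 10, 7, 8, 15)),
]

def static_extract(rows):
    """Snapshot of the whole chosen subset at one point in time."""
    return list(rows)

def incremental_extract(rows, since):
    """Only the rows changed after the previous extract."""
    return [row for row in rows if row[2] > since]

snapshot = static_extract(source)
last_extract_time = datetime(2024, 10, 6, 0, 0)
changes = incremental_extract(source, last_extract_time)

print(len(snapshot))                # 3
print([row[0] for row in changes])  # ['002', '003']
```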

Page 18:
Figure 11-10: Steps in data reconciliation (cont.)
Scrub/Cleanse: uses pattern recognition and AI techniques to upgrade data quality
Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies
Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data
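
A few of the cleansing tasks listed above (reformatting, erroneous dates, duplicate data, error logging) can be illustrated with simple rules. This sketch is rule-based only and does not attempt the pattern recognition or AI techniques the slide mentions; the record layout, the accepted date formats, and the names `parse_date` and `scrub` are assumptions.

```python
from datetime import datetime

def parse_date(text):
    """Try a few assumed source formats; return None if none match."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None

def scrub(records):
    """Return (clean rows, rejected rows with a reason)."""
    clean, rejected, seen = [], [], set()
    for rec in records:
        name = " ".join(rec["name"].split()).title()  # reformat spacing and case
        joined = parse_date(rec["joined"])
        if not name or joined is None:
            rejected.append((rec, "missing name or bad date"))  # error logging
            continue
        key = (name, joined)
        if key in seen:                               # duplicate data
            rejected.append((rec, "duplicate"))
            continue
        seen.add(key)
        clean.append({"name": name, "joined": joined})
    return clean, rejected

raw = [
    {"name": "  ann  lee ", "joined": "2024-10-05"},
    {"name": "Ann Lee", "joined": "10/05/2024"},
    {"name": "Bob Ray", "joined": "31/31/2024"},
]
clean, rejected = scrub(raw)
print(len(clean), [reason for _, reason in rejected])
```

The rejected list plays the role of the "messages about rejected data" shown in Figure 11-10.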

Page 19:
Figure 11-10: Steps in data reconciliation (cont.)
Transform = convert data from format of operational system to format of data warehouse
Record-level:
  - Selection: data partitioning
  - Joining: data combining
  - Aggregation: data summarization
Field-level:
  - Single-field: from one field to one field
  - Multi-field: from many fields to one, or one field to many
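
The three record-level functions can be shown on a tiny in-memory data set. The `orders` and `customers` structures and their values are hypothetical; a real ETL tool would typically express the same steps in SQL or a pipeline framework.

```python
from collections import defaultdict

# Hypothetical operational rows.
orders = [
    {"order_id": 1, "cust_id": "C1", "region": "West", "amount": 120.0},
    {"order_id": 2, "cust_id": "C2", "region": "East", "amount": 80.0},
    {"order_id": 3, "cust_id": "C1", "region": "West", "amount": 50.0},
]
customers = {"C1": "Ann Lee", "C2": "Bob Ray"}

# Selection: partition the data, keeping only rows that meet a criterion.
west = [o for o in orders if o["region"] == "West"]

# Joining: combine each order with data from another source.
joined = [{**o, "cust_name": customers[o["cust_id"]]} for o in west]

# Aggregation: summarize detail rows to a coarser level.
totals = defaultdict(float)
for row in joined:
    totals[row["cust_name"]] += row["amount"]

print(dict(totals))  # {'Ann Lee': 170.0}
```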

Page 20:
Figure 11-10: Steps in data reconciliation (cont.)
Load/Index = place transformed data into the warehouse and create indexes
Refresh mode: bulk rewriting of target data at periodic intervals
Update mode: only changes in source data are written to data warehouse
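
The two load modes and the indexing step might look like this in miniature. The dictionary standing in for the warehouse, the row layout, and the function names are assumptions; the sketch also keys rows by a single identifier, which is simpler than the periodic (append-only) storage described earlier.

```python
def refresh_load(target, source_rows):
    """Refresh mode: bulk rewrite of the target from the full source."""
    target.clear()
    target.update({row["key"]: row for row in source_rows})

def update_load(target, changed_rows):
    """Update mode: write only the changed source rows to the target."""
    for row in changed_rows:
        target[row["key"]] = row

def build_index(target, column):
    """Simple index: column value -> sorted list of keys holding it."""
    index = {}
    for key, row in target.items():
        index.setdefault(row[column], []).append(key)
    return {value: sorted(keys) for value, keys in index.items()}

warehouse = {}
refresh_load(warehouse, [{"key": 1, "city": "Ames"}, {"key": 2, "city": "Boone"}])
update_load(warehouse, [{"key": 3, "city": "Ames"}])
city_index = build_index(warehouse, "city")

print(city_index)  # {'Ames': [1, 3], 'Boone': [2]}
```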

Page 21:
Figure 11-11: Single-field transformation
In general, some transformation function translates data from old form to new form
Algorithmic transformation uses a formula or logical expression
Table lookup: another approach, uses a separate table keyed by source record code

Page 22:
Figure 11-12: Multifield transformation
M:1: from many source fields to one target field
1:M: from one source field to many target fields
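
The single-field and multifield transformations of Figures 11-11 and 11-12 can each be reduced to a small function. The specific examples here (a temperature formula, a state-code lookup, a name combination, a hyphenated product code) and all function names are illustrative assumptions, not taken from the figures.

```python
# Algorithmic single-field transformation: a formula (Fahrenheit to Celsius).
def fahrenheit_to_celsius(temp_f):
    return (temp_f - 32) * 5 / 9

# Table-lookup single-field transformation: a separate table keyed by source code.
STATE_NAMES = {"IA": "Iowa", "TX": "Texas"}  # hypothetical lookup table

def state_name(code):
    return STATE_NAMES.get(code, "Unknown")

# M:1 multifield transformation: many source fields -> one target field.
def full_name(first, last):
    return f"{last}, {first}"

# 1:M multifield transformation: one source field -> many target fields.
def split_product_code(code):
    brand, product = code.split("-", 1)
    return {"brand_code": brand, "product_code": product}

print(fahrenheit_to_celsius(212))      # 100.0
print(state_name("IA"))                # Iowa
print(full_name("Ann", "Lee"))         # Lee, Ann
print(split_product_code("ACME-100"))  # {'brand_code': 'ACME', 'product_code': '100'}
```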

Page 23:
Derived Data
• Objectives
  - Ease of use for decision support applications
  - Fast response to predefined user queries
  - Customized data for particular target audiences
  - Ad-hoc query support
  - Data mining capabilities
• Characteristics
  - Detailed (mostly periodic) data
  - Aggregate (for summary)
  - Distributed (to departmental servers)
Most common data model = star schema (also called "dimensional model")

Page 24:
Figure 11-13 Components of a star schema
[Figure: a central fact table whose primary key is made up of foreign keys Key 1 to Key 4, plus data columns; four surrounding dimension tables, each with its own key (PK) and descriptive attributes.]
Fact tables contain factual or quantitative data
Dimension tables contain descriptions about the subjects of the business
Dimension tables are denormalized to maximize performance
1:N relationship between dimension tables and fact tables
Excellent for ad-hoc queries, but bad for online transaction processing

Page 25:
Figure 11-14: Star schema example
PRODUCT: Product_Code, Description, Color, Size
PERIOD: Period_Code, Year, Quarter, Month
STORE: Store_Code, Store_Name, City, Telephone, Manager
SALES (fact table): Product_Code, Period_Code, Store_Code, Units_Sold, Dollars_Sold, Dollars_Cost
Fact table provides statistics for sales broken down by product, period and store dimensions

Page 26:
Figure 11-15 Star schema with sample data
[Figure: sample rows for the Product (Code, Description, Color, Size), Period (Code, Year, Quarter, Month), Store (Code, Name, City, Telephone, Manager), and Sales (Product Code, Period Code, Store Code, Units Sold, Dollars Sold, Dollars Cost) tables; the row values are not reliably legible in this extraction.]

Page 27:
Issues Regarding Star Schema
• Dimension table keys must be surrogate (non-intelligent and non-business related), because:
  - Keys may change over time
  - Length/format consistency
• Granularity of fact table: what level of detail do you want?
  - Transactional grain: finest level
  - Aggregated grain: more summarized
  - Finer grain → better market basket analysis capability
  - Finer grain → more dimension tables, more rows in fact table
• Duration of the database: how much history should be kept?
  - Natural duration: 13 months or 5 quarters
  - Financial institutions may need longer duration
  - Older data is more difficult to source and cleanse
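
A small star schema with surrogate dimension keys can be tried out with Python's built-in `sqlite3` module. The table layout loosely follows the product/period/store/sales example, but the column subset, the surrogate key columns, and every data value below are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables use surrogate integer keys; business codes are plain attributes.
CREATE TABLE product (product_key INTEGER PRIMARY KEY, product_code TEXT, description TEXT);
CREATE TABLE period  (period_key  INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER, month INTEGER);
CREATE TABLE store   (store_key   INTEGER PRIMARY KEY, store_code TEXT, city TEXT);
-- Fact table: composite key of dimension foreign keys plus numeric measures.
CREATE TABLE sales (
    product_key INTEGER REFERENCES product(product_key),
    period_key  INTEGER REFERENCES period(period_key),
    store_key   INTEGER REFERENCES store(store_key),
    units_sold INTEGER, dollars_sold REAL, dollars_cost REAL,
    PRIMARY KEY (product_key, period_key, store_key)
);
""")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [(1, "P-100", "Sweater"), (2, "P-110", "Shoes")])
conn.executemany("INSERT INTO period VALUES (?, ?, ?, ?)",
                 [(1, 2024, 1, 1), (2, 2024, 1, 2)])
conn.executemany("INSERT INTO store VALUES (?, ?, ?)",
                 [(1, "S1", "Ames"), (2, "S2", "Boone")])
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 40, 1600.0, 1000.0),
    (2, 1, 1, 30, 1500.0, 1200.0),
    (1, 2, 2, 30, 1200.0, 750.0),
])

# Ad hoc query: total units and dollars sold per product.
rows = conn.execute("""
    SELECT p.description, SUM(s.units_sold), SUM(s.dollars_sold)
    FROM sales s JOIN product p ON p.product_key = s.product_key
    GROUP BY p.description ORDER BY p.description
""").fetchall()
print(rows)  # [('Shoes', 30, 1500.0), ('Sweater', 70, 2800.0)]
```

Because the fact table refers to dimensions only through surrogate keys, a change to a business code such as `product_code` touches one dimension row and leaves the fact rows alone.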

Page 28:
Figure 11-16: Modeling dates
Fact table: Date key [PK][FK], other PKs (Country PK needed if facts relate to a specific country)
Date dimension table: Date key [PK], Full date, Day of week, Day number in month, Day number overall, Week number in year, Week number overall, Month, Month number overall, Weekday flag, Last day in month flag, Event key [FK] (some attributes are not legible in this extraction)
Country calendar table: Date key [PK][FK], Country [PK], Holiday flag, Religious holiday flag, Civil holiday flag, Holiday name, Season
Event table: Event key [PK], Event type, Event name
Fact tables contain time-period data → date dimensions are important

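Page 29:
On-Line Analytical Processing (OLAP)
• The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques
• Relational OLAP (ROLAP)
  - Traditional relational representation
• Multidimensional OLAP (MOLAP)
  - Cube structure
• OLAP Operations
  - Cube slicing: come up with 2-D view of data
  - Drill-down: going from summary to more detailed views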

Page 30:
Figure 11-22: Slicing a data cube
[Figure: one slice of a data cube shown as a two-dimensional table with the measures Units, Revenue, and Cost; the row labels and values are not reliably legible in this extraction.]

Page 31:
Figure 11-24 Example of drill-down
[Figure: a summary report of sales by brand and package size, and a second report after drilling down with color added; the values are not legible in this extraction.]
Summary report
Drill-down with color added
Starting with summary data, users can obtain details for particular cells
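
Slicing and drill-down can be imitated on a tiny cube held in a dictionary. The dimension names, cell values, and the helpers `slice_cube` and `summarize` are hypothetical; an OLAP tool would do the same work through its own cube engine or generated SQL.

```python
from collections import defaultdict

# Hypothetical cube cells: (product, region, month, package_size) -> units sold.
cells = {
    ("Shoes", "West", "Jan", "Small"): 20,
    ("Shoes", "West", "Jan", "Large"): 15,
    ("Shoes", "East", "Jan", "Small"): 10,
    ("Coats", "West", "Jan", "Small"): 5,
    ("Shoes", "West", "Feb", "Small"): 30,
}
DIMS = ("product", "region", "month", "package_size")

def slice_cube(cube, dim, value):
    """Fix one dimension to a single value and keep the matching cells."""
    i = DIMS.index(dim)
    return {key: units for key, units in cube.items() if key[i] == value}

def summarize(cube, dims):
    """Total the measure by the listed dimensions; more dimensions = more detail."""
    idx = [DIMS.index(d) for d in dims]
    totals = defaultdict(int)
    for key, units in cube.items():
        totals[tuple(key[i] for i in idx)] += units
    return dict(totals)

# Slice: one product, viewed as a 2-D region-by-month table.
view_2d = summarize(slice_cube(cells, "product", "Shoes"), ["region", "month"])

# Drill-down: the product summary, then the same totals with package size added.
summary = summarize(cells, ["product"])
detail = summarize(cells, ["product", "package_size"])

print(view_2d)
print(summary)
print(detail)
```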

Page 32:
Data Mining and Visualization
• Knowledge discovery using a blend of statistical, AI, and computer graphics techniques
• Goals:
  - Explain observed events or conditions
  - Confirm hypotheses
  - Explore data for new or unexpected relationships
• Techniques
  - Case-based reasoning
  - Rule discovery
  - Signal processing
  - Neural nets
  - Fractals
• Data visualization: representing data in graphical/multimedia formats for analysis

Chapter 11: Data Warehousing. Modern Database Management. Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden

Objectives
• Definition of terms • Reasons for information gap between information needs and availability • Reasons for need of data warehousing • Describe three levels of data warehouse architectures • List four steps of data reconciliation • Describe two components of star schema • Estimate fact table size • Design a data mart

Definition
• Data Warehouse: a subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes
  - Subject-oriented: e.g. customers, patients, students, products
  - Integrated: consistent naming conventions, formats, encoding structures; from multiple data sources
  - Time-variant: can study trends and changes
  - Non-updatable: read-only, periodically refreshed
• Data Mart: a data warehouse that is limited in scope

Need for Data Warehousing
• Integrated, company-wide view of high-quality information (from disparate databases)
• Separation of operational and informational systems and data (for improved performance)

Data Warehouse Architectures
• Generic Two-Level Architecture • Independent Data Mart • Dependent Data Mart and Operational Data Store • Logical Data Mart and Active Warehouse • Three-Layer architecture
All involve some form of extraction, transformation and loading (ETL)

Figure 11-2: Generic two-level data warehousing architecture. One, company-wide warehouse. Periodic extraction → data is not completely current in warehouse.

Figure 11-3: Independent data mart data warehousing architecture. Data marts: mini-warehouses, limited in scope. Separate ETL for each independent data mart. Data access complexity due to multiple data marts.

Figure 11-4: Dependent data mart with operational data store: a three-level architecture. ODS provides option for obtaining current data. Single ETL for enterprise data warehouse (EDW). Simpler data access. Dependent data marts loaded from EDW.

Figure 11-5: Logical data mart and real time warehouse architecture. ODS and data warehouse are one and the same. Near real-time ETL for data warehouse. Data marts are NOT separate databases, but logical views of the data warehouse → easier to create new data marts.

Figure 11-7: Example of DBMS log entry. Data Characteristics: Status vs. Event Data. Event = a database action (create/update/delete) that results from a transaction.

Figure 11-8: Transient operational data. Data Characteristics: Transient vs. Periodic Data. With transient data, changes to existing records are written over previous records, thus destroying the previous data content.

Figure 11-9: Periodic warehouse data. Data Characteristics: Transient vs. Periodic Data. Periodic data are never physically altered or deleted once they have been added to the store.

Other Data Warehouse Changes
• New descriptive attributes • New business activity attributes • New classes of descriptive attributes • Descriptive attributes become more refined • Descriptive data are related to one another • New source of data

The Reconciled Data Layer
• Typical operational data is: transient (not historical); not normalized (perhaps due to denormalization for performance); restricted in scope (not comprehensive); sometimes poor quality (inconsistencies and errors)
• After ETL, data should be: detailed (not summarized yet); historical (periodic); normalized (3rd normal form or higher); comprehensive (enterprise-wide perspective); timely (current enough to assist decision-making); quality controlled (accurate with full integrity)

The ETL Process
• Capture/Extract • Scrub or data cleansing • Transform • Load and Index
ETL = Extract, transform, and load

Figure 11-10: Steps in data reconciliation
Capture/Extract: obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse. Static extract = capturing a snapshot of the source data at a point in time. Incremental extract = capturing changes that have occurred since the last static extract.
Scrub/Cleanse: uses pattern recognition and AI techniques to upgrade data quality. Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies. Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data.
Transform = convert data from format of operational system to format of data warehouse. Record-level: selection (data partitioning), joining (data combining), aggregation (data summarization). Field-level: single-field (from one field to one field), multi-field (from many fields to one, or one field to many).
Load/Index = place transformed data into the warehouse and create indexes. Refresh mode: bulk rewriting of target data at periodic intervals. Update mode: only changes in source data are written to data warehouse.

Figure 11-11: Single-field transformation. In general, some transformation function translates data from old form to new form. Algorithmic transformation uses a formula or logical expression. Table lookup: another approach, uses a separate table keyed by source record code.

Figure 11-12: Multifield transformation. M:1: from many source fields to one target field. 1:M: from one source field to many target fields.

Derived Data
• Objectives: ease of use for decision support applications; fast response to predefined user queries; customized data for particular target audiences; ad-hoc query support; data mining capabilities
• Characteristics: detailed (mostly periodic) data; aggregate (for summary); distributed (to departmental servers)
Most common data model = star schema (also called "dimensional model")

Figure 11-13: Components of a star schema. Fact tables contain factual or quantitative data. 1:N relationship between dimension tables and fact tables. Dimension tables are denormalized to maximize performance. Dimension tables contain descriptions about the subjects of the business. Excellent for ad-hoc queries, but bad for online transaction processing.

Figure 11-14: Star schema example. Fact table provides statistics for sales broken down by product, period and store dimensions.

Figure 11-15: Star schema with sample data.

Issues Regarding Star Schema
• Dimension table keys must be surrogate (non-intelligent and non-business related), because keys may change over time and for length/format consistency
• Granularity of fact table (what level of detail do you want?): transactional grain is the finest level; aggregated grain is more summarized; finer grain → better market basket analysis capability; finer grain → more dimension tables, more rows in fact table
• Duration of the database (how much history should be kept?): natural duration is 13 months or 5 quarters; financial institutions may need longer duration; older data is more difficult to source and cleanse

Figure 11-16: Modeling dates. Fact tables contain time-period data → date dimensions are important.

On-Line Analytical Processing (OLAP)
• The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques
• Relational OLAP (ROLAP): traditional relational representation
• Multidimensional OLAP (MOLAP): cube structure
• OLAP operations: cube slicing (come up with 2-D view of data); drill-down (going from summary to more detailed views)

Figure 11-22: Slicing a data cube.

Figure 11-24: Example of drill-down. Starting with summary data, users can obtain details for particular cells. Summary report; drill-down with color added.

Data Mining and Visualization
• Knowledge discovery using a blend of statistical, AI, and computer graphics techniques
• Goals: explain observed events or conditions; confirm hypotheses; explore data for new or unexpected relationships
• Techniques: case-based reasoning; rule discovery; signal processing; neural nets; fractals
• Data visualization: representing data in graphical/multimedia formats for analysis
