Data Warehousing

Slide 1: Chapter 11: Data Warehousing
Modern Database Management
Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden

Slide 2: Objectives
  • Definition of terms
  • Reasons for information gap between information needs and availability
  • Reasons for need of data warehousing
  • Describe three levels of data warehouse architectures
  • List four steps of data reconciliation
  • Describe two components of star schema
  • Estimate fact table size
  • Design a data mart

Slide 3: Definition
  • Data Warehouse: a subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes
      • Subject-oriented: e.g. customers, patients, students, products
      • Integrated: consistent naming conventions, formats, encoding structures; from multiple data sources
      • Time-variant: can study trends and changes
      • Non-updatable: read-only, periodically refreshed
  • Data Mart: a data warehouse that is limited in scope

Slide 4: Need for Data Warehousing
  • Integrated, company-wide view of high-quality information (from disparate databases)
  • Separation of operational and informational systems and data (for improved performance)

Slide 5: (no text on this slide)

Slide 6: Data Warehouse Architectures
  • Generic Two-Level Architecture
  • Independent Data Mart
  • Dependent Data Mart and Operational Data Store
  • Logical Data Mart and @ctive Warehouse
  • Three-Layer Architecture
All involve some form of extraction, transformation and loading (ETL).

Slide 7: Figure 11-2: Generic two-level data warehousing architecture
  • ETL
  • One, company-wide warehouse
  • Periodic extraction → data is not completely current in warehouse

Slide 8: Figure 11-3: Independent data mart data warehousing architecture
  • Data marts: mini-warehouses, limited in scope
  • ETL: separate ETL for each independent data mart
  • Data access complexity due to multiple data marts

Slide 9: Figure 11-4: Dependent data mart with operational data store: a three-level architecture
  • ETL: single ETL for enterprise data warehouse (EDW)
  • Simpler data access
  • ODS provides option for obtaining current data
  • Dependent data marts loaded from EDW

Slide 10: Figure 11-5: Logical data mart and real-time warehouse architecture
  • ETL: near real-time ETL for data warehouse
  • ODS and data warehouse are one and the same
  • Data marts are NOT separate databases, but logical views of the data warehouse
  • Easier to create new data marts
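
The idea that a logical data mart is a view rather than a separate database can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the table `warehouse_sales`, the view `east_sales_mart`, and all rows are hypothetical and do not come from the slides.

```python
import sqlite3

# Hypothetical warehouse table; names and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [("East", "Widget", 100.0), ("West", "Widget", 250.0), ("East", "Gadget", 75.0)],
)

# The "data mart" is only a view definition: no rows are copied.
conn.execute(
    "CREATE VIEW east_sales_mart AS "
    "SELECT product, amount FROM warehouse_sales WHERE region = 'East'"
)

# A row loaded into the warehouse later is visible through the view at once.
conn.execute("INSERT INTO warehouse_sales VALUES ('East', 'Gizmo', 40.0)")
mart_rows = conn.execute(
    "SELECT product, amount FROM east_sales_mart ORDER BY product"
).fetchall()
print(mart_rows)
```

Creating another mart here means writing one more `CREATE VIEW` statement, which is one way to read the "easier to create new data marts" point.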

Slide 11: Figure 11-6: Three-layer data architecture for a data warehouse

Slide 12: Data Characteristics: Status vs. Event Data
  • Status (labeled twice in the figure)
  • Event = a database action (create/update/delete) that results from a transaction
  • Figure 11-7: Example of DBMS log entry

Slide 13: Data Characteristics: Transient vs. Periodic Data
  • With transient data, changes to existing records are written over previous records, thus destroying the previous data content
  • Figure 11-8: Transient operational data

Slide 14: Data Characteristics: Transient vs. Periodic Data
  • Periodic data are never physically altered or deleted once they have been added to the store
  • Figure 11-9: Periodic warehouse data
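
The contrast between transient and periodic data can be sketched in a few lines of Python. The structures and the `apply_change` helper are hypothetical and exist only for this illustration: the transient store keeps one current value per key, while the periodic store appends a timestamped record for every event.

```python
# Transient store: one current value per key; updates overwrite history.
transient = {}
# Periodic store: append-only list of (key, timestamp, action, value).
periodic = []

def apply_change(key, value, when, action):
    """Apply one create/update/delete event to both stores."""
    if action == "delete":
        transient.pop(key, None)   # the record simply disappears
    else:
        transient[key] = value     # the previous content is destroyed
    periodic.append((key, when, action, value))  # nothing is ever altered

apply_change(1, "A", "2024-01-01", "create")
apply_change(1, "B", "2024-01-02", "update")
apply_change(1, None, "2024-01-03", "delete")

print(transient)       # current state only
print(len(periodic))   # full history of events
```

After the three events the transient store is empty, while the periodic store still holds all three records, so the history of key 1 remains available.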

Slide 15: Other Data Warehouse Changes
  • New descriptive attributes
  • New business activity attributes
  • New classes of descriptive attributes
  • Descriptive attributes become more refined
  • Descriptive data are related to one another
  • New source of data

Slide 16: The Reconciled Data Layer
Typical operational data is:
  • Transient: not historical
  • Not normalized (perhaps due to denormalization for performance)
  • Restricted in scope: not comprehensive
  • Sometimes poor quality: inconsistencies and errors
After ETL, data should be:
  • Detailed: not summarized yet
  • Historical: periodic
  • Normalized: 3rd normal form or higher
  • Comprehensive: enterprise-wide perspective
  • Timely: data should be current enough to assist decision-making
  • Quality controlled: accurate with full integrity

Slide 17: The ETL Process
  • Capture/Extract
  • Scrub or data cleansing
  • Transform
  • Load and Index
ETL = Extract, transform, and load

Slide 18: Figure 11-10: Steps in data reconciliation
  • Capture/Extract: obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse
  • Static extract = capturing a snapshot of the source data at a point in time
  • Incremental extract = capturing changes that have occurred since the last static extract
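
The two extract styles can be sketched as follows. This is a minimal illustration that assumes each source row carries a last-modified date; that column, the sample rows, and the function names are hypothetical. Real change capture often relies on DBMS logs instead, and a timestamp filter like this one would not see deleted rows.

```python
from datetime import date

# Hypothetical source rows: (id, value, last_modified).
source = [
    (1, "alpha", date(2024, 1, 5)),
    (2, "beta", date(2024, 2, 10)),
    (3, "gamma", date(2024, 3, 1)),
]

def static_extract(rows):
    """Snapshot of every source row at a point in time."""
    return list(rows)

def incremental_extract(rows, since):
    """Only the rows changed after the previous extract date."""
    return [r for r in rows if r[2] > since]

snapshot = static_extract(source)
changes = incremental_extract(source, since=date(2024, 2, 1))
print(len(snapshot), [r[0] for r in changes])
```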

Slide 19: Figure 11-10: Steps in data reconciliation (cont.)
  • Scrub/Cleanse: uses pattern recognition and AI techniques to upgrade data quality
  • Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies
  • Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data
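
A few of the cleansing tasks listed above (reformatting, erroneous dates, missing data, duplicate data, error logging) can be sketched with simple rules. This is not the pattern-recognition or AI approach the slide mentions, only a rule-based stand-in; the record layout, the `scrub` function, and the ISO date rule are assumptions made for the example.

```python
import re

# Assumed rule: valid dates look like YYYY-MM-DD.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def scrub(records):
    """Return (clean_records, error_log) after basic cleansing."""
    clean, errors, seen = [], [], set()
    for rec in records:
        # Reformatting: collapse whitespace and normalize capitalization.
        name = " ".join(rec.get("name", "").split()).title()
        date_str = rec.get("date", "")
        if not name:
            errors.append((rec, "missing name"))
            continue
        if not DATE_RE.match(date_str):
            errors.append((rec, "bad date"))
            continue
        key = (name, date_str)
        if key in seen:
            errors.append((rec, "duplicate"))
            continue
        seen.add(key)
        clean.append({"name": name, "date": date_str})
    return clean, errors

raw = [
    {"name": "  ann  lee ", "date": "2024-01-05"},
    {"name": "Ann Lee", "date": "2024-01-05"},
    {"name": "Bob Ray", "date": "05/01/2024"},
    {"name": "", "date": "2024-02-01"},
]
clean, errors = scrub(raw)
print(clean)
print([reason for _, reason in errors])
```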

Slide 20: Figure 11-10: Steps in data reconciliation (cont.)
  • Transform = convert data from format of operational system to format of data warehouse
  • Record-level:
      • Selection: data partitioning
      • Joining: data combining
      • Aggregation: data summarization
  • Field-level:
      • Single-field: from one field to one field
      • Multi-field: from many fields to one, or one field to many
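
The three record-level transformations can be sketched on plain Python structures. The `orders` and `customers` data below are hypothetical and stand in for operational source tables.

```python
from collections import defaultdict

# Hypothetical operational data.
orders = [
    {"order_id": 1, "cust_id": 10, "amount": 50.0, "region": "East"},
    {"order_id": 2, "cust_id": 11, "amount": 20.0, "region": "West"},
    {"order_id": 3, "cust_id": 10, "amount": 30.0, "region": "East"},
]
customers = {10: "Acme", 11: "Birch"}

# Selection: partition the data by a criterion.
east = [o for o in orders if o["region"] == "East"]

# Joining: combine each order with its customer record.
joined = [{**o, "cust_name": customers[o["cust_id"]]} for o in orders]

# Aggregation: summarize order amounts per customer.
totals = defaultdict(float)
for row in joined:
    totals[row["cust_name"]] += row["amount"]

print(len(east), dict(totals))
```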

Slide 21: Figure 11-10: Steps in data reconciliation (cont.)
  • Load/Index = place transformed data into the warehouse and create indexes
  • Refresh mode: bulk rewriting of target data at periodic intervals
  • Update mode: only changes in source data are written to data warehouse
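
The difference between the two load modes can be sketched with a dictionary standing in for the target table. The function names and data are hypothetical, and the sketch simplifies update mode to an overwrite by key; a warehouse holding periodic data would append new versions instead.

```python
def refresh_load(target, source_rows):
    """Refresh mode: bulk rewrite of the whole target."""
    target.clear()
    target.update(source_rows)

def update_load(target, changed_rows):
    """Update mode: write only the rows that changed in the source."""
    target.update(changed_rows)

warehouse = {1: "old-a", 2: "old-b"}

update_load(warehouse, {2: "new-b", 3: "new-c"})
after_update = dict(warehouse)   # row 1 untouched, row 2 changed, row 3 added

refresh_load(warehouse, {1: "a", 3: "c"})
after_refresh = dict(warehouse)  # target now mirrors the source exactly

print(after_update)
print(after_refresh)
```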

Slide 22: Figure 11-11: Single-field transformation
  • In general, some transformation function translates data from old form to new form
  • Algorithmic transformation uses a formula or logical expression
  • Table lookup: another approach, uses a separate table keyed by source record code
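
Both single-field approaches can be sketched briefly. The temperature conversion and the state-code table are illustrative choices, not taken from the figure, and the function names are hypothetical.

```python
def fahrenheit_to_celsius(temp_f):
    """Algorithmic transformation: a formula maps old form to new form."""
    return (temp_f - 32) * 5 / 9

# Table lookup: a separate table keyed by the source code.
STATE_NAMES = {"CA": "California", "NY": "New York"}

def lookup_state(code):
    """Translate a source code via the lookup table; flag unknown codes."""
    return STATE_NAMES.get(code, "UNKNOWN")

print(fahrenheit_to_celsius(212))
print(lookup_state("CA"), lookup_state("ZZ"))
```

A lookup table tends to suit mappings with no formula behind them, such as code lists, while the algorithmic form suits unit conversions and similar calculations.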

Slide 23: Figure 11-12: Multifield transformation
  • M:1: from many source fields to one target field
  • 1:M: from one source field to many target fields
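
The two multifield directions can be sketched as a pair of small functions. The field names and the `BRAND-ITEM` product code layout are assumptions made for this example.

```python
def many_to_one(first_name, last_name):
    """M:1: combine several source fields into one target field."""
    return f"{last_name}, {first_name}"

def one_to_many(product_code):
    """1:M: split one source field into several target fields.

    Assumes codes look like 'BRAND-ITEM' (a hypothetical layout).
    """
    brand, item = product_code.split("-", 1)
    return {"brand": brand, "item": item}

full_name = many_to_one("Mary", "Prescott")
parts = one_to_many("ACME-1042")
print(full_name, parts)
```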

Slide 24: Derived Data
Objectives:
  • Ease of use for decision support applications
  • Fast response to predefined user queries
  • Customized data for particular target audiences
  • Ad-hoc query support
  • Data mining capabilities
Characteristics:
  • Detailed (mostly periodic) data
  • Aggregate (for summary)
  • Distributed (to departmental servers)
Most common data model = star schema (also called "dimensional model")

Slide 25: Figure 11-13: Components of a star schema
  • Fact tables contain factual or quantitative data
  • Dimension tables contain descriptions about the subjects of the business
  • 1:N relationship between dimension tables and fact tables
  • Excellent for ad-hoc queries, but bad for online transaction processing
  • Dimension tables are denormalized to maximize performance

Slide 26: Figure 11-14: Star schema example
  • Fact table provides statistics for sales broken down by product, period and store dimensions
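
A star schema of this shape (a sales fact table with product, period, and store dimensions) can be sketched with `sqlite3`. The column names and sample rows are hypothetical and are not copied from Figure 11-14.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptions of the business subjects.
CREATE TABLE product (product_key INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE period  (period_key  INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE store   (store_key   INTEGER PRIMARY KEY, city TEXT);

-- Fact table: quantitative data, one foreign key per dimension.
CREATE TABLE sales (
    product_key  INTEGER REFERENCES product(product_key),
    period_key   INTEGER REFERENCES period(period_key),
    store_key    INTEGER REFERENCES store(store_key),
    units_sold   INTEGER,
    dollars_sold REAL
);

INSERT INTO product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO period  VALUES (1, '2024-01'), (2, '2024-02');
INSERT INTO store   VALUES (1, 'Austin'), (2, 'Boston');
INSERT INTO sales VALUES
    (1, 1, 1, 10, 100.0),
    (1, 2, 1,  5,  50.0),
    (2, 1, 2,  3,  90.0),
    (1, 1, 2,  2,  20.0);
""")

# Ad-hoc query: sales statistics broken down by the product dimension.
rows = conn.execute("""
    SELECT p.description, SUM(s.units_sold), SUM(s.dollars_sold)
    FROM sales AS s
    JOIN product AS p ON p.product_key = s.product_key
    GROUP BY p.description
    ORDER BY p.description
""").fetchall()
print(rows)
```

Breaking the same facts down by period or store only changes which dimension table is joined and grouped on.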

Slide 27: Figure 11-15: Star schema with sample data

Slide 28: Issues Regarding Star Schema
  • Dimension table keys must be surrogate (non-intelligent and non-business related), because:
      • Keys may change over time
      • Length/format consistency
  • Granularity of fact table: what level of detail do you want?
      • Transactional grain: finest level
      • Aggregated grain: more summarized
      • Finer grain → better market basket analysis capability
      • Finer grain → more dimension tables, more rows in fact table
  • Duration of the database: how much history should be kept?
      • Natural duration: 13 months or 5 quarters
      • Financial institutions may need longer duration
      • Older data is more difficult to source and cleanse
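
The objectives list estimating fact table size, and the grain and duration choices above drive that estimate. One rough way to do it is to multiply the number of dimension-value combinations that actually occur by the bytes per row. Every figure below is a hypothetical assumption, not a value from the slides.

```python
# Hypothetical assumptions for a sales fact table.
stores = 1_000
active_products_per_store_period = 5_000   # products actually sold, not the full catalog
periods = 24                               # e.g. 24 months of history
fields_per_row = 6                         # assumed: 3 foreign keys + 3 measures
bytes_per_field = 4                        # assumed average field width

# Rows = combinations of dimension values that actually occur.
rows = stores * active_products_per_store_period * periods

# Size = rows x fields per row x bytes per field.
size_bytes = rows * fields_per_row * bytes_per_field

print(f"{rows:,} rows, about {size_bytes / 1e9:.2f} GB")
```

Moving to a finer grain (for example daily instead of monthly periods) multiplies the `periods` term, which is how finer grain translates into more fact table rows.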

Slide 29: Figure 11-16: Modeling dates
  • Fact tables contain time-period data
  • Date dimensions are important

Slide 30: The User Interface: Metadata (data catalog)
  • Identify subjects of the data mart
  • Identify dimensions and facts
  • Indicate how data is derived from enterprise data warehouses, including derivation rules
  • Indicate how data is derived from operational data store, including derivation rules
  • Identify available reports and predefined queries
  • Identify data analysis techniques (e.g. drill-down)
  • Identify responsible people

Slide 31: On-Line Analytical Processing (OLAP)
  • The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques
  • Relational OLAP (ROLAP): traditional relational representation
  • Multidimensional OLAP (MOLAP): cube structure
  • OLAP operations:
      • Cube slicing: come up with 2-D view of data
      • Drill-down: going from summary to more detailed views
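
Slicing and drill-down can be sketched on a tiny three-dimensional cube held in a dictionary. The dimensions (product, region, month), the values, and the function names are hypothetical; an OLAP tool would do this over far larger structures.

```python
from collections import defaultdict

# Hypothetical cube: (product, region, month) -> units sold.
cube = {
    ("Shoes", "East", "Jan"): 10,
    ("Shoes", "East", "Feb"): 12,
    ("Shoes", "West", "Jan"): 7,
    ("Hats", "East", "Jan"): 4,
    ("Hats", "West", "Feb"): 6,
}

def slice_cube(cube, product):
    """Slice: fix one dimension value to get a 2-D (region, month) view."""
    return {(r, m): v for (p, r, m), v in cube.items() if p == product}

def summary_by_product(cube):
    """Summary report: total units per product."""
    totals = defaultdict(int)
    for (p, _, _), v in cube.items():
        totals[p] += v
    return dict(totals)

def drill_down(cube, product):
    """Drill-down: expand one summary cell into per-region detail."""
    totals = defaultdict(int)
    for (p, r, _), v in cube.items():
        if p == product:
            totals[r] += v
    return dict(totals)

shoes_slice = slice_cube(cube, "Shoes")
summary = summary_by_product(cube)
shoes_detail = drill_down(cube, "Shoes")
print(summary, shoes_detail)
```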

Slide 32: Figure 11-22: Slicing a data cube

Slide 33: Figure 11-24: Example of drill-down
  • Summary report
  • Drill-down with color added
  • Starting with summary data, users can obtain details for particular cells

Slide 34: Data Mining and Visualization
  • Knowledge discovery using a blend of statistical, AI, and computer graphics techniques
  • Goals:
      • Explain observed events or conditions
      • Confirm hypotheses
      • Explore data for new or unexpected relationships
  • Techniques:
      • Case-based reasoning
      • Rule discovery
      • Signal processing
      • Neural nets
      • Fractals
  • Data visualization: representing data in graphical/multimedia formats for analysis
