
Chapter 23: Advanced Application Development

database_course_silberschatz_2005_ch23

Slide 1: Chapter 23: Advanced Application Development

  • Performance Tuning
  • Performance Benchmarks
  • Standardization
  • E-Commerce
  • Legacy Systems

Slide 2: Performance Tuning

  • Performance tuning means adjusting various parameters and design choices to improve system performance for a specific application.
  • Tuning is best done by identifying bottlenecks and eliminating them.
  • A database system can be tuned at three levels:
      • Hardware: e.g., add disks to speed up I/O, add memory to increase buffer hits, move to a faster processor.
      • Database system parameters: e.g., set buffer size to avoid paging of the buffer, set checkpointing intervals to limit log size. The system may have automatic tuning.
      • Higher-level database design, such as the schema, indices, and transactions (more later).

Slide 3: Bottlenecks

  • The performance of most systems (at least before they are tuned) is usually limited by the performance of one or a few components; these are called bottlenecks.
      • E.g., 80% of the code may take up 20% of the time, while 20% of the code takes up 80% of the time.
      • It is worth spending most of the tuning effort on the 20% of the code that takes 80% of the time.
  • Bottlenecks may be in hardware (e.g., disks are very busy while the CPU is idle) or in software.
  • Removing one bottleneck often exposes another.
  • De-bottlenecking consists of repeatedly finding bottlenecks and removing them.
      • This is a heuristic.

Slide 4: Identifying Bottlenecks

  • Transactions request a sequence of services, e.g., CPU, disk I/O, locks.
  • With concurrent transactions, a transaction may have to wait for a requested service while other transactions are being served.
  • The database can be modeled as a queueing system with a queue for each service.
      • Transactions repeatedly do the following: request a service, wait in the queue for the service, and get serviced.
  • Bottlenecks in a database system typically show up as very high utilization (and correspondingly very long queues) of a particular service.
      • E.g., disk vs. CPU utilization.
  • 100% utilization leads to very long waiting times.
      • Rule of thumb: design the system for about 70% utilization at peak load.
      • Utilization over 90% should be avoided.
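
To see why the 70% and 90% thresholds matter, here is a minimal sketch that assumes each service behaves like a single-server M/M/1 queue. That model is an assumption made for illustration; the slides do not name a specific queueing model. Under it, the mean response time is the service time divided by (1 - utilization). The function name and the 10 ms service time are hypothetical.

```python
def mean_response_time(service_time: float, utilization: float) -> float:
    """Mean time in system for an M/M/1 queue: S / (1 - rho).

    Assumption: Poisson arrivals and exponential service times (M/M/1).
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


if __name__ == "__main__":
    # Hypothetical 10 ms average service time for one disk request.
    for rho in (0.5, 0.7, 0.9, 0.99):
        t = mean_response_time(10.0, rho)
        print(f"utilization {rho:.2f}: mean response time {t:.1f} ms")
```

Under this model the response time grows slowly up to about 70% utilization and then climbs steeply, which is consistent with the rule of thumb on the slide.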

Slide 5: Queues in a Database System

Slide 6: Tunable Parameters

  • Tuning of hardware
  • Tuning of schema
  • Tuning of indices
  • Tuning of materialized views
  • Tuning of transactions

Slide 7: Tuning of Hardware

  • Even well-tuned transactions typically require a few I/O operations.
      • A typical disk supports about 100 random I/O operations per second.
      • Suppose each transaction requires just 2 random I/O operations. Then, to support n transactions per second, we need to stripe data across n/50 disks (ignoring skew).
  • The number of I/O operations per transaction can be reduced by keeping more data in memory.
      • If all data is in memory, I/O is needed only for writes.
      • Keeping frequently used data in memory reduces disk accesses and so reduces the number of disks required, but has a memory cost.
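
The n/50 figure follows directly from the two numbers on the slide (100 random I/Os per second per disk, 2 I/Os per transaction). A small sketch of that arithmetic, with a hypothetical function name and the slide's numbers as defaults:

```python
import math


def disks_needed(tps: float, ios_per_txn: int = 2, iops_per_disk: int = 100) -> int:
    """Disks required to sustain `tps` transactions per second, ignoring skew.

    Defaults come from the slide: 2 random I/Os per transaction and
    about 100 random I/Os per second per disk, i.e. n / 50 disks.
    """
    return math.ceil(tps * ios_per_txn / iops_per_disk)


if __name__ == "__main__":
    for tps in (25, 500, 1000):
        print(f"{tps} tps -> {disks_needed(tps)} disk(s)")
```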

Slide 8: Hardware Tuning: Five-Minute Rule

  • Question: which data should be kept in memory?
      • If a page is accessed n times per second, keeping it in memory saves
        $$n \times \frac{\text{price-per-disk-drive}}{\text{accesses-per-second-per-disk}}$$
      • The cost of keeping a page in memory is
        $$\frac{\text{price-per-MB-of-memory}}{\text{pages-per-MB-of-memory}}$$
      • Break-even point: the value of n for which the above costs are equal.
          • If accesses are more frequent, the saving is greater than the cost.
  • Solving the above equation with current disk and memory prices leads to the 5-minute rule: if a randomly accessed page is used more frequently than once in 5 minutes, it should be kept in memory (by buying sufficient memory!).
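
The break-even point can be computed by setting the saving equal to the cost and solving for n; the break-even access interval is then 1/n seconds. The sketch below does this. The function name is hypothetical, and the prices in the demo are made-up placeholders, not the figures behind the slide's 5-minute result.

```python
def break_even_interval_seconds(
    price_per_disk: float,
    accesses_per_sec_per_disk: float,
    price_per_mb_memory: float,
    pages_per_mb_memory: float,
) -> float:
    """Access interval (seconds) at which caching a page breaks even.

    Saving: n * price_per_disk / accesses_per_sec_per_disk
    Cost:   price_per_mb_memory / pages_per_mb_memory
    Setting them equal gives n; the interval is 1 / n.
    """
    cost_per_access_per_sec = price_per_disk / accesses_per_sec_per_disk
    cost_per_page = price_per_mb_memory / pages_per_mb_memory
    n = cost_per_page / cost_per_access_per_sec
    return 1.0 / n


if __name__ == "__main__":
    # Placeholder prices for illustration only (assumed, not from the slides).
    seconds = break_even_interval_seconds(250.0, 100.0, 1.0, 256.0)
    print(f"break-even interval: {seconds:.0f} s")
```

A page accessed more often than once per break-even interval is cheaper to keep in memory than to serve from disk under these inputs.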

Slide 9: Hardware Tuning: One-Minute Rule

  • For sequentially accessed data, more pages can be read per second. Assuming sequential reads of 1 MB of data at a time, we get the 1-minute rule: sequentially accessed data that is accessed once or more in a minute should be kept in memory.
  • Prices of disk and memory have changed greatly over the years, but the ratios have not changed much.
      • So the rules remain the 5-minute and 1-minute rules, not 1-hour or 1-second rules!

Slide 10: Hardware Tuning: Choice of RAID Level

  • Should we use RAID 1 or RAID 5?
      • It depends on the ratio of reads and writes.
      • RAID 5 requires 2 block reads and 2 block writes to write out one data block.
  • If an application requires r reads and w writes per second:
      • RAID 1 requires r + 2w I/O operations per second.
      • RAID 5 requires r + 4w I/O operations per second.
  • For reasonably large r and w, this requires lots of disks to handle the workload.
      • RAID 5 may require more disks than RAID 1 to handle the load!
      • The apparent saving in the number of disks with RAID 5 (by using parity, as opposed to the mirroring done by RAID 1) may be illusory!
  • Rule of thumb: RAID 5 is fine when writes are rare and data is very large, but RAID 1 is preferable otherwise.
      • If you need more disks to handle the I/O load, just mirror them, since disk capacities these days are enormous!
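
The two formulas above can be compared directly for a given workload. The following sketch uses hypothetical function names and assumes about 100 I/O operations per second per disk (the figure quoted on the hardware tuning slide) to turn the I/O load into a disk count.

```python
import math


def raid1_ios(reads: float, writes: float) -> float:
    """RAID 1: each write goes to both mirrors, so r + 2w I/Os per second."""
    return reads + 2 * writes


def raid5_ios(reads: float, writes: float) -> float:
    """RAID 5: each write costs 2 block reads and 2 block writes, so r + 4w."""
    return reads + 4 * writes


def disks_for_load(ios_per_sec: float, iops_per_disk: int = 100) -> int:
    """Disks needed to carry the I/O load (assumed 100 IOPS per disk)."""
    return math.ceil(ios_per_sec / iops_per_disk)


if __name__ == "__main__":
    r, w = 1000, 500  # hypothetical workload: reads and writes per second
    print("RAID 1:", raid1_ios(r, w), "I/Os per second,", disks_for_load(raid1_ios(r, w)), "disks")
    print("RAID 5:", raid5_ios(r, w), "I/Os per second,", disks_for_load(raid5_ios(r, w)), "disks")
```

For a write-heavy workload like this one, RAID 5 needs more disks for I/O throughput than RAID 1, even though it needs fewer for capacity.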

Slide 11: Tuning the Database Design

  • Schema tuning
      • Vertically partition relations to isolate the data that is accessed most often, so that only needed information is fetched.
          • E.g., split account into two relations, (account-number, branch-name) and (account-number, balance). Branch-name need not be fetched unless required.
      • Improve performance by storing a denormalized relation.
          • E.g., store the join of account and depositor; branch-name and balance information is repeated for each holder of an account, but the join need not be computed repeatedly.
          • Price paid: more space, and more work for the programmer to keep the relation consistent on updates.
          • It is better to use materialized views (more on this later).
      • Cluster together on the same disk page records that would match in a frequently required join, so that the join can be computed very efficiently when required.

Slide 12: Tuning the Database Design (Cont.)

  • Index tuning
      • Create appropriate indices to speed up slow queries/updates.
      • Speed up slow updates by removing excess indices (a tradeoff between queries and updates).
      • Choose the type of index (B-tree/hash) appropriate for the most frequent types of queries.
      • Choose which index to make clustered.
  • Index tuning wizards look at the past history of queries and updates (the workload) and recommend which indices would be best for the workload.

Slide 13: Tuning the Database Design (Cont.)

  • Materialized views
      • Materialized views can help speed up certain queries, particularly aggregate queries.
      • Overheads:
          • Space
          • Time for view maintenance
              • Immediate view maintenance: done as part of the update transaction; the time overhead is paid by the update transaction.
              • Deferred view maintenance: done only when required; the update transaction is not affected, but system time is spent on view maintenance, and until it is updated the view may be out of date.
      • Preferable to a denormalized schema, since view maintenance is the system's responsibility, not the programmer's.
          • Avoids inconsistencies caused by errors in update programs.

Slide 14: Tuning the Database Design (Cont.)

  • How to choose the set of materialized views
      • Helping one transaction type by introducing a materialized view may hurt others.
      • The choice of materialized views depends on costs.
          • Users often have no idea of the actual cost of operations.
      • Overall, manual selection of materialized views is tedious.
  • Some database systems provide tools to help the DBA choose views to materialize.
      • "Materialized view selection wizards"

Slide 15: Tuning of Transactions

  • Basic approaches to tuning of transactions:
      • Improve set orientation
      • Reduce lock contention
  • Rewriting of queries to improve performance was important in the past, but smart optimizers have made this less important.
  • Communication overhead and query handling overheads are a significant part of the cost of each call.
      • Combine multiple embedded SQL/ODBC/JDBC queries into a single set-oriented query.
          • Set orientation means fewer calls to the database.
          • E.g., tune a program that computes the total salary for each department using a separate SQL query per department by instead using a single query that computes total salaries for all departments at once (using group by).
      • Use stored procedures: this avoids re-parsing and re-optimization of the query.
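
The department-salary example above can be sketched with Python's built-in `sqlite3` module. The `employee` table, its columns, and the sample rows are hypothetical, chosen only to illustrate the difference between one query per department and a single group-by query.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a small in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        [("Ann", "sales", 50000), ("Bob", "sales", 60000), ("Cy", "eng", 80000)],
    )
    conn.commit()
    return conn


def totals_one_query_per_dept(conn: sqlite3.Connection) -> dict:
    """Untuned version: one call to the database for every department."""
    depts = [row[0] for row in conn.execute("SELECT DISTINCT dept FROM employee")]
    totals = {}
    for dept in depts:
        (total,) = conn.execute(
            "SELECT SUM(salary) FROM employee WHERE dept = ?", (dept,)
        ).fetchone()
        totals[dept] = total
    return totals


def totals_single_group_by(conn: sqlite3.Connection) -> dict:
    """Set-oriented version: a single query computes every department total."""
    rows = conn.execute("SELECT dept, SUM(salary) FROM employee GROUP BY dept").fetchall()
    return dict(rows)


if __name__ == "__main__":
    db = make_demo_db()
    print(totals_one_query_per_dept(db))
    print(totals_single_group_by(db))
```

Both functions return the same totals. With an in-process database the difference is small, but against a client-server system each extra call adds communication and query-handling overhead.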

Slide 16: Tuning of Transactions (Cont.)

  • Reducing lock contention
      • Long transactions (typically read-only) that examine large parts of a relation result in lock contention with update transactions.
          • E.g., a large query to compute bank statistics running alongside regular bank transactions.
      • To reduce contention:
          • Use multi-version concurrency control.
              • E.g., Oracle "snapshots," which support multi-version 2PL.
          • Use degree-two consistency (cursor stability) for long transactions.
              • Drawback: the result may be approximate.

Slide 17: Tuning of Transactions (Cont.)

  • Long update transactions cause several problems:
      • They exhaust lock space.
      • They exhaust log space, greatly increase recovery time after a crash, and may even exhaust log space during recovery if the recovery algorithm is badly designed!
  • Use mini-batch transactions to limit the number of updates that a single transaction can carry out. E.g., if a single large transaction updates every record of a very large relation, the log may grow too big.
      • Split the large transaction into a batch of "mini-transactions," each performing part of the updates.
      • Hold locks across transactions in a mini-batch to ensure serializability.
          • If lock table size is a problem, locks can be released, but at the cost of serializability.
      • In case of failure during a mini-batch, its remaining portion must be completed on recovery, to ensure atomicity.
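
A minimal sketch of the mini-batch idea, using Python's built-in `sqlite3` module. The `account` table, the `batch_progress` bookkeeping table, and the function name are all hypothetical. This version releases locks between mini-transactions, so it takes the option that gives up serializability; it records progress inside each mini-transaction so that an interrupted run can be resumed and the remaining portion completed.

```python
import sqlite3


def add_interest_in_mini_batches(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Add 1 to every account balance, committing after each mini-batch.

    Progress is stored in the same mini-transaction as the updates, so a
    rerun after a failure resumes where the last committed batch ended.
    Returns the number of mini-transactions committed by this call.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS batch_progress (last_id INTEGER)")
    last_id = conn.execute("SELECT MAX(last_id) FROM batch_progress").fetchone()[0] or 0
    batches = 0
    while True:
        ids = [
            row[0]
            for row in conn.execute(
                "SELECT id FROM account WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, batch_size),
            )
        ]
        if not ids:
            break
        with conn:  # one mini-transaction: commit on success, roll back on error
            conn.execute(
                "UPDATE account SET balance = balance + 1 WHERE id > ? AND id <= ?",
                (last_id, ids[-1]),
            )
            conn.execute("INSERT INTO batch_progress VALUES (?)", (ids[-1],))
        last_id = ids[-1]
        batches += 1
    return batches


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
    db.executemany("INSERT INTO account VALUES (?, 0)", [(i,) for i in range(1, 2501)])
    db.commit()
    print("mini-transactions committed:", add_interest_in_mini_batches(db))
```

Each mini-transaction touches at most `batch_size` rows, which bounds the log and lock space needed at any one time.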

Slide 18: Performance Simulation

  • Performance simulation using a queueing model is useful for predicting bottlenecks as well as the effects of tuning changes, even without access to the real system.
  • Queueing model as we saw earlier
      • Models activities that go on in parallel.
  • The simulation model is quite detailed, but usually omits some low-level details.
      • Model service time, but disregard the details of the service.
      • E.g., approximate disk read time by using an average disk read time.
  • Experiments can be run on the model and provide an estimate of measures such as average throughput and response time.
  • Parameters can be tuned in the model and then replicated in the real system.
      • E.g., number of disks, memory, algorithms, etc.

Slide 19: Performance Benchmarks

  • Suites of tasks used to quantify the performance of software systems.
  • Important in comparing database systems, especially as systems become more standards compliant.
  • Commonly used performance measures:
      • Throughput (transactions per second, or tps)
      • Response time (delay from submission of a transaction to return of the result)
      • Availability or mean time to failure

Slide 20: Performance Benchmarks (Cont.)

  • Suites of tasks are used to characterize performance; a single task is not enough for complex systems.
  • Beware when computing the average throughput of different transaction types.
      • E.g., suppose a system runs transaction type A at 99 tps and transaction type B at 1 tps.
      • Given an equal mixture of types A and B, throughput is not (99+1)/2 = 50 tps.
      • Running one transaction of each type takes about 1 + 0.01 seconds, giving a throughput of 1.98 tps.
      • To compute average throughput, use the harmonic mean of the per-type throughputs t_1, ..., t_n:
        $$\frac{n}{\frac{1}{t_1} + \frac{1}{t_2} + \cdots + \frac{1}{t_n}}$$
      • Interference (e.g., lock contention) makes even this incorrect if different transaction types run concurrently.
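
The 99 tps / 1 tps example can be checked with a few lines of code. This sketch uses a hypothetical function name and assumes an equal mixture of transaction types with no interference between them, as in the slide's example.

```python
def average_throughput(tps_per_type: list) -> float:
    """Harmonic mean of per-type throughputs: n / (1/t1 + ... + 1/tn).

    Assumes an equal mixture of transaction types and no interference.
    """
    if not tps_per_type or any(t <= 0 for t in tps_per_type):
        raise ValueError("need at least one positive throughput")
    return len(tps_per_type) / sum(1.0 / t for t in tps_per_type)


if __name__ == "__main__":
    rates = [99.0, 1.0]
    print("arithmetic mean:", sum(rates) / len(rates))
    print("harmonic mean:  ", round(average_throughput(rates), 2))
```

The arithmetic mean overstates the result because the slow transaction type dominates the elapsed time.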

Slide 21: Database Application Classes

  • Online transaction processing (OLTP)
      • Requires high concurrency and clever techniques to speed up commit processing, to support a high rate of update transactions.
  • Decision support applications
      • Include online analytical processing (OLAP) applications.
      • Require good query evaluation algorithms and query optimization.
  • The architecture of some database systems is tuned to one of the two classes.
      • E.g., Teradata is tuned to decision support.
  • Others try to balance the two requirements.
      • E.g., Oracle, with snapshot support for long read-only transactions.

Slide 22: Benchmark Suites

  • The Transaction Processing Performance Council (TPC) benchmark suites are widely used.
      • TPC-A and TPC-B: simple OLTP application modeling a bank teller application, with and without communication.
          • Not used anymore.
      • TPC-C: complex OLTP application modeling an inventory system.
          • Current standard for OLTP benchmarking.

Slide 23: Benchmark Suites (Cont.)

  • TPC benchmarks (cont.)
      • TPC-D: complex decision support application.
          • Superseded by TPC-H and TPC-R.
      • TPC-H (H for ad hoc): based on TPC-D with some extra queries.
          • Models ad hoc queries which are not known beforehand.
              • Total of 22 queries, with emphasis on aggregation.
          • Prohibits materialized views.
          • Permits indices only on primary and foreign keys.
      • TPC-R (R for reporting): same as TPC-H, but without any restrictions on materialized views and indices.
      • TPC-W (W for Web): end-to-end Web service benchmark modeling a Web bookstore, with a combination of static and dynamically generated pages.

Slide 24: TPC Performance Measures

  • TPC performance measures
      • Transactions per second, with specified constraints on response time.
      • Transactions per second per dollar, which accounts for the cost of owning the system.
  • TPC benchmarks require database sizes to be scaled up with increasing transactions per second.
      • This reflects real-world applications, where more customers means a larger database and more transactions per second.
  • External audit of TPC performance numbers is mandatory.
      • TPC performance claims can therefore be trusted.

Slide 25: TPC Performance Measures

  • Two types of tests for TPC-H and TPC-R:
      • Power test: runs queries and updates sequentially, then takes the mean to find queries per hour.
      • Throughput test: runs queries and updates concurrently.
          • Multiple streams running in parallel each generate queries, with one parallel update stream.
      • Composite query-per-hour metric: square root of the product of the power and throughput metrics.
      • Composite price/performance metric.
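
The composite metric described above is the geometric mean of the two test results. A short sketch, with a hypothetical function name and made-up input values:

```python
import math


def composite_queries_per_hour(power: float, throughput: float) -> float:
    """Square root of the product of the power and throughput metrics."""
    if power < 0 or throughput < 0:
        raise ValueError("metrics must be non-negative")
    return math.sqrt(power * throughput)


if __name__ == "__main__":
    # Hypothetical results: 400 queries/hour (power), 100 queries/hour (throughput).
    print(composite_queries_per_hour(400.0, 100.0))
```

Because the result is a geometric mean, a system cannot score well by excelling at only one of the two tests.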

Slide 26: Other Benchmarks

  • OODB transactions require a different set of benchmarks.
      • The OO7 benchmark has several different operations, and provides a separate benchmark number for each kind of operation.
      • Reason: it is hard to define what a typical OODB application is.
  • Benchmarks for XML are being discussed.

Slide 27: Standardization

  • The complexity of contemporary database systems and the need for their interoperation require a variety of standards:
      • syntax and semantics of programming languages
      • functions in application program interfaces
      • data models (e.g., object-oriented/object-relational databases)
  • Formal standards are standards developed by a standards organization (ANSI, ISO) or by industry groups, through a public process.
  • De facto standards are generally accepted as standards without any formal process of recognition.
      • Standards defined by dominant vendors (IBM, Microsoft) often become de facto standards.
      • De facto standards often go through a formal process of recognition and become formal standards.

Slide 28: Standardization (Cont.)

  • Anticipatory standards lead the marketplace, defining features that vendors then implement.
      • They ensure compatibility of future products.
      • But at times they become very large and unwieldy, since standards bodies may not pay enough attention to ease of implementation (e.g., SQL-92 or SQL:1999).
  • Reactionary standards attempt to standardize features that vendors have already implemented, possibly in different ways.
      • It can be hard to convince vendors to change already implemented features. E.g., OODB systems.

Slide 29: SQL Standards History

  • SQL was developed by IBM in the late 70s/early 80s.
  • SQL-86 was the first formal standard.
  • IBM SAA standard for SQL in 1987.
  • SQL-89 added features to SQL-86 that were already implemented in many systems.
      • It was a reactionary standard.
  • SQL-92 added many new features to SQL-89 (anticipatory standard).
      • Defines levels of compliance (entry, intermediate, and full).
      • Even now, few database vendors have a full SQL-92 implementation.

Slide 30: SQL Standards History (Cont.)

  • SQL:1999
      • Adds a variety of new features: extended data types, object orientation, procedures, triggers, etc.
      • Broken into several parts:
          • SQL/Framework (Part 1): overview
          • SQL/Foundation (Part 2): types, schemas, tables, query/update statements, security, etc.
          • SQL/CLI (Call Level Interface) (Part 3): API interface
          • SQL/PSM (Persistent Stored Modules) (Part 4): procedural extensions
          • SQL/Bindings (Part 5): embedded SQL for different embedding languages

Slide 31: SQL Standards History (Cont.)

  • More parts are undergoing the standardization process:
      • Part 7: SQL/Temporal: temporal data
      • Part 9: SQL/MED (Management of External Data)
          • Interfacing of the database to external data sources.
          • Allows other databases, and even files, to be viewed as part of the database.
      • Part 10: SQL/OLB (Object Language Bindings): embedding SQL in Java
      • The missing part numbers 6 and 8 cover features that are not near standardization yet.

Slide 32: Database Connectivity Standards

  • Open DataBase Connectivity (ODBC): standard for database interconnectivity.
      • Based on the Call Level Interface (CLI) developed by the X/Open consortium.
      • Defines an application programming interface, and the SQL features that must be supported at different levels of compliance.
  • JDBC: standard used for Java.
  • X/Open XA standards define transaction management standards for supporting distributed 2-phase commit.
  • OLE-DB: API like ODBC, but intended to support non-database sources of data such as flat files.
      • An OLE-DB program can negotiate with the data source to find what features are supported.
      • The interface language may be a subset of SQL.
  • ADO (ActiveX Data Objects): easy-to-use interface to OLE-DB functionality.

Slide 33: Object-Oriented Database Standards

  • Object Database Management Group (ODMG): standard for object-oriented databases.
      • Version 1 in 1993, version 2 in 1997, and version 3 in 2000.
      • Provides a language-independent Object Definition Language (ODL) as well as several language-specific bindings.
  • Object Management Group (OMG): standard for distributed software based on objects.
      • Object Request Broker (ORB) provides transparent message dispatch to distributed objects.
      • Interface Definition Language (IDL) for defining language-independent data types.
      • Common Object Request Broker Architecture (CORBA) defines the specifications of the ORB and IDL.

Slide 34: XML-Based Standards

  • Several XML-based standards for e-commerce
      • E.g., RosettaNet (supply chain), BizTalk.
      • Define catalogs, service descriptions, invoices, purchase orders, etc.
      • XML wrappers are used to export information from relational databases to XML.
  • Simple Object Access Protocol (SOAP): XML-based remote procedure call standard.
      • Uses XML to encode data and HTTP as the transport protocol.
      • Standards based on SOAP exist for specific applications.
          • E.g., OLAP and data mining standards from Microsoft.

Slide 35: E-Commerce

  • E-commerce is the process of carrying out various activities related to commerce through electronic means.
  • Activities include:
      • Presale activities: catalogs, advertisements, etc.
      • Sale process: negotiations on price/quality of service.
      • Marketplace: e.g., stock exchange, auctions, reverse auctions.
      • Payment for the sale.
      • Delivery-related activities: electronic shipping, or electronic tracking of order processing/shipping.
      • Customer support and post-sale service.

Slide 36: E-Catalogs

  • Product catalogs must provide searching and browsing facilities:
      • Organize products into an intuitive hierarchy.
      • Keyword search.
      • Help the customer with comparison of products.
  • Customization of the catalog:
      • Negotiated pricing for specific organizations.
      • Special discounts for customers based on past history.
          • E.g., loyalty discount.
      • Legal restrictions on sales.
          • Certain items are not exposed to under-age customers.
  • Customization requires extensive customer-specific information.

Slide 37: Marketplaces

  • Marketplaces help in negotiating the price of a product when there are multiple sellers and buyers.
  • Several types of marketplaces:
      • Reverse auction
      • Auction
      • Exchange
  • Real-world marketplaces can be quite complicated due to product differentiation.
  • Database issues:
      • Authenticate bidders.
      • Record buy/sell bids securely.
      • Communicate bids quickly to participants.
          • Delays can lead to financial loss for some participants.
      • Need to handle very large volumes of trade at times.
          • E.g., at the end of an auction.

Slide 38: Types of Marketplace

  • Reverse auction system: single buyer, multiple sellers.
      • The buyer states requirements, and sellers bid for supplying the items. The lowest bidder wins. (Also known as a tender system.)
      • Open bidding vs. closed bidding.
  • Auction: multiple buyers, single seller.
      • Simplest case: only one instance of each item is being sold.
      • The highest bidder for an item wins.
      • More complicated with multiple copies, where buyers bid for a specific number of copies.
  • Exchange: multiple buyers, multiple sellers.
      • E.g., stock exchange.
      • Buyers specify a maximum price, and sellers specify a minimum price.
      • The exchange matches buy and sell bids, deciding on the price for the trade.
          • E.g., the average of the buy/sell bids.
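
The exchange case can be illustrated with a very small matching routine. This is only one possible matching rule, chosen for simplicity and not taken from the slides: pair the highest remaining buy bid with the lowest remaining sell bid while the buyer's maximum is at least the seller's minimum, and trade at the average of the two. The function name and the bid values are hypothetical.

```python
def match_orders(buy_bids: list, sell_bids: list) -> list:
    """Match buy and sell bids for single units of one item.

    Returns a list of (buy_bid, sell_bid, trade_price) tuples, where the
    trade price is the average of the two bids.
    """
    buys = sorted(buy_bids, reverse=True)  # most eager buyers first
    sells = sorted(sell_bids)              # cheapest sellers first
    trades = []
    for buy, sell in zip(buys, sells):
        if buy < sell:
            break  # no remaining pair can trade
        trades.append((buy, sell, (buy + sell) / 2))
    return trades


if __name__ == "__main__":
    for trade in match_orders([10, 8, 5], [6, 7, 9]):
        print(trade)
```

A real exchange would also need to authenticate bidders, record bids durably, and handle quantities and time priority, as the database issues listed for marketplaces suggest.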

Slide 39: Order Settlement

  • Order settlement: payment for goods and delivery.
  • Insecure means for electronic payment: sending a credit card number.
      • Buyers may present someone else's credit card number.
      • The seller has to be trusted to bill only for the agreed-on item.
      • The seller has to be trusted not to pass on the credit card number to unauthorized people.
  • Secure payment systems are needed.
      • They avoid the above-mentioned problems.
      • They provide a greater degree of privacy.
          • E.g., not revealing the buyer's identity to the seller.
      • They ensure that anyone monitoring the electronic transmissions cannot access critical information.

Slide 40: Secure Payment Systems

  • All information must be encrypted to prevent eavesdropping.
      • Public/private key encryption is widely used.
  • Must prevent person-in-the-middle attacks.
      • E.g., someone impersonates the seller or the bank/credit card company and fools the buyer into revealing information.
          • Encrypting messages alone doesn't solve this problem.
          • More on this in the next slide.
  • Three-way communication between seller, buyer, and credit card company to make a payment:
      • The credit card company credits the amount to the seller.
      • The credit card company consolidates all payments from a buyer and collects them together.
          • E.g., via the buyer's bank through physical/electronic check payment.

Slide 41: Secure Payment Systems (Cont.)

  • Digital certificates are used to prevent impersonation/man-in-the-middle attacks.
      • A certification agency creates a digital certificate by encrypting, e.g., the seller's public key using its own private key.
          • It verifies the seller's identity by external means first!
      • The seller sends the certificate to the buyer.
      • The customer uses the public key of the certification agency to decrypt the certificate and find the seller's public key.
          • A man-in-the-middle cannot send a fake public key.
      • The seller's public key is used for setting up secure communication.
  • Several secure payment protocols exist.
      • E.g., Secure Electronic Transaction (SET).
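
The certificate flow described above can be sketched with textbook RSA on tiny numbers. This is a toy for illustration only and is not secure: the key (built from the primes 61 and 53) is far too small, there is no padding or hashing, and the seller's public key is represented by a single small integer. All names and values are assumptions made for the example, not part of any real protocol.

```python
# Toy certification agency key pair (textbook RSA with p = 61, q = 53).
CA_MODULUS = 3233          # n = 61 * 53
CA_PUBLIC_EXPONENT = 17    # e, published to all customers
CA_PRIVATE_EXPONENT = 2753 # d, known only to the agency (17 * 2753 = 1 mod 3120)


def issue_certificate(seller_public_key: int) -> int:
    """Agency side: 'encrypt' the seller's public key with the agency's private key."""
    if not 0 <= seller_public_key < CA_MODULUS:
        raise ValueError("toy example: key must be smaller than the modulus")
    return pow(seller_public_key, CA_PRIVATE_EXPONENT, CA_MODULUS)


def recover_seller_key(certificate: int) -> int:
    """Customer side: 'decrypt' the certificate with the agency's public key."""
    return pow(certificate, CA_PUBLIC_EXPONENT, CA_MODULUS)


if __name__ == "__main__":
    seller_key = 1234  # stand-in for the seller's real public key
    cert = issue_certificate(seller_key)
    print("certificate:", cert)
    print("key recovered by customer:", recover_seller_key(cert))
```

Only the holder of the agency's private exponent can produce a certificate that decrypts to a chosen key, which is why an impersonator cannot substitute its own public key. Production systems sign a hash of the certificate contents with a vetted cryptographic library instead.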

Slide 42: Digital Cash

  • Credit card payment does not provide anonymity.
      • The SET protocol hides the buyer's identity from the seller.
      • But even with SET, the buyer can be traced with the help of the credit card company.
  • Digital cash systems provide anonymity similar to that provided by physical cash.
      • E.g., DigiCash.
      • Based on encryption techniques that make it impossible to find out who purchased digital cash from the bank.
      • Digital cash can be spent by the purchaser in parts.
          • This is much like writing a check on an account whose owner is anonymous.

Slide 43: Legacy Systems

  • Legacy systems are older-generation systems that are incompatible with current-generation standards and systems but are still in production use.
      • E.g., applications written in Cobol that run on mainframes.
      • Today's hot new system is tomorrow's legacy system!
  • Porting legacy system applications to a more modern environment is problematic.
      • It is very expensive, since a legacy system may involve millions of lines of code, written over decades.
          • The original programmers are usually no longer available.
      • Switching over from the old system to the new system is a problem (more on this later).
  • One approach: build a wrapper layer on top of the legacy application to allow interoperation between newer systems and the legacy application.
      • E.g., use ODBC or OLE-DB as a wrapper.

Slide 44: Legacy Systems (Cont.)

  • Rewriting a legacy application requires a first phase of understanding what it does.
      • Often legacy code has no documentation or outdated documentation.
      • Reverse engineering: the process of going over legacy code to
          • come up with schema designs in the ER or OO model, and
          • find out what procedures and processes are implemented, to get a high-level view of the system.
  • Re-engineering: reverse engineering followed by design of a new system.
      • Improvements are made on the existing system design in this process.

Slide 45: Legacy Systems (Cont.)

  • Switching over from the old to the new system is a major problem.
      • Production systems are in use every day, generating new data.
      • Stopping the system may bring all of a company's activities to a halt, causing enormous losses.
  • Big-bang approach:
      • Implement the complete new system.
      • Populate it with data from the old system.
          • No transactions are run while this step is executed.
          • Scripts are created to do this quickly.
      • Shut down the old system and start using the new system.
      • Danger with this approach: what if the new code has bugs, performance problems, or missing features?
          • The company may be brought to a halt.

Slide 46: Legacy Systems (Cont.)

  • Chicken-little approach:
      • Replace the legacy system one piece at a time.
      • Use wrappers to interoperate between legacy and new code.
          • E.g., replace the front end first, with wrappers on the legacy back end.
              • The old front end can continue working in this phase in case of problems with the new front end.
          • Replace the back end, one functional unit at a time.
              • All parts that share a database may have to be replaced together, or a wrapper is needed on the database also.
      • Drawback: significant extra development effort to build wrappers and ensure smooth interoperation.
          • Still worth it if the company's life depends on the system.

Slide 47: End of Chapter
