Database Design Goals, Normalization, Normal Forms
اسلاید 1: Recap of Feb 20: Database Design Goals, Normalization, Normal FormsGoals for designing a database: a schema with:simple, easy to phrase queriesavoids redundancies (repetition of information)avoids anomalies good performanceNormalizationdecompose complex relationsLossy decompositionsFunctional DependenciesNormal Forms: 1NF, 2NF, 3NF, BCNFBCNF or 3NF: lossless decomposition in bothBCNF can’t always ensure dependency preservation3NF sometimes requires null values or redundant information
اسلاید 2: Getting Physical: Storage and File Structure (Chapter 11)Up until now we have examined database design from a high-level conceptual view, passing over actual implementation and underlying hardware.Appropriate focus for database usersBut hardware does have an influence on implementation, and implementation does have an influence on what conceptual designs will be more efficient and usefulNow we get physical -- examine physical storage media to give a background for later focus on implementation of the data models and languages already described
اسلاید 3: Chapter 11At this point we are focussing on the following sections11.1 Overview of Physical Storage Media11.2 Magnetic Disks11.3 RAID (very briefly)11.4 Tertiary Storage11.5 Storage Access11.6 File Organization11.7 Organization of Records in Files11.8 Data-Dictionary Storage
اسلاید 4: Midterm!As per the syllabus, the Midterm will cover the pre-midterm lecture notes plus sections 1,2,3 (except 3.4 & 3.5), 4, 6, 7-7.7, 11 (except 11.3 and 11.9) of the textWe have examined all that material except chapter 11, which starts todayScheduling the midterm depends upon the speed with which we get through the material.It looks as if we’ll have enough time to schedule one day for review before the midtermMy current guess is that we might be able to schedule the midterm as early as March 13; certainly no later than March 20. I’m aiming for Tuesday, March 18.
اسلاید 5: Midterm Study and Homework #2Material you are responsible for:All material presented in class before the midtermTextbook sections 1,2,3 (except 3.4 & 3.5), 4, 6, 7-7.7, 11 (except 11.3 and 11.9) The homework questions from assignment 1 (exercises 1.1, 1.2, 1.3, and 2.1-2.6) are all useful study aids, as are the questions from homework assignment #2:3.2, 3.3, 3.5, 3.6 -- 3.9, 3.164.1, 4.2, 4.4 -- 4.87.2, 7.4, 7.5, 7.11, 7.12, 7.15, 7.16, 7.21, 7.23Homework #2 is due Tuesday, March 11. I’ll try to have them back to you, graded, Thursday March 13 so you can use them as a study aid for the exam Tuesday, March 18.
اسلاید 6: Classification of Physical Storage MediaMedia are classified according to three characteristics:speed of accesscost per unit of datareliabilitydata loss on power failure or system crashphysical failure of the storage deviceWe can also differentiate storage as eithervolatile storagenon-volative storage
اسلاید 7: Physical Storage Media Overview (11.1)Typical media available are:CacheMain memoryFlash memoryMag diskOptical storage (CD or DVD)Tape storage
اسلاید 8: Physical Storage Media -- Cache and Main MemoryCachefastest and most costly form of storagevolatilemanaged by computer system hardwareMain memoryfast access (10s to 100s of nanoseconds)generally too small or expensive to hold the entire databasecurrent capacities commonly used are up to a few Gigabites capacities have gone up and per-byte costs have decreased steadily, roughly a factor of 2 every 2-3 yearsvolatile
اسلاید 9: Physical Storage Media -- Flash MemoryAlso known as EEPROM -- Electrically Erasable Programmable Read-Only Memorynon-volatilereading data is comparable to main memory speedswriting is more complexcan’t overwrite a single location -- a whole bank of memory must be erased to permit writing within that bank.erasing is only supported a limited number of times -- 10,000 to one million erase cycleswrites are slow (a few microseconds), and erases are slowercost comparable to main memorywidely used in computer systems embedded in other devices, such as digital cameras and hand-held computers
اسلاید 10: Physical Storage Media -- Magnetic Diskdata is stored on a spinning disk and read/written magneticallyprimary medium for long-term storage of datatypically stores entire databasedata must be moved from disk to main memory for access, and written back for storagemuch slower access than main memory (about which more later)direct access -- possible to read data on disk in any order, unlike magnetic tapecapacities up to 100 gigmuch larger capacity and cheaper cost/byte than main memory or flash memorycapacity doubles every two or three yearssurvives power failures and system crashesdisk failure can destroy data, but this is more rare than system crashes
اسلاید 11: Physical Storage Media -- Optical StorageNon-volatile; data is read optically from a spinning disk using a laserCD-ROM (640 MB) and DVD (4.7 to 17 GB) most popular formsWrite-once, Read-many (WORM) optical disks used for archival storage (CD-R and DCD-R)Multiple-write versions also available (CD-RW, DVD-RW, and DVD-RAM)Reads and writes are slower than with magnetic diskJuke-box systems available for storing large volumes of datalarge numbers of removable disksseveral drivesmechanism for automatic loading/unloading of disks
اسلاید 12: Physical Storage Media -- Tape StorageNon-volatileused primarily for backup (to recover from disk failure) and for archival datasequential access -- much slower than diskvery high capacity (40-300 GB tapes available)tape can be removed from drive; storage costs much cheaper than disk, but drives are expensive; data is read optically from a spinning disk using a laserJuke-box systems available for storing large volumes of datae.g., remote sensing data, possibly hundreds of terabytes (1012 bytes) or even a petabyte (1015 bytes)
اسلاید 13: Storage HierarchyPrimary storage: fastest media but volatilecachemain memorysecondary storage: next level in hierarchy; moderately fast access time, non-volatilealso called on-line storageflash memory, magnetic diskstertiary storage: lowest level in hierarchy; slower access time, non-volatilealso called off-line storageoptical storage, magnetic tape
اسلاید 14: Magnetic Disks (11.2)Read-write headpositioned very close to the platter surface (almost touching it)Reads or writes magnetically coded informationSurface of platter divided into circular tracksover 16,000 tracks per platter on typical hard disksEach track is divided into sectorsa sector is the smallest unit of data that can be read or writtensector size is typically 512 bytestypically 200 (on inner tracks) to 400 (outer tracks) sectors per track
اسلاید 15: Magnetic Disks (cont)To read/write a sectordisk arm swings to position head on the right trackplatter spins continually; data is read/written as sector passes under headHead-disk assembliesmultiple disk platters on a single spindle (typically 2 to 4)one head per platter, mounted on a common armCylinder i consists of ith track of all the platters
اسلاید 16: Magnetic Disks (cont)Earlier generation disks were susceptible to head crashesdisk spins constantly at 60, 120, even 250 revolutions per secondhead is very close to the surface; if it touches the surface it can scrape the recording medium off the surface, wiping out data and causing the removed medium to fly around, causing more head crashesnewer disks have less friable material; less subject to head crashes
اسلاید 17: Magnetic Disks (cont)Disk controller -- interfaces between the computer system and the disk driveaccepts high-level commands to read or write a sectorinitiates actions such as moving the disk arm to the right track and actually reading or writing the datacomputes and attaches checksums to each sector to verify that data is read back correctlyEnsures successful writing by reading back sector after writing itPerforms remapping of bad sectorsMultiple disks are connected to a computer system through a controllercontrollers functionality (checksum, bad sector remapping) often carried out by individual disks, reducing load on controllerTwo disk interface standards are ATA (AT attachment) and SCSI (Small Computer System Interconnect)
اسلاید 18: Disk Performance MeasuresAccess time -- the time it takes from when a read or write request is issued to when data transfer begins. Consists of:Seek time -- time it takes to reposition the arm over the correct trackaverage seek time is 1/2 the worst case seek time4 to 10 milliseconds on typical disksRotational latency -- time it takes for the sector to appear under the headaverage latency is 1/2 the worst case latency4 to 11 milliseconds on typical disk (5400 to 15000 rpm)Data Transfer Rate -- rate of retrieval from or storage to disk4 to 8 MB per second is typicalMultiple disks may share a controller, so the rate that controller can handle is also important. E.G., ATA-5: 66MB/sec, SCSI-3: 40MB/sec, Fiber Channel: 256 MB/s
اسلاید 19: Disk Performance Measures (cont.)Mean time to failure (MTTF) - the average time the disk is expected to run continuously without any failure.Typically 3-5 yearsSounds good, but if you have 1500 disks then 300 per year will fail, or about 1 per dayMTTF decreases as the disk agesRAID (Redundant Arrays of Independent Disks) (11.3)disk organization techniques that manage a large number of disks, providing a view of a single disk ofhigh capacity and high speed by using multiple disks in parallelhigh reliability by storing data redundantly, so that data can be recovered even if a disk failsMTTdata loss can be as high as 500,000 to 1,000,000 hours on a RAID
اسلاید 20: Optimization of Disk-Block Access: MotivationRequests for disk I/O are generated both by the file system and by the virtual memory managerEach request specifies the address on the disk to be referenced in the form of a block numbera block is a contiguous sequence of sectors from a single track on one platterblock sizes range from 512 bytes to several K (4 -- 16K is typical)smaller blocks mean more transfers from disk; larger blocks makes for more wasted space due to partially filled blocksblock is the standard unit of data transfer between disk to main memorySince disk access speed is much slower than main memory access, methods for optimizing disk-block access are important
اسلاید 21: Optimization of Disk-Block Access: MethodsDisk-arm Scheduling: requests for several blocks may be speeded up by requesting them in the order they will pass under the head.If the blocks are on different cylinders, it is advantageous to ask for them in an order that minimizes disk-arm movementElevator algorithm -- move the disk arm in one direction until all requests from that direction are satisfied, then reverse and repeatSequential access is 1-2 orders of magnitude faster; random access is about 2 orders of magnitude slower
اسلاید 22: Optimization of Disk-Block Access: MethodsNon-volatile write buffersstore written data in a RAM buffer rather than on diskwrite the buffer whenever it becomes full or when no other disk requests are pendingbuffer must be non-volatile to protect from power failurecalled non-volatile random-access memory (NV-RAM)typically implemented with battery-backed-up RAMdramatic speedup on writes; with a reasonable-sized buffer write latency essentially disappearswhy can’t we do the same for reads? (hints: ESP, clustering)
اسلاید 23: Optimization of Disk-Block Access: MethodsFile organization (Clustering): reduce access time by organizing blocks on disk in a way that corresponds closely to the way we expect them to be accessedsequential files should be kept organized sequentiallyhierarchical files should be organized with mothers next to daughtersfor joining tables (relations) put the joining tuples next to each otherover time fragmentation can become an issuerestoration of disk structure (copy and rewrite, reordered) controls fragmentation
اسلاید 24: Optimization of Disk-Block Access: MethodsLog-based file systemdoes not update in-place, rather writes updates to a log diskessentially, a disk functioning as a non-volatile RAM write bufferall access in the log disk is sequential, eliminating seek timeeventually updates must be propogated to the original blocksas with NV-RAM write buffers, this can occur at a time when no disk requests are pendingthe updates can be ordered to minimize arm movementthis can generate a high degree of fragmentation on files that require constant updatesfragmentation increases seek time for sequential reading of files
اسلاید 25: Storage Access (11.5)Basic concepts (some already familiar):block-based. A block is a contiguous sequence of sectors from a single track; blocks are units of both storage allocation and data transfera file is a sequence of records stored in fixed-size blocks (pages) on the diskeach block (page) has a unique address called BIDoptimization is done by reducing I/O, seek time, etc.database systems seek to minimize the number of block transfers between the disk and memory. We can reduce the number of disk accesses by keeping as many blocks as possible in main memory.Buffer - portion of main memory used to store copies of disk blocksbuffer manager - subsystem responsible for allocating buffer space in main memory and handling block transfer between buffer and disk
اسلاید 26: Buffer ManagementThe buffer pool is the part of the main memory alocated for temporarily storing disk blocks read from disk and made available to the CPUThe buffer manager is the subsystem responsible for the allocation and the management of the buffer space (transparent to users)On a process (user) request for a block (page) the buffer manager:checks to see if the page is already in the buffer poolif so, passes the address to the processif not, it loads the page from disk and then passes the address to the processloading a page might require clearing (writing out) a page to make spaceVery similar to the way virtual memory managers work, although it can do a lot better (why?)
اسلاید 27: Buffer Replacement StrategiesMost operating systems use a LRU replacement scheme. In database environments, MRU is better for some common operations (e.g., join)LRU strategy: replace the least recently used blockMRU strategy: replace the most recently used blockSometimes it is useful to fasten or pin blocks to keep them available during an operation and not let the replacement strategy touch thempinned block is thus a block that is not allowed to be written back to diskThere are situations where it is necessary to write back a block to disk even though the buffer space it occupies is not yet needed. This write is called the forced output of a block; useful in recovery situationsToss-immediate strategy: free the space occupied by a block as soon as the final tuple of that block has been processed
اسلاید 28: Buffer Replacement StrategiesMost recently used (MRU) strategy: system must pin the block currently being processed. After the final tuple of that block has been processed the block is unpinned and becomes the most recently used block. This is essentially “toss-immediate” with pinning, and works very well with joins.The buffer manager can often use other information (design or statistical) to predict the probability that a request will reference a particular pagee.g., the data dictionary is frequently accessed -- keep the data dictionary blocks in main memory bufferif several pages are available for overwrite; choose the one that has the lowest number of recent access requests to replace
اسلاید 29: Buffer Management (cont)Existing OS affect DBMS operations by:read ahead, write behindwrong replacement strategiesUnix is not good for DBMS to run on topMost commercial systems implement their own I/O on a raw disk partitionVariations of buffer allocationcommon buffer pool for all relationsseparate buffer pool for each relationas above but with relations borrowing space from each otherprioritized buffers for very frequently accessed blocks, e.g. data dictionary
اسلاید 30: Buffer Management (cont)For each buffer the manager keeps the following:which disk and which block it is inwhether the block is dirty (has been modified) or not (why?)information for the replacement strategylast time block was accessedwhether it is pinnedpossible statistical information (access frequency etc.)
نقد و بررسی ها
هیچ نظری برای این پاورپوینت نوشته نشده است.