Slide 1:
Recap of Feb 20: Database
Design Goals, Normalization,
Normal Forms
* Goals for designing a database: a schema with:
- simple, easy to phrase queries
- avoids redundancies (repetition of information)
- avoids anomalies
- good performance
* Normalization
- decompose complex relations
- Lossy decompositions
- Functional Dependencies
* Normal Forms: 1NF, 2NF, 3NF, BCNF
- BCNF or 3NF: lossless decomposition in both
+ BCNF can’t always ensure dependency preservation
* 3NF sometimes requires null values or redundant information
Slide 2:
Getting Physical: Storage and
File Structure (Chapter 11)
* Up until now we have examined database design
from a high-level conceptual view, passing over
actual implementation and underlying hardware.
- Appropriate focus for database users
- But hardware does have an influence on implementation,
and implementation does have an influence on what
conceptual designs will be more efficient and useful
* Now we get physical -- examine physical storage
media to give a background for later focus on
implementation of the data models and languages
already described
Slide 3:
Chapter 11
At this point we are focusing on the following sections:
* 11.1 Overview of Physical Storage Media
* 11.2 Magnetic Disks
* 11.3 RAID (very briefly)
* 11.4 Tertiary Storage
* 11.5 Storage Access
* 11.6 File Organization
* 11.7 Organization of Records in Files
* 11.8 Data-Dictionary Storage
Slide 4:
Midterm!
As per the syllabus, the Midterm will cover the pre-midterm lecture
notes plus sections 1, 2, 3 (except 3.4 & 3.5), 4, 6, 7-7.7, and 11
(except 11.3 and 11.9) of the text.
We have examined all that material except chapter 11,
which starts today
Scheduling the midterm depends upon the speed with
which we get through the material.
It looks as if we’ll have enough time to schedule one
day for review before the midterm
My current guess is that we might be able to schedule
the midterm as early as March 13; certainly no later
than March 20. I’m aiming for Tuesday, March 18.
Slide 5:
Midterm Study and Homework
#2
* Material you are responsible for:
- All material presented in class before the midterm
- Textbook sections 1,2,3 (except 3.4 & 3.5), 4, 6, 7-7.7, 11
(except 11.3 and 11.9)
* The homework questions from assignment 1 (exercises
1.1, 1.2, 1.3, and 2.1-2.6) are all useful study aids, as are
the questions from homework assignment #2:
- 3.2, 3.3, 3.5, 3.6 -- 3.9, 3.16
- 4.1, 4.2, 4.4 -- 4.8
- 7.2, 7.4, 7.5, 7.11, 7.12, 7.15, 7.16, 7.21
* Homework #2 is due Tuesday, March 11. I'll try to have
them back to you, graded, Thursday March 13 so you can
use them as a study aid for the exam Tuesday, March 18.
Slide 6:
Classification of Physical
Storage Media
* Media are classified according to three characteristics:
- speed of access
- cost per unit of data
- reliability
* data loss on power failure or system crash
* physical failure of the storage device
* We can also differentiate storage as either
- volatile storage
- non-volatile storage
Slide 7:
Physical Storage Media
Overview (11.1)
* Typical media available are:
- Cache
- Main memory
- Flash memory
- Magnetic disk
- Optical storage (CD or DVD)
- Tape storage
Slide 8:
Physical Storage Media -- Cache
and Main Memory
* Cache
- fastest and most costly form of storage
- volatile
- managed by computer system hardware
* Main memory
- fast access (10s to 100s of nanoseconds)
- generally too small or expensive to hold the entire database
* current capacities commonly used are up to a few gigabytes
* capacities have gone up and per-byte costs have decreased
steadily, roughly a factor of 2 every 2-3 years
- volatile
Slide 9:
Physical Storage Media -- Flash
Memory
* Also known as EEPROM -- Electrically Erasable Programmable
Read-Only Memory
* non-volatile
* reading data is comparable to main memory speeds
* writing is more complex
- can't overwrite a single location -- a whole bank of memory
must be erased to permit writing within that bank
- erasing is only supported a limited number of times -- 10,000
to one million erase cycles
- writes are slow (a few microseconds), and erases are slower
* cost comparable to main memory
* widely used in computer systems embedded in other
devices, such as digital cameras and hand-held computers
Slide 10:
Physical Storage Media --
Magnetic Disk
* data is stored on a spinning disk and read/written magnetically
* primary medium for long-term storage of data
* typically stores the entire database
* data must be moved from disk to main memory for access, and
written back for storage
- much slower access than main memory (about which more later)
* direct access -- possible to read data on disk in any order, unlike
magnetic tape
* capacities up to 100 GB
- much larger capacity and cheaper cost/byte than main memory or flash
memory
- capacity doubles every two or three years
* survives power failures and system crashes
- disk failure can destroy data, but this is rarer than system crashes
Slide 11:
Physical Storage Media --
Optical Storage
* Non-volatile; data is read optically from a spinning disk
using a laser
* CD-ROM (640 MB) and DVD (4.7 to 17 GB) are the most popular
forms
* Write-once, read-many (WORM) optical disks are used for
archival storage (CD-R and DVD-R)
* Multiple-write versions also available (CD-RW, DVD-RW, and
DVD-RAM)
* Reads and writes are slower than with magnetic disk
* Juke-box systems available for storing large volumes of data
- large numbers of removable disks
- several drives
- mechanism for automatic loading/unloading of disks
Slide 12:
Physical Storage Media -- Tape
Storage
* Non-volatile
* used primarily for backup (to recover from disk failure)
and for archival data
* sequential access -- much slower than disk
* very high capacity (40-300 GB tapes available)
* tape can be removed from the drive; storage costs are much
cheaper than disk, but drives are expensive
* Juke-box systems available for storing large volumes of
data
- e.g., remote sensing data, possibly hundreds of terabytes
(10^12 bytes) or even a petabyte (10^15 bytes)
Slide 13:
Storage Hierarchy
* Primary storage: fastest media but volatile
- cache
- main memory
* Secondary storage: next level in hierarchy; moderately fast
access time, non-volatile
- also called on-line storage
- flash memory, magnetic disks
* Tertiary storage: lowest level in hierarchy; slower access
time, non-volatile
- also called off-line storage
- optical storage, magnetic tape
Slide 14:
Magnetic Disks (11.2)
* Read-write head
- positioned very close to the platter surface (almost touching it)
- reads or writes magnetically coded information
* Surface of platter divided into circular tracks
- over 16,000 tracks per platter on typical hard disks
* Each track is divided into sectors
- a sector is the smallest unit of data that can be read or written
- sector size is typically 512 bytes
- typically 200 (on inner tracks) to 400 (on outer tracks) sectors
per track
[Figure: disk platter, tracks, sectors, and arm assembly]
Slide 15:
Magnetic Disks (cont)
* To read/write a sector
- disk arm swings to position the head on the right track
- platter spins continually; data is read/written as the
sector passes under the head
* Head-disk assemblies
- multiple disk platters on a single spindle (typically 2 to 4)
- one head per platter, mounted on a common arm
* Cylinder i consists of the i-th track of all the platters
[Figure: head-disk arm assembly]
Slide 16:
Magnetic Disks (cont)
* Earlier generation disks were susceptible to head
crashes
- disk spins constantly at 60, 120, even 250 revolutions per
second
- head is very close to the surface; if it touches the surface it
can scrape the recording medium off the surface, wiping out
data and causing the removed medium to fly around,
causing more head crashes
- newer disks have less friable material; less subject to head
crashes
Slide 17:
Magnetic Disks (cont)
* Disk controller -- interfaces between the computer system
and the disk drive
- accepts high-level commands to read or write a sector
- initiates actions such as moving the disk arm to the right
track and actually reading or writing the data
- computes and attaches checksums to each sector to verify
that data is read back correctly
- ensures successful writing by reading back the sector after
writing it
- performs remapping of bad sectors
* Multiple disks are connected to a computer system through a
controller
- controller functionality (checksums, bad-sector remapping)
is often carried out by the individual disks, reducing the
load on the controller
* Two disk interface standards are ATA (AT Attachment) and
SCSI (Small Computer System Interconnect)
Slide 18:
Disk Performance Measures
* Access time -- the time from when a read or write request is
issued to when the data transfer begins. Consists of:
- Seek time -- time it takes to reposition the arm over the correct
track
* average seek time is 1/2 the worst-case seek time
* 4 to 10 milliseconds on typical disks
- Rotational latency -- time it takes for the sector to appear under
the head
* average latency is 1/2 the worst-case latency
* 4 to 11 milliseconds on typical disks (5400 to 15000 rpm)
* Data transfer rate -- rate of retrieval from or storage to disk
- 4 to 8 MB per second is typical
- Multiple disks may share a controller, so the rate the controller
can handle is also important. E.g., ATA-5: 66 MB/s, SCSI-3:
40 MB/s, Fibre Channel: 256 MB/s
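The components above add up to a simple cost model for one random read. A minimal sketch in Python, using mid-range values from this slide (6 ms average seek, 7200 rpm, 4 MB/s transfer, 512-byte sector) -- these particular numbers are illustrative choices, not measurements:

```python
# Rough access-time model: access time = seek + rotational latency + transfer.
# All parameter values below are assumed mid-range figures from the slide.

def access_time_ms(seek_ms, rpm, transfer_mb_per_s, bytes_read):
    # On average the target sector is half a revolution away.
    rotational_latency_ms = 0.5 * (60_000 / rpm)
    # Time to stream the requested bytes past the head.
    transfer_ms = bytes_read / (transfer_mb_per_s * 1e6) * 1000
    return seek_ms + rotational_latency_ms + transfer_ms

t = access_time_ms(seek_ms=6, rpm=7200, transfer_mb_per_s=4, bytes_read=512)
print(round(t, 2))  # prints 10.29 -- dominated by seek + latency, not transfer
```

Note how little the transfer term contributes (about 0.13 ms here): this is why minimizing seeks, not raw bandwidth, drives most of the optimizations discussed later.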
Slide 19:
Disk Performance Measures
(cont.)
* Mean time to failure (MTTF) -- the average time the disk is
expected to run continuously without any failure
- typically 3-5 years
- sounds good, but if you have 1500 disks then 300 per year will
fail, or about 1 per day
- MTTF decreases as the disk ages
* RAID (Redundant Arrays of Independent Disks) (11.3)
- disk organization techniques that manage a large number of
disks, providing a view of a single disk of
* high capacity and high speed, by using multiple disks in parallel
* high reliability, by storing data redundantly so that data can be
recovered even if a disk fails
* mean time to data loss can be as high as 500,000 to 1,000,000
hours on a RAID
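The "300 per year" figure above is just the fleet size divided by the MTTF, assuming a constant failure rate. The arithmetic, spelled out:

```python
# Back-of-envelope from the slide: expected failures per year for a
# fleet of disks, assuming a constant failure rate of 1/MTTF per disk.
disks = 1500
mttf_years = 5                       # upper end of the slide's 3-5 year range

failures_per_year = disks / mttf_years
failures_per_day = failures_per_year / 365

print(failures_per_year)             # 300.0
print(round(failures_per_day, 2))    # 0.82 -- "about 1 per day"
```

This is why RAID's redundancy matters at scale: with enough disks, individual failures stop being rare events and become routine.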
Slide 20:
Optimization of Disk-Block
Access: Motivation
* Requests for disk I/O are generated both by the file system
and by the virtual memory manager
* Each request specifies the address on the disk to be
referenced in the form of a block number
- a block is a contiguous sequence of sectors from a single track on
one platter
- block sizes range from 512 bytes to several kilobytes (4-16 KB is
typical)
* smaller blocks mean more transfers from disk; larger blocks
make for more wasted space due to partially filled blocks
* the block is the standard unit of data transfer between disk and
main memory
* Since disk access is much slower than main memory
access, methods for optimizing disk-block access are
important
Slide 21:
Optimization of Disk-Block
Access: Methods
* Disk-arm scheduling: requests for several
blocks may be sped up by requesting them
in the order in which they will pass under the head.
- If the blocks are on different cylinders, it is
advantageous to ask for them in an order that
minimizes disk-arm movement
- Elevator algorithm -- move the disk arm in one
direction until all requests in that direction are
satisfied, then reverse and repeat
- Sequential access is one to two orders of magnitude
faster than random access, so ordering requests matters
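The elevator algorithm described above can be sketched in a few lines. This is an illustrative implementation over cylinder numbers (function and parameter names are my own, not from the slides):

```python
# Elevator (SCAN) scheduling sketch: starting at `head`, service all
# pending cylinder requests in the current direction, then reverse.

def elevator_order(head, requests, direction=1):
    pending = sorted(requests)
    order = []
    while pending:
        # Requests at or beyond the head in the current direction.
        ahead = [c for c in pending if (c - head) * direction >= 0]
        if not ahead:
            direction = -direction          # no requests left ahead: reverse
            continue
        nxt = min(ahead, key=lambda c: abs(c - head))  # closest request ahead
        order.append(nxt)
        pending.remove(nxt)
        head = nxt                          # arm is now at that cylinder
    return order

# Head at cylinder 50, sweeping outward first:
print(elevator_order(50, [10, 60, 40, 90, 20]))  # [60, 90, 40, 20, 10]
```

Total arm travel here is (90 - 50) + (90 - 10) = 120 cylinders; serving the same requests in arrival order would zig-zag the arm back and forth much further.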
Slide 22:
Optimization of Disk-Block
Access: Methods
* Non-volatile write buffers
- store written data in a RAM buffer rather than on disk
- write the buffer whenever it becomes full or when no
other disk requests are pending
- buffer must be non-volatile to protect from power
failure
* called non-volatile random-access memory (NV-RAM)
* typically implemented with battery-backed-up RAM
- dramatic speedup on writes; with a reasonable-sized
buffer write latency essentially disappears
- why can’t we do the same for reads? (hints: ESP,
clustering)
Slide 23:
Optimization of Disk-Block
Access: Methods
* File organization (clustering): reduce access
time by organizing blocks on disk in a way that
corresponds closely to the way we expect them
to be accessed
- sequential files should be kept organized sequentially
- hierarchical files should be organized with parent records
next to their children
- for joining tables (relations), put the joining tuples next
to each other
- over time, fragmentation can become an issue
* restoring the disk structure (copying and rewriting files
in order) controls fragmentation
Slide 24:
Optimization of Disk-Block
Access: Methods
* Log-based file system
- does not update in place; rather, writes updates to a log disk
* essentially, a disk functioning as a non-volatile RAM write buffer
- all access to the log disk is sequential, eliminating seek time
- eventually updates must be propagated to the original blocks
* as with NV-RAM write buffers, this can occur at a time when no
disk requests are pending
* the updates can be ordered to minimize arm movement
- this can generate a high degree of fragmentation on files that
require constant updates
* fragmentation increases seek time for sequential reading of files
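The core idea above -- append updates sequentially instead of overwriting in place -- can be shown with a toy model (this is an illustration of the principle, not how a real log-structured file system lays out data):

```python
# Toy log-structured store: updates are appended as (block_id, contents)
# records; the current state of a block is its most recent log record.

log = []

def update(block_id, contents):
    log.append((block_id, contents))      # sequential append: no seek

def current(block_id):
    # Scan backwards: the latest record for this block wins.
    for bid, contents in reversed(log):
        if bid == block_id:
            return contents
    return None

update(7, "v1")
update(9, "x")
update(7, "v2")                # supersedes v1 without overwriting it
print(current(7), len(log))    # prints: v2 3
```

The stale `v1` record left in the log is exactly the fragmentation problem the slide mentions: real systems must eventually garbage-collect superseded records and fold updates back into the original blocks.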
Slide 25:
Storage Access (11.5)
* Basic concepts (some already familiar):
- block-based: a block is a contiguous sequence of sectors from a
single track; blocks are units of both storage allocation and data
transfer
- a file is a sequence of records stored in fixed-size blocks (pages) on
the disk
- each block (page) has a unique address called a BID (block identifier)
- optimization is done by reducing I/O, seek time, etc.
- database systems seek to minimize the number of block transfers
between the disk and memory; we can reduce the number of disk
accesses by keeping as many blocks as possible in main memory
- buffer -- portion of main memory used to store copies of disk blocks
- buffer manager -- subsystem responsible for allocating buffer space
in main memory and handling block transfer between buffer and
disk
Slide 26:
Buffer Management
* The buffer pool is the part of main memory allocated for
temporarily storing disk blocks read from disk and made
available to the CPU
* The buffer manager is the subsystem responsible for the
allocation and management of the buffer space
(transparent to users)
* On a process (user) request for a block (page), the buffer manager:
- checks to see if the page is already in the buffer pool
- if so, passes the address to the process
- if not, loads the page from disk and then passes the address to the
process
* loading a page might require clearing (writing out) a page to make
space
* Very similar to the way virtual memory managers work, although it
can do a lot better (why?)
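The request path just described can be sketched as a small class. This is a hypothetical illustration (class and method names are my own, and disk I/O is stubbed out); the eviction policy here is a deliberately naive placeholder:

```python
# Sketch of the buffer manager's request path: pool hit -> return page;
# miss with a full pool -> evict (writing back if dirty), then load.

class BufferManager:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pool = {}                 # block_id -> page contents
        self.dirty = set()             # block_ids modified since loading

    def get_page(self, block_id):
        if block_id in self.pool:      # hit: page already in the pool
            return self.pool[block_id]
        if len(self.pool) >= self.capacity:
            victim = next(iter(self.pool))   # naive FIFO placeholder policy
            if victim in self.dirty:
                self._write_to_disk(victim)  # clear (write out) the page
                self.dirty.discard(victim)
            del self.pool[victim]
        self.pool[block_id] = self._read_from_disk(block_id)
        return self.pool[block_id]

    def _read_from_disk(self, block_id):
        return f"contents of block {block_id}"   # stub for real disk I/O

    def _write_to_disk(self, block_id):
        pass                                     # stub for real disk I/O

bm = BufferManager(capacity=2)
bm.get_page(1); bm.get_page(2); bm.get_page(3)   # third load evicts block 1
print(sorted(bm.pool))                           # prints: [2, 3]
```

The interesting design decision is hidden in the `victim = ...` line: choosing which page to evict is exactly the replacement-strategy question the next slides take up.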
Slide 27:
Buffer Replacement Strategies
* Most operating systems use an LRU replacement scheme. In database
environments, MRU is better for some common operations (e.g., join)
- LRU strategy: replace the least recently used block
- MRU strategy: replace the most recently used block
* Sometimes it is useful to fasten, or pin, blocks to keep them available
during an operation and not let the replacement strategy touch them
- a pinned block is thus a block that is not allowed to be written back to
disk
* There are situations where it is necessary to write a block back to
disk even though the buffer space it occupies is not yet needed; this
write is called the forced output of a block, and is useful in recovery
situations
* Toss-immediate strategy: free the space occupied by a block as soon
as the final tuple of that block has been processed
Slide 28:
Buffer Replacement Strategies
* Most recently used (MRU) strategy: the system must pin the
block currently being processed. After the final tuple of
that block has been processed, the block is unpinned and
becomes the most recently used block. This is essentially
"toss-immediate" with pinning, and works very well with
joins.
* The buffer manager can often use other information (design
or statistical) to predict the probability that a request will
reference a particular page
- e.g., the data dictionary is frequently accessed -- keep the data-
dictionary blocks in the main-memory buffer
- if several pages are available for overwrite, choose the one that
has the lowest number of recent access requests to replace
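Victim selection under the two policies from these slides, with pinning respected, comes down to a min/max over access timestamps. A minimal sketch (the function and its parameters are illustrative, not from the slides):

```python
# Pick a replacement victim: LRU evicts the least recently used
# unpinned block, MRU the most recently used unpinned block.
# Pinned blocks are never candidates.

def pick_victim(last_used, pinned, policy):
    # last_used: block_id -> timestamp of the most recent access
    candidates = {b: t for b, t in last_used.items() if b not in pinned}
    if policy == "LRU":
        return min(candidates, key=candidates.get)
    if policy == "MRU":
        return max(candidates, key=candidates.get)
    raise ValueError(f"unknown policy: {policy}")

last_used = {"A": 1, "B": 5, "C": 3}
print(pick_victim(last_used, pinned={"B"}, policy="LRU"))  # prints: A
print(pick_victim(last_used, pinned={"B"}, policy="MRU"))  # prints: C
```

The join case motivates MRU: once a scan has consumed the last tuple of a block, that block (the most recently used one) is exactly the block least likely to be needed again soon.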
Slide 29:
Buffer Management (cont)
* Existing OSs affect DBMS operations by:
- read-ahead and write-behind
- wrong replacement strategies
- Unix is not good for a DBMS to run on top of
- most commercial systems implement their own I/O on a
raw disk partition
* Variations of buffer allocation
- common buffer pool for all relations
- separate buffer pool for each relation
- as above, but with relations borrowing space from each
other
- prioritized buffers for very frequently accessed blocks,
e.g. the data dictionary
Slide 30:
Buffer Management (cont)
* For each buffer the manager keeps the following:
- which disk and which block it is in
- whether the block is dirty (has been modified) or not
(why?)
- information for the replacement strategy
* last time the block was accessed
* whether it is pinned
* possible statistical information (access frequency, etc.)