Structure of a DBMS
• A typical DBMS has a layered architecture.
• The figure does not show the concurrency control and recovery components.
• This is one of several possible architectures; each system has its own variations.
[Figure: DBMS layers: Query Parser/Decomposition; Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. These layers must consider concurrency control and recovery.]
Data Storage
• A DBMS deals with a very large amount of data.
• How does a computer system store and manage very large volumes of data?
• What representations and data structures best support efficient manipulations of this data?
Storage Hierarchy
[Figure: CPU → CACHE (KB) → MAIN MEMORY (MB) → HARD DISK (RAID) (GB) → OPTICAL DISK (TB) → TAPE (TB)]
Redundant Array of Inexpensive Disks (RAID)
RAID: A number of disks are organized so that they appear to the OS as a single disk.
• Aggregate disk capacity
• Increase I/O throughput
• Fault tolerance (hot swapping)
[Figure: a PC attached to a RAID system]
Storing databases/tables/records
• Each database contains a number of tables.
• Each table contains a number of records.
• Each record has a unique identifier called a record id, or rid.
• Records can be stored in files provided by the underlying OS.
• Records can be stored directly in disk blocks, bypassing the file system (raw devices).
Why raw devices?
• Differences in OS support: portability issues
• Some limitations, e.g., files can't span disks.
• Performance
[Figure: layer stack, top to bottom: Higher Layer; Access Manager (arranges records in a page); Buffer Manager; Disk Manager (lowest level in the DBMS software architecture); OS files or disks]
An overview at 30,000 feet
[Figure: layer stack, top to bottom: Higher Layer; Access Manager; Buffer Manager; Disk Manager; OS files or disks]
Record-level operations (on Record 1, Record 2, ..., Record n1):
• CreateTable(DB, tName, field[])
• ReadRecord(DB, tName, rID)
• WriteRecord(DB, tName, rID, r)
• DeleteRecord(DB, tName, rID)
• Append(DB, tName, record)
Page-level operations (on Page 1, Page 2, ..., Page m):
• AllocatePage()
• DeletePage(pID)
• WritePage(pID, page)
• ReadPage(pID)
Mapping between database-table-record and page (Db1:T1 with Record 1, Record 2, ..., Record n2; ...; Dbi:Tj)
Mapping between page and devices (HD, Optical, Tape)
Disk Manager
• Lowest level in the DBMS software architecture
• Higher levels call upon this layer to:
– allocate/de-allocate a page on disk
– read/write a page (or a block or a unit of disk retrieval)
[Figure: the Disk Manager manages Page 1, Page 2, Page 3, Page 4, ..., Page n]
These pages are physically located in different devices (RAID, optical, and/or tape).
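The page-level calls that higher layers make on the disk manager can be sketched as follows. This is a minimal illustration, not a real DBMS component: it assumes pages are fixed-size blocks of a single OS file, and the names `DiskManager`, `PAGE_SIZE`, `allocate_page`, `deallocate_page`, `read_page`, and `write_page` are hypothetical.

```python
import os
import tempfile

PAGE_SIZE = 4096  # assumed page size in bytes


class DiskManager:
    """Toy disk manager: pages are fixed-size blocks of one OS file."""

    def __init__(self, path):
        open(path, "ab").close()          # create the backing file if missing
        self.file = open(path, "r+b")
        self.free_pages = []              # page IDs released by deallocate_page

    def allocate_page(self):
        # Reuse a freed page if possible, otherwise grow the file by one page.
        if self.free_pages:
            return self.free_pages.pop()
        self.file.seek(0, os.SEEK_END)
        pid = self.file.tell() // PAGE_SIZE
        self.file.write(bytes(PAGE_SIZE))
        return pid

    def deallocate_page(self, pid):
        self.free_pages.append(pid)

    def write_page(self, pid, data):
        if len(data) != PAGE_SIZE:
            raise ValueError("a page must be exactly PAGE_SIZE bytes")
        self.file.seek(pid * PAGE_SIZE)
        self.file.write(data)

    def read_page(self, pid):
        self.file.seek(pid * PAGE_SIZE)
        return self.file.read(PAGE_SIZE)


# Usage with a temporary file (hypothetical path).
path = os.path.join(tempfile.mkdtemp(), "db.dat")
dm = DiskManager(path)
first = dm.allocate_page()
second = dm.allocate_page()
dm.write_page(second, b"x" * PAGE_SIZE)
```

A raw-device implementation would replace the file with direct block access, but the interface seen by higher layers would stay the same.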
Buffer Manager
[Figure: page requests from higher levels arrive at the BUFFER POOL in MAIN MEMORY; each frame holds a disk page or is free; pages move between the buffer pool and the DB on DISK; the choice of frame is dictated by the replacement policy]
• The size of a frame is equal to the size of a disk page.
• Two variables are associated with each frame/page:
– Pin_count: number of current users of the page
– Dirty: whether the page has been modified since it was brought into the buffer pool
Buffer Management: When a Page is Requested ...
• If a requested page is not in the buffer pool:
– Choose a frame for replacement
– If the frame is dirty, write it to disk
– Read the requested page into the chosen frame
• Pin the page and return its address to the requester.
• If requests can be predicted (e.g., sequential scans), several pages can be pre-fetched at a time!
• Pinning: incrementing the pin_count.
• Unpinning: releasing the page; the pin_count is decremented.
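The request steps above can be sketched as a small buffer pool. This is an illustration under stated assumptions: the disk is modeled as a plain dictionary from page ID to page bytes, LRU is used as the replacement policy, and the names `BufferPool`, `request_page`, and `release_page` are hypothetical.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer pool with pin counts, dirty bits, and LRU replacement."""

    def __init__(self, num_frames, disk):
        self.num_frames = num_frames
        self.disk = disk               # assumed: dict page_id -> page bytes
        self.frames = OrderedDict()    # page_id -> [data, pin_count, dirty]

    def request_page(self, pid):
        if pid not in self.frames:
            if len(self.frames) >= self.num_frames:
                self._evict()
            # Read the requested page into the chosen frame.
            self.frames[pid] = [bytearray(self.disk[pid]), 0, False]
        self.frames.move_to_end(pid)   # now the most recently used page
        self.frames[pid][1] += 1       # pin
        return self.frames[pid][0]

    def release_page(self, pid, dirty=False):
        frame = self.frames[pid]
        frame[1] -= 1                  # unpin
        frame[2] = frame[2] or dirty

    def _evict(self):
        # Choose the least recently used frame whose pin_count is 0.
        victim = next((p for p, f in self.frames.items() if f[1] == 0), None)
        if victim is None:
            raise RuntimeError("all frames are pinned")
        data, _, dirty = self.frames.pop(victim)
        if dirty:
            self.disk[victim] = data   # write the dirty page back to disk


disk = {i: bytearray(b"page%d" % i) for i in range(3)}
pool = BufferPool(num_frames=2, disk=disk)
page = pool.request_page(0)
page[0:4] = b"PAGE"                    # modify the page while it is pinned
pool.release_page(0, dirty=True)
```

The modified page reaches the disk only when its frame is later chosen for replacement, which is why the dirty bit has to be tracked per frame.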
Buffer Replacement Policy
• Frame is chosen for replacement by a replacement policy:
– Least-recently-used (LRU), First-In-First-Out (FIFO), Most-recently-used (MRU), etc.
• Policy can have a big impact on the # of I/O's; it depends on the access pattern.
• Sequential flooding: Nasty situation caused by LRU + repeated sequential scans.
– # buffer frames < # pages in file means each page request causes an I/O. MRU is much better in this situation (but not in all situations, of course).
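Sequential flooding can be reproduced with a short simulation. The sketch below counts page reads for repeated scans of a file that is one page larger than the buffer pool; the function name `count_ios` and the workload sizes are assumptions chosen for illustration.

```python
from collections import OrderedDict


def count_ios(requests, num_frames, policy):
    """Count page reads for a request stream under 'LRU' or 'MRU' replacement."""
    pool = OrderedDict()   # page_id -> None, ordered least to most recently used
    ios = 0
    for pid in requests:
        if pid in pool:
            pool.move_to_end(pid)      # hit: refresh recency, no I/O
            continue
        ios += 1                       # miss: one disk read
        if len(pool) >= num_frames:
            # LRU evicts the oldest entry, MRU the newest.
            pool.popitem(last=(policy == "MRU"))
        pool[pid] = None
    return ios


# Five sequential scans of a 4-page file with only 3 buffer frames.
scans = [pid for _ in range(5) for pid in range(4)]
lru_ios = count_ios(scans, 3, "LRU")   # every one of the 20 requests misses
mru_ios = count_ios(scans, 3, "MRU")   # fewer misses on this workload
```

Under LRU the page about to be requested is always the one that was just evicted, so no request is ever served from the pool.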
Access Manager
• Arrange records into a page
• Retrieve records from a page
• Concerns:
– Mapping between record and page
– Record formats
– Page formats
– File formats
[Figure: records (Db1:T1 with Record 1, Record 2, ..., Record n2; ...; Dbi:Tj) are inserted into, deleted from, and retrieved from pages (Page 1, Page 2, ..., Page m)]
Record-Page Mapping
• Maintain a mapping table
[Figure: mapping tables: DB (db1, db2, ..., dbn) → Table (t1, t2, ..., tm) → Page (p1, p2, ..., pj); Table (t1, t2, ..., tm) → Record (R1, R2, R3, ...). Records are inserted into, deleted from, and retrieved from pages.]
Record Formats: Fixed Length
• Information about field types and lengths is stored in system catalogs.
• Li: Size of field i in bytes
• Fields F1, F2, F3, F4 with lengths L1, L2, L3, L4 are stored contiguously from the base address (B); e.g., the address of F3 = B+L1+L2.
• Example: Name: char(40), Address: char(100), Phone: char(10), Email: char(100)
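The address calculation B+L1+L2 can be checked with a short sketch that uses the example field lengths from the slide. The helper names `field_offset`, `pack_record`, and `read_field`, and the sample values, are hypothetical; padding short values with zero bytes is an assumption of this sketch.

```python
import struct

# Field lengths L1..L4 in bytes, taken from the example schema.
FIELDS = [("Name", 40), ("Address", 100), ("Phone", 10), ("Email", 100)]
RECORD_FORMAT = "=" + "".join(f"{length}s" for _, length in FIELDS)


def field_offset(i):
    """Offset of field i (1-based) from the record's base address B."""
    return sum(length for _, length in FIELDS[: i - 1])


def pack_record(values):
    # struct pads each short value with zero bytes up to its field length.
    return struct.pack(RECORD_FORMAT, *(v.encode() for v in values))


def read_field(record, i):
    start = field_offset(i)
    raw = record[start : start + FIELDS[i - 1][1]]
    return raw.rstrip(b"\0").decode()


record = pack_record(["Ann", "1 Main St", "5551234", "ann@example.com"])
phone_offset = field_offset(3)         # L1 + L2 = 40 + 100
phone = read_field(record, 3)
```

Because every record has the same layout, the offsets never need to be stored in the record itself; they follow from the catalog information alone.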
Record Formats: Variable Length
• Two alternative formats (# fields is fixed):
– Fields delimited by special symbols: a field count (e.g., 4) followed by F1 $ F2 $ F3 $ F4 $
– Array of field offsets, followed by F1 F2 F3 F4
• The second offers direct access to the i'th field and efficient storage of nulls, with small directory overhead.
• Can be used for fixed-length fields or variable-length fields (e.g., VARCHAR, BLOB).
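The second format, an array of field offsets, can be sketched as below. The exact layout is an assumption for illustration: one 2-byte end offset per field, followed by the field bytes. A null field is stored as an empty span, so this sketch cannot tell a null from an empty string. `encode_record` and `get_field` are hypothetical names.

```python
import struct


def encode_record(fields):
    """Layout: one 2-byte end offset per field, followed by the field bytes."""
    pos = 2 * len(fields)              # field data starts after the offset array
    offsets, body = [], b""
    for value in fields:
        data = b"" if value is None else value.encode()
        body += data
        pos += len(data)
        offsets.append(pos)            # where this field ends
    return struct.pack(f"<{len(fields)}H", *offsets) + body


def get_field(record, i, num_fields):
    """Return field i (0-based) without scanning the earlier fields."""
    ends = struct.unpack_from(f"<{num_fields}H", record)
    start = 2 * num_fields if i == 0 else ends[i - 1]
    data = record[start : ends[i]]
    return data.decode() if data else None   # empty span is read back as null


record = encode_record(["Smith", None, "3000"])
third = get_field(record, 2, 3)
```

A null costs no data bytes here, only its offset entry, which is the "efficient storage of nulls" the slide refers to.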
Page Formats: Fixed Length Records
Solution 1: PACKED
[Figure: Slot 1, Slot 2, Slot 3, ..., Slot N, followed by free space; N, the number of records, is stored in the page]
• Each slot holds one record.
• Record the number of records N (total slots = PageSize/RecordSize).
• The free slots start from N+1.
• When appending a new record, allocate slot N+1, then update N++.
• When deleting a record at slot i, move the last record to slot i. If the records are sorted, all records after slot i must be moved up instead.
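A minimal sketch of the packed organization, assuming an in-memory list of slots in place of raw page bytes and 0-based slot numbers. The class name `PackedPage` and the `keep_order` flag are hypothetical.

```python
class PackedPage:
    """Fixed-length records stored contiguously in slots 0..n-1."""

    def __init__(self, page_size, record_size):
        self.capacity = page_size // record_size
        self.slots = [None] * self.capacity
        self.n = 0                     # number of records = index of first free slot

    def append(self, record):
        if self.n == self.capacity:
            raise ValueError("page is full")
        self.slots[self.n] = record
        self.n += 1
        return self.n - 1

    def delete(self, i, keep_order=False):
        if not 0 <= i < self.n:
            raise IndexError(i)
        if keep_order:
            # Sorted page: shift every later record up by one slot.
            self.slots[i : self.n - 1] = self.slots[i + 1 : self.n]
        else:
            # Unsorted page: fill the hole with the last record.
            self.slots[i] = self.slots[self.n - 1]
        self.slots[self.n - 1] = None
        self.n -= 1


page = PackedPage(page_size=40, record_size=10)
for rec in "abcd":
    page.append(rec)
page.delete(0)                         # "d" moves into slot 0
```

Note that either kind of delete moves records between slots, so a rid that encodes the slot number can change.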
Page Formats: Fixed Length Records
Solution 2: UNPACKED
[Figure: Slot 1, Slot 2, ..., Slot N, ..., Slot M, with free space in the unoccupied slots; the page also stores a bitmap with one bit per slot (M ... 3 2 1) and M, the number of slots]
• Each slot holds one record.
• The total number of slots is PageSize/RecordSize.
• A bitmap records whether each slot is occupied.
• If bit[i]==1, slot i is occupied.
• When inserting a record, search the bitmap to find a bit that is 0, then allocate the corresponding slot for the record.
• When deleting a record, simply reset the corresponding bit.
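The bitmap-based organization can be sketched in a few lines. As an assumption, slots and bitmap are kept as Python lists with 0-based slot numbers; `UnpackedPage` is a hypothetical name.

```python
class UnpackedPage:
    """Fixed-length records with a bitmap marking which slots are occupied."""

    def __init__(self, page_size, record_size):
        self.num_slots = page_size // record_size
        self.slots = [None] * self.num_slots
        self.bitmap = [0] * self.num_slots   # bitmap[i] == 1: slot i is occupied

    def insert(self, record):
        # Search the bitmap for a 0 bit and use the corresponding slot.
        for i, bit in enumerate(self.bitmap):
            if bit == 0:
                self.slots[i] = record
                self.bitmap[i] = 1
                return i
        raise ValueError("page is full")

    def delete(self, i):
        # Only the bit is reset; the old contents stay until the slot is reused.
        self.bitmap[i] = 0

    def records(self):
        return [r for r, bit in zip(self.slots, self.bitmap) if bit]


page = UnpackedPage(page_size=40, record_size=10)
slot_ids = [page.insert(rec) for rec in "abc"]
page.delete(1)
reused = page.insert("d")              # takes the freed slot
```

Records never move in this layout, so a slot number stays a stable part of the rid.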
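Page Formats: Variable Length Records
[Figure: page i holds records with Rid = (i,1), Rid = (i,2), ..., Rid = (i,N); a SLOT DIRECTORY at the end of the page holds one entry per slot (N ... 2 1; e.g., 20, 16, 24), the number of slots N, and a pointer to the start of free space]
• Records can be moved on the page without changing the rid; so, attractive for fixed-length records too.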
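A minimal sketch of a slotted page shows why records can move without changing their rid. As simplifying assumptions, the slot directory is kept in a Python list instead of being serialized at the end of the page, and each entry is an (offset, length) pair; `SlottedPage` and its methods are hypothetical names.

```python
class SlottedPage:
    """Variable-length records addressed through a slot directory."""

    def __init__(self, page_id, size=4096):
        self.page_id = page_id
        self.data = bytearray(size)
        self.free_start = 0            # pointer to the start of free space
        self.slots = []                # slot directory: (offset, length) or None

    def insert(self, record):
        if self.free_start + len(record) > len(self.data):
            raise ValueError("not enough free space")
        offset = self.free_start
        self.data[offset : offset + len(record)] = record
        self.free_start += len(record)
        self.slots.append((offset, len(record)))
        return (self.page_id, len(self.slots))   # rid = (page, slot number)

    def read(self, rid):
        offset, length = self.slots[rid[1] - 1]
        return bytes(self.data[offset : offset + length])

    def delete(self, rid):
        self.slots[rid[1] - 1] = None  # keep the slot so other rids stay valid

    def compact(self):
        # Slide live records together; only directory offsets change, rids do not.
        packed, pos = bytearray(len(self.data)), 0
        for i, slot in enumerate(self.slots):
            if slot is not None:
                offset, length = slot
                packed[pos : pos + length] = self.data[offset : offset + length]
                self.slots[i] = (pos, length)
                pos += length
        self.data, self.free_start = packed, pos


page = SlottedPage(page_id=7, size=64)
r1 = page.insert(b"Smith")
r2 = page.insert(b"Jones,40")
r3 = page.insert(b"Tracy")
page.delete(r1)
page.compact()                         # r2 and r3 move, but their rids still work
```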
File Formats and Operation Costs
• Format
– Heap File: Suitable when typical access is a file scan retrieving all records.
– Sorted File: Best if records must be retrieved in some order, or only a 'range' of records is needed.
– Hashed File: Good for equality selections.
• Cost factors (we ignore CPU costs, for simplicity)
– P: The number of data pages
– R: Number of records per page
– D: (Average) time to read or write a disk page
– Measuring the number of page I/O's ignores gains of pre-fetching blocks of pages; thus, even I/O cost is only approximated.
Heap Files
• The data in a heap file is not ordered.
– How to find a page that has some free space
– How to find the free space inside a page
– Both depend on the record format, i.e., fixed-length or variable-length
• Two types of implementations
– Link-based
– Directory-based
Heap File Implemented as a List
• To insert a record, one searches the pages with free space to find one that has sufficient space.
• Many pages may contain only a tiny amount of free space.
• A long list of pages may have to be loaded in order to find an appropriate one.
[Figure: a header page links to two lists of data pages: full pages and pages with free space]
Heap File Using a Page Directory
• The directory is a collection of pages.
• Each directory page contains a number of entries.
• Each entry contains a pointer to a data page and a variable recording that page's free space.
• The number of directory pages is much smaller than the number of data pages.
[Figure: a header page starts the DIRECTORY, whose entries point to Data Page 1, Data Page 2, ..., Data Page N]
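The benefit of the directory is that a page with enough room can be found by scanning directory entries only. A minimal sketch, assuming each entry is a [page_id, free_bytes] pair held in a Python list; `HeapFileDirectory` and its methods are hypothetical names.

```python
class HeapFileDirectory:
    """Directory entries pair a data page ID with its remaining free space."""

    def __init__(self, page_size):
        self.page_size = page_size
        self.entries = []              # list of [page_id, free_bytes]

    def find_page(self, record_size):
        if record_size > self.page_size:
            raise ValueError("record does not fit in a page")
        # Scan directory entries only; no data page is read during the search.
        for entry in self.entries:
            if entry[1] >= record_size:
                return entry
        # No page has room: register a new, empty data page.
        entry = [len(self.entries), self.page_size]
        self.entries.append(entry)
        return entry

    def insert(self, record_size):
        entry = self.find_page(record_size)
        entry[1] -= record_size
        return entry[0]                # page that receives the record


directory = HeapFileDirectory(page_size=100)
placements = [directory.insert(size) for size in (60, 60, 40)]
```

In the list-based implementation the same search could require loading many data pages that each have only a little free space.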
Heap Files and Associated Costs
P: The number of data pages
R: Number of records per page
D: (Average) time to read or write a disk page
Scan: P*D
Search with equality selection: If the selection is based on a candidate key, on average we must scan half the file, assuming that the record exists. The cost is 0.5*P*D.
Search with range selection: The entire file must be scanned. The cost is P*D.
Insert: Assume that records are always inserted at the end of the file. We fetch the last page in the file, add the record, and write the page back. The cost is 2D.
Delete: The cost also depends on the number of qualifying records. The cost is search cost + D.
[Figure: a directory pointing to Data Page 1, Data Page 2, ..., Data Page P]
Sorted File
• Sort records directly
• Expensive when inserting a record
[Figure: a header page and DIRECTORY point to Data Page 1, Data Page 2, ..., Data Page N; the records (RID 1, 2, 3, 4, ...) are stored in order of the sorted field]
Sorted File
• Keep the sorted field in an index page
• Each entry points to a record
– May contain the value of the sorted field
– May contain a valid bit for delete operations
• The entries are sorted according to the sorted field
• Inserting a record only requires reorganizing the index page
– Its size is much smaller
[Figure: an index page in the DIRECTORY holds (sorted field, RID) entries 1, 2, 3, 4, ... that point to records in Data Page 1, ..., Data Page n]
Sorted Files and Associated Costs
Scan: P*D
Search with equality selection: Assume that the selection is specified on the field by which the file is sorted. The cost is D * log2 P, assuming that the sorted file is stored sequentially.
Search with range selection: D * log2 P + cost of retrieving qualified records.
Insert: Search cost + 2*0.5*P*D; the assumption is that the inserted record belongs in the middle of the file.
Delete: Search cost + 2*0.5*P*D; the assumption is that we need to pack the file and the record to be deleted is in the middle of the file.
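The D * log2 P search cost comes from a binary search over pages. The sketch below counts the pages touched while locating a key; the page contents, the key, and the name `find_page` are hypothetical, and each list access stands in for one page I/O of cost D.

```python
import math


def find_page(pages, key):
    """Binary search over sorted pages; returns (page index or None, pages read)."""
    lo, hi, reads = 0, len(pages) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        page = pages[mid]              # one page I/O, cost D
        reads += 1
        if key < page[0]:
            hi = mid - 1
        elif key > page[-1]:
            lo = mid + 1
        else:
            return (mid if key in page else None), reads
    return None, reads


# 1024 pages of 4 sorted keys each (hypothetical data).
pages = [list(range(4 * i, 4 * i + 4)) for i in range(1024)]
index, reads = find_page(pages, 2049)
bound = math.floor(math.log2(len(pages))) + 1   # worst-case pages read
```

A heap file holding the same data would need about 512 page reads on average for the same equality search.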
Hashed files
• File is a collection of buckets.
• Bucket = primary page plus zero or more overflow pages.
• Hashing function h: h(r) = bucket in which record r belongs. h looks at only some of the fields of r, called the search fields.
• Efficiency depends on
– Hash function
– Data skew factor
[Figure: file hashed on age. The hash function h maps age to buckets h(age)=00, h(age)=01, ..., h(age)=9. Example records: Smith, 40, 3000; Jones, 40, 6003; Tracy, 40, 5004; Doug, 20, 3800; Ashby, 21, 3000; Basu, 31, 4003; Bristow, 21, 2007; Class, 59, 5004; Daniels, 29, 6003. Some buckets chain to overflow pages.]
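A small sketch of a file hashed on age, using some of the example records. The slides do not define h, so the modulo hash below is an assumption, as are the page capacity and the names `HashedFile`, `insert`, and `search`.

```python
class HashedFile:
    """Buckets of pages: a primary page plus overflow pages chained as needed."""

    def __init__(self, num_buckets, records_per_page):
        self.records_per_page = records_per_page
        # Each bucket is a list of pages; pages[0] is the primary page.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def h(self, age):
        # Assumed hash function on the search field (age).
        return age % len(self.buckets)

    def insert(self, record):
        pages = self.buckets[self.h(record[1])]
        if len(pages[-1]) == self.records_per_page:
            pages.append([])           # last page is full: add an overflow page
        pages[-1].append(record)

    def search(self, age):
        """Equality search; returns matching records and the number of pages read."""
        pages = self.buckets[self.h(age)]
        matches = [r for page in pages for r in page if r[1] == age]
        return matches, len(pages)


hashed = HashedFile(num_buckets=10, records_per_page=2)
for rec in [("Smith", 40, 3000), ("Jones", 40, 6003), ("Tracy", 40, 5004),
            ("Ashby", 21, 3000), ("Basu", 31, 4003)]:
    hashed.insert(rec)
matches, pages_read = hashed.search(40)
```

Three records share age 40, so that bucket needs an overflow page and the search reads two pages instead of one; this is the data skew effect named above.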
Hashed Files and Associated Costs
Assume that there is no overflow page.
Scan: 1.25*P*D if pages are kept at 80% occupancy
Search with equality selection: D
Search with range selection: 1.25*P*D
Insert: Search cost + D = 2D
Delete: Search cost + D = 2D
Cost Comparison
(P: number of data pages; D: average time to read or write a disk page)

                    Heap File     Sorted File                             Hashed File
Scan all records    P*D           P*D                                     1.25*P*D
Equality Search     0.5*P*D       D*log2(P)                               D
Range Search        P*D           D*(log2(P) + # of pages with matches)   1.25*P*D
Insert              2D            Search + P*D                            2D
Delete              Search + D    Search + P*D                            2D

Several assumptions underlie these (rough) estimates!
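The comparison can be turned into numbers for a concrete file size. The sketch below evaluates the estimates for a hypothetical file of 1024 pages with 15 ms per page I/O; P is the number of data pages and D the time per page I/O. The function name `operation_costs` is an assumption, and "Search" in the insert and delete rows is taken to be the equality-search cost of the same file type.

```python
import math


def operation_costs(P, D, matching_pages=1):
    """Rough I/O cost estimates per file organization (CPU costs ignored)."""
    heap_search = 0.5 * P * D
    sorted_search = D * math.log2(P)
    return {
        "heap": {
            "scan": P * D,
            "equality": heap_search,
            "range": P * D,
            "insert": 2 * D,
            "delete": heap_search + D,
        },
        "sorted": {
            "scan": P * D,
            "equality": sorted_search,
            "range": D * (math.log2(P) + matching_pages),
            "insert": sorted_search + P * D,
            "delete": sorted_search + P * D,
        },
        "hashed": {
            "scan": 1.25 * P * D,
            "equality": D,
            "range": 1.25 * P * D,
            "insert": 2 * D,
            "delete": 2 * D,
        },
    }


costs = operation_costs(P=1024, D=0.015)   # hypothetical: 1024 pages, 15 ms per I/O
for organization, ops in costs.items():
    print(organization, {op: round(seconds, 3) for op, seconds in ops.items()})
```

As with the slide estimates, the values are only as good as the assumptions behind them (no overflow pages, 80% occupancy for hashed files, inserts in the middle of a sorted file).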