Select Operation Strategies And Indexing (Chapter 8)
Disk access
• DBs traditionally stored on disk
• Cheaper to store on disk than in memory
• Seek time, latency, data transfer time
• disk access is page oriented
• 2 - 4 KB page size
Access time
• Time to randomly access a page: 12-20 ms, i.e. 50-83 I/Os per second
• large disparity between disk access and memory access (10-200 ns)
• hash the disk page address and look in a lookaside table to see if the page is already in a memory buffer
• In memory DBs the future?
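The lookaside check above can be sketched as a hash table keyed by disk page address. This is only an illustration, not any particular DBMS's buffer manager: `BufferPool`, `read_page_from_disk`, the capacity, and the least-recently-used eviction policy are all assumptions made for the example.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer pool: a hashed lookup on the page address decides
    whether a disk read is needed. All names are illustrative."""

    def __init__(self, capacity, read_page_from_disk):
        self.capacity = capacity
        self.read_page_from_disk = read_page_from_disk  # hypothetical disk reader
        self.pages = OrderedDict()  # lookaside table: page address -> contents
        self.disk_reads = 0

    def get_page(self, page_addr):
        if page_addr in self.pages:              # hit: no disk I/O needed
            self.pages.move_to_end(page_addr)    # mark as most recently used
            return self.pages[page_addr]
        self.disk_reads += 1                     # miss: pay for a disk access
        page = self.read_page_from_disk(page_addr)
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)       # evict least recently used page
        self.pages[page_addr] = page
        return page
```

A second request for the same page address is served from memory, which is where the gap between milliseconds (disk) and nanoseconds (memory) pays off.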
Table scan
• Linear search - all data rows read in
– I/O parallelism can be used
• multiple I/O read requests satisfied at the same time
• stripe the data across different disks
– Problems with parallelism?
• must balance disk arm load to gain maximum parallelism
• requires the same total number of random I/Os, but uses the devices for a shorter time
Sequential prefetch I/O
• retrieve one disk page after another (on same track) - typically 32
• seek time no longer a problem
• must know in advance to read 32 successive pages
• speed-up of I/O by roughly a factor of 7 (about 500 I/Os per second vs. 70)
Access time
• Seek time – 10-15ms
• Latency time – 2-5 ms
• Data transfer time – 0.5-1.5 ms per page (memory access, by comparison, is 10-200 ns)
Access time for fast I/O
(times in seconds)
                               RIO           Seq. Prefetch
Seek - disk arm to cylinder    .010          .010
Latency - platter to sector    .002          .002
Data transfer - page(s)        .0015         .048
Total                          .0135         .060
Pages read                     1 page        32 pages
Time for 32 pages              .43 seconds   .060 seconds
Textbook access time
(times in seconds)
                               RIO           Seq. Prefetch
Seek - disk arm to cylinder    .008          .008
Latency - platter to sector    .004          .004
Data transfer - page(s)        .0005         .016
Total                          .0125         .028
Pages read                     1 page        32 pages
Time for 32 pages              .40 seconds   .028 seconds
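The arithmetic behind the two columns can be checked with a short sketch. It assumes the simple model the figures imply: a random I/O pays seek + latency + transfer for every page, while a sequential prefetch pays seek and latency once and then transfers all 32 pages. The function names are made up for the illustration.

```python
def random_io_time(seek, latency, transfer_per_page, pages=1):
    """Every page pays its own seek, latency, and transfer."""
    return pages * (seek + latency + transfer_per_page)

def prefetch_time(seek, latency, transfer_per_page, pages=32):
    """One seek and one latency, then the pages are transferred back to back."""
    return seek + latency + pages * transfer_per_page

# Textbook figures quoted above, in seconds
SEEK, LATENCY, TRANSFER = 0.008, 0.004, 0.0005

rio_32 = random_io_time(SEEK, LATENCY, TRANSFER, pages=32)
seq_32 = prefetch_time(SEEK, LATENCY, TRANSFER, pages=32)
print(f"32 random I/Os:   {rio_32:.3f} s")
print(f"32-page prefetch: {seq_32:.3f} s")
print(f"speed-up:         {rio_32 / seq_32:.1f}x")
```

Swapping in the fast I/O figures (.010, .002, .0015) reproduces the .0135 and .060 totals of the first table.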
Disk allocation
• Disk resource allocation for databases (the control a DBA has)
• No standard SQL approach, but there is a general way to deal with allocation
• Some OSs allow specification of the size of a file and its disk device
• contiguous sectors on disk - want them as close together as possible to minimize seek time
Tablespace
• Allocation medium for tables and indexes for ORACLE, DB2
• usually files (relations) cannot span disk devices
• can put >1 table in a tablespace if accessed together
• corresponds to 1 or more OS files and can span disk devices
Query Language
• ORACLE DBs contain several tablespaces, including one called system - data description + indexes + user-defined tables
Create tablespace tspace1 datafile 'fname1', 'fname2';
• default tablespace given to each user
• if multiple tablespaces - better control over load balancing
• can take some disk space off-line
Extent
• extent - contiguous storage on disk
• when data segment or index segment first created, given an initial extent from tablespace, 10KB (5 pages)
• if need more space, given next contiguous extent
• can increase the size by a positive % (cannot decrease)
initial n - size of initial extent
next n - size of next extent
maxextents - maximum number of extents
minextents - number of extents initially allocated
pctincrease n - % by which next extent grows over previous one
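How the extent parameters interact can be sketched as follows. The model is an assumption made for illustration: the first extent has the initial size, the second has the next size, and each later extent is pctincrease % larger than the one before it. Rounding to whole pages and the maxextents limit are ignored, and `extent_sizes` is a made-up helper.

```python
def extent_sizes(initial_kb, next_kb, pct_increase, count):
    """Sizes (in KB) of the first `count` extents under the simple
    growth model described in the lead-in."""
    sizes = []
    for i in range(count):
        if i == 0:
            sizes.append(float(initial_kb))      # initial extent
        elif i == 1:
            sizes.append(float(next_kb))         # first "next" extent
        else:
            sizes.append(sizes[-1] * (1 + pct_increase / 100))
    return sizes

# 10KB initial and next extents, each later extent 50% larger
print(extent_sizes(10, 10, 50, 4))
```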
Create table
• Create table statement - can specify tablespace, no. of extents
• can override parameters for extent allocation
• pctfree - percentage of each page kept free; determines how much of the page can be used for inserts of new rows
  • if 10%, inserts stop when page is 90% full
• pctused - where new inserts start again
  • inserts resume if used space falls below this percentage of the total, default = 40%
pctfree + pctused < 100
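The interplay of pctfree and pctused can be sketched with a toy page that tracks only how full it is. This is a simplified model for illustration, not ORACLE's actual free-list logic; the `Page` class and its methods are hypothetical.

```python
class Page:
    """Toy page: `used` is the percentage of the page occupied by rows."""

    def __init__(self, pctfree=10, pctused=40):
        if pctfree + pctused >= 100:
            raise ValueError("pctfree + pctused must be < 100")
        self.pctfree = pctfree
        self.pctused = pctused
        self.used = 0
        self.accepts_inserts = True

    def insert(self, row_pct):
        limit = 100 - self.pctfree           # e.g. 90% when pctfree = 10
        if not self.accepts_inserts or self.used + row_pct > limit:
            self.accepts_inserts = False     # page is closed to inserts
            return False
        self.used += row_pct
        if self.used >= limit:
            self.accepts_inserts = False
        return True

    def delete(self, row_pct):
        self.used = max(0, self.used - row_pct)
        if self.used < self.pctused:         # dropped below pctused: reopen
            self.accepts_inserts = True
```

With the defaults, a page that has filled to 90% stays closed to inserts until deletes bring it below 40%, which avoids a page flipping between open and closed on every delete.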
Rows
• Row layout on each disk page (see figure)
• Row directory - page byte offset
• can have rows from multiple tables on same page, more info
• in index, the pointer to a row is its row ID (RID) – page #, slot #
• RID can be retrieved in ORACLE but not DB2 (violates relational model rule)
– in ORACLE, rows can be split between pages (row record fragmentation)
– in DB2, entire row moved to new page, need forwarding pointer
Binary Search
• Binary search on disk
– optimal for comparisons - not optimal for disk-based look-up
– must keep data in order
– may be reading values from same page at different times
• Instead use B+-tree index
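Why binary search is poor for disk-based look-up can be seen by recording which pages the probes land on. The sketch below is illustrative only: row positions stand in for key values, and the 10M rows at 20 rows per page are assumed figures.

```python
def binary_search_pages(n_rows, rows_per_page, target):
    """Binary-search row positions 0..n_rows-1 for `target` and return
    (number of probes, list of disk pages touched, in probe order)."""
    lo, hi = 0, n_rows - 1
    probes, pages = 0, []
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        pages.append(mid // rows_per_page)   # page holding the probed row
        if mid == target:
            break
        if mid < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes, pages

probes, pages = binary_search_pages(10_000_000, 20, target=0)
print(probes, "probes touching", len(set(pages)), "distinct pages")
```

Most of the early probes land on widely separated pages, so each costs a random I/O; only the last few probes fall on the same page. A B+-tree with a large fan-out needs far fewer page reads for the same look-up.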
Indexing
• Keyed access retrieval method
• index is a sorted file - sorted by index key
• index entries: index key, pointer (RID)
• pointer is RID
• index resides on disk, partially memory resident when accessed
B+-tree
• Most commonly used index structure type in DBs today
• Based on B-tree
• Used to minimize disk I/O
• available in DB2; ORACLE also has hash cluster; Ingres has heap structure, B-tree, isam (chain together new nodes)
B+-tree
• leaf level pointers to data (RID)
• the remaining are directory (index) nodes that point to other index nodes
• assume number of entries in each index node fits on one page - one node is one page
• if tree with depth of 3, 3 I/Os to get pointer to data
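The depth, and so the number of I/Os to reach a data pointer, follows from the number of entries and the fan-out. A minimal sketch, assuming every node at every level holds the same number of entries and is completely full; `btree_depth` is a made-up helper.

```python
def btree_depth(n_entries, entries_per_node):
    """Number of levels (= page reads from root to leaf) when each
    node is one page holding `entries_per_node` entries."""
    depth = 1
    nodes = -(-n_entries // entries_per_node)    # leaf pages, rounded up
    while nodes > 1:
        nodes = -(-nodes // entries_per_node)    # pages one level up
        depth += 1
    return depth

# 10 million entries, 1000 entries per node
print(btree_depth(10_000_000, 1000))
```

Because the fan-out is large, even very large tables need only a handful of levels.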
B+-tree
• B+-tree structured to get most out of every disk page read
• Read in index node, can make multiple probes to same page if remains in memory
• likely since frequent access to upper-level nodes of actively used B+-trees
• search for leftmost index entry Si such that X <= Si
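Within a single node, finding the leftmost entry Si with X <= Si is a binary search over the sorted entries. A minimal sketch using Python's standard `bisect` module; the node is modelled as a plain sorted list and `find_entry` is a made-up name.

```python
from bisect import bisect_left

def find_entry(separators, x):
    """Index of the leftmost separator S_i with x <= S_i, or None if
    x is larger than every separator in the node."""
    i = bisect_left(separators, x)
    return i if i < len(separators) else None

node = [10, 20, 20, 30]
print(find_entry(node, 15))   # position of the first entry >= 15
print(find_entry(node, 20))   # leftmost of the duplicate 20s
```

Since the whole node is already in memory after one page read, these comparisons cost no further I/O.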
B+-tree
• Index has a directory structure that allows retrieval of a range of values efficiently
• Index entries always placed in sequence by value - can use sequential prefetch on index
• Index entries shorter than data rows and require proportionately less I/O
B+-tree
• Balancing of B+-trees - insert, delete
• nodes usually not full
• utilities to reorganize to lower disk I/O
• most systems allow nodes to become depopulated - no automatic algorithm to balance
• average node below root level 71% full in active growing B+-trees
Duplicate key values
• Duplicate key values in index
• leaf nodes have sibling pointers
• but a delete of a row that has a heavily duplicated key entails a long search through the leaf level of the B+-tree
• Index compression - with multiple duplicates
| header info | PrX keyval RID RID ... RID | PrX keyval RID ... RID |
where PrX is count of RID values
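The compressed layout stores each key value once, followed by its RIDs. A small sketch of the idea, assuming leaf entries arrive as (keyval, RID) pairs already sorted by key; `compress_leaf` and the sample data are hypothetical.

```python
from itertools import groupby

def compress_leaf(entries):
    """entries: (keyval, rid) pairs sorted by keyval.
    Returns one (count, keyval, [rid, ...]) block per distinct key,
    mirroring the PrX keyval RID ... RID layout."""
    blocks = []
    for keyval, group in groupby(entries, key=lambda e: e[0]):
        rids = [rid for _, rid in group]
        blocks.append((len(rids), keyval, rids))
    return blocks

leaf = [("Birmingham", 7), ("Birmingham", 12), ("Boston", 3)]
print(compress_leaf(leaf))
```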
Create Index
Options:
multiple columns
tablespace
storage - initial extents, etc.
percent free, default = 10 - % of each page left unfilled
free page (1 free page for every n index pages)
Can control % of B+-tree node pages left unfilled when index created; refers to initial creation
Clustering
• Placing rows on disk in order by some common index key value (remember the index itself is always sorted)
• clustered (clustering) index - index with rows in the same order as the key values
• efficiency advantage - read in a page, get all of the rows with the same value
• clustering is useful for range queries, e.g. between keyval1 and keyval2
Clustering
• can only cluster table by 1 clustering index at a time
• In DB2 –
– if the table is empty, rows sorted as placed on disk
– subsequent insertions not clustered, must use REORG
Indexes vs. table scan
• To illustrate the difference between table scan, secondary index (non clustered) and clustered index
Assume 10 M customers, 200 cities, 2KB/page, row = 100 bytes, 20 rows/page
Select *
From Customers
Where city = 'Birmingham'
If we assume selectivity = 1/200: 1/200 * 10M = 50,000 customers in a city
Table Scan
Table Scan - read entire table
10,000,000/20 = 500,000 pages
If use prefetch?
500000/32 * .? =
Secondary Index
• Secondary Index –
• In the worst case 1 entry for B'ham per page
• 50,000 pages (10M/200)
• 3 upper nodes of the tree
• Assume 1000 index entries per leaf node, read 50000/1000 index pages
• (3 + 50 + 50,000)*?=
Clustering Index
• Clustering Index –
• All entries for B'ham clustered on same pages
• 50,000/20 = 2500 pages (with 20 rows per page)
• (3 + 50 + 2500)*?=
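The three page counts can be turned into elapsed times once a per-I/O cost is chosen. The sketch below reuses the textbook figures quoted earlier (.0125 s per random I/O, .028 s per 32-page prefetch) as an assumption; the slides leave that factor open. It also assumes every index and data page read through an index costs one random I/O.

```python
import math

RANDOM_IO_S = 0.0125     # assumed: textbook time for one random page read
PREFETCH_32_S = 0.028    # assumed: textbook time for one 32-page prefetch

ROWS = 10_000_000
ROWS_PER_PAGE = 20
MATCHING_ROWS = ROWS // 200               # selectivity 1/200 -> 50,000 rows
UPPER_NODES = 3
LEAF_PAGES = MATCHING_ROWS // 1000        # 1000 entries per leaf -> 50 pages

table_pages = ROWS // ROWS_PER_PAGE       # 500,000 pages
scan_random = table_pages * RANDOM_IO_S
scan_prefetch = math.ceil(table_pages / 32) * PREFETCH_32_S
secondary = (UPPER_NODES + LEAF_PAGES + MATCHING_ROWS) * RANDOM_IO_S
clustered = (UPPER_NODES + LEAF_PAGES
             + MATCHING_ROWS // ROWS_PER_PAGE) * RANDOM_IO_S

for name, secs in [("table scan (random I/O)", scan_random),
                   ("table scan (prefetch)", scan_prefetch),
                   ("secondary index", secondary),
                   ("clustered index", clustered)]:
    print(f"{name:26s} {secs:10.1f} s")
```

Under these assumptions the clustered index is cheapest by a wide margin, and a prefetching table scan comes out ahead of the non-clustered secondary index in its worst case.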
% Free
• Redo the previous calculations assuming relations created with 50% free option specified.
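One way to set up the recalculation is sketched below. It rests on assumptions the slide does not spell out: the free-space percentage applies to both data pages and index leaf pages, it simply reduces how many rows or entries fit per page, and the 3 upper index nodes stay at 3. `page_counts` is a made-up helper.

```python
import math

def page_counts(pct_free, rows=10_000_000, matching=50_000,
                rows_per_page=20, leaf_entries=1000, upper_nodes=3):
    """Pages read by each strategy when pct_free % of every data page
    and index leaf page is left empty."""
    rpp = rows_per_page * (100 - pct_free) // 100   # rows that fit per data page
    epl = leaf_entries * (100 - pct_free) // 100    # entries that fit per leaf
    leaves = math.ceil(matching / epl)
    return {
        "table_scan": math.ceil(rows / rpp),
        "secondary_index": upper_nodes + leaves + matching,
        "clustered_index": upper_nodes + leaves + math.ceil(matching / rpp),
    }

print(page_counts(0))    # pages completely full, as in the earlier slides
print(page_counts(50))   # pages created 50% free
```

The table scan and clustered index roughly double in pages read, while the worst-case secondary index barely changes because it is dominated by one data page per matching row.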
Multiple Indexes
• More than one index on a relation – e.g. class - one index, gender - one index
Composite Index
• One index based on more than one attribute
Create Index index_name on Table (col1, col2,... coln)
• Composite index entry - values for each attribute
class, gender entry in index is: C1, C2, RID
• What would B+ tree look like?
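A feel for the leaf level of a composite index comes from sorting the (C1, C2, RID) entries: they are ordered by the first column, then by the second within it. The sketch below uses made-up class, gender, and RID values and a plain sorted list in place of real leaf pages.

```python
from bisect import bisect_left, bisect_right

# Hypothetical index entries: (class, gender, RID)
entries = sorted([
    (2, "F", 101), (1, "M", 102), (1, "F", 103),
    (2, "M", 104), (1, "F", 105), (3, "F", 106),
])

def rids_for(entries, c1, c2):
    """RIDs whose (C1, C2) equals (c1, c2), found by binary search
    over the sorted composite entries."""
    keys = [(e[0], e[1]) for e in entries]
    lo = bisect_left(keys, (c1, c2))
    hi = bisect_right(keys, (c1, c2))
    return [e[2] for e in entries[lo:hi]]

for e in entries:
    print(e)
print(rids_for(entries, 1, "F"))
```

Entries for one class sit next to each other, so a search on class alone or on class and gender touches a contiguous run. Entries for one gender are scattered across classes, so a search on gender alone cannot use the ordering in the same way.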
Creating Indexes
When determining what indexes to create consider:
workload - mix of queries and frequencies of requests
20% of requests are updates, etc.
can create lots of indexes but:
cost to create
insertions
initial load time high if a large table
index entries can become longer and longer as multiple columns included