Indexes 11. 1 CSE2132 Database Systems Week 11 Lecture Indexes.

Indexes: Consider

SELECT * FROM EMPLOYEE
WHERE EMP_ID = 'E9'

Assuming EMP_ID is unique, we expect to retrieve 1 row.

How many records did we have to access in order to retrieve that 1 row?

P1: E1 - Jones, E2 - Smith, E3 - Wong
P2: E4 - White, E5 - Bloggs
P3: E7 - Chen, E9 - Green

Indexes: an Overview

The minimum amount of data transferred between secondary storage and main memory is 1 page.

Therefore the cost of accessing Emp_Id = 'E9' is measured by the number of pages we had to access to retrieve the record we required. For an unordered file, a linear search will need on average n/2 accesses, where n = the number of pages.

n = 3

BA (Blocks Accessed) = n/2 = 3/2 = 1.5, i.e. about 2 accesses

The minimum possible number of database accesses equals the number of rows retrieved (or fewer if the rows are in the same block/page),

i.e. 1 row = 1 page (block) accessed.

Indexes: an Overview

The use of an index may help here, BUT indexes have their own overheads: indexes may use up to 50% of the allocated file space of the database.

How can we reduce the cost of index use and the space used by indexes?

1. Use efficient file access methods, e.g. if we have an ordered file we can use a binary search rather than just a linear search.

2. Be wise when choosing to create an index.

An Example

r = 30,000 records
block size = 1,024 bytes (or page size)
record length = 100 bytes
BF = Blocking Factor = 1,024/100 = 10 (whole records per block)
Number of blocks = r/BF = 30,000/10 = 3,000 blocks (or pages)

Using a linear search:
BA = n/2 = 3,000/2 = 1,500 (on average)

Using a binary search (if the file is ordered):
$BA = \lceil \log_2 n \rceil$
$BA = \lceil \log_2 3000 \rceil = 12$, so retrieving 1 record still needs up to 12 accesses (this is a maximum).
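The arithmetic above can be checked with a short Python sketch. It makes two rounding assumptions, both consistent with the slide. Only whole records fit in a block, so BF is rounded down. A partly filled last block still counts as a block, so the block count is rounded up. The function name `file_costs` is only illustrative.

```python
import math

def file_costs(records: int, block_size: int, record_length: int) -> dict:
    """Block-access estimates for an unordered file (linear search)
    and an ordered file (binary search), following the lecture's formulas."""
    bf = block_size // record_length          # whole records per block
    n = math.ceil(records / bf)               # number of blocks (pages)
    linear_avg = n / 2                        # average for a linear search
    binary_max = math.ceil(math.log2(n))      # worst case for a binary search
    return {"bf": bf, "blocks": n, "linear_avg": linear_avg, "binary_max": binary_max}

print(file_costs(30_000, 1_024, 100))
# {'bf': 10, 'blocks': 3000, 'linear_avg': 1500.0, 'binary_max': 12}
```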

Hashing - The Number of Accesses

BA = 1, depending on the blocking factor and the fill factor

1 record retrieved = 1 access (optimal)

10 records retrieved = 10 accesses

But what if they are 10 consecutive rows? With BF = 10:

BA = 1 or 2 if the records are stored consecutively, rather than the 10 required for a hashed file organization.
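Here is a minimal sketch of why 10 consecutive rows cost 1 or 2 block reads. It assumes the records are stored contiguously in record-position order, with BF records per block. It then counts how many blocks a run of k rows spans. The helper name is hypothetical.

```python
def blocks_for_consecutive(start: int, k: int, bf: int) -> int:
    """Blocks touched when reading k consecutive records that begin at
    0-based record position `start`, with `bf` records per block."""
    first_block = start // bf
    last_block = (start + k - 1) // bf
    return last_block - first_block + 1

# Hypothetical layout: BF = 10, reading 10 consecutive rows.
print(blocks_for_consecutive(0, 10, 10))  # 1: the run starts on a block boundary
print(blocks_for_consecutive(5, 10, 10))  # 2: the run straddles two blocks
```

In a hashed file, the same 10 rows can land in 10 different buckets. That is where the figure of 10 accesses comes from.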

The Situation When Using Indexes

Indexes rely on a key value for access.

If we do not have a key, we must search the file sequentially.

An index is an auxiliary file that makes it more efficient to search for a record in the data file.

An index is called an ACCESS PATH on a field.

An example of a simple index structure is <key (field value), pointer-to-page>, ordered by the field value.
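As a sketch of the <key, pointer-to-page> structure, the index below is a Python list of (key, page) pairs kept in key order. It is searched with the standard-library `bisect` module. The entries reuse the EMPLOYEE pages from the first example, and the `lookup` helper is hypothetical.

```python
from bisect import bisect_left

# Dense index: one <key, pointer-to-page> entry per record, ordered by key.
index = [("E1", 1), ("E2", 1), ("E3", 1), ("E4", 2), ("E5", 2), ("E7", 3), ("E9", 3)]
keys = [k for k, _ in index]

def lookup(key: str):
    """Binary-search the ordered index; return the page holding `key`, or None."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return index[i][1]
    return None

print(lookup("E9"))  # 3
print(lookup("E6"))  # None
```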

Index Operation

The pages can be moved around and the index address will still be correct, provided the address contained in the directory is updated.

The pointers provide direct access to the data records.

Ideally there should be very few accesses to traverse the index file, as all of these accesses are overheads.

BA = BA(index) + 1

where the extra 1 is the access to retrieve the data record.

One Way Indexes Reduce Accesses

Example 1 from Elmasri & Navathe (p108): data file of 3,000 blocks

Binary search: $BA = \lceil \log_2 3000 \rceil = 12$

Using an index to retrieve a data record, the number of accesses is:
$BA_{total} = BA_i + BA_d = \lceil \log_2 b_i \rceil + 1$, where $b_i$ is the number of index blocks

Size of an index record, assuming the key is 9 bytes (e.g. SSN):
Ri = V + P, where V = length of the key and P = length of the pointer to the data
Ri = 9 + 6 = 15 bytes

Blocking factor BFi = block size / index record size = 1,024/15 = 68 records per block

Number of index blocks = 30,000/68 = 442 (rounded up)

Total accesses are then:

$BA_{i+d} = \lceil \log_2 442 \rceil + 1 = 9 + 1 = 10$ accesses

This is less than the 12 accesses needed by a binary search on the ordered file alone.

NB: a non-ordered data file and a dense index are assumed.
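The dense-index calculation on the last two slides can be reproduced as follows. This sketch assumes the Elmasri & Navathe figures used above: 1,024-byte blocks, a 9-byte key, a 6-byte pointer and 30,000 records. `index_cost` is an illustrative name.

```python
import math

def index_cost(entries: int, block_size: int, key_len: int, ptr_len: int) -> tuple[int, int, int]:
    """Single-level index searched by binary search.

    Returns (BFi, bi, BA) where BFi = floor(B / (V + P)),
    bi = ceil(entries / BFi) and BA = ceil(log2 bi) + 1 (the +1 is the data block).
    """
    bf_i = block_size // (key_len + ptr_len)
    b_i = math.ceil(entries / bf_i)
    ba = math.ceil(math.log2(b_i)) + 1
    return bf_i, b_i, ba

# Dense index: one entry per record of the 30,000-record unordered file.
print(index_cost(30_000, 1_024, 9, 6))  # (68, 442, 10)
```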

Another Way Indexes Reduce Accesses

The data file may be ordered, and therefore our index may be sparsely populated. That is, there is only 1 index entry per block of data records, so there are fewer index records.

P1: E1 Jones, E2 Smith, E3 Wong
P2: E4 White, E5 Bloggs, E6 West
P3: E7 Chen, E8 Brown, E9 Green

Our index would only need to contain 3 entries instead of the 9 needed for a dense index. The number of index entries depends on the blocking factor of the data file. An unordered data file requires 30,000 index entries, one per data record. The ordered file requires only one index entry per block, i.e. 3,000 index entries.
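A small Python sketch contrasts the two index types on the nine-record file above, assuming 3 records per page as shown. The sparse index keeps only the first key of each page. A lookup therefore finds the last entry whose key is less than or equal to the search key, then scans that one page. Names such as `sparse_lookup` are hypothetical.

```python
from bisect import bisect_right

# Data file ordered on EMP_ID, 3 records per page.
pages = {
    1: [("E1", "Jones"), ("E2", "Smith"), ("E3", "Wong")],
    2: [("E4", "White"), ("E5", "Bloggs"), ("E6", "West")],
    3: [("E7", "Chen"), ("E8", "Brown"), ("E9", "Green")],
}

# Dense index: one entry per record.
dense = sorted((emp_id, p) for p, recs in pages.items() for emp_id, _ in recs)
# Sparse index: one entry per page, keyed on the first (lowest) key in the page.
sparse = [(recs[0][0], p) for p, recs in sorted(pages.items())]

def sparse_lookup(key: str):
    """Find the last index entry with key <= `key`, then scan that single page."""
    first_keys = [k for k, _ in sparse]
    i = bisect_right(first_keys, key) - 1
    if i < 0:
        return None                      # smaller than every key in the file
    page = sparse[i][1]
    return next((name for emp_id, name in pages[page] if emp_id == key), None)

print(len(dense), len(sparse))  # 9 3
print(sparse_lookup("E8"))      # Brown
```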

3,000 index records / 68 = 45 index blocks (ordered data file, rounded up)

Total accesses are then:

$BA = \lceil \log_2 45 \rceil + 1 = 6 + 1 = 7$ accesses for an indexed ordered file

NOTE:
- Index files may be independent of the data file.
- The index can be created and dropped independently of the data file.
- The index file can be of any file organization (in Oracle it is a B+ tree).
- There can be any number of index files for a data file.
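The three cases from the Elmasri & Navathe example can be put side by side as a quick check of the figures. The sketch uses the same assumptions: 68 index entries per block, and 30,000 records in 3,000 blocks. `accesses_with_index` is an illustrative helper, not a library function.

```python
import math

def accesses_with_index(index_entries: int, fanout: int) -> int:
    """ceil(log2 bi) + 1, where bi = ceil(index_entries / fanout)."""
    b_i = math.ceil(index_entries / fanout)
    return math.ceil(math.log2(b_i)) + 1

data_blocks = 3_000
fanout = 68  # 15-byte index entries in a 1,024-byte block

print("binary search on ordered file:", math.ceil(math.log2(data_blocks)))        # 12
print("dense index, unordered file:  ", accesses_with_index(30_000, fanout))       # 10
print("sparse index, ordered file:   ", accesses_with_index(data_blocks, fanout))  # 7
```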

Index Terminology

The attribute(s) on which an index is built are called the INDEXED FIELD(S).
- If the index is built on the PRIMARY KEY, it is called the PRIMARY INDEX.
- If an index is built on any other attributes, it is called a SECONDARY INDEX, and the attribute values may be non-unique.

A SIMPLE CLUSTERING INDEX (this is not an option in Oracle): the index is built on a non-unique attribute and includes one index entry for each distinct value of the attribute. The index entry points to the first data block that contains records with that attribute value. The underlying file must be ordered on the chosen non-unique attribute.
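The following sketch shows how a simple clustering index could be built. It assumes a small hypothetical data file that is already ordered on DNO, with 3 records per block. Each distinct value gets one entry pointing at the first block that contains it. Because the file is ordered, the remaining records with that value follow in the same block or the next ones.

```python
# Hypothetical data file ordered on DNO (non-unique), 3 records per block.
blocks = [
    [(1, "Ann"), (1, "Bob"), (1, "Cy")],
    [(2, "Di"), (2, "Ed"), (3, "Flo")],
    [(3, "Gus"), (4, "Hal"), (5, "Ivy")],
]

def build_clustering_index(blocks):
    """One entry per distinct DNO value: value -> first block containing it."""
    index = {}
    for block_no, block in enumerate(blocks):
        for dno, _ in block:
            index.setdefault(dno, block_no)  # keep only the first block seen
    return index

print(build_clustering_index(blocks))  # {1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
```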

A Primary Index on an Ordered File (ISAM)

[Figure: an INDEX FILE of <PRIMARY KEY VALUE, BLOCK POINTER> entries, each pointing to a block of the DATA FILE (ENO, ENAME, SALARY, BIRTHDATE).]

This assumes ENO is unique and is being used as a primary key. It is a non-dense index, as the data file is ordered on the index key.

A Dense Secondary Index on a Non-Ordering Column of the Data File

[Figure: an INDEX FILE of <INDEX FIELD VALUE, BLOCK POINTER> entries sorted by ENAME (Aaron, Abbot, Adams, Akers, Alexander, Alfred, Allen, Anderson, ...), each pointing to the block of the DATA FILE (ENAME, DNO, EMPNO, SALARY) that holds that record.]

This assumes ENAME has been nominated as an alternative key, so an index entry is required for every record. The data file has been ordered on EMPNO.

A Clustering Index on a Non-Key Column

[Figure: an INDEX FILE of <CLUSTERING FIELD VALUE, BLOCK POINTER> entries (1, 2, 3, 4, 5), each pointing to the first block of the DATA FILE (DNO, NAME, EMPNO, SALARY) that contains that department number.]

It has been decided to order the data file on the department number. This is sometimes termed a clustering index, but it is different from a cluster in Oracle.

Consider Which Type of Index?

SELECT * FROM EMP WHERE E# = 'E1'
SELECT * FROM EMP WHERE DEPT = 'D1'
SELECT * FROM EMP WHERE ENAME = 'ABLE'
SELECT * FROM EMP WHERE ENAME LIKE 'A%'
SELECT * FROM EMP WHERE DEPT = 'D2' AND ENAME > 'CAIN'

E#   ENAME   DEPT
E1   ABLE    D1
E2   ADAMS   D1
E3   BLAKE   D1
E4   BROCK   D2

Multi-Level Indexes

Because a single-level index is an ordered file, we can create a non-dense index to the index, i.e. a second-level index.

We can repeat this process of levelling until the highest level of our index fits into main memory, probably 1 page in size. This will reduce the number of I/Os in traversing the index by 1.

This arrangement of index levels decreasing in breadth is a tree structure.
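The levelling process can be sketched in Python under some simplifying assumptions. The bottom level stands in for the ordered data file. Every level, including the data, uses the same hypothetical fan-out. Each higher level stores the first key of every block of the level below. The search reads one block per level, so the number of reads equals the number of levels.

```python
from bisect import bisect_right

def build_levels(sorted_keys, fanout):
    """Return the levels of a multi-level non-dense index, bottom level first.

    Level 0 is the ordered data file split into blocks of `fanout` keys; each
    higher level holds the first key of every block in the level below, and
    levelling stops when a level fits in a single block.
    """
    keys = list(sorted_keys)
    if not keys or fanout < 2:
        raise ValueError("need at least one key and a fan-out of at least 2")
    levels = []
    while True:
        blocks = [keys[i:i + fanout] for i in range(0, len(keys), fanout)]
        levels.append(blocks)
        if len(blocks) == 1:
            return levels
        keys = [block[0] for block in blocks]  # one entry per block below

def search(levels, key, fanout):
    """Walk from the top level down; return (found, blocks_read)."""
    block_no, reads = 0, 0
    for depth in range(len(levels) - 1, -1, -1):   # top level first
        block = levels[depth][block_no]
        reads += 1                                  # one block read per level
        pos = bisect_right(block, key) - 1          # last key <= search key
        if pos < 0:
            return False, reads                     # smaller than every key
        if depth == 0:
            return block[pos] == key, reads         # data level reached
        block_no = block_no * fanout + pos          # child block one level down
    return False, reads

levels = build_levels(range(1, 28), fanout=3)
print(len(levels))            # 3 (the data level plus two index levels)
print(search(levels, 20, 3))  # (True, 3)
```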

Indexes as Tree Structures

Tree structures have certain properties we can take advantage of:

1. The number of pointers at one level determines the number of values stored at the next level.

We can therefore predict the number of levels, and hence the number of accesses required for our data file. If a node has p data values, it has p + 1 pointers to p + 1 nodes, each of which has p values. For example, a node that can fit 2 data values has 3 pointers to three nodes, each with two data values and 3 pointers. These point to 9 nodes, which together hold 18 data values.

Indexes as Tree Structures

[Figure: a 3-level tree index in which each node holds 2 data values and 3 branch pointers.]

A 3-level index with 2 data values per node could point to 18 data values.
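The capacity arithmetic for a node holding p values and p + 1 pointers can be checked with a short loop. This sketch assumes every node is full, and `tree_capacity` is a hypothetical helper.

```python
def tree_capacity(p: int, levels: int) -> tuple[int, int]:
    """Each node holds p values and p + 1 child pointers (all nodes full).
    Returns (values on the bottom level, values across all levels)."""
    nodes, bottom, total = 1, 0, 0
    for _ in range(levels):
        bottom = nodes * p      # values held by the nodes at this level
        total += bottom
        nodes *= p + 1          # each node fans out to p + 1 children
    return bottom, total

print(tree_capacity(2, 3))  # (18, 26)
```

With p = 2 and 3 levels, the bottom level holds 18 values, matching the slide. Counting the values in the upper levels as well gives 26.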

Indexes as Tree Structures

2. If a tree structure is balanced and self-maintaining, then the number of accesses to the data file is constant.

The number of accesses using a multi-level index is:

$BA = \lceil \log_b n \rceil + 1$

where b is the branching factor and n is the number of pages. More precisely, n is the number of first-level index entries: one per record for a dense index, or one per data page for a non-dense index.

NOTE: the number of accesses will decrease as n becomes smaller (i.e. use a non-dense index) and as the branching factor b increases.
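Here is a sketch of the formula in Python. Rather than calling a floating-point logarithm, it counts how many times n must be divided by b, rounding up, before a single block remains. That count equals ⌈log_b n⌉. The function name is illustrative.

```python
def multilevel_accesses(n: int, b: int) -> int:
    """BA = ceil(log_b n) + 1, using integer ceiling division to avoid
    floating-point rounding: each division builds one more index level."""
    levels = 0
    while n > 1:
        n = -(-n // b)   # ceiling division: blocks needed at the next level
        levels += 1
    return levels + 1    # + 1 for the data block

print(multilevel_accesses(30_000, 68))  # 4  (dense: one entry per record)
print(multilevel_accesses(3_000, 68))   # 3  (non-dense: one entry per page)
print(multilevel_accesses(30_000, 2))   # 16 (small branching factor)
```

The three calls illustrate the NOTE above. A smaller n or a larger b means fewer accesses.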

Indexes as Tree Structures

(Example 3, Elmasri and Navathe Ch 5)
V = 9 bytes, plus a 6-byte pointer, so index entry Ri = 15 bytes
The blocking factor = 1,024/15 = 68; this is known as the fan-out for a multi-level index
The number of first-level blocks = 30,000/68 = 442 blocks
The number of second-level blocks = 442/68 = 7 blocks
The number of third-level blocks = 7/68 = 1 block
(each division rounded up)

BA = t (number of levels) + 1 = 3 + 1 = 4

or $BA = \lceil \log_{68} 30000 \rceil + 1 = 3 + 1 = 4$

Index size = b1 + b2 + b3 = 442 + 7 + 1 = 450 blocks
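The level-by-level figures in Example 3 can be reproduced with a sketch like this one. It assumes a fan-out of 68 and 30,000 first-level index entries, as above. `level_sizes` is a hypothetical name.

```python
import math

def level_sizes(entries: int, fanout: int) -> list[int]:
    """Blocks needed at each index level, first level first."""
    sizes = []
    blocks = math.ceil(entries / fanout)
    while True:
        sizes.append(blocks)
        if blocks <= 1:
            return sizes
        blocks = math.ceil(blocks / fanout)  # one entry per block of the level below

sizes = level_sizes(30_000, 68)
print(sizes)           # [442, 7, 1]
print(len(sizes) + 1)  # 4 block accesses (t levels + 1 data block)
print(sum(sizes))      # 450 index blocks in total
```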

Index Usage

An index is used to optimize data retrieval.

SELECT * FROM EMP WHERE E# = 'E4'

Index:
Key   Page #
E1    1
E20   2
E40   3

Data pages:
P1: E1, E2, E10
P2: E20, E30
P3: E40, E45, E56

Total number of accesses = index accesses + 1 data access = 1 + 1 = 2
Average number of serial accesses for a table with 3 pages is n/2 = 2 (1.5 rounded up)

The Query Optimizer

(p. 210, 242-247 Hoffer et al. 6th Edn.; Ch 18 Date)

The query optimizer may decide against using an index.

In some cases the optimizer will use only the index to answer a query and will not access any data pages, e.g.

SELECT COUNT(*) FROM EMP

Some relational products force the use of an index even though it may be inefficient to do so:
1. To support a PK (in Oracle, if a column is made a primary key, an index is created).
2. To implement clustering.

Good and Bad Candidates for Indexes

Create indexes on columns used in predicates:
- read-only and frequently accessed tables, if > 3 pages
- the columns of a predicate in frequently executed transactions
- high-update tables can also use indexes, if > 6 pages
- columns used in joins are candidates for indexes
- columns on which aggregates are frequently calculated
- use indexes on FKs if using RI (referential integrity), so that integrity violations or cascades are worked out quickly

Avoid creating indexes on:
- attributes with a small number of unique values, e.g. Gender (M, F), although the Oracle BITMAP index is suitable in this situation
- high-update attributes (tables): keep their indexes down to a reasonable number, 2 or 3 if possible

Other Issues for Indexes

The above criteria are not always mutually exclusive, so we must decide on index usage based on the most important requirements. There are tricks which can be used to enhance the use of indexes:
- place the index, or an index level, in memory
- place indexes on a fast device
- place indexes on a separate device from the data they reference
- use multiple-column indexes where possible, e.g. INDEX (C1, C2, C4):

SELECT * FROM EMP
WHERE C1 = 'A'
AND C2 = 'B'
AND C5 = 'E'
-- will use both C1 and C2 (treats this as a leading substring of the index key)

SELECT * FROM EMP
WHERE C1 = 'A'
AND C4 = 'D'
AND C5 = 'E'
-- will use only one column (C1)
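The rule of thumb for multiple-column indexes can be modelled as a leading-prefix match. This is a deliberately simplified sketch that considers equality predicates only. Real optimizers may have additional access methods. `usable_prefix` is a hypothetical helper.

```python
def usable_prefix(index_cols: list[str], equality_cols: set[str]) -> list[str]:
    """Leading index columns matched by equality predicates: stop at the
    first index column that the WHERE clause does not constrain."""
    prefix = []
    for col in index_cols:
        if col not in equality_cols:
            break
        prefix.append(col)
    return prefix

index = ["C1", "C2", "C4"]
print(usable_prefix(index, {"C1", "C2", "C5"}))  # ['C1', 'C2']
print(usable_prefix(index, {"C1", "C4", "C5"}))  # ['C1']
```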

An Example

An Oracle block may be 2,048 bytes (block sizes vary with the operating system).
Number of EMPs = 30,000
48 bytes of overhead per index block
Blocking factor of the index (keys per index page) = 2,000/(key length + 6) = 2,000/(9 + 6) = 133
- in practice the branching factor will be less than 133, as we will want to include free space
b1 = number of records / 133 = 30,000/133 = 226 (rounded up)
b2 = b1/133 = 2 (rounded up)
b3 = b2/133 < 1, i.e. 1 block
t = 3 levels of index data
or $t = \lceil \log_{133} 30000 \rceil = \lceil 2.1 \rceil = 3$ (where $\log_{133} 30000 = \log_{10} 30000 / \log_{10} 133$)
Maximum number of records with a three-level index with the blocking factor as indicated = 133 × 133 × 133 ≈ 2.35 million
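Here is a sketch of the Oracle-style calculation above. The 2,048-byte block, 48-byte overhead, 9-byte key and 6-byte pointer come from the slide. The `fill` parameter is a hypothetical way to leave free space in each block, as the slide suggests. It is not meant to model any particular Oracle storage setting.

```python
import math

def index_levels(records: int, block_size: int, overhead: int,
                 key_len: int, ptr_len: int, fill: float = 1.0) -> tuple[int, list[int], int]:
    """Return (fan-out, blocks per level, max records addressable)."""
    usable = int((block_size - overhead) * fill)  # bytes available for entries
    fanout = usable // (key_len + ptr_len)        # index entries per block
    sizes, blocks = [], math.ceil(records / fanout)
    while True:
        sizes.append(blocks)
        if blocks <= 1:
            break
        blocks = math.ceil(blocks / fanout)       # next level up
    capacity = fanout ** len(sizes)               # records reachable with this many levels
    return fanout, sizes, capacity

print(index_levels(30_000, 2_048, 48, 9, 6))            # (133, [226, 2, 1], 2352637)
print(index_levels(30_000, 2_048, 48, 9, 6, fill=0.75)) # (100, [300, 3, 1], 1000000)
```

With a 75% fill the fan-out drops to 100, but three levels still suffice for 30,000 records.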