7/31/2019 Chap2 FileOrg Indexes

File Organization and Index Structures

Instructor: Mr Mourad Benchikh

Text Books: Elmasri & Navathe Chap. 5+6

Ramakrishnan & Gehrke Chap. 7+8+9

Oracle9i documentation

First Semester 1427-1428

- Databases are stored physically as files of records, typically on magnetic disks.
- This chapter deals with the organization of databases in storage and the techniques for accessing them efficiently using various algorithms, some of which require auxiliary data structures called indexes.
- The emphasis is on the search process; deletion, update, and insertion issues are not covered.


Storage Medium

Primary Storage
- Main memory and the smaller but faster cache memories.
- Fast access to data, but limited storage capacity.
- Can be operated on directly by the CPU.

Secondary Storage
- Magnetic disks, optical disks, and tapes.
- Larger capacity and lower cost.
- Slower access to data.
- Data cannot be processed directly by the CPU.

Magnetic Disks (secondary storage)
- Data is transferred between main memory and disk in units of disk blocks: blocks are the units of data transfer and data allocation.
- For a read command, the block from disk is copied into the buffer.
- For a write command, the contents of the buffer are copied into the disk block.


Records

- Data is usually stored in the form of records.
- Each record consists of a collection of related data values or items.
- Records usually describe entities and their attributes.
  - For example, an EMPLOYEE record represents an employee entity, and each field value in the record specifies some attribute of that employee, such as NAME, BIRTHDATE, SALARY.
- A collection of field names and their corresponding data types constitutes a record type or record format.

C notation:

struct employee {
    char name[30];
    char ssn[9];
    int salary;
    int jobCode;
    char department[20];
};
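As a rough illustration of what a fixed-length record format means on disk, the sketch below mirrors the C struct with Python's standard `struct` module. The format string, the choice of 4-byte integers with no alignment padding, and the helper names `pack_employee` and `unpack_employee` are assumptions made for this example, not part of the original slides.

```python
import struct

# Hypothetical on-disk layout mirroring the C struct above.
# "=" selects standard sizes with no alignment padding (int = 4 bytes).
EMPLOYEE_FORMAT = "=30s9sii20s"
RECORD_SIZE = struct.calcsize(EMPLOYEE_FORMAT)  # 30 + 9 + 4 + 4 + 20 bytes


def pack_employee(name, ssn, salary, job_code, department):
    """Encode one employee as a fixed-length byte string."""
    return struct.pack(
        EMPLOYEE_FORMAT,
        name.encode("ascii"),        # padded with NUL bytes up to 30
        ssn.encode("ascii"),         # exactly 9 bytes expected
        salary,
        job_code,
        department.encode("ascii"),  # padded with NUL bytes up to 20
    )


def unpack_employee(raw):
    """Decode a fixed-length byte string back into field values."""
    name, ssn, salary, job_code, department = struct.unpack(EMPLOYEE_FORMAT, raw)

    def text(field):
        return field.rstrip(b"\x00").decode("ascii")

    return text(name), text(ssn), salary, job_code, text(department)


record = pack_employee("Smith", "123456789", 30000, 5, "Research")
```

Every packed record has the same length, `RECORD_SIZE`, which is what makes positional access to records possible.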


File

- A file is a sequence of records.
- Usually all records in a file are of the same record type (fixed-length records).
- Variable-length records; some possible schemes:
  - The file records are of the same record type, but one or more of the fields are of varying size.
  - The file records are of the same record type, but one or more of the fields may have multiple values for individual records.
  - The file records are of the same record type, but one or more of the fields are optional.
  - The file includes records of different types; each record is preceded by a record type indicator. For example, if a relation exists between EMPLOYEE and DEPARTMENT, their corresponding records can be stored physically contiguous (clustered) in order to minimize I/O operations.
- In general, a block contains one or more records belonging to one file only:
  - Spanned organization: records can cross block boundaries.
  - Unspanned organization: records cannot cross block boundaries.
- Blocking factor: Bfr = number of records per block.
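For fixed-length records in an unspanned organization, the blocking factor follows directly from the block size and the record size. The sketch below works through that arithmetic; the block size of 512 bytes, the record size of 67 bytes, and the file of 30,000 records are made-up values chosen only for illustration.

```python
import math


def blocking_factor(block_size, record_size):
    """Records per block for fixed-length, unspanned records: floor(B / R)."""
    return block_size // record_size


def blocks_needed(num_records, bfr):
    """Number of blocks b needed to hold num_records records: ceil(r / Bfr)."""
    return math.ceil(num_records / bfr)


# Assumed example values (not from the slides).
B, R, r = 512, 67, 30000
bfr = blocking_factor(B, R)
unused_bytes_per_block = B - bfr * R
b = blocks_needed(r, bfr)
```

With these values each block holds 7 records and leaves 43 bytes unused, which is the space a spanned organization could reclaim.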


Allocating File Blocks

Contiguous Allocation
- The file blocks are allocated to consecutive disk blocks.
- Reading the whole file is very fast (using double buffering).
- Expanding the file is difficult.

Linked Allocation
- Each file block contains a pointer to the next file block.
- Easy to expand, but slow to read the whole file.

Combination
- Allocates clusters of consecutive disk blocks, and the clusters are linked.

Indexed Allocation
- One or more index blocks contain pointers to the actual file blocks.


Organization & Access Method

File Organization
- The organization of the data of a file into records, blocks, and access structures.
- The way records and blocks are placed on the storage medium and interlinked.
- Example: sorted file.

Access Method
- Provides a group of operations that can be applied to a file: Open, Find, Delete, Modify, Insert, Close, etc.
- It is possible to apply several access methods to a file organization.
- Some access methods can be applied only to files organized in certain ways:
  - An indexed access method cannot be applied to a file without an index.

Choose the file organization that efficiently implements the access methods needed by the application.


Heap Files (Unordered Files)

Heap File (Pile)
- The simplest type of file organization.
- Records are placed in the file in the order in which they are inserted.
- New records are inserted at the end of the file (the address of the last block is kept in the file header).
- Searching, using any search condition, involves a linear search, which is an expensive procedure.

Relative File (Direct File)
- Unordered fixed-length records using unspanned blocks and contiguous allocation.
- We can then access any record by its position in the file.
- The i-th record is located in block floor(i / Bfr).
- This organization helps to locate a record by its position, but not to locate a record based on a search condition.
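The positional access rule for a relative file can be sketched in a few lines. The code assumes that records and blocks are both numbered from 0; the function names and the block and record sizes used in the example are illustrative assumptions.

```python
def locate_record(i, bfr):
    """Return (block number, slot within block) of the i-th record.

    Assumes records and blocks are numbered from 0, so the block is
    floor(i / Bfr) and the slot is i mod Bfr.
    """
    return divmod(i, bfr)


def byte_offset(i, bfr, block_size, record_size):
    """Byte offset of the i-th record in a contiguously allocated file."""
    block, slot = locate_record(i, bfr)
    return block * block_size + slot * record_size


# Assumed example: 7 records per 512-byte block, 67-byte records.
position = locate_record(23, 7)
offset = byte_offset(23, 7, 512, 67)
```

No search is involved: the address is computed from the record number alone, which is why this organization does not help with conditions on field values.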


Sorted Files

- An organization that physically orders the records of a file on disk based on the values of one of their fields, called the ordering field.
- If the ordering field is also a key field of the file, then the field is called the ordering key for the file.
  - Figure 5.9 shows an ordered file with NAME as the ordering key field (assuming that employees have distinct names).
- Reading the records in order of the ordering key values becomes extremely efficient, because no sorting is required.
- A search condition based on the value of an ordering key field gives faster access when the binary search technique is used.
- Ordering does not provide any advantage for random or ordered access to the records based on values of the other, non-ordering fields of the file. In this case, a linear search is needed for random access.


Binary Search

Algorithm 5.1: Binary search on an ordering key of a disk file

L = 1; U = b; /* b is the number of file blocks */
while (U >= L) do
begin
    I = (L + U) div 2;
    read block I of the file into the buffer;
    if K < (ordering key field value of the first record in block I)
        then U = I - 1
    else if K > (ordering key field value of the last record in block I)
        then L = I + 1
    else if the record with ordering key field value = K is in the buffer
        then goto found
    else goto notFound;
end;
goto notFound;

If b is the number of blocks in a sorted file, a binary search reads about log2(b) blocks on average.
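The block-level binary search above can be sketched as runnable code. In this sketch the file is simulated as an in-memory list of blocks, each block being a non-empty list of (key, record) pairs sorted by key; that representation, the 0-based block numbering, and the read counter are assumptions made for illustration.

```python
def block_binary_search(blocks, key):
    """Binary search over the blocks of a file ordered on its key.

    Returns (record or None, number of blocks read).
    """
    low, high = 0, len(blocks) - 1
    reads = 0
    while low <= high:
        mid = (low + high) // 2
        block = blocks[mid]          # simulates reading block mid into the buffer
        reads += 1
        if key < block[0][0]:        # smaller than the first key in the block
            high = mid - 1
        elif key > block[-1][0]:     # larger than the last key in the block
            low = mid + 1
        else:                        # the key can only be in this block
            for k, record in block:
                if k == key:
                    return record, reads
            return None, reads
    return None, reads


# Assumed sample file: 16 blocks of 4 records each, keys 0..63.
sample_blocks = [
    [(k, f"rec{k}") for k in range(start, start + 4)]
    for start in range(0, 64, 4)
]
found, reads_used = block_binary_search(sample_blocks, 37)
```

In this sample file of 16 blocks, the search for key 37 reads 3 blocks, compared with up to 16 for a linear scan.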


Hashing Organization

- Provides very fast access to records on certain search conditions.
- The search condition must be an equality condition on a hash field of the file.
- In most cases, the hash field is also a key field of the file (hash key).

Hashing
- The idea is to provide a function h, called a hash function, that is applied to the hash field value of a record and yields the address of the disk block in which the record is stored.
- The search for the record within the block can be carried out in a main memory buffer.


Internal Hashing

Internal files
- Hashing is also used as an internal search structure within a program whenever a group of records is accessed exclusively by using the value of one field.
- Hashing is implemented as a hash table through the use of an array of records.
- Suppose that the array index range is from 0 to N-1; then we have N slots whose addresses correspond to the array indexes.
- We choose a hash function that transforms the hash field value into an integer between 0 and N-1.
- One common hash function is h(K) = K mod N; the resulting value is used as the record address.


Internal Hashing

[Figure: internal hashing. The hash function H(K) = K mod N maps the key of each of the r records to one of N record slots, numbered 0, 1, ..., N-1.]

In general, r N


Hashing Function

- The key is a student id (six digits).
- Assume we have N = 100,000 record slots, numbered 00000 to 99999.
- H(K): student_id mod 100000

  085768: 085768 mod 100000 = 85768
  134281: 134281 mod 100000 = 34281
  101004: 101004 mod 100000 = 1004
  100000: 100000 mod 100000 = 0
  601004: 601004 mod 100000 = 1004 (collision)

Collision
- A collision occurs when the hash field value of a record that is being inserted hashes to an address that already contains a different record.
- The process of finding another position (after a collision) is called collision resolution.
- Methods for collision resolution:
  - Open addressing
  - Chaining
  - Multiple hashing
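The student-id example and one of the listed resolution methods (chaining) can be put together in a short sketch. Using a Python dictionary of lists in place of a preallocated array of N slots is a simplification assumed here, and the function names are hypothetical.

```python
N = 100_000  # number of record slots, as in the example above


def h(student_id):
    """Hash function from the slide: student_id mod N."""
    return student_id % N


def insert(table, student_id, record):
    """Insert using chaining: colliding records share a per-slot list.

    Returns True if the slot already held a record with a different key.
    """
    chain = table.setdefault(h(student_id), [])
    collided = any(k != student_id for k, _ in chain)
    chain.append((student_id, record))
    return collided


def lookup(table, student_id):
    """Hash to the slot, then scan its (usually short) chain."""
    for k, record in table.get(h(student_id), []):
        if k == student_id:
            return record
    return None


table = {}
ids = [85768, 134281, 101004, 100000, 601004]  # 085768 written without its leading zero
collisions = [sid for sid in ids if insert(table, sid, f"student {sid}")]
```

Only the last id collides, because 601004 and 101004 both hash to slot 1004; both remain retrievable through the chain.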


External Hashing

- Hashing for disk files is called external hashing.
- The target address space is made of buckets, each of which holds multiple records. A bucket is either one disk block or a cluster of contiguous blocks.
- The hashing function maps the indexing field's value into a relative bucket number.
- A table maintained in the file header converts the bucket number into the corresponding disk block address.
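The two-step lookup described above (hash to a relative bucket number, then translate it through the header table) can be sketched as follows. The class name `HashFile`, the dictionary standing in for the disk, and the made-up block addresses are all assumptions for illustration; bucket overflow is deliberately not handled.

```python
class HashFile:
    """Toy model of external hashing with a header table (illustrative only)."""

    def __init__(self, block_addresses):
        # File header: relative bucket number -> disk block address.
        self.header = list(block_addresses)
        # Simulated disk: block address -> list of (key, record) pairs.
        self.disk = {address: [] for address in self.header}

    def bucket_number(self, key):
        return key % len(self.header)

    def insert(self, key, record):
        address = self.header[self.bucket_number(key)]
        self.disk[address].append((key, record))  # overflow handling omitted

    def search(self, key):
        address = self.header[self.bucket_number(key)]
        bucket = self.disk[address]               # one block read
        for k, record in bucket:                  # in-memory scan of the buffer
            if k == key:
                return record
        return None


# Assumed example: 4 buckets stored at arbitrary block addresses.
hash_file = HashFile([9040, 1212, 7777, 3050])
for key in (10, 21, 33, 14):
    hash_file.insert(key, f"record {key}")
```

Because the header table adds one level of indirection, buckets do not have to be stored at consecutive disk addresses.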


Dynamic Files & Hashing

- One problem with hashing so far is that the address space N is fixed.
  - If the number of records grows beyond the original size, the file must be reorganized.
- How to handle dynamic files better?
  - Extendible hashing
  - Dynamic hashing
  - Linear hashing


Indexing

- Index file (same idea as a textbook index): an auxiliary structure designed to speed up access to desired data.
- Indexing field: the field on which the index file is defined.
- The index file stores each value of the indexing field along with a pointer: either pointer(s) to the block(s) that contain record(s) with that field value, or a pointer to the record with that field value.
  - In Oracle, the pointer is called a RowID; it tells the DBMS where the row (record) is located (by file, block within that file, and row within the block).
- To find a record in the data file based on a selection criterion on an indexing field, we first access the index file, which then gives access to the record in the data file.
- The index file is much smaller than the data file, so searching it is fast.
- Indexing is important for file systems and DBMSs:
  - Databases eventually map data to file structures on disk:
    - Records of each relation may be stored in a separate file.
    - Records of several different relations can be stored in the same file (i.e. a physically clustered file organization, to minimize I/O).
  - In DBMSs, the query processor accesses the index structures when processing a query (e.g., an indexed join, also called a single-loop join).


Types of Indexes

- Indexes on ordered vs. unordered files.
- Dense vs. non-dense (i.e. sparse) indexes:
  - Dense: an entry in the index file for each record of the data file.
  - Sparse: only some of the data records are represented in the index, often one index entry per block of the data file.
- Primary indexes vs. secondary indexes.
- Ordered indexes vs. hash indexes:
  - Ordered indexes: indexing fields are stored in sorted order.
  - Hash indexes: indexing fields are stored using a hash function.
- Single-level vs. multi-level:
  - A single-level index is an ordered file and is searched using binary search.
  - Multi-level indexes are tree-structured; they improve the search and require a more elaborate search algorithm.
- Index on a single indexing field vs. index on multiple indexing fields (i.e. composite index):
  - If a certain combination of fields is used frequently, set an index on multiple fields.


Single-Level Ordered Index: Primary Index

- Physical records may be kept ordered on the primary key.
- The index is ordered, but has only one index entry for each block (non-dense).
- Each index entry has the value of the primary key field for the first record (or the last record) in a block and a pointer to that block.
- This reduces the index requirements:
  - There are fewer index entries than records in the file.
  - Binary search over the index can be faster (fewer index blocks to read than with a binary search on the ordered data file).


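Single-Level Ordered Index: Primary Index

[Figure: a data file ordered on its primary key, with a primary index pointing to its blocks.]

Data file records:
10567  J. Doe     CS  3
11589  T. Allen   BA  2
15973  M. Smith   CS  3
29579  B. Zimmer  BS  1
34596  T. Atkins  ME  4
75623  J. Wong    BA  3
84920  S. Allen   CS  4
96256  P. Wright  ME  2

Index entries: 15973, 75623, 96256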
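A lookup through a sparse primary index can be sketched with the sample records above. The grouping of the records into three blocks is an assumption inferred from the three index values (each taken as the key of the last record in its block); the in-memory lists and the function name `find` are illustrative only.

```python
from bisect import bisect_left

# Assumed block layout: anchors 15973, 75623, 96256 are the last keys per block.
data_blocks = [
    [(10567, "J. Doe"), (11589, "T. Allen"), (15973, "M. Smith")],
    [(29579, "B. Zimmer"), (34596, "T. Atkins"), (75623, "J. Wong")],
    [(84920, "S. Allen"), (96256, "P. Wright")],
]

# Non-dense index: one entry per block, holding the block's last key.
index_keys = [block[-1][0] for block in data_blocks]


def find(key):
    """Binary-search the index, then scan the single candidate block."""
    position = bisect_left(index_keys, key)   # first anchor >= key
    if position == len(index_keys):
        return None                           # key is beyond the last block
    for k, name in data_blocks[position]:     # one data block read
        if k == key:
            return name
    return None
```

The index holds 3 entries for 8 records; the binary search runs over the index only, and exactly one data block is read per lookup.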


Single-Level Ordered Index: Clustering Index

- Records are physically ordered by a non-key field.
- Same general structure as an ordered file index.
- One entry in the index for each distinct value of the clustering field, with a pointer to the first block in the data file that has a record with that value for its clustering field.
- Possibly many records for one index entry (non-dense).
- Sometimes entire blocks are reserved for each distinct clustering field value.


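Single-Level Ordered Index: Clustering Index

[Figure: a data file ordered on the department field, with a clustering index pointing to the first block for each department.]

Data file records:
11589  T. Allen   BA  2
75623  J. Wong    BA  3
29579  B. Zimmer  BS  1
10567  J. Doe     CS  3
15973  M. Smith   CS  3
84920  S. Allen   CS  4
34596  T. Atkins  ME  4
96256  P. Wright  ME  2

Index entries: BA, BS, CS, ME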


Single-Level Ordered Index: Secondary Indexes

- A secondary index is an ordered file with two fields:
  - a non-ordering field of the data file (the indexing field);
  - a block pointer or a record pointer.
- There can be several secondary indexes for the same file, but only one primary index.
- A secondary index on a non-ordering key field is dense. See Figure 6.4.
- Several options exist for a secondary index on a non-key field:
  - Option 1: include several index entries with the same value of the indexing field, one for each record (dense index).
  - Option 2: more commonly used; keep a single entry for each index value and create an extra level of indirection to handle the multiple pointers. See Figure 6.5.
  - Etc.


Types of Single-Level Ordered Indexes

                 Ordering Field      Non-ordering Field
Key Field        Primary Index       Secondary Index (key)
Non-key Field    Clustering Index    Secondary Index (non-key)

Type of index          Number of first-level index entries                Dense or non-dense
Primary                Number of blocks in data file                      Non-dense
Clustering             Number of distinct index field values              Non-dense
Secondary (key)        Number of records in data file                     Dense
Secondary (non-key)    Number of distinct index field values (Option 2)   Non-dense


Static Multilevel Indexes

- A multilevel index considers the index file (first level) as an ordered file with a distinct value for each entry of the indexing field. The primary index on the first level is called the second level of the multilevel index.
- Hence a multilevel index with $r_1$ first-level entries will have approximately $t$ levels, where $t = \lceil \log_{fo} r_1 \rceil$.
  - Fan-out: $fo$ = number of index entries per index block.
- Indexed sequential file: a commonly used file organization.
  - The data file is an ordered file with a multilevel primary index on its ordering key field. See Figure 6.6.
- A multilevel index speeds up record search.
- Index insertion and deletion are a problem and may require reorganization of the index: when the data file is modified, the index must be updated.
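The level count can be checked by building the levels one at a time: each level needs one entry per block of the level below, until a level fits in a single block. The function name and the example values (30,000 first-level entries, fan-out 68) are assumptions chosen for illustration.

```python
import math


def index_levels(first_level_entries, fan_out):
    """Blocks needed at each index level, from the first level to the top.

    Assumes fan_out > 1 and first_level_entries >= 1.
    """
    blocks_per_level = []
    entries = first_level_entries
    while True:
        blocks = math.ceil(entries / fan_out)
        blocks_per_level.append(blocks)
        if blocks <= 1:
            return blocks_per_level
        entries = blocks  # the next level has one entry per block of this level


# Assumed example values (not from the slides).
levels = index_levels(30000, 68)
t = len(levels)
```

Here the levels occupy 442, 7, and 1 blocks, so t = 3, which agrees with the ceiling of the base-68 logarithm of 30,000. A search reads one block per level plus one data block.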


Dynamic Multilevel Indexes

- Retain the benefits of multilevel indexing while reducing index insertion and deletion problems: the index automatically reorganizes itself with small, local changes in the face of insertions and deletions.
- Leave some space in each block for inserting new entries.
- Dynamic multilevel indexes are implemented as B-trees and, more often, as B+-trees.
- B-tree:
  - An indexing field value appears only once, at some level in the tree.
  - Every node holds pointers to data.
- B+-tree:
  - Pointers to data are stored only at the leaf nodes of the tree.
  - Leaf nodes have an entry for every indexing field value.
  - The leaf nodes are usually linked together to provide ordered access on the indexing field to the records.
  - All the leaf nodes of the tree are at the same depth: retrieval of any record takes the same time.
  - In Oracle, the B+-tree is referred to as a B*-tree (see next figure).

Other types of indexes
- Besides tree-based techniques, there are hash-based indexing techniques.
- Hashing can be used not only for file organization, but also for index-structure creation: a hash index organizes the indexing fields, with their associated pointers, into a hash file structure.


[Figure: a three-level B+-tree index.]

Files of Mixed Records: Clusters in Oracle 9i

- A cluster is made up of a group of tables that share the same data blocks. These tables are grouped together because they share common columns and are often used together.
  - For example, the EMP and DEPT tables share the DEPTNO column (called the cluster key). When you cluster the EMP and DEPT tables (clustered tables), Oracle physically stores all rows for each department from both the EMP and DEPT tables in the same data blocks.
- Advantages:
  - Access time improves for joins of clustered tables.
  - The cluster key is the column, or group of columns, that the clustered tables have in common. Each cluster key value is stored only once in the cluster and in the cluster index, no matter how many rows of different tables contain the value. Therefore, less storage might be required to store related table and index data in a cluster than in a non-clustered table format.
  - For example, each cluster key (each DEPTNO) is stored just once for the many rows that contain the same value in both the EMP and DEPT tables (see next figure).
- A hash cluster, for access performance: Oracle physically stores the rows of a table in a hash cluster and retrieves them according to the results of a hash function. This is a way to improve the performance of data retrieval.


Clusters in Oracle 9i (cont'd)

[Figure: the EMP and DEPT tables stored as clustered tables.]

Clusters in Oracle 9i (cont'd)

Steps

Create the cluster:

CREATE CLUSTER emp_dept (deptno NUMBER(3))
    PCTUSED 80 PCTFREE 5 SIZE 600
    TABLESPACE users
    STORAGE (INITIAL 200K NEXT 300K MINEXTENTS 2 MAXEXTENTS 20 PCTINCREASE 33);

Create the clustered tables:

CREATE TABLE dept (
    deptno NUMBER(3) PRIMARY KEY, . . . )
    CLUSTER emp_dept (deptno);

CREATE TABLE emp (
    empno NUMBER(5) PRIMARY KEY,
    ename VARCHAR2(15) NOT NULL,
    . . .
    deptno NUMBER(3) REFERENCES dept)
    CLUSTER emp_dept (deptno);

Create the cluster index. A cluster index must be created before any rows can be inserted into any clustered table:

CREATE INDEX emp_dept_index ON CLUSTER emp_dept
    INITRANS 2 MAXTRANS 5
    TABLESPACE users
    STORAGE (INITIAL 50K NEXT 50K MINEXTENTS 2 MAXEXTENTS 10 PCTINCREASE 33)
    PCTFREE 5;


SQL, Oracle9i and Indexes

- SQL-92 does not include statements for index structures, so there is some variation in index-related commands across different DBMSs.
- When a table is created, it is desirable to add indexes on certain attributes, especially the primary key.
- The existence of indexes can greatly speed up query processing.
  - Consider selecting a subset of tuples from a relation based on the value of the key field, or a join such as $R \bowtie_{R.ATTR1 > S.ATTR2} S$.
- Indexes can be created implicitly by the DBMS at table creation time, e.g. on any attribute designated as a primary key.
  - Oracle automatically creates an index when a UNIQUE or PRIMARY KEY constraint clause is specified in a CREATE TABLE statement.


SQL, Oracle9i and Indexes

- Indexes may also be created explicitly with SQL DDL commands.
- Consider the following Oracle statements:
  - When you create an index, Oracle fetches and sorts the columns to be indexed, and stores the RowId along with the index value for each row. Then Oracle loads the index from the bottom up.
  - CREATE INDEX emp_ename ON emp(ename);
    Oracle sorts the EMP table on the ENAME column. It then loads the index with the ENAME and corresponding RowId values in this sorted order. When it uses the index, Oracle does a quick search through the sorted ENAME values and then uses the associated RowId values to locate the rows having the sought ENAME value.
  - In Oracle you can create more than one index using the same columns, provided that you specify distinctly different combinations of the columns.
  - In Oracle you cannot create an index that references only one column in a table if another such index on that column already exists.


SQL, Oracle9i and Indexes (cont'd)

Consider the following Oracle statements (cont'd):

- CREATE UNIQUE INDEX pkIdx ON Staff(SIN);
  - Creates an index on the field SIN in the table Staff.
  - The UNIQUE keyword ensures the uniqueness of SIN values in the table (and index). This uniqueness is enforced even when adding an index to a table with existing data. If the SIN field is non-unique, then the index creation fails.
  - If the UNIQUE keyword is not used, then two rows of the table can have the same value.
  - Non-unique indexes are sorted by the index key and RowId.
- A composite index is an index that you create on multiple columns in a table.
  - CREATE INDEX CInd ON Student(Fname, Lname);
  - Composite indexes can speed up retrieval of data for SELECT statements in which the WHERE clause references all or the leading portion of the columns in the composite index.
- DROP INDEX clIdx;
  - Drops the index clIdx.


SQL, Oracle9i and Indexes (cont'd)

Oracle and indexes

- Table indexes:
  - Store each field value repeatedly with each stored RowId.
  - Oracle uses a B*-tree (in effect a B+-tree) as the internal structure of a table index.
- Bitmap indexes:
  - Rather than a B*-tree, bitmap indexes store the RowIds associated with a field value as a bitmap. Each bit in the bitmap corresponds to a possible RowId; if the bit is set, the row with the corresponding RowId contains the field value.
  - A mapping function converts the bit position to an actual RowId, so the bitmap index provides the same functionality as a regular index even though it uses a different representation internally.
  - Among the advantages of bitmap indexes: they speed up searches on low-cardinality columns, i.e. columns in which the number of distinct values is small compared to the number of rows in the table.
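The bitmap idea can be sketched independently of Oracle's internal format. In the sketch below a Python integer serves as the bit vector and the row position in a list stands in for the RowId (so the mapping function is the identity); the sample rows, the column values, and the function names are assumptions made for illustration.

```python
def build_bitmap_index(rows, column):
    """Return {value: bitmap}, where bit i is set if row i holds that value."""
    bitmaps = {}
    for position, row in enumerate(rows):
        value = row[column]
        bitmaps[value] = bitmaps.get(value, 0) | (1 << position)
    return bitmaps


def matching_positions(bitmap):
    """Convert a bitmap back into the list of row positions it marks."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Assumed sample data with two low-cardinality columns.
employees = [
    {"Marital_Status": "single", "Region": "east"},
    {"Marital_Status": "married", "Region": "west"},
    {"Marital_Status": "single", "Region": "west"},
    {"Marital_Status": "married", "Region": "east"},
    {"Marital_Status": "single", "Region": "east"},
]
status_index = build_bitmap_index(employees, "Marital_Status")
region_index = build_bitmap_index(employees, "Region")

# A condition on both columns becomes a bitwise AND of two bitmaps.
single_and_east = status_index["single"] & region_index["east"]
```

Combining conditions with bitwise operations before touching any row is where the representation pays off on low-cardinality columns.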

- Cluster indexes:
  - A cluster index is an index defined specifically for a cluster. It contains an entry for each cluster key value.
  - To locate a row in a cluster, the cluster index is used to find the cluster key value, which points to the data block associated with that cluster key value.

Examples of bitmap index creation:
- create bitmap index Emp_M_S on Employee(Marital_Status);
- create bitmap index Emp_R on Employee(Region);


SQL, Oracle9i and Indexes (cont'd)

Oracle and indexes (cont'd)

- Function-based indexes
  - You can create indexes based on Oracle functions, for example: CREATE INDEX name_emp ON emp(UPPER(ename));
  - Such an index can facilitate processing a query such as: SELECT * FROM emp WHERE UPPER(ename) = 'ALI';
- Index-organized tables
  - The entire table is stored within an index structure:
    CREATE TABLE employee (ID CHAR(9) PRIMARY KEY, name VARCHAR2(20)) ORGANIZATION INDEX;
  - Instead of maintaining two separate storage structures for the table and the B*-tree index, the database system maintains only a single B*-tree index. The table's data is sorted by the table's primary key (a primary key is mandatory).

Each B*-tree index leaf entry contains instead of


[Figure: Index-Organized Table]

Overview of Oracle9i DB Structure and Space Management

- An Oracle DB has logical and physical structures. This separation allows logical structures to be defined identically across different hardware and operating system platforms.
- Logical DB structures represent the components that are visible in an Oracle DB. They consist of:
  - Tablespaces: the DB is divided logically into units called tablespaces, which group together related logical structures, such as all of an application's objects. The SYSTEM tablespace is the minimum tablespace requirement at DB creation. It always contains the Data Dictionary.
  - Blocks: a block is the smallest unit of storage in Oracle.
  - Extents: an extent is a grouping of contiguous blocks.
  - Segments: a segment is a set of extents allocated for a logical structure (such as a schema object). There are four segment types: data segments (store table or cluster data), index segments (store index data), temporary segments (for temporary work such as sorting), and undo segments (store undo information).
  - Schema objects: the logical structures referring to the DB's data: tables, views, indexes, clusters, etc.



Overview of Oracle9i DB Structure and Space Management (cont'd)

- Physical DB structures represent the method of internal storage. They consist of:
  - Datafiles: contain all the DB data. An Oracle DB has one or more datafiles. Each datafile is associated with only one tablespace. A tablespace can consist of more than one datafile.
    - When a user wants to read data in a table and the requested information is not in the memory cache of the DB, it is read from the appropriate datafiles and stored in memory.
    - Modified or new data is not necessarily written to a datafile immediately. It is pooled in memory and written to the appropriate datafiles all at once, as determined by the database writer process (DBWn).
  - Redo log files: record all changes made to data. These files are critical for DB operation and recovery from failure. Two or more redo log files are necessary. A redo log is made of redo entries (i.e. redo records).
  - Control files: maintain information about the physical structure of the DB (e.g. the name and location of every datafile and redo log file). Every Oracle DB has at least one control file.