7/31/2019 Chap2 FileOrg Indexes
File Organization and Index Structures
Instructor: Mr Mourad Benchikh
Text Books: Elmasri & Navathe Chap. 5+6
Ramakrishnan & Gehrke Chap. 7+8+9
Oracle9i documentation
First-Semester 1427-1428
Databases are stored physically as files of records, typically stored on magnetic disks.
This chapter deals with the organization of databases in storage and the techniques for accessing them efficiently using various algorithms, some of which require auxiliary data structures called indexes.
The emphasis is on the search process; deletion, update, and insertion issues will not be covered.
Storage Medium
Primary Storage
- Main memory and the smaller but faster cache memories.
- Fast access to data, but limited storage capacity.
- Can be operated on directly by the CPU.
Secondary Storage
- Magnetic disks, optical disks, and tapes.
- Larger capacity and lower cost.
- Slower access to data.
- Data cannot be processed directly by the CPU.
Magnetic Disks
- Secondary storage.
- Transfer of data between main memory and disk takes place in units of disk blocks: the block is the unit of both data transfer and data allocation.
- For a read command, the block from disk is copied into the buffer.
- For a write command, the contents of the buffer are copied into the disk block.
Records
Data is usually stored in the form of records.
Each record consists of a collection of related data values or items.
Records usually describe entities and their attributes.
For example, an EMPLOYEE record represents an employee entity, and each field value in the record specifies some attribute of that employee, such as NAME, BIRTHDATE, SALARY.
A collection of field names and their corresponding data types constitutes a record type or record format.
C notation:
struct employee {
    char name[30];        /* employee name */
    char ssn[9];          /* social security number, 9 characters */
    int  salary;
    int  jobCode;
    char department[20];  /* department name */
};
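As a rough illustration of what a fixed record format means in storage, the sketch below packs the same five fields into a fixed-length byte string with Python's `struct` module. The format string (no padding, 4-byte integers) and the sample values are assumptions made for the example, not a layout given in the slides.

```python
import struct

# Assumed layout mirroring the C struct: 30-byte name, 9-byte ssn,
# two 4-byte integers, 20-byte department. "<" disables alignment padding.
EMPLOYEE_FORMAT = "<30s9sii20s"
RECORD_SIZE = struct.calcsize(EMPLOYEE_FORMAT)  # 30 + 9 + 4 + 4 + 20 = 67 bytes


def pack_employee(name, ssn, salary, job_code, department):
    """Encode one employee as a fixed-length record (short strings are NUL-padded)."""
    return struct.pack(
        EMPLOYEE_FORMAT,
        name.encode("ascii"),
        ssn.encode("ascii"),
        salary,
        job_code,
        department.encode("ascii"),
    )


def unpack_employee(raw):
    """Decode a fixed-length record back into its field values."""
    name, ssn, salary, job_code, department = struct.unpack(EMPLOYEE_FORMAT, raw)

    def text(field):
        return field.rstrip(b"\x00").decode("ascii")

    return text(name), text(ssn), salary, job_code, text(department)


# Hypothetical sample record.
record = pack_employee("Smith, John", "123456789", 30000, 5, "Research")
```

Every record occupies the same number of bytes, which is what later makes it possible to locate a record by position alone.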
File
A sequence of records.
Usually all records in a file are of the same record type (fixed-length records).
Variable-length records; some possible schemes:
- The file records are of the same record type, but one or more of the fields are of varying size.
- The file records are of the same record type, but one or more of the fields may have multiple values for individual records.
- The file records are of the same record type, but one or more of the fields are optional.
- The file includes records of different types; each record is preceded by a record type indicator. For example, if a relation exists between EMPLOYEE and DEPARTMENT, their corresponding records can be kept physically contiguous (clustered) in order to minimize I/O operations.
In general, a block contains one or more records specific to one file only:
- Spanned organization: records can cross block boundaries.
- Unspanned organization: records cannot cross block boundaries.
Blocking factor: Bfr = number of records per block.
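For fixed-length records in an unspanned organization, the blocking factor follows from the block size B and the record size R as Bfr = floor(B / R). A small sketch of that arithmetic; the block size of 512 bytes, record size of 67 bytes and file size of 1000 records are assumed values chosen only for illustration.

```python
import math


def blocking_factor(block_size, record_size):
    """Records per block for unspanned, fixed-length records: floor(B / R)."""
    return block_size // record_size


def blocks_needed(num_records, bfr):
    """Number of blocks required to hold the file: ceil(r / Bfr)."""
    return math.ceil(num_records / bfr)


# Assumed example values.
bfr = blocking_factor(512, 67)       # 7 records fit in one block
unused = 512 - bfr * 67              # 43 bytes per block are left unused
blocks = blocks_needed(1000, bfr)    # 143 blocks for 1000 records
```

The unused bytes per block are the price of the unspanned organization; a spanned organization would use them at the cost of records crossing block boundaries.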
Allocating File Blocks
Contiguous Allocation
- The file blocks are allocated to consecutive disk blocks.
- Reading the whole file is very fast (using double buffering).
- Expanding the file is difficult.
Linked Allocation
- Each file block contains a pointer to the next file block.
- Easy to expand, but slow to read the whole file.
Combination
- Allocates clusters of consecutive disk blocks, and the clusters are linked.
Indexed Allocation
- One or more index blocks contain pointers to the actual file blocks.
Organization & Access Method
File Organization
- The organization of the data of a file into records, blocks, and access structures.
- The way records and blocks are placed on the storage medium and interlinked.
- Example: sorted file.
Access Method
- Provides a group of operations that can be applied to a file: Open, Find, Delete, Modify, Insert, Close, etc.
- It is possible to apply several access methods to a file organization.
- Some access methods can be applied only to files organized in certain ways: an indexed access method cannot be applied to a file without an index.
Choose the file organization that efficiently implements the access methods needed by the application.
Heap Files (Unordered Files)
Heap File (Pile)
- The simplest type of file organization.
- Records are placed in the file in the order in which they are inserted.
- New records are inserted at the end of the file; the address of the last block is kept in the file header.
- Searching on any search condition involves a linear search, an expensive procedure.
Relative (or Direct) File
- Unordered fixed-length records using unspanned blocks and contiguous allocation.
- Any record can then be accessed by its position in the file.
- The i-th record is located in block ⌊i/Bfr⌋ (records and blocks numbered from 0).
- Helpful for locating a record by its position, but not for locating a record based on a search condition.
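Locating a record by position in a relative file is pure arithmetic: with records and blocks numbered from 0, record i sits in block floor(i / Bfr), at slot i mod Bfr inside that block. A minimal sketch follows; the function names and the sample sizes (Bfr = 7, 512-byte blocks, 67-byte records) are assumptions for illustration.

```python
def locate_record(i, bfr):
    """Return (block number, slot within the block) of the i-th record, 0-based."""
    return divmod(i, bfr)


def byte_offset(i, bfr, block_size, record_size):
    """Byte offset of the i-th record, assuming contiguous, unspanned blocks."""
    block, slot = locate_record(i, bfr)
    return block * block_size + slot * record_size


# Assumed example values.
position = locate_record(23, 7)          # (3, 2): fourth block, third slot
offset = byte_offset(23, 7, 512, 67)     # 3 * 512 + 2 * 67 = 1670
```

No block has to be read to find where record 23 lives, which is why this organization is good for positional access and no help for value-based search.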
Sorted Files
An organization that physically orders the records of a file on disk based on the values of one of their fields, called the ordering field.
If the ordering field is also a key field of the file, then the field is called the ordering key for the file.
Figure 5.9 shows an ordered file with NAME as the ordering key field (assuming that employees have distinct names).
Reading the records in order of the ordering key values becomes extremely efficient, because no sorting is required.
A search condition based on the value of an ordering key field results in faster access when the binary search technique is used.
Ordering does not provide any advantage for random or ordered access of the records based on values of the other, non-ordering fields of the file. In this case, random access requires a linear search.
Binary Search
Algorithm 5.1: Binary search on an ordering key of a disk file
L = 1; U = b; /* b is the number of file blocks */
while (U >= L) do
begin
    I = (L + U) div 2;
    read block I of the file into the buffer;
    if K < (ordering key field value of the first record in block I)
        then U = I - 1
    else if K > (ordering key field value of the last record in block I)
        then L = I + 1
    else if the record with ordering key field value = K is in the buffer
        then goto found
    else goto notFound;
end;
goto notFound;
If b is the number of blocks in a sorted file, a binary search accesses about log2(b) blocks on average.
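The block-level binary search can be sketched in a few lines. This is an illustration under stated assumptions, not the textbook's code: the file is modelled as a Python list of blocks, each block a sorted list of (key, record) pairs, and reading a block is simulated by indexing the list while counting the reads.

```python
def binary_search_blocks(blocks, key):
    """Search an ordered file for `key`.

    Returns (record, blocks_read); record is None when the key is absent.
    """
    low, high = 0, len(blocks) - 1
    blocks_read = 0
    while high >= low:
        mid = (low + high) // 2
        block = blocks[mid]          # simulates reading block `mid` into the buffer
        blocks_read += 1
        if key < block[0][0]:        # smaller than the first key in the block
            high = mid - 1
        elif key > block[-1][0]:     # larger than the last key in the block
            low = mid + 1
        else:
            # The key can only be in this block: search the buffer in memory.
            for k, record in block:
                if k == key:
                    return record, blocks_read
            return None, blocks_read
    return None, blocks_read


# Hypothetical ordered file of three blocks.
ordered_file = [
    [(10567, "J. Doe"), (11589, "T. Allen"), (15973, "M. Smith")],
    [(29579, "B. Zimmer"), (34596, "T. Atkins"), (75623, "J. Wong")],
    [(84920, "S. Allen"), (96256, "P. Wright")],
]
```

Only the first and last keys of a block decide the direction of the search, so each iteration costs exactly one block read.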
Hashing Organization
Provides very fast access to records on certain search conditions.
The search condition must be an equality condition on a hash field of the file.
In most cases, the hash field is also a key field of the file (the hash key).
Hashing
- A function h, called a hash function, is applied to the hash field value of a record and yields the address of the disk block in which the record is stored.
- A search for the record within the block can be carried out in a main memory buffer.
Internal Hashing
Internal files
- Hashing is also used as an internal search structure within a program whenever a group of records is accessed exclusively by using the value of one field.
- Hashing is implemented as a hash table through the use of an array of records.
- Suppose that the array index range is from 0 to N-1; then we have N slots whose addresses correspond to the array indexes.
- We choose a hash function that transforms the hash field value into an integer between 0 and N-1.
- One common hash function is h(K) = K mod N; the resulting value is used as the record address.
Internal Hashing
[Figure: r records are mapped by H(K) = K mod N into an array of N record slots numbered 0, 1, ..., N-1.]
Hashing Function
Key is student id (six digits).
Assume we have N = 100,000 record slots numbered 00000 to 99999.
H(K): student_id mod 100000
085768 mod 100000 = 85768
134281 mod 100000 = 34281
101004 mod 100000 = 1004
100000 mod 100000 = 0
601004 mod 100000 = 1004 (collision)
Collision
- A collision occurs when the hash field value of a record that is being inserted hashes to an address that already contains a different record.
- The process of finding another position (after a collision) is called collision resolution.
- Methods for collision resolution: open addressing, chaining, multiple hashing.
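The student-id example can be replayed with a small hash table. The sketch below resolves collisions by chaining in a simplified form: each slot holds a Python list standing in for the chain of overflow records, which is an assumption of this example and not how a disk-based or array-based implementation would store the chain. The class and method names are hypothetical.

```python
class ChainedHashTable:
    """Internal hash table with h(K) = K mod N and one chain per slot."""

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.slots = [[] for _ in range(num_slots)]

    def h(self, key):
        return key % self.num_slots

    def insert(self, key, record):
        """Store the record; return True if it collided with a different key."""
        chain = self.slots[self.h(key)]
        collided = any(existing != key for existing, _ in chain)
        chain.append((key, record))
        return collided

    def find(self, key):
        """Return the record stored under `key`, or None."""
        for existing, record in self.slots[self.h(key)]:
            if existing == key:
                return record
        return None


table = ChainedHashTable(100000)
# Student ids from the slide; the record payloads are placeholders.
collisions = [
    table.insert(student_id, "student %d" % student_id)
    for student_id in (85768, 134281, 101004, 100000, 601004)
]
```

Only the last insertion reports a collision, because 601004 and 101004 both hash to slot 1004; both records remain retrievable through the chain.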
External Hashing
Hashing for disk files is called external hashing.
The target address space is made of buckets, each of which holds multiple records. A bucket is either one disk block or a cluster of contiguous blocks.
The hashing function maps the indexing field's value into a relative bucket number.
A table maintained in the file header converts the bucket number into the corresponding disk block address.
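The two-step address translation (key to relative bucket number, bucket number to disk block address via the header table) can be sketched as follows. The disk is simulated with a dictionary, the block addresses are made-up numbers, and bucket overflow is deliberately not handled; all of these are assumptions of the example.

```python
class HashedFile:
    """External hashing sketch: buckets addressed through a header table."""

    def __init__(self, bucket_addresses):
        # Header table: relative bucket number -> disk block address.
        self.header = list(bucket_addresses)
        # Simulated disk: block address -> records stored in that bucket.
        self.disk = {address: [] for address in self.header}

    def bucket_number(self, key):
        return key % len(self.header)

    def block_address(self, key):
        return self.header[self.bucket_number(key)]

    def insert(self, key, record):
        self.disk[self.block_address(key)].append((key, record))

    def find(self, key):
        bucket = self.disk[self.block_address(key)]   # one block access
        for existing, record in bucket:               # search in the buffer
            if existing == key:
                return record
        return None


# Three buckets at hypothetical disk addresses.
hashed_file = HashedFile([4000, 4100, 4200])
hashed_file.insert(7, "record 7")
hashed_file.insert(10, "record 10")
```

Because the header table is the only place that knows physical addresses, buckets can be relocated on disk without changing the hash function.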
Dynamic Files & Hashing
One problem with hashing so far is that the address space N is fixed.
If the number of records grows beyond the original size, the file must be reorganized.
How to handle dynamic files better?
- Extendible hashing
- Dynamic hashing
- Linear hashing
Indexing
Index file (same idea as a textbook index): an auxiliary structure designed to speed up access to desired data.
Indexing field: the field on which the index file is defined.
The index file stores each value of the indexing field along with a pointer: either pointer(s) to the block(s) that contain record(s) with that field value, or a pointer to the record with that field value.
- In Oracle, the pointer is called a RowId, which tells the DBMS where the row (record) is located (by file, block within that file, and row within the block).
To find a record in the data file based on a selection criterion on an indexing field, we first access the index file, which then gives access to the record in the data file.
- The index file is much smaller than the data file, so searching it is fast.
Indexing is important for file systems and DBMSs:
- Databases eventually map data to file structures on disk.
- Records of each relation may be stored in a separate file.
- Records of several different relations can be stored in the same file (i.e. a physically clustered file organization, to minimize I/O).
- In DBMSs, the query processor accesses the index structures when processing a query (e.g., an indexed join, also called a single-loop join).
Types of Indexes
Indexes on ordered vs. unordered files.
Dense vs. non-dense (i.e. sparse) indexes:
- Dense: an entry in the index file for each record of the data file.
- Sparse: only some of the data records are represented in the index, often one index entry per block of the data file.
Primary indexes vs. secondary indexes.
Ordered indexes vs. hash indexes:
- Ordered indexes: indexing field values are stored in sorted order.
- Hash indexes: indexing field values are organized using a hash function.
Single-level vs. multi-level:
- A single-level index is an ordered file and is searched using binary search.
- Multi-level indexes are tree-structured; they improve the search but require a more elaborate search algorithm.
Index on a single indexing field vs. index on multiple indexing fields (i.e. composite index):
- If a certain combination of fields is used frequently, set an index on multiple fields.
Single-Level Ordered Index: Primary Index
Physical records may be kept ordered on the primary key.
The index is ordered but has only one entry for each block (non-dense).
Each index entry has the value of the primary key field for the first record (or the last record) in a block and a pointer to that block.
Reduces the index requirements:
- fewer index entries than records in the file;
- binary search over the index can be faster (fewer index blocks to read than with the ordered-file approach).
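Single-Level Ordered Index: Primary Index
[Figure: a data file ordered on its key and the primary index built over it.]
Data file records, in key order:
10567  J. Doe     CS  3
11589  T. Allen   BA  2
15973  M. Smith   CS  3
29579  B. Zimmer  BS  1
34596  T. Atkins  ME  4
75623  J. Wong    BA  3
84920  S. Allen   CS  4
96256  P. Wright  ME  2
Index entries (one per block): 15973, 75623, 96256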
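A lookup through this sparse primary index can be sketched with Python's `bisect` module. The grouping of the eight records into blocks of three, three and two is an assumption inferred from the three index values, each of which matches the last key of such a block; the figure itself is not recoverable from the text.

```python
from bisect import bisect_left

# Data file ordered on the key; block boundaries are an assumption.
data_blocks = [
    [(10567, "J. Doe"), (11589, "T. Allen"), (15973, "M. Smith")],
    [(29579, "B. Zimmer"), (34596, "T. Atkins"), (75623, "J. Wong")],
    [(84920, "S. Allen"), (96256, "P. Wright")],
]

# Sparse index: one entry per block, holding the last key of that block.
index_keys = [block[-1][0] for block in data_blocks]   # [15973, 75623, 96256]


def lookup(key):
    """Binary-search the index, then scan the single candidate block."""
    pos = bisect_left(index_keys, key)   # first block whose last key >= key
    if pos == len(index_keys):
        return None                      # key is beyond the last block
    for k, name in data_blocks[pos]:
        if k == key:
            return name
    return None
```

The index has three entries for eight records, and any lookup touches the index plus exactly one data block.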
Single-Level Ordered Index: Clustering Index
Records are physically ordered by a non-key field (the clustering field).
Same general structure as an ordered-file index.
One entry in the index for each distinct value of the clustering field, with a pointer to the first block in the data file that has a record with that value for its clustering field.
Possibly many records for one index entry (non-dense).
Sometimes entire blocks are reserved for each distinct clustering field value.
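Single-Level Ordered Index: Clustering Index
[Figure: a data file ordered on the clustering field and the clustering index built over it.]
Data file records, in clustering-field order:
11589  T. Allen   BA  2
75623  J. Wong    BA  3
29579  B. Zimmer  BS  1
10567  J. Doe     CS  3
15973  M. Smith   CS  3
84920  S. Allen   CS  4
34596  T. Atkins  ME  4
96256  P. Wright  ME  2
Index entries (one per distinct value): BA, BS, CS, ME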
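A clustering index over these records can be sketched as a sorted list of (value, first block) pairs. The split into blocks of three, three and two records is an assumption made for the example; the lookup starts at the first block recorded for the value and stops as soon as a larger value appears.

```python
from bisect import bisect_left

# Data file ordered on the clustering field (third column); block
# boundaries are an assumption.
clustered_blocks = [
    [(11589, "T. Allen", "BA"), (75623, "J. Wong", "BA"), (29579, "B. Zimmer", "BS")],
    [(10567, "J. Doe", "CS"), (15973, "M. Smith", "CS"), (84920, "S. Allen", "CS")],
    [(34596, "T. Atkins", "ME"), (96256, "P. Wright", "ME")],
]


def build_clustering_index(blocks):
    """One (value, first block number) entry per distinct clustering value."""
    entries = []
    for block_no, block in enumerate(blocks):
        for _, _, value in block:
            if not entries or entries[-1][0] != value:
                entries.append((value, block_no))
    return entries


def find_by_value(blocks, index, value):
    """Return all records whose clustering field equals `value`."""
    keys = [entry_value for entry_value, _ in index]
    pos = bisect_left(keys, value)
    if pos == len(keys) or keys[pos] != value:
        return []
    matches = []
    for block in blocks[index[pos][1]:]:
        for record in block:
            if record[2] == value:
                matches.append(record)
            elif record[2] > value:
                return matches           # past the run of equal values
    return matches


clustering_index = build_clustering_index(clustered_blocks)
```

Because equal values are physically adjacent, one index entry is enough to reach every matching record.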
Single-Level Ordered Index: Secondary Indexes
The index is an ordered file with two fields:
- a non-ordering field of the data file (the indexing field);
- a block pointer or a record pointer.
There can be several secondary indexes for the same file, but only one primary index.
A secondary index on a non-ordering key field is dense. See Figure 6.4.
Several options for a secondary index on a non-key field:
- Option 1: include several index entries with the same value of the indexing field, one for each record (dense index).
- Option 2 (more commonly used): keep a single entry for each index value, but create an extra level of indirection to handle the multiple pointers. See Figure 6.5.
- Etc.
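Option 2 can be sketched as an ordered list with one entry per distinct value, each entry leading to a list of record pointers (the extra level of indirection). In this illustration record pointers are simply positions in a Python list, and the sample rows are hypothetical; a real index would store block or record addresses.

```python
from bisect import bisect_left
from collections import defaultdict

# Data file ordered on the id (first field); department is a non-key,
# non-ordering field.
records = [
    (10567, "J. Doe", "CS"),
    (11589, "T. Allen", "BA"),
    (15973, "M. Smith", "CS"),
    (29579, "B. Zimmer", "BS"),
    (34596, "T. Atkins", "ME"),
    (75623, "J. Wong", "BA"),
    (84920, "S. Allen", "CS"),
    (96256, "P. Wright", "ME"),
]


def build_secondary_index(rows, field):
    """One entry per distinct value, pointing at a list of record pointers."""
    pointer_lists = defaultdict(list)
    for record_no, row in enumerate(rows):
        pointer_lists[row[field]].append(record_no)
    return sorted(pointer_lists.items())   # ordered on the indexing field


def find_all(rows, index, value):
    """Binary-search the index, then follow every pointer for the value."""
    keys = [entry_value for entry_value, _ in index]
    pos = bisect_left(keys, value)
    if pos == len(keys) or keys[pos] != value:
        return []
    return [rows[record_no] for record_no in index[pos][1]]


dept_index = build_secondary_index(records, 2)
```

The index has four entries instead of eight, at the cost of one more hop through the pointer list.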
Types of Single-Level Ordered Indexes
                Ordering Field      Non-ordering Field
Key Field       Primary Index       Secondary Index (key)
Non-key Field   Clustering Index    Secondary Index (non-key)

Type of index         Number of first-level index entries                 Dense or non-dense
Primary               Number of blocks in data file                       Non-dense
Clustering            Number of distinct index field values               Non-dense
Secondary (key)       Number of records in data file                      Dense
Secondary (non-key)   Number of distinct index field values (Option 2)    Non-dense
Static Multilevel Indexes
A multilevel index treats the index file (first level) as an ordered file with a distinct value for each entry of the indexing field. The primary index built on the first level is called the second level of the multilevel index.
Hence a multilevel index with r1 first-level entries will have approximately t levels, where t = log_fo(r1).
Fan-out: fo = number of index entries per first-level block.
Indexed Sequential File: a commonly used file organization.
- The data file is an ordered file with a multilevel primary index on its ordering key field. See Figure 6.6.
A multilevel index speeds up record search.
Index deletion and insertion are a problem and may require reorganization of the index: when the data file is modified, the index must be updated.
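The number of levels t = log_fo(r1), rounded up, can be computed by repeatedly dividing the number of entries by the fan-out until a level fits in a single block. The figures used below (30,000 first-level entries, fan-out 68) are assumed values for illustration only.

```python
def index_levels(first_level_entries, fan_out):
    """Number of index levels: ceil(log_fo(r1)), using integer arithmetic."""
    levels = 1
    entries = first_level_entries
    while entries > fan_out:
        # Each block of the current level needs one entry at the next level.
        entries = -(-entries // fan_out)   # ceiling division
        levels += 1
    return levels


# Assumed example values.
t = index_levels(30000, 68)   # 30000 -> 442 -> 7 entries: three levels
```

With these numbers the top level has 7 entries and fits in one block, so a search reads one block per index level before reaching the data block.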
Dynamic Multilevel Indexes
Retain the benefits of multilevel indexing while reducing index insertion and deletion problems: the index automatically reorganizes itself with small, local changes in the face of insertions and deletions.
- Some space is left in each of its blocks for inserting new entries.
Dynamic multilevel indexes are implemented as B-trees and, often, as B+-trees.
B-tree:
- An indexing field value appears only once, at some level in the tree.
- Every node holds pointers to data.
B+-tree:
- Pointers to data are stored only at the leaf nodes of the tree.
- Leaf nodes have an entry for every indexing field value.
- The leaf nodes are usually linked together to provide ordered access to the records on the indexing field.
- All the leaf nodes of the tree are at the same depth: retrieval of any record takes the same time.
- In Oracle, the B+-tree seems to be what is called a B*-tree (see next figure).
Other types of indexes
- Indexing techniques other than tree-based ones include hash-based techniques.
- Hashing can be used not only for file organization but also for index-structure creation: a hash index organizes the indexing fields, with their associated pointers, into a hash file structure.
[Figure: 3-level B+-tree index]
Files of Mixed Records: Clusters in Oracle 9i
A cluster is made up of a group of tables that share the same data blocks. These tables are grouped together because they share common columns and are often used together.
- For example, the EMP and DEPT tables share the DEPTNO column (the cluster key). When you cluster the EMP and DEPT tables (the clustered tables), Oracle physically stores all rows for each department from both the EMP and DEPT tables in the same data blocks.
Advantages:
- Access time improves for joins of clustered tables.
- The cluster key is the column, or group of columns, that the clustered tables have in common. Each cluster key value is stored only once in the cluster and in the cluster index, no matter how many rows of different tables contain the value. Therefore, less storage might be required to store related table and index data in a cluster than is necessary in non-clustered table format.
- For example, each cluster key (each DEPTNO) is stored just once for the many rows that contain the same value in both the EMP and DEPT tables (see next figure).
A hash cluster (for access performance): Oracle physically stores the rows of a table in a hash cluster and retrieves them according to the results of a hash function, which is a way to improve the performance of data retrieval.
Clusters in Oracle 9i (cont'd)
Steps
Creating the cluster:
CREATE CLUSTER emp_dept (deptno NUMBER(3))
  PCTUSED 80 PCTFREE 5 SIZE 600
  TABLESPACE users
  STORAGE (INITIAL 200K NEXT 300K MINEXTENTS 2 MAXEXTENTS 20 PCTINCREASE 33);
Creating the clustered tables:
CREATE TABLE dept (
  deptno NUMBER(3) PRIMARY KEY, . . . )
  CLUSTER emp_dept (deptno);
CREATE TABLE emp (
  empno NUMBER(5) PRIMARY KEY,
  ename VARCHAR2(15) NOT NULL, . . .
  deptno NUMBER(3) REFERENCES dept)
  CLUSTER emp_dept (deptno);
Creating the cluster index (a cluster index must be created before any rows can be inserted into any clustered table):
CREATE INDEX emp_dept_index ON CLUSTER emp_dept
  INITRANS 2 MAXTRANS 5
  TABLESPACE users
  STORAGE (INITIAL 50K NEXT 50K MINEXTENTS 2 MAXEXTENTS 10 PCTINCREASE 33)
  PCTFREE 5;
SQL, Oracle9i and Indexes
SQL-92 doesn't include statements for index structures, so there is some variation in index-related commands across different DBMSs.
When a table is created, it is desirable to add indexes on certain attributes, especially the primary key.
The existence of indexes can greatly speed up query processing.
- Consider selecting a subset of tuples from a relation based on the value of the key field, or a join like R ⋈ (R.ATTR1 > S.ATTR2) S.
Indexes can be created implicitly by the DBMS at table creation time.
- E.g. on any attribute designated as a primary key.
- Oracle automatically creates an index when a UNIQUE or PRIMARY KEY constraint clause is specified in a CREATE TABLE statement.
SQL, Oracle9i and Indexes
Indexes may also be created explicitly with SQL DDL commands.
Consider the following Oracle statements:
- When you create an index, Oracle fetches and sorts the columns to be indexed, and stores the RowId along with the index value for each row. Then Oracle loads the index from the bottom up.
- CREATE INDEX emp_ename ON emp(ename);
  Oracle sorts the EMP table on the ENAME column. It then loads the index with the ENAME and corresponding RowId values in this sorted order. When it uses the index, Oracle does a quick search through the sorted ENAME values and then uses the associated RowId values to locate the rows having the sought ENAME value.
- In Oracle you can create more than one index using the same columns, provided that you specify distinctly different combinations of the columns.
- In Oracle you cannot create an index that references only one column in a table if another such index on that column already exists.
SQL, Oracle9i and Indexes (cont'd)
Consider the following Oracle statements (cont'd):
- CREATE UNIQUE INDEX pkIdx ON Staff(SIN);
  Creates an index on the field SIN in the table Staff.
  The UNIQUE keyword ensures the uniqueness of SIN values in the table (and index). This uniqueness is enforced even when adding an index to a table with existing data: if the SIN field is non-unique, the index creation fails.
  If the UNIQUE keyword is not used, two rows of the table can have the same value.
  Non-unique indexes are sorted by the index key and RowId.
- A composite index is an index that you create on multiple columns in a table:
  CREATE INDEX CInd ON Student(Fname, Lname);
  Composite indexes can speed retrieval of data for SELECT statements in which the WHERE clause references all or the leading portion of the columns in the composite index.
- DROP INDEX clIdx;
  Drops the index clIdx.
SQL, Oracle9i and Indexes (cont'd)
Oracle and indexes
Table indexes:
- Store each field value repeatedly with each stored RowId.
- Oracle uses a B*-tree (possibly a B+-tree) as the internal structure of a table index.
Bitmap indexes:
- Rather than a B*-tree, bitmap indexes store the RowIds associated with a field value as a bitmap. Each bit in the bitmap corresponds to a possible RowId, and if the bit is set, the row with the corresponding RowId contains the field value.
- A mapping function converts the bit position to an actual RowId, so the bitmap index provides the same functionality as a regular index even though it uses a different representation internally.
- Among the advantages of bitmap indexes: they speed up searches on low-cardinality columns, i.e. columns in which the number of distinct values is small compared to the number of rows in the table.
Cluster indexes:
- A cluster index is an index defined specifically for a cluster. It contains an entry for each cluster key value.
- To locate a row in a cluster, the cluster index is used to find the cluster key value, which points to the data block associated with that cluster key value.
SQL, Oracle9i and Indexes (cont'd)
- CREATE BITMAP INDEX Emp_M_S ON Employee(Marital_Status);
- CREATE BITMAP INDEX Emp_R ON Employee(Region);
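The idea behind bitmap indexes on low-cardinality columns such as marital status and region can be sketched with Python integers used as bit strings: bit i of a value's bitmap is set when row i holds that value. This is only a conceptual model with hypothetical rows; it says nothing about how Oracle encodes or compresses its bitmaps, and row positions stand in for RowIds.

```python
def build_bitmap_index(rows, column):
    """Map each distinct column value to an integer bitmap over row positions."""
    bitmaps = {}
    for row_pos, row in enumerate(rows):
        value = row[column]
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row_pos)
    return bitmaps


def rows_from_bitmap(bitmap):
    """Convert set bit positions back to row positions (the mapping function)."""
    return [pos for pos in range(bitmap.bit_length()) if (bitmap >> pos) & 1]


# Hypothetical Employee rows.
employees = [
    {"name": "A", "marital_status": "single", "region": "east"},
    {"name": "B", "marital_status": "married", "region": "west"},
    {"name": "C", "marital_status": "married", "region": "east"},
    {"name": "D", "marital_status": "single", "region": "west"},
    {"name": "E", "marital_status": "married", "region": "east"},
]

status_index = build_bitmap_index(employees, "marital_status")
region_index = build_bitmap_index(employees, "region")

# A condition on both columns becomes a bitwise AND of two bitmaps.
married_east = rows_from_bitmap(status_index["married"] & region_index["east"])
```

Combining conditions with bitwise operations before touching any row is the main reason bitmaps suit columns with few distinct values.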
SQL, Oracle9i and Indexes (cont'd)
Oracle and indexes (cont'd)
Function-based indexes:
- You can create indexes based on Oracle functions, for example:
  CREATE INDEX name_emp ON emp(UPPER(ename));
- Such an index can facilitate processing the query: SELECT * FROM emp WHERE UPPER(ename) = 'ALI';
Index-organized table:
- The entire table is stored within an index structure.
  CREATE TABLE employee (ID CHAR(9) PRIMARY KEY, name VARCHAR2(20)) ORGANIZATION INDEX;
- Instead of maintaining two separate storages for the table and the B*-tree index, the database system maintains only a single B*-tree index.
- The table's data is sorted by the table's primary key (a primary key is mandatory).
Each B*-tree index leaf entry contains instead of
[Figure: Index-Organized Table]
Overview of Oracle9i DB Structure and Space Management
An Oracle DB has logical and physical structures. This separation allows logical structures to be defined identically across different hardware and operating system platforms.
Logical DB structures represent the components seen in an Oracle DB. They consist of:
- Tablespaces: the DB is divided logically into units called tablespaces, which group together related logical structures, such as all of an application's objects. The SYSTEM tablespace is the minimum tablespace requirement at DB creation. It always contains the data dictionary.
- Blocks: a block is the smallest unit of storage in Oracle.
- Extents: an extent is a grouping of contiguous blocks.
- Segments: a segment is a set of extents allocated for logical structures (such as schema objects). There are four segment types: data segments (store table or cluster data), index segments (store index data), temporary segments (for temporary work such as sorts), and undo segments (store undo information).
- Schema objects: the logical structures referring to the DB's data: tables, views, indexes, clusters, etc.
Overview of Oracle9i DB Structure and Space Management
Physical DB structures represent the method of internal storage. They consist of:
- Datafiles: contain all the DB data. An Oracle DB has one or more datafiles. Each datafile is associated with only one tablespace; a tablespace can consist of more than one datafile.
  When a user wants to read data in a table and the requested information is not in the memory cache of the DB, it is read from the appropriate datafiles and stored in memory.
  Modified or new data is not necessarily written to a datafile immediately. It is pooled in memory and written to the appropriate datafiles all at once, as determined by the database writer (DBWn) process.
- Redo log files: record all changes made to data. These files are critical for DB operation and recovery from failure. Two or more redo log files are necessary. A redo log is made of redo entries (i.e. redo records).
- Control files: maintain information about the physical structure of the DB (e.g. the name and location of every datafile and redo log file). Every Oracle DB has at least one control file.