Posted on 03-Jan-2016
File Organization and Storage Structures
Chapter 5
Basic Concepts
The database on secondary storage is organized into one or more files, where each file consists of a number of records.
Each record consists of one or more fields.
Typically, a record corresponds to an entity and a field to an attribute.
The physical record is the unit of transfer between disk and primary storage, and vice versa.
A physical record, sometimes called a block or page, usually contains several logical records; how many depends on the size of the records.
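How many logical records fit in one physical record can be expressed as the blocking factor, floor(B / R) for block size B and fixed record size R, assuming records do not span blocks. A minimal sketch; the function names and the 4096-byte / 100-byte sizes are hypothetical example values, not figures from these notes:

```python
def blocking_factor(block_size: int, record_size: int) -> int:
    """Number of whole fixed-length records that fit in one block (unspanned)."""
    if record_size <= 0 or block_size < record_size:
        raise ValueError("record must be non-empty and fit in a block")
    return block_size // record_size


def blocks_needed(num_records: int, block_size: int, record_size: int) -> int:
    """Physical records (blocks) needed for num_records logical records."""
    bfr = blocking_factor(block_size, record_size)
    return -(-num_records // bfr)  # ceiling division


# Hypothetical sizes: 4096-byte pages holding 100-byte records.
print(blocking_factor(4096, 100))      # 40
print(blocks_needed(1000, 4096, 100))  # 25
```

Any space left over in a block (here 96 bytes) is simply unused when records are not allowed to span blocks.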
List structures
- Elementary (singular) list
- Circular list
- Symmetric list
- Symmetric circular list
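Reading "symmetric" as "doubly linked" (an assumption about the slide's terminology), the four variants differ only in their pointers: an elementary list has one forward pointer per element, a circular list links the last element back to the first, a symmetric list adds a backward pointer, and a symmetric circular list does both. A minimal sketch of the last variant; the class and method names are illustrative only:

```python
from typing import List, Optional


class Node:
    """One list element with a forward and a backward pointer."""
    def __init__(self, value: str) -> None:
        self.value = value
        self.next: Optional["Node"] = None
        self.prev: Optional["Node"] = None


class SymmetricCircularList:
    """Doubly linked list whose two ends are joined into a ring."""

    def __init__(self) -> None:
        self.head: Optional[Node] = None

    def append(self, value: str) -> None:
        node = Node(value)
        if self.head is None:
            node.next = node.prev = node  # a lone node points to itself
            self.head = node
            return
        tail = self.head.prev  # in a circular list the tail is head.prev
        tail.next = node
        node.prev = tail
        node.next = self.head
        self.head.prev = node

    def forward(self) -> List[str]:
        """Follow next pointers once around the ring."""
        out: List[str] = []
        node = self.head
        while node is not None:
            out.append(node.value)
            node = node.next
            if node is self.head:
                break
        return out

    def backward(self) -> List[str]:
        """Follow prev pointers once around the ring, starting at the tail."""
        out: List[str] = []
        if self.head is None:
            return out
        node = self.head.prev
        while True:
            out.append(node.value)
            if node is self.head:
                break
            node = node.prev
        return out


ring = SymmetricCircularList()
for v in ["X1", "X2", "X3"]:
    ring.append(v)
print(ring.forward())   # ['X1', 'X2', 'X3']
print(ring.backward())  # ['X3', 'X2', 'X1']
```

Dropping `prev` gives a circular list; additionally stopping at `None` instead of closing the ring gives an elementary list.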
Sequential insertion
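Before: X(1) X(2) X(3) X(4) [free zone]
After inserting Y between X(1) and X(2), every later record moves one place into the free zone:
X'(1)=X(1)  X'(2)=Y  X'(3)=X(2)  X'(4)=X(3)  X'(5)=X(4)  [free zone]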
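Sequential insertion keeps the records physically in their logical order, so the cost grows with the number of records after the insertion point. A minimal sketch, modelling the storage area as a Python list in which `None` marks the free zone; the function name and layout are illustrative:

```python
from typing import List, Optional


def sequential_insert(area: List[Optional[str]], used: int, pos: int, item: str) -> int:
    """Insert item at 0-based position pos of a contiguous area.

    area[:used] holds the records and area[used:] is the free zone.
    Records from pos onward shift one place towards the free zone.
    Returns the new number of used slots.
    """
    if used >= len(area):
        raise OverflowError("free zone exhausted")
    if not 0 <= pos <= used:
        raise IndexError("position outside the occupied part")
    # Shift from the end backwards so that no record is overwritten.
    for i in range(used, pos, -1):
        area[i] = area[i - 1]
    area[pos] = item
    return used + 1


area: List[Optional[str]] = ["X(1)", "X(2)", "X(3)", "X(4)", None, None]
used = sequential_insert(area, 4, 1, "Y")
print(area[:used])  # ['X(1)', 'Y', 'X(2)', 'X(3)', 'X(4)']
```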
Insertion with pointer technique
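The records X(1) to X(4) keep their physical positions and Y is stored in free space; only the pointers change, so that the logical order becomes
X'(1)=X(1)  X'(2)=Y  X'(3)=X(2)  X'(4)=X(3)  X'(5)=X(4)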
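With the pointer technique the physical order and the logical order are decoupled. A minimal sketch using two parallel arrays, `data` and `nxt`, where `nxt[i]` is the slot of the logical successor and -1 ends the list (the layout and names are hypothetical):

```python
from typing import List, Optional

data: List[Optional[str]] = ["X(1)", "X(2)", "X(3)", "X(4)", None]  # slot 4 is free
nxt: List[int] = [1, 2, 3, -1, -1]
head = 0


def insert_after(prev_slot: int, free_slot: int, item: str) -> None:
    """Store item in a free slot and link it in after prev_slot; nothing moves."""
    data[free_slot] = item
    nxt[free_slot] = nxt[prev_slot]
    nxt[prev_slot] = free_slot


def logical_order(start: int) -> List[Optional[str]]:
    """Follow the pointers from start until -1."""
    out: List[Optional[str]] = []
    slot = start
    while slot != -1:
        out.append(data[slot])
        slot = nxt[slot]
    return out


insert_after(0, 4, "Y")     # Y goes into slot 4, logically after X(1)
print(data)                 # ['X(1)', 'X(2)', 'X(3)', 'X(4)', 'Y']
print(logical_order(head))  # ['X(1)', 'Y', 'X(2)', 'X(3)', 'X(4)']
```

Only two pointers are written per insertion, whatever the length of the list.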
Multi-list structure
Records with a pointer field, record length 10, stored at addresses 2000, 2010, 2020, ...
Two lists, list 1 (A B) and list 2 (K L), share the storage area with a list of empty places; -1 marks the end of a list.
Insertion at beginning of list 2
(Figure: the same storage area after M is stored in an empty place and linked in as the first record of list 2.)
List1: A B
List2: M K L
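A minimal sketch of a multi-list area: every slot has one value and one pointer, and the free slots are themselves chained into a list of empty places, so inserting at the beginning of a list means taking the first empty place and relinking two pointers. The record length 10, base address 2000 and the -1 end marker come from the slide; the class name and the placement of A, B, K and L are hypothetical and need not match the figure:

```python
from typing import Dict, List, Optional

RECORD_LENGTH = 10  # as on the slide
BASE = 2000         # first address, as on the slide
NIL = -1            # end-of-list marker, as on the slide


class MultiListArea:
    """Storage area in which several lists share fixed-length slots.

    Each slot holds a value and one pointer (the address of the next slot in
    the same list, or -1). Unused slots are chained into an 'empty' list.
    """

    def __init__(self, slots: int) -> None:
        addresses = [BASE + i * RECORD_LENGTH for i in range(slots)]
        self.value: Dict[int, Optional[str]] = {a: None for a in addresses}
        self.pointer: Dict[int, int] = dict(zip(addresses, addresses[1:] + [NIL]))
        self.heads: Dict[str, int] = {"empty": addresses[0] if addresses else NIL}

    def insert_front(self, name: str, item: str) -> int:
        """Take the first empty place and make it the new head of list `name`."""
        addr = self.heads["empty"]
        if addr == NIL:
            raise MemoryError("no empty places left")
        self.heads["empty"] = self.pointer[addr]        # unlink from the empty list
        self.value[addr] = item
        self.pointer[addr] = self.heads.get(name, NIL)  # old head becomes successor
        self.heads[name] = addr
        return addr

    def items(self, name: str) -> List[Optional[str]]:
        """Follow the pointers of one list from its head to -1."""
        out: List[Optional[str]] = []
        addr = self.heads.get(name, NIL)
        while addr != NIL:
            out.append(self.value[addr])
            addr = self.pointer[addr]
        return out


area = MultiListArea(7)
for item in ["B", "A"]:
    area.insert_front("list1", item)
for item in ["L", "K"]:
    area.insert_front("list2", item)
area.insert_front("list2", "M")  # insertion at beginning of list 2
print("List1:", " ".join(area.items("list1")))  # List1: A B
print("List2:", " ".join(area.items("list2")))  # List2: M K L
```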
General tree structure
(Figure: A at the root; B and C on the second level; D, E, F, H, J, K and L on the third; M, N, P, Q and R on the fourth.)
Equivalent binary tree structure
(Figure: the same nodes A to R rearranged as a binary tree, so that each node has at most two branches.)
Pointer Implementation
(Figure: the binary tree stored with two pointer fields per node; -1 marks a pointer that leads nowhere.)
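The slides do not spell out the conversion rule; a common one, assumed here, is "first child / next sibling": the left pointer of a node leads to its first child and the right pointer to its next sibling, so any general tree fits in nodes with exactly two pointers. A minimal sketch on a small hypothetical tree (not the tree in the figure), with `None` standing in for -1:

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical general tree (not the tree in the figure): node -> ordered children.
CHILDREN: Dict[str, List[str]] = {
    "A": ["B", "C", "D"],
    "B": ["E", "F"],
    "D": ["G"],
}

Pair = Tuple[Optional[str], Optional[str]]  # (first child, next sibling); None plays -1


def to_binary(root: str) -> Dict[str, Pair]:
    """Give every node a first-child pointer and a next-sibling pointer."""
    binary: Dict[str, Pair] = {}

    def visit(node: str, sibling: Optional[str]) -> None:
        kids = CHILDREN.get(node, [])
        binary[node] = (kids[0] if kids else None, sibling)
        for i, kid in enumerate(kids):
            visit(kid, kids[i + 1] if i + 1 < len(kids) else None)

    visit(root, None)
    return binary


def children_of(binary: Dict[str, Pair], node: str) -> List[str]:
    """Recover the original children: take the first child, then follow siblings."""
    out: List[str] = []
    kid = binary[node][0]
    while kid is not None:
        out.append(kid)
        kid = binary[kid][1]
    return out


tree = to_binary("A")
print(tree["B"])               # ('E', 'C')
print(children_of(tree, "A"))  # ['B', 'C', 'D']
```

Because `children_of` recovers every child list, the binary form loses no information about the general tree.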
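Bi-directional tree
(Figure: X at the top, with Y, R and S on the next level and Z, U and T below them; access starts at an Entry pointer to X, and -1 marks an empty pointer.)
Pointer fields in each node:
- first lower
- higher
- next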
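The three pointer fields make the tree navigable in both directions: "first lower" and "next" lead downwards and sideways, "higher" leads back up. A minimal sketch, reading "first lower" as first child, "higher" as parent and "next" as next sibling; the parent links of Z, U and T below are hypothetical, since the figure's arrows are not reproduced here:

```python
from typing import List, Optional


class TreeNode:
    """Node with the three pointers named on the slide."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.first_lower: Optional["TreeNode"] = None  # first child
        self.higher: Optional["TreeNode"] = None       # parent
        self.next: Optional["TreeNode"] = None         # next sibling


def add_child(parent: TreeNode, child: TreeNode) -> None:
    """Attach child as the last child of parent and set all three pointers."""
    child.higher = parent
    if parent.first_lower is None:
        parent.first_lower = child
        return
    last = parent.first_lower
    while last.next is not None:
        last = last.next
    last.next = child


def children(node: TreeNode) -> List[str]:
    """Downward: take first lower, then follow next."""
    out: List[str] = []
    kid = node.first_lower
    while kid is not None:
        out.append(kid.name)
        kid = kid.next
    return out


def path_to_root(node: TreeNode) -> List[str]:
    """Upward: follow higher until the entry node."""
    out: List[str] = []
    cur: Optional[TreeNode] = node
    while cur is not None:
        out.append(cur.name)
        cur = cur.higher
    return out


nodes = {name: TreeNode(name) for name in "XYRSZUT"}
for parent, child in [("X", "Y"), ("X", "R"), ("X", "S"),
                      ("Y", "Z"), ("R", "U"), ("S", "T")]:  # hypothetical links
    add_child(nodes[parent], nodes[child])

print(children(nodes["X"]))      # ['Y', 'R', 'S']
print(path_to_root(nodes["U"]))  # ['U', 'R', 'X']
```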
Ring structure
(Figure, drawn twice: X at the top, with Y, Z and U on the next level and V, T and R below them; access starts at an Entry pointer.)
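In a ring structure, as commonly understood (assumed here), the last element of a chain points back to its owner instead of holding -1, so every chain is closed and the owner can be reached from any member. A minimal sketch with a single `next` pointer per record; the ring X, Y, Z, U and the helper names are hypothetical:

```python
from typing import Dict, List, Set

# Hypothetical ring: owner X with members Y, Z, U. The last member points back
# to the owner instead of holding -1, so the chain is closed.
next_ptr: Dict[str, str] = {"X": "Y", "Y": "Z", "Z": "U", "U": "X"}
owners: Set[str] = {"X"}


def members(owner: str) -> List[str]:
    """Walk once around the ring, starting and ending at the owner."""
    out: List[str] = []
    node = next_ptr[owner]
    while node != owner:
        out.append(node)
        node = next_ptr[node]
    return out


def owner_of(member: str) -> str:
    """Follow next pointers until the ring returns to its owner."""
    node = next_ptr[member]
    while node not in owners:
        node = next_ptr[node]
    return node


print(members("X"))   # ['Y', 'Z', 'U']
print(owner_of("Z"))  # X
```

The trade-off is that reaching the owner costs a walk along the rest of the ring, whereas the bi-directional tree stores a direct "higher" pointer.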
File Organization
File organization: the physical arrangement of data into records and pages on secondary storage.
Main types:
• Heap (unordered)
• Sorted
• Hash
Access method: the steps involved in storing and retrieving records from a file.
Sample Data
SUPPLIER file
SNUM  SNAME      STATUS  CITY
S1    De Smet    20      London
S2    Janssens   10      Paris
S3    Blanchart  30      Paris
S4    Clark      20      London
S5    Adams      30      Athens
Hash Files
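(Figure: a hash file with 13 slots, numbered 0 to 12; the five supplier records, renumbered S100 to S500, each occupy one slot.)
S100  De Smet    20  London
S200  Janssens   10  Paris
S300  Blanchart  30  Paris
S400  Clark      20  London
S500  Adams      30  Athens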
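The slide does not state which hash function or collision technique the figure uses. A minimal sketch, assuming division by the prime 13 (one slot per bucket, 13 slots as drawn) on the numeric part of SNUM, and linear probing as the form of open addressing; under these assumptions S100, S200, S300, S400 and S500 land in slots 9, 5, 1, 10 and 6. Deletion is left out, since with linear probing it needs extra markers to keep probe sequences intact:

```python
from typing import List, Optional, Tuple

Record = Tuple[str, str, int, str]
TABLE_SIZE = 13  # assumption: 13 slots (0..12) as drawn; 13 is prime


def h(snum: str) -> int:
    """Division hashing on the numeric part of SNUM, e.g. 'S300' -> 300 % 13 = 1."""
    return int(snum[1:]) % TABLE_SIZE


def insert(table: List[Optional[Record]], rec: Record) -> int:
    """Open addressing with linear probing: try h, h+1, h+2, ... (mod table size)."""
    start = h(rec[0])
    for step in range(TABLE_SIZE):
        slot = (start + step) % TABLE_SIZE
        if table[slot] is None:
            table[slot] = rec
            return slot
    raise OverflowError("hash file is full")


def lookup(table: List[Optional[Record]], snum: str) -> Optional[Record]:
    """Probe the same sequence; an empty slot means the key is absent."""
    start = h(snum)
    for step in range(TABLE_SIZE):
        rec = table[(start + step) % TABLE_SIZE]
        if rec is None:
            return None
        if rec[0] == snum:
            return rec
    return None


SUPPLIERS: List[Record] = [
    ("S100", "De Smet", 20, "London"),
    ("S200", "Janssens", 10, "Paris"),
    ("S300", "Blanchart", 30, "Paris"),
    ("S400", "Clark", 20, "London"),
    ("S500", "Adams", 30, "Athens"),
]
table: List[Optional[Record]] = [None] * TABLE_SIZE
for rec in SUPPLIERS:
    print(rec[0], "-> slot", insert(table, rec))
```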
Hashing techniques
Collision (duplicate) handling, for keys that hash to the same address:
- open addressing
- unchained overflow
- chained overflow
- multiple hashing
Hashing algorithms:
- folding
- mid-square
- division by a prime number
Limitations:
- unsuitable for retrieving value ranges
- no help for retrieval on fields other than the hash field
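The three hashing algorithms listed above can be sketched in a few lines each. This is an illustration under assumptions: folding is shown in its "shift folding" form (other variants reverse alternate groups), and the key 123456, group sizes and table sizes are hypothetical:

```python
def fold(key: int, part_digits: int, table_size: int) -> int:
    """Shift folding: cut the decimal key into groups of part_digits, add them up."""
    digits = str(key)
    parts = [int(digits[i:i + part_digits]) for i in range(0, len(digits), part_digits)]
    return sum(parts) % table_size


def mid_square(key: int, mid_digits: int, table_size: int) -> int:
    """Mid-square: square the key and keep mid_digits digits from the middle."""
    square = str(key * key)
    start = max((len(square) - mid_digits) // 2, 0)
    return int(square[start:start + mid_digits]) % table_size


def division(key: int, prime: int) -> int:
    """Division: remainder after dividing by a prime table size."""
    return key % prime


key = 123456                      # hypothetical numeric key
print(fold(key, 2, 100))          # 12 + 34 + 56 = 102 -> 2
print(mid_square(key, 4, 10000))  # 123456**2 = 15241383936 -> '4138' -> 4138
print(division(key, 97))          # 72
```

All three scatter neighbouring keys over the table, which is exactly why a hash file cannot answer range queries efficiently.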
An Index
- An index provides an ACCESS PATH to the file it indexes.
- A file may have several associated indexes.
- The sequential access path is always available.
- An index imposes an ordering on the file it indexes.
- It can be used for direct access.
- It speeds up retrieval and slows down updating.
- It is not the same thing as a key.
- It can be built on combinations of fields.
- It can be SRA or symbolic.
Supplier file with index on city
City-index (each entry points to a record of the Supplier file)
Athens -> S5
London -> S1
London -> S4
Paris  -> S2
Paris  -> S3
Supplier file
SNUM  SNAME      STATUS  CITY
S1    De Smet    20      London
S2    Janssens   10      Paris
S3    Blanchart  30      Paris
S4    Clark      20      London
S5    Adams      30      Athens
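A dense secondary index such as the City-index can be sketched as a mapping from each field value to the positions of the records that hold it; the positions stand in for the pointers, and the index is ordered on CITY while the file stays in SNUM order. The helper name `build_index` is illustrative:

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

SUPPLIERS: List[Tuple[str, str, int, str]] = [
    ("S1", "De Smet", 20, "London"),
    ("S2", "Janssens", 10, "Paris"),
    ("S3", "Blanchart", 30, "Paris"),
    ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
]
CITY = 3  # column of the CITY field


def build_index(records: List[Tuple[str, str, int, str]], field: int) -> Dict[Any, List[int]]:
    """Dense index: field value -> positions of the records holding that value."""
    index: Dict[Any, List[int]] = defaultdict(list)
    for pos, rec in enumerate(records):
        index[rec[field]].append(pos)
    return dict(sorted(index.items()))  # entries ordered on the indexed field


city_index = build_index(SUPPLIERS, CITY)
print(list(city_index))                                # ['Athens', 'London', 'Paris']
print([SUPPLIERS[p][0] for p in city_index["Paris"]])  # ['S2', 'S3']
```

Every insertion or update of a supplier's city must also update this structure, which is the "slows down updating" cost listed above.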
Supplier file with two indexes
Status-index
10 -> S2
20 -> S1
20 -> S4
30 -> S3
30 -> S5
City-index
Athens -> S5
London -> S1
London -> S4
Paris  -> S2
Paris  -> S3
Supplier file
SNUM  SNAME      STATUS  CITY
S1    De Smet    20      London
S2    Janssens   10      Paris
S3    Blanchart  30      Paris
S4    Clark      20      London
S5    Adams      30      Athens
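With two indexes, a query on both fields can be answered by intersecting the two pointer lists before any data record is read. A minimal sketch for "suppliers in Paris with status 30"; the query and helper names are illustrative:

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

SUPPLIERS: List[Tuple[str, str, int, str]] = [
    ("S1", "De Smet", 20, "London"),
    ("S2", "Janssens", 10, "Paris"),
    ("S3", "Blanchart", 30, "Paris"),
    ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
]
STATUS, CITY = 2, 3  # field columns


def build_index(records: List[Tuple[str, str, int, str]], field: int) -> Dict[Any, List[int]]:
    """Dense index: field value -> positions of the records holding that value."""
    index: Dict[Any, List[int]] = defaultdict(list)
    for pos, rec in enumerate(records):
        index[rec[field]].append(pos)
    return index


status_index = build_index(SUPPLIERS, STATUS)
city_index = build_index(SUPPLIERS, CITY)

# Intersect the pointer lists first; only matching records are then fetched.
hits = sorted(set(city_index["Paris"]) & set(status_index[30]))
print([SUPPLIERS[p][0] for p in hits])  # ['S3']
```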
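Non-dense index
SNUM-index (one entry per block: the highest SNUM stored in that block)
S2 -> block 1
S4 -> block 2
S5 -> block 3
Supplier file
         SNUM  SNAME      STATUS  CITY
block 1  S1    De Smet    20      London
         S2    Janssens   10      Paris
block 2  S3    Blanchart  30      Paris
         S4    Clark      20      London
block 3  S5    Adams      30      Athens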
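A non-dense index only works because the file is sorted on SNUM: a binary search over the per-block entries finds the one block that can contain the key, and only that block is scanned. A minimal sketch using Python's `bisect` module; keys are compared as strings, which is safe for S1 to S5 but would misorder S10 against S2:

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Record = Tuple[str, str, int, str]

# Blocks as drawn: records sorted on SNUM, at most two per block.
BLOCKS: List[List[Record]] = [
    [("S1", "De Smet", 20, "London"), ("S2", "Janssens", 10, "Paris")],
    [("S3", "Blanchart", 30, "Paris"), ("S4", "Clark", 20, "London")],
    [("S5", "Adams", 30, "Athens")],
]
# Non-dense index: one entry per block, the highest SNUM stored in it.
SNUM_INDEX = [block[-1][0] for block in BLOCKS]


def find(snum: str) -> Optional[Record]:
    i = bisect_left(SNUM_INDEX, snum)  # first block whose highest key >= snum
    if i == len(SNUM_INDEX):
        return None                    # beyond the last key in the file
    for rec in BLOCKS[i]:              # scan only that one block
        if rec[0] == snum:
            return rec
    return None


print(SNUM_INDEX)  # ['S2', 'S4', 'S5']
print(find("S3"))  # ('S3', 'Blanchart', 30, 'Paris')
print(find("S6"))  # None
```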
Factoring out a field
Supplier file
SNUM  SNAME      STATUS  CITY-pointer
S1    De Smet    20      -> London
S2    Janssens   10      -> Paris
S3    Blanchart  30      -> Paris
S4    Clark      20      -> London
S5    Adams      30      -> Athens
CITY-file
CITY
Athens
London
Paris
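Factoring out stores each city once in the CITY-file and replaces the CITY field of every supplier record by a pointer. A minimal sketch in which the pointer is a position in a Python list; the rename at the end is a hypothetical update, included to show that one change in the CITY-file reaches every supplier that points to it:

```python
from typing import List, Tuple

CITY_FILE: List[str] = ["Athens", "London", "Paris"]  # each city stored once

# Supplier records keep a CITY-pointer (a position in CITY_FILE) instead of the name.
SUPPLIERS: List[Tuple[str, str, int, int]] = [
    ("S1", "De Smet", 20, 1),
    ("S2", "Janssens", 10, 2),
    ("S3", "Blanchart", 30, 2),
    ("S4", "Clark", 20, 1),
    ("S5", "Adams", 30, 0),
]


def city_of(snum: str) -> str:
    """Find the supplier and follow its CITY-pointer."""
    for s, _name, _status, city_ptr in SUPPLIERS:
        if s == snum:
            return CITY_FILE[city_ptr]
    raise KeyError(snum)


print(city_of("S5"))          # Athens
CITY_FILE[2] = "Paris (FR)"   # hypothetical rename: one update for S2 and S3
print(city_of("S3"))          # Paris (FR)
```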
Combining Indexing and factoring out
(Figure: the supplier records S1 De Smet 20, S2 Janssens 10, S3 Blanchart 30, S4 Clark 20 and S5 Adams 30, linked to the city values Athens, London and Paris, each stored only once.)
Parent-child structure
(Figure: a CITY file with the parent records Athens, London and Paris, and a SUPPLIER file with the child records S1 De Smet 20, S2 Janssens 10, S3 Blanchart 30, S4 Clark 20 and S5 Adams 30.)
Fully inverted file
SNAME-index          STATUS-index     CITY-index          Supplier file
De Smet   -> S1      10 -> S2         Athens -> S5        S1
Janssens  -> S2      20 -> S1, S4     London -> S1, S4    S2
Blanchart -> S3      30 -> S3, S5     Paris  -> S2, S3    S3
Clark     -> S4                                           S4
Adams     -> S5                                           S5
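A fully inverted file has an index on every field, so any single-field query goes through an index. A minimal sketch that builds one index per non-key field, mapping each value to the SNUMs that carry it; the function name `invert` is illustrative:

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

SUPPLIERS: List[Tuple[str, str, int, str]] = [
    ("S1", "De Smet", 20, "London"),
    ("S2", "Janssens", 10, "Paris"),
    ("S3", "Blanchart", 30, "Paris"),
    ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
]
FIELDS = ["SNUM", "SNAME", "STATUS", "CITY"]


def invert(records: List[Tuple[str, str, int, str]]) -> Dict[str, Dict[Any, List[str]]]:
    """Build an index on every non-key field: value -> list of SNUMs."""
    indexes: Dict[str, Dict[Any, List[str]]] = {}
    for col, field in enumerate(FIELDS[1:], start=1):
        idx: Dict[Any, List[str]] = defaultdict(list)
        for rec in records:
            idx[rec[col]].append(rec[0])
        indexes[field] = dict(sorted(idx.items()))
    return indexes


inv = invert(SUPPLIERS)
print(inv["STATUS"])  # {10: ['S2'], 20: ['S1', 'S4'], 30: ['S3', 'S5']}
print(inv["CITY"])    # {'Athens': ['S5'], 'London': ['S1', 'S4'], 'Paris': ['S2', 'S3']}
```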
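File organization: Indexed-sequential
Multi-level index blocks (each entry holds the highest key of the block it points to):
Top level:     [Behr Dooms Fagin]
Second level:  [Adams Albert Behr]  [Bodoo Claes Codd Dooms]  [Ernest Fagin]
Data blocks:   [Ace Adamo Adams]  [Ademar Aerts Alan Albert]  [Alois Ball Behr]  [Bens Bodoo]
Parameters:
- index block size
- data block size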
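A lookup in the indexed-sequential file descends one index block per level, each time choosing the first entry whose (highest) key is not smaller than the search key, and finally scans a single data block. A minimal sketch over the blocks shown above; `None` stands for data blocks the slide does not draw, and the class and function names are illustrative. The index and data block sizes determine how many entries fit per block and hence how many index levels are needed:

```python
from bisect import bisect_left
from typing import List, Union


class IndexBlock:
    """Index block: entry i holds the highest key of the block children[i] points to."""
    def __init__(self, keys: List[str], children: List["Block"]) -> None:
        self.keys = keys
        self.children = children


# A data block is a sorted list of keys; None stands for a block not drawn on the slide.
Block = Union[IndexBlock, List[str], None]

AB = IndexBlock(["Adams", "Albert", "Behr"], [
    ["Ace", "Adamo", "Adams"],
    ["Ademar", "Aerts", "Alan", "Albert"],
    ["Alois", "Ball", "Behr"],
])
BD = IndexBlock(["Bodoo", "Claes", "Codd", "Dooms"], [["Bens", "Bodoo"], None, None, None])
EF = IndexBlock(["Ernest", "Fagin"], [None, None])
TOP = IndexBlock(["Behr", "Dooms", "Fagin"], [AB, BD, EF])


def lookup(block: Block, key: str) -> bool:
    """Descend one index level at a time, then scan a single data block."""
    while isinstance(block, IndexBlock):
        i = bisect_left(block.keys, key)  # first entry whose highest key >= key
        if i == len(block.keys):
            return False                  # key is beyond the last key in the file
        block = block.children[i]
    if block is None:
        raise LookupError("data block not reproduced in this sketch")
    return key in block


print(lookup(TOP, "Alan"))    # True
print(lookup(TOP, "Bens"))    # True
print(lookup(TOP, "Alfred"))  # False
```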
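B-tree concept: BALANCED tree
Root:                  [25 | 144]
Second level:          [9 | -]  [64 | 100]  [196 | -]
Leaves (dense index):  [1 4 -]  [9 16 -]  [25 36 49]  [64 81 -]  [100 121 -]  [144 169 -]  [196 225 256]
The index levels above the leaves form a non-dense index; the leaves form a dense index holding every key.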
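Searching the tree above takes one node per level, and since every leaf sits at the same depth, every search costs the same number of steps. A minimal sketch of that tree; the convention that child i holds the keys from keys[i-1] up to (but excluding) keys[i] is inferred from the figure, and the class and function names are illustrative:

```python
from bisect import bisect_right
from typing import List, Set, Union


class Inner:
    """Non-dense index node: child i holds the keys k with keys[i-1] <= k < keys[i]."""
    def __init__(self, keys: List[int], children: List["Node"]) -> None:
        self.keys = keys
        self.children = children


Node = Union[Inner, List[int]]  # a leaf is a sorted list of keys (the dense level)

root: Node = Inner([25, 144], [
    Inner([9], [[1, 4], [9, 16]]),
    Inner([64, 100], [[25, 36, 49], [64, 81], [100, 121]]),
    Inner([196], [[144, 169], [196, 225, 256]]),
])


def search(node: Node, key: int) -> bool:
    """Descend one node per level, then look in the leaf."""
    while isinstance(node, Inner):
        node = node.children[bisect_right(node.keys, key)]
    return key in node


def leaf_depths(node: Node, depth: int = 0) -> Set[int]:
    """Depths of all leaves; a balanced tree yields a single value."""
    if isinstance(node, list):
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in node.children))


print(search(root, 36), search(root, 30))  # True False
print(leaf_depths(root))                   # {2}
```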
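B-tree insertion
Same B-tree after insertion of record 32:
Root:                  [64 | -]
Second level:          [25 | -]  [144 | -]
Third level:           [9 | -]  [36 | -]  [100 | -]  [196 | -]
Leaves (dense index):  [1 4 -]  [9 16 -]  [25 32 -]  [36 49 -]  [64 81 -]  [100 121 -]  [144 169 -]  [196 225 256]
The levels above the leaves again form a non-dense index.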
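The insertion of 32 can be traced with a small sketch, assuming leaves hold at most 3 keys and index nodes at most 2, as the figures are drawn. The split rules are assumptions chosen because they reproduce the figure: an overflowing leaf keeps its lower half and copies the first key of its new right half upwards; an overflowing index node moves its middle key upwards. The split of [25 36 49] then cascades to the root, and the tree grows by one level at the top, so all leaves stay at the same depth:

```python
from bisect import bisect_right, insort
from typing import List, Optional, Tuple, Union

LEAF_CAP = 3   # keys per leaf, as drawn in the figure
INNER_CAP = 2  # keys per index node, as drawn in the figure


class Inner:
    """Index node: child i holds the keys k with keys[i-1] <= k < keys[i]."""
    def __init__(self, keys: List[int], children: List["Node"]) -> None:
        self.keys = keys
        self.children = children


Node = Union[Inner, List[int]]  # a leaf is a sorted list of keys


def _insert(node: Node, key: int) -> Optional[Tuple[int, Node]]:
    """Insert key below node; on overflow return (separator, new right sibling)."""
    if isinstance(node, list):
        insort(node, key)
        if len(node) <= LEAF_CAP:
            return None
        half = len(node) // 2
        right = node[half:]
        del node[half:]
        return right[0], right           # leaf split: copy the first right key up
    i = bisect_right(node.keys, key)
    split = _insert(node.children[i], key)
    if split is None:
        return None
    sep, new_child = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, new_child)
    if len(node.keys) <= INNER_CAP:
        return None
    mid = len(node.keys) // 2
    up = node.keys[mid]                  # index split: move the middle key up
    right_node = Inner(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return up, right_node


def insert(root: Node, key: int) -> Node:
    split = _insert(root, key)
    if split is None:
        return root
    sep, right = split
    return Inner([sep], [root, right])   # new root: every leaf stays at the same depth


def levels(root: Node) -> List[list]:
    """Keys level by level, from the root down to the leaves."""
    rows, layer = [], [root]
    while layer:
        rows.append([n.keys if isinstance(n, Inner) else n for n in layer])
        layer = [c for n in layer if isinstance(n, Inner) for c in n.children]
    return rows


# The tree from the "B-tree concept" figure.
root: Node = Inner([25, 144], [
    Inner([9], [[1, 4], [9, 16]]),
    Inner([64, 100], [[25, 36, 49], [64, 81], [100, 121]]),
    Inner([196], [[144, 169], [196, 225, 256]]),
])
root = insert(root, 32)
for row in levels(root):
    print(row)
# [[64]]
# [[25], [144]]
# [[9], [36], [100], [196]]
# [[1, 4], [9, 16], [25, 32], [36, 49], [64, 81], [100, 121], [144, 169], [196, 225, 256]]
```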
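B-tree deletion
Same B-tree after deletion of 64:
Root:                  [25 | 81]
Second level:          [9 | -]  [36 | -]  [144 | 196]
Leaves (dense index):  [1 4 -]  [9 16 -]  [25 32 -]  [36 49 -]  [81 100 121]  [144 169 -]  [196 225 256]
The levels above the leaves form a non-dense index.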
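Deleting 64 from the four-level tree obtained after inserting 32 can be traced with a sketch under stated assumptions: a leaf needs at least 2 keys and an index node at least 1; an underfull node is merged with a neighbour when the result fits, otherwise keys are shared; and afterwards every separator is reset to the smallest key of the subtree to its right. The slides do not give these rules; they are chosen because they reproduce the figure. Here the merges cascade up to the root, and the tree loses a level at the top:

```python
from bisect import bisect_right
from typing import List, Union

LEAF_CAP, INNER_CAP = 3, 2  # capacities as drawn in the figures
MIN_LEAF, MIN_INNER = 2, 1  # assumed minimum fill, about half of each capacity


class Inner:
    """Index node: child i holds the keys k with keys[i-1] <= k < keys[i]."""
    def __init__(self, keys: List[int], children: List["Node"]) -> None:
        self.keys = keys
        self.children = children


Node = Union[Inner, List[int]]  # a leaf is a sorted list of keys


def underfull(n: Node) -> bool:
    return len(n) < MIN_LEAF if isinstance(n, list) else len(n.keys) < MIN_INNER


def min_key(n: Node) -> int:
    while isinstance(n, Inner):
        n = n.children[0]
    return n[0]


def fix(parent: Inner, i: int) -> None:
    """Repair underfull child i by merging with, or borrowing from, a neighbour."""
    j = i if i + 1 < len(parent.children) else i - 1  # left node of the pair
    left, right = parent.children[j], parent.children[j + 1]
    if isinstance(left, list) and isinstance(right, list):
        keys = left + right
        if len(keys) <= LEAF_CAP:                     # merge two leaves
            left[:] = keys
            del parent.children[j + 1], parent.keys[j]
        else:                                         # share keys evenly
            half = len(keys) // 2
            left[:], right[:] = keys[:half], keys[half:]
            parent.keys[j] = right[0]
        return
    assert isinstance(left, Inner) and isinstance(right, Inner)
    keys = left.keys + [parent.keys[j]] + right.keys  # pull the separator down
    kids = left.children + right.children
    if len(keys) <= INNER_CAP:                        # merge two index nodes
        left.keys, left.children = keys, kids
        del parent.children[j + 1], parent.keys[j]
    else:                                             # share through the parent
        mid = len(keys) // 2
        left.keys, left.children = keys[:mid], kids[:mid + 1]
        parent.keys[j] = keys[mid]
        right.keys, right.children = keys[mid + 1:], kids[mid + 1:]


def _delete(node: Node, key: int) -> None:
    if isinstance(node, list):
        node.remove(key)                              # ValueError if key is absent
        return
    i = bisect_right(node.keys, key)
    _delete(node.children[i], key)
    if underfull(node.children[i]):
        fix(node, i)


def refresh(node: Node) -> None:
    """Reset every separator to the smallest key of the subtree on its right."""
    if isinstance(node, Inner):
        for child in node.children:
            refresh(child)
        node.keys = [min_key(c) for c in node.children[1:]]


def delete(root: Node, key: int) -> Node:
    _delete(root, key)
    if isinstance(root, Inner) and len(root.children) == 1:
        root = root.children[0]                       # the tree shrinks at the root
    refresh(root)
    return root


def levels(root: Node) -> List[list]:
    rows, layer = [], [root]
    while layer:
        rows.append([n.keys if isinstance(n, Inner) else n for n in layer])
        layer = [c for n in layer if isinstance(n, Inner) for c in n.children]
    return rows


# The tree from the "after insertion of record 32" figure.
root: Node = Inner([64], [
    Inner([25], [Inner([9], [[1, 4], [9, 16]]), Inner([36], [[25, 32], [36, 49]])]),
    Inner([144], [Inner([100], [[64, 81], [100, 121]]),
                  Inner([196], [[144, 169], [196, 225, 256]])]),
])
root = delete(root, 64)
for row in levels(root):
    print(row)
# [[25, 81]]
# [[9], [36], [144, 196]]
# [[1, 4], [9, 16], [25, 32], [36, 49], [81, 100, 121], [144, 169], [196, 225, 256]]
```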