File Organization and Storage Structures Chapter 5.

Post on 03-Jan-2016



Basic Concepts

The database on secondary storage is organized into one or more files, where each file consists of a number of records.

Each record consists of one or more fields.

Typically, a record corresponds to an entity and a field to an attribute.

The physical record is the unit of transfer between disk and primary storage, and vice versa.

A physical record, sometimes called a block or page, usually contains several logical records; how many depends on the size of the records.
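As a rough illustration of how many logical records fit in one physical record, the sketch below assumes fixed-length records that are never split across blocks. The sizes and the function names are made up for the example and do not come from the slides.

```python
def records_per_block(block_size: int, record_size: int) -> int:
    """Number of whole fixed-length records that fit in one block."""
    if block_size <= 0 or record_size <= 0:
        raise ValueError("sizes must be positive")
    return block_size // record_size


def blocks_needed(num_records: int, block_size: int, record_size: int) -> int:
    """Blocks required to hold num_records, rounding up."""
    per_block = records_per_block(block_size, record_size)
    if per_block == 0:
        raise ValueError("record does not fit in a block")
    return -(-num_records // per_block)  # ceiling division


# Hypothetical sizes: 4096-byte pages, 100-byte records.
print(records_per_block(4096, 100))    # 40
print(blocks_needed(1000, 4096, 100))  # 25
```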

List structures

Elementary list

Singular list

Circular list

Symmetric list

Symmetric circular list

Sequential insertion

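Before insertion:

X(1)  X(2)  X(3)  X(4)  free zone

After inserting Y in second position (the records behind it each shift one place):

X’(1)=X(1)
X’(2)=Y
X’(3)=X(2)
X’(4)=X(3)
X’(5)=X(4)
free zone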
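Sequential insertion can be sketched as follows, assuming the list lives in a fixed-size area whose unused tail is the free zone. The function name and the use of `None` for free slots are illustrative choices.

```python
def sequential_insert(area, used, position, value):
    """Insert value at 0-based position, shifting later items one slot right.

    area: list of fixed capacity; slots from index `used` onward are free.
    Returns the new number of used slots.
    """
    if used >= len(area):
        raise OverflowError("free zone is exhausted")
    if not 0 <= position <= used:
        raise IndexError("position out of range")
    # Shift from the end so that nothing is overwritten.
    for i in range(used, position, -1):
        area[i] = area[i - 1]
    area[position] = value
    return used + 1


area = ["X1", "X2", "X3", "X4", None, None]  # None marks the free zone
used = sequential_insert(area, 4, 1, "Y")
print(area[:used])  # ['X1', 'Y', 'X2', 'X3', 'X4']
```

Every record behind the insertion point is moved, so the cost grows with the length of the list.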

Insertion with pointer technique

Physical order   Logical position after inserting Y
X(1)             X’(1)=X(1)
X(3)             X’(4)=X(3)
X(2)             X’(3)=X(2)
X(4)             X’(5)=X(4)
Y                X’(2)=Y
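With pointers, the new record goes into any free slot and only two pointers change. The sketch below assumes a parallel array of "next" pointers with -1 as end-of-list marker; the slot numbers and names are illustrative.

```python
NIL = -1  # end-of-list marker


def insert_after(data, nxt, free_slot, pred_slot, value):
    """Store value in free_slot and link it after pred_slot; nothing moves."""
    data[free_slot] = value
    nxt[free_slot] = nxt[pred_slot]
    nxt[pred_slot] = free_slot


def traverse(data, nxt, head):
    """Follow the pointers from head and return the records in logical order."""
    out = []
    slot = head
    while slot != NIL:
        out.append(data[slot])
        slot = nxt[slot]
    return out


# Physical order X(1), X(3), X(2), X(4), followed by one free slot.
data = ["X1", "X3", "X2", "X4", None]
nxt = [2, 3, 1, NIL, NIL]  # logical order X1 -> X2 -> X3 -> X4
insert_after(data, nxt, free_slot=4, pred_slot=0, value="Y")
print(traverse(data, nxt, head=0))  # ['X1', 'Y', 'X2', 'X3', 'X4']
```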

Multi-list structure

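[Figure: multi-list structure. Each record carries a pointer; record length is 10. Three lists share one storage area: list1, list2 and the list of empty places. Addresses appearing in the figure: 2000, 2010, 2020, 2030, 2040, 2050, 2060, 3000; the pointer value -1 also appears. Records shown: A, B, K, L.]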

Insertion at beginning of list 2

[Figure: the same multi-list structure after record M has been placed in the empty place at address 2010 and linked in as the first record of list2. Records shown: A, B, K, L, M.]

List1: A B

List2: M K L
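A minimal sketch of insertion at the beginning of a list, assuming every list (including the list of empty places) is a chain of addresses ending in -1. The class, its method names and the slot count are hypothetical, and the addresses it hands out do not reproduce the exact layout of the figure.

```python
NIL = -1


class MultiList:
    def __init__(self, base, record_length, slots):
        # Every slot starts on the list of empty places.
        self.value = {}
        self.next = {}
        addresses = [base + i * record_length for i in range(slots)]
        for addr, following in zip(addresses, addresses[1:] + [NIL]):
            self.value[addr] = None
            self.next[addr] = following
        self.heads = {"empty": addresses[0] if addresses else NIL}

    def insert_at_beginning(self, list_name, value):
        addr = self.heads["empty"]
        if addr == NIL:
            raise MemoryError("no empty places left")
        self.heads["empty"] = self.next[addr]             # unlink from empty list
        self.value[addr] = value
        self.next[addr] = self.heads.get(list_name, NIL)  # old head follows
        self.heads[list_name] = addr
        return addr

    def items(self, list_name):
        out, addr = [], self.heads.get(list_name, NIL)
        while addr != NIL:
            out.append(self.value[addr])
            addr = self.next[addr]
        return out


store = MultiList(base=2000, record_length=10, slots=7)
for v in ("B", "A"):
    store.insert_at_beginning("list1", v)
for v in ("L", "K", "M"):
    store.insert_at_beginning("list2", v)
print(store.items("list1"))  # ['A', 'B']
print(store.items("list2"))  # ['M', 'K', 'L']
```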

General tree structure

A

B C

D E F H J K L

M N P Q R

Equivalent binary tree structure

A

B C

D E F

H J K L

Q R

M N P
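One common way to obtain an equivalent binary tree is the first-child / next-sibling transformation; the sketch below assumes that this is the transformation meant here. The parent-child edges of the slide's tree cannot be read from the transcript, so the small tree used is hypothetical.

```python
def to_binary(children, root):
    """Return {node: (first_child, next_sibling)} for a general tree.

    children maps each node to its ordered list of children.
    None plays the role of a null pointer.
    """
    binary = {}

    def visit(node, sibling):
        kids = children.get(node, [])
        binary[node] = (kids[0] if kids else None, sibling)
        for i, kid in enumerate(kids):
            visit(kid, kids[i + 1] if i + 1 < len(kids) else None)

    visit(root, None)
    return binary


# Hypothetical general tree, not the one on the slide.
general = {"A": ["B", "C"], "B": ["D", "E", "F"], "C": ["H", "J"]}
binary = to_binary(general, "A")
print(binary["A"])  # ('B', None)
print(binary["B"])  # ('D', 'C')
print(binary["E"])  # (None, 'F')
```

Each node then needs exactly two pointers, whatever its number of children in the general tree.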

Pointer Implementation

A

B C

D E F

H J K L

Q R

M N P

[In the figure, unused pointer fields hold the value -1.]

Bi-directional tree

X

Y R S

Z U T

[Figure: pointer implementation, reached through an Entry pointer to X. Pointer fields shown: first lower, higher, next. Unused pointer fields hold -1.]

Ring structure

X

Y Z U

V T R

Entry

X

Y Z U

V T R

File Organization

File organization: the physical arrangement of data into records and pages on secondary storage.

Main types

• Heap or unordered

• Sorted

• Hash

Access method: the steps involved in storing and retrieving records from a file.
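To make the difference between a heap file and a sorted file concrete, the sketch below contrasts a linear scan with a binary search on the ordering field. The in-memory lists stand in for files, and the supplier names are used only as sample payloads.

```python
from bisect import bisect_left


def heap_search(records, key):
    """Unordered file: scan records until the key is found.

    Returns (payload, number_of_records_read).
    """
    for reads, (k, payload) in enumerate(records, start=1):
        if k == key:
            return payload, reads
    return None, len(records)


def sorted_search(records, key):
    """Sorted file: binary search on the ordering field."""
    keys = [k for k, _ in records]  # built here only to keep the sketch short
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return records[i][1]
    return None


heap_file = [("S3", "Blanchart"), ("S1", "De Smet"), ("S5", "Adams"),
             ("S2", "Janssens"), ("S4", "Clark")]
sorted_file = sorted(heap_file)
print(heap_search(heap_file, "S2"))      # ('Janssens', 4)
print(sorted_search(sorted_file, "S2"))  # Janssens
```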

Sample Data

SUPPLIER file

SNUM  SNAME      STATUS  CITY
S1    De Smet    20      London
S2    Janssens   10      Paris
S3    Blanchart  30      Paris
S4    Clark      20      London
S5    Adams      30      Athens

Hash Files

Hash file with slots 0 to 12. Records, in the order in which they appear in the figure:

S300  Blanchart  30  Paris
S200  Janssens   10  Paris
S500  Adams      30  Athens
S100  De Smet    20  London
S400  Clark      20  London
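The slide does not state the hash function. Assuming the address is the numeric part of SNUM modulo 13 (the number of slots), the records land in increasing slot order that matches the order in which they are listed; the sketch below shows that placement. The function name is illustrative.

```python
SLOTS = 13  # slots 0..12


def hash_address(snum: str) -> int:
    """Assumed hash: numeric part of SNUM modulo the number of slots."""
    return int(snum[1:]) % SLOTS


suppliers = [
    ("S100", "De Smet", 20, "London"),
    ("S200", "Janssens", 10, "Paris"),
    ("S300", "Blanchart", 30, "Paris"),
    ("S400", "Clark", 20, "London"),
    ("S500", "Adams", 30, "Athens"),
]

table = [None] * SLOTS
for rec in suppliers:
    slot = hash_address(rec[0])
    if table[slot] is not None:
        raise RuntimeError("collision: not handled in this sketch")
    table[slot] = rec

for slot, rec in enumerate(table):
    if rec is not None:
        print(slot, rec[0])
# 1 S300
# 5 S200
# 6 S500
# 9 S100
# 10 S400
```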

Hashing techniques

Duplicate (collision) handling

- open addressing
- unchained overflow
- chained overflow
- multiple hashing

Hashing algorithms

- folding
- mid-square
- division by prime number

Limitations

- inappropriate for retrieval by ranges of values
- no help for retrieval on the non-hash fields
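The three hashing algorithms and one collision strategy can be sketched as below. Definitions of folding and mid-square vary between textbooks, so these are one possible formulation each; the parameter defaults (two-digit parts, 13 slots) and the class name are assumptions.

```python
def division_hash(key: int, prime: int = 13) -> int:
    """Division by a prime number: the remainder is the address."""
    return key % prime


def folding_hash(key: int, part_digits: int = 2, slots: int = 13) -> int:
    """Split the decimal digits into parts, add the parts, reduce to a slot."""
    digits = str(key)
    parts = [int(digits[i:i + part_digits])
             for i in range(0, len(digits), part_digits)]
    return sum(parts) % slots


def mid_square_hash(key: int, width: int = 2, slots: int = 13) -> int:
    """Square the key, take `width` middle digits, reduce to a slot."""
    squared = str(key * key)
    start = max((len(squared) - width) // 2, 0)
    return int(squared[start:start + width]) % slots


class ChainedHashFile:
    """Chained overflow: colliding records join the chain of their slot."""

    def __init__(self, slots: int = 13):
        self.slots = [[] for _ in range(slots)]

    def insert(self, key: int, record) -> None:
        self.slots[division_hash(key, len(self.slots))].append((key, record))

    def lookup(self, key: int):
        for k, record in self.slots[division_hash(key, len(self.slots))]:
            if k == key:
                return record
        return None


f = ChainedHashFile()
f.insert(300, "Blanchart")
f.insert(313, "hypothetical colliding record")  # 313 % 13 == 300 % 13 == 1
print(f.lookup(313))
```

A lookup hashes the key once and then searches only the chain of that slot, which is also why a hash file gives no help for range queries or for searches on other fields.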

An Index

An index provides an ACCESS PATH to the file it is indexing

a file may have several associated indexes

the sequential access path is always available

an index imposes an ordering on the file it is indexing

it can be used for direct access

it speeds up retrieval and slows down updating

it is not the same thing as a key

can be built on combinations of fields

can be SRA or symbolic


Supplier file with index on city

Supplier file

SNUM  SNAME      STATUS  CITY
S1    De Smet    20      London
S2    Janssens   10      Paris
S3    Blanchart  30      Paris
S4    Clark      20      London
S5    Adams      30      Athens

City-index (each entry points to one supplier record)

Athens -> S5
London -> S1
London -> S4
Paris  -> S2
Paris  -> S3
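A dense index such as the city index can be sketched as a sorted list of (value, pointer) pairs, with one entry per record. Here the "pointer" is simply the record's position in a Python list, which is an assumption standing in for a record address.

```python
from bisect import bisect_left

CITY = 3  # position of the CITY field in a supplier record

suppliers = [
    ("S1", "De Smet", 20, "London"),
    ("S2", "Janssens", 10, "Paris"),
    ("S3", "Blanchart", 30, "Paris"),
    ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
]


def build_dense_index(records, field):
    """One (value, record position) entry per record, sorted by value."""
    return sorted((rec[field], pos) for pos, rec in enumerate(records))


def lookup(index, records, value):
    """Direct access: binary search in the index, then follow the pointers."""
    i = bisect_left(index, (value,))
    found = []
    while i < len(index) and index[i][0] == value:
        found.append(records[index[i][1]])
        i += 1
    return found


city_index = build_dense_index(suppliers, CITY)
print([v for v, _ in city_index])
# ['Athens', 'London', 'London', 'Paris', 'Paris']
print([rec[0] for rec in lookup(city_index, suppliers, "London")])
# ['S1', 'S4']
```

The file itself stays in SNUM order; the index imposes the city ordering without moving any record.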

Supplier file with two indexes

Status-index

10 -> S2
20 -> S1
20 -> S4
30 -> S3
30 -> S5

Supplier file

SNUM  SNAME      STATUS  CITY
S1    De Smet    20      London
S2    Janssens   10      Paris
S3    Blanchart  30      Paris
S4    Clark      20      London
S5    Adams      30      Athens

City-index

Athens -> S5
London -> S1
London -> S4
Paris  -> S2
Paris  -> S3

Non-dense index

SNUM-index (one entry per block)

S2 -> block 1
S4 -> block 2
S5 -> block 3

Supplier file

         SNUM  SNAME      STATUS  CITY
block 1  S1    De Smet    20      London
         S2    Janssens   10      Paris
block 2  S3    Blanchart  30      Paris
         S4    Clark      20      London
block 3  S5    Adams      30      Athens
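A lookup through a non-dense index can be sketched as follows, assuming the file is sorted on SNUM and each index entry holds the highest SNUM of its block. The block contents follow the figure; the function name is illustrative.

```python
from bisect import bisect_left

# Data blocks, sorted on SNUM, numbered as in the figure.
blocks = {
    1: [("S1", "De Smet", 20, "London"), ("S2", "Janssens", 10, "Paris")],
    2: [("S3", "Blanchart", 30, "Paris"), ("S4", "Clark", 20, "London")],
    3: [("S5", "Adams", 30, "Athens")],
}

# Non-dense index: (highest key in block, block number).
snum_index = [(recs[-1][0], number) for number, recs in blocks.items()]


def find(snum):
    """Pick the first block whose highest key is >= snum, then scan it."""
    keys = [k for k, _ in snum_index]
    i = bisect_left(keys, snum)
    if i == len(keys):
        return None  # larger than every key in the file
    for rec in blocks[snum_index[i][1]]:
        if rec[0] == snum:
            return rec
    return None


print(snum_index)  # [('S2', 1), ('S4', 2), ('S5', 3)]
print(find("S3"))  # ('S3', 'Blanchart', 30, 'Paris')
```

Because the index has no entry for S3, the block must be read to learn whether the record exists; a dense index could answer that from the index alone.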

Factoring out a field

Supplier file

SNUM  SNAME      STATUS  CITY-pointer
S1    De Smet    20      -> London
S2    Janssens   10      -> Paris
S3    Blanchart  30      -> Paris
S4    Clark      20      -> London
S5    Adams      30      -> Athens

CITY-file

CITY
Athens
London
Paris

Combining Indexing and factoring out

S1 De Smet 20

S2 Janssens 10

S3 Blanchart 30

S4 Clark 20

S5 Adams 30

Athens London Paris

Parent - Child structure

CITY file (parent)

Athens London Paris

SUPPLIER file (child)

S1 De Smet 20
S2 Janssens 10
S3 Blanchart 30
S4 Clark 20
S5 Adams 30

Fully inverted file

SNAME-index        STATUS-index    CITY-index          Supplier file
De Smet   -> S1    10 -> S2        Athens -> S5        S1
Janssens  -> S2    20 -> S1, S4    London -> S1, S4    S2
Blanchart -> S3    30 -> S3, S5    Paris  -> S2, S3    S3
Clark     -> S4                                        S4
Adams     -> S5                                        S5
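With a fully inverted file, a query on several fields can be answered by intersecting pointer lists before any data record is read. The sketch below builds the three indexes from the sample data; the dictionary layout and function name are assumptions for illustration.

```python
from collections import defaultdict

suppliers = [
    {"SNUM": "S1", "SNAME": "De Smet", "STATUS": 20, "CITY": "London"},
    {"SNUM": "S2", "SNAME": "Janssens", "STATUS": 10, "CITY": "Paris"},
    {"SNUM": "S3", "SNAME": "Blanchart", "STATUS": 30, "CITY": "Paris"},
    {"SNUM": "S4", "SNAME": "Clark", "STATUS": 20, "CITY": "London"},
    {"SNUM": "S5", "SNAME": "Adams", "STATUS": 30, "CITY": "Athens"},
]


def invert(records, fields):
    """Build one index per field: value -> list of SNUM pointers."""
    indexes = {f: defaultdict(list) for f in fields}
    for rec in records:
        for f in fields:
            indexes[f][rec[f]].append(rec["SNUM"])
    return indexes


indexes = invert(suppliers, ["SNAME", "STATUS", "CITY"])
print(indexes["STATUS"][10])      # ['S2']
print(indexes["CITY"]["London"])  # ['S1', 'S4']

# Suppliers with STATUS 30 located in Paris, using the indexes only.
hits = set(indexes["STATUS"][30]) & set(indexes["CITY"]["Paris"])
print(sorted(hits))               # ['S3']
```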

File organization: Indexed-sequential

multi-level index blocks

level 1: Behr, Dooms, Fagin

level 2: Adams, Albert, Behr | Bodoo, Claes, Codd, Dooms | Ernest, Fagin

data blocks

Ace, Adamo, Adams | Ademar, Aerts, Alan, Albert | Alois, Ball, Behr | Bens, Bodoo

parameters

- index block size
- data block size

B-tree concept

BALANCED tree

non-dense index

[25 144]

[9 -]  [64 100]  [196 -]

dense index

[1 4 -]  [9 16 -]  [25 36 49]  [64 81 -]  [100 121 -]  [144 169 -]  [196 225 256]
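A search in this tree can be sketched as below. It assumes, from reading the figure, that an index key k routes to the subtree holding keys greater than or equal to k. The tuple/list representation is an illustrative choice.

```python
from bisect import bisect_right

# Index node: (keys, children). Leaf: plain list of keys.
tree = (
    [25, 144],
    [
        ([9], [[1, 4], [9, 16]]),
        ([64, 100], [[25, 36, 49], [64, 81], [100, 121]]),
        ([196], [[144, 169], [196, 225, 256]]),
    ],
)


def search(node, key):
    """Return (found, nodes_visited) for a lookup starting at the root."""
    visited = 1
    while isinstance(node, tuple):
        keys, children = node
        # Keys equal to a separator belong to the child on its right.
        node = children[bisect_right(keys, key)]
        visited += 1
    return key in node, visited


print(search(tree, 49))   # (True, 3)
print(search(tree, 50))   # (False, 3)
print(search(tree, 256))  # (True, 3)
```

Every lookup visits the same number of nodes, which is what "balanced" means here.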

B-tree insertion

same B-tree after insertion of record 32

non-dense index

[64 -]

[25 -]  [144 -]

[9 -]  [36 -]  [100 -]  [196 -]

dense index

[1 4 -]  [9 16 -]  [25 32 -]  [36 49 -]  [64 81 -]  [100 121 -]  [144 169 -]  [196 225 256]
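The first step of this insertion, splitting the full leaf that should receive 32, can be sketched as follows. The capacity of three keys per leaf is read off the figure, and the function name and return convention are assumptions; propagating the separator into the index levels is not shown.

```python
from bisect import insort


def insert_into_leaf(leaf, key, capacity=3):
    """Insert key into a sorted leaf and split it on overflow.

    Returns (left, right, separator); right and separator are None
    when no split was needed.
    """
    leaf = list(leaf)
    insort(leaf, key)
    if len(leaf) <= capacity:
        return leaf, None, None
    mid = len(leaf) // 2
    left, right = leaf[:mid], leaf[mid:]
    return left, right, right[0]  # first key of the new leaf goes up


print(insert_into_leaf([25, 36, 49], 32))  # ([25, 32], [36, 49], 36)
print(insert_into_leaf([64, 81], 70))      # ([64, 70, 81], None, None)
```

In the figure the separator 36 does not fit in its parent index node either, so the split repeats upward and the tree gains a level with 64 as the new root.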

B-tree deletion

Deletion of 64

non-dense index

[25 81]

[9 -]  [36 -]  [144 196]

dense index

[1 4 -]  [9 16 -]  [25 32 -]  [36 49 -]  [81 100 121]  [144 169 -]  [196 225 256]