Cellular partition by using hash org.
Uploaded by tejbir-sangwan
8/8/2019 cellular patition by using hash org.
Bharati Vidyapeeth's Institute of Computer Applications and Management, New Delhi-63
DATA STRUCTURE: FILES
UNIT IV
Learning Objectives
Hashing
Indexing Techniques
File Organization
Hashing
Hashing
Technique for performing insertion, deletion, and search in constant average time
Ordering of elements is not supported efficiently
Keys are mapped onto a number between 0 and TableSize-1
Mapping is done by a function called the hash function
Hash Function
Transforms a key into a cell/bucket address
Must be simple to compute
Should ideally ensure that distinct keys get distinct cells
Not possible in all cases as the number of keys increases
Leads to collisions (multiple keys map to the same hash value)
So choose a function that distributes keys evenly
Considerations
Which hash function to use
How to respond to collisions
Common hash functions
Mod
Mid Square
Folding
Digit Analysis
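As a rough illustration of the first three functions listed above, the following sketch assumes non-negative integer keys. The function names and the parameters `num_digits` and `part_len` are hypothetical choices made for this example; digit analysis is left out because it depends on statistics of the whole key set.

```python
def mod_hash(key, table_size):
    """Mod (division) method: remainder of the key divided by the table size."""
    return key % table_size


def mid_square_hash(key, num_digits):
    """Mid-square method: square the key and keep num_digits middle digits."""
    squared = str(key * key)
    start = max(0, (len(squared) - num_digits) // 2)
    return int(squared[start:start + num_digits])


def folding_hash(key, part_len, table_size):
    """Folding method: split the key's digits into parts and add the parts."""
    digits = str(key)
    parts = [int(digits[i:i + part_len]) for i in range(0, len(digits), part_len)]
    return sum(parts) % table_size


mod_result = mod_hash(54, 10)                  # 54 % 10
mid_result = mid_square_hash(1234, 2)          # 1234 * 1234 = 1522756, middle digits "22"
fold_result = folding_hash(123456, 2, 100)     # (12 + 34 + 56) % 100
```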
Collision Resolution
Open address hashing
Linear Probing
Quadratic Probing
Double Hashing
Separate Chaining
Rehashing
Open Addressing
In case of a collision, alternate cells are tried until an empty cell is found.
Cell $h_i(X) = (\mathrm{Hash}(X) + F(i)) \bmod \mathrm{TableSize}$, given $F(0) = 0$
For linear probing, F(i) is a linear function of i
For quadratic probing, F(i) is a quadratic function of i, such as $F(i) = i^2$
For double hashing, F(i) depends on a second hash function other than the one chosen originally, such as $F(i) = i \cdot \mathrm{Hash}_2(X)$
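The three choices of F(i) can be compared by listing the first few cells each one probes. This is a minimal sketch, assuming integer keys, Hash(X) = X mod TableSize, wrap-around at the end of the table, and, for double hashing, a second hash function of the form R - (X mod R) with R = 7. That second function is one common textbook choice and is an assumption here, not something the slides specify.

```python
def linear(i, key):
    """F(i) = i."""
    return i


def quadratic(i, key):
    """F(i) = i squared."""
    return i * i


def double_hashing(i, key, r=7):
    """F(i) = i * Hash2(X), with the assumed Hash2(X) = r - (X mod r)."""
    return i * (r - key % r)


def probe_cells(key, table_size, f, count):
    """Return the first `count` cells tried for `key`; F(0) = 0 gives the home cell."""
    home = key % table_size
    return [(home + f(i, key)) % table_size for i in range(count)]


linear_cells = probe_cells(32, 10, linear, 4)
quadratic_cells = probe_cells(32, 10, quadratic, 4)
double_cells = probe_cells(32, 10, double_hashing, 4)
```

All three sequences start at the home cell 2 and differ only in how far each later probe moves away from it.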
Linear Probing
If the table is large enough to hold all the keys, a free cell will always be found
Though the time required may be large
Drawback: blocks of occupied cells may form (PRIMARY CLUSTERING)
i.e. a key that hashes into a cluster will require several attempts to resolve the collision
Linear Probing
Consider a hash table with 10 slots.
Say the keys to be inserted are 12, 30, 11, 32, 34, 54, 50
The hash function is mod 10
This divisor is chosen just for illustration and is not a good choice, as at most 10 distinct cells are generated, so collisions will be frequent
The divisor should preferably be a prime number
Stages of insertion are illustrated on the following slides
Linear Probing: Illustration
Add 12 at cell 12 % 10 = 2
Add 30 at cell 30 % 10 = 0
Add 11 at cell 11 % 10 = 1
Try to add 32 at cell 32 % 10 = 2; not available; try next: cell 3
Add 34 at cell 34 % 10 = 4
Try to add 54 at cell 54 % 10 = 4; not available; try next: cell 5
Try to add 50 at cell 50 % 10 = 0; not available; try next until an empty cell is found: cell 6
Resulting table:
Cell:   0   1   2   3   4   5   6   7   8   9
Key:   30  11  12  32  34  54  50   -   -   -
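The insertion stages can be replayed in a few lines. This is a sketch only, assuming integer keys, `None` as the marker for an empty cell, and wrap-around from the last cell to cell 0; the function name `insert_linear` is made up for the example.

```python
def insert_linear(table, key):
    """Insert key with linear probing; return the cell where it was placed."""
    size = len(table)
    home = key % size
    for i in range(size):
        cell = (home + i) % size      # F(i) = i, wrapping around the table
        if table[cell] is None:
            table[cell] = key
            return cell
    raise RuntimeError("hash table is full")


table = [None] * 10
placed = [insert_linear(table, key) for key in (12, 30, 11, 32, 34, 54, 50)]
```

Key 50 is the costly one: its home cell is 0, and it has to step past the whole cluster in cells 0 to 5 before it lands in cell 6, which is the primary clustering effect described earlier.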
Quadratic Probing
Similar treatment can be given when collisions occur in case of quadratic probing
Here, instead of choosing the next cell after the ideal cell (or a cell given by a linear function of i), a new cell number given by some quadratic function of i is chosen
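For comparison, here is the same idea with a quadratic step, run on the keys 12, 30, 11, 32, 34, 54, 50 from the linear probing example. It is a sketch under assumptions: F(i) is taken to be i squared, the table has 10 cells with hash function mod 10, and `None` marks an empty cell. Unlike linear probing, quadratic probing can fail to find a free cell even when one exists, so the sketch gives up after as many probes as there are cells.

```python
def insert_quadratic(table, key):
    """Insert key with quadratic probing, F(i) = i * i; return the cell used."""
    size = len(table)
    home = key % size
    for i in range(size):
        cell = (home + i * i) % size
        if table[cell] is None:
            table[cell] = key
            return cell
    raise RuntimeError("no free cell found on the probe sequence")


table = [None] * 10
placed = [insert_quadratic(table, key) for key in (12, 30, 11, 32, 34, 54, 50)]
```

Key 50 now probes cells 0, 1, 4, and 9, so it is placed after three collisions instead of six.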
Separate Chaining
Maintains a list of all the keys that hash to the same value
To insert:
Calculate the hash function
Access the corresponding list
Add a link to the list
i.e. A link is added in case of a collision
The new key might be added at either end of the list
Better for large records; handles collisions and overflow efficiently
Not as efficient when record size is small or the domain of key values is limited to a small number of entries
Separate Chaining: Illustration
Insert sequence: 22, 42, 30, 43, 10
Cell 0: 30 -> 10
Cell 1: (empty)
Cell 2: 22 -> 42
Cell 3: 43
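A small sketch of separate chaining for the insert sequence 22, 42, 30, 43, 10. The slide does not state the hash function; mod 10 is assumed here because it is the function used in the earlier examples and it puts 30 in cell 0. Python lists stand in for linked lists, new keys are appended at the tail of a chain, and the function names are illustrative.

```python
def chain_insert(table, key):
    """Append key to the chain of the cell it hashes to."""
    table[key % len(table)].append(key)


def chain_search(table, key):
    """Search only the chain of the cell the key hashes to."""
    return key in table[key % len(table)]


table = [[] for _ in range(10)]
for key in (22, 42, 30, 43, 10):
    chain_insert(table, key)

found_42 = chain_search(table, 42)
found_52 = chain_search(table, 52)   # same cell as 22 and 42, but never inserted
```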
Rehashing
When the table gets too full, the number of collisions increases, degrading performance for both insertion and search
Build another hash table with size ~ 2*OldSize
Scan the original table; for each entry, compute the new hash value and insert it in the new hash table
Rehashing is costly and thus should not be done very frequently
Rehashing: Illustration
Consider the hash table as given in the figure:
Cell:   0   1   2   3   4   5   6   7   8   9
Key:   30  11  12  32  34  54  50   -   -   -
The keys inserted are 12, 30, 11, 32, 34, 54, 50
The hash function is mod 10
Rehashing
New table size 19
The hash function is mod 19
Figure: new table with cells 0 to 18 holding the keys 12, 30, 11, 32, 34, 54, 50
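The rehashing steps (build a larger table, scan the old one, re-insert every key) might look like this in outline. The sketch makes several assumptions: the new size of 19 is taken from the slide, keys are hashed modulo the size of the new table, collisions are resolved by linear probing, and the old table is scanned from cell 0 upward. The final layout depends on those choices, so it may not match the slide's figure cell for cell.

```python
def insert_linear(table, key):
    """Insert key with linear probing; return the cell where it was placed."""
    size = len(table)
    home = key % size
    for i in range(size):
        cell = (home + i) % size
        if table[cell] is None:
            table[cell] = key
            return cell
    raise RuntimeError("hash table is full")


def rehash(old_table, new_size):
    """Build a table of new_size cells and re-insert every key of old_table."""
    new_table = [None] * new_size
    for key in old_table:             # scan the original table cell by cell
        if key is not None:
            insert_linear(new_table, key)
    return new_table


old_table = [30, 11, 12, 32, 34, 54, 50, None, None, None]
new_table = rehash(old_table, 19)
```

Every key is hashed again, which is why the slides describe rehashing as costly.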
Indexing Techniques
Indexing Techniques
Cylinder Surface Indexing
Hashed Indexing
Tree Indexing
Cylinder Surface Indexing
Used for primary key index in sequential file organization
Assumes records are stored in increasing order of primary key
Index consists of a CYLINDER INDEX plus a SURFACE INDEX for each cylinder
Cylinder Surface Indexing
If a data file takes up c cylinders, the cylinder index (CI) has c entries
Each CI entry contains
{CYLINDER_NO, largest key on cylinder}
Each entry of the surface index (SI) of the i-th cylinder contains:
{SURFACE_NO, largest key on this surface of the i-th cylinder}
Cylinder Surface Indexing
Searching a record (ISAM)
Read the cylinder index into memory
Locate the cylinder number that possibly contains the record
Read the surface index of the corresponding cylinder
Find the surface (reduced to a track) that may contain the record
Search the track sequentially
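The search steps above can be sketched with in-memory lists standing in for the disk. The data layout is hypothetical: `cylinder_index[c]` is the largest key on cylinder c, `surface_index[c][s]` is the largest key on surface s of cylinder c, and `tracks[c][s]` is the sorted list of keys on that track. Binary search over each index uses `bisect_left` from Python's standard library.

```python
from bisect import bisect_left


def isam_search(key, cylinder_index, surface_index, tracks):
    """Return (cylinder, surface, position) of key, or None if it is absent."""
    # First cylinder whose largest key is >= key.
    c = bisect_left(cylinder_index, key)
    if c == len(cylinder_index):
        return None
    # First surface on that cylinder whose largest key is >= key.
    s = bisect_left(surface_index[c], key)
    if s == len(surface_index[c]):
        return None
    # Sequential scan of the single track that may hold the record.
    for position, stored in enumerate(tracks[c][s]):
        if stored == key:
            return (c, s, position)
        if stored > key:
            break
    return None


tracks = [[[3, 8], [12, 15]], [[21, 27], [30, 42]]]
cylinder_index = [15, 42]
surface_index = [[8, 15], [27, 42]]

hit = isam_search(30, cylinder_index, surface_index, tracks)
miss = isam_search(9, cylinder_index, surface_index, tracks)
too_large = isam_search(50, cylinder_index, surface_index, tracks)
```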
Hashed Indexing
Maintains a hash table of key values along with the corresponding record addresses
The hash functions and overflow handling techniques are those discussed under hashing
In case of linear probing, seek time is less as overflow buckets/cells are adjacent
In case of separate chaining, special buffer space is allocated for expansion of buckets; thus little or no additional seek time is required
Seek time is highest in case of random or quadratic probing
Tree Indexing
Indexing using balanced trees of order m
Discussed before as B-trees and B+ trees
Maximum number of keys in a tree of l levels: $m^l - 1$
Let number of keys = N
Number of failure nodes (nodes that one could reach while looking for a key that doesn't exist in the tree) = N + 1
= number of nodes at level l+1
$\ge 2\lceil m/2 \rceil^{l-1}$
Thus, $N \ge 2\lceil m/2 \rceil^{l-1} - 1$
Tree indexing
Consider a B-tree of order m = 200
Say $N = 2 \times 10^6$, with $N \ge 2\lceil m/2 \rceil^{l-1} - 1$
i.e. $2 \times 10^6 \ge 2\lceil 200/2 \rceil^{l-1} - 1$
We get
$10^6 \ge 100^{l-1}$
$6 \ge 2(l-1)$
$l \le 4$
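The bound on the number of levels can also be checked numerically. This sketch assumes the inequality N >= 2 * ceil(m/2)^(l-1) - 1 from the previous slide and simply looks for the largest l that satisfies it; the function name `max_levels` is made up for the example.

```python
def max_levels(n_keys, m):
    """Largest l with 2 * ceil(m / 2) ** (l - 1) - 1 <= n_keys."""
    if n_keys < 1:
        raise ValueError("the bound applies to trees with at least one key")
    half = (m + 1) // 2               # integer form of ceil(m / 2)
    levels = 1
    # Keep adding a level while the minimum key count for it still fits.
    while 2 * half ** levels - 1 <= n_keys:
        levels += 1
    return levels


levels_for_two_million = max_levels(2 * 10**6, 200)
levels_just_below = max_levels(2 * 10**6 - 2, 200)
```

With m = 200, a tree holding two million keys needs at most 4 levels, and removing just two keys brings the bound down to 3.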
FILE ORGANIZATION
File Organization
Sequential File Organization
Random File Organization
Inverted Files
Cellular Files
Sequential File Organization
ISAM is the most popular sequential file organization
Cylinder surface index is maintained for primary key.
Makes search based on PK efficient
Search based on other attributes requires use of an alternate indexing technique
Insertion and deletion are time consuming
Batch processes and Range queries are executed efficiently
Random File Organization
Records are stored at random locations
Techniques used for randomization
Direct Addressing
Directory Lookup
Hashed File Organization
Direct Addressing
Available disk space is divided into nodes large enough to hold a record
Numeric value of the PK determines the node number where the insertion is to be made (1 disk access for read)
Good for fixed-length records and high identifier density (keys in current use / size of key domain)
In case of variable-length records, pointers to the actual locations on disk are maintained (2 disk accesses for read)
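For the fixed-length case, direct addressing amounts to computing a byte offset from the key. The sketch below uses an in-memory `io.BytesIO` buffer in place of a disk file and a hypothetical record layout (a 4-byte integer key followed by a 16-byte name); an all-zero node is treated as empty, which is a simplification for this example.

```python
import io
import struct

# Hypothetical fixed-length record: little-endian int key + 16-byte name.
RECORD = struct.Struct("<i16s")


def write_record(f, key, name):
    """Store the record in node number `key` (one seek, one write)."""
    f.seek(key * RECORD.size)
    f.write(RECORD.pack(key, name.encode("ascii")))


def read_record(f, key):
    """Read node number `key`; return the name, or None if the node is empty."""
    f.seek(key * RECORD.size)
    data = f.read(RECORD.size)
    if len(data) < RECORD.size:
        return None                   # node lies beyond the end of the file
    _, raw = RECORD.unpack(data)
    name = raw.rstrip(b"\x00").decode("ascii")
    return name or None


disk = io.BytesIO()
write_record(disk, 3, "record-3")
write_record(disk, 0, "record-0")

name_3 = read_record(disk, 3)
name_1 = read_record(disk, 1)         # node 1 was never written
name_9 = read_record(disk, 9)         # past the end of the file
```

Nodes 1 and 2 occupy space even though they hold no record, which is why the technique suits key sets with high identifier density.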
Directory Lookup
Like direct addressing with variable-length records, the index maintains key values and pointers to disk addresses
Unlike direct addressing with variable-length records, available space is utilized efficiently as the existing keys are stored contiguously
Searching requires multiple disk accesses as the index needs to be searched first
Hashed File Organization
Uses same principle as hashed indexes
Available file space is divided into cells/buckets/slots
Some space is set aside for overflow in case of chaining
Inverted Files
Index contains the link information
Index structure is most important
Stores index values and related record addresses
Records may be stored using any organization
Actual records may do away with storage of key values.
Inverted Files
E# Index
100: A
101: D
110: E
200: B
220: C
340: F
Occupation Index
Programmer: A, B, D
Analyst: C, E
Gender Index
Female: B, C, D
Male: A, E
Inverted Files
Searching becomes efficient as the addresses associated with a key value are available as a list
Combinations of conditions can be evaluated using simple list operations like union, intersection, subtraction etc.
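Using the occupation and gender indexes from the illustration, combined conditions reduce to set operations on the address lists. This is a sketch in which Python sets stand in for the lists and the record addresses A to E are kept as strings.

```python
# Inverted indexes from the illustration: key value -> record addresses.
occupation_index = {
    "Programmer": {"A", "B", "D"},
    "Analyst": {"C", "E"},
}
gender_index = {
    "Female": {"B", "C", "D"},
    "Male": {"A", "E"},
}

# Occupation = Programmer AND Gender = Female: intersection.
female_programmers = occupation_index["Programmer"] & gender_index["Female"]

# Occupation = Analyst OR Gender = Male: union.
analysts_or_males = occupation_index["Analyst"] | gender_index["Male"]

# Occupation = Programmer AND NOT Gender = Female: subtraction.
programmers_not_female = occupation_index["Programmer"] - gender_index["Female"]
```

None of these queries touches a data record; the addresses are fetched only after the final list has been computed.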
Cellular Partitions
Storage media is divided into cells
A cell could be
A disk pack; or
A cylinder
Lists for a given key value are divided into sub-lists such that each sub-list occupies a single cell
The index entries now contain the starting address of each sub-list and the number of records in that sub-list
Cellular Partition
In case a cell is a cylinder, all the records placed in one cell can be accessed without moving the read/write head
In case a cell is a disk pack, several cells can be searched in parallel
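Splitting one key value's address list into per-cell sub-lists can be sketched as follows. The addressing scheme is hypothetical: record addresses are plain integers and the cell of an address is `address // cell_size`. Each index entry holds the cell number, the starting address of the sub-list, and the number of records in it.

```python
from itertools import groupby


def partition_by_cell(addresses, cell_size):
    """Return one (cell, start_address, record_count) entry per occupied cell."""
    entries = []
    ordered = sorted(addresses)
    for cell, group in groupby(ordered, key=lambda address: address // cell_size):
        sub_list = list(group)
        entries.append((cell, sub_list[0], len(sub_list)))
    return entries


# Addresses of all records sharing one key value, with 100 addresses per cell.
index_entries = partition_by_cell([5, 130, 12, 101, 260], 100)
```

A search can then handle each entry separately, for example reading one cylinder without head movement or dispatching one disk pack per worker.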
What we Studied
Hashing
Indexing Techniques
File Organization
Review Questions
1. What are the criteria behind the design of a hash function?
2. What are the various ways to store graphs in memory?
3. Discuss the applications of hash tables. Write a short note on symbol tables.
4. Compare sequential and random file organization.
5. What are the advantages of using inverted files?
6. Would you use quadratic probing for resolving collisions in hashed index files? State reasons.
7. Write a short note on the structure of a direct file.
8. Compare sequential files, indexed sequential files, and random access files.
9. Write a short note on open address hashing and separate chaining.
10. Discuss random file organization and the various techniques used for randomization.
11. Explain various techniques for overflow/collision resolution in case of hashing.