A Knowledge Sharing Session on

Post on 02-Jan-2016


A Knowledge Sharing Session on

Unit IV: Tables (DSPS)


Syllabus:

Symbol Tables: Static and dynamic tree tables, AVL trees, AVL Tree Implementation, Algorithms and analysis of AVL Tree

Hash Tables: Basic Concepts, Hash Function, Hashing methods, Collision resolution, Bucket hashing, Dynamic Hashing.

Tables |Unit IV of DSPS (SE-Comp)


Part I: Symbol Tables

Static and dynamic tree tables, AVL trees, AVL Tree Implementation, Algorithms and analysis of AVL Tree.

Part II: Hash Tables

Basic Concepts, Hash Function, Hashing methods, Collision resolution, Bucket hashing, Dynamic Hashing.


Symbol Table Examples

AVL Tree

AVL Implementation

AVL Algorithm Analysis

Symbol Table | Why Symbol Table

What Does a Compiler Do?

• Lexical analysis: detects inputs with illegal tokens (e.g. main$ ();)

• Parsing: detects inputs with ill-formed parse trees (e.g. missing semicolons)

• Semantic analysis: the last “front end” phase; catches all remaining errors

Symbol Table


Symbol Table | Why Symbol Table

Typical Semantic Errors

• multiple declarations: a variable should be declared (in the same region) at most once.

• undeclared variable: a variable should not be used before being declared.

• type mismatch: type of the left-hand side of an assignment should match the type of the right-hand side.

• wrong arguments: methods should be called with the right number and types of arguments.


Symbol Table | Aim of Symbol Table

Purpose of Symbol Table

– keep track of names declared in the program

– names of variables, classes, fields, methods, ...


Symbol Table | Symbol Table Stores

What it Contains

associates a name with a set of attributes, e.g.:

• kind of name (variable, class, field, method, etc)

• type (int, float, etc)

• nesting level

• memory location (i.e., where will it be found at runtime).
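The attribute list above can be pictured as one record per name. The following is only a sketch: the type and field names (`sym_entry`, `make_entry`, `offset`) and the buffer sizes are assumptions made for illustration, not part of the slides.

```c
#include <string.h>

/* Kind of name, following the list above. */
enum sym_kind { SYM_VARIABLE, SYM_CLASS, SYM_FIELD, SYM_METHOD };

/* One symbol table entry: a name plus its attributes.
   Field names and sizes are assumptions made for this sketch. */
struct sym_entry {
    char          name[32];      /* the identifier itself */
    enum sym_kind kind;          /* variable, class, field, method */
    char          type[16];      /* "int", "float", ... */
    int           nesting_level; /* 0 = outermost scope */
    int           offset;        /* memory location, e.g. an offset in a frame */
};

/* Fill in an entry; strings longer than the buffers are truncated. */
struct sym_entry make_entry(const char *name, enum sym_kind kind,
                            const char *type, int level, int offset)
{
    struct sym_entry e;
    memset(&e, 0, sizeof e);                    /* also zero-terminates both strings */
    strncpy(e.name, name, sizeof e.name - 1);
    strncpy(e.type, type, sizeof e.type - 1);
    e.kind = kind;
    e.nesting_level = level;
    e.offset = offset;
    return e;
}
```

For example, `make_entry("count", SYM_VARIABLE, "int", 1, 8)` would describe an int variable declared one level deep and found at offset 8.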


Symbol Table | Symbol Table Revisit

In short:

• During lexical analysis: symbols are found and added to the symbol table

• During syntactic analysis: information about each symbol is filled in

• During semantic analysis: the symbol table is used for type checking


Symbol Table | Symbol Table Important?

Info provided by the symbol table:

• Given an identifier, which name is it?

• What information is to be associated with a name? (Actual characters of the name, type, storage allocation info (number of bytes), line number where declared, lines where referenced, scope.)

• How do we access this information?

• How do we associate this information with a name?


Symbol Table | Reminder on Symbol Table

Note:

• A name can represent a:
– Variable
– Type
– Constant
– Parameter
– Record
– Record field
– Procedure
– Array
– Label
– File


Symbol Table

Operations on Symbol Table

determining whether a string has already been stored

inserting an entry for a string

deleting a string when it goes out of scope

This requires three functions:

1. lookup(s): returns the index of the entry for string s, or 0 if there is no entry

2. insert(s): adds a new entry for string s and returns its index

3. delete(s): deletes s from the table (or, typically, hides it)
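The three functions can be sketched over a plain unordered array. This is one possible reading, not the slides' implementation: the names `st_lookup`, `st_insert`, `st_delete`, the capacity `ST_MAX` and the `hidden` flag are assumptions. Slot 0 is left unused so that 0 can mean "no entry", and delete hides an entry rather than removing it.

```c
#include <string.h>

#define ST_MAX 100               /* table capacity (assumed) */

struct st_entry {
    char name[32];
    int  hidden;                 /* 1 once the name has gone out of scope */
};

/* Slot 0 is unused so that 0 can mean "no entry", as in lookup(s) above. */
static struct st_entry st_table[ST_MAX + 1];
static int st_count = 0;

/* Return the index of the most recent visible entry for s, or 0. */
int st_lookup(const char *s)
{
    for (int i = st_count; i >= 1; i--)
        if (!st_table[i].hidden && strcmp(st_table[i].name, s) == 0)
            return i;
    return 0;
}

/* Add a new entry for s and return its index, or 0 if the table is full. */
int st_insert(const char *s)
{
    if (st_count == ST_MAX)
        return 0;
    st_count++;
    strncpy(st_table[st_count].name, s, sizeof st_table[st_count].name - 1);
    st_table[st_count].name[sizeof st_table[st_count].name - 1] = '\0';
    st_table[st_count].hidden = 0;
    return st_count;
}

/* Hide the most recent visible entry for s; return 1 on success, 0 otherwise. */
int st_delete(const char *s)
{
    int i = st_lookup(s);
    if (i == 0)
        return 0;
    st_table[i].hidden = 1;
    return 1;
}
```

Searching from the newest entry backwards means an inner declaration of a name shadows an outer one until it is hidden again.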


Symbol Table |Operations on Symbol Table

Symbol Table

Example

01 PROGRAM Main
02 GLOBAL a,b
03 PROCEDURE P (PARAMETER x)
04 LOCAL a
05 BEGIN {P}
06 …a…
07 …b…
08 …x…
09 END {P}
10 BEGIN {Main}
11 Call P(a)
12 END {Main}


Symbol Table | Symbol Table Examples

Symbol Table Unsorted List


Name | Characteristic Class | Scope | Other Attributes: Declared | Referenced | Other
Main | Program | 0 | Line 1 | |
a | Variable | 0 | Line 2 | Line 11 |
b | Variable | 0 | Line 2 | Line 7 |
P | Procedure | 0 | Line 3 | Line 11 | 1, parameter, x
x | Parameter | 1 | Line 3 | Line 8 |
a | Variable | 1 | Line 4 | Line 6 |

Lookup complexity: O(n)


Symbol Table Sorted List


Lookup complexity: O(log n)

Name | Characteristic Class | Scope | Other Attributes: Declared | Referenced | Other
a | Variable | 0 | Line 2 | Line 11 |
a | Variable | 1 | Line 4 | Line 6 |
b | Variable | 0 | Line 2 | Line 7 |
Main | Program | 0 | Line 1 | |
P | Procedure | 0 | Line 3 | Line 11 | 1, parameter, x
x | Parameter | 1 | Line 3 | Line 8 |

Worst case: O(n)
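The O(log n) lookup on a sorted list is a binary search. A minimal sketch follows; the function name `sorted_lookup` is hypothetical, and the array is assumed to be sorted in `strcmp` order, which puts upper-case names before lower-case ones (unlike the case-insensitive order shown in the table).

```c
#include <string.h>

/* Binary search over an array of names sorted in strcmp order.
   Returns the index of key, or -1 if it is absent. */
int sorted_lookup(const char *names[], int n, const char *key)
{
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int cmp = strcmp(key, names[mid]);
        if (cmp == 0)
            return mid;          /* found */
        if (cmp < 0)
            hi = mid - 1;        /* key sorts before names[mid] */
        else
            lo = mid + 1;        /* key sorts after names[mid] */
    }
    return -1;
}
```

Each step halves the remaining range, which is where the logarithmic bound comes from. Keeping the array sorted is the expensive part, since an insertion may have to shift many entries.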


Two issues:

1. Interface: how to use symbol tables

2. Implementation: how to implement it.


Basic Implementation Techniques

Considerations:

Number of names

Storage space

Retrieval time


<1> Unordered list (linked list/array)

<2> Ordered list
» binary search on arrays
» expensive insertion
(+) good for a fixed set of names (e.g. reserved words, assembly opcodes)

<3> Binary search tree
» On average, searching takes O(log(n)) time.
» However, names in programs are not chosen randomly.

<4> AVL

<5> Hash table: most common
(+) constant time

Static Tree Table

If symbols are known in advance:
• No insertion and deletion allowed
• Cost of searching symbols of higher frequency should be small.
• Huffman tree and OBST

Fig: Optimal search tree when the frequencies of symbols are specified (nodes: if, do, Read, while)

Fig: Huffman tree (edges labeled 0 and 1; leaves a, b, c, d, e)

Dynamic Tree Tables

Symbols are inserted as and when they come

Deletion is also possible

AVL

Fig: BST with keys 50, 32, 60, 20, 45, 55, 68
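A dynamic tree table grows as symbols arrive. The sketch below shows plain binary search tree insertion and search with integer keys; it is not an AVL implementation, since the rotations that keep an AVL tree balanced are not shown. All names (`bst_node`, `bst_insert`, `bst_contains`, `bst_height`) are assumptions for illustration.

```c
#include <stdlib.h>

struct bst_node {
    int key;
    struct bst_node *left, *right;
};

/* Insert key into the tree rooted at root and return the (possibly new) root.
   Duplicate keys are ignored. If malloc fails the tree is left unchanged. */
struct bst_node *bst_insert(struct bst_node *root, int key)
{
    if (root == NULL) {
        struct bst_node *n = malloc(sizeof *n);
        if (n != NULL) {
            n->key = key;
            n->left = n->right = NULL;
        }
        return n;
    }
    if (key < root->key)
        root->left = bst_insert(root->left, key);
    else if (key > root->key)
        root->right = bst_insert(root->right, key);
    return root;
}

/* Return 1 if key is present, 0 otherwise. */
int bst_contains(const struct bst_node *root, int key)
{
    while (root != NULL) {
        if (key == root->key)
            return 1;
        root = (key < root->key) ? root->left : root->right;
    }
    return 0;
}

/* Number of nodes on the longest root-to-leaf path (0 for an empty tree). */
int bst_height(const struct bst_node *root)
{
    if (root == NULL)
        return 0;
    int l = bst_height(root->left), r = bst_height(root->right);
    return 1 + (l > r ? l : r);
}
```

Inserting 50, 32, 60, 20, 45, 55, 68 in that order gives a tree of height 3, while inserting the same keys in sorted order gives a chain of height 7. That degradation is the reason for using a balanced tree such as AVL.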

Part I: Symbol Tables

Static and dynamic tree tables, AVL trees, AVL Tree Implementation, Algorithms and analysis of AVL Tree

Part II: Hash Tables

Basic Concepts, Hash Function, Hashing methods, Collision resolution, Bucket hashing, Dynamic Hashing.


Where Will Hashing Be Used?

1. docDict
2. Database
3. Compilers
4. Network routers and servers
5. Substring search
6. Cryptography

Hash Table| Motivation


Motivation

Hashing Methods

Collision Resolution

Symbol Table | Why Hash Table

A Problem?

• We have to store some records and perform the following:
– add a new record
– delete a record
– search for a record by key

Find a way to do these efficiently!

Hashing


Use an array to store the records, in unsorted order

1. add - add the record as the last entry: fast, O(1)

2. delete a target - slow at finding the target, fast at filling the hole (just take the last entry): O(n)

3. search - sequential search: slow, O(n)
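The three operations on an unsorted array can be sketched as follows, with integer keys standing in for whole records. The names `rec_add`, `rec_search`, `rec_delete` and the capacity `REC_MAX` are assumptions.

```c
#define REC_MAX 1000             /* capacity (assumed) */

static int rec_keys[REC_MAX];
static int rec_n = 0;

/* add: append as the last entry, O(1). Returns 0 if the array is full. */
int rec_add(int key)
{
    if (rec_n == REC_MAX)
        return 0;
    rec_keys[rec_n++] = key;
    return 1;
}

/* search: sequential scan, O(n). Returns the index or -1. */
int rec_search(int key)
{
    for (int i = 0; i < rec_n; i++)
        if (rec_keys[i] == key)
            return i;
    return -1;
}

/* delete: O(n) to find the target, O(1) to fill the hole with the last entry. */
int rec_delete(int key)
{
    int i = rec_search(key);
    if (i < 0)
        return 0;
    rec_keys[i] = rec_keys[--rec_n];
    return 1;
}
```

Filling the hole with the last entry is only valid because the order of the records does not matter here.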

Hash Table| Unsorted Array


Use an array to store the records, keeping them in sorted order

1. add - insert the record in its proper position; much record movement: slow, O(n)

2. delete a target - how to handle the hole after deletion? Much record movement: slow, O(n)

3. search - binary search: fast, O(log n)

Hash Table| Sorted Array


Store the records in a linked list (unsorted)

1. add - fast if one can insert a node anywhere: O(1)

2. delete a target - fast at disposing of the node, but slow at finding the target: O(n)

3. search - sequential search: slow, O(n) (if we only use a linked list, we cannot use binary search even if the list is sorted)

Hash Table| Linked List


What is the solution then?

These approaches have better performance but are more complex:

1. Hash table

2. Tree (BST, Heap, …)

Hash Table| More Approaches


Array as table?

Hash Table| More Approaches


studid | name | score
9903030 | tushar | 73
9802020 | manali | 100
9801010 | peter | 20
0056789 | david | 56.8
0012345 | sandy | 81.5
0033333 | bubli | 90
... | ... | ...
9908080 | Namrata | 49

Hash Table| Array as table?


Fig: A huge array A[0..9999999] with columns name and score, indexed by student id: A[12345] = (andy, 81.5), A[33333] = (betty, 90), A[56789] = (david, 56.8), A[9908080] = (bill, 49)

Hash Table| Whats Wrong Then?


Consider this problem. We want to store 1,000 student records and search them by student id.

One ‘stupid’ way is to store the records in a huge array (index 0..9999999). The student id is used as the index, i.e. the record of the student with studid 0012345 is stored at A[12345].

1. Keys may not be nonnegative integers.

2. Gigantic memory hog: 10,000,000 slots for only 1,000 records.

Hash Table| What's Wrong Then?


1. Keys may not be nonnegative integers.
Solution: Prehash (map each key to a nonnegative integer)

2. Gigantic memory hog
Solution: Direct Hash Table (reduce universe of all keys to reasonable size)
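The two fixes can be sketched together for string keys. This is an illustration only: the function names `prehash` and `reduce` and the multiplier 31 are assumptions, not taken from the slides.

```c
/* Prehash: map a string key to a nonnegative integer.
   The multiplier 31 is a common choice, used here only as an assumption. */
unsigned long prehash(const char *s)
{
    unsigned long h = 0;
    while (*s != '\0') {
        h = h * 31 + (unsigned char)*s;   /* unsigned arithmetic wraps, never negative */
        s++;
    }
    return h;
}

/* Reduce the prehashed value to a table index in 0..m-1 (m > 0). */
unsigned long reduce(unsigned long prehashed, unsigned long m)
{
    return prehashed % m;
}
```

For example, `prehash("ab")` is 97 * 31 + 98 = 3105, and `reduce(3105, 1000)` gives index 105 in a table of 1,000 slots.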

Hash Table| What's Wrong Then?


• Each slot, or position, corresponds to a key in U.

• If there’s an element x with key k, then T [k] contains a pointer to x.

• Otherwise, T [k] is empty, represented by NIL.

Hash Table| Direct Hashing Table


Store the records in a huge array where the index corresponds to the key
• add - very fast O(1)
• delete - very fast O(1)
• search - very fast O(1)
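A direct table needs no searching at all, which is why every operation is O(1). In the sketch below the names (`record`, `da_insert`, `da_delete`, `da_search`) are hypothetical, and the universe is assumed to be 0..99999 rather than the 0..9999999 of the student id example, just to keep the array small. Keys are assumed to lie inside the universe; no range check is made.

```c
#include <stddef.h>

#define U_SIZE 100000            /* size of the key universe (assumed, kept small) */

struct record {
    int    key;                  /* student id, must be in 0..U_SIZE-1 */
    double score;
};

/* T[k] points to the record with key k, or is NULL (NIL) if there is none. */
static struct record *T[U_SIZE];

void da_insert(struct record *x) { T[x->key] = x; }   /* O(1) */
void da_delete(int k)            { T[k] = NULL; }     /* O(1) */
struct record *da_search(int k)  { return T[k]; }     /* O(1) */
```

The cost is memory: one slot per possible key, whether or not a record with that key exists.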

Hash Table| Direct Hashing Table


Hash Table| Hash function


function Hash(key: KeyType): integer;

Imagine that we have such a magic function Hash. It maps the keys (studid) of the 1000 records into the integers 0..999, one to one. No two different keys map to the same number.

H(‘0012345’) = 134
H(‘0033333’) = 67
H(‘0056789’) = 764
…
H(‘9908080’) = 3

Hash Table| Hash Table


Fig: Hash table with slots 0..999 and columns name and score: slot 3 holds 9908080 (bill, 49), slot 67 holds 0033333 (betty, 90), slot 134 holds 0012345 (andy, 81.5), slot 764 holds 0056789 (david, 56.8)

To store a record, we compute Hash(stud_id) for the record and store it at the location Hash(stud_id) of the array. To search for a student, we only need to peek at the location Hash(target stud_id).

Ex: key mod size, e.g. 2201 mod 1000 = 201

Hash Table| Division Method


h(k) = k mod m

Collision: different keys map to the same index, i.e. h(k1) = h(k2) = i with k1 != k2

Ex: 5 mod 11 and 27 mod 11 both give index 5.
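The division method and the collision condition fit in a few lines. The function names `hash_div` and `collides` are assumptions made for this sketch, and keys are assumed to be nonnegative.

```c
/* Division method: h(k) = k mod m, for k >= 0 and m > 0. */
int hash_div(int k, int m)
{
    return k % m;
}

/* Two different keys collide when they map to the same index. */
int collides(int k1, int k2, int m)
{
    return k1 != k2 && hash_div(k1, m) == hash_div(k2, m);
}
```

With these, `hash_div(2201, 1000)` is 201, and `collides(5, 27, 11)` is true because both keys map to index 5.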

Hash Table| Collision


Hashing

• Widely useful technique for implementing dictionaries
• Constant time per operation (on the average)
• Best case O(1)
• Worst case O(n)

Fig: The key of a record is passed through f() to give an address (slots 0 to 5)

Characteristics of a Hash Function

• Quick computation
• It should spread keys evenly: uniform distribution
• Avoid collisions (a fully collision-free function is possible only in very rare cases; e.g. the birthday paradox)
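The birthday paradox can be checked numerically. The sketch below assumes that every key is hashed uniformly at random and independently into one of m slots; the function name `collision_probability` is hypothetical.

```c
/* Probability that n keys, each hashed uniformly at random into m slots,
   produce at least one collision:
   1 - (m/m) * ((m-1)/m) * ... * ((m-n+1)/m). */
double collision_probability(int n, int m)
{
    double all_distinct = 1.0;
    if (n > m)
        return 1.0;              /* pigeonhole: a collision is certain */
    for (int i = 0; i < n; i++)
        all_distinct *= (double)(m - i) / m;
    return 1.0 - all_distinct;
}
```

With m = 365 slots, 23 keys are already enough to make a collision more likely than not, even though the table is less than 7% full.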

Hash Functions

• Direct hashing
• Digit extraction
• Modulo-division method
• Mid-square method
• Folding method
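Three of the listed methods can be sketched as follows. The details are assumptions chosen for illustration: which digits are extracted, how many middle digits are kept, and the group width used for folding all vary between textbooks, and the function names are hypothetical.

```c
/* Digit extraction: keep the 1st, 3rd and 5th digits of a six-digit key. */
int hash_digit_extract(long key)
{
    int d1 = (int)(key / 100000 % 10);
    int d3 = (int)(key / 1000 % 10);
    int d5 = (int)(key / 10 % 10);
    return d1 * 100 + d3 * 10 + d5;
}

/* Mid-square: square a key of up to four digits and keep the middle
   four digits of the (up to eight-digit) square. */
int hash_mid_square(long key)
{
    long sq = key * key;
    return (int)(sq / 100 % 10000);
}

/* Folding (fold shift): split the key into three-digit groups from the
   right, add the groups, and reduce the sum modulo the table size m. */
int hash_fold(long key, int m)
{
    long sum = 0;
    while (key > 0) {
        sum += key % 1000;
        key /= 1000;
    }
    return (int)(sum % m);
}
```

For example, 379452 gives 395 by digit extraction, 9452 squared is 89340304 so mid-square gives 3403, and folding 123456789 gives 123 + 456 + 789 = 1368, which is 368 modulo 1000.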


1. Hashing with separate chaining (open hashing): unlimited space

2. Hashing with open addressing (closed hashing)

Hash Table|-Collision Resolution DS


Hash Table|-Collision Resolution Strategies


• Separate chaining
• Open addressing
  – Linear probing (LP)
    » LP with chaining
      · LP with chaining, without replacement
      · LP with chaining, with replacement
    » LP without chaining
  – Quadratic probing
  – Double hashing

Hash Table| Chained Hash Table


Fig: Chained hash table: an array with entries up to HASHMAX, each pointing to a linked list of records or to nil; example record: key 9903030, name tom, score 73

One way to handle collision is to store the collided records in a linked list. The array now stores pointers to such lists. If no key maps to a certain hash value, that array entry points to nil.

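Separate chaining as described can be sketched with integer keys and the division method. The names `chain_node`, `chain_insert`, `chain_search` and `chain_length` are hypothetical, and taking HASHMAX as the number of chains (11 here) is an assumption.

```c
#include <stdlib.h>

#define HASHMAX 11               /* number of chains (assumed) */

struct chain_node {
    int key;
    struct chain_node *next;
};

/* Each array entry points to a list of colliding records, or is NULL (nil). */
static struct chain_node *chains[HASHMAX];

/* Insert key at the head of its chain. Returns 0 if memory runs out. */
int chain_insert(int key)
{
    struct chain_node *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    int i = key % HASHMAX;       /* keys are assumed nonnegative */
    n->key = key;
    n->next = chains[i];
    chains[i] = n;
    return 1;
}

/* Walk the chain for key; return the node or NULL if absent. */
struct chain_node *chain_search(int key)
{
    for (struct chain_node *p = chains[key % HASHMAX]; p != NULL; p = p->next)
        if (p->key == key)
            return p;
    return NULL;
}

/* Number of nodes in chain i. */
int chain_length(int i)
{
    int len = 0;
    for (struct chain_node *p = chains[i]; p != NULL; p = p->next)
        len++;
    return len;
}
```

Keys 5, 27 and 16 all map to index 5 and simply share one list, so the table never overflows; the price is a sequential walk along the chain during search.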

Rehashing is required:
• When the table is completely full
• With quadratic probing, when the table is half full
• When an insertion fails due to overflow

After rehashing:
• The size is doubled
• The mod value is changed to the new size

* Very costly: a new table must be created, and every entry of the old table must be inserted again using the new hash function.

Hash Table| Rehashing


Rehashing is more efficient when the load factor is >= 70%,

where the load factor l = h / t,

h is the total number of mapped (occupied) locations, and

t is the total number of locations.
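Rehashing driven by the load factor can be sketched for a linear probing table. Everything named here (`lp_table`, `load_factor`, `rehash`, `lp_insert`, `lp_init`) is hypothetical, and triggering the rehash when the load factor reaches 70% is one reading of the rule above, not a fixed standard.

```c
#include <stdlib.h>

struct lp_table {
    int  *keys;                  /* stored keys */
    char *used;                  /* used[j] is 1 if slot j is occupied */
    int   size;                  /* t: total locations */
    int   count;                 /* h: occupied locations */
};

/* Load factor l = h / t. */
double load_factor(const struct lp_table *t)
{
    return (double)t->count / t->size;
}

/* Place key by linear probing; the caller guarantees a free slot exists. */
static void place(struct lp_table *t, int key)
{
    int j = key % t->size;       /* keys are assumed nonnegative */
    while (t->used[j])
        j = (j + 1) % t->size;
    t->keys[j] = key;
    t->used[j] = 1;
    t->count++;
}

/* Rehash: build a table of double the size and reinsert every key
   using the new modulus. Returns 0 if memory runs out. */
int rehash(struct lp_table *t)
{
    struct lp_table bigger = { NULL, NULL, t->size * 2, 0 };
    bigger.keys = malloc(bigger.size * sizeof *bigger.keys);
    bigger.used = calloc(bigger.size, 1);
    if (bigger.keys == NULL || bigger.used == NULL) {
        free(bigger.keys);
        free(bigger.used);
        return 0;
    }
    for (int j = 0; j < t->size; j++)
        if (t->used[j])
            place(&bigger, t->keys[j]);
    free(t->keys);
    free(t->used);
    *t = bigger;
    return 1;
}

/* Insert, rehashing first when the load factor has reached 70%. */
int lp_insert(struct lp_table *t, int key)
{
    if (load_factor(t) >= 0.7 && !rehash(t))
        return 0;
    place(t, key);
    return 1;
}

/* Create an empty table with the given number of slots (size > 0). */
int lp_init(struct lp_table *t, int size)
{
    t->keys = malloc(size * sizeof *t->keys);
    t->used = calloc(size, 1);
    t->size = size;
    t->count = 0;
    return t->keys != NULL && t->used != NULL;
}
```

Starting from 10 slots, the first seven insertions leave the size unchanged; the eighth finds l = 7/10, doubles the table to 20 slots and reinserts all keys, which shows why rehashing is described as very costly.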

Hash Table| Rehashing


Types of linear probing (with chaining, with and without replacement)

Note: Try to solve all the examples taken in class on transparencies and on the board … you can also take them from the book …


Extendible Hashing

• All techniques so far are used for small data
• When data becomes bulky there will be too many disk accesses
• So in that case use extendible hashing
• This uses binary (disk) coding to map the locations to binary values.
– A hash table of size 4 has 4 slots:
– 00
– 01
– 10
– 11
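The mapping of locations to binary values can be sketched as a directory index built from a few bits of the hash value. Using the low-order bits is an assumption of this sketch (some descriptions use the leading bits instead), the names `dir_index` and `dir_index_bits` are hypothetical, and bucket splitting and directory doubling are not shown.

```c
/* Directory index: the low-order `depth` bits of the hash value.
   With depth = 2 the four slots are 00, 01, 10 and 11.
   Assumes 0 < depth < number of bits in unsigned. */
unsigned dir_index(unsigned hash, unsigned depth)
{
    return hash & ((1u << depth) - 1u);
}

/* Write the index as a string of `depth` binary digits into out,
   which must hold at least depth + 1 characters. */
void dir_index_bits(unsigned hash, unsigned depth, char *out)
{
    unsigned idx = dir_index(hash, depth);
    for (unsigned i = 0; i < depth; i++)
        out[i] = ((idx >> (depth - 1 - i)) & 1u) ? '1' : '0';
    out[depth] = '\0';
}
```

With depth 2, hash value 13 (binary 1101) lands in slot 01; raising the depth to 3 uses one more bit and sends it to slot 101.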

Implementation:

• The following are some examples of how to create the structure and apply a hash function to it…

1. Linear probing with store and search
2. Double hashing
3. Quadratic probing


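Linear Probe

/* Store key in the first free slot, starting at key % MAX.
   T[j] is 1 if slot j is occupied and 0 if it is free.
   MAX is the table size, assumed to be defined elsewhere; key must be >= 0. */
int store_LP(int hashtable[], int key, int T[])
{
    int i, j;

    j = key % MAX;                 /* mapped location */
    for (i = 0; i < MAX; i++) {
        if (T[j] == 0) {           /* found an empty slot */
            hashtable[j] = key;
            T[j] = 1;
            return j;
        }
        j = (j + 1) % MAX;         /* next location, in a circular way */
    }
    return -1;                     /* table is full */
}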


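Search in LP

Only the condition checked inside the loop changes:

for (i = 0; i < MAX; i++) {
    if (T[j] == 1 && hashtable[j] == key)   /* occupied slot holding the key */
        return j;
    j = (j + 1) % MAX;                      /* next location, in a circular way */
}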


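Double hashing

/* Store key using double hashing. f1 and f2 are the two hash functions and
   MAX is the table size, all assumed to be defined elsewhere.
   f2(key) must not be a multiple of MAX, or every probe hits the same slot. */
int store_DH(int hashtable[], int key, int T[])
{
    int i, j, start, u;

    start = f1(key) % MAX;         /* first mapped location */
    u = f2(key);                   /* u is used as the increment */
    for (i = 0; i < MAX; i++) {
        j = (start + i * u) % MAX;
        if (T[j] == 0) {           /* found an empty slot */
            hashtable[j] = key;
            T[j] = 1;
            return j;
        }
    }
    return -1;                     /* no free slot found */
}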

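Quadratic probing

/* Store key using quadratic probing. MAX is the table size, assumed to be
   defined elsewhere; key must be >= 0. */
int store_QP(int hashtable[], int key, int T[])
{
    int i, j, start;

    start = key % MAX;             /* first mapped location */
    for (i = 0; i < MAX; i++) {
        j = (start + i * i) % MAX;
        if (T[j] == 0) {           /* found an empty slot */
            hashtable[j] = key;
            T[j] = 1;
            return j;
        }
    }
    return -1;                     /* no free slot found on the probe sequence */
}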
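The three strategies differ only in which slot they try on the i-th attempt. The sketch below isolates that probe sequence. The table size of 11, the function names and in particular the second hash `step_hash` are assumptions made for illustration; the slides do not define the second hash function.

```c
#define TABLE_SIZE 11            /* assumed table size (a prime) */

/* i-th slot tried for key under each strategy, i = 0, 1, 2, ...
   Keys are assumed nonnegative. */
int probe_linear(int key, int i)
{
    return (key % TABLE_SIZE + i) % TABLE_SIZE;
}

int probe_quadratic(int key, int i)
{
    return (key % TABLE_SIZE + i * i) % TABLE_SIZE;
}

/* Second hash for double hashing: a step in 1..TABLE_SIZE-1, never 0.
   This particular formula is an assumption of the sketch. */
int step_hash(int key)
{
    return 1 + key % (TABLE_SIZE - 1);
}

int probe_double(int key, int i)
{
    return (key % TABLE_SIZE + i * step_hash(key)) % TABLE_SIZE;
}
```

For key 27 the sequences start 5, 6, 7 (linear), 5, 6, 9, 3 (quadratic) and 5, 2, 10 (double hashing). Keys 5 and 27 collide at slot 5 under all three, but with double hashing their second probes differ (0 and 2), which is what breaks up clusters.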