VISVESVARAYA TECHNOLOGICAL UNIVERSITY
Machche, Belgaum
2010 - 2011
A project report on
“Extendible Hashing”
Submitted in partial fulfillment of the requirements for the award of degree
Bachelor of Engineering
In
Information Science and Engineering
By
Shiva Shankar B.N
1RV08IS048
Under the guidance of
Kavitha S.N
Professor,
Dept. Of ISE, RV College Of Engineering
Nagraj G Cholli
Professor,
Dept. Of ISE, RV College Of Engineering
Department of Information Science and Engineering,
R. V. College of Engineering, (An Autonomous Institution under VTU, Accredited by NBA)
Bangalore – 560059
VISVESVARAYA TECHNOLOGICAL UNIVERSITY
Department of Information Science and Engineering
R.V. College of Engineering,(An Autonomous Institution under VTU, Accredited by NBA)
Bangalore – 560059
CERTIFICATE
This is to certify that the FS project entitled
“Extendible Hashing”
has been successfully carried out by Shiva Shankar B.N, bearing USN: 1RV08IS048, in
partial fulfillment of the requirements of the File Structures Lab (07IS63) for the
award of the degree of Bachelor of Engineering in Information Science and
Engineering, during the academic year 2010-2011. It is certified that all
corrections/suggestions indicated for Internal Assessment have been incorporated in the report
deposited in the departmental library. The project has been approved as it satisfies the academic
requirements in respect of project work prescribed for the File Structures Lab.
Kavitha S.N
Professor, Dept. Of ISE, RV College Of Engineering
Nagraj V Cholli
Professor, Dept. Of ISE, RV College Of Engineering
Dr. Ramakanth Kumar P
Prof & HOD, Dept. Of ISE, RV College Of Engineering
Principal
RVCE
Name of the Examiners Signature with Date
1.____________________ __________________
2.____________________ __________________
DECLARATION
I, Shiva Shankar B.N, student of sixth semester B.E. in Information Science and Engineering,
R.V. College of Engineering, Bangalore declare that the project entitled “Extendible
Hashing” has been carried out by me and submitted in partial fulfillment of the File Structures
course requirements for the award of degree in Bachelor of Engineering in Information
Science and Engineering of Visvesvaraya Technological University, Belgaum during the
academic year 2010-2011. The matter embodied in this report has not been submitted to any
other university or institution for the award of any other degree or diploma.
Shiva Shankar B.N
USN: 1RV08IS048
B.E. in Information Science and Engineering
R.V. College of Engineering
Bangalore-560 059
ACKNOWLEDGMENT
My project was the result of the encouragement of many people who helped in shaping it and
provided feedback, direction and valuable support. It is with hearty gratitude that I acknowledge their
contributions to my project.
I would like to thank my internal guide Kavitha S.N, Professor, Department of Information Science &
Engineering, RVCE, for her guidance and suggestions on areas of improvement and on the
implementation of the project.
I would like to thank my lab-in-charge Nagraj G Cholli, Professor, Department of Information Science
& Engineering, RVCE, for the constant help and support extended towards me during the course
of the project.
I am also grateful to the Professor and HOD, Dr. Ramakanth Kumar P, Department of
Information Science & Engineering, RVCE, for permitting me to take up this project and for his
encouragement. I thank our Principal, RVCE, who has always been a great source of inspiration.
I thank RSST for the infrastructure and facilities provided, which helped in the successful completion of the
project.
Last, but not the least, I would like to thank my family and friends, who provided me with
valuable suggestions to improve my project.
Shiva Shankar B.N
1RV08IS048
ABSTRACT
With the evolution of technology, the amount of data obtained through various transactions has
grown larger than ever. This enormous volume of data has great potential to guide an
organization in its future endeavors. However, preserving this huge amount of data is justified
only if the organization can extract the required information from the stored data
when required. In other words, all the transaction data that
is stored serves its purpose only if it can be accessed with at least an acceptable level of
performance.
Given the importance of accessing the required data efficiently, various techniques have been
developed over time. Some of these use the concepts of indexing, B-trees, B+ trees,
hashing, etc. B-trees are considered efficient; however, their shortcoming is that
performance depends on the depth of the tree, O(log_k N). Hashing overcomes this
drawback and offers access of the order of 1, i.e., O(1). Thus
hashing is one of the most efficient ways of accessing static data. However, most data is
dynamic, i.e., it is modified very frequently, and static hashing cannot handle
such varying data. This necessity led to another form of hashing called Extendible Hashing.
The project demonstrates an implementation of Extendible Hashing for accessing a series of
student records. The code is written in C++ using Object Oriented Programming
and thus shows the applicability of Object Oriented Programming in
implementing complex programs.
The project provides features such as insertion, deletion, search, update and display of student
records stored in a student database. The insertion of records uses Extendible
Hashing to generate the record insertion address. Further, concepts of extendible hashing such
as buckets have been implemented. The project also uses C-based graphics to
present the data in a better form.
LIST OF FIGURES
Fig No.     Name of Figure
Fig. 3.1    System Block Diagram
Fig. 3.2    Level 0 Diagram
Fig. 3.3    Level 1 Diagram
Fig. 3.4    Level 2 Diagram
Fig. 4.1    Structure diagram
Fig. 4.2    IOBuffer class diagram
Fig. 4.3    Student class diagram
Fig. 4.4    FixedLengthBuffer class diagram
Fig. 4.5    DelimFieldBuffer class diagram
Fig. 4.6    TextIndex class diagram
Fig. 4.7    Insertion Flowchart
Fig. 4.8    Deletion Flowchart
Fig. 7.1    Home page
Fig. 7.2    Choice screen
Fig. 7.3    Data Entry-Record Insertion
Fig. 7.4    Record Modification
Fig. 7.5    Record Display
Fig. 7.6    Directory Details
LIST OF TABLES
Table No.     Name of Table
Table 6.1     Unit test case for insertion operation.
Table 6.2     Unit test case for modification operation.
Table 6.3     Unit test case for display operation.
Table 6.4     Unit test case for display all operation.
Table 6.5     Unit test case for directory display.
Table 6.6     Unit test case for deletion operation.
Table 6.7     Unit test case for space utilization operation.
Table 6.8     Unit test case for test failure.
Table 6.9     Unit test case for correction of failure.
Table 6.10    Integrated test case for doubling directory.
Table 6.11    Integrated test case for collapsing directory.
Table 6.12    System test case for hashing.
TABLE OF CONTENTS
Sl. No.  Chapter Name
1.  Introduction
    1.1 Purpose
    1.2 Scope
    1.3 Motivation
    1.4 Literature Survey
2.  Software Requirement Specification
    2.1 Overall Description
    2.2 Specific Requirements
        2.2.1 Functionality
            2.2.1.1 Functionality Requirement 1
            2.2.1.2 Functionality Requirement 2
            2.2.1.3 Functionality Requirement 3
        2.2.2 Performance Requirement
        2.2.3 Design Constraints
        2.2.4 Hardware Requirement
        2.2.5 Software Requirement
        2.2.6 Interface Requirement
            2.2.6.1 User Interfaces
            2.2.6.2 Communication Interfaces
3.  High Level Design
    3.1 Design Considerations
        3.1.1 Assumptions and Dependencies
        3.1.2 General Constraints
    3.2 System Block Diagram
        3.2.1 Solution Architect Diagram
    3.3 Data Flow Diagram
4.  Detailed Design
    4.1 Structure Diagram
    4.2 Class Diagram
    4.3 Flow Charts
5.  Implementation
    5.1 Selection of the platform
    5.2 Selection of the programming language
    5.3 Programming Coding Guidelines
6.  Testing
    6.1 Unit Testing
        6.1.1 Unit Test Case 1
        6.1.2 Unit Test Case 2
        6.1.3 Unit Test Case 3
        6.1.4 Unit Test Case 4
        6.1.5 Unit Test Case 5
        6.1.6 Unit Test Case 6
        6.1.7 Unit Test Case 7
        6.1.8 Unit Test Case 8
        6.1.9 Unit Test Case 9
    6.2 Integration Testing
        6.2.1 Integration Test Case 1
        6.2.2 Integration Test Case 2
    6.3 System Testing
        6.3.1 System Test Case 1
7.  Results
    7.1 Snapshots
    7.2 Advantages of the Project
    7.3 Limitations of the Project
8.  Conclusion
    8.1 Future Enhancement
References
Appendix A List Of Acronyms
Appendix B Coding
Extendible Hashing Introduction
Chapter 1
INTRODUCTION
1.1 Purpose
The purpose of this project is to demonstrate the working of a file structure based on the concept
of Extendible Hashing. This helps us to understand the complexity involved and also the benefits
derived from implementing a project based on extendible hashing.
Many concepts have been developed in the search for better ways of accessing records. Some of
them include B-trees, B+ trees, etc. A working model of the hashing concept lets us compare it
with the other working models and get a clear idea of which concepts are suitable
for which purposes.
The file structure concepts of indexing, B-trees and B+ trees are all very efficient in structuring
and storing the data in a file. However, B-trees have O(log_k N) access, i.e., their performance
can decrease as the amount of data increases. Extendible Hashing offers a better solution
to this problem by providing access of the order of 1.
1.2 Scope
The project is mainly applicable in academic institutions such as schools and colleges, where
student details are stored, accessed and modified. However, the implementation is not
confined to this particular field and can serve a wide variety of fields such as accounting,
medicine, business, etc.
Since the main purpose of the project is to maintain a database using the Extendible Hashing file
structure, the area of influence of this project encompasses all the fields that require a database
for their functioning.
For schools and colleges in particular, the implementation focuses on adding, searching,
modifying or deleting a student record in the database in a very efficient manner.
Further, certain University-specific constraints have been added to prevent errors from creeping into
the database.
1.3 Motivation
The key motivation behind a project on Extendible Hashing is what the technique can
achieve: a file structure that gives the impression of
completely structured data because of its extremely low access times, i.e., O(1), while avoiding the
overhead involved in actually structuring the data. Providing logical adjacency instead of
physical adjacency helps in overcoming this overhead. This idea in particular provides the
inspiration for creating a program based on Extendible Hashing.
Dept of ISE, R.V.C.E. 2009-2010
1.4 Literature Survey
File structure concepts evolved with the need to access data efficiently. Efficient
access depended mainly on the way the data was stored, i.e., on the mapping of addresses to data,
and so research was carried out into storing data in such a manner that the address of the
required data could be obtained as efficiently as possible.
Early work with files presumed that files were on tape, since most files were. Access was
sequential and the cost of access grew in direct proportion to the size of the file. Simple indexes
were used to speed up the access. However, as the indexes grew, they too became difficult to
manage. For this reason, in the early 1960s, the idea of applying tree structures emerged as a
potential solution. Work on B-trees, first published in the early 1970s, and later on B+ trees
allowed many commercial vendors to create file systems that were faster and were not limited to
sequential access.
B-trees provided excellent access performance, but there was a cost: a file could no longer be
accessed sequentially with efficiency. Fortunately, this problem was solved almost immediately
by adding a linked list structure at the bottom level of the B-tree. The combination of a B-tree and
a sequential linked list is called a B+ tree.
Over the next ten years, B-trees and B+ trees became the basis for many commercial file systems,
since they provide access times that grow in proportion to log_k N. However, even though B-trees
with all their advancements proved to be extremely efficient, the ultimate goal, i.e., to access any
required data in any part of the disk in one access, was not achieved. In the 1980s, there was
optimism about achieving this, and hashing gave some hints of how it could be done. Hashing
proved to be extremely efficient. However, it had a drawback: it was suited to static files. After much
work, Extendible Hashing was developed, which could retrieve information with one or, at most,
two disk accesses no matter how big the file became.
Extendible Hashing Software Requirements Specification
Chapter 2
SOFTWARE REQUIREMENT SPECIFICATION
2.1 Overall Description
The project involves the following specifications:
Product Perspective:
The product shall have graphics implemented into it for ease of use.
The product shall deliver the required functionality efficiently, i.e., with faster access and lower storage overhead.
The product shall follow the Object Oriented Approach and thus make it easier to update, debug and correct modules.
Product Functions:
During user input, each entry of the record shall be validated based on certain constraints.
The product shall provide the insert, update, display and delete functionalities.
On every recursive collapse, the information shall be displayed to the user.
The space utilized at a particular time shall also be displayed.
User Characteristics:
The user can enter the details of each student, adhering to the constraints provided.
The user can view the collapsing of directories.
General Constraints:
The application must be protected from viruses in the system on which it is installed.
The application specifically requires Turbo C++ or any other C++ compiler with support for C graphics.
2.2 Specific Requirements
The requirements below shall enhance the supportability or maintainability of the system being
built, including coding standards, naming conventions, class libraries, maintenance access,
utilities, etc.
2.2.1 Functionality
2.2.1.1 Functionality Requirement 1–Bucket Size
A bucket consists of a series of records which share the same address. For the program to work
consistently, the bucket size shall be based on the following criteria:
The buffer size shall be large enough to provide the required performance.
The buffer size shall not exceed the limit beyond which the operating system cannot manage it.
The sector and track capacities of the disk shall be taken into account.
The data access time of the hard disk (seek, rotation and data transfer times) shall be taken into account.
Bucket size shall not be larger than a track.
Bucket size shall be that of a single cluster.
2.2.1.2 Functionality Requirement 2–Doubling the size of the directory
A bucket shall be split each time it overflows, and the directory shall be doubled when it is not big enough to address the new bucket; the result shall be displayed to the user.
Addresses shall be assigned to the new buckets created.
Doubling shall extend the address space from 2^n to 2^(n+1) cells.
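The doubling step can be sketched as follows. This is a minimal illustration rather than the project's code: the Directory struct and doubleDirectory function are hypothetical names, buckets are represented only by integer ids, and the sketch assumes the low-order bits of the hash select the directory cell.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical directory: cell i holds the id of the bucket it points to.
struct Directory {
    int depth = 0;                 // the directory has 2^depth cells
    std::vector<int> cells{0};     // depth 0: a single cell, a single bucket
};

// Doubles the directory from 2^n to 2^(n+1) cells. With low-order-bit
// addressing (an assumption of this sketch), new cell i + 2^n starts out
// pointing to the same bucket as old cell i.
void doubleDirectory(Directory& dir) {
    const std::size_t oldSize = dir.cells.size();
    dir.cells.resize(2 * oldSize);
    for (std::size_t i = 0; i < oldSize; ++i) {
        dir.cells[i + oldSize] = dir.cells[i];
    }
    ++dir.depth;
}
```

After doubling, every bucket is referenced by at least two cells, so the overflowing bucket can be split by re-pointing one of its cells to a new bucket without moving any other data.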
2.2.1.3 Functionality Requirement 3–Directory Collapse
The directory collapse shall be preceded by a check to determine whether downsizing is possible.
The directory shall be collapsed if no pair of directory cells pointing to different buckets is found in a directory scan.
Space shall be allocated for a new array of bucket addresses that is half the size of the original, and the bucket reference shared by each cell pair shall be copied to a single cell in the new directory.
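A rough sketch of the collapse check and the halving step is given below. The function name collapseDirectory is hypothetical and buckets are again represented by integer ids. The sketch assumes low-order-bit addressing, so cells i and i + half form a pair; with high-order-bit addressing the pairs would instead be adjacent cells.

```cpp
#include <cstddef>
#include <vector>

// Returns true and halves `cells` if both cells of every pair
// (i, i + half) point to the same bucket; otherwise leaves `cells`
// unchanged and returns false.
bool collapseDirectory(std::vector<int>& cells) {
    if (cells.size() < 2) return false;          // nothing to collapse
    const std::size_t half = cells.size() / 2;
    for (std::size_t i = 0; i < half; ++i) {
        if (cells[i] != cells[i + half]) return false;  // pair differs
    }
    // Copy the reference shared by each pair into a directory half the size.
    std::vector<int> smaller(cells.begin(),
                             cells.begin() + static_cast<std::ptrdiff_t>(half));
    cells.swap(smaller);
    return true;
}
```

Calling the function repeatedly until it returns false gives the recursive collapse mentioned in the product functions.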
2.2.2 Supportability
The coding shall follow the naming standard.
The coding shall use the Object Oriented approach and use modules.
Certain comments, algorithms shall be provided as and when necessary.
2.2.3 Performance Requirements
The computers used must have Intel Pentium 4 processors to provide optimum performance.
2.2.4 Design Constraints
The design of the product follows the IO buffer hierarchy.
The product has been designed considering the need to store student information in a
database and has provided student specific attributes such as USN, Branch etc for the
same.
2.2.4 Hardware Requirements
Processor: Pentium (3 or above) or AMD Athlon
RAM: 128 MB or more
Hard Disk: 10 MB or more
2.2.5 Software Requirements
Operating System: Microsoft Windows XP 32bit
Compiler: Turbo C++
2.2.6 Interface Requirement
2.2.6.1 User Interfaces
The product shall be completely User Interface based, adopting the C Graphics as its primary
interface.
2.2.6.2 Communication Interfaces
The communications for permanent storage on secondary storage devices (and also the retrieval
from these devices) shall be provided using the I/O functions provided by the C++ library.
Extendible Hashing High Level Design
Chapter 3
HIGH LEVEL DESIGN
3.1 Design Considerations
3.1.1 Assumptions and Dependencies
Each student has a unique identifier called the USN.
The USN is a 10-character alphanumeric key of the format NAANNAANNN, where N is a digit between 0 and 9 and A is a letter of the alphabet.
Modification of the USN is not allowed.
All attributes of a particular record are dependent on the USN as the key.
A bucket is split when it overflows; the directory is doubled only when it is not big enough to address the new bucket.
Directory collapse takes place depending on whether collapsing is possible.
3.1.2 General Constraints
NULL CONSTRAINT:
No attribute values can be null.
ENTITY INTEGRITY CONSTRAINT:
The USN value cannot be null.
SEMANTIC CONSTRAINTS:
The USN is a 10-character alphanumeric key of the format NAANNAANNN, where N is a digit between 0 and 9 and A is a letter of the alphabet.
The semester values must be between 1 and 8.
The department values supported are ISE/ise, CSE/cse.
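The semantic constraints above can be checked with a few small functions. The names IsUSNOK, IsSemOK and IsBranchOK appear in the structure diagram of this report, but their signatures and bodies here are assumptions made for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Checks the USN against the pattern NAANNAANNN (N = digit, A = letter).
bool IsUSNOK(const std::string& usn) {
    static const std::string pattern = "NAANNAANNN";
    if (usn.size() != pattern.size()) return false;
    for (std::size_t i = 0; i < usn.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(usn[i]);
        const bool ok = (pattern[i] == 'N') ? std::isdigit(c) != 0
                                            : std::isalpha(c) != 0;
        if (!ok) return false;
    }
    return true;
}

// The semester must lie between 1 and 8 inclusive.
bool IsSemOK(int semester) {
    return semester >= 1 && semester <= 8;
}

// Only the departments listed in the constraints are accepted.
bool IsBranchOK(const std::string& branch) {
    return branch == "ISE" || branch == "ise" ||
           branch == "CSE" || branch == "cse";
}
```

For example, the USN 1RV07IS030 matches the pattern, while a semester value of 11 is rejected.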
3.2 System Block Diagram
A block diagram is a diagram of a system in which the principal parts or functions are represented
by blocks connected by lines that show the relationships between the blocks. Block diagrams are heavily used in
the engineering world in hardware design, software design and process diagrams. A block diagram is
typically used for a higher-level, less detailed description aimed more at understanding the
overall concepts and less at understanding the details of implementation.
The system block diagram for simple hashing is given below. It takes the key of the record as the
input and produces the address as the output in which the record will be stored.
Figure 3.1 System block diagram of Hashing
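The key-to-address mapping of simple hashing can be illustrated with a short sketch. The hash used here is FNV-1a, chosen only for illustration; it is not claimed to be the hash function used in this project, and the names hashKey and homeAddress are hypothetical.

```cpp
#include <cstdint>
#include <string>

// Illustrative hash (32-bit FNV-1a): folds the characters of the key
// into a single 32-bit value.
std::uint32_t hashKey(const std::string& key) {
    std::uint32_t h = 2166136261u;        // FNV offset basis
    for (unsigned char c : key) {
        h ^= c;
        h *= 16777619u;                   // FNV prime
    }
    return h;
}

// Reduces the hash value to an address in [0, tableSize).
// tableSize must be greater than zero.
std::uint32_t homeAddress(const std::string& key, std::uint32_t tableSize) {
    return hashKey(key) % tableSize;
}
```

The same key always yields the same address, which is what allows a record to be located in a single probe when no collision occurs.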
The system block diagram for extendible hashing is given below. Extendible Hashing provides a
directory which is made up of cells, and each cell points to a bucket. More than one cell can point
to a single bucket. A bucket is an index file containing the keys of the records.
Figure 3.2 System block diagram of Extendible hashing
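The directory lookup described above can be sketched as follows. The names cellIndex and bucketFor are hypothetical, buckets are represented by integer ids, and the sketch assumes that the low-order bits of the hash value select the cell and that the directory depth is below 32.

```cpp
#include <cstdint>
#include <vector>

// Returns the directory cell for a hash value: its `depth` low-order bits.
// Assumes 0 <= depth < 32.
std::uint32_t cellIndex(std::uint32_t hash, int depth) {
    return hash & ((1u << depth) - 1u);
}

// directory[cell] is a bucket id; several cells may share one bucket.
// The directory is expected to hold 2^depth cells.
int bucketFor(const std::vector<int>& directory, int depth, std::uint32_t hash) {
    return directory[cellIndex(hash, depth)];
}
```

With a directory of depth 2 holding the bucket ids {0, 1, 0, 2}, cells 0 and 2 share bucket 0, which is the situation where more than one cell points to a single bucket.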
3.3 Data Flow Diagram
A data-flow diagram (DFD) is a graphical representation of the "flow" of data through an
information system. DFDs can also be used for the visualization of data processing (structured
design). On a DFD, data items flow from an external data source or an internal data store to an
internal data store or an external data sink, via an internal process. A DFD provides no
information about the timing of processes, or about whether processes will operate in sequence or
in parallel. It is therefore quite different from a flowchart, which shows the flow of control
through an algorithm, allowing a reader to determine what operations will be performed, in what
order, and under what circumstances, but not what kinds of data will be input to and output from
the system, nor where the data will come from and go to, nor where the data will be stored.
Level 0 Data Flow Diagram
It describes the overall processing of the system and shows one process for each major
processing step or functional requirement. Data flows from the context also appear on the system
diagram (level balancing). It can show a single data store to represent all data in aggregate at this
level. It can draw duplicate sources, sinks and data stores to increase legibility.
[Figure: process 0.0 Extendible Hashing; Client; Message; Results; flows Data, Output 1, Output 2]
Fig 3.2 Level 0 Data Flow diagram
Level 1 Data Flow Diagram
A level 1 data flow diagram depicts the main functional areas of the system under investigation. It
is derived with reference to the context diagram, which depicts the overall business process for a
generic system. Further analysis is then necessary in order to identify the major functional areas.
[Figure: processes 1.0 Validate Data Fed, 2.0 Manipulate Details, 3.0 Retrieve Details; CLIENT; STUDENT DATABASE; Extendible Hashing; REPORT; MESSAGE; flows Input data, Input, Output 1, Output 2 and numbered flows 1 to 5]
Fig 3.3 Level 1 Data Flow diagram
1: A Key
2: Another Key
3: Error status
4: Record address
5: Student characteristics
Output 1: Performs the operation of hashing as directed to the student database, and displays the
messages in a convenient manner which can be easily interpreted.
Output 2: Displays reports of student details.
Level 2 Data Flow Diagram
A level 2 data flow diagram depicts the input and output forms of the data.
It is derived with reference to the context diagram, which depicts the overall business process for a
generic system. Further analysis is then necessary in order to identify the major parts of the input and output.
[Figure: processes 2.1 INSERT DETAILS, 2.2 UPDATE DETAILS, 2.2 DELETE DETAILS, each with an INPUT flow, an OUTPUT flow and a MESSAGE]
Fig 3.4 Level 2 Data Flow Diagram
INPUT: Student details
OUTPUT: Successfully inserted/updated/deleted
Extendible Hashing Detailed Design
Chapter 4
DETAILED DESIGN
4.1 Structure Chart
A Structure Chart (SC) is a chart which shows the breakdown of the configuration system to the
lowest manageable levels. This chart is used in structured programming to arrange the program
modules in a tree structure. Each module is represented by a box, which contains the module's
name. The tree structure visualizes the relationships between the modules.
[Figure: modules MAIN, DISPLAY ALL, DISPLAY, SEARCH, MODIFY, DELETE, REMOVE, APPEND, PACK, UNPACK, IsUSNOK, IsNameOK, IsBranchOK, IsSemOK]
Fig 4.1 Structure Diagram
4.2 Class Diagrams
A Class Diagram is a graphical model used in the object-oriented approach to show all of the
classes of objects in the system. It shows a set of classes that are closely related in terms of function
and data, and which form an independent and reusable product.
IOBUFFER
Attributes: Buffer: string; BufferSize: integer; MaxBytes: integer
Operations: Read(istream&): integer; Write(ostream&): integer; Pack(void*, int): integer; Unpack(void*, int): integer
Fig 4.2 IOBuffer Class Diagram
STUDENT
Attributes: USN: string; Lname: string; Fname: string; Address: string; Semester: string; College: string
Operations: Pack(void*, int): integer; Unpack(void*, int): integer; Print(ostream&); Search(char*): integer; Append(char*)
Fig 4.3 Student Class Diagram
FIXEDLENGTHBUFFER
Operations: Read(istream&): integer; Write(ostream&): integer; Print(ostream&); sizeofBuffer(): integer
Fig 4.4 FixedLengthBuffer Class Diagram
DELIMFIELDBUFFER
Attributes: Delim: char; DefaultDelim: char
Operations: Pack(void*, int): integer; Unpack(void*, int): integer; Print(ostream&); Init(); Clear()
Fig 4.5 DelimFieldBuffer Class Diagram
TEXTINDEX
Attributes: MaxKeys: integer; NumKeys: integer
Operations: Insert(char*, int): integer; Remove(char*): integer; Search(char*): integer; Print(ostream&)
Fig 4.6 TextIndex Class Diagram
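The idea behind the Pack and Unpack operations of a delimited-field buffer can be sketched as below. This is a simplified illustration and not the DelimFieldBuffer class itself: the free functions packFields and unpackFields are hypothetical, the '|' delimiter is an assumed default, and field values are assumed not to contain the delimiter.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Joins the fields into one record, ending each field with the delimiter.
std::string packFields(const std::vector<std::string>& fields, char delim = '|') {
    std::string buffer;
    for (const std::string& field : fields) {
        buffer += field;
        buffer += delim;
    }
    return buffer;
}

// Splits a packed record back into its fields.
std::vector<std::string> unpackFields(const std::string& buffer, char delim = '|') {
    std::vector<std::string> fields;
    std::istringstream in(buffer);
    std::string field;
    while (std::getline(in, field, delim)) {
        fields.push_back(field);
    }
    return fields;
}
```

Packing the fields 1RV07IS030, Mithun and 6 gives the record 1RV07IS030|Mithun|6|, and unpacking that record returns the same three fields.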
4.3 Flow Charts
A flow chart is a graphical or symbolic representation of a process. Each step in the process is
represented by a different symbol and contains a short description of the process step. The flow
chart symbols are linked together with arrows showing the process flow direction.
4.3.1 Insertion
The flow chart for insertion is shown below:
[Figure, part 1: START; INPUT KEY; KEY EXISTS? (Y/N); PRINT KEY EXISTS; CALL BUCKET::INSERT; IS BUCKET FULL? (Y/N); ADD KEY TO BUCKET; CALL BUCKET::SPLIT; END; connectors A and B]
[Figure, part 2: connector A; DIVIDE THE KEYS INTO THE NEW BUCKETS; IS THE DIRECTORY BIG ENOUGH? (Y/N); CALL DIRECTORY::DOUBLESIZE; DOUBLE THE DIRECTORY SIZE AND ALLOW NEW BUCKET; CALL BUCKET::INSERT; KEY EXISTS? (Y/N); PRINT KEY EXISTS; END; connector B]
Fig 4.7 Insertion Flowchart
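The insertion steps of the flow chart (key exists, bucket full, split, double the directory, divide the keys) can be put together in one small in-memory sketch. It is not the project's implementation: the class ExtendibleHash is hypothetical and merges the bucket and directory roles into a single class, keys are assumed to be already-hashed 32-bit values, the low-order bits select the cell, and the bucket capacity is assumed to be at least 2.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

class ExtendibleHash {
    struct Bucket {
        int localDepth = 0;                 // number of bits its keys share
        std::vector<std::uint32_t> keys;
    };

public:
    explicit ExtendibleHash(std::size_t bucketCapacity)
        : capacity_(bucketCapacity), depth_(0),
          cells_(1, std::make_shared<Bucket>()) {}

    bool contains(std::uint32_t key) const {
        for (std::uint32_t k : cells_[cell(key)]->keys)
            if (k == key) return true;
        return false;
    }

    // Returns false if the key already exists, true once it is stored.
    bool insert(std::uint32_t key) {
        if (contains(key)) return false;            // key exists
        for (;;) {
            std::shared_ptr<Bucket> b = cells_[cell(key)];
            if (b->keys.size() < capacity_) {
                b->keys.push_back(key);             // add key to bucket
                return true;
            }
            split(b);                               // bucket full: split, retry
        }
    }

    int depth() const { return depth_; }
    std::size_t directorySize() const { return cells_.size(); }

private:
    std::size_t cell(std::uint32_t key) const {
        return key & ((1u << depth_) - 1u);         // low-order depth_ bits
    }

    void doubleDirectory() {
        const std::size_t n = cells_.size();
        cells_.resize(2 * n);
        for (std::size_t i = 0; i < n; ++i) cells_[i + n] = cells_[i];
        ++depth_;
    }

    void split(const std::shared_ptr<Bucket>& full) {
        if (full->localDepth == depth_) doubleDirectory();  // not big enough
        const std::uint32_t bit = 1u << full->localDepth;   // new deciding bit
        auto fresh = std::make_shared<Bucket>();
        fresh->localDepth = ++full->localDepth;
        for (std::size_t i = 0; i < cells_.size(); ++i)     // re-point cells
            if (cells_[i] == full && (i & bit)) cells_[i] = fresh;
        std::vector<std::uint32_t> old;
        old.swap(full->keys);
        for (std::uint32_t k : old) {                       // divide the keys
            if (k & bit) fresh->keys.push_back(k);
            else full->keys.push_back(k);
        }
    }

    std::size_t capacity_;
    int depth_;
    std::vector<std::shared_ptr<Bucket>> cells_;
};
```

With a bucket capacity of 2, inserting the keys 0 to 4 in order forces two splits, each of which doubles the directory, leaving it with depth 2 and four cells.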
4.3.2 Deletion
The flow chart for deletion is as follows:
[Figure, part 1: START; INPUT KEY; CALL DIRECTORY::REMOVE; IS THE KEY FOUND? (Y/N); PRINT: KEY NOT FOUND; CALL BUCKET::REMOVE; CALL DELETE METHOD; PASS BUCKET TO DIRECTORY: TRY COMBINE; END; connector A]
[Figure, part 2: connector A; IS THERE BUDDY BUCKET? (Y/N); IS SUM OF 2 BUCKETS < 1 BUCKET? (Y/N); CALL DIRECTORY::COLLAPSE; CAN DIRECTORY BE COLLAPSED? (Y/N); COLLAPSE THE DIRECTORY; PRINT DELETION DONE; END]
Fig 4.8 Key Deletion Flow Diagram
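The buddy-bucket test and the combine condition from the deletion flow chart can be sketched as follows. The names buddyIndex and canCombine are hypothetical. The sketch assumes low-order-bit addressing, looks for a buddy only when the bucket's local depth equals the directory depth (a common simplification), and reads the flow chart condition as "the keys of both buckets fit in one bucket".

```cpp
#include <cstddef>

// Returns the directory index of the buddy cell, or -1 if the bucket has
// no buddy. The buddy differs from `cell` only in the highest bit used
// by the bucket.
long buddyIndex(std::size_t cell, int localDepth, int directoryDepth) {
    if (directoryDepth == 0 || localDepth < directoryDepth) return -1;
    return static_cast<long>(cell ^ (std::size_t{1} << (localDepth - 1)));
}

// Two buckets can be combined when their keys fit into a single bucket.
bool canCombine(std::size_t keysA, std::size_t keysB, std::size_t capacity) {
    return keysA + keysB <= capacity;
}
```

For a directory of depth 2, cell 2 (binary 10) has cell 0 (binary 00) as its buddy; if the two buckets together hold no more keys than one bucket can store, they can be merged and the directory checked for collapse.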
Extendible Hashing Implementation
Chapter 5
IMPLEMENTATION
5.1 Selection of the platform
An operating system is software that manages computer resources and provides
programmers/users with an interface used to access those resources. An operating system
performs basic tasks such as controlling and allocating memory, prioritizing system requests,
controlling input and output devices, facilitating computer
networking and managing files. An operating system processes system data and user input,
and responds by allocating and managing tasks and internal system resources as a service to
users and programs of the system.
The system under development works in a very restrictive environment. The
security concerns are large and require that the system being developed be robust and safe
from attack. Windows XP analyzes the performance impact of visual effects and uses this to
determine whether to enable them, so as to prevent the new functionality from
consuming excessive additional processing overhead. Users can further customize these
settings. Windows XP operating systems can fix problems and add features by using service
packs. Each service pack is a superset of all previous service packs and patches, so that only the
latest service pack needs to be installed.
5.2 Selection of the programming language-C++
The programming language used for the development work is C++. The reasons for selecting
this language include:
Compared to C, C++ is object oriented in its approach and suits the modular programming that I apply in my project.
C++ provides a lot of I/O features which are crucial for my project.
Since the project has many input operations from the user, C++ provides simple ways to input data, which is more complex in Java.
Compared to Java, C++ typically runs faster because source code is compiled directly to machine code.
C++ has the capability to interact directly with the machine, which is an additional capability that can be utilized.
C++ is a widely used language and hence it can, to some extent, guarantee the portability of the application developed.
5.3 Programming Coding Guidelines
5.3.1 Naming Conventions
Every variable has all the letters in lowercase.
Every class begins with an uppercase letter and has all the other letters in lowercase.
Uppercase letters are used to distinguish between words in an identifier.
Every method name has all the letters in lowercase.
5.3.2 Coding Conventions
All the required variables are declared at the beginning of each module.
For every unique functionality, a function is written and thus modularized.
Unconditional jump statements such as goto are avoided as much as possible to keep the program simple and easy to debug.
Unsigned variables have been rarely used, i.e., only when strictly necessary.
Static variables are used to save space as and when possible.
The I/O buffer hierarchy is made use of extensively.
Extendible Hashing Testing
Chapter 6
TESTING
The tests carried out in this project are unit testing, integration testing and system testing.
Features to be tested: Insertion, deletion, modification, updating and directory collapse.
Items to be tested: Doubling of directory size and space utilization for buckets.
Purpose of testing: To check the effective implementation of Extendible Hashing.
Pass / Fail Criteria: Changes made either in the program or in the database file must be reflected in the file or program respectively.
Assumptions and Constraints: The values that can be entered have specific formats, with size constraints for each record.
6.1 Unit Testing
Unit testing is a software verification and validation method in which a programmer tests if
individual units of source code are fit for use. A unit is the smallest testable part of an
application. In procedural programming a unit may be an individual function or procedure.
6.1.1 Unit Test Case 1
Table 6.1 Unit Test Case 1
Sl No. of test case         : 1
Name of test                : Insertion test
Item / Feature being tested : Insert a new student record.
Sample Input                : USN='1RV07IS030', Name='Mithun', Address='Mangalore', Semester='6', Branch='ISE', College='RVCE'.
Expected output             : The record is inserted into the file.
Actual output               : The record is successfully inserted.
Remarks                     : Test succeeded.
6.1.2 Unit Test Case 2
Table 6.2 Unit Test Case 2
Sl No. of test case         : 2
Name of test                : Modify test
Item / Feature being tested : Modification of details in the student record.
Sample Input                : Name='John' with USN='1RV07IS532' and values to be modified.
Expected output             : The modifications must be reflected in the database.
Actual output               : The record is successfully updated.
Remarks                     : Test succeeded.
6.1.3 Unit Test Case 3
Table 6.3 Unit Test Case 3
Sl No. of test case         : 3
Name of test                : Display test
Item / Feature being tested : Display student details.
Sample Input                : USN='1RV07IS431'.
Expected output             : The record with USN=1RV07IS431 must be displayed.
Actual output               : The record with USN=1RV07IS431 is displayed.
Remarks                     : Test succeeded.
6.1.4 Unit Test Case 4
Table 6.4 Unit Test Case 4
Sl No. of test case         : 4
Name of test                : Display all test
Item / Feature being tested : Display all the student records stored in the file.
Sample Input                : Enter the choice to display all records.
Expected output             : All the student records in the file must be displayed.
Actual output               : All the student records in the file are displayed.
Remarks                     : Test succeeded.
6.1.5 Unit Test Case 5
Table 6.5 Unit Test Case 5
Sl No. of test case         : 5
Name of test                : Delete test
Item / Feature being tested : Deletion of a student record from the file.
Sample Input                : USN='1RV07IS342'
Expected output             : The record with USN='1RV07IS342' must be deleted from the file.
Actual output               : The record is deleted.
Remarks                     : Test succeeded.
6.1.6 Unit Test Case 6
Table 6.6 Unit Test Case 6
Sl No. of test case         : 6
Name of test                : Directory display
Item / Feature being tested : To display the directory.
Sample Input                : Enter the choice to display directory.
Expected output             : The directory details must be displayed.
Actual output               : The directory details are successfully displayed.
Remarks                     : Test succeeded.
6.1.7 Unit Test Case 7
Table 6.7 Unit Test Case 7
Sl No. of test case : 7
Name of test : Space utilization test
Item / Feature being tested : Display of space utilization.
Sample Input : Enter the choice to display space utilization.
Expected output : The space utilization must be displayed.
Actual output : The space utilization is displayed.
Remarks : Test succeeded.
6.1.8 Unit Test Case 8
Table 6.8 Unit Test Case 8
Sl No. of test case : 8
Name of test : Insert test
Item / Feature being tested : Inserting a student record.
Sample Input : USN='1RV07IS030', Name='Mithun', Address='Bangalore', Semester='11', Branch='ISE', College='RVCE'.
Expected output : The record should be successfully inserted into the file.
Actual output : Record not inserted.
Remarks : Test failed.
The insertion fails because 11 is not a valid value for the semester field. The
insertion of the same record with a valid semester value is shown in the next test case.
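The failure in Unit Test Case 8 comes down to validating the semester field before the record is written. A minimal sketch of such a check is given below. The function name isValidSemester and the accepted range of 1 to 8 are assumptions made for illustration; they are not taken from the project source.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper (not from the project source).
// Accepts a semester only if it is a single digit from '1' to '8'.
// The 1..8 range is an assumption (an eight-semester course).
bool isValidSemester(const char *semester)
{
    assert(semester != 0);
    if (std::strlen(semester) != 1)
        return false;               // rejects "" and two-digit input such as "11"
    return semester[0] >= '1' && semester[0] <= '8';
}
```

Under these assumptions, isValidSemester("11") returns false and isValidSemester("6") returns true, which matches the outcomes of Unit Test Cases 8 and 9. A single-character check also fits the two-byte Semester field declared in Appendix-B.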
6.1.9 Unit Test Case 9
Table 6.9 Unit Test Case 9
Sl No. of test case : 9
Name of test : Insert test
Item / Feature being tested : Inserting a student record.
Sample Input : USN='1RV07IS030', Name='Mithun', Address='Bangalore', Semester='6', Branch='ISE', College='RVCE'.
Expected output : The record is successfully inserted into the file.
Actual output : Record is inserted.
Remarks : Test succeeded
6.2 Integration testing
Integration testing (sometimes called Integration and Testing, abbreviated "I&T") is the activity
of software testing in which individual software modules are combined and tested as a group. It
occurs after unit testing and before system testing. Integration testing takes as its input modules
that have been unit tested, groups them in larger aggregates, applies tests defined in an
integration test plan to those aggregates, and delivers as its output the integrated system ready for
system testing.
6.2.1 Integration test case 1
Table 6.10 Integration Test Case 1
Sl No. of test case : 1
Name of test : Doubling the directory
Item / Feature being tested : Doubling the directory.
Sample Input : Inserting a new record.
Expected output : The directory gets doubled and the record is stored.
Actual output : The directory gets doubled and the record is stored.
Remarks : Test succeeded
6.2.2 Integration test case 2
Table 6.11 Integration Test Case 2
Sl No. of test case : 2
Name of test : Collapsing the directory
Item / Feature being tested : Collapsing the directory.
Sample Input : Deleting a record.
Expected output : The directory gets collapsed when there are no more records in it.
Actual output : The directory gets collapsed.
Remarks : Test succeeded.
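The two directory operations exercised by these integration tests can be sketched as follows. This is an illustrative in-memory version that follows the usual textbook formulation of extendible hashing. The Directory struct and the function names are assumptions and may differ from the project's own code, which keeps the directory in a file.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical in-memory directory: cell i holds the address of the bucket
// for keys whose first `depth` address bits equal i.
struct Directory {
    int depth;                 // number of address bits in use
    std::vector<int> cells;    // 2^depth bucket addresses
};

// Double the directory: old cell i becomes new cells 2i and 2i+1,
// both still referring to the same bucket.
void doubleDirectory(Directory &dir)
{
    assert(dir.cells.size() == (std::size_t(1) << dir.depth));
    std::vector<int> bigger(dir.cells.size() * 2);
    for (std::size_t i = 0; i < dir.cells.size(); ++i) {
        bigger[2 * i] = dir.cells[i];
        bigger[2 * i + 1] = dir.cells[i];
    }
    dir.cells.swap(bigger);
    ++dir.depth;
}

// Halve the directory, but only if every pair of sibling cells (2i, 2i+1)
// refers to the same bucket. Returns true if the collapse took place.
bool collapseDirectory(Directory &dir)
{
    if (dir.depth == 0)
        return false;          // a single cell cannot shrink further
    for (std::size_t i = 0; i < dir.cells.size(); i += 2)
        if (dir.cells[i] != dir.cells[i + 1])
            return false;      // some pair still needs both cells
    std::vector<int> smaller(dir.cells.size() / 2);
    for (std::size_t i = 0; i < smaller.size(); ++i)
        smaller[i] = dir.cells[2 * i];
    dir.cells.swap(smaller);
    --dir.depth;
    return true;
}
```

Doubling never moves a record; it only adds one more address bit, so the bucket that overflowed can then be split. Collapsing is the inverse and is possible only when no pair of sibling cells refers to two different buckets.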
6.3 System testing
System testing of software or hardware is testing conducted on a complete, integrated system to
evaluate the system's compliance with its specified requirements. System testing takes, as its
input, all of the "integrated" software components that have successfully passed integration
testing and also the software system itself integrated with any applicable hardware system(s).
6.3.1 System test case 1
Table 6.12 System Test Case
Sl No. of test case : 1
Name of test : Hashing
Item / Feature being tested : Hashing.cpp
Sample Input : Inserting student records.
Expected output : The records get inserted at the corresponding hash addresses, resolving collisions using buckets.
Actual output : The records get inserted at the correct addresses.
Remarks : Test succeeded.
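How a key is turned into a directory address can be sketched as below. The sketch is modelled on a common textbook formulation: the key is folded into an integer, and the lowest `depth` bits of that integer are read off in reverse order to form the address. The function names, the multiplier 100 and the modulus 19937 are illustrative assumptions; the report does not show the hash function used in Hashing.cpp.

```cpp
#include <cassert>
#include <cstring>

// Fold the key two characters at a time into a bounded integer.
// The constants 100 and 19937 are illustrative assumptions.
int hashKey(const char *key)
{
    assert(key != 0);
    int sum = 0;
    std::size_t len = std::strlen(key);
    if (len % 2 == 1)
        ++len;                 // odd length: the trailing '\0' completes the last pair
    for (std::size_t j = 0; j < len; j += 2)
        sum = (sum + 100 * static_cast<unsigned char>(key[j])
                   + static_cast<unsigned char>(key[j + 1])) % 19937;
    return sum;
}

// Build a directory address from the lowest `depth` bits of the hash value,
// taken in reverse order (the lowest bit becomes the most significant address bit).
int makeAddress(const char *key, int depth)
{
    assert(depth >= 0);
    int address = 0;
    int hashVal = hashKey(key);
    for (int j = 0; j < depth; ++j) {
        address = (address << 1) | (hashVal & 1);
        hashVal >>= 1;
    }
    return address;
}
```

With a directory of depth d, makeAddress always yields a value between 0 and 2^d - 1, so it can be used directly as an index into the directory. Records whose keys map to the same address share a bucket, which is how collisions are absorbed until the bucket overflows.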
Chapter 7
RESULTS
7.1 Snapshots
Figure 7.1 The home page
Figure 7.2 The choice screen
Figure 7.3 Data Entry-Record Insertion
Figure 7.4 Record Modification
Figure 7.5 Record Display
Figure 7.6 Directory details
7.2 Advantages of the project
The project stores student records in a file that serves as the database, using extendible
hashing, one of the most effective file-structure techniques for data storage and access.
Using the project, the exact point at which the directory expands or collapses can be observed.
The program provides a function to view the current space utilization.
For exact-match retrieval by key, the program performs better than B-tree and B+ tree implementations.
Hashing provides fast access, usually with very little storage overhead, and it is
adaptable to most types of primary keys. With hashing, a record can typically be found
with only one disk access.
Extendible hashing allows the address space to grow and shrink dynamically along with the
file, thus avoiding the need for overflow handling.
The trie model extends the use of the hashed value: each additional level in the depth of a
radix-2 trie uses one more bit of the hashed value.
Space utilization is estimated at approximately 69% using the approximation formula given
by Flajolet.
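The space utilization figure can be made concrete with a short calculation. Utilization is the number of records stored divided by the total number of record slots, that is r / (b * N) for r records, bucket capacity b and N buckets. The 69% quoted above is consistent with the well-known average of ln 2 (about 0.693) for extendible hashing. The function names below are assumptions made for illustration and are not taken from the project source.

```cpp
#include <cassert>
#include <cmath>

// Measured utilization: records stored / (bucket capacity * number of buckets).
double spaceUtilization(long records, long bucketCapacity, long buckets)
{
    assert(records >= 0 && bucketCapacity > 0 && buckets > 0);
    return static_cast<double>(records)
         / (static_cast<double>(bucketCapacity) * static_cast<double>(buckets));
}

// Theoretical long-run average utilization, ln 2 (about 0.693).
double expectedUtilization()
{
    return std::log(2.0);
}
```

For example, 69 records held in 10 buckets of capacity 10 give a utilization of 0.69. In practice the measured value oscillates around the average, dropping just after a round of bucket splits and rising again as the new buckets fill.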
7.3 Limitations of the project
The program does not support pointing devices such as a mouse.
The program is very large relative to the functionality it provides.
The program has been implemented in C++; a Java implementation could have been
considerably smaller because of Java's many built-in library functions.
The program does not use newer concepts such as multithreading in its
implementation.
Being implemented in C++, the program is machine dependent and must be recompiled each
time it is moved to a different platform.
To cover the address space effectively, more bits of the hashed value must be
used.
A complete binary tree has to be formed from the trie.
To accommodate even a single new bit, the address space has to be doubled
when a bucket splits.
The records of two buckets can be combined only if they are buddy buckets.
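The buddy-bucket restriction in the last point can be illustrated with a small sketch. Two buckets are buddies when their directory addresses differ only in the last address bit, and a bucket has a buddy only when it uses as many bits as the directory itself. The function name findBuddy and its parameters are assumptions made for illustration; the project's own routine may be organised differently.

```cpp
#include <cassert>

// Returns the directory address of the buddy bucket, or -1 if there is none.
// bucketAddress  : directory address shared by the keys in the bucket
// bucketDepth    : number of address bits the bucket uses
// directoryDepth : number of address bits the directory uses
int findBuddy(int bucketAddress, int bucketDepth, int directoryDepth)
{
    assert(bucketDepth >= 0 && bucketDepth <= directoryDepth);
    if (directoryDepth == 0)
        return -1;             // only one bucket exists, so it has no buddy
    if (bucketDepth < directoryDepth)
        return -1;             // several cells share this bucket: no single buddy
    return bucketAddress ^ 1;  // flip the last address bit
}
```

With a directory of depth 3, the bucket at address 6 (binary 110) has the bucket at address 7 (binary 111) as its buddy. If the combined records of the two fit into one bucket after a deletion, they can be merged; no other pairing is allowed.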
Chapter 8
CONCLUSION
8.1 Future Enhancement
The project can be enhanced further by making the bucket size dynamic.
The implementation can also be extended to represent graphically the changes that happen after each
insertion and deletion.
The user input forms can be made more user friendly by providing exit options wherever
possible. Also, integrating mouse support into the program would make it much easier
for the user to interact with it.
Further, multithreading capabilities can be provided, and the program can be made to run over a
network, with the file (database) stored on a server.
Even though the program has been modularized, it still contains instances of redundant code.
In many places in the source program, redundant code can be eliminated, and certain pieces of
code can be combined into a single module.
The display functionality can be modified so that multiple records are displayed on a
single screen, thereby using the display area effectively.
REFERENCES
[1] Competence Center Corporate Data Quality 2-IWI-HSG, Institute of Information
Management, University of St. Gallen, St. Gallen.
[2] Michael J. Folk, Bill Zoellick and Greg Riccardi, File Structures: An Object Oriented Approach
Using C++, 2008.
[3] Bjarne Stroustrup, C++: The Complete Reference.
[4] Yashwant P. Kanetkar, Graphics Using C.
[5] Raghu Ramakrishnan and Johannes Gehrke, Database Management Systems.
[6] Ian Sommerville, Software Engineering, 5th Edition, Pearson Education.
[7] www.CUserseJournal.com
[8] www.ThinkinginC.com
[9] www.AlgorithmsinCProgramming.com
APPENDIX-A
LIST OF ACRONYMS
1. FS: File Structures
2. USN: University Serial Number
3. SRS: Software Requirement Specification
4. OS: Operating System
5. RAM: Random Access Memory
6. SQL: Structured Query Language
7. ID: Identifier
8. CD-ROM: Compact Disc Read-Only Memory
APPENDIX-B
CODING
B.1 Student Database
//student.h
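#include "C:\tc\hash\delim.cpp"

// A student record. Each field is a fixed-size, null-terminated string.
class Student
{
  public:
    char URN[13];
    char Lname[11];
    char Fname[21];
    char Address[50];
    char Semester[2];
    char Branch[6];
    char College[11];

    Student();
    static int InitBuffer (DelimFieldBuffer &);
    void Clear ();
    int Unpack (IOBuffer &);        // read the fields from a buffer
    int Pack (IOBuffer &) const;    // write the fields into a buffer
    void Print (ostream &, char *label = 0) const;
    int Search(char *);             // look for this record's URN in a file
    int Append(char *);             // append this record to a file
};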
//main.cpp
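#include<iostream.h>
#include<fstream.h>   // fstream, used by Search and Append
#include<string.h>    // strcmpi, used by Search
#include<conio.h>     // gotoxy, used by Print
#include "c:\tc\hash\student.h"
#define TRUE 1
#define FALSE 0

Student :: Student ()
{
    Clear();
}

void Student :: Clear()
{
    // Set each field to an empty string
    URN[0] = 0;
    Lname[0] = 0;
    Fname[0] = 0;
    Address[0] = 0;
    Semester[0] = 0;
    Branch[0] = 0;
    College[0] = 0;
}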
int Student :: Pack (IOBuffer & Buffer) const
{
    // Pack the fields into the buffer in order; stop at the first failure
    int numBytes;
    Buffer.Clear();
    numBytes = Buffer.Pack(URN);
    if (numBytes == -1) return FALSE;
    numBytes = Buffer.Pack(Lname);
    if (numBytes == -1) return FALSE;
    numBytes = Buffer.Pack(Fname);
    if (numBytes == -1) return FALSE;
    numBytes = Buffer.Pack(Address);
    if (numBytes == -1) return FALSE;
    numBytes = Buffer.Pack(Semester);
    if (numBytes == -1) return FALSE;
    numBytes = Buffer.Pack(Branch);
    if (numBytes == -1) return FALSE;
    numBytes = Buffer.Pack(College);
    if (numBytes == -1) return FALSE;
    return TRUE;
}
int Student :: Unpack(IOBuffer & Buffer)
{
    // Unpack the fields from the buffer in order and null-terminate each one
    Clear();
    int numBytes;
    numBytes = Buffer.Unpack(URN);
    if (numBytes == -1) return FALSE;
    URN[numBytes] = 0;
    numBytes = Buffer.Unpack(Lname);
    if (numBytes == -1) return FALSE;
    Lname[numBytes] = 0;
    numBytes = Buffer.Unpack(Fname);
    if (numBytes == -1) return FALSE;
    Fname[numBytes] = 0;
    numBytes = Buffer.Unpack(Address);
    if (numBytes == -1) return FALSE;
    Address[numBytes] = 0;
    numBytes = Buffer.Unpack(Semester);
    if (numBytes == -1) return FALSE;
    Semester[numBytes] = 0;
    numBytes = Buffer.Unpack(Branch);
    if (numBytes == -1) return FALSE;
    Branch[numBytes] = 0;
    numBytes = Buffer.Unpack(College);
    if (numBytes == -1) return FALSE;
    College[numBytes] = 0;
    return TRUE;
}
int Student :: InitBuffer (DelimFieldBuffer & Buffer)
{
    return TRUE;
}

void Student :: Print(ostream & stream, char * label) const
{
    // Print one field per screen row, starting at row 4
    gotoxy(3,4);
    if (label == 0) stream << "Student:";
    else stream << label;
    gotoxy(3,5);
    stream << "Reg-no : " << URN ;
    gotoxy(3,6);
    stream << "Last Name : " << Lname;
    gotoxy(3,7);
    stream << "First Name: " << Fname;
    gotoxy(3,8);
    stream << "Address : " << Address;
    gotoxy(3,9);
    stream << "Semester : " << Semester;
    gotoxy(3,10);
    stream << "Branch : " << Branch;
    gotoxy(3,11);
    stream << "College : " << College;
    stream<<flush;
}
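int Student :: Search(char *myfile)
{
    // Scan the file sequentially for a record whose URN matches this one.
    // Returns the record address plus 1 if found, 0 otherwise.
    fstream file(myfile,ios::in);
    Student s1;
    while(1)
    {
        DelimFieldBuffer :: SetDefaultDelim('|');
        DelimFieldBuffer Buff;
        int add=Buff.Read(file);
        if (add==-1) return 0;      // end of file: no match
        s1.Unpack(Buff);
        if( strcmpi(s1.URN,URN)==0) return add+1;
    }
}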
int Student :: Append(char *myfile)
{
    // Pack this record and write it at the end of the file.
    // Returns the address at which the record was written.
    DelimFieldBuffer :: SetDefaultDelim('|');
    DelimFieldBuffer Buff;
    Student :: InitBuffer(Buff);
    Pack(Buff);
    fstream file(myfile,ios::in|ios::out);
    file.seekp(0,ios::end);
    file.seekg(0,ios::end);
    int recaddr=Buff.Write(file);
    file.close();
    return recaddr;
}