Data Structures (BE)
ANNA UNIVERSITY TIRUCHIRAPPALLI
Regulations 2008 Syllabus
B.Tech IT / B.E EEE SEMESTER III
CS1201 - DATA STRUCTURES
Prepared By:
B. Sundara Vadivazhagan, HOD i/c, IT
S. Karthik, Lecturer, IT
G. Mahalakshmi, Lecturer, IT
UNIT I - FUNDAMENTALS OF ALGORITHMS
Algorithm – Analysis of Algorithm – Best Case and Worst Case Complexities –Analysis of
Algorithm using Data Structures – Performance Analysis – Time Complexity – Space
Complexity – Amortized Time Complexity – Asymptotic Notation
UNIT II - FUNDAMENTALS OF DATA STRUCTURES
Arrays – Structures – Stacks – Definition and examples – Representing Stacks –Queues and Lists
– Queue and its Representation – Applications of Stack – Queue and Linked Lists.
UNIT III - TREES
Binary Trees – Operations on Binary Tree Representations – Node Representation –Internal and
External Nodes – Implicit Array Representation – Binary Tree Traversal – Huffman Algorithm –
Representing Lists as Binary Trees – Sorting and Searching Techniques – Tree Searching –
Hashing
UNIT - IV GRAPHS AND THEIR APPLICATIONS
Graphs – An Application of Graphs – Representation – Transitive Closure – Warshall's
Algorithm – Shortest Path Algorithm – A Flow Problem – Dijkstra's Algorithm – Minimum
Spanning Trees – Kruskal's and Prim's Algorithms – An Application of Scheduling – Linked
Representation of Graphs – Graph Traversals
UNIT V - STORAGE MANAGEMENT
General Lists – Operations – Linked List Representation – Using Lists – Freeing List Nodes –
Automatic List Management : Reference Count Method – Garbage Collection – Collection and
Compaction
TEXT BOOKS
1. Cormen T. H., Leiserson C. E., and Rivest R. L., "Introduction to Algorithms", Prentice Hall of
India, New Delhi, 2007.
2. M. A. Weiss, "Data Structures and Algorithm Analysis in C", Second Edition, Pearson
Education, 2005.
REFERENCES
1. Ellis Horowitz, Sartaj Sahni and Sanguthevar Rajasekaran, "Computer Algorithms/C++",
Universities Press (India) Private Limited, Second Edition, 2007.
2. A. V. Aho, J. E. Hopcroft, and J. D. Ullman, "Data Structures and Algorithms", First Edition,
Pearson Education, 2003.
3. R. F. Gilberg and B. A. Forouzan, "Data Structures", Second Edition, Thomson India Edition,
2005.
4. Robert L. Kruse, Bruce P. Leung and Clovis L. Tondo, "Data Structures and Program Design in
C", Pearson Education, 2004.
5. Tenenbaum A. M., Langsam Y., Augenstein M. J., "Data Structures Using C", Pearson Education,
2004.
UNIT – I FUNDAMENTALS OF ALGORITHMS
I. ALGORITHM
An algorithm is any well-defined computational procedure that takes some value, or set
of values, as input and produces some value, or set of values, as output. In other words, an algorithm is a
sequence of computational steps that transform the input into the output.
An algorithm can be viewed as a tool for solving a well-specified computational problem.
The statement of the problem specifies the desired input/ output relationship. The algorithm
describes a specific computational procedure for achieving that input/output relationship.
Study of Algorithms:
Consider the problem of sorting a sequence of numbers into non-decreasing order. The sorting
problem is defined as
Input: A sequence of n numbers (a1, a2, …, an)
Output: A permutation (reordering) (a1′, a2′, …, an′) of the input sequence such that
a1′ ≤ a2′ ≤ … ≤ an′
Given an input sequence such as (31,41,59,26,41,58), a sorting algorithm returns as
output the sequence (26,31,41,41,58,59). Such an input sequence is called an instance of the
sorting problem. In general, an instance of a problem consists of all the inputs needed to
compute a solution to the problem.
An algorithm is said to be correct if, for every instance, it halts with the correct output.
The correct algorithm solves the given computational problem.
An incorrect algorithm might not halt at all on some input instances, or it might halt with
other than the desired answer.
Example
Insertion Sort
Insertion sort is an efficient algorithm for sorting a small number of elements.
Insertion sort works the way many people sort a bridge or gin rummy hand.
We start with an empty left hand and the cards face down on the table. We then
remove one card at a time from the table and insert it into the correct position in
the left hand.
To find the correct position for a card, we compare it with each of the cards already
in the hand, from right to left.
Pseudocode for INSERTION-SORT
INSERTION-SORT(A)
1  for j ← 2 to length[A]
2      do key ← A[j]
3         ▷ Insert A[j] into the sorted sequence A[1 .. j − 1]
4         i ← j − 1
5         while i > 0 and A[i] > key
6             do A[i + 1] ← A[i]
7                i ← i − 1
8         A[i + 1] ← key
The insertion sort is presented as a procedure called INSERTION SORT, which
takes as parameter an array A[1….n] containing a sequence of length n that is to
be sorted.
The input numbers are sorted in place: the numbers are rearranged within the
array A, with at most a constant number of them stored outside the array at any
time.
The input array A contains the sorted output sequence when INSERTION-SORT
is finished.
5 2 4 6 1 3    (initial array, j = 2)
2 5 4 6 1 3    (after inserting 2)
2 4 5 6 1 3    (after inserting 4)
2 4 5 6 1 3    (after inserting 6)
1 2 4 5 6 3    (after inserting 1)
1 2 3 4 5 6    (sorted)
Fig: The operation of INSERTION-SORT on the array A = (5, 2, 4, 6, 1, 3). Each row shows the
array after one iteration of the outer for loop; the element just inserted is the one indexed by j.
The fig shows how this algorithm works for A = (5, 2, 4, 6, 1, 3). The index j indicates the
current card being inserted into the hand. Array elements A[1..j−1] constitute the currently
sorted hand, and elements A[j+1..n] correspond to the pile of cards still on the table.
The index j moves left to right through the array. At each iteration of the "outer" for loop,
the element A[j] is picked out of the array.
Then starting in position j-1 elements are successively moved one position to the right
until the proper position for A[j] is found, at which point it is inserted.
Goals for an algorithm
Basic goals for an algorithm
1. Always correct
2. Always terminates
3. Performance- Performance often draws the line between what is possible and what is
impossible.
The notion of "algorithm"
An algorithm is a description of a procedure which is
1. Finite (i.e., consists of a finite sequence of characters)
2. Complete (i.e., describes all computation steps)
3. Unique (i.e., there are no ambiguities)
4. Effective (i.e., each step has a defined effect and can be executed in finite time)
Properties:
Desired properties of algorithms
Correctness
o For each input, the algorithm calculates the requested value
Termination
o For each input, the algorithm performs only a finite number of steps
Efficiency
o Runtime: the algorithm runs as fast as possible
o Storage space: the algorithm requires as little storage space as possible
Algorithms-Distinct areas:
There are five distinct areas in the study of algorithms.
1. Creating or devising algorithms: Various design techniques are created to yield good
algorithms.
2. Expressing the algorithms in a structured representation.
3. Validating algorithms: The algorithms devised should compute the correct answer for
all possible legal inputs. This process is known as algorithm validation.
4. Analyzing algorithms: It refers to the process of determining how much computing time
and storage an algorithm will require, and how well the algorithm performs in the best case,
worst case and average case.
Kinds of Analyses
Worst-case: (usually)
T(n) = maximum time of algorithm on any input of size n.
Average-case: (sometimes)
T(n) = expected time of algorithm over all inputs of size n.
Need assumption of statistical distribution of inputs.
Best-case: (rarely useful)
One can cheat with a slow algorithm that works fast on some input, so the best case gives little guarantee.
5. Testing algorithm
It consists of two phases: Debugging and profiling
Debugging is the process of executing programs on sample data to determine if any
faulty results occur.
Profiling is the process of executing correct programs on data sets and measuring the
time and space needed to compute the results.
II. ANALYSIS OF ALGORITHM
Analyzing an algorithm has come to mean predicting the resources that the algorithm
requires. Occasionally, resources such as memory, communication bandwidth, or logic gates are
of primary concern, but most often it is computational time that we want to measure.
Generally, by analyzing several candidate algorithms for a problem, a most efficient one
can be easily identified. Such analysis may indicate more than one viable candidate, but several
inferior algorithms are usually discarded in the process.
Analysis predicting the resources that the algorithm requires, resources such as memory,
communication bandwidth or computer hardware are of primary concern, but most often it is
necessary to measure the computational time.
By analyzing several candidate algorithms for a problem, a most efficient one can be
easily identified and others are discarded in the process.
The main reasons for analyzing algorithms are:
It is an intellectual activity.
It is challenging to predict the future; narrowing the predictions to algorithms makes this tractable.
Computer science attracts many people who enjoy being efficiency experts.
Structural Programming Model
Niklaus Wirth stated that any algorithm could be written with only three programming
constructs: sequence, selection, loop.
The implementation of these constructs relies on the implementation language, such as C++.
Sequence is a series of statements that do not alter the execution path within an algorithm.
Selection statements evaluate one or more alternatives; if a condition is true, one path is taken,
otherwise a different path is taken.
Loop
Iterates a block of code.
Usually the condition is evaluated before the body of the loop is executed.
If the condition is true, the body is executed.
If the condition is false, the loop terminates.
ANALYSIS OF INSERTION SORT :
The time taken by the Insertion Sort procedure depends on the input: sorting a thousand
numbers takes longer than sorting three numbers.
Insertion sort can take different amounts of time to sort two input sequences of the same
size depending on how nearly sorted they already are.
In general, the time taken by an algorithm grows with the size of the input, so it is
traditional to describe the running time of a program as a function of the size of its input.
To do so, we need to define the terms "running time" and "size of input" more carefully.
The best notion for input size depends on the problem being studied.
For many problems, such as sorting or computing discrete Fourier transforms, the most
natural measure is the number of items in the input: for example, the array size n for
sorting. For many other problems, such as multiplying two integers, the best measure of
input size is the total number of bits needed to represent the input in ordinary binary notation.
The running time of an algorithm on a particular input is the number of primitive
operations or "steps" executed.
We start by presenting the INSERTION-SORT procedure with the time "cost" of each
statement and the number of times each statement is executed. For each j = 2,3, ... , n,
where n = length[A], we let tj be the number of times the while loop test in line 5 is
executed for that value of j.
We assume that comments are not executable statements, and so they take no time.
INSERTION-SORT(A)                                cost    times
1  for j ← 2 to length[A]                        c1      n
2      do key ← A[j]                             c2      n − 1
3         ▷ Insert A[j] into the sorted
             sequence A[1 .. j − 1]              0       n − 1
4         i ← j − 1                              c4      n − 1
5         while i > 0 and A[i] > key             c5      Σ_{j=2}^{n} tj
6             do A[i + 1] ← A[i]                 c6      Σ_{j=2}^{n} (tj − 1)
7                i ← i − 1                       c7      Σ_{j=2}^{n} (tj − 1)
8         A[i + 1] ← key                         c8      n − 1
The running time of the algorithm is the sum of running times for each statement
executed; a statement that takes Ci steps to execute and is executed n times will contribute Ci n
to the total running time. To compute T(n), the running time of INSERTION-SORT, we sum the
products of the cost and times columns, obtaining
T(n) = c1·n + c2(n − 1) + c4(n − 1) + c5 Σ_{j=2}^{n} tj + c6 Σ_{j=2}^{n} (tj − 1) + c7 Σ_{j=2}^{n} (tj − 1) + c8(n − 1)
Even for inputs of a given size, an algorithm's running time may depend on which input
of that size is given. For example, in INSERTION-SORT, the best case occurs if the array is
already sorted. For each j = 2,3, ... , n, we then find that A[i]≤key in line 5 when i has its initial
value of
j − 1. Thus tj = 1 for j = 2, 3, …, n, and the best-case running time is
T(n) = c1·n + c2(n − 1) + c4(n − 1) + c5(n − 1) + c8(n − 1)
     = (c1 + c2 + c4 + c5 + c8)·n − (c2 + c4 + c5 + c8)
This running time can be expressed as an+b for constants a and b that depend on the
statement costs ci. It is thus a linear function of n.
If the array is in reverse sorted order, the worst case results. We must compare each element A[j]
with each element in the entire sorted sub array A[1..j−1], and so tj = j for j = 2, 3, …, n.
Σ_{j=2}^{n} j = n(n + 1)/2 − 1
and
Σ_{j=2}^{n} (j − 1) = n(n − 1)/2
T(n) = c1·n + c2(n − 1) + c4(n − 1) + c5(n(n + 1)/2 − 1) + c6(n(n − 1)/2) + c7(n(n − 1)/2) + c8(n − 1)
     = (c5/2 + c6/2 + c7/2)·n² + (c1 + c2 + c4 + c5/2 − c6/2 − c7/2 + c8)·n − (c2 + c4 + c5 + c8)
This worst case running time can be expressed as an² + bn + c for constants a, b, c that
again depend on the statement costs ci; it is thus a quadratic function of n.
Worst case and Average case Analysis
In the analysis of Insertion sort, the best case in which the input array was already sorted,
and the worst case, in which the input array was reverse sorted. To find only the worst case
running time, that is, the longest running time for any input of size n. Three reasons are
The worst case running time of an algorithm is an upper bound on the running time for
any input.
Knowing it gives us the guarantee that the algorithm will never take any longer. For some
algorithms, the worst case occurs fairly often. For example, in searching a database for a
particular piece of information, the searching algorithm worst case will often occur when
the information is not present in the database.
The average case is often roughly as bad as the worst case. Suppose that we randomly
choose n numbers and apply insertion sort. How long does it take to determine where in sub
array A[1..j−1] to insert element A[j]? On average, half the elements in A[1..j−1] are less
than A[j], and half the elements are greater. On average, we check half of the sub array
A[1..j−1], so tj = j/2. The resulting average case running time turns out to be a quadratic
function of the input size, just like the worst case running time.
One problem with performing an average case analysis, however is that it may not be
apparent what constitutes an average input for a particular problem.
DESIGNING ALGORITHMS
There are many ways to design algorithms. Insertion sort uses an incremental
approach: having sorted the sub array A[1 .. j − 1], we insert the single element
A[j] into its proper place, yielding the sorted sub array A[1 .. j].
In this section, we examine an alternative design approach, known as "divide-
and-conquer."
We shall use divide-and-conquer to design a sorting algorithm whose worst-case
running time is much less than that of insertion sort.
One advantage of divide-and-conquer algorithms is that their running times are
often easily determined using standard techniques for solving recurrences.
Divide and Conquer approach:
Many useful algorithms are recursive in structure: to solve a given problem, they call
themselves recursively one or more times to deal with closely related sub problems.
These algorithms typically follow a divide-and-conquer approach: they break the problem
into several sub problems that are similar to the original problem but smaller in size, solve the
sub problems recursively, and then combine these solutions to create a solution to the original
problem.
The divide-and-conquer paradigm involves three steps at each level of the recursion:
Divide the problem into a number of sub problems.
Conquer the sub problems by solving them recursively. If the sub problem sizes are small
enough, however, just solve the sub problems in a straightforward manner.
Combine the solutions to the sub problems into the solution for the original problem.
EXAMPLE - MERGE SORT
The merge sort algorithm closely follows the divide-and-conquer paradigm. Intuitively, it
operates as follows.
Divide: Divide the n-element sequence to be sorted into two subsequences of n/2 elements each.
Conquer: Sort the two subsequences recursively using merge sort.
Combine: Merge the two sorted subsequences to produce the sorted answer.
We note that the recursion "bottoms out" when the sequence to be sorted has length 1, in
which case there is no work to be done, since every sequence of length 1 is already in sorted
order.
The key operation of the merge sort algorithm is the merging of two sorted sequences in
the "combine" step. To perform the merging, we use an auxiliary procedure MERGE(A, p, q, r),
where A is an array and p, q, and r are indices numbering elements of the array such that p ≤ q <
r. The procedure assumes that the sub arrays A[p .. q] and A[q + 1 .. r] are in sorted order. It
merges them to form a single sorted sub array that replaces the current sub array A[p .. r].
Although we leave the pseudo code as an exercise, it is easy to imagine a MERGE procedure that
takes time Θ(n), where n = r − p + 1 is the number of elements being merged. Returning to our
card playing motif, suppose we have two piles of cards face up on a table. Each pile is sorted,
with the smallest cards on top. We wish to merge the two piles into a single sorted output pile,
which is to be face down on the table: repeatedly choose the smaller of the two top cards,
remove it from its input pile, and place it face down onto the output pile.
Computationally, each basic step takes constant time, since we are checking just two top cards.
Since we perform at most n basic steps, merging takes Θ(n) time.
We can now use the MERGE procedure as a subroutine in the merge sort algorithm. The
procedure MERGE-SORT(A, p, r) sorts the elements in the sub array A[p .. r]. If p ≥ r, the sub
array has at most one element and is therefore already sorted. Otherwise, the divide step simply
computes an index q that partitions A[p .. r] into two sub arrays: A[p .. q], containing ⌈n/2⌉
elements, and A[q + 1 .. r], containing ⌊n/2⌋ elements.
MERGE-SORT(A, p, r)
1  if p < r
2      then q ← ⌊(p + r)/2⌋
3           MERGE-SORT(A, p, q)
4           MERGE-SORT(A, q + 1, r)
5           MERGE(A, p, q, r)
To sort the entire sequence A = (A[1], A[2], …, A[n]), we call MERGE-SORT(A, 1, length[A]),
where once again length[A] = n.
If we look at the operation of the procedure bottom-up when n is a power of two, the
algorithm consists of merging pairs of 1-item sequences to form sorted sequences of length 2,
merging pairs of sequences of length 2 to form sorted sequences of length 4, and so on, until two
sequences of length n/2 are merged to form the final sorted sequence of length n.
When an algorithm contains a recursive call to itself, its running time can often be
described by a recurrence equation or recurrence, which describes the overall running
time on a problem of size n in terms of the running time on smaller inputs. We can then use
mathematical tools to solve the recurrence and provide bounds on the performance of the
algorithm.
A recurrence for the running time of a divide-and-conquer algorithm is based on the three
steps of the basic paradigm. As before, we let T(n) be the running time on a problem of size n.
If the problem size is small enough, say n ≤ c for some constant c, the straightforward
solution takes constant time, which we write as Θ(1). Suppose we divide the problem into a sub
problems, each of which is 1/b the size of the original.
If we take D(n) time to divide the problem into sub problems and C(n) time to combine
the solutions to the sub problems into the solution to the original problem, we get the recurrence
T(n) = Θ(1) if n ≤ c, and T(n) = a·T(n/b) + D(n) + C(n) otherwise.
BEST CASE, WORST CASE AND AVG. CASE EFFICIENCIES
Time efficiency – function in terms of n (input size)
For some algorithms, the running time depends not only on the input size n but also on the
individual elements, e.g. linear search; for these we consider worst case, best case and
average case efficiency.
We will mainly focus on worst-case analysis, but sometimes it is useful to do average one.
Worst- / average- / best-case
Worst-case running time of an algorithm
– The longest running time for any input of size n
– An upper bound on the running time for any input
– Guarantee that the algorithm will never take longer
– Sequential search for an item which is not present, or present at the end of the list
– Sorting a set of numbers in increasing order when the data is in decreasing order
– The worst case can occur fairly often
– Often roughly matches the average (expected) running time
Best-case running time
– The fewest instructions are executed; the shortest running time for any input of size n
– Sequential search for an item which is present at the beginning of the list
– Sorting a set of numbers in increasing order when the data is already in
increasing order
Average-case running time
– May be difficult to define what "average" means, but gives the necessary
details about an algorithm's behavior on a typical/random input.
EXAMPLE: Sequential Search
A sequential search steps through the data sequentially until a match is found.
A sequential search is useful when the array is not sorted.
The basic operation (comparison) count:
1. Best case input: c(n) = 1
2. Worst case input: c(n) = n
Unsuccessful search: n comparisons
Successful search (worst case): n comparisons
3. Average case input
Here the basic operation count is calculated as follows.
Assumptions:
a) The probability of a successful search is p (0 ≤ p ≤ 1)
b) The probability of the first match occurring in the ith position of the list is the same for every i,
namely p/n, and the number of comparisons made by the algorithm in such a
situation is i.
c) In case of an unsuccessful search, the number of comparisons made is n, and the probability of
such a search is (1 − p).
So c(n) = [1·(p/n) + 2·(p/n) + … + i·(p/n) + … + n·(p/n)] + n·(1 − p)
        = (p/n)(1 + 2 + … + i + … + n) + n(1 − p)
        = (p/n)·n(n + 1)/2 + n(1 − p)
c(n) = p(n + 1)/2 + n(1 − p)
For a successful search, p = 1 and c(n) = (n + 1)/2.
For an unsuccessful search, p = 0 and c(n) = n.
III. ANALYSIS OF ALGORITHM USING DATA STRUCTURES
The analysis of an algorithm considers both qualitative and quantitative aspects, to obtain a
solution that is economical in the use of computing and human resources and that improves the
performance of the algorithm. A good algorithm usually possesses the following qualities and
capabilities.
They are simple but powerful and general solutions
They are user friendly
They can be easily updated
They are correct
They are able to be understood on a number of levels
They are economical in the use of computer time, storage and peripherals
They are machine independent, not tied to a particular computer
They can be used as subprocedures for other problems
The solution is pleasing and satisfying to its designer
IV. COMPUTATIONAL COMPLEXITY
Space Complexity
The space complexity of an algorithm is the amount of memory it needs to run to
completion
[Core dumps: the most often encountered cause is "memory leaks", where the amount of
memory required is larger than the memory available on a given system]
Some algorithms may be more efficient if data is completely loaded into memory
1. Need to look also at system limitations
2. E.g. classify 2GB of text into various categories [politics, tourism, sport, natural disasters,
etc.]: can I afford to load the entire collection?
Time Complexity
The time complexity of an algorithm is the amount of time it needs to run to completion
Often more important than space complexity
1. space available (for computer programs!) tends to be larger and larger
2. time is still a problem for all of us
An algorithm's running time is an important issue
Space Complexity
The Space needed by each algorithm is the sum of the following components:
1. Instruction space
2. Data space
3. Environment stack space
Instruction space
The space needed to store the compiled version of the program instructions
Data space
The space needed to store all constant and variable values
Environment stack space
The space needed to store information to resume execution of partially completed
functions
The total space needed by an algorithm can be simply divided into two parts from the 3
components of space complexity
1. Fixed part
2. Variable part
Fixed part
A fixed part space is independent of the characteristics of the inputs and outputs. This
part typically includes the instruction space, space for simple variables and fixed size component
variables, space for constants and so on
e.g. name of the data collection
same size for classifying 2GB or 1MB of texts
Variable part
A variable part space needed by component variables whose, size is dependent on the
particular problem instance being solved, the space needed by referenced variables and the
recursion stack space.
e.g. actual text
load 2GB of text VS. load 1MB of text
The space requirement S(P) of any algorithm or program P may be written as:
S(P)=C+Sp (instance characteristics)
C= Constant that denotes the fixed part of the space requirement
Sp= Variable Component depends on the magnitude of the inputs to and outputs from the
algorithm.
Example
float sum(float a[], int n)
{
    float s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}
Space: one word for n, one for a [the array is passed by reference!], one for i, one for s: constant space!
When memory was expensive we focused on making programs as space efficient as
possible and developed schemes to make memory appear larger than it really was (virtual
memory and memory paging schemes). Space complexity is still important in the field of
embedded computing.
Time Complexity
The time T(P) taken by a program P is T(P) = compile time + run (or execution) time.
Compile time
It does not depend on the instance characteristics. We assume that a compiled program
will be run several times without recompilation
Run time
It depends on the instance characteristics denoted by tp. The tp(n) can be calculated by
the following form of expression
tp(n) = Ca·ADD(n) + Cs·SUB(n) + Cm·MUL(n) + Cd·DIV(n) + …
n = instance characteristics
Ca, Cs, Cm, Cd = time needed for addition, subtraction, multiplication and division
ADD, SUB, MUL, DIV= number of additions, subtractions, multiplications and divisions
performed for the program p on the instance characteristics n.
To find the value of tp(n) from the above expression is an impossible task, since the time
needed for each operation often depends on the numbers involved in the operation.
The value of tp(n) for any given n can be obtained experimentally, that is, the program
typed, compiled and run on a particular machine, the execution time is physically clocked, and
tp(n) is obtained.
The value of tp(n) depends on several factors, such as system load and the number of other
programs running on the computer at the time program P is run. To overcome this
disadvantage, we count only the program steps, where the time required by each step is relatively
independent of the instance characteristics.
A program step is defined as a syntactically or semantically meaningful segment of a program that
has an execution time that is independent of the instance characteristics.
The program statements are classified into three steps
1. Comments-Zero step
2. Assignment statement-One step
3. Iterative statement-finite number of steps.
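Applying this classification to the earlier sum routine gives a statement-by-statement step count. The totals below follow one common counting convention (loop control counted as n + 1 tests) and are an illustration, not the only valid count:

```c
/* Step count of the sum routine, statement by statement.
   Comments contribute zero steps; each assignment is one step. */
float sum(float a[], int n)
{
    float s = 0;                    /* 1 step                   */
    for (int i = 0; i < n; i++)     /* n + 1 loop-control steps */
        s += a[i];                  /* n steps                  */
    return s;                       /* 1 step                   */
}
/* Total: 1 + (n + 1) + n + 1 = 2n + 3 steps. */
```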
V. AMORTIZED ANALYSIS
In an amortized analysis, the time required to perform a sequence of data structure
operations is averaged over all the operations performed.
Amortized analysis can be used to show that the average cost of an operation is
small, if one averages over a sequence of operations, even though a single
operation might be expensive.
Amortized analysis differs from average case analysis in that probability is not
involved; an amortized analysis guarantees the average performance of each
operation in the worst case.
In the aggregate method of amortized analysis, we show that for all n, a sequence
of n operations takes worst-case time T(n) in total. In the worst case, the average
cost, or amortized cost, per operation is therefore T(n)/n. Note that this
amortized cost applies to each operation, even when there are several types of
operations in the sequence.
The other two methods we shall study in this chapter, the accounting method and
the potential method, may assign different amortized costs to different types of
operations.
Stack operations
In our first example of the aggregate method, we analyze stacks that have been
augmented with a new operation. An earlier section presented the two fundamental stack operations,
each of which takes O(1) time:
PUSH(S, x) pushes object x onto stack S.
Pop(S) pops the top of stack S and returns the popped object.
Since each of these operations runs in O(1) time, let us consider the cost of each to be 1.
The total cost of a sequence of n PUSH and POP operations is therefore n, and the actual running
time for n operations is therefore Θ(n).
The situation becomes more interesting if we add the stack operation MULTIPOP(S, k),
which removes the k top objects of stack S, or pops the entire stack if it contains fewer than k
objects.
In the following pseudo code, the operation STACK-EMPTY returns TRUE if there are
no objects currently on the stack, and FALSE otherwise.
MULTIPOP(S, k)
1  while not STACK-EMPTY(S) and k ≠ 0
2      do POP(S)
3         k ← k − 1
VI. ASYMPTOTIC NOTATION
Complexity analysis: rate at which storage or time grows as a function of the problem size
Asymptotic analysis: describes the inherent complexity of a program, independent of
machine and compiler
Idea: as problem size grows, the complexity can be described as a simple
proportionality to some known function.
A) Big Oh (O)-Upper Bound
This notation is used to define the worst case running time of an algorithm and concerned
with very large values of n.
f(n) = O(g(n)) iff f(n) ≤ c·g(n)
for some constants c and n0, and all n ≥ n0
B) Big Omega (Ω )-Lower Bound
This notation is used to describe the best case running time of algorithms and concerned
with large values of n
f(n) = Ω(g(n)) iff f(n) ≥ c·g(n)
for some constants c and n0, and all n ≥ n0
C) Big Theta (Θ)-Two-way Bound
This notation is used to describe the average case running time of algorithms and
concerned with very large values of n
f(n) = Θ(g(n)) iff c1·g(n) ≤ f(n) ≤ c2·g(n)
for some constants c1, c2 and n0, and all n ≥ n0
D) Little Oh (o)-Strict Upper Bound
This notation describes an upper bound that is not asymptotically tight
f(n) = o(g(n)) iff
f(n) = O(g(n)) and f(n) ≠ Ω(g(n))
To compare and rank orders of growth of algorithms, three notations are used: O, Ω and Θ.
Informal definitions:
Let t(n) and g(n) be any nonnegative functions defined on the set of natural numbers, where
t(n) is the algorithm's running time and
g(n) is a simple function to compare the count with.
i) O(g(n)) is the set of all fns with a smaller or same order of growth as g(n)
Eg.
n ∈ O(n²)              n³ ∉ O(n²)
100n + 5 ∈ O(n²)       n⁴ + n + 1 ∉ O(n²)
100n + 5 ∈ O(n)
(1/2)n(n − 1) ∈ O(n²)
ii) Ω(g(n)) stands for set of all fns with a larger or same order of growth as g(n)
n³ ∈ Ω(n²)
(1/2)n(n − 1) ∈ Ω(n²)
100n + 5 ∉ Ω(n²)
iii) Θ(g(n)) stands for the set of all functions with the same order of growth as g(n)
n³ ∉ Θ(n²)
an² + bn + c ∈ Θ(n²)
100n + 5 ∈ Θ(n)
Big Oh
f(N) = O(g(N))
There are positive constants c and n0 such that
o f(N) ≤ c·g(N) when N ≥ n0
The growth rate of f(N) is less than or equal to the growth rate of g(N)
g(N) is an upper bound on f(N)
o We write f(n) = O(g(n)) if there are positive constants n0 and c such that to the
right of n0, the value of f(n) always lies on or below c·g(n).
Meaning: For all data sets big enough (i.e., n > n0), the algorithm always executes in fewer than
c·g(n) steps in [best, average, worst] case.
The idea is to establish a relative order among functions for large n:
∃ c, n0 > 0 such that f(N) ≤ c·g(N) when N ≥ n0
f(N) grows no faster than g(N) for "large" N
Big O Rules
• If is f(n) a polynomial of degree d, then f(n) is
• O(d), i.e.,
1. Drop lower-order terms
2. Drop constant factors
• Use the smallest possible class of functions
Say "2n is O(n)" instead of "2n is O(n²)"
• Use the simplest expression of the class
Say "3n + 5 is O(n)" instead of "3n + 5 is O(3n)"
Big-Oh: example
• Let f(N) = 2N². Then
– f(N) = O(N⁴)
– f(N) = O(N³)
– f(N) = O(N²) (best answer, asymptotically tight)
• N²/2 − 3N = O(N²)
• 1 + 4N = O(N)
• 7N² + 10N + 3 = O(N²) = O(N³)
Big-Omega
• f(N) = Ω(g(N))
• There are positive constants c and n0 such that
f(N) ≥ c g(N) when N ≥ n0
• The growth rate of f(N) is greater than or equal to the growth rate of g(N).
• There exist c, n0 > 0 such that f(N) ≥ c g(N) when N ≥ n0
• f(N) grows no slower than g(N) for "large" N
Big-Omega: example
• Let f(N) = 2N². Then
– f(N) = Ω(N)
– f(N) = Ω(N²) (best answer)
Big-Theta
• f(N) = Θ(g(N)) iff
f(N) = O(g(N)) and f(N) = Ω(g(N))
• The growth rate of f(N) equals the growth rate of g(N)
• f(n) is Θ(g(n)) if there are constants c′ > 0 and c″ > 0 and an integer constant n0 ≥ 1
such that c′·g(n) ≤ f(n) ≤ c″·g(n) for n ≥ n0
• Big-Theta means the bound is the tightest possible: the growth rate of f(N) is the same as
the growth rate of g(N)
Big-Theta rules
• Example: let f(N) = N², g(N) = 2N².
Since f(N) = O(g(N)) and f(N) = Ω(g(N)),
f(N) = Θ(g(N)).
• If T(N) is a polynomial of degree k, then
T(N) = Θ(N^k).
• For logarithmic functions,
T(log_m N) = Θ(log N)
Mathematical Expression    Relative Rates of Growth
T(n) = O(F(n))             growth of T(n) ≤ growth of F(n)
T(n) = Ω(F(n))             growth of T(n) ≥ growth of F(n)
T(n) = Θ(F(n))             growth of T(n) = growth of F(n)
T(n) = o(F(n))             growth of T(n) < growth of F(n)
Computation of step count using asymptotic notation
Asymptotic complexity can be determined without computing the exact step count. This is
done by first determining the asymptotic complexity of each statement in the algorithm and
then summing these complexities to obtain the total.
Question Bank
UNIT I - PROBLEM SOLVING
PART – A (2 MARKS)
1. Define Modularity.
2. What do you mean by top down design?
3. What is meant by algorithm? What are its measures?
4. Give any four algorithmic techniques.
5. Write an algorithm to find the factorial of a given number
6. List the types of control structures
7. Define the top down design strategy
8. Define the worst case & average case complexities of an algorithm
9. What is meant by modular approach?
10. What is divide & conquer strategy?
11. What is dynamic programming?
12. What is program testing?
13. Define program verification
14. What is input/output assertion?
15. Define symbolic execution
16. Write the steps to verify a program segment with loops
17. What is CPU time?
18. Write at least five qualities & capabilities of a good algorithm
19. Write an algorithm to exchange the values of two variables
20. Write an algorithm to find N factorial (written as n!) where n >= 0.
PART – B (16 MARKS)
1. Explain Top down design in detail.
2. (a) Explain in detail the types of analysis that can be performed on an algorithm (8)
(b) Write an algorithm to perform matrix multiplication and analyze the same (8)
3. Design an algorithm to evaluate the function sin(x) as defined by the infinite series
expansion sin(x) = x/1! − x³/3! + x⁵/5! − x⁷/7! + ...
4. Write an algorithm to generate and print the first n terms of the Fibonacci series where
n >= 1; the first few terms are 0, 1, 1, 2, 3, 5, 8, 13.
5. Design an algorithm that accepts a positive integer and reverses the order of its digits.
6. Explain the base conversion algorithm to convert a decimal integer to its corresponding
octal representation
UNIT II - FUNDAMENTALS OF DATA STRUCTURES
Arrays – Structures – Stacks – Definition and examples – Representing Stacks –Queues and Lists
– Queue and its Representation – Applications of Stack – Queue and Linked Lists.
Unit: II - FUNDAMENTALS OF DATA STRUCTURES
I. ARRAYS
An array is a finite ordered set of homogeneous elements. The array size may be large or
small, but it must exist, and an array must contain a collection of elements of the same data type.
Array declaration in C is given below.
int a[100];
Here the array name is 'a', and the size is 100. Each element is represented by its index,
which starts from 0. For example, the 1st element's index is 0, the 2nd element's index is 1,
and the 100th element's index is 99.
The two basic operations that access an array are extraction and storing. The extraction
operation is a function that accepts an array name and an index and returns the element at that
index. The storing operation of a value x at index i is,
a[i] = x;
The smallest index of an array is called its lower bound, and in C it is always 0; the highest
index is called the upper bound. The number of elements in an array is called its range.
If the lower bound is represented by "lower" and the upper bound by "upper", then
range = upper − lower + 1. For example, for array a, the lower bound is 0, the upper bound is
99, and the range is 100.
An important feature of a C array is that neither the upper bound nor the lower bound may
be changed during a program's execution. The lower bound is always fixed at 0, and the upper
bound is fixed at the time the program is written.
One very useful technique is to declare a bound as a constant identifier, so that the work
required to modify the size of an array is minimized. For example, consider the following program
int a[100];
for(i = 0; i <100; a[i++] = 0);
To change the array to a larger (or smaller) size, the constant 100 must be changed in two
places, once in the declaration and once in the for statement. Consider the following equivalent
alternative,
#define NUMELTS 100
int a[NUMELTS];
for(i = 0; i < NUMELTS; a[i++] = 0);
Now only a single change in the constant definition is needed to change the upper bound.
One Dimensional Array
A one-dimensional array is used when it is necessary to keep a larger number of items in
memory and reference all the items in a uniform manner. Consider an application to read 100
integers, and find its average.
#define NUMELTS 100
void aver( void )
{
    int num[NUMELTS];
    int i;
    int total;
    float avg;

    total = 0;
    for(i = 0; i < NUMELTS; i++)
    {
        scanf("%d", &num[i]);
        total += num[i];
    }
    avg = (float) total / NUMELTS;
    printf("Average = %f", avg);
}
The declaration (int num[NUMELTS];) reserves 100 successive memory locations, each large
enough to contain a single integer. The address of the first of these locations is called the base
address of the array num.
Two-Dimensional array
A two-dimensional array is an array of arrays. For example,
int a[3][5];
This represents an array containing three elements. Each of these elements is itself an array
containing five elements (3 rows of 5 columns).
A total of 15 (3 × 5) elements can be stored in this array. Each element can be accessed by
its corresponding row index and column index. For example, to access the first cell in the
second row, use a[1][0]; likewise, to access the second cell in the third row, use a[2][1].
Nested looping statements are used to access each element efficiently; sample code to read
values and fill the array is given below.
for(i = 0; i < 3; i++)
    for(j = 0; j < 5; j++)
        scanf("%d", &a[i][j]);
Multi-Dimensional array
C also allows developers to declare arrays of more than two dimensions. A three-dimensional
array declaration is given below.
int a[3][2][5];
This can be accessed using three nested looping statements. Developers can also declare
arrays of more than three dimensions.
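As a sketch of the three-nested-loop access pattern described above (the function name
fill_3d and the row-major fill values are illustrative assumptions, not from the text):

```c
/* Fills a 3 x 2 x 5 array with three nested loops, the access pattern
   the text describes, and returns the number of elements visited. */
int fill_3d(int a[3][2][5])
{
    int i, j, k, count = 0;
    for (i = 0; i < 3; i++)
        for (j = 0; j < 2; j++)
            for (k = 0; k < 5; k++) {
                a[i][j][k] = count;   /* visited in row-major order */
                count++;
            }
    return count;
}
```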
II. STRUCTURES
A structure is a group of items in which each item is identified by its own identifier. In some
programming languages, a structure is called a "record" and a member is called a "field".
Consider the following structure declaration,
struct
{
    char first[10];
    char midinit;
    char last[20];
} sname, ename;
This declaration creates two structure variables, sname and ename, each of which contains
three members: first, midinit and last. Two of the members are character strings, and one is a
single character. The structure can also be declared in another format, given below
struct nametype
{
    char first[10];
    char midinit;
    char last[20];
};
struct nametype sname, ename;
The above definition creates a structure tag nametype containing three members. Once a
structure tag has been defined, the variables sname and ename can be declared. An alternative
method of assigning a structure tag is the typedef definition in C, given below
typedef struct
{
    char first[10];
    char midinit;
    char last[20];
} nametype;
nametype sname, ename;
Structure variable sname contains three members, and ename contains its own separate three
members. Each member of a structure variable can be accessed using the dot (.) operator.
Consider the structure given below.
struct data
{
    int a;
    float b;
    char c;
};

int main()
{
    struct data x, y;
    printf("\nEnter the values for the first variable\n");
    scanf("%d%f%c", &x.a, &x.b, &x.c);
    printf("\nEnter the values for the second variable\n");
    scanf("%d%f%c", &y.a, &y.b, &y.c);
    return 0;
}
A structure variable can also be an array. A looping statement is used to read input into the
structure array. Sample code is given below,
int main()
{
    struct data x[5];
    int i;
    for(i = 0; i < 5; i++)
    {
        printf("\nEnter the values for variable %d\n", (i + 1));
        scanf("%d%f%c", &x[i].a, &x[i].b, &x[i].c);
    }
    return 0;
}
STACK AND QUEUE
Stacks and queues are used to represent sequences of elements which can be modified by
insertion and deletion. Both stacks and queues can be implemented efficiently as arrays or as
linked lists.
III. STACK
A stack is a list with the restriction that inserts and deletes can be performed in only one
position, namely the end of the list called the top. The fundamental operations on a stack
are push, which is equivalent to an insert, and pop, which deletes the most recently inserted
element.
The most recently inserted element can be examined prior to performing a pop by use of
the top routine. Stacks are often used in processing tree-structured objects, in compilers (in
processing nested structures), and in systems to implement recursion. Stacks are also known as
LIFO (last in, first out) lists.
Stack model
(Stack model: only the top element is accessible)
REPRESENTATION OF STACK
A) Implementation of Stack using array
A stack K is most easily represented by a conceptually infinite array K[0], K[1], K[2], … and
an index TOP of type integer. The stack K consists of the elements K[0], …, K[TOP]; the
element at index TOP (K[TOP]) is the top element of the stack. Insertion of an element is
called push and deletion is called pop. The following code explains the operation push(K, a)
TOP = TOP + 1;
K[TOP] = a;
The following code explains the operation pop(K)
if (TOP < 0) then
    error;
else
    x = K[TOP];
    TOP = TOP - 1;
end if
An infinite array is not available in practice, so a finite array of size n is used. In this case a
push operation must check whether overflow occurs.
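The pseudocode above, extended with the overflow and underflow checks a finite array
requires, might look like the following C sketch (the struct name, MAXSIZE limit, and the
0/1 return convention are assumptions, not from the text):

```c
#define MAXSIZE 100

/* A minimal bounded array stack; returns 0 on overflow/underflow. */
struct stack {
    int items[MAXSIZE];
    int top;            /* index of the top element, -1 when empty */
};

void stack_init(struct stack *s) { s->top = -1; }

int stack_push(struct stack *s, int x)
{
    if (s->top >= MAXSIZE - 1)
        return 0;       /* overflow: the finite array is full */
    s->items[++s->top] = x;
    return 1;
}

int stack_pop(struct stack *s, int *x)
{
    if (s->top < 0)
        return 0;       /* underflow: nothing to pop */
    *x = s->items[s->top--];
    return 1;
}
```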
B) Linked List Implementation of Stacks
Stack can be implemented using a singly linked list. We perform a push by inserting at
the front of the list. We perform a pop by deleting the element at the front of the list.
A top operation merely examines the element at the front of the list, returning its
value. Sometimes the pop and top operations are combined into one. Structure definition is
given below,
typedef struct node *node_ptr;
struct node
{
    element_type element;
    node_ptr next;
};
typedef node_ptr STACK;
Routine to test whether a stack is empty-linked list implementation is given below,
int is_empty( STACK S )
{
    return( S->next == NULL );
}
We merely create a header node; make_null sets the next pointer to NULL. Routines to
create an empty stack (linked list implementation) are given below,
STACK create_stack( void )
{
    STACK S;
    S = (STACK) malloc( sizeof( struct node ) );
    if( S == NULL )
        fatal_error("Out of space!!!");
    return S;
}

void make_null( STACK S )
{
    if( S != NULL )
        S->next = NULL;
    else
        error("Must use create_stack first");
}
The push is implemented as an insertion into the front of a linked list, where the front of the list
serves as the top of the stack. Routine to push onto a stack-linked list implementation is given
below,
void push( element_type x, STACK S )
{
    node_ptr tmp_cell;
    tmp_cell = (node_ptr) malloc( sizeof( struct node ) );
    if( tmp_cell == NULL )
        fatal_error("Out of space!!!");
    else
    {
        tmp_cell->element = x;
        tmp_cell->next = S->next;
        S->next = tmp_cell;
    }
}
The top is performed by examining the element in the first position of the list. Routine to return
top element in a stack--linked list implementation is given below,
element_type top( STACK S )
{
    if( is_empty( S ) )
        error("Empty stack");
    else
        return S->next->element;
}
Routine to pop from a stack--linked list implementation is given below,
void pop( STACK S )
{
    node_ptr first_cell;
    if( is_empty( S ) )
        error("Empty stack");
    else
    {
        first_cell = S->next;
        S->next = S->next->next;
        free( first_cell );
    }
}
One problem that affects the efficiency of implementing stacks is error testing. Our linked
list implementation carefully checked for errors; in an unchecked array implementation, a pop
on an empty stack or a push on a full stack would overflow the array bounds and cause a crash.
APPLICATIONS OF STACKS
A) Balancing Symbols
Every closing brace, bracket, and parenthesis must correspond to its opening counterpart.
The sequence [()] is legal, but [(]) is not. It is easy to check these things using a stack: just
check for balancing of parentheses, brackets, and braces, and ignore any other character that
appears.
Make an empty stack. Read characters until end of file. If the character is an opening
symbol, push it onto the stack. If it is a closing symbol and the stack is empty, report an error;
otherwise, pop the stack, and if the symbol popped is not the corresponding opening symbol,
report an error. At end of file, if the stack is not empty, report an error.
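The procedure just described can be sketched in C as follows; the function name is_balanced,
the fixed MAXDEPTH stack limit, and the 0/1 return convention are assumptions for
illustration:

```c
#define MAXDEPTH 256

/* Returns the opening symbol matching a closing one, or 0. */
static char matching(char close)
{
    switch (close) {
    case ')': return '(';
    case ']': return '[';
    case '}': return '{';
    default:  return 0;
    }
}

/* Returns 1 if every (, [, { in s is properly balanced, else 0.
   All other characters are ignored, as the text describes. */
int is_balanced(const char *s)
{
    char stack[MAXDEPTH];
    int top = -1;

    for (; *s; s++) {
        if (*s == '(' || *s == '[' || *s == '{') {
            if (top >= MAXDEPTH - 1)
                return 0;                       /* nesting too deep */
            stack[++top] = *s;                  /* push opener */
        } else if (*s == ')' || *s == ']' || *s == '}') {
            if (top < 0 || stack[top--] != matching(*s))
                return 0;                       /* empty stack or mismatch */
        }
    }
    return top == -1;   /* balanced only if the stack is empty at the end */
}
```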
B) Postfix Expressions
Suppose we have a pocket calculator and would like to compute the cost of a shopping trip.
To do so, we add a list of numbers and multiply the result by 1.06; this computes the purchase
price of some items with local sales tax added. If the items are 4.99, 5.99, and 6.99, then a
natural way to enter this would be the sequence
4.99 + 5.99 + 6.99 * 1.06 =
Depending on the calculator, this produces either the intended answer, 19.05, or the scientific
answer, 18.39. Most simple four-function calculators will give the first answer, but better
calculators know that multiplication has higher precedence than addition.
On the other hand, some items are taxable and some are not, so if only the first and last items
were actually taxable, then the sequence
4.99 * 1.06 + 5.99 + 6.99 * 1.06 =
would give the correct answer (18.69) on a scientific calculator and the wrong answer (19.37)
on a simple calculator. A scientific calculator generally comes with parentheses, so we can
always get the right answer by parenthesizing, but with a simple calculator we need to remember
intermediate results.
A typical evaluation sequence for this example might be to multiply 4.99 and 1.06, saving this
answer as a1. We then add 5.99 and a1, saving the result in a1. We multiply 6.99 and 1.06, saving
the answer in a2, and finish by adding al and a2, leaving the final answer in al. We can write this
sequence of operations as follows:
4.99 1.06 * 5.99 + 6.99 1.06 * +
This notation is known as postfix or reverse Polish notation. For instance, the postfix
expression
6 5 2 3 + 8 * + 3 + *
is evaluated as follows: the first four symbols are placed on the stack. The resulting stack is
Next a '+' is read, so 3 and 2 are popped from the stack and their sum, 5, is pushed.
Next 8 is pushed.
Now a '*' is seen, so 8 and 5 are popped and 5 * 8 = 40 is pushed.
Next a '+' is seen, so 40 and 5 are popped and 40 + 5 = 45 is pushed.
Now, 3 is pushed.
Next '+' pops 3 and 45 and pushes 45 + 3 = 48.
Finally, a '*' is seen, 48 and 6 are popped, and the result 6 * 48 = 288 is pushed.
The time to evaluate a postfix expression is O(n), because processing each element in the input
consists of stack operations and thus takes constant time. The algorithm to do so is very simple.
Notice that when an expression is given in postfix notation, there is no need to know any
precedence rules; this is an obvious advantage.
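A minimal sketch of this evaluation in C, restricted to '+' and '*' over space-separated
non-negative integers (the function name, the fixed stack size, and the absence of error
checking are simplifying assumptions):

```c
#include <stdlib.h>
#include <ctype.h>

#define EVAL_MAX 64

long eval_postfix(const char *expr)
{
    long stack[EVAL_MAX];
    int top = -1;
    const char *p = expr;

    while (*p) {
        if (isdigit((unsigned char)*p)) {
            char *end;
            stack[++top] = strtol(p, &end, 10);  /* push the number */
            p = end;
        } else if (*p == '+' || *p == '*') {
            long b = stack[top--];               /* pop two operands */
            long a = stack[top--];
            stack[++top] = (*p == '+') ? a + b : a * b;
            p++;
        } else {
            p++;                                 /* skip spaces */
        }
    }
    return stack[top];   /* the final result is the only element left */
}
```

Applied to the expression worked through above, the function follows exactly the same stack
trace.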
C) Infix to Postfix Conversion
Not only can a stack be used to evaluate a postfix expression, but we can also use a stack to
convert an expression in standard form (otherwise known as infix) into postfix. Suppose we want
to convert the infix expression
a + b * c + ( d * e + f ) * g
into postfix. A correct answer is a b c * + d e * f + g * +.
When an operand is read, it is immediately placed onto the output. Operators are not
immediately output, so they must be saved somewhere. The correct thing to do is to place
operators that have been seen, but not placed on the output, onto the stack. We will also stack left
parentheses when they are encountered. We start with an initially empty stack.
If we see a right parenthesis, then we pop the stack, writing symbols until we encounter a
(corresponding) left parenthesis, which is popped but not output.
If we see any other symbol ('+', '*', '('), then we pop entries from the stack until we find an
entry of lower priority. One exception is that we never remove a '(' from the stack except when
processing a ')'. For the purposes of this operation, '+' has lowest priority and '(' highest. When
the popping is done, we push the operator onto the stack.
Finally, if we read the end of input, we pop the stack until it is empty, writing symbols onto
the output.
To see how this algorithm performs, we will convert the infix expression above into its
postfix form. First, the symbol a is read, so it is passed through to the output. Then '+' is read
and pushed onto the stack. Next b is read and passed through to the output. The state of affairs at this
juncture is as follows:
Next a '*' is read. The top entry on the operator stack has lower precedence than '*', so nothing
is output and '*' is put on the stack. Next, c is read and output. Thus far, we have
The next symbol is a '+'. Checking the stack, we find that we will pop the '*' and place it on
the output, then pop the other '+' (which is of equal, not lower, priority) and output it, and
finally push the new '+'.
The next symbol read is an '(', which, being of highest precedence, is placed on the stack.
Then d is read and output.
We continue by reading a '*'. Since open parentheses do not get removed except when a
closed parenthesis is being processed, there is no output. Next, e is read and output.
The next symbol read is a '+'. We pop and output '*' and then push '+'. Then we read and
output f.
Now we read a ')', so the stack is emptied back to the '('. We output a '+'.
We read a '*' next; it is pushed onto the stack. Then g is read and output.
The input is now empty, so we pop and output symbols from the stack until it is empty.
As before, this conversion requires only O(n) time and works in one pass through the input.
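The conversion rules above can be sketched for single-letter operands and the operators '+',
'*', and parentheses (the prec helper, the buffer sizes, and the assumption that the output
buffer is large enough are all illustrative, not from the text):

```c
/* Stack precedence: '*' binds tighter than '+'; '(' never pops anything. */
static int prec(char op)
{
    if (op == '*') return 2;
    if (op == '+') return 1;
    return 0;            /* '(' has lowest stack precedence */
}

void infix_to_postfix(const char *in, char *out)
{
    char stack[128];
    int top = -1, n = 0;

    for (; *in; in++) {
        char c = *in;
        if (c == ' ')
            continue;
        if (c >= 'a' && c <= 'z') {
            out[n++] = c;                    /* operands go straight out */
        } else if (c == '(') {
            stack[++top] = c;                /* always push '(' */
        } else if (c == ')') {
            while (top >= 0 && stack[top] != '(')
                out[n++] = stack[top--];     /* empty back to the '(' */
            top--;                           /* discard the '(' itself */
        } else {                             /* '+' or '*' */
            while (top >= 0 && prec(stack[top]) >= prec(c))
                out[n++] = stack[top--];     /* pop entries of >= priority */
            stack[++top] = c;
        }
    }
    while (top >= 0)
        out[n++] = stack[top--];             /* flush remaining operators */
    out[n] = '\0';
}
```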
IV) QUEUE
A queue supports insertions (called enqueues) at one end (called the tail or rear) and
deletions (called dequeues) from the other end (called the head or front). Queues are used in
operating systems and networking to store a list of items that are waiting for some resource.
Queues are also known as FIFO (first in, first out) lists.
Model of a queue
ARRAY IMPLEMENTATION OF QUEUES
Both the linked list and array implementations give fast O(1) running times for every
operation. Array implementation of queue is given below
For each queue data structure, keep an array, QUEUE[], and the positions q_front and
q_rear, which represent the ends of the queue.
Keep track of the number of elements that are actually in the queue, q_size. The cells
that are blanks have undefined values in them.
In particular, the first two cells have elements that used to be in the queue.
To enqueue an element x, increment q_size and q_rear, then set QUEUE[q_rear] = x. To
dequeue an element, set the return value to QUEUE[q_front], decrement q_size, and then
increment q_front.
There is one potential problem with this implementation. After 10 enqueues, the queue
appears to be full, since q_rear is now 10, and the next enqueue would be in a nonexistent
position.
However, there might only be a few elements in the queue, because several elements may
have already been dequeued.
The simple solution is that whenever q_front or q_rear gets to the end of the array, it is
wrapped around to the beginning. The following figure shows the queue during some operations.
This is known as a circular array implementation.
There are two warnings about the circular array implementation of queues. First, it is
important to check the queue for emptiness, because a dequeue when the queue is empty will
return an undefined value, silently.
Secondly, some programmers use different ways of representing the front and rear of a
queue. For instance, some do not use an entry to keep track of the size, because they rely on the
base case that when the queue is empty, q_rear = q_front - 1. The size is computed implicitly by
comparing q_rear and q_front. This is a very tricky way to go, because there are some special
cases, so be very careful if you need to modify code written this way. If the size is not part of the
structure, then if the array size is A_SIZE, the queue is full when there are A_SIZE - 1 elements,
since only A_SIZE different sizes can be differentiated, and one of these is 0.
Type declarations for queue--array implementation is given below.
struct queue_record
{
    unsigned int q_max_size;   /* maximum # of elements until Q is full */
    unsigned int q_front;
    unsigned int q_rear;
    unsigned int q_size;       /* current # of elements in Q */
    element_type *q_array;
};
typedef struct queue_record *QUEUE;
Routine to test whether a queue is empty-array implementation, is given below.
int is_empty( QUEUE Q )
{
    return( Q->q_size == 0 );
}
Routine to make an empty queue-array implementation, is given below.
void make_null( QUEUE Q )
{
    Q->q_size = 0;
    Q->q_front = 1;
    Q->q_rear = 0;
}
Routine to enqueue (array implementation) is given below
void enqueue( element_type x, QUEUE Q )
{
    if( is_full( Q ) )
        error("Full queue");
    else
    {
        Q->q_size++;
        Q->q_rear = succ( Q->q_rear, Q );
        Q->q_array[ Q->q_rear ] = x;
    }
}
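The enqueue routine relies on a helper succ for the circular wrap-around, which the text does
not show. The following self-contained sketch repeats the queue_record fields and adds
assumed succ, dequeue, and create_queue routines (element_type as int, a simplified enqueue
without the error routine, and all names beyond those in the text are assumptions):

```c
#include <stdlib.h>

typedef int element_type;

struct queue_record {
    unsigned int q_max_size;
    unsigned int q_front;
    unsigned int q_rear;
    unsigned int q_size;
    element_type *q_array;
};
typedef struct queue_record *QUEUE;

/* Circular successor: wrap back to index 0 at the end of the array. */
unsigned int succ(unsigned int value, QUEUE Q)
{
    if (++value == Q->q_max_size)
        value = 0;
    return value;
}

void enqueue(element_type x, QUEUE Q)
{
    if (Q->q_size == Q->q_max_size)
        return;                      /* full: silently ignored in this sketch */
    Q->q_size++;
    Q->q_rear = succ(Q->q_rear, Q);
    Q->q_array[Q->q_rear] = x;
}

element_type dequeue(QUEUE Q)
{
    element_type x = Q->q_array[Q->q_front];
    Q->q_size--;
    Q->q_front = succ(Q->q_front, Q);
    return x;
}

QUEUE create_queue(unsigned int max)
{
    QUEUE Q = malloc(sizeof(struct queue_record));
    Q->q_max_size = max;
    Q->q_size = 0;
    Q->q_front = 1;                  /* same initial state as make_null */
    Q->q_rear = 0;
    Q->q_array = malloc(max * sizeof(element_type));
    return Q;
}
```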
APPLICATION OF QUEUES
There are several algorithms that use queues to give efficient running times.
When jobs are submitted to a printer, they are arranged in order of arrival. Thus,
essentially, jobs sent to a line printer are placed on a queue.
In computer networks, there are many network setups of personal computers in which the
disk is attached to one machine, known as the file server. Users on other machines are
given access to files on a first-come first-served basis, so the data structure is a queue.
Calls to large companies are generally placed on a queue when all operators are busy.
Queues are also used extensively in graph theory, for example in breadth-first traversal.
V) LIST
A list is an abstract data type (ADT). A general list is of the form a1, a2, a3, . . . , an; the ai
are called the keys or values of the list. The size of this list is n, and a list of size 0 is called a
null list. A list can be implemented contiguously (as an array) or non-contiguously (as a
linked list).
List Operations
A lot of operations are available to perform on the list ADT. Some popular operations are
find – returns the position of the first occurrence of a key (value)
insert – inserts a key at the specified position in the list
delete – deletes the key at the specified position in the list
find_kth – returns the element at some position
print_list – displays all the keys in the list
make_null – makes the list a null list
For example, consider the list 34, 12, 52, 16, 12. Then
find(52) – returns 3
insert(x, 4) – makes the list 34, 12, 52, x, 16, 12
delete(3) – applied to that result, makes the list 34, 12, x, 16, 12
Simple Array Implementation of Lists
List can be implemented using an array. Even if the array is dynamically allocated, an
estimate of the maximum size of the list is required. Usually this requires a high over-estimate,
which wastes considerable space. This could be a serious limitation, especially if there are many
lists of unknown size.
Merits of List using array
An array implementation allows print_list and find to be carried out in linear time, which
is as good as can be expected, and the find_kth operation takes constant time.
Demerits of List using array
However, insertion and deletion are expensive. For example, inserting at position 0
(which amounts to making a new first element) requires first pushing the entire array down one
spot to make room, whereas deleting the first element requires shifting all the elements in the list
up one, so the worst case of these operations is O(n). On average, half the list needs to be moved
for either operation, so linear time is still required. Merely building a list by n successive inserts
would require quadratic time. Because the running time for insertions and deletions is so slow
and the list size must be known in advance, simple arrays are generally not used to implement
lists.
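The O(n) shifting that makes array insertion expensive can be seen directly in a sketch like
the following (the function name, parameters, and 0/1 return convention are assumptions):

```c
/* Inserts x at index pos in an array-backed list of *n elements,
   shifting the tail up one slot; returns 0 if full or pos is invalid. */
int list_insert(int a[], int *n, int capacity, int pos, int x)
{
    int i;
    if (*n >= capacity || pos < 0 || pos > *n)
        return 0;
    for (i = *n; i > pos; i--)
        a[i] = a[i - 1];     /* the O(n) shift the text describes */
    a[pos] = x;
    (*n)++;
    return 1;
}
```

Inserting at position 0 moves every existing element, which is why n successive inserts at the
front cost quadratic time in total.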
Linked Lists
The linked list consists of a series of structures, which are not necessarily adjacent in memory.
Each structure contains the element variable and a pointer variable to a structure containing its
successor. The element variable is used to store a key (value). A pointer variable is just a
variable that contains the address where some other data is stored; this pointer variable is
called the next pointer.
A linked list
Thus, if p is declared to be a pointer to a structure, then the value stored in p is interpreted
as the location, in main memory, where a structure can be found. A field of that structure can be
accessed by p->field_name.
Consider a list contains five structures, which happen to reside in memory locations 1000,
800, 712, 992, and 692 respectively. The next pointer in the first structure has the value 800,
which provides the indication of where the second structure is. The other structures each have a
pointer that serves a similar purpose. Of course, in order to access this list, we need to know
where the first cell can be found. A pointer variable can be used for this purpose.
Linked list with actual pointer values
To execute print_list(L) or find(L,key), we merely pass a pointer to the first element in the
list and then traverse the list by following the next pointers. This operation is clearly linear-
time, although the constant is likely to be larger than if an array implementation were used.
The find_kth operation is no longer quite as efficient as an array implementation; find_kth(L,i)
takes O(i) time and works by traversing down the list in the obvious manner.
The delete command can be executed in one pointer change. The result of deleting the third
element in the original list is shown below.
Deletion from a linked list
The insert command requires obtaining a new cell from the system by using a malloc call
(more on this later) and then executing two pointer maneuvers.
Insertion into a linked list
Programming Details
Keep a sentinel node, which is sometimes referred to as a header or dummy node. Our
convention will be that the header is in position 0. Linked list with a header is given below.
Type declarations for linked lists is given below.
typedef struct node *node_ptr;
struct node
{
    element_type element;
    node_ptr next;
};
typedef node_ptr LIST;
typedef node_ptr position;
Function to test whether a linked list is empty
int is_empty( LIST L )
{
    return( L->next == NULL );
}
Empty list with header
Function to test whether current position is the last in a linked list
int is_last( position p, LIST L )
{
    return( p->next == NULL );
}
The find function returns the position of some element x in the list
position find( element_type x, LIST L )
{
    position p;
    p = L->next;
    while( (p != NULL) && (p->element != x) )
        p = p->next;
    return p;
}
Our fourth routine will delete some element x in list L. We need to decide what to do if x
occurs more than once or not at all. Our routine deletes the first occurrence of x and does
nothing if x is not in the list. To do this, we find p, which is the cell prior to the one containing
x, via a call to find_previous.
void delete( element_type x, LIST L )
{
    position p, tmp_cell;
    p = find_previous( x, L );
    if( p->next != NULL )   /* implicit assumption of header use */
    {
        /* x is found: delete it */
        tmp_cell = p->next;
        p->next = tmp_cell->next;   /* bypass the cell to be deleted */
        free( tmp_cell );
    }
}

position find_previous( element_type x, LIST L )
{
    position p;
    p = L;
    while( (p->next != NULL) && (p->next->element != x) )
        p = p->next;
    return p;
}
Insert routine allows us to pass an element to be inserted along with the list L and a position p.
Our particular insertion routine will insert an element after the position implied by p.
void insert( element_type x, LIST L, position p )
{
    position tmp_cell;
    tmp_cell = (position) malloc( sizeof( struct node ) );
    if( tmp_cell == NULL )
        fatal_error("Out of space!!!");
    else
    {
        tmp_cell->element = x;
        tmp_cell->next = p->next;
        p->next = tmp_cell;
    }
}
To delete a list
void delete_list( LIST L )
{
    position p, tmp;
    p = L->next;   /* header assumed */
    L->next = NULL;
    while( p != NULL )
    {
        tmp = p->next;
        free( p );
        p = tmp;
    }
}
DOUBLY LINKED LISTS
To traverse lists backwards, add an extra field to the data structure containing a pointer to
the previous cell. The cost of this is an extra link, which adds to the space requirement and also
doubles the cost of insertions and deletions because there are more pointers to fix. On the other
hand, it simplifies deletion, because you no longer need a separate pointer to the previous cell
in order to delete a given cell.
A doubly linked list
CIRCULARLY LINKED LISTS
A popular convention is to have the last cell keep a pointer back to the first. This can be
done with or without a header (if the header is present, the last cell points to it), and can also be
done with doubly linked lists (the first cell's previous pointer points to the last cell).
A doubly circularly linked list
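A doubly linked node and an insert-after routine can be sketched as follows; note that four
pointers must be fixed instead of two, which is the extra cost mentioned above (the struct and
function names are illustrative assumptions):

```c
#include <stdlib.h>

struct dnode {
    int element;
    struct dnode *prev;
    struct dnode *next;
};

/* Inserts x after position p (p must not be NULL); returns the new cell. */
struct dnode *dinsert_after(struct dnode *p, int x)
{
    struct dnode *cell = malloc(sizeof(struct dnode));
    if (cell == NULL)
        return NULL;
    cell->element = x;
    cell->next = p->next;       /* four pointer assignments instead of two */
    cell->prev = p;
    if (p->next != NULL)
        p->next->prev = cell;
    p->next = cell;
    return cell;
}
```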
Question Bank
Unit II - LISTS, STACKS AND QUEUES
PART – A (2 MARKS)
1. Define ADT.
2. Give the structure of Queue model.
3. What are the basic operations of Queue ADT?
4. What is Enqueue and Dequeue?
5. Give the applications of Queue.
6. What is the use of stack pointer?
7. What is an array?
8. Define ADT (Abstract Data Type).
9. Swap two adjacent elements by adjusting only the pointers (and not the data) using a
singly linked list.
10. Define a queue model.
11. What are the advantages of doubly linked list over singly linked list?
12. Define a graph.
13. What is a Queue?
14. What is a circularly linked list?
15. What is a linear list?
16. How will you delete a node from a linked list?
17. What is linear pattern search?
18. What is a recursive data structure?
19. What is a doubly linked list?
PART – B (16 MARKS)
1. Explain the implementation of stack using Linked List.
2. Explain Prefix, Infix and Postfix expressions with example.
3. Explain the operations and the implementation of list ADT.
4. Give a procedure to convert an infix expression a+b*c+(d*e+f)*g to postfix notation.
5. Design and implement an algorithm to search a linear ordered linked list for a given
alphabetic key or name.
6. (a) What is a stack? Write down the procedure for implementing various stack operations (8)
(b) Explain the various applications of stack (8)
7. (a) Given two sorted lists L1 and L2 write a procedure to compute L1_L2 using only the
basic operations (8)
(b) Write a routine to insert an element in a linked list (8)
8. What is a queue? Write an algorithm to implement queue with example.
UNIT III – TREES
Binary Trees – Operations on Binary Tree Representations – Node Representation –Internal and
External Nodes – Implicit Array Representation – Binary Tree Traversal – Huffman Algorithm –
Representing Lists as Binary Trees – Sorting and Searching Techniques – Tree Searching –
Hashing
Unit: III TREES
TREES
A tree is a finite set of one or more nodes such that there is a specially designated node
called the root, and zero or more non-empty subtrees T1, T2, …, Tk, each of whose roots is
connected by a directed edge from the root R.
Fig: Tree
PRELIMINARIES
Root
A node which doesn't have a parent. In the above tree, the root is A.
Node
Item of Information
Leaf
A node which doesn't have children is called a leaf or terminal node. Here B, K, L, G, H,
M and J are leaves.
Siblings
Children of the same parent are said to be siblings. Here B, C, D, E are siblings; F, G are
siblings; similarly, I, J, K, L are siblings.
Path
A path from node n1 to nk is defined as a sequence of nodes n1, n2, n3, ..., nk such that ni is
the parent of ni+1. There is exactly one path from the root to each node.
In the figure, the path from A to L is A, C, F, L, where A is the parent of C, C is the parent of F
and F is the parent of L.
Length
The length of a path is defined as the number of edges on the path. In the figure, the length of
the path from A to L is 3.
Degree
The number of sub trees of a node is called its degree.
Degree of A is 4
Degree of C is 2
Degree of D is 1
Degree of H is 0
The degree of the tree is the maximum degree of any node in the tree
In fig the degree of the tree is 4.
Level
The level of a node is defined by letting the root be at level one; if a node is at
level L, then its children are at level L+1.
Level of A is 1
Level of B, C, D, E is 2
Level of F, G, H, I, J is 3
Level of K, L, M is 4
Depth
For any node n, the depth of n is the length of the unique path from root to n.
The depth of the root is zero
In fig Depth of node F is 2
Depth of node L is 3
Height
For any node n, the height of the node n is the length of the longest path from n to the
leaf.
The height of the leaf is zero
In fig Height of node F is 1
Height of L is 0
II. BINARY TREES
A binary tree is a special form of a tree. A binary tree is important and frequently
used in various applications.
A binary tree T is defined as follows:
T is empty, or
T contains a specially designated node called the root of T, and the remaining nodes of T
form two disjoint binary trees T1 and T2, which are called the left subtree and the right
subtree respectively.
Fig: A sample binary tree with 11 nodes
Two possible situations of a binary tree are (a) Full binary tree (b) Complete Binary tree
Full binary tree
A binary tree is a full binary tree if it contains the maximum possible number of nodes
at every level. A full binary tree of height 4 is shown below.
Fig: Full Binary tree of height 4
Complete binary tree
A binary tree is said to be a complete binary tree if all its levels, except possibly the last
level, have the maximum possible number of nodes, and all the nodes at the last level appear as
far left as possible.
A complete binary tree of height 4 is shown below.
Fig: A complete binary tree of height 4
III.REPRESENTATION OF BINARY TREE
Two common methods used for representing this structure
1. Linear or sequential representation. (Using an array)
2. Linked representation (Using Pointers)
A. Linear Representation of a Binary tree
In this representation, the nodes are stored level by level, starting from level zero,
where only the root node is present. The root node is stored in the first memory location.
The following rules decide the location of any node of the tree in the array:
The root node is at location 1.
For any node with index i, 1 < i ≤ n:
o PARENT(i) = i/2 (integer division); when i = 1, there is no parent
o LCHILD(i) = 2*i; if 2*i > n, then i has no left child
o RCHILD(i) = 2*i + 1; if 2*i + 1 > n, then i has no right child
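These rules can be sketched directly in C; the function names parent, lchild and rchild are illustrative (not from the text), and a return value of 0 stands for "no such node":

```c
/* 1-based array indexing for a binary tree with n nodes. */
int parent(int i)        { return (i == 1) ? 0 : i / 2; }      /* 0: the root has no parent */
int lchild(int i, int n) { return (2*i     > n) ? 0 : 2*i; }   /* 0: no left child          */
int rchild(int i, int n) { return (2*i + 1 > n) ? 0 : 2*i + 1; } /* 0: no right child       */
```

For example, in a tree with n = 7 nodes, node 3 has children at positions 6 and 7, and node 4 is a leaf.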
Consider a binary tree for the following expression (A-B)+C*(D/E)
Fig: Binary Tree
The representation of the same Binary tree using array is shown in fig below
A full Binary tree and the index of its various nodes when stored in an array is shown in Fig
below
B. Linked representation of Binary Tree
When we insert or delete a node in the linear representation, data must be moved up and
down the array, which takes an excessive amount of processing time.
The linear representation of binary trees thus has a number of overheads. All these overheads
are taken care of by the linked representation.
Structure of a node in the linked representation:

LC | DATA | RC

Here LC and RC are two link fields that store the addresses of the left child and right child of a
node. DATA is the information of the node.
The tree with 9 nodes is represented as:
Fig: Binary Tree
OPERATIONS ON BINARY TREES
There are a number of primitive operations that can be applied to a binary tree. If p is a
pointer to a node nd of a binary tree, the function info(p) returns the contents of nd.
The functions left(p), right(p), father(p) and brother(p) return pointers to the left son of
nd, the right son of nd, the father of nd and the brother of nd, respectively.
These functions return the null pointer if nd has no left son, right son, father or brother.
Finally, the logical functions isleft(p) and isright(p) return the value true if nd is a left or right
son, respectively, of some other node in the tree, and false otherwise.
Note that the functions isleft(p), isright(p) and brother(p) can be implemented using the
functions left(p), right(p) and father(p).
For example, isleft may be implemented as

q = father(p);
if (q == null)
    return(false);
if (left(q) == p)
    return(true);
return(false);

or, even more simply, as father(p) && p == left(father(p)). isright may be implemented in a
similar manner, or by calling isleft. brother(p) may be implemented using isleft or isright as

if (father(p) == null)
    return(null);
if (isleft(p))
    return(right(father(p)));
return(left(father(p)));
In constructing a binary tree, the operations maketree, setleft and setright are useful.
maketree(x) creates a new binary tree consisting of a single node with information field x and
returns a pointer to that node. setleft(p,x) accepts a pointer p to a binary tree node with no left
son and creates a new left son of node(p) with information field x. setright(p,x) is analogous to
setleft except that it creates a right son of node(p).
Make_Empty
This operation is mainly for initialization. Some programmers prefer to initialize the first
element as a one-node tree, but our implementation follows the recursive definition of trees more
closely. It is also a simple routine, as evidenced below
template <class Etype>
void
Binary_Search_Tree<Etype>::
Make_Empty(Tree_Node<Etype> * & T)
{
    if (T != NULL)
    {
        Make_Empty(T->Left);
        Make_Empty(T->Right);
        delete T;
        T = NULL;
    }
}
Find
This operation generally requires returning a pointer to the node in tree T that has key X,
or NULL if there is no such node. The protected Find routine does the actual work; the public
routine then returns nonzero if the Find succeeded, and sets Last_Find. If the Find failed, zero
is returned, and Last_Find points to NULL. The structure of the tree makes this simple.

template <class Etype>
Tree_Node<Etype> *
Binary_Search_Tree<Etype>::
Find(const Etype & X, Tree_Node<Etype> * T) const
{
    if (T == NULL)
        return NULL;
    if (X < T->Element)
        return Find(X, T->Left);
    else if (X > T->Element)
        return Find(X, T->Right);
    else
        return T;
}
Find_Min and Find_Max
Internally, these routines return the position of the smallest and largest elements in the
tree, respectively.
Although returning the exact values of these elements might seem more reasonable, this
would be inconsistent with the Find operation.
It is important that similar-looking operations do similar things. To perform a Find_Min,
start at the root and go left as long as there is a left child.
The stopping point is the smallest element. The Find_Max routine is the same, except that
branching is to the right child. The public interface is similar to that of the Find routine.
template <class Etype>
Tree_Node<Etype> *
Binary_Search_Tree<Etype>::
Find_Min(Tree_Node<Etype> * T) const
{
    if (T == NULL)
        return NULL;
    else if (T->Left == NULL)
        return T;
    else
        return Find_Min(T->Left);
}
template <class Etype>
Tree_Node<Etype> *
Binary_Search_Tree<Etype>::
Find_Max(Tree_Node<Etype> * T) const
{
    if (T != NULL)
        while (T->Right != NULL)
            T = T->Right;
    return T;
}
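The same pair of routines can also be sketched in plain C over a bare node structure (the type and function names here are assumptions for this sketch, not from the text):

```c
#include <stddef.h>

struct bnode { int key; struct bnode *left, *right; };

/* Smallest key: from the root, go left as long as there is a left child. */
struct bnode *find_min(struct bnode *t)
{
    if (t == NULL)
        return NULL;
    while (t->left != NULL)
        t = t->left;
    return t;
}

/* Largest key: the same idea, branching to the right child instead. */
struct bnode *find_max(struct bnode *t)
{
    if (t == NULL)
        return NULL;
    while (t->right != NULL)
        t = t->right;
    return t;
}
```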
BINARY TREE REPRESENTATIONS
Node Representation of Binary Trees
Tree nodes may be implemented as array elements or as dynamically allocated
variables. Each node contains info, left, right and father fields. The left, right and father fields of
a node point to the node's left son, right son and father respectively.
Using the array implementation,
#define NUMNODES 500
struct nodetype {
    int info;
    int left;
    int right;
    int father;
};
struct nodetype node[NUMNODES];
Under this representation, the operations info(p), left(p), right(p) and father(p) are
implemented by references to node[p].info, node[p].left, node[p].right and node[p].father
respectively.
To implement isleft and isright more efficiently, we include within each node an
additional flag isleft. The value of this flag is TRUE if the node is a left son and FALSE
otherwise. The root is uniquely identified by a NULL value(0) in its father field.
Alternatively, the sign of the father field could be negative if the node is a left son or
positive if it is a right son. The pointer to a node's father is then given by the absolute value of
the father field. The isleft and isright operations would then need only examine the sign of the
father field.
To implement brother(p) more efficiently, a brother field is included in each node. Once
the array of nodes is declared, an available list is created by executing the following statements:

int avail, i;
avail = 1;
for (i = 0; i < NUMNODES; i++)
    node[i].left = i + 1;
node[NUMNODES-1].left = 0;

Note that the available list is not a binary tree but a linear list whose nodes are linked
together by the left field. Each node in a tree is taken from the available pool when needed and
returned to the available pool when no longer in use. This representation is called the linked
array representation of a binary tree.
A node may be defined by

struct nodetype {
    int info;
    struct nodetype *left;
    struct nodetype *right;
    struct nodetype *father;
};
typedef struct nodetype *NODEPTR;
The operations info(p), left(p), right(p) and father(p) would be implemented by
references to p->info, p->left, p->right and p->father respectively. An explicit available list is
not needed. The routines getnode and freenode simply allocate and free nodes using the routines
malloc and free. This representation is called the dynamic node representation of a binary tree.
Both the linked array representation and the dynamic node representation are
implementations of an abstract linked representation (also called the node representation), in
which implicit or explicit pointers link together the nodes of a binary tree.
The maketree function, which allocates a node and sets it as the root of a single-node
binary tree, may be written as

NODEPTR maketree(int x)
{
    NODEPTR p;

    p = getnode();
    p->info = x;
    p->left = NULL;
    p->right = NULL;
    return(p);
}
The routine setleft(p,x) sets a node with contents x as the left son of node(p):

setleft(NODEPTR p, int x)
{
    if (p == NULL)
        printf("void insertion\n");
    else if (p->left != NULL)
        printf("invalid insertion\n");
    else
        p->left = maketree(x);
}

The routine setright(p,x) to create a right son of node(p) with contents x is
similar.
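Assembled into one runnable unit, the dynamic node representation and the three construction routines look roughly as follows. This is a sketch: getnode is replaced by a direct malloc call, and the father field is omitted for brevity.

```c
#include <stdio.h>
#include <stdlib.h>

struct nodetype {
    int info;
    struct nodetype *left;
    struct nodetype *right;
};
typedef struct nodetype *NODEPTR;

/* Allocate a single-node tree holding x. */
NODEPTR maketree(int x)
{
    NODEPTR p = malloc(sizeof(struct nodetype));
    p->info = x;
    p->left = NULL;
    p->right = NULL;
    return p;
}

/* Attach a new left son with contents x, if the slot is free. */
void setleft(NODEPTR p, int x)
{
    if (p == NULL)
        printf("void insertion\n");
    else if (p->left != NULL)
        printf("invalid insertion\n");
    else
        p->left = maketree(x);
}

/* The mirror image of setleft. */
void setright(NODEPTR p, int x)
{
    if (p == NULL)
        printf("void insertion\n");
    else if (p->right != NULL)
        printf("invalid insertion\n");
    else
        p->right = maketree(x);
}
```

A call sequence such as maketree(20), setleft(root, 10), setright(root, 30) builds the three-node tree used in the traversal examples later in this unit.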
INTERNAL AND EXTERNAL NODES
By definition, leaf nodes have no sons. Thus, in the linked representation of binary trees,
left and right pointers are needed only in non-leaf nodes. Sometimes two separate sets of nodes
are used for non-leaves and leaves. Non-leaf nodes contain info, left and right fields and are
allocated as dynamic records or as an array of records managed using an available list. Leaf
nodes do not contain left or right fields and are kept as a single info array that is allocated
sequentially as needed.
Alternatively, they can be allocated as dynamic variables containing only an info value.
Each node can also contain a father field, if necessary. When this distinction is made between
non-leaf and leaf nodes, non-leaves are called internal nodes and leaves are called external
nodes.
IMPLICIT ARRAY REPRESENTATION OF BINARY TREES
In general, the n nodes of an almost complete binary tree can be numbered from 1 to n,
so that the number assigned to a left son is twice the number assigned to its father, and the
number assigned to a right son is 1 more than twice the number assigned to its father.
We can extend this implicit array representation of almost complete binary trees to an
implicit array representation of binary trees generally. This can be done by identifying an almost
complete binary tree that contains the binary tree being represented.
The Fig (a) illustrates two binary trees, and Fig (b) illustrates the smallest almost
complete binary trees that contain them. Finally Fig(c) illustrates the array representations of
these almost complete binary trees, and by extension, of the original binary trees.
The implicit array representation is also called the sequential representation, because it
allows a tree to be implemented in a contiguous block of memory rather than via pointers
connecting widely separated nodes.
Under the sequential representation, an array element is allocated whether or not it
serves to contain a node of the tree. Unused array elements must therefore be flagged as
non-existent, or null, tree nodes.
Fig (a): Two binary trees
Fig (b): Almost complete extensions

Fig (c): Array representations of the two almost complete binary trees
Example
The program below finds duplicate numbers in an input list; it includes the routines
maketree and setleft, using the sequential representation of binary trees.
#define NUMNODES 500
struct nodetype {
    int info;
    int used;
} node[NUMNODES];

main()
{
    int p, q, number;

    scanf("%d", &number);
    maketree(number);
    while (scanf("%d", &number) != EOF) {
        p = q = 0;
        while (q < NUMNODES && node[q].used && number != node[p].info) {
            p = q;
            if (number < node[p].info)
                q = 2*p + 1;
            else
                q = 2*p + 2;
        }
        if (number == node[p].info)
            printf("%d is a duplicate\n", number);
        else if (number < node[p].info)
            setleft(p, number);
        else
            setright(p, number);
    }
}

maketree(int x)
{
    int p;

    node[0].info = x;
    node[0].used = TRUE;
    for (p = 1; p < NUMNODES; p++)
        node[p].used = FALSE;
}

setleft(int p, int x)
{
    int q;

    q = 2*p + 1;
    if (q >= NUMNODES)
        error("array overflow");
    else if (node[q].used)
        error("invalid insertion");
    else {
        node[q].info = x;
        node[q].used = TRUE;
    }
}
The routine for setright is similar. Note that the routine maketree initializes the
fields info and used to represent a tree with a single node.
IV. BINARY TREE TRAVERSALS
Traversing means visiting each node only once. Tree traversal is a method for visiting all
the nodes in the tree exactly once. There are three types of tree traversal techniques, namely
Inorder Traversal
Preorder Traversal
Postorder Traversal
Inorder Traversal
The Inorder traversal of a binary tree is performed as
Traverse the left subtree in inorder
Visit the root
Traverse the right subtree in inorder
Example

    20
   /  \
  10   30

Fig: Inorder 10, 20, 30
Fig: Inorder A B C D E G H I J K
Recursive routine for Inorder Traversal

void Inorder(Tree T)
{
    if (T != NULL) {
        Inorder(T->left);
        printElement(T->Element);
        Inorder(T->right);
    }
}
Preorder Traversal
The preorder traversal of a binary tree is performed as
Visit the root
Traverse the left subtree in preorder
Traverse the right subtree in preorder
Example

    20
   /  \
  10   30

Fig: Preorder 20, 10, 30

Fig: Preorder D C A B I G E H K J
Recursive routine for Preorder Traversal

void Preorder(Tree T)
{
    if (T != NULL) {
        printElement(T->Element);
        Preorder(T->left);
        Preorder(T->right);
    }
}
Postorder Traversal
The postorder traversal of a binary tree is performed as
Traverse the left subtree in postorder
Traverse the right subtree in postorder
Visit the root
Example

    20
   /  \
  10   30

Fig: Postorder 10, 30, 20
Fig: Postorder B A C E H G J K I D
Recursive routine for Postorder Traversal

void Postorder(Tree T)
{
    if (T != NULL) {
        Postorder(T->left);
        Postorder(T->right);
        printElement(T->Element);
    }
}
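The three traversals can be checked against the small 20/10/30 tree from the figures. In this sketch the visited keys are collected into an array instead of being printed, so the three orders can be compared directly (the node and helper names are illustrative, not from the text):

```c
#include <stddef.h>

struct tnode { int info; struct tnode *left, *right; };

/* Each routine appends the visited keys to out[] and updates the count *n. */
static void inorder(struct tnode *t, int out[], int *n)
{
    if (t != NULL) {
        inorder(t->left, out, n);     /* left subtree first  */
        out[(*n)++] = t->info;        /* then the root       */
        inorder(t->right, out, n);    /* then right subtree  */
    }
}

static void preorder(struct tnode *t, int out[], int *n)
{
    if (t != NULL) {
        out[(*n)++] = t->info;        /* root first          */
        preorder(t->left, out, n);
        preorder(t->right, out, n);
    }
}

static void postorder(struct tnode *t, int out[], int *n)
{
    if (t != NULL) {
        postorder(t->left, out, n);
        postorder(t->right, out, n);
        out[(*n)++] = t->info;        /* root last           */
    }
}
```

On the tree with root 20 and sons 10 and 30, these yield 10 20 30, 20 10 30 and 10 30 20 respectively, matching the figures.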
V. HUFFMAN ALGORITHM
The inputs to the algorithm are n, the number of symbols in the original alphabet, and
frequency, an array of size at least n such that frequency[i] is the relative frequency of the ith
symbol.
The algorithm assigns values to an array code of size at least n, so that code[i] contains
the code assigned to the ith symbol.
The algorithm also constructs an array position of size at least n such that position[i]
points to the node representing the ith symbol.
This array is necessary to identify the point in the tree from which to start in
constructing the code for a particular symbol of the alphabet. Once the tree has been
constructed, the isleft operation introduced earlier can be used to determine whether 0 or 1
should be placed at the front of the code as we climb the tree.
The info portion of a tree node contains the frequency of occurrence of the symbol
represented by that node.
A set rootnodes is used to keep pointers to the roots of partial binary trees that are not yet
left or right subtrees.
Since this set is modified by removing elements with minimum frequency, combining
them and then reinserting the combined element into the set, it is implemented as an ascending
priority queue of pointers, ordered by the value of the info field of the pointers' target nodes.
We use the operations pqinsert, to insert a pointer into the priority queue, and
pqmindelete, to remove the pointer to the node with the smallest info value from the priority
queue.
We may outline Huffman's algorithm as follows:
/* initialize the set of root nodes */
rootnodes = the empty ascending priority queue;
/* construct a node for each symbol */
Fig: Huffman trees
The Huffman tree is strictly binary. Thus, if there are n symbols in the alphabet, the Huffman
tree can be represented by an array of nodes of size 2n-1.
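Since the outline of the algorithm is abbreviated above, the following is a hedged sketch of the whole construction in C. It keeps the 2n-1 nodes in parallel arrays (frequency, father, isleft), as just described, but replaces the pqinsert/pqmindelete priority queue with a simple linear scan for the two minimum-frequency roots. The function name huffman_codes and the convention that a left son contributes a 0 bit are assumptions for this sketch:

```c
#define MAXSYMB 16
#define MAXNODES (2*MAXSYMB - 1)

/* huffman_codes: illustrative helper; fills code[i] with the bit string
   assigned to symbol i. Nodes 0..n-1 are the symbol leaves; combined
   nodes are appended at positions n..2n-2. */
void huffman_codes(int n, const int freq[], char code[][MAXNODES])
{
    int f[MAXNODES];       /* frequency (info field) of each node  */
    int father[MAXNODES];  /* father index, -1 for a root          */
    int isleft[MAXNODES];  /* 1 if the node is a left son          */
    int used[MAXNODES];    /* root already combined into the tree? */
    int i, k, p1, p2;

    for (i = 0; i < n; i++) { f[i] = freq[i]; father[i] = -1; used[i] = 0; }

    /* n-1 combining steps: pick the two cheapest remaining roots.     */
    /* (The text uses a priority queue; a scan keeps the sketch short.) */
    for (k = n; k < 2*n - 1; k++) {
        p1 = p2 = -1;
        for (i = 0; i < k; i++) {
            if (used[i]) continue;
            if (p1 < 0 || f[i] < f[p1]) { p2 = p1; p1 = i; }
            else if (p2 < 0 || f[i] < f[p2]) p2 = i;
        }
        f[k] = f[p1] + f[p2];
        father[k] = -1; used[k] = 0;
        father[p1] = k; isleft[p1] = 1; used[p1] = 1;
        father[p2] = k; isleft[p2] = 0; used[p2] = 1;
    }

    /* Climb from each symbol's leaf to the root, one bit per edge. */
    for (i = 0; i < n; i++) {
        char buf[MAXNODES];
        int len = 0, j;
        for (j = i; father[j] >= 0; j = father[j])
            buf[len++] = isleft[j] ? '0' : '1';
        for (j = 0; j < len; j++)      /* reverse into the output */
            code[i][j] = buf[len - 1 - j];
        code[i][len] = '\0';
    }
}
```

With frequencies 1, 1, 2 for three symbols, the two rare symbols get 2-bit codes and the common one a 1-bit code, as the greedy combining rule predicts.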
REPRESENTING LISTS AS BINARY TREES
In this section we introduce a tree representation of a linear list in which the
operations of finding the kth element of a list and deleting a specific element are
relatively efficient.
It is also possible to build a list with given elements using this representation. We also
briefly consider the operation of inserting a single new element.
A list may be represented by a binary tree as illustrated in the figure. Fig (a) shows a
list in the usual linked format, while Fig (b) and Fig (c) show two binary tree
representations of the list.
Elements of the original list are represented by leaves of the tree (shown as
squares in the figure), whereas nonleaf nodes of the tree (shown as circles in the
figure) are present as part of the internal tree structure.
Associated with each leaf node are the contents of the corresponding list
element. Associated with each nonleaf node is a count representing the number
of leaves in the node's left subtree.
The elements of the list in their original sequence are assigned to the leaves of the
tree in the inorder sequence of the leaves. Note from the figure that several binary trees can
represent the same list.
Fig: A list and two corresponding Binary Trees
Finding the kth Element
To justify using so many extra tree nodes to represent a list, we present an
algorithm to find the kth element of a list represented by a tree.
Let tree point to the root of the tree, and let lcount(p) represent the count
associated with the nonleaf node pointed to by p [lcount(p) is the number of leaves
in the tree rooted at node(left(p))].
The following algorithm sets the variable find to point to the leaf containing the
kth element of the list.
o The algorithm maintains a variable r containing the number of list elements
remaining to be counted.
o At the beginning of the algorithm r is initialized to k. At each nonleaf
node(p), the algorithm determines from the values of r and lcount(p)
whether the kth element is located in the left or right subtree.
o If the desired leaf is in the left subtree, the algorithm proceeds directly to that
subtree. If the desired leaf is in the right subtree, the algorithm proceeds to
that subtree after reducing the value of r by the value of lcount(p).
o k is assumed to be less than or equal to the number of elements in the list.
r = k;
p = tree;
while (p is not a leaf node)
    if (r <= lcount(p))
        p = left(p);
    else {
        r -= lcount(p);
        p = right(p);
    }
find = p;
Fig(a) illustrates finding the fifth element of a list in the tree of Fig(b), and Fig(b)
illustrates finding the eighth element in the tree of Fig(c).
The dashed line represents the path taken by the algorithm down the tree to the
appropriate leaf. We indicate the value of r (the remaining number of elements to
be counted) next to each node encountered by the algorithm.
The number of tree nodes examined in finding the kth list element is less than or equal to
1 more than the depth of the tree (the longest path in the tree from the root to a leaf). Thus four
nodes are examined in Fig (a) in finding the fifth element of the list, and also in Fig(b) in finding
the eighth element. If a list is represented as a linked structure, four nodes are accessed in finding
the fifth element of the list [that is, the operation p = next(p) is performed four times] and seven
nodes are accessed in finding the eighth element.
Although this is not a very impressive saving, consider a list with 1000 elements. A
binary tree of depth 10 is sufficient to represent such a list, since log2 1000 is less than 10.
Thus, finding the kth element using such a binary tree would require examining no more
than 11 nodes. Since the number of leaves of a binary tree grows as 2^d, where d is the depth
of the tree, such a tree represents a relatively efficient data structure for finding the kth element
of a list.
If an almost complete tree is used, the kth element of an n-element list can be found in at
most log2 n + 1 node accesses, whereas k accesses would be required if a linear linked list were
used.
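The algorithm translates almost directly into C. A sketch, assuming (as in the description) that every nonleaf node has exactly two sons and carries the count of leaves in its left subtree; the type and field names are illustrative:

```c
#include <stddef.h>

struct lnode {
    int info;                      /* element value (meaningful in leaves) */
    int lcount;                    /* leaves in the left subtree           */
    struct lnode *left, *right;    /* both NULL for a leaf                 */
};

/* Return the leaf holding the kth list element, 1 <= k <= list size. */
struct lnode *find_kth(struct lnode *tree, int k)
{
    int r = k;                     /* elements remaining to be counted */
    struct lnode *p = tree;

    while (p->left != NULL) {      /* stop when p is a leaf */
        if (r <= p->lcount)
            p = p->left;
        else {
            r -= p->lcount;        /* skip over the left subtree's leaves */
            p = p->right;
        }
    }
    return p;
}
```

For the list 5, 8, 9 stored with an internal root (lcount 1) whose right son is another internal node (lcount 1), find_kth walks at most the depth of the tree rather than k links.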
Fig: Finding the nth element of a tree-represented list
Deleting an Element
Deletion involves only resetting a left or right pointer in the father of the deleted leaf dl to
null. The figure illustrates the results of this algorithm for a tree in which the nodes C, D and B
are deleted in that order. Make sure that you follow the actions of the algorithm on these
examples. Note that the algorithm maintains a 0 count in leaf nodes for consistency, although
the count is not required for such nodes. Note also that the algorithm never moves up a nonleaf
node even if this could be done. We could easily modify the algorithm to do this, but have not
done so for reasons that will become apparent shortly.
This deletion algorithm involves inspection of up to two nodes at each level. Thus,
deleting the kth element of a list represented by a tree requires a number of node accesses
approximately equal to three times the tree depth, whereas deleting the kth element of a linked
list requires approximately k node accesses. For large lists, therefore, the tree representation is
more efficient.
Fig: Deletion Algorithm
TREE SEARCHING
There are several ways of organizing files as trees and some associated searching
algorithms.
Previously, we presented a method of using a binary tree to store a file in order to make
sorting the file more efficient. In that method, all the left descendants of a node with key key
have keys that are less than key, and all right descendants have keys that are greater than or
equal to key.
An inorder traversal of such a binary tree yields the file in ascending key order.
Such a tree may also be used as a binary search tree. Using binary tree notation, the
algorithm for searching for the key key in such a tree is as follows:

p = tree;
while (p != null && key != k(p))
    p = (key < k(p)) ? left(p) : right(p);
return(p);
The efficiency of the search process can be improved by using a sentinel, as in sequential
searching.
A sentinel node, with a separate external pointer pointing to it, remains allocated with
the tree. All left or right tree pointers that do not point to another tree node now point to this
sentinel node instead of equalling null. When a search is performed, the argument key is first
inserted into the sentinel node, thus guaranteeing that it will be located in the tree.
A sorted array can be produced from a binary search tree by traversing the tree in inorder
and inserting each element sequentially into the array as it is visited. On the other hand, there
are many binary search trees that correspond to a given sorted array. Viewing the middle
element of the array as the root of a tree and viewing the remaining elements recursively as left
and right subtrees produces a relatively balanced binary search tree, as in Fig (a). Viewing the
first element of the array as the root of a tree and each successive element as the right son of its
predecessor produces a very unbalanced binary tree, as in Fig (b).
The advantage of using a binary search tree over an array is that a tree enables search,
insertion and deletion operations to be performed efficiently. If an array is used, an insertion or
deletion requires that approximately half of the elements of the array be moved. (Why?)
Insertion or deletion in a search tree, on the other hand, requires that only a few pointers be
adjusted.
Fig (a) A sorted array and two of its binary tree representations
Fig(b) cont..
Inserting into a Binary search Tree
The following algorithm searches a binary search tree and inserts a new record into the
tree if the search is unsuccessful.
q = null;
p = tree;
while (p != null) {
    if (key == k(p))
        return(p);
    q = p;
    if (key < k(p))
        p = left(p);
    else
        p = right(p);
}
v = maketree(rec, key);
if (q == null)
    tree = v;
else if (key < k(q))
    left(q) = v;
else
    right(q) = v;
return(v);
Note that after a new record is inserted, the tree retains the property of being sorted in an inorder
traversal
Deleting from a Binary Search Tree
We now present an algorithm to delete a node with key key from a binary search tree.
There are three cases to consider. If the node to be deleted has no sons, it may be deleted
without further adjustment to the tree. This is illustrated in Fig (a).
If the node to be deleted has only one subtree, its only son can be moved up to take its
place. This is illustrated in Fig (b). If, however, the node p to be deleted has two subtrees, its
inorder successor s (or predecessor) must take its place. The inorder successor cannot have a
left subtree.
Thus the right son of s can be moved up to the place of s. This is illustrated in Fig (c),
where the node with key 12 replaces the node with key 11 and is replaced, in turn, by the node
with key 13. In the algorithm below, if no node with key key exists in the tree, the tree is left
unchanged.
Fig(a) Deleting node with key 15
Fig(b) Deleting node with key 5
Fig(c) Deleting node with key 11
p = tree;
q = null;
while (p != null && k(p) != key) {
    q = p;
    p = (key < k(p)) ? left(p) : right(p);
}
if (p == null)
    return;
if (left(p) == null)
    rp = right(p);
else if (right(p) == null)
    rp = left(p);
else {
    f = p;
    rp = right(p);
    s = left(rp);
    while (s != null) {
        f = rp;
        rp = s;
        s = left(rp);
    }
    if (f != p) {
        left(f) = right(rp);
        right(rp) = right(p);
    }
    left(rp) = left(p);
}
if (q == null)
    tree = rp;
else
    (p == left(q)) ? (left(q) = rp) : (right(q) = rp);
freenode(p);
return;
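For comparison with the pointer-rewriting routine above, here is a self-contained recursive sketch of search, insertion and deletion in C. For the two-son case it copies the inorder successor's key into the node and then deletes the successor, a different but equivalent strategy to the splicing shown above; malloc and free stand in for maketree and freenode:

```c
#include <stdlib.h>
#include <stddef.h>

struct bnode { int key; struct bnode *left, *right; };

struct bnode *bst_insert(struct bnode *t, int key)
{
    if (t == NULL) {
        struct bnode *v = malloc(sizeof *v);
        v->key = key;
        v->left = v->right = NULL;
        return v;
    }
    if (key < t->key)
        t->left = bst_insert(t->left, key);
    else if (key > t->key)
        t->right = bst_insert(t->right, key);
    return t;                      /* duplicate keys are ignored */
}

struct bnode *bst_delete(struct bnode *t, int key)
{
    if (t == NULL)
        return NULL;
    if (key < t->key)
        t->left = bst_delete(t->left, key);
    else if (key > t->key)
        t->right = bst_delete(t->right, key);
    else if (t->left == NULL || t->right == NULL) {
        /* zero or one son: the son (possibly NULL) moves up */
        struct bnode *rp = (t->left != NULL) ? t->left : t->right;
        free(t);
        return rp;
    } else {
        /* two sons: copy the inorder successor's key, then delete it */
        struct bnode *s = t->right;
        while (s->left != NULL)
            s = s->left;
        t->key = s->key;
        t->right = bst_delete(t->right, s->key);
    }
    return t;
}

struct bnode *bst_find(struct bnode *t, int key)
{
    while (t != NULL && key != t->key)
        t = (key < t->key) ? t->left : t->right;
    return t;
}
```

Note that after every insertion and deletion the tree still yields its keys in ascending order under an inorder traversal, which is the invariant the text emphasizes.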
VI. SORTING AND SEARCHING TECHNIQUES
Sorting is the operation of arranging the records of a table according to the key value of
each record. A table or a file is an ordered sequence of records r[1], r[2], ... r[n], each containing
a key k[1], k[2], ... k[n]. The table is sorted based on the key.
A sorting algorithm is said to be stable if it preserves the relative order of records with
equal keys. Sorting methods fall into two classes:
Internal Sorting
External Sorting
Internal Sort:
All records to be sorted are kept internally in the main memory.
External Sort:
If there are a large number of records to be sorted, they must be kept in external files
on auxiliary storage.
INTERNAL SORTING
A) INSERTION SORT
Insertion sort works by taking elements from the list one by one and inserting each into
its correct position in the sorted part of the list.
Insertion sort consists of N-1 passes, where N is the number of elements to be sorted. The
ith pass of insertion sort inserts the ith element A[i] into its right place among A[1], A[2], ...
A[i-1].
After doing this insertion the records occupying A[1]..A[i] are in sorted order.
Procedure
void insertion_sort(int a[], int n)
{
    int i, j, temp;

    for (i = 1; i < n; i++) {
        temp = a[i];
        for (j = i; j > 0 && a[j-1] > temp; j--)
            a[j] = a[j-1];
        a[j] = temp;
    }
}
Example

Consider an unsorted array

20 10 60 40 30 15

Passes of Insertion sort

ORIGINAL      20 10 60 40 30 15   POSITIONS MOVED
After i=1     10 20 60 40 30 15   1
After i=2     10 20 60 40 30 15   0
After i=3     10 20 40 60 30 15   1
After i=4     10 20 30 40 60 15   2
After i=5     10 15 20 30 40 60   4
Sorted Array  10 15 20 30 40 60

Analysis

Worst Case Analysis: O(N^2)
Best Case Analysis: O(N)
Average Case Analysis: O(N^2)
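The procedure can be exercised on the sample array from the example. A minimal, self-contained sketch of the same routine:

```c
void insertion_sort(int a[], int n)
{
    int i, j, temp;

    for (i = 1; i < n; i++) {
        temp = a[i];                  /* key to insert on this pass */
        for (j = i; j > 0 && a[j-1] > temp; j--)
            a[j] = a[j-1];            /* shift larger keys right    */
        a[j] = temp;                  /* drop the key into its slot */
    }
}
```

Running it on 20 10 60 40 30 15 produces 10 15 20 30 40 60, matching the final row of the table above.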
B) SHELL SORT
Shell sort was invented by Donald Shell. It improves upon bubble sort and insertion sort
by moving out-of-order elements more than one position at a time. It works by arranging the
data sequence in a two-dimensional array and then sorting the columns of the array using
insertion sort.
In shell sort the whole array is first fragmented into K segments, where K is preferably a
prime number. After the first pass the whole array is partially sorted. In the next pass, the value
of K is reduced, which increases the size of each segment and reduces the number of segments.
The next value of K is chosen so that it is relatively prime to its previous value. The
process is repeated until K=1, at which point the array is sorted. The insertion sort is applied to
each segment, so each successive segment is partially sorted.
The shell sort is also called the Diminishing Increment Sort, because the value of K
decreases continuously.
Procedure
void shellsort(int a[], int n)
{
    int i, j, k, temp;

    for (k = n/2; k > 0; k /= 2)        /* diminishing increments */
        for (i = k; i < n; i++) {
            temp = a[i];
            for (j = i; j >= k && a[j-k] > temp; j -= k)
                a[j] = a[j-k];
            a[j] = temp;
        }
}
Example
Consider an unsorted array
81 94 11 96 12 35 17 95 28 58
Here N = 10; in the first pass, K = 5 (10/2).
81 94 11 96 12 35 17 95 28 58
After first pass
35 17 11 28 12 81 94 95 96 58
In second pass, K is reduced to 3
After second pass
28 12 11 35 17 81 58 95 96 94
In third pass, K is reduced to 1
The final sorted array is
11 12 17 28 35 58 81 94 95 96
Analysis
Worst Case Analysis: O(N^2)
Best Case Analysis: O(N log N)
Average Case Analysis: O(N^1.5)
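A runnable sketch of the routine, with the gap sequence passed in explicitly so that the 5, 3, 1 increments of the worked example can be reproduced (the parameterization is an assumption for this demo, not part of the original procedure):

```c
/* Shell sort over an explicit diminishing gap sequence. */
void shellsort(int a[], int n, const int gaps[], int ngaps)
{
    int g, i, j, k, temp;

    for (g = 0; g < ngaps; g++) {
        k = gaps[g];                   /* current increment           */
        for (i = k; i < n; i++) {      /* insertion sort per segment  */
            temp = a[i];
            for (j = i; j >= k && a[j-k] > temp; j -= k)
                a[j] = a[j-k];
            a[j] = temp;
        }
    }
}
```

With gaps 5, 3, 1 on the array 81 94 11 96 12 35 17 95 28 58, the first pass yields 35 17 11 28 12 81 94 95 96 58, as in the example, and the final pass yields the fully sorted array.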
C) QUICK SORT
The basic version of the quick sort algorithm was invented by C. A. R. Hoare in 1960 and
formally introduced in 1962.
It is based on the principle of divide-and-conquer. Quick sort is an algorithm of choice in
many situations because it is not difficult to implement, it is a good "general purpose" sort and it
consumes relatively few resources during execution.
Good points
It is in-place since it uses only a small auxiliary stack.
It requires only n log(n) time to sort n items.
It has an extremely short inner loop
This algorithm has been subjected to a thorough mathematical analysis, so a very precise
statement can be made about performance issues.
Bad Points
It is recursive. Especially if recursion is not available, the implementation is extremely
complicated.
It requires quadratic (i.e., n^2) time in the worst case.
It is fragile i.e., a simple mistake in the implementation can go unnoticed and cause it to
perform badly.
Quick sort works by partitioning a given array A[p . . r] into two non-empty sub arrays A[p . .
q] and A[q+1 . . r] such that every key in A[p . . q] is less than or equal to every key in A[q+1 . .
r]. The two sub arrays are then sorted by recursive calls to Quick sort. The exact position of the
partition depends on the given array, and the index q is computed as a part of the partitioning
procedure.
QuickSort
1. If p < r then
2. q= Partition (A, p, r)
3. Recursive call to Quick Sort (A, p, q-1)
4. Recursive call to Quick Sort (A, q + 1, r)
Note that to sort the entire array, the initial call is Quick Sort (A, 1, length[A]).
As a first step, Quick Sort chooses as pivot one of the items in the array to be sorted.
The array is then partitioned on either side of the pivot. Elements that are less than or equal to
the pivot move toward the left, and elements that are greater than or equal to the pivot move
toward the right.
Partitioning the Array
Partitioning procedure rearranges the sub arrays in-place.
1. PARTITION (A, p, r)
2.   x ← A[p]
3.   i ← p-1
4.   j ← r+1
5.   while i < j do
6.     repeat j ← j-1
7.     until A[j] ≤ x
8.     repeat i ← i+1
9.     until A[i] ≥ x
10.    exchange A[i] ↔ A[j]
11.  exchange A[p] ↔ A[j]
12.  return j
Partition selects the first key, A[p], as the pivot key about which the array will be partitioned:
Keys ≤ A[p] are moved towards the left.
Keys ≥ A[p] are moved towards the right.
The running time of the partition procedure is θ(n), where n = r - p + 1 is the
number of keys in the array. Another argument that the running time of PARTITION on a subarray
of size n is θ(n) is as follows:
Pointer i and pointer j start at each end and move towards each other, converging
somewhere in the middle. The total number of times that i can be incremented and j can be
decremented is therefore O(n).
Associated with each increment or decrement there are O(1) comparisons and swaps.
Hence, the total time is O(n).
Array of Same Elements
Since all the elements are equal, the "less than or equal" tests in lines 6 and 8 of
PARTITION (A, p, r) will always be true.
Intuitively, the first repeat loop moves j to the left and the second repeat loop moves i to
the right; when all elements are equal, each repeat loop moves i and j towards the middle by
one position. They meet in the middle, so PARTITION returns q = floor((p + r)/2) whenever all
elements in the array A[p . . r] have the same value.
Performance of Quick Sort
The running time of quick sort depends on whether partition is balanced or unbalanced,
which in turn depends on which elements of an array to be sorted are used for partitioning.
A very good partition splits an array into two equal-sized arrays. A bad partition, on the
other hand, splits an array into two arrays of very different sizes.
The worst partition puts only one element in one array and all other elements in the
other array. If the partitioning is balanced, Quick sort runs asymptotically as fast as merge
sort.
On the other hand, if partitioning is unbalanced, the Quick sort runs asymptotically as
slow as insertion sort.
Best Case
The best thing that could happen in Quick sort would be for each partitioning stage to
divide the array exactly in half. In other words, the best case occurs when the pivot is the
median of the keys in A[p . . r] every time the procedure 'Partition' is called.
The procedure 'Partition' then always splits the array to be sorted into two equal-sized arrays.
If the procedure 'Partition' produces two regions of size n/2, the recurrence relation is
T(n) = T(n/2) + T(n/2) + θ(n)
     = 2T(n/2) + θ(n)
And from case 2 of the Master theorem
T(n) = θ(n lg n)
Worst case Partitioning
The worst case occurs if the given array A[1 . . n] is already sorted. The PARTITION (A, p,
r) call then always returns p, so successive calls to partition split arrays of length n, n-1, n-2, . . . , 2,
with running time proportional to n + (n-1) + (n-2) + . . . + 2 = [(n+2)(n-1)]/2 = θ(n^2). The worst
case also occurs if A[1 . . n] starts out in reverse order.
We conclude that Quick sort's average running time is θ(n lg n).
Quick sort is an in-place sorting algorithm whose worst-case running time is θ(n^2) and expected
running time is θ(n lg n).
D) HEAP SORT
The binary heap data structure is an array that can be viewed as a complete binary tree.
Each node of the binary tree corresponds to an element of the array. The array is completely
filled on all levels except possibly the lowest.
We represent heaps in level order, going from left to right. The array corresponding to the
heap above is [25, 13, 17, 5, 8, 3].
The root of the tree is A[1], and given the index i of a node, the indices of its parent, left child
and right child can be computed.
PARENT (i)
return floor(i/2)
LEFT (i)
return 2i
RIGHT (i)
return 2i + 1
Let's try these out on a heap to make sure we believe they are correct. Take this heap,
which is represented by the array [20, 14, 17, 8, 6, 9, 4, 1].
We'll go from the 20 to the 6 first. The index of the 20 is 1. To find the index of the left
child, we calculate 1 * 2 = 2. This takes us (correctly) to the 14. Now, we go right, so we
calculate 2 * 2 + 1 = 5. This takes us (again, correctly) to the 6.
Now let's try going from the 4 to the 20. 4's index is 7. We want to go to the parent, so we
calculate 7 / 2 = 3, which takes us to the 17. Now, to get 17's parent, we calculate 3 / 2 = 1,
which takes us to the 20.
Heap Property
In a heap, for every node i other than the root, the value of the node is at most the value
of its parent:
A[PARENT (i)] ≥ A[i]
Thus, the largest element in a heap is stored at the root. Following is an example of Heap:
By the definition of a heap, all the tree levels are completely filled except possibly for the
lowest level, which is filled from the left up to a point.
Clearly a heap of height h has the minimum number of elements when it has just one
node at the lowest level. The levels above the lowest level form a complete binary tree of height
h - 1 with 2^h - 1 nodes. Hence the minimum number of nodes possible in a heap of height h is 2^h.
Clearly a heap of height h has the maximum number of elements when its lowest level is
completely filled. In this case the heap is a complete binary tree of height h and hence has
2^(h+1) - 1 nodes.
The following is not a heap, because although it has the heap property, it is not a complete
binary tree.
Recall that to be complete, a binary tree has to fill up all of its levels with the possible
exception of the last one, which must be filled in from the left side.
Height of a node
We define the height of a node in a tree to be the number of edges on the longest simple
downward path from the node to a leaf.
Height of a tree
The number of edges on a simple downward path from the root to a leaf. Note that the
height of a tree with n nodes is floor(lg n), which is θ(log n). This implies that an n-element heap
has height floor(lg n).
In order to show this, let the height of the n-element heap be h. From the bounds obtained
on the maximum and minimum number of elements in a heap, we get
2^h ≤ n ≤ 2^(h+1) - 1
where n is the number of elements in the heap.
Taking logarithms to the base 2,
h ≤ lg n < h + 1
It follows that h = floor(lg n).
We know from the above that the largest element resides in the root, A[1]. The natural
question to ask is: where in a heap might the smallest element reside?
Consider any path from root of the tree to a leaf. Because of the heap property, as we
follow that path, the elements are either decreasing or staying the same.
If it happens to be the case that all elements in the heap are distinct, then the above
implies that the smallest is in a leaf of the tree.
It could also be that an entire subtree of the heap is the smallest element, or indeed that
there is only one element in the heap, which is then the smallest element, so the smallest element
is everywhere.
Note that anything below the smallest element must equal the smallest element, so in
general, only entire subtrees of the heap can contain the smallest element.
Inserting Element in the Heap
Suppose we have a heap as follows
Let's suppose we want to add a node with key 15 to the heap. First, we add the node to
the tree at the next spot available at the lowest level of the tree. This is to ensure that the tree
remains complete.
Now we do the same thing again, comparing the new node to its parent. Since 14 < 15,
we have to do another swap
Now we are done, because 15 < 20.
Four basic procedures on heap are
1. Heapify, which runs in O(lg n) time.
2. Build-Heap, which runs in linear time.
3. Heap Sort, which runs in O(n lg n) time.
4. Extract-Max, which runs in O(lg n) time.
Maintaining the Heap Property
Heapify is a procedure for manipulating heap data structures. It is given an array A and
an index i into the array.
The subtrees rooted at the children of A[i] are heaps, but node A[i] itself may possibly
violate the heap property, i.e., A[i] < A[2i] or A[i] < A[2i+1].
The procedure 'Heapify' manipulates the tree rooted at A[i] so that it becomes a heap. In other
words, 'Heapify' lets the value at A[i] "float down" in the heap so that the subtree rooted at index i
becomes a heap.
Outline of Procedure Heapify
Heapify picks the largest child key and compares it to the parent key. If the parent key is
larger, Heapify quits; otherwise it swaps the parent key with the largest child key, so that the
parent now becomes larger than its children.
It is important to note that the swap may destroy the heap property of the subtree rooted at
the largest child node. If this is the case, Heapify calls itself again using the largest child node as
the new root.
Heapify (A, i)
1. l ← left [i]
2. r ← right [i]
3. if l ≤ heap-size [A] and A[l] > A[i]
4. then largest ← l
5. else largest ← i
6. if r ≤ heap-size [A] and A[r] > A[largest]
7. then largest ← r
8. if largest ≠ i
9. then exchange A[i] ↔ A[largest]
10. Heapify (A, largest)
Analysis
If we put a value at the root that is less than every value in the left and right subtrees, then
'Heapify' will be called recursively until a leaf is reached.
To make the recursive calls traverse the longest path to a leaf, choose values that make
'Heapify' always recurse on the left child.
It follows the left branch when the left child is greater than or equal to the right child, so
putting 0 at the root and 1 at all other nodes, for example, will accomplish this task.
With such values 'Heapify' will be called h times, where h is the heap height, so its running
time will be θ(h) (since each call does θ(1) work), which is θ(lg n). Since we have a case in
which Heapify's running time is θ(lg n), its worst-case running time is Ω(lg n).
Example of Heapify
Suppose we have a complete binary tree somewhere whose subtrees are heaps. In the
following complete binary tree, the subtrees of 6 are heaps:
The Heapify procedure alters the heap so that the tree rooted at 6's position is a heap.
Here's how it works. First, we look at the root of our tree and its two children.
We then determine which of the three nodes is the greatest. If it is the root, we are done,
because we have a heap. If not, we exchange the appropriate child with the root, and continue
recursively down the tree. In this case, we exchange 6 and 8, and continue.
Now, 7 is greater than 6, so we exchange them.
We are at the bottom of the tree, and can't continue, so we terminate.
Building a Heap
We can use the procedure 'Heapify' in a bottom-up fashion to convert an array A[1 . . n]
into a heap. Since the elements in the subarray A[floor(n/2)+1 . . n] are all leaves, the procedure
BUILD_HEAP goes through the remaining nodes of the tree and runs 'Heapify' on each one. The
bottom-up order of processing nodes guarantees that the subtrees rooted at the children are heaps
before 'Heapify' is run at their parent.
BUILD_HEAP (A)
1. heap-size [A] ← length [A]
2. for i ← floor(length[A]/2) downto 1 do
3.     Heapify (A, i)
We can build a heap from an unordered array in linear time
Heap Sort Algorithm
The heap sort combines the best of both merge sort and insertion sort. Like merge sort,
the worst-case time of heap sort is O(n log n), and like insertion sort, heap sort sorts in place. The
heap sort algorithm starts by using procedure BUILD_HEAP to build a heap on the input array
A[1 . . n]. Since the maximum element of the array is stored at the root A[1], it can be put into its
correct final position by exchanging it with A[n] (the last element in A). If we now discard node
n from the heap, then the remaining elements can be made into a heap. Note that the new element
at the root may violate the heap property; all that is needed to restore it is a call to Heapify.
HEAPSORT (A)
1. BUILD_HEAP (A)
2. for i ← length (A) downto 2 do
3.     exchange A[1] ↔ A[i]
4.     heap-size [A] ← heap-size [A] - 1
5.     Heapify (A, 1)
The HEAPSORT procedure takes time O(n lg n), since the call to BUILD_HEAP takes time
O(n) and each of the n -1 calls to Heapify takes time O(lg n).
Now we show that there are at most ceil(n/2^(h+1)) nodes of height h in any n-element heap.
We need two observations to show this. The first is that if we consider the set of nodes of height
h, the subtrees rooted at these nodes are disjoint.
In other words, we cannot have two nodes of height h with one being an ancestor of the
other. The second property is that all of these subtrees are complete binary trees except possibly
one.
Let Xh be the number of nodes of height h. Since Xh - 1 of these subtrees are full, they each
contain exactly 2^(h+1) - 1 nodes.
One of the height-h subtrees may not be full, but it contains at least 1 node at its lowest level
and hence has at least 2^h nodes. The exact count is 1 + 2 + 4 + . . . + 2^(h-1) + 1 = 2^h. The
remaining nodes have height strictly more than h.
To connect all the subtrees rooted at nodes of height h, there must be exactly Xh - 1 such
nodes. The total number of nodes is therefore at least
(Xh - 1)(2^(h+1) - 1) + 2^h + (Xh - 1)
which is at most n.
Simplifying gives
Xh ≤ n/2^(h+1) + 1/2.
In conclusion, it is a property of binary heaps that roughly half of the nodes are leaves:
the number of leaves in an n-node binary heap is ceil(n/2). If these leaves are removed, the
number of new leaves will be about n/4. If this process is continued for h levels, the number of
nodes at height h is at most ceil(n/2^(h+1)).
Implementation
void siftDown(int numbers[], int root, int bottom);

void heapSort(int numbers[], int array_size)
{
    int i, temp;

    /* build the heap: sift down every internal node */
    for (i = (array_size / 2) - 1; i >= 0; i--)
        siftDown(numbers, i, array_size - 1);

    /* repeatedly move the maximum to the end and restore the heap */
    for (i = array_size - 1; i >= 1; i--)
    {
        temp = numbers[0];
        numbers[0] = numbers[i];
        numbers[i] = temp;
        siftDown(numbers, 0, i - 1);
    }
}

void siftDown(int numbers[], int root, int bottom)
{
    int maxChild, temp;

    /* children of node root (0-based array) are 2*root+1 and 2*root+2 */
    while (root * 2 + 1 <= bottom)
    {
        maxChild = root * 2 + 1;
        if (maxChild + 1 <= bottom && numbers[maxChild + 1] > numbers[maxChild])
            maxChild = maxChild + 1;      /* pick the larger child */
        if (numbers[root] >= numbers[maxChild])
            return;                       /* heap property restored */
        temp = numbers[root];
        numbers[root] = numbers[maxChild];
        numbers[maxChild] = temp;
        root = maxChild;
    }
}
EXTERNAL SORTING
It is used for sorting methods that are employed when the data to be sorted is too large to
fit in primary memory.
Need for external sorting
During the sort, some of the data must be stored externally, such as on tape or disk
The cost of accessing data is significantly greater than either bookkeeping or comparison
costs
If tape is used as the external memory, then the items must be accessed sequentially
Steps to be followed
The basic external sorting algorithm uses the merge routine from merge sort
Divide the file into runs such that the size of a run is small enough to fit into main memory
Sort each run in main memory
Merge the resulting runs together into successively bigger runs
Repeat the steps until the file is sorted
MERGE SORT
Merge sort is based on the divide-and-conquer paradigm. The Merge-sort algorithm can be
described in general terms as consisting of the following three steps:
1. Divide Step
If the given array A has zero or one element, return A; it is already sorted. Otherwise, divide
A into two arrays, A1 and A2, each containing about half of the elements of A.
2. Recursion Step
Recursively sort arrays A1 and A2.
3. Conquer Step
Combine the elements back in A by merging the sorted arrays A1 and A2 into a sorted
sequence.
We can visualize Merge sort by means of a binary tree where each internal node of the tree
represents a recursive call and each external node represents an individual element of the given
array A. Such a tree is called a Merge-sort tree. The heart of the Merge-sort algorithm is the
conquer step, which merges two sorted sequences into a single sorted sequence.
To begin, suppose that we have two sorted arrays A1[1], A1[2], . . , A1[m] and A2[1],
A2[2], . . . , A2[n]. The following is a direct algorithm for the obvious strategy of successively
choosing the smallest remaining element from A1 and A2 and putting it in A.
MERGE (A1, A2, A)
i ← 1; j ← 1
A1[m+1] ← ∞; A2[n+1] ← ∞
for k ← 1 to m + n do
    if A1[i] ≤ A2[j]
        then A[k] ← A1[i]
            i ← i + 1
        else A[k] ← A2[j]
            j ← j + 1
Merge Sort Algorithm
MERGE_SORT (A)
A1[1 . . ceil(n/2)] ← A[1 . . ceil(n/2)]
A2[1 . . floor(n/2)] ← A[ceil(n/2) + 1 . . n]
Merge Sort (A1)
Merge Sort (A2)
Merge (A1, A2, A)
Analysis
Let T(n) be the time taken by this algorithm to sort an array of n elements. Dividing A into
subarrays A1 and A2 takes linear time, and it is easy to see that Merge (A1, A2, A) also takes
linear time. Consequently,
T(n) = T(ceil(n/2)) + T(floor(n/2)) + θ(n)
or, for simplicity,
T(n) = 2T(n/2) + θ(n)
The total running time of the Merge sort algorithm is O(n lg n), which is asymptotically optimal;
like Heap sort, Merge sort has a guaranteed n lg n running time. Merge sort requires θ(n) extra
space; Merge is not an in-place algorithm. The only known ways to merge in place (without any
extra space) are too complex to be reduced to practical programs.
Implementation
void m_sort(int numbers[], int temp[], int left, int right);
void merge(int numbers[], int temp[], int left, int mid, int right);

void mergeSort(int numbers[], int temp[], int array_size)
{
    m_sort(numbers, temp, 0, array_size - 1);
}

void m_sort(int numbers[], int temp[], int left, int right)
{
    int mid;

    if (right > left)
    {
        mid = (right + left) / 2;
        m_sort(numbers, temp, left, mid);
        m_sort(numbers, temp, mid + 1, right);
        merge(numbers, temp, left, mid + 1, right);
    }
}

void merge(int numbers[], int temp[], int left, int mid, int right)
{
    int i, left_end, num_elements, tmp_pos;

    left_end = mid - 1;
    tmp_pos = left;
    num_elements = right - left + 1;

    /* merge the two sorted halves into temp */
    while ((left <= left_end) && (mid <= right))
    {
        if (numbers[left] <= numbers[mid])
        {
            temp[tmp_pos] = numbers[left];
            tmp_pos = tmp_pos + 1;
            left = left + 1;
        }
        else
        {
            temp[tmp_pos] = numbers[mid];
            tmp_pos = tmp_pos + 1;
            mid = mid + 1;
        }
    }

    /* copy any remaining elements of either half */
    while (left <= left_end)
    {
        temp[tmp_pos] = numbers[left];
        left = left + 1;
        tmp_pos = tmp_pos + 1;
    }
    while (mid <= right)
    {
        temp[tmp_pos] = numbers[mid];
        mid = mid + 1;
        tmp_pos = tmp_pos + 1;
    }

    /* copy the merged run back into numbers */
    for (i = 0; i < num_elements; i++)
    {
        numbers[right] = temp[right];
        right = right - 1;
    }
}
SIMPLE ALGORITHM (2-way merge)
Let us consider four tapes Ta1, Ta2, Tb1, Tb2, which serve as two input and two output tapes.
The a and b tapes act as either input tapes or output tapes, depending on the point in the
algorithm.
Let the size of a run (M) be 3, to sort the following set of values.
Ta1: 44 80 12 35 45 58 75 60 24 48 92 98 85
Ta2: (empty)
Initial Run Construction
Step 1: Read M records at a time from the input tape Ta1
Step 2: Sort the records internally and write the resulting runs alternately to Tb1 and Tb2
The first 3 records from the input tape Ta1 are read and sorted internally as (12, 44, 80) and
placed on Tb1.
Then the next 3 records (35, 45, 58) are read and the sorted run is placed on Tb2.
Similarly the rest of the records are distributed alternately, so Tb1 and Tb2 each contain a
group of runs:
Tb1: 12 44 80 | 24 60 75 | 85
Tb2: 35 45 58 | 48 92 98
Number of runs = 5
First Pass
The first runs of Tb1 and Tb2 are merged and the sorted records placed on Ta1.
Similarly the second runs of Tb1 and Tb2 are merged and the sorted records placed on Ta2;
the leftover run (85) is appended to Ta1 as its second run:
Ta1: 12 35 44 45 58 80 | 85
Ta2: 24 48 60 75 92 98
Here the number of runs is reduced, but the size of each run is increased.
Second Pass
The first runs of Ta1 and Ta2 are merged and the sorted records placed on Tb1, and the
second run of Ta1 is copied to Tb2:
Tb1: 12 24 35 44 45 48 58 60 75 80 92 98
Tb2: 85
Third Pass
In the third pass, the runs from Tb1 and Tb2 are merged and the sorted records are placed on
Ta1, which now holds the final sorted output.
This algorithm requires ceil(log2(N/M)) merge passes, plus the initial run-constructing pass.
MULTIWAY MERGE
The number of passes required to sort an input can be reduced by increasing the number
of tapes. This is done by extending the 2-way merge to a k-way merge. The only difference is
that it is more complicated to find the smallest of the k elements, which can be overcome by
using priority queues.
For the same input,
44 80 12 35 45 58 75 60 24 48 92 98 85
let us consider 6 tapes Ta1, Ta2, Ta3, Tb1, Tb2, Tb3 and M = 3.
Initial run constructing pass
Runs of size M are sorted internally and distributed cyclically over Tb1, Tb2 and Tb3:
Tb1: 12 44 80 | 48 92 98
Tb2: 35 45 58 | 85
Tb3: 24 60 75
First Pass
The first runs of Tb1, Tb2 and Tb3 are merged and the sorted records are placed on Ta1.
Similarly, the second runs of Tb1 and Tb2 are merged and the sorted records are placed on Ta2:
Ta1: 12 24 35 44 45 58 60 75 80
Ta2: 48 85 92 98
Second Pass
The runs from Ta1 and Ta2 are merged and the sorted records are placed on Tb1, which now
contains the final sorted records:
Tb1: 12 24 35 44 45 48 58 60 75 80 85 92 98
For the same example, 2-way merge requires 4 passes to get the sorted elements, whereas in
multiway merge it is reduced to 3 passes, which also includes the initial run-constructing pass.
POLYPHASE MERGE
The k-way merge strategy requires 2k tapes to perform the sorting. In some applications it
is possible to get by with only k + 1 tapes.
Example: let us consider 3 tapes T1, T2, T3 and the input file on T1, which produces 8 runs.
The distribution of the runs between the tapes can vary:
1. Equal distribution (4 & 4 runs)
2. Unequal distribution (7 & 1 runs)
3. Fibonacci numbers (3 & 5 runs)
Equal Distribution
Put 4 runs on each tapes T2 & T3, after applying merge routine, the resultant tape T1 has
4 runs, whereas other tapes T2 & T3 are empty which leads to adding an halfpass for every pass.
In first pass, all the runs(4) are placed in one tapes, so it logically divided and placed half of the
runs(2) in any of the other tapes.
Unequal Distribution
For instance, if 7 runs are placed on T2 and 1 run on T3, then after the first merge T1 will
hold 1 run and T2 will hold 6 runs. As each pass merges only one pair of runs, the process is
slow, resulting in a larger number of passes.
Fibonacci Numbers
Equal distribution (4 & 4 runs):
Tapes | Run construction | After T2+T3 | After splitting | After T1+T2 | After split | After T2+T3
T1    | 0                | 4           | 2               | 0           | 0           | 1
T2    | 4                | 0           | 2               | 0           | 1           | 0
T3    | 4                | 0           | 0               | 2           | 1           | 0

Unequal distribution (7 & 1 runs):
Tapes | Run const | After T2+T3 | After T1+T2 | After T2+T3 | After T1+T2 | After T2+T3 | After T1+T2 | After T2+T3
T1    | 0         | 1           | 0           | 1           | 0           | 1           | 0           | 1
T2    | 7         | 6           | 5           | 4           | 3           | 2           | 1           | 0
T3    | 1         | 0           | 1           | 0           | 1           | 0           | 1           | 0
If the number of runs is a Fibonacci number F(N), then the runs are distributed as the two
Fibonacci numbers F(N-1) and F(N-2).
Here the number of runs is 8, a Fibonacci number, so it can be distributed as 3 runs on
tape T2 and 5 runs on T3.
This method of distributing runs gives the optimal result, i.e., fewer passes to sort the
records than the other two methods.
VI. HASHING
An array in which TableNodes are not stored consecutively; their place of storage is
calculated using the key and a hash function.
Hashed key: the result of applying a hash function to a key
Keys and entries are scattered throughout the array
insert: calculate place of storage, insert TableNode; O(1)
find: calculate place of storage, retrieve entry; O(1)
remove: calculate place of storage, set it to null; O(1)
All are O(1)!
Fibonacci distribution (3 & 5 runs):
Tapes | Run const | After T2+T3 | After T1+T3 | After T1+T2 | After T2+T3
T1    | 0         | 3           | 1           | 0           | 1
T2    | 3         | 0           | 2           | 1           | 0
T3    | 5         | 2           | 0           | 1           | 0
Three factors affecting the performance of hashing
The hash function
o Ideally, it should distribute keys and entries evenly throughout the table
o It should minimise collisions, where the position given by the hash function is
already occupied
The collision resolution strategy
o Separate chaining: chain together several keys/entries in each position
o Open addressing: store the key/entry in a different position
The size of the table
o Too big will waste memory; too small will increase collisions and may
eventually force rehashing (copying into a larger table)
o Should be appropriate for the hash function used – and a prime number is best
Choosing a hash function: turning a key into a table position
Truncation
o Ignore part of the key and use the rest as the array index (converting non-
numeric parts)
o A fast technique, but check for an even distribution throughout the table
Folding
o Partition the key into several parts and then combine them in any convenient
way
o Unlike truncation, uses information from the whole key
Modular arithmetic (used by truncation & folding, and on its own)
o To keep the calculated table position within the table, divide the position by
the size of the table, and take the remainder as the new position
Examples of hash functions
Truncation: If students have a 9-digit identification number, take the last 3 digits as
the table position
o e.g. 925371622 becomes 622
Folding: Split a 9-digit number into three 3-digit numbers, and add them
o e.g. 925371622 becomes 925 + 371 + 622 = 1918
Modular arithmetic: If the table size is 1000, the first example always keeps within the
table range, but the second example does not (it should be taken mod 1000)
o e.g. 1918 mod 1000 = 918 (in Java: 1918 % 1000)
Using a telephone number as a key
o The area code is not random, so will not spread the keys/entries evenly
through the table (many collisions)
o The last 3-digits are more random
Using a name as a key
o Use full name rather than surname (surname not particularly random)
o Assign numbers to the characters (e.g. a = 1, b = 2; or use Unicode values)
o Strategy 1: Add the resulting numbers. Bad for large table size.
o Strategy 2: Call the number of possible characters c (e.g. c = 54 for alphabet
in upper and lower case, plus space and hyphen). Then multiply each character in
the name by increasing powers of c, and add together
Choosing the table size to minimize collisions
As the number of elements in the table increases, the likelihood of a collision increases
- so make the table as large as practical
If the table size is 100, and all the hashed keys are divisible by 10, there will be many
collisions!
o Particularly bad if table size is a power of a small integer such as 2 or 10
More generally, collisions may be more frequent if:
o greatest common divisor (hashed keys, table size) > 1
Therefore, make the table size a prime number (gcd = 1)
Collisions may still happen, so we need a collision resolution strategy
Collision resolution: open addressing
Probing: If the table position given by the hashed key is already occupied, increase the position
by some amount, until an empty position is found.
Linear probing: increase by 1 each time [mod table size!]
Quadratic probing: to the original position, add 1, 4, 9, 16, ...
Use the collision resolution strategy when inserting and when finding (ensure that the
search key and the found key match).
Double hashing may also be used: the probe step is computed with a second hash function.
With open addressing, the table size should be about double the expected number of elements.
If there are many collisions, linear probing may cluster (group) keys/entries even in a
fairly empty table. This increases the time to insert and to find.
(Diagram: table positions 1 to 8, several already occupied, forming clusters.)
For a table of size n, if the table is empty, the probability of the next entry going to
any particular place is 1/n. In the diagram, the probability of position 2 getting filled next is 2/n
(either a hash to 1 or to 2 fills it). Once 2 is full, the probability of 4 being filled next is 4/n and
then of 7 it is 7/n (i.e., the probability of getting long clusters steadily increases).
An empty key/entry marks the end of a cluster, and so can be used to terminate a find
operation. So, if we remove an entry within a cluster, we should not empty its position. To allow
probing to continue, the removed entry must be marked as 'removed, but cluster continues'.
Quadratic probing is a solution to the clustering problem
Linear probing adds 1, 2, 3, etc. to the original hashed key
Quadratic probing adds 1^2, 2^2, 3^2, etc. to the original hashed key
However, whereas linear probing guarantees that all empty positions will be examined if
necessary, quadratic probing does not
e.g. Table size 16 and original hashed key 3 gives the sequence: 3, 4, 7, 12, 3, 12, 7, 4, ...
More generally, with quadratic probing, insertion may be impossible if the table is more than
half-full
A simple Hash Function
unsigned int hash(const char *key, int table_size)
{
    unsigned int hash_val = 0;

    while (*key)
        hash_val += *key++;
    return hash_val % table_size;
}
Collision resolution: chaining
Each table position is a linked list. Add the keys and entries anywhere in the list (front easiest)
Advantages over open addressing:
– Simpler insertion and removal
– Array size is not a limitation (but should still minimize collisions: make table size
roughly equal to expected number of keys and entries)
Disadvantage
– Memory overhead is large if entries are small
Rehashing: enlarging the table
To rehash:
Create a new table of double the size (adjusting until it is again prime)
Transfer the entries in the old table to the new table, by recomputing their positions (using
the hash function)
When should we rehash?
When the table is completely full
With quadratic probing, when the table is half-full or insertion fails
Why double the size?
If n is the number of elements in the table, there must have been n/2 insertions since the
previous rehash (if rehashing is done when the table is full)
So by making the new table size 2n, only a constant amortized cost is added to each insertion
Applications of Hashing
Compilers use hash tables to keep track of declared variables
A hash table can be used for on-line spelling checkers — if misspelling detection (rather
than correction) is important, an entire dictionary can be hashed and words checked in
constant time
Game playing programs use hash tables to store seen positions, thereby saving
computation time if the position is encountered again
Hash functions can be used to quickly check for inequality — if two elements hash to
different values they must be different
Storing sparse data
Performance of Hashing
The number of probes depends on the load factor (usually denoted by λ), which represents
the ratio of entries present in the table to the number of positions in the array
We also need to consider successful and unsuccessful searches separately
For a chained hash table, the average number of probes for an unsuccessful search is λ and
for a successful search is 1 + λ/2
Question Bank
Unit III- TREES
PART – A (2 MARKS)
1. Explain the tree concept.
2. What is meant by traversal?
3. What is meant by depth-first order?
4. What is inorder traversal?
5. What is preorder traversal?
6. What is postorder traversal?
7. Define binary tree.
8. What is meant by BST?
9. Define AVL trees.
10. Give examples for single rotation and double rotation.
11. Define hashing.
12. Define double hashing.
13. What is meant by binary heap?
14. Mention some applications of priority queues.
15. Define complete binary tree.
16. How is a binary tree represented using an array? Give an example.
17. A full node is a node with two children. Prove that the number of full nodes plus one is equal to the number of leaves in a non-empty binary tree.
18. Define (i) inorder (ii) preorder (iii) postorder traversal of a binary tree.
19. Suppose that we replace the deletion function, which finds, returns, and removes the minimum element in the priority queue, with find min. Can both insert and find min be implemented in constant time?
20. What is an expression tree?
21. What is a binary search tree?
22. What is meant by sorting?
23. Mention the preliminaries of sorting.
24. What are the types of sorting?
25. What is the difference between bubble sort and selection sort?
26. Give an example for insertion sort.
27. Mention the running time of insertion sort.
28. What is meant by heap sort?
29. What is meant by quick sort?
30. What is the advantage of quick sort over merge sort?
31. Mention the best case and worst case of quick sort.
32. What is meant by external sorting?
33. Determine the average running time of quick sort.
34. Trace the steps of insertion sort on 12, 19, 33, 26, 29, 35, 22. Find the total number of comparisons made.
35. What is the principle of radix sort?
36. What is insertion sort?
37. What is shell sort?
38. Define the worst-case analysis of shell sort.
39. What is merge sort?
40. What is meant by external sorting?
41. What is multiway merge?
PART- B (16 MARKS)
1. Explain the operation and implementation of the binary heap.
2. Explain the implementation of different hashing techniques.
3. Give the prefix, infix and postfix expressions corresponding to the tree given in the figure.
   [expression tree figure: operands a, b, c, d, e]
   (a) How do you insert an element in a binary search tree? (8)
   (b) Show that for the perfect binary tree of height h containing 2^(h+1) - 1 nodes, the sum of the heights of the nodes is 2^(h+1) - 1 - (h+1). (8)
4. Given input 4371, 1323, 6173, 4199, 4344, 9679, 1989 and a hash function h(X) = X mod 10, show the resulting:
   (a) Separate chaining hash table (4)
   (b) Open addressing hash table using linear probing (4)
   (c) Open addressing hash table using quadratic probing (4)
   (d) Open addressing hash table with second hash function h2(X) = 7 - (X mod 7) (4)
5. Explain in detail (i) single rotation (ii) double rotation of an AVL tree.
6. Explain the efficient implementation of the priority queue ADT.
7. Explain how to find the maximum and minimum elements in a BST, and explain in detail deletion from a binary search tree.
8. Sort the sequence 3, 1, 4, 7, 5, 9, 2, 6, 5 using insertion sort.
9. Explain the operation and implementation of insertion sort and shell sort.
10. Explain the operation and implementation of merge sort.
11. Explain the operation and implementation of external sorting.
12. (a) Write the quick sort algorithm and explain. (10)
    (b) Trace the quick sort algorithm for the following list of numbers: 90, 77, 60, 99, 55, 88, 66. (6)
13. Write down the merge sort algorithm and give its worst case, best case and average case analysis.
14. Show how heap sort processes the input 142, 543, 123, 65, 453, 879, 572, 434, 111, 242, 811, 102.
UNIT IV - GRAPHS AND THEIR APPLICATIONS
Graphs – An Application of Graphs – Representation – Transitive Closure – Warshall's Algorithm – Shortest Path Algorithm – A Flow Problem – Dijkstra's Algorithm – Minimum Spanning Trees – Kruskal's and Prim's Algorithms – An Application of Scheduling – Linked Representation of Graphs – Graph Traversals
Unit: IV - GRAPHS AND THEIR APPLICATIONS
GRAPHS
A graph consists of a collection of vertices or nodes, connected by a collection of edges. Any time you have a set of objects with some connection, relationship or interaction between pairs of them, a graph is a good way to model it.

Graphs are extremely important because they are a very flexible mathematical model for many application problems. Examples of graphs in applications include communication and transportation networks, VLSI and other sorts of logic circuits, surface meshes used for shape description in computer-aided design and geographic information systems, and precedence constraints in scheduling systems.
Directed Graph
A directed graph (or digraph) G = (V,E) consists of a finite set V , called the vertices or
nodes, and E, a set of ordered pairs, called the edges of G.
Undirected Graph
An undirected graph (or graph) G = (V,E) consists of a finite set V of vertices, and a set E
of unordered pairs of distinct vertices, called the edges.
Directed graphs and undirected graphs are different objects mathematically. We say that vertex v is adjacent to vertex u if there is an edge (u, v).

In a directed graph, given the edge e = (u, v), we say that u is the origin of e and v is the destination of e. In undirected graphs, u and v are the endpoints of the edge. The edge e is incident on (meaning that it touches) both u and v.
In a digraph, the number of edges coming out of a vertex is called the out-degree of that
vertex, and the number of edges coming in is called the in-degree.
In an undirected graph we just talk about the degree of a vertex as the number of incident
edges.
Weighted Graph
In a weighted graph, a value (weight) is assigned to each edge. A weighted graph may be a directed graph or an undirected graph. The weight of an edge between nodes u and v is denoted W(u,v).

For example, W(1,2) = 1 and W(1,4) = 3 in the digraph.
AN APPLICATION OF GRAPHS
Assume one input line containing four integers followed by any number of input lines
with two integers each. The first integer on the first line, n, represents the number of cities,
which for simplicity are numbered from 0 to n-1. The second and third integers on that line are
between 0 and n-1 and represent two cities.
It is desired to travel from the first city to the second city using exactly nr roads, where nr is the fourth integer on the first input line.

Each subsequent input line contains two integers representing two cities, indicating that there is a road from the first city to the second.

The problem is to determine whether there is a path of the required length by which one can travel from the first of the given cities to the second.

A plan for the solution is the following: create a graph with the cities as nodes and the roads as arcs. To find a path of length nr from node A to node B, look for a node C such that an arc exists from A to C and a path of length nr-1 exists from C to B.

If these conditions are satisfied for some node C, the desired path exists. If the conditions are not satisfied for any node C, the desired path does not exist.
scanf("%d", &n);
scanf("%d %d", &a, &b);
scanf("%d", &nr);
while (scanf("%d %d", &city1, &city2) != EOF)
    join(city1, city2);
if (findpath(nr, a, b))
    printf("a path exists from %d to %d in %d steps", a, b, nr);
else
    printf("no path exists from %d to %d in %d steps", a, b, nr);
The algorithm for the function findpath(k, a, b) is:

if (k == 1)
    return (adjacent(a, b));
for (c = 0; c < n; ++c)
    if (adjacent(a, c) && findpath(k-1, c, b))
        return (TRUE);
return (FALSE);
REPRESENTATIONS
DIRECTED GRAPH
Adjacency Matrix
An n x n matrix A defined for 1 <= v, w <= n, where A[v][w] = 1 if (v,w) is an edge and 0 otherwise.

If the digraph has weights we can store the weights in the matrix. For example, if (v,w) ∈ E then A[v][w] = W(v,w). If (v,w) ∉ E then W(v,w) need not be defined, but often we set it to some special value, e.g. A[v][w] = −1 or ∞.
Adjacency List
An array adj[1..n] of pointers where for 1 <= v <= n, adj[v] points to a linked list
containing the vertices which are adjacent to v (i.e. the vertices that can be reached from v by a
single edge). If the edges have weights then these weights may also be stored in the linked list
elements.
UNDIRECTED GRAPH
Undirected graphs use exactly the same representation, but we store each edge twice. In particular, we represent the undirected edge (v,w) by the two oppositely directed edges (v, w) and (w, v).
This can cause some complications. For example, suppose you write an algorithm that
operates by marking edges of a graph. You need to be careful when you mark edge (v, w) in the
representation that you also mark (w, v), since they are both the same edge in reality.
When dealing with adjacency lists, it may not be convenient to walk down the entire linked list,
so it is common to include cross links between corresponding edges.
An adjacency matrix requires O(|V|²) storage and an adjacency list requires O(|V|+|E|) storage. The |V| arises because there is one entry for each vertex in adj. Since each list has out-deg(v) entries, when this is summed over all vertices, the total number of adjacency list records is O(|E|). For sparse graphs the adjacency list representation is more space efficient.
TRANSITIVE CLOSURE
Let us assume that the graph is completely described by its adjacency matrix, adj. Consider the logical expression adj[i][k] && adj[k][j]. Its value is TRUE if and only if the values of both adj[i][k] and adj[k][j] are TRUE, which implies that there is an arc from node i to node k and an arc from node k to node j. Thus adj[i][k] && adj[k][j] equals TRUE if and only if there is a path of length 2 from i to j passing through k.

Consider the expression

(adj[i][0] && adj[0][j]) || (adj[i][1] && adj[1][j]) || ... || (adj[i][MAXNODES-1] && adj[MAXNODES-1][j])

The value of this expression is TRUE only if there is a path of length 2 from node i to node j, either through node 0 or through node 1 ... or through node MAXNODES-1.

Consider an array adj2 such that adj2[i][j] is the value of the foregoing expression. adj2 is called the path matrix of length 2. adj2[i][j] indicates whether or not there is a path of length 2 between i and j. adj2 is the Boolean product of adj with itself.
Fig (a): adj Fig (b): adj2
The figure illustrates the process. Fig (a) depicts a graph and its adjacency matrix, in which true is represented by 1 and false by 0. Fig (b) is the Boolean product of that matrix with itself, and is thus the path matrix of length 2 for the graph. A 1 appears in row i, column j of the matrix in Fig (b) if and only if there is a path of length 2 from node i to node j in the graph.
Define adj3, the path matrix of length 3, as the Boolean product of adj2 with adj. adj3[i][j]
equals TRUE if and only if there is a path of length 3 from i to j. The below Fig illustrates the
matrices adj3 and adj4 of the graph in Fig (a)
A B C D E
A 0 0 0 1 1
B 0 0 0 1 1
C 0 0 0 1 1
D 0 0 0 0 1
E 0 0 0 1 0
Fig: adj3
A B C D E
A 0 0 0 1 1
B 0 0 0 1 1
C 0 0 0 1 1
D 0 0 0 1 0
E 0 0 0 0 1
Fig: adj4
Assume that we want to know whether a path of length 3 or less exists between two nodes of a graph. If such a path exists between nodes i and j, it must be of length 1, 2, or 3. If there is a path of length 3 or less between nodes i and j, the value of

adj[i][j] || adj2[i][j] || adj3[i][j]

must be true. The figure shows the matrix formed by "or-ing" the matrices adj, adj2 and adj3.
A B C D E
A 0 0 1 1 1
B 0 0 1 1 1
C 0 0 0 1 1
D 0 0 0 1 1
E 0 0 0 1 1
Fig: matrix formed by or-ing adj, adj2 and adj3
If the graph has n nodes, it must be true that

path[i][j] == adj[i][j] || adj2[i][j] || ... || adjn[i][j]

This is because if there is a path of length m > n from i to j, such as i, i1, i2, ..., im-1, j, then since there are only n nodes in the graph, at least one node k must appear in the path twice. The path from i to j can be shortened by removing the cycle from k to k. This process is repeated until no two nodes in the path are equal, and therefore until the path is of length n or less. The figure illustrates the matrix path for the graph of Fig (a). The matrix path is called the transitive closure of the matrix adj.
A B C D E
A 0 0 1 1 1
B 0 0 1 1 1
C 0 0 0 1 1
D 0 0 0 1 1
E 0 0 0 1 1
Fig: path = adj or adj2 or adj3 or adj4 or adj5
Transitive closure routine
transclose(adj, path)
int adj[][MAXNODES], path[][MAXNODES];
{
    int i, j, k;
    int newprod[MAXNODES][MAXNODES], adjprod[MAXNODES][MAXNODES];

    for (i = 0; i < MAXNODES; ++i)
        for (j = 0; j < MAXNODES; ++j)
            adjprod[i][j] = path[i][j] = adj[i][j];
    for (i = 1; i < MAXNODES; ++i) {
        /* compute the next Boolean product */
        prod(adjprod, adj, newprod);
        /* "or" it into the result */
        for (j = 0; j < MAXNODES; ++j)
            for (k = 0; k < MAXNODES; ++k)
                path[j][k] = path[j][k] || newprod[j][k];
        for (j = 0; j < MAXNODES; ++j)
            for (k = 0; k < MAXNODES; ++k)
                adjprod[j][k] = newprod[j][k];
    }
}
The routine prod may be written as:

prod(a, b, c)
int a[][MAXNODES], b[][MAXNODES], c[][MAXNODES];
{
    int i, j, k, val;

    for (i = 0; i < MAXNODES; ++i)
        for (j = 0; j < MAXNODES; ++j) {
            val = FALSE;
            for (k = 0; k < MAXNODES; ++k)
                val = val || (a[i][k] && b[k][j]);
            c[i][j] = val;
        }
}
To analyze the efficiency of this routine, note that finding a Boolean product by this method is O(n³), where n is the number of graph nodes. In transclose, this process is embedded in a loop that is repeated n-1 times, so that the entire transitive closure routine is O(n⁴).
WARSHALL’S ALGORITHM
Let us define the matrix pathk such that pathk[i][j] is true if and only if there is a path from node i to node j that does not pass through any nodes numbered higher than k.

The only situation in which pathk+1[i][j] can be TRUE while pathk[i][j] equals FALSE is if there is a path from i to j passing through node k+1 but through no higher-numbered node.

But this means that there must be a path from i to k+1 passing through only nodes 1 through k, and a similar path from k+1 to j. Thus pathk+1[i][j] equals TRUE if and only if one of the following two conditions holds:

pathk[i][j] == TRUE
pathk[i][k+1] == TRUE and pathk[k+1][j] == TRUE

This means that pathk+1[i][j] equals pathk[i][j] || (pathk[i][k+1] && pathk[k+1][j]).

To obtain the matrix pathk from the matrix pathk-1, we may use

for (i = 0; i < MAXNODES; ++i)
    for (j = 0; j < MAXNODES; ++j)
        pathk[i][j] = pathk-1[i][j] || (pathk-1[i][k] && pathk-1[k][j]);
This may be logically simplified and made more efficient as

for (i = 0; i < MAXNODES; ++i)
    for (j = 0; j < MAXNODES; ++j)
        pathk[i][j] = pathk-1[i][j];
for (i = 0; i < MAXNODES; ++i)
    if (pathk-1[i][k] == TRUE)
        for (j = 0; j < MAXNODES; ++j)
            pathk[i][j] = pathk-1[i][j] || pathk-1[k][j];
Clearly path0[i][j] = adj[i][j], since the only way to go from node i to node j without passing through any other nodes is to go directly from i to j. The following C routine is used to compute the transitive closure:

transclose(adj, path)
int adj[][MAXNODES], path[][MAXNODES];
{
    int i, j, k;

    for (i = 0; i < MAXNODES; ++i)
        for (j = 0; j < MAXNODES; ++j)
            path[i][j] = adj[i][j];
    for (k = 0; k < MAXNODES; ++k)
        for (i = 0; i < MAXNODES; ++i)
            if (path[i][k] == TRUE)
                for (j = 0; j < MAXNODES; ++j)
                    path[i][j] = path[i][j] || path[k][j];
}
This technique increases the efficiency of finding the transitive closure to O(n³). This method is called Warshall's algorithm.
SHORTEST-PATH ALGORITHMS
The input is a weighted graph: associated with each edge (vi, vj) is a cost ci,j to traverse the arc. The cost of a path v1 v2 ... vn is the sum of ci,i+1 for i = 1 to n-1. This is referred to as the weighted path length. The unweighted path length is merely the number of edges on the path, namely n - 1.
Unweighted Shortest Paths
Fig shows an unweighted graph, G. Using some vertex, s, which is an input parameter, we would like to find the shortest path from s to all other vertices. There are no weights on the edges. This is clearly a special case of the weighted shortest-path problem, since we could assign all edges a weight of 1.
For now, suppose we are interested only in the length of the shortest paths, not in the
actual paths themselves. Keeping track of the actual paths will turn out to be a matter of simple
bookkeeping.
Fig An unweighted directed graph G
Suppose we choose s to be v3. Immediately, we can tell that the shortest path from s to v3 is a path of length 0. This is marked in the figure.

Next we look for all vertices that are a distance 1 away from s. These can be found by looking at the vertices that are adjacent to s. If we do this, we see that v1 and v6 are one edge from s. This is shown in the figure.
Fig: Graph after marking the start node as reachable in zero edges
Fig: Graph after finding all vertices whose path length from s is 1
Fig: Graph after finding all vertices whose shortest path is 2
We can now find vertices whose shortest path from s is exactly 2, by finding all the
vertices adjacent to v1 and v6 (the vertices at distance 1), whose shortest paths are not already
known. This search tells us that the shortest path to v2 and v4 is 2.
Finally we can find, by examining vertices adjacent to the recently evaluated v2 and v4, that v5 and v7 have a shortest path of three edges. All vertices have now been calculated; the figure shows the final result of the algorithm.
This strategy for searching a graph is known as breadth-first search. It operates by
processing vertices in layers: the vertices closest to the start are evaluated first, and the most
distant vertices are evaluated last. This is much the same as a level-order traversal for trees.
Given this strategy, we must translate it into code. Fig shows the initial configuration of
the table that our algorithm will use to keep track of its progress.
For each vertex, we will keep track of three pieces of information. First, we will keep its
distance from s in the entry dv. Initially all vertices are unreachable except for s, whose path
length is 0. The entry in pv is the bookkeeping variable, which will allow us to print the actual
paths. The entry known is set to 1 after a vertex is processed. Initially, all entries are unknown,
including the start vertex.
Fig: Final shortest paths
When a vertex is known, we have a guarantee that no cheaper path will ever be found, and so processing for that vertex is essentially complete. The basic algorithm is described in the figure. It mimics the diagrams by declaring as known the vertices at distance d = 0, then d = 1, then d = 2, and so on, and setting all the adjacent vertices w that still have dw = ∞ to a distance dw = d + 1. The running time of the algorithm is O(|V|²) because of the doubly nested for loops.
v Known dv pv
----------------------
v1 0 ∞ 0
v2 0 ∞ 0
v3 0 0 0
v4 0 ∞ 0
v5 0 ∞ 0
v6 0 ∞ 0
v7 0 ∞ 0
Fig: Initial configuration of table used in unweighted shortest-path computation
void unweighted( TABLE T )   /* assume T is initialized */
{
    unsigned int curr_dist;
    vertex v, w;

    for( curr_dist = 0; curr_dist < NUM_VERTEX; curr_dist++ )
        for each vertex v
            if( ( !T[v].known ) && ( T[v].dist == curr_dist ) )
            {
                T[v].known = TRUE;
                for each w adjacent to v
                    if( T[w].dist == INT_MAX )
                    {
                        T[w].dist = curr_dist + 1;
                        T[w].path = v;
                    }
            }
}
Fig: Pseudocode for unweighted shortest-path algorithm
NETWORK FLOW PROBLEM
Suppose we are given a directed graph G = (V, E) with edge capacities cv,w. These
capacities could represent the amount of water that could flow through a pipe or the amount of
traffic that could flow on a street between two intersections.
We have two vertices: s, which we call the source, and t, which is the sink. Through any
edge, (v, w), at most cv,w units of "flow" may pass. At any vertex, v, that is not either s or t, the
total flow coming in must equal the total flow going out.
The maximum flow problem is to determine the maximum amount of flow that can pass
from s to t. As an example, for the graph in Fig on the left the maximum flow is 5, as indicated
by the graph on the right.
Fig: A graph (left) and its maximum flow
As required by the problem statement, no edge carries more flow than its capacity. Vertex
a has three units of flow coming in, which it distributes to c and d. Vertex d takes three units of
flow from a and b and combines this, sending the result to t. A vertex can combine and distribute
flow in any manner that it likes, as long as edge capacities are not violated and as long as flow
conservation is maintained.
A first attempt to solve the problem proceeds in stages. We start with our graph, G, and construct a flow graph Gf. Gf tells the flow that has been attained at any stage in the algorithm. Initially all edges in Gf have no flow; when the algorithm terminates, Gf contains a maximum flow. We also construct a graph, Gr, called the residual graph. Gr tells, for each edge, how much more flow can be added. We can calculate this by subtracting the current flow from the capacity for each edge. An edge in Gr is known as a residual edge.
At each stage, we find a path in Gr from s to t. This path is known as an augmenting path. The minimum edge on this path is the amount of flow that can be added to every edge on the path. We do this by adjusting Gf and recomputing Gr. When we find no path from s to t in Gr, we terminate. The initial configuration is in Fig (a). There are many paths from s to t in the residual graph. Suppose we select s, b, d, t. Then we can send two units of flow through every edge on this path; once we have filled (saturated) an edge, it is removed from the residual graph, as in Fig (b). Next, select the path s, a, c, t, which also allows two units of flow. Making the required adjustments gives the graphs in Fig (c).
Fig (a) Initial stages of the graph, flow graph, and residual graph
Fig (b) G, Gf, Gr after two units of flow added along s, b, d, t
Fig (c ) G, Gf, Gr after two units of flow added along s, a, c, t
The only path left to select is s, a, d, t, which allows one unit of flow. The resulting graphs are shown in Fig (d). The algorithm terminates at this point, because t is unreachable from s. The resulting flow of 5 happens to be the maximum.

Suppose instead that with our initial graph, we had chosen the path s, a, d, t first, sending three units of flow along it. There would then no longer be any path from s to t in the residual graph, and thus our algorithm would have failed to find an optimal solution. Fig (e) shows how the algorithm fails.
To fix this, for every edge (v, w) with flow fv,w in the flow graph, we will add an edge in the residual graph (w, v) of capacity fv,w. Starting from our original graph and selecting the augmenting path s, a, d, t, we obtain the graphs in Fig (f).
Notice that in the residual graph, there are now edges in both directions between a and d. Either one more unit of flow can be pushed from a to d, or up to three units can be pushed back, allowing us to undo flow. The algorithm then finds the augmenting path s, b, d, a, c, t, of flow 2. By pushing two units of flow from d to a, the algorithm takes two units of flow away from the edge (a, d). Fig (g) shows the new graphs.
Fig (d) G, Gf, Gr after one unit of flow added along s, a, d, t -- algorithm terminates
Fig (e) G, Gf, Gr if initial action is to add three units of flow along s, a, d, t -- algorithm
terminates with suboptimal solution
Fig (f) Graphs after three units of flow added along s, a, d, t using correct algorithm
Fig (g) Graphs after two units of flow added along s, b, d, a, c, t using correct algorithm
There is no augmenting path in this graph, so the algorithm terminates. If the edge capacities are rational numbers, this algorithm always terminates with a maximum flow.

If the capacities are all integers and the maximum flow is f, then, since each augmenting path increases the flow value by at least 1, f stages suffice, and the total running time is O(f·|E|), since an augmenting path can be found in O(|E|) time by an unweighted shortest-path algorithm.
Dijkstra's Algorithm
The general method to solve the single-source shortest-path problem is known as
Dijkstra's algorithm. This thirty-year-old solution is a prime example of a greedy algorithm.
Dijkstra's algorithm proceeds in stages, just like the unweighted shortest-path algorithm.
At each stage, Dijkstra's algorithm selects a vertex v, which has the smallest dv among all the
unknown vertices, and declares that the shortest path from s to v is known.
The remainder of a stage consists of updating the values of dw. In the unweighted case,
we set dw = dv + 1 if dw = ∞.
Thus, we essentially lowered the value of dw if vertex v offered a shorter path. If we
apply the same logic to the weighted case, then we should set dw = dv + cv,w if this new value
for dw would be an improvement.
The graph in Fig is our example. Fig (a) represents the initial configuration, assuming
that the start node, s, is v1. The first vertex selected is v1, with path length 0. This vertex is
marked known. Now that v1 is known, some entries need to be adjusted. The vertices adjacent to
v1 are v2 and v4.
Both these vertices get their entries adjusted, as indicated in Fig (b)
Next, v4 is selected and marked known. Vertices v3, v5, v6, and v7 are adjacent, and it
turns out that all require adjusting, as shown in Fig (c)
Next, v2 is selected. v4 is adjacent but already known, so no work is performed on it. v5
is adjacent but not adjusted, because the cost of going through v2 is 2 + 10 = 12 and a path of
length 3 is already known. Fig (d) shows the tables after these vertices are selected.
The next vertex selected is v5 at cost 3. v7 is the only adjacent vertex, but it is not
adjusted, because 3 + 6 > 5. Then v3 is selected, and the distance for v6 is adjusted down to 3 + 5
= 8. The resulting table is depicted in Fig (e)
Next v7 is selected; v6 gets updated down to 5 + 1 = 6. The resulting table is Fig (f)
Finally, v6 is selected.
The final table is shown in Fig (g). Fig (h) graphically shows how edges are marked
known and vertices updated during Dijkstra's algorithm.
Fig: The directed graph G
v Known dv pv
-------------------
v1 0 0 0
v2 0 ∞ 0
v3 0 ∞ 0
v4 0 ∞ 0
v5 0 ∞ 0
v6 0 ∞ 0
v7 0 ∞ 0
Fig (a) Initial configuration of table used in Dijkstra's algorithm
v Known dv pv
--------------------
v1 1 0 0
v2 0 2 v1
v3 0 ∞ 0
v4 0 1 v1
v5 0 ∞ 0
v6 0 ∞ 0
v7 0 ∞ 0
Fig (b) After v1 is declared known
v Known dv pv
--------------------
v1 1 0 0
v2 0 2 v1
v3 0 3 v4
v4 1 1 v1
v5 0 3 v4
v6 0 9 v4
v7 0 5 v4
Fig (c) After v4 is declared known
v Known dv pv
--------------------
v1 1 0 0
v2 1 2 v1
v3 0 3 v4
v4 1 1 v1
v5 0 3 v4
v6 0 9 v4
v7 0 5 v4
Fig (d) After v2 is declared known
v Known dv pv
--------------------
v1 1 0 0
v2 1 2 v1
v3 1 3 v4
v4 1 1 v1
v5 1 3 v4
v6 0 8 v3
v7 0 5 v4
Fig (e) After v5 and then v3 are declared known
v Known dv pv
-------------------
v1 1 0 0
v2 1 2 v1
v3 1 3 v4
v4 1 1 v1
v5 1 3 v4
v6 0 6 v7
v7 1 5 v4
Fig (f) After v7 is declared known
v Known dv pv
-------------------
v1 1 0 0
v2 1 2 v1
v3 1 3 v4
v4 1 1 v1
v5 1 3 v4
v6 1 6 v7
v7 1 5 v4
Fig (g) After v6 is declared known and algorithm terminates
Fig (h) Stages of Dijkstra's algorithm
void dijkstra( TABLE T )
{
    vertex v, w;

    for( ; ; )
    {
        v = smallest unknown distance vertex;
        if( v == NotAVertex )
            break;
        T[v].known = TRUE;
        for each w adjacent to v
            if( !T[w].known )
                if( T[v].dist + cv,w < T[w].dist )
                {
                    decrease( T[w].dist to T[v].dist + cv,w );
                    T[w].path = v;
                }
    }
}
Fig: Pseudocode for Dijkstra's algorithm
MINIMUM SPANNING TREE
A minimum spanning tree of an undirected graph G is a tree formed from graph edges that connects all the vertices of G at the lowest total cost. A minimum spanning tree exists if and only if G is connected.

In the figure, the second graph is a minimum spanning tree of the first. The number of edges in the minimum spanning tree is |V|-1. The minimum spanning tree is a tree because it is acyclic, it is spanning because it covers every vertex, and it is minimum for the obvious reason.
Fig: A graph G and its minimum spanning tree
For any spanning tree T, if an edge e that is not in T is added, a cycle is created. The removal of any edge on the cycle reinstates the spanning tree property. The cost of the spanning tree is lowered if e has lower cost than the edge that was removed. If, as a spanning tree is created, the edge that is added is the one of minimum cost that avoids creation of a cycle, then the cost of the resulting spanning tree cannot be improved, because any replacement edge would have cost at least as much as an edge already in the spanning tree.
Prim’s Algorithm
One way to compute a minimum spanning tree is to grow the tree in successive stages. Initially, one node is picked as the root; in each stage we add an edge, and thus an associated vertex, to the tree.

The algorithm finds, at each stage, a new vertex to add to the tree by choosing the edge (u, v) such that the cost of (u, v) is the smallest among all edges where u is in the tree and v is not. The figure shows how this algorithm would build the minimum spanning tree, starting from v1. Initially, v1 is in the tree as a root with no edges. Each step adds one edge and one vertex to the tree.
Fig: Prim's algorithm after each stage
Prim's algorithm is identical to Dijkstra's algorithm for shortest paths. For each vertex we keep values dv and pv and an indication of whether it is known or unknown. dv is the weight of the shortest arc connecting v to a known vertex, and pv is the last vertex to cause a change in dv.

The rest of the algorithm is the same, with the exception that since the definition of dv is different, so is the update rule. After a vertex v is selected, for each unknown w adjacent to v, dw = min(dw, cw,v).
The initial configuration of the table is shown in Fig (a). v1 is selected, and v2, v3, and v4 are updated. The table resulting from this is shown in Fig (b).
The next vertex selected is v4. Every vertex is adjacent to v4. v1 is not examined, because
it is known. v2 is unchanged, because it has dv = 2 and the edge cost from v4 to v2 is 3; all the
rest are updated.
Fig (c) shows the resulting table. The next vertex chosen is v2 (arbitrarily breaking a tie).
This does not affect any distances. Then v3 is chosen, which affects the distance in v6, producing
Fig (d). Fig (e) results from the selection of v7, which forces v6 and v5 to be adjusted. v6 and
then v5 are selected, completing the algorithm.
The final table is shown in Fig (f). The edges in the spanning tree can be read from the table: (v2, v1), (v3, v4), (v4, v1), (v5, v7), (v6, v7), (v7, v4). The total cost is 16.
v Known dv pv
--------------------
v1 0 0 0
v2 0 ∞ 0
v3 0 ∞ 0
v4 0 ∞ 0
v5 0 ∞ 0
v6 0 ∞ 0
v7 0 ∞ 0
Fig (a) Initial configuration of table used in Prim's algorithm
v Known dv pv
--------------------
v1 1 0 0
v2 0 2 v1
v3 0 4 v1
v4 0 1 v1
v5 0 ∞ 0
v6 0 ∞ 0
v7 0 ∞ 0
Fig (b) The table after v1 is declared known
v Known dv pv
--------------------
v1 1 0 0
v2 0 2 v1
v3 0 2 v4
v4 1 1 v1
v5 0 7 v4
v6 0 8 v4
v7 0 4 v4
Fig (c) The table after v4 is declared known
v Known dv pv
--------------------
v1 1 0 0
v2 1 2 v1
v3 1 2 v4
v4 1 1 v1
v5 0 7 v4
v6 0 5 v3
v7 0 4 v4
Fig (d) The table after v2 and then v3 are declared known
v Known dv pv
--------------------
v1 1 0 0
v2 1 2 v1
v3 1 2 v4
v4 1 1 v1
v5 0 6 v7
v6 0 1 v7
v7 1 4 v4
Fig (e)The table after v7 is declared known
v Known dv pv
--------------------
v1 1 0 0
v2 1 2 v1
v3 1 2 v4
v4 1 1 v1
v5 1 6 v7
v6 1 1 v7
v7 1 4 v4
Fig (f) The table after v6 and v5 are selected (Prim's algorithm terminates)
Prim's algorithm runs on undirected graphs, so when coding it, remember to put every edge in two adjacency lists. The running time is O(|V|²) without heaps, which is optimal for dense graphs, and O(|E| log |V|) using binary heaps, which is good for sparse graphs.
Kruskal’s Algorithm
A second greedy strategy is to continually select the edges in order of smallest weight and accept an edge if it does not cause a cycle. The action of the algorithm on the graph in the preceding example is shown in the figure.
Formally, Kruskal‘s algorithm maintains a forest- a collection of trees. Initially, there are
|V| single-node trees. Adding an edge merges two trees into one.
When the algorithm terminates, there is only one tree, and this is the minimum spanning
tree. Fig shows the order in which edges are added to the forest. The algorithm terminates when
enough edges are accepted. It turns out to be simple to decide whether edge (u, v) should be
accepted or rejected.
Edge Weight Action
----------------------------
(v1,v4) 1 Accepted
(v6,v7) 1 Accepted
(v1,v2) 2 Accepted
(v3,v4) 2 Accepted
(v2,v4) 3 Rejected
(v1,v3) 4 Rejected
(v4,v7) 4 Accepted
(v3,v6) 5 Rejected
(v5,v7) 6 Accepted
Fig: Action of Kruskal's algorithm on G
Fig: Kruskal's algorithm after each stage
The invariant is that at any point in the process, two vertices belong to the same set if and
only if they are connected in the current spanning forest. Thus, each vertex is initially in its own
set. If u and v are in the same set, the edge is rejected, because since they are already connected,
adding (u, v) would form a cycle. Otherwise, the edge is accepted, and a union is performed on
the two sets containing u and v. It is easy to see that this maintains the set invariant, because
once the edge (u, v) is added to the spanning forest, if w was connected to u and x was connected
to v, then x and w must now be connected, and thus belong in the same set.
The worst-case running time is O(|E| log |V|), which is dominated by the heap operations.
Pseudocode for Kruskal’s Algorithm
void
Kruskal( Graph G )
{
    int EdgesAccepted;
    DisjSet S;
    PriorityQueue H;
    Vertex U, V;
    SetType USet, VSet;
    Edge E;

    Initialize( S );
    ReadGraphIntoHeapArray( G, H );
    BuildHeap( H );
    EdgesAccepted = 0;
    while( EdgesAccepted < NumVertex - 1 )
    {
        E = DeleteMin( H );   /* E = (U, V) */
        USet = Find( U, S );
        VSet = Find( V, S );
        if( USet != VSet )
        {
            /* accept the edge */
            EdgesAccepted++;
            SetUnion( S, USet, VSet );
        }
    }
}
AN APPLICATION OF SCHEDULING
Suppose a chef in a diner receives an order for a fried egg. The job of frying an egg can
be decomposed into a number of distinct subtasks:
Get egg
Crack egg
Get grease
Grease pan
Heat grease
Pour egg into pan
Wait until egg is done
Remove egg
Some of these tasks must precede others (for example, "get egg" must precede "crack
egg"). Others may be done simultaneously (for example, "get egg" and "heat grease").
The chef wishes to provide the quickest service possible and is assumed to have an
unlimited number of assistants. The problem is to assign tasks to the assistants so as to complete
the job in the least possible time.
Although this example may seem frivolous, it is typical of many real-world scheduling
problems. A computer system may wish to schedule jobs to minimize turnaround time; a
compiler may wish to schedule machine language operations to minimize execution time; or a
plant manager may wish to organize an assembly line to minimize production time. All these
problems are closely related and can be solved by the use of graphs.
Let us represent the above problem as a graph. Each node of the graph represents a
subtask and each arc <x,y> represents the requirement that subtask y cannot be performed until
subtask x has been completed. This graph G is shown in Fig.
Fig: The graph G
Consider the transitive closure of G. The transitive closure is the graph T such that <x,y> is an
arc of T if and only if there is a path from x to y in G. This transitive closure is shown in Fig.
Fig: The graph T
In the graph T, an arc exists from node x to node y if and only if subtask x must be performed
before subtask y. Note that neither G nor T can contain a cycle, since if a cycle from node x to
itself existed, subtask x could not be performed until after subtask x had been completed. This is
clearly an impossible situation in the context of the problem. Thus G is a dag, a directed acyclic
graph.
In the graphs of Figures, the nodes A and F do not have predecessors. Since they have no
predecessors the subtasks that they represent may be performed immediately and simultaneously
without waiting for any other subtasks to be completed. Every other subtask must wait until at
least one of these is completed.
Once these two subtasks have been performed, their nodes can be removed from the
graph. Note that the resulting graph does not contain any cycles, since nodes and arcs have been
removed from a graph that originally contained no cycles.
Therefore the resulting graph must also contain at least one node with no predecessors. In
the example, B and H are two such nodes. Thus the subtasks B and H may be performed
simultaneously in the second time period.
Time period   Assistant 1              Assistant 2
1             Get egg                  Get grease
2             Crack egg                Grease pan
3             Heat grease
4             Pour egg into pan
5             Wait until egg is done
6             Remove egg
The above process can be outlined as follows:
Read the precedences and construct the graph.
Use the graph to determine subtasks that can be done simultaneously.
Let us refine each of these two steps. Two crucial decisions must be made in refining step 1.
The first is to decide the format of the input; the second is to decide on the representation of the
graph. The most convenient way to represent these requirements is by ordered pairs of
subtasks; each input line contains the names of two subtasks, where the first subtask on a line
must precede the second.
Step 2 can be refined into the following algorithm:
while (the graph is not empty) {
    determine which nodes have no predecessors;
    output this group of nodes with an indication that they can be performed
        simultaneously in the next time period;
    remove these nodes and their incident arcs from the graph;
}
The refinement of Step 2 may be rewritten as
outp = NULL;
for (all nodes p in the graph)
    if (count(p) == 0) {
        remove node(p) from the graph;
        place node(p) on the outp list;
    }
period = 0;
while (outp != NULL) {
    ++period;
    printf("%d", period);
    nextout = NULL;
    p = outp;
    while (p != NULL) {
        printf("%s", info(p));
        for (all arcs a emanating from node(p)) {
            t = the pointer to the node that terminates a;
            count(t)--;
            if (count(t) == 0) {
                remove node(t) from the graph;
                add node(t) to the nextout list;
            }
            free arc(a);
        }
        q = next(p);
        freenode(p);
        p = q;
    }
    outp = nextout;
}
if (any nodes remain in the graph)
    error - there is a cycle in the graph;
It is necessary in step 1 to be able to access each graph node from the character string that
specifies the task the node represents. For this reason, it makes sense to organize the set of graph
nodes in a hash table. Although the initial traversal will require accessing some extra table
positions, this is more than offset by the ability to access a node directly from its task name. The
only impediment is the need (in line 19) to delete nodes from the graph.
However, further analysis reveals that the only reason to delete a node is to be able to
check whether any nodes remain when the output list is empty (line 30) so that a cycle may be
detected. If we maintain a counter of the number of nodes and implement the deletion by
reducing this counter by 1, we can check for remaining nodes by comparing the counter with
zero.
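The outlined process, maintaining a predecessor count per node, can be sketched in C with a simple array-based queue. The adjacency-matrix representation and all names here are assumptions for illustration, not the text's hash-table scheme:

```c
#include <assert.h>

#define MAXNODES 20

/* count[v] holds the number of unprocessed predecessors of v */
int count[MAXNODES];
int adj[MAXNODES][MAXNODES]; /* adj[u][v] == 1 if arc <u,v> exists */

/* Writes a topological order into order[]; returns the number of nodes
   output, which is less than n exactly when the graph contains a cycle. */
int topsort(int n, int order[]) {
    int queue[MAXNODES], front = 0, rear = 0, done = 0;
    for (int v = 0; v < n; v++)
        if (count[v] == 0)
            queue[rear++] = v;       /* no predecessors: ready now */
    while (front < rear) {
        int u = queue[front++];      /* remove from head of queue */
        order[done++] = u;
        for (int v = 0; v < n; v++)
            if (adj[u][v] && --count[v] == 0)
                queue[rear++] = v;   /* v's last predecessor done */
    }
    return done;
}
```

Nodes dequeued together before their successors' counts reach zero correspond to subtasks that may be performed in the same time period.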
LINKED REPRESENTATION OF GRAPHS
The adjacency matrix representation of a graph is frequently inadequate because it
requires advance knowledge of the number of nodes. If a graph must be constructed in the course
of solving a problem, or if it must be updated dynamically as the program proceeds, a new
matrix must be created for each addition or deletion of a node. This is prohibitively inefficient,
especially in a real-world situation where a graph may have a hundred or more nodes. Further,
even if a graph has very few arcs so that the adjacency matrix is sparse, space must be reserved
for every possible arc between two nodes, whether or not such an arc exists. If the graph contains
n nodes, a total of n² locations must be used.
The remedy is to use a linked structure, allocating and freeing nodes from an available
pool. This is similar to the methods used to represent dynamic binary and general trees. In the
linked representation of trees, each allocated node corresponds to a tree node. This is possible
because each tree node is the son of only one other tree node and is therefore contained in only a
single list of sons. However, in a graph an arc may exist between any two graph nodes. It is
possible to keep an adjacency list for every node in a graph and a node might find itself on many
different adjacency lists. But this requires that each allocated node contain a variable number of
pointers, depending on the number of nodes to which it is adjacent.
An alternative is to construct a multilinked structure in the following way. The nodes of
the graph are represented by a linked list of header nodes. Each such header node contains three
fields: info, nextnode, and arcptr. If p points to a header node representing a graph node a,
info(p) contains any information associated with graph node a. nextnode(p) is a pointer to the
header node representing the next graph node, if any. Each header node is at the head of a list of
nodes of a second type called list nodes. This list is called the adjacency list. Each node on an
adjacency list represents an arc of the graph. arcptr(p) points to the adjacency list of nodes
representing the arcs emanating from the graph node a.
Each adjacency list node contains two fields: ndptr and nextarc. If q points to a list node
representing an arc <A,B>, ndptr(q) is a pointer to the header node representing the graph node
B. nextarc(q) points to a list node representing the next arc emanating from graph node A, if
any. Each list node is contained in a single adjacency list representing all arcs emanating from a
given graph node. The term allocated node is used to refer to either a header or a list node of a
multilinked structure representing a graph.
A sample header node representing a graph node
A sample list node representing an arc
Fig (a)
Fig (b) A graph
Fig (c) Linked representation of a graph
Fig illustrates this representation. If each graph node carries some information but the
arcs do not, two types of allocated nodes are needed: one for header nodes (graph nodes) and the
other for adjacency list nodes (arcs). These are illustrated in Fig (a). Each header node contains
an info field and two pointers. The first of these is to the adjacency list of arcs emanating from
the graph node, and the second is to the next header node in the graph. Each arc node contains
two pointers, one to the next arc node in the adjacency list and the other to the header node
representing the graph node that terminates the arc. Fig (b) depicts a graph and Fig(c) its linked
representation
Nodes are declared using the array implementation as
struct nodetype {
    int info;
    int point;
    int next;
};
struct nodetype node[MAXNODES];
In the case of a header node, node[p] represents a graph node A, node[p].info represents
the information associated with the graph node A, node[p].next points to the next graph node,
and node[p].point points to the first list node representing an arc emanating from A. In the case
of a list node, node[p] represents an arc <A,B>, node[p].info represents the weight of the arc,
node[p].next points to the next arc emanating from A, and node[p] .point points to the header
node representing the graph node B.
In a dynamic implementation, the nodes are declared as follows:
struct nodetype {
    int info;
    struct nodetype *point;
    struct nodetype *next;
};
struct nodetype *nodeptr;
We now consider the implementation of the primitive graph operations using the linked
representation. The operation joinwt(p, q, wt) accepts two pointers p and q to two header
nodes and creates an arc between them with weight wt. If an arc already exists between them,
that arc's weight is set to wt.
joinwt(p, q, wt)
int p, q, wt;
{
    int r, r2;

    r2 = -1;
    r = node[p].point;
    while (r >= 0 && node[r].point != q) {
        r2 = r;
        r = node[r].next;
    }
    if (r >= 0) {
        /* an arc from p to q already exists; reset its weight */
        node[r].info = wt;
        return;
    }
    /* otherwise create a new arc node */
    r = getnode();
    node[r].point = q;
    node[r].next = -1;
    node[r].info = wt;
    (r2 < 0) ? (node[p].point = r) : (node[r2].next = r);
}
The operation remv(p, q) accepts pointers to two header nodes and removes the arc between
them, if one exists.
remv(p, q)
int p, q;
{
    int r, r2;

    r2 = -1;
    r = node[p].point;
    while (r >= 0 && node[r].point != q) {
        r2 = r;
        r = node[r].next;
    }
    if (r >= 0) {
        /* r points to the arc from p to q; unlink and free it */
        (r2 < 0) ? (node[p].point = node[r].next)
                 : (node[r2].next = node[r].next);
        freenode(r);
    }
    return;
}
The function adjacent(p, q) accepts pointers to two header nodes and determines whether
node(q) is adjacent to node(p).
adjacent(p, q)
int p, q;
{
    int r;

    r = node[p].point;
    while (r >= 0)
        if (node[r].point == q)
            return (TRUE);
        else
            r = node[r].next;
    return (FALSE);
}
The function findnode(graph, x) returns a pointer to a header node with information field x if
such a header node exists, and returns the null pointer otherwise.
findnode(graph, x)
int graph;
int x;
{
    int p;

    p = graph;
    while (p >= 0)
        if (node[p].info == x)
            return (p);
        else
            p = node[p].next;
    return (-1);
}
The function addnode(&graph,x) adds a node with information field x to a graph and returns a
pointer to that node.
addnode(pgraph, x)
int *pgraph;
int x;
{
    int p;

    p = getnode();
    node[p].info = x;
    node[p].point = -1;
    node[p].next = *pgraph;
    *pgraph = p;
    return (p);
}
Consider the difference between the adjacency matrix representation and the linked
representation of graphs. Implicit in the matrix representation is the ability to traverse a row or
column of the matrix. Traversing a row is equivalent to identifying all arcs emanating from a
given node. This can be done efficiently in the linked representation by traversing the list of arc
nodes starting at a given header node. Traversing a column of an adjacency matrix, however, is
equivalent to identifying all arcs that terminate at a given node; there is no corresponding
method for accomplishing this under the linked representation. The linked representation could
be modified to include two lists emanating from each header node: one for the arcs emanating
from the graph node and the other for the arcs terminating at the graph node. However, this
would require allocating two nodes for each arc, thus increasing the complexity of adding or
deleting an arc. Alternatively, each arc node could be placed on two lists. In this case, an arc
node would contain four pointers: one to the next arc emanating from the same node, one to the
next arc terminating at the same node, one to the header node at which it terminates, and one to
the header node from which it emanates. A header node would then contain three pointers: one
to the next header node, one to the list of arcs emanating from it, and one to the list of arcs
terminating at it.
GRAPH TRAVERSALS
There are a number of approaches used for solving problems on graphs. One of the
most important approaches is based on the notion of systematically visiting all the vertices
and edges of a graph. The reason for this is that these traversals impose a type of tree structure
(or generally a forest) on the graph, and trees are usually much easier to reason about than
general graphs.
Breadth-first search
Given a graph G = (V, E), breadth-first search starts at some source vertex s and
discovers which vertices are reachable from s. Define the distance between a vertex v and s to
be the minimum number of edges on a path from s to v. Breadth-first search discovers
vertices in increasing order of distance, and hence can be used as an algorithm for computing
shortest paths.
BFS defines an inverted tree (an acyclic directed graph in which the source is
the root, and every other node has a unique path to the root). If we reverse these edges we get
a rooted unordered tree called a BFS tree for G. (Note that there are many potential BFS trees
for a given graph, depending on where the search starts and in what order vertices are placed
on the queue.) These edges of G are called tree edges and the remaining edges of G are called
cross edges. It is not hard to prove that if G is an undirected graph, then cross edges
always go between two nodes that are at most one level apart in the BFS tree.
Initially all vertices (except the source) are colored white, meaning that they are
undiscovered. When a vertex has first been discovered, it is colored gray (and is part of the
frontier). When a gray vertex is processed, then it becomes black. The search makes use of a
queue, a first-in first-out list, where elements are removed in the same order they are inserted.
The first item in the queue (the next to be removed) is called the head of the queue. We will
also maintain arrays
color[u], which holds the color of vertex u (either white, gray or black);
pred[u], which points to the predecessor of u (i.e., the vertex that first discovered u);
d[u], the distance from s to u.
Only the color is really needed for the search; we include the other arrays because some
applications of BFS use this additional information.
BFS Algorithm
BFS Example
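A minimal sketch of this BFS in C, using an adjacency matrix and an explicit queue; the matrix representation, array sizes, and names here are assumptions for illustration:

```c
#include <assert.h>

#define MAXV  20
#define WHITE 0
#define GRAY  1
#define BLACK 2

int adjm[MAXV][MAXV];  /* adjm[u][v] == 1 if edge (u,v) exists */
int color[MAXV], pred[MAXV], dist[MAXV];

/* breadth-first search from source s on an n-vertex graph */
void bfs(int n, int s) {
    int queue[MAXV], front = 0, rear = 0;
    for (int v = 0; v < n; v++) {
        color[v] = WHITE;  /* undiscovered */
        pred[v] = -1;
        dist[v] = -1;
    }
    color[s] = GRAY;       /* source starts on the frontier */
    dist[s] = 0;
    queue[rear++] = s;
    while (front < rear) {
        int u = queue[front++];       /* head of the queue */
        for (int v = 0; v < n; v++)
            if (adjm[u][v] && color[v] == WHITE) {
                color[v] = GRAY;      /* v joins the frontier */
                pred[v] = u;          /* u discovered v */
                dist[v] = dist[u] + 1;
                queue[rear++] = v;
            }
        color[u] = BLACK;  /* finished processing u */
    }
}
```

Because vertices leave the queue in the order they were discovered, dist[v] is the minimum number of edges on a path from s to v.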
Depth-First Search
We assume we are given a directed graph G = (V, E). The same algorithm works for
undirected graphs (but the resulting structure imposed on the graph is different). We use four
auxiliary arrays.
An array color[u] maintains the status of each vertex; white means undiscovered, gray means
discovered but not finished processing, and black means finished.
Predecessor pointers pred[u] store the vertex that discovered a given vertex.
When a vertex u is first discovered, a counter is stored in d[u].
When the processing of a vertex is finished, a counter is stored in f[u].
DFS Algorithm
DFS Example
The running time of DFS is Θ(V + E). This is somewhat harder to see than the BFS analysis,
because the algorithm contains recursive calls.
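The four arrays can be maintained by a recursive sketch in C; the adjacency-matrix representation and the shared discovery/finish counter are assumptions, as in the BFS sketch:

```c
#include <assert.h>

#define MAXV  20
#define WHITE 0
#define GRAY  1
#define BLACK 2

int adjm[MAXV][MAXV]; /* adjm[u][v] == 1 if arc (u,v) exists */
int color[MAXV], pred[MAXV], d[MAXV], f[MAXV];
int timer;            /* shared discovery/finish counter */
int nverts;

void dfs_visit(int u) {
    color[u] = GRAY;
    d[u] = ++timer;                 /* discovery time */
    for (int v = 0; v < nverts; v++)
        if (adjm[u][v] && color[v] == WHITE) {
            pred[v] = u;            /* u discovered v */
            dfs_visit(v);
        }
    color[u] = BLACK;
    f[u] = ++timer;                 /* finish time */
}

void dfs(int n) {
    nverts = n;
    timer = 0;
    for (int v = 0; v < n; v++) {
        color[v] = WHITE;
        pred[v] = -1;
    }
    for (int v = 0; v < n; v++)
        if (color[v] == WHITE)
            dfs_visit(v);           /* one tree per component */
}
```

Note that a vertex finishes only after every vertex it discovered has finished, so the f[] values nest like balanced parentheses.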
Question Bank
Unit IV – GRAPHS
PART – A (2 MARKS)
1. Define Graph.
2. What is meant by directed graph?
3. Give a diagrammatic representation of an adjacency list representation of a graph.
4. What is meant by topological sort?
5. What is meant by acyclic graph?
6. What is meant by Shortest Path Algorithm?
7. What is meant by Single-Source Shortest path problem?
8. What is meant by Dijkstra's Algorithm?
9. What is minimum spanning tree?
10. Mention the types of algorithm.
11. Define NP-complete problems.
12. What is the space requirement of an adjacency list representation of a graph?
13. What is topological sort?
14. What is breadth-first search?
15. Define minimum spanning tree.
16. Define undirected graph.
17. What is a depth-first spanning tree?
18. What is biconnectivity?
19. What is an Euler circuit?
20. What is a directed graph?
21. What is meant by 'Hamiltonian Cycle'?
22. Define (i) indegree (ii) outdegree.
PART – B (16 MARKS)
1. Explain Prim's & Kruskal's Algorithm with an example.
2. Describe Dijkstra's algorithm with an example.
3. Explain how to find the shortest path using Dijkstra's algorithm with an example.
4. Explain the applications of DFS.
5. Find a minimum spanning tree for the graph using both Prim's and Kruskal's algorithms.
6. Explain in detail the simple topological sort pseudocode.
7. Write notes on NP-complete problems.
UNIT V - STORAGE MANAGEMENT
General Lists – Operations – Linked List Representation – Using Lists – Freeing List Nodes
– Automatic List Management : Reference Count Method – Garbage Collection – Collection
and Compaction
STORAGE MANAGEMENT
A programming language that incorporates a large number of data structures must
contain mechanisms for managing those structures and for controlling how storage is
assigned to them.
As data structures become more complex and provide greater capabilities to the user,
the management techniques grow in complexity as well.
So we look at several techniques for implementing dynamic allocation and freeing of
storage.
GENERAL LISTS
We have seen linked lists both as a concrete data structure and as a method of implementation
for such abstract data types as the stack, the queue, the priority queue, and the table. In those
implementations a list always contained elements of the same type.
An element is an object considered as part of a list; the element's value is the object
considered individually. A list need not contain elements of a single type: it may contain both
integers and characters, for example.
A pointer that is not contained within a list node is called an external pointer, and a pointer
that is contained within a list node is called an internal pointer.
It is possible for one or more elements of a list to themselves be lists.
If list is a nonempty list, head(list) is defined as the value of the first element of
list and tail(list) is defined as the list obtained by removing the first element of list.
Example:
list1 = (5, 12, 's', 147, 'a')
head(list1) = 5
tail(list1) = (12, 's', 147, 'a')
head(tail(list1)) = 12
tail(tail(list1)) = ('s', 147, 'a')
For example, consider the list list2 of Fig. This list contains four elements. Two of
these are integers (the first element is the integer 5; the third element is the integer 2) and the
other two are lists. The list that is the second element of list2 contains five elements, three of
which are integers (the first, second and fifth elements) and two of which are lists (the third
element is a list containing the integers 14, 9, and 3 and the fourth element is the null list [the
list with no elements]). The fourth element of list2 is a list containing the three integers 6, 3,
and 10.
list2 = (5, (3, 2, (14, 9, 3), (), 4), 2, (6, 3, 10))
tail(list2) = ((3, 2, (14, 9, 3), (), 4), 2, (6, 3, 10))
head(tail(list2)) = (3, 2, (14, 9, 3), (), 4)
head(head(tail(list2))) = 3
THE LINKED LIST REPRESENTATION OF A LIST
There are two Possible ways of implementing a list they are
Pointer method
Copy method
Pointer Method:
In the pointer method, the list L1 is represented by a pointer to it. To create a list L2
containing L1, a pointer to L1 is stored in the new list element rather than a copy of L1.
Thus the list L1 becomes a sublist of L2, and the nodes of L1 are used in two contexts:
as part of L1 and as part of L2.
Copy method:
In the copy method, the list L1 is copied before the new list element is added to it.
L1 still points to the original version, and the new copy is made a sub list of L2.
The copy method ensures that each node appears in only one context.
Examples:
FREEING LIST NODES
A node or a set of nodes could be an element and/or a sublist of one or several lists. In
such cases it is difficult to determine when such a node can be modified or freed.
Define a simple node as a node containing a simple data item, so that its info field
does not contain a pointer.
Generally, multiple use of simple nodes is not permitted. That is, operations on simple
nodes are performed by the copy method rather than the pointer method. Thus any simple
node deleted from a list can be freed immediately.
Example:
Which nodes can be freed and which must be retained? Clearly,
The list nodes of list9 (nodes 1, 2, 3, and 4) can be freed, since no other pointers reference
them. Freeing node 1 allows us to free nodes 11 and 12, since they too are accessed by no other
pointers.
Once node 11 is freed, nodes 7 and 8 can be considered. Node 7 can be freed because
each of the nodes containing a pointer to it (nodes 11 and 4) can be freed.
However, node 8 cannot be freed, since list11 points to it. List11 is an external
pointer; therefore the node to which it points may still be needed elsewhere in the program.
Since node 8 is kept, nodes 9 and 10 must also be kept. Finally, nodes 5 and 6 must be
kept because of the external pointer list12.
AUTOMATIC LIST MANAGEMENT
The programmer should code the solution to the problem with the assurance that the
system will automatically allocate any list nodes that are necessary for the lists being created
and that the system will make available for reuse any nodes that are no longer accessible.
This is called automatic list management.
There are two principal methods used in automatic list management:
1) Reference count method.
2) Garbage collection method.
THE REFERENCE COUNT METHOD
Under this method each node has an additional count field that keeps a count (called
the reference count) of the number of pointers (both internal and external) to that node. Each
time that the value of some pointer is set to point to a node, the reference count in that node is
increased by 1;
each time that the value of some pointer that had been pointing to a node is changed,
the reference count in that node is decreased by 1. When the reference count in any node
becomes 0, that node can be returned to the available list of free nodes.
For example, to execute the statement
l = tail(l);
The following operations must be performed:
p = l;
l = next(l);
next(p) = null;
reduce(p);
where the operation reduce(p) is defined recursively as follows:
if (p != null) {
    count(p)--;
    if (count(p) == 0) {
        r = next(p);
        reduce(r);
        if (nodetype(p) == lst)
            reduce(head(p));
        freenode(p);
    } /* end if */
} /* end if */
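The reduce operation can be made concrete with a minimal reference-counted node in C. This is an illustrative sketch (malloc-based nodes and the names rcnode/mknode are assumptions, not the text's array implementation):

```c
#include <assert.h>
#include <stdlib.h>

/* a minimal reference-counted list node */
struct rcnode {
    int count;           /* number of pointers to this node */
    int info;
    struct rcnode *next;
};

/* allocate a node whose count reflects the returned pointer */
struct rcnode *mknode(int info, struct rcnode *next) {
    struct rcnode *p = malloc(sizeof *p);
    p->count = 1;
    p->info = info;
    p->next = next;
    if (next)
        next->count++;   /* the new node now points to next */
    return p;
}

/* decrement p's count and free it (cascading) when it reaches 0 */
void reduce(struct rcnode *p) {
    if (p != NULL && --p->count == 0) {
        reduce(p->next);
        free(p);
    }
}
```

Freeing the head of a list cascades down the next chain, but stops at any node that another pointer still references.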
To illustrate the reference count method, consider again the list of the figure above. The
following set of statements creates that list:
list10 = crlist(14, 28);
list11 = crlist(crlist(5, 7));
l1 = addon(list11, 2);
m = crlist(l1, head(list11));
list9 = crlist(m, list10, 12, l1);
m = null;
l1 = null;
Fig below illustrates the creation of the list using the reference count method.
list9 = null;
The results are illustrated in Fig below, where freed nodes are illustrated using dashed lines.
The following sequence of events may take place:
The count of node 1 is set to 0; node 1 is freed.
The counts of nodes 2 and 11 are set to 0; nodes 2 and 11 are freed.
The counts of nodes 5 and 7 are set to 1. (Figure a)
The counts of nodes 3 and 12 are set to 0; nodes 3 and 12 are freed.
The count of node 4 is set to 0; node 4 is freed.
The count of node 9 is set to 1.
The count of node 7 is set to 0; node 7 is freed. (Figure b)
The count of node 8 is set to 1.
Only those nodes accessible from the external pointers list 10 and list11 remain allocated; all
others are freed.
One drawback of the reference count method is illustrated by the foregoing example.
The amount of work that must be performed by the system each time a list manipulation
statement is executed can be considerable.
Whenever a pointer value is changed, all nodes previously accessible from that
pointer can potentially be freed. Often, the work involved in identifying the nodes to be freed
is not worth the storage reclaimed.
After the program has terminated, a single pass reclaims all of its storage without
having to worry about reference count values.
There are two additional disadvantages to the reference count method.
1) The first is the additional space required in each node for the count.
2) The other disadvantage of the reference count method is that the count in the first
node of a recursive or circular list will never be reduced to 0.
GARBAGE COLLECTION
Under the reference count method, nodes are reclaimed when they become available
for reuse.
The other principal method of detecting and reclaiming free nodes is called garbage
collection.
When a request is made for additional nodes and there are none available, a system
routine called the garbage collector is called. This routine searches through all of the nodes in
the system, identifies those that are no longer accessible from an external pointer, and
restores the inaccessible nodes to the available pool.
Phases in Garbage Collection:
Garbage collection is usually done in two phases.
1) Marking Phase
2) Collection Phase
Marking Phase:
It involves marking all nodes that are accessible from an external pointer.
Collection Phase:
It involves proceeding sequentially through memory and freeing all nodes that
have not been marked.
One field must be set aside in each node to indicate whether a node has or has not
been marked.
The marking phase sets the mark field to true in each accessible node. As the
collection phase proceeds, the mark field in each accessible node is reset to false.
Thus, at the start and end of garbage collection, all mark fields are false. User
programs do not affect the mark fields.
Whenever the garbage collector is called, all user processing comes to a halt this is
undesirable in case of realtime application.
To avoid this, garbage collector must be called before all space has been exhausted so
that user processing can continue in whatever space is left, while the garbage collector
recovers additional space. .
Another important consideration is that users must be careful to ensure that all lists
are well formed and that all pointers are correct.
Thrashing:
It is possible that, at the time the garbage collection program is called, users are
actually using almost all the nodes that are allocated. Thus almost all nodes are accessible and
the garbage collector recovers very little additional space. After the system runs for a short
time, it will again be out of space; the garbage collector will again be called only to recover
very few additional nodes, and the vicious cycle starts again. This phenomenon, in which
system storage management routines such as garbage collection are executing almost all the
time, is called thrashing.
Algorithms for Garbage Collection
The simplest method for marking all accessible nodes is to mark initially all nodes that
are immediately accessible and then repeatedly pass through all of memory sequentially.
These sequential passes continue until no new nodes have been marked in an entire pass.
Unfortunately, this method is as inefficient as it is simple.
Hence a more efficient method is used. Suppose that a node n1 in the sequential pass has
previously been marked and that n1 includes a pointer to an unmarked node n2. Then node n2
is marked, and the sequential pass would ordinarily continue with the node that follows n1
sequentially in memory (backing up to n2 if n2 lies earlier in memory). Thus, by the time the
last node is reached, all accessible nodes have been marked.
Let us present this method as an algorithm. Assume that all list nodes in memory are
viewed as a sequential array.
#define NUMNODES ...
struct nodetype {
    int mark;
    int utype;
    union {
        int intgrinfo;
        char charinfo;
        int lstinfo;
    } info;
    int next;
} node[NUMNODES];
node[0] is used to represent a dummy node. node[0].info.lstinfo and node[0].next are
initialized to 0, node[0].mark to TRUE, and node[0].utype to LST; these values are never
changed throughout the system's execution.
The mark field in each node is initially FALSE and is set to TRUE by the marking
algorithm when a node is found to be accessible.
Assume that acc is an array containing external pointers to the immediately accessible
nodes, declared by
#define NUMACC ...
int acc[NUMACC];
The marking algorithm is as follows:
/* mark all immediately accessible nodes */
for (i = 0; i < NUMACC; i++)
    node[acc[i]].mark = TRUE;
/* begin a sequential pass through the array of nodes; */
/* i points to the node currently being examined */
i = 1;
while (i < NUMNODES) {
    j = i + 1;    /* j points to the node to be examined next */
    if (node[i].mark) {
        /* mark the nodes to which node[i] points */
        if (node[i].utype == LST &&
            node[node[i].info.lstinfo].mark != TRUE) {
            /* the information portion of node[i] */
            /* points to an unmarked node */
            node[node[i].info.lstinfo].mark = TRUE;
            if (node[i].info.lstinfo < j)
                j = node[i].info.lstinfo;
        } /* end if */
        if (node[node[i].next].mark != TRUE) {
            /* the list node following node[i] is an unmarked node */
            node[node[i].next].mark = TRUE;
            if (node[i].next < j)
                j = node[i].next;
        } /* end if */
    } /* end if */
    i = j;
} /* end while */
Although this method is better than successive sequential passes, it is still inefficient. A more
efficient approach is to use an auxiliary stack, in a manner very similar to depth-first
traversal of a graph.
As each list is traversed through the next fields of its constituent nodes, the utype field
of each node is examined.
for (i = 0; i < NUMACC; i++) {
    /* mark the next immediately accessible */
    /* node and place it on the stack */
    node[acc[i]].mark = TRUE;
    push(stack, acc[i]);
    while (empty(stack) != TRUE) {
        p = pop(stack);
        while (p != 0) {
            if (node[p].utype == LST &&
                node[node[p].info.lstinfo].mark != TRUE) {
                node[node[p].info.lstinfo].mark = TRUE;
                push(stack, node[p].info.lstinfo);
            } /* end if */
            if (node[node[p].next].mark == TRUE)
                p = 0;
            else {
                p = node[p].next;
                node[p].mark = TRUE;
            } /* end if */
        } /* end while */
    } /* end while */
} /* end for */
This algorithm is as efficient as we can hope for in terms of time, since each node to
be marked is visited only once.
One drawback of this approach is the space required for the stack. One solution is to use a
stack limited to some maximum size. If the stack is about to overflow, we can revert to the
sequential method given in the previous algorithm.
Fig above illustrates how this stacking mechanism works. Fig (a) shows a list before
the marking algorithm begins. The pointer p points to the node currently being processed, top
points to the stack top, and q is an auxiliary pointer. The mark field is shown as the first field
in each node.
Fig (b) shows the same list immediately after node 4 has been marked. The path taken
to node 4 is through the next fields of nodes 1, 2, and 3. This path can be retraced in reverse
order, beginning at top and following along the next fields.
Fig (c) shows the list after node 7 has been marked. The path to node 7 from the
beginning of the list was from node 1, through node[1].next to node 2, through node[2].lstinfo
to node 5, through node[5].next to node 6, and then from node[6].next to node 7. The same
fields that link together the stack are used to restore the list to its original form. Note that the
utype field in node 2 is stk rather than lst, to indicate that its lstinfo field, not its next field, is
being used as a stack pointer.
The algorithm that incorporates these ideas is known as the Schorr-Waite algorithm,
after its discoverers.
COLLECTION AND COMPACTION
Memory locations that are not used by any program, yet are unavailable to any user,
must be collected and compacted.
Collection phase
The purpose of this phase is to return to available memory all those locations that
were previously garbage.
It is easy to pass through memory sequentially, examine each node in turn, and return
unmarked nodes to available storage.
For example, the following algorithm could be used to return the unmarked nodes to
an available list headed by avail:
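The algorithm itself appears to have been lost in transcription; the following is a plausible sketch of the collection sweep in C, assuming the marked-node array of the previous section (the field layout here is simplified to a mark and a next field):

```c
#include <assert.h>

#define NUMNODES 8
#define FALSE 0
#define TRUE  1

struct { int mark; int next; } node[NUMNODES];
int avail; /* head of the available list */

/* Sweep memory sequentially: return unmarked nodes to the available
   list and reset the mark field of every node still in use. */
void collect(void) {
    avail = 0; /* node 0 acts as a dummy end-of-list marker */
    for (int i = NUMNODES - 1; i >= 1; i--)
        if (node[i].mark)
            node[i].mark = FALSE;  /* in use: just clear the mark */
        else {
            node[i].next = avail;  /* garbage: link onto avail */
            avail = i;
        }
}
```

Sweeping from high addresses to low leaves the available list ordered by increasing address, which is convenient for the compaction phase described next.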
After this algorithm has completed, all unused nodes are on the available list and all
nodes that are in use by programs have their mark fields turned off.
Need for Compaction:
Even when all nodes that are not in use are on the available list, the memory of the system
may not be in an optimal state for future use. This is because the interleaving of the occupied
nodes with available nodes may make much of the memory on the available list unusable.
For example, memory is often required in blocks (groups of contiguous nodes) rather
than as single discrete nodes one at a time. The memory request by a compiler for space in
which to store an array would require the allocation of such a block.
Compaction:
The process of moving all used nodes to one end of memory and all the available
memory to the other end is called compaction, and an algorithm that performs this process is
called a compaction algorithm. The compaction itself proceeds as outlined below.
The basic problem in developing an algorithm that moves portions of memory from
one location to another is to preserve the integrity of pointer values to the nodes being
moved.
For this reason the algorithm requires two sequential passes.
The first pass may be outlined as follows:
Update the memory location to be assigned to the next marked node, nd
Traverse the list of nodes pointed to by header (nd) and change the appropriate pointer
fields to point to the new location of nd.
Once this process has been completed for each marked node, a second pass through
memory will perform the actual compaction.
The second pass performs the following operations:
Update the memory location to be assigned to the next marked node, nd
Traverse the list of nodes pointed to by header (nd) and change the appropriate pointer
fields to point to the new location of nd.
Move nd to its new location.
Several points should be noted with respect to this algorithm.
First, node[0] is suitably initialized so that the algorithm need not test for special cases.
Second, the process of adjusting the pointers of all nodes on the list headed by the header
field of a particular node is performed twice: once during the first pass and once during the
second.
Time Requirements:
The time requirements of the algorithm are easy to analyze. There are two linear
passes through the complete array of memory. Each pass scans every node once and adjusts
the pointer fields of the nodes it encounters, so the time requirement is clearly O(n).
Our compaction algorithm does not require any additional storage in the nodes. Such
an algorithm is called a bounded-workspace algorithm.
Question Bank
UNIT V – STORAGE MANAGEMENT
PART – A (2 MARKS)
1. What is storage management?
2. What is general list?
3. What are the possible ways to represent linked list?
4. How to free the list nodes?
5. What is automatic list management?
6. What are two principal methods used in automatic list management?
7. What is Reference count method?
8. What is Garbage collection method?
9. What is Garbage collection?
10. What are the two Phases in Garbage Collection?
11. What is marking phase?
12. What is collection phase?
13. What is thrashing?
14. What is collection?
15. What is compaction?
PART - B (16 MARKS)
1. Explain about collection and compaction.
2. What is garbage collection? Write an algorithm for garbage collection.
3. Explain about Garbage collection.
4. Explain about automatic list management. Explain about two principal methods used in
automatic list management.
5. Explain about two possible ways of implementing a list. (8)
6. Explain about general list. (8)
****
University Question papers
B.E./B.Tech. DEGREE EXAMINATION, NOVEMBER/DECEMBER 2010
Third Semester
Computer Science and Engineering CS 2201 — DATA STRUCTURES
(Regulation 2008)
Time : Three hours
Maximum : 100 Marks
Answer ALL questions
PART A — (10 × 2 = 20 Marks)
1. What is an Abstract Data Type? What are all not concerned in an ADT?
2. Explain why binary search cannot be performed on a linked list.
3. Number the following binary tree (Fig. 1) to traverse it in.
(a) Preorder
(b) Inorder
4. What is a threaded binary tree?
5. Define a heap. How can it be used to represent a priority queue?
6. Give any two applications of binary heaps.
7. What is hashing?
8. Mention any four applications of a set.
9. What is breadth-first traversal?
10. Define spanning tree of a graph.
PART B — (5 × 16 = 80 Marks)
11. (a) Describe the data structures used to represent lists. (Marks 16)
Or
(b) Describe circular queue implementation in detail giving all the relevant
features. (Marks 16)
12. (a) Explain the tree traversals. Give all the essential aspects. (Marks 16)
Or
(b) Explain binary search tree ADT in detail. (Marks 16)
13. (a) Discuss in detail the B-tree. What are its advantages? (Marks 16)
Or
(b) Explain binary heaps in detail. Give its merits. (Marks 16)
14. (a) Explain separate chaining and extendible hashing. (Marks 16)
Or
(b) Explain in detail the dynamic equivalence problem. (Marks 16)
15. (a) Consider the following graph
(i) Obtain minimum spanning tree by Kruskal's algorithm. (Marks 10)
(ii) Explain Topological sort with an example. (Marks 6)
Or
(b) (i) Find the shortest path from 'a' to 'd' using Dijkstra's algorithm
in the graph in Figure 2 given in question 15(a). (Marks 10)
(ii) Explain Euler circuits with an example. (Marks 6)
B.E./B.Tech. DEGREE EXAMINATION, NOVEMBER/DECEMBER 2009
Third Semester
Computer Science and Engineering CS 2201 — DATA STRUCTURES
(Regulation 2008)
Time : Three hours
Maximum : 100 Marks
Answer ALL Questions
PART A — (10 × 2 = 20 Marks)
1. List out the areas in which data structures are applied extensively.
2. Convert the expression ((A+B)*C-(D-E)^(F+G)) to equivalent prefix and
postfix notations.
3. How many different trees are possible with 10 nodes?
4. What is an almost complete binary tree?
5. In an AVL tree, at what condition the balancing is to be done?
6. What is the bucket size, when the overlapping and collision occur at same
time?
7. Define graph.
8. What is a minimum spanning tree?
9. Define NP hard and NP complete.
10. What is meant by dynamic programming?
PART B — (5 × 16 = 80 Marks)
11. (a) (i) What is a linked list? Explain with suitable program segments any
four operations of a linked list. (Marks 12)
(ii) Explain with a pseudo code how a linear queue could be converted
into a circular queue. (Marks 4)
Or
(b) (i) What is a stack ADT? Write in detail about any three applications
of stack. (Marks 11)
(ii) With a pseudo code explain how a node can be inserted at a user
specified position of a doubly linked list. (Marks 5)
12. (a) (i) Discuss the various representations of a binary tree in memory with
suitable example. (Marks 8)
(ii) What are the basic operations that can be performed on a binary
tree? Explain each of them in detail with suitable example. (Marks 8)
Or
(b) (i) Give an algorithm to convert a general tree to binary tree. (Marks 8)
(ii) With an example, explain the algorithms of in order and post order
traversals on a binary search tree. (Marks 8)
13. (a) What is an AVL tree? Explain the rotations of an AVL tree. (Marks 16)
Or
(b) (i) Explain the binary heap in detail. (Marks 8)
(ii) What is hashing? Explain any two methods to overcome collision
problem of hashing. (Marks 8)
14. (a) (i) Explain Dijkstra's algorithm and solve the single source shortest
path problem with an example. (Marks 12)
(ii) Illustrate with an example, the linked list representation of graph.
(Marks 4)
Or
(b) (i) Write the procedures to perform the BFS and DFS search of a
graph. (Marks 8)
(ii) Explain Prim's algorithm to construct a minimum spanning tree
from an undirected graph. (Marks 8)
15. (a) (i) With an example, explain how will you measure the efficiency of an
algorithm. (Marks 8)
(ii) Analyze the linear search algorithm with an example. (Marks 8)
Or
(b) Explain how the travelling salesman problem can be solved using greedy
algorithm. (Marks 16)