Data Structures (BE)
ANNA UNIVERSITY TIRUCHIRAPPALLI
Regulations 2008 Syllabus
B.Tech IT / B.E EEE SEMESTER III
CS1201 - DATA STRUCTURES
Prepared By:
B. Sundara Vadivazhagan, HOD i/c, IT
S. Karthik, Lecturer, IT
G. Mahalakshmi, Lecturer, IT
UNIT I - FUNDAMENTALS OF ALGORITHMS
Algorithm – Analysis of Algorithm – Best Case and Worst Case Complexities –Analysis of
Algorithm using Data Structures – Performance Analysis – Time Complexity – Space
Complexity – Amortized Time Complexity – Asymptotic Notation
UNIT II - FUNDAMENTALS OF DATA STRUCTURES
Arrays – Structures – Stacks – Definition and examples – Representing Stacks –Queues and Lists
– Queue and its Representation – Applications of Stack – Queue and Linked Lists.
UNIT III - TREES
Binary Trees – Operations on Binary Tree Representations – Node Representation –Internal and
External Nodes – Implicit Array Representation – Binary Tree Traversal – Huffman Algorithm –
Representing Lists as Binary Trees – Sorting and Searching Techniques – Tree Searching –
Hashing
UNIT - IV GRAPHS AND THEIR APPLICATIONS
Graphs – An Application of Graphs – Representation – Transitive Closure – Warshall's
Algorithm – Shortest Path Algorithm – A Flow Problem – Dijkstra's Algorithm – Minimum
Spanning Trees – Kruskal's and Prim's Algorithms – An Application of Scheduling – Linked
Representation of Graphs – Graph Traversals
UNIT V - STORAGE MANAGEMENT
General Lists – Operations – Linked List Representation – Using Lists – Freeing List Nodes –
Automatic List Management : Reference Count Method – Garbage Collection – Collection and
Compaction
TEXT BOOKS
1. Cormen T. H., Leiserson C. E., and Rivest R. L., "Introduction to Algorithms", Prentice Hall of
India, New Delhi, 2007.
2. M. A. Weiss, "Data Structures and Algorithm Analysis in C", Second Edition, Pearson
Education, 2005.
REFERENCES
1. Ellis Horowitz, Sartaj Sahni and Sanguthevar Rajasekaran, "Computer Algorithms/C++",
Universities Press (India) Private Limited, Second Edition, 2007.
2. A. V. Aho, J. E. Hopcroft, and J. D. Ullman, "Data Structures and Algorithms", First Edition,
Pearson Education, 2003.
3. R. F. Gilberg and B. A. Forouzan, "Data Structures", Second Edition, Thomson India Edition,
2005.
4. Robert L. Kruse, Bruce P. Leung and Clovis L. Tondo, "Data Structures and Program Design in
C", Pearson Education, 2004.
5. Tenenbaum A. M., Langsam Y., Augenstein M. J., "Data Structures Using C", Pearson Education,
2004.
UNIT – I FUNDAMENTALS OF ALGORITHMS
I. ALGORITHM
An algorithm is any well-defined computational procedure that takes some value, or set
of values, as input and produces some value, or set of values, as output. In other words, an algorithm is a
sequence of computational steps that transform the input into the output.
An algorithm can be viewed as a tool for solving a well-specified computational problem.
The statement of the problem specifies the desired input/ output relationship. The algorithm
describes a specific computational procedure for achieving that input/output relationship.
Study of Algorithms:
Consider the problem of sorting a sequence of numbers into non-decreasing order. The sorting
problem is defined as
Input: A sequence of n numbers (a1, a2, …, an)
Output: A permutation (reordering) (a1′, a2′, …, an′) of the input sequence such that
a1′ ≤ a2′ ≤ … ≤ an′
Given an input sequence such as (31,41,59,26,41,58), a sorting algorithm returns as
output the sequence (26,31,41,41,58,59). Such an input sequence is called an instance of the
sorting problem. In general, an instance of a problem consists of all the inputs needed to
compute a solution to the problem.
An algorithm is said to be correct if, for every instance, it halts with the correct output.
The correct algorithm solves the given computational problem.
An incorrect algorithm might not halt at all on some input instances, or it might halt with
other than the desired answer.
Example
Insertion Sort
Insertion sort is an efficient algorithm for sorting a small number of elements.
Insertion sort works the way many people sort a bridge or gin rummy hand.
We start with an empty left hand and the cards face down on the table. We then
remove one card at a time from the table and insert it into the correct position in
the left hand.
To find the correct position for a card, we compare it with each of the cards already
in the hand, from right to left.
Pseudocode for INSERTION-SORT
INSERTION-SORT(A)
1  for j ← 2 to length[A]
2      do key ← A[j]
3         ▷ Insert A[j] into the sorted sequence A[1 .. j − 1]
4         i ← j − 1
5         while i > 0 and A[i] > key
6             do A[i + 1] ← A[i]
7                i ← i − 1
8         A[i + 1] ← key
The insertion sort is presented as a procedure called INSERTION SORT, which
takes as parameter an array A[1….n] containing a sequence of length n that is to
be sorted.
The input numbers are sorted in place: the numbers are rearranged within the
array A, with at most a constant number of them stored outside the array at any
time.
The input array A contains the sorted output sequence when INSERTION-SORT
is finished.
5 2 4 6 1 3    (initial array, j = 2)
2 5 4 6 1 3    (after inserting 2)
2 4 5 6 1 3    (after inserting 4)
2 4 5 6 1 3    (after inserting 6)
1 2 4 5 6 3    (after inserting 1)
1 2 3 4 5 6    (sorted)
Fig: The operation of INSERTION-SORT on the array A = (5, 2, 4, 6, 1, 3). Each row shows the
array after one iteration of the outer for loop; the element just inserted is the one indexed by j.
The fig shows how this algorithm works for A = (5, 2, 4, 6, 1, 3). The index j indicates the
current card being inserted into the hand. Array elements A[1..j−1] constitute the currently
sorted hand, and elements A[j+1..n] correspond to the pile of cards still on the table.
The index j moves left to right through the array. At each iteration of the "outer" for loop,
the element A[j] is picked out of the array.
Then starting in position j-1 elements are successively moved one position to the right
until the proper position for A[j] is found, at which point it is inserted.
Goals for an algorithm
Basic goals for an algorithm
1. Always correct
2. Always terminates
3. Performance- Performance often draws the line between what is possible and what is
impossible.
The notion of "algorithm"
An algorithm is a description of a procedure which is
1. Finite (i.e., consists of a finite sequence of characters)
2. Complete (i.e., describes all computation steps)
3. Unique (i.e., there are no ambiguities)
4. Effective (i.e., each step has a defined effect and can be executed in finite time)
Properties:
Desired properties of algorithms
Correctness
o For each input, the algorithm calculates the requested value
Termination
o For each input, the algorithm performs only a finite number of steps
Efficiency
o Runtime: the algorithm runs as fast as possible
o Storage space: the algorithm requires as little storage space as possible
Algorithms-Distinct areas:
There are five distinct areas in the study of algorithms.
1. Creating or devising algorithms: Various design techniques are created to yield good
algorithms.
2. Expressing the algorithms in a structured representation.
3. Validating algorithms: The algorithms devised should compute the correct answer for
all possible legal inputs. This process is known as algorithm validation.
4. Analyzing algorithms: It refers to the process of determining how much computing time
and storage an algorithm will require, and how well the algorithm performs in the best case,
worst case and average case.
Kinds of Analyses
Worst-case: (usually)
T(n) = maximum time of algorithm on any input of size n.
Average-case: (sometimes)
T(n) = expected time of algorithm over all inputs of size n.
Need assumption of statistical distribution of inputs.
Best-case: (rarely useful)
One can cheat with a slow algorithm that works fast on some input, so the best case gives little guarantee.
5. Testing algorithm
It consists of two phases: Debugging and profiling
Debugging is the process of executing programs on sample data to determine if any
faulty results occur.
Profiling is the process of executing correct programs on data sets and measuring the
time and space needed to compute the results.
II. ANALYSIS OF ALGORITHM
Analyzing an algorithm has come to mean predicting the resources that the algorithm
requires. Occasionally, resources such as memory, communication bandwidth, or logic gates are
of primary concern, but most often it is computational time that we want to measure.
Generally, by analyzing several candidate algorithms for a problem, a most efficient one
can be easily identified. Such analysis may indicate more than one viable candidate, but several
inferior algorithms are usually discarded in the process.
Analysis predicting the resources that the algorithm requires, resources such as memory,
communication bandwidth or computer hardware are of primary concern, but most often it is
necessary to measure the computational time.
By analyzing several candidate algorithms for a problem, a most efficient one can be
easily identified and others are discarded in the process.
The main reasons for analyzing algorithms are:
It is an intellectual activity.
It is challenging to predict the future; narrowing the predictions to algorithms makes this tractable.
Computer science attracts many people who enjoy being efficiency experts.
Structural Programming Model
Niklaus Wirth stated that any algorithm could be written with only three programming
constructs: sequence, selection, loop.
The implementation of these constructs relies on the implementation language, such as C++.
Sequence is a series of statements that do not alter the execution path within an algorithm.
Selection statements evaluate one or more alternatives; if a condition is true, one path is taken,
otherwise a different path is taken.
Loop
Iterates a block of code.
Usually the condition is evaluated before the body of the loop is executed.
If the condition is true, the body is executed.
If the condition is false, the loop terminates.
ANALYSIS OF INSERTION SORT :
The time taken by the Insertion Sort procedure depends on the input: sorting a thousand
numbers takes longer than sorting three numbers.
Insertion sort can take different amounts of time to sort two input sequences of the same
size depending on how nearly sorted they already are.
In general, the time taken by an algorithm grows with the size of the input, so it is
traditional to describe the running time of a program as a function of the size of its input.
To do so, we need to define the terms "running time" and "size of input" more carefully.
The best notion for input size depends on the problem being studied.
For many problems, such as sorting or computing discrete Fourier transforms, the most
natural measure is the number of items in the input: for example, the array size n for
sorting. For many other problems, such as multiplying two integers, the best measure of
input size is the total number of bits needed to represent the input in ordinary binary notation.
The running time of an algorithm on a particular input is the number of primitive
operations or "steps" executed.
We start by presenting the INSERTION-SORT procedure with the time "cost" of each
statement and the number of times each statement is executed. For each j = 2,3, ... , n,
where n = length[A], we let tj be the number of times the while loop test in line 5 is
executed for that value of j.
We assume that comments are not executable statements, and so they take no time.
INSERTION-SORT(A)                                cost    times
1  for j ← 2 to length[A]                        c1      n
2      do key ← A[j]                             c2      n − 1
3         ▷ Insert A[j] into the sorted
             sequence A[1 .. j − 1]              0       n − 1
4         i ← j − 1                              c4      n − 1
5         while i > 0 and A[i] > key             c5      Σ_{j=2}^{n} tj
6             do A[i + 1] ← A[i]                 c6      Σ_{j=2}^{n} (tj − 1)
7                i ← i − 1                       c7      Σ_{j=2}^{n} (tj − 1)
8         A[i + 1] ← key                         c8      n − 1
The running time of the algorithm is the sum of running times for each statement
executed; a statement that takes Ci steps to execute and is executed n times will contribute Ci n
to the total running time. To compute T(n), the running time of INSERTION-SORT, we sum the
products of the cost and times columns, obtaining
T(n) = c1·n + c2(n − 1) + c4(n − 1) + c5 Σ_{j=2}^{n} tj + c6 Σ_{j=2}^{n} (tj − 1) + c7 Σ_{j=2}^{n} (tj − 1) + c8(n − 1)
Even for inputs of a given size, an algorithm's running time may depend on which input
of that size is given. For example, in INSERTION-SORT, the best case occurs if the array is
already sorted. For each j = 2,3, ... , n, we then find that A[i]≤key in line 5 when i has its initial
value of
j − 1. Thus tj = 1 for j = 2, 3, …, n, and the best-case running time is
T(n) = c1·n + c2(n − 1) + c4(n − 1) + c5(n − 1) + c8(n − 1)
     = (c1 + c2 + c4 + c5 + c8)·n − (c2 + c4 + c5 + c8)
This running time can be expressed as an+b for constants a and b that depend on the
statement costs ci. It is thus a linear function of n.
If the array is in reverse sorted order, the worst case results. We must compare each element A[j]
with each element in the entire sorted sub array A[1..j−1], and so tj = j for j = 2, 3, …, n.
Σ_{j=2}^{n} j = n(n + 1)/2 − 1
and
Σ_{j=2}^{n} (j − 1) = n(n − 1)/2
T(n) = c1·n + c2(n − 1) + c4(n − 1) + c5(n(n + 1)/2 − 1) + c6(n(n − 1)/2) + c7(n(n − 1)/2) + c8(n − 1)
     = (c5/2 + c6/2 + c7/2)·n² + (c1 + c2 + c4 + c5/2 − c6/2 − c7/2 + c8)·n − (c2 + c4 + c5 + c8)
This worst case running time can be expressed as an² + bn + c for constants a, b, c that
again depend on the statement costs ci; it is thus a quadratic function of n.
Worst case and Average case Analysis
In the analysis of Insertion sort, the best case in which the input array was already sorted,
and the worst case, in which the input array was reverse sorted. To find only the worst case
running time, that is, the longest running time for any input of size n. Three reasons are
The worst case running time of an algorithm is an upper bound on the running time for
any input.
Knowing it gives us the guarantee that the algorithm will never take any longer. For some
algorithms, the worst case occurs fairly often. For example, in searching a database for a
particular piece of information, the searching algorithm worst case will often occur when
the information is not present in the database.
The average case is often roughly as bad as the worst case. Suppose that we randomly
choose n numbers and apply insertion sort. How long does it take to determine where in sub
array A[1..j−1] to insert element A[j]? On average, half the elements in A[1..j−1] are less
than A[j], and half the elements are greater. On average, we check half of the sub array
A[1..j−1], so tj = j/2. The resulting average case running time turns out to be a quadratic
function of the input size, just like the worst case running time.
One problem with performing an average case analysis, however is that it may not be
apparent what constitutes an average input for a particular problem.
DESIGNING ALGORITHMS
There are many ways to design algorithms. Insertion sort uses an incremental
approach: having sorted the sub array A[1 .. j − 1], we insert the single element
A[j] into its proper place, yielding the sorted sub array A[1 .. j].
In this section, we examine an alternative design approach, known as "divide-
and-conquer."
We shall use divide-and-conquer to design a sorting algorithm whose worst-case
running time is much less than that of insertion sort.
One advantage of divide-and-conquer algorithms is that their running times are
often easily determined using standard techniques for solving recurrences.
Divide and Conquer approach:
Many useful algorithms are recursive in structure: to solve a given problem, they call
themselves recursively one or more times to deal with closely related sub problems.
These algorithms typically follow a divide-and-conquer approach: they break the problem
into several sub problems that are similar to the original problem but smaller in size, solve the
sub problems recursively, and then combine these solutions to create a solution to the original
problem.
The divide-and-conquer paradigm involves three steps at each level of the recursion:
Divide the problem into a number of sub problems.
Conquer the sub problems by solving them recursively. If the sub problem sizes are small
enough, however, just solve the sub problems in a straightforward manner.
Combine the solutions to the sub problems into the solution for the original problem.
EXAMPLE - MERGE SORT
The merge sort algorithm closely follows the divide-and-conquer paradigm. Intuitively, it
operates as follows.
Divide: Divide the n-element sequence to be sorted into two subsequences of n/2 elements each.
Conquer: Sort the two subsequences recursively using merge sort.
Combine: Merge the two sorted subsequences to produce the sorted answer.
We note that the recursion "bottoms out" when the sequence to be sorted has length 1, in
which case there is no work to be done, since every sequence of length 1 is already in sorted
order.
The key operation of the merge sort algorithm is the merging of two sorted sequences in
the "combine" step. To perform the merging, we use an auxiliary procedure MERGE(A, p, q, r),
where A is an array and p, q, and r are indices numbering elements of the array such that p ≤ q <
r. The procedure assumes that the sub arrays A[p .. q] and A[q + 1 .. r] are in sorted order. It
merges them to form a single sorted sub array that replaces the current sub array A[p .. r].
Although we leave the pseudo code as an exercise, it is easy to imagine a MERGE procedure that
takes time Θ(n), where n = r − p + 1 is the number of elements being merged. Returning to our
card playing motif, suppose we have two piles of cards face up on a table. Each pile is sorted,
with the smallest cards on top. We wish to merge the two piles into a single sorted output pile,
which is to be face down on the table: repeatedly choose the smaller of the two top cards,
remove it from its input pile, and place it face down onto the output pile.
Computationally, each basic step takes constant time, since we are checking just two top cards.
Since we perform at most n basic steps, merging takes Θ(n) time.
We can now use the MERGE procedure as a subroutine in the merge sort algorithm. The
procedure MERGE-SORT(A, p, r) sorts the elements in the sub array A[p .. r]. If p ≥ r, the sub
array has at most one element and is therefore already sorted. Otherwise, the divide step simply
computes an index q that partitions A[p .. r] into two sub arrays: A[p .. q], containing ⌈n/2⌉
elements, and A[q + 1 .. r], containing ⌊n/2⌋ elements.
MERGE-SORT(A, p, r)
1  if p < r
2      then q ← ⌊(p + r)/2⌋
3           MERGE-SORT(A, p, q)
4           MERGE-SORT(A, q + 1, r)
5           MERGE(A, p, q, r)
To sort the entire sequence A = (A[1], A[2], …, A[n]), we call MERGE-SORT(A, 1, length[A]),
where once again length[A] = n.
If we look at the operation of the procedure bottom-up when n is a power of two, the
algorithm consists of merging pairs of 1-item sequences to form sorted sequences of length 2,
merging pairs of sequences of length 2 to form sorted sequences of length 4, and so on, until two
sequences of length n/2 are merged to form the final sorted sequence of length n.
When an algorithm contains a recursive call to itself, its running time can often be
described by a recurrence equation or recurrence, which describes the overall running
time on a problem of size n in terms of the running time on smaller inputs. We can then use
mathematical tools to solve the recurrence and provide bounds on the performance of the
algorithm.
A recurrence for the running time of a divide-and-conquer algorithm is based on the three
steps of the basic paradigm. As before, we let T(n) be the running time on a problem of size n.
If the problem size is small enough, say n ≤ c for some constant c, the straightforward
solution takes constant time, which we write as Θ(1). Suppose we divide the problem into a sub
problems, each of which is 1/b the size of the original.
If we take D(n) time to divide the problem into sub problems and C(n) time to combine
the solutions to the sub problems into the solution to the original problem, we get the recurrence
T(n) = Θ(1) if n ≤ c, and T(n) = a·T(n/b) + D(n) + C(n) otherwise.
BEST CASE, WORST CASE AND AVG. CASE EFFICIENCIES
Time efficiency – function in terms of n (input size)
For some algorithms, the running time depends not only on the input size n but also on the
individual elements, e.g. linear search; for these we consider worst case, best case and
average case efficiency.
We will mainly focus on worst-case analysis, but sometimes it is useful to do average one.
Worst- / average- / best-case
Worst-case running time of an algorithm
– The longest running time for any input of size n
– An upper bound on the running time for any input
– Guarantee that the algorithm will never take longer
– Sequential search for an item which is not present, or present at the end of the list
– Sorting a set of numbers in increasing order when the data is in decreasing order
– The worst case can occur fairly often
– Often roughly matches the average (expected) running time
Best-case running time
– The fewest instructions are executed; the shortest running time for any input of size n
– Sequential search for an item which is present at the beginning of the list
– Sorting a set of numbers in increasing order when the data is already in
increasing order
Average-case running time
– May be difficult to define what "average" means, but gives the necessary
details about an algorithm's behavior on a typical/random input.
EXAMPLE: Sequential Search
A sequential search steps through the data sequentially until a match is found.
A sequential search is useful when the array is not sorted.
The basic operation (comparison) count:
1. Best case input: c(n) = 1
2. Worst case input: c(n) = n
Unsuccessful search: n comparisons
Successful search (worst case): n comparisons
3. Average case input
Here the basic operation count is calculated as follows.
Assumptions:
a) The probability of a successful search is p (0 ≤ p ≤ 1)
b) The probability of the first match occurring in the ith position of the list is the same for every i,
namely p/n, and the number of comparisons made by the algorithm in such a
situation is i.
c) In case of an unsuccessful search, the number of comparisons made is n, and the probability of
such a search is (1 − p).
So c(n) = [1·(p/n) + 2·(p/n) + … + i·(p/n) + … + n·(p/n)] + n·(1 − p)
        = (p/n)(1 + 2 + … + i + … + n) + n(1 − p)
        = (p/n)·n(n + 1)/2 + n(1 − p)
c(n) = p(n + 1)/2 + n(1 − p)
For a successful search, p = 1 and c(n) = (n + 1)/2.
For an unsuccessful search, p = 0 and c(n) = n.
III. ANALYSIS OF ALGORITHM USING DATA STRUCTURES
The analysis of an algorithm considers both qualitative and quantitative aspects, to obtain a
solution that is economical in the use of computing and human resources and that improves the
performance of the algorithm. A good algorithm usually possesses the following qualities and
capabilities.
They are simple but powerful and general solutions
They are user friendly
They can be easily updated
They are correct
They are able to be understood on a number of levels
They are economical in the use of computer time, storage and peripherals
They are machine independent, not tied to a particular computer
They can be used as subprocedures for other problems
The solution is pleasing and satisfying to its designer
IV. COMPUTATIONAL COMPLEXITY
Space Complexity
The space complexity of an algorithm is the amount of memory it needs to run to
completion
[Core dumps: the most often encountered cause is "memory leaks", where the amount of
memory required is larger than the memory available on a given system]
Some algorithms may be more efficient if data is completely loaded into memory
1. Need to look also at system limitations
2. E.g. classify 2GB of text into various categories [politics, tourism, sport, natural disasters,
etc.]: can I afford to load the entire collection?
Time Complexity
The time complexity of an algorithm is the amount of time it needs to run to completion
Often more important than space complexity
1. space available (for computer programs!) tends to be larger and larger
2. time is still a problem for all of us
An algorithm's running time is an important issue
Space Complexity
The Space needed by each algorithm is the sum of the following components:
1. Instruction space
2. Data space
3. Environment stack space
Instruction space
The space needed to store the compiled version of the program instructions
Data space
The space needed to store all constant and variable values
Environment stack space
The space needed to store information to resume execution of partially completed
functions
The total space needed by an algorithm can be simply divided into two parts from the 3
components of space complexity
1. Fixed part
2. Variable part
Fixed part
A fixed part space is independent of the characteristics of the inputs and outputs. This
part typically includes the instruction space, space for simple variables and fixed size component
variables, space for constants and so on
e.g. name of the data collection
same size for classifying 2GB or 1MB of texts
Variable part
A variable part space needed by component variables whose, size is dependent on the
particular problem instance being solved, the space needed by referenced variables and the
recursion stack space.
e.g. actual text
load 2GB of text VS. load 1MB of text
The space requirement S(P) of any algorithm or program P may be written as:
S(P)=C+Sp (instance characteristics)
C= Constant that denotes the fixed part of the space requirement
Sp= Variable Component depends on the magnitude of the inputs to and outputs from the
algorithm.
Example
float sum(float a[], int n)
{
    float s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}
Space: one word for n, one for a [the array is passed by reference!], one for i, one for s: constant space!
When memory was expensive we focused on making programs as space efficient as
possible and developed schemes to make memory appear larger than it really was (virtual
memory and memory paging schemes). Space complexity is still important in the field of
embedded computing.
Time Complexity
The time T(P) taken by a program P is T(P) = compile time + run (or execution) time.
Compile time
It does not depend on the instance characteristics. We assume that a compiled program
will be run several times without recompilation
Run time
It depends on the instance characteristics denoted by tp. The tp(n) can be calculated by
the following form of expression
tp(n) = Ca·ADD(n) + Cs·SUB(n) + Cm·MUL(n) + Cd·DIV(n) + …
n = instance characteristics
Ca, Cs, Cm, Cd = time needed for addition, subtraction, multiplication and division
ADD, SUB, MUL, DIV= number of additions, subtractions, multiplications and divisions
performed for the program p on the instance characteristics n.
To find the value of tp(n) from the above expression is an impossible task, since the time
needed for each operation often depends on the numbers involved in the operation.
The value of tp(n) for any given n can be obtained experimentally, that is, the program
typed, compiled and run on a particular machine, the execution time is physically clocked, and
tp(n) is obtained.
The value of tp(n) depends on several factors, such as system load and the number of other
programs running on the computer at the time program P is run. To overcome this
disadvantage, we count only the program steps, where the time required by each step is relatively
independent of the instance characteristics.
A program step is defined as a syntactically or semantically meaningful segment of a program that
has an execution time that is independent of the instance characteristics.
The program statements are classified into three steps
1. Comments-Zero step
2. Assignment statement-One step
3. Iterative statement-finite number of steps.
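Applying this classification to the earlier sum routine gives a statement-by-statement step count. The totals below follow one common counting convention (loop control counted as n + 1 tests) and are an illustration, not the only valid count:

```c
/* Step count of the sum routine, statement by statement.
   Comments contribute zero steps; each assignment is one step. */
float sum(float a[], int n)
{
    float s = 0;                    /* 1 step                   */
    for (int i = 0; i < n; i++)     /* n + 1 loop-control steps */
        s += a[i];                  /* n steps                  */
    return s;                       /* 1 step                   */
}
/* Total: 1 + (n + 1) + n + 1 = 2n + 3 steps. */
```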
V. AMORTIZED ANALYSIS
In an amortized analysis, the time required to perform a sequence of data structure
operations is averaged over all the operations performed.
Amortized analysis can be used to show that the average cost of an operation is
small, if one averages over a sequence of operations, even though a single
operation might be expensive.
Amortized analysis differs from average case analysis in that probability is not
involved; an amortized analysis guarantees the average performance of each
operation in the worst case.
In the aggregate method of amortized analysis, we show that for all n, a sequence
of n operations takes worst-case time T(n) in total. In the worst case, the average
cost, or amortized cost, per operation is therefore T(n)/n. Note that this
amortized cost applies to each operation, even when there are several types of
operations in the sequence.
The other two methods we shall study in this chapter, the accounting method and
the potential method, may assign different amortized costs to different types of
operations.
Stack operations
In our first example of the aggregate method, we analyze stacks that have been
augmented with a new operation. An earlier section presented the two fundamental stack operations,
each of which takes O(1) time:
PUSH(S, x) pushes object x onto stack S.
Pop(S) pops the top of stack S and returns the popped object.
Since each of these operations runs in O(1) time, let us consider the cost of each to be 1.
The total cost of a sequence of n PUSH and POP operations is therefore n, and the actual running
time for n operations is therefore Θ(n).
The situation becomes more interesting if we add the stack operation MULTIPOP(S, k),
which removes the k top objects of stack S, or pops the entire stack if it contains fewer than k
objects.
In the following pseudo code, the operation STACK-EMPTY returns TRUE if there are
no objects currently on the stack, and FALSE otherwise.
MULTIPOP(S, k)
1  while not STACK-EMPTY(S) and k ≠ 0
2      do POP(S)
3         k ← k − 1
VI. ASYMPTOTIC NOTATION
Complexity analysis: rate at which storage or time grows as a function of the problem size
Asymptotic analysis: describes the inherent complexity of a program, independent of
machine and compiler
Idea: as problem size grows, the complexity can be described as a simple
proportionality to some known function.
A) Big Oh (O)-Upper Bound
This notation is used to define the worst case running time of an algorithm and concerned
with very large values of n.
f(n) = O(g(n)) iff f(n) ≤ c·g(n)
for some constants c and n0, and all n ≥ n0
B) Big Omega (Ω )-Lower Bound
This notation is used to describe the best case running time of algorithms and concerned
with large values of n
f(n) = Ω(g(n)) iff f(n) ≥ c·g(n)
for some constants c and n0, and all n ≥ n0
C) Big Theta (Θ)-Two-way Bound
This notation is used to describe the average case running time of algorithms and
concerned with very large values of n
f(n) = Θ(g(n)) iff c1·g(n) ≤ f(n) ≤ c2·g(n)
for some constants c1, c2 and n0, and all n ≥ n0
D) Little Oh (o)-Strict Upper Bound
This notation describes an upper bound that is not asymptotically tight
f(n) = o(g(n)) iff
f(n) = O(g(n)) and f(n) ≠ Ω(g(n))
To compare and rank orders of growth of algorithms, three notations are used: O, Ω and Θ.
Informal definitions:
Let t(n) and g(n) be any nonnegative functions defined on the set of natural numbers, where
t(n) is the algorithm's running time and
g(n) is a simple function to compare the count with.
i) O(g(n)) is the set of all fns with a smaller or same order of growth as g(n)
Eg.
n ∈ O(n²)              n³ ∉ O(n²)
100n + 5 ∈ O(n²)       n⁴ + n + 1 ∉ O(n²)
100n + 5 ∈ O(n)
(1/2)n(n − 1) ∈ O(n²)
ii) Ω(g(n)) stands for set of all fns with a larger or same order of growth as g(n)
n³ ∈ Ω(n²)
(1/2)n(n − 1) ∈ Ω(n²)
100n + 5 ∉ Ω(n²)
iii) Θ(g(n)) stands for the set of all functions with the same order of growth as g(n)
n³ ∉ Θ(n²)
an² + bn + c ∈ Θ(n²)
100n + 5 ∈ Θ(n)
Big Oh
f(N) = O(g(N))
There are positive constants c and n0 such that
o f(N) ≤ c·g(N) when N ≥ n0
The growth rate of f(N) is less than or equal to the growth rate of g(N)
g(N) is an upper bound on f(N)
o We write f(n) = O(g(n)) if there are positive constants n0 and c such that to the
right of n0, the value of f(n) always lies on or below c·g(n).
Meaning: For all data sets big enough (i.e., n > n0), the algorithm always executes in fewer than
c·g(n) steps in [best, average, worst] case.
The idea is to establish a relative order among functions for large n:
∃ c, n0 > 0 such that f(N) ≤ c·g(N) when N ≥ n0
f(N) grows no faster than g(N) for "large" N
Big O Rules
• If is f(n) a polynomial of degree d, then f(n) is
• O(d), i.e.,
1. Drop lower-order terms
2. Drop constant factors
• Use the smallest possible class of functions
Say "2n is O(n)" instead of "2n is O(n²)"
• Use the simplest expression of the class
Say "3n + 5 is O(n)" instead of "3n + 5 is O(3n)"
Big-Oh: example
• Let f(N) = 2N². Then
– f(N) = O(N⁴)
– f(N) = O(N³)
– f(N) = O(N²) (best answer, asymptotically tight)
• N²/2 − 3N = O(N²)
• 1 + 4N = O(N)
• 7N² + 10N + 3 = O(N²) = O(N³)
Big-Omega
• f(N) = Ω(g(N))
• There are positive constants c and n0 such that
f(N) ≥ c g(N) when N ≥ n0
• The growth rate of f(N) is greater than or equal to the growth rate of g(N).
• There exist c, n0 > 0 such that f(N) ≥ c g(N) when N ≥ n0
• f(N) grows no slower than g(N) for "large" N
Big-Omega: example
• Let f(N) = 2N². Then
– f(N) = Ω(N)
– f(N) = Ω(N²) (best answer)
Big-Theta
• f(N) = Θ(g(N)) iff
f(N) = O(g(N)) and f(N) = Ω(g(N))
• The growth rate of f(N) equals the growth rate of g(N)
• f(n) is Θ(g(n)) if there are constants c′ > 0 and c″ > 0 and an integer constant n0 ≥ 1
such that c′·g(n) ≤ f(n) ≤ c″·g(n) for n ≥ n0
• Big-Theta means the bound is the tightest possible: the growth rate of f(N) is the same as
the growth rate of g(N)
Big-Theta rules
• Example: let f(N) = N², g(N) = 2N².
Since f(N) = O(g(N)) and f(N) = Ω(g(N)),
f(N) = Θ(g(N)).
• If T(N) is a polynomial of degree k, then
T(N) = Θ(N^k).
• For logarithmic functions,
T(log_m N) = Θ(log N)
Mathematical Expression    Relative Rates of Growth
T(n) = O(F(n))             growth of T(n) ≤ growth of F(n)
T(n) = Ω(F(n))             growth of T(n) ≥ growth of F(n)
T(n) = Θ(F(n))             growth of T(n) = growth of F(n)
T(n) = o(F(n))             growth of T(n) < growth of F(n)
Computation of step count using asymptotic notation
Asymptotic complexity can be determined without computing the exact step count. This is
done by first determining the asymptotic complexity of each statement in the algorithm and
then summing these complexities to obtain the total.
Question Bank
UNIT I - PROBLEM SOLVING
PART – A (2 MARKS)
1. Define Modularity.
2. What do you mean by top down design?
3. What is meant by algorithm? What are its measures?
4. Give any four algorithmic techniques.
5. Write an algorithm to find the factorial of a given number
6. List the types of control structures
7. Define the top down design strategy
8. Define the worst case & average case complexities of an algorithm
9. What is meant by modular approach?
10. What is divide & conquer strategy?
11. What is dynamic programming?
12. What is program testing?
13. Define program verification
14. What is input/output assertion?
15. Define symbolic execution
16. Write the steps to verify a program segment with loops
17. What is CPU time?
18. Write at least five qualities & capabilities of a good algorithm
19. Write an algorithm to exchange the values of two variables
20. Write an algorithm to find N factorial (written as n!) where n >= 0.
PART – B (16 MARKS)
1. Explain Top down design in detail.
2. (a) Explain in detail the types of analysis that can be performed on an algorithm (8)
(b) Write an algorithm to perform matrix multiplication and analyze the same (8)
3. Design an algorithm to evaluate the function sin(x) as defined by the infinite series
expansion sin(x) = x/1! − x³/3! + x⁵/5! − x⁷/7! + ...
4. Write an algorithm to generate and print the first n terms of the Fibonacci series where
n >= 1; the first few terms are 0, 1, 1, 2, 3, 5, 8, 13.
5. Design an algorithm that accepts a positive integer and reverses the order of its digits.
6. Explain the base conversion algorithm to convert a decimal integer to its corresponding
octal representation
UNIT II - FUNDAMENTALS OF DATA STRUCTURES
Arrays – Structures – Stacks – Definition and examples – Representing Stacks –Queues and Lists
– Queue and its Representation – Applications of Stack – Queue and Linked Lists.
Unit: II - FUNDAMENTALS OF DATA STRUCTURES
I. ARRAYS
An array is a finite ordered set of homogeneous elements. The array size may be large or
small, but it must exist, and an array must contain a collection of elements of the same data type.
Array declaration in C is given below.
int a[100];
Here the array name is 'a', and the size is 100. Each element is represented by its index,
which starts from 0. For example, the 1st element's index is 0, the 2nd element's index is 1,
and the 100th element's index is 99.
The two basic operations that access an array are extraction and storing. The extraction
operation is a function that accepts an array name and an index and returns the element at that
index. The storing operation of a value x at index i is,
a[i] = x;
The smallest index of an array is called its lower bound, and in C it is always 0; the highest
index is called the upper bound. The number of elements in an array is called its range.
If the lower bound is represented by "lower" and the upper bound by "upper", then
range = upper − lower + 1. For example, for array a, the lower bound is 0, the upper bound is
99, and the range is 100.
An important feature of a C array is that neither the upper bound nor the lower bound may
be changed during a program's execution. The lower bound is always fixed at 0, and the upper
bound is fixed at the time the program is written.
One very useful technique is to declare a bound as a constant identifier, so that the work
required to modify the size of an array is minimized. For example, consider the following program
int a[100];
for(i = 0; i <100; a[i++] = 0);
To change the array to a larger (or smaller) size, the constant 100 must be changed in two
places, once in the declaration and once in the for statement. Consider the following equivalent
alternative,
#define NUMELTS 100
int a[NUMELTS];
for(i = 0; i < NUMELTS; a[i++] = 0);
Now only a single change in the constant definition is needed to change the upper bound.
One Dimensional Array
A one-dimensional array is used when it is necessary to keep a larger number of items in
memory and reference all the items in a uniform manner. Consider an application to read 100
integers, and find its average.
#define NUMELTS 100
void aver( void )
{
    int num[NUMELTS];
    int i;
    int total;
    float avg;

    total = 0;
    for(i = 0; i < NUMELTS; i++)
    {
        scanf("%d", &num[i]);
        total += num[i];
    }
    avg = (float) total / NUMELTS;
    printf("Average = %f", avg);
}
The declaration (int num[NUMELTS];) reserves 100 successive memory locations, each large
enough to contain a single integer. The address of the first of these locations is called the base
address of the array num.
Two-Dimensional array
A two-dimensional array is an array of arrays. For example,
int a[3][5];
This represents an array containing three elements. Each of these elements is itself an array
containing five elements (3 rows of 5 columns).
A total of 15 (3 × 5) elements can be stored in this array. Each element can be accessed by
its corresponding row index and column index. For example, to access the first cell in the
second row, use a[1][0]; likewise, to access the second cell in the third row, use a[2][1].
Nested looping statements are used to access each element efficiently; sample code to read
values and fill the array is given below.
for(i = 0; i < 3; i++)
    for(j = 0; j < 5; j++)
        scanf("%d", &a[i][j]);
Multi-Dimensional array
C also allows developers to declare arrays of more than two dimensions. A three-dimensional
array declaration is given below.
int a[3][2][5];
This can be accessed using three nested looping statements. Developers can also declare
arrays of more than three dimensions.
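As a sketch of the three-nested-loop access pattern described above (the function name
fill_3d and the row-major fill values are illustrative assumptions, not from the text):

```c
/* Fills a 3 x 2 x 5 array with three nested loops, the access pattern
   the text describes, and returns the number of elements visited. */
int fill_3d(int a[3][2][5])
{
    int i, j, k, count = 0;
    for (i = 0; i < 3; i++)
        for (j = 0; j < 2; j++)
            for (k = 0; k < 5; k++) {
                a[i][j][k] = count;   /* visited in row-major order */
                count++;
            }
    return count;
}
```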
II. STRUCTURES
A structure is a group of items in which each item is identified by its own identifier. In some
programming languages, a structure is called a "record" and a member is called a "field".
Consider the following structure declaration,
struct
{
    char first[10];
    char midinit;
    char last[20];
} sname, ename;
This declaration creates two structure variables, sname and ename, each of which contains
three members: first, midinit and last. Two of the members are character strings, and one is a
single character. The structure can also be declared in another format, given below
struct nametype
{
    char first[10];
    char midinit;
    char last[20];
};
struct nametype sname, ename;
The above definition creates a structure tag nametype containing three members. Once a
structure tag has been defined, the variables sname and ename can be declared. An alternative
method of assigning a structure tag is the typedef definition in C, given below
typedef struct
{
    char first[10];
    char midinit;
    char last[20];
} nametype;
nametype sname, ename;
Structure variable sname contains three members, and ename contains its own separate three
members. Each member of a structure variable can be accessed using the dot (.) operator.
Consider the structure given below.
struct data
{
    int a;
    float b;
    char c;
};

int main()
{
    struct data x, y;
    printf("\nEnter the values for the first variable\n");
    scanf("%d%f%c", &x.a, &x.b, &x.c);
    printf("\nEnter the values for the second variable\n");
    scanf("%d%f%c", &y.a, &y.b, &y.c);
    return 0;
}
A structure variable can also be an array. A looping statement is used to read input into the
structure array. Sample code is given below,
int main()
{
    struct data x[5];
    int i;
    for(i = 0; i < 5; i++)
    {
        printf("\nEnter the values for variable %d\n", (i + 1));
        scanf("%d%f%c", &x[i].a, &x[i].b, &x[i].c);
    }
    return 0;
}
STACK AND QUEUE
Stacks and queues are used to represent sequences of elements which can be modified by
insertion and deletion. Both stacks and queues can be implemented efficiently as arrays or as
linked lists.
III. STACK
A stack is a list with the restriction that inserts and deletes can be performed in only one
position, namely the end of the list called the top. The fundamental operations on a stack
are push, which is equivalent to an insert, and pop, which deletes the most recently inserted
element.
The most recently inserted element can be examined prior to performing a pop by use of
the top routine. Stacks are often used in processing tree-structured objects, in compilers (in
processing nested structures), and in systems to implement recursion. Stacks are also known as
LIFO (last in, first out) lists.
Stack model
(Stack model: only the top element is accessible)
REPRESENTATION OF STACK
A) Implementation of Stack using array
A stack K is most easily represented by a conceptually infinite array K[0], K[1], K[2], … and
an index TOP of type integer. The stack K consists of the elements K[0], …, K[TOP]; the
element at index TOP (K[TOP]) is the top element of the stack. Insertion of an element is
called push and deletion is called pop. The following code explains the operation push(K, a)
TOP = TOP + 1;
K[TOP] = a;
The following code explains the operation pop(K)
if (TOP < 0) then
    error;
else
    x = K[TOP];
    TOP = TOP - 1;
end if
An infinite array is not available in practice, so a finite array of size n is used. In this case a
push operation must check whether overflow occurs.
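The pseudocode above, extended with the overflow and underflow checks a finite array
requires, might look like the following C sketch (the struct name, MAXSIZE limit, and the
0/1 return convention are assumptions, not from the text):

```c
#define MAXSIZE 100

/* A minimal bounded array stack; returns 0 on overflow/underflow. */
struct stack {
    int items[MAXSIZE];
    int top;            /* index of the top element, -1 when empty */
};

void stack_init(struct stack *s) { s->top = -1; }

int stack_push(struct stack *s, int x)
{
    if (s->top >= MAXSIZE - 1)
        return 0;       /* overflow: the finite array is full */
    s->items[++s->top] = x;
    return 1;
}

int stack_pop(struct stack *s, int *x)
{
    if (s->top < 0)
        return 0;       /* underflow: nothing to pop */
    *x = s->items[s->top--];
    return 1;
}
```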
B) Linked List Implementation of Stacks
Stack can be implemented using a singly linked list. We perform a push by inserting at
the front of the list. We perform a pop by deleting the element at the front of the list.
A top operation merely examines the element at the front of the list, returning its
value. Sometimes the pop and top operations are combined into one. Structure definition is
given below,
typedef struct node *node_ptr;
struct node
{
    element_type element;
    node_ptr next;
};
typedef node_ptr STACK;
Routine to test whether a stack is empty-linked list implementation is given below,
int is_empty( STACK S )
{
    return( S->next == NULL );
}
We merely create a header node; make_null sets the next pointer to NULL. Routines to
create an empty stack (linked list implementation) are given below,
STACK create_stack( void )
{
    STACK S;
    S = (STACK) malloc( sizeof( struct node ) );
    if( S == NULL )
        fatal_error("Out of space!!!");
    return S;
}

void make_null( STACK S )
{
    if( S != NULL )
        S->next = NULL;
    else
        error("Must use create_stack first");
}
The push is implemented as an insertion into the front of a linked list, where the front of the list
serves as the top of the stack. Routine to push onto a stack-linked list implementation is given
below,
void push( element_type x, STACK S )
{
    node_ptr tmp_cell;
    tmp_cell = (node_ptr) malloc( sizeof( struct node ) );
    if( tmp_cell == NULL )
        fatal_error("Out of space!!!");
    else
    {
        tmp_cell->element = x;
        tmp_cell->next = S->next;
        S->next = tmp_cell;
    }
}
The top is performed by examining the element in the first position of the list. Routine to return
top element in a stack--linked list implementation is given below,
element_type top( STACK S )
{
    if( is_empty( S ) )
        error("Empty stack");
    else
        return S->next->element;
}
Routine to pop from a stack--linked list implementation is given below,
void pop( STACK S )
{
    node_ptr first_cell;
    if( is_empty( S ) )
        error("Empty stack");
    else
    {
        first_cell = S->next;
        S->next = S->next->next;
        free( first_cell );
    }
}
One problem that affects the efficiency of implementing stacks is error testing. Our linked
list implementation carefully checked for errors; in an unchecked array implementation, a pop
on an empty stack or a push on a full stack would overflow the array bounds and cause a crash.
APPLICATIONS OF STACKS
A) Balancing Symbols
Every closing brace, bracket, and parenthesis must correspond to its opening counterpart.
The sequence [()] is legal, but [(]) is not. It is easy to check these things using a stack: just
check for balancing of parentheses, brackets, and braces, and ignore any other character that
appears.
Make an empty stack. Read characters until end of file. If the character is an opening
symbol, push it onto the stack. If it is a closing symbol and the stack is empty, report an error;
otherwise, pop the stack, and if the symbol popped is not the corresponding opening symbol,
report an error. At end of file, if the stack is not empty, report an error.
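The procedure just described can be sketched in C as follows; the function name is_balanced,
the fixed MAXDEPTH stack limit, and the 0/1 return convention are assumptions for
illustration:

```c
#define MAXDEPTH 256

/* Returns the opening symbol matching a closing one, or 0. */
static char matching(char close)
{
    switch (close) {
    case ')': return '(';
    case ']': return '[';
    case '}': return '{';
    default:  return 0;
    }
}

/* Returns 1 if every (, [, { in s is properly balanced, else 0.
   All other characters are ignored, as the text describes. */
int is_balanced(const char *s)
{
    char stack[MAXDEPTH];
    int top = -1;

    for (; *s; s++) {
        if (*s == '(' || *s == '[' || *s == '{') {
            if (top >= MAXDEPTH - 1)
                return 0;                       /* nesting too deep */
            stack[++top] = *s;                  /* push opener */
        } else if (*s == ')' || *s == ']' || *s == '}') {
            if (top < 0 || stack[top--] != matching(*s))
                return 0;                       /* empty stack or mismatch */
        }
    }
    return top == -1;   /* balanced only if the stack is empty at the end */
}
```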
B) Postfix Expressions
Suppose we have a pocket calculator and would like to compute the cost of a shopping trip.
To do so, we add a list of numbers and multiply the result by 1.06; this computes the purchase
price of some items with local sales tax added. If the items are 4.99, 5.99, and 6.99, then a
natural way to enter this would be the sequence
4.99 + 5.99 + 6.99 * 1.06 =
Depending on the calculator, this produces either the intended answer, 19.05, or the scientific
answer, 18.39. Most simple four-function calculators will give the first answer, but better
calculators know that multiplication has higher precedence than addition.
On the other hand, some items are taxable and some are not, so if only the first and last items
were actually taxable, then the sequence
4.99 * 1.06 + 5.99 + 6.99 * 1.06 =
would give the correct answer (18.69) on a scientific calculator and the wrong answer (19.37)
on a simple calculator. A scientific calculator generally comes with parentheses, so we can
always get the right answer by parenthesizing, but with a simple calculator we need to remember
intermediate results.
A typical evaluation sequence for this example might be to multiply 4.99 and 1.06, saving this
answer as a1. We then add 5.99 and a1, saving the result in a1. We multiply 6.99 and 1.06, saving
the answer in a2, and finish by adding al and a2, leaving the final answer in al. We can write this
sequence of operations as follows:
4.99 1.06 * 5.99 + 6.99 1.06 * +
This notation is known as postfix or reverse Polish notation. For instance, the postfix
expression
6 5 2 3 + 8 * + 3 + *
is evaluated as follows: the first four symbols are placed on the stack. The resulting stack is
Next a '+' is read, so 3 and 2 are popped from the stack and their sum, 5, is pushed.
Next 8 is pushed.
Now a '*' is seen, so 8 and 5 are popped and 5 * 8 = 40 is pushed.
Next a '+' is seen, so 40 and 5 are popped and 40 + 5 = 45 is pushed.
Now, 3 is pushed.
Next '+' pops 3 and 45 and pushes 45 + 3 = 48.
Finally, a '*' is seen, 48 and 6 are popped, and the result 6 * 48 = 288 is pushed.
The time to evaluate a postfix expression is O(n), because processing each element in the input
consists of stack operations and thus takes constant time. The algorithm to do so is very simple.
Notice that when an expression is given in postfix notation, there is no need to know any
precedence rules; this is an obvious advantage.
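A minimal sketch of this evaluation in C, restricted to '+' and '*' over space-separated
non-negative integers (the function name, the fixed stack size, and the absence of error
checking are simplifying assumptions):

```c
#include <stdlib.h>
#include <ctype.h>

#define EVAL_MAX 64

long eval_postfix(const char *expr)
{
    long stack[EVAL_MAX];
    int top = -1;
    const char *p = expr;

    while (*p) {
        if (isdigit((unsigned char)*p)) {
            char *end;
            stack[++top] = strtol(p, &end, 10);  /* push the number */
            p = end;
        } else if (*p == '+' || *p == '*') {
            long b = stack[top--];               /* pop two operands */
            long a = stack[top--];
            stack[++top] = (*p == '+') ? a + b : a * b;
            p++;
        } else {
            p++;                                 /* skip spaces */
        }
    }
    return stack[top];   /* the final result is the only element left */
}
```

Applied to the expression worked through above, the function follows exactly the same stack
trace.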
C) Infix to Postfix Conversion
Not only can a stack be used to evaluate a postfix expression, but we can also use a stack to
convert an expression in standard form (otherwise known as infix) into postfix. Suppose we want
to convert the infix expression
a + b * c + ( d * e + f ) * g
into postfix. A correct answer is a b c * + d e * f + g * +.
When an operand is read, it is immediately placed onto the output. Operators are not
immediately output, so they must be saved somewhere. The correct thing to do is to place
operators that have been seen, but not placed on the output, onto the stack. We will also stack left
parentheses when they are encountered. We start with an initially empty stack.
If we see a right parenthesis, then we pop the stack, writing symbols until we encounter a
(corresponding) left parenthesis, which is popped but not output.
If we see any other symbol ('+', '*', '('), then we pop entries from the stack until we find an
entry of lower priority. One exception is that we never remove a '(' from the stack except when
processing a ')'. For the purposes of this operation, '+' has lowest priority and '(' highest. When
the popping is done, we push the operator onto the stack.
Finally, if we read the end of input, we pop the stack until it is empty, writing symbols onto
the output.
To see how this algorithm performs, we will convert the infix expression above into its
postfix form. First, the symbol a is read, so it is passed through to the output. Then '+' is read
and pushed onto the stack. Next b is read and passed through to the output. The state of affairs at this
juncture is as follows:
Next a '*' is read. The top entry on the operator stack has lower precedence than '*', so nothing
is output and '*' is put on the stack. Next, c is read and output. Thus far, we have
The next symbol is a '+'. Checking the stack, we find that we will pop the '*' and place it on
the output, then pop the other '+' (which is of equal, not lower, priority) and output it, and
finally push the new '+'.
The next symbol read is an '(', which, being of highest precedence, is placed on the stack.
Then d is read and output.
We continue by reading a '*'. Since open parentheses do not get removed except when a
closed parenthesis is being processed, there is no output. Next, e is read and output.
The next symbol read is a '+'. We pop and output '*' and then push '+'. Then we read and
output f.
Now we read a ')', so the stack is emptied back to the '('. We output a '+'.
We read a '*' next; it is pushed onto the stack. Then g is read and output.
The input is now empty, so we pop and output symbols from the stack until it is empty.
As before, this conversion requires only O(n) time and works in one pass through the input.
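The conversion rules above can be sketched for single-letter operands and the operators '+',
'*', and parentheses (the prec helper, the buffer sizes, and the assumption that the output
buffer is large enough are all illustrative, not from the text):

```c
/* Stack precedence: '*' binds tighter than '+'; '(' never pops anything. */
static int prec(char op)
{
    if (op == '*') return 2;
    if (op == '+') return 1;
    return 0;            /* '(' has lowest stack precedence */
}

void infix_to_postfix(const char *in, char *out)
{
    char stack[128];
    int top = -1, n = 0;

    for (; *in; in++) {
        char c = *in;
        if (c == ' ')
            continue;
        if (c >= 'a' && c <= 'z') {
            out[n++] = c;                    /* operands go straight out */
        } else if (c == '(') {
            stack[++top] = c;                /* always push '(' */
        } else if (c == ')') {
            while (top >= 0 && stack[top] != '(')
                out[n++] = stack[top--];     /* empty back to the '(' */
            top--;                           /* discard the '(' itself */
        } else {                             /* '+' or '*' */
            while (top >= 0 && prec(stack[top]) >= prec(c))
                out[n++] = stack[top--];     /* pop entries of >= priority */
            stack[++top] = c;
        }
    }
    while (top >= 0)
        out[n++] = stack[top--];             /* flush remaining operators */
    out[n] = '\0';
}
```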
IV) QUEUE
A queue supports insertions (called enqueues) at one end (called the tail or rear) and
deletions (called dequeues) from the other end (called the head or front). Queues are used in
operating systems and networking to store a list of items that are waiting for some resource.
Queues are also known as FIFO (first in, first out) lists.
Model of a queue
ARRAY IMPLEMENTATION OF QUEUES
Both the linked list and array implementations give fast O(1) running times for every
operation. Array implementation of queue is given below
For each queue data structure, keep an array, QUEUE[], and the positions q_front and
q_rear, which represent the ends of the queue.
Keep track of the number of elements that are actually in the queue, q_size. The cells
that are blanks have undefined values in them.
In particular, the first two cells have elements that used to be in the queue.
To enqueue an element x, increment q_size and q_rear, then set QUEUE[q_rear] = x. To
dequeue an element, set the return value to QUEUE[q_front], decrement q_size, and then
increment q_front.
There is one potential problem with this implementation. After 10 enqueues, the queue
appears to be full, since q_rear is now 10, and the next enqueue would be in a nonexistent
position.
However, there might only be a few elements in the queue, because several elements may
have already been dequeued.
The simple solution is that whenever q_front or q_rear gets to the end of the array, it is
wrapped around to the beginning. The following figure shows the queue during some operations.
This is known as a circular array implementation.
There are two warnings about the circular array implementation of queues. First, it is
important to check the queue for emptiness, because a dequeue when the queue is empty will
return an undefined value, silently.
Secondly, some programmers use different ways of representing the front and rear of a
queue. For instance, some do not use an entry to keep track of the size, because they rely on the
base case that when the queue is empty, q_rear = q_front - 1. The size is computed implicitly by
comparing q_rear and q_front. This is a very tricky way to go, because there are some special
cases, so be very careful if you need to modify code written this way. If the size is not part of the
structure, then if the array size is A_SIZE, the queue is full when there are A_SIZE - 1 elements,
since only A_SIZE different sizes can be differentiated, and one of these is 0.
Type declarations for queue--array implementation is given below.
struct queue_record
{
    unsigned int q_max_size;   /* maximum # of elements until Q is full */
    unsigned int q_front;
    unsigned int q_rear;
    unsigned int q_size;       /* current # of elements in Q */
    element_type *q_array;
};
typedef struct queue_record *QUEUE;
Routine to test whether a queue is empty-array implementation, is given below.
int is_empty( QUEUE Q )
{
    return( Q->q_size == 0 );
}
Routine to make an empty queue-array implementation, is given below.
void make_null( QUEUE Q )
{
    Q->q_size = 0;
    Q->q_front = 1;
    Q->q_rear = 0;
}
Routine to enqueue (array implementation) is given below
void enqueue( element_type x, QUEUE Q )
{
    if( is_full( Q ) )
        error("Full queue");
    else
    {
        Q->q_size++;
        Q->q_rear = succ( Q->q_rear, Q );
        Q->q_array[ Q->q_rear ] = x;
    }
}
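The enqueue routine relies on a helper succ for the circular wrap-around, which the text does
not show. The following self-contained sketch repeats the queue_record fields and adds
assumed succ, dequeue, and create_queue routines (element_type as int, a simplified enqueue
without the error routine, and all names beyond those in the text are assumptions):

```c
#include <stdlib.h>

typedef int element_type;

struct queue_record {
    unsigned int q_max_size;
    unsigned int q_front;
    unsigned int q_rear;
    unsigned int q_size;
    element_type *q_array;
};
typedef struct queue_record *QUEUE;

/* Circular successor: wrap back to index 0 at the end of the array. */
unsigned int succ(unsigned int value, QUEUE Q)
{
    if (++value == Q->q_max_size)
        value = 0;
    return value;
}

void enqueue(element_type x, QUEUE Q)
{
    if (Q->q_size == Q->q_max_size)
        return;                      /* full: silently ignored in this sketch */
    Q->q_size++;
    Q->q_rear = succ(Q->q_rear, Q);
    Q->q_array[Q->q_rear] = x;
}

element_type dequeue(QUEUE Q)
{
    element_type x = Q->q_array[Q->q_front];
    Q->q_size--;
    Q->q_front = succ(Q->q_front, Q);
    return x;
}

QUEUE create_queue(unsigned int max)
{
    QUEUE Q = malloc(sizeof(struct queue_record));
    Q->q_max_size = max;
    Q->q_size = 0;
    Q->q_front = 1;                  /* same initial state as make_null */
    Q->q_rear = 0;
    Q->q_array = malloc(max * sizeof(element_type));
    return Q;
}
```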
APPLICATION OF QUEUES
There are several algorithms that use queues to give efficient running times.
When jobs are submitted to a printer, they are arranged in order of arrival. Thus,
essentially, jobs sent to a line printer are placed on a queue.
In computer networks, there are many network setups of personal computers in which the
disk is attached to one machine, known as the file server. Users on other machines are
given access to files on a first-come first-served basis, so the data structure is a queue.
Calls to large companies are generally placed on a queue when all operators are busy.
Queues are also used extensively in graph theory, for example in breadth-first traversal.
V) LIST
A list is an abstract data type (ADT). A general list is of the form a1, a2, a3, . . . , an; the ai
are called the keys or values of the list. The size of this list is n, and a list of size 0 is called a
null list. A list can be implemented contiguously (as an array) or non-contiguously (as a
linked list).
List Operations
A lot of operations are available to perform on the list ADT. Some popular operations are
find – returns the position of the first occurrence of a key (value)
insert – inserts a key at the specified position in the list
delete – deletes the key at the specified position in the list
find_kth – returns the element at some position
print_list – displays all the keys in the list
make_null – makes the list a null list
For example, consider the list 34, 12, 52, 16, 12. Then
find(52) – returns 3
insert(x, 4) – makes the list 34, 12, 52, x, 16, 12
delete(3) – applied to that result, makes the list 34, 12, x, 16, 12
Simple Array Implementation of Lists
List can be implemented using an array. Even if the array is dynamically allocated, an
estimate of the maximum size of the list is required. Usually this requires a high over-estimate,
which wastes considerable space. This could be a serious limitation, especially if there are many
lists of unknown size.
Merits of List using array
An array implementation allows print_list and find to be carried out in linear time, which
is as good as can be expected, and the find_kth operation takes constant time.
Demerits of List using array
However, insertion and deletion are expensive. For example, inserting at position 0
(which amounts to making a new first element) requires first pushing the entire array down one
spot to make room, whereas deleting the first element requires shifting all the elements in the list
up one, so the worst case of these operations is O(n). On average, half the list needs to be moved
for either operation, so linear time is still required. Merely building a list by n successive inserts
would require quadratic time. Because the running time for insertions and deletions is so slow
and the list size must be known in advance, simple arrays are generally not used to implement
lists.
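The O(n) shifting that makes array insertion expensive can be seen directly in a sketch like
the following (the function name, parameters, and 0/1 return convention are assumptions):

```c
/* Inserts x at index pos in an array-backed list of *n elements,
   shifting the tail up one slot; returns 0 if full or pos is invalid. */
int list_insert(int a[], int *n, int capacity, int pos, int x)
{
    int i;
    if (*n >= capacity || pos < 0 || pos > *n)
        return 0;
    for (i = *n; i > pos; i--)
        a[i] = a[i - 1];     /* the O(n) shift the text describes */
    a[pos] = x;
    (*n)++;
    return 1;
}
```

Inserting at position 0 moves every existing element, which is why n successive inserts at the
front cost quadratic time in total.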
Linked Lists
The linked list consists of a series of structures, which are not necessarily adjacent in memory.
Each structure contains the element variable and a pointer variable to a structure containing its
successor. The element variable is used to store a key (value). A pointer variable is just a
variable that contains the address where some other data is stored; this pointer variable is
called the next pointer.
A linked list
Thus, if p is declared to be a pointer to a structure, then the value stored in p is interpreted
as the location, in main memory, where a structure can be found. A field of that structure can be
accessed by p->field_name.
Consider a list contains five structures, which happen to reside in memory locations 1000,
800, 712, 992, and 692 respectively. The next pointer in the first structure has the value 800,
which provides the indication of where the second structure is. The other structures each have a
pointer that serves a similar purpose. Of course, in order to access this list, we need to know
where the first cell can be found. A pointer variable can be used for this purpose.
Linked list with actual pointer values
To execute print_list(L) or find(L,key), we merely pass a pointer to the first element in the
list and then traverse the list by following the next pointers. This operation is clearly linear-
time, although the constant is likely to be larger than if an array implementation were used.
The find_kth operation is no longer quite as efficient as an array implementation; find_kth(L,i)
takes O(i) time and works by traversing down the list in the obvious manner.
The delete command can be executed in one pointer change. The result of deleting the third
element in the original list is shown below.
Deletion from a linked list
The insert command requires obtaining a new cell from the system by using a malloc call
(more on this later) and then executing two pointer maneuvers.
Insertion into a linked list
Programming Details
Keep a sentinel node, which is sometimes referred to as a header or dummy node. Our
convention will be that the header is in position 0. Linked list with a header is given below.
Type declarations for linked lists is given below.
typedef struct node *node_ptr;
struct node
{
    element_type element;
    node_ptr next;
};
typedef node_ptr LIST;
typedef node_ptr position;
Function to test whether a linked list is empty
int is_empty( LIST L )
{
    return( L->next == NULL );
}
Empty list with header
Function to test whether current position is the last in a linked list
int is_last( position p, LIST L )
{
    return( p->next == NULL );
}
The find function returns the position of some element x in the list
position find( element_type x, LIST L )
{
    position p;
    p = L->next;
    while( (p != NULL) && (p->element != x) )
        p = p->next;
    return p;
}
Our fourth routine will delete some element x in list L. We need to decide what to do if x
occurs more than once or not at all. Our routine deletes the first occurrence of x and does
nothing if x is not in the list. To do this, we find p, which is the cell prior to the one containing
x, via a call to find_previous.
void delete( element_type x, LIST L )
{
    position p, tmp_cell;
    p = find_previous( x, L );
    if( p->next != NULL )   /* implicit assumption of header use */
    {
        /* x is found: delete it */
        tmp_cell = p->next;
        p->next = tmp_cell->next;   /* bypass the cell to be deleted */
        free( tmp_cell );
    }
}

position find_previous( element_type x, LIST L )
{
    position p;
    p = L;
    while( (p->next != NULL) && (p->next->element != x) )
        p = p->next;
    return p;
}
Insert routine allows us to pass an element to be inserted along with the list L and a position p.
Our particular insertion routine will insert an element after the position implied by p.
void insert( element_type x, LIST L, position p )
{
    position tmp_cell;
    tmp_cell = (position) malloc( sizeof( struct node ) );
    if( tmp_cell == NULL )
        fatal_error("Out of space!!!");
    else
    {
        tmp_cell->element = x;
        tmp_cell->next = p->next;
        p->next = tmp_cell;
    }
}
To delete a list
void delete_list( LIST L )
{
    position p, tmp;
    p = L->next;   /* header assumed */
    L->next = NULL;
    while( p != NULL )
    {
        tmp = p->next;
        free( p );
        p = tmp;
    }
}
DOUBLY LINKED LISTS
To traverse lists backwards, add an extra field to the data structure containing a pointer to
the previous cell. The cost of this is an extra link, which adds to the space requirement and also
doubles the cost of insertions and deletions because there are more pointers to fix. On the other
hand, it simplifies deletion, because you no longer need a separate pointer to the previous cell
in order to delete a given cell.
A doubly linked list
CIRCULARLY LINKED LISTS
A popular convention is to have the last cell keep a pointer back to the first. This can be
done with or without a header (if the header is present, the last cell points to it), and can also be
done with doubly linked lists (the first cell's previous pointer points to the last cell).
A doubly circularly linked list
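A doubly linked node and an insert-after routine can be sketched as follows; note that four
pointers must be fixed instead of two, which is the extra cost mentioned above (the struct and
function names are illustrative assumptions):

```c
#include <stdlib.h>

struct dnode {
    int element;
    struct dnode *prev;
    struct dnode *next;
};

/* Inserts x after position p (p must not be NULL); returns the new cell. */
struct dnode *dinsert_after(struct dnode *p, int x)
{
    struct dnode *cell = malloc(sizeof(struct dnode));
    if (cell == NULL)
        return NULL;
    cell->element = x;
    cell->next = p->next;       /* four pointer assignments instead of two */
    cell->prev = p;
    if (p->next != NULL)
        p->next->prev = cell;
    p->next = cell;
    return cell;
}
```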
Question Bank
Unit II - LISTS, STACKS AND QUEUES
PART – A (2 MARKS)
1. Define ADT.
2. Give the structure of Queue model.
3. What are the basic operations of Queue ADT?
4. What is Enqueue and Dequeue?
5. Give the applications of Queue.
6. What is the use of stack pointer?
7. What is an array?
8. Define ADT (Abstract Data Type).
9. Swap two adjacent elements by adjusting only the pointers (and not the data) using a
singly linked list.
10. Define a queue model.
11. What are the advantages of doubly linked list over singly linked list?
12. Define a graph.
13. What is a Queue?
14. What is a circularly linked list?
15. What is a linear list?
16. How will you delete a node from a linked list?
17. What is linear pattern search?
18. What is a recursive data structure?
19. What is a doubly linked list?
PART – B (16 MARKS)
1. Explain the implementation of stack using Linked List.
2. Explain Prefix, Infix and Postfix expressions with example.
3. Explain the operations and the implementation of list ADT.
4. Give a procedure to convert an infix expression a+b*c+(d*e+f)*g to postfix notation.
5. Design and implement an algorithm to search a linear ordered linked list for a given
alphabetic key or name.
6. (a) What is a stack? Write down the procedure for implementing various stack operations (8)
(b) Explain the various applications of stack (8)
7. (a) Given two sorted lists L1 and L2 write a procedure to compute L1_L2 using only the
basic operations (8)
(b) Write a routine to insert an element in a linked list (8)
8. What is a queue? Write an algorithm to implement queue with example.
UNIT III – TREES
Binary Trees – Operations on Binary Tree Representations – Node Representation –Internal and
External Nodes – Implicit Array Representation – Binary Tree Traversal – Huffman Algorithm –
Representing Lists as Binary Trees – Sorting and Searching Techniques – Tree Searching –
Hashing
Unit: III TREES
TREES
A tree is a finite set of one or more nodes such that there is a specially designated node
called the root, and zero or more non-empty subtrees T1, T2, …, Tk, each of whose roots is
connected by a directed edge from the root R.
Fig: Tree
PRELIMINARIES
Root
A node which doesn't have a parent. In the above tree, the root is A.
Node
Item of Information
Leaf
A node which doesn't have children is called a leaf or terminal node. Here B, K, L, G, H,
M and J are leaves.
Siblings
Children of the same parent are said to be siblings. Here B, C, D, E are siblings; F, G are
siblings; similarly, I, J, K, L are siblings.
Path
A path from node n1 to nk is defined as a sequence of nodes n1, n2, n3, ..., nk such that ni is
the parent of ni+1. There is exactly one path from the root to each node.
In the figure, the path from A to L is A, C, F, L, where A is the parent of C, C is the parent of F
and F is the parent of L.
Length
The length of a path is defined as the number of edges on the path. In the figure, the length of
the path from A to L is 3.
Degree
The number of sub trees of a node is called its degree.
Degree of A is 4
Degree of C is 2
Degree of D is 1
Degree of H is 0
The degree of the tree is the maximum degree of any node in the tree
In fig the degree of the tree is 4.
Level
The level of a node is defined by letting the root be at level one; if a node is at
level L, then its children are at level L+1.
Level of A is 1
Level of B, C, D, E is 2
Level of F, G, H, I, J is 3
Level of K, L, M is 4
Depth
For any node n, the depth of n is the length of the unique path from root to n.
The depth of the root is zero
In fig Depth of node F is 2
Depth of node L is 3
Height
For any node n, the height of the node n is the length of the longest path from n to the
leaf.
The height of the leaf is zero
In fig Height of node F is 1
Height of L is 0
II. BINARY TREES
A binary tree is a special form of a tree. A binary tree is important and frequently
used in various applications.
A binary tree T is defined as follows:
T is empty, or
T contains a specially designated node called the root of T, and the remaining nodes of T
form two disjoint binary trees T1 and T2, which are called the left subtree and the right
subtree respectively.
Fig: A sample binary tree with 11 nodes
Two possible situations of a binary tree are (a) Full binary tree (b) Complete Binary tree
Full binary tree
A binary tree is a full binary tree if it contains the maximum possible number of nodes
at every level. A full binary tree of height 4 is shown below.
Fig: Full Binary tree of height 4
Complete binary tree
A binary tree is said to be a complete binary tree if all its levels, except possibly the last
level, have the maximum possible number of nodes, and all the nodes at the last level appear as
far left as possible.
A complete binary tree of height 4 is shown below.
Fig: A complete binary tree of height 4
III.REPRESENTATION OF BINARY TREE
Two common methods used for representing this structure
1. Linear or sequential representation. (Using an array)
2. Linked representation (Using Pointers)
A. Linear Representation of a Binary tree
In this representation, the nodes are stored level by level, starting from level zero,
where only the root node is present. The root node is stored in the first memory location.
The following rules decide the location of any node of the tree in the array:
The root node is at location 1.
For any node with index i, 1 < i ≤ n:
o PARENT(i) = i/2 (integer division); when i = 1, there is no parent
o LCHILD(i) = 2*i; if 2*i > n, then i has no left child
o RCHILD(i) = 2*i + 1; if 2*i + 1 > n, then i has no right child
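These rules can be sketched directly in C; the function names parent, lchild and rchild are illustrative (not from the text), and a return value of 0 stands for "no such node":

```c
/* 1-based array indexing for a binary tree with n nodes. */
int parent(int i)        { return (i == 1) ? 0 : i / 2; }      /* 0: the root has no parent */
int lchild(int i, int n) { return (2*i     > n) ? 0 : 2*i; }   /* 0: no left child          */
int rchild(int i, int n) { return (2*i + 1 > n) ? 0 : 2*i + 1; } /* 0: no right child       */
```

For example, in a tree with n = 7 nodes, node 3 has children at positions 6 and 7, and node 4 is a leaf.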
Consider a binary tree for the following expression (A-B)+C*(D/E)
Fig: Binary Tree
The representation of the same Binary tree using array is shown in fig below
A full Binary tree and the index of its various nodes when stored in an array is shown in Fig
below
B. Linked representation of Binary Tree
When we insert or delete a node in the linear representation, data must be moved up and
down the array, which takes an excessive amount of processing time.
The linear representation of binary trees thus has a number of overheads. All these overheads
are taken care of by the linked representation.
Structure of a node in the linked representation:

LC | DATA | RC

Here LC and RC are two link fields that store the addresses of the left child and right child of a
node. DATA is the information of the node.
The tree with 9 nodes is represented as:
Fig: Binary Tree
OPERATIONS ON BINARY TREES
There are a number of primitive operations that can be applied to a binary tree. If p is a
pointer to a node nd of a binary tree, the function info(p) returns the contents of nd.
The functions left(p), right(p), father(p) and brother(p) return pointers to the left son of
nd, the right son of nd, the father of nd and the brother of nd, respectively.
These functions return the null pointer if nd has no left son, right son, father or brother.
Finally, the logical functions isleft(p) and isright(p) return the value true if nd is a left or right
son, respectively, of some other node in the tree, and false otherwise.
Note that the functions isleft(p), isright(p) and brother(p) can be implemented using the
functions left(p), right(p) and father(p).
For example, isleft may be implemented as

q = father(p);
if (q == null)
    return(false);
if (left(q) == p)
    return(true);
return(false);

or, even more simply, as father(p) && p == left(father(p)). isright may be implemented in a
similar manner, or by calling isleft. brother(p) may be implemented using isleft or isright as

if (father(p) == null)
    return(null);
if (isleft(p))
    return(right(father(p)));
return(left(father(p)));
In constructing a binary tree, the operations maketree, setleft and setright are useful.
maketree(x) creates a new binary tree consisting of a single node with information field x and
returns a pointer to that node. setleft(p,x) accepts a pointer p to a binary tree node with no left
son and creates a new left son of node(p) with information field x. setright(p,x) is analogous to
setleft except that it creates a right son of node(p).
Make_Empty
This operation is mainly for initialization. Some programmers prefer to initialize the first
element as a one-node tree, but our implementation follows the recursive definition of trees more
closely. It is also a simple routine, as evidenced below
template <class Etype>
void
Binary_Search_Tree<Etype>::
Make_Empty(Tree_Node<Etype> * & T)
{
    if (T != NULL)
    {
        Make_Empty(T->Left);
        Make_Empty(T->Right);
        delete T;
        T = NULL;
    }
}
Find
This operation generally requires returning a pointer to the node in tree T that has key X,
or NULL if there is no such node. The protected Find routine does the actual work; the public
routine then returns nonzero if the Find succeeded, and sets Last_Find. If the Find failed, zero
is returned, and Last_Find points to NULL. The structure of the tree makes this simple.

template <class Etype>
Tree_Node<Etype> *
Binary_Search_Tree<Etype>::
Find(const Etype & X, Tree_Node<Etype> * T) const
{
    if (T == NULL)
        return NULL;
    if (X < T->Element)
        return Find(X, T->Left);
    else if (X > T->Element)
        return Find(X, T->Right);
    else
        return T;
}
Find_Min and Find_Max
Internally, these routines return the position of the smallest and largest elements in the
tree, respectively.
Although returning the exact values of these elements might seem more reasonable, this
would be inconsistent with the Find operation.
It is important that similar-looking operations do similar things. To perform a Find_Min,
start at the root and go left as long as there is a left child.
The stopping point is the smallest element. The Find_Max routine is the same, except that
branching is to the right child. The public interface is similar to that of the Find routine.
template <class Etype>
Tree_Node<Etype> *
Binary_Search_Tree<Etype>::
Find_Min(Tree_Node<Etype> * T) const
{
    if (T == NULL)
        return NULL;
    else if (T->Left == NULL)
        return T;
    else
        return Find_Min(T->Left);
}
template <class Etype>
Tree_Node<Etype> *
Binary_Search_Tree<Etype>::
Find_Max(Tree_Node<Etype> * T) const
{
    if (T != NULL)
        while (T->Right != NULL)
            T = T->Right;
    return T;
}
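The same pair of routines can also be sketched in plain C over a bare node structure (the type and function names here are assumptions for this sketch, not from the text):

```c
#include <stddef.h>

struct bnode { int key; struct bnode *left, *right; };

/* Smallest key: from the root, go left as long as there is a left child. */
struct bnode *find_min(struct bnode *t)
{
    if (t == NULL)
        return NULL;
    while (t->left != NULL)
        t = t->left;
    return t;
}

/* Largest key: the same idea, branching to the right child instead. */
struct bnode *find_max(struct bnode *t)
{
    if (t == NULL)
        return NULL;
    while (t->right != NULL)
        t = t->right;
    return t;
}
```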
BINARY TREE REPRESENTATIONS
Node Representation of Binary Trees
Tree nodes may be implemented as array elements or as dynamically allocated
variables. Each node contains info, left, right and father fields. The left, right and father fields of
a node point to the node's left son, right son and father respectively.
Using the array implementation,
#define NUMNODES 500
struct nodetype {
    int info;
    int left;
    int right;
    int father;
};
struct nodetype node[NUMNODES];
Under this representation, the operations info(p), left(p), right(p) and father(p) are
implemented by references to node[p].info, node[p].left, node[p].right and node[p].father
respectively.
To implement isleft and isright more efficiently, we include within each node an
additional flag isleft. The value of this flag is TRUE if the node is a left son and FALSE
otherwise. The root is uniquely identified by a NULL value(0) in its father field.
Alternatively, the sign of the father field could be negative if the node is a left son or
positive if it is a right son. The pointer to a node's father is then given by the absolute value of
the father field. The isleft and isright operations would then need only examine the sign of the
father field.
To implement brother(p) more efficiently, a brother field is included in each node. Once
the array of nodes is declared, an available list is created by executing the following statements:

int avail, i;
avail = 1;
for (i = 0; i < NUMNODES; i++)
    node[i].left = i + 1;
node[NUMNODES-1].left = 0;

Note that the available list is not a binary tree but a linear list whose nodes are linked
together by the left field. Each node in a tree is taken from the available pool when needed and
returned to the available pool when no longer in use. This representation is called the linked
array representation of a binary tree.
A node may be defined by

struct nodetype {
    int info;
    struct nodetype *left;
    struct nodetype *right;
    struct nodetype *father;
};
typedef struct nodetype *NODEPTR;
The operations info(p), left(p), right(p) and father(p) would be implemented by
references to p->info, p->left, p->right and p->father respectively. An explicit available list is
not needed. The routines getnode and freenode simply allocate and free nodes using the routines
malloc and free. This representation is called the dynamic node representation of a binary tree.
Both the linked array representation and the dynamic node representation are
implementations of an abstract linked representation (also called the node representation), in
which implicit or explicit pointers link together the nodes of a binary tree.
The maketree function, which allocates a node and sets it as the root of a single-node
binary tree, may be written as

NODEPTR maketree(int x)
{
    NODEPTR p;

    p = getnode();
    p->info = x;
    p->left = NULL;
    p->right = NULL;
    return(p);
}
The routine setleft(p,x) sets a node with contents x as the left son of node(p):

setleft(NODEPTR p, int x)
{
    if (p == NULL)
        printf("void insertion\n");
    else if (p->left != NULL)
        printf("invalid insertion\n");
    else
        p->left = maketree(x);
}

The routine setright(p,x) to create a right son of node(p) with contents x is
similar.
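Assembled into one runnable unit, the dynamic node representation and the three construction routines look roughly as follows. This is a sketch: getnode is replaced by a direct malloc call, and the father field is omitted for brevity.

```c
#include <stdio.h>
#include <stdlib.h>

struct nodetype {
    int info;
    struct nodetype *left;
    struct nodetype *right;
};
typedef struct nodetype *NODEPTR;

/* Allocate a single-node tree holding x. */
NODEPTR maketree(int x)
{
    NODEPTR p = malloc(sizeof(struct nodetype));
    p->info = x;
    p->left = NULL;
    p->right = NULL;
    return p;
}

/* Attach a new left son with contents x, if the slot is free. */
void setleft(NODEPTR p, int x)
{
    if (p == NULL)
        printf("void insertion\n");
    else if (p->left != NULL)
        printf("invalid insertion\n");
    else
        p->left = maketree(x);
}

/* The mirror image of setleft. */
void setright(NODEPTR p, int x)
{
    if (p == NULL)
        printf("void insertion\n");
    else if (p->right != NULL)
        printf("invalid insertion\n");
    else
        p->right = maketree(x);
}
```

A call sequence such as maketree(20), setleft(root, 10), setright(root, 30) builds the three-node tree used in the traversal examples later in this unit.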
INTERNAL AND EXTERNAL NODES
By definition, leaf nodes have no sons. Thus, in the linked representation of binary trees,
left and right pointers are needed only in non-leaf nodes. Sometimes two separate sets of nodes
are used for non-leaves and leaves. Non-leaf nodes contain info, left and right fields and are
allocated as dynamic records or as an array of records managed using an available list. Leaf
nodes do not contain left or right fields and are kept as a single info array that is allocated
sequentially as needed.
Alternatively, they can be allocated as dynamic variables containing only an info value.
Each node can also contain a father field, if necessary. When this distinction is made between
non-leaf and leaf nodes, non-leaves are called internal nodes and leaves are called external
nodes.
IMPLICIT ARRAY REPRESENTATION OF BINARY TREES
In general, the n nodes of an almost complete binary tree can be numbered from 1 to n,
so that the number assigned to a left son is twice the number assigned to its father, and the
number assigned to a right son is 1 more than twice the number assigned to its father.
We can extend this implicit array representation of almost complete binary trees to an
implicit array representation of binary trees generally. This can be done by identifying an almost
complete binary tree that contains the binary tree being represented.
The Fig (a) illustrates two binary trees, and Fig (b) illustrates the smallest almost
complete binary trees that contain them. Finally Fig(c) illustrates the array representations of
these almost complete binary trees, and by extension, of the original binary trees.
The implicit array representation is also called the sequential representation, because it
allows a tree to be implemented in a contiguous block of memory rather than via pointers
connecting widely separated nodes.
Under the sequential representation, an array element is allocated whether or not it
serves to contain a node of the tree. Unused array elements must therefore be flagged as
non-existent, or null, tree nodes.
Fig (a): Two binary trees
Fig (b): Almost complete extensions

Fig (c): Array representations of the two almost complete binary trees
Example
The program below finds duplicate numbers in an input list; it includes the routines
maketree and setleft, using the sequential representation of binary trees.
#define NUMNODES 500
struct nodetype {
    int info;
    int used;
} node[NUMNODES];

main()
{
    int p, q, number;

    scanf("%d", &number);
    maketree(number);
    while (scanf("%d", &number) != EOF) {
        p = q = 0;
        while (q < NUMNODES && node[q].used && number != node[p].info) {
            p = q;
            if (number < node[p].info)
                q = 2*p + 1;
            else
                q = 2*p + 2;
        }
        if (number == node[p].info)
            printf("%d is a duplicate\n", number);
        else if (number < node[p].info)
            setleft(p, number);
        else
            setright(p, number);
    }
}

maketree(int x)
{
    int p;

    node[0].info = x;
    node[0].used = TRUE;
    for (p = 1; p < NUMNODES; p++)
        node[p].used = FALSE;
}

setleft(int p, int x)
{
    int q;

    q = 2*p + 1;
    if (q >= NUMNODES)
        error("array overflow");
    else if (node[q].used)
        error("invalid insertion");
    else {
        node[q].info = x;
        node[q].used = TRUE;
    }
}
The routine for setright is similar. Note that the routine maketree initializes the
fields info and used to represent a tree with a single node.
IV. BINARY TREE TRAVERSALS
Traversing means visiting each node only once. Tree traversal is a method for visiting all
the nodes in the tree exactly once. There are three types of tree traversal techniques, namely
Inorder Traversal
Preorder Traversal
Postorder Traversal
Inorder Traversal
The Inorder traversal of a binary tree is performed as
Traverse the left subtree in inorder
Visit the root
Traverse the right subtree in inorder
Example

    20
   /  \
  10   30

Fig: Inorder 10, 20, 30
Fig: Inorder A B C D E G H I J K
Recursive routine for Inorder Traversal

void Inorder(Tree T)
{
    if (T != NULL) {
        Inorder(T->left);
        printElement(T->Element);
        Inorder(T->right);
    }
}
Preorder Traversal
The preorder traversal of a binary tree is performed as
Visit the root
Traverse the left subtree in preorder
Traverse the right subtree in preorder
Example

    20
   /  \
  10   30

Fig: Preorder 20, 10, 30

Fig: Preorder D C A B I G E H K J
Recursive routine for Preorder Traversal

void Preorder(Tree T)
{
    if (T != NULL) {
        printElement(T->Element);
        Preorder(T->left);
        Preorder(T->right);
    }
}
Postorder Traversal
The postorder traversal of a binary tree is performed as
Traverse the left subtree in postorder
Traverse the right subtree in postorder
Visit the root
Example

    20
   /  \
  10   30

Fig: Postorder 10, 30, 20
Fig: Postorder B A C E H G J K I D
Recursive routine for Postorder Traversal

void Postorder(Tree T)
{
    if (T != NULL) {
        Postorder(T->left);
        Postorder(T->right);
        printElement(T->Element);
    }
}
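The three traversals can be checked against the small 20/10/30 tree from the figures. In this sketch the visited keys are collected into an array instead of being printed, so the three orders can be compared directly (the node and helper names are illustrative, not from the text):

```c
#include <stddef.h>

struct tnode { int info; struct tnode *left, *right; };

/* Each routine appends the visited keys to out[] and updates the count *n. */
static void inorder(struct tnode *t, int out[], int *n)
{
    if (t != NULL) {
        inorder(t->left, out, n);     /* left subtree first  */
        out[(*n)++] = t->info;        /* then the root       */
        inorder(t->right, out, n);    /* then right subtree  */
    }
}

static void preorder(struct tnode *t, int out[], int *n)
{
    if (t != NULL) {
        out[(*n)++] = t->info;        /* root first          */
        preorder(t->left, out, n);
        preorder(t->right, out, n);
    }
}

static void postorder(struct tnode *t, int out[], int *n)
{
    if (t != NULL) {
        postorder(t->left, out, n);
        postorder(t->right, out, n);
        out[(*n)++] = t->info;        /* root last           */
    }
}
```

On the tree with root 20 and sons 10 and 30, these yield 10 20 30, 20 10 30 and 10 30 20 respectively, matching the figures.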
V. HUFFMAN ALGORITHM
The inputs to the algorithm are n, the number of symbols in the original alphabet, and
frequency, an array of size at least n such that frequency[i] is the relative frequency of the ith
symbol.
The algorithm assigns values to an array code of size at least n, so that code[i] contains
the code assigned to the ith symbol.
The algorithm also constructs an array position of size at least n such that position[i]
points to the node representing the ith symbol.
This array is necessary to identify the point in the tree from which to start in
constructing the code for a particular symbol of the alphabet. Once the tree has been
constructed, the isleft operation introduced earlier can be used to determine whether 0 or 1
should be placed at the front of the code as we climb the tree.
The info portion of a tree node contains the frequency of occurrence of the symbol
represented by that node.
A set rootnodes is used to keep pointers to the roots of partial binary trees that are not yet
left or right subtrees.
Since this set is modified by removing elements with minimum frequency, combining
them and then reinserting the combined element into the set, it is implemented as an ascending
priority queue of pointers, ordered by the value of the info field of the pointers' target nodes.
We use the operations pqinsert, to insert a pointer into the priority queue, and
pqmindelete, to remove the pointer to the node with the smallest info value from the priority
queue.
We may outline Huffman's algorithm as follows:
/* initialize the set of root nodes */
rootnodes = the empty ascending priority queue;
/* construct a node for each symbol */
Fig: Huffman trees
The Huffman tree is strictly binary. Thus, if there are n symbols in the alphabet, the Huffman
tree can be represented by an array of nodes of size 2n-1.
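Since the outline of the algorithm is abbreviated above, the following is a hedged sketch of the whole construction in C. It keeps the 2n-1 nodes in parallel arrays (frequency, father, isleft), as just described, but replaces the pqinsert/pqmindelete priority queue with a simple linear scan for the two minimum-frequency roots. The function name huffman_codes and the convention that a left son contributes a 0 bit are assumptions for this sketch:

```c
#define MAXSYMB 16
#define MAXNODES (2*MAXSYMB - 1)

/* huffman_codes: illustrative helper; fills code[i] with the bit string
   assigned to symbol i. Nodes 0..n-1 are the symbol leaves; combined
   nodes are appended at positions n..2n-2. */
void huffman_codes(int n, const int freq[], char code[][MAXNODES])
{
    int f[MAXNODES];       /* frequency (info field) of each node  */
    int father[MAXNODES];  /* father index, -1 for a root          */
    int isleft[MAXNODES];  /* 1 if the node is a left son          */
    int used[MAXNODES];    /* root already combined into the tree? */
    int i, k, p1, p2;

    for (i = 0; i < n; i++) { f[i] = freq[i]; father[i] = -1; used[i] = 0; }

    /* n-1 combining steps: pick the two cheapest remaining roots.     */
    /* (The text uses a priority queue; a scan keeps the sketch short.) */
    for (k = n; k < 2*n - 1; k++) {
        p1 = p2 = -1;
        for (i = 0; i < k; i++) {
            if (used[i]) continue;
            if (p1 < 0 || f[i] < f[p1]) { p2 = p1; p1 = i; }
            else if (p2 < 0 || f[i] < f[p2]) p2 = i;
        }
        f[k] = f[p1] + f[p2];
        father[k] = -1; used[k] = 0;
        father[p1] = k; isleft[p1] = 1; used[p1] = 1;
        father[p2] = k; isleft[p2] = 0; used[p2] = 1;
    }

    /* Climb from each symbol's leaf to the root, one bit per edge. */
    for (i = 0; i < n; i++) {
        char buf[MAXNODES];
        int len = 0, j;
        for (j = i; father[j] >= 0; j = father[j])
            buf[len++] = isleft[j] ? '0' : '1';
        for (j = 0; j < len; j++)      /* reverse into the output */
            code[i][j] = buf[len - 1 - j];
        code[i][len] = '\0';
    }
}
```

With frequencies 1, 1, 2 for three symbols, the two rare symbols get 2-bit codes and the common one a 1-bit code, as the greedy combining rule predicts.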
REPRESENTING LISTS AS BINARY TREES
In this section we introduce a tree representation of a linear list in which the
operations of finding the kth element of a list and deleting a specific element are
relatively efficient.
It is also possible to build a list with given elements using this representation. We also
briefly consider the operation of inserting a single new element.
A list may be represented by a binary tree as illustrated in the figure. Fig (a) shows a
list in the usual linked format, while Fig (b) and Fig (c) show two binary tree
representations of the list.
Elements of the original list are represented by leaves of the tree (shown as
squares in the figure), whereas nonleaf nodes of the tree (shown as circles in the
figure) are present as part of the internal tree structure.
Associated with each leaf node are the contents of the corresponding list
element. Associated with each nonleaf node is a count representing the number
of leaves in the node's left subtree.
The elements of the list in their original sequence are assigned to the leaves of the
tree in the inorder sequence of the leaves. Note from the figure that several binary trees can
represent the same list.
Fig: A list and two corresponding Binary Trees
Finding the kth Element
To justify using so many extra tree nodes to represent a list, we present an
algorithm to find the kth element of a list represented by a tree.
Let tree point to the root of the tree, and let lcount(p) represent the count
associated with the nonleaf node pointed to by p [lcount(p) is the number of leaves
in the tree rooted at node(left(p))].
The following algorithm sets the variable find to point to the leaf containing the
kth element of the list.
o The algorithm maintains a variable r containing the number of list elements
remaining to be counted.
o At the beginning of the algorithm r is initialized to k. At each nonleaf
node(p), the algorithm determines from the values of r and lcount(p)
whether the kth element is located in the left or right subtree.
o If the desired leaf is in the left subtree, the algorithm proceeds directly to that
subtree. If the desired leaf is in the right subtree, the algorithm proceeds to
that subtree after reducing the value of r by the value of lcount(p).
o k is assumed to be less than or equal to the number of elements in the list.
r = k;
p = tree;
while (p is not a leaf node)
    if (r <= lcount(p))
        p = left(p);
    else {
        r -= lcount(p);
        p = right(p);
    }
find = p;
Fig(a) illustrates finding the fifth element of a list in the tree of Fig(b), and Fig(b)
illustrates finding the eighth element in the tree of Fig(c).
The dashed line represents the path taken by the algorithm down the tree to the
appropriate leaf. We indicate the value of r (the remaining number of elements to
be counted) next to each node encountered by the algorithm.
The number of tree nodes examined in finding the kth list element is less than or equal to
1 more than the depth of the tree (the longest path in the tree from the root to a leaf). Thus four
nodes are examined in Fig (a) in finding the fifth element of the list, and also in Fig(b) in finding
the eighth element. If a list is represented as a linked structure, four nodes are accessed in finding
the fifth element of the list [that is, the operation p = next(p) is performed four times] and seven
nodes are accessed in finding the eighth element.
Although this is not a very impressive saving, consider a list with 1000 elements. A
binary tree of depth 10 is sufficient to represent such a list, since log2 1000 is less than 10.
Thus, finding the kth element using such a binary tree would require examining no more
than 11 nodes. Since the number of leaves of a binary tree grows as 2^d, where d is the depth
of the tree, such a tree represents a relatively efficient data structure for finding the kth element
of a list.
If an almost complete tree is used, the kth element of an n-element list can be found in at
most log2 n + 1 node accesses, whereas k accesses would be required if a linear linked list were
used.
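The algorithm translates almost directly into C. A sketch, assuming (as in the description) that every nonleaf node has exactly two sons and carries the count of leaves in its left subtree; the type and field names are illustrative:

```c
#include <stddef.h>

struct lnode {
    int info;                      /* element value (meaningful in leaves) */
    int lcount;                    /* leaves in the left subtree           */
    struct lnode *left, *right;    /* both NULL for a leaf                 */
};

/* Return the leaf holding the kth list element, 1 <= k <= list size. */
struct lnode *find_kth(struct lnode *tree, int k)
{
    int r = k;                     /* elements remaining to be counted */
    struct lnode *p = tree;

    while (p->left != NULL) {      /* stop when p is a leaf */
        if (r <= p->lcount)
            p = p->left;
        else {
            r -= p->lcount;        /* skip over the left subtree's leaves */
            p = p->right;
        }
    }
    return p;
}
```

For the list 5, 8, 9 stored with an internal root (lcount 1) whose right son is another internal node (lcount 1), find_kth walks at most the depth of the tree rather than k links.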
Fig: Finding the nth element of a tree-represented list
Deleting an Element
Deletion involves only resetting a left or right pointer in the father of the deleted leaf dl to
null. The figure illustrates the results of this algorithm for a tree in which the nodes C, D and B
are deleted in that order. Make sure that you follow the actions of the algorithm on these
examples. Note that the algorithm maintains a 0 count in leaf nodes for consistency, although
the count is not required for such nodes. Note also that the algorithm never moves up a nonleaf
node even if this could be done. We could easily modify the algorithm to do this, but have not
done so for reasons that will become apparent shortly.
This deletion algorithm involves inspection of up to two nodes at each level. Thus,
deleting the kth element of a list represented by a tree requires a number of node accesses
approximately equal to three times the tree depth, whereas deleting the kth element of a linked
list requires approximately k node accesses. For large lists, therefore, the tree representation is
more efficient.
Fig: Deletion Algorithm
TREE SEARCHING
There are several ways of organizing files as trees and some associated searching
algorithms.
Previously, we presented a method of using a binary tree to store a file in order to make
sorting the file more efficient. In that method, all the left descendants of a node with key key
have keys that are less than key, and all right descendants have keys that are greater than or
equal to key.
An inorder traversal of such a binary tree yields the file in ascending key order.
Such a tree may also be used as a binary search tree. Using binary tree notation, the
algorithm for searching for the key key in such a tree is as follows:

p = tree;
while (p != null && key != k(p))
    p = (key < k(p)) ? left(p) : right(p);
return(p);
The efficiency of the search process can be improved by using a sentinel, as in sequential
searching.
A sentinel node, with a separate external pointer pointing to it, remains allocated with
the tree. All left or right tree pointers that do not point to another tree node now point to this
sentinel node instead of equalling null. When a search is performed, the argument key is first
inserted into the sentinel node, thus guaranteeing that it will be located in the tree.
A sorted array can be produced from a binary search tree by traversing the tree in inorder
and inserting each element sequentially into the array as it is visited. On the other hand, there
are many binary search trees that correspond to a given sorted array. Viewing the middle
element of the array as the root of a tree and viewing the remaining elements recursively as left
and right subtrees produces a relatively balanced binary search tree, as in Fig (a). Viewing the
first element of the array as the root of a tree and each successive element as the right son of its
predecessor produces a very unbalanced binary tree, as in Fig (b).
The advantage of using a binary search tree over an array is that a tree enables search,
insertion and deletion operations to be performed efficiently. If an array is used, an insertion or
deletion requires that approximately half of the elements of the array be moved. (Why?)
Insertion or deletion in a search tree, on the other hand, requires that only a few pointers be
adjusted.
Fig (a) A sorted array and two of its binary tree representations
Fig(b) cont..
Inserting into a Binary search Tree
The following algorithm searches a binary search tree and inserts a new record into the
tree if the search is unsuccessful.
q = null;
p = tree;
while (p != null) {
    if (key == k(p))
        return(p);
    q = p;
    if (key < k(p))
        p = left(p);
    else
        p = right(p);
}
v = maketree(rec, key);
if (q == null)
    tree = v;
else if (key < k(q))
    left(q) = v;
else
    right(q) = v;
return(v);
Note that after a new record is inserted, the tree retains the property of being sorted in an inorder
traversal
Deleting from a Binary Search Tree
We now present an algorithm to delete a node with key key from a binary search tree.
There are three cases to consider. If the node to be deleted has no sons, it may be deleted
without further adjustment to the tree. This is illustrated in Fig (a).
If the node to be deleted has only one subtree, its only son can be moved up to take its
place. This is illustrated in Fig (b). If, however, the node p to be deleted has two subtrees, its
inorder successor s (or predecessor) must take its place. The inorder successor cannot have a
left subtree.
Thus the right son of s can be moved up to the place of s. This is illustrated in Fig (c),
where the node with key 12 replaces the node with key 11 and is replaced, in turn, by the node
with key 13. In the algorithm below, if no node with key key exists in the tree, the tree is left
unchanged.
Fig(a) Deleting node with key 15
Fig(b) Deleting node with key 5
Fig(c) Deleting node with key 11
p = tree;
q = null;
while (p != null && k(p) != key) {
    q = p;
    p = (key < k(p)) ? left(p) : right(p);
}
if (p == null)
    return;
if (left(p) == null)
    rp = right(p);
else if (right(p) == null)
    rp = left(p);
else {
    f = p;
    rp = right(p);
    s = left(rp);
    while (s != null) {
        f = rp;
        rp = s;
        s = left(rp);
    }
    if (f != p) {
        left(f) = right(rp);
        right(rp) = right(p);
    }
    left(rp) = left(p);
}
if (q == null)
    tree = rp;
else
    (p == left(q)) ? (left(q) = rp) : (right(q) = rp);
freenode(p);
return;
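For comparison with the pointer-rewriting routine above, here is a self-contained recursive sketch of search, insertion and deletion in C. For the two-son case it copies the inorder successor's key into the node and then deletes the successor, a different but equivalent strategy to the splicing shown above; malloc and free stand in for maketree and freenode:

```c
#include <stdlib.h>
#include <stddef.h>

struct bnode { int key; struct bnode *left, *right; };

struct bnode *bst_insert(struct bnode *t, int key)
{
    if (t == NULL) {
        struct bnode *v = malloc(sizeof *v);
        v->key = key;
        v->left = v->right = NULL;
        return v;
    }
    if (key < t->key)
        t->left = bst_insert(t->left, key);
    else if (key > t->key)
        t->right = bst_insert(t->right, key);
    return t;                      /* duplicate keys are ignored */
}

struct bnode *bst_delete(struct bnode *t, int key)
{
    if (t == NULL)
        return NULL;
    if (key < t->key)
        t->left = bst_delete(t->left, key);
    else if (key > t->key)
        t->right = bst_delete(t->right, key);
    else if (t->left == NULL || t->right == NULL) {
        /* zero or one son: the son (possibly NULL) moves up */
        struct bnode *rp = (t->left != NULL) ? t->left : t->right;
        free(t);
        return rp;
    } else {
        /* two sons: copy the inorder successor's key, then delete it */
        struct bnode *s = t->right;
        while (s->left != NULL)
            s = s->left;
        t->key = s->key;
        t->right = bst_delete(t->right, s->key);
    }
    return t;
}

struct bnode *bst_find(struct bnode *t, int key)
{
    while (t != NULL && key != t->key)
        t = (key < t->key) ? t->left : t->right;
    return t;
}
```

Note that after every insertion and deletion the tree still yields its keys in ascending order under an inorder traversal, which is the invariant the text emphasizes.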
VI. SORTING AND SEARCHING TECHNIQUES
Sorting is the operation of arranging the records of a table according to the key value of
each record. A table or a file is an ordered sequence of records r[1], r[2], ... r[n], each containing
a key k[1], k[2], ... k[n]. The table is sorted based on the key.
A sorting algorithm is said to be stable if it preserves the relative order of records with
equal keys. Sorting methods fall into two classes:
Internal Sorting
External Sorting
Internal Sort:
All records to be sorted are kept internally in the main memory.
External Sort:
If there are a large number of records to be sorted, they must be kept in external files
on auxiliary storage.
INTERNAL SORTING
A) INSERTION SORT
Insertion sort works by taking elements from the list one by one and inserting each into
its correct position in the sorted part of the list.
Insertion sort consists of N-1 passes, where N is the number of elements to be sorted. The
ith pass of insertion sort inserts the ith element A[i] into its right place among A[1], A[2], ...
A[i-1].
After doing this insertion the records occupying A[1]..A[i] are in sorted order.
Procedure
void insertion_sort(int a[], int n)
{
    int i, j, temp;

    for (i = 1; i < n; i++) {
        temp = a[i];
        for (j = i; j > 0 && a[j-1] > temp; j--)
            a[j] = a[j-1];
        a[j] = temp;
    }
}
Example

Consider an unsorted array

20 10 60 40 30 15

Passes of Insertion sort

ORIGINAL      20 10 60 40 30 15   POSITIONS MOVED
After i=1     10 20 60 40 30 15   1
After i=2     10 20 60 40 30 15   0
After i=3     10 20 40 60 30 15   1
After i=4     10 20 30 40 60 15   2
After i=5     10 15 20 30 40 60   4
Sorted Array  10 15 20 30 40 60

Analysis

Worst Case Analysis: O(N^2)
Best Case Analysis: O(N)
Average Case Analysis: O(N^2)
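The procedure can be exercised on the sample array from the example. A minimal, self-contained sketch of the same routine:

```c
void insertion_sort(int a[], int n)
{
    int i, j, temp;

    for (i = 1; i < n; i++) {
        temp = a[i];                  /* key to insert on this pass */
        for (j = i; j > 0 && a[j-1] > temp; j--)
            a[j] = a[j-1];            /* shift larger keys right    */
        a[j] = temp;                  /* drop the key into its slot */
    }
}
```

Running it on 20 10 60 40 30 15 produces 10 15 20 30 40 60, matching the final row of the table above.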
B) SHELL SORT
Shell sort was invented by Donald Shell. It improves upon bubble sort and insertion sort
by moving out-of-order elements more than one position at a time. It works by arranging the
data sequence in a two-dimensional array and then sorting the columns of the array using
insertion sort.
In shell sort the whole array is first fragmented into K segments, where K is preferably a
prime number. After the first pass the whole array is partially sorted. In the next pass, the value
of K is reduced, which increases the size of each segment and reduces the number of segments.
The next value of K is chosen so that it is relatively prime to its previous value. The
process is repeated until K=1, at which point the array is sorted. The insertion sort is applied to
each segment, so each successive segment is partially sorted.
The shell sort is also called the Diminishing Increment Sort, because the value of K
decreases continuously.
Procedure
void shellsort(int a[], int n)
{
    int i, j, k, temp;

    for (k = n/2; k > 0; k /= 2)        /* diminishing increments */
        for (i = k; i < n; i++) {
            temp = a[i];
            for (j = i; j >= k && a[j-k] > temp; j -= k)
                a[j] = a[j-k];
            a[j] = temp;
        }
}
Example
Consider an unsorted array
81 94 11 96 12 35 17 95 28 58
Here N = 10; in the first pass, K = 5 (10/2).
81 94 11 96 12 35 17 95 28 58
After first pass
35 17 11 28 12 81 94 95 96 58
In second pass, K is reduced to 3
After second pass
28 12 11 35 17 81 58 95 96 94
In third pass, K is reduced to 1
The final sorted array is
11 12 17 28 35 58 81 94 95 96
Analysis
Worst Case Analysis: O(N^2)
Best Case Analysis: O(N log N)
Average Case Analysis: O(N^1.5)
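A runnable sketch of the routine, with the gap sequence passed in explicitly so that the 5, 3, 1 increments of the worked example can be reproduced (the parameterization is an assumption for this demo, not part of the original procedure):

```c
/* Shell sort over an explicit diminishing gap sequence. */
void shellsort(int a[], int n, const int gaps[], int ngaps)
{
    int g, i, j, k, temp;

    for (g = 0; g < ngaps; g++) {
        k = gaps[g];                   /* current increment           */
        for (i = k; i < n; i++) {      /* insertion sort per segment  */
            temp = a[i];
            for (j = i; j >= k && a[j-k] > temp; j -= k)
                a[j] = a[j-k];
            a[j] = temp;
        }
    }
}
```

With gaps 5, 3, 1 on the array 81 94 11 96 12 35 17 95 28 58, the first pass yields 35 17 11 28 12 81 94 95 96 58, as in the example, and the final pass yields the fully sorted array.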
C) QUICK SORT
The basic version of the quick sort algorithm was invented by C. A. R. Hoare in 1960 and
formally introduced in 1962.
It is based on the principle of divide-and-conquer. Quick sort is an algorithm of choice in
many situations because it is not difficult to implement, it is a good "general purpose" sort and it
consumes relatively few resources during execution.
Good points
It is in-place since it uses only a small auxiliary stack.
It requires only n log(n) time to sort n items.
It has an extremely short inner loop
This algorithm has been subjected to a thorough mathematical analysis, so a very precise
statement can be made about performance issues.
Bad Points
It is recursive. Especially if recursion is not available, the implementation is extremely
complicated.
It requires quadratic (i.e., n^2) time in the worst case.
It is fragile i.e., a simple mistake in the implementation can go unnoticed and cause it to
perform badly.
Quick sort works by partitioning a given array A[p . . r] into two non-empty sub arrays A[p . .
q] and A[q+1 . . r] such that every key in A[p . . q] is less than or equal to every key in A[q+1 . .
r]. The two sub arrays are then sorted by recursive calls to Quick sort. The exact position of the
partition depends on the given array, and the index q is computed as a part of the partitioning
procedure.
QuickSort
1. If p < r then
2. q= Partition (A, p, r)
3. Recursive call to Quick Sort (A, p, q-1)
4. Recursive call to Quick Sort (A, q + 1, r)
Note that to sort the entire array, the initial call is Quick Sort (A, 1, length[A]).
As a first step, Quick Sort chooses as pivot one of the items in the array to be sorted.
The array is then partitioned on either side of the pivot. Elements that are less than or equal to
the pivot move toward the left, and elements that are greater than or equal to the pivot move
toward the right.
Partitioning the Array
Partitioning procedure rearranges the sub arrays in-place.
1. PARTITION (A, p, r)
2.   x ← A[p]
3.   i ← p-1
4.   j ← r+1
5.   while i < j do
6.     repeat j ← j-1
7.     until A[j] ≤ x
8.     repeat i ← i+1
9.     until A[i] ≥ x
10.    exchange A[i] ↔ A[j]
11.  exchange A[p] ↔ A[j]
12.  return j
Partition selects the first key, A[p], as the pivot key about which the array will be partitioned:
Keys ≤ A[p] are moved towards the left.
Keys ≥ A[p] are moved towards the right.
The running time of the partition procedure is θ(n), where n = r - p + 1 is the
number of keys in the array. Another argument that the running time of PARTITION on a subarray
of size n is θ(n) is as follows:
Pointer i and pointer j start at each end and move towards each other, converging
somewhere in the middle. The total number of times that i can be incremented and j can be
decremented is therefore O(n).
Associated with each increment or decrement there are O(1) comparisons and swaps.
Hence, the total time is O(n).
Array of Same Elements
Since all the elements are equal, the "less than or equal" tests in lines 6 and 8 of
PARTITION (A, p, r) will always be true.
Intuitively, the first repeat loop moves j to the left and the second repeat loop moves i to
the right; when all elements are equal, each repeat loop moves i and j towards the middle by
one position. They meet in the middle, so PARTITION returns q = floor((p + r)/2) whenever all
elements in the array A[p . . r] have the same value.
Performance of Quick Sort
The running time of quick sort depends on whether partition is balanced or unbalanced,
which in turn depends on which elements of an array to be sorted are used for partitioning.
A very good partition splits an array into two equal-sized arrays. A bad partition, on the
other hand, splits an array into two arrays of very different sizes.
The worst partition puts only one element in one array and all other elements in the
other array. If the partitioning is balanced, Quick sort runs asymptotically as fast as merge
sort.
On the other hand, if partitioning is unbalanced, the Quick sort runs asymptotically as
slow as insertion sort.
Best Case
The best thing that could happen in Quick sort would be for each partitioning stage to
divide the array exactly in half. In other words, the best case occurs when the pivot is the
median of the keys in A[p . . r] every time the procedure 'Partition' is called.
The procedure 'Partition' then always splits the array to be sorted into two equal-sized arrays.
If the procedure 'Partition' produces two regions of size n/2, the recurrence relation is
T(n) = T(n/2) + T(n/2) + θ(n)
     = 2T(n/2) + θ(n)
And from case 2 of the Master theorem
T(n) = θ(n lg n)
Worst case Partitioning
The worst case occurs if the given array A[1 . . n] is already sorted. The PARTITION (A, p,
r) call then always returns p, so successive calls to partition split arrays of length n, n-1, n-2, . . . , 2,
with running time proportional to n + (n-1) + (n-2) + . . . + 2 = [(n+2)(n-1)]/2 = θ(n^2). The worst
case also occurs if A[1 . . n] starts out in reverse order.
We conclude that Quick sort's average running time is θ(n lg n).
Quick sort is an in-place sorting algorithm whose worst-case running time is θ(n^2) and expected
running time is θ(n lg n).
D) HEAP SORT
The binary heap data structure is an array that can be viewed as a complete binary tree.
Each node of the binary tree corresponds to an element of the array. The array is completely
filled on all levels except possibly the lowest.
We represent heaps in level order, going from left to right. The array corresponding to the
heap above is [25, 13, 17, 5, 8, 3].
The root of the tree is A[1], and given the index i of a node, the indices of its parent, left child
and right child can be computed.
PARENT (i)
return floor(i/2)
LEFT (i)
return 2i
RIGHT (i)
return 2i + 1
Let's try these out on a heap to make sure we believe they are correct. Take this heap,
which is represented by the array [20, 14, 17, 8, 6, 9, 4, 1].
We'll go from the 20 to the 6 first. The index of the 20 is 1. To find the index of the left
child, we calculate 1 * 2 = 2. This takes us (correctly) to the 14. Now, we go right, so we
calculate 2 * 2 + 1 = 5. This takes us (again, correctly) to the 6.
Now let's try going from the 4 to the 20. 4's index is 7. We want to go to the parent, so we
calculate 7 / 2 = 3, which takes us to the 17. Now, to get 17's parent, we calculate 3 / 2 = 1,
which takes us to the 20.
Heap Property
In a heap, for every node i other than the root, the value of the node is at most the value
of its parent:
A[PARENT (i)] ≥ A[i]
Thus, the largest element in a heap is stored at the root. Following is an example of Heap:
By the definition of a heap, all the tree levels are completely filled except possibly for the
lowest level, which is filled from the left up to a point.
Clearly a heap of height h has the minimum number of elements when it has just one
node at the lowest level. The levels above the lowest level form a complete binary tree of height
h - 1 with 2^h - 1 nodes. Hence the minimum number of nodes possible in a heap of height h is 2^h.
Clearly a heap of height h has the maximum number of elements when its lowest level is
completely filled. In this case the heap is a complete binary tree of height h and hence has
2^(h+1) - 1 nodes.
The following is not a heap, because although it has the heap property, it is not a complete
binary tree.
Recall that to be complete, a binary tree has to fill up all of its levels with the possible
exception of the last one, which must be filled in from the left side.
Height of a node
We define the height of a node in a tree to be the number of edges on the longest simple
downward path from the node to a leaf.
Height of a tree
The number of edges on a simple downward path from the root to a leaf. Note that the
height of a tree with n nodes is floor(lg n), which is θ(log n). This implies that an n-element heap
has height floor(lg n).
In order to show this, let the height of the n-element heap be h. From the bounds obtained
on the maximum and minimum number of elements in a heap, we get
2^h ≤ n ≤ 2^(h+1) - 1
where n is the number of elements in the heap.
Taking logarithms to the base 2,
h ≤ lg n < h + 1
It follows that h = floor(lg n).
We know from the above that the largest element resides in the root, A[1]. The natural
question to ask is: where in a heap might the smallest element reside?
Consider any path from root of the tree to a leaf. Because of the heap property, as we
follow that path, the elements are either decreasing or staying the same.
If it happens to be the case that all elements in the heap are distinct, then the above
implies that the smallest is in a leaf of the tree.
It could also be that an entire subtree of the heap is the smallest element, or indeed that
there is only one element in the heap, which is then the smallest element, so the smallest element
is everywhere.
Note that anything below the smallest element must equal the smallest element, so in
general, only entire subtrees of the heap can contain the smallest element.
Inserting Element in the Heap
Suppose we have a heap as follows
Let's suppose we want to add a node with key 15 to the heap. First, we add the node to
the tree at the next spot available at the lowest level of the tree. This is to ensure that the tree
remains complete.
Now we do the same thing again, comparing the new node to its parent. Since 14 < 15,
we have to do another swap
Now we are done, because 15 < 20.
Four basic procedures on heap are
1. Heapify, which runs in O(lg n) time.
2. Build-Heap, which runs in linear time.
3. Heap Sort, which runs in O(n lg n) time.
4. Extract-Max, which runs in O(lg n) time.
Maintaining the Heap Property
Heapify is a procedure for manipulating heap data structures. It is given an array A and
an index i into the array.
The subtrees rooted at the children of A[i] are heaps, but node A[i] itself may possibly
violate the heap property, i.e., A[i] < A[2i] or A[i] < A[2i+1].
The procedure 'Heapify' manipulates the tree rooted at A[i] so that it becomes a heap. In other
words, 'Heapify' lets the value at A[i] "float down" in the heap so that the subtree rooted at index i
becomes a heap.
Outline of Procedure Heapify
Heapify picks the largest child key and compares it to the parent key. If the parent key is
larger, Heapify quits; otherwise it swaps the parent key with the largest child key, so that the
parent now becomes larger than its children.
It is important to note that the swap may destroy the heap property of the subtree rooted at
the largest child node. If this is the case, Heapify calls itself again using the largest child node as
the new root.
Heapify (A, i)
1. l ← left [i]
2. r ← right [i]
3. if l ≤ heap-size [A] and A[l] > A[i]
4. then largest ← l
5. else largest ← i
6. if r ≤ heap-size [A] and A[r] > A[largest]
7. then largest ← r
8. if largest ≠ i
9. then exchange A[i] ↔ A[largest]
10. Heapify (A, largest)
Analysis
If we put a value at the root that is less than every value in the left and right subtrees, then
'Heapify' will be called recursively until a leaf is reached.
To make the recursive calls traverse the longest path to a leaf, choose values that make
'Heapify' always recurse on the left child.
It follows the left branch when the left child is greater than or equal to the right child, so
putting 0 at the root and 1 at all other nodes, for example, will accomplish this task.
With such values 'Heapify' will be called h times, where h is the heap height, so its running
time will be θ(h) (since each call does θ(1) work), which is θ(lg n). Since we have a case in
which Heapify's running time is θ(lg n), its worst-case running time is Ω(lg n).
Example of Heapify
Suppose we have a complete binary tree somewhere whose subtrees are heaps. In the
following complete binary tree, the subtrees of 6 are heaps:
The Heapify procedure alters the heap so that the tree rooted at 6's position is a heap.
Here's how it works. First, we look at the root of our tree and its two children.
We then determine which of the three nodes is the greatest. If it is the root, we are done,
because we have a heap. If not, we exchange the appropriate child with the root, and continue
recursively down the tree. In this case, we exchange 6 and 8, and continue.
Now, 7 is greater than 6, so we exchange them.
We are at the bottom of the tree, and can't continue, so we terminate.
Building a Heap
We can use the procedure 'Heapify' in a bottom-up fashion to convert an array A[1 . . n]
into a heap. Since the elements in the subarray A[floor(n/2)+1 . . n] are all leaves, the procedure
BUILD_HEAP goes through the remaining nodes of the tree and runs 'Heapify' on each one. The
bottom-up order of processing nodes guarantees that the subtrees rooted at the children are heaps
before 'Heapify' is run at their parent.
BUILD_HEAP (A)
1. heap-size [A] ← length [A]
2. for i ← floor(length[A]/2) downto 1 do
3.     Heapify (A, i)
We can build a heap from an unordered array in linear time
Heap Sort Algorithm
The heap sort combines the best of both merge sort and insertion sort. Like merge sort,
the worst-case time of heap sort is O(n log n), and like insertion sort, heap sort sorts in place. The
heap sort algorithm starts by using procedure BUILD_HEAP to build a heap on the input array
A[1 . . n]. Since the maximum element of the array is stored at the root A[1], it can be put into its
correct final position by exchanging it with A[n] (the last element in A). If we now discard node
n from the heap, then the remaining elements can be made into a heap. Note that the new element
at the root may violate the heap property; all that is needed to restore it is a call to Heapify.
HEAPSORT (A)
1. BUILD_HEAP (A)
2. for i ← length (A) downto 2 do
3.     exchange A[1] ↔ A[i]
4.     heap-size [A] ← heap-size [A] - 1
5.     Heapify (A, 1)
The HEAPSORT procedure takes time O(n lg n), since the call to BUILD_HEAP takes time
O(n) and each of the n -1 calls to Heapify takes time O(lg n).
Now we show that there are at most ceil(n/2^(h+1)) nodes of height h in any n-element heap.
We need two observations to show this. The first is that if we consider the set of nodes of height
h, the subtrees rooted at these nodes are disjoint.
In other words, we cannot have two nodes of height h with one being an ancestor of the
other. The second property is that all of these subtrees are complete binary trees except possibly
one.
Let Xh be the number of nodes of height h. Since Xh - 1 of these subtrees are full, they each
contain exactly 2^(h+1) - 1 nodes.
One of the height-h subtrees may not be full, but it contains at least 1 node at its lowest level
and hence has at least 2^h nodes. The exact count is 1 + 2 + 4 + . . . + 2^(h-1) + 1 = 2^h. The
remaining nodes have height strictly more than h.
To connect all the subtrees rooted at nodes of height h, there must be exactly Xh - 1 such
nodes. The total number of nodes is therefore at least
(Xh - 1)(2^(h+1) - 1) + 2^h + (Xh - 1)
which is at most n.
Simplifying gives
Xh ≤ n/2^(h+1) + 1/2.
In conclusion, it is a property of binary heaps that roughly half of the nodes are leaves:
the number of leaves in an n-node binary heap is ceil(n/2). If these leaves are removed, the
number of new leaves will be about n/4. If this process is continued for h levels, the number of
nodes at height h is at most ceil(n/2^(h+1)).
Implementation
void siftDown(int numbers[], int root, int bottom);

void heapSort(int numbers[], int array_size)
{
    int i, temp;

    /* build the heap: sift down every internal node */
    for (i = (array_size / 2) - 1; i >= 0; i--)
        siftDown(numbers, i, array_size - 1);

    /* repeatedly move the maximum to the end and restore the heap */
    for (i = array_size - 1; i >= 1; i--)
    {
        temp = numbers[0];
        numbers[0] = numbers[i];
        numbers[i] = temp;
        siftDown(numbers, 0, i - 1);
    }
}

void siftDown(int numbers[], int root, int bottom)
{
    int maxChild, temp;

    /* children of node root (0-based array) are 2*root+1 and 2*root+2 */
    while (root * 2 + 1 <= bottom)
    {
        maxChild = root * 2 + 1;
        if (maxChild + 1 <= bottom && numbers[maxChild + 1] > numbers[maxChild])
            maxChild = maxChild + 1;      /* pick the larger child */
        if (numbers[root] >= numbers[maxChild])
            return;                       /* heap property restored */
        temp = numbers[root];
        numbers[root] = numbers[maxChild];
        numbers[maxChild] = temp;
        root = maxChild;
    }
}
EXTERNAL SORTING
It is used for sorting methods that are employed when the data to be sorted is too large to
fit in primary memory.
Need for external sorting
During the sort, some of the data must be stored externally, such as on tape or disk
The cost of accessing data is significantly greater than either bookkeeping or comparison
costs
If tape is used as the external memory, then the items must be accessed sequentially
Steps to be followed
The basic external sorting algorithm uses the merge routine from merge sort
Divide the file into runs such that the size of a run is small enough to fit into main memory
Sort each run in main memory
Merge the resulting runs together into successively bigger runs
Repeat the steps until the file is sorted
MERGE SORT
Merge sort is based on the divide-and-conquer paradigm. The Merge-sort algorithm can be
described in general terms as consisting of the following three steps:
1. Divide Step
If the given array A has zero or one element, return A; it is already sorted. Otherwise, divide
A into two arrays, A1 and A2, each containing about half of the elements of A.
2. Recursion Step
Recursively sort arrays A1 and A2.
3. Conquer Step
Combine the elements back in A by merging the sorted arrays A1 and A2 into a sorted
sequence.
We can visualize Merge sort by means of a binary tree where each internal node of the tree
represents a recursive call and each external node represents an individual element of the given
array A. Such a tree is called a Merge-sort tree. The heart of the Merge-sort algorithm is the
conquer step, which merges two sorted sequences into a single sorted sequence.
To begin, suppose that we have two sorted arrays A1[1], A1[2], . . , A1[m] and A2[1],
A2[2], . . . , A2[n]. The following is a direct algorithm for the obvious strategy of successively
choosing the smallest remaining element from A1 and A2 and putting it in A.
MERGE (A1, A2, A)
i ← 1; j ← 1
A1[m+1] ← ∞; A2[n+1] ← ∞
for k ← 1 to m + n do
    if A1[i] ≤ A2[j]
        then A[k] ← A1[i]
            i ← i + 1
        else A[k] ← A2[j]
            j ← j + 1
Merge Sort Algorithm
MERGE_SORT (A)
A1[1 . . ceil(n/2)] ← A[1 . . ceil(n/2)]
A2[1 . . floor(n/2)] ← A[ceil(n/2) + 1 . . n]
Merge Sort (A1)
Merge Sort (A2)
Merge (A1, A2, A)
Analysis
Let T(n) be the time taken by this algorithm to sort an array of n elements. Dividing A into
subarrays A1 and A2 takes linear time, and it is easy to see that Merge (A1, A2, A) also takes
linear time. Consequently,
T(n) = T(ceil(n/2)) + T(floor(n/2)) + θ(n)
or, for simplicity,
T(n) = 2T(n/2) + θ(n)
The total running time of the Merge sort algorithm is O(n lg n), which is asymptotically optimal;
like Heap sort, Merge sort has a guaranteed n lg n running time. Merge sort requires θ(n) extra
space; Merge is not an in-place algorithm. The only known ways to merge in place (without any
extra space) are too complex to be reduced to practical programs.
Implementation
void m_sort(int numbers[], int temp[], int left, int right);
void merge(int numbers[], int temp[], int left, int mid, int right);

void mergeSort(int numbers[], int temp[], int array_size)
{
    m_sort(numbers, temp, 0, array_size - 1);
}

void m_sort(int numbers[], int temp[], int left, int right)
{
    int mid;

    if (right > left)
    {
        mid = (right + left) / 2;
        m_sort(numbers, temp, left, mid);
        m_sort(numbers, temp, mid + 1, right);
        merge(numbers, temp, left, mid + 1, right);
    }
}

void merge(int numbers[], int temp[], int left, int mid, int right)
{
    int i, left_end, num_elements, tmp_pos;

    left_end = mid - 1;
    tmp_pos = left;
    num_elements = right - left + 1;

    /* merge the two sorted halves into temp */
    while ((left <= left_end) && (mid <= right))
    {
        if (numbers[left] <= numbers[mid])
        {
            temp[tmp_pos] = numbers[left];
            tmp_pos = tmp_pos + 1;
            left = left + 1;
        }
        else
        {
            temp[tmp_pos] = numbers[mid];
            tmp_pos = tmp_pos + 1;
            mid = mid + 1;
        }
    }

    /* copy any remaining elements of either half */
    while (left <= left_end)
    {
        temp[tmp_pos] = numbers[left];
        left = left + 1;
        tmp_pos = tmp_pos + 1;
    }
    while (mid <= right)
    {
        temp[tmp_pos] = numbers[mid];
        mid = mid + 1;
        tmp_pos = tmp_pos + 1;
    }

    /* copy the merged run back into numbers */
    for (i = 0; i < num_elements; i++)
    {
        numbers[right] = temp[right];
        right = right - 1;
    }
}
SIMPLE ALGORITHM (2-way merge)
Let us consider four tapes Ta1, Ta2, Tb1, Tb2, which serve as two input and two output tapes.
The a and b tapes act as either input tapes or output tapes, depending on the point in the
algorithm.
Let the size of a run (M) be 3, to sort the following set of values.
Ta1: 44 80 12 35 45 58 75 60 24 48 92 98 85
Ta2: (empty)
Initial Run Construction
Step 1: Read M records at a time from the input tape Ta1
Step 2: Sort the records internally and write the resulting runs alternately to Tb1 and Tb2
The first 3 records from the input tape Ta1 are read and sorted internally as (12, 44, 80) and
placed on Tb1.
Then the next 3 records (35, 45, 58) are read and the sorted run is placed on Tb2.
Similarly the rest of the records are distributed alternately, so Tb1 and Tb2 each contain a
group of runs:
Tb1: 12 44 80 | 24 60 75 | 85
Tb2: 35 45 58 | 48 92 98
Number of runs = 5
First Pass
The first runs of Tb1 and Tb2 are merged and the sorted records placed on Ta1.
Similarly the second runs of Tb1 and Tb2 are merged and the sorted records placed on Ta2;
the leftover run (85) is appended to Ta1 as its second run:
Ta1: 12 35 44 45 58 80 | 85
Ta2: 24 48 60 75 92 98
Here the number of runs is reduced, but the size of each run is increased.
Second Pass
The first runs of Ta1 and Ta2 are merged and the sorted records placed on Tb1, and the
second run of Ta1 is copied to Tb2:
Tb1: 12 24 35 44 45 48 58 60 75 80 92 98
Tb2: 85
Third Pass
In the third pass, the runs from Tb1 and Tb2 are merged and the sorted records are placed on
Ta1, which now holds the final sorted output.
This algorithm requires ceil(log2(N/M)) merge passes, plus the initial run-constructing pass.
MULTIWAY MERGE
The number of passes required to sort an input can be reduced by increasing the number
of tapes. This is done by extending the 2-way merge to a k-way merge. The only difference is
that it is more complicated to find the smallest of the k elements, which can be overcome by
using priority queues.
For the same input,
44 80 12 35 45 58 75 60 24 48 92 98 85
let us consider 6 tapes Ta1, Ta2, Ta3, Tb1, Tb2, Tb3 and M = 3.
Initial run constructing pass
Runs of size M are sorted internally and distributed cyclically over Tb1, Tb2 and Tb3:
Tb1: 12 44 80 | 48 92 98
Tb2: 35 45 58 | 85
Tb3: 24 60 75
First Pass
The first runs of Tb1, Tb2 and Tb3 are merged and the sorted records are placed on Ta1.
Similarly, the second runs of Tb1 and Tb2 are merged and the sorted records are placed on Ta2:
Ta1: 12 24 35 44 45 58 60 75 80
Ta2: 48 85 92 98
Second Pass
The runs from Ta1 and Ta2 are merged and the sorted records are placed on Tb1, which now
contains the final sorted records:
Tb1: 12 24 35 44 45 48 58 60 75 80 85 92 98
For the same example, 2-way merge requires 4 passes to get the sorted elements, whereas in
multiway merge it is reduced to 3 passes, which also includes the initial run-constructing pass.
POLYPHASE MERGE
The k-way merge strategy requires 2k tapes to perform the sorting. In some applications it
is possible to get by with only k + 1 tapes.
Example: let us consider 3 tapes T1, T2, T3 and the input file on T1, which produces 8 runs.
The distribution of the runs between the tapes can vary:
1. Equal distribution (4 & 4 runs)
2. Unequal distribution (7 & 1 runs)
3. Fibonacci numbers (3 & 5 runs)
Equal Distribution
Put 4 runs on each tapes T2 & T3, after applying merge routine, the resultant tape T1 has
4 runs, whereas other tapes T2 & T3 are empty which leads to adding an halfpass for every pass.
In first pass, all the runs(4) are placed in one tapes, so it logically divided and placed half of the
runs(2) in any of the other tapes.
Unequal Distribution
For instance, if 7 runs are placed on T2 and 1 run on T3, then after the first merge T1 will
hold 1 run and T2 will hold 6 runs. As each pass merges only one pair of runs, the process is
slow, resulting in a larger number of passes.
Fibonacci Numbers
Equal distribution (4 & 4 runs):
Tapes | Run construction | After T2+T3 | After splitting | After T1+T2 | After split | After T2+T3
T1    | 0                | 4           | 2               | 0           | 0           | 1
T2    | 4                | 0           | 2               | 0           | 1           | 0
T3    | 4                | 0           | 0               | 2           | 1           | 0

Unequal distribution (7 & 1 runs):
Tapes | Run const | After T2+T3 | After T1+T2 | After T2+T3 | After T1+T2 | After T2+T3 | After T1+T2 | After T2+T3
T1    | 0         | 1           | 0           | 1           | 0           | 1           | 0           | 1
T2    | 7         | 6           | 5           | 4           | 3           | 2           | 1           | 0
T3    | 1         | 0           | 1           | 0           | 1           | 0           | 1           | 0
If the number of runs is a Fibonacci number F(N), then the runs are distributed as the two
Fibonacci numbers F(N-1) and F(N-2).
Here the number of runs is 8, a Fibonacci number, so it can be distributed as 3 runs on
tape T2 and 5 runs on T3.
This method of distributing runs gives the optimal result, i.e., fewer passes to sort the
records than the other two methods.
VI. HASHING
An array in which TableNodes are not stored consecutively; their place of storage is
calculated using the key and a hash function.
Hashed key: the result of applying a hash function to a key
Keys and entries are scattered throughout the array
insert: calculate place of storage, insert TableNode; O(1)
find: calculate place of storage, retrieve entry; O(1)
remove: calculate place of storage, set it to null; O(1)
All are O(1)!
Fibonacci distribution (3 & 5 runs):
Tapes | Run const | After T2+T3 | After T1+T3 | After T1+T2 | After T2+T3
T1    | 0         | 3           | 1           | 0           | 1
T2    | 3         | 0           | 2           | 1           | 0
T3    | 5         | 2           | 0           | 1           | 0
Three factors affecting the performance of hashing
The hash function
o Ideally, it should distribute keys and entries evenly throughout the table
o It should minimise collisions, where the position given by the hash function is
already occupied
The collision resolution strategy
o Separate chaining: chain together several keys/entries in each position
o Open addressing: store the key/entry in a different position
The size of the table
o Too big will waste memory; too small will increase collisions and may
eventually force rehashing (copying into a larger table)
o Should be appropriate for the hash function used – and a prime number is best
Choosing a hash function: turning a key into a table position
Truncation
o Ignore part of the key and use the rest as the array index (converting non-
numeric parts)
o A fast technique, but check for an even distribution throughout the table
Folding
o Partition the key into several parts and then combine them in any convenient
way
o Unlike truncation, uses information from the whole key
Modular arithmetic (used by truncation & folding, and on its own)
o To keep the calculated table position within the table, divide the position by
the size of the table, and take the remainder as the new position
Examples of hash functions
Truncation: If students have a 9-digit identification number, take the last 3 digits as
the table position
o e.g. 925371622 becomes 622
Folding: Split a 9-digit number into three 3-digit numbers, and add them
o e.g. 925371622 becomes 925 + 371 + 622 = 1918
Modular arithmetic: If the table size is 1000, the first example always keeps within the
table range, but the second example does not (it should be taken mod 1000)
o e.g. 1918 mod 1000 = 918 (in Java: 1918 % 1000)
Using a telephone number as a key
o The area code is not random, so will not spread the keys/entries evenly
through the table (many collisions)
o The last 3-digits are more random
Using a name as a key
o Use full name rather than surname (surname not particularly random)
o Assign numbers to the characters (e.g. a = 1, b = 2; or use Unicode values)
o Strategy 1: Add the resulting numbers. Bad for large table size.
o Strategy 2: Call the number of possible characters c (e.g. c = 54 for alphabet
in upper and lower case, plus space and hyphen). Then multiply each character in
the name by increasing powers of c, and add together
Choosing the table size to minimize collisions
As the number of elements in the table increases, the likelihood of a collision increases
- so make the table as large as practical
If the table size is 100, and all the hashed keys are divisible by 10, there will be many
collisions!
o Particularly bad if table size is a power of a small integer such as 2 or 10
More generally, collisions may be more frequent if:
o greatest common divisor (hashed keys, table size) > 1
Therefore, make the table size a prime number (gcd = 1)
Collisions may still happen, so we need a collision resolution strategy
Collision resolution: open addressing
Probing: If the table position given by the hashed key is already occupied, increase the position
by some amount, until an empty position is found.
Linear probing: increase by 1 each time [mod table size!]
Quadratic probing: to the original position, add 1, 4, 9, 16, ...
Use the collision resolution strategy when inserting and when finding (ensure that the
search key and the found key match).
Double hashing may also be used: the probe step is computed with a second hash function.
With open addressing, the table size should be about double the expected number of elements.
If there are many collisions, linear probing may cluster (group) keys/entries even in a
fairly empty table. This increases the time to insert and to find.
(Diagram: table positions 1 to 8, several already occupied, forming clusters.)
For a table of size n, if the table is empty, the probability of the next entry going to
any particular place is 1/n. In the diagram, the probability of position 2 getting filled next is 2/n
(either a hash to 1 or to 2 fills it). Once 2 is full, the probability of 4 being filled next is 4/n and
then of 7 it is 7/n (i.e., the probability of getting long clusters steadily increases).
An empty key/entry marks the end of a cluster, and so can be used to terminate a find
operation. So, if we remove an entry within a cluster, we should not empty its position. To allow
probing to continue, the removed entry must be marked as 'removed, but cluster continues'.
Quadratic probing is a solution to the clustering problem
Linear probing adds 1, 2, 3, etc. to the original hashed key
Quadratic probing adds 1^2, 2^2, 3^2, etc. to the original hashed key
However, whereas linear probing guarantees that all empty positions will be examined if
necessary, quadratic probing does not
e.g. Table size 16 and original hashed key 3 gives the sequence: 3, 4, 7, 12, 3, 12, 7, 4, ...
More generally, with quadratic probing, insertion may be impossible if the table is more than
half-full
A simple Hash Function
unsigned int hash(const char *key, int table_size)
{
    unsigned int hash_val = 0;

    while (*key)
        hash_val += *key++;
    return hash_val % table_size;
}
Collision resolution: chaining
Each table position is a linked list. Add the keys and entries anywhere in the list (front easiest)
Advantages over open addressing:
– Simpler insertion and removal
– Array size is not a limitation (but should still minimize collisions: make table size
roughly equal to expected number of keys and entries)
Disadvantage
– Memory overhead is large if entries are small
Rehashing: enlarging the table
To rehash:
Create a new table of double the size (adjusting until it is again prime)
Transfer the entries in the old table to the new table, by recomputing their positions (using
the hash function)
When should we rehash?
When the table is completely full
With quadratic probing, when the table is half-full or insertion fails
Why double the size?
If n is the number of elements in the table, there must have been n/2 insertions since the
previous rehash (if rehashing is done when the table is full)
So by making the new table size 2n, only a constant amortized cost is added to each insertion
Applications of Hashing
Compilers use hash tables to keep track of declared variables
A hash table can be used for on-line spelling checkers — if misspelling detection (rather
than correction) is important, an entire dictionary can be hashed and words checked in
constant time
Game playing programs use hash tables to store seen positions, thereby saving
computation time if the position is encountered again
Hash functions can be used to quickly check for inequality — if two elements hash to
different values they must be different
Storing sparse data
Performance of Hashing
The number of probes depends on the load factor (usually denoted by λ), which represents
the ratio of entries present in the table to the number of positions in the array
We also need to consider successful and unsuccessful searches separately
For a chained hash table, the average number of probes for an unsuccessful search is λ and
for a successful search is 1 + λ/2
Question Bank
Unit III- TREES
PART – A (2 MARKS)
1. Explain the tree concept.
2. What is meant by traversal?
3. What is meant by depth-first order?
4. What is inorder traversal?
5. What is preorder traversal?
6. What is postorder traversal?
7. Define binary tree.
8. What is meant by BST?
9. Define AVL trees.
10. Give examples for single rotation and double rotation.
11. Define hashing.
12. Define double hashing.
13. What is meant by binary heap?
14. Mention some applications of priority queues.
15. Define complete binary tree.
16. How is a binary tree represented using an array? Give an example.
17. A full node is a node with two children. Prove that the number of full nodes plus one is equal to the number of leaves in a non-empty binary tree.
18. Define (i) inorder (ii) preorder (iii) postorder traversal of a binary tree.
19. Suppose that we replace the deletion function, which finds, returns, and removes the minimum element in the priority queue, with find min. Can both insert and find min be implemented in constant time?
20. What is an expression tree?
21. What is a binary search tree?
22. What is meant by sorting?
23. Mention the preliminaries of sorting.
24. What are the types of sorting?
25. What is the difference between bubble sort and selection sort?
26. Give an example for insertion sort.
27. Mention the running time of insertion sort.
28. What is meant by heap sort?
29. What is meant by quick sort?
30. What is the advantage of quick sort over merge sort?
31. Mention the best case and worst case of quick sort.
32. What is meant by external sorting?
33. Determine the average running time of quick sort.
34. Trace the steps of insertion sort on 12, 19, 33, 26, 29, 35, 22. Find the total number of comparisons made.
35. What is the principle of radix sort?
36. What is insertion sort?
37. What is shell sort?
38. Define the worst-case analysis of shell sort.
39. What is merge sort?
40. What is meant by external sorting?
41. What is multiway merge?
PART- B (16 MARKS)
1. Explain the operation and implementation of the binary heap.
2. Explain the implementation of different hashing techniques.
3. Give the prefix, infix and postfix expressions corresponding to the tree given in the figure.
   [expression tree figure: operands a, b, c, d, e]
   (a) How do you insert an element in a binary search tree? (8)
   (b) Show that for the perfect binary tree of height h containing 2^(h+1) - 1 nodes, the sum of the heights of the nodes is 2^(h+1) - 1 - (h+1). (8)
4. Given input 4371, 1323, 6173, 4199, 4344, 9679, 1989 and a hash function h(X) = X mod 10, show the resulting:
   (a) Separate chaining hash table (4)
   (b) Open addressing hash table using linear probing (4)
   (c) Open addressing hash table using quadratic probing (4)
   (d) Open addressing hash table with second hash function h2(X) = 7 - (X mod 7) (4)
5. Explain in detail (i) single rotation (ii) double rotation of an AVL tree.
6. Explain the efficient implementation of the priority queue ADT.
7. Explain how to find the maximum and minimum elements in a BST, and explain in detail deletion from a binary search tree.
8. Sort the sequence 3, 1, 4, 7, 5, 9, 2, 6, 5 using insertion sort.
9. Explain the operation and implementation of insertion sort and shell sort.
10. Explain the operation and implementation of merge sort.
11. Explain the operation and implementation of external sorting.
12. (a) Write the quick sort algorithm and explain. (10)
    (b) Trace the quick sort algorithm for the following list of numbers: 90, 77, 60, 99, 55, 88, 66. (6)
13. Write down the merge sort algorithm and give its worst case, best case and average case analysis.
14. Show how heap sort processes the input 142, 543, 123, 65, 453, 879, 572, 434, 111, 242, 811, 102.
UNIT IV - GRAPHS AND THEIR APPLICATIONS
Graphs – An Application of Graphs – Representation – Transitive Closure – Warshall's Algorithm – Shortest Path Algorithm – A Flow Problem – Dijkstra's Algorithm – Minimum Spanning Trees – Kruskal's and Prim's Algorithms – An Application of Scheduling – Linked Representation of Graphs – Graph Traversals
Unit: IV - GRAPHS AND THEIR APPLICATIONS
GRAPHS
A graph consists of a collection of vertices or nodes, connected by a collection of edges. Any time you have a set of objects with some connection, relationship or interaction between pairs of them, a graph is a good way to model it.

Graphs are extremely important because they are a very flexible mathematical model for many application problems. Examples of graphs in applications include communication and transportation networks, VLSI and other sorts of logic circuits, surface meshes used for shape description in computer-aided design and geographic information systems, and precedence constraints in scheduling systems.
Directed Graph
A directed graph (or digraph) G = (V,E) consists of a finite set V , called the vertices or
nodes, and E, a set of ordered pairs, called the edges of G.
Undirected Graph
An undirected graph (or graph) G = (V,E) consists of a finite set V of vertices, and a set E
of unordered pairs of distinct vertices, called the edges.
Directed graphs and undirected graphs are different objects mathematically. We say that vertex v is adjacent to vertex u if there is an edge (u, v).

In a directed graph, given the edge e = (u, v), we say that u is the origin of e and v is the destination of e. In undirected graphs, u and v are the endpoints of the edge. The edge e is incident on (meaning that it touches) both u and v.
In a digraph, the number of edges coming out of a vertex is called the out-degree of that
vertex, and the number of edges coming in is called the in-degree.
In an undirected graph we just talk about the degree of a vertex as the number of incident
edges.
Weighted Graph
In a weighted graph, a value (weight) is assigned to each edge. A weighted graph may be a directed graph or an undirected graph. The weight of an edge between nodes u and v is denoted W(u,v).

For example, W(1,2) = 1 and W(1,4) = 3 in the digraph.
AN APPLICATION OF GRAPHS
Assume one input line containing four integers followed by any number of input lines
with two integers each. The first integer on the first line, n, represents the number of cities,
which for simplicity are numbered from 0 to n-1. The second and third integers on that line are
between 0 and n-1 and represent two cities.
It is desired to travel from the first city to the second city using exactly nr roads, where nr is the fourth integer on the first input line.

Each subsequent input line contains two integers representing two cities, indicating that there is a road from the first city to the second.

The problem is to determine whether there is a path of the required length by which one can travel from the first of the given cities to the second.

A plan for the solution is the following: create a graph with the cities as nodes and the roads as arcs. To find a path of length nr from node A to node B, look for a node C such that an arc exists from A to C and a path of length nr-1 exists from C to B.

If these conditions are satisfied for some node C, the desired path exists. If the conditions are not satisfied for any node C, the desired path does not exist.
scanf("%d", &n);
scanf("%d %d", &a, &b);
scanf("%d", &nr);
while (scanf("%d %d", &city1, &city2) != EOF)
    join(city1, city2);
if (findpath(nr, a, b))
    printf("a path exists from %d to %d in %d steps", a, b, nr);
else
    printf("no path exists from %d to %d in %d steps", a, b, nr);
The algorithm for the function findpath(k, a, b) is:

if (k == 1)
    return (adjacent(a, b));
for (c = 0; c < n; ++c)
    if (adjacent(a, c) && findpath(k-1, c, b))
        return (TRUE);
return (FALSE);
REPRESENTATIONS
DIRECTED GRAPH
Adjacency Matrix
An n x n matrix A defined for 1 <= v, w <= n, where A[v][w] = 1 if (v,w) is an edge and 0 otherwise.

If the digraph has weights we can store the weights in the matrix. For example, if (v,w) ∈ E then A[v][w] = W(v,w). If (v,w) ∉ E then W(v,w) need not be defined, but often we set it to some special value, e.g. A[v][w] = −1 or ∞.
Adjacency List
An array adj[1..n] of pointers where for 1 <= v <= n, adj[v] points to a linked list
containing the vertices which are adjacent to v (i.e. the vertices that can be reached from v by a
single edge). If the edges have weights then these weights may also be stored in the linked list
elements.
UNDIRECTED GRAPH
Undirected graphs use exactly the same representation, but we store each edge twice. In particular, we represent the undirected edge (v,w) by the two oppositely directed edges (v, w) and (w, v).
This can cause some complications. For example, suppose you write an algorithm that
operates by marking edges of a graph. You need to be careful when you mark edge (v, w) in the
representation that you also mark (w, v), since they are both the same edge in reality.
When dealing with adjacency lists, it may not be convenient to walk down the entire linked list,
so it is common to include cross links between corresponding edges.
An adjacency matrix requires O(|V|²) storage and an adjacency list requires O(|V|+|E|) storage. The |V| arises because there is one entry for each vertex in adj. Since each list has out-deg(v) entries, when this is summed over all vertices, the total number of adjacency list records is O(|E|). For sparse graphs the adjacency list representation is more space efficient.
TRANSITIVE CLOSURE
Let us assume that the graph is completely described by its adjacency matrix, adj. Consider the logical expression adj[i][k] && adj[k][j]. Its value is TRUE if and only if the values of both adj[i][k] and adj[k][j] are TRUE, which implies that there is an arc from node i to node k and an arc from node k to node j. Thus adj[i][k] && adj[k][j] equals TRUE if and only if there is a path of length 2 from i to j passing through k.

Consider the expression

(adj[i][0] && adj[0][j]) || (adj[i][1] && adj[1][j]) || ... || (adj[i][MAXNODES-1] && adj[MAXNODES-1][j])

The value of this expression is TRUE only if there is a path of length 2 from node i to node j, either through node 0 or through node 1 ... or through node MAXNODES-1.

Consider an array adj2 such that adj2[i][j] is the value of the foregoing expression. adj2 is called the path matrix of length 2. adj2[i][j] indicates whether or not there is a path of length 2 between i and j. adj2 is the Boolean product of adj with itself.
Fig (a): adj Fig (b): adj2
The figure illustrates the process. Fig (a) depicts a graph and its adjacency matrix, in which true is represented by 1 and false by 0. Fig (b) is the Boolean product of that matrix with itself, and is thus the path matrix of length 2 for the graph. A 1 appears in row i, column j of the matrix in Fig (b) if and only if there is a path of length 2 from node i to node j in the graph.
Define adj3, the path matrix of length 3, as the Boolean product of adj2 with adj. adj3[i][j]
equals TRUE if and only if there is a path of length 3 from i to j. The below Fig illustrates the
matrices adj3 and adj4 of the graph in Fig (a)
A B C D E
A 0 0 0 1 1
B 0 0 0 1 1
C 0 0 0 1 1
D 0 0 0 0 1
E 0 0 0 1 0
Fig: adj3
A B C D E
A 0 0 0 1 1
B 0 0 0 1 1
C 0 0 0 1 1
D 0 0 0 1 0
E 0 0 0 0 1
Fig: adj4
Assume that we want to know whether a path of length 3 or less exists between two nodes of a graph. If such a path exists between nodes i and j, it must be of length 1, 2, or 3. If there is a path of length 3 or less between nodes i and j, the value of

adj[i][j] || adj2[i][j] || adj3[i][j]

must be true. The figure shows the matrix formed by "or-ing" the matrices adj, adj2 and adj3.
A B C D E
A 0 0 1 1 1
B 0 0 1 1 1
C 0 0 0 1 1
D 0 0 0 1 1
E 0 0 0 1 1
Fig: matrix formed by or-ing adj, adj2 and adj3
If the graph has n nodes, it must be true that

path[i][j] == adj[i][j] || adj2[i][j] || ... || adjn[i][j]

This is because if there is a path of length m > n from i to j, such as i, i1, i2, ..., im-1, j, then since there are only n nodes in the graph, at least one node k must appear in the path twice. The path from i to j can be shortened by removing the cycle from k to k. This process is repeated until no two nodes in the path are equal, and therefore until the path is of length n or less. The figure illustrates the matrix path for the graph of Fig (a). The matrix path is called the transitive closure of the matrix adj.
A B C D E
A 0 0 1 1 1
B 0 0 1 1 1
C 0 0 0 1 1
D 0 0 0 1 1
E 0 0 0 1 1
Fig: path = adj or adj2 or adj3 or adj4 or adj5
Transitive closure routine
transclose(adj, path)
int adj[][MAXNODES], path[][MAXNODES];
{
    int i, j, k;
    int newprod[MAXNODES][MAXNODES], adjprod[MAXNODES][MAXNODES];

    for (i = 0; i < MAXNODES; ++i)
        for (j = 0; j < MAXNODES; ++j)
            adjprod[i][j] = path[i][j] = adj[i][j];
    for (i = 1; i < MAXNODES; ++i) {
        /* compute the next Boolean product */
        prod(adjprod, adj, newprod);
        /* "or" it into the result */
        for (j = 0; j < MAXNODES; ++j)
            for (k = 0; k < MAXNODES; ++k)
                path[j][k] = path[j][k] || newprod[j][k];
        for (j = 0; j < MAXNODES; ++j)
            for (k = 0; k < MAXNODES; ++k)
                adjprod[j][k] = newprod[j][k];
    }
}
The routine prod may be written as:

prod(a, b, c)
int a[][MAXNODES], b[][MAXNODES], c[][MAXNODES];
{
    int i, j, k, val;

    for (i = 0; i < MAXNODES; ++i)
        for (j = 0; j < MAXNODES; ++j) {
            val = FALSE;
            for (k = 0; k < MAXNODES; ++k)
                val = val || (a[i][k] && b[k][j]);
            c[i][j] = val;
        }
}
To analyze the efficiency of this routine, note that finding a Boolean product by this method is O(n³), where n is the number of graph nodes. In transclose, this process is embedded in a loop that is repeated n-1 times, so that the entire transitive closure routine is O(n⁴).
WARSHALL’S ALGORITHM
Let us define the matrix pathk such that pathk[i][j] is true if and only if there is a path from node i to node j that does not pass through any nodes numbered higher than k.

The only situation in which pathk+1[i][j] can be TRUE while pathk[i][j] equals FALSE is if there is a path from i to j passing through node k+1 but through no higher-numbered node.

But this means that there must be a path from i to k+1 passing through only nodes 1 through k, and a similar path from k+1 to j. Thus pathk+1[i][j] equals TRUE if and only if one of the following two conditions holds:

pathk[i][j] == TRUE
pathk[i][k+1] == TRUE and pathk[k+1][j] == TRUE

This means that pathk+1[i][j] equals pathk[i][j] || (pathk[i][k+1] && pathk[k+1][j]).

To obtain the matrix pathk from the matrix pathk-1, we may use

for (i = 0; i < MAXNODES; ++i)
    for (j = 0; j < MAXNODES; ++j)
        pathk[i][j] = pathk-1[i][j] || (pathk-1[i][k] && pathk-1[k][j]);
This may be logically simplified and made more efficient as

for (i = 0; i < MAXNODES; ++i)
    for (j = 0; j < MAXNODES; ++j)
        pathk[i][j] = pathk-1[i][j];
for (i = 0; i < MAXNODES; ++i)
    if (pathk-1[i][k] == TRUE)
        for (j = 0; j < MAXNODES; ++j)
            pathk[i][j] = pathk-1[i][j] || pathk-1[k][j];
Clearly path0[i][j] = adj[i][j], since the only way to go from node i to node j without passing through any other nodes is to go directly from i to j. The following C routine is used to compute the transitive closure:

transclose(adj, path)
int adj[][MAXNODES], path[][MAXNODES];
{
    int i, j, k;

    for (i = 0; i < MAXNODES; ++i)
        for (j = 0; j < MAXNODES; ++j)
            path[i][j] = adj[i][j];
    for (k = 0; k < MAXNODES; ++k)
        for (i = 0; i < MAXNODES; ++i)
            if (path[i][k] == TRUE)
                for (j = 0; j < MAXNODES; ++j)
                    path[i][j] = path[i][j] || path[k][j];
}
This technique increases the efficiency of finding the transitive closure to O(n³). This method is called Warshall's algorithm.
SHORTEST-PATH ALGORITHMS
The input is a weighted graph: associated with each edge (vi, vj) is a cost ci,j to traverse the arc. The cost of a path v1 v2 ... vn is the sum of ci,i+1 for i = 1 to n-1. This is referred to as the weighted path length. The unweighted path length is merely the number of edges on the path, namely n - 1.
Unweighted Shortest Paths
Fig shows an unweighted graph, G. Using some vertex, s, which is an input parameter, we would like to find the shortest path from s to all other vertices. There are no weights on the edges. This is clearly a special case of the weighted shortest-path problem, since we could assign all edges a weight of 1.
For now, suppose we are interested only in the length of the shortest paths, not in the
actual paths themselves. Keeping track of the actual paths will turn out to be a matter of simple
bookkeeping.
Fig An unweighted directed graph G
Suppose we choose s to be v3. Immediately, we can tell that the shortest path from s to v3 is a path of length 0. This is marked in the figure.

Next we look for all vertices that are a distance 1 away from s. These can be found by looking at the vertices that are adjacent to s. If we do this, we see that v1 and v6 are one edge from s. This is shown in the figure.
Fig: Graph after marking the start node as reachable in zero edges
Fig: Graph after finding all vertices whose path length from s is 1
Fig: Graph after finding all vertices whose shortest path is 2
We can now find vertices whose shortest path from s is exactly 2, by finding all the
vertices adjacent to v1 and v6 (the vertices at distance 1), whose shortest paths are not already
known. This search tells us that the shortest path to v2 and v4 is 2.
Finally we can find, by examining vertices adjacent to the recently evaluated v2 and v4, that v5 and v7 have a shortest path of three edges. All vertices have now been calculated; the figure shows the final result of the algorithm.
This strategy for searching a graph is known as breadth-first search. It operates by
processing vertices in layers: the vertices closest to the start are evaluated first, and the most
distant vertices are evaluated last. This is much the same as a level-order traversal for trees.
Given this strategy, we must translate it into code. Fig shows the initial configuration of
the table that our algorithm will use to keep track of its progress.
For each vertex, we will keep track of three pieces of information. First, we will keep its
distance from s in the entry dv. Initially all vertices are unreachable except for s, whose path
length is 0. The entry in pv is the bookkeeping variable, which will allow us to print the actual
paths. The entry known is set to 1 after a vertex is processed. Initially, all entries are unknown,
including the start vertex.
Fig: Final shortest paths
When a vertex is known, we have a guarantee that no cheaper path will ever be found, and so processing for that vertex is essentially complete. The basic algorithm is described in the figure. It mimics the diagrams by declaring as known the vertices at distance d = 0, then d = 1, then d = 2, and so on, and setting all the adjacent vertices w that still have dw = ∞ to a distance dw = d + 1. The running time of the algorithm is O(|V|²) because of the doubly nested for loops.
v Known dv pv
----------------------
v1 0 ∞ 0
v2 0 ∞ 0
v3 0 0 0
v4 0 ∞ 0
v5 0 ∞ 0
v6 0 ∞ 0
v7 0 ∞ 0
Fig: Initial configuration of table used in unweighted shortest-path computation
void unweighted( TABLE T )   /* assume T is initialized */
{
    unsigned int curr_dist;
    vertex v, w;

    for( curr_dist = 0; curr_dist < NUM_VERTEX; curr_dist++ )
        for each vertex v
            if( ( !T[v].known ) && ( T[v].dist == curr_dist ) )
            {
                T[v].known = TRUE;
                for each w adjacent to v
                    if( T[w].dist == INT_MAX )
                    {
                        T[w].dist = curr_dist + 1;
                        T[w].path = v;
                    }
            }
}
Fig: Pseudocode for unweighted shortest-path algorithm
NETWORK FLOW PROBLEM
Suppose we are given a directed graph G = (V, E) with edge capacities cv,w. These
capacities could represent the amount of water that could flow through a pipe or the amount of
traffic that could flow on a street between two intersections.
We have two vertices: s, which we call the source, and t, which is the sink. Through any
edge, (v, w), at most cv,w units of "flow" may pass. At any vertex, v, that is not either s or t, the
total flow coming in must equal the total flow going out.
The maximum flow problem is to determine the maximum amount of flow that can pass
from s to t. As an example, for the graph in Fig on the left the maximum flow is 5, as indicated
by the graph on the right.
Fig: A graph (left) and its maximum flow
As required by the problem statement, no edge carries more flow than its capacity. Vertex
a has three units of flow coming in, which it distributes to c and d. Vertex d takes three units of
flow from a and b and combines this, sending the result to t. A vertex can combine and distribute
flow in any manner that it likes, as long as edge capacities are not violated and as long as flow
conservation is maintained.
A first attempt to solve the problem proceeds in stages. We start with our graph, G, and construct a flow graph Gf. Gf tells the flow that has been attained at any stage in the algorithm. Initially all edges in Gf have no flow; when the algorithm terminates, Gf contains a maximum flow. We also construct a graph, Gr, called the residual graph. Gr tells, for each edge, how much more flow can be added. We can calculate this by subtracting the current flow from the capacity for each edge. An edge in Gr is known as a residual edge.
At each stage, we find a path in Gr from s to t. This path is known as an augmenting path. The minimum edge on this path is the amount of flow that can be added to every edge on the path. We do this by adjusting Gf and recomputing Gr. When we find no path from s to t in Gr, we terminate. The initial configuration is in Fig (a). There are many paths from s to t in the residual graph. Suppose we select s, b, d, t. Then we can send two units of flow through every edge on this path; once we have filled (saturated) an edge, it is removed from the residual graph, as in Fig (b). Next, select the path s, a, c, t, which also allows two units of flow. Making the required adjustments gives the graphs in Fig (c).
Fig (a) Initial stages of the graph, flow graph, and residual graph
Fig (b) G, Gf, Gr after two units of flow added along s, b, d, t
Fig (c ) G, Gf, Gr after two units of flow added along s, a, c, t
The only path left to select is s, a, d, t, which allows one unit of flow. The resulting graphs are shown in Fig (d). The algorithm terminates at this point, because t is unreachable from s. The resulting flow of 5 happens to be the maximum.

Suppose instead that with our initial graph, we had chosen the path s, a, d, t first, sending three units of flow along it. There would then no longer be any path from s to t in the residual graph, and thus our algorithm would have failed to find an optimal solution. Fig (e) shows how the algorithm fails.
To fix this, for every edge (v, w) with flow fv,w in the flow graph, we will add an edge in the residual graph (w, v) of capacity fv,w. Starting from our original graph and selecting the augmenting path s, a, d, t, we obtain the graphs in Fig (f).
Notice that in the residual graph, there are now edges in both directions between a and d. Either one more unit of flow can be pushed from a to d, or up to three units can be pushed back, allowing us to undo flow. The algorithm then finds the augmenting path s, b, d, a, c, t, of flow 2. By pushing two units of flow from d to a, the algorithm takes two units of flow away from the edge (a, d). Fig (g) shows the new graphs.
Fig (d) G, Gf, Gr after one unit of flow added along s, a, d, t -- algorithm terminates
Fig (e) G, Gf, Gr if initial action is to add three units of flow along s, a, d, t -- algorithm
terminates with suboptimal solution
Fig (f) Graphs after three units of flow added along s, a, d, t using correct algorithm
Fig (g) Graphs after two units of flow added along s, b, d, a, c, t using correct algorithm
There is no augmenting path in this graph, so the algorithm terminates. If the edge capacities are rational numbers, this algorithm always terminates with a maximum flow.

If the capacities are all integers and the maximum flow is f, then, since each augmenting path increases the flow value by at least 1, f stages suffice, and the total running time is O(f·|E|), since an augmenting path can be found in O(|E|) time by an unweighted shortest-path algorithm.
Dijkstra's Algorithm
The general method to solve the single-source shortest-path problem is known as
Dijkstra's algorithm. This thirty-year-old solution is a prime example of a greedy algorithm.
Dijkstra's algorithm proceeds in stages, just like the unweighted shortest-path algorithm.
At each stage, Dijkstra's algorithm selects a vertex v, which has the smallest dv among all the
unknown vertices, and declares that the shortest path from s to v is known.
The remainder of a stage consists of updating the values of dw. In the unweighted case,
we set dw = dv + 1 if dw = ∞.
Thus, we essentially lowered the value of dw if vertex v offered a shorter path. If we
apply the same logic to the weighted case, then we should set dw = dv + cv,w if this new value
for dw would be an improvement.
The graph in Fig is our example. Fig (a) represents the initial configuration, assuming
that the start node, s, is v1. The first vertex selected is v1, with path length 0. This vertex is
marked known. Now that v1 is known, some entries need to be adjusted. The vertices adjacent to
v1 are v2 and v4.
Both these vertices get their entries adjusted, as indicated in Fig (b)
Next, v4 is selected and marked known. Vertices v3, v5, v6, and v7 are adjacent, and it
turns out that all require adjusting, as shown in Fig (c)
Next, v2 is selected. v4 is adjacent but already known, so no work is performed on it. v5
is adjacent but not adjusted, because the cost of going through v2 is 2 + 10 = 12 and a path of
length 3 is already known. Fig (d) shows the tables after these vertices are selected.
The next vertex selected is v5 at cost 3. v7 is the only adjacent vertex, but it is not
adjusted, because 3 + 6 > 5. Then v3 is selected, and the distance for v6 is adjusted down to 3 + 5
= 8. The resulting table is depicted in Fig (e)
Next v7 is selected; v6 gets updated down to 5 + 1 = 6. The resulting table is Fig (f)
Finally, v6 is selected.
The final table is shown in Fig (g). Fig (h) graphically shows how edges are marked
known and vertices updated during Dijkstra's algorithm.
Fig: The directed graph G
v Known dv pv
-------------------
v1 0 0 0
v2 0 ∞ 0
v3 0 ∞ 0
v4 0 ∞ 0
v5 0 ∞ 0
v6 0 ∞ 0
v7 0 ∞ 0
Fig (a) Initial configuration of table used in Dijkstra's algorithm
v Known dv pv
--------------------
v1 1 0 0
v2 0 2 v1
v3 0 ∞ 0
v4 0 1 v1
v5 0 ∞ 0
v6 0 ∞ 0
v7 0 ∞ 0
Fig (b) After v1 is declared known
v Known dv pv
--------------------
v1 1 0 0
v2 0 2 v1
v3 0 3 v4
v4 1 1 v1
v5 0 3 v4
v6 0 9 v4
v7 0 5 v4
Fig (c) After v4 is declared known
v Known dv pv
--------------------
v1 1 0 0
v2 1 2 v1
v3 0 3 v4
v4 1 1 v1
v5 0 3 v4
v6 0 9 v4
v7 0 5 v4
Fig (d) After v2 is declared known
v Known dv pv
--------------------
v1 1 0 0
v2 1 2 v1
v3 1 3 v4
v4 1 1 v1
v5 1 3 v4
v6 0 8 v3
v7 0 5 v4
Fig (e) After v5 and then v3 are declared known
v Known dv pv
-------------------
v1 1 0 0
v2 1 2 v1
v3 1 3 v4
v4 1 1 v1
v5 1 3 v4
v6 0 6 v7
v7 1 5 v4
Fig (f) After v7 is declared known
v Known dv pv
-------------------
v1 1 0 0
v2 1 2 v1
v3 1 3 v4
v4 1 1 v1
v5 1 3 v4
v6 1 6 v7
v7 1 5 v4
Fig (g) After v6 is declared known and algorithm terminates
Fig (h) Stages of Dijkstra's algorithm
void dijkstra( TABLE T )
{
    vertex v, w;

    for( ; ; )
    {
        v = smallest unknown distance vertex;
        if( v == NotAVertex )
            break;
        T[v].known = TRUE;
        for each w adjacent to v
            if( !T[w].known )
                if( T[v].dist + cv,w < T[w].dist )
                {
                    decrease( T[w].dist to T[v].dist + cv,w );
                    T[w].path = v;
                }
    }
}
Fig: Pseudocode for Dijkstra's algorithm
MINIMUM SPANNING TREE
A minimum spanning tree of an undirected graph G is a tree formed from graph edges that connects all the vertices of G at the lowest total cost. A minimum spanning tree exists if and only if G is connected.

In the figure, the second graph is a minimum spanning tree of the first. The number of edges in the minimum spanning tree is |V|-1. The minimum spanning tree is a tree because it is acyclic, it is spanning because it covers every vertex, and it is minimum for the obvious reason.
Fig: A graph G and its minimum spanning tree
For any spanning tree T, if an edge e that is not in T is added, a cycle is created. The removal of any edge on the cycle reinstates the spanning tree property. The cost of the spanning tree is lowered if e has lower cost than the edge that was removed. If, as a spanning tree is created, the edge that is added is the one of minimum cost that avoids creation of a cycle, then the cost of the resulting spanning tree cannot be improved, because any replacement edge would have cost at least as much as an edge already in the spanning tree.
Prim’s Algorithm
One way to compute a minimum spanning tree is to grow the tree in successive stages. Initially, one node is picked as the root; in each stage we add an edge, and thus an associated vertex, to the tree.

The algorithm finds, at each stage, a new vertex to add to the tree by choosing the edge (u, v) such that the cost of (u, v) is the smallest among all edges where u is in the tree and v is not. The figure shows how this algorithm would build the minimum spanning tree, starting from v1. Initially, v1 is in the tree as a root with no edges. Each step adds one edge and one vertex to the tree.
Fig: Prim's algorithm after each stage
Prim's algorithm is identical to Dijkstra's algorithm for shortest paths. For each vertex we keep values dv and pv and an indication of whether it is known or unknown. dv is the weight of the shortest arc connecting v to a known vertex, and pv is the last vertex to cause a change in dv.

The rest of the algorithm is the same, with the exception that since the definition of dv is different, so is the update rule. After a vertex v is selected, for each unknown w adjacent to v, dw = min(dw, cw,v).
The initial configuration of the table is shown in Fig (a). v1 is selected, and v2, v3, and v4 are updated. The table resulting from this is shown in Fig (b).
The next vertex selected is v4. Every vertex is adjacent to v4. v1 is not examined, because
it is known. v2 is unchanged, because it has dv = 2 and the edge cost from v4 to v2 is 3; all the
rest are updated.
Fig (c) shows the resulting table. The next vertex chosen is v2 (arbitrarily breaking a tie).
This does not affect any distances. Then v3 is chosen, which affects the distance in v6, producing
Fig (d). Fig (e) results from the selection of v7, which forces v6 and v5 to be adjusted. v6 and
then v5 are selected, completing the algorithm.
The final table is shown in Fig (f). The edges in the spanning tree can be read from the table: (v2, v1), (v3, v4), (v4, v1), (v5, v7), (v6, v7), (v7, v4). The total cost is 16.
v Known dv pv
--------------------
v1 0 0 0
v2 0 ∞ 0
v3 0 ∞ 0
v4 0 ∞ 0
v5 0 ∞ 0
v6 0 ∞ 0
v7 0 ∞ 0
Fig (a) Initial configuration of table used in Prim's algorithm
v Known dv pv
--------------------
v1 1 0 0
v2 0 2 v1
v3 0 4 v1
v4 0 1 v1
v5 0 ∞ 0
v6 0 ∞ 0
v7 0 ∞ 0
Fig (b) The table after v1 is declared known
v Known dv pv
--------------------
v1 1 0 0
v2 0 2 v1
v3 0 2 v4
v4 1 1 v1
v5 0 7 v4
v6 0 8 v4
v7 0 4 v4
Fig (c) The table after v4 is declared known
v Known dv pv
--------------------
v1 1 0 0
v2 1 2 v1
v3 1 2 v4
v4 1 1 v1
v5 0 7 v4
v6 0 5 v3
v7 0 4 v4
Fig (d) The table after v2 and then v3 are declared known
v Known dv pv
--------------------
v1 1 0 0
v2 1 2 v1
v3 1 2 v4
v4 1 1 v1
v5 0 6 v7
v6 0 1 v7
v7 1 4 v4
Fig (e)The table after v7 is declared known
v Known dv pv
--------------------
v1 1 0 0
v2 1 2 v1
v3 1 2 v4
v4 1 1 v1
v5 1 6 v7
v6 1 1 v7
v7 1 4 v4
Fig (f) The table after v6 and v5 are selected (Prim's algorithm terminates)
Prim's algorithm runs on undirected graphs, so when coding it, remember to put every edge in two adjacency lists. The running time is O(|V|²) without heaps, which is optimal for dense graphs, and O(|E| log |V|) using binary heaps, which is good for sparse graphs.
Kruskal’s Algorithm
A second greedy strategy is to continually select the edges in order of smallest weight and accept an edge if it does not cause a cycle. The action of the algorithm on the graph in the preceding example is shown in the figure.
Formally, Kruskal‘s algorithm maintains a forest- a collection of trees. Initially, there are
|V| single-node trees. Adding an edge merges two trees into one.
When the algorithm terminates, there is only one tree, and this is the minimum spanning
tree. Fig shows the order in which edges are added to the forest. The algorithm terminates when
enough edges are accepted. It turns out to be simple to decide whether edge (u, v) should be
accepted or rejected.
Edge Weight Action
----------------------------
(v1,v4) 1 Accepted
(v6,v7) 1 Accepted
(v1,v2) 2 Accepted
(v3,v4) 2 Accepted
(v2,v4) 3 Rejected
(v1,v3) 4 Rejected
(v4,v7) 4 Accepted
(v3,v6) 5 Rejected
(v5,v7) 6 Accepted
Fig: Action of Kruskal's algorithm on G
Fig: Kruskal's algorithm after each stage
The invariant is that at any point in the process, two vertices belong to the same set if and
only if they are connected in the current spanning forest. Thus, each vertex is initially in its own
set. If u and v are in the same set, the edge is rejected, because since they are already connected,
adding (u, v) would form a cycle. Otherwise, the edge is accepted, and a union is performed on
the two sets containing u and v. It is easy to see that this maintains the set invariant, because
once the edge (u, v) is added to the spanning forest, if w was connected to u and x was connected
to v, then x and w must now be connected, and thus belong in the same set.
The worst-case running time is O(|E| log |V|), which is dominated by the heap operations.
Pseudocode for Kruskal’s Algorithm
void
Kruskal( Graph G )
{
    int EdgesAccepted;
    DisjSet S;
    PriorityQueue H;
    Vertex U, V;
    SetType USet, VSet;
    Edge E;

    Initialize( S );
    ReadGraphIntoHeapArray( G, H );
    BuildHeap( H );
    EdgesAccepted = 0;
    while( EdgesAccepted < NumVertex - 1 )
    {
        E = DeleteMin( H );   /* E = (U, V) */
        USet = Find( U, S );
        VSet = Find( V, S );
        if( USet != VSet )
        {
            /* accept the edge */
            EdgesAccepted++;
            SetUnion( S, USet, VSet );
        }
    }
}
AN APPLICATION OF SCHEDULING
Suppose a chef in a diner receives an order for a fried egg. The job of frying an egg can
be decomposed into a number of distinct subtasks:
Get egg
Crack egg
Get grease
Grease pan
Heat grease
Pour egg into pan
Wait until egg is done
Remove egg
Some of these tasks must precede others (for example, "get egg" must precede "crack
egg"). Others may be done simultaneously (for example, "get egg" and "heat grease").
The chef wishes to provide the quickest service possible and is assumed to have an
unlimited number of assistants. The problem is to assign tasks to the assistants so as to complete
the job in the least possible time.
Although this example may seem frivolous, it is typical of many real-world scheduling
problems. A computer system may wish to schedule jobs to minimize turnaround time; a
compiler may wish to schedule machine language operations to minimize execution time; or a
plant manager may wish to organize an assembly line to minimize production time. All these
problems are closely related and can be solved by the use of graphs.
Let us represent the above problem as a graph. Each node of the graph represents a
subtask and each arc <x,y> represents the requirement that subtask y cannot be performed until
subtask x has been completed. This graph G is shown in Fig.
Fig: The graph G
Consider the transitive closure of G. The transitive closure is the graph T such that <x,y> is an
arc of T if and only if there is a path from x to y in G. This transitive closure is shown in Fig.
Fig: The graph T
In the graph T, an arc exists from node x to node y if and only if subtask x must be performed
before subtask y. Note that neither G nor T can contain a cycle, since if a cycle from node x to
itself existed, subtask x could not be performed until after subtask x had been completed. This is
clearly an impossible situation in the context of the problem. Thus G is a dag, a directed acyclic
graph.
In the graphs of Figures, the nodes A and F do not have predecessors. Since they have no
predecessors the subtasks that they represent may be performed immediately and simultaneously
without waiting for any other subtasks to be completed. Every other subtask must wait until at
least one of these is completed.
Once these two subtasks have been performed, their nodes can be removed from the
graph. Note that the resulting graph does not contain any cycles, since nodes and arcs have been
removed from a graph that originally contained no cycles.
Therefore the resulting graph must also contain at least one node with no predecessors. In
the example, B and H are two such nodes. Thus the subtasks B and H may be performed
simultaneously in the second time period.
Time period   Assistant 1              Assistant 2
1             Get egg                  Get grease
2             Crack egg                Grease pan
3             Heat grease
4             Pour egg into pan
5             Wait until egg is done
6             Remove egg
The above process can be outlined as follows:
Read the precedences and construct the graph.
Use the graph to determine subtasks that can be done simultaneously.
Let us refine each of these two steps. Two crucial decisions must be made in refining step 1.
The first is to decide the format of the input; the second is to decide on the representation of the
graph. The most convenient way to represent these requirements is by ordered pairs of
subtasks; each input line contains the names of two subtasks, where the first subtask on a line
must precede the second.
Step 2 can be refined into the following algorithm:
while (the graph is not empty) {
    determine which nodes have no predecessors;
    output this group of nodes with an indication that they can be performed
        simultaneously in the next time period;
    remove these nodes and their incident arcs from the graph;
}
The refinement of Step 2 may be rewritten as
outp = NULL;
for (all nodes p in the graph)
    if (count(p) == 0) {
        remove node(p) from the graph;
        place node(p) on the outp list;
    }
period = 0;
while (outp != NULL) {
    ++period;
    printf("%d", period);
    nextout = NULL;
    p = outp;
    while (p != NULL) {
        printf("%s", info(p));
        for (all arcs a emanating from node(p)) {
            t = the pointer to the node that terminates a;
            count(t)--;
            if (count(t) == 0) {
                remove node(t) from the graph;
                add node(t) to the nextout list;
            }
            free arc(a);
        }
        q = next(p);
        freenode(p);
        p = q;
    }
    outp = nextout;
}
if (any nodes remain in the graph)
    error - there is a cycle in the graph;
It is necessary in step 1 to be able to access each graph node from the character string that
specifies the task the node represents. For this reason, it makes sense to organize the set of graph
nodes in a hash table. Although the initial traversal will require accessing some extra table
positions, this is more than offset by the ability to access a node directly from its task name. The
only impediment is the need (in line 19) to delete nodes from the graph.
However, further analysis reveals that the only reason to delete a node is to be able to
check whether any nodes remain when the output list is empty (line 30) so that a cycle may be
detected. If we maintain a counter of the number of nodes and implement the deletion by
reducing this counter by 1, we can check for remaining nodes by comparing the counter with
zero.
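The outlined process, maintaining a predecessor count per node, can be sketched in C with a simple array-based queue. The adjacency-matrix representation and all names here are assumptions for illustration, not the text's hash-table scheme:

```c
#include <assert.h>

#define MAXNODES 20

/* count[v] holds the number of unprocessed predecessors of v */
int count[MAXNODES];
int adj[MAXNODES][MAXNODES]; /* adj[u][v] == 1 if arc <u,v> exists */

/* Writes a topological order into order[]; returns the number of nodes
   output, which is less than n exactly when the graph contains a cycle. */
int topsort(int n, int order[]) {
    int queue[MAXNODES], front = 0, rear = 0, done = 0;
    for (int v = 0; v < n; v++)
        if (count[v] == 0)
            queue[rear++] = v;       /* no predecessors: ready now */
    while (front < rear) {
        int u = queue[front++];      /* remove from head of queue */
        order[done++] = u;
        for (int v = 0; v < n; v++)
            if (adj[u][v] && --count[v] == 0)
                queue[rear++] = v;   /* v's last predecessor done */
    }
    return done;
}
```

Nodes dequeued together before their successors' counts reach zero correspond to subtasks that may be performed in the same time period.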
LINKED REPRESENTATION OF GRAPHS
The adjacency matrix representation of a graph is frequently inadequate because it
requires advance knowledge of the number of nodes. If a graph must be constructed in the course
of solving a problem, or if it must be updated dynamically as the program proceeds, a new
matrix must be created for each addition or deletion of a node. This is prohibitively inefficient,
especially in a real-world situation where a graph may have a hundred or more nodes. Further,
even if a graph has very few arcs so that the adjacency matrix is sparse, space must be reserved
for every possible arc between two nodes, whether or not such an arc exists. If the graph contains
n nodes, a total of n² locations must be used.
The remedy is to use a linked structure, allocating and freeing nodes from an available
pool. This is similar to the methods used to represent dynamic binary and general trees. In the
linked representation of trees, each allocated node corresponds to a tree node. This is possible
because each tree node is the son of only one other tree node and is therefore contained in only a
single list of sons. However, in a graph an arc may exist between any two graph nodes. It is
possible to keep an adjacency list for every node in a graph and a node might find itself on many
different adjacency lists. But this requires that each allocated node contain a variable number of
pointers, depending on the number of nodes to which it is adjacent.
An alternative is to construct a multilinked structure in the following way. The nodes of
the graph are represented by a linked list of header nodes. Each such header node contains three
fields: info, nextnode, and arcptr. If p points to a header node representing a graph node a,
info(p) contains any information associated with graph node a. nextnode(p) is a pointer to the
header node representing the next graph node, if any. Each header node is at the head of a list of
nodes of a second type called list nodes. This list is called the adjacency list. Each node on an
adjacency list represents an arc of the graph. arcptr(p) points to the adjacency list of nodes
representing the arcs emanating from the graph node a.
Each adjacency list node contains two fields: ndptr and nextarc. If q points to a list node
representing an arc <A,B>, ndptr(q) is a pointer to the header node representing the graph node
B. nextarc(q) points to a list node representing the next arc emanating from graph node A, if
any. Each list node is contained in a single adjacency list representing all arcs emanating from a
given graph node. The term allocated node is used to refer to either a header or a list node of a
multilinked structure representing a graph.
A sample header node representing a graph node
A sample list node representing an arc
Fig (a)
Fig (b) A graph
Fig (c) Linked representation of a graph
Fig illustrates this representation. If each graph node carries some information but the
arcs do not, two types of allocated nodes are needed: one for header nodes (graph nodes) and the
other for adjacency list nodes (arcs). These are illustrated in Fig (a). Each header node contains
an info field and two pointers. The first of these is to the adjacency list of arcs emanating from
the graph node, and the second is to the next header node in the graph. Each arc node contains
two pointers, one to the next arc node in the adjacency list and the other to the header node
representing the graph node that terminates the arc. Fig (b) depicts a graph and Fig(c) its linked
representation
Nodes are declared using the array implementation as
struct nodetype {
    int info;
    int point;
    int next;
};
struct nodetype node[MAXNODES];
In the case of a header node, node[p] represents a graph node A, node[p].info represents
the information associated with the graph node A, node[p].next points to the next graph node,
and node[p].point points to the first list node representing an arc emanating from A. In the case
of a list node, node[p] represents an arc <A,B>, node[p].info represents the weight of the arc,
node[p].next points to the next arc emanating from A, and node[p] .point points to the header
node representing the graph node B.
In a dynamic implementation, the nodes are declared as follows:
struct nodetype {
    int info;
    struct nodetype *point;
    struct nodetype *next;
};
struct nodetype *nodeptr;
We now consider the implementation of the primitive graph operations using the linked
representation. The operation joinwt(p, q, wt) accepts two pointers p and q to two header
nodes and creates an arc between them with weight wt. If an arc already exists between them,
that arc's weight is set to wt.
joinwt(p, q, wt)
int p, q, wt;
{
    int r, r2;

    r2 = -1;
    r = node[p].point;
    while (r >= 0 && node[r].point != q) {
        r2 = r;
        r = node[r].next;
    }
    if (r >= 0) {
        /* an arc from p to q already exists; reset its weight */
        node[r].info = wt;
        return;
    }
    /* otherwise create a new arc node */
    r = getnode();
    node[r].point = q;
    node[r].next = -1;
    node[r].info = wt;
    (r2 < 0) ? (node[p].point = r) : (node[r2].next = r);
}
The operation remv(p, q) accepts pointers to two header nodes and removes the arc between
them, if one exists.
remv(p, q)
int p, q;
{
    int r, r2;

    r2 = -1;
    r = node[p].point;
    while (r >= 0 && node[r].point != q) {
        r2 = r;
        r = node[r].next;
    }
    if (r >= 0) {
        /* r points to the arc from p to q; unlink and free it */
        (r2 < 0) ? (node[p].point = node[r].next)
                 : (node[r2].next = node[r].next);
        freenode(r);
    }
    return;
}
The function adjacent(p, q) accepts pointers to two header nodes and determines whether
node(q) is adjacent to node(p).
adjacent(p, q)
int p, q;
{
    int r;

    r = node[p].point;
    while (r >= 0)
        if (node[r].point == q)
            return (TRUE);
        else
            r = node[r].next;
    return (FALSE);
}
The function findnode(graph, x) returns a pointer to a header node with information field x if
such a header node exists, and returns the null pointer otherwise.
findnode(graph, x)
int graph;
int x;
{
    int p;

    p = graph;
    while (p >= 0)
        if (node[p].info == x)
            return (p);
        else
            p = node[p].next;
    return (-1);
}
The function addnode(&graph,x) adds a node with information field x to a graph and returns a
pointer to that node.
addnode(pgraph, x)
int *pgraph;
int x;
{
    int p;

    p = getnode();
    node[p].info = x;
    node[p].point = -1;
    node[p].next = *pgraph;
    *pgraph = p;
    return (p);
}
Consider the difference between the adjacency matrix representation and the linked
representation of graphs. Implicit in the matrix representation is the ability to traverse a row or
column of the matrix. Traversing a row is equivalent to identifying all arcs emanating from a
given node. This can be done efficiently in the linked representation by traversing the list of arc
nodes starting at a given header node. Traversing a column of an adjacency matrix, however, is
equivalent to identifying all arcs that terminate at a given node; there is no corresponding
method for accomplishing this under the linked representation. The linked representation could
be modified to include two lists emanating from each header node: one for the arcs emanating
from the graph node and the other for the arcs terminating at the graph node. However, this
would require allocating two nodes for each arc, thus increasing the complexity of adding or
deleting an arc. Alternatively, each arc node could be placed on two lists. In this case, an arc
node would contain four pointers: one to the next arc emanating from the same node, one to the
next arc terminating at the same node, one to the header node at which it terminates, and one to
the header node from which it emanates. A header node would then contain three pointers: one
to the next header node, one to the list of arcs emanating from it, and one to the list of arcs
terminating at it.
GRAPH TRAVERSALS
There are a number of approaches used for solving problems on graphs. One of the
most important approaches is based on the notion of systematically visiting all the vertices
and edges of a graph. The reason for this is that these traversals impose a type of tree structure
(or generally a forest) on the graph, and trees are usually much easier to reason about than
general graphs.
Breadth-first search
Given a graph G = (V, E), breadth-first search starts at some source vertex s and
discovers which vertices are reachable from s. Define the distance between a vertex v and s to
be the minimum number of edges on a path from s to v. Breadth-first search discovers
vertices in increasing order of distance, and hence can be used as an algorithm for computing
shortest paths.
BFS defines an inverted tree (an acyclic directed graph in which the source is
the root, and every other node has a unique path to the root). If we reverse these edges we get
a rooted unordered tree called a BFS tree for G. (Note that there are many potential BFS trees
for a given graph, depending on where the search starts and in what order vertices are placed
on the queue.) These edges of G are called tree edges and the remaining edges of G are called
cross edges. It is not hard to prove that if G is an undirected graph, then cross edges
always go between two nodes that are at most one level apart in the BFS tree.
Initially all vertices (except the source) are colored white, meaning that they are
undiscovered. When a vertex has first been discovered, it is colored gray (and is part of the
frontier). When a gray vertex is processed, then it becomes black. The search makes use of a
queue, a first-in first-out list, where elements are removed in the same order they are inserted.
The first item in the queue (the next to be removed) is called the head of the queue. We will
also maintain arrays
color[u], which holds the color of vertex u (either white, gray or black);
pred[u], which points to the predecessor of u (i.e., the vertex that first discovered u);
d[u], the distance from s to u.
Only the color is really needed for the search; we include the other arrays because some
applications of BFS use this additional information.
BFS Algorithm
BFS Example
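A minimal sketch of this BFS in C, using an adjacency matrix and an explicit queue; the matrix representation, array sizes, and names here are assumptions for illustration:

```c
#include <assert.h>

#define MAXV  20
#define WHITE 0
#define GRAY  1
#define BLACK 2

int adjm[MAXV][MAXV];  /* adjm[u][v] == 1 if edge (u,v) exists */
int color[MAXV], pred[MAXV], dist[MAXV];

/* breadth-first search from source s on an n-vertex graph */
void bfs(int n, int s) {
    int queue[MAXV], front = 0, rear = 0;
    for (int v = 0; v < n; v++) {
        color[v] = WHITE;  /* undiscovered */
        pred[v] = -1;
        dist[v] = -1;
    }
    color[s] = GRAY;       /* source starts on the frontier */
    dist[s] = 0;
    queue[rear++] = s;
    while (front < rear) {
        int u = queue[front++];       /* head of the queue */
        for (int v = 0; v < n; v++)
            if (adjm[u][v] && color[v] == WHITE) {
                color[v] = GRAY;      /* v joins the frontier */
                pred[v] = u;          /* u discovered v */
                dist[v] = dist[u] + 1;
                queue[rear++] = v;
            }
        color[u] = BLACK;  /* finished processing u */
    }
}
```

Because vertices leave the queue in the order they were discovered, dist[v] is the minimum number of edges on a path from s to v.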
Depth-First Search
We assume we are given a directed graph G = (V, E). The same algorithm works for
undirected graphs (but the resulting structure imposed on the graph is different). We use four
auxiliary arrays.
An array color[u] maintains the status of each vertex; white means undiscovered, gray means
discovered but not finished processing, and black means finished.
Predecessor pointers pred[u] store the vertex that discovered a given vertex.
When a vertex u is first discovered, a counter is stored in d[u].
When the processing of a vertex is finished, a counter is stored in f[u].
DFS Algorithm
DFS Example
The running time of DFS is Θ(V + E). This is somewhat harder to see than the BFS analysis,
because the algorithm contains recursive calls.
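The four arrays can be maintained by a recursive sketch in C; the adjacency-matrix representation and the shared discovery/finish counter are assumptions, as in the BFS sketch:

```c
#include <assert.h>

#define MAXV  20
#define WHITE 0
#define GRAY  1
#define BLACK 2

int adjm[MAXV][MAXV]; /* adjm[u][v] == 1 if arc (u,v) exists */
int color[MAXV], pred[MAXV], d[MAXV], f[MAXV];
int timer;            /* shared discovery/finish counter */
int nverts;

void dfs_visit(int u) {
    color[u] = GRAY;
    d[u] = ++timer;                 /* discovery time */
    for (int v = 0; v < nverts; v++)
        if (adjm[u][v] && color[v] == WHITE) {
            pred[v] = u;            /* u discovered v */
            dfs_visit(v);
        }
    color[u] = BLACK;
    f[u] = ++timer;                 /* finish time */
}

void dfs(int n) {
    nverts = n;
    timer = 0;
    for (int v = 0; v < n; v++) {
        color[v] = WHITE;
        pred[v] = -1;
    }
    for (int v = 0; v < n; v++)
        if (color[v] == WHITE)
            dfs_visit(v);           /* one tree per component */
}
```

Note that a vertex finishes only after every vertex it discovered has finished, so the f[] values nest like balanced parentheses.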
Question Bank
Unit IV – GRAPHS
PART – A (2 MARKS)
1. Define Graph.
2. What is meant by directed graph?
3. Give a diagrammatic representation of an adjacency list representation of a graph.
4. What is meant by topological sort?
5. What is meant by acyclic graph?
6. What is meant by Shortest Path Algorithm?
7. What is meant by Single-Source Shortest path problem?
8. What is meant by Dijkstra's Algorithm?
9. What is minimum spanning tree?
10. Mention the types of algorithm.
11. Define NP-complete problems.
12. What is the space requirement of an adjacency list representation of a graph?
13. What is topological sort?
14. What is breadth-first search?
15. Define minimum spanning tree.
16. Define undirected graph.
17. What is a depth-first spanning tree?
18. What is biconnectivity?
19. What is an Euler circuit?
20. What is a directed graph?
21. What is meant by 'Hamiltonian Cycle'?
22. Define (i) indegree (ii) outdegree.
PART – B (16 MARKS)
1. Explain Prim's & Kruskal's Algorithm with an example.
2. Describe Dijkstra's algorithm with an example.
3. Explain how to find the shortest path using Dijkstra's algorithm with an example.
4. Explain the applications of DFS.
5. Find a minimum spanning tree for the graph using both Prim's and Kruskal's algorithms.
6. Explain in detail the simple topological sort pseudocode.
7. Write notes on NP-complete problems.
UNIT V - STORAGE MANAGEMENT
General Lists – Operations – Linked List Representation – Using Lists – Freeing List Nodes
– Automatic List Management : Reference Count Method – Garbage Collection – Collection
and Compaction
STORAGE MANAGEMENT
A programming language that incorporates a large number of data structures must
contain mechanisms for managing those structures and for controlling how storage is
assigned to them.
As data structures become more complex and provide greater capabilities to the user,
the management techniques grow in complexity as well.
So we look at several techniques for implementing dynamic allocation and freeing of
storage.
GENERAL LISTS
We have seen linked lists both as a concrete data structure and as a method of implementation
for such abstract data types as the stack, the queue, the priority queue, and the table. In those
implementations a list always contained elements of the same type.
An element is an object considered as part of a list; the element's value is the object
considered individually. A list need not contain elements of a single type: it may contain both
integers and characters, for example.
A pointer that is not contained within a list node is called an external pointer, and a pointer
that is contained within a list node is called an internal pointer.
It is possible for one or more elements of a list to themselves be lists.
If list is a nonempty list, head(list) is defined as the value of the first element of
list and tail(list) is defined as the list obtained by removing the first element of list.
Example:
list1 = (5, 12, 's', 147, 'a')
head(list1) = 5
tail(list1) = (12, 's', 147, 'a')
head(tail(list1)) = 12
tail(tail(list1)) = ('s', 147, 'a')
For example, consider the list list2 of Fig. This list contains four elements. Two of
these are integers (the first element is the integer 5; the third element is the integer 2) and the
other two are lists. The list that is the second element of list2 contains five elements, three of
which are integers (the first, second and fifth elements) and two of which are lists (the third
element is a list containing the integers 14, 9, and 3 and the fourth element is the null list [the
list with no elements]). The fourth element of list2 is a list containing the three integers 6, 3,
and 10.
list2 = (5, (3, 2, (14, 9, 3), (), 4), 2, (6, 3, 10))
tail(list2) = ((3, 2, (14, 9, 3), (), 4), 2, (6, 3, 10))
head(tail(list2)) = (3, 2, (14, 9, 3), (), 4)
head(head(tail(list2))) = 3
THE LINKED LIST REPRESENTATION OF A LIST
There are two Possible ways of implementing a list they are
Pointer method
Copy method
Pointer Method:
In the pointer method, the list L1 is represented by a pointer to it. To create a list L2
containing L1, a pointer to L1 is stored in the new list element rather than a copy of L1.
Thus the list L1 becomes a sublist of L2, and the nodes of L1 are used in two contexts:
as part of L1 and as part of L2.
Copy method:
In the copy method, the list L1 is copied before the new list element is added to it.
L1 still points to the original version, and the new copy is made a sub list of L2.
The copy method ensures that each node appears in only one context.
Examples:
FREEING LIST NODES
A node or a set of nodes could be an element and/or a sublist of one or several lists. In
such cases it is difficult to determine when such a node can be modified or freed.
Define a simple node as a node containing a simple data item, so that its info field
does not contain a pointer.
Generally, multiple use of simple nodes is not permitted. That is, operations on simple
nodes are performed by the copy method rather than the pointer method. Thus any simple
node deleted from a list can be freed immediately.
Example:
Which nodes can be freed and which must be retained? Clearly,
The list nodes of list9 (nodes 1, 2, 3, and 4) can be freed, since no other pointers reference
them. Freeing node 1 allows us to free nodes 11 and 12, since they too are accessed by no other
pointers.
Once node 11 is freed, nodes 7 and 8 can be considered. Node 7 can be freed because
each of the nodes containing a pointer to it (nodes 11 and 4) can be freed.
However, node 8 cannot be freed, since list11 points to it. List11 is an external
pointer; therefore the node to which it points may still be needed elsewhere in the program.
Since node 8 is kept, nodes 9 and 10 must also be kept. Finally, nodes 5 and 6 must be
kept because of the external pointer list12.
AUTOMATIC LIST MANAGEMENT
The programmer should code the solution to the problem with the assurance that the
system will automatically allocate any list nodes that are necessary for the lists being created
and that the system will make available for reuse any nodes that are no longer accessible.
This is called automatic list management.
There are two principal methods used in automatic list management:
1) Reference count method.
2) Garbage collection method.
THE REFERENCE COUNT METHOD
Under this method each node has an additional count field that keeps a count (called
the reference count) of the number of pointers (both internal and external) to that node. Each
time that the value of some pointer is set to point to a node, the reference count in that node is
increased by 1;
each time that the value of some pointer that had been pointing to a node is changed,
the reference count in that node is decreased by 1. When the reference count in any node
becomes 0, that node can be returned to the available list of free nodes.
For example, to execute the statement
l = tail(l);
The following operations must be performed:
p = l;
l = next(l);
next(p) = null;
reduce(p);
where the operation reduce(p) is defined recursively as follows:
if (p != null) {
    count(p)--;
    if (count(p) == 0) {
        r = next(p);
        reduce(r);
        if (nodetype(p) == lst)
            reduce(head(p));
        freenode(p);
    } /* end if */
} /* end if */
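The reduce operation can be made concrete with a minimal reference-counted node in C. This is an illustrative sketch (malloc-based nodes and the names rcnode/mknode are assumptions, not the text's array implementation):

```c
#include <assert.h>
#include <stdlib.h>

/* a minimal reference-counted list node */
struct rcnode {
    int count;           /* number of pointers to this node */
    int info;
    struct rcnode *next;
};

/* allocate a node whose count reflects the returned pointer */
struct rcnode *mknode(int info, struct rcnode *next) {
    struct rcnode *p = malloc(sizeof *p);
    p->count = 1;
    p->info = info;
    p->next = next;
    if (next)
        next->count++;   /* the new node now points to next */
    return p;
}

/* decrement p's count and free it (cascading) when it reaches 0 */
void reduce(struct rcnode *p) {
    if (p != NULL && --p->count == 0) {
        reduce(p->next);
        free(p);
    }
}
```

Freeing the head of a list cascades down the next chain, but stops at any node that another pointer still references.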
To illustrate the reference count method, consider again the list of the figure above. The
following set of statements creates that list:
list10 = crlist(14, 28);
list11 = crlist(crlist(5, 7));
l1 = addon(list11, 2);
m = crlist(l1, head(list11));
list9 = crlist(m, list10, 12, l1);
m = null;
l1 = null;
Fig below illustrates the creation of the list using the reference count method.
list9 = null;
The results are illustrated in Fig below, where freed nodes are illustrated using dashed lines.
The following sequence of events may take place:
The count of node 1 is set to 0; node 1 is freed.
The counts of nodes 2 and 11 are set to 0; nodes 2 and 11 are freed.
The counts of nodes 5 and 7 are set to 1. (Figure a)
The counts of nodes 3 and 12 are set to 0; nodes 3 and 12 are freed.
The count of node 4 is set to 0; node 4 is freed.
The count of node 9 is set to 1.
The count of node 7 is set to 0; node 7 is freed. (Figure b)
The count of node 8 is set to 1.
Only those nodes accessible from the external pointers list 10 and list11 remain allocated; all
others are freed.
One drawback of the reference count method is illustrated by the foregoing example.
The amount of work that must be performed by the system each time a list manipulation
statement is executed can be considerable.
Whenever a pointer value is changed, all nodes previously accessible from that
pointer can potentially be freed. Often, the work involved in identifying the nodes to be freed
is not worth the storage reclaimed.
After the program has terminated, a single pass reclaims all of its storage without
having to worry about reference count values.
There are two additional disadvantages to the reference count method.
1) The first is the additional space required in each node for the count.
2) The other disadvantage of the reference count method is that the count in the first
node of a recursive or circular list will never be reduced to 0.
GARBAGE COLLECTION
Under the reference count method, nodes are reclaimed when they become available
for reuse.
The other principal method of detecting and reclaiming free nodes is called garbage
collection.
When a request is made for additional nodes and there are none available, a system
routine called the garbage collector is called. This routine searches through all of the nodes in
the system, identifies those that are no longer accessible from an external pointer, and
restores the inaccessible nodes to the available pool.
Phases in Garbage Collection:
Garbage collection is usually done in two phases.
1) Marking Phase
2) Collection Phase
Marking Phase:
It involves marking all nodes that are accessible from an external pointer.
Collection Phase:
It involves proceeding sequentially through memory and freeing all nodes that
have not been marked.
One field must be set aside in each node to indicate whether a node has or has not
been marked.
The marking phase sets the mark field to true in each accessible node. As the
collection phase proceeds, the mark field in each accessible node is reset to false.
Thus, at the start and end of garbage collection, all mark fields are false. User
programs do not affect the mark fields.
Whenever the garbage collector is called, all user processing comes to a halt this is
undesirable in case of realtime application.
To avoid this, garbage collector must be called before all space has been exhausted so
that user processing can continue in whatever space is left, while the garbage collector
recovers additional space. .
Another important consideration is that users must be careful to ensure that all lists
are well formed and that all pointers are correct.
Thrashing:
It is possible that, at the time the garbage collection program is called, users are
actually using almost all the nodes that are allocated. Thus almost all nodes are accessible and
the garbage collector recovers very little additional space. After the system runs for a short
time, it will again be out of space; the garbage collector will again be called only to recover
very few additional nodes, and the vicious cycle starts again. This phenomenon, in which
system storage management routines such as garbage collection are executing almost all the
time, is called thrashing.
Algorithms for Garbage Collection
The simplest method for marking all accessible nodes is to mark initially all nodes that
are immediately accessible and then repeatedly pass through all of memory sequentially.
These sequential passes continue until no new nodes have been marked in an entire pass.
Unfortunately, this method is as inefficient as it is simple.
Hence a more efficient method is used. Suppose that a node n1 in the sequential pass has
previously been marked and that n1 includes a pointer to an unmarked node n2. Then node n2
is marked, and the sequential pass would ordinarily continue with the node that follows n1
sequentially in memory (backing up to n2 if n2 lies earlier in memory). Thus, by the time the
last node is reached, all accessible nodes have been marked.
Let us present this method as an algorithm. Assume that all list nodes in memory are
viewed as a sequential array.
#define NUMNODES ...
struct nodetype {
    int mark;
    int utype;
    union {
        int intgrinfo;
        char charinfo;
        int lstinfo;
    } info;
    int next;
} node[NUMNODES];
node[0] is used to represent a dummy node. node[0].info.lstinfo and node[0].next are
initialized to 0, node[0].mark to TRUE, and node[0].utype to LST; these values are never
changed throughout the system's execution.
The mark field in each node is initially FALSE and is set to TRUE by the marking
algorithm when a node is found to be accessible.
Assume that acc is an array containing external pointers to the immediately accessible
nodes, declared by
#define NUMACC ...
int acc[NUMACC];
The marking algorithm is as follows:
/* mark all immediately accessible nodes */
for (i = 0; i < NUMACC; i++)
    node[acc[i]].mark = TRUE;
/* begin a sequential pass through the array of nodes; */
/* i points to the node currently being examined */
i = 1;
while (i < NUMNODES) {
    j = i + 1;    /* j points to the node to be examined next */
    if (node[i].mark) {
        /* mark the nodes to which node[i] points */
        if (node[i].utype == LST &&
            node[node[i].info.lstinfo].mark != TRUE) {
            /* the information portion of node[i] */
            /* points to an unmarked node */
            node[node[i].info.lstinfo].mark = TRUE;
            if (node[i].info.lstinfo < j)
                j = node[i].info.lstinfo;
        } /* end if */
        if (node[node[i].next].mark != TRUE) {
            /* the list node following node[i] is an unmarked node */
            node[node[i].next].mark = TRUE;
            if (node[i].next < j)
                j = node[i].next;
        } /* end if */
    } /* end if */
    i = j;
} /* end while */
Although this method is better than successive sequential passes, it is still inefficient. A more
efficient approach is to use an auxiliary stack, in a manner very similar to depth-first
traversal of a graph.
As each list is traversed through the next fields of its constituent nodes, the utype field
of each node is examined.
for (i = 0; i < NUMACC; i++) {
    /* mark the next immediately accessible */
    /* node and place it on the stack */
    node[acc[i]].mark = TRUE;
    push(stack, acc[i]);
    while (empty(stack) != TRUE) {
        p = pop(stack);
        while (p != 0) {
            if (node[p].utype == LST &&
                node[node[p].info.lstinfo].mark != TRUE) {
                node[node[p].info.lstinfo].mark = TRUE;
                push(stack, node[p].info.lstinfo);
            } /* end if */
            if (node[node[p].next].mark == TRUE)
                p = 0;
            else {
                p = node[p].next;
                node[p].mark = TRUE;
            } /* end if */
        } /* end while */
    } /* end while */
} /* end for */
This algorithm is as efficient as we can hope for in terms of time, since each node to
be marked is visited only once.
One drawback of this approach is the space required for the stack. One solution is to use a
stack limited to some maximum size. If the stack is about to overflow, we can revert to the
sequential method given in the previous algorithm.
Fig above illustrates how this stacking mechanism works. Fig (a) shows a list before
the marking algorithm begins. The pointer p points to the node currently being processed, top
points to the stack top, and q is an auxiliary pointer. The mark field is shown as the first field
in each node.
Fig (b) shows the same list immediately after node 4 has been marked. The path taken
to node 4 is through the next fields of nodes 1, 2, and 3. This path can be retraced in reverse
order, beginning at top and following along the next fields.
Fig (c) shows the list after node 7 has been marked. The path to node 7 from the
beginning of the list was from node 1, through node[1].next to node 2, through node[2].lstinfo
to node 5, through node[5].next to node 6, and then from node[6].next to node 7. The same
fields that link together the stack are used to restore the list to its original form. Note that the
utype field in node 2 is stk rather than lst, to indicate that its lstinfo field, not its next field, is
being used as a stack pointer.
The algorithm that incorporates these ideas is known as the Schorr-Waite algorithm,
after its discoverers.
COLLECTION AND COMPACTION
Memory locations that are not used by any program, yet are unavailable to any user,
must be collected and compacted.
Collection phase
The purpose of this phase is to return to available memory all those locations that
were previously garbage.
It is easy to pass through memory sequentially, examine each node in turn, and return
unmarked nodes to available storage.
For example, the following algorithm could be used to return the unmarked nodes to
an available list headed by avail:
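The algorithm itself appears to have been lost in transcription; the following is a plausible sketch of the collection sweep in C, assuming the marked-node array of the previous section (the field layout here is simplified to a mark and a next field):

```c
#include <assert.h>

#define NUMNODES 8
#define FALSE 0
#define TRUE  1

struct { int mark; int next; } node[NUMNODES];
int avail; /* head of the available list */

/* Sweep memory sequentially: return unmarked nodes to the available
   list and reset the mark field of every node still in use. */
void collect(void) {
    avail = 0; /* node 0 acts as a dummy end-of-list marker */
    for (int i = NUMNODES - 1; i >= 1; i--)
        if (node[i].mark)
            node[i].mark = FALSE;  /* in use: just clear the mark */
        else {
            node[i].next = avail;  /* garbage: link onto avail */
            avail = i;
        }
}
```

Sweeping from high addresses to low leaves the available list ordered by increasing address, which is convenient for the compaction phase described next.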
After this algorithm has completed, all unused nodes are on the available list and all
nodes that are in use by programs have their mark fields turned off.
Need for Compaction:
Even when all nodes that are not in use are on the available list, the memory of the system
may not be in an optimal state for future use. This is because the interleaving of the occupied
nodes with available nodes may make much of the memory on the available list unusable.
For example, memory is often required in blocks (groups of contiguous nodes) rather
than as single discrete nodes one at a time. The memory request by a compiler for space in
which to store an array would require the allocation of such a block.
Compaction:
The process of moving all used nodes to one end of memory and all the available
memory to the other end is called compaction, and an algorithm that performs this process is
called a compaction algorithm. The compaction itself proceeds as outlined below.
The basic problem in developing an algorithm that moves portions of memory from
one location to another is to preserve the integrity of pointer values to the nodes being
moved.
For this reason the algorithm requires two sequential passes.
The first pass may be outlined as follows:
Update the memory location to be assigned to the next marked node, nd
Traverse the list of nodes pointed to by header (nd) and change the appropriate pointer
fields to point to the new location of nd.
Once this process has been completed for each marked node, a second pass through
memory will perform the actual compaction.
The second pass performs the following operations:
Update the memory location to be assigned to the next marked node, nd
Traverse the list of nodes pointed to by header (nd) and change the appropriate pointer
fields to point to the new location of nd.
Move nd to its new location.
Several points should be noted with respect to this algorithm.
First, node[0] is suitably initialized so that the algorithm need not test for special cases.
Second, the process of adjusting the pointers of all nodes on the list headed by the header
field of a particular node is performed twice: once during the first pass and once during the
second.
Time Requirements:
The time requirements of the algorithm are easy to analyze. There are two linear
passes through the complete array of memory. Each pass scans every node once and adjusts
the pointer fields of the nodes it encounters, so the time requirement is clearly O(n).
Our compaction algorithm does not require any additional storage in the nodes. Such
an algorithm is called a bounded-workspace algorithm.
Question Bank
UNIT V – STORAGE MANAGEMENT
PART – A (2 MARKS)
1. What is storage management?
2. What is general list?
3. What are the possible ways to represent linked list?
4. How to free the list nodes?
5. What is automatic list management?
6. What are two principal methods used in automatic list management?
7. What is Reference count method?
8. What is Garbage collection method?
9. What is Garbage collection?
10. What are the two Phases in Garbage Collection?
11. What is marking phase?
12. What is collection phase?
13. What is thrashing?
14. What is collection?
15. What is compaction?
PART - B (16 MARKS)
1. Explain about collection and compaction.
2. What is garbage collection? Write an algorithm for garbage collection.
3. Explain about Garbage collection.
4. Explain about automatic list management. Explain about two principal methods used in
automatic list management.
5. Explain about two possible ways of implementing a list. (8)
6. Explain about general list. (8)
****
University Question papers
B.E./B.Tech. DEGREE EXAMINATION, NOVEMBER/DECEMBER 2010
Third Semester
Computer Science and Engineering CS 2201 — DATA STRUCTURES
(Regulation 2008)
Time : Three hours
Maximum : 100 Marks
Answer ALL questions
PART A — (10 × 2 = 20 Marks)
1. What is an Abstract Data Type? What are all not concerned in an ADT?
2. Explain why binary search cannot be performed on a linked list.
3. Number the following binary tree (Fig. 1) to traverse it in.
(a) Preorder
(b) Inorder
4. What is a threaded binary tree?
5. Define a heap. How can it be used to represent a priority queue?
6. Give any two applications of binary heaps.
7. What is hashing?
8. Mention any four applications of a set.
9. What is breadth-first traversal?
10. Define spanning tree of a graph.
PART B — (5 × 16 = 80 Marks)
11. (a) Describe the data structures used to represent lists. (Marks 16)
Or
(b) Describe circular queue implementation in detail giving all the relevant
features. (Marks 16)
12. (a) Explain the tree traversals. Give all the essential aspects. (Marks 16)
Or
(b) Explain binary search tree ADT in detail. (Marks 16)
13. (a) Discuss in detail the B-tree. What are its advantages? (Marks 16)
Or
(b) Explain binary heaps in detail. Give its merits. (Marks 16)
14. (a) Explain separate chaining and extendible hashing. (Marks 16)
Or
(b) Explain in detail the dynamic equivalence problem. (Marks 16)
15. (a) Consider the following graph
(i) Obtain minimum spanning tree by Kruskal's algorithm. (Marks 10)
(ii) Explain Topological sort with an example. (Marks 6)
Or
(b) (i) Find the shortest path from 'a' to 'd' using Dijkstra's algorithm
in the graph in Figure 2 given in question 15(a). (Marks 10)
(ii) Explain Euler circuits with an example. (Marks 6)
B.E./B.Tech. DEGREE EXAMINATION, NOVEMBER/DECEMBER 2009
Third Semester
Computer Science and Engineering CS 2201 — DATA STRUCTURES
(Regulation 2008)
Time : Three hours
Maximum : 100 Marks
Answer ALL Questions
PART A — (10 × 2 = 20 Marks)
1. List out the areas in which data structures are applied extensively.
2. Convert the expression ((A+B)*C-(D-E)^(F+G)) to equivalent prefix and
postfix notations.
3. How many different trees are possible with 10 nodes?
4. What is an almost complete binary tree?
5. In an AVL tree, at what condition the balancing is to be done?
6. What is the bucket size, when the overlapping and collision occur at same
time?
7. Define graph.
8. What is a minimum spanning tree?
9. Define NP hard and NP complete.
10. What is meant by dynamic programming?
PART B — (5 × 16 = 80 Marks)
11. (a) (i) What is a linked list? Explain with suitable program segments any
four operations of a linked list. (Marks 12)
(ii) Explain with a pseudo code how a linear queue could be converted
into a circular queue. (Marks 4)
Or
(b) (i) What is a stack ADT? Write in detail about any three applications
of stack. (Marks 11)
(ii) With a pseudo code explain how a node can be inserted at a user
specified position of a doubly linked list. (Marks 5)
12. (a) (i) Discuss the various representations of a binary tree in memory with
suitable example. (Marks 8)
(ii) What are the basic operations that can be performed on a binary
tree? Explain each of them in detail with suitable example. (Marks 8)
Or
(b) (i) Give an algorithm to convert a general tree to binary tree. (Marks 8)
(ii) With an example, explain the algorithms of in order and post order
traversals on a binary search tree. (Marks 8)
13. (a) What is an AVL tree? Explain the rotations of an AVL tree. (Marks 16)
Or
(b) (i) Explain the binary heap in detail. (Marks 8)
(ii) What is hashing? Explain any two methods to overcome collision
problem of hashing. (Marks 8)
14. (a) (i) Explain Dijkstra's algorithm and solve the single source shortest
path problem with an example. (Marks 12)
(ii) Illustrate with an example, the linked list representation of graph.
(Marks 4)
Or
(b) (i) Write the procedures to perform the BFS and DFS search of a
graph. (Marks 8)
(ii) Explain Prim's algorithm to construct a minimum spanning tree
from an undirected graph. (Marks 8)
15. (a) (i) With an example, explain how will you measure the efficiency of an
algorithm. (Marks 8)
(ii) Analyze the linear search algorithm with an example. (Marks 8)
Or
(b) Explain how the travelling salesman problem can be solved using greedy
algorithm. (Marks 16)