GREEDY ALGORITHMS
UNIT IV
TOPICS TO BE COVERED
• Fractional Knapsack problem
• Huffman Coding
• Single source shortest paths
• Minimum Spanning Trees
• Task Scheduling Problem
• Backtracking: Introduction and N-Queens problem
Overview
• Like dynamic programming, the greedy method is used to solve optimization problems.
• Problems exhibit optimal substructure.
• Problems also exhibit the greedy-choice property.
– A greedy algorithm always makes the choice that looks best at the moment.
– Make a locally optimal choice in the hope of getting a globally optimal solution.
– Greedy algorithms do not always yield optimal solutions, but for many problems they do.
– The greedy method is quite powerful and works well for a wide range of problems.
Elements of the greedy strategy
• Determine the optimal substructure of the problem.
• Develop a recursive solution.
• Prove that at any stage of the recursion, one of the optimal choices is the greedy choice; thus, it is always safe to make the greedy choice.
• Show that all but one of the subproblems induced by having made the greedy choice are empty.
• Develop a recursive algorithm that implements the greedy strategy.
• Convert the recursive algorithm to an iterative algorithm.
The Fractional Knapsack Problem
• Given: a set S of n items, with each item i having
– bi, a positive benefit
– wi, a positive weight
• Goal: choose items with maximum total benefit but with weight at most W.
• If we are allowed to take fractional amounts, then this is the fractional knapsack problem.
– In this case, we let xi denote the amount we take of item i.
– Objective: maximize Σ over i ∈ S of bi (xi / wi)
– Constraint: Σ over i ∈ S of xi ≤ W
Example
• Given: a set S of n items, with each item i having
– bi, a positive benefit
– wi, a positive weight
• Goal: choose items with maximum total benefit but with weight at most W (here, a "knapsack" of 10 ml).

Items:          1      2      3      4      5
Weight:         4 ml   8 ml   2 ml   6 ml   1 ml
Benefit:        $12    $32    $40    $30    $50
Value ($/ml):   3      4      20     5      50

Solution:
• 1 ml of item 5
• 2 ml of item 3
• 6 ml of item 4
• 1 ml of item 2
The Fractional Knapsack Algorithm
• Greedy choice: keep taking the item with the highest value (benefit-to-weight ratio).
– Using a heap-based priority queue to store the items, the time complexity is O(n log n).
• Correctness: suppose there were a better solution.
– Then there is an item i with higher value than a chosen item j (i.e., vj < vi); if we replace some of j with i, we get a better solution.
– Thus, there is no better solution than the greedy one.
Algorithm fractionalKnapsack(S, W)
Input: set S of items with benefit bi and weight wi; max. weight W
Output: amount xi of each item i to maximize benefit with weight at most W
for each item i in S
    xi ← 0
    vi ← bi / wi              {value}
w ← 0                         {current total weight}
while w < W and S is not empty
    remove item i with highest vi from S
    xi ← min{wi, W − w}
    w ← w + min{wi, W − w}
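The pseudocode above can be sketched in runnable Python (not part of the original slides); heapq is a min-heap, so negating the keys simulates the max-priority queue on value:

```python
import heapq

def fractional_knapsack(items, W):
    """items: list of (benefit, weight) pairs; W: capacity.
    Returns (total_benefit, amounts) where amounts[i] is how much of item i is taken."""
    # Heap keyed on negated value vi = bi / wi, so the highest-value item pops first.
    heap = [(-b / w, i, b, w) for i, (b, w) in enumerate(items)]
    heapq.heapify(heap)

    amounts = [0.0] * len(items)
    total_benefit = 0.0
    remaining = W
    while remaining > 0 and heap:
        neg_value, i, b, w = heapq.heappop(heap)
        take = min(w, remaining)              # xi <- min{wi, W - w}
        amounts[i] = take
        total_benefit += (-neg_value) * take  # benefit accrues at bi/wi per unit
        remaining -= take
    return total_benefit, amounts
```

On the slide's example (items 1–5 with benefits $12, $32, $40, $30, $50, weights 4, 8, 2, 6, 1 ml, and W = 10 ml) this reproduces the stated solution: 1 ml of item 5, 2 ml of item 3, 6 ml of item 4, 1 ml of item 2, for a total benefit of $124.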
Greedy versus dynamic programming (0-1 Knapsack Problem)
Huffman codes
• Huffman codes are a widely used and very effective technique for compressing data.
• Savings of 20% to 90% are typical, depending on the characteristics of the data being compressed.
• We consider the data to be a sequence of characters.
• Huffman's greedy algorithm uses a table of the frequencies of occurrence of the characters to build up an optimal way of representing each character as a binary string.
Example
• Suppose we have a 100,000-character data file that we wish to store compactly.
• We observe that the characters in the file occur with the frequencies given by the figure below.
• That is, only six different characters appear, and the character 'a' occurs 45,000 times.

Character                  a     b     c     d     e     f
Frequency (in thousands)   45    13    12    16    9     5
Fixed-length codeword      000   001   010   011   100   101
Variable-length codeword   0     101   100   111   1101  1100
Example
• There are many ways to represent such a file of information.
• We consider the problem of designing a binary character code (or code for short) wherein each character is represented by a unique binary string.
• If we use a fixed-length code, we need 3 bits to represent six characters: a = 000, b = 001, . . . , f = 101.
• This method requires 300,000 bits to code the entire file.
• Can we do better?
Example
• A variable-length code can do considerably better than a fixed-length code, by giving frequent characters short codewords and infrequent characters long codewords.
• The figure above shows such a code:
– the 1-bit string 0 represents 'a', and
– the 4-bit string 1100 represents 'f'.
• This code requires
(45 · 1 + 13 · 3 + 12 · 3 + 16 · 3 + 9 · 4 + 5 · 4) · 10³ = 224 · 10³ bits
to represent the file, a savings of approximately 25%.
• In fact, this is an optimal character code for this file.
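The arithmetic above can be checked directly (a quick sketch, with the frequencies and codeword lengths taken from the table):

```python
freq = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}  # in thousands
var_len = {'a': 1, 'b': 3, 'c': 3, 'd': 3, 'e': 4, 'f': 4}   # variable-length codeword lengths

fixed_bits = sum(freq.values()) * 3 * 1000                    # 3 bits per character
variable_bits = sum(freq[c] * var_len[c] for c in freq) * 1000
print(fixed_bits, variable_bits)   # 300000 224000
savings = 1 - variable_bits / fixed_bits                      # about 25%
```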
Huffman codes (Basic Concepts)
• We consider here only codes in which no codeword is also a prefix of some other codeword.
• Such codes are called prefix codes.
• Encoding is always simple for any binary character code; we just concatenate the codewords representing each character of the file.
• For example, we code the 3-character file abc as 0·101·100 = 0101100, where we use "·" to denote concatenation.
Huffman codes (Basic Concepts)
• Prefix codes are desirable because they simplify decoding.
• Since no codeword is a prefix of any other, the codeword that begins an encoded file is unambiguous.
• We can simply identify the initial codeword, translate it back to the original character, and repeat the decoding process on the remainder of the encoded file.
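This decoding loop can be sketched in Python (the decode helper and the three-character code below are illustrative, not from the slides):

```python
def decode(bits, code):
    """Decode a prefix-coded bit string by repeatedly stripping the
    unique codeword that begins the remaining input."""
    inverse = {codeword: char for char, codeword in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:       # unambiguous: no codeword is a prefix of another
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

code = {'a': '0', 'b': '101', 'c': '100'}
print(decode('0101100', code))   # abc
```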
Huffman codes (Construction)
• Huffman invented a greedy algorithm that constructs an optimal prefix code called a Huffman code.
• In line with our observations, its proof of correctness relies on the greedy-choice property and optimal substructure.
• In the pseudo-code that follows, we assume that C is a set of n characters and that each character c ∈ C is an object with a defined frequency f[c].
• The algorithm builds the tree T corresponding to the optimal code in a bottom-up manner.
• It begins with a set of |C| leaves and performs a sequence of |C| − 1 "merging" operations to create the final tree.
• A min-priority queue Q, keyed on f, is used to identify the two least frequent objects to merge together.
• The result of the merger of two objects is a new object whose frequency is the sum of the frequencies of the two objects that were merged.
Huffman codes (Algorithm)
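The merging procedure described above can be sketched in Python, with heapq standing in for the min-priority queue Q (this is a sketch, not the slides' original pseudocode; the huffman name and tuple-based tree encoding are illustrative):

```python
import heapq
from itertools import count

def huffman(freq):
    """Build a Huffman code from a {char: frequency} table.
    Returns a {char: codeword} dict."""
    tiebreak = count()  # avoids comparing tree objects when frequencies tie
    # Each heap entry: (frequency, tiebreak, tree); a tree is a char or a (left, right) pair.
    heap = [(f, next(tiebreak), c) for c, f in freq.items()]
    heapq.heapify(heap)
    # |C| - 1 merges: repeatedly merge the two least frequent subtrees.
    for _ in range(len(freq) - 1):
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (t1, t2)))
    # Walk the final tree: left edge = 0, right edge = 1.
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"   # lone character: give it a 1-bit code
    walk(heap[0][2], "")
    return codes
```

On the example table, the exact codewords can differ from the slide's (ties and left/right choices are arbitrary), but the codeword lengths match: 1 bit for 'a', 3 bits for 'b', 'c', 'd', and 4 bits for 'e', 'f', for the same optimal 224 · 10³ total.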
Huffman codes (Example)

Character                  a     b     c     d     e     f
Frequency (in thousands)   45    13    12    16    9     5
Fixed-length codeword      000   001   010   011   100   101
Variable-length codeword   0     101   100   111   1101  1100

(Figures: step-by-step construction of the Huffman tree by repeatedly merging the two least frequent subtrees.)

The resulting code:

Char.  Code
a      0
b      101
c      100
d      111
e      1101
f      1100
Single Source Shortest Path
• We are given a weighted, directed graph G = (V, E), with weight function w : E → R mapping edges to real-valued weights.
• The weight of path p = ⟨v0, v1, . . . , vk⟩ is the sum of the weights of its constituent edges:
w(p) = Σ for i = 1 to k of w(v_{i−1}, v_i)
• We define the shortest-path weight from u to v by
δ(u, v) = min{w(p) : p is a path from u to v}, if such a path exists; otherwise δ(u, v) = ∞.
Single Source Shortest Path
• A shortest path from vertex u to vertex v is then defined as any path p with weight w(p) = δ(u, v).
Relaxation
• The algorithms use the technique of relaxation. For each vertex v ∈ V, we maintain an attribute d[v], which is an upper bound on the weight of a shortest path from source s to v.
• We call d[v] a shortest-path estimate. We initialize the shortest-path estimates and predecessors by the following Θ(V)-time procedure.

Single Source Shortest Path
INITIALIZE-SINGLE-SOURCE(G, s)
1  for each vertex v ∈ V[G]
2      do d[v] ← ∞
3         π[v] ← NIL
4  d[s] ← 0
• After initialization, π[v] = NIL for all v ∈ V, d[s] = 0, and d[v] = ∞ for v ∈ V − {s}.
Single Source Shortest Path
• The process of relaxing an edge (u, v) consists of testing whether we can improve the shortest path to v found so far by going through u and, if so, updating d[v] and π[v].
• A relaxation step may decrease the value of the shortest-path estimate d[v] and update v's predecessor field π[v].
• The following code performs a relaxation step on edge (u, v).

RELAX(u, v, w)
1  if d[v] > d[u] + w(u, v)
2      then d[v] ← d[u] + w(u, v)
3           π[v] ← u
Single Source Shortest Path: Dijkstra's algorithm
• Dijkstra's algorithm solves the single-source shortest-paths problem on a weighted, directed graph G = (V, E) for the case in which all edge weights are nonnegative.
• We assume that w(u, v) ≥ 0 for each edge (u, v) ∈ E.
• Dijkstra's algorithm maintains a set S of vertices whose final shortest-path weights from the source s have already been determined.
Single Source Shortest Path: Dijkstra's algorithm
• The algorithm repeatedly selects the vertex u ∈ V − S with the minimum shortest-path estimate, adds u to S, and relaxes all edges leaving u.
• In the following implementation, we use a min-priority queue Q of vertices, keyed by their d values.

DIJKSTRA(G, w, s)
1  INITIALIZE-SINGLE-SOURCE(G, s)
2  S ← ∅
3  Q ← V[G]
4  while Q ≠ ∅
5      do u ← EXTRACT-MIN(Q)
6         S ← S ∪ {u}
7         for each vertex v ∈ Adj[u]
8             do RELAX(u, v, w)
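The procedure above can be sketched in runnable Python (not from the slides); since heapq has no DECREASE-KEY, the sketch pushes duplicate entries and skips stale ones, which is a standard substitute:

```python
import heapq

def dijkstra(graph, s):
    """graph: {u: [(v, weight), ...]} adjacency list with nonnegative weights.
    Returns (d, pi): shortest-path estimates and predecessors from source s."""
    d = {v: float('inf') for v in graph}    # INITIALIZE-SINGLE-SOURCE
    pi = {v: None for v in graph}
    d[s] = 0
    Q = [(0, s)]                            # min-priority queue keyed on d values
    done = set()                            # the set S of finalized vertices
    while Q:
        du, u = heapq.heappop(Q)            # EXTRACT-MIN
        if u in done:
            continue                        # stale entry: u was already finalized
        done.add(u)
        for v, w in graph[u]:               # RELAX every edge leaving u
            if d[v] > d[u] + w:
                d[v] = d[u] + w
                pi[v] = u
                heapq.heappush(Q, (d[v], v))
    return d, pi
```

The example graph in the test is illustrative; any graph with nonnegative edge weights works.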
(Figures: step-by-step execution of Dijkstra's algorithm on an example graph.)
Task Scheduling
• Given: a set T of n tasks, each having
– a start time, si
– a finish time, fi (where si < fi)
• Goal: perform all the tasks using a minimum number of "machines."

(Figure: tasks scheduled on Machines 1–3 along a timeline from 1 to 9.)
Task Scheduling Algorithm
• Greedy choice: consider tasks by their start time and use as few machines as possible with this order.

Algorithm taskSchedule(T)
Input: set T of tasks with start time si and finish time fi
Output: non-conflicting schedule with minimum number of machines
m ← 0                        {number of machines}
while T is not empty
    remove task i with smallest si
    if there is a machine j with no task conflicting with i then
        schedule i on machine j
    else
        m ← m + 1
        schedule i on machine m
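The algorithm above can be sketched in Python (not from the slides); here "a machine with no conflicting task" is taken to mean one whose last task finishes by si, tracked with a min-heap of finish times so only the machine count is returned:

```python
import heapq

def task_schedule(tasks):
    """tasks: list of (start, finish) intervals.
    Returns the minimum number of machines needed to run all tasks
    without conflicts (tasks conflict if their intervals overlap)."""
    machines = []  # min-heap of finish times, one entry per machine in use
    for s, f in sorted(tasks):               # consider tasks by start time
        if machines and machines[0] <= s:
            heapq.heapreplace(machines, f)   # reuse the machine that frees up earliest
        else:
            heapq.heappush(machines, f)      # no free machine: allocate a new one
    return len(machines)
```

On the slides' example set [1,4], [1,3], [2,5], [3,7], [4,7], [6,9], [7,8], this returns 3 machines, matching the figure.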
Task Scheduling Algorithm
• Running time: given a set of n tasks specified by their start and finish times, algorithm taskSchedule produces a schedule of the tasks with the minimum number of machines in O(n log n) time.
– Use a heap-based priority queue to store the tasks, with the start times as the priorities.
– Finding the task with the earliest start time takes O(log n) time.
Example
• Given: a set T of n tasks, each having
– a start time, si
– a finish time, fi (where si < fi)
– [1,4], [1,3], [2,5], [3,7], [4,7], [6,9], [7,8] (ordered by start)
• Goal: perform all tasks on the minimum number of machines.

(Figure: the seven tasks scheduled on Machines 1–3 along a timeline from 1 to 9.)
Backtracking
• Suppose you have to make a series of decisions, among various choices, where
– you don't have enough information to know what to choose,
– each decision leads to a new set of choices, and
– some sequence of choices (possibly more than one) may be a solution to your problem.
• Backtracking is a methodical way of trying out various sequences of decisions, until you find one that "works."
Backtracking (animation)

(Figure: a decision tree explored from the start node; several branches end in dead ends before one path reaches success.)
Terminology I
• A tree is composed of nodes.
• There are three kinds of nodes:
– the (one) root node,
– internal nodes, and
– leaf nodes.
• Backtracking can be thought of as searching a tree for a particular "goal" leaf node.
Terminology II
• Each non-leaf node in a tree is a parent of one or more other nodes (its children).
• Each node in the tree, other than the root, has exactly one parent.

(Figure: two small trees with a parent node and its children labeled.)

• Usually, however, we draw our trees downward, with the root at the top.
The backtracking algorithm
• Backtracking is really quite simple; we "explore" each node, as follows.
• To "explore" node N:
1. If N is a goal node, return "success".
2. If N is a leaf node, return "failure".
3. For each child C of N:
   3.1. Explore C.
        3.1.1. If C was successful, return "success".
4. Return "failure".
FOUR QUEENS PROBLEM

(Figure: the backtracking search tree for the four-queens problem, with partial placements numbered 1–4 and the dead ends encountered along the way.)
8 QUEENS PROBLEM

(Figure: an 8 × 8 board, rows 1–8 and columns 1–8, showing one solution with exactly one queen Q in each row.)
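The board search in these slides is an instance of the explore procedure from the backtracking algorithm; a Python sketch (the solve_queens name and list-based board encoding are illustrative, not from the slides):

```python
def solve_queens(n):
    """Place n queens on an n x n board so none attack each other.
    Returns one solution as a list: board[row] = column of that row's queen,
    or None if no solution exists."""
    board = []

    def safe(col):
        row = len(board)
        for r, c in enumerate(board):
            if c == col or abs(c - col) == row - r:  # same column or diagonal
                return False
        return True

    def explore():
        if len(board) == n:          # goal node: every row has a queen
            return True
        for col in range(n):         # children: each safe column for the next row
            if safe(col):
                board.append(col)
                if explore():        # explore C; success propagates up
                    return True
                board.pop()          # dead end: undo the choice and backtrack
        return False                 # no child succeeded: failure
    return board if explore() else None
```

For n = 4 the search backtracks out of the column-0 subtree (the dead ends in the four-queens figure) before finding a solution; for n = 8 it finds one of the 92 solutions to the eight-queens problem.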
The End