GREEDY ALGORITHMS
UNIT IV
TOPICS TO BE COVERED
• Fractional Knapsack problem
• Huffman Coding
• Single source shortest paths
• Minimum Spanning Trees
• Task Scheduling Problem
• Backtracking: Introduction and N-Queens problem
Overview
• Like dynamic programming, the greedy method is used to solve optimization problems.
• Problems exhibit optimal substructure.
• Problems also exhibit the greedy-choice property.
– A greedy algorithm always makes the choice that looks best at the moment.
– Make a locally optimal choice in the hope of getting a globally optimal solution.
– Greedy algorithms do not always yield optimal solutions, but for many problems they do.
– The greedy method is quite powerful and works well for a wide range of problems.
Elements of the greedy strategy
• Determine the optimal substructure of the problem.
• Develop a recursive solution.
• Prove that at any stage of the recursion, one of the optimal choices is the greedy choice; thus, it is always safe to make the greedy choice.
• Show that all but one of the subproblems induced by having made the greedy choice are empty.
• Develop a recursive algorithm that implements the greedy strategy.
• Convert the recursive algorithm to an iterative algorithm.
The Fractional Knapsack Problem
• Given: a set S of n items, with each item i having
– bi, a positive benefit
– wi, a positive weight
• Goal: choose items with maximum total benefit but with weight at most W.
• If we are allowed to take fractional amounts, then this is the fractional knapsack problem.
– In this case, we let xi denote the amount we take of item i.
– Objective: maximize Σ over i ∈ S of bi (xi / wi)
– Constraint: Σ over i ∈ S of xi ≤ W
Example
• Given: a set S of n items, with each item i having
– bi, a positive benefit
– wi, a positive weight
• Goal: choose items with maximum total benefit but with weight at most W (here, a "knapsack" of 10 ml).

Items:          1      2      3      4      5
Weight:         4 ml   8 ml   2 ml   6 ml   1 ml
Benefit:        $12    $32    $40    $30    $50
Value ($/ml):   3      4      20     5      50

Solution:
• 1 ml of item 5
• 2 ml of item 3
• 6 ml of item 4
• 1 ml of item 2
The Fractional Knapsack Algorithm
• Greedy choice: keep taking the item with the highest value (benefit-to-weight ratio).
– Using a heap-based priority queue to store the items, the time complexity is O(n log n).
• Correctness: suppose there were a better solution.
– Then there is an item i with higher value than a chosen item j (i.e., vj < vi); if we replace some of j with i, we get a better solution.
– Thus, there is no better solution than the greedy one.
Algorithm fractionalKnapsack(S, W)
Input: set S of items with benefit bi and weight wi; max. weight W
Output: amount xi of each item i to maximize benefit with weight at most W
for each item i in S
    xi ← 0
    vi ← bi / wi              {value}
w ← 0                         {current total weight}
while w < W and S is not empty
    remove item i with highest vi from S
    xi ← min{wi, W − w}
    w ← w + min{wi, W − w}
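The pseudocode above can be sketched in runnable Python (not part of the original slides); heapq is a min-heap, so negating the keys simulates the max-priority queue on value:

```python
import heapq

def fractional_knapsack(items, W):
    """items: list of (benefit, weight) pairs; W: capacity.
    Returns (total_benefit, amounts) where amounts[i] is how much of item i is taken."""
    # Heap keyed on negated value vi = bi / wi, so the highest-value item pops first.
    heap = [(-b / w, i, b, w) for i, (b, w) in enumerate(items)]
    heapq.heapify(heap)

    amounts = [0.0] * len(items)
    total_benefit = 0.0
    remaining = W
    while remaining > 0 and heap:
        neg_value, i, b, w = heapq.heappop(heap)
        take = min(w, remaining)              # xi <- min{wi, W - w}
        amounts[i] = take
        total_benefit += (-neg_value) * take  # benefit accrues at bi/wi per unit
        remaining -= take
    return total_benefit, amounts
```

On the slide's example (items 1–5 with benefits $12, $32, $40, $30, $50, weights 4, 8, 2, 6, 1 ml, and W = 10 ml) this reproduces the stated solution: 1 ml of item 5, 2 ml of item 3, 6 ml of item 4, 1 ml of item 2, for a total benefit of $124.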
Greedy versus dynamic programming (0-1 Knapsack Problem)
Huffman codes
• Huffman codes are a widely used and very effective technique for compressing data.
• Savings of 20% to 90% are typical, depending on the characteristics of the data being compressed.
• We consider the data to be a sequence of characters.
• Huffman's greedy algorithm uses a table of the frequencies of occurrence of the characters to build up an optimal way of representing each character as a binary string.
Example
• Suppose we have a 100,000-character data file that we wish to store compactly.
• We observe that the characters in the file occur with the frequencies given by the figure below.
• That is, only six different characters appear, and the character 'a' occurs 45,000 times.

Character                  a     b     c     d     e     f
Frequency (in thousands)   45    13    12    16    9     5
Fixed-length codeword      000   001   010   011   100   101
Variable-length codeword   0     101   100   111   1101  1100
Example
• There are many ways to represent such a file of information.
• We consider the problem of designing a binary character code (or code for short) wherein each character is represented by a unique binary string.
• If we use a fixed-length code, we need 3 bits to represent six characters: a = 000, b = 001, . . . , f = 101.
• This method requires 300,000 bits to code the entire file.
• Can we do better?
Example
• A variable-length code can do considerably better than a fixed-length code, by giving frequent characters short codewords and infrequent characters long codewords.
• The figure above shows such a code:
– the 1-bit string 0 represents 'a', and
– the 4-bit string 1100 represents 'f'.
• This code requires
(45 · 1 + 13 · 3 + 12 · 3 + 16 · 3 + 9 · 4 + 5 · 4) · 10³ = 224 · 10³ bits
to represent the file, a savings of approximately 25%.
• In fact, this is an optimal character code for this file.
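The arithmetic above can be checked directly (a quick sketch, with the frequencies and codeword lengths taken from the table):

```python
freq = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}  # in thousands
var_len = {'a': 1, 'b': 3, 'c': 3, 'd': 3, 'e': 4, 'f': 4}   # variable-length codeword lengths

fixed_bits = sum(freq.values()) * 3 * 1000                    # 3 bits per character
variable_bits = sum(freq[c] * var_len[c] for c in freq) * 1000
print(fixed_bits, variable_bits)   # 300000 224000
savings = 1 - variable_bits / fixed_bits                      # about 25%
```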
Huffman codes (Basic Concepts)
• We consider here only codes in which no codeword is also a prefix of some other codeword.
• Such codes are called prefix codes.
• Encoding is always simple for any binary character code; we just concatenate the codewords representing each character of the file.
• For example, we code the 3-character file abc as 0·101·100 = 0101100, where we use "·" to denote concatenation.
Huffman codes (Basic Concepts)
• Prefix codes are desirable because they simplify decoding.
• Since no codeword is a prefix of any other, the codeword that begins an encoded file is unambiguous.
• We can simply identify the initial codeword, translate it back to the original character, and repeat the decoding process on the remainder of the encoded file.
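This decoding loop can be sketched in Python (the decode helper and the three-character code below are illustrative, not from the slides):

```python
def decode(bits, code):
    """Decode a prefix-coded bit string by repeatedly stripping the
    unique codeword that begins the remaining input."""
    inverse = {codeword: char for char, codeword in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:       # unambiguous: no codeword is a prefix of another
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

code = {'a': '0', 'b': '101', 'c': '100'}
print(decode('0101100', code))   # abc
```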
Huffman codes (Construction)
• Huffman invented a greedy algorithm that constructs an optimal prefix code called a Huffman code.
• In line with our observations, its proof of correctness relies on the greedy-choice property and optimal substructure.
• In the pseudo-code that follows, we assume that C is a set of n characters and that each character c ∈ C is an object with a defined frequency f[c].
• The algorithm builds the tree T corresponding to the optimal code in a bottom-up manner.
• It begins with a set of |C| leaves and performs a sequence of |C| − 1 "merging" operations to create the final tree.
• A min-priority queue Q, keyed on f, is used to identify the two least frequent objects to merge together.
• The result of the merger of two objects is a new object whose frequency is the sum of the frequencies of the two objects that were merged.
Huffman codes (Algorithm)
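The merging procedure described above can be sketched in Python, with heapq standing in for the min-priority queue Q (this is a sketch, not the slides' original pseudocode; the huffman name and tuple-based tree encoding are illustrative):

```python
import heapq
from itertools import count

def huffman(freq):
    """Build a Huffman code from a {char: frequency} table.
    Returns a {char: codeword} dict."""
    tiebreak = count()  # avoids comparing tree objects when frequencies tie
    # Each heap entry: (frequency, tiebreak, tree); a tree is a char or a (left, right) pair.
    heap = [(f, next(tiebreak), c) for c, f in freq.items()]
    heapq.heapify(heap)
    # |C| - 1 merges: repeatedly merge the two least frequent subtrees.
    for _ in range(len(freq) - 1):
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (t1, t2)))
    # Walk the final tree: left edge = 0, right edge = 1.
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"   # lone character: give it a 1-bit code
    walk(heap[0][2], "")
    return codes
```

On the example table, the exact codewords can differ from the slide's (ties and left/right choices are arbitrary), but the codeword lengths match: 1 bit for 'a', 3 bits for 'b', 'c', 'd', and 4 bits for 'e', 'f', for the same optimal 224 · 10³ total.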
Huffman codes (Example)

Character                  a     b     c     d     e     f
Frequency (in thousands)   45    13    12    16    9     5
Fixed-length codeword      000   001   010   011   100   101
Variable-length codeword   0     101   100   111   1101  1100

(Figures: step-by-step construction of the Huffman tree by repeatedly merging the two least frequent subtrees.)

The resulting code:

Char.  Code
a      0
b      101
c      100
d      111
e      1101
f      1100
Single Source Shortest Path
• We are given a weighted, directed graph G = (V, E), with weight function w : E → R mapping edges to real-valued weights.
• The weight of path p = ⟨v0, v1, . . . , vk⟩ is the sum of the weights of its constituent edges:
w(p) = Σ for i = 1 to k of w(v_{i−1}, v_i)
• We define the shortest-path weight from u to v by
δ(u, v) = min{w(p) : p is a path from u to v}, if such a path exists; otherwise δ(u, v) = ∞.
Single Source Shortest Path
• A shortest path from vertex u to vertex v is then defined as any path p with weight w(p) = δ(u, v).
Relaxation
• The algorithms use the technique of relaxation. For each vertex v ∈ V, we maintain an attribute d[v], which is an upper bound on the weight of a shortest path from source s to v.
• We call d[v] a shortest-path estimate. We initialize the shortest-path estimates and predecessors by the following Θ(V)-time procedure.

Single Source Shortest Path
INITIALIZE-SINGLE-SOURCE(G, s)
1  for each vertex v ∈ V[G]
2      do d[v] ← ∞
3         π[v] ← NIL
4  d[s] ← 0
• After initialization, π[v] = NIL for all v ∈ V, d[s] = 0, and d[v] = ∞ for v ∈ V − {s}.
Single Source Shortest Path
• The process of relaxing an edge (u, v) consists of testing whether we can improve the shortest path to v found so far by going through u and, if so, updating d[v] and π[v].
• A relaxation step may decrease the value of the shortest-path estimate d[v] and update v's predecessor field π[v].
• The following code performs a relaxation step on edge (u, v).

RELAX(u, v, w)
1  if d[v] > d[u] + w(u, v)
2      then d[v] ← d[u] + w(u, v)
3           π[v] ← u
Single Source Shortest Path: Dijkstra's algorithm
• Dijkstra's algorithm solves the single-source shortest-paths problem on a weighted, directed graph G = (V, E) for the case in which all edge weights are nonnegative.
• We assume that w(u, v) ≥ 0 for each edge (u, v) ∈ E.
• Dijkstra's algorithm maintains a set S of vertices whose final shortest-path weights from the source s have already been determined.
Single Source Shortest Path: Dijkstra's algorithm
• The algorithm repeatedly selects the vertex u ∈ V − S with the minimum shortest-path estimate, adds u to S, and relaxes all edges leaving u.
• In the following implementation, we use a min-priority queue Q of vertices, keyed by their d values.

DIJKSTRA(G, w, s)
1  INITIALIZE-SINGLE-SOURCE(G, s)
2  S ← ∅
3  Q ← V[G]
4  while Q ≠ ∅
5      do u ← EXTRACT-MIN(Q)
6         S ← S ∪ {u}
7         for each vertex v ∈ Adj[u]
8             do RELAX(u, v, w)
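The procedure above can be sketched in runnable Python (not from the slides); since heapq has no DECREASE-KEY, the sketch pushes duplicate entries and skips stale ones, which is a standard substitute:

```python
import heapq

def dijkstra(graph, s):
    """graph: {u: [(v, weight), ...]} adjacency list with nonnegative weights.
    Returns (d, pi): shortest-path estimates and predecessors from source s."""
    d = {v: float('inf') for v in graph}    # INITIALIZE-SINGLE-SOURCE
    pi = {v: None for v in graph}
    d[s] = 0
    Q = [(0, s)]                            # min-priority queue keyed on d values
    done = set()                            # the set S of finalized vertices
    while Q:
        du, u = heapq.heappop(Q)            # EXTRACT-MIN
        if u in done:
            continue                        # stale entry: u was already finalized
        done.add(u)
        for v, w in graph[u]:               # RELAX every edge leaving u
            if d[v] > d[u] + w:
                d[v] = d[u] + w
                pi[v] = u
                heapq.heappush(Q, (d[v], v))
    return d, pi
```

The example graph in the test is illustrative; any graph with nonnegative edge weights works.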
(Figures: step-by-step execution of Dijkstra's algorithm on an example graph.)
Task Scheduling
• Given: a set T of n tasks, each having
– a start time, si
– a finish time, fi (where si < fi)
• Goal: perform all the tasks using a minimum number of "machines."

(Figure: tasks scheduled on Machines 1–3 along a timeline from 1 to 9.)
Task Scheduling Algorithm
• Greedy choice: consider tasks by their start time and use as few machines as possible with this order.

Algorithm taskSchedule(T)
Input: set T of tasks with start time si and finish time fi
Output: non-conflicting schedule with minimum number of machines
m ← 0                        {number of machines}
while T is not empty
    remove task i with smallest si
    if there is a machine j with no task conflicting with i then
        schedule i on machine j
    else
        m ← m + 1
        schedule i on machine m
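The algorithm above can be sketched in Python (not from the slides); here "a machine with no conflicting task" is taken to mean one whose last task finishes by si, tracked with a min-heap of finish times so only the machine count is returned:

```python
import heapq

def task_schedule(tasks):
    """tasks: list of (start, finish) intervals.
    Returns the minimum number of machines needed to run all tasks
    without conflicts (tasks conflict if their intervals overlap)."""
    machines = []  # min-heap of finish times, one entry per machine in use
    for s, f in sorted(tasks):               # consider tasks by start time
        if machines and machines[0] <= s:
            heapq.heapreplace(machines, f)   # reuse the machine that frees up earliest
        else:
            heapq.heappush(machines, f)      # no free machine: allocate a new one
    return len(machines)
```

On the slides' example set [1,4], [1,3], [2,5], [3,7], [4,7], [6,9], [7,8], this returns 3 machines, matching the figure.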
Task Scheduling Algorithm
• Running time: given a set of n tasks specified by their start and finish times, algorithm taskSchedule produces a schedule of the tasks with the minimum number of machines in O(n log n) time.
– Use a heap-based priority queue to store the tasks, with the start times as the priorities.
– Finding the task with the earliest start time takes O(log n) time.
Example
• Given: a set T of n tasks, each having
– a start time, si
– a finish time, fi (where si < fi)
– [1,4], [1,3], [2,5], [3,7], [4,7], [6,9], [7,8] (ordered by start)
• Goal: perform all tasks on the minimum number of machines.

(Figure: the seven tasks scheduled on Machines 1–3 along a timeline from 1 to 9.)
Backtracking
• Suppose you have to make a series of decisions, among various choices, where
– you don't have enough information to know what to choose,
– each decision leads to a new set of choices, and
– some sequence of choices (possibly more than one) may be a solution to your problem.
• Backtracking is a methodical way of trying out various sequences of decisions, until you find one that "works."
Backtracking (animation)

(Figure: a decision tree explored from the start node; several branches end in dead ends before one path reaches success.)
Terminology I
• A tree is composed of nodes.
• There are three kinds of nodes:
– the (one) root node,
– internal nodes, and
– leaf nodes.
• Backtracking can be thought of as searching a tree for a particular "goal" leaf node.
Terminology II
• Each non-leaf node in a tree is a parent of one or more other nodes (its children).
• Each node in the tree, other than the root, has exactly one parent.

(Figure: two small trees with a parent node and its children labeled.)

• Usually, however, we draw our trees downward, with the root at the top.
The backtracking algorithm
• Backtracking is really quite simple; we "explore" each node, as follows.
• To "explore" node N:
1. If N is a goal node, return "success".
2. If N is a leaf node, return "failure".
3. For each child C of N:
   3.1. Explore C.
        3.1.1. If C was successful, return "success".
4. Return "failure".
FOUR QUEENS PROBLEM

(Figure: the backtracking search tree for the four-queens problem, with partial placements numbered 1–4 and the dead ends encountered along the way.)
8 QUEENS PROBLEM

(Figure: an 8 × 8 board, rows 1–8 and columns 1–8, showing one solution with exactly one queen Q in each row.)
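The board search in these slides is an instance of the explore procedure from the backtracking algorithm; a Python sketch (the solve_queens name and list-based board encoding are illustrative, not from the slides):

```python
def solve_queens(n):
    """Place n queens on an n x n board so none attack each other.
    Returns one solution as a list: board[row] = column of that row's queen,
    or None if no solution exists."""
    board = []

    def safe(col):
        row = len(board)
        for r, c in enumerate(board):
            if c == col or abs(c - col) == row - r:  # same column or diagonal
                return False
        return True

    def explore():
        if len(board) == n:          # goal node: every row has a queen
            return True
        for col in range(n):         # children: each safe column for the next row
            if safe(col):
                board.append(col)
                if explore():        # explore C; success propagates up
                    return True
                board.pop()          # dead end: undo the choice and backtrack
        return False                 # no child succeeded: failure
    return board if explore() else None
```

For n = 4 the search backtracks out of the column-0 subtree (the dead ends in the four-queens figure) before finding a solution; for n = 8 it finds one of the 92 solutions to the eight-queens problem.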
The End