AaDS - W14 HuffmanCode_Knapsack.pdf
7/27/2019 AaDS - W14 HuffmanCode_Knapsack.pdf
1/16
AaDS 2010/2011
Huffman code & knapsack problem
Algorithms and Data Structures
-
Problems:
- Huffman codes
- knapsack problem:
  - 0-1 knapsack problem
  - fractional knapsack problem
-
Huffman codes
Suppose we have a 100,000-character data file that we wish to store compactly. We observe that the characters in the file occur with the frequencies given in the table below. That is, only six different characters appear, and the character a occurs 45,000 times.

                               a     b     c     d     e     f
Frequency (in thousands)      45    13    12    16     9     5
Fixed-length codeword        000   001   010   011   100   101
Variable-length codeword       0   101   100   111  1101  1100
Fixed-length code (3 bits per character): 3 · 100,000 = 300,000 bits
Variable-length code: (45·1 + 13·3 + 12·3 + 16·3 + 9·4 + 5·4) · 1,000 = 224,000 bits
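The two totals can be checked mechanically. A small Python sketch, where the dictionaries just restate the table (frequencies in thousands) and the helper name `encoded_bits` is our own, not from the slides:

```python
# Frequencies (in thousands) and codewords, restated from the table.
freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
fixed = {"a": "000", "b": "001", "c": "010", "d": "011", "e": "100", "f": "101"}
variable = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}


def encoded_bits(code: dict[str, str], freq_thousands: dict[str, int]) -> int:
    """Total bits for the whole file: sum of (occurrences x codeword length)."""
    return sum(freq_thousands[ch] * len(code[ch]) for ch in freq_thousands) * 1000


print(encoded_bits(fixed, freq))     # 300000
print(encoded_bits(variable, freq))  # 224000
```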
-
Prefix codes
We consider here only codes in which no codeword is also a prefix of some other codeword. Such codes are called prefix codes.
Encoding is always simple for any binary character code; we just concatenate the codewords representing each character of the file. For example, with the variable-length prefix code of the table, we code the 3-character file abc as 0·101·100 = 0101100, where we use "·" to denote concatenation.
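Encoding is plain concatenation. A minimal Python sketch, assuming the code is stored as a dict from character to codeword string (`CODE` and `encode` are illustrative names, not from the slides):

```python
# Variable-length prefix code from the table (character -> codeword).
CODE = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}


def encode(text: str, code: dict[str, str]) -> str:
    """Concatenate the codeword of every character of the text."""
    return "".join(code[ch] for ch in text)


print(encode("abc", CODE))  # 0101100
```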
Prefix codes are desirable because they simplify decoding. Since no codeword is a prefix of any other, the codeword that begins an encoded file is unambiguous. We can simply identify the initial codeword, translate it back to the original character, and repeat the decoding process on the remainder of the encoded file. In our example, the string 001011101 parses uniquely as 0·0·101·1101, which decodes to aabe.
                a     b     c     d     e     f
Prefix code     0   101   100   111  1101  1100
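The decoding loop described above can be sketched in Python. This version accumulates bits until they match a codeword, which is only safe because the code is prefix-free; a real decoder would more typically walk the code tree bit by bit. `decode` and `CODE` are hypothetical names:

```python
# Variable-length prefix code from the table (character -> codeword).
CODE = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}


def decode(bits: str, code: dict[str, str]) -> str:
    """Decode a bit string produced with a prefix code.

    No codeword is a prefix of another, so the first time the accumulated
    bits equal a codeword, that codeword is the only possible match.
    """
    inverse = {word: ch for ch, word in code.items()}
    out = []
    current = ""
    for bit in bits:
        current += bit
        if current in inverse:  # a complete codeword has been read
            out.append(inverse[current])
            current = ""        # start again on the remainder
    if current:
        raise ValueError(f"trailing bits do not form a codeword: {current!r}")
    return "".join(out)


print(decode("001011101", CODE))  # aabe
```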
-
Trees
We interpret the binary codeword for a character as the path from the root to that character, where 0 means "go to the left child" and 1 means "go to the right child".
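[Figure: code trees for the data in the table; each leaf is labelled with its character and frequency, each internal node with the sum of the frequencies of the leaves below it, and each edge with 0 (left) or 1 (right).
Left, fixed-length code: root 100 with children 86 and 14; 86 has children 58 (a:45, b:13) and 28 (c:12, d:16); 14 has a single child 14 (e:9, f:5).
Right, optimal prefix code: root 100 with children a:45 and 55; 55 has children 25 (c:12, b:13) and 30; 30 has children 14 (f:5, e:9) and d:16.]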
Given a tree T corresponding to a prefix code, it is a simple matter to compute the number of bits required to encode a file. For each character c in the alphabet C, let f(c) denote the frequency of c in the file and let d_T(c) denote the depth of c's leaf in the tree. Note that d_T(c) is also the length of the codeword for character c. The number of bits required to encode a file is thus B(T), which we define as the cost of the tree T:
$$B(T) = \sum_{c \in C} f(c)\, d_T(c)$$
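The path interpretation and the cost B(T) can be illustrated together. In this Python sketch (our own representation, not from the slides) a leaf is a one-character string and an internal node is a pair (left, right); `T` is the optimal prefix-code tree of the example, and each codeword is read off the root-to-leaf path, so its length equals the leaf depth d_T(c):

```python
from typing import Union

# A leaf is a character; an internal node is a tuple (left, right).
Tree = Union[str, tuple]

# Optimal prefix-code tree for the example (0 = left child, 1 = right child).
T = ("a", (("c", "b"), (("f", "e"), "d")))
FREQ = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}  # in thousands


def codewords(tree: Tree, path: str = "") -> dict[str, str]:
    """Read each character's codeword off its root-to-leaf path."""
    if isinstance(tree, str):
        return {tree: path}
    left, right = tree
    return {**codewords(left, path + "0"), **codewords(right, path + "1")}


def cost(tree: Tree, freq: dict[str, int]) -> int:
    """B(T) = sum over c of f(c) * d_T(c); d_T(c) is the codeword length."""
    return sum(freq[ch] * len(word) for ch, word in codewords(tree).items())


print(codewords(T))   # {'a': '0', 'c': '100', 'b': '101', 'f': '1100', 'e': '1101', 'd': '111'}
print(cost(T, FREQ))  # 224 (thousand bits)
```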
-
Huffman codes - Code
We assume that C is a set of n characters and that each character c ∈ C is an object with a defined frequency f[c]. The algorithm builds the tree T corresponding to the optimal code in a bottom-up manner. It begins with a set of |C| leaves and performs a sequence of |C| - 1 "merging" operations to create the final tree. A min-priority queue Q, keyed on f, is used to identify the two least-frequent objects to merge together. The result of the merger of two objects is a new object whose frequency is the sum of the frequencies of the two objects that were merged.
Huffman(C)
   n := |C|                             { 1}
   Q := C                               { 2}
   for i := 1 to n-1 do                 { 3}
      z := Allocate_Node()              { 4}
      x := left[z] := Extract_Min(Q)    { 5}
      y := right[z] := Extract_Min(Q)   { 6}
      f[z] := f[x] + f[y]               { 7}
      Insert(Q, z)                      { 8}
   return Extract_Min(Q)                { 9}
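A minimal Python sketch of Huffman(C), assuming the alphabet is given as a non-empty dict from character to frequency. The standard-library `heapq` module stands in for the min-priority queue Q; the counter in each heap entry only breaks ties between equal frequencies so that tree nodes are never compared. The tuple-based tree and the `codewords` helper are our own choices, not part of the slides:

```python
import heapq
from itertools import count


def huffman(freq: dict[str, int]):
    """Build a Huffman tree bottom-up, following Huffman(C).

    A leaf is a character; an internal node z is a tuple (left, right).
    """
    tiebreak = count()  # unique second key, so equal frequencies never compare nodes
    q = [(f, next(tiebreak), ch) for ch, f in freq.items()]  # Q := C
    heapq.heapify(q)                                  # linear time, like BUILD-MIN-HEAP
    for _merge in range(len(freq) - 1):               # n - 1 merges
        fx, _, x = heapq.heappop(q)                   # x := Extract_Min(Q)
        fy, _, y = heapq.heappop(q)                   # y := Extract_Min(Q)
        z = (x, y)                                    # left[z] = x, right[z] = y
        heapq.heappush(q, (fx + fy, next(tiebreak), z))  # f[z] := f[x] + f[y]; Insert(Q, z)
    return heapq.heappop(q)[2]                        # the root of the tree


def codewords(tree, path=""):
    """0 = go to the left child, 1 = go to the right child."""
    if isinstance(tree, str):
        return {tree: path or "0"}  # a one-character alphabet still needs one bit
    left, right = tree
    return {**codewords(left, path + "0"), **codewords(right, path + "1")}


freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
code = codewords(huffman(freq))
print(code)  # {'a': '0', 'c': '100', 'b': '101', 'f': '1100', 'e': '1101', 'd': '111'}
print(sum(freq[ch] * len(w) for ch, w in code.items()))  # 224
```

For the example frequencies no ties occur, so this reproduces the variable-length code of the table; with ties, a different tree of the same cost may come out.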
-
Huffman codes - complexity
Q is implemented as a binary min-heap. For a set C of n characters, the initialization of Q in line 2 can be performed in O(n) time using the BUILD-MIN-HEAP procedure. The for loop in lines 3-8 is executed exactly n - 1 times, and since each heap operation requires time O(lg n), the loop contributes O(n lg n) to the running time. Thus, the total running time of HUFFMAN on a set of n characters is O(n lg n).
Letters and frequencies: a:1, b:1, c:2, d:3, e:5, f:8, g:13, h:21
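For the letters and frequencies above, only the codeword lengths matter for the cost. A hedged Python sketch of one way to get them: it runs the same greedy merging with `heapq`, but instead of building nodes, each heap entry carries the list of characters beneath it, and every merge pushes those characters one level deeper. The function name is ours; the list concatenation makes this variant O(n^2) in the worst case, which is fine for a small alphabet but does not match the O(n lg n) bound above.

```python
import heapq
from itertools import count


def huffman_lengths(freq: dict[str, int]) -> dict[str, int]:
    """Codeword length (leaf depth) of every character in a Huffman tree."""
    depth = {ch: 0 for ch in freq}
    tiebreak = count()  # breaks ties so the character lists are never compared
    q = [(f, next(tiebreak), [ch]) for ch, f in freq.items()]
    heapq.heapify(q)
    while len(q) > 1:
        fx, _, xs = heapq.heappop(q)  # two least-frequent subtrees
        fy, _, ys = heapq.heappop(q)
        for ch in xs + ys:            # every leaf below the new node gets one level deeper
            depth[ch] += 1
        heapq.heappush(q, (fx + fy, next(tiebreak), xs + ys))
    return depth


freq = {"a": 1, "b": 1, "c": 2, "d": 3, "e": 5, "f": 8, "g": 13, "h": 21}
lengths = huffman_lengths(freq)
print(lengths)  # {'a': 7, 'b': 7, 'c': 6, 'd': 5, 'e': 4, 'f': 3, 'g': 2, 'h': 1}
print(sum(freq[ch] * lengths[ch] for ch in freq))  # 132
```

The frequencies are Fibonacci numbers, which is why the tree degenerates into a chain and the lengths run from 1 to 7.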
-
Knapsack problems
The 0-1 knapsack problem is posed as follows. A thief robbing a store finds n items; the i-th item is worth v_i dollars and weighs w_i pounds, where v_i and w_i are integers. He wants to take as valuable a load as possible, but he can carry at most W pounds in his knapsack for some integer W. Which items should he take? (This is called the 0-1 knapsack problem because each item must either be taken or left behind; the thief cannot take a fractional amount of an item or take an item more than once.)
In the fractional knapsack problem, the setup is the same, but the thief can take fractions of items, rather than having to make a binary (0-1) choice for each item.
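To pin down the 0-1 problem precisely, here is an exhaustive Python sketch (ours, not from the slides) that tries every subset of items, each taken whole and at most once. It needs exponential time, so it is only useful as a definition and as a check on small instances. The instance uses the numbers from the example later in this lecture.

```python
from itertools import combinations


def knapsack_01_brute_force(values, weights, W):
    """Try every subset of items; return (best value, indices of chosen items)."""
    n = len(values)
    best_value, best_items = 0, ()
    for r in range(n + 1):                        # subsets of every size
        for items in combinations(range(n), r):
            weight = sum(weights[i] for i in items)
            value = sum(values[i] for i in items)
            if weight <= W and value > best_value:  # feasible and better
                best_value, best_items = value, items
    return best_value, best_items


print(knapsack_01_brute_force([60, 100, 120], [10, 20, 30], 50))  # (220, (1, 2))
```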
-
Solution
Fractional knapsack problem, solution (greedy algorithm):
- compute the value per pound v_i/w_i for each item;
- take as much as possible of the item with the greatest value per pound. If the supply of that item is exhausted and you can still carry more, take as much as possible of the item with the next greatest value per pound, and so forth until you can't carry any more;
- because the items must first be sorted by value per pound, the greedy algorithm runs in O(n lg n) time.
For the 0-1 knapsack problem this greedy strategy does not work!
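The greedy strategy for the fractional problem can be sketched in Python as follows, assuming positive weights and an integer capacity; the function name and return format are our own. Items are visited in decreasing order of v_i/w_i and each is taken whole, except possibly the last one, of which only the fraction that still fits is taken:

```python
def fractional_knapsack(values, weights, W):
    """Greedy by value per pound; returns (total value, [(item index, pounds taken)])."""
    # Sorting by v_i / w_i dominates the running time: O(n lg n).
    order = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)
    remaining = W
    total = 0.0
    taken = []
    for i in order:
        if remaining == 0:
            break
        amount = min(weights[i], remaining)       # all of item i, or whatever still fits
        total += values[i] * amount / weights[i]  # value proportional to the fraction taken
        taken.append((i, amount))
        remaining -= amount
    return total, taken


print(fractional_knapsack([60, 100, 120], [10, 20, 30], 50))
# (240.0, [(0, 10), (1, 20), (2, 20)])
```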
-
Example
[Figure: knapsack of capacity W = 50 pounds and three items
   item 1: 10 pounds, $60 ($6 per pound)
   item 2: 20 pounds, $100 ($5 per pound)
   item 3: 30 pounds, $120 ($4 per pound)
 0/1 knapsack problem: items 1 + 2 = $160 and items 1 + 3 = $180 are non-optimal solutions; items 2 + 3 = $220 is the optimal solution.
 Fractional knapsack problem: the optimal solution takes all of item 1, all of item 2 and 20 of the 30 pounds of item 3: $60 + $100 + $80 = $240.]
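The example shows why the value-per-pound rule fails for the 0-1 problem. A small Python sketch (a hypothetical helper, not from the slides) applies the greedy rule to whole items and gets $160, while the best choice, items 2 and 3, is worth $220:

```python
def greedy_01_by_ratio(values, weights, W):
    """Fractional greedy rule applied to whole items: NOT optimal for 0-1 knapsack.

    Takes items in decreasing value per pound, skipping any item that no longer fits.
    Returns (total value, 0-based indices of chosen items).
    """
    order = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)
    remaining, total, chosen = W, 0, []
    for i in order:
        if weights[i] <= remaining:  # only whole items are allowed
            remaining -= weights[i]
            total += values[i]
            chosen.append(i)
    return total, chosen


print(greedy_01_by_ratio([60, 100, 120], [10, 20, 30], 50))  # (160, [0, 1])
```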
-
0/1 knapsack problem - solution
0/1 knapsack problem solution (dynamic programming): knowing the best solutions for items from a set {v_1, v_2, …, v_i} for a knapsack with capacity from 1 to W, we can find a formula to compute the best solutions for items from a set {v_1, v_2, …, v_i, v_{i+1}}.
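The problem is to choose x_i ∈ {0, 1} for each item so as to

$$\max \sum_{i=1}^{n} v_i x_i \quad \text{subject to} \quad \sum_{i=1}^{n} w_i x_i \le W, \qquad x_i \in \{0, 1\},\; 1 \le i \le n.$$

If F(i, x) denotes the best value obtainable from items 1, …, i with a knapsack of capacity x (and F(0, x) = 0), then

$$F(i, x) = \max\{\, F(i-1, x),\; F(i-1, x - w_i) + v_i \,\},$$

where the second option (taking item i) is allowed only when w_i ≤ x.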
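The recurrence translates directly into a bottom-up table. A Python sketch, assuming non-negative integer weights and capacity W; F[i][x] follows the recurrence for F(i, x) with F(0, x) = 0, and a walk back through the table recovers which items were taken. Function and variable names are ours:

```python
def knapsack_01(values, weights, W):
    """Dynamic programming for 0-1 knapsack; returns (best value, 1-based chosen items)."""
    n = len(values)
    # F[i][x]: best value using items 1..i with capacity x; row 0 is F(0, x) = 0.
    F = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], weights[i - 1]  # item i (Python lists are 0-indexed)
        for x in range(W + 1):
            F[i][x] = F[i - 1][x]                              # leave item i behind
            if w <= x:
                F[i][x] = max(F[i][x], F[i - 1][x - w] + v)    # or take it
    # Walk back: item i was taken exactly when it changed the optimum.
    chosen, x = [], W
    for i in range(n, 0, -1):
        if F[i][x] != F[i - 1][x]:
            chosen.append(i)
            x -= weights[i - 1]
    return F[n][W], sorted(chosen)


print(knapsack_01([60, 100, 120], [10, 20, 30], 50))  # (220, [2, 3])
```

Filling the table takes O(nW) time and space, which is pseudo-polynomial: it depends on the numeric value of W, not only on the number of items.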