Running Time of Kruskal’s Algorithm
Huffman Codes
Monday, July 14th
Outline For Today
1. Runtime of Kruskal’s Algorithm (Union-Find Data
Structure)
2. Data Encodings & Finding An Optimal Prefix-free
Encoding
3. Prefix-free Encodings <-> Binary Trees
4. Huffman Codes
Recap: Kruskal’s Algorithm Simulation

[Figure sequence: a graph on vertices A-H whose edges are considered in increasing weight order: 1, 2, 2.5, 3, 4, 5, 6, 7, 7.5, 8, 9. Each edge is added unless it creates a cycle with the edges picked so far; edges that create a cycle are skipped. The final tree consists of the edges of weights 1, 2, 2.5, 3, 4, 7, and 7.5.]

Final Tree!
Same as Tprim
Recap: Kruskal’s Algorithm Pseudocode

procedure kruskal(G(V, E)):
    sort E in order of increasing weights
    rename E so w(e1) < w(e2) < … < w(em)
    T = {}  // final tree edges
    for i = 1 to m:
        if T ∪ ei=(u,v) doesn’t create a cycle:
            add ei to T
    return T
Recap: For Correctness We Proved 2 Things
1. Kruskal outputs a spanning tree Tkrsk
2. Tkrsk is a minimum spanning tree
1: Kruskal Outputs a Spanning Tree
Need to prove Tkrsk is spanning AND acyclic.
Acyclicity holds by definition of the algorithm.
Why is Tkrsk spanning (i.e., connected)?
Recall the Empty Cut Lemma:
A graph is not connected iff ∃ a cut (X, Y) with no crossing edges.
If every cut has a crossing edge => the graph is connected!
2: Kruskal is Optimal (by the Cut Property)
Let (u, v) be any edge added by Kruskal’s algorithm.
u and v are in different components (because Kruskal checks for cycles).
[Figure: the cut separating u’s component (containing u, x, y) from the rest of the graph (containing v, t, z, w).]
Claim: (u, v) is the minimum-weight edge crossing this cut!
Kruskal’s Runtime

procedure kruskal(G(V, E)):
    sort E in order of increasing weights            // O(mlog(n))
    rename E so w(e1) < w(e2) < … < w(em)
    T = {}  // final tree edges
    for i = 1 to m:                                  // m iterations
        if T ∪ ei=(u,v) doesn’t create a cycle:      // ?
            add ei to T
    return T

Can we speed up cycle checking?
Option 1: check if a u ⤳ v path exists!
Run a BFS/DFS from u or v => O(|T| + n) = O(n)

***BFS/DFS Total Runtime: O(mn)***
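A direct sketch of this naive variant in Python (the function name and edge representation are illustrative, not from the slides): edges are sorted once, and each candidate edge (u, v) is tested with a BFS over the tree edges chosen so far.

```python
from collections import defaultdict, deque

def kruskal_naive(vertices, edges):
    """edges: list of (weight, u, v) tuples; returns the MST edge list."""
    tree = []
    adj = defaultdict(list)            # adjacency list of T only
    for w, u, v in sorted(edges):      # increasing weight order
        # BFS from u restricted to T: O(|T| + n) = O(n) per edge
        seen = {u}
        queue = deque([u])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        if v not in seen:              # no u ~> v path in T => no cycle
            tree.append((w, u, v))
            adj[u].append(v)
            adj[v].append(u)
    return tree
```

With m edges this does an O(n) search per edge, matching the O(mn) total above.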
Speeding Up Kruskal’s Algorithm
Goal: Check for cycles in log(n) time.
Observation: (u, v) creates a cycle iff u and v are in the same connected component.
Option 2: check if u’s component = v’s component.
More Specific Goal: check the component of each vertex in log(n) time.
Union-Find Data Structure
Operation 1 (Union): Maintain the component structure of T as we add new edges to it.
Operation 2 (Find): Query the component of each vertex v.
Kruskal’s With Union-Find (Conceptually)

[Figure sequence: the same graph on A-H, with edges considered in weight order 1, 2, 2.5, 3, 4, 5, 6, 7, 7.5, 8, 9. Each vertex is labeled with its current leader, and the Find/Union calls proceed as follows:]

Find(A) = A, Find(D) = D => Union(A, D)
Find(D) = A, Find(E) = E => Union(A, E)
Find(C) = C, Find(F) = F => Union(C, F)
Find(E) = A, Find(F) = C => Union(A, C)
Find(A) = A, Find(B) = B => Union(A, B)
Find(D) = A, Find(C) = A => Skip (D, C)
Find(A) = A, Find(C) = A => Skip (A, C)
Find(C) = A, Find(H) = H => Union(A, H)
Find(F) = A, Find(G) = G => Union(A, G)
Find(B) = A, Find(C) = A => Skip (B, C)
Find(H) = A, Find(G) = A => Skip (H, G)
Union-Find Implementation Simulation

[Figure sequence: each vertex A-H starts as its own component of size 1 (A1, B1, …, H1). As the unions above are performed, the smaller component’s leader is pointed at the larger one’s, and A’s recorded size grows: 2 after absorbing D, 3 after E, 5 after absorbing C’s component {C, F} (itself of size 2), 6 after B, 7 after H, and 8 after G.]
Linked Structure Per Connected Component

[Figure: a component drawn as a linked structure; vertices C, A, W, Z, Y, T point (directly or through other vertices) toward the leader X, which stores the component size 7.]
Union Operation

Union: **Make the leader of the small component point to the leader of the large component.**

[Figure: the component with leader X (size 7) absorbs the component with leader E (size 3); E now points to X, whose size becomes 10.]

Cost: O(1) (1 pointer update, 1 increment)
Find Operation

Find: “pointer chase” until the leader.
Cost: # pointers to the leader.

[Figure: the same component with leader X (size 10); Find follows pointers from a vertex up to X.]
Cost of Find Operation

Claim: For any v, #-pointers to leader(v) ≤ log2(|component(v)|) ≤ log2(n)

Proof: Each time v’s path to its leader increases by 1, the size of its component at least doubles!
|component(v)| starts at 1 and can grow to at most n, therefore it can double at most log2(n) times!
Summary of Union-Find
Initialization: Each v is a component of size 1 and points to itself.
When we union two components, we make the leader of the smaller one point to the leader of the larger one (break ties arbitrarily).
Find(v): pointer chasing to the leader. Cost: O(log2(|component|)) = O(log2(n))
Union(u, v): 1 pointer update, 1 increment => O(1)
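The summary above can be sketched as a small Python class (names illustrative, not from the slides): each vertex stores a parent pointer, leaders point to themselves, and only leaders carry a meaningful size.

```python
class UnionFind:
    def __init__(self, elements):
        self.parent = {v: v for v in elements}  # each v starts as its own leader
        self.size = {v: 1 for v in elements}    # component sizes (valid at leaders)

    def find(self, v):
        # Pointer-chase to the leader: O(log n) by the doubling argument above.
        while self.parent[v] != v:
            v = self.parent[v]
        return v

    def union(self, u, v):
        # Make the smaller component's leader point to the larger's.
        # (The slides' O(1) Union is the two lines at the end; this method
        # also does the two Finds for convenience.)
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru
        self.size[ru] += self.size[rv]
```

Union by size is what guarantees the doubling: a vertex's leader path only lengthens when its component merges into one at least as large.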
Kruskal’s Runtime With Union-Find

procedure kruskal(G(V, E)):
    sort E in order of increasing weights            // O(mlog(n))
    rename E so w(e1) < w(e2) < … < w(em)
    init Union-Find                                  // O(n)
    T = {}  // final tree edges
    for i = 1 to m:                                  // m iterations
        ei = (u, v)
        if find(u) != find(v):                       // log(n)
            add ei to T
            Union(find(u), find(v))                  // O(1)
    return T

***Total Runtime: O(mlog(n))*** Same as Prim’s with heaps.
Data Encodings and Compression
All data in the digital world gets represented as 0s and 1s:
010010100010010100011110110010010101010110100001110100010011000010010101011010100010
100001110100010011000010010101011010100010010100011010100010010100010010110110010101
111100111010001001100001100101101011010100010011000110101000100101001010110110010101
Goal of Data Compression: Make the binary blob as small as possible while satisfying the encoding-decoding protocol.
Encoding-Decoding Protocol

[Figure: a document goes into an encoder, which outputs a binary blob; a decoder maps the blob back to the document.]

Alphabet A = {a, b, c, …, z}, assume |A| = 32

Option 1: Fixed Length Codes
Each letter is mapped to exactly 5 bits:
a b … z => 00000 00001 … 11111
Example: ASCII encoding

Example: Fixed Length Codes
encoder: cat => 000110000010100
decoder: 000110000010100 => cat
Output Size of Fixed Length Codes
Input: Alphabet A, text document of length n
Each letter is mapped to log2(|A|) bits
Output Size: nlog2(|A|)
Optimal if letters appear with the same frequencies in the text!
In practice, letters appear with different frequencies.
Ex: In English, the letters a, t, e are much more frequent than q, z, x.
Question: Can we do better?
Option 2: Variable Length Binary Codes
Goal is to assign:
Frequently appearing letters short bit strings
Infrequently appearing ones long bit strings
Hope: On average use ≤ nlog2(|A|) encoded bits for documents of size n (or ≤ log2(|A|) bits per letter)
Example 1: Morse Code (not binary)
Two symbols: dots (●) and dashes (−), or light and dark.
But the end of a letter is indicated with a pause (effectively a third symbol, written P below).
Frequent letters: e => ●, t => −, a => ●−
Infrequent letters: c => −●−●, j => ●−−−
encoder: cat => −●−●P ●−P −P
decoder: −●−●P●−P−P => cat
Can We Have a Morse Code with 2 Symbols?
Goal: Same idea as the Morse code but with only 2 symbols.
Frequent letters: e => 0, t => 1, a => 01
Infrequent letters: c => 1010, j => 0111
encoder: cat => 1010 01 1 => 1010011
decoder: 1010011 => taeett? teteat? cat?
**Decoding is Ambiguous**
Why Was There Ambiguity?
The encoding of one letter was a prefix of another letter’s.
Ex: e => 0 is a prefix of a => 01
Goal: Use a “prefix-free” encoding, i.e., no letter’s encoding is a prefix of another’s!
Note: The fixed-length encoding was naturally “prefix-free”.
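The prefix-free property is easy to check mechanically. A small Python sketch (function name and dict representation are illustrative): after sorting the codewords lexicographically, any prefix pair must end up adjacent, so one pass suffices.

```python
def is_prefix_free(code):
    """code: dict mapping letter -> bit string."""
    words = sorted(code.values())
    # If a is a prefix of c, every word between them lexicographically
    # also starts with a, so checking adjacent pairs is enough.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))
```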
Ex: Variable Length Prefix-free Encoding
Ex: A = {a, b, c, d}
a => 0, b => 10, c => 110, d => 111
decode 110010, step by step: c, ca, cab
decode 11101101100, step by step: d, da, dac, dacc, dacca
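The step-by-step decoding above can be sketched in Python (names illustrative): walk the bits, extending the current chunk until it matches a codeword of the example code a => 0, b => 10, c => 110, d => 111. Prefix-freeness guarantees the first match is the only possible one.

```python
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}

def decode(bits, code=CODE):
    inverse = {w: letter for letter, w in code.items()}
    out, chunk = [], ""
    for bit in bits:
        chunk += bit
        if chunk in inverse:       # first codeword match wins (prefix-free)
            out.append(inverse[chunk])
            chunk = ""
    return "".join(out)
```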
Benefits of Variable Length Codes
Ex: A = {a, b, c, d}, Frequencies: a: 45%, b: 40%, c: 10%, d: 5%

Variable Length Code: a => 0, b => 10, c => 110, d => 111
Fixed Length Code: a => 00, b => 01, c => 10, d => 11

A document of length 100K:
Fixed Length Code: 200K bits (2 bits/letter)
Variable Length Code: a: 45K, b: 80K, c: 30K, d: 15K. Total: 170K bits (1.7 bits/letter)
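The comparison above is just a weighted sum of codeword lengths; a quick Python sketch (names illustrative, frequencies as integer percents to keep the arithmetic exact):

```python
freqs = {"a": 45, "b": 40, "c": 10, "d": 5}              # percent
fixed = {"a": "00", "b": "01", "c": "10", "d": "11"}
var = {"a": "0", "b": "10", "c": "110", "d": "111"}

def total_bits(code, n=100_000):
    # sum over letters of (frequency share) * (codeword length) * n
    return sum(freqs[x] * len(code[x]) for x in freqs) * n // 100
```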
Formal Problem Statement
Input: An alphabet A, and frequencies 𝓕 of letters in A
Output: a prefix-free encoding Ɣ, i.e. a mapping A ->
{0,1}* that minimizes the average bits per letter
Prefix-free Encodings <-> Binary Trees

We can represent each prefix-free code Ɣ as a binary tree T as follows:

Code 1: a => 0, b => 10, c => 110, d => 111
[Figure: the corresponding tree; a hangs off the root’s 0-branch, b off the path 10, and c, d off the paths 110 and 111.]
Encoding of letter x = path from the root to the leaf labeled x

Code 2: a => 00, b => 01, c => 10, d => 11
[Figure: the complete binary tree of depth 2 with leaves a, b, c, d.]
Reverse is Also True
Each labeled binary tree T corresponds to a prefix-free code for an alphabet A, where |A| = # leaves in T.
[Figure: a tree with five leaves, giving a => 01, b => 10, c => 000, d => 001, e => 11.]
Why is this code prefix-free?
Reverse is Also True
Claim: Each labeled binary tree T corresponds to a prefix-free code for an alphabet A, where |A| = # leaves in T.
Proof: Take the path P ∈ {0,1}* from the root to leaf x as x’s encoding.
Since each letter x is at a leaf, the path from the root to x is a dead end and cannot be a prefix of the path to another letter y.
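The claim can be sketched constructively in Python (names and the nested-pair tree representation are illustrative): read the codewords off root-to-leaf paths, 0 for left and 1 for right.

```python
def tree_to_code(tree, path=""):
    """tree: a leaf (letter string) or a pair (left, right).
    Returns dict letter -> codeword."""
    if isinstance(tree, str):          # leaf: the path so far is the codeword
        return {tree: path}
    left, right = tree
    code = tree_to_code(left, path + "0")
    code.update(tree_to_code(right, path + "1"))
    return code
```

Applied to the Code 1 tree, this recovers the code from the earlier slides.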
Number of Bits for Letter x?
Let A be an alphabet, and T a binary tree where the letters of A are the leaves of T.
Question: What’s the number of bits for each letter x in the encoding corresponding to T?
Answer: depthT(x)
[Figure: the Code 1 tree with leaves a, b, c, d.]
Formal Problem Statement Restated
Input: An alphabet A, and frequencies 𝓕 of letters in A
Output: A binary tree T, where letters of A are the leaves of T, that has the minimum average bit length (ABL):
ABL(T) = Σx∈A f(x) · depthT(x)
Observation 1 About Optimal T

Claim: The optimal binary tree T is full, i.e., each non-leaf vertex u has exactly 2 children.

Why?

[Figure: a tree T in which some internal vertex has only one child, next to the tree T` obtained by contracting that vertex.]

Exchange Argument: We can replace u with its only child and decrease the depths of some leaves, giving a better tree T`.

[Figure: a second example; contracting the unary vertex above the leaves a, b moves those leaves one level up.]
First Algorithm: Shannon-Fano Codes
From 1948
Top-down, divide-and-conquer type approach:
1. Divide the alphabet into A0 and A1 s.t. the total frequencies of the letters in A0 and in A1 are each roughly 50%
2. Find an encoding Ɣ0 for A0, and Ɣ1 for A1
3. Prepend 0 to the encodings of Ɣ0 and 1 to those of Ɣ1
First Algorithm: Shannon-Fano Codes
Ex: A = {a, b, c, d}, Frequencies: a: 45%, b: 40%, c: 10%, d: 5%
A0 = {a, d}, A1 = {b, c}
[Figure: the resulting tree; a => 00, d => 01, b => 10, c => 11.]
This is a fixed-length encoding, which we saw was suboptimal!
Observation 2 About Optimal T
Claim: In any optimal tree T, if leaf x has depth i and leaf y has depth j, s.t. i < j => f(x) ≥ f(y)
Why?
Exchange Argument: Swap x and y and get a better tree T`.
Observation 2 About Optimal T
Ex: A = {a, b, c, d}, Frequencies: a: 45%, b: 40%, c: 10%, d: 5%
[Figure: T puts frequent letters deep (c at depth 1, b at depth 2, a and d at depth 3); T` swaps them so a is at depth 1, b at depth 2, and c, d at depth 3.]
T => 2.4 bits/letter
T` => 1.7 bits/letter
Corollary
In any optimal tree T, the two lowest-frequency letters are both in the lowest level of the tree!
Huffman’s Key Insight
Observation 1 => optimal Ts are full => each leaf has a sibling.
Corollary => the 2 lowest-freq. letters x, y are at the same (lowest) level.
Swapping letters across the same level does not change the cost of T.
Therefore: There is an optimal tree T in which the two lowest-frequency letters are siblings (in the lowest level of the tree).
Possible Greedy Algorithm
Possible greedy algorithm:
1. Since x, y can be assumed to be siblings, treat them as a single meta-letter xy
2. Find an optimal tree T* for A - {x, y} + {xy}
3. Expand xy back into x and y in T*
Possible Greedy Algorithm (Example)
Ex: A = {x, y, z, t}, and let x, y be the two lowest-freq. letters.
Let A` = {xy, z, t}
[Figure: T* is a tree on the leaves xy, z, t; T is obtained from T* by replacing the leaf xy with an internal node whose children are x and y.]
The Weight of the Meta-letter?
Q: What weight should be attached to the meta-letter xy?
A: f(x) + f(y)
Huffman’s Algorithm (1951)

procedure Huffman(A, 𝓕):
    if |A| = 2:
        return T where branches 0, 1 point to A[0] and A[1], respectively
    let x, y be the two lowest-frequency letters
    let A` = A - {x, y} + {xy}
    let 𝓕` = 𝓕 - {x, y} + {xy: f(x) + f(y)}
    T* = Huffman(A`, 𝓕`)
    expand x, y in T* to get T
    return T
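A direct Python rendering of the recursive pseudocode above (names and the nested-pair tree encoding are illustrative). Representing the meta-letter xy by the pair (x, y) itself makes the "expand" step implicit: when (x, y) appears as a leaf of the recursive result, it already is the subtree with x and y as siblings. This naive version re-sorts on every call, so it is not the O(|A|log|A|) variant.

```python
def huffman(freq):
    """freq: dict mapping letters (or meta-letter pairs) -> frequency.
    Returns the code tree as nested pairs with letter strings at leaves."""
    if len(freq) == 2:
        a, b = freq                    # base case: branches 0 and 1
        return (a, b)
    x, y = sorted(freq, key=freq.get)[:2]       # two lowest frequencies
    merged = {k: v for k, v in freq.items() if k not in (x, y)}
    merged[(x, y)] = freq[x] + freq[y]          # meta-letter xy, weight f(x)+f(y)
    return huffman(merged)             # expansion of xy is implicit (see above)

def code_lengths(tree, depth=0):
    """Depth of each letter = number of bits it gets."""
    if isinstance(tree, str):
        return {tree: depth}
    lengths = {}
    for child in tree:
        lengths.update(code_lengths(child, depth + 1))
    return lengths
```

On the running example (a: 45, b: 40, c: 10, d: 5) this reproduces the 1.7 bits/letter code shape.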
Huffman’s Algorithm Correctness (1)
By induction on |A|.
Base case: |A| = 2 => return the simple full tree with 2 leaves.
IH: Assume correctness for all alphabets of size k-1.
For |A| = k: Huffman gets an optimal tree Tk-1opt containing the meta-letter xy, and expands xy.
Huffman’s Algorithm Correctness (2)
[Figure: Tk-1opt has xy as a leaf; T replaces that leaf with an internal node whose children x, y sit one level deeper.]
In Tk-1opt, the leaf xy contributes f(xy)·depth(xy) = (f(x) + f(y))·depth(xy).
In T, the leaves x and y together contribute (f(x) + f(y))·(depth(xy) + 1).
Total difference: f(x) + f(y)
Huffman’s Algorithm Correctness (3)
Take any optimal tree Z; we’ll argue ABL(T) ≤ ABL(Z).
By the corollary, we can assume that in Z, x and y are also siblings at the lowest level.
Consider Z` obtained by merging them => Z` is a valid prefix-free code for A`, which has size k-1.
ABL(Z) = ABL(Z`) + f(x) + f(y)
ABL(T) = ABL(T*) + f(x) + f(y), where T* is the tree from the recursive call.
By IH: ABL(T*) ≤ ABL(Z`) => ABL(T) ≤ ABL(Z)
Q.E.D.
Huffman’s Algorithm Runtime
Exercise: Make Huffman run in O(|A|log(|A|))?
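One possible answer sketch (an assumption; the slides leave this as an exercise): keep the (meta-)letters in a binary heap, so each round of "extract the two smallest, insert their merge" costs O(log|A|), and there are |A| - 1 rounds after an O(|A|) heapify.

```python
import heapq
from itertools import count

def huffman_heap(freq):
    """freq: dict letter -> frequency; returns dict letter -> codeword."""
    tiebreak = count()                 # avoids comparing tree nodes on frequency ties
    heap = [(f, next(tiebreak), x) for x, f in freq.items()]
    heapq.heapify(heap)                # O(|A|)
    while len(heap) > 1:
        fx, _, x = heapq.heappop(heap)     # two lowest-frequency nodes...
        fy, _, y = heapq.heappop(heap)
        heapq.heappush(heap, (fx + fy, next(tiebreak), (x, y)))  # ...merged
    _, _, tree = heap[0]
    # Read codewords off root-to-leaf paths (0 = left, 1 = right).
    code, stack = {}, [(tree, "")]
    while stack:
        node, path = stack.pop()
        if isinstance(node, str):
            code[node] = path
        else:
            left, right = node
            stack.append((left, path + "0"))
            stack.append((right, path + "1"))
    return code
```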