Page 1

ACM notes

• Why… ? Not always an easy question!

Entropy problem -- Huffman Codes

Input: A..AB..BC..CD..DE..EF (98 chars: 40 A's, 20 B's, 16 C's, 12 D's, 9 E's, 1 F)

Output: 784 224 3.5

784 = bits used in ASCII (98 chars × 8 bits)
224 = bits used in an optimal “prefix-free” encoding
3.5 = compression ratio (784 / 224), to 1 place of precision

Page 2

Prefix-free Codes via binary trees

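A..AB..BC..CD..DE..EF   counts: A 40, B 20, C 16, D 12, E 9, F 1

ABCDEF (98)
  AE (49)
       A (40)
       E (9)
  BCDF (49)
       CD (28)
            C (16)
            D (12)
       BF (21)
            B (20)
            F (1)

A and E sit at depth 2 (49 chars); B, C, D, and F sit at depth 3 (49 chars):

49*2 + 49*3 = 245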

Page 3

Prefix-free Codes via binary trees

A..AB..BC..CD..DE..EF   counts: A 40, B 20, C 16, D 12, E 9, F 1

ABCDEF (98)
  0: AE (49)
       0: A (40)       00
       1: E (9)        01
  1: BCDF (49)
       0: CD (28)
            0: C (16)  100
            1: D (12)  101
       1: BF (21)
            0: B (20)  110
            1: F (1)   111

49*2 + 49*3 = 245

codewords are read down the paths
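To double-check the 245 figure, here is a minimal C++ sketch that recomputes the cost of this code and tests the prefix-free property. It assumes the codeword assignment read off the tree above (A 00, E 01, C 100, D 101, B 110, F 111); `isPrefixFree` and `totalBits` are hypothetical helper names.

```cpp
#include <map>
#include <string>
#include <vector>

// True if no codeword is a prefix of another codeword.
bool isPrefixFree(const std::vector<std::string>& codes) {
    for (size_t i = 0; i < codes.size(); ++i)
        for (size_t j = 0; j < codes.size(); ++j)
            // Compare the first codes[i].size() chars of codes[j] with codes[i].
            if (i != j && codes[j].compare(0, codes[i].size(), codes[i]) == 0)
                return false;
    return true;
}

// Total encoded length: sum over symbols of (count * codeword length).
long totalBits(const std::map<char, long>& counts,
               const std::map<char, std::string>& code) {
    long bits = 0;
    for (const auto& [sym, cnt] : counts)
        bits += cnt * static_cast<long>(code.at(sym).size());
    return bits;
}
```

Any code with the same codeword lengths gives the same 245 bits, so the exact 0/1 labeling of the branches does not affect the total.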

Page 4

Huffman Codes

Building the tree from the bottom up…

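A..AB..BC..CD..DE..EF   counts: A 40, B 20, C 16, D 12, E 9, F 1

The two least frequent symbols, E (9) and F (1), become the children of a new node:
EF (10) = E (9) + F (1)

A..AB..BC..CD..DE..F    counts: A 40, B 20, C 16, D 12, EF 10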

Page 5

Huffman Codes

Building the tree from the bottom up…

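A..AB..BC..CD..DE..F    counts: A 40, B 20, C 16, D 12, EF 10

EF (10) = E (9) + F (1)
The two smallest are now D (12) and EF (10); they merge, with branches labeled 0 and 1:
DEF (22) = D (12) + EF (10)

A..AD..E..FB..BC..C     counts: A 40, DEF 22, B 20, C 16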

Page 6

Huffman Codes

Building the tree from the bottom up…

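EF (10) = E (9) + F (1)
DEF (22) = D (12) + EF (10)
The two smallest are now B (20) and C (16); they merge:
BC (36) = B (20) + C (16)

A..AD..E..FB..BC..C     counts: A 40, DEF 22, B 20, C 16
A..AB..CD..E..F         counts: A 40, BC 36, DEF 22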

Page 7

Huffman Codes

Building the tree from the bottom up…

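EF (10) = E (9) + F (1)
DEF (22) = D (12) + EF (10)
BC (36) = B (20) + C (16)
The two smallest are now DEF (22) and BC (36); they merge:
BCDEF (58) = BC (36) + DEF (22)

A..AB..CD..E..F         counts: A 40, BC 36, DEF 22
A..AB..C..D..E..F       counts: A 40, BCDEF 58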

Page 8

Huffman Codes

Building the tree from the bottom up…

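EF (10) = E (9) + F (1)
DEF (22) = D (12) + EF (10)
BC (36) = B (20) + C (16)
BCDEF (58) = BC (36) + DEF (22)
The last two nodes merge into the root, with branches labeled 0 and 1:
ABCDEF (98) = A (40) + BCDEF (58)

A..AB..C..D..E..F       counts: A 40, BCDEF 58
A..B..C..D..E..F        count: ABCDEF 98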

Page 9

Huffman Codes

Total number of bits needed...

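Final tree:
ABCDEF (98) = A (40) + BCDEF (58)
BCDEF (58) = BC (36) + DEF (22)
BC (36) = B (20) + C (16)
DEF (22) = D (12) + EF (10)
EF (10) = E (9) + F (1)

Depths: A is at depth 1; B and C (36 chars) and D (12 chars) are at depth 3; E and F (10 chars) are at depth 4.

40*1 + 36*3 + 12*3 + 10*4 = 224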
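The bottom-up merging above is usually done with a min-priority queue. The sketch below (hypothetical names `huffmanBits` and `entropyLine`, assuming one non-empty line of input per case and 8 bits per ASCII char) uses the fact that each merge of weights a and b adds a + b bits to the total, so the tree never has to be built explicitly:

```cpp
#include <cstdio>
#include <functional>
#include <queue>
#include <string>
#include <vector>

// Huffman cost: repeatedly merge the two lightest nodes; each merge adds
// (a + b) bits because every symbol under the new node gets one bit longer.
long huffmanBits(const std::vector<long>& counts) {
    std::priority_queue<long, std::vector<long>, std::greater<long>> pq(
        counts.begin(), counts.end());
    if (pq.size() == 1) return pq.top();  // assumption: 1 bit per char if only one symbol
    long bits = 0;
    while (pq.size() > 1) {
        long a = pq.top(); pq.pop();
        long b = pq.top(); pq.pop();
        bits += a + b;
        pq.push(a + b);
    }
    return bits;
}

// Formats "ASCII-bits optimal-bits ratio" for one input line.
std::string entropyLine(const std::string& text) {
    long freq[256] = {0};
    for (unsigned char c : text) ++freq[c];
    std::vector<long> counts;
    for (long f : freq)
        if (f > 0) counts.push_back(f);
    long ascii = 8L * static_cast<long>(text.size());
    long optimal = huffmanBits(counts);
    char buf[64];
    std::snprintf(buf, sizeof buf, "%ld %ld %.1f", ascii, optimal,
                  static_cast<double>(ascii) / optimal);
    return buf;
}
```

For the slide's input, `entropyLine` yields "784 224 3.5". The single-symbol convention is an assumption; check the problem statement for how that case should be scored.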

Page 10

Graphs

[Figure: a directed graph on vertices A, B, C, D, E]

Adjacency List Representation

[A,E], [A,B], [A,C], [B,D], [B,E], [C,B], [D,A], [D,C], [E,D]

(vector or array of pairs)
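One way to turn that vector of pairs into per-vertex adjacency lists, assuming vertices A..E are numbered 0..4 (the helper name `buildAdjacency` is hypothetical):

```cpp
#include <utility>
#include <vector>

// Each pair is a directed edge [from, to] between vertices named 'A', 'B', ...
// The result adj[u] lists the vertices reachable from u in one step.
std::vector<std::vector<int>> buildAdjacency(
        int numVertices, const std::vector<std::pair<char, char>>& edges) {
    std::vector<std::vector<int>> adj(numVertices);
    for (const auto& e : edges)
        adj[e.first - 'A'].push_back(e.second - 'A');
    return adj;
}
```

The flat list of pairs is compact to read in; the grouped form is what a search (DFS/BFS) typically walks.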

Page 11

Graphs

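[Figure: the same directed graph on A, B, C, D, E with edge weights A→B 8, A→C 13, A→E 1, B→D 6, B→E 12, C→B 9, D→A 7, D→C 0, E→D 11]

Matrix representation (2d array): rows = FROM, columns = TO, - = no edge

        A   B   C   D   E
   A    0   8  13   -   1
   B    -   0   -   6  12
   C    -   9   0   -   -
   D    7   -   0   0   -
   E    -   -   -  11   0

Adjacency List Representation (vector or array of pairs ?)

[A,E], [A,B], [A,C], [B,D], [B,E], [C,B], [D,A], [D,C], [E,D]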

Page 12

Problems

Roman Forts (forts.cc)

[Figure: a graph of forts labeled A through K]

fortify the most vulnerable...

Page 13

Problems

Single Points of Failure (spf.cc)

Input 1 (pairs of connected nodes, ending with 0):
1 2
5 4
3 1
3 2
3 4
3 5
0

Output 1:
Network #1
  SPF node 3 leaves 2 subnets

Output 2:
Network #2
  No SPF nodes

[Figure: Graph 1 and Graph 2]
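The slides don't show a method for spf.cc; one standard technique (swapped in here, not taken from the source) is Tarjan's low-link DFS for articulation points. A node u cuts off a child v's subtree when low[v] >= disc[u]; a non-root node then leaves (cut-off children + 1) subnets, and the DFS root leaves one subnet per DFS child. A minimal sketch, assuming a connected network with nodes numbered from 1 and a hypothetical `spfNodes` helper:

```cpp
#include <algorithm>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Returns node -> number of subnets left if that node fails, listing only
// nodes that leave 2 or more (the "SPF nodes").
std::map<int, int> spfNodes(const std::vector<std::pair<int, int>>& links) {
    std::map<int, int> result;
    if (links.empty()) return result;
    int maxNode = 0;
    for (auto [a, b] : links) maxNode = std::max({maxNode, a, b});
    std::vector<std::vector<int>> adj(maxNode + 1);
    for (auto [a, b] : links) { adj[a].push_back(b); adj[b].push_back(a); }

    std::vector<int> disc(maxNode + 1, 0), low(maxNode + 1, 0), cut(maxNode + 1, 0);
    int timer = 0;
    std::function<void(int, int)> dfs = [&](int u, int parent) {
        disc[u] = low[u] = ++timer;
        for (int v : adj[u]) {
            if (!disc[v]) {                       // tree edge
                dfs(v, u);
                low[u] = std::min(low[u], low[v]);
                if (low[v] >= disc[u]) ++cut[u];  // v's subtree hangs on u
            } else if (v != parent) {             // back edge
                low[u] = std::min(low[u], disc[v]);
            }
        }
    };
    int root = links[0].first;
    dfs(root, 0);                                 // 0 = "no parent" sentinel
    for (int u = 1; u <= maxNode; ++u) {
        if (!disc[u]) continue;
        int subnets = (u == root) ? cut[u] : cut[u] + 1;
        if (subnets >= 2) result[u] = subnets;
    }
    return result;
}
```

On input 1 this returns {3: 2}, matching "SPF node 3 leaves 2 subnets"; on a ring of five nodes it returns an empty map, i.e. "No SPF nodes".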

Page 14

All-pairs shortest paths...

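$D^{(0)} = (d_{ij}^{(0)})$, the direct edge weights (rows = FROM, columns = TO, - = no edge):

        A   B   C   D   E
   A    0   8  13   -   1
   B    -   0   -   6  12
   C    -   9   0   -   -
   D    7   -   0   0   -
   E    -   -   -  11   0

$d_{ij}^{(k)}$ = shortest distance from i to j through {1, …, k}, i.e. using only the first k vertices as intermediate stops

$d_{ij}^{(k)} = \min\left(d_{ij}^{(k-1)},\; d_{ik}^{(k-1)} + d_{kj}^{(k-1)}\right)$

$D^{(1)} = (d_{ij}^{(1)})$, where paths may now pass through A:

        A   B   C   D   E
   A    0   8  13   -   1
   B    -   0   -   6  12
   C    -   9   0   -   -
   D    7  15   0   0   8
   E    -   -   -  11   0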

“Floyd-Warshall algorithm”


Page 16

Geometric Problems


code resources to keep in mind

Page 17

All-pairs shortest paths...

$D^{(2)} = (d_{ij}^{(2)})$:

        A   B   C   D   E
   A    0   8  13  14   1
   B    -   0   -   6  12
   C    -   9   0  15  21
   D    7  15   0   0   8
   E    -   -   -  11   0

$D^{(3)} = (d_{ij}^{(3)})$:

        A   B   C   D   E
   A    0   8  13  14   1
   B    -   0   -   6  12
   C    -   9   0  15  21
   D    7   9   0   0   8
   E    -   -   -  11   0

$D^{(4)} = (d_{ij}^{(4)})$:

        A   B   C   D   E
   A    0   8  13  14   1
   B   13   0   6   6  12
   C   22   9   0  15  21
   D    7   9   0   0   8
   E   18  20  11  11   0

$D^{(5)} = (d_{ij}^{(5)})$:

        A   B   C   D   E
   A    0   8  12  12   1
   B   13   0   6   6  12
   C   22   9   0  15  21
   D    7   9   0   0   8
   E   18  20  11  11   0

to store the path, another matrix can track the last intermediate vertex
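The matrices above can be reproduced with the standard triple loop. This sketch assumes vertices A..E map to indices 0..4 and uses a large `INF` constant for "-"; the `via` matrix is one way to follow the hint about tracking an intermediate vertex (here, the k that last improved each entry), and `interior` is a hypothetical helper that rebuilds the stops between i and j:

```cpp
#include <vector>

const int INF = 1000000000;  // stands in for "-" (no edge)

// Floyd-Warshall: after pass k (0-based), d holds D^(k+1) from the slides,
// i.e. shortest distances using only vertices 0..k as intermediate stops.
// via[i][j] records the k that last improved d[i][j] (-1 = direct edge).
void floydWarshall(std::vector<std::vector<int>>& d,
                   std::vector<std::vector<int>>& via) {
    const int n = static_cast<int>(d.size());
    via.assign(n, std::vector<int>(n, -1));
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                if (d[i][k] < INF && d[k][j] < INF &&   // avoid INF + INF
                    d[i][k] + d[k][j] < d[i][j]) {
                    d[i][j] = d[i][k] + d[k][j];
                    via[i][j] = k;
                }
}

// Appends the intermediate vertices strictly between i and j, in order.
void interior(int i, int j, const std::vector<std::vector<int>>& via,
              std::vector<int>& out) {
    int k = via[i][j];
    if (k < 0) return;
    interior(i, k, via, out);
    out.push_back(k);
    interior(k, j, via, out);
}
```

Starting from the D0 matrix, the final `d` matches D5, and the stops recovered for A to C are E then D (1 + 11 + 0 = 12).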

Page 18

Graphs

Representations

Input: AAAAABCD

Output: 64 13 4.9

64 = bits used in ASCII (8 chars × 8 bits)
13 = bits used in an optimal “prefix-free” encoding
4.9 = compression ratio (64 / 13), to 1 place of precision


Page 20

Problems

Entropy

Input: AAAAABCD

Output: 64 13 4.9

64 = bits used in ASCII
13 = bits used in an optimal “prefix-free” encoding
4.9 = compression ratio, to 1 place of precision

An optimal prefix-free code (5*1 + 1*2 + 1*3 + 1*3 = 13 bits):
A 0
B 10
C 110
D 111
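Because no codeword is a prefix of another, decoding can consume bits greedily: the first buffered bit string that matches a codeword must be that codeword. A small sketch using this code (the names `encode` and `decode` are hypothetical):

```cpp
#include <map>
#include <string>

// Encodes text by concatenating each symbol's codeword ('0'/'1' characters).
std::string encode(const std::string& text,
                   const std::map<char, std::string>& code) {
    std::string bits;
    for (char c : text) bits += code.at(c);
    return bits;
}

// Greedy decoding; assumes `bits` is a valid encoding under a prefix-free code.
std::string decode(const std::string& bits,
                   const std::map<char, std::string>& code) {
    std::map<std::string, char> inverse;
    for (const auto& [sym, word] : code) inverse[word] = sym;
    std::string out, buffer;
    for (char b : bits) {
        buffer += b;
        auto it = inverse.find(buffer);
        if (it != inverse.end()) {   // a full codeword has been read
            out += it->second;
            buffer.clear();
        }
    }
    return out;
}
```

Encoding AAAAABCD gives the 13-bit string 0000010110111, and decoding it returns the original text.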

Page 21

Problems

N-Credible Mazes

Input:
2             dimensions
0 0 2 2       start, end
0 0 0 1       edge start, edge end (one edge per line)
0 1 0 2
0 2 1 2
0 2 0 3
1 2 2 2
-1            end of the edges

Output:
Maze #1 can be travelled
(or not…)
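The maze is a reachability question, so a breadth-first search works once points can serve as map keys. This sketch assumes each edge is a two-way hallway and stores a point as a `std::vector<int>` with one coordinate per dimension, so any number of dimensions works; `canTravel` is a hypothetical name and input parsing is left out:

```cpp
#include <map>
#include <queue>
#include <set>
#include <utility>
#include <vector>

using Point = std::vector<int>;  // one coordinate per dimension

// BFS from start; a std::map plays the "hashtable" role for adjacency.
bool canTravel(const Point& start, const Point& end,
               const std::vector<std::pair<Point, Point>>& edges) {
    std::map<Point, std::vector<Point>> adj;
    for (const auto& [a, b] : edges) {   // assumption: hallways go both ways
        adj[a].push_back(b);
        adj[b].push_back(a);
    }
    std::set<Point> seen{start};
    std::queue<Point> q;
    q.push(start);
    while (!q.empty()) {
        Point p = q.front();
        q.pop();
        if (p == end) return true;
        for (const Point& next : adj[p])
            if (seen.insert(next).second) q.push(next);  // first visit only
    }
    return false;
}
```

On the sample, the path (0,0), (0,1), (0,2), (1,2), (2,2) exists, so `canTravel` returns true; dropping the last edge makes it return false.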

Page 22

C++ STL

#include <set>            // set and multiset

set<int> s;               // basically a balanced binary tree (kept sorted)
s.size();                 // returns the number of elements (a size_t)
s.insert(14);             // adds 14
s.insert(-9);             // adds -9
s.insert(42);             // adds 42
set<int>::iterator i;     // may want to typedef
// think of an iterator as a pointer
i = s.find(42);           // returns 42's iterator
cout << (*i) << endl;     // prints 42
cout << (*--i);           // prints ...
i = s.find(43);           // not there !
// at this point ( i == s.end() ) is true
s.erase(-9);              // removing elements
multiset<int> m;          // holds multiple copies

www.dinkumware.com/htm_cpl/index.html www.sgi.com/tech/stl/
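One multiset detail worth remembering: `erase(value)` removes every copy of that value, while erasing a single iterator removes just one. A small sketch (`removeOneCopy` is a hypothetical helper):

```cpp
#include <set>
#include <vector>

// Returns the multiset's contents after removing ONE copy of x.
// m.erase(x) would remove EVERY copy; erasing the iterator from find() removes one.
std::vector<int> removeOneCopy(std::multiset<int> m, int x) {
    auto it = m.find(x);
    if (it != m.end()) m.erase(it);
    return std::vector<int>(m.begin(), m.end());  // iteration is in sorted order
}
```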

Page 23

Breadth-first search

• algorithm

• data structures: queue, deque, hashtable (map)
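A minimal BFS over adjacency lists, assuming vertices are numbered 0..n-1 (`bfsDistances` is a hypothetical name). With the graph from the Graphs slides (A..E as 0..4), the distances from A come out as A 0, B 1, C 1, D 2, E 1:

```cpp
#include <queue>
#include <vector>

// Returns the number of edges on a shortest path from `source` to each
// vertex (-1 if unreachable). The queue processes vertices in order of distance.
std::vector<int> bfsDistances(const std::vector<std::vector<int>>& adj,
                              int source) {
    std::vector<int> dist(adj.size(), -1);
    std::queue<int> q;
    dist[source] = 0;
    q.push(source);
    while (!q.empty()) {
        int u = q.front();
        q.pop();
        for (int v : adj[u])
            if (dist[v] == -1) {       // first visit is along a shortest path
                dist[v] = dist[u] + 1;
                q.push(v);
            }
    }
    return dist;
}
```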

Page 24

C++ STL

#include <algorithm>             // sort
#include <vector>                // vector
#include <deque>                 // deque

vector<int> v;                   // basically an int array
v.reserve(10);                   // assure room for 10 (changes capacity, not size)
v.push_back(42);                 // adds 42 to the end
v.back();                        // returns 42
v.pop_back();                    // removes 42
v.size();                        // # of elements
v[i];                            // ith element
sort( v.begin(), v.end() );      // default sort
sort( v.begin(), v.end(), mycompare );  // sort with your own comparison

deque<int> d;                    // double-ended queue
d.push_front(42);                // add to front
d.front();                       // return front element
d.pop_front();                   // remove from front

last time

www.dinkumware.com/htm_cpl/index.html www.sgi.com/tech/stl/
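`mycompare` is left undefined on the slide; any function that answers "should a come before b?" as a strict weak ordering works. A sketch with one possible body (descending order); `sortedDescending` is a hypothetical wrapper:

```cpp
#include <algorithm>
#include <deque>
#include <vector>

// Example comparator: larger values first. It must be a strict weak
// ordering, so use > rather than >= (>= makes std::sort's behavior undefined).
bool mycompare(int a, int b) { return a > b; }

std::vector<int> sortedDescending(std::vector<int> v) {
    std::sort(v.begin(), v.end(), mycompare);
    return v;
}
```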

Page 25

Other problems

• Change counting (using 1¢, 5¢, 10¢, 25¢, and 50¢ coins)

input:
1.00
0.06
0

output:
There are 292 ways to make $1.00
There are 2 ways to make $0.06

• Sigma series

input:
3
4
87
99
-1

output:
1 2 3
1 2 4
1 2 4 8 16 24 28 29 58 87
1 2 4 8 16 32 33 66 99

Shortest sequences from 1 to N such that each element after the first is the sum of two (not necessarily distinct) earlier elements.
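Sigma series asks for a shortest addition chain. One approach (chosen here, not taken from the slides) is iterative deepening: try length limits 1, 2, 3, ... and run a depth-first search that keeps the sequence strictly increasing and gives up when even doubling at every remaining step cannot reach N. A sketch, assuming N >= 1 and small N like the samples; the function names are hypothetical:

```cpp
#include <vector>

// Depth-limited DFS: extends `chain` (strictly increasing, each new element
// = chain[i] + chain[j]) until it ends in n, using at most maxLen elements.
bool extendChain(std::vector<int>& chain, int n, int maxLen) {
    int last = chain.back();
    if (last == n) return true;
    int size = static_cast<int>(chain.size());
    if (size == maxLen) return false;
    // Prune: doubling at every remaining step is the fastest possible growth.
    long reach = last;
    for (int s = size; s < maxLen; ++s) reach *= 2;
    if (reach < n) return false;
    for (int i = size - 1; i >= 0; --i)           // try large sums first
        for (int j = i; j >= 0; --j) {
            int next = chain[i] + chain[j];
            if (next <= last) break;              // smaller j gives smaller sums
            if (next > n) continue;
            chain.push_back(next);
            if (extendChain(chain, n, maxLen)) return true;
            chain.pop_back();
        }
    return false;
}

// Iterative deepening: the first length limit that succeeds is the shortest.
std::vector<int> sigmaSeries(int n) {
    for (int maxLen = 1;; ++maxLen) {
        std::vector<int> chain{1};
        if (extendChain(chain, n, maxLen)) return chain;
    }
}
```

When several shortest sequences exist, this search may print a different one than the sample output, so a judge expecting one specific sequence would need a matching tie-break.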

Page 26

Useful C functions

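int atoi(const char* s);
converts C strings to ints: atoi("100") == 100

double atof(const char* s);
converts C strings to doubles: atof("100.0") == 100.0

int strcasecmp(const char* s1, const char* s2);   (POSIX, from <strings.h>)
case-insensitive C string comparison: strcasecmp("aCm","ACm") == 0

long strtol(const char* s, char** endptr, int base);
arbitrary conversion from a string in bases (2-36) to a long int; pass NULL as endptr if you don't need to know where parsing stopped
strtol("Charlie", NULL, 36) == 2147483647L where long is 32 bits: the value is too large, so strtol returns LONG_MAX (with a 64-bit long, it fits and the actual value comes back)

use man for more...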
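A quick sketch exercising these calls. `strcasecmp` is POSIX and comes from `<strings.h>` rather than standard C or C++; `parseBase` is a hypothetical wrapper showing how strtol's end pointer reports where parsing stopped:

```cpp
#include <cstdlib>    // atoi, atof, strtol
#include <strings.h>  // strcasecmp (POSIX, not ISO C/C++)

// Parses s in the given base and reports, through `rest`,
// the first character strtol did not consume.
long parseBase(const char* s, int base, const char** rest) {
    char* end = nullptr;
    long value = std::strtol(s, &end, base);
    if (rest != nullptr) *rest = end;
    return value;
}
```

For example, parsing "101xyz" in base 2 yields 5 and leaves `rest` pointing at the 'x'.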

Page 27

sprintf

int sprintf(char* str, const char* format, ...);
prints anything to the string str

char str[100];
sprintf(str,"%d",42);       // str is "42"
sprintf(str,"%f",42.0);     // str is "42.000000" ("%.1f" gives "42.0")

flexible formatting, right/left justify:
sprintf(str,"%10d",42);     // str is "        42"
sprintf(str,"%-10d",42);    // str is "42        "
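The same formats checked with `snprintf`, the size-limited sibling of `sprintf` that never writes past the buffer size it is given. The helper names `formatInt` and `formatDouble` are hypothetical:

```cpp
#include <cstdio>
#include <string>

// snprintf writes at most sizeof(buf) bytes (including the '\0'),
// so an unexpectedly long result is truncated instead of overflowing.
std::string formatInt(const char* format, int value) {
    char buf[100];
    std::snprintf(buf, sizeof buf, format, value);
    return buf;
}

std::string formatDouble(const char* format, double value) {
    char buf[100];
    std::snprintf(buf, sizeof buf, format, value);
    return buf;
}
```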

Page 28

A chance to “improve” your C/C++ …

Preparation for the ACM competition ...

Problem Insight and Execution ...

Two ACM programming skills:
1. Get into the minds of the judges
2. Anxiety!

Page 29

Get into the minds of the judges

Key Skill #1: mindreading

“What cases should I handle?” spectrum, running from 0% to 100%

Page 30

Key Skill #2: anxiety

Anxiety!

Page 31

Dynamic Programming

Strategy: create a table of partial results & build on it.

divis.cc

T(n) = number of steps yet to go

T(n) = T(n/2) + 1     if n even
T(n) = T(3n+1) + 1    if n odd
T(1) = 0              (nothing left to do)
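A memoized version of this recurrence, assuming T(1) = 0 and that intermediate values fit in a `long` (true for small starting values such as those below). `steps` is a hypothetical name:

```cpp
#include <unordered_map>

// T(n) with a cache: each n is computed once, and later calls that pass
// through n reuse the stored answer (the "table of partial results").
long steps(long n, std::unordered_map<long, long>& memo) {
    if (n == 1) return 0;
    auto it = memo.find(n);
    if (it != memo.end()) return it->second;
    long t = (n % 2 == 0) ? steps(n / 2, memo) + 1
                          : steps(3 * n + 1, memo) + 1;
    memo[n] = t;
    return t;
}
```

For example, 3 goes 3, 10, 5, 16, 8, 4, 2, 1, so steps(3) is 7; steps(27) is 111.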

Page 32

Dynamic Programming

Keys: create a table of partial results, articulate what each table cell means, then build it up...

divis.cc

T[i][j] is 1 if i is a possible remainder using the first j items in the list.

[Figure: Table T for the list 1 1 6 2 -3 and the divisor 4; rows i = 0 to 3 are the possible remainders, columns j count the items considered so far]
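One way to fill such a table, keeping only the current column as a `std::vector<bool>` indexed by remainder. This is a sketch, not the author's withheld divis.cc code: it reduces each item to |item| % k and adds k before the final % so no index goes negative. Allowing a sign on the first item too does not change the answer, since negating a whole expression keeps it divisible by k. `divisible` here is a standalone hypothetical version that takes the list and k as parameters:

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// reach[r] is true if some choice of + and - over the items seen so far
// gives remainder r (mod k); this is one column of table T at a time.
bool divisible(const std::vector<int>& items, int k) {
    std::vector<bool> reach(k, false), nextReach(k, false);
    reach[0] = true;
    for (int item : items) {
        int m = std::abs(item) % k;
        std::fill(nextReach.begin(), nextReach.end(), false);
        for (int r = 0; r < k; ++r)
            if (reach[r]) {
                nextReach[(r + m) % k] = true;      // add the item
                nextReach[(r - m + k) % k] = true;  // subtract it, kept non-negative
            }
        reach.swap(nextReach);
    }
    return reach[0];
}
```

For example, 17, 5, -21, 15 can reach a multiple of 7 (17 + 5 + 21 - 15 = 28) but never a multiple of 5.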

Page 33

Dynamic programs can be short

#include <algorithm>   // fill
#include <cstdio>
#include <cstdlib>     // abs
#include <iostream>
#include <vector>
using namespace std;

vector<int> v(10000);
vector<bool> m(100);   // old mods
vector<bool> m2(100);  // new mods
int n, k;

bool divisible()
{
  fill(m.begin(), m.end(), false);
  m[0] = true;

  for (int i = 0; i < n; i++) {
    /* not giving away all of the code */
    /* here the table is built (6 lines) */
  }

  return m[0];
}

int main()
{
  cin >> n;  // garbage

  while (cin >> n) {
    cin >> k;
    for (int i = 0; i < n; i++) {
      cin >> v[i];
      v[i] = abs(v[i]);
      v[i] %= k;
    }
    cout << (divisible() ? "D" : "Not d") << "ivisible\n";
  }
  cout << endl;
}

acknowledgment: Matt Brubeck

STL: http://www.sgi.com/Technology/STL

Page 34

General ACM Programming

Try brute force first (or at least consider it)

-- sometimes it will work fine…

-- sometimes it will take a _bit_ too long

-- sometimes it will take _way_ too long

Best bugs from last week:

getting the input in the “pea” problem:
for (int j=1 ; j<N ; ++j){ cin >> Array[i];}

filling in the table in the “divis” problem:
Table[i + n % k] = 1;Table[i - n % k] = 1;

Page 35

New Problem

Word Chains

Input: A list of words

Output: yes or no -- can these words be chained together such that the last letter of one is the first letter of the next… ?

doze

aplomb

ceded

dozen

envy

ballistic

yearn

hertz

jazz

hajj

zeroth
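Treating each word as a directed edge from its first letter to its last letter turns Word Chains into an Euler-path question (a technique chosen here, not stated on the slide): the chain exists iff every letter has equal in- and out-degree except at most one start (out - in = 1) and one end (in - out = 1), and all used letters are connected. A sketch, assuming non-empty lowercase words and a hypothetical `canChain` helper; connectivity is checked with a small union-find that ignores edge direction:

```cpp
#include <string>
#include <vector>

bool canChain(const std::vector<std::string>& words) {
    int in[26] = {0}, out[26] = {0}, parent[26];
    for (int c = 0; c < 26; ++c) parent[c] = c;
    auto findSet = [&](int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];  // path halving
            x = parent[x];
        }
        return x;
    };
    for (const std::string& w : words) {
        int a = w.front() - 'a', b = w.back() - 'a';
        ++out[a];
        ++in[b];
        parent[findSet(a)] = findSet(b);    // union the two letters
    }
    int starts = 0, ends = 0, root = -1;
    for (int c = 0; c < 26; ++c) {
        if (in[c] == 0 && out[c] == 0) continue;  // letter not used
        if (root == -1) root = findSet(c);
        else if (findSet(c) != root) return false;  // disconnected
        int diff = out[c] - in[c];
        if (diff == 1) ++starts;
        else if (diff == -1) ++ends;
        else if (diff != 0) return false;
    }
    return starts <= 1 && ends <= 1;
}
```

For the eleven words listed above, three letters (a, d, h) each start one more word than they end, so this check answers no for the list as a whole; aplomb, ballistic, ceded, doze, envy, yearn on their own chain fine.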

Page 36

Knapsack Problem

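Maximize loot w/ weight limit of 4.

object   wt.   val.
  1       3     8
  2       2     5
  3       1     1
  4       2     5

V(n,w) = max value stealable w/ ‘n’ objects & ‘w’ weight
n = number of objects considered (table rows 1 to 4), w = weight available for use (table columns 0 to 4)

$V(n,w) = \max\left(V(n-1,w),\; V(n-1,w-w_n) + v_n\right)$ if $w_n \le w$, otherwise $V(n,w) = V(n-1,w)$, with $V(0,w) = 0$, where $w_n$ and $v_n$ are object n's weight and value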
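A table-filling sketch of that recurrence (the name `knapsack` and the vector-based interface are assumptions). With the four objects above and a limit of 4, it returns 10 by taking objects 2 and 4:

```cpp
#include <algorithm>
#include <vector>

// V[n][w] = max value using the first n objects with weight budget w.
int knapsack(const std::vector<int>& wt, const std::vector<int>& val, int W) {
    const int N = static_cast<int>(wt.size());
    std::vector<std::vector<int>> V(N + 1, std::vector<int>(W + 1, 0));  // V(0,w) = 0
    for (int n = 1; n <= N; ++n)
        for (int w = 0; w <= W; ++w) {
            V[n][w] = V[n - 1][w];                        // leave object n
            if (wt[n - 1] <= w)                           // or take it, if it fits
                V[n][w] = std::max(V[n][w], V[n - 1][w - wt[n - 1]] + val[n - 1]);
        }
    return V[N][W];
}
```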

Page 37

C Output

printf, fprintf, sprintf(char* s, const char* format, …)
s is the destination, format is the format string, and … are the values

Anatomy of a format string such as %-#0 12.4hd:
%            start character
-#0 (space)  flags
12           minimum field width
.4           precision
h            size modifier
d            type

flags:
-        left-justify
0        pad w/ zeros
+        use sign (+ or -)
(space)  use sign (space or -)
#        alternate ("deviant") operation, e.g. a 0x prefix for %x, trailing zeros kept for %g

allowed size modifiers:
h  short
l  long (lowercase L)
L  long double

possible types:
d  decimal integers
u  unsigned (decimal) ints
o  octal integers
x  hexadecimal integers
f  doubles (floats are promoted to double)
e  doubles (exp. notation)
g  f or e; e is used if the exponent is < -4 or >= the precision
c  characters
s  strings
n  stores the # of chars written so far into an int* argument !!
%  two of these print a ‘%’

Page 38

C Output

format      value = 42       value = -42
%10.4d      "      0042"     "     -0042"
%-#12x      "0x2a        "   "0xffffffd6  "   (32-bit int)

format      value = 42.0     value = -42.419
%+10.4g     "       +42"     "    -42.42"
%- 10.4g    " 42       "     "-42.42    "
%-#10.4g    "42.00     "     "-42.42    "

format      value = “forty-two”
%10.5s      "     forty"
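These rows can be checked with a tiny `snprintf` wrapper; `fmt` is a hypothetical helper, and the `%x` row for -42 assumes a 32-bit `unsigned int`. The `%g` rows need a `double` argument: passing an `int` to `%g` is undefined behavior.

```cpp
#include <cstdio>
#include <string>

// Formats a single value so each table row can be compared as a string.
template <typename T>
std::string fmt(const char* format, T value) {
    char buf[64];
    std::snprintf(buf, sizeof buf, format, value);
    return buf;
}
```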

Page 39

Change

Brute Force

const int p=1, n=5, d=10, q=25, h=50;
int counter = 0;                 // num = the amount in cents
int pn, nn, dn, qn, hn;
for (hn = 0; hn*h <= num; hn++)
  for (qn = 0; hn*h + qn*q <= num; qn++)
    for (dn = 0; hn*h + qn*q + dn*d <= num; dn++)
      for (nn = 0; hn*h + qn*q + dn*d + nn*n <= num; nn++)
        for (pn = 0; hn*h + qn*q + dn*d + nn*n + pn*p <= num; pn++) {
          if (hn*h + qn*q + dn*d + nn*n + pn*p == num)
            counter++;
        }

Dynamic Programming

number of ways to make each total using only the listed coins:

using          1¢  2¢  3¢  4¢  5¢  6¢  7¢  8¢  9¢  10¢  11¢  12¢
1¢              1   1   1   1   1   1   1   1   1    1    1    1
1¢, 5¢          1   1   1   1   2   2   2   2   2    3    3    3
1¢, 5¢, 10¢     1   1   1   1   2   2   2   2   2    4    4    4
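Each row of that table is one pass of a loop: adding a coin type c updates ways[t] += ways[t - c] for every total t >= c. Processing one coin type at a time counts combinations rather than orderings. A sketch, with `countWays` as a hypothetical name and amounts in cents:

```cpp
#include <vector>

// ways[t] = number of ways to make t cents from the coin types processed so far.
long countWays(int cents, const std::vector<int>& coins) {
    std::vector<long> ways(cents + 1, 0);
    ways[0] = 1;                         // one way to make 0: use no coins
    for (int c : coins)                  // one table row per coin type
        for (int t = c; t <= cents; ++t)
            ways[t] += ways[t - c];
    return ways[cents];
}
```

With coins {1, 5, 10, 25, 50} this gives 292 for 100 cents and 2 for 6 cents, matching the sample output. When reading amounts like "1.00" as doubles, converting with rounding, e.g. (int)(x * 100 + 0.5), avoids truncation errors, since values such as 0.29 have no exact binary representation.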