CS 121: Lecture 3 Representations - Harvard University

CS 121: Lecture 3 Representations. Madhu Sudan. https://madhu.seas.Harvard.edu/courses/Fall2020 Book: https://introtcs.org How to contact us: only the course heads (slower): [email protected]; the whole staff (faster response): CS 121 Piazza

Transcript of CS 121: Lecture 3 Representations - Harvard University

Page 1: CS 121: Lecture 3 Representations - Harvard University

CS 121: Lecture 3 Representations

Madhu Sudan

https://madhu.seas.Harvard.edu/courses/Fall2020

Book: https://introtcs.org

How to contact us
• Only the course heads (slower): [email protected]
• The whole staff (faster response): CS 121 Piazza

Page 2: CS 121: Lecture 3 Representations - Harvard University

Announcements
• HW0 graded; solutions posted; feedback sent.
• HW1 out; due 1 week from today.
• Section 1 material + video posted.
• New expectations:
  • Watch video in section: Thu+Fri.
  • Must “pre-watch” video for sections: Sat.-Wed.
• Reminder: Boaz Barak on “Compression, Coding and Entropy” – today at 4:30! (Canvas → Zoom → CS 121.5)

Page 3: CS 121: Lecture 3 Representations - Harvard University

Today
• Main message: Can represent “everything” with $\{0,1\}^*$
  • Will define “represent” and explain “everything”
• Lesson 1: Can represent $\mathbb{N}$ with $\{0,1\}^*$
• Break 1
• Lesson 2: Representing $\mathbb{N} \times \mathbb{N}$ with $\{0,1\}^*$
• Break 2
• Lesson 3: Prefix-free representations. Representing $\mathbb{N}^*$
• End: Can’t represent $\mathbb{R}$

Page 4: CS 121: Lecture 3 Representations - Harvard University

Representations: Motivation
• Computers manipulate data. But what is data?
• Can all data be expressed as bits?
  • Answer 1: Yes – don’t we already do it?
  • Answer 2: Obviously! There are so many ways to do it! Pick your favorite.
• Today’s Central Player: $\{0,1\}^* = \bigcup_{n \in \mathbb{N}} \{0,1\}^n$
  • All finite length binary strings

Page 5: CS 121: Lecture 3 Representations - Harvard University


Page 6: CS 121: Lecture 3 Representations - Harvard University

Definition: A representation scheme for a set of objects $\mathcal{O}$ is a one-to-one function $E: \mathcal{O} \to \{0,1\}^*$.

Page 7: CS 121: Lecture 3 Representations - Harvard University

Definition: A representation scheme for a set of objects $\mathcal{O}$ is a one-to-one function $E: \mathcal{O} \to \{0,1\}^*$.

Equivalently: There is $D: \{0,1\}^* \to \mathcal{O}$ s.t. $D(E(x)) = x$ for every $x \in \mathcal{O}$.

$E$: Encoding

$D$: Decoding

Page 8: CS 121: Lecture 3 Representations - Harvard University

Definition: A representation scheme for a set of objects $\mathcal{O}$ is a one-to-one function $E: \mathcal{O} \to \{0,1\}^*$.

Equivalently: There is $D: \{0,1\}^* \to \mathcal{O}$ s.t. $D(E(x)) = x$ for every $x \in \mathcal{O}$.

Much research on “good” representations:
• Effectiveness: Can compute encoding and decoding
• Compression: Representation with small size (e.g., JPEG)
• Error correction: Representation that is robust to errors (e.g., “control digits”, error correcting codes)
• Data structures: Representation enabling fast operations (e.g., binary numbers, distance oracles)
• Feature extraction: Representation enabling prediction (e.g., deep nets)
• Secrecy: Representation hiding certain information (e.g., encryption)

Today: Simple representations for standard objects.

Page 9: CS 121: Lecture 3 Representations - Harvard University

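Binary representation
One-to-one function $E: \mathbb{N} \to \{0,1\}^*$

$E(83) = {}$? (place values: 64 32 16 8 4 2 1)

$E(17) = {}$? (place values: 16 8 4 2 1)

Handwritten notes:
• $D(0110) \overset{?}{=} 6$, while $D(110) = 6$
• 1 0 1 0 0 1 1
• $|E(x)| = \Theta(\log x)$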

Page 10: CS 121: Lecture 3 Representations - Harvard University

Binary representation
One-to-one function $E: \mathbb{N} \to \{0,1\}^*$

$E(83) = 1010011$
64 32 16 8 4 2 1
 1  0  1 0 0 1 1

$E(17) = 10001$
16 8 4 2 1
 1 0 0 0 1

“I always work in base 10.”
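The binary representation on this slide can be sketched in a few lines of Python. This is a minimal sketch, assuming the most-significant-bit-first convention of the examples $E(83)$ and $E(17)$; the names `encode_nat` and `decode_nat`, and the choice $E(0) = 0$, are illustrative assumptions and not from the lecture.

```python
def encode_nat(n: int) -> str:
    """E: map a natural number to its binary string, most significant bit first."""
    if n < 0:
        raise ValueError("only natural numbers are encoded")
    if n == 0:
        return "0"  # assumed convention; the slide does not show E(0)
    bits = []
    while n > 0:
        bits.append(str(n % 2))  # collects the least significant bit first
        n //= 2
    return "".join(reversed(bits))


def decode_nat(s: str) -> int:
    """D: read a binary string as a number, so that D(E(n)) == n."""
    value = 0
    for bit in s:
        value = 2 * value + int(bit)
    return value
```

Note that `decode_nat` accepts strings with leading zeros, so the decoder is defined on more strings than the encoder ever produces; only $D(E(x)) = x$ is required.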

Page 11: CS 121: Lecture 3 Representations - Harvard University

Exercise 1:
• Give a representation of $[3]^* = \{0,1,2\}^*$ as binary strings.
  • Specifically give the encoding function
  • “Prove” it is one-to-one (or give decoder)
• Give an upper bound on the ratio $|E(x)| / |x|$ of your representation
• Now improve the ratio!*

*in the limit $|x| \to \infty$.

Page 12: CS 121: Lecture 3 Representations - Harvard University

Exercise 1: Solutions

0. The question is asking for a one-to-one function $E: \mathcal{O} \to \{0,1\}^*$, where $\mathcal{O} = \{0,1,2\}^*$.

Our simple solution is:

(a) Let $e: \{0,1,2\} \to \{0,1\}^2$ be the map $e(0) = 00$; $e(1) = 11$; $e(2) = 10$.

Then if $d: \{0,1\}^2 \to \{0,1,2\}$ is the function $d(00) = 0$, $d(11) = 1$, $d(10) = 2$, $d(01) = 0$, then $\forall x \in \{0,1,2\}$ we have $d(e(x)) = x$.

Page 13: CS 121: Lecture 3 Representations - Harvard University

So $e$ is 1-1.

(b) Let $E: \{0,1,2\}^* \to \{0,1\}^*$ be given by

$E(x_0 \ldots x_{n-1}) = e(x_0)\, e(x_1) \ldots e(x_{n-1})$.

Let $D(y_0 \ldots y_{m-1}) = d(y_0 y_1)\, d(y_2 y_3) \ldots d(y_{m-2} y_{m-1})$ if $m$ is even, and $D(y_0 \ldots y_{m-1}) = 0$ if $m$ is odd.

Then $\forall x \in \{0,1,2\}^*$ we have $D(E(x)) = x$. (Verify this!)

(c) This solution achieves $|E(x)| / |x| = 2$ $\forall x$.

Page 14: CS 121: Lecture 3 Representations - Harvard University

2. A more “length-efficient” solution would go as follows.

$\{0,1,2\}^* \xrightarrow{P} \{0,1,2\}^* \xrightarrow{F_1} \mathbb{N} \xrightarrow{E} \{0,1\}^*$

where
• $P(x_0 \ldots x_{n-1}) = x_0 \ldots x_{n-1}\, 1$ ← append 1 to string.
• $F_1(y_0 \ldots y_{n-1}) = \sum_{i=0}^{n-1} y_i\, 3^i$
• $E(A) = (A \bmod 2)\, E(\lfloor A/2 \rfloor)$ is the binary representation of $A$.

To see that the composition $F = E \circ F_1 \circ P$ is one-to-one:

Page 15: CS 121: Lecture 3 Representations - Harvard University

• Verify that $F_1 \circ P$ is one-to-one (even though $F_1$ is not!)
• Recall (from earlier in the lecture) that $E$ is one-to-one.
• Composition preserves one-to-one-ness.

3. To analyze the performance of $F$ we note
(i) $F_1 \circ P(x_0 \ldots x_{n-1}) \le 3^{n+2}$
(ii) $|E(A)| \le \log_2 A + 1$

$\Rightarrow |F(x_0 \ldots x_{n-1})| \le \log_2 3^{n+2} + 1 = (n+2) \log_2 3 + 1$

$\Rightarrow \lim_{n \to \infty} \frac{|F(x_0 \ldots x_{n-1})|}{n} = \log_2 3$.
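The length-efficient map $F = E \circ F_1 \circ P$ can be sketched in Python as follows. This is a sketch under assumptions: inputs are lists of digits 0, 1, 2; the binary step writes the least significant bit first, following the recursive description of $E$ in the solution; and the names `encode_ternary` and `decode_ternary` are hypothetical.

```python
def encode_ternary(x):
    """F = E o F_1 o P for a sequence x of digits in {0, 1, 2}."""
    padded = list(x) + [1]                               # P: append the digit 1
    value = sum(d * 3**i for i, d in enumerate(padded))  # F_1: base-3 value
    bits = ""
    while value > 0:                                     # E: emit (A mod 2), then recurse on A // 2
        bits += str(value % 2)
        value //= 2
    return bits


def decode_ternary(bits):
    """Inverse of encode_ternary on its image."""
    value = sum(int(b) << i for i, b in enumerate(bits))
    digits = []
    while value > 0:
        digits.append(value % 3)
        value //= 3
    return digits[:-1]  # drop the digit 1 that P appended
```

The appended 1 is what makes the base-3 value determine the length of the input: without it, the strings 2 and 20 would map to the same number.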

Page 16: CS 121: Lecture 3 Representations - Harvard University

Representing rational numbers

Lemma: There is a one-to-one function $E: \{0,1\}^* \times \{0,1\}^* \to \{0,1\}^*$

$E$ encodes a pair of strings as a single string: $(x, x') \xrightarrow{E} y$.

What about $E(x, x') = x x'$?

(You) Will prove lemma shortly!

Corollary: Can represent $\mathbb{N} \times \mathbb{N}$ as strings.

Corollary: Can represent rational numbers as strings.
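The question about $E(x, x') = x x'$ can be checked directly. The short Python sketch below (with hypothetical names `concat_pair`, `pair_a`, `pair_b`) exhibits two different pairs that plain concatenation maps to the same string, so concatenation alone is not one-to-one.

```python
def concat_pair(x: str, y: str) -> str:
    """The naive candidate E(x, x') = x x' (plain concatenation)."""
    return x + y


# Two different pairs of strings with the same encoding.
pair_a = ("0", "1")
pair_b = ("01", "")
collision = concat_pair(*pair_a) == concat_pair(*pair_b)
```

The decoder cannot tell where the first string ends, which is the problem that prefix-free encodings address.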

Page 17: CS 121: Lecture 3 Representations - Harvard University

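Proof of Corollary:
Lemma: There is a one-to-one function $E: \{0,1\}^* \times \{0,1\}^* \to \{0,1\}^*$

Corollary: Can represent $\mathbb{N} \times \mathbb{N}$ as strings.

$\mathbb{N} \times \mathbb{N} \to \{0,1\}^* \times \{0,1\}^* \xrightarrow{E} \{0,1\}^*$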

Page 18: CS 121: Lecture 3 Representations - Harvard University

Prefix freeness
Definition: $E: \mathcal{O} \to \{0,1\}^*$ is prefix free if for every $x \ne x'$, $E(x)$ is not a prefix of $E(x')$.

Example: 101 is a prefix of 1010011

English is not prefix free: teacherstalking.com

Page 19: CS 121: Lecture 3 Representations - Harvard University

Prefix freeness
Definition: $E: \mathcal{O} \to \{0,1\}^*$ is prefix free if for every $x \ne x'$, $E(x)$ is not a prefix of $E(x')$.

Theorem (2.18): “If $E$ is prefix free then we can use it to encode pairs/lists”
$E$ prefix free $\Rightarrow$ $E': \mathcal{O}^* \to \{0,1\}^*$ defined as $E'(x_0, x_1, \ldots, x_{k-1}) = E(x_0)\, E(x_1) \ldots E(x_{k-1})$ is one-to-one.

Lemma (2.20): “Every encoding can be converted into a prefix-free one”
If $E: \mathcal{O} \to \{0,1\}^*$ is one-to-one, then there exists a prefix-free $E': \mathcal{O} \to \{0,1\}^*$.

numbers
lists of numbers
lists of lists of numbers = matrices
images
Lists of lists of images = videos
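To make Theorem 2.18 concrete, here is a small Python sketch of a prefix-freeness check together with the list encoding $E'$ and its left-to-right decoder. The code tables used in the usage note are made-up examples, and the function names are assumptions rather than anything from the lecture or the book.

```python
def is_prefix_free(code):
    """True if E(x) is not a prefix of E(y) for every pair x != y in the table."""
    return not any(
        x != y and code[y].startswith(code[x]) for x in code for y in code
    )


def encode_list(code, items):
    """E'(x_0, ..., x_{k-1}) = E(x_0) E(x_1) ... E(x_{k-1})."""
    return "".join(code[item] for item in items)


def decode_list(code, bits):
    """Left-to-right decoder; correct only when `code` is prefix-free."""
    inverse = {word: item for item, word in code.items()}
    items, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # a full codeword has been read
            items.append(inverse[current])
            current = ""
    if current:
        raise ValueError("input is not a concatenation of codewords")
    return items
```

For instance, the hypothetical table `{"a": "0", "b": "10", "c": "11"}` is prefix-free and lists over it decode unambiguously, whereas `{"a": "0", "b": "01"}` is not prefix-free because 0 is a prefix of 01.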

Page 20: CS 121: Lecture 3 Representations - Harvard University

Exercise 2:
• Give a prefix-free mapping $E: \{0,1\}^* \to \{0,1\}^*$
  • Hint (example): $0010100 \mapsto 0010100\#$
• Give an upper bound on the ratio $|E(x)| / |x|$ for your $E$.
• (If you have extra time, try to think of $E$’s that improve the ratio.)

Page 21: CS 121: Lecture 3 Representations - Harvard University

Solution
WARNING: I made some incorrect claims in lecture. This is the corrected, sanitized version.

The idea for our basic encoding $E$ is simple:

$\{0,1\}^* \xrightarrow{E_1} \{0,1,2\}^* \xrightarrow{E_2} \{0,1\}^*$

where $E_1: \{0,1\}^* \to \{0,1,2\}^*$ is given by

$E_1(x_0 \ldots x_{n-1}) = x_0 \ldots x_{n-1}\, 2$

Page 22: CS 121: Lecture 3 Representations - Harvard University

and $E_2: \{0,1,2\}^* \to \{0,1\}^*$ is given by

$E_2(y_0 \ldots y_{m-1}) = e_2(y_0)\, e_2(y_1) \ldots e_2(y_{m-1})$

where $e_2: \{0,1,2\} \to \{0,1\}^*$ is given by $e_2(0) = 00$, $e_2(1) = 11$, $e_2(2) = 10$.

To prove correctness we make the following claims:

① $E_1$ is prefix-free. (Verify yourselves.)

② $e_2$ is prefix-free. (Verify yourselves.)

③ If $E = E_2 \circ E_1$, where $E_2(y_0 \ldots y_{m-1}) = e_2(y_0) \ldots e_2(y_{m-1})$, and $E_1$ and $e_2$ are prefix-free, then so is $E$.
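The basic solution $E = E_2 \circ E_1$ can be sketched in Python as below. This is an illustrative sketch, assuming bit strings are Python `str` values over the characters 0 and 1; the names `E2_SYMBOL`, `encode_prefix_free`, and `decode_prefix_free` are hypothetical.

```python
E2_SYMBOL = {"0": "00", "1": "11", "2": "10"}  # the map e_2 from the solution


def encode_prefix_free(x: str) -> str:
    """E = E_2 o E_1: append the symbol 2, then replace each symbol by two bits."""
    return "".join(E2_SYMBOL[symbol] for symbol in x + "2")


def decode_prefix_free(bits: str):
    """Decode one codeword from the front of `bits`; return (x, remaining bits)."""
    inverse = {pair: symbol for symbol, pair in E2_SYMBOL.items()}
    x = ""
    for i in range(0, len(bits) - 1, 2):
        symbol = inverse.get(bits[i:i + 2])
        if symbol is None:
            break  # the pair 01 is never produced by e_2
        if symbol == "2":
            return x, bits[i + 2:]  # the end marker closes the codeword
        x += symbol
    raise ValueError("no complete codeword at the start of the input")
```

Because the decoder knows where a codeword ends, it can hand back the unread remainder, which is what makes concatenating several codewords safe.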

Page 23: CS 121: Lecture 3 Representations - Harvard University

Notes:

① The transformation $e_2 \to E_2$ is exactly the one from Theorem 2.18 in the book (mentioned earlier in lecture). Thm 2.18 says $E_2$ is one-to-one.

② I claimed that $E_1$ prefix-free + $E_2$ one-to-one implies $E$ is prefix-free. This is NOT CORRECT.

Example due to Adam H.:
• Let $F: \{0,1,2\}^* \to \{0,1\}^*$ be the map from solution 2 to Exercise 1.
• Using $E_2 = F$ and $E_1$ as in this solution, we get $E(\text{empty string}) = 101$ and $E(0) = 1011$.

Page 24: CS 121: Lecture 3 Representations - Harvard University

Note that ① + ② + ③ $\Rightarrow$ $E$ is prefix-free.

Proof of ③:
• Suppose $E(x) = y_0 \ldots y_{n-1}$ and $E(z) = y_0 \ldots y_{n-1}\, y_n \ldots y_{m-1}$, so $E(x)$ is a prefix of $E(z)$.
• Let $d_2$ be the decoder corresponding to $e_2$ and $D_2$ the left-to-right decoder for $E_2$ based on $d_2$. [So $D_2(y_0 \ldots y_{n-1}) = d_2(y_0 \ldots y_i)\, d_2(y_{i+1} \ldots y_j) \ldots$]

Then $D_2(y_0 \ldots y_{n-1}) = w_0 \ldots w_{a-1}$ and $D_2(y_0 \ldots y_{m-1}) = v_0 \ldots v_{b-1}$ satisfy $w_0 = v_0$, $w_1 = v_1$, $\ldots$, $w_{a-1} = v_{a-1}$.

Page 25: CS 121: Lecture 3 Representations - Harvard University

• In other words, $w_0 \ldots w_{a-1}$ is a prefix of $v_0 \ldots v_{b-1}$.

But $w_0 \ldots w_{a-1} = E_1(x)$ and $v_0 \ldots v_{b-1} = E_1(z)$, and this implies $E_1(x)$ is a prefix of $E_1(z)$, contradicting “$E_1$ is prefix-free”.

Page 26: CS 121: Lecture 3 Representations - Harvard University

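Performance of our solution: $\frac{|E(x)|}{|x|} = {}$?

If $|x| = n$, then $|E_1(x)| = n + 1$ and $|E_2(E_1(x))| = 2(n+1)$

$\Rightarrow \lim_{n \to \infty} \frac{|E(x)|}{|x|} = \lim_{n \to \infty} \frac{2(n+1)}{n} = 2$.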

Page 27: CS 121: Lecture 3 Representations - Harvard University

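Improved solution:

Claim: if $E: \{0,1\}^* \to \{0,1\}^*$ is the prefix-free solution from earlier and $E_0: \mathbb{N} \to \{0,1\}^*$ is 1-to-1, then
(i) $\bar{E}(x) = E(E_0(|x|))\, x$ is prefix-free, and
(ii) $\lim_{|x| \to \infty} \frac{|\bar{E}(x)|}{|x|} = 1$.

Proof. (ii) is easy, so let's do that first.

If $|x| = n$ then $|E_0(|x|)| \le \log_2 n + 1$

$\Rightarrow |E(E_0(|x|))| \le 2(\log_2 n + 2)$ $\Rightarrow |\bar{E}(x)| \le n + 2 \log_2 n + 4$

$\Rightarrow \lim_{n \to \infty} \frac{n + 2 \log_2 n + 4}{n} = 1$.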

Page 28: CS 121: Lecture 3 Representations - Harvard University

(i) Assume again, for contradiction, that $\bar{E}(x)$ is a prefix of $\bar{E}(z)$.

Then since $E(E_0(|x|))$ is a prefix of $\bar{E}(x)$ and $E(E_0(|z|))$ is a prefix of $\bar{E}(z)$, we have

① $E(E_0(|x|))$ is a proper prefix of $E(E_0(|z|))$,

or ② $E(E_0(|x|)) = E(E_0(|z|))$,

or ③ $E(E_0(|z|))$ is a proper prefix of $E(E_0(|x|))$.

But ① and ③ can't happen since $E$ is prefix-free.

And if ② happens then $|x| = |z|$ (since $E$ and $E_0$ are one-to-one), and $x$ is a prefix of $z$ $\Rightarrow x = z$.
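The improved encoding $\bar{E}(x) = E(E_0(|x|))\, x$ can be sketched in Python as follows. This is a sketch under assumptions: $E$ is taken to be the basic solution (double each bit, then append 10), $E_0$ is the usual most-significant-bit-first binary representation with $E_0(0) = 0$, and all function names are hypothetical.

```python
def basic_prefix_free(y: str) -> str:
    """The earlier solution E: double every bit, then end with the marker 10."""
    return "".join(bit * 2 for bit in y) + "10"


def improved_prefix_free(x: str) -> str:
    """E-bar(x) = E(E_0(|x|)) x: a prefix-free length header, then x itself."""
    length_in_binary = format(len(x), "b")  # E_0(|x|); "0" for the empty string
    return basic_prefix_free(length_in_binary) + x


def decode_improved(bits: str):
    """Read one codeword from the front of `bits`; return (x, remaining bits)."""
    length_bits, i = "", 0
    while bits[i:i + 2] in ("00", "11"):  # doubled bits of the header
        length_bits += bits[i]
        i += 2
    if bits[i:i + 2] != "10" or not length_bits:
        raise ValueError("malformed length header")
    n, start = int(length_bits, 2), i + 2
    if len(bits) < start + n:
        raise ValueError("payload is shorter than the announced length")
    return bits[start:start + n], bits[start + n:]
```

Only the header is doubled, so the overhead is about $2 \log_2 n$ bits instead of $n$ bits, which is where the ratio tending to 1 comes from.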

Page 29: CS 121: Lecture 3 Representations - Harvard University

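E.g., Representing Graphs
• $G = (V, E)$; $V = [n]$; $E \subseteq V \times V$
• Let $E_{\mathbb{N}}: \mathbb{N} \to \{0,1\}^*$ be prefix-free.
• Let $E = \{(i_1, j_1), \ldots, (i_m, j_m)\} \subseteq V \times V$ and $V = [n]$.
• Then $E_G: \text{Graphs} \to \{0,1\}^*$ is given by

$E_G(G) = E_{\mathbb{N}}(n)\, E_{\mathbb{N}}(i_1)\, E_{\mathbb{N}}(j_1) \ldots E_{\mathbb{N}}(i_m)\, E_{\mathbb{N}}(j_m)$.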
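As an illustration of the graph encoding above, here is a Python sketch. It assumes one particular prefix-free $E_{\mathbb{N}}$ (double each bit of the binary form, then append 10); the slide allows any prefix-free choice, and the function names are hypothetical.

```python
def encode_nat_pf(n: int) -> str:
    """An assumed prefix-free E_N: double each bit of n in binary, then append 10."""
    return "".join(bit * 2 for bit in format(n, "b")) + "10"


def decode_nats_pf(bits: str):
    """Split a concatenation of E_N codewords back into the list of numbers."""
    numbers, current = [], ""
    for i in range(0, len(bits), 2):
        pair = bits[i:i + 2]
        if pair == "10":                      # end marker of one codeword
            numbers.append(int(current, 2))
            current = ""
        elif pair in ("00", "11"):            # one doubled bit
            current += pair[0]
        else:
            raise ValueError("not a valid encoding")
    if current:
        raise ValueError("truncated encoding")
    return numbers


def encode_graph(n: int, edges) -> str:
    """E_G(G) = E_N(n) E_N(i_1) E_N(j_1) ... E_N(i_m) E_N(j_m)."""
    return encode_nat_pf(n) + "".join(
        encode_nat_pf(i) + encode_nat_pf(j) for i, j in edges
    )


def decode_graph(bits: str):
    """Recover (n, edge list) from an encoding produced by encode_graph."""
    n, *rest = decode_nats_pf(bits)
    return n, list(zip(rest[0::2], rest[1::2]))
```

A triangle on three vertices would be passed as `encode_graph(3, [(1, 2), (2, 3), (3, 1)])`, and `decode_graph` returns the same vertex count and edge list.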

Page 30: CS 121: Lecture 3 Representations - Harvard University

Example: Representing Matrix
• $M \in \mathbb{Z}^{m \times n}$:
  • First represent “list”
  • Then represent “list of lists”

Let $E: \mathbb{Z} \to \{0,1\}^*$ be prefix-free.

$E_{\text{list}}((i_1, \ldots, i_n)) = E(n)\, E(i_1) \ldots E(i_n)$ is prefix-free!

$E_{\text{mat}}(((a_{11} \ldots a_{1n}), \ldots, (a_{m1} \ldots a_{mn}))) = E(m)\, E_{\text{list}}((a_{11} \ldots a_{1n}))\, E_{\text{list}}((a_{21} \ldots a_{2n})) \ldots E_{\text{list}}((a_{m1} \ldots a_{mn}))$ is also prefix-free.
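The list and matrix encodings can be sketched in Python as below. The prefix-free integer code used here (fold $\mathbb{Z}$ into $\mathbb{N}$, double each bit, append 10) is an assumption chosen for illustration, since the slide only requires that $E$ be prefix-free; all names are hypothetical.

```python
def encode_int_pf(z: int) -> str:
    """An assumed prefix-free E on integers."""
    n = 2 * z if z >= 0 else -2 * z - 1  # fold Z into N: 0, -1, 1, -2, 2, ...
    return "".join(bit * 2 for bit in format(n, "b")) + "10"


def read_int_pf(bits: str, pos: int):
    """Decode one integer starting at index pos; return (value, next index)."""
    digits = ""
    while bits[pos:pos + 2] in ("00", "11"):
        digits += bits[pos]
        pos += 2
    if bits[pos:pos + 2] != "10" or not digits:
        raise ValueError("malformed integer")
    n = int(digits, 2)
    value = n // 2 if n % 2 == 0 else -((n + 1) // 2)
    return value, pos + 2


def encode_list(items) -> str:
    """E_list((i_1, ..., i_n)) = E(n) E(i_1) ... E(i_n)."""
    return encode_int_pf(len(items)) + "".join(encode_int_pf(i) for i in items)


def encode_matrix(rows) -> str:
    """E_mat = E(m) E_list(row_1) ... E_list(row_m)."""
    return encode_int_pf(len(rows)) + "".join(encode_list(row) for row in rows)


def decode_matrix(bits: str):
    """Inverse of encode_matrix, reading counts first and entries after."""
    m, pos = read_int_pf(bits, 0)
    rows = []
    for _ in range(m):
        n, pos = read_int_pf(bits, pos)
        row = []
        for _ in range(n):
            value, pos = read_int_pf(bits, pos)
            row.append(value)
        rows.append(row)
    return rows
```

Each level stores its own length first, so the decoder always knows how many items to read next; this is the same idea applied once for lists and once more for lists of lists.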

Page 31: CS 121: Lecture 3 Representations - Harvard University

Prefix-free encoding in practice
• “C style strings”: null terminated
• “Pascal style”: encode $x \in \{0,1\}^{\le 255}$ as $|x|, x$

..both led to many security breaks
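The two conventions can be compared with a small Python sketch. It works on bytes rather than on bit strings, which is an assumption made for readability, and the function names are hypothetical.

```python
def c_style(data: bytes) -> bytes:
    """Null-terminated: the payload itself must not contain the byte 0."""
    if b"\x00" in data:
        raise ValueError("payload contains the terminator")
    return data + b"\x00"


def pascal_style(data: bytes) -> bytes:
    """Length-prefixed: one length byte, so at most 255 payload bytes."""
    if len(data) > 255:
        raise ValueError("payload too long for a one-byte length")
    return bytes([len(data)]) + data


def read_c_style(buf: bytes):
    """Return (payload, rest); raises ValueError if no terminator is present."""
    end = buf.index(b"\x00")
    return buf[:end], buf[end + 1:]


def read_pascal_style(buf: bytes):
    """Return (payload, rest), checking the announced length against the buffer."""
    n = buf[0]
    if len(buf) < 1 + n:
        raise ValueError("announced length exceeds the available bytes")
    return buf[1:1 + n], buf[1 + n:]
```

Both readers depend on a check (a terminator that is really there, a length that really fits); skipping that kind of check is the typical way such encodings turn into security problems.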

Page 32: CS 121: Lecture 3 Representations - Harvard University

TLS Heartbeat protocol
Check connection is alive:

$|x|, x$

$|x|, x$

Page 33: CS 121: Lecture 3 Representations - Harvard University

Heartbleed attack

“Some might argue that it is the worst vulnerability found (at least in terms of its potential impact) since commercial traffic began to flow on the Internet.”,

Joseph Steinberg , Forbes.
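The idea behind the attack can be shown with a toy model in Python. This is not OpenSSL code and not the real TLS message format: it is a simplified sketch that assumes the request payload is stored directly next to unrelated secret data, and every name in it is hypothetical.

```python
SECRET = b"private-key-material"  # stand-in for whatever lies next to the payload


def server_memory(payload: bytes) -> bytes:
    """Toy layout: the request payload sits right before unrelated secret data."""
    return payload + SECRET


def heartbeat_vulnerable(claimed_len: int, payload: bytes) -> bytes:
    """Echo claimed_len bytes without comparing that number to the real payload."""
    return server_memory(payload)[:claimed_len]


def heartbeat_checked(claimed_len: int, payload: bytes) -> bytes:
    """Refuse requests whose announced length disagrees with the payload."""
    if claimed_len != len(payload):
        raise ValueError("announced length does not match payload")
    return payload[:claimed_len]
```

With an honest request such as `(3, b"hat")` both handlers echo the payload; with a request that announces a larger length, the unchecked handler returns bytes beyond the payload, while the checked one rejects the request.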


Page 38: CS 121: Lecture 3 Representations - Harvard University

Can we represent everything?
Unfortunate Fact (Thm 2.5): “Can’t represent real numbers as strings”
There is no one-to-one function $RtS: \mathbb{R} \to \{0,1\}^*$

Page 39: CS 121: Lecture 3 Representations - Harvard University

Implications
Thesis: Everything representable as $\{0,1\}^*$!

Everything = $\mathbb{Z}^*$

Or Everything = $\{0,1\}^*$

Or Everything = $\text{Keyboard}^*$

Includes Music? All sounds? All images?

All Smell? People (“Beam me up, Scotty!”)

Page 40: CS 121: Lecture 3 Representations - Harvard University

Rest of the course:

Part I: Circuits: Finite computation, quantitative study

Part II: Automata: Infinite restricted computation, quantitative study

Part III: Turing Machines: Infinite computation, qualitative study

Part IV: Efficient Computation: Infinite computation, quantitative study

Part V: Randomized computation: Extending studies to non-classical algorithms