Machine Learning

On the Impact of Representation

5 short stories

Prof. Dr. Volker Sperschneider

AG Maschinelles Lernen und Natürlichsprachliche Systeme

Institut für Informatik, Technische Fakultät

Albert-Ludwigs-Universität Freiburg

[email protected]

1.  Cover a truncated chess board – simple

2.  Choose the lucky door – well known

3.  Playing „three bit lotto“ – surprising

4.  Genome rearrangement – clever

5.  PCP theorem - unbelievable

Covering a truncated chess board

Can you see now that a cover is impossible?
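The slide relies on its figure, so the following is only a minimal sketch of the colouring representation. It assumes (the text does not say so) that the truncated board is the standard 8×8 board with two diagonally opposite corner squares removed and that the cover uses dominoes of two adjacent squares. The function name `colour_counts` is hypothetical.

```python
def colour_counts(size=8, removed=((0, 0), (7, 7))):
    """Count dark and light squares of a board with some squares removed."""
    counts = {"dark": 0, "light": 0}
    for row in range(size):
        for col in range(size):
            if (row, col) in removed:
                continue
            # Squares whose coordinates sum to an even number share one colour.
            colour = "dark" if (row + col) % 2 == 0 else "light"
            counts[colour] += 1
    return counts


counts = colour_counts()
# Every domino covers exactly one dark and one light square, so a complete
# cover needs equally many squares of each colour.
print(counts, "colour counts equal:", counts["dark"] == counts["light"])
```

Under these assumptions the two removed corners have the same colour, so the counts differ and no cover exists.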

The lucky door

What we cannot see: behind one of the three doors there is a heart of gold.

Make a preliminary choice (figure: one door is marked as an example).

Among the two other doors I show you one that does not hide the heart of gold.

Now make the final choice: keep the preliminary choice or change to the other closed door.

Probability that the heart is in one of the two groups: 1/3 for the chosen door, 2/3 for the group of the two other doors.

Knowing where the heart is not does not change these probabilities: the chosen door keeps 1/3, the group of the two other doors keeps 2/3.
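The probabilities 1/3 and 2/3 can be checked empirically. The following is a minimal simulation sketch; the function names `play` and `win_rate`, the seed and the number of trials are arbitrary choices made for illustration.

```python
import random


def play(switch, rng):
    """Play one round; return True if the final choice hides the heart."""
    doors = [0, 1, 2]
    heart = rng.choice(doors)
    first = rng.choice(doors)
    # The host opens a door that is neither chosen nor hides the heart.
    opened = rng.choice([d for d in doors if d != first and d != heart])
    final = first
    if switch:
        final = next(d for d in doors if d != first and d != opened)
    return final == heart


def win_rate(switch, trials=20_000, seed=0):
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


print("keep the door:  ", win_rate(switch=False))
print("change the door:", win_rate(switch=True))
```

Keeping the door should win in roughly one third of the rounds, changing in roughly two thirds.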

Choose three bits

•  Two players, called A and B, play a game.

•  Player A chooses a sequence of three bits, for example 101.

•  Player B knows what player A has chosen and also chooses a sequence of three bits that differs from the one player A has chosen, for example 111.

Choose three bits

•  Now a random bit generator generates a stream of bits such that all bits are generated independently and with equal probability.

•  The player whose three bits appear first as consecutive bits of the stream wins the game.

•  Which player would you prefer to be?
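The rules above can be sketched as a small simulation. The names `first_to_appear`, `random_bits` and `estimate_b_wins` are hypothetical, and the number of games and the seed are arbitrary.

```python
import random


def first_to_appear(a, b, bits):
    """Return "A" or "B", whichever three-bit sequence shows up first."""
    window = ""
    for bit in bits:
        window = (window + bit)[-3:]  # keep only the last three bits
        if window == a:
            return "A"
        if window == b:
            return "B"
    return None  # the stream ended before either sequence appeared


def random_bits(rng):
    """Endless stream of independent, equally likely bits."""
    while True:
        yield rng.choice("01")


def estimate_b_wins(a, b, games=20_000, seed=1):
    rng = random.Random(seed)
    wins = sum(first_to_appear(a, b, random_bits(rng)) == "B"
               for _ in range(games))
    return wins / games


print(first_to_appear("101", "111", "1100101"))  # A
print(estimate_b_wins("101", "111"))
print(estimate_b_wins("101", "110"))
```

Trying several replies of player B against the same choice of player A shows that the winning chances depend strongly on the pair of sequences, although every single sequence has probability 1/8.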

Choose three bits

Example of a generated bit stream, growing by one bit per step:

1, 11, 110, 1100, 11001, 110010, 1100101

(With the example choices 101 for player A and 111 for player B, the sequence 101 appears first, in the last step.)

Pros and cons

•  Every sequence of three bits has the same probability of being generated, namely 1/8.

•  Player A may choose among 8 sequences, player B then only among 7.

•  Player B may choose a better sequence than the one player A has chosen.

•  If player B could outperform every choice of player A, why does player A not simply select the best of the 8 possible sequences?

•  Very disturbing, until we represent the generation of the bit stream by a graph with 4 nodes that represent the 2 bits generated just before, and edges that encode the next generated bit.

Graph representation

(Figure: directed graph with the four nodes 00, 01, 11, 10. Every node has two outgoing edges, labelled 0 and 1; the edge labelled b leads from node xy to node yb.)

•  Each edge encodes the three bits generated most recently: the bit just generated is written at the edge, the two bits generated before it are written at the source node of the edge.

•  The generation of the bit stream corresponds to a random walk through the graph, starting at the edge that represents the first three bits.

Example of a walk through the graph (figures: the graph is repeated for each step, following one further edge per generated bit):

001, 0010, 00101, 001011, 0010111, 00101110

•  Every selection of player A and player B corresponds to a specific edge.

(Figure: the graph with the edge selected by A and the edge selected by B marked.)

•  Player B outperforms player A as follows:

(Figures: three cases, each showing the graph with the edge selected by A and the edge selected by B marked.)

•  Can you recognize B's strategy for outperforming A?

•  Can you compute or estimate probabilities of winning the game for players A and B in all of the 3 cases presented above?

•  Why is there no best choice for player A?
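One possible way to answer these questions is to compute the winning probabilities exactly on the 4-node graph. The sketch below is an illustration, not the lecture's own method: for each node it sets up the probability that B's edge is traversed before A's edge and solves the resulting 4×4 linear system with exact fractions. The name `win_probability_b` is hypothetical; the first two bits are assumed to be uniformly distributed over the four nodes.

```python
from fractions import Fraction
from itertools import product

NODES = ["".join(p) for p in product("01", repeat=2)]      # 00, 01, 10, 11
PATTERNS = ["".join(p) for p in product("01", repeat=3)]


def win_probability_b(a, b):
    """Exact probability that sequence b appears before sequence a."""
    idx = {s: i for i, s in enumerate(NODES)}
    n = len(NODES)
    # Linear system m * w = rhs; w[s] = chance of B when the walk is at node s.
    m = [[Fraction(0)] * n for _ in range(n)]
    rhs = [Fraction(0)] * n
    for s in NODES:
        i = idx[s]
        m[i][i] += 1
        for bit in "01":
            edge = s + bit
            if edge == b:
                rhs[i] += Fraction(1, 2)          # B wins on this edge
            elif edge != a:                       # nobody wins yet: move on
                m[i][idx[edge[1:]]] -= Fraction(1, 2)
    # Gauss-Jordan elimination with exact fractions.
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        inv = 1 / m[col][col]
        m[col] = [v * inv for v in m[col]]
        rhs[col] *= inv
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
                rhs[r] -= factor * rhs[col]
    return sum(rhs) / n   # average over the four equally likely start nodes


for a in PATTERNS:
    best = max((b for b in PATTERNS if b != a),
               key=lambda b: win_probability_b(a, b))
    print("A:", a, " best reply of B:", best, win_probability_b(a, best))
```

For example, against A = 101 the reply 111 wins with probability 2/5 only, whereas the reply 110 wins with probability 2/3. The loop shows that every choice of A has a reply that wins with probability above 1/2, which is why there is no best choice for A.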

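Oriented genes

(Figure: circular arrangement of eight oriented genes; labels as extracted from the figure: +1 -2 +5 +8 +7 -3 -4 -6.)

Reversal

(Figures: step-by-step animation of a reversal acting on the segment with the genes -3, -4, -6, +5. Labels of the final arrangement as extracted from the figure: +1 -2 +8 +7 +6 -5 +4 +3. The reversed genes appear with flipped signs.)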
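A reversal of oriented genes can be sketched on a signed list as follows. This is a minimal illustration on a linear list (the slides draw the genes in a circle); the function name `reversal` and the example genome are hypothetical.

```python
def reversal(genes, i, j):
    """Reverse the segment genes[i..j] (inclusive) and flip every sign in it."""
    segment = [-g for g in reversed(genes[i:j + 1])]
    return genes[:i] + segment + genes[j + 1:]


genome = [+1, -3, -2, +4]
print(reversal(genome, 1, 2))  # [1, 2, 3, 4]
```

Applying the same reversal twice restores the original arrangement.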

A real example: 36 genes of mouse and worm on mitochondrial DNA

Transformation by 26 reversals

+/- signs omitted for typing reasons – please insert

•  12 31 34 28 26 17 29 04 09 36 18 35 19 01 16 14 32 33 22 15 11 27 05 20 13 30 23 10 06 03 24 21 08 25 02 07
•  20 05 27 11 15 22 33 32 14 16 01 19 35 18 36 09 04 29 17 26 28 34 31 12 13 30 23 10 06 03 24 21 08 25 02 07
•  01 16 14 32 33 22 15 11 27 05 20 19 35 18 36 09 04 29 17 26 28 34 31 12 13 30 23 10 06 03 24 21 08 25 02 07
•  01 16 15 22 33 32 14 11 27 05 20 19 35 18 36 09 04 29 17 26 28 34 31 12 13 30 23 10 06 03 24 21 08 25 02 07
•  01 16 15 36 18 35 19 20 05 27 11 14 32 33 22 09 04 29 17 26 28 34 31 12 13 30 23 10 06 03 24 21 08 25 02 07
•  01 16 15 14 11 27 05 20 19 35 18 36 32 33 22 09 04 29 17 26 28 34 31 12 13 30 23 10 06 03 24 21 08 25 02 07
•  01 16 15 14 31 34 28 26 17 29 04 09 22 33 32 36 18 35 19 20 05 27 11 12 13 30 23 10 06 03 24 21 08 25 02 07
•  01 26 28 34 31 14 15 16 17 29 04 09 22 33 32 36 18 35 19 20 05 27 11 12 13 30 23 10 06 03 24 21 08 25 02 07
•  01 26 28 18 36 32 33 22 09 04 29 17 16 15 14 31 34 35 19 20 05 27 11 12 13 30 23 10 06 03 24 21 08 25 02 07
•  01 26 28 29 04 09 22 33 32 36 18 17 16 15 14 31 34 35 19 20 05 27 11 12 13 30 23 10 06 03 24 21 08 25 02 07
•  01 26 28 29 30 13 12 11 27 05 20 19 35 34 31 14 15 16 17 18 36 32 33 22 09 04 23 10 06 03 24 21 08 25 02 07
•  01 26 11 12 13 30 29 28 27 05 20 19 35 34 31 14 15 16 17 18 36 32 33 22 09 04 23 10 06 03 24 21 08 25 02 07
•  01 26 27 28 29 30 13 12 11 05 20 19 35 34 31 14 15 16 17 18 36 32 33 22 09 04 23 10 06 03 24 21 08 25 02 07
•  01 26 27 28 29 30 31 34 35 19 20 05 11 12 13 14 15 16 17 18 36 32 33 22 09 04 23 10 06 03 24 21 08 25 02 07
•  01 26 27 28 29 30 31 34 35 19 20 09 22 33 32 36 18 17 16 15 14 13 12 11 05 04 23 10 06 03 24 21 08 25 02 07
•  01 26 27 28 29 30 31 22 09 20 19 35 34 33 32 36 18 17 16 15 14 13 12 11 05 04 23 10 06 03 24 21 08 25 02 07
•  01 26 27 28 29 30 31 32 33 34 35 19 20 09 22 36 18 17 16 15 14 13 12 11 05 04 23 10 06 03 24 21 08 25 02 07
•  01 26 27 28 29 30 31 32 33 34 35 36 22 09 20 19 18 17 16 15 14 13 12 11 05 04 23 10 06 03 24 21 08 25 02 07
•  01 26 27 28 29 30 31 32 33 34 35 36 22 09 24 03 06 10 23 04 05 11 12 13 14 15 16 17 18 19 20 21 08 25 02 07
•  01 26 27 28 29 30 31 32 33 34 35 36 22 09 08 21 20 19 18 17 16 15 14 13 12 11 05 04 23 10 06 03 24 25 02 07
•  01 26 27 28 29 30 31 32 33 34 35 36 08 09 22 21 20 19 18 17 16 15 14 13 12 11 05 04 23 10 06 03 24 25 02 07
•  01 26 27 28 29 30 31 32 33 34 35 36 08 09 22 21 20 19 18 17 16 15 14 13 12 11 05 04 03 06 10 23 24 25 02 07
•  01 26 27 28 29 30 31 32 33 34 35 36 08 09 22 21 20 19 18 17 16 15 14 13 12 11 05 04 03 02 25 24 23 10 06 07
•  01 02 03 04 05 11 12 13 14 15 16 17 18 19 20 21 22 09 08 36 35 34 33 32 31 30 29 28 27 26 25 24 23 10 06 07
•  01 02 03 04 05 11 12 13 14 15 16 17 18 19 20 21 22 09 08 07 06 10 23 24 25 26 27 28 29 30 31 32 33 34 35 36
•  01 02 03 04 05 06 07 08 09 22 21 20 19 18 17 16 15 14 13 12 11 10 23 24 25 26 27 28 29 30 31 32 33 34 35 36
•  01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36

Problem representation

•  Circular arrangement of genes

•  Level 1: nodes

•  Level 2: circles = sets of nodes

•  Level 3: components = sets of circles = sets of sets of nodes

•  Level 4: fortresses = sets of components = sets of sets of circles = sets of sets of sets of nodes

Level 1

•  Nodes

start: +1 +2 +3 +4 +5

end: +3 +2 +1 +4 -5

(Figures: diagram on the nodes L, +1, -1, +2, -2, +3, -3, +4, -4, +5, -5, R, drawn first for the start arrangement, then for the end arrangement, then for both together; the last figure is labelled RD.)

How complicated may diagrams be

•  Start sequence -1 +2 +3 -4 -5 -6 +7 -8 +9 +10 -11 +12

•  End sequence -2 +3 -4 -12 +5 -10 -9 +8 -1 -11 -6 +7

(Figure: the corresponding diagram between the nodes L and R.)

Effect of a reversal

(Figure: the diagram between L and R after a reversal.)

Diagram for the 36 mitochondrial genes

n = 36 genes

k = 11 circles

h = 0 hurdles

f = 0 (no fortress)

Sorting requires n + 1 – k + h + f = 26 reversals.

Level 2

•  Circles = sets of nodes

Good circles (with at least one pair of divergent links)

Bad circles (with only pairs of convergent links)

(Figure: the example diagram and its RD diagram on the nodes L, +1, -1, ..., +5, -5, R, repeated.)

Lower bound – upper bound

0 ≤ number of reversals ≤ 2n

Lower bound – upper bound

A reversal between

•  divergent links of a single circle splits that circle into 2

•  convergent links of a single circle does not split that circle

•  links of two different circles merges these circles into a single one

Lower bound – upper bound

Starting with an RD-diagram with n genes and k circles, we require at least n + 1 – k reversals to achieve the final diagram with n + 1 trivial circles (each trivial circle has length 2), as each reversal increases the number of circles by at most 1.

Proof: Clear


n + 1 – k ≤ number of reversals ≤ 2n
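The lower bound n + 1 – k can be computed from a signed arrangement by counting circles. The sketch below is an illustration under assumptions that the slides do not spell out: a linear arrangement framed by two extra nodes (playing the role of L and R), the sorted arrangement +1, ..., +n as target, and the usual encoding of each gene by two nodes. The names `count_circles` and `lower_bound` are hypothetical.

```python
def count_circles(perm):
    """Count the circles of the diagram of a signed arrangement `perm`
    whose target is the sorted arrangement +1, +2, ..., +n."""
    n = len(perm)
    # Gene g becomes the node pair (2|g|-1, 2|g|); a negative sign swaps them.
    # Nodes 0 and 2n+1 frame the arrangement on the left and right.
    nodes = [0]
    for g in perm:
        tail, head = 2 * abs(g) - 1, 2 * abs(g)
        nodes += [tail, head] if g > 0 else [head, tail]
    nodes.append(2 * n + 1)
    # Links of the current arrangement join neighbouring genes.
    current = {}
    for i in range(0, len(nodes), 2):
        u, v = nodes[i], nodes[i + 1]
        current[u], current[v] = v, u
    seen, circles = set(), 0
    for start in nodes:
        if start in seen:
            continue
        circles += 1
        u = start
        while u not in seen:
            v = current[u]                      # follow a current link
            seen.update((u, v))
            u = v + 1 if v % 2 == 0 else v - 1  # follow a target link
    return circles


def lower_bound(perm):
    return len(perm) + 1 - count_circles(perm)


print(count_circles([1, 2, 3, 4, 5]), lower_bound([1, 2, 3, 4, 5]))
print(count_circles([3, 2, 1, 4, -5]), lower_bound([3, 2, 1, 4, -5]))
```

For the sorted arrangement the diagram consists of n + 1 trivial circles and the bound is 0. The value returned for other arrangements is only a lower bound on the number of reversals.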

Lower bound – upper bound

The ideal situation would be to always have available circles with pairs of divergent links. Such circles are called good circles.

Circles with no pairs of divergent links are called bad circles. They may be present from the beginning, or created by reversals.

Bad circles that intersect good circles are not really bad. Proper reversals acting on divergent links of a good circle may turn bad circles into good ones.

(Figure: reversal acting on divergent links of a good circle: bad choice)

(Figure: reversal acting on divergent links of a good circle: good choice)

Level 3

•  Components = sets of circles = sets of sets of nodes

Lower bound – upper bound

Theorem: Every good component contains at least one good circle with a pair of divergent links such that the reversal between these links splits the circle into two circles and generates one or more good components from the selected good component.

Lower bound – upper bound

Proof: Requires an easy simplification to diagrams that contain only circles of length 2 and 4. Then the proof is simple.

Look at an example to see that diagrams with only circles of length 2 and 4 are much more transparent than general ones.


Theorem: Every reversal acting on two links of a bad circle of a bad component turns the bad component into a good component.

Proof: Simple.

Consequence: Improved upper bound

n + 1 – k ≤ number of reversals ≤ n + 1 – k + c_bad

(c_bad = number of bad components)

Lower bound – upper bound

Do we really require as many extra reversals as there are bad components? No: under certain circumstances bad components may turn into good components as a side effect of reversals acting on links of circles within different bad components. This may occur for bad components that separate two bad components.

(Figure: area of a good component and area of a bad component, shown in the example diagram and its RD diagram.)

Areas of good components are ignored in the following discussion.

Level 4

•  Hurdles may form a fortress = set of components = set of sets of circles = set of sets of sets of nodes

More general constellations of bad components

(Figure: hurdles and non-hurdles.)


Theorem: Every reversal decreases the number n + 1 – k + h (with h = number of hurdles) by at most 1, either by turning one hurdle into a good component (number of circles unchanged), or by merging two hurdles and all components that separate them into a good component (number of circles decreases by 1, number of hurdles decreases by 2).

Proof: A tedious routine exercise („Fleißaufgabe“).

Lower bound – upper bound

Consequence: Improved lower bound

n + 1 – k + h ≤ number of reversals ≤ n + 1 – k + c_bad

The lower bound is almost an upper bound as well, up to a single further reversal that may be required.

This additional reversal is required if the generation of a fresh hurdle cannot be avoided. The following example shows how this may happen.

Fresh hurdle appearing

Fresh hurdle appearing

Super-Hurdles

Fortresses

A fortress is characterized by the presence of an odd number of at least 3 hurdles all of which are super-hurdles. Fortress parameter f is 1 in case of a fortress, and 0 in case of a non-fortress.

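Fortresses

Theorem: Every reversal decreases the number n + 1 – k + h + f by at most 1.

Proof: An even more tedious exercise („Verschärfte Fleißaufgabe“).

Improved lower bound

n + 1 – k + h + f ≤ number of reversals ≤ n + 1 – k + c_bad

Lower bound meets upper bound

n + 1 – k + h + f ≤ number of reversals ≤ n + 1 – k + h + f

Proof: Easy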

Proof details: Consult
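The final formula and the definition of the fortress parameter can be put into a few lines. This is a minimal sketch: the function names are hypothetical, and the hurdles are passed in as a list of flags saying whether each hurdle is a super-hurdle (finding hurdles in a diagram is not shown).

```python
def fortress_parameter(hurdle_is_super):
    """f = 1 for an odd number of at least 3 hurdles, all super-hurdles."""
    h = len(hurdle_is_super)
    return int(h >= 3 and h % 2 == 1 and all(hurdle_is_super))


def reversal_distance(n, k, hurdle_is_super=()):
    """Number of reversals n + 1 - k + h + f."""
    h = len(hurdle_is_super)
    f = fortress_parameter(hurdle_is_super)
    return n + 1 - k + h + f


# The mitochondrial example: n = 36 genes, k = 11 circles, no hurdles.
print(reversal_distance(36, 11))  # 26
```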

PCP Theorem Probabilistically checkable proofs

•  Finding a proof (the solution of an examination exercise) is challenging.

•  Checking a proof (the solution of an examination exercise) is boring.

•  It would be nice if the lecturer/assistant could reduce the checking of a solution to the inspection of only 13 random characters, regardless of how long the solution is.

PCP Theorem

•  We concentrate on the satisfiability problem for 3-clauses in boolean logic:

Given a conjunction of 3-clauses and an assignment of truth values to the boolean variables, check whether these truth values satisfy the formula.

Standard satisfiability verifier

Input φ(X1,…,Xk) of length n

Certificate: Truth values (x1,…,xk)

Verifier: Insert truth values in formula and use truth tables

Standard satisfiability checking

•  Given a boolean formula and an assignment of truth values to all of its variables, insert truth values for the variables and compute the truth value of the formula according to truth tables.

•  This is a polynomial time procedure.

•  Correctness: Given a non-satisfiable formula, the test delivers „no“ for every assignment of truth values.

•  Completeness: Given a satisfiable formula, the test delivers „yes“ for at least one assignment of truth values.

•  All truth values must be read.

•  The longer the formula is, the more boolean variables are to be expected, and the more truth values must be inspected.
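The standard verifier described above can be sketched in a few lines. The representation is an assumption made for illustration: a clause is a list of (variable, negated) pairs and the certificate is a dictionary of truth values; the name `satisfies` is hypothetical. The example formula is the conjunction of 4 clauses used later in the slides.

```python
def satisfies(clauses, assignment):
    """Insert the truth values and evaluate the conjunction of clauses."""
    return all(
        # A literal is true if the value of its variable differs from `negated`.
        any(assignment[var] != negated for var, negated in clause)
        for clause in clauses
    )


# (X or not Y or Z) and (not X or Y or not Z)
# and (X or not Y or not Z) and (not X or not Y or not Z)
PHI = [
    [("X", False), ("Y", True), ("Z", False)],
    [("X", True), ("Y", False), ("Z", True)],
    [("X", False), ("Y", True), ("Z", True)],
    [("X", True), ("Y", True), ("Z", True)],
]

print(satisfies(PHI, {"X": False, "Y": False, "Z": False}))  # True
print(satisfies(PHI, {"X": True, "Y": True, "Z": True}))     # False
```

The verifier reads every truth value of the certificate, which is exactly the cost that the probabilistic verifier avoids.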

Probabilistic satisfiability verifier

Input φ(X1,…,Xk)

Expect certificate (proof) Y

draw bit string r

access constant number of bits from Y

Verifier uses φ, r, and accessed bits of Y

Probabilistic satisfiability checking

•  For a boolean variable X its arithmetical representation is the polynomial 1-X.

•  For a negated boolean variable ¬X its arithmetical representation is the polynomial X.


•  For a 3-clause (that is, a disjunction of three literals) its arithmetical representation is the product of the arithmetical representations of its three literals.


•  Example: Clause

¬X˅Y˅Z is represented by polynomial

X(1-Y)(1-Z)


•  Clause ¬X˅Y˅Z is satisfied by truth values x, y, z iff

x = 0 or y = 1 or z = 1

that is, iff

x(1-y)(1-z) = 0

Probabilistic satisfiability checking

•  For a conjunction of clauses its arithmetical representation is the vector of arithmetical representations of all clauses.

•  Example: Conjunction of 4 clauses

(X˅¬Y˅Z) ⋀ (¬X˅Y˅¬Z) ⋀ (X˅¬Y˅¬Z) ⋀ (¬X˅¬Y˅¬Z)


•  Arithmetical representation by a vector of 4 polynomials

((1-X)Y(1-Z), X(1-Y)Z, (1-X)YZ, XYZ)


•  For a conjunction of n clauses

φ(X1,…,Xk)

with k variables X1,…,Xk let its arithmetical representation be

(p1(X1,…,Xk),…,pn(X1,…,Xk))


•  For every list of truth values x1,…,xk the following are equivalent:

- φ(X1,…,Xk) is satisfied by x1,…,xk

- (p1(x1,…,xk),…,pn(x1,…,xk)) = (0,…,0)
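The equivalence above can be checked by brute force on the example with 4 clauses. This is a minimal sketch; the clause representation as (variable, negated) pairs and the function names are assumptions made for illustration.

```python
from itertools import product

# (X or not Y or Z) and (not X or Y or not Z)
# and (X or not Y or not Z) and (not X or not Y or not Z)
PHI = [
    [("X", False), ("Y", True), ("Z", False)],
    [("X", True), ("Y", False), ("Z", True)],
    [("X", False), ("Y", True), ("Z", True)],
    [("X", True), ("Y", True), ("Z", True)],
]


def clause_polynomial(clause, assignment):
    """Value of the product of the literal representations of one clause."""
    value = 1
    for var, negated in clause:
        x = assignment[var]
        value *= x if negated else 1 - x   # X -> 1 - X, not X -> X
    return value


def arithmetize(clauses, assignment):
    return tuple(clause_polynomial(c, assignment) for c in clauses)


def satisfies(clauses, assignment):
    return all(any(assignment[v] != neg for v, neg in c) for c in clauses)


for bits in product((0, 1), repeat=3):
    x = dict(zip("XYZ", bits))
    p = arithmetize(PHI, x)
    # phi is satisfied exactly if all polynomial values are 0.
    assert satisfies(PHI, x) == (p == (0, 0, 0, 0))
    print(bits, p)
```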


•  Draw a random bit vector r of length n.

•  From now on we compute mod 2, that is, in the field of binary values 0, 1 with

1 + 1 = 0 + 0 = 0        1 + 0 = 0 + 1 = 1

0·0 = 0·1 = 1·0 = 0        1·1 = 1


•  For truth values x1,…,xk compute (mod 2)

test(r, x1,…,xk) = rT·(p1(x1,…,xk),…,pn(x1,…,xk)) = Σ_{i=1..n} ri·pi(x1,…,xk)


•  If φ(X1,…,Xk) is satisfied by x1,…,xk then

Pr_r{ test(r, x1,…,xk) = 0 } = 1

Proof: Trivial, since in this case

(p1(x1,…,xk),…,pn(x1,…,xk)) = (0,…,0)


•  If φ(X1,…,Xk) is not satisfied by x1,…,xk then

Pr_r{ test(r, x1,…,xk) = 0 } = 1/2

Probabilistic satisfiability checking

Proof: Consider an arbitrary component i with pi(x1,…,xk) ≠ 0. Changing ri changes the value of test(r, x1,…,xk). Thus for exactly 50% of all vectors r:

test(r, x1,…,xk) = 0
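The two probability statements can be verified by enumerating all bit vectors r. The sketch takes the vector of values (p1(x1,…,xk),…,pn(x1,…,xk)) as input; the function names `linear_test` and `probability_of_zero` are hypothetical.

```python
from itertools import product


def linear_test(r, p_values):
    """Sum over i of r_i * p_i (mod 2), given the values p_i of the clauses."""
    return sum(ri * pi for ri, pi in zip(r, p_values)) % 2


def probability_of_zero(p_values):
    """Exact fraction of all bit vectors r for which the test yields 0."""
    n = len(p_values)
    zeros = sum(linear_test(r, p_values) == 0
                for r in product((0, 1), repeat=n))
    return zeros / 2 ** n


print(probability_of_zero((0, 0, 0, 0)))  # 1.0, satisfying truth values
print(probability_of_zero((0, 0, 0, 1)))  # 0.5, one violated clause
print(probability_of_zero((1, 0, 1, 1)))  # 0.5, three violated clauses
```

The number of violated clauses does not matter: as soon as one value is non-zero, flipping the corresponding bit of r flips the result.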

How test formula is computed: Example

φ(X,Y,Z) = (X˅Y˅Z) ⋀ (¬X˅Y˅¬Z) ⋀ (X˅¬Y˅¬Z) ⋀ (¬X˅¬Y˅¬Z)

p1(X,Y,Z) = (1-X)(1-Y)(1-Z)

p2(X,Y,Z) = X(1-Y)Z

p3(X,Y,Z) = (1-X)YZ

p4(X,Y,Z) = XYZ

r = (1,1,0,1)

How test formula is computed: Example

rT·(p1(x,y,z), p2(x,y,z), p3(x,y,z), p4(x,y,z))

= (1-x)(1-y)(1-z) + x(1-y)z + xyz

= 1 - x - y - z + xy + xz + yz - xyz + xz - xyz + xyz

= 1 - x - y - z + xy + yz - xyz   (mod 2: the two terms xz cancel, and of the three terms xyz one remains)

How test formula is computed: In general

rT·(p1(x1,…,xk),…,pn(x1,…,xk))

= const(r,φ) + Σ_{i∈L(r,φ)} xi + Σ_{(i,j)∈Q(r,φ)} xi·xj + Σ_{(i,j,l)∈C(r,φ)} xi·xj·xl

= const(r,φ) + l(r,φ)T·(xi)_{1≤i≤k} + q(r,φ)T·(xi·xj)_{1≤i,j≤k} + c(r,φ)T·(xi·xj·xl)_{1≤i,j,l≤k}

with bit vectors l(r,φ), q(r,φ), c(r,φ) of length k, k², k³.

The constant const(r,φ) and the bit vectors l(r,φ), q(r,φ), c(r,φ) can be computed in polynomial time given r and φ.

The occurring functions

f_{x1,…,xk}(l) = lT·(xi)_{1≤i≤k}

g_{x1,…,xk}(q) = qT·(xi·xj)_{1≤i,j≤k}

h_{x1,…,xk}(c) = cT·(xi·xj·xl)_{1≤i,j,l≤k}

are linear and refer to the same vector of truth values x1,…,xk.

The used satisfiability certificate

Given a conjunction φ of n 3-clauses with variables X1,…,Xk, a satisfiability certificate Y consists of all values of the linear functions f_{x1,…,xk}, g_{x1,…,xk}, h_{x1,…,xk} for a certain assignment of truth values x1,…,xk to X1,…,Xk. Note that Y consists of an exponential number of entries.

Linearity and consistency check

For the verifier to be correct, it must be guaranteed that the offered certificate Y indeed consists of the values of three linear functions and that these linear functions are derived from the same assignment of truth values x1,…,xk. Only then does the probability estimate for non-satisfying truth values hold:

Pr_r{ test(r, x1,…,xk) = 0 } = 1/2

Linearity and consistency check

This forces the verifier to check linearity and consistency. As a deterministic check would require exponential time, the verifier uses the option to check linearity and consistency probabilistically. For a linear function the test delivers „yes“ for sure. For a non-linear function the test delivers „no“ with confidence of 50%. The same applies for consistency.

Probabilistic linearity check

draw random l, l' ∈ {0,1}^k, check Y(l + l') = Y(l) + Y(l')

draw random q, q' ∈ {0,1}^(k²), check Y(q + q') = Y(q) + Y(q')

draw random c, c' ∈ {0,1}^(k³), check Y(c + c') = Y(c) + Y(c')

For (almost) linear functions each check returns „yes“ with probability 100%, otherwise „no“ with probability at least 50%.
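One round of such a linearity check can be sketched as follows. The certificate is modelled as a function `Y` on bit tuples, which is an assumption made for illustration; `linearity_check`, `pass_rate` and the two example functions are hypothetical. The rejection rate observed for a non-linear function depends on the function chosen, so the sketch only reports it and does not claim a particular value.

```python
import random


def add(u, v):
    """Componentwise addition mod 2."""
    return tuple((a + b) % 2 for a, b in zip(u, v))


def linearity_check(Y, length, rng):
    """Draw two random bit vectors and check Y(u + v) = Y(u) + Y(v)."""
    u = tuple(rng.randrange(2) for _ in range(length))
    v = tuple(rng.randrange(2) for _ in range(length))
    return Y(add(u, v)) == (Y(u) + Y(v)) % 2


def pass_rate(Y, length, rounds=2000, seed=0):
    rng = random.Random(seed)
    return sum(linearity_check(Y, length, rng) for _ in range(rounds)) / rounds


x = (1, 0, 1)                      # hidden truth values


def linear(l):
    return sum(li * xi for li, xi in zip(l, x)) % 2


def non_linear(l):
    return l[0] * l[1]             # a product of two bits is not linear


print(pass_rate(linear, 3))        # 1.0
print(pass_rate(non_linear, 3))    # clearly below 1
```

Repeating the check several times with fresh random vectors drives down the chance that a non-linear function passes every round.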

Probabilistic consistency check

draw random l, l' ∈ {0,1}^k, check Y(l·l'T) = Y(l)·Y(l')

draw random l ∈ {0,1}^k, q ∈ {0,1}^(k²), check Y(l·qT) = Y(l)·Y(q)

If Y is in the expected format then the checks deliver „yes“ with probability 100%, else „no“ with probability at least 50%.

Probabilistic linear verifier

Given formula φ with k variables and proof Y DO

    draw random l, l' ∈ {0,1}^k, check Y(l + l') = Y(l) + Y(l')

    draw random q, q' ∈ {0,1}^(k²), check Y(q + q') = Y(q) + Y(q')

    draw random c, c' ∈ {0,1}^(k³), check Y(c + c') = Y(c) + Y(c')

    draw random l, l' ∈ {0,1}^k, check Y(l·l'T) = Y(l)·Y(l')

    draw random l ∈ {0,1}^k, q ∈ {0,1}^(k²), check Y(l·qT) = Y(l)·Y(q)

    IF all checks return „yes“ THEN

        draw random r ∈ {0,1}^n

        compute const(r,φ), l(r,φ), q(r,φ), c(r,φ)

        return const(r,φ) + Y(l(r,φ)) + Y(q(r,φ)) + Y(c(r,φ))

    ELSE reject proof Y

    END

END
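The verifier above can be sketched end to end for small formulas. Everything about the representation is an assumption made for illustration: bit vectors are stored as sets of index tuples (the positions holding a 1), the proof `Y` is a function instead of an exponentially long table, each clause is assumed to contain three distinct variables, and the proof is accepted if the returned value is 0. All function names are hypothetical.

```python
import itertools
import random


def expand(clauses, r, index):
    """Monomials of sum_i r_i * p_i (mod 2); () stands for the constant 1."""
    total = set()
    for ri, clause in zip(r, clauses):
        if not ri:
            continue
        poly = {()}
        for var, negated in clause:
            i = index[var]
            factor = [(i,)] if negated else [(), (i,)]   # not X -> X, X -> 1 + X
            product = set()
            for m1 in poly:
                for m2 in factor:
                    product ^= {tuple(sorted(set(m1) | set(m2)))}
            poly = product
        total ^= poly                                    # addition mod 2
    return total


def honest_proof(x):
    """Proof derived from truth values x: value of a linear function mod 2."""
    return lambda vector: sum(all(x[i] for i in mono) for mono in vector) % 2


def random_vector(k, degree, rng):
    return {t for t in itertools.product(range(k), repeat=degree)
            if rng.randrange(2)}


def outer(u, v):
    return {a + b for a in u for b in v}


def linear_verifier(clauses, variables, Y, rng):
    k = len(variables)
    index = {v: i for i, v in enumerate(variables)}
    for degree in (1, 2, 3):                             # linearity checks
        u, v = random_vector(k, degree, rng), random_vector(k, degree, rng)
        if Y(u ^ v) != (Y(u) + Y(v)) % 2:
            return False
    l, lp = random_vector(k, 1, rng), random_vector(k, 1, rng)
    if Y(outer(l, lp)) != Y(l) * Y(lp):                  # consistency checks
        return False
    l, q = random_vector(k, 1, rng), random_vector(k, 2, rng)
    if Y(outer(l, q)) != Y(l) * Y(q):
        return False
    r = [rng.randrange(2) for _ in clauses]
    monomials = expand(clauses, r, index)
    const = int(() in monomials)
    parts = [{m for m in monomials if len(m) == d} for d in (1, 2, 3)]
    return (const + sum(Y(p) for p in parts)) % 2 == 0   # accept on value 0


PHI = [[("X", False), ("Y", True), ("Z", False)],
       [("X", True), ("Y", False), ("Z", True)],
       [("X", False), ("Y", True), ("Z", True)],
       [("X", True), ("Y", True), ("Z", True)]]
VARS = ["X", "Y", "Z"]
rng = random.Random(0)
good = honest_proof((0, 0, 0))      # satisfying truth values
bad = honest_proof((1, 1, 1))       # violates the last clause
print(all(linear_verifier(PHI, VARS, good, rng) for _ in range(200)))
print(sum(linear_verifier(PHI, VARS, bad, rng) for _ in range(200)) / 200)
```

A proof built from satisfying truth values is always accepted, while a well-formed proof built from non-satisfying truth values is accepted in about half of the runs.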

Probabilistic polynomial verifier

It is far more complicated than the linear verifier. It uses „low degree polynomials“ over certain finite fields, and much algebra and prime number theory. It took the best researchers in Theoretical Computer Science more than 10 years to establish it. The outcome is:

Probabilistic polynomial verifier

Verifier      number of random bits    number of accesses to certificate
linear        O(k³)                    O(1)
polynomial    O(log(k))                O(polylog(k))

That's fine: O(1) accesses for the linear verifier and O(log(k)) random bits for the polynomial verifier.

Probabilistic polynomial verifier

A procedure called „recursive proof checking“ combines the linear and the polynomial verifier and takes the best part of both, leading to:

number of random bits: O(log(k))

number of accesses to certificate: O(1)

Probabilistic polynomial verifier

Interested? Consult:

•  Arora, Lund, Motwani, Sudan, Szegedy (1992) (original publication)

•  Mayr, Prömel and Steger (eds.) (1998) (workshop lecture)

•  http://www.cs.washington.edu/education/courses/cse533/05au/pcp-history.pdf (history of PCP)

•  Irit Dinur: The PCP Theorem by Gap Amplification, 2007 (new, shorter proof)
