On the Impact of Representation 5 short stories
Machine Learning On the Impact of Representation
5 short stories Prof. Dr. Volker Sperschneider
AG Maschinelles Lernen und Natürlichsprachliche Systeme
Institut für Informatik Technische Fakultät
Albert-Ludwigs-Universität Freiburg
1. Cover a truncated chess board – simple
2. Choose the lucky door – well known
3. Playing "three-bit lotto" – surprising
4. Genome rearrangement – clever
5. PCP theorem – unbelievable
Cover a truncated chess board
Covering a truncated chess board
Can you see now that a cover is impossible?
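The figures of this story are not preserved. Assuming the classic version of the puzzle (an 8×8 board with two diagonally opposite corners removed, to be covered by 2×1 dominoes), the following Python sketch makes the coloring argument concrete and cross-checks it with an exhaustive search. The function names are hypothetical, chosen for illustration only.

```python
from functools import lru_cache

N = 8  # board size (assumption: the classic 8x8 board)

def board_without(removed):
    """All squares (row, col) of the N x N board except the removed ones."""
    return frozenset((r, c) for r in range(N) for c in range(N)) - frozenset(removed)

def color_counts(squares):
    """Count (white, black) squares; (row + col) even is white, odd is black."""
    white = sum(1 for r, c in squares if (r + c) % 2 == 0)
    return white, len(squares) - white

@lru_cache(maxsize=None)
def can_tile(free):
    """Exhaustive search for a domino cover of the frozenset `free`."""
    if not free:
        return True
    r, c = min(free)  # first uncovered square in row-major order
    # This square can only be paired with its right or its lower neighbour.
    for other in ((r, c + 1), (r + 1, c)):
        if other in free and can_tile(free - {(r, c), other}):
            return True
    return False

truncated = board_without({(0, 0), (N - 1, N - 1)})
print(color_counts(truncated))  # (30, 32)
print(can_tile(truncated))      # False
```

Each domino covers one white and one black square, so 31 dominoes would need 31 squares of each color; the two removed corners have the same color, which rules this out. The search only confirms what the coloring already shows.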
The lucky door
What we cannot see: Behind one of the doors there is a heart of gold.
The lucky door
Make a preliminary choice of one door.
The lucky door
Among the two other doors I show you one that does not contain the heart of gold.
The lucky door
Now make your final choice: keep the preliminary choice or change the door.
The lucky door
Probability that the heart is in one of the two groups: 1/3 for the preliminarily chosen door, 2/3 for the group of the two other doors (1/3 each).
The lucky door
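Knowing where the heart is not does not change the probabilities of the two groups: the preliminarily chosen door keeps probability 1/3, and the group of the two other doors keeps 2/3, which now rests entirely on the door that was not opened. Changing the door therefore wins with probability 2/3.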
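The group argument can be checked by exhaustive enumeration. A minimal Python sketch, assuming that the heart is placed behind each door with probability 1/3 and that, when both other doors are empty, the door shown is picked uniformly among them (the slides do not spell this out); `win_probability` is a hypothetical name.

```python
from fractions import Fraction

DOORS = (0, 1, 2)

def win_probability(switch: bool, choice: int = 0) -> Fraction:
    """Exact probability of finding the heart of gold for the given final strategy."""
    total = Fraction(0)
    for heart in DOORS:  # heart behind each door with probability 1/3
        # The door shown is neither the chosen one nor the one hiding the heart.
        shown_options = [d for d in DOORS if d not in (choice, heart)]
        for shown in shown_options:
            weight = Fraction(1, 3) / len(shown_options)
            if switch:
                final = next(d for d in DOORS if d not in (choice, shown))
            else:
                final = choice
            if final == heart:
                total += weight
    return total

print(win_probability(switch=False))  # 1/3
print(win_probability(switch=True))   # 2/3
```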
Choose three bits
• Two players, called A and B, play a game.
• Player A chooses a sequence of three bits, for example 101.
• Player B knows what player A has chosen and also chooses a sequence of three bits, different from player A's sequence, for example 111.
Choose three bits
• Now a random bit generator generates a stream of bits such that all bits are generated independently and with equal probability for 0 and 1.
• The player whose three bits first appear consecutively in the stream wins the game.
• Which player would you prefer to be?
Choose three bits
Example stream, generated bit by bit: 1, 11, 110, 1100, 11001, 110010, 1100101
With A = 101 and B = 111, the sequence 101 appears first (bits 5 to 7), so player A wins this run.
Pros and Cons
• Every sequence of three bits has the same probability to be generated, namely 1/8.
• Player A may choose among 8 sequences, player B then only among 7.
• Player B may choose a better sequence than the one player A has chosen.
Pros and Cons
• If there were a possibility for player B to outperform every choice of player A, why not let player A select the best sequence among the 8 possible ones?
• This is very disturbing until we represent the generation of the bit stream by a graph with 4 nodes, which represent the 2 bits generated last, and edges, which encode the next generated bit.
Graph representation
[Figure: graph with the four nodes 00, 01, 10, 11; each edge is labelled with the next generated bit, 0 or 1]
Graph representation
• Each edge encodes the three bits generated most recently: the just generated bit is written at the edge, the two bits generated before it at the source node of the edge.
• The generation of the bit stream corresponds to a random walk through the graph, starting at the edge that represents the first three bits.
[Figure sequence: the random walk for the stream 001, 0010, 00101, 001011, 0010111, 00101110, one edge per newly generated bit]
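The graph representation turns the game into an absorbing random walk that can be solved exactly. The following Python sketch is one possible way to do this (it is not taken from the slides): for every node it sets up the equation "probability that B wins from here = average over the two outgoing edges", treats the edges chosen by A and B as absorbing, and solves the resulting 4×4 linear system with exact fractions. All names are hypothetical.

```python
from fractions import Fraction
from itertools import product

NODES = ["".join(t) for t in product("01", repeat=2)]  # 4 nodes: last two bits
EDGES = ["".join(t) for t in product("01", repeat=3)]  # 8 edges: last three bits
HALF = Fraction(1, 2)

def solve(matrix, rhs):
    """Gauss-Jordan elimination over the rationals (matrix assumed non-singular)."""
    n = len(rhs)
    rows = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]

def prob_b_wins(a: str, b: str) -> Fraction:
    """P(B's triple appears before A's) in a stream of independent fair bits."""
    index = {node: i for i, node in enumerate(NODES)}
    matrix = [[Fraction(0)] * 4 for _ in range(4)]
    rhs = [Fraction(0)] * 4
    # Unknown w[node]: probability that B wins when the walk sits at `node`.
    for node in NODES:
        i = index[node]
        matrix[i][i] += 1
        for bit in "01":
            edge = node + bit
            if edge == b:
                rhs[i] += HALF                      # B's edge: B wins
            elif edge != a:
                matrix[i][index[edge[1:]]] -= HALF  # walk moves on to the next node
            # A's edge contributes 0: A wins
    w = solve(matrix, rhs)
    # The first three bits select one of the 8 edges uniformly at random.
    total = Fraction(0)
    for edge in EDGES:
        if edge == b:
            total += Fraction(1, 8)
        elif edge != a:
            total += Fraction(1, 8) * w[index[edge[1:]]]
    return total

print(prob_b_wins("101", "111"))  # 2/5
print(prob_b_wins("101", "110"))  # 2/3
```

For the example on the slides (A = 101, B = 111) player B wins only with probability 2/5, whereas the reply 110 gives B a probability of 2/3.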
• Every selection of player A and player B corresponds to a specific edge.
[Figure: the graph with the edge chosen by player A marked A and the edge chosen by player B marked B]
• Player B outperforms player A as follows:
[Figures: three example constellations, each showing the graph with the edges of player A and player B marked]
• Can you recognize B's strategy for outperforming A?
• Can you compute or estimate probabilities of winning the game for players A and B in all of the 3 cases presented above?
• Why is there no best choice for player A?
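For these questions a closed-form shortcut exists that the slides do not use: Conway's leading-number formula for this game (known as Penney's game). The Python sketch below is a minimal implementation of that formula; it lists, for every choice of player A, a best reply of player B and its winning probability. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

TRIPLES = ["".join(t) for t in product("01", repeat=3)]

def leading_number(x: str, y: str) -> int:
    """Conway's leading number: sum of 2**(k-1) over all k such that the
    last k bits of x equal the first k bits of y."""
    n = len(x)
    return sum(2 ** (k - 1) for k in range(1, n + 1) if x[n - k:] == y[:k])

def prob_b_beats_a(a: str, b: str) -> Fraction:
    """Probability that b appears before a in a fair random bit stream.
    Conway's formula: the odds for b are (a*a - a*b) : (b*b - b*a)."""
    odds_b = leading_number(a, a) - leading_number(a, b)
    odds_a = leading_number(b, b) - leading_number(b, a)
    return Fraction(odds_b, odds_b + odds_a)

def best_reply(a: str):
    """A choice b != a that maximizes player B's winning probability."""
    return max(((b, prob_b_beats_a(a, b)) for b in TRIPLES if b != a),
               key=lambda pair: pair[1])

for a in TRIPLES:
    b, p = best_reply(a)
    print(f"A chooses {a}: B replies {b} and wins with probability {p}")
```

Against 101, for example, the reply 110 wins with probability 2/3. For every choice of A the best reply wins with probability at least 2/3, which is why player A has no best choice.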
Oriented genes
[Figure: the oriented genes +1, -2, +5, +8, +7, -3, -4, -6 arranged on a circle]
Reversal
[Figure sequence: a reversal cuts out a segment of the circular arrangement, turns it around and reinserts it, so that the order of its genes is inverted and the sign of each of them is flipped]
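A reversal on oriented genes is short to state in code. The following Python sketch assumes the standard model of signed reversals (the order of the segment is inverted and every sign in it is flipped) and, for simplicity, treats the circular arrangement as a linear list cut at some point; `reverse_segment` is a hypothetical helper.

```python
def reverse_segment(genes, i, j):
    """Reverse positions i..j (0-based, inclusive) of a list of signed genes:
    the order of the segment is inverted and every sign in it is flipped."""
    if not 0 <= i <= j < len(genes):
        raise ValueError("segment out of range")
    return genes[:i] + [-g for g in reversed(genes[i:j + 1])] + genes[j + 1:]

genes = [+1, -2, +5, +8, +7, -3, -4, -6]
once = reverse_segment(genes, 2, 4)
print(once)                                  # [1, -2, -7, -8, -5, -3, -4, -6]
print(reverse_segment(once, 2, 4) == genes)  # True: a reversal undoes itself
```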
A real example: 36 genes of mouse and worm on mitochondrial DNA
Transformation by 26 reversals
+/- signs omitted for typing reasons – please insert
• 12 31 34 28 26 17 29 04 09 36 18 35 19 01 16 14 32 33 22 15 11 27 05 20 13 30 23 10 06 03 24 21 08 25 02 07
• 20 05 27 11 15 22 33 32 14 16 01 19 35 18 36 09 04 29 17 26 28 34 31 12 13 30 23 10 06 03 24 21 08 25 02 07
• 01 16 14 32 33 22 15 11 27 05 20 19 35 18 36 09 04 29 17 26 28 34 31 12 13 30 23 10 06 03 24 21 08 25 02 07
• 01 16 15 22 33 32 14 11 27 05 20 19 35 18 36 09 04 29 17 26 28 34 31 12 13 30 23 10 06 03 24 21 08 25 02 07
• 01 16 15 36 18 35 19 20 05 27 11 14 32 33 22 09 04 29 17 26 28 34 31 12 13 30 23 10 06 03 24 21 08 25 02 07
• 01 16 15 14 11 27 05 20 19 35 18 36 32 33 22 09 04 29 17 26 28 34 31 12 13 30 23 10 06 03 24 21 08 25 02 07
• 01 16 15 14 31 34 28 26 17 29 04 09 22 33 32 36 18 35 19 20 05 27 11 12 13 30 23 10 06 03 24 21 08 25 02 07
• 01 26 28 34 31 14 15 16 17 29 04 09 22 33 32 36 18 35 19 20 05 27 11 12 13 30 23 10 06 03 24 21 08 25 02 07
• 01 26 28 18 36 32 33 22 09 04 29 17 16 15 14 31 34 35 19 20 05 27 11 12 13 30 23 10 06 03 24 21 08 25 02 07
• 01 26 28 29 04 09 22 33 32 36 18 17 16 15 14 31 34 35 19 20 05 27 11 12 13 30 23 10 06 03 24 21 08 25 02 07
• 01 26 28 29 30 13 12 11 27 05 20 19 35 34 31 14 15 16 17 18 36 32 33 22 09 04 23 10 06 03 24 21 08 25 02 07
• 01 26 11 12 13 30 29 28 27 05 20 19 35 34 31 14 15 16 17 18 36 32 33 22 09 04 23 10 06 03 24 21 08 25 02 07
• 01 26 27 28 29 30 13 12 11 05 20 19 35 34 31 14 15 16 17 18 36 32 33 22 09 04 23 10 06 03 24 21 08 25 02 07
• 01 26 27 28 29 30 31 34 35 19 20 05 11 12 13 14 15 16 17 18 36 32 33 22 09 04 23 10 06 03 24 21 08 25 02 07
• 01 26 27 28 29 30 31 34 35 19 20 09 22 33 32 36 18 17 16 15 14 13 12 11 05 04 23 10 06 03 24 21 08 25 02 07
• 01 26 27 28 29 30 31 22 09 20 19 35 34 33 32 36 18 17 16 15 14 13 12 11 05 04 23 10 06 03 24 21 08 25 02 07
• 01 26 27 28 29 30 31 32 33 34 35 19 20 09 22 36 18 17 16 15 14 13 12 11 05 04 23 10 06 03 24 21 08 25 02 07
• 01 26 27 28 29 30 31 32 33 34 35 36 22 09 20 19 18 17 16 15 14 13 12 11 05 04 23 10 06 03 24 21 08 25 02 07
• 01 26 27 28 29 30 31 32 33 34 35 36 22 09 24 03 06 10 23 04 05 11 12 13 14 15 16 17 18 19 20 21 08 25 02 07
• 01 26 27 28 29 30 31 32 33 34 35 36 22 09 08 21 20 19 18 17 16 15 14 13 12 11 05 04 23 10 06 03 24 25 02 07
• 01 26 27 28 29 30 31 32 33 34 35 36 08 09 22 21 20 19 18 17 16 15 14 13 12 11 05 04 23 10 06 03 24 25 02 07
• 01 26 27 28 29 30 31 32 33 34 35 36 08 09 22 21 20 19 18 17 16 15 14 13 12 11 05 04 03 06 10 23 24 25 02 07
• 01 26 27 28 29 30 31 32 33 34 35 36 08 09 22 21 20 19 18 17 16 15 14 13 12 11 05 04 03 02 25 24 23 10 06 07
• 01 02 03 04 05 11 12 13 14 15 16 17 18 19 20 21 22 09 08 36 35 34 33 32 31 30 29 28 27 26 25 24 23 10 06 07
• 01 02 03 04 05 11 12 13 14 15 16 17 18 19 20 21 22 09 08 07 06 10 23 24 25 26 27 28 29 30 31 32 33 34 35 36
• 01 02 03 04 05 06 07 08 09 22 21 20 19 18 17 16 15 14 13 12 11 10 23 24 25 26 27 28 29 30 31 32 33 34 35 36
• 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
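Each line of this list should arise from the previous one by a single reversal. Since the signs are omitted, only the unsigned part can be checked. A Python sketch (the helper name `reversal_between` is hypothetical) that locates the reversed segment between two consecutive lines, shown on the first three lines of the list:

```python
def reversal_between(before, after):
    """Return the 0-based inclusive bounds (i, j) of the segment whose reversal
    turns `before` into `after` (signs ignored), or None if no single reversal does."""
    if sorted(before) != sorted(after):
        return None
    diff = [p for p, (x, y) in enumerate(zip(before, after)) if x != y]
    if not diff:
        return None  # identical lines: no reversal at all
    i, j = diff[0], diff[-1]
    return (i, j) if after[i:j + 1] == before[i:j + 1][::-1] else None

ROWS = """\
12 31 34 28 26 17 29 04 09 36 18 35 19 01 16 14 32 33 22 15 11 27 05 20 13 30 23 10 06 03 24 21 08 25 02 07
20 05 27 11 15 22 33 32 14 16 01 19 35 18 36 09 04 29 17 26 28 34 31 12 13 30 23 10 06 03 24 21 08 25 02 07
01 16 14 32 33 22 15 11 27 05 20 19 35 18 36 09 04 29 17 26 28 34 31 12 13 30 23 10 06 03 24 21 08 25 02 07
""".splitlines()
lines = [[int(t) for t in row.split()] for row in ROWS]
for before, after in zip(lines, lines[1:]):
    print(reversal_between(before, after))  # (0, 23), then (0, 10)
```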
Problem representation
• Circular arrangement of genes
• Level 1: nodes
• Level 2: circles = sets of nodes
• Level 3: components = sets of circles = sets of sets of nodes
• Level 4: fortresses = sets of components = sets of sets of circles = sets of sets of sets of nodes
Level 1
• Nodes
[Figures: construction of the RD diagram for the start arrangement +1 +2 +3 +4 +5 and the end arrangement +3 +2 +1 +4 -5; the nodes +1, -1, …, +5, -5 are placed on a circle together with the markers L and R]
How complicated can diagrams be?
• Start sequence -1 +2 +3 -4 -5 -6 +7 -8 +9 +10 -11 +12
• End sequence -2 +3 -4 -12 +5 -10 -9 +8 -1 -11 -6 +7
[Figure: RD diagram for these start and end sequences]
Effect of a reversal
[Figures: the RD diagram before and after a reversal]
Diagram for the 36 mitochondrial genes
n = 36 genes
k = 11 circles
h = 0 hurdles
f = 0 (no fortress)
Sorting requires n + 1 - k + h + f = 36 + 1 - 11 + 0 + 0 = 26 reversals.
Level 2
• Circles = sets of nodes
Good circles (with at least one pair of divergent links) and bad circles (with only pairs of convergent links)
[Figures: a good circle and a bad circle in the RD diagram]
Lower bound – upper bound
$$0 \le \#\text{reversals} \le 2n$$
Lower bound – upper bound
A reversal between
• divergent links of a single circle splits that circle into 2 circles,
• convergent links of a single circle does not split that circle,
• links of two different circles merges these circles into a single one.
Lower bound – upper bound
Starting with an RD diagram with n genes and k circles, we require at least n + 1 - k reversals to reach the final diagram with n + 1 trivial circles (each trivial circle has length 2), as each reversal increases the number of circles by at most 1.
Proof: Clear.
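The lower bound n + 1 - k only needs the number k of circles. The Python sketch below counts them for sorting a signed permutation into +1 +2 … +n. It assumes the common textbook encoding of the RD (reality–desire) diagram, which may be drawn differently from the slides: gene +i becomes the node pair 2i-1, 2i and gene -i the pair 2i, 2i-1, framed by 0 and 2n+1; reality links join positions 2j and 2j+1, desire links join the values 2i and 2i+1. Function names are hypothetical.

```python
def count_circles(perm):
    """Number k of circles (alternating cycles) in the RD diagram of a signed
    permutation of 1..n, relative to the target +1 +2 ... +n."""
    n = len(perm)
    ext = [0]  # left frame
    for g in perm:
        a = abs(g)
        ext += [2 * a - 1, 2 * a] if g > 0 else [2 * a, 2 * a - 1]
    ext.append(2 * n + 1)  # right frame
    pos = {v: p for p, v in enumerate(ext)}
    seen = [False] * (n + 1)  # one reality link per position pair (2j, 2j+1)
    circles = 0
    for j in range(n + 1):
        if seen[j]:
            continue
        circles += 1
        p = 2 * j
        while not seen[p // 2]:
            seen[p // 2] = True
            q = p ^ 1            # other end of the reality link
            p = pos[ext[q] ^ 1]  # follow the desire link 2i <-> 2i+1
    return circles

def reversal_lower_bound(perm):
    """n + 1 - k: each reversal increases the number of circles by at most 1."""
    return len(perm) + 1 - count_circles(perm)

for perm in ([1, 2, 3], [-1], [-2, -1], [2, 1]):
    print(perm, count_circles(perm), reversal_lower_bound(perm))
```

For +2 +1 the bound is 2, although this permutation needs 3 reversals; the hurdle term h introduced on the following slides accounts for such gaps.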
Lower bound – upper bound
$$n + 1 - k \le \#\text{reversals} \le 2n$$
Lower bound – upper bound The ideal situation would be to always have available circles with pairs of divergent links. Such circles are called good circles.
Circles with no pairs of divergent links are called bad circles. They may be present from the beginning, or created by reversals.
Bad circles that intersect good circles are not really bad. Proper reversals acting on divergent links of a good circle may turn bad circles into good ones.
Lower bound – upper bound
Reversal acting on divergent links of a good circle: bad choice
Reversal acting on divergent links of a good circle: good choice
Level 3
• Components = sets of circles = sets of sets of nodes
Lower bound – upper bound
Theorem: Every good component contains at least one good circle with a pair of divergent links such that the reversal between these links splits the circle into two circles and generates one or more good components from the selected good component.
Lower bound – upper bound
Proof: Requires an easy simplification to diagrams that contain only circles of length 2 and 4. Then the proof is simple. Look at an example to see that diagrams with only circles of length 2 and 4 are much more transparent than general ones.
[Figure: example diagram]
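Lower bound – upper bound
Theorem: Every reversal acting on two links of a bad circle of a bad component turns the bad component into a good component.
Proof: Simple.
Consequence: Improved upper bound (one extra reversal per bad component)
$$n + 1 - k \le \#\text{reversals} \le n + 1 - k + c_{\text{bad}}$$
where $c_{\text{bad}}$ is the number of bad components.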
Lower bound – upper bound
$$n + 1 - k \le \#\text{reversals} \le n + 1 - k + c_{\text{bad}}$$
Do we really require as many extra reversals as there are bad components? No: under certain circumstances bad components may turn into good components as a side effect of reversals acting on links of circles within different bad components. This may occur for bad components that separate two bad components.
[Figures: area of a good component; area of a bad component]
Areas of good components are ignored in the following discussion.
Level 4
• Hurdles may form a fortress = set of components = set of sets of circles = set of sets of sets of nodes
More general constellations of bad components
[Figures: hurdles; non-hurdles]
Lower bound – upper bound
Theorem: Every reversal decreases the number $n + 1 - k + h$ (with h = number of hurdles) by at most 1, either by turning one hurdle into a good component (number of circles unchanged), or by merging two hurdles and all components that separate them into a good component (number of circles decreases by 1).
Proof: A diligence exercise.
Lower bound – upper bound
Consequence: Improved lower bound
$$n + 1 - k + h \le \#\text{reversals} \le n + 1 - k + c_{\text{bad}}$$
The lower bound is almost an upper bound as well, up to one further reversal that may be required. This additional reversal is required if the generation of a fresh hurdle cannot be avoided. The following example shows how this may happen.
Fresh hurdle appearing
Super-Hurdles
Fortresses
A fortress is characterized by the presence of an odd number of at least 3 hurdles, all of which are super-hurdles. The fortress parameter f is 1 in case of a fortress and 0 otherwise.
Fortresses
Theorem: Every reversal decreases the number $n + 1 - k + h + f$ by at most 1.
Proof: A more demanding diligence exercise.
Lower bound – upper bound
Improved lower bound
$$n + 1 - k + h + f \le \#\text{reversals} \le n + 1 - k + c_{\text{bad}}$$
Lower bound meets upper bound
$$n + 1 - k + h + f \le \#\text{reversals} \le n + 1 - k + h + f$$
Proof: Easy
Proof details: Consult
PCP Theorem Probabilistically checkable proofs
• Finding a proof (the solution of an examination exercise) is challenging.
• Checking a proof (the solution of an examination exercise) is boring.
PCP Theorem Probabilistically checkable proofs
• It would be nice if the lecturer/assistant could reduce checking of a solution to the inspection of only 13 random characters, regardless of how long the solution is.
PCP Theorem
• We concentrate on the satisfiability problem for 3-clauses in Boolean logic:
Given a conjunction of 3-clauses and an assignment of truth values to the boolean variables, check whether these truth values satisfy the formula.
Standard satisfiability verifier
Input φ(X1,…,Xk) of length n
Certificate: Truth values (x1,…,xk)
Verifier: Insert truth values in formula and use truth tables
Standard satisfiability checking
• Given a Boolean formula and an assignment of truth values to all of its variables, insert the truth values for the variables and compute the truth value of the formula according to the truth tables.
• This is a polynomial time procedure.
Standard satisfiability checking
• Correctness: Given a non-satisfiable formula, the test delivers "no" for every assignment of truth values.
• Completeness: Given a satisfiable formula, the test delivers "yes" for at least one assignment of truth values.
Standard satisfiability checking
• All truth values must be read.
• The longer the formula is, the more boolean variables are to be expected, the more truth values must be inspected.
Probabilistic satisfiability verifier
Input φ(X1,…,Xk)
Expect certificate (proof) Y
draw bit string r
access constant number of bits from Y
Verifier uses φ, r, and accessed bits of Y
Probabilistic satisfiability checking
• For a boolean variable X its arithmetical representation is the polynomial 1-X.
• For a negated boolean variable ¬X its arithmetical representation is the polynomial X.
Probabilistic satisfiability checking
• For a 3-clause (that is, a disjunction of three literals) its arithmetical representation is the product of the arithmetical representations of its three literals.
Probabilistic satisfiability checking
• Example: Clause ¬X∨Y∨Z is represented by the polynomial X(1-Y)(1-Z)
Probabilistic satisfiability checking
• Clause ¬X∨Y∨Z is satisfied by truth values x, y, z iff x = 0 or y = 1 or z = 1, that is, iff x(1-y)(1-z) = 0
Probabilistic satisfiability checking
• For a conjunction of clauses its arithmetical representation is the vector of arithmetical representations of all clauses.
• Example: Conjunction of 4 clauses
(X∨¬Y∨Z) ∧ (¬X∨Y∨¬Z) ∧ (X∨¬Y∨¬Z) ∧ (¬X∨¬Y∨¬Z)
Probabilistic satisfiability checking
• Arithmetical representation by a vector of 4 polynomials
((1-X)Y(1-Z), X(1-Y)Z, (1-X)YZ, XYZ)
Probabilistic satisfiability checking
• For a conjunction of n clauses
φ(X1,…,Xk)
with k variables X1,…,Xk let its arithmetical representation be
(p1(X1,…,Xk),…,pn(X1,…,Xk))
Probabilistic satisfiability checking
• For every list of truth values x1,…,xk the following are equivalent:
- φ(X1,…,Xk) is satisfied by x1,…,xk
- (p1(x1,…,xk),…,pn(x1,…,xk)) = (0,…,0)
Probabilistic satisfiability checking
• Draw a random bit vector r of length n.
• From now on we compute mod 2, that is, in the field of binary values 0, 1 with
1 + 1 = 0 + 0 = 0, 1 + 0 = 0 + 1 = 1
0·0 = 0·1 = 1·0 = 0, 1·1 = 1
Probabilistic satisfiability checking
• For truth values x1,…,xk compute (mod 2)
$$\mathrm{test}(r, x_1,\ldots,x_k) = r^{T}\bigl(p_1(x_1,\ldots,x_k),\ldots,p_n(x_1,\ldots,x_k)\bigr) = \sum_{i=1}^{n} r_i\,p_i(x_1,\ldots,x_k)$$
Probabilistic satisfiability checking
• If φ(X1,…,Xk) is satisfied by x1,…,xk then
$$\Pr_r\{\mathrm{test}(r,x_1,\ldots,x_k) = 0\} = 1$$
Proof: Trivial, since in this case
$$\bigl(p_1(x_1,\ldots,x_k),\ldots,p_n(x_1,\ldots,x_k)\bigr) = (0,\ldots,0)$$
Probabilistic satisfiability checking
• If φ(X1,…,Xk) is not satisfied by x1,…,xk then
$$\Pr_r\{\mathrm{test}(r,x_1,\ldots,x_k) = 0\} = \tfrac{1}{2}$$
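Probabilistic satisfiability checking
Proof: Consider an arbitrary component i with $p_i(x_1,\ldots,x_k) \neq 0$, that is, $p_i(x_1,\ldots,x_k) = 1$ (such an i exists because x1,…,xk does not satisfy φ). Changing $r_i$ changes the value of $\mathrm{test}(r,x_1,\ldots,x_k)$, so flipping $r_i$ pairs every vector r with test value 0 with exactly one vector with test value 1. Thus for exactly 50% of all vectors r:
$$\mathrm{test}(r,x_1,\ldots,x_k) = 0$$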
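For a small formula the statement can be checked exhaustively. The Python sketch below uses the four-clause example from the slides, encodes each literal as a (variable index, negated) pair (an encoding chosen here for illustration), and computes the probability over all 2^n vectors r exactly; all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# The four-clause example from the slides:
# (X∨¬Y∨Z) ∧ (¬X∨Y∨¬Z) ∧ (X∨¬Y∨¬Z) ∧ (¬X∨¬Y∨¬Z)
# A literal is (variable index, negated); X, Y, Z have the indices 0, 1, 2.
PHI = [
    ((0, False), (1, True), (2, False)),
    ((0, True), (1, False), (2, True)),
    ((0, False), (1, True), (2, True)),
    ((0, True), (1, True), (2, True)),
]

def clause_poly(clause, x):
    """Arithmetical representation of a clause at the truth values x:
    the product of 1 - x_v for a literal X_v and x_v for a literal ¬X_v."""
    value = 1
    for var, negated in clause:
        value *= x[var] if negated else 1 - x[var]
    return value

def parity_check(r, x, phi=PHI):
    """test(r, x_1, ..., x_k) = sum_i r_i * p_i(x_1, ..., x_k) (mod 2)."""
    return sum(r_i * clause_poly(c, x) for r_i, c in zip(r, phi)) % 2

def prob_test_zero(x, phi=PHI):
    """Exact Pr_r[test(r, x) = 0], enumerating all 2**n bit vectors r."""
    vectors = list(product((0, 1), repeat=len(phi)))
    hits = sum(1 for r in vectors if parity_check(r, x, phi) == 0)
    return Fraction(hits, len(vectors))

for x in product((0, 1), repeat=3):
    satisfied = all(clause_poly(c, x) == 0 for c in PHI)
    print(x, "satisfies" if satisfied else "violates", prob_test_zero(x))
```

Every satisfying assignment yields probability 1, every other assignment exactly 1/2.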
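How test formula is computed: Example
$$\varphi(X,Y,Z) = (X \lor Y \lor Z) \land (\lnot X \lor Y \lor \lnot Z) \land (X \lor \lnot Y \lor \lnot Z) \land (\lnot X \lor \lnot Y \lor \lnot Z)$$
$$p_1(X,Y,Z) = (1-X)(1-Y)(1-Z),\quad p_2(X,Y,Z) = X(1-Y)Z,\quad p_3(X,Y,Z) = (1-X)YZ,\quad p_4(X,Y,Z) = XYZ$$
$$r = (1,0,1,1)$$
$$r^{T}\bigl(p_1(x,y,z),p_2(x,y,z),p_3(x,y,z),p_4(x,y,z)\bigr) = (1-x)(1-y)(1-z) + (1-x)yz + xyz$$
$$= 1 - x - y - z + xy + xz + yz - xyz + yz - xyz + xyz$$
$$= 1 + x + y + z + xy + xz + xyz \pmod 2$$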
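How test formula is computed: In general
$$r^{T}\bigl(p_1(x_1,\ldots,x_k),\ldots,p_n(x_1,\ldots,x_k)\bigr) = \mathrm{const}(\varphi,r) + \sum_{i \in L(\varphi,r)} x_i + \sum_{(i,j) \in Q(\varphi,r)} x_i x_j + \sum_{(i,j,m) \in C(\varphi,r)} x_i x_j x_m$$
$$= \mathrm{const}(\varphi,r) + l(\varphi,r)^{T}(x_i)_{i=1,\ldots,k} + q(\varphi,r)^{T}(x_i x_j)_{i,j=1,\ldots,k} + c(\varphi,r)^{T}(x_i x_j x_m)_{i,j,m=1,\ldots,k}$$
with bit vectors l, q, c of length k, k², k³ (the indicator vectors of the index sets L, Q, C).
The constant const(φ, r) and the bit vectors l, q, c can be computed in polynomial time given r and φ.
The occurring functions
$$f_{x_1,\ldots,x_k}(l) = l^{T}(x_i)_{i=1,\ldots,k},\qquad g_{x_1,\ldots,x_k}(q) = q^{T}(x_i x_j)_{i,j=1,\ldots,k},\qquad h_{x_1,\ldots,x_k}(c) = c^{T}(x_i x_j x_m)_{i,j,m=1,\ldots,k}$$
are linear (in l, q and c, respectively) and refer to the same truth value vector x1,…,xk.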
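The coefficients const(φ, r), l, q, c can be obtained by expanding each clause polynomial mod 2, where 1 - x = 1 + x. The following Python sketch does this for the four-clause example with the literal encoding (variable index, negated), which is an assumption made for illustration, and checks that const + f(l) + g(q) + h(c) agrees with the direct computation for all r and x. Function names are hypothetical.

```python
from itertools import product

# Four-clause example (X∨¬Y∨Z) ∧ (¬X∨Y∨¬Z) ∧ (X∨¬Y∨¬Z) ∧ (¬X∨¬Y∨¬Z);
# a literal is (variable index, negated).
PHI = [
    ((0, False), (1, True), (2, False)),
    ((0, True), (1, False), (2, True)),
    ((0, False), (1, True), (2, True)),
    ((0, True), (1, True), (2, True)),
]

def monomials(clause):
    """Expand a clause polynomial mod 2. X_v gives the factor 1 - x_v = 1 + x_v,
    ¬X_v gives x_v. Each monomial (coefficient 1) is a tuple of variable indices."""
    factors = [[(var,)] if negated else [(), (var,)] for var, negated in clause]
    return [sum(choice, ()) for choice in product(*factors)]

def linearize(phi, r, k):
    """Return const(φ, r) and the bit vectors l, q, c (as a list, a k x k
    and a k x k x k nested list)."""
    const, l = 0, [0] * k
    q = [[0] * k for _ in range(k)]
    c = [[[0] * k for _ in range(k)] for _ in range(k)]
    for r_i, clause in zip(r, phi):
        if r_i == 0:
            continue  # clauses with r_i = 0 do not contribute
        for mono in monomials(clause):
            if len(mono) == 0:
                const ^= 1
            elif len(mono) == 1:
                l[mono[0]] ^= 1
            elif len(mono) == 2:
                q[mono[0]][mono[1]] ^= 1
            else:
                c[mono[0]][mono[1]][mono[2]] ^= 1
    return const, l, q, c

def evaluate(const, l, q, c, x):
    """const + f(l) + g(q) + h(c) for the truth values x (mod 2)."""
    ks = range(len(x))
    total = const + sum(l[i] * x[i] for i in ks)
    total += sum(q[i][j] * x[i] * x[j] for i in ks for j in ks)
    total += sum(c[i][j][m] * x[i] * x[j] * x[m] for i in ks for j in ks for m in ks)
    return total % 2

def direct(phi, r, x):
    """sum_i r_i * p_i(x) (mod 2), computed clause by clause."""
    total = 0
    for r_i, clause in zip(r, phi):
        p = 1
        for var, negated in clause:
            p *= x[var] if negated else 1 - x[var]
        total += r_i * p
    return total % 2

const, l, q, c = linearize(PHI, (1, 0, 1, 1), 3)
print(const, l)  # 0 [0, 1, 0]
print(all(evaluate(*linearize(PHI, r, 3), x) == direct(PHI, r, x)
          for r in product((0, 1), repeat=4) for x in product((0, 1), repeat=3)))  # True
```

For r = (1, 0, 1, 1) this example reduces to y + xy + xyz (mod 2).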
The satisfiability certificate used
Given a conjunction φ of n 3-clauses with variables X1,…,Xk, a satisfiability certificate Y consists of all values of the linear functions $f_{x_1,\ldots,x_k}$, $g_{x_1,\ldots,x_k}$, $h_{x_1,\ldots,x_k}$ for a certain assignment of truth values x1,…,xk to X1,…,Xk. Note that Y consists of an exponential number of entries, namely $2^{k} + 2^{k^2} + 2^{k^3}$.
Linearity and consistency check
For the verifier to be correct, it must be guaranteed that the offered certificate Y indeed consists of the values of three linear functions and that these linear functions are derived from the same assignment of truth values x1,…,xk. Only then does the probability estimate for non-satisfying truth values hold:
$$\Pr_r\{\mathrm{test}(r,x_1,\ldots,x_k) = 0\} = \tfrac{1}{2}$$
Linearity and consistency check
This forces the verifier to check linearity and consistency. As a deterministic check would require exponential time, the verifier checks linearity and consistency probabilistically. For a linear function the test delivers "yes" for sure. For a non-linear function the test delivers "no" with probability at least 50%. The same applies to consistency.
Probabilistic linearity check
draw $l, l' \in_{\text{random}} \{0,1\}^{k}$, check $Y(l + l') = Y(l) + Y(l')$
draw $q, q' \in_{\text{random}} \{0,1\}^{k^2}$, check $Y(q + q') = Y(q) + Y(q')$
draw $c, c' \in_{\text{random}} \{0,1\}^{k^3}$, check $Y(c + c') = Y(c) + Y(c')$
For linear functions each check returns "yes" with probability 100%; for functions that are not (almost) linear, it returns "no" with probability at least 50%.
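The linearity check for the f-part of Y can be tried out on tiny tables. In the Python sketch below, which is an illustration and not the slides' construction, a table is a dict from bit vectors l to bits; `hadamard_table` builds the honest table f(l) = l^T x, and `rejection_rate` computes exactly, over all pairs (l, l'), how often one round answers "no". All names are hypothetical.

```python
import random
from fractions import Fraction
from itertools import product

def add(u, v):
    """Componentwise addition mod 2 of two bit vectors."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

def hadamard_table(x):
    """All values f(l) = l^T x (mod 2) for l in {0,1}^k."""
    return {l: sum(a * b for a, b in zip(l, x)) % 2
            for l in product((0, 1), repeat=len(x))}

def linearity_check(table, rng):
    """One round: draw l, l' at random and test Y(l + l') = Y(l) + Y(l')."""
    keys = list(table)
    l, l2 = rng.choice(keys), rng.choice(keys)
    return table[add(l, l2)] == (table[l] + table[l2]) % 2

def rejection_rate(table):
    """Exact probability, over all pairs (l, l'), that one round answers "no"."""
    keys = list(table)
    bad = sum(table[add(l, m)] != (table[l] + table[m]) % 2 for l in keys for m in keys)
    return Fraction(bad, len(keys) ** 2)

rng = random.Random(0)
linear = hadamard_table((1, 0, 1))
print(all(linearity_check(linear, rng) for _ in range(100)))      # True
not_linear = {l: l[0] * l[1] for l in product((0, 1), repeat=2)}  # AND of two bits
print(rejection_rate(linear), rejection_rate(not_linear))         # 0 3/8
```

For the AND table a single round rejects with probability 3/8 only; two independent rounds already reject it with probability 1 - (5/8)² = 39/64, which is above 50%. In general, the rejection probability of one round depends on how far the table is from every linear function.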
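Probabilistic consistency check
draw $l, l' \in_{\text{random}} \{0,1\}^{k}$, check $Y(l\,l'^{T}) = Y(l)\,Y(l')$
draw $q \in_{\text{random}} \{0,1\}^{k^2}$, $l \in_{\text{random}} \{0,1\}^{k}$, check $Y(q\,l^{T}) = Y(q)\,Y(l)$
(here the matrices $l\,l'^{T}$ and $q\,l^{T}$ are read as bit vectors of length k² and k³)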
If Y is in the expected format, the checks deliver "yes" with probability 100%, otherwise "no" with probability at least 50%.
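Probabilistic linear verifier
Given formula φ with k variables and proof Y DO
  draw $l, l' \in_{\text{random}} \{0,1\}^{k}$, check $Y(l + l') = Y(l) + Y(l')$
  draw $q, q' \in_{\text{random}} \{0,1\}^{k^2}$, check $Y(q + q') = Y(q) + Y(q')$
  draw $c, c' \in_{\text{random}} \{0,1\}^{k^3}$, check $Y(c + c') = Y(c) + Y(c')$
  draw $l, l' \in_{\text{random}} \{0,1\}^{k}$, check $Y(l\,l'^{T}) = Y(l)\,Y(l')$
  draw $q \in_{\text{random}} \{0,1\}^{k^2}$, $l \in_{\text{random}} \{0,1\}^{k}$, check $Y(q\,l^{T}) = Y(q)\,Y(l)$
  IF all checks return "yes" THEN
    draw $r \in_{\text{random}} \{0,1\}^{n}$
    compute $\mathrm{const}(\varphi,r)$, $l(\varphi,r)$, $q(\varphi,r)$, $c(\varphi,r)$
    return $\mathrm{const}(\varphi,r) + Y(l(\varphi,r)) + Y(q(\varphi,r)) + Y(c(\varphi,r))$
  ELSE reject proof Y
  END
END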
Probabilistic polynomial verifier
It is far more complicated than the linear verifier. It uses "low degree polynomials" over certain finite fields, and much algebra and prime number theory. It took more than 10 years of effort by the best researchers in Theoretical Computer Science to establish it. The outcome is:
Probabilistic polynomial verifier
Verifier | number of random bits | number of accesses to certificate
linear | O(k³) | O(1)
polynomial | O(log k) | O(polylog k)
That's fine.
Probabilistic polynomial verifier
A procedure called "recursive proof checking" combines the linear and the polynomial verifier and takes the best of both, leading to O(log k) random bits and O(1) accesses to the certificate.
Probabilistic polynomial verifier
Interested? Consult:
• Arora, Lund, Motwani, Sudan, Szegedy (1992) (original publication)
• Mayr, Prömel and Steger (eds.) (1998) (workshop lecture)
• http://www.cs.washington.edu/education/courses/cse533/05au/pcp-history.pdf (history of PCP)
• Irit Dinur: The PCP Theorem by Gap Amplification, 2007 (new, shorter proof)