Finding k-best MAP Solutions Using LP Relaxationscnls.lanl.gov/~jasonj/poa/slides/globerson.pdf ·...

Finding k-best MAP Solutions Using LP Relaxations

Amir GlobersonSchool of Computer Science and Engineering

The Hebrew University

Joint Work with: Menachem Fromer (Hebrew Univ.)

Prediction ProblemsConsider the following problem:

Observe variables:

Predict variables: xh

xv

Prediction ProblemsConsider the following problem:

Observe variables:

Predict variables:

Noisy Image Source Image

Received bits Code word

Symptoms Disease

Sentence Derivation

Countless applications:

Images:

Error correcting codes

Medical diagnostics

Text

Visible Hidden

xh

xv

Statistical Models for Prediction


One approach:


One approach:

Assume (or learn) a model for p(xh,xv)


One approach:

Assume (or learn) a model for

Predict the most likely hidden values

p(xh,xv)

arg maxxh

p(xh|xv)


One approach:



p(xh,xv)

arg maxxh

p(xh|xv)

This conditional distribution often corresponds to a graphical model


One approach:



p(xh,xv)

arg maxxh

p(xh|xv)

This conditional distribution often corresponds to a graphical model

Need to know how to find an assignment with maximum probability

The MAP ProblemGiven a graphical model over

f(x) =!

ij

!ij(xi, xj)

x1, . . . , xn

Find the most likely assignment:

xi

xj!ij(xi, xj)

p(x) =1Z

ef(x)

arg maxx

f(x)

MAP Approximationsx is discrete so generally NP hard

MAP Approximations

Many approximation approaches:

x is discrete so generally NP hard

MAP Approximations


Greedy search


MAP Approximations


Greedy search

Loopy belief propagation (e.g., max product)


MAP Approximations


Greedy search


Linear programming relaxations


MAP Approximations


Greedy search



LP approaches


MAP Approximations


Greedy search



LP approaches

Provide optimality certificates


MAP Approximations


Greedy search



LP approaches


Optimal in some cases (e.g., submodular functions)


MAP Approximations


Greedy search



LP approaches


Optimal in some cases (e.g., submodular functions)

Can be solved via message passing


The k-best MAP Problem


Find the k best assignments for f(x)



Denote these by x(1), . . . ,x(k)



Denote these by

Useful in:

x(1), . . . ,x(k)



Denote these by

Useful in:

Finding multiple candidate solutions when the energy function is not accurate (e.g., protein design)

x(1), . . . ,x(k)



Denote these by

Useful in:


As a first processing stage before applying more complex methods

x(1), . . . ,x(k)



Denote these by

Useful in:


As a first processing stage before applying more complex methods

Supervised learning

x(1), . . . ,x(k)

From 2 to k best

We can show that given a polynomial algorithm for k=2, the problem can be solved for any k in O(k)

Focus on k=2

Our key question: what is the LP formulation of the problem, and its relaxations?

OutlineLP formulation of the MAP problem

LP for 2nd best

General (intractable) exact formulation

Tractable formulation for tree graphs

Approximations for non-tree graphs

Experiments

MAP and LP

MAP and LPMAP: max

xf(x)

MAP and LPMAP:

MAP as LP:

maxx

f(x)

MAP and LPMAP:

MAP as LP:

maxx

f(x)

maxµ!S

µ · !

MAP and LPMAP:

MAP as LP:

S

maxx

f(x)

maxµ!S

µ · !

MAP and LPMAP:

MAP as LP:

S

Hard

maxx

f(x)

maxµ!S

µ · !

MAP and LPMAP:

MAP as LP:

S

Hard

Approximate MAP via LP

maxx

f(x)

maxµ!S

µ · !

MAP and LPMAP:

MAP as LP:

S

Hard

Approximate MAP via LP

maxx

f(x)

Schlesinger, Deza & Laurent, Boros, Wainwright, Kolmogorov

maxµ!S

µ · !

LP Formulation of MAP

LP Formulation of MAPx! = arg max

x

!

ij"E

!ij(xi, xj)


maxq(x)

!

x

q(x)!

ij

!ij(xi, xj)=

x! = arg maxx

!

ij"E

!ij(xi, xj)


maxq(x)

!

x

q(x)!

ij

!ij(xi, xj)=

0

1q!(x)

xx!x! = arg max

x

!

ij"E

!ij(xi, xj)


maxq(x)

!

x

q(x)!

ij

!ij(xi, xj) maxq(x)

!

ij

!

xi,xj

qij(xi, xj)!ij(xi, xj)= =

0

1q!(x)

xx!x! = arg max

x

!

ij"E

!ij(xi, xj)


Objective depends only on pairwise marginals

maxq(x)

!

x

q(x)!

ij

!ij(xi, xj) maxq(x)

!

ij

!

xi,xj


0

1q!(x)

xx!x! = arg max

x

!

ij"E

!ij(xi, xj)



But only those that correspond to some distribution

maxq(x)

!

x

q(x)!

ij

!ij(xi, xj) maxq(x)

!

ij

!

xi,xj


0

1q!(x)

xx!x! = arg max

x

!

ij"E

!ij(xi, xj)

q(x)




This set is called the Marginal polytope ( Wainwright & Jordan)

maxq(x)

!

x

q(x)!

ij

!ij(xi, xj) maxq(x)

!

ij

!

xi,xj


0

1q!(x)

xx!x! = arg max

x

!

ij"E

!ij(xi, xj)

q(x)





maxq(x)

!

x

q(x)!

ij

!ij(xi, xj) maxq(x)

!

ij

!

xi,xj


0

1q!(x)

xx!x! = arg max

x

!

ij"E

!ij(xi, xj)

q(x)

maxx

!

ij

!ij(xi, xj) = maxµ!M(G)

!

ij

µij(xi, xj)!ij(xi, xj)





maxq(x)

!

x

q(x)!

ij

!ij(xi, xj) maxq(x)

!

ij

!

xi,xj


0

1q!(x)

xx!x! = arg max

x

!

ij"E

!ij(xi, xj)

q(x)

maxx

!

ij


!

ij

µij(xi, xj)!ij(xi, xj)= maxµ!M(G)

µ · !





maxq(x)

!

x

q(x)!

ij

!ij(xi, xj) maxq(x)

!

ij

!

xi,xj


0

1q!(x)

xx!x! = arg max

x

!

ij"E

!ij(xi, xj)

q(x)

maxx

!

ij


!

ij


See: Cut polytope (Deza, Laurent), Quadric polytope (Boros)

= maxµ!M(G)

µ · !

The Marginal Polytope

Marginal Polytope

M(G)max

µ!M(G)

!

ij!E

!

xi,xj



Marginal Polytope

M(G)µmax

µ!M(G)

!

ij!E

!

xi,xj



Marginal Polytope

M(G)µ

There exists a p(x) s.t. p(xi, xj) = µij(xi, xj)

maxµ!M(G)

!

ij!E

!

xi,xj



Marginal Polytope

M(G)µ


maxµ!M(G)

!

ij!E

!

xi,xj


Difficult set to characterize. Easy to outer bound


Marginal Polytope

M(G)µ


maxµ!M(G)

!

ij!E

!

xi,xj


Difficult set to characterize. Easy to outer bound

The vertices have integral values and correspond to assignments on x

Relaxing the MAP LPmax

x

!

ij


!

ij!E

!

xi,xj


M(G)


x

!

ij


!

ij!E

!

xi,xj


Exact but Hard!M(G)


x

!

ij

!ij(xi, xj) ! maxµ!S

!

ij!E

!

xi,xj


S

M(G)


x

!

ij


!

ij!E

!

xi,xj


If optimum is an integral vertex, MAP is solved

S

M(G)


x

!

ij


!

ij!E

!

xi,xj



Possible outer bound: Pairwise consistencyS

M(G)


x

!

ij


!

ij!E

!

xi,xj



Possible outer bound: Pairwise consistency

j!

i!

k! !

xi

µij(xi, xj) =!

xk

µjk(xj , xk)

S

M(G)


x

!

ij


!

ij!E

!

xi,xj




j!

i!

k! !

xi

µij(xi, xj) =!

xk

µjk(xj , xk)Exact for trees

S

M(G)


x

!

ij


!

ij!E

!

xi,xj




j!

i!

k! !

xi

µij(xi, xj) =!

xk

µjk(xj , xk)

Efficient message passing schemes for solving the resulting (dual) LP

Exact for trees

S

M(G)


LP for 2nd best




Experiments

The 2nd best problem and LP

MAP 2nd best


maxx

f(x)MAP 2nd best


maxx !=x(1)

f(x)maxx

f(x)MAP 2nd best


maxx !=x(1)

f(x)maxx

f(x)

maxµ!M(G)

µ · !

MAP 2nd best


maxx !=x(1)

f(x)maxx

f(x)

maxµ!M(G)

µ · ! maxµ!M(G,x(1))

µ · !

x(1)

MAP 2nd best


maxx !=x(1)

f(x)maxx

f(x)

maxµ!M(G)

µ · ! maxµ!M(G,x(1))

µ · !

x(1)

MAP 2nd best

Approximations:

A new marginal polytope

Given an assignment z, define the Assignment Excluding Marginal Polytope:M(G, z)



M(G, z)



µ

M(G, z)



µ


M(G, z)



µ


and:

M(G, z)



µ


and: p(z) = 0

M(G, z)



µ


and: p(z) = 0

M(G)

M(G, z)



µ


and: p(z) = 0

zM(G)

M(G, z)

LP for the 2nd best problem

The 2nd best problem corresponds to the following LP:

maxx !=x(1)

f(x;!) = maxµ"M(G,x(1))

µ · !

x(1)



maxx !=x(1)

f(x;!) = maxµ"M(G,x(1))

µ · !

Is there a simple characterization of ? M(G, x(1))

x(1)



maxx !=x(1)

f(x;!) = maxµ"M(G,x(1))

µ · !


Is it plus one inequality?M(G)

x(1)



maxx !=x(1)

f(x;!) = maxµ"M(G,x(1))

µ · !


Is it plus one inequality?

If so, what inequality?

M(G)

x(1)


LP for 2nd best




Experiments

Adding inequalities to z z

M(G)

Adding inequalities to Any valid inequality must separate from the other vertices

z z

M(G)


How about: (Santos 91)!

i

µi(zi) ! n" 1

z z

M(G)


How about: (Santos 91)

RHS is n for z and or less for other vertices

!

i

µi(zi) ! n" 1

z z

n! 1

M(G)




But: Results in fractional vertices, even for trees

!

i

µi(zi) ! n" 1

z z

n! 1

M(G)




But: Results in fractional vertices, even for trees

Only an outer bound on

!

i

µi(zi) ! n" 1

z z

n! 1

M(G)

M(G, z)

The tree case

The tree caseFocus on the case where G is a tree


is given by pairwise consistencyM(G)


is given by pairwise consistency

Define:

I(µ,z) =!

i

(1! di)µi(zi) +!

ij!G

µij(zi, zj)

M(G)



Define:

I(µ,z) =!

i

(1! di)µi(zi) +!

ij!G

µij(zi, zj)

M(G)

H(µ) =!

i

(1! di)Hi(Xi) +!

ij!G

H(Xi, Xj)Bethe:



Define:

I(µ,z) =!

i

(1! di)µi(zi) +!

ij!G

µij(zi, zj)

M(G)



Define:

I(µ,z) =!

i

(1! di)µi(zi) +!

ij!G

µij(zi, zj)

M(G)

Theorem:

M(G, z) =!µ | µ !M(G), I(µ,z) " 0

"



Define:

z

I(µ,z) =!

i

(1! di)µi(zi) +!

ij!G

µij(zi, zj)

M(G)

Theorem:

M(G, z) =!µ | µ !M(G), I(µ,z) " 0

"M(G)



Define:

z

I(µ,z) =!

i

(1! di)µi(zi) +!

ij!G

µij(zi, zj)

M(G)

Theorem:

M(G, z) =!µ | µ !M(G), I(µ,z) " 0

"M(G)

I(µ,z) ! 0



Define:

z

I(µ,z) =!

i

(1! di)µi(zi) +!

ij!G

µij(zi, zj)

M(G)

Theorem:

M(G, z) =!µ | µ !M(G), I(µ,z) " 0

"

I(µ,z) ! 0

M(G, z)



Define:

z

I(µ,z) =!

i

(1! di)µi(zi) +!

ij!G

µij(zi, zj)

M(G)

Theorem:

M(G, z) =!µ | µ !M(G), I(µ,z) " 0

"

I(µ,z) ! 0

M(G, z)Proof...

ProofA(G, z) =

!µ | µ !M(G), I(µ,z) " 0

"Define:

ProofA(G, z) =

!µ | µ !M(G), I(µ,z) " 0

"Define:

A(G, z) =M(G, z)Want to show:

Proof

Want to show that if there exists a p(x) that has these marginals and p(z)=0.

µ ! A(G, z)

A(G, z) =!µ | µ !M(G), I(µ,z) " 0

"Define:

Proof


µ ! A(G, z)

A(G, z) =!µ | µ !M(G), I(µ,z) " 0

"Define:

Can construct p(x)

Proof


µ ! A(G, z)

A(G, z) =!µ | µ !M(G), I(µ,z) " 0

"Define:

Proof


µ ! A(G, z)

F (µ) =

!""#

""$

min p(z)s.t. pij(xi, xj) = µij(xi, xj)

pi(xi) = µi(xi)p(x) ! 0

A(G, z) =!µ | µ !M(G), I(µ,z) " 0

"Define:

Proof


µ ! A(G, z)

F (µ) =

!""#

""$



A(G, z) =!µ | µ !M(G), I(µ,z) " 0

"

= 0!µ " A(G, z)

Define:

Proof


µ ! A(G, z)

F (µ) =

!""#

""$



In fact we can show that for trees:

µ !M(G) F (µ) = max{0, I(µ,z)}

A(G, z) =!µ | µ !M(G), I(µ,z) " 0

"

= 0!µ " A(G, z)

Define:

Proof - key ideas

F (µ) =

!""#

""$



Proof - key ideas

F (µ) =

!""#

""$


pi(xi) = µi(xi)p(x) ! 0 !x "= z

Proof - key ideas

F (µ) =

!""#

""$


pi(xi) = µi(xi)p(x) ! 0 !x "= z

,

Proof - key ideas

F (µ) =

!""#

""$


pi(xi) = µi(xi)p(x) ! 0 !x "= z

,

Dual: max ! · µs.t.

!ij !ij(xi, xj) +

!i !i(xi) ! 0 "x #= z!

ij !ij(zi,zj) +!

i !i(zi) = 1

Proof - key ideas

F (µ) =

!""#

""$


pi(xi) = µi(xi)p(x) ! 0 !x "= z

We show that the value of the above is

,

I(µ,z)


!ij !ij(xi, xj) +

!i !i(xi) ! 0 "x #= z!

ij !ij(zi,zj) +!

i !i(zi) = 1

Proof - key ideas

F (µ) =

!""#

""$


pi(xi) = µi(xi)p(x) ! 0 !x "= z


From there it’s easy to conclude that

,

I(µ,z)


!ij !ij(xi, xj) +

!i !i(xi) ! 0 "x #= z!

ij !ij(zi,zj) +!

i !i(zi) = 1

Proof - key ideas

F (µ) =

!""#

""$


pi(xi) = µi(xi)p(x) ! 0 !x "= z


From there it’s easy to conclude that

F (µ) = max{0, I(µ,z)}

,

I(µ,z)


!ij !ij(xi, xj) +

!i !i(xi) ! 0 "x #= z!

ij !ij(zi,zj) +!

i !i(zi) = 1

Proof - Max marginalsmax ! · µs.t. !(x) ! 0 "x #= z

!(z) = 1!(x) =

!

ij

!ij(xi, xj) +!

i

!i(xi)

Proof - Max marginals

Use max-marginals:

max ! · µs.t. !(x) ! 0 "x #= z

!(z) = 1!(x) =

!

ij

!ij(xi, xj) +!

i

!i(xi)


Use max-marginals:

!̄(xi) = maxx̂:x̂i=xi

!(x)

!̄(xi.xj) = maxx̂:x̂i=xi,x̂j=xj

!(x)

max ! · µs.t. !(x) ! 0 "x #= z

!(z) = 1!(x) =

!

ij

!ij(xi, xj) +!

i

!i(xi)


Use max-marginals:


!(x)


!(x)!̄(zi) = 1!̄(xi) ! 0 xi "= zi

max ! · µs.t. !(x) ! 0 "x #= z

!(z) = 1!(x) =

!

ij

!ij(xi, xj) +!

i

!i(xi)


Use max-marginals:


!(x)


!(x)!̄(zi) = 1!̄(xi) ! 0 xi "= zi

max ! · µs.t. !(x) ! 0 "x #= z

!(z) = 1!(x) =

!

ij

!ij(xi, xj) +!

i

!i(xi)

Rewrite: !(x) =!

i

(1! di)!̄(xi) +!

ij!T

!̄ij(xi, xj)


Use max-marginals:


!(x)


!(x)!̄(zi) = 1!̄(xi) ! 0 xi "= zi

Result follows after some algebra

max ! · µs.t. !(x) ! 0 "x #= z

!(z) = 1!(x) =

!

ij

!ij(xi, xj) +!

i

!i(xi)

Rewrite: !(x) =!

i

(1! di)!̄(xi) +!

ij!T

!̄ij(xi, xj)

Tree Graph - Summary

x(1)

M(G, x(1)) =!µ | µ !M(G), I(µ,x(1)) " 0

"


The LP for 2nd best differs from the marginal polytope by one linear inequality constraint

x(1)

M(G, x(1)) =!µ | µ !M(G), I(µ,x(1)) " 0

"



The 2nd best satisfies so it cannot be any assignment

x(1)

M(G, x(1)) =!µ | µ !M(G), I(µ,x(1)) " 0

"

I(µ,x(1)) = 0




x(1)

x(2)

M(G, x(1)) =!µ | µ !M(G), I(µ,x(1)) " 0

"

I(µ,x(1)) = 0




x(1)

x(2)

x(2)

M(G, x(1)) =!µ | µ !M(G), I(µ,x(1)) " 0

"

I(µ,x(1)) = 0




x(1)

x(2)

x(2)

x(2)

M(G, x(1)) =!µ | µ !M(G), I(µ,x(1)) " 0

"

I(µ,x(1)) = 0




x(1)

x(2)

x(2)

x(2)X

M(G, x(1)) =!µ | µ !M(G), I(µ,x(1)) " 0

"

I(µ,x(1)) = 0




x(1)

x(2)

x(2)

M(G, x(1)) =!µ | µ !M(G), I(µ,x(1)) " 0

"

I(µ,x(1)) = 0

Non tree graphsAny graph can be converted into a junction tree

We can apply our tree result there

For a junction tree with cliques C and separators S, the inequality is:

!

S!S(1! dS)µS(zS) +

!

C!CµC(zC) " 0

Specifying the marginal polytope requires a number of variables exponential in the tree width. Not practical.


LP for 2nd best




Experiments

Non trees - Approximations

x(1)

TrueM(G, x(1))

Non trees - Approximations

x(1)

TrueM(G, x(1))

Outer bound on M(G)

Spanning tree inequalities

Give a spanning subtree T of G defineIT (µ,z) =

!

i

(1! di)µi(zi) +!

ij!T

µij(zi, zj)

IT (µ,z) ! 0And the constraint:



!

i

(1! di)µi(zi) +!

ij!T

µij(zi, zj)

Separates z from the other vertices but might result in fractional vertices




!

i

(1! di)µi(zi) +!

ij!T

µij(zi, zj)


z




!

i

(1! di)µi(zi) +!

ij!T

µij(zi, zj)


z


IT (µ,z) ! 0



!

i

(1! di)µi(zi) +!

ij!T

µij(zi, zj)


zFractional vertex


IT (µ,z) ! 0

Adding all spanning trees


Can we add all spanning tree inequalities efficiently?



Yes, via a cutting plane approach:




Start with one inequality





Solve LP





Solve LP

If solution is fractional, find a violated tree inequality (if exists) and add it

Cutting Plane Algorithm


z


zT1


zµ1

T1


zµ1 Is there a tree

inequality thatviolates?

µ1

T1




µ1

T1

T2


How do we find a violated tree inequality?

Note: Even all spanning tree inequalities might not suffice



µ1

T1

T2

Finding a violated spanning tree

For a given find

If it’s positive, add the maximizing tree

µ maxT

IT (µ,z)


For a given find


µ maxT

IT (µ,z)

How can we maximize over all trees? Note that:


For a given find


µ maxT

IT (µ,z)


IT (µ,z) =!

ij!T

"µij(zi, zj)! µi(zi)! µj(zj)

#+

!

i

µi(zi)


For a given find


µ maxT

IT (µ,z)


IT (µ,z) =!

ij!T


#+

!

i

µi(zi)

wij


For a given find


µ maxT

IT (µ,z)


IT (µ,z) =!

ij!T


#+

!

i

µi(zi)

wij Fixed


For a given find


µ maxT

IT (µ,z)


IT (µ,z) =!

ij!T


#+

!

i

µi(zi)

Decomposes into edge scores. Maximizing tree can be found using a maximum-weight-spanning-tree algorithm (e.g., Wainwright 02)

wij Fixed

ExperimentsAlternative algorithms for approximate 2nd best:

Using approximate marginals from max-product (BMMF; Yanover and Weiss 04)

Lawler/Nillson (72,80) - Partition assignments :

Maximize over each part approximately. Cost O(n)

Our algorithm: STRIPES

x != x(1)

x1 != x(1)1 x2 = " x3 = " . . . xn = "

x1 = x(1)1 x2 != x(1)

2 x3 = " . . . xn = "...

......

......

x1 = x(1)1 x2 = x(1)

2 x3 = x(3)1 . . . xn != x(n)

1

Attractive GridsIsing models with ferromagnetic interaction

The local-polytope guaranteed to yield exact first best (but not equal to the marginal polytope)

Goal: Find 50 best. Stripes and Nillson find all of them exactly. Up to 19 spanning trees added

S N B0

0.5

1

S N B0

50

Stripes Nillson BMMF

0

50

0Stripes Nillson BMMF

Rank Run Time

Protein Side Chain Prediction

Given protein’s 3D shape (backbone), choose most probable side chain configuration

xi!

xk!

xj !

xh!

G=(V,E)!

Protein backbone!

Side-chains!

(MRFs from Yanover, Meltzer, Weiss ‘06)!

Can be cast as a MAP problem

Important to obtain multiple possible solutions

p(x) ! eP

ij!E !ij(xi,xj)

Protein Side Chain Prediction

Stripes found the exact solutions for all problems studied

In some cases, we used a tighter approximation of the marginal polytope (Sontag et al, UAI 08)

S N B0

50

S N B0

0.5

1

Stripes Nillson BMMF0

50

0Stripes Nillson BMMF

Open Questions

Open QuestionsWhen are spanning trees enough?


What is the polytope structure for k-best?



Finding k-best “different” solutions




Scalable algorithms




Scalable algorithms

If a given problem is solved with a marginal polytope relaxation, what can we say about the second best?

SummaryThe 2nd best can be posed as a linear program

For trees differs from 1st best by one constraint only

For non-trees, approximation can be devised by adding inequalities for all spanning trees

Empirically effective

Finding k-best MAP Solutions Using LP Relaxationscnls.lanl.gov/~jasonj/poa/slides/globerson.pdf ·...

Documents

Transcript of Finding k-best MAP Solutions Using LP Relaxationscnls.lanl.gov/~jasonj/poa/slides/globerson.pdf ·...