Finding k-best MAP Solutions Using LP Relaxations
Amir Globerson
School of Computer Science and Engineering
The Hebrew University
Joint Work with: Menachem Fromer (Hebrew Univ.)
Prediction Problems
Consider the following problem:
Observe variables: $x_v$ (visible)
Predict variables: $x_h$ (hidden)
Countless applications (visible → hidden):
Images: noisy image → source image
Error correcting codes: received bits → code word
Medical diagnostics: symptoms → disease
Text: sentence → derivation
Statistical Models for Prediction
One approach:
Assume (or learn) a model for $p(x_h, x_v)$
Predict the most likely hidden values: $\arg\max_{x_h} p(x_h \mid x_v)$
This conditional distribution often corresponds to a graphical model
Need to know how to find an assignment with maximum probability
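The MAP Problem
Given a graphical model over $x_1, \ldots, x_n$:
$p(x) = \frac{1}{Z} e^{f(x)}$, where $f(x) = \sum_{ij} \theta_{ij}(x_i, x_j)$
[Figure: an edge between nodes $x_i$ and $x_j$ carrying the potential $\theta_{ij}(x_i, x_j)$]
Find the most likely assignment: $\arg\max_x f(x)$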
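To make the objective concrete, here is a minimal sketch of exact MAP by exhaustive enumeration. The three-variable chain and its potentials `theta` are made-up illustration values, not from the talk, and enumeration is only feasible for tiny models.

```python
from itertools import product

def score(x, theta):
    """f(x) = sum over edges (i, j) of theta[(i, j)][x_i][x_j]."""
    return sum(table[x[i]][x[j]] for (i, j), table in theta.items())

def brute_force_map(n_vars, n_states, theta):
    """Return (best assignment, best score) by enumerating every assignment."""
    candidates = product(range(n_states), repeat=n_vars)
    return max(((x, score(x, theta)) for x in candidates), key=lambda pair: pair[1])

# Hypothetical binary chain 0 - 1 - 2 (values chosen only for illustration).
theta = {
    (0, 1): [[2.0, 0.0], [0.0, 1.0]],
    (1, 2): [[0.0, 3.0], [1.0, 0.0]],
}

x_map, f_map = brute_force_map(3, 2, theta)
print(x_map, f_map)  # (0, 0, 1) 5.0
```

The loop visits `n_states ** n_vars` assignments, which is why the approximations discussed in the talk are needed for anything larger.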
MAP Approximations
x is discrete, so the problem is generally NP-hard
Many approximation approaches:
Greedy search
Loopy belief propagation (e.g., max-product)
Linear programming relaxations
LP approaches:
Provide optimality certificates
Optimal in some cases (e.g., submodular functions)
Can be solved via message passing
The k-best MAP Problem
Find the k best assignments for f(x)
Denote these by $x^{(1)}, \ldots, x^{(k)}$
Useful in:
Finding multiple candidate solutions when the energy function is not accurate (e.g., protein design)
As a first processing stage before applying more complex methods
Supervised learning
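As a reference point for what "k best" means, the following sketch ranks all assignments of a tiny model by f(x) and keeps the top k. The model is a hypothetical example, and this brute-force baseline is not the method of the talk.

```python
import heapq
from itertools import product

# Hypothetical binary chain 0 - 1 - 2 (illustration values only).
theta = {
    (0, 1): [[2.0, 0.0], [0.0, 1.0]],
    (1, 2): [[0.0, 3.0], [1.0, 0.0]],
}

def f(x):
    """f(x) = sum over edges of theta_ij(x_i, x_j)."""
    return sum(table[x[i]][x[j]] for (i, j), table in theta.items())

def k_best_brute_force(n_vars, n_states, k):
    """Return the k highest-scoring assignments, best first."""
    return heapq.nlargest(k, product(range(n_states), repeat=n_vars), key=f)

top2 = k_best_brute_force(3, 2, 2)
print([(x, f(x)) for x in top2])  # [((0, 0, 1), 5.0), ((1, 0, 1), 3.0)]
```

Such a baseline is mainly useful for checking an approximate k-best method on small instances.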
From 2 to k best
We can show that given a polynomial algorithm for k=2, the problem can be solved for any k in O(k)
Focus on k=2
Our key question: what is the LP formulation of the problem, and its relaxations?
Outline
LP formulation of the MAP problem
LP for 2nd best
General (intractable) exact formulation
Tractable formulation for tree graphs
Approximations for non-tree graphs
Experiments
MAP and LP
MAP: $\max_x f(x)$
MAP as LP: $\max_{\mu \in S} \mu \cdot \theta$ over a set $S$
Hard
Approximate MAP via LP
(Schlesinger; Deza & Laurent; Boros; Wainwright; Kolmogorov)
LP Formulation of MAP
$x^* = \arg\max_x \sum_{ij \in E} \theta_{ij}(x_i, x_j)$
$\max_{q(x)} \sum_x q(x) \sum_{ij} \theta_{ij}(x_i, x_j) = \max_{q(x)} \sum_{ij} \sum_{x_i, x_j} q_{ij}(x_i, x_j)\, \theta_{ij}(x_i, x_j)$
[Figure: the maximizing distribution $q^*(x)$ equals 1 at $x^*$ and 0 elsewhere]
Objective depends only on pairwise marginals
But only those that correspond to some distribution $q(x)$
This set is called the marginal polytope (Wainwright & Jordan)
$\max_x \sum_{ij} \theta_{ij}(x_i, x_j) = \max_{\mu \in \mathcal{M}(G)} \sum_{ij} \sum_{x_i, x_j} \mu_{ij}(x_i, x_j)\, \theta_{ij}(x_i, x_j) = \max_{\mu \in \mathcal{M}(G)} \mu \cdot \theta$
See: Cut polytope (Deza, Laurent), Quadric polytope (Boros)
The Marginal Polytope
$\max_{\mu \in \mathcal{M}(G)} \sum_{ij \in E} \sum_{x_i, x_j} \mu_{ij}(x_i, x_j)\, \theta_{ij}(x_i, x_j)$
Marginal polytope $\mathcal{M}(G)$: $\mu \in \mathcal{M}(G)$ if there exists a $p(x)$ s.t. $p(x_i, x_j) = \mu_{ij}(x_i, x_j)$
Difficult set to characterize. Easy to outer bound
The vertices have integral values and correspond to assignments on x
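The correspondence between assignments and integral vertices can be checked numerically. In this sketch (hypothetical potentials, helper names of my own choosing), each assignment x is mapped to its indicator marginals, and the linear objective μ·θ reproduces f(x); a mixture of two vertices is also in the polytope but never beats the better of the two.

```python
from itertools import product

EDGES = [(0, 1), (1, 2)]
K = 2  # number of states per variable

# Hypothetical potentials: theta[(i, j)][a][b] = theta_ij(x_i = a, x_j = b).
theta = {
    (0, 1): [[2.0, 0.0], [0.0, 1.0]],
    (1, 2): [[0.0, 3.0], [1.0, 0.0]],
}

def f(x):
    return sum(theta[(i, j)][x[i]][x[j]] for (i, j) in EDGES)

def vertex(x):
    """Integral point: mu_ij(a, b) = 1 if (x_i, x_j) == (a, b), else 0."""
    return {
        (i, j): [[float((x[i], x[j]) == (a, b)) for b in range(K)] for a in range(K)]
        for (i, j) in EDGES
    }

def dot(mu):
    """mu . theta, summed over edges and state pairs."""
    return sum(mu[e][a][b] * theta[e][a][b]
               for e in EDGES for a in range(K) for b in range(K))

def mix(mu1, mu2, w):
    """Convex combination w * mu1 + (1 - w) * mu2 (marginals of a mixture)."""
    return {e: [[w * mu1[e][a][b] + (1 - w) * mu2[e][a][b] for b in range(K)]
                for a in range(K)] for e in EDGES}

assignments = list(product(range(K), repeat=3))
vertex_values = {x: dot(vertex(x)) for x in assignments}
mixed_value = dot(mix(vertex((0, 0, 1)), vertex((1, 1, 0)), 0.5))
print(max(vertex_values.values()), mixed_value)  # 5.0 3.5
```

Because the objective is linear, its maximum over the polytope is attained at a vertex, which is why the LP over $\mathcal{M}(G)$ is exact.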
Relaxing the MAP LP
Exact but hard: $\max_x \sum_{ij} \theta_{ij}(x_i, x_j) = \max_{\mu \in \mathcal{M}(G)} \sum_{ij \in E} \sum_{x_i, x_j} \mu_{ij}(x_i, x_j)\, \theta_{ij}(x_i, x_j)$
Relaxation: $\max_x \sum_{ij} \theta_{ij}(x_i, x_j) \le \max_{\mu \in S} \sum_{ij \in E} \sum_{x_i, x_j} \mu_{ij}(x_i, x_j)\, \theta_{ij}(x_i, x_j)$
[Figure: outer bound $S$ containing $\mathcal{M}(G)$]
If optimum is an integral vertex, MAP is solved
Possible outer bound: pairwise consistency, $\sum_{x_i} \mu_{ij}(x_i, x_j) = \sum_{x_k} \mu_{jk}(x_j, x_k)$
Exact for trees
Efficient message passing schemes for solving the resulting (dual) LP
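A small example shows why pairwise consistency is only an outer bound on graphs with cycles. The sketch below uses a textbook-style frustrated triangle (my own choice of example, not taken from the slides): the pseudomarginals are pairwise consistent, yet their objective exceeds the value of every actual assignment.

```python
from itertools import product

EDGES = [(0, 1), (1, 2), (0, 2)]

# Hypothetical potentials rewarding disagreement on every edge of a triangle.
theta = {e: [[0.0, 1.0], [1.0, 0.0]] for e in EDGES}

# Candidate pseudomarginals: every edge puts mass 0.5 on (0, 1) and on (1, 0).
mu = {e: [[0.0, 0.5], [0.5, 0.0]] for e in EDGES}

def implied_node_marginal(table, edge, node):
    """Marginalize an edge table down to one of its two endpoints."""
    if node == edge[0]:
        return [sum(row) for row in table]
    return [sum(table[a][b] for a in range(2)) for b in range(2)]

def pairwise_consistent(mu, tol=1e-9):
    """Nonnegative, normalized, and all edges agree on shared node marginals."""
    seen = {}
    for edge, table in mu.items():
        if any(v < -tol for row in table for v in row):
            return False
        if abs(sum(map(sum, table)) - 1.0) > tol:
            return False
        for node in edge:
            marg = implied_node_marginal(table, edge, node)
            ref = seen.setdefault(node, marg)
            if any(abs(p - q) > tol for p, q in zip(ref, marg)):
                return False
    return True

lp_value = sum(mu[e][a][b] * theta[e][a][b]
               for e in EDGES for a in range(2) for b in range(2))
map_value = max(sum(theta[(i, j)][x[i]][x[j]] for (i, j) in EDGES)
                for x in product(range(2), repeat=3))
print(pairwise_consistent(mu), lp_value, map_value)  # True 3.0 2.0
```

No distribution over three binary variables can make all three pairs disagree, so this point lies in $S$ but outside $\mathcal{M}(G)$, and the relaxed optimum is fractional.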
The 2nd best problem and LP
MAP: $\max_x f(x) = \max_{\mu \in \mathcal{M}(G)} \mu \cdot \theta$
2nd best: $\max_{x \ne x^{(1)}} f(x) = \max_{\mu \in \mathcal{M}(G, x^{(1)})} \mu \cdot \theta$
Approximations: [figure not recovered]
A new marginal polytope
Given an assignment z, define the Assignment Excluding Marginal Polytope $\mathcal{M}(G, z)$:
$\mu \in \mathcal{M}(G, z)$ if there exists a $p(x)$ s.t. $p(x_i, x_j) = \mu_{ij}(x_i, x_j)$ and $p(z) = 0$
[Figure: $\mathcal{M}(G, z)$ drawn inside $\mathcal{M}(G)$, with the vertex $z$ marked]
LP for the 2nd best problem
The 2nd best problem corresponds to the following LP:
$\max_{x \ne x^{(1)}} f(x; \theta) = \max_{\mu \in \mathcal{M}(G, x^{(1)})} \mu \cdot \theta$
Is there a simple characterization of $\mathcal{M}(G, x^{(1)})$?
Is it $\mathcal{M}(G)$ plus one inequality?
If so, what inequality?
Adding inequalities to $\mathcal{M}(G)$
Any valid inequality must separate z from the other vertices
How about (Santos 91): $\sum_i \mu_i(z_i) \le n - 1$
The left-hand side is $n$ for z and $n - 1$ or less for other vertices
But: Results in fractional vertices, even for trees
Only an outer bound on $\mathcal{M}(G, z)$
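The following sketch checks both claims on toy inputs of my own choosing. It evaluates $\sum_i \mu_i(z_i)$ at every integral vertex of a three-variable binary model, and then builds a two-variable mixture that satisfies the inequality even though it necessarily places mass on z.

```python
from itertools import product

def santos_lhs(node_marginals, z):
    """Left-hand side sum_i mu_i(z_i) of the inequality."""
    return sum(node_marginals[i][z_i] for i, z_i in enumerate(z))

def vertex_node_marginals(x, n_states=2):
    """Indicator node marginals of the assignment x."""
    return [[float(x_i == a) for a in range(n_states)] for x_i in x]

# Integral vertices: z gives n, every other assignment gives at most n - 1.
z = (0, 0, 0)
n = len(z)
lhs = {x: santos_lhs(vertex_node_marginals(x), z)
       for x in product(range(2), repeat=n)}

# Hypothetical two-variable tree (one edge) with z2 = (0, 0).
# The distribution p mixes z2 with its complement.
z2 = (0, 0)
p = {(0, 0): 0.5, (1, 1): 0.5}
node_marg = [[sum(w for x, w in p.items() if x[i] == a) for a in range(2)]
             for i in range(2)]
frac_lhs = santos_lhs(node_marg, z2)   # equals n - 1 = 1, so the inequality holds
mass_on_z = p.get(z2, 0.0)             # the single edge marginal is the full joint
print(lhs[z], frac_lhs, mass_on_z)  # 3.0 1.0 0.5
```

With a single edge the pairwise marginal determines the joint, so any distribution matching these marginals has $p(z) = 0.5$. The point passes the inequality but is not in $\mathcal{M}(G, z)$, which illustrates why the constraint gives only an outer bound.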
The tree case
Focus on the case where G is a tree
$\mathcal{M}(G)$ is given by pairwise consistency
Define: $I(\mu, z) = \sum_i (1 - d_i)\, \mu_i(z_i) + \sum_{ij \in G} \mu_{ij}(z_i, z_j)$, where $d_i$ is the degree of node $i$
(Compare the Bethe entropy: $H(\mu) = \sum_i (1 - d_i)\, H_i(X_i) + \sum_{ij \in G} H(X_i, X_j)$)
Theorem: $\mathcal{M}(G, z) = \{\mu \mid \mu \in \mathcal{M}(G),\ I(\mu, z) \le 0\}$
[Figure: the constraint $I(\mu, z) \le 0$ cuts the vertex $z$ off $\mathcal{M}(G)$, leaving $\mathcal{M}(G, z)$]
Proof...
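As a sanity check on the inequality (not a proof of the theorem), this sketch evaluates $I(\mu, z)$ at every integral vertex of a small tree. The five-node tree, the assignment z, and the function names are hypothetical choices for illustration.

```python
from itertools import product

EDGES = [(0, 1), (1, 2), (1, 3), (3, 4)]  # a hypothetical 5-node tree
N, K = 5, 2

def vertex_marginals(x):
    """Indicator node and edge marginals of the assignment x."""
    node = [[float(x[i] == a) for a in range(K)] for i in range(N)]
    edge = {(i, j): [[float((x[i], x[j]) == (a, b)) for b in range(K)]
                     for a in range(K)] for (i, j) in EDGES}
    return node, edge

def tree_inequality_value(node, edge, z):
    """I(mu, z) = sum_i (1 - d_i) mu_i(z_i) + sum_{ij} mu_ij(z_i, z_j)."""
    degree = [0] * N
    for i, j in EDGES:
        degree[i] += 1
        degree[j] += 1
    node_term = sum((1 - degree[i]) * node[i][z[i]] for i in range(N))
    edge_term = sum(edge[(i, j)][z[i]][z[j]] for (i, j) in EDGES)
    return node_term + edge_term

z = (0, 1, 0, 1, 1)
values = {x: tree_inequality_value(*vertex_marginals(x), z)
          for x in product(range(K), repeat=N)}
print(values[z], max(v for x, v in values.items() if x != z))  # 1.0 0.0
```

Only the vertex z has a positive value, so the single constraint $I(\mu, z) \le 0$ removes z while keeping every other integral vertex.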
Proof
Define: $A(G, z) = \{\mu \mid \mu \in \mathcal{M}(G),\ I(\mu, z) \le 0\}$
Want to show: $A(G, z) = \mathcal{M}(G, z)$
Want to show that if $\mu \in A(G, z)$, there exists a $p(x)$ that has these marginals and $p(z) = 0$.
Can construct $p(x)$
Define: $F(\mu) = \min_p p(z)$ s.t. $p_{ij}(x_i, x_j) = \mu_{ij}(x_i, x_j)$, $p_i(x_i) = \mu_i(x_i)$, $p(x) \ge 0$
Equivalently: $F(\mu) = 0 \quad \forall \mu \in A(G, z)$
In fact we can show that for trees, for all $\mu \in \mathcal{M}(G)$: $F(\mu) = \max\{0, I(\mu, z)\}$
Proof - key ideas
Start from $F(\mu) = \min_p p(z)$ s.t. $p_{ij}(x_i, x_j) = \mu_{ij}(x_i, x_j)$, $p_i(x_i) = \mu_i(x_i)$, $p(x) \ge 0$
Require nonnegativity only away from z: $\min_p p(z)$ s.t. $p_{ij}(x_i, x_j) = \mu_{ij}(x_i, x_j)$, $p_i(x_i) = \mu_i(x_i)$, $p(x) \ge 0 \ \ \forall x \ne z$
Dual: $\max_\lambda \lambda \cdot \mu$ s.t. $\sum_{ij} \lambda_{ij}(x_i, x_j) + \sum_i \lambda_i(x_i) \le 0 \ \ \forall x \ne z$, and $\sum_{ij} \lambda_{ij}(z_i, z_j) + \sum_i \lambda_i(z_i) = 1$
We show that the value of the above is $I(\mu, z)$
From there it's easy to conclude that $F(\mu) = \max\{0, I(\mu, z)\}$
Proof - Max marginals
$\max_\lambda \lambda \cdot \mu$ s.t. $\lambda(x) \le 0 \ \ \forall x \ne z$, $\lambda(z) = 1$, where $\lambda(x) = \sum_{ij} \lambda_{ij}(x_i, x_j) + \sum_i \lambda_i(x_i)$
Use max-marginals:
$\bar\lambda_i(x_i) = \max_{\hat{x} : \hat{x}_i = x_i} \lambda(\hat{x})$
$\bar\lambda_{ij}(x_i, x_j) = \max_{\hat{x} : \hat{x}_i = x_i,\, \hat{x}_j = x_j} \lambda(\hat{x})$
The constraints give $\bar\lambda_i(z_i) = 1$ and $\bar\lambda_i(x_i) \le 0$ for $x_i \ne z_i$
Rewrite: $\lambda(x) = \sum_i (1 - d_i)\, \bar\lambda_i(x_i) + \sum_{ij \in T} \bar\lambda_{ij}(x_i, x_j)$
Result follows after some algebra
Tree Graph - Summary
$\mathcal{M}(G, x^{(1)}) = \{\mu \mid \mu \in \mathcal{M}(G),\ I(\mu, x^{(1)}) \le 0\}$
The LP for 2nd best differs from the marginal polytope by one linear inequality constraint
The 2nd best satisfies $I(\mu, x^{(1)}) = 0$, so it cannot be just any assignment
[Figure: the vertex $x^{(1)}$ with several candidate positions for $x^{(2)}$, one of them crossed out]
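The restriction on which assignments can be second best can be made explicit with a short counting argument. The derivation below is a supplementary sketch rather than part of the slides; the sets $A$ and $D$ and the counts $c(\cdot)$, $e(\cdot)$ are notation introduced here. For the integral vertex $\mu^x$ of an assignment $x$ on a tree, let $A$ be the nodes where $x$ agrees with $z$, $D$ the nodes where it differs, $c(\cdot)$ the number of connected components of an induced subgraph, $e(A)$ the number of edges inside $A$, and $e(A, D)$ the number of edges between $A$ and $D$.

```latex
\begin{align*}
I(\mu^x, z)
  &= \sum_{i \in A} (1 - d_i) + e(A)
   = |A| - \sum_{i \in A} d_i + e(A) \\
  &= |A| - \bigl(2\,e(A) + e(A, D)\bigr) + e(A)
   = \bigl(|A| - e(A)\bigr) - e(A, D) \\
  &= c(A) - e(A, D)
   && \text{(the subgraph induced by $A$ is a forest)} \\
  &= c(A) - \bigl(c(A) + c(D) - 1\bigr)
   && \text{(contracting components leaves a tree)} \\
  &= 1 - c(D).
\end{align*}
```

So $I(\mu^x, z) = 1$ only for $x = z$, it equals $0$ exactly when the nodes on which $x$ differs from $z$ form one connected subtree, and it is negative otherwise. Under this reading, an integral solution with $I(\mu, x^{(1)}) = 0$ differs from $x^{(1)}$ on a connected set of variables.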
Non tree graphs
Any graph can be converted into a junction tree
We can apply our tree result there
For a junction tree with cliques $\mathcal{C}$ and separators $\mathcal{S}$, the inequality is:
$\sum_{S \in \mathcal{S}} (1 - d_S)\, \mu_S(z_S) + \sum_{C \in \mathcal{C}} \mu_C(z_C) \le 0$
Specifying the marginal polytope requires a number of variables exponential in the tree width. Not practical.
Non trees - Approximations
[Figure: the true $\mathcal{M}(G, x^{(1)})$ and an outer bound on $\mathcal{M}(G)$, with the vertex $x^{(1)}$ marked]
Spanning tree inequalities
Given a spanning subtree T of G, define:
$I_T(\mu, z) = \sum_i (1 - d_i)\, \mu_i(z_i) + \sum_{ij \in T} \mu_{ij}(z_i, z_j)$, where $d_i$ is the degree of node $i$ in T
And the constraint: $I_T(\mu, z) \le 0$
Separates z from the other vertices but might result in fractional vertices
[Figure: the constraint $I_T(\mu, z) \le 0$ cuts off $z$ and creates a fractional vertex]
Adding all spanning trees
Can we add all spanning tree inequalities efficiently?
Yes, via a cutting plane approach:
Start with one inequality
Solve LP
If solution is fractional, find a violated tree inequality (if one exists) and add it
Cutting Plane Algorithm
[Figure: vertex $z$, a first tree inequality $T_1$, the resulting LP solution $\mu_1$, and a second inequality $T_2$ that cuts off $\mu_1$]
Is there a tree inequality that $\mu_1$ violates?
How do we find a violated tree inequality?
Note: Even all spanning tree inequalities might not suffice
Finding a violated spanning tree
For a given $\mu$, find $\max_T I_T(\mu, z)$
If it's positive, add the maximizing tree
How can we maximize over all trees? Note that:
$I_T(\mu, z) = \sum_{ij \in T} \bigl[\mu_{ij}(z_i, z_j) - \mu_i(z_i) - \mu_j(z_j)\bigr] + \sum_i \mu_i(z_i)$
The bracketed term is an edge weight $w_{ij}$; the last sum is fixed (it does not depend on T)
Decomposes into edge scores. Maximizing tree can be found using a maximum-weight-spanning-tree algorithm (e.g., Wainwright 02)
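The separation step can be sketched with Kruskal's algorithm over the edge weights $w_{ij}$. The slides name only a generic maximum-weight spanning tree algorithm, so the choice of Kruskal, the function name, and the fractional triangle example below are my own assumptions.

```python
def most_violated_tree(n, edges, mu_node, mu_edge, z):
    """Maximize I_T(mu, z) over spanning trees T via Kruskal's algorithm.

    Edge weight: w_ij = mu_ij(z_i, z_j) - mu_i(z_i) - mu_j(z_j).
    Returns (tree edges, I_T value); the inequality is violated if value > 0.
    """
    weights = {(i, j): mu_edge[(i, j)][z[i]][z[j]]
                       - mu_node[i][z[i]] - mu_node[j][z[j]]
               for (i, j) in edges}
    parent = list(range(n))

    def find(u):
        # Union-find root lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    tree = []
    for (i, j) in sorted(edges, key=weights.get, reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:            # edge joins two components: keep it
            parent[ri] = rj
            tree.append((i, j))
    fixed = sum(mu_node[i][z[i]] for i in range(n))  # does not depend on T
    return tree, sum(weights[e] for e in tree) + fixed

# Hypothetical fractional point on a triangle: edges (0,1) and (1,2) agree,
# edge (0,2) disagrees; all node marginals are uniform.
edges = [(0, 1), (1, 2), (0, 2)]
mu_node = [[0.5, 0.5]] * 3
agree, differ = [[0.5, 0.0], [0.0, 0.5]], [[0.0, 0.5], [0.5, 0.0]]
mu_edge = {(0, 1): agree, (1, 2): agree, (0, 2): differ}

tree, value = most_violated_tree(3, edges, mu_node, mu_edge, (0, 0, 0))
print(tree, value)  # [(0, 1), (1, 2)] 0.5
```

In this example the tree using the two agreeing edges gives $I_T = 0.5 > 0$ and would be added as a cut, while a tree containing the edge (0, 2) gives $I_T = 0$ and is not violated.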
Experiments
Alternative algorithms for approximate 2nd best:
Using approximate marginals from max-product (BMMF; Yanover and Weiss 04)
Lawler/Nilsson (72, 80): partition the assignments $x \ne x^{(1)}$ into n parts (* denotes a free variable):
$x_1 \ne x^{(1)}_1,\ x_2 = *,\ x_3 = *,\ \ldots,\ x_n = *$
$x_1 = x^{(1)}_1,\ x_2 \ne x^{(1)}_2,\ x_3 = *,\ \ldots,\ x_n = *$
...
$x_1 = x^{(1)}_1,\ x_2 = x^{(1)}_2,\ x_3 = x^{(1)}_3,\ \ldots,\ x_n \ne x^{(1)}_n$
Maximize over each part approximately. Cost O(n) (one maximization per part)
Our algorithm: STRIPES
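The partition can be written down and checked directly. In this sketch the dictionary layout and function names are hypothetical; the check confirms that every assignment other than $x^{(1)}$ falls into exactly one part, and that $x^{(1)}$ falls into none.

```python
from itertools import product

def partition_parts(x1):
    """Part t fixes x_s = x1[s] for s < t, forbids x_t = x1[t], frees the rest."""
    return [{"fixed": dict(enumerate(x1[:t])), "forbidden": (t, x1[t])}
            for t in range(len(x1))]

def in_part(x, part):
    """True if assignment x satisfies the constraints of the given part."""
    t, value = part["forbidden"]
    return x[t] != value and all(x[s] == v for s, v in part["fixed"].items())

x1 = (2, 0, 1)                     # hypothetical best assignment, 3 states each
parts = partition_parts(x1)
membership = {x: sum(in_part(x, p) for p in parts)
              for x in product(range(3), repeat=len(x1))}
print(len(parts), membership[x1], membership[(2, 0, 0)])  # 3 0 1
```

The second best is then the best of the per-part maximizers, which is where the n constrained maximizations come from.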
Attractive Grids
Ising models with ferromagnetic interaction
The local polytope is guaranteed to yield the exact first best (but is not equal to the marginal polytope)
Goal: Find 50 best. STRIPES and Nilsson find all of them exactly. Up to 19 spanning trees added
[Figure: bar charts of rank and run time for STRIPES, Nilsson, and BMMF]
Protein Side Chain Prediction
Given protein's 3D shape (backbone), choose most probable side chain configuration
[Figure: protein backbone and side-chains modeled as a graph G = (V, E) over variables $x_i, x_j, x_k, x_h$ (MRFs from Yanover, Meltzer, Weiss '06)]
Can be cast as a MAP problem: $p(x) \propto e^{\sum_{ij \in E} \theta_{ij}(x_i, x_j)}$
Important to obtain multiple possible solutions
Protein Side Chain Prediction
STRIPES found the exact solutions for all problems studied
In some cases, we used a tighter approximation of the marginal polytope (Sontag et al., UAI 08)
[Figure: bar charts comparing STRIPES, Nilsson, and BMMF]
Open Questions
When are spanning trees enough?
What is the polytope structure for k-best?
Finding k-best "different" solutions
Scalable algorithms
If a given problem is solved with a marginal polytope relaxation, what can we say about the second best?
Summary
The 2nd best can be posed as a linear program
For trees, this LP differs from the 1st-best LP by only one constraint
For non-trees, an approximation can be devised by adding inequalities for all spanning trees
Empirically effective