Algorithms for MAP estimation in Markov Random Fields
Vladimir Kolmogorov
University College London
Energy function

  E(x | θ) = θ_const + Σ_p θ_p(x_p) + Σ_(p,q) θ_pq(x_p, x_q)

              unary terms (data)    pairwise terms (coherence)

- x_p are discrete variables (for example, x_p ∈ {0,1})
- θ_p(•) are unary potentials
- θ_pq(•,•) are pairwise potentials
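For concreteness, the energy above can be evaluated and minimised by brute force on a tiny instance. A minimal Python sketch on a 3-node binary chain; all potential values are illustrative, not from the slides:

```python
from itertools import product

# Evaluate E(x | theta) = theta_const + sum_p theta_p(x_p) + sum_(p,q) theta_pq(x_p, x_q)
# on a toy 3-node binary chain (all numbers are made up for illustration).
theta_const = 0.0
theta_p = {0: [0.0, 4.0], 1: [1.0, 3.0], 2: [2.0, 5.0]}   # unary: node -> [cost(0), cost(1)]
theta_pq = {(0, 1): [[0.0, 2.0], [2.0, 0.0]],             # pairwise: edge -> 2x2 cost table
            (1, 2): [[0.0, 1.0], [1.0, 0.0]]}

def energy(x):
    e = theta_const
    e += sum(theta_p[p][x[p]] for p in theta_p)
    e += sum(theta_pq[(p, q)][x[p]][x[q]] for (p, q) in theta_pq)
    return e

# MAP by exhaustive search over the 2^3 labellings (feasible only for toy sizes)
x_best = min(product([0, 1], repeat=3), key=energy)
```

Exhaustive search is exponential in the number of nodes, which is exactly why the algorithms in these slides exist.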
Minimisation algorithms

• Min Cut / Max Flow [Ford & Fulkerson '56]
  [Greig, Porteous, Seheult '89]: non-iterative (binary variables)
  [Boykov, Veksler, Zabih '99]: iterative - alpha-expansion, alpha-beta swap, … (multi-valued variables)
  + If applicable, gives very accurate results
  – Can be applied only to a restricted class of functions

• BP - Max-product Belief Propagation [Pearl '86]
  + Can be applied to any energy function
  – In vision, results are usually worse than those of graph cuts
  – Does not always converge

• TRW - Max-product Tree-reweighted Message Passing [Wainwright, Jaakkola, Willsky '02], [Kolmogorov '05]
  + Can be applied to any energy function
  + For stereo, finds lower energy than graph cuts
  + Convergence guarantees for the algorithm in [Kolmogorov '05]
Main idea: LP relaxation

• Goal: minimise energy E(x) under constraints x_p ∈ {0,1}
• In general, an NP-hard problem!
• Relax discreteness constraints: allow x_p ∈ [0,1]
• Results in a linear program, which can be solved in polynomial time!
[Figure: energy axis comparing the minimum of the energy function with discrete variables to the minimum of its LP relaxation; the relaxation is "tight" when the two minima coincide and "not tight" otherwise.]
Solving the LP relaxation

• Too large for general-purpose LP solvers (e.g. interior point methods)
• Solve the dual problem instead of the primal:
  – Formulate a lower bound on the energy
  – Maximise this bound
  – When done, this solves the primal problem (the LP relaxation)
• Two different ways to formulate the lower bound:
  – Via posiforms: leads to the maxflow algorithm
  – Via convex combination of trees: leads to tree-reweighted message passing
[Figure: the lower bound on the energy function sits below the minimum of the LP relaxation, which in turn sits below the minimum of the energy with discrete variables.]
Notation and Preliminaries
Energy function - visualisation

[Figure: two nodes p and q joined by edge (p,q); each node carries two labels, 0 and 1. Unary costs such as θ_p(0) are attached to the labels, pairwise costs such as θ_pq(0,1) to pairs of labels across the edge, and a constant term θ_const completes the energy

  E(x | θ) = θ_const + Σ_p θ_p(x_p) + Σ_(p,q) θ_pq(x_p, x_q) ]
Energy function - visualisation

[Figure: the same two-node example; all unary costs, pairwise costs and the constant term are collected into θ, the vector of all parameters of the energy

  E(x | θ) = θ_const + Σ_p θ_p(x_p) + Σ_(p,q) θ_pq(x_p, x_q) ]
Reparameterisation

[Figure: subtracting 1 from some pairwise costs of an edge and adding 1 to a unary cost of an endpoint changes the individual parameters but leaves every energy value unchanged.]

• Definition. θ' is a reparameterisation of θ if they define the same energy:

  E(x | θ') = E(x | θ) for any x

• Maxflow, BP and TRW all perform reparameterisations
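The definition is easy to check numerically. A sketch (with toy numbers of my own choosing) that moves a constant from one row of a pairwise table into a unary term and verifies that every energy value is preserved:

```python
from itertools import product

# Toy 2-node binary energy (illustrative numbers, not from the slides).
theta_p  = [[0.0, 4.0], [1.0, 3.0]]     # unaries for nodes 0 and 1
theta_pq = [[0.0, 2.0], [5.0, 0.0]]     # pairwise table, rows indexed by x0

def energy(unary, pairwise, x):
    return unary[0][x[0]] + unary[1][x[1]] + pairwise[x[0]][x[1]]

# Reparameterise: subtract delta from row x0 = 1 of theta_pq
# and add it to theta_p[0](1); the energy must not change.
delta = 2.0
theta_p2  = [[0.0, 4.0 + delta], [1.0, 3.0]]
theta_pq2 = [[0.0, 2.0], [5.0 - delta, 0.0 - delta]]

same = all(energy(theta_p, theta_pq, x) == energy(theta_p2, theta_pq2, x)
           for x in product([0, 1], repeat=2))
```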
Part I: Lower bound via posiforms
(leads to the maxflow algorithm)
Lower bound via posiforms [Hammer, Hansen, Simeone '84]

Write the energy so that every term except the constant is non-negative:

  E(x | θ) = θ_const + Σ_p θ_p(x_p) + Σ_(p,q) θ_pq(x_p, x_q),   θ_p(•) ≥ 0, θ_pq(•,•) ≥ 0

Then θ_const is a lower bound on the energy:

  E(x | θ) ≥ θ_const for all x

Goal: maximise θ_const over reparameterisations.
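The bound is immediate in code: once every non-constant term is non-negative, plugging in any labelling can only add to θ_const. A sketch with invented numbers:

```python
from itertools import product

# Posiform-style energy: a constant plus non-negative terms (toy values).
theta_const = 2.0
theta_p  = [[0.0, 1.0], [0.5, 0.0]]     # all entries >= 0
theta_pq = [[0.0, 3.0], [1.0, 0.0]]     # all entries >= 0

def energy(x):
    return theta_const + theta_p[0][x[0]] + theta_p[1][x[1]] + theta_pq[x[0]][x[1]]

# Since every non-constant term is >= 0, theta_const bounds the energy below.
e_min = min(energy(x) for x in product([0, 1], repeat=2))
```

Maximising θ_const over reparameterisations tightens this bound, which is what the maxflow reduction below achieves.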
Outline of part I

• Maximisation algorithm?
  – Consider functions of binary variables only
• Maximising the lower bound for submodular functions
  – Definition of submodular functions
  – Overview of min cut/max flow
  – Reduction to max flow
  – Global minimum of the energy
• Maximising the lower bound for non-submodular functions
  – Reduction to max flow
  – More complicated graph
  – Part of optimal solution
Submodular functions of binary variables

• Definition: E is submodular if every pairwise term satisfies

  θ_pq(0,0) + θ_pq(1,1) ≤ θ_pq(0,1) + θ_pq(1,0)

• Can be converted to "canonical form":

[Figure: example energy in canonical form; at each node one label has zero cost, and each pairwise table has zero cost everywhere except one label pair.]
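The submodularity condition is cheap to test per edge. A small sketch (the example tables are my own, not from the slides):

```python
def is_submodular(pairwise_tables):
    """Each table t is 2x2 with t[xp][xq]. The energy is submodular iff
    t[0][0] + t[1][1] <= t[0][1] + t[1][0] holds for every edge."""
    return all(t[0][0] + t[1][1] <= t[0][1] + t[1][0] for t in pairwise_tables)

potts = [[0, 1], [1, 0]]   # Potts-style term: submodular (0 + 0 <= 1 + 1)
anti  = [[1, 0], [0, 1]]   # prefers unequal labels: not submodular (1 + 1 > 0 + 0)
```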
Overview of min cut/max flow
Min Cut problem

[Figure: a directed weighted graph with a source, a sink, and intermediate nodes 1, 2, 3.]

• Cut: a partition of the nodes into S containing the source and T containing the sink, e.g. S = {source, node 1}, T = {sink, node 2, node 3}
• Cost of a cut: the sum of the weights of edges going from S to T, e.g. Cost(S,T) = 1 + 1 = 2
• Task: compute the cut with minimum cost
Maxflow algorithm

[Figure: animation of augmenting-path maxflow on the example graph. Flow is pushed along source-to-sink paths and residual capacities are updated, increasing value(flow) from 0 to 1 to 2; at value(flow) = 2 no augmenting path remains, matching the min cut cost of 2.]
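The augmenting-path procedure animated above can be written compactly. A minimal Edmonds-Karp sketch (BFS shortest augmenting paths) on an invented example graph, not the one in the figure:

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp: repeatedly push flow along a shortest augmenting path.
    `capacity` maps u -> {v: residual capacity}; it is modified in place."""
    for u in list(capacity):                      # make sure reverse edges exist
        for v in list(capacity[u]):
            capacity.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        parent = {source: None}                   # BFS for an augmenting path
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in capacity[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow                           # no augmenting path: flow is maximal
        path, v = [], sink                        # recover the path source -> sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(capacity[u][v] for u, v in path)
        for u, v in path:                         # update residual capacities
            capacity[u][v] -= bottleneck
            capacity[v][u] += bottleneck
        flow += bottleneck

# Toy graph with made-up capacities (not the graph from the slides)
g = {'s': {'a': 3, 'b': 2}, 'a': {'t': 2, 'b': 1}, 'b': {'t': 3}}
value = max_flow(g, 's', 't')
```

By the max-flow/min-cut theorem the returned value also equals the cost of a minimum cut, which is the quantity the energy reduction needs.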
Maximising lower bound for submodular functions: reduction to maxflow

[Figure: the canonical-form energy is encoded as a directed graph: unary terms become edges from the source or to the sink, pairwise terms become edges between the nodes; initially value(flow) = 0 and θ_const = 0.]
Maxflow algorithm and reparameterisation

[Figure: animation pairing each maxflow augmentation with the corresponding reparameterisation of the energy. Pushing flow along a path subtracts its bottleneck capacity from the terms on the path and adds it to the constant, so θ_const grows with value(flow): 0 → 1 → 2. On termination all remaining terms are non-negative, and the minimum cut gives the labelling x = (0,1,1), whose energy equals the bound: minimum of the energy = θ_const = 2.]
Maximising lower bound for non-submodular functions

Arbitrary functions of binary variables

• Can be solved via maxflow [Boros, Hammer, Sun '91]
  – Specially constructed graph
• Gives the solution to the LP relaxation: for each node, x_p ∈ {0, 1/2, 1}

[Figure: the maximised posiform lower bound coincides with the optimum of the LP relaxation.]
Part of optimal solution [Hammer, Hansen, Simeone '84]

[Figure: in the half-integral LP solution, nodes labelled 0 or 1 keep those labels in some globally optimal discrete solution; the remaining nodes are labelled 1/2 and stay undecided.]
Part II: Lower bound via convex combination of trees
(leads to tree-reweighted message passing)
Convex combination of trees [Wainwright, Jaakkola, Willsky '02]

• Goal: compute the minimum of the energy for θ:  Φ(θ) = min_x E(x | θ)
• In general, intractable!
• Obtaining a lower bound:
  – Split θ into several components: θ = Σ_i θ^i
  – Compute the minimum for each component: Φ(θ^i) = min_x E(x | θ^i)
  – Combine to get a bound on Φ(θ): Φ(θ) ≥ Σ_i Φ(θ^i)
• Use trees!
Convex combination of trees (cont'd)

[Figure: the graph's parameter vector is written as a convex combination of two spanning trees T and T': θ = (1/2) θ^T + (1/2) θ^T'.]

  Φ(θ) ≥ (1/2) Φ(θ^T) + (1/2) Φ(θ^T')   - lower bound on the energy; maximise it
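The bound Φ(θ) ≥ Σ_T ρ^T Φ(θ^T) is just concavity of the min. A numerical sketch on a frustrated 3-cycle split into two spanning chains; all tables are invented toy values:

```python
from itertools import product

# Toy binary energy on a 3-cycle (numbers invented): two "attractive"
# edges and one "repulsive" edge, so the cycle is frustrated and min E = 1.
attract = [[0, 1], [1, 0]]
repulse = [[1, 0], [0, 1]]
edges = {(0, 1): attract, (1, 2): attract, (0, 2): repulse}

def energy(edge_tables, x):
    return sum(t[x[p]][x[q]] for (p, q), t in edge_tables.items())

def Phi(edge_tables):                     # Phi(theta) = min_x E(x | theta)
    return min(energy(edge_tables, x) for x in product([0, 1], repeat=3))

double = lambda t: [[2 * v for v in row] for row in t]
# Two spanning chains T1, T2 with theta = 0.5 * theta_T1 + 0.5 * theta_T2:
# edge (1,2) is shared, edges (0,1) and (0,2) get double weight in their tree.
tree1 = {(0, 1): double(attract), (1, 2): attract}
tree2 = {(1, 2): attract, (0, 2): double(repulse)}

bound = 0.5 * Phi(tree1) + 0.5 * Phi(tree2)   # lower bound on Phi(edges)
```

On this example each chain can be minimised to 0, so the bound is 0 while Φ(θ) = 1: the tree bound is valid but not tight on a frustrated cycle.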
TRW algorithms

• Goal: find the reparameterisation maximising the lower bound
• Apply a sequence of different reparameterisation operations:
  – Node averaging
  – Ordinary BP on trees
• Order of operations?
  – Affects performance dramatically
• Algorithms:
  – [Wainwright et al. '02]: parallel schedule
    • May not converge
  – [Kolmogorov '05]: specific sequential schedule
    • Lower bound does not decrease; convergence guarantees
Node averaging

[Figure: node p appears in two trees with unary vectors θ^T_p = (0, 1) and θ^T'_p = (4, 0); averaging replaces both with ((0+4)/2, (1+0)/2) = (2, 0.5). The sum θ^T_p + θ^T'_p, and hence the energy, is unchanged.]
Belief propagation (BP) on trees

• Send messages
  – Equivalent to reparameterising node and edge parameters
• Two passes (forward and backward)
• Key property (Wainwright et al.): upon termination θ_p gives min-marginals for node p:

  θ_p(0) = min_{x: x_p = 0} E(x | θ) + const
  θ_p(1) = min_{x: x_p = 1} E(x | θ) + const
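The min-marginal property can be verified on a small chain. A forward/backward min-sum pass in Python, with the beliefs checked against brute-force min-marginals (all numbers are toy values of my own):

```python
from itertools import product

# Min-sum BP on a 3-node binary chain with edges (0,1) and (1,2).
unary = [[0.0, 2.0], [1.0, 0.0], [3.0, 0.0]]
pair  = [[0.0, 1.0], [1.0, 0.0]]          # the same Potts table on both edges

def message(src_costs, table):
    """Min-sum message: m(j) = min_i src_costs[i] + table[i][j]."""
    return [min(src_costs[i] + table[i][j] for i in (0, 1)) for j in (0, 1)]

# Forward pass (node 0 -> 1 -> 2), then backward pass (node 2 -> 1 -> 0).
m_01 = message(unary[0], pair)
m_12 = message([unary[1][j] + m_01[j] for j in (0, 1)], pair)
m_21 = message(unary[2], pair)
m_10 = message([unary[1][j] + m_21[j] for j in (0, 1)], pair)

# Beliefs (unary + incoming messages) equal min-marginals on a tree.
belief = [
    [unary[0][j] + m_10[j] for j in (0, 1)],
    [unary[1][j] + m_01[j] + m_21[j] for j in (0, 1)],
    [unary[2][j] + m_12[j] for j in (0, 1)],
]

def min_marginal(p, label):               # brute force, for checking
    return min(sum(unary[q][x[q]] for q in range(3))
               + pair[x[0]][x[1]] + pair[x[1]][x[2]]
               for x in product([0, 1], repeat=3) if x[p] == label)
```

In TRW the same passes are interpreted as reparameterisations, so the beliefs become the new unary vectors θ_p.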
TRW algorithm of Wainwright et al. with tree-based updates (TRW-T)

Repeat:
  1. Run BP on all trees
  2. "Average" all nodes

• If it converges, gives a (local) maximum of the lower bound
• Not guaranteed to converge
• Lower bound may go down
Sequential TRW algorithm (TRW-S) [Kolmogorov '05]

Repeat:
  1. Pick node p
  2. Run BP on all trees containing p
  3. "Average" node p
Main property of TRW-S

• Theorem: the lower bound never decreases.
• Proof sketch: consider averaging node p with unary vectors θ^T_p = (0, 1) and θ^T'_p = (4, 0) in the two trees. Because BP has just been run, these are min-marginals, so

    Φ(θ^T) = 0 + const,   Φ(θ^T') = 0 + const'

  After averaging, θ^T_p = θ^T'_p = (2, 0.5), hence

    Φ(θ^T) ≥ 0.5 + const,   Φ(θ^T') ≥ 0.5 + const'

  and the combined lower bound cannot decrease.
TRW-S algorithm

• Particular order of averaging and BP operations
• Lower bound guaranteed not to decrease
• There exists a limit point that satisfies the weak tree agreement condition
• Efficiency? Rerunning BP on all trees containing p before averaging each node p looks inefficient.
Efficient implementation

• Key observation: the node averaging operation preserves messages oriented towards that node
• Reuse previously passed messages!
• Need a special choice of trees:
  – Pick an ordering of the nodes
  – Trees: monotonic chains

[Figure: 3×3 grid with nodes ordered 1-9; the trees are chains that are monotonic with respect to this ordering. The animation highlights the node being processed and the messages that remain valid.]

• Algorithm:
  – Forward pass:
    • process nodes in increasing order
    • pass messages from lower-ordered neighbours
  – Backward pass:
    • do the same in reverse order
• Linear running time of one iteration
Memory requirements

• Additional advantage of TRW-S:
  – Needs only half as much memory as standard message passing!
  – A similar observation for bipartite graphs and a parallel schedule was made in [Felzenszwalb & Huttenlocher '04]

[Figure: message storage for standard message passing vs. TRW-S.]
Experimental results: binary segmentation ("GrabCut")

[Figure: energy vs. running time, averaged over 50 instances.]

Experimental results: stereo

[Figure: left image and ground truth; plots of energy vs. iteration for BP and TRW-S on several stereo pairs.]
Summary

• MAP estimation algorithms are based on LP relaxation
  – Maximise a lower bound
• Two ways to formulate the lower bound:
  – Via posiforms: leads to the maxflow algorithm
    • Polynomial-time solution
    • But applicable only to restricted energies (e.g. binary variables)
    • Submodular functions: global minimum; non-submodular functions: part of an optimal solution
  – Via convex combination of trees: leads to the TRW algorithm
    • Convergence in the limit (for TRW-S)
    • Applicable to arbitrary energy functions
• Graph cuts vs. TRW:
  – Accuracy: similar
  – Generality: TRW is more general
  – Speed: for stereo, TRW is currently 2-5 times slower. But:
    • 3 vs. 50 years of research!
    • More suitable for parallel implementation (GPU? Hardware?)
Discrete vs. continuous functionals

Discrete formulation (graph cuts):

  E(x) = Σ_p E_p(x_p) + Σ_(p,q) E_pq(x_p, x_q)

• Maxflow algorithm
  – Global minimum, polynomial time
• Metrication artefacts?

Continuous formulation (geodesic active contours):

  E(C) = ∫₀^|C| g(C(s)) ds

• Level sets
  – Numerical stability?
• Geometrically motivated
  – Invariant under rotation
Geo-cuts

• Continuous functional with boundary and regional terms:

  E(C) = ∫₀^|C| g(C(s)) ds + ∫_interior(C) f dV

• Construct a graph such that, for smooth contours C,

  E(C) = cost of the corresponding cut

• Class of continuous functionals? [Boykov & Kolmogorov '03], [Kolmogorov & Boykov '05]:
  – Geometric length/area (e.g. Riemannian)
  – Flux of a given vector field
  – Regional term
TRW formulation

  maximise   Σ_T ρ^T Φ(θ^T)
  subject to Σ_T ρ^T θ^T = θ

where {θ^T} is the collection of all tree parameter vectors and ρ^T is a fixed probability distribution on trees T.