Fast Parallel and Adaptive Updates for Dual-Decomposition Solvers
Transcript of Fast Parallel and Adaptive Updates for Dual-Decomposition Solvers
Ozgur Sumer (U. Chicago), Umut Acar (MPI-SWS), Alexander Ihler (UC Irvine), Ramgopal Mettu (UMass Amherst)
Graphical models
• Structured (negative) energy function
• Goal: find the minimum-energy (MAP) configuration
• Examples
  – Stereo depth
  – Protein design & prediction
  – Weighted constraint satisfaction problems
[Figures: Bayesian network, factor graph, and pairwise Markov random field over variables A, B, C; stereo image pair → MRF model → depth map]
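The energy and goal equations on these slides are images that did not survive transcription; the standard formulation they refer to (symbols here are the conventional ones, not copied from the slides) is:

```latex
% Structured (negative) energy as a sum of factors, with the pairwise
% special case, and the MAP goal of finding a minimum-energy configuration.
f(x) = \sum_{\alpha} f_{\alpha}(x_{\alpha}),
\qquad\text{pairwise: } f(x) = \sum_{i} f_i(x_i) + \sum_{(i,j)\in E} f_{ij}(x_i, x_j),
\qquad x^{*} = \arg\min_{x} f(x).
```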
Dual decomposition methods
• Decompose the graph into smaller subproblems
• Solve each independently; the sum of their minima is an optimistic bound
• Exact if all copies of the duplicated variables agree
• Enforce the lost equality constraints via Lagrange multipliers
• The same bound appears under different names:
  – Dual decomposition (Komodakis et al. 2007)
  – TRW, MPLP (Wainwright et al. 2005; Globerson & Jaakkola 2007)
  – Soft arc consistency (Cooper & Schiex 2004)
[Figure: original model vs. decomposition; the relaxed problems lower-bound the MAP energy, with equality at consistent solutions]
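As a concrete illustration of the bound, here is a minimal Python sketch (the toy factors `f12`, `f23` and the multiplier `delta` are invented for this example, not taken from the talk): splitting a three-variable chain at a shared variable gives an optimistic bound that is tight when the two copies agree.

```python
import itertools

# Toy pairwise MRF on x1, x2, x3 in {0, 1}; two factors share x2.
f12 = {(a, b): abs(a - b) + a for a in (0, 1) for b in (0, 1)}
f23 = {(b, c): 2 * abs(b - c) for b in (0, 1) for c in (0, 1)}

def true_min():
    # Exact MAP energy of the joint problem.
    return min(f12[a, b] + f23[b, c]
               for a, b, c in itertools.product((0, 1), repeat=3))

def dual_bound(delta):
    # Duplicate x2 into the two subproblems; delta is the Lagrange
    # multiplier enforcing agreement between the two copies.
    g1 = min(f12[a, b] + delta * b for a in (0, 1) for b in (0, 1))
    g2 = min(f23[b, c] - delta * b for b in (0, 1) for c in (0, 1))
    return g1 + g2

# The decomposition is an optimistic (lower) bound for any delta,
# and is tight here at delta = 0, where both copies of x2 agree.
assert all(dual_bound(d) <= true_min() for d in (-2.0, 0.0, 1.5))
```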
Optimizing the bound
Subgradient descent
• Find each subproblem's optimal configuration
• Adjust cost entries for mismatched solutions
[Figure: worked example; cost tables for each subproblem, with ±1 adjustments applied to the entries of disagreeing variable copies]
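The ±1 adjustments in the example correspond to one subgradient step on the shared variable's costs. A sketch under simplifying assumptions (the function names and two-subproblem setup are hypothetical, not the paper's implementation):

```python
def argmin_state(costs):
    # Index of the cheapest state in a cost table.
    return min(range(len(costs)), key=costs.__getitem__)

def subgradient_step(costs_a, costs_b, step=1.0):
    """One subgradient update for a variable shared by two subproblems.

    If the subproblems' optimal states disagree, raise the cost of each
    subproblem's own choice and lower the cost of the other's choice.
    The per-state sum of the two tables is unchanged, so the original
    problem is preserved while the bound is pushed toward agreement."""
    xa, xb = argmin_state(costs_a), argmin_state(costs_b)
    if xa != xb:
        costs_a[xa] += step
        costs_a[xb] -= step
        costs_b[xb] += step
        costs_b[xa] -= step
    return costs_a, costs_b
```

For example, tables `[0, 2]` and `[3, 1]` disagree (states 0 and 1); one step moves both toward indifference while each state's summed cost stays the same.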
Equivalent decompositions
• Any collection of tree-structured parts gives an equivalent bound
• Two extreme cases:
  – The set of all individual edges
  – A single "covering tree" of all edges, with variables duplicated
[Figure: original graph, "edges" decomposition, covering tree]
Speeding up inference
• Parallel updates
  – Easy to perform subproblems in parallel (e.g., Komodakis et al. 2007)
• Adaptive updates

Some complications…
• Example: Markov chain x1---x2---x3---x4---x5---x6---x7---x8---x9---x10
  – Can pass messages in parallel, but…
  – If xn depends on x1, it takes O(n) time anyway
  – Slow "convergence rate"
• Larger subproblems are more "efficient"
• Smaller subproblems are easily parallel & adaptive
• Similar effects arise in message passing
  – Residual splash (Gonzalez et al. 2009)
Cluster trees
• An alternative means of parallel computation
  – Previously applied to Bayes nets (Pennock 1998; Namasivayam et al. 2006)
• Simple chain model x1---x2---x3---x4---x5---x6---x7---x8---x9---x10
  – Normally, variables are eliminated "in order" (dynamic programming)
  – Each calculation depends on all previous results
• Instead, eliminate variables in an alternative order
  – First eliminate some intermediate (degree-2) nodes
  – Balanced: tree of depth log(n)
[Figure: balanced cluster tree over the chain x1…x10]
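One way to see the log(n) depth is to split the chain recursively at its midpoint. The sketch below is a simplified stand-in for the paper's cluster-tree construction, used only to measure the resulting depth:

```python
import math

def cluster_tree_depth(lo, hi):
    """Depth of a balanced cluster tree over chain variables x_lo..x_hi.

    Each node eliminates the middle variable of its interval, so the
    tree has depth O(log n), versus the O(n) chain of dependencies
    produced by in-order elimination."""
    if lo > hi:
        return 0
    mid = (lo + hi) // 2  # eliminate the middle variable at this node
    return 1 + max(cluster_tree_depth(lo, mid - 1),
                   cluster_tree_depth(mid + 1, hi))

# A 10-variable chain yields depth 4; a 1024-variable chain stays
# within ceil(log2(n + 1)) levels.
assert cluster_tree_depth(1, 10) == 4
assert cluster_tree_depth(1, 1024) <= math.ceil(math.log2(1025))
```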
Adapting to changes
• 1st pass: update O(log n) cluster functions
• 2nd pass: mark changed configurations, repeat
• Decoding: O(m log(n/m)), where n = sequence length and m = number of changes
[Figure: chain x1…x10 before and after a local change; only the affected clusters are recomputed]
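Under the same simplified midpoint splitting, the first pass touches only the clusters between a changed potential and the root. A hypothetical sketch (not the paper's implementation):

```python
def nodes_to_update(lo, hi, changed):
    """Cluster nodes whose functions go stale when variable `changed`
    (an index in lo..hi) has a modified potential.

    Walking from the root of the balanced tree toward the changed leaf
    visits one node per level, so at most O(log n) clusters are marked
    for recomputation in the first pass."""
    path = []
    while lo <= hi:
        mid = (lo + hi) // 2
        path.append(mid)          # this cluster's function is stale
        if changed == mid:
            break
        lo, hi = (lo, mid - 1) if changed < mid else (mid + 1, hi)
    return path
```

On a 10-variable chain, changing x3's potential marks only the clusters at x5, x2, and x3.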
Experiments
• Random synthetic problems
  – Random, irregular but "grid-like" connectivity
• Stereo depth images
  – Superpixel representation; irregular graphs
• Compare the "edges" and "cover-tree" decompositions
• 32-core Intel Xeon; Cilk++ implementation
Synthetic problems
• Larger subproblems improve the convergence rate
• Adaptivity helps significantly
• Cluster trees: some overhead, but good parallelism
[Plots: convergence and speedup on synthetic models, as a function of problem size]
Stereo depth
• Time to convergence for different problems
Conclusions
• Fast methods for dual decomposition
  – Parallel computation
  – Adaptive updating
• Subproblem choice
  – Small subproblems: highly parallel, easily adaptive
  – Large subproblems: better convergence rates
• Cluster trees
  – An alternative form for parallel & adaptive updates
  – Benefits of both large & small subproblems