Fast Parallel and Adaptive Updates for Dual-Decomposition Solvers
Transcript of Fast Parallel and Adaptive Updates for Dual-Decomposition Solvers
Ozgur Sumer (U. Chicago), Umut Acar (MPI-SWS), Alexander Ihler (UC Irvine), Ramgopal Mettu (UMass Amherst)
Graphical models
• Structured (negative) energy function
• Goal: find the minimum-energy (MAP) configuration
• Examples
  – Stereo depth
  – Protein design & prediction
  – Weighted constraint satisfaction problems
[Figures: Bayesian network, factor graph, and pairwise Markov random field over variables A, B, C; stereo image pair → MRF model → depth map]
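The energy and goal equations on these slides are images that did not survive transcription; the standard formulation they refer to (symbols here are the conventional ones, not copied from the slides) is:

```latex
% Structured (negative) energy as a sum of factors, with the pairwise
% special case, and the MAP goal of finding a minimum-energy configuration.
f(x) = \sum_{\alpha} f_{\alpha}(x_{\alpha}),
\qquad\text{pairwise: } f(x) = \sum_{i} f_i(x_i) + \sum_{(i,j)\in E} f_{ij}(x_i, x_j),
\qquad x^{*} = \arg\min_{x} f(x).
```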
Dual decomposition methods
• Decompose the graph into smaller subproblems
• Solve each independently; the sum of their minima is an optimistic bound
• Exact if all copies of the duplicated variables agree
• Enforce the lost equality constraints via Lagrange multipliers
• The same bound appears under different names:
  – Dual decomposition (Komodakis et al. 2007)
  – TRW, MPLP (Wainwright et al. 2005; Globerson & Jaakkola 2007)
  – Soft arc consistency (Cooper & Schiex 2004)
[Figure: original model vs. decomposition; the relaxed problems lower-bound the MAP energy, with equality at consistent solutions]
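As a concrete illustration of the bound, here is a minimal Python sketch (the toy factors `f12`, `f23` and the multiplier `delta` are invented for this example, not taken from the talk): splitting a three-variable chain at a shared variable gives an optimistic bound that is tight when the two copies agree.

```python
import itertools

# Toy pairwise MRF on x1, x2, x3 in {0, 1}; two factors share x2.
f12 = {(a, b): abs(a - b) + a for a in (0, 1) for b in (0, 1)}
f23 = {(b, c): 2 * abs(b - c) for b in (0, 1) for c in (0, 1)}

def true_min():
    # Exact MAP energy of the joint problem.
    return min(f12[a, b] + f23[b, c]
               for a, b, c in itertools.product((0, 1), repeat=3))

def dual_bound(delta):
    # Duplicate x2 into the two subproblems; delta is the Lagrange
    # multiplier enforcing agreement between the two copies.
    g1 = min(f12[a, b] + delta * b for a in (0, 1) for b in (0, 1))
    g2 = min(f23[b, c] - delta * b for b in (0, 1) for c in (0, 1))
    return g1 + g2

# The decomposition is an optimistic (lower) bound for any delta,
# and is tight here at delta = 0, where both copies of x2 agree.
assert all(dual_bound(d) <= true_min() for d in (-2.0, 0.0, 1.5))
```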
Optimizing the bound
Subgradient descent
• Find each subproblem's optimal configuration
• Adjust cost entries for mismatched solutions
[Figure: worked example; cost tables for each subproblem, with ±1 adjustments applied to the entries of disagreeing variable copies]
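The ±1 adjustments in the example correspond to one subgradient step on the shared variable's costs. A sketch under simplifying assumptions (the function names and two-subproblem setup are hypothetical, not the paper's implementation):

```python
def argmin_state(costs):
    # Index of the cheapest state in a cost table.
    return min(range(len(costs)), key=costs.__getitem__)

def subgradient_step(costs_a, costs_b, step=1.0):
    """One subgradient update for a variable shared by two subproblems.

    If the subproblems' optimal states disagree, raise the cost of each
    subproblem's own choice and lower the cost of the other's choice.
    The per-state sum of the two tables is unchanged, so the original
    problem is preserved while the bound is pushed toward agreement."""
    xa, xb = argmin_state(costs_a), argmin_state(costs_b)
    if xa != xb:
        costs_a[xa] += step
        costs_a[xb] -= step
        costs_b[xb] += step
        costs_b[xa] -= step
    return costs_a, costs_b
```

For example, tables `[0, 2]` and `[3, 1]` disagree (states 0 and 1); one step moves both toward indifference while each state's summed cost stays the same.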
Equivalent decompositions
• Any collection of tree-structured parts gives an equivalent bound
• Two extreme cases:
  – The set of all individual edges
  – A single "covering tree" of all edges, with variables duplicated
[Figure: original graph, "edges" decomposition, covering tree]
Speeding up inference
• Parallel updates
  – Easy to perform subproblems in parallel (e.g., Komodakis et al. 2007)
• Adaptive updates

Some complications…
• Example: Markov chain x1---x2---x3---x4---x5---x6---x7---x8---x9---x10
  – Can pass messages in parallel, but…
  – If xn depends on x1, it takes O(n) time anyway
  – Slow "convergence rate"
• Larger subproblems are more "efficient"
• Smaller subproblems are easily parallel & adaptive
• Similar effects arise in message passing
  – Residual splash (Gonzalez et al. 2009)
Cluster trees
• An alternative means of parallel computation
  – Previously applied to Bayes nets (Pennock 1998; Namasivayam et al. 2006)
• Simple chain model x1---x2---x3---x4---x5---x6---x7---x8---x9---x10
  – Normally, variables are eliminated "in order" (dynamic programming)
  – Each calculation depends on all previous results
• Instead, eliminate variables in an alternative order
  – First eliminate some intermediate (degree-2) nodes
  – Balanced: tree of depth log(n)
[Figure: balanced cluster tree over the chain x1…x10]
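One way to see the log(n) depth is to split the chain recursively at its midpoint. The sketch below is a simplified stand-in for the paper's cluster-tree construction, used only to measure the resulting depth:

```python
import math

def cluster_tree_depth(lo, hi):
    """Depth of a balanced cluster tree over chain variables x_lo..x_hi.

    Each node eliminates the middle variable of its interval, so the
    tree has depth O(log n), versus the O(n) chain of dependencies
    produced by in-order elimination."""
    if lo > hi:
        return 0
    mid = (lo + hi) // 2  # eliminate the middle variable at this node
    return 1 + max(cluster_tree_depth(lo, mid - 1),
                   cluster_tree_depth(mid + 1, hi))

# A 10-variable chain yields depth 4; a 1024-variable chain stays
# within ceil(log2(n + 1)) levels.
assert cluster_tree_depth(1, 10) == 4
assert cluster_tree_depth(1, 1024) <= math.ceil(math.log2(1025))
```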
Adapting to changes
• 1st pass: update O(log n) cluster functions
• 2nd pass: mark changed configurations, repeat
• Decoding: O(m log(n/m)), where n = sequence length and m = number of changes
[Figure: chain x1…x10 before and after a local change; only the affected clusters are recomputed]
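Under the same simplified midpoint splitting, the first pass touches only the clusters between a changed potential and the root. A hypothetical sketch (not the paper's implementation):

```python
def nodes_to_update(lo, hi, changed):
    """Cluster nodes whose functions go stale when variable `changed`
    (an index in lo..hi) has a modified potential.

    Walking from the root of the balanced tree toward the changed leaf
    visits one node per level, so at most O(log n) clusters are marked
    for recomputation in the first pass."""
    path = []
    while lo <= hi:
        mid = (lo + hi) // 2
        path.append(mid)          # this cluster's function is stale
        if changed == mid:
            break
        lo, hi = (lo, mid - 1) if changed < mid else (mid + 1, hi)
    return path
```

On a 10-variable chain, changing x3's potential marks only the clusters at x5, x2, and x3.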
Experiments
• Random synthetic problems
  – Random, irregular but "grid-like" connectivity
• Stereo depth images
  – Superpixel representation; irregular graphs
• Compare the "edges" and "cover-tree" decompositions
• 32-core Intel Xeon; Cilk++ implementation
Synthetic problems
• Larger subproblems improve the convergence rate
• Adaptivity helps significantly
• Cluster trees: some overhead, but good parallelism
[Plots: convergence and speedup on synthetic models, as a function of problem size]
Stereo depth
• Time to convergence for different problems
Conclusions
• Fast methods for dual decomposition
  – Parallel computation
  – Adaptive updating
• Subproblem choice
  – Small subproblems: highly parallel, easily adaptive
  – Large subproblems: better convergence rates
• Cluster trees
  – An alternative form for parallel & adaptive updates
  – Benefits of both large & small subproblems