Fast Parallel and Adaptive Updates for Dual-Decomposition Solvers

Ozgur Sumer, U. Chicago; Umut Acar, MPI-SWS; Alexander Ihler, UC Irvine; Ramgopal Mettu, UMass Amherst

Transcript of Fast Parallel and Adaptive Updates for Dual-Decomposition Solvers

Page 1: Fast Parallel and Adaptive Updates  for Dual-Decomposition Solvers

Fast Parallel and Adaptive Updates for Dual-Decomposition Solvers

Ozgur Sumer, U. Chicago
Umut Acar, MPI-SWS
Alexander Ihler, UC Irvine
Ramgopal Mettu, UMass Amherst

Page 2: Fast Parallel and Adaptive Updates  for Dual-Decomposition Solvers

Graphical models

• Structured (neg) energy function
• Goal:
• Examples

[Figure: three graphs over variables A, B, C, labeled Bayesian network, Factor graph, and Markov random field]

Pairwise:

Page 3: Fast Parallel and Adaptive Updates  for Dual-Decomposition Solvers

Graphical models

• Structured (neg) energy function
• Goal:
• Examples
  – Stereo depth

[Figure: three graphs over variables A, B, C, labeled Bayesian network, Factor graph, and Markov random field]

Pairwise:

[Figure panels: Stereo image pair; MRF model; Depth]

Page 4: Fast Parallel and Adaptive Updates  for Dual-Decomposition Solvers

Graphical models

• Structured (neg) energy function
• Goal:
• Examples
  – Stereo depth
  – Protein design & prediction

[Figure: three graphs over variables A, B, C, labeled Bayesian network, Factor graph, and Markov random field]

Pairwise:

Page 5: Fast Parallel and Adaptive Updates  for Dual-Decomposition Solvers

Graphical models

• Structured (neg) energy function
• Goal:
• Examples
  – Stereo depth
  – Protein design & prediction
  – Weighted constraint satisfaction problems

[Figure: three graphs over variables A, B, C, labeled Bayesian network, Factor graph, and Markov random field]

Pairwise:
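For reference, one common way to write a pairwise model and the MAP goal is sketched below. The symbols ($\theta_i$, $\theta_{ij}$, node set $V$, edge set $E$) are assumed notation and are not taken from the slides; "(neg) energy" is read here as a score to be maximized.

```latex
\theta(x) \;=\; \sum_{i \in V} \theta_i(x_i) \;+\; \sum_{(i,j) \in E} \theta_{ij}(x_i, x_j),
\qquad
x^{*} \;=\; \arg\max_{x} \; \theta(x)
```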

Page 6: Fast Parallel and Adaptive Updates  for Dual-Decomposition Solvers

Dual decomposition methods

Original

Page 7: Fast Parallel and Adaptive Updates  for Dual-Decomposition Solvers

Dual decomposition methods

• Decompose graph into smaller subproblems
• Solve each independently; optimistic bound
• Exact if all copies agree

Original Decomposition

Page 8: Fast Parallel and Adaptive Updates  for Dual-Decomposition Solvers

Dual decomposition methods

• Decompose graph into smaller subproblems
• Solve each independently; optimistic bound
• Exact if all copies agree
• Enforce lost equality constraints via Lagrange multipliers

Original Decomposition
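The bullets above can be summarized by the standard Lagrangian bound, sketched here under assumed notation: $t$ indexes subproblems, $x^{t}$ is subproblem $t$'s own copy of its variables, $\theta^{t}$ is its share of the score (with $\sum_t \theta^{t} = \theta$), and $\lambda^{t}_{i}$ are the multipliers. None of these symbols appear in the slides.

```latex
\max_{x} \theta(x)
\;\le\;
L(\lambda) \;=\; \sum_{t} \max_{x^{t}}
  \Big[ \theta^{t}(x^{t}) + \sum_{i \in t} \lambda^{t}_{i}(x^{t}_{i}) \Big],
\qquad
\text{where } \sum_{t \ni i} \lambda^{t}_{i}(x_i) = 0 \;\; \text{for all } i \text{ and } x_i .
```

The bound holds for every feasible $\lambda$, so the tightest bound is $\min_{\lambda} L(\lambda)$. If the maximizing copies $x^{t}$ all agree on their shared variables, the inequality is tight and the agreed configuration is a MAP solution.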

Page 9: Fast Parallel and Adaptive Updates  for Dual-Decomposition Solvers

Dual decomposition methods

Same bound by different names

• Dual decomposition (Komodakis et al. 2007)
• TRW, MPLP (Wainwright et al. 2005; Globerson & Jaakkola 2007)
• Soft arc consistency (Cooper & Schiex 2004)

Original Decomposition

Page 10: Fast Parallel and Adaptive Updates  for Dual-Decomposition Solvers

Dual decomposition methods

Original Decomposition

[Figure: Energy axis with labels MAP, Consistent solutions, and Relaxed problems]

Page 11: Fast Parallel and Adaptive Updates  for Dual-Decomposition Solvers

Optimizing the bound

Subgradient descent

• Find each subproblem’s optimal configuration
• Adjust entries for mis-matched solutions

[Figure: numeric example, three tables]

5 2 2
1 2 0
0 0 2

1 0 1
0 1 0
1 2 1

0 0
0 0
0 0

Page 13: Fast Parallel and Adaptive Updates  for Dual-Decomposition Solvers

Optimizing the bound

Subgradient descent

• Find each subproblem’s optimal configuration
• Adjust entries for mis-matched solutions

[Figure: numeric example, three tables]

4 1 1
1 2 0
1 1 3

2 1 2
0 1 0
0 1 0

+1 -1
 0  0
-1 +1
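The update illustrated by the numbers on these slides can be reproduced with a short sketch. It rests on an assumed reading of the figure: the two 3×3 tables are two subproblems, each row is one state of a variable they share, and the step size is 1. The function names are hypothetical.

```python
def best_row(table):
    """Row (state of the shared variable) that contains the table's largest entry."""
    return max(range(len(table)), key=lambda r: max(table[r]))


def bound(tables):
    """Optimistic bound: sum of each subproblem's independent maximum."""
    return sum(max(max(row) for row in t) for t in tables)


def subgradient_step(theta1, theta2, step=1):
    """One subgradient update on the multipliers of the shared variable.

    If the subproblems disagree, each one is penalized on the state it chose
    and rewarded on the state the other chose. The adjustments to the two
    subproblems cancel, so the underlying problem is unchanged.
    """
    r1, r2 = best_row(theta1), best_row(theta2)
    new1 = [row[:] for row in theta1]
    new2 = [row[:] for row in theta2]
    if r1 != r2:
        new1[r1] = [v - step for v in new1[r1]]
        new1[r2] = [v + step for v in new1[r2]]
        new2[r2] = [v - step for v in new2[r2]]
        new2[r1] = [v + step for v in new2[r1]]
    return new1, new2


theta1 = [[5, 2, 2], [1, 2, 0], [0, 0, 2]]
theta2 = [[1, 0, 1], [0, 1, 0], [1, 2, 1]]
new1, new2 = subgradient_step(theta1, theta2)
print(new1)  # [[4, 1, 1], [1, 2, 0], [1, 1, 3]]
print(new2)  # [[2, 1, 2], [0, 1, 0], [0, 1, 0]]
print(bound([theta1, theta2]), bound([new1, new2]))  # 7 6
```

Under this reading, one step moves the tables from the first set of values to the second, the bound drops from 7 to 6, and both subproblems then pick the same state of the shared variable, so the bound is exact.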

Page 14: Fast Parallel and Adaptive Updates  for Dual-Decomposition Solvers

Equivalent decompositions

• All collections of tree-structured parts are equivalent
• Two extreme cases
  – Set of all individual edges
  – Single “covering tree” of all edges; variables duplicated

[Figure panels: Original graph; “Edges”; Covering tree]

Page 15: Fast Parallel and Adaptive Updates  for Dual-Decomposition Solvers

Speeding up inference

• Parallel updates
  – Easy to solve subproblems in parallel (e.g. Komodakis et al. 2007)

• Adaptive updates
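Because subproblems share no state within one iteration, they can be solved concurrently. The sketch below only illustrates that independence using Python's standard thread pool; it is not the Cilk++ implementation mentioned later in the talk, and CPython threads will not speed up CPU-bound work like this. The tables and function name are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def solve_subproblem(table):
    """Return (best value, (row, col)) for one small table by enumeration."""
    return max(
        (value, (r, c))
        for r, row in enumerate(table)
        for c, value in enumerate(row)
    )


subproblems = [
    [[5, 2, 2], [1, 2, 0], [0, 0, 2]],
    [[1, 0, 1], [0, 1, 0], [1, 2, 1]],
]

# Each subproblem is solved without reference to the others.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(solve_subproblem, subproblems))

upper_bound = sum(value for value, _ in results)
print(results)      # [(5, (0, 0)), (2, (2, 1))]
print(upper_bound)  # 7
```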

Page 16: Fast Parallel and Adaptive Updates  for Dual-Decomposition Solvers

Some complications…

• Example: Markov chain
  – Can pass messages in parallel, but…
  – If xn depends on x1, takes O(n) time anyway
  – Slow “convergence rate”
• Larger problems are more “efficient”
• Smaller problems are easily parallel & adaptive
• Similar effects in message passing
  – Residual splash (Gonzalez et al. 2009)

x1---x2---x3---x4---x5---x6---x7---x8---x9---x10

Page 17: Fast Parallel and Adaptive Updates  for Dual-Decomposition Solvers

Cluster trees

x1---x2---x3---x4---x5---x6---x7---x8---x9---x10

• Alternative means of parallel computation
  – Applied to Bayes nets (Pennock 1998; Namasivayam et al. 2006)
• Simple chain model
  – Normally, eliminate variables “in order” (DP)
  – Each calculation depends on all previous results
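The "in order" elimination on a chain can be sketched as max-sum dynamic programming. The function name and the layout of `unary` and `pairwise` are assumptions made for illustration; the point is that each loop iteration needs the message from the previous one, so the loop is inherently sequential.

```python
def chain_map_value(unary, pairwise):
    """Max-sum dynamic programming along a chain x1 - x2 - ... - xn.

    unary[i][a]       : score of x_i = a
    pairwise[i][a][b] : score of (x_i = a, x_{i+1} = b)
    Returns the best total score over all configurations.
    """
    msg = list(unary[0])
    for i in range(1, len(unary)):
        # Eliminate x_{i-1}: this step depends on the previous message.
        msg = [
            unary[i][b]
            + max(msg[a] + pairwise[i - 1][a][b] for a in range(len(msg)))
            for b in range(len(unary[i]))
        ]
    return max(msg)


unary = [[0, 1], [0, 0], [2, 0]]
pairwise = [[[1, 0], [0, 1]], [[1, 0], [0, 1]]]
print(chain_map_value(unary, pairwise))  # 4
```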

Page 21: Fast Parallel and Adaptive Updates  for Dual-Decomposition Solvers

Cluster trees

• Alternative means of parallel computation

• Eliminate variables in alternative order
  – Eliminate some intermediate (degree 2) nodes

x1---x2---x3---x4---x5---x6---x7---x8---x9---x10

Page 22: Fast Parallel and Adaptive Updates  for Dual-Decomposition Solvers

Cluster trees

• Alternative means of parallel computation

• Eliminate variables in alternative order
  – Eliminate some intermediate (degree 2) nodes
  – Balanced: depth log(n)

x1---x2---x3---x4---x5---x6---x7---x8---x9---x10

[Figure: cluster tree over the chain; nodes by level, top to bottom: x10; x5; x2, x6; x3, x8; x1, x4, x7, x9]
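Why a balanced elimination order has logarithmic depth can be seen in a simplified setting. The sketch below is not the authors' cluster-tree structure: it assumes a chain with only pairwise tables and uses the fact that the max-plus matrix product is associative, so adjacent tables can be merged pairwise, level by level. All names are hypothetical.

```python
from functools import reduce


def maxplus(A, B):
    """Max-plus product C[a][c] = max_b (A[a][b] + B[b][c]).

    Merging two adjacent edge tables this way eliminates the variable
    they share.
    """
    return [
        [max(A[a][b] + B[b][c] for b in range(len(B))) for c in range(len(B[0]))]
        for a in range(len(A))
    ]


def tree_reduce(tables):
    """Merge adjacent tables level by level; returns (result, number of levels).

    All merges within one level are independent of each other, so each
    level could run in parallel.
    """
    levels = 0
    while len(tables) > 1:
        merged = [maxplus(tables[i], tables[i + 1]) for i in range(0, len(tables) - 1, 2)]
        if len(tables) % 2:
            merged.append(tables[-1])  # odd table out is carried to the next level
        tables = merged
        levels += 1
    return tables[0], levels


# Nine edge tables for a ten-variable chain with binary variables.
edges = [[[(3 * i + 2 * a + b) % 5 for b in range(2)] for a in range(2)] for i in range(9)]

balanced, levels = tree_reduce(edges)
sequential = reduce(maxplus, edges)  # left-to-right, "in order" elimination
print(balanced == sequential, levels)  # True 4
```

Both orders give the same table, but the sequential fold needs eight dependent steps while the balanced version needs four levels.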

Page 23: Fast Parallel and Adaptive Updates  for Dual-Decomposition Solvers

Adapting to changes

x1---x2---x3---x4---x5---x6---x7---x8---x9---x10

x1---x2---x3---x4---x5---x6---x7---x8---x9---x10

Page 26: Fast Parallel and Adaptive Updates  for Dual-Decomposition Solvers

Adapting to changes

• 1st pass: update O(log n) cluster functions
• 2nd pass: mark changed configurations, repeat

Decoding: O(m log(n/m))

n = sequence length; m = # of changes

x1---x2---x3---x4---x5---x6---x7---x8---x9---x10

x1---x2---x3---x4---x5---x6---x7---x8---x9---x10
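The first pass can be illustrated with a simplified stand-in for a cluster tree: a balanced binary tree of max-plus products over the edge tables of a chain. Changing one edge table then requires recomputing only the nodes on the path to the root. This sketch covers only the update of the stored functions, not the second (decoding) pass, and the class and method names are hypothetical.

```python
def maxplus(A, B):
    """Max-plus product C[a][c] = max_b (A[a][b] + B[b][c])."""
    return [
        [max(A[a][b] + B[b][c] for b in range(len(B))) for c in range(len(B[0]))]
        for a in range(len(A))
    ]


class MaxPlusTree:
    """Balanced tree whose internal nodes store merged edge tables."""

    def __init__(self, edges):
        self.size = 1
        while self.size < len(edges):
            self.size *= 2
        self.node = [None] * (2 * self.size)  # None marks unused padding leaves
        for i, table in enumerate(edges):
            self.node[self.size + i] = table
        for k in range(self.size - 1, 0, -1):
            self.node[k] = self._merge(self.node[2 * k], self.node[2 * k + 1])

    @staticmethod
    def _merge(left, right):
        if left is None:
            return right
        if right is None:
            return left
        return maxplus(left, right)

    def update(self, i, table):
        """Replace edge table i; return how many internal nodes were recomputed."""
        k = self.size + i
        self.node[k] = table
        touched = 0
        k //= 2
        while k >= 1:
            self.node[k] = self._merge(self.node[2 * k], self.node[2 * k + 1])
            touched += 1
            k //= 2
        return touched

    def map_value(self):
        """Best total score of the chain (pairwise tables only)."""
        return max(max(row) for row in self.node[1])


edges = [[[(3 * i + 2 * a + b) % 5 for b in range(2)] for a in range(2)] for i in range(9)]
tree = MaxPlusTree(edges)

changed = [[10, 0], [0, 0]]
touched = tree.update(4, changed)

rebuilt = MaxPlusTree(edges[:4] + [changed] + edges[5:])
print(touched, tree.map_value() == rebuilt.map_value())  # 4 True
```

With nine edges the tree has 16 leaf slots, so one change touches four internal nodes instead of recomputing all eight merges.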

Page 28: Fast Parallel and Adaptive Updates  for Dual-Decomposition Solvers

Experiments

• Random synthetic problems
  – Random, irregular but “grid-like” connectivity
• Stereo depth images
  – Superpixel representation
  – Irregular graphs
• Compare “edges” and “cover-tree”
• 32-core Intel Xeon, Cilk++ implementation

Page 29: Fast Parallel and Adaptive Updates  for Dual-Decomposition Solvers

Synthetic problems

• Larger problems improve convergence rate

Page 30: Fast Parallel and Adaptive Updates  for Dual-Decomposition Solvers

Synthetic problems

• Larger problems improve convergence rate

• Adaptivity helps significantly

• Cluster overhead

Page 31: Fast Parallel and Adaptive Updates  for Dual-Decomposition Solvers

Synthetic problems

• Larger problems improve convergence rate

• Adaptivity helps significantly

• Cluster overhead

• Parallelism

Page 32: Fast Parallel and Adaptive Updates  for Dual-Decomposition Solvers

Synthetic models

• As a function of problem size

Page 33: Fast Parallel and Adaptive Updates  for Dual-Decomposition Solvers

Stereo depth

Page 36: Fast Parallel and Adaptive Updates  for Dual-Decomposition Solvers

Stereo depth

• Time to convergence for different problems

Page 37: Fast Parallel and Adaptive Updates  for Dual-Decomposition Solvers

Conclusions

• Fast methods for dual decomposition
  – Parallel computation
  – Adaptive updating
• Subproblem choice
  – Small problems: highly parallel, easily adaptive
  – Large problems: better convergence rates
• Cluster trees
  – Alternative form for parallel & adaptive updates
  – Benefits of both large & small subproblems