
1

Message Passing Algorithms for Optimization

Nicholas Ruozzi Advisor: Sekhar Tatikonda

Yale University

2

The Problem

Minimize a real-valued objective function that factorizes as a sum of potentials

(a multiset whose elements are subsets of the indices 1,…,n)
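The problem statement above can be sketched in code. This is only an illustration: the representation (per-variable cost lists plus a list of `(scope, table)` factors), the function names, and the example costs are all assumptions, not taken from the talk. A Python list is used for the factors because the slide describes a multiset, so the same scope may appear more than once.

```python
from itertools import product

def make_objective(unary, factors):
    """Build f(x) = sum_i phi_i(x_i) + sum_alpha psi_alpha(x_alpha).

    unary[i][a]  : cost of setting variable i to value a (hypothetical layout)
    factors      : list of (scope, table); scope is a tuple of variable
                   indices, table maps value tuples to costs
    """
    def f(x):
        total = sum(phi[x[i]] for i, phi in enumerate(unary))
        for scope, psi in factors:
            total += psi[tuple(x[i] for i in scope)]
        return total
    return f

def brute_force_min(f, n, k):
    """Exhaustively minimize f over n variables with k values each."""
    return min(product(range(k), repeat=n), key=f)

# Illustrative example: three binary variables on a cycle, with a unit
# penalty whenever two neighbouring variables disagree.
unary = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]]
disagree = {(a, b): (0.0 if a == b else 1.0) for a in range(2) for b in range(2)}
factors = [((0, 1), disagree), ((1, 2), disagree), ((0, 2), disagree)]

f = make_objective(unary, factors)
x_star = brute_force_min(f, n=3, k=2)
```

Exhaustive search costs k^n evaluations, which is why the rest of the talk looks at local message passing instead.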

3

Corresponding Graph

[Figure: the corresponding graph, with nodes labeled 1, 2, 3]

4

Local Message Passing Algorithms

Pass messages on this graph to minimize f

Distributed message passing algorithm

Ideal for large scientific problems, sensor networks, etc.

[Figure: graph with nodes labeled 1, 2, 3]

5

The Min-Sum Algorithm

Messages at time t:

[Figure: graph with nodes labeled 1, 2, 3, 4]

6

Computing Beliefs

The min-marginal corresponding to the ith variable is given by

Beliefs approximate the min-marginals:

Estimate the optimal assignment as the minimizer of each belief, $x_i^* \in \arg\min_{x_i} b_i(x_i)$
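The message, belief, and estimate steps can be sketched for the pairwise special case, where the variable-to-factor and factor-to-variable updates collapse into a single node-to-node message m_{i→j}(x_j) = min_{x_i} [φ_i(x_i) + ψ_ij(x_i, x_j) + Σ_{l ∈ N(i)\j} m_{l→i}(x_i)]. The update form, the data layout, and the example costs are assumptions for illustration; the slides' own equations did not survive in this transcript.

```python
def min_sum_pairwise(unary, edges, iters):
    """Synchronous min-sum on a pairwise model (illustrative sketch).

    unary[i][a]       : phi_i(a)
    edges[(i, j)][a][b]: psi_ij(x_i = a, x_j = b)
    Returns (beliefs, estimate).
    """
    n, k = len(unary), len(unary[0])

    def psi(i, j, a, b):
        # Look the table up in whichever orientation it was stored.
        return edges[(i, j)][a][b] if (i, j) in edges else edges[(j, i)][b][a]

    nbrs = {i: [] for i in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)

    # One message per directed edge, initialised to zero.
    msgs = {(i, j): [0.0] * k for i in range(n) for j in nbrs[i]}
    for _ in range(iters):
        new = {}
        for i, j in msgs:
            out = [
                min(
                    unary[i][a] + psi(i, j, a, b)
                    + sum(msgs[(l, i)][a] for l in nbrs[i] if l != j)
                    for a in range(k)
                )
                for b in range(k)
            ]
            shift = min(out)  # arbitrary normalising constant
            new[(i, j)] = [v - shift for v in out]
        msgs = new

    beliefs = [
        [unary[i][a] + sum(msgs[(l, i)][a] for l in nbrs[i]) for a in range(k)]
        for i in range(n)
    ]
    estimate = tuple(min(range(k), key=b.__getitem__) for b in beliefs)
    return beliefs, estimate

# Chain 0 - 1 - 2 (a tree), binary variables, unit penalty for disagreement.
unary = [[0.0, 2.0], [1.0, 0.0], [3.0, 0.0]]
disagree = [[0.0, 1.0], [1.0, 0.0]]
edges = {(0, 1): disagree, (1, 2): disagree}
beliefs, estimate = min_sum_pairwise(unary, edges, iters=5)
```

On this small tree the beliefs match the min-marginals up to an additive constant, and the estimate agrees with exhaustive search. On graphs with cycles neither property is guaranteed.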

7

Min-Sum: Convergence Properties

Iterations do not necessarily converge

Always converges when the factor graph is a tree

Converged estimates need not correspond to the optimal solution

Performs well empirically

8

Previous Work

Prior work focused on two aspects of message passing algorithms

Convergence

Coordinate ascent schemes

Not necessarily local message passing algorithms

Correctness

No combinatorial characterization of failure modes

Concerned only with global optimality

9

Contributions

A new local message passing algorithm

Parameterized family of message passing algorithms

Conditions under which the estimate produced by the splitting algorithm is guaranteed to be a global optimum

Conditions under which the estimate produced by the splitting algorithm is guaranteed to be a local optimum

10

Contributions

What makes a graphical model “good”?

Combinatorial understanding of the failure modes of the splitting algorithm via graph covers

Can be extended to other iterative algorithms

Techniques for handling objective functions for which the known convergent algorithms fail

Reparameterization centric approach

11

Publications

Convergent and correct message passing schemes for optimization problems over graphical models. Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), July 2010

Fixing Max-Product: A Unified Look at Message Passing Algorithms (invited talk). Proceedings of the Forty-Eighth Annual Allerton Conference on Communication, Control, and Computing, September 2010

Unconstrained minimization of quadratic functions via min-sum. Proceedings of the Conference on Information Sciences and Systems (CISS), Princeton, NJ, USA, March 2010

Graph covers and quadratic minimization. Proceedings of the Forty-Seventh Annual Allerton Conference on Communication, Control, and Computing, September 2009

s-t paths using the min-sum algorithm. Proceedings of the Forty-Sixth Annual Allerton Conference on Communication, Control, and Computing, September 2008

12

Outline

Reparameterizations

Lower Bounds

Convergent Message Passing

Finding a Minimizing Assignment

Graph covers

Quadratic Minimization

13

The Problem

Minimize a real-valued objective function that factorizes as a sum of potentials

(a multiset whose elements are subsets of the indices 1,…,n)

14

Factorizations

Some factorizations are better than others

If x_i takes one of k values, this requires at most 2k^2 + k operations

15

Factorizations

Some factorizations are better than others

Suppose

Only need k operations to compute the minimum value!

16

Reparameterizations

We can rewrite the objective function as

This does not change the objective function as long as the messages are real-valued at each x

The objective function is reparameterized in terms of the messages
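A small numerical check of this claim, assuming the reparameterization has the common form in which each message m_{α→i} is added to the variable potential φ_i and subtracted from the factor potential ψ_α. That form, the data layout, and the names `evaluate` and `reparameterize` are assumptions, since the slide's equation is missing from the transcript.

```python
import random
from itertools import product

def evaluate(unary, factors, x):
    """f(x) = sum_i phi_i(x_i) + sum_alpha psi_alpha(x_alpha)."""
    total = sum(phi[x[i]] for i, phi in enumerate(unary))
    return total + sum(psi[tuple(x[i] for i in scope)] for scope, psi in factors)

def reparameterize(unary, factors, msgs):
    """Move each message msgs[(f_idx, i)] from factor f_idx into variable i.

    Variable potentials gain the message, factor potentials lose it, so the
    sum of all potentials is unchanged for every assignment.
    """
    new_unary = [list(phi) for phi in unary]
    new_factors = []
    for f_idx, (scope, psi) in enumerate(factors):
        for i in scope:
            for a in range(len(unary[i])):
                new_unary[i][a] += msgs[(f_idx, i)][a]
        new_psi = {
            xs: cost - sum(msgs[(f_idx, i)][a] for i, a in zip(scope, xs))
            for xs, cost in psi.items()
        }
        new_factors.append((scope, new_psi))
    return new_unary, new_factors

random.seed(0)
k = 2
unary = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]]
disagree = {(a, b): float(a != b) for a in range(k) for b in range(k)}
factors = [((0, 1), disagree), ((1, 2), disagree), ((0, 2), disagree)]

# Arbitrary finite messages: any real values leave f unchanged.
msgs = {
    (f_idx, i): [random.uniform(-1, 1) for _ in range(k)]
    for f_idx, (scope, _) in enumerate(factors)
    for i in scope
}
new_unary, new_factors = reparameterize(unary, factors, msgs)
max_gap = max(
    abs(evaluate(unary, factors, x) - evaluate(new_unary, new_factors, x))
    for x in product(range(k), repeat=len(unary))
)
```

`max_gap` is zero up to floating-point rounding, whatever finite messages are chosen.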

17

Reparameterizations

We can rewrite the objective function as

The reparameterization has the same factor graph as the original factorization

Many message passing algorithms produce a reparameterization upon convergence

18

The Splitting Reparameterization

Let c be a vector of non-zero reals

If c is a vector of positive integers, then we could view this as a factorization in two ways:

Over the same factor graph as the original potentials

Over a factor graph where each potential has been “split” into several pieces

19

The Splitting Reparameterization

[Figure, left: factor graph on nodes 1, 2, 3. Right: factor graph resulting from “splitting” each of the pairwise potentials 3 times.]

20

The Splitting Reparameterization

Beliefs:

Reparameterization:

21

Outline

Reparameterizations

Lower Bounds

Convergent Message Passing

Finding a Minimizing Assignment

Graph covers

Quadratic Minimization

22

Lower Bounds

Can lower bound the objective function with these reparameterizations:

Find the collection of messages that maximize this lower bound

Lower bound is a concave function of the messages

Use coordinate ascent or subgradient methods
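One way to see the bound: once f is written as a sum of reparameterized terms, minimizing each term separately can only decrease the total. The sketch below assumes the reparameterized terms are φ_i plus incoming messages and ψ_α minus outgoing messages; that form, the data layout, and the function names are illustrative assumptions.

```python
import random
from itertools import product

def lower_bound(unary, factors, msgs):
    """Sum of the separate minima of every reparameterized term."""
    total = 0.0
    for i, phi in enumerate(unary):
        incoming = [f_idx for f_idx, (scope, _) in enumerate(factors) if i in scope]
        total += min(
            phi[a] + sum(msgs[(f_idx, i)][a] for f_idx in incoming)
            for a in range(len(phi))
        )
    for f_idx, (scope, psi) in enumerate(factors):
        total += min(
            cost - sum(msgs[(f_idx, i)][a] for i, a in zip(scope, xs))
            for xs, cost in psi.items()
        )
    return total

def true_minimum(unary, factors):
    """Exhaustive minimum of f, for comparison on tiny models only."""
    k = len(unary[0])
    return min(
        sum(phi[x[i]] for i, phi in enumerate(unary))
        + sum(psi[tuple(x[i] for i in scope)] for scope, psi in factors)
        for x in product(range(k), repeat=len(unary))
    )

random.seed(1)
k = 2
unary = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]]
disagree = {(a, b): float(a != b) for a in range(k) for b in range(k)}
factors = [((0, 1), disagree), ((1, 2), disagree), ((0, 2), disagree)]
keys = [(f_idx, i) for f_idx, (scope, _) in enumerate(factors) for i in scope]

f_min = true_minimum(unary, factors)
zero_bound = lower_bound(unary, factors, {key: [0.0] * k for key in keys})
random_bounds = [
    lower_bound(unary, factors,
                {key: [random.uniform(-2, 2) for _ in range(k)] for key in keys})
    for _ in range(100)
]
```

Each term of the bound is a minimum of functions that are affine in the messages, which is why the bound is concave in the messages.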

23

Lower Bounds and the MAP LP

Equivalent to minimizing f

Dual provides a lower bound on f

Messages are a side-effect of certain dual formulations

24

Outline

Reparameterizations

Lower Bounds

Convergent Message Passing

Finding a Minimizing Assignment

Graph covers

Quadratic Minimization

25

The Splitting Algorithm

A local message passing algorithm for the splitting reparameterization

Contains the min-sum algorithm as a special case

For the integer case, can be derived from the min-sum update equations

26

The Splitting Algorithm

For certain choices of c, an asynchronous version of the splitting algorithm can be shown to be a block coordinate ascent scheme for the lower bound:

For example:

27

Asynchronous Splitting Algorithm

[Figure: graph with nodes labeled 1, 2, 3]

28

Asynchronous Splitting Algorithm

[Figure: graph with nodes labeled 1, 2, 3]

29

Asynchronous Splitting Algorithm

[Figure: graph with nodes labeled 1, 2, 3]

30

Coordinate Ascent

Guaranteed to converge

Does not necessarily maximize the lower bound

Can get stuck in a suboptimal configuration

Can be shown to converge to the maximum in restricted cases

Pairwise-binary objective functions

31

Other Ascent Schemes

Many other ascent algorithms are possible over different lower bounds:

TRW-S [Kolmogorov 2007]

MPLP [Globerson and Jaakkola 2007]

Max-Sum Diffusion [Werner 2007]

Norm-product [Hazan 2010]

Not all coordinate ascent schemes are local

32

Outline

Reparameterizations

Lower Bounds

Convergent Message Passing

Finding a Minimizing Assignment

Graph covers

Quadratic Minimization

33

Constructing the Solution

Construct an estimate, x*, of the optimal assignment from the beliefs by choosing each x_i* to be a minimizer (argmin) of the corresponding belief

For certain choices of the vector c, if each argmin is unique, then x* minimizes f

A simple choice of c guarantees both convergence and correctness (if the argmins are unique)

34

Correctness

If the argmins are not unique, then we may not be able to construct a solution

When does the algorithm converge to the correct minimizing assignment?

35

Outline

Reparameterizations

Lower Bounds

Convergent Message Passing

Finding a Minimizing Assignment

Graph covers

Quadratic Minimization

36

Graph Covers

A graph H covers a graph G if there is a homomorphism from H to G that is a bijection on neighborhoods

[Figure: graph G on nodes 1, 2, 3 (left) and a 2-cover of G on nodes 1, 2, 3, 1’, 2’, 3’ (right)]
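The covering condition can be checked mechanically. The sketch below assumes G is a triangle and the 2-cover is a 6-cycle, which is one reading of the figure. The function name `is_cover`, the edge-list representation, and the `(label, copy)` node naming are hypothetical, and graphs are assumed to have no isolated nodes.

```python
def is_cover(h_edges, g_edges, proj):
    """Return True if proj (a map from nodes of H to nodes of G) is a covering map.

    The check is that proj is onto G and that, for every node u of H, proj
    maps the neighbours of u one-to-one onto the neighbours of proj[u].
    This also implies that proj sends edges of H to edges of G.
    """
    def adjacency(edges):
        adj = {}
        for u, v in edges:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        return adj

    g_adj, h_adj = adjacency(g_edges), adjacency(h_edges)
    if set(proj.values()) != set(g_adj):
        return False
    for u, nbrs in h_adj.items():
        images = [proj[v] for v in nbrs]
        if len(set(images)) != len(images) or set(images) != g_adj[proj[u]]:
            return False
    return True

# G: triangle on 1, 2, 3.  H: 6-cycle whose nodes are (label, copy) pairs.
triangle = [(1, 2), (2, 3), (1, 3)]
hexagon = [((1, 0), (2, 0)), ((2, 0), (3, 0)), ((3, 0), (1, 1)),
           ((1, 1), (2, 1)), ((2, 1), (3, 1)), ((3, 1), (1, 0))]
proj = {(label, copy): label for label in (1, 2, 3) for copy in (0, 1)}

hexagon_is_cover = is_cover(hexagon, triangle, proj)
# Dropping one edge leaves a path, whose end nodes have too few neighbours.
path_is_cover = is_cover(hexagon[:-1], triangle, proj)
```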

37

Graph Covers

Potential functions are “lifts” of the nodes they cover

[Figure: graph G on nodes 1, 2, 3 (left) and a 2-cover of G on nodes 1, 2, 3, 1’, 2’, 3’ (right)]

38

Graph Covers

The lifted potentials define a new objective function

Objective function:

2-cover objective function

39

Graph Covers

Indistinguishability: for any cover and any choice of initial messages on the original graph, there exists a choice of initial messages on the cover such that the messages passed by the splitting algorithm are identical on both graphs

For choices of c that guarantee correctness, any assignment that uniquely minimizes each belief must also minimize the objective function corresponding to any finite cover

40

Maximum Weight Independent Set

[Figure: graph G on nodes 1, 2, 3 (left) and a 2-cover of G on nodes 1, 2, 3, 1’, 2’, 3’ (right)]

41

Maximum Weight Independent Set

[Figure: graph G with node weights 5, 2, 2 (left) and a 2-cover of G with the lifted node weights 5, 2, 2, 5, 2, 2 (right)]

42

Maximum Weight Independent Set

[Figure: graph G with node weights 5, 2, 2 (left) and a 2-cover of G with the lifted node weights 5, 2, 2, 5, 2, 2 (right)]

43

Maximum Weight Independent Set

[Figure: graph G with node weights 3, 2, 2 (left) and a 2-cover of G with the lifted node weights 3, 2, 2, 3, 2, 2 (right)]

44

Maximum Weight Independent Set

[Figure: graph G with node weights 3, 2, 2 (left) and a 2-cover of G with the lifted node weights 3, 2, 2, 3, 2, 2 (right)]
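The two weightings in these figures can be compared by exhaustive search. The sketch assumes G is a triangle and its 2-cover is a 6-cycle, with each cover node inheriting the weight of the node it lies over; the function name and graph encoding are illustrative.

```python
from itertools import combinations

def max_weight_independent_set(weights, edges):
    """Exhaustive search; returns (best_set, best_weight). Tiny graphs only."""
    nodes = list(weights)
    best_set, best_weight = frozenset(), 0
    for r in range(1, len(nodes) + 1):
        for subset in combinations(nodes, r):
            chosen = set(subset)
            if any(u in chosen and v in chosen for u, v in edges):
                continue  # not independent
            weight = sum(weights[v] for v in subset)
            if weight > best_weight:
                best_set, best_weight = frozenset(subset), weight
    return best_set, best_weight

triangle = [(1, 2), (2, 3), (1, 3)]
hexagon = [((1, 0), (2, 0)), ((2, 0), (3, 0)), ((3, 0), (1, 1)),
           ((1, 1), (2, 1)), ((2, 1), (3, 1)), ((3, 1), (1, 0))]

def lift(weights):
    """Give both copies of each node the weight of the original node."""
    return {(v, copy): w for v, w in weights.items() for copy in (0, 1)}

heavy = {1: 5, 2: 2, 3: 2}
light = {1: 3, 2: 2, 3: 2}

heavy_g = max_weight_independent_set(heavy, triangle)
heavy_cover = max_weight_independent_set(lift(heavy), hexagon)
light_g = max_weight_independent_set(light, triangle)
light_cover = max_weight_independent_set(lift(light), hexagon)
```

With weights 5, 2, 2 the cover's optimum (weight 10) is the lift of the triangle's optimum {1}. With weights 3, 2, 2 the cover admits an independent set of weight 7, more than the 6 obtained by lifting {1}, so the cover and the original graph disagree about the solution.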

45

More Graph Covers

If covers of the factor graph have different solutions

The splitting algorithm cannot converge to the correct answer for choices of c that guarantee correctness

The min-sum algorithm may converge to an assignment that is optimal on a cover

There are applications for which the splitting algorithm always works

Minimum cuts, shortest paths, and more…

46

Graph Covers

Suppose f factorizes over a set with corresponding factor graph G and the choice of c guarantees correctness

Theorem: the splitting algorithm can only converge to beliefs that have unique argmins if

f is uniquely minimized at the assignment x*

The objective function corresponding to every finite cover H of G has a unique minimum that is a lift of x*

47

Graph Covers

This result suggests that

There is a close link between “good” factorizations and the difficulty of a problem

Convergent and correct algorithms are not ideal for all applications

Convex functions can be covered by functions that are not convex

48

Outline

Reparameterizations

Lower Bounds

Convergent Message Passing

Finding a Minimizing Assignment

Graph covers

Quadratic Minimization

49

Quadratic Minimization

Γ symmetric positive definite implies a unique minimum

Minimized at

For a positive definite matrix, min-sum convergence implies a correct solution:

Min-sum is not guaranteed to converge for all symmetric positive definite matrices

50

Quadratic Minimization

51

Quadratic Minimization

A symmetric matrix is scaled diagonally dominant if there exists w > 0 such that for each row i:

Theorem: Γ is scaled diagonally dominant iff every finite cover of Γ is positive definite
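The row condition is missing from this transcript. A common statement of scaled diagonal dominance is |Γ_ii| w_i > Σ_{j≠i} |Γ_ij| w_j for every row i, and the sketch below assumes that form. It only tests a given w; searching for a suitable w is a separate problem. The function name and the example matrices are illustrative.

```python
def is_scaled_diagonally_dominant(gamma, w):
    """Check |gamma[i][i]| * w[i] > sum_{j != i} |gamma[i][j]| * w[j] for all i.

    gamma is a square matrix given as a list of rows; w must be positive.
    """
    n = len(gamma)
    if len(w) != n or any(wi <= 0 for wi in w):
        raise ValueError("w must be a positive vector of matching length")
    return all(
        abs(gamma[i][i]) * w[i]
        > sum(abs(gamma[i][j]) * w[j] for j in range(n) if j != i)
        for i in range(n)
    )

# Ordinary diagonal dominance is the special case w = (1, ..., 1).
tridiagonal = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
plain = is_scaled_diagonally_dominant(tridiagonal, [1.0, 1.0, 1.0])

# This positive definite matrix fails with w = (1, 1) but passes after rescaling.
skewed = [[1.0, 2.0], [2.0, 5.0]]
unscaled = is_scaled_diagonally_dominant(skewed, [1.0, 1.0])
rescaled = is_scaled_diagonally_dominant(skewed, [1.0, 0.45])
```

The second example shows why the scaling vector matters: the matrix is not diagonally dominant in the usual sense, yet a suitable w certifies the scaled condition.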

52

Quadratic Minimization

Scaled diagonal dominance is a sufficient condition for the convergence of other iterative methods

Gauss-Seidel, Jacobi, and min-sum

Suggests a generalization of scaled diagonal dominance for arbitrary convex functions

Purely combinatorial!

Empirically, the splitting algorithm can always be made to converge for this problem
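For comparison with the other iterative methods named on this slide, here is a minimal Jacobi iteration. It assumes the quadratic has the usual form f(x) = ½ xᵀΓx − hᵀx, so that minimizing f amounts to solving Γx = h; the symbol h, the function name, and the example system are assumptions, not taken from the talk.

```python
def jacobi(gamma, h, iters=200):
    """Jacobi iteration for gamma @ x = h, starting from x = 0.

    Each sweep solves row i for x[i] while holding the other coordinates
    at their values from the previous sweep.
    """
    n = len(h)
    x = [0.0] * n
    for _ in range(iters):
        x = [
            (h[i] - sum(gamma[i][j] * x[j] for j in range(n) if j != i)) / gamma[i][i]
            for i in range(n)
        ]
    return x

# A strictly diagonally dominant (hence scaled diagonally dominant) system
# whose exact solution is (1, 2, 3).
gamma = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
h = [6.0, 10.0, 8.0]
x_jacobi = jacobi(gamma, h)
```

Every coordinate update here uses only the neighbouring entries of Γ, which is what makes Jacobi comparable to a local message passing scheme.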

53

Conclusion

General strategy for minimization

Reparameterization

Lower bounds

Convergent and correct message passing algorithms

Correctness is too strong

Algorithms cannot distinguish graph covers

Can fail to hold even for convex problems

54

Conclusion

Open questions

Deep relationship between “hardness” of a problem and its factorizations

Convergence and correctness criteria for the min-sum algorithm

Rates of convergence

55

Questions?

A draft of the thesis is available online at:

http://cs-www.cs.yale.edu/homes/nruozzi/Papers/ths2.pdf