Marginal Inference in MRFs using Frank-Wolfe

David Belanger, Daniel Sheldon, Andrew McCallum
School of Computer Science, University of Massachusetts, Amherst
{belanger,sheldon,mccallum}@cs.umass.edu

December 10, 2013


Table of Contents

1 Markov Random Fields

2 Frank-Wolfe for Marginal Inference

3 Optimality Guarantees and Convergence Rate

4 Beyond MRFs

5 Fancier FW



Markov Random Fields

Φθ(x) = ∑_{c ∈ C} θc(xc)

P(x) = exp(Φθ(x)) / Z

Moving to the overcomplete (marginal) representation makes the score linear:

x → µ

Φθ(x) → 〈θ, µ〉
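The mapping x → µ can be made concrete: µ(x) stacks a one-hot indicator per clique configuration, so the clique-factorized score becomes an inner product. A minimal sketch with a hypothetical two-variable binary model (all numbers made up):

```python
import numpy as np

# Hypothetical binary MRF: two unary cliques and one pairwise clique.
theta_unary = [np.array([0.1, 0.7]), np.array([0.3, -0.2])]   # theta_i(x_i)
theta_pair = np.array([[0.0, 1.2], [0.5, -0.4]])              # theta_12(x_1, x_2)

def phi(x):
    """Clique-factorized score Phi_theta(x) = sum_c theta_c(x_c)."""
    return theta_unary[0][x[0]] + theta_unary[1][x[1]] + theta_pair[x[0], x[1]]

def mu_of(x):
    """Overcomplete indicator vector: one-hot entry per clique configuration."""
    parts = []
    for i, xi in enumerate(x):
        one_hot = np.zeros(2)
        one_hot[xi] = 1.0
        parts.append(one_hot)
    pair = np.zeros((2, 2))
    pair[x[0], x[1]] = 1.0
    parts.append(pair.ravel())
    return np.concatenate(parts)

# Stacking theta in the same layout makes the score linear: Phi_theta(x) = <theta, mu(x)>.
theta_vec = np.concatenate([theta_unary[0], theta_unary[1], theta_pair.ravel()])
x = (1, 0)
assert np.isclose(phi(x), theta_vec @ mu_of(x))
```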


Marginal Inference

µMARG = E_{Pθ}[µ]

Exact marginals solve a variational problem over the marginal polytope M; the tractable surrogate optimizes over the local polytope L with a decomposed entropy HB:

µMARG = argmax_{µ ∈ M} 〈µ, θ〉 + HM(µ)

µ̄approx = argmax_{µ ∈ L} 〈µ, θ〉 + HB(µ)

HB(µ) = ∑_{c ∈ C} Wc H(µc)
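HB(µ) is just a weighted sum of per-clique marginal entropies. A small sketch (the clique marginals and weights below are made-up numbers):

```python
import numpy as np

def clique_entropy(mu_c):
    """Shannon entropy H(mu_c) of one clique marginal table (any shape)."""
    p = mu_c.ravel()
    p = p[p > 0]                      # 0 log 0 = 0 convention
    return -np.sum(p * np.log(p))

def H_B(clique_marginals, weights):
    """H_B(mu) = sum_c W_c H(mu_c)."""
    return sum(w * clique_entropy(mu_c)
               for mu_c, w in zip(clique_marginals, weights))

# Two unary marginals and one pairwise marginal; weights W_c hypothetical.
mus = [np.array([0.5, 0.5]),
       np.array([0.9, 0.1]),
       np.array([[0.45, 0.05], [0.45, 0.05]])]
print(H_B(mus, weights=[1.0, 1.0, 1.0]))
```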


MAP Inference

µMAP = argmax_{µ ∈ M} 〈µ, θ〉

[Diagram: both a black-box MAP solver and a gray-box MAP solver take θ and return µMAP.]


Marginal → MAP Reductions

Hazan and Jaakkola [2012]

Ermon et al. [2013]


Generic FW with Line Search

yt = argmin_{x ∈ X} 〈x, −∇f(xt−1)〉

xt = argmin_{γ ∈ [0,1]} f((1 − γ)xt−1 + γ yt)
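The two updates above can be sketched directly; the toy objective, domain, and oracle below are illustrative stand-ins, not the paper's setting:

```python
import numpy as np

def frank_wolfe(f, grad_f, lmo, x0, iters=500):
    """Generic FW: y_t = argmin_y <y, grad f(x_{t-1})>, then line search
    over gamma in [0, 1] by ternary search (f is convex on the segment)."""
    x = x0
    for _ in range(iters):
        y = lmo(grad_f(x))                        # linear minimization oracle
        lo, hi = 0.0, 1.0
        for _ in range(50):                       # essentially exact line search
            m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
            if f(x + m1 * (y - x)) < f(x + m2 * (y - x)):
                hi = m2
            else:
                lo = m1
        x = x + 0.5 * (lo + hi) * (y - x)
    return x

# Toy instance: minimize 0.5 ||x - b||^2 over the probability simplex,
# whose linear oracle just picks the vertex with the smallest gradient entry.
b = np.array([0.2, 0.5, 0.3])
f = lambda x: 0.5 * np.sum((x - b) ** 2)
grad = lambda x: x - b
lmo = lambda g: np.eye(len(g))[np.argmin(g)]
x = frank_wolfe(f, grad, lmo, np.array([1.0, 0.0, 0.0]))
print(x)   # approaches b, while every iterate stays inside the simplex
```

Each iterate is a convex combination of simplex vertices, so feasibility never has to be enforced separately; that projection-free property is the point of FW.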

Generic FW with Line Search

[Diagram: repeat three steps in a loop — compute the gradient −∇f(xt−1), call the linear minimization oracle to obtain yt, then line search between xt−1 and yt to obtain xt.]

FW for Marginal Inference

[Diagram: compute the gradient ∇F(µt) = θ + ∇H(µt), pass the perturbed potentials θ̃ to a MAP inference oracle to obtain µ̃MAP, then line search to obtain µt+1.]

Subproblem Parametrization

F(µ) = 〈µ, θ〉 + ∑_{c ∈ C} Wc H(µc)

θ̃ = ∇F(µt) = θ + ∑_{c ∈ C} Wc ∇H(µc)
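For a categorical entropy H(µ) = −∑_i µ_i log µ_i, the gradient is −(1 + log µ_i), which is all it takes to form the perturbed potentials θ̃ fed to the MAP oracle. A sketch with made-up numbers and weights:

```python
import numpy as np

def grad_H(mu_c):
    """Gradient of H(mu) = -sum_i mu_i log mu_i: dH/dmu_i = -(1 + log mu_i).
    Note the blow-up as any mu_i -> 0, i.e. near the boundary of the polytope."""
    return -(1.0 + np.log(mu_c))

def perturbed_potentials(theta, clique_marginals, weights):
    """theta_tilde = grad F(mu_t) = theta + sum_c W_c grad H(mu_c),
    kept in the same clique-by-clique layout as theta."""
    return [th + w * grad_H(mu)
            for th, mu, w in zip(theta, clique_marginals, weights)]

# One unary and one pairwise clique; weights W_c are hypothetical.
theta = [np.array([0.1, 0.7]), np.array([[0.0, 1.2], [0.5, -0.4]])]
mus = [np.array([0.5, 0.5]), np.array([[0.25, 0.25], [0.25, 0.25]])]
theta_tilde = perturbed_potentials(theta, mus, weights=[1.0, 1.0])
# theta_tilde is what the MAP oracle is called with at each FW iteration.
```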


Line Search

[Diagram: line search along the segment from µt to µ̃MAP yields µt+1.]

Computing the line search objective can scale with:

Bad: the number of possible values in each clique.

Good: the number of cliques in the graph. (see paper)
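A straightforward way to run the line search is to ternary-search over γ, recomputing every clique entropy at each evaluation; note this is the "bad" scaling above, touching every entry of every clique table (the paper's cheaper scheme is not reproduced here):

```python
import numpy as np

def entropy(p):
    p = p[p > 0]                                 # 0 log 0 = 0
    return -np.sum(p * np.log(p))

def objective_at(gamma, mu_t, mu_map, theta, weights):
    """F((1 - gamma) mu_t + gamma mu_map), evaluated clique by clique."""
    val = 0.0
    for th, a, b, w in zip(theta, mu_t, mu_map, weights):
        m = (1 - gamma) * a + gamma * b
        val += float(np.dot(th.ravel(), m.ravel())) + w * entropy(m.ravel())
    return val

def line_search(mu_t, mu_map, theta, weights, steps=80):
    """Maximize the concave F along the FW segment by ternary search on gamma."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if objective_at(m1, mu_t, mu_map, theta, weights) > \
           objective_at(m2, mu_t, mu_map, theta, weights):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

# One binary unary clique: mu_t uniform, mu_map the integral MAP vertex [1, 0].
gamma = line_search([np.array([0.5, 0.5])], [np.array([1.0, 0.0])],
                    [np.array([2.0, 0.0])], [1.0])
```

For this one-clique instance the optimum is available in closed form (set the derivative of 2p + H(p) to zero, with p = (1 + γ)/2), which makes the search easy to sanity-check.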


Experiment #1


Convergence Rate

Convergence Rate of Frank-Wolfe [Jaggi, 2013]:

F(µ∗) − F(µt) ≤ (2CF / (t + 2)) (1 + δ)

δCF / (t + 2): allowed MAP suboptimality at iteration t → NP-Hard.

How to deal with MAP hardness?

Use a MAP solver and hope for the best [Hazan and Jaakkola, 2012].

Relax to the local polytope.


Curvature + Convergence Rate

Cf = sup_{x, s ∈ D; γ ∈ [0,1]; y = x + γ(s − x)} (2/γ²) ( f(y) − f(x) − 〈y − x, ∇f(x)〉 )

[Figure: line search from µt toward µ̃MAP, alongside a plot of the binary entropy against P(x = 1).]
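The entropy term is exactly where the curvature constant Cf misbehaves: as µt approaches the boundary of the simplex, the gap term inside the definition of Cf grows without bound. A small numeric illustration with negative entropy on the two-point simplex (the specific points are made up):

```python
import numpy as np

def neg_entropy(p):
    return np.sum(p * np.log(p))     # f = -H, the concave entropy negated

def grad(p):
    return 1.0 + np.log(p)

def curvature_term(x, s, gamma):
    """(2/gamma^2) (f(y) - f(x) - <y - x, grad f(x)>) with y = x + gamma (s - x);
    Cf is the supremum of this quantity over the domain."""
    y = x + gamma * (s - x)
    return (2 / gamma ** 2) * (neg_entropy(y) - neg_entropy(x) - (y - x) @ grad(x))

s = np.array([1.0 - 1e-12, 1e-12])   # (nearly) a vertex, like an integral MAP point
for eps in [1e-1, 1e-3, 1e-6]:
    x = np.array([eps, 1.0 - eps])   # mu_t drifting toward the simplex boundary
    print(eps, curvature_term(x, s, gamma=1.0))
```

Each printed value grows as eps shrinks, so no finite Cf bounds the supremum: the O(Cf/t) rate degrades for the entropy-regularized objective near the boundary.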


Experiment #2


Beyond MRFs

Question: Are MRFs the right Gibbs distribution on which to use Frank-Wolfe?

Problem Family                   | MAP Algorithm       | Marginal Algorithm
tree-structured graphical models | Viterbi             | Forward-Backward
loopy graphical models           | Max-Product BP      | Sum-Product BP
Directed Spanning Tree           | Chu-Liu-Edmonds     | Matrix Tree Theorem
Bipartite Matching               | Hungarian Algorithm | ×



Norm-Regularized Marginal Inference

µMARG = argmax_{µ ∈ M} 〈µ, θ〉 + HM(µ) + λR(µ)   [Harchaoui et al., 2013]

Local linear oracle for MRFs?

µ̃t = argmax_{µ ∈ M ∩ Br(µt)} 〈µ, θ〉   [Garber and Hazan, 2013]


Conclusion

We need to figure out how to handle the entropy gradient.

There are plenty of extensions to further Gibbs distributions and regularizers.


Further Reading I

Stefano Ermon, Carla Gomes, Ashish Sabharwal, and Bart Selman. Taming the curse of dimensionality: Discrete integration by hashing and optimization. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pages 334–342, 2013.

D. Garber and E. Hazan. A linearly convergent conditional gradient algorithm with applications to online and stochastic optimization. arXiv e-prints, January 2013.

Zaid Harchaoui, Anatoli Juditsky, and Arkadi Nemirovski. Conditional gradient algorithms for norm-regularized smooth convex optimization. arXiv preprint arXiv:1302.2325, 2013.

Tamir Hazan and Tommi S. Jaakkola. On the partition function and random maximum a-posteriori perturbations. In Proceedings of the 29th International Conference on Machine Learning (ICML-12), pages 991–998, 2012.

Bert Huang and Tony Jebara. Approximating the permanent with belief propagation. arXiv preprint arXiv:0908.1769, 2009.

Further Reading II

Mark Huber. Exact sampling from perfect matchings of dense regular bipartite graphs. Algorithmica, 44(3):183–193, 2006.

Martin Jaggi. Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pages 427–435, 2013.

James Petterson, Tiberio Caetano, Julian McAuley, and Jin Yu. Exponential family graph matching and ranking. 2009.

Tim Roughgarden and Michael Kearns. Marginals-to-models reducibility. In Advances in Neural Information Processing Systems, pages 1043–1051, 2013.

Maksims Volkovs and Richard S. Zemel. Efficient sampling for bipartite matching problems. In Advances in Neural Information Processing Systems, pages 1322–1330, 2012.

Pascal O. Vontobel. The Bethe permanent of a non-negative matrix. In Communication, Control, and Computing (Allerton), 2010 48th Annual Allerton Conference on, pages 341–346. IEEE, 2010.

Finding the Marginal Matching

Sampling

Expensive, but doable [Huber, 2006; Volkovs and Zemel, 2012].

Used for maximum-likelihood learning [Petterson et al., 2009].

Sum-Product

Also requires the Bethe approximation. Works well:

in practice [Huang and Jebara, 2009]

in theory [Vontobel, 2010]

Frank-Wolfe

Basically the same algorithm as for graphical models.

Same issue with curvature.
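For the matching case, the MAP oracle is a maximum-weight bipartite matching; SciPy's linear_sum_assignment (a Hungarian-style solver) can play that role, returning a permutation-matrix vertex of the matching polytope for the FW step. A sketch with made-up scores θ:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def matching_map_oracle(theta):
    """MAP oracle for bipartite matching: the max-score perfect matching,
    returned as a permutation matrix, i.e. a vertex mu_MAP of the polytope."""
    rows, cols = linear_sum_assignment(-theta)   # negate: the solver minimizes cost
    mu = np.zeros_like(theta)
    mu[rows, cols] = 1.0
    return mu

# Hypothetical 3x3 edge scores theta_ij.
theta = np.array([[3.0, 1.0, 0.0],
                  [1.0, 0.0, 2.0],
                  [0.0, 2.0, 1.0]])
mu_map = matching_map_oracle(theta)
print(np.sum(theta * mu_map))   # score <theta, mu_MAP> of the best matching
```

From here the FW iteration is literally the one used for graphical models: perturb θ by the entropy gradient, call this oracle, line search toward the returned vertex.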
