Statistical Physics Tools in Information Science - Stanford University

Page 1: Statistical Physics Tools in Information Science - Stanford University

Statistical Physics Tools in Information Science

Marc Mézard (1) and Andrea Montanari (2)

(1) Université de Paris Sud and (2) Stanford University

June 23, 2007

Marc Mezard1 and Andrea Montanari2 Statistical Physics Tools in Information Science

Page 2: Statistical Physics Tools in Information Science - Stanford University

Structure of the presentation

Andrea: What is statistical physics and why should you care.

Marc: Two test cases: (1) counting matchings, (2) random k-SAT.

Ask whatever you want!

Page 5: Statistical Physics Tools in Information Science - Stanford University

Structure of the presentation

Andrea: What is the statistical physics we do and . . . .

Marc: Two test cases: (1) counting matchings, (2) random k-SAT.

Ask whatever you want!

Page 6: Statistical Physics Tools in Information Science - Stanford University

Sources

General:

→ M. Mézard and A. Montanari, 'Information, Physics and Computation,' upcoming book (our web pages)

Random k-SAT:

→ M. Mézard, G. Parisi, and R. Zecchina, 'Analytic and Algorithmic Solution of Random Satisfiability Problems,' Science

→ F. Krzakala, A. Montanari, F. Ricci-Tersenghi, G. Semerjian, L. Zdeborová, 'Gibbs States and the Set of Solutions of Random Constraint Satisfaction Problems,' PNAS

Coding:

→ A. Montanari and R. Urbanke, 'Modern Coding Theory: The Statistical Mechanics and Computer Science Point of View,' lecture notes

General graphical models:

→ google ee374

Page 7: Statistical Physics Tools in Information Science - Stanford University

Outline

1 Problems

2 Methods

3 Results

4 The cavity method at work

5 Mean Field (BP) on graphical models

6 Matching

7 K-SAT

8 Appendices

Page 15: Statistical Physics Tools in Information Science - Stanford University

Problems

Page 16: Statistical Physics Tools in Information Science - Stanford University

Probabilistic description of a physical system

State: $x = (x_1, \dots, x_N)$, $x_i \in \mathcal{X}$

Inverse temperature: $\beta$

Energy: $E : x \mapsto E(x) \in \mathbb{R}$

(Boltzmann) probability distribution:

$$\mu(x) = \frac{1}{Z} \exp\{-\beta E(x)\}$$
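As a minimal sketch of the definition above, the Boltzmann distribution of a very small system can be built by direct enumeration. The function name `boltzmann` and the toy energy (the number of ones in the configuration) are illustrative assumptions, not taken from the slides.

```python
import itertools
import math

def boltzmann(energy, alphabet, n, beta):
    """Return (mu, Z) with mu(x) = exp(-beta * E(x)) / Z over alphabet**n."""
    configs = list(itertools.product(alphabet, repeat=n))
    weights = [math.exp(-beta * energy(x)) for x in configs]
    Z = sum(weights)  # normalization constant (partition function)
    mu = {x: w / Z for x, w in zip(configs, weights)}
    return mu, Z

# Hypothetical toy energy: E(x) = number of ones in x.
mu, Z = boltzmann(energy=sum, alphabet=(0, 1), n=3, beta=1.0)
```

Enumeration costs |X|^N operations, so this is only usable for tiny N; that cost is what motivates the approximate methods discussed later.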

Page 20: Statistical Physics Tools in Information Science - Stanford University

Probabilistic description of a physical system

State: $x = (x_1, \dots, x_N)$, $x_i \in \mathcal{X}$

Energy: $E : x \mapsto E(x) \in \mathbb{R}$

(Boltzmann) probability distribution:

$$\mu(x) = \frac{1}{Z} \exp\{-E(x)\}$$

Page 21: Statistical Physics Tools in Information Science - Stanford University

Probabilistic description of a physical system

State: $x = (x_1, \dots, x_N)$, $x_i \in \mathcal{X}$

(Boltzmann) probability distribution:

$$\mu(x) = \frac{1}{Z}\, w(x)$$

Page 22: Statistical Physics Tools in Information Science - Stanford University

Probabilistic description of a physical system

State: $x = (x_1, \dots, x_N)$, $x_i \in \mathcal{X}$

Probability distribution:

$$\mu(x) = \frac{1}{Z}\, w(x)$$

Page 23: Statistical Physics Tools in Information Science - Stanford University

What is left? An example

$L \times L$ grid: $G = (V, E)$, $x_i \in \mathcal{X} = \{0, 1\}$, $i \in V$

$$\mu(x) = \frac{1}{Z(\lambda; G)}\, \lambda^{|x|}\, \mathbb{I}\{x \text{ is an independent set}\}$$

Page 24: Statistical Physics Tools in Information Science - Stanford University

What is left? Locality

$L \times L$ grid: $G = (V, E)$, $x_i \in \mathcal{X} = \{0, 1\}$, $i \in V$

$$\mu(x) = \frac{1}{Z(\lambda; G)} \prod_{i \in V} \lambda^{x_i} \prod_{(ij) \in E} \mathbb{I}\{(x_i, x_j) \neq (1, 1)\}$$

Page 25: Statistical Physics Tools in Information Science - Stanford University

A more abstract version of locality

$G = (V, E)$, $V = [N]$, $x = (x_1, \dots, x_N) \in \{0, 1\}^V$

$$\mu(x) = \frac{1}{Z(\lambda; G)} \prod_{i \in V} \lambda^{x_i} \prod_{(ij) \in E} \mathbb{I}\{(x_i, x_j) \neq (1, 1)\}$$

Page 26: Statistical Physics Tools in Information Science - Stanford University

A more abstract version of locality

[Figure: a graph whose vertices are labeled x1, ..., x12.]

$G = (V, E)$, $V = [N]$, $x = (x_1, \dots, x_N) \in \mathcal{X}^N$

$$\mu(x) = \frac{1}{Z} \prod_{(ij) \in G} \psi_{ij}(x_i, x_j)$$

Page 27: Statistical Physics Tools in Information Science - Stanford University

Statistical mechanics questions: I. Qualitative

What does a typical configuration sampled from µ look like?

Disordered versus Ordered

Liquid versus Solid

Page 30: Statistical Physics Tools in Information Science - Stanford University

Statistical mechanics questions: II. Quantitative

$L \times L$ grid: $N = L^2$

Compute (for $N$ large)

$$\phi_N(\lambda) = \frac{1}{N} \log Z(G; \lambda) = \frac{1}{N} \log \sum_{x \in IS(G)} \lambda^{|x|}$$
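For very small grids the quantity above can be evaluated exactly by enumerating independent sets. The following is a sketch under that brute-force assumption; the helper names `grid_edges`, `independent_sets` and `phi` are hypothetical, and the cost grows as 2^N.

```python
import itertools
import math

def grid_edges(L):
    """Edges of the L x L grid, vertices numbered row by row."""
    edges = []
    for r in range(L):
        for c in range(L):
            v = r * L + c
            if c + 1 < L:
                edges.append((v, v + 1))   # horizontal neighbor
            if r + 1 < L:
                edges.append((v, v + L))   # vertical neighbor
    return edges

def independent_sets(n, edges):
    """Yield every x in {0,1}^n in which no edge has both endpoints equal to 1."""
    for x in itertools.product((0, 1), repeat=n):
        if all(not (x[i] and x[j]) for i, j in edges):
            yield x

def phi(L, lam):
    """phi_N(lambda) = (1/N) log Z(G; lambda) on the L x L grid, N = L^2."""
    n = L * L
    Z = sum(lam ** sum(x) for x in independent_sets(n, grid_edges(L)))
    return math.log(Z) / n
```

At λ = 1 the partition function simply counts independent sets, which gives an easy sanity check on small grids.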

Page 31: Statistical Physics Tools in Information Science - Stanford University

Isn’t Z just an irrelevant normalization constant?

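$$H(X) = -\sum_{x} \mu(x) \log \mu(x)$$

$$= \log Z(\lambda; G) - \sum_{x \in IS(G)} \mu(x)\, |x| \log \lambda$$

$$= \log Z(\lambda; G) - \log \lambda \sum_{i \in V} \langle x_i \rangle_G$$

$$= \log Z(\lambda; G) - \log \lambda\, \frac{\partial \log Z(\lambda; G)}{\partial \log \lambda}$$

[this relation is completely general]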
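The identity can be checked numerically on a small graph. The sketch below computes the entropy directly from µ and again from log Z, with the derivative taken by a central finite difference in log λ; the function names and the finite-difference step `h` are assumptions made for illustration.

```python
import itertools
import math

def hardcore(n, edges, lam):
    """Return (mu, Z) for the hard-core (independent set) model with fugacity lam."""
    sets = [x for x in itertools.product((0, 1), repeat=n)
            if all(not (x[i] and x[j]) for i, j in edges)]
    Z = sum(lam ** sum(x) for x in sets)
    return {x: lam ** sum(x) / Z for x in sets}, Z

def entropy_two_ways(n, edges, lam, h=1e-6):
    """Entropy from its definition, and from log Z - log(lam) * dlogZ/dlog(lam)."""
    mu, Z = hardcore(n, edges, lam)
    direct = -sum(p * math.log(p) for p in mu.values())
    # Central finite difference of log Z with respect to log(lam).
    up = math.log(hardcore(n, edges, lam * math.exp(h))[1])
    down = math.log(hardcore(n, edges, lam * math.exp(-h))[1])
    slope = (up - down) / (2 * h)
    via_Z = math.log(Z) - math.log(lam) * slope
    return direct, via_Z

# Example graph (an assumption): the cycle on 5 vertices.
cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
direct, via_Z = entropy_two_ways(5, cycle5, lam=2.0)
```

The two values should agree up to the finite-difference error, which illustrates why Z carries more information than a normalization constant.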

Page 36: Statistical Physics Tools in Information Science - Stanford University

Questions I and II are related!

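$$\Delta(x) = \sum_{i \in \text{EVEN}} x_i - \sum_{i \in \text{ODD}} x_i$$

$$\phi_N(\lambda, \delta) = \frac{1}{N} \log Z(G; \lambda, \delta) = \frac{1}{N} \log \sum_{x \in IS(G) :\, \Delta(x) = N\delta} \lambda^{|x|}$$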
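The restricted partition function can be tabulated by brute force on a tiny grid by grouping independent sets according to Δ(x). This is a sketch only; the names `restricted_Z` and `restricted_phi`, and the choice of defining EVEN/ODD by the parity of row plus column, are assumptions.

```python
import itertools
import math
from collections import defaultdict

def restricted_Z(L, lam):
    """Return {Delta: sum of lam^|x| over independent sets x with that Delta}."""
    n = L * L
    edges = [(r * L + c, r * L + c + 1) for r in range(L) for c in range(L - 1)]
    edges += [(r * L + c, (r + 1) * L + c) for r in range(L - 1) for c in range(L)]
    # EVEN sites contribute +1, ODD sites -1 (checkerboard parity, assumed).
    sign = [1 if (v // L + v % L) % 2 == 0 else -1 for v in range(n)]
    Z = defaultdict(float)
    for x in itertools.product((0, 1), repeat=n):
        if all(not (x[i] and x[j]) for i, j in edges):
            delta = sum(s * xi for s, xi in zip(sign, x))
            Z[delta] += lam ** sum(x)
    return dict(Z)

def restricted_phi(L, lam):
    """Return {delta: (1/N) log Z(G; lam, delta)} with delta = Delta / N."""
    n = L * L
    return {d / n: math.log(z) / n for d, z in restricted_Z(L, lam).items()}
```

Plotting `restricted_phi(L, lam)` against δ for a few values of λ gives a small-size analogue of the liquid and solid curves, although finite-size effects are strong at sizes that can be enumerated.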

Page 37: Statistical Physics Tools in Information Science - Stanford University

Liquid

[Figure: $\phi_N(\lambda, \delta)$ plotted against $\delta$ in the liquid phase.]

Page 38: Statistical Physics Tools in Information Science - Stanford University

Solid

[Figure: $\phi_N(\lambda, \delta)$ plotted against $\delta$ in the solid phase, with the bottleneck and the quantity $B$ marked.]

Theorem (Mossel/Weitz/Wormald/06)

On a random sparse bipartite graph, $B = \Theta(1)$ with high probability (whp) for $\lambda > \lambda_*$.

A similar theorem holds for Ising models [A. Gerschenfeld/AM/07].

Page 40: Statistical Physics Tools in Information Science - Stanford University

An artistic view of µ in the solid phase

[Figure: sketch of µ along the $\delta$ axis, with $\delta = 0$ marked.]

Page 41: Statistical Physics Tools in Information Science - Stanford University

What about non-bipartite graphs?

Page 42: Statistical Physics Tools in Information Science - Stanford University

Frustration

[Figure: sketch with two sites marked '?'.]

No 'simple ordering' ⇒ solid amorphous state?

[Solid+Amorphous = Glass]

Page 45: Statistical Physics Tools in Information Science - Stanford University

How do you define ‘solid’?

$i \in V$

$B(i, r)$: ball of radius $r$ around $i$

$x_{\sim i, r} = \{x_j : j \notin B(i, r)\}$

Liquid: $I(X_i; X_{\sim i, r}) \to 0$ as $r \to \infty$

Solid: $I(X_i; X_{\sim i, r}) \to I_\infty > 0$ as $r \to \infty$
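To make the liquid criterion concrete, the mutual information between one variable and everything outside a ball can be computed exactly on a short path graph. This is a sketch under assumptions not in the slides: the hard-core model on a path of n vertices, the ball centered at the endpoint 0, and the hypothetical function name `mutual_information_far`.

```python
import itertools
import math

def mutual_information_far(n, lam, r):
    """I(X_0; X_{~0,r}) in nats for the hard-core model on the path 0..n-1.

    The ball B(0, r) is {0, ..., r}, so the far variables are x[r+1:].
    """
    joint = {}
    Z = 0.0
    for x in itertools.product((0, 1), repeat=n):
        if any(x[i] and x[i + 1] for i in range(n - 1)):
            continue  # not an independent set
        w = lam ** sum(x)
        Z += w
        key = (x[0], x[r + 1:])
        joint[key] = joint.get(key, 0.0) + w
    p_near, p_far = {}, {}
    for (a, far), w in joint.items():
        p_near[a] = p_near.get(a, 0.0) + w / Z
        p_far[far] = p_far.get(far, 0.0) + w / Z
    return sum((w / Z) * math.log((w / Z) / (p_near[a] * p_far[far]))
               for (a, far), w in joint.items())

values = [mutual_information_far(10, 1.0, r) for r in (0, 2, 4)]
```

On a path the mutual information decays quickly with r, which is the liquid behavior; a one-dimensional chain is not expected to show the solid case.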

Page 48: Statistical Physics Tools in Information Science - Stanford University

Methods

Page 49: Statistical Physics Tools in Information Science - Stanford University

Mean Field Methods

Page 50: Statistical Physics Tools in Information Science - Stanford University

Mean field ******

Mean field methods: A family of techniques for approximate calculations in statistical mechanics and graphical models. [1]

Mean field models: A class of models on which mean field methods are asymptotically exact in the large system limit.

[1] And more: Markov chains, queuing theory, stochastic networks, etc.

Page 51: Statistical Physics Tools in Information Science - Stanford University

The simplest mean field calculation

[Figure: a vertex $i$ and its neighborhood $\partial i$.]

$\mu_A(\,\cdot\,)$: marginal of $X_A$, $A \subseteq V$

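$$\mu_i(1) = \sum_{x_{\partial i}} \mu_{i|\partial i}(1 \mid x_{\partial i})\, \mu_{\partial i}(x_{\partial i}) = \frac{\lambda}{1 + \lambda}\, \mu_{\partial i}(0)$$

$$\approx \frac{\lambda}{1 + \lambda} \prod_{j \in \partial i} \mu_j(0) = \frac{\lambda}{1 + \lambda} \prod_{j \in \partial i} (1 - \mu_j(1))$$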
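The approximate equation above can be turned into a fixed-point iteration on an arbitrary graph. The sketch below is one possible way to do this; the damping factor is a numerical convenience added here, not something stated in the slides, and the function name `mean_field_fixed_point` is hypothetical.

```python
def mean_field_fixed_point(neighbors, lam, iters=200, damping=0.5):
    """Iterate p_i <- lam/(1+lam) * prod_{j in di} (1 - p_j), with damping.

    neighbors maps each vertex to the list of its neighbors; p_i approximates mu_i(1).
    """
    p = {i: 0.0 for i in neighbors}
    for _ in range(iters):
        new = {}
        for i, nbrs in neighbors.items():
            prod = 1.0
            for j in nbrs:
                prod *= 1.0 - p[j]
            new[i] = lam / (1.0 + lam) * prod
        # Damped update (an assumption) to avoid oscillations.
        p = {i: damping * p[i] + (1.0 - damping) * new[i] for i in p}
    return p

# Example: a single edge {0, 1} at lam = 1.
p_edge = mean_field_fixed_point({0: [1], 1: [0]}, lam=1.0)
```

For the single edge the fixed point satisfies p = (1 - p)/2, so p = 1/3, which happens to coincide with the exact marginal there; on graphs with more structure the result is only an approximation.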

Page 56: Statistical Physics Tools in Information Science - Stanford University

Solving the equations

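Bipartite, degree $k + 1$; assume

$$\mu_i(1) = \begin{cases} p_1 & \text{if } i \in \text{EVEN}, \\ p_2 & \text{if } i \in \text{ODD}. \end{cases}$$

Then the MF equations are

$$p_1 = f_\lambda(p_2), \qquad p_2 = f_\lambda(p_1),$$

where $f_\lambda(x) = \lambda (1 + \lambda)^{-1} (1 - x)^{k+1}$.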
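A simple way to explore these two equations is to iterate them from an asymmetric starting point and see whether the two sublattice densities merge (liquid) or stay apart (solid). The sketch below assumes the starting point (0, 1) and the parameter values in the example; both are illustrative choices.

```python
def f_lambda(x, lam, k):
    """f_lambda(x) = lam / (1 + lam) * (1 - x)^(k + 1)."""
    return lam / (1.0 + lam) * (1.0 - x) ** (k + 1)

def solve_mf(lam, k, steps=2000):
    """Iterate p1 <- f(p2), p2 <- f(p1) from the asymmetric start (0, 1)."""
    p1, p2 = 0.0, 1.0
    for _ in range(steps):
        p1, p2 = f_lambda(p2, lam, k), f_lambda(p1, lam, k)
    return p1, p2

liquid = solve_mf(lam=0.5, k=3)   # small lambda: p1 and p2 coincide
solid = solve_mf(lam=3.0, k=3)    # large lambda: p1 and p2 differ
```

Within this mean field approximation, the symmetric solution loses stability once |f'_λ(p*)| exceeds 1 at the symmetric fixed point p*, which is where the two values start to separate.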

Page 58: Statistical Physics Tools in Information Science - Stanford University

Solving the equations (continued)

[Figure: two panels showing the iterates $p_1(\ell)$ and $p_2(\ell)$, with both axes running from 0 to 0.7. Caption: Liquid vs Solid.]

Page 59: Statistical Physics Tools in Information Science - Stanford University

The family of mean field approximations

Method           Basic intuition               Asympt. exact for
Naive mf         Neglects correlations         Some dense G's
Bethe-Peierls    'Nearest neighbors' correls   Some sparse rand. G's
Cavity [2]       As BP + Glassy states         'Any' sparse rand. G
Kikuchi [3]      Short loops / Nonpert.        ???
Loop corr. [4]   Loops / Perturbative          ***

[2] Mezard/Parisi, ...
[3] Kikuchi, Yedidia/Freeman/Weiss
[4] AM/Rizzo, Parisi/Slanina, Chernyak/Chertkov

Page 65: Statistical Physics Tools in Information Science - Stanford University

The family of mean field approximations

Method           Basic intuition               Algorithmic version
Naive mf         Neglects correlations         Mean field
Bethe-Peierls    'Nearest neighbors' correls   Belief Propagation
Cavity [5]       As BP + Glassy states         Survey Propagation
Kikuchi [6]      Short loops / Nonpert.        Generalized BP
Loop corr. [7]   Loops / Perturbative          Loop corr. BP

[5] Mezard/Parisi, ...
[6] Kikuchi, Yedidia/Freeman/Weiss
[7] AM/Rizzo, Parisi/Slanina, Chernyak/Chertkov

Page 67: Statistical Physics Tools in Information Science - Stanford University

‘Any sparse random graph?’

Caveats

Many indications (rigorous and non-rigorous), but no proof.

'Sparse random graph' is a bit vague.

Can define a family of ensembles.

Page 69: Statistical Physics Tools in Information Science - Stanford University

‘Any sparse random graph?’

Factor graph $G = (V, F, E)$:

[Figure: factor graph with variable nodes x1, ..., x10 (variables $x_i \in \mathcal{X}$) and factor nodes, e.g. $\psi_a(x_5, x_7, x_9, x_{10})$.]

$$\mu(x) = \frac{1}{Z} \prod_{a \in F} \psi_a(x_{\partial a})$$

$\partial a \equiv \{i \in V : (i, a) \in E\}$
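A factor graph model of this form can be represented as a list of (scope, function) pairs, where the scope plays the role of ∂a. The sketch below evaluates µ by brute force for a tiny example; the representation, the function names, and the example (a hard-core model on a path of three variables) are assumptions for illustration.

```python
import itertools

def factor_graph_distribution(n, alphabet, factors):
    """Return (mu, Z) for mu(x) = (1/Z) * prod_a psi_a(x restricted to scope a).

    factors is a list of (scope, psi): scope is a tuple of variable indices,
    psi maps the tuple of their values to a non-negative weight.
    """
    weights = {}
    for x in itertools.product(alphabet, repeat=n):
        w = 1.0
        for scope, psi in factors:
            w *= psi(tuple(x[i] for i in scope))
        weights[x] = w
    Z = sum(weights.values())
    return {x: w / Z for x, w in weights.items()}, Z

def no_adjacent_ones(vals):
    """Pairwise hard-core constraint: weight 0 if both values are 1."""
    return 0.0 if vals == (1, 1) else 1.0

# Hypothetical example: path 0 - 1 - 2 with one pairwise factor per edge.
mu, Z = factor_graph_distribution(
    3, (0, 1), [((0, 1), no_adjacent_ones), ((1, 2), no_adjacent_ones)])
```

Here Z counts the five independent sets of the three-vertex path, so each allowed configuration has probability 1/5.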

Page 70: Statistical Physics Tools in Information Science - Stanford University

Graph ensemble

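[Figure: factor nodes grouped by degree (degree 2, degree 3, ..., degree $d_{\max}$) and variable nodes grouped by degree (degree 2, degree 3, ..., degree $d_{\max}$), with their edge sockets connected through a random permutation $\pi$.]

[∼ irregular LDPC ensembles]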
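One way to sample from an ensemble of this kind is to give each node as many sockets as its degree and to match variable sockets to factor sockets with a uniformly random permutation. The sketch below follows that reading of the figure; the function name, the degree sequences in the example, and the decision to allow multi-edges are assumptions.

```python
import random

def sample_factor_graph(var_degrees, factor_degrees, rng):
    """Return a list of (variable, factor) edges with the prescribed degrees."""
    if sum(var_degrees) != sum(factor_degrees):
        raise ValueError("total variable degree must equal total factor degree")
    # One socket per unit of degree on each side.
    var_sockets = [i for i, d in enumerate(var_degrees) for _ in range(d)]
    factor_sockets = [a for a, d in enumerate(factor_degrees) for _ in range(d)]
    rng.shuffle(factor_sockets)  # the random permutation pi
    # Multi-edges are possible and are kept as they are in this sketch.
    return list(zip(var_sockets, factor_sockets))

var_degrees = [2, 2, 2, 3, 3]
factor_degrees = [3, 3, 3, 3]
edges = sample_factor_graph(var_degrees, factor_degrees, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the sample reproducible.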

Page 71: Statistical Physics Tools in Information Science - Stanford University

Compatibility functions ensemble

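Assign, for $d \in \{1, \dots, d_{\max}\}$, a set of functions

$$\{\psi^{(d,r)} : \underbrace{\mathcal{X} \times \cdots \times \mathcal{X}}_{d} \to \mathbb{R}_+\}_{r = 1, 2, \dots}$$

and a distribution $\{p_d(r)\}$ ($p_d(r) \ge 0$, $\sum_r p_d(r) = 1$).

Then, for each f-node $a$ of degree $d(a)$:

$$\psi_a = \psi^{(d(a), r)} \quad \text{independently, with prob. } p_{d(a)}(r)$$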

Page 73: Statistical Physics Tools in Information Science - Stanford University

The cavity method: A high-level view

0. Cavity method = Replica method

The replica method is formal, while the cavity method makes some probabilistic assumptions.

Page 74: Statistical Physics Tools in Information Science - Stanford University

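The cavity method: A high-level view

1. What does 'asymptotically exact' mean?

Partition function:

$$\lim_{N \to \infty} \frac{1}{N} \log Z_N = \phi^{\text{cavity}} \quad \text{almost surely.}$$

Marginals:

$$\lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \| \mu_i - \mu_i^{\text{cavity}} \|_{TV} = 0 \quad \text{almost surely.}$$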

Page 77: Statistical Physics Tools in Information Science - Stanford University

The cavity method: A high-level view

2. Naive mean field → $\mu_i \approx \nu_i$ (vertex quantities)

Cavity → $\nu_{i \to j}$ (messages)

Page 78: Statistical Physics Tools in Information Science - Stanford University

The cavity method: A high-level view

3. A hierarchy

Std terminology   Cavity jargon   Message space
Bethe-Peierls     RS (0RSB)       M0 = distribs over X
***               1RSB            M1 = distribs over M0
***               2RSB            M2 = distribs over M1
***               3RSB            M3 = distribs over M2
···               ···             ···
***               ∞RSB            ???

Page 86: Statistical Physics Tools in Information Science - Stanford University

Results

Page 87: Statistical Physics Tools in Information Science - Stanford University

A list of models from. . .

Coding

Multi-user detection

Stochastic networks


Page 88: Statistical Physics Tools in Information Science - Stanford University

Channel coding

$x = (x_1, \dots, x_N)$ → BMS channel → $y = (y_1, \dots, y_N)$

Channel transition probability: $\{Q(y|x)\}$.

Codeword: $x \in \{0, 1\}^N$ with

$Hx = 0 \mod 2$.

Page 89: Statistical Physics Tools in Information Science - Stanford University

LDPC codes [Gallager, MacKay, Luby et al.]

$x_1 \oplus x_2 \oplus x_3 \oplus x_4 = 0 \;\cdots\; x_5 \oplus x_6 \oplus x_8 = 0$

[Figure: factor graph linking the variables $x_1, \dots, x_8$ to the parity checks]

Page 90: Statistical Physics Tools in Information Science - Stanford University

$x_1 \oplus x_2 \oplus x_3 \oplus x_4 = 0 \;\cdots\; x_5 \oplus x_6 \oplus x_8 = 0$

[Figure: factor graph with the variables $x_1, \dots, x_8$, each attached to its channel output $y_1, \dots, y_8$]

$$\mu_y(x) = \frac{1}{Z_N(y)}\, \mathbb{I}(x_1 \oplus x_2 \oplus x_3 \oplus x_4 = 0) \cdots \mathbb{I}(x_5 \oplus x_6 \oplus x_8 = 0) \cdot Q(y_1|x_1) \cdots Q(y_8|x_8)$$
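The posterior above can be evaluated by brute force on a toy instance. The sketch below is only an illustration under stated assumptions: the channel is taken to be a binary symmetric channel with flip probability `p`, the function name `ldpc_posterior` is hypothetical, and the toy code uses only the two parity checks that are visible on the slide (the checks hidden behind the dots are not reconstructed).

```python
import itertools

def ldpc_posterior(checks, y, p):
    """Brute-force posterior mu_y(x) over codewords (assumed BSC, flip prob. p).

    checks: list of tuples of 0-based variable indices, one per parity check.
    y: list of received bits.
    Returns (dict codeword -> probability, partition function Z).
    """
    n = len(y)
    weights = {}
    for x in itertools.product((0, 1), repeat=n):
        # Indicator factors: every parity check must sum to 0 mod 2.
        if any(sum(x[i] for i in check) % 2 for check in checks):
            continue
        # Channel factors Q(y_i | x_i) of the binary symmetric channel.
        w = 1.0
        for xi, yi in zip(x, y):
            w *= (1.0 - p) if xi == yi else p
        weights[x] = w
    z = sum(weights.values())
    return {x: w / z for x, w in weights.items()}, z

# Hypothetical toy code: only the two checks shown on the slide (0-based).
TOY_CHECKS = [(0, 1, 2, 3), (4, 5, 7)]
posterior, z = ldpc_posterior(TOY_CHECKS, [0] * 8, 0.1)
most_likely = max(posterior, key=posterior.get)
```

The enumeration costs $2^N$ operations, which is exactly why message passing is used instead for realistic block lengths.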

Page 91: Statistical Physics Tools in Information Science - Stanford University

Some results

Saad/Kabashima et al., AM/Sourlas (Replica method)

$$\phi = \lim_{N\to\infty} \frac{1}{N}\, \mathbb{E} \log Z_N(Y) \;\Rightarrow\; [\text{Conditional entropy per bit } H(X|Y)/N]$$

Proof:
Lower bound → AM, Macris
Upper bound: Measson/AM/Urbanke (BEC)

Page 92: Statistical Physics Tools in Information Science - Stanford University

Multi-user detection (CDMA channel)

N users: $x \equiv (x_1, x_2, \dots, x_N)$, $x_i \in \{+1, -1\}$ i.i.d. uniform

M chips: $y = (y_1, y_2, \dots, y_M)$, $y_a \in \mathbb{R}$

$$y_a = s_{a1}\, x_{i_1(a)} + \cdots + s_{ak}\, x_{i_k(a)} + w_a$$

$w_a \sim \mathrm{Normal}(0, \sigma^2)$, $\{s_{ai}\}$ spreading sequences

Page 93: Statistical Physics Tools in Information Science - Stanford University

Multi-user detection (CDMA channel)

$y_1 = (+x_1 - x_2 + x_3 + x_4) + w_1 \;\cdots\; y_6 = (-x_5 - x_6 + x_8) + w_6$, where $w_a$ is the noise

[Figure: factor graph linking the variables $x_1, \dots, x_8$ to the chips]

A posteriori distribution: $\mu_y(x) \equiv P\{x \mid Y\}$ → graphical model...

Page 94: Statistical Physics Tools in Information Science - Stanford University

$$\mu_y(x) = \frac{1}{Z_K(y)} \prod_{a=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big\{ -\frac{1}{2\sigma^2} \Big( y_a - \sum_l s_{al}\, x_{i_l(a)} \Big)^2 \Big\}$$

Tanaka (replica method)

$$\phi = \lim_{K\to\infty} \frac{1}{K}\, \mathbb{E} \log Z_K(Y) \;\Rightarrow\; [\text{Capacity per user}]$$

Several generalizations: Guo/Verdu, Caire et al., Kabashima et al.
Proof: AM/Tse


Page 96: Statistical Physics Tools in Information Science - Stanford University

Channel assignment in cellular networks

$n_i \ge 0$, number of channels in cell $i$

$$\mu(n) = \frac{1}{Z} \prod_{i\in V} \frac{\lambda_i^{n_i}}{n_i!} \prod_{(ij)\in E} \mathbb{I}(n_i + n_j \le C)$$

$Z$ → loss probability

Page 97: Statistical Physics Tools in Information Science - Stanford University

END OF FIRST HALF


Page 98: Statistical Physics Tools in Information Science - Stanford University

BEGINNING OF SECOND HALF


Page 99: Statistical Physics Tools in Information Science - Stanford University

Cavity method: general (heuristic) framework

1- Draw the factor graph.
2- Write elementary “mean field (BP) equations”, assuming that the local environment of a variable in the factor graph is a tree.
3- Two ways to use them: a) Statistical analysis of the equations in a graph ensemble. b) Iteration of the message passing on a single instance (belief propagation).
4- Check for the existence of “Replica Symmetry Breaking” = dependence of the root on the boundaries, using typical boundaries.
5- If needed, write the 1RSB cavity equations → survey propagation...


Page 104: Statistical Physics Tools in Information Science - Stanford University

Factor graphs for graphical models

Many discrete variables $x_i$, many constraints $f_a(X_a)$, each involving a small number of variables. Factor graph:

[Figure: factor graph with variable nodes 1, ..., 5 and factor nodes a, b, c, d, e]

$$P(x_1, \dots, x_5) = \frac{1}{Z}\, f_a(x_1, x_2, x_3, x_4)\, f_b(x_1, x_2, x_3)\, f_c(x_2, x_4, x_5)\, f_d(x_1, x_2, x_5)\, f_e(x_1, x_3, x_5)$$

Q: Estimate marginals. Ubiquitous: inference, coding, combinatorial optimization, physics...

NB: In physics, 'energy', 'temperature':

$$f_a(x_1, x_2, x_3, x_4) = e^{-\beta E_a(x_1, x_2, x_3, x_4)}$$


Page 106: Statistical Physics Tools in Information Science - Stanford University

Locally tree-like factor graph

In LDPC error-correcting codes, random K-satisfiability, colouring of random Erdős-Rényi graphs, matching in random graphs, etc., the factor graph is locally tree-like.

Ex: random 3-SAT

[Figure: random 3-SAT factor graph; loops have length of order log N]

Page 107: Statistical Physics Tools in Information Science - Stanford University

Simple mean field recursion: merge rooted trees

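[Figure: root variable 0 attached to factor nodes a and b; a joins the subtrees rooted at variables 1 and 2, b those rooted at variables 3 and 4. The subtrees send $m_1(x_1), \dots, m_4(x_4)$, the factors send $\mu_a(x_0)$ and $\mu_b(x_0)$, and the merged tree sends $m_0(x_0)$.]

$$\mu_a(x_0) = \sum_{x_1, x_2} m_1(x_1)\, m_2(x_2)\, f_a(x_1, x_2, x_0)$$

$$\mu_b(x_0) = \sum_{x_3, x_4} m_3(x_3)\, m_4(x_4)\, f_b(x_3, x_4, x_0)$$

$$m_0(x_0) = C\, \mu_a(x_0)\, \mu_b(x_0)$$

$m_0 = F(m_1, m_2, m_3, m_4)$ = Belief propagation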
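The merge step can be written out directly for binary variables. This is a minimal sketch, not the authors' code: the function name `merge_rooted_trees`, the list-based message format `[m(0), m(1)]`, and the parity factor used in the example are all assumptions chosen for illustration.

```python
import itertools

def merge_rooted_trees(branches):
    """Compute m_0 from the messages of the subtrees hanging below the root.

    branches: list of (factor, incoming) pairs, one per factor node next to
    the root. factor(x_child_1, ..., x_child_k, x0) returns a weight and
    incoming is the list of child messages [m(0), m(1)].
    """
    m0 = [1.0, 1.0]
    for factor, incoming in branches:
        for x0 in (0, 1):
            # mu(x0): sum over the child variables of this factor.
            mu = 0.0
            for xs in itertools.product((0, 1), repeat=len(incoming)):
                w = factor(*xs, x0)
                for m, x in zip(incoming, xs):
                    w *= m[x]
                mu += w
            m0[x0] *= mu
    z = sum(m0)  # plays the role of 1/C
    return [v / z for v in m0]

def parity(x1, x2, x0):
    """Hypothetical example factor: indicator of even parity."""
    return 1.0 if (x1 ^ x2 ^ x0) == 0 else 0.0

m0 = merge_rooted_trees([
    (parity, [[0.9, 0.1], [0.8, 0.2]]),   # factor a with m_1, m_2
    (parity, [[0.5, 0.5], [0.5, 0.5]]),   # factor b with m_3, m_4
])
```

In this example the uninformative messages on branch b drop out, so the root message is fixed by branch a alone.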


Page 109: Statistical Physics Tools in Information Science - Stanford University

Belief propagation = iteration of mean field equations on one instance

$$m_{i\to a}(x_i) = C \prod_{b\in V(i)\setminus a} \mu_{b\to i}(x_i)$$

$$\mu_{a\to i}(x_i) = \sum_{\{x_j\},\, j\in V(a)\setminus i} f_a(x_i, \{x_j\}) \prod_{j\in V(a)\setminus i} m_{j\to a}(x_j)$$

Marginal on $i$ (“belief”): $p_i(x_i) = C \prod_{b\in V(i)} \mu_{b\to i}(x_i)$

Marginal around node $a$: $P_a(X_a) = C\, f_a(X_a) \prod_{j\in V(a)} m_{j\to a}(x_j)$

Entropy (exact on tree):

$$P(x) \simeq C \prod_a P_a(X_a) \prod_i p_i(x_i)^{1-d_i}\,; \qquad S = -\sum_x P(x) \log P(x)$$
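The two update equations can be iterated on a small instance and compared with exact enumeration. The sketch below assumes binary variables and a tree-shaped factor graph, on which BP is exact; the names `run_bp` and `brute_force_marginals` and the example factors are hypothetical.

```python
import itertools
import math

def run_bp(num_vars, factors, iters=20):
    """factors: list of (variables, function) pairs over binary variables."""
    edges = [(a, i) for a, (vs, _) in enumerate(factors) for i in vs]
    m = {(i, a): [0.5, 0.5] for a, i in edges}    # variable -> factor
    mu = {(a, i): [1.0, 1.0] for a, i in edges}   # factor -> variable
    nbrs = {i: [a for a, j in edges if j == i] for i in range(num_vars)}
    for _ in range(iters):
        for a, (vs, f) in enumerate(factors):
            for i in vs:
                others = [j for j in vs if j != i]
                out = [0.0, 0.0]
                for xi in (0, 1):
                    for xs in itertools.product((0, 1), repeat=len(others)):
                        assign = dict(zip(others, xs))
                        assign[i] = xi
                        w = f(*(assign[v] for v in vs))
                        for j, xj in zip(others, xs):
                            w *= m[(j, a)][xj]
                        out[xi] += w
                mu[(a, i)] = out
        for i in range(num_vars):
            for a in nbrs[i]:
                out = [math.prod(mu[(b, i)][x] for b in nbrs[i] if b != a)
                       for x in (0, 1)]
                z = sum(out)
                m[(i, a)] = [v / z for v in out]
    beliefs = []
    for i in range(num_vars):
        p = [math.prod(mu[(b, i)][x] for b in nbrs[i]) for x in (0, 1)]
        z = sum(p)
        beliefs.append([v / z for v in p])
    return beliefs

def brute_force_marginals(num_vars, factors):
    marg = [[0.0, 0.0] for _ in range(num_vars)]
    for x in itertools.product((0, 1), repeat=num_vars):
        w = math.prod(f(*(x[v] for v in vs)) for vs, f in factors)
        for i, xi in enumerate(x):
            marg[i][xi] += w
    z = sum(marg[0])
    return [[v / z for v in row] for row in marg]

# Hypothetical tree: unary factor on 0, pair factors on (0,1), (1,2), (1,3).
TREE = [
    ((0,), lambda x0: [0.7, 0.3][x0]),
    ((0, 1), lambda x0, x1: 2.0 if x0 == x1 else 1.0),
    ((1, 2), lambda x1, x2: 3.0 if x1 != x2 else 1.0),
    ((1, 3), lambda x1, x3: 1.5 if x1 == x3 else 0.5),
]
bp_beliefs = run_bp(4, TREE)
exact = brute_force_marginals(4, TREE)
```

On a graph with loops the same iteration can still be run, but the beliefs are then only approximations and convergence is not guaranteed.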


Page 112: Statistical Physics Tools in Information Science - Stanford University

Statistical analysis

Factor graph ensembles:
1- Random regular graph: local environment = regular tree for almost all points → the measure should be translationally invariant: $m = F(m, m, m, m)$.
2- Erdős-Rényi graph: $P(m)$ = probability that $m_i = m$, when $i$ is taken at random in the graph with uniform probability. $k$ neighbours, Poisson distributed. $m_0 = F(m_1, \dots, m_k)$ → integral equation for $P(m)$, easily solved numerically.

Page 113: Statistical Physics Tools in Information Science - Stanford University

Example: matching

Edge $i$: $s_i \in \{0, 1\}$.
Matching: constraint on each vertex $a$: $\sum_{i\in V(a)} s_i \le 1$.
Energy $E(s)$ = number of unmatched vertices.
Probability: $P(s) = \frac{1}{Z} \exp(-\beta E(s))$


Page 115: Statistical Physics Tools in Information Science - Stanford University

BP equations in the matching problem

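$$\psi_a(s) = \mathbb{I}\Big(\sum_{i\in V(a)} s_i \le 1\Big)\, e^{-\beta\,(1 - \sum_{i\in V(a)} s_i)}$$

BP equations:

[Figure: variable nodes i, j and factor nodes a, b]

$$m_{i\to a}(s_i = 1) = \prod_{j\in\partial b - i} m_{j\to b}(s_j = 0)$$

$$m_{i\to a}(s_i = 0) = e^{-\beta} \prod_{j\in\partial b - i} m_{j\to b}(s_j = 0) + \sum_{j\in\partial b - i} m_{j\to b}(s_j = 1) \prod_{k\in\partial b - \{i,j\}} m_{k\to b}(s_k = 0)$$

Closed set of equations for $h_{i\to a} = -\frac{1}{\beta} \log \frac{m_{i\to a}(0)}{m_{i\to a}(1)}$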

Page 116: Statistical Physics Tools in Information Science - Stanford University

BP equations in the matching problem

$$h_{i\to a} = -\frac{1}{\beta} \log\Big[ e^{-\beta} + \sum_{j\in b - i} e^{\beta h_{j\to b}} \Big] = F(h_{1\to b}, h_{2\to b}, h_{3\to b})$$

Statistical analysis:

1: $r$-regular random graph: $h = \frac{1}{\beta} \log\Big[ \frac{\sqrt{4(r-1) + e^{-2\beta}} - e^{-\beta}}{2(r-1)} \Big]$
2: Erdős-Rényi graph: $P(h)$, solution of a simple integral equation

→ entropy $S(\beta) = \frac{1}{N}\, \mathbb{E} \log[1 + \mathcal{N}]$,

→ size of the matching $x(\beta) = \frac{\text{Number of matched vertices}}{N}$
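Both statistical analyses can be sketched numerically. The block below iterates the cavity map $F$ on an $r$-regular graph and compares it with the closed form above, then runs a small population-dynamics loop for the Erdős-Rényi case. All function names are hypothetical, and the population-dynamics part assumes that the number of other edges at a vertex is Poisson distributed with mean `c`; it is a rough sketch of the idea, not a tuned solver.

```python
import math
import random

def cavity_map(hs, beta):
    """F: new field from the fields h_{j->b} of the other edges at b."""
    total = math.exp(-beta) + sum(math.exp(beta * h) for h in hs)
    return -math.log(total) / beta

def regular_closed_form(r, beta):
    num = math.sqrt(4 * (r - 1) + math.exp(-2 * beta)) - math.exp(-beta)
    return math.log(num / (2 * (r - 1))) / beta

def regular_fixed_point(r, beta, iters=200):
    h = 0.0
    for _ in range(iters):
        h = cavity_map([h] * (r - 1), beta)   # h = F(h, ..., h)
    return h

def sample_poisson(mean, rng):
    """Knuth's multiplication method, adequate for small means."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def population_dynamics(c, beta, size=1000, sweeps=20, seed=0):
    """Approximate P(h) by a population of fields (assumed Poisson(c) degrees)."""
    rng = random.Random(seed)
    pop = [0.0] * size
    for _ in range(sweeps * size):
        k = sample_poisson(c, rng)
        hs = [rng.choice(pop) for _ in range(k)]
        pop[rng.randrange(size)] = cavity_map(hs, beta)
    return pop

h_iter = regular_fixed_point(3, 1.0)
h_exact = regular_closed_form(3, 1.0)
population = population_dynamics(2.0, 1.0, size=200, sweeps=5)
```

A histogram of `population` gives an estimate of $P(h)$; larger populations and more sweeps reduce the sampling noise.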

Page 117: Statistical Physics Tools in Information Science - Stanford University

Entropy of matchings: results

$r$-regular random graph: $\mathbb{E} \log \mathcal{N} = \log \mathbb{E}\, \mathcal{N}$, simple explicit formula (Bollobás and McKay 86)

Erdős-Rényi graph:

NB1: Size of the largest matching known from Karp-Sipser 1981

NB2: The cavity method computes $\mathbb{E} \log \mathcal{N}$


Page 120: Statistical Physics Tools in Information Science - Stanford University

How to control this heuristic approach?

One assumption:

$$P(x_1, x_2, x_3, x_4 \mid x_0, a, b \text{ absent}) = m_1(x_1)\, m_2(x_2)\, m_3(x_3)\, m_4(x_4)$$

[Figure: root variable 0, factor nodes a and b, and variables 1, 2, 3, 4 with their messages $m_1(x_1), \dots, m_4(x_4)$, $\mu_a(x_0)$, $\mu_b(x_0)$]

Two conditions:

- 1, 2, 3, 4 should be far away when 0, a, b are absent: OK for broad classes of random graphs

- Correlations should decay at large distances: ?? Depends...

Page 121: Statistical Physics Tools in Information Science - Stanford University

Correlation decay

Cavity = tree

Correlations (mutual information) between root and boundary should decay at large distances, for typical configurations outside the tree.

Sufficient condition (much easier, but too strong): correlations decay for the worst case.

Correlations for the typical case (more difficult) → replica symmetry breaking


Page 123: Statistical Physics Tools in Information Science - Stanford University

“Replica symmetry breaking”

Non-trivial correlations between the root and the boundary

NB1: point-to-set correlation
NB2: not necessarily detected by local stability condition

Random regular graph: $m_0 = F(m_1, \dots, m_4)$

RS solution: $m = F(m, m, m, m)$ (translational invariance)

Modulated solutions: $m_0^\alpha = F(m_1^\alpha, \dots, m_4^\alpha)$


Page 125: Statistical Physics Tools in Information Science - Stanford University

“Replica symmetry breaking 2”

RSB: exponentially many solutions to the BP equations (extremal Gibbs states)

Survey: statistics on the solutions

$\mu^\alpha_{a\to i}(x_i)$: message from $a$ to $i$ in the solution $\alpha$.

$Q_{a\to i}(\mu)$ = probability that the message $\mu^\alpha_{a\to i}$ is equal to $\mu$, when $\alpha$ is chosen at random (with measure $\exp(-\beta x F_\alpha)$).

Random reg. graph: translational invariance recovered with the statistics over the solutions → $Q_{a\to i}(\mu) = Q(\mu)$, which satisfies a self-consistent equation.

Matching: no RSB: $Q(\mu) = \delta(\mu, \mu_{rs})$

In many problems (SAT, colouring, 3-matching, ...): RSB present when the density of constraints is large enough


Page 127: Statistical Physics Tools in Information Science - Stanford University

Random 3-satisfiability

NP-complete (Cook)

Pb: random Boolean formula in conjunctive normal form, three variables per clause, chosen randomly in $\{x_1, \dots, x_N\}$, negated randomly with probability 1/2:
$(x_1 \vee x_{27} \vee x_3) \wedge (x_{11} \vee x_3 \vee x_2) \wedge \dots \wedge (x_9 \vee x_8 \vee x_{30})$

Control parameter: $\alpha = M/N$ = Constraints/Variables.

Numerically: threshold phenomenon at $\alpha_c \sim 4.26$.

Proba(SAT) = 1 when $\alpha < \alpha_c$; Proba(SAT) = 0 when $\alpha > \alpha_c$.

Numerics: Mitchell, Selman, Levesque, Kirkpatrick, Crawford, Auton, ...
Threshold: Friedgut
Bounds: Kaporis, Kirousis, Lalas, Dubois, Boufkhad, ...
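The ensemble is easy to sample, and for very small $N$ satisfiability can be checked exhaustively. The sketch below makes a few assumptions that the slide does not state: the three variables of a clause are taken to be distinct, $M$ is rounded from $\alpha N$, and literals use the signed-integer convention (`-v` means "variable v negated"). The function names are hypothetical.

```python
import itertools
import random

def random_3sat(n, alpha, rng):
    """Sample a random 3-SAT formula with round(alpha * n) clauses."""
    formula = []
    for _ in range(round(alpha * n)):
        variables = rng.sample(range(1, n + 1), 3)   # assumed distinct
        # Each variable is negated with probability 1/2.
        formula.append([v if rng.random() < 0.5 else -v for v in variables])
    return formula

def is_satisfiable(n, formula):
    """Exhaustive search over the 2^n assignments (small n only)."""
    for bits in itertools.product((False, True), repeat=n):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause)
               for clause in formula):
            return True
    return False

def sat_fraction(n, alpha, samples, seed=0):
    """Empirical probability that a random formula is satisfiable."""
    rng = random.Random(seed)
    hits = sum(is_satisfiable(n, random_3sat(n, alpha, rng))
               for _ in range(samples))
    return hits / samples

example = random_3sat(10, 4.26, random.Random(1))
```

Plotting `sat_fraction` against `alpha` for a few small values of `n` gives a smoothed version of the threshold curve; the sharp step only emerges as $N$ grows, far beyond what exhaustive search can reach.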


Page 129: Statistical Physics Tools in Information Science - Stanford University

Threshold phenomenon → Phase transition

[Figure: %SAT (from 0 to 100) versus $\alpha = M/N$ (from 1 to 6), with curves for N = 100 and N = 200 and $\alpha_c$ marked on the axis]

Generically SAT for $\alpha < \alpha_c$

Generically UNSAT for $\alpha > \alpha_c$

Friedgut: → step function

Page 130: Statistical Physics Tools in Information Science - Stanford University

Threshold phenomenon → Phase transition

[Figure: %SAT (from 0 to 100) and computer time versus $\alpha = M/N$ (from 1 to 6), with $\alpha_c$ marked on the axis]

Easy, and generically SAT, for $\alpha < \alpha_c$

Hard, in the region $\alpha \sim \alpha_c$

Easy, generically UNSAT, for $\alpha > \alpha_c$

Page 131: Statistical Physics Tools in Information Science - Stanford University

Statistical physics of the random 3-SAT problem

Monasson, Zecchina, Weigt, Biroli, ..., MM, Parisi, Zecchina: → Phase diagram + New algorithm.

1- Analytic result: discontinuous glass transition

Three phases: Easy-SAT, Hard-SAT, UNSAT

[Figure: phase diagram along the $\alpha = M/N$ axis, with $\alpha_d$ and $\alpha_c = 4.267$ marked. Labels: SAT ($E_0 = 0$), UNSAT ($E_0 > 0$); "1 state, E = 0"; "Many states, E = 0 and E > 0"; "Many states, E > 0"]

2- New algorithm: Survey propagation ($N = 10^7$ at $\alpha = 4.23$)


Page 133: Statistical Physics Tools in Information Science - Stanford University

Simple mean field message passing: warning propagation (Min Sum)

[Figure: clause a connected to variables 1, 2, 3, with the message $u_{a\to 1}$ on the edge to variable 1]

Message $u_{a\to 1} \in \{0, 1\}$ sent from clause a to variable 1

Page 134: Statistical Physics Tools in Information Science - Stanford University

Simple message passing: warning propagation

[Figure: clause a connected to variables 1, 2, 3; the 0/1 warnings arriving at variables 2 and 3 from their other clauses give $u_{a\to 1} = 1$]

Warning $u_{a\to i} = 1$:

“According to the messages I received, you should take the value which satisfies me!”

Page 135: Statistical Physics Tools in Information Science - Stanford University

Simple message passing: warning propagation

[Figure: clause a connected to variables 1, 2, 3; the 0/1 warnings arriving at variables 2 and 3 from their other clauses give $u_{a\to 1} = 0$]

No warning $u_{a\to i} = 0$:

“No problem, take any value!”

Warning propagation (= 'Min Sum') converges and gives the correct answer on a tree: SAT iff no contradictory message.
On a real random 3-SAT instance: limited to $\alpha < 3.9$. Cannot get close to the SAT-UNSAT transition.
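The warning rule described on these slides can be sketched for formulas given as lists of signed integers. This is an illustrative implementation under assumptions: the update rule used is the usual one in which clause a warns variable i only when every other variable of a is pushed, by its other clauses, toward the value that violates a; each variable is assumed to appear at most once per clause; and the function name is hypothetical.

```python
def warning_propagation(clauses, max_iters=100):
    """clauses: lists of signed ints (v: variable v, -v: its negation).

    Returns (u, consistent), where u[(a, v)] is the warning from clause a
    to variable v and consistent is False if some variable receives
    contradictory warnings.
    """
    u = {(a, abs(l)): 0 for a, c in enumerate(clauses) for l in c}
    occ = {}   # variable -> list of (clause, sign of its literal there)
    for a, c in enumerate(clauses):
        for l in c:
            occ.setdefault(abs(l), []).append((a, 1 if l > 0 else -1))
    for _ in range(max_iters):
        changed = False
        for a, c in enumerate(clauses):
            for l in c:
                i, new = abs(l), 1
                for other in c:
                    j = abs(other)
                    if j == i:
                        continue
                    want = 1 if other > 0 else -1
                    # Cavity field on j: net push from the clauses b != a.
                    h = sum(s * u[(b, j)] for b, s in occ[j] if b != a)
                    if h * want >= 0:   # j is not forced against clause a
                        new = 0
                        break
                if new != u[(a, i)]:
                    u[(a, i)] = new
                    changed = True
        if not changed:
            break
    contradiction = any(
        any(s > 0 and u[(b, v)] for b, s in occ[v])
        and any(s < 0 and u[(b, v)] for b, s in occ[v])
        for v in occ
    )
    return u, not contradiction

# Tree-like examples: (x1 or x2) and (not x1) and (not x2 or x3); x1 and (not x1).
u_sat, ok_sat = warning_propagation([[1, 2], [-1], [-2, 3]])
u_unsat, ok_unsat = warning_propagation([[1], [-1]])
```

In the first example the unit clause forces $x_1 = 0$, which triggers a warning to $x_2$ and then to $x_3$, with no contradiction; in the second, variable 1 receives two opposite warnings.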

Page 136: Statistical Physics Tools in Information Science - Stanford University

Replica symmetry breaking

Minimum Energy Configurations (MEC): energy cannot be lowered by a finite number of flips

State/Cluster = { MEC connected by finite flips } → one fixed point of WP

Proliferation of states:

At $\alpha > \alpha_d$, many states:

$$\mathcal{N}(E) \sim \exp\Big( N\, \Sigma\Big(\frac{E}{N}\Big) \Big)$$

[Figure: complexity $\Sigma$ versus $E/N$, with curves for $\alpha_d < \alpha < \alpha_c$, $\alpha = \alpha_c$ and $\alpha_c < \alpha$; the threshold energy $e_{th}$ is marked]

$\Sigma(0)$ → clusters of SAT configurations
$\Sigma(e_{th})$ → metastable clusters

Page 137: Statistical Physics Tools in Information Science - Stanford University

From warning propagation to survey propagation

RSB: assume many states: $\mathcal{N}(E) \sim \exp\big( N\, \Sigma(E/N) \big)$

Message = survey of the elementary warnings in the various states:

$\eta_{a\to i}$ = probability of a warning being sent from constraint a to variable i, when a state is picked at random.

→ Propagate the surveys along the graph. Converges!

→ Results on the phase diagram and the complexity, from the statistical analysis of the distribution of surveys in a generic sample.

→ Information on a single sample: a local field on each variable → new algorithmic strategies


Page 139: Statistical Physics Tools in Information Science - Stanford University

Survey propagation

[Figure: clause a connected to variables 1, 2, 3, with $\eta_{a\to 1}$ = Prob(warning); a clause b sends the survey $\eta_{b\to 2}$ to variable 2]

$\eta_{a\to 1}$: known exactly from the surveys of incoming warnings.

Page 140: Statistical Physics Tools in Information Science - Stanford University

Statistical analysis of the SP equations in random K-SAT: phase diagram

Thresholds from an integral equation, solved numerically or through a large-K asymptotic expansion.

$\alpha_c$: SAT-UNSAT threshold.

$\alpha_d$: Onset of clustering → clusters with frozen variables.

K     α_d      α_c       α_c^(7)
3     3.93     4.2667    4.307
4     8.30     9.931     9.938
5     16.1     21.117    21.118
6     30.5     43.37     43.372
7     57.2     87.79     87.785
8     107.2    -         176.543
9     201.3    -         354.010
10    379.1    -         708.915

$\alpha_c$ is conjectured to be exact (not $\alpha_d$).

Page 141: Statistical Physics Tools in Information Science - Stanford University

Using the surveys : local field

In one given cluster of solutions, $\alpha$:

$$H_j^\alpha = \sum_a u_{a\to j}$$

$H_j^\alpha > 0$: number of warnings telling “$x_j$ should be one”

$H_j^\alpha < 0$: number of warnings telling “$x_j$ should be zero”

$H_j^\alpha = 0$: no warning

→ Survey of local field.

$P_j(H)$ = probability that $H_j^\alpha = H$ when $\alpha$ is chosen at random.

[Figure: histogram $P(H)$ for H from −3 to 3, with total weights $W_-$ (H < 0), $W_0$ (H = 0) and $W_+$ (H > 0)]

Some types of variables:

Balanced: $W_\pm \simeq 1/2$, $W_0 \simeq 0$

Polarized: $W_+ \simeq 1$ or $W_- \simeq 1$

Underconstrained: $W_0 \simeq 1$

Page 142: Statistical Physics Tools in Information Science - Stanford University

Survey Inspired Decimation

Biased variable $W_+^i \simeq 1$: in almost all clusters of solutions, $x_i = 1$.

→ Fix $x_i = 1$

SID algorithm: Iterate:

Run SP until convergence.

Find the most biased variable, i.e. $i$ such that $|W_+^i - W_-^i|$ is maximal.

Fix it to $x_i = 1$ if $W_+^i > W_-^i$, to $x_i = 0$ if $W_+^i < W_-^i$, and simplify the formula.

Two possible ends: 1) all variables are fixed; 2) the formula is reduced to a stage where all $W_0^i = 1$: an underconstrained problem, easily solved by e.g. simulated annealing or Walksat.

Solves: $10^7$ variables at $\alpha \simeq 4.2 - 4.25$. Time $O(N^2)$, reduced to $O(N)$ by fixing a fraction of the variables.
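The selection step of the loop above can be isolated in a few lines. The sketch assumes that a converged SP run has already produced, for every free variable, a triple $(W_+, W_-, W_0)$; the SP run itself, the formula simplification and the final local search are not shown. The function name and the tolerance are hypothetical.

```python
def pick_decimation_move(surveys, tol=1e-9):
    """surveys: dict variable -> (w_plus, w_minus, w_zero).

    Returns (variable, value) for the most biased variable, or None when
    every variable is unconstrained (all w_zero equal to 1 within tol),
    which is the point where a local search would take over.
    """
    if all(w_zero >= 1.0 - tol for _, _, w_zero in surveys.values()):
        return None
    best = max(surveys, key=lambda v: abs(surveys[v][0] - surveys[v][1]))
    w_plus, w_minus, _ = surveys[best]
    return best, (1 if w_plus > w_minus else 0)

# Hypothetical surveys for three variables.
move = pick_decimation_move({
    1: (0.50, 0.45, 0.05),   # balanced
    2: (0.05, 0.90, 0.05),   # polarized toward 0
    3: (0.00, 0.00, 1.00),   # underconstrained
})
```

Here variable 2 is selected and fixed to 0. A full decimation loop would then simplify the formula and rerun SP before calling the function again.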

Page 143: Statistical Physics Tools in Information Science - Stanford University

Survey decimation example

Number of clusters of assignments which violate E clauses: $e^{\Sigma(E)}$

N = 10000, plot every 500 decimation steps

[Figure: $\Sigma$ versus $E$, one curve per plotted stage of the decimation process]

Page 144: Statistical Physics Tools in Information Science - Stanford University

Glass phase in LDPC codes

Binary Symmetric Channel, flip probability $p$

Complexity of the landscape (configurations on the sphere):

$$\Sigma(e) = \frac{1}{N} \log \mathcal{N}(E = Ne)$$

[Figure: $\Sigma$ versus $e$ for the (6,5) regular code, with curves for p = .155, p = .2, p = $p_c$ and p = .3; $p_d$ = .139, $p_c$ = .264]

$p_d$ = threshold BP decoding

$p_c$ = threshold optimal decoding


Page 146: Statistical Physics Tools in Information Science - Stanford University

Miscellaneous comments

General approach to many constraint satisfaction networks, when the factor graph has a local tree structure (large girth)

Simple case (low density of constraints): RS cavity method OK, e.g. decoding with belief propagation at low enough noise

Increasing density: 1RSB: many pure states → statistical physics in the space of pure states. Phase diagram for K-SAT, q-colouring, LDPC codes...

Generic picture: SAT, Hard-SAT (clusters), UNSAT

[Figure: phase diagram along the $\alpha = M/N$ axis, with $\alpha_d$ and $\alpha_c = 4.267$ marked. Labels: SAT ($E_0 = 0$), UNSAT ($E_0 > 0$); "1 state, E = 0"; "Many states, E = 0 and E > 0"; "Many states, E > 0"]

Page 147: Statistical Physics Tools in Information Science - Stanford University

Miscellaneous comments

Always “tree computations” (= iterative mapping of pdf), but with different interpretations

Algorithmic implementation (single instance): belief propagation, survey propagation. Very powerful

Statistical analysis: typical samples, typical configurations, viewed from a typical point: phase diagrams

Some predictions are rigorously confirmed (weighted matching, clusters in the hard SAT phase, satisfiability threshold as upper bound...).

Page 148: Statistical Physics Tools in Information Science - Stanford University

Appendix 1: Survey propagation equations

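[Figure: clause a connected to variables 1, 2, 3, with $\eta_{a\to 1}$ = Prob(warning); a clause b sends the survey $\eta_{b\to 2}$ to variable 2; the clause sets U, V, W, X are marked around variables 2 and 3]

$$\pi_2^+ = \prod_{b\in U} (1 - \eta_{b\to 2}), \qquad \pi_2^- = \prod_{b\in V} (1 - \eta_{b\to 2})$$

P(no contrad): $\pi_2^+ + \pi_2^- - \pi_2^+ \pi_2^-$

$$q_2 \equiv \mathrm{Prob}(x_2 = 1) = \frac{\pi_2^- (1 - \pi_2^+)}{\pi_2^+ + \pi_2^- - \pi_2^+ \pi_2^-}$$

$$q_3 \equiv \mathrm{Prob}(x_3 = 0) = \frac{\pi_3^+ (1 - \pi_3^-)}{\pi_3^+ + \pi_3^- - \pi_3^+ \pi_3^-}$$

$$\eta_{a\to 1} = q_2\, q_3$$

Survey propagation: statistical analysis, or single sample → algorithms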
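The update of a single survey follows directly from these formulas. In the sketch below, each neighbouring variable of clause a is described by two lists of incoming surveys: those from clauses that push it toward the value violating a, and those from clauses that push it toward the value satisfying a. Which of the sets U, V, W, X plays which role is read off the formulas for $q_2$ and $q_3$; the function names and argument layout are hypothetical.

```python
import math

def violating_probability(eta_against, eta_for):
    """Probability that a variable is forced to the value violating clause a.

    eta_against: surveys from clauses pushing the variable against a.
    eta_for: surveys from clauses pushing it toward satisfying a.
    Raises ZeroDivisionError if contradictions are certain.
    """
    pi_against = math.prod(1.0 - e for e in eta_against)  # no warning against a
    pi_for = math.prod(1.0 - e for e in eta_for)          # no warning for a
    no_contradiction = pi_against + pi_for - pi_against * pi_for
    # At least one warning against a, none for a, conditioned on consistency.
    return pi_for * (1.0 - pi_against) / no_contradiction

def survey_update(neighbours):
    """eta_{a->i}: product of q_j over the other variables j of clause a."""
    return math.prod(violating_probability(against, favour)
                     for against, favour in neighbours)

# Hypothetical incoming surveys for the two other variables of clause a.
q_example = violating_probability([0.5], [])
eta_a1 = survey_update([([0.5], []), ([0.5, 0.5], [0.2])])
```

With these numbers the first variable gives $q = 1/2$ and the second $q = 0.6/0.85$, so the outgoing survey is $6/17$.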

Page 149: Statistical Physics Tools in Information Science - Stanford University

Appendix 2: Origins of the cavity method

1975: Definition of the SK model of spin glasses, $E = -\sum_{ij} J_{ij}\, s_i s_j$
1979: Parisi solution of this model with replicas
1986: An alternative approach: the cavity method (M, Parisi, Virasoro). Direct probabilistic approach, based on N → N + 1 but using $N \gg 1$. Equivalent to the replica approach.
2001: A new version of the cavity method to handle 'finite connectivity' problems (M, Parisi)
2002: Applications to XORSAT, K-SAT, colouring... → phase diagrams (thresholds) and algorithms (survey propagation).
2003: Rigorous confirmation of Parisi's solution for the SK model (Talagrand, Guerra)