Statistical Physics Tools in Information Science

Marc Mezard (1) and Andrea Montanari (2)

(1) Universite de Paris Sud and (2) Stanford University

June 23, 2007

Structure of the presentation

Andrea: What is statistical physics and why should you care.

Marc: Two test cases: (1) counting matchings, (2) random k-SAT.

Ask whatever you want!

Andrea: What is the statistical physics we do and . . . .

Sources

General: → M. Mezard and A. Montanari, ‘Information, Physics and Computation,’ upcoming book, see our web pages

Random k-SAT: → M. Mezard, G. Parisi, and R. Zecchina, ‘Analytic and Algorithmic Solution of Random Satisfiability Problems,’ Science

→ F. Krzakala, A. Montanari, F. Ricci-Tersenghi, G. Semerjian, and L. Zdeborova, ‘Gibbs States and the Set of Solutions of Random Constraint Satisfaction Problems,’ PNAS

Coding: → A. Montanari and R. Urbanke, ‘Modern Coding Theory: The Statistical Mechanics and Computer Science Point of View,’ Lecture

General graphical models: → google ee374

Outline

1 Problems

2 Methods

3 Results

4 The cavity method at work

5 Mean Field (BP) on graphical models

6 Matching

7 K-SAT

8 Appendices

Problems

Probabilistic description of a physical system

State: $x = (x_1, \dots, x_N)$, $x_i \in \mathcal{X}$

Inverse temperature: $\beta$

Energy: $E : x \mapsto E(x) \in \mathbb{R}$

(Boltzmann) probability distribution:

$$\mu(x) = \frac{1}{Z} \exp\{-\beta E(x)\}.$$

Dropping the temperature (absorb $\beta$ into $E$):

$$\mu(x) = \frac{1}{Z} \exp\{-E(x)\}.$$

Dropping the energy (general weight $w(x) \ge 0$), what remains is a probability distribution:

$$\mu(x) = \frac{1}{Z}\, w(x).$$

What is left? An example

$L \times L$ grid: $G = (V, E)$, $x_i \in \mathcal{X} = \{0, 1\}$, $i \in V$

$$\mu(x) = \frac{1}{Z(\lambda; G)}\, \lambda^{|x|}\, \mathbb{I}\{x \text{ is an independent set}\}.$$

What is left? Locality

$L \times L$ grid: $G = (V, E)$, $x_i \in \mathcal{X} = \{0, 1\}$, $i \in V$

$$\mu(x) = \frac{1}{Z(\lambda; G)} \prod_{i \in V} \lambda^{x_i} \prod_{(ij) \in E} \mathbb{I}\{(x_i, x_j) \neq (1, 1)\}.$$
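The local product form can be checked by brute force on a tiny grid. The sketch below is only an illustration: the helper names (`grid_edges`, `local_weight`, `partition_function`) are hypothetical, and exhaustive enumeration is feasible only for very small L.

```python
from itertools import product


def grid_edges(L):
    """Edges of the L x L grid; vertex (row, col) gets index row * L + col."""
    edges = []
    for row in range(L):
        for col in range(L):
            v = row * L + col
            if col + 1 < L:
                edges.append((v, v + 1))  # right neighbor
            if row + 1 < L:
                edges.append((v, v + L))  # bottom neighbor
    return edges


def local_weight(x, edges, lam):
    """prod_i lam^{x_i} * prod_{(ij) in E} I{(x_i, x_j) != (1, 1)}."""
    if any(x[i] == 1 and x[j] == 1 for i, j in edges):
        return 0.0
    return float(lam) ** sum(x)


def partition_function(L, lam):
    """Z(lam; G) by exhaustive enumeration (assumption: L is tiny)."""
    edges = grid_edges(L)
    return sum(local_weight(x, edges, lam) for x in product((0, 1), repeat=L * L))


if __name__ == "__main__":
    for L in (2, 3):
        print(L, partition_function(L, 1.0))
```

At λ = 1 the partition function simply counts independent sets: 7 on the 2 × 2 grid and 63 on the 3 × 3 grid.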

A more abstract version of locality

$G = (V, E)$, $V = [N]$, $x = (x_1, \dots, x_N) \in \{0, 1\}^V$

$$\mu(x) = \frac{1}{Z(\lambda; G)} \prod_{i \in V} \lambda^{x_i} \prod_{(ij) \in E} \mathbb{I}\{(x_i, x_j) \neq (1, 1)\}.$$

[Figure: graph with variable nodes labeled x2, x3, x4, x7, x8, x9]

$G = (V, E)$, $V = [N]$, $x = (x_1, \dots, x_N) \in \mathcal{X}^N$

$$\mu(x) = \frac{1}{Z} \prod_{(ij) \in E} \psi_{ij}(x_i, x_j).$$

Statistical mechanics questions: I. Qualitative

What does a typical configuration sampled from µ look like?

Disordered versus Ordered

Liquid versus Solid

Statistical mechanics questions: II. Quantitative

$L \times L$ grid: $N = L^2$

Compute (for $N$ large)

$$\phi_N(\lambda) = \frac{1}{N} \log Z(\lambda; G), \qquad Z(\lambda; G) = \sum_{x \in \mathrm{IS}(G)} \lambda^{|x|}.$$

Isn’t Z just an irrelevant normalization constant?

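$$
\begin{aligned}
H(X) &= -\sum_x \mu(x) \log \mu(x) \\
&= \log Z(\lambda; G) - \sum_{x \in \mathrm{IS}(G)} \mu(x)\, |x| \log \lambda \\
&= \log Z(\lambda; G) - \log \lambda \sum_{i \in V} \langle x_i \rangle_G \\
&= \log Z(\lambda; G) - \log \lambda\, \frac{\partial \log Z(\lambda; G)}{\partial \log \lambda}
\end{aligned}
$$

[this relation is completely general]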
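The entropy identity can be verified numerically on a small graph. This is a minimal sketch, assuming a brute-force enumeration of independent sets and a central finite difference for the derivative with respect to log λ; all function names are hypothetical.

```python
import math
from itertools import product


def independent_sets(n, edges):
    """Yield all independent sets of a graph on vertices 0..n-1 as 0/1 tuples."""
    for x in product((0, 1), repeat=n):
        if not any(x[i] and x[j] for i, j in edges):
            yield x


def log_Z(n, edges, lam):
    return math.log(sum(lam ** sum(x) for x in independent_sets(n, edges)))


def entropy_direct(n, edges, lam):
    """H(X) = -sum_x mu(x) log mu(x), with mu(x) = lam^{|x|} / Z."""
    logz = log_Z(n, edges, lam)
    H = 0.0
    for x in independent_sets(n, edges):
        log_mu = sum(x) * math.log(lam) - logz
        H -= math.exp(log_mu) * log_mu
    return H


def entropy_from_log_Z(n, edges, lam, eps=1e-5):
    """log Z - log(lam) * d log Z / d log(lam), derivative by central difference."""
    t = math.log(lam)
    dlogz = (log_Z(n, edges, math.exp(t + eps))
             - log_Z(n, edges, math.exp(t - eps))) / (2 * eps)
    return log_Z(n, edges, lam) - t * dlogz


cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]  # the 2 x 2 grid
H1 = entropy_direct(4, cycle4, 2.0)
H2 = entropy_from_log_Z(4, cycle4, 2.0)
```

On the 2 × 2 grid at λ = 2, Z = 17 and both routes give H = log 17 − (24/17) log 2, up to the finite-difference error.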

Questions I and II are related!

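$$\Delta(x) = \sum_{i \in \mathrm{EVEN}} x_i - \sum_{i \in \mathrm{ODD}} x_i$$

$$\phi_N(\lambda, \delta) = \frac{1}{N} \log Z(\lambda, \delta; G), \qquad Z(\lambda, \delta; G) = \sum_{x \in \mathrm{IS}(G):\ \Delta(x) = N\delta} \lambda^{|x|}$$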
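The restricted partition function can be tabulated by brute force on a small grid. The sketch below assumes a checkerboard split of the grid into EVEN and ODD sites and groups independent sets by the integer value of ∆(x); the function name is hypothetical.

```python
from collections import defaultdict
from itertools import product


def restricted_partition_functions(L, lam):
    """Return {Delta: sum of lam^{|x|} over independent sets x with Delta(x) = Delta}."""
    n = L * L
    edges = []
    for v in range(n):
        row, col = divmod(v, L)
        if col + 1 < L:
            edges.append((v, v + 1))
        if row + 1 < L:
            edges.append((v, v + L))
    # Checkerboard colouring: +1 on EVEN sites, -1 on ODD sites.
    sign = [1 if (v // L + v % L) % 2 == 0 else -1 for v in range(n)]
    Z = defaultdict(float)
    for x in product((0, 1), repeat=n):
        if any(x[i] and x[j] for i, j in edges):
            continue  # not an independent set
        delta = sum(s * xi for s, xi in zip(sign, x))
        Z[delta] += float(lam) ** sum(x)
    return dict(Z)


Z_by_delta = restricted_partition_functions(4, 1.0)
```

Taking `log(Z_by_delta[d]) / N` at `d = N * delta` gives φN(λ, δ). Summing the entries recovers the unrestricted Z(λ; G), and on the 4 × 4 grid the table is symmetric under ∆ → −∆.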

[Figure: φN(λ, δ) plotted on a horizontal axis running from −2 to 2; first panel labeled ‘Liquid’, second panel with a bottleneck marked by ↑]

Theorem (Mossel/Weitz/Wormald/06)

On a random sparse bipartite graph, B = Θ(1) with high probability for λ > λ∗.

Similar theorem for Ising models [A. Gerschenfeld/AM/07]

An artistic view of µ in the solid phase

[Figure: sketch of µ along the δ axis, with δ = 0 marked]

What about non-bipartite graphs?

Frustration

No ‘simple ordering’ ⇒ Solid amorphous state?

[Solid + Amorphous = Glass]

How do you define ‘solid’?

$i \in V$

$B(i, r)$: ball of radius $r$ around $i$

$x_{\sim i, r} = \{x_j : j \notin B(i, r)\}$

Liquid: $I(X_i; X_{\sim i, r}) \to 0$ as $r \to \infty$

Solid: $I(X_i; X_{\sim i, r}) \to I_\infty > 0$ as $r \to \infty$
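The liquid criterion can be illustrated by brute force on a one-dimensional example. The sketch below assumes the hard-core model on a path graph (an illustrative choice, not a graph from the slides) and computes the mutual information between X_0 and all variables at distance greater than r; the function name is hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def mutual_information(n, lam, r):
    """I(X_0; X_{~0,r}) in nats, hard-core model on the path 0 - 1 - ... - (n-1)."""
    joint = defaultdict(float)  # (x_0, far configuration) -> unnormalized weight
    for x in product((0, 1), repeat=n):
        if any(x[i] and x[i + 1] for i in range(n - 1)):
            continue  # not an independent set
        far = x[r + 1:]  # vertices at distance > r from vertex 0
        joint[(x[0], far)] += float(lam) ** sum(x)
    Z = sum(joint.values())
    p_near = defaultdict(float)
    p_far = defaultdict(float)
    for (a, b), w in joint.items():
        p_near[a] += w / Z
        p_far[b] += w / Z
    return sum(
        (w / Z) * math.log((w / Z) / (p_near[a] * p_far[b]))
        for (a, b), w in joint.items()
    )


mi_by_radius = [mutual_information(10, 1.0, r) for r in range(1, 9)]
```

The values in `mi_by_radius` are non-increasing in r, as they must be since X_{∼i,r+1} is a sub-collection of X_{∼i,r}; on the path they shrink towards zero, the liquid behaviour.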

Methods

Mean Field Methods

Mean field ******

Mean field methods: A family of techniques for approximate calculations in statistical mechanics and graphical models. [1]

Mean field models: A class of models on which mean field methods are asymptotically exact in the large system limit.

[1] And more: Markov chains, queuing theory, stochastic networks, etc.

The simplest mean field calculation

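$\mu_A(\,\cdot\,)$: marginal of $X_A$, $A \subseteq V$

$$
\begin{aligned}
\mu_i(1) &= \sum_{x_{\partial i}} \mu_{i|\partial i}(1 \mid x_{\partial i})\, \mu_{\partial i}(x_{\partial i}) = \frac{\lambda}{1 + \lambda}\, \mu_{\partial i}(0) \\
&\approx \frac{\lambda}{1 + \lambda} \prod_{j \in \partial i} \mu_j(0) = \frac{\lambda}{1 + \lambda} \prod_{j \in \partial i} (1 - \mu_j(1))
\end{aligned}
$$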
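The approximation can be turned into a fixed-point iteration and compared with exact marginals on a small graph. This is a sketch under stated assumptions: the damped parallel update and the function names are illustrative choices, not taken from the slides.

```python
from itertools import product


def naive_mean_field(n, edges, lam, iters=500, damping=0.5):
    """Iterate mu_i(1) = lam/(1+lam) * prod_{j in di} (1 - mu_j(1)) with damping."""
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    p = [0.0] * n
    for _ in range(iters):
        update = []
        for i in range(n):
            prod_empty = 1.0  # mean field estimate of mu_{di}(0)
            for j in nbrs[i]:
                prod_empty *= 1.0 - p[j]
            update.append(lam / (1.0 + lam) * prod_empty)
        p = [damping * old + (1.0 - damping) * new for old, new in zip(p, update)]
    return p


def exact_marginals(n, edges, lam):
    """Exact mu_i(1) by enumerating all independent sets."""
    Z = 0.0
    occ = [0.0] * n
    for x in product((0, 1), repeat=n):
        if any(x[i] and x[j] for i, j in edges):
            continue
        w = float(lam) ** sum(x)
        Z += w
        for i in range(n):
            occ[i] += w * x[i]
    return [o / Z for o in occ]


cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
mf = naive_mean_field(4, cycle4, 1.0)
exact = exact_marginals(4, cycle4, 1.0)
```

On the 4-cycle at λ = 1 the mean field value is 2 − √3 ≈ 0.268, against the exact 2/7 ≈ 0.286.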

Solving the equations

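Bipartite, degree $k + 1$; assume

$$\mu_i(1) = \begin{cases} p_1 & \text{if } i \in \mathrm{EVEN}, \\ p_2 & \text{if } i \in \mathrm{ODD}. \end{cases}$$

Then the MF equations are

$$p_1 = f_\lambda(p_2), \qquad p_2 = f_\lambda(p_1),$$

where $f_\lambda(x) = \lambda (1 + \lambda)^{-1} (1 - x)^{k+1}$.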
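The two coupled equations can be solved by direct iteration. The sketch below assumes a parallel update started from an asymmetric point so that a symmetry-broken solution, if stable, can be reached; the function names and starting values are illustrative.

```python
def f(x, lam, k):
    """f_lambda(x) = lam / (1 + lam) * (1 - x)^(k + 1)."""
    return lam / (1.0 + lam) * (1.0 - x) ** (k + 1)


def solve_mf(lam, k, p1=0.9, p2=0.0, iters=10_000):
    """Iterate p1 <- f(p2), p2 <- f(p1) from an asymmetric starting point."""
    for _ in range(iters):
        p1, p2 = f(p2, lam, k), f(p1, lam, k)
    return p1, p2


liquid = solve_mf(0.5, 2)   # small lambda
solid = solve_mf(10.0, 2)   # large lambda
```

With k = 2 the iteration returns p1 = p2 at λ = 0.5 (liquid) and p1 ≠ p2 at λ = 10 (solid, one sublattice preferentially occupied).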

Solving the equations (continued)

[Figure: plot with both axes running from 0 to 0.7]

Liquid vs Solid

The family of mean field approximations

Method           Basic intuition                       Asymptotically exact for
Naive mf         Neglects correlations                 Some dense G's
Bethe-Peierls    ‘Nearest neighbors’ correlations      Some sparse random G's
Cavity [2]       As BP + Glassy states                 ‘Any’ sparse random G
Kikuchi [3]      Short loops / Non-perturbative        ???
Loop corr. [4]   Loops / Perturbative                  ***

[2] Mezard/Parisi, . . .
[3] Kikuchi, Yedidia/Freeman/Weiss
[4] AM/Rizzo, Parisi/Slanina, Chernyak/Chertkov

Method           Basic intuition                       Algorithmic version
Naive mf         Neglects correlations                 Mean field
Bethe-Peierls    ‘Nearest neighbors’ correlations      Belief Propagation
Cavity [5]       As BP + Glassy states                 Survey Propagation
Kikuchi [6]      Short loops / Non-perturbative        Generalized BP
Loop corr. [7]   Loops / Perturbative                  Loop corr. BP

[5] Mezard/Parisi, . . .
[6] Kikuchi, Yedidia/Freeman/Weiss
[7] AM/Rizzo, Parisi/Slanina, Chernyak/Chertkov

‘Any sparse random graph?’

Caveats

Many indications (rigorous and non-rigorous), but no proof.

‘Sparse random graph’ is a bit vague.

One can define a family of ensembles.

Factor graph $G = (V, F, E)$

Variable nodes: $x_i \in \mathcal{X}$, $i \in V$

Factor nodes: $a \in F$, e.g. $\psi_a(x_5, x_7, x_9, x_{10})$

$$\mu(x) = \frac{1}{Z} \prod_{a \in F} \psi_a(x_{\partial a})$$

$$\partial a \equiv \{i \in V : (i, a) \in E\}$$
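A factor graph measure can be represented directly as a list of factors, each with its scope ∂a. The sketch below is illustrative only: the representation, the function name `gibbs_measure`, and the hard-core example are assumptions, and the normalization is done by exhaustive enumeration.

```python
from itertools import product


def gibbs_measure(n, alphabet, factors):
    """Return (mu, Z) for mu(x) = (1/Z) * prod_a psi_a(x_{da}).

    factors: list of (scope, psi), where scope is the tuple of variable
    indices in da and psi is a nonnegative function of those variables.
    """
    weights = {}
    for x in product(alphabet, repeat=n):
        w = 1.0
        for scope, psi in factors:
            w *= psi(*(x[i] for i in scope))
        weights[x] = w
    Z = sum(weights.values())
    return {x: w / Z for x, w in weights.items()}, Z


def hard_core(xi, xj):
    """Pairwise factor I{(x_i, x_j) != (1, 1)}."""
    return 0.0 if (xi, xj) == (1, 1) else 1.0


# Hard-core model at lambda = 1 on the path 0 - 1 - 2, written as a factor graph.
factors = [((0, 1), hard_core), ((1, 2), hard_core)]
mu, Z = gibbs_measure(3, (0, 1), factors)
```

Here Z = 5, the number of independent sets of a three-vertex path.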

Graph ensemble

[Figure: factor nodes grouped by degree (degree 2, degree 3, ..., degree dmax) and variable nodes grouped by degree (degree 2, degree 3, ..., degree dmax), connected through a random permutation π]

[∼ irregular LDPC ensembles]
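One way to read the figure is as a configuration-model construction: each node gets as many edge sockets as its degree, and a uniformly random permutation π matches variable sockets to factor sockets. The sketch below implements that reading as an assumption; the function name and the degree sequences are hypothetical, and multi-edges are not removed.

```python
import random


def sample_factor_graph(var_degrees, factor_degrees, seed=None):
    """Sample edges (variable i, factor a) by matching sockets with a random permutation."""
    if sum(var_degrees) != sum(factor_degrees):
        raise ValueError("total variable degree must equal total factor degree")
    rng = random.Random(seed)
    # One socket per unit of degree.
    var_sockets = [i for i, d in enumerate(var_degrees) for _ in range(d)]
    factor_sockets = [a for a, d in enumerate(factor_degrees) for _ in range(d)]
    pi = list(range(len(var_sockets)))
    rng.shuffle(pi)  # the random permutation pi
    return [(var_sockets[s], factor_sockets[pi[s]]) for s in range(len(pi))]


var_degrees = [2, 2, 2, 3, 3]
factor_degrees = [3, 3, 3, 3]
edges = sample_factor_graph(var_degrees, factor_degrees, seed=0)
```

Every sampled graph has exactly the prescribed degrees; only the wiring is random.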

Compatibility functions ensemble

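Assign, for each $d \in \{1, \dots, d_{\max}\}$, a set of functions

$$\{\psi^{(d,r)} : \underbrace{\mathcal{X} \times \cdots \times \mathcal{X}}_{d} \to \mathbb{R}_+\}_{r = 1, 2, \dots}$$

and a distribution $\{p_d(r)\}$ ($p_d(r) \ge 0$, $\sum_r p_d(r) = 1$).

Then, for each f-node $a$ of degree $d(a)$,

$\psi_a = \psi^{(d(a), r)}$ independently, with probability $p_{d(a)}(r)$.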

The cavity method: A high-level view

0. Cavity method = Replica method

The replica method is formal, while the cavity method makes some probabilistic assumptions.

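1. What does ‘asymptotically exact’ mean?

Partition function:

$$\lim_{N \to \infty} \frac{1}{N} \log Z_N = \phi^{\text{cavity}} \quad \text{almost surely.}$$

Marginals:

$$\lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \|\mu_i - \mu_i^{\text{cavity}}\|_{\mathrm{TV}} = 0 \quad \text{almost surely.}$$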

2. Naive mean field → $\mu_i \approx \nu_i$ (vertex quantities)

Cavity → $\nu_{i \to j}$ (messages)

3. A hierarchy

Std terminology   Cavity jargon   Message space
Bethe-Peierls     RS (0RSB)       M0 = distributions over X
***               1RSB            M1 = distributions over M0
***               2RSB            M2 = distributions over M1
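The move from vertex quantities to messages in item 2 can be made concrete at the Bethe-Peierls (RS) level. The sketch below uses the standard belief propagation update for the hard-core model, which is not written out in the slides and is therefore an assumption here: ν_{i→j}(1) = w / (1 + w) with w = λ ∏_{k ∈ ∂i \ j} (1 − ν_{k→i}(1)). On a tree this recursion is exact, which the brute-force comparison checks; function names are hypothetical.

```python
from itertools import product


def bp_hard_core(n, edges, lam):
    """BP marginals mu_i(1) for the hard-core model; exact when the graph is a tree."""
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    # nu[(i, j)] is the message nu_{i->j}(1).
    nu = {(i, j): 0.5 for i in range(n) for j in nbrs[i]}
    for _ in range(n):  # n parallel sweeps exceed the diameter of any tree on n nodes
        new = {}
        for (i, j) in nu:
            w = lam
            for k in nbrs[i]:
                if k != j:
                    w *= 1.0 - nu[(k, i)]
            new[(i, j)] = w / (1.0 + w)
        nu = new
    marginals = []
    for i in range(n):
        w = lam
        for k in nbrs[i]:
            w *= 1.0 - nu[(k, i)]
        marginals.append(w / (1.0 + w))
    return marginals


def exact_marginals(n, edges, lam):
    """Exact mu_i(1) by enumerating all independent sets."""
    Z = 0.0
    occ = [0.0] * n
    for x in product((0, 1), repeat=n):
        if any(x[i] and x[j] for i, j in edges):
            continue
        w = float(lam) ** sum(x)
        Z += w
        for i in range(n):
            occ[i] += w * x[i]
    return [o / Z for o in occ]


tree = [(0, 1), (1, 2), (1, 3), (3, 4)]
bp = bp_hard_core(5, tree, 1.5)
exact = exact_marginals(5, tree, 1.5)
```

The message space here is M0, distributions over X = {0, 1}, so each message is a single number.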