
The Sherrington-Kirkpatrick Model: Ruelle Probability Cascades and Ultrametricity

Paris Siminelakis, PhD Student

Department of Electrical Engineering, Stanford University

December 6, 2013


Outline

1 Introduction: Spin Glasses, Gibbs Measures, Free Energy, Clustering

2 Ruelle Probability Cascades: Poisson Processes, Construction of RPC, Ghirlanda-Guerra Identities

3 Ultrametricity: Motivation, Proof of Ultrametricity, Replication Property


Introduction


Spin Glasses

The Sherrington-Kirkpatrick model is a special instance of a spin glass, the first of which is the Ising model (picture).

Spins $\sigma \in \Sigma^N$ are the states of the system. Usually $\Sigma = \{-1,+1\}$ (or a sphere, in spherical spin glasses). They are called glasses because:

Spins are “frozen” in seemingly random configurations.

Non-equilibrium materials: aging, metastability.


Applications of Spin Glasses

Structure of Materials (phase transitions): Ising model, Hardcore model, etc.

Combinatorial Optimization (random instances): Assignment Problem, random CSPs; connections to local weak convergence (Aldous-Steele, Benjamini-Schramm).

Neural Networks - Learning. Hopfield Model.

Networked Dynamics: Interacting Particle Systems (Liggett), Finance, Diffusion of Innovations, etc.

Immunology, Coding Theory, Compressed Sensing, Protein Folding, ...


Gibbs Measures

We are interested in studying the structure of $\Sigma^N$ endowed with a (random) Hamiltonian $H_N : \Sigma^N \to \mathbb{R}$. Of particular interest is the quantity $\max_{\sigma \in \Sigma^N} H_N(\sigma)$.

The Gibbs measure for a given Hamiltonian is a probability distribution over $\Sigma^N$, defined by:

$$G_{N,\beta}(\sigma) = \frac{1}{Z_{N,\beta}} \exp\left(\beta H_N(\sigma)\right)$$

As $\beta \to \infty$ the Gibbs measure concentrates on the maxima of $H_N$.

Question: Sampling from the Gibbs measure?

Answer: NP-hard for most models and $\beta \ge \beta_c$.
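The concentration of the Gibbs measure as $\beta$ grows can be seen by brute force on a tiny system. The sketch below is only an illustration: the names `sk_hamiltonian`, `gibbs_measure` and `mass_on_maxima` are hypothetical, and the normalization $H_N(\sigma) = N^{-1/2}\sum_{i<j} g_{ij}\sigma_i\sigma_j$ with i.i.d. standard Gaussian couplings is an assumed convention, not one fixed by these slides.

```python
import itertools
import math
import random


def sk_hamiltonian(n, seed=0):
    """Return H(sigma) for one random draw of Gaussian couplings (assumed normalization)."""
    rng = random.Random(seed)
    g = {(i, j): rng.gauss(0.0, 1.0) for i in range(n) for j in range(i + 1, n)}

    def h(sigma):
        return sum(c * sigma[i] * sigma[j] for (i, j), c in g.items()) / math.sqrt(n)

    return h


def gibbs_measure(h, n, beta):
    """Enumerate all 2^n configurations and return (configs, energies, probabilities)."""
    configs = list(itertools.product((-1, 1), repeat=n))
    energies = [h(s) for s in configs]
    top = max(energies)
    # Subtract the maximum before exponentiating to avoid overflow.
    weights = [math.exp(beta * (e - top)) for e in energies]
    z = sum(weights)
    return configs, energies, [w / z for w in weights]


def mass_on_maxima(energies, probs, tol=1e-12):
    """Total Gibbs mass carried by the maximizers of H."""
    top = max(energies)
    return sum(p for e, p in zip(energies, probs) if top - e <= tol)
```

Enumeration costs $2^N$ evaluations, so this is only usable for very small $N$; it is meant to make the definition concrete, not to address the hardness of sampling.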


Partition Function and Free Energy

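We could sample if we could approximate the partition function $Z_{N,\beta} = \sum_{\sigma \in \Sigma^N} \exp(\beta H_N(\sigma))$. We cannot do that in most cases [Sly-Sun, Galanis et al]. An intimately related quantity is the free energy:

$$F_N(\beta) = \frac{1}{N} \log Z_{N,\beta}$$

If $F(\beta) = \lim_N F_N(\beta)$ exists for all large enough $\beta$, then $\lim_{\beta \to \infty} F(\beta)/\beta = \lim_N \frac{1}{N} \max_\sigma H_N(\sigma)$.

Let $\langle \cdot \rangle$ denote the expectation w.r.t. $G_{N,\beta}$. The free energy can also be used to compute (thermal) averages.

Example: if $H(\sigma) = \sum_i \zeta_i h_i(\sigma)$, then $\frac{\partial F_N(\zeta)}{\partial \zeta_i} = \langle h_i(\sigma) \rangle$, up to the normalizing factor $\beta/N$.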
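The link between the free energy and the maximum of the Hamiltonian rests on the elementary sandwich $\max_\sigma H_N \le \frac{1}{\beta}\log Z_{N,\beta} \le \max_\sigma H_N + \frac{N \log 2}{\beta}$ for $|\Sigma^N| = 2^N$. A minimal numerical sketch follows; the helper names are hypothetical and the energies are toy independent Gaussians (an assumption made only to have numbers to plug in).

```python
import math
import random


def free_energy(energies, beta, n):
    """Return F_N(beta) = (1/N) log sum_sigma exp(beta * H_N(sigma))."""
    top = max(energies)
    # Stable log-sum-exp: factor out the largest term.
    log_z = beta * top + math.log(sum(math.exp(beta * (e - top)) for e in energies))
    return log_z / n


def toy_energies(n, seed=0):
    """2^n independent Gaussian energies of variance n (toy assumption)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, math.sqrt(n)) for _ in range(2 ** n)]
```

Dividing `free_energy(energies, beta, n)` by `beta` and letting `beta` grow brings the value within `log(2) / beta` of `max(energies) / n`.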


Saddle Point Approximation

The Gibbs measure has a complex structure. Typically it exhibits multiple valleys (rugged landscape) [Chatterjee’09]. We need to reduce its complexity.

Idea: Saddle Point Approximation from asymptotic analysis for $Z_{N,\beta}$:

$$\int_{\Sigma^N} f(\sigma) \exp(\beta H(\sigma))\, d\sigma \;\to\; \sum_{a \in A} w_a f(a) \exp(\beta H(a))$$

Interpretation: Pure state decomposition, Clustering.


Clustering

Identifying “similar” states. Partition the state space according to a measure of similarity: Hamming distance (random CSPs), overlaps (SK model), bond overlaps (EA model), etc.

Geometric decomposition of the state space.

Related to Metastability [Bovier]: a random walk confined within a cluster, unable to overcome high energy barriers.


Replica Method

Replicas $\sigma^1, \dots, \sigma^k \in \Sigma^N$ are independent samples from the Gibbs measure. Although sampling may be hard, they are used as a reasoning tool to peer into the structure of the Gibbs measure.

Original use [E’69, EA’75]:

$$\lim_{n \to 0} \frac{Z^n - 1}{n} = \log Z$$

Modern use [MPV’87]: correlations between replicas reveal the geometric structure of the measure and its stability under perturbations.

Correlation structure between replicas ⇒ handle on the Gibbs measure.
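The limit behind the original replica trick is plain calculus, since $Z^n = e^{n \log Z} = 1 + n \log Z + O(n^2)$. A small numerical sketch, with the hypothetical helper `replica_quotient` and an arbitrary positive number standing in for $Z$:

```python
import math


def replica_quotient(z, n):
    """Return (z**n - 1) / n, computed stably for small n via expm1."""
    return math.expm1(n * math.log(z)) / n
```

For instance, `replica_quotient(7.5, 1e-8)` agrees with `math.log(7.5)` to roughly eight digits. The physics use of the trick, where $n$ is first an integer number of replicas and then continued to $0$, is a separate and non-rigorous step that this check does not address.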


Parisi Ansatz

In a series of papers in the 1980s, physicist Giorgio Parisi had the crucial insight about the structure of the correlations in spin glasses:

The overlap matrix $Q_{a,b}$ has ultrametric structure.

Led to the Parisi Formula


Cavity Method

Our initial goal was to calculate the free energy. The basis of the cavity method is to integrate the change in the free energy caused by the addition of one particle (with $Z_0 = 1$):

$$F_N(\beta) = \frac{1}{N} \mathbb{E} \log Z_N = \frac{1}{N} \sum_{n=0}^{N-1} \mathbb{E}\left( \log \frac{Z_{n+1}}{Z_n} \right)$$

Decomposing the Gaussian processes $H_{N+1}, H_N$ (only the correlation structure matters) leads to the Aizenman-Sims-Starr [ASS’03] representation:

$$F(\beta) = \lim_N \left( \mathbb{E} \log \langle 2 \cosh z_N(\sigma) \rangle - \mathbb{E} \log \langle \exp y_N(\sigma) \rangle \right)$$

Observation: Since $z_N$, $y_N$, $H_N$ are Gaussian processes, if the logarithm were not there, the limiting free energy would be a function only of the distribution of the overlaps!


Stochastic Stability and ROSt

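One can interpret the terms in the ASS representation as free energies of a different Gibbs measure that has undergone a random change of density. Let $X_i$, $i \in \mathbb{N}$, be particles distributed according to some distribution $P$. Define:

$$\xi_i = \frac{e^{X_i}}{\sum_{j \in \mathbb{N}} e^{X_j}}$$

Let $\kappa_i$ be a cavity field independent of $X_i$ with covariance given by $Q$. The cavity dynamics are:

$$\left( (\xi_i)_{i \in \mathbb{N}},\, Q \right) \to \left( \left( \frac{\xi_i e^{\psi(\kappa_i)}}{\sum_{j \ge 1} \xi_j e^{\psi(\kappa_j)}},\ i \in \mathbb{N} \right)_{\downarrow},\ Q_{\downarrow} \right)$$

where $\downarrow$ denotes reordering by decreasing weight.

Stochastic Stability: One expects the Gibbs measure to be asymptotically invariant under the cavity dynamics.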


Ruelle Probability Cascades


Precursors: REM and GREM

Random Walks in Random Environment [Chernov’56, Solomon’75, Sinai 80s, Kesten] as models to study disordered media. They reflect both agnosticism and the disordered nature of the material.

In spin glasses the Gibbs measure is random: $H_N(\sigma)$ is a zero-mean Gaussian process indexed by $\sigma \in \Sigma^N$.

Important: The distribution of $H_N$ is determined solely by the covariance structure.

For uncorrelated r.v. we get Derrida’s Random Energy Model (REM). A generalization where the correlation matrix is hierarchical was also proposed by Derrida in 1985 (GREM):

$$H_N(\sigma) = \sqrt{a_1}\, X_{\sigma_1} + \sqrt{a_2}\, X_{\sigma_1 \sigma_2} + \dots + \sqrt{a_k}\, X_{\sigma_1 \sigma_2 \dots \sigma_k}$$
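The hierarchical covariance of the GREM can be read directly off the formula: two configurations share exactly the Gaussians attached to their common prefix, so $\mathrm{Cov}(H_N(\sigma), H_N(\tau)) = \sum_{j \le |\sigma \wedge \tau|} a_j$. The sketch below assumes a finite tree with a fixed branching number and independent standard Gaussians $X$ on its vertices; `grem_energies` and `grem_covariance` are hypothetical names.

```python
import itertools
import math
import random


def grem_energies(branching, a, seed=0):
    """Sample H(sigma) for every leaf sigma of a depth-len(a) tree."""
    rng = random.Random(seed)
    depth = len(a)
    x = {}  # one standard Gaussian per tree vertex, keyed by its prefix
    energies = {}
    for sigma in itertools.product(range(branching), repeat=depth):
        total = 0.0
        for level in range(1, depth + 1):
            prefix = sigma[:level]
            if prefix not in x:
                x[prefix] = rng.gauss(0.0, 1.0)
            total += math.sqrt(a[level - 1]) * x[prefix]
        energies[sigma] = total
    return energies


def grem_covariance(sigma, tau, a):
    """Covariance of H(sigma) and H(tau): sum of a_j over the common prefix."""
    cov = 0.0
    for level, (s, t) in enumerate(zip(sigma, tau)):
        if s != t:
            break
        cov += a[level]
    return cov
```

Taking a single level (`len(a) == 1`) gives independent energies, which is the REM case mentioned above.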


Properties of Poisson Processes

Let $\Pi$ be a Poisson process on $S$ with mean measure $\mu$.

The image $f(\Pi)$ of $\Pi$ is also a Poisson process, with mean measure $\mu \circ f^{-1}$.

If $\Pi_m$ are independent Poisson processes then the superposition $\bigcup_m \Pi_m$ is a Poisson process with mean measure $\sum_m \mu_m$.

Let $(S', \mathcal{S}')$ be a measurable space and $K : S \times \mathcal{S}' \to [0, 1]$ a transition function. Then the “markings” are a Poisson process with measure $\mu^*(A) = K * \mu(A)$.

Invariance Properties


Poisson Dirichlet Processes

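Let $\zeta \in (0, 1)$ and let $\Pi$ be a Poisson process on $S = (0, \infty)$ with mean measure $\mu(dx) = \zeta x^{-1-\zeta}\, dx$. Let $(u_n)_{n \ge 1}$ be a decreasing enumeration of the points in $\Pi$, and define:

$$v_n = \frac{u_n}{\sum_{\ell \ge 1} u_\ell}$$

Poisson-Dirichlet Process

The sequence $(v_n)_{n \ge 1}$ is a random probability distribution on $\mathbb{N}$ and is called the Poisson-Dirichlet process $PD(\zeta)$.

Motivation: homogeneous process supported on $(0, \infty)$ such that the sum $\sum_\ell u_\ell$ is finite.

Property: for i.i.d. $X_n > 0$ independent of $(u_n)$ and $(Z_n)$,

$$\mathbb{E} \log \sum_n u_n Z_n X_n = \mathbb{E} \log \sum_n u_n Z_n + \frac{1}{\zeta} \log \mathbb{E} X^\zeta$$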
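The construction above can be simulated through the mapping property of Poisson processes: if $\Gamma_1 < \Gamma_2 < \dots$ are the arrival times of a rate-one Poisson process, then $u_n = \Gamma_n^{-1/\zeta}$ has mean measure $\zeta x^{-1-\zeta}\,dx$, because $\mu((x, \infty)) = x^{-\zeta}$. The sketch below keeps only the `n_points` largest points, a truncation made for illustration only; `sample_pd_weights` is a hypothetical name.

```python
import random


def sample_pd_weights(zeta, n_points=2000, seed=0):
    """Approximate PD(zeta) weights from the n_points largest Poisson points."""
    rng = random.Random(seed)
    gamma = 0.0
    points = []
    for _ in range(n_points):
        gamma += rng.expovariate(1.0)          # next arrival time of a rate-1 process
        points.append(gamma ** (-1.0 / zeta))  # decreasing, since gamma increases
    total = sum(points)
    return [u / total for u in points]
```

Because the arrival times increase, the weights come out already sorted in decreasing order. The discarded tail is small when `zeta` is well below 1 and becomes significant as `zeta` approaches 1.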


Invariance

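Let $(g_n)$ be i.i.d. standard Gaussians, and consider $X_n = \exp t(g_n - t\zeta/2)$, so that $\mathbb{E} X_n^\zeta = 1$, and $Y_n = g_n - t\zeta$. Define $v^t_n = u_n X_n / \sum_\ell u_\ell X_\ell$; then by homogeneity and the Gaussian nature of $g_n$:

Bolthausen-Sznitman Invariance

Let $\pi(n)$ be a random bijection such that $(v^t_{\pi(n)})_{n \ge 1}$ is in decreasing order. We have the following invariance property:

$$\left( v^t_{\pi(n)},\ g_{\pi(n)} - t\zeta \right)_{n \ge 1} \overset{d}{=} (v_n, g_n)_{n \ge 1}$$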


Ghirlanda-Guerra Identity

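Define a measure $G$ on a Hilbert space $H$ by $G(e_i) = v_i$, and let $g(e_i)$ be a Gaussian process. Let $h^1, \dots, h^n \in H$ be $n$ replicas sampled from $G^{\otimes n}$.

Ghirlanda-Guerra Identity

For any $n \ge 1$ and any function $f$ of the overlaps $R^n = (R_{\ell,\ell'})_{\ell,\ell' \le n}$:

$$\mathbb{E}\langle f R_{1,n+1} \rangle = \frac{1}{n} \mathbb{E}\langle f \rangle\, \mathbb{E}\langle R_{1,2} \rangle + \frac{1}{n} \sum_{\ell=2}^{n} \mathbb{E}\langle f R_{1,\ell} \rangle$$

Proof: Inner products are preserved under the bijection + invariance.

Given $\zeta$ we can reconstruct the measure $G$ by expressing all the joint moments as a function of $\zeta$.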


Ruelle Probability Cascades

RPCs are random measures over the leaves $\mathbb{N}^r$ of a tree. They were proposed as a model for the limiting Gibbs measure of mean-field models.

Tree with vertex set $\mathcal{A} = \mathbb{N}^0 \cup \dots \cup \mathbb{N}^r$, with $\mathbb{N}^r$ as leaves.

At each non-leaf node of level $k$ an independent $PD(\zeta_k)$ process is generated, with $0 < \zeta_0 < \dots < \zeta_{r-1} < 1$.

Each node $a \in \mathbb{N}^k$ in the tree is identified by its path $p(a) = \{ n_1, (n_1, n_2), \dots, (n_1, \dots, n_k) \}$. For $a \in \mathbb{N}^r$ let $w_a = \prod_{\beta \in p(a)} u_\beta$. Then the random measure on $\mathbb{N}^r$ is given by:

$$v_a = \frac{w_a}{\sum_{a' \in \mathbb{N}^r} w_{a'}}$$
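A truncated version of the cascade can be sketched as follows. Two assumptions are made for illustration and are not part of the slides: every node keeps only its `n_children` largest Poisson points, and the points with mean measure $\zeta_k x^{-1-\zeta_k}\,dx$ are generated as $\Gamma^{-1/\zeta_k}$ from the arrival times $\Gamma$ of a rate-one Poisson process. The names `poisson_points` and `sample_rpc` are hypothetical.

```python
import itertools
import random


def poisson_points(zeta, n_points, rng):
    """Largest n_points points of a Poisson process with measure zeta*x^(-1-zeta)dx."""
    gamma = 0.0
    points = []
    for _ in range(n_points):
        gamma += rng.expovariate(1.0)
        points.append(gamma ** (-1.0 / zeta))
    return points


def sample_rpc(zetas, n_children=20, seed=0):
    """Return {leaf: v_leaf} for a truncated cascade of depth r = len(zetas)."""
    rng = random.Random(seed)
    r = len(zetas)
    u = {}
    # Each non-leaf vertex at level k gets its own independent point process.
    for k in range(r):
        for parent in itertools.product(range(n_children), repeat=k):
            for child, value in enumerate(poisson_points(zetas[k], n_children, rng)):
                u[parent + (child,)] = value
    w = {}
    for leaf in itertools.product(range(n_children), repeat=r):
        product = 1.0
        for k in range(1, r + 1):
            product *= u[leaf[:k]]  # multiply the points along the path p(leaf)
        w[leaf] = product
    total = sum(w.values())
    return {leaf: value / total for leaf, value in w.items()}
```

The number of leaves is `n_children ** len(zetas)`, so the sketch is practical only for shallow trees.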


A Recursive Computation

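Keeping in mind the Poisson-Dirichlet property (earlier), let $X_r(\omega_1, \dots, \omega_r)$ be a function of i.i.d. $U[0, 1]$ r.v. Define recursively:

$$X_\ell = \frac{1}{\zeta_\ell} \log \mathbb{E}_{\ell+1} \exp \zeta_\ell X_{\ell+1}$$

where $\mathbb{E}_{\ell+1}$ denotes the expectation with respect to $\omega_{\ell+1}$.

Cavity Computation

Let $\Omega_a = (\omega_\beta)_{\beta \in p(a)}$ be independent $U[0, 1]$ r.v. If $X_0$ is defined via the previous recursive computation, then:

$$X_0 = \mathbb{E} \log \sum_{a \in \mathbb{N}^r} v_a \exp X_r(\Omega_a)$$

Proof: the main idea is recursion/renormalization. Exploit conditional independence and $PD(\zeta_k)$ at each level.

Importance: Averages of the “Gibbs measure” $v_a$ and replicas (through $\Omega_a$), used in computing the Parisi Formula.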
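The recursion $X_\ell = \frac{1}{\zeta_\ell} \log \mathbb{E}_{\ell+1} \exp \zeta_\ell X_{\ell+1}$ can be evaluated numerically once the uniform variables are discretized. In the sketch below each $\omega_\ell$ is replaced by a midpoint grid on $[0, 1]$, which is an approximation chosen for illustration; `recursive_x0` is a hypothetical name.

```python
import math


def recursive_x0(x_r, zetas, grid=20):
    """Approximate X_0 from X_r(omega_1, ..., omega_r) by averaging over a midpoint grid."""
    r = len(zetas)
    nodes = [(i + 0.5) / grid for i in range(grid)]

    def level(l, prefix):
        # Returns X_l as a function of the first l uniform variables (prefix).
        if l == r:
            return x_r(*prefix)
        values = [level(l + 1, prefix + (w,)) for w in nodes]
        top = max(values)
        # (1/zeta) log E exp(zeta * X), computed stably by factoring out the maximum.
        mean_exp = sum(math.exp(zetas[l] * (v - top)) for v in values) / grid
        return top + math.log(mean_exp) / zetas[l]

    return level(0, ())
```

Two sanity checks follow from the formula itself: a constant $X_r$ is returned unchanged, and by Jensen's inequality the result is never below the plain average of $X_r$ over the grid. The cost is `grid ** len(zetas)` evaluations of `x_r`.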


Random Measure on a Hilbert space

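Using the weights $(v_a)_{a \in \mathbb{N}^r}$ of the RPC we construct a random measure on the Hilbert space $H$.

Let $e_a$, for $a \in \mathcal{A} \setminus \mathbb{N}^0$, be a sequence of orthonormal vectors. Given numbers $0 = q_0 < q_1 < \dots < q_r < 1$ we construct a set of points $h_a \in H$ indexed by the vertices of the tree:

$$h_a = \sum_{\beta \in p(a)} e_\beta \left( q_{|\beta|} - q_{|\beta|-1} \right)^{1/2}$$

Properties: $h_\alpha \cdot h_\gamma = q_{|p(\alpha) \wedge p(\gamma)|}$, where $|p(\alpha) \wedge p(\gamma)|$ is the number of vertices shared by the two paths; $|p(\beta) \wedge p(\gamma)| \ge \min(|p(\beta) \wedge p(\alpha)|, |p(\alpha) \wedge p(\gamma)|)$ (paths); $h_\alpha \cdot h_\alpha = q_r$ (spherical) for all leaves $\alpha \in \mathbb{N}^r$. Consequently, for leaves:

$$\| h_\beta - h_\gamma \| \le \max\left( \| h_\alpha - h_\beta \|, \| h_\alpha - h_\gamma \| \right)$$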
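These properties can be verified mechanically on a small finite tree. In the sketch below each vector $h_a$ is stored as a sparse dictionary with one coordinate per vertex on the path $p(a)$, which plays the role of the orthonormal family $e_\beta$; the convention $q_0 = 0$ and the names `tree_vector`, `inner` and `overlaps_are_ultrametric` are assumptions made for this illustration.

```python
import itertools
import math


def tree_vector(a, q):
    """Coordinates of h_a: sqrt(q_k - q_{k-1}) on the k-th vertex of the path p(a)."""
    return {a[:k]: math.sqrt(q[k] - q[k - 1]) for k in range(1, len(a) + 1)}


def inner(h1, h2):
    """Inner product of two sparse vectors; only shared path vertices contribute."""
    return sum(c * h2[key] for key, c in h1.items() if key in h2)


def overlaps_are_ultrametric(vectors, tol=1e-9):
    """Check h_b.h_c >= min(h_a.h_b, h_a.h_c) over all triples of vectors."""
    return all(
        inner(b, c) >= min(inner(a, b), inner(a, c)) - tol
        for a, b, c in itertools.product(vectors, repeat=3)
    )
```

The inner product telescopes to $q_j$, where $j$ is the number of shared path vertices, so the overlap inequality and the ultrametric inequality for distances are two readings of the same tree structure.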


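Extended Ghirlanda-Guerra Identities

Extended Ghirlanda-Guerra Identities

For any $n \ge 1$, any function $f$ of the overlaps $R^n = (R_{\ell,\ell'})_{\ell,\ell' \le n}$ and any function $\psi : \mathbb{R} \to \mathbb{R}$:

$$\mathbb{E}\langle f\, \psi(R_{1,n+1}) \rangle = \frac{1}{n} \mathbb{E}\langle f \rangle\, \mathbb{E}\langle \psi(R_{1,2}) \rangle + \frac{1}{n} \sum_{\ell=2}^{n} \mathbb{E}\langle f\, \psi(R_{1,\ell}) \rangle$$

Proof: Invariance, conditioning, independence, recursive definition of permutations to preserve the structure of the tree.

Averages: Using the EGG ($n = 1$, $f = 1$), the average of any function of an overlap is given by (with the conventions $\zeta_{-1} = 0$ and $\zeta_r = 1$):

$$\mathbb{E}\langle \psi(R_{1,2}) \rangle = \sum_{p=0}^{r} \psi(q_p) (\zeta_p - \zeta_{p-1})$$

Functional Order Parameter $\zeta(q_p)$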
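The averaging formula says that a single overlap takes the value $q_p$ with probability $\zeta_p - \zeta_{p-1}$. A minimal sketch of the resulting computation, assuming the boundary conventions $\zeta_{-1} = 0$ and $\zeta_r = 1$ (which make the weights sum to one); `overlap_average` is a hypothetical name.

```python
def overlap_average(psi, q, zeta):
    """Sum of psi(q_p) * (zeta_p - zeta_{p-1}) for p = 0..r.

    q    -- [q_0, ..., q_r]
    zeta -- [zeta_0, ..., zeta_{r-1}]; zeta_{-1} = 0 and zeta_r = 1 are appended here.
    """
    full = [0.0] + list(zeta) + [1.0]  # full[p + 1] is zeta_p, full[p] is zeta_{p-1}
    return sum(psi(q[p]) * (full[p + 1] - full[p]) for p in range(len(q)))
```

For example, with `q = [0.0, 0.5, 1.0]` and `zeta = [0.25, 0.75]` the overlap distribution puts masses 0.25, 0.5 and 0.25 on the three values, so the mean overlap `overlap_average(lambda x: x, q, zeta)` is 0.5.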


Consequences of Ghirlanda-Guerra

Consider the measure $\mathbb{E} G^{\otimes \infty}$ and let $\zeta(A) = \mathbb{E}\langle I(R_{1,2} \in A) \rangle$ be the distribution of one overlap. Suppose that the Ghirlanda-Guerra identities hold:

Sphere: If $q^*$ is the largest point in the support of $\zeta$, then with probability one $G(\|\sigma\|^2 = q^*) = 1$.

Positivity Principle: The overlap is non-negative: $\zeta([0, \infty)) = 1$.

Ultrametricity: the overlap matrix is ultrametric.

Theorem (Functional Order Parameter)

The distribution of the entire overlap matrix $(R_{\ell,\ell'})_{\ell,\ell' \ge 1}$ under $\mathbb{E} G^{\otimes \infty}$ is uniquely determined by $\zeta$.


Proof of the f.o.p Theorem

1 Approximation: Assume that $\zeta$ is supported only on a finite number of points.

2 Polarization: Pick the largest element of the $\binom{n}{2}$ overlaps. Using ultrametricity, the two replicas will have the same overlaps with the remaining elements.

3 Recursion: Using the Ghirlanda-Guerra identities we can express the joint distribution as a sum of “simpler” distributions ($n - 1$ replicas).

4 Lifting: Show that the discretization $R^\kappa = \kappa(R)$ satisfies all the required properties (positive definite, Ghirlanda-Guerra) to conclude that the distribution of $\kappa(R_{1,2})$, given by $\zeta \circ \kappa^{-1}$, uniquely determines $R^\kappa$.

5 Convergence: Since $R^\kappa \to R$ a.s. as $r \to \infty$, the distribution of $R$ is also uniquely determined by $\zeta$.


RPC and Random Overlap Structures

The Ruelle Probability Cascades are stationary and ergodic under the cavity dynamics.

Conjecture

The Ruelle Probability Cascades exhaust the measures (ROSt) that are stochastically stable.

This conjecture was proved by Aizenman, Arguin’07 assuming that the overlap matrix $Q$ has finitely many values.

Arguin’08 proved that a robust stochastically stable ROSt must obey the Extended Ghirlanda-Guerra identities.

Chatterjee, Arguin’10 proved that the cavity mapping is continuous and consequently that the limiting Gibbs measure is stochastically stable.


Ultrametricity


Ultrametricity

Theorem (Panchenko, 2013)

Suppose the Ghirlanda-Guerra identities hold. Then:

$$\mathbb{E}\langle I(R_{1,2} \ge \min(R_{1,3}, R_{2,3})) \rangle = 1$$

A statement about the support of the distribution of the overlap matrix under $\mathbb{E} G^{\otimes \infty}$. ⇒ Control of events in $\sigma(R^n)$.

Insight: it is sufficient to prove the “almost Replication” property. Motivated by the cluster-tree picture (board).

Handle: Invariance properties implied by the Ghirlanda-Guerra identities.


Replication Property

Let $x \approx a$ denote the event $a - \varepsilon < x < a + \varepsilon$ for a fixed small $\varepsilon > 0$. Consider an $n \times n$ matrix $A$ such that $\mathbb{E}\langle I(R^n \approx A) \rangle > 0$ and define $a^*_n = \max(a_{1,n}, \dots, a_{n-1,n})$.


Suppose the Ghirlanda-Guerra identities hold and $A$ is as above. Then:

$$\mathbb{E}\langle I(R^n \approx A,\ R_{\ell,n+1} \approx a_{\ell,n} \text{ for } \ell \le n-1,\ R_{n,n+1} < a^*_n + \varepsilon) \rangle > 0$$

Heart of Ultrametricity: for all $n$ you can always find a new non-identical “replicate” of replica $n$ that, with respect to the remaining replicas $1, \dots, n - 1$, is “identical” (clustering).

Given this property, ultrametricity follows easily by contradiction (picture on board).


Ghirlanda Guerra Identities

Ruelle Probability Cascades constitute the canonical measure satisfying the Ghirlanda-Guerra identities.

Connected to stochastic stability and, in general, to invariance properties of the measure.

Question: What is the most we can extract?


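Invariance under Transformations

Consider $n$ bounded measurable functions $f_1, \dots, f_n : \mathbb{R} \to \mathbb{R}$ and let:

$$F(\sigma, \sigma^1, \dots, \sigma^n) = f_1(\sigma \cdot \sigma^1) + \dots + f_n(\sigma \cdot \sigma^n)$$

Define the functions:

$$F_\ell(\sigma, \sigma^1, \dots, \sigma^n) = F(\sigma, \sigma^1, \dots, \sigma^n) - f_\ell(\sigma \cdot \sigma^\ell) + \mathbb{E}\langle f_\ell(R_{1,2}) \rangle$$

Theorem (Panchenko)

Suppose the Ghirlanda-Guerra identities hold and let $\Phi$ be a bounded measurable function of $R^n = (R_{\ell,\ell'})_{\ell,\ell' \le n}$. Then:

$$\mathbb{E}\langle \Phi \rangle = \mathbb{E}\left\langle \frac{\Phi \exp \sum_{\ell=1}^{n} F_\ell(\sigma^\ell, \sigma^1, \dots, \sigma^n)}{\left\langle \exp \sum_{\ell=1}^{n} F_\ell(\sigma, \sigma^1, \dots, \sigma^n) \right\rangle_-} \right\rangle$$

where the average $\langle \cdot \rangle_-$ with respect to $G$ is in $\sigma$ only, for fixed $\sigma^1, \dots, \sigma^n$.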


Hilbert Space Approach

Let $H$ be a Hilbert space endowed with the annealed Gibbs measure $\mu = \mathbb{E} G^{\otimes \infty}$. Consider the spaces $M_n = L^\infty(H, \sigma(R^n), \mu)$ and $S_k = L^\infty(H, \sigma(R_{1,k}), \mu)$.

Conditional Expectation Interpretation

Let $\psi \in S_{n+1}$ and suppose the Ghirlanda-Guerra identities hold. Then:

$$\mathbb{E}\langle \psi(R_{1,n+1}) \mid \sigma(R^n) \rangle = \frac{1}{n} \mathbb{E}\langle \psi \rangle + \sum_{k=2}^{n} \frac{1}{n} \psi(R_{1,k}) \in M_n$$

Hajek Projections, Replica Equivalence.

Reinterpretation: Ghirlanda-Guerra is a tool for calculating conditional expectations of a certain kind. Geometric statement about the Hilbert space.

This decomposition property is related to both invariance and ultrametricity (picture). There must be some underlying Optimality Principle.


Encoding the GG identities

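In order to prove the Replication Property, we need control over $\sigma(R^{n+1})$. Let $f$ be a bounded measurable function of $R^{n+1}$, and consider the transformation:

$$Q_t(f) = \frac{\exp t\, \mathbb{E}\langle f \mid R^n \rangle}{\langle \exp tf \rangle_-} = \exp\left( t\, \mathbb{E}\langle f \mid R^n \rangle - \log \langle \exp tf \rangle_- \right)$$

Observe that $Q_0(f) = 1$. Taking the derivative with respect to $t$:

$$\partial_t Q_t(f) = \left( \mathbb{E}\langle f \mid R^n \rangle - \frac{\langle f \exp tf \rangle_-}{\langle \exp tf \rangle_-} \right) Q_t(f)$$

Evaluating the derivative at zero and using the fact that $\sigma(R^n)$-measurable quantities are constants with respect to $\langle \cdot \rangle_-$, we get:

$$\partial_t Q_t(f) \big|_{t=0} = \langle \mathbb{E}\langle f \mid R^n \rangle - f \rangle_- \in m\sigma(R^n)$$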
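The derivative at zero can be sanity-checked numerically in a finite toy model. In the sketch below the average $\langle \cdot \rangle_-$ is replaced by a weighted sum over finitely many values of $f$, and the conditional expectation $\mathbb{E}\langle f \mid R^n \rangle$ by a fixed number `cond_mean`; both are stand-ins chosen for illustration, and the function names are hypothetical.

```python
import math


def q_t(t, cond_mean, weights, values):
    """Q_t = exp(t * cond_mean) / <exp(t f)>_-, with <.>_- a finite weighted average."""
    denom = sum(w * math.exp(t * v) for w, v in zip(weights, values))
    return math.exp(t * cond_mean) / denom


def dq_at_zero(cond_mean, weights, values):
    """Closed form of the derivative at t = 0: cond_mean - <f>_-."""
    return cond_mean - sum(w * v for w, v in zip(weights, values))
```

Comparing `dq_at_zero` with a central finite difference of `q_t` around `t = 0` reproduces the formula, provided the weights sum to one.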


Extended Invariance

Ideally we would like to conclude that for any bounded measurable function $\Phi$ of $R^n$:

$$\mathbb{E}\left\langle \Phi \cdot \partial_t Q_t(f) \big|_{t=0} \right\rangle = 0$$

This would follow from GG if $f \in bm\sigma(R_{1,n+1}) \subset bm\sigma(R^{n+1})$. Nevertheless, due to linearity of expectation we can actually handle $f \in \mathrm{span}\{A_1, \dots, A_n\}$ with $A_k = L^\infty(H, \sigma(R_{k,n+1}), \mu)$.

Theorem (Reinterpretation)

Suppose that the Ghirlanda-Guerra identities hold, $\Phi \in bm\sigma(R^n)$ and $f = \sum_{\ell=1}^{n} f_\ell$ where $f_\ell \in A_\ell$. Then for any $t > 0$:

$$\mathbb{E}\langle \Phi\, Q_t(f) \rangle = \mathbb{E}\langle \Phi \rangle$$

Proof: The Ghirlanda-Guerra identities imply that all derivatives at zero vanish; using the Taylor remainder and boundedness, conclude that the function is constant.


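Replication Property: Preliminaries

Statement about the probability of finding a duplicate of one replica given the realization of the remaining $n - 1$ replicas.

The existence of replicas such that $\mathbb{E}\langle I(R^n \approx A) \rangle > 0$ implies that the Gibbs weight of the set

$$\Omega(\sigma^1, \dots, \sigma^{n-1}) = \{ \sigma : \sigma \cdot \sigma^\ell \approx a_{\ell,n} \text{ for } \ell \le n - 1 \}$$

is positive, $G(\Omega) > 0$, for the given choice of $A$.

Polarization: Assuming that replication does not hold for $A$, and writing $\Omega_{n-1} = \Omega(\sigma^1, \dots, \sigma^{n-1})$:

$$\mathbb{E}\left\langle I(R^n \approx A,\ \sigma^{n+1} \in \Omega_{n-1},\ \sigma^{n+1} \cdot \sigma^n \ge a^*_n + \varepsilon) \right\rangle > 0$$

Focus on $B(\sigma^n) = \{ \sigma : \sigma \cdot \sigma^n \ge a^*_n + \varepsilon \}$ (smaller set, compatible with GG).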


Replication Property II

Under the assumption of non-replication and $R^n \approx A$, it must be the case that $\Omega_{n-1} \subset B(\sigma)$ for all $\sigma \in \Omega_{n-1}$. Hence:

$$W_1(\sigma) := G(B(\sigma)) \ge G(\Omega_{n-1}) > 0$$

Handle: if we could show that under an appropriate $t$-change of density $W_1 \le f(t) \to 0$, we would obtain a contradiction.

Control of “Weights” under GG transformations


Augmenting Ghirlanda-Guerra Identities

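Consider a finite index set $\mathcal{A}$ and a partition of $H$ into sets $B_a$ such that their indicators are measurable functions of the inner products between $\sigma, \sigma^1, \dots, \sigma^n$.

Let $W_a = W_a(R^n) = G(B_a)$ for $a \in \mathcal{A}$. For $W = (W_a)_{a \in \mathcal{A}}$ define the transformations:

$$T_t(W) = \left( \frac{\left\langle I_{B_a}(\sigma) \exp tF(\sigma, \sigma^1, \dots, \sigma^n) \right\rangle_-}{\langle \exp tF \rangle_-} \right)_{a \in \mathcal{A}}$$


Suppose that the Ghirlanda-Guerra identities hold. Then for any bounded measurable function $\varphi : \mathbb{R}^{n \times n} \times \mathbb{R}^{|\mathcal{A}|} \to \mathbb{R}$ and $F \in \mathrm{span}\{A_1, \dots, A_n\}$:

$$\mathbb{E}\langle \varphi(R^n, W) \rangle = \mathbb{E}\left\langle \frac{\varphi(R^n, T_t(W)) \exp t\, \mathbb{E}\langle F \mid R^n \rangle}{\langle \exp tF \rangle_-} \right\rangle$$

Proof: Approximate by polynomials through the use of auxiliary replicas, then continuous functions, and finally bounded functions.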


Proof of Replication Property

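We aim to control $W_1 = G(B(\sigma))$, so naturally we will use the partition $\{B, B^c\}$ and the function $F(\sigma) = I(\sigma \cdot \sigma^n \ge a^*_n + \varepsilon)$. The transformation $T_t$ is:

$$T_t(W_1, 1 - W_1) = \left( \frac{W_1 e^t}{W_1 e^t + 1 - W_1},\ \frac{1 - W_1}{W_1 e^t + 1 - W_1} \right)$$

The invariance property gives, for $\varphi = I(R^n \approx A,\ W_1 \in (p, p'))$:

$$\mathbb{E}\langle \varphi(R^n, W) \rangle = \mathbb{E}\left\langle \varphi(R^n, T_t(W))\, \frac{e^{t\gamma}}{W_1 e^t + 1 - W_1} \right\rangle$$

where, matching the general identity, $\gamma$ stands for $\mathbb{E}\langle F \mid R^n \rangle$.

Decomposing $I(R^n \approx A) = I(R^{n-1} \approx A)\, I(\sigma^n \in \Omega_{n-1})$ and integrating with respect to $\sigma^n$, we get an upper bound of the form:

$$\mathbb{E}\left\langle W_1(\sigma')\, I\left[ T_t(W_1(\sigma')) \in (p, p') \right] \right\rangle$$

The proof concludes by using the inverse transformation $T_t^{-1} = T_{-t}$ and showing that under the events we are discussing $W_1 \le f(t) \to 0$ as $t \to \infty$.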
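The two-set transformation and its inverse are simple enough to check directly. The sketch below implements $T_t$ on the pair $(W_1, 1 - W_1)$; `tilt` is a hypothetical name. It illustrates why the event $T_t(W_1) \in (p, p')$ forces $W_1$ to be small for large $t$: $W_1$ is then at most the first coordinate of $T_{-t}$ applied to $p'$, which decays like $e^{-t}$.

```python
import math


def tilt(w1, t):
    """Return T_t(w1, 1 - w1) for the indicator change of density with parameter t."""
    numerator = w1 * math.exp(t)
    denominator = numerator + 1.0 - w1
    return numerator / denominator, (1.0 - w1) / denominator
```

Applying `tilt(..., -t)` to the first coordinate of `tilt(w1, t)` returns `w1`, which is the identity $T_t^{-1} = T_{-t}$ used in the last step of the proof.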


Open Questions

1 Can we recover the Ghirlanda-Guerra identities solely through the construction of the RPC, without using Bolthausen-Sznitman invariance?

2 Provide a direct connection between concentration of the free energy and “Hajek projections” (GG).

3 Clarify the implications of positivity. What is the relationship with Chaos? Contrast between SK and EA.

Thank You