
Learning Gaussian Tree Models: Analysis of Error Exponents and Extremal Structures

Vincent Tan, Animashree Anandkumar, Alan Willsky

Stochastic Systems Group, Laboratory for Information and Decision Systems,

Massachusetts Institute of Technology

Allerton Conference (Sep 30, 2009)


Motivation

Given a set of i.i.d. samples drawn from p, a Gaussian tree model.

Inferring structure of Phylogenetic Trees from observed data.

Carlson et al. 2008, PLoS Comp. Bio.


More motivation

What is the exact rate of decay of the probability of error?

How do the structure and parameters of the model influence the error exponent (rate of decay)?

What are extremal tree distributions for learning?

Consistency is well established (Chow and Wagner 1973).

Error Exponent is a quantitative measure of the “goodness” of learning.


Main Contributions

1 Provide the exact Rate of Decay for a given p.

2 Rate of decay ≈ SNR for learning.

3 Characterized the extremal tree structures for learning, i.e., stars and Markov chains.

Stars have the slowest rate. Chains have the fastest rate.


Notation and Background

p = N(0, Σ): d-dimensional Gaussian tree model.

Samples x^n = {x_1, x_2, . . . , x_n} drawn i.i.d. from p.

p: Markov on Tp = (V, Ep), a tree.

p: Factorizes according to Tp.

p(x) = p_1(x_1) \frac{p_{1,2}(x_1, x_2)}{p_1(x_1)} \frac{p_{1,3}(x_1, x_3)}{p_1(x_1)} \frac{p_{1,4}(x_1, x_4)}{p_1(x_1)}, \qquad \Sigma^{-1} = \begin{pmatrix} \spadesuit & \clubsuit & \clubsuit & \clubsuit \\ \clubsuit & \spadesuit & 0 & 0 \\ \clubsuit & 0 & \spadesuit & 0 \\ \clubsuit & 0 & 0 & \spadesuit \end{pmatrix}

(Example: a 4-node star with center node 1; the fill pattern of Σ^{-1} mirrors the edges of T_p, with zeros at the non-edges.)
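As a concrete check of this factorization and the sparsity of Σ^{-1}, here is a minimal numerical sketch (my own illustration, not from the talk; the edge correlations are hypothetical):

```python
# Build Sigma for a 4-node star (center node 0 here) by multiplying edge
# correlations along paths, then verify that Sigma^{-1} is zero exactly at
# the non-edges of the tree.
import numpy as np

edge_rho = {(0, 1): 0.6, (0, 2): 0.5, (0, 3): 0.4}    # hypothetical edge correlations
d = 4
sigma = np.eye(d)
for (i, j), r in edge_rho.items():                    # correlations on true edges
    sigma[i, j] = sigma[j, i] = r
for i, j in [(1, 2), (1, 3), (2, 3)]:                 # non-edges: product along the path
    r = edge_rho[(0, i)] * edge_rho[(0, j)]
    sigma[i, j] = sigma[j, i] = r

precision = np.linalg.inv(sigma)
print(np.round(precision, 6))   # zeros (up to rounding) at the leaf-leaf non-edges
```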


Max-Likelihood Learning of Tree Distributions (Chow-Liu)

Denote \hat{p} = \hat{p}_{x^n} as the empirical distribution of x^n, i.e.,

\hat{p}(x) := N(x; 0, \hat{\Sigma}),

where \hat{\Sigma} is the empirical covariance matrix of x^n.

\hat{p}_e: empirical distribution on edge e.

Reduces to a max-weight spanning tree problem (Chow-Liu 1968)

E_{CL}(x^n) = \operatorname*{argmax}_{E_q : q \in \mathrm{Trees}} \sum_{e \in E_q} I(\hat{p}_e), \qquad I(\hat{p}_e) := I(X_i; X_j).
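A minimal sketch of this estimator for Gaussian data (my own code, not the authors'), using the closed form I(\hat{p}_e) = -½ log(1 − \hat{ρ}_e²) and an off-the-shelf spanning-tree routine:

```python
# Gaussian Chow-Liu sketch: empirical correlations -> empirical MIs as edge
# weights -> maximum-weight spanning tree (minimum spanning tree on the
# negated weights). Assumes zero-mean samples, as in the talk.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def chow_liu_gaussian(x):
    """x: (n, d) array of samples; returns the learned edge set E_CL(x^n)."""
    n = x.shape[0]
    sigma_hat = x.T @ x / n                                     # empirical covariance
    denom = np.sqrt(np.outer(np.diag(sigma_hat), np.diag(sigma_hat)))
    rho_hat = np.clip(sigma_hat / denom, -0.999999, 0.999999)   # empirical correlations
    mi = -0.5 * np.log1p(-rho_hat ** 2)                         # empirical mutual informations
    np.fill_diagonal(mi, 0.0)
    mst = minimum_spanning_tree(-mi)                            # max-weight spanning tree
    return {frozenset((int(i), int(j))) for i, j in zip(*mst.nonzero())}
```

With continuous data, ties among the empirical mutual informations occur with probability zero, so the maximizer is unique.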


Max-Likelihood Learning of Tree Distributions

True MIs {I(p_e)} → max-weight spanning tree → E_p.

Empirical MIs {I(\hat{p}_e)} from x^n → max-weight spanning tree → E_{CL}(x^n), possibly ≠ E_p.


Problem Statement

The estimated edge set is E_{CL}(x^n) and the error event is {E_{CL}(x^n) ≠ E_p}.

Find and analyze the error exponent K_p:

K_p := \lim_{n \to \infty} -\frac{1}{n} \log P(\{E_{CL}(x^n) \neq E_p\}).

Alternatively,

P(\{E_{CL}(x^n) \neq E_p\}) \doteq \exp(-n K_p).


The Crossover Rate I

Two pairs of nodes e, e' ∈ \binom{V}{2} with joint distribution p_{e,e'}, s.t. I(p_e) > I(p_{e'}).

Consider the crossover event: {I(\hat{p}_e) ≤ I(\hat{p}_{e'})}.

Definition: Crossover Rate

J_{e,e'} := \lim_{n \to \infty} -\frac{1}{n} \log P(\{I(\hat{p}_e) \le I(\hat{p}_{e'})\}).

This event may potentially lead to an error in structure learning. Why?
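A Monte Carlo sketch of this crossover for one hypothetical configuration (a 3-node star with center 0, ρ_e = 0.5 on e = (0, 1) and ρ_{e'} = 0.4 on e' = (0, 2); the numbers are my own choice):

```python
# Estimate P(I(p_hat_e) <= I(p_hat_e')) by simulation. For Gaussians the MI is
# monotone in |rho|, so the crossover event is |rho_hat_e| <= |rho_hat_e'|.
import numpy as np

rng = np.random.default_rng(0)
rho_e, rho_ep = 0.5, 0.4                       # so I(p_e) > I(p_e')
sigma = np.array([[1.0,    rho_e,          rho_ep        ],
                  [rho_e,  1.0,            rho_e * rho_ep],
                  [rho_ep, rho_e * rho_ep, 1.0           ]])

def crossover_prob(n, trials=5000):
    count = 0
    for _ in range(trials):
        x = rng.multivariate_normal(np.zeros(3), sigma, size=n)
        c = x.T @ x / n                        # empirical covariance (mean known to be zero)
        r = c / np.sqrt(np.outer(np.diag(c), np.diag(c)))
        count += abs(r[0, 1]) <= abs(r[0, 2])  # crossover of the empirical MIs
    return count / trials

for n in (100, 200, 400):
    p = crossover_prob(n)
    print(n, p, -np.log(max(p, 1e-12)) / n)    # last column: rough estimate of J_{e,e'}
```

As n grows, the last column should roughly stabilize near the crossover rate J_{e,e'}.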


The Crossover Rate II

Theorem: The crossover rate is

J_{e,e'} = \inf_{q \in \mathrm{Gaussians}} \{ D(q \,\|\, p_{e,e'}) : I(q_{e'}) = I(q_e) \}.

By assumption, I(p_e) > I(p_{e'}).

[Figure: p_{e,e'} and its information projection q*_{e,e'} onto the set {q : I(q_e) = I(q_{e'})}; the divergence D(q*_{e,e'} || p_{e,e'}) equals the crossover rate J_{e,e'}.]


Error Exponent for Structure Learning II

P(\{E_{CL}(x^n) \neq E_p\}) \doteq \exp(-n K_p).

Theorem (First Result)

K_p = \min_{e' \notin E_p} \; \min_{e \in \mathrm{Path}(e'; E_p)} J_{e,e'}.
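The theorem reduces K_p to pairwise crossover rates; a small sketch of that reduction (my own code; J below is a stand-in for whatever crossover rates have been computed):

```python
# K_p = min over non-edges e' of the min of J_{e,e'} over the edges e lying on
# the unique path in T_p that joins the endpoints of e'.
from itertools import combinations

def path_edges(tree_edges, u, v):
    """Edges on the unique u-v path of the tree given by tree_edges."""
    adj = {}
    for a, b in tree_edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    parent, stack = {u: None}, [u]
    while stack:                                   # DFS from u
        node = stack.pop()
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                stack.append(nxt)
    path = []
    while parent[v] is not None:                   # walk back from v to u
        path.append(frozenset((v, parent[v])))
        v = parent[v]
    return path

def error_exponent(nodes, tree_edges, J):
    """J(e, e') maps a true edge e and a non-edge e' to their crossover rate."""
    edge_set = {frozenset(e) for e in tree_edges}
    non_edges = [frozenset(p) for p in combinations(nodes, 2)
                 if frozenset(p) not in edge_set]
    return min(min(J(e, ep) for e in path_edges(tree_edges, *sorted(ep)))
               for ep in non_edges)
```

Only the edges on the path joining the endpoints of each non-edge e' need to be examined, rather than all pairs (e, e').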


Approximating the Crossover Rate I

Definition: p_{e,e'} satisfies the very noisy learning condition if

| |ρ_e| − |ρ_{e'}| | < ε  ⇒  I(p_e) ≈ I(p_{e'}).

Euclidean Information Theory (Borade, Zheng 2007).


Approximating the Crossover Rate II

Theorem (Second Result)

The approximate crossover rate is

J_{e,e'} = \frac{(I(p_{e'}) - I(p_e))^2}{2 \,\mathrm{Var}(s_{e'} - s_e)},

where s_e is the information density:

s_e(x_i, x_j) = \log \frac{p_{i,j}(x_i, x_j)}{p_i(x_i)\, p_j(x_j)}.

The approximate error exponent is

K_p = \min_{e' \notin E_p} \; \min_{e \in \mathrm{Path}(e'; E_p)} J_{e,e'}.
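For zero-mean, unit-variance Gaussian pairs these quantities have explicit forms (a standard computation, stated here for concreteness rather than taken from the slides):

```latex
% e = (i, j) with correlation \rho_e; zero-mean, unit-variance marginals.
I(p_e) = -\tfrac{1}{2}\log\!\left(1 - \rho_e^2\right),
\qquad
s_e(x_i, x_j)
  = -\tfrac{1}{2}\log\!\left(1 - \rho_e^2\right)
    + \frac{2\rho_e x_i x_j - \rho_e^2\left(x_i^2 + x_j^2\right)}
           {2\left(1 - \rho_e^2\right)}.
```

In particular E_p[s_e] = I(p_e), and the variance Var(s_{e'} − s_e) in the theorem is taken under the joint distribution p_{e,e'} of the nodes in e and e'.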


Correlation Decay

[Figure: a 4-node Markov chain x_1 – x_2 – x_3 – x_4 with edge correlations ρ_{1,2}, ρ_{2,3}, ρ_{3,4}; the non-edges (1,3) and (1,4), with correlations ρ_{1,3} and ρ_{1,4}, are shown dashed.]

ρ_{i,j} = E[x_i x_j].

Markov property ⇒ ρ_{1,3} = ρ_{1,2} × ρ_{2,3}.

Correlation decay ⇒ |ρ_{1,4}| ≤ |ρ_{1,3}|, since ρ_{1,4} = ρ_{1,3} × ρ_{3,4} and |ρ_{3,4}| ≤ 1.

(1, 4) is not likely to be mistaken for a true edge.

Only need to consider triangles in the true tree.


Extremal Structures I

Fix ρ, a vector of correlation coefficients on the tree, e.g.

[Figure: a tree with edge correlations ρ_1, . . . , ρ_5.]

ρ := [ρ_1, ρ_2, ρ_3, ρ_4, ρ_5].

Which structures give the highest and lowest exponents?


Extremal Structures II

Theorem (Main Result)

Worst: the star minimizes K_p: K_star ≤ K_p.

Best: the Markov chain maximizes K_p: K_chain ≥ K_p.

[Figure: a star whose edges carry correlations ρ_1, . . . , ρ_4, and a Markov chain whose edges carry the same correlations in permuted order ρ_{π(1)}, . . . , ρ_{π(4)}.]


Extremal Structures III

Chain, Star and Hybrid Graphs for d = 10.

[Figure, two panels: left, simulated probability of error vs. number of samples n (10^3 to 10^4) for the chain, hybrid and star graphs; right, simulated error exponent (×10^{-3}) vs. n for the same graphs.]

Plot of the error probability and error exponent for 3 tree graphs.
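A Monte Carlo sketch in the same spirit as this experiment (my own code, with an assumed uniform edge correlation ρ = 0.4 and small trial counts; the talk's exact parameters are not reproduced here):

```python
# Compare the simulated error probability of Chow-Liu learning for a chain and
# a star on d = 10 nodes.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(1)
d, rho = 10, 0.4

def tree_covariance(edges):
    """Correlation of every node pair = product of rho over the connecting path."""
    adj = {i: [] for i in range(d)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    sigma = np.eye(d)
    for root in range(d):
        corr, stack = {root: 1.0}, [root]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in corr:
                    corr[v] = corr[u] * rho
                    stack.append(v)
        for v, c in corr.items():
            sigma[root, v] = c
    return sigma

def chow_liu_edges(x):
    c = x.T @ x / len(x)
    r = np.clip(c / np.sqrt(np.outer(np.diag(c), np.diag(c))), -0.999999, 0.999999)
    mi = -0.5 * np.log1p(-r ** 2)
    np.fill_diagonal(mi, 0.0)
    return {frozenset(map(int, e)) for e in zip(*minimum_spanning_tree(-mi).nonzero())}

chain = [(i, i + 1) for i in range(d - 1)]
star = [(0, i) for i in range(1, d)]
for name, edges in (("chain", chain), ("star", star)):
    sigma, truth = tree_covariance(edges), {frozenset(e) for e in edges}
    for n in (1000, 2000, 4000):
        trials = 200
        errors = sum(chow_liu_edges(rng.multivariate_normal(np.zeros(d), sigma, n)) != truth
                     for _ in range(trials))
        print(name, n, errors / trials)
```

With these settings the star should need noticeably more samples than the chain to reach the same error probability, in line with the result above.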


Extremal Structures IV

Remarks:

Universal result.

Extremal structures w.r.t. diameter are the extremal structures for learning.

Corroborates our intuition about correlation decay.


Extensions

Significant reduction of complexity for computation of the error exponent.

Finding the best distributions for fixed ρ.

Effect of adding and deleting nodes and edges on error exponent.


Conclusion

1 Found the rate of decay of the error probability using large deviations.

2 Used Euclidean Information Theory to obtain an SNR-like expression for the crossover rate.

3 We can say which structures are easy and hard based on the error exponent.

Extremal structures in terms of the tree diameter.

Full versions can be found at http://arxiv.org/abs/0905.0940 and http://arxiv.org/abs/0909.5216.
