Learning with Tree-averaged Densities and Distributions

Transcript
Page 1: Learning with Tree-averaged Densities and Distributions

Learning with Tree-averaged Densities and Distributions

Sergey Kirshner
Alberta Ingenuity Centre for Machine Learning,

Department of Computing Science, University of Alberta, Canada

December 5, 2007

NIPS 2007, Poster W12

Page 2: Learning with Tree-averaged Densities and Distributions

Overview

• Want to fit a density to complete multivariate data
• New density estimation model based on averaging over tree-dependence structures
  – Distribution = univariate marginals + copula
  – Bayesian averaging over tree-structured copulas
  – Efficient parameter estimation for tree-averaged copulas
• Can solve problems with 10-30 dimensions

Page 3: Learning with Tree-averaged Densities and Distributions

Most Popular Distribution… the Multivariate Gaussian

• Interpretable
• Closed under taking marginals
• Generalizes to multiple dimensions
• Models pairwise dependence
• Tractable
• 245 pages out of 691 in Continuous Multivariate Distributions by Kotz, Balakrishnan, and Johnson

[Figure: surface plot of a bivariate Gaussian density; both axes from -3 to 3, density values up to 0.2]

Page 4: Learning with Tree-averaged Densities and Distributions

What If the Data Is NOT Gaussian?

Page 5: Learning with Tree-averaged Densities and Distributions

Curse of Dimensionality

Binning each axis into pieces of width 1/n partitions the space into n^d cells, a number exponential in the dimension d.

[Figure: surface plot of a bivariate Gaussian density]

Probability mass also escapes quickly: for a standard Gaussian, the mass inside the box is V([-2,2]^d) ≈ 0.9545^d, which is about 0.63 for d = 10 and only about 0.25 for d = 30.

[Bellman 57]

Page 6: Learning with Tree-averaged Densities and Distributions

Avoiding the Curse, Step 1: Separating Univariate Marginals

f(x_1, …, x_d) = [ f_1(x_1) × ⋯ × f_d(x_d) ] × c(F_1(x_1), …, F_d(x_d))

The first factor collects the univariate marginals (the density the variables would have if independent); the second factor is the multivariate dependence term, the copula density.

Page 7: Learning with Tree-averaged Densities and Distributions

Monotonic Transformation of the Variables

Page 8: Learning with Tree-averaged Densities and Distributions

Copula

A copula C is a multivariate distribution (cdf) defined on the unit hypercube with uniform univariate marginals:

C(1, …, 1, u_i, 1, …, 1) = u_i for each i and u_i ∈ [0, 1].

Page 9: Learning with Tree-averaged Densities and Distributions

Sklar's Theorem [Sklar 59]

Any multivariate cdf F with univariate marginals F_1, …, F_d can be written as F(x_1, …, x_d) = C(F_1(x_1), …, F_d(x_d)) for some copula C, unique when the marginals are continuous. Pictorially: distribution = univariate marginals + copula.

Page 10: Learning with Tree-averaged Densities and Distributions

Example: Bivariate Gaussian Copula
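
The formula on this slide did not survive extraction. As a minimal sketch of the bivariate Gaussian copula density (assuming SciPy is available; the function name gaussian_copula_density is illustrative, and rho is the correlation parameter):

    from scipy.stats import multivariate_normal, norm

    def gaussian_copula_density(u, v, rho):
        # Map unit-square coordinates back to the Gaussian scale.
        x, y = norm.ppf(u), norm.ppf(v)
        # Bivariate Gaussian density with correlation rho, divided by the
        # product of the standard normal marginal densities.
        joint = multivariate_normal(mean=[0.0, 0.0],
                                    cov=[[1.0, rho], [rho, 1.0]]).pdf([x, y])
        return joint / (norm.pdf(x) * norm.pdf(y))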

Page 11: Learning with Tree-averaged Densities and Distributions

Useful Properties of Copulas

• Preserves concordance between the variables
  – A rank-based measure of dependence
• Preserves mutual information
• Can be viewed as a canonical form of a multivariate distribution for the purpose of estimating multivariate dependence

Page 12: Learning with Tree-averaged Densities and Distributions

Copula Density

The copula density c is the mixed partial derivative of the copula cdf:

c(u_1, …, u_d) = ∂^d C(u_1, …, u_d) / ∂u_1 ⋯ ∂u_d

Page 13: Learning with Tree-averaged Densities and Distributions

Separating Univariate Marginals

1. Fit the univariate marginals (parametric or non-parametric)
2. Replace the data points with the cdf values of the marginals
3. Estimate the copula density (a sketch of steps 1-2 follows below)

Inference for the margins [Joe and Xu 96]; canonical maximum likelihood [Genest et al 95]
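
A minimal sketch of steps 1-2 with empirical (non-parametric) marginals; X is assumed to be an N × d data matrix, and the helper name is mine, not the talk's:

    import numpy as np

    def empirical_cdf_transform(X):
        # Replace each column with its empirical cdf values, so every
        # column becomes approximately uniform on (0, 1).  Dividing by
        # N + 1 keeps the values strictly inside the unit interval.
        N, d = X.shape
        U = np.empty((N, d))
        for j in range(d):
            ranks = np.argsort(np.argsort(X[:, j])) + 1  # ranks 1..N
            U[:, j] = ranks / (N + 1.0)
        return U

A copula density is then fit to the transformed matrix U in step 3.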

Page 14: Learning with Tree-averaged Densities and Distributions

What Next?

• Aren't we back to square one?
  – Still estimating a multivariate density from data
• Not quite
  – All marginals are fixed
  – Lots of approaches for copulas
• Vast majority focus on the bivariate case
  – Design models that use only pairs of variables

Page 15: Learning with Tree-averaged Densities and Distributions

Tree-Structured Densities

[Figure: a spanning tree over variables x1-x6]
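
The factorization shown on this slide did not survive extraction; in the standard Chow-Liu form (a general fact about tree-structured models), a density with tree dependence structure T factorizes into univariate marginals and one bivariate term per edge:

    f(x_1, …, x_d) = ∏_i f_i(x_i) × ∏_{(i,j) ∈ T} [ f_ij(x_i, x_j) / (f_i(x_i) f_j(x_j)) ]

so only univariate and bivariate quantities are ever needed.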

Page 16: Learning with Tree-averaged Densities and Distributions

Tree-Structured Copulas
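
The slide body is likewise lost in this transcript; plugging uniform marginals into the factorization above gives the natural tree-structured copula density, a product of one bivariate copula density per edge of the tree T:

    c_T(u_1, …, u_d) = ∏_{(i,j) ∈ T} c_ij(u_i, u_j)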

Page 17: Learning with Tree-averaged Densities and Distributions

Chow-Liu Algorithm (for Copulas)

[Figure: for four variables a1-a4, a weight is computed for each of the six candidate edges, here c(a1,a2) = 0.3126, c(a1,a3) = 0.0229, c(a1,a4) = 0.0172, c(a2,a3) = 0.0230, c(a2,a4) = 0.0183, c(a3,a4) = 0.2603, and the maximum-weight spanning tree is selected]
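
A minimal sketch of the tree-selection step (assuming SciPy; weights is a symmetric matrix of strictly positive pairwise scores such as the figure's edge values, and the function name is illustrative):

    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree

    def chow_liu_tree(weights):
        # SciPy computes a minimum spanning tree, so negate the weights to
        # obtain the maximum-weight (Chow-Liu) tree.  Strictly positive
        # weights are assumed, so no negated edge becomes an absent zero.
        mst = minimum_spanning_tree(-np.asarray(weights, dtype=float))
        rows, cols = mst.nonzero()
        return sorted(zip(rows.tolist(), cols.tolist()))

With the six edge values in the figure, the selected tree would keep (a1,a2), (a3,a4), and (a2,a3).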

Page 18: Learning with Tree-averaged Densities and Distributions

Distribution over Spanning Trees [Meilă and Jaakkola 00, 06]

[Figure: different spanning trees over variables a1-a4]

With a decomposable prior over spanning trees, summing over all trees can be done in O(d³)!
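
The O(d³) claim can be made concrete via the weighted Matrix Tree Theorem, which the Meilă-Jaakkola construction relies on: the sum over all spanning trees of the product of edge weights equals a cofactor of the weighted graph Laplacian. A minimal sketch (function name mine):

    import numpy as np

    def sum_over_spanning_trees(beta):
        # beta: symmetric nonnegative edge weights with a zero diagonal.
        # Returns the sum over all spanning trees T of
        # prod_{(u,v) in T} beta[u, v], as the determinant of the
        # Laplacian with row and column 0 removed (any index works).
        beta = np.asarray(beta, dtype=float)
        L = np.diag(beta.sum(axis=1)) - beta
        return np.linalg.det(L[1:, 1:])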

Page 19: Learning with Tree-averaged Densities and Distributions

Tree-Averaged Copula

• Can compute the sum over all d^(d-2) spanning trees
• Can be viewed as a mixture over many, many spanning trees
• Can use EM to estimate the parameters
  – Even though there are d^(d-2) mixture components! (see the sketch below)
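
As a sketch of why the average stays in closed form: under a decomposable prior p(T) ∝ ∏_{(i,j)∈T} β_ij, the tree-averaged copula density Σ_T p(T) ∏_{(i,j)∈T} c_ij(u_i, u_j) reduces to a ratio of two spanning-tree sums, each a determinant as above. Variable names (beta, pair_dens) are illustrative:

    import numpy as np

    def tree_averaged_density(beta, pair_dens):
        # pair_dens[i, j]: the bivariate copula density c_ij(u_i, u_j)
        # evaluated at one data point.  The Matrix Tree Theorem is applied
        # twice: numerator with weights beta * pair_dens, denominator with
        # beta alone (the prior's normalizing constant).
        beta = np.asarray(beta, dtype=float)
        W = beta * np.asarray(pair_dens, dtype=float)
        num = np.diag(W.sum(axis=1)) - W
        den = np.diag(beta.sum(axis=1)) - beta
        return np.linalg.det(num[1:, 1:]) / np.linalg.det(den[1:, 1:])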

Page 20: Learning with Tree-averaged Densities and Distributions

EM for Tree-Averaged Copulas

• E-step: compute the posterior over spanning trees given each data point
  – Can be done in O(d³) per data point
• M-step: update the bivariate copula parameters and the edge weights of the tree prior
  – The copula-parameter update is often linear in the number of points
    • Gaussian copula: solving a cubic equation
  – The edge-weight update is essentially iterative scaling
    • Can be done in O(d³) per iteration

(A naive mixture with d^(d-2) components would be intractable to handle directly; the spanning-tree determinant identities make both steps efficient. A sketch of the E-step quantity follows below.)
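
A sketch of the O(d³) E-step quantity, following the Meilă-Jaakkola identities (my reconstruction; names are illustrative): the posterior probability of each edge appearing in the tree, where weights[u, v] would be β_uv times the bivariate copula density for that edge at the current data point.

    import numpy as np

    def edge_posteriors(weights):
        # P[u, v] = probability that edge (u, v) occurs in a spanning tree
        # drawn with P(T) proportional to the product of its edge weights.
        # M is the inverse of the Laplacian with row/column 0 removed, so
        # the whole computation is O(d^3).
        weights = np.asarray(weights, dtype=float)
        d = weights.shape[0]
        L = np.diag(weights.sum(axis=1)) - weights
        M = np.linalg.inv(L[1:, 1:])
        P = np.zeros((d, d))
        for u in range(1, d):
            P[0, u] = P[u, 0] = weights[0, u] * M[u - 1, u - 1]
            for v in range(u + 1, d):
                P[u, v] = P[v, u] = weights[u, v] * (
                    M[u - 1, u - 1] + M[v - 1, v - 1] - 2.0 * M[u - 1, v - 1])
        return P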

Page 21: Learning with Tree-averaged Densities and Distributions

Experiments: Log-Likelihood on Test Data

UCI ML Repository, MAGIC data set

12000 10-dimensional vectors

2000 examples in test sets

Average over 10 partitions

Page 22: Learning with Tree-averaged Densities and Distributions

Binary-Continuous Data

Page 23: Learning with Tree-averaged Densities and Distributions

Summary

• Multivariate distribution = univariate marginals + copula
• Copula density estimation via tree-averaging
  – Closed form
• Tractable parameter estimation algorithm in the ML framework (EM)
  – O(Nd³) per iteration
• Only bivariate distributions at each estimation step
  – Potentially avoiding the curse of dimensionality
• New model for multi-site rainfall amounts (Poster W12)