Learning with Tree-averaged Densities and Distributions
Sergey Kirshner
Alberta Ingenuity Centre for Machine Learning,
Department of Computing Science, University of Alberta, Canada
December 5, 2007
NIPS 2007, Poster W12
NIPS 2007 Learning with Tree-averaged Densities and Distributions 2
Overview
• Want to fit a density to complete multivariate data
• New density estimation model based on averaging over tree-dependence structures
  – Distribution = univariate marginals + copula
  – Bayesian averaging over tree-structured copulas
  – Efficient parameter estimation for tree-averaged copulas
• Can solve problems with 10-30 dimensions
Most Popular Distribution… (the Multivariate Gaussian)
• Interpretable
• Closed under taking marginals
• Generalizes to multiple dimensions
• Models pairwise dependence
• Tractable
• 245 pages out of 691 in Continuous Multivariate Distributions by Kotz, Balakrishnan, and Johnson
[Figure: surface plot of a bivariate Gaussian density]
What If the Data Is NOT Gaussian?
Curse of Dimensionality
[Figure: a histogram-style estimator partitions each axis into bins of width 1/n, giving n^d cells in d dimensions; alongside, a surface plot of a bivariate Gaussian density]
Even a generous box loses probability mass exponentially fast: V([-2,2]^d) ≈ 0.9545^d
[Bellman 57]
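A quick numeric check of the shrinking box mass, a sketch using only the standard library (erf gives the one-dimensional standard Gaussian mass in [-2, 2]):

```python
import math

# One-dimensional standard Gaussian mass in [-2, 2]:
# P(|X| <= 2) = erf(2 / sqrt(2)) ≈ 0.9545
mass_1d = math.erf(2 / math.sqrt(2))

# Coordinates of a standard multivariate Gaussian are independent,
# so the mass of the box [-2, 2]^d is mass_1d ** d: it decays
# exponentially with the dimension d.
for d in (1, 10, 30):
    print(d, mass_1d ** d)
```

Already at d = 30, less than a quarter of the probability mass remains inside the box.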
Avoiding the Curse, Step 1: Separating Univariate Marginals
f(x1, …, xd) = f1(x1) ⋯ fd(xd) × c(F1(x1), …, Fd(xd))
• f1(x1) ⋯ fd(xd): the univariate marginals (the independent-variables part)
• c(F1(x1), …, Fd(xd)): the multivariate dependence term, the copula density
Monotonic Transformation of the Variables
Copula
A copula C is a multivariate distribution (cdf) defined on the unit hypercube [0,1]^d with uniform univariate marginals:
C(1, …, 1, ui, 1, …, 1) = ui for each i and each ui ∈ [0,1]
Sklar’s Theorem [Sklar 59]
Every multivariate cdf F with univariate marginals F1, …, Fd can be written as
F(x1, …, xd) = C(F1(x1), …, Fd(xd))
for some copula C (unique when F is continuous). In short: distribution = marginals + copula.
Example: Bivariate Gaussian Copula
Cρ(u, v) = Φρ(Φ⁻¹(u), Φ⁻¹(v)), where Φρ is the standard bivariate Gaussian cdf with correlation ρ and Φ⁻¹ is the inverse of the standard univariate Gaussian cdf.
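The corresponding copula density, c_ρ(u, v) = φ_ρ(Φ⁻¹(u), Φ⁻¹(v)) / (φ(Φ⁻¹(u)) φ(Φ⁻¹(v))), can be sketched with only the standard library (the function name is my own):

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard univariate normal: Phi^{-1} via inv_cdf, phi via pdf

def gaussian_copula_density(u, v, rho):
    """Bivariate Gaussian copula density:
    c_rho(u, v) = phi_rho(x, y) / (phi(x) * phi(y)),
    with x = Phi^{-1}(u), y = Phi^{-1}(v) and |rho| < 1."""
    x, y = _STD.inv_cdf(u), _STD.inv_cdf(v)
    det = 1.0 - rho * rho
    # Standard bivariate Gaussian pdf with unit variances, correlation rho.
    joint = math.exp(-(x * x - 2 * rho * x * y + y * y) / (2 * det)) \
        / (2 * math.pi * math.sqrt(det))
    return joint / (_STD.pdf(x) * _STD.pdf(y))
```

At ρ = 0 the density is identically 1 (the independence copula); at u = v = 0.5 it equals 1/sqrt(1 - ρ²).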
Useful Properties of Copulas
• Preserves concordance between the variables
  – Rank-based measure of dependence
• Preserves mutual information
• Can be viewed as a canonical form of a multivariate distribution for estimating multivariate dependence
Copula Density
c(u1, …, ud) = ∂^d C(u1, …, ud) / ∂u1 ⋯ ∂ud
so the joint density factors as f(x1, …, xd) = f1(x1) ⋯ fd(xd) × c(F1(x1), …, Fd(xd))
Separating Univariate Marginals
1. Fit the univariate marginals (parametric or non-parametric)
2. Replace the data points with the cdf values of the fitted marginals
3. Estimate the copula density
Inference for the margins [Joe and Xu 96]; canonical maximum likelihood [Genest et al 95]
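Step 2 can be done non-parametrically with the empirical cdf (the rank transform used in canonical maximum likelihood); a minimal sketch, with the function name my own:

```python
import numpy as np

def pseudo_observations(data):
    """Map each column of an (N, d) array into (0, 1) via its empirical cdf.
    Dividing ranks by N + 1 keeps values strictly inside the unit interval;
    ties are broken arbitrarily."""
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    # Double argsort turns each column into 0-based ranks; shift to 1-based.
    ranks = data.argsort(axis=0).argsort(axis=0) + 1
    return ranks / (n + 1.0)
```

The transformed data live on the unit hypercube, so the copula density can be estimated on them directly.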
What Next?
• Aren’t we back to square one?
  – Still estimating a multivariate density from data
• Not quite
  – All marginals are fixed
  – Lots of approaches for copulas
• Vast majority focus on the bivariate case
  – Design models that use only pairs of variables
Tree-Structured Densities
[Figure: a spanning tree over variables x1, x2, x3, x4, x5, x6]
For a tree T with edge set E(T):
f(x1, …, xd) = ∏i f(xi) × ∏(u,v)∈E(T) f(xu, xv) / (f(xu) f(xv))
Tree-Structured Copulas
A tree-structured copula density factors into bivariate copula densities over the edges of a tree T:
cT(u1, …, ud) = ∏(i,j)∈E(T) cij(ui, uj)
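Evaluating such a product is straightforward; a sketch, where the edge list and per-edge density lookup are my own illustrative interface:

```python
def tree_copula_density(u, edges, edge_density):
    """Evaluate c_T(u) = product over tree edges (i, j) of c_ij(u_i, u_j).

    u            : sequence of values in (0, 1), one per variable
    edges        : list of (i, j) index pairs forming a spanning tree
    edge_density : dict mapping (i, j) to a bivariate copula density function
    """
    density = 1.0
    for i, j in edges:
        density *= edge_density[(i, j)](u[i], u[j])
    return density
```

With the independence copula (density 1) on every edge, the product is 1, recovering the independent case.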
Chow-Liu Algorithm (for Copulas)
[Figure: four variables a1, a2, a3, a4 with all six pairwise edges, weighted by quantities derived from the bivariate copula terms c(ai, aj):]

  A1A2: 0.3126   A1A3: 0.0229   A1A4: 0.0172
  A2A3: 0.0230   A2A4: 0.0183   A3A4: 0.2603

The maximum-weight spanning tree keeps edges A1A2, A3A4, and A2A3.
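The Chow-Liu selection is just a maximum-weight spanning tree; a minimal sketch (Kruskal’s algorithm with union-find, using the slide’s weights as the example):

```python
def chow_liu_tree(num_vars, weights):
    """Maximum-weight spanning tree via Kruskal's algorithm.

    weights: dict mapping (i, j) index pairs to edge weights, e.g.
    pairwise dependence scores estimated from bivariate copulas.
    """
    parent = list(range(num_vars))  # union-find forest

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    tree = []
    # Greedily add the heaviest edges that do not close a cycle.
    for (i, j), _ in sorted(weights.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

On the six weights above (indexing a1, …, a4 as 0, …, 3) it selects exactly the edges A1A2, A3A4, and A2A3.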
Distribution over Spanning Trees [Meilă and Jaakkola 00, 06]
[Figure: several different spanning trees over variables a1, a2, a3, a4]
A decomposable prior over all spanning trees: by the matrix-tree theorem, the sum over all d^(d-2) spanning trees reduces to a determinant, computable in O(d³)!
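The determinant trick can be sketched directly (weighted matrix-tree theorem: the sum over all spanning trees of the product of edge weights equals any cofactor of the weighted Laplacian; with all weights equal to 1 this counts the trees):

```python
import numpy as np

def spanning_tree_sum(weight_matrix):
    """Sum over all spanning trees of the product of edge weights,
    via the matrix-tree theorem: determinant of the weighted
    Laplacian with one row and column deleted. O(d^3) work
    replaces a sum with d^(d-2) terms."""
    w = np.asarray(weight_matrix, dtype=float)
    laplacian = np.diag(w.sum(axis=1)) - w
    return np.linalg.det(laplacian[1:, 1:])
```

With unit weights on the complete graph this recovers Cayley’s formula d^(d-2), e.g. 125 spanning trees for d = 5.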
Tree-Averaged Copula
• Can compute the sum over all d^(d-2) spanning trees
• Can be viewed as a mixture over many, many spanning trees
• Can use EM to estimate the parameters
  – Even though there are d^(d-2) mixture components!
EM for Tree-Averaged Copulas
A naive mixture over all d^(d-2) trees is intractable! The decomposable tree prior makes EM tractable:
• E-step: compute the posterior over spanning trees
  – Can be done in O(d³) per data point
• M-step: update the bivariate copula parameters and the tree edge weights
  – Update of the copula parameters is often linear in the number of points
    • Gaussian copula: solving a cubic equation
  – Update of the edge weights is essentially iterative scaling
    • Can be done in O(d³) per iteration
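Under the Meilă-Jaakkola tree distribution, the per-edge posterior quantities needed in the E-step come from the inverse of a reduced weighted Laplacian in O(d³); a sketch, with vertex 0 as the deleted row/column and the edge weights given up front:

```python
import numpy as np

def edge_in_tree_probabilities(weight_matrix):
    """Marginal probability of each edge under the distribution
    P(T) proportional to the product of T's edge weights, in O(d^3).

    Delete row/column 0 of the weighted Laplacian, invert it, and
    read off per-edge marginals from the inverse's entries."""
    w = np.asarray(weight_matrix, dtype=float)
    d = w.shape[0]
    laplacian = np.diag(w.sum(axis=1)) - w
    m = np.linalg.inv(laplacian[1:, 1:])
    probs = {}
    for u in range(d):
        for v in range(u + 1, d):
            if u == 0:
                # Terms involving the deleted vertex drop out.
                probs[(u, v)] = w[u, v] * m[v - 1, v - 1]
            else:
                probs[(u, v)] = w[u, v] * (
                    m[u - 1, u - 1] + m[v - 1, v - 1] - 2 * m[u - 1, v - 1])
    return probs
```

Sanity check: with uniform weights on 4 vertices, each of the 6 edges appears with probability 2/d = 0.5, and the marginals sum to d - 1 = 3 (every spanning tree has d - 1 edges).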
Experiments: Log-Likelihood on Test Data
• MAGIC data set from the UCI ML Repository
• 12000 10-dimensional vectors
• 2000 examples in each test set
• Results averaged over 10 partitions
Binary-Continuous Data
Summary
• Multivariate distribution = univariate marginals + copula
• Copula density estimation via tree-averaging
  – Closed form
• Tractable parameter estimation algorithm in the ML framework (EM)
  – O(Nd³) per iteration
• Only bivariate distributions at each estimation step
  – Potentially avoiding the curse of dimensionality
• New model for multi-site rainfall amounts (Poster W12)