Estimation of causal direction in the presence of latent...
Transcript of Estimation of causal direction in the presence of latent...
Shohei Shimizu
Osaka University, Japan
with
Kenneth Bollen
University of North Carolina, Chapel Hill, USA
1
Estimation of causal direction
in the presence of latent confounders
and linear non-Gaussian SEMs
Causal Modeling and Machine Learning
Beijing, China, June 2014
Abstract
• Estimation of causal direction of two
observed variables in the presence of
latent confounders
• A key challenge in causal discovery
• Propose a non-Gaussian method
• Not require to specify the number of latent
confounders
• Experiments on artificial and sociology data
2
Background
4
Motivation
• Causality is a main interest in many empirical sciences
• Many recent methods for estimating causal directions (with no temporal information)– Linear non-Gaussian model (Dodge & Rousson 2001; Shimizu et al., 2006)
– Nonlinear model (Hoyer et al., 2009; Zhang & Hyvarinen, 2009; Peters et al. 2011)
• Another important challenge: Latent confounders
Sleep
problems
Depression
mood
Sleep
problems
Depression
mood ?
or
Epidemiology (Rosenstrom et al., 2012)
Which is dominant?
Structural equation modeling
(SEM) (Bollen, 1989; Pearl, 2000, 2009)
• A framework for describing causal relations
• An example (of linear cases):
– The value of 𝑥2 is determined by the values of 𝑥1 and
error/exogenous variable 𝑒2 through the linear function
• Generally speaking, if the value of 𝑥1 is changed
and that of 𝑥2 also changes, then 𝑥1 causes 𝑥2
5
= 𝒃𝟐𝟏𝒙𝟏 + 𝒆𝟐x1x2
𝒙𝟐 ∶= 𝒇(𝒙𝟏, 𝒆𝟐)e2
1. Estimation of causal direction when temporal information is not available
2. Coping with latent confounders
6
Major challenges
x1 x2?
x1 x2
or
x1 x2 ?x1 x2 or
f1 f1
• Acyclic SEMs with different directions distinguishable (Dodge & Rousson, 2001; Shimizu et al., 2006)
• Fundamental assumptions:
– e1 and e2 are non-Gaussian
– Independence btw. e1 and e2 (No latent confounders)
where and are error/exogenous variables
Non-Gaussian approach: LiNGAM
(Linear Non-Gaussian Acyclic Model) (Shimizu et al., 2006)
7
or
21212
11
exbx
ex
22
12121
ex
exbx
Model 1: Model 2:
x1 x2
e1 e2
1e 2e
x1 x2
e1 e2
88Different directions give
different data distributionsGaussian Non-Gaussian
(uniform)
Model 1:
Model 2:
x1
x2
x1
x2
e1
e2
x1
x2
e1
e2
x1
x2
x1
x2
x1
x2
212
11
8.0 exx
ex
22
121 8.0
ex
exx
1varvar 21 xx
,021 eEeE
0.8
0.8
• Extension to incorporate non-Gaussian latent
confounders
i
ij
jij
Q
q
qiqii exbfx 1
LiNGAM with latent confounders (Hoyer, Shimizu & Kerminen, 2008)
9
where, WLG, are independent: ),,1( Qqfq
qf
x1 x2 2e1e
1f 2f
2121
1
222
1
1
111
exbfx
efx
Q
q
Q
q
Previous estimation approaches• Explicitly model latent confounders and
compare two models with opposite directions of
causation
– Maximum likelihood principle (Hoyer et al., 2008 )
– Bayesian model selection (Henao & Winther, 2011)
– Laplace / finite mixture of Gaussians for p( )
• Require to specify the number of latent
confounders, which is difficult in general
10
x1 x2
f1
x1 x2
orfQ f1 fQ
… …
2e1e2e1e
ie
Our proposal
Reference:
Shimizu and Bollen (2014)
Journal of Machine Learning Research
In press
)(
2
m
Observations are generated from the LiNGAM
model with possibly different intercepts )(
22
m
)1(
1x)1(
2x
)(
2
mx)1(
1x
)(
2
)(
121
1
)(
22
)(
2
mmQ
q
m
m exbfx
Key idea (1/2)
• Another look at the LiNGAM with latent confounders:
12
x1 x2
f1 fQ…
2e1e
m-th obs.:
)1(
2e)1(
1e
)(
2
me)(
1
me
……
21b
)(
22
m
)1(
22 21b
21b
Key idea (2/2)
• Include the sums of latent confounders as
the observation-specific intercepts:
• Not explicitly model latent confounders
• Neither necessary to specify the number
of latent confounders Q nor estimate the
coefficients
13
)(
2
m
)(
2
)(
121
1
)(
22
)(
2
mmQ
q
m
m exbfx
m-th obs.:
q2
Obs.-specific
intercept
• Compare these two LiNGAM models with opposite
directions:
• Many additional parameters
• Prior for the observation-specific intercepts
• Other para. low-informative: Gaussian with large sd.
• Bayesian model selection (marginal likelihoods)
)()(
121
)(
22
)(
2
)(
1
)(
11
)(
1
m
i
mmm
mmm
exbx
ex
Our approach14
),,1;2,1()( nmim
i )(m
i
Model 3 (x1 x2)
)(
2
)(
22
)(
2
)(
1
)(
212
)(
11
)(
1
mmm
mmmm
ex
exbx
Model 4 (x1 x2)
v
Prior for the observation-specific
intercepts
• Motivation: Central limit theorem
– Sums of independent variables tend to be more Gaussian
• Approximate the density by a bell-shaped curve dist.
• Select the hyper-parameter values that maximize the
marginal likelihood: Empirical Bayes
–
– DOF fixed to be 6 in the experiments below
• Small means similar intercepts
15
Q
q
m
mQ
q
m
m ff1
)(
2
)(
2
1
)(
1
)(
1 ,
~)(
2
)(
1
m
m
t-distribution with sd ,
correlation , and DOF1221,
v
)},(sd0.1,),(sd2.0,0{ lll xx }9.0,,1.0,0{12
l
Experiments on artificial data
8072
5847
34 39
0
20
40
60
80
100
N. latent confounders = 6N. latent confounders = 1
Experimental results (100 obs.)• Data generated from LiNGAM with latent confounders
• Various non-Gaussian distributions
– Laplace, Uniform, asymmetric dist. etc.
• Our method uses Laplace for p( )
17
x1 x2
f1 fQ…
Numbers of successful discoveries (100 rep.)
86
55 58 55 51 54
0
20
40
60
80
100
Our
mthd
Henao:
1, 4, 10 conf.
Hoyer:
1, 4 conf.
Our
mthdHenao:
1, 4, 10 conf.
Hoyer:
1, 4 conf.
ie
Experiment on sociology data
Sociology data
• Source: General Social Survey (n=1380)
– Non-farm background, ages 35-44, white,
male, in the labor force, no missing data for
any of the covariates, 1972-2006
19
Status attainment model(Duncan et al., 1972)
x2: Son’s Income
Evaluation of our method
using the sociology data
Known (temporal)
orderings of 15 pairs
20
Son’s
Education
Father’s
Education
Son’s
Income
Father’s
Education
Son’s
Income
Son’s
Occupation
……
Conclusions
Conclusions
• Estimation of causal direction in the presence of
latent confounders is a major challenge in
causal discovery
• Our proposal: Fit linear non-Gaussian SEM with
possibly different intercepts to data
• Future works
– Test other informative priors for observation-specific
intercepts
– Implement a wider variety of error/prior distributions
(e.g., learn DOF of t dist.)
– Develop extensions using nonlinear/cyclic models (Hoyer et al., 2009; Zhang & Hyvarinen, 2009; Lacerda et al., 2008)
instead of LiNGAM
22