Estimation of causal direction in the presence of latent...

Shohei Shimizu

Osaka University, Japan

with

Kenneth Bollen

University of North Carolina, Chapel Hill, USA

1

Estimation of causal direction

in the presence of latent confounders

and linear non-Gaussian SEMs

Causal Modeling and Machine Learning

Beijing, China, June 2014

Abstract

• Estimation of causal direction of two

observed variables in the presence of

latent confounders

• A key challenge in causal discovery

• Propose a non-Gaussian method

• Not require to specify the number of latent

confounders

• Experiments on artificial and sociology data

2

Background

4

Motivation

• Causality is a main interest in many empirical sciences

• Many recent methods for estimating causal directions (with no temporal information)– Linear non-Gaussian model (Dodge & Rousson 2001; Shimizu et al., 2006)

– Nonlinear model (Hoyer et al., 2009; Zhang & Hyvarinen, 2009; Peters et al. 2011)

• Another important challenge: Latent confounders

Sleep

problems

Depression

mood

Sleep

problems

Depression

mood ?

or

Epidemiology (Rosenstrom et al., 2012)

Which is dominant?

Structural equation modeling

(SEM) (Bollen, 1989; Pearl, 2000, 2009)

• A framework for describing causal relations

• An example (of linear cases):

– The value of 𝑥2 is determined by the values of 𝑥1 and

error/exogenous variable 𝑒2 through the linear function

• Generally speaking, if the value of 𝑥1 is changed

and that of 𝑥2 also changes, then 𝑥1 causes 𝑥2

5

= 𝒃𝟐𝟏𝒙𝟏 + 𝒆𝟐x1x2

𝒙𝟐 ∶= 𝒇(𝒙𝟏, 𝒆𝟐)e2

1. Estimation of causal direction when temporal information is not available

2. Coping with latent confounders

6

Major challenges

x1 x2?

x1 x2

or

x1 x2 ?x1 x2 or

f1 f1

• Acyclic SEMs with different directions distinguishable (Dodge & Rousson, 2001; Shimizu et al., 2006)

• Fundamental assumptions:

– e1 and e2 are non-Gaussian

– Independence btw. e1 and e2 (No latent confounders)

where and are error/exogenous variables

Non-Gaussian approach: LiNGAM

(Linear Non-Gaussian Acyclic Model) (Shimizu et al., 2006)

7

or

21212

11

exbx

ex

22

12121

ex

exbx

Model 1: Model 2:

x1 x2

e1 e2

1e 2e

x1 x2

e1 e2

88Different directions give

different data distributionsGaussian Non-Gaussian

(uniform)

Model 1:

Model 2:

x1

x2

x1

x2

e1

e2

x1

x2

e1

e2

x1

x2

x1

x2

x1

x2

212

11

8.0 exx

ex

22

121 8.0

ex

exx

1varvar 21 xx

,021 eEeE

0.8

0.8

• Extension to incorporate non-Gaussian latent

confounders

i

ij

jij

Q

q

qiqii exbfx 1

LiNGAM with latent confounders (Hoyer, Shimizu & Kerminen, 2008)

9

where, WLG, are independent: ),,1( Qqfq

qf

x1 x2 2e1e

1f 2f

2121

1

222

1

1

111

exbfx

efx

Q

q

qq

Q

q

qq

Previous estimation approaches• Explicitly model latent confounders and

compare two models with opposite directions of

causation

– Maximum likelihood principle (Hoyer et al., 2008 )

– Bayesian model selection (Henao & Winther, 2011)

– Laplace / finite mixture of Gaussians for p( )

• Require to specify the number of latent

confounders, which is difficult in general

10

x1 x2

f1

x1 x2

orfQ f1 fQ

… …

2e1e2e1e

ie

Our proposal

Reference:

Shimizu and Bollen (2014)

Journal of Machine Learning Research

In press

)(

2

m

Observations are generated from the LiNGAM

model with possibly different intercepts )(

22

m

)1(

1x)1(

2x

)(

2

mx)1(

1x

)(

2

)(

121

1

)(

22

)(

2

mmQ

q

m

qq

m exbfx

Key idea (1/2)

• Another look at the LiNGAM with latent confounders:

12

x1 x2

f1 fQ…

2e1e

m-th obs.:

)1(

2e)1(

1e

)(

2

me)(

1

me

……

21b

)(

22

m

)1(

22 21b

21b

Key idea (2/2)

• Include the sums of latent confounders as

the observation-specific intercepts:

• Not explicitly model latent confounders

• Neither necessary to specify the number

of latent confounders Q nor estimate the

coefficients

13

)(

2

m

)(

2

)(

121

1

)(

22

)(

2

mmQ

q

m

qq

m exbfx

m-th obs.:

q2

Obs.-specific

intercept

• Compare these two LiNGAM models with opposite

directions:

• Many additional parameters

• Prior for the observation-specific intercepts

• Other para. low-informative: Gaussian with large sd.

• Bayesian model selection (marginal likelihoods)

)()(

121

)(

22

)(

2

)(

1

)(

11

)(

1

m

i

mmm

mmm

exbx

ex

Our approach14

),,1;2,1()( nmim

i )(m

i

Model 3 (x1 x2)

)(

2

)(

22

)(

2

)(

1

)(

212

)(

11

)(

1

mmm

mmmm

ex

exbx

Model 4 (x1 x2)

v

Prior for the observation-specific

intercepts

• Motivation: Central limit theorem

– Sums of independent variables tend to be more Gaussian

• Approximate the density by a bell-shaped curve dist.

• Select the hyper-parameter values that maximize the

marginal likelihood: Empirical Bayes

–

– DOF fixed to be 6 in the experiments below

• Small means similar intercepts

15

Q

q

m

qq

mQ

q

m

qq

m ff1

)(

2

)(

2

1

)(

1

)(

1 ,

~)(

2

)(

1

m

m

t-distribution with sd ,

correlation , and DOF1221,

v

)},(sd0.1,),(sd2.0,0{ lll xx }9.0,,1.0,0{12

l

Experiments on artificial data

8072

5847

34 39

0

20

40

60

80

100

N. latent confounders = 6N. latent confounders = 1

Experimental results (100 obs.)• Data generated from LiNGAM with latent confounders

• Various non-Gaussian distributions

– Laplace, Uniform, asymmetric dist. etc.

• Our method uses Laplace for p( )

17

x1 x2

f1 fQ…

Numbers of successful discoveries (100 rep.)

86

55 58 55 51 54

0

20

40

60

80

100

Our

mthd

Henao:

1, 4, 10 conf.

Hoyer:

1, 4 conf.

Our

mthdHenao:

1, 4, 10 conf.

Hoyer:

1, 4 conf.

ie

Experiment on sociology data

Sociology data

• Source: General Social Survey (n=1380)

– Non-farm background, ages 35-44, white,

male, in the labor force, no missing data for

any of the covariates, 1972-2006

19

Status attainment model(Duncan et al., 1972)

x2: Son’s Income

Evaluation of our method

using the sociology data

Known (temporal)

orderings of 15 pairs

20

Son’s

Education

Father’s

Education

Son’s

Income

Father’s

Education

Son’s

Income

Son’s

Occupation

……

Conclusions

Conclusions

• Estimation of causal direction in the presence of

latent confounders is a major challenge in

causal discovery

• Our proposal: Fit linear non-Gaussian SEM with

possibly different intercepts to data

• Future works

– Test other informative priors for observation-specific

intercepts

– Implement a wider variety of error/prior distributions

(e.g., learn DOF of t dist.)

– Develop extensions using nonlinear/cyclic models (Hoyer et al., 2009; Zhang & Hyvarinen, 2009; Lacerda et al., 2008)

instead of LiNGAM

22

Estimation of causal direction in the presence of latent...

Documents

Transcript of Estimation of causal direction in the presence of latent...