Joint causal inference on observational and experimental data - NIPS 2016 "What If?" workshop poster


Joint Causal Inference
Sara Magliacane [1,2], Tom Claassen [1,3], Joris M. Mooij [1]

[1] University of Amsterdam, [2] VU Amsterdam, [3] Radboud University Nijmegen


CausAM (Causality@AmsterdaM)

Abstract

We propose Joint Causal Inference (JCI), a powerful formulation of causal discovery over multiple datasets in which we jointly learn both the causal structure and the targets of interventions from independence test results. Our implementation ACID-JCI substantially improves the accuracy of the causal predictions with respect to the state of the art.

Causal discovery methods

To answer “what if?” questions we need the causal structure. Two main categories of methods that learn causal structures from data:

• Score-based: evaluate models using a penalized likelihood score

• Constraint-based: use statistical independences to express constraints over possible causal models

Advantage of constraint-based methods:

• can handle latent confounders naturally

Advantage of score-based methods:

• can formulate joint inference on observational and experimental data and learn the targets of interventions, e.g. [Eaton and Murphy, 2007].

Goal: Can we perform joint inference using constraint-based methods?

Joint causal inference (JCI)

We propose to jointly model several observational or experimental datasets {D_r}_{r∈{1,…,n}} with zero or more possibly unknown intervention targets.

We assume a unique underlying causal DAG across datasets, defined over system variables {X_j}_{j∈X} (some of which may be hidden). Consequence: we do not allow interventions that change the causal structure between datasets, e.g. perfect interventions, which cut the edges into their targets.

We introduce two types of dummy variables in the data:

• a regime variable R, indicating which dataset D_r a data point comes from

• intervention variables {I_i}_{i∈I}, which are functions of R (see the pooling sketch below).
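For illustration, a minimal sketch of this encoding (assuming pandas; the per-regime data frames here are random placeholders for the actual datasets D_r, and g1, g2 are hypothetical intervention functions matching the example table below):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Placeholder stand-ins for the separate (observational or experimental)
# datasets D_r, each over the observed system variables.
datasets = {r: pd.DataFrame(rng.normal(size=(100, 3)),
                            columns=["X1", "X2", "X4"])
            for r in (1, 2, 3, 4)}

# Intervention variables are deterministic functions of the regime:
# I_i = g_i(R).
g1 = {1: 20, 2: 20, 3: 30, 4: 30}
g2 = {1: 0, 2: 1, 3: 0, 4: 1}

# Pool all datasets into one, adding R and the I_i as dummy columns.
pooled = pd.concat(
    [df.assign(R=r, I1=g1[r], I2=g2[r]) for r, df in datasets.items()],
    ignore_index=True,
)
```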

We assume that we can represent the whole system as an acyclic SCM:

    R = E_R,
    I_i = g_i(R),  i ∈ I,
    X_j = f_j(X_{pa(X_j)∩X}, I_{pa(X_j)∩I}, E_j),  j ∈ X,
    P((E_k)_{k∈X∪{R}}) = ∏_{k∈X∪{R}} P(E_k).

We represent this SCM with a causal DAG C, for example:

R   I1   I2   X1     X2     X4
1   20   0    0.1    0.2    0.5
1   20   0    0.13   0.21   0.49
1   20   0    …      …      …
2   20   1    …      …      …
3   30   0    …      …      …
4   30   1    …      …      …

(4 datasets with 2 interventions)

[Figure: causal DAG over R, I1, I2, X1, X2, X3, X4, representing all 4 datasets]

We assume Causal Markov and Minimality Assumptions hold in C.

Deterministic relations in JCI: R determines each of {I_i}_{i∈I}, and there are no other deterministic relations. In this setting, D-separation (⊥D) is provably complete. We conjecture it may also be complete more generally.

To allow these deterministic relations, we relax the faithfulness assumption. Standard faithfulness fails under determinism: e.g. since I_i = g_i(R), we have I_i ⊥⊥ X | R for any variable X, even when I_i and X are d-connected given R. We therefore only require:

D-Faithfulness assumption: X ⊥⊥ Y | W ⟹ X ⊥D Y | W [C].

Joint Causal Inference = Given all the assumptions, reconstruct the causal DAG C from independence test results.

Problem: Current constraint-based methods cannot work with JCI, in particular because the deterministic relations I_i = g_i(R) violate faithfulness.

Extending constraint-based methods for JCI

We propose a simple but effective strategy for dealing with faithfulness violations due to functionally determined relations, e.g. in JCI.

This strategy can be applied to any constraint-based method that can deal with partial inputs (missing results for certain independence tests).

1. Rephrase a constraint-based method in terms of d-separations instead of independence test results.

2. Decide for each independence test result which d-separations can be soundly derived, and provide these as input to the method (sketched in code below):

• ¬(X ⊥⊥ Y | W) ⟹ ¬(X ⊥d Y | W)

• X ∉ Det(W) and Y ∉ Det(W) and X ⊥⊥ Y | W ⟹ X ⊥d Y | Det(W)

where ⊥d denotes d-separation, its negation ¬(· ⊥d ·) d-connection, and Det(W) the variables determined by (a subset of) W.
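A minimal sketch of step 2 (assumptions: test results arrive as (X, Y, W, independent) tuples, and the deterministic relations are given as a map from each variable to a set of variables that jointly determine it; both the representation and the helper names are hypothetical):

```python
def det_closure(W, determined_by):
    """Det(W): W together with all variables functionally determined
    by a subset of W, computed as a fixpoint."""
    closed = set(W)
    changed = True
    while changed:
        changed = False
        for v, parents in determined_by.items():
            if v not in closed and parents <= closed:
                closed.add(v)
                changed = True
    return closed

def derive_dseps(test_results, determined_by):
    """Map independence test results (X, Y, W, indep) to the
    d-separation / d-connection statements they soundly imply."""
    statements = []
    for X, Y, W, indep in test_results:
        if not indep:
            # Dependence always implies d-connection given W.
            statements.append(("d-connected", X, Y, frozenset(W)))
            continue
        det_w = det_closure(W, determined_by)
        if X not in det_w and Y not in det_w:
            # Independence implies d-separation given Det(W), provided
            # neither endpoint is itself determined by W.
            statements.append(("d-separated", X, Y, frozenset(det_w)))
        # Otherwise no sound d-separation can be derived: skip the test.
    return statements

# JCI: R determines each intervention variable, and nothing else.
determined_by = {"I1": {"R"}, "I2": {"R"}}
tests = [("X1", "X4", {"R"}, True),   # X1 independent of X4 given R
         ("X1", "I1", set(), False)]  # X1 and I1 are dependent
print(derive_dseps(tests, determined_by))
# -> d-separated: X1, X4 given {R, I1, I2}; d-connected: X1, I1 given {}
```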

Under Causal Markov, Minimality and D-Faithfulness we show this strategy is sound. Conjecture: it is sound also for a larger class of deterministic relations.

Ancestral Causal Inference with Determinism (ACID)

We implement the strategy in ACID, a determinism-tolerant extension of ACI [Magliacane et al., 2016], a causal discovery method that accurately reconstructs ancestral relations in the presence of latent confounders.

ACI is based on a set of logical rules, e.g. (writing X ⇢ Y for "X causes Y"; for a set Z, X ⇢ Z means X causes some variable in Z):

(X ⊥⊥ Y | Z) ∧ ¬(X ⇢ Z) ⟹ ¬(X ⇢ Y).

ACID implements the proposed strategy for dealing with faithfulness violations and reformulates the rules of ACI in terms of d-separation, e.g.:

(X ⊥d Y | Z) ∧ ¬(X ⇢ Z) ⟹ ¬(X ⇢ Y).

ACID-JCI: To improve the identifiability and accuracy of the predictions, we also add a series of rules encoding the JCI background knowledge on the dummy variables, e.g.:

∀i ∈ I, ∀j ∈ X : ¬(X_j ⇢ R) ∧ ¬(X_j ⇢ I_i).
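For intuition, a toy sketch of how such rules can be chained (a naive fixpoint loop over hypothetical fact tuples; much simpler than the actual ACID-JCI implementation, which is not reproduced here):

```python
def apply_acid_rule(dseps, non_ancestral):
    """Naive forward chaining of the ACID rule
        (X d-separated from Y given Z) and not(X causes Z)
        implies not(X causes Y),
    where not(X causes Z) means X causes no variable in the set Z.
    dseps: (X, Y, Z) triples with X d-separated from Y given Z.
    non_ancestral: set of (X, Y) pairs read as 'X does not cause Y',
    extended in place until a fixpoint is reached."""
    changed = True
    while changed:
        changed = False
        for X, Y, Z in dseps:
            for a, b in ((X, Y), (Y, X)):   # d-separation is symmetric
                if (a, b) not in non_ancestral and \
                        all((a, z) in non_ancestral for z in Z):
                    non_ancestral.add((a, b))
                    changed = True
    return non_ancestral

# JCI background knowledge: system variables never cause R or any I_i.
system_vars = ["X1", "X2", "X3", "X4"]
dummy_vars = ["R", "I1", "I2"]
background = {(x, d) for x in system_vars for d in dummy_vars}

# A marginal d-separation: X2 d-separated from X4 given the empty set.
facts = apply_acid_rule([("X2", "X4", frozenset())], set(background))
# Derives both ('X2', 'X4') and ('X4', 'X2'): neither causes the other.
```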

Preliminary evaluation on simulated data

• Simulated data: 600 randomly generated causal graphs

• PR curves for ancestral (left) and non-ancestral (right) relations

• ACID-JCI substantially improves the accuracy compared to learning the causal structures on each dataset separately and merging them.

References

D. Eaton and K. Murphy. Exact Bayesian structure learning from uncertain interventions. In AISTATS, pages 107–114, 2007.

S. Magliacane, T. Claassen, and J. M. Mooij. Ancestral Causal Inference. In NIPS, 2016.

Working paper: https://arxiv.org/abs/1611.10351

SM and JMM were supported by NWO (VIDI grant 639.072.410). SM was also supported by COMMIT/ under the Data2Semantics project. TC was supported by NWO grant 612.001.202 (MoCoCaDi) and EU-FP7 grant agreement n. 603016 (MATRICS).

NIPS 2016, Barcelona, Spain; 10-12-2016