Joint causal inference on observational and experimental data - NIPS 2016 "What If?" workshop poster


Joint Causal Inference
Sara Magliacane [1,2], Tom Claassen [1,3], Joris M. Mooij [1]

[1] University of Amsterdam, [2] VU Amsterdam, [3] Radboud University Nijmegen


CausAM (Causality@AmsterdaM)

Abstract

We propose Joint Causal Inference (JCI), a powerful formulation of causal discovery over multiple datasets in which we jointly learn both the causal structure and the targets of interventions from independence test results. Our implementation ACID-JCI substantially improves the accuracy of the causal predictions with respect to the state of the art.

Causal discovery methods

To answer “what if?” questions we need the causal structure. Two main categories of methods that learn causal structures from data:

• Score-based: evaluate models using a penalized likelihood score

• Constraint-based: use statistical independences to express constraints over possible causal models

Advantage of constraint-based methods:

• can handle latent confounders naturally

Advantage of score-based methods:

• can formulate joint inference on observational and experimental data and learn the targets of interventions, e.g. [Eaton and Murphy, 2007].

Goal: Can we perform joint inference using constraint-based methods?

Joint causal inference (JCI)

We propose to jointly model several observational or experimental datasets {D_r}_{r∈{1,…,n}} with zero or more possibly unknown intervention targets.

We assume a unique underlying causal DAG across datasets, defined over system variables {X_j}_{j∈X} (some of which may be hidden). Consequence: we do not allow interventions that change the causal structure between datasets, e.g. perfect interventions, which cut the edges into their targets.

We introduce two types of dummy variables in the data:

• a regime variable R, indicating which dataset D_r a data point comes from

• intervention variables {I_i}_{i∈I}, which are functions of R (see the pooling sketch below).
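For illustration, a minimal sketch of this encoding (assuming pandas; the per-regime data frames here are random placeholders for the actual datasets D_r, and g1, g2 are hypothetical intervention functions matching the example table below):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Placeholder stand-ins for the separate (observational or experimental)
# datasets D_r, each over the observed system variables.
datasets = {r: pd.DataFrame(rng.normal(size=(100, 3)),
                            columns=["X1", "X2", "X4"])
            for r in (1, 2, 3, 4)}

# Intervention variables are deterministic functions of the regime:
# I_i = g_i(R).
g1 = {1: 20, 2: 20, 3: 30, 4: 30}
g2 = {1: 0, 2: 1, 3: 0, 4: 1}

# Pool all datasets into one, adding R and the I_i as dummy columns.
pooled = pd.concat(
    [df.assign(R=r, I1=g1[r], I2=g2[r]) for r, df in datasets.items()],
    ignore_index=True,
)
```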

We assume that we can represent the whole system as an acyclic SCM:

    R = E_R,
    I_i = g_i(R),  i ∈ I,
    X_j = f_j(X_{pa(X_j)∩X}, I_{pa(X_j)∩I}, E_j),  j ∈ X,
    P((E_k)_{k∈X∪{R}}) = ∏_{k∈X∪{R}} P(E_k).

We represent this SCM with a causal DAG C, for example:

R   I1   I2   X1     X2     X4
1   20   0    0.1    0.2    0.5
1   20   0    0.13   0.21   0.49
1   20   0    …      …      …
2   20   1    …      …      …
3   30   0    …      …      …
4   30   1    …      …      …

(4 datasets with 2 interventions)

[Figure: causal DAG over R, I1, I2, X1, X2, X3, X4, representing all 4 datasets]

We assume Causal Markov and Minimality Assumptions hold in C.

Deterministic relations in JCI: R determines each of {I_i}_{i∈I}, and there are no other deterministic relations. In this setting, D-separation (⊥D) is provably complete. We conjecture it may also be complete more generally.

To allow these deterministic relations, we relax the faithfulness assumption. Standard faithfulness fails under determinism: e.g. since I_i = g_i(R), we have I_i ⊥⊥ X | R for any variable X, even when I_i and X are d-connected given R. We therefore only require:

D-Faithfulness assumption: X ⊥⊥ Y | W ⟹ X ⊥D Y | W [C].

Joint Causal Inference = Given all the assumptions, reconstruct the causal DAG C from independence test results.

Problem: Current constraint-based methods cannot work with JCI, in particular because the deterministic relations I_i = g_i(R) violate faithfulness.

Extending constraint-based methods for JCI

We propose a simple but effective strategy for dealing with faithfulness violations due to functionally determined relations, e.g. in JCI.

This strategy can be applied to any constraint-based method that can deal with partial inputs (missing results for certain independence tests).

1. Rephrase a constraint-based method in terms of d-separations instead of independence test results.

2. Decide for each independence test result which d-separations can be soundly derived, and provide these as input to the method (sketched in code below):

• ¬(X ⊥⊥ Y | W) ⟹ ¬(X ⊥d Y | W)

• X ∉ Det(W) and Y ∉ Det(W) and X ⊥⊥ Y | W ⟹ X ⊥d Y | Det(W)

where ⊥d denotes d-separation, its negation ¬(· ⊥d ·) d-connection, and Det(W) the variables determined by (a subset of) W.
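A minimal sketch of step 2 (assumptions: test results arrive as (X, Y, W, independent) tuples, and the deterministic relations are given as a map from each variable to a set of variables that jointly determine it; both the representation and the helper names are hypothetical):

```python
def det_closure(W, determined_by):
    """Det(W): W together with all variables functionally determined
    by a subset of W, computed as a fixpoint."""
    closed = set(W)
    changed = True
    while changed:
        changed = False
        for v, parents in determined_by.items():
            if v not in closed and parents <= closed:
                closed.add(v)
                changed = True
    return closed

def derive_dseps(test_results, determined_by):
    """Map independence test results (X, Y, W, indep) to the
    d-separation / d-connection statements they soundly imply."""
    statements = []
    for X, Y, W, indep in test_results:
        if not indep:
            # Dependence always implies d-connection given W.
            statements.append(("d-connected", X, Y, frozenset(W)))
            continue
        det_w = det_closure(W, determined_by)
        if X not in det_w and Y not in det_w:
            # Independence implies d-separation given Det(W), provided
            # neither endpoint is itself determined by W.
            statements.append(("d-separated", X, Y, frozenset(det_w)))
        # Otherwise no sound d-separation can be derived: skip the test.
    return statements

# JCI: R determines each intervention variable, and nothing else.
determined_by = {"I1": {"R"}, "I2": {"R"}}
tests = [("X1", "X4", {"R"}, True),   # X1 independent of X4 given R
         ("X1", "I1", set(), False)]  # X1 and I1 are dependent
print(derive_dseps(tests, determined_by))
# -> d-separated: X1, X4 given {R, I1, I2}; d-connected: X1, I1 given {}
```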

Under Causal Markov, Minimality and D-Faithfulness we show this strategy is sound. Conjecture: it is sound also for a larger class of deterministic relations.

Ancestral Causal Inference with Determinism (ACID)

We implement the strategy in ACID, a determinism-tolerant extension of ACI [Magliacane et al., 2016], a causal discovery method that accurately reconstructs ancestral relations in the presence of latent confounders.

ACI is based on a set of logical rules, e.g. (writing X ⇢ Y for "X causes Y"; for a set Z, X ⇢ Z means X causes some variable in Z):

(X ⊥⊥ Y | Z) ∧ ¬(X ⇢ Z) ⟹ ¬(X ⇢ Y).

ACID implements the proposed strategy for dealing with faithfulness violations and reformulates the rules of ACI in terms of d-separation, e.g.:

(X ⊥d Y | Z) ∧ ¬(X ⇢ Z) ⟹ ¬(X ⇢ Y).

ACID-JCI: To improve the identifiability and accuracy of the predictions, we also add a series of rules encoding the JCI background knowledge on the dummy variables, e.g.:

∀i ∈ I, ∀j ∈ X : ¬(X_j ⇢ R) ∧ ¬(X_j ⇢ I_i).
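For intuition, a toy sketch of how such rules can be chained (a naive fixpoint loop over hypothetical fact tuples; much simpler than the actual ACID-JCI implementation, which is not reproduced here):

```python
def apply_acid_rule(dseps, non_ancestral):
    """Naive forward chaining of the ACID rule
        (X d-separated from Y given Z) and not(X causes Z)
        implies not(X causes Y),
    where not(X causes Z) means X causes no variable in the set Z.
    dseps: (X, Y, Z) triples with X d-separated from Y given Z.
    non_ancestral: set of (X, Y) pairs read as 'X does not cause Y',
    extended in place until a fixpoint is reached."""
    changed = True
    while changed:
        changed = False
        for X, Y, Z in dseps:
            for a, b in ((X, Y), (Y, X)):   # d-separation is symmetric
                if (a, b) not in non_ancestral and \
                        all((a, z) in non_ancestral for z in Z):
                    non_ancestral.add((a, b))
                    changed = True
    return non_ancestral

# JCI background knowledge: system variables never cause R or any I_i.
system_vars = ["X1", "X2", "X3", "X4"]
dummy_vars = ["R", "I1", "I2"]
background = {(x, d) for x in system_vars for d in dummy_vars}

# A marginal d-separation: X2 d-separated from X4 given the empty set.
facts = apply_acid_rule([("X2", "X4", frozenset())], set(background))
# Derives both ('X2', 'X4') and ('X4', 'X2'): neither causes the other.
```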

Preliminary evaluation on simulated data

• Simulated data: 600 randomly generated causal graphs

• PR curves for ancestral (left) and non-ancestral (right) relations

• ACID-JCI substantially improves the accuracy compared to learning the causal structures on each dataset separately and merging them.

References

D. Eaton and K. Murphy. Exact Bayesian structure learning from uncertain interventions. In AISTATS, pages 107–114, 2007.

S. Magliacane, T. Claassen, and J. M. Mooij. Ancestral Causal Inference. In NIPS, 2016.

Working paper: https://arxiv.org/abs/1611.10351

SM and JMM were supported by NWO (VIDI grant 639.072.410). SM was also supported by COMMIT/ under the Data2Semantics project. TC was supported by NWO grant 612.001.202 (MoCoCaDi) and EU-FP7 grant agreement n. 603016 (MATRICS).

NIPS 2016, Barcelona, Spain; 10-12-2016