![Page 1: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/1.jpg)
Blessing of heterogeneous large-scale data for high-dimensional causal inference
Peter Bühlmann
Seminar für Statistik, ETH Zürich
joint work with
Jonas Peters, MPI Tübingen
Nicolai Meinshausen, SfS, ETH Zürich
![Page 2: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/2.jpg)
Heterogeneous large-scale data
Big Data
the talk is not (yet) on “really big data”
but we will take advantage of heterogeneity, which often arises with large-scale data where the i.i.d./homogeneity assumption is not appropriate
![Page 3: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/3.jpg)
causal inference = intervention analysis
in genomics (for yeast or plants): if we made an intervention at a single gene (or at many genes), what would be its (their) effect on a response of interest?
want to infer/predict such effects without actually doing the intervention,
e.g. from observational data (cf. Pearl; Spirtes, Scheines & Glymour), that is, from observations of a “steady-state system”
or, as is mainly our focus, from
• observational and interventional data with well-specified interventions
• changing environments or experimental settings with “vaguely” or “un-” specified interventions
that is, from “heterogeneous” data
![Page 5: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/5.jpg)
Genomics
1. Flowering of Arabidopsis thaliana
phenotype/response variable of interest: Y = days to bolting (flowering)
“covariates” X = gene expressions from p = 21,326 genes
question: infer/predict the effect of knocking out a single gene on the phenotype/response variable Y?
using a statistical method based on n = 47 observational data points → validated the top predictions with randomized experiments, with some moderate success
(Stekhoven, Moraes, Sveinbjörnsson, Hennig, Maathuis & PB, 2012)
![Page 7: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/7.jpg)
2. Gene expressions of yeast
p = 5360 genes
phenotype of interest: Y = expression of first gene
“covariates” X = gene expressions from all other genes
and then
phenotype of interest: Y = expression of second gene
“covariates” X = gene expressions from all other genes
and so on
infer/predict the effects of single gene deletions on all other genes
![Page 8: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/8.jpg)
Effects of single gene deletions on all other genes (yeast)
(Maathuis, Colombo, Kalisch & PB, 2010)
• p = 5360 genes (expression of genes)
• 231 single gene deletions → 1.2 · 10^6 intervention effects
• the truth is “known in good approximation” (thanks to intervention experiments)
goal: prediction of the true large intervention effects based on observational data with no knock-downs
n = 63 observational data
[Figure: true positives (0 to 1,000) versus false positives (0 to 4,000) for the methods IDA, Lasso, Elastic-net, and Random]
![Page 10: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/10.jpg)
REGRESSION
because: for
$$Y = \sum_{j=1}^{p} \beta_j X^{(j)} + \varepsilon$$
$\beta_j$ measures the effect of $X^{(j)}$ on $Y$ when keeping all other variables $\{X^{(k)};\ k \neq j\}$ fixed
but when doing an intervention at a gene → some/many other genes might change as well and cannot be kept fixed
causal inference framework: allows one to define a dynamic notion of intervention effect “without keeping other variables fixed”
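The contrast between a regression coefficient and an intervention effect can be illustrated with a small simulation. This is only a sketch under assumed values: the three-variable system (X1 → X2 → Y plus a direct edge X1 → Y), the coefficients 0.8, 1.0 and 2.0, and the helper names `simulate` and `ols_two_predictors` are all hypothetical and not taken from the talk.

```python
import random

def simulate(n, do_x1=None, seed=0):
    """Draw n samples from X1 -> X2 -> Y with an extra edge X1 -> Y.
    If do_x1 is given, X1 is set to that value (a do-intervention)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0, 1) if do_x1 is None else do_x1
        x2 = 0.8 * x1 + rng.gauss(0, 1)            # X2 reacts to X1
        y = 1.0 * x1 + 2.0 * x2 + rng.gauss(0, 1)  # structural equation for Y
        rows.append((x1, x2, y))
    return rows

def ols_two_predictors(rows):
    """Least squares of Y on (X1, X2), no intercept, via the 2x2 normal equations."""
    s11 = sum(x1 * x1 for x1, _, _ in rows)
    s12 = sum(x1 * x2 for x1, x2, _ in rows)
    s22 = sum(x2 * x2 for _, x2, _ in rows)
    s1y = sum(x1 * y for x1, _, y in rows)
    s2y = sum(x2 * y for _, x2, y in rows)
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def mean_y(rows):
    return sum(y for _, _, y in rows) / len(rows)

# Regression coefficient of X1: effect with X2 held fixed (close to 1.0 here).
beta1, beta2 = ols_two_predictors(simulate(20000))

# Intervention effect: change in E[Y] when X1 is set to 1 instead of 0.
# X2 is free to react, so this is close to 1.0 + 0.8 * 2.0 = 2.6.
effect = mean_y(simulate(20000, do_x1=1.0, seed=1)) - mean_y(simulate(20000, do_x1=0.0, seed=2))

print(f"regression coefficient of X1: {beta1:.2f}")
print(f"intervention effect of X1:    {effect:.2f}")
```

In this toy system the two quantities differ because the intervention at X1 also moves X2, which is exactly the situation described for gene interventions.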
![Page 11: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/11.jpg)
[Figure: true positives versus false positives for IDA, Lasso, Elastic-net, and Random]
“predictions of single gene deletion effects in yeast” was a good finding for a difficult problem...
... but some criticisms apply:
• not very “robust”: depends somewhat on how we define the “truth” (has been found recently by J. Mooij, J. Peters and others)
• old, publicly available data (Hughes et al., 2000)
• no collaborator... just (publicly available) data; that the data are publicly available is good!
• we didn’t use the interventional data for training
![Page 12: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/12.jpg)
A new and “better” attempt for the “same” problem
single gene knock-downs in yeast with genome-wide expression measurements (observational and interventional data)
goal: predict unseen gene knock-down effects based on both observational and interventional data
collaborators: Frank Holstege, Patrick Kemmeren et al. (Utrecht)
data from modern technology: Kemmeren, ..., and Holstege (Cell, 2014)
![Page 13: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/13.jpg)
Graphical and structural equation models
late 1980s: Pearl; Spirtes, Glymour, Scheines; Dawid; Lauritzen; ...
the powerful language of graphs makes formulations and models more transparent (Evans and Richardson, 2011)
![Page 14: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/14.jpg)
variables $X_1, \ldots, X_{p+1}$ ($X_{p+1} = Y$ is the response of interest)
directed acyclic graph (DAG) $D^0$ encoding the true underlying causal influence diagram
structural equation model (SEM):
$$X_j \leftarrow f^0_j(X_{\mathrm{pa}_{D^0}(j)}, \varepsilon_j), \quad j = 1, \ldots, p+1, \qquad \varepsilon_1, \ldots, \varepsilon_{p+1} \text{ independent}$$
e.g. linear:
$$X_j \leftarrow \sum_{k \in \mathrm{pa}_{D^0}(j)} \beta^0_{jk} X_k + \varepsilon_j, \quad j = 1, \ldots, p+1$$
causal variables for $Y = X_{p+1}$: $S^0 = \{j;\ j \in \mathrm{pa}_{D^0}(Y)\}$
causal coefficients for $Y$ in linear SEM: $\beta^0_{Yj}$, $j \in \mathrm{pa}_{D^0}(Y)$
![Page 15: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/15.jpg)
severe issues of identifiability!
$(X, Y) \sim \mathcal{N}_2(0, \Sigma)$
[Figure: two graphs, X → Y (“X causes Y”) and X ← Y (“Y causes X”)]
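The identifiability problem for the bivariate Gaussian can be checked by direct arithmetic: for any covariance matrix Σ, both a linear model “X causes Y” and a linear model “Y causes X” reproduce Σ exactly, so observational data alone cannot separate them. The sketch below assumes an illustrative Σ with entries (2.0, 1.2, 1.5); the function names are hypothetical.

```python
def forward_model(sxx, sxy, syy):
    """X -> Y: X ~ N(0, sxx), Y = b * X + eps. Returns (b, Var(eps))."""
    b = sxy / sxx
    return b, syy - b * sxy

def backward_model(sxx, sxy, syy):
    """Y -> X: Y ~ N(0, syy), X = c * Y + eta. Returns (c, Var(eta))."""
    c = sxy / syy
    return c, sxx - c * sxy

def implied_cov_forward(sxx, b, s2):
    """(Var X, Cov(X, Y), Var Y) implied by the X -> Y model."""
    return sxx, b * sxx, b * b * sxx + s2

def implied_cov_backward(syy, c, t2):
    """(Var X, Cov(X, Y), Var Y) implied by the Y -> X model."""
    return c * c * syy + t2, c * syy, syy

sxx, sxy, syy = 2.0, 1.2, 1.5          # assumed entries of Sigma
b, s2 = forward_model(sxx, sxy, syy)
c, t2 = backward_model(sxx, sxy, syy)
fwd = implied_cov_forward(sxx, b, s2)
bwd = implied_cov_backward(syy, c, t2)
print("X -> Y implies", fwd)
print("Y -> X implies", bwd)
```

Both directions give the same zero-mean Gaussian distribution, hence the same likelihood for any data set.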
![Page 16: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/16.jpg)
agenda for estimation (Chickering, 2002; Shimizu, 2005; Kalisch & PB, 2007; ...)
1. estimate the Markov equivalence class of the DAG $D^0$: $\hat{\mathcal{D}}$; severe issues of identifiability!
2. derive causal variables: the ones which are causal in all DAGs from $\hat{\mathcal{D}}$; derive bounds for causal effects based on $\hat{\mathcal{D}}$ (Maathuis, Kalisch & PB, 2009)
goal: construction of confidence statements (without knowing the structure of the underlying graph)
problem: direct likelihood-based inference (as in e.g. Mladen Kolar’s talk for undirected graphs) is difficult, because of severe identifiability issues!
![Page 17: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/17.jpg)
nevertheless: our goal is to construct a procedure $\hat{S} \subseteq \{1, \ldots, p\}$ such that
$$P[\underbrace{\hat{S} \subseteq S^0}_{\text{no false positives}}] \ge 1 - \alpha$$
without the need to specify identifiable components
that is: if a variable $X_j$ cannot be identified to be causal ⇒ $j \notin \hat{S}$
we do this with an entirely new framework (which might be “much simpler” for causal inference),
NOT using, or AVOIDING, graphical model fitting, potential outcome models, ...
and maybe more accessible to experts in regression modeling?
![Page 19: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/19.jpg)
Causal inference using invariant prediction
Peters, PB and Meinshausen (2015)
a main message:
causal structure/components remain the same for different sub-populations
while the non-causal components can change across sub-populations
thus: → look for “stability” of structures among different sub-populations
![Page 21: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/21.jpg)
goal: find the causal variables (components) among a p-dimensional predictor variable X for a specific response variable Y
consider data
$$(X^e, Y^e) \sim F^e, \quad e \in \mathcal{E}$$
with response variables $Y^e$ and predictor variables $X^e$
here: $e \in \mathcal{E}$ denotes an experimental setting ($\mathcal{E}$ is the space of experimental settings)
“heterogeneous” data from different environments/experiments $e \in \mathcal{E}$
(aspect of “Big Data”)
![Page 23: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/23.jpg)
data
$$(X^e, Y^e) \sim F^e, \quad e \in \mathcal{E}$$
example 1: $\mathcal{E} = \{1, 2\}$ encoding observational data (1) and all potentially unspecific interventional data (2)
example 2: $\mathcal{E} = \{1, 2\}$ encoding observational data (1) and (repeated) data from one specific intervention (2)
example 3: $\mathcal{E} = \{1, 2, 3\}$ ... or $\mathcal{E} = \{1, 2, 3, \ldots, 26\}$ ...
do not need data from carefully designed (randomized) experiments
![Page 24: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/24.jpg)
Invariance Assumption
for a set $S^* \subseteq \{1, \ldots, p\}$: $\mathcal{L}(Y^e \mid X^e_{S^*})$ is invariant across $e \in \mathcal{E}$
for linear model setting: there exists a vector $\gamma^*$ such that
$$\forall e \in \mathcal{E}: \quad Y^e = X^e \gamma^* + \varepsilon^e, \quad \varepsilon^e \perp X^e_{S^*}, \quad S^* = \{j;\ \gamma^*_j \neq 0\}$$
$\varepsilon^e \sim F_\varepsilon$ the same for all $e$
$X^e$ has an arbitrary distribution, different across $e$
$\gamma^*, S^*$ are interesting in their own right!
namely the parameter and structure which remain invariant across experimental settings, or across heterogeneous groups
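A minimal simulation can make the Invariance Assumption concrete. The sketch assumes a hypothetical chain X1 → Y → X2 in which the two environments differ only in the distribution of X1 (so the structural equation of Y is untouched); all coefficients, sample sizes, and function names are illustrative choices, not values from the talk. Regressing Y on its cause X1 should give roughly the same slope and residual variance in both environments, while regressing Y on its effect X2 should not.

```python
import random

def draw_environment(n, x1_mean, x1_sd, seed):
    """Sample from X1 -> Y -> X2; the environment only changes the law of X1."""
    rng = random.Random(seed)
    x1, x2, y = [], [], []
    for _ in range(n):
        cause = x1_mean + x1_sd * rng.gauss(0, 1)   # differs across environments
        response = 1.5 * cause + rng.gauss(0, 1)    # same equation in every environment
        effect = response + rng.gauss(0, 1)         # X2 is a child of Y
        x1.append(cause)
        y.append(response)
        x2.append(effect)
    return x1, x2, y

def simple_regression(x, y):
    """Return (slope, intercept, residual variance) of least squares of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return slope, intercept, rss / (n - 2)

environments = [draw_environment(5000, 0.0, 1.0, seed=1),
                draw_environment(5000, 1.0, 3.0, seed=2)]
parent_fits = [simple_regression(x1, y) for x1, _, y in environments]
child_fits = [simple_regression(x2, y) for _, x2, y in environments]

for e, (pf, cf) in enumerate(zip(parent_fits, child_fits), start=1):
    print(f"e={e}: slope on cause X1 = {pf[0]:.2f}, slope on effect X2 = {cf[0]:.2f}")
```

In this setup S* = {1} behaves as an invariant set, whereas the regression on X2 changes with the environment even though X2 is a good predictor of Y within each environment.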
![Page 26: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/26.jpg)
Moreover: link to causality
assume:
in short: $\mathcal{L}(Y^e \mid X^e_{S^*})$ is invariant across $e \in \mathcal{E}$
Proposition (Peters, PB & Meinshausen, 2015)
If $\mathcal{E}$ does not affect the structural equation for $Y$ in a SEM,
e.g. linear SEM:
$$Y^e \leftarrow \sum_{k \in \mathrm{pa}(Y)} \underbrace{\beta_{Yk}}_{\forall e} X^e_k + \underbrace{\varepsilon^e_Y}_{\sim F_\varepsilon\ \forall e}$$
then $S^0 = \mathrm{pa}(Y)$ (the causal variables) satisfies the Invariance Assumption (w.r.t. $\mathcal{E}$)
the causal variables lead to invariance (of the conditional distribution)
![Page 28: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/28.jpg)
if $\mathcal{E}$ does not affect the structural equation for Y:
$S^0 = \mathrm{pa}(Y)$ satisfies the Invariance Assumption
this assumption holds for example for:
• do-intervention (Pearl) at variables different from Y
• noise (or “soft”) intervention (Eberhardt & Scheines, 2007) at variables different from Y
in addition: there might be many other $S^*$ satisfying the Invariance Assumption,
but uniqueness is not really important (see later)
![Page 29: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/29.jpg)
how do we know whether $\mathcal{E}$ is not affecting the structural equation for Y?
if $\mathcal{E}$ does affect the structural equation for Y, we will argue:
“robustness” of our procedure (proposed later)
→ no causal statements, hence no false positives: conservative, but on the safe side
![Page 31: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/31.jpg)
Invariance Assumption: plausible to hold with real data
two-dimensional conditional distributions of observational (blue) and interventional (orange) data
(no intervention at displayed variables X, Y)
[Figure: two scatter plots; one panel shows seemingly no invariance of the conditional distribution, the other plausible invariance of the conditional distribution]
![Page 32: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/32.jpg)
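A procedure: population case
require and exploit the Invariance Assumption:
$\mathcal{L}(Y^e \mid X^e_{S^*})$ the same across $e \in \mathcal{E}$
$$H_{0,\gamma,S}(\mathcal{E}): \quad \gamma_k = 0 \text{ if } k \notin S \text{ and } \exists F_\varepsilon \text{ such that } \forall e \in \mathcal{E}: \ Y^e = X^e \gamma + \varepsilon^e, \ \varepsilon^e \perp X^e_S, \ \varepsilon^e \sim F_\varepsilon \text{ the same for all } e$$
if $H_{0,\gamma,S}(\mathcal{E})$ is true:
• model is correct
• $S, \gamma$ are plausible causal variables/predictors and coefficients
and
$$H_{0,S}(\mathcal{E}): \quad \text{there exists } \gamma \text{ such that } H_{0,\gamma,S}(\mathcal{E}) \text{ holds}$$
$S$ is called “plausible causal predictors” if $H_{0,S}(\mathcal{E})$ holds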
![Page 34: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/34.jpg)
identifiable causal predictors under $\mathcal{E}$:
defined as the set $S(\mathcal{E})$, where
$$S(\mathcal{E}) = \bigcap \{ S;\ \underbrace{H_{0,S}(\mathcal{E}) \text{ holds}}_{\text{plausible causal predictors}} \}$$
the intersection of all plausible causal predictors
under the Invariance Assumption we have: for any $S^*$,
$$S(\mathcal{E}) \subseteq S^*$$
and this is key to obtain confidence bounds for identifiable causal predictors
![Page 36: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/36.jpg)
we have by definition:
$$S(\mathcal{E}_1) \subseteq S(\mathcal{E}_2) \quad \text{if } \mathcal{E}_1 \subseteq \mathcal{E}_2$$
with
• more interventions
• more “heterogeneity”
• more “diversity in complex data”
we can identify more causal predictors
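A short sketch of why the monotonicity follows from the definitions, using only the hypotheses introduced on the earlier slides: a set S that is plausible under the larger collection of environments is also plausible under the smaller one, so the intersection for the larger collection runs over fewer sets and can only be larger.

```latex
\mathcal{E}_1 \subseteq \mathcal{E}_2
\;\Longrightarrow\;
\{S;\ H_{0,S}(\mathcal{E}_2) \text{ holds}\} \subseteq \{S;\ H_{0,S}(\mathcal{E}_1) \text{ holds}\}
\;\Longrightarrow\;
S(\mathcal{E}_1) = \bigcap_{S:\ H_{0,S}(\mathcal{E}_1)} S
\;\subseteq\;
\bigcap_{S:\ H_{0,S}(\mathcal{E}_2)} S = S(\mathcal{E}_2)
```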
![Page 37: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/37.jpg)
identifiable causal predictors: $S(\mathcal{E}) \nearrow$ as $\mathcal{E} \nearrow$
question: when is $S(\mathcal{E}) = S^0$?
but it is not important that it is “equal to” or unique (see later)
![Page 38: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/38.jpg)
Theorem (Peters, PB and Meinshausen, 2015)
$S(\mathcal{E}) = S^0$ = (parental set of Y in the causal DAG)
if there is:
• a single do-intervention for each variable other than Y and $|\mathcal{E}| = p$
• a single noise intervention for each variable other than Y and $|\mathcal{E}| = p$
• a simultaneous noise intervention and $|\mathcal{E}| = 2$
the conditions can be relaxed such that it is not necessary to intervene at all the variables
![Page 39: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/39.jpg)
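Statistical confidence sets for causal predictors
“the finite sample version of $S(\mathcal{E}) = \bigcap_S \{S;\ H_{0,S}(\mathcal{E}) \text{ is true}\}$”
for “any” $S \subseteq \{1, \ldots, p\}$: test whether $H_{0,S}(\mathcal{E})$ is accepted or rejected
$$\hat{S}(\mathcal{E}) = \bigcap_S \{S;\ H_{0,S} \text{ accepted at level } \alpha\}$$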
![Page 40: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/40.jpg)
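for $H_{0,S}(\mathcal{E})$: test constancy of the regression parameter and its residual error distribution across $e \in \mathcal{E}$
by weakening $H_{0,S}(\mathcal{E})$ to $\tilde{H}_{0,S}(\mathcal{E})$:
$$\exists \beta, \sigma: \quad \beta^e_{\mathrm{pred}}(S) \equiv \beta, \quad \sigma^e_{\mathrm{pred}}(S) \equiv \sigma \quad \forall e \in \mathcal{E}$$
where
$$\beta^e_{\mathrm{pred}}(S) = \operatorname*{argmin}_{\beta;\ \beta_k = 0\ (k \notin S)} E|Y^e - X^e \beta|^2, \qquad \sigma^e_{\mathrm{pred}}(S) = \left(E|Y^e - X^e \beta^e_{\mathrm{pred}}(S)|^2\right)^{1/2}$$
note: $H_{0,S}(\mathcal{E})$ true ⟹ $\tilde{H}_{0,S}(\mathcal{E})$ true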
![Page 41: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/41.jpg)
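testing $H_{0,S}(\mathcal{E})$: assuming Gaussian errors
$$\frac{D^T \Sigma_D^{-1} D}{\hat{\sigma}^2 n_e} \sim F(n_e,\ n_{-e} - |S| - 1)$$
$D = Y^e - \hat{Y}^e$, with $\hat{Y}^e$ based on data $\setminus \{e\}$
$\Sigma_D = I_{n_e} + X_{e,S} (X_{-e,S}^T X_{-e,S})^{-1} X_{e,S}^T$
reject $H_{0,S}(\mathcal{E})$ if p-value $< \alpha / |\mathcal{E}|$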
![Page 42: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/42.jpg)
$$\hat{S}(\mathcal{E}) = \bigcap_S \{S;\ H_{0,S} \text{ accepted at level } \alpha\}$$
for some significance level $0 < \alpha < 1$
going through all sets S?
![Page 44: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/44.jpg)
going through all sets S?
1. start with $S = \emptyset$: if $H_{0,\emptyset}(\mathcal{E})$ accepted ⟹ $\hat{S}(\mathcal{E}) = \emptyset$
2. consider small sets S of cardinality 1, 2, ... and construct corresponding intersections $S_\cap$ with previously considered accepted sets S ($H_{0,S}(\mathcal{E})$ accepted):
   for S with $H_{0,S}$ accepted: $S_\cap \leftarrow S_\cap \cap S$
   if intersection $S_\cap = \emptyset$ ⟹ $\hat{S}(\mathcal{E}) = \emptyset$
   if not: discard all S with $S \supseteq S_\cap$ and continue with the remaining sets
3. for large p: restrict search space by variables from Lasso regression; need a faithfulness assumption (and sparsity and assumptions on $X^e$ for justification)
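Steps 1 and 2 above can be sketched as a loop over subsets in order of increasing cardinality. The callable `is_accepted` is a hypothetical stand-in for the level-α test of the invariance hypothesis for a set S (it is not an API from any package), variables are indexed from 0, and returning the empty set when no S is accepted is an assumed convention that the slide does not spell out. Step 3 (Lasso pre-screening) is not covered.

```python
from itertools import combinations

def identifiable_predictors(p, is_accepted):
    """Intersect all subsets S of {0, ..., p-1} for which is_accepted(S) is True."""
    intersection = None                       # None: no accepted set seen so far
    for size in range(p + 1):                 # cardinality 0, 1, 2, ...
        for subset in combinations(range(p), size):
            s = frozenset(subset)
            # Supersets of the running intersection cannot shrink it: discard them.
            if intersection is not None and s >= intersection:
                continue
            if is_accepted(s):
                intersection = s if intersection is None else intersection & s
                if not intersection:          # empty intersection: stop early
                    return frozenset()
    # Assumed convention: if every S is rejected, report no causal predictors.
    return frozenset() if intersection is None else intersection

# Toy oracle (an assumption for illustration): invariance holds exactly for
# sets that contain both variable 0 and variable 2.
result = identifiable_predictors(4, lambda s: {0, 2} <= s)
print(sorted(result))
```

With this oracle the first accepted set is {0, 2}; all of its supersets are skipped and no other set is accepted, so the intersection stays {0, 2}. If two disjoint sets were both accepted, the intersection would become empty and the search would stop.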
![Page 45: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/45.jpg)
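confidence sets with invariant prediction
1. for each $S \subseteq \{1, \ldots, p\}$ construct a set $\hat{\Gamma}_S(\mathcal{E})$ as follows:
   if $H_{0,S}(\mathcal{E})$ rejected at $\alpha / (2|\mathcal{E}|)$ → set $\hat{\Gamma}_S(\mathcal{E}) = \emptyset$
   otherwise $\hat{\Gamma}_S(\mathcal{E})$ = classical $(1 - \alpha/2)$ confidence interval for $\beta_S$ based on $X_S$
2. set
$$\hat{\Gamma}(\mathcal{E}) = \bigcup_{S \subseteq \{1, \ldots, p\}} \hat{\Gamma}_S(\mathcal{E}) \quad \text{(estimated plausible causal coefficients)}$$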
![Page 46: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/46.jpg)
we then obtain:
confidence set for $S^*$: $\hat{S}(\mathcal{E})$
confidence set for $\gamma^*$: $\hat{\Gamma}(\mathcal{E})$
![Page 47: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/47.jpg)
Theorem (Peters, PB and Meinshausen, 2015)
assume: linear model, Gaussian errors
then, for any $\gamma^*, S^*$ satisfying the Invariance Assumption:
$$P[\hat{S}(\mathcal{E}) \subseteq S^*] \ge 1 - \alpha \quad \text{(confidence w.r.t. true causal variables)}$$
$$P[\gamma^* \in \hat{\Gamma}(\mathcal{E})] \ge 1 - \alpha \quad \text{(confidence set for true causal parameters)}$$
and can choose $S^* = S^0$ = causal variables in linear SEM if $\mathcal{E}$ does not affect the structural equation of Y
“on the safe side” (conservative)
we do not need to care about identifiability: if the effect is not identifiable, the method will not wrongly claim an effect
we do not require that $S^*$ is minimal and unique
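A sketch of the reasoning behind the first statement, writing $\hat{S}(\mathcal{E})$ for the estimated set (the intersection of all accepted sets) and assuming only that the test of each hypothesis has level α: since $S^*$ satisfies the Invariance Assumption, its null hypothesis is true and is rejected with probability at most α; whenever it is accepted, $S^*$ is one of the intersected sets, so the intersection is contained in $S^*$.

```latex
P\big[\hat{S}(\mathcal{E}) \subseteq S^*\big]
\;\ge\; P\big[H_{0,S^*}(\mathcal{E}) \text{ accepted}\big]
\;=\; 1 - P\big[H_{0,S^*}(\mathcal{E}) \text{ rejected}\big]
\;\ge\; 1 - \alpha
```

The argument never uses uniqueness or minimality of $S^*$, which is why neither is required.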
![Page 49: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/49.jpg)
“the first” result on statistical confidence for potentially non-identifiable causal predictors when structure is unknown
(route via graphical modeling for confidence sets seems awkward)
leading to (hopefully) more reliable causal inferential statements
![Page 50: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/50.jpg)
“Robustness”
if the Invariance Assumption does not hold because E has a direct effect on Y
⇒ for all S: $\mathcal{L}(Y^e \mid X^e_S)$ is not invariant across e ∈ E
⇒ S(E) = ∅ (at least as n → ∞)
still on the safe side but no power
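The robustness behaviour can be illustrated with a minimal sketch, assuming that S(E) is the intersection of all candidate sets whose invariance test is not rejected. The function name and the p-values are made up for illustration; the test that would produce them is not specified here.

```python
from typing import Dict, FrozenSet, Set


def estimate_causal_predictors(
    p_values: Dict[FrozenSet[str], float], alpha: float
) -> Set[str]:
    """Intersect all candidate sets whose invariance test is not rejected.

    p_values maps each candidate set S to the p-value of a (hypothetical)
    test of "L(Y^e | X^e_S) is invariant across e in E".
    """
    accepted = [s for s, p in p_values.items() if p >= alpha]
    if not accepted:
        # Every S is rejected, e.g. because E acts directly on Y:
        # report no causal predictors (no power, but no false claims).
        return set()
    return set(frozenset.intersection(*accepted))


# Illustrative p-values only (not from the talk).
invariance_holds = {
    frozenset(): 0.001,
    frozenset({"X1"}): 0.40,
    frozenset({"X2"}): 0.002,
    frozenset({"X1", "X2"}): 0.55,
}
direct_effect_on_y = {s: 0.001 for s in invariance_holds}

print(estimate_causal_predictors(invariance_holds, alpha=0.05))    # {'X1'}
print(estimate_causal_predictors(direct_effect_on_y, alpha=0.05))  # set()
```

Returning the empty set when nothing is accepted follows the slide's statement S(E) = ∅ for this case.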
![Page 51: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/51.jpg)
Empirical results: simulations
100 different scenarios, 1000 data sets per scenario:
|E| = 2, nobs = ninterv ∈ {100, ..., 500}, p ∈ {5, ..., 40}
[Figure: two panels comparing the methods Marginal, Regression, GES, GIES (unknown), GIES (known), LiNGAM and Invariant prediction.
Panel “SUCCESS PROBABILITY”: power to detect causal predictors (axis 0.00 to 1.00).
Panel “FWER”: familywise error rate P[S(E) ⊄ S∗], aimed at 0.05 (axis 0.00 to 1.00).]
![Page 52: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/52.jpg)
Single gene deletion experiments in yeast
p = 6170 genes
response of interest: Y = expression of first gene
“covariates” X = gene expressions from all other genes
and then
response of interest: Y = expression of second gene
“covariates” X = gene expressions from all other genes
and so on
infer/predict the effects of a single gene knock-down on all other genes
![Page 53: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/53.jpg)
collaborators: Frank Holstege, Patrick Kemmeren et al. (Utrecht)
data from modern technology: Kemmeren, ..., and Holstege (Cell, 2014)
![Page 54: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/54.jpg)
Kemmeren et al. (2014):
genome-wide mRNA expressions in yeast: p = 6170 genes
- nobs = 160 “observational” samples of wild-types
- nint = 1479 “interventional” samples; each of them corresponds to a single gene deletion strain
for our method:
- we use |E| = 2 (observational and interventional data)
- training-test data:
  • training: all observational and 2/3 of interventional data
  • test: other 1/3 of gene deletion interventions
  • repeat this for the three blocks of interventional data
- since every interventional data point is used once as a response variable: we use coverage 1 − α/nint with α = 0.01 and nint = 1479
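The training-test scheme and the adjusted level can be sketched as follows. This is only an illustration: the slides do not say how the three blocks of interventional data are formed, so contiguous index blocks are an assumption, and all function names are hypothetical.

```python
N_INT = 1479   # interventional samples (from the slide)
ALPHA = 0.01   # overall level (from the slide)


def interventional_blocks(n_int, n_blocks=3):
    """Split indices 0..n_int-1 into contiguous blocks of near-equal size."""
    base, extra = divmod(n_int, n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        size = base + (1 if b < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


def train_test_splits(n_int, n_blocks=3):
    """Yield (train, test) index lists; each block is the test set once.

    The observational samples are not indexed here: they are always
    part of the training data.
    """
    blocks = interventional_blocks(n_int, n_blocks)
    for k, test in enumerate(blocks):
        train = [i for j, blk in enumerate(blocks) if j != k for i in blk]
        yield train, test


# Bonferroni-type adjustment: each test is run at level alpha / n_int.
per_test_level = ALPHA / N_INT

for train, test in train_test_splits(N_INT):
    print(len(train), len(test))  # 986 493 for each of the three splits
```

With nint = 1479 the three blocks have 493 samples each, and the per-test level is roughly 6.8e-6.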
![Page 55: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/55.jpg)
Results
8 genes are significant (α = 0.01 level) causal variables
(each of the 8 genes “causes” another gene)
validation with test data:

| method | invar.pred. | GIES | PC-IDA | marg.corr. | rand.guess. |
|---|---|---|---|---|---|
| no. true pos. (out of 8) | 6 | 2 | 2 | 2 | * |

*: quantiles for selecting true positives among 7 random draws: 2 (95%), 3 (99%)
⇒ our invariant prediction method has the most power, and it should exhibit control against false positive selections
![Page 56: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/56.jpg)
[Figure: number of strong intervention effects (y-axis, 0 to 8) against number of intervention predictions (x-axis, 0 to 25). Legend: PERFECT, INVARIANT, HIDDEN-INVARIANT, PC, RFCI, REGRESSION (CV-Lasso), GES and GIES, RANDOM (99% prediction interval)]
I: invariant prediction method
H: invariant prediction with some hidden variables
![Page 57: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/57.jpg)
Validation (Meinshausen, Hauser, Mooij, Peters, Versteeg & PB, 2015)
with intervention experiments: strong intervention effect (SIE)
with yeastgenome.org database: scores A-F

| rank | cause | effect | SIE A B C D E F |
|---|---|---|---|
| 1 | YMR104C | YMR103C | X |
| 2 | YPL273W | YMR321C | X |
| 3 | YCL040W | YCL042W | X |
| 4 | YLL019C | YLL020C | X |
| 5 | YMR186W | YPL240C | X X X X X X |
| 6 | YDR074W | YBR126C | X X X X X X |
| 7 | YMR173W | YMR173W-A | X |
| 8 | YGR162W | YGR264C | |
| 9 | YOR027W | YJL077C | X |
| 10 | YJL115W | YLR170C | |
| 11 | YOR153W | YDR011W | X X |
| 12 | YLR270W | YLR345W | |
| 13 | YOR153W | YBL005W | |
| 14 | YJL141C | YNR007C | |
| 15 | YAL059W | YPL211W | |
| 16 | YLR263W | YKL098W | |
| 17 | YGR271C-A | YDR339C | |
| 18 | YLL019C | YGR130C | |
| 19 | YCL040W | YML100W | |
| 20 | YMR310C | YOR224C | |

SIE: correctly predicting a strong intervention effect, i.e. one which lies in the 1% or 99% tail of the observational data
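The SIE criterion can be sketched as a tail check against the observational sample. This is a minimal illustration: the function name is hypothetical, and the choice of empirical quantile definition (Python's "inclusive" method) is an assumption, since the slides do not specify one.

```python
import statistics


def is_strong_intervention_effect(obs_values, interv_value):
    """True if interv_value lies in the lower 1% or upper 1% tail
    of the empirical distribution of the observational values."""
    # 99 cut points: index 0 is the 1% quantile, index -1 the 99% quantile.
    cuts = statistics.quantiles(obs_values, n=100, method="inclusive")
    lower, upper = cuts[0], cuts[-1]
    return interv_value < lower or interv_value > upper


# Toy observational expression values for one "effect" gene (made up).
obs = [i / 10 for i in range(-80, 80)]  # 160 values, like nobs = 160

print(is_strong_intervention_effect(obs, -8.5))  # True
print(is_strong_intervention_effect(obs, 0.0))   # False
```

A predicted cause-effect pair would then count as validated when the expression of the effect gene under deletion of the cause gene passes this check.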
![Page 58: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/58.jpg)
Flow cytometry data (Sachs et al., 2005)
- p = 11 abundances of chemical reagents
- 8 different environments (not “well-defined” interventions)
  (one of them observational; 7 different reagents added)
- each environment contains ne ≈ 700–1000 samples
goal: recover network of causal relations (linear SEM)
[Figure: network graph on the 11 variables Raf, Mek, PLCg, PIP2, PIP3, Erk, Akt, PKA, PKC, p38, JNK]
approach: invariant causal prediction
(one variable is the response Y; the other 10 are the covariates X; do this 11 times, with every variable once as the response)
main concern: different environments might have a direct influence on the response ⇒ Invariance Assumption would fail
![Page 59: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/59.jpg)
instead of requiring invariance among all 8 environments
⇒ require invariance for pairs of environments $E_{ij}$ (28 pairs in total) and do a Bonferroni correction by taking the union
$S(E) = \bigcup_{i<j} S_{\alpha/28}(E_{ij})$
this weakens the assumption that an intervention should not directly influence the response Y
and if a pair of interventions nevertheless does ⇒ S = ∅ (“robustness” as discussed before)
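The pairwise construction can be sketched as follows. The callable `icp_for_pair` is a hypothetical stand-in for an invariant causal prediction fit on two environments at a given level; only the pairing, the Bonferroni level and the union are illustrated.

```python
from itertools import combinations
from typing import Callable, Sequence, Set, Tuple


def pairwise_union(
    environments: Sequence[str],
    icp_for_pair: Callable[[Tuple[str, str], float], Set[str]],
    alpha: float,
) -> Set[str]:
    """Union of the selected sets over all pairs of environments,
    each pair being analysed at the Bonferroni level alpha / #pairs."""
    pairs = list(combinations(environments, 2))
    if not pairs:
        raise ValueError("need at least two environments")
    level = alpha / len(pairs)  # alpha / 28 for 8 environments
    selected: Set[str] = set()
    for pair in pairs:
        # A pair that violates invariance contributes the empty set.
        selected |= icp_for_pair(pair, level)
    return selected


def toy_icp(pair, level):
    # Stand-in for a real fit: pretends only one pair finds a predictor.
    return {"Mek"} if pair == ("env1", "env2") else set()


envs = [f"env{k}" for k in range(1, 9)]
print(pairwise_union(envs, toy_icp, alpha=0.05))  # {'Mek'}
```

Because the result is a union, a pair whose interventions act directly on Y only removes power for that pair and does not empty the overall set.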
![Page 60: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/60.jpg)
[Figure: estimated network on Raf, Mek, PLCg, PIP2, PIP3, Erk, Akt, PKA, PKC, p38, JNK]
blue edges: only invariant causal prediction approach (ICP)
red: only ICP allowing hidden variables and feedback
purple: both ICP with and without hidden variables
solid: all relations that have been reported in the literature
broken: new findings not reported in the literature
⇒ reasonable consensus with existing results, but no real ground-truth available
serves as an illustration that we can work with “vaguely defined interventions”
![Page 61: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/61.jpg)
Concluding thoughts
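generalize the Invariance Assumption and statistical testing to nonparametric/nonlinear models
$\forall e \in E:\ Y^e = f^*(X^e_{S^*}) + \varepsilon^e,\quad \varepsilon^e \sim F_\varepsilon,\quad \varepsilon^e \perp X_{S^*}$
in particular additive models
$\forall e \in E:\ Y^e = \sum_{j \in S^*} f^*_j(X^e_j) + \varepsilon^e,\quad \varepsilon^e \sim F_\varepsilon,\quad \varepsilon^e \perp X_{S^*}$
the statistical significance testing becomes more difficult
improved identifiability with nonlinear SEMs (Mooij et al., 2009)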
![Page 62: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/62.jpg)
generalize to include hidden variables
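$S^0 = \mathrm{pa}(Y) \cap \{\text{observed variables}\}$, $S^0_H = \mathrm{pa}(Y) \cap \{\text{hidden variables}\}$
presented procedure still OK if (essentially):
- no interventions at Y, at $S^0_H$ and at ancestors($S^0_H$)
- no hidden confounder between Y and $S^0$
[Figure: graph with response Y, hidden variables H1, H2, H3 and observed variables X1, X8, X12, with the set S0 marked]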
![Page 63: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/63.jpg)
more general hidden variable models can be treated with a “related” technique
(Rothenhäusler, Heinze, Peters & Meinshausen, 2015)
[Figure: two graphs, one over the variables I, W, C, Y, E and one over I, W, X, Y. Annotations: “(X, Y) form a DAG” and “X = (C, E) without knowing C, E”]
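$C \leftarrow B_{c \leftarrow i}\, I + B_{c \leftarrow w}\, W + \varepsilon_C$
$Y \leftarrow B_{y \leftarrow w}\, W + B_{y \leftarrow c}\, C + \varepsilon_Y$
$E \leftarrow B_{e \leftarrow i}\, I + B_{e \leftarrow w}\, W + B_{e \leftarrow c}\, C + B_{e \leftarrow y}\, Y + \varepsilon_E$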
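The three structural equations can be evaluated in the order C, Y, E, since Y depends on C and E depends on both. A minimal sketch with scalar variables follows; the coefficient values, the dictionary keys and the function name are all made up for illustration.

```python
def evaluate_sem(i, w, coef, noise=(0.0, 0.0, 0.0)):
    """Evaluate C, Y, E from I, W for scalar variables.

    coef maps keys like "c<-i" to the coefficient B_{c<-i};
    noise holds (eps_C, eps_Y, eps_E).
    """
    eps_c, eps_y, eps_e = noise
    c = coef["c<-i"] * i + coef["c<-w"] * w + eps_c
    y = coef["y<-w"] * w + coef["y<-c"] * c + eps_y
    e = (coef["e<-i"] * i + coef["e<-w"] * w
         + coef["e<-c"] * c + coef["e<-y"] * y + eps_e)
    return c, y, e


# Hypothetical coefficients, chosen only to make the arithmetic easy.
coef = {
    "c<-i": 2.0, "c<-w": 0.5,
    "y<-w": 1.0, "y<-c": -1.0,
    "e<-i": 0.5, "e<-w": 0.0, "e<-c": 1.0, "e<-y": 2.0,
}

print(evaluate_sem(i=1.0, w=2.0, coef=coef))  # (3.0, -1.0, 1.5)
```

Note that I enters the equations for C and E but not the one for Y.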
![Page 64: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/64.jpg)
provocative next step: how about using “Big Data”?
⇒ structure the “large-scale” data into different unknown groups of experimental settings E
that is: learn E from data
“optimal” E is a trade-off between identifiability and statistical power
![Page 65: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/65.jpg)
learning/estimating E
problem: given a “large bag of data”, can we estimate the unknown underlying different experimental conditions?
mathematically: denote by J all experimental settings satisfying the Invariance Assumption
⇒ mixture distribution for e ∈ constr. E: $F^e = \sum_{j \in J} w^e_j F^j$
when pooling two constr. experimental settings $e_1$ and $e_2$:
⇒ new mixture distribution with weights $(w^{e_1} + w^{e_2})/2$
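The pooled weights can be checked in one line, assuming (as the factor 1/2 suggests) that the two settings contribute equally many samples. The notation $e_1 \cup e_2$ for the pooled setting is introduced here for illustration.

```latex
\begin{align*}
F^{e_1 \cup e_2}
  &= \tfrac12 F^{e_1} + \tfrac12 F^{e_2} \\
  &= \tfrac12 \sum_{j \in J} w^{e_1}_j F^j + \tfrac12 \sum_{j \in J} w^{e_2}_j F^j \\
  &= \sum_{j \in J} \frac{w^{e_1}_j + w^{e_2}_j}{2}\, F^j
\end{align*}
```

The pooled setting is therefore again a mixture over the same components $F^j$, so the Invariance Assumption carries over to it.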
- mixture modeling
- change point modeling for (time) ordered data
might be useful to estimate E
![Page 66: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/66.jpg)
causal components remain the same for different sub-populations or experimental settings
⇒ exploit the power of heterogeneity in complex data!
and confidence bounds follow naturally
![Page 67: Blessing of heterogeneous large-scale data for high ......Blessing of heterogeneous large-scale data for high-dimensional causal inference Peter Buhlmann¨ Seminar fur Statistik, ETH](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4d9ac472003934d1715f40/html5/thumbnails/67.jpg)
Thank you!
Software
R-package: pcalg (Kalisch, Mächler, Colombo, Maathuis & PB, 2010–2015)
R-package: InvariantCausalPrediction (Meinshausen, 2014)
References to some of our own work:
- Peters, J., Bühlmann, P. and Meinshausen, N. (2015). Causal inference using invariant prediction: identification and confidence intervals. To appear in J. Royal Statistical Society, Series B (with discussion). Preprint arXiv:1501.01332
- Meinshausen, N., Hauser, A., Mooij, J., Peters, J., Versteeg, P. and Bühlmann, P. (2015). Causal inference from gene perturbation experiments: methods, software and validation. Preprint.
- Hauser, A. and Bühlmann, P. (2015). Jointly interventional and observational data: estimation of interventional Markov equivalence classes of directed acyclic graphs. Journal of the Royal Statistical Society, Series B, 77, 291-318.
- Hauser, A. and Bühlmann, P. (2012). Characterization and greedy learning of interventional Markov equivalence classes of directed acyclic graphs. Journal of Machine Learning Research 13, 2409-2464.
- Kalisch, M., Mächler, M., Colombo, D., Maathuis, M.H. and Bühlmann, P. (2012). Causal inference using graphical models with the R package pcalg. Journal of Statistical Software 47 (11), 1-26.
- Stekhoven, D.J., Moraes, I., Sveinbjörnsson, G., Hennig, L., Maathuis, M.H. and Bühlmann, P. (2012). Causal stability ranking. Bioinformatics 28, 2819-2823.
- Maathuis, M.H., Colombo, D., Kalisch, M. and Bühlmann, P. (2010). Predicting causal effects in large-scale systems from observational data. Nature Methods 7, 247-248.
- Maathuis, M.H., Kalisch, M. and Bühlmann, P. (2009). Estimating high-dimensional intervention effects from observational data. Annals of Statistics 37, 3133-3164.