DAG discovery - Network Analysis 2017
Outline: Recap | D-separation recap | DAGs & Probability | DAG Discovery | Fitting DAGs | Conclusion
Sacha Epskamp
04-12-2017
Last week
• Regularization controls for spurious connections
  • LASSO regularization
  • EBIC model selection
• Bootstrap methods assess the accuracy and stability of results
  • Non-parametric bootstrap
  • Case-drop bootstrap
• Comparing networks takes three steps
  • Visually inspect; correlate weights; permutation test (NetworkComparisonTest)
• Non-normal data
  • Non-paranormal transformation
  • Polychoric correlations

Bootnet estimation

[Figure: bootnet estimation results from last week]

Directed Acyclic Graphs
Building blocks of a DAG

Common cause: A ← B → C
  Example: a disease (B) causes two symptoms (A and C).
  A ⊥̸⊥ C
  A ⊥⊥ C | B

Chain: A → B → C
  Example: insomnia (A) causes fatigue (B), which in turn causes concentration problems (C).
  A ⊥̸⊥ C
  A ⊥⊥ C | B

Collider: A → B ← C
  Example: difficulty of the class (A) and intelligence of the student (C) cause the grade on a test (B).
  A ⊥⊥ C
  A ⊥̸⊥ C | B
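A quick numeric illustration of the collider case in R (a minimal sketch on simulated data; the linear model is a hypothetical stand-in for the grade example):

set.seed(1)
n <- 10000
A <- rnorm(n)           # e.g., difficulty of class
C <- rnorm(n)           # e.g., intelligence of student
B <- A + C + rnorm(n)   # e.g., grade: common effect of A and C
cor.test(A, C)          # A and C: no significant correlation
summary(lm(A ~ C + B))  # conditioning on B: C becomes a significant predictor of A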
To determine whether two variables (e.g., B and F) are conditionally independent given a third (e.g., C) or a set of multiple variables:
• List all paths between the variables (ignoring the direction of edges)
• For each path, check whether the variable conditioned on is:
  • the middle node in a chain or common cause structure, or
  • not the middle node (common effect) in a collider structure, nor an effect of such a common effect
• If so, then the path is blocked
• If all such paths are blocked, the two variables are d-separated and thus conditionally independent
[Figure: an example DAG over variables A through G]

Some of the conditional independence relations implied by this DAG:
• A ⊥⊥ B
• A ⊥⊥ D | C
• B ⊥⊥ G | C, E
• ...

Testing this causal model involves testing whether all these conditional independence relations hold
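This bookkeeping can be automated; a minimal sketch using the dagitty R package (the DAG below is a made-up example, not the one from the slide):

library(dagitty)
# Hypothetical example DAG
g <- dagitty("dag { A -> C ; B -> C ; C -> D ; C -> E ; E -> G }")
dseparated(g, "A", "B", list())      # TRUE: A and B are d-separated marginally
dseparated(g, "A", "B", "C")         # FALSE: conditioning on the collider C opens the path
impliedConditionalIndependencies(g)  # all conditional independencies the DAG implies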
However, if this model fits:
• A → B → C
Then so do these:
• A ← B → C
• A ← B ← C
Because these models imply the same conditional independence relationships and are therefore equivalent
DAGs & Probability

• A key problem in statistics is characterizing the joint likelihood function of all data
  • A function that tells you how likely your observed data are given some parameters
  • Pr(A, B, C, D, ...)
• This function is used in estimating parameters
  • Parameters are selected that maximize the likelihood function
• Obtaining the joint likelihood may be complicated, though
• DAGs make this much simpler!
Normally, to obtain the joint likelihood we need to factorize (chain rule):

Pr(A, B, C, D, E) = Pr(A) Pr(B | A) Pr(C | A, B) Pr(D | A, B, C) Pr(E | A, B, C, D)

But if we know the DAG:

A → B → C → D → E

then we know, e.g., Pr(E | A, B, C, D) = Pr(E | D) (any node depends only on its "parents"), and thus:

Pr(A, B, C, D, E) = Pr(A) Pr(B | A) Pr(C | B) Pr(D | C) Pr(E | D)
Much simpler!
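A small numeric check in R (a sketch on a simulated linear-Gaussian chain A → B → C; the coefficients are arbitrary): the DAG says Pr(C | A, B) = Pr(C | B), so A should add nothing once B is known.

set.seed(2)
n <- 5000
A <- rnorm(n)
B <- 0.5 * A + rnorm(n)       # B depends on its parent A
C <- 0.5 * B + rnorm(n)       # C depends on its parent B only
coef(summary(lm(C ~ A + B)))  # coefficient of A near zero: C depends on A only through B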
Joint Likelihood of Multiple Realizations

[Diagram: five unconnected nodes Y1 ... Y5 (lag-0)]

Simplest: independent cases (e.g., cross-sectional data):

Pr(Y) = Pr(Y1) Pr(Y2) Pr(Y3) Pr(Y4) Pr(Y5)
Estimable if all probability distributions are assumed identical
[Diagram: chain Y1 → Y2 → Y3 → Y4 → Y5 (lag-1)]
Lag-1 factorization (time-series):
Pr(Y) = Pr(Y1) Pr(Y2 | Y1) Pr(Y3 | Y2) Pr(Y4 | Y3) Pr(Y5 | Y4)
Estimable if all probability distributions are assumed identical
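The same check for the lag-1 case (a sketch on a simulated AR(1) series): under the Markov factorization, Y at time t depends on Y at time t-2 only through Y at time t-1.

set.seed(7)
y <- as.numeric(arima.sim(list(ar = 0.5), n = 500))
n <- length(y)
summary(lm(y[3:n] ~ y[2:(n - 1)] + y[1:(n - 2)]))  # lag-2 coefficient near zero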
Statistical models can often be portrayed as DAGs, in which case they are called graphical models. For example:

Lee, M. D., & Wagenmakers, E. J. (2014). Bayesian cognitive modeling: A practical course. Cambridge University Press.

• A powerful method for showing how the parameters of a complex model interact with one another
• Bayesian software packages (e.g., WinBUGS, JAGS, Stan) use this DAG in sampling from the posterior distribution
DAG Discovery
• DAG search algorithms intend to identify an equivalence class
  • List equally plausible DAGs
• Two types of algorithms:
  • Constraint-based algorithms
    • (1) identify edge locations, (2) identify colliders, (3) orient edges under the acyclicity assumption
  • Score-based algorithms:
    • Find the optimal DAG by model selection/search
• Prior knowledge can be used in both cases to greatly help the algorithm
  • E.g., causation cannot go backward in time
Assumptions
• Causal Sufficiency Assumption
  • "There exist no common unobserved (also known as hidden or latent) variables in the domain that are parent of one or more observed variables of the domain."
  • tl;dr: No latent variables
• Markov Assumption
  • "Given a Bayesian network model B, any variable is independent of all its nondescendants in B, given its parents."
  • tl;dr: Acyclicity
• Faithfulness Assumption
  • "A BN graph G and a probability distribution P are faithful to one another iff every one and all independence relations valid in P are those entailed by the Markov assumption on G."
  • tl;dr: No weird stuff

Source: Margaritis, D. (2003). Learning Bayesian network model structure from data. Thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh.
Score-based algorithms
• Score-based algorithms fit several DAGs according to some criterion and select the best
• Possible criteria are posterior model fit and AIC/BIC
• Searching all possible DAGs is intractable, so some search strategy is needed
• Examples
  • Hill Climbing; Tabu Search (see the sketch below)
• Used, e.g., by McNally, R. J., Mair, P., Mugno, B. L., & Riemann, B. C. (2017). Co-morbid obsessive-compulsive disorder and depression: a Bayesian network approach. Psychological Medicine, 1-11.
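A minimal sketch of score-based search with the bnlearn R package (toy simulated data, not the McNally et al. analysis):

library(bnlearn)
set.seed(4)
n <- 500
A <- rnorm(n)
B <- 0.6 * A + rnorm(n)
C <- 0.6 * B + rnorm(n)
mydata <- data.frame(A, B, C)
fit <- hc(mydata, score = "bic-g")  # hill climbing with the Gaussian BIC score
modelstring(fit)                    # e.g., "[A][B|A][C|B]" for chain-like data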
Hill Climbing
1. Start at an empty, full, or random network
2. Try all possible single-edge additions, removals, and reversals
3. Select the best-fitting model that performs better than the current model
4. Go to 2
• Hill Climbing results in a local optimum
  • Random restarts and perturbations can be used to search for a global optimum
• No control for overfitting
  • Bootstrapping and retaining only stable edges is highly recommended (see the sketch below)
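In bnlearn this stability check might look as follows (a sketch, reusing the toy mydata from the hill-climbing example above; the 0.85 threshold is arbitrary):

library(bnlearn)
boot <- boot.strength(mydata, R = 200, algorithm = "hc",
                      algorithm.args = list(score = "bic-g"))
avg <- averaged.network(boot, threshold = 0.85)  # keep edges seen in >= 85% of samples
avg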
Constraint-based algorithms

• Structure estimated based on conditional independence relationships
• E.g., the Inductive Causation (IC) algorithm:
  1. For each pair a and b, look for a set S_ab such that a ⊥⊥ b | S_ab. If no such S_ab exists, then a and b are dependent: connect them with an edge.
  2. For each trio (a, b, c) such that a − c − b, check whether c belongs to S_ab. If so, do nothing. If c is not in S_ab, then make a collider at c, i.e., a → c ← b.
  3. Orient as many of the undirected edges as possible, subject to: (i) no new v-structures and (ii) no cycles.
• Examples:
  • IC algorithm; PC algorithm; Grow-Shrink; Incremental Association Markov Blanket (see the PC sketch below)
• Used, e.g., by Borsboom, D., & Cramer, A. O. (2013). Network analysis: an integrative approach to the structure of psychopathology. Annual Review of Clinical Psychology, 9, 91-121.
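A minimal sketch of the PC algorithm with the pcalg R package (toy simulated data; alpha is the level of the conditional independence tests):

library(pcalg)
set.seed(5)
n <- 500
A <- rnorm(n)
B <- 0.6 * A + rnorm(n)
C <- 0.6 * B + rnorm(n)
mydata <- data.frame(A, B, C)
suffStat <- list(C = cor(mydata), n = n)  # sufficient statistics for Gaussian CI tests
fit <- pc(suffStat, indepTest = gaussCItest, alpha = 0.05,
          labels = colnames(mydata))
fit  # prints the estimated equivalence class (CPDAG)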
Example: Easiness of Class, Intelligence, Grade, IQ, Diploma

[Figure: a DAG relating Easiness of Class, Intelligence, Grade, IQ, and Diploma]

What if we don't know the structure?

Step 1: find the skeleton. For each pair of nodes, ask: are the two nodes independent given *any* set of other nodes (including the empty set)?

• Easiness of Class and Intelligence: Yes! They are independent to begin with!
  • Draw no edge between Easiness of Class and Intelligence
• Easiness of Class and Grade: No!
  • Draw an edge between Easiness of Class and Grade
• Grade and Intelligence: No!
  • Draw an edge between Grade and Intelligence
• Proceeding in the same way for the remaining pairs yields the edges Grade − Diploma and Intelligence − IQ

Step 2: find the colliders. For each a − c − b structure, ask: is the middle node in the set that separated the other two nodes?

• If yes: do nothing
• If no: make the middle node a collider. Grade is a collider between Easiness of Class and Intelligence: Easiness of Class → Grade ← Intelligence

Step 3: orient as many of the remaining edges as possible (no new v-structures, no cycles).

• Do we now know the direction of the edge between Grade and Diploma?
  • Yes! Grade was not a common effect of Diploma and another variable, so the edge must be Grade → Diploma
• Do we now know the direction of the edge between Intelligence and IQ?
  • No!

[Figure: the two resulting equivalent DAGs, identical except for the direction of the Intelligence − IQ edge]
Constraint-based vs Score-based algorithms
• Constraint-based algorithms are more specific and detailed and allow for a more certain causal interpretation, but they are also sensitive to error (if one test is wrong, everything fails!)
• Score-based methods provide a metric of confidence in the returned model and are useful for approximating the joint probability distribution
• Hybrid methods that aim to take the best of both worlds have also been developed (see the sketch below)
  • E.g., Max-Min Hill Climbing
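In bnlearn, the hybrid approach might look like this (a sketch, again on the toy mydata from above):

library(bnlearn)
fit <- mmhc(mydata)  # Max-Min Hill Climbing: constraint-based skeleton,
                     # score-based orientation
modelstring(fit)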
Directed Acyclic Graphs
• A DAG implies a set of independence relationships, which can be tested
• If the data are assumed multivariate Gaussian:
  • Each variable is normally distributed
  • Relationships between variables are linear
• Then the correlation or covariance can be used to test for dependencies, and the partial correlation or partial covariance can be used to test for conditional dependencies
[Diagram: A, B, and C, with B blocking the path between A and C]

• Cov(A, C) ≠ 0
• Cov(A, C | B) = 0
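A minimal R check of these two statements on simulated chain data (coefficients arbitrary):

set.seed(3)
n <- 5000
A <- rnorm(n)
B <- 0.7 * A + rnorm(n)
C <- 0.7 * B + rnorm(n)
cor.test(A, C)                                # Cov(A, C) != 0: clearly significant
cor.test(resid(lm(A ~ B)), resid(lm(C ~ B)))  # partial correlation given B: near zero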
Structural Equation Modeling

• In SEM, the variance-covariance matrix is modeled and compared to the observed variance-covariance matrix (see the sketch below)
• If multivariate normality holds, then the Schur complement shows that any partial covariance can be expressed solely in terms of variances and covariances:

  Cov(Yi, Yj | X = x) = Cov(Yi, Yj) − Cov(Yi, X) Var(X)⁻¹ Cov(X, Yj)

• Thus, a specific structure of the correlation matrix also implies a model for all possible partial correlations
• If the implied covariance matrix of the SEM exactly matches the observed covariance matrix, then the data contain all d-separations that are implied by the causal model
  • In that case, the model could have generated the data!
  • But this does not mean the model is correct
    • Equivalent models could have generated the same data!
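Fitting a DAG as a path model can be done with the lavaan R package; a minimal sketch using simulated data for the class-grade example above (all names and coefficients hypothetical):

library(lavaan)
set.seed(6)
n <- 500
Easiness     <- rnorm(n)
Intelligence <- rnorm(n)
Grade        <- 0.5 * Easiness + 0.5 * Intelligence + rnorm(n)
Diploma      <- 0.7 * Grade + rnorm(n)
IQ           <- 0.8 * Intelligence + rnorm(n)
dat <- data.frame(Easiness, Intelligence, Grade, Diploma, IQ)
model <- '
  Grade   ~ Easiness + Intelligence
  Diploma ~ Grade
  IQ      ~ Intelligence
'
fit <- sem(model, data = dat)
fitMeasures(fit, c("chisq", "df", "pvalue", "cfi", "rmsea"))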
Doosje, B., Loseman, A., & Bos, K. (2013). Determinants of radicalization of Islamic youth in the Netherlands: Personal uncertainty, perceived injustice, and perceived group threat. Journal of Social Issues, 69(3), 586-604.

[Figures: path model and results from Doosje et al. (2013)]
What does pcalg come up with?
[Figure: CPDAG estimated by pcalg, with nodes: In-group Identification, Individual Deprivation, Collective Deprivation, Intergroup Anxiety, Symbolic Threat, Realistic Threat, Personal Emotional Uncertainty, Perceived Injustice, Perceived Illegitimacy Authorities, Perceived In-group Superiority, Distance to Other People, Societal Disconnected, Attitude towards Muslim Violence, Own Violent Intentions]
Does it fit?
## chisq df pvalue cfi nfi
## 80.52 39.00 0.00 0.89 0.82
## rmsea rmsea.ci.lower rmsea.ci.upper
## 0.09 0.06 0.12
• Not really...
DAG Discovery
Discovering an equivalence set of DAGs is possible under some assumptions:
• Causal Sufficiency
• Markov Assumption
• Faithfulness

Two general methods:
• Score-based algorithms
• Constraint-based algorithms

DAGs provide useful characterisations of the joint likelihood and can be fitted to the data (e.g., SEM)
But...
• Assumptions are often not plausible
  • Latents or acyclicity
• Prone to errors
  • Often edges are estimated in a different direction than you would expect
• Exploratory estimation may suffer from low power
• Confirmatory fit may suffer from many equivalent models
Software
Several R packages, but mainly:
• pcalg
  • Implements the PC algorithm (a faster variant of the IC algorithm)
• bnlearn
  • Implements everything *but* the PC algorithm

We will see these in the assignment!
Thank you for your attention!