The IMAP Hybrid Method for Learning Gaussian Bayes Nets

Oliver Schulte
School of Computing Science, Simon Fraser University
Vancouver, Canada
oschulte@cs.sfu.ca

with Gustavo Frigo, Hassan Khosravi (Simon Fraser) and Russ Greiner (U of Alberta)

Outline

• Brief Intro to Bayes Nets (BN).
• BN structure learning: Score-based vs. Constraint-based.
• Hybrid Design: Combining Dependency (Correlation) Information with Model Selection.
• Evaluation by Simulations.

Bayes Nets: Overview

• Bayes Net Structure = Directed Acyclic Graph.
• Nodes = Variables of Interest.
• Arcs = direct “influence”, “association”.
• Parameters = CP Tables = Prob of Child given Parents.
• Structure represents (in)dependencies.
• (Structure + parameters) represents joint probability distribution over variables.
• Many applications: Gene expression, finance, medicine, …

Example: Alienation Model

• Wheaton et al. 1977. Structure entails (in)dependencies:

• Anomia67 dependent on SES.

• Education independent of any node given SES

• Can predict any node, e.g. “Probability of Powerless67 = x, given SES = y?”

[Figure: alienation model graph with nodes Anomia67, Powerless67, Anomia71, Powerless71, Alienation67, Alienation71, Education.]

Gaussian Bayes Nets

• aka (recursive) structural equation models.
• Continuous variables.
• Each child is a linear combination of its parents, plus a normally distributed error ε (mean 0).
• E.g. Alienation71 = 0.61*Alienation67 - 0.23*SES + ε

[Figure: graph fragment showing Alienation67 and Alienation71.]
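As a rough illustration of the linear equation above, the sketch below simulates data from a small linear Gaussian model and recovers the two coefficients by least squares. Only the equation for Alienation71 comes from the slide. The equation for Alienation67, the error standard deviations, the sample size, and the function names `simulate` and `fit_two_parents` are assumptions made for the example.

```python
import random

def simulate(n, seed=0):
    """Draw n samples (ses, alienation67, alienation71) from a linear Gaussian model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        ses = rng.gauss(0.0, 1.0)
        # Hypothetical equation for Alienation67; not taken from the slides.
        a67 = -0.5 * ses + rng.gauss(0.0, 1.0)
        # Equation from the slide; the error standard deviation 0.5 is assumed.
        a71 = 0.61 * a67 - 0.23 * ses + rng.gauss(0.0, 0.5)
        rows.append((ses, a67, a71))
    return rows

def fit_two_parents(rows):
    """Least-squares estimate of (b1, b2) in a71 = b1*a67 + b2*ses + error.

    All variables have mean 0 by construction, so no intercept is fitted.
    """
    s11 = sum(a * a for _, a, _ in rows)
    s22 = sum(s * s for s, _, _ in rows)
    s12 = sum(s * a for s, a, _ in rows)
    s1y = sum(a * y for _, a, y in rows)
    s2y = sum(s * y for s, _, y in rows)
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s11 * s22 - s12 * s12
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return b1, b2

b_a67, b_ses = fit_two_parents(simulate(5000))
print(round(b_a67, 2), round(b_ses, 2))  # estimates should land near 0.61 and -0.23
```

With a few thousand samples the estimates should sit close to the generating coefficients, which is the sense in which the parameters of a Gaussian Bayes net are regression weights.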

Two Approaches to Learning Bayes Net Structure

Input: random sample, e.g.

A B
3.2 4.5
-2.1 -3.3
-5.4 -4.3
… …

Search and Score

• Select graph G as model with parameters to be estimated.
• Score: BIC, AIC.
• Balances data fit with model complexity.

[Figure: candidate graphs with Score = 5 and Score = 10.]

Constraint-Based (CB)

• Use statistical correlation test (e.g., Fisher’s z) to find (in)dependencies.
• Choose G that entails the (in)dependencies in the sample.

[Figure: a graph over A and B that covers the correlation between A and B.]
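The constraint-based side relies on a correlation test such as Fisher’s z. A minimal sketch of that test follows, using the standard transform z = atanh(r) · sqrt(n − k − 3), where k is the number of conditioning variables. The function names `pearson` and `fisher_z_dependent` and the example values are assumptions for illustration, not the authors’ code.

```python
import math
from statistics import NormalDist

def pearson(xs, ys):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fisher_z_dependent(r, n, num_given=0, alpha=0.05):
    """True if the (partial) correlation r from n samples is significant.

    num_given is the size of the conditioning set (0 for a marginal test).
    """
    # atanh(r) equals 0.5 * ln((1 + r) / (1 - r)), the Fisher transform.
    z = math.atanh(r) * math.sqrt(n - num_given - 3)
    threshold = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    return abs(z) > threshold

# Hypothetical correlations from 100 samples each.
print(fisher_z_dependent(0.5, 100))   # strong correlation
print(fisher_z_dependent(0.05, 100))  # weak correlation
```

A rejection of the null hypothesis is read as a dependency. A failure to reject is the weaker kind of evidence, which is the motivation for the hybrid design later in the talk.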

Overfitting with Score-based Search

[Figure: two graphs over the nodes Anomia67, Powerless67, Anomia71, Powerless71, Alienation67, Alienation71, and Education, labeled Target Model and BIC Model.]

Overfitting with Score-based Search

• Generate random parametrized graphs with different sizes, 30 graphs for each size.
• Generate random samples of size 100-20,000.
• Use Bayes Information Criterion (BIC) to learn.
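For readers who want to see what a BIC score looks like for a Gaussian Bayes net, here is a minimal sketch. It assumes the common form "maximized log-likelihood minus (k/2)·ln n", summed over nodes, with each node regressed on its parents. The graph encoding (a dict from each node to its set of parents), the helper names, and the toy data are assumptions, not the setup used in the experiments.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def node_bic(data, child, parents):
    """BIC term of one node: Gaussian log-likelihood minus (k/2) ln n."""
    y = data[child]
    n = len(y)
    cols = [[1.0] * n] + [data[p] for p in parents]  # intercept plus parents
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    resid = [y[i] - sum(b * c[i] for b, c in zip(beta, cols)) for i in range(n)]
    var = sum(e * e for e in resid) / n  # maximum-likelihood error variance
    loglik = -0.5 * n * (math.log(2 * math.pi * var) + 1)
    k = len(cols) + 1  # regression weights (incl. intercept) plus the variance
    return loglik - 0.5 * k * math.log(n)

def bic(data, dag):
    """BIC of a DAG given as {node: set of parents}; higher is better."""
    return sum(node_bic(data, child, sorted(ps)) for child, ps in dag.items())

# Toy data (hypothetical): B depends linearly on A.
rng = random.Random(1)
a = [rng.gauss(0, 1) for _ in range(500)]
b = [0.8 * x + rng.gauss(0, 1) for x in a]
data = {"A": a, "B": b}
empty = {"A": set(), "B": set()}
with_edge = {"A": set(), "B": {"A"}}
print(bic(data, with_edge) > bic(data, empty))
```

The penalty term grows only logarithmically in the sample size, so an edge needs a modest gain in fit to be accepted, which is consistent with the overly dense graphs reported on this slide.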

Our Hybrid Approach

• CB: weakness is type II error, i.e. false acceptance of independencies. Remedy: use only dependencies (Schulte et al. 2009, IEEE CIDM).

• Score-based: produces overly dense graphs. Remedy: cross-check score with dependencies (new idea).

• Let Dep be a set of conditional dependencies extracted from sample d.

• Move from graph G to G’ only if

1. score increases: S(G’,d) > S(G,d), and

2. G’ entails strictly more dependencies from Dep than G.

S(G,d) = score function, d = sample.

[Figure: three example cases (Case 1, Case 2, Case 3); a score value of 10.5 is shown.]
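The two-part move criterion can be sketched in code as follows. This is an illustration under stated assumptions, not the authors’ implementation: a graph is a dict from child to its set of parents, a dependency Dep(X,Y|Z) is a tuple `(x, y, frozenset(z))`, "entails a dependency" is read as d-connection, and "strictly more" is read as a proper superset of covered dependencies. The scores passed in are hypothetical numbers.

```python
from itertools import combinations

def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_connected(parents, x, y, given=frozenset()):
    """True if x and y are d-connected given `given` in the DAG."""
    keep = ancestors(parents, {x, y} | set(given))
    # Moralize the ancestral subgraph: link parents to children, marry co-parents.
    adj = {v: set() for v in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(ps, 2):
            adj[p].add(q)
            adj[q].add(p)
    # x and y are d-connected iff a path avoids the conditioning set.
    seen, stack = {x}, [x]
    while stack:
        for nb in adj[stack.pop()]:
            if nb == y:
                return True
            if nb not in seen and nb not in given:
                seen.add(nb)
                stack.append(nb)
    return False

def covered(parents, deps):
    """Subset of deps that the graph entails."""
    return {d for d in deps if d_connected(parents, d[0], d[1], d[2])}

def accept_move(g_old, g_new, score_old, score_new, deps):
    """Hybrid criterion: higher score AND strictly more covered dependencies."""
    return score_new > score_old and covered(g_old, deps) < covered(g_new, deps)

deps = {("A", "B", frozenset()), ("A", "B", frozenset({"C"}))}
empty = {}
sparse = {"B": {"A"}}              # A -> B
dense = {"B": {"A"}, "C": {"B"}}   # A -> B -> C
print(accept_move(empty, sparse, 0.0, 10.0, deps))   # first edge covers new dependencies
print(accept_move(sparse, dense, 10.0, 20.0, deps))  # extra edge covers nothing new
```

In the second call the denser graph has the higher (hypothetical) score, but it covers no additional dependency from Dep, so the move is rejected.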

Example for Hybrid Criterion

Dependencies extracted from the sample: Dep(A,B), Dep(A,B|C).

[Figure: candidate graphs with their scores; a score of 20 is shown.]

• The score prefers the denser graph.

• The correlation between B and C is not statistically significant.

• So the hybrid criterion prefers the simpler graph.

Adapting Local Search

• Test n² Markov blanket dependencies on the sample: is A independent of B given the Markov blanket of A? (n = number of nodes.)

• Example test output: Dep(A,B), Dep(A,B|C), Dep(B,C).

• To move from the current graph to a new graph, apply a local search operator to maximize the score while covering additional dependencies.

• any local search can be used

• inherits convergence properties of local search if test is correct in the sample size limit
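The Markov blanket tests mentioned above can be enumerated with a short helper. The sketch assumes the blanket is read off a candidate graph stored as a dict from child to its set of parents, and that for a pair (A, B) the conditioning set is the blanket of A with B removed. Both choices, and the names `markov_blanket` and `blanket_queries`, are assumptions made for illustration.

```python
def markov_blanket(parents, node):
    """Parents, children, and co-parents of children of `node` in the DAG."""
    children = {c for c, ps in parents.items() if node in ps}
    spouses = {p for c in children for p in parents[c]}
    return (set(parents.get(node, ())) | children | spouses) - {node}

def blanket_queries(parents, nodes):
    """One test per ordered pair: is a independent of b given MB(a) minus b?"""
    return [
        (a, b, frozenset(markov_blanket(parents, a) - {b}))
        for a in nodes
        for b in nodes
        if a != b
    ]

# Hypothetical graph: A -> C <- B, C -> D.
dag = {"C": {"A", "B"}, "D": {"C"}}
nodes = ["A", "B", "C", "D"]
print(sorted(markov_blanket(dag, "A")))   # co-parent B and child C
print(len(blanket_queries(dag, nodes)))   # n * (n - 1) ordered pairs
```

Each query would then be passed to a statistical test on the sample, and the rejected null hypotheses form the dependency set that the search must cover.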

Simulation Setup (key results)

• Statistical Test: Fisher z-statistic, α = 5%.

• Score S: BIC (Bayes information criterion).

• Search Method: GES [Meek, Chickering 2003].

Random Gaussian DAGs.

• #Nodes: 4-20.

• Sample Sizes 100-20,000.

• 30 random graphs for each combination of node number with sample size, average results.

• Graphs generated with Tetrad’s random DAG utility (CMU).

Simulation Results: Number of Edges

• Edge ratio of 1 is ideal. GES = standard score search. IGES = hybrid search.

• Improvement = ratio(GES) – ratio(IGES).

Simulation Results: f-measure

Structure fit improves most for >10 nodes.

f-measure on correctly placed links combines false positives and negatives: 2 × correctly placed / (2 × correctly placed + incorrectly placed + missed edges).
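The f-measure formula can be written out directly. In this sketch an edge is a directed pair `(parent, child)` and a learned edge counts as correctly placed only if the same pair appears in the target graph. That matching rule and the example edge sets are assumptions, since the slides do not spell them out.

```python
def f_measure(learned, target):
    """2*correct / (2*correct + incorrectly placed + missed edges)."""
    learned, target = set(learned), set(target)
    correct = len(learned & target)    # edges present in both graphs
    incorrect = len(learned - target)  # learned edges absent from the target
    missed = len(target - learned)     # target edges the learner did not find
    denom = 2 * correct + incorrect + missed
    return 2 * correct / denom if denom else 1.0

# Hypothetical target and learned graphs.
target = {("A", "B"), ("B", "C"), ("C", "D")}
learned = {("A", "B"), ("B", "C"), ("A", "D")}
print(round(f_measure(learned, target), 3))  # 2 correct, 1 incorrect, 1 missed
```

A value of 1 means the learned edge set matches the target exactly, and each extra or missing edge lowers the value.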

Real-World Structures: Insurance and Alarm

• Alarm (Beinlich et al. 1989)/Insurance (Binder et al.): 37/25 nodes.
• Average over 5 samples per sample size.
• F-measure, higher is better.

False Dependency Rate

Type I errors (false correlations) are rare: frequency less than 3%.

False Independency/Acceptance Rate

• Type II errors (false independencies) are quite frequent.

• e.g., 20% for sample size 1,000.

Conclusion

• In Gaussian BNs, score-based methods tend to produce overly dense graphs for >10 variables.
• Key new idea: constrain the search such that adding edges is justified by fitting more statistically significant correlations.
• Also: use only dependency information, not independencies (rejections of the null hypothesis, not acceptances).
• For synthetic and real-world target graphs, the hybrid method finds less complex graphs with a better fit to the target graph.

The End

Thank you!

full paper
