Tutorial on Bayesian Networks
Transcript of Tutorial on Bayesian Networks
Slide 1
Tutorial on Bayesian Networks
First given as an AAAI'97 tutorial.
Jack Breese (Microsoft), [email protected]
Daphne Koller (Stanford), [email protected]
Slide 7 © Jack Breese (Microsoft) & Daphne Koller (Stanford)
Probabilities
Probability distribution P(X | ξ)
X is a random variable: discrete or continuous
ξ is the background state of information
Slide 8
Discrete Random Variables
Finite set of possible outcomes: X ∈ {x1, x2, x3, …, xn}
P(xi) ≥ 0 for all i,  Σ_{i=1}^{n} P(xi) = 1
Binary X: P(x) + P(~x) = 1
[bar chart: an example distribution over outcomes X1–X4]
Slide 9
Continuous Random Variable
Probability distribution (density function) over continuous values, e.g. X ∈ [0, 10]
P(x) ≥ 0,  ∫_0^10 P(x) dx = 1
P(5 ≤ x ≤ 7) = ∫_5^7 P(x) dx
[plot: density P(x) over x, with the area between x = 5 and x = 7 shaded]
Slide 14
Bayesian networks
- Basics
- Structured representation
- Conditional independence
- Naïve Bayes model
- Independence facts
Slide 15
Bayesian Networks
Network: Smoking → Cancer
S ∈ {no, light, heavy},  C ∈ {none, benign, malignant}

P(S=no) = 0.80,  P(S=light) = 0.15,  P(S=heavy) = 0.05

Smoking =     no     light   heavy
P(C=none)    0.96    0.88    0.60
P(C=benign)  0.03    0.08    0.25
P(C=malig)   0.01    0.04    0.15
Slide 16
Product Rule
P(C,S) = P(C|S) P(S)

S \ C    none    benign   malignant
no       0.768   0.024    0.008
light    0.132   0.012    0.006
heavy    0.035   0.010    0.005
Slide 17
Marginalization
S \ C    none    benign   malig    total
no       0.768   0.024    0.008    .80
light    0.132   0.012    0.006    .15
heavy    0.035   0.010    0.005    .05
total    0.935   0.046    0.019

Row totals give P(Smoking); column totals give P(Cancer).
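A minimal sketch of the product rule and marginalization for this example, with the tables copied from the slides. Note that multiplying the tables exactly gives a "heavy" row of (0.030, 0.0125, 0.0075); the slide's (0.035, 0.010, 0.005) appears to be rounded or adjusted.

```python
# Product rule P(C,S) = P(C|S) P(S), then marginalize by summing the joint.
P_S = {"no": 0.80, "light": 0.15, "heavy": 0.05}
P_C_given_S = {
    "no":    {"none": 0.96, "benign": 0.03, "malig": 0.01},
    "light": {"none": 0.88, "benign": 0.08, "malig": 0.04},
    "heavy": {"none": 0.60, "benign": 0.25, "malig": 0.15},
}
cancers = ("none", "benign", "malig")

# Product rule: joint distribution P(C, S).
joint = {(s, c): P_C_given_S[s][c] * P_S[s] for s in P_S for c in cancers}

# Marginalization: sum the joint over the other variable.
P_C = {c: sum(joint[(s, c)] for s in P_S) for c in cancers}
P_S_check = {s: sum(joint[(s, c)] for c in cancers) for s in P_S}
```

Summing a row of the joint recovers P(S); summing a column recovers P(C).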
Slide 18
Bayes Rule Revisited
P(S|C) = P(C|S) P(S) / P(C) = P(C,S) / P(C)

S \ C    none        benign      malig
no       0.768/.935  0.024/.046  0.008/.019
light    0.132/.935  0.012/.046  0.006/.019
heavy    0.035/.935  0.010/.046  0.005/.019

Cancer =     none    benign   malignant
P(S=no)      0.821   0.522    0.421
P(S=light)   0.141   0.261    0.316
P(S=heavy)   0.037   0.217    0.263
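A sketch of the same computation in code: the posterior P(S | C=c) is the joint column for c, renormalized by P(C=c). Tables are copied from the slides.

```python
# Bayes rule: P(S | C=c) = P(C=c | S) P(S) / P(C=c).
P_S = {"no": 0.80, "light": 0.15, "heavy": 0.05}
P_C_given_S = {
    "no":    {"none": 0.96, "benign": 0.03, "malig": 0.01},
    "light": {"none": 0.88, "benign": 0.08, "malig": 0.04},
    "heavy": {"none": 0.60, "benign": 0.25, "malig": 0.15},
}

def posterior_smoking(c):
    """Posterior over Smoking given an observed Cancer value c."""
    unnorm = {s: P_C_given_S[s][c] * P_S[s] for s in P_S}
    z = sum(unnorm.values())            # the normalizer z is exactly P(C=c)
    return {s: p / z for s, p in unnorm.items()}
```

Because of rounding in the slide's joint table, the posteriors computed this way differ slightly from the printed ones in the "heavy" row.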
Slide 19
A Bayesian Network
[network: Age → Exposure to Toxics; Age, Gender → Smoking; Exposure to Toxics, Smoking → Cancer; Cancer → Serum Calcium; Cancer → Lung Tumor]
Slide 20
Independence
Age and Gender are independent (no edge between them).
P(A|G) = P(A)   (A ⊥ G)        P(G|A) = P(G)   (G ⊥ A)
P(A,G) = P(G|A) P(A) = P(G) P(A)
P(A,G) = P(A|G) P(G) = P(A) P(G)
Slide 21
Conditional Independence
[network fragment: Age, Gender → Smoking → Cancer]
Cancer is independent of Age and Gender given Smoking.
P(C|A,G,S) = P(C|S)   (C ⊥ A,G | S)
Slide 22
More Conditional Independence: Naïve Bayes
[network fragment: Serum Calcium ← Cancer → Lung Tumor]
Serum Calcium is independent of Lung Tumor, given Cancer:
P(L|SC,C) = P(L|C)
(Marginally, Serum Calcium and Lung Tumor are dependent.)
Slide 23
Naïve Bayes in general
[network: H → E1, E2, E3, …, En]
2n + 1 parameters: P(ei | h) and P(ei | ~h) for i = 1, …, n, plus P(h)
Slide 24
More Conditional Independence: Explaining Away
[network fragment: Exposure to Toxics → Cancer ← Smoking]
Exposure to Toxics and Smoking are (marginally) independent: E ⊥ S.
Exposure to Toxics is dependent on Smoking, given Cancer:
P(E = heavy | C = malignant) > P(E = heavy | C = malignant, S = heavy)
Slide 25
Put it all together
[full network as on slide 19]
P(A, G, E, S, C, L, SC) =
  P(A) P(G) P(E|A) P(S|A,G) P(C|E,S) P(SC|C) P(L|C)
Slide 26
General Product (Chain) Rule for Bayesian Networks
P(X1, X2, …, Xn) = Π_{i=1}^{n} P(Xi | Pai)
where Pai = parents(Xi)
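The chain rule can be sketched in a few lines: the probability of a complete assignment is the product, over variables, of each variable's CPT entry given its parents' values. The two-node Smoking/Cancer network here reuses the CPTs from the earlier slides.

```python
# Chain rule for a Bayesian network: P(x1..xn) = prod_i P(xi | parents(xi)).
def joint_prob(assignment, cpts):
    """cpts maps var -> (parent names, {parent-value tuple: {value: prob}})."""
    p = 1.0
    for var, (parents, table) in cpts.items():
        key = tuple(assignment[q] for q in parents)
        p *= table[key][assignment[var]]
    return p

cpts = {
    "S": ((), {(): {"no": 0.80, "light": 0.15, "heavy": 0.05}}),
    "C": (("S",), {("no",):    {"none": 0.96, "benign": 0.03, "malig": 0.01},
                   ("light",): {"none": 0.88, "benign": 0.08, "malig": 0.04},
                   ("heavy",): {"none": 0.60, "benign": 0.25, "malig": 0.15}}),
}
```

For example, joint_prob({"S": "no", "C": "none"}, cpts) multiplies 0.80 by 0.96.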
Slide 27
Conditional Independence
[network as before, with Cancer's parents (Exposure to Toxics, Smoking), descendants (Serum Calcium, Lung Tumor), and non-descendants (Age, Gender) marked]
Cancer is independent of Age and Gender given Exposure to Toxics and Smoking.
In general: a variable (node) is conditionally independent of its non-descendants given its parents.
Slide 28
Another non-descendant
[network as before, with an additional node Diet]
Cancer is independent of Diet given Exposure to Toxics and Smoking.
Slide 29
Independence and Graph Separation
- Given a set of observations, is one set of variables dependent on another set?
- Observing effects can induce dependencies.
- d-separation (Pearl 1988) allows us to check conditional independence graphically.
Slide 36
CPCS Network
Slide 48
Structuring
[network under construction: Age, Gender; Exposure to Toxics, Smoking; Genetic Damage; Cancer; Lung Tumor]
Extending the conversation.
Network structure corresponding to “causality” is usually good.
Slide 50
Local Structure
- Causal independence: from 2ⁿ to n+1 parameters
- Asymmetric assessment: similar savings in practice.
- Typical savings (# params): 145 to 55 for a small hardware network; 133,931,430 to 8,254 for CPCS!
Slide 51
Course Contents
- Concepts in Probability
- Bayesian Networks
» Inference
- Decision making
- Learning networks from data
- Reasoning over time
- Applications
Slide 52
Inference
- Patterns of reasoning
- Basic inference
- Exact inference
- Exploiting structure
- Approximate inference
Slide 53
Predictive Inference
How likely are elderly males to get malignant cancer?
P(C=malignant | Age>60, Gender=male)
[network diagram as before]
Slide 54
Combined
How likely is an elderly male patient with high Serum Calcium to have malignant cancer?
P(C=malignant | Age>60, Gender=male, Serum Calcium=high)
[network diagram as before]
Slide 55
Explaining away
[network diagram as before]
- If we see a lung tumor, the probability of heavy smoking and of exposure to toxics both go up.
- If we then observe heavy smoking, the probability of exposure to toxics goes back down.
Slide 56
Inference in Belief Networks
Find P(Q=q | E=e)
- Q: the query variable
- E: the set of evidence variables
P(q | e) = P(q, e) / P(e)
P(q, e) = Σ_{x1,…,xn} P(q, e, x1, …, xn)
where X1, …, Xn are the network variables other than Q and E.
Slide 57
Basic Inference
P(C,S) = P(C|S) P(S),  so P(c) = Σ_s P(c | s) P(s)

P(S=no) = 0.80,  P(S=light) = 0.15,  P(S=heavy) = 0.05

Smoking =     no     light   heavy
P(C=none)    0.96    0.88    0.60
P(C=benign)  0.03    0.08    0.25
P(C=malig)   0.01    0.04    0.15

S \ C    none    benign   malig    total
no       0.768   0.024    0.008    .80
light    0.132   0.012    0.006    .15
heavy    0.035   0.010    0.005    .05
total    0.935   0.046    0.019    (column totals give P(Cancer))
Slide 58
Basic Inference
[chain: A → B → C]
P(c) = Σ_{b,a} P(a, b, c) = Σ_{b,a} P(c | b) P(b | a) P(a)
P(b) = Σ_a P(a, b) = Σ_a P(b | a) P(a)
so  P(c) = Σ_b P(c | b) P(b)
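The two-step summation above can be sketched directly; the CPT numbers here are made up for illustration, since the slide gives none.

```python
# Chain A -> B -> C: compute P(B), then P(C), by summing out one variable
# at a time instead of enumerating the full joint.
P_A = {True: 0.6, False: 0.4}
P_B_given_A = {True: {True: 0.7, False: 0.3},
               False: {True: 0.2, False: 0.8}}
P_C_given_B = {True: {True: 0.9, False: 0.1},
               False: {True: 0.5, False: 0.5}}

# P(b) = sum_a P(b | a) P(a)
P_B = {b: sum(P_B_given_A[a][b] * P_A[a] for a in P_A) for b in (True, False)}
# P(c) = sum_b P(c | b) P(b)
P_C = {c: sum(P_C_given_B[b][c] * P_B[b] for b in P_B) for c in (True, False)}
```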
Slide 59
Inference in trees
[tree fragment: Y1 → X ← Y2]
P(x) = Σ_{y1,y2} P(x | y1, y2) P(y1, y2)
     = Σ_{y1,y2} P(x | y1, y2) P(y1) P(y2)   (by independence of Y1 and Y2)
Slide 60
Polytrees
- A network is singly connected (a polytree) if it contains no undirected loops.
- Theorem: Inference in a singly connected network can be done in linear time*.
- Main idea: in variable elimination, we need only maintain distributions over single nodes.
* in network size, including table sizes.
Slide 61
The problem with loops
[network: Rain ← Cloudy → Sprinkler; Rain, Sprinkler → Grass-wet (deterministic OR)]
P(c) = 0.5
P(r | c) = 0.99,  P(r | ~c) = 0.01
P(s | c) = 0.01,  P(s | ~c) = 0.99
The grass is dry only if no rain and no sprinklers:
P(~g) = P(~r, ~s) ≈ 0
Slide 62
The problem with loops contd.
P(~g) = P(~g | r, s) P(r, s) + P(~g | r, ~s) P(r, ~s)
      + P(~g | ~r, s) P(~r, s) + P(~g | ~r, ~s) P(~r, ~s)
The first three conditional terms are 0 and the last is 1, so
P(~g) = P(~r, ~s) ≈ 0.
But if we (incorrectly) treat Rain and Sprinkler as independent:
P(~g) = P(~r) P(~s) ≈ 0.5 · 0.5 = 0.25.  Problem!
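A numeric check of this point, using the slide's CPTs: the exact probability of dry grass is about 0.01, while wrongly multiplying the marginals of Rain and Sprinkler gives 0.25.

```python
# Exact vs. naive computation of P(dry grass) in the Cloudy/Rain/Sprinkler net.
P_c = 0.5
P_r_given_c = {True: 0.99, False: 0.01}
P_s_given_c = {True: 0.01, False: 0.99}

def p_cloudy(c):
    return P_c if c else 1 - P_c

# Exact: the grass is dry iff no rain and no sprinkler; sum over Cloudy.
exact = sum(p_cloudy(c) * (1 - P_r_given_c[c]) * (1 - P_s_given_c[c])
            for c in (True, False))

# Wrong: multiply the marginals P(~r) and P(~s), ignoring their correlation.
p_r = sum(p_cloudy(c) * P_r_given_c[c] for c in (True, False))
p_s = sum(p_cloudy(c) * P_s_given_c[c] for c in (True, False))
wrong = (1 - p_r) * (1 - p_s)
```

Rain and Sprinkler are strongly anti-correlated through Cloudy, which is exactly what the naive product throws away.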
Slide 63
Variable elimination
[chain: A → B → C]
P(c) = Σ_b P(c | b) Σ_a P(b | a) P(a)
Step 1: multiply P(A) × P(B | A) → P(B, A); sum out A → P(B)
Step 2: multiply P(B) × P(C | B) → P(C, B); sum out B → P(C)
Slide 64
Inference as variable elimination
- A factor over X is a function from val(X) to numbers in [0,1]:
  - a CPT is a factor
  - a joint distribution is also a factor
- BN inference:
  - factors are multiplied to give new ones
  - variables in factors are summed out
- A variable can be summed out as soon as all factors mentioning it have been multiplied.
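The two factor operations can be sketched as follows. This is a minimal illustration restricted to Boolean variables; a factor is a (scope, table) pair with the table keyed by value tuples in scope order.

```python
from itertools import product

def multiply(f1, f2):
    """Pointwise product of two factors over the union of their scopes."""
    (scope1, t1), (scope2, t2) = f1, f2
    scope = list(scope1) + [v for v in scope2 if v not in scope1]
    table = {}
    for vals in product((True, False), repeat=len(scope)):
        asg = dict(zip(scope, vals))
        table[vals] = (t1[tuple(asg[v] for v in scope1)]
                       * t2[tuple(asg[v] for v in scope2)])
    return scope, table

def sum_out(f, var):
    """Eliminate var by summing the factor's entries over its values."""
    scope, t = f
    i = scope.index(var)
    table = {}
    for vals, p in t.items():
        key = vals[:i] + vals[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return scope[:i] + scope[i + 1:], table

# Example: P(B) from P(A) and P(B | A), with made-up numbers.
fA = (["A"], {(True,): 0.6, (False,): 0.4})
fBA = (["B", "A"], {(True, True): 0.7, (True, False): 0.2,
                    (False, True): 0.3, (False, False): 0.8})
fB = sum_out(multiply(fA, fBA), "A")
```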
Slide 65
Variable Elimination with loops
[network diagram as before]
- multiply P(A), P(G), P(S | A,G) → P(A,G,S); sum out G → P(A,S)
- multiply by P(E | A) → P(A,E,S); sum out A → P(E,S)
- multiply by P(C | E,S) → P(E,S,C); sum out E, S → P(C)
- multiply by P(L | C) → P(C,L); sum out C → P(L)
Complexity is exponential in the size of the factors.
Slide 66
Join trees*
[network diagram as before]
The factors P(A), P(G), and P(S | A,G) are all multiplied into the clique {A, G, S}; the message P(A,S) is passed on.
Cliques: {A, G, S}, {A, E, S}, {E, S, C}, {C, SC}, {C, L}
A join tree is a partially precompiled factorization.
* aka junction trees, Lauritzen-Spiegelhalter, Hugin alg., …
Slide 70
Computational complexity
Theorem: Inference in a multiply connected Bayesian network is NP-hard.
Reduction from 3CNF, e.g. φ = (u ∨ v ∨ w) ∧ (u ∨ w ∨ y) ∧ …
[network: root variables U, V, W, Y, each with prior probability 1/2; an OR node per clause; an AND node for φ]
P(φ = true) = 1/2ⁿ · (# satisfying assignments of φ)
Slide 71
Stochastic simulation
[network: Burglary → Alarm ← Earthquake; Alarm → Call; Earthquake → Newscast]
P(b) = 0.03,  P(e) = 0.001
P(a | b, e) = 0.98,  P(a | b, ~e) = 0.4,  P(a | ~b, e) = 0.7,  P(a | ~b, ~e) = 0.01
P(c | a) = 0.8,  P(c | ~a) = 0.05
P(n | e) = 0.3,  P(n | ~e) = 0.001
Evidence: Call = c.
Draw each sample top-down (B, E, A, C, N) from the CPTs; a sample is “live” if it agrees with the evidence.
Samples:  B E A C N
          …
P(b | c) ≈ (# of live samples with B=b) / (total # of live samples)
Slide 72
Likelihood weighting
[same network; evidence Call = c]
Sample B, E, A, N top-down as before, but clamp the evidence C = c and weight each sample by the likelihood of the evidence given the sample: weight = P(c | a), i.e. 0.8 if the sample has a and 0.05 if it has ~a.
P(b | c) = (weight of samples with B=b) / (total weight of samples)
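A hedged sketch of likelihood weighting on this network, using the CPT numbers as they appear in the transcript (the pairing of 0.4 and 0.7 in P(a | B, E) follows the transcript's listing order). The Newscast node is omitted since it does not affect P(b | c).

```python
import random

# Likelihood weighting for P(Burglary | Call = true).
P_b, P_e = 0.03, 0.001
P_a = {(True, True): 0.98, (True, False): 0.4,
       (False, True): 0.7, (False, False): 0.01}
P_c_given_a = {True: 0.8, False: 0.05}

def lw_estimate(n_samples, rng):
    num = den = 0.0
    for _ in range(n_samples):
        b = rng.random() < P_b
        e = rng.random() < P_e
        a = rng.random() < P_a[(b, e)]
        w = P_c_given_a[a]      # evidence C=c is clamped; weight by its likelihood
        den += w
        if b:
            num += w
    return num / den            # estimate of P(b | c)

estimate = lw_estimate(200_000, random.Random(0))
```

Unlike rejection sampling, no sample is discarded; each simply counts with its evidence likelihood as weight.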
Slide 73
Markov Chain Monte Carlo
Slide 74
MCMC with Gibbs Sampling
- Fix the values of observed variables.
- Set the values of all non-observed variables randomly.
- Perform a random walk through the space of complete variable assignments. On each move:
  1. Pick a variable X
  2. Calculate Pr(X=true | all other variables)
  3. Set X to true with that probability
- Repeat many times. The frequency with which any variable X is true is its posterior probability.
- Converges to the true posterior when frequencies stop changing significantly (stable distribution, mixing).
Slide 75
Markov Blanket Sampling
How to calculate Pr(X=true | all other variables)?
Recall: a variable is independent of all others given its Markov blanket:
- parents
- children
- other parents of children
So the problem becomes calculating Pr(X=true | MB(X)).
- We solve this sub-problem exactly.
- Fortunately, it is easy to solve:
P(X | MB(X)) ∝ P(X | Parents(X)) · Π_{Y ∈ Children(X)} P(Y | Parents(Y))
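A sketch of this computation on the tiny network A → X → B ← C that the next slide analyzes: P(X | A, B, C) ∝ P(X | A) P(B | X, C). The CPT numbers here are made up for illustration.

```python
# Markov-blanket conditional for X in the network A -> X -> B <- C.
P_x_given_a = {True: 0.7, False: 0.2}     # P(x | a), P(x | ~a): assumed values
P_b_given_xc = {(True, True): 0.9, (True, False): 0.6,
                (False, True): 0.5, (False, False): 0.1}

def p_x_given_mb(a, b, c):
    """P(X | A=a, B=b, C=c) via the product-of-blanket-factors formula."""
    scores = {}
    for x in (True, False):
        px = P_x_given_a[a] if x else 1 - P_x_given_a[a]
        pb = P_b_given_xc[(x, c)] if b else 1 - P_b_given_xc[(x, c)]
        scores[x] = px * pb
    z = sum(scores.values())
    return {x: s / z for x, s in scores.items()}
```

This is exactly the per-variable conditional a Gibbs sampler draws from on each move.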
Slide 76
Example
[network: A → X → B ← C]
P(X | A, B, C) = P(X, A, B, C) / P(A, B, C)
              = P(A) P(X | A) P(C) P(B | X, C) / P(A, B, C)
              ∝ P(X | A) P(B | X, C)
Normalizing over the values of X:
P(X | A, B, C) = P(X | A) P(B | X, C) / Σ_x P(x | A) P(B | x, C)
Slide 77
Example
[network: Heart disease ← Smoking → Lung disease; Heart disease, Lung disease → Shortness of breath]
P(s) = 0.2
P(h | s) = 0.6,  P(h | ~s) = 0.1
P(g | s) = 0.8,  P(g | ~s) = 0.1   (G = lung disease, B = shortness of breath)
P(b | h, g) = 0.9,  P(b | h, ~g) = 0.8,  P(b | ~h, g) = 0.7,  P(b | ~h, ~g) = 0.1
Slide 78
Example (contd.)
Evidence: s, b
[network and CPTs as on slide 77]
Slide 79
Example (contd.)
Evidence: s, b
Randomly set: h, g
[network and CPTs as on slide 77]
Slide 80
Example (contd.)
Evidence: s, b
Randomly set: h, g
Sample H using P(H | s, g, b)
[network and CPTs as on slide 77]
Slide 81
Example (contd.)
Evidence: s, b
Randomly set: ~h, g
Sample H using P(H | s, g, b)
Suppose result is ~h
[network and CPTs as on slide 77]
Slide 82
Example (contd.)
Evidence: s, b
Randomly set: ~h, g
Sample H using P(H | s, g, b); suppose result is ~h
Sample G using P(G | s, ~h, b)
[network and CPTs as on slide 77]
Slide 83
Example (contd.)
Evidence: s, b
Randomly set: ~h, g
Sample H using P(H | s, g, b); suppose result is ~h
Sample G using P(G | s, ~h, b); suppose result is g
[network and CPTs as on slide 77]
Slide 84
Example (contd.)
Evidence: s, b
Randomly set: ~h, g
Sample H using P(H | s, g, b); suppose result is ~h
Sample G using P(G | s, ~h, b); suppose result is g
Sample G using P(G | s, ~h, b)
[network and CPTs as on slide 77]
Slide 85
Example (contd.)
Evidence: s, b
Randomly set: ~h, g
Sample H using P(H | s, g, b); suppose result is ~h
Sample G using P(G | s, ~h, b); suppose result is g
Sample G using P(G | s, ~h, b); suppose result is ~g
[network and CPTs as on slide 77]
Slide 86
Gibbs MCMC Summary
P(X=x | E) ≈ (number of samples with X=x) / (total number of samples)
Advantages:
- No samples are discarded
- No problem with samples of low weight
- Can be implemented very efficiently (10K samples per second)
Disadvantages:
- Can get stuck if the relationship between two variables is deterministic
- Many variations have been devised to make MCMC more robust
Slide 87
Other approaches
- Search-based techniques:
  - search for high-probability instantiations
  - use instantiations to approximate probabilities
- Structural approximation:
  - simplify the network (eliminate edges or nodes, abstract node values, simplify CPTs)
  - do inference in the simplified network
Slide 107
Course Contents
- Concepts in Probability
- Bayesian Networks
- Inference
- Decision making
» Learning networks from data
- Reasoning over time
- Applications
Slide 108
Learning networks from data
- The learning task
- Parameter learning: fully observable; partially observable
- Structure learning
- Hidden variables
Slide 109
The learning task
Input: training data (cases over B, E, A, C, N)
Output: a BN modeling the data
[Burglary/Earthquake network as before]
- Input: fully or partially observable data cases?
- Output: parameters, or also structure?
Slide 110
Parameter learning: one variable
Unfamiliar coin: let θ = bias of coin (long-run fraction of heads).
If θ is known (given), then P(X = heads | θ) = θ.
Different coin tosses are independent given θ, so with h heads and t tails:
P(X1, …, Xn | θ) = θ^h (1−θ)^t
Slide 111
Maximum likelihood
Input: a set of previous coin tosses X1, …, Xn = {H, T, H, H, H, T, T, H, …, H} with h heads and t tails.
Goal: estimate θ.
The likelihood is P(X1, …, Xn | θ) = θ^h (1−θ)^t.
The maximum likelihood solution is θ* = h / (h + t).
Slide 115
General parameter learning
A multi-variable BN is composed of several independent parameters (“coins”).
Example: A → B has three parameters: θ_A, θ_{B|a}, θ_{B|~a}.
We can use the same techniques as in the one-variable case to learn each one separately. The max-likelihood estimate of θ_{B|a} is:
θ_{B|a} = (# data cases with b, a) / (# data cases with a)
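The counting estimate above is a one-liner over fully observed data; the data list here is made up for illustration.

```python
# Max-likelihood estimate theta_{B|a} = #(a,b) / #(a) from observed cases.
data = [{"A": True,  "B": True},
        {"A": True,  "B": False},
        {"A": True,  "B": True},
        {"A": False, "B": True}]

n_a  = sum(1 for d in data if d["A"])
n_ab = sum(1 for d in data if d["A"] and d["B"])
theta_b_given_a = n_ab / n_a
```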
Slide 116
Partially observable data
[Burglary/Earthquake network; data cases with missing values, marked “?”]
Fill in missing data with “expected” values:
- expected = a distribution over the possible values
- use the current “best guess” BN to estimate that distribution
Slide 117
Intuition
In the fully observable case:
θ*_{n|e} = (# data cases with n, e) / (# data cases with e) = Σ_j I(n,e | d_j) / Σ_j I(e | d_j)
where I(e | d_j) = 1 if E=e in data case d_j, and 0 otherwise.
In the partially observable case I is unknown; the best estimate for it is
Î(n,e | d_j) = P(n,e | d_j, θ*)
Problem: θ* is unknown.
Slide 118
Expectation Maximization (EM)
Repeat:
- Expectation (E) step: use the current parameters θ to estimate the filled-in data:
  Î(n,e | d_j) = P(n,e | d_j, θ)
- Maximization (M) step: use the filled-in data for max-likelihood estimation:
  θ̃_{n|e} = Σ_j Î(n,e | d_j) / Σ_j Î(e | d_j)
- Set θ := θ̃
until convergence.
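A hedged EM sketch for the simplest partially observable case: estimating θ = P(e) when only the child N is observed. The conditional P(n | e) = 0.3, P(n | ~e) = 0.001 reuses the Newscast CPT from the sampling slides; the observation counts are made up.

```python
# EM for one hidden parent E with observed child N.
P_n_given_e = {True: 0.3, False: 0.001}

def em(observations, theta, iters):
    for _ in range(iters):
        # E step: expected indicator I_hat(e | d_j) = P(e | n_j, theta).
        expected = []
        for n in observations:
            num = (P_n_given_e[True] if n else 1 - P_n_given_e[True]) * theta
            alt = (P_n_given_e[False] if n else 1 - P_n_given_e[False]) * (1 - theta)
            expected.append(num / (num + alt))
        # M step: max-likelihood re-estimate from the filled-in counts.
        theta = sum(expected) / len(expected)
    return theta

obs = [True] * 3 + [False] * 97       # 3 newscasts in 100 days
theta_hat = em(obs, theta=0.5, iters=300)
```

The fixed point matches the closed-form answer 0.3·θ + 0.001·(1−θ) = 0.03, i.e. θ ≈ 0.097.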
Slide 119
Structure learning
Goal: find a “good” BN structure (relative to the data).
Solution: do heuristic search over the space of network structures.
Slide 120
Search space
Space = network structures
Operators = add / reverse / delete edges
Slide 121
Heuristic search
Use a scoring function to do heuristic search (any algorithm).
Greedy hill-climbing with randomness works pretty well.
[figure: a walk through structure space, each candidate evaluated by its score]
Slide 122
Scoring
- Fill in parameters using the previous techniques and score the completed networks.
- One possibility for the score is the likelihood function: Score(B) = P(data | B).
Example: X, Y independent coin tosses; typical data = (27 h-h, 22 h-t, 25 t-h, 26 t-t).
The maximum likelihood network structure still connects X and Y; the max-likelihood network is typically fully connected.
This is not surprising: maximum likelihood always overfits…
Slide 123
Better scoring functions
- MDL formulation: balance fit to data against model complexity (# of parameters)
  Score(B) = P(data | B) − model complexity
- Full Bayesian formulation:
  - prior on network structures & parameters
  - more parameters means a higher-dimensional space
  - the balancing effect comes out as a byproduct*
* With a Dirichlet parameter prior, MDL is an approximation to the full Bayesian score.
146© Jack Breese (Microsoft) & Daphne Koller (Stanford)
Course Contents
Concepts in Probability Bayesian Networks Inference Decision making Learning networks from data Reasoning over time» Applications
Slide 147
Applications
- Medical expert systems: Pathfinder; Parenting MSN
- Fault diagnosis: Ricoh FIXIT; decision-theoretic troubleshooting; Vista
- Collaborative filtering
Slide 148
Why use Bayesian Networks?
- Explicit management of uncertainty/tradeoffs
- Modularity implies maintainability
- Better, flexible, and robust recommendation strategies
Slide 149
Pathfinder
- Pathfinder is one of the first BN systems.
- It performs diagnosis of lymph-node diseases.
- It deals with over 60 diseases and 100 findings.
- Commercialized by Intellipath and Chapman Hall publishing, and applied to about 20 tissue types.
Slide 152
On Parenting: selecting a problem
- Diagnostic indexing for the Home Health site on the Microsoft Network
- Enter symptoms for pediatric complaints
- Recommends multimedia content
Slide 153
On Parenting: MSN
Original multiple-fault model
Slide 157
RICOH FIXIT
Diagnostics and information retrieval
What is Collaborative Filtering?
- A way to find cool websites, news stories, music artists, etc.
- Uses data on the preferences of many users, not descriptions of the content.
- Firefly, Net Perceptions (GroupLens), and others offer this technology.
Bayesian Clustering for Collaborative Filtering
- Probabilistic summary of the data
- Reduces the number of parameters to represent a set of preferences
- Provides insight into usage patterns
- Inference: P(Like title i | Like title j, Like title k)
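The parameter-reduction point can be made concrete with a quick count (a sketch: the 20 titles and 5 user classes are made-up numbers, not from the tutorial). A full joint distribution over n binary "likes title i" variables needs 2^n - 1 probabilities, while a naive Bayes model with a K-valued latent class needs only K - 1 class priors plus K*n conditionals:

```python
# Parameters needed to represent a joint distribution over n binary
# "likes title i" variables, with and without a latent class variable.

def full_joint_params(n):
    # One probability per joint outcome, minus one for normalization.
    return 2 ** n - 1

def naive_bayes_params(n, k):
    # k - 1 class priors, plus one P(like | class) per title per class.
    return (k - 1) + k * n

n, k = 20, 5  # hypothetical: 20 titles, 5 user classes
print(full_joint_params(n))      # 1048575
print(naive_bayes_params(n, k))  # 104
```

Even for this modest catalog, the latent-class model needs four orders of magnitude fewer parameters.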
Applying Bayesian clustering
|        | class1         | class2         | ... |
|--------|----------------|----------------|-----|
| title1 | p(like) = 0.2  | p(like) = 0.8  |     |
| title2 | p(like) = 0.7  | p(like) = 0.1  |     |
| title3 | p(like) = 0.99 | p(like) = 0.01 |     |

[Diagram: a naive Bayes model with a single "user class" node pointing to title 1, title 2, ..., title n]
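The inference step can be sketched as follows. The P(like | class) values come from the table above; the two-class uniform prior and the particular evidence are assumptions made here for illustration:

```python
# Naive Bayes clustering inference: predict whether a user will like
# title3 given that they liked title1 but did not like title2.
# The P(like | class) table is from the slide; the uniform class prior
# is an assumption made for this sketch.

prior = {"class1": 0.5, "class2": 0.5}
p_like = {
    "title1": {"class1": 0.2,  "class2": 0.8},
    "title2": {"class1": 0.7,  "class2": 0.1},
    "title3": {"class1": 0.99, "class2": 0.01},
}

# Evidence: liked title1, did not like title2.
evidence = {"title1": True, "title2": False}

# P(class | evidence) is proportional to P(class) * prod_i P(like_i | class)
posterior = {}
for c, p in prior.items():
    for title, liked in evidence.items():
        p *= p_like[title][c] if liked else 1 - p_like[title][c]
    posterior[c] = p
z = sum(posterior.values())
posterior = {c: p / z for c, p in posterior.items()}

# P(like title3 | evidence) = sum_c P(like title3 | c) * P(c | evidence)
p_like3 = sum(p_like["title3"][c] * posterior[c] for c in posterior)
print(posterior, p_like3)
```

Because this user disliked title2, the posterior puts most of its weight on class2 (about 0.92), which rarely likes title3, so the predicted P(like title3 | evidence) comes out low (about 0.085).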
MSNBC Story clusters

Readers of commerce and technology stories (36%):
- E-mail delivery isn't exactly guaranteed
- Should you buy a DVD player?
- Price low, demand high for Nintendo

Sports Readers (19%):
- Umps refusing to work is the right thing
- Cowboys are reborn in win over Eagles
- Did Orioles spend money wisely?

Readers of top promoted stories (29%):
- 757 Crashes At Sea
- Israel, Palestinians Agree To Direct Talks
- Fuhrman Pleads Innocent To Perjury

Readers of "Softer" News (12%):
- The truth about what things cost
- Fuhrman Pleads Innocent To Perjury
- Real Astrology
Top 5 shows by user class

Class 1:
- Power rangers
- Animaniacs
- X-men
- Tazmania
- Spider man

Class 2:
- Young and restless
- Bold and the beautiful
- As the world turns
- Price is right
- CBS eve news

Class 3:
- Tonight show
- Conan O'Brien
- NBC nightly news
- Later with Kinnear
- Seinfeld

Class 4:
- 60 minutes
- NBC nightly news
- CBS eve news
- Murder she wrote
- Matlock

Class 5:
- Seinfeld
- Friends
- Mad about you
- ER
- Frasier
Richer model
[Diagram: the user-class model extended with demographic nodes Age, Gender, and Likes soaps alongside the User class node, with observed nodes Watches Seinfeld, Watches NYPD Blue, and Watches Power Rangers]
What’s old?
Decision theory & probability theory provide:
- principled models of belief and preference;
- techniques for:
  - integrating evidence (conditioning);
  - optimal decision making (max. expected utility);
  - targeted information gathering (value of info.);
  - parameter estimation from data.
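Two of the techniques above, maximizing expected utility and valuing information, can be illustrated with a toy decision problem. The states, probabilities, and utilities below are invented for this sketch:

```python
# Toy decision problem: choose the action with the highest
# probability-weighted utility, then compute the expected value of
# perfect information (how much observing the state would be worth).
# All numbers here are made up for illustration.

p_state = {"faulty": 0.3, "ok": 0.7}
utility = {
    "repair": {"faulty": 100,  "ok": -20},
    "ignore": {"faulty": -200, "ok": 0},
}

def expected_utility(action):
    return sum(p_state[s] * utility[action][s] for s in p_state)

# Maximum expected utility: act without gathering more information.
best = max(utility, key=expected_utility)
meu = expected_utility(best)

# Value of perfect information: expected utility if we could observe
# the true state before acting, minus the MEU.
eu_informed = sum(p_state[s] * max(utility[a][s] for a in utility)
                  for s in p_state)
evpi = eu_informed - meu

print(best, meu, evpi)
```

Here acting blind yields expected utility 16 (repair), while acting after observing the true state would yield 30, so perfect information about the state is worth 14 utility units.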
What’s new?
Bayesian networks exploit domain structure to allow compact representations of complex models.

[Diagram: Structured Representation at the center, linked to Knowledge Acquisition, Inference, and Learning]
What’s in our future?
- Better models for:
  - preferences & utilities;
  - not-so-precise numerical probabilities.
- Inferring causality from data.
- More expressive representation languages:
  - structured domains with multiple objects;
  - levels of abstraction;
  - reasoning about time;
  - hybrid (continuous/discrete) models.