Bayesian networks
Chapter 14.1–3
Outline
♦ Syntax
♦ Semantics
♦ Parameterized distributions
Bayesian networks
A simple, graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions
Syntax:
– a set of nodes, one per variable
– a directed, acyclic graph (link ≈ “directly influences”)
– a conditional distribution for each node given its parents: $P(X_i \mid Parents(X_i))$
In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over $X_i$ for each combination of parent values
Example
Topology of network encodes conditional independence assertions:
[Figure: Weather is an isolated node; Cavity is the parent of Toothache and Catch]
Weather is independent of the other variables
Toothache and Catch are conditionally independent given Cavity
Example
I’m at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn’t call. Sometimes it’s set off by minor earthquakes. Is there a burglar?
Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls
Network topology reflects “causal” knowledge:
– A burglar can set the alarm off
– An earthquake can set the alarm off
– The alarm can cause Mary to call
– The alarm can cause John to call
Example contd.
[Figure: Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls]
P(B) = .001
P(E) = .002
B  E  P(A|B,E)
T  T  .95
T  F  .94
F  T  .29
F  F  .001
A  P(J|A)
T  .90
F  .05
A  P(M|A)
T  .70
F  .01
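One way to hold this network in code is a mapping from each variable to its parent list and its CPT. The sketch below is only an illustration: the `NETWORK` layout and the helpers `prob` and `cpt_is_complete` are names assumed here, not something the slides define.

```python
from itertools import product

# Burglary network from the slide. Each CPT maps a tuple of parent values
# (in the order given by "parents") to P(variable = True | parents).
# The dictionary layout is an assumption made for illustration.
NETWORK = {
    "Burglary": {"parents": [], "cpt": {(): 0.001}},
    "Earthquake": {"parents": [], "cpt": {(): 0.002}},
    "Alarm": {
        "parents": ["Burglary", "Earthquake"],
        "cpt": {(True, True): 0.95, (True, False): 0.94,
                (False, True): 0.29, (False, False): 0.001},
    },
    "JohnCalls": {"parents": ["Alarm"], "cpt": {(True,): 0.90, (False,): 0.05}},
    "MaryCalls": {"parents": ["Alarm"], "cpt": {(True,): 0.70, (False,): 0.01}},
}


def prob(var, value, assignment):
    """Return P(var = value | parents(var)), read off the CPT."""
    node = NETWORK[var]
    key = tuple(assignment[p] for p in node["parents"])
    p_true = node["cpt"][key]
    return p_true if value else 1.0 - p_true


def cpt_is_complete(var):
    """Check that the CPT has exactly one row per combination of parent values."""
    node = NETWORK[var]
    rows = set(product([True, False], repeat=len(node["parents"])))
    return set(node["cpt"]) == rows


print(prob("Alarm", True, {"Burglary": True, "Earthquake": False}))
print(all(cpt_is_complete(v) for v in NETWORK))
```

Only the probability of `True` is stored per row; the probability of `False` is recovered as one minus that value.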
Compactness
A CPT for Boolean $X_i$ with $k$ Boolean parents has $2^k$ rows for the combinations of parent values
Each row requires one number $p$ for $X_i = true$ (the number for $X_i = false$ is just $1 - p$)
If each variable has no more than $k$ parents, the complete network requires $O(n \cdot 2^k)$ numbers
I.e., grows linearly with $n$, vs. $O(2^n)$ for the full joint distribution
For burglary net, $1 + 1 + 4 + 2 + 2 = 10$ numbers (vs. $2^5 - 1 = 31$)
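The count can be reproduced mechanically from the parent lists. This is a minimal sketch assuming Boolean variables throughout; the function names and the larger `n = 30`, `k = 5` case are hypothetical illustrations, not taken from the slides.

```python
# Parent lists for the burglary network (all variables assumed Boolean).
PARENTS = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}


def network_parameters(parents):
    """One number per CPT row: 2^k rows for a node with k Boolean parents."""
    return sum(2 ** len(ps) for ps in parents.values())


def full_joint_parameters(n):
    """A full joint table over n Boolean variables has 2^n entries summing to 1."""
    return 2 ** n - 1


print(network_parameters(PARENTS), full_joint_parameters(len(PARENTS)))

# Hypothetical larger case: n = 30 variables with at most k = 5 parents each.
n, k = 30, 5
print(n * 2 ** k, full_joint_parameters(n))
```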
Global semantics
“Global” semantics defines the full joint distribution as the product of the local conditional distributions:
$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(X_i))$
e.g., $P(j \land m \land a \land \lnot b \land \lnot e)$
$= P(j \mid a) P(m \mid a) P(a \mid \lnot b, \lnot e) P(\lnot b) P(\lnot e)$
$= 0.9 \times 0.7 \times 0.001 \times 0.999 \times 0.998$
$\approx 0.00063$
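The product formula can be checked numerically. The sketch below uses the burglary CPTs with single-letter variable names; the `PARENTS`/`CPT` layout and the `joint` helper are assumptions made for illustration.

```python
from itertools import product

# Burglary network: P(variable = True | parents), keyed by parent values.
ORDER = ["B", "E", "A", "J", "M"]
PARENTS = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}
CPT = {
    "B": {(): 0.001},
    "E": {(): 0.002},
    "A": {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
    "J": {(True,): 0.90, (False,): 0.05},
    "M": {(True,): 0.70, (False,): 0.01},
}


def joint(world):
    """P(x1, ..., xn) as the product of P(xi | parents(Xi))."""
    result = 1.0
    for var in ORDER:
        p_true = CPT[var][tuple(world[p] for p in PARENTS[var])]
        result *= p_true if world[var] else 1.0 - p_true
    return result


# The slide's example: j, m, a, not b, not e.
example = joint({"J": True, "M": True, "A": True, "B": False, "E": False})
print(round(example, 5))

# Sanity check: the 32 joint entries should sum to 1.
total = sum(joint(dict(zip(ORDER, values)))
            for values in product([True, False], repeat=len(ORDER)))
print(round(total, 12))
```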
Local semantics
Local semantics: each node is conditionally independent of its nondescendants given its parents
[Figure: node X with parents U1, ..., Um, children Y1, ..., Yn, and the children's other parents Z1j, ..., Znj]
Theorem: Local semantics ⇔ global semantics
Markov blanket
Each node is conditionally independent of all others given its Markov blanket: parents + children + children’s parents
[Figure: node X with parents U1, ..., Um, children Y1, ..., Yn, and the children's other parents Z1j, ..., Znj]
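The Markov blanket can be read directly off the parent lists. A minimal sketch on the burglary network follows; the `markov_blanket` helper and the dictionary layout are assumed names for illustration.

```python
# Parent lists for the burglary network.
PARENTS = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}


def markov_blanket(node, parents):
    """Parents + children + children's other parents of `node`."""
    children = [c for c, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for child in children:
        blanket |= set(parents[child])
    blanket.discard(node)  # the node itself is not part of its own blanket
    return blanket


for name in PARENTS:
    print(name, sorted(markov_blanket(name, PARENTS)))
```

Burglary's blanket includes Earthquake because the two share the child Alarm, even though there is no link between them.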
Constructing Bayesian networks
Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics
1. Choose an ordering of variables $X_1, \ldots, X_n$
2. For $i = 1$ to $n$
   add $X_i$ to the network
   select parents from $X_1, \ldots, X_{i-1}$ such that $P(X_i \mid Parents(X_i)) = P(X_i \mid X_1, \ldots, X_{i-1})$
This choice of parents guarantees the global semantics:
$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1})$ (chain rule)
$= \prod_{i=1}^{n} P(X_i \mid Parents(X_i))$ (by construction)
Example
Suppose we choose the ordering $M, J, A, B, E$
[Figure: network built by adding MaryCalls, JohnCalls, Alarm, Burglary, Earthquake in that order]
$P(J \mid M) = P(J)$? No
$P(A \mid J, M) = P(A \mid J)$? $P(A \mid J, M) = P(A)$? No
$P(B \mid A, J, M) = P(B \mid A)$? Yes
$P(B \mid A, J, M) = P(B)$? No
$P(E \mid B, A, J, M) = P(E \mid A)$? No
$P(E \mid B, A, J, M) = P(E \mid A, B)$? Yes
Example contd.
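[Figure: resulting network with links MaryCalls → JohnCalls, MaryCalls → Alarm, JohnCalls → Alarm, Alarm → Burglary, Alarm → Earthquake, Burglary → Earthquake]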
Deciding conditional independence is hard in noncausal directions
(Causal models and conditional independence seem hardwired for humans!)
Assessing conditional probabilities is hard in noncausal directions
Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed
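The construction procedure can be run mechanically when the full joint is available to answer the conditional independence questions. The sketch below builds that joint from the causal burglary CPTs and, for each variable in a chosen ordering, picks the smallest set of predecessors that reproduces the conditional on all predecessors. Using the joint table as an oracle, the brute-force subset search, the tolerance `tol`, and all function names are assumptions for illustration; in practice these judgments come from a domain expert.

```python
from itertools import combinations, product

# Causal burglary network: P(variable = True | parents).
PARENTS = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}
CPT = {
    "B": {(): 0.001},
    "E": {(): 0.002},
    "A": {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
    "J": {(True,): 0.90, (False,): 0.05},
    "M": {(True,): 0.70, (False,): 0.01},
}
NAMES = list(PARENTS)


def joint(world):
    """Product of the local conditional probabilities for one full assignment."""
    result = 1.0
    for var in NAMES:
        p_true = CPT[var][tuple(world[p] for p in PARENTS[var])]
        result *= p_true if world[var] else 1.0 - p_true
    return result


# Full joint table, used as an oracle for the independence questions.
WORLDS = [dict(zip(NAMES, v)) for v in product([True, False], repeat=len(NAMES))]
TABLE = [(w, joint(w)) for w in WORLDS]


def prob(assignment):
    """Marginal probability of a partial assignment {name: bool}."""
    return sum(p for w, p in TABLE
               if all(w[k] == v for k, v in assignment.items()))


def cond_true(x, given):
    """P(x = True | given)."""
    return prob({**given, x: True}) / prob(given)


def suffices(x, candidate, predecessors, tol=1e-9):
    """Is P(x | candidate) = P(x | predecessors) for every predecessor assignment?"""
    for values in product([True, False], repeat=len(predecessors)):
        full = dict(zip(predecessors, values))
        reduced = {k: full[k] for k in candidate}
        if abs(cond_true(x, full) - cond_true(x, reduced)) > tol:
            return False
    return True


def build(order):
    """For each variable, choose a smallest sufficient parent set among its predecessors."""
    parents = {}
    for i, x in enumerate(order):
        predecessors = order[:i]
        for size in range(len(predecessors) + 1):
            ok = [c for c in combinations(predecessors, size)
                  if suffices(x, c, predecessors)]
            if ok:
                parents[x] = ok[0]
                break
    return parents


def count_parameters(parents):
    return sum(2 ** len(ps) for ps in parents.values())


diagnostic = build(["M", "J", "A", "B", "E"])
causal = build(["B", "E", "A", "J", "M"])
print(diagnostic, count_parameters(diagnostic))
print(causal, count_parameters(causal))
```

The ordering M, J, A, B, E yields the denser 13-number network, while the causal ordering B, E, A, J, M recovers the original 10-number structure.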
Example: Car diagnosis
Initial evidence: car won’t start
Testable variables (green), “broken, so fix it” variables (orange)
Hidden variables (gray) ensure sparse structure, reduce parameters
[Figure: car diagnosis network with nodes battery age, alternator broken, fanbelt broken, battery dead, no charging, battery flat, no oil, no gas, fuel line blocked, starter broken, lights, oil light, gas gauge, battery meter, dipstick, car won’t start]
Example: Car insurance
[Figure: car insurance network with nodes SocioEcon, Age, GoodStudent, ExtraCar, Mileage, VehicleYear, RiskAversion, SeniorTrain, DrivingSkill, MakeModel, DrivingHist, DrivQuality, Antilock, Airbag, CarValue, HomeBase, AntiTheft, Theft, OwnDamage, PropertyCost, LiabilityCost, MedicalCost, Cushioning, Ruggedness, Accident, OtherCost, OwnCost]
Compact conditional distributions
CPT grows exponentially with number of parents
CPT becomes infinite with continuous-valued parent or child
Solution: canonical distributions that are defined compactly
Deterministic nodes are the simplest case: $X = f(Parents(X))$ for some function $f$
E.g., Boolean functions: $NorthAmerican \Leftrightarrow Canadian \lor US \lor Mexican$
E.g., numerical relationships among continuous variables: $\frac{\partial Level}{\partial t} = inflow + precipitation - outflow - evaporation$
Compact conditional distributions contd.
Noisy-OR distributions model multiple noninteracting causes
1) Parents $U_1, \ldots, U_k$ include all causes (can add leak node)
2) Independent failure probability $q_i$ for each cause alone
$\Rightarrow P(X \mid U_1, \ldots, U_j, \lnot U_{j+1}, \ldots, \lnot U_k) = 1 - \prod_{i=1}^{j} q_i$
Cold  Flu  Malaria  P(Fever)  P(¬Fever)
F     F    F        0.0       1.0
F     F    T        0.9       0.1
F     T    F        0.8       0.2
F     T    T        0.98      0.02 = 0.2 × 0.1
T     F    F        0.4       0.6
T     F    T        0.94      0.06 = 0.6 × 0.1
T     T    F        0.88      0.12 = 0.6 × 0.2
T     T    T        0.988     0.012 = 0.6 × 0.2 × 0.1
Number of parameters linear in number of parents
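The whole fever table follows from the three failure probabilities alone. The sketch below regenerates it; the names `Q`, `noisy_or`, and `full_cpt` are assumptions for illustration, and no leak node is modeled.

```python
from itertools import product
from math import prod

# Failure probabilities q_i read off the single-cause rows of the fever table.
Q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}
CAUSES = list(Q)


def noisy_or(present):
    """P(effect = True | the causes in `present` are true, all others false)."""
    return 1.0 - prod(Q[c] for c in present)


def full_cpt():
    """Expand the k parameters into the 2^k-row table."""
    table = {}
    for values in product([False, True], repeat=len(CAUSES)):
        present = [c for c, v in zip(CAUSES, values) if v]
        table[values] = noisy_or(present)
    return table


for row, p in full_cpt().items():
    print(row, round(p, 3), round(1.0 - p, 3))
```

Three numbers generate all eight rows, which is the sense in which the parameter count is linear in the number of parents.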
Hybrid (discrete+continuous) networks
Discrete (Subsidy? and Buys?); continuous (Harvest and Cost)
[Figure: network with Subsidy? and Harvest as parents of Cost, and Cost as parent of Buys?]
Option 1: discretization (possibly large errors, large CPTs)
Option 2: finitely parameterized canonical families
1) Continuous variable, discrete+continuous parents (e.g., Cost)
2) Discrete variable, continuous parents (e.g., Buys?)
Continuous child variables
Need one conditional density function for child variable given continuous parents, for each possible assignment to discrete parents
Most common is the linear Gaussian model, e.g.:
$P(Cost = c \mid Harvest = h, Subsidy? = true) = N(a_t h + b_t, \sigma_t)(c) = \frac{1}{\sigma_t \sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{c - (a_t h + b_t)}{\sigma_t} \right)^2 \right)$
Mean Cost varies linearly with Harvest, variance is fixed
Linear variation is unreasonable over the full range but works OK if the likely range of Harvest is narrow
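The linear Gaussian density is easy to evaluate directly. In the sketch below the parameter values `A_T`, `B_T`, and `SIGMA_T` are hypothetical (the slides give no numbers), and the Riemann sum is only a sanity check that the density integrates to 1.

```python
import math


def linear_gaussian_density(c, h, a, b, sigma):
    """Density of Cost = c given Harvest = h, with mean a*h + b and std dev sigma."""
    mean = a * h + b
    z = (c - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical parameters for the Subsidy? = true case.
A_T, B_T, SIGMA_T = -1.0, 10.0, 1.0

h = 4.0
mean = A_T * h + B_T  # the mean of Cost shifts linearly with Harvest
peak = linear_gaussian_density(mean, h, A_T, B_T, SIGMA_T)

# Riemann sum over mean +/- 8 sigma as a check that the density integrates to 1.
step = 0.001
area = sum(
    linear_gaussian_density(mean - 8.0 + i * step, h, A_T, B_T, SIGMA_T) * step
    for i in range(16001)
)
print(mean, peak, area)
```

A separate triple of parameters would be needed for Subsidy? = false, one per assignment to the discrete parents.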
Continuous child variables
[Plot: surface of P(Cost | Harvest, Subsidy? = true) over Cost and Harvest, each ranging from 0 to 10]
All-continuous network with LG distributions ⇒ full joint distribution is a multivariate Gaussian
Discrete+continuous LG network is a conditional Gaussian network, i.e., a multivariate Gaussian over all continuous variables for each combination of discrete variable values
Discrete variable w/ continuous parents
Probability of Buys? given Cost should be a “soft” threshold:
[Plot: P(Buys? = false | Cost = c) against Cost c from 0 to 12]
Probit distribution uses integral of Gaussian:
$\Phi(x) = \int_{-\infty}^{x} N(0, 1)(t)\, dt$
$P(Buys? = true \mid Cost = c) = \Phi((-c + \mu)/\sigma)$
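The probit model can be evaluated with the standard library, since the Gaussian integral is expressible through the error function. In this sketch the threshold `MU = 6.0` and width `SIGMA = 1.0` are hypothetical values, and the function names are assumptions for illustration.

```python
import math


def std_normal_cdf(x):
    """Phi(x), the integral of the standard Gaussian from -infinity to x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def probit_buys(c, mu, sigma):
    """P(Buys? = true | Cost = c) = Phi((-c + mu) / sigma)."""
    return std_normal_cdf((-c + mu) / sigma)


# Hypothetical soft threshold at cost 6 with width 1.
MU, SIGMA = 6.0, 1.0

for cost in (2.0, 5.0, 6.0, 7.0, 10.0):
    print(cost, round(probit_buys(cost, MU, SIGMA), 4))
```

The probability of buying is exactly one half at `c = MU` and falls smoothly as the cost rises, which is the soft threshold behavior described above.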
Why the probit?
1. It’s sort of the right shape
2. Can view as hard threshold whose location is subject to noise
[Figure: Cost and Noise combine into a noisy Cost, which determines Buys? through a hard threshold]
Discrete variable contd.
Sigmoid (or logit) distribution also used in neural networks:
$P(Buys? = true \mid Cost = c) = \frac{1}{1 + \exp\left( -2 \frac{-c + \mu}{\sigma} \right)}$
Sigmoid has similar shape to probit but much longer tails:
[Plot: P(Buys? = false | Cost = c) against Cost c from 0 to 12, comparing the sigmoid and probit curves]
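The difference in the tails shows up when both models are evaluated far from the threshold. The sketch below uses the same hypothetical `MU = 6.0` and `SIGMA = 1.0` for both curves; the function names are assumptions, and `math.erfc` is used for the Gaussian integral to keep precision in the far tail.

```python
import math


def probit_buys(c, mu, sigma):
    """P(Buys? = true | Cost = c) under the probit model."""
    x = (-c + mu) / sigma
    return 0.5 * math.erfc(-x / math.sqrt(2.0))  # equals Phi(x)


def sigmoid_buys(c, mu, sigma):
    """P(Buys? = true | Cost = c) under the sigmoid (logit) model."""
    return 1.0 / (1.0 + math.exp(-2.0 * (-c + mu) / sigma))


# Hypothetical parameters shared by both models.
MU, SIGMA = 6.0, 1.0

for cost in (6.0, 8.0, 10.0, 12.0):
    p = probit_buys(cost, MU, SIGMA)
    s = sigmoid_buys(cost, MU, SIGMA)
    print(cost, p, s)
```

Both curves pass through one half at the threshold, but several widths beyond it the sigmoid probability is orders of magnitude larger than the probit one.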
Summary
Bayes nets provide a natural representation for (causally induced) conditional independence
Topology + CPTs = compact representation of joint distribution
Generally easy for (non)experts to construct
Canonical distributions (e.g., noisy-OR) = compact representation of CPTs
Continuous variables ⇒ parameterized distributions (e.g., linear Gaussian)