Transcript of 6.825 Lecture 16
Lecture 16 • 1
6.825 Techniques in Artificial Intelligence
Inference in Bayesian Networks
• Exact inference
• Approximate inference
Lecture 16 • 2
Query Types
Given a Bayesian network, what questions might we want to ask?
• Conditional probability query: P(x | e)
• Maximum a posteriori probability: what value of x maximizes P(x | e)?
General question: What's the whole probability distribution over variable X given evidence e, P(X | e)?
Lecture 16 • 3
Using the joint distribution
To answer any query involving a conjunction of variables, sum over the variables not involved in the query.
Pr(d) = Σ_{A,B,C} Pr(a, b, c, d)
      = Σ_{a∈dom(A)} Σ_{b∈dom(B)} Σ_{c∈dom(C)} Pr(A=a ∧ B=b ∧ C=c ∧ D=d)

Pr(d | b) = Pr(b, d) / Pr(b)
          = Σ_{A,C} Pr(a, b, c, d) / Σ_{A,C,D} Pr(a, b, c, d)
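Concretely, when the joint distribution is small enough to store as a table, both of these queries are just array sums. A minimal numpy sketch (the joint here is randomly generated, purely for illustration):

```python
import numpy as np

# Hypothetical joint distribution over four binary variables A, B, C, D,
# stored as a 2x2x2x2 table indexed joint[a, b, c, d].
rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2, 2))
joint /= joint.sum()                 # normalize so the entries sum to 1

# Pr(D): sum over the variables not in the query (A, B, C = axes 0, 1, 2).
p_d = joint.sum(axis=(0, 1, 2))

# Pr(D | B=b): Pr(b, d) / Pr(b), summing out A and C.
b = 1
p_bd = joint.sum(axis=(0, 2))        # shape (2, 2), indexed [b, d]
p_d_given_b = p_bd[b] / p_bd[b].sum()
```

This only works while the table fits in memory: with n binary variables the joint has 2^n entries, which is exactly the blowup that variable elimination avoids.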
Lecture 16 • 4
Simple Case
A → B → C → D

Pr(d) = Σ_{A,B,C} Pr(a, b, c, d)
      = Σ_{A,B,C} Pr(d | c) Pr(c | b) Pr(b | a) Pr(a)
      = Σ_C Σ_B Σ_A Pr(d | c) Pr(c | b) Pr(b | a) Pr(a)
      = Σ_C Pr(d | c) Σ_B Pr(c | b) Σ_A Pr(b | a) Pr(a)
Lecture 16 • 5
Simple Case
A → B → C → D

Pr(d) = Σ_C Pr(d | c) Σ_B Pr(c | b) Σ_A Pr(b | a) Pr(a)

The terms of the innermost sum, arranged as a matrix (rows indexed by b, columns by a):

[ Pr(b1 | a1) Pr(a1)   Pr(b1 | a2) Pr(a2) ]
[ Pr(b2 | a1) Pr(a1)   Pr(b2 | a2) Pr(a2) ]
Lecture 16 • 6
Simple Case
A → B → C → D

Pr(d) = Σ_C Pr(d | c) Σ_B Pr(c | b) Σ_A Pr(b | a) Pr(a)

Summing each row over A gives a vector indexed by b:

[ Σ_A Pr(b1 | a) Pr(a) ]
[ Σ_A Pr(b2 | a) Pr(a) ]
Lecture 16 • 7
Simple Case
A → B → C → D

Pr(d) = Σ_C Pr(d | c) Σ_B Pr(c | b) Σ_A Pr(b | a) Pr(a)

The innermost sum defines a new factor: f1(b) = Σ_A Pr(b | a) Pr(a).
Lecture 16 • 8
Simple Case
B → C → D

Pr(d) = Σ_C Pr(d | c) Σ_B Pr(c | b) f1(b)

The sum over B defines f2(c) = Σ_B Pr(c | b) f1(b).
Lecture 16 • 9
Simple Case

B → C → D

Pr(d) = Σ_C Pr(d | c) Σ_B Pr(c | b) f1(b)

f2(c) = Σ_B Pr(c | b) f1(b)
Lecture 16 • 10
Simple Case
C → D

Pr(d) = Σ_C Pr(d | c) f2(c)
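For a chain of binary variables, the factors f1 and f2 are just length-2 vectors, and each elimination step is a matrix–vector product. A sketch with made-up CPT numbers (not from the lecture):

```python
import numpy as np

# Made-up CPTs for the chain A -> B -> C -> D, all variables binary.
# p_b_given_a[b, a] = Pr(B=b | A=a), so each column sums to 1.
p_a = np.array([0.6, 0.4])
p_b_given_a = np.array([[0.7, 0.2],
                        [0.3, 0.8]])
p_c_given_b = np.array([[0.9, 0.4],
                        [0.1, 0.6]])
p_d_given_c = np.array([[0.5, 0.1],
                        [0.5, 0.9]])

f1 = p_b_given_a @ p_a        # f1(b) = sum_a Pr(b|a) Pr(a)
f2 = p_c_given_b @ f1         # f2(c) = sum_b Pr(c|b) f1(b)
p_d = p_d_given_c @ f2        # Pr(d) = sum_c Pr(d|c) f2(c)

# Brute force over the full joint gives the same answer.
brute = np.einsum('a,ba,cb,dc->d', p_a, p_b_given_a, p_c_given_b, p_d_given_c)
assert np.allclose(p_d, brute)
```

Each step costs a constant amount of work per variable, instead of the exponential cost of building the full joint.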
Lecture 16 • 11
Variable Elimination Algorithm
Given a Bayesian network and an elimination order X1, …, Xm for the non-query variables, compute

Σ_{X1} Σ_{X2} … Σ_{Xm} ∏_j Pr(x_j | Pa(x_j))

For i = m downto 1:
• remove all the factors that mention Xi
• multiply those factors, getting a value for each combination of mentioned variables
• sum over Xi
• put this new factor into the factor set
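A minimal Python sketch of this loop (illustrative, not the course's code). One convenient representation, assuming all variables are binary: give every variable a fixed global axis, and store each factor as a numpy array with size 2 on the axes of the variables it mentions and size 1 elsewhere. Then the factor product is plain broadcasting, and summing over Xi is a sum along one axis:

```python
import numpy as np

VARS = ['A', 'B', 'C', 'D']          # global axis order (hypothetical net)
AXIS = {v: i for i, v in enumerate(VARS)}

def factor(vars_, table):
    """Embed a table over vars_ (indexed in that order) into the global layout."""
    shape = [1] * len(VARS)
    for v in vars_:
        shape[AXIS[v]] = 2
    order = np.argsort([AXIS[v] for v in vars_])   # move axes into global order
    return np.asarray(table, dtype=float).transpose(order).reshape(shape)

def eliminate(factors, var):
    """One step: multiply all factors that mention var, sum over var."""
    i = AXIS[var]
    touched = [f for f in factors if f.shape[i] > 1]
    rest = [f for f in factors if f.shape[i] == 1]
    prod = touched[0]
    for f in touched[1:]:
        prod = prod * f                            # broadcasting = factor product
    return rest + [prod.sum(axis=i, keepdims=True)]

# The chain A -> B -> C -> D with made-up numbers.
fs = [
    factor(('A',), [0.6, 0.4]),                    # Pr(A)
    factor(('A', 'B'), [[0.7, 0.3], [0.2, 0.8]]),  # Pr(B|A), indexed [a, b]
    factor(('B', 'C'), [[0.9, 0.1], [0.4, 0.6]]),  # Pr(C|B), indexed [b, c]
    factor(('C', 'D'), [[0.5, 0.5], [0.1, 0.9]]),  # Pr(D|C), indexed [c, d]
]
for v in ['A', 'B', 'C']:                          # elimination order
    fs = eliminate(fs, v)
result = fs[0]
for f in fs[1:]:
    result = result * f
p_d = result.ravel()                               # what's left is Pr(D)
```

Running this on the chain with elimination order A, B, C reproduces the f1, f2 sequence from the Simple Case slides.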
Lecture 16 • 12
One more example
[Network: Visit (risky) → Tuberculosis; Smoke → Lung cancer; Smoke → Bronchitis; Tuberculosis, Lung cancer → Abnrm chest; Abnrm chest → Xray; Abnrm chest, Bronchitis → Dyspnea]

Pr(d) = Σ_{A,B,L,T,S,X,V} Pr(d | a,b) Pr(a | t,l) Pr(b | s) Pr(l | s) Pr(s) Pr(x | a) Pr(t | v) Pr(v)
Lecture 16 • 13
One more example
[Network: Visit (risky), Tuberculosis, Smoke, Lung cancer, Bronchitis, Abnrm chest, Xray, Dyspnea]

Pr(d) = Σ_{A,B,L,T,S,X} Pr(d | a,b) Pr(a | t,l) Pr(b | s) Pr(l | s) Pr(s) Pr(x | a) Σ_V Pr(t | v) Pr(v)

The sum over V defines f1(t) = Σ_V Pr(t | v) Pr(v).

Lecture 16 • 14
One more example
[Network after eliminating V: Smoke, Bronchitis, Lung cancer, Tuberculosis, Abnrm chest, Xray, Dyspnea]

Pr(d) = Σ_{A,B,L,T,S,X} Pr(d | a,b) Pr(a | t,l) Pr(b | s) Pr(l | s) Pr(s) Pr(x | a) f1(t)
Lecture 16 • 15
One more example
[Network: Smoke, Bronchitis, Lung cancer, Tuberculosis, Abnrm chest, Xray, Dyspnea]

Pr(d) = Σ_{A,B,L,T,S} Pr(d | a,b) Pr(a | t,l) Pr(b | s) Pr(l | s) Pr(s) f1(t) Σ_X Pr(x | a)

The sum over X is 1, since Σ_X Pr(x | a) = 1 for every a.

Lecture 16 • 16
One more example
[Network after eliminating X: Smoke, Bronchitis, Lung cancer, Tuberculosis, Abnrm chest, Dyspnea]

Pr(d) = Σ_{A,B,L,T,S} Pr(d | a,b) Pr(a | t,l) Pr(b | s) Pr(l | s) Pr(s) f1(t)
Lecture 16 • 17
One more example
[Network: Smoke, Bronchitis, Lung cancer, Tuberculosis, Abnrm chest, Dyspnea]

Pr(d) = Σ_{A,B,L,T} Pr(d | a,b) Pr(a | t,l) f1(t) Σ_S Pr(b | s) Pr(l | s) Pr(s)

The sum over S defines f2(b,l) = Σ_S Pr(b | s) Pr(l | s) Pr(s).
Lecture 16 • 18
One more example
[Network after eliminating S: Bronchitis, Lung cancer, Tuberculosis, Abnrm chest, Dyspnea]

Pr(d) = Σ_{A,B,L,T} Pr(d | a,b) Pr(a | t,l) f1(t) f2(b,l)
Lecture 16 • 19
One more example
[Network: Bronchitis, Lung cancer, Tuberculosis, Abnrm chest, Dyspnea]

Pr(d) = Σ_{A,B,L} Pr(d | a,b) f2(b,l) Σ_T Pr(a | t,l) f1(t)

The sum over T defines f3(a,l) = Σ_T Pr(a | t,l) f1(t).
Lecture 16 • 20
One more example
[Network after eliminating T: Bronchitis, Lung cancer, Abnrm chest, Dyspnea]

Pr(d) = Σ_{A,B,L} Pr(d | a,b) f2(b,l) f3(a,l)
Lecture 16 • 21
One more example
[Network: Bronchitis, Lung cancer, Abnrm chest, Dyspnea]

Pr(d) = Σ_{A,B} Pr(d | a,b) Σ_L f2(b,l) f3(a,l)

The sum over L defines f4(a,b) = Σ_L f2(b,l) f3(a,l).
Lecture 16 • 22
One more example
[Network after eliminating L: Bronchitis, Abnrm chest, Dyspnea]

Pr(d) = Σ_{A,B} Pr(d | a,b) f4(a,b)
Lecture 16 • 23
One more example
[Network: Bronchitis, Abnrm chest, Dyspnea]

Pr(d) = Σ_A Σ_B Pr(d | a,b) f4(a,b)

The sum over B defines f5(a) = Σ_B Pr(d | a,b) f4(a,b).

Lecture 16 • 24
One more example
[Network after eliminating B: Abnrm chest, Dyspnea]

Pr(d) = Σ_A f5(a)
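The whole f1-through-f5 sequence can be checked numerically. A sketch with randomly generated CPTs (the numbers are made up; only the graph structure matches these slides), using one global axis per variable so that factor product is numpy broadcasting:

```python
import numpy as np

VARS = ['V', 'T', 'S', 'L', 'B', 'A', 'X', 'D']    # one axis per variable
AXIS = {v: i for i, v in enumerate(VARS)}
rng = np.random.default_rng(1)

def cpt(child, parents):
    """Random table Pr(child | parents), embedded in the global axis layout."""
    shape = [1] * len(VARS)
    for v in [child] + parents:
        shape[AXIS[v]] = 2
    t = rng.random(shape)
    return t / t.sum(axis=AXIS[child], keepdims=True)   # normalize over child

factors = [cpt('V', []), cpt('T', ['V']), cpt('S', []), cpt('L', ['S']),
           cpt('B', ['S']), cpt('A', ['T', 'L']), cpt('X', ['A']),
           cpt('D', ['A', 'B'])]
orig = list(factors)

# Eliminate in the slides' order; the final sum over A happens here too.
for var in ['V', 'X', 'S', 'T', 'L', 'B', 'A']:
    i = AXIS[var]
    touched = [f for f in factors if f.shape[i] > 1]
    rest = [f for f in factors if f.shape[i] == 1]
    prod = touched[0]
    for f in touched[1:]:
        prod = prod * f
    factors = rest + [prod.sum(axis=i, keepdims=True)]

p_d = factors[0]
for f in factors[1:]:
    p_d = p_d * f
p_d = p_d.ravel()                                  # Pr(D)

# Brute force: multiply every CPT into the full 2^8 joint, sum out all but D.
joint = orig[0]
for f in orig[1:]:
    joint = joint * f
brute = joint.sum(axis=tuple(range(7)))
assert np.allclose(p_d, brute)
```

At no point does the elimination build a table over more than three variables, while the brute-force check materializes all 256 joint entries.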
Lecture 16 • 25
Properties of Variable Elimination
• Time is exponential in the size of the largest factor
• A bad elimination order can generate huge factors
• It is NP-hard to find the best elimination order
• Even the best elimination order may generate large factors
• There are reasonable heuristics for picking an elimination order (such as choosing the variable that results in the smallest next factor)
• Inference in polytrees (nets with no undirected cycles) is linear in the size of the network (the size of the largest CPT)
• Many problems with very large nets have only small factors, and thus efficient inference
Lecture 16 • 26
Sampling
To get an approximate answer, we can do stochastic simulation (sampling).
[Network: A → B; A → C; B, C → D]
P(A) = 0.4
One sample:
A B C D
T F T T
…
• Flip a coin where P(T) = 0.4; assume we get T, and use that value for A
• Given A=T, look up P(B | A=T) and flip a coin with that probability; assume we get F
• Similarly for C and D
• We get one sample from the joint distribution of these four variables

Estimate:
P*(D | A) = #(D,A) / #A
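A runnable sketch of this procedure for the diamond network on this slide. P(A) = 0.4 is from the slide; all the other probabilities here are invented for illustration:

```python
import random

P_A = 0.4                                       # Pr(A=T), from the slide
P_B = {True: 0.7, False: 0.2}                   # Pr(B=T | A), made up
P_C = {True: 0.9, False: 0.3}                   # Pr(C=T | A), made up
P_D = {(True, True): 0.95, (True, False): 0.6,  # Pr(D=T | B, C), made up
       (False, True): 0.5, (False, False): 0.05}

def sample_once(rng):
    """Sample each variable in topological order, conditioning on its parents."""
    a = rng.random() < P_A
    b = rng.random() < P_B[a]
    c = rng.random() < P_C[a]
    d = rng.random() < P_D[(b, c)]
    return a, b, c, d

# Estimate P*(D | A) = #(D,A) / #A by counting over many samples.
rng = random.Random(0)
n_a = n_da = 0
for _ in range(100_000):
    a, _, _, d = sample_once(rng)
    n_a += a
    n_da += a and d
print(n_da / n_a)    # approximate Pr(D=T | A=T)
```

The estimate converges to the true conditional as the number of samples grows, but slowly when the conditioning event is rare.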
Lecture 16 • 27
Estimation
• Some probabilities are easier than others to estimate
• In generating the table, the rare events will not be well represented
• P(Disease | spots-on-your-tongue, sore toe)
• If spots-on-your-tongue and sore toe are not root nodes, you would generate a huge table, but the cases of interest would be very sparse in the table
• Importance sampling lets you focus on the set of cases that are important to answering your question
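One common concrete form of importance sampling for Bayes nets is likelihood weighting: clamp the evidence variables to their observed values instead of sampling them, and weight each sample by the probability of that evidence given its sampled parents. A sketch on the same diamond network as the sampling slide, with invented numbers and evidence D=T:

```python
import random

P_A = 0.4                                       # Pr(A=T)
P_B = {True: 0.7, False: 0.2}                   # Pr(B=T | A), made up
P_C = {True: 0.9, False: 0.3}                   # Pr(C=T | A), made up
P_D = {(True, True): 0.95, (True, False): 0.6,  # Pr(D=T | B, C), made up
       (False, True): 0.5, (False, False): 0.05}

def weighted_sample(rng):
    """Sample non-evidence variables; weight by likelihood of evidence D=T."""
    a = rng.random() < P_A
    b = rng.random() < P_B[a]
    c = rng.random() < P_C[a]
    w = P_D[(b, c)]      # D is evidence: not sampled, its likelihood is the weight
    return a, w

# Estimate Pr(A=T | D=T) as a weighted average over samples.
rng = random.Random(0)
num = den = 0.0
for _ in range(100_000):
    a, w = weighted_sample(rng)
    num += w * a
    den += w
print(num / den)     # approximate Pr(A=T | D=T)
```

Every sample is consistent with the evidence, so none are wasted, which is the payoff when the evidence would rarely occur in plain forward sampling.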
Lecture 16 • 28
Recitation Problem
• Do the variable elimination algorithm on the net below using the elimination order A, B, C (that is, eliminate node C first). In computing P(D=d), what factors do you get?
• What if you wanted to compute the whole marginal distribution P(D)?

A → B → C → D
Lecture 16 • 29
Another Recitation Problem
Find an elimination order that keeps the factors small for the net below, or show that there is no such order.

[Network: A; B, C, D, E, F, G, H, I; J, K, L, M; N, O; P]
Lecture 16 • 30
The Last Recitation Problem (in this lecture)

Bayesian networks (or related models) are often used in computer vision, but they almost always require sampling. What happens when you try to do variable elimination on a model like the grid below?

A B C D E
F G H I J
K L M N O
P Q R S T