CPSC 7373: Artificial Intelligence Lecture 4: Uncertainty
Jiang Bian, Fall 2012, University of Arkansas at Little Rock
Chapter 13: Uncertainty
• Outline
– Uncertainty
– Probability
– Syntax and Semantics
– Inference
– Independence and Bayes' Rule
Uncertainty
Let action At = leave for airport t minutes before flight.
Will At get me there on time?
Problems:
1. partial observability (road state, other drivers' plans, etc.)
2. noisy sensors (traffic reports)
3. uncertainty in action outcomes (flat tire, etc.)
4. immense complexity of modeling and predicting traffic
Hence a purely logical approach either:
– risks falsehood: "A25 will get me there on time", or
– leads to conclusions that are too weak for decision making: "A25 will get me there on time, if there's no accident on the bridge and it doesn't rain and my tires remain intact, etc."
(A1440 might reasonably be said to get me there on time, but I'd have to stay overnight in the airport …)
Methods for handling uncertainty
• Default or nonmonotonic logic:
– Assume my car does not have a flat tire
– Assume A25 works unless contradicted by evidence
– Issues: What assumptions are reasonable? How to handle contradiction?
• Rules with fudge factors:
– A25 |→0.3 get there on time
– Sprinkler |→0.99 WetGrass
– WetGrass |→0.7 Rain
– Issues: problems with combination, e.g., does Sprinkler cause Rain?
• Probability
– Models the agent's degree of belief, given the available evidence
– e.g., A25 will get me there on time with probability 0.04
Probability
Probabilistic assertions summarize effects of:
– laziness: failure to enumerate exceptions, qualifications, etc.
– ignorance: lack of relevant facts, initial conditions, etc.
Subjective probability:
• Probabilities relate propositions to the agent's own state of knowledge
– e.g., P(A25 | no reported accidents) = 0.06
• These are not assertions about the world
• Probabilities of propositions change with new evidence:
– e.g., P(A25 | no reported accidents, 5 a.m.) = 0.15
Bayes Network: Example
[Diagram: a Bayes network for diagnosing why a car won't start. Nodes: Battery Age, Alternator Broken, Fan Belt Broken, Not Charging, Battery Dead, Battery Flat, No Oil, No Gas, Fuel Line Blocked, Starter Broken, Car Won't Start, Battery Meter, Lights, Oil Light, Gas Gauge, Dip Stick.]
Probabilities: Coin Flip
• Suppose the probability of heads is 0.5. What's the probability of it coming up tails?
– P(H) = 1/2
– P(T) = __?__
Probabilities: Coin Flip
• Suppose the probability of heads is 1/4. What's the probability of it coming up tails?
– P(H) = 1/4
– P(T) = __?__
Probabilities: Coin Flip
• Suppose the probability of heads is 1/2, and the coin flips are independent. What's the probability of it coming up three heads in a row?
– P(H) = 1/2
– P(H, H, H) = __?__
Probabilities: Coin Flip
• Xi = result of the i-th coin flip;
• Xi ∈ {H, T}; and
• P(Xi=H) = 1/2 ∀i
• P(X1=X2=X3=X4) = __?__
Probabilities: Coin Flip
• Xi = result of the i-th coin flip;
• Xi ∈ {H, T}; and
• P(Xi=H) = 1/2 ∀i
• P(X1=X2=X3=X4) = __?__
– P(X1=X2=X3=X4=H) + P(X1=X2=X3=X4=T) = 1/16 + 1/16 = 1/8
Probabilities: Coin Flip
• Xi = result of the i-th coin flip;
• Xi ∈ {H, T}; and
• P(Xi=H) = 1/2 ∀i
• P({X1,X2,X3,X4} contains at least 3 H) = __?__
Probabilities: Coin Flip
• Xi = result of the i-th coin flip;
• Xi ∈ {H, T}; and
• P(Xi=H) = 1/2 ∀i
• P({X1,X2,X3,X4} contains at least 3 H) = __?__
– P(HHHH) + P(HHHT) + P(HHTH) + P(HTHH) + P(THHH) = 5 * 1/16 = 5/16
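A quick way to sanity-check these counts is to enumerate all 2^4 equally likely outcomes (a minimal Python sketch, assuming fair and independent flips):

```python
from itertools import product

# All 16 equally likely outcomes of four fair coin flips.
outcomes = list(product("HT", repeat=4))

# P(at least 3 heads): count qualifying outcomes out of 16.
p_at_least_3h = sum(o.count("H") >= 3 for o in outcomes) / len(outcomes)

# P(all four flips equal): only HHHH and TTTT qualify.
p_all_equal = sum(len(set(o)) == 1 for o in outcomes) / len(outcomes)

print(p_at_least_3h)  # 0.3125 = 5/16
print(p_all_equal)    # 0.125  = 1/8
```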
Probabilities: Summary
• Complementary probability:
– P(A) = p; then P(¬A) = 1 − p
• Independence:
– X ⊥ Y; then P(X) P(Y) = P(X, Y)
– P(X, Y) is the joint probability; P(X) and P(Y) are the marginals
Dependence
• Given P(X1=H) = 1/2, and:
– P(X2=H|X1=H) = 0.9
– P(X2=T|X1=T) = 0.8
• P(X2=H) = __?__
Dependence
• Given P(X1=H) = 1/2, and:
– P(X2=H|X1=H) = 0.9
– P(X2=T|X1=T) = 0.8
• P(X2=H) = __?__
– P(X2=H|X1=H) P(X1=H) + P(X2=H|X1=T) P(X1=T)
– = 0.9 * 1/2 + (1 − 0.8) * 1/2 = 0.55
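The same total-probability computation, as a sketch:

```python
# Total probability for the dependent coin flips:
# P(X2=H) = P(X2=H|X1=H) P(X1=H) + P(X2=H|X1=T) P(X1=T)
p_x1_h = 0.5
p_x2h_given_x1h = 0.9
p_x2t_given_x1t = 0.8  # so P(X2=H|X1=T) = 1 - 0.8 = 0.2

p_x2_h = p_x2h_given_x1h * p_x1_h + (1 - p_x2t_given_x1t) * (1 - p_x1_h)
print(p_x2_h)  # ≈ 0.55
```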
What have we learned?
• Total probability:
– P(Y) = Σi P(Y | X=i) P(X=i)
• Negation of probabilities:
– P(¬X | Y) = 1 − P(X | Y)
– What about P(X | ¬Y)?
What have we learned?
• Negation of probabilities:
– P(¬X | Y) = 1 − P(X | Y)
– What about P(X | ¬Y)? There is no such rule: P(X | ¬Y) ≠ 1 − P(X | Y).
– You can negate the event (X), but you can never negate the conditional variable (Y).
Example: Weather
• Given,
– P(D1=Sunny) = 0.9
– P(D2=Sunny|D1=Sunny) = 0.8
– P(D2=Rainy|D1=Sunny) = ??
Example: Weather
• Given,
– P(D1=Sunny) = 0.9
– P(D2=Sunny|D1=Sunny) = 0.8
– P(D2=Rainy|D1=Sunny) = 1 − P(D2=Sunny|D1=Sunny) = 0.2
– P(D2=Sunny|D1=Rainy) = 0.6
– P(D2=Rainy|D1=Rainy) = 1 − P(D2=Sunny|D1=Rainy) = 0.4
– Assume the transition probabilities from D2 to D3 are the same:
– P(D2=Sunny) = ??; and P(D3=Sunny) = ??
Example: Weather
• Given,
– P(D1=Sunny) = 0.9
– P(D2=Sunny|D1=Sunny) = 0.8
– P(D2=Sunny|D1=Rainy) = 0.6
– Assume the transition probabilities from D2 to D3 are the same.
– P(D2=Sunny) = 0.78
• P(D2=Sunny|D1=Sunny) P(D1=Sunny) + P(D2=Sunny|D1=Rainy) P(D1=Rainy) = 0.8 * 0.9 + 0.6 * 0.1 = 0.78
– P(D3=Sunny) = 0.756
• P(D3=Sunny|D2=Sunny) P(D2=Sunny) + P(D3=Sunny|D2=Rainy) P(D2=Rainy) = 0.8 * 0.78 + 0.6 * 0.22 = 0.756
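The D2 and D3 answers can be reproduced by iterating the one-step transition (a sketch; the helper name `next_sunny` is ours, not from the lecture):

```python
# One step of the weather Markov chain:
# P(next=Sunny) = P(S|S) P(today=S) + P(S|R) P(today=R)
def next_sunny(p_sunny, p_s_given_s=0.8, p_s_given_r=0.6):
    return p_s_given_s * p_sunny + p_s_given_r * (1 - p_sunny)

p_d1 = 0.9
p_d2 = next_sunny(p_d1)  # ≈ 0.78
p_d3 = next_sunny(p_d2)  # ≈ 0.756
```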
Example: Cancer
• There exists a type of cancer that 1% of the population carries.
– P(C) = 0.01; P(¬C) = 1 − 0.01 = 0.99
• There exists a test for the cancer.
– P(+|C) = 0.9; P(-|C) = 0.1
– P(+|¬C) = 0.2; P(-|¬C) = 0.8
• P(C|+) = ??
– Joint probabilities:
• P(+, C) = ??; P(-, C) = ??
• P(+, ¬C) = ??; P(-, ¬C) = ??
Example: Cancer
• There exists a type of cancer that 1% of the population carries.
– P(C) = 0.01; P(¬C) = 1 − 0.01 = 0.99
• There exists a test for the cancer.
– P(+|C) = 0.9; P(-|C) = 0.1
– P(+|¬C) = 0.2; P(-|¬C) = 0.8
• P(C|+) = ??
– Joint probabilities:
• P(+, C) = 0.009; P(-, C) = 0.001
• P(+, ¬C) = 0.198; P(-, ¬C) = 0.792
Example: Cancer
• There exists a type of cancer that 1% of the population carries.
– P(C) = 0.01; P(¬C) = 1 − 0.01 = 0.99
• There exists a test for the cancer.
– P(+|C) = 0.9; P(-|C) = 0.1
– P(+|¬C) = 0.2; P(-|¬C) = 0.8
• P(C|+) = 0.043
– P(+, C) / (P(+, C) + P(+, ¬C))
– 0.009 / (0.009 + 0.198) = 0.043
– Joint probabilities:
• P(+, C) = 0.009; P(-, C) = 0.001
• P(+, ¬C) = 0.198; P(-, ¬C) = 0.792
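These joint probabilities and the posterior can be checked in a few lines:

```python
# Joint probabilities for the cancer test, then Bayes rule by normalization.
p_c = 0.01
p_pos_given_c = 0.9
p_pos_given_nc = 0.2

p_pos_and_c = p_pos_given_c * p_c           # ≈ 0.009
p_pos_and_nc = p_pos_given_nc * (1 - p_c)   # ≈ 0.198

# P(C|+) = P(+,C) / (P(+,C) + P(+,¬C))
p_c_given_pos = p_pos_and_c / (p_pos_and_c + p_pos_and_nc)
print(round(p_c_given_pos, 3))  # 0.043
```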
Bayes Rule
P(A|B) = P(B|A) P(A) / P(B)
– P(A|B): POSTERIOR
– P(B|A): LIKELIHOOD
– P(A): PRIOR
– P(B): MARGINAL LIKELIHOOD
• Total probability: P(B) = P(B|A) P(A) + P(B|¬A) P(¬A)
Bayes Rule: Cancer Example
P(C|+) = P(+|C) P(C) / P(+)
– P(C|+): POSTERIOR; P(+|C): LIKELIHOOD; P(C): PRIOR; P(+): MARGINAL LIKELIHOOD
Bayes Network
• Graphically,
A → B
– A: not observable
– B: observable
• Diagnostic reasoning: P(A|B) or P(A|¬B)
• How many parameters? P(A); P(B|A), P(B|¬A)
Two-test cancer example
C → T1, C → T2
P(C) = 0.01; P(¬C) = 0.99
P(+|C) = 0.9; P(-|C) = 0.1
P(-|¬C) = 0.8; P(+|¬C) = 0.2
• P(C|T1=+, T2=+) = P(C|++) = ??
Two-test cancer example
C → T1, C → T2
P(C) = 0.01; P(¬C) = 0.99
P(+|C) = 0.9; P(-|C) = 0.1
P(-|¬C) = 0.8; P(+|¬C) = 0.2
• P(C|T1=+, T2=+) = P(C|++) = 0.1698
Bayes Rule: Compute
P(A|B) = P(B|A) P(A) / P(B)
P(A|B) + P(¬A|B) = 1
Trick: drop the normalizer P(B) and compute pseudo-probabilities:
P'(A|B) = P(B|A) P(A)
P'(¬A|B) = P(B|¬A) P(¬A)
Then normalize:
P(A|B) = P'(A|B) / (P'(A|B) + P'(¬A|B))
i.e., 1/P(B) = 1/(P'(A|B) + P'(¬A|B)) = 1/(P(B|A) P(A) + P(B|¬A) P(¬A))
Two-test cancer example
C → T1, C → T2
P(C) = 0.01; P(¬C) = 0.99
P(+|C) = 0.9; P(-|C) = 0.1
P(-|¬C) = 0.8; P(+|¬C) = 0.2
P(C|++) = ??

      Prior   +     +     P'
C     0.01    0.9   0.9   0.0081
¬C    0.99    0.2   0.2   0.0396

P'(C|++) = P(+|C) P(+|C) P(C) = 0.0081
P'(¬C|++) = P(+|¬C) P(+|¬C) P(¬C) = 0.0396
1/(P'(C|++) + P'(¬C|++)) = 1/(0.0081 + 0.0396) = 1/0.0477
P(C|++) = P'(C|++) / 0.0477 = 0.0081/0.0477 = 0.1698
P(¬C|++) = P'(¬C|++) / 0.0477 = 0.0396/0.0477 = 0.8302
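The normalization trick from this slide, written out (variable names are ours):

```python
# Normalization trick for P(C | T1=+, T2=+): compute unnormalized
# pseudo-probabilities P' and divide by their sum. The two tests are
# conditionally independent given C, so the likelihoods multiply.
p_c, p_nc = 0.01, 0.99
p_pos_c, p_pos_nc = 0.9, 0.2

pp_c = p_pos_c * p_pos_c * p_c      # P'(C|++)  ≈ 0.0081
pp_nc = p_pos_nc * p_pos_nc * p_nc  # P'(¬C|++) ≈ 0.0396

p_c_given_pp = pp_c / (pp_c + pp_nc)
print(round(p_c_given_pp, 4))  # 0.1698
```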
Two-test cancer example
C → T1, C → T2
P(C) = 0.01; P(¬C) = 0.99
P(+|C) = 0.9; P(-|C) = 0.1
P(-|¬C) = 0.8; P(+|¬C) = 0.2
P(C|+-) = ??
Two-test cancer example
C → T1, C → T2
P(C) = 0.01; P(¬C) = 0.99
P(+|C) = 0.9; P(-|C) = 0.1
P(-|¬C) = 0.8; P(+|¬C) = 0.2
P(C|+-) = 0.0056

      Prior   +     -     P'      P
C     0.01    0.9   0.1   0.0009  0.0056
¬C    0.99    0.2   0.8   0.1584  0.9943
Conditional Independence
• 2-Test Cancer Example
• We assume not only that T1 and T2 are identically distributed, but also that they are conditionally independent given C:
– P(T2|C,T1) = P(T2|C)
C → T1, C → T2
P(C) = 0.01; P(¬C) = 0.99
P(+|C) = 0.9; P(-|C) = 0.1
P(-|¬C) = 0.8; P(+|¬C) = 0.2
Conditional Independence
A → B, A → C
• Given B ⊥ C | A:
– Does B ⊥ C | A imply B ⊥ C?
Conditional Independence
A → B, A → C
• Given B ⊥ C | A:
– Does B ⊥ C | A imply B ⊥ C? No.
• Intuitively, getting a positive test result for cancer gives us information about whether you have cancer or not.
– So if you get a positive test result, you raise the probability of having cancer relative to the prior probability.
– With that increased probability, we predict that another test will give a positive result with higher likelihood than if we hadn't taken the previous test.
Two-test cancer example
C → T1, C → T2
P(C) = 0.01; P(¬C) = 0.99
P(+|C) = 0.9; P(-|C) = 0.1
P(-|¬C) = 0.8; P(+|¬C) = 0.2
P(T2=+|T1=+) = ??
Conditional independence: cancer example
• Conditional independence: given that I know C, knowledge of the first test gives me no more information about the second test.
– It only gives me information if C is unknown.
C → T1, C → T2
P(C) = 0.01; P(¬C) = 0.99
P(+|C) = 0.9; P(-|C) = 0.1
P(-|¬C) = 0.8; P(+|¬C) = 0.2
P(T2=+|T1=+)
= P(+2|+1,C) P(C|+1) + P(+2|+1,¬C) P(¬C|+1)
= P(+2|C) * 0.043 + P(+2|¬C) * (1 − 0.043)
= 0.9 * 0.043 + 0.2 * 0.957
= 0.2301
(using conditional independence: P(+2|+1,C) = P(+2|C))
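A sketch of the same computation; note the slide's 0.2301 comes from rounding P(C|+1) to 0.043, while the unrounded posterior gives approximately 0.2304:

```python
# P(T2=+ | T1=+) by conditioning on C and using conditional independence:
# P(+2|+1) = P(+2|C) P(C|+1) + P(+2|¬C) P(¬C|+1)
p_c_given_p1 = 0.009 / (0.009 + 0.198)  # P(C|+1) ≈ 0.0435 (slide rounds to 0.043)
p_p2_given_p1 = 0.9 * p_c_given_p1 + 0.2 * (1 - p_c_given_p1)
print(round(p_p2_given_p1, 4))  # 0.2304
```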
Absolute and Conditional Independence
A → C ← B
• A ⊥ B
• A ⊥ B | C?
• Does A ⊥ B imply A ⊥ B | C?
• Does A ⊥ B | C imply A ⊥ B?
Confounding Cause
S → H ← R
P(S) = 0.7; P(R) = 0.01
P(H|S, R) = 1
P(H|¬S, R) = 0.9
P(H|S, ¬R) = 0.7
P(H|¬S, ¬R) = 0.1
P(R|S) = ??
Confounding Cause
S → H ← R
P(S) = 0.7; P(R) = 0.01
P(H|S, R) = 1
P(H|¬S, R) = 0.9
P(H|S, ¬R) = 0.7
P(H|¬S, ¬R) = 0.1
P(R|S) = 0.01, since R ⊥ S: P(R|S) = P(R)
Explaining Away
S → H ← R
P(S) = 0.7; P(R) = 0.01
P(H|S, R) = 1; P(H|¬S, R) = 0.9; P(H|S, ¬R) = 0.7; P(H|¬S, ¬R) = 0.1
• Explaining away: if we know that we are happy, then sunny weather can explain away the cause of the happiness.
– If I know that it's sunny, it becomes less likely that I received a raise.
– If we see an effect that could be caused by multiple causes, seeing one of those causes can explain away any other potential cause of the effect.
P(R|H,S) = ??
Explaining Away
S → H ← R
P(S) = 0.7; P(R) = 0.01
P(H|S, R) = 1; P(H|¬S, R) = 0.9; P(H|S, ¬R) = 0.7; P(H|¬S, ¬R) = 0.1
P(R|H,S) = 0.0142
P(R|H,S) = P(H|R,S) P(R|S) / P(H|S)
= P(H|R,S) P(R) / (P(H|R,S) P(R) + P(H|¬R,S) P(¬R))
= 1 * 0.01 / (1 * 0.01 + 0.7 * (1 − 0.01)) = 0.0142
P(R|H) = ??
Explaining Away
S → H ← R
P(S) = 0.7; P(R) = 0.01
P(H|S, R) = 1; P(H|¬S, R) = 0.9; P(H|S, ¬R) = 0.7; P(H|¬S, ¬R) = 0.1
P(R|H) = P(H|R) P(R) / P(H)
= (P(H|R,S) P(S) + P(H|R,¬S) P(¬S)) P(R) / P(H)
= 0.97 * 0.01 / 0.5245
= 0.01849
P(R|H) = 0.01849
P(H) = 0.5245
P(R|H,S) = 0.0142
P(R|H,¬S) = ??
Explaining Away
S → H ← R
P(S) = 0.7; P(R) = 0.01
P(H|S, R) = 1; P(H|¬S, R) = 0.9; P(H|S, ¬R) = 0.7; P(H|¬S, ¬R) = 0.1
P(R|H,¬S) = P(H|R,¬S) P(R|¬S) / P(H|¬S)
= P(H|R,¬S) P(R) / (P(H|R,¬S) P(R) + P(H|¬R,¬S) P(¬R))
= 0.9 * 0.01 / (0.9 * 0.01 + 0.1 * 0.99) = 0.0833
P(R|H) = 0.01849
P(H) = 0.5245
P(R|H,S) = 0.0142
P(R|H,¬S) = 0.0833
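All four quantities from the explaining-away slides can be reproduced by enumeration (a sketch; the dictionary encoding of P(H|S,R) is ours):

```python
# Explaining away in the happiness network: compare P(R|H,S), P(R|H),
# and P(R|H,¬S).
p_s, p_r = 0.7, 0.01
p_h = {  # P(H | S, R) for each (S, R) combination
    (True, True): 1.0, (False, True): 0.9,
    (True, False): 0.7, (False, False): 0.1,
}

def p_r_given_h_and_s(s):
    # Bayes rule with S fixed; R ⊥ S, so P(R|S) = P(R).
    num = p_h[(s, True)] * p_r
    den = num + p_h[(s, False)] * (1 - p_r)
    return num / den

p_r_given_hs = p_r_given_h_and_s(True)    # ≈ 0.0142
p_r_given_hns = p_r_given_h_and_s(False)  # ≈ 0.0833

# P(R|H) by enumerating S: P(H) and P(H, R) first.
p_h_total = sum(p_h[(s, r)] * (p_s if s else 1 - p_s) * (p_r if r else 1 - p_r)
                for s in (True, False) for r in (True, False))  # ≈ 0.5245
p_h_and_r = sum(p_h[(s, True)] * (p_s if s else 1 - p_s)
                for s in (True, False)) * p_r                    # ≈ 0.0097
p_r_given_h = p_h_and_r / p_h_total                              # ≈ 0.0185
```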
Conditional Dependence
S → H ← R
P(S) = 0.7; P(R) = 0.01
P(H|S, R) = 1; P(H|¬S, R) = 0.9; P(H|S, ¬R) = 0.7; P(H|¬S, ¬R) = 0.1
P(R|H,S) = 1.42% ≠ P(R|H) = 1.849%
P(R|S) = 0.01 = P(R), since R ⊥ S
P(R|H,¬S) = 8.33%
R ⊥ S, but not R ⊥ S | H.
Independence does not imply conditional independence!
Absolute and Conditional Independence
A → C ← B
• A ⊥ B
• Does A ⊥ B imply A ⊥ B | C? No: conditioning on the common effect C makes A and B dependent (explaining away).
• Does A ⊥ B | C imply A ⊥ B? No: in the two-test cancer example, T1 ⊥ T2 | C, yet T1 and T2 are dependent.
Bayes Networks
A → C ← B; C → D; C → E
• Bayes networks define probability distributions over graphs of random variables.
• Instead of enumerating all combinations of the random variables, the Bayes network is defined by probability distributions that are inherent to each individual node.
• The joint probability represented by a Bayes network is the product of probabilities defined over the individual nodes, where each node's probability is conditioned only on its incoming arcs.
• P(A) P(B) P(C|A,B) P(D|C) P(E|C)
• 10 probability values:
– P(A), P(B): 1 + 1
– P(C|A,B): 4
– P(D|C), P(E|C): 2 + 2
• An unstructured joint distribution over five binary variables would need 2^5 − 1 = 31 probability values.
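A sketch of the factored joint for this network; the numeric values of the node distributions are made-up placeholders (the lecture gives none for this net), only the factorization P(A) P(B) P(C|A,B) P(D|C) P(E|C) is from the slide:

```python
# Hypothetical conditional probability tables for the A,B -> C -> D,E network.
p_a, p_b = 0.3, 0.6  # hypothetical priors
p_c_given_ab = {(True, True): 0.9, (True, False): 0.5,
                (False, True): 0.4, (False, False): 0.1}
p_d_given_c = {True: 0.8, False: 0.2}
p_e_given_c = {True: 0.7, False: 0.3}

# One full assignment, e.g. P(A=T, B=T, C=T, D=T, E=T), is just the
# product of the per-node conditionals:
p_joint = (p_a * p_b * p_c_given_ab[(True, True)]
           * p_d_given_c[True] * p_e_given_c[True])
```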
Bayes Network: Quiz 1
A → B, C, D; B → E; C, D → F
How many probability values are required to specify this Bayes network?
Bayes Network: Quiz 1
A → B, C, D; B → E; C, D → F
How many probability values are required to specify this Bayes network? 13
– P(A): 1
– P(B|A), P(C|A), P(D|A): 2 + 2 + 2
– P(E|B): 2
– P(F|C,D): 4
Bayes Network: Quiz 2
A, B, C → D; D → E; D → F; C, D → G
How many probability values are required to specify this Bayes network?
Bayes Network: Quiz 2
A, B, C → D; D → E; D → F; C, D → G
– P(A), P(B), P(C) = 3
– P(D|A, B, C) = 8
– P(E|D), P(F|D), P(G|D, C) = 2 + 2 + 4 = 8
Total: 19
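Both quiz answers follow from the rule that a binary node with k parents needs 2^k probability values (a sketch; the function and node maps are ours):

```python
# Count the probability values a Bayes network over binary variables needs:
# each node with k parents contributes 2^k values.
def num_parameters(parents):
    """parents maps node name -> number of parents."""
    return sum(2 ** k for k in parents.values())

# Quiz 1: A -> B, C, D; B -> E; C, D -> F
quiz1 = {"A": 0, "B": 1, "C": 1, "D": 1, "E": 1, "F": 2}
print(num_parameters(quiz1))  # 13

# Quiz 2: A, B, C -> D; D -> E, F; C, D -> G
quiz2 = {"A": 0, "B": 0, "C": 0, "D": 3, "E": 1, "F": 1, "G": 2}
print(num_parameters(quiz2))  # 19
```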
Bayes Network: Example
[Diagram: the car-diagnosis network from earlier: Battery Age, Alternator Broken, Fan Belt Broken, Not Charging, Battery Dead, Battery Flat, No Oil, No Gas, Fuel Line Blocked, Starter Broken, Car Won't Start, Battery Meter, Lights, Oil Light, Gas Gauge, Dip Stick.]
A full joint distribution over these 16 binary variables would require 2^16 − 1 = 65535 probability values. How many does the Bayes network need?
Bayes Network: Example
[Diagram: the car-diagnosis network, annotated with the number of probability values contributed by each node.]
2^16 − 1 = 65535 probability values for the full joint distribution, but the Bayes network needs only:
1 + 1 + 1 + 1 + 1 + 1 + 1 + 2 + 2 + 2 + 2 + 4 + 4 + 4 + 4 + 16 = 47
D-Separation
A → B → C; A → D → E
Yes or No?
– C ⊥ A?
– C ⊥ A | B?
– C ⊥ D?
– C ⊥ D | A?
– E ⊥ C | D?
D-Separation
A → B → C; A → D → E
– C ⊥ A? No. A influences C by virtue of B.
– C ⊥ A | B? Yes. If you know B, knowledge of A won't tell you anything more about C.
– C ⊥ D? No. If I know D, I can infer more about C through A.
– C ⊥ D | A? Yes.
– E ⊥ C | D? Yes.
• C and A are not independent, but C and A are independent given B.
• C and D are not independent, but C and D are independent given A.
• E and C are independent given D.
D-Separation
A → C ← B; C → D; C → E
Yes or No?
– A ⊥ E?
– A ⊥ E | B?
– A ⊥ E | C?
– A ⊥ B?
– A ⊥ B | C?
D-Separation
A → C ← B; C → D; C → E
– A ⊥ E? No: A influences E through C.
– A ⊥ E | B? No: knowing B does not block the path A → C → E.
– A ⊥ E | C? Yes: C blocks the path.
– A ⊥ B? Yes: C is an unobserved common effect.
– A ⊥ B | C? No.
• EXPLAINING AWAY EFFECT: the knowledge of A will discredit the information given by B on its influence on C.
D-Separation: Reachability
• Active triplets render variables dependent:
– a chain A → B → C with B unobserved
– a common cause A ← B → C with B unobserved
– a common effect A → B ← C with B (or a descendant of B) observed
• Inactive triplets render variables independent:
– cut off by a known variable in the middle, which separates (d-separates) the left variable from the right variable: a chain or common cause with the middle variable observed, or a common effect with the middle variable and its descendants unobserved
D-Separation: Quiz
[Diagram: a Bayes network over A, B, C, D, E, F, G, H.]
Yes or No?
– F ⊥ A?
– F ⊥ A | D?
– F ⊥ A | G?
– F ⊥ A | H?
Bayes Network: Summary
• Graph structure
• Compact representation
• Conditional independence
• Next: applications