Computer Science CPSC 322 Lecture 27 Conditioning Ch 6.1.3 Slide 1.
Computer Science CPSC 322
Lecture 27
Conditioning
Ch 6.1.3
Slide 1
Lecture Overview
• Recap Lecture 26
• Conditioning
• Inference by Enumeration
• Bayes Rule
• Chain Rule (time permitting)
Slide 2
Probability as a measure of uncertainty/ignorance
• Probability measures an agent's degree of belief in the truth of propositions about states of the world
• Belief in a proposition f can be measured in terms of a number between 0 and 1
  • this is the probability of f
  • E.g. P(“roll of fair die came out as a 6”) = 1/6 ≈ 16.7% = 0.167
• P(f) = 0 means that f is believed to be definitely false
• P(f) = 1 means f is believed to be definitely true
• Using probabilities between 0 and 1 is purely a convention
Slide 3
Probability Theory and Random Variables
• Probability Theory
  • a system of logical axioms and formal operations for sound reasoning under uncertainty
• Basic element: random variable X
  • X is a variable like the ones we have seen in CSP/Planning/Logic
  • but the agent can be uncertain about the value of X
• We will focus on Boolean and categorical variables
  • Boolean: e.g., Cancer (does the patient have cancer or not?)
  • Categorical: e.g., CancerType could be one of {breastCancer, lungCancer, skinMelanomas}
Slide 4
Possible Worlds
• A possible world specifies an assignment to each random variable
• Example: weather in Vancouver, represented by random variables
  - Temperature: {hot, mild, cold}
  - Weather: {sunny, cloudy}
• There are 6 possible worlds:
• w ⊨ f means that proposition f is true in world w
• A probability measure µ(w) over possible worlds w is a nonnegative real number such that
  - µ(w) sums to 1 over all possible worlds w
Because for sure we are in one of these worlds!
Weather Temperature
w1 sunny hot
w2 sunny mild
w3 sunny cold
w4 cloudy hot
w5 cloudy mild
w6 cloudy cold
Slide 5
Possible Worlds
• A possible world specifies an assignment to each random variable
• Example: weather in Vancouver, represented by random variables
  - Temperature: {hot, mild, cold}
  - Weather: {sunny, cloudy}
• There are 6 possible worlds:
• w ⊨ f means that proposition f is true in world w
• A probability measure µ(w) over possible worlds w is a nonnegative real number such that
  - µ(w) sums to 1 over all possible worlds w
  - The probability of proposition f is defined by: P(f) = Σ_{w ⊨ f} µ(w), i.e. the sum of the probabilities of the worlds w in which f is true
Because for sure we are in one of these worlds!
Weather Temperature µ(w)
w1 sunny hot 0.10
w2 sunny mild 0.20
w3 sunny cold 0.10
w4 cloudy hot 0.05
w5 cloudy mild 0.35
w6 cloudy cold 0.20
Slide 6
Example
• What’s the probability of it being cloudy or cold?
• µ(w3) + µ(w4) + µ(w5) + µ(w6) = 0.10 + 0.05 + 0.35 + 0.20 = 0.70

Weather Temperature µ(w)
w1 sunny hot 0.10
w2 sunny mild 0.20
w3 sunny cold 0.10
w4 cloudy hot 0.05
w5 cloudy mild 0.35
w6 cloudy cold 0.20
• Remember
  - The probability of proposition f is defined by: P(f) = Σ_{w ⊨ f} µ(w)
  - i.e. the sum of the probabilities of the worlds w in which f is true
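The possible-worlds semantics above can be sketched in a few lines of Python, using the µ(w) table from this slide (the helper name `prob` is ours, not from the lecture):

```python
# Possible worlds for the Weather/Temperature example, each mapped to mu(w).
mu = {
    ("sunny", "hot"): 0.10, ("sunny", "mild"): 0.20, ("sunny", "cold"): 0.10,
    ("cloudy", "hot"): 0.05, ("cloudy", "mild"): 0.35, ("cloudy", "cold"): 0.20,
}

def prob(f):
    """P(f) = sum of mu(w) over the worlds w in which proposition f holds."""
    return sum(p for w, p in mu.items() if f(w))

# The "cloudy or cold" example: worlds w3..w6, 0.10 + 0.05 + 0.35 + 0.20
print(round(prob(lambda w: w[0] == "cloudy" or w[1] == "cold"), 2))  # 0.7
```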
Slide 7
Joint Probability Distribution
• Joint distribution over random variables X1, …, Xn:
  • a probability distribution over the joint random variable <X1, …, Xn> with domain dom(X1) × … × dom(Xn) (the Cartesian product)
• Think of a joint distribution over n variables as the n-dimensional table of the corresponding possible worlds
• Each row corresponds to an assignment X1 = x1, …, Xn = xn and its probability P(X1 = x1, …, Xn = xn)
• E.g., {Weather, Temperature} example
Weather Temperature µ(w)
sunny hot 0.10
sunny mild 0.20
sunny cold 0.10
cloudy hot 0.05
cloudy mild 0.35
cloudy cold 0.20
Definition (probability distribution)
A probability distribution P on a random variable X is a function dom(X) → [0, 1] such that Σ_{x ∈ dom(X)} P(X = x) = 1
Slide 8
Marginalization
Weather µ(w)
sunny 0.40
cloudy 0.60
Wind Weather Temperature µ(w)
yes sunny hot 0.04
yes sunny mild 0.09
yes sunny cold 0.07
yes cloudy hot 0.01
yes cloudy mild 0.10
yes cloudy cold 0.12
no sunny hot 0.06
no sunny mild 0.11
no sunny cold 0.03
no cloudy hot 0.04
no cloudy mild 0.25
no cloudy cold 0.08
Marginalization over Temperature and Wind
• Given the joint distribution, we can compute distributions over subsets of the variables through marginalization:

P(X = x, Y = y) = Σ_{z1 ∈ dom(Z1)} … Σ_{zn ∈ dom(Zn)} P(X = x, Y = y, Z1 = z1, …, Zn = zn)
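In code, the nested sums of marginalization collapse into a single pass over the joint table. A minimal sketch using the Wind/Weather/Temperature table from this slide (the function name is ours):

```python
# Joint distribution over (Wind, Weather, Temperature) from the table above.
joint = {
    ("yes", "sunny", "hot"): 0.04, ("yes", "sunny", "mild"): 0.09,
    ("yes", "sunny", "cold"): 0.07, ("yes", "cloudy", "hot"): 0.01,
    ("yes", "cloudy", "mild"): 0.10, ("yes", "cloudy", "cold"): 0.12,
    ("no", "sunny", "hot"): 0.06, ("no", "sunny", "mild"): 0.11,
    ("no", "sunny", "cold"): 0.03, ("no", "cloudy", "hot"): 0.04,
    ("no", "cloudy", "mild"): 0.25, ("no", "cloudy", "cold"): 0.08,
}

def marginal_weather(joint):
    """P(Weather): sum out Wind and Temperature from the joint."""
    out = {}
    for (wind, weather, temp), p in joint.items():
        out[weather] = out.get(weather, 0.0) + p
    return out

for weather, p in marginal_weather(joint).items():
    print(weather, round(p, 2))  # sunny 0.4, then cloudy 0.6
```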
Slide 9
Lecture Overview
• Recap Lecture 26
• Conditioning
• Inference by Enumeration
• Bayes Rule
• Chain Rule (time permitting)
Slide 10
Conditioning
• Are we done with reasoning under uncertainty? What can happen?
• Remember from last class
  • I roll a fair die. What is ‘the’ (my) probability that the result is a ‘6’?
  • It is 1/6 ≈ 16.7%.
• I now look at the die. What is ‘the’ (my) probability now?
  • My probability is now either 1 or 0, depending on what I observed.
  • Your probability hasn’t changed: 1/6 ≈ 16.7%
• What if I tell some of you the result is even?
  • Their probability increases to 1/3 ≈ 33.3%, if they believe me
• Different agents can have different degrees of belief in (probabilities for) a proposition, based on the evidence they have.
Slide 11
Conditioning
Conditioning: revise beliefs based on new observations
• Build a probabilistic model (the joint probability distribution, JPD)
  • Takes into account all background information
  • Called the prior probability distribution
  • Denote the prior probability for hypothesis h as P(h)
• Observe new information about the world
  • Call all information we received subsequently the evidence e
• Integrate the two sources of information to compute the conditional probability P(h|e)
  • This is also called the posterior probability of h given e

Example
• Prior probability for having a disease (typically small)
• Evidence: a test for the disease comes out positive
  • But diagnostic tests have false positives
• Posterior probability: integrate prior and evidence
Slide 12
Example for conditioning
• You have a prior for the joint distribution of weather and temperature

Possible world  Weather  Temperature  µ(w)
w1 sunny hot 0.10
w2 sunny mild 0.20
w3 sunny cold 0.10
w4 cloudy hot 0.05
w5 cloudy mild 0.35
w6 cloudy cold 0.20
T     P(T | W=sunny)
hot   0.10/0.40 = 0.25
mild  ??
cold  ??

A. 0.20   B. 0.50   C. 0.40   D. 0.80
• Now, you look outside and see that it’s sunny
• You are now certain that you’re in one of worlds w1, w2, or w3
• To get the conditional probability P(T | W=sunny)
  • renormalize µ(w1), µ(w2), µ(w3) to sum to 1
  • µ(w1) + µ(w2) + µ(w3) = 0.10 + 0.20 + 0.10 = 0.40
Slide 13
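The renormalization step above can be sketched directly in Python (the function name `condition` is ours, not the lecture's):

```python
# Prior measure over possible worlds (the weather/temperature table above).
mu = {
    ("sunny", "hot"): 0.10, ("sunny", "mild"): 0.20, ("sunny", "cold"): 0.10,
    ("cloudy", "hot"): 0.05, ("cloudy", "mild"): 0.35, ("cloudy", "cold"): 0.20,
}

def condition(mu, evidence):
    """Keep only the worlds consistent with the evidence, renormalized by P(e)."""
    consistent = {w: p for w, p in mu.items() if evidence(w)}
    p_e = sum(consistent.values())  # here: 0.10 + 0.20 + 0.10 = 0.40
    return {w: p / p_e for w, p in consistent.items()}

posterior = condition(mu, lambda w: w[0] == "sunny")
for (weather, temp), p in posterior.items():
    print(temp, round(p, 2))  # hot 0.25, mild 0.5, cold 0.25
```

This reproduces hot = 0.10/0.40 = 0.25, and gives mild = 0.20/0.40 = 0.50 and cold = 0.10/0.40 = 0.25 for the blanks.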
How can we compute P(h|e)
• What happens in terms of possible worlds if we know the value of a random var (or a set of random vars)?

cavity toothache catch µ(w) µe(w)
T T T .108
T T F .012
T F T .072
T F F .008
F T T .016
F T F .064
F F T .144
F F F .576
e = (cavity = T)
• Some worlds are ruled out (they are inconsistent with the evidence). The others become more likely: their measure is renormalized.
Slide 14
How can we compute P(h|e)

cavity toothache catch µ(w) µcavity=T(w)
T T T .108
T T F .012
T F T .072
T F F .008
F T T .016
F T F .064
F F T .144
F F F .576

P(toothache = F | cavity = T) = Σ_{w ⊨ toothache=F} µ_{cavity=T}(w)

P(h | e) = Σ_{w ⊨ h} µ_e(w)
Slide 15
Semantics of Conditional Probability

• The conditional probability of formula h given evidence e is defined via the conditioned measure µ_e:

  µ_e(w) = (1/P(e)) × µ(w)   if w ⊨ e
  µ_e(w) = 0                 if w ⊭ e

  P(h | e) = Σ_{w ⊨ h} µ_e(w) = (1/P(e)) × Σ_{w ⊨ h ∧ e} µ(w) = P(h ∧ e) / P(e)
Slide 16
Semantics of Conditional Prob.: Example

cavity toothache catch µ(w) µe(w)
T T T .108 .54
T T F .012 .06
T F T .072 .36
T F F .008 .04
F T T .016 0
F T F .064 0
F F T .144 0
F F F .576 0
e = (cavity = T)
P(h | e) = P(toothache = T | cavity = T) = .54 + .06 = 0.6
Slide 17
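A quick check of the table above in Python: build µ_e from µ by zeroing out the worlds inconsistent with e and scaling the rest by 1/P(e), then sum over the worlds where h holds:

```python
# Joint measure mu(w) over (cavity, toothache, catch) from the table above.
mu = {
    (True, True, True): 0.108,   (True, True, False): 0.012,
    (True, False, True): 0.072,  (True, False, False): 0.008,
    (False, True, True): 0.016,  (False, True, False): 0.064,
    (False, False, True): 0.144, (False, False, False): 0.576,
}

p_e = sum(p for (cavity, _, _), p in mu.items() if cavity)  # P(cavity=T) = 0.2
# mu_e: worlds with cavity=F drop to 0; the rest are scaled by 1/P(e)
mu_e = {w: (p / p_e if w[0] else 0.0) for w, p in mu.items()}

p_h = sum(p for (_, toothache, _), p in mu_e.items() if toothache)
print(round(p_h, 2))  # P(toothache=T | cavity=T) = .54 + .06 = 0.6
```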
Lecture Overview
• Recap Lecture 26
• Conditioning
• Inference by Enumeration
• Bayes Rule
• Chain Rule (time permitting)
Slide 18
Inference by Enumeration
Great, we can compute arbitrary probabilities now!
• Given:
  • Prior joint probability distribution (JPD) on a set of variables X
  • specific values e for the evidence variables E (a subset of X)
• We want to compute:
  • the posterior joint distribution of query variables Y (a subset of X) given evidence e
• Step 1: Condition to get distribution P(X|e)
• Step 2: Marginalize to get distribution P(Y|e)
Slide 19
Inference by Enumeration: example
• Given P(X) as the JPD below, and evidence e: “Wind=yes”
• What is the probability that it is hot? I.e., P(Temperature=hot | Wind=yes)
• Step 1: condition to get distribution P(X|e)

Windy W  Cloudy C  Temperature T  P(W, C, T)
yes no hot 0.04
yes no mild 0.09
yes no cold 0.07
yes yes hot 0.01
yes yes mild 0.10
yes yes cold 0.12
no no hot 0.06
no no mild 0.11
no no cold 0.03
no yes hot 0.04
no yes mild 0.25
no yes cold 0.08
Slide 20
Inference by Enumeration: example
• Given P(X) as the JPD below, and evidence e: “Wind=yes”
• What is the probability that it is hot? I.e., P(Temperature=hot | Wind=yes)
• Step 1: condition to get distribution P(X|e)
• P(X|e):

Cloudy C  Temperature T  P(C, T | W=yes)
sunny hot
sunny mild
sunny cold
cloudy hot
cloudy mild
cloudy cold
Windy W
Cloudy C
Temperature T
P(W, C, T)
yes no hot 0.04
yes no mild 0.09
yes no cold 0.07
yes yes hot 0.01
yes yes mild 0.10
yes yes cold 0.12
no no hot 0.06
no no mild 0.11
no no cold 0.03
no yes hot 0.04
no yes mild 0.25
no yes cold 0.08
Slide 21
Inference by Enumeration: example
• Given P(X) as the JPD below, and evidence e: “Wind=yes”
• What is the probability that it is hot? I.e., P(Temperature=hot | Wind=yes)
• Step 1: condition to get distribution P(X|e)

Cloudy C  Temperature T  P(C, T | W=yes)
sunny hot 0.04/0.43 0.10
sunny mild 0.09/0.43 0.21
sunny cold 0.07/0.43 0.16
cloudy hot 0.01/0.43 0.02
cloudy mild 0.10/0.43 0.23
cloudy cold 0.12/0.43 0.28
Windy W
Cloudy C
Temperature T
P(W, C, T)
yes no hot 0.04
yes no mild 0.09
yes no cold 0.07
yes yes hot 0.01
yes yes mild 0.10
yes yes cold 0.12
no no hot 0.06
no no mild 0.11
no no cold 0.03
no yes hot 0.04
no yes mild 0.25
no yes cold 0.08
Slide 22
Inference by Enumeration: example
• Given P(X) as the JPD below, and evidence e: “Wind=yes”
• What is the probability that it is hot? I.e., P(Temperature=hot | Wind=yes)
• Step 2: marginalize to get distribution P(Y|e)
Cloudy C  Temperature T  P(C, T | W=yes)
sunny hot 0.10
sunny mild 0.21
sunny cold 0.16
cloudy hot 0.02
cloudy mild 0.23
cloudy cold 0.28
Temperature T  P(T | W=yes)
hot 0.10+0.02 = 0.12
mild 0.21+0.23 = 0.44
cold 0.16+0.28 = 0.44
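Both steps of the worked example can be sketched end to end in Python. The joint table is the one from these slides (Cloudy written as yes/no, as in the original table); variable names are ours:

```python
# Joint P(Windy, Cloudy, Temperature) from the example's table.
joint = {
    ("yes", "no", "hot"): 0.04,   ("yes", "no", "mild"): 0.09,
    ("yes", "no", "cold"): 0.07,  ("yes", "yes", "hot"): 0.01,
    ("yes", "yes", "mild"): 0.10, ("yes", "yes", "cold"): 0.12,
    ("no", "no", "hot"): 0.06,    ("no", "no", "mild"): 0.11,
    ("no", "no", "cold"): 0.03,   ("no", "yes", "hot"): 0.04,
    ("no", "yes", "mild"): 0.25,  ("no", "yes", "cold"): 0.08,
}

# Step 1: condition on the evidence Wind = yes (renormalize by P(e) = 0.43)
p_e = sum(p for (wind, _, _), p in joint.items() if wind == "yes")
cond = {(c, t): p / p_e for (wind, c, t), p in joint.items() if wind == "yes"}

# Step 2: marginalize out Cloudy to get P(Temperature | Wind=yes)
answer = {}
for (c, t), p in cond.items():
    answer[t] = answer.get(t, 0.0) + p
for t, p in answer.items():
    print(t, round(p, 2))  # hot 0.12, mild 0.44, cold 0.44
```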
Slide 23
Problems of Inference by Enumeration
• If we have n variables, and d is the size of the largest domain
• What is the space complexity to store the joint distribution?
A. O(d^n)   B. O(n^d)   C. O(nd)   D. O(n+d)
Slide 24
Problems of Inference by Enumeration
• If we have n variables, and d is the size of the largest domain
• What is the space complexity to store the joint distribution?
  • We need to store the probability for each possible world
  • There are O(d^n) possible worlds, so the space complexity is O(d^n)
• How do we find the numbers for O(d^n) entries?
• Time complexity O(d^n)
  • In the worst case, we need to sum over all entries in the JPD
• We have some of our basic tools, but to gain computational efficiency we need to do more
  • We will exploit (conditional) independence between variables
  • But first, let’s look at a neat application of conditioning
Slide 25
Lecture Overview
• Recap Lecture 26
• Conditioning
• Inference by Enumeration
• Bayes Rule
• Chain Rule (time permitting)
Slide 26
Using conditional probability
• Often you have causal knowledge (forward, from cause to evidence):
  • For example
    P(symptom | disease)
    P(light is off | status of switches and switch positions)
    P(alarm | fire)
  • In general: P(evidence e | hypothesis h)
• ... and you want to do evidential reasoning (backwards, from evidence to cause):
  • For example
    P(disease | symptom)
    P(status of switches | light is off and switch positions)
    P(fire | alarm)
  • In general: P(hypothesis h | evidence e)
Slide 27
Bayes Rule
Bayes Rule
• By definition, we know that:

  P(h | e) = P(h ∧ e) / P(e)        and        P(e | h) = P(e ∧ h) / P(h)

• We can rearrange terms to write

  P(h ∧ e) = P(h | e) × P(e)    (1)
  P(e ∧ h) = P(e | h) × P(h)    (2)

• But

  P(h ∧ e) = P(e ∧ h)    (3)

• From (1), (2) and (3) we can derive Bayes Rule:

  P(h | e) = P(e | h) × P(h) / P(e)
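Bayes Rule can be put to work on the disease-test story from earlier in the lecture. The numbers below (prior 1%, sensitivity 95%, false-positive rate 5%) are illustrative assumptions of ours, not from the slides:

```python
# Hypothetical numbers, chosen for illustration only.
p_disease = 0.01            # prior P(h)
p_pos_given_disease = 0.95  # P(e | h): test sensitivity
p_pos_given_healthy = 0.05  # P(e | not h): false-positive rate

# P(e) by summing over both hypotheses (total probability)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes Rule: P(h | e) = P(e | h) * P(h) / P(e)
posterior = p_pos_given_disease * p_disease / p_pos
print(round(posterior, 3))  # 0.161: still small despite the positive test
```

The small prior dominates: even with a fairly accurate test, most positives come from the large healthy population.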
Slide 28
Learning Goals For Today’s Class
• Define, use and prove the formula to compute conditional probability P(h|e)
• Use inference by enumeration
  • to compute joint posterior probability distributions over any subset of variables given evidence
• Define Bayes Rule
Slide 29
Lecture Overview
• Recap Lecture 27
• Conditioning
• Inference by Enumeration
• Bayes Rule
• Chain Rule (time permitting)
Slide 30
Product Rule
• By definition, we know that:

  P(f2 | f1) = P(f1 ∧ f2) / P(f1)

• We can rewrite this to

  P(f1 ∧ f2) = P(f2 | f1) × P(f1)

• In general
Slide 31
Chain Rule

Applying the product rule repeatedly:

P(f1 ∧ f2 ∧ … ∧ fn) = P(fn | f1 ∧ … ∧ fn−1) × P(f1 ∧ … ∧ fn−1)
                    = P(fn | f1 ∧ … ∧ fn−1) × P(fn−1 | f1 ∧ … ∧ fn−2) × P(f1 ∧ … ∧ fn−2)
                    = …
                    = P(fn | f1 ∧ … ∧ fn−1) × … × P(f2 | f1) × P(f1)

Slide 36
Why does the chain rule help us?
We will see how, under specific circumstances (variable independence), this rule helps gain compactness
• We can represent the JPD as a product of marginal distributions
• We can simplify some terms when the variables involved are independent or conditionally independent
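As a sketch, the chain-rule factorization can be checked numerically against the joint distribution used in the inference-by-enumeration example (Cloudy written as yes/no; the helper `p` is our name):

```python
# Joint P(Windy, Cloudy, Temperature) from the earlier example.
joint = {
    ("yes", "no", "hot"): 0.04,   ("yes", "no", "mild"): 0.09,
    ("yes", "no", "cold"): 0.07,  ("yes", "yes", "hot"): 0.01,
    ("yes", "yes", "mild"): 0.10, ("yes", "yes", "cold"): 0.12,
    ("no", "no", "hot"): 0.06,    ("no", "no", "mild"): 0.11,
    ("no", "no", "cold"): 0.03,   ("no", "yes", "hot"): 0.04,
    ("no", "yes", "mild"): 0.25,  ("no", "yes", "cold"): 0.08,
}

def p(pred):
    """Probability of a proposition: sum over the worlds where it holds."""
    return sum(q for w, q in joint.items() if pred(w))

# Chain rule: P(W, C, T) = P(W) * P(C | W) * P(T | W, C) for every row.
for (wind, cloudy, temp), q in joint.items():
    p_w = p(lambda w: w[0] == wind)                 # P(W)
    p_wc = p(lambda w: w[:2] == (wind, cloudy))     # P(W, C)
    product = p_w * (p_wc / p_w) * (q / p_wc)       # P(W) * P(C|W) * P(T|W,C)
    assert abs(product - q) < 1e-12
print("chain-rule factorization matches every entry of the JPD")
```

The gain comes later: with (conditional) independence, the conditional factors shrink, so the product needs far fewer numbers than the full table.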
Slide 37