
Computer Science CPSC 322

Lecture 27

Conditioning

Ch 6.1.3

Slide 1


Lecture Overview

• Recap Lecture 26
• Conditioning
• Inference by Enumeration
• Bayes Rule
• Chain Rule (time permitting)

Slide 2


Probability as a measure of uncertainty/ignorance
• Probability measures an agent's degree of belief in truth of propositions about states of the world
• Belief in a proposition f can be measured in terms of a number between 0 and 1
• this is the probability of f
• E.g. P("roll of fair die came out as a 6") = 1/6 ≈ 16.7% = 0.167
• P(f) = 0 means that f is believed to be definitely false
• P(f) = 1 means f is believed to be definitely true
• Using probabilities between 0 and 1 is purely a convention.

Slide 3


Probability Theory and Random Variables

• Probability Theory
• system of logical axioms and formal operations for sound reasoning under uncertainty

• Basic element: random variable X
• X is a variable like the ones we have seen in CSP/Planning/Logic
• but the agent can be uncertain about the value of X
• We will focus on Boolean and categorical variables
Boolean: e.g., Cancer (does the patient have cancer or not?)
Categorical: e.g., CancerType could be one of {breastCancer, lungCancer, skinMelanomas}

Slide 4


Possible Worlds
• A possible world specifies an assignment to each random variable
• Example: weather in Vancouver, represented by random variables
- Temperature: {hot, mild, cold}
- Weather: {sunny, cloudy}

• There are 6 possible worlds:

• w ⊨ f means that proposition f is true in world w
• A probability measure µ(w) over possible worlds w is a nonnegative real number such that
- µ(w) sums to 1 over all possible worlds w

Because for sure we are in one of these worlds!

Weather Temperature

w1 sunny hot

w2 sunny mild

w3 sunny cold

w4 cloudy hot

w5 cloudy mild

w6 cloudy cold

Slide 5


Possible Worlds
• A possible world specifies an assignment to each random variable
• Example: weather in Vancouver, represented by random variables
- Temperature: {hot, mild, cold}
- Weather: {sunny, cloudy}

• There are 6 possible worlds:

• w ⊨ f means that proposition f is true in world w
• A probability measure µ(w) over possible worlds w is a nonnegative real number such that
- µ(w) sums to 1 over all possible worlds w
- The probability of proposition f is defined by: P(f) = Σ_{w ⊨ f} µ(w), i.e. the sum of the probabilities of the worlds w in which f is true

Because for sure we are in one of these worlds!

Weather Temperature µ(w)

w1 sunny hot 0.10

w2 sunny mild 0.20

w3 sunny cold 0.10

w4 cloudy hot 0.05

w5 cloudy mild 0.35

w6 cloudy cold 0.20

Slide 6


Example
• What's the probability of it being cloudy or cold?
• µ(w3) + µ(w4) + µ(w5) + µ(w6) = 0.7

Weather Temperature µ(w)

w1 sunny hot 0.10

w2 sunny mild 0.20

w3 sunny cold 0.10

w4 cloudy hot 0.05

w5 cloudy mild 0.35

w6 cloudy cold 0.20

• Remember

- The probability of proposition f is defined by: P(f) = Σ_{w ⊨ f} µ(w)
- the sum of the probabilities of the worlds w in which f is true

Slide 7
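A minimal Python sketch (mine, not from the lecture) of this definition, using the µ(w) values from the table above; the helper name prob is my own:

```python
# Possible worlds for the Weather/Temperature example, with their measures mu(w).
worlds = [
    ("sunny",  "hot",  0.10),
    ("sunny",  "mild", 0.20),
    ("sunny",  "cold", 0.10),
    ("cloudy", "hot",  0.05),
    ("cloudy", "mild", 0.35),
    ("cloudy", "cold", 0.20),
]

def prob(f):
    """P(f): sum of mu(w) over all possible worlds w in which proposition f is true."""
    return sum(mu for (weather, temp, mu) in worlds if f(weather, temp))

# "cloudy or cold" holds in w3, w4, w5, w6: 0.10 + 0.05 + 0.35 + 0.20
print(prob(lambda weather, temp: weather == "cloudy" or temp == "cold"))  # ~0.70
```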


Joint Probability Distribution

• Joint distribution over random variables X1, …, Xn:
• a probability distribution over the joint random variable <X1, …, Xn> with domain dom(X1) × … × dom(Xn) (the Cartesian product)

• Think of a joint distribution over n variables as the n-dimensional table of the corresponding possible worlds
• Each row corresponds to an assignment X1 = x1, …, Xn = xn and its probability P(X1 = x1, …, Xn = xn)

• E.g., {Weather, Temperature} example

Weather Temperature µ(w)

sunny hot 0.10

sunny mild 0.20

sunny cold 0.10

cloudy hot 0.05

cloudy mild 0.35

cloudy cold 0.20

Definition (probability distribution)
A probability distribution P on a random variable X is a function P: dom(X) → [0,1] such that Σ_{x ∈ dom(X)} P(X = x) = 1

Slide 8


Marginalization

Weather µ(w)

sunny 0.40

cloudy

Wind Weather Temperature µ(w)

yes sunny hot 0.04

yes sunny mild 0.09

yes sunny cold 0.07

yes cloudy hot 0.01

yes cloudy mild 0.10

yes cloudy cold 0.12

no sunny hot 0.06

no sunny mild 0.11

no sunny cold 0.03

no cloudy hot 0.04

no cloudy mild 0.25

no cloudy cold 0.08

Marginalization over Temperature and Wind

• Given the joint distribution, we can compute distributions over subsets of the variables through marginalization:

P(X=x, Y=y) = Σ_{z1 ∈ dom(Z1)} … Σ_{zn ∈ dom(Zn)} P(X=x, Y=y, Z1=z1, …, Zn=zn)

Slide 9
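A short Python sketch (my own, not part of the slides) of this marginalization over Temperature and Wind, using the joint table above:

```python
# Joint distribution P(Wind, Weather, Temperature) as given on the slide.
joint = {
    ("yes", "sunny",  "hot"):  0.04, ("yes", "sunny",  "mild"): 0.09,
    ("yes", "sunny",  "cold"): 0.07, ("yes", "cloudy", "hot"):  0.01,
    ("yes", "cloudy", "mild"): 0.10, ("yes", "cloudy", "cold"): 0.12,
    ("no",  "sunny",  "hot"):  0.06, ("no",  "sunny",  "mild"): 0.11,
    ("no",  "sunny",  "cold"): 0.03, ("no",  "cloudy", "hot"):  0.04,
    ("no",  "cloudy", "mild"): 0.25, ("no",  "cloudy", "cold"): 0.08,
}

# Sum out Wind and Temperature to obtain the marginal P(Weather).
p_weather = {}
for (wind, weather, temp), p in joint.items():
    p_weather[weather] = p_weather.get(weather, 0.0) + p

print(p_weather)  # {'sunny': ~0.40, 'cloudy': ~0.60}
```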


Lecture Overview

• Recap Lecture 26
• Conditioning
• Inference by Enumeration
• Bayes Rule
• Chain Rule (time permitting)

Slide 10


Conditioning
• Are we done with reasoning under uncertainty? What can happen?
• Remember from last class
I roll a fair die. What is 'the' (my) probability that the result is a '6'?
• It is 1/6 ≈ 16.7%.

• I now look at the die. What is 'the' (my) probability now?
• My probability is now either 1 or 0, depending on what I observed.
• Your probability hasn't changed: 1/6 ≈ 16.7%

• What if I tell some of you the result is even?
• Their probability increases to 1/3 ≈ 33.3%, if they believe me

• Different agents can have different degrees of belief in (probabilities for) a proposition, based on the evidence they have.

Slide 11


Conditioning
Conditioning: revise beliefs based on new observations

• Build a probabilistic model (the joint probability distribution, JPD)
Takes into account all background information
Called the prior probability distribution
Denote the prior probability for hypothesis h as P(h)

• Observe new information about the world
Call all information we received subsequently the evidence e

• Integrate the two sources of information to compute the conditional probability P(h|e)
This is also called the posterior probability of h given e.

Example
• Prior probability for having a disease (typically small)
• Evidence: a test for the disease comes out positive
But diagnostic tests have false positives
• Posterior probability: integrate prior and evidence

Slide 12


Example for conditioning
• You have a prior for the joint distribution of weather and temperature

Possible world  Weather  Temperature  µ(w)

w1 sunny hot 0.10

w2 sunny mild 0.20

w3 sunny cold 0.10

w4 cloudy hot 0.05

w5 cloudy mild 0.35

w6 cloudy cold 0.20

• Now, you look outside and see that it's sunny
• You are now certain that you're in one of worlds w1, w2, or w3
• To get the conditional probability P(T|W=sunny)
• renormalize µ(w1), µ(w2), µ(w3) to sum to 1
• µ(w1) + µ(w2) + µ(w3) = 0.10 + 0.20 + 0.10 = 0.40

T P(T|W=sunny)
hot 0.10/0.40 = 0.25
mild ??
cold ??

A. 0.20   B. 0.50   C. 0.40   D. 0.80

Slide 13
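A minimal Python sketch (mine, not from the slides) of this renormalization step:

```python
# Prior over (Weather, Temperature) from the table above.
prior = {
    ("sunny",  "hot"):  0.10, ("sunny",  "mild"): 0.20, ("sunny",  "cold"): 0.10,
    ("cloudy", "hot"):  0.05, ("cloudy", "mild"): 0.35, ("cloudy", "cold"): 0.20,
}

# Observation: Weather = sunny. Keep only the consistent worlds (w1, w2, w3) ...
consistent = {temp: p for (weather, temp), p in prior.items() if weather == "sunny"}
# ... and renormalize so they sum to 1 (dividing by P(Weather=sunny) = 0.40).
norm = sum(consistent.values())
posterior = {temp: p / norm for temp, p in consistent.items()}

print(posterior)  # ≈ {'hot': 0.25, 'mild': 0.50, 'cold': 0.25}
```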


How can we compute P(h|e)?
• What happens in terms of possible worlds if we know the value of a random variable (or a set of random variables)?

cavity toothache catch µ(w) µ_e(w)
T T T .108

T T F .012

T F T .072

T F F .008

F T T .016

F T F .064

F F T .144

F F F .576

e = (cavity = T)

• Some worlds are ruled out (they get probability 0). The others become more likely (they are renormalized).

Slide 14


How can we compute P(h|e)

cavity toothache catch µ(w) µ_{cavity=T}(w)
T T T .108

T T F .012

T F T .072

T F F .008

F T T .016

F T F .064

F F T .144

F F F .576

P(toothache = F | cavity = T) = Σ_{w ⊨ toothache=F} µ_{cavity=T}(w)

In general: P(h | e) = Σ_{w ⊨ h} µ_e(w)

Slide 15


Semantics of Conditional Probability

• Conditioning on evidence e defines a new measure µ_e(w) over possible worlds:
µ_e(w) = µ(w) / P(e)   if w ⊨ e
µ_e(w) = 0             if w ⊭ e

• The conditional probability of formula h given evidence e is
P(h | e) = Σ_{w ⊨ h} µ_e(w) = (1 / P(e)) Σ_{w ⊨ h ∧ e} µ(w) = P(h ∧ e) / P(e)

Slide 16


Semantics of Conditional Prob.: Example

cavity toothache catch µ(w) µ_e(w)
T T T .108 .54

T T F .012 .06

T F T .072 .36

T F F .008 .04

F T T .016 0

F T F .064 0

F F T .144 0

F F F .576 0

e = (cavity = T)

P(h | e) = P(toothache = T | cavity = T) = 0.54 + 0.06 = 0.6

Slide 17
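Equivalently, using the unnormalized µ(w) values and the formula from Slide 16: P(toothache = T | cavity = T) = P(toothache = T ∧ cavity = T) / P(cavity = T) = (0.108 + 0.012) / (0.108 + 0.012 + 0.072 + 0.008) = 0.12 / 0.2 = 0.6.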


Lecture Overview

• Recap Lecture 26
• Conditioning
• Inference by Enumeration
• Bayes Rule
• Chain Rule (time permitting)

Slide 18


Inference by Enumeration

Great, we can compute arbitrary probabilities now!
• Given:
• Prior joint probability distribution (JPD) on set of variables X
• specific values e for the evidence variables E (subset of X)
• We want to compute:
• posterior joint distribution of query variables Y (a subset of X) given evidence e

• Step 1: Condition to get distribution P(X|e)
• Step 2: Marginalize to get distribution P(Y|e)
(see the code sketch after this slide)

Slide 19
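A sketch of these two steps in Python (my own structure and naming, not from the lecture); a full assignment is stored as a tuple of (variable, value) pairs:

```python
def infer(joint, query_vars, evidence):
    """Inference by enumeration over a joint distribution.

    joint:      dict mapping full assignments -- tuples of (variable, value) pairs -- to probabilities
    query_vars: list of query variable names Y
    evidence:   dict {variable: observed value} describing the evidence e
    """
    # Step 1: condition -- keep only worlds consistent with e and renormalize by P(e).
    consistent = {
        world: p for world, p in joint.items()
        if all(dict(world).get(var) == val for var, val in evidence.items())
    }
    p_e = sum(consistent.values())

    # Step 2: marginalize -- sum the conditioned probabilities onto the query variables Y.
    posterior = {}
    for world, p in consistent.items():
        assignment = dict(world)
        key = tuple((var, assignment[var]) for var in query_vars)
        posterior[key] = posterior.get(key, 0.0) + p / p_e
    return posterior
```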


Inference by Enumeration: example
• Given P(X) as JPD below, and evidence e: "Wind=yes"
• What is the probability that it is hot? I.e., P(Temperature=hot | Wind=yes)
• Step 1: condition to get distribution P(X|e)

Windy W  Cloudy C  Temperature T  P(W, C, T)

yes no hot 0.04

yes no mild 0.09

yes no cold 0.07

yes yes hot 0.01

yes yes mild 0.10

yes yes cold 0.12

no no hot 0.06

no no mild 0.11

no no cold 0.03

no yes hot 0.04

no yes mild 0.25

no yes cold 0.08

Slide 20


Inference by Enumeration: example
• Given P(X) as JPD below, and evidence e: "Wind=yes"
• What is the probability that it is hot? I.e., P(Temperature=hot | Wind=yes)
• Step 1: condition to get distribution P(X|e)

• P(X|e):

Cloudy C  Temperature T  P(C, T | W=yes)

no hot
no mild
no cold
yes hot
yes mild
yes cold

Windy W  Cloudy C  Temperature T  P(W, C, T)

yes no hot 0.04

yes no mild 0.09

yes no cold 0.07

yes yes hot 0.01

yes yes mild 0.10

yes yes cold 0.12

no no hot 0.06

no no mild 0.11

no no cold 0.03

no yes hot 0.04

no yes mild 0.25

no yes cold 0.08

 

Slide 21


Inference by Enumeration: example
• Given P(X) as JPD below, and evidence e: "Wind=yes"
• What is the probability that it is hot? I.e., P(Temperature=hot | Wind=yes)
• Step 1: condition to get distribution P(X|e)

Cloudy C  Temperature T  P(C, T | W=yes)

no hot 0.04/0.43 ≈ 0.10
no mild 0.09/0.43 ≈ 0.21
no cold 0.07/0.43 ≈ 0.16
yes hot 0.01/0.43 ≈ 0.02
yes mild 0.10/0.43 ≈ 0.23
yes cold 0.12/0.43 ≈ 0.28

Windy W  Cloudy C  Temperature T  P(W, C, T)

yes no hot 0.04

yes no mild 0.09

yes no cold 0.07

yes yes hot 0.01

yes yes mild 0.10

yes yes cold 0.12

no no hot 0.06

no no mild 0.11

no no cold 0.03

no yes hot 0.04

no yes mild 0.25

no yes cold 0.08

 

Slide 22


Inference by Enumeration: example
• Given P(X) as JPD below, and evidence e: "Wind=yes"
• What is the probability that it is hot? I.e., P(Temperature=hot | Wind=yes)
• Step 2: marginalize to get distribution P(Y|e)

Cloudy C  Temperature T  P(C, T | W=yes)

no hot 0.10
no mild 0.21
no cold 0.16
yes hot 0.02
yes mild 0.23
yes cold 0.28

Temperature T  P(T | W=yes)

hot 0.10+0.02 = 0.12

mild 0.21+0.23 = 0.44

cold 0.16+0.28 = 0.44

Slide 23
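Continuing the sketch from Slide 19 (same hypothetical infer function), applied to this example's joint table:

```python
rows = [
    ("yes", "no",  "hot", 0.04), ("yes", "no",  "mild", 0.09), ("yes", "no",  "cold", 0.07),
    ("yes", "yes", "hot", 0.01), ("yes", "yes", "mild", 0.10), ("yes", "yes", "cold", 0.12),
    ("no",  "no",  "hot", 0.06), ("no",  "no",  "mild", 0.11), ("no",  "no",  "cold", 0.03),
    ("no",  "yes", "hot", 0.04), ("no",  "yes", "mild", 0.25), ("no",  "yes", "cold", 0.08),
]
joint = {
    (("Wind", w), ("Cloudy", c), ("Temperature", t)): p
    for w, c, t, p in rows
}

# P(Temperature | Wind=yes): hot ≈ 0.12, mild ≈ 0.44, cold ≈ 0.44 (matching the table above up to rounding).
print(infer(joint, ["Temperature"], {"Wind": "yes"}))
```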


Problems of Inference by Enumeration

• If we have n variables, and d is the size of the largest domain

• What is the space complexity to store the joint distribution?

A. O(d^n)   B. O(n^d)   C. O(nd)   D. O(n+d)

Slide 24


Problems of Inference by Enumeration

• If we have n variables, and d is the size of the largest domain
• What is the space complexity to store the joint distribution?
• We need to store the probability for each possible world
• There are O(d^n) possible worlds, so the space complexity is O(d^n)

• How do we find the numbers for O(d^n) entries?
• Time complexity O(d^n)
• In the worst case, need to sum over all entries in the JPD

• We have some of our basic tools, but to gain computational efficiency we need to do more
• We will exploit (conditional) independence between variables
• But first, let's look at a neat application of conditioning

Slide 25


Lecture Overview

• Recap Lecture 26
• Conditioning
• Inference by Enumeration
• Bayes Rule
• Chain Rule (time permitting)

Slide 26


Using conditional probability
• Often you have causal knowledge (forward from cause to evidence):
• For example
P(symptom | disease)
P(light is off | status of switches and switch positions)
P(alarm | fire)
• In general: P(evidence e | hypothesis h)

• ... and you want to do evidential reasoning (backwards from evidence to cause):
• For example
P(disease | symptom)
P(status of switches | light is off and switch positions)
P(fire | alarm)
• In general: P(hypothesis h | evidence e)

Slide 27


Bayes Rule

• By definition, we know that:
P(h | e) = P(h ∧ e) / P(e)     and     P(e | h) = P(e ∧ h) / P(h)

• We can rearrange terms to write
P(h ∧ e) = P(h | e) P(e)   (1)
P(e ∧ h) = P(e | h) P(h)   (2)

• But
P(h ∧ e) = P(e ∧ h)   (3)

• From (1), (2) and (3) we can derive Bayes Rule:
P(h | e) = P(e | h) P(h) / P(e)

Slide 28
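A small numeric sketch of Bayes rule on the disease/test example from Slide 12; all numbers below are illustrative assumptions of mine, not values from the lecture:

```python
# Assumed numbers, for illustration only.
p_h = 0.01              # prior P(h): having the disease
p_e_given_h = 0.90      # P(e | h): test positive given disease
p_e_given_not_h = 0.10  # P(e | not h): false-positive rate

# P(e) by summing over both cases of h (marginalization).
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Bayes rule: P(h | e) = P(e | h) P(h) / P(e)
p_h_given_e = p_e_given_h * p_h / p_e
print(round(p_h_given_e, 3))  # 0.083: much larger than the prior, but still small
```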


Learning Goals For Today's Class

• Define, use and prove the formula to compute conditional probability P(h|e)
• Use inference by enumeration
• to compute joint posterior probability distributions over any subset of variables given evidence
• Define Bayes Rule

Slide 29


Lecture Overview

• Recap Lecture 27
• Conditioning
• Inference by Enumeration
• Bayes Rule
• Chain Rule (time permitting)

Slide 30


Product Rule
• By definition, we know that:
P(f2 | f1) = P(f2 ∧ f1) / P(f1)

• We can rewrite this to
P(f2 ∧ f1) = P(f2 | f1) P(f1)

• In general
P(fn ∧ … ∧ f1) = P(fn | fn−1 ∧ … ∧ f1) P(fn−1 ∧ … ∧ f1)

Slide 31


Chain Rule 

Slide 32
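For reference, applying the product rule repeatedly gives the general chain rule over n propositions:

P(f1 ∧ f2 ∧ … ∧ fn)
= P(fn | f1 ∧ … ∧ fn−1) P(f1 ∧ … ∧ fn−1)
= P(fn | f1 ∧ … ∧ fn−1) P(fn−1 | f1 ∧ … ∧ fn−2) … P(f2 | f1) P(f1)
= Π_{i=1..n} P(fi | f1 ∧ … ∧ fi−1)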


Chain Rule 

Slide 33


Chain Rule 

Slide 34


Chain Rule 

Slide 35


Chain Rule

 

 

Slide 36


Why does the chain rule help us?

We will see how, under specific circumstances (variable independence), this rule helps gain compactness

• We can represent the JPD as a product of marginal distributions

• We can simplify some terms when the variables involved are independent or conditionally independent

Slide 37
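For instance, if all n Boolean variables happen to be mutually independent, the chain rule factorization collapses to P(X1, …, Xn) = P(X1) P(X2) … P(Xn), which can be stored with n numbers instead of the 2^n − 1 needed for the full joint distribution.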