31 Computing Science & Mathematics University of Stirling
CSC9T6 Information Systems Computing Science & Mathematics University of Stirling
Recap:
SUM RULE
P(A or B) = P(A ∪ B) = P(A) + P(B) with A, B disjoint;
P(A or B) = P(A) + P(B) - P(A ∩ B) otherwise.
CONDITIONAL PROBABILITY OF X GIVEN Y
P(X=A | Y=A)
PRODUCT RULE
P(X=A, Y=A) = P(X=A | Y=A) * P(Y=A)
INDEPENDENT EVENTS
P(X,Y) = P(X) P(Y), i.e. P(X | Y) = P(X)
Note that: P(A or B) = P(B or A) and P(A, B) = P(B, A), but P(A | B) ≠ P(B | A) in general.
Probabilities
Clearly P(X, Y) = P(Y, X), by symmetry of ∩ -- BUT P(X | Y) ≠ P(Y | X).
By definition and the product rule: P(X | Y) P(Y) = P(Y | X) P(X).
Note again that if the knowledge about Y does not change the probability of X, i.e. P(X | Y) = P(X) then the two events are said to be independent, and P(X,Y) = P(X) P(Y), as in the case of picking two cards from different decks (or reinserting the card after each test).
It follows (Bayes' Theorem):

P(X | Y) = P(Y | X) P(X) / P(Y)
Important result. Informally: knowing the probability of Y given X, together with the marginal probabilities of X and of Y, we can derive the probability of X given Y. We will see an example soon.
Probabilities: Bayes' Theorem
Summing up:
SUM RULE
P(A or B) = P(A ∪ B) = P(A) + P(B) with A, B disjoint;
P(A or B) = P(A) + P(B) - P(A ∩ B) otherwise.
CONDITIONAL PROBABILITY OF X GIVEN Y
P(X=A | Y=A)
PRODUCT RULE
P(X=A, Y=A) = P(X=A | Y=A) * P(Y=A)
INDEPENDENT EVENTS
P(X,Y) = P(X) P(Y), i.e. P(X | Y) = P(X)
BAYES' THEOREM
P(X | Y) = P(Y | X) P(X) / P(Y)
Probabilities
We have Box and Fruit, random variables. In the red box (B=r) there are 2 apples (a) and 6 oranges (o). In the blue box (B=b) there are 3 apples and 1 orange.
- If I pick a fruit from the red box, what would you expect?
- How can you express this?
- Map all conditional probabilities of fruit | box.
- WORKING hypothesis: P(B=r) = 4/10, P(B=b) = 6/10
Apples and oranges in boxes [Bishop]
P(F=a) = ?
It is the sum of the events "pick an apple from red" or "pick an apple from blue":
P( (F=a, B=r) or (F=a, B=b) )
= P(F=a, B=r) + P(F=a, B=b)                  (sum rule)
= P(F=a | B=r) P(B=r) + P(F=a | B=b) P(B=b)  (product rule)
= 0.25 * 0.4 + 0.75 * 0.6 = 0.55
Hence P(F=o) = 1 - 0.55 = 0.45 (sum rule)
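The marginalisation above can be sketched in a few lines of Python (the dictionary encoding is mine, not from the slides):

```python
# Conditional probabilities of fruit given box, from the box contents
p_fruit_given_box = {
    "r": {"a": 2 / 8, "o": 6 / 8},   # red box: 2 apples, 6 oranges
    "b": {"a": 3 / 4, "o": 1 / 4},   # blue box: 3 apples, 1 orange
}
p_box = {"r": 0.4, "b": 0.6}         # the working hypothesis

# Sum rule over boxes, product rule inside each term
p_apple = sum(p_fruit_given_box[b]["a"] * p_box[b] for b in p_box)
print(round(p_apple, 2))       # 0.55
print(round(1 - p_apple, 2))   # P(F=o) = 0.45
```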
Apples and oranges in boxes [Bishop]
Let us try to "invert" our reasoning.
Suppose the two boxes have the same probability. If I observe an orange, on which box would you bet?
Apples and oranges in boxes [Bishop]
Can we make this more precise, even in the original case, where the boxes have associated probabilities? How? Which probability are we looking for?

P(B=r | F=o) = P(F=o | B=r) P(B=r) / P(F=o)    (Bayes' theorem)
             = (0.75 * 0.4) / 0.45 = 0.6666...
P(B=b | F=o) ?
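The "inversion" can be written out as a short sketch (the variable names are mine); it answers both posteriors at once:

```python
p_box = {"r": 0.4, "b": 0.6}
p_orange_given_box = {"r": 6 / 8, "b": 1 / 4}

# Marginal P(F=o) by the sum and product rules
p_orange = sum(p_orange_given_box[b] * p_box[b] for b in p_box)  # 0.45

# Posterior over boxes, given that an orange was observed (Bayes' theorem)
posterior = {b: p_orange_given_box[b] * p_box[b] / p_orange for b in p_box}
print(round(posterior["r"], 4))  # 0.6667
print(round(posterior["b"], 4))  # 0.3333
```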
Apples and oranges in boxes [Bishop]
Are B and F independent ?
P(F=o | B=r) = 0.75 ≠ 0.45 = P(F=o), so no: B and F are not independent.
Apples and oranges in boxes [Bishop]
A note on the prior/posterior "inversion" we are going through...
Suppose the two boxes have the same probability. If I observe an orange, on which box would you bet? Probably on the red one (posterior 0.75). Remember that, without any knowledge of the fruit, one knows that the blue box is the more probable one (0.6). Via Bayes' theorem, a subsequent observation (F=o) modifies our prior probability into a posterior probability.
Apples and oranges in boxes [Bishop]
P(rain) = 0.4
P(grass is wet) = 0.6 (rain, watering, ...)
P(R, W) ?
Independent? P(W | R) = 1 ≠ 0.6 = P(W). No.
Easily, P(W | R) = 1, so P(R, W) = P(W | R) * P(R) = 1 * 0.4 = 0.4
P(R | W) ? P(R | W) = P(W, R)/P(W) = 0.4 / 0.6 ≈ 0.67
NOTE: P(A | B) cannot be calculated from P(A) and P(B) alone.
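The grass-and-rain numbers can be checked in a couple of lines (a sketch; the names are mine). Note how the conditional P(W | R) is the extra ingredient beyond P(R) and P(W):

```python
p_rain = 0.4
p_wet = 0.6
p_wet_given_rain = 1.0                  # rain always wets the grass

p_joint = p_wet_given_rain * p_rain     # product rule: P(R, W) = 0.4
p_rain_given_wet = p_joint / p_wet      # P(R | W) = 0.4 / 0.6

print(round(p_joint, 3))                # 0.4
print(round(p_rain_given_wet, 3))       # 0.667
```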
Grass and rain
Thomas Bayes (1701 - 1761) Logic and theology at Edinburgh. FRS 1742. Probability relevant for gambling and the new concept of insurance. "Essay towards solving a problem in the doctrine of chances" (1764 - three years after his death). [Then Laplace independently rediscovered the theory in a general form].
(Informally) An interpretation of probability as measure of uncertainty, based on so-far observed facts that can be revised. Remember the case of oranges and boxes.
Pierre-Simon Laplace (1749-1827) "Probability theory is nothing else but common sense reduced to calculation". Discussion about the same ideas of Bayes (inverse probability calculation), with applications to life expectancy, jurisprudence, planetary masses, error estimation (1812).
Probabilities
Outline.
1. Introduction
2. Probability
3. Bayesian classification
• Bayesian Classifiers are statistical classifiers – based on Bayes' Theorem (see following slides)
• They can predict the probability that a particular sample is a member of a particular class
• Perhaps the simplest Bayesian Classifier is known as the Naïve Bayesian Classifier based on an independence assumption (see later on …)
• In very simple terms, this means that we assume the values taken by one variable are not influenced by the values taken by another: no relationship exists between them
• Although the independence assumption is often a bold one to make, performance is still often comparable to Decision Tree and Neural Network classifiers (explore on Wikipedia, and the references therein, about "the surprisingly good performance of Naive Bayes in classification")
What is a Bayesian Classifier?
Bayes' Classifier: An Example
Examples of things we can derive from our dataset:
• 4 males took advantage of the Mag.(azine) Promo, and these 4 males represent 2/3 of the total male population,
• 3/4 of females purchased the Mag. Promo.

Mag. Promo   TV Promo   LI Promo   C.C. Ins   Sex
Y            N          N          N          M
Y            Y          Y          N          F
N            N          N          N          M
Y            Y          Y          Y          M
Y            N          Y          N          F
N            N          N          N          F
Y            N          Y          Y          M
N            Y          N          N          M
Y            N          N          N          M
Y            Y          Y          Y          F

For our example, let's use sex as the output attribute whose value is to be predicted.
Bayes' Classifier: An Example
Suppose we want to classify a new instance (or customer), called Lee. We are told the following holds true for our new customer, i.e. this is our evidence E:

Mag. Promo = Y and TV Promo = Y and LI Promo = N and C.C. Ins. = N

We want to know if Lee is male (H1) or female (H2). Note that there was no example of YYNN in the data. We apply Bayes' classifier and compute a probability for each hypothesis.

Mag. Promo   TV Promo   LI Promo   C.C. Ins   Sex
Y            N          N          N          M
Y            Y          Y          N          F
N            N          N          N          M
Y            Y          Y          Y          M
Y            N          Y          N          F
N            N          N          N          F
Y            N          Y          Y          M
N            Y          N          N          M
Y            N          N          N          M
Y            Y          Y          Y          F
• Firstly we list the distribution of the output attribute values for each input attribute.
• This is done using a distribution table.

Bayes' Classifier: An Example

              Mag Promo     TV Promo      LI Promo      C.C. Ins
              M      F      M      F      M      F      M      F
Y             4      3      2      2      2      3      2      1
N             2      1      4      2      4      1      4      3
Ratio Y/Tot   4/6    3/4    2/6    2/4    2/6    3/4    2/6    1/4
Ratio N/Tot   2/6    1/4    4/6    2/4    4/6    1/4    4/6    3/4
So for example, 4 males answered Y to the Mag Promo
2 out of the total of 6 males answered Y to the LI Promo
Ratio for Y/T and N/T sum to 1 for each column
H1: Lee is male
P(sex = M | E) = P(E | sex = M) P(sex = M) / P(E)
Starting with P(E | sex = M) … This is
P(Mag. Promo = Y, TV Promo = Y, LI Promo = N, C.C. Ins = N | sex = M)

We have (mathematical justification, writing E = E1 ∩ E2 ∩ E3 ∩ E4):

P( E1 ∩ E2 ∩ E3 ∩ E4 | M ) P(M)
= P( E1 ∩ E2 ∩ E3 ∩ E4 ∩ M )                        [product rule: P(A|B) P(B) = P(A ∩ B)]
= P( E1 | E2 ∩ E3 ∩ E4 ∩ M ) P( E2 ∩ E3 ∩ E4 ∩ M )  [product rule again]
= P( E1 | M ) P( E2 ∩ E3 ∩ E4 ∩ M )                 (*)
= …
= P( E1 | M ) P( E2 | M ) P( E3 | M ) P( E4 | M ) P( M )

(*) Assumption: E1, … E4 are conditionally independent given the class M, i.e. the information added by knowing that E2, … E4 have happened does not change P(E1 | M), and can be dropped. This is not always correct -- it is an approximation -- but it often works well (and fast!).
Bayes' Classifier: Back to hypotheses H1 and H2
H1: Lee is male
P(sex = M | E) = P(E | sex = M) P(sex = M) / P(E)
Then we can calculate the conditional probability values for each piece of evidence as
explained:
P(Mag. Promo = Y | sex = M) = 4/6 P(TV Promo = Y | sex = M) = 2/6 P(LI Promo = N | sex = M) = 4/6 P(C.C. Ins = N | sex = M) = 4/6
and P(sex = M) = 6/10 = 3/5
These values are easily obtained from our distribution table. It follows:
P(E | sex = M) P(sex = M) = (4/6) * (2/6) * (4/6) * (4/6) * (3/5) = 8/81 * 3/5
Bayes' Classifier: Back to hypotheses H1 and H2
Analogously for H2: Lee is female
P(sex = F | E) = P(E | sex = F) P(sex = F) / P(E)
And we have
P(Mag. Promo = Y | sex = F) = 3/4 P(TV Promo = Y | sex = F) = 2/4 P(LI Promo = N | sex = F) = 1/4 P(C.C. Ins = N | sex = F) = 3/4
P(sex = F) = 2/5
It follows:
P(sex = F | E) = (9/128) * (2/5) / P(E)
Bayes' Classifier: Back to hypotheses H1 and H2
Finally,

P(sex = F | E) = (9/128) * (2/5) / P(E)  <  (8/81) * (3/5) / P(E) = P(sex = M | E)

Hence, Bayes' classifier tells us that Lee is most likely a male.
Calculating also the value of P(E), i.e. the product of the (assumed conditionally independent!) probabilities of
Mag. Promo = Y, TV Promo = Y, LI Promo = N and C.C. Ins = N:
P(E) = (7/10) * (4/10) * (5/10) * (7/10) = 0.098
we have:
P(sex = F | E) ≈ 0.287 < 0.605 ≈ P(sex = M | E)
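The whole computation can be reproduced exactly with rational arithmetic; a sketch (the row encoding and the `score` helper are mine, not from the slides):

```python
from fractions import Fraction as F

# Rows of the promotion dataset: (Mag, TV, LI, CC, Sex)
data = [
    ("Y","N","N","N","M"), ("Y","Y","Y","N","F"), ("N","N","N","N","M"),
    ("Y","Y","Y","Y","M"), ("Y","N","Y","N","F"), ("N","N","N","N","F"),
    ("Y","N","Y","Y","M"), ("N","Y","N","N","M"), ("Y","N","N","N","M"),
    ("Y","Y","Y","Y","F"),
]
evidence = ("Y", "Y", "N", "N")     # Lee: Mag=Y, TV=Y, LI=N, CC=N

def score(sex):
    """Unnormalised posterior P(E | sex) * P(sex) under naive independence."""
    rows = [r for r in data if r[4] == sex]
    prior = F(len(rows), len(data))
    lik = F(1)
    for i, v in enumerate(evidence):   # multiply per-attribute conditionals
        lik *= F(sum(1 for r in rows if r[i] == v), len(rows))
    return lik * prior

print(score("M"))                 # 8/135  (= 8/81 * 3/5)
print(score("F"))                 # 9/320  (= 9/128 * 2/5)
print(score("M") > score("F"))    # True: Lee is classified as male
```

Comparing the two scores is enough to classify: the common factor 1/P(E) cancels, so P(E) never needs to be computed for the decision itself.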
Bayes' Classifier: Back to hypotheses H1 and H2
1. What proportion of Glasgow customers buy books?
2. What proportion of all customers buy DVDs?
3. Given a new customer that we know buys Videos, is it more likely that they live in Glasgow or Stirling?
Classifying according to further evidence!
CDs Books DVDs Videos Region
Y Y N N Stirling
Y N Y N Glasgow
Y N Y Y Glasgow
Y Y Y N Glasgow
N N Y N Stirling
N Y Y Y Stirling
Y N Y N Stirling
Y Y Y Y Glasgow
Items Bought from Amazon
After a bit of feature extraction ...
1. P(Books = Y | Region = G) = 2/4 = 1/2
2. P(DVDs = Y) = 7/8
3. ??

              CDs        Books      DVDs       Videos
              G    S     G    S     G    S     G    S
Y             4    2     2    2     4    3     2    1
N             0    2     2    2     0    1     2    3
Ratio Y/Tot   1    1/2   1/2  1/2   1    3/4   1/2  1/4
Ratio N/Tot   0    1/2   1/2  1/2   0    1/4   1/2  3/4
Items Bought from Amazon
P(Glasgow | Videos) = P(Videos | Glasgow) P(Glasgow) / P(Videos) = (1/2 * 1/2) / (3/8) = 2/3
Items Bought from Amazon
P(Stirling | Videos) = P(Videos | Stirling) P(Stirling) / P(Videos) = (1/4 * 1/2) / (3/8) = 1/3
Items Bought from Amazon
P(Stirling | Videos) = P(Videos | Stirling) P(Stirling) / P(Videos) = (1/4 * 1/2) / (3/8) = 1/3
P(Glasgow | Videos) = P(Videos | Glasgow) P(Glasgow) / P(Videos) = (1/2 * 1/2) / (3/8) = 2/3
... most likely from Glasgow!
Note: P(Stirling | Videos) + P(Glasgow | Videos) = 1
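The two posteriors can be checked numerically with a short sketch (the dictionary names are mine; the numbers come from the Amazon table):

```python
p_region = {"Glasgow": 0.5, "Stirling": 0.5}          # 4 customers each
p_videos_given = {"Glasgow": 2 / 4, "Stirling": 1 / 4}

# Marginal P(Videos = Y) by the sum and product rules
p_videos = sum(p_videos_given[r] * p_region[r] for r in p_region)   # 3/8

# Posterior over regions given Videos = Y (Bayes' theorem)
posterior = {r: p_videos_given[r] * p_region[r] / p_videos for r in p_region}

print(round(posterior["Glasgow"], 4))     # 0.6667
print(round(posterior["Stirling"], 4))    # 0.3333
print(round(sum(posterior.values()), 4))  # 1.0 -- they sum to one
```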
Exercise. Is it true in general that P(a|r) + P(b|r) = 1? Note that in our case a ∪ b = T, the whole sample space. (One either comes from Glasgow or from Stirling, but not from both places: indeed a ∩ b = ∅, too.) Is it true when assuming a ∪ b = T and a ∩ b = ∅?
P(a|r) + P(b|r) = P(a,r)/P(r) + P(b,r)/P(r)   (by def.)
                = P(a ∪ b, r)/P(r)            (by the sum rule, since a ∩ b = ∅)
                = P(T, r)/P(r) = P(r)/P(r) = 1
Note: Revisit the Naïve Bayesian Classifier example about promotions. Does the above result hold there? If not, why? How is the situation different here?
Items Bought from Amazon
Why use Bayesian Classifiers?
• There are several classification methods; none has been found to be superior to all others in every case (i.e. on every data set drawn from a particular domain of interest)
• Methods can be compared based on: – accuracy – interpretability of the results – robustness of the method with different datasets – training time – scalability
An Important Remark
Mind the difference between
calculating probabilities from a known set of outcomes (for example, the probability of tossing heads when we know the only outcomes are heads and tails), and estimating the probability of an event from data, where the probability is not directly expressed by the data.
The probabilities calculated from data are estimates, not true values. If we tossed a coin 10 times to generate data, we might easily get 6 heads and 4 tails. Without knowing about how coins work, we would estimate the probability of getting heads as 6/10 – not a bad estimate, but incorrect. The more data we have, the more reliable our estimates get.
Results can be dramatically sensitive to the specific evidence you have. E.g. suppose your evidence gives the estimate P(Head) = 0.6; then you can estimate the probability of getting 6 heads in a row, even if that never happened in your data:
0.6 * 0.6 * 0.6 * 0.6 * 0.6 * 0.6 ≈ 0.047, i.e. about 1 in 21 tries! (With P(Head) = 0.5 it is ≈ 0.016.)
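The remark can be illustrated in two lines (a sketch; the coin and its data are the slide's hypothetical example):

```python
est_heads = 6 / 10    # estimated from 10 tosses that happened to give 6 heads
true_heads = 0.5      # the value we "know" for a fair coin

# Probability of six heads in a row under each value of P(Head)
print(round(est_heads ** 6, 3))    # 0.047  (about 1 in 21)
print(round(true_heads ** 6, 3))   # 0.016
```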
Bayesian Belief Networks
Modern pattern recognition is based on probabilities, mainly on the sum and product rules.
Everything could be treated algebraically.
However, graphical models offer advantages, in terms of:
• visualisation, e.g. of structure and relationships;
• communication: they are easy to grasp;
• expressiveness, e.g. graphical manipulations corresponding to mathematical operations.
Probabilistic graphical models.
• What are they?
– Bayesian Belief Networks (BBNs) are a way of modelling probabilities based on data or knowledge to allow probabilities of new events to be estimated
• What are they used for?
– Intelligent decision aids, data fusion, intelligent diagnostic aids, automated free text understanding, data mining
• Where did they come from?
– Cross fertilization of ideas between the artificial intelligence, decision analysis, and statistic communities
Bayesian Belief Networks
Definition of a Bayesian Network
• Factored joint probability distribution as a directed graph:
• structure for representing knowledge about uncertain variables • computational architecture for computing (propagating) the impact of evidence
on beliefs
• Knowledge structure:
• random variables are depicted as nodes • arcs represent probabilistic dependence between variables • conditional probabilities encode the strength of the dependencies
• Computational architecture:
• computes posterior probabilities given evidence about selected nodes • exploits probabilistic independence for efficient computation
[Figure: a two-node network, B → F]
[Network structure:
 Patient Information: Visit to Asia, Smoking
 Medical Difficulties: Tuberculosis, Lung Cancer, Bronchitis, Tuberculosis or Cancer
 Diagnostic Tests: XRay Result, Dyspnea]
Example from Medical Diagnostics
• Relationship knowledge is modeled by deterministic functions, logic and conditional probability distributions
[Figure: the network, with two of its tables shown.]

Tuberculosis or Cancer (deterministic OR):
Tuberculosis   Lung Cancer   Tub or Can
Present        Present       True
Present        Absent        True
Absent         Present       True
Absent         Absent        False

Dyspnea, conditioned on Tub or Can and Bronchitis:
Tub or Can   Bronchitis   Dyspnea Present   Dyspnea Absent
True         Present      0.90              0.10
True         Absent       0.70              0.30
False        Present      0.80              0.20
False        Absent       0.10              0.90
Example from Medical Diagnostics
• Propagation algorithm processes relationship information to provide likelihood information for occurrence of each state for each node
Beliefs after propagation (%):
Tuberculosis:            Present 1.04,   Absent 99.0
XRay Result:             Abnormal 11.0,  Normal 89.0
Tuberculosis or Cancer:  True 6.48,      False 93.5
Lung Cancer:             Present 5.50,   Absent 94.5
Dyspnea:                 Present 43.6,   Absent 56.4
Bronchitis:              Present 45.0,   Absent 55.0
Visit To Asia:           Visit 1.00,     No Visit 99.0
Smoking:                 Smoker 50.0,    NonSmoker 50.0
Example from Medical Diagnostics
• As a finding is entered, the propagation algorithm updates the beliefs attached to each relevant node in the network
• Interviewing the patient produces the information that "Visit to Asia" is "Visit"
• This finding propagates through the network and the belief functions of several nodes are updated
Updated beliefs (%):
Tuberculosis:            Present 5.00,   Absent 95.0
XRay Result:             Abnormal 14.5,  Normal 85.5
Tuberculosis or Cancer:  True 10.2,      False 89.8
Lung Cancer:             Present 5.50,   Absent 94.5
Dyspnea:                 Present 45.0,   Absent 55.0
Bronchitis:              Present 45.0,   Absent 55.0
Visit To Asia:           Visit 100,      No Visit 0
Smoking:                 Smoker 50.0,    NonSmoker 50.0
Example from Medical Diagnostics
Updated beliefs (%):
Tuberculosis:            Present 5.00,   Absent 95.0
XRay Result:             Abnormal 18.5,  Normal 81.5
Tuberculosis or Cancer:  True 14.5,      False 85.5
Lung Cancer:             Present 10.0,   Absent 90.0
Dyspnea:                 Present 56.4,   Absent 43.6
Bronchitis:              Present 60.0,   Absent 40.0
Visit To Asia:           Visit 100,      No Visit 0
Smoking:                 Smoker 100,     NonSmoker 0
• Further interviewing of the patient produces the finding "Smoking" is "Smoker"
• This information propagates through the network
Example from Medical Diagnostics
Updated beliefs (%):
Tuberculosis:            Present 0.12,   Absent 99.9
XRay Result:             Abnormal 0,     Normal 100
Tuberculosis or Cancer:  True 0.36,      False 99.6
Lung Cancer:             Present 0.25,   Absent 99.8
Dyspnea:                 Present 52.1,   Absent 47.9
Bronchitis:              Present 60.0,   Absent 40.0
Visit To Asia:           Visit 100,      No Visit 0
Smoking:                 Smoker 100,     NonSmoker 0
• Finished with interviewing the patient, the physician begins the examination
• The physician now moves to specific diagnostic tests such as an X-Ray, which results in a "Normal" finding, which propagates through the network
• Note that the information from this finding propagates backward and forward through the arcs
Example from Medical Diagnostics
Updated beliefs (%):
Tuberculosis:            Present 0.19,   Absent 99.8
XRay Result:             Abnormal 0,     Normal 100
Tuberculosis or Cancer:  True 0.56,      False 99.4
Lung Cancer:             Present 0.39,   Absent 99.6
Dyspnea:                 Present 100,    Absent 0
Bronchitis:              Present 92.2,   Absent 7.84
Visit To Asia:           Visit 100,      No Visit 0
Smoking:                 Smoker 100,     NonSmoker 0
• The physician also determines that the patient is having difficulty breathing; the finding "Present" is entered for "Dyspnea" and is propagated through the network
• The doctor might now conclude that the patient has bronchitis and does not have tuberculosis or lung cancer
Example from Medical Diagnostics
Applications
• Industrial
  • Processor Fault Diagnosis - by Intel
  • Auxiliary Turbine Diagnosis - GEMS by GE
  • Diagnosis of space shuttle propulsion systems - VISTA by NASA/Rockwell
  • Situation assessment for nuclear power plant - NRC
• Military
  • Automatic Target Recognition - MITRE
  • Autonomous control of unmanned underwater vehicle - Lockheed Martin
  • Assessment of Intent
• Medical Diagnosis
  • Internal Medicine
  • Pathology diagnosis - Intellipath by Chapman & Hall
  • Breast Cancer Manager with Intellipath
• Commercial
  • Financial Market Analysis
  • Information Retrieval
  • Software troubleshooting and advice - Windows 95 & Office 97
  • Pregnancy and Child Care - Microsoft
  • Software debugging - American Airlines' SABRE online reservation system
Glandular fever
• Suppose we know that on average, 1% of the population have had glandular fever.
P(had_GF) = 0.01
• Suppose we have a test for having had glandular fever such that:
– For a person who has had GF the test would give a positive result with probability 0.977
– For a person who has not had GF the test would give a negative result with probability 0.926
Q: How could this information be represented as a BBN Q: How could the BBN be used to find out new information?
Step 1: "feature extraction"
List information we are given, determine information we can deduce from that information. • "on average, 1% of the population have had glandular fever"
P(had GF) = 0.01 => P(not had GF) = 0.99
• "for a person who has had GF the test would give a positive result with probability 0.977"
Note this is not "a person has had GF"! It is
P(+ve test | person has had GF) = 0.977
from which we have that the probability of having had GF but receiving a -ve test result is
P(-ve test | person has had GF) = 1 - P(+ve test | person has had GF) = 0.023
Step 1: "feature extraction"
• “for a person who has not had GF the test would give a negative result with probability 0.926”
P(-ve test | person has not had GF) = 0.926
from which
P(+ve test | person has not had GF) = 1 - P(-ve test | person has not had GF) = 0.074
Summing up:
P(had GF) = 0.01 P(not had GF) = 0.99
P(+ve test | had GF) = 0.977 P(-ve test | had GF) = 0.023
P(+ve test | not had GF) = 0.074 P(-ve test | not had GF) = 0.926
Define the structure of the BBN. Important issue! Nodes are random variables. Two choices:
1. Top-level nodes. These have no probabilistic dependencies (no parents). They could be understood as observations.
2. Dependency relationships. These describe our interpretation of the dependencies in our model.
Both choices contribute to the structure of the network. Different choices are generally possible. The final structure must not have cycles in the dependency relationship, i.e. the directed graph must be acyclic (a DAG), in order to guarantee efficient propagation algorithms.
In our example:
• Nodes: Had_GF, Test_Result
• Relationships: the Had_GF node influences the state of the Test_Result node
Step 2: BBN construction
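The acyclicity requirement can be checked mechanically; a minimal sketch (the `is_dag` helper is mine, using Kahn's topological-sort algorithm, not anything from the slides):

```python
def is_dag(edges):
    """Return True iff the directed graph given as (parent, child) pairs has no cycle."""
    nodes = {n for e in edges for n in e}
    indeg = {n: 0 for n in nodes}
    for _, child in edges:
        indeg[child] += 1
    ready = [n for n in nodes if indeg[n] == 0]   # nodes with no remaining parents
    seen = 0
    while ready:
        n = ready.pop()
        seen += 1
        for parent, child in edges:               # "remove" n's outgoing arcs
            if parent == n:
                indeg[child] -= 1
                if indeg[child] == 0:
                    ready.append(child)
    return seen == len(nodes)                     # a cycle leaves nodes unprocessed

print(is_dag([("Had_GF", "Test_Result")]))            # True: valid BBN structure
print(is_dag([("A", "B"), ("B", "C"), ("C", "A")]))   # False: cycle, not allowed
```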
[Figure: a model with 1 parent node related to 1 child node]
Finally,
3. Establish values for Conditional Probability Tables (CPTs), i.e. a representation of the conditional probability of the values of the random variable represented by a node, conditioned on the parent random variable(s).
Application of a formula. Values must be consistent! E.g. each row (over all the possible values) must sum to 1.

              Test +ve (True)   Test -ve (False)
Had GF True   0.977 (97.7%)     0.023 (2.3%)
Had GF False  0.074 (7.4%)      0.926 (92.6%)
Step 2: BBN construction
Had GF node (no parents):
True   0.01 (1%)
False  0.99 (99%)

Test +ve node (parent: Had GF):
              Test +ve (True)   Test -ve (False)
Had GF True   0.977 (97.7%)     0.023 (2.3%)
Had GF False  0.074 (7.4%)      0.926 (92.6%)
Step 2: BBN construction
Use the network to determine new information.
Examples of information we may wish to determine:
Q1: Given a person has had GF, what is the probability of a negative test result?
Q2: What is the probability of a +ve test result?
Q3: Given a positive test, what is the probability that the person has had GF?
We will start by looking at the calculations that these questions require. This is actually computed by the network, once it has been properly designed.
Step 3: Use of the BBN
Q1: Given a person has had GF, what is the probability of a negative test?
Well, this is easy, as we have already determined this information in Step 2. Formulated in terms of probability, we wish to find out P(-ve test | has had GF).
Looking back a few slides, P(-ve test | has had GF) = 0.023.
Step 3: Use of the BBN
Q2: What is the probability of a +ve test result?
Well, we need to think of all the possible situations that could happen which would lead to a +ve test result.
Situation 1. +ve test result and had GF: P(+ve test result ∩ had GF)
Situation 2. +ve test result and not had GF: P(+ve test result ∩ not had GF)
We don't know the above information, BUT we can now use what we know of conditional probability to calculate it…
Step 3: Use of the BBN
Q2: What is the probability of a +ve test result? Again from P(A ∩ B) = P(B|A) P(A):
P(+ve test ∩ had GF) = P(had GF | +ve test) * P(+ve test) P(+ve test ∩ not had GF) = P(not had GF | +ve test) * P(+ve test)
Still, P(+ve test) has not yet been defined. BUT from P(A ∩ B) = P(B ∩ A), then
P(had GF ∩ +ve test) = P(+ve test | had GF) * P(had GF) = 0.977 * 0.01 = 0.00977
P(not had GF ∩ +ve test) = P(+ve test | not had GF) * P(not had GF) = 0.074 * 0.99 = 0.07326
Step 3: Use of the BBN
Q2: What is the probability of a +ve test result? Finally, by the sum rule:
P(+ve test) = P(+ve test ∩ had GF) + P(+ve test ∩ not had GF) = 0.00977 + 0.07326 = 0.08303
If we chose a random person and gave them a test, they would have a 0.08 probability of showing positive.
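The Q2 calculation, and with the same numbers the Q3 posterior, can be sketched as follows (variable names are mine):

```python
p_gf = 0.01               # P(had GF)
p_pos_given_gf = 0.977    # P(+ve test | had GF)
p_pos_given_not = 0.074   # P(+ve test | not had GF)

# Sum rule over the two disjoint situations, product rule inside each term
p_pos = p_pos_given_gf * p_gf + p_pos_given_not * (1 - p_gf)
print(round(p_pos, 5))            # 0.08303

# The same numbers answer Q3 via Bayes' theorem
p_gf_given_pos = p_pos_given_gf * p_gf / p_pos
print(round(p_gf_given_pos, 3))   # 0.118
```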
Step 3: Use of the BBN
Q3: Given a positive test, what is the probability that the person has had GF?
P(has had GF | +ve test) ?
Again, this isn't a piece of information that has already been defined. We can calculate the answer by exploiting the network and Bayes' theorem:

P(A | B) = P(B | A) P(A) / P(B)

Given that, from Q2, we have P(+ve test) = 0.08303:

P(has had GF | +ve test) = P(+ve test | has had GF) P(has had GF) / P(+ve test) = (0.977 * 0.01) / 0.08303 ≈ 0.118
Step 3: Use of the BBN
Q3: Given a positive test, what is the probability that the person has had GF?
P(has had GF | +ve test) = 0.118
So the test isn't so useful after all:
– remember that 7.4% of those who've never had GF get a positive result
– and they far outnumber those people who have had it and got a positive result!
Step 3: Use of the BBN
A BBN model with a hierarchy of 3 nodes
Had GF node (no parents):
True   0.01 (1%)
False  0.99 (99%)

Test +ve node (parent: Had GF):
              Test +ve (True)   Test -ve (False)
Had GF True   0.977 (97.7%)     0.023 (2.3%)
Had GF False  0.074 (7.4%)      0.926 (92.6%)
Facts we deduced from the last time…
P(-ve test | has had GF) = 0.023
P(+ve test) = 0.08303
P(had GF | +ve test) = 0.118
Extension to the Glandular Fever Model
• Suppose we are informed that the school nurse sends home 80% of students that have a positive GF test.
• She also sends home 5% of students for other medical reasons (i.e. students that have not had a positive GF test).
Q1. How do we incorporate this new information into our network?
Q2. What is the probability of being sent home?
Q3. Given that a child is sent home, what is the probability of them having had a negative test?
Extension to the Glandular Fever Model
Q1. How do we incorporate new information?
Had GF node (prior):
  True   0.01 (1%)
  False  0.99 (99%)

Test +ve node, CPT P(Test result | Had GF):
                 Test +ve (True)   Test -ve (False)
  Had GF True    0.977 (97.7%)     0.023 (2.3%)
  Had GF False   0.074 (7.4%)      0.926 (92.6%)

Sent home node, CPT P(Sent home | Test result):
                   Sent home True   Sent home False
  Test +ve True    0.8 (80%)        0.2 (20%)
  Test +ve False   0.05 (5%)        0.95 (95%)
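The extended network is fully specified by these three small tables. A sketch in Python (the dictionary layout and variable names are my own choice): once the CPTs are written down, any marginal can be read off by summing the full joint.

```python
# Prior for the root node: P(Had GF)
p_gf = {True: 0.01, False: 0.99}
# CPT: P(Test +ve | Had GF)
p_test = {True:  {True: 0.977, False: 0.023},
          False: {True: 0.074, False: 0.926}}
# CPT: P(Sent home | Test +ve)
p_home = {True:  {True: 0.8,  False: 0.2},
          False: {True: 0.05, False: 0.95}}

# e.g. the marginal P(sent home) by summing the joint over the other nodes
p_home_true = sum(p_gf[g] * p_test[g][t] * p_home[t][True]
                  for g in (True, False) for t in (True, False))
print(round(p_home_true, 3))  # 0.112
```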
We need to add up all the combinations of scenarios which would lead to being sent home:
- due to a +ve test
- due to another reason

P(home) = P(home | +ve test) * P(+ve test) + P(home | -ve test) * P(-ve test)
        = 0.8 * P(+ve test) + 0.05 * P(-ve test)

We need to calculate P(+ve test) and P(-ve test). We already know
P(+ve test) = P(+ve test | had GF) * P(had GF) + P(+ve test | not had GF) * P(not had GF)
            = 0.00977 + 0.07326 = 0.08303
from which P(-ve test) = 1 - 0.08303 = 0.91697.
And finally
P(home) = 0.8 * P(+ve test) + 0.05 * P(-ve test)
        = 0.8 * 0.08303 + 0.05 * 0.91697 = 0.1122725 = 0.112 (3 d.p.)
Q2. What is the probability of being sent home?
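The same two-step calculation can be sketched in Python (variable names are mine):

```python
# P(+ve test) by total probability over Had GF
p_pos = 0.977 * 0.01 + 0.074 * 0.99   # = 0.08303
p_neg = 1 - p_pos                     # = 0.91697

# P(home) by total probability over the test outcome
p_home = 0.8 * p_pos + 0.05 * p_neg
print(round(p_home, 3))  # 0.112
```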
We’ve already done a similar calculation to this before:
P(-ve test | home) = P(home | -ve test) * P(-ve test) / P(home)
                   = (0.05 * 0.91697) / 0.1122725 = 0.408 (3 d.p.)
Q3. Given that a child is sent home, what is the probability of them having had a negative test?
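As a sketch in Python, reusing the quantities computed in Q2:

```python
p_neg = 0.91697           # P(-ve test), from Q2
p_home = 0.1122725        # P(home), from Q2
p_home_given_neg = 0.05   # P(home | -ve test), from the CPT

# Bayes: P(-ve test | home) = P(home | -ve test) * P(-ve test) / P(home)
p_neg_given_home = p_home_given_neg * p_neg / p_home
print(round(p_neg_given_home, 3))  # 0.408
```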
Things to remember about calculations in BBNs
1. Know parent information, want to find out child node information: use conditional probability.
2. Know child information, want to find out parent node information: use Bayes’ theorem.
3. If a node has more than one parent, remember that the parents are independent (unless they share an arc). Thus P(parent A ∩ parent B) = P(parent A) * P(parent B).
A BBN model with 2 parents and 1 child
• Diane does her shopping each week. The bill for her shop is sometimes under £30, sometimes £30 or over. Two factors influence the cost of her shopping: whether she takes her 2-year-old son with her, and whether she takes her 40-year-old husband with her.
• If we know Diane has gone shopping by herself, the likelihood that the bill will be less than £30 is 90%. If we know that Diane was accompanied only by her husband, the likelihood that the bill will be less than £30 is 70%, and if we know that Diane took only her son, the likelihood that the bill will be less than £30 is 80%. Given we know both son and husband accompanied Diane to the shops, then the likelihood that the bill is under £30 reduces to 60%.
• 50% of the time Diane’s husband accompanies her to the shops. 60% of the time Diane is accompanied by her son.
How do you imagine a BBN could represent the above information?
Diane’s shopping model (2p 1c)
Questions:
Q1. What is the probability of the bill being under £30?
Q2. Given that the bill is under £30, what is the probability that Diane’s husband (with or without her son) accompanied her to the shops?
Diane’s shopping model (2p 1c)
Firstly, let’s list what we know, ...
P(under_30 | no husband ∩ no son) = 0.9
P(under_30 | husband ∩ no son) = 0.7
P(under_30 | no husband ∩ son) = 0.8
P(under_30 | husband ∩ son) = 0.6
P(husband) = 0.5
P(son) = 0.6
... and what we can deduce ( 1 - P(...) ):
P(not_under_30 | no husband ∩ no son) = 0.1
P(not_under_30 | husband ∩ no son) = 0.3
P(not_under_30 | no husband ∩ son) = 0.2
P(not_under_30 | husband ∩ son) = 0.4
P(no_husband) = 0.5
P(no_son) = 0.4
Diane’s shopping model (2p 1c)
CPT for P(bill under £30 | Husband, Son):
                       Under £30 True   Under £30 False
  Husband T, Son T       60%              40%
  Husband T, Son F       70%              30%
  Husband F, Son T       80%              20%
  Husband F, Son F       90%              10%
Diane’s shopping model (2p 1c)
Firstly, list situations that lead to bill being under £30
P(under_30) = P(under_30 | no husband ∩ no son) * P(no husband ∩ no son)
            + P(under_30 | no husband ∩ son) * P(no husband ∩ son)
            + P(under_30 | husband ∩ no son) * P(husband ∩ no son)
            + P(under_30 | husband ∩ son) * P(husband ∩ son)
What do we know about the presence of Diane’s husband and the presence of her son? Are these two events dependent on each other in any way? No: these events are independent. Independent events are those where the occurrence of one event does not affect the occurrence of the other:
P(A | B) = P(A)
Therefore, if A and B are independent events,
P(A, B) = P(A | B) * P(B) = P(A) * P(B)
Q1. What is the probability of the bill being under £30?
Firstly, list situations that lead to bill being under £30
P(under_30) = P(under_30 | no husband ∩ no son) * P(no husband ∩ no son)
            + P(under_30 | no husband ∩ son) * P(no husband ∩ son)
            + P(under_30 | husband ∩ no son) * P(husband ∩ no son)
            + P(under_30 | husband ∩ son) * P(husband ∩ son)
P(no husband ∩ no son) = P(no husband) * P(no son) = 0.5 * 0.4 = 0.2
P(no husband ∩ son) = P(no husband) * P(son) = 0.5 * 0.6 = 0.3
P(husband ∩ no son) = P(husband) * P(no son) = 0.5 * 0.4 = 0.2
P(husband ∩ son) = P(husband) * P(son) = 0.5 * 0.6 = 0.3
Q1. What is the probability of the bill being under £30?
Firstly, list situations that lead to bill being under £30
P(under_30) = P(under_30 | no husband ∩ no son) * P(no husband ∩ no son)
            + P(under_30 | no husband ∩ son) * P(no husband ∩ son)
            + P(under_30 | husband ∩ no son) * P(husband ∩ no son)
            + P(under_30 | husband ∩ son) * P(husband ∩ son)
P(under_30) = (0.9 * 0.2) + (0.8 * 0.3) + (0.7 * 0.2) + (0.6 * 0.3) = 0.74
Q1. What is the probability of the bill being under £30?
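The sum above can be written as a loop over the four parent combinations (a sketch; the independence of husband and son supplies the joint weights, and the names are my own):

```python
p_husband, p_son = 0.5, 0.6
# CPT: P(under £30 | husband present, son present)
cpt = {(True, True): 0.6, (True, False): 0.7,
       (False, True): 0.8, (False, False): 0.9}

# marginalise over the parents, weighting each row by its joint prior
p_under30 = sum(p * (p_husband if h else 1 - p_husband)
                  * (p_son if s else 1 - p_son)
                for (h, s), p in cpt.items())
print(round(p_under30, 2))  # 0.74
```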
Q2. Given that the bill is under £30, what is the probability that Diane’s husband (with or without her son) accompanied her to the shops?

Two scenarios:
P(husband | under £30) = P(husband ∩ son | under £30) + P(husband ∩ no son | under £30)

P(husband ∩ son | under £30) {using Bayes’}
  = P(under £30 | husband ∩ son) * P(husband ∩ son) / P(under £30)
  = (0.6 * 0.3) / 0.74 = 0.243... (recurring)

P(husband ∩ no son | under £30) {using Bayes’}
  = P(under £30 | husband ∩ no son) * P(husband ∩ no son) / P(under £30)
  = (0.7 * 0.2) / 0.74 = 0.189... (recurring)

Overall:
P(husband | under £30) = 0.243... + 0.189... = 0.432 (3 d.p.)
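A sketch of the same calculation in Python (variable names are mine):

```python
p_under30 = 0.74
# joint priors, using the independence of the two parents
p_h_s = 0.5 * 0.6    # P(husband ∩ son)
p_h_ns = 0.5 * 0.4   # P(husband ∩ no son)

# Bayes on each scenario, then sum (the scenarios are disjoint)
p_husband_given_under30 = (0.6 * p_h_s + 0.7 * p_h_ns) / p_under30
print(round(p_husband_given_under30, 3))  # 0.432
```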
BBNs:
• Development of propagation algorithms, followed by availability of easy-to-use commercial software
• Growing number of creative applications, e.g. dementia diagnosis, cancer care symptom modelling, likelihood of car purchase, ...
• Different from other knowledge-based systems tools because uncertainty is handled in a mathematically rigorous yet efficient and simple way
• Different from other probabilistic analysis tools because of
  - the network representation of problems,
  - the use of Bayesian statistics, and
  - the synergy between these.
• Issue: How to build a network?
Why are BBNs interesting?
Outline
1. Introduction
2. Probability
3. Bayesian classification
4. Bayesian Belief Networks