Belief Networks & Bayesian Classification
Transcript of Belief Networks & Bayesian Classification
AN INTRODUCTION TO BAYESIAN BELIEF NETWORKS AND NAÏVE BAYESIAN
CLASSIFICATION
ADNAN MASOOD | SCIS.NOVA.EDU/~ADNAN
Overview
• Probability and Uncertainty
• Probability Notation
• Bayesian Statistics
• Notation of Probability
• Axioms of Probability
• Probability Table
• Bayesian Belief Network
• Joint Probability Table
• Probability of Disjunctions
• Conditional Probability
• Conditional Independence
• Bayes' Rule
• Classification with Bayes' Rule
• Bayesian Classification
• Conclusion & Further Reading
Probability and Uncertainty
Probability provides a way of summarizing uncertainty:
• 60% chance of rain today
• 85% chance of alarm in case of a burglary
Probability is calculated based upon past performance, or degree of belief.
Bayesian Statistics
Three approaches to probability:
• Axiomatic: probability by definition and properties
• Relative frequency: repeated trials
• Degree of belief (subjective): personal measure of uncertainty
Examples:
• The chance that a meteor strikes earth is 1%
• The probability of rain today is 30%
• The chance of getting an A on the exam is 50%
Notation of Probability
Random Variables (RVs) are (usually) capitalized, e.g. Sky, RoadCurvature, Temperature. They refer to attributes of the world whose “status” is unknown, have one and only one value at a time, and have a domain of values that are possible states of the world:
• Boolean: domain is {true, false}; A = true is abbreviated as a, and A = false as ¬a
• Discrete: domain is countable (includes Boolean); values are exhaustive and mutually exclusive, e.g. Sky domain = {sunny, overcast, rainy}, with Sky = sunny abbreviated as sunny
• Continuous: domain is the real numbers
Notation of Probability
Uncertainty is represented by P(X = x), or simply P(x): the degree of belief that variable X takes on value x, given no other information. A single probability like this is called an unconditional or prior probability.
Properties of P(X)
The sum over all values x in the domain of variable X is 1, Σx P(X = x) = 1, because the domain is exhaustive and mutually exclusive.
Axioms of Probability
S – Sample Space (set of possible outcomes); E – Some Event (some subset of outcomes)
Axioms:
• 0 ≤ P(E) ≤ 1
• P(S) = 1
• For any sequence of mutually exclusive events E1, E2, …: P(E1 ∪ E2 ∪ …) = P(E1) + P(E2) + …
Probability Table
Calculate probabilities from data:
P(Weather = sunny) = P(sunny) = 5/14
P(Weather) = {5/14, 4/14, 5/14}

Outlook   sunny   overcast   rainy
          5/14    4/14       5/14
An expert-built belief network using the weather dataset (Mitchell; Witten & Frank).
Bayesian inference can help answer questions like the probability of game play if:
a. Outlook=sunny, Temperature=cool, Humidity=high, Wind=strong
b. Outlook=overcast, Temperature=cool, Humidity=high, Wind=strong
Bayesian Belief Network
A Bayesian belief network allows a subset of the variables to be conditionally independent.
It is a graphical model of causal relationships. There are several cases of learning Bayesian belief networks:
• Given both the network structure and all the variables: easy
• Given the network structure but only some of the variables
• When the network structure is not known in advance
Bayesian Belief Network
[Figure: a network over the nodes FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay, and Dyspnea, in which FamilyHistory and Smoker are the parents of LungCancer.]

       (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
LC     0.8       0.5        0.7        0.1
~LC    0.2       0.5        0.3        0.9
Bayesian Belief Network
The conditional probability table for the variable LungCancer.
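A minimal sketch of how this CPT could be stored and queried in Python, using the values from the table above:

```python
# Conditional probability table for LungCancer (LC) given its parents
# FamilyHistory (FH) and Smoker (S); entries are P(LC = true | FH, S).
cpt_lc = {
    (True, True): 0.8,    # (FH, S)
    (True, False): 0.5,   # (FH, ~S)
    (False, True): 0.7,   # (~FH, S)
    (False, False): 0.1,  # (~FH, ~S)
}

def p_lung_cancer(lc, fh, s):
    """P(LungCancer = lc | FamilyHistory = fh, Smoker = s)."""
    p_true = cpt_lc[(fh, s)]
    return p_true if lc else 1.0 - p_true

print(p_lung_cancer(True, True, True))    # 0.8
print(p_lung_cancer(False, False, True))  # 1 - 0.7 ≈ 0.3
```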
A Hypothesis for playing tennis
Joint Probability Table
P(Outlook = sunny, Temperature = hot) = P(sunny, hot) = 2/14
P(Temperature = hot) = P(hot) = 2/14 + 2/14 + 0/14 = 4/14
With N random variables that can each take k values, the full joint probability table has size k^N.

                     Outlook
Temperature   sunny   overcast   rainy
hot           2/14    2/14       0/14
mild          2/14    1/14       3/14
cool          1/14    1/14       2/14
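Marginals fall out of the joint table by summing over the other variable, as in the P(hot) calculation above; a minimal Python sketch:

```python
# Joint probability table P(Outlook, Temperature) from the slide,
# stored as {(outlook, temperature): probability}.
joint = {
    ("sunny", "hot"): 2/14,  ("overcast", "hot"): 2/14,  ("rainy", "hot"): 0/14,
    ("sunny", "mild"): 2/14, ("overcast", "mild"): 1/14, ("rainy", "mild"): 3/14,
    ("sunny", "cool"): 1/14, ("overcast", "cool"): 1/14, ("rainy", "cool"): 2/14,
}

# Marginalize out Outlook to get P(Temperature = hot)
p_hot = sum(p for (outlook, temp), p in joint.items() if temp == "hot")
print(p_hot)  # 4/14 ≈ 0.286
```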
Example: Calculating Global Probabilistic Beliefs
P(PlayTennis) = 9/14 = 0.64
P(~PlayTennis) = 5/14 = 0.36
P(Outlook=sunny|PlayTennis) = 2/9 = 0.22
P(Outlook=sunny|~PlayTennis) = 3/5 = 0.60
P(Temperature=cool|PlayTennis) = 3/9 = 0.33
P(Temperature=cool|~PlayTennis) = 1/5 = 0.20
P(Humidity=high|PlayTennis) = 3/9 = 0.33
P(Humidity=high|~PlayTennis) = 4/5 = 0.80
P(Wind=strong|PlayTennis) = 3/9 = 0.33
P(Wind=strong|~PlayTennis) = 3/5 = 0.60
Probability of Disjunctions
P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
Conditional Probability
The probabilities discussed so far are called prior probabilities or unconditional probabilities: they depend only on the data, not on any other variable.
But what if you have some evidence or knowledge about the situation? You now have a toothache. What is the probability of having a cavity?
Conditional Probability
Calculate conditional probabilities from data as follows: P(A | B) = P(A, B) / P(B)
e.g. What is P(hot | sunny)? From the tables above, P(hot | sunny) = P(hot, sunny) / P(sunny) = (2/14) / (5/14) = 2/5.
Conditional Probability
You can think of P(B) as just a normalization constant that makes P(A|B) add up to 1.
Product rule: P(A, B) = P(A|B) P(B) = P(B|A) P(A)
The chain rule is successive application of the product rule:
P(X1, …, Xn) = P(X1) P(X2 | X1) P(X3 | X1, X2) … P(Xn | X1, …, Xn−1)
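The product rule can be checked numerically against the weather tables in this deck (a small sketch; the fractions come from the earlier slides):

```python
# Product rule check: P(sunny, hot) should equal P(hot | sunny) * P(sunny).
p_sunny = 5 / 14          # from the Outlook probability table
p_sunny_hot = 2 / 14      # from the joint table
p_hot_given_sunny = p_sunny_hot / p_sunny  # conditional probability = 2/5

# Reassemble the joint probability via the product rule
reassembled = p_hot_given_sunny * p_sunny
print(reassembled)  # 2/14 ≈ 0.143
```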
Conditional Independence
What if I know Weather = cloudy today? What is P(cavity) now?
If knowing some evidence doesn’t change the probability of some other random variable, then we say the two random variables are independent.
A and B are independent if P(A|B) = P(A). Other ways of seeing this (all are equivalent): P(B|A) = P(B); P(A, B) = P(A) P(B).
Absolute independence is powerful but rare!
The independence hypothesis…
… makes computation possible
… yields optimal classifiers when satisfied
… but is seldom satisfied in practice, as attributes (variables) are often correlated.
Attempts to overcome this limitation:
• Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes
• Decision trees, which reason on one attribute at a time, considering the most important attributes first
Conditional Independence
P(Toothache, Cavity, Catch) has 2³ − 1 = 7 independent entries.
If I have a cavity, the probability that the probe catches in it doesn’t depend on whether I have a toothache:
P(catch | toothache, cavity) = P(catch | cavity)
The same independence holds if I haven’t got a cavity:
P(catch | toothache, ¬cavity) = P(catch | ¬cavity)
Catch is conditionally independent of Toothache given Cavity:
P(Catch | Toothache, Cavity) = P(Catch | Cavity)
Equivalent statements:
P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
Bayes’ Rule
Remember conditional probabilities:
P(A|B) = P(A, B) / P(B), so P(B) P(A|B) = P(A, B)
P(B|A) = P(B, A) / P(A), so P(A) P(B|A) = P(B, A)
Since P(B, A) = P(A, B): P(B) P(A|B) = P(A) P(B|A)
Bayes’ Rule: P(A|B)=P(B|A)P(A)/P(B)
Bayes’ Rule
A more general form, conditioned on background evidence e, is: P(Y | X, e) = P(X | Y, e) P(Y | e) / P(X | e)
Bayes’ rule allows you to turn conditional probabilities on their head: useful for assessing a diagnostic probability from a causal probability.
E.g., let M be meningitis and S be a stiff neck, with P(S|M) = 0.8, P(M) = 0.0001, P(S) = 0.1:
P(M|S) = P(S|M) P(M) / P(S) = 0.8 × 0.0001 / 0.1 = 0.0008
Note posterior probability of meningitis still very small!
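The arithmetic above can be verified in a couple of lines (a sketch using the slide's numbers):

```python
# Bayes' rule for the meningitis example: P(M | S) = P(S | M) * P(M) / P(S)
p_s_given_m = 0.8     # P(stiff neck | meningitis) -- causal probability
p_m = 0.0001          # prior P(meningitis)
p_s = 0.1             # P(stiff neck)

p_m_given_s = p_s_given_m * p_m / p_s  # diagnostic probability
print(p_m_given_s)  # ≈ 0.0008
```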
Classification with Bayes Rule
Naïve Bayes Model
Classify with the highest probability. One of the most widely used classifiers. Very fast to train and to classify:
• One pass over all the data to train
• One lookup for each feature/class combination to classify
Assumes the features are independent given the class (conditional independence).
Naïve Bayes Classifier
Simplified assumption: features are conditionally independent given the class:
P(x1, …, xn | C) = P(x1 | C) × … × P(xn | C)
Bayesian Classification: Why?
Probabilistic learning: computation of explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems
Incremental: Each training example can incrementally increase/decrease the probability that a hypothesis is correct. Prior knowledge can be combined with observed data.
Probabilistic prediction: Predict multiple hypotheses, weighted by their probabilities
Benchmark: Even if Bayesian methods are computationally intractable, they can provide a benchmark for other algorithms
Classification with Bayes Rule
Courtesy, Simafore - http://www.simafore.com/blog/bid/100934/Beware-of-2-facts-when-using-Naive-Bayes-classification-for-analytics
Issues with naïve Bayes
Change in Classifier Data (on the fly, during classification)
The conditional independence assumption is violated. Consider the task of classifying whether or not a certain word is a corporation name, e.g. “Google,” “Microsoft,” “IBM,” and “ACME”.
Two useful features we might want to use are is-capitalized and is-all-capitals.
Naïve Bayes will assume that these two features are independent given the class, but this clearly isn’t the case (things that are all-caps must also be capitalized)!
However, naïve Bayes seems to work well in practice even when this assumption is violated.
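The violation can be seen on toy data; the word list below is hypothetical and chosen only to illustrate the point:

```python
# Hypothetical (word, is_corporation) pairs for illustration only.
words = [("IBM", True), ("ACME", True), ("Google", True), ("eBay", True),
         ("the", False), ("cat", False), ("ran", False)]

# Two correlated features: all-caps words are always capitalized.
def features(word):
    return (word[0].isupper(), word.isupper())  # (is_capitalized, is_all_caps)

corp = [features(w) for w, is_corp in words if is_corp]
n = len(corp)

# Empirical conditional probabilities given class = corporation
p_cap = sum(1 for cap, _ in corp if cap) / n            # P(cap | corp)
p_allcaps = sum(1 for _, ac in corp if ac) / n          # P(all_caps | corp)
p_joint = sum(1 for cap, ac in corp if cap and ac) / n  # P(cap, all_caps | corp)

# Conditional independence would require p_joint == p_cap * p_allcaps
print(p_joint, p_cap * p_allcaps)  # 0.5 vs 0.375 -- not independent
```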
Naive Bayesian Classifier
Given a training set, we can compute the probabilities
Outlook       P     N
sunny         2/9   3/5
overcast      4/9   0
rain          3/9   2/5

Temperature   P     N
hot           2/9   2/5
mild          4/9   2/5
cool          3/9   1/5

Humidity      P     N
high          3/9   4/5
normal        6/9   1/5

Windy         P     N
true          3/9   3/5
false         6/9   2/5
Bayesian classification
The classification problem may be formalized using a-posteriori probabilities:
P(C | X) = probability that the sample tuple X = <x1, …, xk> is of class C
E.g., P(class = N | outlook = sunny, windy = true, …)
Idea: assign to sample X the class label C such that P(C | X) is maximal.
Estimating a-posteriori probabilities
Bayes’ theorem: P(C | X) = P(X | C) P(C) / P(X)
P(X) is constant for all classes.
P(C) = relative frequency of class C samples.
The C such that P(C | X) is maximum = the C such that P(X | C) P(C) is maximum.
Problem: computing P(X | C) is infeasible!
Naïve Bayesian Classification
Naïve assumption: attribute independence, P(x1, …, xk | C) = P(x1 | C) × … × P(xk | C)
If the i-th attribute is categorical: P(xi | C) is estimated as the relative frequency of samples having value xi as the i-th attribute in class C.
If the i-th attribute is continuous: P(xi | C) is estimated through a Gaussian density function.
Computationally easy in both cases.
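For the continuous case, a minimal sketch of the Gaussian estimate (the temperature readings below are hypothetical, used only to show the mechanics):

```python
import math

# Hypothetical temperature readings for one class (e.g. "play"):
temps = [68.0, 70.0, 72.0, 75.0, 69.0]

# Fit the Gaussian by estimating mean and variance from the class samples
mu = sum(temps) / len(temps)
var = sum((t - mu) ** 2 for t in temps) / len(temps)

def gaussian(x, mu, var):
    """Gaussian density N(x; mu, var), used as the estimate of P(x_i | C)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Density of observing temperature 71 under this class model
print(gaussian(71.0, mu, var))
```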
Play-tennis example: estimating P(xi | C)
P(p) = 9/14
P(n) = 5/14
outlook
P(sunny|p) = 2/9 P(sunny|n) = 3/5
P(overcast|p) =4/9 P(overcast|n) = 0
P(rain|p) = 3/9 P(rain|n) = 2/5
temperature
P(hot|p) = 2/9 P(hot|n) = 2/5
P(mild|p) = 4/9 P(mild|n) = 2/5
P(cool|p) = 3/9 P(cool|n) = 1/5
humidity
P(high|p) = 3/9 P(high|n) = 4/5
P(normal|p) = 6/9 P(normal|n) = 1/5
windy
P(true|p) = 3/9 P(true|n) = 3/5
P(false|p) = 6/9 P(false|n) = 2/5
Play Tennis example
An unseen sample X = <rain, hot, high, false>
P(p) · P(rain|p) · P(hot|p) · P(high|p) · P(false|p) = 9/14 · 3/9 · 2/9 · 3/9 · 6/9 ≈ 0.0106
P(n) · P(rain|n) · P(hot|n) · P(high|n) · P(false|n) = 5/14 · 2/5 · 2/5 · 4/5 · 2/5 ≈ 0.0183
Sample X is classified in class n (don’t play).
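A quick check of this computation (a sketch using the conditional probability tables from the earlier slides):

```python
# Naive Bayes scores for the unseen sample X = <rain, hot, high, false>
p_play = (9/14) * (3/9) * (2/9) * (3/9) * (6/9)  # P(p)*P(rain|p)*P(hot|p)*P(high|p)*P(false|p)
p_dont = (5/14) * (2/5) * (2/5) * (4/5) * (2/5)  # P(n)*P(rain|n)*P(hot|n)*P(high|n)*P(false|n)

print(round(p_play, 6), round(p_dont, 6))  # 0.010582 0.018286
label = "n (don't play)" if p_dont > p_play else "p (play)"
print(label)  # n (don't play)
```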
Conclusion & Further Reading
Probabilities, Joint Probabilities, Conditional Probabilities, Independence and Conditional Independence, the Naïve Bayes Classifier
References
J. Han, M. Kamber. Data Mining. Morgan Kaufmann Publishers: San Francisco, CA.
E. Charniak. Bayesian Networks without Tears. AI Magazine. http://www.aaai.org/ojs/index.php/aimagazine/article/view/918
A. Darwiche. Bayesian Networks. Automated Reasoning Group, UCLA.