Bayesian Belief Networks
What does it mean for two variables to be independent?
Consider a multidimensional distribution p(x).
If for two features we know that p(xi, xj) = p(xi)p(xj), we say the features are statistically independent.
If we know which features are independent and which are not, we can simplify the computation of joint probabilities.
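As a quick numerical illustration (the joint table below is a hypothetical example, not from the slides), we can check p(xi, xj) = p(xi)p(xj) directly:

```python
# Hypothetical joint distribution over two binary features xi, xj,
# with numbers chosen so the features come out independent.
joint = {
    (0, 0): 0.12, (0, 1): 0.28,
    (1, 0): 0.18, (1, 1): 0.42,
}

# Marginals obtained by summing the joint over the other variable.
p_i = {v: sum(p for (a, _), p in joint.items() if a == v) for v in (0, 1)}
p_j = {v: sum(p for (_, b), p in joint.items() if b == v) for v in (0, 1)}

# Statistical independence: p(xi, xj) == p(xi) * p(xj) for every combination.
independent = all(
    abs(joint[(a, b)] - p_i[a] * p_j[b]) < 1e-9
    for (a, b) in joint
)
print(independent)
```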
A Bayesian Belief Network is a method to describe the joint probability distribution of a set of variables.
It is also called a causal network or belief net.
Let x1, x2, …, xn be a set of variables or features. A Bayesian Belief Network or BBN will tell us the probability of any combination of x1, x2, …, xn.
Set of Boolean variables and their relations:
[Network diagram: nodes Storm, Bus Tour Group, Lightning, Thunder, Campfire, Forest Fire. Storm and Bus Tour Group are parents of Campfire; Storm is the parent of Lightning; Lightning is the parent of Thunder; Lightning, Storm, and Campfire are parents of Forest Fire.]
We say x1 is conditionally independent of x2 given x3 if the probability of x1 is independent of x2 given x3:
P(x1|x2,x3) = P(x1|x3)
The same can be said for a set of variables:
x1,x2,x3 is independent of y1,y2,y3 given z1,z2,z3:
P(x1,x2,x3|y1,y2,y3,z1,z2,z3) = P(x1,x2,x3|z1,z2,z3)
A BBN represents the joint probability distribution of a set of variables by explicitly indicating the assumptions of conditional independence through: a) a directed acyclic graph and b) local conditional probabilities.
Each variable is independent of its non-descendants given its predecessors. We say x1 is a descendant of x2 if there is a directed path from x2 to x1. Example: Predecessors of Campfire: Storm, Bus Tour Group. (Campfire is a descendant of these two variables.) Campfire is independent of Lightning given its predecessors.
To compute the joint probability distribution of a set of variables given a Bayesian Belief Network we simply use the formula:
P(x1, x2, …, xn) = Π_i P(xi | Parents(xi))
where Parents(xi) are the immediate predecessors of xi in the graph.
Example:
P(Campfire, Storm, BusGroupTour, Lightning, Thunder, ForestFire)?
P(Storm) P(BusTourGroup) P(Campfire|Storm,BusTourGroup) P(Lightning|Storm) P(Thunder|Lightning) P(ForestFire|Lightning,Storm,Campfire)
Conditional probability table for Campfire:

         S,B    S,~B   ~S,B   ~S,~B
  C      0.4    0.1    0.8    0.2
 ~C      0.6    0.9    0.2    0.8
P(Campfire=true|Storm=true,BusTourGroup=true) = 0.4
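The lookup above, and the full product over local factors, can be sketched in Python. Only Campfire's table appears on the slide; every other probability below is a hypothetical placeholder, included just to show the shape of the computation:

```python
# Conditional probability table for Campfire given (Storm, BusTourGroup),
# taken from the slide; keys are (storm, bus_tour_group) truth values.
p_campfire = {
    (True, True): 0.4, (True, False): 0.1,
    (False, True): 0.8, (False, False): 0.2,
}

# P(Campfire=true | Storm=true, BusTourGroup=true) = 0.4
print(p_campfire[(True, True)])

# Hypothetical priors/conditionals for the remaining variables, just to
# illustrate P(x1,...,xn) = prod_i P(xi | Parents(xi)) for the all-true case.
p_storm = 0.1        # hypothetical P(Storm=true)
p_bus = 0.5          # hypothetical P(BusTourGroup=true)
p_lightning = 0.3    # hypothetical P(Lightning=true | Storm=true)
p_thunder = 0.9      # hypothetical P(Thunder=true | Lightning=true)
p_forestfire = 0.2   # hypothetical P(ForestFire=true | all parents true)

joint = (p_storm * p_bus * p_campfire[(True, True)]
         * p_lightning * p_thunder * p_forestfire)
print(joint)  # product of the six local factors
```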
We can learn BBNs in different ways. Three basic approaches follow:
1. Assume we know the network structure: We can estimate the conditional probabilities for each variable from the data.
2. Assume we know part of the structure but some variables are missing: This is like learning hidden units in a neural network. One can use a gradient ascent method to train the BBN.
3. Assume nothing is known: We can learn both the structure and the conditional probabilities by searching the space of possible networks.
What is the connection between a BBN and classification?
Suppose one of the variables is the target variable. Can we compute the probability of the target variable given the other variables?
In Naïve Bayes:
[Naïve Bayes structure: the concept wj is the single parent of each feature x1, x2, …, xn.]
P(x1,x2,…xn,wj) = P(wj) P(x1|wj) P(x2|wj) … P(xn|wj)
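A minimal sketch of this Naïve Bayes product, with hypothetical prior and likelihood values (n = 3 here just for illustration):

```python
# Naive Bayes joint: P(x1,...,xn, wj) = P(wj) * prod_i P(xi | wj).
# The prior and likelihoods below are hypothetical illustration values.
prior_wj = 0.3
likelihoods = [0.8, 0.5, 0.9]  # P(x1|wj), P(x2|wj), P(x3|wj)

joint = prior_wj
for p in likelihoods:
    joint *= p

print(joint)  # 0.3 * 0.8 * 0.5 * 0.9 = 0.108
```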
In the general case we can use a BBN to specify independence assumptions among variables.
General Case (four features x1, x2, x3, x4):
[Network diagram: the concept wj is a parent of every feature; x1 and x2 are also parents of x3.]
P(x1,x2,x3,x4,wj) = P(wj) P(x1|wj) P(x2|wj) P(x3|x1,x2,wj) P(x4|wj)
Choose the right order from causes to effects.
P(x1, x2, …, xn) = P(xn|xn-1, …, x1) P(xn-1, …, x1)
                 = Π_i P(xi|xi-1, …, x1)   (chain rule)
Example: P(x1,x2,x3) = P(x1|x2,x3) P(x2|x3) P(x3)
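The chain-rule example can be verified numerically on a toy three-variable joint (the table below is a hypothetical distribution, made up for illustration):

```python
# Hypothetical joint P(x1, x2, x3) over binary variables (entries sum to 1).
P = {(0, 0, 0): 0.02, (0, 0, 1): 0.08, (0, 1, 0): 0.10, (0, 1, 1): 0.20,
     (1, 0, 0): 0.05, (1, 0, 1): 0.15, (1, 1, 0): 0.13, (1, 1, 1): 0.27}

def marginal(fixed):
    """Sum P over all assignments consistent with fixed: {position: value}."""
    return sum(p for bits, p in P.items()
               if all(bits[i] == v for i, v in fixed.items()))

# Chain rule: P(x1,x2,x3) = P(x1|x2,x3) P(x2|x3) P(x3), checked at one point.
x1, x2, x3 = 1, 0, 1
lhs = P[(x1, x2, x3)]
rhs = (P[(x1, x2, x3)] / marginal({1: x2, 2: x3})      # P(x1|x2,x3)
       * marginal({1: x2, 2: x3}) / marginal({2: x3})  # P(x2|x3)
       * marginal({2: x3}))                            # P(x3)
print(abs(lhs - rhs) < 1e-12)
```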
[Diagram for P(x1,x2,x3): chain x3 → x2 → x1, with x3 labeled "root cause" and x1 labeled "leaf".]
Correct order: add root causes first, then the "leaves", which have no influence on other nodes.
BBNs are locally structured systems. They represent joint distributions compactly.
Assume n Boolean random variables, each influenced by at most k nodes. BBN size: n·2^k entries. Full joint size: 2^n entries.
Even if k is small, O(2^k) may still be unmanageable.
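The arithmetic behind the compactness claim, checked for one (arbitrary, illustrative) choice of n and k:

```python
# Table sizes for n Boolean variables, each with at most k parents.
# BBN: each node stores a CPT with 2**k rows -> n * 2**k entries total.
# Full joint table: one entry per assignment  -> 2**n entries.
n, k = 30, 5
bbn_size = n * 2 ** k   # 30 * 32 = 960 entries
full_size = 2 ** n      # 2**30 = 1,073,741,824 entries
print(bbn_size, full_size)
```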
Solution: use canonical distributions.
Example:
[Diagram: U.S., Canada, and Mexico are parents of North America; North America is a simple disjunction of the three.]
Inhibition probabilities:
P(~fever | cold, ~flu, ~malaria) = 0.6
P(~fever | ~cold, flu, ~malaria) = 0.2
P(~fever | ~cold, ~flu, malaria) = 0.1
Now the whole table can be built as products of inhibition probabilities:
P(~fever | cold, ~flu, malaria) = 0.6 × 0.1
P(~fever | cold, flu, ~malaria) = 0.6 × 0.2
P(~fever | ~cold, flu, malaria) = 0.2 × 0.1
P(~fever | cold, flu, malaria) = 0.6 × 0.2 × 0.1
P(~fever | ~cold, ~flu, ~malaria) = 1.0
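A sketch of this noisy-OR style computation, using the inhibition probabilities from the slide:

```python
# Inhibition probabilities from the slide: the chance that each cause,
# acting alone, FAILS to produce fever.
inhibit = {"cold": 0.6, "flu": 0.2, "malaria": 0.1}

def p_no_fever(causes_present):
    """P(~fever | causes) = product of the inhibition probabilities of the
    causes that are present (1.0 if no cause is present)."""
    p = 1.0
    for cause in causes_present:
        p *= inhibit[cause]
    return p

print(p_no_fever(["cold", "malaria"]))         # 0.6 * 0.1
print(p_no_fever(["cold", "flu", "malaria"]))  # 0.6 * 0.2 * 0.1
print(p_no_fever([]))                          # 1.0
```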
Continuous variables can be discretized.
Or we can define probability density functions. Example: the Gaussian distribution.
A network with both variables is called a Hybrid Bayesian Network.