Bayesian Belief Networks
What does it mean for two variables to be independent?
Consider a multidimensional distribution p(x).
If for two features we know that p(xi, xj) = p(xi)p(xj), we say the features are statistically independent.
If we know which features are independent and which are not, we can simplify the computation of joint probabilities.
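As a quick numerical illustration (the joint table below is a hypothetical example, not from the slides), we can check p(xi, xj) = p(xi)p(xj) directly:

```python
# Hypothetical joint distribution over two binary features xi, xj,
# with numbers chosen so the features come out independent.
joint = {
    (0, 0): 0.12, (0, 1): 0.28,
    (1, 0): 0.18, (1, 1): 0.42,
}

# Marginals obtained by summing the joint over the other variable.
p_i = {v: sum(p for (a, _), p in joint.items() if a == v) for v in (0, 1)}
p_j = {v: sum(p for (_, b), p in joint.items() if b == v) for v in (0, 1)}

# Statistical independence: p(xi, xj) == p(xi) * p(xj) for every combination.
independent = all(
    abs(joint[(a, b)] - p_i[a] * p_j[b]) < 1e-9
    for (a, b) in joint
)
print(independent)
```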
A Bayesian Belief Network is a method to describe the joint probability distribution of a set of variables.
It is also called a causal network or belief net.
Let x1, x2, …, xn be a set of variables or features. A Bayesian Belief Network or BBN will tell us the probability of any combination of x1, x2, …, xn.
Set of Boolean variables and their relations:
[Network diagram: nodes Storm, Bus Tour Group, Lightning, Thunder, Campfire, Forest Fire. Storm and Bus Tour Group are parents of Campfire; Storm is the parent of Lightning; Lightning is the parent of Thunder; Lightning, Storm, and Campfire are parents of Forest Fire.]
We say x1 is conditionally independent of x2 given x3 if the probability of x1 is independent of x2 given x3:
P(x1|x2,x3) = P(x1|x3)
The same can be said for a set of variables:
x1,x2,x3 is independent of y1,y2,y3 given z1,z2,z3:
P(x1,x2,x3|y1,y2,y3,z1,z2,z3) = P(x1,x2,x3|z1,z2,z3)
A BBN represents the joint probability distribution of a set of variables by explicitly indicating the assumptions of conditional independence through: a) a directed acyclic graph and b) local conditional probabilities.
Each variable is independent of its non-descendants given its predecessors. We say x1 is a descendant of x2 if there is a directed path from x2 to x1. Example: Predecessors of Campfire: Storm, Bus Tour Group. (Campfire is a descendant of these two variables.) Campfire is independent of Lightning given its predecessors.
To compute the joint probability distribution of a set of variables given a Bayesian Belief Network we simply use the formula:
P(x1, x2, …, xn) = Π_i P(xi | Parents(xi))
where Parents(xi) are the immediate predecessors of xi in the graph.
Example:
P(Campfire, Storm, BusGroupTour, Lightning, Thunder, ForestFire)?
P(Storm) P(BusTourGroup) P(Campfire|Storm,BusTourGroup) P(Lightning|Storm) P(Thunder|Lightning) P(ForestFire|Lightning,Storm,Campfire)
Conditional probability table for Campfire:

         S,B    S,~B   ~S,B   ~S,~B
  C      0.4    0.1    0.8    0.2
 ~C      0.6    0.9    0.2    0.8
P(Campfire=true|Storm=true,BusTourGroup=true) = 0.4
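The lookup above, and the full product over local factors, can be sketched in Python. Only Campfire's table appears on the slide; every other probability below is a hypothetical placeholder, included just to show the shape of the computation:

```python
# Conditional probability table for Campfire given (Storm, BusTourGroup),
# taken from the slide; keys are (storm, bus_tour_group) truth values.
p_campfire = {
    (True, True): 0.4, (True, False): 0.1,
    (False, True): 0.8, (False, False): 0.2,
}

# P(Campfire=true | Storm=true, BusTourGroup=true) = 0.4
print(p_campfire[(True, True)])

# Hypothetical priors/conditionals for the remaining variables, just to
# illustrate P(x1,...,xn) = prod_i P(xi | Parents(xi)) for the all-true case.
p_storm = 0.1        # hypothetical P(Storm=true)
p_bus = 0.5          # hypothetical P(BusTourGroup=true)
p_lightning = 0.3    # hypothetical P(Lightning=true | Storm=true)
p_thunder = 0.9      # hypothetical P(Thunder=true | Lightning=true)
p_forestfire = 0.2   # hypothetical P(ForestFire=true | all parents true)

joint = (p_storm * p_bus * p_campfire[(True, True)]
         * p_lightning * p_thunder * p_forestfire)
print(joint)  # product of the six local factors
```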
We can learn BBNs in different ways. Three basic approaches follow:
1. Assume we know the network structure: We can estimate the conditional probabilities for each variable from the data.
2. Assume we know part of the structure but some variables are missing: This is like learning hidden units in a neural network. One can use a gradient ascent method to train the BBN.
3. Assume nothing is known: We can learn both the structure and the conditional probabilities by searching the space of possible networks.
What is the connection between a BBN and classification?
Suppose one of the variables is the target variable. Can we compute the probability of the target variable given the other variables?
In Naïve Bayes:
[Naïve Bayes structure: the concept wj is the single parent of each feature x1, x2, …, xn.]
P(x1,x2,…xn,wj) = P(wj) P(x1|wj) P(x2|wj) … P(xn|wj)
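A minimal sketch of this Naïve Bayes product, with hypothetical prior and likelihood values (n = 3 here just for illustration):

```python
# Naive Bayes joint: P(x1,...,xn, wj) = P(wj) * prod_i P(xi | wj).
# The prior and likelihoods below are hypothetical illustration values.
prior_wj = 0.3
likelihoods = [0.8, 0.5, 0.9]  # P(x1|wj), P(x2|wj), P(x3|wj)

joint = prior_wj
for p in likelihoods:
    joint *= p

print(joint)  # 0.3 * 0.8 * 0.5 * 0.9 = 0.108
```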
In the general case we can use a BBN to specify independence assumptions among variables.
General Case (four features x1, x2, x3, x4):
[Network diagram: the concept wj is a parent of every feature; x1 and x2 are also parents of x3.]
P(x1,x2,x3,x4,wj) = P(wj) P(x1|wj) P(x2|wj) P(x3|x1,x2,wj) P(x4|wj)
Choose the right order from causes to effects.
P(x1, x2, …, xn) = P(xn|xn-1, …, x1) P(xn-1, …, x1)
                 = Π_i P(xi|xi-1, …, x1)   (chain rule)
Example: P(x1,x2,x3) = P(x1|x2,x3) P(x2|x3) P(x3)
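The chain-rule example can be verified numerically on a toy three-variable joint (the table below is a hypothetical distribution, made up for illustration):

```python
# Hypothetical joint P(x1, x2, x3) over binary variables (entries sum to 1).
P = {(0, 0, 0): 0.02, (0, 0, 1): 0.08, (0, 1, 0): 0.10, (0, 1, 1): 0.20,
     (1, 0, 0): 0.05, (1, 0, 1): 0.15, (1, 1, 0): 0.13, (1, 1, 1): 0.27}

def marginal(fixed):
    """Sum P over all assignments consistent with fixed: {position: value}."""
    return sum(p for bits, p in P.items()
               if all(bits[i] == v for i, v in fixed.items()))

# Chain rule: P(x1,x2,x3) = P(x1|x2,x3) P(x2|x3) P(x3), checked at one point.
x1, x2, x3 = 1, 0, 1
lhs = P[(x1, x2, x3)]
rhs = (P[(x1, x2, x3)] / marginal({1: x2, 2: x3})      # P(x1|x2,x3)
       * marginal({1: x2, 2: x3}) / marginal({2: x3})  # P(x2|x3)
       * marginal({2: x3}))                            # P(x3)
print(abs(lhs - rhs) < 1e-12)
```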
[Diagram for P(x1,x2,x3): chain x3 → x2 → x1, with x3 labeled "root cause" and x1 labeled "leaf".]
Correct order: add root causes first, then the "leaves", which have no influence on other nodes.
BBNs are locally structured systems. They represent joint distributions compactly.
Assume n Boolean random variables, each influenced by at most k nodes. BBN size: n·2^k entries. Full joint size: 2^n entries.
Even if k is small, O(2^k) may still be unmanageable.
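The arithmetic behind the compactness claim, checked for one (arbitrary, illustrative) choice of n and k:

```python
# Table sizes for n Boolean variables, each with at most k parents.
# BBN: each node stores a CPT with 2**k rows -> n * 2**k entries total.
# Full joint table: one entry per assignment  -> 2**n entries.
n, k = 30, 5
bbn_size = n * 2 ** k   # 30 * 32 = 960 entries
full_size = 2 ** n      # 2**30 = 1,073,741,824 entries
print(bbn_size, full_size)
```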
Solution: use canonical distributions.
Example:
[Diagram: U.S., Canada, and Mexico are parents of North America; North America is a simple disjunction of the three.]
Inhibition probabilities:
P(~fever | cold, ~flu, ~malaria) = 0.6
P(~fever | ~cold, flu, ~malaria) = 0.2
P(~fever | ~cold, ~flu, malaria) = 0.1
Now the whole table can be built as products of inhibition probabilities:
P(~fever | cold, ~flu, malaria) = 0.6 × 0.1
P(~fever | cold, flu, ~malaria) = 0.6 × 0.2
P(~fever | ~cold, flu, malaria) = 0.2 × 0.1
P(~fever | cold, flu, malaria) = 0.6 × 0.2 × 0.1
P(~fever | ~cold, ~flu, ~malaria) = 1.0
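A sketch of this noisy-OR style computation, using the inhibition probabilities from the slide:

```python
# Inhibition probabilities from the slide: the chance that each cause,
# acting alone, FAILS to produce fever.
inhibit = {"cold": 0.6, "flu": 0.2, "malaria": 0.1}

def p_no_fever(causes_present):
    """P(~fever | causes) = product of the inhibition probabilities of the
    causes that are present (1.0 if no cause is present)."""
    p = 1.0
    for cause in causes_present:
        p *= inhibit[cause]
    return p

print(p_no_fever(["cold", "malaria"]))         # 0.6 * 0.1
print(p_no_fever(["cold", "flu", "malaria"]))  # 0.6 * 0.2 * 0.1
print(p_no_fever([]))                          # 1.0
```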
Continuous variables can be discretized.
Or we can define probability density functions. Example: the Gaussian distribution.
A network with both variables is called a Hybrid Bayesian Network.