
Page 1: Sampling Bayesian Networks

ICS 295, 2008

Page 2: Algorithm Tree

Page 3: Sampling Fundamentals

Given a set of variables X = {X1, X2, ..., Xn}, a joint probability distribution P(X), and some function g(X), we can compute the expected value of g(X):

E[g] = \int_{D(X)} g(x) p(x) dx

For a discrete distribution, the integral becomes a sum:

E[g] = \sum_{x \in D(X)} g(x) p(x)

Page 4: Sampling From P(X)

Given independent, identically distributed (iid) samples S^1, S^2, ..., S^T from P(X), it follows from the Strong Law of Large Numbers that:

\hat{g} = \frac{1}{T} \sum_{t=1}^{T} g(S^t) \to E[g]

A sample S^t is an instantiation: S^t = {x_1^t, x_2^t, ..., x_n^t}
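As a concrete illustration of the estimator above, here is a minimal Python sketch (not from the lecture; the Bernoulli(0.3) distribution and g(x) = x are illustrative assumptions):

```python
import random

def monte_carlo_expectation(sample_p, g, T=10000):
    """Estimate E[g(X)] by (1/T) * sum_t g(S_t), with S_t drawn iid from p."""
    return sum(g(sample_p()) for _ in range(T)) / T

# Toy example (assumed): X ~ Bernoulli(0.3) and g(x) = x, so E[g(X)] = 0.3.
sample_p = lambda: 1 if random.random() < 0.3 else 0
print(monte_carlo_expectation(sample_p, lambda x: x))  # close to 0.3 for large T
```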

Page 5: Sampling Challenge

It is hard to generate samples directly from P(X). Two trade-offs:

Generate independent samples from a proposal Q(X) (Forward Sampling, Likelihood Weighting, Importance Sampling); try to find a Q(X) close to P(X).

Generate dependent samples forming a Markov chain whose distribution P'(X) converges to P(X) (Metropolis, Metropolis-Hastings, Gibbs); try to reduce the dependence between samples.

Page 6: Markov Chain

A sequence of random values x^0, x^1, ..., defined on a finite state space D(X), is called a Markov chain if it satisfies the Markov property:

P(x^{t+1} = y | x^t = x, x^{t-1}, ..., x^0 = z) = P(x^{t+1} = y | x^t = x)

If P(x^{t+1} = y | x^t) does not change with t (the chain is time-homogeneous), it is often expressed as a transition function A(x, y), whose rows sum to one:

\sum_y A(x, y) = 1

(Liu, Ch. 12, p. 245)
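A minimal sketch of simulating such a chain from a transition function A(x, y); the two-state matrix is an arbitrary assumption whose rows sum to 1:

```python
import random

# Toy 2-state transition function A(x, y); each row sums to 1 (assumed numbers).
A = {0: {0: 0.9, 1: 0.1},
     1: {0: 0.5, 1: 0.5}}

def step(x):
    """Sample x_{t+1} from the row A(x_t, .)."""
    r, acc = random.random(), 0.0
    for y, p in A[x].items():
        acc += p
        if r < acc:
            return y
    return y  # guard against floating-point round-off

x = 0                       # initial state x0, "forgotten" in the long run
visits = 0
for t in range(100000):
    x = step(x)
    visits += x
print(visits / 100000)      # ~ stationary probability of state 1 (1/6 here)
```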

Page 7: Markov Chain Monte Carlo

First, define a transition probability P(x^{t+1} = y | x^t). Pick an initial state x^0; the choice is usually not important because it becomes "forgotten". Generate samples x^1, x^2, ..., drawing each next value from P(X | x^t):

x^0 -> x^1 -> ... -> x^t -> x^{t+1} -> ...

If we choose a proper P(x^{t+1} | x^t), we can guarantee that the distribution represented by the samples x^0, x^1, ... converges to the target distribution \pi(X).

Page 8: Markov Chain Properties

Irreducibility, Periodicity, Recurrence, Reversibility, Ergodicity, Stationary Distribution

Page 9: Irreducible

A state x is said to be irreducible if, under the transition rule, the chain has nonzero probability of moving from x to any other state and then coming back in a finite number of steps. If one state is irreducible, then all states must be irreducible. (Liu, Ch. 12, p. 249, Def. 12.1.1)

Page 10: Aperiodic

A state x is aperiodic if the greatest common divisor of {n : A^n(x,x) > 0} is 1. If state x is aperiodic and the chain is irreducible, then every state must be aperiodic. (Liu, Ch. 12, pp. 249-250, Def. 12.1.1)

Page 11: Recurrence

A state x is recurrent if the chain returns to x with probability 1. State x is recurrent if and only if:

\sum_{n=0}^{\infty} p_{ii}^{(n)} = \infty

Let M(x) be the expected number of steps to return to state x. State x is positive recurrent if M(x) is finite. The recurrent states in a finite-state chain are positive recurrent.

Page 12: Ergodicity

A state x is ergodic if it is aperiodic and positive recurrent. If all states in a Markov chain are ergodic, then the chain is ergodic.

Page 13: Reversibility

A Markov chain is reversible if there is a \pi such that the detailed balance condition holds:

\pi(i) P(x^{t+1} = j | x^t = i) = \pi(j) P(x^{t+1} = i | x^t = j)

that is, \pi_i p_{ij} = \pi_j p_{ji}. For a reversible Markov chain, \pi is always a stationary distribution.

Page 14: Stationary Distribution

If the Markov chain is time-homogeneous, then the vector \pi(X) is a stationary distribution (also called the invariant or equilibrium distribution, or "fixed point") if its entries sum to 1 and satisfy:

\pi(x) = \sum_{y \in D(X)} \pi(y) A(y, x)

An irreducible chain has a stationary distribution if and only if all of its states are positive recurrent; the distribution is then unique.

Page 15: Stationary Distribution in Finite State Space

In a finite state space, a stationary distribution always exists but may not be unique. If a finite-state Markov chain is irreducible and aperiodic, the stationary distribution is guaranteed to be unique, and A^n(x, y) = P(x^n = y | x^0 = x) converges to a rank-one matrix in which each row is the stationary distribution \pi.

Thus, the initial state x^0 is not important for convergence: it gets forgotten, and we start sampling from the target distribution. However, it is important how long it takes to forget it!

Page 16: Convergence Theorem

Given a finite-state Markov chain whose transition function is irreducible and aperiodic, A^n(x^0, y) converges to its invariant distribution \pi(y) geometrically in variation distance: there exist 0 < r < 1 and c > 0 such that

\| A^n(x, \cdot) - \pi \|_{var} \le c r^n

Page 17: Eigen-Value Condition

Convergence to the stationary distribution is driven by the eigenvalues of the matrix A(x, y): "The chain will converge to its unique invariant distribution if and only if matrix A's second-largest eigenvalue in modulus is strictly less than 1." Many proofs of convergence center on analyzing the second eigenvalue. (Liu, Ch. 12, p. 249)
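A small numeric check of this condition (an illustration, not part of the lecture): with numpy one can read off the stationary distribution as the left eigenvector for eigenvalue 1 and inspect the second-largest eigenvalue modulus. The transition matrix is the same toy example assumed earlier:

```python
import numpy as np

A = np.array([[0.9, 0.1],
              [0.5, 0.5]])            # row-stochastic transition matrix (assumed)

evals, evecs = np.linalg.eig(A.T)     # left eigenvectors of A
i = np.argmin(np.abs(evals - 1.0))    # eigenvalue 1 -> stationary distribution
pi = np.real(evecs[:, i])
pi /= pi.sum()
print(pi)                             # [5/6, 1/6] for this matrix

second = np.sort(np.abs(evals))[-2]   # second-largest eigenvalue in modulus
print(second)                         # 0.4 < 1, so the chain converges geometrically
```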

Page 18: Convergence in Finite State Space

Assume a finite-state Markov chain is irreducible and aperiodic. The initial state x^0 is not important for convergence: it gets forgotten, and we start sampling from the target distribution. However, it is important how long it takes to forget it; this is known as the burn-in time. Since the first k states are not drawn exactly from \pi, they are often thrown away. Open question: how big should k be?

Page 19: Sampling in BN

Same idea: generate a set of T samples and estimate P(Xi | E) from them. Challenge: X is a vector, and P(X) is a huge distribution represented by the BN. We need to know:

How to generate a new sample?
How many samples T do we need?
How to estimate P(E = e) and P(Xi | e)?

Page 20: Sampling Algorithms

Forward Sampling
Gibbs Sampling (MCMC): Blocking, Rao-Blackwellised
Likelihood Weighting
Importance Sampling
Sequential Monte-Carlo (Particle Filtering) in Dynamic Bayesian Networks

Page 21: Gibbs Sampling

A Markov Chain Monte Carlo method (Gelfand and Smith, 1990; Smith and Roberts, 1993; Tierney, 1994). The transition probability equals the conditional distribution. Example: for a joint \pi(X, Y), alternate A(x^{t+1} | y^t) = P(x | y) and A(y^{t+1} | x^t) = P(y | x), producing the chain x^0, y^0, x^1, y^1, ...

Page 22: Gibbs Sampling for BN

Samples are dependent and form a Markov chain. We sample from P'(X | e), which converges to P(X | e). Convergence is guaranteed when all probabilities are positive. Methods to improve convergence: Blocking, Rao-Blackwellisation. Error bounds: lag-k autocovariance; multiple chains with Chebyshev's inequality.

Page 23: Gibbs Sampling (Pearl, 1988)

A sample t \in [1, 2, ...] is an instantiation of all variables in the network:

x^t = {X_1 = x_1^t, X_2 = x_2^t, ..., X_N = x_N^t}

Sampling process: fix the values of the observed variables e; instantiate the node values in sample x^0 at random; generate samples x^1, x^2, ..., x^T from P(x | e); compute posteriors from the samples.

Page 24: Ordered Gibbs Sampler

Generate sample x^{t+1} from x^t by processing all variables in some order:

X_1 = x_1^{t+1}, sampled from P(x_1 | x_2^t, x_3^t, ..., x_N^t, e)
X_2 = x_2^{t+1}, sampled from P(x_2 | x_1^{t+1}, x_3^t, ..., x_N^t, e)
X_3 = x_3^{t+1}, sampled from P(x_3 | x_1^{t+1}, x_2^{t+1}, x_4^t, ..., x_N^t, e)
...
X_N = x_N^{t+1}, sampled from P(x_N | x_1^{t+1}, x_2^{t+1}, ..., x_{N-1}^{t+1}, e)

In short, for i = 1 to N:

X_i = x_i^{t+1}, sampled from P(x_i | x^t \ x_i, e)

Page 25: Gibbs Sampling (cont'd) (Pearl, 1988)

Important: P(x_i | x^t \ x_i) = P(x_i | markov_i^t). Given its Markov blanket (parents, children, and children's parents), X_i is independent of all other nodes. The Markov blanket is

M_i = pa(X_i) \cup ch(X_i) \cup \bigcup_{X_j \in ch(X_i)} pa(X_j)

and the conditional factorizes as

P(x_i | x^t \ x_i) \propto P(x_i | pa_i) \prod_{X_j \in ch_i} P(x_j | pa_j)

A small numeric sketch of this formula follows below.
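Here is a minimal sketch of this factorization on an assumed chain X -> Y -> Z with toy CPT numbers: the Markov blanket of Y is {X, Z}, so P(y | x, z) is proportional to P(y | x) P(z | y):

```python
# Chain X -> Y -> Z with binary variables; toy CPTs (assumed numbers).
P_y_given_x = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}  # P(Y=y | X=x)
P_z_given_y = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}  # P(Z=z | Y=y)

def blanket_conditional(x, z):
    """P(Y | X=x, Z=z), i.e. normalize P(y | x) * P(z | y) over y."""
    unnorm = {y: P_y_given_x[x][y] * P_z_given_y[y][z] for y in (0, 1)}
    total = sum(unnorm.values())
    return {y: v / total for y, v in unnorm.items()}

print(blanket_conditional(x=1, z=0))  # distribution of Y given its blanket
```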

Page 26: Ordered Gibbs Sampling Algorithm

Input: X, E. Output: T samples {x^t}.
Fix evidence E, then generate samples from P(X | E):
1. For t = 1 to T (compute samples)
2.   For i = 1 to N (loop through variables)
3.     X_i <- sample x_i^t from P(X_i | markov^t \ X_i)

A runnable sketch of this loop is given below.
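A runnable sketch of this algorithm on a tiny assumed chain X1 -> X2 -> X3 with evidence X3 = 1; all CPT numbers are illustrative, not from the lecture:

```python
import random

# Toy CPTs (assumed): P(X1), P(X2 | X1), P(X3 | X2); evidence X3 = 1.
p1 = {0: 0.6, 1: 0.4}
p2 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p3 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}

def cond(i, x):
    """P(X_i | Markov blanket of X_i) for this chain, normalized over {0, 1}."""
    if i == 1:
        w = {v: p1[v] * p2[v][x[2]] for v in (0, 1)}          # P(x1) P(x2|x1)
    else:
        w = {v: p2[x[1]][v] * p3[v][x[3]] for v in (0, 1)}    # P(x2|x1) P(x3|x2)
    z = sum(w.values())
    return {v: w[v] / z for v in w}

def gibbs(T=5000):
    x = {1: random.randint(0, 1), 2: random.randint(0, 1), 3: 1}  # fix evidence
    samples = []
    for t in range(T):
        for i in (1, 2):                  # loop through the unobserved variables
            d = cond(i, x)
            x[i] = 0 if random.random() < d[0] else 1
        samples.append(dict(x))
    return samples

samples = gibbs()
print(sum(s[1] for s in samples) / len(samples))  # estimate of P(X1=1 | X3=1)
```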

Page 27: Answering Queries

Query: P(x_i | e) = ?

Method 1 (histogram estimator): count the fraction of samples in which X_i = x_i:

\hat{P}(X_i = x_i) = \frac{\#samples(X_i = x_i)}{T}

Method 2 (mixture estimator): average the conditional probability across samples:

\hat{P}(X_i = x_i) = \frac{1}{T} \sum_{t=1}^{T} P(X_i = x_i | markov^t \ X_i)

A comparison of the two estimators is sketched below.
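A small sketch contrasting the two estimators; the sampled values and conditionals are toy numbers standing in for one Gibbs run:

```python
# Toy per-sample data (assumed) for one variable X_i with T = 5 samples.
samples = [1, 0, 1, 1, 0]             # sampled values x_i^t
conds   = [0.7, 0.4, 0.8, 0.6, 0.5]   # P(X_i = 1 | markov^t \ X_i) at each step

T = len(samples)
histogram_est = sum(1 for s in samples if s == 1) / T   # Method 1: counting
mixture_est   = sum(conds) / T                          # Method 2: averaging
print(histogram_est, mixture_est)
```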

Page 28: Gibbs Sampling Example - BN

X = {X1, X2, ..., X9}, E = {X9}.

[Figure: nine-node Bayesian network over X1, ..., X9; diagram not preserved in the transcript]

Page 29: Gibbs Sampling Example - BN

Initial assignment: X1 = x1^0, X2 = x2^0, X3 = x3^0, X4 = x4^0, X5 = x5^0, X6 = x6^0, X7 = x7^0, X8 = x8^0.

[Figure: the same network with the initial assignment]

Page 30: Gibbs Sampling Example - BN

X1 <- P(X1 | x2^0, ..., x8^0, x9), with E = {X9}.

[Figure: the same network, highlighting X1]

Page 31: Gibbs Sampling Example - BN

X2 <- P(X2 | x1^1, x3^0, ..., x8^0, x9), with E = {X9}.

[Figure: the same network, highlighting X2]

Page 32: Gibbs Sampling: Illustration

Page 33: Gibbs Sampling Example - Init

Initialize nodes with random values:
X1 = x1^0, X2 = x2^0, X3 = x3^0, X4 = x4^0, X5 = x5^0, X6 = x6^0, X7 = x7^0, X8 = x8^0

Initialize running sums:
SUM1 = 0, SUM2 = 0, SUM3 = 0, SUM4 = 0, SUM5 = 0, SUM6 = 0, SUM7 = 0, SUM8 = 0

Page 34: Gibbs Sampling Example - Step 1

Generate Sample 1:
Compute SUM1 += P(x1 | x2^0, x3^0, x4^0, x5^0, x6^0, x7^0, x8^0, x9); select and assign a new value X1 = x1^1.
Compute SUM2 += P(x2 | x1^1, x3^0, x4^0, x5^0, x6^0, x7^0, x8^0, x9); select and assign a new value X2 = x2^1.
Compute SUM3 += P(x3 | x1^1, x2^1, x4^0, x5^0, x6^0, x7^0, x8^0, x9); select and assign a new value X3 = x3^1.
...
At the end, we have the new sample: S1 = {x1^1, x2^1, ..., x8^1, x9}

Page 35: Gibbs Sampling Example - Step 2

Generate Sample 2:
Compute P(x1 | x2^1, x3^1, x4^1, x5^1, x6^1, x7^1, x8^1, x9); select and assign a new value X1 = x1^2; update SUM1 += P(x1 | x2^1, ..., x8^1, x9).
Compute P(x2 | x1^2, x3^1, x4^1, x5^1, x6^1, x7^1, x8^1, x9); select and assign a new value X2 = x2^2; update SUM2 += P(x2 | x1^2, x3^1, ..., x8^1, x9).
Compute P(x3 | x1^2, x2^2, x4^1, x5^1, x6^1, x7^1, x8^1, x9); select and assign a new value X3 = x3^2; update SUM3 += P(x3 | x1^2, x2^2, x4^1, ..., x8^1, x9).
...
New sample: S2 = {x1^2, x2^2, ..., x8^2, x9}

Page 36: Gibbs Sampling Example - Answering Queries

With T = 2 samples:
P(x1 | x9) = SUM1 / 2
P(x2 | x9) = SUM2 / 2
P(x3 | x9) = SUM3 / 2
P(x4 | x9) = SUM4 / 2
P(x5 | x9) = SUM5 / 2
P(x6 | x9) = SUM6 / 2
P(x7 | x9) = SUM7 / 2
P(x8 | x9) = SUM8 / 2

Page 37: Gibbs Convergence

The stationary distribution equals the target sampling distribution. MCMC converges to the stationary distribution if the network is ergodic. The chain is ergodic if all transition probabilities are positive:

for all states S_i, S_j: p_{ij} > 0

If there exist i, j such that p_{ij} = 0, then we may not be able to explore the full sampling space!

Page 38: Gibbs Sampling: Burn-In

We want to sample from P(X | E), but the starting point is random. Solution: throw away the first K samples, known as "burn-in". What is K? Hard to tell; use intuition. Alternative: draw the first sample's values from an approximation of P(x | e) (for example, run IBP first).

Page 39: Gibbs Sampling: Performance

+ Advantage: guaranteed to converge to P(X | E).
- Disadvantage: convergence may be slow.

Problems: samples are dependent, and the statistical variance is too big in high-dimensional problems.

Page 40: Gibbs: Speeding Convergence

Objectives:
1. Reduce dependence between samples (autocorrelation): skip samples; randomize the variable sampling order.
2. Reduce variance: Blocking Gibbs Sampling; Rao-Blackwellisation.

Page 41: Skipping Samples

Pick only every k-th sample (Geyer, 1992). This can reduce the dependence between samples, but it increases variance and wastes samples!

Page 42: Randomized Variable Order

Random Scan Gibbs Sampler: pick each next variable X_i for update at random with probability p_i, where \sum_i p_i = 1. (In the simplest case, the p_i are uniform.) In some instances this reduces variance (MacEachern and Peruggia, 1999, "Subsampling the Gibbs Sampler: Variance Reduction").

Page 43: Blocking

Sample several variables together, as a block. Example: given three variables X, Y, Z with domains of size 2, group Y and Z together to form a variable W = {Y, Z} with domain size 4. Then, given sample (x^t, y^t, z^t), compute the next sample:

x^{t+1} <- P(x | y^t, z^t) = P(x | w^t)
(y^{t+1}, z^{t+1}) = w^{t+1} <- P(w | x^{t+1})

+ Can improve convergence greatly when two variables are strongly correlated!
- The domain of the block variable grows exponentially with the number of variables in a block!

Page 44: Blocking Gibbs Sampling

Jensen, Kong, and Kjaerulff, 1993, "Blocking Gibbs Sampling in Very Large Probabilistic Expert Systems". Select a set of subsets E_1, E_2, E_3, ..., E_k such that:

E_i \subseteq X
\bigcup_i E_i = X
A_i = X \ E_i

Then sample from P(E_i | A_i).

Page 45: Rao-Blackwellisation

Do not sample all variables: sample a subset! Example: given three variables X, Y, Z, sample only X and Y, summing out Z. Given sample (x^t, y^t), compute the next sample:

x^{t+1} <- P(x | y^t)
y^{t+1} <- P(y | x^{t+1})

A sketch of this two-variable scheme follows below.
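A minimal sketch of this scheme with a toy 2x2 joint P(X, Y) in which Z has already been summed out (all numbers assumed):

```python
import random

# Toy joint P(X, Y) after Z has been summed out (assumed numbers).
joint = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}

def draw(dist):
    """Sample a value from a {value: probability} dict."""
    r, acc = random.random(), 0.0
    for v, p in dist.items():
        acc += p
        if r < acc:
            return v
    return v

def conditional(var, other):
    """P(X | Y=other) if var == 'x', else P(Y | X=other)."""
    w = ({v: joint[(v, other)] for v in (0, 1)} if var == 'x'
         else {v: joint[(other, v)] for v in (0, 1)})
    z = sum(w.values())
    return {v: w[v] / z for v in w}

x, y, hits = 0, 0, 0
for t in range(20000):
    x = draw(conditional('x', y))     # x^{t+1} ~ P(x | y^t)
    y = draw(conditional('y', x))     # y^{t+1} ~ P(y | x^{t+1})
    hits += x
print(hits / 20000)                   # ~ P(X=1) = 0.6 for this joint
```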

Page 46: Rao-Blackwell Theorem

Bottom line: reducing the number of variables in a sample reduces the variance!

Page 47: Blocking vs. Rao-Blackwellisation

For three variables X, Y, Z:

Standard Gibbs: P(x | y,z), P(y | x,z), P(z | x,y)  (1)
Blocking: P(x | y,z), P(y,z | x)  (2)
Rao-Blackwellised: P(x | y), P(y | x)  (3)

Var3 < Var2 < Var1 (Liu, Wong, and Kong, 1994, "Covariance structure of the Gibbs sampler...").

Page 48: Rao-Blackwellised Gibbs: Cutset Sampling

Select C \subseteq X (possibly a cycle-cutset), |C| = m. Fix evidence E. Initialize the cutset nodes with random values: for i = 1 to m, set C_i = c_i^0. Then, for t = 1 to T, generate samples: for i = 1 to m,

C_i = c_i^{t+1} <- P(c_i | c_1^{t+1}, ..., c_{i-1}^{t+1}, c_{i+1}^t, ..., c_m^t, e)

A structural sketch of this loop follows below.
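A structural sketch of this loop. The joint_via_BTE callback is hypothetical, standing in for exact inference (e.g., BTE) that returns P(C_i = v, rest of the cutset, e); it is an assumption used only to show the control flow:

```python
import random

def cutset_gibbs(domains, joint_via_BTE, T):
    """domains: {i: list of values of C_i}; returns T cutset samples.

    joint_via_BTE(i, v, rest) must return P(C_i = v, rest, e), e.g. by
    bucket-tree elimination over the non-cutset variables (hypothetical)."""
    c = {i: random.choice(vals) for i, vals in domains.items()}   # random c^0
    samples = []
    for t in range(T):
        for i in domains:                         # for i = 1 to m
            rest = {j: v for j, v in c.items() if j != i}
            w = {v: joint_via_BTE(i, v, rest) for v in domains[i]}
            z = sum(w.values())
            r, acc = random.random(), 0.0
            for v, p in w.items():                # sample c_i from w / z
                acc += p / z
                if r < acc:
                    break
            c[i] = v
        samples.append(dict(c))
    return samples

# Demo with a fake stand-in for BTE over two binary cutset variables:
fake_bte = lambda i, v, rest: 0.25 + 0.5 * v
print(cutset_gibbs({1: [0, 1], 2: [0, 1]}, fake_bte, T=3))
```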

Page 49: Cutset Sampling

Select a subset C = {C_1, ..., C_K} \subseteq X. A sample c^t, t \in [1, 2, ...], is an instantiation of C:

c^t = {C_1 = c_1^t, C_2 = c_2^t, ..., C_K = c_K^t}

Sampling process: fix the values of the observed variables e; generate sample c^0 at random; generate samples c^1, c^2, ..., c^T from P(c | e); compute posteriors from the samples.

Page 50: Cutset Sampling - Generating Samples

Generate sample c^{t+1} from c^t:

C_1 = c_1^{t+1}, sampled from P(c_1 | c_2^t, c_3^t, ..., c_K^t, e)
C_2 = c_2^{t+1}, sampled from P(c_2 | c_1^{t+1}, c_3^t, ..., c_K^t, e)
C_3 = c_3^{t+1}, sampled from P(c_3 | c_1^{t+1}, c_2^{t+1}, c_4^t, ..., c_K^t, e)
...
C_K = c_K^{t+1}, sampled from P(c_K | c_1^{t+1}, c_2^{t+1}, ..., c_{K-1}^{t+1}, e)

In short, for i = 1 to K:

C_i = c_i^{t+1}, sampled from P(c_i | c^t \ c_i, e)

Page 51: Rao-Blackwellised Gibbs: Cutset Sampling

How do we compute P(c_i | c^t \ c_i, e)? Compute the joint P(c_i, c^t \ c_i, e) for each value c_i \in D(C_i), then normalize:

P(c_i | c^t \ c_i, e) = \alpha P(c_i, c^t \ c_i, e)

Computation efficiency depends on the choice of C.

Page 52: Rao-Blackwellised Gibbs: Cutset Sampling

How to choose C? Special case: C is a cycle-cutset, giving O(N) computation. General case: apply Bucket Tree Elimination (BTE), O(exp(w)), where w is the induced width of the network when the nodes in C are observed. Pick C wisely so as to minimize w; this leads to the notion of a w-cutset.

Page 53: w-cutset Sampling

C is a w-cutset of the network: a set of nodes such that, when C and E are instantiated, the adjusted induced width of the network is w. The complexity of exact inference is then bounded by w. The cycle-cutset is a special case.

Page 54: Cutset Sampling - Answering Queries

Query for a cutset node c_i \in C: P(c_i | e) = ? Same as Gibbs (a special case of w-cutset sampling), computed while generating sample t:

\hat{P}(c_i | e) = \frac{1}{T} \sum_{t=1}^{T} P(c_i | c^t \ c_i, e)

Query for any other node X_i: P(x_i | e) = ? Computed after generating sample t:

\hat{P}(x_i | e) = \frac{1}{T} \sum_{t=1}^{T} P(x_i | c^t, e)

Page 55: Cutset Sampling Example

Initial cutset sample: c^0 = {x_2^0, x_5^0}, with evidence E = x_9.

[Figure: network over X1, ..., X9 with cutset C = {X2, X5}; diagram not preserved in the transcript]

Page 56: Cutset Sampling Example

Sample a new value for X2, starting from c^0 = {x_2^0, x_5^0}. For each value x_2' \in D(X_2), compute

P(x_2' | x_5^0, x_9) = \frac{BTE(x_2', x_5^0, x_9)}{\sum_{x_2''} BTE(x_2'', x_5^0, x_9)}

then sample x_2^1 <- P(x_2 | x_5^0, x_9).

Page 57: Cutset Sampling Example

Sample a new value for X5. For each value x_5' \in D(X_5), compute

P(x_5' | x_2^1, x_9) = \frac{BTE(x_2^1, x_5', x_9)}{\sum_{x_5''} BTE(x_2^1, x_5'', x_9)}

then sample x_5^1 <- P(x_5 | x_2^1, x_9), giving the new cutset sample c^1 = {x_2^1, x_5^1}.

Page 58: Cutset Sampling Example

Query P(x_2 | e) for sampling node X2, using three samples:

Sample 1: x_2^1 <- P(x_2 | x_5^0, x_9)
Sample 2: x_2^2 <- P(x_2 | x_5^1, x_9)
Sample 3: x_2^3 <- P(x_2 | x_5^2, x_9)

\hat{P}(x_2 | x_9) = \frac{1}{3} [ P(x_2 | x_5^0, x_9) + P(x_2 | x_5^1, x_9) + P(x_2 | x_5^2, x_9) ]

Page 59: Cutset Sampling Example

Query P(x_3 | e) for non-sampled node X3, using the samples c^1 = {x_2^1, x_5^1}, c^2 = {x_2^2, x_5^2}, c^3 = {x_2^3, x_5^3}:

\hat{P}(x_3 | x_9) = \frac{1}{3} [ P(x_3 | x_2^1, x_5^1, x_9) + P(x_3 | x_2^2, x_5^2, x_9) + P(x_3 | x_2^3, x_5^3, x_9) ]

Page 60: Gibbs: Error Bounds

Objectives: estimate the needed number of samples T and estimate the error. Methodology:

1 chain: use the lag-k autocovariance to estimate T.
M chains: use the standard sampling variance to estimate the error.

Page 61: Gibbs: Lag-k Autocovariance

Estimate P = P(x_i | e) with the mixture estimator and measure the lag-k autocovariance of the per-sample terms P_t = P(x_i | x^t \ x_i):

\hat{P} = \frac{1}{T} \sum_{t=1}^{T} P_t

\gamma(k) = \frac{1}{T} \sum_{t=1}^{T-k} (P_t - \hat{P})(P_{t+k} - \hat{P})

Var(\hat{P}) = \frac{1}{T} \left[ \gamma(0) + 2 \sum_{k=1}^{2\delta+1} \gamma(k) \right]

Page 62: Gibbs: Lag-k Autocovariance (cont'd)

Estimate the Monte Carlo variance:

Var(\hat{P}) = \frac{1}{T} \left[ \gamma(0) + 2 \sum_{k=1}^{2\delta+1} \gamma(k) \right]

Here, \delta is the smallest positive integer satisfying \hat{\gamma}(2\delta) + \hat{\gamma}(2\delta+1) > 0. Effective chain size:

\hat{T} = \frac{\hat{\gamma}(0)}{Var(\hat{P})}

In the absence of autocovariance, \hat{T} = T. A sketch of this computation follows below.
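A sketch of this machinery; the toy series P_t and the choice delta = 1 are assumptions for illustration:

```python
def autocov(p, k):
    """gamma(k) = (1/T) * sum_{t=1}^{T-k} (P_t - mean)(P_{t+k} - mean)."""
    T = len(p)
    m = sum(p) / T
    return sum((p[t] - m) * (p[t + k] - m) for t in range(T - k)) / T

def mc_variance(p, delta):
    """Var(P_hat) = (1/T) * (gamma(0) + 2 * sum_{k=1}^{2*delta+1} gamma(k))."""
    T = len(p)
    return (autocov(p, 0)
            + 2 * sum(autocov(p, k) for k in range(1, 2 * delta + 2))) / T

# Toy series of per-sample conditionals P_t (assumed numbers):
p = [0.52, 0.48, 0.55, 0.50, 0.47, 0.53, 0.49, 0.51]
var = mc_variance(p, delta=1)
print(var, autocov(p, 0) / var)   # Monte Carlo variance and effective size T_hat
```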

Page 63: Gibbs: Multiple Chains

Generate M chains of size K. Each chain m produces an independent estimate P_m:

P_m = \hat{P}(x_i | e) = \frac{1}{K} \sum_{t=1}^{K} P(x_i | x^t \ x_i)

Treat the P_m as independent random variables and estimate P(x_i | e) as their average:

\hat{P} = \frac{1}{M} \sum_{m=1}^{M} P_m

Page 64: Gibbs: Multiple Chains (cont'd)

The {P_m} are independent random variables; therefore:

Var(\hat{P}) = \frac{S^2}{M}, where S^2 = \frac{1}{M-1} \sum_{m=1}^{M} (P_m - \hat{P})^2

and an error bound can be obtained from the Student's t statistic t_{\alpha/2, M-1}. A numeric sketch follows below.
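A numeric sketch of the multiple-chains estimate; the per-chain values and the 95% t quantile for M = 5 are toy assumptions:

```python
import math

Pm = [0.31, 0.28, 0.35, 0.30, 0.33]            # per-chain estimates P_m (toy)
M = len(Pm)
P = sum(Pm) / M                                # pooled estimate of P(x_i | e)
S2 = sum((p - P) ** 2 for p in Pm) / (M - 1)   # sample variance of {P_m}
stderr = math.sqrt(S2 / M)
t = 2.776                                      # t_{alpha/2, M-1}, alpha = 0.05, M = 5
print(P, (P - t * stderr, P + t * stderr))     # estimate with ~95% interval
```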

Page 65: Geman & Geman, 1984

Geman, S. and Geman, D., 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6, 721-741.

Introduced Gibbs sampling and placed it in a general setting in which the collection of variables is structured in a graphical model and each variable has a neighborhood corresponding to a local region of the graphical structure. Geman and Geman use the Gibbs distribution to define the joint distribution on this structured set of variables.

Page 66: Tanner & Wong, 1987

Tanner and Wong (1987): data augmentation; convergence results.

Page 67: Pearl, 1988

Pearl, 1988. Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann.

In the case of Bayesian networks, the neighborhoods correspond to the Markov blanket of a variable, and the joint distribution is defined by the factorization of the network.

Page 68: Gelfand & Smith, 1990

Gelfand, A.E. and Smith, A.F.M., 1990. Sampling-based approaches to calculating marginal densities. J. Am. Statist. Assoc. 85, 398-409.

Showed variance reduction from using the mixture estimator for posterior marginals.

Page 69: Neal, 1992

R. M. Neal, 1992. Connectionist learning of belief networks. Artificial Intelligence, v. 56, pp. 71-118.

Stochastic simulation in Noisy-OR networks.

Page 70: CPCS54 Test Results

MSE vs. #samples (left) and time (right). Ergodic network: |X| = 54, D(Xi) = 2, |C| = 15, |E| = 4. Exact time = 30 sec using Cutset Conditioning.

[Plots: "CPCS54, n=54, |C|=15, |E|=3"; MSE vs. #samples (0-5000) and MSE vs. time (0-25 sec), comparing Cutset and Gibbs; curves not preserved in the transcript]

Page 71: CPCS179 Test Results

MSE vs. #samples (left) and time (right). Non-ergodic network (1 deterministic CPT entry): |X| = 179, |C| = 8, 2 <= D(Xi) <= 4, |E| = 35. Exact time = 122 sec using Loop-Cutset Conditioning.

[Plots: "CPCS179, n=179, |C|=8, |E|=35"; MSE vs. #samples (100-4000) and MSE vs. time (0-80 sec), comparing Cutset and Gibbs]

Page 72: CPCS360b Test Results

MSE vs. #samples (left) and time (right). Ergodic network: |X| = 360, D(Xi) = 2, |C| = 21, |E| = 36. Exact time > 60 min using Cutset Conditioning; exact values obtained via Bucket Elimination.

[Plots: "CPCS360b, n=360, |C|=21, |E|=36"; MSE vs. #samples (0-1000) and MSE vs. time (1-60 sec), comparing Cutset and Gibbs]

Page 73: Random Networks

MSE vs. #samples (left) and time (right). |X| = 100, D(Xi) = 2, |C| = 13, |E| = 15-20. Exact time = 30 sec using Cutset Conditioning.

[Plots: "RANDOM, n=100, |C|=13, |E|=15-20"; MSE vs. #samples (0-1200) and MSE vs. time (0-11 sec), comparing Cutset and Gibbs]

Page 74: Coding Networks

MSE vs. time. Non-ergodic network: |X| = 100, D(Xi) = 2, |C| = 13-16, |E| = 50. Sample the ergodic subspace U = {U1, U2, ..., Uk}. Exact time = 50 sec using Cutset Conditioning.

[Figure: coding network structure over nodes x1-x4, u1-u4, p1-p4, y1-y4, and plot "Coding Networks, n=100, |C|=12-14"; MSE vs. time (0-60 sec), comparing IBP, Gibbs, and Cutset]

Page 75: Non-Ergodic Hailfinder

MSE vs. #samples (left) and time (right). Non-ergodic network: |X| = 56, |C| = 5, 2 <= D(Xi) <= 11, |E| = 0. Exact time = 2 sec using Loop-Cutset Conditioning.

[Plots: "HailFinder, n=56, |C|=5, |E|=1"; MSE vs. time (1-10 sec) and MSE vs. #samples (0-1500) on a log scale, comparing Cutset and Gibbs]

Page 76: Non-Ergodic CPCS360b - MSE

MSE vs. time. Non-ergodic network: |X| = 360, |C| = 26, D(Xi) = 2. Exact time = 50 min using BTE.

[Plot: "cpcs360b, N=360, |E|=[20-34], w*=20, MSE" vs. time (0-1600 sec), comparing Gibbs, IBP, |C|=26 (fw=3), and |C|=48 (fw=2)]

Page 77: Non-Ergodic CPCS360b - MaxErr

[Plot: "cpcs360b, N=360, |E|=[20-34], MaxErr" vs. time (0-1600 sec), comparing Gibbs, IBP, |C|=26 (fw=3), and |C|=48 (fw=2)]

Page 78: Importance vs. Gibbs

Gibbs: draw x^t from \tilde{P}(x | e), where \tilde{P}(x | e) approaches P(x | e), and estimate

\hat{f} = \frac{1}{T} \sum_{t=1}^{T} f(x^t)

Importance: draw x^t from a proposal Q(x | e) and estimate

\hat{f} = \frac{1}{T} \sum_{t=1}^{T} f(x^t) w^t, where w^t = \frac{P(x^t)}{Q(x^t)}
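A side-by-side sketch of the two estimators (all distributions are toy assumptions); the importance estimate is demonstrated on a Bernoulli target with a uniform proposal:

```python
import random

def gibbs_estimate(f, xs):
    """Average f over (dependent) samples drawn from ~P(x | e)."""
    return sum(f(x) for x in xs) / len(xs)

def importance_estimate(f, xs, p, q):
    """Average f weighted by w = p(x) / q(x) over samples drawn from q."""
    return sum(f(x) * p(x) / q(x) for x in xs) / len(xs)

# Toy target p = Bernoulli(0.3) and proposal q = Bernoulli(0.5) (assumed):
p = lambda x: 0.3 if x == 1 else 0.7
q = lambda x: 0.5
xs = [1 if random.random() < 0.5 else 0 for _ in range(10000)]  # draws from q
print(importance_estimate(lambda x: x, xs, p, q))               # ~ E_p[X] = 0.3
```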