Importance Sampling


Page 1: Importance Sampling

Importance Sampling

ICS 276, Fall 2007

Rina Dechter

Page 2: Importance Sampling

Outline

- Gibbs Sampling
  - Advances in Gibbs sampling: Blocking; Cutset sampling (Rao-Blackwellisation)
- Importance Sampling
  - Advances in Importance Sampling
- Particle Filtering

Page 3: Importance Sampling

Importance Sampling Theory

Let $Z = X \setminus E$. Then

$$P(E=e) = \sum_{X \setminus E} P(X \setminus E, E=e) = \sum_{X \setminus E} \prod_{i=1}^{n} P(x_i \mid pa(x_i), e)$$

i.e., $P(E=e) = \sum_{z} P(Z=z, E=e)$.

Page 4: Importance Sampling

Importance Sampling Theory

Given a distribution Q, called the proposal distribution, such that $P(Z=z, E=e) > 0 \Rightarrow Q(Z=z) > 0$:

$$P(E=e) = \sum_{z \in Z} P(Z=z, E=e) = \sum_{z \in Z} \frac{P(Z=z, E=e)}{Q(Z=z)}\, Q(Z=z)$$

By definition of expected value, $E_Q[f(Z)] = \sum_{z \in Z} f(z)\, Q(Z=z)$, so

$$P(E=e) = E_Q\!\left[\frac{P(Z=z, E=e)}{Q(Z=z)}\right] = E_Q[w(Z=z)]$$

$w(Z=z) = P(Z=z, E=e)/Q(Z=z)$ is called the importance weight.

Page 5: Importance Sampling

Importance Sampling Theory

$$P(E=e) = E_Q\!\left[\frac{P(Z=z, E=e)}{Q(Z=z)}\right] = E_Q[w(Z=z)]$$

Given a set of samples $z^1, \ldots, z^N$ drawn from $Q$:

$$\hat{P}(E=e) = \frac{1}{N} \sum_{i=1}^{N} \frac{P(Z=z^i, E=e)}{Q(Z=z^i)} = \frac{1}{N} \sum_{i=1}^{N} w(Z=z^i)$$

As $N \to \infty$, $\hat{P}(E=e) \to P(E=e)$.

Underlying principle: approximate the average over a set of numbers by the average over a set of sampled numbers.

Page 6: Importance Sampling

Importance Sampling (Informally)

- Express the problem as computing the average over a set of real numbers
- Sample a subset of those real numbers
- Approximate the true average by the sample average

True average: average of (0.11, 0.24, 0.55, 0.77, 0.88, 0.99) = 0.59
Sample average over 2 samples: average of (0.24, 0.77) = 0.505
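The worked numbers above can be checked directly, using only the slide's own values:

```python
numbers = [0.11, 0.24, 0.55, 0.77, 0.88, 0.99]

true_avg = sum(numbers) / len(numbers)   # 3.54 / 6 = 0.59
sample_avg = (0.24 + 0.77) / 2           # = 0.505
```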

Page 7: Importance Sampling

How to generate samples from Q

Express Q in product form: $Q(Z) = Q(Z_1)\, Q(Z_2 \mid Z_1) \cdots Q(Z_n \mid Z_1, \ldots, Z_{n-1})$

Sample along the order Z1,..Zn

Example (binary variables):
Q(Z1) = (0.2, 0.8)
Q(Z2|Z1) = (0.2, 0.8, 0.1, 0.9), i.e., (0.2, 0.8) when Z1 = 0 and (0.1, 0.9) when Z1 = 1
Q(Z3|Z1,Z2) = Q(Z3|Z1) = (0.5, 0.5, 0.3, 0.7)

$$\hat{P}(E=e) = \frac{1}{N} \sum_{i=1}^{N} \frac{P(Z=z^i, E=e)}{Q(Z=z^i)}$$

Page 8: Importance Sampling

How to sample from Q

Generate a random number between 0 and 1

Q(Z1) = (0.2, 0.8)
Q(Z2|Z1) = (0.2, 0.8, 0.1, 0.9)
Q(Z3|Z1,Z2) = Q(Z3|Z1) = (0.5, 0.5, 0.3, 0.7)

[Number line from 0 to 1 with a mark at 0.2]

Which value to select for Z1? The domain of each variable is {0, 1}: select Z1 = 0 if the random number falls below 0.2, and Z1 = 1 otherwise.

Page 9: Importance Sampling

How to sample from Q?

Each sample Z = z:
- Sample Z1 = z1 from Q(Z1)
- Sample Z2 = z2 from Q(Z2|Z1 = z1)
- Sample Z3 = z3 from Q(Z3|Z1 = z1)

Generate N such samples

Given samples $(z^1, \ldots, z^N)$ drawn from $Q$:

$$\hat{P}(E=e) = \frac{1}{N} \sum_{i=1}^{N} \frac{P(Z=z^i, E=e)}{Q(Z=z^i)} = \frac{1}{N} \sum_{i=1}^{N} w(Z=z^i)$$
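The sampling scheme of the preceding slides can be sketched in Python. The Q tables come from the running example; reading each tuple as a pair of probabilities over the binary domain {0, 1} (one row per value of Z1) is an assumption about how the slide's tuples are meant to be parsed.

```python
import random

# Q in product form for three binary variables (domain {0, 1}).
Q1 = [0.2, 0.8]                        # Q(Z1): P(Z1=0), P(Z1=1)
Q2 = {0: [0.2, 0.8], 1: [0.1, 0.9]}    # Q(Z2 | Z1), one row per Z1
Q3 = {0: [0.5, 0.5], 1: [0.3, 0.7]}    # Q(Z3 | Z1, Z2) = Q(Z3 | Z1)

def draw(dist):
    """Generate a random number in [0, 1): return 0 if it falls
    below P(value = 0), else return 1."""
    return 0 if random.random() < dist[0] else 1

def sample_from_Q():
    """Sample one full assignment along the order Z1, Z2, Z3."""
    z1 = draw(Q1)
    z2 = draw(Q2[z1])
    z3 = draw(Q3[z1])
    return (z1, z2, z3)

samples = [sample_from_Q() for _ in range(10)]   # N = 10 samples
```

Each sample's proposal probability is then Q1[z1] * Q2[z1][z2] * Q3[z1][z3], which is the denominator of its importance weight.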

Page 10: Importance Sampling

Likelihood weighting

Q = prior distribution = the CPTs of the Bayesian network

Page 11: Importance Sampling

Likelihood weighting example

[Figure: Bayesian network with nodes Smoking (S), Lung Cancer (C), Bronchitis (B), X-ray (X), and Dyspnoea (D), with CPTs P(S), P(C|S), P(B|S), P(X|C,S), P(D|C,B)]

P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)

What is $P(X=1, B=0)$, where 1 = true and 0 = false?

$$P(X=1, B=0) = \sum_{S,C,D} P(S)\, P(C \mid S)\, P(B=0 \mid S)\, P(X=1 \mid C,S)\, P(D \mid C, B=0)$$

Page 12: Importance Sampling

Likelihood weighting example

[Figure: Bayesian network with nodes Smoking (S), Lung Cancer (C), Bronchitis (B), X-ray (X), and Dyspnoea (D), with CPTs P(S), P(C|S), P(B|S), P(X|C,S), P(D|C,B)]

Q=Prior

Q(S,C,D) = Q(S) · Q(C|S) · Q(D|C,B=0) = P(S) · P(C|S) · P(D|C,B=0)

Sample S=s from P(S)

Sample C=c from P(C|S=s)

Sample D=d from P(D|C=c,B=0)

$$w(Z=z^i) = \frac{P(Z=z^i, E=e)}{Q(Z=z^i)} = \frac{P(S=s, C=c, D=d, X=1, B=0)}{P(S=s)\, P(C=c \mid S=s)\, P(D=d \mid C=c, B=0)}$$

$$= \frac{P(S=s)\, P(C=c \mid S=s)\, P(B=0 \mid S=s)\, P(X=1 \mid C=c, S=s)\, P(D=d \mid C=c, B=0)}{P(S=s)\, P(C=c \mid S=s)\, P(D=d \mid C=c, B=0)}$$

$$= P(B=0 \mid S=s)\, P(X=1 \mid C=c, S=s)$$

Page 13: Importance Sampling

The Algorithm

For k = 1 to N:
    w_k ← 1
    For each X_i in topological order o = (X_1, ..., X_n):
        If X_i ∉ E: sample x_i from P(X_i | pa_i)
        Else: assign X_i ← e_i and set w_k ← w_k · P(e_i | pa_i)
P̂(e) ← (1/N) Σ_{k=1}^{N} w_k
Return P̂(e)
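The likelihood-weighting algorithm can be sketched in Python for the Smoking network of the earlier slides, estimating P(X=1, B=0). The CPT numbers below are invented for illustration; the slides do not specify them.

```python
import random

# Illustrative CPTs (not from the slides).
P_S = 0.3                                    # P(S=1)
P_C = {0: 0.05, 1: 0.20}                     # P(C=1 | S)
P_B = {0: 0.10, 1: 0.30}                     # P(B=1 | S)
P_X = {(0, 0): 0.05, (0, 1): 0.10,
       (1, 0): 0.80, (1, 1): 0.90}           # P(X=1 | C, S)
P_D = {(0, 0): 0.10, (0, 1): 0.60,
       (1, 0): 0.70, (1, 1): 0.90}           # P(D=1 | C, B)

def bern(p):
    return 1 if random.random() < p else 0

def likelihood_weighting(N, x_obs=1, b_obs=0):
    """Visit variables in topological order (S, C, B, X, D): sample
    unobserved variables from their CPTs; for evidence variables,
    multiply the weight by P(evidence value | parents) instead."""
    total = 0.0
    for _ in range(N):
        w = 1.0
        s = bern(P_S)                        # S unobserved: sample
        c = bern(P_C[s])                     # C unobserved: sample
        b = b_obs                            # B is evidence: weight it
        w *= P_B[s] if b == 1 else 1.0 - P_B[s]
        x = x_obs                            # X is evidence: weight it
        w *= P_X[(c, s)] if x == 1 else 1.0 - P_X[(c, s)]
        bern(P_D[(c, b)])                    # D unobserved: sample (unused here)
        total += w
    return total / N                         # estimate of P(X=1, B=0)
```

For these illustrative CPTs the exact answer is Σ_{s,c} P(s) P(c|s) P(B=0|s) P(X=1|c,s) ≈ 0.1097, which the estimate approaches as N grows.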

Page 14: Importance Sampling

How to solve belief updating?

$$P(X_i = x_i \mid E=e) = \frac{P(X_i = x_i, E=e)}{P(E=e)}$$

Numerator: evidence is $X_i = x_i, E=e$. Denominator: evidence is $E=e$. Estimate the numerator and the denominator by importance sampling:

$$\hat{P}(X_i = x_i \mid E=e) = \frac{\sum_{j=1}^{N} \delta(x_i, z^j)\, w(z^j)}{\sum_{j=1}^{N} w(z^j)}$$

where $\delta(x_i, z^j) = 1$ iff sample $z^j$ contains $X_i = x_i$, and $\delta(x_i, z^j) = 0$ otherwise.
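A minimal sketch of this ratio estimator, assuming samples are stored as dictionaries of variable assignments alongside their precomputed importance weights; the toy sample values and weights below are made up for illustration.

```python
def posterior_estimate(samples, weights, var, value):
    """Ratio estimator for P(var = value | E = e).
    Numerator: sum of weights of samples where var == value
    (the delta term). Denominator: sum of all weights."""
    num = sum(w for z, w in zip(samples, weights) if z[var] == value)
    den = sum(weights)
    return num / den

# Toy weighted samples (illustrative only):
samples = [{"S": 1}, {"S": 0}, {"S": 1}, {"S": 0}]
weights = [0.4, 0.1, 0.2, 0.3]
est = posterior_estimate(samples, weights, "S", 1)   # (0.4 + 0.2) / 1.0
```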

Page 15: Importance Sampling

Difference between estimating P(E=e) and P(Xi=xi|E=e)

$$\hat{P}(E=e) = \frac{1}{N} \sum_{i=1}^{N} w(z^i)$$

is unbiased: $E_Q[\hat{P}(E=e)] = P(E=e)$.

$$\hat{P}(X_i = x_i \mid E=e) = \frac{\sum_{j=1}^{N} \delta(x_i, z^j)\, w(z^j)}{\sum_{j=1}^{N} w(z^j)}$$

is asymptotically unbiased: $\lim_{N \to \infty} E_Q[\hat{P}(X_i = x_i \mid E=e)] = P(X_i = x_i \mid E=e)$.

Page 16: Importance Sampling

Proposal Distribution: Which is better?

The probability that $|\hat{P}(E=e) - P(E=e)| \le \epsilon$ is governed by the variance of the weights:

$$\mathrm{Var}_Q[w(Z)] = \sum_{z \in Z} w(z)^2\, Q(z) - P(E=e)^2$$

So one should prefer a low-variance proposal distribution. If the variance is 0, then $\hat{P}(E=e) = P(E=e)$ and only one sample is sufficient to compute $P(E=e)$.
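A small numeric illustration of the variance formula. The joint values P(Z=z, E=e) are made up; the point is that a proposal with Q(z) ∝ P(z, e) drives the weight variance to zero, so a single sample suffices.

```python
# Toy problem: Z ranges over {0, 1, 2}; illustrative joint values.
P_joint = {0: 0.10, 1: 0.25, 2: 0.15}        # P(Z=z, E=e)
P_e = sum(P_joint.values())                  # P(E=e) = 0.5

def weight_variance(Q):
    # Var_Q[w] = sum_z w(z)^2 Q(z) - P(E=e)^2, with w(z) = P(z, e) / Q(z)
    return sum((P_joint[z] / Q[z]) ** 2 * Q[z] for z in Q) - P_e ** 2

Q_uniform = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}
Q_optimal = {z: p / P_e for z, p in P_joint.items()}   # Q(z) ∝ P(z, e)

var_u = weight_variance(Q_uniform)   # positive
var_o = weight_variance(Q_optimal)   # zero: every weight equals P(E=e)
```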

Page 17: Importance Sampling

Outline

- Gibbs Sampling
  - Advances in Gibbs sampling: Blocking; Cutset sampling (Rao-Blackwellisation)
- Importance Sampling
  - Advances in Importance Sampling
- Particle Filtering

Page 18: Importance Sampling

Research Issues in Importance Sampling

Better proposal distributions:
- Likelihood weighting (Fung and Chang, 1990; Shachter and Peot, 1990)
- AIS-BN (Cheng and Druzdzel, 2000)
- Iterative Belief Propagation (Changhe and Druzdzel, 2003)
- Iterative Join Graph Propagation and variable ordering (Gogate and Dechter, 2005)

Page 19: Importance Sampling

Research Issues in Importance Sampling (Cheng and Druzdzel 2000)

Adaptive Importance Sampling

Initial proposal: $Q^0(Z) = Q^0(Z_1)\, Q^0(Z_2 \mid pa(Z_2)) \cdots Q^0(Z_n \mid pa(Z_n))$
$\hat{P}(E=e) \leftarrow 0$
For i = 1 to k do:
    Generate samples $z^1, \ldots, z^N$ from $Q^{i-1}$
    $\hat{P}(E=e) \leftarrow \hat{P}(E=e) + \frac{1}{kN} \sum_{j=1}^{N} w(z^j)$
    Update $Q^i$ from $Q^{i-1}$ and the samples
End For
Return $\hat{P}(E=e)$

Page 20: Importance Sampling

Adaptive Importance Sampling

General case: given k proposal distributions, take N samples out of each distribution, and approximate P(e) by

$$\hat{P}(e) = \frac{1}{k} \sum_{j=1}^{k} \mathrm{AvgWeight}(j\text{th proposal})$$

Page 21: Importance Sampling

Estimating Q'(z)

$$Q'(Z) = Q'(Z_1)\, Q'(Z_2 \mid pa(Z_2)) \cdots Q'(Z_n \mid pa(Z_n))$$

where each $Q'(Z_i \mid Z_1, \ldots, Z_{i-1})$ is estimated by importance sampling.

Page 22: Importance Sampling

Cutset importance sampling

Divide the set of variables into two parts: the cutset (C) and the remaining variables (R).

$$\hat{P}(E=e) = \frac{1}{N} \sum_{j=1}^{N} \frac{P(C=c^j)\, P(R \mid C=c^j)}{Q(C=c^j)}$$

where $P(R \mid C=c^j)$ is computed exactly, for instance using Elim-Bel.

(Gogate and Dechter, 2005) and (Bidyuk and Dechter 2006)

Page 23: Importance Sampling

Outline

- Gibbs Sampling
  - Advances in Gibbs sampling: Blocking; Cutset sampling (Rao-Blackwellisation)
- Importance Sampling
  - Advances in Importance Sampling
- Particle Filtering

Page 24: Importance Sampling

Dynamic Belief Networks (DBNs)

Bayesian network at time t and Bayesian network at time t+1, connected by transition arcs.

[Figure: two-slice DBN with states X_t, X_{t+1} and observations Y_t, Y_{t+1}]

[Figure: DBN unrolled for t = 0 to t = 10, with states X_0, X_1, X_2, ..., X_10 and observations Y_0, Y_1, Y_2, ..., Y_10]

Page 25: Importance Sampling

Query

Compute $P(X_{0:t} \mid Y_{0:t})$ or $P(X_t \mid Y_{0:t})$. Example: $P(X_{0:10} \mid Y_{0:10})$ or $P(X_{10} \mid Y_{0:10})$.

Exact computation is hard over a long time period, so approximate by sampling.

Page 26: Importance Sampling

Particle Filtering (PF)

PF is also known as "condensation", "sequential Monte Carlo", and "survival of the fittest".

PF can handle any type of probability distribution, non-linearity, and non-stationarity; PFs are powerful sampling-based inference/learning algorithms for DBNs.
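A bootstrap particle filter can be sketched for a toy DBN. The linear-Gaussian transition X_t = X_{t-1} + noise and observation Y_t = X_t + noise are assumptions chosen for illustration; the slides do not fix a particular model.

```python
import math
import random

def particle_filter(ys, n_particles=1000):
    """Bootstrap particle filter: propagate, weight, resample."""
    # Assumed prior X_0 ~ N(0, 1).
    particles = [random.gauss(0.0, 1.0) for _ in range(n_particles)]
    for y in ys:
        # 1. Propagate each particle through X_t = X_{t-1} + N(0, 1).
        particles = [x + random.gauss(0.0, 1.0) for x in particles]
        # 2. Weight by the observation likelihood P(y | x), here N(x, 1).
        weights = [math.exp(-0.5 * (y - x) ** 2) for x in particles]
        # 3. Resample in proportion to weight ("survival of the fittest").
        particles = random.choices(particles, weights=weights, k=n_particles)
    # Posterior mean estimate of X_t given Y_{1:t}.
    return sum(particles) / n_particles
```

With the observations held near a constant value, the particle cloud migrates toward it within a few steps.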

Page 27: Importance Sampling

Particle Filtering

On white board