CS B553: ALGORITHMS FOR OPTIMIZATION AND LEARNING
Monte Carlo Methods for Probabilistic Inference

AGENDA

Monte Carlo methods
  O(1/sqrt(N)) standard deviation

For Bayesian inference
  Likelihood weighting
  Gibbs sampling

MONTE CARLO INTEGRATION

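Estimate large integrals/sums:
  $I = \int f(x)\,p(x)\,dx$
  $I = \sum_x f(x)\,p(x)$

Using N i.i.d. samples $x^{(1)},\dots,x^{(N)}$ from p(x):
  $I \approx \frac{1}{N} \sum_{i=1}^{N} f(x^{(i)})$

Examples:
  $\int_{[a,b]} f(x)\,dx \approx \frac{b-a}{N} \sum_{i=1}^{N} f(x^{(i)})$, with the $x^{(i)}$ drawn uniformly from [a,b]
  $E[X] = \int x\,p(x)\,dx \approx \frac{1}{N} \sum_{i=1}^{N} x^{(i)}$
  Volume of a set in $\mathbb{R}^n$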
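The first and third examples can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `mc_integrate` and `mc_disk_area` are hypothetical, the integrand $x^2$ is an arbitrary choice, and the "volume" example uses the unit disk in $\mathbb{R}^2$ enclosed in the square $[-1,1]^2$.

```python
import random

def mc_integrate(f, a, b, n, rng):
    """Estimate the integral of f over [a, b] as (b - a)/N * sum_i f(x_i), x_i ~ Uniform[a, b]."""
    total = sum(f(rng.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n

def mc_disk_area(n, rng):
    """Estimate the area of the unit disk by sampling the square [-1, 1]^2 uniformly."""
    inside = 0
    for _ in range(n):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n  # area of the square times the fraction of hits

if __name__ == "__main__":
    rng = random.Random(0)
    print(mc_integrate(lambda x: x * x, 0.0, 1.0, 100_000, rng))  # true value is 1/3
    print(mc_disk_area(100_000, rng))                              # true value is pi
```

Both estimators are plain sample averages, so their error shrinks at the rate derived in the next section.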

MEAN & VARIANCE OF ESTIMATE

Let $I_N$ be the random variable denoting the estimate of the integral with N samples.

What is the bias (mean error) $E[I - I_N]$?
  $E[I - I_N] = I - E[I_N]$    (linearity of expectation)
  $= E[f(x)] - \frac{1}{N} \sum_{i=1}^{N} E[f(x^{(i)})]$    (definition of $I$ and $I_N$)
  $= \frac{1}{N} \sum_{i=1}^{N} \left( E[f(x)] - E[f(x^{(i)})] \right)$
  $= \frac{1}{N} \sum_{i=1}^{N} 0$    (x and $x^{(i)}$ are both distributed according to p(x))
  $= 0$

The bias is zero: $I_N$ is an unbiased estimator.

What is the variance $\mathrm{Var}[I_N]$?
  $\mathrm{Var}[I_N] = \mathrm{Var}\left[\frac{1}{N} \sum_{i=1}^{N} f(x^{(i)})\right]$    (definition)
  $= \frac{1}{N^2} \mathrm{Var}\left[\sum_{i=1}^{N} f(x^{(i)})\right]$    (scaling of variance)
  $= \frac{1}{N^2} \sum_{i=1}^{N} \mathrm{Var}[f(x^{(i)})]$    (variance of a sum of independent variables)
  $= \frac{1}{N} \mathrm{Var}[f(x)]$    (i.i.d. sample)

So $\mathrm{Var}[I_N] = \mathrm{Var}[f(x)]/N$.

Standard deviation: O(1/sqrt(N))
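The $1/\sqrt{N}$ rate can be checked empirically. The sketch below is an illustration under assumed choices (integrand $f(x) = x^2$, $x$ uniform on [0,1], hypothetical function names): it repeats the estimate many times for each N and compares the spread of the estimates with the theoretical value $\sqrt{\mathrm{Var}[f(x)]/N}$, where $\mathrm{Var}[x^2] = 1/5 - 1/9 = 4/45$.

```python
import random
import statistics

def estimate(n, rng):
    """One Monte Carlo estimate I_N of E[f(x)] with f(x) = x^2 and x ~ Uniform[0, 1]."""
    return sum(rng.random() ** 2 for _ in range(n)) / n

def std_of_estimate(n, trials, rng):
    """Empirical standard deviation of I_N over repeated independent trials."""
    return statistics.pstdev([estimate(n, rng) for _ in range(trials)])

if __name__ == "__main__":
    rng = random.Random(0)
    for n in (100, 400, 1600):
        theory = (4 / 45 / n) ** 0.5  # sqrt(Var[f(x)] / N)
        print(n, std_of_estimate(n, 200, rng), theory)
```

Quadrupling N should roughly halve the printed standard deviation, up to the noise of using a finite number of trials.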

APPROXIMATE INFERENCE THROUGH SAMPLING

Unconditional simulation: To estimate the probability of a coin flipping heads, I can flip it a huge number of times and count the fraction of heads observed.

Conditional simulation: To estimate the probability P(H) that a coin picked out of bucket B flips heads:
  Repeat for i=1,…,N:
    1. Pick a coin C out of a random bucket $b^{(i)}$ chosen with probability P(B)
    2. $h^{(i)}$ = flip C according to probability $P(H \mid b^{(i)})$
    3. Sample $(h^{(i)}, b^{(i)})$ comes from distribution P(H,B)

The resulting set of samples approximates P(H,B), so the fraction of heads among the $h^{(i)}$ approximates P(H).
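A minimal sketch of this two-stage procedure follows. The bucket names and all probabilities are made-up numbers chosen only for illustration; with them, the true marginal is P(H) = 0.3·0.5 + 0.7·0.9 = 0.78.

```python
import random

# Hypothetical numbers for illustration (not from the slides).
P_BUCKET = {"b1": 0.3, "b2": 0.7}  # P(B)
P_HEADS = {"b1": 0.5, "b2": 0.9}   # P(H = heads | B)

def sample_pair(rng):
    """Draw one (h, b) pair from the joint P(H, B): first b ~ P(B), then h ~ P(H | b)."""
    b = rng.choices(list(P_BUCKET), weights=list(P_BUCKET.values()))[0]
    h = rng.random() < P_HEADS[b]
    return h, b

def estimate_p_heads(n, rng):
    """Fraction of sampled pairs with heads, which approximates the marginal P(H)."""
    return sum(sample_pair(rng)[0] for _ in range(n)) / n

if __name__ == "__main__":
    print(estimate_p_heads(50_000, random.Random(0)))  # true value is 0.78
```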

MONTE CARLO INFERENCE IN BAYES NETS

BN over variables X

Repeat for i=1,…,N:
  In top-down order, generate $x^{(i)}$ as follows:
    Sample $x_j^{(i)} \sim P(X_j \mid pa_{X_j}^{(i)})$
    (the RHS is obtained by plugging the parent values already in the sample into the CPT for $X_j$)

The samples $x^{(1)},\dots,x^{(N)}$ approximate the distribution over X.

APPROXIMATE INFERENCE: MONTE-CARLO SIMULATION

Sample from the joint distribution

[Figure: burglary network. Burglary (B) and Earthquake (E) are the parents of Alarm (A); Alarm is the parent of JohnCalls (J) and MaryCalls (M).]

B  E  P(A|…)
1  1  0.95
1  0  0.94
0  1  0.29
0  0  0.001

A  P(J|…)
1  0.90
0  0.05

A  P(M|…)
1  0.70
0  0.01

As more samples are generated, the distribution of the samples approaches the joint distribution:

B=0 E=0 A=0 J=1 M=0
B=0 E=0 A=0 J=0 M=0
B=1 E=0 A=1 J=1 M=0
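Top-down sampling of the burglary network can be sketched as follows. The CPT values for A, J and M are the ones listed on the slides; the priors P(B=1) = 0.001 and P(E=1) = 0.002 do not appear in this transcript and are assumed here (they are the values that reproduce the weights in the likelihood weighting example later on). Function names are hypothetical.

```python
import random

# P(var = 1 | parents). The priors for B and E are ASSUMED; A, J, M are from the slides.
P_B = 0.001
P_E = 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # keyed by (B, E)
P_J = {1: 0.90, 0: 0.05}  # keyed by A
P_M = {1: 0.70, 0: 0.01}  # keyed by A

def bernoulli(p, rng):
    """Return 1 with probability p, else 0."""
    return 1 if rng.random() < p else 0

def forward_sample(rng):
    """Sample every variable in top-down order, each conditioned on its sampled parents."""
    b = bernoulli(P_B, rng)
    e = bernoulli(P_E, rng)
    a = bernoulli(P_A[(b, e)], rng)
    j = bernoulli(P_J[a], rng)
    m = bernoulli(P_M[a], rng)
    return {"B": b, "E": e, "A": a, "J": j, "M": m}

def estimate_marginal(var, n, rng):
    """Fraction of forward samples in which var = 1."""
    return sum(forward_sample(rng)[var] for _ in range(n)) / n

if __name__ == "__main__":
    rng = random.Random(0)
    print(forward_sample(rng))
    print(estimate_marginal("J", 100_000, rng))
```

Under the assumed priors, most samples have B=0, E=0 and A=0, which matches the flavor of the samples listed above.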

BASIC METHOD FOR HANDLING EVIDENCE

Inference: given evidence E=e (e.g., J=1), approximate $P(X \setminus E \mid E=e)$

Remove the samples that conflict with the evidence:

B=0 E=0 A=0 J=1 M=0
B=0 E=0 A=0 J=0 M=0   (conflicts with J=1: removed)
B=1 E=0 A=1 J=1 M=0

The distribution of the remaining samples approximates the conditional distribution.
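This discard-on-conflict scheme (usually called rejection sampling) can be sketched on the burglary network. As before, the priors P(B=1) = 0.001 and P(E=1) = 0.002 are assumptions not present in the transcript, and the function names are hypothetical. The function also returns how many samples survived, which makes the cost of rare evidence visible.

```python
import random

# Priors for B and E are ASSUMED; the CPTs for A, J, M are from the slides.
P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # keyed by (B, E)
P_J = {1: 0.90, 0: 0.05}  # keyed by A
P_M = {1: 0.70, 0: 0.01}  # keyed by A

def forward_sample(rng):
    """One top-down sample from the joint distribution."""
    b = int(rng.random() < P_B)
    e = int(rng.random() < P_E)
    a = int(rng.random() < P_A[(b, e)])
    j = int(rng.random() < P_J[a])
    m = int(rng.random() < P_M[a])
    return {"B": b, "E": e, "A": a, "J": j, "M": m}

def rejection_sample(query, evidence, n, rng):
    """Estimate P(query = 1 | evidence) from the samples that agree with the evidence."""
    kept = hits = 0
    for _ in range(n):
        s = forward_sample(rng)
        if all(s[var] == val for var, val in evidence.items()):
            kept += 1
            hits += s[query]
    estimate = hits / kept if kept else float("nan")
    return estimate, kept

if __name__ == "__main__":
    est, kept = rejection_sample("B", {"J": 1}, 200_000, random.Random(0))
    print(est, kept)  # only a small fraction of the 200,000 samples has J=1
```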

RARE EVENT PROBLEM

What if some events are really rare (e.g., burglary & earthquake)?

The number of samples must be huge to get a reasonable estimate.

Solution: likelihood weighting
  Enforce that each sample agrees with the evidence
  While generating a sample, keep track of the ratio
    (how likely the sampled value is to occur in the real world) / (how likely you were to generate the sampled value)

LIKELIHOOD WEIGHTING

Suppose the evidence is Alarm & MaryCalls (A=1, M=1). Sample B and E each with P=0.5.

[Figure, repeated at every step: the burglary network with its CPTs, P(A|B,E) = 0.95, 0.94, 0.29, 0.001; P(J|A) = 0.90, 0.05; P(M|A) = 0.70, 0.01]

Sample 1
  B=0 E=1                 w=0.008
  B=0 E=1 A=1             w=0.0023
  B=0 E=1 A=1 M=1 J=1     w=0.0016

A=1 is enforced, and the weight is updated to reflect the likelihood that this occurs. M=1 is enforced in the same way; J is not evidence, so it is sampled from P(J|A) and leaves the weight unchanged.

Sample 2
  B=0 E=0                 w=3.988
  B=0 E=0 A=1             w=0.004
  B=0 E=0 A=1 M=1 J=1     w=0.0028

Sample 3
  B=1 E=0 A=1             w=0.00375
  B=1 E=0 A=1 M=1 J=1     w=0.0026

Sample 4
  B=1 E=1 A=1 M=1 J=1     w≈5.3e-6

N=4 gives P(B|A,M) ≈ 0.37 (total weight of the samples with B=1 divided by the total weight of all samples)
Exact inference gives P(B|A,M) = 0.375
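The weighting scheme in this example can be sketched as follows. It mirrors the slides in drawing B and E from a uniform proposal (P=0.5 each) rather than from their priors, so the weight picks up a factor prior/0.5 for each of them. The priors P(B=1) = 0.001 and P(E=1) = 0.002 are assumed, since the transcript does not list them; they reproduce the weights 0.008 and 3.988 quoted above. Under these assumed priors, exact enumeration gives roughly 0.374 for P(B=1 | A=1, M=1). Function names are hypothetical.

```python
import random

# Priors for B and E are ASSUMED; the CPTs for A, J, M are from the slides.
P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # keyed by (B, E)
P_J = {1: 0.90, 0: 0.05}  # keyed by A
P_M = {1: 0.70, 0: 0.01}  # keyed by A

def weighted_sample(rng):
    """One sample consistent with the evidence A=1, M=1, together with its weight."""
    w = 1.0
    # B and E: drawn from a uniform proposal, weight = true probability / 0.5
    b = int(rng.random() < 0.5)
    w *= (P_B if b else 1.0 - P_B) / 0.5
    e = int(rng.random() < 0.5)
    w *= (P_E if e else 1.0 - P_E) / 0.5
    # A is evidence: enforce A=1 and multiply by its likelihood given the parents
    a = 1
    w *= P_A[(b, e)]
    # M is evidence: enforce M=1 and multiply by P(M=1 | A)
    m = 1
    w *= P_M[a]
    # J is not evidence: sample it from its CPT, weight unchanged
    j = int(rng.random() < P_J[a])
    return {"B": b, "E": e, "A": a, "J": j, "M": m}, w

def estimate_p_burglary(n, rng):
    """Weighted fraction of samples with B=1, approximating P(B=1 | A=1, M=1)."""
    num = den = 0.0
    for _ in range(n):
        sample, w = weighted_sample(rng)
        den += w
        num += w * sample["B"]
    return num / den

if __name__ == "__main__":
    print(estimate_p_burglary(10_000, random.Random(0)))
```

Every generated sample contributes to the estimate, in contrast to discarding the samples that conflict with the evidence.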

ANOTHER RARE-EVENT PROBLEM

B=b given as evidence

Each $b_i$ is improbable under all but one setting of $A_i$ (say, $A_i=1$)

The chance of sampling all 1's is very low => most likelihood weights will be too low

Problem: the evidence is not being used to sample the A's effectively (i.e., near $P(A_i \mid b)$)

[Figure: nodes A1, A2, …, A10 above nodes B1, B2, …, B10]

GIBBS SAMPLING

Idea: reduce the computational burden of sampling from a multidimensional distribution $P(x) = P(x_1,\dots,x_n)$ by doing repeated draws of individual attributes
  Cycle through j=1,…,n
    Sample $x_j \sim P(x_j \mid x_1,\dots,x_{j-1},x_{j+1},\dots,x_n)$

Over the long run, the random walk taken by x approaches the true distribution P(x)
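A small sketch of this cycle on a two-attribute distribution: each attribute is redrawn from its conditional given the other, and the long-run visit frequencies are compared with the joint table. The joint probabilities are made-up numbers for illustration and the function names are hypothetical.

```python
import random

# Hypothetical joint distribution over two binary attributes (x1, x2).
JOINT = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def resample(x, j, rng):
    """Redraw x[j] in place from P(x_j | all other attributes), read off the joint table."""
    x0, x1 = list(x), list(x)
    x0[j], x1[j] = 0, 1
    p0, p1 = JOINT[tuple(x0)], JOINT[tuple(x1)]
    x[j] = 1 if rng.random() < p1 / (p0 + p1) else 0

def gibbs_frequencies(n_sweeps, burn_in, rng):
    """Run the chain and return the fraction of post-burn-in sweeps spent in each state."""
    x = [0, 0]  # arbitrary initialization
    counts = {state: 0 for state in JOINT}
    for sweep in range(burn_in + n_sweeps):
        for j in range(len(x)):  # cycle through the attributes
            resample(x, j, rng)
        if sweep >= burn_in:
            counts[tuple(x)] += 1
    return {state: c / n_sweeps for state, c in counts.items()}

if __name__ == "__main__":
    for state, freq in gibbs_frequencies(50_000, 500, random.Random(0)).items():
        print(state, freq, JOINT[state])
```

In a real problem the joint table is too large to store, and only the one-dimensional conditionals are evaluated.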

GIBBS SAMPLING IN BNS

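Each Gibbs sampling step: 1) pick a variable $X_i$, 2) sample $x_i \sim P(X_i \mid X \setminus X_i)$

Look at the values of the "Markov blanket" of $X_i$:
  Parents $Pa_{X_i}$
  Children $Y_1,\dots,Y_k$
  Parents of children (excluding $X_i$) $Pa_{Y_1} \setminus X_i, \dots, Pa_{Y_k} \setminus X_i$

$X_i$ is independent of the rest of the network given its Markov blanket

Sample $x_i \sim P(X_i \mid Pa_{X_i}, Y_1, Pa_{Y_1} \setminus X_i, \dots, Y_k, Pa_{Y_k} \setminus X_i) = \frac{1}{Z} P(X_i \mid Pa_{X_i})\, P(Y_1 \mid Pa_{Y_1}) \cdots P(Y_k \mid Pa_{Y_k})$
  Product of $X_i$'s factor and the factors of its children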

HANDLING EVIDENCE

Simply set each evidence variable to its appropriate value and do not resample it

The resulting walk approximates the distribution $P(X \setminus E \mid E=e)$

Uses evidence more efficiently than likelihood weighting
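A sketch of Gibbs sampling with evidence on the burglary network follows. Each non-evidence variable is resampled from the product of its own factor and its children's factors, normalized over its two values; evidence variables are clamped. The priors P(B=1) = 0.001 and P(E=1) = 0.002 are assumed (not listed in this transcript), the query P(B=1 | J=1, M=1) is an arbitrary choice for illustration, and all function names are hypothetical.

```python
import random

# Network structure and P(var = 1 | parent values).
# Priors for B and E are ASSUMED; the CPTs for A, J, M are from the slides.
PARENTS = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}
CPT = {
    "B": {(): 0.001},
    "E": {(): 0.002},
    "A": {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001},
    "J": {(1,): 0.90, (0,): 0.05},
    "M": {(1,): 0.70, (0,): 0.01},
}
CHILDREN = {v: [c for c in PARENTS if v in PARENTS[c]] for v in PARENTS}

def prob(var, value, state):
    """P(var = value | parents of var as set in state)."""
    p1 = CPT[var][tuple(state[p] for p in PARENTS[var])]
    return p1 if value == 1 else 1.0 - p1

def blanket_score(var, value, state):
    """Unnormalized P(var = value | Markov blanket): own factor times the children's factors."""
    s = dict(state, **{var: value})
    score = prob(var, value, s)
    for child in CHILDREN[var]:
        score *= prob(child, s[child], s)
    return score

def gibbs(query, evidence, n_sweeps, burn_in, rng):
    """Estimate P(query = 1 | evidence) from a Gibbs chain with the evidence clamped."""
    state = {v: evidence.get(v, 0) for v in PARENTS}  # arbitrary start for hidden variables
    hidden = [v for v in PARENTS if v not in evidence]
    hits = 0
    for sweep in range(burn_in + n_sweeps):
        for var in hidden:
            s1 = blanket_score(var, 1, state)
            s0 = blanket_score(var, 0, state)
            state[var] = 1 if rng.random() < s1 / (s1 + s0) else 0
        if sweep >= burn_in:
            hits += state[query]
    return hits / n_sweeps

if __name__ == "__main__":
    print(gibbs("B", {"J": 1, "M": 1}, 20_000, 1_000, random.Random(0)))
```

The normalizer Z never has to be computed globally: dividing by `s1 + s0` normalizes over the two values of the variable being resampled.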

GIBBS SAMPLING ISSUES

Demonstrating correctness & convergence requires examining the Markov chain random walk (more later)

Need to take many steps before the effects of poor initialization wear off (mixing time)
  Difficult to tell a priori how many steps are needed

Numerous variants
  Known as Markov Chain Monte Carlo techniques

NEXT TIME

Continuous and hybrid distributions
