Towards Likelihood Free Inference

Queensland University of Technology CRICOS No. 000213J Towards Likelihood Free Inference Tony Pettitt QUT, Brisbane [email protected] Joint work with Rob Reeves

description

Towards Likelihood Free Inference. Tony Pettitt, QUT, Brisbane, [email protected]. Joint work with Rob Reeves. Outline: Some problems with intractable likelihoods. Monte Carlo methods and Inference. Normalizing constant/partition function. Likelihood free Markov chain Monte Carlo.

Transcript of Towards Likelihood Free Inference

Page 1: Towards Likelihood Free Inference

Queensland University of Technology

CRICOS No. 000213J

Towards Likelihood Free Inference

Tony Pettitt, QUT, Brisbane

[email protected]

Joint work with Rob Reeves

Page 2: Towards Likelihood Free Inference

Outline

1. Some problems with intractable likelihoods.

2. Monte Carlo methods and Inference.

3. Normalizing constant/partition function.

4. Likelihood free Markov chain Monte Carlo.

5. Approximating Hierarchical model

6. Indirect Inference and likelihood free MCMC

7. Conclusions.

Page 3: Towards Likelihood Free Inference

Stochastic models (Riley et al, 2003)

Macroparasite within a host. Juvenile worm grows to adulthood in a cat. Host fights back with immunity.

Number of Juveniles, Adults and amount of Immunity (all integer), $J(t), A(t), I(t)$, evolve through time according to a Markov process with unknown parameters, e.g.

Juvenile → Adult rate of maturation

Immunity changes with time

Juveniles die due to Immunity

Moment closure approximations for the distribution of $(J(t), A(t), I(t))$ are limited to restricted parameter values.

Page 4: Towards Likelihood Free Inference

Numerical computation of $\mathrm{pr}(J, A, I)$ is limited by small maximum values of J, A, I.

Can simulate the process $(J(t), A(t), I(t))$ easily.

Data: J at t=0 and A at t (sacrifice of cat), replicated with several cats

Source: Riley et al, 2003.
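To illustrate why simulation is easy even when the likelihood is not, here is a minimal Gillespie-style sketch of a (J, A, I) Markov jump process. The event list and every rate constant (`mature`, `gain`, `wane`, `kill`, `adult_death`) are hypothetical choices made for illustration only; they are not the model or the parameter values of Riley et al (2003).

```python
import numpy as np

def simulate_jai(j0, t_end, rng, mature=0.05, gain=0.1, wane=0.02,
                 kill=0.01, adult_death=0.01):
    """Gillespie simulation of a hypothetical (J, A, I) jump process up to t_end."""
    j, a, i, t = j0, 0, 0, 0.0
    while True:
        rates = np.array([
            mature * j,       # juvenile matures: J -> J-1, A -> A+1
            gain * (j + a),   # immunity stimulated by worm burden: I -> I+1
            wane * i,         # immunity wanes: I -> I-1
            kill * j * i,     # juvenile killed by immunity: J -> J-1
            adult_death * a,  # adult dies: A -> A-1
        ])
        total = rates.sum()
        if total == 0.0:
            break                      # absorbing state, no further events
        t += rng.exponential(1.0 / total)
        if t > t_end:
            break                      # next event falls after the sacrifice time
        event = rng.choice(5, p=rates / total)
        if event == 0:
            j, a = j - 1, a + 1
        elif event == 1:
            i += 1
        elif event == 2:
            i -= 1
        elif event == 3:
            j -= 1
        else:
            a -= 1
    return j, a, i

# Adult counts at sacrifice for five simulated cats, each starting with 20 juveniles
rng = np.random.default_rng(0)
adults = [simulate_jai(j0=20, t_end=30.0, rng=rng)[1] for _ in range(5)]
```

Only J at t=0 and A at the sacrifice time are kept, mirroring the data structure described above; the unobserved path of I is what makes the likelihood awkward.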

Page 5: Towards Likelihood Free Inference

Other stochastic process models include

spatial stochastic expansion of species

(Hamilton et al, 2005; Estoup et al, 2004)

birth-death-mutation process for estimating transmission rate from TB genotyping

(Tanaka et al, 2006)

population genetic models, eg coalescent models

(Marjoram et al 2003)

Likelihood free Bayesian MCMC methods are often employed with quite precise priors.

Page 6: Towards Likelihood Free Inference

Normalizing constant/partition function problem.

The algebraic form of the distribution for y is known but it is not normalized, e.g. Ising model

$$p(y \mid \theta) \propto \exp\Big(\theta_0 \sum_i y_i + \theta_1 \sum_{i \sim j} \mathrm{Ind}(y_i = y_j)\Big)$$

for $y_i \in \{-1, 1\}$, $i = 1, \dots, n$, and $i \sim j$ means neighbours (on a lattice, say). The normalizing constant involves in general a sum over $2^n$ terms.

Write

$$p(y \mid \theta) = \frac{f(y;\theta)}{z(\theta)}, \qquad f(y;\theta) \text{ known}, \quad z(\theta) \text{ unknown}.$$
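A small sketch of why the normalizing constant is the bottleneck: the unnormalized Ising density is trivial to evaluate, but a brute-force sum for z needs all 2^n configurations. The function names and the 3 by 3 lattice are illustrative assumptions; N-S and E-W neighbours are used.

```python
import itertools
import numpy as np

def neighbour_pairs(rows, cols):
    """N-S and E-W neighbour pairs on a rows x cols lattice, each pair listed once."""
    pairs = []
    for r in range(rows):
        for c in range(cols):
            k = r * cols + c
            if c + 1 < cols:
                pairs.append((k, k + 1))      # E-W neighbour
            if r + 1 < rows:
                pairs.append((k, k + cols))   # N-S neighbour
    return pairs

def log_f(y, theta, pairs):
    """log f(y; theta) = theta0 * sum_i y_i + theta1 * sum_{i~j} Ind(y_i == y_j)."""
    s0 = int(y.sum())
    s1 = sum(int(y[i] == y[j]) for i, j in pairs)
    return theta[0] * s0 + theta[1] * s1

def z_brute_force(theta, rows, cols):
    """Sum f(y; theta) over every configuration; returns (z, number of terms)."""
    pairs = neighbour_pairs(rows, cols)
    log_terms = [log_f(np.array(y), theta, pairs)
                 for y in itertools.product((-1, 1), repeat=rows * cols)]
    return float(np.exp(log_terms).sum()), len(log_terms)

z, n_terms = z_brute_force((0.2, 0.1), 3, 3)   # 2**9 = 512 terms
```

A 3 by 3 lattice is instant, but the number of terms doubles with every added site, so a 60 by 60 array is far out of reach of direct summation.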

Page 7: Towards Likelihood Free Inference

N-S and E-W neighbourhood

Page 8: Towards Likelihood Free Inference

Outline

1. Some problems with intractable likelihoods.

2. Monte Carlo methods and Inference.

3. Normalizing constant/partition function.

4. Likelihood free Markov chain Monte Carlo.

5. Approximating Hierarchical model

6. Indirect Inference and likelihood free MCMC

7. Conclusions.

Page 9: Towards Likelihood Free Inference

Monte Carlo methods and Inference.

Intractable likelihood, instead use easily simulated values of y.

Simulated method of moments (McFadden, 1989).

Method of estimation: comparing theoretical moments or frequencies with observed moments or frequencies.

Can be implemented using a chi-squared goodness-of-fit statistic, e.g. Riley et al, 2003. Data: number of adult worms in cat at sacrifice.

Page 10: Towards Likelihood Free Inference

Source: Riley et al 2003.

Plot of goodness-of-fit statistic versus parameter. Greedy Monte Carlo. Precision of estimate?

Page 11: Towards Likelihood Free Inference

Outline

1. Some problems with intractable likelihoods.

2. Monte Carlo methods and Inference.

3. Normalizing constant/partition function.

4. Likelihood free Markov chain Monte Carlo.

5. Approximating Hierarchical model

6. Indirect Inference and likelihood free MCMC

7. Conclusions.

Page 12: Towards Likelihood Free Inference

3. Normalizing constant/partition function and MCMC

(half-way to likelihood free inference)

Here we assume (Møller, Pettitt, Reeves and Berthelsen, 2006)

$$p(y \mid \theta) = \frac{f(y;\theta)}{z(\theta)}, \qquad f(y;\theta) \text{ known}, \quad z(\theta) = \int f(y;\theta)\,dy \text{ unknown and difficult to find.}$$

Key idea: Importance sample estimate of $z(\theta)/z(\theta')$ given by

Sample $y \sim p(y \mid \theta')$, then $f(y;\theta)/f(y;\theta')$ is an unbiased estimate of $z(\theta)/z(\theta')$.
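The importance sampling identity can be checked numerically on a lattice small enough to enumerate. This is a sketch under stated assumptions: a 3 by 3 Ising lattice, exact sampling from p(y | theta') by enumerating all 512 states, and illustrative parameter values. On a realistic lattice the exact draw would need perfect simulation or a long MCMC run.

```python
import itertools
import numpy as np

ROWS, COLS = 3, 3   # tiny lattice so every configuration can be listed
STATES = np.array(list(itertools.product((-1, 1), repeat=ROWS * COLS)))
PAIRS = []
for r in range(ROWS):
    for c in range(COLS):
        k = r * COLS + c
        if c + 1 < COLS:
            PAIRS.append((k, k + 1))      # E-W neighbour
        if r + 1 < ROWS:
            PAIRS.append((k, k + COLS))   # N-S neighbour
S0 = STATES.sum(axis=1)                                            # sum_i y_i
S1 = sum((STATES[:, i] == STATES[:, j]).astype(int) for i, j in PAIRS)  # agreeing pairs

def log_f_all(theta):
    """log f(y; theta) for every configuration y."""
    return theta[0] * S0 + theta[1] * S1

def z_exact(theta):
    return float(np.exp(log_f_all(theta)).sum())

def ratio_estimate(theta, theta_prime, n_samples, rng):
    """Average of f(y; theta) / f(y; theta_prime) over y ~ p(. | theta_prime)."""
    log_fp = log_f_all(theta_prime)
    probs = np.exp(log_fp - log_fp.max())
    probs /= probs.sum()
    idx = rng.choice(len(STATES), size=n_samples, p=probs)
    return float(np.exp(log_f_all(theta)[idx] - log_fp[idx]).mean())

rng = np.random.default_rng(0)
theta, theta_prime = (0.2, 0.1), (0.15, 0.12)
estimate = ratio_estimate(theta, theta_prime, 20_000, rng)
exact = z_exact(theta) / z_exact(theta_prime)
```

The estimate is accurate here because the two parameter values are close; as they move apart the weights become more variable and the estimate degrades.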

Page 13: Towards Likelihood Free Inference

Used off-line to estimate $z(\theta)/z(\theta')$, then carry out standard Metropolis-Hastings with interpolation over a grid of $\theta$ values (e.g. Green and Richardson, 2002, in a Potts model).

Standard Metropolis-Hastings: Simulating from target distribution

$$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta).$$

Acceptance ratio for changing $\theta \to \theta'$, proposal $q(\theta' \mid \theta)$:

$$A(\theta' \mid \theta) = \frac{p(y \mid \theta')\, p(\theta')\, q(\theta \mid \theta')}{p(y \mid \theta)\, p(\theta)\, q(\theta' \mid \theta)} = \frac{f(y;\theta')\, p(\theta')\, q(\theta \mid \theta')\, z(\theta)}{f(y;\theta)\, p(\theta)\, q(\theta' \mid \theta)\, z(\theta')},$$

accepted with probability $\min\{1, A(\theta' \mid \theta)\}$.

Key Question: Can $z(\theta)/z(\theta')$ be calculated on-line or avoided?

Page 14: Towards Likelihood Free Inference

On-line algorithm: single auxiliary variable method.

Introduce auxiliary variable x on same space as y and extend target distribution for the MCMC

$$p(\theta, x \mid y) \propto p(x \mid \theta, y)\, p(y \mid \theta)\, p(\theta).$$

Key Question: How to choose distribution of x so that $z(\theta)$ is removed from $A(\theta' \mid \theta)$?

Now acceptance ratio is $A(\theta', x' \mid \theta, x)$ as a new pair $(\theta', x')$ is proposed.

Proposal $q(\theta' \mid \theta)$ becomes $q(\theta', x' \mid \theta, x)$.

Assume the factorisation

$$q(\theta', x' \mid \theta, x) = q(x' \mid \theta')\, q(\theta' \mid \theta, x).$$

Choose the proposal so that

$$q(x' \mid \theta') = \frac{f(x';\theta')}{z(\theta')}.$$

Then algebra → cancellation of the $z(\theta)$'s, and $A(\theta', x' \mid \theta, x)$ does not depend on $z(\theta)$.
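The cancellation can be written out explicitly. A sketch, assuming the extended target $p(\theta, x \mid y) \propto p(x \mid \theta, y)\, f(y;\theta)\, p(\theta) / z(\theta)$, the factorised proposal with $q(x' \mid \theta') = f(x';\theta')/z(\theta')$, and an auxiliary density $p(x \mid \theta, y)$ that can be evaluated including its own normalizing constant:

```latex
\begin{align*}
A(\theta', x' \mid \theta, x)
 &= \frac{p(x' \mid \theta', y)\,\dfrac{f(y;\theta')}{z(\theta')}\,p(\theta')\;
          q(\theta \mid \theta', x')\,\dfrac{f(x;\theta)}{z(\theta)}}
         {p(x \mid \theta, y)\,\dfrac{f(y;\theta)}{z(\theta)}\,p(\theta)\;
          q(\theta' \mid \theta, x)\,\dfrac{f(x';\theta')}{z(\theta')}} \\[1ex]
 &= \frac{p(x' \mid \theta', y)\, f(y;\theta')\, p(\theta')\,
          q(\theta \mid \theta', x')\, f(x;\theta)}
         {p(x \mid \theta, y)\, f(y;\theta)\, p(\theta)\,
          q(\theta' \mid \theta, x)\, f(x';\theta')}.
\end{align*}
```

Each of $z(\theta)$ and $z(\theta')$ appears exactly once in the numerator and once in the denominator, so the final ratio involves only the known function $f$, the prior, the proposal for $\theta$, and the auxiliary density.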

Page 15: Towards Likelihood Free Inference

Note: Need perfect or exact simulation from $f(y;\theta)/z(\theta)$ for the proposal.

Key Question: How to choose $p(x \mid \theta, y)$, the auxiliary variable distribution?

The best choice

$$p(x \mid \theta, y) = \frac{f(x;\theta)}{z(\theta)},$$

the same density as the $x$-part of the proposal $q(\theta', x' \mid \theta, x)$, but $z(\theta)$ is needed in M-H!!

Page 16: Towards Likelihood Free Inference

Choice (i)

Page 17: Towards Likelihood Free Inference

Choice (ii)

Page 18: Towards Likelihood Free Inference

Choice (i)

Fix $\theta$, say at a good estimate $\hat\theta(y)$ of $\theta$. Then

$$p(x \mid \theta, y) = \frac{f(x; \hat\theta(y))}{z(\hat\theta(y))},$$

so $z(\hat\theta)$ does not depend on $\theta$, only y, and cancels in $A(\theta' \mid \theta)$.

Choice (ii)

$p(x \mid \theta, y)$ is an approximation to $f(x;\theta)/z(\theta)$, e.g. partially ordered Markov mesh model for Ising data

Comment

Both choices can suffer from getting stuck because $p(x \mid \theta, y)$ can be very different from the ideal $f(x;\theta)/z(\theta)$.
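A minimal sketch of the single auxiliary variable method with Choice (i), run on a toy problem. Assumptions made for illustration: a 3 by 3 Ising lattice so that exact draws from f(.; theta)/z(theta) can be made by enumeration (standing in for perfect simulation), a uniform prior on [-1, 1]^2, a symmetric random-walk proposal for theta, and `theta_hat` supplied by the user rather than estimated.

```python
import itertools
import numpy as np

ROWS, COLS = 3, 3   # tiny lattice so exact sampling by enumeration is feasible
STATES = np.array(list(itertools.product((-1, 1), repeat=ROWS * COLS)))
PAIRS = []
for r in range(ROWS):
    for c in range(COLS):
        k = r * COLS + c
        if c + 1 < COLS:
            PAIRS.append((k, k + 1))
        if r + 1 < ROWS:
            PAIRS.append((k, k + COLS))
S0 = STATES.sum(axis=1)
S1 = sum((STATES[:, i] == STATES[:, j]).astype(int) for i, j in PAIRS)

def log_f(idx, theta):
    """log f(y; theta) for the configuration with index idx in STATES."""
    return theta[0] * S0[idx] + theta[1] * S1[idx]

def sample_exact(theta, rng):
    """Exact draw (as a state index) from f(.; theta) / z(theta) by enumeration."""
    logp = theta[0] * S0 + theta[1] * S1
    p = np.exp(logp - logp.max())
    return rng.choice(len(STATES), p=p / p.sum())

def single_aux_mh(y_idx, theta_hat, n_iter, step, rng, bound=1.0):
    theta = np.array(theta_hat, dtype=float)
    x_idx = sample_exact(theta, rng)
    chain = np.empty((n_iter, 2))
    for t in range(n_iter):
        theta_new = theta + step * rng.standard_normal(2)
        if np.all(np.abs(theta_new) <= bound):          # uniform prior support
            x_new = sample_exact(theta_new, rng)        # x' ~ f(.; theta') / z(theta')
            log_a = (log_f(x_new, theta_hat) - log_f(x_idx, theta_hat)  # auxiliary density, z(theta_hat) cancels
                     + log_f(y_idx, theta_new) - log_f(y_idx, theta)    # f(y; theta') / f(y; theta)
                     + log_f(x_idx, theta) - log_f(x_new, theta_new))   # f(x; theta) / f(x'; theta')
            if np.log(rng.uniform()) < log_a:
                theta, x_idx = theta_new, x_new
        chain[t] = theta
    return chain

rng = np.random.default_rng(0)
y_idx = sample_exact((0.2, 0.1), rng)          # synthetic "observed" lattice
chain = single_aux_mh(y_idx, theta_hat=(0.2, 0.1), n_iter=500, step=0.2, rng=rng)
```

No normalizing constant is evaluated inside the loop: z(theta) and z(theta') cancel through the proposal for x, and z(theta_hat) cancels because the auxiliary density is the same at both states.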

Page 19: Towards Likelihood Free Inference

Example: Ising Model

60 by 60 array with $\theta_0 = .2$, $\theta_1 = .1$

Single auxiliary variable method (Møller et al, 2006)

Auxiliary variable is Choice (ii): approximation to Ising model, a partially ordered Markov mesh model with same neighbourhood as Ising

DAG with N, W as parents, S, E as children

Run chain 500,000 iterations and thin 1 in 100

Page 20: Towards Likelihood Free Inference

Source: Møller et al, 2006

Single auxiliary method tends to get stuck. Murray et al (2006) offer suggestions involving multiple auxiliary variables.

Page 21: Towards Likelihood Free Inference

Outline

1. Some problems with intractable likelihoods.

2. Monte Carlo methods and Inference.

3. Normalizing constant/partition function.

4. Likelihood free Markov chain Monte Carlo.

5. Approximating Hierarchical model

6. Indirect Inference and likelihood free MCMC

7. Conclusions.

Page 22: Towards Likelihood Free Inference

4. Likelihood free MCMC

Single Auxiliary Variable Method as almost Approximate Bayesian Computation (ABC)

We wish to eliminate $f(y;\theta)/z(\theta)$, or equivalently $p(y \mid \theta)$, the likelihood, from the M-H algorithm.

Solution: The distribution of x given y and $\theta$ puts all probability on y, the observed data,

$$p(x \mid \theta, y) = \mathrm{Ind}(x = y),$$

then

$$A(\theta', x' \mid \theta, x) = \frac{p(\theta')\, q(\theta \mid \theta')}{p(\theta)\, q(\theta' \mid \theta)}\, \mathrm{Ind}(x' = y)$$

with $x' \sim f(x';\theta')/z(\theta')$, the likelihood.

This might work for discrete data, sample size small, and if the proposal $q(\theta' \mid \theta) = q(\theta' \mid \theta, y)$ were a very good approximation to $p(\theta' \mid y)$. If sufficient statistics s(y) exist then $\mathrm{Ind}(s(x) = s(y))$ replaces $\mathrm{Ind}(x = y)$.

Page 23: Towards Likelihood Free Inference

Page 24: Towards Likelihood Free Inference

Likelihood free methods, ABC-MCMC

Change of notation: $y_{obs}$ is observed data (fixed), y is pseudo data or auxiliary data generated from the likelihood $p(y \mid \theta)$.

Instead of $y = y_{obs}$, now have y close to $y_{obs}$ in the sense of statistics s( ),

distance $d(y, y_{obs}) = \| s(y) - s(y_{obs}) \|$.

ABC allows $d(y, y_{obs}) \le \varepsilon$ rather than equal to 0

Target distribution for variables $(\theta, y)$:

$$p(\theta, y \mid y_{obs}, \varepsilon) \propto p(y \mid \theta)\, p(\theta)\, \mathrm{Ind}(d(y, y_{obs}) \le \varepsilon).$$

Standard M-H with proposals

$\theta' \sim q(\theta' \mid \theta, y_{obs})$, $\quad y' \sim p(y' \mid \theta')$

(Marjoram et al 2003; ABC MCMC)

for acceptance of $(\theta, y) \to (\theta', y')$.

Ideally $\varepsilon$ should be small but this leads to very small acceptance probabilities.
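A minimal ABC-MCMC sketch in the style of Marjoram et al (2003), on a toy problem chosen only for illustration: data from a normal model with unknown mean theta and known standard deviation, summary statistic s(y) equal to the sample mean, a normal prior, and a symmetric random-walk proposal (so q cancels). The function name and all numerical settings are assumptions, not taken from the talk.

```python
import numpy as np

def abc_mcmc(y_obs, eps, n_iter, step, rng, sigma=1.0, prior_sd=10.0):
    """ABC-MCMC for a normal mean with hard constraint |s(y) - s(y_obs)| <= eps."""
    n = len(y_obs)
    s_obs = y_obs.mean()
    theta = s_obs            # start at a value assumed to satisfy the constraint
    chain = np.empty(n_iter)
    for t in range(n_iter):
        theta_new = theta + step * rng.standard_normal()   # theta' ~ q(. | theta)
        y_new = rng.normal(theta_new, sigma, size=n)        # y' ~ p(. | theta')
        if abs(y_new.mean() - s_obs) <= eps:                # Ind(d(y', y_obs) <= eps)
            # Likelihood terms cancel against the proposal for y'; only the
            # prior ratio remains because the proposal for theta is symmetric.
            log_a = -0.5 * (theta_new**2 - theta**2) / prior_sd**2
            if np.log(rng.uniform()) < log_a:
                theta = theta_new
        chain[t] = theta
    return chain

rng = np.random.default_rng(0)
y_obs = rng.normal(1.0, 1.0, size=50)
chain = abc_mcmc(y_obs, eps=0.1, n_iter=5000, step=0.3, rng=rng)
posterior_mean = chain[1000:].mean()
```

Shrinking `eps` makes the indicator fail more often, which is the acceptance-rate problem noted above; the likelihood itself is never evaluated.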

Page 25: Towards Likelihood Free Inference

Issues of implementing Metropolis-Hastings ABC

(a) Tune for $\varepsilon$ to get reasonable acceptance probabilities;

(b) All $(\theta, y)$ satisfying $d(y, y_{obs}) \le \varepsilon$ (hard) accepted with equal probability rather than smoothly weighted by $d(y, y_{obs})$ (soft).

(c) Choose summary statistics carefully if no sufficient statistics

Page 26: Towards Likelihood Free Inference

Tune for $\varepsilon$

A solution is to allow $\varepsilon$ to vary as a parameter (Bortot et al, 2004). The target distribution is

$$p(\theta, y, \varepsilon \mid y_{obs}) \propto p(y \mid \theta)\, p(\theta)\, \mathrm{Ind}(d(y, y_{obs}) \le \varepsilon)\, p(\varepsilon).$$

Run chain and post filter output for small values of $\varepsilon$.

Page 27: Towards Likelihood Free Inference

Outline

1. Some problems with intractable likelihoods.

2. Monte Carlo methods and Inference.

3. Normalizing constant/partition function.

4. Likelihood free Markov chain Monte Carlo.

5. Approximating Hierarchical model

6. Indirect Inference and likelihood free MCMC

7. Conclusions.

Page 28: Towards Likelihood Free Inference

Beaumont, Zhang and Balding (2002) use kernel smoothing in ABC-MC

Page 29: Towards Likelihood Free Inference

(Reeves and Pettitt, 2005)

Soft Constraint for $d(y, y_{obs})$

Key Idea. Replace $\mathrm{Ind}(d(y, y_{obs}) \le \varepsilon)$ by $\exp\big(-\tfrac{1}{2\phi}\, d(y, y_{obs})\big)$, a soft constraint, with $\phi$ replacing $\varepsilon$.

Interpret as an approximate likelihood for $(y_{obs} \mid y)$

Simple case $(y_{obs} \mid y) \sim N(y, \phi)$ with $\phi$ known

Approximating Hierarchical model with joint probability

$$p(\theta)\, p(y \mid \theta)\, p(y_{obs} \mid y)$$

Page 30: Towards Likelihood Free Inference

Approximating Hierarchical Model

Page 31: Towards Likelihood Free Inference

Simple case. Normal model with mean $\theta$, variance $\sigma^2$, sample $y_{obs}$ of size $n$.

Sufficient statistic $\bar y_{obs}$,

$$d(\bar y, \bar y_{obs}) = (\bar y - \bar y_{obs})^2 \quad \text{and "likelihood"} \quad (\bar y_{obs} \mid \bar y) \sim N(\bar y, \phi).$$

Integrate out pseudo data $\bar y$ from $p(\theta, \bar y, \bar y_{obs})$ to get marginal $p(\theta, \bar y_{obs})$ to obtain

$$p(\theta, \bar y_{obs}) = N\big(\bar y_{obs};\ \theta,\ \sigma^2/n + \phi\big)\, p(\theta).$$

Key idea: Approximation using pseudo data and match to observed data introduces variance inflation in likelihood approximation. Will affect posterior mean (if prior not improper vague) and variance.
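The variance inflation can be checked by simulation. A sketch assuming the simple normal case: the pseudo-data mean is drawn from N(theta, sigma^2/n), and the soft constraint is treated as a normal "likelihood" with variance phi. The numerical values are arbitrary illustrations.

```python
import numpy as np

def simulate_ybar_obs(theta, sigma, n, phi, size, rng):
    """Draw from the marginal of ybar_obs given theta in the approximating model."""
    ybar = rng.normal(theta, sigma / np.sqrt(n), size=size)   # pseudo-data mean
    return rng.normal(ybar, np.sqrt(phi))                     # soft-constraint layer

rng = np.random.default_rng(0)
theta, sigma, n, phi = 1.0, 2.0, 25, 0.5
draws = simulate_ybar_obs(theta, sigma, n, phi, 200_000, rng)

empirical_var = draws.var()
true_model_var = sigma**2 / n            # variance under the true likelihood
inflated_var = sigma**2 / n + phi        # variance under the approximating model
```

The empirical variance should sit near `inflated_var` rather than `true_model_var`, and the gap is exactly `phi`, which is why letting phi tend to zero recovers the true likelihood.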

Page 32: Towards Likelihood Free Inference

General scheme to overcome variance inflation and bias of posterior

Implement a tempering scheme with $\phi_1 > \phi_2 > \dots > \phi_k > 0$, and $\hat\theta_j$ is the posterior mean estimated from chain $j$.

Combine estimates using weighted least squares, and bias estimated using a quadratic in $\phi$, say

$$\hat\theta_j = \beta_0 + \beta_1 \phi_j + \beta_2 \phi_j^2 + \text{error}, \quad j = 1, \dots, k,$$

gives combined estimate $\hat\beta_0$ of posterior mean for $\phi = 0$.

Similarly for posterior variance and quantile estimates using chain $j$.

(compare Liu, 2001, MCMC for "indirect models")
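The combination step can be sketched as a weighted least squares fit of a quadratic in the tempering parameter, with the intercept read off as the extrapolated value at zero. The function name, the weights (for example inverse Monte Carlo variances of each chain's estimate), and the synthetic inputs are assumptions for illustration.

```python
import numpy as np

def extrapolate_to_zero(phi, theta_hat, weights):
    """Weighted least squares fit of theta_hat = b0 + b1*phi + b2*phi**2.

    Returns (b0, b1, b2); b0 is the extrapolated estimate at phi = 0.
    """
    phi = np.asarray(phi, dtype=float)
    theta_hat = np.asarray(theta_hat, dtype=float)
    w = np.sqrt(np.asarray(weights, dtype=float))
    design = np.column_stack([np.ones_like(phi), phi, phi**2])
    # Scaling rows by sqrt(weight) turns weighted LS into ordinary LS
    coef, *_ = np.linalg.lstsq(design * w[:, None], theta_hat * w, rcond=None)
    return coef

# Synthetic posterior means from k = 4 tempered chains with an exact quadratic bias
phi = np.array([4.0, 2.0, 1.0, 0.5])
theta_hat = 1.0 + 0.3 * phi - 0.05 * phi**2
coef = extrapolate_to_zero(phi, theta_hat, weights=np.ones(4))
```

The same fit can be repeated with posterior variances or quantiles in place of `theta_hat`.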

Page 33: Towards Likelihood Free Inference

Page 34: Towards Likelihood Free Inference

Parallel Tempering to improve mixing

For chains $j-1$ and $j$, propose swaps of $(\theta, y)$ values to improve mixing.

M-H ratio does not involve intractable likelihood.

Soft constraint improves mixing
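Why the swap ratio is likelihood free can be seen from a short calculation. A sketch, assuming each tempered chain j targets a density proportional to p(theta) p(y | theta) exp(-d(y, y_obs) / (2 phi_j)): the prior and intractable likelihood factors appear identically before and after the swap, so only the soft-constraint terms remain. Function and argument names are illustrative.

```python
import numpy as np

def log_soft(d, phi):
    """Log of the soft constraint exp(-d / (2 * phi))."""
    return -d / (2.0 * phi)

def swap_log_ratio(d_prev, d_curr, phi_prev, phi_curr):
    """Log M-H ratio for swapping the (theta, y) states of chains j-1 and j.

    d_prev = d(y_{j-1}, y_obs), d_curr = d(y_j, y_obs).
    """
    return (d_prev - d_curr) * (1.0 / (2.0 * phi_prev) - 1.0 / (2.0 * phi_curr))

def accept_swap(d_prev, d_curr, phi_prev, phi_curr, rng):
    return np.log(rng.uniform()) < swap_log_ratio(d_prev, d_curr, phi_prev, phi_curr)

# Direct evaluation: (targets after swap) / (targets before swap), soft terms only
d_prev, d_curr, phi_prev, phi_curr = 3.0, 1.2, 4.0, 1.0
direct = (log_soft(d_curr, phi_prev) + log_soft(d_prev, phi_curr)
          - log_soft(d_prev, phi_prev) - log_soft(d_curr, phi_curr))
```

The inputs are just the two distances and the two tempering values, so a swap costs no additional simulation from the model.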

Page 35: Towards Likelihood Free Inference

Page 36: Towards Likelihood Free Inference

Example: Ising Model (continued)

60 by 60 array with $\theta_0 = .2$, $\theta_1 = .1$

Compare

Exact method, auxiliary variable method

ABC with $\phi = 50^2, 30^2, 15^2$

with correct sufficient statistics $\sum_i y_i$ and $\sum_{i \sim j} \mathrm{Ind}(y_i = y_j)$

Run chains 500,000 iterations and thin 1 in 100

Page 37: Towards Likelihood Free Inference

Page 38: Towards Likelihood Free Inference

Page 39: Towards Likelihood Free Inference

Outline

1. Some problems with intractable likelihoods.

2. Monte Carlo methods and Inference.

3. Normalizing constant/partition function.

4. Likelihood free Markov chain Monte Carlo.

5. Approximating Hierarchical model

6. Indirect Inference and likelihood free MCMC

7. Conclusions.

Page 40: Towards Likelihood Free Inference

Indirect Inference

(Gourieroux et al, 1993)

(also Heggland and Frigessi, 2004)

Observe data $y_{obs}$.

Suppose True model $p_T(y \mid \theta)$ is intractable but Approximating model $p_A(x \mid \phi)$, with its own parameter $\phi$, is tractable, and with the same support.

Can find $\hat\phi(x)$ easily.

Put $x = y$ into the Approximating model and obtain $\hat\phi(y \mid \theta)$.

Repeat many times for $y$ simulated from $p_T(y \mid \theta)$ to find an accurate value $\hat\phi(\theta)$.

Find $\theta$ so that $\hat\phi(\theta)$ is close to $\hat\phi(y_{obs})$, giving $\hat\theta(y_{obs})$.
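The indirect inference recipe can be sketched on a toy problem. Everything specific here is an assumption for illustration: an exponential model with rate theta stands in for the intractable True model (it is only ever simulated from), the Approximating model is a normal whose mean is estimated by the sample mean, and theta is found by a simple grid search.

```python
import numpy as np

def simulate_true(theta, n, rng):
    """Stand-in for the True model: only simulation is used, never its density."""
    return rng.exponential(1.0 / theta, size=n)

def phi_hat(x):
    """Estimate of the Approximating (normal) model's mean parameter."""
    return x.mean()

def binding(theta, n, n_rep, rng):
    """Average phi_hat over repeated simulations from the True model at theta."""
    return np.mean([phi_hat(simulate_true(theta, n, rng)) for _ in range(n_rep)])

def indirect_estimate(y_obs, theta_grid, n_rep, rng):
    """Pick the grid value of theta whose phi_hat(theta) is closest to phi_hat(y_obs)."""
    target = phi_hat(y_obs)
    phis = np.array([binding(th, len(y_obs), n_rep, rng) for th in theta_grid])
    return theta_grid[np.argmin(np.abs(phis - target))]

rng = np.random.default_rng(1)
y_obs = simulate_true(2.0, 400, rng)            # synthetic observed data
theta_grid = np.linspace(0.5, 4.0, 36)
theta_est = indirect_estimate(y_obs, theta_grid, n_rep=50, rng=rng)
```

A one-parameter Approximating model is enough here because there is one unknown; with more parameters the distance between the fitted values would replace the absolute difference.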

Page 41: Towards Likelihood Free Inference

Hierarchical Model using ideas of Indirect Inference

(Reeves and Pettitt, 2005; P & R, 2006)

Consider the True hierarchical model

$$p(\theta)\, p_T(y_{obs} \mid \theta)$$

and the Approximating hierarchical model

$$p(\theta)\, p_T(y \mid \theta)\, p_A(\phi \mid y)\, p_A(y_{obs} \mid \phi)$$

with

$y$ is pseudo data from $p_T(y \mid \theta)$

$p_T(y \mid \theta)$ being the True intractable likelihood

$p_A(\phi \mid y)$ is an Approximating model posterior distribution

$p_A(y_{obs} \mid \phi)$ is an Approximating likelihood evaluated at the observed data $y_{obs}$

Key point: Marginalising the Approximating HM over random $y$ and $\phi$ should be close to the True HM

Page 42: Towards Likelihood Free Inference

Page 43: Towards Likelihood Free Inference

A Metropolis Hastings algorithm for Indirect Inference

Can be implemented with the proposal using the idea from Møller et al (2006),

$$q(\theta', y', \phi' \mid \theta, y, \phi) = q(\theta' \mid \theta)\, p_T(y' \mid \theta')\, p_A(\phi' \mid y'),$$

and then the MH acceptance probability is given with

$$A(\theta', y', \phi' \mid \theta, y, \phi) = \frac{p_A(y_{obs} \mid \phi')\, q(\theta \mid \theta')\, p(\theta')}{p_A(y_{obs} \mid \phi)\, q(\theta' \mid \theta)\, p(\theta)},$$

which removes the likelihood ratio for the intractable $p_T$ and replaces it by a ratio involving the tractable $p_A$.
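A minimal sketch of this Metropolis-Hastings scheme on a toy problem. The assumptions are all illustrative: the True model is a normal with mean theta and unit variance (used only through simulation), the Approximating model is a normal with mean phi and unit variance with a flat prior on phi so that p_A(phi | y) can be sampled exactly, the prior on theta is normal, and the proposal for theta is a symmetric random walk.

```python
import numpy as np

def mh_indirect(y_obs, n_iter, step, rng, prior_sd=10.0):
    n = len(y_obs)

    def log_pa_obs(phi):
        """log p_A(y_obs | phi) up to a constant, for the N(phi, 1) model."""
        return -0.5 * np.sum((y_obs - phi) ** 2)

    def draw_phi(y):
        """Exact draw from p_A(phi | y) under a flat prior on phi."""
        return rng.normal(y.mean(), 1.0 / np.sqrt(n))

    theta = y_obs.mean()
    phi = draw_phi(rng.normal(theta, 1.0, size=n))
    chain = np.empty(n_iter)
    for t in range(n_iter):
        theta_new = theta + step * rng.standard_normal()   # q(theta' | theta)
        y_new = rng.normal(theta_new, 1.0, size=n)          # y' ~ p_T(. | theta')
        phi_new = draw_phi(y_new)                           # phi' ~ p_A(. | y')
        # p_T and p_A(phi | y) cancel between target and proposal; what is left
        # is the approximating likelihood ratio times the prior ratio.
        log_a = (log_pa_obs(phi_new) - log_pa_obs(phi)
                 - 0.5 * (theta_new**2 - theta**2) / prior_sd**2)
        if np.log(rng.uniform()) < log_a:
            theta, phi = theta_new, phi_new
        chain[t] = theta
    return chain

rng = np.random.default_rng(0)
y_obs = rng.normal(1.0, 1.0, size=50)
chain = mh_indirect(y_obs, n_iter=6000, step=0.4, rng=rng)
posterior_mean = chain[1000:].mean()
```

In this toy case the exact draw from p_A(phi | y) plays the role of the "side chain"; when that posterior is not available in closed form it would be approximated by a short MCMC run of its own.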

Page 44: Towards Likelihood Free Inference

Example: Ising Model (continued)

60 by 60 array with $\theta_0 = .2$, $\theta_1 = .1$

MH Indirect Inference implemented with Approximate Likelihood taken as the POMM with 2 parameters equivalent to Ising model $\theta_0, \theta_1$

20,000 iterations with Approximating posterior found from "side MH chain" with 400 iterations.

No summary statistics required, implied by Approximating Likelihood

Page 45: Towards Likelihood Free Inference

Page 46: Towards Likelihood Free Inference

Page 47: Towards Likelihood Free Inference

Some points

• How could approximate posterior be made more precise?

– Use more parameters in approximating likelihood, the POMM? (Gouriéroux et al (1993), Heggland and Frigessi (2004) discuss this in the frequentist setting)

– More iterations for side chain “exact” calculation of approximate posterior?

• How to choose a good approximating likelihood?

• Relationship to summary statistics approach?

Page 48: Towards Likelihood Free Inference

Outline

1. Some problems with intractable likelihoods.

2. Monte Carlo methods and Inference.

3. Normalizing constant/partition function.

4. Likelihood free Markov chain Monte Carlo.

5. Approximating Hierarchical model

6. Indirect Inference and likelihood free MCMC

7. Conclusions.

Page 49: Towards Likelihood Free Inference

Conclusions

1. For the normalizing constant problem we presented a single on-line M-H algorithm.

2. We linked these ideas to ABC-MCMC and developed a hierarchical model (HM) to approximate the true posterior, and showed variance inflation.

3. We showed that the approximating HM could be tempered, with swaps made to improve mixing using parallel chains, and the variance inflation effect corrected by smoothing posterior summaries from the tempered chains.

4. We extended indirect inference to an HM to find a way of implementing the Metropolis Hastings algorithm which is likelihood free.

5. We demonstrated the ideas with the Ising/autologistic model.

6. Application to specific examples is on-going and requires refinement of general approaches.

Page 50: Towards Likelihood Free Inference

Acknowledgements

Support of the Australian Research Council

Co-authors Rob Reeves, Jesper Møller, Kasper Berthelsen

Discussions with Malcolm Faddy, Gareth Ridall, Chris Glasbey, Grant Hamilton …