Bayesian inversion of deterministic dynamic causal models
Kay H. Brodersen1,2 & Ajita Gupta2
1 Department of Economics, University of Zurich, Switzerland
2 Department of Computer Science, ETH Zurich, Switzerland
Overview
1. Problem setting: model; likelihood; prior; posterior.
2. Variational Laplace: factorization of the posterior; why the free energy; variational energies; gradient ascent; adaptive step size; example.
3. Sampling: transformation method; rejection method; Gibbs sampling; MCMC; Metropolis-Hastings; example.
4. Model comparison: model evidence; Bayes factors; free energy; prior arithmetic mean; posterior harmonic mean; Savage-Dickey; example.
With material from Will Penny, Klaas Enno Stephan, Chris Bishop, and Justin Chumbley.
1. Problem setting
Model = likelihood + prior
Question 1: what do the data tell us about the model parameters?
compute the posterior:
$p(\theta \mid y, m) = \dfrac{p(y \mid \theta, m)\, p(\theta \mid m)}{p(y \mid m)}$
Question 2: which model is best?
compute the model evidence:
$p(m \mid y) \propto p(y \mid m)\, p(m)$, where $p(y \mid m) = \int p(y \mid \theta, m)\, p(\theta \mid m)\, d\theta$
Bayesian inference is conceptually straightforward
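To make the two quantities concrete, here is a minimal Python sketch (not from the slides) that computes both on a grid for an assumed toy model: Gaussian observations with unknown mean and a Gaussian prior.

import numpy as np
from scipy.stats import norm

# assumed toy model (illustration only): y_i ~ N(theta, 1), prior theta ~ N(0, 1)
y = np.array([0.8, 1.2, 0.5])
theta = np.linspace(-5, 5, 2001)                      # grid over the parameter
dtheta = theta[1] - theta[0]
prior = norm.pdf(theta, loc=0, scale=1)
lik = np.prod(norm.pdf(y[:, None], loc=theta, scale=1), axis=0)

# model evidence p(y|m) = integral of p(y|theta,m) p(theta|m) dtheta (numerical)
evidence = np.sum(lik * prior) * dtheta

# posterior p(theta|y,m) = p(y|theta,m) p(theta|m) / p(y|m)
posterior = lik * prior / evidence

For more than a handful of parameters this kind of grid-based integration becomes infeasible, which is what motivates the variational and sampling approximations discussed below.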
2. Variational Laplace
Variational Laplace in a nutshell
Negative free-energy approximation to the model evidence:
$\ln p(y \mid m) = F + KL\big[q(\theta)\,\|\,p(\theta \mid y, m)\big]$
$F = \big\langle \ln p(y \mid \theta, m) \big\rangle_q - KL\big[q(\theta)\,\|\,p(\theta \mid m)\big]$
Mean-field approximation:
$q(\theta) = q(\theta_1)\, q(\theta_2)$
Maximising the negative free energy with respect to $q$ means minimising the divergence; this is achieved by maximising the variational energies:
$q(\theta_1) \propto \exp\big(I(\theta_1)\big) = \exp\big(\big\langle \ln p(y, \theta) \big\rangle_{q(\theta_2)}\big)$
$q(\theta_2) \propto \exp\big(I(\theta_2)\big) = \exp\big(\big\langle \ln p(y, \theta) \big\rangle_{q(\theta_1)}\big)$
Iterative updating of the sufficient statistics of the approximate posteriors by gradient ascent.
K.E. Stephan
Variational Laplace
Assumptions:
mean-field approximation
Laplace approximation (each approximate marginal posterior is assumed to be Gaussian)
W. Penny
Variational Laplace
Inversion strategy
Recall the relationship between the log model evidence and the negative free energy $F$:
$\ln p(y \mid m) = \underbrace{E_q\big[\ln p(y \mid \theta)\big] - KL\big[q(\theta)\,\|\,p(\theta \mid m)\big]}_{=:\,F} + \underbrace{KL\big[q(\theta)\,\|\,p(\theta \mid y, m)\big]}_{\geq\, 0}$
Maximizing $F$ implies two things:
(i) we obtain a good approximation to $\ln p(y \mid m)$
(ii) the KL divergence between $q(\theta)$ and $p(\theta \mid y, m)$ becomes minimal
Practically, we can maximize $F$ by iteratively (EM-style) maximizing the variational energies.
W. Penny
Variational Laplace
Implementation: gradient-ascent scheme (Newton's method)
Compute the gradient vector: $j_\theta = \dfrac{\partial I(\theta)}{\partial \theta}$
Compute the curvature matrix: $H_\theta = \dfrac{\partial^2 I(\theta)}{\partial \theta\, \partial \theta^{T}}$
Newton's method for finding a root (1D): $x_{\text{new}} = x_{\text{old}} - \dfrac{f(x_{\text{old}})}{f'(x_{\text{old}})}$
Variational Laplace
Implementation: gradient-ascent scheme (Newton's method)
Compute the Newton update (change): $\Delta\theta = -H_\theta^{-1}\, j_\theta$
New estimate: $\theta_{\text{new}} = \theta_{\text{old}} + \Delta\theta$
Big curvature -> small step; small curvature -> big step
W. Penny
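As a rough numerical illustration (not the SPM implementation), the following Python sketch performs one Newton ascent step on a variational-energy-like objective I(theta), using finite-difference estimates of the gradient and curvature.

import numpy as np

def newton_step(I, theta, eps=1e-4):
    # one Newton ascent step on the objective I at the point theta
    theta = np.asarray(theta, dtype=float)
    n = theta.size
    grad = np.zeros(n)
    hess = np.zeros((n, n))
    for i in range(n):
        e_i = np.eye(n)[i] * eps
        grad[i] = (I(theta + e_i) - I(theta - e_i)) / (2 * eps)
        for j in range(n):
            e_j = np.eye(n)[j] * eps
            hess[i, j] = (I(theta + e_i + e_j) - I(theta + e_i - e_j)
                          - I(theta - e_i + e_j) + I(theta - e_i - e_j)) / (4 * eps**2)
    # Newton update: big curvature -> small step, small curvature -> big step
    return theta - np.linalg.solve(hess, grad)

# example: a quadratic objective with its maximum at (3, -1) is found in one step
I = lambda th: -(th[0] - 3)**2 - 2 * (th[1] + 1)**2
print(newton_step(I, [0.0, 0.0]))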
Newton's method: demonstration
Newton's method is very efficient. However, its solution is sensitive to the starting point, as shown above.
http://demonstrations.wolfram.com/LearningNewtonsMethod/
Variational Laplace
Nonlinear regression (example)
Data: observations y(t) at time points t [figure not reproduced]
Model (likelihood): a nonlinear regression model [equation given on the slide]
Ground truth: the known parameter values that were used to generate the data
W. Penny
Variational Laplace
Nonlinear regression (example)
We begin by defining our prior. [Figure: prior density over the parameters, with the ground truth marked]
W. Penny
Variational Laplace
Nonlinear regression (example)
[Figure: posterior density and VL optimization path (4 iterations)]
Starting point: (2.9, 1.65)
VL estimate: (3.4, 2.1)
True value: (3.4012, 2.0794)
W. Penny
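The following Python sketch illustrates the same idea on an assumed toy nonlinear regression model (not the model used on the slide): find the posterior mode numerically and form a Gaussian (Laplace) approximation around it.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# assumed toy model (illustration only): y(t) = exp(th1) * (1 - exp(-exp(th2) * t)) + noise
t = np.linspace(0.1, 5, 40)
theta_true = np.array([1.0, 0.5])
rng = np.random.default_rng(0)
y = np.exp(theta_true[0]) * (1 - np.exp(-np.exp(theta_true[1]) * t)) + 0.1 * rng.standard_normal(t.size)

def neg_log_joint(theta, sigma=0.1):
    pred = np.exp(theta[0]) * (1 - np.exp(-np.exp(theta[1]) * t))
    return -(norm.logpdf(y, pred, sigma).sum() + norm.logpdf(theta, 0.0, 2.0).sum())

# posterior mode (MAP estimate)
map_est = minimize(neg_log_joint, x0=np.zeros(2), method="BFGS").x

# Laplace approximation: p(theta|y) ~ N(map_est, inverse Hessian of the negative log joint)
eps = 1e-4
H = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        ei, ej = np.eye(2)[i] * eps, np.eye(2)[j] * eps
        H[i, j] = (neg_log_joint(map_est + ei + ej) - neg_log_joint(map_est + ei - ej)
                   - neg_log_joint(map_est - ei + ej) + neg_log_joint(map_est - ei - ej)) / (4 * eps**2)
print(map_est, np.linalg.inv(H))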
3. Sampling
Sampling
Deterministic approximations
+ computationally efficient
+ efficient representation
+ learning rules may give additional insight
- application initially involves hard work
- systematic error
Stochastic approximations
+ asymptotically exact
+ easily applicable general-purpose algorithms
- computationally expensive
- storage intensive
Strategy 1: Transformation method
We can obtain samples from a distribution $p(z)$ by first sampling from the uniform distribution and then transforming these samples.
[Figure: uniform density p(u) and target density p(z); e.g. the uniform sample u = 0.9 maps to z = 1.15]
Transformation: $z^{(\ell)} = F^{-1}\big(u^{(\ell)}\big)$, where $F$ is the cdf of the target distribution $p(z)$.
+ yields high-quality samples
+ easy to implement
+ computationally efficient
- obtaining the inverse cdf can be difficult
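A minimal Python sketch of the transformation method, assuming an exponential target whose inverse cdf is available in closed form:

import numpy as np

# z = F^{-1}(u) = -ln(1 - u) / lam turns uniform samples into Exponential(lam) samples
rng = np.random.default_rng(0)
lam = 2.0
u = rng.uniform(size=10_000)     # u ~ Uniform(0, 1)
z = -np.log(1 - u) / lam         # z ~ Exponential(lam)
print(z.mean())                  # should be close to 1/lam = 0.5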
Strategy 2: Rejection method
When the transformation method cannot be applied, we can resort to a more general method called rejection sampling. Here, we draw random numbers from a simpler proposal distribution $q(z)$ and keep only some of these samples.
The proposal distribution $q(z)$, scaled by a factor $k$, represents an envelope of $p(z)$, i.e. $k\,q(z) \geq p(z)$ for all $z$.
+ yields high-quality samples
+ easy to implement
+ can be computationally efficient
- computationally inefficient if the proposal is a poor approximation
Bishop (2007) PRML
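A minimal Python sketch of rejection sampling, assuming a Beta(2, 5) target and a uniform proposal scaled by an envelope constant k:

import numpy as np
from scipy.stats import beta

# target p(z) = Beta(2, 5) on [0, 1]; proposal q(z) = Uniform(0, 1); k*q(z) >= p(z) everywhere
rng = np.random.default_rng(0)
k = 2.5                                   # the Beta(2, 5) density peaks at about 2.46
samples = []
while len(samples) < 5_000:
    z = rng.uniform()                     # draw from the proposal q(z)
    u = rng.uniform(0, k)                 # uniform height under the envelope k*q(z)
    if u <= beta.pdf(z, 2, 5):            # keep the sample only if it falls under p(z)
        samples.append(z)
print(np.mean(samples))                   # should be close to E[Beta(2,5)] = 2/7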
Strategy 3: Gibbs sampling
Often the joint distribution of several random variables is unavailable, whereas the full-conditional distributions are available. In this case, we can cycle over the full-conditionals to obtain samples from the joint distribution.
+ easy to implement
- samples are serially correlated
- the full-conditionals may not be available
Bishop (2007) PRML
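A minimal Python sketch of Gibbs sampling for a bivariate Gaussian, where both full-conditionals are available in closed form:

import numpy as np

# target: zero-mean bivariate Gaussian with correlation rho
# full-conditionals: z1|z2 ~ N(rho*z2, 1-rho^2), z2|z1 ~ N(rho*z1, 1-rho^2)
rng = np.random.default_rng(0)
rho, n = 0.8, 10_000
z1, z2 = 0.0, 0.0
samples = np.zeros((n, 2))
for i in range(n):
    z1 = rng.normal(rho * z2, np.sqrt(1 - rho**2))   # sample z1 from its full-conditional
    z2 = rng.normal(rho * z1, np.sqrt(1 - rho**2))   # sample z2 from its full-conditional
    samples[i] = z1, z2
print(np.corrcoef(samples.T)[0, 1])                  # should be close to rho = 0.8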
Strategy 4: Markov Chain Monte Carlo (MCMC)
Idea: we can sample from a large class of distributions, and overcome the problems that previous methods face in high dimensions, using a framework called Markov Chain Monte Carlo.
For a proposed move from the current point $z^{(\tau)}$ to a candidate $z^*$:
Increase in density ($\tilde{p}(z^*) \geq \tilde{p}(z^{(\tau)})$): the candidate is always accepted.
Decrease in density ($\tilde{p}(z^*) < \tilde{p}(z^{(\tau)})$): the candidate is accepted with probability $\tilde{p}(z^*)/\tilde{p}(z^{(\tau)})$, i.e. the acceptance probability is $\min\!\big(1, \tilde{p}(z^*)/\tilde{p}(z^{(\tau)})\big)$.
M. Dümcke
MCMC demonstration: finding a good proposal density
When the proposal distribution is too narrow, we might miss a mode.
When it is too wide, the chain stays constant for long stretches because hardly any proposals are accepted.
http://demonstrations.wolfram.com/MarkovChainMonteCarloSimulationUsingTheMetropolisAlgorithm/
MCMC for DCM
Metropolis-Hastings (MH) creates a series of random points $\theta^{(1)}, \theta^{(2)}, \ldots$ whose distribution converges to the target distribution of interest. For us, this is the posterior density $p(\theta \mid y)$.
We could use the following random-walk proposal distribution:
$\theta^* = \theta^{(i-1)} + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma C_p)$
where $C_p$ is the prior covariance and $\sigma$ is a scaling factor adapted such that the acceptance rate lies between 20% and 40%.
W. Penny
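The sketch below shows a generic random-walk Metropolis-Hastings sampler in Python (not the SPM/DCM code); log_post stands in for the unnormalized log posterior ln p(y|theta) + ln p(theta), and the proposal scaling sigma is crudely adapted towards a 20-40% acceptance rate during burn-in.

import numpy as np

def log_post(theta):
    # stand-in target: a 2-D Gaussian posterior centred at (1, -2)
    return -0.5 * np.sum((theta - np.array([1.0, -2.0]))**2)

rng = np.random.default_rng(0)
C_p = np.eye(2)                       # prior covariance (assumed)
sigma = 0.1                           # proposal scaling factor
theta = np.zeros(2)
samples, accepted = [], 0
for i in range(1, 20_001):
    proposal = theta + rng.multivariate_normal(np.zeros(2), sigma * C_p)
    # accept with probability min(1, p(proposal|y) / p(theta|y))
    if np.log(rng.uniform()) < log_post(proposal) - log_post(theta):
        theta, accepted = proposal, accepted + 1
    samples.append(theta)
    if i % 500 == 0 and i <= 5_000:   # adapt sigma during burn-in only
        rate = accepted / i
        sigma *= 1.5 if rate > 0.4 else (0.5 if rate < 0.2 else 1.0)
print(np.mean(samples[5_000:], axis=0))   # should be close to (1, -2)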
MCMC for DCM
W. Penny
MCMC: example
W. Penny
MCMC: example
W. Penny
MCMC: example
[Figure: posterior estimates from Variational Laplace vs. Metropolis-Hastings]
W. Penny
4. Model comparison
Model evidence
W. Penny
Prior arithmetic mean
W. Penny
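For reference, the standard prior arithmetic mean estimator of the model evidence averages the likelihood over samples drawn from the prior:

$p(y \mid m) \;\approx\; \frac{1}{N} \sum_{i=1}^{N} p\big(y \mid \theta^{(i)}, m\big), \qquad \theta^{(i)} \sim p(\theta \mid m)$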
Posterior harmonic mean
W. Penny
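For reference, the standard posterior harmonic mean estimator uses samples drawn from the posterior instead, at the cost of a notoriously high-variance estimate:

$p(y \mid m) \;\approx\; \left[ \frac{1}{N} \sum_{i=1}^{N} \frac{1}{p\big(y \mid \theta^{(i)}, m\big)} \right]^{-1}, \qquad \theta^{(i)} \sim p(\theta \mid y, m)$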
Savage-Dickey ratio
In many situations we wish to compare models that are nested. For example:
$m_F$: full model with parameters $\theta = (\theta_1, \theta_2)$
$m_R$: reduced model with $\theta = (\theta_1, 0)$
In this case, we can use the Savage-Dickey ratio to obtain a Bayes factor without having to compute the two model evidences:
$BF_{R,F} = \dfrac{p(\theta_2 = 0 \mid y, m_F)}{p(\theta_2 = 0 \mid m_F)}$
Example from the figure: $BF_{R,F} = 0.9$
W. Penny
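As a minimal Python sketch (with hypothetical Gaussian densities, e.g. as delivered by a Variational-Laplace fit), the Savage-Dickey Bayes factor is simply the ratio of the posterior to the prior density of theta_2 evaluated at zero:

from scipy.stats import norm

# hypothetical marginal prior and posterior over theta_2 under the full model m_F
prior_mean, prior_sd = 0.0, 1.0       # p(theta_2 | m_F)
post_mean, post_sd = 0.4, 0.6         # p(theta_2 | y, m_F)

# Savage-Dickey: BF_{R,F} = p(theta_2 = 0 | y, m_F) / p(theta_2 = 0 | m_F)
bf_rf = norm.pdf(0.0, post_mean, post_sd) / norm.pdf(0.0, prior_mean, prior_sd)
print(bf_rf)   # > 1 favours the reduced model, < 1 favours the full model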
Comparison of methods
[Figure: posterior estimates from MCMC vs. VL]
Chumbley et al. (2007) NeuroImage
Comparison of methods
[Figure: estimated minus true log Bayes factor for each method]
Chumbley et al. (2007) NeuroImage