Bayesian inference

J. Daunizeau
Institute of Empirical Research in Economics, Zurich, Switzerland
Brain and Spine Institute, Paris, France
Overview of the talk
1 Probabilistic modelling and representation of uncertainty
1.1 Bayesian paradigm
1.2 Hierarchical models
1.3 Frequentist versus Bayesian inference
2 Numerical Bayesian inference methods
2.1 Sampling methods
2.2 Variational methods (ReML, EM, VB)
3 SPM applications
3.1 aMRI segmentation
3.2 Decoding of brain images
3.3 Model-based fMRI analysis (with spatial priors)
3.4 Dynamic causal modelling
Bayesian paradigm: probability theory basics

Degrees of plausibility desiderata:
- should be represented using real numbers (D1)
- should conform with intuition (D2)
- should be consistent (D3)

(Figure: Venn diagram of the events a = 2 and b = 5.)

• normalization: $\sum_a p(a) = 1$
• marginalization: $p(a) = \sum_b p(a, b)$
• conditioning (Bayes rule): $p(a \mid b) = \dfrac{p(a, b)}{p(b)}$
Bayesian paradigm: deriving the likelihood function

- Model of data with unknown parameters: $y = f(\theta)$, e.g., GLM: $f(\theta) = X\theta$
- But data are noisy: $y = f(\theta) + \varepsilon$
- Assume the noise/residuals are "small", e.g. Gaussian:
$$p(\varepsilon) \propto \exp\left(-\frac{1}{2\sigma^2}\,\varepsilon^2\right), \qquad P(|\varepsilon| > 2\sigma) \approx 0.05$$
→ Distribution of the data, given fixed parameters (the likelihood):
$$p(y \mid \theta) \propto \exp\left(-\frac{1}{2\sigma^2}\,\big(y - f(\theta)\big)^2\right)$$
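As a minimal numerical sketch of this Gaussian GLM likelihood (the design matrix, parameter values and noise level below are made up for illustration, not from the slides):

```python
import numpy as np

def glm_log_likelihood(y, X, theta, sigma):
    """Gaussian log-likelihood of data y under the GLM y = X @ theta + noise."""
    residuals = y - X @ theta
    n = y.size
    return -0.5 * n * np.log(2 * np.pi * sigma**2) - residuals @ residuals / (2 * sigma**2)

# Hypothetical example: 2-regressor design, true theta = [1.0, -0.5], sigma = 0.3
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
theta_true = np.array([1.0, -0.5])
y = X @ theta_true + 0.3 * rng.normal(size=100)

print(glm_log_likelihood(y, X, theta_true, sigma=0.3))   # high at the true parameters
print(glm_log_likelihood(y, X, np.zeros(2), sigma=0.3))  # much lower away from them
```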
Bayesian paradigm: likelihood, priors and the model evidence

Likelihood: $p(y \mid \theta, m)$
Prior: $p(\theta \mid m)$
Bayes rule: $p(\theta \mid y, m) = \dfrac{p(y \mid \theta, m)\, p(\theta \mid m)}{p(y \mid m)}$

Together, likelihood and prior define the generative model m.
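A toy grid-based illustration of Bayes rule (my own example, not the slides': one noisy observation of a scalar parameter, Gaussian prior and likelihood):

```python
import numpy as np
from scipy.stats import norm

# Grid over theta; prior p(theta|m) = N(0, 2^2); observation y = theta + N(0, 1)
theta = np.linspace(-10, 10, 2001)
dtheta = theta[1] - theta[0]
prior = norm.pdf(theta, loc=0.0, scale=2.0)
y_obs = 1.5
likelihood = norm.pdf(y_obs, loc=theta, scale=1.0)   # p(y|theta,m) as a function of theta

evidence = np.sum(likelihood * prior) * dtheta       # p(y|m), the normalizing constant
posterior = likelihood * prior / evidence            # Bayes rule
print("posterior mean:", np.sum(theta * posterior) * dtheta)  # analytic value: 1.2
print("model evidence:", evidence)                   # analytic value: N(1.5; 0, 5) ≈ 0.142
```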
Bayesian paradigm: forward and inverse problems

forward problem: the likelihood $p(y \mid \theta, m)$
inverse problem: the posterior distribution $p(\theta \mid y, m)$
Bayesian paradigm: model comparison

Principle of parsimony ("Occam's razor"): « plurality should not be assumed without necessity »

Model evidence:
$$p(y \mid m) = \int p(y \mid \theta, m)\, p(\theta \mid m)\, d\theta$$

(Figure: model fits y = f(x), and the model evidence p(y|m) plotted over the space of all data sets; a simple model concentrates its evidence on few data sets, a complex one spreads it thinly.)
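The Occam effect can be made concrete with a toy example (mine, not the slides'): two models of a single scalar observation that differ only in how widely their priors spread the same likelihood:

```python
import numpy as np
from scipy.stats import norm

def evidence(y, prior_scale, sigma=1.0):
    """p(y|m) = ∫ N(y; theta, sigma^2) N(theta; 0, prior_scale^2) dtheta (closed form)."""
    return norm.pdf(y, loc=0.0, scale=np.sqrt(sigma**2 + prior_scale**2))

y = 0.5
print("simple model  (prior sd 0.5):", evidence(y, 0.5))   # higher: parsimonious
print("complex model (prior sd 10): ", evidence(y, 10.0))  # lower: evidence spread thin
```

The complex model can explain many more data sets, so it must spread its evidence over all of them; when the data are unremarkable, the parsimonious model wins.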
Hierarchical models: principle

(Figure: a hierarchy of levels linked by causality, each level generating the one below.)

Hierarchical models: directed acyclic graphs (DAGs)

Hierarchical models: univariate linear hierarchical model

(Figure: prior densities at each level of the hierarchy are updated into posterior densities.)
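For the univariate linear case, the first-level posterior has a closed form: precisions add, and the posterior mean is a precision-weighted average of the data and the prediction from the level above. A small sketch with made-up numbers:

```python
import numpy as np

# Two-level univariate linear hierarchical model (toy numbers, mine):
#   level 1: y     = theta + eps1,  eps1 ~ N(0, s1^2)   (likelihood)
#   level 2: theta = mu0  + eps2,   eps2 ~ N(0, s2^2)   (prior from the level above)
y, s1 = 2.0, 1.0
mu0, s2 = 0.0, 2.0

pi1, pi2 = 1 / s1**2, 1 / s2**2          # precisions (inverse variances)
post_prec = pi1 + pi2                     # precisions add
post_mean = (pi1 * y + pi2 * mu0) / post_prec  # precision-weighted average
print(post_mean, 1 / np.sqrt(post_prec))  # 1.6, 0.894
```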
Frequentist versus Bayesian inference: a (quick) note on hypothesis testing

classical SPM:
• define the null, e.g.: $H_0 : \theta = 0$
• estimate parameters (obtain test statistic): $t \equiv t(Y)$, with observed value $t^*$
• apply decision rule, i.e.: if $P(t > t^* \mid H_0) \leq \alpha$ then reject H0
(Figure: the null distribution $p(t \mid H_0)$.)

Bayesian PPM:
• define the null, e.g.: $H_0 : \theta > 0$
• invert model (obtain posterior pdf $p(\theta \mid y)$)
• apply decision rule, i.e.: if $P(H_0 \mid y) \geq \alpha$ then accept H0
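A sketch of the PPM decision rule on a toy Gaussian posterior (the posterior moments and the probability threshold below are invented for illustration):

```python
from scipy.stats import norm

# Bayesian PPM decision on a toy posterior p(theta|y) = N(m, s^2); null H0: theta > 0.
# Here alpha plays the role of the posterior probability threshold from the slide.
m, s, alpha = 0.8, 0.5, 0.95
p_H0 = 1.0 - norm.cdf(0.0, loc=m, scale=s)   # P(theta > 0 | y)
print(f"P(H0|y) = {p_H0:.3f}", "-> accept H0" if p_H0 >= alpha else "-> do not accept H0")
```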
Frequentist versus Bayesian inference: what about bilateral tests?

• define the null and the alternative hypothesis in terms of priors, e.g.:
$$H_0 : p(\theta \mid H_0) = \begin{cases} 1 & \text{if } \theta = 0 \\ 0 & \text{otherwise} \end{cases}
\qquad H_1 : p(\theta \mid H_1) = N(0, \Sigma)$$
• apply decision rule, i.e.: if $\dfrac{P(H_0 \mid y)}{P(H_1 \mid y)} \leq 1$ then reject H0

(Figure: the marginal likelihoods $p(y \mid H_0)$ and $p(y \mid H_1)$ over the space of all data sets.)

• Savage-Dickey ratios (nested models, i.i.d. priors):
$$p(y \mid H_0) = p(y \mid \theta = 0, H_1)
\quad\Rightarrow\quad
\frac{p(y \mid H_0)}{p(y \mid H_1)} = \frac{p(\theta = 0 \mid y, H_1)}{p(\theta = 0 \mid H_1)}$$
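A numerical check of the Savage-Dickey identity in the conjugate Gaussian case (toy numbers, mine): the ratio of posterior to prior density at θ = 0 equals the direct evidence ratio:

```python
import numpy as np
from scipy.stats import norm

# H1: theta ~ N(0, s0^2), y = theta + N(0, sigma^2); H0 is nested: theta = 0.
s0, sigma, y = 2.0, 1.0, 1.5

# Exact posterior under H1 is Gaussian (conjugacy):
post_var = 1 / (1 / s0**2 + 1 / sigma**2)
post_mean = post_var * y / sigma**2

# Bayes factor B01 = p(theta=0 | y, H1) / p(theta=0 | H1)  (Savage-Dickey)
bf01_sd = norm.pdf(0, post_mean, np.sqrt(post_var)) / norm.pdf(0, 0, s0)

# Check against the direct evidence ratio p(y|H0) / p(y|H1):
bf01_direct = norm.pdf(y, 0, sigma) / norm.pdf(y, 0, np.sqrt(sigma**2 + s0**2))
print(bf01_sd, bf01_direct)  # identical up to numerical error (≈ 0.909)
```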
Overview of the talk
2 Numerical Bayesian inference methods
2.1 Sampling methods
2.2 Variational methods (ReML, EM, VB)
Sampling methods: MCMC example (Gibbs sampling)
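The Gibbs-sampling figure does not survive transcription; as a stand-in, here is a generic Gibbs sampler for a correlated bivariate Gaussian, alternating draws from the two full conditionals:

```python
import numpy as np

# Gibbs sampler for a bivariate Gaussian target N(0, [[1, rho], [rho, 1]])
# (a textbook example, not the specific one from the slides).
rho, n_samples = 0.8, 5000
rng = np.random.default_rng(42)
samples = np.empty((n_samples, 2))
x1 = x2 = 0.0
for t in range(n_samples):
    # Each full conditional of a Gaussian is itself Gaussian:
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho**2))  # draw from p(x1 | x2)
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))  # draw from p(x2 | x1)
    samples[t] = x1, x2

print(np.corrcoef(samples[1000:].T)[0, 1])  # ≈ rho, after discarding burn-in
```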
Variational methods: VB / EM / ReML

→ VB: maximize the free energy F(q) with respect to the "variational" posterior q(θ), under some (e.g., mean-field, Laplace) approximation.

(Figure: the joint posterior $p(\theta_1, \theta_2 \mid y, m)$ over the $(\theta_1, \theta_2)$ plane, its exact marginals $p(\theta_{1 \text{ or } 2} \mid y, m)$, and the factorized variational approximations $q(\theta_{1 \text{ or } 2})$.)
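A minimal mean-field sketch (a standard textbook construction, not the slides' example): coordinate ascent on F(q) for a factorized Gaussian approximation q(θ1)q(θ2) to a correlated bivariate Gaussian posterior:

```python
import numpy as np

# Mean-field VB on a toy 2-D Gaussian posterior p(theta|y) = N(mu, inv(Lambda)),
# approximated by q(theta1) q(theta2); numbers are mine.
mu = np.array([1.0, -1.0])
Lam = np.array([[2.0, 1.2],
                [1.2, 2.0]])   # posterior precision matrix

m = np.zeros(2)                # variational means, updated in turn
for _ in range(50):            # coordinate ascent on the free energy F(q)
    m[0] = mu[0] - (Lam[0, 1] / Lam[0, 0]) * (m[1] - mu[1])
    m[1] = mu[1] - (Lam[1, 0] / Lam[1, 1]) * (m[0] - mu[0])

print(m)                              # converges to the exact posterior mean mu
print(1 / Lam[0, 0], 1 / Lam[1, 1])   # q variances under-estimate the true marginal
                                      # variances: the classic mean-field effect
```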
Overview of the talk
3 SPM applications
3.1 aMRI segmentation
3.2 Decoding of brain images
3.3 Model-based fMRI analysis (with spatial priors)
3.4 Dynamic causal modelling
(Figure: the SPM analysis pipeline: realignment → smoothing → normalisation (against a template) → general linear model → Gaussian field theory → statistical inference at p < 0.05; the Bayesian components covered here: segmentation and normalisation, posterior probability maps (PPMs), multivariate decoding, and dynamic causal modelling.)
aMRI segmentation: mixture of Gaussians (MoG) model

(Figure: directed graphical model of the MoG for tissue classification into grey matter, white matter and CSF: $y_i$ is the i-th voxel value, $c_i$ the i-th voxel label, $\lambda$ the class frequencies, $\mu_1 \dots \mu_k$ the class means, and $\sigma_1 \dots \sigma_k$ the class variances.)
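A self-contained EM sketch for this MoG model in 1-D (the synthetic intensities and initial values are mine; SPM's actual unified segmentation additionally uses spatial tissue priors and bias-field correction):

```python
import numpy as np
from scipy.stats import norm

# EM for a 1-D mixture of k Gaussians, in the spirit of the MoG segmentation model.
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(30, 5, 300),    # CSF-like intensities (toy)
                    rng.normal(60, 5, 300),    # grey-matter-like
                    rng.normal(90, 5, 300)])   # white-matter-like

k = 3
lam = np.full(k, 1 / k)                  # class frequencies
mu = np.array([20.0, 50.0, 100.0])       # class means (rough initial guesses)
sd = np.full(k, 10.0)                    # class standard deviations

for _ in range(50):
    # E-step: posterior label probabilities p(c_i = k | y_i) (responsibilities)
    r = lam * norm.pdf(y[:, None], mu, sd)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate frequencies, means and variances
    nk = r.sum(axis=0)
    lam = nk / y.size
    mu = (r * y[:, None]).sum(axis=0) / nk
    sd = np.sqrt((r * (y[:, None] - mu) ** 2).sum(axis=0) / nk)

print(np.round(mu, 1), np.round(sd, 1), np.round(lam, 2))
```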
Decoding of brain images: recognizing brain states from fMRI

(Figure: experimental paradigm with a fixation cross ("+"), a pacing cue (">>") and a response phase.)

log-evidence of X-Y sparse mappings: effect of lateralization
log-evidence of X-Y bilateral mappings: effect of spatial deployment
fMRI time series analysis: spatial priors and model comparison

(Figure: hierarchical model of the fMRI time series: GLM coefficients with a prior variance, a prior variance on the data noise, and AR coefficients modelling correlated noise.)

Competing models: short-term memory design matrix (X) versus long-term memory design matrix (X).
PPM: regions best explained by the short-term memory model.
PPM: regions best explained by the long-term memory model.
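Model comparison between design matrices can be sketched with the closed-form evidence of a linear-Gaussian model (the regressors and variances below are toy values, mine; the SPM implementation additionally uses spatial priors and AR noise models):

```python
import numpy as np

def log_evidence(y, X, prior_var=1.0, noise_var=1.0):
    """ln p(y|m) for y = X theta + noise with theta ~ N(0, prior_var * I):
    marginally, y ~ N(0, noise_var * I + prior_var * X X^T)."""
    n = y.size
    S = noise_var * np.eye(n) + prior_var * X @ X.T
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(S, y))

# Hypothetical comparison of two candidate design matrices (toy regressors):
rng = np.random.default_rng(3)
t = np.arange(120)
X_short = np.column_stack([np.sin(t / 4.0), np.ones_like(t, float)])
X_long = np.column_stack([np.sin(t / 20.0), np.ones_like(t, float)])
y = X_long @ np.array([1.0, 0.2]) + rng.normal(0, 1, t.size)  # generated under X_long

print("ln p(y|short):", log_evidence(y, X_short))
print("ln p(y|long): ", log_evidence(y, X_long))   # higher: the data favour X_long
```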
Dynamic Causal Modelling: network structure identification

(Figure: four candidate DCMs m1-m4 over regions V1, V5 and PPC, with stimulus input to V1 and attention entering at different connections; a bar plot of the models' marginal likelihoods $\ln p(y \mid m)$; and the estimated effective synaptic strengths for the best model, m4.)

(Figure: candidate network structures over nodes 1, 2 and 3.)
DCMs and DAGs: a note on causality

Neural states evolve in continuous time according to $\dot{x} = f(x, u, \theta)$.

(Figure: unrolling the dynamics over time, with $\Delta t \to 0$, turns the cyclic network (connections $\theta_{21}$, $\theta_{32}$, $\theta_{13}$, with input $u_t$ modulating $\theta_{13}$) into a directed acyclic graph over the states at times $t - \Delta t$, $t$, $t + \Delta t$.)
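A toy Euler integration of DCM-style dynamics (the connection values, input timing and bilinear modulation below are illustrative, not fitted values):

```python
import numpy as np

# Euler integration of toy dynamics dx/dt = f(x, u, theta), with connections
# theta_21 (1->2), theta_32 (2->3), theta_13 (3->1) and an input u that both
# drives region 1 and modulates the 3->1 connection.
dt, T = 0.01, 10.0
steps = int(T / dt)
theta_21, theta_32, theta_13 = 0.8, 0.8, 0.8
x = np.zeros(3)
trajectory = np.empty((steps, 3))
for i in range(steps):
    t = i * dt
    u = 1.0 if 1.0 < t < 2.0 else 0.0                 # boxcar input
    A = np.array([[-1.0, 0.0, theta_13 * (1 + u)],    # u modulates the 3->1 strength
                  [theta_21, -1.0, 0.0],
                  [0.0, theta_32, -1.0]])
    drive = np.array([u, 0.0, 0.0])                   # input drives region 1
    x = x + dt * (A @ x + drive)                      # Euler step: x(t+dt) ≈ x(t) + dt*f
    trajectory[i] = x

print(trajectory.max(axis=0))                         # peak responses of the three regions
```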
Dynamic Causal Modelling: model comparison for group studies

(Figure: differences in log-model evidences, $\ln p(y \mid m_1) - \ln p(y \mid m_2)$, plotted for each subject.)

fixed effect: assume all subjects correspond to the same model
random effect: assume different subjects might correspond to different models
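Under the fixed-effect assumption, group-level comparison reduces to summing subject-wise log-evidences (equivalently, multiplying Bayes factors); a sketch with invented numbers:

```python
import numpy as np

# Fixed-effects group model comparison: if all subjects express the same model,
# subject-wise log-evidences simply add. (Toy values, mine; in SPM these come
# from each subject's DCM inversion.)
log_ev = np.array([            # rows: subjects, columns: models m1, m2
    [-120.3, -118.9],
    [-101.7, -104.2],
    [-130.5, -126.1],
    [-115.0, -113.2],
])
group_log_bf_21 = (log_ev[:, 1] - log_ev[:, 0]).sum()
print("group log Bayes factor (m2 vs m1):", group_log_bf_21)
# A random-effects analysis would instead let the model vary across subjects
# (e.g., estimating model frequencies) rather than summing evidences.
```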
I thank you for your attention.
A note on statistical significance: lessons from the Neyman-Pearson lemma

• Neyman-Pearson lemma: the likelihood ratio (or Bayes factor) test
$$\Lambda = \frac{p(y \mid H_1)}{p(y \mid H_0)} \geq u$$
is the most powerful test of size $\alpha = P(\Lambda \geq u \mid H_0)$ for testing the null.

• what is the threshold u above which the Bayes factor test yields a type I error rate of 5%?

(Figure: ROC analysis, plotting 1 - type II error rate against type I error rate: MVB (Bayes factor) with u = 1.09, power = 56%; CCA (F-statistics) with F = 2.20, power = 20%.)
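The threshold question can be answered by Monte-Carlo calibration under the null; a sketch in a toy Gaussian setting (mine, not the MVB/CCA analysis of the figure):

```python
import numpy as np
from scipy.stats import norm

# Calibrate the Bayes-factor threshold u so that P(Lambda >= u | H0) = 5%.
# Toy setup: H0: y ~ N(0, 1); H1: y = theta + N(0, 1) with theta ~ N(0, s0^2).
rng = np.random.default_rng(7)
s0 = 2.0
y0 = rng.normal(0.0, 1.0, size=200_000)            # data simulated under H0

# Lambda = p(y|H1) / p(y|H0); by conjugacy p(y|H1) = N(y; 0, 1 + s0^2)
lam = norm.pdf(y0, 0, np.sqrt(1 + s0**2)) / norm.pdf(y0, 0, 1)
u = np.quantile(lam, 0.95)                          # 95th percentile under the null
print("threshold u for a 5% type I error rate:", round(u, 3))
```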