The General Linear Model (GLM) · Statistical Parametric Mapping: The Analysis of Functional Brain...
Transcript of The General Linear Model (GLM) · Statistical Parametric Mapping: The Analysis of Functional Brain...
With many thanks for slides & images to:
FIL Methods group, Virginia Flanagin and Klaas Enno Stephan
Dr. Frederike Petzschner
Translational Neuromodeling Unit (TNU)Institute for Biomedical Engineering, University of Zurich & ETH Zurich
The General Linear Model (GLM)
Translational Neuromodeling Unit
Overview of SPM
Realignment Smoothing
Normalisation
General linear model
Statistical parametric map (SPM)Image time-series
Parameter estimates
Design matrix
Template
Kernel
Gaussian field theory
p <0.05
Statisticalinference
Research Question:
Where in the brain do we represent listening to sounds?
Image a very simple experiment…
Time
SINGLE VOXEL TIME SERIES…
TIME
Image a very simple experiment…
Time
Question: Is there a change in the BOLD response betweenlistening and rest?
Image a very simple experiment…
Time
Question: Is there a change in the BOLD response betweenlistening and rest?
linear model
effectsestimate
error estimatestatistic
You need a model of your data…
ß
BOLD signal
Time = + + erro
r
x1 x2 e
Explain your data… as a combination of experimental manipulation, confounds and errors
Single voxel regression model:regressor
!β1 !β2
y = x1β1 + x2β2 + e
BOLD signal
Time = + + erro
r
x1 x2 e
y = x1β1 + x2β2 + eSingle voxel regression model:
Explain your data… as a combination of experimental manipulation,confounds and errors
!β2!β1
eXy += b
n
= + + erro
r
e
eXy += b
1
n
p
p
1
n
1
n: number of scansp: number of regressors
The black and white version in SPM
Des
ignm
atrix
erro
r!β2!β1
The design matrix embodies all available knowledge about experimentally controlled factors and potential confounds.
Model assumptions
Designmatrix
errorYou want to estimate your parameters such that you minimize:
This can be done using an Ordinary least squares estimation(OLS) assuming an i.i.d. error
et2
t=1
N
∑
error
GLM assumes identical and independently distributed errors
i.i.d. = error covariance is a scalar multiple of the identity matrix
úû
ùêë
é=
1001
)(eCov úû
ùêë
é=
1004
)(eCov úû
ùêë
é=
2112
)(eCov
non-identity non-independencet1 t2
t1
t2
e ≈ N(0,σ 2I )
= +
e
úû
ùêë
é
2
1
bb
y X
„Option 1“: Per hand
How to fit the model and estimate the parameters?
erro
r
= +
e
úû
ùêë
é
2
1
bb
y X
How to fit the model and estimate the parameters?
erro
r
y = Xβe = y− y
e = y− Xβ
min(eTe) =min((y− Xβ)T (y− Xβ))
Data predicted by our model
Error between predicted and actual data
Goal is to determine the betas such that we minimize the quadratic error
OLS (Ordinary Least Squares)
eTe = (y− Xβ)T (y− Xβ)
eTe = (yT − βT XT )(y− Xβ)
eTe = yT y− yTXβ − βT XT y+ βT XTXβ
eTe = yT y− 2βT XT y+ βT XTXβ∂eTe∂β
= −2XT y+ 2XTXβ
0 = −2XT y+ 2XTXβ
β = (XTX)−1XT y
OLS (Ordinary Least Squares)The goal is to minimize the quadratic error between data and model
eTe = (y− Xβ)T (y− Xβ)
eTe = (yT − βT XT )(y− Xβ)
eTe = yT y− yTXβ − βT XT y+ βT XTXβ
eTe = yT y− 2βT XT y+ βT XTXβ∂eTe∂β
= −2XT y+ 2XTXβ
0 = −2XT y+ 2XTXβ
β = (XTX)−1XT y
OLS (Ordinary Least Squares)The goal is to minimize the quadratic error between data and model
eTe = (y− Xβ)T (y− Xβ)
eTe = (yT − βT XT )(y− Xβ)
eTe = yT y− yTXβ − βT XT y+ βT XTXβ
eTe = yT y− 2βT XT y+ βT XTXβ∂eTe∂β
= −2XT y+ 2XTXβ
0 = −2XT y+ 2XTXβ
β = (XTX)−1XT y
OLS (Ordinary Least Squares)
This is a scalar and the transpose of a scalar is a scalar J
The goal is to minimize the quadratic error between data and model
eTe = (y− Xβ)T (y− Xβ)
eTe = (yT − βT XT )(y− Xβ)
eTe = yT y− yTXβ − βT XT y+ βT XTXβ
eTe = yT y− 2βT XT y+ βT XTXβ∂eTe∂β
= −2XT y+ 2XTXβ
0 = −2XT y+ 2XTXβ
β = (XTX)−1XT y
OLS (Ordinary Least Squares)
This is a scalar and the transpose of a scalar is a scalar J
The goal is to minimize the quadratic error between data and model
eTe = (y− Xβ)T (y− Xβ)
eTe = (yT − βT XT )(y− Xβ)
eTe = yT y− yTXβ − βT XT y+ βT XTXβ
eTe = yT y− 2βT XT y+ βT XTXβ∂eTe∂β
= −2XT y+ 2XTXβ
0 = −2XT y+ 2XTXβ
β = (XTX)−1XT y
OLS (Ordinary Least Squares)
This is a scalar and the transpose of a scalar is a scalar J
You find the extremum of a function by taking its derivative and setting it to zero
The goal is to minimize the quadratic error between data and model
eTe = (y− Xβ)T (y− Xβ)
eTe = (yT − βT XT )(y− Xβ)
eTe = yT y− yTXβ − βT XT y+ βT XTXβ
eTe = yT y− 2βT XT y+ βT XTXβ∂eTe∂β
= −2XT y+ 2XTXβ
0 = −2XT y+ 2XTXβ
β = (XTX)−1XT y
OLS (Ordinary Least Squares)
This is a scalar and the transpose of a scalar is a scalar J
You find the extremum of a function by taking its derivative and setting it to zero
The goal is to minimize the quadratic error between data and model
y
e
Design space defined by X
x1
x2
A geometric perspective on the GLM
bˆ Xy =
yXXX TT 1)(ˆ -=bOLS estimates
x1
x2x2*
y
Correlated and orthogonal regressors
When x2 is orthogonalized with regard to x1, only the parameter estimate for x1 changes, not that for x2!
Correlated regressors = explained variance is shared between regressors
121
2211
==++=
bbbb exxy
1;1 *21
*2
*211
=>
++=
bb
bb exxy
Design space defined by X
linear model
effectsestimate
error estimatestatisticß
…but we are dealing with fMRI data
We are nearly there…
What are the problems?
n
= + + erro
r
n
p
1
n
!β2!β1
ò -=Ät
dtgftgf0
)()()( ttt
The response of a linear time-invariant (LTI) system is the convolution of the input with the system's response to an impulse (delta function).
Problem 1: Shape of BOLD response
Solution: Convolution model of the BOLD response
expected BOLD response = input function x impulse response function (HRF) ò -=Ä
t
dtgftgf0
)()()( ttt
blue = datagreen = predicted response, taking convolved with HRFred = predicted response, NOT taking into account the HRF
Problem 2: Low frequency noise
blue = datablack = mean + low-frequency driftgreen = predicted response, taking into account low-frequency driftred = predicted response, NOT taking into account low-frequency drift
Problem 2: Low frequency noise
blue = datablack = mean + low-frequency driftgreen = predicted response, taking into account low-frequency driftred = predicted response, NOT taking into account low-frequency drift
Linear model
discrete cosine transform (DCT) set
Solution 2: High pass filtering
Problem 3: Serial correlations
non-identity non-independencet1 t2
t1
t2
i.i.d
Cov(e) = 1 00 1
!
"#
$
%& Cov(e) = 4 0
0 1
!
"#
$
%& Cov(e) = 2 1
1 2
!
"#
$
%&
Cov(e)
n: number of scans
n
n
Problem 3: Serial correlations
• Transform the signal into a space where the error is iid
• Pre-whitening:
1. Use an enhanced noise model with multiple error covariance components, i.e. e ~ N(0,σ2V) instead of e ~ N(0, σ2I).
2. Use estimated serial correlation to specify filter matrix W for whitening the data.
WeWXWy += b
This is i.i.d
Problem 3: How to find W à Model the noise
Cov(e)
n: number of scans
n
n
autocovariancefunction
withttt aee e+= -1 ),0(~ 2se Nt
1st order autoregressive process: AR(1)
Model the noise: Multiple covariance components
),0(~ 2VNe s å=µ
iiQVeCovV
l)(
= Q1 Q2
Estimation of hyperparameters with EM (expectation maximisation) or ReML (restricted maximum likelihood).
V
enhanced noise model error covariance components Qand hyperparameters
!λ1 !+λ2
How do we define W ?
• Enhanced noise model
• Remember linear transform for Gaussians
• Choose W such that error covariance becomes spherical
• Conclusion: W is a simple function of V
WeWXWy += b
),0(~ 2VNe s
),(~),,(~
22
2
sµ
sµ
aaNyaxyNx
Þ
=
2/1
2
22 ),0(~
-=Þ
=Þ
VWIVW
VWNWe s
!ys = Xsβ +es
We are there…
• the GLM models the effect of your experimental manipulation on the acquired data
• GLM includes all known experimental effects and confounds
• estimates effects an errors on a voxel-by-voxel basis
Because we are dealing with fMRI data there are a number of problems we need to take care of:
• Convolution with a canonical HRF
• High-pass filtering to account for low-frequency drifts
• Estimation of multiple variance components (e.g. to account for serial correlations)
linear model
effectsestimate
error estimatestatisticß
We are there…c = 1 0 0 0 0 0 0 0 0 0 0
Null hypothesis: 01 =b
)ˆ(
ˆ
bbT
T
cStdct =
So far we have looked at a single voxel…
single voxel time series
• Mass-univariateapproach: GLM applied to > 100,000 voxels
• Threshold of p<0.05 more than 5000 voxels significant by chance!
• Massive problem with multiple comparisons!
• Solution: Gaussian random field theory
Outlook: further challenges
• correction for multiple comparisons• variability in the HRF across voxels• limitations of frequentist statistics• GLM ignores interactions among voxels
Thank you for listening!
• Friston, Ashburner, Kiebel, Nichols, Penny (2007) Statistical Parametric Mapping: The Analysis of Functional Brain Images. Elsevier.
• Christensen R (1996) Plane Answers to Complex Questions: The Theory of Linear Models. Springer.
• Friston KJ et al. (1995) Statistical parametric maps in functional imaging: a general linear approach. Human Brain Mapping 2: 189-210.