Lorelei Howard and Nick Wright MfD 2008 t-tests, ANOVA and regression - and their application to the...
-
Upload
jeffrey-peters -
Category
Documents
-
view
215 -
download
1
Transcript of Lorelei Howard and Nick Wright MfD 2008 t-tests, ANOVA and regression - and their application to the...
Lorelei Howard and Nick WrightMfD 2008
t-tests, ANOVA and regression - and their application to the statistical analysis of fMRI data
Overview
• Why do we need statistics?
• P values
• T-tests
• ANOVA
Why do we need statistics?
• To enable us to test experimental hypotheses– H0 = null hypothesis
– H1 = experimental hypothesis
• In terms of fMRI– Null = no difference in brain activation between these
2 conditions– Exp = there is a difference in brain activation between
these 2 conditions
2 types of statistics
• Descriptive Stats– e.g., mean and standard deviation (S.D)
• Inferential statistics– t-tests, ANOVAs and regression
Issues when making inferences
• So how do we know whether the effect observed in our sample was genuine?
– We don’t
• Instead we use p values to indicate our level of certainty that our results represent a genuine effect present in the whole population
P values
• P values = the probability that the observed result was obtained by chance– i.e. when the null hypothesis is true
• α level is set a priori (Usually 0.05)
• If p < α level then we reject the null hypothesis and accept the experimental hypothesis– 95% certain that our experimental effect is genuine
• If however, p > α level then we reject the experimental hypothesis and accept the null hypothesis
Two types of errors
• Type I error = false positive
– α level of 0.05 means that there is 5% risk that a type I error will be encountered
• Type II error = false negative
t-tests
• Compare two group means
Hypothetical experiment
Time
Q – does viewing pictures of the Simpson and the Griffin family activate the same brain regions?
Condition 1 = Simpson family facesCondition 2 = Griffin family faces
Calculating T
21
21
xxs
xxt
2
22
1
21
21 n
s
n
ss xx Group 1 Group 2
Difference between the means divided by the pooled standard error of the mean
How do we apply this to fMRI data analysis?
Time
Degrees of freedom
• = number of unconstrained data points
• Which in this case = number of data points – 1.
• Can use t value and df to find the associated p value
• Then compare to the α level
Different types of t-test
• 2 sample t tests– Related = two samples related, i.e. same
people in both conditions– Independent = two independent samples, i.e.
diff people in 2 conditions
• One sample t tests– compare the mean of one sample to a given
value
Another approach to group differences
• Analysis Of VAriance (ANOVA)– Variances not means
• Multiple groupse.g. Different facial expressions
• H0 = no differences between groups
• H1 = differences between groups
Calculating F
• F = the between group variance divided by the within group variance– the model variance/error variance
• for F to be significant the between group variance should be considerably larger than the within group variance
What can be concluded from a significant ANOVA?
• There is a significant difference between the groups
• NOT where this difference lies
• Finding exactly where the differences lie requires further statistical analyses
Different types of ANOVA
• One-way ANOVA– One factor with more than 2 levels
• Factorial ANOVAs – More than 1 factor
• Mixed design ANOVAs– Some factors independent, others related
Conclusions
• T-tests assess if two group means differ significantly
• Can compare two samples or one sample to a given value
• ANOVAs compare more than two groups or more complicated scenarios
• They use variances instead of means
Further reading
• Howell. Statistical methods for psychologists
• Howitt and Cramer. An introduction to statistics in psychology
• Huettel. Functional magnetic resonance imaging (especially chapter 12)
Acknowledgements
• MfD Slides 2005 – 2007
PART 2
• Correlation
• Regression
• Relevance to GLM and SPM
Correlation
• Strength and direction of the relationship between variables
• Scattergrams
Y
X
YY
X
YY Y
Positive correlation Negative correlation No correlation
Describe correlation: covariance
• A statistic representing the degree to which 2 variables vary together
– Covariance formula
– cf. variance formula
but…• the absolute value of cov(x,y) is also a function of the
standard deviations of x and y.
n
yyxxyx
i
n
ii ))((
),cov( 1
n
xxS
n
ii
x
2
12
)(
Describe correlation: Pearson correlation coefficient (r)
• Equation
– r = -1 (max. negative correlation); r = 0 (no constant relationship); r = 1 (max. positive correlation)
• Limitations:– Sensitive to extreme values, e.g.
– r is an estimate from the sample, but does it represent the population parameter?
– Relationship not a prediction.
yxxy ss
yxr
),cov( s = st dev of sample
0
1
2
3
4
5
0 1 2 3 4 5 6
Summary
• Correlation
• Regression
• Relevance to SPM
Regression
• Regression: Prediction of one variable from knowledge of one or more other variables.
• Regression v. correlation: Regression allows you to predict one variable from the other (not just say if there is an association).
• Linear regression aims to fit a straight line to data that for any value of x gives the best prediction of y.
Best fit line, minimising sum of squared errors
• Describing the line as in GCSE maths: y = m x + c• Here, ŷ = bx + a
– ŷ : predicted value of y
– b: slope of regression line
– a: intercept
Residual error (ε): Difference between obtained and predicted values of y (i.e. y- ŷ).
Best fit line (values of b and a) is the one that minimises the sum of squared errors (SSerror) (y- ŷ)2
ε
ε = residual
= y i , observed
= ŷ, predicted
ŷ = bx + a
How to minimise SSerror
• Minimise (y- ŷ)2 , which is (y-bx+a)2
• Plotting SSerror for each possible regression line gives a parabola.
• Minimum SSerror is at the bottom of the curve where the gradient is zero – and this can found with calculus.
• Take partial derivatives of (y-bx-a)2 and solve for 0 as simultaneous equations, giving:
Values of a and bS
um
s of
sq
uar
ed e
rror
(S
Ser
ror)
Gradient = 0min SSerror
x
y
s
rsb xbya
How good is the model?• We can calculate the regression line for any data, but how well does it fit
the data?
• Total variance = predicted variance + error variance
sy2 = sŷ
2 + ser2
• Also, it can be shown that r2 is the proportion of the variance in y that is explained by our regression model
r2 = sŷ2 / sy
2
• Insert r2 sy2 into sy
2 = sŷ2 + ser
2 and rearrange to get:
ser2 = sy
2 (1 – r2)
• From this we can see that the greater the correlation the smaller the error variance, so the better our prediction
Is the model significant?
• i.e. do we get a significantly better prediction of y from our regression equation than by just predicting the mean?
• F-statistic:
F(dfŷ,dfer) =sŷ
2
ser2
=......=r2 (n - 2)2
1 – r2
complicatedrearranging
• And it follows that:
t(n-2) =r (n - 2)
√1 – r2
So all we need to know are r and n !
Summary
• Correlation
• Regression
• Relevance to SPM
General Linear Model
• Linear regression is actually a form of the General Linear Model where the parameters are b, the slope of the line, and a, the intercept.
y = bx + a +ε
• A General Linear Model is just any model that describes the data in terms of a straight line
++
== ++YY XX
data v
ecto
r (Vox
el)
data v
ecto
r (Vox
el)
design
mat
rix
design
mat
rix
param
eters
param
eters
erro
r vec
tor
erro
r vec
tor
××
==
One voxel: The GLM
Our aim: Solve equation for β – tells us how much BOLD signal is explained by X
Multiple regression• Multiple regression is used to determine the effect of a
number of independent variables, x1, x2, x3 etc., on a single dependent variable, y
• The different x variables are combined in a linear way and each has its own regression coefficient:
y = b0 + b1x1+ b2x2 +…..+ bnxn + ε
• The a parameters reflect the independent contribution of each independent variable, x, to the value of the dependent variable, y.
• i.e. the amount of variance in y that is accounted for by each x variable after all the other x variables have been accounted for
SPM
• Linear regression is a GLM that models the effect of one independent variable, x, on one dependent variable, y
• Multiple Regression models the effect of several independent variables, x1, x2 etc, on one dependent variable, y
• Both are types of General Linear Model
• This is what SPM does and will be explained soon…
Summary
• Correlation
• Regression
• Relevance to SPM
Thanks!