Lorelei Howard and Nick Wright MfD 2008 t-tests, ANOVA and regression - and their application to the statistical analysis of fMRI data

Lorelei Howard and Nick Wright MfD 2008

t-tests, ANOVA and regression - and their application to the statistical analysis of fMRI data

Overview

• Why do we need statistics?

• P values

• T-tests

• ANOVA

Why do we need statistics?

• To enable us to test experimental hypotheses

– H0 = null hypothesis

– H1 = experimental hypothesis

• In terms of fMRI

– Null = no difference in brain activation between these 2 conditions

– Exp = there is a difference in brain activation between these 2 conditions

2 types of statistics

• Descriptive statistics

– e.g., mean and standard deviation (S.D.)

• Inferential statistics

– t-tests, ANOVAs and regression

Issues when making inferences

• So how do we know whether the effect observed in our sample was genuine?

– We don’t

• Instead we use p values to indicate how unlikely our results would be if there were no genuine effect in the whole population

P values

• P value = the probability of obtaining a result at least as extreme as the one observed purely by chance

– i.e. when the null hypothesis is true

• α level is set a priori (usually 0.05)

• If p < α level then we reject the null hypothesis and accept the experimental hypothesis

– a result this extreme would occur less than 5% of the time if the null hypothesis were true

• If however, p > α level then we fail to reject the null hypothesis (this does not show that the null hypothesis is true)

Two types of errors

• Type I error = false positive

– α level of 0.05 means that, when the null hypothesis is true, there is a 5% risk of making a type I error

• Type II error = false negative

t-tests

• Compare two group means

Hypothetical experiment

Q – does viewing pictures of the Simpson and the Griffin family activate the same brain regions?

Condition 1 = Simpson family faces

Condition 2 = Griffin family faces

[Figure: the two conditions shown along a time axis]

Calculating T

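$t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}}$

$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

[Figure: Group 1 and Group 2]

Difference between the means divided by the pooled standard error of the mean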
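
As a rough illustration of the calculation, the sketch below divides the difference between two group means by the standard error sqrt(s1²/n1 + s2²/n2). The function name `two_sample_t` and the signal values are made up for this example and are not from the slides.

```python
import math
import statistics


def two_sample_t(group1, group2):
    """Return t = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2)."""
    n1, n2 = len(group1), len(group2)
    mean1, mean2 = statistics.mean(group1), statistics.mean(group2)
    # statistics.variance is the sample variance (it divides by n - 1).
    var1, var2 = statistics.variance(group1), statistics.variance(group2)
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / standard_error


# Hypothetical signal values for one voxel under the two conditions.
simpson = [5.0, 6.0, 7.0, 8.0, 9.0]
griffin = [3.0, 4.0, 5.0, 6.0, 7.0]
t_value = two_sample_t(simpson, griffin)
print(round(t_value, 3))
```

A larger difference between the means, or a smaller spread within each group, gives a larger t.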

How do we apply this to fMRI data analysis?

Degrees of freedom

• Degrees of freedom (df) = number of unconstrained data points

• Which in this case = number of data points – 1.

• Can use t value and df to find the associated p value

• Then compare to the α level
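
One way to sketch the step from a t value and df to a p value, using only the Python standard library, is to integrate the density of Student's t distribution numerically. The helper names `t_pdf` and `two_tailed_p`, the choice of Simpson's rule, and the example values t = 2.0 and df = 8 are all assumptions for illustration; in practice a statistics package or a printed table would normally be used.

```python
import math


def t_pdf(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))


def two_tailed_p(t, df, steps=10000):
    """Two-tailed p value: 1 minus the area under the density between -|t| and |t|."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    # The density is symmetric, so the central area is twice the area from 0 to |t|.
    return max(0.0, 1.0 - 2.0 * area_zero_to_t)


alpha = 0.05
p_value = two_tailed_p(2.0, df=8)  # hypothetical t value and df
print(round(p_value, 3), "reject H0" if p_value < alpha else "do not reject H0")
```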

Different types of t-test

• 2 sample t tests

– Related = two samples related, i.e. same people in both conditions

– Independent = two independent samples, i.e. different people in the 2 conditions

• One sample t tests

– compare the mean of one sample to a given value
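
The related and one sample cases are closely linked: a related t test can be computed as a one sample test on the within-person differences, compared against zero. The sketch below assumes that formulation; the function names and scores are hypothetical.

```python
import math
import statistics


def one_sample_t(sample, reference=0.0):
    """Return (t, df) for the mean of one sample against a given value."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - reference) / standard_error, n - 1


def related_t(condition1, condition2):
    """Related t test: a one sample test on the paired differences."""
    differences = [a - b for a, b in zip(condition1, condition2)]
    return one_sample_t(differences, reference=0.0)


# Hypothetical scores from the same four people in both conditions.
t_related, df_related = related_t([5.0, 7.0, 6.0, 9.0], [4.0, 5.0, 5.0, 6.0])
print(round(t_related, 3), df_related)
```

Working on the differences removes variation between people, which is why a related design can detect smaller effects than an independent one with the same number of scores.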

Another approach to group differences

• Analysis Of VAriance (ANOVA)

– Variances not means

• Multiple groups

– e.g. different facial expressions

• H0 = no differences between groups

• H1 = differences between groups

Calculating F

• F = the between group variance divided by the within group variance

– the model variance/error variance

• for F to be significant the between group variance should be considerably larger than the within group variance
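
A minimal sketch of this ratio for a one-way design is given below. It assumes the usual mean-square form (each sum of squares divided by its degrees of freedom); the function name `one_way_f` and the three groups of scores are made up.

```python
import statistics


def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [score for group in groups for score in group]
    grand_mean = statistics.mean(all_scores)
    # Between group (model) sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within group (error) sum of squares: scores around their own group mean.
    ss_within = sum((score - statistics.mean(g)) ** 2 for g in groups for score in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical responses to three different facial expressions.
expressions = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_value, df_between, df_within = one_way_f(expressions)
print(f_value, df_between, df_within)
```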

What can be concluded from a significant ANOVA?

• There is a significant difference between the groups

• NOT where this difference lies

• Finding exactly where the differences lie requires further statistical analyses

Different types of ANOVA

• One-way ANOVA

– One factor with more than 2 levels

• Factorial ANOVAs

– More than 1 factor

• Mixed design ANOVAs

– Some factors independent, others related

Conclusions

• T-tests assess if two group means differ significantly

• Can compare two samples or one sample to a given value

• ANOVAs compare more than two groups or more complicated scenarios

• They use variances instead of means

Further reading

• Howell. Statistical methods for psychologists

• Howitt and Cramer. An introduction to statistics in psychology

• Huettel. Functional magnetic resonance imaging (especially chapter 12)

Acknowledgements

• MfD Slides 2005 – 2007

PART 2

• Correlation

• Regression

• Relevance to GLM and SPM

Correlation

• Strength and direction of the relationship between variables

• Scattergrams

[Figure: three scattergrams of Y against X showing positive correlation, negative correlation and no correlation]

Describe correlation: covariance

• A statistic representing the degree to which 2 variables vary together

– Covariance formula: $\mathrm{cov}(x,y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n}$

– cf. variance formula: $S_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$

• But the absolute value of cov(x,y) is also a function of the standard deviations of x and y.
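
The dependence of covariance on the scale of the variables can be seen in a small sketch. It assumes the divide-by-n form of the covariance; the function name and data are hypothetical.

```python
def covariance(x, y):
    """Mean product of deviations from the means (divides by n)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
x_rescaled = [10.0 * xi for xi in x]  # same data in units ten times smaller

cov_xy = covariance(x, y)
cov_rescaled = covariance(x_rescaled, y)
# The relationship is unchanged, but the covariance is ten times larger.
print(cov_xy, cov_rescaled)
```

Because the number changes with the units of measurement, covariance alone cannot say how strong a relationship is.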

Describe correlation: Pearson correlation coefficient (r)

• Equation: $r_{xy} = \frac{\mathrm{cov}(x,y)}{s_x s_y}$ (s = st dev of sample)

– r = -1 (max. negative correlation); r = 0 (no linear relationship); r = 1 (max. positive correlation)

• Limitations:

– Sensitive to extreme values, e.g. [Figure: example scattergram, x axis 0 to 6, y axis 0 to 5]

– r is an estimate from the sample, but does it represent the population parameter?

– Relationship not a prediction.
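
The sketch below computes r as the covariance divided by the product of the two standard deviations, and then illustrates the sensitivity to extreme values by adding one made-up outlying point. The function name and all data are assumptions for illustration only.

```python
import math


def pearson_r(x, y):
    """r = cov(x, y) / (s_x * s_y), using the divide-by-n form throughout."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    return cov / (s_x * s_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r_clean = pearson_r(x, y)

# Unlike covariance, r does not change when x is rescaled.
r_rescaled = pearson_r([10.0 * xi for xi in x], y)

# One hypothetical extreme point is enough to flip the sign of r here.
r_outlier = pearson_r(x + [6.0], y + [0.0])
print(round(r_clean, 3), round(r_rescaled, 3), round(r_outlier, 3))
```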

Summary

• Correlation

• Regression

• Relevance to SPM

Regression

• Regression: Prediction of one variable from knowledge of one or more other variables.

• Regression v. correlation: Regression allows you to predict one variable from the other (not just say if there is an association).

• Linear regression aims to fit a straight line to data that for any value of x gives the best prediction of y.

Best fit line, minimising sum of squared errors

• Describing the line as in GCSE maths: y = mx + c

• Here, ŷ = bx + a

– ŷ: predicted value of y

– b: slope of regression line

– a: intercept

Residual error (ε): Difference between obtained and predicted values of y (i.e. y − ŷ).

Best fit line (values of b and a) is the one that minimises the sum of squared errors, $SS_{error} = \sum (y - \hat{y})^2$

[Figure: scattergram with regression line ŷ = bx + a; y_i = observed, ŷ = predicted, ε = residual]

How to minimise SSerror

• Minimise $\sum (y - \hat{y})^2$, which is $\sum (y - bx - a)^2$

• Plotting SSerror for each possible regression line gives a parabola.

• Minimum SSerror is at the bottom of the curve where the gradient is zero, and this can be found with calculus.

• Take partial derivatives of $\sum (y - bx - a)^2$ with respect to a and b, set them to 0 and solve as simultaneous equations, giving:

$b = \frac{r s_y}{s_x}, \qquad a = \bar{y} - b\bar{x}$

[Figure: sums of squared error (SSerror) plotted against values of a and b; min SSerror is where the gradient = 0]
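
As a sketch of the result, the code below computes the slope as b = r s_y / s_x and the intercept as a = mean(y) − b mean(x), then checks that nudging either value makes the sum of squared errors larger. The function names and the five data points are hypothetical.

```python
import math


def fit_line(x, y):
    """Return (b, a) with b = r * s_y / s_x and a = mean(y) - b * mean(x)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    r = cov / (s_x * s_y)
    b = r * s_y / s_x
    a = mean_y - b * mean_x
    return b, a


def ss_error(x, y, b, a):
    """Sum of squared residuals (y - y_hat)^2 for the line y_hat = b*x + a."""
    return sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b, a = fit_line(x, y)
best = ss_error(x, y, b, a)
# Any other line gives a larger sum of squared errors.
print(round(b, 3), round(a, 3), round(best, 3))
print(ss_error(x, y, b + 0.1, a) > best, ss_error(x, y, b, a - 0.1) > best)
```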

How good is the model?

• We can calculate the regression line for any data, but how well does it fit the data?

• Total variance = predicted variance + error variance

$s_y^2 = s_{\hat{y}}^2 + s_{er}^2$

• Also, it can be shown that $r^2$ is the proportion of the variance in y that is explained by our regression model

$r^2 = s_{\hat{y}}^2 / s_y^2$

• Insert $s_{\hat{y}}^2 = r^2 s_y^2$ into $s_y^2 = s_{\hat{y}}^2 + s_{er}^2$ and rearrange to get:

$s_{er}^2 = s_y^2 (1 - r^2)$

• From this we can see that the greater the correlation the smaller the error variance, so the better our prediction
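
The split of the total variance into a predicted part and an error part can be checked numerically. The sketch below fits a least-squares line to made-up data and compares the three variances, all computed with the divide-by-n form; the function names are hypothetical.

```python
def population_variance(values):
    """Variance using the divide-by-n form."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def variance_breakdown(x, y):
    """Return (total, predicted, error, r_squared) for the least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx           # least-squares slope, equal to r * s_y / s_x
    a = mean_y - b * mean_x
    predicted = [b * xi + a for xi in x]
    residuals = [yi - pi for yi, pi in zip(y, predicted)]
    total = population_variance(y)
    explained = population_variance(predicted)
    error = population_variance(residuals)
    return total, explained, error, explained / total


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
total, explained, error, r_squared = variance_breakdown(x, y)
# total should equal explained + error, and error should equal total * (1 - r_squared).
print(round(total, 3), round(explained, 3), round(error, 3), round(r_squared, 3))
```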

Is the model significant?

• i.e. do we get a significantly better prediction of y from our regression equation than by just predicting the mean?

• F-statistic:

$F(df_{\hat{y}}, df_{er}) = \frac{s_{\hat{y}}^2}{s_{er}^2} = \ldots = \frac{r^2 (n - 2)}{1 - r^2}$ (after complicated rearranging)

• And it follows that:

$t(n-2) = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}$

So all we need to know are r and n!
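
A short sketch of the two statistics, using the standard forms F = r²(n − 2)/(1 − r²) and t = r√(n − 2)/√(1 − r²) for a regression with one predictor. The function names and the example values of r and n are hypothetical.

```python
import math


def regression_f(r, n):
    """F statistic for simple linear regression, with 1 and n - 2 degrees of freedom."""
    return r ** 2 * (n - 2) / (1 - r ** 2)


def regression_t(r, n):
    """t statistic with n - 2 degrees of freedom; its square equals F."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r, n = math.sqrt(0.6), 5  # made-up correlation and sample size
f_stat = regression_f(r, n)
t_stat = regression_t(r, n)
print(round(f_stat, 3), round(t_stat, 3), round(t_stat ** 2, 3))
```

Either statistic can then be turned into a p value using its degrees of freedom, as for any other t or F test.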

Summary

• Correlation

• Regression

• Relevance to SPM

General Linear Model

• Linear regression is actually a form of the General Linear Model where the parameters are b, the slope of the line, and a, the intercept.

y = bx + a +ε

• A General Linear Model is just any model that describes the data as a linear combination of parameters (a straight line in the simplest case)

One voxel: The GLM

$Y = X \times \beta + \varepsilon$

– Y: data vector (voxel)

– X: design matrix

– β: parameters

– ε: error vector

Our aim: Solve equation for β – tells us how much BOLD signal is explained by X
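
For intuition, here is a sketch of the simplest possible case: one voxel, and a design with a single on/off (boxcar) regressor plus a constant. With that assumed design, the least-squares β is the difference between the mean signal in the two conditions, which ties the GLM back to the comparison of two means. The function name, the boxcar and the signal values are all hypothetical, and this is not how SPM itself is implemented.

```python
def fit_voxel(signal, boxcar):
    """Least-squares fit of signal = beta * boxcar + constant + error."""
    n = len(signal)
    mean_x, mean_y = sum(boxcar) / n, sum(signal) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(boxcar, signal))
    sxx = sum((xi - mean_x) ** 2 for xi in boxcar)
    beta = sxy / sxx
    constant = mean_y - beta * mean_x
    residuals = [yi - (beta * xi + constant) for xi, yi in zip(boxcar, signal)]
    return beta, constant, residuals


# Hypothetical time series: three scans at rest (0), then three scans of a condition (1).
boxcar = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
signal = [1.0, 2.0, 3.0, 3.0, 4.0, 5.0]
beta, constant, residuals = fit_voxel(signal, boxcar)
mean_on = sum(signal[3:]) / 3
mean_off = sum(signal[:3]) / 3
# beta matches the difference between the condition means.
print(beta, constant, mean_on - mean_off)
```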

Multiple regression

• Multiple regression is used to determine the effect of a number of independent variables, x1, x2, x3 etc., on a single dependent variable, y

• The different x variables are combined in a linear way and each has its own regression coefficient:

y = b0 + b1x1 + b2x2 + … + bnxn + ε

• The b parameters reflect the independent contribution of each independent variable, x, to the value of the dependent variable, y.

• i.e. the amount of variance in y that is accounted for by each x variable after all the other x variables have been accounted for
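
A minimal sketch of how the coefficients b0, b1, b2 and so on might be estimated is shown below. It assumes an ordinary least-squares fit obtained by solving the normal equations with plain Gaussian elimination; the function names and data are made up, and the data are generated without noise so that the known coefficients are recovered.

```python
def solve_linear_system(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    # Augmented matrix, copied so the inputs are not modified.
    aug = [row[:] + [value] for row, value in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def multiple_regression(predictors, y):
    """Return [b0, b1, ..., bn] minimising the sum of squared errors."""
    # Design matrix: a column of ones for b0, then one column per x variable.
    design = [[1.0] + list(row) for row in predictors]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data built as y = 1 + 2*x1 - 3*x2 with no error term.
predictors = [(0.0, 1.0), (1.0, 0.0), (2.0, 2.0), (3.0, 1.0), (4.0, 3.0)]
y = [1.0 + 2.0 * x1 - 3.0 * x2 for x1, x2 in predictors]
coefficients = multiple_regression(predictors, y)
print([round(c, 6) for c in coefficients])
```

Each coefficient describes the change in y for a unit change in its own x with the other x variables held fixed.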

SPM

• Linear regression is a GLM that models the effect of one independent variable, x, on one dependent variable, y

• Multiple Regression models the effect of several independent variables, x1, x2 etc, on one dependent variable, y

• Both are types of General Linear Model

• This is what SPM does and will be explained soon…

Summary

• Correlation

• Regression

• Relevance to SPM

Thanks!