Lorelei Howard and Nick Wright MfD 2008 t-tests, ANOVA and regression - and their application to the...

Lorelei Howard and Nick WrightMfD 2008

t-tests, ANOVA and regression - and their application to the statistical analysis of fMRI data

Overview

• Why do we need statistics?

• P values

• T-tests

• ANOVA

Why do we need statistics?

• To enable us to test experimental hypotheses– H0 = null hypothesis

– H1 = experimental hypothesis

• In terms of fMRI– Null = no difference in brain activation between these

2 conditions– Exp = there is a difference in brain activation between

these 2 conditions

2 types of statistics

• Descriptive Stats– e.g., mean and standard deviation (S.D)

• Inferential statistics– t-tests, ANOVAs and regression

Issues when making inferences

• So how do we know whether the effect observed in our sample was genuine?

– We don’t

• Instead we use p values to indicate our level of certainty that our results represent a genuine effect present in the whole population

P values

• P values = the probability that the observed result was obtained by chance– i.e. when the null hypothesis is true

• α level is set a priori (Usually 0.05)

• If p < α level then we reject the null hypothesis and accept the experimental hypothesis– 95% certain that our experimental effect is genuine

• If however, p > α level then we reject the experimental hypothesis and accept the null hypothesis

Two types of errors

• Type I error = false positive

– α level of 0.05 means that there is 5% risk that a type I error will be encountered

• Type II error = false negative

t-tests

• Compare two group means

Hypothetical experiment

Time

Q – does viewing pictures of the Simpson and the Griffin family activate the same brain regions?

Condition 1 = Simpson family facesCondition 2 = Griffin family faces

Calculating T

21

21

xxs

xxt

2

22

1

21

21 n

s

n

ss xx Group 1 Group 2

Difference between the means divided by the pooled standard error of the mean

How do we apply this to fMRI data analysis?

Degrees of freedom

• = number of unconstrained data points

• Which in this case = number of data points – 1.

• Can use t value and df to find the associated p value

• Then compare to the α level

Different types of t-test

• 2 sample t tests– Related = two samples related, i.e. same

people in both conditions– Independent = two independent samples, i.e.

diff people in 2 conditions

• One sample t tests– compare the mean of one sample to a given

value

Another approach to group differences

• Analysis Of VAriance (ANOVA)– Variances not means

• Multiple groupse.g. Different facial expressions

• H0 = no differences between groups

• H1 = differences between groups

Calculating F

• F = the between group variance divided by the within group variance– the model variance/error variance

• for F to be significant the between group variance should be considerably larger than the within group variance

What can be concluded from a significant ANOVA?

• There is a significant difference between the groups

• NOT where this difference lies

• Finding exactly where the differences lie requires further statistical analyses

Different types of ANOVA

• One-way ANOVA– One factor with more than 2 levels

• Factorial ANOVAs – More than 1 factor

• Mixed design ANOVAs– Some factors independent, others related

Conclusions

• T-tests assess if two group means differ significantly

• Can compare two samples or one sample to a given value

• ANOVAs compare more than two groups or more complicated scenarios

• They use variances instead of means

Further reading

• Howell. Statistical methods for psychologists

• Howitt and Cramer. An introduction to statistics in psychology

• Huettel. Functional magnetic resonance imaging (especially chapter 12)

Acknowledgements

• MfD Slides 2005 – 2007

PART 2

• Correlation

• Regression

• Relevance to GLM and SPM

Correlation

• Strength and direction of the relationship between variables

• Scattergrams

Y

X

YY

X

YY Y

Positive correlation Negative correlation No correlation

Describe correlation: covariance

• A statistic representing the degree to which 2 variables vary together

– Covariance formula

– cf. variance formula

but…• the absolute value of cov(x,y) is also a function of the

standard deviations of x and y.

n

yyxxyx

i

n

ii ))((

),cov( 1

n

xxS

n

ii

x

2

12

)(

Describe correlation: Pearson correlation coefficient (r)

• Equation

– r = -1 (max. negative correlation); r = 0 (no constant relationship); r = 1 (max. positive correlation)

• Limitations:– Sensitive to extreme values, e.g.

– r is an estimate from the sample, but does it represent the population parameter?

– Relationship not a prediction.

yxxy ss

yxr

),cov( s = st dev of sample

0

1

2

3

4

5

0 1 2 3 4 5 6

Summary

• Correlation

• Regression

• Relevance to SPM

Regression

• Regression: Prediction of one variable from knowledge of one or more other variables.

• Regression v. correlation: Regression allows you to predict one variable from the other (not just say if there is an association).

• Linear regression aims to fit a straight line to data that for any value of x gives the best prediction of y.

Best fit line, minimising sum of squared errors

• Describing the line as in GCSE maths: y = m x + c• Here, ŷ = bx + a

– ŷ : predicted value of y

– b: slope of regression line

– a: intercept

Residual error (ε): Difference between obtained and predicted values of y (i.e. y- ŷ).

Best fit line (values of b and a) is the one that minimises the sum of squared errors (SSerror) (y- ŷ)2

ε

ε = residual

= y i , observed

= ŷ, predicted

ŷ = bx + a

How to minimise SSerror

• Minimise (y- ŷ)2 , which is (y-bx+a)2

• Plotting SSerror for each possible regression line gives a parabola.

• Minimum SSerror is at the bottom of the curve where the gradient is zero – and this can found with calculus.

• Take partial derivatives of (y-bx-a)2 and solve for 0 as simultaneous equations, giving:

Values of a and bS

um

s of

sq

uar

ed e

rror

(S

Ser

ror)

Gradient = 0min SSerror

x

y

s

rsb xbya

How good is the model?• We can calculate the regression line for any data, but how well does it fit

the data?

• Total variance = predicted variance + error variance

sy2 = sŷ

2 + ser2

• Also, it can be shown that r2 is the proportion of the variance in y that is explained by our regression model

r2 = sŷ2 / sy

2

• Insert r2 sy2 into sy

2 = sŷ2 + ser

2 and rearrange to get:

ser2 = sy

2 (1 – r2)

• From this we can see that the greater the correlation the smaller the error variance, so the better our prediction

Is the model significant?

• i.e. do we get a significantly better prediction of y from our regression equation than by just predicting the mean?

• F-statistic:

F(dfŷ,dfer) =sŷ

2

ser2

=......=r2 (n - 2)2

1 – r2

complicatedrearranging

• And it follows that:

t(n-2) =r (n - 2)

√1 – r2

So all we need to know are r and n !

Summary

• Correlation

• Regression


General Linear Model

• Linear regression is actually a form of the General Linear Model where the parameters are b, the slope of the line, and a, the intercept.

y = bx + a +ε

• A General Linear Model is just any model that describes the data in terms of a straight line

++

== ++YY XX

data v

ecto

r (Vox

el)

data v

ecto

r (Vox

el)

design

mat

rix

design

mat

rix

param

eters

param

eters

erro

r vec

tor

erro

r vec

tor

××

==

One voxel: The GLM

Our aim: Solve equation for β – tells us how much BOLD signal is explained by X

Multiple regression• Multiple regression is used to determine the effect of a

number of independent variables, x1, x2, x3 etc., on a single dependent variable, y

• The different x variables are combined in a linear way and each has its own regression coefficient:

y = b0 + b1x1+ b2x2 +…..+ bnxn + ε

• The a parameters reflect the independent contribution of each independent variable, x, to the value of the dependent variable, y.

• i.e. the amount of variance in y that is accounted for by each x variable after all the other x variables have been accounted for

SPM

• Linear regression is a GLM that models the effect of one independent variable, x, on one dependent variable, y

• Multiple Regression models the effect of several independent variables, x1, x2 etc, on one dependent variable, y

• Both are types of General Linear Model

• This is what SPM does and will be explained soon…

Summary

• Correlation

• Regression


Thanks!

Lorelei Howard and Nick Wright MfD 2008 t-tests, ANOVA and regression - and their application to the...

Documents

Transcript of Lorelei Howard and Nick Wright MfD 2008 t-tests, ANOVA and regression - and their application to the...