Lectures 7: One-Way ANOVA

Junshu Bao

University of Pittsburgh

Table of contents

Review of ANOVA

Example: Cortisol Levels and Psychiatric Disorders

Data Exploration

ANOVA Test

Post-hoc Tests

Non-parametric Method

Review of ANOVA

Introduction

- ANOVA stands for analysis of variance. Contrary to what the phrase suggests, we are primarily concerned with comparing the means of the data, not their variances.
- One-way ANOVA compares mean values between multiple independent groups.
  - Independent variable: categorical variable (> 2 levels)
  - Dependent variable: continuous outcome variable
  - Extension of the t-test (2 groups)

ANOVA Hypothesis Testing

- Null: all the population means are equal, i.e.

  $$H_0: \mu_1 = \mu_2 = \cdots = \mu_I$$

- Alternative: not all the $\mu_i$'s are equal, i.e.

  $$H_a: \mu_i \neq \mu_j \quad \text{for some } i, j.$$

One-Way ANOVA Model

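The statistical model for one-way ANOVA is

$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$$

- $Y_{ij}$ is the $j$th observation of the $i$th treatment (group).
- $\mu$ is the overall mean level.
- $\alpha_i$ is the differential effect of the $i$th treatment.
- The $\alpha_i$ are normalized: $\sum_{i=1}^{I} \alpha_i = 0$.
- $\varepsilon_{ij}$ is the random error, with $\varepsilon_{ij} \sim N(0, \sigma^2)$.

The mean of the $i$th treatment group is $\mu_i = \mu + \alpha_i$. It follows that the null hypothesis can be written as

$$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_I = 0$$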

The F Test

The analysis of variance is based on the following identity:

$$\sum_{i=1}^{I}\sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{..})^2 = \sum_{i=1}^{I}\sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{i.})^2 + J\sum_{i=1}^{I} (\bar{Y}_{i.} - \bar{Y}_{..})^2$$

The identity may be expressed symbolically as

$$SST = SSE + SSG$$

- SST is the total sum of squares (total variation).
- SSE is the error sum of squares (variation within groups).
- SSG is the sum of squares among groups (variation among groups).
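
The decomposition can be checked numerically. The following is a minimal Python sketch, not part of the original SAS workflow; the toy data and the function name `sums_of_squares` are made up for illustration. It weights each group by its own size, which reduces to the factor $J$ in the balanced case.

```python
from statistics import mean

def sums_of_squares(groups):
    """Return (SST, SSE, SSG) for a list of groups (each a list of numbers)."""
    all_values = [y for g in groups for y in g]
    grand_mean = mean(all_values)
    # Total variation around the grand mean
    sst = sum((y - grand_mean) ** 2 for y in all_values)
    # Variation within groups, around each group's own mean
    sse = sum((y - mean(g)) ** 2 for g in groups for y in g)
    # Variation among group means, weighted by group size
    ssg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    return sst, sse, ssg

# Hypothetical toy data: I = 3 groups with J = 3 observations each
toy = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
sst, sse, ssg = sums_of_squares(toy)
print(sst, sse, ssg)
```

For this toy data, SSE = 6 and SSG = 26, and their sum matches SST = 32 up to floating-point rounding.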

The F Test (cont.)

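- Theorem:

  $$E(SSE) = I(J-1)\sigma^2 \implies E\left[\frac{SSE}{I(J-1)}\right] = \sigma^2$$

  $$E(SSG) = J\sum_{i=1}^{I} \alpha_i^2 + (I-1)\sigma^2$$

  Under $H_0$, $E(SSG) = (I-1)\sigma^2$ and $E[SSG/(I-1)] = \sigma^2$.

- Theorem:

  $$SSG/\sigma^2 \sim \chi^2_{I-1} \quad \text{under } H_0$$

  $$SSE/\sigma^2 \sim \chi^2_{I(J-1)}$$

  and SSE and SSG are independent. Consequently, under $H_0$,

  $$F = \frac{SSG/(I-1)}{SSE/[I(J-1)]} = \frac{MSG}{MSE} \sim F_{I-1,\, I(J-1)}$$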
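
To make the ratio concrete, here is a small Python sketch that computes the F statistic for a balanced design with $I$ groups of $J$ observations each. The function name `one_way_f` and the toy data are illustrative assumptions; the p-value is not computed because that needs the F distribution, which the Python standard library does not provide.

```python
from statistics import mean

def one_way_f(groups):
    """Return (F, df_groups, df_error) for a balanced one-way layout."""
    num_groups = len(groups)            # I
    per_group = len(groups[0])          # J
    if any(len(g) != per_group for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    grand_mean = mean(y for g in groups for y in g)
    ssg = per_group * sum((mean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_groups = num_groups - 1               # I - 1
    df_error = num_groups * (per_group - 1)  # I(J - 1)
    msg = ssg / df_groups
    mse = sse / df_error
    return msg / mse, df_groups, df_error

# Hypothetical toy data: I = 3, J = 3
f_value, df1, df2 = one_way_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_value, df1, df2)
```

Here MSG = 26/2 = 13 and MSE = 6/6 = 1, so F is 13 on (2, 6) degrees of freedom.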

Example: Cortisol Levels and Psychiatric Disorders

Example

A group of psychiatrists wanted to better understand the link between a common stress hormone and different psychiatric disorders. They enrolled random samples of patients diagnosed with

1. 'normal' psychiatric status (as a control group)
2. major depression
3. schizophrenia
4. bipolar disorder
5. 'atypical' psychiatric status.

The cortisol level of each patient was measured at their study visit.

Research question: Are the levels of cortisol significantly different for patients with differing psychiatric status?

Analysis Plan

To examine the relationship between cortisol level and psychiatric status, we will perform the following analysis:

- Data exploration
  - Descriptive statistics
  - Side-by-side boxplot of cortisol levels by psychiatric status
- ANOVA
- Assumption checking
- Post-hoc tests (if appropriate)

Reading Data

Read the data and create formats.

    * Read the group code and cortisol level from the raw data file;
    data cort;
      infile 'C:\STAT 1301\data_supp\cortisol.dat';
      input group cortisol;
      label group = 'Psychiatric status';
    run;

    * Define a format that maps numeric group codes to labels;
    proc format;
      value groupform
        1 = 'Normal'
        2 = 'Major Depression'
        3 = 'Bipolar Depression'
        4 = 'Schizophrenia'
        5 = 'Atypical';
    run;

Notice that 'group' is originally read as a numeric variable with values 1, 2, 3, 4, and 5. The format created by proc format is used to display these numeric values as descriptive labels.

Data Exploration

Compare descriptive statistics: mean and standard deviation.

    * Mean, standard deviation, and sample size of cortisol by group;
    proc tabulate data=cort;
      class group;
      format group groupform.;
      var cortisol;
      table group,
            cortisol*(mean std n);
    run;

- The 'Major Depression' group has the highest sample mean of cortisol.
- The 'Major Depression' group also has the largest standard deviation.

Data Exploration (2)

Create a side-by-side boxplot.

    * proc boxplot expects the data sorted by the grouping variable;
    proc sort data=cort;
      by group;
    run;

    proc boxplot data=cort;
      plot (cortisol)*group;
    run;

- To create the boxplot, the data must first be sorted by the grouping variable.
- The second group, 'Major Depression', appears very different from the other groups.

ANOVA Test

ANOVA F Test

    proc anova data=cort;
      class group;
      model cortisol=group;
    run;

ANOVA Table:

    Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model     4             1426           356     22.32   <0.0001
    Error    66             1054            16
    Total    70             2480

- $H_0: \mu_1 = \mu_2 = \cdots = \mu_5$ versus $H_a: \mu_i \neq \mu_j$ for some $i$ and $j$.
- Test statistic: F = 22.32
- p-value < 0.0001
- Decision: Reject $H_0$. Not all of the population means are equal.
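
The entries of the ANOVA table are tied together by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. The short Python sketch below (an illustrative check, with a made-up function name) recomputes these from the rounded sums of squares reported above.

```python
def anova_row_check(ss_model, df_model, ss_error, df_error):
    """Recompute mean squares and the F value from sums of squares and DF."""
    ms_model = ss_model / df_model   # MSG
    ms_error = ss_error / df_error   # MSE
    return ms_model, ms_error, ms_model / ms_error

# Values taken from the ANOVA table for cortisol
ms_model, ms_error, f_value = anova_row_check(1426, 4, 1054, 66)
print(round(ms_model, 1), round(ms_error, 2), round(f_value, 2))
# prints: 356.5 15.97 22.32
```

The recomputed F agrees with the reported 22.32, and the degrees of freedom and sums of squares add up to the Total row (4 + 66 = 70 and 1426 + 1054 = 2480).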

Checking Assumptions

- The dependent variable is normally distributed within each group.
  - ANOVA is fairly robust against violations of normality.
  - Check by running proc univariate on the variable or the residuals by group.
- Homogeneity of variance: variances are equal across groups.
  - Use Levene's test.
- Independence of observations: a study design issue.

If the normality or homogeneity-of-variance assumption is violated, we can try a transformation to fix the issue or run a non-parametric alternative (Kruskal-Wallis).

Checking Normality Assumption

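    * The normal option requests tests of normality;
    * by-group processing requires data sorted by group;
    proc univariate data=cort normal;
      var cortisol;
      by group;
    run;

Normality test results:

- Group 1: significant
- Group 2: not significant
- Group 3: significant
- Group 4: not significant
- Group 5: marginally significant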

Log Transformation of Response

Since the normality test is significant for some groups, we can take the log of the cortisol level and see whether the data become more 'normal'.

    * Add the log-transformed response to a new data set;
    data cort2;
      set cort;
      logcort = log(cortisol);
    run;

    proc sort data=cort2;
      by group;
    run;

    proc univariate data=cort2 normal;
      var logcort;
      by group;
    run;

Normality test results:

- Group 1: significant
- Group 2: significant
- Group 3: not significant
- Group 4: marginally significant
- Group 5: not significant

Normality is not improved by the log transformation.

Checking Equal-Variance Assumption (1)

Levene's test

- $H_0$: the variances in all groups are equal. The assumption is 'met' if we fail to reject the null hypothesis, so we want a non-significant p-value.
- $H_a$: the variances are not all equal.

We can ask SAS for Levene's test using the hovtest option in the means statement.

    proc anova data=cort;
      class group;
      model cortisol=group;
      means group / hovtest;
    run;

The p-value is less than 0.0001, so the test is significant: the equal-variance assumption is rejected for the untransformed cortisol levels.
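
The idea behind Levene's test can be sketched in a few lines: replace each observation by its deviation from its group center and run an ordinary one-way ANOVA on those deviations. The Python sketch below uses absolute deviations from the group means, which is one common form; the variant SAS computes by default may differ (for example, it may use squared deviations), so treat this as an illustration of the idea rather than a reproduction of the SAS output. Function names and data are hypothetical.

```python
from statistics import mean

def anova_f(groups):
    """One-way ANOVA F statistic; group sizes may be unequal."""
    values = [y for g in groups for y in g]
    grand_mean = mean(values)
    num_groups, total_n = len(groups), len(values)
    ssg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return (ssg / (num_groups - 1)) / (sse / (total_n - num_groups))

def levene_f(groups):
    """Levene-type statistic: ANOVA F on absolute deviations from group means."""
    deviations = [[abs(y - mean(g)) for y in g] for g in groups]
    return anova_f(deviations)

# Same spread in every group (only the location shifts): statistic is 0
same_spread = [[1, 2, 3], [11, 12, 13], [21, 22, 23]]
# One group is far more variable: statistic is positive
different_spread = [[1, 2, 3], [10, 20, 30], [5, 5.5, 6]]
print(levene_f(same_spread), levene_f(different_spread))
```

A large value of the statistic, compared with the F distribution on (number of groups − 1, total sample size − number of groups) degrees of freedom, indicates unequal variances.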

Checking Equal-Variance Assumption (2)

Let us check the homogeneous variance assumption for the log-transformed data.

    proc anova data=cort2;
      class group;
      model logcort=group;
      means group / hovtest; *asks for Levene's test;
    run;

The p-value is 0.0856, so we fail to reject the null hypothesis. The variances are not significantly different among groups for log(cortisol), so we will use the ANOVA test result on the log-transformed response.

ANOVA F Test for log(cortisol)

    Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model     4            36.13          9.03     20.63   <0.0001
    Error    66            28.90          0.44
    Total    70            65.03

- $H_0: \mu_1 = \mu_2 = \cdots = \mu_5$ versus $H_a: \mu_i \neq \mu_j$ for some $i$ and $j$. Note that $\log(\mu_i) = \log(\mu_j) \implies \mu_i = \mu_j$.
- Test statistic: F = 20.63
- p-value < 0.0001
- Decision: Reject $H_0$. Not all of the population means are equal.

Post-hoc Tests

If the overall F test is significant, we only know that at least one pair of means differs; we do not know which pairs differ.

- Pairwise tests (I = 3):

  $$H_{01}: \mu_1 = \mu_2$$
  $$H_{02}: \mu_1 = \mu_3$$
  $$H_{03}: \mu_2 = \mu_3$$

- There are three different methods of performing pairwise tests. All three adjust for the inflated type I error rate that arises when performing multiple comparisons.
  - Tukey's test (use when sample sizes are the same); the Tukey-Kramer test handles unequal sample sizes
  - Bonferroni's test (tends to be conservative)
  - Scheffé's test (equal or unequal sample sizes)
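
To see why an adjustment is needed, the Python sketch below counts the pairwise comparisons among $I$ groups, applies the Bonferroni correction (divide the overall level by the number of tests), and shows how quickly the chance of at least one false rejection grows without adjustment. The unadjusted rate is computed under the simplifying assumption that the tests are independent, which pairwise comparisons sharing groups are not; the function names are illustrative.

```python
from math import comb

def pairwise_count(num_groups):
    """Number of distinct pairs among the groups: I(I - 1)/2."""
    return comb(num_groups, 2)

def bonferroni_alpha(alpha, num_tests):
    """Per-comparison significance level under the Bonferroni correction."""
    return alpha / num_tests

def unadjusted_familywise_rate(alpha, num_tests):
    """P(at least one type I error), ASSUMING independent tests."""
    return 1 - (1 - alpha) ** num_tests

for num_groups in (3, 5):
    m = pairwise_count(num_groups)
    print(num_groups, m, bonferroni_alpha(0.05, m),
          round(unadjusted_familywise_rate(0.05, m), 4))
```

With three groups there are 3 comparisons; with five groups, as in the cortisol example, there are 10, so Bonferroni tests each pair at level 0.05/10 = 0.005.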

Pairwise comparisons

The bon, tukey, and scheffe options on the means statement request the three pairwise tests. The cldiff option presents the results of the tests as confidence intervals for all pairwise differences between means. It is the default for unequal cell sizes.

    proc anova data=cort2;
      class group;
      model logcort=group;
      means group / tukey bon scheffe cldiff;
    run;

Summary of test results:

- Tukey and Bonferroni give the same results:
  - Significantly different pairs: (1,2), (1,4), (2,3), (2,4), (2,5), (3,4)
- Scheffé's test:
  - Significantly different pairs: (2,1), (2,3), (2,4), (2,5)

Non-parametric Method

A Non-parametric Method - The Kruskal-Wallis Test

The Kruskal-Wallis test is a generalization of the Mann-Whitney test. It is a non-parametric alternative to the ANOVA F test.

    proc npar1way data=cort;
      class group;
      var cortisol;
    run;

Summary of test results:

- Test statistic: $\chi^2$ = 31.7657
- p-value < 0.0001
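
The Kruskal-Wallis statistic is computed from ranks rather than raw values. A minimal Python sketch follows, assuming the textbook formula $H = \frac{12}{N(N+1)}\sum_i R_i^2/n_i - 3(N+1)$, where $R_i$ is the rank sum of group $i$, $n_i$ its size, and $N$ the total sample size. Ties receive average ranks, but no tie correction is applied to $H$, so results may differ slightly from software output when the data contain ties. The function names and toy data are hypothetical.

```python
def midranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value is tied with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction)."""
    values = [y for g in groups for y in g]
    ranks = midranks(values)
    total_n = len(values)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (total_n * (total_n + 1)) * weighted - 3 * (total_n + 1)

# Hypothetical toy data: completely separated groups give a large H
print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

Under the null hypothesis, H is approximately chi-square distributed with (number of groups − 1) degrees of freedom, which is why the test statistic is reported as a chi-square value.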

Non-Parametric Pairwise Comparison

If you specify the dscf option, proc npar1way computes the Dwass, Steel, Critchlow-Fligner (DSCF) multiple comparison analysis, which is based on pairwise two-sample Wilcoxon comparisons.

    proc npar1way data=cort dscf;
      class group;
      var cortisol;
    run;

Significantly different pairs:

- (1,2), (1,4), (2,3), (2,4), (3,4)

The results are similar to those of Tukey-Kramer and Bonferroni.
