ANOVA: fixed effects

Statistics 203: Introduction to Regression and Analysis of Variance. ANOVA: fixed effects. Jonathan Taylor

Transcript of ANOVA: fixed effects

Page 1: ANOVA: fixed effects

Statistics 203: Introduction to Regression and Analysis of Variance

ANOVA: fixed effects

Jonathan Taylor

Page 2: ANOVA: fixed effects

Today

Categorical variables

Example: tool lifetime

Solution #1: stratification

Solution #2: qualitative predictors

More than two levels

Analysis of Variance models

One-way ANOVA

Extension of two sample t-test

ANOVA tables: One-way

Example: rehab surgery

Inference for linear combinations

Two-way ANOVA

Constraints on the parameters

Fitting model

Questions of interest

ANOVA table: Two-way (assuming nij = n)

ANOVA table: Two-way (continued)

Example: kidney failure

Caveats

Today

Qualitative / categorical variables. One & Two-way ANOVA models.

Page 3: ANOVA: fixed effects

Categorical variables

Most variables we have looked at so far were continuous: height, rating, etc.

In many situations, we record a categorical variable: gender, state, country, etc.

How do we include this in our model?

Page 4: ANOVA: fixed effects

Example: tool lifetime

Outcome: Y, lifetime of a cutting tool on a lathe.

Predictors: X1, lathe speed (revolutions per minute); T, tool type (A or B).

Goal: to study whether the effect of lathe speed differs depending on the tool type.

Page 5: ANOVA: fixed effects

Solution #1: stratification

One solution is to “stratify” the data set by this categorical variable.

We could break the data set up into 2 groups by tool type and fit the model

$$Y_i = \beta_0 + \beta_1 X_{i,1} + \varepsilon_i$$

in each group.

Problem: this results in very small samples in each group, hence low degrees of freedom for estimating $\sigma^2$ in each group.

Page 6: ANOVA: fixed effects

Solution #2: qualitative predictors

If it is reasonable to assume that $\sigma^2$ is constant for each observation, then we can incorporate all observations into one model:

$$Y_i = \beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + \beta_3 X_{i,1} X_{i,2} + \varepsilon_i$$

where

$$X_{i,2} = \begin{cases} 1 & \text{if } T_i = A, \\ 0 & \text{otherwise.} \end{cases}$$

This model estimates a different slope and intercept for each tool type:

for tool type A: slope $= \beta_1 + \beta_3$, intercept $= \beta_0 + \beta_2$;

for tool type B: slope $= \beta_1$, intercept $= \beta_0$.

Test for different slopes: $H_0 : \beta_3 = 0$.

Test for different intercepts: $H_0 : \beta_2 = 0$.

Test for different slope & intercept: $H_0 : \beta_2 = \beta_3 = 0$.
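A minimal sketch of this model in plain Python, on made-up data (the speeds, lifetimes and helper names below are illustrative assumptions, not the course data set). It builds the design matrix rows $[1, X_1, X_2, X_1 X_2]$ and uses the fact that, because the model allows a separate intercept and slope for each tool type, its least-squares coefficients can be read off the two per-group line fits. What the single model adds over stratification is a pooled estimate of $\sigma^2$, which this sketch does not compute.

```python
from statistics import mean


def simple_ols(x, y):
    """Least-squares (intercept, slope) for y = b0 + b1 * x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical data: lathe speed (rpm), tool type, tool lifetime.
speed = [500, 600, 700, 800, 500, 600, 700, 800]
tool = ["A", "A", "A", "A", "B", "B", "B", "B"]
life = [40.0, 36.0, 31.0, 27.0, 30.0, 27.0, 25.0, 22.0]

# Design matrix rows [1, X1, X2, X1 * X2], with X2 = 1 for tool type A.
design = [[1.0, s, float(t == "A"), s * float(t == "A")]
          for s, t in zip(speed, tool)]


def fit_group(label):
    """Fit a separate line to the observations of one tool type."""
    xs = [s for s, t in zip(speed, tool) if t == label]
    ys = [y for y, t in zip(life, tool) if t == label]
    return simple_ols(xs, ys)


int_a, slope_a = fit_group("A")
int_b, slope_b = fit_group("B")

# Tool type B is the baseline; beta2 and beta3 are A-minus-B differences.
beta0, beta1 = int_b, slope_b
beta2, beta3 = int_a - int_b, slope_a - slope_b

print("intercept difference beta2:", round(beta2, 4))
print("slope difference beta3:", round(beta3, 4))
```

A quick way to convince yourself that these are the least-squares coefficients of the combined model is to check that the residuals are orthogonal to every column of `design`.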

Page 7: ANOVA: fixed effects

More than two levels

If our categorical variable has $r$ levels (i.e. $r$ different tool types $t_1, \dots, t_r$) then we need to add $r - 1$ categorical variables to $X$: for $1 \le j \le r - 1$,

$$C_{i,j} = \begin{cases} 1 & \text{if } T_i = t_j, \\ 0 & \text{otherwise.} \end{cases}$$

Note: there are many ways to “code” the qualitative variable. The scheme above implies that the mean in group $r$ is $\beta_0$ and the coefficients of the columns $C_{i,j}$ represent differences from the mean of group $r$.

To look for different “slopes” for a given continuous predictor $X$ we need to add $r - 1$ more columns: for $1 \le j \le r - 1$,

$$I_{i,j} = X_i \, C_{i,j}, \quad 1 \le i \le n.$$

These are our first “real” interactions: taking some columns of a smaller $X$ and multiplying them together (i.e. the $C$ columns and $X$ columns).
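The coding scheme can be sketched in a few lines of plain Python. The function names, level labels and data below are hypothetical; the last level in `levels` plays the role of the baseline group $r$.

```python
def dummy_columns(levels, labels):
    """Rows of indicators C[i][j] for the first r - 1 levels (last level = baseline)."""
    return [[1 if lab == lev else 0 for lev in levels[:-1]] for lab in labels]


def interaction_columns(x, dummies):
    """Rows of interactions I[i][j] = x[i] * C[i][j]."""
    return [[xi * c for c in row] for xi, row in zip(x, dummies)]


levels = ["t1", "t2", "t3"]          # r = 3 hypothetical tool types
labels = ["t1", "t3", "t2", "t3"]    # tool type of each observation
x = [500, 600, 700, 800]             # a continuous predictor

C = dummy_columns(levels, labels)
inter = interaction_columns(x, C)

print(C)      # [[1, 0], [0, 0], [0, 1], [0, 0]]
print(inter)  # [[500, 0], [0, 0], [0, 700], [0, 0]]
```

Observations in the baseline level `t3` get a row of zeros, which is why their mean is carried by the intercept alone.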

Page 8: ANOVA: fixed effects

Analysis of Variance models

Models with only qualitative variables.

One-way ANOVA: extension of the “two-sample” t-test. Example: in studying the effect of BP on heart disease we might consider the overall health (Poor, Moderate, Good).

Two-way ANOVA: more than one qualitative variable. Example: include ethnicity as part of our study of the effect of BP on heart disease.

Page 9: ANOVA: fixed effects

One-way ANOVA

Generalizes the two-sample t-test to more than two groups.

One-way ANOVA model: observations $(Y_{ij})$, $1 \le i \le r$, $1 \le j \le n_i$: $r$ groups and $n_i$ samples in the $i$-th group.

$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \quad \varepsilon_{ij} \sim N(0, \sigma^2).$$

Constraint: $\sum_{i=1}^{r} \alpha_i = 0$. Why a constraint? Otherwise, the model is unidentifiable: $r + 1$ parameters for only $r$ means. We can find infinitely many choices of $(\mu, \alpha_1, \dots, \alpha_r)$ that yield the same means for each $Y_{ij}$.

This particular constraint comes down to a different “coding” of the group levels (see $C_{i,j}$ above). In this case, the $\alpha_i$'s are differences from the “grand mean” $\mu$.
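To see the identifiability problem concretely, here is a small sketch with made-up numbers (the parameter values and function names are illustrative assumptions). Shifting $\mu$ by a constant $c$ and every $\alpha_i$ by $-c$ leaves all group means unchanged; the sum-to-zero constraint picks out one representative, with $\mu$ equal to the unweighted average of the group means.

```python
from statistics import mean


def group_means(mu, alphas):
    """Group means mu + alpha_i implied by a parameter vector."""
    return [mu + a for a in alphas]


def sum_to_zero(means):
    """Recover (mu, alphas) with sum(alphas) = 0 from the r group means."""
    mu = mean(means)
    return mu, [m - mu for m in means]


# Two different parameter vectors (the second is shifted by c = 3) ...
params_1 = (10.0, [-2.0, 0.0, 2.0])
params_2 = (13.0, [-5.0, -3.0, -1.0])

# ... that imply exactly the same group means.
print(group_means(*params_1))  # [8.0, 10.0, 12.0]
print(group_means(*params_2))  # [8.0, 10.0, 12.0]

# Only the first one satisfies the sum-to-zero constraint.
mu, alphas = sum_to_zero(group_means(*params_2))
print(mu, alphas)              # 10.0 [-2.0, 0.0, 2.0]
```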

Page 10: ANOVA: fixed effects

Extension of two sample t-test

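Model is easy to fit: the fitted value for every observation in group $i$ is the group mean,

$$\widehat{Y}_{ij} = \overline{Y}_{i\cdot} = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij}.$$

Simplest question: is there any group effect?

$$H_0 : \alpha_1 = \dots = \alpha_r = 0?$$

Test is based on an F-test of the full model vs. the reduced model. The reduced model just has an intercept.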

Page 11: ANOVA: fixed effects

ANOVA tables: One-way

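$$
\begin{array}{llll}
\text{Source} & SS & df & E(MS) \\
\hline
\text{Treatments} & SSTR = \sum_{i=1}^{r} n_i \left(\overline{Y}_{i\cdot} - \overline{Y}_{\cdot\cdot}\right)^2 & r - 1 & \sigma^2 + \dfrac{\sum_{i=1}^{r} n_i \alpha_i^2}{r - 1} \\
\text{Error} & SSE = \sum_{i=1}^{r} \sum_{j=1}^{n_i} \left(Y_{ij} - \overline{Y}_{i\cdot}\right)^2 & \sum_{i=1}^{r} n_i - r & \sigma^2
\end{array}
$$

Notation: $\overline{Y}_{i\cdot}$ is the $i$-th group mean, $\overline{Y}_{\cdot\cdot}$ is the overall mean.

We see that under $H_0 : \alpha_1 = \dots = \alpha_r = 0$, the expected values of $MSTR$ and $MSE$ are both $\sigma^2$.

Entries in the ANOVA table are, in general, independent.

Therefore, under $H_0$,

$$F = \frac{MSTR}{MSE} = \frac{SSTR / df_{TR}}{SSE / df_E} \sim F_{df_{TR}, df_E}.$$

Reject $H_0$ at level $\alpha$ if $F > F_{1-\alpha, df_{TR}, df_E}$.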
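The table entries can be computed directly from the formulas. The following is a sketch in plain Python on made-up data (the function name and the numbers are assumptions for illustration); it stops at the F statistic, since the standard library has no F distribution for the critical value or p-value.

```python
from statistics import mean


def one_way_anova(groups):
    """One-way ANOVA table entries for a list of groups of observations."""
    r = len(groups)
    all_obs = [y for g in groups for y in g]
    grand = mean(all_obs)                      # overall mean
    means = [mean(g) for g in groups]          # group means

    # Treatment and error sums of squares, as in the table.
    sstr = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)

    df_tr, df_e = r - 1, len(all_obs) - r
    mstr, mse = sstr / df_tr, sse / df_e
    return {"SSTR": sstr, "SSE": sse, "df_TR": df_tr, "df_E": df_e,
            "MSTR": mstr, "MSE": mse, "F": mstr / mse}


# Hypothetical data: three groups of three observations each.
table = one_way_anova([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
for name, value in table.items():
    print(name, round(value, 4))
```

For these numbers the group means are 2, 3 and 6, so SSTR is about 26 on 2 degrees of freedom, SSE is 6 on 6 degrees of freedom, and F is about 13.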

Page 12: ANOVA: fixed effects

Example: rehab surgery

How does prior fitness affect recovery from surgery?

Observations: 24 subjects' recovery time.

Three fitness levels: below average, average, above average.

If you are in better shape before surgery, does it take less time to recover?

Page 13: ANOVA: fixed effects

Inference for linear combinations

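Suppose we want to “infer” something about

$$\sum_{i=1}^{r} a_i (\mu + \alpha_i).$$

The natural estimate is $\sum_{i=1}^{r} a_i \overline{Y}_{i\cdot}$, and since the group means are independent with variances $\sigma^2 / n_i$,

$$\operatorname{Var}\left(\sum_{i=1}^{r} a_i \overline{Y}_{i\cdot}\right) = \sigma^2 \sum_{i=1}^{r} \frac{a_i^2}{n_i}.$$

Usual confidence intervals, t-tests.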
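As an illustration, the sketch below estimates a linear combination of group means and its standard error, plugging in $MSE$ for $\sigma^2$. The data, the weights `a` and the function name are made up for the example; the t quantile needed for a confidence interval is not computed because the standard library does not provide it.

```python
from math import sqrt
from statistics import mean


def linear_combination(groups, a):
    """Estimate sum_i a_i * Ybar_i., its standard error, t statistic and error df."""
    means = [mean(g) for g in groups]
    n = [len(g) for g in groups]
    df_e = sum(n) - len(groups)

    # Pooled variance estimate: MSE from the one-way ANOVA table.
    mse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / df_e

    estimate = sum(ai * m for ai, m in zip(a, means))
    se = sqrt(mse * sum(ai ** 2 / ni for ai, ni in zip(a, n)))
    return estimate, se, estimate / se, df_e


# Hypothetical data; a = (1, 0, -1) compares group 1 with group 3.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
estimate, se, t_stat, df_e = linear_combination(groups, [1, 0, -1])
print(estimate, round(se, 4), round(t_stat, 4), df_e)
```

The t statistic is compared with a t distribution on `df_e` degrees of freedom, the error degrees of freedom of the one-way table.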

Page 14: ANOVA: fixed effects

Two-way ANOVA

Second generalization: more than one grouping variable.

Two-way ANOVA model: observations $(Y_{ijk})$, $1 \le i \le r$, $1 \le j \le m$, $1 \le k \le n_{ij}$: $r$ groups in the first grouping variable, $m$ groups in the second and $n_{ij}$ samples in the $(i, j)$-“cell”:

$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}, \quad \varepsilon_{ijk} \sim N(0, \sigma^2).$$

Again: just a regression model.

Main effects: $\alpha$, $\beta$.

Interaction effects $(\alpha\beta)$: “second derivatives”.

Page 15: ANOVA: fixed effects

Constraints on the parameters

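$$\sum_{i=1}^{r} \alpha_i = 0$$

$$\sum_{j=1}^{m} \beta_j = 0$$

$$\sum_{j=1}^{m} (\alpha\beta)_{ij} = 0, \quad 1 \le i \le r$$

$$\sum_{i=1}^{r} (\alpha\beta)_{ij} = 0, \quad 1 \le j \le m.$$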
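Under these constraints the parameters are determined by the $r \times m$ table of cell means $\mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}$. The sketch below (hypothetical cell means and function name) recovers them by averaging over rows and columns and then checks each constraint numerically.

```python
from statistics import mean


def effects_from_cell_means(cell):
    """Decompose an r x m table of cell means into mu, alpha, beta, (alpha beta)."""
    r, m = len(cell), len(cell[0])
    mu = mean([mean(row) for row in cell])
    alpha = [mean(cell[i]) - mu for i in range(r)]
    beta = [mean([cell[i][j] for i in range(r)]) - mu for j in range(m)]
    ab = [[cell[i][j] - mu - alpha[i] - beta[j] for j in range(m)]
          for i in range(r)]
    return mu, alpha, beta, ab


# Hypothetical cell means for r = 2 rows and m = 3 columns.
cell = [[2.0, 3.0, 4.0],
        [3.0, 6.0, 9.0]]
mu, alpha, beta, ab = effects_from_cell_means(cell)

print(mu)     # 4.5
print(alpha)  # [-1.5, 1.5]
print(beta)   # [-2.0, 0.0, 2.0]
print(ab)     # [[1.0, 0.0, -1.0], [-1.0, 0.0, 1.0]]

# Each constraint holds: main effects and every row/column of ab sum to zero.
print(sum(alpha), sum(beta))
print([sum(row) for row in ab], [sum(ab[i][j] for i in range(2)) for j in range(3)])
```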

Page 16: ANOVA: fixed effects

Fitting model

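Easy to fit:

$$\widehat{Y}_{ijk} = \overline{Y}_{ij\cdot} = \frac{1}{n_{ij}} \sum_{k=1}^{n_{ij}} Y_{ijk}.$$

Inference for combinations:

$$\operatorname{Var}\left(\sum_{i=1}^{r} \sum_{j=1}^{m} a_{ij} \overline{Y}_{ij\cdot}\right) = \sigma^2 \cdot \sum_{i=1}^{r} \sum_{j=1}^{m} \frac{a_{ij}^2}{n_{ij}}.$$

Usual t-tests, confidence intervals.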

Page 17: ANOVA: fixed effects

Questions of interest

Are there main effects for the grouping variables?

$$H_0 : \alpha_1 = \dots = \alpha_r = 0, \qquad H_0 : \beta_1 = \dots = \beta_m = 0.$$

Are there interaction effects?

$$H_0 : (\alpha\beta)_{ij} = 0, \quad 1 \le i \le r, \ 1 \le j \le m.$$

Page 18: ANOVA: fixed effects

ANOVA table: Two-way (assuming nij = n)

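$$
\begin{array}{ll}
\text{Term} & SS \\
\hline
A & SSA = nm \sum_{i=1}^{r} \left(\overline{Y}_{i\cdot\cdot} - \overline{Y}_{\cdot\cdot\cdot}\right)^2 \\
B & SSB = nr \sum_{j=1}^{m} \left(\overline{Y}_{\cdot j\cdot} - \overline{Y}_{\cdot\cdot\cdot}\right)^2 \\
AB & SSAB = n \sum_{i=1}^{r} \sum_{j=1}^{m} \left(\overline{Y}_{ij\cdot} - \overline{Y}_{i\cdot\cdot} - \overline{Y}_{\cdot j\cdot} + \overline{Y}_{\cdot\cdot\cdot}\right)^2 \\
\text{Error} & SSE = \sum_{i=1}^{r} \sum_{j=1}^{m} \sum_{k=1}^{n} \left(Y_{ijk} - \overline{Y}_{ij\cdot}\right)^2
\end{array}
$$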

Page 19: ANOVA: fixed effects

ANOVA table: Two-way (continued)

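$$
\begin{array}{lll}
SS & df & E(MS) \\
\hline
SSA & r - 1 & \sigma^2 + nm \dfrac{\sum_{i=1}^{r} \alpha_i^2}{r - 1} \\
SSB & m - 1 & \sigma^2 + nr \dfrac{\sum_{j=1}^{m} \beta_j^2}{m - 1} \\
SSAB & (m - 1)(r - 1) & \sigma^2 + n \dfrac{\sum_{i=1}^{r} \sum_{j=1}^{m} (\alpha\beta)_{ij}^2}{(r - 1)(m - 1)} \\
SSE & (n - 1)mr & \sigma^2
\end{array}
$$

Under $H_0 : (\alpha\beta)_{ij} = 0, \ \forall i, j$, the expected values of $MSAB$ and $MSE$ are both $\sigma^2$; use these for an F-test. Use

$$\frac{MSAB}{MSE} = \frac{SSAB / df_{AB}}{SSE / df_E} \sim F_{(m-1)(r-1), (n-1)mr}$$

to test $H_0$.

To test $H_0 : \alpha_i = 0, \ \forall i$, use

$$\frac{MSA}{MSE} = \frac{SSA / df_A}{SSE / df_E} \sim F_{r-1, (n-1)mr}.$$

To test $H_0 : \beta_j = 0, \ \forall j$, use

$$\frac{MSB}{MSE} = \frac{SSB / df_B}{SSE / df_E} \sim F_{m-1, (n-1)mr}.$$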
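The balanced two-way table can be computed the same way as the one-way table. This is a sketch on made-up data with $r = 2$, $m = 3$ and $n = 2$ observations per cell (all names and numbers are assumptions for illustration). Every F ratio uses $MSE$ from the full model as its denominator, and p-values are omitted because the standard library has no F distribution.

```python
from statistics import mean


def two_way_anova(cells):
    """Balanced two-way ANOVA; cells[i][j] holds the n observations of cell (i, j)."""
    r, m, n = len(cells), len(cells[0]), len(cells[0][0])
    cell = [[mean(cells[i][j]) for j in range(m)] for i in range(r)]
    row = [mean(cell[i]) for i in range(r)]                          # Ybar_i..
    col = [mean([cell[i][j] for i in range(r)]) for j in range(m)]   # Ybar_.j.
    grand = mean(row)                                                # Ybar_...

    ss = {
        "A": n * m * sum((row[i] - grand) ** 2 for i in range(r)),
        "B": n * r * sum((col[j] - grand) ** 2 for j in range(m)),
        "AB": n * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                      for i in range(r) for j in range(m)),
        "E": sum((y - cell[i][j]) ** 2
                 for i in range(r) for j in range(m) for y in cells[i][j]),
    }
    df = {"A": r - 1, "B": m - 1, "AB": (r - 1) * (m - 1), "E": (n - 1) * m * r}
    mse = ss["E"] / df["E"]
    f = {term: (ss[term] / df[term]) / mse for term in ("A", "B", "AB")}
    return ss, df, f


# Hypothetical data: 2 levels of factor A, 3 levels of factor B, n = 2.
cells = [[[1.0, 3.0], [2.0, 4.0], [3.0, 5.0]],
         [[2.0, 4.0], [5.0, 7.0], [8.0, 10.0]]]
ss, df, f = two_way_anova(cells)
print(ss)
print(df)
print(f)
```

For these numbers the four sums of squares are 27, 32, 8 and 12, which add up to the total sum of squares 79 about the overall mean 4.5.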

Page 20: ANOVA: fixed effects

Example: kidney failure

Time of stay in hospital depends on weight gain between treatments and duration of treatment.

Two levels of duration, three levels of weight gain.

Is there an interaction? Main effects?

Page 21: ANOVA: fixed effects

Caveats

Testing for main effects is NOT the same as usual.

R uses the SSE from the full model (including interactions) as the denominator.

This allows for interaction terms with no main effects.