Anova General

download Anova General

of 61

Transcript of Anova General

  • 8/12/2019 Anova General

    1/61

    Analysis of Variance

    Desci2 Lecture

  • 8/12/2019 Anova General

    2/61

    Learning Objectives

    In this chapter, you learn:

    The basic concepts of experimental design

    How to use one-way analysis of variance to testfor differences among the means of several

    populations (also referred to as groups in this

    chapter)

    When to use a randomized block design

    How to use two-way analysis of variance and the

    concept of interaction

  • 8/12/2019 Anova General

    3/61

    Chapter Overview

    Analysis of Variance (ANOVA)

    F-test

    Tukey-

    Kramer

    test

    One-WayANOVA Two-WayANOVA

    Interaction

    Effects

    RandomizedBlock Design

    Multiple

    Comparisons

  • 8/12/2019 Anova General

    4/61

    example

    A financial analyst studied the annual returns of

    three different categories of unit trust.

    She wanted to know whether the averageannual returns per category varied across unit

    trust categories.

    A random sample of unit trusts from each of

    three categories (labelled A, B and C) wereselected and their annual returns were

    recorded.

    The data is shown in Table below.

  • 8/12/2019 Anova General

    5/61

  • 8/12/2019 Anova General

    6/61

    Management Question

    Can the financial analyst conclude that the

    average annual returns from the threecategories of unit trusts are the same?

  • 8/12/2019 Anova General

    7/61

    General ANOVA Setting

    Investigator controls one or more independent

    variables

    Called factors(or treatment variables)

    Each factor contains two or more levels(or groups orcategories/classifications)

    Observe effects on the dependent variable

    Response to levels of independent variable

    Experimental design: the plan used to collect

    the data

  • 8/12/2019 Anova General

    8/61

    Completely Randomized Design

    Experimental units (subjects) are assigned

    randomly to treatments

    Subjects are assumed homogeneous Only one factor or independent variable

    With two or more treatment levels

    Analyzed by one-way analysis of variance

    (ANOVA)

  • 8/12/2019 Anova General

    9/61

    One-Way Analysis of Variance

    Evaluate the difference among the means of three

    or more groups

    Examples: Accident rates for 1st, 2nd, and 3rdshift

    Expected mileage for five brands of tires

    Assumptions

    Populations are normally distributed Populations have equal variances

    Samples are randomly and independently drawn

  • 8/12/2019 Anova General

    10/61

    Hypotheses of One-Way ANOVA

    All population means are equal

    i.e., no treatment effect (no variation in means among

    groups)

    At least one population mean is different

    i.e., there is a treatment effect

    Does not mean that all population means are different

    (some pairs may be the same)

    c3210 :H

    samethearemeanspopulationtheofallNot:H1

  • 8/12/2019 Anova General

    11/61

    One-Factor ANOVA

    All Means are the same:

    The Null Hypothesis is True

    (No Treatment Effect)

    c3210 :H

    sametheareallNot:H j1

    321

    Example:

    3 Groups

  • 8/12/2019 Anova General

    12/61

    One-Factor ANOVA

    At least one mean is different:

    The Null Hypothesis is NOT true

    (Treatment Effect is present)

    c3210 :H sametheareallNot:H j1

    321 321

    or

    (continued)

  • 8/12/2019 Anova General

    13/61

    Partitioning the Variation

    Total variation can be split into two parts:

    SST = Total Sum of Squares

    (Total variation)

    SSA = Sum of Squares Among Groups

    (Among-group variation)

    SSW = Sum of Squares Within Groups

    (Within-group variation)

    SST = SSA + SSW

  • 8/12/2019 Anova General

    14/61

    Partitioning the Variation

    Total Variation= the aggregate dispersion of the individualdata values across the various factor levels (SST)

    Within-Group Variation= dispersion that exists among

    the data values within a particular factor level (SSW)

    Among-Group Variation= dispersion between the factor

    sample means (SSA)

    SST = SSA + SSW

    (continued)

  • 8/12/2019 Anova General

    15/61

    Partition of Total Variation

    Variation Due toFactor (SSA)

    Variation Due to RandomSampling (SSW)

    Total Variation (SST)

    Commonly referred to as: Sum of Squares Within

    Sum of Squares Error

    Sum of Squares Unexplained

    Within-Group Variation

    Commonly referred to as: Sum of Squares Between

    Sum of Squares Among

    Sum of Squares Explained

    Among Groups Variation

    = +

    d.f. = n1

    d.f. = c1 d.f. = nc

  • 8/12/2019 Anova General

    16/61

    Total Sum of Squares

    c

    1j

    n

    1i

    2

    ij

    j

    )XX(SST

    Where:

    SST = Total sum of squares

    c = number of groups (levels or treatments)

    nj= number of observations in group j

    Xij= ithobservation from group j

    X = grand mean (mean of all data values)

    SST = SSA + SSW

  • 8/12/2019 Anova General

    17/61

    Total Variation

    Group 1 Group 2 Group 3

    Response, X

    X

    2

    cn

    2

    12

    2

    11 )XX(...)XX()XX(SST c

    (continued)

  • 8/12/2019 Anova General

    18/61

    Among-Group Variation

    Where:

    SSA = Sum of squares among groups

    c = number of groups

    nj= sample size from group j

    Xj= sample mean from group j

    X = grand mean (mean of all data values)

    2j

    c

    1j

    j )XX(nSSA

    SST = SSA + SSW

  • 8/12/2019 Anova General

    19/61

    Among-Group Variation

    Variation Due toDifferences Among Groups

    i j

    2j

    c

    1j

    j )XX(nSSA

    1cSSAMSA

    Mean Square Among =

    SSA/degrees of freedom

    (continued)

  • 8/12/2019 Anova General

    20/61

    Among-Group Variation

    Group 1 Group 2 Group 3

    Response, X

    X1

    X 2X

    3X

    22

    22

    2

    11 )xx(n...)xx(n)xx(nSSA cc

    (continued)

  • 8/12/2019 Anova General

    21/61

    Within-Group Variation

    Where:

    SSW = Sum of squares within groups

    c = number of groupsnj= sample size from group j

    Xj= sample mean from group j

    Xij

    = ithobservation in group j

    2jij

    n

    1i

    c

    1j

    )XX(SSWj

    SST = SSA + SSW

  • 8/12/2019 Anova General

    22/61

    Within-Group Variation

    Summing the variation

    within each group and then

    adding over all groups cnSSWMSW

    Mean Square Within =

    SSW/degrees of freedom

    2jij

    n

    1i

    c

    1j

    )XX(SSWj

    (continued)

    j

  • 8/12/2019 Anova General

    23/61

    Within-Group Variation

    Group 1 Group 2 Group 3

    Response, X

    1X

    2X

    3X

    2ccn

    2212

    2111 )XX(...)XX()Xx(SSW c

    (continued)

  • 8/12/2019 Anova General

    24/61

    Obtaining the Mean Squares

    cn

    SSWMSW

    1c

    SSAMSA

    1n

    SSTMST

  • 8/12/2019 Anova General

    25/61

    One-Way ANOVA Table

    Source of

    VariationdfSS MS

    (Variance)

    Among

    Groups SSA MSA =

    Within

    Groupsn - cSSW MSW =

    Total n - 1SST =SSA+SSW

    c - 1MSA

    MSW

    F ratio

    c = number of groups

    n = sum of the sample sizes from all groups

    df = degrees of freedom

    SSA

    c - 1

    SSW

    n - c

    F =

  • 8/12/2019 Anova General

    26/61

    One-Way ANOVAF Test Statistic

    Test statistic

    MSAis mean squares amonggroups

    MSWis mean squares withingroups

    Degrees of freedom

    df1= c1 (c = number of groups)

    df2= nc (n = sum of sample sizes from all populations)

    MSWMSAF

    H0: 1= 2= = c

    H1: At least two population means are different

  • 8/12/2019 Anova General

    27/61

    Interpreting One-Way ANOVAF Statistic

    The F statistic is the ratio of the amongestimate of variance and the withinestimateof variance

    The ratio must always be positive df1= c-1 will typically be small

    df2= n- c will typically be large

    Decision Rule:

    Reject H0if F > FU,otherwise do notreject H0

    0

    = .05

    Reject H0Do notreject H0

    FU

  • 8/12/2019 Anova General

    28/61

    One-Way ANOVAF Test Example

    You want to see if three

    different golf clubs yield

    different distances. You

    randomly select fivemeasurements from trials on

    an automated driving

    machine for each club. At the

    0.05 significance level, isthere a difference in mean

    distance?

    Club 1 Club 2 Club 3

    254 234 200

    263 218 222

    241 235 197237 227 206

    251 216 204

  • 8/12/2019 Anova General

    29/61

    One-Way ANOVA Example:Scatter Diagram

    270

    260

    250

    240

    230

    220

    210200

    190

    Distance

    1X

    2X

    3X

    X

    227.0x

    205.8x226.0x249.2x 321

    Club 1 Club 2 Club 3

    254 234 200

    263 218 222

    241 235 197237 227 206

    251 216 204

    Club1 2 3

  • 8/12/2019 Anova General

    30/61

    One-Way ANOVA ExampleComputations

    Club 1 Club 2 Club 3

    254 234 200

    263 218 222

    241 235 197

    237 227 206

    251 216 204

    X1= 249.2

    X2= 226.0

    X3= 205.8

    X = 227.0

    n1= 5

    n2= 5

    n3= 5

    n = 15

    c = 3

    SSA = 5 (249.2227)2+ 5 (226227)2+ 5 (205.8227)2 = 4716.4

    SSW = (254249.2)2+ (263249.2)2++ (204 205.8)2= 1119.6

    MSA = 4716.4 / (3-1) = 2358.2

    MSW = 1119.6 / (15-3) = 93.325.275

    93.3

    2358.2F

  • 8/12/2019 Anova General

    31/61

    F= 25.275

    One-Way ANOVA ExampleSolution

    H0: 1= 2= 3

    H1: jnot all equal

    = 0.05

    df1= 2 df2= 12

    Test Statistic:

    Decision:

    Conclusion:

    Reject H0at = 0.05

    There is evidence that

    at least one j differs

    from the rest

    0

    = .05

    FU

    = 3.89

    Reject H0Do notreject H0

    25.275

    93.3

    2358.2

    MSW

    MSAF

    Critical

    Value:

    FU

    = 3.89

  • 8/12/2019 Anova General

    32/61

    SUMMARY

    Groups Count Sum Average Variance

    Club 1 5 1246 249.2 108.2

    Club 2 5 1130 226 77.5Club 3 5 1029 205.8 94.2

    ANOVA

    Source of

    VariationSS df MS F P-value F crit

    BetweenGroups

    4716.4 2 2358.2 25.275 4.99E-05 3.89

    Within

    Groups1119.6 12 93.3

    Total 5836.0 14

    One-Way ANOVAExcel Output

    EXCEL: tools | data analysis | ANOVA: single factor

  • 8/12/2019 Anova General

    33/61

    The Tukey-Kramer Procedure

    Tells whichpopulation means are significantlydifferent e.g.: 1= 23

    Done after rejection of equal means in ANOVA

    Allows pair-wise comparisons

    Compare absolute mean differences with criticalrange

    x1 = 2 3

  • 8/12/2019 Anova General

    34/61

    Tukey-Kramer Critical Range

    where:

    QU = Value from Studentized Range Distribution

    with c and n - c degrees of freedom for

    the desired level of (see appendix E.9 table)

    MSW = Mean Square Within

    njand nj= Sample sizes from groups j and j

    j'j

    U

    n

    1

    n

    1

    2

    MSWQRangeCritical

  • 8/12/2019 Anova General

    35/61

    The Tukey-Kramer Procedure:Example

    1. Compute absolute meandifferences:Club 1 Club 2 Club 3

    254 234 200

    263 218 222

    241 235 197

    237 227 206

    251 216 204 20.2205.8226.0xx

    43.4205.8249.2xx

    23.2226.0249.2xx

    32

    31

    21

    2. Find the QUvalue from the table in appendix E.10 with

    c = 3 and (nc) = (153) = 12 degrees of freedomfor the desired level of (= 0.05 used here):

    3.77QU

  • 8/12/2019 Anova General

    36/61

    The Tukey-Kramer Procedure:Example

    5. All of the absolute mean differencesare greater than critical range.Therefore there is a significant

    difference between each pair ofmeans at 5% level of significance.Thus, with 95% confidence we can concludethat the mean distance for club 1 is greaterthan club 2 and 3, and club 2 is greater than

    club 3.

    16.2855

    1

    5

    1

    2

    93.33.77

    n

    1

    n

    1

    2

    MSWQRangeCritical

    j'j

    U

    3. Compute Critical Range:

    20.2xx

    43.4xx

    23.2xx

    32

    31

    21

    4. Compare:

    (continued)

  • 8/12/2019 Anova General

    37/61

    The Randomized Block Design

    Like One-Way ANOVA, we test for equal

    population means (for different factor levels, for

    example)...

    ...but we want to control for possible variation

    from a second factor (with two or more levels)

    Levels of the secondary factor are called blocks

  • 8/12/2019 Anova General

    38/61

    Partitioning the Variation

    Total variation can now be split into three parts:

    SST = Total variation

    SSA = Among-Group variation

    SSBL = Among-Block variation

    SSE = Random variation

    SST = SSA + SSBL + SSE

  • 8/12/2019 Anova General

    39/61

    Sum of Squares for Blocking

    Where:

    c = number of groups

    r = number of blocks

    Xi.= mean of all values in block i

    X = grand mean (mean of all data values)

    r

    1i

    2

    i. )XX(cSSBL

    SST = SSA + SSBL + SSE

  • 8/12/2019 Anova General

    40/61

    Partitioning the Variation

    Total variation can now be split into three parts:

    SST and SSA are

    computed as they werein One-Way ANOVA

    SST = SSA + SSBL + SSE

    SSE = SST(SSA + SSBL)

  • 8/12/2019 Anova General

    41/61

    Mean Squares

    1c

    SSAgroupsamongsquareMeanMSA

    1r

    SSBLblockingsquareMeanMSBL

    )1)(1(

    cr

    SSEMSE errorsquareMean

  • 8/12/2019 Anova General

    42/61

    Randomized Block ANOVA Table

    Source of

    VariationdfSS MS

    Among

    BlocksSSBL MSBL

    Error

    (r1)(c-1)SSE MSE

    Total rc - 1SST

    r - 1 MSBL

    MSE

    F ratio

    c = number of populations rc = sum of the sample sizes from all populations

    r = number of blocks df = degrees of freedom

    Among

    Treatments SSA c - 1 MSAMSA

    MSE

  • 8/12/2019 Anova General

    43/61

    Blocking Test

    Blocking test: df1= r1

    df2= (r1)(c1)

    MSBL

    MSE

    ...:H 3.2.1.0

    equalaremeansblockallNot:H1

    F =

    Reject H0 if F > FU

  • 8/12/2019 Anova General

    44/61

    Main Factor test: df1= c1

    df2= (r1)(c1)

    MSA

    MSE

    c..3.2.10 ...:H

    equalaremeanspopulationallNot:H1

    F =

    Reject H0 if F > FU

    Main Factor Test

  • 8/12/2019 Anova General

    45/61

    The Tukey Procedure

    To test whichpopulation means are significantlydifferent e.g.: 1= 23 Done after rejection of equal means in randomized

    block ANOVA design

    Allows pair-wise comparisons Compare absolute mean differences with critical

    range

    x= 1 2 3

  • 8/12/2019 Anova General

    46/61

    etc...

    xx

    xx

    xx

    .3.2

    .3.1

    .2.1

    The Tukey Procedure(continued)

    r

    MSERangeCritical uQ

    If the absolute mean differenceis greater than the critical rangethen there is a significantdifference between that pair ofmeans at the chosen level ofsignificance.

    Compare:?RangeCriticalxxIs .j'.j

    F t i l D i

  • 8/12/2019 Anova General

    47/61

    Factorial Design:

    Two-Way ANOVA

    Examines the effect of

    Two factors of intereston the dependent

    variable

    e.g., Percent carbonation and line speed on soft drinkbottling process

    Interaction between the different levelsof these

    two factors

    e.g., Does the effect of one particular carbonation

    level depend on which level the line speed is set?

  • 8/12/2019 Anova General

    48/61

    Two-Way ANOVA

    Assumptions

    Populations are normally distributed

    Populations have equal variances

    Independent random samples are

    drawn

    (continued)

    T W ANOVA

  • 8/12/2019 Anova General

    49/61

    Two-Way ANOVASources of Variation

    Two Factors of interest: A and B

    r = number of levels of factor A

    c = number of levels of factor B

    n= number of replications for each cell

    n = total number of observations in all cells

    (n = rcn)

    Xijk= value of the kthobservation of level i of

    factor A and level j of factor B

    T W ANOVA

  • 8/12/2019 Anova General

    50/61

    Two-Way ANOVASources of Variation

    SST

    Total Variation

    SSAFactor A Variation

    SSBFactor B Variation

    SSAB

    Variation due to interactionbetween A and B

    SSERandom variation (Error)

    Degrees of

    Freedom:

    r1

    c1

    (r1)(c1)

    rc(n 1)

    n - 1

    SST = SSA + SSB + SSAB + SSE

    (continued)

  • 8/12/2019 Anova General

    51/61

    Two Factor ANOVA Equations

    r

    1i

    c

    1j

    n

    1k

    2

    ijk )XX(SST

    2r

    1i

    ..i )XX(ncSSA

    2c

    1j

    .j. )XX(nrSSB

    Total Variation:

    Factor A Variation:

    Factor B Variation:

  • 8/12/2019 Anova General

    52/61

    Two Factor ANOVA Equations

    2

    r

    1i

    c

    1j

    .j.i..ij . )XXXX(nSSAB

    r

    1i

    c

    1j

    n

    1k

    2.ijijk )XX(SSE

    Interaction Variation:

    Sum of Squares Error:

    (continued)

  • 8/12/2019 Anova General

    53/61

    Two Factor ANOVA Equations

    where:MeanGrand

    nrc

    X

    X

    r

    1i

    c

    1j

    n

    1k

    ijk

    r)...,2,1,(iAfactorofleveliofMeannc

    X

    X th

    c

    1j

    n

    1k

    ijk

    ..i

    c)...,2,1,(jBfactorofleveljofMeannr

    X

    X

    th

    r

    1i

    n

    1k

    ijk

    .j.

    ijcellofMeann

    XX

    n

    1k

    ijk.ij

    r = number of levels of factor A

    c = number of levels of factor B

    n= number of replications in each cell

    (continued)

  • 8/12/2019 Anova General

    54/61

    Mean Square Calculations

    1r

    SSAAfactorsquareMeanMSA

    1c

    SSBBfactorsquareMeanMSB

    )1c)(1r(

    SSABninteractiosquareMeanMSAB

    )1'n(rc

    SSEerrorsquareMeanMSE

    T W ANOVA

  • 8/12/2019 Anova General

    55/61

    Two-Way ANOVA:The F Test Statistic

    F Test for Factor B Effect

    F Test for Interaction Effect

    H0: 1.. = 2.. = 3..=

    H1: Not all i..are equal

    H0: the interaction of A and B is

    equal to zero

    H1: interaction of A and B is not

    zero

    F Test for Factor A Effect

    H0: .1. = .2. = .3.=

    H1: Not all .j.are equal

    Reject H0

    if F > FUMSE

    MSAF

    MSE

    MSBF

    MSE

    MSABF

    Reject H0

    if F > FU

    Reject H0

    if F > FU

    T o Wa ANOVA

  • 8/12/2019 Anova General

    56/61

    Two-Way ANOVASummary Table

    Source of

    Variation

    Sum of

    Squares

    Degrees of

    Freedom

    Mean

    Squares

    F

    Statistic

    Factor A SSA r1MSA

    = SSA/(r1)

    MSA

    MSE

    Factor B SSB c1MSB

    = SSB/(c1)

    MSB

    MSE

    AB

    (Interaction)

    SSAB (r1)(c1)MSAB

    = SSAB/ (r1)(c1)

    MSAB

    MSE

    Error SSE rc(n1)MSE =

    SSE/rc(n1)

    Total SST n1

    Features of Two Way ANOVA

  • 8/12/2019 Anova General

    57/61

    Features of Two-Way ANOVAFTest

    Degrees of freedom always add up

    n-1 = rc(n-1) + (r-1) + (c-1) + (r-1)(c-1)

    Total = error + factor A + factor B + interaction

    The denominator of the FTest is always the

    same but the numerator is different

    The sums of squares always add up

    SST = SSE + SSA + SSB + SSAB

    Total = error + factor A + factor B + interaction

    Examples:

  • 8/12/2019 Anova General

    58/61

    Examples:Interaction vs. No Interaction

    No interaction:

    Factor B Level 1

    Factor B Level 3

    Factor B Level 2

    Factor A Levels

    Factor B Level 1

    Factor B Level 3

    Factor B Level 2

    Factor A Levels

    MeanResponse

    Me

    anResponse

    Interaction is

    present:

    Multiple Comparisons:

  • 8/12/2019 Anova General

    59/61

    Multiple Comparisons:

    The Tukey Procedure

    Unless there is a significant interaction, you

    can determine the levels that are significantly

    different using the Tukey procedure

    Consider all absolute mean differences and

    compare to the calculated critical range

    Example: Absolute differences

    for factor A, assuming three factors:

    3..2..

    3..1..

    2..1..

    XX

    XX

    XX

    Multiple Comparisons:

  • 8/12/2019 Anova General

    60/61

    Multiple Comparisons:

    The Tukey Procedure

    Critical Range for Factor A:

    (where Quis from Table E.10 with r and rc(n1) d.f.)

    Critical Range for Factor B:

    (where Quis from Table E.10 with c and rc(n1) d.f.)

    n'c

    MSERangeCritical UQ

    n'r

    MSERangeCritical UQ

  • 8/12/2019 Anova General

    61/61

    Chapter Summary

    Described one-way analysis of variance

    The logic of ANOVA

    ANOVA assumptions

    F test for difference in c means

    The Tukey-Kramer procedure for multiple comparisons

    Considered the Randomized Block Design

    Treatment and Block Effects

    Multiple Comparisons: Tukey Procedure

    Described two-way analysis of variance

    Examined effects of multiplefactors

    Examined interaction between factors