2.16 Analysis of Variance (ANOVA) Rev DD 20100604


  • 8/6/2019 2.16 Analysis of Variance (ANOVA) Rev DD 20100604


    2.16 Analysis of Variance (ANOVA)

    Six Sigma Black Belt and Green Belt

    Week 2

    Revised 4th June 2010


    2010-06-04 SKF Group Slide 1 SKF (Group Six Sigma) 2.16 Analysis of Variance (ANOVA)

    Objectives

    To introduce ANOVA hypothesis testing:

      a graphical method for analysing differences between means obtained from two or more samples

      Analysis of Variance (ANOVA) methods for analysing the differences between means

    To understand the relationship between "within" subgroup estimates of variation and "between" subgroup estimates of variation

    To understand how to measure effect size

    To practise with examples

    To introduce the post hoc test


    SKF Six Sigma roadmap

    Six Sigma methodology and roadmap for common tool usage


    Validating key process inputs and outputs with ANOVA

    $Y = f(X_1, X_2, X_3, \ldots, X_n)$

    By knowing and controlling the Xs, we reduce the variability in Y.

    We validate Xs and Ys with hypothesis testing.

    Data types: variable (continuous) and variables with categories (attribute).


    Method

    Uses sums of squared differences, just like a standard deviation, to evaluate the total variability of the system.

    Calculates a variance (a squared "standard deviation") for each source of variation and subtracts that source's variability from the total; what remains is attributed to error.


    ANOVA graphical

    Between subgroup variation (signal)

    Within subgroup variation (error)


    Degrees of freedom: introduction

    Degrees of freedom (df) is the number of independent comparisons available to estimate a specific statistic.

    In ANOVA, the degrees of freedom are based on the total number of responses and the number of levels at which factors are tested.

    What is the minimum number of comparisons it would take to determine which person is the shortest?


    The degrees of freedom concept

    Example:

    Consider a sample of n = 3 scores with a mean of X-bar = 5. The first score in the sample can be selected without any restriction; the scores are independent of each other and can take any value.

    For this demonstration, assume X = 2 is obtained for the first score and X = 9 for the second.

    At this point, however, the third score is already determined.


    The degrees of freedom concept

    In this case the third score must be X = 4.

    The reason is that the entire sample of n = 3 scores has a mean of X-bar = 5, so the scores must add up to ΣX = 3 × 5 = 15. The first two scores add up to 11 (= 9 + 2), so the third score must be X = 15 - 11 = 4.

    In this case the first two of the three scores were free to take any value, but the final score depended on the values chosen for the first two. With a sample of n scores, the first n - 1 scores are free to vary, but the final score is determined.

    As a result, the sample is said to have n - 1 degrees of freedom (df). The degrees of freedom give the number of scores in the sample that are independent and free to vary.
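    To make the "last score is determined" argument concrete, here is a minimal Python sketch. The function name last_score is our own, chosen for illustration: given the sample mean and the n - 1 freely chosen scores, the remaining score follows from the fixed total.

```python
def last_score(free_scores, mean, n):
    """Return the only value the n-th score can take, given the other n - 1 scores and the sample mean."""
    if len(free_scores) != n - 1:
        raise ValueError("exactly n - 1 scores are free to vary")
    required_total = mean * n  # the mean fixes the total: sum(X) = n * X-bar
    return required_total - sum(free_scores)


# The example above: n = 3, X-bar = 5, first two scores 2 and 9
third = last_score([2, 9], mean=5, n=3)
print(third)  # 4
```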


    In other words ...

    Degrees of freedom are "statistical cash":

    We "earn" a degree of freedom for every data point we collect.

    We "spend" a degree of freedom for every parameter we estimate.

    Degrees of freedom (within groups):

    Earn a degree of freedom for each observation within each group.

    Spend one degree of freedom to calculate the average of each group.

    $df_W = n - 1$ for each treatment, where $n$ = sample size per treatment; summed over $k$ treatments this gives $df_W = k(n - 1)$.

    Degrees of freedom (between groups):

    Earn a degree of freedom for each group.

    Spend one degree of freedom to calculate the overall average.

    $df_B = k - 1$, where $k$ = number of group averages (number of treatments).
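    A small Python sketch of this degrees-of-freedom bookkeeping, assuming a balanced design (the same number n of observations in each of the k groups). The function name is hypothetical, used only for illustration.

```python
def anova_degrees_of_freedom(k, n):
    """Return (df_between, df_within, df_total) for k groups of n observations each."""
    df_between = k - 1       # earn one df per group average, spend one on the overall average
    df_within = k * (n - 1)  # earn n per group, spend one on each group average
    df_total = k * n - 1     # earn N = k * n observations, spend one on the overall average
    return df_between, df_within, df_total


# Seal-life example in this section: 3 shifts, 5 seals per shift
print(anova_degrees_of_freedom(k=3, n=5))  # (2, 12, 14)
```

    Note that df_between + df_within = df_total: degrees of freedom partition in the same way as the sums of squares.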


    Degrees of freedom and ANOVA

    The n - 1 degrees of freedom for a sample are the same n - 1 that is used in the formulas for sample variance and sample standard deviation.

    Remember, variance is defined as the mean squared deviation. Like any mean, it is computed by finding a sum and dividing by the number of scores:

    Mean = Sum / Number of scores

    To calculate the sample variance (mean squared deviation), we find the sum of the squared deviations (SS) and divide by the number of scores that are free to vary. This number is n - 1 = df:

    $s^2 = \frac{SS}{df} = \frac{\sum_i (X_i - \bar{X})^2}{n - 1} = \frac{\text{sum of squared deviations}}{\text{number of scores free to vary}}$
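    The formula can be checked numerically. This Python sketch (the helper name is ours) computes s² = SS/df for the n = 3 sample from the degrees-of-freedom example and compares it with the standard library's statistics.variance, which also divides by n - 1.

```python
import statistics


def sample_variance(values):
    """Sample variance s^2 = SS / df, with df = n - 1 (the scores free to vary)."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    df = n - 1
    return ss / df


scores = [2, 9, 4]  # the n = 3 sample with X-bar = 5
s2 = sample_variance(scores)
print(s2)  # 13.0
print(statistics.variance(scores))  # standard library, also divides by n - 1
```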


    Calculate the F statistic: example of seal life by shift (1)

    Data collection:

             Shift 1   Shift 2   Shift 3
             25.40     23.40     20.00
             26.31     21.80     22.20
             24.10     23.50     19.75
             23.74     22.75     20.60
             25.10     21.60     20.40
    Mean     24.93     22.61     20.59
    df       (5-1)     (5-1)     (5-1)

    Overall average = 22.71

    $df_{within} = (4) + (4) + (4) = 12$ (the total degrees of freedom are $15 - 1 = 14$)


    Calculate the F statistic: example of seal life by shift (2)

    [Figure: the three shift means plotted against the overall average. Mean shift 1 = 24.93, mean shift 2 = 22.61, mean shift 3 = 20.59; overall average = 22.71]


    Calculate the F statistic: example of seal life by shift (3)

    $s^2_{between} = \frac{SS_{between}}{df_{between}} = \frac{47.164}{3 - 1} = 23.582$   (3 = number of shifts)

    $s^2_{within} = \frac{SS_{within}}{df_{within}} = \frac{11.0532}{15 - 3} = 0.9211$   (15 = total data available)

    $F_{df_1, df_2} = F_{2,12} = \frac{s^2_{between}}{s^2_{within}} = \frac{23.582}{0.9211} = 25.60$

    The F-distribution depends on two sets of degrees of freedom, the df of each variance: $df_1$ for the between variance and $df_2$ for the within variance.
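    The arithmetic on this slide can be reproduced from the raw seal-life data with a short Python sketch. The helper one_way_f is our own illustration, not a library function; it computes each group mean, the between and within sums of squares, their degrees of freedom and the F-ratio.

```python
# Seal life by shift (data from the table in this section)
shifts = {
    "Shift 1": [25.40, 26.31, 24.10, 23.74, 25.10],
    "Shift 2": [23.40, 21.80, 23.50, 22.75, 21.60],
    "Shift 3": [20.00, 22.20, 19.75, 20.60, 20.40],
}


def one_way_f(groups):
    """Return (ss_between, ss_within, df_between, df_within, F) for a one-way layout."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Between: distance of each group mean from the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within: distance of each observation from its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio


ssb, ssw, dfb, dfw, f = one_way_f(list(shifts.values()))
print(f"SS_between={ssb:.3f}, SS_within={ssw:.4f}, F({dfb},{dfw})={f:.2f}")
```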


    What is the distribution of the F-ratio?

    This is the distribution of F-ratios that would occur if there were no difference in the group means.

    For example, say I'm willing to take a 5% chance of being wrong when I say there is more between than within variation.

    [Figure: F-distribution curve with F_critical at 5%. 5% of the total area lies to the right of this F value, F_crit. The shape of the curve changes as a function of the numerator df and the denominator df.]

    The 5% area represents the amount of risk I'm willing to take of being wrong when I say that I've found this factor to be a significant effect.

    A calculated F-ratio > F_crit means that, if the group means were really equal, a between variation this large would occur by chance alone less than 5% of the time.

    Remember: you choose the amount of risk to take, then find the corresponding F_critical.
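    The slides look up F_critical in a printed table. As a hedged alternative, here is a standard-library-only Python sketch that evaluates the upper-tail probability P(F > f) through the regularized incomplete beta function (a continued-fraction evaluation in the style of the well-known Numerical Recipes betacf routine) and finds F_critical by bisection. All function names are our own; in practice a statistics package would normally be used instead.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for a, b > 0."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # The continued fraction converges fastest for x < (a + 1) / (a + b + 2);
    # otherwise use the symmetry I_x(a, b) = 1 - I_(1-x)(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(f, df1, df2):
    """P(F > f) for F ~ F(df1, df2): the p-value of an observed F-ratio."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def f_critical(alpha, df1, df2):
    """F value with upper-tail area alpha, found by bisection (the tail area falls as f grows)."""
    lo, hi = 0.0, 1.0
    while f_upper_tail(hi, df1, df2) > alpha:  # widen the bracket until it contains F_crit
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f_upper_tail(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Compare with the printed F-table (numerator df1, denominator df2)
print(round(f_critical(0.05, 2, 10), 2))  # table value: 4.10
print(round(f_critical(0.05, 3, 10), 2))  # table value: 3.71
# Upper-tail probability (p-value) for the seal-life F-ratio, F(2, 12) = 25.60
print(f_upper_tail(25.60, 2, 12))
```

    If SciPy is available, scipy.stats.f.sf(F, df1, df2) and scipy.stats.f.ppf(1 - alpha, df1, df2) give the same two quantities.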


    F-distribution table: probability points of the F-distribution

    Columns: degrees of freedom for the numerator (df1). Rows: degrees of freedom for the denominator (df2).
    For each df2, the first row is α = 0.05 and the second row is α = 0.01.

    df2  α       1      2      3      4      5      6      7      8      9     10     15     20
     1   0.05  161.4  199.5  215.7  224.6  230.2  234.0  236.8  238.9  240.5  241.9  245.9  248.0
         0.01   4052   5000   5403   5625   5764   5859   5928   5981   6022   6056   6157   6209
     2   0.05  18.51  19.00  19.16  19.25  19.30  19.33  19.35  19.37  19.38  19.40  19.43  19.45
         0.01  98.50  99.00  99.17  99.25  99.30  99.33  99.36  99.37  99.39  99.40  99.43  99.45
     3   0.05  10.13   9.55   9.28   9.12   9.01   8.94   8.89   8.85   8.81   8.79   8.70   8.66
         0.01  34.12  30.82  29.46  28.71  28.24  27.91  27.67  27.49  27.35  27.23  26.87  26.69
     4   0.05   7.71   6.94   6.59   6.39   6.26   6.16   6.09   6.04   6.00   5.96   5.86   5.80
         0.01  21.20  18.00  16.69  15.98  15.52  15.21  14.98  14.80  14.66  14.55  14.20  14.02
     5   0.05   6.61   5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.77   4.74   4.62   4.56
         0.01  16.26  13.27  12.06  11.39  10.97  10.67  10.46  10.29  10.16  10.05   9.72   9.55
     6   0.05   5.99   5.14   4.76   4.53   4.39   4.28   4.21   4.15   4.10   4.06   3.94   3.87
         0.01  13.75  10.92   9.78   9.15   8.75   8.47   8.26   8.10   7.98   7.87   7.56   7.40
     7   0.05   5.59   4.74   4.35   4.12   3.97   3.87   3.79   3.73   3.68   3.64   3.51   3.44
         0.01  12.25   9.55   8.45   7.85   7.46   7.19   6.99   6.84   6.72   6.62   6.31   6.16
     8   0.05   5.32   4.46   4.07   3.84   3.69   3.58   3.50   3.44   3.39   3.35   3.22   3.15
         0.01  11.26   8.65   7.59   7.01   6.63   6.37   6.18   6.03   5.91   5.81   5.52   5.36
     9   0.05   5.12   4.26   3.86   3.63   3.48   3.37   3.29   3.23   3.18   3.14   3.01   2.94
         0.01  10.56   8.02   6.99   6.42   6.06   5.80   5.61   5.47   5.35   5.26   4.96   4.81
    10   0.05   4.96   4.10   3.71   3.48   3.33   3.22   3.14   3.07   3.02   2.98   2.85   2.77
         0.01  10.04   7.56   6.55   5.99   5.64   5.39   5.20   5.06   4.94   4.85   4.56   4.41
    15   0.05   4.54   3.68   3.29   3.06   2.90   2.79   2.71   2.64   2.59   2.54   2.40   2.33
         0.01   8.68   6.36   5.42   4.89   4.56   4.32   4.14   4.00   3.89   3.80   3.52   3.37
    20   0.05   4.35   3.49   3.10   2.87   2.71   2.60   2.51   2.45   2.39   2.35   2.20   2.12
         0.01   8.10   5.85   4.94   4.43   4.10   3.87   3.70   3.56   3.46   3.37   3.09   2.94


    Mean square (MS)

    In ANOVA, we use the term mean square, or simply MS, instead of the term variance.

    Remember that variance is defined as the mean of the squared deviations. In the same way that we use SS to stand for the sum of the squared deviations, we now use MS to stand for the mean of the squared deviations. For the final F-ratio we need an $MS_{between}$ treatments for the numerator and an $MS_{within}$ treatments for the denominator:

    $F\text{-ratio} = \frac{MS_{between}}{MS_{within}}$

    $MS_{between} = SS_{between} / df_{between}$

    $MS_{within} = SS_{within} / df_{within}$


    Partition of variance and F-ratio: overview

    Total variability is split into two parts:

    Between-treatments variance (signal). Measures differences due to treatment effects and chance.
    $\text{Variance } (MS_{between}) = SS_{between} / df_{between}$

    Within-treatments variance (error). Measures differences due to chance.
    $\text{Variance } (MS_{within}) = SS_{within} / df_{within}$

    $F\text{-ratio} = MS_{between} / MS_{within}$
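    A quick numerical check of the partition, as a Python sketch using the seal-life-by-shift data from the worked example in this section: the total sum of squares equals the between (signal) part plus the within (error) part. Variable names are ours.

```python
# Seal life by shift, from the worked example in this section
groups = [
    [25.40, 26.31, 24.10, 23.74, 25.10],  # Shift 1
    [23.40, 21.80, 23.50, 22.75, 21.60],  # Shift 2
    [20.00, 22.20, 19.75, 20.60, 20.40],  # Shift 3
]

values = [x for g in groups for x in g]
grand_mean = sum(values) / len(values)
group_means = [sum(g) / len(g) for g in groups]

# Total variability: every observation against the grand mean
ss_total = sum((x - grand_mean) ** 2 for x in values)
# Signal: group means against the grand mean, weighted by group size
ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
# Error: observations against their own group mean
ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

print(f"SS_total               = {ss_total:.4f}")
print(f"SS_between + SS_within = {ss_between + ss_within:.4f}")
```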


    The p-value and ANOVA

    Hypotheses:

    H0: There are no differences between subgroup means.

    HA: There are differences between subgroup means.

    Low p-values suggest that there ARE differences between subgroup means.

    Tip: if the p-value is low, H0 must go!


    The R-squared value and ANOVA

    To indicate how large the effect actually is, we check not only the p-value but also the R² value when deciding whether the result is robust.

    For Analysis of Variance, the simplest and most direct way to measure effect size is to compute R², the percentage of variance accounted for.

    In simpler terms, R² measures how much of the difference between scores is accounted for by the differences between treatments. $SS_{between}$ measures the variability accounted for by the treatment differences, and $SS_{total}$ measures the total variability:

    $R^2 = \frac{SS_{between}}{SS_{total}}$
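    R² can be read straight off an ANOVA table. This Python sketch (function names are ours) also computes the adjusted R², 1 - MS_error/MS_total, which Minitab prints as R-Sq(adj). The sums of squares are those of the one-way tire-brand example in this section; the exact values 30.6875 and 80.9375 appear in the two-way output, while the one-way table rounds them to 30.69 and 80.94.

```python
def r_squared(ss_between, ss_total):
    """Share of the total variability accounted for by the factor (treatment)."""
    return ss_between / ss_total


def r_squared_adj(ss_error, df_error, ss_total, df_total):
    """Adjusted R^2 = 1 - MS_error / MS_total (penalises the df spent on the factor)."""
    return 1.0 - (ss_error / df_error) / (ss_total / df_total)


# One-way tire-brand example: SS for Brand, Error and Total, plus the error and total df
ss_brand, ss_error, ss_total = 30.6875, 50.25, 80.9375
df_error, df_total = 12, 15

print(f"R-Sq      = {r_squared(ss_brand, ss_total):.2%}")  # 37.92%
print(f"R-Sq(adj) = {r_squared_adj(ss_error, df_error, ss_total, df_total):.2%}")  # 22.39%
```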


    The R-squared value and ANOVA: examples

    [Figure: two pie charts, A (R² = 90%) and B (R² = 50%), each splitting the sum of variance into the variance explained by the factor (treatment) and the error, i.e. the part of the variance not explained by the factor (the Xs)]

    Which model is more robust? A or B?


    ANOVA assumptions

    1. Normality

    2. Homogeneity of variance (equal variances)

    3. Independence of error


    Independence of error

    Errors should be independent for each value and over time.

    If they are not, do not assume the test is valid.

    Identify why the error is not independent and correct it.

    We use control charts to check stability and detect special causes.


    Normality

    The values in each group are normally distributed.

    Like the t-test, the ANOVA method is robust against departures from normality, especially with large sample sizes. However, non-normal distributions where normality would be expected may indicate an area for investigation.

    A Master Black Belt may be consulted when non-normal data are being analysed (non-parametric tests).


    Homogeneity of variance

    The variance within each group is equal.

    However, if the sample sizes are equal across groups, the F-test is reasonably robust to unequal variances.

    Always try to have equal sample sizes.

    If both normality and equal variances are violated, a Master Black Belt may be consulted.


    The p-value

    For a classical hypothesis test, use the p-value to evaluate the probability of obtaining the calculated F-ratio (or test statistic), or a larger one, from within-subgroup noise alone.

    Low p-values suggest that there ARE differences between subgroup means:

    H0: There are no differences between subgroup means.

    HA: There are differences between subgroup means.


    Examples to practise!


    One-way ANOVA

    Stat > ANOVA > One-Way

    The response data must be in one column and the subscripts (factor levels) in another.

    Can be used with balanced and unbalanced designs.

    A one-way analysis of variance (ANOVA) tests the hypothesis that the means of several populations are equal.

    The method is an extension of the two-sample t-test, specifically for the case where the population variances are assumed to be equal. A one-way analysis of variance requires the following:

    A response, or measurement taken from the units sampled

    A factor, or discrete variable which is altered systematically


    One-way ANOVA example: tire brand test

    Four cars: 1, 2, 3, 4

    Four brands of tires: A, B, C, D

    Objective: to determine the tread wear of tires after 30,000 km of driving.

    Problem: how do we assign 16 tires to the 4 cars?

    Assign each of the 16 tires at random to a wheel. (Large variability within brands.)

    Ref.: "Fundamental Concepts in the Design of Experiments" by Hicks and Turner

    Data of tread wear of tires (difference in tread thickness in mm):

    Car 1     Car 2     Car 3     Car 4
    C (12)    A (14)    C (10)    A (13)
    A (17)    A (13)    D (11)    D (9)
    D (13)    B (14)    B (14)    B (8)
    D (11)    C (12)    B (13)    C (9)

    Model: Tread wear = Overall mean + Brand effect + error


    Data of tread wear of tires: each of the 16 tires assigned at random to a wheel

    Car     Brand   Tread
    One     C       12
            A       17
            D       13
            D       11
    Two     A       14
            A       13
            B       14
            C       12
    Three   D       10
            C       11
            B       14
            B       13
    Four    A       13
            D       9
            B       8
            C       9

    Open the file ANOVA - Tire Brand.MTW and check the different assumptions:

    Stability

    Normality

    Homogeneity of variance (equal variances)


    One-way ANOVA: Stat > ANOVA > One-way

    You wish to compare the mean tread wear of the different brands of tires. H0 is that the mean tread wear is the same for all brands; any variation is caused by the random variation found within each brand. HA is that different brands have different mean tread wear.


    Normality and stability: one-way ANOVA residual plots

    [Figure: Residual Plots for Tread. Four panels: Normal Probability Plot (Percent vs Residual), Versus Fits (Residual vs Fitted Value), Histogram (Frequency vs Residual), Versus Order (Residual vs Observation Order, 1 to 16)]

    Normal?

    Stable?
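    The residual plots are built from two quantities per observation: the fitted value (for this one-way model, the mean of the observation's brand) and the residual (observed minus fitted). This Python sketch computes them for the tire data. It assumes the observation order is the worksheet order listed earlier, and it leaves the plotting itself to a graphics tool; variable names are ours.

```python
# One-way tire data (Brand, Tread) in the worksheet order listed in this section
observations = [
    ("C", 12), ("A", 17), ("D", 13), ("D", 11),  # Car One
    ("A", 14), ("A", 13), ("B", 14), ("C", 12),  # Car Two
    ("D", 10), ("C", 11), ("B", 14), ("B", 13),  # Car Three
    ("A", 13), ("D", 9), ("B", 8), ("C", 9),     # Car Four
]

# Fitted value for the one-way model = mean tread of the observation's brand
totals, counts = {}, {}
for brand, tread in observations:
    totals[brand] = totals.get(brand, 0) + tread
    counts[brand] = counts.get(brand, 0) + 1
brand_means = {b: totals[b] / counts[b] for b in totals}

fitted = [brand_means[b] for b, _ in observations]
residuals = [t - f for (_, t), f in zip(observations, fitted)]

# These pairs are what the "Versus Fits" and "Versus Order" panels plot
for order, (f, r) in enumerate(zip(fitted, residuals), start=1):
    print(f"{order:2d}  fitted={f:6.2f}  residual={r:6.2f}")
```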


    [Figure: Test for Equal Variances for Tread. 95% Bonferroni confidence intervals for the standard deviations of Brands A, B, C and D]

    Bartlett's Test: Test Statistic = 1.52, P-Value = 0.677

    Levene's Test: Test Statistic = 0.15, P-Value = 0.926

    Variances are equal?
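    As a cross-check of the two test statistics, here is a standard-library Python sketch. It assumes that Minitab's Levene's test uses absolute deviations from the group medians (the Brown-Forsythe variant); under that assumption the sketch reproduces the 0.15 shown, and Bartlett's formula reproduces 1.52. P-values would additionally need the F and chi-square distributions and are not computed here. Function names are ours.

```python
import math
import statistics

# One-way tire data grouped by brand (worksheet values from this section)
groups = {
    "A": [17, 14, 13, 13],
    "B": [14, 14, 13, 8],
    "C": [12, 12, 11, 9],
    "D": [13, 11, 10, 9],
}


def one_way_f(samples):
    """Plain one-way ANOVA F-ratio (used here on transformed data)."""
    values = [x for s in samples for x in s]
    grand = sum(values) / len(values)
    means = [sum(s) / len(s) for s in samples]
    ss_b = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, means))
    ss_w = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    return (ss_b / (len(samples) - 1)) / (ss_w / (len(values) - len(samples)))


def levene_median(samples):
    """Levene's statistic using absolute deviations from group medians (Brown-Forsythe)."""
    z = [[abs(x - statistics.median(s)) for x in s] for s in samples]
    return one_way_f(z)


def bartlett(samples):
    """Bartlett's statistic (approximately chi-square with k - 1 df if variances are equal)."""
    k = len(samples)
    n = [len(s) for s in samples]
    big_n = sum(n)
    variances = [statistics.variance(s) for s in samples]
    pooled = sum((ni - 1) * v for ni, v in zip(n, variances)) / (big_n - k)
    numerator = (big_n - k) * math.log(pooled) - sum(
        (ni - 1) * math.log(v) for ni, v in zip(n, variances))
    correction = 1 + (sum(1 / (ni - 1) for ni in n) - 1 / (big_n - k)) / (3 * (k - 1))
    return numerator / correction


samples = list(groups.values())
print(f"Bartlett's statistic = {bartlett(samples):.2f}")
print(f"Levene's statistic   = {levene_median(samples):.2f}")
```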


    One-way ANOVA example

    1. Open the file ANOVA - Tire Brand.MTW

    2. Select Stat > ANOVA > One-Way

    3. Select Tread for the Response and Brand for the Factor

    4. Click OK


    Interpreting the one-way ANOVA: output from the session window

    One-way ANOVA: Tread versus Brand

    Source  DF     SS     MS     F      P
    Brand    3  30.69  10.23  2.44  0.115
    Error   12  50.25   4.19
    Total   15  80.94

    S = 2.046   R-Sq = 37.92%   R-Sq(adj) = 22.39%

    Individual 95% CIs For Mean Based on Pooled StDev

    Level  N    Mean   StDev
    A      4  14.250   1.893
    B      4  12.250   2.872
    C      4  11.000   1.414
    D      4  10.750   1.708

    Pooled StDev = 2.046

    The 1st row, "Brand", gives the statistics for the variation between the means of the factor levels.

    The 2nd row, "Error", gives the statistics for the variation due to random error.

    The 3rd row, "Total", gives the statistics for the overall variability in the data.
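    The session-window table can be rebuilt from the worksheet data with a few lines of Python. This sketch (variable names are ours) reproduces the DF, SS, MS and F columns; the P column, 0.115, is the upper-tail probability of F = 2.44 under an F-distribution with 3 and 12 degrees of freedom.

```python
# One-way tire-brand data (Tread by Brand), from the worksheet listing in this section
tread = {
    "A": [17, 14, 13, 13],
    "B": [14, 14, 13, 8],
    "C": [12, 12, 11, 9],
    "D": [13, 11, 10, 9],
}

values = [x for g in tread.values() for x in g]
grand_mean = sum(values) / len(values)
ss_brand = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in tread.values())
ss_total = sum((x - grand_mean) ** 2 for x in values)
ss_error = ss_total - ss_brand  # partition: Total = Brand + Error

df_brand = len(tread) - 1   # 4 brands - 1
df_total = len(values) - 1  # 16 observations - 1
df_error = df_total - df_brand

ms_brand, ms_error = ss_brand / df_brand, ss_error / df_error
print("Source  DF      SS      MS     F")
print(f"Brand   {df_brand:2d}  {ss_brand:6.2f}  {ms_brand:6.2f}  {ms_brand / ms_error:4.2f}")
print(f"Error   {df_error:2d}  {ss_error:6.2f}  {ms_error:6.2f}")
print(f"Total   {df_total:2d}  {ss_total:6.2f}")
```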


    Interpreting the one-way ANOVA: output from the session window

    Using the one-way ANOVA output for Tread versus Brand (Brand: F = 2.44, P = 0.115; R-Sq = 37.92%, R-Sq(adj) = 22.39%):

    1. What is your decision?

    2. Is the result robust or not, and why?

    3. Which brand is best?


    Two-way ANOVA: using a second variable to block car variation

    Assign each tire at random, but under the condition that each brand occurs exactly once on each car.

    This reduces unexplained variability.

    Data of tread wear of tires (difference in tread thickness in mm):

    Car 1     Car 2     Car 3     Car 4
    B (14)    D (11)    A (13)    C (9)
    C (12)    C (12)    B (13)    D (9)
    A (17)    B (14)    D (11)    B (8)
    D (13)    A (14)    C (10)    A (13)

    Model: Tread wear = Overall mean + Brand effect + Car effect + error


    Data of tread wear of tires: each brand occurs exactly once on each car

    Car     Brand   Tread
    One     B       14
            C       12
            A       17
            D       13
    Two     D       11
            C       12
            B       14
            A       14
    Three   A       13
            B       13
            D       11
            C       10
    Four    C       9
            D       9
            B       8
            A       13

    Open the file ANOVA - Tire Brand Car.MTW and check the assumptions:

    Stability

    Normality

    Homogeneity of variance (equal variances)


    Two-way ANOVA example: tire brand test

    1. Open the file ANOVA - Tire Brand Car.MTW

    2. Select Stat > ANOVA > Two-Way

    3. Select Tread for the Response, Brand for the Row factor and Car for the Column factor. Check Display means.

    4. Click OK


    Interpreting the two-way ANOVA: output from the session window

    Two-way ANOVA: Tread versus Brand, Car

    Source  DF       SS       MS      F      P
    Brand    3  30.6875  10.2292   7.96  0.007
    Car      3  38.6875  12.8958  10.04  0.003
    Error    9  11.5625   1.2847
    Total   15  80.9375

    S = 1.133   R-Sq = 85.71%   R-Sq(adj) = 76.19%

    The p-values are low for both Car and Brand, therefore: the brands are not the same, and the tread loss is not the same for all cars.
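    A Python sketch of the same two-way calculation, assuming an additive model (no Brand x Car interaction) and the balanced layout above, with one tire of each brand per car. That balance is what allows each main-effect sum of squares to be computed from level means alone. The helper name is ours.

```python
# Two-way tire data: (car, brand, tread); each brand occurs once on each car
data = [
    ("One", "B", 14), ("One", "C", 12), ("One", "A", 17), ("One", "D", 13),
    ("Two", "D", 11), ("Two", "C", 12), ("Two", "B", 14), ("Two", "A", 14),
    ("Three", "A", 13), ("Three", "B", 13), ("Three", "D", 11), ("Three", "C", 10),
    ("Four", "C", 9), ("Four", "D", 9), ("Four", "B", 8), ("Four", "A", 13),
]


def main_effect_ss(rows, key_index, grand_mean):
    """SS and df for one factor in a balanced design: sum of n_level * (level mean - grand mean)^2."""
    levels = {}
    for row in rows:
        levels.setdefault(row[key_index], []).append(row[-1])
    ss = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in levels.values())
    return ss, len(levels) - 1


values = [row[-1] for row in data]
grand = sum(values) / len(values)
ss_total = sum((x - grand) ** 2 for x in values)
ss_car, df_car = main_effect_ss(data, 0, grand)
ss_brand, df_brand = main_effect_ss(data, 1, grand)
ss_error = ss_total - ss_car - ss_brand           # left over after removing Car and Brand effects
df_error = (len(values) - 1) - df_car - df_brand  # 15 - 3 - 3 = 9
ms_error = ss_error / df_error

for name, ss, df in [("Brand", ss_brand, df_brand), ("Car", ss_car, df_car)]:
    print(f"{name:5s}  DF={df}  SS={ss:.4f}  F={(ss / df) / ms_error:.2f}")
print(f"Error  DF={df_error}  SS={ss_error:.4f}")
```

    Compared with the one-way analysis, the Car block takes 38.6875 out of the error sum of squares, which is why the Brand F-ratio rises from 2.44 to 7.96.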

    Let's look at the residual plots ...


    Two-way ANOVA residual plots

    [Figure: Residual Plots for Tread. Four panels: Normal Probability Plot (Percent vs Residual), Versus Fits (Residual vs Fitted Value), Histogram (Frequency vs Residual), Versus Order (Residual vs Observation Order, 1 to 16)]

    The residual plots show no unusual observations. The histogram is not bell-shaped, but with only 16 observations it is hard to interpret.


    Interpreting the two-way ANOVA: output from the session window

    Two-way ANOVA: Tread versus Brand, Car

    Individual 95% CIs For Mean Based on Pooled StDev

    Brand   Mean
    A       14.25
    B       12.25
    C       10.75
    D       11.00

    Car     Mean
    Four     9.75
    One     14.00
    Three   11.75
    Two     12.75

    The confidence intervals show that the brands are not the same, and that the tread loss is not the same for all cars.

    1. What is your decision?

    2. Is the result robust or not, and why?

    3. Which factor is significant?


    Let's look at the data graphically

    Graph > Chart > Values from a table


    Let's look at the data graphically

    Graph > Chart > Values from a table > Data View


    Displaying the two-way ANOVA design

    [Figure: Chart of Tread. Bars of Tread (0 to 18) for each Brand (B, C, A, D) within each Car (One, Two, Three, Four)]

    All 4 brands performed better in Car One. This is an assignable difference due to the car.

    It also appears that Brand A performs better on each car than the other brands.


    Displaying the two-way ANOVA design

    [Figure: Chart of Tread. Bars of Tread (0 to 18) for each Car (One, Two, Three, Four) within each Brand (D, A, C, B)]

    Here we are trying to discover which brand of tire had the best tread wear characteristics.

    We included a blocking variable (Car) to explain some of the variability. Based on a comparison of the bar chart and the ANOVA table, which brand should be selected?


    Three-way ANOVA: using a Latin square design

    Each brand appears once in each position and only once on each car (2 restrictions on randomisation).

    Minimises variability.

    Model: Tread wear = Overall mean + Brand effect + Car effect + Position effect + error

    Data of tread wear of tires (difference in tread thickness in mm):

    Position   Car 1    Car 2    Car 3    Car 4
    I          C (12)   D (11)   A (13)   B (8)
    II         B (14)   C (12)   D (11)   A (13)
    III        A (17)   B (14)   C (10)   D (9)
    IV         D (13)   A (14)   B (13)   C (9)


    Each brand appears once in each position and on each car

    Car     Position      Brand   Tread
    One     Left Front    B       14
            Right Front   C       12
            Left Back     A       17
            Right Back    D       13
    Two     Left Front    D       11
            Right Front   C       12
            Left Back     B       14
            Right Back    A       14
    Three   Left Front    A       13
            Right Front   B       13
            Left Back     D       11
            Right Back    C       10
    Four    Left Front    C       9
            Right Front   D       9
            Left Back     B       8
            Right Back    A       13

    Open the file ANOVA - Tire Brand Car Position.MTW and check the assumptions:

    Stability

    Normality

    Homogeneity of variance (equal variances)


    Three-way ANOVA: Stat > ANOVA > General Linear Model

    Fill out the dialog box as shown, then click OK.


General Linear Model: Tread versus Car, Position, Brand

Factor    Type   Levels  Values
Car       fixed       4  Four, One, Three, Two
Position  fixed       4  Left Back, Left Front, Right Back, Right Front
Brand     fixed       4  A, B, C, D

Analysis of Variance for Tread, using Adjusted SS for Tests

Source    DF   Seq SS   Adj SS   Adj MS      F      P
Car        3  38.6875  38.6875  12.8958  14.40  0.004
Position   3   6.1875   6.1875   2.0625   2.30  0.177
Brand      3  30.6875  30.6875  10.2292  11.42  0.007
Error      6   5.3750   5.3750   0.8958
Total     15  80.9375

S = 0.946485   R-Sq = 93.36%   R-Sq(adj) = 83.40%

Minitab output from the session window

The first half of the output lists the levels (values) of each factor. The second half is the ANOVA table.

Two factors are statistically significant at the α = 0.05 level: Car (P = 0.004) and Brand (P = 0.007). The factor Position doesn't appear to have a significant effect (P = 0.177).

The residual plots will confirm whether the basic assumptions about the error have been met.
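As a cross-check of the session output, the sketch below recomputes the three-way ANOVA by hand in plain Python. It assumes the MTW worksheet holds the tread-wear Latin square table (positions I to IV by cars 1 to 4); with that layout the sums of squares, F ratios, S, R-Sq and R-Sq(adj) agree with the table above. The Position column of the car-by-car listing does not form a Latin square (brand C sits at Right Front on both car One and car Two) and would give a Position SS of 3.1875 rather than 6.1875, so it is not used here. P-values are left out because they need an F-distribution routine (for example from SciPy).

```python
from math import sqrt

# ASSUMED to match the MTW worksheet: rows = positions I-IV,
# columns = cars 1-4, each cell = (brand, tread wear in mm).
DATA = [
    [("C", 12), ("D", 11), ("A", 13), ("B", 8)],   # position I
    [("B", 14), ("C", 12), ("D", 11), ("A", 13)],  # position II
    [("A", 17), ("B", 14), ("C", 10), ("D", 9)],   # position III
    [("D", 13), ("A", 14), ("B", 13), ("C", 9)],   # position IV
]


def latin_square_anova(data):
    """Return {source: (df, SS, MS, F)} for Car, Position and Brand in an n x n Latin square."""
    n = len(data)
    # Flatten to (position index, car index, brand, tread) records.
    cells = [(r, c, b, y) for r, row in enumerate(data) for c, (b, y) in enumerate(row)]
    cf = sum(y for _, _, _, y in cells) ** 2 / n ** 2  # correction factor: (grand total)^2 / N

    def factor_ss(index):
        # SS for one factor = sum of (level total)^2 / n - CF; each level has n observations.
        totals = {}
        for cell in cells:
            totals[cell[index]] = totals.get(cell[index], 0) + cell[3]
        return sum(t ** 2 for t in totals.values()) / n - cf

    ss = {"Car": factor_ss(1), "Position": factor_ss(0), "Brand": factor_ss(2)}
    ss_total = sum(y ** 2 for _, _, _, y in cells) - cf
    ss_error = ss_total - sum(ss.values())
    df_factor, df_error = n - 1, (n - 1) * (n - 2)
    ms_error = ss_error / df_error
    table = {k: (df_factor, v, v / df_factor, v / df_factor / ms_error) for k, v in ss.items()}
    table["Error"] = (df_error, ss_error, ms_error, None)
    table["Total"] = (n * n - 1, ss_total, None, None)
    return table


table = latin_square_anova(DATA)
for source, (df, ss, ms, f) in table.items():
    ms_txt = f"{ms:.4f}" if ms is not None else ""
    f_txt = f"{f:.2f}" if f is not None else ""
    print(f"{source:<9}{df:>3}{ss:>10.4f}{ms_txt:>10}{f_txt:>8}")

df_e, ss_e, ms_e, _ = table["Error"]
df_t, ss_t, _, _ = table["Total"]
s = sqrt(ms_e)                        # S: root mean square error
r_sq = 1 - ss_e / ss_t                # R-Sq
r_sq_adj = 1 - ms_e / (ss_t / df_t)   # R-Sq(adj)
print(f"S = {s:.6f}  R-Sq = {r_sq:.2%}  R-Sq(adj) = {r_sq_adj:.2%}")
```

Because every level of every factor appears equally often, each factor's SS can be taken from its level totals alone, which is why the Seq SS and Adj SS columns in the session output are identical.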

Let's look at the residual plots ...

    General Linear Model Residual plots


[Figure: Residual Plots for Tread. Four panels: Normal Probability Plot (Percent vs Residual), Versus Fits (Residual vs Fitted Value), Histogram (Frequency vs Residual), Versus Order (Residual vs Observation Order).]

Review the residual plots and state your conclusions about the assumptions regarding the error, i.e. that the errors for each treatment level are independent and normally distributed with a mean of 0 and a constant variance.
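The residual plots are drawn from the fitted values and residuals of the additive model. A minimal sketch of how these are computed, assuming the same Latin square layout as the tread-wear table (rows = positions I to IV, columns = cars 1 to 4). The two printed checks (mean residual of zero, squared residuals summing to the Error SS of 5.375) are properties of the least-squares fit, so they confirm the arithmetic only; normality, independence and constant variance still have to be judged from the plots.

```python
# ASSUMED layout: rows = positions I-IV, columns = cars 1-4,
# each cell = (brand, tread wear in mm).
DATA = [
    [("C", 12), ("D", 11), ("A", 13), ("B", 8)],
    [("B", 14), ("C", 12), ("D", 11), ("A", 13)],
    [("A", 17), ("B", 14), ("C", 10), ("D", 9)],
    [("D", 13), ("A", 14), ("B", 13), ("C", 9)],
]

n = len(DATA)
grand = sum(y for row in DATA for _, y in row) / n ** 2
pos_mean = [sum(y for _, y in row) / n for row in DATA]
car_mean = [sum(DATA[r][c][1] for r in range(n)) / n for c in range(n)]
brand_sum = {}
for row in DATA:
    for brand, y in row:
        brand_sum[brand] = brand_sum.get(brand, 0) + y
brand_mean = {brand: total / n for brand, total in brand_sum.items()}

fits, residuals = [], []
for r, row in enumerate(DATA):
    for c, (brand, y) in enumerate(row):
        # Additive model: grand mean + position effect + car effect + brand effect
        fit = grand + (pos_mean[r] - grand) + (car_mean[c] - grand) + (brand_mean[brand] - grand)
        fits.append(fit)
        residuals.append(y - fit)

print(f"mean residual            = {sum(residuals) / len(residuals):.4f}")
print(f"sum of squared residuals = {sum(e * e for e in residuals):.4f}")  # equals the Error SS
print(f"residuals range from {min(residuals):.3f} to {max(residuals):.3f}")
print(f"fitted values range from {min(fits):.3f} to {max(fits):.3f}")
```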

    ANOVA example with GLM


How to include the interaction in the model?

1. Select Stat > ANOVA > General Linear Model.
2. Select Tread for Response, and Car and Brand for Model. For the interaction, add the term Car*Brand.
3. Click OK.
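Before adding Car*Brand it is worth checking how many degrees of freedom the model leaves for error. A minimal sketch, assuming the worksheet is the 16-run tire data with a single observation for each car-brand combination (the helper `two_factor_df` is hypothetical): the interaction alone uses (4 − 1)(4 − 1) = 9 df, which together with the main effects exhausts the 15 total df. No error term remains, so no F tests can be formed; estimating an interaction needs replicated runs within each combination.

```python
def two_factor_df(levels_a, levels_b, n_obs, interaction):
    """Degrees of freedom for a crossed two-factor model A + B (+ A*B)."""
    df = {"A": levels_a - 1, "B": levels_b - 1}
    if interaction:
        df["A*B"] = (levels_a - 1) * (levels_b - 1)
    df["Error"] = (n_obs - 1) - sum(df.values())  # whatever is left of the total df
    df["Total"] = n_obs - 1
    return df


# ASSUMPTION: 4 cars x 4 brands, one tread value per car-brand pair (16 runs).
print(two_factor_df(4, 4, 16, interaction=False))  # Error df = 9
print(two_factor_df(4, 4, 16, interaction=True))   # Error df = 0: no F tests possible
```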

    GLM with an unbalanced and nested design


Four chemical companies produce insecticides that can be used to kill mosquitoes, but the composition of the insecticides differs from company to company.

An experiment is conducted to test the efficacy of the insecticides by placing 400 mosquitoes inside a glass container treated with a single insecticide and counting the live mosquitoes 4 hours later.

Three replications are performed for each product.

The goal is to compare the product effectiveness of the different companies. The factors are fixed because you are interested in comparing these particular brands.

The factors are nested because each insecticide is unique to one company.

You use GLM to analyse your data because the design is unbalanced:

Company A: 3 types of product
Company B: 2 types of product
Company C: 2 types of product
Company D: 4 types of product
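The unbalanced, nested structure shows up directly in the degrees of freedom. A small sketch, assuming every product was tested exactly three times with no missing containers: Product is counted within each company, contributing Σ(pᵢ − 1) df rather than a crossed-factor count, which gives Company 3, Product(Company) 7, Error 22 and Total 32 df. These should match the DF column of the GLM table for this design.

```python
# Products per company (from the list above).
PRODUCTS = {"A": 3, "B": 2, "C": 2, "D": 4}
REPLICATES = 3  # ASSUMPTION: every product tested exactly 3 times, no missing runs

n_companies = len(PRODUCTS)
n_products = sum(PRODUCTS.values())  # distinct products across all companies
n_obs = n_products * REPLICATES      # total number of containers counted

df = {
    "Company": n_companies - 1,
    # Each company contributes (its number of products - 1) df.
    "Product(Company)": sum(p - 1 for p in PRODUCTS.values()),
    "Error": n_obs - n_products,     # replicate-to-replicate variation within a product
    "Total": n_obs - 1,
}
for term, value in df.items():
    print(f"{term:<17}{value:>3}")
```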

    GLM with an unbalanced and nested design


1. Select Stat > ANOVA > General Linear Model.
2. Select NMosquito for Response, and Company and Product for Model. For the nested design, add (Company) after Product, i.e. enter the term Product(Company).
3. Click OK.

    GLM with an unbalanced and nested design


ANOVA table in session window

1. What is your decision?
2. Which parameter is significant?

    Multi-way ANOVA


    Two-way, Balanced, General Linear Model.

Two-way ANOVA may also be used to analyse a design where there are two controllable factors, both of which are of interest.

More than two factors can be analysed using Balanced ANOVA or General Linear Model.

There may be more than one factor that has an effect on the response variable. This commonly occurs in manufacturing processes, so it is often wise to include more than one factor in the analysis:

Valuable resources can be used more efficiently by investigating several factors at one time.

Including additional factors in the model explains variation that would otherwise be counted as error.

Including more factors allows interactions to be studied.

What about the other ANOVA options?


When are they appropriate?

One-way ANOVA: Studies the effect of one factor at various levels on a response variable.

Two-way ANOVA: Studies the effect of two factors and their interaction at various levels on a response variable.

Balanced ANOVA: Studies the impact of 2 or more factors and their interactions at various levels on a response variable. The design must be balanced: every combination of factor levels has the same number of observations.

General Linear Model: Studies the impact of 2 or more factors and their interactions at various levels on a response variable. The number of levels and observations may vary. The factors may have a mixture of nested and crossed relationships. The user must specify the factors, interactions and nested/crossed relationships of interest.

Fully Nested ANOVA: Studies the impact of 2 or more factors. The factors are arranged in a hierarchy such that each factor is nested within (unique to) the factor above it. No interactions are obtained.

    Partitioning of sums of squares


Total SS = SS Between Brands + SS Within Brands

SS Within Brands = SS Between Cars + SS Within Cars

SS Within Cars = SS Between Positions + SS Within (Error)
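With the tire data, each split can be checked against the Adj SS column of the session output. Because the Latin square is balanced, the sequential and adjusted sums of squares coincide, and both the sums of squares and the degrees of freedom add up exactly:

$$
\begin{aligned}
SS_{\text{Total}} &= SS_{\text{Brand}} + SS_{\text{Car}} + SS_{\text{Position}} + SS_{\text{Error}} \\
80.9375 &= 30.6875 + 38.6875 + 6.1875 + 5.3750 \\
15 &= 3 + 3 + 3 + 6 \quad \text{(degrees of freedom)}
\end{aligned}
$$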

    Summary ANOVA


One-way ANOVA

To analyse the difference between means from 2 or more samples

Balanced ANOVA

To compare the means of populations that are classified in two or more ways (two or more factors)

General Linear Model

Like Balanced ANOVA, but also handles unbalanced designs and nested factors

    Last words


    We reviewed:

Graphical methods for analysing differences between means obtained from 2 or more samples.

Analysis of Variance (ANOVA) methods for analysing the differences between means.

Methods for determining whether or not significant differences in variance exist between two or more samples.


    Appendix

    Post Hoc tests


Definition: Post hoc tests are additional hypothesis tests that are done after an ANOVA to determine exactly which mean differences are significant and which are not. These tests are done when you reject H0 and there are three or more treatments.

Rejecting H0 indicates that at least one difference exists among the treatments. With k = 3 or more, the problem is to find where the differences are.

Note that when you have two treatments, rejecting H0 indicates that the two means are not equal. In this case there is no question about which means are different, and there is no need to do post hoc tests.

The first test we consider is Tukey's HSD test. Tukey's test allows you to compute a single value that determines the minimum difference between treatment means that is necessary for significance.

    Post Hoc tests


This value, called the Honestly Significant Difference (HSD), is then used to compare any two treatments (Xs). If the mean difference exceeds Tukey's HSD, you conclude that there is a significant difference between the treatments. The formula is:

$$\mathrm{HSD} = q\sqrt{\frac{MS_{\text{within}}}{n}}$$

where n is the number of data points for each treatment and MS_within is the within-treatment (Error) mean square from the ANOVA. The value of q is found in the table (next slide). To locate the appropriate value of q, you must know the number of treatments in the overall experiment (k) and the degrees of freedom for the Error, and select the Alpha-risk (0.05). The q value used in this test is called a Studentised range statistic.

Tukey's test in this form requires the sample size to be the same for all treatments.

    Post Hoc tests


Tukey's HSD test example


Example of seal life by shift

       Shift 1  Shift 2  Shift 3
       25.40    23.40    20.00
       26.31    21.80    22.20
       24.10    23.50    19.75
       23.74    22.75    20.60
       25.10    21.60    20.40
Mean   24.93    22.61    20.59

ANOVA result: the P-value is low, so the difference between the shifts is significant.

Now the question is: which mean differences are significant and which are not?
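A sketch of the one-way ANOVA behind "the P-value is low", in plain Python. The exact tail formula used for the P-value is a standard identity that holds only when the numerator has 2 degrees of freedom, which is the case here with three shifts; for other numbers of groups an F-distribution routine (for example from SciPy) would be needed instead.

```python
from math import fsum

# Seal life by shift (from the table above).
SHIFTS = {
    "Shift 1": [25.40, 26.31, 24.10, 23.74, 25.10],
    "Shift 2": [23.40, 21.80, 23.50, 22.75, 21.60],
    "Shift 3": [20.00, 22.20, 19.75, 20.60, 20.40],
}


def one_way_anova(groups):
    """Return (F, df_between, df_within, MS_within) for a one-way ANOVA."""
    values = [y for g in groups.values() for y in g]
    grand = fsum(values) / len(values)
    means = {name: fsum(g) / len(g) for name, g in groups.items()}
    ss_between = fsum(len(g) * (means[name] - grand) ** 2 for name, g in groups.items())
    ss_within = fsum((y - means[name]) ** 2 for name, g in groups.items() for y in g)
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    ms_within = ss_within / df_within
    return (ss_between / df_between) / ms_within, df_between, df_within, ms_within


f_stat, df_b, df_w, ms_within = one_way_anova(SHIFTS)
# For exactly 2 numerator df, the F upper tail has a closed form:
# P(F > f) = (1 + 2 f / df_w) ** (-df_w / 2)
assert df_b == 2, "closed form below only holds for 3 groups"
p_value = (1 + 2 * f_stat / df_w) ** (-df_w / 2)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}, p = {p_value:.1e}, MS within = {ms_within:.3f}")
```

It reports F ≈ 25.6 on (2, 12) degrees of freedom with p ≈ 5 × 10⁻⁵, and MS within ≈ 0.921, the value used in the HSD calculation.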

Tukey's HSD test example


Tukey's HSD calculation step 1:

Determine the q value. In this example k = 3 and df for Error = 12. Checking the value in the table, we get q = 3.77 with Alpha-risk = 0.05.

Tukey's HSD calculation step 2:

Determine the HSD value:

$$\mathrm{HSD} = q\sqrt{\frac{MS_{\text{within}}}{n}} = 3.77\sqrt{\frac{0.921}{5}} = 1.618$$

Tukey's HSD calculation step 3:

The mean difference between any two samples must exceed 1.618 to be significant. Using this value, we can draw the following conclusions:

Shift 1 is significantly different from Shift 2 (Mean S1 − Mean S2 = 2.32)
Shift 1 is significantly different from Shift 3 (Mean S1 − Mean S3 = 4.34)
Shift 2 is significantly different from Shift 3 (Mean S2 − Mean S3 = 2.02)
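The three calculation steps can be reproduced in a few lines of Python. A minimal sketch: q = 3.77 is taken from the studentised range table as in step 1 (if SciPy 1.7 or later is available, `scipy.stats.studentized_range.ppf(0.95, 3, 12)` should give approximately the same value), and MS_within is recomputed from the raw seal-life data rather than read from the ANOVA output.

```python
from itertools import combinations
from math import sqrt

SHIFTS = {
    "Shift 1": [25.40, 26.31, 24.10, 23.74, 25.10],
    "Shift 2": [23.40, 21.80, 23.50, 22.75, 21.60],
    "Shift 3": [20.00, 22.20, 19.75, 20.60, 20.40],
}
Q = 3.77  # studentised range from the table: k = 3, 12 error df, alpha = 0.05

sizes = {len(v) for v in SHIFTS.values()}
assert len(sizes) == 1, "this HSD formula assumes equal sample sizes"
n = sizes.pop()
k = len(SHIFTS)
means = {name: sum(v) / n for name, v in SHIFTS.items()}
ss_within = sum((y - means[name]) ** 2 for name, v in SHIFTS.items() for y in v)
ms_within = ss_within / (k * n - k)  # error df = N - k = 12
hsd = Q * sqrt(ms_within / n)        # HSD = q * sqrt(MS_within / n)

results = {}
for a, b in combinations(SHIFTS, 2):
    diff = abs(means[a] - means[b])
    results[(a, b)] = diff > hsd
    verdict = "significant" if diff > hsd else "not significant"
    print(f"{a} vs {b}: |difference| = {diff:.2f} (HSD = {hsd:.3f}) -> {verdict}")
```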

    Summary


ANOVA is used as a hypothesis test, and we also use it for components of variation studies.

The X is attribute and the Y is variable: very common data sets.

ANOVA introduced us to 3 preliminary tests before deciding whether to reject the null hypothesis:

Stability
Normality
Homogeneity of variance

All hypothesis tests require these or similar tests of assumptions.

Choose the appropriate design before calculating the ANOVA.

Use Tukey's HSD test to find which means differ when the ANOVA rejects H0.