Analysis of Variance


  • ANOVA (Analysis of Variance)

    It's ANOVA again!

  • Reality

    Human beings are complex beings. Our interaction is also complex.

    Not every human phenomenon or social reality can be conveniently sliced into two groups.

    Social researchers often seek to compare more than two groups or samples.

    The t-test, as a statistical method, is limited in this sense.

  • Why not do a series of t-tests?

    It is a lot of work!

    Each additional test increases the risk (probability) of committing a Type I error (obtaining a statistically significant finding due to sampling error rather than a true population difference).

    There is a need for a single overall decision.

    Hence Analysis of Variance, alias ANOVA.
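To put a rough number on that growing risk, here is a minimal sketch. It assumes the pairwise t-tests are independent (they are not exactly independent in practice, so treat the result as an approximation), and the function name is hypothetical.

```python
def familywise_error_rate(num_groups, alpha=0.05):
    """Return (number of pairwise t-tests, chance of at least one Type I error).

    Assumes independent tests, which is only an approximation here.
    """
    num_tests = num_groups * (num_groups - 1) // 2  # every pair of groups
    return num_tests, 1 - (1 - alpha) ** num_tests


for k in (2, 3, 4, 5):
    tests, rate = familywise_error_rate(k)
    print(f"{k} groups: {tests} t-tests, familywise error rate = {rate:.3f}")
```

With three groups the three t-tests already carry about a 14% chance of at least one false positive, and with five groups about 40%, which is why a single overall test is preferred.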

  • The Logic of Analysis of Variance

    Three aspects of variation

    Total variation

    Variation within groups

    Variation between groups

    Instead of a t ratio, ANOVA yields an F ratio (variation between groups and variation within groups are compared).

    Hypotheses:

    Ho: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_j$

    Ha: $\mu_i \neq \mu_j$ for at least one pair of groups

  • Copyright Pearson Education, Inc., Allyn & Bacon 2010

    Variation Within Groups/Residuals

  • Variation Between Groups/ Model


  • The Sum of Squares

    Sum of squares - the heart of ANOVA

    Sum of squared deviations from the mean

    Total (SStotal), Between (SSbetween/model), Within (SSwithin/residual)

    Formulas:

    $SS_{total} = \sum (X - \bar{X}_{total})^2$

    $SS_{within} = \sum (X - \bar{X}_{group})^2$

    $SS_{between} = \sum N_{group} (\bar{X}_{group} - \bar{X}_{total})^2$

    These are much simpler formulas:

    $SS_{total} = \sum X_{total}^2 - N_{total} \bar{X}_{total}^2$

    $SS_{within} = \sum X_{total}^2 - \sum N_{group} \bar{X}_{group}^2$

    $SS_{between} = \sum N_{group} \bar{X}_{group}^2 - N_{total} \bar{X}_{total}^2$
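As a sanity check that the deviation formulas and the simpler computational formulas agree, here is a small sketch in plain Python. The scores are the Love Potion data from the worked example later in this deck; the function names are hypothetical.

```python
groups = {
    "Placebo": [3, 2, 1, 1, 4],
    "Low Dose": [5, 2, 4, 2, 3],
    "High Dose": [7, 4, 5, 3, 6],
}


def mean(scores):
    return sum(scores) / len(scores)


def sums_of_squares_definitional(groups):
    """SS from squared deviations around the grand mean and the group means."""
    all_scores = [x for g in groups.values() for x in g]
    grand_mean = mean(all_scores)
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    return ss_total, ss_within, ss_between


def sums_of_squares_computational(groups):
    """Same quantities from raw sums of squared scores (the simpler formulas)."""
    all_scores = [x for g in groups.values() for x in g]
    sum_sq = sum(x * x for x in all_scores)
    total_term = len(all_scores) * mean(all_scores) ** 2
    group_term = sum(len(g) * mean(g) ** 2 for g in groups.values())
    return sum_sq - total_term, sum_sq - group_term, group_term - total_term


definitional = sums_of_squares_definitional(groups)
computational = sums_of_squares_computational(groups)
print([round(v, 3) for v in definitional])   # [43.733, 23.6, 20.133]
print([round(v, 3) for v in computational])  # [43.733, 23.6, 20.133]
```

Both routes give the same values without intermediate rounding, and the total equals the within-groups plus the between-groups sum of squares.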

  • Mean Square (note: estimates of the variance that exists in the population)

    The value of the sums of squares grows as variation increases

    The value also increases with sample size

    Mean square (or variance) controls for these influences

    Formulas:

    $MS_{between} = \frac{SS_{between}}{df_{between}}$

    $MS_{within} = \frac{SS_{within}}{df_{within}}$

    $df_{between} = k - 1$

    $df_{within} = N_{total} - k$

    Where k = number of groups

  • The F Ratio

    ANOVA yields an F ratio

    Variation between and within groups are compared

    F ratio must be evaluated for significance

    The larger the F ratio, the more likely it is to be statistically significant (F ratio > F critical)

    Must calculate degrees of freedom (within [denominator] and between [numerator])

    Interpret F ratio with Table D in the appendix

    $F = \frac{MS_{between}}{MS_{within}}$
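A minimal sketch of the path from sums of squares to the F ratio. The inputs are the sums of squares that the SPSS output reports for the Love Potion example later in this deck (20.133 and 23.600), and the critical value 3.88 is the table value quoted in that example. The function name is hypothetical.

```python
def f_ratio(ss_between, ss_within, n_total, k):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    df_between = k - 1          # numerator degrees of freedom
    df_within = n_total - k     # denominator degrees of freedom
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within


F, df_b, df_w = f_ratio(ss_between=20.133, ss_within=23.600, n_total=15, k=3)
F_CRITICAL = 3.88  # table value for df = (2, 12) at alpha = 0.05, as quoted in the example

print(f"F({df_b}, {df_w}) = {F:.3f}")              # F(2, 12) = 5.119
print("Reject Ho" if F > F_CRITICAL else "Retain Ho")  # Reject Ho
```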

  • A Multiple Comparison of Means

    A significant F ratio tells us there is a difference among groups.

    If we had only two samples, no other test would be necessary.

    With three or more samples, we need to determine where the difference lies.

    We do this with Tukey's HSD (Honestly Significant Difference).

  • Tukey's HSD (note: a post hoc test)

    Tukey's HSD (Honestly Significant Difference) only makes sense after a significant F ratio has been identified.

    It allows the difference between any two means to be compared against the HSD value.

    Takes into account Type I errors

    $HSD = q \sqrt{\frac{MS_{within}}{N_{group}}}$

    q = table value at a given level of significance for the total number of group means being compared

    MSwithin = within-groups mean square (obtained from the analysis of variance)

    Ngroup = number of subjects in each group (assumes the same number in each group)
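The HSD formula can be sketched as follows, using the group means and MSwithin from the Love Potion example later in this deck. The value q = 3.77 is an assumption: it is the studentized range table value usually listed for 3 means, 12 within-groups degrees of freedom, and a .05 level, and it is not given in the slides. Function and variable names are hypothetical.

```python
import math
from itertools import combinations


def tukey_hsd(q, ms_within, n_per_group):
    """Smallest difference between two means that counts as significant."""
    return q * math.sqrt(ms_within / n_per_group)


Q_TABLE = 3.77  # ASSUMED table value (3 means, df_within = 12, alpha = .05)
means = {"Placebo": 2.2, "Low Dose": 3.2, "High Dose": 5.0}

hsd = tukey_hsd(Q_TABLE, ms_within=1.967, n_per_group=5)
print(f"HSD = {hsd:.2f}")

# Compare every pair of means against the HSD value.
significant = [
    (a, b) for a, b in combinations(means, 2) if abs(means[a] - means[b]) > hsd
]
print(significant)
```

Under that assumed q, only the Placebo versus High Dose difference (2.8) exceeds the HSD, which agrees with the pair flagged in the SPSS multiple comparisons output.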

  • Summary

    The t ratio is limited to comparing two means; the F ratio allows comparison of two or more means.

    F ratio significance is interpreted using Table D (F critical).

    Tukey's HSD isolates significant differences

  • Requirements for Using the F Ratio

    1. A comparison between two or more independent means

    2. Data at the interval level of measurement

    3. Random sampling techniques

    4. A normal distribution

    5. Equal variances assumed

  • Example of ANOVA (One-way)

  • The Love Potion

    Once upon a time, in a land far, far away, there lived a witch who was obsessed with K-pop and TRUE LOVE. To make the story short, she concocted a LOVE potion out of Cupid's urine, Austin Powers' mojo, mermaids' tongues, Taylor Swift's teardrops on her guitar, sugar and spice, and everything nice, and, not to forget, chemical X. Blah blah blah... and the witch became filthy rich. THE END

  • As Merlin's experimental researcher, you are interested in the effect of different concentrations of this LOVE potion on the lovers' lovelovelove. Given the lovelovelove levels (1-10) of three sample groups (Placebo, Low Dose, and High Dose), you want to predict the level of lovelovelove from the different levels (dosages) of the Love potion. You may use a 95% confidence level.

    Placebo group: 3, 2, 1, 1, 4

    Low Dose group: 5, 2, 4, 2, 3

    High Dose group: 7, 4, 5, 3, 6

  • Hypotheses

    Ho: There is no significant difference in the lovelovelove levels of participants when grouped according to dosage of the Love potion. Thus, the potion has no effect in inducing lovelovelove.

    Ha: There is a significant difference in the lovelovelove levels of participants when grouped according to dosage of the Love potion. Thus, the potion has an effect in inducing lovelovelove.

  • Placebo Group (N1 = 5)    Low Dose Group (N2 = 5)    High Dose Group (N3 = 5)

    X1     X1^2     X2     X2^2     X3     X3^2
    3      9        5      25       7      49
    2      4        2      4        4      16
    1      1        4      16       5      25
    1      1        2      4        3      9
    4      16       3      9        6      36

    $\sum X_1 = 11$, $\sum X_1^2 = 31$; $\sum X_2 = 16$, $\sum X_2^2 = 58$; $\sum X_3 = 25$, $\sum X_3^2 = 135$

    $\bar{X}_1 = 2.2$, $\bar{X}_2 = 3.2$, $\bar{X}_3 = 5.0$

  • Step 1: Find the mean for each sample.

    $\bar{X} = \frac{\sum X}{N}$

    $\bar{X}_1 = 2.2$

    $\bar{X}_2 = 3.2$

    $\bar{X}_3 = 5.0$

    Step 2: Find the sum of scores, sum of squared scores, number of subjects, and mean for all groups combined.

    $\sum X_{total} = \sum X_1 + \sum X_2 + \sum X_3 = 11 + 16 + 25 = 52$

  • Step 2 continuation

    $\sum X_{total}^2 = \sum X_1^2 + \sum X_2^2 + \sum X_3^2 = 31 + 58 + 135 = 224$

    $N_{total} = N_1 + N_2 + N_3 = 5 + 5 + 5 = 15$

    $\bar{X}_{total} = \frac{\sum X_{total}}{N_{total}} = \frac{52}{15} = 3.47$
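Steps 1 and 2 can be reproduced with a few lines of plain Python. This is only a sketch of the bookkeeping; the variable names are hypothetical.

```python
groups = {
    "Placebo": [3, 2, 1, 1, 4],
    "Low Dose": [5, 2, 4, 2, 3],
    "High Dose": [7, 4, 5, 3, 6],
}

# Step 1: per-group sum, sum of squares, N, and mean.
summary = {
    name: {
        "sum_x": sum(scores),
        "sum_x_sq": sum(x * x for x in scores),
        "n": len(scores),
        "mean": sum(scores) / len(scores),
    }
    for name, scores in groups.items()
}

# Step 2: the same quantities for all groups combined.
sum_x_total = sum(s["sum_x"] for s in summary.values())
sum_x_sq_total = sum(s["sum_x_sq"] for s in summary.values())
n_total = sum(s["n"] for s in summary.values())
grand_mean = sum_x_total / n_total

for name, s in summary.items():
    print(name, s)
print(sum_x_total, sum_x_sq_total, n_total, round(grand_mean, 2))  # 52 224 15 3.47
```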

  • Step 3: Find the total sum of squares.

    $SS_{total} = \sum X_{total}^2 - N_{total} \bar{X}_{total}^2$

    $= 224 - (15)(3.47)^2$

    $= 224 - (15)(12.04)$

    $= 224 - 180.6$

    $= 43.4$

    Step 4: Find the within-groups sum of squares.

    $SS_{within} = \sum X_{total}^2 - \sum N_{group} \bar{X}_{group}^2$

    $= 224 - [(5)(2.2)^2 + (5)(3.2)^2 + (5)(5.0)^2]$

    $= 224 - [(5)(4.84) + (5)(10.24) + (5)(25)]$

    $= 224 - (24.2 + 51.2 + 125)$

    $= 224 - 200.4$

    $= 23.6$

  • Step 5: Find the between-groups sum of squares.

    $SS_{between} = \sum N_{group} \bar{X}_{group}^2 - N_{total} \bar{X}_{total}^2$

    $= [(5)(2.2)^2 + (5)(3.2)^2 + (5)(5.0)^2] - (15)(3.47)^2$

    $= [(5)(4.84) + (5)(10.24) + (5)(25)] - (15)(12.04)$

    $= (24.2 + 51.2 + 125) - 180.6$

    $= 200.4 - 180.6$

    $= 19.8$

    Step 6: Find the between-groups degrees of freedom.

    $df_{between} = k - 1 = 3 - 1 = 2$

  • Step 7: Find the within-groups degrees of freedom.

    $df_{within} = N_{total} - k = 15 - 3 = 12$

    Step 8: Find the within-groups mean square.

    MSwithin = SSwithin/ dfwithin = 23.6/12

    = 1.97

    Step 9: Find the between-groups mean square.

    MSbetween = SSbetween/dfbetween = 19.8/2

    = 9.9

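  • Step 10: Obtain the F ratio.

    $F = \frac{MS_{between}}{MS_{within}} = \frac{9.9}{1.97} = 5.025$

    Step 11: Compare the obtained F with the table (critical) F.

    F-obtained = 5.025

    F-table = 3.88 (3.89 in other references)

    df = dfbetween / dfwithin = 2 (column) and 12 (row)

    $\alpha = 0.05$

    F-obtained > F-table: 5.025 > 3.88, TRUE! Reject Ho!

    Time to celebrate!

    Note: SPSS reports F = 5.119 for the same data; the hand-calculated value is slightly lower because the grand mean and MSwithin were rounded along the way.

    (F-distribution table used: Levin, Fox, and Forde, 2013)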
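How much the intermediate rounding matters can be checked numerically. The sketch below recomputes F from the totals in Steps 2 to 5, once with the rounded squared grand mean (12.04) and once with the exact value (52/15 squared). The helper name is hypothetical.

```python
SUM_X, SUM_X_SQ, N_TOTAL, K = 52, 224, 15, 3
GROUP_TERM = 5 * (2.2 ** 2 + 3.2 ** 2 + 5.0 ** 2)  # sum of N_group * mean_group^2 = 200.4


def f_from_squared_grand_mean(grand_mean_sq):
    """F ratio from the computational formulas, given the squared grand mean."""
    ss_between = GROUP_TERM - N_TOTAL * grand_mean_sq
    ss_within = SUM_X_SQ - GROUP_TERM
    return (ss_between / (K - 1)) / (ss_within / (N_TOTAL - K))


f_rounded = f_from_squared_grand_mean(12.04)                  # grand mean rounded to 3.47
f_exact = f_from_squared_grand_mean((SUM_X / N_TOTAL) ** 2)   # no rounding

print(f"rounded grand mean: F = {f_rounded:.3f}")  # 5.034
print(f"exact grand mean:   F = {f_exact:.3f}")    # 5.119
```

The rounded version gives 5.034 rather than the slide's 5.025 because the slide additionally rounds MSwithin to 1.97 before dividing. Either way F exceeds the critical value of 3.88, so the decision to reject Ho does not change.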

  • Manual calculations again, Sir, forever?!

    Thanks to Merlin, he uses SPSS!

    Let's go analyze this LOVEly problem with SPSS, using the Love Potion data set.

  • Descriptives

    (95% CI = 95% Confidence Interval for Mean)

    LoveLoveLove   N    Mean     Std. Deviation   Std. Error   95% CI Lower Bound   95% CI Upper Bound   Minimum   Maximum
    Placebo        5    2.2000   1.30384          .58310       .5811                3.8189               1.00      4.00
    Low Dose       5    3.2000   1.30384          .58310       1.5811               4.8189               2.00      5.00
    High Dose      5    5.0000   1.58114          .70711       3.0368               6.9632               3.00      7.00
    Total          15   3.4667   1.76743          .45635       2.4879               4.4454               1.00      7.00

  • ANOVA

    LoveLoveLove     Sum of Squares   df   Mean Square   F       Sig.
    Between Groups   20.133           2    10.067        5.119   .025
    Within Groups    23.600           12   1.967
    Total            43.733           14
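The Sig. value in this table can be cross-checked without statistical software, because the F distribution has a simple closed form when the numerator degrees of freedom equal 2, as they do here. The sketch below relies on that special case only; it is not a general F-distribution routine, and the function names are hypothetical.

```python
def f_upper_tail_df1_is_2(f_value, df_within):
    """P(F > f_value) for an F distribution with (2, df_within) degrees of freedom.

    Valid ONLY when the numerator (between-groups) df is exactly 2.
    """
    return (1 + 2 * f_value / df_within) ** (-df_within / 2)


def f_critical_df1_is_2(alpha, df_within):
    """Critical F for (2, df_within) df, obtained by inverting the tail formula."""
    return (df_within / 2) * (alpha ** (-2 / df_within) - 1)


p_value = f_upper_tail_df1_is_2(5.119, 12)
f_crit = f_critical_df1_is_2(0.05, 12)

print(f"p = {p_value:.3f}")          # p = 0.025
print(f"F critical = {f_crit:.3f}")  # F critical = 3.885
```

The p-value matches the .025 reported by SPSS, and the critical value of about 3.885 explains why tables list it as 3.88 or 3.89 depending on how they round.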

  • Multiple Comparisons

    Dependent Variable: LoveLoveLove

    Tukey HSD

    (I) Dosage of Love Potion   (J) Dosage of Love Potion   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower Bound   95% CI Upper Bound
    Placebo                     Low Dose                    -1.00000                .88694       .516   -3.3662              1.3662
    Placebo                     High Dose                   -2.80000*               .88694       .021   -5.1662              -.4338
    Low Dose                    Placebo                     1.00000                 .88694       .516   -1.3662              3.3662
    Low Dose                    High Dose                   -1.80000                .88694       .147   -4.1662              .5662
    High Dose                   Placebo                     2.80000*                .88694       .021   .4338                5.1662
    High Dose                   Low Dose                    1.80000                 .88694       .147   -.5662               4.1662

    *. The mean difference is significant at the 0.05 level.

  • Decision for our Null Hypothesis

    Reject Ho

    (P

  • How about the effect size?

    Eta squared ($\eta^2$), a.k.a. $R^2$

  • Effect size

    Omega squared
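The slides name the two effect-size measures without giving formulas. The sketch below uses the commonly cited one-way ANOVA definitions, which are an assumption here rather than something taken from the deck: eta squared as SSbetween / SStotal, and omega squared as (SSbetween - dfbetween x MSwithin) / (SStotal + MSwithin). The inputs come from the SPSS ANOVA table above; the function names are hypothetical.

```python
def eta_squared(ss_between, ss_total):
    """Share of total variation accounted for by group membership."""
    return ss_between / ss_total


def omega_squared(ss_between, ss_total, df_between, ms_within):
    """Less biased estimate of the same share in the population."""
    return (ss_between - df_between * ms_within) / (ss_total + ms_within)


eta_sq = eta_squared(20.133, 43.733)
omega_sq = omega_squared(20.133, 43.733, df_between=2, ms_within=1.967)

print(f"eta squared   = {eta_sq:.2f}")    # 0.46
print(f"omega squared = {omega_sq:.2f}")  # 0.35
```

Under these definitions roughly 46% of the variation in lovelovelove is associated with dosage in this sample, and omega squared gives a more conservative estimate of about 35%.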

  • Thank you wizard trainees

    for listening