Learn to Use Factorial Analysis of
Variance (ANOVA) in SPSS With
Data From the English Health
Survey (Teaching Dataset) (2002)
© 2019 SAGE Publications Ltd. All Rights Reserved.
This PDF has been generated from SAGE Research Methods Datasets.
Student Guide
Introduction
This example introduces Factorial Analysis of Variance (ANOVA). There is a range
of different types of ANOVA tests, including one-way and multiple ANOVAs and
MANOVAs. At their heart, all ANOVA tests compare the mean of one or more
continuous dependent variables (for example, age, weight, or income) across two
or more levels of one or more categorical independent variables (for example,
gender, class, or ethnicity), and they do so by partitioning the variance of the
dependent variable. Factorial ANOVA involves a continuous dependent variable and
two or more categorical independent variables, each with two or more levels.
This example describes Factorial ANOVA, discusses the assumptions underlying it,
and shows how to compute and interpret it. We illustrate Factorial ANOVA using a
subset of data derived from the 2002 English Health Survey (Teaching Dataset).
Specifically, we test the extent to which an individual’s mean waist/hip ratio
differs by sex and marital status: does sex have more of an effect on a
respondent’s waist/hip ratio than marital status (or vice versa), and is there a
significant interaction between sex and marital status in relation to waist/hip
ratio? This page provides links to this sample dataset and a guide to producing
a Factorial ANOVA using statistical software.
What Is Factorial ANOVA?
There are a range of ANOVA tests; the simplest is a One-way ANOVA that
examines the variance of a continuous dependent variable across the
subgroupings of a categorical variable. Typically, Factorial ANOVA refers to an
ANOVA test that has two or more independent categorical variables (or
“factors”); it may also be referred to as a Two-way (two independent
variables) or Three-way (three independent variables) ANOVA. More than three
independent variables are rarely included in an ANOVA because the results
become ever more difficult to interpret, and a multiple linear regression or
similar test would be easier to use. In Factorial ANOVA, variability is attributed
to differences both between groups and within groups; every level of each factor
is paired with every level of the other factor(s). This allows us to identify
interactions between factors; if an interaction is present, it means that the
effect of one factor depends on the level of another factor. A Factorial
ANOVA seeks to compare the influence of at least two independent variables,
each with at least two levels, on a dependent variable. Our example examines
differences in respondent’s mean waist/hip ratio (our dependent variable)
both within and between sex (male/female) and marital status (single/married-
cohabiting/previously married now single), our two independent variables. In a
One-way ANOVA, we would simply look at the difference in mean waist/hip ratio
between men and women. Our Factorial ANOVA allows us to examine several issues.
First, is there a main effect of sex on waist/hip ratio, a main effect of
marital status, or both? Second, is there a significant interaction between the
factors: do the differences in waist/hip ratio between men and women vary across
the levels of marital status? For example, might married men have a greater mean
waist/hip ratio than both married women and single men?
When computing formal statistical tests, it is customary to define the null
hypothesis (H0) to be tested. In a two-way Factorial ANOVA there are three null
hypotheses: H0a, that there is no difference in respondent’s mean waist/hip
ratio based on sex (male/female); H0b, that there is no difference in
respondent’s mean waist/hip ratio based on marital status (single/married-
cohabiting/previously married now single); and H0c, that there is no
interaction between sex and marital status in terms of respondent’s
waist/hip ratio. Some difference is expected simply due to sampling error, i.e.,
random chance in sampling. The Factorial ANOVA conducted here will help us
determine whether the differences between the group means are large enough,
relative to the variation within the groups, to have been sufficiently unlikely
to have arisen by chance alone, so that we may declare the test statistically
significant. “Large enough,” as usually defined, is a test statistic with a
level of statistical significance, or p-value, of less than .05. (That is,
differences this large or larger would emerge less than 5% of the time by chance
alone if the null hypothesis were true, under the maintained assumptions of the
test.) This would lead us to reject all or some of our null hypotheses and
conclude that the mean of the (dependent) variable likely differs between the
categories of the (independent) categorical variable(s) being tested.
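One common way to write the model behind these three hypotheses is the two-way effects model sketched below. The symbols are standard textbook notation rather than notation used in this guide, so they should be read as assumptions: $Y_{ijk}$ is the waist/hip ratio of respondent $k$ in sex group $i$ and marital status group $j$, $\mu$ is the grand mean, $\alpha_i$ and $\beta_j$ are the main effects, $(\alpha\beta)_{ij}$ is the interaction, and $\varepsilon_{ijk}$ is random error.

```latex
Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}

H_{0a}:\ \alpha_i = 0 \ \text{for all } i
\qquad
H_{0b}:\ \beta_j = 0 \ \text{for all } j
\qquad
H_{0c}:\ (\alpha\beta)_{ij} = 0 \ \text{for all } i, j
```

Each null hypothesis therefore says that one set of terms contributes nothing to the group means, which is why each gets its own F-test.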
Calculating a Factorial ANOVA
Factorial ANOVA works by using the F-distribution, which is an asymmetric
distribution related to the Chi-Square distribution. In Factorial ANOVA, an
F-statistic is calculated for each effect by dividing the variance between the
group means by the variance within the groups. This F-statistic can then be
compared to an F critical value to determine whether to reject the null
hypothesis (or not). If the F-statistic that your test has generated is larger
than the F critical value, then you can reject the null hypothesis.
To illustrate how to calculate, by hand, a Factorial ANOVA, let’s imagine that we
have collected some (hypothetical) data. We have collected data on the “level
of domestic contentment” (scored on a scale of 0–10, with 10 being extremely
contented and 0 being extremely discontented) expressed by heterosexual men
and women who are married/cohabit. We also collected data on their domestic
division of labour. Our dependent variable is “level of domestic contentment,”
which is continuous; our independent variables are “sex” (two levels: male/female)
and “domestic division of labour” (three levels: “I do the majority of the
housework,” “Partner does majority of the housework,” and “50/50 housework split
between partner and I”). Table 1 shows the data.
Table 1: “Level of Domestic Contentment”: “Sex” and “Domestic Division
of Labour.”

| Sex | “I do the majority of the housework” | “Partner does the majority of the housework” | “50/50 split” |
|---|---|---|---|
| Men | 3 | 5 | 7 |
| Men | 4 | 3 | 8 |
| Men | 5 | 3 | 7 |
| Men | 3 | 5 | 8 |
| Men | 4 | 5 | 7 |
| Men | 1 | 5 | 7 |
| Women | 1 | 5 | 9 |
| Women | 2 | 4 | 8 |
| Women | 1 | 3 | 9 |
| Women | 1 | 5 | 8 |
| Women | 1 | 5 | 7 |
| Women | 2 | 4 | 9 |
We will test three null hypotheses:
H0a = There will be no difference on level of domestic contentment based
on sex
H0b = There will be no difference on level of domestic contentment based
on domestic division of labour
H0c = There will be no significant interaction between sex and domestic
division of labour in terms of level of domestic contentment
Social scientists generally choose a critical value for a Factorial ANOVA such that
there is a probability of less than 0.05 that the result occurred strictly due to
random chance. Thus, researchers tend to reject the null hypothesis only when the
F-test statistic has a corresponding significance level (p-value) of less than .05.
First, we need to calculate the degrees of freedom for our data. Table 2 shows this
data.
Table 2: Degrees of Freedom.

| Source | Formula | df |
|---|---|---|
| df sex (A) | a − 1 = 2 − 1 | 1 |
| df domestic division (B) | b − 1 = 3 − 1 | 2 |
| df sex × domestic division (A × B) | (a − 1)(b − 1) = (2 − 1)(3 − 1) | 2 |
| df error | N − ab = 36 − (2 × 3) | 30 |
| df total | N − 1 = 36 − 1 | 35 |
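The degrees of freedom in Table 2 follow directly from the number of levels of each factor and the sample size. A minimal sketch of that arithmetic, where the function name `factorial_df` and its argument names are our own illustrative choices:

```python
def factorial_df(a_levels: int, b_levels: int, n_total: int) -> dict:
    """Degrees of freedom for a two-way (a x b) between-subjects ANOVA."""
    return {
        "A": a_levels - 1,                        # first factor (sex)
        "B": b_levels - 1,                        # second factor (division of labour)
        "AxB": (a_levels - 1) * (b_levels - 1),   # interaction
        "error": n_total - a_levels * b_levels,   # within-cell variation
        "total": n_total - 1,
    }


# Values from the worked example: 2 sexes, 3 divisions of labour, 36 respondents.
df = factorial_df(a_levels=2, b_levels=3, n_total=36)
print(df)
```

A useful check is that the effect and error df add up to the total df (1 + 2 + 2 + 30 = 35).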
Now that we have our hypotheses and df, we can locate the F critical
value for each of our hypotheses by consulting a table of the critical values
of the F-distribution. These are available in the appendices of most statistics
textbooks and can be found online easily. Table 3 shows the critical values for
each hypothesis.
Table 3: Critical Values of F.

| Hypothesis | df | Critical value of F |
|---|---|---|
| H0a: There will be no difference on level of domestic contentment based on sex. | 1, 30 | 4.17 |
| H0b: There will be no difference on level of domestic contentment based on domestic division of labour. | 2, 30 | 3.32 |
| H0c: There will be no significant interaction between sex and domestic division of labour in terms of level of domestic contentment. | 2, 30 | 3.32 |
We can now calculate our test statistic F. To do this, we need to do a number
of calculations; firstly, we need to calculate the sum of squares (SS) for each
hypothesis, the error, and the total. We can do this using Equation 1.
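(1)
$$SS_{\text{sex}} = \frac{\sum \left(\sum a_i\right)^2}{(b)(n)} - \frac{T^2}{N}$$

Here $\sum a_i$ is the total score for each level of sex, $b$ is the number of
levels of the other factor, $n$ is the number of respondents in each cell, $T$ is
the grand total of all scores, and $N$ is the total sample size.
Let’s calculate SSsex to illustrate.

$$SS_{\text{sex}} = \frac{93^2 + 84^2}{(3)(6)} - \frac{177^2}{36} = 2.25$$

We can now calculate the other SS using versions of the same equation.

$$SS_{\text{domestic division}} = \frac{30^2 + 53^2 + 94^2}{(2)(6)} - \frac{177^2}{36} = 175.17$$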
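$$SS_{\text{sex} \times \text{domestic division}} = \frac{\sum \left(\sum a_i b_i\right)^2}{n} - \frac{\sum \left(\sum a_i\right)^2}{(b)(n)} - \frac{\sum \left(\sum b_i\right)^2}{(a)(n)} + \frac{T^2}{N}$$

$$SS_{\text{sex} \times \text{domestic division}} = \frac{22^2 + 8^2 + 27^2 + 26^2 + 44^2 + 50^2}{6} - \frac{93^2 + 84^2}{(3)(6)} - \frac{30^2 + 53^2 + 94^2}{(2)(6)} + \frac{177^2}{36} = 17.16$$

$$SS_{\text{total}} = \sum Y^2 - \frac{177^2}{36}$$

$$SS_{\text{total}} = 1081 - \frac{177^2}{36} = 210.75$$

$$SS_{\text{error}} = SS_{\text{total}} - \left(SS_{\text{sex}} + SS_{\text{domestic division}} + SS_{\text{sex} \times \text{domestic division}}\right) = 16.17$$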
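The sums of squares above can be reproduced in a few lines of code. This is a sketch rather than a general-purpose routine: it starts from the six cell totals and the sum of squared scores that appear in the worked equations (it does not re-sum the raw scores in Table 1), and the variable names are our own. The results agree with the hand calculation up to rounding in the final digit.

```python
# Cell totals as used in the worked equations
# (rows: sex; columns: the three divisions of labour).
cell_totals = {
    "men": [22, 27, 44],
    "women": [8, 26, 50],
}
n_per_cell = 6         # respondents in each sex x division-of-labour cell
sum_y_squared = 1081   # sum of squared scores, as given in the SS_total step

a = len(cell_totals)            # levels of sex
b = len(cell_totals["men"])     # levels of division of labour
N = a * b * n_per_cell          # 36 respondents

row_totals = [sum(row) for row in cell_totals.values()]         # 93, 84
col_totals = [sum(col) for col in zip(*cell_totals.values())]   # 30, 53, 94
T = sum(row_totals)                                             # 177
correction = T ** 2 / N                                         # T^2 / N

ss_sex = sum(t ** 2 for t in row_totals) / (b * n_per_cell) - correction
ss_division = sum(t ** 2 for t in col_totals) / (a * n_per_cell) - correction
ss_cells = sum(t ** 2 for row in cell_totals.values() for t in row) / n_per_cell
# Equivalent to cells/n - rows/(bn) - cols/(an) + T^2/N.
ss_interaction = ss_cells - correction - ss_sex - ss_division
ss_total = sum_y_squared - correction
ss_error = ss_total - (ss_sex + ss_division + ss_interaction)

for name, value in [("sex", ss_sex), ("division", ss_division),
                    ("interaction", ss_interaction),
                    ("error", ss_error), ("total", ss_total)]:
    print(f"SS_{name} = {value:.2f}")
```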
Next, we need to calculate our mean squares (MS) which we can do using
Equation 2.
(2)
$$MS = \frac{SS}{df}$$
Table 4 shows all our calculations (including MS).
Table 4: Calculations for SS, df, and MS.

| Source | SS | df | MS |
|---|---|---|---|
| Sex | 2.25 | 1 | 2.25 |
| Domestic division of labour | 175.17 | 2 | 87.59 |
| Interaction | 17.16 | 2 | 8.58 |
| Error | 16.17 | 30 | 0.54 |
| Total | 210.75 | 35 | |
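Now we can calculate our F test statistic, using Equation 3.

(3)
$$F = \frac{MS_{\text{effect}}}{MS_{\text{error}}}$$

$$F_{\text{sex}} = \frac{2.25}{0.54} = 4.166$$

$$F_{\text{domestic division}} = \frac{87.59}{0.54} = 162.20$$

$$F_{\text{sex} \times \text{domestic division}} = \frac{8.58}{0.54} = 15.89$$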
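Equations 2 and 3 are simple ratios, so they are easy to check in code. The sketch below takes the mean squares exactly as printed in Table 4; the helper names `mean_square` and `f_ratio` are our own. Because F for sex sits very close to its critical value of 4.17, the last calculation repeats it with the error mean square left unrounded (16.17 / 30), which shows how sensitive a borderline result can be to rounding.

```python
def mean_square(ss: float, df: int) -> float:
    """Equation 2: a mean square is a sum of squares divided by its df."""
    return ss / df


def f_ratio(ms_effect: float, ms_error: float) -> float:
    """Equation 3: F is the effect mean square over the error mean square."""
    return ms_effect / ms_error


# Mean squares exactly as printed in Table 4 (rounded to two decimals).
ms_table4 = {"sex": 2.25, "division": 87.59, "interaction": 8.58, "error": 0.54}

f_values = {
    effect: f_ratio(ms, ms_table4["error"])
    for effect, ms in ms_table4.items()
    if effect != "error"
}

# The same F for sex, but without rounding the error mean square first.
f_sex_unrounded = f_ratio(mean_square(2.25, 1), mean_square(16.17, 30))

for effect, f in f_values.items():
    print(f"F_{effect} = {f:.2f}")
print(f"F_sex with unrounded MS_error = {f_sex_unrounded:.3f}")
```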
Now that we have calculated our F test statistics and we already know our F
critical values, we can return to our null hypotheses. For H0a (there will be no
difference on level of domestic contentment based on sex), our test statistic is
F = 4.166, which is less than the critical value of F (4.17), and so we fail to
reject the null hypothesis. For H0b (there will be no difference on level of
domestic contentment based on domestic division of labour), our test statistic
is F = 162.20, which is greater than the critical value of F (3.32), and so we
reject the null hypothesis. For H0c (there will be no significant interaction
between sex and domestic division of labour in terms of level of domestic
contentment), our test statistic is F = 15.89, which is greater than the
critical value of F (3.32), and so we reject the null hypothesis.
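The decision rule applied to each hypothesis is the same comparison, which a short sketch makes explicit. The F statistics and critical values are those reported in this example; the function name `decide` is our own.

```python
def decide(f_stat: float, f_crit: float) -> str:
    """Reject H0 only when the F statistic exceeds the critical value."""
    return "reject H0" if f_stat > f_crit else "fail to reject H0"


# F statistics from the hand calculation and critical values from Table 3.
f_stats = {"H0a (sex)": 4.166, "H0b (division)": 162.20, "H0c (interaction)": 15.89}
f_crits = {"H0a (sex)": 4.17, "H0b (division)": 3.32, "H0c (interaction)": 3.32}

decisions = {h: decide(f_stats[h], f_crits[h]) for h in f_stats}

for hypothesis, outcome in decisions.items():
    print(f"{hypothesis}: F = {f_stats[hypothesis]}, "
          f"critical F = {f_crits[hypothesis]} -> {outcome}")
```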
We can summarise our findings by stating that there is no statistically
significant sex difference in level of domestic contentment but that there were
significant differences between the three levels of domestic division of labour.
In addition, an interaction effect was present.
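To see what the interaction looks like, it can help to compare cell means. The sketch below divides the cell totals used in the worked equations by the six respondents per cell and then takes the gap between men and women at each level of division of labour; the variable names are our own. If there were no interaction, the gap would be roughly the same at every level.

```python
# Cell totals from the worked equations; six respondents per cell.
cell_totals = {"men": [22, 27, 44], "women": [8, 26, 50]}
levels = ["I do the majority", "Partner does the majority", "50/50 split"]
n_per_cell = 6

cell_means = {
    sex: [total / n_per_cell for total in totals]
    for sex, totals in cell_totals.items()
}

# Difference in mean contentment (men minus women) at each level.
gaps = [m - w for m, w in zip(cell_means["men"], cell_means["women"])]

for level, gap in zip(levels, gaps):
    print(f"{level}: men - women = {gap:+.2f}")
```

Here the gap changes sign across the levels: men report higher contentment than women when they do most of the housework themselves, but slightly lower contentment under a 50/50 split.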
This example has shown you how to calculate an F-statistic manually, which is
quite cumbersome and only really feasible with small samples. Especially with
larger samples, it is much easier to use statistical software.
Assumptions Behind the Method
All statistical tests rely on some underlying assumptions, and they all are affected
by the type of data that you have.
Assumptions of Factorial ANOVA:
• The dependent variable must be continuous.
• There must be two or more independent variables indicating two or more
categorical groupings that partition the sample.
• There must be independence of observations, so there is no relationship
between the groups or between the observations in each group.
• The dependent variable must be approximately normally distributed within each
group, or the sample must be large enough for a satisfactory asymptotic
approximation.
• The variance of the dependent variable must be equal across the levels of each
factor (homogeneity of variance).
The first three assumptions are not typically testable from the sample data and
are related to the research design. The third assumption is only likely to be
violated if the data were sampled by pairs rather than individuals (e.g., couples
rather than individual persons). It is important to understand how your data were
collected and categorised, as this will help you avoid violating the first three
assumptions. The fourth and fifth assumptions can be easily tested using
statistical software.
Illustrative Example: Sex, Marital Status, and Respondent’s Waist/Hip
Ratio
This example presents a Factorial ANOVA using three variables from the 2002
English Health Survey (Teaching Dataset). Specifically, we test the extent to which
an individual’s mean waist/hip ratio differs by sex and/or marital status;
and whether there is a significant interaction between sex and marital status in
relation to waist/hip ratio.
Thus, this example addresses the following research question:
Does an individual’s waist/hip ratio vary by sex and marital status?
Our model has three null hypotheses:
H0a: There will be no difference in mean waist/hip ratio based on
sex.
H0b: There will be no difference in mean waist/hip ratio based on
marital status.
H0c: There will be no significant interaction between sex and marital
status in terms of waist/hip ratio.
The Data
This example uses a subset of data from the 2002 English Health Survey
(Teaching Dataset). This extract includes 5,123 adult respondents. Please note
that the original dataset is larger than this, but it has been “cleaned” to reduce its
size for ease of analysis and also to include only those who have responded to
our dependent variable. The three variables we examine are:
• Respondent’s sex (sex)
• Marital status including cohabitees (RECODE) (RECODEmarstab)
• Valid Mean Waist/Hip ratio (whval)
The first variable, sex, is coded 1 if a respondent is “male” and 2 if “female.”
The second variable, RECODEmarstab, is coded 1 if respondent is “single,”
2 if “married/cohabiting,” and 3 if “previously married now single.” Given that
our dependent variable, whval, is continuous and our independent variables are
categorical with two or more groups each, then Factorial ANOVA is appropriate for
these data.
Analysing the Data
Before conducting Factorial ANOVA, we should first examine each variable in
isolation; this is called univariate analysis. We start by presenting a frequency
table of sex (Table 5). We can see that there are slightly more females (54.1%) than
males (45.9%).
Table 5: Frequency Distribution of sex.

| | Frequency | Percent | Valid percent | Cumulative percent |
|---|---|---|---|---|
| Male | 2,354 | 45.9 | 45.9 | 45.9 |
| Female | 2,769 | 54.1 | 54.1 | 100.0 |
| Total | 5,123 | 100.0 | 100.0 | |
Table 6 below shows the frequency distribution of RECODEmarstab. We can see
that the majority (66.4%) of respondents are married/cohabiting, while only 14.9%
are “previously married now single.”
Table 6: Frequency Distribution of RECODEmarstab.

| | Frequency | Percent | Valid percent | Cumulative percent |
|---|---|---|---|---|
| Single | 958 | 18.7 | 18.7 | 18.7 |
| Married/cohabiting | 3,402 | 66.4 | 66.4 | 85.1 |
| Previously married now single | 763 | 14.9 | 14.9 | 100.0 |
| Total | 5,123 | 100.0 | 100.0 | |
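The percent and cumulative percent columns in a frequency table are derived from the raw counts. A minimal sketch using the counts for RECODEmarstab reported above (the variable names are our own):

```python
# Frequencies for RECODEmarstab, as reported in the frequency table.
counts = {
    "Single": 958,
    "Married/cohabiting": 3402,
    "Previously married now single": 763,
}

total = sum(counts.values())
rows = []
running = 0
for label, n in counts.items():
    running += n
    percent = round(100 * n / total, 1)
    cumulative = round(100 * running / total, 1)
    rows.append((label, n, percent, cumulative))

for label, n, percent, cumulative in rows:
    print(f"{label:<32}{n:>6}{percent:>7}{cumulative:>7}")
print(f"{'Total':<32}{total:>6}")
```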
Table 7 shows descriptive statistics for whval. The mean waist/hip ratio is
0.86, which is deemed a moderate health risk for women but low health risk for
men.
Table 7: Frequency Distribution of whval.

| Statistic | whval |
|---|---|
| N (valid) | 5,123 |
| N (missing) | 0 |
| Mean | 0.8622 |
| Median | 0.8581 |
| Standard deviation | 0.08718 |
| Range | 0.66 |
| Minimum | 0.57 |
| Maximum | 1.24 |
Tables 5, 6, and 7 show the distribution of the variables individually, but our
interest lies with their relationship with each other.
Prior to running our Factorial ANOVA, we need to test our data for normality and
homogeneity of variance. We can do this in a variety of ways, but a good way to
check normality is to produce Q-Q plots in statistical software, such as IBM®
SPSS®. Figures 1 and 2 show the Q-Q plots for our data.
Figure 1: Q-Q Plots of sex: whval.
Figure 2: Q-Q Plots of RECODEmarstab: whval.
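The idea behind a Q-Q plot can be sketched in a few lines: sort the observations and pair each one with the value a normal distribution would be expected to produce at the same rank. The sample below is hypothetical and invented purely for illustration (it is not drawn from the survey), and the plotting position (i − 0.5)/n is one common convention among several. If the points fall close to a straight line, the data are approximately normal.

```python
from statistics import NormalDist, mean, stdev


def qq_points(values):
    """Pair each sorted observation with the normal quantile expected at its rank."""
    ordered = sorted(values)
    n = len(ordered)
    # Normal distribution with the same mean and standard deviation as the data.
    fitted = NormalDist(mean(ordered), stdev(ordered))
    # (i - 0.5) / n is one common plotting position; other conventions exist.
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Hypothetical waist/hip ratios, invented for illustration only.
sample = [0.78, 0.81, 0.83, 0.85, 0.86, 0.87, 0.89, 0.91, 0.94, 0.98]
points = qq_points(sample)

for expected, observed in points:
    print(f"expected {expected:.3f}  observed {observed:.2f}")
```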
Figures 1 and 2 show that our data are approximately normal. We can test for
homogeneity using Levene’s test of Homogeneity of Variance (Table 8).
Table 8: Levene’s Test of Homogeneity of Variance.

| whval | sex | RECODEmarstab |
|---|---|---|
| Levene statistic (based on mean) | 2.981 | 8.245 |
| df1 | 1 | 2 |
| df2 | 5,121 | 5,120 |
| Sig. | 0.084 | 0.000 |
The results of Levene’s test suggest that the variance of whval is homogeneous
across sex (the test is non-significant) but not across RECODEmarstab, where the
significant result is probably influenced by the large “married/cohabiting”
subgroup. However, if we review the variance of whval within RECODEmarstab, we
can see that the variances are very similar (Table 9).
Table 9: Variance of whval: RECODEmarstab.

| RECODEmarstab | Variance of whval |
|---|---|
| Single | 0.007 |
| Married/cohabiting | 0.008 |
| Previously married now single | 0.006 |
Given that Levene’s test can flag even trivial differences in variance as
significant in large samples, and our variances look similar, we will proceed
with the Factorial ANOVA.
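For readers curious about what Levene’s test actually computes, the mean-based statistic is a one-way ANOVA on the absolute deviations of each observation from its own group mean. The sketch below implements only the statistic, not its p-value (which software takes from an F-distribution with k − 1 and N − k degrees of freedom), and the two small groups are hypothetical values invented for illustration.

```python
from statistics import mean


def levene_statistic(groups):
    """Mean-based Levene statistic W for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group mean.
    z = [[abs(y - mean(g)) for y in g] for g in groups]
    z_group_means = [mean(zi) for zi in z]
    z_grand_mean = sum(sum(zi) for zi in z) / n_total
    between = sum(
        len(zi) * (zm - z_grand_mean) ** 2 for zi, zm in zip(z, z_group_means)
    )
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_group_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical groups: the second has the same spread as the first,
# the third is four times as spread out.
narrow = [1, 2, 3, 4, 5]
shifted = [11, 12, 13, 14, 15]
wide = [2, 6, 10, 14, 18]

w_equal_spread = levene_statistic([narrow, shifted])
w_unequal_spread = levene_statistic([narrow, wide])
print(f"equal spread: W = {w_equal_spread:.3f}")
print(f"unequal spread: W = {w_unequal_spread:.3f}")
```

Groups with identical spread give W = 0 regardless of their means, while a larger W indicates that the spreads differ.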
Conducting Factorial ANOVA
Table 10 presents the findings of our Factorial ANOVA. We can see that our model
is statistically significant (F = 776.436, p < .001); indeed, the Adjusted R Squared
(0.431) suggests that sex and marital status account for 43% of the variance in
waist/hip ratio in this model.
Table 10: Results Factorial Analysis of Variance.
Tests of between-subjects effects

| Source | Type III sum of squares | df | Mean square | F | Sig. |
|---|---|---|---|---|---|
| Corrected model | 16.792* | 5 | 3.358 | 776.436 | .000 |
| Intercept | 2,297.615 | 1 | 2,297.615 | 531,189.947 | .000 |
| RECODEmarstab | 1.670 | 2 | .835 | 193.009 | .000 |
| Sex | 8.149 | 1 | 8.149 | 1,884.023 | .000 |
| RECODEmarstab*sex | .213 | 2 | .106 | 24.585 | .000 |
| Error | 22.133 | 5,117 | .004 | | |
| Total | 3,847.389 | 5,123 | | | |
| Corrected Total | 38.925 | 5,122 | | | |

* R Squared = 0.431 (adjusted R Squared = 0.431).
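The model-level figures in Table 10 can be recovered from its sums of squares and degrees of freedom. The sketch below uses the standard definitions of R Squared, Adjusted R Squared, and the model F; because the table entries are rounded to three decimals, the recomputed F matches the reported 776.436 only approximately.

```python
# Values taken from Table 10.
ss_model = 16.792
ss_error = 22.133
ss_corrected_total = 38.925
df_model, df_error = 5, 5117
n = 5123

# Share of the (corrected) total variation explained by the model.
r_squared = ss_model / ss_corrected_total

# Adjusted R Squared penalises the number of model terms.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_error

# Model F: model mean square over error mean square.
f_model = (ss_model / df_model) / (ss_error / df_error)

print(f"R Squared = {r_squared:.3f}")
print(f"Adjusted R Squared = {adj_r_squared:.3f}")
print(f"Model F = {f_model:.1f}")
```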
From Table 10 we can see that when we control for marital status, there is a
difference in respondent’s waist/hip ratio by sex (F = 1,884.023, p < .001),
causing us to reject H0a, that there will be no difference in mean waist/hip
ratio based on sex. Likewise, when we control for sex, there is a difference in
respondent’s waist/hip ratio by marital status (F = 193.009, p < .001), causing
us to reject H0b, that there will be no difference in mean waist/hip ratio based
on marital status. Finally, we can reject (F = 24.585, p < .001) H0c, that there
will be no significant interaction between sex and marital status in terms of
waist/hip ratio. Figure 3 illustrates this interaction.
Figure 3: Graph of the Interaction Effect of Sex-Marital Status on
Waist/Hip Ratio.
Overall, we can reject all three null hypotheses. Our findings suggest that
respondent’s waist/hip ratio varies by sex and by marital status, even when we
control for the other factor; relatedly, there is a significant interaction
between sex and marital status on respondent’s waist/hip ratio.
Further testing of these data, such as post hoc tests or the calculation of
effect sizes, would generate further detail as to the impact of specific
subgroupings within variables on waist/hip ratios.
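As one example of an effect size, partial eta squared can be computed directly from the sums of squares in Table 10 as SS effect / (SS effect + SS error). This measure is not reported in the guide itself, so the sketch below is offered as an illustration of the kind of follow-up calculation meant here rather than as part of the original analysis.

```python
# Type III sums of squares from Table 10.
ss_error = 22.133
ss_effects = {
    "sex": 8.149,
    "RECODEmarstab": 1.670,
    "RECODEmarstab*sex": 0.213,
}

# Partial eta squared: variation attributable to an effect, relative to
# that effect plus the error variation.
partial_eta_sq = {
    name: ss / (ss + ss_error) for name, ss in ss_effects.items()
}

for name, value in partial_eta_sq.items():
    print(f"{name}: partial eta squared = {value:.3f}")
```

On these figures sex has by far the largest effect, while the interaction, although statistically significant, accounts for very little variation.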
Presenting Results
A Factorial ANOVA can be reported as follows:
“A Factorial ANOVA was conducted to compare the main effects of sex and marital
status and the interaction between sex and marital status on respondent’s
waist/hip ratio.
We used a subset of the 2002 English Health Survey (Teaching Dataset).
H0a = There will be no difference in mean waist/hip ratio based on
sex.
H0b = There will be no difference in mean waist/hip ratio based on
marital status.
H0c = There will be no significant interaction between sex and marital
status in terms of waist/hip ratio.
This extract includes 5,123 adult respondents. A Two-way ANOVA was conducted
on the influence of two independent variables (sex, marital status) on respondent’s
waist/hip ratio. Sex included two levels (male, female) and marital status consisted
of three levels (single, married/cohabiting, previously married now single). All
effects were statistically significant at the 0.05 significance level. The main effect
for sex yielded F (1, 5117) = 1,884.023, p < .001, indicating a significant difference
between males (M = 0.9205, SD = 0.07200) and females (M = 0.8126, SD =
0.0655). The main effect for marital status yielded F (2, 5117) = 193.009, p < .001,
indicating a significant difference between single (M = 0.8298, SD = 0.08330),
married/cohabiting (M = 0.8709, SD = 0.08776) and previously married now single
(M = 0.8642, SD = 0.06699). The interaction effect was significant, F (2, 5117) =
24.585, p < .001.”
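When several effects have to be reported, a small helper can keep the formatting consistent. The sketch below is one possible convention, not a required style: the function name `report_f` is our own, and it reports any p-value below .001 as “p < .001”, which is how a value printed by SPSS as .000 is usually written up.

```python
def report_f(effect: str, df_effect: int, df_error: int,
             f_value: float, p_value: float) -> str:
    """Format one F-test result as 'F(df1, df2) = value, p ...'."""
    if p_value < 0.001:
        p_text = "p < .001"
    else:
        # Drop the leading zero, e.g. 0.084 -> .084.
        p_text = "p = " + f"{p_value:.3f}".lstrip("0")
    return f"{effect}: F({df_effect}, {df_error}) = {f_value:,.3f}, {p_text}"


# F values and df from Table 10; p is passed as printed there (.000).
results = [
    report_f("Sex", 1, 5117, 1884.023, 0.000),
    report_f("Marital status", 2, 5117, 193.009, 0.000),
    report_f("Sex x marital status", 2, 5117, 24.585, 0.000),
]
for line in results:
    print(line)
```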
Review
Factorial ANOVA is a test that seeks to compare the influence of at least two
independent variables, each with at least two levels, on a dependent variable. It
tests the null hypotheses of no difference in means between the
groups.
You should know:
• What types of variables are suited for Factorial ANOVA.
• The basic assumptions underlying this statistical test.
• How to compute and interpret Factorial ANOVA.
• How to report the results of a Factorial ANOVA.
Your Turn
You can download this sample dataset along with a guide showing how to produce
a Factorial ANOVA using statistical software. The sample dataset also includes
another variable called saywgt, which is respondent’s view of own weight. See
whether you can reproduce the results presented here for the sex and
RECODEmarstab variables and then try producing your own Factorial analysis
substituting saywgt for sex in the analysis.