Learn to Use the Eta Coefficient
Test in R With Data From the
NIOSH Quality of Worklife Survey
(2014)
© 2019 SAGE Publications, Ltd. All Rights Reserved.
This PDF has been generated from SAGE Research Methods Datasets.
Student Guide
Introduction
This example dataset introduces the Eta Coefficient test. This test allows
researchers to test the strength of association between an independent variable
that is categorical and a dependent variable that is scale or interval level. This
example describes the Eta Coefficient test, discusses the assumptions underlying
it, and shows how to compute and interpret it. We illustrate the Eta Coefficient
test using a subset of data from the 2014 NIOSH-Quality of Worklife Survey.
Specifically, we test the strength of association between respondent’s sex and
respondent’s income. The Eta Coefficient test allows us to measure the strength
of a nonlinear or curvilinear association; in other words, it is a test for correlation
between a categorical and a scale variable. Because categorical data by its nature
cannot exist in a truly linear relationship with scale data, we cannot use the typical
measure of linear association, Pearson’s Correlation Coefficient. However, the Eta
Coefficient can test for correlation in curvilinear or nonlinear relationships.
This page provides links to this sample dataset and a guide to producing the Eta
Coefficient test using statistical software.
What Is an Eta Coefficient Test?
An Eta Coefficient test is a method for determining the strength of association
between a categorical variable (e.g., sex, occupation, ethnicity), typically the
independent variable, and a scale- or interval-level variable (e.g., income, weight,
test score), typically the dependent variable. Because the Eta Coefficient is
asymmetric, unlike Pearson’s Correlation Coefficient, it is important to identify
clearly which variable is independent and which is dependent. This test can be used
to test for strength of linear association, but it would be more appropriate to use
the Pearson’s Correlation Coefficient. The Eta Coefficient test is most typically
used for testing the strength of nonlinear association between a categorical and
a scale variable. When computing formal statistical tests, it is customary to define
the null hypothesis (H0) to be tested. In this case, the standard null hypothesis
is that there will be no association between the two variables. Some difference
in association is expected simply due to sampling error, i.e., random chance
in sampling. The Eta Coefficient test conducted here is designed to help us
determine whether the difference is large enough to declare the test statistically
significant. “Large enough” is defined in this guide as an Eta Coefficient test
statistic greater than 0.0. This would lead us to reject
the null hypothesis (H0) of no association between the two variables.
Calculating an Eta Coefficient Test
The Eta Coefficient test has similarities to two other statistical tests: the One-Way
ANOVA and Pearson’s Correlation Coefficient. The Eta Coefficient test statistic
is calculated in a way that is very similar to that of a One-Way ANOVA, the
key difference being that the Eta Coefficient’s equation does not use the
error sum of squares directly; it uses the between and total sums of squares.
We interpret the Eta Coefficient test statistic in much
the same way that we would the Pearson’s Correlation Coefficient; in fact, we
use the Pearson’s Correlation Coefficient scale. The value of the Eta Coefficient
test statistic will always be greater than or equal to, and never less than, the
absolute value of the corresponding Pearson’s Correlation Coefficient; thus, the
Pearson’s Correlation Coefficient
scale of strength of association can be used to interpret the Eta Coefficient test
statistic. However, it should be noted that the Eta Coefficient is not
calculated in the same way as the Pearson’s Correlation Coefficient. A value of
0.0 means that our variables are not associated.
To illustrate, let’s imagine that we have nine participants who were tested on a short
physical endurance test (scored out of 20). The participants were categorized by
their drink of choice prior to the test: “water,” “caffeine drink,” and “milk.”
Table 1 shows the results below.
Table 1: Test Results by Drink Category.

| | Water | Caffeine drink | Milk |
|---|---|---|---|
| Test score | 6 | 10 | 10 |
| | 4 | 6 | 12 |
| | 2 | 8 | 14 |
| Total | 12 | 24 | 36 |
| Mean | 4 | 8 | 12 |
| Grand mean (the mean of the means) | 8 | | |
Table 1 shows that test scores clearly vary with the drink
consumed prior to the test, which suggests that there is an association between
drink and test score. However, we need to ascertain how strong this association
actually is.
Equation 1 presents the formula for the Eta Coefficient test:

$$\eta = \sqrt{\frac{SS_B}{SS_T}} \tag{1}$$

where:
• $SS_B$ = between sum of squares
• $SS_T$ = total sum of squares

Equation 2 presents the equation partially calculated:

$$\eta = \sqrt{\frac{(4-8)^2 + (4-8)^2 + (4-8)^2 + (8-8)^2 + (8-8)^2 + (8-8)^2 + (12-8)^2 + (12-8)^2 + (12-8)^2}{(6-8)^2 + (4-8)^2 + (2-8)^2 + (10-8)^2 + (6-8)^2 + (8-8)^2 + (10-8)^2 + (12-8)^2 + (14-8)^2}} \tag{2}$$

Equation 3 presents the equation fully calculated:

$$\eta = \sqrt{\frac{96}{120}} = 0.89 \tag{3}$$
Our Eta Coefficient test statistic (η) is 0.89. Because this figure is above 0.0
on the Pearson’s Correlation Coefficient scale, we can determine that there
is an association between type of drink consumed prior to the test and test
performance.
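The calculation in Equations 1 to 3 can be reproduced with a few lines of code. The following is a minimal Python sketch, not the guide's own software procedure; the function name `eta_coefficient` and the dictionary layout are illustrative assumptions. It uses the mean of all scores as the grand mean, which equals the mean of the group means here because the three groups are the same size.

```python
from math import sqrt

# Test scores from Table 1, grouped by drink consumed before the test
scores = {
    "water": [6, 4, 2],
    "caffeine drink": [10, 6, 8],
    "milk": [10, 12, 14],
}


def eta_coefficient(groups):
    """Return eta = sqrt(SSB / SST) for a dict mapping category -> list of scores.

    Hypothetical helper written for this illustration.
    """
    all_scores = [x for values in groups.values() for x in values]
    grand_mean = sum(all_scores) / len(all_scores)
    # Between sum of squares: each group mean's squared distance from the
    # grand mean, counted once per observation in the group
    ss_between = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Total sum of squares: each score's squared distance from the grand mean
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    return sqrt(ss_between / ss_total)


eta = eta_coefficient(scores)
print(round(eta, 2))  # 0.89
```

The between and total sums of squares come out as 96 and 120, matching Equation 3.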
Table 2 presents the Pearson’s Correlation Coefficient scale. It should be noted
that because we are working with nonlinear or curvilinear data, we cannot
determine the direction of the association between our two variables; thus, we
cannot talk about positive or negative associations, as we could when working
with linear correlations. Therefore, the Pearson Correlation Coefficient scale
below has been amended to reflect this. Anything above 0.0 indicates an
association, but we should use the figure of 0.2 as our minimum level for
acceptance of an association.
Table 2: The Pearson’s Correlation Coefficient Scale for Use With the Eta
Coefficient.

| Pearson’s correlation coefficient | Interpretation |
|---|---|
| 0.00 | No association between the two variables |
| 0.01–0.19 | No or negligible association between the variables |
| 0.20–0.39 | Weak association between the variables |
| 0.40–0.69 | Medium association between the variables |
| 0.70–1.00 | Strong association between the variables |
Our Eta Coefficient test statistic (η) is 0.89, which we can determine, based on
Table 2, to mean that there is a strong association between our two variables.
We can reject the H0; in other words, there is an association between the two
variables. Moreover, by reviewing the test scores in Table 1, we might further
suggest that milk is the drink that produced the best test performance scores
(Mean test score = 12).
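The lookup against Table 2 can be written as a small function. This is a Python sketch under one stated assumption: Table 2 leaves small gaps between its bands (for example between 0.19 and 0.2), so the bands are treated here as half-open intervals starting at 0.2, 0.4, and 0.7. The function name `interpret_eta` is hypothetical.

```python
def interpret_eta(eta):
    """Map an Eta Coefficient to the interpretation bands of Table 2.

    Assumption: bands are half-open, e.g. weak covers 0.2 <= eta < 0.4.
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie between 0.0 and 1.0")
    if eta == 0.0:
        return "no association"
    if eta < 0.2:
        return "no or negligible association"
    if eta < 0.4:
        return "weak association"
    if eta < 0.7:
        return "medium association"
    return "strong association"


print(interpret_eta(0.89))  # strong association
```

Because the scale has been amended to drop direction, the function accepts only values from 0.0 to 1.0 and rejects negative input.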
Assumptions Behind the Method
Nearly every statistical test relies on some underlying assumptions, and they all
are affected by the type of data that you have.
Assumptions of the Eta Coefficient test
• The data must be nonlinear or curvilinear
• The data must be asymmetric
• The dependent variable should be scale or interval level
• The independent variable should be categorical with two or more
categories
• There must be independence of observations, so there is no relationship
between the groups or between the observations in each group.
The first two assumptions can be tested easily in most statistical software
programs. Assumptions three and four are not typically testable from the sample
data and are related to the research design. The last assumption is only likely
to be violated if the data were sampled by pairs rather than individuals (e.g.,
couples rather than individual persons). It is important to understand how your
data were collected and categorized; this will help you avoid violating the first two
assumptions.
Illustrative Example: Is There an Association Between Sex and
Respondent’s Income?
This example presents an Eta Coefficient test using two variables from the 2014
NIOSH-Quality of Worklife Survey. Specifically, we test whether there is an
association between respondent’s sex and respondent’s income.
Thus, this example addresses the following research question:
Is there a gender difference in income earned?
Stated in the form of a null hypothesis:
H0 = There will be no association between sex and respondent’s income.
It should be noted that this hypothesis is two-tailed.
The Data
This example uses a subset of data from the 2014 NIOSH-Quality of Worklife
Survey. This extract includes 30,865 respondents, which is a large sample. It
should be noted that the original dataset is larger than this, but it has been
“cleaned” to include only those who have responded to our dependent variable.
The two variables we examine are:
• Respondent’s sex (sex)
• Respondent’s income in constant dollars (conrinc)
The first variable, Respondent’s sex (sex), is coded 1 if male and 2 if female.
Respondent’s income (conrinc) is scale and therefore not coded. We treat sex as
categorical and conrinc as scale in line with common practice in social science
research.
Analysing the Data
Because our independent variable is categorical, we know that our data can be
neither symmetrical nor linear. Before conducting the Eta Coefficient test, we
should first examine each variable in isolation. We start with the frequency
distribution of sex in Table 3, which shows that there are
slightly more males (51.6%) than females (48.4%) in the sample.
Table 3: Frequency Distribution of sex.

| | | Frequency | Percent | Valid percent | Cumulative percent |
|---|---|---|---|---|---|
| Valid | Male | 19,000 | 51.6 | 51.6 | 51.6 |
| | Female | 17,805 | 48.4 | 48.4 | 100.0 |
| | Total | 36,805 | 100.0 | 100.0 | |
Table 4 shows the frequency distribution of conrinc. The income Range is large
($434,243), with a Median income of $26,754. Roughly two thirds of respondents’
incomes would be expected to fall between $721.88 and $66,971.50 (the Mean plus
or minus one Standard deviation). The figures suggest a skewed distribution,
caused by some very low and very high incomes; this is confirmed by Figure 1.
Table 4: Frequency Distribution of conrinc: Income.

| Statistic | Value |
|---|---|
| N (valid) | 30,865 |
| N (missing) | 5,940 |
| Mean | 33,846.68 |
| Median | 26,754.00 |
| Mode | 39,695 |
| Standard deviation | 33,124.796 |
| Range | 434,243 |
| Minimum | 370 |
| Maximum | 434,612 |
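The income interval quoted in the text appears to be the mean plus or minus one standard deviation. The short Python sketch below recomputes it from the Table 4 values and compares the mean with the median as a rough indicator of skew; the variable names are illustrative.

```python
# Summary statistics for conrinc as reported in Table 4
mean_income = 33_846.68
sd_income = 33_124.796
median_income = 26_754.00

# One standard deviation either side of the mean
lower = mean_income - sd_income
upper = mean_income + sd_income
print(f"{lower:,.2f} to {upper:,.2f}")  # 721.88 to 66,971.48

# A mean well above the median is a common sign of a right-skewed distribution
print(f"mean minus median: {mean_income - median_income:,.2f}")
```

The upper bound differs from the $66,971.50 in the text only by rounding.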
Figure 1: Histogram of conrinc: Income.
Tables 3 and 4 show the distribution of each of these variables by themselves, but
they cannot tell us whether the two variables are related. Table 5 below shows the
measures of central tendency for income broken down by sex.
Table 5: Descriptives for Income: Sex.

| Sex | Measure | Statistic | Standard error |
|---|---|---|---|
| Male | Mean | 42,212.32 | 302.418 |
| | 95% Confidence interval for mean: Lower bound | 41,619.54 | |
| | 95% Confidence interval for mean: Upper bound | 42,805.09 | |
| | 5% Trimmed mean | 37,829.52 | |
| | Median | 34,487.00 | |
| | Variance | 1,457,000,319 | |
| | Range | 434,243 | |
| Female | Mean | 24,922.55 | 193.516 |
| | 95% Confidence interval for mean: Lower bound | 24,543.24 | |
| | 95% Confidence interval for mean: Upper bound | 25,301.86 | |
| | 5% Trimmed mean | 22,477.30 | |
| | Median | 20,129.00 | |
| | Variance | 559,252,386.40 | |
| | Range | 434,243 | |
We can see from Table 5 that there appears to be a clear difference in the
distribution of income by gender; the Median income for males is $34,487 and for
females is $20,129.
We are now ready to run the Eta Coefficient test to determine whether our
variables do have a statistically significant association, as our preliminary analysis
suggests.
Conducting the Eta Coefficient Test
Table 6 presents the results of the Eta Coefficient test. The Eta Coefficient test
statistic (η) is 0.261, which is above 0.2, the minimum level for accepting an
association between our variables, according to Table 2. We can therefore reject
our null hypothesis that there will be no association between sex and income.
However, because η falls in the lowest accepted band of Table 2, we interpret it
as meaning that sex and income have only a weak association with each other.
Table 6: Results of the Eta Coefficient Test.

| | | Value |
|---|---|---|
| Nominal by interval | Eta (Respondent income in constant dollars) | 0.261 |
| N of valid cases | | 30,865 |
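The value in Table 6 can be cross-checked from the group summaries in Table 5 without the raw data. The Python sketch below rests on one assumption that the guide does not state: the number of valid cases per sex is recovered from the reported standard error, using SE = sqrt(variance / n). The total sum of squares is then built as the between plus the within sum of squares.

```python
from math import sqrt

# Group summaries for conrinc from Table 5: (mean, variance, standard error)
groups = {
    "male": (42_212.32, 1_457_000_319.0, 302.418),
    "female": (24_922.55, 559_252_386.40, 193.516),
}

# Assumption: group sizes are not reported, so estimate n = variance / SE^2
sizes = {sex: round(var / se ** 2) for sex, (_, var, se) in groups.items()}
n_total = sum(sizes.values())

# Grand mean weighted by group size
grand_mean = sum(sizes[sex] * groups[sex][0] for sex in groups) / n_total

# Between sum of squares from the group means
ss_between = sum(sizes[sex] * (groups[sex][0] - grand_mean) ** 2 for sex in groups)
# Within sum of squares from the group variances
ss_within = sum((sizes[sex] - 1) * groups[sex][1] for sex in groups)

eta = sqrt(ss_between / (ss_between + ss_within))
print(n_total, round(eta, 3))
```

Under this assumption the recovered group sizes should add up to the 30,865 valid cases, and η should land close to the 0.261 reported in Table 6.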
It is possible to calculate the amount of variance in income attributed to sex
by calculating Eta Squared (η²), which is done by simply squaring the Eta
Coefficient test statistic (η). Thus, our η² result would be 0.07, which we then
convert to a percentage (7%). Only 7% of the variance of income can be attributed to
sex; in other words, it is a weak effect, again confirming our interpretation of the
Eta Coefficient test.
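The squaring step is simple enough to show directly. This Python sketch uses the η reported in Table 6 and formats the result both as a proportion and as a percentage.

```python
eta = 0.261  # Eta Coefficient reported in Table 6

# Eta Squared: the proportion of variance in income attributed to sex
eta_squared = eta ** 2
print(f"eta squared = {eta_squared:.3f} ({eta_squared:.0%} of variance)")
# eta squared = 0.068 (7% of variance)
```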
Presenting Results
An Eta Coefficient test can be reported as follows:
“We used a subset of data from the NIOSH-Quality of Worklife Survey (2014) dataset
to test whether there is an association between sex and income. Thus, we tested
the following null hypothesis:
H0 = There will be no association between sex and respondent’s income.
The data included 30,865 adults. There was a significant association between sex
and income, η = 0.261, η² = 0.07, which suggests a weak association between
the variables. This leads us to reject the null hypothesis of no association between
sex and income; sex accounts for 7% of the variance in income.”
Review
The Eta Coefficient test is a statistical test used to evaluate the strength of
association between a categorical variable and a scale- or interval-level variable.
You should know:
• What types of variables are suited for an Eta Coefficient test.
• The basic assumptions underlying this statistical test.
• How to compute and interpret an Eta Coefficient test.
• How to report the results of an Eta Coefficient test.
Your Turn
You can download this sample dataset along with a guide showing how to produce
an Eta Coefficient test using statistical software. The sample dataset also includes
another variable called rrrace, which is the respondent’s race. See whether you
can reproduce the results presented here for the sex variable, and then try
producing your own Eta Coefficient test substituting rrrace for sex in the analysis.