Learn to Use the Eta Coefficient Test in R With Data From the NIOSH Quality of Worklife Survey (2014)

© 2019 SAGE Publications, Ltd. All Rights Reserved.

This PDF has been generated from SAGE Research Methods Datasets.


Student Guide

Introduction

This example dataset introduces the Eta Coefficient test. This test allows researchers to measure the strength of association between a categorical independent variable and a scale- or interval-level dependent variable. This example describes the Eta Coefficient test, discusses the assumptions underlying it, and shows how to compute and interpret it. We illustrate the Eta Coefficient test using a subset of data from the 2014 NIOSH-Quality of Worklife Survey. Specifically, we test the strength of association between respondent’s sex and respondent’s income. The Eta Coefficient test allows us to measure the strength of a nonlinear or curvilinear association; in other words, it is a test for correlation between a categorical and a scale variable. Because categorical data by their nature cannot exist in a truly linear relationship with scale data, we cannot use the typical measure of linear association, Pearson’s Correlation Coefficient. The Eta Coefficient, however, can measure correlation in curvilinear or nonlinear relationships.

This page provides links to this sample dataset and a guide to producing the Eta Coefficient test using statistical software.

What Is an Eta Coefficient Test?


An Eta Coefficient test is a method for determining the strength of association between a categorical variable (e.g., sex, occupation, ethnicity), typically the independent variable, and a scale- or interval-level variable (e.g., income, weight, test score), typically the dependent variable. Because the Eta Coefficient is asymmetric, unlike Pearson’s Correlation Coefficient, it is important to identify clearly which is your independent variable and which is your dependent variable. This test can be used to measure the strength of a linear association, but for that purpose Pearson’s Correlation Coefficient is more appropriate. The Eta Coefficient test is most typically used for measuring the strength of a nonlinear association between a categorical and a scale variable. When computing formal statistical tests, it is customary to define the null hypothesis (H0) to be tested. In this case, the standard null hypothesis is that there will be no association between the two variables. Some association is expected in any sample simply due to sampling error, i.e., random chance in sampling. The Eta Coefficient test conducted here is designed to help us determine whether the association is large enough to declare the test statistically significant. In this guide, “large enough” is judged by the size of the Eta Coefficient test statistic: a value greater than 0.0 indicates some association, and 0.2 is used as the minimum value for accepting an association (see Table 2). (Strictly, the statistical significance of eta is assessed with the F test from the corresponding One-Way ANOVA.) An association that meets this criterion leads us to reject the null hypothesis (H0) of no association between the two variables.

Calculating an Eta Coefficient Test

The Eta Coefficient test has similarities to two other statistical tests: the One-Way ANOVA and Pearson’s Correlation Coefficient. The Eta Coefficient test statistic is calculated in a way that is very similar to that of a One-Way ANOVA, the key difference being that the Eta Coefficient compares the between-groups sum of squares with the total sum of squares, rather than with the within-groups (error) sum of squares, which the ANOVA F statistic uses. We interpret the Eta Coefficient test statistic in much the same way that we would interpret Pearson’s Correlation Coefficient; in fact, we use the Pearson’s Correlation Coefficient scale. The value of the Eta Coefficient test statistic will always be greater than or equal to the absolute value of the corresponding Pearson’s Correlation Coefficient (computed with the categories given numeric codes); thus, the Pearson’s Correlation Coefficient scale of strength of association can be used to interpret the Eta Coefficient test statistic. However, it should be noted that the Eta Coefficient is not calculated in the same way as Pearson’s Correlation Coefficient. The Eta Coefficient ranges from 0.0 to 1.0; a value of 0.0 means that our variables are not associated.
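A short Python sketch (the guide itself works in R; Python is used here only to make the arithmetic explicit) can check the claim that eta is never smaller than the absolute value of Pearson’s r. It assumes the nine drink scores from Table 1 and gives the drink categories arbitrary numeric codes, which is what a Pearson calculation on a categorical variable would require. The function names are illustrative, not from any library.

```python
import math


def eta(groups):
    """Eta = sqrt(SS_between / SS_total) for a list of groups of scores."""
    scores = [y for g in groups for y in g]
    grand_mean = sum(scores) / len(scores)
    ss_total = sum((y - grand_mean) ** 2 for y in scores)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    return math.sqrt(ss_between / ss_total)


def pearson_r(xs, ys):
    """Pearson's r between two equal-length numeric sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_for_coding(groups, codes):
    """Pearson's r after giving each group an (arbitrary) numeric code."""
    xs = [c for g, c in zip(groups, codes) for _ in g]
    ys = [y for g in groups for y in g]
    return pearson_r(xs, ys)


water, caffeine, milk = [6, 4, 2], [10, 6, 8], [10, 12, 14]
groups = [water, caffeine, milk]

e = eta(groups)
r_abc = r_for_coding(groups, [1, 2, 3])  # water=1, caffeine drink=2, milk=3
r_acb = r_for_coding(groups, [1, 3, 2])  # water=1, caffeine drink=3, milk=2
print(round(e, 3), round(abs(r_abc), 3), round(abs(r_acb), 3))  # 0.894 0.894 0.447
```

With the first coding the three group means (4, 8, 12) happen to fall on a straight line, so |r| equals η. Reordering the codes lowers |r| to about 0.447, while η is unchanged because it does not depend on how the categories are ordered.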

To illustrate, let’s imagine that we have nine participants who were tested on a short physical endurance test (scored out of 20). The participants were categorized by their drink of choice prior to the test: “water,” “caffeine drink,” and “milk,” with three participants in each category. Table 1 shows the results.

Table 1: Test Results by Drink Category.

                 Water   Caffeine drink   Milk
Test scores          6               10     10
                     4                6     12
                     2                8     14
Total               12               24     36
Mean                 4                8     12

Grand mean (the mean of the means): 8
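As a quick check on Table 1, the Python sketch below (scores typed in by hand) recomputes each group’s total and mean, and compares the “mean of the means” with the mean of all nine scores. The two agree here only because every drink group has the same number of participants; with unequal group sizes, the grand mean used in the sums of squares should be the mean of all observations.

```python
# Scores from Table 1, three participants per drink
scores = {
    "water": [6, 4, 2],
    "caffeine drink": [10, 6, 8],
    "milk": [10, 12, 14],
}

totals = {drink: sum(s) for drink, s in scores.items()}
means = {drink: sum(s) / len(s) for drink, s in scores.items()}

# "Mean of the means" (as in Table 1) versus the mean of all nine scores
mean_of_means = sum(means.values()) / len(means)
all_scores = [y for s in scores.values() for y in s]
overall_mean = sum(all_scores) / len(all_scores)

print(totals)                       # {'water': 12, 'caffeine drink': 24, 'milk': 36}
print(means)                        # {'water': 4.0, 'caffeine drink': 8.0, 'milk': 12.0}
print(mean_of_means, overall_mean)  # 8.0 8.0
```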

Table 1 shows a clear difference in test scores across the drinks consumed prior to the test, which suggests that there is an association between drink and test score. However, we need to ascertain how strong this association actually is.

Equation 1 presents the formula for the Eta Coefficient test:

$$\eta = \sqrt{\frac{SS_B}{SS_T}} \tag{1}$$

where:

• $SS_B$ = between sum of squares

• $SS_T$ = total sum of squares
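For reference, the two sums of squares in Equation 1 can be written out explicitly using the standard one-way ANOVA definitions (SSB and SST written as $SS_B$ and $SS_T$). Here $k$ is the number of categories, $n_j$ the number of cases in category $j$, $\bar{y}_j$ the mean of category $j$, $\bar{y}$ the grand mean of all observations, and $y_{ij}$ the $i$-th score in category $j$:

$$SS_B = \sum_{j=1}^{k} n_j \left(\bar{y}_j - \bar{y}\right)^2, \qquad SS_T = \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left(y_{ij} - \bar{y}\right)^2$$

$$SS_T = SS_B + SS_W, \qquad SS_W = \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left(y_{ij} - \bar{y}_j\right)^2$$

Because $SS_B$ can never exceed $SS_T$, $\eta$ always lies between 0 and 1.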

Equation 2 presents the equation partially calculated:

$$\eta = \sqrt{\frac{(4-8)^2 + (4-8)^2 + (4-8)^2 + (8-8)^2 + (8-8)^2 + (8-8)^2 + (12-8)^2 + (12-8)^2 + (12-8)^2}{(6-8)^2 + (4-8)^2 + (2-8)^2 + (10-8)^2 + (6-8)^2 + (8-8)^2 + (10-8)^2 + (12-8)^2 + (14-8)^2}} \tag{2}$$

Equation 3 presents the equation fully calculated:

$$\eta = \sqrt{\frac{96}{120}} \approx 0.89 \tag{3}$$
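The arithmetic in Equations 2 and 3 can be reproduced with a few lines of Python (an illustrative sketch; the variable names are ours). Each score contributes its group mean to the numerator and its raw value to the denominator, following the layout of Equation 2.

```python
import math

# Table 1 scores grouped by drink
groups = {"water": [6, 4, 2], "caffeine drink": [10, 6, 8], "milk": [10, 12, 14]}

all_scores = [y for g in groups.values() for y in g]
grand_mean = sum(all_scores) / len(all_scores)  # 8.0

# Numerator of Equation 2: every score is replaced by its group mean
ss_between = 0.0
for g in groups.values():
    group_mean = sum(g) / len(g)
    ss_between += sum((group_mean - grand_mean) ** 2 for _ in g)

# Denominator of Equation 2: every raw score against the grand mean
ss_total = sum((y - grand_mean) ** 2 for y in all_scores)

eta = math.sqrt(ss_between / ss_total)
print(ss_between, ss_total, round(eta, 2))  # 96.0 120.0 0.89
```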

Our Eta Coefficient test statistic (η) is 0.89. Because this figure is above 0.0 on the Pearson’s Correlation Coefficient scale, we can conclude that there is an association between the type of drink consumed prior to the test and test performance.

Table 2 presents the Pearson’s Correlation Coefficient scale. It should be noted that because we are working with nonlinear or curvilinear data, we cannot determine the direction of the association between our two variables; thus, we cannot talk about positive or negative associations, as we could when working with linear correlations (the Eta Coefficient itself is never negative). Therefore, the Pearson’s Correlation Coefficient scale below has been amended to reflect this. Any value above 0.0 indicates an association, but we should use 0.2 as our minimum level for accepting an association.

Table 2: The Pearson’s Correlation Coefficient Scale for Use With the Eta Coefficient.

Pearson’s correlation coefficient   Interpretation
0.00                                No association between the two variables
0.01–0.19                           No or negligible association between the variables
0.20–0.39                           Weak association between the variables
0.40–0.69                           Medium association between the variables
0.70–1.00                           Strong association between the variables
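The bands in Table 2 can be encoded as a small lookup function. This is a hypothetical Python helper, not part of any R or Python package; it treats each band as running up to (but not including) the next band’s lower limit, an assumption that closes the small gaps such as 0.19 to 0.20 in the printed table.

```python
def interpret_eta(value):
    """Return the Table 2 label for a coefficient between 0 and 1."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("eta must lie between 0 and 1")
    if value == 0.0:
        return "No association"
    if value < 0.20:  # 0.01-0.19 band
        return "No or negligible association"
    if value < 0.40:  # 0.20-0.39 band
        return "Weak association"
    if value < 0.70:  # 0.40-0.69 band
        return "Medium association"
    return "Strong association"  # 0.70-1.00 band


print(interpret_eta(0.89))   # Strong association
print(interpret_eta(0.261))  # Weak association
```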

Our Eta Coefficient test statistic (η) is 0.89, which, based on Table 2, indicates a strong association between our two variables. We can therefore reject H0; in other words, there is an association between the two variables. Moreover, by reviewing the test scores in Table 1, we might further suggest that milk was the drink associated with the best test performance (mean test score = 12).

Assumptions Behind the Method

Nearly every statistical test relies on some underlying assumptions, and whether they hold depends on the type of data that you have.

Assumptions of the Eta Coefficient test

• The data must be nonlinear or curvilinear

• The relationship must be asymmetric; that is, one variable must be clearly identified as the independent variable and the other as the dependent variable

• The dependent variable should be scale or interval level

• The independent variable should be categorical with two or more categories

• There must be independence of observations, so there is no relationship between the groups or between the observations in each group.

The first two assumptions can be tested easily in most statistical software programs. Assumptions three and four are not typically testable from the sample data and are related to the research design. The last assumption is only likely to be violated if the data were sampled by pairs rather than individuals (e.g., couples rather than individual persons). It is important to understand how your data were collected and categorized; this will help you avoid violating the first two assumptions.

Illustrative Example: Is There an Association Between Sex and Respondent’s Income?

This example presents an Eta Coefficient test using two variables from the 2014 NIOSH-Quality of Worklife Survey. Specifically, we test whether there is an association between respondent’s sex and respondent’s income.

Thus, this example addresses the following research question:

Is there a gender difference in income earned?

Stated in the form of a null hypothesis:

H0 = There will be no association between sex and respondent’s income.

It should be noted that this hypothesis is two-tailed.

The Data

This example uses a subset of data from the 2014 NIOSH-Quality of Worklife Survey. This extract includes 30,865 respondents, which is a large sample. It should be noted that the original dataset is larger than this, but it has been “cleaned” to include only those who have responded to our dependent variable.

The two variables we examine are:

• Respondent’s sex (sex)

• Respondent’s income in constant dollars (conrinc)

The first variable, respondent’s sex (sex), is coded 1 if male and 2 if female. Respondent’s income (conrinc) is a scale variable and therefore is not coded. We treat sex as categorical and conrinc as scale, in line with common practice in social science research.
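Before computing eta, the coded sex variable needs readable labels, and respondents with a missing income must be set aside. The Python sketch below shows one way to do this. The rows are made up for illustration (they are not survey records), and representing a missing conrinc value as None is an assumption about how the extract is loaded.

```python
# Hypothetical rows for illustration only: (sex code, conrinc in dollars or None)
rows = [(1, 42000.0), (2, 18500.0), (2, None), (1, 61000.0), (2, 27250.0)]

# Coding of sex as described in the guide: 1 = male, 2 = female
SEX_LABELS = {1: "male", 2: "female"}


def income_by_sex(rows):
    """Group non-missing incomes under readable sex labels."""
    groups = {label: [] for label in SEX_LABELS.values()}
    for sex_code, income in rows:
        if income is None:  # no answer on the dependent variable: leave it out
            continue
        groups[SEX_LABELS[sex_code]].append(income)
    return groups


print(income_by_sex(rows))
# {'male': [42000.0, 61000.0], 'female': [18500.0, 27250.0]}
```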

Analysing the Data

Because our independent variable is categorical, we know that our data can be neither symmetrical nor linear. Before conducting the Eta Coefficient test, we should first examine each variable in isolation. We start by presenting a frequency distribution of sex in Table 3, which shows that there are slightly more males (51.6%) than females (48.4%) in the sample.

Table 3: Frequency Distribution of sex.

                 Frequency   Percent   Valid percent   Cumulative percent
Valid   Male        19,000      51.6            51.6                 51.6
        Female      17,805      48.4            48.4                100.0
        Total       36,805     100.0           100.0
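The Percent and Cumulative percent columns of Table 3 follow directly from the two frequencies (Valid percent is identical because no case of sex is missing). A minimal Python check, using only the counts printed in the table:

```python
counts = {"Male": 19_000, "Female": 17_805}  # frequencies from Table 3
total = sum(counts.values())

rows = []
cumulative = 0.0
for sex, n in counts.items():
    percent = 100 * n / total  # share of all respondents
    cumulative += percent      # running total for "Cumulative percent"
    rows.append((sex, n, round(percent, 1), round(cumulative, 1)))

for row in rows:
    print(row)
# ('Male', 19000, 51.6, 51.6)
# ('Female', 17805, 48.4, 100.0)
print("Total", total)  # Total 36805
```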

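Table 4 shows the frequency distribution of conrinc. The income range is large ($434,243), and the median income is $26,754. The mean plus or minus one standard deviation spans $721.88 to $66,971.48; if incomes were normally distributed, roughly two thirds of respondents would fall in this range. The figures suggest a skewed distribution, caused by some very low and very high incomes, and the mean ($33,846.68) lying well above the median indicates that the skew is positive; this is confirmed by Figure 1.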
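The interval quoted above is simply the mean plus or minus one standard deviation from Table 4. A minimal Python check of that arithmetic, together with the mean-versus-median comparison that signals the direction of the skew:

```python
# Mean, standard deviation, and median of conrinc as reported in Table 4
mean, sd, median = 33_846.68, 33_124.796, 26_754.00

lower, upper = mean - sd, mean + sd
print(f"${lower:,.2f} to ${upper:,.2f}")  # $721.88 to $66,971.48

# A mean well above the median points to a long right-hand (high-income) tail
print(mean > median)  # True
```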


Table 4: Frequency Distribution of conrinc: Income.

N                    Valid        30,865
                     Missing       5,940
Mean                           33,846.68
Median                         26,754.00
Mode                              39,695
Standard deviation            33,124.796
Range                            434,243
Minimum                              370
Maximum                          434,612

Figure 1: Histogram of conrinc: Income.


Tables 3 and 4 show the distribution of each variable on its own, but they cannot tell us whether the two variables are related. Table 5 shows descriptive statistics for income separately for males and females.

Table 5: Descriptives for Income: Sex.

Sex      Statistic                              Value            Standard error
Male     Mean                                   42,212.32        302.418
         95% confidence interval for mean:
            Lower bound                         41,619.54
            Upper bound                         42,805.09
         5% trimmed mean                        37,829.52
         Median                                 34,487.00
         Variance                               1,457,000,319
         Range                                  434,243
Female   Mean                                   24,922.55        193.516
         95% confidence interval for mean:
            Lower bound                         24,543.24
            Upper bound                         25,301.86
         5% trimmed mean                        22,477.30
         Median                                 20,129.00
         Variance                               559,252,386.40
         Range                                  434,243

We can see from Table 5 that there appears to be a clear difference in the distribution of income by gender; the median income for males is $34,487 and for females is $20,129.

We are now ready to run the Eta Coefficient test to determine whether our variables do have a statistically significant association, as our preliminary analysis suggests.

Conducting the Eta Coefficient Test

Table 6 presents the results of the Eta Coefficient test. The Eta Coefficient test statistic (η) is 0.261, which is above 0.2, the minimum level this guide uses for accepting an association between our variables (see Table 2). Based on Table 2, we interpret this Eta Coefficient test statistic (η) as indicating that sex and income have a weak association with each other. Nonetheless, we can reject our null hypothesis that there will be no association between sex and income; they have a weak association.

Table 6: Results of the Eta Coefficient Test.

                                                                    Value
Nominal by interval   Eta   Respondent income in constant dollars   0.261
N of valid cases                                                   30,865
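Table 6 can be cross-checked against the group summaries in Table 5. The Python sketch below assumes that each standard error in Table 5 is the usual standard error of the mean (the standard deviation divided by √n), which lets the group sizes be recovered as variance ÷ SE²; it then rebuilds the between-groups and within-groups sums of squares. This is a consistency check under that assumption, not the procedure used to produce Table 6.

```python
import math

# Summary statistics from Table 5
groups = {
    "male": {"mean": 42_212.32, "variance": 1_457_000_319.0, "se": 302.418},
    "female": {"mean": 24_922.55, "variance": 559_252_386.40, "se": 193.516},
}

# Assumption: SE = sqrt(variance / n), so n is recovered as variance / SE^2
for g in groups.values():
    g["n"] = round(g["variance"] / g["se"] ** 2)

N = sum(g["n"] for g in groups.values())
grand_mean = sum(g["n"] * g["mean"] for g in groups.values()) / N

# Between-groups and within-groups sums of squares from the group summaries
ss_between = sum(g["n"] * (g["mean"] - grand_mean) ** 2 for g in groups.values())
ss_within = sum((g["n"] - 1) * g["variance"] for g in groups.values())
eta = math.sqrt(ss_between / (ss_between + ss_within))

print({k: g["n"] for k, g in groups.items()}, N)  # {'male': 15931, 'female': 14934} 30865
print(round(eta, 3))  # 0.261
```

The recovered group sizes add up to the 30,865 valid cases in Table 6, and the weighted grand mean comes back to the $33,846.68 reported in Table 4.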

It is possible to calculate the proportion of variance in income attributable to sex by calculating Eta Squared (η²), which is done by simply squaring the Eta Coefficient test statistic (η). Thus, our η² result is 0.261² ≈ 0.07, which we then convert to a percentage (7%): only 7% of the variance in income can be attributed to sex. In other words, it is a weak effect, again confirming our interpretation of the Eta Coefficient test.
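Squaring eta and converting to a percentage is a one-line calculation. The Python sketch below also converts η² into the F statistic of the equivalent one-way ANOVA, using the standard identity F = (η²/(k − 1)) / ((1 − η²)/(N − k)). Because η is rounded to three decimals, the F value is approximate; obtaining its p-value would require an F distribution function from a statistics package, which is not shown here.

```python
# eta and N from Table 6; k is the number of sex categories
eta, n, k = 0.261, 30_865, 2

eta_sq = eta ** 2
print(round(eta_sq, 3), f"{eta_sq:.0%}")  # 0.068 7%

# Equivalent one-way ANOVA F statistic with (k - 1, n - k) degrees of freedom
f_stat = (eta_sq / (k - 1)) / ((1 - eta_sq) / (n - k))
print(round(f_stat))
```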

Presenting Results

An Eta Coefficient test can be reported as follows:

“We used a subset of data from the NIOSH-Quality of Worklife Survey (2014) dataset to test whether there is an association between sex and income. Thus, we tested the following null hypothesis:

H0 = There will be no association between sex and respondent’s income.

The data included 30,865 adults. There was a significant association between sex and income, η = 0.261, η² = 0.07, which suggests a weak association between the variables. This leads us to reject the null hypothesis of no association between sex and income; sex accounts for 7% of the variance in income.”
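When writing up several analyses, a small formatting helper keeps the reported figures consistent with the software output, with η to three decimals and η² to two decimals as in the example above. The function below is a hypothetical Python helper, and its wording is only a suggestion.

```python
def eta_report(eta, n):
    """Format N, eta, eta squared, and variance explained for a results sentence."""
    eta_sq = eta ** 2
    return (f"N = {n:,}, η = {eta:.3f}, η² = {eta_sq:.2f} "
            f"({eta_sq:.0%} of the variance explained)")


print(eta_report(0.261, 30_865))
# N = 30,865, η = 0.261, η² = 0.07 (7% of the variance explained)
```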

Review

The Eta Coefficient test is a statistical test used to evaluate the strength of association between a categorical variable and a scale- or interval-level variable.

You should know:

• What types of variables are suited for an Eta Coefficient test.

• The basic assumptions underlying this statistical test.

• How to compute and interpret an Eta Coefficient test.

• How to report the results of an Eta Coefficient test.

Your Turn

You can download this sample dataset along with a guide showing how to produce an Eta Coefficient test using statistical software. The sample dataset also includes another variable called rrrace, which is the respondent’s race. See whether you can reproduce the results presented here for the sex variable, and then try producing your own Eta Coefficient test substituting rrrace for sex in the analysis.
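Unlike sex, rrrace may have more than two categories, so the calculation must handle any number of groups. The Python sketch below computes eta from (category, value) pairs and skips pairs whose value is missing. The record format and the use of None for missing values are assumptions, since the guide does not describe how rrrace is coded. It is exercised here on the Table 1 scores, where it should reproduce η ≈ 0.89.

```python
import math
from collections import defaultdict


def eta_from_records(records):
    """Eta for (category, value) pairs; pairs whose value is None are skipped."""
    groups = defaultdict(list)
    for category, value in records:
        if value is not None:
            groups[category].append(value)
    values = [v for g in groups.values() for v in g]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        raise ValueError("eta is undefined when every value is identical")
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups.values())
    return math.sqrt(ss_between / ss_total)


# Table 1 scores as records, plus one record with a missing score
records = [("water", 6), ("water", 4), ("water", 2),
           ("caffeine drink", 10), ("caffeine drink", 6), ("caffeine drink", 8),
           ("milk", 10), ("milk", 12), ("milk", 14), ("milk", None)]
print(round(eta_from_records(records), 2))  # 0.89
```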
