Slide 1 SOLVING THE HOMEWORK PROBLEMS Pearson's r correlation coefficient measures the strength of...

Slide 1

SOLVING THE HOMEWORK PROBLEMS

Pearson's r correlation coefficient measures the strength of the linear relationship between the distributions of two quantitative variables. If the relationship is not linear, the application of statistics that assume linearity may give questionable results. Determining whether a relationship should be characterized as linear or non-linear is challenging.

One indicator of non-linearity is the difference between the rank-order correlation coefficient (Spearman's rho) and Pearson's r. When Spearman's rho is larger in absolute value than Pearson's r, the relationship is likely to be non-linear, and Pearson's r may understate the strength of the relationship.

However, if one or both variables are badly skewed due to outliers, we may be able to improve the linearity of the relationship, and justify the use of statistics that assume linearity, by re-expressing the data.
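
The comparison between the two coefficients can be sketched in a few lines of Python. This is an illustration only: the slides use SPSS, the data below are hypothetical (a monotonic but curved relationship), and the helper names `pearson`, `ranks`, and `spearman` are introduced here for the example.

```python
import math

def pearson(x, y):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: y falls steadily, but not linearly, as x rises.
x = list(range(1, 11))
y = [math.exp(-v) for v in x]

r = pearson(x, y)
rho = spearman(x, y)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Because the ranks of y are exactly the reverse of the ranks of x, rho reaches -1 while r stays noticeably weaker, which is the pattern the slide describes as a sign of non-linearity.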

Slide 2

The introductory statement in the question indicates:
• The data set to use (2001WorldFactBook)
• The task to accomplish (association between variables)
• The variables to use in the analysis: percent of the total population who was literate [literacy] and HIV-AIDS adult prevalence rate [hivaids]

Slide 3

The problem also contains a second paragraph of instructions that provides the formulas to use if our examination of the association between quantitative variables requires us to re-express or transform one or both of the variables.

Slide 4

The first statement concerns the number of valid cases. To answer this question, we produce the statistics using the SPSS Correlate procedure.

Slide 5

To compute correlations, select Correlate > Bivariate from the Analyze menu.

Slide 6

First, move the variables hivlive and literacy to the Variables list box.

Second, mark the check box for Spearman and leave the check box for Pearson marked.

Third, click on the OK button to produce the output.

Slide 7

The Correlations table shows us the number of cases available for computing the correlation – 131.
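
The count of valid cases is the number of rows with a value present on both variables. A minimal sketch of that count, using hypothetical rows and `None` as the missing-value marker (neither is taken from the actual data set):

```python
# Hypothetical rows: (country, literacy, hivlive); None marks a missing value.
rows = [
    ("A", 99.0, 1200),
    ("B", None, 5000),
    ("C", 45.5, None),
    ("D", 80.2, 300000),
    ("E", None, None),
]

# A case is usable for the correlation only if both variables are present.
valid = [row for row in rows if row[1] is not None and row[2] is not None]
print(f"{len(valid)} valid cases out of {len(rows)}")
```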

Slide 8

The number of cases with valid data to analyze the relationship between "percent of the total population who was literate" and "number of people living with HIV-AIDS" was 131, out of the total of 218 cases in the data set.

Click on the check box to mark the statement as correct.

Slide 9

The next statement asks us to extract the correlation coefficients from the SPSS output, and compare the two.

Slide 10

The Correlations table shows us that the Pearson r correlation is -0.203.

Slide 11

The Nonparametric Correlations table shows us that the Spearman rho correlation is -0.548.

The comparison of the strength of the relationship indicated by each measure is based on the relative size of the absolute values of the coefficients. Since the absolute value of Spearman's rho (0.548) is larger than the absolute value of Pearson's r (0.203), Spearman's rho indicates a stronger relationship.

Slide 12

Pearson's r was correctly identified as -0.203. Spearman's rho was correctly identified as -0.548.

The comparison of the strength of the relationship indicated by each measure is based on the relative size of the absolute values of the coefficients. Since the absolute value of Spearman's rho (0.548) is larger than the absolute value of Pearson's r (0.203), Spearman's rho indicates a stronger relationship.

Click on the check box to mark the statement as correct.

Slide 13

The next statement asks us to identify the outliers in the distribution. If there are outliers, we can re-express one or both variables to see if the linear correlation between the variables increases.

In these problems, outliers are defined as cases with scores that are three or more standard deviations from the mean. We use the SPSS Descriptives procedure to identify them.
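
The outlier rule above can be sketched as follows. The values and case labels are hypothetical, the helper names are introduced for this example, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption about how the standard scores are computed.

```python
import statistics

def z_scores(values):
    """Standard scores using the sample standard deviation (n - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def outliers(labels, values, cutoff=3.0):
    """Return (label, value, z) for cases with |z| >= cutoff."""
    return [
        (label, value, z)
        for label, value, z in zip(labels, values, z_scores(values))
        if abs(z) >= cutoff
    ]

# Hypothetical data: nineteen similar values and one extreme value.
values = [10, 12, 11, 9, 13, 10, 11, 12, 9, 10,
          11, 13, 12, 10, 9, 11, 10, 12, 11, 200]
labels = [f"case{i}" for i in range(1, len(values) + 1)]

found = outliers(labels, values)
for label, value, z in found:
    print(f"{label}: value={value}, z={z:.2f}")
```

Sorting the standard scores and reading the top and bottom of the column, as the following slides do in the Data View, finds the same cases as the `abs(z) >= cutoff` filter.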

Slide 14

To compute the standard scores, select the Descriptive Statistics > Descriptives command from the Analyze menu.

Slide 15

First, move the variables for the analysis, hivlive and literacy, to the Variable(s) list box.

Second, click on the Options button to request skewness in case we decide to re-express the variable.

Slide 16

First, click the check box for Skewness so that we can decide which transformation to use should we decide to re-express the variable.

Second, click on the Continue button to return to the prior dialog box.

Slide 17

Next, mark the check box Save standardized values as variables.

Finally, click on the OK button to produce the output.

Slide 18

If we scroll the Data View all the way to the right, we see that SPSS has created the standard scores. To name the standard score variables, SPSS prepends the letter “Z” to the variable name.

Slide 19

Click the right mouse button on the column header for Zhivlive, and select Sort Ascending from the pop-up menu. This will show any negative outliers at the top of the column.

Slide 20

After we scroll down past the cases with missing values, we do not see any negative values less than or equal to -3.0.

Slide 21

Click the right mouse button again on the column header for Zhivlive, and select Sort Descending from the pop-up menu. This will show any positive outliers at the top of the column.

Slide 22

At the top of the column, we see four positive outliers with values greater than or equal to +3.0.

Slide 23

If we scroll back to the country column, we see the names for the four outliers: Ethiopia, Kenya, Nigeria, and South Africa.

Slide 24

Scroll right to the Zliteracy column, click the right mouse button on its column header, and select Sort Ascending from the pop-up menu. This will show any negative outliers at the top of the column.

Slide 25

After we scroll down past the cases with missing values, we see that we have one negative value less than or equal to -3.0.

Slide 26

Click the right mouse button again on the column header for Zliteracy, and select Sort Descending from the pop-up menu. This will show any positive outliers at the top of the column.

Slide 27

At the top of the column, we see that there are not any standard scores equal to or greater than +3.0.

Slide 28

If we select the row with the negative outlier and scroll back to the country column, we see that Niger was an outlier on the variable literacy.

Slide 29

The distribution of "number of people living with HIV-AIDS" [hivlive] contained four outliers that were three or more standard deviations from the mean: Ethiopia with a value of 3,000,000 (z=4.81), Kenya with a value of 2,100,000 (z=3.25), Nigeria with a value of 2,700,000 (z=4.29), and South Africa with a value of 4,200,000 (z=6.88). The distribution of "percent of the total population who was literate" [literacy] contained one outlier that was three or more standard deviations from the mean: Niger with a value of 13.6 (z=-3.02).

We mark the check box for correct.

Slide 30

Since Spearman’s rho indicated a stronger relationship than Pearson’s r in distributions that had outliers, we can re-express the variables to see if the strength of the linear relationship increases.

Slide 31

In the table of Descriptive Statistics, we see that the skewness for number of people living with HIV-AIDS [hivlive] was 4.412. Since the variable was positively skewed, the data will be re-expressed as logarithms, transforming it to the log transformation of number of people living with HIV-AIDS [LG_hivlive].
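
The decision rule used in these problems (logarithms for a positively skewed variable, squares for a negatively skewed one) can be sketched as below. The skewness formula is the adjusted sample skewness, which is assumed here to match what the Descriptives output reports; the data and the function names are hypothetical.

```python
import statistics

def skewness(values):
    """Adjusted sample skewness: n * sum(d^3) / ((n - 1)(n - 2) * s^3)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    m3 = sum((v - mean) ** 3 for v in values)
    return n * m3 / ((n - 1) * (n - 2) * sd ** 3)

def suggest_reexpression(values):
    """Pick a re-expression from the sign of the skewness."""
    g = skewness(values)
    if g > 0:
        return "log"      # pull in a long right tail
    if g < 0:
        return "square"   # stretch out a long left tail
    return "none"

# Hypothetical data: one long right tail, and its mirror image.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 50]
left_tailed = [100 - v for v in right_tailed]

print(suggest_reexpression(right_tailed), round(skewness(right_tailed), 3))
print(suggest_reexpression(left_tailed), round(skewness(left_tailed), 3))
```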

Slide 32

We type the formula identified in the second paragraph of the problem.

Click on the OK button to produce the output.

Slide 33

The skewness for "percent of the total population who was literate" [literacy] was -1.112. Since the variable was negatively skewed, the data will be re-expressed as squares. The independent variable percent of the total population who was literate [literacy] was transformed to the square transformation of percent of the total population who was literate [SQ_literacy].

Slide 34

We type the formula identified in the second paragraph of the problem.

Click on the OK button to produce the output.

Slide 35

Next, we compute the correlation coefficients for the transformed variables.

To compute correlations, select Correlate > Bivariate from the Analyze menu.

Slide 36

First, move the variables LG_hivlive and SQ_literacy to the Variables list box.

Second, click on the OK button to produce the output.

Slide 37

The linear fit of the relationship improved, and we report the Pearson's r for the relationship using the transformed variables.

Before re-expressing the data, Pearson's r was -0.203. After re-expressing the dependent variable, number of people living with HIV-AIDS [hivlive], as its log transformation [LG_hivlive] and the independent variable, percent of the total population who was literate [literacy], as its square transformation [SQ_literacy], Pearson's r strengthened to -0.487 (an absolute value of 0.487, up from 0.203).
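
The before-and-after comparison can be sketched as below. The data are hypothetical and deliberately constructed so that the log of y is exactly linear in the square of x; the base-10 logarithm and the plain square are assumptions, since the formulas from the problem statement are not reproduced in this transcript.

```python
import math

def pearson(x, y):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical stand-ins for literacy (x) and hivlive (y).
x = [20, 35, 50, 65, 80, 95]
y = [10 ** (6 - v ** 2 / 2000) for v in x]

# Re-express: squares for x, base-10 logarithms for y (assumed formulas).
sq_x = [v ** 2 for v in x]
lg_y = [math.log10(v) for v in y]

r_before = pearson(x, y)
r_after = pearson(sq_x, lg_y)
print(f"before: r = {r_before:.3f}, after: r = {r_after:.3f}")

# Strength is compared on absolute values, since both coefficients are negative.
improved = abs(r_after) > abs(r_before)
print("linear fit improved:", improved)
```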

Slide 38

Since the strength of the linear relationship increased when we used the transformed variables, we mark the check box for the question.

Slide 39

The next statement asks us to interpret the Pearson correlation coefficient using the guidelines attributed to Tukey.

Slide 40

The rule of thumb attributed to Tukey characterizes a correlation between 0.0 and ±0.20 as very weak; ±0.20 to ±0.40 as weak; ±0.40 to ±0.60 as moderate; ±0.60 to ±0.80 as strong; and greater than ±0.80 as very strong. Using this rule, the relationship between the square transformation of "percent of the total population who was literate" and the log transformation of "number of people living with HIV-AIDS" was correctly characterized as a moderate relationship (Pearson's r = -.487).
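
The rule of thumb can be written as a small lookup on the absolute value of r. The function name is hypothetical, and the treatment of values that fall exactly on a boundary (assigned to the stronger category here) is an assumption, since the ranges as stated share their endpoints.

```python
def tukey_label(r):
    """Label the strength of a correlation using the cut points above."""
    a = abs(r)
    if a < 0.20:
        return "very weak"
    if a < 0.40:
        return "weak"
    if a < 0.60:
        return "moderate"
    if a < 0.80:
        return "strong"
    return "very strong"

print(tukey_label(-0.487))  # transformed variables
print(tukey_label(-0.203))  # original variables
```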

Click on the check box to mark the statement as correct.

Slide 41

The next statement asks us to interpret the Pearson correlation coefficient using the guidelines attributed to Cohen.

Slide 42

Applying Cohen's criteria for effect size (less than ±0.10 = trivial; ±0.10 up to ±0.30 = weak or small; ±0.30 up to ±0.50 = moderate; ±0.50 or greater = strong or large), the relationship was correctly characterized as a moderate relationship (Pearson's r = -.487).
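
Cohen's criteria translate into the same kind of lookup. The function name is hypothetical; "up to" is read here as excluding the upper bound, which follows the wording of the criteria above.

```python
def cohen_label(r):
    """Label the effect size of a correlation using the criteria above."""
    a = abs(r)
    if a < 0.10:
        return "trivial"
    if a < 0.30:
        return "weak or small"
    if a < 0.50:
        return "moderate"
    return "strong or large"

print(cohen_label(-0.487))  # Pearson's r after re-expression
print(cohen_label(-0.548))  # a coefficient at or above 0.50 in absolute value
```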

Click on the check box to mark the statement as correct.

Slide 43

The next statement asks us to interpret the direction of the relationship between the variables. A direct or positive relationship means that the values of the variables change in the same direction, i.e., when one goes up, the other goes up, and when one goes down, the other goes down. An inverse or negative relationship means that the values of the variables move in opposite directions.

Slide 44

Since the sign of the correlation coefficient was negative (Pearson's r = -.487), the relationship is inverse and the values for the variables move in opposite directions. The statement that “higher scores on the variable the square transformation of percent of the total population who was literate were associated with lower scores on the log transformation of number of people living with HIV-AIDS” is correct.

Click on the check box to mark the statement as correct.