
Topics:

Non-Parametric Measures of Correlation

Spearman’s Rho

Goodman’s & Kruskal’s Gamma

Stat203, Fall 2011, Week 13, Lecture 1

More Correlation?

The Pearson correlation we’ve already studied was somewhat restrictive.

Recall, Pearson correlation was usually only good when:

- both variables are interval or ratio scaled
- both variables are approximately normally distributed

What about when we want to know the relationship between two variables that are:

- interval or ratio but not normally distributed
- ordinal
- nominal


Correlation: Interval or Ratio data that isn’t Normally Distributed

Spearman’s Rho is a measure of correlation you can use when the data is strongly skewed, or you’re not sure whether the variables are Normally distributed.

It is also sometimes called the Spearman Rank-Order correlation coefficient, so let’s define ‘Rank-Order’ first.

An observation’s rank is its position in the ordering (highest to smallest) of the n observations in the dataset.

An example …


Example (Ch12, Q5): Is there a relationship between Distance from School and Number of Clubs Joined? First, let’s look at ‘ranks’:

Name       Distance to School (miles)   Rank Order of Distance to School   Number of Clubs Joined   Rank Order of Number of Clubs Joined
Lee        4                                                               3
Rhonda     2                                                               1
Jess       7                                                               5
Evelyn     1                                                               2
Mohammed   4                                                               1
Steve      6                                                               1
George     9                                                               9
Juan       7                                                               6
Chi        7                                                               5
David      17                                                              8


Now that we have the ranks of each individual’s value of each variable, Spearman’s Rho is actually calculated using exactly the same formula as Pearson’s correlation, but using the ranks!
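The lecture does this in SPSS, but the same calculation can be sketched in a few lines of Python. The sketch below ranks each variable from the table above and then applies the Pearson formula to the ranks. Two choices here are assumptions, not something stated on the slide: tied values share the average of the ranks they occupy, and ranks run from smallest to largest. The direction of ranking does not change the result as long as both variables are ranked the same way. The function names are made up for this illustration.

```python
import math


def average_ranks(values):
    """Rank values from smallest (rank 1) to largest; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: the Pearson formula applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Data from the example table, in the order Lee, Rhonda, ..., David.
distance = [4, 2, 7, 1, 4, 6, 9, 7, 7, 17]
clubs = [3, 1, 5, 2, 1, 1, 9, 6, 5, 8]

print(average_ranks(distance))
print(average_ranks(clubs))
print(f"{spearman(distance, clubs):.3f}")  # 0.838
```

Calling `pearson(distance, clubs)` on the raw values gives the ordinary Pearson correlation for comparison.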

Let’s look at this using SPSS.

First off, let’s examine the histograms of these two variables to see why we shouldn’t use the Pearson Correlation.


Are these variables normally distributed?

... the sample size is small, so it’s hard to tell … but there is one easy way to see which correlation to use.

Use SPSS to calculate both the Pearson and Spearman Correlations


If the data were perfectly normally distributed, the Spearman and Pearson correlations would be very close to each other!

In this case, the Pearson and Spearman correlation coefficients clearly differ and the histograms seem to show some skewness, so we should use Spearman’s Rho as the correlation.

So, our conclusion from examining this data is that Distance to School and Number of Clubs Joined have a statistically significant (p-value = 0.002), positive correlation (ρ = 0.838).


Correlations between Ordinal Variables?

From the previous example, we should note that the key to calculating Spearman’s Rho was to identify the rank-order of each individual for each variable.

Recall, Ordinal variables only give us an ‘ordering’ … or the rank of one individual compared to another!

So … we can use Spearman’s Rho for correlations involving ordinal data!


Example (Ch12, Q7): A researcher ranks Population Density and Quality of Life for 10 cities. Is there a relationship between these two variables?

Research Question:

Individuals:

Population:

Variables:

Parameter:


Statistical Hypothesis:

… From SPSS:

Conclusion:


… but note that the Spearman correlation is identical to the Pearson!

This is because the data we analyzed was the ranks … and when only the ranks are available Spearman and Pearson will be the same.
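One way to see this is a small check in Python. When both variables are already untied ranks 1 to n, the Pearson formula reduces to the classic rank-difference shortcut, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference between an individual's two ranks. The ranks below are hypothetical, not the textbook's data for the 10 cities, and the function names are invented for this sketch.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_shortcut(rank_x, rank_y):
    """Spearman's rho from rank differences; valid only when there are no ties."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical ranks for 10 cities (NOT the textbook's data).
density_rank = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
quality_rank = [9, 10, 7, 8, 5, 6, 3, 4, 2, 1]

# Both lines print the same value, up to floating-point rounding.
print(pearson(density_rank, quality_rank))
print(spearman_shortcut(density_rank, quality_rank))
```

Ranking data that is already a set of ranks leaves it unchanged, so Spearman has nothing extra to do.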


Example (Ch12, Q11): Comparing High School GPA to College performance. Is there a relationship between the two?

Research Question:

Individuals:

Population:

Variables:

Parameter:


Statistical Hypothesis:

From SPSS:

Conclusion:


… but note now that the Pearson does not match the Spearman.

Why?

Only one variable contained ranks; the other variable was ratio scaled. So, for determining the correlation between ratio and ordinal data, we should use the Spearman.


Goodman’s and Kruskal’s Gamma

Although Spearman’s Rho can be used in most cases involving ordinal data, if you have ‘lots’ of ties, you may have to use Gamma as an alternative.

What’s a tie? A tie occurs when two or more individuals have the same value of a variable, or the same combination of values on several variables.

Why would there be lots of ties?

Think back to the homework; recall the General Happiness variable from the GSS – there were only three categories for this ordinal variable and most people selected ‘Pretty Happy’ … they were all tied.

Gamma in SPSS

As with the other statistics, we won’t calculate this by hand. But it’s easy to find in SPSS.
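For readers curious about what SPSS is computing, here is a minimal sketch in Python. Gamma compares every pair of individuals: a pair is concordant if the individual who is higher on one variable is also higher on the other, and discordant if the order is reversed. Pairs tied on either variable are ignored. Gamma is then (C - D) / (C + D), where C and D are the numbers of concordant and discordant pairs. The data and category codes below are hypothetical, not the textbook's, and the function name is made up for this illustration.

```python
from itertools import combinations


def goodman_kruskal_gamma(x, y):
    """Goodman and Kruskal's gamma from paired ordinal codes."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1  # same ordering on both variables
        elif product < 0:
            discordant += 1  # opposite ordering
        # product == 0 means the pair is tied on at least one variable: ignored
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical data: socioeconomic status (1 = low, 2 = middle, 3 = high)
# and number of books read (1 = few, 2 = some, 3 = many).
ses = [1, 1, 1, 2, 2, 2, 3, 3, 3]
books = [1, 1, 2, 1, 2, 3, 2, 3, 3]

print(goodman_kruskal_gamma(ses, books))  # 0.8
```

In this made-up sample there are 18 concordant and 2 discordant pairs, so gamma = 16 / 20 = 0.8.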

Example (Ch12, Q12): Is there a relationship between Socioeconomic Status and Number of Books Read?

Let’s first look at this data in SPSS:


Are there ties?


Let’s do a cross-tab to obtain the table in the textbook, but note that we can generate some statistics along the way:


… and the output:


So, what would our conclusion be regarding the relationship between these variables?

When making conclusions regarding ‘relationship’ questions, quote the strength and direction (i.e., the actual correlation) and the p-value or ‘significance’.

Conclusion:


So, we’ve studied 3 different ways to calculate correlation:

- Pearson’s r
- Spearman’s Rho
- Goodman’s and Kruskal’s Gamma

How do I know which to use?

- Consider the type of variables involved

- Consider the distribution of the variables

- Consider the # of ties

... and if all else fails: if Spearman gives a different conclusion than Pearson, use Spearman … and if it looks like 10% or more of your data have the same value of one variable or the other, use Gamma.
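The 10% rule of thumb can be checked quickly in code. The sketch below is one possible reading of that rule, not a formal test: it finds the share of observations taken up by the single most common value of each variable and flags the data as having "lots of ties" when either share reaches the threshold. The data, category codes, and function names are all hypothetical.

```python
from collections import Counter


def largest_tie_share(values):
    """Fraction of observations that share the single most common value."""
    most_common_count = Counter(values).most_common(1)[0][1]
    return most_common_count / len(values)


def many_ties(x, y, threshold=0.10):
    """True if either variable has a value shared by at least `threshold` of the data."""
    return max(largest_tie_share(x), largest_tie_share(y)) >= threshold


# Hypothetical sample of 20 people.
# Happiness codes (assumed): 1 = least happy, 2 = 'Pretty Happy', 3 = happiest.
happiness = [2] * 12 + [3] * 5 + [1] * 3
# A second variable with no repeated values at all.
income_rank = list(range(1, 21))

print(largest_tie_share(happiness))    # 0.6
print(largest_tie_share(income_rank))  # 0.05
print(many_ties(happiness, income_rank))  # True
```

With very small samples every value already accounts for a large share of the data, so a check like this is only informative for reasonably large datasets.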


For all correlations …

All correlations we have studied have a maximum of 1 and a minimum of -1 and describe the strength and direction of the relationship between TWO variables.

SPSS provides a p-value for all correlations, and all are interpreted the same (significance of the relationship).

Research Hypotheses involving correlations always ask about a significant relationship between the variables.


Today’s Topics

Non-Parametric Measures of Correlation

- Pearson’s r isn’t always good enough
- All correlations have similar interpretations regarding strength and direction of relationship
- All correlations have a p-value which is interpreted similarly

Spearman’s Rho

- for non-normal (i.e., skewed) interval or ratio-scaled variables
- if one or more of the variables are ordinal

Goodman’s & Kruskal’s Gamma

- useful for correlations between ordinal variables with lots of ties

Reading:

This lecture included material from Chapter 12 up to page 430.

No more reading for this course!
