Correlation - Weebly · 2019. 11. 6. · No Correlation .75 Negative Positive Correlation r = .88 r...
1/27/15
Correlation
• The relationship between two variables
• E.g., achievement in college is related to…?
  • Motivation
  • Openness to new experience
  • Conscientiousness
  • IQ
  • Etc…
• Relationships can be causal
  • E.g., Higher motivation likely causes higher achievement in college (and elsewhere)
• Although, they can also be non-causal (i.e., spurious)
Correlation and Scatterplots
• Correlations are best visualized by using a scatterplot graph
• Two variables are plotted on the x and y axes
• If there is a relationship between them, a noticeable pattern will emerge
• E.g., scatterplot showing
  • Y-axis = number of hours playing violent video games per week
  • X-axis = number of school infractions per year
Data
Hours playing video games per week | Number of school infractions per year
1 | 2
3 | 1
3 | 2
4 | 4
5 | 3
5 | 3
6 | 6
6 | 5
6 | 6
6 | 6
6 | …
7 | …
… | 6
… | 4
7 | 7
7 | 6
7 | 8
8 | 6
8 | 10
9 | 7
10 | 12
11 | 10
12 | 10
12 | 11
Notice a pattern emerging?
Scatterplot Graph
• From the scatterplot we can tell two things about the relationship…
1. Direction of the relationship
¤ Positive = as one variable increases, so does the other
¤ Negative = as one variable increases, the other decreases
2. Strength of the relationship
¤ Strong = dots fit closely together, almost forming a line
¤ Weak = dots are scattered about randomly
Scatterplot Graphs
• Strong, positive relationship…
• Strong, negative relationship…
• Weak, negative relationship…
• Weak, positive relationship…
• No relationship (random dispersion)…
Correlation
• Mathematically expressed as r or R
• Can range from -1 to +1
[Figure: number line from -1.0 (negative) through 0 (no correlation) to +1.0 (positive); ±.3 is labeled “Weakly Correlated” and ±.8 “Strongly Correlated”; example value r = +.75]
[Figure: example scatterplots with r = .88, r = -.88, r = .42, r = -.42, and r = .08 (non-significant)]
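As a minimal sketch of how r can be computed, the snippet below applies the standard Pearson formula to the fully legible rows of the video game data from the Data slide (rows containing “…” are left out, so the result describes only this subset). The function name `pearson_r` and the variable names are illustrative, not from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y divided by the
    product of their individual variations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Complete (hours, infractions) rows from the Data slide; elided rows omitted.
hours = [1, 3, 3, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8, 8, 9, 10, 11, 12, 12]
infractions = [2, 1, 2, 4, 3, 3, 6, 5, 6, 6, 7, 6, 8, 6, 10, 7, 12, 10, 10, 11]

r = pearson_r(hours, infractions)
print(f"r = {r:.2f}")
```

On this subset r comes out strongly positive, consistent with the pattern the scatterplot is meant to show.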
Correlation → Prediction
• If a correlation is perfect (1 or -1), then we can perfectly predict one variable from the other
• Once we use correlations to make predictions, we are conducting regression analyses
• E.g., Y = height, X = weight
• If someone told us their weight, we could perfectly predict their height and vice versa
[Figure: scatterplot of height (y-axis) against weight (x-axis)]
Regression Line
• To make predictions, we first need to figure out the regression line
  • “Predictive line” that best fits the relationship between the two variables
• If the correlation is perfect, you simply draw a line over the data points
Regression Line
• We can then use this line to make predictions
• If R = 1, our predictions will always be correct
¤ E.g., If weight perfectly predicted height
¤ What would be the weight of someone who is 5’ 10”?
  → 180 lbs
¤ What would be the height of someone who is 200 lbs.?
  → 6’ 0”
[Figure: regression line with R = 1.0; y-axis = height (5’ 2” to 6’ 2”), x-axis = weight (120 to 210 lbs)]
Correlation → Prediction
• If a correlation is not perfect (always the case), we can still predict one variable from the other
• We just won’t be perfectly accurate
• E.g., if someone told us their weight, we could predict their height with 95% confidence
[Figure: scatterplot of height (y-axis) against weight (x-axis) with a regression line and its 95% C.I.]
Regression Line
• Whenever making predictions, we first need to figure out the “best fitting” regression line
• Line that “best fits” the data
• I.e., the line that has the minimum average distance from every point of data (in ordinary least squares, the smallest sum of squared vertical distances)
[Figure: scatterplot with the best-fitting line]
Regression
• Once you have a best-fitting line, you can make predictions using the regression equation
DV = b0 + b1 (IV)
• b0: known as the “intercept” or “constant”
  • What is the predicted value of the DV if the IV = 0?
• b1: known as the “slope” or “regression coefficient”
  • Average change in the DV for each one-unit change in the IV
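To make the regression equation concrete, here is a rough sketch of how b0 and b1 could be estimated with ordinary least squares. The slides do not say which fitting method is used, so least squares is an assumption here, and `fit_line` and the three data points are hypothetical (the points are chosen to lie exactly on DV = 1 + 1/3 (IV)).

```python
def fit_line(ivs, dvs):
    """Ordinary least squares for one predictor.
    Returns (b0, b1) for the equation DV = b0 + b1 * IV."""
    n = len(ivs)
    mean_iv = sum(ivs) / n
    mean_dv = sum(dvs) / n
    # b1 (slope): co-variation of IV and DV divided by variation of IV
    sxy = sum((x - mean_iv) * (y - mean_dv) for x, y in zip(ivs, dvs))
    sxx = sum((x - mean_iv) ** 2 for x in ivs)
    b1 = sxy / sxx
    # b0 (constant): predicted DV when IV = 0
    b0 = mean_dv - b1 * mean_iv
    return b0, b1

# Hypothetical data lying exactly on DV = 1 + (1/3) * IV
ivs = [0, 3, 6]
dvs = [1, 2, 3]

b0, b1 = fit_line(ivs, dvs)
print(f"DV = {b0:.2f} + {b1:.2f} (IV)")
```

With real, imperfectly correlated data the same function returns the line that minimizes the squared vertical distances to the points.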
b0: Constant
• What is the predicted DV if IV = 0?
[Figures: regression lines with DV = Predicted on the y-axis (1 to 5) and IV = Predictor on the x-axis; the example lines have b0 = 1, b0 = 2.5, and b0 = 0]
b1: Slope
• For every unit of change in the IV, how much does the DV also change?
b1 = (change in y-axis) / (change in x-axis) = (change in DV) / (change in IV) = rise / run
[Figures: regression lines with DV = Predicted on the y-axis (1 to 5) and IV = Predictor on the x-axis (1 to 6)]
• Rising line: b1 = +1 / +3
• Falling line: b1 = -1 / 5, with b0 = 4
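The rise-over-run idea can be sketched in a few lines. The helper `slope` and the two pairs of points are hypothetical; the points were picked so that they reproduce the slopes shown on the slides (+1/+3, and -1/5 for a line with b0 = 4).

```python
from fractions import Fraction

def slope(point_a, point_b):
    """b1 = rise / run between two (IV, DV) points on the regression line."""
    (iv_a, dv_a), (iv_b, dv_b) = point_a, point_b
    run = iv_b - iv_a
    if run == 0:
        raise ValueError("the two points must differ on the IV")
    rise = dv_b - dv_a
    return Fraction(rise, run)

# Hypothetical points read off a rising line: DV goes up 1 as IV goes up 3
rising = slope((0, 1), (3, 2))
# Hypothetical points read off a falling line that starts at b0 = 4
falling = slope((0, 4), (5, 3))

print(rising, falling)
```

Fractions are used instead of floats only so the results read the same way as the slide notation.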
Constant and Slope
• What are the constant and slope?
[Figures: practice plots with DV = Predicted on the y-axis (1 to 5) and IV = Predictor on the x-axis (1 to 6). Values shown across the plots: b0 = 3; b1 = 1/3; b0 = 2, b1 = 1/6; b0 = 5, b1 = -1/3]
Regression Equation
DV = b0 + b1 (IV)
b0 = 1, b1 = 1/3
DV = 1 + 1/3 (IV)
• What is the DV if the IV is 6? DV = 1 + 1/3 (6) = 3
• What is the DV if the IV is 3? DV = 1 + 1/3 (3) = 2
• What is the DV if the IV is 4.65? DV = 1 + 1/3 (4.65) = 2.55

b0 = 4, b1 = -1/5
DV = 4 - 1/5 (IV)
• What is the DV if the IV is 5? DV = 4 - 1/5 (5) = 3
[Figures: the two regression lines with DV = Predicted on the y-axis (1 to 5) and IV = Predictor on the x-axis (1 to 6)]
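The worked examples above amount to plugging an IV into DV = b0 + b1 (IV). A minimal sketch, with `predict` as an illustrative name:

```python
def predict(b0, b1, iv):
    """Predicted DV from the regression equation DV = b0 + b1 * IV."""
    return b0 + b1 * iv

# First line from the slides: b0 = 1, b1 = 1/3
for iv in (6, 3, 4.65):
    print(f"IV = {iv}: DV = {predict(1, 1 / 3, iv):.2f}")

# Second line from the slides: b0 = 4, b1 = -1/5
print(f"IV = 5: DV = {predict(4, -1 / 5, 5):.2f}")
```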
Regression Research
• Advantages
  • Can be easily used with any kind of data
    • E.g., experiments, survey research, archival studies
  • Can predict one variable from other variables
    • E.g., predicted recidivism rates for prison inmates
• Disadvantages
  • Cannot be used with non-linear relationships
  • Cannot make predictions beyond the data
  • Cannot infer causal relationship
    • Although, many news articles and even scientific papers incorrectly discuss regression as causation
Assumptions of Regression
• Relationship between variables must be linear
• Regression line that best fits the data is straight
• Relationship cannot be curvilinear
[Figure: curvilinear data with the caption “Straight line doesn’t fit!”]
Examples of curvilinear relationships
• Yerkes-Dodson Law
• Practice effects
[Figure: skill (y-axis) plotted against time (x-axis)]
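To illustrate why a straight line fails on curvilinear data, the sketch below computes r for a perfectly predictable inverted-U pattern. The five data points are hypothetical (a symmetric curve loosely in the shape of the Yerkes-Dodson law), and `pearson_r` is an illustrative helper using the standard Pearson formula.

```python
import math

def pearson_r(xs, ys):
    """Standard Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical inverted-U data: performance peaks at moderate arousal.
arousal = [1, 2, 3, 4, 5]
performance = [5, 8, 9, 8, 5]  # follows 9 - (arousal - 3) ** 2 exactly

r = pearson_r(arousal, performance)
print(f"r = {r:.2f}")
```

Here the DV is fully determined by the IV, yet the linear correlation is zero because the rising and falling halves cancel out.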
Assumptions of Regression
• Variables are not sharply skewed
  • Regression will work with skewed variables, but the sharper the skew, the lower your R
  [Figures: example plots comparing R = .47 with R = .39, and R = .47 with R = non-sig.]
• Variables are continuous
  • Regression will work with discrete variables, but the fewer values, the lower your R
  [Figures: example plots comparing R = .47 with R = .32* (*Conservatism = discrete), and R = .47 with R = .23* (*Both IVs = discrete)]
Regression Equation
DV = b0 + b1 (IV)
b0 = 4, b1 = -1/5
DV = 4 - 1/5 (IV)
• What is the DV if the IV is 25? DV = 4 - 1/5 (25) = 4 - 5 = -1
• Doesn’t make sense!
[Figure: regression line with DV = Predicted on the y-axis (1 to 5) and IV = Predictor on the x-axis (1 to 6)]
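One way to guard against this kind of out-of-range prediction is to refuse to predict outside the IV values that were actually observed. This is only a sketch: `predict_within_range` is a hypothetical helper, and the observed range of 1 to 6 is an assumption taken from the x-axis of the plot.

```python
def predict_within_range(b0, b1, iv, iv_min, iv_max):
    """Apply DV = b0 + b1 * IV, but only inside the observed IV range."""
    if not iv_min <= iv <= iv_max:
        raise ValueError(
            f"IV = {iv} is outside the observed range [{iv_min}, {iv_max}]; "
            "the regression line is not supported by data there"
        )
    return b0 + b1 * iv

# Line from the slide: b0 = 4, b1 = -1/5; assumed observed IV range: 1 to 6
print(predict_within_range(4, -1 / 5, 5, iv_min=1, iv_max=6))

try:
    predict_within_range(4, -1 / 5, 25, iv_min=1, iv_max=6)
except ValueError as error:
    print(error)
```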
Potential Problem
• We get into trouble when estimating the unknown
  • Especially when trying to project into the future
• E.g., The U.S. saw a rise in crime following record lows during WW2
• Media projected these trends into the future
• Projections turned out to be completely wrong
Problems with Projection
• Other example: Housing bubble in 2000s
• It’s never a good idea to project past available data
Regression ≠ Causation
• Spurious Relationship – correlation between two variables that is created by a third variable
• Being a Christian predicts being overweight (Feinstein, American Heart Association, 2011)
  • In fact, states in the U.S. with the most Christian churches tend to have the highest average BMI
• This spawned recent obesity prevention efforts by The Christian Post, Christian Leadership Alliance, and others
• Does this really mean being Christian causes people to become fat?
  • What could be some third factors?
Possible Third Factors
• Christians, compared to non-Christians, tend to be… (Gallup, 2003 – 2011)
  • More overweight
  • Lower SES
    • More difficult to afford healthy foods and exercise equipment / gym membership
  • Older
  • More likely to live in the South
    • And thus eat a “Southern diet”
[Image caption: What would Jesus eat?]
• Christians, compared to non-Christians, also tend to be… (Gallup, 2003 – 2011)
  • Happier
  • Closer to family
  • More generous
  • Less psychotic
• These just don’t happen to be related to obesity
[Image caption: No crazies here ☺]
Spurious Relationships
• Psychologically speaking, almost all variables are related to each other to some extent
[Figure: web of related variables: Christianity, Obesity, Loves Comedies, Extraverted, Flosses Regularly, Dog Owner, SES, Generosity, Happiness, Age]
• Because of this, you can find significant relationships between almost any two variables
• Especially if the sample is big enough to make even small relationships significant
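The effect of sample size on significance can be sketched with the usual t-test for a correlation, t = r · √(n − 2) / √(1 − r²). The values r = .05, n = 100, and n = 10,000 are hypothetical, and the 1.96 cutoff is the approximate two-tailed .05 critical value for large samples (the exact value is slightly higher for small n).

```python
import math

def t_statistic(r, n):
    """t statistic for testing whether a correlation r from n cases differs from 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

CUTOFF = 1.96  # approximate two-tailed .05 critical value for large samples

r = 0.05  # a tiny, practically meaningless correlation (hypothetical)
small_sample = t_statistic(r, 100)
large_sample = t_statistic(r, 10_000)

print(f"n = 100:    t = {small_sample:.2f}, significant: {small_sample > CUTOFF}")
print(f"n = 10,000: t = {large_sample:.2f}, significant: {large_sample > CUTOFF}")
```

The correlation is the same size in both cases; only the sample size changes whether it counts as significant.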
Spurious Relationships
• Correlated.org was started to discover some of these odd and senseless correlations
  • Collects random bits of information from its users and runs correlations between all of the variables
• Some of my favorites include…
  • 67% of people who prefer to be the "O" in tic-tac-toe support capital punishment, compared with 40% of people in general
  • 15% of people who dislike mayonnaise are good dancers, compared with 29% of people in general
  • In general, 48% of people can burp at will, but of those who enjoy camping, 67% can burp at will
In the news…
• The news media often turns correlation into causation to make a better story
• E.g., “Shaving less than once a day could increase a man's risk of having a stroke by around 70%” (BBC News, February 7, 2003)
  • What researchers actually found was that men who have strokes tend to have less testosterone
  • They suggested doctors ask men if they shave less than once per day as an indicator of low testosterone
In the news…
• Correlations reported in the news often have a very small impact in the real world
• E.g., Researchers did find that getting breast implants tripled women’s risk of committing suicide
  • “A desire for breast augmentation may be a symptom of a far deeper insecurity and low self-esteem, which, in extreme cases, could trigger a suicide attempt” (BBC News, March 7, 2003)
• Actual data:
  • Out of 3,521 women who received breast implants in Sweden from 1965 to 1993, 15 committed suicide
  • In the general Swedish population, you would expect 5 out of 3,500 to commit suicide
  • So, the risk was tripled (5 → 15)
  • But the actual risk of suicide only went from about 0.14% (5 / 3,500) to about 0.43% (15 / 3,521)
• Also, these women differed in many other ways besides breast augmentation (SES, lifestyle, religiosity, etc.)
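The gap between a relative risk and an absolute risk can be worked through directly from the counts quoted above. This is a minimal sketch; the function and variable names are illustrative.

```python
def risk(cases, group_size):
    """Proportion of a group that experienced the outcome."""
    return cases / group_size

implant_risk = risk(15, 3521)   # women who received implants
baseline_risk = risk(5, 3500)   # expected in the general population

relative_risk = implant_risk / baseline_risk       # "the risk tripled"
absolute_increase = implant_risk - baseline_risk   # change in actual risk

print(f"Baseline risk:     {baseline_risk:.2%}")
print(f"Implant risk:      {implant_risk:.2%}")
print(f"Relative risk:     {relative_risk:.1f}x")
print(f"Absolute increase: {absolute_increase:.2%} (percentage points)")
```

The headline number is the relative risk; the absolute increase is the figure that describes how much any one person's risk actually changes.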
In the news…
• Correlations may be misleading depending on how variables are operationalized
• E.g., Researchers found uncoordinated children are more likely to become obese adults
  • Those who were obese at age 33 had “57% higher odds of having poor hand control at age seven, were twice as likely to have suffered poor coordination and almost four times as likely to have been clumsy” (British Medical Journal, Osika & Montgomery, 2008)
• “Coordination scores” of kids were obtained by asking their teachers how clumsy they were
  • Problems with this method?
  • Teachers may have assumed that overweight kids are clumsy
    • Common stereotype of overweight people
• More recently, this data was re-analyzed
  • After controlling for people’s BMI at age 7, there was no correlation between childhood coordination and adult obesity
  • All the researchers had actually found was that childhood obesity predicts adult obesity
  • And that people assume overweight kids are uncoordinated
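“Controlling for” a third variable can be sketched with the first-order partial correlation formula, r_xy.z = (r_xy − r_xz · r_yz) / √((1 − r_xz²)(1 − r_yz²)). The slides do not say which method the re-analysis used, so this is only an illustration, and the three correlations below are hypothetical numbers, not values from the study.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after removing what both share with z."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator

# Hypothetical correlations (NOT from the study):
# x = rated clumsiness at age 7, y = adult obesity, z = BMI at age 7
r_xy = 0.3  # clumsiness rating with adult obesity
r_xz = 0.5  # clumsiness rating with childhood BMI
r_yz = 0.6  # childhood BMI with adult obesity

controlled = partial_correlation(r_xy, r_xz, r_yz)
print(f"Before controlling for childhood BMI: r = {r_xy:.2f}")
print(f"After controlling for childhood BMI:  r = {controlled:.2f}")
```

When the x-y correlation is no larger than what the shared link with z would produce on its own (here 0.5 × 0.6 = 0.3), nothing is left once z is controlled.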