Quantitative Data Analysis II: Correlation and Simple Linear Regression SI0030 Social Research...
Quantitative Data Analysis II: Correlation and Simple Linear
Regression
SI0030 Social Research Methods
Week 6
Luke Sloan
Introduction
• Last Week – Recap
• Correlation
• How To Draw A Line
• Simple Linear Regression
• Summary
Last Week - Recap
• Hypotheses
• Probability & Significance (p ≤ 0.05)
• Chi-square test for two categorical variables
• t-test for one categorical and one interval variable
• What about a test for two interval variables?...
Correlation I
• Calculates the strength and direction of a linear relationship between two interval variables
• e.g. is there a relationship between age and income?
• Measured using the Pearson correlation coefficient (r)
• Data must be normally distributed (check with a histogram)
– If not normally distributed, use Spearman’s Rank Order Correlation (rho); consult Pallant (2005:297)
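As a rough illustration of the two coefficients named above, the sketch below computes Pearson’s r from its textbook definition and Spearman’s rho as Pearson’s r applied to ranks. The function names and the age/income figures are made up for this example; in the module itself these values come from SPSS.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: covariation of x and y scaled by their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Spearman's rho is Pearson's r computed on the ranks of the data."""
    return pearson_r(ranks(xs), ranks(ys))


# Hypothetical data: age in years, income in thousands of pounds.
ages = [25, 32, 41, 47, 53, 60, 68]
incomes = [18, 22, 30, 29, 41, 38, 95]

print("Pearson r:   ", round(pearson_r(ages, incomes), 2))
print("Spearman rho:", round(spearman_rho(ages, incomes), 2))
```

Because rho only uses the ordering of the values, a single extreme income affects it less than it affects r, which is one reason it is preferred when the data are not normally distributed.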
Correlation II
• ‘r’ can take any value from +1 to -1
• +/- indicates whether the relationship is positive or negative
• +1 or -1 is a perfect linear relationship, but usually it is not this clear cut
• Rule of thumb:
– +/- 0.7 = a strong linear relationship
– +/- 0.5 = a good linear relationship
– +/- 0.3 = a linear relationship
– Below +/- 0.3 = weak linear relationship
– 0 = no linear relationship
• Alternatively:
– +/- 0.10 to 0.29 = weak
– +/- 0.30 to 0.49 = medium
– +/- 0.50 to 1.00 = strong
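The alternative scale above can be written as a small lookup. This is only a sketch: `describe_r` is a hypothetical helper, and the label "negligible" for values below 0.10 is an assumption, since the scale does not name that range.

```python
def describe_r(r):
    """Return (strength, direction) for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")

    size = abs(r)  # strength ignores the sign
    if size >= 0.50:
        strength = "strong"
    elif size >= 0.30:
        strength = "medium"
    elif size >= 0.10:
        strength = "weak"
    else:
        strength = "negligible"  # assumed label: not named on the slide

    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction


print(describe_r(0.425))
print(describe_r(-0.7))
```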
Correlation III
[Figure: three scatter plots showing a positive relationship, no relationship and a negative relationship]
Formulate hypotheses and use scatter plots!
Correlation IV
H1 = There is a relationship between Age and the number of years a candidate has been a member of a political party
H0 = There is no relationship between Age and the number of years a candidate has been a member of a political party
What do you think?
Correlation V
Is this normal? Just to prove a point…
Correlation VI
Correlations

| | | What was your age last birthday | Number of years a party member |
|---|---|---|---|
| What was your age last birthday | Pearson Correlation | 1 | .425** |
| | Sig. (2-tailed) | | .000 |
| | N | 4481 | 1874 |
| Number of years a party member | Pearson Correlation | .425** | 1 |
| | Sig. (2-tailed) | .000 | |
| | N | 1874 | 1936 |

**. Correlation is significant at the 0.01 level (2-tailed).
Perfect correlation against itself (obviously!) and number of cases in analysis
Pearson’s Correlation Coefficient is r=0.43 – medium/good positive linear relationship
Significance for correlation is problematic (highly dependent on sample size) – report p-value but ignore level of significance
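One way to see why significance depends so heavily on sample size is the usual test statistic for a Pearson correlation, t = r√(n − 2) / √(1 − r²). The slides do not show this formula; the sketch below assumes it and holds a weak correlation of r = 0.10 fixed while n grows.

```python
from math import sqrt


def t_statistic(r, n):
    """t value for testing whether a Pearson correlation differs from zero."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


# Same weak correlation, increasingly large samples.
for n in (10, 30, 100, 1874):
    print(f"r = 0.10, n = {n:>4}: t = {t_statistic(0.10, n):.2f}")
```

With r fixed, t keeps growing with n, so in a sample of 1,874 even r = 0.10 exceeds the large-sample two-tailed 5% threshold of about 1.96, while the same r in a sample of 100 does not. That is why the strength of r matters more than its significance level.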
Correlation VII
• Don’t forget to reject or retain the null hypothesis and discuss the relationship
• Correlation is not causation!
The relationship between the number of years a candidate has been a member of a party and candidate age was explored using Pearson’s correlation coefficient. Both variables were confirmed to have normal distributions [?] and a scatter plot revealed a linear relationship. There was a medium-strength, positive relationship between the two variables (r=0.43, n=1874, p<0.05)... [go on to explain the relationship in detail]
How To Draw A Line I
• Correlation is indicative of a relationship, but it does not allow us to quantify it
• What if we wanted to explain how an increase in age leads to an increase in years of party membership?
• What if we wanted to predict years of party membership based only on age?
The line of best fit is predictive: it is the regression line!
How To Draw A Line II
• The regression line allows us to predict any given value of y when we know x
• i.e. if we know the age of a candidate we can predict how long they are likely to have been a member of a political party
• Another (more useful!) example would be years in education and income
• Using a regression line we can predict someone’s income based on the number of years they have been in education
• Assumes a causal relationship – that income is ‘caused’ by years in education
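For readers curious how a line of best fit is actually found, the sketch below uses the standard least-squares formulas: the slope is the covariation of x and y divided by the variation in x, and the intercept makes the line pass through the two means. The function name and the education/income figures are invented for illustration; SPSS does this calculation for you.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # line passes through (mean_x, mean_y)
    return intercept, slope


# Hypothetical data: years in education and income in thousands of pounds.
years_in_education = [10, 11, 12, 13, 14, 16, 18]
income = [19, 21, 22, 24, 25, 27, 31]

intercept, slope = fit_line(years_in_education, income)
print(f"income = {intercept:.2f} + ({slope:.2f} * years in education)")
```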
How To Draw A Line III
• But… we don’t simply look very closely at the line and the axis of the scatter plot because the regression line can be written as an equation:
y = a + b x
• ‘y’ represents the dependent variable (what we are trying to predict), e.g. income
• ‘a’ represents the intercept (where the regression line crosses the vertical ‘y’ axis), aka the constant
• ‘b’ represents the slope of the line (the association between ‘y’ & ‘x’), e.g. how income changes in relation to education
• ‘x’ represents the independent variable (what we are using to predict ‘y’), e.g. years in education
How To Draw A Line IV
[Figure: straight lines plotted against an x axis and a y axis, labelled y = 0 + 1x, y = 0 + 2x, y = 0 + 0.5x, y = 0 + 0.25x and y = 1 + 1x]
What about…
Simple Linear Regression
• If we know the slope (b) and the intercept (a), for any given value of ‘x’ we can predict ‘y’
EXAMPLE: predicting income (y) in thousands (£) from years in education (x)
Preconditions:
– Intercept (a) = 4
– Slope (b) = 1.5
– For someone with 10 years of education
Equations:
y = a + bx
Or…
Income = intercept + (slope * years in education)
Or…
Income = 4 + (1.5 * 10) = 19 (£19,000)
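The worked example above translates directly into a one-line function. The name `predict_income` is hypothetical; the intercept of 4 and slope of 1.5 are the values given on the slide.

```python
def predict_income(years_in_education, intercept=4.0, slope=1.5):
    """Predicted income in thousands of pounds: y = a + b * x."""
    return intercept + slope * years_in_education


income = predict_income(10)
print(f"Predicted income: {income} thousand pounds")
```

Changing the argument gives the prediction for any other number of years, which is the practical point of writing the line as an equation.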
Simple Linear Regression II
• Assumptions
– Interval level data
– Linearity between ‘x’ and ‘y’
– Outliers (check scatter plot)
– Sample size = 100+?
• R² measure of ‘model fit’
– Literally the Pearson’s correlation coefficient squared
– R² tells us how much of the variance in the dependent variable is explained by the independent variable, e.g. how much of the variance in income can be explained by age
– Expressed as a percentage (1.0 = 100%, 0.5 = 50% etc)
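As a quick check of the "correlation squared" point, the sketch below squares the r = 0.425 reported in this lecture's SPSS correlation output and formats it as a percentage. The function name is made up for the example.

```python
def variance_explained(r):
    """R squared for simple linear regression: the correlation squared."""
    return r ** 2


r_squared = variance_explained(0.425)
print(f"R squared = {r_squared:.3f} ({r_squared:.1%} of variance explained)")
```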
Simple Linear Regression III
H0 = There is no relationship between Age and the number of years a candidate has been a member of a political party
H1 = There is a relationship between Age and the number of years a candidate has been a member of a political party
H2 = As the age of a candidate increases, so will the number of years that they have been a party member
‘Years as Party Member’ = intercept + (slope * ’Age’)
Simple Linear Regression IV
Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .425ᵃ | .181 | .180 | 11.995 |

a. Predictors: (Constant), What was your age last birthday

ANOVAᵇ

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 59446.085 | 1 | 59446.085 | 413.170 | .000ᵃ |
| | Residual | 269339.696 | 1872 | 143.878 | | |
| | Total | 328785.781 | 1873 | | | |

a. Predictors: (Constant), What was your age last birthday
b. Dependent Variable: Number of years a party member
• R: Pearson’s correlation coefficient (same value!)
• R Square: 18% of variance in party membership (y) explained by age (x)
• ANOVA: this tests the hypothesis that the model is a better predictor of party membership than if we simply used the mean value of party membership
• Sig.: p<0.05 so the regression model is a significantly better predictor than the mean value
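The figures in the ANOVA table fit together arithmetically, and working through them shows where R Square and F come from. The sketch below only re-derives values already printed in the SPSS output, using the standard definitions (mean square = sum of squares / df, F = regression mean square / residual mean square); the variable names are chosen for this example.

```python
# Values copied from the ANOVA table in the SPSS output.
ss_regression = 59446.085
ss_residual = 269339.696
df_regression = 1
df_residual = 1872

# Total variation in party membership = explained + unexplained.
ss_total = ss_regression + ss_residual

# R squared: the share of the total variation explained by age.
r_squared = ss_regression / ss_total

# Mean squares and the F ratio reported in the table.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual

print(f"Total sum of squares: {ss_total:.3f}")
print(f"R squared:            {r_squared:.3f}")
print(f"Residual mean square: {ms_residual:.3f}")
print(f"F:                    {f_statistic:.2f}")
```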
Simple Linear Regression V
Coefficientsᵃ

| Model | | Unstandardized Coefficients: B | Unstandardized Coefficients: Std. Error | Standardized Coefficients: Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | -6.899 | 1.156 | | -5.966 | .000 |
| | What was your age last birthday | .418 | .021 | .425 | 20.327 | .000 |

a. Dependent Variable: Number of years a party member
• This is the intercept (a): the B value for (Constant), -6.899
• This is the slope (b): the B value for age, .418
• p<0.05 so ‘Age’ has a significant effect on ‘Party Membership’
y = a + b x
Or…
‘Party Membership’ = -6.9 + (0.42 * ‘Age’)
A one unit (one year) increase in age predicts an increase of 0.42 years of party membership
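To make the fitted equation concrete, the sketch below plugs a few ages into it using the unrounded coefficients from the Coefficients table. The function name and the example ages are illustrative choices, not part of the original analysis.

```python
INTERCEPT = -6.899  # B for (Constant) in the Coefficients table
SLOPE = 0.418       # B for 'What was your age last birthday'


def predict_years_as_member(age):
    """Predicted years of party membership: y = a + b * age."""
    return INTERCEPT + SLOPE * age


for age in (30, 50, 70):
    print(f"Age {age}: predicted {predict_years_as_member(age):.1f} years as a member")
```

Note that the line gives negative predictions for ages below about 16.5, so the equation should not be read outside the range of ages actually present in the data.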
Simple Linear Regression VI
… and this is what we saw in the original scatter plot!
• The ‘regression line’ will intercept the vertical (y) axis at -6.9
• The ‘regression line’ rises by 0.42 on the vertical axis (y) for every one unit increase on the horizontal axis (x)
• The R² value is low because of the fanning effect (remember the histograms!)
Summary
• How to describe and quantify the relationship between two interval variables
• Correlation – the strength and direction of the association
• Regression – the causal and quantified effect of an independent on a dependent variable