Statistics and Research methods
Wiskunde voor HMI (Mathematics for HMI), Session 2
Correlation
Association between scores on two variables
– e.g., age and coordination skills in children, or price and quality
Scatter Diagram
A Scatter Diagram (or scatterplot) is a visual display of the relationship between two variables
Example: A company is interested in whether there is a relationship between the number of employees supervised by a manager and the amount of stress reported by that manager
[Scatter diagram: Stress and Employees Supervised. x-axis: # of Employees Supervised (0–12); y-axis: Stress Level (0–10)]
Cause and Effect
An important type of relationship between two variables: cause and effect
Independent variable = cause
Dependent variable = effect
Correlation and Causality
Three possible directions of causality:
1. X → Y (X causes Y)
2. X ← Y (Y causes X)
3. X ← Z → Y (a third variable Z causes both)
Correlation and Causality
In situations where variables cannot be manipulated experimentally, it is difficult to know whether one is actually causing the other
Example in a newspaper: “drinking coffee causes cancer”
– However, a third variable may cause both high coffee consumption and cancer
– Such third variables are called ‘confounds’
However, we can still try to predict one variable on the basis of a second variable, even if the causal relationship has not been determined
Predictor variable → Criterion variable
Scatter Diagrams
The independent (or predictor) variable goes on the horizontal (x) axis; the dependent (or criterion) variable on the vertical (y) axis.
[Scatter diagram: Hours of Overtime Worked and Spouse’s Marital Satisfaction. x-axis: Hours of Overtime (0–25); y-axis: Marital Satisfaction (0–10)]
Patterns of Correlation
Linear correlation
Curvilinear correlation
No correlation
Positive correlation
Negative correlation
Degree of Linear Correlation: The Correlation Coefficient
Figuring correlation using Z scores:
Cross-product of Z scores
– Multiply the Z score on one variable by the Z score on the other variable
Correlation coefficient
– Average of the cross-products of Z scores
Degree of Linear Correlation: The Correlation Coefficient
Formula for the correlation coefficient: r = Σ(ZX × ZY) / N
Perfect positive correlation: r = +1
No correlation: r = 0
Perfect negative correlation: r = –1
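The “average of cross-products of Z scores” definition can be checked numerically. Below is a minimal sketch; the stress/supervision numbers are made up for illustration, not taken from the figure:

```python
from statistics import mean, pstdev

def pearson_r(xs, ys):
    """r as the average cross-product of Z scores: r = sum(ZX * ZY) / N."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)   # population SDs, as in the Z-score formula
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

# Hypothetical stress/supervision data (illustrative only)
employees = [2, 4, 6, 8, 10]
stress = [3, 4, 6, 7, 9]
print(round(pearson_r(employees, stress), 3))   # → 0.993
```

Note the use of `pstdev` (population SD, dividing by N) rather than `stdev`, so that the cross-product average matches the r = Σ(ZX × ZY) / N formula exactly.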
Correlation and Causality
Correlational research design
– Correlation as a statistical procedure
– Correlation as a kind of research design
Issues in Interpreting the Correlation Coefficient
Statistical significance, e.g. p < .05
Proportionate reduction in error = proportion of variance accounted for = r²
– Used to compare correlations
Issues in Interpreting the Correlation Coefficient (continued)
Restriction in range
Unreliability of measurement
Correlation in Research Articles
Scatter diagrams occasionally shown
Correlation matrix
Regression
Making predictions: does knowing a person’s score on one variable allow us to say what their score on a second variable is likely to be?
The method we use to make predictions is called regression.
When scores on one variable are used to predict scores on another variable, it is called bivariate regression (two variables).
When scores on two or more variables are used to predict scores on another variable, it is called multiple regression.
Naming (two variables)
                   Variable Predicted From       Variable Predicted To
Name               Independent Variable          Dependent Variable
Alternative Name   Predictor Variable            Criterion Variable
Symbol             X                             Y
Example            Number of hours slept         Happy mood that day
                   night before
• These two variables correlate positively
• People who slept many hours tend to be in a happy mood that day, and people who slept few hours tend to be unhappy
• Preview: the line is called a regression line, and represents the estimated linear relationship between the two variables. Notice that the slope of the line is positive in this example.
The Regression Line
Relation between the predictor variable and predicted values of the criterion variable
Formula: Ŷ = a + (b)(X)
Slope of the regression line
– Equals b, the raw-score regression coefficient
Intercept of the regression line
– Equals a, the regression constant
Method of least squares used to derive a and b
Method of least squares
a and b derived by:
– the least-squares method (drawing)
– the line passes through the point of means (MX, MY)
where b = (β)(SDY/SDX) = (r)(SDY/SDX) and a = MY – bMX
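A minimal sketch of deriving a and b from data using these formulas. The sleep/mood numbers are hypothetical, chosen only to illustrate the computation:

```python
from statistics import mean, pstdev

def least_squares(xs, ys):
    """Derive the regression constant a and raw-score coefficient b,
    using b = (r)(SDY / SDX) and a = MY - b * MX, so the fitted
    line passes through the point of means (MX, MY)."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) * sx * sy)
    b = r * (sy / sx)
    a = my - b * mx
    return a, b

# Hypothetical data: hours slept the night before vs. mood rating that day
hours = [5, 6, 7, 8, 9]
mood = [2, 4, 5, 7, 8]
a, b = least_squares(hours, mood)
print(round(a, 3), round(b, 3))   # → -5.3 1.5
```

With these numbers the prediction rule is Ŷ = –5.3 + 1.5X: each extra hour of sleep predicts a mood rating 1.5 points higher.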
The Regression Line
Ŷ = a + (b)(X)
Bivariate Raw Score Prediction
Direct raw-score prediction model
– Predicted raw score (on the criterion variable) = regression constant plus the raw-score regression coefficient times the raw score on the predictor variable
– Formula: Ŷ = a + (b)(X)
– The “hat” over Y means “predicted”
Bivariate prediction with Z scores
Given the Z score for X, what is the Z score for Y? We use the prediction model:
ẐY = (β)(ZX)
where β (beta) is the “standardized regression coefficient”
It’s also called the “beta weight”, because it tells us how much “weight” to give to ZX when making a prediction for ZY.
The “hat” over ZY means “predicted”.
What is β?
It turns out that the best value to use for β in the prediction model is r, the (Pearson) correlation coefficient
Thus, the bivariate regression model is ẐY = (r)(ZX)
When r = 1, ẐY = ZX; when r = –1, ẐY = –ZX
When r = 0, there is no relation; ẐY = 0, so the “best guess” for Y is the mean score
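A minimal sketch of the Z-score prediction rule ẐY = (r)(ZX); the function name and the r values are illustrative:

```python
def predict_zy(r, zx):
    """Bivariate Z-score prediction: predicted ZY = r * ZX."""
    return r * zx

# With r = 0.5, someone 1 SD above the mean on X is predicted 0.5 SD above on Y
print(predict_zy(0.5, 1.0))    # → 0.5
# With r = 1 the prediction equals ZX; with r = -1 it equals -ZX
print(predict_zy(1.0, 2.0))    # → 2.0
print(predict_zy(-1.0, 2.0))   # → -2.0
# With r = 0 the prediction is always 0, i.e. the mean of Y
print(predict_zy(0.0, 2.0))    # → 0.0
```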
Proportionate Reduction in Error
We want a measure of how accurately our regression model (raw-score prediction formula) predicts the data
We can compare the error we make when predicting with our regression model, SSError, to the error we would make if we did not have the model, SSTotal
Proportionate Reduction in Error
Error
– Actual score minus the predicted score: Error = Y – Ŷ
SSError = sum of squared errors using the prediction model = Σ(Y – Ŷ)²
SSTotal = sum of squared errors when predicting from the mean = Σ(Y – MY)²
Error and Proportionate Reduction in Error
Formula for proportionate reduction in error:
Proportionate reduction in error = (SSTotal – SSError) / SSTotal
Proportionate reduction in error = r²
Proportion of variance accounted for
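The identity “proportionate reduction in error = r²” can be verified numerically. A sketch, reusing the hypothetical sleep/mood numbers from earlier (illustrative data, not from the slides):

```python
from statistics import mean, pstdev

xs = [5, 6, 7, 8, 9]   # hypothetical hours slept
ys = [2, 4, 5, 7, 8]   # hypothetical mood ratings

mx, my = mean(xs), mean(ys)
sx, sy = pstdev(xs), pstdev(ys)
r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) * sx * sy)
b = r * sy / sx          # raw-score regression coefficient
a = my - b * mx          # regression constant

# SSTotal: squared error when predicting every Y from the mean MY
ss_total = sum((y - my) ** 2 for y in ys)
# SSError: squared error when predicting each Y from the model a + b*X
ss_error = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

pre = (ss_total - ss_error) / ss_total
print(round(pre, 4), round(r ** 2, 4))   # the two values match
```

The regression line makes much smaller errors than the mean does, and the proportionate reduction in error comes out equal to r², the proportion of variance accounted for.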