Data mining, prediction, correlation, regression, correlation analysis, regression analysis.
Lecture 5 Correlation and Regression Dr Peter Wheale.
![Page 1: Lecture 5 Correlation and Regression Dr Peter Wheale.](https://reader033.fdocuments.us/reader033/viewer/2022061612/56649dbc5503460f94aae385/html5/thumbnails/1.jpg)
Lecture 5 Correlation and Regression
Dr Peter Wheale
![Page 2: Lecture 5 Correlation and Regression Dr Peter Wheale.](https://reader033.fdocuments.us/reader033/viewer/2022061612/56649dbc5503460f94aae385/html5/thumbnails/2.jpg)
A Scatter Plot of Monthly Returns
![Page 3: Lecture 5 Correlation and Regression Dr Peter Wheale.](https://reader033.fdocuments.us/reader033/viewer/2022061612/56649dbc5503460f94aae385/html5/thumbnails/3.jpg)
Interpretation of Correlation Coefficient
Correlation coefficient (r) and its interpretation:
• r = +1: perfect positive correlation
• 0 < r < +1: positive linear relationship
• r = 0: no linear relationship
• -1 < r < 0: negative linear relationship
• r = -1: perfect negative correlation
![Page 4: Lecture 5 Correlation and Regression Dr Peter Wheale.](https://reader033.fdocuments.us/reader033/viewer/2022061612/56649dbc5503460f94aae385/html5/thumbnails/4.jpg)
Scatter Plots and Correlation
![Page 5: Lecture 5 Correlation and Regression Dr Peter Wheale.](https://reader033.fdocuments.us/reader033/viewer/2022061612/56649dbc5503460f94aae385/html5/thumbnails/5.jpg)
Covariance of Rates of Return
$$\text{cov}_{1,2} = \frac{\sum_{t=1}^{n} (R_{t,1} - \bar{R}_1)(R_{t,2} - \bar{R}_2)}{n - 1}$$
Example: Calculate the covariance between the returns on the two stocks indicated below:
![Page 6: Lecture 5 Correlation and Regression Dr Peter Wheale.](https://reader033.fdocuments.us/reader033/viewer/2022061612/56649dbc5503460f94aae385/html5/thumbnails/6.jpg)
Covariance Using Historical Data
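Mean returns: $\bar{R}_1 = 0.05$, $\bar{R}_2 = 0.07$
Sum of cross products: $\sum (R_{t,1} - \bar{R}_1)(R_{t,2} - \bar{R}_2) = 0.0154$
Cov = 0.0154 / 2 = 0.0077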
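The covariance calculation can be sketched in Python as follows. The slide's data table did not survive in this transcript, so the two return series are illustrative values (an assumption) chosen so that their means are 0.05 and 0.07 and their cross products sum to 0.0154. The helper name `sample_covariance` is hypothetical.

```python
def sample_covariance(returns_1, returns_2):
    """Sample covariance: sum of cross products of deviations, divided by n - 1."""
    if len(returns_1) != len(returns_2) or len(returns_1) < 2:
        raise ValueError("need two equally long series with at least 2 observations")
    n = len(returns_1)
    mean_1 = sum(returns_1) / n
    mean_2 = sum(returns_2) / n
    cross_products = sum(
        (a - mean_1) * (b - mean_2) for a, b in zip(returns_1, returns_2)
    )
    return cross_products / (n - 1)


# Illustrative returns (assumed; not taken from the slide's table).
stock_1 = [0.05, -0.02, 0.12]
stock_2 = [0.07, -0.04, 0.18]

cov_12 = sample_covariance(stock_1, stock_2)
print(round(cov_12, 4))  # 0.0077
```

With three observations the divisor is n - 1 = 2, which is where the division by 2 in the example comes from.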
![Page 7: Lecture 5 Correlation and Regression Dr Peter Wheale.](https://reader033.fdocuments.us/reader033/viewer/2022061612/56649dbc5503460f94aae385/html5/thumbnails/7.jpg)
Sample Correlation Coefficient
Correlation, ρ, is a standardized measure of covariance and is bounded by +1 and -1:
$$\rho_{1,2} = \frac{\text{Cov}_{1,2}}{\sigma_1 \sigma_2}$$
Example: The covariance of returns on two assets is 0.0051, σ1 = 7% and σ2 = 11%. Calculate ρ1,2.
$$\rho_{1,2} = \frac{0.0051}{0.07 \times 0.11} = 0.662$$
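As a quick check of this example, the standardization step can be written as a small Python function. This is only a sketch; the function name `correlation_from_covariance` is a hypothetical label, and the inputs are the figures quoted in the example.

```python
def correlation_from_covariance(cov_12, sigma_1, sigma_2):
    """Correlation = covariance divided by the product of the standard deviations."""
    if sigma_1 <= 0 or sigma_2 <= 0:
        raise ValueError("standard deviations must be positive")
    return cov_12 / (sigma_1 * sigma_2)


# Inputs from the example: covariance 0.0051, sigma_1 = 7%, sigma_2 = 11%.
rho_12 = correlation_from_covariance(0.0051, 0.07, 0.11)
print(round(rho_12, 3))  # 0.662
```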
![Page 8: Lecture 5 Correlation and Regression Dr Peter Wheale.](https://reader033.fdocuments.us/reader033/viewer/2022061612/56649dbc5503460f94aae385/html5/thumbnails/8.jpg)
Testing H0: Correlation = 0
The test of whether the true correlation between two random variables is zero (i.e., there is no correlation) is a t-test based on the sample correlation coefficient, r. With n (pairs of) observations the test statistic is:
$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$
The degrees of freedom are n – 2.
![Page 9: Lecture 5 Correlation and Regression Dr Peter Wheale.](https://reader033.fdocuments.us/reader033/viewer/2022061612/56649dbc5503460f94aae385/html5/thumbnails/9.jpg)
Example
Data: n = 10, r = 0.475
Determine if the sample correlation is significant at the 5% level of significance.
$$t = \frac{0.475\sqrt{8}}{\sqrt{1 - 0.475^2}} = \frac{1.3435}{0.88} = 1.5267$$
The two-tailed critical t-values at a 5% level of significance with df = 8 (n – 2) are ±2.306.
Since -2.306 ≤ 1.5267 ≤ 2.306, the null hypothesis cannot be rejected, i.e. the correlation between variables X and Y is not significantly different from zero at the 5% significance level.
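The arithmetic of this test can be reproduced with a short Python sketch. The function name `correlation_t_statistic` is hypothetical, and the critical value 2.306 is simply hard-coded from the example rather than looked up from a t-distribution.

```python
import math


def correlation_t_statistic(r, n):
    """t-statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need more than 2 observations")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


t_stat = correlation_t_statistic(0.475, 10)

# Two-tailed 5% critical value for df = 8, as quoted in the example.
critical_t = 2.306
reject_null = abs(t_stat) > critical_t

print(round(t_stat, 4), reject_null)
```

Here `reject_null` is `False`, matching the conclusion that the null hypothesis cannot be rejected.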
![Page 13: Lecture 5 Correlation and Regression Dr Peter Wheale.](https://reader033.fdocuments.us/reader033/viewer/2022061612/56649dbc5503460f94aae385/html5/thumbnails/13.jpg)
Linear Regression
• Dependent variable: the variable whose changes you are trying to explain
• Independent variable: the variable being used to explain the changes in the dependent variable
• Example: You want to predict housing starts using mortgage interest rates:
Independent variable = mortgage interest rates
Dependent variable = housing starts
![Page 14: Lecture 5 Correlation and Regression Dr Peter Wheale.](https://reader033.fdocuments.us/reader033/viewer/2022061612/56649dbc5503460f94aae385/html5/thumbnails/14.jpg)
Regression Equation
$$Y_i = b_0 + b_1 X_i + \varepsilon_i$$
where:
$Y_i$ = dependent variable
$b_0$ = y-intercept
$b_1$ = slope coefficient
$X_i$ = independent variable
$\varepsilon_i$ = error term
![Page 15: Lecture 5 Correlation and Regression Dr Peter Wheale.](https://reader033.fdocuments.us/reader033/viewer/2022061612/56649dbc5503460f94aae385/html5/thumbnails/15.jpg)
Assumptions of Linear Regression
• Linear relation between dependent and independent variables
• Independent variable uncorrelated with error term
• Expected value of error term is zero
• Variance of the error term is constant
• Error term is independently distributed
• Error term is normally distributed
![Page 16: Lecture 5 Correlation and Regression Dr Peter Wheale.](https://reader033.fdocuments.us/reader033/viewer/2022061612/56649dbc5503460f94aae385/html5/thumbnails/16.jpg)
Estimated Regression Coefficients
Estimated regression line is:
$$\hat{Y}_i = b_0 + b_1 X_i$$
where $b_0$ is the estimated y-intercept and $b_1$ is the estimated slope.
![Page 17: Lecture 5 Correlation and Regression Dr Peter Wheale.](https://reader033.fdocuments.us/reader033/viewer/2022061612/56649dbc5503460f94aae385/html5/thumbnails/17.jpg)
Estimating the slope coefficient
b1 = cov(X,Y) / var(X)
Example: Compute the slope coefficient and intercept term for the least squares regression equation using the following information:
$\sum (X - \bar{X})(Y - \bar{Y}) = 445$ and $\sum (X - \bar{X})^2 = 374.50$. The sample means of X and Y are 25 and 75, respectively.
Because the (n – 1) divisors in cov(X,Y) and var(X) cancel, the slope coefficient is b1 = 445 / 374.5 = 1.188.
The intercept term is b0 = 75 – 1.188 (25) = 45.3.
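The same two steps can be sketched in Python, working directly from the summary figures given in the example. The function name `least_squares_coefficients` and its parameter names are hypothetical.

```python
def least_squares_coefficients(sum_cross_products, sum_squared_dev_x, mean_x, mean_y):
    """Slope and intercept of a simple least squares line from summary statistics."""
    if sum_squared_dev_x <= 0:
        raise ValueError("X must vary for the slope to be defined")
    # The (n - 1) divisors of cov(X, Y) and var(X) cancel, so the raw sums suffice.
    slope = sum_cross_products / sum_squared_dev_x
    # The fitted line passes through the point of sample means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


b1, b0 = least_squares_coefficients(445, 374.5, 25, 75)
print(round(b1, 3), round(b0, 1))  # 1.188 45.3
```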
![Page 18: Lecture 5 Correlation and Regression Dr Peter Wheale.](https://reader033.fdocuments.us/reader033/viewer/2022061612/56649dbc5503460f94aae385/html5/thumbnails/18.jpg)
Calculating the Standard Error of the Estimate (SEE)
• SEE measures the accuracy of the prediction from a regression equation
• It is the standard deviation of the error term
• The lower the SEE, the greater the accuracy
$$\text{SEE} = \sqrt{\frac{\text{SSE}}{n - 2}}$$
where:
SSE = sum of squared errors
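A minimal Python sketch of the SEE calculation follows. The data are hypothetical (not from the lecture), and the intercept 2.2 and slope 0.6 are the least squares estimates for those five points. The function name `standard_error_of_estimate` is also an assumption.

```python
import math


def standard_error_of_estimate(x, y, intercept, slope):
    """SEE = sqrt(SSE / (n - 2)) for a fitted simple regression line."""
    n = len(x)
    if n != len(y) or n <= 2:
        raise ValueError("need two equally long series with more than 2 observations")
    # Sum of squared errors: squared gaps between actual and fitted Y values.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (n - 2))


# Hypothetical data; 2.2 and 0.6 are the least squares estimates for these points.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
see = standard_error_of_estimate(x, y, intercept=2.2, slope=0.6)
print(round(see, 4))
```

For these points SSE is 2.4 and n - 2 = 3, so the SEE is the square root of 0.8.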
![Page 19: Lecture 5 Correlation and Regression Dr Peter Wheale.](https://reader033.fdocuments.us/reader033/viewer/2022061612/56649dbc5503460f94aae385/html5/thumbnails/19.jpg)
Interpreting the Coefficient of Determination (R²)
• R² measures the percentage of the variation in the dependent variable that can be explained by the independent variable
• An R² of 0.25 means the independent variable explains 25% of the variation in the dependent variable
Caution: You cannot conclude causation
![Page 20: Lecture 5 Correlation and Regression Dr Peter Wheale.](https://reader033.fdocuments.us/reader033/viewer/2022061612/56649dbc5503460f94aae385/html5/thumbnails/20.jpg)
Calculating the Coefficient of Determination (R²)
• For simple linear regression, R² is the correlation coefficient (r) squared
Example: Correlation coefficient between X and Y, (r) = 0.50
Coefficient of determination = 0.50² = 0.25
![Page 21: Lecture 5 Correlation and Regression Dr Peter Wheale.](https://reader033.fdocuments.us/reader033/viewer/2022061612/56649dbc5503460f94aae385/html5/thumbnails/21.jpg)
Coefficient of Determination (R²)
R² can also be calculated with SST and SSR:
SS Total = SS Regression + SS Error
Total variation = explained variation + unexplained variation
$$R^2 = \frac{\text{SSR}}{\text{SST}} = \frac{\text{SST} - \text{SSE}}{\text{SST}} = 1 - \frac{\text{SSE}}{\text{SST}} = \frac{\text{explained variation}}{\text{total variation}}$$
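The decomposition can be checked numerically with a short Python sketch. The data are hypothetical (not from the lecture) and the function name `r_squared_decomposition` is an assumption. It fits a simple least squares line, splits the total variation into explained and unexplained parts, and also returns the correlation coefficient so that R² can be compared with r squared.

```python
import math


def r_squared_decomposition(x, y):
    """Fit a simple least squares line and return SST, SSR, SSE, R2 and r."""
    n = len(x)
    if n != len(y) or n <= 2:
        raise ValueError("need two equally long series with more than 2 observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("both variables must vary")

    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * xi for xi in x]

    sst = s_yy                                             # total variation
    ssr = sum((f - mean_y) ** 2 for f in fitted)           # explained variation
    sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))   # unexplained variation
    r = s_xy / math.sqrt(s_xx * s_yy)                      # sample correlation
    return {"SST": sst, "SSR": ssr, "SSE": sse, "R2": ssr / sst, "r": r}


# Hypothetical data for illustration only.
result = r_squared_decomposition([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print({name: round(value, 4) for name, value in result.items()})
```

For these points SST = 6, SSR = 3.6 and SSE = 2.4, so R² = 0.6, which agrees with both 1 - SSE/SST and the squared correlation coefficient.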