Day 4 Correlation and Regression by Binam Ghimire
![Page 1: Day 4 Correlation and Regression by Binam Ghimire](https://reader036.fdocuments.us/reader036/viewer/2022062310/56815f5b550346895dce3e00/html5/thumbnails/1.jpg)
![Page 2: Day 4 Correlation and Regression by Binam Ghimire](https://reader036.fdocuments.us/reader036/viewer/2022062310/56815f5b550346895dce3e00/html5/thumbnails/2.jpg)
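Example Data

| Day | Output (tons) | Cost (£000) |
|---|---|---|
| 1 | 23 | 58 |
| 2 | 17 | 50 |
| 3 | 24 | 54 |
| 4 | 35 | 64 |
| 5 | 10 | 40 |
| 6 | 16 | 43 |
| 7 | 15 | 42 |
| 8 | 24 | 50 |
| 9 | 18 | 53 |
| 10 | 30 | 62 |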
![Page 3: Day 4 Correlation and Regression by Binam Ghimire](https://reader036.fdocuments.us/reader036/viewer/2022062310/56815f5b550346895dce3e00/html5/thumbnails/3.jpg)
The scatter diagram of the data would appear as below:
[Figure: scatter diagram of the example data, horizontal axis from 5 to 40, vertical axis from 40 to 70]
![Page 4: Day 4 Correlation and Regression by Binam Ghimire](https://reader036.fdocuments.us/reader036/viewer/2022062310/56815f5b550346895dce3e00/html5/thumbnails/4.jpg)
Alternatively a negative correlation would appear as below:
[Figure: scatter diagram showing a negative correlation, horizontal axis from 5 to 40, vertical axis from 0 to 50]
![Page 5: Day 4 Correlation and Regression by Binam Ghimire](https://reader036.fdocuments.us/reader036/viewer/2022062310/56815f5b550346895dce3e00/html5/thumbnails/5.jpg)
Alternatively data with no correlation may appear as below:
[Figure: scatter diagram showing no correlation, horizontal axis from 0 to 40, vertical axis from 0 to 60]
![Page 6: Day 4 Correlation and Regression by Binam Ghimire](https://reader036.fdocuments.us/reader036/viewer/2022062310/56815f5b550346895dce3e00/html5/thumbnails/6.jpg)
Correlation Scale

| r | Interpretation |
|---|---|
| -1 | Perfect negative correlation |
| 0 | No correlation |
| +1 | Perfect positive correlation |
![Page 7: Day 4 Correlation and Regression by Binam Ghimire](https://reader036.fdocuments.us/reader036/viewer/2022062310/56815f5b550346895dce3e00/html5/thumbnails/7.jpg)
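Pearson’s product moment correlation coefficient (r)

$$ r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - (\sum x)^2\right]\left[n \sum y^2 - (\sum y)^2\right]}} $$

| | x | y | xy | x² | y² |
|---|---|---|---|---|---|
| | 23 | 58 | 1334 | 529 | 3364 |
| | 17 | 50 | 850 | 289 | 2500 |
| | 24 | 54 | 1296 | 576 | 2916 |
| | … | … | … | … | … |
| ∑ | 212 | 516 | 11452 | 5000 | 27242 |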
![Page 8: Day 4 Correlation and Regression by Binam Ghimire](https://reader036.fdocuments.us/reader036/viewer/2022062310/56815f5b550346895dce3e00/html5/thumbnails/8.jpg)
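Pearson’s product moment correlation coefficient (r)
Alternatively, the formula may be written in another form (the equation on this slide is not preserved in the transcript).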
![Page 9: Day 4 Correlation and Regression by Binam Ghimire](https://reader036.fdocuments.us/reader036/viewer/2022062310/56815f5b550346895dce3e00/html5/thumbnails/9.jpg)
Pearson’s product moment correlation coefficient (r) (2)

$$ r = \frac{10 \times 11452 - 212 \times 516}{\sqrt{\left[10 \times 5000 - (212)^2\right]\left[10 \times 27242 - (516)^2\right]}} = \frac{5128}{\sqrt{5056 \times 6164}} = 0.9186 $$
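As a check on the arithmetic, the calculation can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the variable names (`output_tons`, `cost_k`) are illustrative and do not come from the slides.

```python
import math

# Example data from the slides: daily output (tons) and cost (£000)
output_tons = [23, 17, 24, 35, 10, 16, 15, 24, 18, 30]
cost_k = [58, 50, 54, 64, 40, 43, 42, 50, 53, 62]

n = len(output_tons)
sum_x = sum(output_tons)
sum_y = sum(cost_k)
sum_xy = sum(x * y for x, y in zip(output_tons, cost_k))
sum_x2 = sum(x * x for x in output_tons)
sum_y2 = sum(y * y for y in cost_k)

# Pearson's r from the column totals, as in the formula on the slide
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 212 516 11452 5000 27242
print(round(r, 4))                           # 0.9186
```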
![Page 10: Day 4 Correlation and Regression by Binam Ghimire](https://reader036.fdocuments.us/reader036/viewer/2022062310/56815f5b550346895dce3e00/html5/thumbnails/10.jpg)
Linear Regression
We need to establish a ‘line of best fit’. The ‘freehand method’ has many drawbacks.
We want the fit that is best in a well-defined sense, so we do not use crude graphical techniques. Instead we identify the ‘least squares line’: the line that minimises the sum of the squared vertical distances between the data points and the line.
![Page 11: Day 4 Correlation and Regression by Binam Ghimire](https://reader036.fdocuments.us/reader036/viewer/2022062310/56815f5b550346895dce3e00/html5/thumbnails/11.jpg)
Linear Regression (2)
The equation for this line is Y = 30.10 + 1.014X
![Page 12: Day 4 Correlation and Regression by Binam Ghimire](https://reader036.fdocuments.us/reader036/viewer/2022062310/56815f5b550346895dce3e00/html5/thumbnails/12.jpg)
Linear Regression (3)
The equation of this line is Y = 30.10 + 1.014X
But how is this obtained?
The scattered points illustrate the actual data, while the least squares line is an estimate of Y for a given value of X. Notice the distance between the scattered points and the line; this will give you some idea of how good a fit the line is.
![Page 13: Day 4 Correlation and Regression by Binam Ghimire](https://reader036.fdocuments.us/reader036/viewer/2022062310/56815f5b550346895dce3e00/html5/thumbnails/13.jpg)
Linear Regression (4)
How do we determine the least squares line?
We simply need to determine the intercept (a) and the gradient (b).
The equation of the line is therefore Y = a + bX
You need to apply a little calculus (we will omit that process here) to develop standard equations.
![Page 14: Day 4 Correlation and Regression by Binam Ghimire](https://reader036.fdocuments.us/reader036/viewer/2022062310/56815f5b550346895dce3e00/html5/thumbnails/14.jpg)
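Linear Regression Equations

$$ b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2} $$

$$ b = \frac{10 \times 11452 - 212 \times 516}{10 \times 5000 - 44944} = \frac{5128}{5056} = 1.0142405 $$

Or

$$ b = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} $$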
![Page 15: Day 4 Correlation and Regression by Binam Ghimire](https://reader036.fdocuments.us/reader036/viewer/2022062310/56815f5b550346895dce3e00/html5/thumbnails/15.jpg)
Linear Regression Equations (2)
And

$$ a = \bar{Y} - b\,\bar{X} $$

$$ a = 51.6 - 1.0142405 \times 21.2 = 30.098101 $$

Rounding these values a little: Y = 30.10 + 1.014X
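The slope and intercept can be recomputed from the example data as a cross-check. The sketch below uses only the standard library; `predict_cost` is a hypothetical helper added for illustration and is not part of the original slides.

```python
# Example data from the slides: daily output (tons) and cost (£000)
output_tons = [23, 17, 24, 35, 10, 16, 15, 24, 18, 30]
cost_k = [58, 50, 54, 64, 40, 43, 42, 50, 53, 62]

n = len(output_tons)
sum_x = sum(output_tons)
sum_y = sum(cost_k)
sum_xy = sum(x * y for x, y in zip(output_tons, cost_k))
sum_x2 = sum(x * x for x in output_tons)

# Gradient b and intercept a of the least squares line Y = a + bX
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n


def predict_cost(x):
    """Estimated cost (£000) for an output of x tons, using the fitted line."""
    return a + b * x


print(round(b, 7), round(a, 6))   # 1.0142405 30.098101
print(round(predict_cost(20), 2))  # estimated cost for 20 tons of output
```

Estimates of this kind are most trustworthy for outputs inside the observed range (10 to 35 tons); outside it, the line is an extrapolation.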
![Page 16: Day 4 Correlation and Regression by Binam Ghimire](https://reader036.fdocuments.us/reader036/viewer/2022062310/56815f5b550346895dce3e00/html5/thumbnails/16.jpg)
Coefficient of Determination
The coefficient of determination measures the proportion of the variation in the dependent variable (y) explained by the variation in the independent variable (x).
It is reported as r², the square of the product moment correlation coefficient.
It does not explain causation.
![Page 17: Day 4 Correlation and Regression by Binam Ghimire](https://reader036.fdocuments.us/reader036/viewer/2022062310/56815f5b550346895dce3e00/html5/thumbnails/17.jpg)
Coefficient of Determination (2)
For our previous example:
r² = (0.9186)² = 0.844
This means that 84.4% of the variation in cost is explained by the variation in output volume. Alternatively, 15.6% of the variation is not explained.
![Page 18: Day 4 Correlation and Regression by Binam Ghimire](https://reader036.fdocuments.us/reader036/viewer/2022062310/56815f5b550346895dce3e00/html5/thumbnails/18.jpg)
So far
Correlation is measured on a scale from -1 to +1 using Pearson’s product moment correlation coefficient (r).
Linear regression identifies the line of ‘best fit’ using the formula Y = a + bX
The coefficient of determination (r²) measures the extent to which the dependent variable is explained by the independent variable.
![Page 19: Day 4 Correlation and Regression by Binam Ghimire](https://reader036.fdocuments.us/reader036/viewer/2022062310/56815f5b550346895dce3e00/html5/thumbnails/19.jpg)
Question to learn the terminologies
The data below shows annual company income (£m) against year of trading.

| Year | Income (£m) |
|---|---|
| 1 | 20 |
| 2 | 23 |
| 3 | 26 |
| 4 | 28 |
| 5 | 35 |

A regression of income on year gives the following results:
r = 0.974, r squared = 0.948, intercept = 15.9, slope = 3.5
Can we explain each of the results above?
Use the results above to make a forecast for company income for year 6. What assumption is made in making this forecast?
![Page 20: Day 4 Correlation and Regression by Binam Ghimire](https://reader036.fdocuments.us/reader036/viewer/2022062310/56815f5b550346895dce3e00/html5/thumbnails/20.jpg)
Coefficient of Determination (3)
Relationship among SST, SSR, SSE:
SST = SSR + SSE
Where,
SST = total sum of squares, given by $\sum (y_i - \bar{y})^2$
SSR = sum of squares due to regression, $\sum (\hat{y}_i - \bar{y})^2$
SSE = sum of squares due to error, $\sum (y_i - \hat{y}_i)^2$
Coefficient of determination: r² = SSR / SST
![Page 21: Day 4 Correlation and Regression by Binam Ghimire](https://reader036.fdocuments.us/reader036/viewer/2022062310/56815f5b550346895dce3e00/html5/thumbnails/21.jpg)
Coefficient of Determination (4)
Relationship among SST, SSR, SSE:
SST = SSR + SSE
Total Variation = Explained Variation + Unexplained Variation
The coefficient of determination is:

$$ r^2 = \frac{SSR}{SST} = \frac{SST - SSE}{SST} = 1 - \frac{SSE}{SST} = \frac{\text{Explained Variation}}{\text{Total Variation}} $$
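To see the decomposition at work, the sketch below computes the three sums of squares for the output and cost example. It assumes the standard definitions: SST from the deviations of y about its mean, SSR from the deviations of the fitted values about that mean, and SSE from the residuals. The variable names are illustrative.

```python
# Example data from the slides: daily output (tons) and cost (£000)
x = [23, 17, 24, 35, 10, 16, 15, 24, 18, 30]
y = [58, 50, 54, 64, 40, 43, 42, 50, 53, 62]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least squares line Y = a + bX
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
a = mean_y - b * mean_x
fitted = [a + b * xi for xi in x]

sst = sum((yi - mean_y) ** 2 for yi in y)                # total variation
ssr = sum((fi - mean_y) ** 2 for fi in fitted)           # explained variation
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained variation

r_squared = ssr / sst

print(round(sst, 1), round(ssr, 1), round(sse, 1))    # 616.4 520.1 96.3
print(round(r_squared, 3), round(1 - sse / sst, 3))   # 0.844 0.844
```

Both routes to r² agree with the value obtained by squaring r = 0.9186.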
![Page 22: Day 4 Correlation and Regression by Binam Ghimire](https://reader036.fdocuments.us/reader036/viewer/2022062310/56815f5b550346895dce3e00/html5/thumbnails/22.jpg)
Calculating the Standard Error of the Estimate (SEE)
SEE measures the accuracy of the predictions from a regression equation.
It is the standard deviation of the error term.
The lower the SEE, the greater the accuracy.

$$ SEE = \sqrt{\frac{SSE}{n - 2}} $$

where SSE = sum of squared errors.
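Applied to the earlier output and cost example, the SEE can be computed as below. This is a sketch under the assumption that SSE is the sum of squared residuals about the least squares line and that the divisor is n − 2; the variable names are illustrative.

```python
import math

# Example data from the slides: daily output (tons) and cost (£000)
x = [23, 17, 24, 35, 10, 16, 15, 24, 18, 30]
y = [58, 50, 54, 64, 40, 43, 42, 50, 53, 62]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least squares line Y = a + bX
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
a = mean_y - b * mean_x

# Sum of squared errors: squared gaps between actual and fitted costs
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Standard error of the estimate, with n - 2 degrees of freedom
see = math.sqrt(sse / (n - 2))

print(round(see, 2))  # 3.47, in the units of cost (£000)
```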
![Page 23: Day 4 Correlation and Regression by Binam Ghimire](https://reader036.fdocuments.us/reader036/viewer/2022062310/56815f5b550346895dce3e00/html5/thumbnails/23.jpg)
Covariance of Rates of Return

$$ cov_{1,2} = \frac{\sum_{t=1}^{n} (R_{t,1} - \bar{R}_1)(R_{t,2} - \bar{R}_2)}{n - 1} $$

Example: Calculate the covariance between the returns on the two stocks indicated below:
![Page 24: Day 4 Correlation and Regression by Binam Ghimire](https://reader036.fdocuments.us/reader036/viewer/2022062310/56815f5b550346895dce3e00/html5/thumbnails/24.jpg)
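Covariance Using Historical Data
Mean return on stock 1: R̄1 = 0.05
Mean return on stock 2: R̄2 = 0.07
Sum of the products of deviations: Σ = 0.0154
Cov = 0.0154 / (n - 1) = 0.0154 / 2 = 0.0077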
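The same calculation can be sketched in Python. The table of individual returns is not preserved in this transcript, so the three-period returns below are hypothetical values, chosen only because they reproduce the summary figures shown (means of 0.05 and 0.07, a sum of products of 0.0154). `sample_covariance` is an illustrative helper name.

```python
def sample_covariance(returns_1, returns_2):
    """Sample covariance of two equally long return series (divisor n - 1)."""
    n = len(returns_1)
    if n != len(returns_2) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_1 = sum(returns_1) / n
    mean_2 = sum(returns_2) / n
    cross_products = sum(
        (r1 - mean_1) * (r2 - mean_2) for r1, r2 in zip(returns_1, returns_2)
    )
    return cross_products / (n - 1)


# HYPOTHETICAL returns, not the original slide data
stock_1 = [-0.02, 0.05, 0.12]
stock_2 = [-0.04, 0.07, 0.18]

cov_12 = sample_covariance(stock_1, stock_2)
print(round(cov_12, 4))  # 0.0077
```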
![Page 25: Day 4 Correlation and Regression by Binam Ghimire](https://reader036.fdocuments.us/reader036/viewer/2022062310/56815f5b550346895dce3e00/html5/thumbnails/25.jpg)
Sample Correlation Coefficient
Correlation, ρ, is a standardized measure of covariance and is bounded by +1 and –1.

$$ \rho_{1,2} = \frac{Cov_{1,2}}{\sigma_1 \sigma_2} $$

Example: The covariance of returns on two assets is 0.0051, with σ1 = 7% and σ2 = 11%. Calculate ρ1,2.

$$ \rho_{1,2} = \frac{0.0051}{0.07 \times 0.11} = 0.662 $$
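A short Python check of this example follows. `correlation_from_covariance` is a hypothetical helper name; note that the standard deviations must be entered as decimals (0.07, not 7).

```python
def correlation_from_covariance(cov_12, sd_1, sd_2):
    """Correlation as covariance scaled by the two standard deviations."""
    if sd_1 <= 0 or sd_2 <= 0:
        raise ValueError("standard deviations must be positive")
    return cov_12 / (sd_1 * sd_2)


rho_12 = correlation_from_covariance(0.0051, 0.07, 0.11)
print(round(rho_12, 3))  # 0.662
```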
![Page 26: Day 4 Correlation and Regression by Binam Ghimire](https://reader036.fdocuments.us/reader036/viewer/2022062310/56815f5b550346895dce3e00/html5/thumbnails/26.jpg)
Testing H0: Correlation = 0
The test of whether the true correlation between two random variables is zero (i.e., there is no correlation) is a t-test based on the sample correlation coefficient, r. With n (pairs of) observations the test statistic is:

$$ t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}} $$

The test has n – 2 degrees of freedom.
![Page 27: Day 4 Correlation and Regression by Binam Ghimire](https://reader036.fdocuments.us/reader036/viewer/2022062310/56815f5b550346895dce3e00/html5/thumbnails/27.jpg)
Example
Data: n = 10, r = 0.475
Determine if the sample correlation is significant at the 5% level of significance.

$$ t = \frac{0.475 \times (8)^{0.5}}{\left[1 - (0.475)^2\right]^{0.5}} = \frac{1.3435}{0.88} = 1.5267 $$

The two-tailed critical t-values at a 5% level of significance with df = 8 (n - 2) are found to be ±2.306.
Since -2.306 ≤ 1.5267 ≤ 2.306, the null hypothesis cannot be rejected, i.e. the correlation between variables X and Y is not significantly different from zero at a 5% significance level.
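The test can be sketched in Python as follows. The critical value of 2.306 is taken directly from the example rather than computed, since the standard library has no t-distribution; `correlation_t_statistic` is an illustrative name.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    if n <= 2 or not -1 < r < 1:
        raise ValueError("need n > 2 and -1 < r < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


t_stat = correlation_t_statistic(0.475, 10)

# Two-tailed 5% critical value for df = 8, as quoted in the example
critical_t = 2.306
reject_null = abs(t_stat) > critical_t

print(round(t_stat, 4), reject_null)  # 1.5267 False
```

With the same r = 0.475, a larger sample would give a larger t statistic, so significance depends on n as well as on the size of r.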