Interpreting Bi-variate OLS Regression
Week 4, 2007 Lecture 4 Slide #1
Interpreting Bi-variate OLS Regression
• Stata Regression Output
• Regression plots and RSS
• R² -- Coefficient of Determination
  – Adjusted R²
• Sample Covariance/Correlation
• Hypothesis Testing
  – Standard Errors
  – T-tests and P-values
Week 4, 2007 Lecture 4 Slide #2
Data
• Use the "caschool.dta" file
• Data description: CaliforniaTestScores.pdf
• Build a Stata do-file as you go
• Model: Test score = f(student/teacher ratio)
Week 4, 2007 Lecture 4 Slide #3
Stata Regression Model: Regressing Test Score on Student Teacher Ratio

[Figure: two histograms with normal-density overlays — Percent of observations by Student Teacher Ratio (roughly 15–26) and by Test Score (roughly 600–700).]

histogram str, percent normal
histogram testscr, percent normal
Week 4, 2007 Lecture 4 Slide #4
Regression Output

regress testscr str, beta

      Source |       SS       df       MS              Number of obs =     420
-------------+------------------------------           F(  1,   418) =   22.58
       Model |  7794.11004     1  7794.11004           Prob > F      =  0.0000
    Residual |  144315.484   418  345.252353           R-squared     =  0.0512
-------------+------------------------------           Adj R-squared =  0.0490
       Total |  152109.594   419  363.030056           Root MSE      =  18.581

------------------------------------------------------------------------------
     testscr |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
         str |  -2.279808   .4798256    -4.75   0.000                -.2263628
       _cons |    698.933   9.467491    73.82   0.000                        .
------------------------------------------------------------------------------
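The summary statistics in the upper-right block of this output can be reproduced directly from the sums of squares in the ANOVA table. A minimal sketch in Python (values copied from the output above):

```python
import math

# Sums of squares from the Stata ANOVA table above
ess = 7794.11004      # Model (explained) sum of squares
rss = 144315.484      # Residual (unexplained) sum of squares
tss = 152109.594      # Total sum of squares
n, k = 420, 2         # observations; parameters (intercept + slope)

r2 = ess / tss                                   # ≈ 0.0512
adj_r2 = 1 - (rss / (n - k)) / (tss / (n - 1))   # ≈ 0.0490
root_mse = math.sqrt(rss / (n - k))              # ≈ 18.581
```

Each result matches the corresponding statistic Stata reports (R-squared, Adj R-squared, Root MSE).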
Week 4, 2007 Lecture 4 Slide #5
Regression Descriptive Statistics

cor testscr str, means

    Variable |       Mean   Std. Dev.        Min        Max
-------------+----------------------------------------------
     testscr |   654.1565    19.05335     605.55     706.75
         str |   19.64043    1.891812         14       25.8

             |  testscr      str
-------------+------------------
     testscr |   1.0000
         str |  -0.2264   1.0000
Week 4, 2007 Lecture 4 Slide #6
Regression Plot

[Figure: scatterplot of testscr (roughly 600–700) against str (roughly 15–25), with the fitted regression line and its 95% confidence interval overlaid.]

twoway (scatter testscr str) (lfitci testscr str)
Week 4, 2007 Lecture 4 Slide #7
Measuring "Goodness of Fit"
• Root Mean Squared Error ("Root MSE")
  – Measures spread around the regression line
• Coefficient of Determination (R²)

$$s_e = \sqrt{\frac{RSS}{n-K}}, \qquad \text{where } RSS = \sum_i e_i^2 \text{ and } K = \text{number of parameters}$$
$$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}$$

where

$$ESS = \sum_i (\hat{Y}_i - \bar{Y})^2 \quad \text{("model" or explained sum of squares)}$$

$$TSS = \sum_i (Y_i - \bar{Y})^2 \quad \text{("total" sum of squares)}$$

$$RSS = \sum_i e_i^2$$
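The decomposition TSS = ESS + RSS (which always holds for OLS with an intercept) can be verified by fitting a bivariate regression by hand. A sketch on made-up toy data, not the caschool data:

```python
# Bivariate OLS by hand on toy data, verifying TSS = ESS + RSS
xs = [14.0, 17.0, 19.0, 21.0, 24.0]
ys = [690.0, 670.0, 655.0, 640.0, 620.0]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# OLS slope and intercept
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * x for x in xs]
tss = sum((y - y_bar) ** 2 for y in ys)            # total
ess = sum((yh - y_bar) ** 2 for yh in y_hat)       # explained (Model)
rss = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # unexplained (Residual)

assert abs(tss - (ess + rss)) < 1e-8   # decomposition holds
r2 = ess / tss
```

The same arithmetic underlies the Model/Residual/Total rows in the Stata output.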
Week 4, 2007 Lecture 4 Slide #8
Explaining R²

[Figure: for a single observation, the deviation of Yᵢ from the mean Ȳ is split at the fitted value Ŷᵢ into an "unexplained" deviation (Yᵢ − Ŷᵢ) and an "explained" deviation (Ŷᵢ − Ȳ).]

For each observation Yᵢ, variation around the mean can be decomposed into that which is "explained" by the regression and that which is not:

Book terminology:
  TSS = Σ(all deviations)²
  RSS = Σ(unexplained)²
  ESS = Σ(explained)²

Stata terminology:
  Total = Σ(all deviations)²
  Residual = Σ(unexplained)²
  Model = Σ(explained)²
Week 4, 2007 Lecture 4 Slide #9
Sample Covariance & Correlation
• Sample covariance for a bivariate model is defined as:

$$s_{XY} = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}$$

• Sample correlation (r) "standardizes" the covariance by dividing by the product of the X and Y standard deviations:

$$r = \frac{s_{XY}}{s_X s_Y}$$

Sample correlations range from −1 (perfect negative relationship) to +1 (perfect positive relationship).
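The two formulas above can be computed directly. A minimal sketch on toy data (not the caschool data):

```python
import math

# Sample covariance and correlation on toy data
xs = [14.0, 17.0, 19.0, 21.0, 24.0]
ys = [690.0, 670.0, 655.0, 640.0, 620.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Sample covariance: sum of cross-deviations over n - 1
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

# Sample standard deviations
s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))

r = s_xy / (s_x * s_y)   # always between -1 and +1
```

For these toy points the relationship is nearly perfectly negative, so r lands close to −1, just as the caschool correlation of −0.2264 reflects a weaker negative relationship.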
Week 4, 2007 Lecture 4 Slide #10
Standardized Regression Coefficients (aka "Beta Weights" or "Betas")
• Formula:

$$b_1^* = b_1 \frac{s_X}{s_Y}$$

• In our example:

$$b_1^* = -2.28 \times \frac{1.892}{19.053} = -0.226$$

• Interpretation: the number of standard deviations of change in Y one should expect from a one-standard-deviation change in X.
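The example above is easy to check by plugging in the slide's estimates:

```python
# Standardized ("beta") coefficient from the slide's estimates
b1 = -2.279808        # raw slope from the regression output
s_x = 1.891812        # std. dev. of str (from the descriptives)
s_y = 19.05335        # std. dev. of testscr

beta = b1 * (s_x / s_y)   # ≈ -0.2264, matching Stata's Beta column
```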
Week 4, 2007 Lecture 4 Slide #11
Hypothesis Tests for Regression Coefficients
• For our model: Yᵢ = 698.933 − 2.279808·Xᵢ + eᵢ
• Another sample of 420 observations would lead to different estimates for b₀ and b₁. If we drew many such samples, we'd get the sampling distribution of the estimates.
• We need to estimate the sampling distribution (because we usually can't observe it) based on our sample size and variance.
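The idea of a sampling distribution can be made concrete by simulation. A sketch using a made-up data-generating process loosely patterned on the slide's model (true slope −2.28, noise spread near the Root MSE); none of these numbers come from actual resampling of the caschool data:

```python
import random

random.seed(0)

def ols_slope(xs, ys):
    """OLS slope for a bivariate regression of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Draw many samples of n = 420 from Y = 700 - 2.28*X + noise
slopes = []
for _ in range(2000):
    xs = [random.uniform(14, 26) for _ in range(420)]
    ys = [700 - 2.28 * x + random.gauss(0, 18.6) for x in xs]
    slopes.append(ols_slope(xs, ys))

# The collection of estimates approximates the sampling distribution;
# it centers near the true slope of -2.28
mean_slope = sum(slopes) / len(slopes)
```

Each simulated sample gives a slightly different slope, which is exactly why a standard error is needed for the single sample we actually observe.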
Week 4, 2007 Lecture 4 Slide #12
To do that we calculate the standard errors of the coefficients (bivariate case only):

$$SE_{b_1} = \frac{s_e}{\sqrt{TSS_X}}, \qquad \text{where } TSS_X = \sum_i (X_i - \bar{X})^2$$

$$SE_{b_0} = s_e \sqrt{\frac{1}{n} + \frac{\bar{X}^2}{TSS_X}}$$
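Both formulas can be checked against the Stata output using only the summary statistics already reported (TSS_X follows from the standard deviation of X, since the sample variance is TSS_X / (n − 1)):

```python
import math

# Reproducing the Stata standard errors from summary statistics
n = 420
s_e = 18.581          # Root MSE from the regression output
s_x = 1.891812        # std. dev. of str (from the descriptives)
x_bar = 19.64043      # mean of str

tss_x = (n - 1) * s_x ** 2                            # sum of squared X deviations
se_b1 = s_e / math.sqrt(tss_x)                        # ≈ 0.4798
se_b0 = s_e * math.sqrt(1 / n + x_bar ** 2 / tss_x)   # ≈ 9.467
```

The results match the Std. Err. column of the regression output.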
Week 4, 2007 Lecture 4 Slide #13
Interpreting Standard Errors
• For our model:
  – b₀ = 698.933, and SE_b0 = 9.467
  – b₁ = −2.28, and SE_b1 = 0.4798

[Figure: estimated sampling distribution for b₁, centered at b₁ = −2.28, with b₁ + SE_b1 = −1.8 and b₁ − SE_b1 = −2.76 marked; zero lies 4.75 SE_b1 "units" away from b₁.]

Assuming that we estimated the standard error correctly, we can identify how many standard errors our estimate is away from zero. The T-test reports the number of standard errors our estimate falls away from zero. Thus, the "T" for b₁ is −4.75 for our model (rounding!).
Week 4, 2007 Lecture 4 Slide #14
Classical Hypothesis Testing

Estimated b₁ = −2.28 (working hypothesis)
Assume that b₁ = 0.0 (null hypothesis)

Assume that b₁ is zero. What is the probability that your sample would have resulted in an estimate for b₁ that is 4.75 SE_b1's away from zero?

To find out, determine the cumulative density of the estimated sampling distribution that falls more than 4.75 SE_b1's away from zero.

See Table 2, page 757, in Stock & Watson. It reports discrete "p-values", given the sample size and t-values. Note the distinction between 1- and 2-sided tests.

In general, if the |t|-stat is above 2, the p-value will be < 0.05 -- which is the acceptable upper limit in a classical hypothesis test.

Note: in Stata-speak, a p-value is a "P>|t|".
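The t statistic and its two-sided p-value can be reproduced from the coefficient and standard error. With 418 degrees of freedom the t distribution is close to standard normal, so a normal approximation is used in this sketch:

```python
import math

# t statistic and two-sided p-value for b1, from the regression output
b1 = -2.279808
se_b1 = 0.4798256

t = b1 / se_b1                                   # ≈ -4.75
# Normal approximation: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2))
p_two_sided = math.erfc(abs(t) / math.sqrt(2))   # far below 0.05
```

A p-value this small is why Stata rounds the display to 0.000: under the null of no relationship, an estimate this many standard errors from zero is extremely unlikely.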
Week 4, 2007 Lecture 4 Slide #15
Coming up...
• For Next Week
  – Use the caschool.dta dataset
  – Run a model in Stata using Average Income (avginc) to predict Average Test Scores (testscr)
  – Examine the univariate distributions of both variables and the residuals
• Walk through the entire interpretation
• Build a Stata do-file as you go
• For Next Week:
  – Read Chapter 8 of Stock & Watson