Regression Review Evaluation Research (8521) Prof. Jesse Lecy Lecture 1 1.
-
Upload
eugene-bradford -
Category
Documents
-
view
215 -
download
0
Transcript of Regression Review Evaluation Research (8521) Prof. Jesse Lecy Lecture 1 1.
1
Regression ReviewEvaluation Research (8521)
Prof. Jesse Lecy
Lecture 1
Regression Review:
Regression Error and the Standard Error of the Regressors
What should be clear in my mind?
• Variance and standard deviation.• What is a sampling distribution?• Difference between the standard deviation and standard error.• What role the standard error plays in the confidence interval.• Definitions of covariance and correlation.• The “intuitive” regression formula.
The Road Map
Variance St. Dev St. Error Confidence Intervals
1
2
2
n
xxixMean:
Slope:
2xx
nSE x
x
111 bSEtb
xSEtx
22
22
n
e
n
SSE i 2
2
2
1 xxSE
i
b
(of x)
(of the residual) (of the slope)
(of the mean)
VARIANCE
Kevin Larry
Who travels the furthest?
Kevin Larry
2
22
1
1
xxN
xxN
i
i
5.24
1423
34
2523
STD. DEVIATION VS. STD. ERROR
• You first need a reference point in order to calculate average distance. You can’t ask “how far have I traveled if I am in Chicago?” without knowing where you started from. Use the mean as the starting point for each distance.
• Variance is a measure of average dispersion. So you add up all the distances. The problem is that when you use the mean then the sum of all distances from the mean will always be zero. As a result, you must square them first.
• Divide by N so you have an average squared distance from the mean. It is actually N-1 for reasons we will not discuss. This is variance.
• We want to reconcile the units so that the number is meaningful. We squared everything, so we take the square root. This is the standard deviation. Intuitively, it is the “average” amount each point must travel to reach the mean.
What is a standard deviation?
1-2 2-1x
1
2
2
n
xxs i
What is the standard error?
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
In this case the “error” means how far, on average, we come from the true mean.
Note! This is the standard error of the
mean.
n
sSEx
0 20 40 60 80 100
-30
0-2
00
-10
00
10
02
00
30
0
Regression Simulation
Class Size
Te
st P
erf
orm
an
ce
True Slope = 1
slopes1
Fre
qu
en
cy
-4 -2 0 2 4 6
05
01
00
15
02
00
25
0
What is the standard error?
This is the standard error of the slope.
2
2
1 xxSE
i
b
CONFIDENCE INTERVALS
Recall that a distribution can reflect the observed characteristics of a population, but it can also reflect the sampling variation of a parameter.
How sure are we about the mean?• If we were sure of ourselves we wouldn’t need a margin of error! Because
we only have a sample, though, we can’t be certain.
ns
txforCI The population parameters are never known, so we use t-stats and the formula for the sample standard error.(CI of the mean)
How often are we wrong?
x
x
μ
Chose an alpha-level, which determines the size of the confidence interval. This example uses alpha=0.05. We would expect five samples in one-hundred to result in confidence intervals that do not contain the true mean. We see 3 in 50 draws here, which is consistent with expectations.
COVARIANCE
What is Covariance?
yx,(+,+)
(+,-)(-,-)
(-,+)Classroom size and test scores
X Y X-Xbar Y-Ybar (X-Xbar)(Y-Ybar)
3 5 3-6 (-) 5-3 (+) (-)(+) = (-)
5 2 5-6 (-) 2-3 (-) (-)(-) = (+)
10 2 10-6 (+) 2-3 (-) (+)(-) = (-)
1
),cov(
n
yyxxyx ii
Negative Covariance Example
yx,
High negative correlation, large negative b1 in a regression
Class Size
Test Scores
Class size and student performance
Positive Covariance Example
yx,
Hours of study and test scores
High positive correlation, large positive b1 in a regression
Small Covariance
yx,
Low correlation, small b1 in a regression
What is Correlation?
yx
xy
ii
ysdxsd
yxyxcor
n
yyxxyx
)()(
),cov(),(
1),cov(
The correlation is the covariance in simple units:
-1 < cor(x,y) < +1
THE REGRESSION SLOPE
Intuitive Regression Formula
yx,Cov(x,y)
Var(x)
X
Yslope
We want to interpret the slope causally, meaning the outcomeis going to change b amount because of a one-unit change in X.
xxxx
yyxx
x
yx
X
Yslope
ii
ii
)var(
),cov(
The Road Map
Variance St. Dev St. Error Confidence Intervals
1
2
2
n
xxixMean:
Slope:
2xx
nSE x
x
111 bSEtb
xSEtx
22
22
n
e
n
SSE i 2
2
2
1 xxSE
i
b
(of x)
(of the residual) (of the slope)
(of the mean)
25
Exercise
Which model is the “right” one?
To answer this we need to understand both slopes and standard errors.
x y
z
x y
z
x y
z
x y
x y
0 50 100 150 200
010
030
0
x
y
0 50 100 150 200
010
030
0
z
y
0 50 100 150 200
050
100
200
x
z
x y
z
Example #1
x y
z
0 100 200 300 400
010
030
0
x
y
0 50 100 150 200
010
030
0
z
y
0 100 200 300 400
050
100
200
x
z
Example #2
x y
z
0 100 200 300 400
020
060
0
x
y
0 50 100 150 200
020
060
0
z
y
0 100 200 300 400
050
100
200
x
z
Example #3
x y
x y
0 20 40 60 80 100
-50
50
15
02
50
x
y
-50 0 50 100 150 200 250
02
06
01
00
y
x