Introduction to Bivariate Regression
Transcript of Introduction to Bivariate Regression
-
7/30/2019 Introduction to Bivariate Regression
1/51
Introduction to Econometrics
Eco-20042 Lecture 3
1/51
Bivariate Regression Analysis
Motivation:
dependent variable = f(independent or explanatory variables)
e.g.
defense expenditure = f(GNP)
qd = f(po, ps, Y, ...)
ls = f(wage, no. of kids, age of kids, ...)
notation:
Y = f(X1, X2, ...)
Note: a regression relationship does NOT by itself imply causation (causation must come from theory)
Reminder: objectives of exercise
Estimate the mean value of Y for a given X, E(Y/X)
e.g. mean sales if advertising is 10k
Test hypotheses suggested by theory
e.g. does advertising affect sales?
Predict Y
e.g. if advertising increased by 10%, what would happen to sales?
Population Regression Function (PRF)
Example: Law of demand
Y: quantity demanded
X: price
N = 55 (assume this is the population)
The demand schedule for Widgets
Price (X)   Quantity demanded (Y)          Number of consumers   Average Y demanded
 1          45, 46, 47, 48, 49, 50, 51     7                     48
 2          44, 45, 46, 47, 48             5                     46
 3          40, 42, 44, 46, 48             5                     44
 4          35, 38, 42, 44, 46, 47         6                     42
 5          36, 39, 40, 42, 43             5                     40
 6          32, 35, 37, 38, 39, 42, 43     7                     38
 7          32, 34, 36, 38, 40             5                     36
 8          31, 32, 33, 34, 35, 36, 37     7                     34
 9          28, 30, 32, 34, 36             5                     32
10          29, 30, 31                     3                     30
Total                                      55
[Figure: Scattergram of Price and Quantity. Quantity (vertical axis) against price (horizontal axis), with the Population Regression Line (PRL).]
PRL: gives average (mean) Y for each level of X
mathematically
E(Y/Xi) = B1 + B2 Xi (1)
(1) is the Population Regression Function (PRF)
i.e. line that passes through conditional means of Y
B1 and B2 are parameters of PRF
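As a rough check on the idea that the PRL passes through the conditional means, the Python sketch below recomputes the average Y at each price from the widget demand schedule and compares it with a straight line. The names `population` and `conditional_means` are illustrative only, and the values B1 = 50 and B2 = -2 are not stated in these notes: they are simply what the line through the tabulated means works out to.

```python
# Widget "population" from the demand schedule: price -> quantities demanded.
population = {
    1: [45, 46, 47, 48, 49, 50, 51],
    2: [44, 45, 46, 47, 48],
    3: [40, 42, 44, 46, 48],
    4: [35, 38, 42, 44, 46, 47],
    5: [36, 39, 40, 42, 43],
    6: [32, 35, 37, 38, 39, 42, 43],
    7: [32, 34, 36, 38, 40],
    8: [31, 32, 33, 34, 35, 36, 37],
    9: [28, 30, 32, 34, 36],
    10: [29, 30, 31],
}


def conditional_means(pop):
    """Return {X: E(Y/X)}, the mean of Y at each price level X."""
    return {x: sum(ys) / len(ys) for x, ys in pop.items()}


means = conditional_means(population)

# Assumption for illustration: take the line through the first two
# conditional means, then check that every other mean lies on it.
B2 = means[2] - means[1]   # change in mean quantity per unit of price
B1 = means[1] - B2 * 1     # intercept implied by the point at X = 1

for x, m in means.items():
    print(f"X={x:2d}  E(Y/X)={m:5.1f}  B1 + B2*X={B1 + B2 * x:5.1f}")
```

Under these assumptions every conditional mean sits exactly on the line, which is what equation (1) says about the PRF for this population.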
Stochastic Population Regression Function
Not all points lie on the PRL:
Yi = B1 + B2 Xi + ui
ui accounts for the fact that individual values of Y do not all equal the mean value.
ui is the stochastic or random error term; a random variable.
Properties of ui:
Error may represent
variables not included in the model, e.g. income, prices of other goods
inherent randomness in behaviour
measurement error
principle of parsimony
Sample Regression Function
Generated from a sample of the population
Yi = b1 + b2 Xi + ei
ei is the residual, estimator of ui.
b1 is the estimator of B1.
b2 is the estimator of B2.
[Figure: Regression Lines from two Samples. Quantity against Price (X), showing the observations of Sample 1 and Sample 2 together with the sample regression line (SRL) for each sample.]
Digression: Linearity
Models need not be linear in the variables
e.g.
$$E(Y) = B_1 + B_2 X_i^2$$
$$E(Y) = B_1 + B_2 \frac{1}{X_i}$$
can be estimated using regression, but NOT models that are non-linear in the parameters, e.g.
$$E(Y) = B_1 + B_2^2 X_i$$
Estimation of parameters when we have one sample : OLS
How to find line?
[Figure: Sales v Advertising. Scatter plot of Sales and Advertising.]
[Figure: Sample Regression for Widget Demand. Quantity against Price (X) for sample 3 (qs3), with the SRL for sample 3.]
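$$Y_i = b_1 + b_2 X_i + e_i$$
or
$$Y_i = \hat{Y}_i + e_i$$
where
$$\hat{Y}_i = b_1 + b_2 X_i$$
$$e_i = Y_i - \hat{Y}_i$$
so
$$e_i = Y_i - b_1 - b_2 X_i$$
choose b1 and b2 so as to minimize the residual sum of squares
minimize
$$\sum e_i^2 = \sum (Y_i - b_1 - b_2 X_i)^2$$
solve using calculus to get:
$$b_1 = \bar{Y} - b_2 \bar{X}$$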
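$$b_2 = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$
where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$ are deviations from the sample means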
      Q (Y)  P (X)      x      y       y2      x2      xy  predicted         e        e2         eX
         49      1   -4.5   11.2   125.44   20.25   -50.4    47.5091    1.4909    2.2228   1.490909
         45      2   -3.5    7.2    51.84   12.25   -25.2    45.3515   -0.3515    0.1236   -0.70303
         44      3   -2.5    6.2    38.44    6.25   -15.5    43.1939    0.8061    0.6497   2.418182
         39      4   -1.5    1.2     1.44    2.25    -1.8    41.0364   -2.0364    4.1468   -8.14545
         38      5   -0.5    0.2     0.04    0.25    -0.1    38.8788   -0.8788    0.7723   -4.39394
         37      6    0.5   -0.8     0.64    0.25    -0.4    36.7212    0.2788    0.0777   1.672727
         34      7    1.5   -3.8    14.44    2.25    -5.7    34.5636   -0.5636    0.3177   -3.94545
         33      8    2.5   -4.8    23.04    6.25     -12    32.4061    0.5939    0.3528   4.751515
         30      9    3.5   -7.8    60.84   12.25   -27.3    30.2485   -0.2485    0.0617   -2.23636
         29     10    4.5   -8.8    77.44   20.25   -39.6    28.0909    0.9091    0.8264   9.090909
sum     378     55      0      0    393.6    82.5    -178        378         0  9.551515          0
mean Y = 378/10 = 37.8
mean X = 55/10 = 5.5
b2 = -2.15758
b1 = 49.66667
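The OLS formulas can be applied to the ten sample observations in a few lines of Python. This is a minimal sketch using only the standard library; the function name `ols` is an illustrative choice, not something from the lecture.

```python
# Sample of 10 observations from the worked table.
Y = [49, 45, 44, 39, 38, 37, 34, 33, 30, 29]  # quantity demanded, Q
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]           # price, P


def ols(X, Y):
    """Return (b1, b2) for the bivariate regression Y = b1 + b2*X + e."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    # Sums of cross-products and squares of deviations from the means.
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
    sum_x2 = sum((x - x_bar) ** 2 for x in X)
    b2 = sum_xy / sum_x2          # slope
    b1 = y_bar - b2 * x_bar       # intercept
    return b1, b2


b1, b2 = ols(X, Y)
print(f"b1 = {b1:.5f}, b2 = {b2:.5f}")
```

The printed estimates should agree with the b1 and b2 values at the foot of the table, up to rounding.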
In this example:
$$b_2 = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{-178}{82.5} = -2.1576$$
$$b_1 = \bar{Y} - b_2 \bar{X} = 37.8 - (-2.1576)(5.5) = 49.667$$
So
$$\hat{Y}_i = 49.667 - 2.1576 X_i$$
interpretation:
b2: ceteris paribus, if price goes up by $1, mean quantity falls by 2.16 units
b1: if price were zero, mean quantity would be 49.7 units (often the intercept has no economic meaning)
Note:
OLS line passes through the sample mean values of X and Y
mean(e) = $\sum e_i / n = 0$
residuals and explanatory variables are uncorrelated: $\sum e_i X_i / n = 0$
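These three properties can be checked numerically on the widget sample. The sketch below refits the line and then evaluates each property; the variable names are illustrative, and small floating-point error means the results are zero only up to rounding.

```python
# Widget sample used in the worked example.
Y = [49, 45, 44, 39, 38, 37, 34, 33, 30, 29]
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# OLS slope and intercept from the deviation formulas.
b2 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
      / sum((x - x_bar) ** 2 for x in X))
b1 = y_bar - b2 * x_bar

# Residuals e_i = Y_i - (b1 + b2 * X_i).
residuals = [y - (b1 + b2 * x) for x, y in zip(X, Y)]

fitted_at_x_bar = b1 + b2 * x_bar                          # equals mean of Y
mean_residual = sum(residuals) / n                         # equals 0
mean_eX = sum(e * x for e, x in zip(residuals, X)) / n     # equals 0

print(fitted_at_x_bar, y_bar)
print(mean_residual, mean_eX)
```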
Hypothesis Testing
Remember so far we have:
Stochastic Population Regression: Yi = B1 + B2Xi + ui
Sample Regression: Yi = b1 + b2Xi + ei
For the example of widget demand the estimated regression was:
$$\hat{Y}_i = 49.667 - 2.1576 X_i$$
The estimates b1 and b2 will differ with each sample, so there will be a probability distribution associated with them.
Assumptions of the Classical Linear Regression Model
1 The explanatory variable(s) X is uncorrelated with the disturbance term u.
2 The expected, or mean, value of the disturbance term u is zero: E(ui) = 0, i.e. on average the error term u has no effect on Y.
3 The variance of each ui is constant, or homoscedastic: var(ui) = $\sigma^2$, i.e. the conditional distribution of each Y population corresponding to a given X has the same variance. The alternative is that we have heteroscedasticity, or unequal variance.
4 There is no correlation between two error terms (no autocorrelation): cov(ui, uj) = 0 for $i \neq j$
i.e. no systematic relationship between two error terms. If one u is above the mean value then the other error needn't also be above (or below) the mean. Error terms ui are random.
Remember:
OLS estimates are random variables: their value will change from sample to sample.
$$b_1 = \bar{Y} - b_2 \bar{X}$$
$$b_2 = \frac{\sum x_i y_i}{\sum x_i^2}$$
The variance or standard error of the estimates tells us something about the sampling variability of the estimates.
Formula:
For the relationship Yi = b1 + b2 Xi we have
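$$\text{Var}(b_1) = \frac{\sum X_i^2}{n \sum x_i^2}\,\sigma^2 \qquad \text{Se}(b_1) = \sqrt{\text{Var}(b_1)}$$
$$\text{Var}(b_2) = \frac{\sigma^2}{\sum x_i^2} \qquad \text{Se}(b_2) = \sqrt{\text{Var}(b_2)}$$
with $\sigma^2$ estimated by
$$\hat{\sigma}^2 = \frac{\sum e_i^2}{n - 2} = \frac{RSS}{\text{d.f.}}$$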
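Estimator          Formula                                                                                Result
$\hat{\sigma}^2$   $\sum e_i^2 / (n - 2) = 9.5515 / 8$                                                    1.1939
$\hat{\sigma}$     $\sqrt{\hat{\sigma}^2} = \sqrt{1.1939}$                                                1.0927
Var(b1)            $\frac{\sum X_i^2}{n \sum x_i^2}\,\hat{\sigma}^2 = \frac{(385)(1.1939)}{10(82.5)}$     0.5572
Se(b1)             $\sqrt{\text{Var}(b_1)} = \sqrt{0.5572}$                                               0.7464
Var(b2)            $\hat{\sigma}^2 / \sum x_i^2 = 1.1939 / 82.5$                                          0.0145
Se(b2)             $\sqrt{\text{Var}(b_2)} = \sqrt{0.0145}$                                               0.1203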
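The entries in this table can be reproduced from the raw sample. The following Python sketch (standard library only, with illustrative variable names) computes the estimated error variance and the standard errors of b1 and b2 using the formulas above.

```python
import math

# Widget sample used in the worked example.
Y = [49, 45, 44, 39, 38, 37, 34, 33, 30, 29]
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n
sum_x2 = sum((x - x_bar) ** 2 for x in X)      # sum of squared deviations of X
sum_X2 = sum(x ** 2 for x in X)                # sum of squared raw X values

b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sum_x2
b1 = y_bar - b2 * x_bar

# Residual sum of squares and estimated error variance with n - 2 d.f.
rss = sum((y - (b1 + b2 * x)) ** 2 for x, y in zip(X, Y))
sigma2_hat = rss / (n - 2)

var_b1 = sum_X2 / (n * sum_x2) * sigma2_hat
var_b2 = sigma2_hat / sum_x2
se_b1 = math.sqrt(var_b1)
se_b2 = math.sqrt(var_b2)

print(f"sigma2_hat = {sigma2_hat:.4f}")
print(f"Var(b1) = {var_b1:.4f}, Se(b1) = {se_b1:.4f}")
print(f"Var(b2) = {var_b2:.4f}, Se(b2) = {se_b2:.4f}")
```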
$$\hat{Y}_i = 49.6670 - 2.1576 X_i$$
$$\text{se} = (0.7464) \quad (0.1203)$$
This tells us that the slope coefficient is -2.1576 and that its standard error is 0.1203, which is a measure of the variability of b2 from sample to sample
Hypothesis Testing
Suppose someone suggests that price has no effect on the quantity demanded. The
null hypothesis is that
H0: B2 = 0
This hypothesis is in effect a straw man. If sustained, it says that there is no relationship between Y and X to begin with.
If X belongs to the model, one would expect to reject the null hypothesis H0 in favour of the alternative hypothesis H1, which says B2 is different from zero.
$H_1: B_2 \neq 0$
Remember: we can't simply look at the numerical value of b2, because this value is random and will vary from sample to sample. A formal test is required.
Two approaches:
The confidence interval approach
The test of significance approach to test any hypothesis about B2 as well as B1
General Testing issues
In particular, we know that b2 follows the normal distribution, because b2 is simply a linear function of u, which is a normally distributed random variable
If b2 is distributed as
$$b_2 \sim N(B_2, \sigma^2_{b_2})$$
then
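$$Z = \frac{b_2 - B_2}{\text{se}(b_2)} = \frac{b_2 - B_2}{\sigma / \sqrt{\sum x_i^2}} \sim N(0, 1)$$
This allows us to calculate the probability of b2 lying within a given range of B2.
Problem
We don't know the true $\sigma$ but can replace it with its estimator $\hat{\sigma}$.
If we replace $\sigma$ with $\hat{\sigma}$ then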
$$\frac{b_2 - B_2}{\hat{\sigma} / \sqrt{\sum x_i^2}} \sim t_{n-2}$$
The confidence interval approach
Assume that the level of significance $\alpha$, the probability of committing a type I error, is fixed at 5%.
From the t table, we find that with 8 d.f. $P(-2.306 \le t \le 2.306) = 0.95$
The probability that a t value (for 8 d.f.) lies between the limits (-2.306, 2.306) is 0.95 or 95%.
These are the critical t values
Substituting we have
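$$P\left(-2.306 \le \frac{b_2 - B_2}{\hat{\sigma} / \sqrt{\sum x_i^2}} \le 2.306\right) = 0.95$$
$$P\left(b_2 - 2.306 \frac{\hat{\sigma}}{\sqrt{\sum x_i^2}} \le B_2 \le b_2 + 2.306 \frac{\hat{\sigma}}{\sqrt{\sum x_i^2}}\right) = 0.95$$
Or, in terms of the standard error:
$$P[b_2 - 2.306\,\text{se}(b_2) \le B_2 \le b_2 + 2.306\,\text{se}(b_2)] = 0.95$$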
Which provides us with the 95% confidence interval for B2.
For our example:
$$-2.1576 - 2.306(0.1203) \le B_2 \le -2.1576 + 2.306(0.1203)$$
$$-2.4350 \le B_2 \le -1.8802$$
Because this range does not include the null-hypothesized value of 0, we can reject a null hypothesis that price has no effect on quantity demanded.
Check the confidence interval for B1
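The interval arithmetic can be sketched in Python as follows. The helper name `confidence_interval` is illustrative, the estimates and standard errors are the rounded values from the worked example, and the critical value 2.306 is taken from the t table quoted in these notes rather than computed.

```python
def confidence_interval(estimate, se, t_crit):
    """Return (lower, upper) = estimate -/+ t_crit * se."""
    half_width = t_crit * se
    return estimate - half_width, estimate + half_width


# Rounded values from the widget example.
b1, se_b1 = 49.6667, 0.7464
b2, se_b2 = -2.1576, 0.1203
T_CRIT_95_8DF = 2.306   # two-tailed 5% critical t with 8 d.f., from the t table

ci_b2 = confidence_interval(b2, se_b2, T_CRIT_95_8DF)
ci_b1 = confidence_interval(b1, se_b1, T_CRIT_95_8DF)

# H0: B2 = 0 is rejected at the 5% level if 0 lies outside the interval.
reject_h0_b2 = not (ci_b2[0] <= 0.0 <= ci_b2[1])

print("95% CI for B2:", ci_b2)
print("95% CI for B1:", ci_b1)
print("Reject H0: B2 = 0?", reject_h0_b2)
```

The same helper gives an interval for B1, which is one way to carry out the check suggested above.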
The test of significance approach to hypothesis testing:
Here the decision to accept or reject H0 is made on the basis of the value of the test
statistic obtained from the sample data.
In particular, we know that
$$t = \frac{b_2 - B_2}{\text{se}(b_2)}$$
follows a t distribution with n - 2 d.f.
Let H0: B2 = B2*, where B2* is a specific numerical value of B2; then
$$t = \frac{b_2 - B_2^*}{\text{se}(b_2)}$$
may be interpreted as the test statistic, which follows a t distribution with n - 2 d.f.
Test requires three pieces of information
The d.f.: always n - 2 for bivariate regression
The level of significance $\alpha$: conventionally set at 1%, 5% or 10%
Whether to use a one-tailed or a two-tailed test
Two-tailed test
H0: B2 = 0
$H_1: B_2 \neq 0$
Using the formula we have
$$t = \frac{-2.1576 - 0}{0.1203} = -17.94$$
with 10 - 2 = 8 d.f.
Level of significance   0.01    0.05    0.10
critical t: t*          3.355   2.306   1.860
Compare the absolute value of the calculated t with the critical value, say at the 0.01 level
|-17.94| = 17.94 > 3.355
Hence reject the null hypothesis that B2 = 0 in favour of the alternative
One-tailed test:
$H_0: B_2 \geq 0$
$H_1: B_2 < 0$ (left-sided test)
We already know t = -17.94
Level of significance   0.01     0.05     0.10
critical t: t*          -2.896   -1.860   -1.397
Compare the calculated t value with the critical value, say at the 0.01 level
-17.94 < -2.896
Hence reject the null hypothesis that $B_2 \geq 0$ in favour of the alternative, i.e. the price coefficient is negative as expected
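Both the two-tailed and the left-sided decisions can be written out as a short Python sketch. The function name `t_statistic` and the dictionary names are illustrative; the critical values are copied from the tables in these notes (8 d.f.) rather than computed.

```python
def t_statistic(estimate, hypothesised, se):
    """t = (b - B*) / se(b) for H0: B = B*."""
    return (estimate - hypothesised) / se


# Rounded values from the widget example; H0: B2 = 0.
t = t_statistic(-2.1576, 0.0, 0.1203)

# Critical t values for 8 d.f., keyed by level of significance.
two_tailed_crit = {0.01: 3.355, 0.05: 2.306, 0.10: 1.860}
left_tailed_crit = {0.01: -2.896, 0.05: -1.860, 0.10: -1.397}

# Two-tailed test: reject H0 when |t| exceeds the critical value.
reject_two_tailed = {a: abs(t) > c for a, c in two_tailed_crit.items()}

# Left-sided test: reject H0 when t falls below the (negative) critical value.
reject_left_tailed = {a: t < c for a, c in left_tailed_crit.items()}

print(f"t = {t:.2f}")
print("two-tailed:", reject_two_tailed)
print("left-sided:", reject_left_tailed)
```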
We have looked at tests on the coefficients; now look at some other tests.
How good is the fitted regression line overall?
This is measured by $r^2$, the coefficient of determination
How can this be computed?
$$Y_i = \hat{Y}_i + e_i$$
$$(Y_i - \bar{Y}) = (\hat{Y}_i - \bar{Y}) + e_i$$
$$y_i = \hat{y}_i + e_i$$
or total deviation of Yi from the mean = explained deviation + unexplained deviation
Square and sum gives, with some manipulation
$$\sum y_i^2 = \sum \hat{y}_i^2 + \sum e_i^2$$
or
total variation in Y about its mean (TSS) = explained variation in Y (ESS) + unexplained variation in Y, or residual sum of squares (RSS)
i.e. TSS = ESS + RSS
$$1 = \frac{ESS}{TSS} + \frac{RSS}{TSS}$$
let
$$r^2 = \frac{ESS}{TSS}$$
the coefficient of determination
then
$$r^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum e_i^2}{\sum y_i^2}$$
Note: $0 \le r^2 \le 1$
Example:
$$r^2 = 1 - \frac{9.5515}{393.60} = 0.9757$$
i.e. 98% of the variation in Y (Quantity) is explained by the regression, in this case by the variable X (Price)
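The decomposition TSS = ESS + RSS and the resulting coefficient of determination can be verified on the widget sample. This is a sketch with illustrative variable names, using only the standard library.

```python
# Widget sample used in the worked example.
Y = [49, 45, 44, 39, 38, 37, 34, 33, 30, 29]
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

b2 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
      / sum((x - x_bar) ** 2 for x in X))
b1 = y_bar - b2 * x_bar
fitted = [b1 + b2 * x for x in X]

tss = sum((y - y_bar) ** 2 for y in Y)                 # total variation
ess = sum((f - y_bar) ** 2 for f in fitted)            # explained variation
rss = sum((y - f) ** 2 for y, f in zip(Y, fitted))     # unexplained variation

r2 = 1 - rss / tss

print(f"TSS = {tss:.4f}, ESS = {ess:.4f}, RSS = {rss:.4f}")
print(f"r2 = {r2:.4f}")
```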
Note: sample correlation: r
$r = \pm\sqrt{r^2}$
so here $r = -\sqrt{0.9757} = -0.9878$
sign determined from graph, estimated slope coefficient etc.
Test on Overall Model: R2 = 0
H0: R2 = 0 i.e. no explanatory power in model
H1: R2 > 0
i.e. the null is that the variables together have no effect on Y (here we only have one variable)
We can show that
$$F = \frac{ESS / (2 - 1)}{RSS / (n - 2)} \sim F(1, n - 2)$$
if ESS is large and RSS is small then F gets big: reject H0
also
$$F = \frac{R^2 / (2 - 1)}{(1 - R^2) / (n - 2)}$$
if $R^2 = 0$, F = 0
if $R^2 = 1$, $F = \infty$
Using our example:
$$F = \frac{0.9757 / 1}{(1 - 0.9757) / 8} = 321.218$$
5% critical value F(1,8) = 5.32 from tables
F > CV so reject H0: R2 = 0
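The two equivalent forms of the F statistic can be compared in a small Python sketch. The function names are illustrative. One point the sketch makes visible: plugging in the rounded R2 of 0.9757 gives roughly 321.2, while using the unrounded sums of squares gives roughly 321.66, the figure that appears in the Stata output in these notes. The difference is rounding only.

```python
def f_from_r2(r2, n, k=2):
    """F = [R2 / (k - 1)] / [(1 - R2) / (n - k)], with k parameters."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))


def f_from_sums_of_squares(ess, rss, n, k=2):
    """F = [ESS / (k - 1)] / [RSS / (n - k)], with k parameters."""
    return (ess / (k - 1)) / (rss / (n - k))


n = 10
tss = 393.6
rss = 9.551515
ess = tss - rss            # TSS = ESS + RSS

f_rounded = f_from_r2(0.9757, n)                 # uses R2 rounded to 4 d.p.
f_exact = f_from_sums_of_squares(ess, rss, n)    # uses unrounded sums

F_CRIT_5PCT = 5.32   # 5% critical value of F(1, 8), from tables
print(f"F (rounded R2) = {f_rounded:.3f}")
print(f"F (sums of squares) = {f_exact:.2f}")
print("Reject H0: R2 = 0?", f_exact > F_CRIT_5PCT)
```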
Normality tests:
We assumed the errors are normally distributed, and all the preceding tests are based on this assumption, so we need to check it.
Look at a histogram of the residuals to see whether they look normally distributed, or perform the Jarque-Bera test. We might come back to this later; there are too few observations here to show it properly.
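Regression using Stata:
. reg quantity price

      Source |       SS       df       MS              Number of obs =      10
-------------+------------------------------           F(  1,     8) =  321.66
       Model |  384.048485     1  384.048485           Prob > F      =  0.0000
    Residual |  9.55151515     8  1.19393939           R-squared     =  0.9757
-------------+------------------------------           Adj R-squared =  0.9727
       Total |       393.6     9  43.7333333           Root MSE      =  1.0927

------------------------------------------------------------------------------
    quantity |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       price |  -2.157576   .1202996   -17.94   0.000    -2.434987   -1.880164
       _cons |   49.66667   .7464394    66.54   0.000     47.94537    51.38796
------------------------------------------------------------------------------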
Forecasting/Prediction:
Use model to forecast MEAN value for Y given some value for X
Let X = X0 e.g. X0 = 3
We want E(Y/X0 = 3)
$$\hat{Y}_0 = 49.667 - 2.1576(3) = 43.194$$
Now $\hat{Y}_0 \neq Y_0$: there exists forecasting error, so we need a distribution for $\hat{Y}_0$
Mean: E(Y/X0) = B1 + B2 X0
Variance:
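$$\text{var}(\hat{Y}_0) = \sigma^2 \left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x_i^2}\right]$$
$\sigma^2$ is not known, so use $\hat{\sigma}^2$;
the standardized $\hat{Y}_0$ is then distributed as t with n - 2 d.f., from which we generate a confidence interval
$$P\left[(b_1 + b_2 X_0) - t_{\alpha/2}\,\text{se}(\hat{Y}_0) \le B_1 + B_2 X_0 \le (b_1 + b_2 X_0) + t_{\alpha/2}\,\text{se}(\hat{Y}_0)\right] = 1 - \alpha$$
Widget example:
$$\text{var}(\hat{Y}_0) = 1.1939 \left[\frac{1}{10} + \frac{(3 - 5.5)^2}{82.50}\right] = 0.209844$$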
$$\text{se}(\hat{Y}_0) = 0.4581$$
95% confidence interval, where the critical t value with 8 d.f. = 2.306
$$P[43.194 - 2.306(0.4581) \le E(Y/X_0) \le 43.194 + 2.306(0.4581)] = 0.95$$
or
$$42.138 \le E(Y/X_0) \le 44.250$$
The CI grows as X0 moves away from $\bar{X}$, so one cannot extrapolate very far away from the mean or out of sample
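The mean-prediction interval can be sketched in Python using the rounded quantities from the widget example. The function name `mean_prediction` and its parameter names are illustrative, and the critical value 2.306 is the table value quoted above, not computed. The last lines illustrate how the standard error grows as X0 moves away from the sample mean of X.

```python
import math


def mean_prediction(x0, b1, b2, sigma2_hat, n, x_bar, sum_x2, t_crit):
    """Return (y0_hat, se, (lower, upper)) for the mean of Y at X = x0."""
    y0_hat = b1 + b2 * x0
    var_y0 = sigma2_hat * (1 / n + (x0 - x_bar) ** 2 / sum_x2)
    se_y0 = math.sqrt(var_y0)
    return y0_hat, se_y0, (y0_hat - t_crit * se_y0, y0_hat + t_crit * se_y0)


# Rounded inputs from the widget example (8 d.f., 95% level).
params = dict(b1=49.667, b2=-2.1576, sigma2_hat=1.1939,
              n=10, x_bar=5.5, sum_x2=82.5, t_crit=2.306)

y0_hat, se_y0, ci = mean_prediction(3, **params)
print(f"Y0_hat = {y0_hat:.3f}, se = {se_y0:.4f}")
print(f"95% CI: {ci[0]:.3f} to {ci[1]:.3f}")

# The standard error is smallest at the sample mean of X and grows with distance.
se_at_mean = mean_prediction(5.5, **params)[1]
se_at_edge = mean_prediction(10, **params)[1]
print(se_at_mean, se_y0, se_at_edge)
```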
[Figure: Sample Regression for Widget Demand. Quantity against Price (X), showing the observations, the fitted values, and a confidence band around the fitted values (legend: "80% CI").]
Illustrative Examples:
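1) estimate the relationship between average wages and years of schooling; sample of 13 observations
. reg wage schooling

      Source |       SS       df       MS              Number of obs =      13
-------------+------------------------------           F(  1,    11) =  108.29
       Model |  95.4255181     1  95.4255181           Prob > F      =  0.0000
    Residual |   9.6928077    11  .881164337           R-squared     =  0.9078
-------------+------------------------------           Adj R-squared =  0.8994
       Total |  105.118326    12  8.75986048           Root MSE      =   .9387

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   schooling |   .7240967   .0695813    10.41   0.000     .5709492    .8772442
       _cons |  -.0144527   .8746238    -0.02   0.987    -1.939487    1.910581
------------------------------------------------------------------------------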
$$\hat{Y}_i = -0.0144 + 0.7241 X_i$$
where Y is the average hourly wage rate ($)
X is years of schooling
conclusions:
if schooling goes up 1 unit, i.e. 1 year, expect the average hourly wage to increase by approx. 72 cents
the negative intercept has no particular economic interpretation
consider t values, conf intervals, R2 etc
2) Gujarati has data available on a clock auction which included information
on the price of the winning bid, age of clock and number of bidders.
Note: age of clock and number of bidders
How do we expect age of clock to affect the winning bid?

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
 observation |        32        16.5    9.380832          1         32
       price |        32    1328.094    393.6495        729       2131
         age |        32     144.625    27.54556        108        194
    numbider |        32     9.53125    2.839632          5         15
Expected relationship: Price and Age
the older the clock, the higher the winning bid
expect a positive relationship
[Figure: Price against Age, with fitted values.]
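$$\hat{Y}_i = -183.04 + 10.45 X_i$$
where Y is the price of the winning bid
X is the age of the clock
. reg price age

      Source |       SS       df       MS              Number of obs =      32
-------------+------------------------------           F(  1,    30) =   34.46
       Model |  2567947.25     1  2567947.25           Prob > F      =  0.0000
    Residual |  2235809.47    30  74526.9823           R-squared     =  0.5346
-------------+------------------------------           Adj R-squared =  0.5191
       Total |  4803756.72    31  154959.894           Root MSE      =     273

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   10.44866   1.780017     5.87   0.000     6.813378    14.08394
       _cons |  -183.0435   261.9194    -0.70   0.490    -717.9542    351.8672
------------------------------------------------------------------------------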
conclusions:
if age goes up 1 unit, i.e. 1 year, expect price to increase on average by $10.45
R2 is mid-range at 0.5346
What about number of bidders?
Expected relationship: Price and number of bidders
the more bidders, the higher the price, because a large number of bidders suggests the clock is valuable
expect a positive relationship
[Figure: Price against NumBider, with fitted values.]
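$$\hat{Y}_i = 807.95 + 54.57 X_i$$
where Y is the price of the winning bid
X is the number of bidders
. reg price numbider

      Source |       SS       df       MS              Number of obs =      32
-------------+------------------------------           F(  1,    30) =    5.50
       Model |  744444.914     1  744444.914           Prob > F      =  0.0258
    Residual |  4059311.81    30  135310.394           R-squared     =  0.1550
-------------+------------------------------           Adj R-squared =  0.1268
       Total |  4803756.72    31  154959.894           Root MSE      =  367.85

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    numbider |   54.57245   23.26605     2.35   0.026     7.056827    102.0881
       _cons |   807.9501   231.0921     3.50   0.001     335.9972    1279.903
------------------------------------------------------------------------------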
Conclusions:
if the number of bidders goes up by 1 person, expect price to increase on average by $54.57
Note: R2 low at 0.1550
Today:
We have explored how to estimate the best-fit line, and how to interpret and evaluate coefficients in a bivariate model using:
Hypothesis testing for coefficients (t-test, confidence intervals)
Hypothesis testing for R2 (F-test)
How to predict and see if it is a good prediction