X and Y are not perfectly correlated. However, there is on average a positive relationship between Y and X.
[Figure: scatter of Y against X, with X0, X1, and X2 marked on the horizontal axis.]
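[Figure: the line of conditional means plotted against X, with the observation $Y_1$, its conditional mean $E(Y_1 \mid X_1)$, and the gap $\varepsilon_1$ between them marked at $X_1$.]
We assume that the expected conditional values of Y associated with alternative values of X fall on a line:
$$E(Y_i \mid X_i) = \beta_0 + \beta_1 X_i$$
The deviation of an observation from its conditional mean is:
$$\varepsilon_1 = Y_1 - E(Y_1 \mid X_1)$$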
Specification
Estimation
Evaluation
Forecasting
Econometric models posit causal relationships among economic variables.
Simple regression analysis is used to test a hypothesis about the relationship between a dependent variable (Y, or in our case, C) and an independent variable (X, or in our case, Y).
Our model is specified as follows: C = f(Y), where
C: personal consumption expenditure
Y: personal disposable income
Simple linear regression begins by plotting C-Y values (see Table 1) on a scatter diagram (see Figure 1) to determine if there exists an approximate linear relationship:
$$C_i = \beta_0 + \beta_1 Y_i \qquad (1)$$
Since the data points are unlikely to fall exactly on a line, (1) must be modified to include a disturbance term ($u_i$):
$$C_i = \beta_0 + \beta_1 Y_i + u_i \qquad (2)$$
$\beta_0$ and $\beta_1$ are called parameters or population parameters.
We estimate these parameters using the data we have available.
Table 1: Consumption (C_i) and disposable income (Y_i), in billions
Year    n     C_i    Y_i
1987    1     102    114
1988    2     106    118
1989    3     108    126
1990    4     110    130
1991    5     122    136
1992    6     124    140
1993    7     128    148
1994    8     130    156
1995    9     142    160
1996    10    148    164
1997    11    150    170
1998    12    154    178
Figure 1: Scatter Diagram (horizontal axis: disposable income, billions; vertical axis: consumption, billions)
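We estimate the values of $\beta_0$ and $\beta_1$ using the Ordinary Least Squares (OLS) method. OLS is a technique for fitting the "best" straight line to the sample of XY observations. The line of best fit is that which minimizes the sum of the squared (vertical) deviations of the sample points from the line:
$$\text{Minimize} \; \sum_{i=1}^{12} (C_i - \hat{C}_i)^2$$
where
$C_i$ are the actual observations of consumption
$\hat{C}_i$ are the fitted values of consumption:
$$\hat{C}_i = \hat{\beta}_0 + \hat{\beta}_1 Y_i$$
[Figure: the fitted line $\hat{\beta}_0 + \hat{\beta}_1 Y$ plotted with C on the vertical axis and Y on the horizontal axis; the actual value $C_1$, the fitted value $\hat{C}_1$, and the residual $e_1$ are marked.]
The residual is the difference between the actual and fitted values:
$$e_i = C_i - \hat{C}_i$$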
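The least-squares criterion can be checked numerically. The sketch below assumes Python and the Table 1 data; the function name `sum_squared_residuals` and the two alternative candidate lines are illustrative choices, not part of the slides. It evaluates the sum of squared residuals for a line close to the OLS fit and for two arbitrary alternatives.

```python
# Table 1 data: consumption (C) and disposable income (Y), 1987-1998.
C = [102, 106, 108, 110, 122, 124, 128, 130, 142, 148, 150, 154]
Y = [114, 118, 126, 130, 136, 140, 148, 156, 160, 164, 170, 178]


def sum_squared_residuals(b0, b1):
    """Sum of squared vertical deviations of the points from the line b0 + b1*Y."""
    return sum((c - (b0 + b1 * y)) ** 2 for c, y in zip(C, Y))


# Candidate (intercept, slope) pairs. The first is close to the OLS fit;
# the other two are arbitrary alternatives chosen only for comparison.
candidates = [(2.30, 0.86), (0.0, 0.90), (10.0, 0.80)]

for b0, b1 in candidates:
    ssr = sum_squared_residuals(b0, b1)
    print(f"b0={b0:5.2f}  b1={b1:4.2f}  sum of squared residuals={ssr:8.2f}")
```

The line near the OLS estimates should report the smallest sum of the three, which is what "best" means under this criterion.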
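The OLS estimators: single variable case
$\hat{\beta}_0$ and $\hat{\beta}_1$ are estimators of the true parameters $\beta_0$ and $\beta_1$:
$$\hat{\beta}_1 = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - \left(\sum X_i\right)^2}$$
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$
Note that here X denotes the explanatory variable and Y the dependent variable.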
Table 2
n      C_i      Y_i      C_i Y_i    Y_i^2
1      102      114      11,628     12,996
2      106      118      12,508     13,924
3      108      126      13,608     15,876
4      110      130      14,300     16,900
5      122      136      16,592     18,496
6      124      140      17,360     19,600
7      128      148      18,944     21,904
8      130      156      20,280     24,336
9      142      160      22,720     25,600
10     148      164      24,272     26,896
11     150      170      25,500     28,900
12     154      178      27,412     31,684
Sum    1,524    1,740    225,124    257,112
n = 12
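Substituting the sums from Table 2 (with Y as the explanatory variable and C as the dependent variable):
$$\hat{\beta}_1 = \frac{(12)(225{,}124) - (1{,}740)(1{,}524)}{(12)(257{,}112) - (1{,}740)^2} = \frac{49{,}728}{57{,}744} = 0.861$$
Thus, with $\bar{C} = 127$ and $\bar{Y} = 145$, and the slope rounded to 0.86, we have:
$$\hat{\beta}_0 = 127 - (0.86)(145) = 2.30$$
(Carrying the unrounded slope instead gives 2.129, the value reported in the SPSS output.)
Thus the equation obtained from the regression is:
$$\hat{C}_i = 2.30 + 0.861\, Y_i$$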
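The arithmetic above can be reproduced in a few lines. This is a minimal sketch, assuming Python and the Table 1 data; the variable names are illustrative. It applies the sums formula for the slope and then the means formula for the intercept. Carrying the unrounded slope yields an intercept of about 2.129 (the SPSS value), whereas 2.30 follows from rounding the slope to 0.86 first.

```python
# Table 1 data: consumption (C, dependent) and disposable income (Y, explanatory).
C = [102, 106, 108, 110, 122, 124, 128, 130, 142, 148, 150, 154]
Y = [114, 118, 126, 130, 136, 140, 148, 156, 160, 164, 170, 178]

n = len(C)
sum_c = sum(C)                               # 1,524
sum_y = sum(Y)                               # 1,740
sum_yc = sum(y * c for y, c in zip(Y, C))    # sum of cross products
sum_y2 = sum(y * y for y in Y)               # sum of squared incomes

# Slope: (n*sum(XY) - sum(X)*sum(Y)) / (n*sum(X^2) - sum(X)^2), with X = income.
b1 = (n * sum_yc - sum_y * sum_c) / (n * sum_y2 - sum_y ** 2)

# Intercept: mean of the dependent variable minus slope times mean of X.
b0 = sum_c / n - b1 * (sum_y / n)

print(f"slope     = {b1:.3f}")
print(f"intercept = {b0:.3f}")
print(f"intercept with slope rounded to 0.86 = {sum_c / n - 0.86 * (sum_y / n):.2f}")
```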
Table 3: Fitted values of consumption
Year    C_i    Fitted $\hat{C}_i$    $e_i = C_i - \hat{C}_i$
1987    102    100.34     1.66
1988    106    103.78     2.22
1989    108    110.66    -2.66
1990    110    114.10    -4.10
1991    122    119.26     2.74
1992    124    122.70     1.30
1993    128    129.58    -1.58
1994    130    136.46    -6.46
1995    142    139.90     2.10
1996    148    143.34     4.66
1997    150    148.50     1.50
1998    154    155.38    -1.38
$\sum e_i^2 = 115.28$
Figure: Actual and Fitted Values of Consumption, 1987-98 (horizontal axis: year; vertical axis: consumption, billions; series: actual and fitted)
Coefficients (a)
Model 1       B        Std. Error   Beta    t         Sig.    95% CI for B: Lower Bound   Upper Bound
(Constant)    2.129    7.164                .297      .772    -13.834                     18.092
INCOME        .861     .049         .984    17.596    .000    .752                        .970
B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.
a. Dependent Variable: CONSUME
Model Summary
Model    R           R Square    Adjusted R Square    Std. Error of the Estimate
1        .984 (a)    .969        .966                 3.40
a. Predictors: (Constant), INCOME
ANOVA (b)
Model 1       Sum of Squares    df    Mean Square    F          Sig.
Regression    3568.732          1     3568.732       309.602    .000 (a)
Residual      115.268           10    11.527
Total         3684.000          11
a. Predictors: (Constant), INCOME
b. Dependent Variable: CONSUME
Goodness of fit criteria
•Standard errors of the estimates
•Are the estimates statistically significant?
•Constructing confidence intervals
•The coefficient of determination ($R^2$)
•The standard error of the regression
These statistics tell us how well the equation obtained from the regression performs in terms of producing accurate forecasts.
We assume that the regression coefficients are normally distributed variables. The standard error (or standard deviation) of the estimates is a measure of the dispersion of the estimates around their mean value. As a general principle, the smaller the standard error, the better the estimates (in terms of yielding accurate forecasts of the dependent variable). The following rule of thumb is useful: "[the] standard error of the regression coefficient should be less than half of the size of [the] corresponding regression coefficient."
Let $s_{\hat{\beta}_1}$ denote the standard error of our estimate of the slope parameter. The rule of thumb is then:
$$s_{\hat{\beta}_1} < \frac{\hat{\beta}_1}{2}$$
The standard error of the slope estimate is given by:
$$s_{\hat{\beta}_1} = \sqrt{\frac{\sum e_i^2}{(n - k) \sum x_i^2}}$$
Note that $x_i = X_i - \bar{X}$. In this formula k counts the estimated parameters (intercept and slope), so k = 2 and n - k = 10.
By reference to the SPSS output, we see that the standard error of our estimate of $\beta_1$ is 0.049, whereas our estimate of $\beta_1$ is 0.861. Hence our estimate is about 17.6 times the size of its standard error.
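The standard error of the slope can be recomputed from the raw data. The following is a sketch under stated assumptions: Python, the Table 1 data, and k = 2 estimated parameters (intercept and slope), which gives the 10 residual degrees of freedom shown in the SPSS ANOVA table. The slope is computed here in deviation form, which is algebraically equivalent to the sums formula; the variable names are illustrative.

```python
import math

# Table 1 data: consumption (C, dependent) and disposable income (Y, explanatory).
C = [102, 106, 108, 110, 122, 124, 128, 130, 142, 148, 150, 154]
Y = [114, 118, 126, 130, 136, 140, 148, 156, 160, 164, 170, 178]

n = len(C)
k = 2  # assumed: number of estimated parameters (intercept and slope)

c_bar = sum(C) / n
y_bar = sum(Y) / n

# Deviations of the explanatory variable from its mean: x_i = X_i - X_bar.
x = [y - y_bar for y in Y]
sum_x2 = sum(xi * xi for xi in x)

# OLS estimates in deviation form.
b1 = sum(xi * (c - c_bar) for xi, c in zip(x, C)) / sum_x2
b0 = c_bar - b1 * y_bar

# Residuals and their sum of squares.
residuals = [c - (b0 + b1 * y) for c, y in zip(C, Y)]
sum_e2 = sum(e * e for e in residuals)

# Standard error of the slope estimate.
se_b1 = math.sqrt(sum_e2 / ((n - k) * sum_x2))

print(f"sum of squared residuals = {sum_e2:.3f}")
print(f"standard error of slope  = {se_b1:.3f}")
print(f"rule of thumb satisfied  = {se_b1 < b1 / 2}")
```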
To test for the significance of our estimate of $\beta_1$, we set the following null hypothesis, $H_0$, and alternative hypothesis, $H_1$:
$H_0$: $\beta_1 \le 0$
$H_1$: $\beta_1 > 0$
The t distribution is used to test for statistical significance of the estimate:
$$t = \frac{\hat{\beta}_1 - \beta_1}{s_{\hat{\beta}_1}} = \frac{0.861 - 0}{0.049} = 17.57$$
The t test is a way of comparing the error suggested by the null hypothesis to the standard error of the estimate.
A rule of thumb: if t > 2, reject $H_0$.
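As a quick check of the t ratio, the sketch below plugs in the rounded figures quoted from the SPSS output (0.861 and 0.049) and applies the rule of thumb. It assumes Python; the variable names are illustrative. SPSS itself reports 17.596 because it works with unrounded values.

```python
# Rounded values quoted from the SPSS coefficients table.
b1_hat = 0.861      # estimated slope
se_b1 = 0.049       # standard error of the slope estimate
beta1_null = 0.0    # value of the slope under the null hypothesis

# t ratio: distance of the estimate from the null value, in standard errors.
t_stat = (b1_hat - beta1_null) / se_b1

# Rule of thumb from the text: reject H0 when t exceeds 2.
reject_h0 = t_stat > 2

print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```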
Constructing confidence intervals
To find the 95 percent confidence interval for $\beta_1$, that is:
$$\Pr(a < \beta_1 < b) = .95$$
To find the upper and lower boundaries of the confidence interval (a and b):
$$\hat{\beta}_1 \pm t_c \, s_{\hat{\beta}_1}$$
where $t_c$ is the critical value of t at the 5 percent significance level (two-sided, 10 degrees of freedom): $t_c = 2.228$.
Working it out, we have $0.861 \pm (2.228)(0.049) = 0.861 \pm 0.109$, so that:
$$\Pr(.752 < \beta_1 < .970) = .95$$
We can be 95 percent confident that the true value of the slope coefficient is in this range.
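The interval bounds follow directly from the estimate, its standard error, and the critical value. A minimal sketch, assuming Python and the rounded figures quoted in the text; the critical value 2.228 is taken from the text rather than computed, and the variable names are illustrative.

```python
# Figures quoted in the text.
b1_hat = 0.861   # estimated slope
se_b1 = 0.049    # standard error of the slope estimate
t_crit = 2.228   # two-sided 5 percent critical value, 10 degrees of freedom

# Half-width of the interval and its two bounds.
half_width = t_crit * se_b1
lower = b1_hat - half_width
upper = b1_hat + half_width

print(f"95% confidence interval for the slope: ({lower:.3f}, {upper:.3f})")
```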
The coefficient of determination ($R^2$)
The coefficient of determination, $R^2$, is defined as the proportion of the total variation in the dependent variable (Y) "explained" by the regression of Y on the independent variable (X).
The total variation in Y, or the total sum of squares (TSS), is defined as:
$$TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} y_i^2$$
Note: $y_i = Y_i - \bar{Y}$
The explained variation in the dependent variable (Y) is called the regression sum of squares (RSS) and is given by:
$$RSS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 = \sum_{i=1}^{n} \hat{y}_i^2$$
What remains is the unexplained variation in the dependent variable, or the error sum of squares (ESS):
$$ESS = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2$$
We can say the following:
•TSS = RSS + ESS, or
•Total variation = Explained variation + Unexplained variation
$R^2$ is defined as:
$$R^2 = \frac{RSS}{TSS} = \frac{\sum_{i=1}^{n} \hat{y}_i^2}{\sum_{i=1}^{n} y_i^2} = 1 - \frac{ESS}{TSS} = 1 - \frac{\sum_{i=1}^{n} e_i^2}{\sum_{i=1}^{n} y_i^2}$$
Note that: $0 \le R^2 \le 1$
If $R^2 = 0$, all the sample points lie on a horizontal line or in a circle
If $R^2 = 1$, the sample points all lie on the regression line
In our case, $R^2 = 0.969$ (the correlation coefficient R is 0.984), meaning that 96.9 percent of the variation in the dependent variable (consumption) is explained by the regression.
Think of $R^2$ as the proportion of the total deviation of the dependent variable from its mean value that is accounted for by the explanatory variable(s).
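The decomposition of the total variation can be verified on the consumption data. This sketch assumes Python and the Table 1 data, with variable names chosen for illustration. It computes TSS, RSS, and ESS and forms $R^2$ both ways; the result is about 0.969, the R Square figure in the SPSS Model Summary (0.984 in that table is R, its square root).

```python
# Table 1 data: consumption (C, dependent) and disposable income (Y, explanatory).
C = [102, 106, 108, 110, 122, 124, 128, 130, 142, 148, 150, 154]
Y = [114, 118, 126, 130, 136, 140, 148, 156, 160, 164, 170, 178]

n = len(C)
c_bar = sum(C) / n
y_bar = sum(Y) / n

# OLS estimates in deviation form.
sum_x2 = sum((y - y_bar) ** 2 for y in Y)
b1 = sum((y - y_bar) * (c - c_bar) for y, c in zip(Y, C)) / sum_x2
b0 = c_bar - b1 * y_bar

fitted = [b0 + b1 * y for y in Y]

# Total, regression (explained), and error (unexplained) sums of squares.
tss = sum((c - c_bar) ** 2 for c in C)
rss = sum((f - c_bar) ** 2 for f in fitted)
ess = sum((c - f) ** 2 for c, f in zip(C, fitted))

r2_explained = rss / tss
r2_residual = 1 - ess / tss

print(f"TSS = {tss:.3f}, RSS = {rss:.3f}, ESS = {ess:.3f}")
print(f"R^2 = {r2_explained:.3f} (RSS/TSS), {r2_residual:.3f} (1 - ESS/TSS)")
```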
The standard error of the regression (s) is given by
$$s = \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n - k - 1}}$$
In this formula k counts only the explanatory variables (k = 1), so that n - k - 1 = 10.
In our case, s = 3.40
Regression is based on the assumption that the error term is normally distributed, so that about 68.27% of the actual values of the dependent variable should be within one standard error ($3.4 billion in our example) of their fitted value.
Also, 95.45% of the observed values of consumption should be within 2 standard errors of their fitted values ($6.8 billion).
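The standard error of the regression, and the share of observations falling within one and two standard errors of the fitted line, can be computed directly. A minimal sketch, assuming Python, the Table 1 data, and k = 1 explanatory variable; the variable names are illustrative. With only 12 observations the observed shares can be expected to match the normal-distribution percentages only roughly.

```python
import math

# Table 1 data: consumption (C, dependent) and disposable income (Y, explanatory).
C = [102, 106, 108, 110, 122, 124, 128, 130, 142, 148, 150, 154]
Y = [114, 118, 126, 130, 136, 140, 148, 156, 160, 164, 170, 178]

n = len(C)
k = 1  # assumed: number of explanatory variables

c_bar = sum(C) / n
y_bar = sum(Y) / n

# OLS estimates in deviation form.
b1 = sum((y - y_bar) * (c - c_bar) for y, c in zip(Y, C)) / sum((y - y_bar) ** 2 for y in Y)
b0 = c_bar - b1 * y_bar

residuals = [c - (b0 + b1 * y) for c, y in zip(C, Y)]

# Standard error of the regression.
s = math.sqrt(sum(e * e for e in residuals) / (n - k - 1))

# Count observations within one and two standard errors of their fitted value.
within_1 = sum(1 for e in residuals if abs(e) <= s)
within_2 = sum(1 for e in residuals if abs(e) <= 2 * s)

print(f"s = {s:.2f}")
print(f"within 1 s: {within_1} of {n}; within 2 s: {within_2} of {n}")
```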
Our forecasting equation was estimated as follows:
$$\hat{C}_i = 2.30 + 0.861\, \hat{Y}_i$$
At the most basic level, forecasting consists of inserting forecasted values of the explanatory variable X (disposable income) into the forecasting equation to obtain forecasted values of the dependent variable Y (personal consumption expenditure).
Our ability to generate accurate forecasts of the dependent variable depends on two factors:
•Do we have good forecasts of the explanatory variable?
•Does our model exhibit structural stability, i.e., will the causal relationship between C and Y expressed in our forecasting equation hold up over time? After all, the estimated coefficients are average values for a specific time interval (1987-1998). While the past may be a serviceable guide to the future in the case of purely physical phenomena, the same principle does not necessarily hold in the realm of social phenomena (to which the economy belongs).
Can we make a good forecast?
Having forecasted values of income in hand, we can forecast consumption through the year 2003:
Year    Forecast income $\hat{Y}_i$    Forecast consumption $\hat{C}_i$
1999    199    173.44
2000    206    179.46
2001    213    185.48
2002    218    189.78
2003    215    187.20
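The forecast step amounts to evaluating the fitted line at the forecasted income values. The sketch below assumes Python; the function name `forecast_consumption` is illustrative. One assumption is worth flagging: the tabulated consumption forecasts are reproduced when the slope is rounded to 0.86 (with 0.861 each value would be about 0.2 higher). That rounding is inferred from the numbers, not stated in the slides.

```python
# Forecasted disposable income by year, from the forecast table.
income_forecast = {1999: 199, 2000: 206, 2001: 213, 2002: 218, 2003: 215}

INTERCEPT = 2.30
SLOPE = 0.86  # assumed rounding; inferred because it reproduces the tabulated values


def forecast_consumption(income):
    """Evaluate the fitted line at a forecasted income value."""
    return INTERCEPT + SLOPE * income


consumption_forecast = {
    year: round(forecast_consumption(income), 2)
    for year, income in income_forecast.items()
}

for year, value in consumption_forecast.items():
    print(f"{year}: income {income_forecast[year]}, forecast consumption {value:.2f}")
```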
Figure: Forecast of Consumption Expenditure, 1999-2003 (horizontal axis: year; vertical axis: consumption, billions)