X and Y are not perfectly correlated. However, there is on average a positive relationship between Y and X.

[Figure: Y plotted against X, with X0, X1 and X2 marked on the X axis.]

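We assume that the expected conditional values of Y associated with alternative values of X fall on a line:

$$E(Y_i \mid X_i) = \beta_0 + \beta_1 X_i$$

The deviation of an observed value from its conditional expectation is, for the first observation:

$$\varepsilon_1 = Y_1 - E(Y_1 \mid X_1)$$

[Figure: the line E(Y_i | X_i) plotted against X, showing the observation Y1 at X1, its conditional expectation E(Y1 | X1), and the deviation between them.]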

Specification

Estimation

Evaluation

Forecasting

Econometric models posit causal relationships among economic variables.

Simple regression analysis is used to test the hypothesis about the relationship between a dependent variable (Y, or in our case, C) and an independent variable (X, or in our case, Y).

Our model is specified as follows: C = f(Y), where

C: personal consumption expenditure

Y: personal disposable income

Simple linear regression begins by plotting C-Y values (see Table 1) on a scatter diagram (see Figure 1) to determine whether an approximately linear relationship exists:

$$C_i = \beta_0 + \beta_1 Y_i \qquad (1)$$

Since the data points are unlikely to fall exactly on a line, (1) must be modified to include a disturbance term ($u_i$):

$$C_i = \beta_0 + \beta_1 Y_i + u_i \qquad (2)$$

$\beta_0$ and $\beta_1$ are called parameters or population parameters.

We estimate these parameters using the data we have available.

Table 1 (C_i: consumption, Y_i: disposable income, both in billions)

Year    n     C_i    Y_i
1987    1     102    114
1988    2     106    118
1989    3     108    126
1990    4     110    130
1991    5     122    136
1992    6     124    140
1993    7     128    148
1994    8     130    156
1995    9     142    160
1996    10    148    164
1997    11    150    170
1998    12    154    178

Figure 1: Scatter Diagram

[Figure: consumption (billions) on the vertical axis against disposable income (billions) on the horizontal axis.]

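We estimate the values of $\beta_0$ and $\beta_1$ using the Ordinary Least Squares (OLS) method. OLS is a technique for fitting the "best" straight line to the sample of XY observations. The line of best fit is that which minimizes the sum of the squared (vertical) deviations of the sample points from the line:

$$\text{minimize} \sum_{i=1}^{12} (C_i - \hat{C}_i)^2$$

where

$C_i$ are the actual observations of consumption,

$\hat{C}_i$ are the fitted values of consumption, $\hat{C}_i = \hat\beta_0 + \hat\beta_1 Y_i$, and

$e_i = C_i - \hat{C}_i$ are the deviations of the actual values from the fitted values.

[Figure: the fitted line in the C-Y plane, showing the actual value $C_1$ and the fitted value $\hat{C}_1$ at $Y_1$, and the deviation $e_1$ between them.]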
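The minimization can be carried out with elementary calculus. The following is a sketch of the standard derivation, written in the C-Y notation of this example; it is a textbook argument and not taken from the slides themselves.

```latex
\begin{align*}
S(\hat\beta_0, \hat\beta_1) &= \sum_{i=1}^{n} (C_i - \hat\beta_0 - \hat\beta_1 Y_i)^2 \\
\frac{\partial S}{\partial \hat\beta_0} &= -2 \sum_{i=1}^{n} (C_i - \hat\beta_0 - \hat\beta_1 Y_i) = 0 \\
\frac{\partial S}{\partial \hat\beta_1} &= -2 \sum_{i=1}^{n} Y_i \,(C_i - \hat\beta_0 - \hat\beta_1 Y_i) = 0 \\
\intertext{Solving the two equations simultaneously gives}
\hat\beta_1 &= \frac{n \sum Y_i C_i - \sum Y_i \sum C_i}{n \sum Y_i^2 - \left(\sum Y_i\right)^2},
\qquad
\hat\beta_0 = \bar{C} - \hat\beta_1 \bar{Y}.
\end{align*}
```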

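The OLS estimators: single variable case

$\hat\beta_0$ and $\hat\beta_1$ are estimators of the true parameters $\beta_0$ and $\beta_1$:

$$\hat\beta_1 = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - \left(\sum X_i\right)^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$

$$\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}$$

Note that here X denotes the explanatory variable and Y denotes the dependent variable.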
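The estimator formulas can be checked numerically. The sketch below applies the raw-sums form of the slope formula to the Table 1 data, with disposable income as the explanatory variable and consumption as the dependent variable. The function name `ols_simple` is illustrative only.

```python
# Data from Table 1: consumption (dependent) and disposable income (explanatory).
consumption = [102, 106, 108, 110, 122, 124, 128, 130, 142, 148, 150, 154]
income = [114, 118, 126, 130, 136, 140, 148, 156, 160, 164, 170, 178]


def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    # Slope: (n * sum(xy) - sum(x) * sum(y)) / (n * sum(x^2) - sum(x)^2)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    # Intercept: mean(y) - slope * mean(x)
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope


b0, b1 = ols_simple(income, consumption)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

Carried at full precision, the coefficients round to 2.129 and 0.861, the values that appear in the SPSS Coefficients table.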

n     C_i    Y_i    C_i Y_i    Y_i^2
1     102    114    11,628     12,996
2     106    118    12,508     13,924
3     108    126    13,608     15,876
4     110    130    14,300     16,900
5     122    136    16,592     18,496
6     124    140    17,360     19,600
7     128    148    18,944     21,904
8     130    156    20,280     24,336
9     142    160    22,720     25,600
10    148    164    24,272     26,896
11    150    170    25,500     28,900
12    154    178    27,412     31,684

n = 12    ΣC_i = 1,524    ΣY_i = 1,740    ΣY_i C_i = 225,124    ΣY_i^2 = 257,112

Table 2

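$$\hat\beta_1 = \frac{[(12)(225{,}124)] - [(1{,}740)(1{,}524)]}{[(12)(257{,}112)] - (1{,}740)^2} = \frac{49{,}728}{57{,}744} = 0.861$$

Thus, we have:

$$\hat\beta_0 = \bar{C} - \hat\beta_1 \bar{Y} = 127 - [(0.86)(145)] = 2.30$$

Here the slope is rounded to 0.86. With the unrounded slope the intercept is 2.129, the value reported in the SPSS output.

Thus the equation obtained from the regression is:

$$\hat{C}_i = 2.30 + 0.861\,Y_i$$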

Year    C_i    Fitted value    e_i = C_i - fitted
1987    102    100.34           1.66
1988    106    103.78           2.22
1989    108    110.66          -2.66
1990    110    114.10          -4.10
1991    122    119.26           2.74
1992    124    122.70           1.30
1993    128    129.58          -1.58
1994    130    136.46          -6.46
1995    142    139.90           2.10
1996    148    143.34           4.66
1997    150    148.50           1.50
1998    154    155.38          -1.38

Σe_i^2 = 115.28

Table 3: Fitted values of consumption
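The fitted values and residuals in Table 3 can be reproduced in a few lines. The sketch below assumes the rounded coefficients 2.30 and 0.86. That rounding is an inference from the tabulated values, not something the slides state.

```python
# Data from Table 1.
consumption = [102, 106, 108, 110, 122, 124, 128, 130, 142, 148, 150, 154]
income = [114, 118, 126, 130, 136, 140, 148, 156, 160, 164, 170, 178]

# Assumption: rounded coefficients inferred from the values in Table 3.
INTERCEPT, SLOPE = 2.30, 0.86

fitted = [INTERCEPT + SLOPE * y for y in income]
residuals = [c - f for c, f in zip(consumption, fitted)]
sse = sum(e ** 2 for e in residuals)  # sum of squared residuals

for year, c, f, e in zip(range(1987, 1999), consumption, fitted, residuals):
    print(f"{year}  {c:4d}  {f:7.2f}  {e:6.2f}")
print(f"sum of squared residuals = {sse:.2f}")
```

The sum of squared residuals from these rounded coefficients is close to, but not identical with, the residual sum of squares that an unrounded fit would report.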

[Figure: Actual and Fitted Values of Consumption, 1987-99. Horizontal axis: year; vertical axis: consumption (billions); two series, actual and fitted.]

Coefficients (a)

Model 1       B        Std. Error    Beta     t         Sig.    Lower Bound    Upper Bound
(Constant)    2.129    7.164                  .297      .772    -13.834        18.092
INCOME        .861     .049          .984     17.596    .000    .752           .970

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient; Lower Bound and Upper Bound give the 95% confidence interval for B.

a. Dependent Variable: CONSUME

Model Summary

Model    R           R Square    Adjusted R Square    Std. Error of the Estimate
1        .984 (a)    .969        .966                 3.40

a. Predictors: (Constant), INCOME

ANOVA (b)

Model 1       Sum of Squares    df    Mean Square    F          Sig.
Regression    3568.732          1     3568.732       309.602    .000 (a)
Residual      115.268           10    11.527
Total         3684.000          11

a. Predictors: (Constant), INCOME

b. Dependent Variable: CONSUME

Goodness of fit criteria

•Standard errors of the estimates

•Are the estimates statistically significant?

•Constructing confidence intervals

•The coefficient of determination ($R^2$).

•The standard error of the regression

These statistics tell us how well the equation obtained from the regression performs in terms of producing accurate forecasts.

We assume that the regression coefficients are normally distributed variables. The standard error (or standard deviation) of the estimates is a measure of the dispersion of the estimates around their mean value. As a general principle, the smaller the standard error, the better the estimates (in terms of yielding accurate forecasts of the dependent variable). The following rule of thumb is useful: "[the] standard error of the regression coefficient should be less than half of the size of [the] corresponding regression coefficient."

Let $s_{\hat\beta_1}$ denote the standard error of our estimate of the slope parameter:

$$s_{\hat\beta_1} = \sqrt{s^2_{\hat\beta_1}}$$

$$s^2_{\hat\beta_1} = \frac{\sum e_i^2}{(n - k) \sum x_i^2}$$

Note that: $x_i = X_i - \bar{X}$. In this formula k counts the estimated parameters (k = 2 here), so that n - k = 10.

By reference to the SPSS output, we see that the standard error of our estimate of $\beta_1$ is 0.049, whereas our estimate of $\beta_1$ is 0.861. Hence our estimate is more than 17 times the size of its standard error.
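As a check on the reported standard error, the sketch below computes it from the Table 1 data using unrounded OLS coefficients. It takes k = 2 (intercept and slope) as the number of estimated parameters. That reading is an assumption, chosen to match the 10 degrees of freedom in the SPSS output.

```python
import math

# Data from Table 1.
consumption = [102, 106, 108, 110, 122, 124, 128, 130, 142, 148, 150, 154]
income = [114, 118, 126, 130, 136, 140, 148, 156, 160, 164, 170, 178]

n = len(income)
k = 2  # assumption: number of estimated parameters (intercept and slope)

mean_income = sum(income) / n
mean_consumption = sum(consumption) / n

# Deviations of the explanatory variable from its mean (x_i in the text).
x = [y - mean_income for y in income]
sum_x_squared = sum(xi ** 2 for xi in x)

# Unrounded OLS coefficients (deviation form of the slope formula).
slope = sum(xi * (c - mean_consumption) for xi, c in zip(x, consumption)) / sum_x_squared
intercept = mean_consumption - slope * mean_income

residuals = [c - (intercept + slope * y) for c, y in zip(consumption, income)]
sse = sum(e ** 2 for e in residuals)

se_slope = math.sqrt(sse / ((n - k) * sum_x_squared))
print(f"standard error of the slope = {se_slope:.3f}")
print(f"rule of thumb satisfied: {se_slope < slope / 2}")
```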

To test for the significance of our estimate of $\beta_1$, we set the following null hypothesis, H0, and alternative hypothesis, H1:

H0: $\beta_1 = 0$

H1: $\beta_1 > 0$

The t distribution is used to test for statistical significance of the estimate:

$$t = \frac{\hat\beta_1 - \beta_1}{s_{\hat\beta_1}} = \frac{0.861 - 0}{0.049} = 17.57$$

The t test is a way of comparing the error suggested by the null hypothesis to the standard error of the estimate.

A rule of thumb: if t > 2, reject H0.
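The test statistic is simple arithmetic on the reported estimate and its standard error. A minimal sketch, using the rounded figures from the SPSS output and the rule-of-thumb threshold of 2 (variable names are illustrative):

```python
slope_estimate = 0.861     # estimate of the slope, from the SPSS output
slope_std_error = 0.049    # its standard error, from the SPSS output
hypothesized_slope = 0.0   # value of the slope under H0

t_stat = (slope_estimate - hypothesized_slope) / slope_std_error
reject_h0 = t_stat > 2     # rule of thumb from the text

print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```

The SPSS table reports t = 17.596. The small difference arises because the inputs here are rounded to three decimals.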

Constructing confidence intervals

To find the 95 percent confidence interval for $\beta_1$, that is:

$$\Pr(a < \beta_1 < b) = .95$$

To find the lower and upper boundaries of the confidence interval (a and b):

$$\hat\beta_1 \pm t_c \, s_{\hat\beta_1}$$

where $t_c$ is the critical value of t at the 5 percent significance level (two-sided, 10 degrees of freedom): $t_c = 2.228$.

Working it out, we have:

$$\Pr(.752 < \beta_1 < .970) = .95$$

We can be 95 percent confident that the true value of the slope coefficient is in this range.
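The interval bounds follow directly from the estimate, its standard error and the critical value. A short sketch with the figures quoted above (the critical value 2.228 is taken from the text, not computed):

```python
slope_estimate = 0.861    # from the SPSS output
slope_std_error = 0.049   # from the SPSS output
t_critical = 2.228        # two-sided 5 percent critical value, 10 degrees of freedom

margin = t_critical * slope_std_error
lower = slope_estimate - margin
upper = slope_estimate + margin

print(f"95% confidence interval for the slope: ({lower:.3f}, {upper:.3f})")
```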

The coefficient of determination ($R^2$)

The coefficient of determination, $R^2$, is defined as the proportion of the total variation in the dependent variable (Y) "explained" by the regression of Y on the independent variable (X).

The total variation in Y, or the total sum of squares (TSS), is defined as:

$$TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} y_i^2$$

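The explained variation in the dependent variable (Y) is called the regression sum of squares (RSS) and is given by:

$$RSS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 = \sum_{i=1}^{n} \hat{y}_i^2$$

Note: $y_i = Y_i - \bar{Y}$ and, correspondingly, $\hat{y}_i = \hat{Y}_i - \bar{Y}$.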

What remains is the unexplained variation in the dependent variable, or the error sum of squares (ESS):

$$ESS = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2$$

We can say the following:

•TSS = RSS + ESS, or

•Total variation = Explained variation + Unexplained variation

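$R^2$ is defined as:

$$R^2 = \frac{RSS}{TSS} = 1 - \frac{ESS}{TSS} = \frac{\sum_{i=1}^{n} \hat{y}_i^2}{\sum_{i=1}^{n} y_i^2} = 1 - \frac{\sum_{i=1}^{n} e_i^2}{\sum_{i=1}^{n} y_i^2}$$

Note that: $0 \le R^2 \le 1$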

If $R^2 = 0$, all the sample points lie on a horizontal line or in a circle.

If $R^2 = 1$, the sample points all lie on the regression line.

In our case, $R^2 = 0.969$ (the correlation coefficient R is 0.984), meaning that 96.9 percent of the variation in the dependent variable (consumption) is explained by the regression.

Think of $R^2$ as the proportion of the total deviation of the dependent variable from its mean value that is accounted for by the explanatory variable(s).
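The decomposition of the total variation can be verified on the Table 1 data. The sketch below fits the line with unrounded OLS coefficients, computes TSS, RSS and ESS as defined above, and forms the coefficient of determination as RSS/TSS. Variable names are illustrative.

```python
# Data from Table 1.
consumption = [102, 106, 108, 110, 122, 124, 128, 130, 142, 148, 150, 154]
income = [114, 118, 126, 130, 136, 140, 148, 156, 160, 164, 170, 178]

n = len(income)
mean_income = sum(income) / n
mean_consumption = sum(consumption) / n

# Unrounded OLS coefficients (deviation form).
sxx = sum((y - mean_income) ** 2 for y in income)
sxy = sum((y - mean_income) * (c - mean_consumption)
          for y, c in zip(income, consumption))
slope = sxy / sxx
intercept = mean_consumption - slope * mean_income
fitted = [intercept + slope * y for y in income]

tss = sum((c - mean_consumption) ** 2 for c in consumption)   # total variation
rss = sum((f - mean_consumption) ** 2 for f in fitted)        # explained variation
ess = sum((c - f) ** 2 for c, f in zip(consumption, fitted))  # unexplained variation

r_squared = rss / tss
print(f"TSS = {tss:.3f}, RSS = {rss:.3f}, ESS = {ess:.3f}")
print(f"R squared = {r_squared:.3f}")
```

The three sums correspond to the Total, Regression and Residual rows of the ANOVA table, and the ratio corresponds to the R Square entry of the Model Summary.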

The standard error of the regression (s) is given by

$$s = \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n - k - 1}}$$

In this formula k counts the explanatory variables (k = 1 here), so that n - k - 1 = 10.

In our case, s = 3.40

Regression is based on the assumption that the error term is normally distributed, so that 68.27% of the actual values of the dependent variable should be within one standard error ($3.4 billion in our example) of their fitted value.

Also, 95.45% of the observed values of consumption should be within 2 standard errors of their fitted values ($6.8 billion).
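To see how the one- and two-standard-error bands play out in this sample, the sketch below computes s from unrounded OLS residuals and counts the observations falling within each band. It takes k = 1 explanatory variable, an assumption that matches the 10 residual degrees of freedom in the ANOVA table.

```python
import math

# Data from Table 1.
consumption = [102, 106, 108, 110, 122, 124, 128, 130, 142, 148, 150, 154]
income = [114, 118, 126, 130, 136, 140, 148, 156, 160, 164, 170, 178]

n = len(income)
k = 1  # assumption: number of explanatory variables

mean_income = sum(income) / n
mean_consumption = sum(consumption) / n
sxx = sum((y - mean_income) ** 2 for y in income)
sxy = sum((y - mean_income) * (c - mean_consumption)
          for y, c in zip(income, consumption))
slope = sxy / sxx
intercept = mean_consumption - slope * mean_income

residuals = [c - (intercept + slope * y) for c, y in zip(consumption, income)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - k - 1))

within_one = sum(1 for e in residuals if abs(e) <= s)
within_two = sum(1 for e in residuals if abs(e) <= 2 * s)

print(f"s = {s:.2f}")
print(f"within one standard error:  {within_one} of {n}")
print(f"within two standard errors: {within_two} of {n}")
```

With only 12 observations, the sample shares can only roughly approximate the percentages implied by the normal distribution.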

Our forecasting equation was estimated as follows:

$$\hat{C}_i = 2.30 + 0.861\,\hat{Y}_i$$

At the most basic level, forecasting consists of inserting forecasted values of the explanatory variable X (disposable income) into the forecasting equation to obtain forecasted values of the dependent variable Y (personal consumption expenditure).

Our ability to generate accurate forecasts of the dependent variable depends on two factors:

Do we have good forecasts of the explanatory variable?

Does our model exhibit structural stability, i.e., will the causal relationship between C and Y expressed in our forecasting equation hold up over time? After all, the estimated coefficients are average values for a specific time interval (1987-1998). While the past may be a serviceable guide to the future in the case of purely physical phenomena, the same principle does not necessarily hold in the realm of social phenomena (to which the economy belongs).

Can we make a good forecast?

Having forecasted values of income in hand, we can forecast consumption through the year 2003:

Year    Forecast income    Forecast consumption
1999    199                173.44
2000    206                179.46
2001    213                185.48
2002    218                189.78
2003    215                187.20
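The forecast column can be reproduced by inserting the forecasted income values into the estimated equation. The sketch below assumes the rounded coefficients 2.30 and 0.86. The 0.86 slope is an inference from the tabulated forecasts, not a figure the slides state.

```python
# Forecasted values of disposable income, as given in the forecast table.
income_forecast = {1999: 199, 2000: 206, 2001: 213, 2002: 218, 2003: 215}

# Assumption: rounded coefficients inferred from the tabulated forecasts.
INTERCEPT, SLOPE = 2.30, 0.86

consumption_forecast = {
    year: round(INTERCEPT + SLOPE * y, 2) for year, y in income_forecast.items()
}

for year, c in consumption_forecast.items():
    print(f"{year}  {income_forecast[year]}  {c:.2f}")
```

The quality of these numbers depends entirely on the income forecasts fed in and on the estimated relationship remaining stable beyond 1998.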

[Figure: Forecast of Consumption Expenditure, 1999-2003. Horizontal axis: year; vertical axis: consumption (billions).]
