Trees Example: More than one variable

Transcript of "Trees Example: More than one variable" (64 pages).


The residual plot suggests that the linear model is satisfactory. The R squared value seems quite low though, so from physical arguments we force the line to pass through the origin.


The R squared value is higher now, but the residual plot is not so random.


We might now ask if we can find a model with both explanatory variables height and girth. Physical considerations suggest that we should explore the very simple model

Volume = b1 × height × (girth)² + ε

This is basically the formula for the volume of a cylinder.


So the equation is:

Volume = 0.002108 × height × (girth)² + ε


The residuals are considerably smaller than those from any of the previous models considered. Further graphical analysis fails to reveal any further obvious dependence on either of the explanatory variables girth and height.

Further analysis also shows that inclusion of a constant term in the model does not significantly improve the fit. Model 4 is thus the most satisfactory of those models considered for the data.


However, this is regression "through the origin", so it may be more satisfactory to rewrite Model 4 as

volume / (height × (girth)²) = b1 + ε


so that b1 can then just be regarded as the mean of the observations of

volume / (height × (girth)²)

(recall that ε is assumed to have location measure, here the mean, 0).
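The difference between the two ways of fitting can be illustrated numerically. Below is a minimal sketch in Python/NumPy (the lecture itself uses R, and the data here are hypothetical stand-ins, not the tree measurements): least squares through the origin gives slope Σxy/Σx², while the rewritten model estimates b1 as the plain mean of the ratios y/x. The two estimates are close but not identical.

```python
import numpy as np

# Hypothetical measurements (NOT the lecture's tree data):
# x plays the role of height * girth^2, y the role of volume.
rng = np.random.default_rng(0)
x = rng.uniform(50.0, 200.0, size=20)
y = 0.002 * x + rng.normal(0.0, 0.05, size=20)

# Model 4: least squares through the origin, slope = sum(xy)/sum(x^2)
b_origin = (x @ y) / (x @ x)

# Rewritten model: y/x = b1 + eps, so b1-hat is the mean of the ratios
b_ratio = np.mean(y / x)

print(b_origin, b_ratio)  # similar, but not identical, estimates
```

Both estimators recover the true slope 0.002 here; the through-origin fit weights observations with large x more heavily, which is why the two answers differ slightly.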


Compare with the value 0.002108 found earlier.


Practical Question 2

y     x1    x2
3.5   3.1   30
3.2   3.4   25
3.0   3.0   20
2.9   3.2   30
4.0   3.9   40
2.5   2.8   25
2.3   2.2   30


So y = -0.2138 + 0.8984x1 + 0.01745x2 + e
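The fitted equation can be cross-checked numerically. Here is a minimal sketch in Python/NumPy (the lecture itself fits this with R's lm; this is just an independent check of the same least-squares fit):

```python
import numpy as np

# Data from Practical Question 2
y  = np.array([3.5, 3.2, 3.0, 2.9, 4.0, 2.5, 2.3])
x1 = np.array([3.1, 3.4, 3.0, 3.2, 3.9, 2.8, 2.2])
x2 = np.array([30, 25, 20, 30, 40, 25, 30], dtype=float)

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones_like(y), x1, x2])

# Least-squares coefficients (b0, b1, b2)
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # approximately [-0.2138, 0.8984, 0.01745]
```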


Use

> plot(multregress)

or

> plot(cooks.distance(multregress), type="h")


> ynew = c(y, 12)
> x1new = c(x1, 20)
> x2new = c(x2, 100)

> multregressnew=lm(ynew~x1new+x2new)


The added point has a very large influence.
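That influence can be quantified directly. The sketch below, in Python/NumPy, recomputes Cook's distance from its definition (leverages from the hat matrix, residuals, and the estimate of σ²) rather than via R's cooks.distance, using the data above with the added point (x1 = 20, x2 = 100, y = 12):

```python
import numpy as np

y  = np.array([3.5, 3.2, 3.0, 2.9, 4.0, 2.5, 2.3, 12.0])   # ynew = c(y, 12)
x1 = np.array([3.1, 3.4, 3.0, 3.2, 3.9, 2.8, 2.2, 20.0])   # x1new = c(x1, 20)
x2 = np.array([30, 25, 20, 30, 40, 25, 30, 100.0])          # x2new = c(x2, 100)

n = len(y)
X = np.column_stack([np.ones(n), x1, x2])
p = X.shape[1]                          # number of fitted coefficients

H = X @ np.linalg.inv(X.T @ X) @ X.T    # hat matrix
h = np.diag(H)                          # leverages
e = y - H @ y                           # residuals
s2 = e @ e / (n - p)                    # estimate of sigma^2

# Cook's distance: D_i = e_i^2 h_i / (p * s2 * (1 - h_i)^2)
D = e**2 * h / (p * s2 * (1 - h)**2)
print(D)  # the added 8th point dominates all the others
```

The added point is extreme in both explanatory variables, so it has by far the largest leverage and the largest Cook's distance.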


Second Example

> ynew = c(y, 40)
> x1new = c(x1, 10)
> x2new = c(x2, 50)

> multregressnew=lm(ynew~x1new+x2new)


Multiple Linear Regression - Matrix Formulation

Let x = (x1, x2, … , xn)′ be an n × 1 column vector and let g(x) be a scalar function of x. Then, by definition,

∂g(x)/∂x = ( ∂g(x)/∂x1 , ∂g(x)/∂x2 , … , ∂g(x)/∂xn )′


For example, let g(x) = x′x = Σᵢ₌₁ⁿ xi². Then ∂g(x)/∂x = 2x.

Let a = (a1, a2, … , an)′ be an n × 1 column vector of constants. It is easy to verify that

∂(x′a)/∂x = ∂(a′x)/∂x = a

and that, for symmetric A (n × n),

∂(x′Ax)/∂x = 2Ax
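Both identities are easy to sanity-check numerically. A small sketch in Python/NumPy comparing a central finite-difference gradient against the closed forms a and 2Ax (the helper num_grad is ours, not part of the lecture):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
a = rng.normal(size=n)
A = rng.normal(size=(n, n))
A = (A + A.T) / 2                  # make A symmetric
x = rng.normal(size=n)

def num_grad(g, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function g at x."""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        d = np.zeros_like(x)
        d[i] = eps
        grad[i] = (g(x + d) - g(x - d)) / (2 * eps)
    return grad

g1 = lambda x: a @ x               # x'a, gradient should be a
g2 = lambda x: x @ A @ x           # x'Ax, gradient should be 2Ax

print(np.max(np.abs(num_grad(g1, x) - a)))          # tiny
print(np.max(np.abs(num_grad(g2, x) - 2 * A @ x)))  # tiny
```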


Theory of Multiple Regression

Suppose we have response variables Yi, i = 1, 2, … , n and k explanatory variables/predictors X1, X2, … , Xk.

Yi = b0 + b1 x1i + b2 x2i + … + bk xki + εi,   i = 1, 2, … , n

There are k + 2 parameters: b0, b1, b2, …, bk and σ².


Y = ( Y1, Y2, … , Yn )′

X =

  1  x11  x21  …  xk1
  1  x12  x22  …  xk2
  …   …    …       …
  1  x1n  x2n  …  xkn

X is called the design matrix


b = ( b0, b1, … , bk )′    ε = ( ε1, ε2, … , εn )′

Model:  Y = Xb + ε


OLS (ordinary least-squares) estimation

S = (Y − Xb)′(Y − Xb)
  = (Y′ − b′X′)(Y − Xb)
  = Y′Y − 2b′X′Y + b′X′Xb

∂S/∂b = −2X′Y + 2X′Xb = 0


This gives the normal equations

X′X b̂ = X′Y

so that

b̂ = (X′X)⁻¹X′Y = AY   where A = (X′X)⁻¹X′

E(b̂) = A E(Y) = A Xb = (X′X)⁻¹X′Xb = b, so b̂ is unbiased.


Fitted values are given by

Ŷ = X b̂ = X (X′X)⁻¹ X′ Y = HY

where H = X (X′X)⁻¹ X′

H is called the “hat matrix” (… it puts the hats on the Y’s)
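The hat matrix has two characteristic properties worth verifying: it is symmetric and idempotent (H² = H), and applying it to Y reproduces the fitted values X b̂. A quick numerical check in Python/NumPy using the worked data set (the lecture's own computations are done in R):

```python
import numpy as np

y  = np.array([3.5, 3.2, 3.0, 2.9, 4.0, 2.5, 2.3])
x1 = np.array([3.1, 3.4, 3.0, 3.2, 3.9, 2.8, 2.2])
x2 = np.array([30, 25, 20, 30, 40, 25, 30], dtype=float)
X = np.column_stack([np.ones(7), x1, x2])

H = X @ np.linalg.inv(X.T @ X) @ X.T      # hat matrix
bhat = np.linalg.solve(X.T @ X, X.T @ y)  # normal equations

print(np.allclose(H, H.T))                # symmetric
print(np.allclose(H @ H, H))              # idempotent
print(np.allclose(H @ y, X @ bhat))       # H "puts the hat on" Y
```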


The error sum of squares, SSRES, is

SSRES = min S = Y′Y − 2 b̂′X′Y + b̂′X′X b̂
      = Y′Y − 2 b̂′X′Y + b̂′X′X (X′X)⁻¹ X′Y
      = Y′Y − b̂′X′Y

The estimate of σ² is based on this.
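The simplification SSRES = Y′Y − b̂′X′Y can be checked against the direct residual sum of squares. A short Python/NumPy verification on the worked data:

```python
import numpy as np

y  = np.array([3.5, 3.2, 3.0, 2.9, 4.0, 2.5, 2.3])
x1 = np.array([3.1, 3.4, 3.0, 3.2, 3.9, 2.8, 2.2])
x2 = np.array([30, 25, 20, 30, 40, 25, 30], dtype=float)
X = np.column_stack([np.ones(7), x1, x2])
bhat = np.linalg.solve(X.T @ X, X.T @ y)

resid = y - X @ bhat
ss_direct = resid @ resid               # (Y - Xb)'(Y - Xb)
ss_formula = y @ y - bhat @ (X.T @ y)   # Y'Y - bhat'X'Y
print(ss_direct, ss_formula)            # both ~0.3369
```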


Example: Find a model of the form

Yi = b0 + b1 x1i + b2 x2i + εi

for the data below.

y     x1    x2
3.5   3.1   30
3.2   3.4   25
3.0   3.0   20
2.9   3.2   30
4.0   3.9   40
2.5   2.8   25
2.3   2.2   30


Y = ( 3.5, 3.2, 3.0, 2.9, 4.0, 2.5, 2.3 )′

X =

  1  3.1  30
  1  3.4  25
  1  3.0  20
  1  3.2  30
  1  3.9  40
  1  2.8  25
  1  2.2  30

X is called the design matrix


The model in matrix form is given by:  Y = Xb + ε

We have already seen that

X′X b̂ = X′Y,  i.e.  b̂ = (X′X)⁻¹ X′Y

Now calculate this for our example.


R can be used to calculate X′X and the answer is:

X′X =

    7.0    21.6    200.0
   21.6    68.3    626.0
  200.0   626.0   5950.0
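The same matrix can be reproduced in a couple of lines of Python/NumPy (an independent cross-check of the R calculation):

```python
import numpy as np

x1 = np.array([3.1, 3.4, 3.0, 3.2, 3.9, 2.8, 2.2])
x2 = np.array([30, 25, 20, 30, 40, 25, 30], dtype=float)
X = np.column_stack([np.ones(7), x1, x2])

XtX = X.T @ X      # in R: t(X) %*% X
print(XtX)         # rows: 7, 21.6, 200 / 21.6, 68.3, 626 / 200, 626, 5950
```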


To input the matrix in R use

X = matrix(c(1,1,1,1,1,1,1, 3.1,3.4,3.0,3.2,3.9,2.8,2.2, 30,25,20,30,40,25,30), 7, 3)

where 7 is the number of rows and 3 is the number of columns.


Notice the R command for matrix multiplication: the operator is %*%, so X′X is computed as t(X) %*% X.


The inverse of X′X can also be obtained using R, with solve(t(X) %*% X).


We also need to calculate X′Y. Now

b̂ = (X′X)⁻¹ X′Y


Notice that this is the same result as obtained previously using lm in R.


So y = -0.2138 + 0.8984x1 + 0.01745x2 + e


The "hat matrix" is given by

H = X (X′X)⁻¹ X′


The fitted Y values are obtained by

Ŷ = HY


Recall once more that we are looking at the model y = b0 + b1x1 + b2x2 + ε.


Compare these fitted values with the observed values of y.


Error Terms and Inference

A useful result is:

σ̂² = ( Y′Y − b̂′X′Y ) / ( n − k − 1 )

n: number of points
k: number of explanatory variables
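This estimate is easy to compute for the worked example. A Python/NumPy sketch (cross-checking the R numbers that follow, with n = 7 and k = 2):

```python
import numpy as np

y  = np.array([3.5, 3.2, 3.0, 2.9, 4.0, 2.5, 2.3])
x1 = np.array([3.1, 3.4, 3.0, 3.2, 3.9, 2.8, 2.2])
x2 = np.array([30, 25, 20, 30, 40, 25, 30], dtype=float)
X = np.column_stack([np.ones(7), x1, x2])
n, k = 7, 2

bhat = np.linalg.solve(X.T @ X, X.T @ y)
sigma2 = (y @ y - bhat @ (X.T @ y)) / (n - k - 1)
print(sigma2, np.sqrt(sigma2))  # ~0.08422 and ~0.2902
```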


In addition we can show that:

( b̂i − bi ) / s.e.(b̂i) ~ t with n − k − 1 degrees of freedom

where s.e.(b̂i) = σ̂ √c(i+1)(i+1), and c(i+1)(i+1) is the (i+1)th diagonal element of (X′X)⁻¹.


For our example:

Y′Y = 67.44,  b̂′X′Y = 67.1031

σ̂² = ( 67.44 − 67.1031 ) / 4 = 0.08422

σ̂ = 0.2902


(X′X)⁻¹ was calculated as:


This means that

c11 = 6.683,  c22 = 0.7600,  c33 = 0.0053

Note that c11 is associated with b0, c22 with b1 and c33 with b2.

We will calculate the standard error for b̂1:

s.e.(b̂1) = σ̂ √c22 = 0.2902 × √0.7600 = 0.2530


The value of b̂1 is 0.8984.

Now carry out a hypothesis test:

H0: b1 = 0
H1: b1 ≠ 0

The standard error of b̂1 is 0.2530.


The test statistic is

t = ( b̂1 − b1 ) / s.e.(b̂1)

This calculates as (0.8984 − 0) / 0.2530 = 3.55.
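The standard error and the test statistic can both be recomputed from scratch. A Python/NumPy sketch of the whole chain, from (X′X)⁻¹ through s.e.(b̂1) to t (the lecture does this by hand from the R output):

```python
import numpy as np

y  = np.array([3.5, 3.2, 3.0, 2.9, 4.0, 2.5, 2.3])
x1 = np.array([3.1, 3.4, 3.0, 3.2, 3.9, 2.8, 2.2])
x2 = np.array([30, 25, 20, 30, 40, 25, 30], dtype=float)
X = np.column_stack([np.ones(7), x1, x2])
n, k = 7, 2

XtX_inv = np.linalg.inv(X.T @ X)
bhat = XtX_inv @ X.T @ y
sigma2 = (y @ y - bhat @ (X.T @ y)) / (n - k - 1)

se_b1 = np.sqrt(sigma2 * XtX_inv[1, 1])   # sigma-hat * sqrt(c22)
t = bhat[1] / se_b1                       # test of H0: b1 = 0
print(se_b1, t)                           # ~0.2530 and ~3.55
```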


t tables using 4 degrees of freedom give a cut-off point of 2.776 for 2.5% in each tail.


Since 3.55 > 2.776, we reject H0 at the 5% level: there is evidence that b1 is non-zero.

The process can be repeated for the other b values and confidence intervals calculated in the usual way.

A CI for σ² is based on the χ²₄ distribution of 4σ̂²/σ²:

( (4 × 0.08422)/11.14 , (4 × 0.08422)/0.4844 )

i.e. (0.030, 0.695)
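The same interval can be computed without tables. A Python sketch using scipy.stats.chi2 for the χ²₄ quantiles (11.14 and 0.4844 in the tables above):

```python
import numpy as np
from scipy.stats import chi2

# 95% CI for sigma^2 from the chi-square(4) distribution of 4*sigma2hat/sigma^2
df = 4
sigma2_hat = 0.08422

lower = df * sigma2_hat / chi2.ppf(0.975, df)   # divide by upper 2.5% point
upper = df * sigma2_hat / chi2.ppf(0.025, df)   # divide by lower 2.5% point
print(lower, upper)  # ~(0.030, 0.695)
```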


The sum of squares of the residuals can also be calculated:

SSRES = ( Y − X b̂ )′ ( Y − X b̂ )