
Trees Example

More than one variable

The residual plot suggests that the linear model is satisfactory. The R squared value seems quite low though, so from physical arguments we force the line to pass through the origin.

The R squared value is higher now, but the residual plot is not so random.

We might now ask if we can find a model with both explanatory variables height and girth. Physical considerations suggest that we should explore the very simple model

Volume = b1 × height × (girth)² + ε

This is basically the formula for the volume of a cylinder.

So the equation is:

Volume = 0.002108 × height × (girth)² + e

The residuals are considerably smaller than those from any of the previous models considered. Further graphical analysis fails to reveal any further obvious dependence on either of the explanatory variables girth or height.

Further analysis also shows that inclusion of a constant term in the model does not significantly improve the fit. Model 4 is thus the most satisfactory of those models considered for the data.

However, this is regression “through the origin”, so it may be more satisfactory to rewrite Model 4 as

volume / (height × (girth)²) = b1 + ε

so that b1 can then just be regarded as the mean of the observations of

volume / (height × (girth)²)

recalling that ε is assumed to have location measure (here mean) 0.

Compare with the value 0.002108 found earlier.
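As a sketch, Model 4 and the rewritten version can be fitted in R, assuming the data are R's built-in trees data frame (variables Girth, Height, Volume); the object name model4 is illustrative:

> data(trees)
> model4 = lm(Volume ~ I(Height * Girth^2) - 1, data=trees)    # regression through the origin
> coef(model4)                                                 # should be close to 0.002108
> mean(trees$Volume / (trees$Height * trees$Girth^2))          # mean of volume/(height x girth^2)
> plot(fitted(model4), resid(model4))                          # residual check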

Practical Question 2

y    x1   x2
3.5  3.1  30
3.2  3.4  25
3.0  3.0  20
2.9  3.2  30
4.0  3.9  40
2.5  2.8  25
2.3  2.2  30

So y = -0.2138 + 0.8984x1 + 0.01745x2 + e
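This fit can be reproduced with the commands below (a minimal sketch; the data are typed in from the table above, and the object name multregress matches the plotting commands that follow):

> y = c(3.5, 3.2, 3.0, 2.9, 4.0, 2.5, 2.3)
> x1 = c(3.1, 3.4, 3.0, 3.2, 3.9, 2.8, 2.2)
> x2 = c(30, 25, 20, 30, 40, 25, 30)
> multregress = lm(y ~ x1 + x2)
> summary(multregress)   # coefficients approximately -0.2138, 0.8984, 0.01745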

Use > plot(multregress) to obtain diagnostic plots, or > plot(cooks.distance(multregress),type="h") to display the Cook's distances.

> ynew=c(y,12)
> x1new=c(x1,20)
> x2new=c(x2,100)

> multregressnew=lm(ynew~x1new+x2new)

The added point has a very large influence.

Second Example

> ynew=c(y,40)
> x1new=c(x1,10)
> x2new=c(x2,50)

> multregressnew=lm(ynew~x1new+x2new)
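As in the first example, the effect of the added point can be inspected (a sketch using the objects just created):

> summary(multregressnew)                          # compare the coefficients with the original fit
> plot(cooks.distance(multregressnew), type="h")   # Cook's distances for the new fit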

Multiple Linear Regression - Matrix Formulation

Let x = (x1, x2, … , xn)′ be an n × 1 column vector and let g(x) be a scalar function of x. Then, by definition,

∂g/∂x = (∂g/∂x1, ∂g/∂x2, … , ∂g/∂xn)′

For example, let g(x) = x′x = Σ xi² (sum over i = 1, … , n); then ∂g/∂x = 2x.

Let a = (a1, a2, … , an)′ be an n × 1 column vector of constants. It is easy to verify that

∂(a′x)/∂x = a

and that, for symmetric A (n × n),

∂(x′Ax)/∂x = 2Ax

Theory of Multiple Regression

Suppose we have response variables Yi, i = 1, 2, … , n, and k explanatory variables/predictors X1, X2, … , Xk. The model is

Yi = b0 + b1 x1i + b2 x2i + … + bk xki + εi,    i = 1, 2, … , n

There are k + 2 parameters: b0, b1, b2, … , bk and σ².

In matrix form, write

Y = (Y1, Y2, … , Yn)′

X =
1  x11  x21  …  xk1
1  x12  x22  …  xk2
⋮   ⋮    ⋮        ⋮
1  x1n  x2n  …  xkn

X is called the design matrix.

b = (b0, b1, … , bk)′,   ε = (ε1, ε2, … , εn)′

Model:  Y = Xb + ε

OLS (ordinary least-squares) estimation

S = (Y − Xb)′(Y − Xb)
  = (Y′ − b′X′)(Y − Xb)
  = Y′Y − 2b′X′Y + b′X′Xb

∂S/∂b = −2X′Y + 2X′Xb = 0

so the normal equations are X′X b̂ = X′Y, giving

b̂ = (X′X)⁻¹X′Y = AY,   where A = (X′X)⁻¹X′

E(b̂) = A E(Y) = A Xb = (X′X)⁻¹X′X b = b, so b̂ is unbiased.

Fitted values are given by

Ŷ = X b̂ = X(X′X)⁻¹X′Y = HY

H = X(X′X)⁻¹X′

H is called the “hat matrix” (… it puts the hats on the Y’s)

The error sum of squares, SSRES, is

SSRES = Smin = Y′Y − 2b̂′X′Y + b̂′X′X b̂
      = Y′Y − 2b̂′X′Y + b̂′X′X(X′X)⁻¹X′Y
      = Y′Y − b̂′X′Y

The estimate of σ² is based on this.

Example: Find a model of the form

Yi = b0 + b1 x1i + b2 x2i + … + bk xki + εi

for the data below.

y    x1   x2
3.5  3.1  30
3.2  3.4  25
3.0  3.0  20
2.9  3.2  30
4.0  3.9  40
2.5  2.8  25
2.3  2.2  30

Y =
3.5
3.2
3.0
2.9
4.0
2.5
2.3

X =
1  3.1  30
1  3.4  25
1  3.0  20
1  3.2  30
1  3.9  40
1  2.8  25
1  2.2  30

X is called the design matrix.

The model in matrix form is given by Y = Xb + ε.

We have already seen that the normal equations are

X′X b̂ = X′Y

so that

b̂ = (X′X)⁻¹X′Y

Now calculate this for our example

R can be used to calculate X′X, and the answer is:

X′X =
  7.0    21.6   200.0
 21.6    68.3   626.0
200.0   626.0  5950.0

To input the matrix in R use

> X=matrix(c(1,1,1,1,1,1,1,3.1,3.4,3.0,3.2,3.9,2.8,2.2,30,25,20,30,40,25,30),7,3)

The last two arguments give the number of rows (7) and the number of columns (3). Notice that R's command (operator) for matrix multiplication is %*%.
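For example, X′X can then be obtained with (a sketch; t() gives the transpose):

> t(X) %*% X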

The inverse of X’X can also be obtained by using R

We also need to calculate X’Y
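A minimal sketch of these two steps, assuming Y is entered as a 7 × 1 column vector (the name Y is illustrative):

> solve(t(X) %*% X)                                  # inverse of X'X
> Y = matrix(c(3.5,3.2,3.0,2.9,4.0,2.5,2.3), 7, 1)
> t(X) %*% Y                                         # X'Y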

Now calculate b̂ = (X′X)⁻¹X′Y.
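A sketch of this calculation in R, using the X and Y objects above (bhat is an illustrative name):

> bhat = solve(t(X) %*% X) %*% t(X) %*% Y
> bhat   # should reproduce -0.2138, 0.8984, 0.01745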

Notice that this is the same result as obtained previously using lm() in R.

So y = -0.2138 + 0.8984x1 + 0.01745x2 + e

The "hat matrix" is given by

H = X(X′X)⁻¹X′

The fitted Y values are obtained by

Ŷ = HY
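These can be computed in R from the same objects (a sketch; H and Yhat are illustrative names):

> H = X %*% solve(t(X) %*% X) %*% t(X)   # hat matrix
> Yhat = H %*% Y                         # fitted values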

Recall once more that we are looking at the model Y = Xb + ε; the fitted values Ŷ can be compared with the observed values Y.

Error Terms and Inference

A useful result is:

σ̂² = (Y′Y − b̂′X′Y) / (n − k − 1)

where n is the number of points and k is the number of explanatory variables.

In addition we can show that

(b̂i − bi) / s.e.(b̂i)

has a t distribution with n − k − 1 degrees of freedom, where s.e.(b̂i) = σ̂ √c(i+1)(i+1) and c(i+1)(i+1) is the (i+1)th diagonal element of (X′X)⁻¹.

For our example:

Y′Y = 67.44,   b̂′X′Y = 67.1031

σ̂² = (67.44 − 67.1031)/4 = 0.08422

σ̂ = 0.2902
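As a sketch, the same quantities can be obtained in R from the objects defined above (here n − k − 1 = 4):

> sigma2hat = (t(Y) %*% Y - t(bhat) %*% t(X) %*% Y) / 4   # estimate of sigma squared
> sqrt(sigma2hat)                                         # should be about 0.2902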

(X′X)⁻¹ was calculated above using R. This means that

c11 = 6.683,  c22 = 0.7600,  c33 = 0.0053

Note that c11 is associated with b0, c22 with b1 and c33 with b2.

We will calculate the standard error for b1

This is √0.7600 × 0.2902 = 0.2530.

The value of b1 is 0.8984

Now carry out a hypothesis test.

H0: b1 = 0

H1: b1 ≠ 0

The standard error of b1 is 0.2530

The test statistic is

t = (b̂1 − b1) / s.e.(b̂1)

This calculates as (0.8984 − 0)/0.2530 = 3.55.


t tables using 4 degrees of freedom give a cut-off point of 2.776 for 2.5%.

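In R the cut-off point can be checked with qt() (a sketch):

> qt(0.975, 4)   # upper 2.5% point of t with 4 d.f., about 2.776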

We therefore reject H0 in favour of H1: there is evidence at the 5% level that b1 differs from zero.

The process can be repeated for the other b values and confidence intervals calculated in the usual way.

A 95% CI for σ² is based on the χ²₄ (chi-squared, 4 d.f.) distribution of 4σ̂²/σ²:

((4 × 0.08422)/11.14 , (4 × 0.08422)/0.4844)

i.e. (0.030 , 0.695)
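The same interval can be sketched in R with qchisq():

> 4 * 0.08422 / qchisq(c(0.975, 0.025), 4)   # about (0.030, 0.695)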

The sum of squares of the residuals can also be calculated directly:

SSRES = (Y − Xb̂)′(Y − Xb̂)
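For example (a sketch using the X, Y and bhat objects above):

> SSres = t(Y - X %*% bhat) %*% (Y - X %*% bhat)
> SSres   # should equal Y'Y - bhat'X'Y, about 0.3369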