Trees Example
More than one variable
The residual plot suggests that the linear model is satisfactory. The R squared value seems quite low though, so from physical arguments we force the line to pass through the origin.
The R squared value is higher now, but the residual plot is not so random.
We might now ask if we can find a model with both explanatory variables height and girth. Physical considerations suggest that we should explore the very simple model
Volume = b1 × height × (girth)² + ε
This is basically the formula for the volume of a cylinder.
So the equation is:
Volume = 0.002108 × height × (girth)² + ε
The residuals are considerably smaller than those from any of the previous models considered. Further graphical analysis fails to reveal any further obvious dependence on either of the explanatory variables girth and height.
Further analysis also shows that inclusion of a constant term in the model does not significantly improve the fit. Model 4 is thus the most satisfactory of those models considered for the data.
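Model 4 can be fitted directly in R. The sketch below assumes the data are R's built-in trees dataset (variables Girth, Height, Volume), which this example appears to use:

```r
# Fit Model 4, Volume = b1 * height * (girth)^2 + eps, through the origin.
# Assumes the classic built-in `trees` data frame (Girth, Height, Volume).
data(trees)
model4 <- lm(Volume ~ I(Height * Girth^2) - 1, data = trees)  # "- 1" drops the intercept
coef(model4)  # b1, about 0.002108
```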
However, this is regression "through the origin", so it may be more satisfactory to rewrite Model 4 as

volume / (height × (girth)²) = b1 + ε

so that b1 can then just be regarded as the mean of the observations of volume / (height × (girth)²); recall that ε is assumed to have location measure (here mean) 0. Compare with the value 0.002108 found earlier.
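Under this rewritten form, b1 is just the sample mean of the ratios volume / (height × (girth)²); a quick check, again assuming the built-in trees data:

```r
# b1 as the mean of the observations of Volume / (Height * Girth^2)
# (assumes R's built-in `trees` dataset)
data(trees)
ratio <- trees$Volume / (trees$Height * trees$Girth^2)
mean(ratio)  # compare with 0.002108 found earlier
```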
Practical Question 2
y x1 x2
3.5 3.1 30
3.2 3.4 25
3.0 3.0 20
2.9 3.2 30
4.0 3.9 40
2.5 2.8 25
2.3 2.2 30
So y = -0.2138 + 0.8984x1 + 0.01745x2 + e
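The quoted equation can be reproduced with lm; a minimal sketch entering the data above:

```r
# Fit y on x1 and x2 by ordinary least squares
y  <- c(3.5, 3.2, 3.0, 2.9, 4.0, 2.5, 2.3)
x1 <- c(3.1, 3.4, 3.0, 3.2, 3.9, 2.8, 2.2)
x2 <- c(30, 25, 20, 30, 40, 25, 30)
multregress <- lm(y ~ x1 + x2)
coef(multregress)  # approx -0.2138, 0.8984, 0.01745
```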
Use

> plot(multregress)

or

> plot(cooks.distance(multregress), type="h")

where multregress = lm(y ~ x1 + x2) is the fitted model object.
> ynew=c(y,12)
> x1new=c(x1,20)
> x2new=c(x2,100)
> multregressnew=lm(ynew~x1new+x2new)
Very large influence
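Putting the steps for this first example together; the Cook's distance of the appended eighth point shows its influence:

```r
# Append an extreme point and inspect its influence via Cook's distance
y  <- c(3.5, 3.2, 3.0, 2.9, 4.0, 2.5, 2.3)
x1 <- c(3.1, 3.4, 3.0, 3.2, 3.9, 2.8, 2.2)
x2 <- c(30, 25, 20, 30, 40, 25, 30)
ynew  <- c(y, 12)
x1new <- c(x1, 20)
x2new <- c(x2, 100)
multregressnew <- lm(ynew ~ x1new + x2new)
cooks.distance(multregressnew)  # the 8th value dwarfs the rest
```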
Second Example
> ynew=c(y,40)
> x1new=c(x1,10)
> x2new=c(x2,50)
> multregressnew=lm(ynew~x1new+x2new)
Multiple Linear Regression - Matrix Formulation
Let x = (x1, x2, …, xn)′ be an n × 1 column vector and let g(x) be a scalar function of x. Then, by definition,

∂g/∂x = (∂g/∂x1, ∂g/∂x2, …, ∂g/∂xn)′

For example, let g(x) = x′x = Σi xi²; then ∂g/∂x = 2x.

Let a = (a1, a2, …, an)′ be an n × 1 column vector of constants. It is easy to verify that

∂(a′x)/∂x = a

and that, for symmetric A (n × n),

∂(x′Ax)/∂x = 2Ax
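These derivative results can be checked numerically; a small sketch using central differences (the symmetric matrix A and the point x are arbitrary choices for illustration):

```r
# Numerical check of d(x'Ax)/dx = 2Ax for symmetric A
A <- matrix(c(2, 1, 1, 3), 2, 2)          # a symmetric 2x2 matrix
x <- c(0.5, -1.2)
g <- function(v) as.numeric(t(v) %*% A %*% v)
eps <- 1e-6
num_grad <- sapply(1:2, function(i) {
  e <- numeric(2); e[i] <- eps
  (g(x + e) - g(x - e)) / (2 * eps)       # central-difference gradient
})
analytic <- as.numeric(2 * A %*% x)       # the claimed derivative 2Ax
num_grad
analytic
```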
Theory of Multiple Regression
Suppose we have response variables Yi, i = 1, 2, …, n and k explanatory variables/predictors X1, X2, …, Xk. The model is

Yi = b0 + b1x1i + b2x2i + … + bkxki + εi,  i = 1, 2, …, n.

There are k + 2 parameters: b0, b1, b2, …, bk and σ².
In matrix notation, write

Y = (Y1, Y2, …, Yn)′,  b = (b0, b1, …, bk)′,  ε = (ε1, ε2, …, εn)′  (Y and ε are n × 1 column vectors)

and

X =
1  x11  x21  …  xk1
1  x12  x22  …  xk2
⋮   ⋮    ⋮       ⋮
1  x1n  x2n  …  xkn

X is called the design matrix.

Model:  Y = Xb + ε
OLS (ordinary least-squares) estimation
Minimise

S(b) = (Y − Xb)′(Y − Xb)
     = (Y′ − b′X′)(Y − Xb)
     = Y′Y − 2b′X′Y + b′X′Xb

Differentiating with respect to b, using the vector-derivative results above,

∂S/∂b = −2X′Y + 2X′Xb = 0

which gives the normal equations

X′X b̂ = X′Y,  so  b̂ = (X′X)⁻¹X′Y

Writing b̂ = AY where A = (X′X)⁻¹X′, we have

E(b̂) = A E(Y) = A Xb = (X′X)⁻¹X′Xb = b,  so b̂ is unbiased.

Fitted values are given by

Ŷ = Xb̂ = X(X′X)⁻¹X′Y = HY,  where  H = X(X′X)⁻¹X′

H is called the "hat matrix" (… it puts the hats on the Y's).

The error sum of squares, SSRES, is the minimised value of S:

SSRES = Y′Y − 2b̂′X′Y + b̂′X′Xb̂
      = Y′Y − 2b̂′X′Y + b̂′X′X(X′X)⁻¹X′Y
      = Y′Y − b̂′X′Y

The estimate of σ² is based on this.
Example: Find a model of the form

Yi = b0 + b1x1i + b2x2i + εi

for the data below.

y    x1   x2
3.5  3.1  30
3.2  3.4  25
3.0  3.0  20
2.9  3.2  30
4.0  3.9  40
2.5  2.8  25
2.3  2.2  30
Here

Y = (3.5, 3.2, 3.0, 2.9, 4.0, 2.5, 2.3)′

and

X =
1  3.1  30
1  3.4  25
1  3.0  20
1  3.2  30
1  3.9  40
1  2.8  25
1  2.2  30

X is called the design matrix. The model in matrix form is given by Y = Xb + ε.
We have already seen that

X′X b̂ = X′Y,  so  b̂ = (X′X)⁻¹X′Y.

Now calculate this for our example.
R can be used to calculate X′X and the answer is:

X′X =
  7.0    21.6    200.0
 21.6    68.3    626.0
200.0   626.0   5950.0
To input the matrix in R use

X = matrix(c(1,1,1,1,1,1,1, 3.1,3.4,3.0,3.2,3.9,2.8,2.2, 30,25,20,30,40,25,30), 7, 3)

where the final arguments 7 and 3 give the number of rows and the number of columns. Notice the command for matrix multiplication: X′X is obtained with t(X) %*% X. The inverse of X′X can also be obtained by using R, with solve(t(X) %*% X).
We also need to calculate X′Y = (21.4, 67.67, 623.5)′.
Now

b̂ = (X′X)⁻¹X′Y.

Notice that this gives the same result as obtained previously using the lm command in R:

y = -0.2138 + 0.8984x1 + 0.01745x2 + e
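The whole matrix calculation can be carried out in a few lines of R:

```r
# b-hat = (X'X)^{-1} X'Y computed directly for the example data
y <- c(3.5, 3.2, 3.0, 2.9, 4.0, 2.5, 2.3)
X <- matrix(c(rep(1, 7),
              3.1, 3.4, 3.0, 3.2, 3.9, 2.8, 2.2,
              30, 25, 20, 30, 40, 25, 30), 7, 3)
bhat <- solve(t(X) %*% X) %*% t(X) %*% y
bhat  # approx -0.2138, 0.8984, 0.01745
```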
The "hat matrix" is given by

H = X(X′X)⁻¹X′

and the fitted Y values are obtained by

Ŷ = HY.

Recall once more we are looking at the model Y = Xb + ε; the fitted values can be compared with the observed y.
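The hat matrix and fitted values for the example can likewise be computed directly, and they agree with the fitted values from lm:

```r
# Hat matrix H = X (X'X)^{-1} X' and fitted values Y-hat = H Y
y <- c(3.5, 3.2, 3.0, 2.9, 4.0, 2.5, 2.3)
X <- matrix(c(rep(1, 7),
              3.1, 3.4, 3.0, 3.2, 3.9, 2.8, 2.2,
              30, 25, 20, 30, 40, 25, 30), 7, 3)
H <- X %*% solve(t(X) %*% X) %*% t(X)
yhat <- as.numeric(H %*% y)
cbind(observed = y, fitted = yhat)
```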
Error Terms and Inference
A useful result is:

σ̂² = (Y′Y − b̂′X′Y) / (n − k − 1)

where n is the number of points and k is the number of explanatory variables.

In addition we can show that:

(b̂i − bi) / s.e.(b̂i) ~ t(n − k − 1),  where  s.e.(b̂i) = σ̂ √c(i+1)(i+1)

and c(i+1)(i+1) is the (i+1)th diagonal element of (X′X)⁻¹.
For our example:

σ̂² = (Y′Y − b̂′X′Y) / (n − k − 1) = (67.44 − 67.1031) / 4 = 0.08422

so σ̂ = 0.2902.

(X′X)⁻¹ was calculated in R; its diagonal elements are

c11 = 6.683,  c22 = 0.7600,  c33 = 0.0053.

Note that c11 is associated with b0, c22 with b1 and c33 with b2.
We will calculate the standard error of b̂1.
This is √0.7600 × 0.2902 = 0.2530.
The value of b̂1 is 0.8984.
Now carry out a hypothesis test.
H0: b1 = 0
H1: b1 ≠ 0
The standard error of b1 is 0.2530
The test statistic is

t = (b̂1 − b1) / s.e.(b̂1).

This calculates as (0.8984 − 0)/0.2530 = 3.55.
t tables using 4 degrees of freedom give a cut-off point of 2.776 for 2.5%. Since 3.55 > 2.776, we reject H0 in favour of H1: there is evidence at the 5% level that b1 is non-zero.
The process can be repeated for the other b values and confidence intervals calculated in the usual way.
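All of these quantities are available from R's summary and confint functions; a sketch using the example data:

```r
# Standard errors, t statistics and confidence intervals from R
y  <- c(3.5, 3.2, 3.0, 2.9, 4.0, 2.5, 2.3)
x1 <- c(3.1, 3.4, 3.0, 3.2, 3.9, 2.8, 2.2)
x2 <- c(30, 25, 20, 30, 40, 25, 30)
fit <- lm(y ~ x1 + x2)
summary(fit)$coefficients  # estimate, std. error, t value, p value
confint(fit)               # 95% confidence intervals for the b's
```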
CI for σ², based on the χ²(4) distribution of 4σ̂²/σ²:

((4 × 0.08422)/11.14 , (4 × 0.08422)/0.4844), i.e. (0.030 , 0.695).
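The chi-squared quantiles 11.14 and 0.4844 can be obtained with qchisq; a sketch reproducing the interval:

```r
# 95% CI for sigma^2 from the chi-squared(4) distribution of 4*sigma-hat^2/sigma^2
sse <- 67.44 - 67.1031          # Y'Y - bhat'X'Y from the worked example
df  <- 4                        # n - k - 1
ci  <- c(sse / qchisq(0.975, df), sse / qchisq(0.025, df))
round(ci, 3)  # (0.030, 0.695)
```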
The sum of squares of the residuals can also be calculated:

SSRES = (Y − Xb̂)′(Y − Xb̂).