Non-Linear Regression. The data frame trees is made available in R with >data(trees) These record...

The data frame trees is made available in R with

> data(trees)

These record the girth in inches, height in feet and volume of timber in cubic feet of each of a sample of 31 felled black cherry trees in Allegheny National Forest, Pennsylvania.

Note that girth is the diameter of the tree (in inches) measured at 4 ft 6 in above the ground.
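
As a quick check before any modelling, the data can be loaded and inspected. This is a minimal sketch in base R; the variable name n_trees is introduced here purely for illustration.

```r
# Load the built-in black cherry data set and inspect its structure.
data(trees)

str(trees)       # columns Girth (inches), Height (feet), Volume (cubic feet)
summary(trees)   # range and quartiles of each measurement

# Number of felled trees in the sample (n_trees is an illustrative name).
n_trees <- nrow(trees)
n_trees
```

The column names are capitalised in R (Girth, Height, Volume), which is why the commands later in these notes use that spelling.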

We treat volume as the (continuous) response variable y and seek a reasonable model describing its distribution conditional first on the explanatory variable girth (we will call this x).

This might be a first step to prediction of volume based on further observations of the explanatory variables.

Inspection of the plot of volume against girth suggests first trying a linear dependence.

Thus we suppose the relationship is approximately y = a + bx + ε, for some constants a and b, where ε is a random error term.

We will use R to find a and b, their standard errors and the residuals.
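
A minimal sketch of how this first fit might be obtained is given here. The object names model1, coef_table and res1 are assumptions made for illustration (the notes only name the later models).

```r
# Model 1: simple linear regression of volume on girth.
data(trees)
model1 <- lm(Volume ~ Girth, data = trees)

# Estimates of a and b with their standard errors, t-values and p-values.
coef_table <- summary(model1)$coefficients
print(coef_table)

# One residual per tree: observed volume minus fitted volume.
res1 <- residuals(model1)
head(res1)
```

Passing data = trees avoids having to attach the data frame, so the fit does not depend on what else is in the workspace.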

The fitted model is

volume = −36.9 + 5.07 × girth + residual

i.e.

y = −36.9 + 5.07x (+ residual)

To check its validity, first look at the standard errors.

The standard errors of both a and b are low in comparison with the estimates themselves, and the p-values associated with the coefficients show that neither of these may reasonably be taken as zero. Thus there is evidence that the model is appropriate.

Some measure of the success of the fitted model is also given by the residual standard error. For a good fit this should be small in relation to the variation in the response variable itself.

Note: 18.1 ≈ 4.252² (the residual mean square is the square of the residual standard error).
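
The comparison between the residual standard error and the variation in the response can be made directly. The following is a sketch only; model1, rse1 and sd_volume are illustrative names, not taken from the notes.

```r
# Refit Model 1 so that this example is self-contained.
data(trees)
model1 <- lm(Volume ~ Girth, data = trees)

# Residual standard error of the fit.
rse1 <- sigma(model1)

# Standard deviation of the response itself, for comparison.
sd_volume <- sd(trees$Volume)

c(residual_se = rse1, sd_response = sd_volume)

# Squaring the residual standard error gives the residual mean square.
rse1^2
```

A residual standard error well below the standard deviation of the response indicates that girth accounts for much of the variation in volume.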

However, a full examination of the residuals, and of the nature of any further dependence they may have on the explanatory variables, is to be preferred to reliance on any single number. All this will require graphical analysis, the results of which follow.
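
One possible way to produce such a graphical analysis is sketched here. The choice of plots and the names model1 and res1 are assumptions; the plots on the original slides may have differed.

```r
# Refit Model 1 and extract its residuals.
data(trees)
model1 <- lm(Volume ~ Girth, data = trees)
res1 <- residuals(model1)

# Residuals against the explanatory variable: a curved pattern would
# suggest that a term is missing from the model.
plot(trees$Girth, res1, xlab = "Girth", ylab = "residuals from Model 1")
abline(h = 0, lty = 2)

# Normal quantile plot of the residuals.
qqnorm(res1)
qqline(res1)
```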

There is slight evidence of non-random behaviour in the residuals, with perhaps the hint of a quadratic curve. We now adapt the model.

The residuals from Model 1 show some further, perhaps quadratic, dependence on the explanatory variable girth, so we try introducing a nonlinear term.

We consider the model

volume = a + b1 × girth + b2 × (girth)² + residual

The relevant R commands, and associated output, are now

> model2 = lm(Volume ~ Girth + I(Girth^2), data = trees)
> summary(model2)

The fitted model is therefore

volume = 10.8 − 2.09 × girth + 0.255 × (girth)² + residual.

Consider now the graphs produced by the following commands.

> plot(Volume ~ Girth, data = trees)
> lines(fitted(model2) ~ Girth, data = trees)

> plot(residuals(model2) ~ Girth, data = trees, ylab = "residuals from Model 2")

It is clear that these residuals are both smaller than those from Model 1 and show no further obvious dependence on the explanatory variable girth.

Further, the very small p-value (0.00015) associated with the coefficient b2 shows that this cannot reasonably be set equal to zero, so that Model 2 is considerably more successful than Model 1.

Note also that the residual standard error in Model 2 is 3.335 whilst in Model 1 it is 4.252.
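
The comparison of Model 1 with Model 2 can be reproduced along the following lines. This is a sketch: model1, p_b2, rse and comparison are illustrative names, and the F-test via anova() is an addition that is equivalent to the t-test on b2 when a single term is added.

```r
# Fit both models on the same data.
data(trees)
model1 <- lm(Volume ~ Girth, data = trees)
model2 <- lm(Volume ~ Girth + I(Girth^2), data = trees)

# p-value attached to the quadratic coefficient b2.
p_b2 <- summary(model2)$coefficients["I(Girth^2)", "Pr(>|t|)"]
p_b2

# Residual standard errors of the two models, side by side.
rse <- c(model1 = sigma(model1), model2 = sigma(model2))
rse

# F-test of Model 1 against Model 2 (nested models).
comparison <- anova(model1, model2)
print(comparison)
```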

Further Analysis: On physical grounds, we might also consider the simpler model

Volume = b2 × (Girth)² + Residual

For extra justification look at this R output

The R code to fit this model, and brief summary output, are:

> model3 = lm(Volume ~ I(Girth^2) - 1, data = trees)

> summary(model3)

We might now ask if we can find a model with both explanatory variables height and girth. Physical considerations suggest that we should explore the very simple model

Volume = b1 × height × (girth)² + ε

This is basically the formula for the volume of a cylinder.
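
A sketch of how this model might be fitted is given here. The name model4 follows the notes' later reference to "Model 4"; b1 and cylinder_const are illustrative names. The cylinder constant is an added point of comparison, assuming girth is a diameter in inches and height is in feet.

```r
# Model 4: volume proportional to height x girth^2, with no constant term.
data(trees)
model4 <- lm(Volume ~ I(Height * Girth^2) - 1, data = trees)
summary(model4)

# Estimated coefficient b1.
b1 <- coef(model4)[[1]]
b1

# For a perfect cylinder, volume in cubic feet = (pi / 4) * (girth / 12)^2 * height,
# so the multiplier of height * girth^2 would be pi / (4 * 144).
cylinder_const <- pi / (4 * 144)
b1 / cylinder_const   # fraction of the enclosing cylinder that is timber
```

The "- 1" in the formula removes the intercept, and I() is needed so that * and ^ are read as arithmetic rather than as formula operators.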

So the equation is:

Volume = 0.002108 × height × (girth)² + ε

The residuals are considerably smaller than those from any of the previous models considered. Further graphical analysis fails to reveal any further obvious dependence on either of the explanatory variables, girth or height.

Further analysis also shows that inclusion of a constant term in the model does not significantly improve the fit. Model 4 is thus the most satisfactory of those models considered for the data.
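
The claim about the constant term can be checked as follows. This is a sketch under the assumption that "inclusion of a constant term" means refitting the same model with an intercept; model4c and p_intercept are hypothetical names.

```r
# Model 4 without a constant term, and the same model with one.
data(trees)
model4  <- lm(Volume ~ I(Height * Girth^2) - 1, data = trees)
model4c <- lm(Volume ~ I(Height * Girth^2), data = trees)  # hypothetical name

# p-value for the intercept in the model that includes it.
p_intercept <- summary(model4c)$coefficients["(Intercept)", "Pr(>|t|)"]
p_intercept

# F-test of the model without a constant against the model with one.
anova(model4, model4c)

# Residual standard errors of the two fits.
c(no_constant = sigma(model4), with_constant = sigma(model4c))
```

A large p-value for the intercept means the data give no reason to keep the constant term.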

However, this is regression “through the origin”, so it may be more satisfactory to rewrite Model 4 as

volume / (height × (girth)²) = b1 + ε

so that b1 can then just be regarded as the mean of the observations of

volume / (height × (girth)²)

(recall that ε is assumed to have location measure, here mean, 0).

Compare with 0.002108 found earlier
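
The mean of the ratio can be computed directly and set beside the regression coefficient. This is a minimal sketch; ratio, mean_ratio and b1 are illustrative names.

```r
# Ratio volume / (height * girth^2) for each tree.
data(trees)
ratio <- with(trees, Volume / (Height * Girth^2))

# Its sample mean: the estimate of b1 in the rewritten model.
mean_ratio <- mean(ratio)

# Coefficient from the regression through the origin, for comparison.
b1 <- coef(lm(Volume ~ I(Height * Girth^2) - 1, data = trees))[[1]]

c(mean_ratio = mean_ratio, regression_b1 = b1)
```

The two estimates need not agree exactly, because the least squares fit through the origin weights the trees differently from a simple average of ratios.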

Multiple Regression Example

y     x1    x2
3.5   3.1   30
3.2   3.4   25
3.0   3.0   20
2.9   3.2   30
4.0   3.9   40
2.5   2.8   25
2.3   2.2   30

So the fitted model is y = −0.2138 + 0.8984 x1 + 0.01745 x2 + e
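
The coefficients quoted above can be reproduced from the seven observations in the table. The object name multregress is an assumption, chosen by analogy with multregressnew used later in the notes.

```r
# The seven observations from the table.
y  <- c(3.5, 3.2, 3.0, 2.9, 4.0, 2.5, 2.3)
x1 <- c(3.1, 3.4, 3.0, 3.2, 3.9, 2.8, 2.2)
x2 <- c(30, 25, 20, 30, 40, 25, 30)

# Multiple regression of y on both explanatory variables.
multregress <- lm(y ~ x1 + x2)   # assumed name

coef(multregress)      # intercept and the two slopes
summary(multregress)   # standard errors and p-values
```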

Adding an extra point:

> ynew = c(y, 12)
> x1new = c(x1, 20)
> x2new = c(x2, 100)

> multregressnew = lm(ynew ~ x1new + x2new)

The extra point has a very large influence on the fitted model.
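
One way to see the influence of the extra point is to compare the coefficients before and after adding it and to look at the leverage of each observation. This is a sketch; fit_old and lev are illustrative names, and using hatvalues() as the influence measure is an assumption about what the original slides showed.

```r
# Original seven observations.
y  <- c(3.5, 3.2, 3.0, 2.9, 4.0, 2.5, 2.3)
x1 <- c(3.1, 3.4, 3.0, 3.2, 3.9, 2.8, 2.2)
x2 <- c(30, 25, 20, 30, 40, 25, 30)
fit_old <- lm(y ~ x1 + x2)

# Add the extra point (y = 12, x1 = 20, x2 = 100) and refit.
ynew  <- c(y, 12)
x1new <- c(x1, 20)
x2new <- c(x2, 100)
multregressnew <- lm(ynew ~ x1new + x2new)

# Coefficients before and after adding the point.
rbind(original = coef(fit_old), with_extra_point = coef(multregressnew))

# Leverage of each observation in the new fit; values close to 1 mean
# the fitted surface is forced to pass almost through that point.
lev <- hatvalues(multregressnew)
round(lev, 3)
```

With three coefficients and eight observations the average leverage is 3/8, so an observation with leverage near 1 dominates the fit.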