
Xuhua Xia

Polynomial Regression

• A biologist is interested in the relationship between feeding time and body weight in the males of a mammalian species. The data he recorded are shown in the table. The objectives are:
– Construct an equation relating TIME to BODYWT.
– Understand the model selection criteria.
– Estimate the mean TIME for a given BODYWT with 95% confidence limits for the mean (CLM).

Time (hr)   Wt (kg)
1.22        40.9
2.14        44.3
2.39        44.7
3.50        48.6
1.66        43.0
2.97        45.4
3.95        50.0
1.34        41.8
2.51        45.0
3.53        49.0
1.72        43.4
3.17        46.2
4.11        50.8
1.51        42.4
2.78        45.1
3.85        49.7
1.93        43.9
3.32        47.0
4.18        51.1


The Relationship Is Nonlinear

[Figure: Feeding Time (hr) plotted against Body Weight (kg).]

Which functional form?
$Y = a + bX$ ?
$Y = a e^{X}$ ?
$Y = a X^{b}$ ?


Polynomial Regression

• Polynomial regression is a special type of multiple regression whose independent variables are powers of a single variable X. It is used to approximate a curve with unknown functional form:

$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \cdots + \beta_k X_i^k + \varepsilon_i$

• Model selection is done by successively testing the highest-order terms and discarding insignificant highest-order terms. Tests should use a liberal level of significance, such as α = 0.25. The starting order should usually be k < N/10, where N is the number of observations.
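The successive fits can be sketched in plain Python (standard library only). This is an illustrative re-implementation, not the author's SAS workflow: `solve` and `poly_r2` are hypothetical helpers, and X is standardized before its powers are formed, which leaves the fitted values and R² unchanged but keeps the normal equations well conditioned.

```python
# Feeding time (hr) and body weight (kg) for the 19 males in the data table.
TIME = [1.22, 2.14, 2.39, 3.50, 1.66, 2.97, 3.95, 1.34, 2.51, 3.53,
        1.72, 3.17, 4.11, 1.51, 2.78, 3.85, 1.93, 3.32, 4.18]
WT = [40.9, 44.3, 44.7, 48.6, 43.0, 45.4, 50.0, 41.8, 45.0, 49.0,
      43.4, 46.2, 50.8, 42.4, 45.1, 49.7, 43.9, 47.0, 51.1]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def poly_r2(x, y, order):
    """R^2 of the least-squares polynomial of the given order (with intercept)."""
    n = len(x)
    mx = sum(x) / n
    sd = (sum((v - mx) ** 2 for v in x) / n) ** 0.5
    z = [(v - mx) / sd for v in x]  # standardized X: same fit, better conditioning
    cols = [[zi ** p for zi in z] for p in range(order + 1)]
    xtx = [[sum(ci[t] * cj[t] for t in range(n)) for cj in cols] for ci in cols]
    xty = [sum(ci[t] * y[t] for t in range(n)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * c[t] for b, c in zip(beta, cols)) for t in range(n)]
    my = sum(y) / n
    sse = sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))
    sst = sum((yt - my) ** 2 for yt in y)
    return 1 - sse / sst


for k in (1, 2, 3):
    print(k, round(poly_r2(WT, TIME, k), 4))
```

If the data are transcribed correctly, orders 1 to 3 should reproduce the R² values reported in the model-selection table and the SAS output (about 0.962, 0.972 and 0.975).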


Polynomial Regression

• The main reason for successively testing and discarding insignificant highest-degree terms is that higher-order terms are more prone to random error in X: the error is multiplied several times in higher-order terms.

• Suppose the true value of X is 2 but, because of measurement error, we obtain a value of 3. X² is then 9. Had we measured X accurately, X² would have been 4, so the observed value of 9 is 4 + 5 units of error. Likewise, X³ = 27 = 8 + 19 units of error.

• Thus, if an order-4 regression is not significantly better than an order-3 regression, the X⁴ term is dropped.

• Contrast this with model selection in ordinary multiple regression, where X1, X2, etc. are distinct variables rather than powers of a single variable.
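The same arithmetic extends to any power k. A minimal Python sketch; `power_error` is a hypothetical helper, and the true value 2 and measured value 3 come from the example above.

```python
def power_error(true_x, measured_x, k):
    """Error in X**k when X is measured as measured_x instead of true_x."""
    return measured_x ** k - true_x ** k


# A one-unit error in X grows quickly with the power.
for k in (1, 2, 3, 4):
    print(f"X^{k}: true {2 ** k}, observed {3 ** k}, error {power_error(2, 3, k)}")
```

At X⁴ the one-unit error in X has already become 65 units (81 - 16).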


Try Linear Regression First

[Figure: Feeding Time (hr) vs. Body Weight (kg) with the fitted line y = 0.31x - 11.40, R² = 0.962.]

[Figure: Residuals vs. Body Weight (kg) for the linear fit.]


Polynomial Regression (order 3)

Time (hr)   Wt (kg)   Wt²      Wt³
2.14        44.3      1962.5    86938.3
2.39        44.7      1998.1    89314.6
3.50        48.6      2362.0   114791.3
1.66        43.0      1849.0    79507.0
2.97        45.4      2061.2    93576.7
3.95        50.0      2500.0   125000.0
1.34        41.8      1747.2    73034.6
2.51        45.0      2025.0    91125.0
3.53        49.0      2401.0   117649.0
1.72        43.4      1883.6    81746.5
3.17        46.2      2134.4    98611.1
4.11        50.8      2580.6   131096.5
1.51        42.4      1797.8    76225.0
2.78        45.1      2034.0    91733.9
3.85        49.7      2470.1   122763.5
1.93        43.9      1927.2    84604.5
3.32        47.0      2209.0   103823.0
4.18        51.1      2611.2   133432.8

$y = -0.0024x^3 + 0.3234x^2 - 13.964x + 197.54$, R² = 0.9753

[Figure: Feeding Time (hr) vs. Body Weight (kg) with the fitted order-3 curve.]


Polynomial Regression (order 4)

[Figure: Feeding Time (hr) vs. Body Weight (kg) with the fitted order-4 curve.]


Polynomial Regression (order 6)

[Figure: Feeding Time (hr) vs. Body Weight (kg) with the fitted order-6 curve.]

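If you keep increasing the number of polynomial terms in the equation, you will eventually obtain a perfect fit: with 19 observations at distinct body weights, a polynomial of order 18 passes through every point. Is that what you want?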
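The perfect-fit limit can be demonstrated without a least-squares solver: with 19 distinct body weights, the order-18 polynomial through all points is the Lagrange interpolating polynomial. A Python sketch; the helper `lagrange` and the evaluation points between observations are hypothetical choices for illustration.

```python
TIME = [1.22, 2.14, 2.39, 3.50, 1.66, 2.97, 3.95, 1.34, 2.51, 3.53,
        1.72, 3.17, 4.11, 1.51, 2.78, 3.85, 1.93, 3.32, 4.18]
WT = [40.9, 44.3, 44.7, 48.6, 43.0, 45.4, 50.0, 41.8, 45.0, 49.0,
      43.4, 46.2, 50.8, 42.4, 45.1, 49.7, 43.9, 47.0, 51.1]


def lagrange(xs, ys, x):
    """Evaluate the degree-(n-1) polynomial through all points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total


# Residuals at the observed body weights vanish: a "perfect fit".
residuals = [t - lagrange(WT, TIME, w) for w, t in zip(WT, TIME)]
print(max(abs(r) for r in residuals))

# Between observed weights, nothing constrains the curve to stay near the data.
for w in (41.0, 41.35, 50.9, 51.0):
    print(w, round(lagrange(WT, TIME, w), 2))
```

The residuals are zero by construction, so R² = 1, yet the curve carries no more information than the raw table.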


Criteria of Model Selection

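n        19       19       19       19
m        1        2        3        4
R²       0.9619   0.972    0.9753   0.9755
R²adj    0.9597   0.9685   0.9704   0.9685

$R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - m - 1}$

where n is the number of observations and m is the order of the polynomial (the number of X terms).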
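The adjusted R² row can be checked against the formula. A minimal Python sketch; `adjusted_r2` is a hypothetical helper and the R² values are copied from the table.

```python
def adjusted_r2(r2, n, m):
    """Adjusted R^2 for n observations and m polynomial terms."""
    return 1 - (1 - r2) * (n - 1) / (n - m - 1)


R2 = {1: 0.9619, 2: 0.972, 3: 0.9753, 4: 0.9755}  # from the table, n = 19
for m, r2 in R2.items():
    print(m, round(adjusted_r2(r2, 19, m), 4))

best = max(R2, key=lambda m: adjusted_r2(R2[m], 19, m))
print("order with largest adjusted R^2:", best)
```

Plain R² keeps rising with m, but adjusted R² peaks at m = 3 and falls at m = 4, because the order-4 term adds almost nothing to R² while costing a degree of freedom.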


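Do the Test in SAS

/* Read the 19 (FeedTime, BodyWt) pairs and create powers of BodyWt */
data polydat;
  input FeedTime BodyWt @@;   /* @@ reads several pairs per data line */
  BodyWt2 = BodyWt*BodyWt;
  BodyWt3 = BodyWt2*BodyWt;
  BodyWt4 = BodyWt3*BodyWt;
  cards;
1.22 40.9 2.14 44.3 2.39 44.7 3.50 48.6
1.66 43.0 2.97 45.4 3.95 50.0 1.34 41.8
2.51 45.0 3.53 49.0 1.72 43.4 3.17 46.2
4.11 50.8 1.51 42.4 2.78 45.1 3.85 49.7
1.93 43.9 3.32 47.0 4.18 51.1
;
/* Order-3 model; SS1 requests sequential (Type I) sums of squares */
proc glm;
  model FeedTime = BodyWt BodyWt2 BodyWt3 / ss1;
run;
/* Order-2 model; P prints predicted values, CLM the 95% limits for the mean */
proc glm;
  model FeedTime = BodyWt BodyWt2 / ss1 p clm;
run;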


SAS Output

Dependent Variable: FEEDTIME

Source            DF   Sum of Squares   F Value   Pr > F
Model              3      17.16627141    197.13   0.0001
Error             15       0.43540228
Corrected Total   18      17.60167368

R-Square   C.V.       FEEDTIME Mean
0.975264   6.251601   2.72526316

Source    DF   Type I SS     F Value   Pr > F
BODYWT     1   16.93053484    583.27   0.0001
BODYWT2    1    0.17828754      6.14   0.0256
BODYWT3    1    0.05744902      1.98   0.1799
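Each F value in the Type I table is the term's sequential sum of squares divided by the error mean square of the full order-3 model. A quick Python check using only numbers printed in the output above:

```python
# Numbers copied from the order-3 SAS output.
SSE, DFE = 0.43540228, 15
MSE = SSE / DFE  # error mean square of the full order-3 model
TYPE1_SS = {"BODYWT": 16.93053484, "BODYWT2": 0.17828754, "BODYWT3": 0.05744902}

# Each term has 1 DF, so its mean square equals its Type I SS.
F_VALUES = {term: ss / MSE for term, ss in TYPE1_SS.items()}
for term, f in F_VALUES.items():
    print(term, round(f, 2))

# Sequential SS add up to the model SS (17.16627141, up to rounding).
print(round(sum(TYPE1_SS.values()), 8))
```

Because Type I SS are sequential, the BODYWT3 line is exactly the test of order 3 against order 2.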


SAS Output: order of 3

Parameter   Estimate       T for H0: Parameter=0   Pr > |T|   Std Error of Estimate
INTERCEPT   197.5414064     1.19                   0.2533     166.2638449
BODYWT      -13.9642883    -1.28                   0.2200      10.9105501
BODYWT2       0.3234063     1.36                   0.1945       0.2381090
BODYWT3      -0.0024311    -1.41                   0.1799       0.0017280

The t-test here is equivalent to the F-test based on Type II SS (t² = F); in regression, Type II, Type III and Type IV SS are all the same.
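The t column is simply Estimate / Std Error, and for the last term entered (BODYWT3), whose Type I and Type II SS coincide, t² reproduces the Type I F value. A short Python check with numbers copied from the tables above:

```python
# (estimate, standard error) pairs from the order-3 parameter table.
PARAMS = {
    "INTERCEPT": (197.5414064, 166.2638449),
    "BODYWT": (-13.9642883, 10.9105501),
    "BODYWT2": (0.3234063, 0.2381090),
    "BODYWT3": (-0.0024311, 0.0017280),
}
T_VALUES = {name: est / se for name, (est, se) in PARAMS.items()}
for name, t in T_VALUES.items():
    print(name, round(t, 2))

# F for BODYWT3 from its Type I SS and the error mean square (SSE / DFE).
F_BODYWT3 = 0.05744902 / (0.43540228 / 15)
print(round(T_VALUES["BODYWT3"] ** 2, 2), round(F_BODYWT3, 2))
```

Each t-test compares the full model with the model lacking only that one term, which is why the lower-order tests look weak when X, X² and X³ are highly correlated.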

Note: t-tests give misleading results for polynomial models. For our data, none of the t-tests is significant at the 0.05 level, which is clearly misleading. Why? (Hint: which models are the t-tests comparing?)


SAS output: Order of 2

Dependent Variable: FEEDTIME

Source            DF   Sum of Squares   F Value   Pr > F
Model              2      17.10882239    277.71   0.0001
Error             16       0.49285130
Corrected Total   18      17.60167368

Source    DF   Type I SS     F Value   Pr > F
BODYWT     1   16.93053484    549.64   0.0001
BODYWT2    1    0.17828754      5.79   0.0286

Parameter   Estimate       T for H0: Parameter=0   Pr > |T|   Std Error of Estimate
INTERCEPT   -35.94660928   -3.52                   0.0029     10.22189563
BODYWT        1.37306931    3.10                   0.0069      0.44306000
BODYWT2      -0.01150885   -2.41                   0.0286      0.00478376

Feeding Time = -35.947 + 1.3731 BodyWt - 0.011509 BodyWt²
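The fitted equation can be evaluated directly. This sketch uses the unrounded SAS estimates; `predict` is a hypothetical helper, and the three observations are the first rows of the data table.

```python
# Order-2 estimates from the SAS output (unrounded).
B0, B1, B2 = -35.94660928, 1.37306931, -0.01150885


def predict(wt):
    """Predicted mean feeding time (hr) at body weight wt (kg)."""
    return B0 + B1 * wt + B2 * wt ** 2


# First three observations: (body weight, observed feeding time).
for wt, time in [(40.9, 1.22), (44.3, 2.14), (44.7, 2.39)]:
    fit = predict(wt)
    print(wt, round(fit, 4), round(time - fit, 4))
```

The predictions closely match the Predicted column of the SAS listing; the tiny remainder comes from the rounding of the printed estimates. Because BodyWt² is around 2000, the BodyWt² coefficient needs several significant digits.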

Hand-compute the adjusted R² for the two polynomial regressions (i.e., order 3 and order 2) and decide whether X³ should be kept or discarded.


Prediction

Observation   Observed     Predicted    Residual
1             1.22000000   0.95980313    0.26019687
2             2.14000000   2.29435461   -0.15435461
3             2.39000000   2.43386721   -0.04386721
4             3.50000000   3.60111164   -0.10111164
5             1.66000000   1.81550409   -0.15550409
6             2.97000000   2.66915245    0.30084755
7             3.95000000   3.93472678    0.01527322
......

Observation   95% Confidence Limits for Mean Predicted Value
1             0.70344686   1.21615939
2             2.18244285   2.40626636
3             2.31762886   2.55010556
4             3.47982526   3.72239801
......
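The 95% CLM printed by SAS is ŷ ± t(0.975, n - 3) · sqrt(MSE · x0ᵀ(XᵀX)⁻¹x0). A standard-library Python sketch of that formula for the order-2 model; the helpers `inverse` and `clm` are hypothetical, the t quantile 2.119905 (16 df) is hard-coded from a t table, and body weight is centered before squaring, which does not change the fitted values or the limits.

```python
TIME = [1.22, 2.14, 2.39, 3.50, 1.66, 2.97, 3.95, 1.34, 2.51, 3.53,
        1.72, 3.17, 4.11, 1.51, 2.78, 3.85, 1.93, 3.32, 4.18]
WT = [40.9, 44.3, 44.7, 48.6, 43.0, 45.4, 50.0, 41.8, 45.0, 49.0,
      43.4, 46.2, 50.8, 42.4, 45.1, 49.7, 43.9, 47.0, 51.1]
T_975_16 = 2.119905  # 0.975 quantile of Student's t with 16 df (n - 3 for n = 19)


def inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def clm(x, y, x0):
    """Predicted mean at x0 and its 95% limits for the order-2 model (assumes n = 19)."""
    n = len(x)
    mx = sum(x) / n
    rows = [[1.0, v - mx, (v - mx) ** 2] for v in x]  # centered quadratic design
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(3)]
    inv = inverse(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(3)) for i in range(3)]
    sse = sum((yt - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yt in zip(rows, y))
    mse = sse / (n - 3)  # error mean square
    r0 = [1.0, x0 - mx, (x0 - mx) ** 2]
    fit = sum(b * v for b, v in zip(beta, r0))
    h = sum(r0[i] * inv[i][j] * r0[j] for i in range(3) for j in range(3))
    half = T_975_16 * (mse * h) ** 0.5
    return fit, fit - half, fit + half


for wt in (40.9, 44.3, 44.7, 48.6):
    fit, lo, hi = clm(WT, TIME, wt)
    print(wt, round(fit, 4), round(lo, 4), round(hi, 4))
```

The interval is widest at the extreme body weights, where x0ᵀ(XᵀX)⁻¹x0 is largest.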


The Danger of Polynomial Regression

[Figure: scatter plot of Random Y vs. Random X.]

RandX RandY

0.65232 0.95616 0.10743 0.70663 0.29166 0.01942 0.64533 0.90362 0.95148 0.67739 0.71822 0.90728 0.88513 0.64330 0.02542 0.07266 0.85852 0.85366 0.73669 0.96528 0.22272 0.18555 0.54621 0.52321 0.57460 0.65462 0.33640 0.21208 0.95080 0.04560 0.05365 0.09695 0.06928 0.35087


Polynomial Regression (order 6)

$y = -110.21x^6 + 426.11x^5 - 645.52x^4 + 465.7x^3 - 156.89x^2 + 21.296x - 0.4697$, R² = 0.8263

[Figure: Random Y vs. Random X with the fitted order-6 curve.]
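The same effect appears with any random data. This sketch draws its own pseudo-random numbers with Python's `random` module (seed and sample size of 20 chosen arbitrarily; these are not the values listed above) and fits orders 1 to 6 with hypothetical helpers `solve` and `poly_r2`. Because each model contains the previous one, R² can never decrease as the order rises, even though Y is unrelated to X.

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def poly_r2(x, y, order):
    """R^2 of the least-squares polynomial of the given order (standardized X)."""
    n = len(x)
    mx = sum(x) / n
    sd = (sum((v - mx) ** 2 for v in x) / n) ** 0.5
    z = [(v - mx) / sd for v in x]
    cols = [[zi ** p for zi in z] for p in range(order + 1)]
    xtx = [[sum(ci[t] * cj[t] for t in range(n)) for cj in cols] for ci in cols]
    xty = [sum(ci[t] * y[t] for t in range(n)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * c[t] for b, c in zip(beta, cols)) for t in range(n)]
    my = sum(y) / n
    sse = sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))
    sst = sum((yt - my) ** 2 for yt in y)
    return 1 - sse / sst


rng = random.Random(7)  # arbitrary seed
x = [rng.random() for _ in range(20)]
y = [rng.random() for _ in range(20)]  # generated independently of x
r2 = [poly_r2(x, y, k) for k in range(1, 7)]
for k, v in enumerate(r2, start=1):
    print(k, round(v, 4))
```

How high R² climbs depends on the particular draw, but it only ever goes up, which is exactly why a high R² from a high-order polynomial is not evidence of a real relationship.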