
Chap. 10, page 1 Math 445

Chapter 10: Inferences About Regression Coefficients

Chapter 10 concerns statistical inferences about individual regression coefficients, about linear combinations of coefficients, and about sets of coefficients. All these inferences, which are based on either the t or F distribution, depend on the assumptions of normality of the residuals, constant variance, and independence. Assessment of these assumptions is covered in Chapter 11.

Example 1: Rainfall data

Consider the additive model (a model without interactions is called "additive" because the effects of the variables are additive and don't depend on the levels of the other variables):

$$\mu(\text{Precip} \mid \text{Latitude}, \text{Altitude}, \text{Rainshadow}) = \beta_0 + \beta_1\,\text{Latitude} + \beta_2\,\text{Altitude} + \beta_3\,\text{Rainshadow}$$

Assume that the linear regression model assumptions are satisfied; that is, that the model fits, that the residuals are normal with constant variance, and that the observations are independent. The formal inferences we make below are valid only if these assumptions are satisfied.

Coefficients (Model 1)

| Term | B (unstandardized) | Std. Error | t | Sig. | 95% CI for B: Lower Bound | 95% CI for B: Upper Bound |
|---|---|---|---|---|---|---|
| (Constant) | -97.557 | 24.554 | -3.973 | .0005 | -148.028 | -47.085 |
| Latitude (degrees) | 3.428 | .667 | 5.139 | .0000 | 2.057 | 4.800 |
| Altitude (ft) | .00115 | .00085 | 1.352 | .1880 | -.00060 | .00290 |
| Rainshadow | -19.688 | 3.439 | -5.725 | .0000 | -26.758 | -12.619 |

a. Dependent Variable: Precipitation (in)

• The t statistic and P-value for each coefficient are for a two-sided test of the hypothesis that the true coefficient is 0.

• Is there evidence of an effect of latitude on precipitation? This is addressed by a two-sided test of the hypothesis $H_0: \beta_1 = 0$. There is convincing evidence (P < .00005) that $\beta_1$ is greater than 0. In addition, we estimate that mean precipitation rises about 3.43 inches for every one-degree increase in latitude (95% confidence interval: 2.06 to 4.80 inches), given that altitude and rain shadow remain the same.

• The test of $H_0: \beta_1 = 0$ (and the confidence interval) is for the model that also has Altitude and Rainshadow in it. Thus, it is a test of the effect of Latitude after the linear effects of Altitude and Rainshadow have been adjusted for. This is different from a test of $H_0: \beta_1 = 0$ in a model without Altitude and Rainshadow.

• We do not have convincing evidence (P = .188) that mean precipitation changes with altitude, given that latitude and rain shadow remain fixed. We estimate that mean precipitation increases by 1.15 inches for every 1000-foot increase in altitude (95% confidence interval: 0.60-inch decrease to 2.90-inch increase).

• Do locations in the rain shadow differ from those not in the rain shadow, after adjusting for the effects of latitude and altitude? (In other words, is there evidence that $\beta_3 \neq 0$?) There is completely convincing evidence (P < .00005) that locations in the rain shadow receive less precipitation on average than locations of the same latitude and altitude not in the rain shadow. What is more interesting is that locations in the rain shadow are estimated to have mean precipitation 19.7 inches less (95% confidence interval: 12.6 to 26.8 inches less) than equivalent locations (on altitude and latitude) not in the rain shadow.
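Every column of the coefficient table follows from B and its Std. Error: t = B / SE, the two-sided P-value comes from a t distribution with the residual degrees of freedom, and the 95% interval is B ± t* × SE. The plain-Python sketch below recomputes them as a check. It assumes 26 residual degrees of freedom; the notes do not state it, but that value reproduces the printed intervals (it would correspond to 30 locations and 4 coefficients). The helpers `t_pdf`, `t_central_mass`, `two_sided_p` and `t_quantile` are hypothetical names written for this illustration, integrating the t density with Simpson's rule instead of calling a statistics library, so the results agree with SPSS only up to rounding of the printed B and SE.

```python
import math

DF_RESID = 26  # assumed residual df; it reproduces the printed 95% intervals


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_central_mass(t: float, df: int, n: int = 2000) -> float:
    """P(0 < T < |t|), integrated with composite Simpson's rule (n even)."""
    b = abs(t)
    h = b / n
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return total * h / 3


def two_sided_p(t: float, df: int) -> float:
    """Two-sided P-value for H0: coefficient = 0, given the t statistic."""
    return 1.0 - 2.0 * t_central_mass(t, df)


def t_quantile(prob: float, df: int) -> float:
    """t* with P(T < t*) = prob, for 0.5 < prob < 1, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if 0.5 + t_central_mass(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# (B, Std. Error) copied from the coefficient table above
coefficients = {
    "(Constant)": (-97.557, 24.554),
    "Latitude (degrees)": (3.428, 0.667),
    "Altitude (ft)": (0.00115, 0.00085),
    "Rainshadow": (-19.688, 3.439),
}

t_star = t_quantile(0.975, DF_RESID)  # critical value for 95% intervals
results = {}
for name, (b, se) in coefficients.items():
    t = b / se
    results[name] = (t, two_sided_p(t, DF_RESID), b - t_star * se, b + t_star * se)

for name, (t, p, lower, upper) in results.items():
    print(f"{name:20s} t = {t:7.3f}  P = {p:.4f}  95% CI = ({lower:.5g}, {upper:.5g})")
```

With SciPy installed, `scipy.stats.t.ppf(0.975, 26)` and `2 * scipy.stats.t.sf(abs(t), 26)` would replace the hand-written helpers; they are written out here only to keep the sketch dependency-free.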

Inferences and interpretation when there are interactions in the model

When interactions are present in a model, the test of significance for the coefficient on a term that is involved in a higher-order interaction is not useful, because we must include this term in the model anyway as long as the interaction is included. In addition, the coefficient on such a term has no meaningful interpretation by itself. Example: In the Chapter 9 notes, we fit the following model to the rainfall data:

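$$\mu(\text{Precip} \mid \text{Latitude}, \text{Rainshadow}) = \beta_0 + \beta_1\,\text{Latitude} + \beta_2\,\text{Rainshadow} + \beta_3\,(\text{Latitude} \times \text{Rainshadow})$$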

Coefficients (Model 1)

| Term | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | -175.457 | 26.177 | | -6.703 | .000 |
| Latitude (degrees) | 5.581 | .705 | .895 | 7.912 | .000 |
| Rainshadow | 139.839 | 39.019 | 4.240 | 3.584 | .001 |
| Latitude*Rainshadow | -4.315 | 1.051 | -4.871 | -4.105 | .000 |

a. Dependent Variable: Precipitation (in)
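With the interaction in the model, the estimated difference in mean precipitation between locations in and not in the rain shadow is not a single coefficient; it depends on latitude and equals $\hat\beta_2 + \hat\beta_3 \times \text{Latitude}$. The short Python sketch below evaluates it from the coefficient table above (the function names are illustrative only). At Latitude = 0 the difference equals the Rainshadow coefficient, but latitude 0 is the equator, far from these locations; at the latitudes of the cases listed later in these notes (32.7 to 41.7 degrees) the estimated difference is negative.

```python
# Coefficients (B) from the interaction model output above
B0 = -175.457          # (Constant)
B_LAT = 5.581          # Latitude (degrees)
B_SHADOW = 139.839     # Rainshadow (0 = not in rain shadow, 1 = in rain shadow)
B_LAT_SHADOW = -4.315  # Latitude*Rainshadow


def fitted_mean(latitude: float, rainshadow: int) -> float:
    """Estimated mean precipitation (in) from the fitted interaction model."""
    return (B0 + B_LAT * latitude + B_SHADOW * rainshadow
            + B_LAT_SHADOW * latitude * rainshadow)


def shadow_effect(latitude: float) -> float:
    """Estimated mean difference (in minus not in the rain shadow) at one latitude."""
    return fitted_mean(latitude, 1) - fitted_mean(latitude, 0)


for lat in (0.0, 32.7, 35.0, 40.0, 41.7):
    print(f"latitude {lat:5.1f}: estimated rain shadow effect = {shadow_effect(lat):8.2f} in")
```

The same coefficients show that the latitude slope also differs between the groups: 5.581 inches per degree outside the rain shadow and 5.581 - 4.315 = 1.266 inches per degree inside it.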

The coefficient on Rainshadow is large and positive, but it does not mean that locations in the rain shadow are estimated to have mean precipitation 139.8 inches greater than locations of the same latitude not in the rain shadow! Why not?

The statistical significance of the coefficients on the first-order terms (Latitude and Rainshadow) is also irrelevant, since both are involved in the second-order term. In particular, if either coefficient were not statistically significantly different from 0 (large P-value), that would not mean that we had no evidence of an effect of that variable. For example, if Latitude had had a statistically nonsignificant coefficient in the above model, that would not mean that we had no evidence of an effect of latitude, because the effect of latitude also comes through the Latitude*Rainshadow interaction, which is statistically significant.

Suppose we fit the following model with a 3-way interaction:

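$$\begin{aligned}
\mu(\text{Precip} \mid \text{Altitude}, \text{Latitude}, \text{Rainshadow}) = {} & \beta_0 + \beta_1\,\text{Altitude} + \beta_2\,\text{Latitude} + \beta_3\,\text{Rainshadow} \\
& + \beta_4\,(\text{Altitude} \times \text{Latitude}) + \beta_5\,(\text{Altitude} \times \text{Rainshadow}) + \beta_6\,(\text{Latitude} \times \text{Rainshadow}) \\
& + \beta_7\,(\text{Altitude} \times \text{Latitude} \times \text{Rainshadow})
\end{aligned}$$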

• We must include all two-way interactions which are part of the 3-way interaction.


• The coefficient on the 3-way interaction is interpreted as the change in the two-way interaction between any pair of the variables per unit change in the third variable. For example, because Rainshadow is a 0/1 indicator, $\beta_7$ represents the difference in the Altitude-by-Latitude interaction between locations in and not in the rain shadow.

• The coefficients on all the terms below the 3-way interaction have no useful interpretation as long as the 3-way interaction is in the model, and the tests of significance of these terms are not meaningful.

• The test of significance on the coefficient on the 3-way interaction is meaningful: we have no evidence (P = .711) that there is a 3-way interaction among these variables in their association with precipitation. That's good: we generally don't want to include a 3-way interaction unless we have strong evidence that it is needed.

• Interactions will be addressed further in the model-building chapter.
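One way to see the interpretation of $\beta_7$ in the 3-way model is to differentiate the mean function; this is only algebra on the model equation, treating Rainshadow as fixed at 0 or 1. The rate of change of mean precipitation with latitude is

$$\frac{\partial \mu}{\partial\,\text{Latitude}} = \beta_2 + \beta_4\,\text{Altitude} + \beta_6\,\text{Rainshadow} + \beta_7\,(\text{Altitude} \times \text{Rainshadow}),$$

and the Altitude-by-Latitude interaction, i.e. how that latitude slope changes per unit increase in Altitude, is

$$\frac{\partial^2 \mu}{\partial\,\text{Altitude}\,\partial\,\text{Latitude}} = \beta_4 + \beta_7\,\text{Rainshadow} = \begin{cases} \beta_4 & \text{not in the rain shadow (Rainshadow} = 0\text{)}, \\ \beta_4 + \beta_7 & \text{in the rain shadow (Rainshadow} = 1\text{)}. \end{cases}$$

The two cases differ by exactly $\beta_7$.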

Coefficients (Model 1)

| Term | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | -178.154 | 26.390 | | -6.751 | .000 |
| Altitude (ft) | .0248 | .0172 | 3.129 | 1.444 | .163 |
| Latitude (degrees) | 5.5929 | .7191 | .897 | 7.778 | .000 |
| Rainshadow | 72.7033 | 50.9637 | 2.205 | 1.427 | .168 |
| Altitude*Latitude | -.0006 | .0004 | -2.953 | -1.358 | .188 |
| Altitude*Rainshadow | .0067 | .0233 | .572 | .289 | .776 |
| Latitude*Rainshadow | -2.4465 | 1.3797 | -2.761 | -1.773 | .090 |
| Alt*Lat*Rainshadow | -.0002 | .0006 | -.746 | -.376 | .711 |

a. Dependent Variable: Precipitation (in)

Inferences for linear combinations of parameters

Sometimes, the effect of interest is a linear combination of parameters.

Example 2: Exercise 9.18, p. 263, Speed of Evolution. There are two binary variables: Sex and Continent. Suppose they are coded as indicator variables as follows:

Sex: 0 = Female, 1 = Male
Continent: 0 = NA, 1 = EU

Consider the model

$$\mu(\text{Wing} \mid \text{Latitude}, \text{Sex}, \text{Continent}) = \beta_0 + \beta_1\,\text{Latitude} + \beta_2\,\text{Sex} + \beta_3\,\text{Continent} + \beta_4\,(\text{Sex} \times \text{Continent})$$

This model implies the following relationships between Wing size and Latitude:


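Female, NA: $\mu(\text{Wing} \mid \text{Latitude}, \text{Sex} = 0, \text{Continent} = 0) = \beta_0 + \beta_1\,\text{Latitude}$

Female, EU: $\mu(\text{Wing} \mid \text{Latitude}, \text{Sex} = 0, \text{Continent} = 1) = \beta_0 + \beta_1\,\text{Latitude} + \beta_3$

Male, NA: $\mu(\text{Wing} \mid \text{Latitude}, \text{Sex} = 1, \text{Continent} = 0) = \beta_0 + \beta_1\,\text{Latitude} + \beta_2$

Male, EU: $\mu(\text{Wing} \mid \text{Latitude}, \text{Sex} = 1, \text{Continent} = 1) = \beta_0 + \beta_1\,\text{Latitude} + \beta_2 + \beta_3 + \beta_4$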
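Subtracting these equations in pairs, at the same latitude, shows which parameter or combination of parameters measures each EU-minus-NA difference; this is just algebra on the four equations:

$$\text{Females: } (\beta_0 + \beta_1\,\text{Latitude} + \beta_3) - (\beta_0 + \beta_1\,\text{Latitude}) = \beta_3$$

$$\text{Males: } (\beta_0 + \beta_1\,\text{Latitude} + \beta_2 + \beta_3 + \beta_4) - (\beta_0 + \beta_1\,\text{Latitude} + \beta_2) = \beta_3 + \beta_4$$

The difference between these two differences is $\beta_4$, the Sex-by-Continent interaction.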

• The slope coefficients are identical for all four groups since there are no interactions with Latitude.

• The intercepts are different, and the differences represent the vertical distances between the parallel lines relating Wing size to Latitude.

• $\beta_3$ represents the difference between mean Wing size for females in EU and in NA at the same latitude; a test of $H_0: \beta_3 = 0$ and a confidence interval for $\beta_3$ can be obtained directly from the regression output.

• The difference between mean Wing size for males in EU and in NA is $\beta_3 + \beta_4$. An estimate of this difference is $\hat\beta_3 + \hat\beta_4$; however, its SE and a confidence interval cannot be easily obtained from the regression output. $\text{SE}(\hat\beta_3 + \hat\beta_4)$ depends on the SEs of $\hat\beta_3$ and $\hat\beta_4$ individually, but also on the covariance of $\hat\beta_3$ and $\hat\beta_4$:

$$\text{SE}(\hat\beta_3 + \hat\beta_4) = \sqrt{\text{SE}(\hat\beta_3)^2 + \text{SE}(\hat\beta_4)^2 + 2\,\text{Cov}(\hat\beta_3, \hat\beta_4)}$$

Although you can obtain the needed covariance from the SPSS regression output to calculate $\text{SE}(\hat\beta_3 + \hat\beta_4)$, it is easier to simply reparameterize the model to obtain this directly from the regression output.

• Reparameterization: reverse the coding on Sex, so that 0 is male and 1 is female. The "Male" and "Female" labels are then switched in the above set of equations, and $\beta_3$ in this new model represents the difference in mean wing size for males in EU and NA; i.e., it is the same as $\beta_3 + \beta_4$ in the old model. The SE of the estimated difference can then be obtained directly from the regression output.

• Reparameterizing changes the interpretation of individual parameters, but it doesn't change the model.

Inferences about the mean response at some combination of X's

The estimated mean of Y at any combination of X's is obtained by plugging these values into the estimated regression equation. The standard error of the estimated mean response can be obtained in SPSS by including an extra case in the data file that has the desired X's but a missing value for Y. Then, as with simple linear regression, in the regression dialog box choose Save… and then: S.E. of mean predictions for the SE of the mean; Prediction Intervals: Mean for confidence intervals for the mean response; and Prediction Intervals: Individual for prediction intervals for an individual response. These are individual (one-at-a-time) confidence and prediction intervals, not simultaneous ones.

Example 1: Rainfall data. Here are some results when the additive model was fit:

$$\mu(\text{Precip} \mid \text{Latitude}, \text{Altitude}, \text{Rainshadow}) = \beta_0 + \beta_1\,\text{Latitude} + \beta_2\,\text{Altitude} + \beta_3\,\text{Rainshadow}$$


The fitted model is

$$\hat\mu(\text{Precip} \mid \text{Latitude}, \text{Altitude}, \text{Rainshadow}) = -97.557 + 3.428\,\text{Latitude} + 0.00115\,\text{Altitude} - 19.688\,\text{Rainshadow}$$

The predicted values (Pred), standard error of the mean (SEP), 95% confidence interval for the mean (LMCI, UMCI), and 95% prediction interval (LICI, UICI) are shown for cases 26-30 plus two new sets of X values. These confidence intervals are valid only if the assumptions of the regression model are satisfied; we have not checked these assumptions yet.

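| Case | Precip | Altitude | Latitude | Shadow | Pred | SEP | LMCI | UMCI | LICI | UICI |
|---|---|---|---|---|---|---|---|---|---|---|
| 26 | 9.94 | 19 | 32.7 | 0 | 14.574 | 3.846 | 6.669 | 22.479 | -6.151 | 35.299 |
| 27 | 4.25 | 2105 | 34.1 | 1 | 2.047 | 3.184 | -4.499 | 8.593 | -18.198 | 22.292 |
| 28 | 1.66 | -178 | 36.5 | 1 | 7.687 | 2.565 | 2.415 | 12.959 | -12.183 | 27.557 |
| 29 | 74.87 | 35 | 41.7 | 0 | 45.448 | 4.460 | 36.281 | 54.615 | 24.210 | 66.686 |
| 30 | 15.95 | 60 | 39.2 | 1 | 17.217 | 2.989 | 11.072 | 23.362 | -2.902 | 37.336 |
| (new) | . | 1000 | 35.0 | 0 | 23.586 | 2.892 | 17.640 | 29.531 | 3.527 | 43.645 |
| (new) | . | 3000 | 40.0 | 1 | 23.337 | 3.126 | 16.911 | 29.763 | 3.130 | 43.544 |

A "." in the Precip column marks the missing Y value for the two added cases.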
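The Pred and mean-interval columns can be reproduced approximately by hand. The plain-Python sketch below plugs the two new sets of X values into the fitted equation (with the rounded coefficients, so Pred agrees only to within a few hundredths of an inch) and forms the 95% confidence interval for the mean as Pred ± t* × SEP. It assumes t* = 2.0555, the upper 2.5% point of the t distribution with 26 residual degrees of freedom, the df that reproduces the printed intervals. A prediction interval also needs the residual standard deviation s, as Pred ± t* √(s² + SEP²); s is not shown in this output, so that helper is included but not applied to these data. The last helper computes the SE of a linear combination c'β̂ from the estimated covariance matrix of the coefficients, which is the by-hand route to SE(β̂3 + β̂4) in Example 2; the covariance matrix used to exercise it is made up. All function names are hypothetical.

```python
import math

T_STAR = 2.0555  # assumed: upper 2.5% point of t with 26 residual df (from a t table)

# Rounded coefficients from the fitted additive model
COEF = {"const": -97.557, "latitude": 3.428, "altitude": 0.00115, "rainshadow": -19.688}


def predicted_mean(latitude: float, altitude: float, rainshadow: int) -> float:
    """Plug X values into the fitted equation: estimated mean precipitation (in)."""
    return (COEF["const"] + COEF["latitude"] * latitude
            + COEF["altitude"] * altitude + COEF["rainshadow"] * rainshadow)


def mean_ci(pred: float, sep: float, t_star: float = T_STAR) -> tuple[float, float]:
    """95% confidence interval for the mean response: pred +/- t* * SEP."""
    half = t_star * sep
    return pred - half, pred + half


def prediction_interval(pred: float, sep: float, s: float,
                        t_star: float = T_STAR) -> tuple[float, float]:
    """95% prediction interval for one new response; s is the residual SD."""
    half = t_star * math.sqrt(s * s + sep * sep)
    return pred - half, pred + half


def se_linear_combination(c: list[float], cov: list[list[float]]) -> float:
    """SE of c'b given the estimated covariance matrix of b: sqrt(c' Cov c)."""
    k = len(c)
    return math.sqrt(sum(c[i] * cov[i][j] * c[j] for i in range(k) for j in range(k)))


# The two added cases: (latitude, altitude, rainshadow, printed Pred, printed SEP)
new_cases = [(35.0, 1000, 0, 23.586, 2.892), (40.0, 3000, 1, 23.337, 3.126)]
for lat, alt, shadow, pred_spss, sep in new_cases:
    lo, hi = mean_ci(pred_spss, sep)
    print(f"plug-in Pred = {predicted_mean(lat, alt, shadow):.3f} (SPSS {pred_spss}), "
          f"95% CI for mean = ({lo:.3f}, {hi:.3f})")

# Hypothetical covariance matrix for (b3, b4), only to exercise the helper
print(se_linear_combination([1.0, 1.0], [[4.0, -1.0], [-1.0, 9.0]]))
```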

According to this model, the estimated mean annual precipitation for locations at 3000 feet and 40 degrees latitude that are in the rain shadow is 23.34 inches (95% confidence interval: 16.9 to 29.8 inches). A 95% prediction interval for the annual precipitation at an individual location like this is 3.13 to 43.5 inches.

Extra-Sums-of-Squares Tests

We sometimes want to test a hypothesis about a set of parameters in a regression model. Recall that we did this in an ANOVA model, where the overall F test tested $H_0: \mu_1 = \mu_2 = \cdots = \mu_I$ and where an extra-sum-of-squares F test was used to compare two models. This test is valid only if the assumptions of the regression model (normality, constant variance, independence) are satisfied.

Example 3: Meadowfoam study, Case Study 9.1. Suppose we fit the model regressing number of Flowers on Timing (a binary variable: early or late) and Light Intensity, where Light Intensity is treated as a factor with 6 levels. Thus there is an indicator variable for Timing called early (1 for early, 0 for late) and 5 indicator variables for Intensity, called L300, L450, L600, L750, and L900, with 150 treated as the reference level. There are no interactions, so the model is:

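$$\mu(\text{Flowers} \mid \text{LIGHT}, \text{early}) = \beta_0 + \beta_1\,\text{L300} + \beta_2\,\text{L450} + \beta_3\,\text{L600} + \beta_4\,\text{L750} + \beta_5\,\text{L900} + \beta_6\,\text{early}$$

A shorthand way of describing the model (see Section 9.3.5, p. 249) is: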

$$\mu(\text{Flowers} \mid \text{LIGHT}, \text{early}) = \text{LIGHT} + \text{early}$$
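The indicator coding described above can be made concrete by building one row of the design matrix. The Python sketch below is illustrative: `design_row` is a hypothetical helper, not an SPSS feature, and its column order follows the model (intercept, L300, L450, L600, L750, L900, early). An observation at the reference intensity of 150 has all five LIGHT indicators equal to 0.

```python
LEVELS = (150, 300, 450, 600, 750, 900)  # light intensity levels; 150 = reference
INDICATOR_LEVELS = LEVELS[1:]            # one indicator each: L300, L450, L600, L750, L900


def design_row(intensity: int, early: bool) -> list[int]:
    """Return [1, L300, L450, L600, L750, L900, early] for one observation."""
    if intensity not in LEVELS:
        raise ValueError(f"unknown light intensity: {intensity}")
    indicators = [int(intensity == level) for level in INDICATOR_LEVELS]
    return [1] + indicators + [int(early)]


print(design_row(150, early=False))  # [1, 0, 0, 0, 0, 0, 0]  reference level, late
print(design_row(600, early=True))   # [1, 0, 0, 1, 0, 0, 1]
```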

Suppose we want to test the hypothesis that there is no effect of light intensity, given that the Timing variable is in the model. What hypothesis about the regression parameters do we want to test?
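In terms of the full model with indicator variables, "no effect of light intensity given Timing" says that all five LIGHT indicator coefficients are zero, while the coefficient on early is left unrestricted:

$$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0 \quad \text{versus} \quad H_A: \text{at least one of } \beta_1, \ldots, \beta_5 \neq 0$$

Under $H_0$ the model reduces to $\mu(\text{Flowers} \mid \text{LIGHT}, \text{early}) = \beta_0 + \beta_6\,\text{early}$.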


To test this hypothesis, we fit a full model with early and all the indicator variables for LIGHT in the model. Then we fit a reduced model with just early in the model and carry out an extra-sum-of-squares F-test, just as we did in Chapter 5.

Full model results:

ANOVA (Model 1)

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 3570.464 | 6 | 595.077 | 13.181 | .000 |
| Residual | 767.472 | 17 | 45.145 | | |
| Total | 4337.936 | 23 | | | |

a. Predictors: (Constant), Early, L900, L750, L600, L450, L300
b. Dependent Variable: Flowers

Coefficients (Model 1)

| Term | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 67.196 | 3.629 | | 18.518 | .000 |
| L300 | -9.125 | 4.751 | -.253 | -1.921 | .072 |
| L450 | -13.375 | 4.751 | -.371 | -2.815 | .012 |
| L600 | -23.225 | 4.751 | -.644 | -4.888 | .000 |
| L750 | -27.750 | 4.751 | -.769 | -5.841 | .000 |
| L900 | -29.350 | 4.751 | -.814 | -6.178 | .000 |
| Early | 12.158 | 2.743 | .452 | 4.432 | .000 |

a. Dependent Variable: Flowers

Reduced model results:

ANOVA (Model 1)

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 886.950 | 1 | 886.950 | 5.654 | .027 |
| Residual | 3450.986 | 22 | 156.863 | | |
| Total | 4337.936 | 23 | | | |

a. Predictors: (Constant), Early
b. Dependent Variable: Flowers

Coefficients (Model 1)

| Term | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 50.058 | 3.616 | | 13.845 | .000 |
| Early | 12.158 | 5.113 | .452 | 2.378 | .027 |

a. Dependent Variable: Flowers

Carry out the F-test (the coefficients above are not necessary for this test, only the two ANOVA tables).
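The extra-sum-of-squares F statistic compares the residual sums of squares of the two fits, F = [(SS_res,reduced − SS_res,full) / (df_reduced − df_full)] / (SS_res,full / df_full), and is referred to an F distribution with (df_reduced − df_full, df_full) degrees of freedom. The sketch below applies this to the two ANOVA tables using only the Python standard library. The upper-tail probability is computed by Simpson's rule on the incomplete-beta form of the F distribution, a hand-rolled stand-in for a statistics library (with SciPy, `scipy.stats.f.sf(F, d1, d2)` would give the same quantity); the function names are illustrative.

```python
import math


def f_upper_tail(f: float, d1: int, d2: int, n: int = 4000) -> float:
    """P(F > f) for F ~ F(d1, d2), assuming f > 0 and d2 >= 2.

    Uses P(F > f) = I_w(d2/2, d1/2) with w = d2 / (d2 + d1 * f), where I is the
    regularized incomplete beta function, integrated by composite Simpson's rule.
    """
    a, b = d2 / 2, d1 / 2
    w = d2 / (d2 + d1 * f)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def beta_density(t: float) -> float:
        if t == 0.0:  # a >= 1 because d2 >= 2, so the density is finite at 0
            return math.exp(-log_beta) if a == 1 else 0.0
        return math.exp((a - 1) * math.log(t) + (b - 1) * math.log1p(-t) - log_beta)

    h = w / n
    total = beta_density(0.0) + beta_density(w)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * beta_density(i * h)
    return total * h / 3


def extra_ss_f_test(ss_res_reduced: float, df_reduced: int,
                    ss_res_full: float, df_full: int) -> tuple[float, int, float, float]:
    """Return (extra SS, extra df, F statistic, P-value)."""
    extra_ss = ss_res_reduced - ss_res_full
    extra_df = df_reduced - df_full
    f_stat = (extra_ss / extra_df) / (ss_res_full / df_full)
    return extra_ss, extra_df, f_stat, f_upper_tail(f_stat, extra_df, df_full)


# Residual SS and df: reduced model (early only) vs. full model (early + LIGHT)
extra_ss, extra_df, f_stat, p_value = extra_ss_f_test(3450.986, 22, 767.472, 17)
print(f"extra SS = {extra_ss:.3f} on {extra_df} df, F = {f_stat:.3f}, P = {p_value:.2g}")
```

As a consistency check, taking the intercept-only model as the reduced model (residual SS = total SS = 4337.936 on 23 df) reproduces the overall F of 13.181 in the full-model ANOVA table.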