Chapter 12


Transcript of Chapter 12

Page 1: Chapter 12

Math 445
Chapter 12: Strategies for Variable Selection

Chapter 12 introduces criteria for model selection and comparison and discusses some standard methods of choosing models. Model selection depends on the objectives of the study. Ramsey and Schafer identify three possible objectives (pp. 345-6) that will influence how you select a model or models:

1. Adjusting for a large set of explanatory variables. We want to examine the effect of a particular variable or variables after adjusting for the effect of other variables which we know may affect the response.

2. Fishing for explanation. Which variables are important in explaining the response?

3. Prediction. All that is desired is a model that predicts the response well from the values of the explanatory variables. Interpretation of the model is not a goal.

Before considering some criteria by which a “best” model might be selected among a class of models, let’s review model development.

Model Development Steps

• Variable Selection: identification of response variable(s) and all candidate explanatory variables.

This is done in the planning stages of a study. Note that a general rule of thumb is that you need 5 to 10 times as many observations as there are explanatory variables in order to do a good job of model selection and fitting.

• Model Formation: fitting and comparing models based on some selection criteria to determine one or more candidate models.

• Model Diagnostics: checking for problems in the model and/or its assumptions.
  1. Residual Analysis - identifying outliers, missing variables, model lack of fit, and violation of assumptions.
  2. Influence Statistics - identifying influential observations, or those which have a great effect on the form of the model.

Example: Suppose we are studying differences in abundance of bird species in 3 forest habitats. The habitats represent various levels of prescribed burns. The experiment itself consists of counting the number of birds of each species type heard from a station within 100 meters in a 10-minute period. Many stations were used in the study for replication. What is the response variable? Explanatory variables? Habitat type, neighboring habitat type, elevation, slope, aspect, visibility, etc.

Suppose we have 8 candidate explanatory variables X1, …, X8. How many possible first-order models are there? There are 2^8 − 1 = 255. With 20 variables, there are 2^20 − 1 = 1,048,575 models, and these are only the first-order models. Clearly fitting all possible models is not a feasible prospect.
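As a quick check of the counting in R or S-Plus (each of the k candidate variables is either in or out of the model, and we drop the empty model):

> 2^8 - 1      # 255 possible first-order models with 8 candidate variables
> 2^20 - 1     # 1,048,575 with 20 candidate variables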

Page 2: Chapter 12

Criteria for selecting models

1. R2: R2 cannot decrease when variables are added, so the model maximizing R2 is the one with all the variables. Maximizing R2 is equivalent to minimizing SSE. R2 is an appropriate way to compare models with the same number of explanatory variables (as long as the response variable is the same). Be aware that measures like R2 based on correlations are sensitive to outliers.

2. MSE = SSE/(n − p): MSE can increase when variables are added to the model, so minimizing MSE is a reasonable procedure. However, minimizing MSE is equivalent to maximizing adjusted R2 (discussed below) and tends to overfit (include too many variables).

3. Adjusted R2: This statistic adjusts R2 by including a penalty for the number of parameters in the model. It is closely related to both R2 and MSE, as shown below.

Adjusted R2 = 1 − (1 − R2)(n − 1)/(n − p) = (Total mean square − Residual mean square)/Total mean square = 1 − MSE/MST

where p is the number of coefficients (including the intercept) in the model. The third expression shows that maximizing adjusted R2 is equivalent to minimizing MSE since MST is fixed (it’s simply the variance of the response variable).

• Adjusted R2 tends to select models with too many variables (overfitting). This can be seen from the fact that adjusted R2 will increase when a variable is added if the F statistic for comparing the two models is greater than 1. This is a very generous criterion as this corresponds to a significance level of around .5.
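To make the algebra concrete, here is a minimal R sketch verifying that adjusted R2 equals 1 − MSE/MST; the data frame dat and variables y, x1, x2 are hypothetical stand-ins:

m   <- lm(y ~ x1 + x2, data = dat)
n   <- nrow(dat)
p   <- length(coef(m))                  # number of coefficients, intercept included
SSE <- sum(resid(m)^2)
SST <- sum((dat$y - mean(dat$y))^2)
MSE <- SSE / (n - p)
MST <- SST / (n - 1)
R2    <- 1 - SSE / SST
adjR2 <- 1 - MSE / MST                  # = 1 - (1 - R2)*(n - 1)/(n - p)
c(adjR2, summary(m)$adj.r.squared)      # in R the two values agree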

4. Mallows’ Cp: The Cp statistic assumes that the full model with all variables fits. Then Cp is computed for a reduced model as

Cp = (n − p)(σ̂² − σ̂²_full)/σ̂²_full + p = (n − p)σ̂²/σ̂²_full + 2p − n

where p is the number of coefficients (including the intercept) in the reduced model.

• Note that σ̂² is simply MSE (mean square error or mean square residual) for a model.
• Models with small values of Cp are considered better and, ideally, we look for the smallest model with a Cp of around p or smaller. Some statistics programs will compute Cp for a large set of models and plot Cp versus p, as in Display 12.9 on p. 357. Unfortunately, SPSS does not compute Cp automatically.
• Cp assumes that the full model fits and satisfies all the regression model assumptions. Outliers, unexplained nonlinearity, and nonconstant variance may seriously affect the performance of Cp as a model selection tool.

Page 3: Chapter 12

• Mallows’ Cp is closely related to AIC. AIC has come to be preferred by many statisticians in recent years.
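Since SPSS does not compute Cp, it is easy to compute by hand. A sketch in R/S-Plus style, where reduced and full are hypothetical lm fits to the same n observations:

Cp <- function(reduced, full) {
  n       <- length(resid(full))
  p       <- length(coef(reduced))                          # coefficients, intercept included
  sigma2f <- sum(resid(full)^2) / (n - length(coef(full)))  # sigma-hat^2 from the full model
  sum(resid(reduced)^2) / sigma2f + 2 * p - n               # SSE_reduced/sigma-hat^2_full + 2p - n
}
# e.g. Cp(lm(y ~ x1, data = dat), lm(y ~ x1 + x2 + x3, data = dat))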

5. Akaike's Information Criterion (AIC): The AIC statistic for a model is given by:

AIC = n ln(SSE/n) + 2p

where SSE = the error SS for the model under consideration and ln is the natural log.

• The model with the smallest AIC value is considered best.
• The term 2p is the penalty for the number of parameters in the model.
• Ripley: “AIC has been criticized in asymptotic studies and simulation studies for tending to over-fit, that is, choose a model at least as large as the true model. That is a virtue, not a deficiency: this is a prediction-based criterion, not an explanation-based one.” BIC (below) is a criterion based on the “explanation” approach and places a bigger penalty on the number of parameters.

• AIC can only be used to compare models. It is not an absolute measure of fit of the model like R2 is. The model with the smallest AIC among those you examined may fit the data best, but this does not mean it's a good model. Therefore, selecting which models to consider (which variables, transformations, form of the model) and making sure the models satisfy the regression model assumptions is very important.

• Since AIC is not an absolute measure of fit, many authors suggest reporting ∆AIC, the difference between the AIC of each model and the AIC of the best fitting model. A further suggestion is to consider all models with ∆AIC less than about 2 as having essentially equal support.

• Neither AIC nor Cp nor R2 nor adjusted R2 can be used to compare models with different response variables.

• AIC is based on the assumption that the models satisfy the regression model assumptions and can be greatly affected by outliers.
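A sketch of this computation for a least-squares fit m, on the scale that extractAIC() uses for linear models in R (any additive constant cancels when models for the same response are compared):

aic <- function(m) {
  n <- length(resid(m))
  p <- length(coef(m))
  n * log(sum(resid(m)^2) / n) + 2 * p   # n*ln(SSE/n) + 2p
}
# Delta-AIC for several fitted models m1, m2, m3:
# a <- c(aic(m1), aic(m2), aic(m3)); a - min(a)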

6. Bayesian Information Criterion (BIC): BIC is similar to AIC, but the penalty on the number of parameters is p ln(n), where ln is the natural log. That is,

BIC = n ln(SSE/n) + p ln(n)

BIC is motivated by a Bayesian approach to model selection and is said not to tend to overfit like AIC. Therefore, it may be better for model selection for “explanation.” The purpose of having the penalty depend on the sample size n is to reduce the likelihood that small and relatively unimportant parameters are included (which is more likely with large n).
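The same sketch with the BIC penalty; equivalently extractAIC(m, k = log(n)) (see the note near the end of this handout):

bic <- function(m) {
  n <- length(resid(m))
  p <- length(coef(m))
  n * log(sum(resid(m)^2) / n) + p * log(n)   # n*ln(SSE/n) + p*ln(n)
}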

Page 4: Chapter 12

7. PRESS Statistic (not in text): another prediction-based model selection statistic is the PRESS statistic. It is calculated as follows: Remove the ith observation and fit the model with the remaining n − 1 observations. Then use this model to calculate a predicted value for the left-out observation; call this predicted value Yi*. Compute Yi − Yi*, the difference between the observed response and the predicted response from the model without the ith observation in it. Repeat this process for each data value. The PRESS statistic is then defined as:

PRESS = Σ (Yi − Yi*)², where the sum runs over i = 1, …, n.

• The model with the smallest PRESS statistic is considered “best.”
• Leaving one item out at a time is known as n-fold cross-validation or leave-one-out cross-validation.
• The Yi − Yi* are called “deleted” residuals in SPSS. So the PRESS statistic can be computed in SPSS by saving the deleted residuals, creating a new variable which is the square of the deleted residuals, then computing the sum of this new variable using Analyze…Descriptive Statistics…Descriptives and choosing Sum under Options.

• PRESS is similar to SSE, but is based on the deleted residuals rather than the raw residuals. Unlike SSE, it’s possible for PRESS to increase when variables are added to the model.

The PRESS statistic is an example of the general idea of using cross-validation to assess the predictive power of models. A model will generally predict the data it's based on better than new data, and bigger models will necessarily do a better job of predicting the data they're based on than smaller models: SSE always decreases as more terms are added to the model. A less biased way of assessing the predictive power of a model is to use the following general idea: fit a model using a subset of the data, then validate the model using the remainder of the data. This is called cross-validation (abbreviated CV). In k-fold CV, the data are randomly split into k approximately equal-sized subsets. Each subset is left out in turn and the model based on the remaining subsets is used to predict for the left-out subset. The PRESS statistic is based on n-fold CV, that is, only one observation at a time is left out. Simulations have suggested that smaller values of k may work better; 10-fold CV has become a standard method of cross-validation. Cross-validation is most useful as a way to compare models rather than as an absolute measure of how good the predictions will be. This is because the model used for prediction of each subset is different from the model based on all the data that will actually be used to predict future observations. Each of the models being compared should use the same splits of the data. It’s also best to repeat the 10-fold CV several times and average the results.
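Sketches of both computations in R. For least squares the deleted residuals need not be found by refitting n times, since Yi − Yi* = ei/(1 − hii), where ei is the ordinary residual and hii the ith leverage. The 10-fold CV function assumes a hypothetical data frame dat whose response column is named y:

press <- function(m) sum((resid(m) / (1 - hatvalues(m)))^2)   # PRESS via deleted residuals

cv10 <- function(formula, dat, k = 10) {
  folds  <- sample(rep(1:k, length.out = nrow(dat)))  # random split into k folds
  sq.err <- numeric(nrow(dat))
  for (j in 1:k) {
    fit  <- lm(formula, data = dat[folds != j, ])     # fit without fold j
    pred <- predict(fit, newdata = dat[folds == j, ]) # predict the left-out fold
    sq.err[folds == j] <- (dat$y[folds == j] - pred)^2
  }
  sum(sq.err)   # CV analogue of PRESS; reuse the same folds for every model compared
}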

Page 5: Chapter 12

Example: Ozone data without case 17. n = 110 cases. Dependent variable is log10(ozone).

All possible models with main effects and two-way interactions

Model                         p  SSE     R2     MSE    AIC      BIC      PRESS
W + T + S + W:T + W:S + T:S   7  21.534  0.695  0.209  -165.39  -146.49  25.62
W + T + S + W:T + W:S         6  22.152  0.687  0.213  -164.28  -148.08  25.56
W + T + S + W:T + T:S         6  21.537  0.695  0.207  -167.38  -151.17  24.51
W + T + S + W:S + T:S         6  21.867  0.691  0.210  -165.70  -149.50  25.44
W + T + S + W:T               5  22.182  0.686  0.211  -166.13  -152.63  24.55
W + T + S + W:S               5  22.726  0.679  0.216  -163.47  -149.96  25.63
W + T + S + T:S               5  21.897  0.690  0.209  -167.56  -154.05  24.54
W + T + S                     4  23.069  0.674  0.218  -163.82  -153.02  25.20
W + T + W:T                   4  26.372  0.627  0.249  -149.10  -138.30  28.54
W + T                         3  26.995  0.618  0.252  -148.53  -140.43  28.78
W + S + W:S                   4  36.121  0.489  0.341  -114.50  -103.69  39.39
W + S                         3  36.410  0.485  0.340  -115.62  -107.52  38.70
T + S + T:S                   4  27.029  0.618  0.255  -146.39  -135.59  29.22
T + S                         3  28.038  0.603  0.262  -144.36  -136.26  29.68
W                             2  44.985  0.364  0.417   -94.36   -88.95  46.84
T                             2  31.908  0.549  0.295  -132.14  -126.74  32.98
S                             2  57.974  0.180  0.537   -66.45   -61.05  60.15
Constant                      1  70.695  0.000  0.649   -46.63   -43.93  72.00

All possible models with main effects and quadratic terms

Model                         p  SSE     R2     MSE    AIC      BIC      PRESS
W + T + S + W^2 + T^2 + S^2   7  20.175  0.715  0.196  -172.56  -153.66  23.57
W + T + S + W^2 + T^2         6  20.754  0.706  0.200  -171.45  -155.25  23.79
W + T + S + W^2 + S^2         6  20.875  0.705  0.201  -170.81  -154.61  23.51
W + T + S + T^2 + S^2         6  21.270  0.699  0.205  -168.75  -152.55  24.15
W + T + S + W^2               5  21.393  0.697  0.204  -170.12  -156.61  23.65
W + T + S + T^2               5  21.818  0.691  0.208  -167.95  -154.45  24.36
W + T + S + S^2               5  22.614  0.680  0.215  -164.01  -150.51  25.12
W + T + W^2 + T^2             5  24.924  0.647  0.237  -153.31  -139.81  28.19
W + T + W^2                   4  25.390  0.641  0.240  -153.27  -142.47  27.68
W + T + T^2                   4  25.998  0.632  0.245  -150.67  -139.87  28.33
W + S + W^2 + S^2             5  29.996  0.576  0.286  -132.94  -119.43  32.79
W + S + W^2                   4  32.958  0.534  0.311  -124.58  -113.78  35.31
W + S + S^2                   4  33.350  0.528  0.315  -123.28  -112.47  36.12
T + S + T^2 + S^2             5  25.466  0.640  0.243  -150.95  -137.44  28.14
T + S + T^2                   4  26.418  0.626  0.249  -148.91  -138.11  28.58
T + S + S^2                   4  27.207  0.615  0.257  -145.67  -134.87  29.39
W + W^2                       3  41.263  0.416  0.386  -101.86   -93.76  43.98
T + T^2                       3  30.579  0.567  0.286  -134.82  -126.72  32.32
S + S^2                       3  49.093  0.306  0.459   -82.74   -74.64  51.72
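Tables like these can be assembled with a short loop. A sketch in R, where the data frame ozone and the variable names logO3, W, T, S are hypothetical stand-ins for the data used here:

press <- function(m) sum((resid(m) / (1 - hatvalues(m)))^2)
candidates <- c(logO3 ~ W + T + S,
                logO3 ~ W + T + S + W:T,
                logO3 ~ W + T + S + I(W^2),
                logO3 ~ 1)                 # ... and so on for the other models
tab <- t(sapply(candidates, function(f) {
  m <- lm(f, data = ozone)
  n <- length(resid(m)); p <- length(coef(m)); sse <- sum(resid(m)^2)
  c(p = p, SSE = sse, R2 = summary(m)$r.squared, MSE = sse / (n - p),
    AIC = n * log(sse / n) + 2 * p, BIC = n * log(sse / n) + p * log(n),
    PRESS = press(m))
}))
rownames(tab) <- sapply(candidates, function(f) deparse(f[[3]]))  # label rows by model
round(tab, 2)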

Page 6: Chapter 12

Approaches to choosing a model

There are a number of possible approaches to model selection using the measures above to compare and select models:

• Choose several models a priori that make scientific sense. Use criteria above (like AIC and BIC) to compare models.

• Examine all possible models involving the variables, including interactions, quadratic terms, or both (this is what was done with the Ozone data). Generally feasible only up to 3 or 4 variables.

• Examine all main effects models only (there are 2^k − 1 possible models, where k is the number of variables). Consider interactions or other higher order terms only after the main effects have been selected.

• If the number of variables is large, select a subset of the variables first, perhaps based on the correlation of each of the variables individually with the response and/or eliminating redundant variables (ones which are highly correlated with another variable). Then proceed with one of the above approaches.

• If the number of variables is large, use stepwise regression to select possible models. Stepwise regression does not require examination of all models.

Some authors do not believe in stepwise methods and other procedures that search for “good-fitting” models because they are essentially searching through many tens or hundreds of possible models, whether they make any scientific sense or not, and picking the “best” ones. The more models you consider, the higher the likelihood you will select the “wrong” one. Therefore, they believe, you should select a few models a priori that you will compare. Others argue that there is no “right” model and that if the goal is prediction, it does not matter if the model makes physical sense. In that case, cross-validation (discussed above) might be an important tool.

Stepwise regression

Stepwise regression methods attempt to find models minimizing or maximizing some criterion without examining every possible model. Stepwise methods are not guaranteed to find the best model (in terms of the criterion selected), but simply try to find the best models using a one-step-at-a-time approach. The three most common types of subset selection methods employed are outlined below. The criterion used in these descriptions is the F statistic for comparing two nested models, but stepwise methods can also use the associated P-value, or AIC or BIC, as a criterion. The latter two are now generally preferred to the F statistic or P-value. SPSS, however, only does stepwise regression with the F statistic or P-value. The three types of stepwise methods are:

Forward Selection

1. Start with the model with only the constant.

Page 7: Chapter 12

2. Consider all models which consist of the current model plus one more term. For each term not in the model, calculate its “F-to-enter” (the extra sum-of-squares F statistic). Identify the variable with the largest F-to-enter. Higher order terms (interactions, quadratic terms) are eligible for entry only if all lower order terms involved in them are already in the model. For example, do not consider the interaction A×B for entry unless both A and B individually are already in the model.

3. If the largest F-to-enter is greater than 4 (or some other user-specified number), add this variable to get a new current model and return to step 2. If the largest F-to-enter is less than the user-specified number, stop.

The criterion could also be the P-value for the F-test, in which case a term is added only if its P-value is less than the user-specified cutoff (usually somewhere between .05 and .20). If a variable is a categorical variable with more than 2 levels, we add all the indicator variables for this variable at once. Note that once a variable has been entered it cannot be removed, even if its coefficient becomes statistically nonsignificant with the addition of other variables, which is possible.
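One pass of this algorithm can be sketched in R with add1(), which computes the F-to-enter for every candidate term. The data frame dat, response y, and scope are hypothetical; this bare-bones loop does not enforce the hierarchy restriction of step 2 (step() and stepAIC() handle that):

f.enter <- 4
m <- lm(y ~ 1, data = dat)
scope <- ~ x1 + x2 + x3
repeat {
  ad <- add1(m, scope, test = "F")        # F-to-enter for each term not in the model
  f  <- ad[["F value"]][-1]               # drop the <none> row
  if (all(is.na(f)) || max(f, na.rm = TRUE) <= f.enter) break
  best <- rownames(ad)[-1][which.max(f)]  # term with the largest F-to-enter
  m <- update(m, as.formula(paste(". ~ . +", best)))
}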

Backward Elimination

1. Start with the model with all of the candidate variables and any higher order terms which might be important.

2. Calculate the F-to-remove for each variable in the current model (the extra-sum-of-squares test statistic). Identify the variable with the smallest F-to-remove. A lower order term is eligible for removal only if all higher order terms involving that variable have already been removed. For example, the variable A is not eligible for removal if A×B is still in the model.

3. If the smallest F-to-remove is 4 (or some other user-specified number) or less, then remove that variable to get a new current model and return to step 2. If the smallest F-to-remove is greater than the user-specified number, stop.

• Again, the criterion for removal could be the P-value (remove a variable only if its P-value is greater than the cutoff).
• Backward elimination is preferred to forward selection by many users because it does not eliminate a term unless there is good reason to (forward selection, on the other hand, does not include a term unless there is convincing evidence to include it).
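The matching sketch for backward elimination uses drop1(), which computes each term's F-to-remove and, by default, only offers terms whose removal respects the hierarchy; m is a hypothetical lm fit of the full model:

f.remove <- 4
repeat {
  dr <- drop1(m, test = "F")              # F-to-remove for each removable term
  f  <- dr[["F value"]][-1]               # drop the <none> row
  if (all(is.na(f)) || min(f, na.rm = TRUE) > f.remove) break
  worst <- rownames(dr)[-1][which.min(f)] # term with the smallest F-to-remove
  m <- update(m, as.formula(paste(". ~ . -", worst)))
}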

Stepwise Selection This method is a hybrid of the previous two, involving both forward selection and backward elimination.

1. Start with the model with only the constant.

2. Do one step of forward selection.

3. Do one step of backward elimination.

4. Repeat steps 2 and 3 until no changes occur during one cycle of steps 2 and 3. The F-to-enter must be greater than the F-to-remove; otherwise, you could have a never-ending cycle of a variable being entered, then eliminated. If a P-value cutoff is used, then the P for entry must be smaller than the P for removal.

Page 8: Chapter 12

Forward selection in the SAT data (Case Study 12.1), using P of .05 or less to enter. Preliminary analysis presented in the text suggested that the log of the percent taking the exam (log(takers)) should be used in place of takers.

Coefficients (dependent variable: sat)

Model                    B   Std. Error    Beta        t  Sig.
1  (Constant)     1112.248       12.275            90.611  .000
   Log10(takers)  -135.896        9.476   -.900   -14.340  .000
2  (Constant)     1060.351       15.539            68.239  .000
   Log10(takers)  -148.061        8.459   -.981   -17.504  .000
   expend            2.900         .646    .252     4.488  .000
3  (Constant)      851.315       87.022             9.783  .000
   Log10(takers)  -143.383        8.272   -.950   -17.333  .000
   expend            2.698         .620    .234     4.350  .000
   years            12.833        5.265    .127     2.438  .019

(B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.)

Excluded Variables (dependent variable: sat)

Model       Beta In       t   Sig.  Partial Corr.  Tolerance
1  income     .078a    .997   .324        .144       .648
   years      .157a   2.592   .013        .354       .960
   public     .048a    .755   .454        .109       .980
   expend     .252a   4.488   .000        .548       .897
   rank       .221a   1.028   .309        .148       .086
2  income    -.057b   -.783   .438       -.115       .533
   years      .127b   2.438   .019        .338       .943
   public    -.014b   -.254   .801       -.037       .916
   rank       .101b    .546   .588        .080       .084
3  income    -.051c   -.726   .472       -.108       .532
   public     .056c    .938   .353        .138       .727
   rank       .369c   1.939   .059        .278       .067

a. Predictors in the Model: (Constant), Log10(takers)
b. Predictors in the Model: (Constant), Log10(takers), expend
c. Predictors in the Model: (Constant), Log10(takers), expend, years

Page 9: Chapter 12

These three stepwise methods will not necessarily lead to the same model. In addition, changes in the F or P-to-enter and F or P-to-remove can result in more or fewer variables in the final model.

The SPSS stepwise regression procedure has some disadvantages. SPSS has no way of knowing that some variables may be higher order terms that involve lower order terms. Therefore, it cannot enforce the restriction that higher order terms cannot be added before the corresponding lower order terms have been added, nor that lower order terms cannot be eliminated until all higher order terms involving them have been eliminated (that is why I used the SAT data and not the Ozone data with higher order terms in this example). SPSS also cannot treat the set of indicator variables corresponding to a categorical variable as one set of variables that should all be added or eliminated at once.

However, SPSS does allow you to define blocks of explanatory variables which can be treated differently in stepwise regression. Therefore, for the ozone data, where I wanted to look at adding two-way interactions and quadratic terms, I defined Block 1 to be Wind, MaxTemp and SolarRad, and Block 2 to be all the two-way interactions and quadratic terms. I defined the “Method” for Block 1 to be “Enter”, which means these variables will be in the starting model and cannot be eliminated, and the “Method” for Block 2 to be “Stepwise”, which means these variables can be added or eliminated. The P-to-enter and P-to-remove were the default values of .05 and .10, respectively.

Ozone data, case #17 deleted: stepwise regression; Wind, MaxTemp and SolarRad forced to be in the model.

Coefficients (dependent variable: Log10(Ozone))

Model                             B   Std. Error    Beta        t  Sig.
1  (Constant)                  .114         .226             .504  .615
   Wind speed (mph)           -.030         .006   -.308   -4.779  .000
   Maximum temperature (F)     .019         .002    .519    7.830  .000
   Solar radiation (langleys)  .001         .000    .245    4.248  .000
2  (Constant)                  .518         .260            1.992  .049
   Wind speed (mph)           -.096         .024   -.980   -4.040  .000
   Maximum temperature (F)     .018         .002    .489    7.534  .000
   Solar radiation (langleys)  .001         .000    .247    4.429  .000
   Wind^2                      .003         .001    .676    2.868  .005

Excluded Variables (dependent variable: Log10(Ozone))

Model          Beta In       t   Sig.  Partial Corr.  Tolerance
1  Wind^2        .676    2.868   .005        .270       .052
   MaxTemp^2    1.929    2.454   .016        .233       .005
   SolarRad^2   -.359   -1.453   .149       -.140       .050
   WindTemp     -.776   -2.049   .043       -.196       .021
   WindSolar    -.256   -1.258   .211       -.122       .074
   TempSolar    1.198    2.371   .020        .225       .012
2  MaxTemp^2    1.431    1.789   .076        .173       .004
   SolarRad^2   -.384   -1.606   .111       -.156       .050
   WindTemp     -.021    -.038   .969       -.004       .010
   WindSolar    -.117    -.572   .568       -.056       .069
   TempSolar     .933    1.846   .068        .178       .011

Page 10: Chapter 12

One significant problem with using the F statistic or P-value is that the addition and elimination of variables is not based on a criterion for comparing models – the final model is not necessarily “optimal” in any sense. Why not add or eliminate variables based on one of the measures considered in the first part of this handout, such as AIC or BIC? The stepAIC function in the MASS library of S-Plus does stepwise regression using AIC (or BIC) as the criterion. In forward selection, it looks for the single variable which reduces AIC the most; if no variable reduces AIC, then it stops. In backward elimination, the goal is the same: find the variable whose elimination reduces AIC the most. If no variable reduces AIC when it's eliminated, then stop. In stepwise using both directions, find the addition or deletion which reduces AIC the most. Using AIC has the additional appeal of not having to set arbitrary criteria for entering and removing variables. The stepAIC function also handles categorical variables and interactions properly: an interaction cannot be added unless all the variables involved in the interaction have been added; similarly, a variable cannot be eliminated unless all higher order interactions involving that variable have been eliminated. Unfortunately, stepAIC does not handle quadratic terms properly.

> m0 <- lm(sat~1,data=case1201)
> summary(m0)
Call: lm(formula = sat ~ 1, data = case1201)
Residuals:
    Min     1Q Median    3Q   Max
 -158.4 -59.45  19.55 50.55 139.6
Coefficients:
               Value Std. Error t value Pr(>|t|)
(Intercept) 948.4490    10.2140 92.8574   0.0000
Residual standard error: 71.5 on 48 degrees of freedom
Multiple R-Squared: 2.465e-029
F-statistic: Inf on 0 and 48 degrees of freedom, the p-value is NA

> stepAIC(m0,~log(takers) + income + years + public + expend + rank)
Start: AIC= 419.42
sat ~ 1
              Df   Sum of Sq       RSS      AIC
+ log(takers)  1 199006.8593  46369.26 339.7760
+ rank         1 190296.7388  55079.38 348.2108
+ income       1 102026.4049 143349.72 395.0799
+ years        1  26338.2438 219037.88 415.8538
<none>        NA          NA 245376.12 419.4176
+ public       1   1231.7335 244144.39 421.1710
+ expend       1    385.5838 244990.54 421.3406

Step: AIC= 339.78
sat ~ log(takers)
         Df  Sum of Sq      RSS      AIC
+ expend  1 20523.4615 25845.80 313.1361
+ years   1  6363.5198 40005.74 334.5429
<none>   NA         NA 46369.26 339.7760
+ rank    1   871.1345 45498.13 340.8467
+ income  1   785.0507 45584.21 340.9393
+ public  1   448.9059 45920.36 341.2993

Page 11: Chapter 12

- log(takers)  1 199006.8593 245376.12 419.4176

Step: AIC= 313.14
sat ~ log(takers) + expend
              Df     Sum of Sq       RSS      AIC
+ years        1   1248.184463  24597.62 312.7106
+ rank         1   1053.599508  24792.20 313.0967
<none>        NA            NA  25845.80 313.1361
+ income       1     53.329409  25792.47 315.0349
+ public       1      1.292761  25844.51 315.1336
- expend       1  20523.461462  46369.26 339.7760
- log(takers)  1 219144.737003 244990.54 421.3406

Step: AIC= 312.71
sat ~ log(takers) + expend + years
              Df    Sum of Sq       RSS      AIC
+ rank         1   2675.51301  21922.10 309.0681
<none>        NA           NA  24597.62 312.7106
- years        1   1248.18446  25845.80 313.1361
+ public       1    287.82166  24309.80 314.1339
+ income       1     19.19044  24578.43 314.6724
- expend       1  15408.12616  40005.74 334.5429
- log(takers)  1 190946.97826 215544.60 417.0660

Step: AIC= 309.07
sat ~ log(takers) + expend + years + rank
              Df  Sum of Sq      RSS      AIC
<none>        NA         NA 21922.10 309.0681
+ income       1   505.3684 21416.74 309.9253
+ public       1   185.0259 21737.08 310.6528
- rank         1  2675.5130 24597.62 312.7106
- years        1  2870.0980 24792.20 313.0967
- log(takers)  1  5094.3405 27016.44 317.3067
- expend       1 13619.6111 35541.72 330.7455

Call: lm(formula = sat ~ log(takers) + expend + years + rank, data = case1201)
Coefficients:
(Intercept) log(takers)   expend    years     rank
   399.1147    -38.1005 3.995661 13.14731 4.400277
Degrees of freedom: 49 total; 44 residual
Residual standard error: 22.32106

Stepwise regression starting with the main effects model and allowing all two-way interactions.

> stepAIC(mfull,list(upper=~.^2,lower=~1))
Start: AIC= 311.88
sat ~ log(takers) + income + years + public + expend + rank
                     Df    Sum of Sq      RSS      AIC
+ years:public        1  5027.807692 16368.93 300.7547
+ log(takers):public  1  3617.792915 17778.95 304.8035
+ income:public       1  1977.822427 19418.92 309.1269
+ income:years        1  1804.755461 19591.98 309.5617
- public              1    19.997447 21416.74 309.9253
+ public:rank         1  1452.863422 19943.87 310.4340

Page 12: Chapter 12

- income              1    340.339906 21737.08 310.6528
+ log(takers):years   1   1197.663996 20199.07 311.0570
+ log(takers):income  1   1194.412626 20202.33 311.0649
+ income:rank         1   1046.006240 20350.73 311.4235
<none>               NA            NA 21396.74 311.8795
+ years:rank          1    485.951497 20910.79 312.7538
+ log(takers):expend  1    447.951860 20948.79 312.8428
+ expend:rank         1    323.487437 21073.25 313.1330
+ years:expend        1     93.688852 21303.05 313.6645
+ public:expend       1     51.522079 21345.22 313.7614
+ log(takers):rank    1     44.248267 21352.49 313.7781
+ income:expend       1      9.445369 21387.29 313.8579
- log(takers)         1   2150.004922 23546.74 314.5712
- years               1   2531.615348 23928.35 315.3590
- rank                1   2679.046601 24075.78 315.6599
- expend              1  10964.372896 32361.11 330.1517

Step: AIC= 300.75
sat ~ log(takers) + income + years + public + expend + rank + years:public
                     Df     Sum of Sq      RSS      AIC
- income              1    193.844212 16562.77 299.3315
+ log(takers):public  1    923.331155 15445.60 299.9097
+ income:rank         1    869.194138 15499.74 300.0811
<none>               NA            NA 16368.93 300.7547
+ public:rank         1    587.095100 15781.84 300.9649
+ expend:rank         1    513.555766 15855.37 301.1927
+ log(takers):expend  1    496.074306 15872.86 301.2467
+ log(takers):income  1    417.822552 15951.11 301.4877
+ income:public       1    119.187306 16249.74 302.3966
+ log(takers):rank    1     96.896741 16272.03 302.4638
+ income:expend       1     16.336369 16352.59 302.7058
+ income:years        1     10.664688 16358.27 302.7227
+ log(takers):years   1      9.199796 16359.73 302.7271
+ public:expend       1      4.688396 16364.24 302.7406
+ years:rank          1      4.080195 16364.85 302.7425
+ years:expend        1      3.618119 16365.31 302.7439
- log(takers)         1   2319.536747 18688.47 305.2482
- rank                1   2533.477921 18902.41 305.8060
- years:public        1   5027.807692 21396.74 311.8795
- expend              1  13670.486641 30039.42 328.5038

Step: AIC= 299.33
sat ~ log(takers) + years + public + expend + rank + years:public
                     Df      Sum of Sq      RSS      AIC
+ log(takers):public  1  7.036022e+002 15859.17 299.2045
<none>               NA             NA 16562.77 299.3315
+ expend:rank         1  6.439627e+002 15918.81 299.3884
+ log(takers):expend  1  6.224671e+002 15940.31 299.4545
+ public:rank         1  4.726451e+002 16090.13 299.9129
+ income              1  1.938442e+002 16368.93 300.7547
+ public:expend       1  3.375877e+000 16559.40 301.3216
+ log(takers):rank    1  1.935137e+000 16560.84 301.3258
+ years:expend        1  1.528711e+000 16561.25 301.3270
+ years:rank          1  8.679866e-001 16561.91 301.3290
+ log(takers):years   1  5.202697e-002 16562.72 301.3314
- rank                1  2.456165e+003 19018.94 304.1071
- log(takers)         1  2.985168e+003 19547.94 305.4514
- years:public        1  5.174303e+003 21737.08 310.6528
- expend              1  1.615704e+004 32719.81 330.6919

Page 13: Chapter 12

Step: AIC= 299.2
sat ~ log(takers) + years + public + expend + rank + years:public + log(takers):public
                     Df     Sum of Sq      RSS      AIC
<none>               NA            NA 15859.17 299.2045
+ expend:rank         1   602.5956096 15256.58 299.3063
- log(takers):public  1   703.6021875 16562.77 299.3315
+ log(takers):expend  1   549.9128359 15309.26 299.4752
+ income              1   413.5731794 15445.60 299.9097
+ years:rank          1   141.9104795 15717.26 300.7640
+ log(takers):years   1   102.4165565 15756.76 300.8870
+ public:rank         1    54.7708444 15804.40 301.0350
+ public:expend       1    39.7984090 15819.37 301.0813
+ log(takers):rank    1     6.6716882 15852.50 301.1839
+ years:expend        1     0.8878288 15858.28 301.2017
- years:public        1  2725.3253513 18584.50 304.9749
- rank                1  3086.8696076 18946.04 305.9190
- expend              1 12860.9171063 28720.09 326.3031

Call: lm(formula = sat ~ log(takers) + years + public + expend + rank + years:public + log(takers):public, data = case1201)
Coefficients:
(Intercept) log(takers)     years    public    expend     rank years:public
   2590.556    19.42852 -134.2278 -26.43972  4.347684 5.991911     1.661026
log(takers):public
        -0.5848999
Degrees of freedom: 49 total; 41 residual
Residual standard error: 19.66746

Stepwise using BIC

> stepAIC(mfull,list(upper=~.^2,lower=~1),k=log(49))
Start: AIC= 325.12
sat ~ log(takers) + income + years + public + expend + rank
                     Df    Sum of Sq      RSS      AIC
+ years:public        1  5027.807692 16368.93 315.8892
+ log(takers):public  1  3617.792915 17778.95 319.9381
- public              1    19.997447 21416.74 321.2762
- income              1   340.339906 21737.08 322.0037
+ income:public       1  1977.822427 19418.92 324.2615
+ income:years        1  1804.755461 19591.98 324.6963
<none>               NA           NA 21396.74 325.1222
+ public:rank         1  1452.863422 19943.87 325.5686
- log(takers)         1  2150.004922 23546.74 325.9221
+ log(takers):years   1  1197.663996 20199.07 326.1916
+ log(takers):income  1  1194.412626 20202.33 326.1995
+ income:rank         1  1046.006240 20350.73 326.5581
- years               1  2531.615348 23928.35 326.7099
- rank                1  2679.046601 24075.78 327.0109
+ years:rank          1   485.951497 20910.79 327.8884
+ log(takers):expend  1   447.951860 20948.79 327.9773
+ expend:rank         1   323.487437 21073.25 328.2676
+ years:expend        1    93.688852 21303.05 328.7990
+ public:expend       1    51.522079 21345.22 328.8959
+ log(takers):rank    1    44.248267 21352.49 328.9126

Page 14: Chapter 12

+ income:expend       1      9.445369 21387.29 328.9924
- expend              1  10964.372896 32361.11 341.5027

Step: AIC= 315.89
sat ~ log(takers) + income + years + public + expend + rank + years:public
                     Df     Sum of Sq      RSS      AIC
- income              1    193.844212 16562.77 312.5743
<none>               NA            NA 16368.93 315.8892
+ log(takers):public  1    923.331155 15445.60 316.9361
+ income:rank         1    869.194138 15499.74 317.1075
+ public:rank         1    587.095100 15781.84 317.9913
+ expend:rank         1    513.555766 15855.37 318.2191
+ log(takers):expend  1    496.074306 15872.86 318.2731
- log(takers)         1   2319.536747 18688.47 318.4910
+ log(takers):income  1    417.822552 15951.11 318.5141
- rank                1   2533.477921 18902.41 319.0487
+ income:public       1    119.187306 16249.74 319.4230
+ log(takers):rank    1     96.896741 16272.03 319.4901
+ income:expend       1     16.336369 16352.59 319.7321
+ income:years        1     10.664688 16358.27 319.7491
+ log(takers):years   1      9.199796 16359.73 319.7535
+ public:expend       1      4.688396 16364.24 319.7670
+ years:rank          1      4.080195 16364.85 319.7688
+ years:expend        1      3.618119 16365.31 319.7702
- years:public        1   5027.807692 21396.74 325.1222
- expend              1  13670.486641 30039.42 341.7466

Step: AIC= 312.57
sat ~ log(takers) + years + public + expend + rank + years:public
                     Df      Sum of Sq      RSS      AIC
<none>               NA             NA 16562.77 312.5743
+ log(takers):public  1  7.036022e+002 15859.17 314.3390
+ expend:rank         1  6.439627e+002 15918.81 314.5230
+ log(takers):expend  1  6.224671e+002 15940.31 314.5891
+ public:rank         1  4.726451e+002 16090.13 315.0475
- rank                1  2.456165e+003 19018.94 315.4581
+ income              1  1.938442e+002 16368.93 315.8892
+ public:expend       1  3.375877e+000 16559.40 316.4561
+ log(takers):rank    1  1.935137e+000 16560.84 316.4604
+ years:expend        1  1.528711e+000 16561.25 316.4616
+ years:rank          1  8.679866e-001 16561.91 316.4635
+ log(takers):years   1  5.202697e-002 16562.72 316.4659
- log(takers)         1  2.985168e+003 19547.94 316.8024
- years:public        1  5.174303e+003 21737.08 322.0037
- expend              1  1.615704e+004 32719.81 342.0428

Call: lm(formula = sat ~ log(takers) + years + public + expend + rank + years:public, data = case1201)
Coefficients:
(Intercept) log(takers)     years   public   expend     rank years:public
   3274.012   -34.05226 -164.8157 -33.8661 4.651103 5.040749     2.042115
Degrees of freedom: 49 total; 42 residual
Residual standard error: 19.85829

Page 15: Chapter 12

> m1 <- lm(log(ozone)~wind+temp+solar,data=Ozone)
> summary(m1)
Call: lm(formula = log(ozone) ~ wind + temp + solar, data = Ozone)
Residuals:
     Min       1Q     Median      3Q    Max
 -1.0203 -0.31515 -0.0093072 0.32296 1.1222
Coefficients:
               Value Std. Error  t value Pr(>|t|)
(Intercept)  0.26236    0.52033  0.50423  0.61515
       wind -0.06931    0.01450 -4.77854  0.00001
       temp  0.04445    0.00568  7.82953  0.00000
      solar  0.00219    0.00052  4.24768  0.00005
Residual standard error: 0.46651 on 106 degrees of freedom
Multiple R-Squared: 0.67369
F-statistic: 72.947 on 3 and 106 degrees of freedom, the p-value is 0

Stepwise regression using AIC: start with the main effects model and allow all two-way interactions and quadratic terms; “lower” specifies the lowest allowable model, which is the main effects model.

> stepAIC(m1,list(upper=~.^2+wind^2+temp^2+solar^2,lower=m1))
Start: AIC= -163.82
log(ozone) ~ wind + temp + solar
             Df  Sum of Sq       RSS        AIC
+ I(wind^2)   1 1.67592921 21.392844 -170.11663
+ I(temp^2)   1 1.25107360 21.817700 -167.95347
+ temp:solar  1 1.17208023 21.896693 -167.55592
+ wind:temp   1 0.88700820 22.181765 -166.13308
+ I(solar^2)  1 0.45453682 22.614236 -164.00908
<none>       NA         NA 23.068773 -163.82005
+ wind:solar  1 0.34252408 22.726249 -163.46557

Step: AIC= -170.12
log(ozone) ~ wind + temp + solar + I(wind^2)
             Df     Sum of Sq       RSS        AIC
+ temp:solar  1 0.67869427353 20.714150 -171.66297
+ I(temp^2)   1 0.63901036417 20.753834 -171.45243
+ I(solar^2)  1 0.51800644492 20.874838 -170.81295
<none>       NA            NA 21.392844 -170.11663
+ wind:solar  1 0.06713886979 21.325705 -168.46239
+ wind:temp   1 0.00030265311 21.392541 -168.11818
- I(wind^2)   1 1.67592920662 23.068773 -163.82005

Step: AIC= -171.66
log(ozone) ~ wind + temp + solar + I(wind^2) + temp:solar
             Df     Sum of Sq       RSS        AIC
+ I(solar^2)  1 0.7474978978 19.966652 -173.70586
<none>       NA           NA 20.714150 -171.66297
+ I(temp^2)   1 0.2793246140 20.434825 -171.15638
- temp:solar  1 0.6786942735 21.392844 -170.11663
+ wind:temp   1 0.0536327944 20.660517 -169.94815
+ wind:solar  1 0.0015866564 20.712563 -169.67139
- I(wind^2)   1 1.1825432544 21.896693 -167.55592

Step: AIC= -173.71

Page 16: Chapter 12

log(ozone) ~ wind + temp + solar + I(wind^2) + I(solar^2) + temp:solar
             Df    Sum of Sq       RSS        AIC
<none>       NA           NA 19.966652 -173.70586
+ I(temp^2)   1 0.2687418912 19.697910 -173.19646
+ wind:temp   1 0.0981394822 19.868512 -172.24786
+ wind:solar  1 0.0051540289 19.961498 -171.73426
- I(solar^2)  1 0.7474978978 20.714150 -171.66297
- temp:solar  1 0.9081857264 20.874838 -170.81295
- I(wind^2)   1 1.1811295810 21.147781 -169.38399

Call: lm(formula = log(ozone) ~ wind + temp + solar + I(wind^2) + I(solar^2) + temp:solar, data = Ozone)
Coefficients:
  (Intercept)        wind        temp         solar    I(wind^2)
    2.7000915 -0.19764083 0.016191722 -0.0024656831 0.0059294158
     I(solar^2)    temp:solar
-0.000012334129  0.0001202964
Degrees of freedom: 110 total; 103 residual
Residual standard error: 0.44028512

Stepwise using BIC (k is the multiplier on p; the default value is k=2):

> stepAIC(m1,list(upper=~.^2+wind^2+temp^2+solar^2,lower=m1),k=log(110))
Start: AIC= -153.02
log(ozone) ~ wind + temp + solar
             Df  Sum of Sq       RSS        AIC
+ I(wind^2)   1 1.67592921 21.392844 -156.61423
+ I(temp^2)   1 1.25107360 21.817700 -154.45107
+ temp:solar  1 1.17208023 21.896693 -154.05352
<none>       NA         NA 23.068773 -153.01813
+ wind:temp   1 0.88700820 22.181765 -152.63068
+ I(solar^2)  1 0.45453682 22.614236 -150.50668
+ wind:solar  1 0.34252408 22.726249 -149.96317

Step: AIC= -156.61
log(ozone) ~ wind + temp + solar + I(wind^2)
             Df     Sum of Sq       RSS        AIC
<none>       NA            NA 21.392844 -156.61423
+ temp:solar  1 0.67869427353 20.714150 -155.46008
+ I(temp^2)   1 0.63901036417 20.753834 -155.24955
+ I(solar^2)  1 0.51800644492 20.874838 -154.61006
- I(wind^2)   1 1.67592920662 23.068773 -153.01813
+ wind:solar  1 0.06713886979 21.325705 -152.25951
+ wind:temp   1 0.00030265311 21.392541 -151.91530

Call: lm(formula = log(ozone) ~ wind + temp + solar + I(wind^2), data = Ozone)
Coefficients:
(Intercept)        wind        temp        solar    I(wind^2)
  1.1932358 -0.22081888 0.041915712 0.0022096915 0.0068982286
Degrees of freedom: 110 total; 105 residual
Residual standard error: 0.45137719

Page 17: Chapter 12

Bayesian posterior probabilities based on equal priors

Model                         p  SSE     R2     MSE    AIC      BIC      PRESS  EXP(-BIC)    Post. Prob
W + T + S + W:T + W:S + T:S   7  21.534  0.695  0.209  -165.39  -146.49  25.62  4.16676E+63  0.00002
W + T + S + W:T + W:S         6  22.152  0.687  0.213  -164.28  -148.08  25.56  2.04328E+64  0.00012
W + T + S + W:T + T:S         6  21.537  0.695  0.207  -167.38  -151.17  24.51  4.49052E+65  0.00254
W + T + S + W:S + T:S         6  21.867  0.691  0.210  -165.70  -149.50  25.44  8.45328E+64  0.00048
W + T + S + W:T               5  22.182  0.686  0.211  -166.13  -152.63  24.55  1.93360E+66  0.01093
W + T + S + W:S               5  22.726  0.679  0.216  -163.47  -149.96  25.63  1.33906E+65  0.00076
W + T + S + T:S               5  21.897  0.690  0.209  -167.56  -154.05  24.54  7.99954E+66  0.04522
W + T + S                     4  23.069  0.674  0.218  -163.82  -153.02  25.20  2.85589E+66  0.01614
W + T + W:T                   4  26.372  0.627  0.249  -149.10  -138.30  28.54  1.15592E+60  0.00000
W + T                         3  26.995  0.618  0.252  -148.53  -140.43  28.78  9.72689E+60  0.00000
W + S + W:S                   4  36.121  0.489  0.341  -114.50  -103.69  39.39  1.07645E+45  0.00000
W + S                         3  36.410  0.485  0.340  -115.62  -107.52  38.70  4.95841E+46  0.00000
T + S + T:S                   4  27.029  0.618  0.255  -146.39  -135.59  29.22  7.69111E+58  0.00000
T + S                         3  28.038  0.603  0.262  -144.36  -136.26  29.68  1.50302E+59  0.00000
W                             2  44.985  0.364  0.417   -94.36   -88.95  46.84  4.27065E+38  0.00000
T                             2  31.908  0.549  0.295  -132.14  -126.74  32.98  1.10276E+55  0.00000
S                             2  57.974  0.180  0.537   -66.45   -61.05  60.15  3.26346E+26  0.00000
Constant                      1  70.695  0.000  0.649   -46.63   -43.93  72.00  1.19828E+19  0.00000
W + T + S + W^2 + T^2 + S^2   7  20.175  0.715  0.196  -172.56  -153.66  23.57  5.41614E+66  0.03062
W + T + S + W^2 + T^2         6  20.754  0.706  0.200  -171.45  -155.25  23.79  2.65594E+67  0.15014
W + T + S + W^2 + S^2         6  20.875  0.705  0.201  -170.81  -154.61  23.51  1.40046E+67  0.07917
W + T + S + T^2 + S^2         6  21.270  0.699  0.205  -168.75  -152.55  24.15  1.78494E+66  0.01009
W + T + S + W^2               5  21.393  0.697  0.204  -170.12  -156.61  23.65  1.03481E+68  0.58499
W + T + S + T^2               5  21.818  0.691  0.208  -167.95  -154.45  24.36  1.19339E+67  0.06746
W + T + S + S^2               5  22.614  0.680  0.215  -164.01  -150.51  25.12  2.32093E+65  0.00131
W + T + W^2 + T^2             5  24.924  0.647  0.237  -153.31  -139.81  28.19  5.23253E+60  0.00000
W + T + W^2                   4  25.390  0.641  0.240  -153.27  -142.47  27.68  7.48057E+61  0.00000
W + T + T^2                   4  25.998  0.632  0.245  -150.67  -139.87  28.33  5.55609E+60  0.00000
W + S + W^2 + S^2             5  29.996  0.576  0.286  -132.94  -119.43  32.79  7.37547E+51  0.00000
W + S + W^2                   4  32.958  0.534  0.311  -124.58  -113.78  35.31  2.59434E+49  0.00000
W + S + S^2                   4  33.350  0.528  0.315  -123.28  -112.47  36.12  7.00004E+48  0.00000
T + S + T^2 + S^2             5  25.466  0.640  0.243  -150.95  -137.44  28.14  4.89140E+59  0.00000
T + S + T^2                   4  26.418  0.626  0.249  -148.91  -138.11  28.58  9.55897E+59  0.00000
T + S + S^2                   4  27.207  0.615  0.257  -145.67  -134.87  29.39  3.74366E+58  0.00000
W + W^2                       3  41.263  0.416  0.386  -101.86   -93.76  43.98  5.24144E+40  0.00000
T + T^2                       3  30.579  0.567  0.286  -134.82  -126.72  32.32  1.08093E+55  0.00000
S + S^2                       3  49.093  0.306  0.459   -82.74   -74.64  51.72  2.60459E+32  0.00000
Total                                                                           1.76893E+68  1.00000
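The last two columns can be reproduced as follows. This sketch follows the table's convention of weighting each model by exp(−BIC); bic stands for the vector of BIC values above (only a few entries are shown):

bic <- c(-146.49, -148.08, -151.17, -156.61)   # etc., one entry per model
w <- exp(-(bic - min(bic)))   # proportional to exp(-BIC); shifting by min(bic) avoids overflow
round(w / sum(w), 5)          # posterior probabilities under equal priors, relative to the models included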

Page 18: Chapter 12

Example: Data were collected for each of the 50 states on the average SAT score and a number of other variables. The reason for collecting the other variables is to help explain the discrepancy between states' SAT averages. For example, many midwestern states (Montana included) have much higher SAT scores than other regions. A closer look reveals that this difference is due primarily to the fact that only the better students in these states actually take the SAT exam. Hence it is important to examine what factors affect the average SAT scores for each state. Some of the variables considered as “explanatory” variables were:

1. Percentage of eligible students who took the exam (TAKERS)
2. Median income of families of test-takers (INCOME)
3. Average number of years of study in social science, natural science, and humanities among the test-takers (YEARS)
4. Percentage of test-takers in public schools (PUBLIC)
5. State expenditures in hundreds of dollars per student (EXPEND)
6. Median percentile ranking of test-takers within their schools (RANK)

Before fitting any models, it is a good idea to examine the relationships between all pairs of variables. A scatterplot matrix and a correlation matrix are very useful. The variable TAKERS appears to have a nonlinear relationship with SAT score, so we may want to consider a transformation of takers: log of TAKERS appears to work well. There also appear to be a couple of outliers; Alaska is a particularly extreme outlier on state expenditures (EXPEND).

For this data set, there are other possible objectives besides finding good models for predicting SAT score. For example:

   After accounting for the percentage of students who took the test (Log(TAKERS)) and the median class rank of the test-takers (RANK), which variables are important predictors of state SAT scores?

   After accounting for the percentage of students who took the test (TAKERS) and the median class rank of the test-takers (RANK), which states performed best for the amount of money they spend?

The first question might be examined by looking at partial correlations between SAT score and other variables after adjusting for TAKERS and RANK. Added variable plots and partial residual plots (available in S-Plus on the regression menu) allow us to look at this visually (these plots should be obtained by adding each variable separately to the model with TAKERS and RANK). The second question could be answered in this way. First, fit the regression model involving the TAKERS and RANK variables. What do the resulting residuals tell us? The residuals are the difference in the observed SAT scores and those predicted by the variables TAKERS and RANK. A positive residual means the SAT score is higher than predicted and a negative residual means it is lower

Page 19: Chapter 12

than predicted based on these 2 variables. The states could then be ranked based on these residuals (a sketch follows the note below).

Note: Both AIC and BIC are available in S-Plus in the MASS library. The AIC of any fitted linear model can be obtained by the command extractAIC(m) and the BIC by extractAIC(m, k=log(n)), where m is a fitted model and n is the sample size. Stepwise regression using AIC or BIC is obtained from the stepAIC command, which is illustrated on a separate handout.
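A sketch of that ranking step, using the case1201 variable names from the S-Plus sessions above:

m.tr <- lm(sat ~ log(takers) + rank, data = case1201)  # adjust for takers and rank
perf <- sort(resid(m.tr), decreasing = TRUE)  # positive residual: SAT higher than predicted
head(perf)   # states performing best, given these two variables
tail(perf)   # states performing worst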
