0 10 20 30 40 50 60 70 80 Multiple Regression › mparker › 1342 › tf › mm ›...


CHAPTER 28


Multiple Regression

Multiple regression fits the regression equation

$$\mu_y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$

to data on a selected response variable and predictor variables. To run a multiple regression analysis in Minitab, select

Stat > Regression > Regression

from the menu. In the dialog box, enter the response variable and as many explanatory variables as you like.

In Example 28.1 of BPS, we examine the relationship between the percent reporting for jury selection, y, and the sequentially numbered reporting dates, x. The data come from two years, 1998 and 2000. To illustrate the relationship between x and y for the two years, we will use a scatterplot. A variety of methods were used to increase the reporting percentage. To see whether these were successful, we want to compare 1998 and 2000. Because the data in Table 28.1 of BPS and TA28‐01.MTW are in separate columns for each year, we need to stack the data and create a variable for Year. This can be done by selecting Data > Stack > Block of Columns from the Minitab menu. Filling in the dialog box as follows, we obtain columns for x, y, and the year.
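Stacking turns the side‐by‐side year columns into one long column per variable plus a label column. The Python sketch below illustrates that reshaping with a hypothetical helper `stack_blocks` and made‐up placeholder values (not the data in TA28‐01.MTW); it is an illustration of the idea, not Minitab's implementation.

```python
# Hypothetical per-year columns (placeholders, not the BPS data).
date_1998 = [1, 2, 3]
pct_1998 = [80.2, 78.5, 77.0]
date_2000 = [1, 2, 3]
pct_2000 = [95.1, 93.4, 92.8]

def stack_blocks(blocks):
    """Stack (label, x_column, y_column) blocks into long x, y, and label columns."""
    x, y, year = [], [], []
    for label, xs, ys in blocks:
        if len(xs) != len(ys):
            raise ValueError("columns in a block must have equal length")
        x.extend(xs)
        y.extend(ys)
        year.extend([label] * len(xs))  # one label per stacked row
    return x, y, year

x, y, year = stack_blocks([(1998, date_1998, pct_1998), (2000, date_2000, pct_2000)])
print(year)
```

Each row of the stacked columns keeps its year label, which is what the grouped scatterplot and the later indicator variable are built from.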

With the data in the appropriate format, we can make the scatterplot by selecting Graph > Scatterplot > With Regression and Groups from the Minitab menu. This will allow us to fit a separate regression line for each year.

We specify the Y variable, the X variable, and the Categorical variable, as shown in the following Scatterplot dialog box.


We notice from the scatterplot below that the intercepts for the two regression lines are quite different, but the slopes are about the same. Because the line for 2000 lies above the line for 1998, this indicates that efforts to increase the percentage reporting have been successful.

[Figure: Scatterplot of Percentage vs Reporting Date, with separate regression lines for Year 1998 and 2000]

Since efforts to increase the percentage of jurors reporting for jury duty appear to have been successful, we will add an indicator variable that is equal to 1 for year 2000 and 0 for year 1998. This indicator variable will be used as an explanatory variable in our regression model. To code the variable, we select Data > Code and then either Numeric to Numeric or Text to Numeric from the Minitab menu, depending on how the original variable is stored. In the dialog box, specify the original column, the new column for the coded values, the original values, and the new values. The original values can be specified as a range. For example, 0:50 means that any value from 0 to 50 is recoded.
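The recoding step can be sketched in Python as a list of (low, high, new value) rules checked in order, with inclusive ranges like Minitab's 0:50. The helper name `recode`, its `default` for unmatched values, and the example data are hypothetical choices for this sketch, not Minitab behavior.

```python
def recode(values, rules, default=None):
    """Map each value to the new value of the first (low, high, new) rule whose
    inclusive range contains it; values matching no rule get `default`."""
    out = []
    for v in values:
        for low, high, new in rules:
            if low <= v <= high:
                out.append(new)
                break
        else:  # no rule matched this value
            out.append(default)
    return out

years = [1998, 1998, 2000, 2000]
indicator = recode(years, [(1998, 1998, 0), (2000, 2000, 1)])
print(indicator)

# A range rule in the spirit of 0:50: any value from 0 to 50 becomes 1.
print(recode([10, 50, 51], [(0, 50, 1)], default=0))
```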

Once we have the indicator variable, we can include it as an explanatory variable in the multiple regression equation. We select Stat > Regression > Regression from the menu and enter the Response variable along with the Predictor variables. In the example below, we have two predictor variables, Reporting Date and the Indicator variable. The regression output appears after the Regression dialog box.


Regression Analysis: Percentage versus Reporting Date, Indicator

The regression equation is
Percentage = 77.1 - 0.717 Reporting Date + 17.8 Indicator

Predictor            Coef   SE Coef      T      P
Constant           77.082     2.130  36.19  0.000
Reporting Date    -0.7168    0.1241  -5.78  0.000
Indicator          17.833     1.861   9.58  0.000

S = 6.70905   R-Sq = 71.9%   R-Sq(adj) = 70.7%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       2  5637.4  2818.7  62.62  0.000
Residual Error  49  2205.6    45.0
Total           51  7842.9

Source          DF  Seq SS
Reporting Date   1  1503.0
Indicator        1  4134.4

Unusual Observations

     Reporting
Obs       Date  Percentage     Fit  SE Fit  Residual  St Resid
 43       17.0      65.900  82.729   1.386   -16.829    -2.56R
 46       20.0      94.700  80.579   1.543    14.121     2.16R

R denotes an observation with a large standardized residual.
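To see why a single indicator variable produces two parallel lines, the sketch below fits $\mu_y = \beta_0 + \beta_1 x + \beta_2 I$ by least squares, solving the normal equations with plain Gaussian elimination. The data are synthetic, generated without noise from round coefficients that resemble the Minitab output above, so the fit recovers them exactly; they are not the BPS data, and the helper names are hypothetical.

```python
def solve(a, b):
    """Solve the square system a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        beta[r] = (m[r][n] - sum(m[r][c] * beta[c] for c in range(r + 1, n))) / m[r][r]
    return beta

def least_squares(rows, y):
    """Fit y = X beta by solving the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic data: reporting dates 1..26 in each year; indicator 0 = 1998, 1 = 2000.
dates = list(range(1, 27)) * 2
ind = [0] * 26 + [1] * 26
pct = [77.0 - 0.7 * d + 18.0 * i for d, i in zip(dates, ind)]

b0, b1, b2 = least_squares([[1.0, d, i] for d, i in zip(dates, ind)], pct)
print(f"1998 line: {b0:.3f} + ({b1:.3f}) x")
print(f"2000 line: {b0 + b2:.3f} + ({b1:.3f}) x")
```

Setting the indicator to 0 or 1 changes only the intercept (b0 versus b0 + b2); the slope b1 is shared, which is exactly the equal‐slopes model.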

Inference for multiple regression

The Minitab output above provides the estimated regression coefficients: $b_0 = 77.082$, $b_1 = -0.7168$, and $b_2 = 17.833$. These values are rounded and presented in the regression equation

Percentage = 77.1 ‐ 0.717 Reporting Date + 17.8 Indicator

The estimate of σ is given as s = 6.70905. The estimate is calculated as

$$s = \sqrt{\frac{\sum e_i^2}{n - p - 1}}$$

where the $e_i$'s are the residuals and p is the number of predictor variables. In the column marked SE Coef are the estimated standard errors:

$SE_{b_0} = 2.130$, $SE_{b_1} = 0.1241$, and $SE_{b_2} = 1.861$. A level C confidence interval for $\beta_j$ can be computed as

$$b_j \pm t^* SE_{b_j}$$

where $t^*$ is the upper $(1 - C)/2$ critical value for the $t(n - p - 1)$ distribution. This is exactly the same as for simple linear regression. In this case, p = 2, so the number of degrees of freedom is n − 3. Since n = 52, the degrees of freedom are 49. To test the hypothesis $H_0: \beta_j = 0$, the value of t is computed as

$$t = \frac{b_j}{SE_{b_j}}$$

For each coefficient, this value appears in the column marked T. The values are given as 36.19, ‐5.78, and 9.58. The P‐values for a test against $H_a: \beta_j \neq 0$ are provided in the column marked P and are from the $t(n - p - 1)$ distribution.

The analysis of variance table for multiple regression is illustrated below. It has the same format as for simple linear regression. The only difference is that the number of degrees of freedom for the model increases from 1 to p, reflecting the fact that there are p explanatory variables. Similarly, the number of degrees of freedom for the error decreases from n − 2 to n − p − 1.

Analysis of Variance

Source       DF          SS            MS               F
Regression   p           Σ(ŷi − ȳ)²    MSM = SSM/DFM    MSM/MSE
Error        n − p − 1   Σ(yi − ŷi)²   MSE = SSE/DFE
Total        n − 1       Σ(yi − ȳ)²

The value of MSE is the estimate of $\sigma^2$. In the example above, it is given as 45.0. This value could also be obtained by squaring the estimate of σ (s = 6.70905). The ratio MSM/MSE is an F statistic for testing the null hypothesis

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$$

against the alternative hypothesis

$$H_a: \beta_j \neq 0 \text{ for at least one } j = 1, 2, \ldots, p$$

The test statistic has the $F(p, n - p - 1)$ distribution. In the example above, F = 62.62. The P‐value listed under the column marked P is given as 0.000 (that is, less than 0.0005). This means that there is strong evidence that at least one $\beta_j \neq 0$. The value of R‐Sq is listed above as 71.9%. This means that the proportion of the variation in the reporting percentage that is explained by the reporting date and the year indicator is

$$R^2 = \frac{SSM}{SST} = 0.719$$
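The quantities discussed in this section (s, the t ratios, the coefficient confidence intervals, F, and R²) can all be reproduced from the printed regression output. The Python check below uses only numbers shown in that output, plus a critical value t* = 2.0096 for 49 degrees of freedom read from a t table; that value is an assumption of the sketch (the standard library has no t quantile function), so the interval endpoints are approximate.

```python
import math

# Values printed in the Minitab output for Percentage vs Reporting Date, Indicator.
n, p = 52, 2                              # observations, predictor variables
ssm, sse, sst = 5637.4, 2205.6, 7842.9    # Regression, Residual Error, Total SS
coefs = {                                 # name: (b_j, SE_bj)
    "Constant": (77.082, 2.130),
    "Reporting Date": (-0.7168, 0.1241),
    "Indicator": (17.833, 1.861),
}
t_star = 2.0096  # upper 0.025 critical value of t(49), read from a t table (assumption)

df_error = n - p - 1
mse = sse / df_error
s = math.sqrt(mse)                        # estimate of sigma
print(f"df = {df_error}, MSE = {mse:.1f}, s = {s:.3f}")

for name, (b, se) in coefs.items():
    t = b / se                            # t statistic for H0: beta_j = 0
    lo, hi = b - t_star * se, b + t_star * se  # approximate 95% confidence interval
    print(f"{name:15s} t = {t:6.2f}  95% CI = ({lo:.3f}, {hi:.3f})")

f_stat = (ssm / p) / mse                  # F = MSM / MSE
r_sq = ssm / sst                          # R^2 = SSM / SST
print(f"F = {f_stat:.2f}, R-Sq = {r_sq:.1%}")
```

Rounded, the t ratios are 36.19, ‐5.78, and 9.58, F is 62.62, and R² is 71.9%, in agreement with the output; s comes out as about 6.709 because the printed SSE is itself rounded.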

An assumption of the multiple linear regression model is that the residuals are Normally distributed. This assumption should be verified by examining residual plots as well as a histogram or Normal quantile plot of the residuals. These can be obtained by clicking on the Graphs button in the Regression dialog box. To obtain prediction intervals for a new observation, click on the Options button after selecting Stat > Regression > Regression from the Minitab menu. In the Options subdialog box, enter the values of the explanatory variables for the new observation. The values must be entered in the same order as the predictors appear in the regression equation. The values entered below correspond to Reporting Date = 10 and Indicator = 1, that is, the year 2000.


Minitab computes a confidence interval for the mean response for these values as well as a prediction interval for an individual response for the same set of values. The output shown below appears at the bottom of the regression output. As with simple regression, the prediction interval is wider than the interval for the mean response, even though the predicted value is the same for both.

Predicted Values for New Observations

New
Obs     Fit  SE Fit            95% CI              95% PI
  1  87.747   1.386  (84.963, 90.531)  (73.980, 101.514)

Values of Predictors for New Observations

New  Reporting
Obs       Date  Indicator
  1       10.0       1.00
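Both intervals in this output can be reproduced from Fit, SE Fit, and S. The sketch below computes the fitted value from the estimated coefficients and uses $SE_{\hat y} = \sqrt{s^2 + SE_{\hat\mu}^2}$ for the prediction interval. The critical value t* = 2.0096 for t(49) is read from a t table and is an assumption of the sketch.

```python
import math

b0, b1, b2 = 77.082, -0.7168, 17.833   # estimated coefficients
s, se_fit = 6.70905, 1.386             # S and SE Fit from the output
t_star = 2.0096                        # t(49) critical value for 95%, from a t table

fit = b0 + b1 * 10 + b2 * 1            # Reporting Date = 10, Indicator = 1 (year 2000)
se_pred = math.sqrt(s ** 2 + se_fit ** 2)  # SE for predicting an individual response

ci = (fit - t_star * se_fit, fit + t_star * se_fit)    # mean response
pi = (fit - t_star * se_pred, fit + t_star * se_pred)  # individual response
print(f"Fit = {fit:.3f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI = ({pi[0]:.2f}, {pi[1]:.2f})")
```

The results agree with Minitab's intervals to about two decimal places; the small differences come from rounding in the printed inputs and in t*.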

The confidence interval for the mean response is listed under 95% CI as (84.963, 90.531). This interval is $\hat{\mu} \pm t^* SE_{\hat{\mu}}$, where $SE_{\hat{\mu}}$ is given as SE Fit = 1.386. The individual prediction interval is listed under 95% PI as (73.980, 101.514). This interval is $\hat{y} \pm t^* SE_{\hat{y}}$. The value of $SE_{\hat{y}}$ is not given on the Minitab output, but it is easily obtained from the following formula:

$$SE_{\hat{y}} = \sqrt{s^2 + SE_{\hat{\mu}}^2}$$

Remember that before using regression inference, the data must satisfy the regression model assumptions. We can examine residual plots to check the required assumptions by clicking on the Graphs button in the Regression dialog box. For multiple regression, it is convenient to select the Four in one radio button as shown below.

The following residual plots indicate that the linear trend, normality, and constant variance assumptions appear to be reasonable. Since the 26 groups are randomly selected each year, it is reasonable to assume that the reporting percentages are independent.

[Figure: Residual Plots for Percentage. Panels: Normal Probability Plot of the Residuals; Residuals Versus the Fitted Values; Histogram of the Residuals; Residuals Versus the Order of the Data]
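The St Resid column in the Unusual Observations part of the regression output can be checked by hand. A standardized residual divides the residual by its estimated standard deviation, $s\sqrt{1 - h_i}$, and the leverage $h_i$ can be recovered from SE Fit because SE Fit $= s\sqrt{h_i}$. The sketch uses only printed values; the cutoff of 2 for the R flag is an assumption about Minitab's rule.

```python
import math

s = 6.70905  # S from the regression output

# (observation, residual, SE Fit) for the two rows flagged with R.
unusual = [(43, -16.829, 1.386), (46, 14.121, 1.543)]

st_resid = {}
for obs, resid, se_fit in unusual:
    h = (se_fit / s) ** 2                           # leverage, since SE Fit = s * sqrt(h)
    st_resid[obs] = resid / (s * math.sqrt(1 - h))  # standardized residual
    flag = "R" if abs(st_resid[obs]) > 2 else ""    # assumed flagging rule: |St Resid| > 2
    print(f"Obs {obs}: St Resid = {st_resid[obs]:.2f}{flag}")
```

Both values round to the printed ‐2.56 and 2.16.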

The Multiple Linear Regression Model

When the relationship between a response and an explanatory variable is curved, a quadratic function may be an appropriate model. In Example 28.15 of BPS, a couple is examining the relationship between Price and Carat for diamonds. The data are given in TA28_04.MTW. A fitted line plot of the quadratic model is a good way to see whether a quadratic relationship is reasonable. To make a fitted line plot, select

Stat > Regression > Fitted Line Plot

from the menu and select a quadratic model as shown below. This fits a model of the form

$$\hat{y} = b_0 + b_1 x + b_2 x^2$$

as shown below the dialog box.

[Figure: Fitted Line Plot of Total Price versus Carat, Total Price = - 522.7 + 2386 Carat + 4498 Carat**2; S = 2126.76, R-Sq = 92.6%, R-Sq(adj) = 92.5%]
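Fitting the quadratic model is ordinary multiple regression on the two predictors x and x². A minimal sketch, assuming a handful of made‐up carat weights whose prices are generated exactly from the fitted equation in the plot, so the least squares solution should recover its coefficients; with real data the fit would not be exact. The helper names are hypothetical.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_quadratic(xs, ys):
    """Least squares fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    rows = [[1.0, x, x * x] for x in xs]  # columns: constant, x, x squared
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve(xtx, xty)

carats = [0.3, 0.5, 0.8, 1.0, 1.5, 2.0, 3.0]  # hypothetical carat weights
prices = [-522.7 + 2386 * c + 4498 * c ** 2 for c in carats]
b0, b1, b2 = fit_quadratic(carats, prices)
print(f"Total Price = {b0:.1f} + {b1:.0f} Carat + {b2:.0f} Carat**2")
```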

The following output shows a multiple regression model with the variables Carat and Carat².

Polynomial Regression Analysis: Total Price versus Carat

The regression equation is
Total Price = - 522.7 + 2386 Carat + 4498 Carat**2

S = 2126.76   R-Sq = 92.6%   R-Sq(adj) = 92.5%

Analysis of Variance

Source       DF           SS          MS        F      P
Regression    2  1.96158E+10  9807882561  2168.39  0.000
Error       348  1.57404E+09     4523116
Total       350  2.11898E+10

Sequential Analysis of Variance

Source     DF           SS        F      P
Linear      1  1.82930E+10  2203.91  0.000
Quadratic   1  1.32274E+09   292.44  0.000
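The summary numbers in this output are linked by the same ANOVA identities as in the jury example. Using only the printed sums of squares and degrees of freedom (n = 351 diamonds, since Total DF = 350), a short check:

```python
ss_reg, ss_err, ss_tot = 1.96158e10, 1.57404e9, 2.11898e10
df_reg, df_err = 2, 348

r_sq = ss_reg / ss_tot                            # SSM / SST
f_stat = (ss_reg / df_reg) / (ss_err / df_err)    # MSM / MSE
n = df_reg + df_err + 1                           # 351 observations
r_sq_adj = 1 - (1 - r_sq) * (n - 1) / (n - df_reg - 1)
print(f"R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}, F = {f_stat:.1f}")
```

The values agree with the printed R‐Sq = 92.6%, R‐Sq(adj) = 92.5%, and F = 2168.39, up to rounding of the sums of squares.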

The output shows that 92.6% of the variation in the total price of the diamonds in the database is explained using this quadratic regression model. However, residual plots show some problems. The model can be improved by using additional explanatory variables such as Color, Clarity, Depth, or Price/Carat in addition to Carat and Carat². When a categorical variable is used for prediction, it is usually best to create variables that indicate whether or not each value is in a particular category. For example, we may wish to use color to predict price. To make the appropriate indicator variables, select

Calc > Make Indicator Variables

from the menu. For example, the variable Color_D will be equal to 1 for diamonds with that color and equal to 0 for other diamonds.

In this example, there are 7 colors, so 7 indicator variables will be created. You can use at most 6 of these in a regression model that includes a constant, because the 7 indicators always add up to 1 and so duplicate the constant term. If you include all 7, Minitab will give a message indicating that one variable is highly correlated with other X variables. In this case, one variable will be removed from the equation.
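A sketch of what Calc > Make Indicator Variables produces (one 0/1 column per category) and of why one column must be left out of a model that has a constant: the indicators in each row always sum to 1. The color labels below are hypothetical examples, not the TA28_04.MTW data; the `Color_` naming mirrors Color_D in the text.

```python
colors = ["D", "G", "E", "J", "D", "H", "F", "I"]  # hypothetical color codes
levels = sorted(set(colors))                       # one indicator per distinct level

indicators = {f"Color_{lev}": [1 if c == lev else 0 for c in colors] for lev in levels}

# Every row has exactly one 1, so the indicator columns add up to the constant column.
row_sums = [sum(col[i] for col in indicators.values()) for i in range(len(colors))]
print(row_sums)

# Use all but one level in the regression; the omitted level becomes the baseline.
baseline = levels[0]
predictors = {name: col for name, col in indicators.items() if name != f"Color_{baseline}"}
print(sorted(predictors))
```

Each remaining indicator's coefficient then measures the difference from the omitted baseline color, holding the other predictors fixed.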

Interaction effects are easily modeled by multiplying two variables together. Select Calc > Calculator to create an interaction variable. Once all the indicator and interaction variables have been constructed, Minitab can help you to select from among the possible models. Select

Stat > Regression > Best Subsets

from the menu to identify the best fitting regression models that can be constructed with the predictor variables that you specify. Best subsets regression is an efficient way to identify models that achieve your goals with as few predictors as possible. By default, all possible subsets of the predictors are evaluated, beginning with all models containing one predictor, then all models containing two predictors, and so on. You may want to specify that particular variables, such as Carat and Carat² below, are included in all models. Models are evaluated based on R‐Sq, R‐Sq(adj), Mallows C‐p, and S.
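To make these criteria concrete, the sketch below mimics all‐subsets regression on a tiny synthetic data set: it builds an interaction column as a product of two variables, fits every subset of the candidate predictors, and reports R², adjusted R², Mallows' C_p, and s for each. It uses the standard formulas adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1) and C_p = SSE_k/MSE_full − (n − 2(k + 1)) for k predictors. The data, variable names, and helpers are hypothetical, and Minitab's own display of the results is not reproduced.

```python
import itertools
import math

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def sse_of(cols, y):
    """Residual sum of squares for y regressed on a constant plus the given columns."""
    rows = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))

# Hypothetical data: y depends on x1, x2, and their interaction, plus small fixed noise.
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x2 = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1]
y = [2 + 1.5 * a + 3 * b + 0.8 * a * b + e for a, b, e in zip(x1, x2, noise)]
cands = {"x1": x1, "x2": x2, "x1*x2": [a * b for a, b in zip(x1, x2)]}  # interaction column

n, ybar = len(y), sum(y) / len(y)
sst = sum((v - ybar) ** 2 for v in y)
mse_full = sse_of(list(cands.values()), y) / (n - len(cands) - 1)

for k in range(1, len(cands) + 1):
    for names in itertools.combinations(cands, k):
        sse = sse_of([cands[nm] for nm in names], y)
        r2 = 1 - sse / sst
        r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
        cp = sse / mse_full - (n - 2 * (k + 1))
        s = math.sqrt(sse / (n - k - 1))
        print(f"{k} {'+'.join(names):12s} R-Sq={r2:.3f} adj={r2_adj:.3f} Cp={cp:.1f} S={s:.3f}")
```

For the model with all candidates, C_p always equals k + 1, so models with C_p near k + 1 and a small S are the ones worth a closer look.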

EXERCISES

28.3 Table 28.2 and TA28‐02.MTW provide more data on the reporting percentages for randomly selected registered voters who received a summons to appear for jury duty in the Franklin County Municipal Court for 1985 and 1997 through 2004. Each year 26 different groups of potential jurors are randomly selected to serve two weeks of jury duty. The reporting dates vary slightly from year‐to‐year so they are coded sequentially from 1, the first group to report in January, to 26, the last group to report in December. The jury commissioner and other officials use the data in Table 28.2 to evaluate their efforts to improve turnout for the pool of potential jurors.


(a) Select Stat > Regression > Regression to compute the least squares regression line for predicting the reporting percentage from the coded reporting date in 1985.

(b) Compute the least squares regression line for predicting the reporting percentage from the coded reporting date in 1997.

(c) Interpret the value of the slope for each of your estimated models.

(d) Are the two estimated slopes about the same?

(e) Would you be willing to use the multiple regression model with equal slopes to predict the reporting percentages in 1985 and 1997? Explain why or why not.

28.4 In Example 28.3 the indicator variable for year ($x_2 = 0$ for 1998 and $x_2 = 1$ for 2000) was used to combine the two separate regression models into one. The data are in Table 28.1 of BPS and TA28‐01.MTW. Suppose that we instead use an indicator variable $x_3$ that reverses the two years, so that $x_3 = 1$ for 1998 and $x_3 = 0$ for 2000. The mean reporting percentage is $\mu_y = \beta_0 + \beta_1 x_1 + \beta_2 x_3$, where $x_1$ is the code for the reporting date and $x_3$ is an indicator variable to identify the year. Select Stat > Regression > Regression to find the estimated regression model.

(a) Select Data > Stack from the menu so that the data are in columns for reporting percentage, reporting date, and year. Select Data > Code from the menu to create the indicator variable $x_3$. Select Stat > Regression > Regression to obtain a least‐squares line for each year.

(b) How do your estimated regression lines in part (a) compare with the estimated regression lines provided for each year in Example 28.3?

(c) Will the regression standard error change when this new indicator variable is used? Explain.

28.5 Table 28.2 and TA28‐02.MTW provide more data on the reporting percentages for randomly selected registered voters who received a summons to appear for jury duty in the Franklin County Municipal Court for 1985 and 1997 through 2004.

(a) Select Stat > Regression > Regression to compute the least squares regression line for predicting the reporting percentage from the coded reporting date in 2003.

(b) Compute the least squares regression line for predicting the reporting percentage from the coded reporting date in 2004.

(c) Interpret the value of the slope for each of your estimated models.

(d) Are the two estimated slopes about the same?

(e) Would you be willing to use the multiple regression model with equal slopes to predict the reporting percentages in 2003 and 2004? Explain why or why not.

(f) How does the estimated slope in 2003 compare with the estimated slope obtained in Example 25.2 for 1998 and 2000?


(g) Based on the descriptive statistics and scatterplots provided in Exercise 28.3, Example 28.1, and above, do you think that the jury commissioner is happy with the modifications he made to improve the reporting percentages?

28.8 Does the general relationship between metabolic rate and body mass described in Example 28.10 hold for tobacco hornworm caterpillars? The data are provided in Table 28.3 in BPS and TA28_03.MTW.

(a) Select Stat > Regression > Regression to find the regression line for the response variable log(MR) on log(BM). Use the regression equation to estimate α and β in the general relationship $MR = \alpha\, BM^{\beta}$, or $\mu_{\log(MR)} = \log(\alpha) + \beta \log(BM)$. Recall that the predicted model is $\hat{y} = a + b \log(BM)$.

(b) Click on the graphs button and select the radio button for the Four in one graphs. Based on the graphs, do you think that the conditions for inference are satisfied?

(c) Identify the percentage of variation in log(MR) that is explained by using linear regression with the explanatory variable log(BM).

(d) Even if you noticed some departures from the conditions for inference, the researchers were interested in making inferences because this model is well known in the field and has been used for a variety of different insects and animals. Find a 95% confidence interval for the slope parameter β .

(e) Are the values $\beta = 2/3$ and $\beta = 3/4$ contained in your confidence interval?

(f) Use appropriate values from the Minitab output to test the claim that $\beta = 2/3$.

(g) Use appropriate values from the Minitab output to test the claim that $\beta = 3/4$.
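For parts (a), (f), and (g), the power law $MR = \alpha\, BM^{\beta}$ becomes linear after taking logs, $\log(MR) = \log(\alpha) + \beta \log(BM)$, so the fitted intercept a estimates log(α) and the slope b estimates β; back‐transforming gives α ≈ e^a for natural logs (or 10^a for base‐10 logs). The sketch below assumes natural logs and uses made‐up body masses with a noise‐free α and β so the recovered values are exact. The hypothetical helper `t_for_claim` computes t = (b − β₀)/SE_b, referred to t(n − 2); its inputs below are placeholders, not Minitab output.

```python
import math

def simple_fit(xs, ys):
    """Least squares intercept and slope for y = a + b*x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b

def t_for_claim(b, se_b, beta0):
    """t statistic for H0: beta = beta0, referred to t(n - 2) in simple regression."""
    return (b - beta0) / se_b

# Hypothetical noise-free data from MR = alpha * BM**beta with alpha = 2.5, beta = 0.75.
bm = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0]
mr = [2.5 * m ** 0.75 for m in bm]

a, b = simple_fit([math.log(m) for m in bm], [math.log(r) for r in mr])
print(f"beta estimate = {b:.3f}, alpha estimate = {math.exp(a):.3f}")

# Placeholder slope and standard error, only to show the arithmetic of the test.
print(round(t_for_claim(0.80, 0.05, 2 / 3), 2))
```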

28.9 Does the general relationship between metabolic rate and body mass described in Example 28.10 hold for tobacco hornworm caterpillars? The data are provided in Table 28.3 in BPS and TA28_03.MTW. Select Stat > Regression > Regression to find the regression line for the response variable log(MR) on log(BM). Use the output provided in Example 28.10 to answer the questions below.

(a) Find a 95% confidence interval for the slope parameter β for caterpillars during instar 4.

(b) If you were asked to report a confidence interval for the slope parameter β for caterpillars during instar 5, would you report the same interval that you calculated in part (a)? Explain why or why not.

(c) Are the values $\beta = 2/3$ and $\beta = 3/4$ contained in your confidence interval from part (a)?

(d) How does your confidence interval in part (a) compare with the confidence interval you computed in part (d) of Exercise 28.8?

(e) Use appropriate values from the output to test the claim that $\beta = 2/3$.

(f) Use appropriate values from the output to test the claim that $\beta = 3/4$.

28.14 Examine the relationship between math SAT scores and the percentage of high school graduates who take the SAT. Select Data > Code from the Minitab menu to create an indicator variable that is equal to 1 for states with at most 50% of students taking the SAT and equal to 0 for states with a higher percentage. Also, select Calc > Calculator from the menu to create an interaction variable equal to the indicator times the percent variable. Select Stat > Regression > Regression from the Minitab menu and use the percent, the indicator variable, and the interaction variable as explanatory variables to predict math SAT scores as described in Example 28.11. Click on Graphs and then Four in one to produce the residual graphs. Use the regression output below to answer the following questions.

(a) What is the estimated regression line for predicting mean verbal SAT score for states with more than half of high school graduates taking the SAT?

(b) What is the estimated regression line for predicting mean verbal SAT score for states with at most half of high school graduates taking the SAT?

(c) Does the ANOVA F statistic indicate that at least one of the explanatory variables is useful in predicting mean verbal SAT scores? Explain.

(d) Interpret the squared multiple correlation.

(e) A t distribution was used to compute the P‐values provided after each t statistic in the table. How many degrees of freedom does that t distribution have?

(f) Identify the value you would use to estimate the standard deviation σ.

(g) Select Graph > Scatterplot from the menu to create a scatterplot containing the estimated regression lines for each cluster.

(h) Examine the plot of the residuals against the fitted values. Does this plot indicate any serious problems with the conditions for inference?

(i) Use the histogram of the residuals to check the Normality condition. Do you think the residuals follow a Normal distribution?

28.15 The data in EX28‐15.MTW show the progress of world record times (in seconds) for the 10,000‐meter run for both men and women.

(a) Select Graph > Scatterplot > With Regression and Groups from the Minitab menu to make a scatterplot of world record time against year, using separate symbols for men and women. Describe the pattern for each sex. Then compare the progress of men and women.

(b) Select Stat > Regression > Regression to fit the model with two regression lines, one for women and one for men, and identify the estimated regression lines.

(c) Women began running this long distance later than men, so we might expect their improvement to be more rapid. Moreover, it is often said that men have little advantage over women in distance running as opposed to sprints, where muscular strength plays a greater role. Do the data appear to support these claims?

28.19 An experiment was conducted using a Geiger‐Mueller tube in a physics lab. Geiger‐Mueller tubes respond to gamma rays and to beta particles (electrons). A pulse that corresponds to each detection of a decay product is produced, and these pulses were counted using a computer‐based nuclear counting board. The time (in seconds) and counts of pulses for a short‐lived unstable isotope of silver are shown in Table 28.5 in BPS and TA20‐05.MTW.

(a) Select Graph > Scatterplot from the Minitab menu to create a plot of the counts versus time. Describe the pattern.

(b) Since some curvature is apparent in the scatterplot, you might want to consider the quadratic model for predicting counts based on time. Select Stat > Regression > Fitted Line Plot to fit the quadratic model. Identify the estimated mean response.

(c) Would you recommend the use of the quadratic model for predicting radioactive decay in this situation? Explain.

(d) Select Calc > Calculator from the Minitab menu to transform the counts using the natural logarithm and create a scatterplot of the transformed variable versus time.

(e) Select Stat > Regression > Fitted Line Plot to fit a simple linear regression model using the natural logarithm of the counts. Click the Graphs button to provide the appropriate residual plots.

(f) Does the simple linear regression model for the transformed counts fit the data better than the quadratic regression model? Explain.

28.20 Use the SAT data found in EX04‐04.MTW to evaluate similar models for SAT verbal scores.

(a) Select Stat > Regression > Fitted Line Plot to find the least squares line for predicting verbal SAT scores from percent taking the exam and make a plot of verbal SAT score versus percent taking the exam. Click the Graphs button to provide the appropriate residual plots.

(b) Are you happy with the fit of your model? Comment on the value of $R^2$ and the residual plots.

(c) Fit a model with two regression lines. Identify the two lines, parameter estimates, t statistics, and corresponding P‐values. Does this model improve the fit?

(d) Specify and fit the model suggested by the inferences for the model in part (c). Identify the two lines, parameter estimates, t statistics, and corresponding P‐values. Are you happy with the fit of this model? Explain.


28.24 Information regarding tuition and fees at a small liberal arts college from 1951 to 2005, with one exception, is provided in Table 28.7 in BPS and in TA28‐07.MTW.

(a) Select Stat > Regression > Regression to find the simple linear regression equation for predicting tuition and fees from year and save the residuals and fitted values.

(b) The value of tuition and fees in 1961 is missing from the data set. Click on the Options button in the Regression dialog box and enter 1961 in the Prediction interval for a new observation box to estimate the missing value.

(c) Does the estimate obtained in part (b) intuitively make sense to you? That is, are you happy with this estimate? Explain.

(d) Click on the Graphs button on the Regression dialog box to plot the residuals against year. What does the plot tell you about the adequacy of the linear fit?

(e) Will this linear model overestimate or underestimate the tuition and fees at this college in the 1990s?

(f) Since the residual plot shows a quadratic trend, it might be helpful to add a quadratic term to this model. Select Stat > Regression > Fitted Line Plot from the menu to fit the quadratic regression model and provide the estimated model.

(h) Does the quadratic model provide a better fit than the linear model?

(i) Would you be willing to make inferences based on the quadratic model? Explain.

28.25 Table 28.8 and TA28‐08.MTW contain data on the size of perch caught in a lake in Finland.

(a) Select Stat > Regression > Regression from the Minitab menu to fit the multiple regression model with two explanatory variables, length and width, to predict the weight of a perch.

(b) How much of the variation in the weight of perch is explained by the model in part (a)?

(c) Does the ANOVA table indicate that at least one of the explanatory variables is helpful in predicting the weight of perch? Explain.

(d) Do the individual t tests indicate that both $\beta_1$ and $\beta_2$ are significantly different from zero? Explain.

(e) Create a new variable, called interaction, that is the product of length and width. Use the multiple regression model with three explanatory variables, length, width and interaction, to predict the weight of a perch. Provide the estimated multiple regression equation.

(f) How much of the variation in the weight of perch is explained by the model in part (e)?

(g) Does the ANOVA table indicate that at least one of the explanatory variables is helpful in predicting the weight of perch? Explain.


(h) Describe how the individual t statistics changed when the interaction term was added.

28.27 Table 28.8 and TA28‐08.MTW contain data on the size of perch caught in a lake in Finland. Select Stat > Regression > Regression from the Minitab menu to fit a multiple regression model. Use explanatory variables length, width, and interaction from Exercise 28.25 on the 56 perch to provide confidence intervals for the mean and prediction intervals for future observations. Interpret both intervals for the tenth perch in the data set. What t distribution is used to provide both intervals?

28.28 The data provided in Table 28.6 of BPS and TA28‐06.MTW represent a random sample of 60 customers from a large clothing retailer. The manager of the store is interested in predicting how much a customer will spend on their next purchase. The average purchase amount

$$\text{Purchase12} = \frac{\text{Dollar12}}{\text{Freq12}}$$

was found to be a good predictor. So, the manager would like you to consider another explanatory variable that is the average purchase amount from the previous 12 months,

$$\text{Purchase12b} = \frac{\text{Dollar24} - \text{Dollar12}}{\text{Freq24} - \text{Freq12}}.$$
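A sketch of the two derived variables as Calc > Calculator would compute them row by row, assuming columns named Dollar12, Freq12, Dollar24, and Freq24 as in the formulas above and that the 24‐month columns include the most recent 12 months. The function name and customer values are hypothetical, and the guard for a zero purchase count is an assumption of this sketch: with no purchases in a period, the average is undefined.

```python
def average_purchases(dollar12, freq12, dollar24, freq24):
    """Return (Purchase12, Purchase12b) for one customer.

    Purchase12 averages the most recent 12 months; Purchase12b averages the
    12 months before that, obtained by subtracting the 12-month totals from
    the 24-month totals. None marks an average that is undefined because no
    purchases were made in that period.
    """
    p12 = dollar12 / freq12 if freq12 else None
    freq_prev = freq24 - freq12
    p12b = (dollar24 - dollar12) / freq_prev if freq_prev else None
    return p12, p12b

# Hypothetical customers: (Dollar12, Freq12, Dollar24, Freq24)
customers = [(300.0, 3, 500.0, 5), (120.0, 2, 120.0, 2)]
for c in customers:
    print(average_purchases(*c))
```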

(a) Select Calc > Calculator from the Minitab menu to create the new variables. Select Stat > Regression > Regression to fit a model for Amount using these new explanatory variables. What is the $R^2$ for this model? How does this value compare to the $R^2$ in Example 28.19 in BPS?

(b) What is the value of the individual t statistic for this new explanatory variable? How much did the individual t statistics change from their previous values?

(c) Would you recommend this model over the model in Example 28.19? Explain.

28.41 A multimedia statistics learning system includes a test of skill in using the computer's mouse. The software displays a circle at a random location on the computer screen. The subject clicks in the circle with the mouse as quickly as possible. A new circle appears as soon as the subject clicks the old one. Table 5.3 of BPS and TA05‐03.MTW give data for one subject's trials, 20 with each hand. Distance is the distance from the cursor location to the center of the new circle, in units whose actual size depends on the size of the screen. Time is the time required to click in the new circle, in milliseconds.

(a) Specify the population multiple regression model for predicting time from distance separately for each hand. Make sure you include the interaction term that is necessary to allow for the possibility of having different slopes. Explain in words what each β in your model means.

(b) Select Stat > Regression > Regression from the Minitab menu to find the estimated multiple regression equation for predicting time from distance separately for each hand. What percentage of variation in the times is explained by this multiple regression model?

(c) Explain how to use the estimated multiple regression equation in part (b) to obtain the least squares line for each hand. Select Graph > Scatterplot > With Regression and Groups to draw these lines on a scatterplot of time versus distance.

28.42 We assume that our wages will increase as we gain experience and become more valuable to our employers. Wages also increase because of inflation. By examining a sample of employees at a given point in time, we can look at part of the picture. How does length of service (LOS) relate to wages? Table 28.10 and TA28‐10.MTW give data on the LOS in months and wages for 60 women who work in Indiana banks. Wages are yearly total income divided by the number of weeks worked. We have multiplied wages by a constant for reasons of confidentiality.

(a) Select Graph > Scatterplot > With Groups to plot wages versus LOS using different symbols for size of the bank. There is one woman with relatively high wages for her length of service. Circle this point and do not use it in the rest of this exercise.

(b) Would you be willing to use a multiple regression model with parallel slopes to predict wages from LOS for the two different size banks? Explain.

(c) Select Stat > Regression > Regression to fit a model that will allow you to test the hypothesis that the slope of the regression line for small banks is equal to the slope of the regression line for large banks. Conduct the test for equal slopes.

(d) Are the conditions for inference met for your model in part (c)? Construct appropriate residual plots and comment.

28.43 Table 20.11 in BPS and TA28‐11.MTW contain data on the mean annual temperatures (degrees Fahrenheit) for the years 1951 to 2000 at two locations in California: Pasadena and Redding.

(a) Select Graph > Scatterplot > With Groups to plot the temperatures versus year using different symbols for the two cities.

(b) Would you be willing to use a multiple regression model with parallel slopes to predict temperatures from year for the two different cities? Explain.

(c) Select Stat > Regression > Regression from the Minitab menu to fit a model that will allow you to test the hypothesis that the slope of the regression line for Pasadena is equal to the slope of the regression line for Redding. Conduct the test for equal slopes.

(d) Are the conditions for inference met for your model in part (c)? Construct appropriate residual plots and comment.


28.46 Many exercise bikes, elliptical trainers, and treadmills display basic information like distance, speed, calories burned per hour (or total calories), and duration of the workout. The data in Table 28.9 and TA28‐09.MTW show the treadmill display's claimed calories per hour by speed for a 175‐pound male using a Cybex treadmill at inclines of 0%, 2%, and 4%.

(a) Select Calc > Make Indicator Variables from the Minitab menu to make indicator variables from the incline values. Select Stat > Regression > Regression to fit a multiple regression model to predict Calories from Speed and the incline indicator variables.

(b) How many separate lines are fit with this model? Do the lines all have the same slope? Identify each fitted line.

(c) Do you think this model provides a good fit for these data? Explain.

(d) Is there significant evidence that more calories are burned for higher speeds? State the hypotheses, identify the test statistic and P‐value, and provide a conclusion in the context of this question.

28.47 Table 28.13 provides data on speed and calories burned per hour for a 175‐pound male using two different treadmills (a Cybex and a LifeFitness) at inclines of 0%, 2%, and 4%. Select Data > Stack > Block of Columns from the Minitab menu to create columns for speed, calories, and treadmill/incline.

(a) Select Graph > Scatterplot > With Groups to create a scatterplot of calories against miles per hour using six different plotting symbols, one for each combination of incline level and machine.

(b) Select Calc > Make Indicator Variables from the Minitab menu to create indicator variables for brand of treadmill and incline. Select Stat > Regression > Regression to fit a multiple regression model to predict Calories from MPH, Ind_slow, NoIncline, 2%Incline, and Treadmill brand.

(c) Does the model provide a good fit for these data? Explain.

(d) Is there a significant difference in the relationship between calories and speed for the two different treadmills?

28.50 Table 28.15 and TA28‐15.MTW contain measured and self‐estimated reading ability data for 60 fifth‐grade students randomly sampled from one elementary school. The variables are: OBS = number for each individual, Sex = gender of the individual, LSS = median grade level of student's selection of "best for me to read" (8 reps, each with 4 choices at grades 3, 5, 7, 9 level), IQ = IQ score, Read = score on reading subtest of the Metropolitan Achievement Test, and EST = student's own estimate of his/her reading ability, scale 1 to 5 (1 = low).

(a) Select Calc > Make Indicator Variables from the menu to create an indicator variable for gender and fit an appropriate multiple regression model to see whether the relationship between measured and self‐estimated reading ability is the same for both boys and girls.

(b) Select Stat > Regression > Regression from the Minitab menu to fit a multiple regression model for predicting IQ from the explanatory variables LSS, Read, and EST. Are you happy with the fit of this model? Explain.

(c) Click on the Graphs button in the regression dialog box to produce the appropriate residual plots. Use residual plots to check the appropriate conditions for your model.

(d) Only two of the three explanatory variables have parameters that are significantly different from zero according to the individual t tests. Drop the explanatory variable that is not significant and add the interaction term for the two remaining explanatory variables. Are you surprised by the results from fitting this new model? Explain what happened to the individual t tests for the two explanatory variables.

28.53 Use the data provided in Table 28.4 of BPS and TA28‐04.MTW to fit the multiple regression model with two explanatory variables, carat and depth, to predict the total price of diamonds. Don't forget to include the interaction term in your model.

(a) Select Stat > Regression > Regression from the Minitab menu to identify the estimated multiple regression equation.

(b) Conduct the overall F‐test for the model.

(c) Identify the estimated regression parameters, standard errors, and t statistics with P‐values.

(d) Click on the Graphs button in the regression dialog box to prepare residual plots. Comment on whether the conditions for inference are satisfied.

(e) What percentage of variation in the total price is explained by this model?

(f) Find an estimate for σ and interpret this value.
