Ken Black QA 5th chapter14 Solution


Chapter 14: Simple Regression Analysis

    LEARNING OBJECTIVES

The overall objective of this chapter is to give you an understanding of bivariate regression analysis, thereby enabling you to:

1. Compute the equation of a simple regression line from a sample of data and interpret the slope and intercept of the equation.
2. Understand the usefulness of residual analysis in examining the fit of the regression line to the data and in testing the assumptions underlying regression analysis.
3. Compute a standard error of the estimate and interpret its meaning.
4. Compute a coefficient of determination and interpret it.
5. Test hypotheses about the slope of the regression model and interpret the results.
6. Estimate values of y using the regression model.
7. Develop a linear trend line and use it to forecast.

    CHAPTER TEACHING STRATEGY

This chapter is about all aspects of simple (bivariate, linear) regression. Early in the chapter, through scatter plots, the student begins to understand that the object of simple regression is to fit a line through the points. Fairly soon in the process, the student learns how to solve for the slope and y-intercept and develop the equation of the regression line. Most of the remaining material on simple regression is about determining how good the fit of the line is and whether the assumptions underlying the process are met.

The student begins to understand that by entering values of the independent variable into the regression model, predicted values can be determined. The question then becomes: Are the predicted values good estimates of the actual dependent values? One rule to emphasize is that the regression model should not be used to predict for independent variable values that are outside the range of values used to construct the model. MINITAB issues a warning when such an attempt is made. There are many instances where the relationship between x and y is linear over a given interval, but outside that interval the relationship becomes curvilinear or unpredictable. Of course, with this caution having been given, many forecasters use such regression models to extrapolate to values of x outside the domain of those used to construct the model. Such forecasts are introduced in Section 14.8, Using Regression to Develop a Forecasting Trend Line. Whether the forecasts obtained under such conditions are any better than "seat of the pants" or "crystal ball" estimates remains to be seen.

The concept of residual analysis is a good one for showing graphically and numerically how the model relates to the data and the fact that it fits some points more closely than others. A graphical or numerical analysis of residuals demonstrates that the regression line fits the data in a manner analogous to the way a mean fits a set of numbers: the regression line passes through the points such that the vertical distances from the actual y values to the predicted values sum to zero. The fact that the residuals sum to zero points out the need to square the errors (residuals) in order to get a handle on total error. This leads to the sum of squares of error and then on to the standard error of the estimate. In addition, students can learn why the process is called least squares analysis (the slope and intercept formulas are derived by calculus such that the sum of squares of error is minimized, hence "least squares"). Students can learn that by examining the values of se, the residuals, r², and the t ratio used to test the slope, they can begin to make a judgment about the fit of the model to the data. Many of the chapter problems ask the student to comment on these items (se, r², etc.).
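To make the point that the residuals sum to zero (and that squaring them is what yields a usable total-error measure), a short Python sketch with made-up data can be shown in class; the numbers below are illustrative only and are not from the text.

    # Fit a least-squares line to a small made-up data set, then show that the
    # residuals sum to (essentially) zero while the squared residuals give SSE.
    x = [1, 2, 3, 4, 5]
    y = [2.1, 2.9, 4.2, 4.8, 6.1]          # hypothetical data
    n = len(x)

    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
         sum((xi - x_bar) ** 2 for xi in x)
    b0 = y_bar - b1 * x_bar

    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    print(round(sum(residuals), 10))                 # 0.0 (up to rounding)
    print(round(sum(r * r for r in residuals), 4))   # SSE, the basis for se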

It is my view that for many of these students, the most important facet of this chapter lies in understanding the "buzz" words of regression, such as standard error of the estimate and coefficient of determination, because they may encounter regression again only as some type of computer printout to be deciphered. The concepts, then, may be more important than the calculations.


    CHAPTER OUTLINE

    14.1 Introduction to Simple Regression Analysis

    14.2 Determining the Equation of the Regression Line

14.3 Residual Analysis
     Using Residuals to Test the Assumptions of the Regression Model
     Using the Computer for Residual Analysis

    14.4 Standard Error of the Estimate

14.5 Coefficient of Determination
     Relationship Between r and r²

14.6 Hypothesis Tests for the Slope of the Regression Model and Testing the Overall Model
     Testing the Slope
     Testing the Overall Model

14.7 Estimation
     Confidence Intervals to Estimate the Conditional Mean of y: μy|x
     Prediction Intervals to Estimate a Single Value of y

14.8 Using Regression to Develop a Forecasting Trend Line
     Determining the Equation of the Trend Line
     Forecasting Using the Equation of the Trend Line
     Alternate Coding for Time Periods

14.9 Interpreting Computer Output

    KEY TERMS

Coefficient of Determination (r²)        Prediction Interval
Confidence Interval                      Probabilistic Model
Dependent Variable                       Regression Analysis
Deterministic Model                      Residual
Heteroscedasticity                       Residual Plot
Homoscedasticity                         Scatter Plot
Independent Variable                     Simple Regression
Least Squares Analysis                   Standard Error of the Estimate (se)
Outliers                                 Sum of Squares of Error (SSE)


    SOLUTIONS TO CHAPTER 14

14.1
      x     y
     12    17
     21    15
     28    22
      8    19
     20    24

[Scatter plot of y versus x]

Σx = 89   Σy = 97   Σxy = 1,767
Σx² = 1,833   Σy² = 1,935   n = 5

b1 = SSxy/SSxx = [Σxy - (Σx)(Σy)/n] / [Σx² - (Σx)²/n]
   = [1,767 - (89)(97)/5] / [1,833 - (89)²/5] = 0.162

b0 = Σy/n - b1(Σx/n) = 97/5 - (0.162)(89/5) = 16.51

ŷ = 16.51 + 0.162x
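As a quick arithmetic check, the slope and intercept can be reproduced from the summary totals with a short Python sketch; the function name is illustrative, and the code simply mirrors the SSxy/SSxx formulas used throughout these solutions.

    # Least-squares slope and intercept from summary totals (14.1 figures).
    def slope_intercept(sum_x, sum_y, sum_xy, sum_x2, n):
        ss_xy = sum_xy - (sum_x * sum_y) / n     # SSxy = Σxy - (Σx)(Σy)/n
        ss_xx = sum_x2 - (sum_x ** 2) / n        # SSxx = Σx² - (Σx)²/n
        b1 = ss_xy / ss_xx                       # slope
        b0 = sum_y / n - b1 * (sum_x / n)        # intercept = ȳ - b1·x̄
        return b0, b1

    b0, b1 = slope_intercept(89, 97, 1767, 1833, 5)
    print(round(b0, 2), round(b1, 3))            # 16.51 0.162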


14.2
      x      y
    140     25
    119     29
    103     46
     91     70
     65     88
     29    112
     24    128

Σx = 571   Σy = 498   Σxy = 30,099
Σx² = 58,293   Σy² = 45,154   n = 7

b1 = SSxy/SSxx = [30,099 - (571)(498)/7] / [58,293 - (571)²/7] = -0.898

b0 = Σy/n - b1(Σx/n) = 498/7 - (-0.898)(571/7) = 144.414

ŷ = 144.414 - 0.898x


14.3
  Advertising (x)   Sales (y)
       12.5            148
        3.7             55
       21.6            338
       60.0            994
       37.6            541
        6.1             89
       16.8            126
       41.2            379

Σx = 199.5   Σy = 2,670   Σxy = 107,610.4   Σx² = 7,667.15   Σy² = 1,587,328   n = 8

b1 = SSxy/SSxx = [107,610.4 - (199.5)(2,670)/8] / [7,667.15 - (199.5)²/8] = 15.240

b0 = Σy/n - b1(Σx/n) = 2,670/8 - (15.240)(199.5/8) = -46.292

ŷ = -46.292 + 15.240x

14.4
  Prime (x)   Bond (y)
      16          5
       6         12
       8          9
       4         15
       7          7

Σx = 41   Σy = 48   Σxy = 333   Σx² = 421   Σy² = 524   n = 5

b1 = SSxy/SSxx = [333 - (41)(48)/5] / [421 - (41)²/5] = -0.715

b0 = Σy/n - b1(Σx/n) = 48/5 - (-0.715)(41/5) = 15.460

ŷ = 15.460 - 0.715x


14.5
  Bankruptcies (y)   Firm Births (x)
       34.3               58.1
       35.0               55.4
       38.5               57.0
       40.1               58.5
       35.5               57.4
       37.9               58.0

Σx = 344.4   Σy = 221.3   Σx² = 19,774.78   Σy² = 8,188.41   Σxy = 12,708.08   n = 6

b1 = SSxy/SSxx = [12,708.08 - (344.4)(221.3)/6] / [19,774.78 - (344.4)²/6] = 0.878

b0 = Σy/n - b1(Σx/n) = 221.3/6 - (0.878)(344.4/6) = -13.503

ŷ = -13.503 + 0.878x


14.6
  No. of Farms (x)   Avg. Size (y)
       5.65               213
       4.65               258
       3.96               297
       3.36               340
       2.95               374
       2.52               420
       2.44               426
       2.29               441
       2.15               460
       2.07               469
       2.17               434
       2.10               444

Σx = 36.31   Σy = 4,576   Σx² = 124.7931   Σy² = 1,825,028   Σxy = 12,766.71   n = 12

b1 = SSxy/SSxx = [12,766.71 - (36.31)(4,576)/12] / [124.7931 - (36.31)²/12] = -72.328

b0 = Σy/n - b1(Σx/n) = 4,576/12 - (-72.328)(36.31/12) = 600.186

ŷ = 600.186 - 72.328x


14.7
  Steel (x)   New Orders (y)
      99.9          2.74
      97.9          2.87
      98.9          2.93
      87.9          2.87
      92.9          2.98
      97.9          3.09
     100.6          3.36
     104.9          3.61
     105.3          3.75
     108.6          3.95

Σx = 994.8   Σy = 32.15   Σx² = 99,293.28   Σy² = 104.9815   Σxy = 3,216.652   n = 10

b1 = SSxy/SSxx = [3,216.652 - (994.8)(32.15)/10] / [99,293.28 - (994.8)²/10] = 0.05557

b0 = Σy/n - b1(Σx/n) = 32.15/10 - (0.05557)(994.8/10) = -2.31307

ŷ = -2.31307 + 0.05557x


14.8
      x     y
     15    47
      8    36
     19    56
     12    44
      5    21

ŷ = 13.625 + 2.303x

Residuals:

      x     y       ŷ        (y - ŷ)
     15    47    48.1694    -1.1694
      8    36    32.0489     3.9511
     19    56    57.3811    -1.3811
     12    44    41.2606     2.7394
      5    21    25.1401    -4.1401
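The predicted values and residuals in the table above can be reproduced (up to rounding of the printed coefficients) with a brief Python sketch:

    # Predicted values and residuals for ŷ = 13.625 + 2.303x (14.8 data).
    x = [15, 8, 19, 12, 5]
    y = [47, 36, 56, 44, 21]
    b0, b1 = 13.625, 2.303

    for xi, yi in zip(x, y):
        y_hat = b0 + b1 * xi              # predicted value
        resid = yi - y_hat                # residual = actual - predicted
        print(xi, yi, round(y_hat, 4), round(resid, 4))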

14.9
      x     y    Predicted (ŷ)   Residuals (y - ŷ)
     12    17      18.4582          -1.4582
     21    15      19.9196          -4.9196
     28    22      21.0563           0.9437
      8    19      17.8087           1.1913
     20    24      19.7572           4.2428

ŷ = 16.51 + 0.162x

14.10
      x     y    Predicted (ŷ)   Residuals (y - ŷ)
    140    25      18.6597           6.3403
    119    29      37.5229          -8.5229
    103    46      51.8948          -5.8948
     91    70      62.6737           7.3263
     65    88      86.0281           1.9720
     29   112     118.3648          -6.3648
     24   128     122.8561           5.1439

ŷ = 144.414 - 0.898x


14.11
      x      y    Predicted (ŷ)   Residuals (y - ŷ)
    12.5    148     144.2053           3.7947
     3.7     55      10.0954          44.9047
    21.6    338     282.8873          55.1127
    60.0    994     868.0945         125.9055
    37.6    541     526.7236          14.2764
     6.1     89      46.6708          42.3292
    16.8    126     209.7364         -83.7364
    41.2    379     581.5868        -202.5868

ŷ = -46.292 + 15.240x

14.12
      x     y    Predicted (ŷ)   Residuals (y - ŷ)
     16     5       4.0259           0.9741
      6    12      11.1722           0.8278
      8     9       9.7429          -0.7429
      4    15      12.6014           2.3986
      7     7      10.4576          -3.4575

ŷ = 15.460 - 0.715x

14.13
      x      y    Predicted (ŷ)   Residuals (y - ŷ)
    58.1   34.3     37.4978          -3.1978
    55.4   35.0     35.1277          -0.1277
    57.0   38.5     36.5322           1.9678
    58.5   40.1     37.8489           2.2511
    57.4   35.5     36.8833          -1.3833
    58.0   37.9     37.4100           0.4900

The residual for x = 58.1 is relatively large, but the residual for x = 55.4 is quite small.


14.14
      x     y    Predicted (ŷ)   Residuals (y - ŷ)
      5    47      42.2756           4.7244
      7    38      38.9836          -0.9836
     11    32      32.3997          -0.3996
     12    24      30.7537          -6.7537
     19    22      19.2317           2.7683
     25    10       9.3558           0.6442

ŷ = 50.506 - 1.646x

No apparent violation of assumptions.

14.15
  Miles (x)   Cost (y)      ŷ       (y - ŷ)
    1,245       2.64      2.5376     .1024
      425       2.31      2.3322    -.0222
    1,346       2.45      2.5629    -.1128
      973       2.52      2.4694     .0506
      255       2.19      2.2896    -.0996
      865       2.55      2.4424     .1076
    1,080       2.40      2.4962    -.0962
      296       2.37      2.2998     .0702

ŷ = 2.2257 + 0.00025x

No apparent violation of assumptions.


    14.16

The error terms appear to be nonindependent.

    14.17

The relationship appears to be nonlinear rather than linear.

14.18 The MINITAB Residuals vs. Fits graphic is strongly indicative of a violation of the homoscedasticity assumption of regression. Because the residuals are very close together for small values of x, there is little variability in the residuals at the left end of the graph. On the other hand, for larger values of x, the graph flares out, indicating much greater variability at the upper end. Thus, there is a lack of homogeneity of error across the values of the independent variable.


14.19  SSE = Σy² - b0Σy - b1Σxy = 1,935 - (16.51)(97) - (0.1624)(1,767) = 46.5692

se = √(SSE/(n - 2)) = √(46.5692/3) = 3.94

Approximately 68% of the residuals should fall within ±1se.
3 out of 5, or 60%, of the actual residuals fell within ±1se.
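The SSE and standard-error arithmetic in 14.19 through 14.26 follows one pattern; a small Python sketch of the shortcut formula, using the 14.19 totals:

    import math

    # SSE and standard error of the estimate from summary totals (14.19 figures).
    sum_y2, sum_y, sum_xy, n = 1935, 97, 1767, 5
    b0, b1 = 16.51, 0.1624

    sse = sum_y2 - b0 * sum_y - b1 * sum_xy     # SSE = Σy² - b0Σy - b1Σxy
    se = math.sqrt(sse / (n - 2))               # se = sqrt(SSE / (n - 2))
    print(round(sse, 4), round(se, 2))          # 46.5692 3.94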

14.20  SSE = Σy² - b0Σy - b1Σxy = 45,154 - (144.414)(498) - (-0.89824)(30,099) = 272.0

se = √(SSE/(n - 2)) = √(272.0/5) = 7.376

6 out of 7 = 85.7% fall within ±1se
7 out of 7 = 100% fall within ±2se

14.21  SSE = Σy² - b0Σy - b1Σxy = 1,587,328 - (-46.29)(2,670) - (15.24)(107,610.4) = 70,940

se = √(SSE/(n - 2)) = √(70,940/6) = 108.7

Six out of eight (75%) of the sales estimates are within $108.7 million.

14.22  SSE = Σy² - b0Σy - b1Σxy = 524 - (15.46)(48) - (-0.71462)(333) = 19.8885

se = √(SSE/(n - 2)) = √(19.8885/3) = 2.575

Four out of five (80%) of the estimates are within 2.575 of the actual rate for bonds. This amount of error is probably not acceptable to financial analysts.


14.23
      x      y    Predicted (ŷ)   Residuals (y - ŷ)   (y - ŷ)²
    58.1   34.3     37.4978          -3.1978          10.2259
    55.4   35.0     35.1277          -0.1277           0.0163
    57.0   38.5     36.5322           1.9678           3.8722
    58.5   40.1     37.8489           2.2511           5.0675
    57.4   35.5     36.8833          -1.3833           1.9135
    58.0   37.9     37.4100           0.4900           0.2401

Σ(y - ŷ)² = 21.3355

SSE = Σ(y - ŷ)² = 21.3355

se = √(SSE/(n - 2)) = √(21.3355/4) = 2.3095

This standard error of the estimate indicates that the regression model is within ±2.3095(1,000) bankruptcies about 68% of the time. In this particular problem, 5/6 or 83.3% of the residuals are within this standard error of the estimate.

14.24
   (y - ŷ)    (y - ŷ)²
    4.7244    22.3200
   -0.9836     0.9675
   -0.3996     0.1597
   -6.7537    45.6125
    2.7683     7.6635
    0.6442     0.4150

Σ(y - ŷ)² = 77.1382

SSE = Σ(y - ŷ)² = 77.1382

se = √(SSE/(n - 2)) = √(77.1382/4) = 4.391


14.25
   (y - ŷ)    (y - ŷ)²
    .1024      .0105
   -.0222      .0005
   -.1129      .0127
    .0506      .0026
   -.0996      .0099
    .1076      .0116
   -.0962      .0093
    .0702      .0049

Σ(y - ŷ)² = .0620

SSE = Σ(y - ŷ)² = .0620

se = √(SSE/(n - 2)) = √(.0620/6) = .1017

The model produces estimates that are within .1017, or about 10 cents, 68% of the time. However, the range of milk costs is only 45 cents for these data.

14.26
  Volume (x)   Sales (y)
     728.6        10.5
     497.9        48.1
     439.1        64.8
     377.9        20.1
     375.5        11.4
     363.8       123.8
     276.3        89.0

n = 7   Σx = 3,059.1   Σy = 367.7
Σx² = 1,464,071.97   Σy² = 30,404.31   Σxy = 141,558.6

b1 = -0.1504   b0 = 118.257

ŷ = 118.257 - 0.1504x

SSE = Σy² - b0Σy - b1Σxy
    = 30,404.31 - (118.257)(367.7) - (-0.1504)(141,558.6) = 8,211.6245

se = √(SSE/(n - 2)) = √(8,211.6245/5) = 40.526

This is a relatively large standard error of the estimate given the sales values (which range from 10.5 to 123.8).


14.27  r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 46.6399/[1,935 - (97)²/5] = .123

This is a low value of r².
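The coefficient-of-determination calculations in 14.27 through 14.31 all use the same formula; a minimal Python sketch with the 14.27 figures:

    # r² = 1 - SSE/SSyy, where SSyy = Σy² - (Σy)²/n  (14.27 figures).
    sse = 46.6399
    sum_y2, sum_y, n = 1935, 97, 5

    ss_yy = sum_y2 - sum_y ** 2 / n      # total variation in y
    r_sq = 1 - sse / ss_yy
    print(round(r_sq, 3))                # 0.123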

14.28  r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 272.121/[45,154 - (498)²/7] = .972

This is a high value of r².

14.29  r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 70,940/[1,587,328 - (2,670)²/8] = .898

This value of r² is relatively high.

14.30  r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 19.8885/[524 - (48)²/5] = .685

This is a modest value of r². 68.5% of the variation in y is accounted for by x, but 31.5% is unaccounted for.

14.31  r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 21.3355/[8,188.41 - (221.3)²/6] = .183

This is a low value of r². Only 18.3% of the variability in y is accounted for by the x values; 81.7% is unaccounted for.


14.32
  CCI (y)   Median Income (x)
   116.8        37.415
    91.5        36.770
    68.5        35.501
    61.6        35.047
    65.9        34.700
    90.6        34.942
   100.0        35.887
   104.6        36.306
   125.4        37.005

Σx = 323.573   Σy = 824.9   Σx² = 11,640.93413   Σy² = 79,718.79   Σxy = 29,804.4505   n = 9

b1 = SSxy/SSxx = [29,804.4505 - (323.573)(824.9)/9] / [11,640.93413 - (323.573)²/9] = 19.2204

b0 = Σy/n - b1(Σx/n) = 824.9/9 - (19.2204)(323.573/9) = -599.3674

ŷ = -599.3674 + 19.2204x

SSE = Σy² - b0Σy - b1Σxy = 79,718.79 - (-599.3674)(824.9) - (19.2204)(29,804.4505) = 1,283.13435

se = √(SSE/(n - 2)) = √(1,283.13435/7) = 13.539

r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 1,283.13435/[79,718.79 - (824.9)²/9] = .688

14.33  sb = se/√(Σx² - (Σx)²/n) = 3.94/√(1,833 - (89)²/5) = .2498

b1 = 0.162

H0: β = 0     α = .05
Ha: β ≠ 0

This is a two-tailed test, α/2 = .025     df = n - 2 = 5 - 2 = 3

t.025,3 = ±3.182

t = (b1 - 0)/sb = (0.162 - 0)/.2498 = 0.65

Since the observed t = 0.65 < t.025,3 = 3.182, the decision is to fail to reject the null hypothesis.
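A compact Python sketch of the slope t test used in 14.33 through 14.37, with the 14.33 numbers; the critical value is read from the t table, as in the solution.

    import math

    # t test of H0: β = 0 for the slope (14.33 figures).
    se, sum_x2, sum_x, n = 3.94, 1833, 89, 5
    b1 = 0.162

    ss_xx = sum_x2 - sum_x ** 2 / n
    s_b = se / math.sqrt(ss_xx)              # standard error of the slope
    t = (b1 - 0) / s_b                       # observed test statistic
    t_crit = 3.182                           # t.025,3 from the table (alpha = .05, df = 3)
    print(round(s_b, 4), round(t, 2), abs(t) > t_crit)   # 0.2498 0.65 False -> fail to reject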

14.34  sb = se/√(Σx² - (Σx)²/n) = 7.376/√(58,293 - (571)²/7) = .068145

b1 = -0.898

H0: β = 0     α = .01
Ha: β ≠ 0

Two-tailed test, α/2 = .005     df = n - 2 = 7 - 2 = 5

t.005,5 = ±4.032

t = (b1 - 0)/sb = (-0.898 - 0)/.068145 = -13.18

Since the observed t = -13.18 < t.005,5 = -4.032, the decision is to reject the null hypothesis.

14.35  sb = se/√(Σx² - (Σx)²/n) = 108.7/√(7,667.15 - (199.5)²/8) = 2.095

b1 = 15.240

H0: β = 0     α = .10
Ha: β ≠ 0

For a two-tailed test, α/2 = .05     df = n - 2 = 8 - 2 = 6

t.05,6 = ±1.943

t = (b1 - 0)/sb = (15.240 - 0)/2.095 = 7.27

Since the observed t = 7.27 > t.05,6 = 1.943, the decision is to reject the null hypothesis.

14.36  sb = se/√(Σx² - (Σx)²/n) = 2.575/√(421 - (41)²/5) = .27963

b1 = -0.715

H0: β = 0     α = .05
Ha: β ≠ 0

For a two-tailed test, α/2 = .025     df = n - 2 = 5 - 2 = 3

t.025,3 = ±3.182

t = (b1 - 0)/sb = (-0.715 - 0)/.27963 = -2.56

Since the observed t = -2.56 > t.025,3 = -3.182, the decision is to fail to reject the null hypothesis.


14.37  sb = se/√(Σx² - (Σx)²/n) = 2.3095/√(19,774.78 - (344.4)²/6) = 0.926025

b1 = 0.878

H0: β = 0     α = .05
Ha: β ≠ 0

For a two-tailed test, α/2 = .025     df = n - 2 = 6 - 2 = 4

t.025,4 = ±2.776

t = (b1 - 0)/sb = (0.878 - 0)/.926025 = 0.948

Since the observed t = 0.948 < t.025,4 = 2.776, the decision is to fail to reject the null hypothesis.

14.38  F = 8.26 with a p-value of .021. The overall model is significant at α = .05 but not at α = .01. For simple regression,

t = √F = 2.874

t.05,8 = 1.86 but t.01,8 = 2.896. The slope is significant at α = .05 but not at α = .01.
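A one-line numerical check of the relationship between t and F cited in 14.38 (for simple regression the slope t statistic squared equals the overall F statistic):

    import math

    # In simple regression, t² = F for the slope test, so t = sqrt(F).
    F = 8.26
    print(round(math.sqrt(F), 3))    # 2.874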


14.39  x0 = 25
95% confidence, α/2 = .025
df = n - 2 = 5 - 2 = 3     t.025,3 = 3.182

x̄ = Σx/n = 89/5 = 17.8

Σx = 89   Σx² = 1,833   se = 3.94

ŷ = 16.5 + 0.162(25) = 20.55

ŷ ± t(α/2,n-2) · se · √[1/n + (x0 - x̄)²/(Σx² - (Σx)²/n)]

20.55 ± 3.182(3.94) √[1/5 + (25 - 17.8)²/(1,833 - (89)²/5)]

= 20.55 ± 3.182(3.94)(.63903) = 20.55 ± 8.01

12.54 < E(y25) < 28.56
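A Python sketch of the confidence interval for the conditional mean E(y|x0) computed in 14.39; the t value is taken from the table rather than recomputed.

    import math

    # 95% confidence interval for E(y | x0 = 25), 14.39 figures.
    sum_x, sum_x2, n = 89, 1833, 5
    se, t_crit = 3.94, 3.182                 # t.025,3 from the table
    b0, b1, x0 = 16.5, 0.162, 25

    x_bar = sum_x / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    y_hat = b0 + b1 * x0
    half = t_crit * se * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ss_xx)
    print(round(y_hat - half, 2), round(y_hat + half, 2))   # 12.54 28.56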


14.40  x0 = 100
For 90% confidence, α/2 = .05
df = n - 2 = 7 - 2 = 5     t.05,5 = 2.015

x̄ = Σx/n = 571/7 = 81.57143

Σx = 571   Σx² = 58,293   se = 7.377

ŷ = 144.414 - 0.898(100) = 54.614

ŷ ± t(α/2,n-2) · se · √[1 + 1/n + (x0 - x̄)²/(Σx² - (Σx)²/n)]

= 54.614 ± 2.015(7.377) √[1 + 1/7 + (100 - 81.57143)²/(58,293 - (571)²/7)]

= 54.614 ± 2.015(7.377)(1.08252) = 54.614 ± 16.091

38.523 < y < 70.705

For x0 = 130:   ŷ = 144.414 - 0.898(130) = 27.674

= 27.674 ± 2.015(7.377) √[1 + 1/7 + (130 - 81.57143)²/(58,293 - (571)²/7)]

= 27.674 ± 2.015(7.377)(1.1589) = 27.674 ± 17.227

10.447 < y < 44.901

The interval for y at x0 = 130 is wider than the interval for y at x0 = 100 because x0 = 100 is nearer to x̄ = 81.57 than is x0 = 130.


14.41  x0 = 20
For 98% confidence, α/2 = .01
df = n - 2 = 8 - 2 = 6     t.01,6 = 3.143

x̄ = Σx/n = 199.5/8 = 24.9375

Σx = 199.5   Σx² = 7,667.15   se = 108.8

ŷ = -46.29 + 15.24(20) = 258.51

Confidence interval for E(y20):

ŷ ± t(α/2,n-2) · se · √[1/n + (x0 - x̄)²/(Σx² - (Σx)²/n)]

258.51 ± (3.143)(108.8) √[1/8 + (20 - 24.9375)²/(7,667.15 - (199.5)²/8)]

258.51 ± (3.143)(108.8)(0.36614) = 258.51 ± 125.20

133.31 < E(y20) < 383.71

For a single y value:

ŷ ± t(α/2,n-2) · se · √[1 + 1/n + (x0 - x̄)²/(Σx² - (Σx)²/n)]

258.51 ± (3.143)(108.8) √[1 + 1/8 + (20 - 24.9375)²/(7,667.15 - (199.5)²/8)]

258.51 ± (3.143)(108.8)(1.06492) = 258.51 ± 364.16

-105.65 < y < 622.67

The interval for the single value of y is wider than the interval for the average value of y because the average is more toward the middle and individual values of y can vary more than values of the average.
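A sketch contrasting the two interval widths in 14.41; the only difference is the extra "1 +" under the square root for the single-value interval.

    import math

    # Interval for E(y|x0) vs. interval for a single new y (14.41 figures).
    sum_x, sum_x2, n = 199.5, 7667.15, 8
    se, t_crit = 108.8, 3.143                  # t.01,6 from the table
    b0, b1, x0 = -46.29, 15.24, 20

    x_bar = sum_x / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    y_hat = b0 + b1 * x0
    core = 1 / n + (x0 - x_bar) ** 2 / ss_xx

    mean_half = t_crit * se * math.sqrt(core)        # half-width for E(y20)
    single_half = t_crit * se * math.sqrt(1 + core)  # half-width for a single y
    print(round(y_hat, 2), round(mean_half, 2), round(single_half, 2))  # 258.51 125.2 364.16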


14.42  x0 = 10
For 99% confidence, α/2 = .005
df = n - 2 = 5 - 2 = 3     t.005,3 = 5.841

x̄ = Σx/n = 41/5 = 8.20

Σx = 41   Σx² = 421   se = 2.575

ŷ = 15.46 - 0.715(10) = 8.31

ŷ ± t(α/2,n-2) · se · √[1/n + (x0 - x̄)²/(Σx² - (Σx)²/n)]

8.31 ± 5.841(2.575) √[1/5 + (10 - 8.2)²/(421 - (41)²/5)]

= 8.31 ± 5.841(2.575)(.488065) = 8.31 ± 7.34

0.97 < E(y10) < 15.65

If the prime interest rate is 10%, we are 99% confident that the average bond rate is between 0.97% and 15.65%.


14.43
   Year    Fertilizer
   2001      11.9
   2002      17.9
   2003      22.0
   2004      21.8
   2005      26.0

Σx = 10,015   Σy = 99.6   Σxy = 199,530.9   Σx² = 20,060,055   Σy² = 2,097.26   n = 5

b1 = SSxy/SSxx = [199,530.9 - (10,015)(99.6)/5] / [20,060,055 - (10,015)²/5] = 3.21

b0 = Σy/n - b1(Σx/n) = 99.6/5 - (3.21)(10,015/5) = -6,409.71

ŷ = -6,409.71 + 3.21x

ŷ(2008) = -6,409.71 + 3.21(2008) = 35.97
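A short Python sketch of the 14.43 trend line, using the calendar year directly as x and then forecasting 2008 (variable names are illustrative):

    # Linear trend line with the year as x (14.43 data), then a 2008 forecast.
    years = [2001, 2002, 2003, 2004, 2005]
    fert = [11.9, 17.9, 22.0, 21.8, 26.0]
    n = len(years)

    sum_x, sum_y = sum(years), sum(fert)
    sum_xy = sum(x * y for x, y in zip(years, fert))
    sum_x2 = sum(x * x for x in years)

    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n
    print(round(b1, 2), round(b0, 2))     # 3.21 -6409.71
    print(round(b0 + b1 * 2008, 2))       # forecast for 2008: about 35.97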


14.44
   Year    Fertilizer
   1998      5860
   1999      6632
   2000      7125
   2001      6000
   2002      4380
   2003      3326
   2004      2642

Σx = 14,007   Σy = 35,965   Σxy = 71,946,954   Σx² = 28,028,035   Σy² = 202,315,489   n = 7

b1 = SSxy/SSxx = [71,946,954 - (14,007)(35,965)/7] / [28,028,035 - (14,007)²/7] = -678.9643

b0 = Σy/n - b1(Σx/n) = 35,965/7 - (-678.9643)(14,007/7) = 1,363,745.39

ŷ = 1,363,745.39 - 678.9643x

ŷ(2007) = 1,363,745.39 - 678.9643(2007) = 1,064.04


14.45
   Year   Quarter   Cum. Quarter (x)   Sales (y)
   2003      1             1             11.93
             2             2             12.46
             3             3             13.28
             4             4             15.08
   2004      1             5             16.08
             2             6             16.82
             3             7             17.60
             4             8             18.66
   2005      1             9             19.73
             2            10             21.11
             3            11             22.21
             4            12             22.94

Use the cumulative quarters as the predictor variable, x, to predict sales, y.

Σx = 78   Σy = 207.9   Σxy = 1,499.07   Σx² = 650   Σy² = 3,755.2084   n = 12

b1 = SSxy/SSxx = [1,499.07 - (78)(207.9)/12] / [650 - (78)²/12] = 1.033

b0 = Σy/n - b1(Σx/n) = 207.9/12 - (1.033)(78/12) = 10.6105

ŷ = 10.6105 + 1.033x

Remember, this trend line was constructed using cumulative quarters. To forecast sales for the third quarter of 2007, we must convert that time frame to cumulative quarters: the third quarter of 2007 is quarter number 19 in this scheme.

ŷ(19) = 10.6105 + 1.033(19) = 30.2375
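A sketch of the recoded-time approach in 14.45: number the quarters 1 through 12, fit the trend line, and forecast cumulative quarter 19 (the third quarter of 2007).

    # Trend line on cumulative quarter numbers (14.45 data).
    sales = [11.93, 12.46, 13.28, 15.08, 16.08, 16.82,
             17.60, 18.66, 19.73, 21.11, 22.21, 22.94]
    quarters = list(range(1, len(sales) + 1))    # 1, 2, ..., 12
    n = len(sales)

    sum_x, sum_y = sum(quarters), sum(sales)
    sum_xy = sum(x * y for x, y in zip(quarters, sales))
    sum_x2 = sum(x * x for x in quarters)

    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n
    print(round(b0, 4), round(b1, 3))            # about 10.6105 1.033

    # Q3 of 2007 is cumulative quarter 19 under this coding.
    print(round(b0 + b1 * 19, 2))                # forecast: about 30.24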


14.46
      x     y
      5     8
      7     9
      3    11
     16    27
     12    15
      9    13

Σx = 52   Σx² = 564   Σy = 83   Σy² = 1,389   Σxy = 865   n = 6
b1 = 1.2853   b0 = 2.6941

a) ŷ = 2.6941 + 1.2853x

b)  ŷ (Predicted Values)   (y - ŷ) Residuals
         9.1206                -1.1206
        11.6912                -2.6912
         6.5500                 4.4500
        23.2588                 3.7412
        18.1177                -3.1176
        14.2618                -1.2618

c)  (y - ŷ)²
      1.2557
      7.2426
     19.8025
     13.9966
      9.7194
      1.5921

    SSE = 53.6089

    se = √(SSE/(n - 2)) = √(53.6089/4) = 3.661

d) r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 53.6089/[1,389 - (83)²/6] = .777

e) H0: β = 0     α = .01
   Ha: β ≠ 0

   Two-tailed test, α/2 = .005     df = n - 2 = 6 - 2 = 4

   t.005,4 = ±4.604

   sb = se/√(Σx² - (Σx)²/n) = 3.661/√(564 - (52)²/6) = .34389

   t = (b1 - 0)/sb = (1.2853 - 0)/.34389 = 3.74

   Since the observed t = 3.74 < t.005,4 = 4.604, the decision is to fail to reject the null hypothesis.

f) The r² = 77.74% is modest. There appears to be some predictive ability in this model. The slope of the regression line is not significantly different from zero using α = .01. However, for α = .05, the null hypothesis of a zero slope is rejected. The standard error of the estimate, se = 3.661, is not particularly small given the range of values for y (27 - 8 = 19).

14.47
      x     y
     53     5
     47     5
     41     7
     50     4
     58    10
     62    12
     45     3
     60    11

Σx = 416   Σx² = 22,032   Σy = 57   Σy² = 489   Σxy = 3,106   n = 8
b1 = 0.355   b0 = -11.335

a) ŷ = -11.335 + 0.355x

b)  ŷ (Predicted Values)   (y - ŷ) Residuals
         7.48                 -2.48
         5.35                 -0.35
         3.22                  3.78
         6.415                -2.415
         9.255                 0.745
        10.675                 1.325
         4.64                 -1.64
         9.965                 1.035

c)  (y - ŷ)²
      6.1504
      0.1225
     14.2884
      5.8322
      0.5550
      1.7556
      2.6896
      1.0712

    SSE = 32.4649

d) se = √(SSE/(n - 2)) = √(32.4649/6) = 2.3261

e) r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 32.4649/[489 - (57)²/8] = .608

f) H0: β = 0     α = .05
   Ha: β ≠ 0

   Two-tailed test, α/2 = .025     df = n - 2 = 8 - 2 = 6

   t.025,6 = ±2.447

   sb = se/√(Σx² - (Σx)²/n) = 2.3261/√(22,032 - (416)²/8) = 0.116305

   t = (b1 - 0)/sb = (0.355 - 0)/.116305 = 3.05

   Since the observed t = 3.05 > t.025,6 = 2.447, the decision is to reject the null hypothesis. The population slope is different from zero.

g) This model produces only a modest r² = .608. Almost 40% of the variance of y is unaccounted for by x. The range of y values is 12 - 3 = 9, and the standard error of the estimate is 2.33. Given this small range, the se is not small.

14.48  Σx = 1,263   Σx² = 268,295   Σy = 417   Σy² = 29,135   Σxy = 88,288   n = 6

b0 = 25.42778   b1 = 0.209369

SSE = Σy² - b0Σy - b1Σxy = 29,135 - (25.42778)(417) - (0.209369)(88,288) = 46.845468

r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 46.845468/153.5 = .695

Coefficient of determination = r² = .695

14.49
a) x0 = 60
   Σx = 524   Σx² = 36,224   Σy = 215   Σy² = 6,411   Σxy = 15,125   n = 8
   b1 = .5481   b0 = -9.026   se = 3.201

   95% confidence interval, α/2 = .025
   df = n - 2 = 8 - 2 = 6     t.025,6 = 2.447

   ŷ = -9.026 + 0.5481(60) = 23.86

   x̄ = Σx/n = 524/8 = 65.5

   ŷ ± t(α/2,n-2) · se · √[1/n + (x0 - x̄)²/(Σx² - (Σx)²/n)]

   23.86 ± 2.447(3.201) √[1/8 + (60 - 65.5)²/(36,224 - (524)²/8)]

   23.86 ± 2.447(3.201)(.375372) = 23.86 ± 2.94

   20.92 < E(y60) < 26.8

b) x0 = 70

   ŷ70 = -9.026 + 0.5481(70) = 29.341

   ŷ ± t(α/2,n-2) · se · √[1 + 1/n + (x0 - x̄)²/(Σx² - (Σx)²/n)]

   29.341 ± 2.447(3.201) √[1 + 1/8 + (70 - 65.5)²/(36,224 - (524)²/8)]

   29.341 ± 2.447(3.201)(1.06567) = 29.341 ± 8.347

   20.994 < y < 37.688

c) The interval in (b) is much wider because part (b) is for a single value of y, which produces much greater possible variation. In actuality, x0 = 70 in part (b) is slightly closer to the mean (x̄) than x0 = 60. However, the width of the single-value interval is much greater than that of the interval for the average or expected y value in part (a).


14.50
   Year   Cost
     1     56
     2     54
     3     49
     4     46
     5     45

Σx = 15   Σy = 250   Σxy = 720   Σx² = 55   Σy² = 12,594   n = 5

b1 = SSxy/SSxx = [720 - (15)(250)/5] / [55 - (15)²/5] = -3

b0 = Σy/n - b1(Σx/n) = 250/5 - (-3)(15/5) = 59

ŷ = 59 - 3x

ŷ(7) = 59 - 3(7) = 38

14.51  Σy = 267   Σy² = 15,971   Σx = 21   Σx² = 101   Σxy = 1,256   n = 5

b0 = 9.234375   b1 = 10.515625

SSE = Σy² - b0Σy - b1Σxy = 15,971 - (9.234375)(267) - (10.515625)(1,256) = 297.7969

r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 297.7969/1,713.2 = .826

If a regression model had been developed to predict the number of cars sold from the number of salespeople, the model would have had an r² of 82.6%. The same would hold true for a model to predict the number of salespeople from the number of cars sold.


14.52  n = 12   Σx = 548   Σx² = 26,592   Σy = 5,940   Σy² = 3,211,546   Σxy = 287,908

b1 = 10.626383   b0 = 9.728511

ŷ = 9.728511 + 10.626383x

SSE = Σy² - b0Σy - b1Σxy = 3,211,546 - (9.728511)(5,940) - (10.626383)(287,908) = 94,337.9762

se = √(SSE/(n - 2)) = √(94,337.9762/10) = 97.1277

r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 94,337.9762/271,246 = .652

t = b1 / [se/√(Σx² - (Σx)²/n)] = 10.626383 / [97.1277/√(26,592 - (548)²/12)] = 4.33

If α = .01, then t.005,10 = 3.169. Since the observed t = 4.33 > t.005,10 = 3.169, the decision is to reject the null hypothesis.


14.53
  Sales (y)   Number of Units (x)
     17.1          12.4
      7.9           7.5
      4.8           6.8
      4.7           8.7
      4.6           4.6
      4.0           5.1
      2.9          11.2
      2.7           5.1
      2.7           2.9

Σy = 51.4   Σy² = 460.1   Σx = 64.3   Σx² = 538.97   Σxy = 440.46   n = 9

b1 = 0.92025   b0 = -0.863565

ŷ = -0.863565 + 0.92025x

SSE = Σy² - b0Σy - b1Σxy = 460.1 - (-0.863565)(51.4) - (0.92025)(440.46) = 99.153926

r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 99.153926/166.55 = .405


14.54
   Year   Total Employment
   1995      11,152
   1996      10,935
   1997      11,050
   1998      10,845
   1999      10,776
   2000      10,764
   2001      10,697
   2002       9,234
   2003       9,223
   2004       9,158

Σx = 19,995   Σy = 103,834   Σxy = 207,596,350   Σx² = 39,980,085   Σy² = 1,084,268,984   n = 10

b1 = SSxy/SSxx = [207,596,350 - (19,995)(103,834)/10] / [39,980,085 - (19,995)²/10] = -239.188

b0 = Σy/n - b1(Σx/n) = 103,834/10 - (-239.188)(19,995/10) = 488,639.564

ŷ = 488,639.564 - 239.188x

ŷ(2008) = 488,639.564 - 239.188(2008) = 8,350.30


14.55
   1977    2003
    581     666
    213     214
    668     496
    345     204
   1476    1600
   1776    6278

Σx = 5,059   Σy = 9,458   Σx² = 6,280,931   Σy² = 42,750,268   Σxy = 14,345,564   n = 6

b1 = SSxy/SSxx = [14,345,564 - (5,059)(9,458)/6] / [6,280,931 - (5,059)²/6] = 3.1612

b0 = Σy/n - b1(Σx/n) = 9,458/6 - (3.1612)(5,059/6) = -1,089.0712

ŷ = -1,089.0712 + 3.1612x

For x = 700:   ŷ = -1,089.0712 + 3.1612(700) = 1,123.757

ŷ ± t(α/2,n-2) · se · √[1/n + (x0 - x̄)²/(Σx² - (Σx)²/n)]

α = .05, t.025,4 = 2.776, x0 = 700, n = 6, x̄ = 843.167

SSE = Σy² - b0Σy - b1Σxy = 42,750,268 - (-1,089.0712)(9,458) - (3.1612)(14,345,564) = 7,701,506.49

se = √(SSE/(n - 2)) = √(7,701,506.49/4) = 1,387.58

Confidence interval:

1,123.757 ± (2.776)(1,387.58) √[1/6 + (700 - 843.167)²/(6,280,931 - (5,059)²/6)]

= 1,123.757 ± 1,619.81

-496.05 to 2,743.57

H0: β1 = 0
Ha: β1 ≠ 0     α = .05     df = 4

Table t.025,4 = 2.776

t = (b1 - 0)/sb = 3.1612 / (1,387.58/√2,015,350.833) = 3.234

Since the observed t = 3.234 > t.025,4 = 2.776, the decision is to reject the null hypothesis.

14.56  Σx = 11.902   Σx² = 25.1215   Σy = 516.8   Σy² = 61,899.06   Σxy = 1,202.867   n = 7
b1 = 66.36277   b0 = -39.0071

ŷ = -39.0071 + 66.36277x

SSE = Σy² - b0Σy - b1Σxy = 61,899.06 - (-39.0071)(516.8) - (66.36277)(1,202.867) = 2,232.343

se = √(SSE/(n - 2)) = √(2,232.343/5) = 21.13

r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 2,232.343/[61,899.06 - (516.8)²/7] = 1 - .094 = .906

14.57  Σx = 44,754   Σy = 17,314   Σx² = 167,540,610   Σy² = 24,646,062   Σxy = 59,852,571   n = 13

b1 = SSxy/SSxx = [59,852,571 - (44,754)(17,314)/13] / [167,540,610 - (44,754)²/13] = .01835

b0 = Σy/n - b1(Σx/n) = 17,314/13 - (.01835)(44,754/13) = 1,268.685

ŷ = 1,268.685 + .01835x

r² for this model is .002858. There is no predictability in this model.

Test for the slope: t = 0.18 with a p-value of 0.8623. Not significant.

Time-Series Trend Line:

Σx = 91   Σy = 44,754   Σxy = 304,797   Σx² = 819   Σy² = 167,540,610   n = 13

b1 = SSxy/SSxx = [304,797 - (91)(44,754)/13] / [819 - (91)²/13] = -46.5989

b0 = Σy/n - b1(Σx/n) = 44,754/13 - (-46.5989)(91/13) = 3,768.81

ŷ = 3,768.81 - 46.5989x

ŷ(2007) = 3,768.81 - 46.5989(15) = 3,069.83


14.58  Σx = 323.3   Σy = 6,765.8   Σx² = 29,629.13   Σy² = 7,583,144.64   Σxy = 339,342.76   n = 7

b1 = SSxy/SSxx = [339,342.76 - (323.3)(6,765.8)/7] / [29,629.13 - (323.3)²/7] = 1.82751

b0 = Σy/n - b1(Σx/n) = 6,765.8/7 - (1.82751)(323.3/7) = 882.138

ŷ = 882.138 + 1.82751x

SSE = Σy² - b0Σy - b1Σxy = 7,583,144.64 - (882.138)(6,765.8) - (1.82751)(339,342.76) = 994,623.07

se = √(SSE/(n - 2)) = √(994,623.07/5) = 446.01

r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 994,623.07/[7,583,144.64 - (6,765.8)²/7] = 1 - .953 = .047

H0: β = 0
Ha: β ≠ 0     α = .05     t.025,5 = 2.571

SSxx = Σx² - (Σx)²/n = 29,629.13 - (323.3)²/7 = 14,697.29

t = (b1 - 0)/(se/√SSxx) = 1.82751 / (446.01/√14,697.29) = 0.50

Since the observed t = 0.50 < t.025,5 = 2.571, the decision is to fail to reject the null hypothesis.


14.59  Let Water Use = y and Temperature = x.

Σx = 608   Σx² = 49,584   Σy = 1,025   Σy² = 152,711   Σxy = 86,006   n = 8
b1 = 2.40107   b0 = -54.35604

ŷ = -54.35604 + 2.40107x

ŷ100 = -54.35604 + 2.40107(100) = 185.751

SSE = Σy² - b0Σy - b1Σxy = 152,711 - (-54.35604)(1,025) - (2.40107)(86,006) = 1,919.5146

se = √(SSE/(n - 2)) = √(1,919.5146/6) = 17.886

r² = 1 - SSE/[Σy² - (Σy)²/n] = 1 - 1,919.5146/[152,711 - (1,025)²/8] = 1 - .09 = .91

Testing the slope:

H0: β = 0     α = .01
Ha: β ≠ 0

Since this is a two-tailed test, α/2 = .005
df = n - 2 = 8 - 2 = 6

t.005,6 = ±3.707

sb = se/√(Σx² - (Σx)²/n) = 17.886/√(49,584 - (608)²/8) = .30783

t = (b1 - 0)/sb = (2.40107 - 0)/.30783 = 7.80

Since the observed t = 7.80 > t.005,6 = 3.707, the decision is to reject the null hypothesis.


14.60 a) The regression equation is: ŷ = 67.2 - 0.0565x

b) For every unit of increase in the value of x, the predicted value of y will decrease by .0565.

c) The t ratio for the slope is -5.50 with an associated p-value of .000. This is significant at α = .10. The t ratio is negative because the slope is negative and the numerator of the t ratio formula equals the slope minus zero.

d) r² is .627, so 62.7% of the variability of y is accounted for by x. This is only a modest proportion of predictability. The standard error of the estimate is 10.32. This is best interpreted in light of the data and the magnitude of the data.

e) The F value, which tests the overall predictability of the model, is 30.25. For simple regression analysis, this equals the value of t², which is (-5.50)².

f) The negative value of r is not a surprise because the slope of the regression line is also negative, indicating an inverse relationship between x and y. In addition, taking the square root of r², which is .627, yields .7906, which is the magnitude of the value of r, allowing for rounding error.

14.61 The F value for overall predictability is 7.12 with an associated p-value of .0205, which is significant at α = .05. It is not significant at α = .01. The coefficient of determination is .372 with an adjusted r² of .32. This represents very modest predictability. The standard error of the estimate is 982.219, which in units of 1,000 laborers means that about 68% of the predictions are within 982,219 of the actual figures. The regression model is: Number of Union Members = 22,348.97 - 0.0524 Labor Force. For a labor force of 100,000 (thousand, actually 100 million), substitute x = 100,000 and get a predicted value of 17,108.97 (thousand), which is actually 17,108,970 union members.

14.62 The Residual Model Diagnostics from MINITAB indicate a relatively healthy set of residuals. The histogram indicates that the error terms are generally normally distributed. This is somewhat confirmed by the nearly straight-line Normal Plot of Residuals. However, the Residuals vs. Fits graph indicates that there may be some heteroscedasticity, with greater error variance for small x values.