ETM 620 - 09U 1 Multiple regression More than one indicator variable may be responsible for the...


Page 1

Multiple regression

More than one indicator variable may be responsible for the variation we see in the response.
- Gas mileage is a function of weight, horsepower, use of air conditioning, etc.
- Metal fatigue in airplanes is a function of number of takeoffs and landings, climbout speed, landing speed, etc.
- Incidence of heart attack is a function of age, BMI, cholesterol levels, etc.

If the function that defines the relationship between the indicator variables and the response is linear, then we have multiple linear regression, i.e.,

$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + \varepsilon$$

If a polynomial relationship between indicators and response is the best fit, then we have polynomial regression, e.g.,

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + \varepsilon$$

Page 2

Multiple linear regression: Matrix approach

The viscosity of slurry is believed to be a function of the temperature and the feed rate. A number of readings were taken with the following results:

Temp   Feed Rate   Viscosity
  80        8        2256
  93        9        2340
 100       10        2426
  82       13        2293
  90       11        2330
  99        8        2368
  81        8        2250
  96       10        2409
  94       12        2364
  93       11        2379
  97       13        2440
  95       11        2364
 100        8        2404
  85       12        2317
  86        9        2309
  87       12        2328

Hypothesize the relationship,

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$

and calculate the estimate,

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$$

Page 3

Matrix form of the equation

Define the matrices:

$$
Y = \begin{bmatrix} 2256 \\ 2340 \\ 2426 \\ 2293 \\ 2330 \\ \vdots \\ 2328 \end{bmatrix}, \quad
X = \begin{bmatrix} 1 & 80 & 8 \\ 1 & 93 & 9 \\ 1 & 100 & 10 \\ 1 & 82 & 13 \\ 1 & 90 & 11 \\ \vdots & \vdots & \vdots \\ 1 & 87 & 12 \end{bmatrix}, \quad
b = \begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix}
$$

Page 4

General Matrix Form

We obtain the least squares estimates (b0, b1, b2) of (β0, β1, β2) by solving the matrix equation:

$$X^T X b = X^T Y$$

for b, or

$$b = (X^T X)^{-1} X^T Y$$

Page 5

From Excel,

$$X^T X = \begin{bmatrix} 16 & 1458 & 165 \\ 1458 & 133560 & 15028 \\ 165 & 15028 & 1751 \end{bmatrix}$$

$$(X^T X)^{-1} = \begin{bmatrix} 14.519 & -0.1327 & -0.2291 \\ -0.1327 & 0.0014 & 0.0002 \\ -0.2291 & 0.0002 & 0.0203 \end{bmatrix}$$

$$X^T Y = \begin{bmatrix} 37577 \\ 3429550 \\ 387855 \end{bmatrix}$$

$$b = \begin{bmatrix} 1560.67 \\ 7.73 \\ 8.11 \end{bmatrix}$$
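The same calculation can be reproduced outside of Excel. The following is a minimal sketch in Python with NumPy (an assumption; the course itself uses Excel and Minitab) applied to the 16 slurry readings from page 2. The variable names are illustrative only.

```python
import numpy as np

# Slurry data from page 2: temperature, feed rate, viscosity (16 readings).
temp = np.array([80, 93, 100, 82, 90, 99, 81, 96,
                 94, 93, 97, 95, 100, 85, 86, 87], dtype=float)
feed = np.array([8, 9, 10, 13, 11, 8, 8, 10,
                 12, 11, 13, 11, 8, 12, 9, 12], dtype=float)
visc = np.array([2256, 2340, 2426, 2293, 2330, 2368, 2250, 2409,
                 2364, 2379, 2440, 2364, 2404, 2317, 2309, 2328], dtype=float)

# Design matrix X: a column of ones for the intercept, then the regressors.
X = np.column_stack([np.ones_like(temp), temp, feed])
Y = visc

XtX = X.T @ X   # 3 x 3 matrix X'X
XtY = X.T @ Y   # 3-vector X'Y

# Solve the normal equations X'X b = X'Y directly instead of inverting X'X.
b = np.linalg.solve(XtX, XtY)

print(XtX)
print(XtY)
print(np.round(b, 2))
```

Solving the linear system gives the same b as multiplying by the explicit inverse, but it avoids forming the inverse when only the coefficients are needed.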

Page 6

Or, using regression analysis on Excel

Regression Statistics
Multiple R      0.962059425
R Square        0.925558337
Adjusted R2     0.914105774
Std. Error      16.51595592
Observations    16

ANOVA
              df   SS          MS       F       Significance F
Regression     2   44089.84    22045    80.82   4.64306E-08
Residual      13   3546.098    272.78
Total         15   47635.94

              Coefficients   Std Err    t Stat   P-value   Lower 95%     Upper 95%
Intercept     1560.66788     62.93201   24.799   2E-12     1424.711536   1696.624225
Temp          7.728104211    0.624881   12.367   1E-08     6.378130266   9.078078155
Feed Rate     8.113563481    2.350936   3.4512   0.004     3.034676023   13.19245094
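The entries of the Excel summary follow from a few sums of squares. The sketch below (Python with NumPy, an assumption since the slides use Excel) recomputes them for the slurry data; `n` is the number of observations and `p` the number of fitted coefficients including the intercept, names chosen here for illustration.

```python
import numpy as np

# Slurry data from page 2.
temp = np.array([80, 93, 100, 82, 90, 99, 81, 96,
                 94, 93, 97, 95, 100, 85, 86, 87], dtype=float)
feed = np.array([8, 9, 10, 13, 11, 8, 8, 10,
                 12, 11, 13, 11, 8, 12, 9, 12], dtype=float)
visc = np.array([2256, 2340, 2426, 2293, 2330, 2368, 2250, 2409,
                 2364, 2379, 2440, 2364, 2404, 2317, 2309, 2328], dtype=float)

X = np.column_stack([np.ones_like(temp), temp, feed])
b, *_ = np.linalg.lstsq(X, visc, rcond=None)   # least squares coefficients

n, p = X.shape                                  # 16 observations, 3 coefficients
fitted = X @ b
sse = np.sum((visc - fitted) ** 2)              # residual sum of squares
sst = np.sum((visc - visc.mean()) ** 2)         # total sum of squares
ssr = sst - sse                                 # regression sum of squares

msr = ssr / (p - 1)                             # regression mean square
mse = sse / (n - p)                             # residual mean square
f_stat = msr / mse
r2 = ssr / sst
adj_r2 = 1 - (sse / (n - p)) / (sst / (n - 1))
std_err = np.sqrt(mse)                          # "Std. Error" in the summary

print(f"SSR={ssr:.2f} SSE={sse:.3f} SST={sst:.2f}")
print(f"F={f_stat:.2f} R2={r2:.6f} adjR2={adj_r2:.6f} s={std_err:.4f}")
```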

Page 7

How do we interpret these results?

R²: the degree to which the variability of the data is accounted for in the model
- will naturally increase as the number of regressor variables increases

adjusted R²: adjusted to reflect how well the addition of new regressors improves the ability of the model to account for the variability in the data
- adjusted R² increases only if the new term decreases MSE; it never exceeds R²
- adjusted R² << R² if the model includes terms that are not significant

In our example,
R² = _______________ ; adj R² = ________________
Interpretation?
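For reference, one common way to write the two statistics is given below. The symbols are assumptions not defined on the slide: n is the number of observations, p the number of estimated coefficients including the intercept, and SS_E and SS_T the residual and total sums of squares.

```latex
R^2 = 1 - \frac{SS_E}{SS_T}, \qquad
R^2_{adj} = 1 - \frac{SS_E/(n-p)}{SS_T/(n-1)} = 1 - \frac{n-1}{n-p}\,\bigl(1 - R^2\bigr)
```

Because (n-1)/(n-p) is at least 1, adjusted R² cannot exceed R². Adding a term raises it only when MSE = SS_E/(n-p) falls.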

Page 8

Confidence intervals around β values …

Calculated by,

$$\hat{\beta}_j \pm t_{\alpha/2,\,n-p} \sqrt{\hat{\sigma}^2 C_{jj}}$$

where $C_{jj}$ is the j-th diagonal element of $(X^T X)^{-1}$ and p is the number of estimated coefficients.

Given in the regression results …

              Coefficients   Std Err    t Stat   P-value   Lower 95%    Upper 95%
Intercept     1560.66788     62.93201   24.799   2E-12     1424.7115    1696.6242
Temp          7.728104211    0.624881   12.367   1E-08     6.3781303    9.078078
Feed Rate     8.113563481    2.350936   3.4512   0.004     3.0346760    13.19245

Interpretation?
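The interval formula can be checked numerically against the Excel limits. This is a minimal sketch in Python with NumPy (an assumption; the slides use Excel). The critical value `T_CRIT` is the tabulated t value for 0.025 in the upper tail with 13 degrees of freedom, entered by hand rather than computed.

```python
import numpy as np

# Slurry data from page 2.
temp = np.array([80, 93, 100, 82, 90, 99, 81, 96,
                 94, 93, 97, 95, 100, 85, 86, 87], dtype=float)
feed = np.array([8, 9, 10, 13, 11, 8, 8, 10,
                 12, 11, 13, 11, 8, 12, 9, 12], dtype=float)
visc = np.array([2256, 2340, 2426, 2293, 2330, 2368, 2250, 2409,
                 2364, 2379, 2440, 2364, 2404, 2317, 2309, 2328], dtype=float)

X = np.column_stack([np.ones_like(temp), temp, feed])
n, p = X.shape

C = np.linalg.inv(X.T @ X)            # the inverse is needed here for its diagonal
b = C @ X.T @ visc                    # least squares estimates
resid = visc - X @ b
sigma2_hat = resid @ resid / (n - p)  # estimate of the error variance (MSE)
se = np.sqrt(sigma2_hat * np.diag(C)) # standard error of each coefficient

# Assumption: t(0.025, 13) read from a t-table, not computed in code.
T_CRIT = 2.160
lower = b - T_CRIT * se
upper = b + T_CRIT * se

for name, lo, hi in zip(["Intercept", "Temp", "Feed Rate"], lower, upper):
    print(f"{name:10s} {lo:12.4f} {hi:12.4f}")
```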

Page 9

A trickier example…

The gas mileage for a passenger automobile is believed to be a function of the weight of the car and the horsepower of the engine. Several cars were tested with the following results:

MPG, y   Wt., x1   HP, x2
  26       3.4       169
  31       2.5       106
  20       3.8       304
  31       2.8       155
  24       3.6       211
  29       3.3       140
  20       3.3       210
  23       3.9       255
  24       4.1       255
  26       3.3       164

Page 10

Regression results from Excel …

Regression Statistics
Multiple R          0.84976
R Square            0.72209
Adjusted R Square   0.64269
Standard Error      2.39433
Observations        10

ANOVA
              df   SS      MS      F       Significance F
Regression     2   104.3   52.14   9.094   0.0113149
Residual       7   40.13   5.733
Total          9   144.4

              Coefficients   Std Err   t Stat   P-value   Lower 95%    Upper 95%
Intercept     36.744         7.046     5.215    0.001     20.081785    53.406268
Wt., x1       -0.1917        3.169     -0.06    0.953     -7.686279    7.3029757
HP, x2        -0.0543        0.025     -2.15    0.069     -0.114019    0.0054114
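One way to explore why this example is trickier is to look at how the two regressors relate to each other. The sketch below (Python with NumPy, an assumption; the slides use Excel and Minitab) refits the model, computes the correlation between weight and horsepower, and fits weight alone for comparison. It is offered as an exploration, not as the slide's stated answer.

```python
import numpy as np

# Gas mileage data from page 9.
mpg = np.array([26, 31, 20, 31, 24, 29, 20, 23, 24, 26], dtype=float)
wt = np.array([3.4, 2.5, 3.8, 2.8, 3.6, 3.3, 3.3, 3.9, 4.1, 3.3])
hp = np.array([169, 106, 304, 155, 211, 140, 210, 255, 255, 164], dtype=float)

# Full model: mpg on weight and horsepower.
X = np.column_stack([np.ones_like(wt), wt, hp])
b, *_ = np.linalg.lstsq(X, mpg, rcond=None)

# Correlation between the two regressors.
r_wt_hp = np.corrcoef(wt, hp)[0, 1]

# Reduced model: mpg on weight alone.
X_wt = np.column_stack([np.ones_like(wt), wt])
b_wt_only, *_ = np.linalg.lstsq(X_wt, mpg, rcond=None)

print("full model coefficients:", np.round(b, 4))
print("corr(weight, horsepower):", round(r_wt_hp, 3))
print("weight-only slope:", round(b_wt_only[1], 3))
```

In the full model the weight coefficient is close to zero, while weight on its own has a much steeper slope. The two regressors are strongly correlated, so they carry overlapping information about mileage.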

Page 11

Let’s try it in Minitab …

What do the residuals look like?

What does the output of the regression tell us?

What do we get if we try “Stepwise Regression”?

Page 12

Polynomial regression …

Example: The expected yield of a crop of marigolds is hypothesized to be a function of the days after the first bloom. Yield (in number of blooms) from a given plot was counted in one growing season with the results as given in the data file.

Step 1: plot the data …

Page 13

Plot of the data …

[Figure: plot titled "Marigold Yields"; vertical axis from 0 to 4500, horizontal axis from 14 to 49]

Page 14

Fitting the polynomial …

Hypothesize the model,

$$y = \beta_0 + \beta_1 (\text{day}) + \beta_2 (\text{day})^2 + \varepsilon$$

In Excel,

In Minitab,
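A quadratic model is still linear in its coefficients, so it can be fitted with the same least squares machinery by adding a squared column to the design matrix. The sketch below uses Python with NumPy (an assumption) and made-up, noise-free data, since the marigold data file is not part of these slides. The names `day`, `blooms`, and `TRUE_COEFFS` are hypothetical.

```python
import numpy as np

# Hypothetical data: NOT the marigold data file, which is not included here.
# Noise-free values are generated from a known quadratic so the fit can be checked.
TRUE_COEFFS = np.array([-5000.0, 500.0, -7.0])   # assumed b0, b1, b2
day = np.arange(14, 50, 5, dtype=float)          # 14, 19, ..., 49
blooms = TRUE_COEFFS[0] + TRUE_COEFFS[1] * day + TRUE_COEFFS[2] * day ** 2

# Design matrix with columns 1, day, day^2.
X = np.column_stack([np.ones_like(day), day, day ** 2])

# Ordinary least squares on the expanded design matrix.
b, *_ = np.linalg.lstsq(X, blooms, rcond=None)

print(np.round(b, 4))
```

With real data the squared term is created the same way, for example as an extra column in the worksheet, before running the regression.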

Page 15

Indicator variables

Allow us to include qualitative factors in regression analysis …
- machine type
- grade of fuel
- operator

Example,
In addition to SAT scores, an admissions officer is concerned that whether or not a student attended private high school might affect the freshman GPA. Data from 20 students are given in the data file.
Conduct the analysis and interpret the results …
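A qualitative factor enters the model as a column coded 0 or 1. The sketch below uses Python with NumPy (an assumption) and made-up, noise-free numbers, because the 20-student data file is not part of these slides. The names `sat`, `private`, `gpa`, and `TRUE_COEFFS` are hypothetical.

```python
import numpy as np

# Hypothetical data: NOT the 20-student data file, which is not included here.
sat = np.array([1000, 1100, 1200, 1300, 1050, 1150, 1250, 1350], dtype=float)
private = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)   # 1 = private school

# Noise-free GPA generated from assumed coefficients so the fit can be checked.
TRUE_COEFFS = np.array([0.5, 0.002, 0.3])   # intercept, SAT slope, private shift
gpa = TRUE_COEFFS[0] + TRUE_COEFFS[1] * sat + TRUE_COEFFS[2] * private

# Design matrix: intercept, SAT score, indicator column.
X = np.column_stack([np.ones_like(sat), sat, private])
b, *_ = np.linalg.lstsq(X, gpa, rcond=None)

print(np.round(b, 4))
```

The coefficient on the indicator column is the estimated shift in GPA for private-school students at the same SAT score. Both groups share the SAT slope in this form of the model.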

Page 16

Problems in multiple regression

Multicollinearity

Influential observations

Autocorrelation