Remedial measures … or “how to fix problems with the model”

Transforming the data so that the simple linear regression model is okay for the transformed data.

Options for fixing problems with the model

• Abandon the simple linear regression model and find a more appropriate (but typically more complex) model.

• Transform the data so that the simple linear regression model works for the transformed (new) data.

Abandoning the model

• If not linear: try a different function, like a quadratic (Ch. 7) or an exponential function (Ch. 13).

• If unequal error variances: use weighted least squares (Ch. 10).

• If error terms are not independent: try fitting a time series model (Ch. 12).

• If important predictor variables are omitted: try fitting a multiple regression model (Ch. 6).

• If there are outliers: use a robust estimation procedure (Ch. 10).

Choices for transforming the data

• Transform X values only.

• Transform Y values only.

• Transform both X and Y values simultaneously.

If the only thing wrong with your model is that a linear function doesn’t fit…

• Try transforming only the X values.

• You wouldn’t want to transform the Y values here, because you might change the well-behaved error terms (normal, equal variances) into badly-behaved error terms (not normal, unequal variances).

Example 1: Memory retention

 time   prop
    1   0.84
    5   0.71
   15   0.61
   30   0.56
   60   0.54
  120   0.47
  240   0.45
  480   0.38
  720   0.36
 1440   0.26
 2880   0.20
 5760   0.16
10080   0.08

• Subjects were asked to memorize a list of disconnected items, then asked to recall them at various times up to a week later.

• Predictor time = time, in minutes, since the list was initially memorized.

• Response prop = proportion of items recalled correctly.

Example 1: Fitted line plot

[Figure: regression plot of prop versus time, with fitted line]

prop = 0.525870 - 0.0000557 time

S = 0.152284, R-Sq = 57.1%, R-Sq(adj) = 53.2%

Example 1: Residual vs. fits plot

[Figure: residuals versus the fitted values (response is prop)]
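The residuals behind a plot like this one can be recomputed from the Example 1 data. The following is a minimal sketch, assuming NumPy is available; the variable names are illustrative and not part of the original slides. It fits the straight line to the untransformed data and prints each fitted value with its residual, so the run of positive, then negative, then positive residuals (the sign of a curved relationship) can be read off directly.

```python
import numpy as np

# Memory-retention data from Example 1 (time in minutes, proportion recalled).
time = np.array([1, 5, 15, 30, 60, 120, 240, 480, 720,
                 1440, 2880, 5760, 10080], dtype=float)
prop = np.array([0.84, 0.71, 0.61, 0.56, 0.54, 0.47, 0.45,
                 0.38, 0.36, 0.26, 0.20, 0.16, 0.08])

# Least-squares line prop = b0 + b1 * time.
# np.polyfit returns coefficients from highest degree to lowest.
b1, b0 = np.polyfit(time, prop, 1)
fitted = b0 + b1 * time
residuals = prop - fitted

# S is the residual standard error, using n - 2 degrees of freedom.
n = len(time)
s = np.sqrt(np.sum(residuals ** 2) / (n - 2))

for f, r in zip(fitted, residuals):
    print(f"fitted = {f:7.3f}   residual = {r:+.3f}")
print(f"S = {s:.4f}")
```

A systematic pattern in the signs of the residuals, rather than a random scatter around zero, is what suggests that the linear function is the problem.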

Example 1: Normal probability plot

[Figure: normal probability plot of the residuals (RESI1)]

N: 13, Average: -0.0000000, StDev: 0.145801

W-test for Normality: R = 0.9751, P-Value (approx) > 0.1000

Example 1: Transform the X data

 time   prop   log10_time
    1   0.84      0.00000
    5   0.71      0.69897
   15   0.61      1.17609
   30   0.56      1.47712
   60   0.54      1.77815
  120   0.47      2.07918
  240   0.45      2.38021
  480   0.38      2.68124
  720   0.36      2.85733
 1440   0.26      3.15836
 2880   0.20      3.45939
 5760   0.16      3.76042
10080   0.08      4.00346

Change (“transform”) the predictor time to log10(time).
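As a rough illustration of this step outside Minitab, the sketch below (assuming NumPy; the helper `fit_line` is a hypothetical name introduced here) computes log10(time), refits the line, and compares R-Sq before and after the transformation.

```python
import numpy as np

# Memory-retention data from Example 1 (time in minutes, proportion recalled).
time = np.array([1, 5, 15, 30, 60, 120, 240, 480, 720,
                 1440, 2880, 5760, 10080], dtype=float)
prop = np.array([0.84, 0.71, 0.61, 0.56, 0.54, 0.47, 0.45,
                 0.38, 0.36, 0.26, 0.20, 0.16, 0.08])


def fit_line(x, y):
    """Return (intercept, slope, r_squared) for the least-squares line y = b0 + b1 * x."""
    b1, b0 = np.polyfit(x, y, 1)
    resid = y - (b0 + b1 * x)
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return b0, b1, r2


# Transform the predictor only; the response is left alone.
log10_time = np.log10(time)

raw = fit_line(time, prop)
logged = fit_line(log10_time, prop)

print(f"time:       prop = {raw[0]:.6f} + ({raw[1]:.7f}) time,  R-Sq = {100 * raw[2]:.1f}%")
print(f"log10_time: prop = {logged[0]:.6f} + ({logged[1]:.6f}) log10_time,  R-Sq = {100 * logged[2]:.1f}%")
```

Only the X column changes here, so the spread of the responses around the fitted line is not rescaled the way it would be by a transformation of Y.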

Example 1: New fitted line plot

[Figure: regression plot of prop versus log10_time, with fitted line]

prop = 0.846415 - 0.182427 log10_time

S = 0.0233881, R-Sq = 99.0%, R-Sq(adj) = 98.9%

Example 1: Predicting new proportion

Estimated regression function:

$$\hat{Y} = 0.85 - 0.18 \log_{10}(\text{time})$$

Therefore, we predict the proportion of words recalled after 1000 minutes is:

$$\hat{Y} = 0.85 - 0.18 \log_{10}(1000) = 0.85 - 0.18(3) = 0.31$$
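The same arithmetic can be checked in a few lines of Python. This is only a sketch using the rounded coefficients 0.85 and 0.18 from the estimated regression function; the function name `predict_prop` is made up for illustration. Since the predictor is measured in minutes, the input is a time in minutes.

```python
import math


def predict_prop(minutes, b0=0.85, b1=-0.18):
    """Predicted proportion recalled, using the rounded fitted coefficients."""
    # The model is linear in log10(time), so transform the new X value first.
    return b0 + b1 * math.log10(minutes)


pred = predict_prop(1000)
print(round(pred, 2))  # prints 0.31
```

The point to remember is that a new X value must be transformed in the same way as the original data before it is plugged into the fitted equation.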

Example 1: New residuals vs. fits plot

[Figure: residuals versus the fitted values (response is prop), after transforming time to log10_time]

Example 1: New normal probability plot

[Figure: normal probability plot of the residuals (RESI1), after transforming time to log10_time]

N: 13, Average: -0.0000000, StDev: 0.0223924

W-test for Normality: R = 0.9786, P-Value (approx) > 0.1000

Some possible transformations of X

These are guidelines only and not complete. It usually takes some trial and error to find the best transformation.

$$X^* = \log_{10} X$$

$$X^* = \log_e X$$

$$X^* = \sqrt{X}$$

$$X^* = X^2$$

$$X^* = e^{X}$$

$$X^* = 1/X$$

$$X^* = e^{-X}$$

$$X^* = X^3$$
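The trial and error can be partly automated. The sketch below (assuming NumPy; the dictionary of candidates and the helper `r_squared` are illustrative choices, not part of the original slides) fits the Example 1 response against several transformed versions of time and ranks them by R-Sq.

```python
import numpy as np

# Memory-retention data from Example 1 (time in minutes, proportion recalled).
time = np.array([1, 5, 15, 30, 60, 120, 240, 480, 720,
                 1440, 2880, 5760, 10080], dtype=float)
prop = np.array([0.84, 0.71, 0.61, 0.56, 0.54, 0.47, 0.45,
                 0.38, 0.36, 0.26, 0.20, 0.16, 0.08])

# Candidate transformations of the predictor only.
candidates = {
    "X (untransformed)": time,
    "log10(X)": np.log10(time),
    "sqrt(X)": np.sqrt(time),
    "1/X": 1.0 / time,
    "exp(-X)": np.exp(-time),  # underflows harmlessly to 0 for large times
}


def r_squared(x, y):
    """R-squared of the least-squares line of y on x."""
    b1, b0 = np.polyfit(x, y, 1)
    resid = y - (b0 + b1 * x)
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)


results = {name: r_squared(x, prop) for name, x in candidates.items()}

# Print the candidates from best to worst fit.
for name, r2 in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} R-Sq = {100 * r2:5.1f}%")
```

A high R-Sq is only a first screen. The residual plots for the chosen transformation still need to be checked.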

Example 1: Time* = 1/Time

[Figure: regression plot of prop versus invtime, with fitted line]

prop = 0.378010 + 0.529152 invtime

S = 0.175783, R-Sq = 42.8%, R-Sq(adj) = 37.6%

Example 1: Time* = exp(-Time)

[Figure: regression plot of prop versus e_negtime, with fitted line]

prop = 0.397184 + 1.21886 e_negtime

S = 0.192907, R-Sq = 31.1%, R-Sq(adj) = 24.9%

If evidence of non-normality and unequal error variances …

• Since it is the shapes and spreads of the Y distributions that need to be changed, try transforming the Y values.

• Transformation on Y may also help “straighten out” a curved relationship.

• May also need to simultaneously transform the X values.

Example 2: Gestation time and birthweight for mammals

Mammal      Birthwgt   Gestation
Goat            2.75         155
Sheep           4.00         175
Deer            0.48         190
Porcupine       1.50         210
Bear            0.37         213
Hippo          50.00         243
Horse          30.00         340
Camel          40.00         380
Zebra          40.00         390
Giraffe        98.00         457
Elephant      113.00         670

• Predictor Birthwgt = birthweight, in kg, of mammal.

• Response Gestation = number of days until birth.

Example 2: Fitted line plot

[Figure: regression plot of Gestation versus Birthwgt, with fitted line]

Gestation = 187.084 + 3.59137 Birthwgt

S = 66.0943, R-Sq = 83.9%, R-Sq(adj) = 82.1%

Example 2: Residual vs. fits plot

[Figure: residuals versus the fitted values (response is Gestation)]

Example 2: Normal probability plot

[Figure: normal probability plot of the residuals (RESI1)]

N: 11, Average: -0.0000000, StDev: 62.7025

W-test for Normality: R = 0.9703, P-Value (approx) > 0.1000

Example 2: Transform the Y data

Mammal      Birthwgt   Gestation   logGest
Goat            2.75         155   2.19033
Sheep           4.00         175   2.24304
Deer            0.48         190   2.27875
Porcupine       1.50         210   2.32222
Bear            0.37         213   2.32838
Hippo          50.00         243   2.38561
Horse          30.00         340   2.53148
Camel          40.00         380   2.57978
Zebra          40.00         390   2.59106
Giraffe        98.00         457   2.65992
Elephant      113.00         670   2.82607

Change (“transform”) the response Gestation to log10(Gestation).
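A minimal sketch of the same transformation in Python, assuming NumPy (the variable names are illustrative): it computes log10(Gestation) for the eleven mammals and refits the line with the transformed response.

```python
import numpy as np

# Example 2 data: birthweight (kg) and gestation (days) for 11 mammals.
birthwgt = np.array([2.75, 4.00, 0.48, 1.50, 0.37, 50.00,
                     30.00, 40.00, 40.00, 98.00, 113.00])
gestation = np.array([155, 175, 190, 210, 213, 243,
                      340, 380, 390, 457, 670], dtype=float)

# Transform the response only; the predictor is left alone.
log_gest = np.log10(gestation)

# Least-squares line logGest = b0 + b1 * Birthwgt.
b1, b0 = np.polyfit(birthwgt, log_gest, 1)
resid = log_gest - (b0 + b1 * birthwgt)

# Residual standard error (n - 2 degrees of freedom) and R-squared.
n = len(birthwgt)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))
r2 = 1.0 - np.sum(resid ** 2) / np.sum((log_gest - log_gest.mean()) ** 2)

print(f"logGest = {b0:.5f} + {b1:.7f} Birthwgt")
print(f"S = {s:.4f}, R-Sq = {100 * r2:.1f}%")
```

Note that S and the residuals are now on the log10 scale, so they cannot be compared directly with the S of the untransformed fit.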

Example 2: New fitted line plot

[Figure: regression plot of logGest versus Birthwgt, with fitted line]

logGest = 2.29256 + 0.0045211 Birthwgt

S = 0.0939425, R-Sq = 80.3%, R-Sq(adj) = 78.1%

Example 2: Predicting new gestation

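Estimated regression function:

$$\widehat{\log_{10}(\text{Gest})} = 2.29 + 0.005\,\text{Birthwgt}$$

Therefore, since:

$$\widehat{\log_{10}(\text{Gest})} = 2.29 + 0.005(50) = 2.54$$

we predict the gestation length of another mammal with a birthweight of 50 kg to be:

$$\widehat{\text{Gest}} = 10^{\widehat{\log_{10}(\text{Gest})}} = 10^{2.54} = 346.7 \text{ days}$$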
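When the response has been transformed, a prediction comes out on the transformed scale and has to be converted back. The sketch below uses the rounded coefficients 2.29 and 0.005 from the estimated regression function; the function name `predict_gestation` is hypothetical.

```python
def predict_gestation(birthwgt_kg, b0=2.29, b1=0.005):
    """Predicted gestation in days, back-transformed from the log10 scale."""
    # Step 1: predict on the transformed (log10) scale.
    log10_gest = b0 + b1 * birthwgt_kg
    # Step 2: undo the log10 transformation.
    return 10 ** log10_gest


pred = predict_gestation(50)
print(f"{pred:.1f}")  # prints 346.7
```

Forgetting the second step is a common slip: 2.54 is a predicted log10(Gestation), not a predicted number of days.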

Example 2: New residual vs. fits plot

[Figure: residuals versus the fitted values (response is logGest)]

Example 2: New normal probability plot

[Figure: normal probability plot of the residuals (RESI2)]

N: 11, Average: -0.0000000, StDev: 0.0891217

W-test for Normality: R = 0.9743, P-Value (approx) > 0.1000

Some possible transformations of Y if not normal and unequal variances

These are guidelines only. It usually takes trial and error to find the best transformation, and a simultaneous transformation on X may also be needed.

$$Y^* = 1/Y$$

$$Y^* = \log_{10} Y$$

$$Y^* = \sqrt{Y}$$

$$Y^* = \log_e Y$$

Example 3: Length and Weight of Alligators

[Figure: regression plot of Weight versus Length, with fitted line]

Weight = -393.264 + 5.90235 Length

S = 54.0115, R-Sq = 83.6%, R-Sq(adj) = 82.9%

Example 3: Residuals vs. fits plot

[Figure: residuals versus the fitted values (response is weight)]

Example 3: Normal probability plot

[Figure: normal probability plot of the residuals (RESI1)]

N: 25, Average: 0.0000000, StDev: 52.8742

W-test for Normality: R = 0.9436, P-Value (approx): 0.0165

Example 3: Transform the data

weight   length   loge_wt   loge_len
   130       94   4.86753    4.54329
    51       74   3.93183    4.30407
   640      147   6.46147    4.99043
    28       58   3.33220    4.06044
    80       86   4.38203    4.45435
   110       94   4.70048    4.54329
    33       63   3.49651    4.14313
    90       86   4.49981    4.45435
    36       69   3.58352    4.23411
    38       72   3.63759    4.27667
   366      128   5.90263    4.85203
    84       85   4.43082    4.44265
    80       82   4.38203    4.40672
   102       90   4.62497    4.49981
… and so on …

• Transform predictor length to loge(length)

• Transform response weight to loge(weight)
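A sketch of transforming both variables at once, assuming NumPy. It uses only the 14 rows printed in the table above, not the full set of 25 alligators, so the fitted coefficients will not match the Minitab output for the complete data. The variable names are illustrative.

```python
import numpy as np

# First 14 rows of the alligator data as listed in the table.
# ASSUMPTION: this is a partial data set; the remaining 11 rows are not shown.
weight = np.array([130, 51, 640, 28, 80, 110, 33, 90,
                   36, 38, 366, 84, 80, 102], dtype=float)
length = np.array([94, 74, 147, 58, 86, 94, 63, 86,
                   69, 72, 128, 85, 82, 90], dtype=float)

# Natural-log transform of both the response (weight) and the predictor (length).
loge_wt = np.log(weight)
loge_len = np.log(length)

# Least-squares line loge_wt = b0 + b1 * loge_len on the 14 listed rows.
b1, b0 = np.polyfit(loge_len, loge_wt, 1)
resid = loge_wt - (b0 + b1 * loge_len)
r2 = 1.0 - np.sum(resid ** 2) / np.sum((loge_wt - loge_wt.mean()) ** 2)

print(f"loge_wt = {b0:.4f} + {b1:.4f} loge_len   (14 rows only)")
print(f"R-Sq = {100 * r2:.1f}%")
```

A straight line on the log-log scale corresponds to a power relationship on the original scale, weight = e^{b0} × length^{b1}, which is why transforming both variables can straighten this kind of curve.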

Example 3: New fitted line plot

[Figure: regression plot of loge_wt versus loge_len, with fitted line]

loge_wt = -10.1746 + 3.28599 loge_len

S = 0.175311, R-Sq = 94.5%, R-Sq(adj) = 94.3%

Example 3: New residual plot

[Figure: residuals versus the fitted values (response is loge_wt)]

Example 3: New normal probability plot

[Figure: normal probability plot of the residuals (RESI2)]

N: 25, Average: 0.0000000, StDev: 0.171619

W-test for Normality: R = 0.9847, P-Value (approx) > 0.1000

Transforming data in Minitab

• Calc >> Calculator …

• In the box labeled “Store result in variable,” tell Minitab in which column (variable) you want the transformed data stored.

• Type (input) the expression for the desired transformation in the box labeled Expression. Use the available functions.

• Select OK. The data will appear in the column of the worksheet that you specified.