Applied Statistics in Business & Economics, 5th edition · 2017. 4. 27. · Prepared by Lloyd R....

McGraw-Hill/Irwin Copyright © 2015 by The McGraw-Hill Companies, Inc. All rights reserved. A PowerPoint Presentation Package to Accompany Applied Statistics in Business & Economics, 5th edition. David P. Doane and Lori E. Seward. Prepared by Lloyd R. Jaisingh.


Simple Regression

Chapter Contents

12.1 Visual Displays and Correlation Analysis

12.2 Simple Regression

12.3 Regression Models

12.4 Ordinary Least Squares Formulas

12.5 Tests for Significance

12.6 Analysis of Variance: Overall Fit

12.7 Confidence and Prediction Intervals for Y


12.8 Residual Tests

12.9 Unusual Observations

12.10 Other Regression Problems (Optional)


Chapter Learning Objectives (LO’s)

LO12-1: Calculate and test a correlation coefficient for significance.

LO12-2: Interpret a regression equation and use it to make predictions.

LO12-3: Explain the form and assumptions of a simple regression model.

LO12-4: Explain the least squares method, apply formulas for coefficients, and interpret R².

LO12-5: Construct confidence intervals and test hypotheses for the slope and intercept.

LO12-6: Interpret the ANOVA table and use it to compute F, R², and standard error.


LO12-7: Distinguish between confidence and prediction intervals for Y.

LO12-8: Calculate residuals and perform tests of regression assumptions.

LO12-9: Identify unusual residuals and tell when they are outliers.

LO12-10: Define leverage and identify high-leverage observations.

LO12-11: Improve data conditioning and use transformations if needed (Optional).


12.1 Visual Displays and Correlation Analysis

Visual Displays

• Begin the analysis of bivariate data (i.e., two variables) with a scatter plot.

• A scatter plot
- displays each observed data pair (xi, yi) as a dot on an X/Y grid.
- indicates visually the strength of the relationship between the two variables.

[Figure: Sample Scatter Plot]

Correlation Coefficient, r

LO12-1: Calculate and test a correlation coefficient for significance.

• The sample correlation coefficient (r) measures the degree of linearity in the relationship between X and Y.

Note: -1 ≤ r ≤ +1

r = 0 indicates no linear relationship.
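As an illustration of the correlation coefficient described above, here is a minimal sketch that computes r from paired data using the standard textbook formula (sum of cross-products divided by the square root of the product of the sums of squares). The function name `pearson_r` and the five data pairs are made up for this example and do not come from the slides.

```python
import math


def pearson_r(x, y):
    """Sample correlation coefficient r for paired observations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sums of squared deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"r = {r:.4f}")
```

A value near +1 or -1 would indicate a strong linear relationship; a value near 0 would indicate little or no linear relationship.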

[Figure: Scatter Plots Showing Various Correlation Values]

Tests for Significant Correlation Using Student’s t

• Note: r is an estimate of the population correlation coefficient ρ (rho).

• Step 1: State the Hypotheses
Determine whether you are using a one-tailed or two-tailed test and the level of significance (α).
H0: ρ = 0
H1: ρ ≠ 0

• Step 2: Specify the Decision Rule
For degrees of freedom df = n - 2, look up the critical value tα (tα/2 for a two-tailed test) in Appendix D.

• Step 3: Calculate the Test Statistic
$t_{calc} = r \sqrt{\dfrac{n-2}{1-r^2}}$

• Step 4: Make the Decision
Reject H0 if tcalc > tα/2 or if tcalc < -tα/2.

• Also, reject H0 if the p-value ≤ α.
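The test steps above can be sketched in a few lines. This is a hedged example, not the textbook's own procedure: the inputs r = 0.60 and n = 27 are hypothetical, the statistic is the standard t = r·sqrt((n - 2)/(1 - r²)), and the critical value 2.060 is the two-tailed α = 0.05 entry for df = 25 as read from a Student's t table.

```python
import math


def correlation_t(r, n):
    """t statistic for testing H0: rho = 0, with df = n - 2."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical sample result: r = 0.60 from n = 27 data pairs
r, n = 0.60, 27
t_calc = correlation_t(r, n)

# Two-tailed critical value for alpha = 0.05 and df = 25 (from a t table)
t_crit = 2.060

# Decision rule: reject H0 if t_calc falls in either tail
reject_h0 = abs(t_calc) > t_crit
print(f"t_calc = {t_calc:.3f}, reject H0: {reject_h0}")
```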

Critical Value for Correlation Coefficient (Tests for Significance)

• Equivalently, you can calculate the critical value for the correlation coefficient using
$r_{\alpha} = \dfrac{t_{\alpha}}{\sqrt{t_{\alpha}^2 + n - 2}}$

• This method gives a benchmark for the correlation coefficient.

• However, it yields no p-value and is inflexible if you change your mind about α.

• MegaStat uses this method, giving two-tail critical values for α = 0.05 and α = 0.01.
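The benchmark approach can be sketched as follows, assuming the standard conversion from a critical t value to a critical r value, r = t / sqrt(t² + n - 2). The sample size n = 27 and the table value t = 2.060 (two-tailed α = 0.05, df = 25) are hypothetical inputs chosen for illustration.

```python
import math


def critical_r(t_crit, n):
    """Critical correlation coefficient implied by a critical t with df = n - 2."""
    return t_crit / math.sqrt(t_crit ** 2 + n - 2)


n = 27            # hypothetical sample size
t_crit = 2.060    # two-tailed alpha = 0.05, df = 25, from a t table
r_crit = critical_r(t_crit, n)

# A sample |r| larger than r_crit would be judged significant at this alpha
print(f"critical r = {r_crit:.3f}")
```

Because the benchmark is tied to one α, changing α means looking up a new t value and recomputing, which is the inflexibility the slide mentions.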


12.2 Simple Regression

What is Simple Regression?

LO12-2: Interpret the slope and intercept of a regression equation and use it to make predictions.

• Simple regression analyzes the relationship between two variables.

• It specifies one dependent (response) variable and one independent (predictor) variable.

• The hypothesized relationship here will be linear, of the form Y = slope × X + y-intercept.

[Figure: Interpreting an Estimated Regression Equation: Examples]

[Figure: Prediction Using Regression: Examples]


12.3 Regression Models

Model and Parameters

LO12-3: Explain the form and assumptions of a simple regression model.

• The assumed model for a linear relationship is
y = β0 + β1x + ε.

• The relationship holds for all pairs (xi, yi).

• The error term ε is not observable and is assumed to be independently normally distributed with a mean of 0 and standard deviation σ.

• The unknown parameters are:
β0 Intercept
β1 Slope

• The fitted model or regression model is used to predict the expected value of Y for a given value of X:
$\hat{y} = b_0 + b_1 x$

• The fitted coefficients are
b0 the estimated intercept
b1 the estimated slope

Fitting a Regression on a Scatter Plot

A more precise method is to let Excel calculate the estimates. We enter observations on the independent variable x1, x2, . . ., xn and the dependent variable y1, y2, . . ., yn into separate columns, and let Excel fit the regression equation, as illustrated in Figure 12.6. Excel will choose the regression coefficients so as to produce a good fit.

Slope and Intercept Interpretations

• Figure 12.6 (previous slide) shows a sample of miles per gallon and horsepower for 15 engines. The Excel graph and its fitted regression equation are also shown.

• Slope Interpretation: The slope of -0.0785 says that for each additional unit of engine horsepower, the miles per gallon decreases by 0.0785 mile. This estimated slope is a statistic because a different sample might yield a different estimate of the slope.

• Intercept Interpretation: The intercept value of 49.216 suggests that when the engine has no horsepower, the fuel efficiency would be quite high. However, the intercept has little meaning in this case, not only because zero horsepower makes no logical sense, but also because extrapolating to x = 0 is beyond the range of the observed data.
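To make the slope and intercept interpretations concrete, the sketch below plugs a horsepower value into the fitted equation using the two coefficients quoted above (intercept 49.216, slope -0.0785). The function name `predict_mpg` and the input of 200 horsepower are hypothetical; the slides do not state the observed horsepower range, so whether 200 lies inside it is an assumption.

```python
B0 = 49.216    # estimated intercept quoted on the slide
B1 = -0.0785   # estimated slope quoted on the slide (MPG per unit of horsepower)


def predict_mpg(horsepower):
    """Predicted miles per gallon from the fitted line y-hat = b0 + b1 * x."""
    return B0 + B1 * horsepower


# Hypothetical input; predictions are only trustworthy within the observed x range
mpg_200 = predict_mpg(200)
print(f"Predicted MPG at 200 hp: {mpg_200:.3f}")

# One extra unit of horsepower changes the prediction by exactly the slope
change_per_hp = predict_mpg(201) - predict_mpg(200)
```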

12.4 Ordinary Least Squares (OLS) Formulas

Slope and Intercept

LO12-4: Explain the least squares method, apply formulas for coefficients, and interpret R².

• The ordinary least squares (OLS) method estimates the slope and intercept of the regression line so that the sum of squared residuals is minimized, which ensures the best fit.

• The sum of the residuals = 0.

• The sum of the squared residuals is SSE.

• The OLS estimator for the slope is:
$b_1 = \dfrac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}$ or $b_1 = r\,\dfrac{s_y}{s_x}$

• The OLS estimator for the intercept is:
$b_0 = \bar{y} - b_1\bar{x}$
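The OLS slope and intercept estimators can be sketched directly from their standard textbook definitions: the slope is the sum of cross-products of deviations divided by the sum of squared x deviations, and the intercept is the mean of y minus the slope times the mean of x. The function name `ols_fit` and the data are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def ols_fit(x, y):
    """Return (b0, b1), the OLS intercept and slope for y regressed on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = ols_fit(x, y)

# Residuals are observed minus fitted values; under OLS they sum to zero
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, sum of residuals = {sum(residuals):.6f}")
```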

*Recall from Chapter 8 that an unbiased estimator’s expected value is the true parameter and that a consistent estimator approaches ever closer to the true parameter as the sample size increases.

Assessing Fit

• We want to explain the total variation in Y around its mean (SST, the total sum of squares):
$SST = \sum_{i=1}^{n}(y_i-\bar{y})^2$

• The regression sum of squares (SSR) is the explained variation in Y:
$SSR = \sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2$

• The error sum of squares (SSE) is the unexplained variation in Y:
$SSE = \sum_{i=1}^{n}(y_i-\hat{y}_i)^2$

• If the fit is good, SSE will be relatively small compared to SST.

• A perfect fit is indicated by SSE = 0.

• The magnitude of SSE depends on n and on the units of measurement.

Coefficient of Determination

• R² is a measure of relative fit based on a comparison of SSR and SST:
$R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$

• Often expressed as a percent, R² = 1 (i.e., 100%) indicates a perfect fit. In simple regression, R² = r².
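The fit measures discussed above can be computed together. The sketch below assumes the standard definitions (SST as total variation about the mean of Y, SSR as explained variation, SSE as unexplained variation, and R² = SSR/SST); the function name `sums_of_squares` and the data are hypothetical.

```python
def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for a simple OLS fit of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
    return sst, ssr, sse


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
sst, ssr, sse = sums_of_squares(x, y)
r_squared = ssr / sst
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}, R^2 = {r_squared:.2f}")
```

Rescaling y (for example, from dollars to thousands of dollars) would change SSE but leave R² unchanged, which is why R² is described as a relative measure of fit.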

12.5 Tests for Significance

Standard Error of Regression

LO12-5: Construct confidence intervals and test hypotheses for the slope and intercept.

• The standard error (se) is an overall measure of model fit:
$s_e = \sqrt{\dfrac{SSE}{n-2}}$

• If the fitted model’s predictions are perfect (SSE = 0), then se = 0. Thus, a smaller se indicates a better fit.

• Used to construct confidence intervals.

• The magnitude of se depends on the units of measurement of Y and on data magnitude.

Confidence Intervals for Slope and Intercept

• Standard error of the slope and intercept:
$s_{b_1} = \dfrac{s_e}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}}$
$s_{b_0} = s_e\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}}$

• Confidence interval for the true slope and intercept (df = n - 2):
$b_1 - t_{\alpha/2}\, s_{b_1} \le \beta_1 \le b_1 + t_{\alpha/2}\, s_{b_1}$
$b_0 - t_{\alpha/2}\, s_{b_0} \le \beta_0 \le b_0 + t_{\alpha/2}\, s_{b_0}$

• Note: One can use Excel, Minitab, MegaStat, or other software to compute these intervals and do hypothesis tests relating to linear regression.
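For readers without Excel, Minitab, or MegaStat at hand, a confidence interval for the slope can be sketched by hand. This example assumes the standard formulas (standard error of regression sqrt(SSE/(n - 2)), standard error of the slope equal to that value divided by the square root of the sum of squared x deviations, and an interval of b1 ± t·s_b1). The data and the function name `slope_interval` are hypothetical, and 3.182 is the two-tailed α = 0.05 table value for df = 3.

```python
import math


def slope_interval(x, y, t_crit):
    """Return (b1, se, sb1, (lower, upper)) for a simple OLS regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2))   # standard error of the regression
    sb1 = se / math.sqrt(sxx)       # standard error of the slope
    return b1, se, sb1, (b1 - t_crit * sb1, b1 + t_crit * sb1)


# Hypothetical data (n = 5, so df = 3), for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T_CRIT = 3.182  # two-tailed alpha = 0.05, df = 3, from a t table
b1, se, sb1, (lower, upper) = slope_interval(x, y, T_CRIT)
print(f"b1 = {b1:.3f}, 95% CI: ({lower:.3f}, {upper:.3f})")
```

If the interval contains zero, as it does for this tiny sample, the slope is not significantly different from zero at that α.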

Hypothesis Tests

• Is the true slope different from zero? If β1 = 0, then X cannot influence Y and the regression model collapses to a constant β0 plus random error.

• The hypotheses (for zero slope and/or intercept) to be tested are:
Slope: H0: β1 = 0 versus H1: β1 ≠ 0, with $t_{calc} = \dfrac{b_1 - 0}{s_{b_1}}$
Intercept: H0: β0 = 0 versus H1: β0 ≠ 0, with $t_{calc} = \dfrac{b_0 - 0}{s_{b_0}}$

df = n - 2

Reject H0 if |tcalc| > tα/2 or if p-value ≤ α.

12.6 Analysis of Variance: Overall Fit

Decomposition of Variance

LO12-6: Interpret the ANOVA table and use it to calculate F, R², and the standard error.

• The decomposition of variance may be written as
$\sum_{i=1}^{n}(y_i-\bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2 + \sum_{i=1}^{n}(y_i-\hat{y}_i)^2$, that is, SST = SSR + SSE.

F Test for Overall Fit

• To test a regression for overall significance, we use an F test to compare the explained (SSR) and unexplained (SSE) sums of squares.
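The F test for overall fit can be sketched as follows, assuming the usual ANOVA layout for simple regression: MSR = SSR/1, MSE = SSE/(n - 2), and F = MSR/MSE with 1 and n - 2 degrees of freedom. The data and the function name `anova_simple` are hypothetical, and 10.13 is the α = 0.05 table value for F with (1, 3) degrees of freedom.

```python
def anova_simple(x, y):
    """Return (SSR, SSE, MSR, MSE, F) for a simple OLS regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    msr = ssr / 1          # one predictor, so 1 numerator degree of freedom
    mse = sse / (n - 2)    # n - 2 denominator degrees of freedom
    return ssr, sse, msr, mse, msr / mse


# Hypothetical data (n = 5), for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
ssr, sse, msr, mse, f_calc = anova_simple(x, y)

F_CRIT = 10.13  # alpha = 0.05, df = (1, 3), from an F table
significant = f_calc > F_CRIT
print(f"F = {f_calc:.2f}, significant at 0.05: {significant}")
```

In simple regression the F statistic equals the square of the t statistic for the slope, so the two tests always agree.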

12.7 Confidence and Prediction Intervals for Y

How to Construct an Interval Estimate for Y

LO12-7: Distinguish between confidence and prediction intervals for Y.

• Confidence interval for the conditional mean of Y:
$\hat{y}_i \pm t_{\alpha/2}\, s_e \sqrt{\dfrac{1}{n} + \dfrac{(x_i-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}}$

• Prediction intervals are wider than confidence intervals because individual Y values vary more than the mean of Y.
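The difference between the two intervals can be seen numerically. This sketch assumes the standard formulas: both half-widths are t·se times a square root, where the confidence interval uses 1/n + (x - x̄)²/Σ(xi - x̄)² under the radical and the prediction interval adds 1 to that quantity. The data, the chosen x value of 4, and the function name `interval_half_widths` are hypothetical; 3.182 is the two-tailed α = 0.05 table value for df = 3.

```python
import math


def interval_half_widths(x, y, x_new, t_crit):
    """Return (y_hat, ci_half, pi_half) at x_new for a simple OLS fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2))
    core = 1 / n + (x_new - x_bar) ** 2 / sxx
    y_hat = b0 + b1 * x_new
    ci_half = t_crit * se * math.sqrt(core)      # for the conditional mean of Y
    pi_half = t_crit * se * math.sqrt(1 + core)  # for an individual Y value
    return y_hat, ci_half, pi_half


# Hypothetical data (n = 5, df = 3), for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat, ci_half, pi_half = interval_half_widths(x, y, x_new=4, t_crit=3.182)
print(f"y_hat = {y_hat:.2f}, CI: +/-{ci_half:.3f}, PI: +/-{pi_half:.3f}")
```

Both intervals are narrowest at x̄ and widen as x moves away from it.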

12.8 Residual Tests

LO12-8: Calculate residuals and perform tests of regression assumptions.

Three Important Assumptions

1. The errors are normally distributed.

2. The errors have constant variance (i.e., they are homoscedastic).

3. The errors are independent (i.e., they are nonautocorrelated).

Violation of Assumption 1: Non-normal Errors

• Non-normality of errors is a mild violation since the regression parameter estimates b0 and b1 and their variances remain unbiased and consistent.

• Confidence intervals for the parameters may be untrustworthy because the normality assumption is used to justify using Student’s t distribution.

Non-normal Errors

• A large sample size would compensate.

• Outliers could pose serious problems.

Normal Probability Plot

• The Normal Probability Plot tests the assumption
H0: Errors are normally distributed
H1: Errors are not normally distributed

• If H0 is true, the residual probability plot should be linear, as shown in the example.

What to Do About Non-Normality?

1. Trim outliers only if they clearly are mistakes.

2. Increase the sample size if possible.

3. Try a logarithmic transformation of both X and Y.

Violation of Assumption 2: Nonconstant Variance

• The ideal condition is that the error magnitude is constant (i.e., errors are homoscedastic).

• Heteroscedastic (nonconstant) errors increase or decrease with X.

• In the most common form of heteroscedasticity, the variances of the estimators are likely to be understated.

• This results in overstated t statistics and artificially narrow confidence intervals.

Tests for Heteroscedasticity

• Plot the residuals against X. Ideally, there is no pattern in the residuals moving from left to right.

• The “fan-out” pattern of increasing residual variance is the most common pattern indicating heteroscedasticity.

What to Do About Heteroscedasticity?

• Transform both X and Y, for example, by taking logs.

• Although it can widen the confidence intervals for the coefficients, heteroscedasticity does not bias the estimates.

Violation of Assumption 3: Autocorrelated Errors

• Autocorrelation is a pattern of non-independent errors.

• In a first-order autocorrelation, $e_t$ is correlated with $e_{t-1}$.

• The estimated variances of the OLS estimators are biased, resulting in confidence intervals that are too narrow, overstating the model’s fit.

Runs Test for Autocorrelation

• In the runs test, count the number of sign reversals in the residuals (i.e., how often does the residual cross the zero centerline?).

• If the pattern is random, the number of sign changes should be approximately n/2.

• Fewer than n/2 would suggest positive autocorrelation.

• More than n/2 would suggest negative autocorrelation.

Durbin-Watson (DW) Test

• Tests for autocorrelation under the hypotheses
H0: Errors are non-autocorrelated
H1: Errors are autocorrelated

• The DW statistic will range from 0 to 4.
DW < 2 suggests positive autocorrelation
DW = 2 suggests no autocorrelation (ideal)
DW > 2 suggests negative autocorrelation
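Both autocorrelation checks above are simple to compute from a list of residuals in time order. The sketch below counts zero-line crossings and computes the Durbin-Watson statistic using its standard definition, the sum of squared successive differences of the residuals divided by the sum of squared residuals. The residual values and function names are made up for illustration.

```python
def sign_changes(residuals):
    """Count how often consecutive residuals switch sign (zero-line crossings).

    A residual of exactly zero is not counted as a crossing in this sketch.
    """
    return sum(1 for a, b in zip(residuals, residuals[1:]) if a * b < 0)


def durbin_watson(residuals):
    """DW = sum of squared successive differences / sum of squared residuals."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical residuals in time order, for illustration only
e = [1.0, 0.5, 0.8, -0.3, -0.6, -0.2, 0.4, 0.7]
crossings = sign_changes(e)
dw = durbin_watson(e)
print(f"sign changes = {crossings} (n/2 = {len(e) / 2}), DW = {dw:.3f}")
```

Here the residuals stay on one side of zero for several periods at a time, so there are fewer crossings than n/2 and DW falls below 2, both pointing toward positive autocorrelation.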

What to Do About Autocorrelation?

• Transform both variables using the method of first differences, in which both variables are redefined as changes. Then we regress the change in Y against the change in X.

• Although it can widen the confidence interval for the coefficients, autocorrelation does not bias the estimates.
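The method of first differences can be sketched as follows: each series is replaced by its period-to-period changes, and the regression is then run on the changes. The two series and the function names are hypothetical, and the fit uses the standard OLS slope and intercept formulas.

```python
def first_differences(series):
    """Replace a time series by its period-to-period changes."""
    return [b - a for a, b in zip(series, series[1:])]


def ols_fit(x, y):
    """Return (b0, b1) for y regressed on x using the standard OLS formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical time series, for illustration only
x = [10, 12, 15, 19, 24]
y = [5, 6, 8, 11, 15]

dx = first_differences(x)  # one fewer observation than the original series
dy = first_differences(y)
b0, b1 = ols_fit(dx, dy)   # regress the change in Y on the change in X
print(f"dx = {dx}, dy = {dy}, slope on changes = {b1:.3f}")
```

Note that differencing costs one observation, so the degrees of freedom for the differenced regression are based on n - 1 data pairs.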

12.9 Unusual Observations

Standardized Residuals

LO12-9: Identify unusual residuals and tell when they are outliers.

• One can use Excel, Minitab, MegaStat, or other software to compute standardized residuals.

• If the absolute value of any standardized residual is at least 2, then it is classified as unusual.

High Leverage

LO12-10: Define leverage and identify high-leverage observations.

• A high leverage statistic indicates the observation is far from the mean of X.

• These observations are influential because they are at the “end of the lever.”

• The leverage for observation i is denoted hi.

• A leverage that exceeds 3/n is unusual.
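Leverage can be computed from the x values alone. This sketch assumes the standard simple-regression formula, hi = 1/n + (xi - x̄)²/Σ(xi - x̄)², which the slides do not show, and applies the 3/n cutoff stated above. The x values and the function name `leverages` are hypothetical.

```python
def leverages(x):
    """Leverage h_i of each observation in a simple regression on x."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


# Hypothetical predictor values; the last one lies far from the others
x = [1, 2, 3, 4, 5, 20]
h = leverages(x)

cutoff = 3 / len(x)  # rule of thumb quoted on the slide
high = [i for i, hi in enumerate(h) if hi > cutoff]
print(f"leverages = {[round(v, 3) for v in h]}, high-leverage indices = {high}")
```

Only the observation at x = 20 exceeds the cutoff here, because it sits far from the mean of X.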

12.10 Other Regression Problems (Optional)

LO12-11: Improve data conditioning and use transformations if needed (optional).

Outliers

Outliers may be caused by
- an error in recording data
- impossible data
- an observation that has been influenced by an unspecified “lurking” variable that should have been controlled but wasn’t.

To fix the problem,
- delete the observation(s)
- delete the data
- formulate a multiple regression model that includes the lurking variable.

Model Misspecification

• If a relevant predictor has been omitted, then the model is misspecified.

• Use multiple regression instead of bivariate regression.

Ill-Conditioned Data

• Well-conditioned data values are of the same general order of magnitude.

• Ill-conditioned data have unusually large or small data values and can cause loss of regression accuracy or awkward estimates.

• Avoid mixing magnitudes by adjusting the magnitude of your data before running the regression.

Spurious Correlation

• In a spurious correlation, two variables appear related because of the way they are defined.

• This problem is called the size effect or problem of totals.

Model Form and Variable Transforms

• Sometimes a nonlinear model is a better fit than a linear model.

• Excel offers many model forms.

• Variables may be transformed (e.g., logarithmic or exponential functions) in order to provide a better fit.

• Log transformations can reduce heteroscedasticity.

• Nonlinear models may be difficult to interpret.
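As one illustration of a variable transform, the sketch below fits a power-law relationship y = a·x^b by taking natural logs of both variables and running an ordinary linear regression on the logged values. This is only one of several possible model forms; the data are made up (they follow y = 3x exactly), and the function name `ols_fit` is hypothetical.

```python
import math


def ols_fit(x, y):
    """Return (b0, b1) for y regressed on x using the standard OLS formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical data that follow y = 3 * x exactly
x = [1, 2, 4, 8]
y = [3, 6, 12, 24]

# Log transformation of both X and Y (all values must be positive)
ln_x = [math.log(v) for v in x]
ln_y = [math.log(v) for v in y]

b0, b1 = ols_fit(ln_x, ln_y)   # ln(y) = b0 + b1 * ln(x)
a = math.exp(b0)               # back-transform the intercept: y = a * x ** b1
print(f"fitted model: y = {a:.3f} * x^{b1:.3f}")
```

In a log-log model the slope is read as an elasticity (the approximate percent change in Y for a one percent change in X), which is one reason such models take more care to interpret.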