
Chapter 12 Simple Regression

True / False Questions

1. A scatter plot is used to visualize the association (or lack of association) between two quantitative variables. True / False

2. The correlation coefficient r measures the strength of the linear relationship between two variables. True / False

3. Pearson's correlation coefficient (r) requires that both variables be interval or ratio data. True / False

4. If r = .55 and n = 16, then the correlation is significant at α = .05 in a two-tailed test. True / False

5. A sample correlation r = .40 indicates a stronger linear relationship than r = -.60. True / False

6. A common source of spurious correlation between X and Y is when a third unspecified variable Z affects both X and Y. True / False

7. The correlation coefficient r always has the same sign as b1 in Y = b0 + b1X. True / False

8. The fitted intercept in a regression has little meaning if no data values near X = 0 have been observed. True / False

9. The least squares regression line is obtained when the sum of the squared residuals is minimized. True / False

10. In a simple regression, if the coefficient for X is positive and significantly different from zero, then an increase in X is associated with an increase in the mean (i.e., the expected value) of Y. True / False

11. In least-squares regression, the residuals e1, e2, . . . , en will always have a zero mean. True / False

12. When using the least squares method, the column of residuals always sums to zero. True / False

13. In the model Sales = 268 + 7.37 Ads, an additional $1 spent on ads will increase sales by 7.37 percent. True / False

14. If R2 = .36 in the model Sales = 268 + 7.37 Ads with n = 50, the two-tailed test for correlation at α = .05 would say that there is a significant correlation between Sales and Ads. True / False

15. If R2 = .36 in the model Sales = 268 + 7.37 Ads, then Ads explains 36 percent of the variation in Sales. True / False

16. The ordinary least squares regression line always passes through the point (x̄, ȳ). True / False

17. The least squares regression line gives unbiased estimates of β0 and β1. True / False

18. In a simple regression, the correlation coefficient r is the square root of R2. True / False

19. If SSR is 1800 and SSE is 200, then R2 is .90. True / False

20. The width of a prediction interval for an individual value of Y is less than the standard error se. True / False

21. If SSE is near zero in a regression, the statistician will conclude that the proposed model probably has too poor a fit to be useful. True / False

22. For a regression with 200 observations, we expect that about 10 residuals will exceed two standard errors. True / False

23. Confidence intervals for predicted Y are less precise when the residuals are very small. True / False

24. Cause-and-effect direction between X and Y may be determined by running the regression twice and seeing whether Y = β0 + β1X or X = β1 + β0Y has the larger R2. True / False

25. The ordinary least squares method of estimation minimizes the estimated slope and intercept. True / False

26. Using the ordinary least squares method ensures that the residuals will be normally distributed. True / False

27. If you have a strong outlier in the residuals, it may represent a different causal system. True / False

28. A negative correlation between two variables X and Y usually yields a negative p-value for r. True / False

29. In linear regression between two variables, a significant relationship exists when the p-value of the t test statistic for the slope is greater than α. True / False

30. The larger the absolute value of the t statistic of the slope in a simple linear regression, the stronger the linear relationship between X and Y. True / False

31. In simple linear regression, the coefficient of determination (R2) is estimated from sums of squares in the ANOVA table. True / False

32. In simple linear regression, the p-value of the slope will always equal the p-value of the F statistic. True / False

33. An observation with high leverage will have a large residual (usually an outlier). True / False

34. A prediction interval for Y is narrower than the corresponding confidence interval for the mean of Y. True / False

35. When X is farther from its mean, the prediction interval and confidence interval for Y become wider. True / False

36. The total sum of squares (SST) will never exceed the regression sum of squares (SSR). True / False

37. "High leverage" would refer to a data point that is poorly predicted by the model (large residual). True / False

38. The studentized residuals permit us to detect cases where the regression predicts poorly. True / False

39. A poor prediction (large residual) indicates an observation with high leverage. True / False

40. Ill-conditioned refers to a variable whose units are too large or too small (e.g., $2,434,567). True / False

41. A simple decimal transformation (e.g., from 18,291 to 18.291) often improves data conditioning. True / False

42. Two-tailed t-tests are often used because any predictor that differs significantly from zero in a two-tailed test will also be significantly greater than zero or less than zero in a one-tailed test at the same α. True / False

43. A predictor that is significant in a one-tailed t-test will also be significant in a two-tailed test at the same level of significance α. True / False

44. Omission of a relevant predictor is a common source of model misspecification. True / False

45. The regression line must pass through the origin. True / False

46. Outliers can be detected by examining the standardized residuals. True / False

47. In a simple regression, there are n - 2 degrees of freedom associated with the error sum of squares (SSE). True / False

48. In a simple regression, the F statistic is calculated by taking the ratio of MSR to the MSE. True / False

49. The coefficient of determination is the percentage of the total variation in the response variable Y that is explained by the predictor X. True / False

50. A different confidence interval exists for the mean value of Y for each different value of X. True / False

51. A prediction interval for Y is widest when X is near its mean. True / False

52. In a two-tailed test for correlation at α = .05, a sample correlation coefficient r = 0.42 with n = 25 is significantly different than zero. True / False

53. In correlation analysis, neither X nor Y is designated as the independent variable. True / False

54. A negative value for the correlation coefficient (r) implies a negative value for the slope (b1). True / False

55. High leverage for an observation indicates that X is far from its mean. True / False

56. Autocorrelated errors are not usually a concern for regression models using cross-sectional data. True / False

57. There are usually several possible regression lines that will minimize the sum of squared errors. True / False

58. When the errors in a regression model are not independent, the regression model is said to have autocorrelation. True / False

59. In a simple bivariate regression, $F_{calc} = t_{calc}^2$. True / False

60. Correlation analysis primarily measures the degree of the linear relationship between X and Y. True / False

Multiple Choice Questions

61. The variable used to predict another variable is called the:

A.response variable.

B.regression variable.

C.independent variable.

D.dependent variable.

62.The standard error of the regression:

A.is based on squared deviations from the regression line.

B.may assume negative values if b1 < 0.

C.is in squared units of the dependent variable.

D.may be cut in half to get an approximate 95 percent prediction interval.

63.A local trucking company fitted a regression to relate the travel time (days) of its shipments as a function of the distance traveled (miles). The fitted regression is Time = -7.126 + 0.0214 Distance, based on a sample of 20 shipments. The estimated standard error of the slope is 0.0053. Find the value of tcalc to test for zero slope.

A.2.46

B.5.02

C.4.04

D.3.15

64. A local trucking company fitted a regression to relate the travel time (days) of its shipments as a function of the distance traveled (miles). The fitted regression is Time = -7.126 + .0214 Distance, based on a sample of 20 shipments. The estimated standard error of the slope is 0.0053. Find the critical value for a right-tailed test to see if the slope is positive, using α = .05.

A.2.101

B.2.552

C.1.960

D.1.734

65.If the attendance at a baseball game is to be predicted by the equation Attendance = 16,500 - 75 Temperature, what would be the predicted attendance if Temperature is 90 degrees?

A.6,750

B.9,750

C.12,250

D.10,020

66.A hypothesis test is conducted at the 5 percent level of significance to test whether the population correlation is zero. If the sample consists of 25 observations and the correlation coefficient is 0.60, then the computed test statistic would be:

A.2.071.

B.1.960.

C.3.597.

D.1.645.

67.Which of the following is not a characteristic of the F-test in a simple regression?

A.It is a test for overall fit of the model.

B.The test statistic can never be negative.

C.It requires a table with numerator and denominator degrees of freedom.

D.The F-test gives a different p-value than the t-test.

68.A researcher's Excel results are shown below using Femlab (labor force participation rate among females) to try to predict Cancer (death rate per 100,000 population due to cancer) in the 50 U.S. states.

Which of the following statements is not true?

A.The standard error is too high for this model to be of any predictive use.

B.The 95 percent confidence interval for the coefficient of Femlab is -4.29 to -0.28.

C.Significant correlation exists between Femlab and Cancer at α = .05.

D.The two-tailed p-value for Femlab will be less than .05.

69.A researcher's results are shown below using Femlab (labor force participation rate among females) to try to predict Cancer (death rate per 100,000 population due to cancer) in the 50 U.S. states.

Which statement is valid regarding the relationship between Femlab and Cancer?

A.A rise in female labor participation rate will cause the cancer rate to decrease within a state.

B.This model explains about 10 percent of the variation in state cancer rates.

C.At the .05 level of significance, there isn't enough evidence to say the two variables are related.

D.If your sister starts working, the cancer rate in your state will decline.


70. What is the R2 for this regression?

A..9018

B..0982

C..8395

D..1605

71.A news network stated that a study had found a positive correlation between the number of children a worker has and his or her earnings last year. You may conclude that:

A.people should have more children so they can get better jobs.

B.the data are erroneous because the correlation should be negative.

C.causation is in serious doubt.

D.statisticians have small families.

72.William used a sample of 68 large U.S. cities to estimate the relationship between Crime (annual property crimes per 100,000 persons) and Income (median annual income per capita, in dollars). His estimated regression equation was Crime = 428 + 0.050 Income. We can conclude that:

A.the slope is small so Income has no effect on Crime.

B.crime seems to create additional income in a city.

C.wealthy individuals tend to commit more crimes, on average.

D.the intercept is irrelevant since zero median income is impossible in a large city.

73.Mary used a sample of 68 large U.S. cities to estimate the relationship between Crime (annual property crimes per 100,000 persons) and Income (median annual income per capita, in dollars). Her estimated regression equation was Crime = 428 + 0.050 Income. If Income decreases by 1000, we would expect that Crime will:

A.increase by 428.

B.decrease by 50.

C.increase by 500.

D.remain unchanged.

74.Amelia used a random sample of 100 accounts receivable to estimate the relationship between Days (number of days from billing to receipt of payment) and Size (size of balance due in dollars). Her estimated regression equation was Days = 22 + 0.0047 Size with a correlation coefficient of .300. From this information we can conclude that:

A.9 percent of the variation in Days is explained by Size.

B.autocorrelation is likely to be a problem.

C.the relationship between Days and Size is significant.

D.larger accounts usually take less time to pay.

75.Prediction intervals for Y are narrowest when:

A.the mean of X is near the mean of Y.

B.the value of X is near the mean of X.

C.the mean of X differs greatly from the mean of Y.

D.the mean of X is small.

76.If n = 15 and r = .4296, the corresponding t-statistic to test for zero correlation is:

A.1.715.

B.7.862.

C.2.048.

D.impossible to determine without α.

77. Using a two-tailed test at α = .05 for n = 30, we would reject the hypothesis of zero correlation if the absolute value of r exceeds:

A..2992.

B..3609.

C..0250.

D..2004.

78.The ordinary least squares (OLS) method of estimation will minimize:

A.neither the slope nor the intercept.

B.only the slope.

C.only the intercept.

D.both the slope and intercept.

79.A standardized residual ei = -2.205 indicates:

A.a rather poor prediction.

B.an extreme outlier in the residuals.

C.an observation with high leverage.

D.a likely data entry error.

80.In a simple regression, which would suggest a significant relationship between X and Y?

A.Large p-value for the estimated slope

B.Large t statistic for the slope

C.Large p-value for the F statistic

D.Small t-statistic for the slope

81.Which is indicative of an inverse relationship between X and Y?

A.A negative F statistic

B.A negative p-value for the correlation coefficient

C.A negative correlation coefficient

D.Either a negative F statistic or a negative p-value

82.Which is not correct regarding the estimated slope of the OLS regression line?

A.It is divided by its standard error to obtain its t statistic.

B.It shows the change in Y for a unit change in X.

C.It is chosen so as to minimize the sum of squared errors.

D.It may be regarded as zero if its p-value is less than α.

83.Simple regression analysis means that:

A.the data are presented in a simple and clear way.

B.we have only a few observations.

C.there are only two independent variables.

D.we have only one explanatory variable.

84.The sample coefficient of correlation does not have which property?

A.It can range from -1.00 up to +1.00.

B.It is also sometimes called Pearson's r.

C.It is tested for significance using a t-test.

D.It assumes that Y is the dependent variable.

85.When comparing the 90 percent prediction and confidence intervals for a given regression analysis:

A.the prediction interval is narrower than the confidence interval.

B.the prediction interval is wider than the confidence interval.

C.there is no difference between the size of the prediction and confidence intervals.

D.no generalization is possible about their comparative width.

86.Which is not true of the coefficient of determination?

A.It is the square of the coefficient of correlation.

B.It is negative when there is an inverse relationship between X and Y.

C.It reports the percent of the variation in Y explained by X.

D.It is calculated using sums of squares (e.g., SSR, SSE, SST).

87.If the fitted regression is Y = 3.5 + 2.1X (R2 = .25, n = 25), it is incorrect to conclude that:

A.Y increases 2.1 percent for a 1 percent increase in X.

B.the estimated regression line crosses the Y axis at 3.5.

C.the sample correlation coefficient must be positive.

D.the value of the sample correlation coefficient is 0.50.

88.In a simple regression Y = b0 + b1X where Y = number of robberies in a city (thousands of robberies), X = size of the police force in a city (thousands of police), and n = 45 randomly chosen large U.S. cities in 2008, we would be least likely to see which problem?

A.Autocorrelated residuals (because this is time-series data)

B.Heteroscedastic residuals (because we are using totals uncorrected for city size)

C.Nonnormal residuals (because a few larger cities may skew the residuals)

D.High leverage for some observations (because some cities may be huge)

89.When homoscedasticity exists, we expect that a plot of the residuals versus the fitted Y:

A.will form approximately a straight line.

B.crosses the centerline too many times.

C.will yield a Durbin-Watson statistic near 2.

D.will show no pattern at all.

90.Which statement is not correct?

A.Spurious correlation can often be reduced by expressing X and Y in per capita terms.

B.Autocorrelation is mainly a concern if we are using time-series data.

C.Heteroscedastic residuals will have roughly the same variance for any value of X.

D.Standardized residuals make it easy to identify outliers or instances of poor fit.

91.In a simple bivariate regression with 25 observations, which statement is most nearly correct?

A.A non-standardized residual whose value is ei = 4.22 would be considered an outlier.

B.A leverage statistic of 0.16 or more would indicate high leverage.

C.Standardizing the residuals will eliminate any heteroscedasticity.

D.Non-normal residuals imply biased coefficient estimates, a major problem.

92.A regression was estimated using these variables: Y = annual value of reported bank robbery losses in all U.S. banks ($millions), X = annual value of currency held by all U.S. banks ($millions), n = 100 years (1912 through 2011). We would not anticipate:

A.autocorrelated residuals due to time-series data.

B.heteroscedastic residuals due to the wide variation in data magnitudes.

C.nonnormal residuals due to skewed data as bank size increases over time.

D.a negative slope because banks hold less currency when they are robbed.

93.A fitted regression for an exam in Prof. Hardtack's class showed Score = 20 + 7 Study, where Score is the student's exam score and Study is the student's study hours. The regression yielded R2 = 0.50 and SE = 8. Bob studied 9 hours. The quick 95 percent prediction interval for Bob's grade is approximately:

A.69 to 97.

B.75 to 91.

C.67 to 99.

D.76 to 90.

94.Which is not an assumption of least squares regression?

A.Normal X values

B.Non-autocorrelated errors

C.Homoscedastic errors

D.Normal errors

95.In a simple bivariate regression with 60 observations there will be _____ residuals.

A.60

B.59

C.58

D.57

96.Which is correct to find the value of the coefficient of determination (R2)?

A.SSR/SSE

B.SSR/SST

C.1 - SST/SSE

97. The critical value for a two-tailed test of H0: β1 = 0 at α = .05 in a simple regression with 22 observations is:

A.1.725

B.2.086

C.2.528

D.1.960

98.In a sample of size n = 23, a sample correlation of r = .400 provides sufficient evidence to conclude that the population correlation coefficient exceeds zero in a right-tailed test at:

A.α = .01 but not α = .05.

B.α = .05 but not α = .01.

C.both α = .05 and α = .01.

D.neither α = .05 nor α = .01.

99.In a sample of n = 23, the Student's t test statistic for a correlation of r = .500 would be:

A.2.559.

B.2.819.

C.2.646.

D.can't say without knowing α.

100. In a sample of n = 23, the critical value of the correlation coefficient for a two-tailed test at α = .05 is:

A..524

B..412

C..500

D..497

101. In a sample of n = 23, the critical value of Student's t for a two-tailed test of significance for a simple bivariate regression at α = .05 is:

A.2.229

B.2.819

C.2.646

D.2.080

102.In a sample of n = 40, a sample correlation of r = .400 provides sufficient evidence to conclude that the population correlation coefficient exceeds zero in a right-tailed test at:

A.α = .025 but not α = .05.

B.α = .05 but not α = .025.

C.both α = .025 and α = .05.



A.2.110

B.1.645

C.1.852

D.can't say without knowing if it's a two-tailed or one-tailed test.


A..587

B..412

C..444

D..497


A.2.060

B.2.052

C.2.898

D.2.074

106.In a sample of size n = 36, a sample correlation of r = -.450 provides sufficient evidence to conclude that the population correlation coefficient differs significantly from zero in a two-tailed test at:

A.α = .01

B.α = .05

C.both α = .01 and α = .05.


107.In a sample of n = 36, the Student's t test statistic for a correlation of r = -.450 would be:

A.-2.110.

B.-2.938.

C.-2.030.



A..329

B..387

C..423

D..497

109. In a sample of n = 36, the critical value of Student's t for a two-tailed test of significance of the slope for a simple regression at α = .05 is:

A.2.938

B.2.724

C.2.032

D.2.074

110.A local trucking company fitted a regression to relate the travel time (days) of its shipments as a function of the distance traveled (miles). The fitted regression is Time = -7.126 + 0.0214 Distance. If Distance increases by 50 miles, the expected Time would increase by:

A.1.07 days

B.7.13 days

C.2.14 days

D.1.73 days

111.A local trucking company fitted a regression to relate the cost of its shipments as a function of the distance traveled. The Excel fitted regression is shown.

Based on this estimated relationship, when distance increases by 50 miles, the expected shipping cost would increase by:

A.$286.

B.$143.

C.$104.

D.$301.

112.If SSR is 2592 and SSE is 608, then:

A.the slope is likely to be insignificant.

B.the coefficient of determination is .81.

C.the SST would be smaller than SSR.

D.the standard error would be large.

113.Find the sample correlation coefficient for the following data.

A..8911

B..9124

C..9822

D..9556

114. Find the slope of the simple regression ŷ = b0 + b1x.

A.1.833

B.3.294

C.0.762

D.-2.228


A..7291

B..8736

C..9118

D..9563


A.2.595

B.1.109

C.-2.221

D.1.884

117.A researcher's results are shown below using n = 25 observations.

The 95 percent confidence interval for the slope is:

A.[ -3.282, -1.284].

B.[ -4.349, -0.217].

C.[1.118, 5.026].

D.[ -0.998, +0.998].

118.A researcher's regression results are shown below using n = 8 observations.


A.[1.333, 2.284].

B.[1.602, 2.064].

C.[1.268, 2.398].

D.[1.118, 2.449].

119.Bob thinks there is something wrong with Excel's fitted regression. What do you say?

A.The estimated equation is obviously incorrect.

B.The R2 looks a little high but otherwise it looks OK.

C.Bob needs to increase his sample size to decide.

D.The relationship is linear, so the equation is credible.

Short Answer Questions

120. Pedro became interested in vehicle fuel efficiency, so he performed a simple regression using 93 cars to estimate the model CityMPG = β0 + β1 Weight, where Weight is the weight of the vehicle in pounds. His results are shown below. Write a brief analysis of these results, using what you have learned in this chapter. Is the intercept meaningful in this regression? Make a prediction of CityMPG when Weight = 3000, and also when Weight = 4000. Do these predictions seem believable? If you could make a car 1000 pounds lighter, what change would you predict in its CityMPG?

121. Mary noticed that old coins are smoother and more worn. She weighed 31 nickels and recorded their age, and then performed a simple regression to estimate the model Weight = β0 + β1 Age, where Weight is the weight of the coin in grams and Age is the age of the coin in years. Her results are shown below. Write a brief analysis of these results, using what you have learned in this chapter. Make a prediction of Weight when Age = 10, and also when Age = 20. What does this tell you? Is the intercept meaningful in this regression?

Chapter 12 Simple Regression Answer Key

True / False Questions

1. A scatter plot is used to visualize the association (or lack of association) between two quantitative variables. TRUE. The scatter plot shows association between two quantitative variables.

AACSB: Analytic
Blooms: Remember
Difficulty: 1 Easy
Learning Objective: 12-01 Calculate and test a correlation coefficient for significance.
Topic: Visual Displays and Correlation Analysis

2. The correlation coefficient r measures the strength of the linear relationship between two variables. TRUE. A correlation coefficient measures linearity between two variables.

3. Pearson's correlation coefficient (r) requires that both variables be interval or ratio data. TRUE. Correlation assumes quantitative data with at least interval measurements.

4. If r = .55 and n = 16, then the correlation is significant at α = .05 in a two-tailed test. TRUE. $t_{calc} = r\sqrt{(n-2)/(1-r^2)} = (.55)\sqrt{(16-2)/(1-.55^2)} = 2.464 > t_{.025} = 2.145$ for d.f. = 16 - 2 = 14.

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 12-01 Calculate and test a correlation coefficient for significance.
Topic: Visual Displays and Correlation Analysis
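
The arithmetic in item 4 can be checked with a short sketch. The function name corr_t_stat is illustrative and not part of the original material, and the critical value 2.145 is copied from the answer above rather than computed from a t distribution.

```python
import math

def corr_t_stat(r: float, n: int) -> float:
    """t statistic for testing zero correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Values from item 4: r = .55, n = 16
t_calc = corr_t_stat(0.55, 16)

# Two-tailed critical value for alpha = .05 and d.f. = 14 (taken from the answer key)
t_crit = 2.145

# Reject the hypothesis of zero correlation when |t_calc| exceeds the critical value
significant = abs(t_calc) > t_crit
print(round(t_calc, 3), significant)
```

The same function applies to any item that gives r and n; only the critical value changes with the degrees of freedom and the chosen α.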

5. A sample correlation r = .40 indicates a stronger linear relationship than r = -.60. FALSE. The sign only indicates the direction, not the strength, of the linear relationship.

AACSB: Analytic
Blooms: Understand
Difficulty: 1 Easy
Learning Objective: 12-01 Calculate and test a correlation coefficient for significance.
Topic: Visual Displays and Correlation Analysis

6. A common source of spurious correlation between X and Y is when a third unspecified variable Z affects both X and Y. TRUE. Both X and Y could be influenced by Z.

AACSB: Analytic
Blooms: Understand
Difficulty: 1 Easy
Learning Objective: 12-01 Calculate and test a correlation coefficient for significance.
Topic: Visual Displays and Correlation Analysis

7. The correlation coefficient r always has the same sign as b1 in Y = b0 + b1X. TRUE. Because b1 = r(sY/sX) and both standard deviations are positive, b1 and r share the same sign. The t-test for the slope in simple regression also gives the same result as the t-test for r.

AACSB: Analytic
Blooms: Understand
Difficulty: 1 Easy
Learning Objective: 12-04 Fit a simple regression on an Excel scatter plot.
Topic: Regression Terminology

8. The fitted intercept in a regression has little meaning if no data values near X = 0 have been observed. TRUE. Predicting Y for X = 0 makes little sense if the observed data have no values near X = 0.

AACSB: Analytic
Blooms: Understand
Difficulty: 1 Easy
Learning Objective: 12-02 Interpret the slope and intercept of a regression equation.
Topic: Simple Regression

9. The least squares regression line is obtained when the sum of the squared residuals is minimized. TRUE. The OLS method minimizes the sum of squared residuals.

AACSB: Analytic
Blooms: Remember
Difficulty: 1 Easy
Learning Objective: 12-04 Fit a simple regression on an Excel scatter plot.
Topic: Ordinary Least Squares Formulas

10. In a simple regression, if the coefficient for X is positive and significantly different from zero, then an increase in X is associated with an increase in the mean (i.e., the expected value) of Y. TRUE. The conditional mean of Y depends on X (unless the slope is effectively zero).

AACSB: Analytic
Blooms: Understand
Difficulty: 1 Easy
Learning Objective: 12-02 Interpret the slope and intercept of a regression equation.
Topic: Simple Regression

11. In least-squares regression, the residuals e1, e2, . . . , en will always have a zero mean. TRUE. The residuals must sum to zero if the OLS method is used, so their mean is zero.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 12-02 Interpret the slope and intercept of a regression equation.
Topic: Ordinary Least Squares Formulas

12. When using the least squares method, the column of residuals always sums to zero. TRUE. The residuals must sum to zero if the OLS method is used.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 12-02 Interpret the slope and intercept of a regression equation.
Topic: Ordinary Least Squares Formulas
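
The zero-sum property in items 11 and 12 can be illustrated with a minimal sketch, assuming a regression fitted with an intercept. The data values and the helper name ols_fit are made up for illustration and do not come from the test bank. The sketch also checks the related fact that the fitted line passes through the point of means.

```python
def ols_fit(x, y):
    """Return (b0, b1) from the usual least squares formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up illustrative data (not from the test bank)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_fit(x, y)

# Residual = observed Y minus fitted Y
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
residual_sum = sum(residuals)

# Difference between the fitted value at x-bar and y-bar
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)
gap_at_means = (b0 + b1 * x_bar) - y_bar

# Both quantities are zero up to floating-point rounding
print(abs(residual_sum) < 1e-9, abs(gap_at_means) < 1e-9)
```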

13. In the model Sales = 268 + 7.37 Ads, an additional $1 spent on ads will increase sales by 7.37 percent. FALSE. The slope coefficient is in the same units as Y (dollars, not percent, in this case).

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 12-02 Interpret the slope and intercept of a regression equation.
Topic: Simple Regression

14. If R2 = .36 in the model Sales = 268 + 7.37 Ads with n = 50, the two-tailed test for correlation at α = .05 would say that there is a significant correlation between Sales and Ads. TRUE. $t_{calc} = r\sqrt{(n-2)/(1-r^2)} = (.60)\sqrt{(50-2)/(1-.36)} = 5.196 > t_{.025} = 2.011$ for d.f. = 50 - 2 = 48.

AACSB: Analytic
Blooms: Apply
Difficulty: 3 Hard
Learning Objective: 12-01 Calculate and test a correlation coefficient for significance.
Topic: Visual Displays and Correlation Analysis

15. If R2 = .36 in the model Sales = 268 + 7.37 Ads, then Ads explains 36 percent of the variation in Sales. TRUE. We can interpret R2 as the fraction of variation in Y explained by X (expressed as a percent).

AACSB: Analytic
Blooms: Apply
Difficulty: 1 Easy
Learning Objective: 12-08 Interpret the standard error; R2; ANOVA table; and F test.
Topic: Ordinary Least Squares Formulas

16. The ordinary least squares regression line always passes through the point (x̄, ȳ). TRUE. The OLS formulas require the line to pass through this point.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 12-02 Interpret the slope and intercept of a regression equation.
Topic: Regression Terminology

17. The least squares regression line gives unbiased estimates of β0 and β1. TRUE. The expected values of the OLS estimators b0 and b1 are the true parameters β0 and β1.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 12-04 Fit a simple regression on an Excel scatter plot.
Topic: Ordinary Least Squares Formulas

18. In a simple regression, the correlation coefficient r is the square root of R2. TRUE. In fact, we could use the notation r2 instead of R2 when talking about simple regression (r takes the sign of the slope).

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 12-08 Interpret the standard error; R2; ANOVA table; and F test.
Topic: Ordinary Least Squares Formulas

19. If SSR is 1800 and SSE is 200, then R2 is .90. TRUE. R2 = SSR/SST = SSR/(SSR + SSE) = 1800/(1800 + 200) = .90.

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 12-08 Interpret the standard error; R2; ANOVA table; and F test.
Topic: Tests for Significance
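
The calculation in item 19 can be written as a one-line function. This is only a sketch: the name r_squared is illustrative, and it assumes SST = SSR + SSE, which holds for least squares with an intercept.

```python
def r_squared(ssr: float, sse: float) -> float:
    """Coefficient of determination from the regression and error sums of squares."""
    sst = ssr + sse  # total sum of squares
    return ssr / sst

# Values from item 19: SSR = 1800, SSE = 200
r2_item_19 = r_squared(1800, 200)
print(round(r2_item_19, 2))
```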

20. The width of a prediction interval for an individual value of Y is less than the standard error se. FALSE. The formula for the interval width multiplies the standard error by an expression > 1.

AACSB: Analytic
Blooms: Understand
Difficulty: 2 Medium
Learning Objective: 12-09 Distinguish between confidence and prediction intervals for Y.
Topic: Confidence and Prediction Intervals for Y
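
To see why the expression in item 20 exceeds 1, the sketch below computes the square-root factors that scale se in the standard simple-regression interval formulas. It assumes the usual textbook forms, sqrt(1/n + (x - x̄)²/Sxx) for the mean of Y and sqrt(1 + 1/n + (x - x̄)²/Sxx) for an individual Y, and it omits the t critical value that multiplies both. The function name and the numbers passed to it are hypothetical.

```python
import math

def interval_multipliers(n, x, x_bar, sxx):
    """Factors that scale se (before the t value) in the CI and PI half-widths."""
    leverage_term = 1 / n + (x - x_bar) ** 2 / sxx
    ci_mult = math.sqrt(leverage_term)      # confidence interval for the mean of Y
    pi_mult = math.sqrt(1 + leverage_term)  # prediction interval for an individual Y
    return ci_mult, pi_mult

# Hypothetical inputs: n = 25 observations, x-bar = 10, Sxx = 200
ci_at_mean, pi_at_mean = interval_multipliers(n=25, x=10.0, x_bar=10.0, sxx=200.0)
ci_far, pi_far = interval_multipliers(n=25, x=20.0, x_bar=10.0, sxx=200.0)

print(pi_at_mean > 1, pi_at_mean > ci_at_mean, pi_far > pi_at_mean)
```

The extra 1 under the square root is what keeps the prediction factor above 1 and above the confidence factor at every X, and the (x - x̄)² term is what widens both intervals as X moves away from its mean.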

21. If SSE is near zero in a regression, the statistician will conclude that the proposed model probably has too poor a fit to be useful. FALSE. SSE is the sum of the squared residuals, which would be smaller if the fit is good.

22. For a regression with 200 observations, we expect that about 10 residuals will exceed two standard errors. TRUE. If the residuals are normal, about 95 percent (roughly 190 of 200) will lie within 2se, leaving about 10 outside.

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 12-11 Identify unusual residuals and high-leverage observations.
Topic: Unusual Observations
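
The count in item 22 follows from the normal distribution, as this small sketch shows. It assumes normally distributed residuals, and the function name expected_beyond is illustrative.

```python
import math

def expected_beyond(n: int, k: float) -> float:
    """Expected number of n normal residuals lying more than k standard errors from zero."""
    p_within = math.erf(k / math.sqrt(2))  # P(|Z| <= k) for a standard normal Z
    return n * (1 - p_within)

# Item 22: 200 observations, two standard errors
expected = expected_beyond(200, 2.0)
print(round(expected, 1))
```

The result is a little under 10, which is why the item says "about 10".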

23.Confidence intervals for predicted Y are less precise when the residuals are very small.FALSESmall residuals imply a small standard error and thus a narrower prediction interval.


24.Cause-and-effect direction between X and Y may be determined by running the regression twice and seeing whether Y = 0 + 1X or X = 1 + 0Y has the larger R2.FALSECause and effect cannot be determined in the context of simple regression models.

AACSB: AnalyticBlooms: UnderstandDifficulty: 2 MediumLearning Objective: 12-02 Interpret the slope and intercept of a regression equation.Topic: Simple Regression

25.The ordinary least squares method of estimation minimizes the estimated slope and intercept.FALSEOLS minimizes the sum of squared residuals.


26.Using the ordinary least squares method ensures that the residuals will be normally distributed.FALSEOLS produces unbiased estimates but cannot ensure normality of the residuals.

AACSB: AnalyticBlooms: RememberDifficulty: 2 MediumLearning Objective: 12-10 Test residuals for violations of regression assumptions.Topic: Residual Tests

27.If you have a strong outlier in the residuals, it may represent a different causal system.TRUEOutliers might come from a different population or causal system.

AACSB: AnalyticBlooms: UnderstandDifficulty: 2 MediumLearning Objective: 12-11 Identify unusual residuals and high-leverage observations.Topic: Other Regression Problems (Optional)

28.A negative correlation between two variables X and Y usually yields a negative p-value for r.FALSEThe p-value cannot be negative.

AACSB: AnalyticBlooms: UnderstandDifficulty: 2 MediumLearning Objective: 12-06 Test hypotheses about the slope and intercept by using t tests.Topic: Visual Displays and Correlation Analysis

29.In linear regression between two variables, a significant relationship exists when the p-value of the t test statistic for the slope is greater than α.FALSEReject β1 = 0 if the p-value is less than α.

AACSB: AnalyticBlooms: ApplyDifficulty: 1 EasyLearning Objective: 12-06 Test hypotheses about the slope and intercept by using t tests.Topic: Tests for Significance

30.The larger the absolute value of the t statistic of the slope in a simple linear regression, the stronger the linear relationship exists between X and Y.TRUEThe correlation coefficient measures linearity, regardless of its sign (+ or -).


31.In simple linear regression, the coefficient of determination (R2) is estimated from sums of squares in the ANOVA table.TRUER2 = SSR/SST or R2 = 1 - SSE/SST.


32.In simple linear regression, the p-value of the slope will always equal the p-value of the F statistic.TRUEThis is true only if there is one predictor (but is no longer true in multiple regression).

AACSB: AnalyticBlooms: RememberDifficulty: 2 MediumLearning Objective: 12-08 Interpret the standard error; R2; ANOVA table; and F test.Topic: Analysis of Variance: Overall Fit

33.An observation with high leverage will have a large residual (usually an outlier).FALSEThe concepts are distinct (a high-leverage point could have a good fit).

AACSB: AnalyticBlooms: UnderstandDifficulty: 2 MediumLearning Objective: 12-11 Identify unusual residuals and high-leverage observations.Topic: Unusual Observations

34.A prediction interval for Y is narrower than the corresponding confidence interval for the mean of Y.FALSEPredicting an individual case requires a wider confidence interval than predicting the mean.

AACSB: AnalyticBlooms: RememberDifficulty: 2 MediumLearning Objective: 12-09 Distinguish between confidence and prediction intervals for Y.Topic: Confidence and Prediction Intervals for Y

35.When X is farther from its mean, the prediction interval and confidence interval for Y become wider.TRUEThe width increases when X differs from its mean (review the formula).


36.The total sum of squares (SST) will never exceed the regression sum of squares (SSR).FALSEThe identity is SSR + SSE = SST.

AACSB: AnalyticBlooms: RememberDifficulty: 1 EasyLearning Objective: 12-08 Interpret the standard error; R2; ANOVA table; and F test.Topic: Analysis of Variance: Overall Fit

37."High leverage" would refer to a data point that is poorly predicted by the model (large residual).FALSEA high-leverage observation may have a good fit (only its X value determines its leverage).

AACSB: AnalyticBlooms: RememberDifficulty: 2 MediumLearning Objective: 12-11 Identify unusual residuals and high-leverage observations.Topic: Unusual Observations

38.The studentized residuals permit us to detect cases where the regression predicts poorly.TRUEStudentized residuals resemble a t-distribution. A large studentized t-value (e.g., t < -2.00 or t > +2.00) would imply a poor fit.


39.A poor prediction (large residual) indicates an observation with high leverage.FALSEHigh leverage indicates an unusually large or small X value (not a poor prediction). A high-leverage observation may have a good fit or a poor fit. Only its X value determines its leverage.


40.Ill-conditioned refers to a variable whose units are too large or too small (e.g., $2,434,567).TRUEIn Excel, a symptom of poor data conditioning is exponential notation (e.g., 4.3E + 06).

AACSB: AnalyticBlooms: RememberDifficulty: 2 MediumLearning Objective: 12-07 Perform regression analysis with Excel or other software.Topic: Other Regression Problems (Optional)

41.A simple decimal transformation (e.g., from 18,291 to 18.291) often improves data conditioning.TRUEKeeping data magnitudes similar helps avoid exponential notation (e.g., 4.3E + 06).

AACSB: AnalyticBlooms: UnderstandDifficulty: 2 MediumLearning Objective: 12-07 Perform regression analysis with Excel or other software.Topic: Other Regression Problems (Optional)

42.Two-tailed t-tests are often used because any predictor that differs significantly from zero in a two-tailed test will also be significantly greater than zero or less than zero in a one-tailed test at the same α.TRUETrue because the critical t is larger in the two-tailed test (the default in most software).

AACSB: AnalyticBlooms: ApplyDifficulty: 2 MediumLearning Objective: 12-06 Test hypotheses about the slope and intercept by using t tests.Topic: Tests for Significance

43.A predictor that is significant in a one-tailed t-test will also be significant in a two-tailed test at the same level of significance α.FALSEFalse because the critical t would be larger in a two-tailed test.

AACSB: AnalyticBlooms: RememberDifficulty: 2 MediumLearning Objective: 12-06 Test hypotheses about the slope and intercept by using t tests.Topic: Tests for Significance

44.Omission of a relevant predictor is a common source of model misspecification.TRUEIn a multivariate world, simple regression may be inadequate.

AACSB: AnalyticBlooms: RememberDifficulty: 2 MediumLearning Objective: 12-07 Perform regression analysis with Excel or other software.Topic: Other Regression Problems (Optional)

45.The regression line must pass through the origin.FALSEThe OLS intercept estimate does not, in general, equal zero. We might be unable to reject a zero intercept in a t-test, but the fitted intercept is rarely zero.


46.Outliers can be detected by examining the standardized residuals.TRUEA poor fit implies a large t-value (e.g., larger than 3 would be an outlier).

AACSB: AnalyticBlooms: RememberDifficulty: 1 EasyLearning Objective: 12-11 Identify unusual residuals and high-leverage observations.Topic: Unusual Observations

47.In a simple regression, there are n - 2 degrees of freedom associated with the error sum of squares (SSE).TRUEThis is true in simple regression because we estimate two parameters (β0 and β1).


48.In a simple regression, the F statistic is calculated by taking the ratio of MSR to the MSE.TRUEBy definition, Fcalc = MSR/MSE (obtained from the ANOVA table).

AACSB: AnalyticBlooms: UnderstandDifficulty: 2 MediumLearning Objective: 12-08 Interpret the standard error; R2; ANOVA table; and F test.Topic: Analysis of Variance: Overall Fit

49.The coefficient of determination is the percentage of the total variation in the response variable Y that is explained by the predictor X.TRUER2 = SSR/SST or R2 = 1 - SSE/SST lies between 0 and 1 and often is expressed as a percent.

AACSB: AnalyticBlooms: UnderstandDifficulty: 2 MediumLearning Objective: 12-08 Interpret the standard error; R2; ANOVA table; and F test.Topic: Ordinary Least Squares Formulas

50.A different confidence interval exists for the mean value of Y for each different value of X.TRUEBoth the interval width and also E(Y|X) = β0 + β1X depend on the value of X.


51.A prediction interval for Y is widest when X is near its mean.FALSEThe prediction interval is narrowest when X is near its mean. Review the formula, which has a term (xi - x̄)² in the numerator. The minimum would be when xi = x̄.
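
The effect of the (xi - x̄)² term can be seen by computing interval half-widths at two X values. The sketch below assumes the usual textbook formulas for the confidence interval (mean of Y) and the prediction interval (individual Y). The data, se, and the function name are hypothetical and chosen only for illustration.

```python
import math

def interval_half_widths(x0, x, se, t_crit):
    """Return (confidence, prediction) half-widths for Y at X = x0."""
    n = len(x)
    x_bar = sum(x) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    distance_term = (x0 - x_bar) ** 2 / ss_xx  # grows as x0 moves away from the mean
    ci = t_crit * se * math.sqrt(1 / n + distance_term)      # mean of Y
    pi = t_crit * se * math.sqrt(1 + 1 / n + distance_term)  # individual Y
    return ci, pi

# Hypothetical inputs, not taken from the text.
x = list(range(1, 11))  # X values 1..10, so the mean of X is 5.5
se = 2.0                # assumed standard error of the regression
t_crit = 2.306          # t.025 for d.f. = 10 - 2 = 8

ci_mean, pi_mean = interval_half_widths(5.5, x, se, t_crit)
ci_far, pi_far = interval_half_widths(10.0, x, se, t_crit)

print(f"At the mean of X:  CI half-width {ci_mean:.2f}, PI half-width {pi_mean:.2f}")
print(f"Far from the mean: CI half-width {ci_far:.2f}, PI half-width {pi_far:.2f}")
```

Both intervals are narrowest at the mean of X, and the prediction interval is wider than the confidence interval at every X because of the extra 1 under the square root.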


52.In a two-tailed test for correlation at α = .05, a sample correlation coefficient r = 0.42 with n = 25 is significantly different than zero.TRUEtcalc = r[(n - 2)/(1 - r²)]^(1/2) = (.42)[(25 - 2)/(1 - .42²)]^(1/2) = 2.219 > t.025 = 2.069 for d.f. = 25 - 2 = 23.
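
The test statistic in question 52 can be reproduced with a few lines of code. This is a minimal sketch: the function name is hypothetical, and the critical value is copied from the answer rather than computed.

```python
import math

def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

t_calc = correlation_t(0.42, 25)
t_crit = 2.069  # two-tailed critical value for d.f. = 23 at alpha = .05 (from the answer)

significant = abs(t_calc) > t_crit
print(f"t_calc = {t_calc:.4f}, significant: {significant}")
```

The same function applies to the other correlation questions in this chapter; only r, n, and the critical value change.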


53.In correlation analysis, neither X nor Y is designated as the independent variable.TRUEIn correlation analysis, X and Y covary without designating either as "independent."


54.A negative value for the correlation coefficient (r) implies a negative value for the slope (b1).TRUEThe sign of r must be the same as the sign of the slope estimate b1.


55.High leverage for an observation indicates that X is far from its mean.TRUEBy definition, observations have higher leverage when X is far from its mean.

AACSB: AnalyticBlooms: RememberDifficulty: 2 MediumLearning Objective: 12-11 Identify unusual residuals and high-leverage observations.Topic: Unusual Observations

56.Autocorrelated errors are not usually a concern for regression models using cross-sectional data.TRUEWe more often expect autocorrelated residuals in time series data.

AACSB: AnalyticBlooms: RememberDifficulty: 1 EasyLearning Objective: 12-10 Test residuals for violations of regression assumptions.Topic: Residual Tests

57.There are usually several possible regression lines that will minimize the sum of squared errors.FALSEThe OLS solution for the estimators b0 and b1 is unique.


58.When the errors in a regression model are not independent, the regression model is said to have autocorrelation.TRUEFor example, in first-order autocorrelation εt depends on εt-1.

AACSB: AnalyticBlooms: RememberDifficulty: 1 EasyLearning Objective: 12-10 Test residuals for violations of regression assumptions.Topic: Residual Tests

59.In a simple bivariate regression, Fcalc = tcalc².TRUEThis statement is true only in a simple regression (one predictor).
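
The identity in question 59 can be verified on any small data set. The sketch below fits a simple regression with the usual OLS formulas and compares the squared t statistic of the slope with the F statistic. The data are hypothetical and serve only to illustrate the identity.

```python
import math

# Hypothetical data, used only to illustrate the identity.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# OLS estimates of the slope and intercept.
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error sum of squares
ssr = sum((fi - y_bar) ** 2 for fi in fitted)           # regression sum of squares

mse = sse / (n - 2)                   # error mean square, d.f. = n - 2
msr = ssr / 1                         # regression mean square, d.f. = 1
t_calc = b1 / math.sqrt(mse / ss_xx)  # slope divided by its standard error
f_calc = msr / mse

print(f"t^2 = {t_calc ** 2:.4f}, F = {f_calc:.4f}")
```

Because the two statistics coincide, the t-test and F-test give the same p-value when there is a single predictor.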


60.Correlation analysis primarily measures the degree of the linear relationship between X and Y.TRUEThe sign of r indicates the direction and its magnitude indicates the degree of linearity.

AACSB: AnalyticBlooms: RememberDifficulty: 2 MediumLearning Objective: 12-01 Calculate and test a correlation coefficient for significance.Topic: Visual Displays and Correlation Analysis

Multiple Choice Questions61.The variable used to predict another variable is called the:

A.response variable.

B.regression variable.

C.independent variable.

D.dependent variable.

We might also call the independent variable a predictor of Y.

AACSB: AnalyticBlooms: RememberDifficulty: 1 EasyLearning Objective: 12-02 Interpret the slope and intercept of a regression equation.Topic: Simple Regression

62.The standard error of the regression:

A.is based on squared deviations from the regression line.

B.may assume negative values if b1 < 0.

C.is in squared units of the dependent variable.

D.may be cut in half to get an approximate 95 percent prediction interval.

In a simple regression, the standard error is the square root of the sum of the squared residuals divided by (n - 2).


63.A local trucking company fitted a regression to relate the travel time (days) of its shipments as a function of the distance traveled (miles). The fitted regression is Time = -7.126 + 0.0214 Distance, based on a sample of 20 shipments. The estimated standard error of the slope is 0.0053. Find the value of tcalc to test for zero slope.

A.2.46

B.5.02

C.4.04

D.3.15

tcalc = b1/sb1 = (0.0214)/(0.0053) = 4.038.


64.A local trucking company fitted a regression to relate the travel time (days) of its shipments as a function of the distance traveled (miles). The fitted regression is Time = -7.126 + .0214 Distance, based on a sample of 20 shipments. The estimated standard error of the slope is 0.0053. Find the critical value for a right-tailed test to see if the slope is positive, using α = .05.

A.2.101

B.2.552

C.1.960

D.1.734

For d.f. = n - 2 = 20 - 2 = 18, Appendix D gives t.05 = 1.734.


65.If the attendance at a baseball game is to be predicted by the equation Attendance = 16,500 - 75 Temperature, what would be the predicted attendance if Temperature is 90 degrees?

A.6,750

B.9,750

C.12,250

D.10,020

The predicted Attendance is 16,500 - 75(90) = 9,750.

AACSB: AnalyticBlooms: ApplyDifficulty: 1 EasyLearning Objective: 12-02 Interpret the slope and intercept of a regression equation.Topic: Simple Regression

66.A hypothesis test is conducted at the 5 percent level of significance to test whether the population correlation is zero. If the sample consists of 25 observations and the correlation coefficient is 0.60, then the computed test statistic would be:

A.2.071.

B.1.960.

C.3.597.

D.1.645.

tcalc = r[(n - 2)/(1 - r²)]^(1/2) = (.60)[(25 - 2)/(1 - .60²)]^(1/2) = 3.597. Comment: Requires formula handout or memorizing the formula.


67.Which of the following is not a characteristic of the F-test in a simple regression?

A.It is a test for overall fit of the model.

B.The test statistic can never be negative.

C.It requires a table with numerator and denominator degrees of freedom.

D.The F-test gives a different p-value than the t-test.

Fcalc is the ratio of two variances (mean squares) that measures overall fit. The test statistic cannot be negative because the variances are non-negative. In a simple regression, the F-test always agrees with the t-test.


68.A researcher's Excel results are shown below using Femlab (labor force participation rate among females) to try to predict Cancer (death rate per 100,000 population due to cancer) in the 50 U.S. states.

Which of the following statements is not true?

A.The standard error is too high for this model to be of any predictive use.

B.The 95 percent confidence interval for the coefficient of Femlab is -4.29 to -0.28.

C.Significant correlation exists between Femlab and Cancer at α = .05.

D.The two-tailed p-value for Femlab will be less than .05.

The magnitude of se depends on Y (and, in this case, the tcalc indicates significance).



69.Which statement is valid regarding the relationship between Femlab and Cancer?

A.A rise in female labor participation rate will cause the cancer rate to decrease within a state.

B.This model explains about 10 percent of the variation in state cancer rates.

C.At the .05 level of significance, there isn't enough evidence to say the two variables are related.

D.If your sister starts working, the cancer rate in your state will decline.

It is customary to express the R2 as a percent (here, the tcalc indicates significance).

AACSB: AnalyticBlooms: ApplyDifficulty: 2 MediumLearning Objective: 12-08 Interpret the standard error; R2; ANOVA table; and F test.Topic: Ordinary Least Squares Formulas


70.What is the R2 for this regression?

A..9018

B..0982

C..8395

D..1605

R2 = SSR/SST = (5,377.836)/(54,745.225) = .0982.


71.A news network stated that a study had found a positive correlation between the number of children a worker has and his or her earnings last year. You may conclude that:

A.people should have more children so they can get better jobs.

B.the data are erroneous because the correlation should be negative.

C.causation is in serious doubt.

D.statisticians have small families.

There is no a priori basis for expecting causation.

AACSB: AnalyticBlooms: ApplyDifficulty: 1 EasyLearning Objective: 12-01 Calculate and test a correlation coefficient for significance.Topic: Visual Displays and Correlation Analysis

72.William used a sample of 68 large U.S. cities to estimate the relationship between Crime (annual property crimes per 100,000 persons) and Income (median annual income per capita, in dollars). His estimated regression equation was Crime = 428 + 0.050 Income. We can conclude that:

A.the slope is small so Income has no effect on Crime.

B.crime seems to create additional income in a city.

C.wealthy individuals tend to commit more crimes, on average.

D.the intercept is irrelevant since zero median income is impossible in a large city.

Zero median income makes no sense (significance cannot be assessed from given facts).

AACSB: AnalyticBlooms: ApplyDifficulty: 2 MediumLearning Objective: 12-06 Test hypotheses about the slope and intercept by using t tests.Topic: Simple Regression

73.Mary used a sample of 68 large U.S. cities to estimate the relationship between Crime (annual property crimes per 100,000 persons) and Income (median annual income per capita, in dollars). Her estimated regression equation was Crime = 428 + 0.050 Income. If Income decreases by 1000, we would expect that Crime will:

A.increase by 428.

B.decrease by 50.

C.increase by 500.

D.remain unchanged.

The constant has no effect, so ΔCrime = 0.050 ΔIncome = 0.050(-1000) = -50.


74.Amelia used a random sample of 100 accounts receivable to estimate the relationship between Days (number of days from billing to receipt of payment) and Size (size of balance due in dollars). Her estimated regression equation was Days = 22 + 0.0047 Size with a correlation coefficient of .300. From this information we can conclude that:

A.9 percent of the variation in Days is explained by Size.

B.autocorrelation is likely to be a problem.

C.the relationship between Days and Size is significant.

D.larger accounts usually take less time to pay.

R2 = (.300)² = .09. These are not time-series data, so there is no reason to expect autocorrelation. We cannot judge significance without more information.

AACSB: AnalyticBlooms: ApplyDifficulty: 3 HardLearning Objective: 12-08 Interpret the standard error; R2; ANOVA table; and F test.Topic: Ordinary Least Squares Formulas

75.Prediction intervals for Y are narrowest when:

A.the mean of X is near the mean of Y.

B.the value of X is near the mean of X.

C.the mean of X differs greatly from the mean of Y.

D.the mean of X is small.

Review the formula, which has (xi - x̄)² in the numerator. The minimum would be when xi = x̄.


76.If n = 15 and r = .4296, the corresponding t-statistic to test for zero correlation is:

A.1.715.

B.7.862.

C.2.048.

D.impossible to determine without α.

tcalc = r[(n - 2)/(1 - r²)]^(1/2) = (.4296)[(15 - 2)/(1 - .4296²)]^(1/2) = 1.715.


77.Using a two-tailed test at α = .05 for n = 30, we would reject the hypothesis of zero correlation if the absolute value of r exceeds:

A..2992.

B..3609.

C..0250.

D..2004.

Use rcrit = t.025/(t.025² + n - 2)^(1/2) = (2.048)/(2.048² + 30 - 2)^(1/2) = .3609 for d.f. = 30 - 2 = 28.
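
The critical-r calculation in question 77 can be sketched as follows. The function name is hypothetical, and the critical t is copied from the answer rather than looked up in code. The round trip at the end confirms that the formula is the inverse of the t statistic for a correlation.

```python
import math

def critical_r(t_crit, n):
    """Smallest |r| that is significant, given the critical t for d.f. = n - 2."""
    return t_crit / math.sqrt(t_crit ** 2 + n - 2)

n = 30
t_crit = 2.048  # t.025 for d.f. = 28, as quoted in the answer
r_crit = critical_r(t_crit, n)

# Round trip: converting r_crit back to a t statistic should recover t_crit.
t_back = r_crit * math.sqrt((n - 2) / (1 - r_crit ** 2))

print(f"r_crit = {r_crit:.4f}, t at r_crit = {t_back:.3f}")
```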


78.The ordinary least squares (OLS) method of estimation will minimize:

A.neither the slope nor the intercept.

B.only the slope.

C.only the intercept.

D.both the slope and intercept.

OLS method minimizes the sum of squared residuals.


79.A standardized residual ei = -2.205 indicates:

A.a rather poor prediction.

B.an extreme outlier in the residuals.

C.an observation with high leverage.

D.a likely data entry error.

This residual is beyond 2se but is not an outlier (and without xi we cannot assess leverage).

AACSB: AnalyticBlooms: ApplyDifficulty: 2 MediumLearning Objective: 12-11 Identify unusual residuals and high-leverage observations.Topic: Residual Tests

80.In a simple regression, which would suggest a significant relationship between X and Y?

A.Large p-value for the estimated slope

B.Large t statistic for the slope

C.Large p-value for the F statistic

D.Small t-statistic for the slope

The larger the tcalc the more we feel like rejecting H0: β1 = 0.


81.Which is indicative of an inverse relationship between X and Y?

A.A negative F statistic

B.A negative p-value for the correlation coefficient

C.A negative correlation coefficient

D.Either a negative F statistic or a negative p-value

Fcalc and the p-value cannot be negative.


82.Which is not correct regarding the estimated slope of the OLS regression line?

A.It is divided by its standard error to obtain its t statistic.

B.It shows the change in Y for a unit change in X.

C.It is chosen so as to minimize the sum of squared errors.

D.It may be regarded as zero if its p-value is less than α.

We would reject H0: β1 = 0 if its p-value is less than the level of significance.


83.Simple regression analysis means that:

A.the data are presented in a simple and clear way.

B.we have only a few observations.

C.there are only two independent variables.

D.we have only one explanatory variable.

Multiple regression has more than one independent variable (predictor).

AACSB: AnalyticBlooms: RememberDifficulty: 1 EasyLearning Objective: 12-02 Interpret the slope and intercept of a regression equation.Topic: Simple Regression

84.The sample coefficient of correlation does not have which property?

A.It can range from -1.00 up to +1.00.

B.It is also sometimes called Pearson's r.

C.It is tested for significance using a t-test.

D.It assumes that Y is the dependent variable.

Correlation analysis makes no assumption of causation or dependence.


85.When comparing the 90 percent prediction and confidence intervals for a given regression analysis:

A.the prediction interval is narrower than the confidence interval.

B.the prediction interval is wider than the confidence interval.

C.there is no difference between the size of the prediction and confidence intervals.

D.no generalization is possible about their comparative width.

Individual values of Y vary more than the mean of Y.

AACSB: AnalyticBlooms: RememberDifficulty: 1 EasyLearning Objective: 12-09 Distinguish between confidence and prediction intervals for Y.Topic: Confidence and Prediction Intervals for Y

86.Which is not true of the coefficient of determination?

A.It is the square of the coefficient of correlation.

B.It is negative when there is an inverse relationship between X and Y.

C.It reports the percent of the variation in Y explained by X.

D.It is calculated using sums of squares (e.g., SSR, SSE, SST).

R2 cannot be negative.


87.If the fitted regression is Y = 3.5 + 2.1X (R2 = .25, n = 25), it is incorrect to conclude that:

A.Y increases 2.1 percent for a 1 percent increase in X.

B.the estimated regression line crosses the Y axis at 3.5.

C.the sample correlation coefficient must be positive.

D.the value of the sample correlation coefficient is 0.50.

Units are not percent unless Y is already a percent.

AACSB: AnalyticBlooms: ApplyDifficulty: 2 MediumLearning Objective: 12-02 Interpret the slope and intercept of a regression equation.Topic: Simple Regression

88.In a simple regression Y = b0 + b1X where Y = number of robberies in a city (thousands of robberies), X = size of the police force in a city (thousands of police), and n = 45 randomly chosen large U.S. cities in 2008, we would be least likely to see which problem?

A.Autocorrelated residuals (because this is time-series data)

B.Heteroscedastic residuals (because we are using totals uncorrected for city size)

C.Nonnormal residuals (because a few larger cities may skew the residuals)

D.High leverage for some observations (because some cities may be huge)

It is not a time series, so autocorrelation would not be expected, but the "size effect" is likely to produce heteroscedasticity, nonnormality, and unusual leverage.

AACSB: AnalyticBlooms: ApplyDifficulty: 3 HardLearning Objective: 12-10 Test residuals for violations of regression assumptions.Topic: Residual Tests

89.When homoscedasticity exists, we expect that a plot of the residuals versus the fitted Y:

A.will form approximately a straight line.

B.crosses the centerline too many times.

C.will yield a Durbin-Watson statistic near 2.

D.will show no pattern at all.

Homoscedastic residuals exhibit no pattern (equal variance for all Y).

AACSB: AnalyticBlooms: UnderstandDifficulty: 2 MediumLearning Objective: 12-10 Test residuals for violations of regression assumptions.Topic: Residual Tests

90.Which statement is not correct?

A.Spurious correlation can often be reduced by expressing X and Y in per capita terms.

B.Autocorrelation is mainly a concern if we are using time-series data.

C.Heteroscedastic residuals will have roughly the same variance for any value of X.

D.Standardized residuals make it easy to identify outliers or instances of poor fit.

Heteroscedastic residuals exhibit different variance for different X or Y values.

AACSB: AnalyticBlooms: UnderstandDifficulty: 2 MediumLearning Objective: 12-10 Test residuals for violations of regression assumptions.Topic: Residual Tests

91.In a simple bivariate regression with 25 observations, which statement is most nearly correct?

A.A non-standardized residual whose value is ei = 4.22 would be considered an outlier.

B.A leverage statistic of 0.16 or more would indicate high leverage.

C.Standardizing the residuals will eliminate any heteroscedasticity.

D.Non-normal residuals imply biased coefficient estimates, a major problem.

For simple regression, the "high leverage criterion" is hi > 4/n = 4/25 = .16. We cannot judge a residual's magnitude without knowing the standard error se. Standardizing is only a scale shift so does not reduce heteroscedasticity. Non-normal errors do not bias the OLS estimates.

AACSB: AnalyticBlooms: ApplyDifficulty: 3 HardLearning Objective: 12-11 Identify unusual residuals and high-leverage observations.Topic: Unusual Observations
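
The 4/n leverage rule from question 91 can be illustrated with a short calculation. The sketch assumes the standard simple-regression leverage formula hi = 1/n + (xi - x̄)²/SSxx. The X values and the function name are hypothetical.

```python
def leverages(x):
    """Leverage h_i = 1/n + (x_i - x_bar)^2 / SS_xx for a simple regression."""
    n = len(x)
    x_bar = sum(x) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / ss_xx for xi in x]

# Hypothetical X values; the last one is far from the rest.
x = [10, 12, 11, 13, 12, 11, 10, 13, 12, 40]
h = leverages(x)

cutoff = 4 / len(x)  # the 4/n rule quoted in the answer
flagged = [i for i, hi in enumerate(h) if hi > cutoff]

print(f"cutoff = {cutoff:.2f}, flagged positions = {flagged}")
```

No Y values appear anywhere in the calculation, which is why a high-leverage observation may have either a good fit or a poor fit.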

92.A regression was estimated using these variables: Y = annual value of reported bank robbery losses in all U.S. banks ($millions), X = annual value of currency held by all U.S. banks ($millions), n = 100 years (1912 through 2011). We would not anticipate:

A.autocorrelated residuals due to time-series data.

B.heteroscedastic residuals due to the wide variation in data magnitudes.

C.nonnormal residuals due to skewed data as bank size increases over time.

D.a negative slope because banks hold less currency when they are robbed.

It is a time series, so autocorrelation would be expected, and the "size effect" is likely to produce heteroscedasticity and nonnormality, but growth in both X and Y would yield a positive slope.

AACSB: AnalyticBlooms: ApplyDifficulty: 3 HardLearning Objective: 12-10 Test residuals for violations of regression assumptions.Topic: Residual Tests

93.A fitted regression for an exam in Prof. Hardtack's class showed Score = 20 + 7 Study, where Score is the student's exam score and Study is the student's study hours. The regression yielded R2 = 0.50 and SE = 8. Bob studied 9 hours. The quick 95 percent prediction interval for Bob's grade is approximately:

A.69 to 97.

B.75 to 91.

C.67 to 99.

D.76 to 90.

The quick interval is ypredicted ± 2se or 83 ± (2)(8) or 83 ± 16.

AACSB: AnalyticBlooms: ApplyDifficulty: 2 MediumLearning Objective: 12-09 Distinguish between confidence and prediction intervals for Y.Topic: Confidence and Prediction Intervals for Y
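
The quick interval in question 93 is simple enough to script. This is only a sketch of the "predicted value plus or minus 2se" shortcut, not the exact prediction interval, and the function name is hypothetical.

```python
def quick_prediction_interval(b0, b1, x, se):
    """Approximate 95 percent prediction interval: y_hat +/- 2 * se."""
    y_hat = b0 + b1 * x
    return y_hat - 2 * se, y_hat + 2 * se

# Values from question 93: Score = 20 + 7 Study, SE = 8, Study = 9 hours.
low, high = quick_prediction_interval(20, 7, 9, 8)
print(low, high)  # prints 67 99
```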

94.Which is not an assumption of least squares regression?

A.Normal X values

B.Non-autocorrelated errors

C.Homoscedastic errors

D.Normal errors

The predictor X is not assumed to be a random variable at all.

AACSB: AnalyticBlooms: ApplyDifficulty: 2 MediumLearning Objective: 12-04 Fit a simple regression on an Excel scatter plot.Topic: Ordinary Least Squares Formulas

95.In a simple bivariate regression with 60 observations there will be _____ residuals.

A.60

B.59

C.58

D.57

There is one residual for every observation.

AACSB: AnalyticBlooms: ApplyDifficulty: 1 EasyLearning Objective: 12-03 Make a prediction for a given x value using a regression equation.Topic: Regression Terminology

96.Which is correct to find the value of the coefficient of determination (R2)?

A.SSR/SSE

B.SSR/SST

C.1 - SST/SSE

We use the ANOVA sums of squares to calculate R2.


97.The critical value for a two-tailed test of H0: β1 = 0 at α = .05 in a simple regression with 22 observations is:

A.1.725

B.2.086

C.2.528

D.1.960

From Appendix D, tcrit = 2.086 for d.f. = n - 2 = 22 - 2 = 20.


98.In a sample of size n = 23, a sample correlation of r = .400 provides sufficient evidence to conclude that the population correlation coefficient exceeds zero in a right-tailed test at:

A.α = .01 but not α = .05.

B.α = .05 but not α = .01.

C.both α = .05 and α = .01.


tcalc = r[(n - 2)/(1 - r²)]^(1/2) = (.40)[(23 - 2)/(1 - .40²)]^(1/2) = 2.000 > t.05 = 1.721 for d.f. = 23 - 2 = 21. However, the test would not be significant for t.01 = 2.518.



A.2.559.

B.2.819.

C.2.646.


tcalc = r[(n - 2)/(1 - r²)]^(1/2) = (.50)[(23 - 2)/(1 - .50²)]^(1/2) = 2.646.



A..524

B..412

C..500

D..497




A.2.229

B.2.819

C.2.646

D.2.080

From Appendix D, t.025 = 2.080 for d.f. = n - 2 = 23 - 2 = 21.


102.In a sample of n = 40, a sample correlation of r = .400 provides sufficient evidence to conclude that the population correlation coefficient exceeds zero in a right-tailed test at:

A.α = .025 but not α = .05.

B.α = .05 but not α = .025.

C.both α = .025 and α = .05.


tcalc = r[(n - 2)/(1 - r²)]^(1/2) = (.40)[(40 - 2)/(1 - .40²)]^(1/2) = 2.690 > t.025 = 2.024 for d.f. = 40 - 2 = 38. The test would also be significant a fortiori if we used t.05 = 1.686.



A.2.110

B.1.645

C.1.852

D.can't say without knowing if it's a two-tailed or one-tailed test.

tcalc = r[(n - 2)/(1 - r²)]^(1/2) = (.40)[(20 - 2)/(1 - .40²)]^(1/2) = 1.852.



A..587

B..412

C..444

D..497




A.2.060

B.2.052

C.2.898

D.2.074



106.In a sample of size n = 36, a sample correlation of r = -.450 provides sufficient evidence to conclude that the population correlation coefficient differs significantly from zero in a two-tailed test at:

A.α = .01

B.α = .05

C.both α = .01 and α = .05.


tcalc = r[(n - 2)/(1 - r²)]^(1/2) = (-.45)[(36 - 2)/(1 - (-.45)²)]^(1/2) = -2.938 < -t.005 = -2.728 for d.f. = 34. The test would also be significant a fortiori if we used -t.025 = -2.032.


107.In a sample of n = 36, the Student's t test statistic for a correlation of r = -.450 would be:

A.-2.110.

B.-2.938.

C.-2.030.


tcalc = r[(n - 2)/(1 - r²)]^(1/2) = (-.45)[(36 - 2)/(1 - (-.45)²)]^(1/2) = -2.938.



A..329

B..387

C..423

D..497



109.In a sample of n = 36, the critical value of Student's t for a two-tailed test of significance of the slope for a simple regression at α = .05 is:

A.2.938

B.2.724

C.2.032

D.2.074



110.A local trucking company fitted a regression to relate the travel time (days) of its shipments as a function of the distance traveled (miles). The fitted regression is Time = -7.126 + 0.0214 Distance. If Distance increases by 50 miles, the expected Time would increase by:

A.1.07 days

B.7.13 days

C.2.14 days

D.1.73 days

50(0.0214) = 1.07.


111.A local trucking company fitted a regression to relate the cost of its shipments as a function of the distance traveled. The Excel fitted regression is shown.

Based on this estimated relationship, when distance increases by 50 miles, the expected shipping cost would increase by:

A.$286.

B.$143.

C.$104.

D.$301.

2.8666(50) = $143.33.


112.If SSR is 2592 and SSE is 608, then:

A.the slope is likely to be insignificant.

B.the coefficient of determination is .81.

C.the SST would be smaller than SSR.

D.the standard error would be large.

R2 = SSR/SST = SSR/(SSR + SSE) = 2592/(2592 + 608) = .81. SST cannot be smaller than SSR because SST = SSR + SSE. The significance and standard error cannot be judged without more information.
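
The arithmetic in question 112 follows directly from the ANOVA identity SST = SSR + SSE. A minimal sketch, using the sums of squares given in the question and illustrative variable names:

```python
ssr = 2592.0  # regression sum of squares (from the question)
sse = 608.0   # error sum of squares (from the question)

sst = ssr + sse              # total sum of squares
r_squared = ssr / sst        # R2 = SSR/SST
r_squared_alt = 1 - sse / sst  # equivalent form, R2 = 1 - SSE/SST

print(f"SST = {sst:.0f}, R2 = {r_squared:.2f}")
```

Both forms give the same value, and SST can never be smaller than SSR because SSE is a sum of squares and so cannot be negative.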



A..8911

B..9124

C..9822

D..9556

Use Excel =CORREL(XData, YData) to verify your calculation using the formula for r.



A.1.833

B.3.294

C.0.762

D.-2.228

Use Excel to verify your calculations using the formulas for b0 and b1.



A..7291

B..8736

C..9118

D..9563

Use Excel =CORREL(XData, YData) to verify your calculation using the formula for r.



A.2.595

B.1.109

C.-2.221

D.1.884

Use Excel to verify your calculations using the formulas for b0 and b1.


117.A researcher's results are shown below using n = 25 observations.


A.[ -3.282, -1.284].

B.[ -4.349, -0.217].

C.[1.118, 5.026].

D.[ -0.998, +0.998].

For d.f. = n - 2 = 25 - 2 = 23, t.025 = 2.069, so -2.2834 ± (2.069)(0.99855).

AACSB: AnalyticBlooms: ApplyDifficulty: 2 MediumLearning Objective: 12-05 Calculate and interpret confidence intervals for regression coefficients.Topic: Tests for Significance
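
The interval in question 117 is the slope estimate plus or minus the critical t times the standard error of the slope. The sketch below uses the values quoted in the answer. The function name is hypothetical, and the critical value is supplied by hand rather than computed.

```python
def slope_confidence_interval(b1, se_b1, t_crit):
    """Confidence interval for the slope: b1 +/- t_crit * se_b1."""
    margin = t_crit * se_b1
    return b1 - margin, b1 + margin

# Values quoted in the answer to question 117 (d.f. = 23, t.025 = 2.069).
lower, upper = slope_confidence_interval(-2.2834, 0.99855, 2.069)
print(f"[{lower:.3f}, {upper:.3f}]")
```

The interval excludes zero, so the slope differs significantly from zero in a two-tailed test at the 5 percent level.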

118.A researcher's regression results are shown below using n = 8 observations.


A.[1.333, 2.284].

B.[1.602, 2.064].

C.[1.268, 2.398].

D.[1.118, 2.449].

For d.f. = n - 2 = 8 - 2 = 6, t.025 = 2.447, so 1.8333 ± (2.447)(0.2307).

AACSB: AnalyticBlooms: ApplyDifficulty: 2 MediumLearning Objective: 12-05 Calculate and interpret confidence intervals for regression coefficients.Topic: Tests for Significance

119.Bob thinks there is something wrong with Excel's fitted regression. What do you say?

A.The estimated equation is obviously incorrect.

B.The R2 looks a little high but otherwise it looks OK.

C.Bob needs to increase his sample size to decide.

D.The relationship is linear, so the equation is credible.

A visual estimate of the slope is Δy/Δx = (625 - 100)/(200 - 0) = 2.625, so the indicated slope of less than 1 must be wrong. In addition, the visual intercept is 100 (not 154.61), and the fit appears better than R2 = .2284.

AACSB: Analytic
Blooms: Apply
Difficulty: 3 Hard
Learning Objective: 12-04 Fit a simple regression on an Excel scatter plot.
Topic: Ordinary Least Squares Formulas
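The visual check used in question 119 amounts to a two-point slope estimate. The sketch below reproduces it in Python. The endpoint coordinates (0, 100) and (200, 625) are the ones read off the plot in the explanation, and the function name is an illustrative assumption.

```python
def two_point_slope(x1: float, y1: float, x2: float, y2: float) -> float:
    """Rise over run between two points read off a scatter plot."""
    return (y2 - y1) / (x2 - x1)


visual_slope = two_point_slope(0, 100, 200, 625)
visual_intercept = 100  # y value where the plotted trend meets x = 0
print(visual_slope)  # 2.625

# The printed equation shows a slope below 1, which the visual estimate contradicts.
print(visual_slope > 1)  # True
```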

Short Answer Questions

120.Pedro became interested in vehicle fuel efficiency, so he performed a simple regression using 93 cars to estimate the model CityMPG = β0 + β1 Weight, where Weight is the weight of the vehicle in pounds. His results are shown below. Write a brief analysis of these results, using what you have learned in this chapter. Is the intercept meaningful in this regression? Make a prediction of CityMPG when Weight = 3000, and also when Weight = 4000. Do these predictions seem believable? If you could make a car 1000 pounds lighter, what change would you predict in its CityMPG?

It is reasonable that a causal relationship might exist between a vehicle's weight and its MPG. We expect a negative slope (heavier vehicles would get lower MPG). The coefficient of Weight differs from zero at any common value of α (the p-value is less than .0001) and the F statistic is huge. The confidence interval for the coefficient of the predictor Weight does not include zero. The highly significant predictor Weight is consistent with the high coefficient of determination (R2 = .711), which says that well over half the variation in MPG is explained by Weight. If Weight = 3000, we predict MPG = 47.0484 - .0080 Weight = 47.0484 - .0080(3000) = 23.05 mpg. If Weight = 4000, we predict MPG = 47.0484 - .0080 Weight = 47.0484 - .0080(4000) = 15.05 mpg. Making a car 1000 pounds lighter would raise its predicted MPG by .0080(1000) = 8 mpg. The intercept is not meaningful since no vehicle has zero weight or a weight close to zero.

Feedback: It is reasonable to postulate that a causal relationship might exist between a vehicle's weight and its MPG. Our a priori expectation would be that the slope should be negative since we would expect that heavier vehicles would get lower MPG. The coefficient of Weight differs from zero at any common value of α (the p-value is less than .0001) and the F statistic is huge. The confidence interval for the coefficient of the predictor Weight does not include zero. The slope's sign is negative, as anticipated a priori. The highly significant predictor Weight is consistent with the high coefficient of determination (R2 = .711), which says that well over half the variation in MPG is explained by Weight. If Weight = 3000, we predict MPG = 47.0484 - .0080 Weight = 47.0484 - .0080(3000) = 23.05 mpg. When Weight = 4000, we would predict MPG = 47.0484 - .0080 Weight = 47.0484 - .0080(4000) = 15.05 mpg. Making a car 1000 pounds lighter would raise its predicted MPG by .0080(1000) = 8 mpg. The intercept is not meaningful since no vehicle has zero weight or any weight close to zero.

AACSB: Reflective Thinking
Blooms: Evaluate
Difficulty: 3 Hard
Learning Objective: 12-06 Test hypotheses about the slope and intercept by using t tests.
Topic: Tests for Significance
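The predictions in question 120 can be reproduced with a short Python sketch. The coefficients 47.0484 and -.0080 are the fitted values quoted in the answer. The constant and function names are illustrative assumptions.

```python
# Fitted coefficients quoted in the answer to question 120.
B0 = 47.0484   # intercept (not meaningful: no car weighs near 0 pounds)
B1 = -0.0080   # change in CityMPG per additional pound of weight


def predict_city_mpg(weight_lb: float) -> float:
    """Predicted CityMPG from the fitted line B0 + B1 * Weight."""
    return B0 + B1 * weight_lb


for weight in (3000, 4000):
    print(f"Weight = {weight}: {predict_city_mpg(weight):.2f} mpg")
# Weight = 3000: 23.05 mpg
# Weight = 4000: 15.05 mpg

# A car 1000 pounds lighter: the predicted change is -B1 * 1000.
gain = predict_city_mpg(3000) - predict_city_mpg(4000)
print(f"Predicted gain for 1000 fewer pounds: {gain:.1f} mpg")  # 8.0 mpg
```

The predicted gain is the same for any starting weight because the fitted model is linear. Whether that holds outside the range of weights in the sample is a separate question the regression cannot answer.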

121.Mary noticed that old coins are smoother and more worn. She weighed 31 nickels and recorded their age, and then performed a simple regression to estimate the model Weight = β0 + β1 Age, where Weight is the weight of the coin in grams and Age is the age of the coin in years. Her results are shown below. Write a brief analysis of these results, using what you have learned in this chapter. Make a prediction of Weight when Age = 10, and also when Age = 20. What does this tell you? Is the intercept meaningful in this regression?

It is reasonable to postulate a causal relationship between a coin's age and its weight (negative slope, since we would expect that coins will wear down with usage). The coefficient of Age differs from zero at any common value of α (the p-value is less than .0001) and the F test statistic is large. The confidence interval for the coefficient of Age does not include zero, and its sign is negative, as anticipated a priori. Despite the significant predictor Age, the coefficient of determination (R2 = .442) shows that less than half the variation in nickel weights is explained by Age. If Age = 10, we predict Weight = 5.0210 - .0040 Age = 5.0210 - .0040(10) = 4.981 gm. If Age = 20, we predict Weight = 5.0210 - .0040 Age = 5.0210 - .0040(20) = 4.941 gm. The intercept is meaningful if Age = 0 was in the sample data set (or at least some Age value near zero). The intercept is logically meaningful because Age = 0 is something we might observe (i.e., a newly minted nickel).

Feedback: It is reasonable to postulate that a causal relationship might exist between a coin's age and its weight. Our a priori expectation would be that the slope should be negative since we would expect that coins will wear down with usage. The coefficient of Age differs from zero at any common value of α (the p-value is less than .0001) and the F test statistic is quite large. The confidence interval for the coefficient of Age does not include zero, and its sign is negative, as anticipated a priori. Despite the highly significant predictor Age, the coefficient of determination (R2 = .442) shows that less than half the variation in nickel weights is explained by Age. Our predictions: If Age = 10, we would predict Weight = 5.0210 - .0040 Age = 5.0210 - .0040(10) = 4.981 gm. If Age = 20, we would predict Weight = 5.0210 - .0040 Age = 5.0210 - .0040(20) = 4.941 gm. The intercept is meaningful, assuming that Age = 0 years was included in the sample data set (or at least some Age value near zero). The intercept is logically meaningful a priori because Age = 0 is something we might easily observe (i.e., a newly minted nickel).

AACSB: Reflective Thinking
Blooms: Evaluate
Difficulty: 3 Hard
Learning Objective: 12-06 Test hypotheses about the slope and intercept by using t tests.
Topic: Tests for Significance
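For question 121, one way to see what the two predictions say is to compare them with the intercept, which here represents a newly minted coin. The sketch below uses the fitted coefficients quoted in the answer (5.0210 and -.0040). The names are illustrative assumptions.

```python
# Fitted coefficients quoted in the answer to question 121.
B0 = 5.0210    # predicted weight in grams of a newly minted nickel (Age = 0)
B1 = -0.0040   # predicted change in weight (grams) per year of age


def predict_weight_g(age_years: float) -> float:
    """Predicted nickel weight in grams from the fitted line B0 + B1 * Age."""
    return B0 + B1 * age_years


w10 = predict_weight_g(10)
w20 = predict_weight_g(20)
print(f"Age 10: {w10:.3f} gm, Age 20: {w20:.3f} gm")  # Age 10: 4.981 gm, Age 20: 4.941 gm

# Predicted wear over one decade, per the fitted slope.
print(f"Loss per decade: {w10 - w20:.3f} gm")  # Loss per decade: 0.040 gm
```

Under this fitted line the predicted loss is small relative to the coin's weight, which is in keeping with the modest R2 of .442: age matters, but other factors account for most of the variation.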
