1
Chapter 13
Multiple Regression
2
Chapter Outline
Multiple Regression Model Least Squares Method Coefficient of Determination Model Assumptions Testing for Significance Estimation and Prediction Categorical Independent Variables
3
Multiple Regression Model
The equation that describes how the dependent variable y is related to the independent variables x1, x2, . . . , xp and an error term is:

y = β0 + β1x1 + β2x2 + . . . + βpxp + ε

where β0, β1, β2, . . . , βp are the parameters, and ε is a random variable called the error term.
4
Multiple Regression Equation
The equation that describes how the mean value of y is related to x1, x2, . . . , xp is the multiple regression equation:

E(y) = β0 + β1x1 + β2x2 + . . . + βpxp
5
Estimated Multiple Regression Equation
A simple random sample is used to compute sample statistics b0, b1, b2, . . . , bp that are used as the point estimates of the parameters β0, β1, β2, . . . , βp.
ŷ = b0 + b1x1 + b2x2 + . . . + bpxp
6
Estimation Process
Multiple Regression Model: y = β0 + β1x1 + β2x2 + . . . + βpxp + ε

Multiple Regression Equation: E(y) = β0 + β1x1 + β2x2 + . . . + βpxp

Unknown parameters are β0, β1, β2, . . . , βp

Sample Data:
x1  x2  . . .  xp  y
 .   .          .  .
 .   .          .  .

Estimated Multiple Regression Equation: ŷ = b0 + b1x1 + b2x2 + . . . + bpxp

The sample statistics b0, b1, b2, . . . , bp provide estimates of β0, β1, β2, . . . , βp.
7
Least Squares Method
Least Squares Criterion:

min Σ(yi − ŷi)²

Computation of Coefficient Values:

The formulas for the regression coefficients b0, b1, b2, . . . , bp involve the use of matrix algebra. We will rely on computer software packages to perform the calculations.
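As a sketch of what such a software package does, the least squares criterion can be solved directly in Python with NumPy. Using the 20-observation survey data shown in this chapter, `np.linalg.lstsq` should reproduce the coefficients b0 = 3.1739, b1 = 1.4039, b2 = 0.2509 reported in the Excel output later in the chapter:

```python
import numpy as np

# Employee salary survey data: years of experience, aptitude test score,
# and annual salary ($1000s) for the 20 sampled employees.
years = np.array([4, 7, 1, 5, 8, 10, 0, 1, 6, 6,
                  9, 2, 10, 5, 6, 8, 4, 6, 3, 3])
score = np.array([78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
                  88, 73, 75, 81, 74, 87, 79, 94, 70, 89])
salary = np.array([24.0, 43.0, 23.7, 34.3, 35.8, 38.0, 22.2, 23.1, 30.0, 33.0,
                   38.0, 26.6, 36.2, 31.6, 29.0, 34.0, 30.1, 33.9, 28.2, 30.0])

# Design matrix: a leading column of ones for the intercept, then x1, x2.
X = np.column_stack([np.ones(len(years)), years, score])

# Solve min ||y - Xb||^2 (the least squares criterion).
b, *_ = np.linalg.lstsq(X, salary, rcond=None)
print(np.round(b, 4))  # b0, b1, b2
```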
8
Multiple Regression
Example: Employee Salary Survey

A local firm collected data for a sample of 20 employees. A suggestion was made that regression analysis could be used to determine if salary was related to the years of experience and the score on the firm's aptitude test.

The gender, years of experience, score on the aptitude test, and corresponding annual salary ($1000s) for the sample of 20 employees are shown on the next slide.
9
Multiple Regression
Example: Employee Salary Survey (data)

Gender  Years of Experience  Score  Salary ($1,000)
F        4                    78    24.0
M        7                   100    43.0
F        1                    86    23.7
M        5                    82    34.3
M        8                    86    35.8
M       10                    84    38.0
F        0                    75    22.2
F        1                    80    23.1
F        6                    83    30.0
M        6                    91    33.0
M        9                    88    38.0
F        2                    73    26.6
M       10                    75    36.2
F        5                    81    31.6
F        6                    74    29.0
M        8                    87    34.0
F        4                    79    30.1
M        6                    94    33.9
F        3                    70    28.2
F        3                    89    30.0
10
Multiple Regression Model
Suppose we believe that salary (y) is related to the years of experience (x1) and the score on the aptitude test (x2) by the following regression model:

y = β0 + β1x1 + β2x2 + ε

where
y = annual salary ($1000s)
x1 = years of experience
x2 = score on aptitude test
11
Solving for the Estimates of β0, β1, β2
Input Data → Computer Package for Solving Multiple Regression Problems (Least Squares) → Output

Input Data:
x1   x2    y
 4   78   24.0
 7  100   43.0
 .    .     .
 3   89   30.0

Output:
b0, b1, b2, R², etc.
12
Solving for the Estimates of β0, β1, β2
Excel’s Regression Output – Parameter Estimates
                     Coefficients  Standard Error  t Stat  P-value
Intercept                  3.1739          6.1561  0.5156   0.6128
Years of Experience        1.4039          0.1986  7.0702   0.0000
Score                      0.2509          0.0774  3.2433   0.0048

Note: All the numbers are rounded to the 4th decimal place.
13
Estimated Regression Equation
SALARY = 3.174 + 1.404(YEARS) + 0.251(SCORE)
Note: Predicted salary will be in thousands of dollars.
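The estimated equation can be applied directly. As a minimal sketch (the helper name `predicted_salary` is mine, not from the slides), in Python:

```python
# Point prediction from the estimated equation:
# SALARY = 3.174 + 1.404(YEARS) + 0.251(SCORE), in $1000s.
def predicted_salary(years, score):
    return 3.174 + 1.404 * years + 0.251 * score

# e.g., an employee with 5 years of experience and an aptitude score of 80
print(predicted_salary(5, 80))  # → 30.274, i.e. about $30,274
```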
14
Interpreting the Coefficients
In multiple regression analysis, we interpret each
regression coefficient as follows: bi represents an estimate of the change in y corresponding to a 1-unit increase in xi when all other independent variables are held constant.
15
Interpreting the Coefficients
b1 = 1.404

Salary is expected to increase by $1,404 for each additional year of experience (when the variable score on aptitude test is held constant).
16
Interpreting the Coefficients
b2 = 0.251

Salary is expected to increase by $251 for each additional point scored on the aptitude test (when the variable years of experience is held constant).
17
Multiple Coefficient of Determination
Relationship Among SST, SSR, and SSE:

SST = SSR + SSE

Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)²

where:
SST = total sum of squares (i.e., total variability of y)
SSR = sum of squares due to regression (i.e., the variability of y that is explained by the regression)
SSE = sum of squares due to error (i.e., the variability of y that cannot be explained by the regression)
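The decomposition SST = SSR + SSE holds whenever a model with an intercept is fit by least squares. A quick numeric check on toy data (the numbers below are illustrative only):

```python
import numpy as np

# Toy data: a simple straight-line fit is enough to verify the identity.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Fit by least squares and compute fitted values.
X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

sst = np.sum((y - y.mean()) ** 2)       # total variability of y
ssr = np.sum((y_hat - y.mean()) ** 2)   # variability explained by regression
sse = np.sum((y - y_hat) ** 2)          # unexplained variability

assert np.isclose(sst, ssr + sse)       # SST = SSR + SSE
```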
18
Multiple Coefficient of Determination
Excel’s ANOVA Output
ANOVA
            df           SS        MS         F  Significance F
Regression   2  500.3285303  250.1643  42.76013     2.32774E-07
Residual    17  99.45696969   5.85041
Total       19     599.7855

(SSR = 500.3285, SST = 599.7855)
19
Multiple Coefficient of Determination
R² = SSR/SST = 500.3285/599.7855 = .83418

The regression relationship is strong. About 83.4% of the variability in the salaries of employees can be explained by the years of experience and the aptitude test score.
20
Adjusted Multiple Coefficient of Determination
Ra² = 1 − (1 − R²) (n − 1)/(n − p − 1)

Ra² = 1 − (1 − .834179) (20 − 1)/(20 − 2 − 1) = .814671

Note: p is the number of slope coefficients.
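The adjusted R² arithmetic above can be checked in a couple of lines:

```python
# Adjusted R^2 for the salary example: n = 20 observations,
# p = 2 slope coefficients, R^2 = .834179 from the ANOVA output.
n, p, r2 = 20, 2, 0.834179
r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(r2_adj, 6))  # → 0.814671
```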
21
Assumptions About the Error Term
y = β0 + β1x1 + β2x2 + . . . + βpxp + ε

1. The error ε is a random variable with a mean of zero.
2. The variance of ε, denoted by σ², is the same for all values of the independent variables.
3. The values of ε are independent.
4. The error ε is a normally distributed random variable reflecting the deviation between the y value and the expected value of y given by β0 + β1x1 + β2x2 + . . . + βpxp.
22
Testing for Significance
In simple linear regression, the F and t tests provide the same conclusion.
In multiple regression, the F and t tests have different purposes.
23
Testing for Significance
The F test is used to test the overall significance of the regression model.
The t test is used to test individual significance, i.e., whether each individual independent variable is significant.
24
Testing for Significance: F Test
Hypotheses:
H0: β1 = β2 = . . . = βp = 0
Ha: One or more of the parameters is not equal to zero.

Test Statistic:
F = MSR/MSE

Rejection Rule:
Reject H0 if p-value < α or if F > Fα, where Fα is based on an F distribution with p d.f. in the numerator and n − p − 1 d.f. in the denominator.
25
F Test: Employee Salary Survey
Hypotheses:
H0: β1 = β2 = 0
Ha: One or both of the parameters is not equal to zero.

Test Statistic:
F = MSR/MSE = 250.16/5.85 = 42.76

Rejection Rule:
For α = .05 and d.f. = (2, 17), F.05 = 3.59.
Reject H0 if p-value < .05 or F > 3.59.
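The F statistic, its p-value, and the critical value can be checked in Python (a sketch assuming SciPy is installed):

```python
from scipy import stats

# Overall-significance F test, using the sums of squares from the ANOVA output:
# SSR = 500.3285 with p = 2 d.f., SSE = 99.4570 with n - p - 1 = 17 d.f.
msr = 500.3285 / 2      # MSR = SSR / p
mse = 99.4570 / 17      # MSE = SSE / (n - p - 1)
F = msr / mse

p_value = stats.f.sf(F, 2, 17)          # upper-tail area of F(2, 17)
critical = stats.f.ppf(0.95, 2, 17)     # F.05 critical value

print(round(F, 2), round(critical, 2), p_value < 0.05)
```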
26
F Test: Employee Salary Survey
ANOVA
            df           SS        MS         F  Significance F (p-value)
Regression   2  500.3285303  250.1643  42.76013     2.32774E-07
Residual    17  99.45696969   5.85041
Total       19     599.7855

Conclusion: p-value = 2.33E-07 < .05, so we can reject H0. (Also, F = 42.76 > 3.59.)
27
Testing for Significance: t Test
Hypotheses:
H0: βi = 0
Ha: βi ≠ 0

Test Statistic:
t = bi / s_bi

Rejection Rule:
Reject H0 if p-value < α, or if t < −tα/2 or t > tα/2, where tα/2 is based on a t distribution with n − p − 1 degrees of freedom.
28
t Test: Employee Salary Survey
Hypotheses:
H0: βi = 0
Ha: βi ≠ 0

Test Statistics:
t = b1/s_b1 = 1.4039/0.1986 = 7.07
t = b2/s_b2 = 0.25089/0.07735 = 3.24

Rejection Rule:
For α = .05 and d.f. = 17, t.025 = 2.11.
Reject H0 if p-value < .05, or if t < −2.11 or t > 2.11.
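The same ratios and the critical value can be verified in Python (a sketch assuming SciPy is installed):

```python
from scipy import stats

# t statistics for the individual coefficients, from the Excel estimates.
t_years = 1.4039 / 0.1986     # b1 / s_b1
t_score = 0.25089 / 0.07735   # b2 / s_b2

# Two-tailed critical value for alpha = .05 with n - p - 1 = 17 d.f.
t_crit = stats.t.ppf(0.975, df=17)

print(round(t_years, 2), round(t_score, 2), round(t_crit, 2))
```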
29
t Test: Employee Salary Survey
                     Coefficients  Standard Error  t Stat  P-value
Intercept                  3.1739          6.1561  0.5156   0.6128
Years of Experience        1.4039          0.1986  7.0702   0.0000
Score                      0.2509          0.0774  3.2433   0.0048

Conclusions: Reject both H0: β1 = 0 and H0: β2 = 0. Both independent variables are significant.
30
Testing for Significance: Multicollinearity

The term multicollinearity refers to the correlation among the independent variables.
When the independent variables are highly correlated (say, |r| > .7), it is not possible to determine the separate effect of any particular independent variable on the dependent variable.
31
Using the Estimated Regression Equation for Estimation and Prediction
The procedures for estimating the mean value of y and predicting an individual value of y in multiple regression are similar to those in simple regression.
We substitute the given values of x1, x2, . . . , xp into the estimated regression equation and use the corresponding value of ŷ as the point estimate.
32
Categorical Independent Variables
In many situations we must work with categorical independent variables such as gender (male, female), method of payment (cash, check, credit card), etc.
For example, xi might represent gender, where xi = 0 indicates male and xi = 1 indicates female.
In this case, xi is called a dummy or indicator variable.
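Dummy coding is mechanical; a minimal sketch (note the salary example that follows uses the opposite convention, 1 = male and 0 = female; either choice works and only flips the sign of the dummy's coefficient):

```python
# Dummy (indicator) coding for a two-level categorical variable,
# following this slide's convention: 0 = male, 1 = female.
genders = ["M", "F", "F", "M"]
dummy = [1 if g == "F" else 0 for g in genders]
print(dummy)  # → [0, 1, 1, 0]
```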
33
Categorical Independent Variables
Example: Employee Salary Survey

As an extension of the problem involving the employee salary survey, suppose that management wants to find out if annual salary is related to employees' gender.

The years of experience, the score on the aptitude test, the employees' gender, and the annual salary ($1000s) for each of the 20 sampled employees are shown on the next slide.
34
Estimated Regression Equation
ŷ = b0 + b1x1 + b2x2 + b3x3

where:
y = annual salary ($1000s)
x1 = years of experience
x2 = score on aptitude test
x3 = 0 if an employee is female; 1 if an employee is male

x3 is a dummy variable.
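Because the dummy enters the equation additively, its coefficient b3 simply shifts the intercept: two employees with identical experience and score differ by b3 in predicted salary. A sketch using the coefficients from the Excel output later in this chapter (the helper name `predict` is mine):

```python
# Estimated equation with the gender dummy (x3 = 1 male, 0 female):
# y-hat = 7.9448 + 1.1476(years) + 0.1969(score) + 2.2804(x3), in $1000s.
def predict(years, score, male):
    return 7.9448 + 1.1476 * years + 0.1969 * score + 2.2804 * (1 if male else 0)

# Same experience and score, different gender: the gap is exactly b3.
gap = predict(5, 80, male=True) - predict(5, 80, male=False)
print(round(gap, 4))  # → 2.2804, i.e. about $2,280
```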
35
Categorical Independent Variables
Years of Experience  Score  Gender  Salary ($1,000)
 4                    78    0       24.0
 7                   100    1       43.0
 1                    86    0       23.7
 5                    82    1       34.3
 8                    86    1       35.8
10                    84    1       38.0
 0                    75    0       22.2
 1                    80    0       23.1
 6                    83    0       30.0
 6                    91    1       33.0
 9                    88    1       38.0
 2                    73    0       26.6
10                    75    1       36.2
 5                    81    0       31.6
 6                    74    0       29.0
 8                    87    1       34.0
 4                    79    0       30.1
 6                    94    1       33.9
 3                    70    0       28.2
 3                    89    0       30.0
36
Categorical Independent Variables
Excel’s Regression Statistics
Regression Statistics
Multiple R          0.92021524
R Square            0.84679609
Adjusted R Square   0.81807035
Standard Error      2.3964751
Observations        20
37
Categorical Independent Variables
Excel’s ANOVA Output
ANOVA
            df           SS        MS         F  Significance F
Regression   3  507.8960134  169.2987  29.47866     9.41675E-07
Residual    16  91.88948657  5.743093
Total       19     599.7855
38
Categorical Independent Variables
Excel’s Regression Equation Output
                     Coefficients  Standard Error    t Stat   P-value
Intercept              7.94484872     7.380797058  1.076422  0.297702
Years of Experience    1.14758173      0.29760152  3.856102  0.001397
Score                  0.19693699     0.089903726  2.190532  0.043640
Gender                 2.28042384     1.986610668  1.147897  0.267885  ← not significant