Chapter 3 Multiple Linear Regression
Ray-Bing Chen
Institute of Statistics
National University of Kaohsiung
3.1 Multiple Regression Models
• Multiple regression model: involves more than one regressor variable.
• Example: the yield in pounds of conversion in a chemical process depends on the temperature x1 and the catalyst concentration x2:
y = β0 + β1x1 + β2x2 + ε
• E(y) = 50 + 10x1 + 7x2
• The response y may be related to k regressor or predictor variables (the multiple linear regression model):
y = β0 + β1x1 + β2x2 + … + βkxk + ε
• The parameter βj represents the expected change in the response y per unit change in xj when all of the remaining regressor variables are held constant.
• Multiple linear regression models are often used as empirical models or approximating functions (the true model is unknown).
• The cubic model:
y = β0 + β1x + β2x² + β3x³ + ε
• The model with interaction effects:
y = β0 + β1x1 + β2x2 + β12x1x2 + ε
• Any regression model that is linear in the parameters is a linear regression model, regardless of the shape of the surface that it generates.
• The second-order model with interaction:
y = β0 + β1x1 + β2x2 + β11x1² + β22x2² + β12x1x2 + ε
3.2 Estimation of the Model Parameters
3.2.1 Least-squares Estimation of the Regression Coefficients
• n observations (n > k)
• Assume:
– The error term ε satisfies E(ε) = 0 and Var(ε) = σ².
– The errors are uncorrelated.
– The regressor variables x1, …, xk are fixed.
• The sample regression model:
yi = β0 + β1xi1 + … + βkxik + εi, i = 1, …, n
• The least-squares function:
S(β0, β1, …, βk) = Σi εi² = Σi (yi − β0 − Σj βjxij)²
• The normal equations are obtained by setting the partial derivatives ∂S/∂βj to zero; the least-squares estimators β̂0, β̂1, …, β̂k are their solution.
• Matrix notation: y = Xβ + ε, where y is n × 1, X is n × p (p = k + 1), β is p × 1, and ε is n × 1.
• The least-squares function:
S(β) = (y − Xβ)'(y − Xβ)
Minimizing S(β) gives the least-squares estimator β̂ = (X'X)⁻¹X'y.
• The fitted model corresponding to the levels of the regressor variables x:
ŷ = Xβ̂ = X(X'X)⁻¹X'y = Hy
• The hat matrix H = X(X'X)⁻¹X' is symmetric and idempotent, i.e. H' = H and H² = H.
• H is an orthogonal projection matrix.
• Residuals: e = y − ŷ = (I − H)y
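These formulas are easy to verify numerically. The sketch below uses NumPy on simulated data (the data, seed, and dimensions are illustrative assumptions, not the chapter's delivery-time example):

```python
import numpy as np

# Simulated data for illustration only (not the book's data set)
rng = np.random.default_rng(0)
n, k = 20, 2
p = k + 1
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # n x p model matrix
y = X @ np.array([50.0, 10.0, 7.0]) + rng.normal(scale=2.0, size=n)

# Least-squares estimator: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Hat matrix H = X (X'X)^{-1} X', fitted values, residuals
H = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = H @ y
e = y - y_hat

# H is symmetric and idempotent, and the residuals satisfy X'e = 0
assert np.allclose(H, H.T)
assert np.allclose(H @ H, H)
assert np.allclose(X.T @ e, 0, atol=1e-7)
```

Solving the normal equations with `np.linalg.solve` rather than forming (X'X)⁻¹ explicitly is the numerically safer choice; the hat matrix is built here only to check its projection properties.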
• Example 3.1 The Delivery Time Data
– y: the delivery time
– x1: the number of cases of product stocked
– x2: the distance walked by the route driver
– Consider the model y = β0 + β1x1 + β2x2 + ε
3.2.2 A Geometrical Interpretation of Least Squares
• y = (y1, …, yn)' is the vector of observations.
• X contains p (p = k + 1) column vectors, each n × 1, i.e. X = (1, x1, …, xk).
• The column space of X is called the estimation space.
• Any point in the estimation space is of the form Xβ.
• Minimize the squared distance S(β) = (y − Xβ)'(y − Xβ).
• Normal equations: X'(y − Xβ̂) = 0, i.e. X'Xβ̂ = X'y
3.2.3 Properties of the Least-Squares Estimators
• Unbiased estimator:
E(β̂) = E((X'X)⁻¹X'y) = E((X'X)⁻¹X'(Xβ + ε)) = β
• Covariance matrix:
Cov(β̂) = σ²(X'X)⁻¹
• Let C = (X'X)⁻¹; then Var(β̂j) = σ²Cjj and Cov(β̂i, β̂j) = σ²Cij.
• The LSE is the best linear unbiased estimator.
• LSE = MLE under the normality assumption.
3.2.4 Estimation of σ²
• Residual sum of squares:
SSRes = e'e = (y − Xβ̂)'(y − Xβ̂)
= y'y − 2β̂'X'y + β̂'X'Xβ̂
= y'y − β̂'X'y
• The degrees of freedom: n − p.
• The unbiased estimator of σ² is the residual mean square:
σ̂² = MSRes = SSRes/(n − p)
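A quick numerical check of the SSRes identity above, again on simulated data (all values below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 30, 2
p = k + 1
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=1.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat

# SS_Res = e'e = y'y - beta_hat' X'y  (the two computations must agree)
ss_res = y @ y - beta_hat @ (X.T @ y)
assert np.isclose(ss_res, e @ e)

# Unbiased estimate of sigma^2: the residual mean square
ms_res = ss_res / (n - p)
```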
• Example 3.2 The Delivery Time Data
• Both estimates are in a sense correct, but they depend heavily on the choice of model.
• The model with the smaller residual mean square would be the better choice.
3.2.5 Inadequacy of Scatter Diagrams in Multiple Regression
• For the simple linear regression, the scatter diagram is an important tool in analyzing the relationship between y and x.
• However, it may not be as useful in multiple regression. Consider
– y = 8 − 5x1 + 12x2
– The y vs. x1 plot does not exhibit any apparent relationship between y and x1.
– The y vs. x2 plot indicates a linear relationship with slope 8.
• In this case, constructing scatter diagrams of y vs. xj (j = 1, 2, …, k) can be misleading.
• If there is only one dominant regressor (or a few), or if the regressors operate nearly independently, the matrix of scatterplots is most useful.
3.2.6 Maximum-Likelihood Estimation
• The model is y = Xβ + ε with ε ~ N(0, σ²I).
• The likelihood function and log-likelihood function:
L(β, σ²) = (2πσ²)^(−n/2) exp(−(y − Xβ)'(y − Xβ)/(2σ²))
ln L(β, σ²) = −(n/2)ln(2π) − (n/2)ln(σ²) − (y − Xβ)'(y − Xβ)/(2σ²)
• The MLE of β is the least-squares estimator β̂, and the MLE of σ² is
σ̃² = (y − Xβ̂)'(y − Xβ̂)/n
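The MLE of σ² divides by n rather than n − p. A small sketch (simulated data, illustrative seed and sizes) confirming that SSRes/n maximizes the profile log-likelihood at β̂:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([2.0, 3.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
ss_res = (y - X @ beta_hat) @ (y - X @ beta_hat)
sigma2_mle = ss_res / n   # MLE divides by n, not n - p

def log_lik(sigma2):
    # Profile log-likelihood with beta fixed at beta_hat
    return -0.5 * n * np.log(2 * np.pi * sigma2) - ss_res / (2 * sigma2)

# The MLE should beat every other candidate value on a grid around it
grid = np.linspace(0.5 * sigma2_mle, 2.0 * sigma2_mle, 201)
best = grid[np.argmax([log_lik(s) for s in grid])]
assert abs(best - sigma2_mle) <= grid[1] - grid[0]
```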
3.3 Hypothesis Testing in Multiple Linear Regression
• Questions:
– What is the overall adequacy of the model?
– Which specific regressors seem important?
• Assume the errors are independent and follow a normal distribution with mean 0 and variance σ².
3.3.1 Test for Significance of Regression
• Determine if there is a linear relationship between y and the regressors xj, j = 1, 2, …, k.
• The hypotheses are
H0: β1 = β2 = … = βk = 0
H1: βj ≠ 0 for at least one j
• ANOVA identity: SST = SSR + SSRes
• SSR/σ² ~ χ²(k), SSRes/σ² ~ χ²(n − k − 1), and SSR and SSRes are independent.
• The test statistic:
F0 = (SSR/k) / (SSRes/(n − k − 1)) = MSR/MSRes ~ F(k, n − k − 1) under H0
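The ANOVA partition and the F statistic can be computed directly. The sketch below uses simulated data (seed, coefficients, and sizes are assumptions, not the book's example); note that SSR is obtained two equivalent ways:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 25, 2
p = k + 1
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([10.0, 4.0, -3.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
ss_res = y @ y - beta_hat @ (X.T @ y)
ss_t = y @ y - n * y.mean() ** 2              # corrected total sum of squares
ss_r = ss_t - ss_res                           # via the partition SST = SSR + SSRes
ss_r_direct = beta_hat @ (X.T @ y) - n * y.mean() ** 2   # direct formula
assert np.isclose(ss_r, ss_r_direct)

# Overall F statistic for H0: beta_1 = ... = beta_k = 0
f0 = (ss_r / k) / (ss_res / (n - k - 1))
```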
• The expected mean squares are
E(MSRes) = σ²
E(MSR) = σ² + β*'Xc'Xcβ*/k
where β* = (β1, …, βk)' and Xc is the centered regressor matrix with elements xij − x̄j.
• Under H1, F0 follows a noncentral F distribution with k and n − k − 1 degrees of freedom and noncentrality parameter
λ = β*'Xc'Xcβ*/σ²
• ANOVA table:
Source of Variation   Sum of Squares   d.f.        Mean Square   F0
Regression            SSR              k           MSR           MSR/MSRes
Residual              SSRes            n − k − 1   MSRes
Total                 SST              n − 1
• Example 3.3 The Delivery Time Data
• R² and adjusted R²
– R² always increases when a regressor is added to the model, regardless of the value of the contribution of that variable.
– The adjusted R²:
R²adj = 1 − (SSRes/(n − p)) / (SST/(n − 1))
– The adjusted R² will only increase on adding a variable to the model if the addition of the variable reduces the residual mean square.
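Both points can be seen numerically. In the sketch below (simulated data; the noise regressor, seed, and sizes are assumptions) a pure-noise column is appended to a one-regressor model, and R² never decreases:

```python
import numpy as np

def fit_r2(X, y):
    """Return (R^2, adjusted R^2) for a least-squares fit of y on X."""
    n, p = X.shape
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    ss_res = y @ y - beta @ (X.T @ y)
    ss_t = y @ y - n * y.mean() ** 2
    r2 = 1 - ss_res / ss_t
    r2_adj = 1 - (ss_res / (n - p)) / (ss_t / (n - 1))
    return r2, r2_adj

rng = np.random.default_rng(4)
n = 30
x1 = rng.normal(size=n)
y = 2 + 3 * x1 + rng.normal(size=n)
X1 = np.column_stack([np.ones(n), x1])
X2 = np.column_stack([X1, rng.normal(size=n)])   # add a pure-noise regressor

r2_small, adj_small = fit_r2(X1, y)
r2_big, adj_big = fit_r2(X2, y)
# R^2 cannot decrease when a regressor is added
assert r2_big >= r2_small
```

Adjusted R² may go either way here, which is exactly the point: it rises only when the added variable reduces MSRes.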
3.3.2 Tests on Individual Regression Coefficients
• For an individual regression coefficient:
– H0: βj = 0 vs. H1: βj ≠ 0
– Let Cjj be the j-th diagonal element of (X'X)⁻¹. The test statistic:
t0 = β̂j / √(σ̂²Cjj) = β̂j / se(β̂j) ~ t(n − k − 1) under H0
– This is a partial or marginal test because the estimate of the regression coefficient depends on all of the other regressors in the model.
– This test measures the contribution of xj given the other regressors in the model.
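Computing the t statistics directly from (X'X)⁻¹, as a sketch on simulated data (coefficients and seed are assumptions; the second slope is set to zero so its t statistic should be small):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 25, 2
p = k + 1
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 5.0, 0.0]) + rng.normal(size=n)   # beta_2 = 0 by design

C = np.linalg.inv(X.T @ X)                  # C = (X'X)^{-1}
beta_hat = C @ X.T @ y
ss_res = y @ y - beta_hat @ (X.T @ y)
ms_res = ss_res / (n - p)

# t_0 for each coefficient: beta_hat_j / sqrt(sigma2_hat * C_jj)
se = np.sqrt(ms_res * np.diag(C))
t0 = beta_hat / se
```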
• Example 3.4 The Delivery Time Data
• Tests on a subset of regressors: partition the coefficient vector as β = (β1', β2')', where β2 contains the last r coefficients, so that the model becomes y = Xβ + ε = X1β1 + X2β2 + ε; test H0: β2 = 0 vs. H1: β2 ≠ 0.
• For the full model, the regression sum of squares is
SSR(β) = β̂'X'y (p degrees of freedom)
• Under the null hypothesis, the regression sum of squares for the reduced model y = X1β1 + ε is
SSR(β1) = β̂1'X1'y, where β̂1 = (X1'X1)⁻¹X1'y
• The reduced model has p − r degrees of freedom.
• The regression sum of squares due to β2 given that β1 is already in the model:
SSR(β2|β1) = SSR(β) − SSR(β1)
• This is called the extra sum of squares due to β2, with p − (p − r) = r degrees of freedom.
• The test statistic:
F0 = (SSR(β2|β1)/r) / MSRes ~ F(r, n − p) under H0
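The extra-sum-of-squares computation can be sketched as follows (simulated data; the coefficient values and seed are assumptions for illustration):

```python
import numpy as np

def ss_reg(X, y):
    """Regression sum of squares SSR(beta) = beta_hat' X'y."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    return beta @ (X.T @ y)

rng = np.random.default_rng(6)
n = 30
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + x1
x2 = rng.normal(size=n)
X = np.column_stack([X1, x2])                            # full model adds x2
y = X @ np.array([1.0, 2.0, 4.0]) + rng.normal(size=n)

p, r = X.shape[1], 1
ss_res_full = y @ y - ss_reg(X, y)

# Extra sum of squares due to beta_2 given beta_1
ss_extra = ss_reg(X, y) - ss_reg(X1, y)
f0 = (ss_extra / r) / (ss_res_full / (n - p))
assert ss_extra >= 0    # adding regressors can never reduce SSR
```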
• If β2 ≠ 0, F0 follows a noncentral F distribution with noncentrality parameter
λ = (1/σ²) β2'X2'[I − X1(X1'X1)⁻¹X1']X2β2
• Multicollinearity: when X2 lies nearly in the column space of X1, this test has almost no power!
• This test has maximal power when X1 and X2 are orthogonal to one another!
• Partial F test: given the regressors in X1, measure the contribution of the regressors in X2.
• Consider y = β0 + β1x1 + β2x2 + β3x3 + ε.
SSR(β1|β0, β2, β3), SSR(β2|β0, β1, β3), and SSR(β3|β0, β1, β2) are single-degree-of-freedom sums of squares.
• SSR(βj|β0, …, βj−1, βj+1, …, βk): the contribution of xj as if it were the last variable added to the model.
• This F test is equivalent to the t test.
• SST = SSR(β1, β2, β3|β0) + SSRes
• SSR(β1, β2, β3|β0) = SSR(β1|β0) + SSR(β2|β1, β0) + SSR(β3|β1, β2, β0)
• Example 3.5 Delivery Time Data
3.3.3 Special Case of Orthogonal Columns in X
• Model: y = Xβ + ε = X1β1 + X2β2 + ε
• Orthogonal columns: X1'X2 = 0
• The normal equations (X'X)β̂ = X'y then separate into blocks:

[ X1'X1     0    ] [ β̂1 ]   [ X1'y ]
[   0     X2'X2  ] [ β̂2 ] = [ X2'y ]

• Therefore
β̂1 = (X1'X1)⁻¹X1'y and β̂2 = (X2'X2)⁻¹X2'y
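The block decoupling above can be demonstrated numerically. In this sketch (simulated data; the construction of an orthogonal second block via projection is an illustrative assumption), the joint fit splits exactly into two separate fits:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])

# Build X2 orthogonal to X1 by projecting a random vector off X1's column space
z = rng.normal(size=n)
x2 = z - X1 @ np.linalg.solve(X1.T @ X1, X1.T @ z)
X2 = x2[:, None]
assert np.allclose(X1.T @ X2, 0)    # X1'X2 = 0

y = rng.normal(size=n)
X = np.hstack([X1, X2])

beta_joint = np.linalg.solve(X.T @ X, X.T @ y)
beta1 = np.linalg.solve(X1.T @ X1, X1.T @ y)
beta2 = np.linalg.solve(X2.T @ X2, X2.T @ y)

# With orthogonal blocks, fitting jointly equals fitting each block alone
assert np.allclose(beta_joint, np.concatenate([beta1, beta2]))
```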
3.3.4 Testing the General Linear Hypothesis
• Let T be an m × p matrix with rank(T) = r; test H0: Tβ = 0.
• Full model: y = Xβ + ε, with
SSRes(FM) = y'y − β̂'X'y (n − p degrees of freedom)
• Reduced model: y = Zγ + ε, where Z is an n × (p − r) matrix and γ is a (p − r) × 1 vector. Then
γ̂ = (Z'Z)⁻¹Z'y
SSRes(RM) = y'y − γ̂'Z'y (n − p + r degrees of freedom)
• The difference SSH = SSRes(RM) − SSRes(FM) has r degrees of freedom; SSH is called the sum of squares due to the hypothesis H0: Tβ = 0.
• The test statistic:
F0 = (SSH/r) / (SSRes(FM)/(n − p)) ~ F(r, n − p) under H0
• Another form:
F0 = [(Tβ̂)'[T(X'X)⁻¹T']⁻¹(Tβ̂)/r] / [SSRes(FM)/(n − p)]
• For H0: Tβ = c vs. H1: Tβ ≠ c,
F0 = [(Tβ̂ − c)'[T(X'X)⁻¹T']⁻¹(Tβ̂ − c)/r] / [SSRes(FM)/(n − p)] ~ F(r, n − p) under H0
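The two routes to F0 agree: the reduced-versus-full comparison and the quadratic form in Tβ̂ give the same statistic. A sketch on simulated data (the hypothesis H0: β1 = β2, seed, and sizes are assumptions chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, 2.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
ss_res_fm = y @ y - beta_hat @ (X.T @ y)

# H0: beta_1 = beta_2, i.e. T beta = 0 with T = [0, 1, -1] and r = 1
T = np.array([[0.0, 1.0, -1.0]])
r = 1

# Quadratic-form version of F0
middle = np.linalg.inv(T @ np.linalg.inv(X.T @ X) @ T.T)
v = T @ beta_hat
f_quad = float(v @ middle @ v) / r / (ss_res_fm / (n - p))

# Reduced-model version: under H0 the model is y = gamma_0 + gamma_1 (x1 + x2) + error
Z = np.column_stack([np.ones(n), X[:, 1] + X[:, 2]])
gamma_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)
ss_res_rm = y @ y - gamma_hat @ (Z.T @ y)
f_red = ((ss_res_rm - ss_res_fm) / r) / (ss_res_fm / (n - p))

assert np.isclose(f_quad, f_red)
```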