Multiple Linear Regression - Chapter 3.3
3.3 Hypothesis Testing in Multiple Linear Regression
• Questions:
  – What is the overall adequacy of the model?
  – Which specific regressors seem important?
• Assume the errors are independent and follow a normal distribution with mean 0 and variance σ².
3.3.1 Test for Significance of Regression
• Determine if there is a linear relationship between y and xj, j = 1, 2, ..., k.
• The hypotheses are
  H0: β1 = β2 = ... = βk = 0
  H1: βj ≠ 0 for at least one j
• ANOVA: SST = SSR + SSRes
• Under H0, SSR/σ² ~ χ²(k), SSRes/σ² ~ χ²(n-k-1), and SSR and SSRes are independent.
• The test statistic:

  F0 = (SSR/k) / (SSRes/(n-k-1)) = MSR/MSRes ~ F(k, n-k-1)
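As a concrete sketch of the overall significance test (synthetic data; variable names and the data set are illustrative, not from the chapter):

```python
# Overall significance-of-regression F test on a small synthetic data set.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # intercept + k regressors
beta = np.array([1.0, 2.0, -1.5])
y = X @ beta + rng.normal(scale=0.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat

SST = np.sum((y - y.mean())**2)       # total (corrected) sum of squares
SSRes = np.sum((y - y_hat)**2)        # residual sum of squares
SSR = SST - SSRes                     # regression sum of squares

MSR = SSR / k
MSRes = SSRes / (n - k - 1)
F0 = MSR / MSRes                      # ~ F(k, n-k-1) under H0
p_value = stats.f.sf(F0, k, n - k - 1)
```

A large F0 (small p-value) rejects H0: β1 = ... = βk = 0, i.e., at least one regressor contributes to the model.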
• Under H1, F0 follows a noncentral F distribution with k and n-k-1 degrees of freedom and noncentrality parameter

  λ = β*' Xc'Xc β* / σ²,  where β* = (β1, ..., βk)'

  and Xc is the centered regressor matrix with entries xij - x̄j.
• Expected mean squares:

  E(MSRes) = σ²
  E(MSR) = σ² + β*' Xc'Xc β* / k
• ANOVA table
• Example 3.3 The Delivery Time Data
• R² and Adjusted R²
  – R² always increases when a regressor is added to the model, regardless of the contribution of that variable.
  – The adjusted R²:

    R²adj = 1 - (SSRes/(n-p)) / (SST/(n-1))

  – The adjusted R² increases on adding a variable to the model only if the addition reduces the residual mean square.
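A minimal numeric sketch of this behavior (synthetic data; the irrelevant regressor and all names are illustrative):

```python
# Compare R^2 and adjusted R^2 when adding an irrelevant regressor.
import numpy as np

rng = np.random.default_rng(1)
n = 25
x1 = rng.normal(size=n)
noise_reg = rng.normal(size=n)            # a regressor unrelated to y
y = 3.0 + 1.5 * x1 + rng.normal(size=n)

def fit_r2(cols):
    X = np.column_stack([np.ones(n)] + cols)
    p = X.shape[1]
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    SSRes = np.sum((y - X @ b)**2)
    SST = np.sum((y - y.mean())**2)
    r2 = 1 - SSRes / SST
    r2_adj = 1 - (SSRes / (n - p)) / (SST / (n - 1))
    return r2, r2_adj

r2_small, adj_small = fit_r2([x1])             # model with x1 only
r2_big, adj_big = fit_r2([x1, noise_reg])      # model with the extra regressor
```

R² never decreases when a column is added; the adjusted version penalizes the lost residual degree of freedom.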
3.3.2 Tests on Individual Regression Coefficients
• For an individual regression coefficient:
  – H0: βj = 0 vs. H1: βj ≠ 0
  – Let Cjj be the j-th diagonal element of (X'X)^-1. The test statistic:
– This is a partial or marginal test because any estimate of the regression coefficient depends on all of the other regression variables.
– This test is a test of the contribution of xj given the other regressors in the model.
  t0 = β̂j / √(σ̂² Cjj) = β̂j / se(β̂j) ~ t(n-k-1) under H0
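A sketch of the marginal t test (synthetic data; names are illustrative):

```python
# Marginal t tests t0 = beta_hat_j / sqrt(sigma2_hat * C_jj) for each coefficient.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, k = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([0.5, 2.0, 0.0, -1.0]) + rng.normal(size=n)

p = k + 1
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
SSRes = np.sum((y - X @ beta_hat)**2)
sigma2_hat = SSRes / (n - p)              # MSRes estimates sigma^2

C = np.diag(XtX_inv)                      # C_jj
se = np.sqrt(sigma2_hat * C)              # se(beta_hat_j)
t0 = beta_hat / se                        # each ~ t(n-k-1) under H0: beta_j = 0
p_values = 2 * stats.t.sf(np.abs(t0), n - k - 1)
```

Note each t0 is a partial test: se(β̂j) comes from (X'X)^-1, so it depends on all the other regressors in the model.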
• Example 3.4 The Delivery Time Data
• Tests on a subset of regressors: partition β = (β1', β2')', where β1 is (p-r)×1 and β2 is r×1, and write y = Xβ + ε = X1β1 + X2β2 + ε. The hypothesis is H0: β2 = 0.
• For the full model, the regression sum of squares is SSR(β) = β̂'X'y, with p degrees of freedom.
• Under the null hypothesis, the regression sum of squares for the reduced model is SSR(β1) = β̂1'X1'y, where β̂1 = (X1'X1)^-1 X1'y.
• The reduced model has p - r degrees of freedom.
• The regression sum of squares due to β2 given β1:

  SSR(β2|β1) = SSR(β) - SSR(β1)

• This is called the extra sum of squares due to β2, and its degrees of freedom are p - (p - r) = r.
• The test statistic:

  F0 = [SSR(β2|β1)/r] / MSRes ~ F(r, n-p)
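The extra-sum-of-squares computation can be sketched as follows (synthetic data; names are illustrative):

```python
# Partial F test for H0: beta_2 = 0 given the regressors in X1.
import numpy as np

rng = np.random.default_rng(3)
n = 30
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)

X_full = np.column_stack([np.ones(n), x1, x2])   # p = 3 parameters
X_red = np.column_stack([np.ones(n), x1])        # reduced model drops r = 1 term

def ssr(X):
    # SSR(beta) = beta_hat' X' y (uncorrected regression sum of squares)
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return b @ X.T @ y

p, r = 3, 1
SSR_extra = ssr(X_full) - ssr(X_red)             # SSR(beta_2 | beta_1), r df

b_full = np.linalg.lstsq(X_full, y, rcond=None)[0]
MSRes = np.sum((y - X_full @ b_full)**2) / (n - p)
F0 = (SSR_extra / r) / MSRes                     # ~ F(r, n-p) under H0
```

With r = 1 this F0 equals the square of the marginal t statistic for that coefficient.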
• If β2 ≠ 0, F0 follows a noncentral F distribution with noncentrality parameter

  λ = (1/σ²) β2' X2' [I - X1(X1'X1)^-1 X1'] X2 β2

• Multicollinearity: when X2 is nearly linearly dependent on X1, this test actually has no power!
• This test has maximal power when X1 and X2 are orthogonal to one another.
• Partial F test: given the regressors in X1, measure the contribution of the regressors in X2.
• Consider y = β0 + β1x1 + β2x2 + β3x3 + ε.
• SSR(β1|β0, β2, β3), SSR(β2|β0, β1, β3), and SSR(β3|β0, β1, β2) are single-degree-of-freedom sums of squares.
• SSR(βj|β0, ..., βj-1, βj+1, ..., βk): the contribution of xj as if it were the last variable added to the model.
• This F test is equivalent to the t test.
• SST = SSR(β1, β2, β3|β0) + SSRes
• SSR(β1, β2, β3|β0) = SSR(β1|β0) + SSR(β2|β1, β0) + SSR(β3|β1, β2, β0)
• Example 3.5 Delivery Time Data
3.3.3 Special Case of Orthogonal Columns in X
• Model: y = Xβ + ε = X1β1 + X2β2 + ε
• Orthogonal: X1’X2 = 0
• Since the normal equations are (X'X)β̂ = X'y and X1'X2 = 0, they split into block-diagonal form:

  [X1'X1    0   ] [β̂1]   [X1'y]
  [  0    X2'X2 ] [β̂2] = [X2'y]

  so β̂1 = (X1'X1)^-1 X1'y and β̂2 = (X2'X2)^-1 X2'y.
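This split can be verified numerically (synthetic orthogonal columns; names are illustrative):

```python
# When X1'X2 = 0, the joint least-squares fit equals two separate fits.
import numpy as np

n = 8
X1 = np.column_stack([np.ones(n)])                        # intercept column
x = np.array([-7, -5, -3, -1, 1, 3, 5, 7], dtype=float)   # centered, so X1'X2 = 0
X2 = x.reshape(-1, 1)

rng = np.random.default_rng(4)
y = 2.0 + 0.5 * x + rng.normal(scale=0.1, size=n)

X = np.hstack([X1, X2])
b_joint = np.linalg.solve(X.T @ X, X.T @ y)      # fit both blocks at once

# Separate fits: b1 = (X1'X1)^-1 X1'y and b2 = (X2'X2)^-1 X2'y
b1 = np.linalg.solve(X1.T @ X1, X1.T @ y)
b2 = np.linalg.solve(X2.T @ X2, X2.T @ y)
```

With orthogonal blocks the estimates agree exactly; with correlated blocks they would not.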
3.3.4 Testing the General Linear Hypothesis
• Let T be an m × p matrix with rank(T) = r; the hypothesis is H0: Tβ = 0.
• Full model: y = Xβ + ε, with

  SSRes(FM) = y'y - β̂'X'y  (n - p degrees of freedom)

• Reduced model: y = Zγ + ε, where Z is an n × (p - r) matrix and γ is a (p - r) × 1 vector. Then

  γ̂ = (Z'Z)^-1 Z'y
  SSRes(RM) = y'y - γ̂'Z'y  (n - p + r degrees of freedom)

• The difference SSH = SSRes(RM) - SSRes(FM), with r degrees of freedom, is called the sum of squares due to the hypothesis H0: Tβ = 0.
• The test statistic:
  F0 = (SSH/r) / (SSRes(FM)/(n - p)) ~ F(r, n-p)
• Another form:

  F0 = [β̂'T' [T(X'X)^-1 T']^-1 Tβ̂ / r] / [SSRes(FM)/(n - p)]

• For H0: Tβ = c vs. H1: Tβ ≠ c,

  F0 = [(Tβ̂ - c)' [T(X'X)^-1 T']^-1 (Tβ̂ - c) / r] / [SSRes(FM)/(n - p)] ~ F(r, n-p)
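The Tβ = c form is convenient to code directly (synthetic data; T here tests the hypothetical constraint β1 = β2):

```python
# General linear hypothesis F statistic for H0: T beta = c.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])               # p = 3
y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)      # beta_1 = beta_2 holds

p = 3
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
MSRes = np.sum((y - X @ b)**2) / (n - p)

T = np.array([[0.0, 1.0, -1.0]])            # H0: beta_1 - beta_2 = 0
c = np.zeros(1)
r = np.linalg.matrix_rank(T)                # r = 1
diff = T @ b - c
F0 = (diff @ np.linalg.inv(T @ XtX_inv @ T.T) @ diff / r) / MSRes
p_value = stats.f.sf(F0, r, n - p)          # ~ F(r, n-p) under H0
```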
3.4 Confidence Intervals in Multiple Regression
3.4.1 Confidence Intervals on the Regression Coefficients
• Under the normality assumption, β̂ follows a p-dimensional multivariate normal distribution:

  β̂ ~ N(β, σ²(X'X)^-1)
3.4.2 Confidence Interval Estimation of the Mean Response
• A confidence interval on the mean response at a particular point.
• x0 = (1,x01,…,x0k)’
• The unbiased estimator of E(y|x0) is ŷ0 = x0'β̂, with

  Var(ŷ0) = σ² x0'(X'X)^-1 x0

• The 100(1 - α)% C.I. on the mean response:

  ŷ0 ± t(α/2, n-p) √(σ̂² x0'(X'X)^-1 x0)
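The interval can be computed directly (synthetic data; x0 and all names are illustrative):

```python
# 95% confidence interval on the mean response at a point x0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 30
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 4.0 + 1.0 * x1 - 2.0 * x2 + rng.normal(scale=0.5, size=n)

p = 3
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
sigma2_hat = np.sum((y - X @ b)**2) / (n - p)

x0 = np.array([1.0, 0.5, -0.5])                 # point of interest (leading 1 for the intercept)
y0_hat = x0 @ b                                 # unbiased estimate of E(y | x0)
se_mean = np.sqrt(sigma2_hat * x0 @ XtX_inv @ x0)
t_crit = stats.t.ppf(0.975, n - p)
ci = (y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean)
```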
• Example 3.9 The Delivery Time Data
3.4.3 Simultaneous Confidence Intervals on Regression Coefficients
• An elliptically shaped region
• Example 3.10 The Rocket Propellant Data
• Another approach: use intervals of the form

  β̂j ± Δ se(β̂j),  j = 0, 1, ..., k

  where Δ is chosen so that a specified probability that all intervals are simultaneously correct is obtained.
• Bonferroni method: Δ = t(α/(2p), n-p)
• Scheffé S-method: Δ = (2F(α, p, n-p))^(1/2)
• Maximum modulus t procedure: Δ = u(α, p, n-2), the upper-α point of the distribution of the maximum absolute value of two independent Student t random variables, each based on n - 2 degrees of freedom.
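The Bonferroni and Scheffé multipliers are easy to compute with scipy (the maximum modulus t critical value is tabulated, not available in scipy, so it is omitted here; α, p, and n are illustrative):

```python
# Delta multipliers for simultaneous intervals beta_hat_j ± Delta * se(beta_hat_j).
import numpy as np
from scipy import stats

alpha, p, n = 0.05, 2, 20
delta_bonf = stats.t.ppf(1 - alpha / (2 * p), n - p)           # t(alpha/(2p), n-p)
delta_scheffe = np.sqrt(2 * stats.f.ppf(1 - alpha, p, n - p))  # (2 F(alpha, p, n-p))^(1/2)
```

Both joint multipliers are wider than the one-at-a-time critical value t(α/2, n-p), which is the price of simultaneous coverage.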
• Example 3.11 The Rocket Propellant Data
• Find 90% joint C.I. for β0 and β1 by constructing a 95% C.I. for each parameter.
• The confidence ellipse is always a more efficient procedure than the Bonferroni method because the volume of the ellipse is always less than the volume of the space covered by the Bonferroni intervals.
• Bonferroni intervals are easier to construct.
• The length of the C.I.: maximum modulus t < Bonferroni method < Scheffé S-method.
3.5 Prediction of New Observations
3.6 Hidden Extrapolation in Multiple Regression
• Be careful about extrapolating beyond the region containing the original observations!
• The data region is NOT the rectangle formed by the ranges of the regressors.
• Regressor variable hull (RVH): the convex hull of the original n data points.
– Interpolation: x0 ∈ RVH
– Extrapolation: x0 ∉ RVH
• The diagonal elements hii of the hat matrix H = X(X'X)^-1 X' are useful in detecting hidden extrapolation.
• hmax: the maximum of the hii. The point xi with the largest hii lies on the boundary of the RVH.
• {x | x'(X'X)^-1 x ≤ hmax} is an ellipsoid enclosing all points inside the RVH.
• Let h00 = x0'(X'X)^-1 x0.
  – h00 ≤ hmax: inside the RVH or on its boundary
  – h00 > hmax: outside the RVH
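The h00 vs. hmax check can be sketched as follows (synthetic data; the two test points are illustrative):

```python
# Detect hidden extrapolation by comparing h00 = x0'(X'X)^-1 x0 to h_max.
import numpy as np

rng = np.random.default_rng(7)
n = 20
x1 = rng.uniform(0, 1, size=n)
x2 = rng.uniform(0, 1, size=n)
X = np.column_stack([np.ones(n), x1, x2])

XtX_inv = np.linalg.inv(X.T @ X)
H = X @ XtX_inv @ X.T          # hat matrix
h = np.diag(H)
h_max = h.max()

def h00(pt):
    # pt = (1, x01, x02): leverage of a prediction point
    return pt @ XtX_inv @ pt

inside = np.array([1.0, x1.mean(), x2.mean()])   # centroid of the data
far = np.array([1.0, 10.0, 10.0])                # far outside the observed region
```

The centroid has the smallest possible leverage (1/n), while a point far from the data gives h00 well above hmax.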
• MCE : minimum covering ellipsoid (Weisberg, 1985).
3.7 Standardized Regression Coefficients
• Difficult to compare regression coefficients directly.
• Unit Normal Scaling: Standardize a Normal r.v.
• New model:

  yi* = b1 zi1 + ... + bk zik + εi,  i = 1, ..., n

  – There is no intercept.
  – The least-squares estimator of b is b̂ = (Z'Z)^-1 Z'y*.
• Unit Length Scaling:
• New model:

  yi0 = b1 wi1 + ... + bk wik + εi,  i = 1, ..., n

• The least-squares estimator: b̂ = (W'W)^-1 W'y0
• It does not matter which scaling we use; both produce the same set of dimensionless regression coefficients.
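This equivalence can be checked numerically (synthetic data; names are illustrative):

```python
# Unit normal scaling and unit length scaling give the same standardized coefficients.
import numpy as np

rng = np.random.default_rng(8)
n = 25
X = rng.normal(size=(n, 2)) * np.array([3.0, 0.5]) + np.array([10.0, -2.0])
y = 5.0 + X @ np.array([1.2, -0.8]) + rng.normal(size=n)

# Unit normal scaling: z_ij = (x_ij - xbar_j)/s_j, y* likewise with s_y
Z = (X - X.mean(0)) / X.std(0, ddof=1)
y_star = (y - y.mean()) / y.std(ddof=1)
b_unit_normal = np.linalg.lstsq(Z, y_star, rcond=None)[0]   # no intercept

# Unit length scaling: w_ij = (x_ij - xbar_j)/sqrt(S_jj), y0 likewise with sqrt(SST)
W = (X - X.mean(0)) / np.sqrt(((X - X.mean(0))**2).sum(0))
y0 = (y - y.mean()) / np.sqrt(((y - y.mean())**2).sum())
b_unit_length = np.linalg.lstsq(W, y0, rcond=None)[0]       # no intercept
```

The two scalings differ only by the common factor √(n-1) in numerator and denominator, which cancels in the least-squares solution.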
3.8 Multicollinearity
• A serious problem: Multicollinearity or near-linear dependence among the regression variables.
• The regressors are the columns of X, so an exact linear dependence would result in a singular X'X.
• Unit length scaling, orthogonal regressors:

  W'W = [1 0; 0 1],  (W'W)^-1 = [1 0; 0 1],  Var(b̂1)/σ² = Var(b̂2)/σ² = 1
• Soft drink data:

  W'W = [1 0.824; 0.824 1],  (W'W)^-1 = [3.12 -2.57; -2.57 3.12],  Var(b̂1)/σ² = Var(b̂2)/σ² = 3.12

• The off-diagonal elements of W'W are usually called the simple correlations between the regressors.
• Variance inflation factors (VIFs):
  – The main diagonal elements of the inverse of X'X in correlation form ((W'W)^-1 above).
  – From the two cases above: soft drink, VIF1 = VIF2 = 3.12; Figure 3.12 (orthogonal regressors), VIF1 = VIF2 = 1.
  – VIFj = 1/(1 - Rj²), where Rj² is the coefficient of multiple determination obtained from regressing xj on the other regressor variables.
  – If xj is nearly linearly dependent on some of the other regressors, then Rj² ≈ 1 and VIFj will be large.
  – Serious problems: VIFs > 10.
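A sketch tying the two VIF formulas together (synthetic, deliberately collinear data; names are illustrative):

```python
# VIFs as the diagonal of (W'W)^-1, cross-checked against 1/(1 - R_j^2).
import numpy as np

rng = np.random.default_rng(9)
n = 50
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.3 * rng.normal(size=n)   # nearly collinear with x1
X = np.column_stack([x1, x2])

Xc = X - X.mean(0)
W = Xc / np.sqrt((Xc**2).sum(0))            # unit length scaling; W'W = correlation matrix
vif = np.diag(np.linalg.inv(W.T @ W))

def r2(target, other):
    # R_j^2 from regressing one regressor on the other (with intercept)
    A = np.column_stack([np.ones(n), other])
    b = np.linalg.lstsq(A, target, rcond=None)[0]
    resid = target - A @ b
    return 1 - resid @ resid / ((target - target.mean()) @ (target - target.mean()))

vif_check = np.array([1 / (1 - r2(x1, x2)), 1 / (1 - r2(x2, x1))])
```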
• Figure 3.13 (a): The fitted plane is unstable and very sensitive to relatively small changes in the data points.
• Figure 3.13 (b): Orthogonal regressors.
3.9 Why Do Regression Coefficients Have the Wrong Sign?
• Reasons for the wrong sign:
1. The range of some of the regressors is too small.
2. Important regressors have not been included in the model.
3. Multicollinearity is present.
4. Computational errors have been made.
• For reason 1:

  Var(β̂1) = σ²/Sxx = σ² / Σ(xi - x̄)²
• Although it is possible to decrease the variance of the regression coefficients by increasing the range of the x's, it may not be desirable to spread the levels of the regressors out too far:
  – The true response function may be nonlinear.
  – It may be impractical or impossible.
• For reason 2:
• ŷ = 1.835 + 0.463 x1; here β̂1 is a "total" regression coefficient.
• ŷ = 1.036 - 1.222 x1 + 3.649 x2; here β̂1 is the effect of x1 given x2.
• For reason 3: Multicollinearity inflates the variances of the coefficients, which increases the probability that one or more regression coefficients will have the wrong sign.
• For reason 4: Different computer programs handle round-off or truncation problems in different ways, and some programs are more effective than others in this regard.