Anareg Week 10: Multicollinearity, Interesting Special Cases, Polynomial Regression
Multicollinearity
Numerical analysis problem: the matrix X’X is close to singular and is therefore difficult to invert accurately.
Statistical problem: there is too much correlation among the explanatory variables and it is therefore difficult to determine the regression coefficients.
Multicollinearity (2)
Solve the statistical problem and the numerical problem will also be solved.
The statistical problem is more serious than the numerical problem.
We want to refine a model that has redundancy in the explanatory variables even if X’X can be inverted without difficulty.
Multicollinearity (3)
Extreme cases can help us to understand the problem:
If all X’s are uncorrelated, Type I SS and Type II SS will be the same, i.e., the contribution of each explanatory variable to the model will be the same whether or not the other explanatory variables are in the model.
If there is a linear combination of the explanatory variables that is a constant (e.g. X1 = X2, so X1 - X2 = 0), then the Type II SS for the X’s involved will be zero.
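In PROC REG both types can be requested with the SS1 and SS2 options on the MODEL statement; a minimal sketch with hypothetical names mydata, y, x1, x2:

  proc reg data=mydata;
    model y = x1 x2 / ss1 ss2;   /* print Type I (sequential) and Type II (partial) SS */
  run;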
CS example: Y = gpa; X1 = hsm; X3 = hss; X4 = hse; X5 = satm; X6 = satv; X7 = genderm.
Define sat = satm + satv. We will regress Y on sat, satm, and satv.
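A minimal SAS sketch of this step (the input data set name cs is an assumption; the slides' actual program was cs4.sas):

  data cs2;
    set cs;              /* assumed name of the computer science data set */
    sat = satm + satv;   /* sat is an exact linear combination of satm and satv */
  run;

  proc reg data=cs2;
    model gpa = sat satm satv;   /* X’X is singular, so expect the "not full rank" note */
  run;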
Source            DF
Model               2
Error             221
Corrected Total   223

Something is wrong: dfM = 2, but there are 3 X’s.
NOTE: Model is not full rank. Least-squares solutions for the parameters are not unique. Some statistics will be misleading. A reported DF of 0 or B means that the estimate is biased.
NOTE: The following parameters have been set to 0, since the variables are a linear combination of other variables as shown.
satv = sat - satm
Variable    DF   Parameter Est.   Std Error      t    Pr > |t|
Intercept    1        1.28           0.37       3.43    0.0007
sat          B       -0.00           0.00      -0.04    0.9684
satm         B        0.00           0.00       2.10    0.0365
satv         0        0              .          .       .
Extent of multicollinearity
Our CS example had one explanatory variable equal to a linear combination of other explanatory variables.
This is the most extreme case of multicollinearity and is detected by statistical software because (X’X) does not have an inverse.
We are concerned with less extreme cases.
Effects of multicollinearity
Regression coefficients are not well estimated and may be meaningless.
Similarly for the standard errors of these estimates.
Type I SS and Type II SS will differ.
R2 and predicted values are usually OK.
Two separate problems
Numerical accuracy: (X’X) is difficult to invert; need good software.
Statistical problem: results are difficult to interpret; need a better model.
Polynomial regression
We can do linear, quadratic, cubic, etc. by defining squares, cubes, etc. in a data step and using these as predictors in a multiple regression.
We can do this with more than one explanatory variable.
When we do this we generally create a multicollinearity problem.
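For a single explanatory variable, a cubic fit might be set up like this in SAS (the data set and variable names mydata, y, and x are hypothetical):

  data poly;
    set mydata;       /* hypothetical input data set */
    x2 = x*x;         /* quadratic term */
    x3 = x*x*x;       /* cubic term */
  run;

  proc reg data=poly;
    model y = x x2 x3;   /* linear, quadratic, and cubic predictors together */
  run;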
Polynomial Regression (2)
We can remove the correlation between explanatory variables and their squares:
Center (subtract the mean) before squaring.
NKNW rescale by standardizing (subtract the mean and divide by the standard deviation).
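A sketch of the centering idea in SAS (names are hypothetical; PROC STANDARD with MEAN=0 subtracts each variable's mean, and adding STD=1 would give the NKNW standardization):

  proc standard data=mydata mean=0 out=ctr;   /* replace x with x minus its mean */
    var x;
  run;

  data ctr;
    set ctr;
    x2 = x*x;   /* square the centered variable; corr(x, x2) is now much reduced */
  run;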
Interaction Models
With several explanatory variables, we need to consider the possibility that the effect of one variable depends on the value of another variable.
Special cases
One independent variable – second order
One independent variable – third order
Two independent variables – second order
One Independent Variable – Second Order
The regression model:
$$Y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \varepsilon_i, \qquad x_i = X_i - \bar{X}$$
The mean response,
$$E(Y_i) = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2,$$
is a parabola and is frequently called a quadratic response function. β0 represents the mean response of Y when x = 0, β1 is often called the linear effect coefficient, and β11 is called the quadratic effect coefficient.
One Independent Variable – Third Order
The regression model:
$$Y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \beta_{111} x_i^3 + \varepsilon_i, \qquad x_i = X_i - \bar{X}$$
The mean response is
$$E(Y_i) = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \beta_{111} x_i^3$$
Two Independent Variables – Second Order
The regression model:
$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{11} x_{i1}^2 + \beta_{22} x_{i2}^2 + \beta_{12} x_{i1} x_{i2} + \varepsilon_i, \qquad x_{i1} = X_{i1} - \bar{X}_1,\; x_{i2} = X_{i2} - \bar{X}_2$$
The mean response,
$$E(Y_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{11} x_{i1}^2 + \beta_{22} x_{i2}^2 + \beta_{12} x_{i1} x_{i2},$$
is the equation of a conic section. The coefficient β12 is often called the interaction effect coefficient.
NKNW Example, p. 330
Response variable is the life (in cycles) of a power cell.
Explanatory variables are:
Charge rate (3 levels)
Temperature (3 levels)
This is a designed experiment.

Obs   cycles   chrate   temp
  1      150      0.6     10
  2       86      1.0     10
  3       49      1.4     10
  4      288      0.6     20
  5      157      1.0     20
  6      131      1.0     20
  7      184      1.0     20
  8      109      1.4     20
  9      279      0.6     30
 10      235      1.0     30
 11      224      1.4     30
Create new variables: chrate2 = chrate*chrate; temp2 = temp*temp; ct = chrate*temp.
Then regress cycles on chrate, temp, chrate2, temp2, and ct.
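In SAS this might be sketched as follows (the input data set name cell is an assumption; the slides' actual program was NKNW302.sas):

  data cell2;
    set cell;                  /* assumed name of the power cell data set */
    chrate2 = chrate*chrate;   /* squared charge rate */
    temp2   = temp*temp;       /* squared temperature */
    ct      = chrate*temp;     /* charge rate by temperature cross-product */
  run;

  proc reg data=cell2;
    model cycles = chrate temp chrate2 temp2 ct;
  run;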
Var         b       s(b)       t     Pr>|t|
Int      162.84    16.61     9.81    <.0002
Chrate   -55.83    13.22    -4.22    <0.01
Temp      75.50    13.22     5.71    <0.005
Chrate2   27.39    20.34     1.35    0.2359
Temp2    -10.61    20.34    -0.52    0.6244
ct        11.50    16.19     0.71    0.5092
ANOVA table
Source                       df       SS       MS
Regression                    5    66366    11703
  X1                          1    18704    18704
  X2 | X1                     1    34201    34201
  X1² | X1, X2                1     1646     1646
  X2² | X1, X2, X1²           1      285      285
  X1X2 | X1, X2, X1², X2²     1      529      529
Error                         5     4240     1048
Total                        10    60606
Conclusion
We have a multicollinearity problem. Let's look at the correlations (use proc corr). There are some very high correlations:
r(chrate, chrate2) = 0.99103
r(temp, temp2) = 0.98609
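A sketch of that correlation check, continuing the assumed data set name cell2 from the sketch above:

  proc corr data=cell2;
    var chrate temp chrate2 temp2 ct;   /* look for large correlations such as r(chrate, chrate2) */
  run;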
A remedy
We can remove the correlation between explanatory variables and their squares:
Center (subtract the mean) before squaring.
NKNW rescale by standardizing (subtract the mean and divide by the standard deviation).
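Applied to the power cell example, the remedy might be sketched as follows (PROC STANDARD with MEAN=0 STD=1 is one way to carry out the NKNW standardization; data set names remain assumptions):

  proc standard data=cell mean=0 std=1 out=cellstd;   /* standardize chrate and temp */
    var chrate temp;
  run;

  data cellstd;
    set cellstd;
    chrate2 = chrate*chrate;   /* squares and cross-product of the standardized variables */
    temp2   = temp*temp;
    ct      = chrate*temp;
  run;

  proc reg data=cellstd;
    model cycles = chrate temp chrate2 temp2 ct;
  run;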
Last slide
Read NKNW 7.6 to 7.7 and the problems on pp 317-326
We used programs cs4.sas and NKNW302.sas to generate the output for today
Last slide
Read NKNW 8.5 and Chapter 9