Analysis of Variance for Regression/Multiple Regression
Lecture Notes XVII
Statistics 112, Fall 2002
Announcements
• The second midterm is next Thursday.
• Extra Office Hours Next Week: Monday, 1:30-2:30; Wednesday, 9-10 and 1:30-2:30; or by appointment. Usual office hours on Tuesday, 1-2 and 4:30-5:30.
• Haipeng will hold office hours tomorrow from 10:30-12:30.
• A tutor from the university tutor service will hold a review session on Tuesday (Nov. 12) from 6-8 p.m. in Huntsman Hall, Room 250. He is not affiliated with the course.
• I will post a set of exercises for this week's lectures by Friday night.
Outline
• Analysis of variance for regression.
• Multiple regression.
  – Basic model.
  – Estimating and interpreting the parameters.
  – The impact of lurking variables.
• Reading for this lecture: Section 10.2 (the part on analysis of variance for regression) and Chapter 11.
Analysis of Variance for Regression
• The analysis of variance (ANOVA) provides a convenient method of comparing the fit of two or more models to the same set of data. Here we are interested in comparing
  1. a simple linear regression model in which the slope is zero, $y_i = \beta_0 + \varepsilon_i$, vs.
  2. a simple linear regression model in which the slope is not zero, $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$.
  For both models it is assumed that $\varepsilon_i \sim N(0, \sigma)$ (normal with mean 0 and standard deviation $\sigma$), with the $\varepsilon_i$ independent.
• Analysis of variance summarizes information about the sources of variation in the data.
• Total variation in the response $y$ is expressed by the deviations $y_i - \bar y$.
• Two reasons why $y_i$ does not equal $\bar y$:
  – The responses $y_i$ correspond to different values of the explanatory variable $x$. The fitted value $\hat y_i$ estimates the mean response for each specific $x_i$. The difference $\hat y_i - \bar y$ reflects variation in mean responses due to differences in the $x_i$.
  – Individual observations vary about their mean because of variation within the subpopulation of responses to a fixed $x_i$. This variation is represented by the residuals $y_i - \hat y_i$.
Sums of Squares
• Basic idea behind analysis of variance: if $\beta_1 = 0$, then all variation should be due to individual observations varying about their mean. We can estimate the amount of variation due to the responses $y_i$ corresponding to different values of the explanatory variable $x$ and base our test on this estimate.
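• Each deviation from the mean splits into two parts:
$$y_i - \bar y = (\hat y_i - \bar y) + (y_i - \hat y_i).$$
  Algebraic fact (the cross-product term $\sum_i (\hat y_i - \bar y)(y_i - \hat y_i)$ equals zero for the least-squares line):
$$\sum_{i=1}^n (y_i - \bar y)^2 = \sum_{i=1}^n (\hat y_i - \bar y)^2 + \sum_{i=1}^n (y_i - \hat y_i)^2 \qquad (1)$$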
• We write (1) as
  SST = SSM + SSE.
  SS stands for sum of squares, and T, M and E stand for total, model and error, respectively. The total variation SST is the sum of the variation due to the straight-line model for the regression function (SSM) and the variation due to deviations from this model (SSE).
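A minimal sketch of decomposition (1) in Python, assuming made-up toy data (the numbers are not from the course). It fits the least-squares line, computes SST, SSM and SSE, and checks that SST = SSM + SSE up to floating-point rounding. The helper names `fit_line` and `anova_sums` are hypothetical.

```python
# Sketch: verify SST = SSM + SSE for a least-squares line.
# The data are made-up toy numbers used only for illustration.

def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for the regression of y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


def anova_sums(x, y):
    """Return (SST, SSM, SSE) for the simple linear regression of y on x."""
    b0, b1 = fit_line(x, y)
    ybar = sum(y) / len(y)
    yhat = [b0 + b1 * xi for xi in x]                      # fitted values
    sst = sum((yi - ybar) ** 2 for yi in y)                # total variation
    ssm = sum((fi - ybar) ** 2 for fi in yhat)             # variation explained by the line
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))   # residual variation
    return sst, ssm, sse


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
sst, ssm, sse = anova_sums(x, y)
print(f"SST = {sst:.4f}, SSM + SSE = {ssm + sse:.4f}")
```

Model 1 (slope zero) has least-squares fitted value $\bar y$ for every observation, so its error sum of squares is exactly SST; SSM is the reduction in error sum of squares gained by allowing a nonzero slope.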
• If $H_0: \beta_1 = 0$ were true, then SSM should be small.
• Degrees of freedom are associated with each sum of squares. Degrees of freedom can be thought of as the number of independent pieces of information that the sum of squares reflects.
  DFT = DFM + DFE, where DFT = $n - 1$, DFM = 1 and DFE = $n - 2$.
• Mean square (MS) = sum of squares / degrees of freedom; in particular, MSM = SSM/DFM and MSE = SSE/DFE.
• Interpretation of $r^2$: the fraction of the variation in the values of $y$ that is explained by the least-squares regression of $y$ on $x$.
$$r^2 = \frac{SSM}{SST} = \frac{\sum_i (\hat y_i - \bar y)^2}{\sum_i (y_i - \bar y)^2} = 1 - \frac{\sum_i (y_i - \hat y_i)^2}{\sum_i (y_i - \bar y)^2}$$
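The three expressions for $r^2$ above can be checked numerically. This is a sketch on made-up toy data; the function name `r_squared_three_ways` is hypothetical. It also confirms the known fact that, in simple linear regression, $r^2$ equals the square of the sample correlation between $x$ and $y$.

```python
import math


def r_squared_three_ways(x, y):
    """Return r^2 as SSM/SST, as 1 - SSE/SST, and as the squared correlation."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)               # this is SST
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                                         # least-squares slope
    b0 = ybar - b1 * xbar                                  # least-squares intercept
    yhat = [b0 + b1 * xi for xi in x]
    ssm = sum((fi - ybar) ** 2 for fi in yhat)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    r = sxy / math.sqrt(sxx * syy)                         # sample correlation
    return ssm / syy, 1 - sse / syy, r ** 2


# Made-up toy data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r2_model, r2_resid, r2_corr = r_squared_three_ways(x, y)
print(r2_model, r2_resid, r2_corr)   # the three agree up to rounding
```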
The ANOVA F test
• $H_0: \beta_1 = 0$ ($y$ is not linearly related to $x$) can be tested by comparing MSM with MSE. The ANOVA test statistic is
$$F = \frac{MSM}{MSE}.$$
• $F$ will tend to be small when $H_0$ is true and large when $H_a: \beta_1 \neq 0$ is true.
• Under $H_0$, the statistic $F$ has an $F$ distribution with 1 degree of freedom in the numerator and $n - 2$ degrees of freedom in the denominator (Table E).
• For simple linear regression, the ANOVA $F$ test is equivalent to the two-sided $t$ test of $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$; in fact $F = t^2$.
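A sketch of the ANOVA $F$ statistic for simple linear regression, using made-up toy data (the function name `anova_f` is hypothetical). It builds MSM = SSM/1 and MSE = SSE/(n - 2), forms $F$ = MSM/MSE, and checks $F = t^2$, where $t = b_1 / SE(b_1)$ and $SE(b_1) = s/\sqrt{\sum (x_i - \bar x)^2}$ with $s = \sqrt{MSE}$. It does not compute a p-value; that would still come from Table E or statistical software.

```python
import math


def anova_f(x, y):
    """Return (F, t, DFM, DFE) for the simple linear regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    yhat = [b0 + b1 * xi for xi in x]
    ssm = sum((fi - ybar) ** 2 for fi in yhat)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    dfm, dfe = 1, n - 2                 # degrees of freedom
    msm, mse = ssm / dfm, sse / dfe     # mean squares
    f_stat = msm / mse
    se_b1 = math.sqrt(mse / sxx)        # standard error of the slope
    t_stat = b1 / se_b1
    return f_stat, t_stat, dfm, dfe


# Made-up toy data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.1, 2.9, 5.2, 4.8, 7.1, 6.5]
f_stat, t_stat, dfm, dfe = anova_f(x, y)
print(f"F = {f_stat:.3f} on ({dfm}, {dfe}) df; t^2 = {t_stat ** 2:.3f}")
```

With $n = 6$ here, $F$ would be compared with the $F(1, 4)$ critical values in Table E.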
Multiple regression
• Consider again the problem of deciding how many years you should stay in school. Suppose that you now have data on earnings ($Y$), education ($X_1$) and IQ ($X_2$) for a sample from a population of people like yourself.
• You could just use the regression function $E(Y \mid X_1)$ to make your prediction, but given the extra information about IQ in the sample, it is natural to try to use it.
• The natural way to use the extra information is to use the multiple regression function $E(Y \mid X_1, X_2)$ to make your prediction (you substitute in your own education and IQ to make the prediction).
• Population regression function: $E(Y \mid X_1, \dots, X_p)$.
• Data for multiple regression:
  Person 1: $x_{11}, x_{12}, \dots, x_{1p}, y_1$
  Person 2: $x_{21}, x_{22}, \dots, x_{2p}, y_2$
  ⋮
  Person $n$: $x_{n1}, x_{n2}, \dots, x_{np}, y_n$
Multiple Linear Regression Model
• One possible model for the population regression function is the multiple linear regression model, an analogue of the simple linear regression model:
$$E(Y \mid X_1, \dots, X_p) = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p$$
• Interpretation of $\beta_j$: the change in the mean of $Y$ if $X_j$ is increased by one unit and all other explanatory variables, $X_1, \dots, X_{j-1}, X_{j+1}, \dots, X_p$, are held fixed.
• The multiple linear regression model is very flexible. Examples of multiple linear regression models:
� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �
� � � � � � � � � � � � � � � � � � � � ��� � � � � � � � � � � � � � � � � ���� � � �"� � � ��� � � � �
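A small sketch of the two points above, with hypothetical coefficient values. First, raising $X_1$ by one unit while $X_2$ is held fixed changes the mean response by exactly $\beta_1$. Second, the model only has to be linear in the $\beta$'s, so transformed variables such as $x^2$ can serve as explanatory variables; the quadratic below is a generic illustration, not necessarily one of the slide's examples. The helper names `mean_response` and `quadratic_features` are hypothetical.

```python
def mean_response(betas, xs):
    """E(Y | X_1, ..., X_p) = beta_0 + beta_1 x_1 + ... + beta_p x_p.

    betas = [beta_0, beta_1, ..., beta_p]; xs = [x_1, ..., x_p].
    """
    if len(betas) != len(xs) + 1:
        raise ValueError("need exactly one more beta than explanatory variables")
    return betas[0] + sum(b * x for b, x in zip(betas[1:], xs))


betas = [10.0, 2.0, -0.5]    # hypothetical beta_0, beta_1, beta_2
base = [3.0, 4.0]            # x_1 = 3, x_2 = 4
bumped = [4.0, 4.0]          # x_1 increased by one unit, x_2 held fixed
change = mean_response(betas, bumped) - mean_response(betas, base)
print(change)                # equals beta_1


# Flexibility: a quadratic in x is a multiple linear regression with
# X_1 = x and X_2 = x**2; the mean is still linear in the betas.
def quadratic_features(x):
    return [x, x ** 2]


print(mean_response(betas, quadratic_features(3.0)))
```

In the quadratic case $X_2 = X_1^2$, so $X_2$ cannot be held fixed while $X_1$ changes; the "all else fixed" reading of $\beta_j$ applies only when the explanatory variables can vary separately.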
Probability Model for Multiple Linear Regression
• The statistical model for multiple linear regression is
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon_i$$
  for $i = 1, 2, \dots, n$.
• The mean response $\mu_y$ is a linear function of the explanatory variables:
$$\mu_y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$$
• The deviations $\varepsilon_i$ are independent and normally distributed with mean 0 and standard deviation $\sigma$. In other words, they are a simple random sample from a $N(0, \sigma)$ distribution.
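To make the probability model concrete, here is a sketch that simulates data from it. The parameter values and the uniform distribution for the explanatory variables are hypothetical choices; the model itself says nothing about how the $x$'s are generated. Python's `random.gauss(mu, sigma)` takes the standard deviation, which matches the $N(0, \sigma)$ notation used here.

```python
import random


def simulate(n, betas, sigma, seed=0):
    """Draw n observations from y_i = beta_0 + sum_j beta_j x_ij + eps_i, eps_i ~ N(0, sigma)."""
    rng = random.Random(seed)                # seeded for reproducibility
    p = len(betas) - 1
    rows, ys = [], []
    for _ in range(n):
        xs = [rng.uniform(0.0, 10.0) for _ in range(p)]              # hypothetical x-distribution
        mu = betas[0] + sum(b * x for b, x in zip(betas[1:], xs))    # mean response mu_y
        eps = rng.gauss(0.0, sigma)          # independent normal deviation, SD sigma
        rows.append(xs)
        ys.append(mu + eps)
    return rows, ys


# Hypothetical parameters: beta_0 = 1, beta_1 = 2, beta_2 = -3, sigma = 0.5.
rows, ys = simulate(n=1000, betas=[1.0, 2.0, -3.0], sigma=0.5)
print(rows[0], ys[0])
```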
Estimation of the multiple regression parameters
• Let $b_0, b_1, \dots, b_p$ denote the estimators of $\beta_0, \beta_1, \dots, \beta_p$.
• For the $i$th observation, the predicted response is
$$\hat y_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \dots + b_p x_{ip}$$
• The $i$th residual, the difference between the observed and predicted response, is
$$e_i = \text{observed response} - \text{predicted response} = y_i - \hat y_i = y_i - b_0 - b_1 x_{i1} - b_2 x_{i2} - \dots - b_p x_{ip}$$
• To estimate $\beta_0, \beta_1, \dots, \beta_p$, we use the method of least squares: choose the values of the $b$'s that make the sum of the squared residuals as small as possible, i.e., choose $b_0, b_1, \dots, b_p$ to minimize
$$\sum_{i=1}^n (y_i - b_0 - b_1 x_{i1} - b_2 x_{i2} - \dots - b_p x_{ip})^2$$
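One way to carry out this minimization is to solve the normal equations $(X^T X)\,b = X^T y$, where $X$ has a leading column of 1s for the intercept. The sketch below does this with Gaussian elimination in pure Python; it is adequate for small, well-conditioned toy problems, while statistical software typically uses more numerically stable methods (such as a QR decomposition). The function names `least_squares` and `residuals` and the toy data are hypothetical.

```python
def least_squares(rows, ys):
    """rows: list of [x_i1, ..., x_ip]; ys: list of y_i. Returns [b_0, ..., b_p]."""
    X = [[1.0] + [float(v) for v in r] for r in rows]   # prepend the intercept column
    k = len(X[0])                                       # k = p + 1 unknowns
    # Augmented matrix [X^T X | X^T y] for the normal equations.
    A = []
    for a in range(k):
        lhs = [sum(row[a] * row[j] for row in X) for j in range(k)]
        rhs = sum(row[a] * y for row, y in zip(X, ys))
        A.append(lhs + [rhs])
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; are the explanatory variables collinear?")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    b = [0.0] * k
    for r in range(k - 1, -1, -1):
        b[r] = (A[r][k] - sum(A[r][c] * b[c] for c in range(r + 1, k))) / A[r][r]
    return b


def residuals(rows, ys, b):
    """e_i = y_i - (b_0 + b_1 x_i1 + ... + b_p x_ip)."""
    return [y - b[0] - sum(bj * xj for bj, xj in zip(b[1:], r)) for r, y in zip(rows, ys)]


# Noise-free toy data from y = 1 + 2 x_1 - 3 x_2, so the fit should recover (1, 2, -3).
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
ys = [1 + 2 * a - 3 * c for a, c in rows]
b = least_squares(rows, ys)
print([round(v, 6) for v in b])

# With noise added, the least-squares residuals still sum to zero (intercept included).
ys_noisy = [y + e for y, e in zip(ys, [0.1, -0.2, 0.05, 0.0, 0.3, -0.1])]
b_noisy = least_squares(rows, ys_noisy)
print(sum(residuals(rows, ys_noisy, b_noisy)))   # zero up to rounding
```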
• The parameter $\sigma^2$ measures the variability of the responses about the population regression equation. We estimate $\sigma^2$ by $s^2$, which is an average of the squared residuals (dividing by the degrees of freedom $n - p - 1$ rather than by $n$):
$$s^2 = \frac{\sum_{i=1}^n e_i^2}{n - p - 1}$$
  We estimate $\sigma$ by $s = \sqrt{s^2}$. We call $s$ the root mean squared error.
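A sketch of the estimate of $\sigma$ from a list of residuals. The residual values in the example are hypothetical (chosen to sum to zero, as least-squares residuals from a model with an intercept do), and the function name `root_mse` is hypothetical.

```python
import math


def root_mse(resids, p):
    """Return (s^2, s) from residuals of a fit with p explanatory variables plus an intercept."""
    n = len(resids)
    df = n - p - 1                      # error degrees of freedom
    if df <= 0:
        raise ValueError("need n > p + 1 observations")
    s2 = sum(e * e for e in resids) / df
    return s2, math.sqrt(s2)


s2, s = root_mse([1.0, -1.0, 2.0, -2.0], p=1)   # SSE = 10, df = 2
print(s2, s)                                    # 5.0 and sqrt(5), about 2.236
```

With $p = 1$ the divisor is $n - 2$, the DFE of simple linear regression, so $s^2$ is the same quantity as MSE in the ANOVA table.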
The Impact of Lurking Variables
• You want to predict what your earnings will be if you obtain a certain number of years of education. Let $Y$ = earnings, $X_1$ = years of education and $X_2$ = IQ. Suppose that you have a sample that contains only earnings and years-of-education data; people's IQs are not recorded.
• IQ is probably a lurking variable, i.e., a variable that has an important effect on the relationship among the variables in a study but is not included among the variables studied.
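• Suppose that the population regression function for the expected value of earnings given education and IQ is a multiple linear regression function:
$$E(Y \mid X_1, X_2) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$$
  and also suppose that
$$E(X_2 \mid X_1) = \gamma_0 + \gamma_1 X_1$$
• What is $E(Y \mid X_1)$? By the law of iterated expectations,
$$\begin{aligned}
E(Y \mid X_1) &= E[E(Y \mid X_1, X_2) \mid X_1] \\
&= E(\beta_0 + \beta_1 X_1 + \beta_2 X_2 \mid X_1) \\
&= \beta_0 + \beta_1 X_1 + \beta_2 E(X_2 \mid X_1) \\
&= \beta_0 + \beta_1 X_1 + \beta_2 (\gamma_0 + \gamma_1 X_1) \\
&= (\beta_0 + \beta_2 \gamma_0) + (\beta_1 + \beta_2 \gamma_1) X_1
\end{aligned}$$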
• If we use least squares to estimate the simple linear regression function $E(Y \mid X_1)$, the slope of the least-squares line will be an unbiased estimate of $\beta_1 + \beta_2 \gamma_1$, which does not generally equal $\beta_1$. Thus, by regressing earnings on years of education only, you will not obtain the right slope for estimating the impact of additional years of education given your fixed IQ.
• Two circumstances in which $\beta_1 + \beta_2 \gamma_1 = \beta_1$:
  – $\beta_2 = 0$, i.e., IQ does not help to predict earnings once education is included.
  – $\gamma_1 = 0$, i.e., years of education does not help to predict IQ.
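A simulation sketch of the lurking-variable argument. All parameter values and distributions are hypothetical: $X_1$ is uniform on [0, 10], and both $X_2$ given $X_1$ and $Y$ given $(X_1, X_2)$ have normal noise with SD 1. The slope from regressing $Y$ on $X_1$ alone lands near $\beta_1 + \beta_2\gamma_1$, and near $\beta_1$ only in the two circumstances listed above. The function names `slope` and `simulate_short_slope` are hypothetical.

```python
import random


def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx


def simulate_short_slope(beta, gamma, n=5000, seed=1):
    """beta = (beta_0, beta_1, beta_2); gamma = (gamma_0, gamma_1). Regress Y on X_1 only."""
    rng = random.Random(seed)
    x1 = [rng.uniform(0.0, 10.0) for _ in range(n)]
    # E(X_2 | X_1) = gamma_0 + gamma_1 X_1
    x2 = [gamma[0] + gamma[1] * a + rng.gauss(0.0, 1.0) for a in x1]
    # E(Y | X_1, X_2) = beta_0 + beta_1 X_1 + beta_2 X_2
    y = [beta[0] + beta[1] * a + beta[2] * b + rng.gauss(0.0, 1.0) for a, b in zip(x1, x2)]
    return slope(x1, y)    # X_2 is omitted, as when IQ is not recorded


beta, gamma = (1.0, 2.0, 0.5), (10.0, 3.0)
print(simulate_short_slope(beta, gamma))               # close to 2 + 0.5 * 3 = 3.5, not 2
print(simulate_short_slope(beta, (10.0, 0.0)))         # gamma_1 = 0: close to beta_1 = 2
print(simulate_short_slope((1.0, 2.0, 0.0), gamma))    # beta_2 = 0: close to beta_1 = 2
```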
Effect of omitting ability on estimates of the returns to education
$Y$ = log earnings. The least-squares coefficient on years of education, multiplied by 100, approximately measures the percent increase in earnings for one extra year of schooling.
Least-squares coefficient on years of education:

Data set                                              IQ omitted   IQ included
Male Ph.D.'s, 1958-1960                               0.0205       0.0213
Rejected low-AFQT military trained applicants (1962)  0.0346       0.0171
NLSYM (1969)                                          0.065        0.059
Veterans - CPS (1964)                                 0.0508       0.0433
NLSYM (1973)                                          0.041        0.030
G-77                                                  0.022        0.014
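A short sketch that reads the table through the omitted-variable formula. In the two-variable setup above, and when both regressions are fit to the same sample, the least-squares coefficients satisfy (short-regression slope) = (long-regression slope on education) + (IQ coefficient) × (slope of IQ on education), so each row's difference "IQ omitted minus IQ included" estimates $\beta_2\gamma_1$. The studies may include other controls, in which case $\gamma_1$ is a partial slope; that interpretation is an assumption here. The numbers are copied from the table.

```python
# Education coefficients (Y = log earnings) copied from the table:
# (IQ omitted, IQ included).
table = {
    "Male Ph.D.'s, 1958-1960": (0.0205, 0.0213),
    "Rejected low-AFQT military trained applicants (1962)": (0.0346, 0.0171),
    "NLSYM (1969)": (0.065, 0.059),
    "Veterans - CPS (1964)": (0.0508, 0.0433),
    "NLSYM (1973)": (0.041, 0.030),
    "G-77": (0.022, 0.014),
}

# Difference = estimate of beta_2 * gamma_1 under the two-variable setup.
diffs = {name: omitted - included for name, (omitted, included) in table.items()}

for name, (omitted, included) in table.items():
    pct = 100 * diffs[name] / omitted      # share of the IQ-omitted estimate
    print(f"{name:55s} diff = {diffs[name]:+.4f}  ({pct:+.0f}% of IQ-omitted estimate)")
```

In five of the six data sets the education coefficient falls when IQ is included, consistent with $\beta_2\gamma_1 > 0$ (IQ raises earnings and is positively related to education); the Ph.D. sample is the exception.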