REGRESSION MODEL ASSUMPTIONS
Date posted: 21-Dec-2015
The Regression Model
• We have hypothesized that:
$$y = \underbrace{\beta_0 + \beta_1 x}_{\text{Regression}} + \underbrace{\varepsilon}_{\text{Error}}$$
• So far we focused on the regression part: getting the best estimates for the $\beta$'s
• Here we focus on the error term, $\varepsilon$
THE RANDOM VARIABLE, $\varepsilon$
• The error term, $\varepsilon$, is a random variable that describes how the observed values, $y_i$, vary around the regression line.
• For any value of x, $\varepsilon$ has a distribution with a mean $\mu$ and a standard deviation $\sigma$
• At any x value $x_i$, the observed value of the error term is called its residual, given by:
$$e_i = y_i - \hat{y}_i$$
STEP 3: 4 ASSUMPTIONS ABOUT $\varepsilon$
The remainder of our discussion about linear regression assumes the following about $\varepsilon$:
• (1) DISTRIBUTION: $\varepsilon$ is distributed normally
• (2) MEAN: The errors average out to 0, i.e. $E(\varepsilon) = \mu = 0$
• (3) STANDARD DEVIATION: $\sigma$ is the same at all values of x
• (4) INDEPENDENCE: The errors are independent of each other
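What Do These Assumptions Imply About y?
• $y = \beta_0 + \beta_1 x + \varepsilon$
– $\beta_0 + \beta_1 x$ is a constant for a given value of x
– $\varepsilon$ is normally distributed with mean 0 and standard deviation $\sigma$.
• Thus y is normally distributed with standard deviation $\sigma$ and mean $E(y)$:
$$E(y) = E(\beta_0 + \beta_1 x + \varepsilon) = E(\beta_0 + \beta_1 x) + E(\varepsilon) = \beta_0 + \beta_1 x + 0 = \beta_0 + \beta_1 x$$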
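The implication for y can be illustrated with a small simulation. This is only a sketch: the values of `BETA0`, `BETA1`, and `SIGMA` are hypothetical and chosen for illustration, not taken from the slides. It draws many errors at one fixed x and compares the sample mean and standard deviation of y with the values the model predicts.

```python
import random
import statistics

# Hypothetical parameter values, chosen only for illustration.
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.5


def simulate_y(x, n_draws, seed=0):
    """Draw n_draws values of y = BETA0 + BETA1*x + eps, with eps ~ Normal(0, SIGMA)."""
    rng = random.Random(seed)
    return [BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA) for _ in range(n_draws)]


x = 10.0
ys = simulate_y(x, n_draws=20000)
sample_mean = statistics.mean(ys)
sample_sd = statistics.stdev(ys)

print(f"E(y) from the model : {BETA0 + BETA1 * x:.3f}")
print(f"simulated mean of y : {sample_mean:.3f}")
print(f"simulated sd of y   : {sample_sd:.3f} (sigma = {SIGMA})")
```

The simulated mean should sit close to $\beta_0 + \beta_1 x$ and the simulated standard deviation close to $\sigma$, since adding a constant to $\varepsilon$ shifts its mean but not its spread.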
BEST ESTIMATE FOR $\sigma$
• The true value of $\sigma$ is unknown.
• It can be estimated by s as follows:
$$s^2 = \frac{SSE}{\text{Degrees of Freedom}} = \frac{\sum (y_i - \hat{y}_i)^2}{\text{Degrees of Freedom}}$$
• Degrees of freedom = n - (# quantities being estimated).
• Here we are estimating two quantities: $\beta_0$ and $\beta_1$.
• Thus degrees of freedom = n - 2, and
$$s^2 = \frac{SSE}{n-2} = \frac{\sum (y_i - \hat{y}_i)^2}{n-2}; \qquad s = \sqrt{s^2}$$
Hand Calculation of SSE
Recall that $\hat{y}_i = 46486.49 + 52.56757\,x_i$

| $i$ | $x_i$ | $y_i$ | $\hat{y}_i$ | $(y_i - \hat{y}_i)^2$ |
|---|---|---|---|---|
| 1 | 1200 | 101000 | 109567.57 | 73403214.02 |
| 2 | 800 | 92000 | 88540.54 | 11967859.75 |
| 3 | 1000 | 110000 | 99054.05 | 119813732.7 |
| 4 | 1300 | 120000 | 114824.32 | 26787618.7 |
| 5 | 700 | 90000 | 83283.78 | 45107560.26 |
| 6 | 800 | 82000 | 88540.54 | 42778670.56 |
| 7 | 1000 | 93000 | 99054.05 | 36651570.49 |
| 8 | 600 | 75000 | 78027.03 | 9162892.622 |
| 9 | 900 | 91000 | 93797.30 | 7824872.169 |
| 10 | 1100 | 105000 | 104310.81 | 474981.7385 |
| SUM | | | | 373972972.97 (SSE) |

$$s^2 = \frac{SSE}{n-2} = \frac{373972972.97}{8} = 46746621.62$$
$$s = \sqrt{46746621.62} = 6837.15$$
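The hand calculation can be reproduced in a few lines. The sketch below takes the ten $(x_i, y_i)$ pairs from the table, refits the line with the usual least-squares formulas (these are assumed here; this section does not spell them out), and then computes SSE, $s^2$, and s with n - 2 degrees of freedom.

```python
import math

# (x_i, y_i) pairs from the hand-calculation table.
x = [1200, 800, 1000, 1300, 700, 800, 1000, 600, 900, 1100]
y = [101000, 92000, 110000, 120000, 90000, 82000, 93000, 75000, 91000, 105000]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Standard least-squares estimates of slope (b1) and intercept (b0).
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Fitted values, residuals, and the error sum of squares.
y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]
sse = sum(e ** 2 for e in residuals)

# Two quantities (b0, b1) were estimated, so degrees of freedom = n - 2.
s_squared = sse / (n - 2)
s = math.sqrt(s_squared)

print(f"b0 = {b0:.2f}, b1 = {b1:.5f}")
print(f"SSE = {sse:.2f}")
print(f"s^2 = {s_squared:.2f}, s = {s:.2f}")
```

The printed values can be compared line by line with the table total and with the $s^2$ and s figures above.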
[Figure: regression output with callouts for s, Residual Error, SSE, and SSE/(n-2) = $s^2$]
Checking the Assumptions
• Many times it is just assumed that the assumptions hold.
• We now show how to check the assumptions.
Residuals
• The assumptions for $\varepsilon$ can be checked using RESIDUAL ANALYSIS.
• A residual, $e_i$, is the observation of $\varepsilon$ at an observed value of x, $x_i$.
• For example, in the Dollar Only example, $y_1 = 101{,}000$ when $x_1 = 1200$:
$$\hat{y}_1 = 46486.49 + 52.56757(1200) = 109{,}567.57$$
$$e_1 = 101{,}000 - 109{,}567.57 = -8{,}567.57$$
Standardized Residuals
• Is a residual of -8,567.57 large?
– It depends on the size of a standard error, s.
• Standardized residual = $e_i$/(standard error of $e_i$ for $x_i$).
• Standardized residuals are easier to use to test the assumptions.
• Two typical ways of standardizing $e_i$ for a particular $x_i$ value are:
$$\frac{e_i}{s} \qquad \text{and} \qquad \frac{e_i}{s\sqrt{1 - h_i}}, \quad \text{where } h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}$$
• Both approaches yield substantially the same results.
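To see how close the two standardizations are in practice, the following sketch applies both to the Dollar Only data. It uses the x, y, and $\hat{y}$ columns from the hand-calculation table as given (so the residuals carry the table's two-decimal rounding); the variable names are illustrative only.

```python
import math

# x_i, y_i and fitted values y_hat_i, copied from the hand-calculation table.
x = [1200, 800, 1000, 1300, 700, 800, 1000, 600, 900, 1100]
y = [101000, 92000, 110000, 120000, 90000, 82000, 93000, 75000, 91000, 105000]
y_hat = [109567.57, 88540.54, 99054.05, 114824.32, 83283.78,
         88540.54, 99054.05, 78027.03, 93797.30, 104310.81]

n = len(x)
e = [yi - yh for yi, yh in zip(y, y_hat)]
s = math.sqrt(sum(ei ** 2 for ei in e) / (n - 2))

# h_i = 1/n + (x_i - x_bar)^2 / sum_j (x_j - x_bar)^2
x_bar = sum(x) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
h = [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]

simple = [ei / s for ei in e]                                   # e_i / s
adjusted = [ei / (s * math.sqrt(1 - hi)) for ei, hi in zip(e, h)]  # e_i / (s*sqrt(1-h_i))

for i, (a, b) in enumerate(zip(simple, adjusted), start=1):
    print(f"obs {i:2d}: e/s = {a:+.3f}   e/(s*sqrt(1-h)) = {b:+.3f}")
```

Because $0 < h_i < 1$, the second form is always at least as large in absolute value as the first, and the gap is widest for x values far from $\bar{x}$.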
Standardized Residuals in Excel
• Excel uses the following formula:
$$\frac{e_i}{s\sqrt{\dfrac{n-2}{n-1}}}$$
This still gives approximately the same values as the other methods. We will use the ones generated by Excel to check the assumptions.
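A short sketch of the Excel formula, again using the y and $\hat{y}$ columns of the hand-calculation table. Since $s\sqrt{(n-2)/(n-1)} = \sqrt{SSE/(n-1)}$ and the residuals average to 0, this divisor is in effect the ordinary sample standard deviation of the residuals; the code checks that numerically rather than relying on any Excel function.

```python
import math
import statistics

# y_i and fitted values y_hat_i, copied from the hand-calculation table.
y = [101000, 92000, 110000, 120000, 90000, 82000, 93000, 75000, 91000, 105000]
y_hat = [109567.57, 88540.54, 99054.05, 114824.32, 83283.78,
         88540.54, 99054.05, 78027.03, 93797.30, 104310.81]

n = len(y)
e = [yi - yh for yi, yh in zip(y, y_hat)]
s = math.sqrt(sum(ei ** 2 for ei in e) / (n - 2))

# Divisor from the slide: s * sqrt((n - 2) / (n - 1)).
excel_divisor = s * math.sqrt((n - 2) / (n - 1))
excel_std = [ei / excel_divisor for ei in e]

# The same divisor, seen as the sample standard deviation of the residuals.
residual_sd = statistics.stdev(e)

print(f"s * sqrt((n-2)/(n-1)) = {excel_divisor:.2f}")
print(f"stdev of residuals    = {residual_sd:.2f}")
for i, z in enumerate(excel_std, start=1):
    print(f"obs {i:2d}: {z:+.3f}")
```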
Checking to See if Errors (Residuals) Appear to Come From a Normal Distribution
TWO WAYS TO CHECK
• Construct a plot of standardized residuals and see if they look normal
– Could use Histogram from Data Analysis
– A “quick check”: standardized residuals are like z-values. Check to see if about 68% are between ±1, 95% between ±2, and virtually all between ±3.
• Look at a normal probability plot. These are statistical plots to check for “normality”. A “perfect” normal distribution would be a straight line on such a plot.
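Both checks can be sketched without a plotting tool. The code below standardizes the Dollar Only residuals with the divisor $\sqrt{SSE/(n-1)}$, counts the shares inside ±1, ±2, and ±3, and then lists the coordinates of a normal probability plot. The plotting positions $(i - 0.5)/n$ are an assumption (one common convention); this is not a reproduction of what Excel draws.

```python
import math
from statistics import NormalDist

# y_i and fitted values y_hat_i, copied from the hand-calculation table.
y = [101000, 92000, 110000, 120000, 90000, 82000, 93000, 75000, 91000, 105000]
y_hat = [109567.57, 88540.54, 99054.05, 114824.32, 83283.78,
         88540.54, 99054.05, 78027.03, 93797.30, 104310.81]

n = len(y)
e = [yi - yh for yi, yh in zip(y, y_hat)]
scale = math.sqrt(sum(ei ** 2 for ei in e) / (n - 1))
z = [ei / scale for ei in e]


def share_within(values, k):
    """Fraction of values lying strictly between -k and +k."""
    return sum(1 for v in values if abs(v) < k) / len(values)


# Quick check against the normal benchmarks.
for k, benchmark in [(1, "68%"), (2, "95%"), (3, "virtually all")]:
    print(f"within ±{k}: {share_within(z, k):.0%} (normal: about {benchmark})")

# Normal probability plot coordinates: theoretical quantile vs. sorted residual.
# Assumption: plotting positions (i - 0.5) / n.
std_normal = NormalDist()
points = [
    (std_normal.inv_cdf((i - 0.5) / n), zi)
    for i, zi in enumerate(sorted(z), start=1)
]
for q, zi in points:
    print(f"theoretical {q:+.3f}   observed {zi:+.3f}")
```

If the observed column rises roughly in step with the theoretical column, the points would fall near a straight line on the plot. With only ten observations, the percentages can only move in steps of 10%, so the quick check is coarse.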
Checking to See if $\sigma$ Is Constant
• Look at the residual plot to see if the points seem more spread out at some x’s than at others. In the Dollar Only example, it did not appear so on the Excel residual plot.
• Constant $\sigma$ is called homoscedasticity!
• If the points had looked like the next page, where there is less variation for lower values of x than at higher values, the constant variation assumption would have been violated. This is called heteroscedasticity!
[Figure: residual plot of e against x. Heteroscedasticity: Nonconstant Variance]
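A fan-shaped pattern like this can also be spotted numerically. The sketch below is an informal check that is not part of the slides: it splits the errors at the median x and compares their standard deviations in the two halves. The data are simulated with hypothetical settings (error spread proportional to x in one series, constant in the other), purely to show what each case looks like.

```python
import random
import statistics


def spread_by_half(x, e):
    """Return (sd of e over the lower half of x, sd of e over the upper half)."""
    pairs = sorted(zip(x, e))
    mid = len(pairs) // 2
    low = [ei for _, ei in pairs[:mid]]
    high = [ei for _, ei in pairs[mid:]]
    return statistics.stdev(low), statistics.stdev(high)


rng = random.Random(1)
x = list(range(1, 201))

# Hypothetical errors: spread grows with x (heteroscedastic) ...
e_fan = [rng.gauss(0.0, 0.05 * xi) for xi in x]
# ... versus the same spread at every x (homoscedastic).
e_flat = [rng.gauss(0.0, 5.0) for _ in x]

low_fan, high_fan = spread_by_half(x, e_fan)
low_flat, high_flat = spread_by_half(x, e_flat)
ratio_fan = high_fan / low_fan
ratio_flat = high_flat / low_flat

print(f"spread grows with x: sd ratio (high x / low x) = {ratio_fan:.2f}")
print(f"constant spread    : sd ratio (high x / low x) = {ratio_flat:.2f}")
```

A ratio well above 1 matches the picture above; a ratio near 1 is what constant $\sigma$ would produce. With few observations the ratio is noisy, so it supplements the residual plot rather than replacing it.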
Checking Independence
• This is mainly for time series data (i.e. the x-axis is time) used in forecasting
• But basically, if the data look like the next slide, the errors are not independent
– In this case whether you have a positive or negative error (residual) depends on the x-value.
– This is called autocorrelation.
[Figure: plot of Y against X = time. Example of Autocorrelation (Errors are Dependent on x)]
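One informal way to put a number on this pattern, not taken from the slides, is the lag-1 autocorrelation of the residuals in time order: the sum of products of neighbouring residuals divided by the sum of squared residuals. The sketch below applies it to two simulated error series; the carry-over factor `PHI` and both series are hypothetical.

```python
import random


def lag1_autocorrelation(e):
    """Sum of e[t] * e[t-1] divided by the sum of e[t]**2 (residuals in time order)."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    den = sum(v ** 2 for v in e)
    return num / den


rng = random.Random(7)
n = 200
PHI = 0.8  # hypothetical carry-over from one period's error to the next

# Independent errors: each draw ignores the previous one.
independent = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Dependent errors: each error keeps a share PHI of the previous error.
dependent = []
prev = 0.0
for _ in range(n):
    prev = PHI * prev + rng.gauss(0.0, 1.0)
    dependent.append(prev)

r_ind = lag1_autocorrelation(independent)
r_dep = lag1_autocorrelation(dependent)

print(f"independent errors: lag-1 autocorrelation = {r_ind:+.2f}")
print(f"dependent errors  : lag-1 autocorrelation = {r_dep:+.2f}")
```

Values near 0 are consistent with independent errors, while values well away from 0 indicate runs of same-signed residuals like those in the figure.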
Residual Analysis in Excel
CHECK:
Residuals
Standardized Residuals
Residual Plots
Normal Probability Plots
Standardized Residuals
• 70% are between ±1
• 100% are between ±2
• “Close” to expected normal values
• Residual values appear to average out to 0 everywhere.
• There is no discernible pattern for the errors.
Normal Probability Plot
• The following is the normal probability plot generated by Excel. Again Excel does it “slightly wrong”, but it should give us a good idea.
• Looks close to a straight line – normality assumption appears valid.
[Figure: Normal Probability Plot. x-axis: Sample Percentile (0 to 100); y-axis: Sales (0 to 150000)]
Review
• 4 assumptions about $\varepsilon$
1. $\varepsilon$ is normal.
2. $\mu = E(\varepsilon) = 0$.
3. $\sigma$ is the same for all values of x.
4. Errors are independent.
• Checking The Assumptions
– Check residual plot to see if variation changes for different values of x.
– Check normality assumption by a normal probability plot or by creating a histogram of standardized residuals.
• Does it appear normal and centered around 0?
• Are about 68% between ±1, 95% between ±2, almost all between ±3?