Simple Regression I
![Page 1: Simple Regression I](https://reader031.fdocuments.us/reader031/viewer/2022020718/5681330e550346895d99ccf0/html5/thumbnails/1.jpg)
Simple Regression 1
Simple Regression I
![Page 2: Simple Regression I](https://reader031.fdocuments.us/reader031/viewer/2022020718/5681330e550346895d99ccf0/html5/thumbnails/2.jpg)
Regression Analysis
Correlation tells us how strongly Y and X are related, but regression estimates the form of this relationship.
We’ll begin with simple regression, which assumes the form:
$$\hat{Y}_i = b_0 + b_1 X_i$$
![Page 3: Simple Regression I](https://reader031.fdocuments.us/reader031/viewer/2022020718/5681330e550346895d99ccf0/html5/thumbnails/3.jpg)
Regression Notation
Y is the variable we want to predict
We believe X influences how Y behaves
Ŷi is the estimated value of Y at Xi
b0 is the Y-intercept in the equation
b1 is the slope of the regression line
![Page 4: Simple Regression I](https://reader031.fdocuments.us/reader031/viewer/2022020718/5681330e550346895d99ccf0/html5/thumbnails/4.jpg)
Fitting the Regression Line
Our goal: Find the straight line that best fits the data we’ve collected
The best equation will be the one that minimizes the error in fit
The equation is:
$$\hat{Y}_i = b_0 + b_1 X_i$$
The fit error is thus:
$$e_i = Y_i - \hat{Y}_i$$
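As a minimal sketch of the fit error defined above, the snippet below evaluates e_i = Y_i - Ŷ_i for one candidate line. The data values, the candidate coefficients, and the function name `fit_errors` are made up for illustration and do not come from the slides.

```python
def fit_errors(xs, ys, b0, b1):
    """Return e_i = Y_i - Yhat_i for the candidate line Yhat = b0 + b1 * X."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Made-up illustrative data (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Candidate line Yhat = 0 + 2 * X, chosen arbitrarily for the example.
errors = fit_errors(xs, ys, b0=0.0, b1=2.0)
for x, y, e in zip(xs, ys, errors):
    print(f"X={x}  Y={y}  error={e:+.2f}")
```

Points above the candidate line produce positive errors and points below it produce negative errors.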
![Page 5: Simple Regression I](https://reader031.fdocuments.us/reader031/viewer/2022020718/5681330e550346895d99ccf0/html5/thumbnails/5.jpg)
Obtaining the line
[Figure: scatterplot with a fitted line (x-axis 0 to 7, y-axis 0 to 14); the fit errors are labeled "+ Errors" and "- Errors".]
![Page 6: Simple Regression I](https://reader031.fdocuments.us/reader031/viewer/2022020718/5681330e550346895d99ccf0/html5/thumbnails/6.jpg)
Balancing out the errors
The fit error for the ith point on the scatterplot is:
$$e_i = Y_i - \hat{Y}_i$$
We would like the sum of the + errors to equal, in magnitude, the sum of the - errors, so that the errors sum to zero.
However, many different lines can make this happen.
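One way to see why many lines balance the errors: any line forced through the point of means (X̄, Ȳ) has errors that sum to zero, whatever its slope. The sketch below checks this numerically. The data, the function name `error_sum`, and the slopes tried are illustrative assumptions, not values from the slides.

```python
# Made-up illustrative data (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)


def error_sum(b1):
    """Sum of fit errors for the line with slope b1 passing through (x_bar, y_bar)."""
    b0 = y_bar - b1 * x_bar  # this intercept forces the line through the means
    return sum(y - (b0 + b1 * x) for x, y in zip(xs, ys))


# Very different slopes, yet each line balances its + and - errors.
sums = {b1: error_sum(b1) for b1 in (0.0, 1.0, 2.0, 5.0)}
for slope, total in sums.items():
    print(f"slope {slope}: sum of errors = {abs(total):.10f}")
```

Because every slope passes this test, a zero error sum alone cannot single out one best line.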
![Page 7: Simple Regression I](https://reader031.fdocuments.us/reader031/viewer/2022020718/5681330e550346895d99ccf0/html5/thumbnails/7.jpg)
Zero Error Lines
![Page 8: Simple Regression I](https://reader031.fdocuments.us/reader031/viewer/2022020718/5681330e550346895d99ccf0/html5/thumbnails/8.jpg)
The “Least Squares” Line
So, which of these solutions is the best one?
Select the line with the minimum sum of squared error terms. This is called least-squares regression.
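To illustrate the least-squares criterion, the sketch below tries a grid of slopes for lines through the point of means and keeps the one with the smallest sum of squared errors. This brute-force search is only a demonstration, not how the estimates are normally obtained. The data, the grid, and the name `sum_squared_errors` are illustrative assumptions.

```python
# Made-up illustrative data (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)


def sum_squared_errors(b1):
    """Sum of squared fit errors for the zero-error-sum line with slope b1."""
    b0 = y_bar - b1 * x_bar
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Candidate slopes 0.00, 0.01, ..., 4.00 (an arbitrary grid for the demo).
slopes = [k / 100 for k in range(0, 401)]
best_slope = min(slopes, key=sum_squared_errors)
best_intercept = y_bar - best_slope * x_bar

print(f"best slope on the grid: {best_slope}")
print(f"matching intercept: {best_intercept:.2f}")
```

Among all the lines whose errors sum to zero, exactly one slope minimizes the squared errors, which is why this criterion yields a unique line.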
![Page 9: Simple Regression I](https://reader031.fdocuments.us/reader031/viewer/2022020718/5681330e550346895d99ccf0/html5/thumbnails/9.jpg)
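The Least Squares Estimators
Intercept:
$$b_0 = \bar{Y} - b_1 \bar{X}$$
Slope:
$$b_1 = \frac{SS_{xy}}{SS_x} = \frac{\mathrm{COVAR}(x, y)}{\mathrm{Var}(x)} \cdot \frac{n}{n - 1}$$
* Note: COVAR here is Excel’s COVAR function, which computes the population covariance, not the sample covariance; hence the correction factor n/(n - 1).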
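The estimators above can be sketched in a few lines. The snippet computes the slope both from the sums of squares and from a population covariance divided by a variance with the n/(n - 1) correction. It assumes that Var(x) is the sample variance (n - 1 divisor), which is what makes the correction factor work out. The data and function names are made up for illustration.

```python
import statistics

# Made-up illustrative data (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def least_squares(xs, ys):
    """Return (b0, b1) using b1 = SS_xy / SS_x and b0 = Ybar - b1 * Xbar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


def slope_from_covariance(xs, ys):
    """Slope from population covariance / sample variance, times n / (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Population covariance (divides by n), as the slide says Excel's COVAR does.
    pop_cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    # Sample variance (divides by n - 1); an assumption about Var(x).
    sample_var = statistics.variance(xs)
    return pop_cov / sample_var * n / (n - 1)


b0, b1 = least_squares(xs, ys)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"slope via covariance route = {slope_from_covariance(xs, ys):.4f}")
```

Both routes give the same slope because the n and n - 1 divisors cancel once the correction factor is applied.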
![Page 10: Simple Regression I](https://reader031.fdocuments.us/reader031/viewer/2022020718/5681330e550346895d99ccf0/html5/thumbnails/10.jpg)
Getting the Estimates in Excel
Some values can be calculated directly using the means, variances, and covariances.
For one-variable (simple) regression, you can add a trendline to a chart.
You can use the Regression option in the Data Analysis tool.
You can use the Excel function LINEST.
![Page 11: Simple Regression I](https://reader031.fdocuments.us/reader031/viewer/2022020718/5681330e550346895d99ccf0/html5/thumbnails/11.jpg)
Regression with mail data
[Figure: scatterplot of Y against X (x-axis 0 to 800, y-axis 0 to 25) with the fitted trendline f(x) = 0.0297026689096403 x + 0.191221309475372]
Uses Excel’s Trend Line function
![Page 12: Simple Regression I](https://reader031.fdocuments.us/reader031/viewer/2022020718/5681330e550346895d99ccf0/html5/thumbnails/12.jpg)
Output from Data Analysis Tool
![Page 13: Simple Regression I](https://reader031.fdocuments.us/reader031/viewer/2022020718/5681330e550346895d99ccf0/html5/thumbnails/13.jpg)
Output from LINEST
The LINEST function must be entered as an array formula. For the example, highlight the cells E3:F7, type the formula “=LINEST(Orders,Weight,1,1)”, then press Ctrl+Shift+Enter.
![Page 14: Simple Regression I](https://reader031.fdocuments.us/reader031/viewer/2022020718/5681330e550346895d99ccf0/html5/thumbnails/14.jpg)
Interpretation of Results
Remember that the variables are X = weight in pounds and Y = orders in 1000s.
The estimated intercept (b0) tells us that if there were no mail, the model would still predict (.1912)(1000), or about 191.2, orders per day.
The estimated slope (b1) tells us that each additional pound of mail tends to bring with it (.0297)(1000), or about 29.7, orders.
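The interpretation can be checked by plugging weights into the fitted trendline. The coefficients below are the ones shown on the trendline chart; the function name `predicted_orders` and the 400-pound example weight are hypothetical choices for illustration.

```python
# Coefficients from the trendline equation on the mail-data chart.
B0 = 0.191221309475372   # intercept, in thousands of orders
B1 = 0.0297026689096403  # slope, in thousands of orders per pound


def predicted_orders(weight_lbs):
    """Predicted number of orders (not thousands) for a given mail weight."""
    return (B0 + B1 * weight_lbs) * 1000


# Intercept: the prediction when no mail arrives.
print(f"no mail: {predicted_orders(0):.1f} orders")

# Slope: the change in the prediction from one extra pound of mail.
extra = predicted_orders(401) - predicted_orders(400)
print(f"one extra pound: {extra:.1f} more orders")

# A hypothetical day with 400 pounds of mail.
print(f"400 lbs: {predicted_orders(400):.0f} orders")
```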
![Page 15: Simple Regression I](https://reader031.fdocuments.us/reader031/viewer/2022020718/5681330e550346895d99ccf0/html5/thumbnails/15.jpg)
How Good Is Our New Model?
There are two standard ways to judge:
1. How much of the variation in the Y values (orders) can be attributed to the different values of X (weight of mail)?
2. In general, how small (or large) are the errors in fit?
![Page 16: Simple Regression I](https://reader031.fdocuments.us/reader031/viewer/2022020718/5681330e550346895d99ccf0/html5/thumbnails/16.jpg)
R² – A Universal Measure of Fit
The Coefficient of Determination:
$$R^2 = \frac{\text{The variation in Y explained by the X-Y relationship}}{\text{The variation in Y}}$$
The R² value is:
◦ Always between 0 and 1
◦ The proportion (often quoted as a percentage) of the variation in Y explained by the model
◦ The square of the correlation (for simple regression)
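The last property, that R² equals the squared correlation in simple regression, can be checked numerically. The sketch below fits a least-squares line to made-up data, computes R² as explained variation over total variation, and compares it with the squared correlation. The data and the function name `r_squared_and_correlation` are illustrative assumptions.

```python
import math

# Made-up illustrative data (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def r_squared_and_correlation(xs, ys):
    """Return (R^2, r) for the least-squares line fitted to xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    sst = sum((y - y_bar) ** 2 for y in ys)  # total variation in Y
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained

    r2 = (sst - sse) / sst               # explained / total
    r = ss_xy / math.sqrt(ss_x * sst)    # correlation between X and Y
    return r2, r


r2, r = r_squared_and_correlation(xs, ys)
print(f"R^2 = {r2:.6f}, r^2 = {r * r:.6f}")
```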
![Page 17: Simple Regression I](https://reader031.fdocuments.us/reader031/viewer/2022020718/5681330e550346895d99ccf0/html5/thumbnails/17.jpg)
How is R² computed?
ANOVA table: Total variation in the Y values is SST = 449.76
The amount of unexplained variation is SSE = 12.12
The difference is thus the variation explained by the regression equation, or SSR = 449.76 – 12.12 = 437.64
The ratio of explained to total is how we get R² = 437.64/449.76 = .973
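The same arithmetic, written out as a short check using the SST and SSE values quoted from the ANOVA table (the variable names are only for illustration):

```python
sst = 449.76  # total variation in Y, from the ANOVA table
sse = 12.12   # unexplained variation, from the ANOVA table

ssr = sst - sse          # variation explained by the regression
r_squared = ssr / sst    # explained / total

print(f"SSR = {ssr:.2f}, R^2 = {r_squared:.3f}")  # SSR = 437.64, R^2 = 0.973
```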
![Page 18: Simple Regression I](https://reader031.fdocuments.us/reader031/viewer/2022020718/5681330e550346895d99ccf0/html5/thumbnails/18.jpg)
Size of the Typical Error (S)
For every observation i, its error is given by:
$$e_i = Y_i - \hat{Y}_i$$
To find the “typical error,” use this formula:
$$S = \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n - 2}}$$
This is the “Standard Error”, also the √MSE.
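A minimal sketch of the typical-error formula, assuming a line has already been fitted. The data and the coefficients passed in are made up (the coefficients are the least-squares values for this toy data set), and `standard_error` is a hypothetical helper name.

```python
import math


def standard_error(xs, ys, b0, b1):
    """Return S = sqrt(sum(e_i^2) / (n - 2)) for the line Yhat = b0 + b1 * X."""
    n = len(xs)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))  # n - 2: two coefficients were estimated


# Made-up illustrative data (not from the slides) and its least-squares line.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
s = standard_error(xs, ys, b0=0.05, b1=1.99)
print(f"S = {s:.4f}")
```

S is expressed in the same units as Y, which is what makes it readable as a typical miss.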
![Page 19: Simple Regression I](https://reader031.fdocuments.us/reader031/viewer/2022020718/5681330e550346895d99ccf0/html5/thumbnails/19.jpg)
S in our example
The typical error (called the standard error of prediction) for our regression model is: S = .7258
This means that we typically misestimate the actual number of orders per day by (.7258)(1000) = 725.8
That may sound like a lot, but consider that we have between 5 and 20 thousand orders each day. With an average of (13.22)(1000) = 13,220 orders, the percentage error is only 725.8 / 13,220 = 5.5%.
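The percentage calculation can be reproduced from the two figures quoted above, S = .7258 and the average of 13.22 (both in thousands of orders). The variable names are illustrative only.

```python
s = 0.7258           # standard error, in thousands of orders
mean_orders = 13.22  # average daily orders, in thousands

typical_error = s * 1000            # typical miss, in orders
average_orders = mean_orders * 1000  # average daily orders
relative_error = typical_error / average_orders

print(f"typical error: {typical_error:.1f} orders")
print(f"relative to the average day: {relative_error:.1%}")
```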
![Page 20: Simple Regression I](https://reader031.fdocuments.us/reader031/viewer/2022020718/5681330e550346895d99ccf0/html5/thumbnails/20.jpg)
Sales Data
![Page 21: Simple Regression I](https://reader031.fdocuments.us/reader031/viewer/2022020718/5681330e550346895d99ccf0/html5/thumbnails/21.jpg)
Sales Data Manual
![Page 22: Simple Regression I](https://reader031.fdocuments.us/reader031/viewer/2022020718/5681330e550346895d99ccf0/html5/thumbnails/22.jpg)
Sales Data Graphical
![Page 23: Simple Regression I](https://reader031.fdocuments.us/reader031/viewer/2022020718/5681330e550346895d99ccf0/html5/thumbnails/23.jpg)
Sales Data Tools
![Page 24: Simple Regression I](https://reader031.fdocuments.us/reader031/viewer/2022020718/5681330e550346895d99ccf0/html5/thumbnails/24.jpg)
Sales Data LINEST