Section 10.3 Regression
Transcript of Section 10.3 Regression
Objective
Given two linearly correlated variables (x and y), find the linear function (equation) that best describes the trend.
Equation of a line
Recall that the equation of a line is given by its slope and y-intercept
y = m x + b
Regression
For a set of data (with variables x and y) that is linearly correlated, we want to find the equation of the line that best describes the trend.
This process is called regression.
Definitions

x : the predictor variable (also called the explanatory variable or independent variable)

y : the response variable (also called the dependent variable)

Regression Equation : the equation that describes the algebraic relationship between the two variables

Regression Line : the graph of the regression equation (also called the line of best fit or least squares line)
Regression Equation

ŷ = b0 + b1x

b0 : y-intercept; b1 : slope

The regression line is the graph of the regression equation.
Notation for Regression Equation

|            | y-intercept | Slope | Equation       |
|------------|-------------|-------|----------------|
| Population | β0          | β1    | y = β0 + β1x   |
| Sample     | b0          | b1    | ŷ = b0 + b1x   |
Requirements

1. The sample of paired (x, y) data is a random sample of quantitative data.

2. Visual examination of the scatterplot shows that the points approximate a straight-line pattern.

3. Any outliers must be removed if they are known to be errors. Consider the effects of any outliers that are not known errors.
Rounding b0 and b1
Round to three significant digits
If you use the formulas from the book, do not round intermediate values.
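Rounding to significant digits (rather than decimal places) can be sketched with a small helper; `round_sig` is my name for it, not a function from the book or StatCrunch:

```python
from math import floor, log10

def round_sig(value, digits=3):
    """Round a value to the given number of significant digits."""
    if value == 0:
        return 0.0
    # Shift the rounding position based on the value's order of magnitude.
    return round(value, -int(floor(log10(abs(value)))) + (digits - 1))

# Round only the final coefficients, never the intermediate values.
print(round_sig(0.945021))   # slope reported as 0.945
print(round_sig(123456))     # large values round in the same way
```

Note that Python's built-in `round` rounds a *decimal place*, so the helper first shifts that place according to the magnitude of the value.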
Example 1

Refer to the sample data given in Table 10-1 in the Chapter Problem. Find the equation of the regression line in which the explanatory variable (x-variable) is the cost of a slice of pizza and the response variable (y-variable) is the corresponding cost of a subway fare. (CPI = Consumer Price Index; that column is not used.)
x : 0.15 0.35 1.00 1.25 1.75 2.00
y : 0.15 0.35 1.00 1.35 1.50 2.00
1. Enter the data in StatCrunch (as two columns)
2. Stat – Regression – Simple Linear
3. Select var1 and var2 (i.e., the x and y columns), then click Calculate
b0 = 0.0345
b1 = 0.945
Regression Equation

ŷ = 0.0345 + 0.945x
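The StatCrunch output can be reproduced by hand with a standard form of the least-squares formulas, b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1x̄; a minimal sketch in plain Python (variable names are mine):

```python
# Least-squares fit for the Example 1 data, with no intermediate rounding.
x = [0.15, 0.35, 1.00, 1.25, 1.75, 2.00]  # cost of a slice of pizza
y = [0.15, 0.35, 1.00, 1.35, 1.50, 2.00]  # cost of a subway fare

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)

b1 = sxy / sxx          # slope; the slide reports 0.945
b0 = ybar - b1 * xbar   # y-intercept; the slide reports 0.0345

print(f"yhat = {b0:.4f} + {b1:.3f}x")
```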
Using the Regression Equation for Predictions

1. The predicted value of y is ŷ = b0 + b1x.

2. Use the regression equation for predictions only if the graph of the regression line on the scatterplot confirms that the regression line fits the points reasonably well.

3. Use the regression equation for predictions only if the linear correlation coefficient r indicates that there is a linear correlation between the two variables.
4. Use the regression line for predictions only if the value of x does not go much beyond the scope of the available sample data. Predicting too far beyond the scope of the available sample data is called extrapolation, and it could result in bad predictions.

5. If the regression equation does not appear to be useful for making predictions, the best predicted value of a variable is its point estimate, which is its sample mean (ȳ).
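Assuming the conditions above hold for Example 1 (the line fits well, r is significant, and x stays within the sampled range of 0.15 to 2.00), a prediction is a direct substitution into the regression equation; the function name here is mine, the coefficients come from Example 1:

```python
def predict(x, b0=0.0345, b1=0.945):
    """Predicted subway fare for a given pizza price (Example 1 equation)."""
    return b0 + b1 * x

# x = 1.50 lies inside the sample's x-range (0.15 to 2.00), so this is
# interpolation, not extrapolation.
print(f"{predict(1.50):.3f}")  # → 1.452
```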
(Comic illustrating the dangers of extrapolation. Source: www.xkcd.com)
Strategy for Predicting Values of Y
If the regression equation is not a good model, the best predicted value of y is simply ȳ (the mean of the y values).

Remember, this strategy applies to linear patterns of points in a scatterplot.
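The strategy above boils down to a single decision: substitute into the regression equation when it is a good model, otherwise fall back to ȳ. In this sketch the `good_model` flag stands in for the scatterplot and correlation checks, which require judgment; the names are mine:

```python
y = [0.15, 0.35, 1.00, 1.35, 1.50, 2.00]  # Example 1 subway fares
ybar = sum(y) / len(y)

def best_predicted_value(x, good_model, b0=0.0345, b1=0.945):
    """Use the regression equation only if it is a good model;
    otherwise the best prediction is the sample mean of y."""
    return b0 + b1 * x if good_model else ybar

print(best_predicted_value(1.50, good_model=True))   # uses the regression line
print(best_predicted_value(1.50, good_model=False))  # falls back to ybar
```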
Definition

For a pair of sample x and y values, the residual is the difference between the observed sample value of y and the y-value that is predicted by using the regression equation. That is,

Residual = (observed y) − (predicted y) = y − ŷ
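For instance, the residuals for the Example 1 data under its regression equation can be computed directly (the rounded coefficients from Example 1 are used here, so the residuals are approximate):

```python
x = [0.15, 0.35, 1.00, 1.25, 1.75, 2.00]
y = [0.15, 0.35, 1.00, 1.35, 1.50, 2.00]
b0, b1 = 0.0345, 0.945  # Example 1 coefficients (rounded)

# residual = observed y minus predicted y
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
for xi, yi, r in zip(x, y, residuals):
    print(f"x={xi:.2f}  observed={yi:.2f}  predicted={b0 + b1 * xi:.4f}  residual={r:+.4f}")
```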
Residuals

(Figure: residuals shown as the vertical distances between the data points and the regression line.)
Definition

A straight line satisfies the least-squares property if the sum of the squares of the residuals is the smallest sum possible.

The best possible regression line satisfies this property (hence it is also called the least squares line).
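The least-squares property can be checked numerically on the Example 1 data: the fitted line's sum of squared residuals is smaller than that of any other line, illustrated here with a few arbitrary perturbations of the coefficients (a spot check, not a proof):

```python
x = [0.15, 0.35, 1.00, 1.25, 1.75, 2.00]
y = [0.15, 0.35, 1.00, 1.35, 1.50, 2.00]

def sse(b0, b1):
    """Sum of squared residuals for the line y = b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Exact least-squares coefficients for this data set.
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

best = sse(b0, b1)
# Every perturbed line yields a strictly larger sum of squared residuals.
for db0, db1 in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1), (0.05, -0.05)]:
    assert sse(b0 + db0, b1 + db1) > best
print("least-squares line minimizes SSE:", round(best, 4))
```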
Least Squares Property
sum = (−5)² + 11² + (−13)² + 7² = 364

(Any other line would yield a sum of squared residuals larger than 364.)
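The slide's arithmetic checks out: squaring the four residuals and summing gives 25 + 121 + 169 + 49 = 364.

```python
residuals = [-5, 11, -13, 7]
total = sum(r ** 2 for r in residuals)  # 25 + 121 + 169 + 49
print(total)  # → 364
```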