Post on 22-Dec-2015
1
Simple Linear Regression
• Linear regression model
• Prediction
• Limitation
• Correlation
2
Example: Computer Repair
A company markets and repairs small computers. How fast (Time) an electronic component (Computer Unit) can be repaired is very important to the efficiency of the company. The variables in this example are Time and Units.
3
Hmm…
How long will it take me to repair this unit?
Goal: to predict the length of repair Time for a given number of computer Units
4
Computer Repair Data
Units  Minutes    Units  Minutes
1      23         6      97
2      29         7      109
3      49         8      119
4      64         9      149
4      74         9      145
5      87         10     154
6      96         10     166
5
Graphical Summary of Two Quantitative Variables
Scatterplot of response variable against explanatory variable
• What is the overall (average) pattern?
• What is the direction of the pattern?
• How much do data points vary from the overall (average) pattern?
• Any potential outliers?
6
Summary for Computer Repair Data
Scatterplot (Time vs Units): Some Simple Conclusions
• Time is linearly related to computer Units.
• (The length of) Time increases as (the number of) Units increases.
• Data points are close to the line.
• No potential outliers.
7
Numerical Summary of Two Quantitative Variables
Regression Model
Correlation
8
Linear Regression Model
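Y: the response variable
X: the explanatory variable
$Y = b_0 + b_1 X + \text{error}$
[Figure: a line plotted with X on the horizontal axis and Y on the vertical axis; $b_0$ is the intercept and $b_1$ is the change in Y per 1-unit increase in X]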
9
Linear Regression Model
The regression line models the relationship between X and Y on average.
10
Prediction
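$\hat{Y}$: predicted value of Y for a given X value
Regression equation: $\hat{Y} = \hat{b}_0 + \hat{b}_1 X$
For the computer repair data: $\hat{Y} = 4.16 + 15.51X$
E.g. How long will it take to repair 3 computer units? $\hat{Y} = 4.16 + 15.51(3) = 50.69$ minutes.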
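The slides do not say how the coefficients were obtained. A minimal sketch, assuming they are ordinary least-squares estimates computed from the Computer Repair Data table, is shown here. The function name `fit_line` is illustrative only.

```python
# Computer repair data from the slides: (units, minutes)
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]


def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1


b0, b1 = fit_line(units, minutes)
predicted_3 = b0 + b1 * 3      # predicted repair time for 3 units

print(f"Y-hat = {b0:.2f} + {b1:.2f} X")
print(f"Predicted time for 3 units: {predicted_3:.2f} minutes")
```

Under that assumption the fit agrees with the rounded coefficients 4.16 and 15.51 in the regression equation.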
11
The Limitation of the Regression Equation
The regression equation cannot be used to predict the Y value for X values that are (far) beyond the range in which the data were observed.
E.g. The predicted weight (WT) for a given height (HT): $\hat{Y} = -205 + 5X$
Given an HT of 40", the regression equation gives a WT of -205 + 5 × 40 = -5 pounds!
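One way to guard against this kind of extrapolation is to refuse predictions outside the observed X range. The sketch below uses the slide's equation, but the height range of 60 to 75 inches is a hypothetical placeholder, since the slides do not give the observed range. The function names are also illustrative.

```python
def predict_weight(height_in):
    """Regression equation from the slide: WT = -205 + 5 * HT."""
    return -205 + 5 * height_in


# ASSUMPTION: hypothetical observed height range in inches (not from the slides).
HT_MIN, HT_MAX = 60, 75


def predict_weight_checked(height_in):
    """Predict only inside the assumed observed range; otherwise raise."""
    if not HT_MIN <= height_in <= HT_MAX:
        raise ValueError(
            f"HT={height_in} is outside the observed range "
            f"[{HT_MIN}, {HT_MAX}]; prediction would be an extrapolation"
        )
    return predict_weight(height_in)


print(predict_weight(40))          # the nonsensical extrapolated value
print(predict_weight_checked(70))  # inside the assumed range
```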
12
The Unpredicted Part
The value $Y - \hat{Y}$ is the part the regression equation (model) cannot predict, and it is called the “residual.”
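As an illustration, the residuals for the computer repair data can be computed from the rounded regression equation (4.16 + 15.51 X). This is a sketch: because the coefficients are rounded, the residuals sum only approximately to zero, whereas an unrounded least-squares fit with an intercept gives a sum of exactly zero.

```python
# Computer repair data from the slides: (units, minutes)
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

# Rounded coefficients taken from the regression equation in the slides
b0, b1 = 4.16, 15.51

fitted = [b0 + b1 * x for x in units]                  # Y-hat for each unit count
residuals = [y - f for y, f in zip(minutes, fitted)]   # Y - Y-hat

for x, y, e in zip(units, minutes, residuals):
    print(f"units={x:2d}  minutes={y:3d}  residual={e:7.2f}")
print(f"sum of residuals: {sum(residuals):.2f}")
```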
13
[Figure: a residual, marked with a brace on the plot]
14
Correlation between X and Y
X and Y might be related to each other in many ways: linear or curved.
15
Examples of Different Levels of Correlation
[Figure: two scatterplots of y vs x]
r = .98: Strong Linearity
r = .71: Moderate Linearity
16
Examples of Different Levels of Correlation
[Figure: two scatterplots of y vs x]
r = -.09: Nearly Uncorrelated
r = .00: Nearly Curved
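The r = .00 panel can look surprising: a curved pattern may be a very strong relationship and still have a correlation of zero. A small sketch with made-up data (y = x² on symmetric x values, not the data behind the figure) shows the effect. The helper `pearson_r` is an illustrative name.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: s_xy / sqrt(s_xx * s_yy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data: y is completely determined by x, but not linearly.
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]

r = pearson_r(x, y)
print(f"r = {r:.2f}")
```

Here Y is an exact function of X, yet r is zero because the positive and negative products of deviations cancel.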
17
(Pearson) Correlation Coefficient of X and Y
• A measurement of the strength of the “LINEAR” association between X and Y
• The correlation coefficient of X and Y is:
$$r_{xy} = \frac{s_{xy}}{\sqrt{s_{xx}\,s_{yy}}} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{s_{xx}\,s_{yy}}}$$
18
Correlation Coefficient of X and Y
• -1 ≤ r ≤ 1
• The magnitude of r measures the strength of the linear association of X and Y
• The sign of r indicates the direction of the association: “-” negative association, “+” positive association
19
Correlation Coefficient
The value r is almost 0:
• the best line to fit the data points is nearly horizontal
• the value of X won’t change our prediction of Y
The value r is almost 1:
• a line fits the data points almost perfectly
Goodness of Fit of SLR Model
For a data point: residuals
For the whole dataset: R^2
R^2 (=r^2) is the proportion of variation in Y explained by (the variation in) X
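The identity R^2 = r^2 can be checked numerically on the computer repair data. The sketch below assumes a least-squares fit and computes R^2 as 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total variation in Y. The names `sse` and `sst` are introduced here for illustration.

```python
import math

# Computer repair data from the slides: (units, minutes)
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

n = len(units)
x_bar = sum(units) / n
y_bar = sum(minutes) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, minutes))
s_xx = sum((x - x_bar) ** 2 for x in units)
s_yy = sum((y - y_bar) ** 2 for y in minutes)

# Least-squares line (assumed fitting method)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(units, minutes))  # unexplained
sst = s_yy                                                           # total variation in Y
r_squared = 1 - sse / sst

r = s_xy / math.sqrt(s_xx * s_yy)
print(f"R^2 = {r_squared:.4f}, r^2 = {r ** 2:.4f}")
```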
20
21
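Table for Computing Mean, St. Deviation, and Corr. Coef.
$$
\begin{array}{c|ccc|ccc|c}
i & y_i & y_i-\bar{y} & (y_i-\bar{y})^2 & x_i & x_i-\bar{x} & (x_i-\bar{x})^2 & (y_i-\bar{y})(x_i-\bar{x}) \\ \hline
1 & y_1 & y_1-\bar{y} & (y_1-\bar{y})^2 & x_1 & x_1-\bar{x} & (x_1-\bar{x})^2 & (y_1-\bar{y})(x_1-\bar{x}) \\
2 & y_2 & y_2-\bar{y} & (y_2-\bar{y})^2 & x_2 & x_2-\bar{x} & (x_2-\bar{x})^2 & (y_2-\bar{y})(x_2-\bar{x}) \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
n & y_n & y_n-\bar{y} & (y_n-\bar{y})^2 & x_n & x_n-\bar{x} & (x_n-\bar{x})^2 & (y_n-\bar{y})(x_n-\bar{x}) \\ \hline
\text{Total} & \sum_{i=1}^{n} y_i & 0 & \sum_{i=1}^{n}(y_i-\bar{y})^2 & \sum_{i=1}^{n} x_i & 0 & \sum_{i=1}^{n}(x_i-\bar{x})^2 & \sum_{i=1}^{n}(y_i-\bar{y})(x_i-\bar{x}) \\
 & \bar{y} & 0 & S_{yy} & \bar{x} & 0 & S_{xx} & S_{xy},\ r_{xy}
\end{array}
$$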
22
Example: Computer Repair Time
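$$
\begin{aligned}
n &= 14, \quad \sum_{i=1}^{n} y_i = 1361, \quad \bar{y} = 1361/14 = 97.2143, \\
s_{yy} &= \sum_{i=1}^{n}(y_i-\bar{y})^2 = 27768.35, \\
\sum_{i=1}^{n} x_i &= 84, \quad \bar{x} = 84/14 = 6, \\
s_{xx} &= \sum_{i=1}^{n}(x_i-\bar{x})^2 = 114, \\
s_{xy} &= \sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) = 1768, \\
r_{xy} &= s_{xy}/\sqrt{s_{xx}\,s_{yy}} = .9937
\end{aligned}
$$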
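These summary quantities can be recomputed directly from the Computer Repair Data table. The following sketch uses only the Python standard library, with variable names chosen to mirror the slide's notation. It should reproduce the totals 1361 and 84, the sums 114 and 1768, and r ≈ .9937.

```python
import math

# Computer repair data from the slides: (units, minutes)
x = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]                        # Units
y = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]      # Minutes

n = len(x)
sum_x, sum_y = sum(x), sum(y)
x_bar, y_bar = sum_x / n, sum_y / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r_xy = s_xy / math.sqrt(s_xx * s_yy)

print(f"n = {n}, sum y = {sum_y}, y-bar = {y_bar:.4f}")
print(f"sum x = {sum_x}, x-bar = {x_bar:.0f}")
print(f"s_xx = {s_xx:.0f}, s_yy = {s_yy:.2f}, s_xy = {s_xy:.0f}")
print(f"r_xy = {r_xy:.4f}")
```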
23
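Exercise
(1) Fill in the following table, then compute the mean and st. deviation of Y and X
(2) Compute the corr. coef. of Y and X
(3) Draw a scatterplot
Columns: $i$, $x_i$, $x_i-\bar{x}$, $(x_i-\bar{x})^2$, $y_i$, $y_i-\bar{y}$, $(y_i-\bar{y})^2$, $(y_i-\bar{y})(x_i-\bar{x})$. Cells shown as __ are left blank in the original.

i      x_i   x_i-x̄   (x_i-x̄)^2   y_i   y_i-ȳ   (y_i-ȳ)^2   (y_i-ȳ)(x_i-x̄)
1      -.3   -.3     .09         .1    -.9     .81         .27
2      -.2   -.2     .04         .4    -.6     .36         .12
3      -.1   __      .01         .7    __      __          __
4      .1    .1      .01         1.2   .2      __          __
5      .2    __      .04         1.6   .6      __          __
6      .3    .3      .09         2.0   __      __          __
Total  0     *       __          6.0   *       __          __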
24
The Influence of Outliers
[Figure: scatterplot of Y3 vs X3]
The slope becomes bigger (toward outliers)
The r value becomes smaller (less linear)
25
The Influence of Outliers
The slope becomes clear (toward outliers)
The |r| value becomes larger (more linear: 0.159 → 0.935)
[Figure: Scatterplot of y vs x]
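The data behind these outlier plots are not given, so the values 0.159 and 0.935 cannot be reproduced here. As a rough sketch with entirely hypothetical numbers, the code below shows how adding one distant point to a weakly related cloud can pull the fitted slope toward it and raise |r|. The helper name `slope_and_r` is illustrative.

```python
import math


def slope_and_r(x, y):
    """Return (least-squares slope, Pearson r) for paired data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / s_xx, s_xy / math.sqrt(s_xx * s_yy)


# HYPOTHETICAL data (not from the slides): a weakly related cloud of points
x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 3, 1]
slope_without, r_without = slope_and_r(x, y)

# Add a single outlier far from the cloud
slope_with, r_with = slope_and_r(x + [10], y + [5])

print(f"without outlier: slope = {slope_without:.2f}, r = {r_without:.3f}")
print(f"with outlier:    slope = {slope_with:.2f}, r = {r_with:.3f}")
```

A single point can dominate both the slope and r in a small dataset, which is why the scatterplot should be checked before trusting either number.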