Introduction to Linear and Logistic Regression
Outline
• Basic Ideas
• Linear Transformation
• Finding the Regression Line
• Minimize the Sum of the Squared Residuals
• Curve Fitting
• Logistic Regression
• Odds and Probability
Basic Ideas: Jargon
• IV = X = Predictor (pl. predictors)
• DV = Y = Criterion (pl. criteria)
• Regression of Y on X
• Linear model = the relation between IV and DV is represented by a straight line.
A score on Y has two parts: (1) a linear function of X and (2) error.
$Y_i = \alpha + \beta X_i + \varepsilon_i$ (population values)
Basic Ideas (2)
Sample values: $Y_i = a + bX_i + e_i$
• Intercept (a) – the value of Y where X = 0.
• Slope (b) – the change in Y when X changes by 1 unit.
If the error is removed, we have a predicted value for each person at X (the line):
$Y' = a + bX$
Suppose on average houses are worth about 50.00 Euro per square meter. Then the equation relating price to size would be Y' = 0 + 50X. The predicted price for a 2000 square meter house would be 100,000 Euro.
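As a minimal Python sketch of this prediction (the function name is just for illustration):

```python
def predicted_price(size_m2, a=0.0, b=50.0):
    """Y' = a + b*X: predicted price in Euro for a house of size_m2 square meters."""
    return a + b * size_m2

print(predicted_price(2000))  # 100000.0
```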
Linear Transformation
A 1-to-1 mapping of variables via a line. Permissible operations are addition and multiplication (interval data).
[Figure: Changing the Y intercept – the lines Y = 5 + 2X, Y = 10 + 2X, and Y = 15 + 2X plotted for X from 0 to 10. Adding a constant shifts the line vertically.]
[Figure: Changing the slope – the lines Y = 5 + 0.5X, Y = 5 + X, and Y = 5 + 2X plotted for X from 0 to 10. Multiplying X by a constant changes the steepness of the line.]
Linear Transformation (2)
Centigrade to Fahrenheit – note the 1-to-1 mapping. What are the intercept and slope?
[Figure: Degrees F plotted against Degrees C; a straight line through (0 degrees C, 32 degrees F) and (100 degrees C, 212 degrees F).]
The intercept is 32: when X (Centigrade) is 0, Y (Fahrenheit) is 32.
The slope is 1.8: when Centigrade goes from 0 to 100 (run), Fahrenheit goes from 32 to 212 (rise), and 212 - 32 = 180. Rise over run gives 180/100 = 1.8, the slope. So Y = 32 + 1.8X, i.e. F = 32 + 1.8C.
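The same rise-over-run computation as a short Python sketch:

```python
# Two known (Centigrade, Fahrenheit) points on the line
c0, f0 = 0, 32
c1, f1 = 100, 212

slope = (f1 - f0) / (c1 - c0)   # rise over run: 180 / 100 = 1.8
intercept = f0 - slope * c0     # 32, the value of F when C = 0

print(f"F = {intercept} + {slope} * C")  # F = 32.0 + 1.8 * C
```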
Standard Deviation and Variance
The standard deviation is the square root of the variance, which is the sum of squared distances between each value and the mean, divided by the population size (finite population):

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2$$

Example: 1, 2, 15; mean = 6:

$$\sigma^2 = \frac{(1-6)^2 + (2-6)^2 + (15-6)^2}{3} = \frac{122}{3} \approx 40.67, \qquad \sigma \approx 6.38$$
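A quick check of this example in plain Python (population variance, dividing by N):

```python
import math

values = [1, 2, 15]
N = len(values)
mean = sum(values) / N                               # 6.0

variance = sum((x - mean) ** 2 for x in values) / N  # 122/3 ≈ 40.67
sd = math.sqrt(variance)                             # ≈ 6.38

print(mean, variance, sd)
```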
Correlation Analysis
The correlation coefficient (also called Pearson's product-moment coefficient):

$$r_{XY} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{(n-1)\,\sigma_X \sigma_Y}$$

If $r_{X,Y} > 0$, X and Y are positively correlated (Y's values increase as X's do); the higher the value, the stronger the correlation. If $r_{X,Y} = 0$, X and Y are uncorrelated (no linear relation); if $r_{X,Y} < 0$, they are negatively correlated.
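A minimal sketch of this formula in Python, using sample standard deviations (dividing by n - 1):

```python
import math

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)

print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0 for a perfect linear relation
```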
Regression of Weight on Height

Ht    Wt
61    105
62    120
63    120
65    160
65    120
68    145
69    175
70    160
72    185
75    210

N = 10 for both variables; mean Ht = 67, mean Wt = 150; SD Ht = 4.57, SD Wt = 33.99.
[Figure: Scatter plot of Weight in Lbs against Height in Inches with the fitted regression line; the rise and run of the line are marked.]
Correlation: r = .94
Regression equation: Y' = -316.86 + 6.97X
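These numbers can be reproduced with NumPy's standard polyfit and corrcoef routines:

```python
import numpy as np

ht = np.array([61, 62, 63, 65, 65, 68, 69, 70, 72, 75])
wt = np.array([105, 120, 120, 160, 120, 145, 175, 160, 185, 210])

b, a = np.polyfit(ht, wt, 1)    # slope ≈ 6.97, intercept ≈ -316.9
r = np.corrcoef(ht, wt)[0, 1]   # ≈ 0.94

print(f"Y' = {a:.2f} + {b:.2f}X, r = {r:.2f}")
```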
Predicted Values & Residuals
N Ht Wt Y' RS
1 61 105 108.19 -3.19
2 62 120 115.16 4.84
3 63 120 122.13 -2.13
4 65 160 136.06 23.94
5 65 120 136.06 -16.06
6 68 145 156.97 -11.97
7 69 175 163.94 11.06
8 70 160 170.91 -10.91
9 72 185 184.84 0.16
10 75 210 205.75 4.25
mean 67 150 150.00 0.00
SD 4.57 33.99 31.85 11.89
Var 20.89 1155.56 1014.37 141.32
These numbers split each score into a linear part and an error:
• Y' is called the predicted value.
• Y - Y' is the residual (RS).
• The residual is the error.
• The mean of Y' equals the mean of Y.
• The variance of Y equals the variance of Y' plus the variance of RS.
Finding the Regression Line
We need the correlation, the standard deviations, and the means of X and Y.

Slope: $b = r_{XY}\,\frac{\sigma_Y}{\sigma_X}$

Intercept: $a = \bar{Y} - b\bar{X}$

Suppose $r_{XY} = .50$, $\sigma_X = .5$, $\bar{X} = 10$, $\sigma_Y = 2$, $\bar{Y} = 5$. Then the slope is $b = .50 \cdot \frac{2}{.5} = 2$ and the intercept is $a = 5 - 2(10) = -15$, so $Y' = -15 + 2X$.
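The same computation as a tiny Python function:

```python
def regression_line(r, sd_x, sd_y, mean_x, mean_y):
    """Slope b = r * sd_y / sd_x and intercept a = mean_y - b * mean_x."""
    b = r * sd_y / sd_x
    a = mean_y - b * mean_x
    return a, b

a, b = regression_line(r=0.50, sd_x=0.5, sd_y=2.0, mean_x=10.0, mean_y=5.0)
print(f"Y' = {a} + {b}X")  # Y' = -15.0 + 2.0X
```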
Line of Least Squares
Assume a linear relation is reasonable, so the two variables can be represented by a line. Where should the line go?
Place the line so the errors (residuals) are small. The line we calculate has a sum of errors equal to zero, and a sum of squared errors that is as small as possible: the line provides the smallest sum of squared errors, or least squares.
Minimize the Sum of the Squared Residuals
Take the derivatives and set them equal to 0.

$$RS_i = a + bX_i - Y_i$$

$$SRS_{\min} = \sum_{i=1}^{n}(RS_i)^2 = \sum_{i=1}^{n}(a + bX_i - Y_i)^2$$
The minimum requires both partial derivatives to vanish:

$$\frac{\partial}{\partial a}\sum_{i=1}^{n}(a + bX_i - Y_i)^2 = 0, \qquad \frac{\partial}{\partial b}\sum_{i=1}^{n}(a + bX_i - Y_i)^2 = 0$$

From the derivative with respect to a:

$$2\sum_{i=1}^{n}(a + bX_i - Y_i) = 0 \;\Rightarrow\; \sum_{i=1}^{n}a + b\sum_{i=1}^{n}X_i = \sum_{i=1}^{n}Y_i \;\Rightarrow\; a \cdot n + b\sum_{i=1}^{n}X_i = \sum_{i=1}^{n}Y_i$$
From the derivative with respect to b:

$$2\sum_{i=1}^{n}(a + bX_i - Y_i)X_i = 0 \;\Rightarrow\; a\sum_{i=1}^{n}X_i + b\sum_{i=1}^{n}X_i^2 = \sum_{i=1}^{n}X_iY_i$$
The coefficients a and b are found by solving the following system of linear equations:

$$\begin{bmatrix} n & \sum_{i=1}^{n}X_i \\ \sum_{i=1}^{n}X_i & \sum_{i=1}^{n}X_i^2 \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n}Y_i \\ \sum_{i=1}^{n}X_iY_i \end{bmatrix}$$
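A direct translation of this system into NumPy, reusing the height/weight data from above:

```python
import numpy as np

x = np.array([61, 62, 63, 65, 65, 68, 69, 70, 72, 75], dtype=float)
y = np.array([105, 120, 120, 160, 120, 145, 175, 160, 185, 210], dtype=float)
n = len(x)

A = np.array([[n,       x.sum()],
              [x.sum(), (x ** 2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])

a, b = np.linalg.solve(A, rhs)   # same line as before: a ≈ -316.9, b ≈ 6.97
print(a, b)
```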
Curve Fitting

Linear regression: $Y = a + bX$
Exponential curve: $Y = ae^{bX}$, $a > 0$
Logarithmic curve: $Y = a + b\ln(X)$
Power curve: $Y = aX^{b}$, $a > 0$
The coefficients a and b are found by solving the same system of linear equations in the transformed variables:

$$\begin{bmatrix} n & \sum_{i=1}^{n}\hat{X}_i \\ \sum_{i=1}^{n}\hat{X}_i & \sum_{i=1}^{n}\hat{X}_i^2 \end{bmatrix}\begin{bmatrix} \hat{a} \\ b \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n}\hat{Y}_i \\ \sum_{i=1}^{n}\hat{X}_i\hat{Y}_i \end{bmatrix}$$
with

Linear regression: $\hat{a} := a$, $\hat{X}_i := X_i$, $\hat{Y}_i := Y_i$
Exponential curve: $\hat{a} := \ln(a)$, $\hat{X}_i := X_i$, $\hat{Y}_i := \ln(Y_i)$
Logarithmic curve: $\hat{a} := a$, $\hat{X}_i := \ln(X_i)$, $\hat{Y}_i := Y_i$
Power curve: $\hat{a} := \ln(a)$, $\hat{X}_i := \ln(X_i)$, $\hat{Y}_i := \ln(Y_i)$
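As an illustration, a sketch of the exponential fit via the log transform; the data points here are made up to roughly follow Y = 2e^(0.3X):

```python
import numpy as np

# Hypothetical data, roughly Y = 2 * exp(0.3 * X)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.7, 3.6, 4.9, 6.7, 9.0])

# Ordinary least squares on the transformed model ln(Y) = ln(a) + b*X
b, a_hat = np.polyfit(x, np.log(y), 1)
a = np.exp(a_hat)   # undo the transform: a = e^(a_hat)

print(f"Y = {a:.2f} * exp({b:.2f} * X)")  # close to a = 2, b = 0.3
```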
Multiple Linear Regression

$$T_i = a + bX_i + cY_i$$

The coefficients a, b and c are found by solving the following system of linear equations:

$$\begin{bmatrix} n & \sum_{i=1}^{n}X_i & \sum_{i=1}^{n}Y_i \\ \sum_{i=1}^{n}X_i & \sum_{i=1}^{n}X_i^2 & \sum_{i=1}^{n}X_iY_i \\ \sum_{i=1}^{n}Y_i & \sum_{i=1}^{n}X_iY_i & \sum_{i=1}^{n}Y_i^2 \end{bmatrix}\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n}T_i \\ \sum_{i=1}^{n}X_iT_i \\ \sum_{i=1}^{n}Y_iT_i \end{bmatrix}$$
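A sketch of this system in NumPy with made-up data for two predictors X, Y and a criterion T:

```python
import numpy as np

# Hypothetical data: T depends on predictors X and Y
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
T = np.array([4.1, 4.9, 9.2, 9.8, 13.1])
n = len(X)

A = np.array([[n,       X.sum(),        Y.sum()],
              [X.sum(), (X ** 2).sum(), (X * Y).sum()],
              [Y.sum(), (X * Y).sum(),  (Y ** 2).sum()]])
rhs = np.array([T.sum(), (X * T).sum(), (Y * T).sum()])

a, b, c = np.linalg.solve(A, rhs)
print(a, b, c)
```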
Polynomial Regression

$$Y_i = a + bX_i + cX_i^2$$

The coefficients a, b and c are found by solving the following system of linear equations:

$$\begin{bmatrix} n & \sum_{i=1}^{n}X_i & \sum_{i=1}^{n}X_i^2 \\ \sum_{i=1}^{n}X_i & \sum_{i=1}^{n}X_i^2 & \sum_{i=1}^{n}X_i^3 \\ \sum_{i=1}^{n}X_i^2 & \sum_{i=1}^{n}X_i^3 & \sum_{i=1}^{n}X_i^4 \end{bmatrix}\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n}Y_i \\ \sum_{i=1}^{n}X_iY_i \\ \sum_{i=1}^{n}X_i^2Y_i \end{bmatrix}$$
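NumPy's polyfit solves exactly this system for a degree-2 polynomial; a minimal sketch with made-up data roughly following Y = 1 + 2X + 0.5X²:

```python
import numpy as np

# Hypothetical data, roughly Y = 1 + 2X + 0.5X^2
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 3.4, 7.2, 11.4, 17.1])

c, b, a = np.polyfit(x, y, 2)   # coefficients come highest power first
print(f"Y = {a:.2f} + {b:.2f}X + {c:.2f}X^2")
```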
Logistic Regression
The dependent variable is binary (a categorical variable with two values, such as "yes" and "no") rather than continuous.
The binary DV (Y) is either 0 or 1. For example, we might code a successfully kicked field goal as 1 and a missed field goal as 0; or yes as 1 and no as 0; or admitted as 1 and rejected as 0; or Cherry Garcia flavor ice cream as 1 and all other flavors as 0.
If we code like this, then the mean of the distribution is equal to the proportion of 1s in the distribution. For example, if there are 100 people in the distribution and 30 of them are coded 1, then the mean of the distribution is .30, which is the proportion of 1s.
The mean of a binary distribution so coded is denoted P, the proportion of 1s. The proportion of zeros is (1 - P), sometimes denoted Q. The variance of such a distribution is PQ, and the standard deviation is Sqrt(PQ).
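A quick check of the 100-person example:

```python
import math

n_people, n_ones = 100, 30
P = n_ones / n_people     # 0.30, the mean of the 0/1 coding
Q = 1 - P                 # 0.70, the proportion of zeros
variance = P * Q          # 0.21
sd = math.sqrt(P * Q)     # ≈ 0.458

print(P, Q, variance, sd)
```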
Suppose we want to predict whether someone is male or female (DV, M=1, F=0) using height in inches (IV)
We could plot the relation between the two variables as we customarily do in regression. The plot might look something like this:
None of the observations (data points) fall on the regression line; they are all zero or one.
[Figure: scatter plot of the binary DV (0/1) against height, with a linear regression line.]
Predicted values (DV = Y) correspond to probabilities. If linear regression is used, the predicted values become greater than one or less than zero if one moves far enough along the X-axis. Such values are theoretically inadmissible. The logistic function keeps predictions between 0 and 1:

$$P = \frac{e^{a+bX}}{1 + e^{a+bX}} = \frac{1}{1 + e^{-(a+bX)}}$$
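A minimal sketch of the logistic function; the values a = -40 and b = 0.6 are arbitrary illustration choices, not fitted coefficients:

```python
import math

def logistic(x, a=-40.0, b=0.6):
    """P = 1 / (1 + e^-(a + bX)); a and b are made-up illustration values."""
    return 1 / (1 + math.exp(-(a + b * x)))

for height in (60, 64, 67, 70, 74):
    print(height, round(logistic(height), 3))   # S-shaped, always in (0, 1)
```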
Linear vs. Logistic Regression
[Figures: the same binary data fit once with a linear regression line and once with a logistic curve.]
Odds and Probability

$$\text{odds} = \frac{P}{1-P}, \qquad \log(\text{odds}) = \text{logit}(P) = \ln\!\left(\frac{P}{1-P}\right)$$

The logistic model sets the logit equal to a linear function of X:

$$\text{logit}(P) = a + bX \;\Rightarrow\; \frac{P}{1-P} = e^{a+bX} \;\Rightarrow\; P = \frac{e^{a+bX}}{1 + e^{a+bX}}$$
On the logit scale, the model is just linear regression!
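The logit and its inverse as a round trip in Python:

```python
import math

def logit(p):
    """ln(P / (1 - P)), the log odds."""
    return math.log(p / (1 - p))

def inv_logit(z):
    """P = 1 / (1 + e^-z), back from log odds to probability."""
    return 1 / (1 + math.exp(-z))

p = 0.30
z = logit(p)                # ln(0.3 / 0.7) ≈ -0.847
print(z, inv_logit(z))      # recovers p = 0.30
```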