Regression
“A new perspective on freedom”
Classification
[figure: points labeled Cat or Dog plotted by Cleanliness vs. Size, with a query point marked "?"; below, an ordered scale $, $$, $$$, $$$$]
Regression
[figure: Price ($ to $$$$) plotted against Top speed; generic axes x and y]
Regression
Data: $(x_i, y_i)_{i=1 \dots n}$
Goal: given $x$, predict $y$, i.e. find a prediction function $y(x)$
Nearest neighbor
[figure: nearest-neighbor prediction on 1-D sample data]
Nearest neighbor
• To predict at $x$:
– Find the data point $x_i$ closest to $x$
– Choose $y = y_i$
+ No training
– Finding the closest point can be expensive
– Overfitting
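A minimal NumPy sketch of this predictor, assuming 1-D inputs stored in arrays `xs` and `ys` (the names are illustrative, not from the lecture):

```python
import numpy as np

def nearest_neighbor_predict(xs, ys, x):
    """Predict y at x by copying the output of the closest training point."""
    i = np.argmin(np.abs(xs - x))  # index of the nearest data point
    return ys[i]
```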
Kernel Regression
• To predict at $x$:
– Give data point $x_i$ weight $m_i = k(x - x_i)$, where e.g. $k(x) = e^{-x^2 / 2\sigma^2}$
– Normalize the weights: $m_i' = \frac{m_i}{\sum_{j=1}^n m_j}$
– Let $y = \sum_{i=1}^n m_i' y_i$
Kernel Regression
[figure: kernel-regression prediction on the same 1-D sample data]
[Matlab demo]
Kernel Regression
$y(x) = \frac{\sum_i y_i\, k(x_i - x)}{\sum_i k(x_i - x)}$
+ No training
+ Smooth prediction
– Slower than nearest neighbor
– Must choose the width of $k$
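The same formula as a sketch, with a Gaussian kernel and the same illustrative array names as before:

```python
import numpy as np

def kernel_regression_predict(xs, ys, x, sigma=1.0):
    """Weighted average of the y_i with weights m_i = k(x - x_i), Gaussian k."""
    m = np.exp(-(xs - x) ** 2 / (2 * sigma ** 2))  # unnormalized weights
    return np.sum(m * ys) / np.sum(m)              # normalization built in
```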
Linear regression
[figure: 3-D plot of Temperature against two inputs]
[start Matlab demo lecture2.m]
Given examples $(x_i, y_i)_{i=1 \dots n}$, predict $y_{n+1}$ given a new point $x_{n+1}$.
[figures: the Temperature data with the query point $x_{n+1}$ and its prediction $y_{n+1}$ marked, and the fitted plane]
Linear regression
Prediction: $y_i = w_0 + w_1 x_i$
Prediction: $y_i = w_0 + w_1 x_{i,1} + w_2 x_{i,2} = \begin{pmatrix} 1 & x_{i,1} & x_{i,2} \end{pmatrix} \begin{pmatrix} w_0 \\ w_1 \\ w_2 \end{pmatrix} = X_i^\top w$
Linear Regression
[figure: observations vs. the prediction line $y = X_i^\top w$; the vertical gap at each point is the error or “residual”]
$X_i = \begin{pmatrix} 1 \\ x_{i,1} \\ x_{i,2} \end{pmatrix}$
Sum squared error: $\sum_i (X_i^\top w - y_i)^2$
Linear Regression
Let $X$ be the $n \times d$ matrix whose rows are the $X_i^\top$:
$X = \begin{pmatrix} -\, X_1^\top\, - \\ -\, X_2^\top\, - \\ \dots \end{pmatrix}$
$E = \sum_i (X_i^\top w - y_i)^2 = \|Xw - y\|_2^2 = w^\top X^\top X w - 2\, y^\top X w + \|y\|_2^2$
With $A = X^\top X$ and $b = X^\top y$:
$\frac{\partial E}{\partial w} = 2Aw - 2b$
Solve the system $Aw = b$ (it’s better not to invert the matrix).
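A sketch of that solve in NumPy, assuming `X` already holds the feature rows (including the constant 1):

```python
import numpy as np

def fit_linear_regression(X, y):
    """Minimize ||Xw - y||^2 by solving (X^T X) w = X^T y."""
    A = X.T @ X
    b = X.T @ y
    return np.linalg.solve(A, b)  # solves Aw = b without forming the inverse
```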
LMS Algorithm (Least Mean Squares)
Online algorithm. The error decomposes over examples:
$E = \sum_i (X_i^\top w - y_i)^2 = \sum_i E_i$, so $\frac{\partial E}{\partial w} = \sum_i \frac{\partial E_i}{\partial w}$
where
$\frac{\partial E_i}{\partial w} = \frac{\partial}{\partial w} (X_i^\top w - y_i)^2 = 2\, X_i (X_i^\top w - y_i)$
Instead of following the full gradient $\alpha \frac{\partial E}{\partial w}$, step against one example’s gradient at a time:
$w_{t+1} = w_t + \alpha\, X_i (y_i - X_i^\top w_t)$
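A sketch of the update loop (the step size `alpha` and epoch count are illustrative defaults, not from the lecture):

```python
import numpy as np

def lms(X, y, alpha=0.01, epochs=10):
    """Online least squares: one gradient step per example."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for Xi, yi in zip(X, y):
            w += alpha * Xi * (yi - Xi @ w)  # w_{t+1} = w_t + alpha X_i (y_i - X_i^T w_t)
    return w
```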
Beyond lines and planes
[figure: quadratic fit to 1-D data]
$y_i = w_0 + w_1 x_i + w_2 x_i^2$
Everything is the same with $X_i = \begin{pmatrix} 1 \\ x_i \\ x_i^2 \end{pmatrix}$: the model is still linear in $w$.
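For instance, a sketch of the quadratic case, reusing the least-squares solve:

```python
import numpy as np

def fit_quadratic(xs, ys):
    """Build feature rows (1, x_i, x_i^2), then solve the usual least squares."""
    X = np.stack([np.ones_like(xs), xs, xs ** 2], axis=1)
    return np.linalg.lstsq(X, ys, rcond=None)[0]  # returns (w0, w1, w2)
```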
Linear Regression [summary]
Given examples $(x_i, y_i)_{i=1 \dots n}$.
Let $X_i^\top = \begin{pmatrix} f_1(x_i) & f_2(x_i) & \dots & f_d(x_i) \end{pmatrix}$, for example $X_i^\top = \begin{pmatrix} 1 & x_{i,1} & x_{i,2} & x_{i,1}^2 & x_{i,2}^2 & x_{i,1} x_{i,2} \end{pmatrix}$.
Let $X$ be the $n \times d$ matrix with rows $X_i^\top$, and $y = \begin{pmatrix} y_1 \\ y_2 \\ \dots \end{pmatrix}$.
Minimize $\|Xw - y\|_2^2$ by solving $\left(X^\top X\right) w = X^\top y$.
Predict $y_{n+1} = X_{n+1}^\top w$.
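As a sketch of the general recipe, here is the summary’s example feature map for 2-D inputs (the helper name is hypothetical):

```python
import numpy as np

def quadratic_features_2d(x1, x2):
    """Feature row (1, x1, x2, x1^2, x2^2, x1*x2) from the example above."""
    return np.array([1.0, x1, x2, x1 ** 2, x2 ** 2, x1 * x2])

# Stack one such row per example into X, then solve (X^T X) w = X^T y as before.
```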
Probabilistic interpretation
Likelihood: $y_i \mid x_i \sim N(X_i^\top w, \sigma^2)$
$L = \prod_i \exp\!\left(-\tfrac{1}{2\sigma^2} (X_i^\top w - y_i)^2\right) = \exp\!\left(-\tfrac{1}{2\sigma^2} \sum_i (X_i^\top w - y_i)^2\right) = \exp\!\left(-\tfrac{1}{2\sigma^2} \|Xw - y\|^2\right)$
Maximizing the likelihood is therefore the same as minimizing the sum of squared errors.
Overfitting
[figure: degree-15 polynomial oscillating wildly through the data]
[Matlab demo]
Ridge Regression (Regularization)
[figure: effect of regularization on a degree-19 polynomial fit]
Minimize $\tfrac{1}{2}\|Xw - y\|_2^2 + \tfrac{\epsilon}{2}\|w\|_2^2$ with “small” $\epsilon$.
Let $A = X^\top X$ and $b = X^\top y$. Solve $(A + \epsilon I)\, w = b$.
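A sketch of the regularized solve (the default value of `eps` is illustrative):

```python
import numpy as np

def fit_ridge(X, y, eps=1e-3):
    """Solve (X^T X + eps*I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + eps * np.eye(d), X.T @ y)
```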
Probabilistic interpretation
Likelihood: $y_i \mid x_i \sim N(X_i^\top w, \sigma^2)$
Prior: $w \sim N\!\left(0, \tfrac{\sigma^2}{\epsilon}\right)$
Posterior: $P(w \mid x_1, \dots, x_n) = \frac{P(w, x_1, \dots, x_n)}{P(x_1, \dots, x_n)} \propto P(w, x_1, \dots, x_n)$
$P(w, x_1, \dots, x_n) = \exp\!\left(-\tfrac{\epsilon}{2\sigma^2} \|w\|_2^2\right) \prod_i \exp\!\left(-\tfrac{1}{2\sigma^2} (X_i^\top w - y_i)^2\right) = \exp\!\left(-\tfrac{1}{2\sigma^2} \left[\epsilon \|w\|_2^2 + \sum_i (X_i^\top w - y_i)^2\right]\right)$
The ridge solution is therefore the maximum of the posterior.
Locally Linear Regression
[figure: global temperature increase, 1840–2020; source: http://www.cru.uea.ac.uk/cru/data/temperature]
Locally Linear Regression
• To predict at $x_{n+1}$:
– Give data point $x_i$ weight $m_i = k(x_{n+1} - x_i)$, where e.g. $k(x) = e^{-x^2 / 2\sigma^2}$
– Let $w = \operatorname{argmin}_w \sum_{i=1}^n m_i (X_i^\top w - y_i)^2$
– Let $y_{n+1} = X_{n+1}^\top w$
Locally Linear Regression
+ Good even at the boundary (more important in high dimension)
– Solve a linear system for each new prediction
– Must choose the width of $k$
To minimize $\sum_{i=1}^n m_i (X_i^\top w - y_i)^2$, solve $\left(X^\top M X\right) w = X^\top M y$ where $M = \operatorname{diag}(m_1, m_2, \dots, m_n)$.
Predict $y_{n+1} = X_{n+1}^\top w$.
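A sketch for 1-D inputs with a Gaussian kernel, solving the weighted system once per query (array names illustrative):

```python
import numpy as np

def locally_linear_predict(xs, ys, x_new, sigma=1.0):
    """Fit a weighted line around x_new, then evaluate it at x_new.

    Solves (X^T M X) w = X^T M y with weights m_i = k(x_new - x_i).
    """
    m = np.exp(-(xs - x_new) ** 2 / (2 * sigma ** 2))  # kernel weights
    X = np.stack([np.ones_like(xs), xs], axis=1)       # rows X_i = (1, x_i)
    A = X.T @ (m[:, None] * X)  # X^T M X, without forming the diagonal M
    b = X.T @ (m * ys)          # X^T M y
    w = np.linalg.solve(A, b)
    return w[0] + w[1] * x_new  # y_new = X_new^T w
```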
Locally Linear Regression, Gaussian kernel
[figure: locally linear fit to the global temperature series; source: http://www.cru.uea.ac.uk/cru/data/temperature]
Locally Linear Regression, Laplacian kernel
[figure: locally linear fit to the same series with a Laplacian kernel; source: http://www.cru.uea.ac.uk/cru/data/temperature]
L1 Regression
Sensitivity to outliers
[figure: Temperature at noon; a single outlier $y_i$ pulls the least-squares fit $x_i^\top w$ toward it]
High weight is given to outliers: with $E = \sum_i (x_i^\top w - y_i)^2 = \sum_i E_i$, the influence function $\frac{\partial E_i}{\partial y_i} = 2 (y_i - x_i^\top w)$ grows linearly with the residual.
L1 Regression
Replace the squared error by the absolute error: $E' = \sum_i |x_i^\top w - y_i| = \sum_i E_i'$. The influence function $\frac{\partial E_i'}{\partial y_i}$ is bounded, so outliers pull the fit far less.
Minimizing $E'$ is a linear program:
$\min_{w,\,c} \sum_i c_i \quad \text{s.t.} \quad x_i^\top w - y_i \le c_i \;\; \forall i, \qquad y_i - x_i^\top w \le c_i \;\; \forall i$
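A sketch of that linear program using scipy.optimize.linprog (a SciPy solver; the lecture does not specify one), with the decision variables stacked as $(w, c)$:

```python
import numpy as np
from scipy.optimize import linprog

def fit_l1_regression(X, y):
    """L1 regression: minimize sum_i c_i s.t. Xw - y <= c and y - Xw <= c."""
    n, d = X.shape
    cost = np.concatenate([np.zeros(d), np.ones(n)])     # objective: sum of c_i
    A_ub = np.block([[X, -np.eye(n)], [-X, -np.eye(n)]]) # both constraint sets
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * d + [(0, None)] * n        # w free, c_i >= 0
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:d]                                     # the weights w
```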
Spline Regression
Regression on each interval
[figure: independent regression on each interval]
Spline Regression
With equality constraints
[figure: the same fit with equality constraints between intervals]
Spline Regression
With L1 cost
[figure: the same fit with an L1 cost]
To learn more
• The Elements of Statistical Learning, Hastie, Tibshirani, Friedman, Springer