UVA CS 4501: Machine Learning Lecture 8: Review of Regression · Regularized multivariate linear...
-
UVA CS 4501: Machine Learning
Lecture 8: Review of Regression
Dr. Yanjun Qi
University of Virginia
Department of Computer Science
-
Where are we? → Five major sections of this course
• Regression (supervised)
• Classification (supervised)
• Unsupervised models
• Learning theory
• Graphical models
2/19/18
Dr. Yanjun Qi / UVA CS
-
Lecture 3
• Linear regression (aka least squares)
• Learn to derive the least squares estimate by the normal equation
• Evaluation with cross-validation
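The normal-equation estimate recalled above can be sketched in a few lines of numpy; the toy data and variable names here are illustrative, not from the lecture:

```python
import numpy as np

# Toy data: y = 1 + 2*x exactly, so the estimate recovers [1, 2].
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # bias column + x
y = np.array([1.0, 3.0, 5.0, 7.0])

# Normal equation: theta = (X^T X)^{-1} X^T y
# (solve is preferred over explicitly inverting X^T X)
theta = np.linalg.solve(X.T @ X, X.T @ y)
```

`np.linalg.solve` is used instead of `np.linalg.inv` because solving the linear system is cheaper and numerically safer than forming the inverse.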
-
Lecture 4
• More ways to train / perform optimization for linear regression models
  – Review: Gradient Descent
  – Gradient Descent (GD) for LR
  – Stochastic GD (SGD) for LR
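A minimal sketch of batch gradient descent for linear regression (SGD would instead update on one randomly chosen sample per step); the learning rate, iteration count, and toy data are illustrative assumptions:

```python
import numpy as np

# Toy data: y = 2*x with no noise, so GD should drive theta toward 2.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 2.0, 4.0, 6.0])

theta = np.zeros(1)
lr = 0.05  # step size (assumed small enough for convergence here)
for _ in range(500):
    # Gradient of the mean squared loss (1/n) * ||X @ theta - y||^2
    grad = (2.0 / len(y)) * X.T @ (X @ theta - y)
    theta -= lr * grad
```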
-
Lecture 5
• Regression models beyond linear
  – LR with non-linear basis functions
  – Instance-based regression: K-Nearest Neighbors
  – Locally weighted linear regression
  – Regression trees and multilinear interpolation (later)
-
Lecture 6
• Linear regression model with regularizations
  – Review: (ordinary) least squares: squared loss (normal equation)
  – Ridge regression: squared loss with L2 regularization
  – Lasso regression: squared loss with L1 regularization
  – Elastic net regression: squared loss with L1 AND L2 regularization
  – WHY and influence of the regularization parameter
-
Lecture 7
• Feature selection
  – General introduction
  – Filtering
  – Wrapper
  – Embedded method
-
Machine Learning in a Nutshell
• Task
• Representation
• Score function
• Search/Optimization
• Models, Parameters
-
Multivariate Linear Regression
• Task: Regression
• Representation: Y = weighted linear sum of X's
• Score function: least squares
• Search/Optimization: normal equation (linear algebra), GD, SGD
• Models, Parameters: regression coefficients

\hat{y} = f(x) = \theta^T x
-
Multivariate Linear Regression with Basis Expansion
• Task: Regression
• Representation: Y = weighted linear sum of (X basis expansion)
• Score function: SSE
• Search/Optimization: linear algebra
• Models, Parameters: regression coefficients

\hat{y} = \theta_0 + \sum_{j=1}^{m} \theta_j \phi_j(x) = \phi(x)^T \theta
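Basis expansion keeps the solver linear in θ while making the fit non-linear in x. A minimal sketch with an assumed polynomial basis φ(x) = [1, x, x²] (any fixed non-linear features work the same way):

```python
import numpy as np

# Toy data lying exactly on y = x^2, so the quadratic basis fits it exactly.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = x ** 2

# Basis expansion: each row of Phi is phi(x_i) = [1, x_i, x_i^2]
Phi = np.vstack([np.ones_like(x), x, x ** 2]).T

# Ordinary least squares on the expanded features
theta = np.linalg.lstsq(Phi, y, rcond=None)[0]
```

The learned coefficients should be (0, 0, 1), i.e. ŷ = x², even though the model is still "linear" in its parameters.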
-
K-Nearest Neighbor
• Task: Regression / classification
• Representation: local smoothness
• Score function: NA
• Search/Optimization: NA
• Models, Parameters: training samples
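As the table above notes, KNN regression has no score function to optimize; the "model" is just the stored training samples. A minimal 1-D sketch (function name and toy data are illustrative):

```python
import numpy as np

def knn_regress(x_train, y_train, x_query, k=3):
    """Predict by averaging the targets of the k nearest training points."""
    dists = np.abs(x_train - x_query)   # 1-D distance to every training sample
    nearest = np.argsort(dists)[:k]     # indices of the k closest samples
    return y_train[nearest].mean()

x_train = np.array([0.0, 1.0, 2.0, 10.0])
y_train = np.array([0.0, 1.0, 2.0, 10.0])
pred = knn_regress(x_train, y_train, x_query=1.0, k=3)
```

For the query 1.0 the three nearest samples are 0.0, 1.0, 2.0, so the prediction is their target mean.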
-
Locally Weighted / Kernel Linear Regression
• Task: Regression
• Representation: Y = weighted linear sum of X's
• Score function: weighted SSE
• Search/Optimization: linear algebra
• Models, Parameters: local regression coefficients (conditioned on each test point)

\hat{f}(x_0) = \hat{\alpha}(x_0) + \hat{\beta}(x_0)\, x_0

\min_{\alpha(x_0),\, \beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_0, x_i)\, [\, y_i - \alpha(x_0) - \beta(x_0)\, x_i \,]^2

\theta^*(x_0) = (B^T W(x_0) B)^{-1} B^T W(x_0)\, y
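The closed form θ*(x₀) = (BᵀW(x₀)B)⁻¹BᵀW(x₀)y above can be implemented directly; a Gaussian kernel is assumed for K_λ, and the toy data are illustrative:

```python
import numpy as np

def lwr_predict(x0, X, y, lam=1.0):
    """Locally weighted linear regression prediction at query point x0.

    Solves theta*(x0) = (B^T W(x0) B)^{-1} B^T W(x0) y, where W(x0) holds
    Gaussian kernel weights K_lambda(x0, x_i) on its diagonal.
    """
    B = np.column_stack([np.ones_like(X), X])      # design matrix with intercept
    w = np.exp(-((X - x0) ** 2) / (2 * lam ** 2))  # kernel weights
    W = np.diag(w)
    theta = np.linalg.solve(B.T @ W @ B, B.T @ W @ y)
    return theta[0] + theta[1] * x0                # alpha(x0) + beta(x0) * x0

X = np.linspace(0.0, 4.0, 9)
y = 2.0 * X + 1.0       # globally linear data: LWR recovers the line exactly
pred = lwr_predict(2.0, X, y)
```

A fresh θ*(x₀) is solved for every query point, which is why the parameters are "conditioned on each test point".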
-
Regularized multivariate linear regression
• Task: Regression
• Representation: Y = weighted linear sum of X's
• Score function: least squares + regularization
• Search/Optimization: linear algebra for ridge / sub-gradient descent for lasso & elastic net
• Models, Parameters: regression coefficients (regularized weights)

\min_\beta J(\beta) = \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2 + \lambda \left( \sum_{j=1}^{p} |\beta_j|^q \right)^{1/q}
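For the ridge case (q = 2) the regularized objective still has a closed form, θ = (XᵀX + λI)⁻¹Xᵀy, which is a small modification of the normal equation. A minimal sketch (toy data and λ values are illustrative):

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge solution: (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, 2.0])

theta_ols = ridge(X, y, 0.0)    # lam = 0 recovers ordinary least squares
theta_reg = ridge(X, y, 10.0)   # larger lam shrinks the weights toward zero
```

Lasso (q = 1) has no such closed form because the L1 penalty is non-differentiable at zero, which is why sub-gradient (or coordinate-descent) methods are used instead.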
-
Feature Selection: filters vs. wrappers vs. embedding
• Main goal: rank subsets of useful features
From Dr. Isabelle Guyon
-
Complexity versus Goodness of Fit: Model Selection
[Figure: four panels of the same training data (y vs. x) fit with models of increasing complexity: too simple? about right? too complex?]
• Too simple: low variance / high bias
• Too complex: low bias / high variance
What ultimately matters: GENERALIZATION
-
e.g. By k = 10-fold Cross Validation

model P1 P2 P3 P4 P5 P6 P7 P8 P9 P10
1 train train train train train train train train train test
2 train train train train train train train train test train
3 train train train train train train train test train train
4 train train train train train train test train train train
5 train train train train train test train train train train
6 train train train train test train train train train train
7 train train train test train train train train train train
8 train train test train train train train train train train
9 train test train train train train train train train train
10 test train train train train train train train train train
• Divide data into 10 equal pieces
• Use 9 pieces as the training set, the remaining 1 as the test set
• Collect the scores from the diagonal
• We normally use the mean of the scores
Make sure that the train/test/validation folds are indeed independent samples.
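The 10-fold procedure above can be sketched as follows; a trivial mean predictor stands in for the model being evaluated, and the data are illustrative:

```python
import numpy as np

def kfold_mse(y, k=10, seed=0):
    """k-fold CV for a trivial mean predictor; returns the mean fold MSE."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)  # k disjoint pieces
    scores = []
    for i, test_idx in enumerate(folds):
        # The other k-1 pieces form the training set for this fold
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        y_hat = y[train_idx].mean()                     # "fit" the model
        scores.append(np.mean((y[test_idx] - y_hat) ** 2))
    return float(np.mean(scores))                       # mean of the k scores

score = kfold_mse(np.arange(50.0), k=10)
```

Shuffling once up front before splitting keeps the folds disjoint while avoiding any ordering bias in the data.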
-
Evaluation, e.g. Regression (1-D example)

\hat{y} = \theta_0 + \theta_1 x_1

\theta^* = (X^T X)^{-1} X^T y

Testing MSE error to report:

J_{test} = \frac{1}{m} \sum_{i=n+1}^{n+m} (x_i^T \theta^* - y_i)^2 = \frac{1}{m} \sum_{i=n+1}^{n+m} \varepsilon_i^2, \quad \text{where } \varepsilon_i := x_i^T \theta^* - y_i
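A numpy sketch of this evaluation protocol: train θ* on the first n samples, then report the mean squared residual on the held-out m samples (the toy data and split sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=30)
y = 3.0 * x + 2.0 + rng.normal(0.0, 0.1, size=30)  # near-linear toy data

n, m = 20, 10
X = np.column_stack([np.ones_like(x), x])          # row i is [1, x_i]

# theta* = (X^T X)^{-1} X^T y on the n training samples only
theta = np.linalg.solve(X[:n].T @ X[:n], X[:n].T @ y[:n])

eps = X[n:] @ theta - y[n:]                        # residuals on m test points
J_test = np.mean(eps ** 2)                         # testing MSE to report
```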
-
e.g. A Practical Application of a Regression Model
Proceedings of HLT 2010 Human Language Technologies:
-
The feature weights can be directly interpreted as U.S. dollars contributed to the predicted value ŷ by each occurrence of the feature.
A REAL APPLICATION: Movie Reviews and Revenues: An Experiment in Text Regression, Proceedings of HLT '10 Human Language Technologies
-
A combination of the meta and text features achieves the best performance both in terms of MAE and Pearson r.
-
Movie Reviews and Revenues: An Experiment in Text Regression, Proceedings of HLT '10 Human Language Technologies:
The features are from the text-only model annotated in Table 2 (total, not per screen). The feature weights can be directly interpreted as U.S. dollars contributed to the predicted value by each occurrence of the feature. Sentiment-related text features are not as prominent as might be expected, and their overall proportion in the set of features with non-zero weights is quite small (estimated in preliminary trials at less than 15%). Phrases that refer to metadata are the more highly weighted and frequent ones.
-
An Operational Model of Machine Learning
[Diagram: a Learner takes Reference Data (consisting of input-output pairs) and produces a Model; at Deployment, an Execution Engine applies the Model to Production Data to yield Tagged Data.]
-
Goals in General
• 1. Generalize well: connecting to the asymptotic ERROR BOUND
• 2. Interpretable: especially for some domains, this is about trust!
• 3. Computationally efficient
-
Probabilistic Interpretation of Linear Regression (LATER)
• Let us assume that the target variable and the inputs are related by the equation:

y_i = \theta^T x_i + \varepsilon_i

where ε is an error term of unmodeled effects or random noise.
• Now assume that ε follows a Gaussian N(0, σ); then we have:

p(y_i \mid x_i; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y_i - \theta^T x_i)^2}{2\sigma^2} \right)

• By the i.i.d. (among samples) assumption:

L(\theta) = \prod_{i=1}^{n} p(y_i \mid x_i; \theta) = \left( \frac{1}{\sqrt{2\pi}\,\sigma} \right)^n \exp\left( -\frac{\sum_{i=1}^{n} (y_i - \theta^T x_i)^2}{2\sigma^2} \right)

Many more variations of Linear Regression follow from this perspective, e.g. binomial/Poisson
(LATER)
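Taking logs of the likelihood makes the link back to least squares explicit:

```latex
\log L(\theta) = n \log \frac{1}{\sqrt{2\pi}\,\sigma}
  \;-\; \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left( y_i - \theta^T x_i \right)^2
```

The first term does not depend on θ, so maximizing L(θ) over θ is equivalent to minimizing Σᵢ (yᵢ − θᵀxᵢ)², i.e. exactly the ordinary least-squares objective.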
-
References
• Big thanks to Prof. Eric Xing @ CMU for allowing me to reuse some of his slides
• Prof. Alexander Gray's slides