Neural Networks: Introduction - svivek
Machine Learning
Neural Networks: Introduction
Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others
Where are we?
General learning principles
• Overfitting
• Mistake-bound learning
• PAC learning, sample complexity
• Hypothesis choice & VC dimensions
• Training and generalization errors
• Regularized Empirical Loss Minimization
• Bayesian Learning
Learning algorithms
• Decision Trees
• Perceptron
• AdaBoost
• Support Vector Machines
• Naïve Bayes
• Logistic Regression
Note: several of these algorithms produce linear classifiers
Neural Networks
• What is a neural network?
• Predicting with a neural network
• Training neural networks
• Practical concerns
This lecture
• What is a neural network?
– The hypothesis class
– Structure, expressiveness
• Predicting with a neural network
• Training neural networks
• Practical concerns
We have seen linear threshold units
[Figure: a linear threshold unit, with the features as inputs, followed by a dot product and a threshold]
Prediction: $\mathrm{sgn}(\mathbf{w}^T\mathbf{x} + b) = \mathrm{sgn}(\sum_i w_i x_i + b)$
Learning: various algorithms (perceptron, SVM, logistic regression, …); in general, minimize loss
But where do these input features come from?
What if the features were outputs of another classifier?
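The prediction rule of a linear threshold unit can be sketched in a few lines of Python. This is only an illustration: the weights, bias and inputs are made-up values, and mapping sgn(0) to +1 is a convention assumed here, not something the slides specify.

```python
import numpy as np

def sgn(z):
    # Threshold step. Treating z == 0 as +1 is an assumed convention.
    return np.where(z >= 0, 1, -1)

def predict(w, b, x):
    # Linear threshold unit: sgn(w^T x + b)
    return int(sgn(np.dot(w, x) + b))

w = np.array([2.0, -1.0])  # hypothetical weights
b = -0.5                   # hypothetical bias

print(predict(w, b, np.array([1.0, 0.0])))  # w^T x + b = 1.5
print(predict(w, b, np.array([0.0, 1.0])))  # w^T x + b = -1.5
```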
Features from classifiers
[Figure: classifiers whose outputs serve as the features of another classifier]
Each of these connections has its own weight as well
This is a two-layer feed-forward neural network
• The input layer
• The hidden layer
• The output layer
Think of the hidden layer as learning a good representation of the inputs
The dot product followed by the threshold constitutes a neuron
Five neurons in this picture (four in the hidden layer and one output)
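As a rough sketch of the picture described above, the following Python code stacks four threshold neurons (the hidden layer) under one threshold output neuron. The input dimension (3) and the random weights are assumptions made for illustration; in practice the weights would be learned.

```python
import numpy as np

def threshold(z):
    # Threshold activation; z == 0 is mapped to +1 (assumed convention).
    return np.where(z >= 0, 1.0, -1.0)

def two_layer_predict(x, W_hidden, b_hidden, w_out, b_out):
    # Hidden layer: four neurons, each a dot product followed by a threshold.
    h = threshold(W_hidden @ x + b_hidden)
    # Output layer: one neuron whose input features are the hidden outputs.
    return float(threshold(w_out @ h + b_out))

rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(4, 3))  # hypothetical: 3 inputs, 4 hidden neurons
b_hidden = rng.normal(size=4)
w_out = rng.normal(size=4)
b_out = 0.0

x = np.array([1.0, -2.0, 0.5])
y = two_layer_predict(x, W_hidden, b_hidden, w_out, b_out)
print(y)
```

The hidden vector `h` plays the role of the learned representation: the output neuron never sees `x` directly.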
But where do the inputs come from?
[Figure label: The input layer]
What if the inputs were the outputs of a classifier?
We can make a three-layer network… and so on.
Let us try to formalize this
Neural networks
A robust approach for approximating real-valued, discrete-valued or vector-valued functions
Among the most effective general-purpose supervised learning methods currently known
Especially for complex and hard-to-interpret data such as real-world sensory data
The backpropagation algorithm for neural networks has been shown to be successful in many practical problems
Across various application domains
Artificial neurons
Functions that very loosely mimic a biological neuron
A neuron accepts a collection of inputs (a vector x) and produces an output by:
1. Applying a dot product with weights w and adding a bias b
2. Applying a (possibly non-linear) transformation called an activation
$\text{output} = \text{activation}(\mathbf{w}^T\mathbf{x} + b)$
In the linear threshold unit, step 1 is the dot product and step 2 is the threshold activation
Other activations are possible
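The two steps above can be sketched as a single Python function that takes the activation as a parameter. The function name `neuron` and the example numbers are hypothetical, chosen only to make the computation concrete.

```python
import math

def neuron(x, w, b, activation):
    # Step 1: dot product with the weights, plus the bias.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    # Step 2: apply the (possibly non-linear) activation.
    return activation(z)

def threshold(z):
    # Threshold activation; z == 0 is mapped to +1 (assumed convention).
    return 1 if z >= 0 else -1

x = [1.0, 2.0]   # hypothetical input
w = [0.5, -1.0]  # hypothetical weights
b = 0.25         # hypothetical bias; here w^T x + b = -1.25

print(neuron(x, w, b, threshold))      # threshold neuron
print(neuron(x, w, b, math.tanh))      # same neuron with a tanh activation
print(neuron(x, w, b, lambda z: z))    # linear unit
```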
Activation functions
$\text{output} = \text{activation}(\mathbf{w}^T\mathbf{x} + b)$

| Name of the neuron | Activation function: activation(z) |
| --- | --- |
| Linear unit | $z$ |
| Threshold/sign unit | $\mathrm{sgn}(z)$ |
| Sigmoid unit | $\frac{1}{1 + \exp(-z)}$ |
| Rectified linear unit (ReLU) | $\max(0, z)$ |
| Tanh unit | $\tanh(z)$ |

Many more activation functions exist (sinusoid, sinc, gaussian, polynomial…)
Also called transfer functions
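For reference, the five activations listed on this slide can be written with NumPy as follows. The dictionary name `ACTIVATIONS` is hypothetical, and the threshold unit maps z = 0 to +1, which is an assumed convention.

```python
import numpy as np

ACTIVATIONS = {
    "linear": lambda z: z,
    "threshold": lambda z: np.where(z >= 0, 1.0, -1.0),  # sgn, with sgn(0) taken as +1
    "sigmoid": lambda z: 1.0 / (1.0 + np.exp(-z)),
    "relu": lambda z: np.maximum(0.0, z),
    "tanh": np.tanh,
}

z = np.array([-2.0, 0.0, 2.0])
for name, f in ACTIVATIONS.items():
    print(name, f(z))
```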
A neural network
A function that converts inputs to outputs, defined by a directed acyclic graph
– Nodes organized in layers, correspond to neurons
– Edges carry output of one neuron to another, associated with weights
• To define a neural network, we need to specify:
– The structure of the graph
• How many nodes, the connectivity
– The activation function on each node
– The edge weights
The structure of the graph and the activation functions are called the architecture of the network. It is typically predefined, part of the design of the classifier.
The edge weights are learned from data.
[Figure: a network with input, hidden and output layers; the edges are labeled with weights]
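One way to make the split between architecture and weights concrete is the sketch below. It assumes a fully connected layered network (a special case of the directed acyclic graph described above). The layer sizes and the choice of activations are hypothetical, and the random weights only stand in for values that would be learned from data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Architecture (fixed by the designer): 3 inputs, 4 hidden sigmoid units, 1 linear output.
layer_sizes = [3, 4, 1]
activations = [sigmoid, lambda z: z]

def init_weights(layer_sizes, rng):
    # One (weight matrix, bias vector) pair per layer of edges.
    return [
        (rng.normal(size=(n_out, n_in)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def forward(x, weights, activations):
    # Push the input through the layers in order.
    a = x
    for (W, b), act in zip(weights, activations):
        a = act(W @ a + b)
    return a

rng = np.random.default_rng(0)
weights = init_weights(layer_sizes, rng)  # placeholder for learned weights

y = forward(np.ones(3), weights, activations)
n_edges = sum(W.size for W, _ in weights)  # 3*4 + 4*1 edges
print(y, n_edges)
```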
A (very) brief history of neural networks
• 1943: McCulloch and Pitts showed how linear threshold units can compute logical functions
• 1949: Hebb suggested a learning rule that has some physiological plausibility
• 1950s: Rosenblatt, the Perceptron algorithm for a single threshold neuron
• 1969: Minsky and Papert studied the neuron from a geometrical perspective
• 1980s: Convolutional neural networks (Fukushima, LeCun), the backpropagation algorithm (various)
• Early 2000s-today: More compute, more data, deeper networks
See also: http://people.idsia.ch/~juergen/deep-learning-overview.html
What functions do neural networks express?
A single neuron with threshold activation
Prediction = $\mathrm{sgn}(b + w_1 x_1 + w_2 x_2)$
[Figure: positive (+) and negative (-) points in the plane, separated by the line $b + w_1 x_1 + w_2 x_2 = 0$]
Two layers, with threshold activations
In general, convex polygons
Figure from Shai Shalev-Shwartz and Shai Ben-David, 2014
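A small sketch of why two layers of threshold units give convex polygons: each hidden neuron tests one half-plane, and the output neuron fires only when all of them agree (a logical AND). The unit square used here is a hypothetical example, with weights set by hand rather than learned.

```python
import numpy as np

def threshold(z):
    # Threshold activation; z == 0 is mapped to +1 (assumed convention).
    return np.where(z >= 0, 1, -1)

# Hidden layer: four half-planes whose intersection is the unit square [0, 1] x [0, 1].
W = np.array([[1.0, 0.0],    # x1 >= 0
              [-1.0, 0.0],   # x1 <= 1
              [0.0, 1.0],    # x2 >= 0
              [0.0, -1.0]])  # x2 <= 1
b = np.array([0.0, 1.0, 0.0, 1.0])

def in_square(x):
    h = threshold(W @ np.asarray(x, dtype=float) + b)
    # Output neuron: AND of the four hidden units (sum is 4 only if all are +1).
    return int(threshold(h.sum() - 3.5))

print(in_square([0.5, 0.5]))  # inside the square
print(in_square([1.5, 0.5]))  # outside the square
```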
Three layers with threshold activations
In general, unions of convex polygons
Figure from Shai Shalev-Shwartz and Shai Ben-David, 2014
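The extra layer can be read as an OR over polygons. In the sketch below, the first layer tests half-planes, the second layer ANDs them into two convex polygons, and the output neuron ORs the two. The two squares and all weights are hand-picked, hypothetical values.

```python
import numpy as np

def threshold(z):
    # Threshold activation; z == 0 is mapped to +1 (assumed convention).
    return np.where(z >= 0, 1, -1)

# Layer 1: eight half-planes, four per square ([0,1]x[0,1] and [2,3]x[2,3]).
W1 = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]] * 2, dtype=float)
b1 = np.array([0, 1, 0, 1, -2, 3, -2, 3], dtype=float)

# Layer 2: one AND unit per square.
W2 = np.array([[1, 1, 1, 1, 0, 0, 0, 0],
               [0, 0, 0, 0, 1, 1, 1, 1]], dtype=float)
b2 = np.array([-3.5, -3.5])

# Layer 3: OR of the two squares (fires unless both AND units are -1).
w3 = np.array([1.0, 1.0])
b3 = 1.5

def predict(x):
    h1 = threshold(W1 @ np.asarray(x, dtype=float) + b1)
    h2 = threshold(W2 @ h1 + b2)
    return int(threshold(w3 @ h2 + b3))

print(predict([0.5, 0.5]))  # inside the first square
print(predict([2.5, 2.5]))  # inside the second square
print(predict([1.5, 1.5]))  # in neither square
```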
Neural networks are universal function approximators
• Any continuous function can be approximated to arbitrary accuracy using one hidden layer of sigmoid units [Cybenko 1989]
• Approximation error is insensitive to the choice of activation functions [DasGupta et al 1993]
• Two-layer threshold networks can express any Boolean function
– Exercise: Prove this
• VC dimension of threshold network with edges E: $VC = O(|E| \log |E|)$
• VC dimension of sigmoid networks with nodes V and edges E:
– Upper bound: $O(|V|^2 |E|^2)$
– Lower bound: $\Omega(|E|^2)$
Exercise: Show that if we have only linear units, then multiple layers do not change the expressiveness
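As a numerical hint for the last exercise (not a proof), the sketch below checks that two stacked linear layers compute the same function as a single linear layer with weights $W_2 W_1$ and bias $W_2 b_1 + b_2$. The layer sizes and random values are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)  # first linear layer
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)  # second linear layer

def two_linear_layers(x):
    return W2 @ (W1 @ x + b1) + b2

# Collapse both layers into a single linear map.
W = W2 @ W1
b = W2 @ b1 + b2

def one_linear_layer(x):
    return W @ x + b

x = rng.normal(size=3)
print(np.allclose(two_linear_layers(x), one_linear_layer(x)))
```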