CS388: Natural Language Processing, Lecture 6: Neural Networks (2019. 9. 17.)
CS388: Natural Language Processing
Greg Durrett
Lecture 6: Neural Networks
Administrivia
‣ Mini 1 graded later this week
‣ Project 1 due in a week
Recall: CRFs
‣ Naive Bayes : logistic regression :: HMMs : CRFs (local vs. global normalization <-> generative vs. discriminative)
(locally normalized discriminative models do exist (MEMMs))
P(y|x) = (1/Z) exp( sum_{k=1}^n w^T f_k(x, y) )

[Figure: factor graph with tags y1, y2, observations x1, x2, x3, and factors f1, f2, f3]
‣ HMMs: in the standard setup, emissions consider one word at a time
‣ CRFs: features over many words simultaneously, non-independent features (e.g., suffixes and prefixes), doesn't have to be a generative model
‣ Conditional model: x's are observed
Recall: Sequential CRFs
‣ Model:
‣ Inference: argmax_y P(y|x) from Viterbi
‣ Learning: run forward-backward to compute posterior probabilities; then
P(y|x) ∝ exp( w^T [ sum_{i=2}^n f_t(y_{i-1}, y_i) + sum_{i=1}^n f_e(y_i, i, x) ] )

∂L(y*, x)/∂w = sum_{i=1}^n f_e(y*_i, i, x) − sum_{i=1}^n sum_s P(y_i = s | x) f_e(s, i, x)

[Figure: chain y1, y2, ..., yn with transition factors φ_t between adjacent tags and emission factors φ_e attached to each tag]
‣ Emission features capture word-level info, transitions enforce tag consistency
This Lecture
‣ Finish discussion of NER
‣ Neural network history
‣ Neural network basics
‣ Feedforward neural networks + backpropagation
‣ Applications
‣ Implementing neural networks (if time)
‣ Beam search: in a few lectures
NER
NER
‣ CRF with lexical features can get around 85 F1 on this problem
‣ Other pieces of information that many systems capture
‣ World knowledge:

The delegation met the president at the airport, Tanjug said.
Nonlocal Features

The delegation met the president at the airport, Tanjug said.   (Tanjug: ORG? PER?)
The news agency Tanjug reported on the outcome of the meeting.

‣ More complex factor graph structures can let you capture this, or just decode sentences in order and use features on previous sentences
Finkel and Manning (2008), Ratinov and Roth (2009)
Semi-Markov Models

Barack Obama will travel to Hangzhou today for the G20 meeting.
[Figure: the sentence segmented into chunks labeled PER, O, LOC, O, ORG, O]

‣ Chunk-level prediction rather than token-level BIO
‣ y is a set of spans covering the sentence
‣ Pros: features can look at whole span at once
‣ Cons: there's an extra factor of n in the dynamic programs
Sarawagi and Cohen (2004)
Evaluating NER

Barack Obama will travel to Hangzhou today for the G20 meeting.
B-PER I-PER O O O B-LOC O O O B-ORG O
(entities: Barack Obama = PERSON, Hangzhou = LOC, G20 = ORG)

‣ Prediction of all Os still gets 66% accuracy on this example!
‣ What we really want to know: how many named entity chunk predictions did we get right?
‣ Precision: of the ones we predicted, how many are right?
‣ Recall: of the gold named entities, how many did we find?
‣ F-measure: harmonic mean of these two
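Chunk-level precision/recall/F1 can be computed by comparing predicted spans against gold spans; a minimal sketch (not the official CoNLL evaluation script; `chunk_f1` and the (label, start, end) span encoding are hypothetical):

```python
# Sketch: chunk-level precision, recall, and F1, where each chunk is a
# (label, start, end) span and only exact matches count as correct.
def chunk_f1(gold, pred):
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)                       # exactly matching spans
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Gold from the example: Barack Obama = PER, Hangzhou = LOC, G20 = ORG.
gold = [("PER", 0, 2), ("LOC", 5, 6), ("ORG", 9, 10)]
pred = [("PER", 0, 2), ("ORG", 5, 6)]   # one hit, one label error
p, r, f = chunk_f1(gold, pred)          # p = 1/2, r = 1/3, f = 0.4
```

Note that a span with the right boundaries but the wrong label (Hangzhou predicted as ORG) counts against both precision and recall.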
How well do NER systems do?
‣ Ratinov and Roth (2009)
‣ Lample et al. (2016)
‣ BiLSTM-CRF + ELMo: 92.2 (Peters et al., 2018)
‣ BERT: 92.8 (Devlin et al., 2019)
Modern Entity Typing
‣ More and more classes (17 -> 112 -> 10,000+)
Choi et al. (2018)
Neural Net History
History: NN "dark ages"
‣ Convnets: applied to MNIST by LeCun in 1998
‣ LSTMs: Hochreiter and Schmidhuber (1997)
‣ Henderson (2003): neural shift-reduce parser, not SOTA
2008-2013: A glimmer of light…
‣ Collobert and Weston 2011: "NLP (almost) from scratch"
‣ Feedforward neural nets induce features for sequential CRFs ("neural CRF")
‣ 2008 version was marred by bad experiments, claimed SOTA but wasn't; 2011 version tied SOTA
‣ Socher 2011-2014: tree-structured RNNs working okay
‣ Krizhevsky et al. (2012): AlexNet for vision
2014: Stuff starts working
‣ Sutskever et al. + Bahdanau et al.: seq2seq for neural MT (LSTMs)
‣ Kim (2014) + Kalchbrenner et al. (2014): sentence classification / sentiment (convnets)
‣ Chen and Manning: transition-based dependency parser (based on feedforward networks)
‣ 2015: explosion of neural nets for everything under the sun
‣ What made these work? Data (not as important as you might think), optimization (initialization, adaptive optimizers), representation (good word embeddings)
Neural Net Basics
Neural Networks
‣ Linear classification: argmax_y w^T f(x, y)
‣ Want to learn intermediate conjunctive features of the input

the movie was not all that good
I[contains "not" & contains "good"]

‣ How do we learn this if our feature vector is just the unigram indicators I[contains "not"], I[contains "good"]?
Neural Networks: XOR
‣ Let's see how we can use neural nets to learn a simple nonlinear function
‣ Inputs: x1, x2 (generally x = (x1, ..., xm))
‣ Output: y (generally y = (y1, ..., yn)); here y = x1 XOR x2

x1  x2 | y = x1 XOR x2
 0   0 | 0
 0   1 | 1
 1   0 | 1
 1   1 | 0
Neural Networks: XOR

y = a1 x1 + a2 x2   ✗ (no linear function of x1, x2 can fit XOR)
y = a1 x1 + a2 x2 + a3 tanh(x1 + x2)
(the tanh(x1 + x2) term acts like an "or" of the inputs; its shape looks like the action potential in a neuron)
Neural Networks: XOR
‣ One setting of the parameters that solves XOR (with a suitable threshold):

y = −x1 − x2 + 2 tanh(x1 + x2)

[Figure: the decision surface of this function over the (x1, x2) plane separates the XOR classes]
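The solution above can be checked numerically; a quick sketch in plain Python (the 0.25 threshold is chosen for illustration, not from the slide):

```python
import math

# The hand-built net from the slide: y = -x1 - x2 + 2*tanh(x1 + x2)
def xor_net(x1, x2):
    return -x1 - x2 + 2 * math.tanh(x1 + x2)

for x1 in (0, 1):
    for x2 in (0, 1):
        y = xor_net(x1, x2)
        # y is well above 0.25 exactly when x1 XOR x2 is true:
        # y(0,0) = 0, y(0,1) = y(1,0) ≈ 0.52, y(1,1) ≈ -0.07
        assert (y > 0.25) == bool(x1 ^ x2)
```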
Neural Networks

Linear model: y = w · x + b
Add a nonlinearity: y = g(w · x + b); in vector form, y = g(Wx + b)

[Figure: the map g(Wx + b) shifts and warps space with a nonlinear transformation]
Taken from http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
Neural Networks

[Figure: a linear classifier fails to separate the classes; the neural network separates them … possible because we transformed the space!]
Taken from http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
Deep Neural Networks

z = g(V g(Wx + b) + c)
z = g(V y + c), where y = g(Wx + b) is the output of the first layer

Taken from http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/

Check: what happens if no nonlinearity? z = V(Wx + b) + c. More powerful than basic linear models?
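The check can be verified concretely: without the nonlinearity, two stacked layers compose into a single linear map, so nothing is gained. A small numpy sketch (random matrices, invented shapes):

```python
import numpy as np

rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 3)), rng.normal(size=4)
V, c = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

# Two stacked layers WITHOUT a nonlinearity ...
z = V @ (W @ x + b) + c
# ... collapse to ONE linear layer with U = V W and d = V b + c
U, d = V @ W, V @ b + c
assert np.allclose(z, U @ x + d)
```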
Feedforward Networks, Backpropagation
Logistic Regression with NNs

P(y|x) = exp(w^T f(x, y)) / sum_{y'} exp(w^T f(x, y'))   ‣ single scalar probability
P(y|x) = softmax( [w^T f(x, y)]_{y in Y} )   ‣ computes scores for all possible labels at once (returns vector)
softmax(p)_i = exp(p_i) / sum_{i'} exp(p_{i'})   ‣ softmax exps and normalizes a given vector
P(y|x) = softmax(W f(x))   ‣ weight vector per class; W is [num classes x num feats]
P(y|x) = softmax(W g(V f(x)))   ‣ now one hidden layer
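The softmax is easy to implement directly; a minimal numpy sketch (the max-subtraction is a standard numerical-stability trick, not part of the slide's formula, and it does not change the result):

```python
import numpy as np

# softmax(p)_i = exp(p_i) / sum_i' exp(p_i'); subtract max(p) first so
# exp never overflows (the shift cancels in the ratio)
def softmax(p):
    e = np.exp(p - np.max(p))
    return e / e.sum()

probs = softmax(np.array([1.0, 2.0, 3.0]))
assert np.isclose(probs.sum(), 1.0)   # normalizes to a distribution
assert probs.argmax() == 2            # preserves the score ranking
# shift-invariance: adding a constant to every score changes nothing
assert np.allclose(softmax(np.array([5.0, 6.0, 7.0])), probs)
```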
Neural Networks for Classification

P(y|x) = softmax(W g(V f(x)))

[Figure: f(x) (n features) → V (d x n matrix) → nonlinearity g (tanh, relu, ...) → z (d hidden units) → W (num_classes x d matrix) → softmax → P(y|x) (num_classes probs)]
Training Neural Networks

z = g(V f(x)),   P(y|x) = softmax(Wz)

‣ Maximize log likelihood of training data

L(x, i*) = log P(y = i* | x) = log( softmax(Wz) · e_{i*} )
L(x, i*) = Wz · e_{i*} − log sum_j exp(Wz) · e_j

‣ i*: index of the gold label
‣ e_i: 1 in the ith row, zero elsewhere. Dot by this = select ith index
Computing Gradients

L(x, i*) = Wz · e_{i*} − log sum_j exp(Wz) · e_j

‣ Gradient with respect to W:

∂L(x, i*)/∂W_ij = z_j − P(y = i|x) z_j   if i = i*
                = −P(y = i|x) z_j        otherwise

‣ Looks like logistic regression with z as the features!
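The gradient formula can be sanity-checked against finite differences; a numpy sketch with invented dimensions (3 classes, 4 hidden units):

```python
import numpy as np

def softmax(p):
    e = np.exp(p - p.max())
    return e / e.sum()

# L(x, i*) = log softmax(Wz) . e_{i*}; analytic gradient from the slide:
# dL/dW_ij = (1[i == i*] - P(y = i | x)) * z_j
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
z = rng.normal(size=4)
i_star = 1

probs = softmax(W @ z)
e = np.zeros(3); e[i_star] = 1.0
analytic = np.outer(e - probs, z)

# Central finite differences, entry by entry
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(3):
    for j in range(4):
        Wp = W.copy(); Wp[i, j] += eps
        Wm = W.copy(); Wm[i, j] -= eps
        lp = np.log(softmax(Wp @ z)[i_star])
        lm = np.log(softmax(Wm @ z)[i_star])
        numeric[i, j] = (lp - lm) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-5)
```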
Neural Networks for Classification

P(y|x) = softmax(W g(V f(x)))

[Figure: the same network, annotated with the gradient ∂L/∂W computed at the output layer from z]
Computing Gradients: Backpropagation

z = g(V f(x))   ‣ activations at hidden layer

‣ Gradient with respect to V: apply the chain rule

L(x, i*) = Wz · e_{i*} − log sum_j exp(Wz) · e_j
err(root) = e_{i*} − P(y|x)   (dim = m)
∂L(x, i*)/∂z = err(z) = W^T err(root)   (dim = d)
∂L(x, i*)/∂V_ij = (∂L(x, i*)/∂z) · (∂z/∂V_ij)   [some math…]
Backpropagation: Picture

P(y|x) = softmax(W g(V f(x)))

[Figure: the network annotated with ∂L/∂W, err(root), and err(z) flowing backward from the output]

‣ Can forget everything after z, treat it as the output and keep backpropping
Backpropagation: Takeaways
‣ Gradients of output weights W are easy to compute: looks like logistic regression with hidden layer z as feature vector
‣ Can compute derivative of loss with respect to z to form an "error signal" for backpropagation
‣ Easy to update parameters based on "error signal" from next layer, keep pushing error signal back as backpropagation
‣ Need to remember the values from the forward computation
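These takeaways can be made concrete in a few lines; a numpy sketch of the forward and backward pass for one hidden layer with g = tanh (dimensions invented), checked against a finite difference:

```python
import numpy as np

def softmax(p):
    e = np.exp(p - p.max())
    return e / e.sum()

# z = tanh(V f), P(y|x) = softmax(W z), L = log P(y = i* | x)
rng = np.random.default_rng(1)
V = rng.normal(size=(4, 5)); W = rng.normal(size=(3, 4))
f = rng.normal(size=5); i_star = 2

z = np.tanh(V @ f)             # forward values must be remembered!
probs = softmax(W @ z)
e = np.zeros(3); e[i_star] = 1.0

err_root = e - probs                      # error signal at the output
grad_W = np.outer(err_root, z)            # logistic regression on z
err_z = W.T @ err_root                    # push error back through W
grad_V = np.outer(err_z * (1 - z**2), f)  # chain rule through tanh

# Numeric check on one entry of V
eps = 1e-6
Vp = V.copy(); Vp[0, 0] += eps
Vm = V.copy(); Vm[0, 0] -= eps
lp = np.log(softmax(W @ np.tanh(Vp @ f))[i_star])
lm = np.log(softmax(W @ np.tanh(Vm @ f))[i_star])
assert abs(grad_V[0, 0] - (lp - lm) / (2 * eps)) < 1e-5
```

The `1 - z**2` factor is tanh'(a) written in terms of the remembered activation z, which is why the forward values have to be kept around.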
Applications
NLP with Feedforward Networks
‣ Part-of-speech tagging with FFNNs

Fed raises interest rates in order to…

f(x) = [emb(raises); emb(interest); emb(rates); …]
(previous word, curr word, next word; other words, feats, etc.)

‣ Word embeddings for each word form input
‣ ~1000 features here: smaller feature vector than in sparse models, but every feature fires on every example
‣ Weight matrix learns position-dependent processing of the words
Botha et al. (2017)
NLP with Feedforward Networks
‣ Hidden layer mixes these different signals and learns feature conjunctions
Botha et al. (2017)
NLP with Feedforward Networks
‣ Multilingual tagging results:
‣ Gillick used LSTMs; this is smaller, faster, and better
Botha et al. (2017)
Sentiment Analysis
‣ Deep Averaging Networks: feedforward neural network on average of word embeddings from input
Iyyer et al. (2015)
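A minimal sketch of the deep averaging idea (toy random embeddings and weights, not Iyyer et al.'s trained model; `dan_forward` is a hypothetical helper):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy 8-dimensional embeddings for the words in the running example
embeddings = {w: rng.normal(size=8) for w in
              "the movie was not all that good".split()}

def dan_forward(sentence, V, W):
    # Average the word embeddings, then run an ordinary feedforward
    # classifier (tanh hidden layer + softmax) on the average
    avg = np.mean([embeddings[w] for w in sentence.split()], axis=0)
    z = np.tanh(V @ avg)
    scores = W @ z
    e = np.exp(scores - scores.max())
    return e / e.sum()

V = rng.normal(size=(6, 8))   # 6 hidden units
W = rng.normal(size=(2, 6))   # 2 classes (pos / neg)
probs = dan_forward("the movie was not that good", V, W)
assert probs.shape == (2,) and np.isclose(probs.sum(), 1.0)
```

The averaging step throws away word order entirely, which is exactly the bag-of-words trade-off discussed on the next slide.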
Sentiment Analysis

[Table: sentiment accuracy results, with one group of bag-of-words models and one group of Tree RNN / CNN / LSTM models]

Wang and Manning (2012), Kim (2014), Iyyer et al. (2015)
Implementation Details
Computation Graphs

‣ Computing gradients is hard! The computation graph abstraction allows us to define a computation symbolically and will do this for us
‣ Automatic differentiation: keep track of derivatives / be able to backpropagate through each function:

y = x * x   --codegen-->   (y, dy) = (x * x, 2 * x * dx)

‣ Use a library like Pytorch or Tensorflow. This class: Pytorch
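The (y, dy) pairing shown above is forward-mode automatic differentiation; a minimal dual-number sketch (the `Dual` class is illustrative, not part of any library):

```python
# Each value carries its derivative along with it, so evaluating the
# function also evaluates the derivative (forward-mode autodiff)
class Dual:
    def __init__(self, val, dval):
        self.val, self.dval = val, dval
    def __add__(self, other):
        return Dual(self.val + other.val, self.dval + other.dval)
    def __mul__(self, other):
        # product rule: d(uv) = u dv + v du
        return Dual(self.val * other.val,
                    self.val * other.dval + self.dval * other.val)

x = Dual(3.0, 1.0)   # seed dx = 1 to track dy/dx
y = x * x            # y = x * x  =>  (y, dy) = (9, 2*3) at x = 3
assert y.val == 9.0 and y.dval == 6.0
```

PyTorch instead uses reverse-mode autodiff (backpropagation), which is cheaper when one scalar loss depends on many parameters.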
Computation Graphs in Pytorch

P(y|x) = softmax(W g(V f(x)))

class FFNN(nn.Module):
    def __init__(self, inp, hid, out):
        super(FFNN, self).__init__()
        self.V = nn.Linear(inp, hid)
        self.g = nn.Tanh()
        self.W = nn.Linear(hid, out)
        self.softmax = nn.Softmax(dim=0)

    def forward(self, x):
        return self.softmax(self.W(self.g(self.V(x))))

‣ Defines the network and the forward pass for P(y|x)
Computation Graphs in Pytorch

P(y|x) = softmax(W g(V f(x)))

ffnn = FFNN()

def make_update(input, gold_label):
    ffnn.zero_grad()  # clear gradient variables
    probs = ffnn.forward(input)
    loss = torch.neg(torch.log(probs)).dot(gold_label)
    loss.backward()
    optimizer.step()

‣ gold_label: e_i*, one-hot vector of the label (e.g., [0, 1, 0])
Training a Model

Define a computation graph
For each epoch:
    For each batch of data:
        Compute loss on batch
        Autograd to compute gradients
        Take step with optimizer
Decode test set
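The outline above can be sketched end to end; a tiny numpy version that trains logistic regression with minibatch SGD on invented toy data (standing in for the PyTorch loop, with the gradient written by hand where autograd would compute it):

```python
import numpy as np

# Toy, linearly labeled data: two clusters, label = sign(x0 + x1)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) + np.where(rng.random(100) < 0.5, 2, -2)[:, None]
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w, b, lr = np.zeros(2), 0.0, 0.1          # define the "graph" (parameters)
for epoch in range(20):                   # for each epoch
    for start in range(0, 100, 10):       # for each batch of data
        xb, yb = X[start:start+10], y[start:start+10]
        p = 1 / (1 + np.exp(-(xb @ w + b)))   # compute loss on batch
        grad_w = xb.T @ (p - yb) / len(xb)    # (autograd would do this)
        grad_b = (p - yb).mean()
        w -= lr * grad_w                      # take step with optimizer
        b -= lr * grad_b

preds = (X @ w + b > 0).astype(float)     # "decode" (here: the train set)
assert (preds == y).mean() > 0.9
```

The nesting matters: losses and gradient steps happen per batch inside each epoch, and decoding happens once after training.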
Next Time
‣ Word representations / word vectors
‣ word2vec, GloVe
‣ Training neural networks