Transcript of TTIC 31190: Natural Language Processing, Lecture 11 (kgimpel/teaching/31190/lectures/11.pdf)
TTIC 31190: Natural Language Processing
Kevin Gimpel, Winter 2016
Lecture 11: Recurrent and Convolutional Neural Networks in NLP
Announcements
• Assignment 3 assigned yesterday, due Feb. 29
• project proposal due Tuesday, Feb. 16
• midterm on Thursday, Feb. 18
Roadmap
• classification
• words
• lexical semantics
• language modeling
• sequence labeling
• neural network methods in NLP
• syntax and syntactic parsing
• semantic compositionality
• semantic parsing
• unsupervised learning
• machine translation and other applications
2-transformation (1-layer) network
• we’ll call this a “2-transformation” neural network, or a “1-layer” neural network
• input vector, score vector, and one hidden vector (“hidden layer”)
vector of label scores
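The 2-transformation network above can be sketched in NumPy; the sizes (5-dim input, 4-dim hidden vector, 3 labels) and the tanh nonlinearity are illustrative assumptions, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# assumed sizes: 5-dim input, 4-dim hidden vector, 3 labels
d_in, d_hid, n_labels = 5, 4, 3

W1 = rng.standard_normal((d_hid, d_in))      # first transformation
b1 = rng.standard_normal(d_hid)
W2 = rng.standard_normal((n_labels, d_hid))  # second transformation
b2 = rng.standard_normal(n_labels)

def score(x):
    """Two transformations: input -> hidden vector -> vector of label scores."""
    h = np.tanh(W1 @ x + b1)   # hidden vector ("hidden layer")
    return W2 @ h + b2         # vector of label scores

x = rng.standard_normal(d_in)
print(score(x).shape)  # (3,)
```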
1-layer neural network for sentiment classification
Use softmax function to convert scores into probabilities
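A minimal sketch of that conversion; subtracting the max before exponentiating is a standard numerical-stability trick, not something from the slides:

```python
import numpy as np

def softmax(scores):
    """Convert a vector of label scores into probabilities."""
    z = np.exp(scores - scores.max())  # subtract max for numerical stability
    return z / z.sum()

p = softmax(np.array([2.0, 1.0, -1.0]))
print(p.sum())  # 1.0
```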
ikr smh he asked fir yo last name so he can add u on fb lololol
intj pronoun prep adj prep verb other verb det noun pronoun pronoun proper noun verb prep intj (one POS tag per word in the tweet above)
Neural Networks for Twitter Part-of-Speech Tagging
(adj = adjective, prep = preposition, intj = interjection)
• in Assignment 3, you’ll build a neural network classifier to predict a word’s POS tag based on its context
ikr smh he asked fir yo last name so he can
intj pronoun prep adj prep verb other verb det noun pronoun (one POS tag per word)
• e.g., predict the tag of yo given its context
• what should the input x be?
– it has to be independent of the label
– it has to be a fixed-length vector
word vector for yo
word vector for yo + word vector for fir
• when using word vectors as part of the input, we can also treat them as more parameters to be learned!
• this is called “updating” or “fine-tuning” the vectors (since they are initialized using something like word2vec)
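A sketch of building the fixed-length input from word vectors; the tiny vocabulary, the 4-dim vectors, and the `make_input` helper are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical tiny vocabulary with 4-dim word vectors
# (in practice initialized from something like word2vec)
vocab = {"fir": 0, "yo": 1, "last": 2, "name": 3}
E = rng.standard_normal((len(vocab), 4))  # embedding table: one row per word

def make_input(words):
    """Fixed-length input x: concatenate the vectors of the chosen words."""
    return np.concatenate([E[vocab[w]] for w in words])

x = make_input(["fir", "yo"])  # word vector for fir + word vector for yo
print(x.shape)  # (8,)
# fine-tuning: treat the rows of E as more parameters and update
# them with the gradient of the loss, just like the weight matrices
```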
• let’s use the center word + two words to the right:
vector for yo + vector for last + vector for name
• if name is to the right of yo, then yo is probably a form of your
• but our x above uses separate dimensions for each position!
– i.e., name is two words to the right
– what if name is one word to the right?
Features and Filters
• we could use a feature that returns 1 if name is to the right of the center word, but that does not use the word’s embedding
• how do we include a feature like “a word similar to name appears somewhere to the right of the center word”?
• rather than always specifying relative position and embedding, we want to add filters that look for words like name anywhere in the window (or sentence!)
Filters
• for now, think of a filter as a vector in the word vector space
• the filter matches a particular region of the space
• “match” = “has high dot product with”
Convolution
• convolutional neural networks use a bunch of such filters
• each filter is matched against (dot product computed with) each word in the entire context window or sentence
• e.g., a single filter is a vector of the same length as word vectors
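The matching step can be sketched directly: each entry of the result is the dot product of the filter with one word vector (sizes and values here are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

d = 4                                # word-vector dimensionality
words = rng.standard_normal((3, d))  # e.g. vectors for "yo", "last", "name"
f = rng.standard_normal(d)           # one filter, same length as a word vector

# one dot product per word position: the filter is matched
# against every word in the window/sentence
feature_map = words @ f
print(feature_map.shape)  # (3,)
```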
= “feature map”: has an entry for each word position in the context window/sentence
Pooling
how do we convert this into a fixed-length vector? use pooling:
• max-pooling: returns the maximum value in the feature map
• average pooling: returns the average of the values in the feature map
then, this single filter produces a single feature value (the output of some kind of pooling). in practice, we use many filters of many different lengths (e.g., matching n-grams rather than single words).
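The two pooling operations on a toy feature map (the values are invented for illustration):

```python
import numpy as np

feature_map = np.array([0.2, 1.7, -0.5])  # one entry per word position

max_pooled = feature_map.max()   # max-pooling: largest value in the map
avg_pooled = feature_map.mean()  # average pooling: mean of the values

print(max_pooled, avg_pooled)
```

Either way, the variable-length feature map collapses to a single number per filter, which is what makes the final representation fixed-length.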
Convolutional Neural Networks
• convolutional neural networks (convnets or CNNs) use filters that are “convolved with” (matched against all positions of) the input
• informally, think of convolution as “perform the same operation everywhere on the input in some systematic order”
• “convolutional layer” = set of filters that are convolved with the input vector (whether x or a hidden vector)
• could be followed by more convolutional layers, or by a type of pooling
• often used in NLP to convert a sentence into a feature vector
Recurrent Neural Networks
Input is a sequence:
not too bad
“hidden vector”
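A sketch of a vanilla (Elman-style) RNN over the sequence; the sizes and the tanh recurrence are assumptions, since the slides' equations live in the figures. Each position gets a hidden vector computed from the current input and the previous hidden vector:

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_h = 4, 5  # assumed input / hidden sizes

W = rng.standard_normal((d_h, d_x))
U = rng.standard_normal((d_h, d_h))
b = rng.standard_normal(d_h)

def run_rnn(xs):
    """One hidden vector per position; each depends on the previous one."""
    h = np.zeros(d_h)
    for x in xs:  # e.g. vectors for "not", "too", "bad"
        h = np.tanh(W @ x + U @ h + b)
    return h      # final hidden vector

h = run_rnn(rng.standard_normal((3, d_x)))
print(h.shape)  # (5,)
```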
Disclaimer
• these diagrams are often useful for helping us understand and communicate neural network architectures
• but they rarely have any sort of formal semantics (unlike graphical models)
• they are more like cartoons
Long Short-Term Memory RNNs (gateless)
“memory cell”
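The gateless equations are in the slide figures, not the transcript; as an assumption, one common gateless formulation simply adds new information to the memory cell at every step, with the hidden vector reading the cell through a tanh:

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_h = 4, 5  # assumed input / hidden sizes

W = rng.standard_normal((d_h, d_x))
U = rng.standard_normal((d_h, d_h))
b = rng.standard_normal(d_h)

def gateless_step(x, h_prev, c_prev):
    """Gateless memory cell: new information is added to the cell unconditionally."""
    c = c_prev + np.tanh(W @ x + U @ h_prev + b)  # memory cell accumulates
    h = np.tanh(c)                                # hidden vector reads the cell
    return h, c

h, c = np.zeros(d_h), np.zeros(d_h)
for x in rng.standard_normal((3, d_x)):
    h, c = gateless_step(x, h, c)
print(h.shape, c.shape)
```

The gates introduced on the following slides modulate exactly these two operations: what enters the cell, what stays in it, and what is read out.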
Experiment: text classification
• Stanford Sentiment Treebank
• binary classification (positive/negative)
• 25-dim word vectors
• 50-dim cell/hidden vectors
• classification layer on final hidden vector
• AdaGrad, 10 epochs, mini-batch size 10
• early stopping on dev set
accuracy: 80.6
Output Gates
this is pointwise multiplication! the gate is a vector
diagonal matrix
logistic sigmoid, so output ranges from 0 to 1
| | acc. |
|---|---|
| gateless | 80.6 |
| output gates | 81.9 |
What’s being learned? (demo)
Input Gates
again, this is pointwise multiplication
diagonal matrix
Output Gates vs. Input Gates: the difference
Input Gates

| | acc. |
|---|---|
| gateless | 80.6 |
| output gates | 81.9 |
| input gates | 84.4 |
Input and Output Gates

| | acc. |
|---|---|
| gateless | 80.6 |
| output gates | 81.9 |
| input gates | 84.4 |
| input & output gates | 84.6 |
Forget Gates
| | acc. |
|---|---|
| gateless | 80.6 |
| output gates | 81.9 |
| input gates | 84.4 |
| forget gates | 82.1 |
All Gates
| | acc. |
|---|---|
| gateless | 80.6 |
| output gates | 81.9 |
| input gates | 84.4 |
| input & output gates | 84.6 |
| forget gates | 82.1 |
| input & forget gates | 84.1 |
| forget & output gates | 82.6 |
| input, forget, output gates | 85.3 |
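A step of the all-gates LSTM can be sketched as follows. This is the standard formulation with input, forget, and output gates (without the diagonal peephole terms shown on the slides, which is an assumption); all sizes are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d_x, d_h = 4, 5  # assumed sizes

def params():
    return (rng.standard_normal((d_h, d_x)),
            rng.standard_normal((d_h, d_h)),
            rng.standard_normal(d_h))

W_c, U_c, b_c = params()  # candidate cell update
W_i, U_i, b_i = params()  # input gate
W_f, U_f, b_f = params()  # forget gate
W_o, U_o, b_o = params()  # output gate

def lstm_step(x, h, c):
    i = sigmoid(W_i @ x + U_i @ h + b_i)  # what enters the cell
    f = sigmoid(W_f @ x + U_f @ h + b_f)  # what the cell keeps
    o = sigmoid(W_o @ x + U_o @ h + b_o)  # what is read out
    c_new = f * c + i * np.tanh(W_c @ x + U_c @ h + b_c)  # pointwise products
    h_new = o * np.tanh(c_new)
    return h_new, c_new

h, c = np.zeros(d_h), np.zeros(d_h)
for x in rng.standard_normal((3, d_x)):
    h, c = lstm_step(x, h, c)
print(h.shape, c.shape)
```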
Backward & Bidirectional LSTMs
bidirectional: if shallow, just use forward and backward LSTMs in parallel, concatenate the final two hidden vectors, feed to softmax
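The shallow bidirectional combination, sketched with stand-in vectors for the two LSTMs' final hidden states:

```python
import numpy as np

rng = np.random.default_rng(0)
d_h = 5  # assumed hidden size

# stand-ins for the final hidden vectors of the two LSTMs
h_forward = rng.standard_normal(d_h)   # forward LSTM reads left-to-right
h_backward = rng.standard_normal(d_h)  # backward LSTM reads right-to-left

# shallow bidirectional: concatenate, then feed to the softmax layer
features = np.concatenate([h_forward, h_backward])
print(features.shape)  # (10,)
```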
| | forward | backward |
|---|---|---|
| gateless | 80.6 | 80.3 |
| output gates | 81.9 | 83.7 |
| input gates | 84.4 | 82.9 |
| forget gates | 82.1 | 83.4 |
| input, forget, output gates | 85.3 | 85.9 |
| | forward | backward | bidirectional |
|---|---|---|---|
| gateless | 80.6 | 80.3 | 81.5 |
| output gates | 81.9 | 83.7 | 82.6 |
| input gates | 84.4 | 82.9 | 83.9 |
| forget gates | 82.1 | 83.4 | 83.1 |
| input, forget, output gates | 85.3 | 85.9 | 85.1 |
LSTM
Deep LSTM (2-layer)
layer 1
layer 2
| | acc. |
|---|---|
| gateless, shallow (50) | 80.6 |
| gateless, deep (30, 30) | 80.8 |
| input/forget/output, shallow (50) | 85.3 |
| input/forget/output, deep (30, 30) | ~85 |
Deep Bidirectional LSTMs
concatenate the hidden vectors of the forward & backward LSTMs, connecting each entry to the forward and backward hidden vectors in the next layer
(logistic) sigmoid: σ(x) = 1 / (1 + e^(−x))