Lecture 8: RNNs
Alan Ritter (many slides from Greg Durrett)
Administrivia
‣ Homework 3 due next week on Monday
‣ Guest lecture next week on Wednesday
‣ Midterm is next week on Friday
‣ Will cover everything up to this Friday
‣ Practice questions: https://docs.google.com/document/d/1eidU29ni8ZeTlBcrTyQ67duurxU74vk7I16fUok_X8s/edit?usp=sharing
‣ Reading: RNNs
‣ Goldberg 10, 11
‣ Jurafsky and Martin, Chapter 9
Recall: Training Tips
‣ Parameter initialization is critical to get good gradients; some useful heuristics exist (e.g., Xavier initializer)
‣ Dropout is an effective regularizer
‣ Think about your optimizer: Adam or tuned SGD work well
Recall: Word Vectors
[Figure: 2D word-vector space containing good, enjoyable, great, bad, dog, is]
Recall: Continuous Bag-of-Words
‣ Predict word from context: the dog bit the man
‣ Embed the context words (the, dog), sum them into a vector of size d, multiply by W, and take a softmax to get P(w | w−1, w+1)
‣ Matrix factorization approaches useful for learning vectors from really large data
Mikolov et al. (2013)
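‣ A minimal sketch of the CBOW architecture above, assuming PyTorch (module and variable names are made up for illustration):

```python
import torch
import torch.nn as nn

class CBOW(nn.Module):
    """Predict a word from the sum of its context-word embeddings."""
    def __init__(self, vocab_size, d):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)  # context embeddings, size d
        self.W = nn.Linear(d, vocab_size)         # "multiply by W"

    def forward(self, context_ids):               # (batch, context_size)
        summed = self.embed(context_ids).sum(dim=1)        # sum, size d
        return torch.log_softmax(self.W(summed), dim=-1)   # log P(w | context)
```

For "the dog bit the man" with target word "bit", context_ids would hold the indices of the surrounding words (e.g., the and dog).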
Using Word Embeddings
‣ Approach 1: learn embeddings directly from data in your neural model, no pretraining
‣ Often works pretty well
‣ Approach 2: pretrain using GloVe, keep fixed
‣ Faster because no need to update these parameters
‣ Need to make sure the GloVe vocabulary contains all the words you need
‣ Approach 3: initialize using GloVe, fine-tune
‣ Not as commonly used anymore
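‣ A sketch of Approaches 2 and 3, assuming PyTorch (the GloVe matrix here is a random stand-in; in practice you would load real vectors aligned to your vocabulary):

```python
import torch
import torch.nn as nn

# Stand-in for a (vocab_size, d) matrix of pretrained GloVe vectors,
# built so that row i corresponds to word i in your vocabulary.
glove_matrix = torch.randn(10000, 300)

# Approach 2: keep fixed (no gradient updates, so training is faster)
embed_fixed = nn.Embedding.from_pretrained(glove_matrix, freeze=True)

# Approach 3: initialize with GloVe, then fine-tune during training
embed_tuned = nn.Embedding.from_pretrained(glove_matrix, freeze=False)
```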
Compositional Semantics
‣ What if we want embedding representations for whole sentences?
‣ Skip-thought vectors (Kiros et al., 2015), similar to skip-gram generalized to a sentence level (more later)
‣ Is there a way we can compose vectors to make sentence representations? Summing? RNNs?
This Lecture
‣ Recurrent neural networks
‣ Vanishing gradient problem
‣ LSTMs / GRUs
‣ Applications / visualizations
RNN Basics
RNN Motivation
‣ Feedforward NNs can't handle variable-length input: each position in the feature vector has fixed semantics
    the movie was great        that was great!
‣ These don't look related (great is in two different orthogonal subspaces)
‣ Instead, we need to:
1) Process each word in a uniform way
2) …while still exploiting the context that that token occurs in
RNN Abstraction
‣ Cell that takes some input x, has some hidden state h, and updates that hidden state and produces output y (all vector-valued)
[Diagram: previous h (and previous c) flow into the cell along with input x; the cell emits next h (and next c) and output y]
RNN Uses
‣ Transducer: make some prediction for each element in a sequence
    the movie was great → DT NN VBD JJ (output y = score for each tag, then softmax)
‣ Acceptor/encoder: encode a sequence into a fixed-sized vector and use that for some purpose
    the movie was great → predict sentiment (matmul + softmax), translate, paraphrase/compress
Elman Networks
[Diagram: input x_t and previous hidden state h_{t−1} feed into the cell, producing h_t and output y_t]
‣ Updates hidden state based on input and current hidden state:
    h_t = tanh(W x_t + V h_{t−1} + b_h)
‣ Computes output from hidden state:
    y_t = tanh(U h_t + b_y)
‣ Long history! (invented in the late 1980s)
Elman (1990)
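‣ A minimal NumPy sketch of these two equations (sizes and parameter values are made up):

```python
import numpy as np

def elman_step(x_t, h_prev, W, V, U, b_h, b_y):
    """One step of an Elman RNN: update hidden state, then compute output."""
    h_t = np.tanh(W @ x_t + V @ h_prev + b_h)  # h_t = tanh(W x_t + V h_{t-1} + b_h)
    y_t = np.tanh(U @ h_t + b_y)               # y_t = tanh(U h_t + b_y)
    return h_t, y_t

# Run over a sequence, carrying h forward (hypothetical sizes)
rng = np.random.default_rng(0)
d_in, d_h, d_out = 4, 8, 3
W = rng.normal(size=(d_h, d_in))
V = rng.normal(size=(d_h, d_h))
U = rng.normal(size=(d_out, d_h))
b_h, b_y = np.zeros(d_h), np.zeros(d_out)
h = np.zeros(d_h)
for x in rng.normal(size=(5, d_in)):   # a length-5 sequence of input vectors
    h, y = elman_step(x, h, W, V, U, b_h, b_y)
```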
Training Elman Networks
    the movie was great → predict sentiment
‣ "Backpropagation through time": build the network as one big computation graph, some parameters are shared
‣ RNN potentially needs to learn how to "remember" information for a long time!
    it was my favorite movie of 2016, though it wasn't without problems → +
‣ "Correct" parameter update is to do a better job of remembering the sentiment of favorite
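‣ A sketch of backpropagation through time, assuming PyTorch (sizes are made up): the same cell parameters appear at every timestep of the unrolled graph, so their gradients accumulate contributions from the whole sequence.

```python
import torch
import torch.nn as nn

cell = nn.RNNCell(input_size=50, hidden_size=64)   # shared across timesteps
clf = nn.Linear(64, 2)                             # sentiment head

xs = torch.randn(4, 1, 50)        # "the movie was great" as 4 word vectors
h = torch.zeros(1, 64)
for x in xs:                      # unrolling builds one big computation graph
    h = cell(x, h)
loss = nn.functional.cross_entropy(clf(h), torch.tensor([1]))
loss.backward()                   # backpropagation through time
```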
Vanishing Gradient
[Figure: gradient flowing backward through the unrolled RNN shrinks at each step: gradient ← smaller gradient ← tiny gradient]
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
‣ Gradient diminishes going through tanh; if not in [−2, 2], gradient is almost 0
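‣ A small numeric illustration of this effect (not from the slides; all values are made up): backpropagate a gradient through 20 tanh steps of an Elman-style update and watch its norm shrink.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
V = rng.normal(scale=0.5, size=(d, d))    # recurrent weight matrix
h, hs = np.zeros(d), []
for _ in range(20):                       # forward pass
    h = np.tanh(V @ h + rng.normal(size=d))
    hs.append(h)

grad = np.ones(d)                         # gradient at the last timestep
for h in reversed(hs):                    # backward pass through time
    grad = V.T @ ((1 - h**2) * grad)      # tanh'(z) = 1 - tanh(z)^2
    print(np.linalg.norm(grad))           # typically decays toward 0
```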
LSTMs / GRUs
Gated Connections
‣ Designed to fix the "vanishing gradient" problem using gates
    gated: h_t = h_{t−1} ⊙ f + func(x_t)        Elman: h_t = tanh(W x_t + V h_{t−1} + b_h)
‣ Vector-valued "forget gate" f computed based on input and previous hidden state:
    f = σ(W^{xf} x_t + W^{hf} h_{t−1})
‣ Sigmoid: elements of f are in (0, 1)
[Diagram: h_{t−1} ⊙ f = h_t, elementwise]
‣ If f ≈ 1, we simply sum up a function of all inputs, so the gradient doesn't vanish!
LSTMs
‣ "Cell" c in addition to hidden state h:
    c_t = c_{t−1} ⊙ f + func(x_t, h_{t−1})
‣ Vector-valued forget gate f depends on the hidden state h:
    f = σ(W^{xf} x_t + W^{hf} h_{t−1})
‣ Basic communication flow: x → c → h → output; each step of this process is gated, in addition to gates from previous timesteps
LSTMs
[Diagram of the LSTM cell: input x_j and previous hidden state h_{j−1} feed gates f, i, o and candidate g; the cell state flows c_{j−1} → c_j; the output is h_j]
‣ f, i, o are gates that control information flow
‣ g reflects the main computation of the cell
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Goldberg lecture notes
LSTMs
‣ Can we ignore the old value of c for this timestep?
‣ Can an LSTM sum up its inputs x?
‣ Can we ignore a particular input x?
‣ Can we output something without changing c?
LSTMs
‣ Ignoring recurrent state entirely: lets us get a feedforward layer over the token
‣ Ignoring input: lets us discard stopwords
‣ Summing inputs: lets us compute a bag-of-words representation
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Goldberg lecture notes
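‣ A NumPy sketch of one LSTM step in the standard formulation (biases omitted for brevity; each W* maps the concatenated [h_{j−1}; x_j] to the hidden size), with comments tying each gate back to the questions above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, Wf, Wi, Wo, Wg):
    """One LSTM step: gates f, i, o control flow; g is the main computation."""
    hx = np.concatenate([h_prev, x])
    f = sigmoid(Wf @ hx)      # forget gate: f ~ 0 ignores the old value of c
    i = sigmoid(Wi @ hx)      # input gate: i ~ 0 ignores this particular input
    o = sigmoid(Wo @ hx)      # output gate: controls what c exposes through h
    g = np.tanh(Wg @ hx)      # main computation of the cell
    c = f * c_prev + i * g    # f ~ 1 and i ~ 1 lets the cell sum up its inputs
    h = o * np.tanh(c)        # with f = 1, i = 0: output without changing c
    return h, c
```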
LSTMs
[Figure: gradient flowing backward along the cell state stays close in magnitude: gradient ← similar gradient]
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
‣ Gradient still diminishes, but in a controlled way and generally by less; usually initialize the forget gate to 1 to remember everything to start
GRUs
[Diagram: GRU cell carrying state s_{j−1} → s_j, built from two σ gates (z and r), a tanh candidate, and a (1−z)/z mixing step; shown alongside the LSTM cell for comparison]
‣ GRU: faster, a bit simpler
‣ LSTM: more complex and slower, may work a bit better
‣ Two gates: z (forget, mixes s and h) and r (mixes h and x)
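‣ A NumPy sketch of one GRU step in the standard Cho et al. (2014) formulation (biases omitted; note conventions differ on whether z or 1−z multiplies the old state):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, s_prev, Wz, Uz, Wr, Ur, W, U):
    """One GRU step: s is the recurrent state, h the candidate state."""
    z = sigmoid(Wz @ x + Uz @ s_prev)      # update/forget gate: mixes s and h
    r = sigmoid(Wr @ x + Ur @ s_prev)      # reset gate: mixes state and input
    h = np.tanh(W @ x + U @ (r * s_prev))  # candidate state
    return (1 - z) * s_prev + z * h        # mix old state and candidate
```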
What do RNNs produce?
    the movie was great
‣ Encoding of the sentence: can pass this to a decoder or make a classification decision about the sentence
‣ Encoding of each word: can pass this to another layer to make a prediction (can also pool these to get a different sentence encoding)
‣ RNN can be viewed as a transformation of a sequence of vectors into a sequence of context-dependent vectors
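‣ A sketch with PyTorch's nn.LSTM (sizes are made up), which returns both products at once:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=50, hidden_size=64, batch_first=True)
x = torch.randn(1, 4, 50)   # "the movie was great" as 4 word vectors

outputs, (h_n, c_n) = lstm(x)
print(outputs.shape)   # (1, 4, 64): one context-dependent vector per word
print(h_n.shape)       # (1, 1, 64): final hidden state = sentence encoding
```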
Multilayer Bidirectional RNN
[Diagram: stacked forward and backward RNNs over "the movie was great"]
‣ Sentence classification based on concatenation of both final outputs
‣ Token classification based on concatenation of both directions' token representations
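‣ A sketch of a 2-layer bidirectional LSTM and the two concatenations above, assuming PyTorch:

```python
import torch
import torch.nn as nn

bilstm = nn.LSTM(input_size=50, hidden_size=64, num_layers=2,
                 bidirectional=True, batch_first=True)
x = torch.randn(1, 4, 50)
outputs, (h_n, _) = bilstm(x)

# Token classification: outputs already concatenates both directions per token
print(outputs.shape)                           # (1, 4, 128) = 2 * hidden_size

# Sentence classification: concatenate the final forward and backward states
# of the top layer (h_n is ordered by layer, then direction)
sent = torch.cat([h_n[-2], h_n[-1]], dim=-1)   # (1, 128)
```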
Training RNNs
‣ Example: sentiment analysis
    the movie was great → P(y|x)
‣ Loss = negative log likelihood of probability of gold label (or use SVM or other loss)
‣ Backpropagate through entire network
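‣ A sketch of this loss, assuming PyTorch (cross_entropy computes the negative log likelihood of the gold label from the raw scores):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=50, hidden_size=64, batch_first=True)
clf = nn.Linear(64, 2)                       # 2 sentiment classes

x = torch.randn(1, 4, 50)                    # "the movie was great"
gold = torch.tensor([1])                     # gold label: positive
_, (h_n, _) = lstm(x)
loss = nn.functional.cross_entropy(clf(h_n[-1]), gold)  # -log P(gold | x)
loss.backward()                              # backpropagate through everything
```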
Training RNNs
    the movie was great → P(t_i|x)
‣ Loss = negative log likelihood of probability of gold predictions, summed over the tags
‣ Loss terms filter back through network
‣ Example: language modeling (predict next word given context)
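‣ A sketch of the transducer version, assuming PyTorch (tag inventory and ids are made up): one negative log likelihood term per position, summed over the tags.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=50, hidden_size=64, batch_first=True)
tagger = nn.Linear(64, 5)                    # e.g., 5 POS tags

x = torch.randn(1, 4, 50)                    # "the movie was great"
gold_tags = torch.tensor([[0, 1, 2, 3]])     # DT NN VBD JJ as tag ids
outputs, _ = lstm(x)                         # one vector per token
scores = tagger(outputs)                     # (1, 4, 5)
loss = nn.functional.cross_entropy(scores.view(-1, 5), gold_tags.view(-1),
                                   reduction='sum')  # summed over positions
loss.backward()                              # loss terms filter back through
```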
Applications
What can LSTMs model?
‣ Sentiment: encode one sentence, predict
‣ Language models: move left-to-right, per-token prediction
‣ Translation: encode sentence + then decode, use token predictions for attention weights (later in the course)
Visualizing LSTMs
‣ Train a character LSTM language model (predict next character based on history) over two datasets: War and Peace and Linux kernel source code
‣ Visualize activations of specific cells (components of c) to see what they track
‣ Counter: knows when to generate \n
‣ Binary switch: tells us if we're in a quote or not
‣ Stack: activation based on indentation
‣ Uninterpretable: probably doing double duty, or only makes sense in the context of another activation
Karpathy et al. (2015)
What can LSTMs model?
‣ Sentiment: encode one sentence, predict
‣ Language models: move left-to-right, per-token prediction
‣ Translation: encode sentence + then decode, use token predictions for attention weights (next lecture)
‣ Textual entailment: encode two sentences, predict
Natural Language Inference
Premise → Hypothesis (label):
‣ A boy plays in the snow → A boy is outside (entails)
‣ A man inspects the uniform of a figure → The man is sleeping (contradicts)
‣ An older and younger man smiling → Two men are smiling and laughing at cats playing (neutral)
‣ Long history of this task: "Recognizing Textual Entailment" challenge in 2006 (Dagan, Glickman, Magnini)
‣ Early datasets: small (hundreds of pairs), very ambitious (lots of world knowledge, temporal reasoning, etc.)
SNLI Dataset
‣ Show people captions for (unseen) images and solicit entailed/neutral/contradictory statements
‣ >500,000 sentence pairs
‣ Encode each sentence and process
    100D LSTM: 78% accuracy
    300D LSTM: 80% accuracy (Bowman et al., 2016)
    300D BiLSTM: 83% accuracy (Liu et al., 2016)
‣ Later: better models for this
Bowman et al. (2015)
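‣ A sketch of "encode each sentence and process" (a hypothetical architecture in PyTorch, not the exact Bowman et al. model): share one LSTM encoder across premise and hypothesis, concatenate the two encodings, and classify into the 3 labels.

```python
import torch
import torch.nn as nn

class NLIClassifier(nn.Module):
    def __init__(self, d_word=50, d_hidden=100):
        super().__init__()
        self.encoder = nn.LSTM(d_word, d_hidden, batch_first=True)
        self.clf = nn.Linear(2 * d_hidden, 3)   # entails/contradicts/neutral

    def encode(self, sent):
        _, (h_n, _) = self.encoder(sent)
        return h_n[-1]                          # final hidden state

    def forward(self, premise, hypothesis):     # (batch, len, d_word) each
        pair = torch.cat([self.encode(premise),
                          self.encode(hypothesis)], dim=-1)
        return self.clf(pair)                   # scores over the 3 labels
```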
Takeaways
‣ RNNs can transduce inputs (produce one output for each input) or compress the whole input into a vector
‣ Useful for a range of tasks with sequential input: sentiment analysis, language modeling, natural language inference, machine translation
‣ Next time: CNNs and neural CRFs