Lecture 8: RNNs (GitHub Pages: aritter.github.io/courses/5525_slides_v2/lec8-nn3.pdf)

Alan Ritter (many slides from Greg Durrett)

Administrivia

‣ Homework 3 due next week on Monday

‣ Guest lecture next week on Wednesday

‣ Midterm is next week on Friday
  ‣ Will cover everything up to this Friday
  ‣ Practice questions: https://docs.google.com/document/d/1eidU29ni8ZeTlBcrTyQ67duurxU74vk7I16fUok_X8s/edit?usp=sharing

‣ Reading: RNNs
  ‣ Goldberg 10, 11
  ‣ Jurafsky and Martin, Chapter 9

Recall: Training Tips

‣ Parameter initialization is critical to get good gradients; some useful heuristics exist (e.g., Xavier initializer)

‣ Dropout is an effective regularizer

‣ Think about your optimizer: Adam or tuned SGD work well
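
As a reminder of what the Xavier heuristic looks like, here is a minimal sketch of the uniform variant (Glorot and Bengio's scheme), which draws each weight from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)). The function name and the layer sizes are illustrative assumptions, not something the slides specify.

```python
import math
import random

def xavier_uniform(fan_in, fan_out, rng):
    """Return a fan_out x fan_in weight matrix drawn from U(-limit, limit)."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
W = xavier_uniform(fan_in=300, fan_out=100, rng=rng)  # hypothetical layer sizes
limit = math.sqrt(6.0 / (300 + 100))
```

The bound shrinks as the layer gets wider, which keeps pre-activations from growing with the number of inputs.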

Recall: Word Vectors

[Figure: words plotted in a vector space: good, enjoyable, great, bad, dog, is]

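Recall: Continuous Bag-of-Words

‣ Predict word from context

[Figure: for "the dog bit the man", the vectors of the context words "dog" and "the" are summed (size d), multiplied by W, and passed through a softmax to give $P(w \mid w_{-1}, w_{+1})$]

‣ Matrix factorization approaches useful for learning vectors from really large data

Mikolov et al. (2013)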
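
The CBOW forward pass on this slide (sum the context vectors, multiply by W, softmax) can be sketched as follows. The vocabulary, the 2-dimensional embeddings, and the entries of W are toy values chosen for illustration, and reading "dog" and "the" as the context of the middle word "bit" is my interpretation of the figure.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cbow_probs(context_ids, embeddings, W):
    d = len(embeddings[0])
    # Sum the context word vectors into a single size-d vector.
    summed = [sum(embeddings[i][k] for i in context_ids) for k in range(d)]
    # Multiply by W (|V| x d) to get one score per vocabulary word.
    scores = [sum(row[k] * summed[k] for k in range(d)) for row in W]
    return softmax(scores)

vocab = ["the", "dog", "bit", "man"]
embeddings = [[0.1, 0.3], [0.5, -0.2], [-0.4, 0.6], [0.2, 0.2]]  # toy values
W = [[0.3, -0.1], [0.0, 0.4], [0.7, 0.2], [-0.5, 0.1]]           # toy values
probs = cbow_probs([vocab.index("dog"), vocab.index("the")], embeddings, W)
```

Here `probs[vocab.index("bit")]` plays the role of P(w | w_-1, w_+1) for the middle word; training would push that entry up.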

Using Word Embeddings

‣ Approach 1: learn embeddings directly from data in your neural model, no pretraining
  ‣ Often works pretty well

‣ Approach 2: pretrain using GloVe, keep fixed
  ‣ Faster because no need to update these parameters
  ‣ Need to make sure GloVe vocabulary contains all the words you need

‣ Approach 3: initialize using GloVe, fine-tune
  ‣ Not as commonly used anymore

Compositional Semantics

‣ What if we want embedding representations for whole sentences?

‣ Skip-thought vectors (Kiros et al., 2015), similar to skip-gram generalized to a sentence level (more later)

‣ Is there a way we can compose vectors to make sentence representations? Summing? RNNs?

This Lecture

‣ Recurrent neural networks

‣ Vanishing gradient problem

‣ LSTMs / GRUs

‣ Applications / visualizations

RNN Basics

RNN Motivation

‣ Feedforward NNs can't handle variable length input: each position in the feature vector has fixed semantics

[Figure: feature vectors for "the movie was great" and "that was great!"]

‣ These don't look related (great is in two different orthogonal subspaces)

‣ Instead, we need to:
  1) Process each word in a uniform way
  2) …while still exploiting the context that that token occurs in
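
To make the "orthogonal subspaces" point concrete, here is a small sketch that builds a fixed-length input by concatenating one one-hot block per position. The vocabulary, the tokenization, and the helper name are assumptions made for illustration.

```python
def positional_one_hot(tokens, vocab, max_len):
    """Concatenate one one-hot block per position (fixed-length input)."""
    vec = [0] * (max_len * len(vocab))
    for pos, tok in enumerate(tokens[:max_len]):
        vec[pos * len(vocab) + vocab.index(tok)] = 1
    return vec

vocab = ["the", "movie", "was", "great", "that", "!"]
a = positional_one_hot(["the", "movie", "was", "great"], vocab, max_len=4)
b = positional_one_hot(["that", "was", "great", "!"], vocab, max_len=4)

idx_a = [i for i, v in enumerate(a) if v == 1]  # active features, sentence 1
idx_b = [i for i, v in enumerate(b) if v == 1]  # active features, sentence 2
overlap = sum(x * y for x, y in zip(a, b))      # dot product of the two inputs
```

"great" switches on a different feature in each sentence (position 3 versus position 2), so the two inputs share no active feature even though they share two words.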

RNN Abstraction

‣ Cell that takes some input x, has some hidden state h, and updates that hidden state and produces output y (all vector-valued)

[Figure: cell taking "previous h" (and "previous c") plus "input x", producing "next h" (and "next c") plus "output y"]

RNN Uses

‣ Transducer: make some prediction for each element in a sequence

[Figure: "the movie was great" tagged DT NN VBD JJ; output y = score for each tag, then softmax]

‣ Acceptor/encoder: encode a sequence into a fixed-sized vector and use that for some purpose

[Figure: "the movie was great" encoded into one vector, used to predict sentiment (matmul + softmax), translate, or paraphrase/compress]

Elman Networks

[Figure: cell taking input $x_t$ and prev hidden state $h_{t-1}$, producing $h_t$ and output $y_t$]

‣ Updates hidden state based on input and current hidden state

$h_t = \tanh(W x_t + V h_{t-1} + b_h)$

‣ Computes output from hidden state

$y_t = \tanh(U h_t + b_y)$

‣ Long history! (invented in the late 1980s)

Elman (1990)
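
The two Elman equations translate almost line for line into code. The sketch below uses plain Python lists; the dimensions and all weight values are arbitrary toy choices, and `matvec` is a helper introduced only for this example.

```python
import math

def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def elman_step(x_t, h_prev, W, V, b_h, U, b_y):
    """h_t = tanh(W x_t + V h_{t-1} + b_h);  y_t = tanh(U h_t + b_y)."""
    pre = [a + b + c for a, b, c in zip(matvec(W, x_t), matvec(V, h_prev), b_h)]
    h_t = [math.tanh(p) for p in pre]
    y_t = [math.tanh(p + b) for p, b in zip(matvec(U, h_t), b_y)]
    return h_t, y_t

# Toy sizes: input dim 2, hidden dim 3, output dim 2 (values are arbitrary).
W = [[0.5, -0.3], [0.1, 0.8], [-0.6, 0.2]]
V = [[0.2, 0.0, -0.1], [0.0, 0.3, 0.1], [0.4, -0.2, 0.0]]
b_h = [0.0, 0.1, -0.1]
U = [[0.7, -0.5, 0.2], [0.1, 0.3, -0.4]]
b_y = [0.0, 0.0]

h = [0.0, 0.0, 0.0]  # initial hidden state
outputs = []
for x in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]:  # a length-3 input sequence
    h, y = elman_step(x, h, W, V, b_h, U, b_y)   # same parameters at every step
    outputs.append(y)
```

Keeping every `y` gives the transducer view (one output per token); keeping only the final `h` gives the acceptor/encoder view.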

Training Elman Networks

[Figure: RNN over "the movie was great", used to predict sentiment]

‣ "Backpropagation through time": build the network as one big computation graph, some parameters are shared

‣ RNN potentially needs to learn how to "remember" information for a long time!

it was my favorite movie of 2016, though it wasn't without problems -> +

‣ "Correct" parameter update is to do a better job of remembering the sentiment of favorite

Vanishing Gradient

‣ Gradient diminishes going through tanh; if not in [-2, 2], gradient is almost 0

[Figure: unrolled RNN with the gradient flowing backward: gradient, then smaller gradient, then tiny gradient]

http://colah.github.io/posts/2015-08-Understanding-LSTMs/
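
A quick numerical check of the tanh claim: the local derivative is 1 - tanh(a)^2, which is 1 at a = 0 and drops quickly as |a| grows. The sketch below also multiplies several local derivatives together to mimic backpropagating through consecutive timesteps; it deliberately ignores the recurrent weight matrix, and the pre-activation values are hypothetical.

```python
import math

def tanh_grad(a):
    """Derivative of tanh at pre-activation a: 1 - tanh(a)^2."""
    return 1.0 - math.tanh(a) ** 2

for a in [0.0, 1.0, 2.0, 4.0]:
    print(a, round(tanh_grad(a), 4))

# Chain rule through several tanh units: the local derivatives multiply.
pre_activations = [1.5, 1.5, 1.5, 1.5, 1.5]  # hypothetical values
factor = 1.0
factors = []
for a in pre_activations:
    factor *= tanh_grad(a)
    factors.append(factor)  # shrinks at every step
```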

LSTMs/GRUs

Gated Connections

‣ Designed to fix "vanishing gradient" problem using gates

gated: $h_t = h_{t-1} \odot f + \mathrm{func}(x_t)$

Elman: $h_t = \tanh(W x_t + V h_{t-1} + b_h)$

‣ Vector-valued "forget gate" f computed based on input and previous hidden state

$f = \sigma(W^{xf} x_t + W^{hf} h_{t-1})$

‣ Sigmoid: elements of f are in (0, 1)

[Figure: vectors $h_{t-1}$, $f$, and $h_t$ illustrating the elementwise product]

‣ If f ≈ 1, we simply sum up a function of all inputs; gradient doesn't vanish!
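
Unrolling the gated update makes the last bullet explicit. This is a sketch under a simplifying assumption: f is treated as a fixed vector, whereas in the slide it is recomputed at each step from $x_t$ and $h_{t-1}$. Powers of f are elementwise.

```latex
% Assumption: f is held constant across timesteps; powers are elementwise.
\begin{align*}
h_T &= h_{T-1} \odot f + \mathrm{func}(x_T) \\
    &= h_0 \odot f^{T} + \sum_{t=1}^{T} f^{\,T-t} \odot \mathrm{func}(x_t), \\
\frac{\partial h_T}{\partial h_k} &= \mathrm{diag}(f)^{\,T-k}
  \;\longrightarrow\; I \quad \text{as } f \to 1 .
\end{align*}
```

With f = 1 the state is just $h_0$ plus the sum of func over all inputs, and the Jacobian with respect to any earlier state is the identity, so no tanh or weight matrix sits on that path to shrink the gradient.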

LSTMs

‣ "Cell" c in addition to hidden state h

$c_t = c_{t-1} \odot f + \mathrm{func}(x_t, h_{t-1})$

‣ Vector-valued forget gate f depends on the h hidden state

$f = \sigma(W^{xf} x_t + W^{hf} h_{t-1})$

‣ Basic communication flow: x -> c -> h -> output; each step of this process is gated in addition to gates from previous timesteps

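LSTMs

[Figure: LSTM cell diagram with input $x_j$, states $h_{j-1}$ and $c_{j-1}$ entering, $h_j$ and $c_j$ leaving, and internal nodes f, g, i, o]

‣ f, i, o are gates that control information flow

‣ g reflects the main computation of the cell

Goldberg lecture notes

http://colah.github.io/posts/2015-08-Understanding-LSTMs/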
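
Since the slide only shows the diagram, here is a sketch of one LSTM step in the commonly used formulation (as in Goldberg's notes): c_j = c_{j-1} * f + g * i and h_j = tanh(c_j) * o, elementwise. Bias terms are omitted to match the forget-gate equation given earlier, and the parameter names and constant toy weights are assumptions for illustration.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def affine(Wx, Wh, x, h):
    """Compute Wx x + Wh h for list-of-rows matrices."""
    return [sum(w * v for w, v in zip(rx, x)) + sum(w * v for w, v in zip(rh, h))
            for rx, rh in zip(Wx, Wh)]

def lstm_step(x, h_prev, c_prev, p):
    f = [sigmoid(a) for a in affine(p["Wxf"], p["Whf"], x, h_prev)]    # forget gate
    i = [sigmoid(a) for a in affine(p["Wxi"], p["Whi"], x, h_prev)]    # input gate
    o = [sigmoid(a) for a in affine(p["Wxo"], p["Who"], x, h_prev)]    # output gate
    g = [math.tanh(a) for a in affine(p["Wxg"], p["Whg"], x, h_prev)]  # main computation
    c = [cp * fk + gk * ik for cp, fk, gk, ik in zip(c_prev, f, g, i)]
    h = [math.tanh(ck) * ok for ck, ok in zip(c, o)]
    return h, c

def const_matrix(rows, cols, value):
    return [[value] * cols for _ in range(rows)]

names = ["Wxf", "Whf", "Wxi", "Whi", "Wxo", "Who", "Wxg", "Whg"]
params = {name: const_matrix(2, 2, 0.1) for name in names}  # toy weights
h, c = [0.0, 0.0], [0.0, 0.0]
for x in [[1.0, 0.0], [0.0, 1.0]]:  # a length-2 toy sequence
    h, c = lstm_step(x, h, c, params)
```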

LSTMs

[Figure: LSTM cell diagram ($x_j$, f, g, i, o, $h_{j-1}$, $h_j$, $c_{j-1}$, $c_j$)]

‣ Can we ignore the old value of c for this timestep?

‣ Can an LSTM sum up its inputs x?

‣ Can we ignore a particular input x?

‣ Can we output something without changing c?

LSTMs

[Figure: LSTM cell diagram ($x_j$, f, g, i, o, $h_{j-1}$, $h_j$, $c_{j-1}$, $c_j$)]

‣ Ignoring recurrent state entirely:
  ‣ Lets us get feedforward layer over token

‣ Ignoring input:
  ‣ Lets us discard stopwords

‣ Summing inputs:
  ‣ Lets us compute a bag-of-words representation

Goldberg lecture notes

http://colah.github.io/posts/2015-08-Understanding-LSTMs/
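
The three behaviours above correspond to extreme gate settings in the cell update c_j = c_{j-1} * f + g * i (the standard form, assumed here). In this sketch the gates are set by hand to exactly 0 or 1; a real sigmoid gate only approaches those values, and a fully feedforward layer would also need g not to depend on h_{j-1}. All numbers are toy values.

```python
def cell_update(c_prev, g, f, i):
    """c_j = c_{j-1} * f + g * i, elementwise."""
    return [cp * fk + gk * ik for cp, gk, fk, ik in zip(c_prev, g, f, i)]

c_prev = [0.5, -1.0]  # old cell value (toy)
g = [0.2, 0.3]        # candidate computed from the current token (toy)

feedforward = cell_update(c_prev, g, f=[0.0, 0.0], i=[1.0, 1.0])  # ignore old c
skip_token = cell_update(c_prev, g, f=[1.0, 1.0], i=[0.0, 0.0])   # ignore input
running_sum = cell_update(c_prev, g, f=[1.0, 1.0], i=[1.0, 1.0])  # sum inputs
```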

LSTMs

‣ Gradient still diminishes, but in a controlled way and generally by less; usually initialize forget gate = 1 to remember everything to start

[Figure: unrolled LSTM with the gradient flowing backward: gradient, then similar gradient]

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

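GRUs

[Figure: LSTM cell (labels $x_j$, f, g, i, o, $h_{j-1}$, $h_j$, $c_{j-1}$, $c_j$) beside a GRU cell (labels $s_{j-1}$, $x_j$, $h_{j-1}$, $h_j$, $s_j$, z, 1-z, r, σ, tanh)]

‣ LSTM: more complex and slower, may work a bit better

‣ GRU: faster, a bit simpler

‣ Two gates: z (forget, mixes s and h) and r (mixes h and x)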
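
A sketch of one GRU step, following the formulation in Goldberg's notes rather than anything spelled out on the slide: z interpolates between the old state s_{j-1} and a candidate (which I take to be the slide's h), and r gates how much of s_{j-1} feeds into that candidate. Some references swap the roles of z and 1-z. Parameter names and values are assumptions, and biases are omitted.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def affine(Wx, Ws, x, s):
    """Compute Wx x + Ws s for list-of-rows matrices."""
    return [sum(w * v for w, v in zip(rx, x)) + sum(w * v for w, v in zip(rs, s))
            for rx, rs in zip(Wx, Ws)]

def gru_step(x, s_prev, p):
    z = [sigmoid(a) for a in affine(p["Wxz"], p["Wsz"], x, s_prev)]  # update gate
    r = [sigmoid(a) for a in affine(p["Wxr"], p["Wsr"], x, s_prev)]  # reset gate
    gated = [rk * sk for rk, sk in zip(r, s_prev)]
    cand = [math.tanh(a) for a in affine(p["Wxh"], p["Wsh"], x, gated)]
    # Mix old state and candidate: s_j = (1 - z) * s_{j-1} + z * candidate.
    return [(1.0 - zk) * sk + zk * hk for zk, sk, hk in zip(z, s_prev, cand)]

def const_matrix(rows, cols, value):
    return [[value] * cols for _ in range(rows)]

names = ["Wxz", "Wsz", "Wxr", "Wsr", "Wxh", "Wsh"]
params = {name: const_matrix(2, 2, 0.1) for name in names}  # toy weights
s = [0.0, 0.0]
for x in [[1.0, 0.0], [0.0, 1.0]]:  # a length-2 toy sequence
    s = gru_step(x, s, params)
```

Compared with the LSTM there is a single state vector and two gates instead of three, which is where the "faster, a bit simpler" comes from.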

What do RNNs produce?

[Figure: RNN run over "the movie was great"]

‣ Encoding of the sentence: can pass this to a decoder or make a classification decision about the sentence

‣ Encoding of each word: can pass this to another layer to make a prediction (can also pool these to get a different sentence encoding)

‣ RNN can be viewed as a transformation of a sequence of vectors into a sequence of context-dependent vectors

Multilayer Bidirectional RNN

[Figure: RNNs run over "the movie was great" in both directions]

‣ Sentence classification based on concatenation of both final outputs

‣ Token classification based on concatenation of both directions' token representations
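
The two concatenations can be sketched with a deliberately tiny recurrence (a scalar state per direction) so that the bookkeeping is easy to see. The recurrence, its weights, and the input features are all toy assumptions; a real model would use vector states and an LSTM or GRU cell.

```python
import math

def run_rnn(inputs, w_x, w_h):
    """Toy scalar-state recurrence h_t = tanh(w_x * x_t + w_h * h_{t-1})."""
    h, states = 0.0, []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states

inputs = [0.5, -1.0, 0.25, 2.0]  # one toy scalar feature per token
fwd = run_rnn(inputs, w_x=0.8, w_h=0.5)
# Run right-to-left, then flip so bwd[t] lines up with token t.
bwd = run_rnn(inputs[::-1], w_x=-0.3, w_h=0.9)[::-1]

# Token classification: concatenate both directions at each position.
token_reprs = [[f, b] for f, b in zip(fwd, bwd)]
# Sentence classification: concatenate the two final states
# (forward ends at the last token, backward ends at the first).
sentence_repr = [fwd[-1], bwd[0]]
```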

Training RNNs

[Figure: RNN over "the movie was great" producing $P(y \mid x)$]

‣ Loss = negative log likelihood of probability of gold label (or use SVM or other loss)

‣ Backpropagate through entire network

‣ Example: sentiment analysis

TrainingRNNs

themoviewasgreat

‣ Loss=negaFveloglikelihoodofprobabilityofgoldpredicFons,summedoverthetags

‣ Losstermsfilterbackthroughnetwork

P (ti|x)

‣ Example:languagemodeling(predictnextwordgivencontext)
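As a rough sketch of the per-token loss, the snippet below sums -log P(t_i|x) over positions. The part-of-speech tags, the probabilities, and the function name `sequence_nll` are all made up for illustration; in a real model each distribution would come from a softmax over the RNN state at that position.

```python
import math

def sequence_nll(token_probs, gold_tags):
    """Sum over positions i of -log P(t_i | x).

    token_probs[i] maps each tag to its probability at position i.
    """
    assert len(token_probs) == len(gold_tags)
    return sum(-math.log(dist[tag]) for dist, tag in zip(token_probs, gold_tags))

# Hypothetical per-token tag distributions for "the movie was great".
token_probs = [
    {"DET": 0.90, "NOUN": 0.05, "VERB": 0.03, "ADJ": 0.02},
    {"DET": 0.10, "NOUN": 0.80, "VERB": 0.05, "ADJ": 0.05},
    {"DET": 0.05, "NOUN": 0.05, "VERB": 0.85, "ADJ": 0.05},
    {"DET": 0.02, "NOUN": 0.08, "VERB": 0.10, "ADJ": 0.80},
]
gold_tags = ["DET", "NOUN", "VERB", "ADJ"]

loss = sequence_nll(token_probs, gold_tags)
print(loss)
```

Language modeling fits the same function: the "tag" at position i is simply the next word, and each distribution ranges over the vocabulary.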

Applications

What can LSTMs model?

‣ Sentiment
  ‣ Encode one sentence, predict

‣ Language models
  ‣ Move left-to-right, per-token prediction

‣ Translation
  ‣ Encode sentence + then decode, use token predictions for attention weights (later in the course)

Visualizing LSTMs

Karpathy et al. (2015)

‣ Train character LSTM language model (predict next character based on history) over two datasets: War and Peace and Linux kernel source code

‣ Visualize activations of specific cells (components of c) to understand them and see what they track

‣ Counter: know when to generate \n

‣ Binary switch: tells us if we're in a quote or not

‣ Stack: activation based on indentation

‣ Uninterpretable: probably doing double-duty, or only makes sense in the context of another activation
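The procedure of tracing one cell over a character sequence might look like the sketch below. It is not the code from Karpathy et al.: the weights are random and untrained (so the trace carries no meaning), the sizes are toy values, the names `make_params`, `gate_pre`, and `trace_cell` are hypothetical, and squashing the cell value with tanh for display is an assumption. It only shows where the per-character activation of a chosen component of c would be read off.

```python
import math
import random

random.seed(0)

HID = 4  # number of cells (toy size, an assumption)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_params(vocab_size):
    """Random, untrained weights for the four LSTM gates (i, f, o, g)."""
    def mat(rows, cols):
        return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]
    return {gate: (mat(HID, vocab_size), mat(HID, HID), [0.0] * HID) for gate in "ifog"}

def gate_pre(params, gate, x_index, h):
    """W[:, x_index] + U h + b for one gate; x is a one-hot character vector."""
    W, U, b = params[gate]
    return [W[j][x_index] + sum(U[j][k] * h[k] for k in range(HID)) + b[j]
            for j in range(HID)]

def trace_cell(text, cell_index):
    """Run the LSTM over text and record tanh(c_t[cell_index]) per character."""
    chars = sorted(set(text))
    index = {ch: n for n, ch in enumerate(chars)}
    params = make_params(len(chars))
    h, c = [0.0] * HID, [0.0] * HID
    trace = []
    for ch in text:
        x = index[ch]
        i = [sigmoid(z) for z in gate_pre(params, "i", x, h)]    # input gate
        f = [sigmoid(z) for z in gate_pre(params, "f", x, h)]    # forget gate
        o = [sigmoid(z) for z in gate_pre(params, "o", x, h)]    # output gate
        g = [math.tanh(z) for z in gate_pre(params, "g", x, h)]  # candidate update
        c = [f[j] * c[j] + i[j] * g[j] for j in range(HID)]
        h = [o[j] * math.tanh(c[j]) for j in range(HID)]
        trace.append((ch, math.tanh(c[cell_index])))
    return trace

for ch, act in trace_cell('he said "hi"\n', cell_index=0):
    print(repr(ch), round(act, 3))
```

With a trained model, plotting or coloring each character by its recorded value is what reveals patterns such as the counter, quote switch, and indentation cells listed above.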

What can LSTMs model?

‣ Sentiment
  ‣ Encode one sentence, predict

‣ Language models
  ‣ Move left-to-right, per-token prediction

‣ Translation
  ‣ Encode sentence + then decode, use token predictions for attention weights (next lecture)

‣ Textual entailment
  ‣ Encode two sentences, predict

Natural Language Inference

Premise                                   Hypothesis                                          Label
A boy plays in the snow                   A boy is outside                                    entails
A man inspects the uniform of a figure    The man is sleeping                                 contradicts
An older and younger man smiling          Two men are smiling and laughing at cats playing    neutral

‣ Long history of this task: "Recognizing Textual Entailment" challenge in 2006 (Dagan, Glickman, Magnini)

‣ Early datasets: small (hundreds of pairs), very ambitious (lots of world knowledge, temporal reasoning, etc.)

SNLI Dataset

Bowman et al. (2015)

‣ Show people captions for (unseen) images and solicit entailed / neutral / contradictory statements

‣ >500,000 sentence pairs

‣ Encode each sentence and process
  100D LSTM: 78% accuracy
  300D LSTM: 80% accuracy (Bowman et al., 2016)
  300D BiLSTM: 83% accuracy (Liu et al., 2016)

‣ Later: better models for this
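One way to read "encode each sentence and process" is sketched below. The details are assumptions made for illustration, not the architecture of the cited papers: a plain tanh RNN stands in for the LSTM, the same encoder is shared by premise and hypothesis, the two vectors are combined by simple concatenation, and all weights are random and untrained. The names `embed`, `encode`, and `predict` are hypothetical.

```python
import math
import random

random.seed(0)

EMB, HID = 3, 4                                   # toy sizes (assumption)
LABELS = ["entails", "contradicts", "neutral"]

def mat(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

embeddings = {}                   # word -> vector, filled lazily with random values
W, U = mat(HID, EMB), mat(HID, HID)
V = mat(len(LABELS), 2 * HID)     # classifier over the concatenated encodings

def embed(word):
    if word not in embeddings:
        embeddings[word] = [random.uniform(-0.5, 0.5) for _ in range(EMB)]
    return embeddings[word]

def encode(sentence):
    """Compress a sentence into one vector: the final RNN hidden state."""
    h = [0.0] * HID
    for word in sentence.lower().split():
        pre = [a + b for a, b in zip(matvec(W, embed(word)), matvec(U, h))]
        h = [math.tanh(p) for p in pre]
    return h

def predict(premise, hypothesis):
    """Encode both sentences, concatenate, and apply a softmax over the labels."""
    scores = matvec(V, encode(premise) + encode(hypothesis))
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return {label: e / z for label, e in zip(LABELS, exps)}

probs = predict("A boy plays in the snow", "A boy is outside")
print(probs)
```

Training would minimize the negative log likelihood of the gold label over the sentence pairs, exactly as in the single-sentence classification case.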

Takeaways

‣ RNNs can transduce inputs (produce one output for each input) or compress the whole input into a vector

‣ Useful for a range of tasks with sequential input: sentiment analysis, language modeling, natural language inference, machine translation

‣ Next time: CNNs and neural CRFs