
Neural Dialog
Shrimai Prabhumoye

Alan W Black
Speech Processing 11-[468]92

Review

• Task-Oriented Systems
  • Intents, slots, actions and response

• Non-Task-Oriented Systems
  • No agenda, for fun

• Building dialog systems
  • Rule-Based Systems
    • Eliza

• Retrieval Techniques
  • Representations: TF-IDF, n-grams, the words themselves
  • Similarity Measures: Jaccard, cosine, Euclidean distance
  • Limitations: fixed set of responses, no variation in the response

Review

• Task-Oriented Systems
• Non-Task-Oriented Systems
• Building dialog systems
• Retrieval Techniques
  • Representation: Word Vectors
  • Similarity Measures
  • Limitations: fixed set of responses, no variation in the response

• Generative Models

Overview

• Word Embeddings

• Language Modelling

• Recurrent Neural Networks
• Sequence to Sequence Models

• How to Build a Dialog System

• Issues and Examples

• Alexa Prize

Neural Dialog

• We want to model: P(response | input)
  • How do we represent a sentence (P(response), P(input))?
  • How do we build a language model?
  • How do we represent words (word embeddings)?

Natural Language Processing

• Typical preprocessing steps
  o Form a vocabulary of words that maps each word to a unique ID
  o Different criteria can be used to select which words are part of the vocabulary (e.g. a frequency threshold)
  o All words not in the vocabulary will be mapped to a special 'out-of-vocabulary' token
• Typical vocabulary sizes vary between 10,000 and 250,000

(Salakhutdinov, 2017)

Preprocessing Techniques

• Tokenization
  • "I am a girl." is tokenized to "I", "am", "a", "girl", "."
• Lowercase all words
• Removing Stop Words
  • e.g. "the", "a", "and", etc.
• Frequency of Words
  • Set a threshold and map all words below this frequency to UNK
• Add <START> and <EOS> tags at the beginning and end of each sentence.

(Salakhutdinov, 2017)
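A minimal Python sketch of these preprocessing steps (the toy corpus, the frequency threshold, and the token names are illustrative assumptions; stop-word removal is omitted):

```python
from collections import Counter

corpus = ["I am a girl .", "I am a student ."]   # toy corpus (illustrative)
MIN_FREQ = 2                                     # assumed frequency threshold

# Tokenize (here: whitespace split) and lowercase
tokenized = [sentence.lower().split() for sentence in corpus]

# Count word frequencies and keep only words at or above the threshold
counts = Counter(w for sent in tokenized for w in sent)
vocab = {"UNK": 0, "<START>": 1, "<EOS>": 2}
for word, freq in counts.items():
    if freq >= MIN_FREQ:
        vocab[word] = len(vocab)                 # map each word to a unique ID

def encode(sentence):
    """Lowercase, tokenize, add <START>/<EOS>, map rare words to UNK."""
    tokens = ["<START>"] + sentence.lower().split() + ["<EOS>"]
    return [vocab.get(tok, vocab["UNK"]) for tok in tokens]

print(encode("I am a girl ."))
```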

Vocabulary

One-Hot Encoding

• From its word ID, we get a basic representation of a word through the one-hot encoding of the ID
• The one-hot vector of an ID is a vector filled with 0s, except for a 1 at the position associated with the ID
• For vocabulary size D = 10, the one-hot vector of word ID w = 4 is:

  e(w) = [0 0 0 1 0 0 0 0 0 0]

(Salakhutdinov, 2017)
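The same example in NumPy, assuming the word IDs are 1-indexed as on the slide:

```python
import numpy as np

D = 10          # vocabulary size
w = 4           # word ID (1-indexed, matching the slide)

e = np.zeros(D)
e[w - 1] = 1.0  # a single 1 at the position associated with the ID

print(e)        # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```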

Limitations of One-Hot Encoding

• A one-hot encoding makes no assumption about word similarity.
  o ["working", "on", "Friday", "is", "tiring"] does not appear in our training set.
  o ["working", "on", "Monday", "is", "tiring"] is in the training set.
  o We want to model P("tiring" | "working", "on", "Friday", "is")
  o If the word representations of "Monday" and "Friday" are similar, the model can generalize

(Salakhutdinov, 2017)

Limitations of One-Hot Encoding

• The major problem with the one-hot representation is that it is very high-dimensional
  o The dimensionality of e(w) is the size of the vocabulary
  o A typical vocabulary size is ≈ 100,000
  o A window of 10 words would correspond to an input vector of at least 1,000,000 units!

(Salakhutdinov, 2017)

Continuous Representation of Words

• Each word w is associated with a real-valued vector C(w)
• The typical size of a word embedding is 300 or more.

(Salakhutdinov, 2017)

Continuous Representation of Words

• We would like the distance ||C(w) − C(w′)|| to reflect meaningful similarities between words

(Salakhutdinov, 2017)
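For example, a small sketch of measuring that distance (and the related cosine similarity) between two word vectors; the random vectors here merely stand in for learned embeddings C(w):

```python
import numpy as np

rng = np.random.default_rng(0)
C = {w: rng.normal(size=300) for w in ["monday", "friday", "banana"]}  # stand-in embeddings

def euclidean(u, v):
    return np.linalg.norm(u - v)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# With trained embeddings we would expect "monday" to be closer to "friday" than to "banana"
print(euclidean(C["monday"], C["friday"]), euclidean(C["monday"], C["banana"]))
print(cosine(C["monday"], C["friday"]), cosine(C["monday"], C["banana"]))
```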

Language Modeling

• A language model allows us to predict the probability of observing a sentence (in a given dataset) as:

  P(x_1, …, x_n) = ∏_{i=1}^{n} P(x_i | x_1, …, x_{i-1})

• Here the length of the sentence is n.
• We can build a language model using a Recurrent Neural Network.
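As a concrete instance of this chain rule, the probability of the 4-word sentence "I hate this movie" factorizes into four conditional probabilities; the numbers below are made up purely for illustration:

```python
# P("I", "hate", "this", "movie")
#   = P("I") * P("hate" | "I") * P("this" | "I", "hate") * P("movie" | "I", "hate", "this")
conditionals = [0.02, 0.1, 0.3, 0.25]   # illustrative values, not real estimates

p_sentence = 1.0
for p in conditionals:
    p_sentence *= p
print(p_sentence)   # 0.00015
```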

Word Embeddings from Language Models

(Neubig, 2017)

Continuous Bag of Words (CBOW)

• Predict a word based on the sum of the surrounding embeddings

(Neubig, 2017)
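A minimal NumPy sketch of the CBOW prediction: sum the embeddings of the surrounding words and score every vocabulary word against that sum (the weights are random, so this shows only the computation, not a trained model):

```python
import numpy as np

V, d = 1000, 64                      # vocabulary and embedding sizes (assumed)
rng = np.random.default_rng(0)
E = rng.normal(size=(V, d)) * 0.1    # input word embeddings
W = rng.normal(size=(d, V)) * 0.1    # output projection

context_ids = [3, 17, 42, 99]        # IDs of the surrounding words
h = E[context_ids].sum(axis=0)       # sum of the surrounding embeddings
scores = h @ W
probs = np.exp(scores - scores.max())
probs /= probs.sum()                 # softmax over the vocabulary
print(probs.argmax())                # predicted center word ID
```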

Skip-gram

• Use the current word to predict the surrounding window of context words

(Neubig, 2017)
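Skip-gram turns this around: each (current word, context word) pair becomes a training example. A toy sketch of generating those pairs (the window size of 2 is an assumption):

```python
sentence = ["i", "am", "a", "girl", "."]
window = 2   # assumed context window size

pairs = []
for i, center in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            pairs.append((center, sentence[j]))   # (current word, context word to predict)

print(pairs[:4])   # [('i', 'am'), ('i', 'a'), ('am', 'i'), ('am', 'a')]
```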

BERT (Bidirectional Encoder Representations from Transformers)

• BERT is a method of pretraining language representations

• Data: Wikipedia (2.5B words) + BookCorpus (800M words)

• Mask out k% of the input words, and then predict the masked words

• Word Embedding Size: 768
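One way to see the masked-word objective in action is the Hugging Face transformers library (not part of the slides; this assumes `pip install transformers` and the `bert-base-uncased` checkpoint):

```python
from transformers import pipeline

# Fill-mask pipeline: predict the word hidden behind [MASK]
unmasker = pipeline("fill-mask", model="bert-base-uncased")
for pred in unmasker("Working on [MASK] is tiring."):
    print(pred["token_str"], round(pred["score"], 3))
```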

Use of Word Embeddings

• To represent a sentence (a simple sketch follows this list)

• As input to a neural network

• To understand properties of words
  • Part of speech
  • Do two words mean the same thing?
  • Semantic relations (is-a, part-of, went-to-school-at)?
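For the first bullet above, one simple (baseline) way to represent a sentence with word embeddings is to average the vectors of its words; the random vectors below are stand-ins for real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
embeddings = {w: rng.normal(size=300) for w in ["i", "am", "a", "girl", "."]}  # stand-ins

def sentence_vector(tokens):
    """Average the word embeddings to get a single fixed-size sentence vector."""
    return np.mean([embeddings[t] for t in tokens], axis=0)

print(sentence_vector(["i", "am", "a", "girl", "."]).shape)   # (300,)
```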

NLP and Sequential Data

• NLP is full of sequential data
  • Characters in words
  • Words in sentences
  • Sentences in discourse
  • …

(Neubig, 2017)

Long-distance Dependencies in Language

• Agreement in number, gender, etc.
  • He does not have very much confidence in himself.
  • She does not have very much confidence in herself.

• Selectional preference
  • The reign has lasted as long as the life of the queen.
  • The rain has lasted as long as the life of the clouds.

(Neubig, 2017)

Recurrent Neural Networks

• Tools to remember information

(Neubig, 2017)

[Figure: a feed-forward NN compared with a recurrent NN, which feeds its hidden state back into itself.]

Unrolling in Time

• What does processing a sequence look like?

[Figure: an RNN unrolled over the sentence "I hate this movie"; at each time step the RNN reads one word and makes a prediction, producing one label per word.]

(Neubig, 2017)

Training RNNs

[Figure: the unrolled RNN over "I hate this movie" produces Prediction 1 through Prediction 4; each prediction is compared with its label (Label 1 through Label 4) to give Loss 1 through Loss 4, which are summed into the total loss.]

(Neubig, 2017)
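A PyTorch-flavored sketch of that training setup: one cross-entropy loss per time step, summed into a single total loss before backpropagation (the module sizes and random data are illustrative assumptions):

```python
import torch
import torch.nn as nn

vocab_size, emb_dim, hidden_dim, seq_len = 100, 32, 64, 4
embed = nn.Embedding(vocab_size, emb_dim)
rnn = nn.RNN(emb_dim, hidden_dim, batch_first=True)
out = nn.Linear(hidden_dim, vocab_size)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab_size, (1, seq_len))   # "I hate this movie" as word IDs
labels = torch.randint(0, vocab_size, (1, seq_len))   # one label per time step

hidden_states, _ = rnn(embed(tokens))                 # (1, seq_len, hidden_dim)
total_loss = sum(loss_fn(out(hidden_states[:, t]), labels[:, t])
                 for t in range(seq_len))             # Loss 1 + Loss 2 + Loss 3 + Loss 4
total_loss.backward()
```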

What can RNNs do?

• Represent a sentence
  • Read the whole sentence, make a prediction

• Represent a context within a sentence
  • Read the context up until that point

(Neubig, 2017)

Representing a Sentence

• h_4 is the representation of the sentence
• h_4 is the representation of the probability of observing "I hate this movie"

[Figure: RNN cells applied to "I hate this movie", producing hidden states h_0, h_1, h_2, h_3, h_4.]

(Neubig, 2017)

Language Modeling using RNNs

[Figure: an RNN language model reads "<start> I hate this movie"; at each step it predicts the next word: "I", "hate", "this", "movie", "<end>".]

(Neubig, 2017)
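A sketch of using such an RNN language model to generate text greedily, starting from <start> and stopping at <end> (the weights here are untrained, so the actual output is arbitrary; a trained model would be fit with the summed per-step losses shown earlier):

```python
import torch
import torch.nn as nn

vocab = ["<start>", "<end>", "I", "hate", "this", "movie"]
V, emb_dim, hid_dim = len(vocab), 16, 32
embed = nn.Embedding(V, emb_dim)
cell = nn.RNNCell(emb_dim, hid_dim)
out = nn.Linear(hid_dim, V)

token = torch.tensor([vocab.index("<start>")])
h = torch.zeros(1, hid_dim)
generated = []
for _ in range(10):                          # cap the generated length
    h = cell(embed(token), h)                # update the hidden state
    token = out(h).argmax(dim=-1)            # greedy choice of the most likely next word
    word = vocab[token.item()]
    if word == "<end>":
        break
    generated.append(word)
print(generated)
```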

Bidirectional RNNs

• A simple extension: run the RNN in both directions

[Figure: forward and backward RNNs over the sentence, combined to produce Prediction 1 through Prediction 4.]

(Neubig, 2017)
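In PyTorch this extension is a single flag: bidirectional=True runs a forward and a backward RNN and concatenates their states at every position (the sizes below are illustrative):

```python
import torch
import torch.nn as nn

emb_dim, hid_dim, seq_len = 32, 64, 4
birnn = nn.RNN(emb_dim, hid_dim, batch_first=True, bidirectional=True)

x = torch.randn(1, seq_len, emb_dim)   # embedded sentence, e.g. "I hate this movie"
outputs, _ = birnn(x)
print(outputs.shape)                   # (1, 4, 128): forward and backward states per word
# outputs[:, t] can feed a per-word classifier to produce Prediction t+1
```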

Recurrent Neural Networks

• The idea behind RNNs is to make use of sequential information.

Recurrent Neural Networks

• x_t is the input at time step t
• x_t is the word embedding
• s_t is the hidden representation at time step t

  s_t = f(U x_t + W s_{t-1})
  o_t = softmax(V s_t)

• Note: U, V, W are shared across all time steps
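The two equations in NumPy, taking f to be tanh (a common choice); all weights are random stand-ins:

```python
import numpy as np

d_in, d_hid, vocab = 300, 128, 10000       # embedding, hidden, vocabulary sizes (assumed)
rng = np.random.default_rng(0)
U = rng.normal(size=(d_hid, d_in)) * 0.01
W = rng.normal(size=(d_hid, d_hid)) * 0.01
V = rng.normal(size=(vocab, d_hid)) * 0.01

def softmax(z):
    z = z - z.max()
    return np.exp(z) / np.exp(z).sum()

def rnn_step(x_t, s_prev):
    """One time step: s_t = f(U x_t + W s_{t-1}), o_t = softmax(V s_t)."""
    s_t = np.tanh(U @ x_t + W @ s_prev)
    o_t = softmax(V @ s_t)
    return s_t, o_t

s = np.zeros(d_hid)
for x in rng.normal(size=(4, d_in)):       # four word embeddings, e.g. "I hate this movie"
    s, o = rnn_step(x, s)                  # U, W, V are reused (shared) at every step
```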

RNN Problems and Alternatives

• Vanishing gradients
  • Gradients decrease as they get pushed back
  • Solution: Long Short-Term Memory (Hochreiter and Schmidhuber, 1997)

(Neubig, 2017)

RNN Strengths and Weaknesses

• RNNs, particularly deep RNNs/LSTMs, are quite powerful and flexible

• But they require a lot of data

• They also have trouble with the weak error signals passed back from the end of the sentence

Building Chatbots

• We want to model P(response | input_sentence)

• We learned how to build word embeddings

• We learned how to build a language model

• We learned how to represent a sentence.

• We want to get a representation of the input_sentence and then generate the response conditioned on the input.

Conditional Language Models

• Language Model

  P(X) = ∏_{i=1}^{I} P(x_i | x_1, …, x_{i-1})

• Conditional Language Model

  P(Y | X) = ∏_{j=1}^{J} P(y_j | X, y_1, …, y_{j-1})

(Neubig, 2017)

(In the language model, the previously generated words are the context for the next word; in the conditional model, X is the added context.)

Conditional Language Model (Sutskever et al., 2014)

(Neubig, 2017)

How to pass the hidden state?

• Initialize the decoder with the encoder's final state (Sutskever et al., 2014)

• Transform it (the encoder and decoder can have different dimensions)

• Input it at every decoder time step (Kalchbrenner & Blunsom, 2013)

(Neubig, 2017)
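A compact encoder-decoder sketch in PyTorch, following the first option above (Sutskever et al., 2014): the encoder's final hidden state initializes the decoder, which then acts as a conditional language model over the response. All sizes and names are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, input_sentence, response):
        # Encode the input sentence; keep only its final hidden state
        _, h = self.encoder(self.embed(input_sentence))
        # Initialize the decoder with the encoder state and unroll it over the response
        dec_states, _ = self.decoder(self.embed(response), h)
        return self.out(dec_states)          # logits for P(y_j | X, y_1, ..., y_{j-1})

model = Seq2Seq()
src = torch.randint(0, 1000, (1, 6))         # input_sentence token IDs
tgt = torch.randint(0, 1000, (1, 5))         # response token IDs (teacher forcing)
print(model(src, tgt).shape)                 # (1, 5, 1000)
```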

Sequence to Sequence Models

Constraints of Neural Models

• Gesture
• Gaze
• Laughter
• Backchanneling
• Long-term conversation planning
• Context
• Engagement

Examples of Neural Chatbots

• Tay
• Zo
• Xiaoice
• https://www.youtube.com/watch?v=dg-x1WuGhuI

Alexa Prize Challenge

• Challenge: build a chatbot that engages users for 20 minutes.
• Sponsored 12 university teams with $100k.
• CMU Magnus and CMU Ruby.
• Systems are multi-component
  o Combinations of task / non-task
  o Hand-written and statistical/neural models
• It's about engaging researchers
  o Having more PhD students work on dialog
  o Giving developers access to users
  o Collecting data: what do users say

CMU Magnus

• High average number of turns

• Average rating

• Topics: Movies, Sports, Travel, GoT

• Users had longer conversations but did not enjoy the conversation.
  o Identify when the user is frustrated or wants to change the topic.
  o Identify what the user would like to talk about (intent).

• Detecting "abusive" remarks and responding appropriately

Summary

• How to represent words in continuous space.
• What RNNs are and how to use them to represent a sentence.
• Sequence to sequence models for P(response | input_sentence)
• Issues in neural models
• Issues with live systems!

References

• http://www.phontron.com/class/nn4nlp2017/assets/slides/nn4nlp-03-wordemb.pdf
• http://www.phontron.com/class/nn4nlp2017/assets/slides/nn4nlp-06-rnn.pdf
• http://www.phontron.com/class/nn4nlp2017/assets/slides/nn4nlp-08-condlm.pdf
• https://www.cs.cmu.edu/~rsalakhu/10707/Lectures/Lecture_Language_2019.pdf
• http://www.phontron.com/class/mtandseq2seq2017/mt-spring2017.chapter6.pdf

References

• http://www.wildml.com/2016/04/deep-learning-for-chatbots-part-1-introduction/
• http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
• http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-2-implementing-a-language-model-rnn-with-python-numpy-and-theano/
• http://www.wildml.com/2016/07/deep-learning-for-chatbots-2-retrieval-based-model-tensorflow/
• https://nlp.stanford.edu/seminar/details/jdevlin.pdf

RNN to Represent a Sentence

[Figure: an RNN reads the embeddings of "how", "are", "you", "?", producing hidden states s_0, s_1, s_2, s_3, s_4.]

• s_4 is the representation of the entire sentence
• s_4 is the representation of the probability of observing "how are you?"