Deep Learning for Natural Language Processing

Source: web.cecs.pdx.edu/.../DeepLearningLanguage.pdf

Topics

•  Word embeddings

•  Recurrent neural networks

•  Long short-term memory networks

•  Neural machine translation

•  Automatically generating image captions

Word meaning in NLP

•  How do we capture meaning and context of words?

Synonyms: “I loved the movie.” “I adored the movie.”

Synecdoche: “Today, Washington affirmed its opposition to the trade pact.”

Homonyms: “I deposited the money in the bank.” “I buried the money in the bank.”

Polysemy: “I read a book today.” “I wasn’t able to book the hotel room.”

Word Embeddings

“One of the most successful ideas of modern NLP”.

One example: Google’s Word2Vec algorithm

Word2Vec algorithm

Input: One-hot representation of input word over vocabulary (10,000 units)

Hidden layer (linear activation function): 300 units

Output: Probability (for each word w_i in vocabulary) that w_i is nearby the input word in a sentence (10,000 units)

Weights: 10,000 × 300 (input to hidden) and 300 × 10,000 (hidden to output)
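The forward pass through this network can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the layer sizes are scaled down (10 and 3 instead of 10,000 and 300) so it runs instantly, the output layer is assumed to be a softmax, and the weight initialisation and the names `forward`, `w_in`, and `w_out` are hypothetical.

```python
import math
import random

VOCAB_SIZE = 10   # scaled down from the slide's 10,000 units
HIDDEN_SIZE = 3   # scaled down from the slide's 300 units

rng = random.Random(0)
# w_in is VOCAB_SIZE x HIDDEN_SIZE, w_out is HIDDEN_SIZE x VOCAB_SIZE.
# Small random starting values are an assumption, not taken from the slides.
w_in = [[rng.uniform(-0.5, 0.5) for _ in range(HIDDEN_SIZE)]
        for _ in range(VOCAB_SIZE)]
w_out = [[rng.uniform(-0.5, 0.5) for _ in range(VOCAB_SIZE)]
         for _ in range(HIDDEN_SIZE)]


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(word_index):
    """Return P(w_i is nearby the input word) for every vocabulary word."""
    # A one-hot input times w_in selects one row of w_in; with a linear
    # activation function the hidden layer is that row unchanged.
    hidden = w_in[word_index]
    # Each output unit scores one vocabulary word.
    scores = [sum(hidden[k] * w_out[k][j] for k in range(HIDDEN_SIZE))
              for j in range(VOCAB_SIZE)]
    return softmax(scores)


probs = forward(3)
```

`probs` holds one probability per vocabulary word and sums to 1; before training the values are close to uniform.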

Word2Vec training

•  Training corpus of documents

•  Collect pairs of nearby words

•  Example “document”: Every morning she drinks Starbucks coffee.

Training pairs (window size = 3):

(every, morning) (every, she) (morning, she)
(morning, drinks) (she, drinks) (she, Starbucks)
(drinks, Starbucks) (drinks, coffee) (Starbucks, coffee)
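The pair collection can be reproduced with a short Python sketch. It assumes that "window size = 3" means two words form a pair when they fit inside a window of three consecutive words, and, like the slide, it lists each pair once in sentence order. The function name `training_pairs` and the hand-written token list are illustrative choices.

```python
def training_pairs(tokens, window_size=3):
    """Collect (word, later word) pairs that share a window of the given size."""
    pairs = []
    for i, word in enumerate(tokens):
        # Pair the word with each later word inside the same window.
        for j in range(i + 1, min(i + window_size, len(tokens))):
            pairs.append((word, tokens[j]))
    return pairs


# The example "document", tokenised by hand.
tokens = ["every", "morning", "she", "drinks", "Starbucks", "coffee"]
pairs = training_pairs(tokens)

for pair in pairs:
    print(pair)
```

For the six-word example this yields the same nine pairs as listed above.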

Word2Vec training via backpropagation

[Figure: the network with its 10,000 × 300 and 300 × 10,000 weights and a linear activation function in the hidden layer.]

Input: drinks

Target: Starbucks (probability that “Starbucks” is nearby “drinks”)

Word2Vec training via backpropagation

[Figure: the network with its 10,000 × 300 and 300 × 10,000 weights and a linear activation function in the hidden layer.]

Input: drinks

Target: coffee (probability that “coffee” is nearby “drinks”)
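One backpropagation update for a single training pair might look like the sketch below. It is a toy version under several assumptions: a six-word vocabulary and three hidden units instead of 10,000 and 300, a softmax output with cross-entropy loss, and plain gradient descent with a hypothetical learning rate of 0.1. Practical word2vec implementations use cheaper approximations of the softmax.

```python
import math
import random

VOCAB = ["every", "morning", "she", "drinks", "Starbucks", "coffee"]
HIDDEN_SIZE = 3  # scaled down from the slide's 300 units

rng = random.Random(0)
# w_in is vocabulary x hidden, w_out is hidden x vocabulary.
w_in = [[rng.uniform(-0.5, 0.5) for _ in range(HIDDEN_SIZE)] for _ in VOCAB]
w_out = [[rng.uniform(-0.5, 0.5) for _ in VOCAB] for _ in range(HIDDEN_SIZE)]


def predict(input_idx):
    """Forward pass: linear hidden layer, softmax output (an assumption)."""
    hidden = list(w_in[input_idx])
    scores = [sum(hidden[k] * w_out[k][j] for k in range(HIDDEN_SIZE))
              for j in range(len(VOCAB))]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return hidden, [e / total for e in exps]


def train_pair(input_word, target_word, learning_rate=0.1):
    """One gradient step; returns the loss measured before the update."""
    i, t = VOCAB.index(input_word), VOCAB.index(target_word)
    hidden, probs = predict(i)
    loss = -math.log(probs[t])
    # Gradient of the cross-entropy loss with respect to the output scores.
    d_scores = [p - (1.0 if j == t else 0.0) for j, p in enumerate(probs)]
    # Backpropagate to the hidden layer before changing w_out.
    d_hidden = [sum(w_out[k][j] * d_scores[j] for j in range(len(VOCAB)))
                for k in range(HIDDEN_SIZE)]
    for k in range(HIDDEN_SIZE):
        for j in range(len(VOCAB)):
            w_out[k][j] -= learning_rate * hidden[k] * d_scores[j]
        # Only the row of w_in that belongs to the input word changes.
        w_in[i][k] -= learning_rate * d_hidden[k]
    return loss


loss_before = train_pair("drinks", "Starbucks")
loss_after = train_pair("drinks", "Starbucks")
```

Repeating the update on the same pair lowers the loss, that is, it raises the predicted probability that “Starbucks” is nearby “drinks”.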

Learned word vectors

[Figure: the 10,000 × 300 weights, with the input word “drinks” selecting its learned vector.]
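Why the input-side weights can be read as word vectors is easy to check: multiplying a one-hot input by the weight matrix returns exactly one row of that matrix. The sketch below uses a six-word vocabulary and made-up weight values purely for illustration.

```python
def one_hot(index, size):
    """One-hot vector with a 1.0 at the given position."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def vector_times_matrix(vec, matrix):
    """Row vector times matrix, written out explicitly."""
    cols = len(matrix[0])
    return [sum(vec[r] * matrix[r][c] for r in range(len(vec)))
            for c in range(cols)]


vocab = ["every", "morning", "she", "drinks", "Starbucks", "coffee"]
# Made-up 6 x 3 weights standing in for the learned 10,000 x 300 matrix.
weights = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [-0.3, 0.25, 0.75],
    [-0.2, 0.3, 0.7],
    [0.9, -0.1, 0.0],
]

drinks_index = vocab.index("drinks")
via_product = vector_times_matrix(one_hot(drinks_index, len(vocab)), weights)
via_lookup = weights[drinks_index]
```

Both routes give the same three numbers, so after training the row for “drinks” can be used directly as its word vector.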

Some surprising results of word2vec

http://www.aclweb.org/anthology/N13-1#page=784

http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf

Word embeddings demo

http://bionlp-www.utu.fi/wv_demo/

From http://axon.cs.byu.edu/~martinez/classes/678/Slides/Recurrent.pptx

Recurrent Neural Network (RNN)

From http://eric-yuan.me/rnn2-lstm/

Recurrent Neural Network “unfolded” in time

Training algorithm: “Backpropagation through time”
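Unfolding in time amounts to applying the same weights at every time step while carrying the hidden state forward. The sketch below assumes a simple tanh recurrence (an Elman-style RNN); the sizes, weight values, and function names are illustrative only.

```python
import math


def rnn_step(x, h_prev, w_xh, w_hh, bias):
    """One time step: h_t = tanh(W_xh x_t + W_hh h_(t-1) + b)."""
    h_new = []
    for i in range(len(h_prev)):
        total = bias[i]
        total += sum(w_xh[i][j] * x[j] for j in range(len(x)))
        total += sum(w_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new


def run_unfolded(inputs, h0, w_xh, w_hh, bias):
    """Apply the same weights at every step, as in the unfolded diagram."""
    states = []
    h = h0
    for x in inputs:
        h = rnn_step(x, h, w_xh, w_hh, bias)
        states.append(h)
    return states


# Two input units, two hidden units, three time steps (made-up values).
w_xh = [[0.5, -0.3], [0.1, 0.8]]
w_hh = [[0.2, 0.4], [-0.6, 0.3]]
bias = [0.0, 0.1]
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

states = run_unfolded(inputs, [0.0, 0.0], w_xh, w_hh, bias)
```

Backpropagation through time then runs ordinary backpropagation over this unfolded chain of steps, summing the gradient contributions to the shared weights.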

Encoder-decoder (or “sequence-to-sequence”) networks for translation

http://book.paddlepaddle.org/08.machine_translation/image/encoder_decoder_en.png

Problem for RNNs: learning long-term dependencies.

“The cat that my mother’s sister took to Hawaii the year before last when you were in high school is now living with my cousin.”

Backpropagation through time: problem of vanishing gradients
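The vanishing-gradient effect can be seen numerically in a one-unit RNN. Assuming the recurrence h_t = tanh(w · h_(t-1) + x_t), the derivative of the current state with respect to the initial state is a product of per-step factors w · (1 − h_t²). The recurrent weight 0.5 and the constant inputs below are made-up values chosen for illustration.

```python
import math


def gradient_through_time(w, inputs, h0=0.0):
    """Track d h_t / d h_0 for the scalar RNN h_t = tanh(w * h_(t-1) + x_t)."""
    h = h0
    grad = 1.0
    history = []
    for x in inputs:
        h = math.tanh(w * h + x)
        # Chain rule: each step multiplies in the factor w * (1 - h_t^2).
        grad *= w * (1.0 - h * h)
        history.append(grad)
    return history


history = gradient_through_time(w=0.5, inputs=[0.1] * 20)
for step in (1, 5, 10, 20):
    print(step, history[step - 1])
```

Each factor here is at most 0.5, so the gradient shrinks roughly exponentially with the number of steps, and a word early in a long sentence has almost no influence on the weight update computed at the end.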

Long Short-Term Memory (LSTM)

•  A “neuron” with a complicated memory gating structure.

•  Replaces ordinary hidden neurons in RNNs.

•  Designed to avoid the long-term dependency problem.

Long Short-Term Memory (LSTM) Unit

Simple RNN (hidden) unit

LSTM (hidden) unit

From https://deeplearning4j.org/lstm.html

Comments on LSTMs

•  LSTM unit replaces simple RNN unit

•  LSTM internal weights still trained with backpropagation

•  Cell value has feedback loop: can remember value indefinitely

•  Function of gates (“input”, “forget”, “output”) is learned via minimizing loss
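A single LSTM unit can be sketched with scalar weights to show how the gates and the cell feedback loop interact. This follows the commonly used LSTM formulation, which may differ in detail from the unit in the figure. The parameter values are set by hand (forget gate held near 1, input gate near 0) purely to demonstrate memory retention; in a trained network they would be learned by minimizing the loss. All names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a single LSTM unit with scalar weights."""
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])    # input gate
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])    # forget gate
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])    # output gate
    g = math.tanh(p["w_g"] * x + p["u_g"] * h_prev + p["b_g"])  # candidate value
    c = f * c_prev + i * g   # cell value: the feedback loop
    h = o * math.tanh(c)     # output of the unit
    return h, c


# Hand-set parameters (an assumption for the demo, not learned values):
# the forget gate stays near 1 and the input gate near 0.
params = {
    "w_i": 0.5, "u_i": 0.5, "b_i": -10.0,
    "w_f": 0.5, "u_f": 0.5, "b_f": 10.0,
    "w_o": 0.5, "u_o": 0.5, "b_o": 0.0,
    "w_g": 1.0, "u_g": 1.0, "b_g": 0.0,
}

h, c = 0.0, 1.0  # start with the value 1.0 stored in the cell
for x in [0.3, -0.2, 0.8] * 10:  # 30 time steps of made-up input
    h, c = lstm_step(x, h, c, params)
```

After 30 steps the cell value is still very close to 1.0, which illustrates how the cell can hold a value for as long as the gates allow.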

Google “Neural Machine Translation”: (unfolded in time)

From https://arxiv.org/pdf/1609.08144.pdf

Neural Machine Translation:

Training: Maximum likelihood, using gradient descent on weights

Trained on very large corpus of parallel texts in source (X) and target (Y) languages.

$$\theta^{*} = \arg\max_{\theta} \sum_{(X,Y)} \log P(Y \mid X, \theta)$$
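The quantity being maximized can be computed as follows, assuming the model factorizes a target sentence into one probability per target word given the source sentence and the preceding target words. The probabilities in the toy corpus are invented numbers standing in for a real model's outputs, and the function names are hypothetical.

```python
import math


def sentence_log_prob(token_probs):
    """log P(Y | X) as the sum of per-word log-probabilities."""
    return sum(math.log(p) for p in token_probs)


def corpus_log_likelihood(corpus_token_probs):
    """Sum of log P(Y | X) over all sentence pairs in the corpus."""
    return sum(sentence_log_prob(probs) for probs in corpus_token_probs)


# Invented per-word probabilities for two target sentences.
toy_corpus = [
    [0.5, 0.5],  # a two-word target sentence
    [0.25],      # a one-word target sentence
]
objective = corpus_log_likelihood(toy_corpus)
```

Training adjusts the weights so that this sum increases; in practice the negative of it is minimized with gradient descent.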

How to evaluate automated translations?

Human raters’ side-by-side comparisons: Scale of 0 to 6

0: “completely nonsense translation”

2: “the sentence preserves some of the meaning of the source sentence but misses significant parts”

4: “the sentence retains most of the meaning of the source sentence, but may have some grammar mistakes”

6: “perfect translation: the meaning of the translation is completely consistent with the source, and the grammar is correct.”

Results from Human Raters

Automating Image Captioning

Words in caption

Word embeddings

Softmax probability distribution over vocabulary

Training: Large dataset of image/caption pairs from Flickr and other sources

CNN features

Vinyals et al., “Show and Tell: A Neural Image Caption Generator”, CVPR 2015

“NeuralTalk” sample results

From http://cs.stanford.edu/people/karpathy/deepimagesent/generationdemo/

Microsoft Captionbot

https://www.captionbot.ai/