Deep learning for natural language embeddings
-
Upload
roelof-pieters -
Category
Education
-
view
1.018 -
download
2
Transcript of Deep learning for natural language embeddings
![Page 1: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/1.jpg)
@graphificRoelof Pieters
DeepLearningforNaturalLanguageEmbeddings
2May2016 KTH
www.csc.kth.se/~roelof/ [email protected]
![Page 2: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/2.jpg)
Can we understand Language ?
1. Language is ambiguous:Every sentence has many possible interpretations.
2. Language is productive:We will always encounter new words or new constructions
3. Language is culturally specific
Some of the challenges in Language Understanding:
2
![Page 3: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/3.jpg)
Can we understand Language ?
1. Language is ambiguous:Every sentence has many possible interpretations.
2. Language is productive:We will always encounter new words or new constructions
• plays well with others
VB ADV P NN
NN NN P DT
• fruit flies like a banana
NN NN VB DT NN
NN VB P DT NN
NN NN P DT NN
NN VB VB DT NN
• the students went to class DT NN VB P NN
3
Some of the challenges in Language Understanding:
![Page 4: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/4.jpg)
Can we understand Language ?
1. Language is ambiguous:Every sentence has many possible interpretations.
2. Language is productive:We will always encounter new words or new constructions
4
Some of the challenges in Language Understanding:
![Page 5: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/5.jpg)
[Karlgren 2014, NLP Sthlm Meetup]5
Digital Media Deluge: text
![Page 6: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/6.jpg)
[ http://lexicon.gavagai.se/lookup/en/lol ]6
Digital Media Deluge: text
lol ?
…
![Page 7: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/7.jpg)
Can we understand Language ?
1. Language is ambiguous:Every sentence has many possible interpretations.
2. Language is productive:We will always encounter new words or new constructions
3. Language is culturally specific
Some of the challenges in Language Understanding:
7
![Page 8: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/8.jpg)
Can we understand Language ?
8
![Page 9: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/9.jpg)
NLP Data?
some of the (many) treebank datasets
source: http://www-nlp.stanford.edu/links/statnlp.html#Treebanks
!
9
![Page 10: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/10.jpg)
Penn TreebankThat’s a lot of “manual” work:
10
![Page 11: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/11.jpg)
• the students went to class
DT NN VB P NN
• plays well with others
VB ADV P NN
NN NN P DT
• fruit flies like a banana
NN NN VB DT NN
NN VB P DT NN
NN NN P DT NN
NN VB VB DT NN
With a lot of issues:
Penn Treebank
11
![Page 12: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/12.jpg)
Deep Learning: Why for NLP ?
Beat state of the art at:• Language Modeling (Mikolov et al. 2011) [WSJ AR task] • Speech Recognition (Dahl et al. 2012, Seide et al 2011;
following Mohammed et al. 2011)• Sentiment Classification (Socher et al. 2011)
and we already know for some longer time it works well for Computer Vision of course:• MNIST hand-written digit recognition (Ciresan et al.
2010)• Image Recognition (Krizhevsky et al. 2012) [ImageNet]
12
![Page 13: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/13.jpg)
One Model rules them all ?DL approaches have been successfully applied to:
Deep Learning: Why for NLP ?
Automatic summarization Coreference resolution Discourse analysis
Machine translation Morphological segmentation Named entity recognition (NER)
Natural language generation
Natural language understanding
Optical character recognition (OCR)
Part-of-speech tagging
Parsing
Question answering
Relationship extraction
sentence boundary disambiguation
Sentiment analysis
Speech recognition
Speech segmentation
Topic segmentation and recognition
Word segmentation
Word sense disambiguation
Information retrieval (IR)
Information extraction (IE)
Speech processing
13
![Page 14: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/14.jpg)
• What is the meaning of a word? (Lexical semantics)
• What is the meaning of a sentence? ([Compositional] semantics)
• What is the meaning of a longer piece of text?(Discourse semantics)
Semantics: Meaning
14
![Page 15: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/15.jpg)
• Language models define probability distributions over (natural language) strings or sentences
• Joint and Conditional Probability
Language Model
15
![Page 16: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/16.jpg)
• Language models define probability distributions over (natural language) strings or sentences
Language Model
16
![Page 17: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/17.jpg)
• Language models define probability distributions over (natural language) strings or sentences
Language Model
17
![Page 18: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/18.jpg)
Word senses
What is the meaning of words?
• Most words have many different senses:dog = animal or sausage?
How are the meanings of different words related?
• - Specific relations between senses:Animal is more general than dog.
• - Semantic fields:money is related to bank
18
![Page 19: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/19.jpg)
Word senses
Polysemy:
• A lexeme is polysemous if it has different related senses
• bank = financial institution or building
Homonyms:
• Two lexemes are homonyms if their senses are unrelated, but they happen to have the same spelling and pronunciation
• bank = (financial) bank or (river) bank
19
![Page 20: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/20.jpg)
Word senses: relations
Symmetric relations:
• Synonyms: couch/sofaTwo lemmas with the same sense
• Antonyms: cold/hot, rise/fall, in/outTwo lemmas with the opposite sense
Hierarchical relations:
• Hypernyms and hyponyms: pet/dog The hyponym (dog) is more specific than the hypernym (pet)
• Holonyms and meronyms: car/wheelThe meronym (wheel) is a part of the holonym (car)
20
![Page 21: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/21.jpg)
Distributional representations
“You shall know a word by the company it keeps” (J. R. Firth 1957)
One of the most successful ideas of modern statistical NLP!
these words represent banking21
![Page 22: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/22.jpg)
Distributional hypothesis
He filled the wampimuk, passed it around and we all drunk some
We found a little, hairy wampimuk sleeping behind the tree
(McDonald & Ramscar 2001)22
![Page 23: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/23.jpg)
• NLP treats words mainly (rule-based/statistical approaches at least) as atomic symbols:
• or in vector space:
• also known as “one hot” representation.
• Its problem ?
Word Representation
Love Candy Store
[0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 …]
Candy [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 …] ANDStore [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 …] = 0 !
23
![Page 24: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/24.jpg)
Word Representation
24
![Page 25: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/25.jpg)
Distributional semanticsDistributional meaning as co-occurrence vector:
25
![Page 26: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/26.jpg)
Deep Distributional representations
• Taking it further:
• Continuous word embeddings
• Combine vector space semantics with the prediction of probabilistic models
• Words are represented as a dense vector:
Candy =
26
![Page 27: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/27.jpg)
Word Embeddings: SocherVector Space Model
adapted rom Bengio, “Representation Learning and Deep Learning”, July, 2012, UCLA
In a perfect world:
27
![Page 28: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/28.jpg)
Word Embeddings: SocherVector Space Model
adapted rom Bengio, “Representation Learning and Deep Learning”, July, 2012, UCLA
In a perfect world:
the country of my birththe place where I was born
28
![Page 29: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/29.jpg)
How?
![Page 30: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/30.jpg)
• Representation of words as continuous vectors has a long history (Hinton et al. 1986; Rumelhart et al. 1986; Elman 1990)
• First neural network language model: NNLM (Bengio et al. 2001; Bengio et al. 2003) based on earlier ideas of distributed representations for symbols (Hinton 1986)
How?
30
![Page 31: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/31.jpg)
Word Embeddings: SocherVector Space Model
Figure (edited) from Bengio, “Representation Learning and Deep Learning”, July, 2012, UCLA
In a perfect world:
the country of my birththe place where I was born ?
…
31
![Page 32: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/32.jpg)
How? Language Compositionality
Principle of compositionality:
the “meaning (vector) of a complex expression (sentence) is determined by:
— Gottlob Frege (1848 - 1925)
- the meanings of its constituent expressions (words) and
- the rules (grammar) used to combine them”
32
![Page 33: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/33.jpg)
• Can theoretically (given enough units) approximate “any” function
• and fit to “any” kind of data
• Efficient for NLP: hidden layers can be used as word lookup tables
• Dense distributed word vectors + efficient NN training algorithms:
• Can scale to billions of words !
Neural Networks for NLP
33
![Page 34: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/34.jpg)
• How do we handle the compositionality of language in our models?
34
Compositionality
![Page 35: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/35.jpg)
• How do we handle the compositionality of language in our models?
• Recursion :the same operator (same parameters) is applied repeatedly on different components
35
Compositionality
![Page 36: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/36.jpg)
• How do we handle the compositionality of language in our models?
• Option 1: Recurrent Neural Networks (RNN)
36
RNN 1: Recurrent Neural Networks
![Page 37: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/37.jpg)
• How do we handle the compositionality of language in our models?
• Option 2: Recursive Neural Networks (also sometimes called RNN)
37
RNN 2: Recursive Neural Networks
![Page 38: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/38.jpg)
• achieved SOTA in 2011 on Language Modeling (WSJ AR task) (Mikolov et al., INTERSPEECH 2011):
• and again at ASRU 2011:
38
Recurrent Neural Networks
“Comparison to other LMs shows that RNN LMs are state of the art by a large margin. Improvements inrease with more training data.”
“[ RNN LM trained on a] single core on 400M words in a few days, with 1% absolute improvement in WER on state of the art setup”
Mikolov, T., Karafiat, M., Burget, L., Cernock, J.H., Khudanpur, S. (2011)Recurrent neural network based language model
![Page 39: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/39.jpg)
39
Recurrent Neural Networks
(simple recurrent neural network for LM)
input
hidden layer(s)
output layer
+ sigmoid activation function+ softmax function:
Mikolov, T., Karafiat, M., Burget, L., Cernock, J.H., Khudanpur, S. (2011)Recurrent neural network based language model
![Page 40: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/40.jpg)
40
Recurrent Neural Networks
backpropagation through time
![Page 41: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/41.jpg)
41
Recurrent Neural Networks
backpropagation through time
class based recurrent NN
[code (Mikolov’s RNNLM Toolkit) and more info: http://rnnlm.org/ ]
![Page 42: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/42.jpg)
• Recursive Neural Network for LM (Socher et al. 2011; Socher 2014)
• achieved SOTA on new Stanford Sentiment Treebank dataset (but comparing it to many other models):
Recursive Neural Network
42
Socher, R., Perelygin,, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C. (2013)Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
info & code: http://nlp.stanford.edu/sentiment/
![Page 43: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/43.jpg)
Recursive Neural Tensor Network
43
code & info: http://www.socher.org/index.php/Main/ParsingNaturalScenesAndNaturalLanguageWithRecursiveNeuralNetworks
Socher, R., Liu, C.C., NG, A.Y., Manning, C.D. (2011) Parsing Natural Scenes and Natural Language with Recursive Neural Networks
![Page 44: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/44.jpg)
Recursive Neural Tensor Network
44
![Page 45: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/45.jpg)
• RNN (Socher et al. 2011a)
Recursive Neural Network
45
Socher, R., Perelygin,, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C. (2013)Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
info & code: http://nlp.stanford.edu/sentiment/
![Page 46: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/46.jpg)
• RNN (Socher et al. 2011a)
• Matrix-Vector RNN (MV-RNN) (Socher et al., 2012)
Recursive Neural Network
46
Socher, R., Perelygin,, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C. (2013)Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
info & code: http://nlp.stanford.edu/sentiment/
![Page 47: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/47.jpg)
• RNN (Socher et al. 2011a)
• Matrix-Vector RNN (MV-RNN) (Socher et al., 2012)
• Recursive Neural Tensor Network (RNTN) (Socher et al. 2013)
Recursive Neural Network
47
![Page 48: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/48.jpg)
• negation detection:
Recursive Neural Network
48
Socher, R., Perelygin,, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C. (2013)Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
info & code: http://nlp.stanford.edu/sentiment/
![Page 49: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/49.jpg)
Recurrent vs Recursive
49
recursive net untied recursive net recurrent net
![Page 50: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/50.jpg)
NP
PP/IN
NP
DT NN PRP$ NN
Parse TreeRecurrent NN for Vector Space
50
![Page 51: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/51.jpg)
NP
PP/IN
NP
DT NN PRP$ NN
Parse Tree
INDT NN PRP NN
Compositionality
51
Recurrent NN: CompositionalityRecurrent NN for Vector Space
![Page 52: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/52.jpg)
NP
IN
NP
PRP NN
Parse Tree
DT NN
Compositionality
52
Recurrent NN: CompositionalityRecurrent NN for Vector Space
![Page 53: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/53.jpg)
NP
IN
NP
DT NN PRP NN
PP
NP (S / ROOT)
“rules” “meanings”
Compositionality
53
Recurrent NN: CompositionalityRecurrent NN for Vector Space
![Page 54: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/54.jpg)
Vector Space + Word Embeddings: Socher
54
Recurrent NN: CompositionalityRecurrent NN for Vector Space
![Page 55: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/55.jpg)
Vector Space + Word Embeddings: Socher
55
Recurrent NN for Vector Space
![Page 56: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/56.jpg)
Word Embeddings: Turian (2010)
Turian, J., Ratinov, L., Bengio, Y. (2010). Word representations: A simple and general method for semi-supervised learning
code & info: http://metaoptimize.com/projects/wordreprs/ 56
![Page 57: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/57.jpg)
Word Embeddings: Turian (2010)
Turian, J., Ratinov, L., Bengio, Y. (2010). Word representations: A simple and general method for semi-supervised learning
code & info: http://metaoptimize.com/projects/wordreprs/ 57
![Page 58: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/58.jpg)
Word Embeddings: Collobert & Weston (2011)
Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P. (2011) . Natural Language Processing (almost) from Scratch
58
![Page 59: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/59.jpg)
Word2Vec & Linguistic Regularities: Mikolov (2013)
code & info: https://code.google.com/p/word2vec/ Mikolov, T., Yih, W., & Zweig, G. (2013). Linguistic Regularities in Continuous Space Word Representations
59
![Page 60: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/60.jpg)
Word2Vec & Linguistic Regularities: Mikolov (2013)
• Similar to the feedforward NNLM, but
• Non-linear hidden layer removed • Projection layer shared for all words –
Not just the projection matrix • Thus, all words get projected into the
same position – Their vectors are just averaged
• Called CBOW (continuous BoW) because the order of the words is lost
• Another modification is to use words from past and from future (window centered on current word)
Continuous BoW (CBOW) Model
![Page 61: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/61.jpg)
Word2Vec & Linguistic Regularities: Mikolov (2013)
• Similar to CBOW, but instead of predicting the current word based on the context
• Tries to maximize classification of a word based on another word in the same sentence
• Thus, uses each current word as an input to a log-linear classifier
• Predicts words within a certain window • Observations – Larger window size =>
better quality of the resulting word vectors, higher training time – More distant words are usually less related to the current word than those close to it – Give less weight to the distant words by sampling less from those words in the training examples
Continuous Skip-gram Model
![Page 62: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/62.jpg)
GloVe: Global Vectors for Word Representation (Pennington 2015)
62
Glove == Word2Vec?
Omer Levy & Yoav Goldberg (2014) Neural Word Embeddingas Implicit Matrix Factorization
![Page 63: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/63.jpg)
Paragraph Vectors: Dai et al. (2014)
63
![Page 64: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/64.jpg)
Paragraph Vectors: Dai et al. (2014)
64
Paragraph vector with distributed memory (PV-DM)
Distributed bag of words version (PVDBOW)
![Page 65: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/65.jpg)
Paragraph Vectors: Dai et al. (2014)
65
![Page 66: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/66.jpg)
Swivel: Improving Embeddings by Noticing What's Missing (Shazeer et al 2016)
66
![Page 67: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/67.jpg)
Swivel: Improving Embeddings by Noticing What's Missing (Shazeer et al 2016)
67
![Page 68: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/68.jpg)
Multi-embeddings: Stanford (2012)
Eric H. Huang, Richard Socher, Christopher D. Manning, Andrew Y. Ng (2012)Improving Word Representations via Global Context and Multiple Word Prototypes
68
![Page 69: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/69.jpg)
Word Embeddings for MT: Mikolov (2013)
Mikolov, T., Le, V. L., Sutskever, I. (2013) . Exploiting Similarities among Languages for Machine Translation
69
![Page 70: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/70.jpg)
Word Embeddings for MT: Kiros (2014)
70
![Page 71: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/71.jpg)
Recursive Embeddings for Sentiment: Socher (2013)
Socher, R., Perelygin, A., Wu, J., Chuang, J.,Manning, C., Ng, A., Potts, C. (2013) Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank.
code & demo: http://nlp.stanford.edu/sentiment/index.html71
![Page 72: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/72.jpg)
![Page 73: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/73.jpg)
• RNN
RNN
=
(unrolled)
![Page 74: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/74.jpg)
>Word Embeddings: Sentiment Analysis• RNN
![Page 75: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/75.jpg)
>Word Embeddings: Sentiment Analysis• Bidirectional RNN
![Page 76: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/76.jpg)
>Word Embeddings: Sentiment Analysis• Vanishing gradient problem
![Page 77: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/77.jpg)
• LSTM
>Word Embeddings: Sentiment Analysis
![Page 78: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/78.jpg)
• LSTM
Most cited LSTM paper:- Graves, Alex. Supervised sequence labelling with recurrent neura l networks. Vo l . 385. Springer, 2012.
Introduction of the LSTM model:- Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.
Addition of the forget gate to the LSTM model:- Gers, F. A., Schmidhuber, J., & Cummins, F. (2000). Learning to forget: Continual prediction with LSTM. Neural computation, 12(10), 2451-2471.
>Word Embeddings: Sentiment Analysis
![Page 79: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/79.jpg)
>Word Embeddings: Sentiment Analysis• Vanishing gradient problem… solved
![Page 80: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/80.jpg)
Wanna be Doing Deep Learning?
![Page 81: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/81.jpg)
python has a wide range of deep learning-related libraries available
Deep Learning with Python
Low level
High level
deeplearning.net/software/theano
caffe.berkeleyvision.org
tensorflow.org/
lasagne.readthedocs.org/en/latest
and of course:
keras.io
![Page 84: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/84.jpg)
Questions?love letters? existential dilemma’s? academic questions? gifts?
find me at:www.csc.kth.se/~roelof/
@graphific
Oh, and soon we’re looking for Creative AI enthusiasts !
- job- internship- thesis work
inAI (Deep Learning)
&Creativity
![Page 85: Deep learning for natural language embeddings](https://reader034.fdocuments.us/reader034/viewer/2022042723/5871529d1a28ab8e5b8b4857/html5/thumbnails/85.jpg)