CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c...
Transcript of CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c...
![Page 1: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/1.jpg)
CS287: Statistical Natural Language Processing
Alexander Rush
April 6, 2016
![Page 2: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/2.jpg)
Contents
Applications
Scientific Challenges
Deep Learning for Natural Language Processing
This Class
![Page 3: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/3.jpg)
Count-based Language Models
By the chain rule, any distribution can be factorized as
p(w1, . . . ,wT ) =T∏
t=1
p(wt |w1, . . . ,wt−1)
Count-based n-gram language models make a Markov assumption:
p(wt |w1, . . . ,wt) ≈ p(wt |wt−n, . . . ,wt−1)
Need smoothing to deal with rare n-grams.
Kim, Jernite, Sontag, Rush Character-Aware Neural Language Models 3 / 75
![Page 4: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/4.jpg)
Neural Language Models
Neural Language Models (NLM)
Represent words as dense vectors in Rn (word embeddings).
wt ∈ R|V| : One-hot representation of word ∈ V at time t⇒ xt = Xwt : Word embedding (X ∈ Rn×|V|, n < |V|)
Train a neural net that composes history to predict next word.
Kim, Jernite, Sontag, Rush Character-Aware Neural Language Models 4 / 75
![Page 5: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/5.jpg)
![Page 6: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/6.jpg)
![Page 7: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/7.jpg)
![Page 8: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/8.jpg)
Contents
Applications
Scientific Challenges
Deep Learning for Natural Language Processing
This Class
![Page 9: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/9.jpg)
Foundational Challenge: Turing Test
Q: Please write me a sonnet on the subject of the Forth
Bridge.
A : Count me out on this one. I never could write poetry.
Q: Add 34957 to 70764.
A: (Pause about 30 seconds and then give as answer) 105621.
Q: Do you play chess?
A: Yes.
Q: I have K at my K1, and no other pieces. You have only K
at K6 and R at R1. It is your move. What do you play?
A: (After a pause of 15 seconds) R-R8 mate. - Turing (1950)
![Page 10: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/10.jpg)
(1) Lexicons and Lexical Semantics
Zipf’ Law (1935,1949):
The frequency of any word is inversely proportional to its rank
in the frequency table.
![Page 11: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/11.jpg)
(2) Structure and Probabilistic Modeling
The Shannon Game (Shannon and Weaver, 1949):
Given the last n words, can we predict the next one?
The pin-tailed snipe (Gallinago stenura) is a small
stocky wader. It breeds in northern Russia and migrates
to spend the
I Probabilistic models have become very effective at this task.
I Crucial for speech recognition (Jelinek), OCR, automatic
translations, etc.
![Page 12: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/12.jpg)
(3) Compositionality of Syntax and Semantics
Probabilistic models give no insight into some of the basic
problems of syntactic structure - Chomsky (1956)
![Page 13: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/13.jpg)
(4) Document Structure and Discourse
Language is not merely a bag-of-words but a tool with
particular properties - Harris (1954)
![Page 14: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/14.jpg)
(5) Knowledge and Reasoning Beyond the Text
It is based on the belief that in modeling language
understanding, we must deal in an integrated way with all of
the aspects of language syntax, semantics, and inference. -
Winograd (1972)
The city councilmen refused the demonstrators a permit
because they [feared/advocated] violence.
I Recently (2011) posed as a challenge for testing commonsense
reasoning.
![Page 15: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/15.jpg)
Contents
Applications
Scientific Challenges
Deep Learning for Natural Language Processing
This Class
![Page 16: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/16.jpg)
Deep Learning and NLP
I Presentation-based on Chris Manning’s “Computational Linguistics
and Deep Learning” (2016) published in Computational Linguistics
Deep Learning waved have lapped at the shores of
computational linguistics for several years now, but 2015
seems like the year when the full force of the tsunami hit
major NLP conferences. - Chris Manning
![Page 17: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/17.jpg)
NLP as a Challenge for Machine Learning
I’d use the billion dollars to build a NASA-size program
focusing on natural langauge processing in all of its glory
(semantics, pragmatics, etc.) ... Intellectually I think that
NLP is fascinating, allowing us to focus on highly structure
inference programs, on issues that go to the core of ‘what is
thought‘ but remain eminently practical, and on a technology
that surely would make the world a better place” - Jordan
(2014)
![Page 18: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/18.jpg)
NLP as a Challenge for Deep Learning
The next big step for Deep Learning is natural language
understanding, which aims to give machines the power to
understand not just individual words but entire sentence and
paragraphs. - Bengio
![Page 19: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/19.jpg)
What are they referring to?
Recent advances in,
I Speech Recognition
I Language Modeling
I Machine Translation
I Question Answering
I many other tasks.
Still,
Problems in higher-level language processing have not seen
the dramatic error-rate reductions from deep learning that
have been seen in speech recognition and object recognition in
vision.
![Page 20: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/20.jpg)
What are they referring to?
Recent advances in,
I Speech Recognition
I Language Modeling
I Machine Translation
I Question Answering
I many other tasks.
Still,
Problems in higher-level language processing have not seen
the dramatic error-rate reductions from deep learning that
have been seen in speech recognition and object recognition in
vision.
![Page 21: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/21.jpg)
Object Recognition
![Page 22: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/22.jpg)
![Page 23: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/23.jpg)
Image Captioning
![Page 24: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/24.jpg)
Central Aspects of Deep Learning for NLP
1. Learn the features representations of language.
2. Construct higher-level structure in a latent manner
3. Train systems completely end-to-end.
![Page 25: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/25.jpg)
Central Aspects of Deep Learning for NLP
1. Learn the features representations of language.
2. Construct higher-level structure in a latent manner
3. Train systems completely end-to-end.
![Page 26: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/26.jpg)
Central Aspects of Deep Learning for NLP
1. Learn the features representations of language.
2. Construct higher-level structure in a latent manner
3. Train systems completely end-to-end.
![Page 27: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/27.jpg)
![Page 28: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/28.jpg)
![Page 29: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/29.jpg)
LSTM
![Page 30: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/30.jpg)
![Page 31: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/31.jpg)
![Page 32: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/32.jpg)
GPU Processing
I Neural Networks are remarkably parallel-izable.
GPU Implementation of a variant of HW1
non-GPU GPU
per epoch 2475s 54.0 s
per batch 787ms 15.6 ms
![Page 33: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/33.jpg)
(1) Compositional Structures?
![Page 34: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/34.jpg)
(2) Understanding Text?
![Page 35: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/35.jpg)
(3) Language and Thought?
I Do these methods tell us anything about core nature of language?
I Do they inform psychology or cognitive problems?
![Page 36: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/36.jpg)
Contents
Applications
Scientific Challenges
Deep Learning for Natural Language Processing
This Class
![Page 37: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/37.jpg)
This Semester
Deep Learning for Natural Language Processing
I Primarily a lecture course.
I Topics and papers distributed throughout.
I Main Goal: Educate researchers in NLP
![Page 38: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/38.jpg)
Background
I Some college-level Machine Learning course
I Practical programming experience
I Interest in applied experimental research (not a theory course)
![Page 39: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/39.jpg)
Audience
Take this class to...
I understand about cutting-edge methods in the area.
I replicate many important recent results
I apply machine learning to relevant, interesting problems
Do not take this class to...
I get experience with common NLP tools (NLTK, CoreNLP, etc.)
I build a system for your (non-NLP) startup
I learn much about modern Linguistics
![Page 40: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/40.jpg)
Audience
Take this class to...
I understand about cutting-edge methods in the area.
I replicate many important recent results
I apply machine learning to relevant, interesting problems
Do not take this class to...
I get experience with common NLP tools (NLTK, CoreNLP, etc.)
I build a system for your (non-NLP) startup
I learn much about modern Linguistics
![Page 41: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/41.jpg)
Topics
1. Machine Learning for Text
2. Feed-Forward Neural Networks
3. Language Modeling and Word Embeddings
4. Recurrent Neural Networks
5. Conditional Random Fields and Structured Prediction
![Page 42: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/42.jpg)
Topics
1. Machine Learning for Text
2. Feed-Forward Neural Networks
3. Language Modeling and Word Embeddings
4. Recurrent Neural Networks
5. Conditional Random Fields and Structured Prediction
![Page 43: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/43.jpg)
Topics
1. Machine Learning for Text
2. Feed-Forward Neural Networks
3. Language Modeling and Word Embeddings
4. Recurrent Neural Networks
5. Conditional Random Fields and Structured Prediction
![Page 44: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/44.jpg)
Topics
1. Machine Learning for Text
2. Feed-Forward Neural Networks
3. Language Modeling and Word Embeddings
4. Recurrent Neural Networks
5. Conditional Random Fields and Structured Prediction
![Page 45: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/45.jpg)
Topics
1. Machine Learning for Text
2. Feed-Forward Neural Networks
3. Language Modeling and Word Embeddings
4. Recurrent Neural Networks
5. Conditional Random Fields and Structured Prediction
![Page 46: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/46.jpg)
Homeworks
Each homework will require you to replicate a research result,
I Text Classification
I Sentence Tagging
I Language Modeling (1)
I Language Modeling (2) (LSTMs)
I Name-Entity Recognition (CRFs)
![Page 47: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/47.jpg)
Programming
Assignments use,
I Python for text processing and visualization
I Lua/Torch for neural networks
I First section on Fri. will be introduction.
![Page 48: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/48.jpg)
Applications
Lectures on NLP applications
I Language Modeling
I Coreference and Pronoun Anaphora
I Neural Machine Translation
I Syntactic Parsing
![Page 49: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/49.jpg)
Final Project
I Empirical project done in teams
I Research-level project on current topics
I Expect top projects to be conference submissions.
![Page 50: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/50.jpg)
Project Ideas
Projects we work on,
I Morphology in language modeling
I In-Document Coreference
I Surface ordering of words in a sentence.
I Question-Answering in Text
![Page 51: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/51.jpg)
Project Ideas
Projects we work on . . .
I Morphology in language modeling
I In-Document Coreference
I Surface ordering of words in a sentence.
I Question-Answering in Text
![Page 52: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/52.jpg)
Project Ideas
Projects to consider . . .
I Information Extraction from Documents
I Twitter and Social Network Modeling
I Visualization of NLP networks
I Deep-Reinforcement Learning and Languages
![Page 53: CS287: Statistical Natural Language Processing · PDF fileContents Applications Scienti c Challenges Deep Learning for Natural Language Processing This Class](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a9e47e17f8b9a2e688cedcc/html5/thumbnails/53.jpg)
- Two undergraduate students Marvin Minsky at Harvard in 1950, build
the first neural network computer the SNARC (3000 Vacuum tubes and
a surplus autopilot mechanism form a B-24 bomber) - von neumann “If
it isn’t now it will be someday”