CS6101 Deep Learning for NLP
Recess Week Machine Translation, Seq2Seq and Attention
Presenters: Yesha, Gary, Vikash, Nick, Hoang
Outline
• Introduction - the evolution of machine translation systems (Yesha)
• Statistical machine translation (Gary)
• Neural machine translation and sequence-to-sequence architecture (Vikash)
• Attention (Nick)
• Other applications of seq2seq and attention (Hoang)
Evolution of Machine Translation

Early Experiments (1920s to 1950s)
● The first known machine translation proposal was made in 1924 in Estonia and involved a typewriter-translator
● In 1933, French-Armenian inventor Georges Artsrouni received a patent for a mechanical machine translator that looked like a typewriter
● From then until 1946, numerous proposals were made
● Russian scholar Peter Petrovich Troyanskii received a patent for a mechanised dictionary
● In 1946, Booth and Richens in Britain demonstrated an automated dictionary
● In 1949, Warren Weaver wrote a memorandum that first raised the possibility of using digital computers to translate documents between natural human languages

Birth of Rule-Based Translation (1954 to 1966)
● In 1954, IBM and Georgetown University conducted a public demo
○ Direct translation systems for pairs of languages
○ Example: Russian-English systems for the US Atomic Energy Commission
● Direct translation: limited scope in terms of the languages and grammar rules used

Diversification of Strategies (1966 to 1975)
● Research continued, with better understanding of linguistics and indirect approaches to system design
● Different types of rule-based translation systems:
○ Interlingual MT
○ Transfer MT

Modern MT (1975 to present)
● Statistical Machine Translation
● Example-Based Machine Translation
● Neural Machine Translation
Statistical Machine Translation
Basic Approach
● Language model
● Translation model

General Formulation
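The two models combine via Bayes' rule in the standard noisy-channel formulation (a sketch of the usual derivation): the best English translation E* of a French sentence F is

```latex
E^* = \operatorname*{arg\,max}_{E} P(E \mid F)
    = \operatorname*{arg\,max}_{E} \frac{P(F \mid E)\, P(E)}{P(F)}
    = \operatorname*{arg\,max}_{E} \underbrace{P(F \mid E)}_{\text{translation model}} \; \underbrace{P(E)}_{\text{language model}}
```

P(F) is constant over candidate translations E, so it drops out of the argmax.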
E.g. “On voit Jon à la télévision” (French)

| Candidate translation | Good match to French? P(F\|E) | Good English? P(E) |
| --- | --- | --- |
| Jon appeared in TV. | ✓ | |
| It back twelve saw. | | |
| In Jon appeared TV. | ✓ | |
| Jon is happy today. | | ✓ |
| Jon appeared on TV. | ✓ | ✓ |
| TV appeared on Jon. | ✓ | |
| Jon was not happy. | | ✓ |
Overview of SMT (from http://courses.engr.illinois.edu/cs447; lecture 22 slides)
The language model P(E) is built by decomposing the sentence probability via the chain rule, simplifying it with a Markov assumption, and estimating the resulting parameters with Maximum Likelihood Estimation.
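A sketch of that standard derivation, shown for a bigram language model:

```latex
\begin{aligned}
P(w_1, \dots, w_n) &= \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1}) && \text{via chain rule} \\
&\approx \prod_{i=1}^{n} P(w_i \mid w_{i-1}) && \text{Markov assumption} \\
\hat{P}(w_i \mid w_{i-1}) &= \frac{\operatorname{count}(w_{i-1}, w_i)}{\operatorname{count}(w_{i-1})} && \text{Maximum Likelihood Estimation}
\end{aligned}
```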
Translation Model
● Learns the lexical mapping between languages + ordering
● Typically focuses on word-level or phrase-level mappings (insufficient data to learn entire-sentence mappings directly)
● Chicken-and-egg problem:
○ Given a translation table, we can easily extract the mappings
○ Given the mappings, we can easily estimate the probabilities

Translation probability table (columns: English, Norwegian, Probability)
Lexical mapping between languages
Word Alignments
● 1-1 mapping, 1-many mapping, many-1 mapping, many-many mapping
● Spurious words are aligned to the NULL word (alignment to source position i = 0)
IBM Translation Models
IBM Translation Model Example
● Produces 3 (fertility: one source word generates three target words)
● Produces 0 (fertility zero: a source word generates nothing)
● Inserts (spurious words generated from NULL)
● Distorts (reordering of the generated words)
IBM SMT Translation Models
● Models are typically built progressively, starting from the simplest to the most complex:
○ IBM1 – lexical translations only
○ IBM2 – lexicon plus absolute position
○ HMM – lexicon plus relative position
○ IBM3 – plus fertilities
○ IBM4 – inverted relative position alignment
○ IBM5 – non-deficient version of Model 4
IBM1 Translation Model
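A sketch of the standard IBM Model 1 likelihood (only lexical translation probabilities t(f_j | e_i), with all alignments equally likely): for a foreign sentence f of length J and an English sentence e of length I (with e_0 = NULL),

```latex
P(f \mid e) = \frac{\epsilon}{(I+1)^{J}} \prod_{j=1}^{J} \sum_{i=0}^{I} t(f_j \mid e_i)
```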
Training Translation Models
● How can t, n, d and p1 be estimated?
● Assume the corpora are already sentence-aligned
● Use the Expectation-Maximization (EM) method to iteratively infer the parameters:
○ Start with uniform translation probabilities
○ Iterate the expectation and maximization steps until convergence
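A minimal runnable sketch of this EM loop for IBM Model 1, estimating only the lexical probabilities t(f|e); the toy corpus here is an illustrative assumption, not from the slides:

```python
from collections import defaultdict

# Toy sentence-aligned corpus: (foreign sentence, English sentence) pairs.
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]

# Start with uniform translation probabilities t(f|e).
foreign_vocab = {f for fs, _ in corpus for f in fs}
t = defaultdict(lambda: 1.0 / len(foreign_vocab))

for _ in range(10):  # iterate until (approximately) converged
    count = defaultdict(float)  # expected counts c(f, e)
    total = defaultdict(float)  # expected counts c(e)
    # Expectation step: collect expected alignment counts under the current t
    for fs, es in corpus:
        for f in fs:
            z = sum(t[(f, e)] for e in es)  # normalizer over possible alignments of f
            for e in es:
                delta = t[(f, e)] / z       # posterior probability that f aligns to e
                count[(f, e)] += delta
                total[e] += delta
    # Maximization step: re-estimate t(f|e) from the expected counts
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]

print(round(t[("haus", "house")], 3))  # approaches 1.0 as EM converges
```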
Decoder – Search Problem
● Finds a list of possible target translation candidates with the highest probabilities
● Exhaustive search is infeasible
● Typically, search heuristics are used instead, e.g. beam search or greedy decoding
Other things not discussed
● Phrase-based alignment
● Sentence alignment
● Syntax-based or grammar-based models
● Hierarchical phrase-based translation
● …
Shortcomings of SMT
● Models can grow to be overly complex
● Requires a lot of hand-designed feature engineering in order to be effective
● Difficult to manage out-of-vocabulary words
● Components are independently trained; no synergy between models vs. end-to-end training
● Memory-intensive, as the decoder relies on huge lookup tables → not mobile-tech friendly
➔ Let's look into how Neural Machine Translation (NMT) models solve the above issues
Neural machine translation and sequence-to-sequence architecture
What is Neural Machine Translation?
● Neural Machine Translation (NMT) is a way to do Machine Translation with a single neural network.
● The neural network architecture is called sequence-to-sequence (aka seq2seq) and it involves two RNNs.
The sequence-to-sequence model
Simplest Model
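As a concrete sketch of such an encoder-decoder (a minimal PyTorch version with GRUs and teacher forcing; all sizes, names, and token ids here are illustrative assumptions, not the slide's exact model):

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Encoder-decoder: the encoder's final hidden state is the single
    vector handed to the decoder (the 'bottleneck' discussed later)."""
    def __init__(self, src_vocab, tgt_vocab, emb=64, hid=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hid, batch_first=True)
        self.decoder = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, tgt_vocab)

    def forward(self, src, tgt):
        _, h = self.encoder(self.src_emb(src))           # h: (1, B, hid) source summary
        dec_out, _ = self.decoder(self.tgt_emb(tgt), h)  # teacher forcing on tgt tokens
        return self.out(dec_out)                         # (B, T, tgt_vocab) logits

model = Seq2Seq(src_vocab=1000, tgt_vocab=1000)
src = torch.randint(0, 1000, (2, 7))  # batch of 2 source sentences of length 7
tgt = torch.randint(0, 1000, (2, 5))  # shifted target tokens for teacher forcing
logits = model(src, tgt)              # shape: (2, 5, 1000)
```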
Model Extensions
6. Use more complex hidden units: GRU or LSTM.
Greedy Decoding
● The target sentence is generated by taking the argmax at each step of the decoder.
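A sketch of that loop, reusing the Seq2Seq model and src batch from the sketch above (the bos/eos token ids are illustrative assumptions):

```python
import torch

def greedy_decode(model, src, bos=1, eos=2, max_len=20):
    """Feed back the single most probable token at every step."""
    _, h = model.encoder(model.src_emb(src))  # encode the source once
    tok = torch.tensor([[bos]])
    out = []
    for _ in range(max_len):
        dec, h = model.decoder(model.tgt_emb(tok), h)
        tok = model.out(dec).argmax(-1)       # the argmax at this step
        if tok.item() == eos:
            break
        out.append(tok.item())
    return out

print(greedy_decode(model, src[:1]))  # decode the first source sentence
```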
Greedy Decoding Problems
● Decisions cannot be undone: one wrong argmax early in decoding can derail the rest of the output, and there is no way to backtrack
Beam Search Decoding
● On each step of the decoder, keep track of the k most probable partial translations (hypotheses)
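A sketch of beam search on top of the same Seq2Seq model (k, bos, eos, and the length normalization are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def beam_decode(model, src, k=3, bos=1, eos=2, max_len=20):
    """Keep the k highest-scoring partial hypotheses at every step."""
    _, h0 = model.encoder(model.src_emb(src))
    beams = [([bos], 0.0, h0)]  # (tokens so far, log-probability, decoder state)
    finished = []
    for _ in range(max_len):
        candidates = []
        for toks, score, h in beams:
            inp = model.tgt_emb(torch.tensor([[toks[-1]]]))
            dec, h_new = model.decoder(inp, h)
            logp = F.log_softmax(model.out(dec)[0, -1], dim=-1)
            top_lp, top_ix = logp.topk(k)     # expand each beam by its k best tokens
            for lp, ix in zip(top_lp.tolist(), top_ix.tolist()):
                candidates.append((toks + [ix], score + lp, h_new))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]  # prune to k
        finished += [b for b in beams if b[0][-1] == eos]
        beams = [b for b in beams if b[0][-1] != eos]
        if not beams:
            break
    best = max(finished + beams, key=lambda c: c[1] / len(c[0]))  # length-normalized
    return best[0]
```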
Multilingual NMT
Google’s Multilingual NMT
Google’s Multilingual NMT Architecture
Four big wins of NMT
1. End-to-end training: all parameters are optimized simultaneously for the translation objective
2. Distributed representations share strength: better exploitation of word and phrase similarities
3. Better exploitation of context: the model can condition on much more source and partial-target context
4. More fluent text generation
So is Machine Translation solved?
Zero Shot Word Prediction
Predicting Unseen Words
Pointer Details
Attention please!
Seq2Seq: the bottleneck problem
● The encoder must compress all the information about the source sentence into a single fixed-size vector, which the decoder then conditions on
Attention: thinking process
● How to solve the bottleneck problem?
● How do we deal with a variable-length input sequence?
● Let's do a weighted sum of all encoder hidden states! The weighted sum is the context vector.
● Step 1: dot product
● Step 2: softmax
● Q1: Why dot product?
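A minimal sketch of these two steps in PyTorch (dot-product scores, then softmax, then the weighted sum that forms the context vector); the shapes are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

enc_states = torch.randn(7, 128)  # h_1 ... h_7 for a source sentence of length 7
dec_state = torch.randn(128)      # current decoder hidden state s_t

scores = enc_states @ dec_state     # Step 1: dot product → one score per source word
weights = F.softmax(scores, dim=0)  # Step 2: softmax → attention distribution (sums to 1)
context = weights @ enc_states      # weighted sum of encoder states → context vector
print(context.shape)                # torch.Size([128])
```

On Q1: the dot product is the simplest way to score how well the decoder state matches each encoder state when both live in the same vector space; multiplicative and additive attention variants learn that comparison with extra parameters instead.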
Seq2Seq with Attention
Q3: How does the decoder state participate in the attention mechanism?
Attention: in equations
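A sketch of the standard attention equations with dot-product scoring, using encoder hidden states $h_1, \dots, h_N$ and decoder state $s_t$:

```latex
\begin{aligned}
e^{t} &= [\,s_t^\top h_1, \dots, s_t^\top h_N\,] && \text{attention scores (dot products)} \\
\alpha^{t} &= \operatorname{softmax}(e^{t}) && \text{attention distribution} \\
a_{t} &= \sum_{i=1}^{N} \alpha^{t}_{i} h_{i} && \text{context vector (weighted sum)} \\
\tilde{s}_t &= [\,a_t;\, s_t\,] && \text{concatenated with } s_t \text{ to predict the next word}
\end{aligned}
```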
Attention is great
● Significantly improves NMT performance: the decoder can focus on the relevant parts of the source
● Solves the bottleneck problem: the decoder is no longer limited to a single fixed-size vector
● Helps with the vanishing-gradient problem: provides a shortcut to faraway encoder states
● Provides some interpretability: the attention distribution acts as a soft word alignment
Seq2Seq Applications
Seq2seq and attention applications
● Summarization
● Dialogue systems
● Speech recognition
● Image captioning
Text summarization
“Automatic summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document.”

Example input: volume of transactions at the nigerian stock exchange has continued its decline since last week , a nse official said thursday . the latest statistics showed that a total of ##.### million shares valued at ###.### million naira -lrb- about #.### million us dollars -rrb- were traded on wednesday in #,### deals.

Example summary: transactions dip at nigerian stock exchange
Text summarization (before seq2seq)
Allahyari, Mehdi, et al. "Text summarization techniques: a brief survey."

Extractive topic-representation approaches:
● Latent Semantic Analysis
● Bayesian topic models
○ Latent Dirichlet Allocation
Text summarization (before seq2seq)
Their drawbacks:
● They consider the sentences as independent of each other
● Many of these previous text summarization techniques do not consider the semantics of words
● The soundness and readability of generated summaries are not satisfactory
Text summarization (seq2seq)
Nallapati, Ramesh, et al. "Abstractive text summarization using sequence-to-sequence rnns and beyond."

Abstractive summary: not a mere selection of a few existing passages or sentences extracted from the source

| Neural Machine Translation | Abstractive Text Summarization |
| --- | --- |
| Output depends on source length | Output is short |
| Retains the original content | Compresses the original content |
| Strong notion of one-to-one word-level alignment | Less obvious notion of such one-to-one alignment |
Text summarization (seq2seq)
Feature-rich encoder:
● POS, NER, TF, IDF
Text summarization (seq2seq)
Switch generator/pointer for OOV words:
● Switch is on → the decoder produces a word from its target vocabulary
● Switch is off → the decoder generates a pointer to a word position in the source, which is copied into the summary
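As a sketch of how such a binary switch is commonly modeled (a sigmoid gate computed from the decoder state $s_t$ and attention context $c_t$; the exact parameterization in Nallapati et al. uses more inputs):

```latex
P(\text{switch}_t = 1) = \sigma\left(w_s^\top s_t + w_c^\top c_t + b\right)
```

When the switch is on, the next word is sampled from the vocabulary softmax; when it is off, the pointer position is sampled from the attention distribution over the source.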
Text summarization (seq2seq)
Identify the key sentences:
● 2 levels of importance → 2 BiRNNs (word level and sentence level)
● Attention mechanisms operate at both levels simultaneously
● Word-level attention is re-weighted by the corresponding sentence-level attention:
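Sketched from the paper, the combined attention on word position $i$ is the word-level attention $a^{w}(i)$ scaled by the attention $a^{s}(s(i))$ of its enclosing sentence $s(i)$, renormalized over the source:

```latex
a(i) = \frac{a^{w}(i)\; a^{s}(s(i))}{\sum_{k=1}^{N_w} a^{w}(k)\; a^{s}(s(k))}
```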
Dialogue systems
“Dialogue systems, also known as interactive conversational agents, virtual agents and sometimes chatterbots, are used in a wide set of applications ranging from technical support services to language learning tools and entertainment.”
Dialogue systems (before seq2seq)
Young, Steve, et al. "POMDP-based statistical spoken dialog systems: A review."

Goal-driven system:
● At each turn t, spoken language understanding (SLU) converts the input to a semantic representation u_t
● The system updates its internal state and determines the next action a_t, which is then converted to output speech
Dialogue systems (before seq2seq)
Their drawbacks:
● Use handcrafted features for the state and action space representations
● Require a large corpus of annotated task-specific simulated conversations
○ → Time-consuming to deploy
Dialogue systems (seq2seq)
Serban, Iulian Vlad, et al. "Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models."
● End-to-end trainable, non-goal-driven systems based on generative probabilistic models
Dialogue systems (seq2seq)
Hierarchical Recurrent Encoder-Decoder (HRED):
● The encoder RNN maps each utterance (via its last token) to an utterance vector u_n
● A higher-level context RNN keeps track of the dialogue: c_n = f(c_{n-1}, u_n)
Speech recognition
From RNNs to speech recognition:
● A plain RNN requires pre-segmented input data
● RNNs can only be trained to make a series of independent label classifications
Speech recognition
Prabhavalkar, Rohit, et al. "A comparison of sequence-to-sequence models for speech recognition."
Image captioning
Xu, Kelvin, et al. "Show, attend and tell: Neural image caption generation with visual attention."
Image captioning
● Location variable s_t: where to put attention when generating the t-th word
http://kelvinxu.github.io/projects/capgen.html
Thank You!