Attention Is All You Need - Arvutiteaduse instituut · 2017-09-28
Attention Is All You Need
Presented by: Aqeel Labash
2017 - By: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit,
Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin
From: Google Brain, Google Research
Deep learning: algorithms inspired by the structure and function of the brain
http://neuralnetworksanddeeplearning.com
Machine translation is a black box
http://blog.webcertain.com/the-dangers-of-machine-translation/09/07/2014/
Taking a look under the hood
http://matthewbrannelly.com.au/
But first: how does machine translation work?
Word embedding is about representing words as meaningful numerical vectors
https://www.doc.ic.ac.uk/~js4416/163/website/nlp/
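The idea above can be sketched as a lookup table; the tiny vocabulary, the 4-dimensional vectors, and the random values below are purely illustrative (real models use tens of thousands of words and hundreds of dimensions):

```python
import numpy as np

# Hypothetical 3-word vocabulary with 4-dimensional embeddings.
vocab = {"the": 0, "cat": 1, "sat": 2}
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), 4))

def embed(tokens):
    # Each token is mapped to its row in the embedding table.
    return embedding_table[[vocab[t] for t in tokens]]

vectors = embed(["the", "cat", "sat"])
print(vectors.shape)  # (3, 4): one dense vector per token
```

In a trained model these rows are learned so that words with similar usage end up with nearby vectors.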
Problem: an RNN is constrained by the previous timestep's computation
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Goal: improve performance and get rid of sequential computation
Transformer
https://www.universalstudioshollywood.com/things-to-do/rides-and-attractions/transformers-the-ride-3d/
Self-Attention: focus on the important parts.
https://deepage.net/deep_learning/2017/03/03/attention-augmented-recurrent-neural-networks.html#attentionインターフェース
Model: Encoder
https://arxiv.org/abs/1706.03762
● N = 6 identical layers
● All layers output size 512
● Embedding
● Positional encoding
● Notice the residual connections
● Multi-head attention
● LayerNorm(x + Sublayer(x))
● Position-wise feed-forward
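Since the model has no recurrence, word order is injected through the paper's sinusoidal positional encodings, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). A minimal NumPy sketch (assumes d_model is even; the sequence length 10 is illustrative):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # PE(pos, 2i)   = sin(pos / 10000^(2i/d_model))
    # PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]       # (1, d_model/2)
    angles = pos / np.power(10000, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)               # even dimensions
    pe[:, 1::2] = np.cos(angles)               # odd dimensions
    return pe

pe = positional_encoding(10, 512)
print(pe.shape)  # (10, 512); this is added to the embeddings
```

Each position gets a unique pattern, and relative offsets correspond to fixed linear relationships between the encodings, which the paper hypothesizes helps the model attend by relative position.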
Model: Decoder
https://arxiv.org/abs/1706.03762
● N = 6 identical layers
● All layers output size 512
● Embedding
● Positional encoding
● Notice the residual connections
● Multi-head attention
● LayerNorm(x + Sublayer(x))
● Position-wise feed-forward
● Softmax
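The LayerNorm(x + Sublayer(x)) pattern listed above wraps every sublayer in both the encoder and the decoder. A minimal sketch; the lambda stands in for a real sublayer (attention or the feed-forward network), and the input shape is illustrative:

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each position's feature vector to zero mean and unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def sublayer_connection(x, sublayer):
    # Residual connection followed by layer norm: LayerNorm(x + Sublayer(x)).
    return layer_norm(x + sublayer(x))

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 512))                     # 5 positions, d_model = 512
out = sublayer_connection(x, lambda h: 0.5 * h)   # stand-in sublayer
print(out.shape)  # (5, 512)
```

The residual path keeps gradients flowing through the stack of 6 layers; the normalization keeps activations on a stable scale.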
Model: Complete
https://arxiv.org/abs/1706.03762
https://www.reddit.com/r/MachineLearning/comments/6kc7py/d_where_does_the_query_keys_and_values_come_from/
https://arxiv.org/abs/1706.03762
Q,K,V
● In the "encoder-decoder attention" layers, the queries (Q) come from the previous decoder layer, and the memory keys (K) and values (V) come from the output of the encoder.
● Otherwise (self-attention): all three come from the output of the previous layer (the hidden state).
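However Q, K, and V are sourced, they feed the paper's scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. A minimal single-head NumPy sketch (the shapes 3 queries × 5 keys and d_k = 64 are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # (queries, keys)
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 64))   # 3 query positions
K = rng.normal(size=(5, 64))   # 5 key positions
V = rng.normal(size=(5, 64))   # one value per key
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.shape)  # (3, 64) (3, 5)
```

Each output row is a weighted mix of the value vectors; the √d_k scaling keeps the dot products from pushing the softmax into regions with vanishing gradients. Multi-head attention runs h = 8 such heads on learned projections of Q, K, V and concatenates the results.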
https://arxiv.org/abs/1706.03762
Complexity
Complexity
● Per layer: self-attention costs O(n²·d) vs. O(n·d²) for recurrence, with O(1) sequential operations vs. O(n) (n = sequence length, d = representation dimension)
https://arxiv.org/abs/1706.03762
Position-wise Feed-Forward network
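Per the paper, this network is FFN(x) = max(0, xW₁ + b₁)W₂ + b₂, applied identically and independently at every position, with inner dimension d_ff = 2048. A sketch with randomly initialized (untrained) weights:

```python
import numpy as np

d_model, d_ff = 512, 2048  # dimensions from the paper
rng = np.random.default_rng(0)
W1, b1 = 0.02 * rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = 0.02 * rng.normal(size=(d_ff, d_model)), np.zeros(d_model)

def ffn(x):
    # FFN(x) = max(0, x W1 + b1) W2 + b2: a ReLU between two linear layers,
    # applied to each position's vector independently.
    return np.maximum(0, x @ W1 + b1) @ W2 + b2

x = rng.normal(size=(10, d_model))  # 10 positions
y = ffn(x)
print(y.shape)  # (10, 512)
```

Because the same weights are applied at every position, this is equivalent to two 1×1 convolutions over the sequence.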
https://stats.stackexchange.com/questions/201569/difference-between-dropout-and-dropconnect/201891
Dropout
● Applied to the output of each sublayer, before it is added to the residual connection and normalized
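The paper uses P_drop = 0.1 for the base model. A minimal inverted-dropout sketch of the residual pattern x + Dropout(Sublayer(x)); the rng, shapes, and toy sublayer are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.1, train=True):
    # Inverted dropout: zero each element with probability p and rescale
    # the survivors by 1/(1-p), so expected activations match at test time.
    if not train:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

# Residual connection with dropout on the sublayer output.
x = np.ones((4, 8))
out = x + dropout(0.5 * x, p=0.1)  # 0.5*x stands in for a sublayer
print(out.shape)  # (4, 8)
```

At inference time (`train=False`) the function is the identity, so no rescaling is needed.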
http://www.prioritiesusa.org/recommendations-for-sports-training/
Training
● Datasets:
○ WMT 2014 English-German: 4.5 million sentence pairs, ~37K-token vocabulary.
○ WMT 2014 English-French: 36M sentences, 32K-token vocabulary.
● Hardware:
○ 8 NVIDIA P100 GPUs (base model: 12 hours; big model: 3.5 days)
Optimizing the parameters
https://arxiv.org/abs/1706.03762
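The paper trains with Adam (β₁ = 0.9, β₂ = 0.98, ε = 10⁻⁹) under a schedule that increases the learning rate linearly for 4000 warmup steps, then decays it with the inverse square root of the step number:

```python
def lrate(step, d_model=512, warmup=4000):
    # lrate = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

# The rate peaks at step `warmup`, then decays as 1/sqrt(step).
print(lrate(4000) > lrate(1))       # True: still warming up at step 1
print(lrate(4000) > lrate(100000))  # True: decayed well past the peak
```

The warmup avoids large, destabilizing updates while the layer statistics are still settling; the d_model^-0.5 factor scales the whole schedule to the model size.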
Results
https://arxiv.org/abs/1706.03762
How is this paper linked to neuroscience?
https://www.astrazeneca.com/our-focus-areas/neuroscience.html
http://blog.webcertain.com/machine-translation-technology-the-search-engine-takeover/18/02/2015/
Thanks for listening!