![Page 1: T5 and large language models: The good, the bad, and the uglyweb.stanford.edu/class/cs224n/slides/cs224n-2021-lecture... · 2021. 4. 15. · is bad and I was bored the entire time.](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/1.jpg)
T5 and large language models: The good, the bad, and the ugly
Colin Raffel
![Page 2](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/2.jpg)
Which transfer learning methods work best, and what happens when we scale them up?
What about non-English pre-trained models?
How much knowledge does the model learn during pre-training?
Does the model memorize data during pre-training?
Which Transformer modifications work best?
![Page 3](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/3.jpg)
The cabs ____ the same rates as those ____ by horse-drawn cabs and were ____ quite popular, ____ the Prince of Wales (the ____ King Edward VII) travelled in ____. The cabs quickly ____ known as "hummingbirds" for ____ noise made by their motors and their distinctive black and ____ livery. Passengers ____ ____ the interior fittings were ____ when compared to ____ cabs but there ____ some complaints ____ the ____ lighting made them too ____ to those outside ____.
charged, used, initially, even, future, became, the, yellow, reported, that, luxurious, horse-drawn, were, that, internal, conspicuous, cab
Unsupervised pre-training
This movie is terrible! The acting is bad and I was bored the entire time. There was no plot and nothing interesting happened. I was really surprised since I had very high expectations. I want 103 minutes of my life back!
negative
Supervised fine-tuning
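The unsupervised pre-training step above turns raw text into fill-in-the-blank exercises, so no labels are needed. Below is a minimal Python sketch of that idea. It assumes word-level blanking of a randomly chosen 15% of words, and the helper names `make_blank_example` and `fill_blanks` are hypothetical. T5 itself uses a span-corruption variant with sentinel tokens, so treat this as an illustration of the general objective, not as T5's preprocessing.

```python
import random


def make_blank_example(text: str, rate: float = 0.15, seed: int = 0):
    """Replace a random subset of words with "____" (illustrative assumption).

    Returns (corrupted_text, targets), where targets lists the removed
    words in the order their blanks appear in the corrupted text.
    """
    words = text.split()
    rng = random.Random(seed)  # seeded so the example is reproducible
    n_blanks = max(1, round(rate * len(words)))
    positions = sorted(rng.sample(range(len(words)), n_blanks))
    targets = [words[i] for i in positions]
    for i in positions:
        words[i] = "____"
    return " ".join(words), targets


def fill_blanks(corrupted: str, targets: list[str]) -> str:
    """Invert make_blank_example by filling the blanks left to right."""
    filled = iter(targets)
    return " ".join(next(filled) if w == "____" else w for w in corrupted.split())


if __name__ == "__main__":
    sentence = "The cabs quickly became known as hummingbirds for the noise made by their motors"
    corrupted, targets = make_blank_example(sentence)
    print(corrupted)  # the model's input
    print(targets)    # what the model is trained to predict
    assert fill_blanks(corrupted, targets) == sentence
```

The pre-trained model is then fine-tuned on labeled pairs, such as the movie review above paired with the label "negative".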
![Page 4](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/4.jpg)
SQuAD Exact Match score (validation set)
Source: https://paperswithcode.com/sota/question-answering-on-squad11-dev
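The Exact Match (EM) score plotted here counts a prediction as correct only if it equals one of the reference answers after light text normalization. The sketch below assumes the normalization steps of the official SQuAD v1.1 evaluation script: lowercase, strip punctuation, drop the articles "a", "an" and "the", then collapse whitespace. Check it against that script before relying on any numbers. The function names are illustrative.

```python
import re
import string

_PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """Lowercase, remove punctuation and articles, and collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, ground_truths: list[str]) -> float:
    """1.0 if the normalized prediction equals any normalized reference, else 0.0."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(gt) for gt in ground_truths))


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Average exact match over a dataset, reported as a percentage."""
    scores = [exact_match(p, refs) for p, refs in zip(predictions, references)]
    if not scores:
        return 0.0
    return 100.0 * sum(scores) / len(scores)


print(exact_match("The Eiffel Tower!", ["eiffel tower"]))  # 1.0
```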
![Page 5](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/5.jpg)
Source: https://paperswithcode.com/sota/question-answering-on-squad11-dev
Transfer learning
No transfer learning
![Page 6](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/6.jpg)
Source: https://paperswithcode.com/sota/question-answering-on-squad11-dev
BERT
T5
![Page 7](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/7.jpg)
Timeline, 2014 to 2020: word2vec, ELMo, ULMFiT, BERT, MASS, GPT-1, Semi-Supervised Sequence Learning, StructBERT, FreeLB, ALBERT, SpanBERT, RoBERTa, XLNet, MT-DNN, BERT on STILTs, Unsupervised sentiment neuron
![Page 8](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/8.jpg)
Timeline, 2014 to 2020: word2vec, ELMo, ULMFiT, Semi-Supervised Sequence Learning, Unsupervised sentiment neuron, with the remaining methods bracketed as "Lots of stuff!"
![Page 9](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/9.jpg)
- Paper A proposes an unsupervised pre-training technique called "FancyLearn".
- Paper B proposes another pre-training technique called "FancierLearn" and achieves better results.
- Paper A uses Wikipedia for unlabeled data.
- Paper B uses Wikipedia and the Toronto Books Corpus.
- Is FancierLearn better than FancyLearn?
![Page 10](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/10.jpg)
- Paper A proposes an unsupervised pre-training technique called "FancyLearn".
- Paper B proposes another pre-training technique called "FancierLearn" and achieves better results.
- Paper A uses a model with 100 million parameters.
- Paper B uses a model with 200 million parameters.
- Is FancierLearn better than FancyLearn?
![Page 11](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/11.jpg)
- Paper A proposes an unsupervised pre-training technique called "FancyLearn".
- Paper B proposes another pre-training technique called "FancierLearn" and achieves better results.
- Paper A pre-trains on 100 billion tokens of unlabeled data.
- Paper B pre-trains on 200 billion tokens of unlabeled data.
- Is FancierLearn better than FancyLearn?
![Page 12](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/12.jpg)
- Paper A proposes an unsupervised pre-training technique called "FancyLearn".
- Paper B proposes another pre-training technique called "FancierLearn" and achieves better results.
- Paper A uses the Adam optimizer.
- Paper B uses SGD with momentum.
- Is FancierLearn better than FancyLearn?
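The question in these four comparisons can be made mechanical. Before crediting a gain to the new method, list every other setting that differs between the two papers. The sketch below uses a hypothetical `Setup` record whose fields (pre-training data, parameter count, pre-training tokens, optimizer) mirror the four scenarios above. For illustration it combines all four differences into one pair of papers, so the field names and values are assumptions.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Setup:
    """Hypothetical summary of one paper's experimental setup."""
    method: str
    pretraining_data: str
    num_parameters: int
    pretraining_tokens: int
    optimizer: str


def confounders(a: Setup, b: Setup) -> list[str]:
    """Names of the settings, other than the method itself, that differ between a and b."""
    return [
        f.name
        for f in fields(Setup)
        if f.name != "method" and getattr(a, f.name) != getattr(b, f.name)
    ]


paper_a = Setup("FancyLearn", "Wikipedia", 100_000_000, 100_000_000_000, "Adam")
paper_b = Setup("FancierLearn", "Wikipedia + Toronto Books Corpus",
                200_000_000, 200_000_000_000, "SGD with momentum")

# Each listed field is an alternative explanation for Paper B's better results.
# Only an empty list means the comparison isolates the method.
print(confounders(paper_a, paper_b))
# ['pretraining_data', 'num_parameters', 'pretraining_tokens', 'optimizer']
```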
![Page 13](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/13.jpg)
Given the current landscape of transfer learning for NLP, what works best? And how far can we push the tools we already have?
![Page 14](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/14.jpg)
T5
Text-to-Text Transfer Transformer
![Page 15](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/15.jpg)
"translate English to German: That is good." T5 "Das ist gut."
![Page 16](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/16.jpg)
"cola sentence: The course is jumping well." T5 "not acceptable"
![Page 17](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/17.jpg)
"stsb sentence1: The rhino grazed on the grass. sentence2: A rhino
is grazing in a field." T5 "3.8"
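STS-B is a regression task: sentence pairs are scored for similarity on a 0 to 5 scale. Yet the target here is the string "3.8". The T5 paper handles this by rounding each score to the nearest increment of 0.2 and emitting it as literal text. A minimal sketch of that conversion, plus parsing generated text back into a number, follows. The helper names are hypothetical, and returning `None` for non-numeric outputs is an assumption about how to flag them.

```python
from typing import Optional


def stsb_target(score: float) -> str:
    """Round a similarity score to the nearest 0.2 and render it as text."""
    # Scale so that multiples of 0.2 become integers, round, then scale back.
    # Note: Python's round() resolves exact ties to the even neighbor.
    rounded = round(score * 5) / 5
    return f"{rounded:.1f}"


def parse_stsb_output(text: str) -> Optional[float]:
    """Convert generated text back to a score, or None if it is not a number."""
    try:
        return float(text.strip())
    except ValueError:
        return None


print(stsb_target(2.57))  # 2.6
```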
![Page 18](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/18.jpg)
"summarize: state authorities dispatched emergency crews tuesday to survey the damage after an onslaught of severe weather in mississippi…"
T5 "six people hospitalized after a storm in attala county."
![Page 19](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/19.jpg)
"translate English to German: That is good."
"cola sentence: The course is jumping well."
"summarize: state authorities dispatched emergency crews tuesday to survey the damage after an onslaught of severe weather in mississippi…"
"stsb sentence1: The rhino grazed on the grass. sentence2: A rhino
is grazing in a field."T5
"Das ist gut."
"not acceptable"
"six people hospitalized after a storm in attala county."
"3.8"
![Page 20](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/20.jpg)
Source: http://jalammar.github.io/illustrated-transformer/
![Page 21](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/21.jpg)
== treaty of paris (1763)
the treaty of paris, also known as the treaty of 1763, was signed on 10 february 1763 by the kingdoms of great britain, france and spain, with portugal in agreement, after great britain's victory over france and spain during the seven years' war.
the signing of the treaty formally ended the seven years' war, known as the french and indian war in the north american theatre,[1] and marked the beginning of an era of british dominance outside europe.[2] great britain and france each returned much of the territory that they had captured during the war, but great britain gained much of france's possessions in north america. additionally, great britain agreed to protect roman catholicism in the new world...
== wheelbarrow
a wheelbarrow is a small hand-propelled vehicle, usually with just one wheel, designed to be pushed and guided by a single person using two handles at the rear, or by a sail to push the ancient wheelbarrow by wind. the term "wheelbarrow" is made of two words: "wheel" and "barrow." "barrow" is a derivation of the old english "bearwe" which was a device used for carrying loads.
the wheelbarrow is designed to distribute the weight of its load between the wheel and the operator, so enabling the convenient carriage of heavier and bulkier loads than would be possible were the weight carried entirely by the operator. as such it is a second-class lever...
== lemon
the lemon, citrus limon (l.) osbeck, is a species of small evergreen tree in the flowering plant family rutaceae, native to south asia, primarily north eastern india.
the tree's ellipsoidal yellow fruit is used for culinary and non-culinary purposes throughout the world, primarily for its juice, which has both culinary and cleaning uses.[2] the pulp and rind (zest) are also used in cooking and baking. the juice of the lemon is about 5% to 6% citric acid, with a ph of around 2.2, giving it a sour taste. the distinctive sour taste of lemon juice makes it a key ingredient in drinks and foods such as lemonade and lemon meringue pie...
== lemon
the lemon, citrus limon (l.) osbeck, is a species of small evergreen tree in the flowering plant family rutaceae, native to south asia, primarily north eastern india.
the tree's ellipsoidal yellow fruit is used for culinary and non-culinary purposes throughout the world, primarily for its juice, which has both culinary and cleaning uses.[2] the pulp and rind (zest) are also used in cooking and baking. the juice of the lemon is about 5% to 6% citric acid, with a ph of around 2.2, giving it a sour taste. the distinctive sour taste of lemon juice makes it a key ingredient in drinks and foods such as lemonade and lemon meringue pie...
== treaty of paris (1763)
the treaty of paris, also known as the treaty of 1763, was signed on 10 february 1763 by the kingdoms of great britain, france and spain, with portugal in agreement, after great britain's victory over france and spain during the seven years' war.
the signing of the treaty formally ended the seven years' war, known as the french and indian war in the north american theatre,[1] and marked the beginning of an era of british dominance outside europe.[2] great britain and france each returned much of the territory that they had captured during the war, but great britain gained much of france's possessions in north america. additionally, great britain agreed to protect roman catholicism in the new world...
== treaty of paris (1763)
the treaty of paris, also known as the treaty of 1763, was signed on 10 february 1763 by the kingdoms of great britain, france and spain, with portugal in agreement, after great britain's victory over france and spain during the seven years' war.
the signing of the treaty formally ended the seven years' war, known as the french and indian war in the north american theatre,[1] and marked the beginning of an era of british dominance outside europe.[2] great britain and france each returned much of the territory that they had captured during the war, but great britain gained much of france's possessions in north america. additionally, great britain agreed to protect roman catholicism in the new world...
== lemon
the lemon, citrus limon (l.) osbeck, is a species of small evergreen tree in the flowering plant family rutaceae, native to south asia, primarily north eastern india.
the tree's ellipsoidal yellow fruit is used for culinary and non-culinary purposes throughout the world, primarily for its juice, which has both culinary and cleaning uses.[2] the pulp and rind (zest) are also used in cooking and baking. the juice of the lemon is about 5% to 6% citric acid, with a ph of around 2.2, giving it a sour taste. the distinctive sour taste of lemon juice makes it a key ingredient in drinks and foods such as lemonade and lemon meringue pie...
== wheelbarrow
a wheelbarrow is a small hand-propelled vehicle, usually with just one wheel, designed to be pushed and guided by a single person using two handles at the rear, or by a sail to push the ancient wheelbarrow by wind. the term "wheelbarrow" is made of two words: "wheel" and "barrow." "barrow" is a derivation of the old english "bearwe" which was a device used for carrying loads.
the wheelbarrow is designed to distribute the weight of its load between the wheel and the operator, so enabling the convenient carriage of heavier and bulkier loads than would be possible were the weight carried entirely by the operator. as such it is a second-class lever...
== treaty of paris (1763)
the treaty of paris, also known as the treaty of 1763, was signed on 10 february 1763 by the kingdoms of great britain, france and spain, with portugal in agreement, after great britain's victory over france and spain during the seven years' war.
the signing of the treaty formally ended the seven years' war, known as the french and indian war in the north american theatre,[1] and marked the beginning of an era of british dominance outside europe.[2] great britain and france each returned much of the territory that they had captured during the war, but great britain gained much of france's possessions in north america. additionally, great britain agreed to protect roman catholicism in the new world...
== wheelbarrow
a wheelbarrow is a small hand-propelled vehicle, usually with just one wheel, designed to be pushed and guided by a single person using two handles at the rear, or by a sail to push the ancient wheelbarrow by wind. the term "wheelbarrow" is made of two words: "wheel" and "barrow." "barrow" is a derivation of the old english "bearwe" which was a device used for carrying loads.
the wheelbarrow is designed to distribute the weight of its load between the wheel and the operator, so enabling the convenient carriage of heavier and bulkier loads than would be possible were the weight carried entirely by the operator. as such it is a second-class lever...
== treaty of paris (1763)
the treaty of paris, also known as the treaty of 1763, was signed on 10 february 1763 by the kingdoms of great britain, france and spain, with portugal in agreement, after great britain's victory over france and spain during the seven years' war.
the signing of the treaty formally ended the seven years' war, known as the french and indian war in the north american theatre,[1] and marked the beginning of an era of british dominance outside europe.[2] great britain and france each returned much of the territory that they had captured during the war, but great britain gained much of france's possessions in north america. additionally, great britain agreed to protect roman catholicism in the new world...
== lemon
the lemon, citrus limon (l.) osbeck, is a species of small evergreen tree in the flowering plant family rutaceae, native to south asia, primarily north eastern india.
the tree's ellipsoidal yellow fruit is used for culinary and non-culinary purposes throughout the world, primarily for its juice, which has both culinary and cleaning uses.[2] the pulp and rind (zest) are also used in cooking and baking. the juice of the lemon is about 5% to 6% citric acid, with a ph of around 2.2, giving it a sour taste. the distinctive sour taste of lemon juice makes it a key ingredient in drinks and foods such as lemonade and lemon meringue pie...
== lemon
the lemon, citrus limon (l.) osbeck, is a species of small evergreen tree in the flowering plant family rutaceae, native to south asia, primarily north eastern india.
the tree's ellipsoidal yellow fruit is used for culinary and non-culinary purposes throughout the world, primarily for its juice, which has both culinary and cleaning uses.[2] the pulp and rind (zest) are also used in cooking and baking. the juice of the lemon is about 5% to 6% citric acid, with a ph of around 2.2, giving it a sour taste. the distinctive sour taste of lemon juice makes it a key ingredient in drinks and foods such as lemonade and lemon meringue pie...
== treaty of paris (1763)
the treaty of paris, also known as the treaty of 1763, was signed on 10 february 1763 by the kingdoms of great britain, france and spain, with portugal in agreement, after great britain's victory over france and spain during the seven years' war.
the signing of the treaty formally ended the seven years' war, known as the french and indian war in the north american theatre,[1] and marked the beginning of an era of british dominance outside europe.[2] great britain and france each returned much of the territory that they had captured during the war, but great britain gained much of france's possessions in north america. additionally, great britain agreed to protect roman catholicism in the new world...
== lemon
the lemon, citrus limon (l.) osbeck, is a species of small evergreen tree in the flowering plant family rutaceae, native to south asia, primarily north eastern india.
the tree's ellipsoidal yellow fruit is used for culinary and non-culinary purposes throughout the world, primarily for its juice, which has both culinary and cleaning uses.[2] the pulp and rind (zest) are also used in cooking and baking. the juice of the lemon is about 5% to 6% citric acid, with a ph of around 2.2, giving it a sour taste. the distinctive sour taste of lemon juice makes it a key ingredient in drinks and foods such as lemonade and lemon meringue pie...
== treaty of paris (1763)
the treaty of paris, also known as the treaty of 1763, was signed on 10 february 1763 by the kingdoms of great britain, france and spain, with portugal in agreement, after great britain's victory over france and spain during the seven years' war.
the signing of the treaty formally ended the seven years' war, known as the french and indian war in the north american theatre,[1] and marked the beginning of an era of british dominance outside europe.[2] great britain and france each returned much of the territory that they had captured during the war, but great britain gained much of france's possessions in north america. additionally, great britain agreed to protect roman catholicism in the new world...
== wheelbarrow
a wheelbarrow is a small hand-propelled vehicle, usually with just one wheel, designed to be pushed and guided by a single person using two handles at the rear, or by a sail to push the ancient wheelbarrow by wind. the term "wheelbarrow" is made of two words: "wheel" and "barrow." "barrow" is a derivation of the old english "bearwe" which was a device used for carrying loads.
the wheelbarrow is designed to distribute the weight of its load between the wheel and the operator, so enabling the convenient carriage of heavier and bulkier loads than would be possible were the weight carried entirely by the operator. as such it is a second-class lever...
== treaty of paris (1763)
the treaty of paris, also known as the treaty of 1763, was signed on 10 february 1763 by the kingdoms of great britain, france and spain, with portugal in agreement, after great britain's victory over france and spain during the seven years' war.
the signing of the treaty formally ended the seven years' war, known as the french and indian war in the north american theatre,[1] and marked the beginning of an era of british dominance outside europe.[2] great britain and france each returned much of the territory that they had captured during the war, but great britain gained much of france's possessions in north america. additionally, great britain agreed to protect roman catholicism in the new world...
== lemon
the lemon, citrus limon (l.) osbeck, is a species of small evergreen tree in the flowering plant family rutaceae, native to south asia, primarily north eastern india.
the tree's ellipsoidal yellow fruit is used for culinary and non-culinary purposes throughout the world, primarily for its juice, which has both culinary and cleaning uses.[2] the pulp and rind (zest) are also used in cooking and baking. the juice of the lemon is about 5% to 6% citric acid, with a ph of around 2.2, giving it a sour taste. the distinctive sour taste of lemon juice makes it a key ingredient in drinks and foods such as lemonade and lemon meringue pie...
== treaty of paris (1763)
the treaty of paris, also known as the treaty of 1763, was signed on 10 february 1763 by the kingdoms of great britain, france and spain, with portugal in agreement, after great britain's victory over france and spain during the seven years' war.
the signing of the treaty formally ended the seven years' war, known as the french and indian war in the north american theatre,[1] and marked the beginning of an era of british dominance outside europe.[2] great britain and france each returned much of the territory that they had captured during the war, but great britain gained much of france's possessions in north america. additionally, great britain agreed to protect roman catholicism in the new world...
== treaty of paris (1763)
the treaty of paris, also known as the treaty of 1763, was signed on 10 february 1763 by the kingdoms of great britain, france and spain, with portugal in agreement, after great britain's victory over france and spain during the seven years' war.
the signing of the treaty formally ended the seven years' war, known as the french and indian war in the north american theatre,[1] and marked the beginning of an era of british dominance outside europe.[2] great britain and france each returned much of the territory that they had captured during the war, but great britain gained much of france's possessions in north america. additionally, great britain agreed to protect roman catholicism in the new world...
== treaty of paris (1763)
the treaty of paris, also known as the treaty of 1763, was signed on 10 february 1763 by the kingdoms of great britain, france and spain, with portugal in agreement, after great britain's victory over france and spain during the seven years' war.
the signing of the treaty formally ended the seven years' war, known as the french and indian war in the north american theatre,[1] and marked the beginning of an era of british dominance outside europe.[2] great britain and france each returned much of the territory that they had captured during the war, but great britain gained much of france's possessions in north america. additionally, great britain agreed to protect roman catholicism in the new world...
== wheelbarrow
a wheelbarrow is a small hand-propelled vehicle, usually with just one wheel, designed to be pushed and guided by a single person using two handles at the rear, or by a sail to push the ancient wheelbarrow by wind. the term "wheelbarrow" is made of two words: "wheel" and "barrow." "barrow" is a derivation of the old english "bearwe" which was a device used for carrying loads.
the wheelbarrow is designed to distribute the weight of its load between the wheel and the operator, so enabling the convenient carriage of heavier and bulkier loads than would be possible were the weight carried entirely by the operator. as such it is a second-class lever...
== wheelbarrow
a wheelbarrow is a small hand-propelled vehicle, usually with just one wheel, designed to be pushed and guided by a single person using two handles at the rear, or by a sail to push the ancient wheelbarrow by wind. the term "wheelbarrow" is made of two words: "wheel" and "barrow." "barrow" is a derivation of the old english "bearwe" which was a device used for carrying loads.
the wheelbarrow is designed to distribute the weight of its load between the wheel and the operator, so enabling the convenient carriage of heavier and bulkier loads than would be possible were the weight carried entirely by the operator. as such it is a second-class lever...
== lemon
the lemon, citrus limon (l.) osbeck, is a species of small evergreen tree in the flowering plant family rutaceae, native to south asia, primarily north eastern india.
the tree's ellipsoidal yellow fruit is used for culinary and non-culinary purposes throughout the world, primarily for its juice, which has both culinary and cleaning uses.[2] the pulp and rind (zest) are also used in cooking and baking. the juice of the lemon is about 5% to 6% citric acid, with a ph of around 2.2, giving it a sour taste. the distinctive sour taste of lemon juice makes it a key ingredient in drinks and foods such as lemonade and lemon meringue pie...
== lemon
the lemon, citrus limon (l.) osbeck, is a species of small evergreen tree in the flowering plant family rutaceae, native to south asia, primarily north eastern india.
the tree's ellipsoidal yellow fruit is used for culinary and non-culinary purposes throughout the world, primarily for its juice, which has both culinary and cleaning uses.[2] the pulp and rind (zest) are also used in cooking and baking. the juice of the lemon is about 5% to 6% citric acid, with a ph of around 2.2, giving it a sour taste. the distinctive sour taste of lemon juice makes it a key ingredient in drinks and foods such as lemonade and lemon meringue pie...
== oklahoma city
oklahoma city (/oʊkləˌhoʊmə -/), often shortened to okc, is the capital and largest city of the u.s. state of oklahoma. the county seat of oklahoma county,[8] the city ranks 27th among united states cities in population. the population grew following the 2010 census, with the population estimated to have increased to 643,648 as of july 2017.[5] as of 2015, the oklahoma city metropolitan area had a population of 1,358,452,[9] and the oklahoma city-shawnee combined statistical area had a population of 1,459,758 residents,[9] making it oklahoma's largest metropolitan area.
oklahoma city's city limits extend into canadian,...
== treaty of paris (1763)
the treaty of paris, also known as the treaty of 1763, was signed on 10 february 1763 by the kingdoms of great britain, france and spain, with portugal in agreement, after great britain's victory over france and spain during the seven years' war.
the signing of the treaty formally ended the seven years' war, known as the french and indian war in the north american theatre,[1] and marked the beginning of an era of british dominance outside europe.[2] great britain and france each returned much of the territory that they had captured during the war, but great britain gained much of france's possessions in north america. additionally, great britain agreed to protect roman catholicism in the new world...
== piano
the piano is an acoustic, stringed musical instrument invented in italy by bartolomeo cristofori around the year 1700 (the exact year is uncertain), in which the strings are struck by hammers. it is played using a keyboard,[1] which is a row of keys (small levers) that the performer presses down or strikes with the fingers and thumbs of both hands to cause the hammers to strike the strings.
the word piano is a shortened form of pianoforte, the italian term for the early 1700s versions of the instrument, which in turn derives from gravicembalo col piano e forte[2] and fortepiano. the italian musical terms piano and forte indicate "soft" and "loud" respectively,[3] in this context referring to the variations in volume ...
== running man (tv series)
running man was classified as an "urban action variety"; a genre of variety shows in an urban environment.[1] the mcs and guests were to complete missions at a landmark to win the race.[2] the show has since shifted to a more familiar reality-variety show concept focused on games. it has garnered attention as being the comeback program for yoo jae-suk, the main mc of the program, after leaving good sunday's family outing in february 2010.[3]
the show has become popular in other parts of asia, and has gained online popularity among hallyu fans, having been fansubbed into various languages, such as english, spanish, portuguese, french, italian, thai, vietnamese, chinese, ...
== wheelbarrow
a wheelbarrow is a small hand-propelled vehicle, usually with just one wheel, designed to be pushed and guided by a single person using two handles at the rear, or by a sail to push the ancient wheelbarrow by wind. the term "wheelbarrow" is made of two words: "wheel" and "barrow." "barrow" is a derivation of the old english "bearwe" which was a device used for carrying loads.
the wheelbarrow is designed to distribute the weight of its load between the wheel and the operator, so enabling the convenient carriage of heavier and bulkier loads than would be possible were the weight carried entirely by the operator. as such it is a second-class lever...
== lemon
the lemon, citrus limon (l.) osbeck, is a species of small evergreen tree in the flowering plant family rutaceae, native to south asia, primarily north eastern india.
the tree's ellipsoidal yellow fruit is used for culinary and non-culinary purposes throughout the world, primarily for its juice, which has both culinary and cleaning uses.[2] the pulp and rind (zest) are also used in cooking and baking. the juice of the lemon is about 5% to 6% citric acid, with a ph of around 2.2, giving it a sour taste. the distinctive sour taste of lemon juice makes it a key ingredient in drinks and foods such as lemonade and lemon meringue pie...
== oklahoma city
oklahoma city (/oʊkləˌhoʊmə -/), often shortened to okc, is the capital and largest city of the u.s. state of oklahoma. the county seat of oklahoma county,[8] the city ranks 27th among united states cities in population. the population grew following the 2010 census, with the population estimated to have increased to 643,648 as of july 2017.[5] as of 2015, the oklahoma city metropolitan area had a population of 1,358,452,[9] and the oklahoma city-shawnee combined statistical area had a population of 1,459,758 residents,[9] making it oklahoma's largest metropolitan area.
oklahoma city's city limits extend into canadian,...
== treaty of paris (1763)
the treaty of paris, also known as the treaty of 1763, was signed on 10 february 1763 by the kingdoms of great britain, france and spain, with portugal in agreement, after great britain's victory over france and spain during the seven years' war.
the signing of the treaty formally ended the seven years' war, known as the french and indian war in the north american theatre,[1] and marked the beginning of an era of british dominance outside europe.[2] great britain and france each returned much of the territory that they had captured during the war, but great britain gained much of france's possessions in north america. additionally, great britain agreed to protect roman catholicism in the new world...
Please enable JavaScript to use our site.
Home Products Shipping Contact FAQ
Dried Lemons, $3.59/pound
Organic dried lemons from our farm in California. Lemons are harvested and sun-dried for maximum flavor. Good in soups and on popcorn.
The lemon, Citrus Limon (l.) Osbeck, is a species of small evergreen tree in the flowering plant family rutaceae. The tree's ellipsoidal yellow fruit is used for culinary and non-culinary purposes throughout the world, primarily for its juice, which has both culinary and cleaning uses. The juice of the lemon is about 5% to 6% citric acid, with a ph of around 2.2, giving it a sour taste.
Menu
Lemon
Introduction
The lemon, Citrus Limon (l.) Osbeck, is a species of small evergreen tree in the flowering plant family rutaceae. The tree's ellipsoidal yellow fruit is used for culinary and non-culinary purposes throughout the world, primarily for its juice, which has both culinary and cleaning uses. The juice of the lemon is about 5% to 6% citric acid, with a ph of around 2.2, giving it a sour taste.
Article
The origin of the lemon is unknown, though lemons are thought to have first grown in Assam (a region in northeast India), northern Burma or China. A genomic study of the lemon indicated it was a hybrid between bitter orange (sour orange) and citron.
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur in tempus quam. In mollis et ante at consectetur. Aliquam erat volutpat. Donec at lacinia est. Duis semper, magna tempor interdum suscipit, ante elit molestie urna, eget efficitur risus nunc ac elit. Fusce quis blandit lectus. Mauris at mauris a turpis tristique lacinia at nec ante. Aenean in scelerisque tellus, a efficitur ipsum. Integer justo enim, ornare vitae sem non, mollis fermentum lectus. Mauris ultrices nisl at libero porta sodales in ac orci.
function Ball(r) { this.radius = r; this.area = pi * r ** 2; this.show = function(){ drawCircle(r); }}
Common Crawl Web Extracted Text
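Raw pages like this one are what the C4 dataset is filtered from. The following is a minimal sketch of a few C4-style heuristics, as we understand them from the T5 paper: drop lines mentioning JavaScript, keep only lines that end in terminal punctuation and contain at least three words, drop whole pages containing "lorem ipsum" or a curly bracket, and drop pages left with fewer than five sentences. The function name `clean_page` and the regex sentence counter are assumptions for illustration; deduplication of repeated three-sentence spans, bad-word filtering, and language identification are not shown.

```python
import re
from typing import Optional

# Characters that may end a "real" sentence line (straight and curly closing quotes included).
TERMINAL_PUNCTUATION = (".", "!", "?", '"', "\u201d")

def clean_page(text: str, min_sentences: int = 5, min_words: int = 3) -> Optional[str]:
    """Apply a few C4-style heuristics; return the cleaned text, or None to drop the page."""
    lowered = text.lower()
    # Page-level filters: placeholder text and curly brackets (a proxy for source code).
    if "lorem ipsum" in lowered or "{" in text:
        return None
    kept = []
    for line in text.splitlines():
        line = line.strip()
        # Line-level filters: JavaScript warnings, menus, and other fragments.
        if "javascript" in line.lower():
            continue
        if not line.endswith(TERMINAL_PUNCTUATION) or len(line.split()) < min_words:
            continue
        kept.append(line)
    cleaned = "\n".join(kept)
    # Rough sentence count: split after runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", cleaned) if s.strip()]
    if len(sentences) < min_sentences:
        return None
    return cleaned
```

Under these rules the lemon page above would be dropped entirely, because it contains both "Lorem ipsum" and a `{` from the stray JavaScript.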
Original text: Thank you for inviting me to your party last week.
Inputs: Thank you <X> me to your party <Y> week.
Targets: <X> for inviting <Y> last <Z>
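The span-corruption example above can be reproduced with a short sketch: each chosen span is replaced by a unique sentinel in the inputs, and the targets list each sentinel followed by the tokens it replaced, ending with one extra sentinel. Here the spans are given by hand as [start, end) word indices and tokens are whitespace-split words; T5 itself works on SentencePiece tokens, and its released vocabulary names the sentinels `<extra_id_0>`, `<extra_id_1>`, and so on rather than `<X>`, `<Y>`, `<Z>`. The function name is illustrative.

```python
from typing import List, Sequence, Tuple

def span_corrupt(
    tokens: Sequence[str],
    spans: List[Tuple[int, int]],
    sentinels: Sequence[str] = ("<X>", "<Y>", "<Z>"),
) -> Tuple[List[str], List[str]]:
    """Replace each [start, end) span with a sentinel and build the matching targets."""
    spans = sorted(spans)
    if len(spans) + 1 > len(sentinels):
        raise ValueError("need one more sentinel than there are spans")
    inputs: List[str] = []
    targets: List[str] = []
    pos = 0
    for sentinel, (start, end) in zip(sentinels, spans):
        if start < pos or end <= start:
            raise ValueError("spans must be non-empty and non-overlapping")
        inputs.extend(tokens[pos:start])   # uncorrupted text before the span
        inputs.append(sentinel)            # the whole span collapses to one sentinel
        targets.append(sentinel)           # targets: sentinel, then the dropped tokens
        targets.extend(tokens[start:end])
        pos = end
    inputs.extend(tokens[pos:])
    targets.append(sentinels[len(spans)])  # final sentinel marks the end of the targets
    return inputs, targets

tokens = "Thank you for inviting me to your party last week.".split()
inputs, targets = span_corrupt(tokens, [(2, 4), (8, 9)])
print(" ".join(inputs))   # Thank you <X> me to your party <Y> week.
print(" ".join(targets))  # <X> for inviting <Y> last <Z>
```

Because the targets contain only the dropped spans, they are much shorter than the full original text, which keeps decoding cheap during pre-training.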
Pretrain
BERT-Base-sized encoder-decoder Transformer
C4 dataset
Denoising objective
2^19 steps, 2^35 or ~34B tokens
Inverse square root learning rate schedule
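A minimal sketch of the inverse square root schedule and the token budget. It assumes the form described in the T5 paper, a learning rate of 1/sqrt(max(n, k)) at step n with k = 10^4 warm-up steps, and batches of 128 sequences of 512 tokens; neither constant appears on the slide, but the batch size is consistent with its totals (2^19 steps × 2^16 tokens per step = 2^35 tokens, and 2^18 fine-tuning steps give 2^34).

```python
import math

def inverse_sqrt_lr(step: int, warmup_steps: int = 10_000) -> float:
    """Constant 1/sqrt(warmup_steps) during warm-up, then decays as 1/sqrt(step)."""
    return 1.0 / math.sqrt(max(step, warmup_steps))

# Budget arithmetic, assuming 128 sequences x 512 tokens per batch.
TOKENS_PER_BATCH = 128 * 512                          # 2**16 = 65,536
PRETRAIN_STEPS = 2 ** 19
FINETUNE_STEPS = 2 ** 18
PRETRAIN_TOKENS = PRETRAIN_STEPS * TOKENS_PER_BATCH   # 2**35, about 34.4 billion
FINETUNE_TOKENS = FINETUNE_STEPS * TOKENS_PER_BATCH   # 2**34, about 17.2 billion

print(inverse_sqrt_lr(0), inverse_sqrt_lr(40_000))    # 0.01 0.005
```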
Finetune
2^18 steps, 2^34 or ~17B tokens
Constant learning rate
GLUE, CNN/DM, SQuAD, SuperGLUE, WMT14 EnDe, WMT15 EnFr, WMT16 EnRo
Evaluate on validation
step 750000, step 760000, step 770000, step 780000
Evaluate all checkpoints, choose the best
Downstream task performance: Setting 1, Setting 2, ...
Star denotes baseline; comparable to BERT; bold = within 1 std. dev. of max
Big training set
Disclaimer
Encoder-decoder: encoder input x1 x2 x3 x4; decoder output y1 y2 .
Language model: input x1 x2 x3 y1 y2; output x2 x3 y1 y2 .
Prefix LM: input x1 x2 x3 y1 y2; output x2 x3 y1 y2 .
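The three architectures differ mainly in which positions each token may attend to. A minimal sketch in plain Python, where `mask[i][j]` is True when position i may attend to position j: a language model uses a causal mask, while a prefix LM lets every position see the whole prefix (here the inputs x1 x2 x3) and stays causal over the rest. An encoder-decoder's encoder uses a fully visible mask over its input and its decoder is causal over the outputs (cross-attention is not shown). The function names are illustrative.

```python
from typing import List

Mask = List[List[bool]]

def fully_visible_mask(n: int) -> Mask:
    """Every position attends to every position (encoder self-attention)."""
    return [[True] * n for _ in range(n)]

def causal_mask(n: int) -> Mask:
    """Position i attends only to positions j <= i (language model, decoder)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def prefix_lm_mask(n: int, prefix_len: int) -> Mask:
    """Fully visible over the first prefix_len positions, causal afterwards."""
    return [[j <= i or j < prefix_len for j in range(n)] for i in range(n)]

# x1 x2 x3 y1 y2 with the three x tokens as the prefix:
for row in prefix_lm_mask(5, 3):
    print("".join("1" if allowed else "." for allowed in row))
```

With `prefix_len = 0` the prefix LM mask reduces to the causal mask, which is why the two share the same input and output sequences on the slide.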
High-level approaches: BERT-style, Deshuffling, Language modeling
Corruption strategies: Mask, Drop, Replace spans
Corruption rate: 10%, 15%, 25%, 50%
Corrupted span length: 2, 3, 5, 10
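Corruption rate and span length are the two knobs of span corruption. A sketch of one way to sample spans under those knobs, similar in spirit to the span sampling in T5's open-source preprocessing: fix the number of corrupted tokens (rate × length) and the number of spans (corrupted tokens ÷ mean span length), then randomly split both the corrupted and the uncorrupted tokens into that many runs and interleave them. The function names and the use of Python's `random` module are assumptions; the resulting [start, end) spans are the kind of spans replaced by sentinels in the `<X>`/`<Y>` example earlier.

```python
import random
from typing import List, Tuple

def _random_segmentation(num_items: int, num_segments: int, rng: random.Random) -> List[int]:
    """Split num_items into num_segments positive lengths by picking distinct cut points."""
    cuts = sorted(rng.sample(range(1, num_items), num_segments - 1))
    bounds = [0] + cuts + [num_items]
    return [b - a for a, b in zip(bounds, bounds[1:])]

def random_spans(
    num_tokens: int,
    corruption_rate: float = 0.15,
    mean_span_length: float = 3.0,
    seed: int = 0,
) -> List[Tuple[int, int]]:
    """Sample sorted, non-overlapping [start, end) spans to corrupt."""
    if mean_span_length < 1:
        raise ValueError("mean_span_length must be at least 1")
    rng = random.Random(seed)
    num_noise = max(1, round(num_tokens * corruption_rate))
    num_spans = max(1, round(num_noise / mean_span_length))
    num_keep = num_tokens - num_noise
    if num_keep < num_spans:
        raise ValueError("sequence too short for these settings")
    noise_lengths = _random_segmentation(num_noise, num_spans, rng)
    keep_lengths = _random_segmentation(num_keep, num_spans, rng)
    spans: List[Tuple[int, int]] = []
    pos = 0
    # Alternate: a run of kept tokens, then a corrupted span.
    for keep, noise in zip(keep_lengths, noise_lengths):
        pos += keep
        spans.append((pos, pos + noise))
        pos += noise
    return spans

print(random_spans(512)[:3])  # first few spans for a 512-token sequence
```

With the defaults, a 512-token sequence gets 77 corrupted tokens spread over 26 spans; every kept run is at least one token long, so adjacent spans never merge.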
[Figure: pre-training dataset variants]
Much better on MultiRC
Much better on ReCoRD
Much worse on CoLA
Order of magnitude smaller
[Figure: mixing weight per task, controlled by temperature (T) and threshold (K)]
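The mixing weights on this slide can be computed as follows, assuming the two strategies described in the T5 paper: examples-proportional mixing caps each task's dataset size at a threshold K before normalizing, and temperature scaling raises each rate to the power 1/T and renormalizes. With T = 1 and no cap this is plain examples-proportional mixing; as T grows, the rates approach equal mixing. The function name and task names are hypothetical.

```python
from typing import Dict

def mixing_rates(
    num_examples: Dict[str, int],
    limit_k: float = float("inf"),
    temperature: float = 1.0,
) -> Dict[str, float]:
    """Examples-proportional mixing with a size cap K, optionally temperature-scaled."""
    capped = {task: min(n, limit_k) for task, n in num_examples.items()}
    total = sum(capped.values())
    rates = {task: c / total for task, c in capped.items()}
    # Temperature scaling: r ** (1/T), then renormalize. T = 1 leaves rates unchanged.
    scaled = {task: r ** (1.0 / temperature) for task, r in rates.items()}
    norm = sum(scaled.values())
    return {task: s / norm for task, s in scaled.items()}

sizes = {"task_a": 900, "task_b": 100}           # hypothetical dataset sizes
print(mixing_rates(sizes))                       # proportional to size
print(mixing_rates(sizes, limit_k=100))          # cap makes the tasks equal
print(mixing_rates(sizes, temperature=2.0))      # flatter than proportional
```

For these sizes, T = 2 moves the large task from 90% to 75% of the mixture.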
[Figure: training strategies combining an unsupervised task with Tasks A, B, and C]
Encoder-decoder architecture
Span prediction objective
C4 dataset
Multi-task pre-training
Bigger models trained longer
Model size variants
Human score = 89.8
Back-translation beats English-only pre-training
https://github.com/google-research/text-to-text-transfer-transformer
What about all of the other languages?
Afrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bulgarian, Burmese, Catalan, Cebuano, Chichewa, Chinese, Corsican, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hmong, Hungarian, Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish, Kyrgyz, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Samoan, Scottish Gaelic, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Sotho, Spanish, Sundanese, Swahili, Swedish, Tajik, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, West Frisian, Xhosa, Yiddish, Yoruba, Zulu.
English: 3B pages, 3T tokens
Yoruba: 50K pages, 50M tokens
Slide from Noah Constant
XNLI zero-shot accuracy

| α | Urdu | Russian |
|---|---|---|
| 0.2 | 73.9 | 81.2 |
| 0.3 | 73.5 | 81.5 |
| 0.7 | 71.7 | 82.8 |
Slide from Noah Constant
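The α in the table controls how often each language is sampled during pre-training. A sketch assuming the mT5-style rule p(L) ∝ |L|^α, where |L| is the amount of data in language L: α = 1 samples in proportion to data size, while smaller α boosts low-resource languages (helping Urdu, at some cost to Russian, in the table). The function name is illustrative. Using the English and Yoruba token counts from the earlier slide, α = 0.3 samples English only about 27 times as often as Yoruba instead of 60,000 times.

```python
from typing import Dict

def language_sampling_probs(token_counts: Dict[str, float], alpha: float = 0.3) -> Dict[str, float]:
    """p(L) proportional to (share of data in L) ** alpha, renormalized over languages."""
    total = sum(token_counts.values())
    weights = {lang: (n / total) ** alpha for lang, n in token_counts.items()}
    norm = sum(weights.values())
    return {lang: w / norm for lang, w in weights.items()}

counts = {"English": 3e12, "Yoruba": 50e6}   # token counts from the slide
probs = language_sampling_probs(counts, alpha=0.3)
print(probs["English"] / probs["Yoruba"])    # about 27, versus 60,000 for alpha = 1
```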
TyDi QA GoldP Performance
Slide from Noah Constant
How much knowledge does a language model pick up during pre-training?
Reading Comprehension
Context: "The lemon tree's ellipsoidal yellow fruit is used for culinary and non-culinary purposes throughout the world, primarily for its juice, which has both culinary and cleaning uses. The pulp and rind are also used in cooking and baking."
Question: "What color is a lemon?"
Model → yellow
Open-Domain Question Answering
Database: "The lemon tree's ellipsoidal yellow fruit is used for culinary and non-culinary purposes throughout the world, primarily for its juice, which has both culinary and cleaning uses. The pulp and rind are also used in cooking and baking."
Question: "What color is a lemon?"
Model → yellow
Closed-Book Question Answering
Question: "What color is a lemon?"
Model → yellow
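In a text-to-text model the three settings differ only in what goes into the input string. A minimal sketch, assuming the `question: ... context: ...` layout T5 uses for SQuAD-style reading comprehension; the closed-book input simply omits the context, and `retrieve` is a hypothetical word-overlap retriever standing in for a real open-domain retrieval system.

```python
import re
from typing import List, Optional, Set

def make_input(question: str, context: Optional[str] = None) -> str:
    """Build one text-to-text input string for a QA example."""
    if context is None:
        # Closed-book: the model must answer from what it stored during pre-training.
        return f"question: {question}"
    # Reading comprehension (given passage) or open-domain (retrieved passage).
    return f"question: {question} context: {context}"

def _words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question: str, database: List[str]) -> str:
    """Hypothetical toy retriever: the passage sharing the most words with the question."""
    query = _words(question)
    return max(database, key=lambda passage: len(query & _words(passage)))

database = [
    "The lemon tree's ellipsoidal yellow fruit is used for culinary purposes.",
    "Wheelbarrows have one wheel.",
]
question = "What color is a lemon?"
print(make_input(question, retrieve(question, database)))  # open-domain
print(make_input(question))                                # closed-book
```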
Pre-training
President Franklin <M> born <M> January 1882. → T5 → D. Roosevelt was <M> in
Our <M> hand-picked and sun-dried <M> orchard in Georgia. → T5 → peaches are <M> at our
Lily couldn't <M>. The waitress had brought the largest <M> of chocolate cake <M> seen. → T5 → believe her eyes <M> piece <M> she had ever
Fine-tuning
When was Franklin D. Roosevelt born? → T5 → 1882
President Franklin D. Roosevelt was born in January 1882.
<M> (born 1957) is a Spanish librarian who has been the director of the National Library of Spain since February 2013. → T5 → Ana Santos Aramburo
SSM (salient span masking) data from "REALM: Retrieval-Augmented Language Model Pre-Training" by Guu et al.
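Salient span masking differs from random span corruption only in which span is masked: a named entity or date rather than arbitrary tokens. A minimal sketch under loud assumptions: a hand-supplied entity list and a four-digit-year regex stand in for a real named-entity tagger and date matcher, and one salient span is chosen at random. Both function names are hypothetical.

```python
import random
import re
from typing import List, Tuple

# Hypothetical stand-in for a date tagger: four-digit years from 1000 to 2099.
YEAR_PATTERN = re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b")

def salient_spans(text: str, entities: List[str]) -> List[Tuple[int, int]]:
    """Character spans of years plus the first occurrence of each known entity."""
    spans = [(m.start(), m.end()) for m in YEAR_PATTERN.finditer(text)]
    for entity in entities:
        start = text.find(entity)
        if start != -1:
            spans.append((start, start + len(entity)))
    return sorted(spans)

def salient_span_mask(
    text: str, entities: List[str], seed: int = 0, mask: str = "<M>"
) -> Tuple[str, str]:
    """Mask one randomly chosen salient span; return (input, target)."""
    spans = salient_spans(text, entities)
    if not spans:
        raise ValueError("no salient span found")
    start, end = random.Random(seed).choice(spans)
    return text[:start] + mask + text[end:], text[start:end]

text = ("Ana Santos Aramburo (born 1957) is a Spanish librarian who has been the "
        "director of the National Library of Spain since February 2013.")
print(salient_span_mask(text, ["Ana Santos Aramburo", "National Library of Spain"]))
```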
[Figure: breakdown of predictions by category: ❌ True negative; ✅ Phrasing mismatch; ✅ Incomplete annotation; 🗑 Unanswerable]
Exact Match: 36.6% → 57.8%!
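Exact match is strict, which is why phrasing mismatches and incomplete annotations count against the model. A sketch of the SQuAD-style normalization commonly used for exact match (lowercase, strip punctuation and the articles a/an/the, collapse whitespace), scoring a prediction as correct if it matches any gold answer; whether this particular evaluation used exactly this normalization is an assumption.

```python
import re
import string
from typing import Iterable

_PUNCTUATION = set(string.punctuation)

def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold_answers: Iterable[str]) -> float:
    """1.0 if the normalized prediction equals any normalized gold answer, else 0.0."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(gold) for gold in gold_answers))

print(exact_match("The yellow.", ["yellow"]))    # 1.0
print(exact_match("January 1882", ["1882"]))     # 0.0: a phrasing mismatch still scores zero
```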
Do large language models memorize their training data?
“... the extent that a work is produced with a machine learning tool that was trained on a large number of copyrighted works, the degree of copying with respect to any given work is likely to be, at most, de minimis.”
– Electronic Frontier Foundation
“Well-constructed AI systems generally do not regenerate, in any nontrivial portion, unaltered data from any particular work in their training corpus.”
– OpenAI
Generate samples: top-n sampling, decaying-temperature sampling, conditioning on Internet text
Rank samples by: perplexity; perplexity vs. a different GPT; perplexity vs. zlib; perplexity vs. lowercased; windowed perplexity
Check: in training set?
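Several of the ranking metrics above can be sketched given per-token log-probabilities from the model under attack (producing those requires the model itself, which is omitted). The zlib comparison divides log-perplexity by the compressed size of the text, so samples the model finds far more predictable than a generic compressor does score low and rank as likely memorized; windowed perplexity takes the minimum over sliding windows so a memorized passage inside otherwise ordinary text still stands out. Function names, the 50-token default window, and measuring compressed size in bytes rather than bits (which does not change the ranking) are assumptions.

```python
import math
import zlib
from typing import List, Sequence

def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp(mean negative log-likelihood); log-probs are natural logs, one per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def zlib_entropy(text: str) -> int:
    """Size in bytes of the zlib-compressed text, a model-free measure of surprise."""
    return len(zlib.compress(text.encode("utf-8")))

def zlib_ratio_score(text: str, token_logprobs: Sequence[float]) -> float:
    """Log-perplexity divided by zlib entropy; low scores suggest possible memorization."""
    return math.log(perplexity(token_logprobs)) / zlib_entropy(text)

def windowed_perplexity(token_logprobs: Sequence[float], window: int = 50) -> float:
    """Lowest perplexity over any sliding window of `window` tokens."""
    logprobs: List[float] = list(token_logprobs)
    if len(logprobs) <= window:
        return perplexity(logprobs)
    return min(
        perplexity(logprobs[i:i + window])
        for i in range(len(logprobs) - window + 1)
    )
```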
Can we close the gap between large and small models by improving the Transformer architecture?
Source: http://jalammar.github.io/illustrated-transformer/
Factorized embeddings
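Factorized embeddings replace the single V × d_model embedding matrix with a small V × E lookup table followed by a shared E × d_model projection (the ALBERT-style factorization). The plain-Python sketch below counts parameters both ways and performs one lookup. The sizes used (V = 32,000, d_model = 1,024, E = 128) and the Gaussian initialization are illustrative assumptions, not the settings from the experiments in the talk.

```python
import random
from typing import List, Optional


def embedding_params(vocab_size: int, d_model: int, factor_dim: Optional[int] = None) -> int:
    """Parameter count of a full (V x d) or factorized (V x E + E x d) embedding."""
    if factor_dim is None:
        return vocab_size * d_model
    return vocab_size * factor_dim + factor_dim * d_model


class FactorizedEmbedding:
    """Look up a low-dimensional vector, then project it up to d_model."""

    def __init__(self, vocab_size: int, factor_dim: int, d_model: int, seed: int = 0):
        rng = random.Random(seed)
        # V x E lookup table (illustrative Gaussian init).
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(factor_dim)] for _ in range(vocab_size)]
        # E x d projection shared by every token.
        self.proj = [[rng.gauss(0.0, 0.02) for _ in range(d_model)] for _ in range(factor_dim)]

    def __call__(self, token_id: int) -> List[float]:
        low = self.table[token_id]
        d_model = len(self.proj[0])
        return [sum(low[k] * self.proj[k][j] for k in range(len(low))) for j in range(d_model)]


print(embedding_params(32_000, 1_024))       # 32768000
print(embedding_params(32_000, 1_024, 128))  # 4227072
```

With these assumed sizes, the factorized version needs roughly 13% of the parameters of the full matrix, at the cost of constraining every token embedding to an E-dimensional subspace.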
![Page 81](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/81.jpg)
Shared embedding and softmax layer
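Sharing the embedding and softmax layer means one V × d matrix serves both as the input lookup table and as the output projection whose dot products with the final hidden state give the vocabulary logits. A minimal plain-Python sketch of that idea follows; the class name and the toy 3 × 2 matrix are hypothetical. Some implementations also rescale the embeddings or logits by a function of d_model, and that detail is omitted here.

```python
import math
from typing import List, Sequence


class TiedEmbedding:
    """One V x d matrix used both to embed input tokens and to produce output logits."""

    def __init__(self, table: List[List[float]]):
        self.table = table  # row v is the embedding of token v

    def embed(self, token_id: int) -> List[float]:
        # Input side: a plain row lookup.
        return list(self.table[token_id])

    def logits(self, hidden: Sequence[float]) -> List[float]:
        # Output side: dot product of the final hidden state with every row,
        # i.e. hidden @ table.T, so no separate softmax weight matrix is needed.
        return [sum(h * e for h, e in zip(hidden, row)) for row in self.table]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


tied = TiedEmbedding([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(tied.embed(2))            # [1.0, 1.0]
print(tied.logits([2.0, 0.0]))  # [2.0, 0.0, 2.0]
```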
![Page 82](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/82.jpg)
Mixture of Softmaxes, Adaptive softmax
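A mixture of softmaxes computes several softmax distributions over the vocabulary and averages them with learned mixture weights, which lets the output distribution escape the rank limit of a single softmax. In the original formulation, each component's logits come from a separate projection of the hidden state. The sketch below takes the component logits and mixture logits as given inputs, a simplifying assumption, and shows only the mixing step. Adaptive softmax is not sketched.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


def mixture_of_softmaxes(component_logits: Sequence[Sequence[float]],
                         prior_logits: Sequence[float]) -> List[float]:
    """Weighted average of K softmax distributions over the vocabulary.

    component_logits: K lists of vocabulary logits (one per mixture component).
    prior_logits: K logits turned into mixture weights by a softmax.
    """
    weights = softmax(prior_logits)
    probs = [0.0] * len(component_logits[0])
    for pi, logits in zip(weights, component_logits):
        for v, p in enumerate(softmax(logits)):
            probs[v] += pi * p
    return probs
```

Because each component is a valid distribution and the weights sum to one, the mixture is also a valid distribution over the vocabulary.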
![Page 83](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/83.jpg)
RMSNorm, ReZero, Fixup
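Two of these normalization alternatives are simple enough to sketch in a few lines of plain Python. RMSNorm rescales a vector by its root-mean-square, with no mean subtraction and no bias, and then applies a learned per-feature gain. ReZero drops normalization and instead scales each residual branch by a learnable scalar initialized to zero. The epsilon placement inside the square root is one common choice and is an assumption here. Fixup is an initialization scheme rather than a layer, so it is not sketched.

```python
import math
from typing import Callable, List, Sequence


def rms_norm(x: Sequence[float], gain: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Divide by the root-mean-square of x (no centering), then apply a per-feature gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]


class ReZeroResidual:
    """Residual block computing x + alpha * F(x), with alpha initialized to 0."""

    def __init__(self, sublayer: Callable[[Sequence[float]], List[float]]):
        self.sublayer = sublayer
        self.alpha = 0.0  # learnable scalar in a real model; starts at zero

    def __call__(self, x: Sequence[float]) -> List[float]:
        fx = self.sublayer(x)
        return [xi + self.alpha * fi for xi, fi in zip(x, fx)]
```

At initialization, a ReZero block is exactly the identity function, so a deep stack starts out well-conditioned. The residual branches only contribute once training moves alpha away from zero.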
![Page 84](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/84.jpg)
Transparent Attention, Lightweight & Dynamic Convolutions, Synthesizer
![Page 85](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/85.jpg)
Nonlinearities, Mixture of Experts, Switch Transformer
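A Switch Transformer layer is a mixture-of-experts feed-forward layer that routes each token to a single expert. The plain-Python sketch below shows only that top-1 routing step for one token vector. The expert functions and router weights in the usage example are hypothetical, and expert capacity limits, the auxiliary load-balancing loss, and batching are deliberately left out.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[Sequence[float]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


def switch_ffn(x: Sequence[float], router: Sequence[Sequence[float]],
               experts: Sequence[Expert]) -> Tuple[List[float], int]:
    """Top-1 ("switch") routing for one token vector x.

    router has one row of weights per expert; only the chosen expert runs,
    and its output is scaled by the router probability for that expert.
    """
    logits = [sum(w * v for w, v in zip(row, x)) for row in router]
    probs = softmax(logits)
    k = max(range(len(probs)), key=probs.__getitem__)  # index of the top expert
    y = experts[k](x)
    return [probs[k] * v for v in y], k


# Hypothetical experts and router, for illustration only.
experts = [lambda x: [2 * v for v in x], lambda x: [-v for v in x]]
router = [[1.0, 0.0], [0.0, 1.0]]
y, chosen = switch_ffn([3.0, 1.0], router, experts)
```

Only one expert runs per token, so the compute per token stays close to that of a single dense feed-forward layer while the total parameter count grows with the number of experts. Scaling the output by the router probability keeps the router weights on the gradient path.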
![Page 86](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/86.jpg)
Funnel Transformer, Evolved Transformer, Universal Transformer, block sharing ...
![Page 87](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/87.jpg)
![Page 88](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/88.jpg)
![Page 89](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/89.jpg)
![Page 90](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/90.jpg)
![Page 91](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/91.jpg)
Plot: SuperGLUE score vs. validation loss
![Page 92](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/92.jpg)
Plot: SuperGLUE score vs. validation loss (labeled points: Transparent Attention, Switch Transformer)
![Page 93](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/93.jpg)
Plot: WQ accuracy vs. validation loss (labeled points: Transparent Attention, Switch Transformer)
![Page 94](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/94.jpg)
- Is our codebase unusual?
- Are our tasks non-standard?
- Do we need to tune hyperparameters?
- Did we implement the modifications correctly?
- Do Transformer modifications not “transfer”?
![Page 95](https://reader036.fdocuments.us/reader036/viewer/2022081623/6145661134130627ed50f336/html5/thumbnails/95.jpg)
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
mT5: A massively multilingual pre-trained text-to-text transformer
How Much Knowledge Can You Pack Into the Parameters of a Language Model?
Extracting Training Data from Large Language Models
Do Transformer Modifications Transfer Across Implementations and Applications?
Work done with Adam Roberts, Aditya Barua, Aditya Siddhant, Alina Oprea, Ariel Herbert-Voss, Dawn Song, Eric Wallace, Florian Tramer, Hyung Won Chung, Jake Marcus, Karishma Malkan, Katherine Lee, Linting Xue, Matthew Jagielski, Michael Matena, Mihir Kale, Nan Ding, Nicholas Carlini, Noah Constant, Noah Fiedel, Noam Shazeer, Peter J. Liu, Rami Al-Rfou, Sharan Narang, Thibault Fevry, Tom Brown, Ulfar Erlingsson, Wei Li, William Fedus, Yanqi Zhou, Yi Tay, and Zhenzhong Lan
Questions?