Sequence Models Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001.
Sequence Models
Introduction to Artificial Intelligence
COS302
Michael L. Littman
Fall 2001
Administration
Exams enjoyed Toronto.
Letter grades for programs:
A: 74-100 (31)
B: 30-60 (20)
C: 10-15 (4)
?: (7)
(0 did not imply “incorrect”)
Shannon Game
Sue swallowed the large green __.
pepper, frog, pea, pill
Not: idea, beige, running, very
“AI Complete” Problem
My mom told me that playing Monopoly® with toddlers was a bad idea, but I thought it would be ok. I was wrong. Billy chewed on the “Get Out of Jail Free Card”. Todd ran away with the little metal dog. Sue swallowed the large green __.
Language Modeling
If we had a way of assigning probabilities to sentences, we could solve this. How?
Pr(Sue swallowed the large green cat.)
Pr(Sue swallowed the large green odd.)
How could such a thing be learned from data?
Why Play This Game?
Being able to assign likelihood to sentences is a useful way of processing language.
• Speech recognition
• Criterion for comparing language models
• Techniques useful for other problems
Statistical Estimation
To use statistical estimation:
• Divide data into equivalence classes
• Estimate parameters for the different classes
Conflicting Interests
Reliability
• Lots of data in each class
• So, small number of classes
Discrimination
• All relevant distinctions made
• So, large number of classes
End Points
Unigram model:
Pr(w | Sue swallowed the large green ___.) = Pr(w)
Exact match model:
Pr(w | Sue swallowed the large green ___.) = Pr(w | Sue swallowed the large green ___.)
What word would these suggest?
N-grams: Compromise
N-grams are simple and powerful.
Bigram model:
Pr(w | Sue swallowed the large green ___.) = Pr(w | green ___)
Trigram model:
Pr(w | Sue swallowed the large green ___.) = Pr(w | large green ___)
Not perfect: misses “swallowed”.
pillow, crystal, caterpillar, iguana, Santa, tigers
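The end points and the n-gram compromise can be tried on a toy corpus. The following is a minimal Python sketch, assuming a hypothetical four-sentence corpus and plain maximum-likelihood counts with no smoothing; `ngram_dist` is an illustrative helper, not something from the slides. With n = 1 (unigram) the history is ignored entirely, while n = 3 (trigram) conditions on “large green”.

```python
from collections import Counter

# Hypothetical toy corpus, invented for illustration only.
CORPUS = [
    "sue swallowed the large green pill",
    "a large green frog jumped",
    "the green light blinked",
    "sue swallowed a pea",
]

def ngram_dist(corpus, history, n):
    """MLE estimate of Pr(w | last n-1 words of history).

    n = 1 ignores the history (unigram model). Assumes len(history) >= n - 1.
    """
    context = tuple(history[len(history) - n + 1:])
    counts = Counter()
    for sentence in corpus:
        words = sentence.split()
        # Count every word whose preceding n-1 words match the context.
        for i in range(n - 1, len(words)):
            if tuple(words[i - n + 1:i]) == context:
                counts[words[i]] += 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

history = "sue swallowed the large green".split()
unigram = ngram_dist(CORPUS, history, 1)
print(max(unigram, key=unigram.get))   # green: the context is ignored
print(ngram_dist(CORPUS, history, 2))  # pill, frog, light at 1/3 each
print(ngram_dist(CORPUS, history, 3))  # pill, frog at 1/2 each
```

The unigram model happily suggests “Sue swallowed the large green green”, which is the reliability end point; larger n discriminates more but, on real data, leaves fewer examples per context.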
Aside: Syntax
Can do better with a little bit of knowledge about grammar:
Pr(w | Sue swallowed the large green ___.) = Pr(w | modified by swallowed, the, green)
pill, dye, one, pineapple, dragon, beans, speck, liquid, solution, drink
Estimating Trigrams
Treat sentences independently. Ok?
A sentence’s probability is then the product of three kinds of factors:
$\Pr(w_1 w_2)$ (the first two words)
$\Pr(w_j \mid w_{j-1} w_{j-2})$ (each later word, given the two before it)
$\Pr(\mathrm{EOS} \mid w_{j-1} w_{j-2})$ (end of sentence, given the last two words)
Simple so far.
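A minimal Python sketch of this product, assuming maximum-likelihood counts on a hypothetical two-sentence corpus. The names `train` and `sentence_prob` are illustrative, and there is no smoothing, so any unseen trigram drives the whole sentence to probability 0.

```python
from collections import Counter

EOS = "EOS"

def train(sentences):
    """Collect the counts behind the three kinds of factors (MLE, no smoothing).

    Assumes every sentence has at least one word.
    """
    starts = Counter()    # (w1, w2) pairs that open a sentence (w2 may be EOS)
    trigrams = Counter()  # (u, v, w) with EOS treated as an ordinary word
    contexts = Counter()  # (u, v) pairs that are followed by some word
    for s in sentences:
        w = s.split() + [EOS]
        starts[(w[0], w[1])] += 1
        for j in range(2, len(w)):
            trigrams[(w[j - 2], w[j - 1], w[j])] += 1
            contexts[(w[j - 2], w[j - 1])] += 1
    return len(sentences), starts, trigrams, contexts

def sentence_prob(model, sentence):
    """Pr(w1 w2) * prod_j Pr(w_j | w_{j-2} w_{j-1}), ending with the EOS factor."""
    n_sentences, starts, trigrams, contexts = model
    w = sentence.split() + [EOS]
    p = starts[(w[0], w[1])] / n_sentences
    for j in range(2, len(w)):  # the last iteration scores EOS
        ctx = contexts[(w[j - 2], w[j - 1])]
        if ctx == 0:
            return 0.0
        p *= trigrams[(w[j - 2], w[j - 1], w[j])] / ctx
    return p

model = train(["the boy saw the dog", "the boy saw a cat"])
print(sentence_prob(model, "the boy saw the dog"))  # 0.5
```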
Sparsity
Pr(w | comes across), in Austen’s works:
as 8/10
a 1/10
more 1/10
the 0/10
Don’t estimate as zeros!
Can use Laplace smoothing, for example, or back off to bigram and unigram estimates.
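Add-one (Laplace) smoothing can be sketched on the “comes across” counts. This is a minimal Python sketch that assumes, purely for illustration, a vocabulary of just these four words; with a real vocabulary of thousands of words, V is large and add-one moves far more probability mass to unseen words. Back-off is not shown.

```python
from fractions import Fraction

# Counts of the word after "comes across" in Austen's works, from the slide (10 occurrences).
COUNTS = {"as": 8, "a": 1, "more": 1, "the": 0}

def mle(counts):
    """Relative frequencies: an unseen word gets probability exactly 0."""
    total = sum(counts.values())
    return {w: Fraction(c, total) for w, c in counts.items()}

def laplace(counts):
    """Add-one smoothing: (c + 1) / (N + V), where V is the vocabulary size.

    Assumes the vocabulary is just the keys of `counts`.
    """
    total, vocab = sum(counts.values()), len(counts)
    return {w: Fraction(c + 1, total + vocab) for w, c in counts.items()}

print(mle(COUNTS)["the"], laplace(COUNTS)["the"])  # 0 1/14
```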
Unreliable Words
Can’t take much stock in words seen only once (hapax legomena). Change them to “UNK”.
Such words are generally a small fraction of the tokens but about half of the types.
The boy saw the dog.
5 tokens, 4 types.
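Counting tokens and types, and mapping hapax legomena to UNK, might look like this minimal Python sketch. The tokenizer (lowercase, drop periods, split on whitespace) is a simplifying assumption; it treats “The” and “the” as the same type, which is what gives 4 types here.

```python
from collections import Counter

def tokenize(text):
    """Lowercase and drop periods before splitting on whitespace (a simplifying assumption)."""
    return text.lower().replace(".", " ").split()

def replace_hapaxes(tokens):
    """Map every word seen exactly once (a hapax legomenon) to UNK."""
    counts = Counter(tokens)
    return [t if counts[t] > 1 else "UNK" for t in tokens]

tokens = tokenize("The boy saw the dog.")
print(len(tokens), len(set(tokens)))  # 5 4
print(replace_hapaxes(tokens))        # ['the', 'UNK', 'UNK', 'the', 'UNK']
```

In a one-sentence “corpus” most words are hapaxes; the pattern of a small fraction of tokens but about half the types only appears on a real corpus.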
Zipf’s Law
Frequency is inversely proportional to rank.
Thus, an extremely long tail!
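How long the tail is can be computed under an idealized assumption: frequencies exactly proportional to 1/rank, over a hypothetical vocabulary of 10,000 types. In that idealization the top 100 types (1% of the vocabulary) cover about 53% of all tokens, so the other 99% of types share the remaining 47%. A minimal Python sketch:

```python
def harmonic(m):
    """H_m = 1 + 1/2 + ... + 1/m."""
    return sum(1.0 / r for r in range(1, m + 1))

def zipf_coverage(top_k, vocab_size):
    """Fraction of tokens covered by the top_k ranks if frequency is exactly C / rank."""
    return harmonic(top_k) / harmonic(vocab_size)

print(round(zipf_coverage(100, 10_000), 2))  # 0.53
```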
Word Frequencies in Tom Sawyer
[Figure: plot of word frequencies in Tom Sawyer]
Using Trigrams
Hand me the ___ knife now .
butter, knife
Counts
N-gram              Count
me the              2832670
me the butter       88
me the knife        638
the knife           154771
the knife knife     72
the butter          92304
the butter knife    559
knife knife         7831
knife knife now     4
butter knife        9046
butter knife now    15
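Markov Model
States are word pairs; each arc is labeled with the natural-log trigram probability of the word that extends the pair.
Hand me → me the: -2.4
me the → the butter: -10.4
me the → the knife: -8.4
the butter → butter knife: -5.1
the knife → knife knife: -7.7
butter knife → knife now: -6.4
knife knife → knife now: -7.6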
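The arc weights can be recomputed from the Counts slide as natural logs of maximum-likelihood trigram probabilities, and the two paths compared. A minimal Python sketch; `arc` and `blank_score` are illustrative names. The -2.4 arc is left out because its counts are not on the Counts slide, and since both paths share it, it cannot change which word wins.

```python
import math

# N-gram counts from the Counts slide.
COUNTS = {
    ("me", "the"): 2832670,
    ("me", "the", "butter"): 88,
    ("me", "the", "knife"): 638,
    ("the", "knife"): 154771,
    ("the", "knife", "knife"): 72,
    ("the", "butter"): 92304,
    ("the", "butter", "knife"): 559,
    ("knife", "knife"): 7831,
    ("knife", "knife", "now"): 4,
    ("butter", "knife"): 9046,
    ("butter", "knife", "now"): 15,
}

def arc(u, v, w):
    """Natural log of the MLE trigram probability Pr(w | u v) = count(u v w) / count(u v)."""
    return math.log(COUNTS[(u, v, w)] / COUNTS[(u, v)])

def blank_score(x):
    """Sum of the three arcs that depend on x in "Hand me the x knife now ."."""
    return arc("me", "the", x) + arc("the", x, "knife") + arc(x, "knife", "now")

for x in ("butter", "knife"):
    print(x, round(blank_score(x), 1))
# butter -21.9
# knife -23.7
```

“Butter” wins even though “knife” is the more likely word right after “me the”: the later arcs out of “the knife” and “knife knife” are much less probable.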
General Scheme
$\Pr(w_j = x \mid w_1 w_2 \ldots \mathrm{EOS}) = \frac{\Pr(w_1 w_2 \ldots x \ldots \mathrm{EOS})}{\sum_{x'} \Pr(w_1 w_2 \ldots x' \ldots \mathrm{EOS})}$
The denominator does not depend on x, so this is maximized by maximizing
$\Pr(w_1 w_2 \ldots x \ldots \mathrm{EOS}) = \Pr(w_1 w_2) \cdots \Pr(x \mid w_{j-1} w_{j-2}) \Pr(w_{j+1} \mid w_{j-1} x) \Pr(w_{j+2} \mid x\, w_{j+1}) \cdots \Pr(\mathrm{EOS} \mid w_{n-1} w_n)$
Only three factors mention x, so it suffices to maximize
$\Pr(x \mid w_{j-1} w_{j-2}) \Pr(w_{j+1} \mid w_{j-1} x) \Pr(w_{j+2} \mid x\, w_{j+1})$
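The three-factor scheme can be sketched in Python with an unsmoothed maximum-likelihood trigram model trained on a hypothetical toy corpus. As an assumption of this sketch, sentences are padded with two start symbols `<s>` instead of using a separate $\Pr(w_1 w_2)$ term; under MLE counts the two give the same estimates. Factors that would run past EOS are dropped, so the blank may sit anywhere. `train`, `log_prob`, and `fill_blank` are illustrative names.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "EOS"

def train(sentences):
    """Unsmoothed MLE trigram counts; each sentence is padded with two BOS symbols and one EOS."""
    trigrams, contexts = Counter(), Counter()
    for s in sentences:
        w = [BOS, BOS] + s.split() + [EOS]
        for i in range(2, len(w)):
            trigrams[(w[i - 2], w[i - 1], w[i])] += 1
            contexts[(w[i - 2], w[i - 1])] += 1
    return trigrams, contexts

def log_prob(model, u, v, w):
    """ln Pr(w | u v), or -inf for an unseen trigram (no smoothing here)."""
    trigrams, contexts = model
    c = trigrams[(u, v, w)]
    return math.log(c / contexts[(u, v)]) if c else float("-inf")

def fill_blank(model, words, j, candidates):
    """Choose x for words[j] maximizing the only three factors that mention x."""
    padded = [BOS, BOS] + words + [EOS]
    k = j + 2  # position of the blank inside `padded`

    def score(x):
        p = padded[:k] + [x] + padded[k + 1:]
        # Factors predicting x, the word after x, and the one after that (stopping at EOS).
        return sum(log_prob(model, p[i - 2], p[i - 1], p[i])
                   for i in range(k, min(k + 3, len(p))))

    return max(candidates, key=score)

model = train(["the cat sat", "the cat sat", "the dog sat", "the dog ran"])
print(fill_blank(model, ["the", "___", "sat"], 1, ["cat", "dog", "ran"]))  # cat
```

Here “cat” scores ln(1/2) and “dog” scores ln(1/4), because “the dog” is followed by “sat” only half the time; “ran” never follows “the” and scores negative infinity.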
Mutual Information
$\log \frac{\Pr(x \text{ and } y)}{\Pr(x)\Pr(y)}$
Measures how far two events are from independent, that is, how much “information” we learn about one from knowing the other. It is 0 when they are independent and positive when they co-occur more often than independence predicts.
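A minimal Python sketch of this score with probabilities estimated as count / total; the counts in the usage lines are hypothetical, chosen so the independent case is easy to check by hand. Natural log is an assumption; any base gives the same sign and ranking.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """log( Pr(x and y) / (Pr(x) Pr(y)) ) with probabilities estimated as count / total.

    Natural log; assumes count_xy > 0 (the score is -infinity otherwise).
    """
    p_xy = count_xy / total
    p_x, p_y = count_x / total, count_y / total
    return math.log(p_xy / (p_x * p_y))

print(pmi(10, 50, 20, 100))  # 0.0: co-occur exactly as often as independence predicts
print(pmi(20, 50, 20, 100))  # ln 2, about 0.693: co-occur twice as often
```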
Mutual Inf. Application
Measure of strength of association between words
levied: imposed vs. believed
Since Pr(levied) is the same for every candidate x, comparing scores reduces to comparing
$\Pr(\text{levied} \mid x) = \Pr(\text{levied}, x) / \Pr(x) = \mathrm{count}(\text{levied and } x) / \mathrm{count}(x)$
“imposed” has the higher score.
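Because $\log \frac{\Pr(\text{levied}, x)}{\Pr(\text{levied})\Pr(x)} = \log \Pr(\text{levied} \mid x) - \log \Pr(\text{levied})$ and the second term is the same for every candidate, the full score and the reduced count ratio rank candidates identically. A minimal Python sketch; the counts below are entirely made up, since the slide reports none.

```python
import math

# HYPOTHETICAL counts, invented for illustration; the slide gives no numbers.
TOTAL = 1_000_000
COUNT = {"levied": 50, "imposed": 400, "believed": 2000}
JOINT = {("levied", "imposed"): 12, ("levied", "believed"): 1}

def pmi(a, b):
    """Full score: log( Pr(a, b) / (Pr(a) Pr(b)) )."""
    return math.log((JOINT[(a, b)] / TOTAL) / ((COUNT[a] / TOTAL) * (COUNT[b] / TOTAL)))

def reduced(a, b):
    """Reduced score: Pr(a | b) = count(a and b) / count(b)."""
    return JOINT[(a, b)] / COUNT[b]

candidates = ["believed", "imposed"]
by_pmi = sorted(candidates, key=lambda x: pmi("levied", x), reverse=True)
by_reduced = sorted(candidates, key=lambda x: reduced("levied", x), reverse=True)
print(by_pmi, by_reduced)  # both orderings put "imposed" first
```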
Analogy Idea
Find a linking word that maximizes a mutual information score.
Tricky to find the right word. Unclear whether any word will have the right effect.
traffic flows through the street
water flows through the riverbed
What to Learn
• Reliability/discrimination tradeoff
• Definition of N-gram models
• How to find the most likely word in an N-gram model
• Mutual Information
Homework 7 (due 11/21)
1. Give a maximization scheme for filling in the two blanks in a sentence like “I hate it when ___ goes ___ on me.” Be somewhat rigorous to make the TA’s job easier.
2. More soon.