Sequence Models: Introduction to Artificial Intelligence, COS302. Michael L. Littman, Fall 2001.


Page 1: Sequence Models

Introduction to Artificial Intelligence
COS302
Michael L. Littman
Fall 2001

Page 2: Administration

Exams enjoyed Toronto.

Letter grades for programs:
A: 74-100 (31)
B: 30-60 (20)
C: 10-15 (4)
?: (7)

(0 did not imply “incorrect”)

Page 3: Shannon Game

Sue swallowed the large green __.

pepper, frog, pea, pill

Not:

idea, beige, running, very

Page 4: “AI Complete” Problem

My mom told me that playing Monopoly® with toddlers was a bad idea, but I thought it would be ok. I was wrong. Billy chewed on the “Get Out of Jail Free Card”. Todd ran away with the little metal dog. Sue swallowed the large green __.

Page 5: Language Modeling

If we had a way of assigning probabilities to sentences, we could solve this. How?

Pr(Sue swallowed the large green cat.)
Pr(Sue swallowed the large green odd.)

How could such a thing be learned from data?

Page 6: Why Play This Game?

Being able to assign likelihood to sentences is a useful way of processing language.

• Speech recognition
• Criterion for comparing language models
• Techniques useful for other problems

Page 7: Statistical Estimation

To use statistical estimation:
• Divide data into equivalence classes
• Estimate parameters for the different classes
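To make the two steps concrete, here is a minimal sketch (not from the lecture) in which the equivalence classes are one-word histories and the parameters are conditional word frequencies; the toy corpus is made up:

    from collections import Counter, defaultdict

    # Step 1: divide the data into equivalence classes. Here a class is
    # "all positions whose previous word is h" (a one-word history).
    corpus = "the cat sat on the mat and the cat slept".split()
    classes = defaultdict(Counter)
    for prev, word in zip(corpus, corpus[1:]):
        classes[prev][word] += 1

    # Step 2: estimate parameters for each class (relative frequencies).
    params = {h: {w: c / sum(dist.values()) for w, c in dist.items()}
              for h, dist in classes.items()}

    print(params["the"])  # {'cat': 0.666..., 'mat': 0.333...}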

Page 8: Conflicting Interests

Reliability
• Lots of data in each class
• So, small number of classes

Discrimination
• All relevant distinctions made
• So, large number of classes

Page 9: End Points

Unigram model:
Pr(w | Sue swallowed the large green ___.) = Pr(w)

Exact match model:
Pr(w | Sue swallowed the large green ___.) = Pr(w | Sue swallowed the large green ___.)

What word would these suggest?

Page 10: N-grams: Compromise

N-grams are simple, powerful.

Bigram model:
Pr(w | Sue swallowed the large green ___.) = Pr(w | green ___)

Trigram model:
Pr(w | Sue swallowed the large green ___.) = Pr(w | large green ___)

Not perfect: misses “swallowed”.
pillow, crystal, caterpillar, iguana, Santa, tigers
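A sketch of bigram and trigram prediction by maximum-likelihood counts; the toy corpus and the top-3 cutoff are arbitrary choices, not from the lecture:

    from collections import Counter, defaultdict

    corpus = ("sue swallowed the large green pill . "
              "a large green frog sat on the large green pepper .").split()

    bigram = defaultdict(Counter)   # class = 1 preceding word
    trigram = defaultdict(Counter)  # class = 2 preceding words
    for i, w in enumerate(corpus):
        if i >= 1:
            bigram[(corpus[i-1],)][w] += 1
        if i >= 2:
            trigram[(corpus[i-2], corpus[i-1])][w] += 1

    def predict(model, history):
        """Top candidates for the next word given a history."""
        dist = model[history]
        total = sum(dist.values())
        return [(w, c / total) for w, c in dist.most_common(3)] if total else []

    print(predict(bigram, ("green",)))           # Pr(w | green ___)
    print(predict(trigram, ("large", "green")))  # Pr(w | large green ___)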

Page 11: Aside: Syntax

Can do better with a little bit of knowledge about grammar:

Pr(w | Sue swallowed the large green ___.) = Pr(w | modified by swallowed, the, green)

pill, dye, one, pineapple, dragon, beans, speck, liquid, solution, drink

Page 12: Estimating Trigrams

Treat sentences independently. Ok?

Pr(w_1 w_2)
Pr(w_j | w_{j-2} w_{j-1})
Pr(EOS | w_{n-1} w_n)

Simple so far.
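Combining the three kinds of parameters to score a whole sentence, as a sketch; p_start, p_trans, and p_stop are hypothetical probability tables estimated as above:

    import math

    def sentence_logprob(words, p_start, p_trans, p_stop):
        """log Pr(w1 ... wn EOS) under the trigram factorization:
        Pr(w1 w2) * prod_j Pr(wj | wj-2 wj-1) * Pr(EOS | wn-1 wn)."""
        lp = math.log(p_start[(words[0], words[1])])
        for j in range(2, len(words)):
            lp += math.log(p_trans[(words[j-2], words[j-1])][words[j]])
        lp += math.log(p_stop[(words[-2], words[-1])])
        return lp

    # e.g., sentence_logprob("sue swallowed the pill".split(), p_start, p_trans, p_stop)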

Page 13: Sparsity

Pr(w | comes across)

as: 8/10 (in Austen’s works)
a: 1/10
more: 1/10
the: 0/10

Don’t estimate as zeros!

Can use Laplace smoothing, e.g., or back off to bigram, unigram.
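A minimal sketch of Laplace (add-one) smoothing applied to the counts above; the vocabulary size V is a made-up value for illustration:

    V = 10000  # hypothetical vocabulary size

    def laplace(count_w, total, V=V):
        """Add-one (Laplace) smoothed estimate of Pr(w | history)."""
        return (count_w + 1) / (total + V)

    print(laplace(8, 10))  # "as": shrinks from 8/10 toward uniform
    print(laplace(0, 10))  # "the": nonzero instead of 0/10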

Page 14: Unreliable Words

Can’t take much stock in words only seen once (hapax legomena). Change to “UNK”.

Generally a small fraction of the tokens and half the types.

The boy saw the dog.
5 tokens, 4 types.
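A sketch of the token/type count and the UNK replacement on the slide's example sentence:

    from collections import Counter

    tokens = "the boy saw the dog".split()
    counts = Counter(tokens)
    print(len(tokens), len(set(tokens)))  # 5 tokens, 4 types

    # Replace hapax legomena (words seen exactly once) with UNK.
    processed = [w if counts[w] > 1 else "UNK" for w in tokens]
    print(processed)  # ['the', 'UNK', 'UNK', 'the', 'UNK']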

Page 15: Zipf’s Law

Frequency is inversely proportional to rank.

Thus, extremely long tail!
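A quick way to eyeball the law on any tokenized text (not from the slide): under Zipf's law, rank times frequency should be roughly constant.

    from collections import Counter

    def zipf_check(tokens):
        """Print rank * frequency for the top-ranked words; under Zipf's
        law the products should be roughly constant."""
        freqs = [c for _, c in Counter(tokens).most_common()]
        for rank, f in enumerate(freqs[:10], start=1):
            print(rank, f, rank * f)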

Page 16: Word Frequencies in Tom Sawyer

[Figure: bar chart of word frequencies in Tom Sawyer; frequency axis runs from 0 to 3500.]

Page 17: Using Trigrams

Hand me the ___ knife now.

butter
knife

Page 18: Counts

me the: 2832670
me the butter: 88
me the knife: 638
the knife: 154771
the knife knife: 72
the butter: 92304
the butter knife: 559
knife knife: 7831
knife knife now: 4
butter knife: 9046
butter knife now: 15

Page 19: Markov Model

[Figure: Markov chain over bigram states for “Hand me the ___ knife now.”, with natural-log transition probabilities derived from the counts above:

Hand me → me the (-2.4)
me the → the butter (-10.4 = log 88/2832670)
me the → the knife (-8.4 = log 638/2832670)
the butter → butter knife (-5.1 = log 559/92304)
the knife → knife knife (-7.7 = log 72/154771)
butter knife → knife now (-6.4 = log 15/9046)
knife knife → knife now (-7.6 = log 4/7831)

The “butter” path totals -24.3 against -26.1 for “knife”, so “butter” is preferred.]

Page 20: General Scheme

Pr(w_j = x | w_1 w_2 … EOS)
= Pr(w_1 w_2 … x … EOS) / Σ_x Pr(w_1 w_2 … x … EOS)

Maximized by Pr(w_1 w_2 … x … EOS)
= Pr(w_1 w_2) … Pr(x | w_{j-2} w_{j-1}) Pr(w_{j+1} | w_{j-1} x) Pr(w_{j+2} | x w_{j+1}) … Pr(EOS | w_{n-1} w_n)

Maximized by Pr(x | w_{j-2} w_{j-1}) Pr(w_{j+1} | w_{j-1} x) Pr(w_{j+2} | x w_{j+1})
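A sketch applying this scheme to the “Hand me the ___ knife now.” example, using the counts from the Counts slide; only the three factors involving the blank matter, and natural logs are used for numerical stability:

    import math

    count = {
        "me the": 2832670, "me the butter": 88, "me the knife": 638,
        "the butter": 92304, "the butter knife": 559,
        "the knife": 154771, "the knife knife": 72,
        "butter knife": 9046, "butter knife now": 15,
        "knife knife": 7831, "knife knife now": 4,
    }

    def score(x):
        """log [ Pr(x | me the) * Pr(knife | the x) * Pr(now | x knife) ]"""
        return (math.log(count[f"me the {x}"] / count["me the"])
                + math.log(count[f"the {x} knife"] / count[f"the {x}"])
                + math.log(count[f"{x} knife now"] / count[f"{x} knife"]))

    for x in ("butter", "knife"):
        print(x, round(score(x), 1))  # butter -21.9 beats knife -23.7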

Page 21: Mutual Information

MI(x, y) = log( Pr(x and y) / (Pr(x) Pr(y)) )

Measures the degree to which two events are independent (how much “information” we learn about one from knowing the other).

Page 22: Mutual Inf. Application

Measure of strength of association between words:

levied: imposed vs. believed

Since Pr(levied) is the same for every candidate x, comparing scores reduces to simply

Pr(levied | x) = Pr(levied, x) / Pr(x) = count(levied and x) / count(x)

“imposed” has higher score.
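A sketch of the comparison from co-occurrence counts; all numbers here are invented for illustration, not corpus statistics:

    import math

    N = 1_000_000  # hypothetical corpus size (word-pair observations)
    count_x = {"imposed": 1500, "believed": 4000}
    count_levied_and_x = {"imposed": 40, "believed": 2}
    count_levied = 60

    def mi(x):
        """MI(levied, x) = log [ Pr(levied and x) / (Pr(levied) Pr(x)) ]"""
        p_joint = count_levied_and_x[x] / N
        return math.log(p_joint / ((count_levied / N) * (count_x[x] / N)))

    for x in ("imposed", "believed"):
        print(x, round(mi(x), 2))  # "imposed" scores higher, as claimed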

Page 23: Analogy Idea

Find a linking word such that a mutual information score is maximized.

Tricky to find the right word. Unclear if any word will have the right effect.

traffic flows through the street
water flows through the riverbed

Page 24: What to Learn

Reliability/discrimination tradeoff.
Definition of N-gram models.
How to find the most likely word in an N-gram model.
Mutual information.

Page 25: Homework 7 (due 11/21)

1. Give a maximization scheme for filling in the two blanks in a sentence like “I hate it when ___ goes ___ on me.” Be somewhat rigorous to make the TA’s job easier.

2. more soon