Introduction to Part of Speech Taggingaritter.github.io/courses/5525_slides/pos1.pdfParts of Speech...

15
Introduction to Part of Speech Tagging Alan Ritter Many slides adapted from Brendan O’Connor Chris Manning

Transcript of Introduction to Part of Speech Taggingaritter.github.io/courses/5525_slides/pos1.pdfParts of Speech...

Page 1: Introduction to Part of Speech Taggingaritter.github.io/courses/5525_slides/pos1.pdfParts of Speech is an old idea • Perhaps starting with Aristotle in the West (384 –322 BCE),

IntroductiontoPartofSpeechTagging

AlanRitter

ManyslidesadaptedfromBrendanO’Connor ChrisManning

Page 2: Introduction to Part of Speech Taggingaritter.github.io/courses/5525_slides/pos1.pdfParts of Speech is an old idea • Perhaps starting with Aristotle in the West (384 –322 BCE),

Wherearewegoingwiththis?

• Textclassification:bagsofwords

• Sequencetagging• PartsofSpeech• NamedEntityRecognition• Otherareas:bioinformatics(geneprediction),etc…

Page 3: Introduction to Part of Speech Taggingaritter.github.io/courses/5525_slides/pos1.pdfParts of Speech is an old idea • Perhaps starting with Aristotle in the West (384 –322 BCE),

What’sapart-of-speech(POS)?

• Syntax=howwordscomposetoformlargermeaningbearingunits• POS=syntacticcategoriesforwords

• Youcouldsubstitutewordswithinaclassandhaveasyntacticallyvalidsentence

• Givesinformationhowwordscombineintolargerphrases

• Isawthedog• Isawthecat• Isawthe___

Page 4: Introduction to Part of Speech Taggingaritter.github.io/courses/5525_slides/pos1.pdfParts of Speech is an old idea • Perhaps starting with Aristotle in the West (384 –322 BCE),

PartsofSpeechisanoldidea

• PerhapsstartingwithAristotleintheWest(384–322BCE),therewastheideaofhavingpartsofspeech

• Schoolgrammar:noun,verb,adjective,adverb,preposition,conjunction,pronoun,interjection

• Manymorefinegrainedpossibilities

https://www.youtube.com/watch?v=ODGA7ssL-6g&index=1&list=PL6795522EAD6CE2F7

Page 5: Introduction to Part of Speech Taggingaritter.github.io/courses/5525_slides/pos1.pdfParts of Speech is an old idea • Perhaps starting with Aristotle in the West (384 –322 BCE),

Open class (lexical) words

Closed class (functional)

Nouns Verbs

Proper Common

Modals

Main

Adjectives

Adverbs

Prepositions

Particles

Determiners

Conjunctions

Pronouns

… more

… more

IBMItaly

cat / catssnow

seeregistered

canhad

old older oldest

slowly

to with

off up

the some

and or

he its

Numbers

122,312one

Interjections Ow Eh

Page 6: Introduction to Part of Speech Taggingaritter.github.io/courses/5525_slides/pos1.pdfParts of Speech is an old idea • Perhaps starting with Aristotle in the West (384 –322 BCE),

Openvs.Closedclasses

• Openvs.Closedclasses• Closed:

• determiners:a,an,the• pronouns:she,he,I• prepositions:on,under,over,near,by,…• Why“closed”?

• Open:• Nouns,Verbs,Adjectives,Adverbs.

Page 7: Introduction to Part of Speech Taggingaritter.github.io/courses/5525_slides/pos1.pdfParts of Speech is an old idea • Perhaps starting with Aristotle in the West (384 –322 BCE),

ManyTaggingStandards

• PennTreebank(45tags)…thisisthemostcommonone• Browncorpus(85tags)• Coarsetagsets

• UniversalPOStags(Petrovet.al.https://github.com/slavpetrov/universal-pos-tags)

• Motivation:cross-linguisticregularities

Page 8: Introduction to Part of Speech Taggingaritter.github.io/courses/5525_slides/pos1.pdfParts of Speech is an old idea • Perhaps starting with Aristotle in the West (384 –322 BCE),

Whatarepartsofspeechusefulfor?

• Phraseidentification(chunking)• Namedentityrecognition• InformationExtraction• Parsing

Page 9: Introduction to Part of Speech Taggingaritter.github.io/courses/5525_slides/pos1.pdfParts of Speech is an old idea • Perhaps starting with Aristotle in the West (384 –322 BCE),

QuickandDirtyNounPhraseIdentification

Page 10: Introduction to Part of Speech Taggingaritter.github.io/courses/5525_slides/pos1.pdfParts of Speech is an old idea • Perhaps starting with Aristotle in the West (384 –322 BCE),

POSTagging

• WordsoftenhavemorethanonePOS:back• Theback door =JJ• Onmyback =NN• Winthevotersback =RB• Promisedtoback thebill =VB

• ThePOStaggingproblemistodeterminethePOStagforaparticularinstanceofaword.

Page 11: Introduction to Part of Speech Taggingaritter.github.io/courses/5525_slides/pos1.pdfParts of Speech is an old idea • Perhaps starting with Aristotle in the West (384 –322 BCE),

POSTagging

• Input: Playswellwithothers• Ambiguity:NNS/VBZUH/JJ/NN/RBINNNS• Output: Plays/VBZwell/RBwith/INothers/NNS• Uses:

• Text-to-speech(howdowepronounce“lead”?)• Canwriteregexps like(Det)Adj*N+overtheoutputforphrases,etc.• Asinputtoortospeedupafullparser• Ifyouknowthetag,youcanbackofftoitinothertasks

PennTreebankPOStags

Page 12: Introduction to Part of Speech Taggingaritter.github.io/courses/5525_slides/pos1.pdfParts of Speech is an old idea • Perhaps starting with Aristotle in the West (384 –322 BCE),

POStaggingperformance

• Howmanytagsarecorrect?(Tagaccuracy)• About97%currently• Butbaselineisalready90%

• Baselineisperformanceofstupidestpossiblemethod• Tageverywordwithitsmostfrequent tag• Tagunknownwordsasnouns

• Partlyeasybecause• Manywordsareunambiguous• Yougetpointsforthem(the,a,etc.)andforpunctuationmarks!

Page 13: Introduction to Part of Speech Taggingaritter.github.io/courses/5525_slides/pos1.pdfParts of Speech is an old idea • Perhaps starting with Aristotle in the West (384 –322 BCE),

Decidingonthecorrectpartofspeechcanbedifficultevenforpeople

• Mrs/NNPShaefer/NNPnever/RBgot/VBDaround/RP to/TOjoining/VBG

• All/DTwe/PRPgotta/VBNdo/VBis/VBZgo/VBaround/INthe/DTcorner/NN

• Chateau/NNPPetrus/NNPcosts/VBZaround/RB250/CD

Page 14: Introduction to Part of Speech Taggingaritter.github.io/courses/5525_slides/pos1.pdfParts of Speech is an old idea • Perhaps starting with Aristotle in the West (384 –322 BCE),

HowdifficultisPOStagging?

• About11%ofthewordtypesintheBrowncorpusareambiguouswithregardtopartofspeech

• Buttheytendtobeverycommonwords.E.g.,that• Iknowthat heishonest=IN• Yes,that playwasnice=DT• Youcan’tgothat far=RB

• 40%ofthewordtokensareambiguous

Page 15: Introduction to Part of Speech Taggingaritter.github.io/courses/5525_slides/pos1.pdfParts of Speech is an old idea • Perhaps starting with Aristotle in the West (384 –322 BCE),

It’shardforpeopletoo!