Page 1:

Part of Speech Tagging with MaxEnt Re-ranked Hidden Markov Model

Brian Highfill

Page 2:

Part of Speech Tagging

• Train a model on a set of hand-tagged sentences
• Find the best sequence of POS tags for a new sentence
• Generative models
  • Hidden Markov Model (HMM)
• Discriminative models
  • Maximum Entropy Markov Model (MEMM)
• Brown Corpus
  • ~57,000 tagged sentences
  • 87 tags (reduced to 45 for Penn Treebank tagging)
  • ~300 tags including compound tags
  • that_DT fire's_NN+BEZ too_QL big_JJ ._.
  • “fire’s” = fire_NN is_BEZ

Page 3:

Hidden Markov Models

• Set of hidden states (POS tags)
• Set of observations (word tokens)
• Each word depends ONLY on its current tag; each tag depends only on the previous tag
• HMM parameters (see the estimation sketch after the example below)
  • Transition probabilities: P(ti | t0 … ti−1) = P(ti | ti−1)
  • Observation probabilities: P(wi | t0 … ti, w0 … wi−1) = P(wi | ti)
  • Initial tag distribution: P(t0)

Example: “that fire’s too big .”
• That: DT (singular determiner)
• Fire’s: NN+BEZ (common noun + “is”)
• too: QL (qualifier)
• big: JJ (adjective)
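These three parameter sets can be estimated by simple relative-frequency counts over the hand-tagged training sentences. Below is a minimal Python sketch, assuming each training sentence is a list of (word, tag) pairs; the function name train_hmm and the unsmoothed counts are illustrative choices, not the author's actual implementation.

```python
from collections import defaultdict

def train_hmm(tagged_sentences):
    """Relative-frequency estimates of the three HMM parameter sets
    from sentences given as lists of (word, tag) pairs. No smoothing."""
    initial = defaultdict(int)                          # counts for P(t0)
    transition = defaultdict(lambda: defaultdict(int))  # counts for P(ti | ti-1)
    emission = defaultdict(lambda: defaultdict(int))    # counts for P(wi | ti)

    for sentence in tagged_sentences:
        prev_tag = None
        for word, tag in sentence:
            emission[tag][word] += 1
            if prev_tag is None:
                initial[tag] += 1
            else:
                transition[prev_tag][tag] += 1
            prev_tag = tag

    def normalize(counts):
        total = sum(counts.values())
        return {key: n / total for key, n in counts.items()}

    return (normalize(initial),
            {t: normalize(c) for t, c in transition.items()},
            {t: normalize(c) for t, c in emission.items()})
```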

Page 4:

HMM Best Tag Sequence

• For an HMM, the Viterbi algorithm (sketched below) finds the most probable tagging of a new sentence

• For re-ranking later, we want not just the best tagging but the k best taggings for each sentence
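As a hedged illustration, here is a compact Viterbi decoder in Python over the parameter dictionaries produced by the train_hmm sketch above; the 1e-12 floor for unseen events stands in for real smoothing and is an assumption, not part of the original model.

```python
import math

def viterbi(words, tags, initial, transition, emission):
    """Most probable tag sequence for `words` under the HMM."""
    def logp(table, key):
        # Tiny floor for unseen events -- an assumption, not real smoothing.
        return math.log(table.get(key, 1e-12))

    # best[t] = (log prob of best path ending in tag t, that path)
    best = {t: (logp(initial, t) + logp(emission.get(t, {}), words[0]), [t])
            for t in tags}
    for word in words[1:]:
        new_best = {}
        for t in tags:
            # Choose the best previous tag for reaching t here.
            score, path = max(
                (prev_score + logp(transition.get(prev_t, {}), t), path)
                for prev_t, (prev_score, path) in best.items())
            new_best[t] = (score + logp(emission.get(t, {}), word), path + [t])
        best = new_best
    return max(best.values())[1]
```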

Page 5:

HMM Beam Search

• Step 1: Enumerate all possible tags for the first word

• Step 2: Evaluate each tagging using the trained HMM; keep only the k best (one-word) sentence taggings

• Step 3: For each of the k taggings from the previous step, enumerate all possible tags for the second word

• Step 4: Evaluate each two-word sentence tagging and discard all but the k best

• Repeat for all words in the sentence (see the sketch after the figure below)

[Figure: beam-search tree. A Start node branches to all r candidate tags for the first word (<tag 1-1> … <tag 1-r>); the k retained taggings (<tag 1-1'> … <tag 1-k'>) each branch to all r candidate tags for the second word (<tag 2-1> … <tag 2-r>).]
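A minimal sketch of steps 1–4 as code, reusing the parameter dictionaries from the earlier sketches; beam_search and its log-probability floor are illustrative assumptions, and the real system may prune differently.

```python
import math

def beam_search(words, tags, initial, transition, emission, k=5):
    """k-best beam search over HMM taggings, following steps 1-4 above.
    Returns up to k (log_score, tag_sequence) pairs, best first."""
    def logp(table, key):
        # Tiny floor for unseen events -- an assumption, not real smoothing.
        return math.log(table.get(key, 1e-12))

    # Steps 1-2: score every possible tag for the first word, keep the k best.
    beam = sorted(
        ((logp(initial, t) + logp(emission.get(t, {}), words[0]), [t])
         for t in tags),
        reverse=True)[:k]

    # Steps 3-4: extend each survivor by every tag, rescore, prune back to k.
    for word in words[1:]:
        candidates = [
            (score + logp(transition.get(path[-1], {}), t)
                   + logp(emission.get(t, {}), word),
             path + [t])
            for score, path in beam for t in tags]
        beam = sorted(candidates, reverse=True)[:k]
    return beam
```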

Page 6:

MaxEnt Re-ranking

• After beam search, we have the k “best” taggings for our sentence

• Use a trained MaxEnt model to select the most probable sequence of tags (a re-ranking sketch follows the figure below)

[Figure: re-ranking lattice. Each of the k retained taggings, <tag 1-1'> … <tag 1-k'> for word 1 through <tag t-1'> … <tag t-k'> for word t, forms a complete candidate sequence that is passed to the MaxEnt model.]
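A hedged sketch of the re-ranking step: each of the k candidate taggings is scored by a log-linear (MaxEnt) model, and the highest-scoring one wins. The weights dictionary and the extract_features callback (one possible version is sketched under the feature list on the next page) are placeholders; training the MaxEnt weights is not shown.

```python
def rerank(words, k_best, weights, extract_features):
    """Select the best of the k beam-search candidates using a
    log-linear (MaxEnt) model: the score of a tagging is the sum of
    the weights of the features active at each position."""
    def score(tags):
        return sum(weights.get(f, 0.0)
                   for i in range(len(words))
                   for f in extract_features(words, tags, i))
    return max((tags for _, tags in k_best), key=score)
```

Under these assumptions the full pipeline would be: tags = rerank(words, beam_search(words, ...), weights, extract_features).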

Page 7:

Features

• Features (see the extraction sketch below)
  • Current word
  • Previous tag
  • <Word AND previous tag>
  • Word contains a numeral
  • Suffixes: “-ing”, “-ness”, “-ity”, “-ed”, “-able”, “-s”, “-ion”, “-al”, “-ive”, “-ly”
  • Word is capitalized
  • Word is hyphenated
  • Word is all uppercase
  • Word is all uppercase with a numeral
  • Word is capitalized and a word ending in “Co.” or “Inc.” is found within 3 words ahead
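One possible Python rendering of this feature set; the feature-string names and the <S> start tag are hypothetical, and only meant to show how each listed feature could be computed.

```python
SUFFIXES = ("-ing", "-ness", "-ity", "-ed", "-able", "-s",
            "-ion", "-al", "-ive", "-ly")

def extract_features(words, tags, i):
    """Binary feature strings for position i, mirroring the list above."""
    word = words[i]
    prev_tag = tags[i - 1] if i > 0 else "<S>"
    feats = [f"word={word}",
             f"prev_tag={prev_tag}",
             f"word+prev_tag={word}|{prev_tag}"]
    if any(ch.isdigit() for ch in word):
        feats.append("has_numeral")
    for suf in SUFFIXES:
        if word.endswith(suf[1:]):       # drop the display hyphen
            feats.append(f"suffix={suf}")
    if word[0].isupper():
        feats.append("capitalized")
        # ...with a word ending in "Co." or "Inc." within 3 words ahead
        if any(w.endswith(("Co.", "Inc.")) for w in words[i + 1:i + 4]):
            feats.append("capitalized_near_company")
    if "-" in word:
        feats.append("hyphenated")
    if word.isupper():
        feats.append("all_uppercase")
        if any(ch.isdigit() for ch in word):
            feats.append("all_uppercase_with_numeral")
    return feats
```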

Page 8:

Results

• Baseline “most frequent class” tagger: 73.41% (24% on <UNK>)

• HMM Viterbi tagger: 92.96% (32.76% on <UNK>)

[Plot: Known Word Accuracy vs. Beam Search Width (K), for K = 1, 2, 3, 4, 5, 10, 20; the accuracy axis spans roughly 89.5–92%.]