Part of Speech Tagging with MaxEnt Re-ranked Hidden Markov Model Brian Highfill.
Part of Speech Tagging with MaxEnt Re-ranked Hidden
Markov Model
Brian Highfill
Part of Speech Tagging
• Train a model on a set of hand-tagged sentences
• Find the best sequence of POS tags for a new sentence
• Generative models
• Hidden Markov Model (HMM)
• Discriminative models
• Maximum Entropy Markov Model (MEMM)
• Brown Corpus
• ~57,000 tagged sentences
• 87 tags (reduced to 45 for Penn TreeBank tagging)
• ~300 tags including compound tags
• Example: that_DT fire's_NN+BEZ too_QL big_JJ ._.
• "fire's" = fire_NN is_BEZ
Hidden Markov Models
• Set of hidden states (POS tags)• Set of observations (word tokens)
• Depends ONLY on the current tag
• HMM parameters
• Transition probabilities: P(ti | t0…ti-1) = P(ti | ti-1)
• Observation probabilities: P(wi | t0…ti, w0…wi-1) = P(wi | ti)
• Initial tag distribution: P(t0)
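A minimal sketch (Python, illustrative names; not the author's code) of how these three parameter sets could be estimated from hand-tagged sentences by relative-frequency counting:

```python
from collections import defaultdict

def train_hmm(tagged_sentences):
    """Estimate HMM parameters by counting over tagged sentences.

    tagged_sentences: list of [(word, tag), ...] lists.
    Returns (initial, transition, emission) probability dicts.
    """
    init = defaultdict(float)
    trans = defaultdict(lambda: defaultdict(float))
    emit = defaultdict(lambda: defaultdict(float))
    for sent in tagged_sentences:
        prev = None
        for i, (word, tag) in enumerate(sent):
            emit[tag][word] += 1          # P(word | tag) counts
            if i == 0:
                init[tag] += 1            # P(t0) counts
            else:
                trans[prev][tag] += 1     # P(ti | ti-1) counts
            prev = tag

    def normalize(d):
        total = sum(d.values())
        return {k: v / total for k, v in d.items()}

    init = normalize(init)
    trans = {t: normalize(d) for t, d in trans.items()}
    emit = {t: normalize(d) for t, d in emit.items()}
    return init, trans, emit
```

In practice the emission counts would also need smoothing for unknown words; this sketch uses plain maximum-likelihood estimates.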
• That: DT (singular determiner)
• Fire's: NN+BEZ (common noun + "is")
• too: QL (qualifier)
• big: JJ (adjective)
HMM Best Tag Sequence
• For HMM, the Viterbi algorithm finds the most probable tagging for a new sentence
• For re-ranking later, we want not just the single best tagging but the k best taggings for each sentence
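A sketch of the standard Viterbi decoder for reference (Python; dict-based parameters and the tiny FLOOR probability for unseen events are illustrative assumptions, not the author's implementation):

```python
import math

def viterbi(words, init, trans, emit, tags):
    """Return the most probable tag sequence under the HMM.

    init/trans/emit are probability dicts; unseen events are
    given a small floor probability so logs stay finite.
    """
    FLOOR = 1e-12
    def p(d, k):
        return d.get(k, FLOOR)

    # scores[tag] = (log prob of best path ending in tag, that path)
    scores = {t: (math.log(p(init, t)) + math.log(p(emit.get(t, {}), words[0])), [t])
              for t in tags}
    for w in words[1:]:
        new = {}
        for t in tags:
            best_prev, best_score = None, -math.inf
            for s, (sc, _) in scores.items():
                cand = sc + math.log(p(trans.get(s, {}), t))
                if cand > best_score:
                    best_score, best_prev = cand, s
            new[t] = (best_score + math.log(p(emit.get(t, {}), w)),
                      scores[best_prev][1] + [t])
        scores = new
    return max(scores.values())[1]
```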
HMM Beam Search
• Step 1: Enumerate all possible tags for the first word
• Step 2: Evaluate each tagging using the trained HMM; keep only the best k one-word taggings
• Step 3: For each of the k taggings from the previous step, enumerate all possible tags for the second word
• Step 4: Evaluate each two-word tagging and again keep only the k best
• Repeat for all words in the sentence
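The steps above can be sketched as follows (Python; the log-space scoring and FLOOR smoothing for unseen events are assumptions for a runnable example):

```python
import math

def beam_search_k_best(words, init, trans, emit, tags, k):
    """Keep the k highest-scoring partial taggings at each word.

    Returns up to k (log-prob, tag-sequence) pairs, best first.
    """
    FLOOR = 1e-12
    def logp(d, key):
        return math.log(d.get(key, FLOOR))

    # Steps 1-2: score every tag for the first word, keep the best k
    beam = [(logp(init, t) + logp(emit.get(t, {}), words[0]), [t]) for t in tags]
    beam = sorted(beam, reverse=True)[:k]
    # Steps 3-4: extend each kept tagging by every next tag, prune to k
    for w in words[1:]:
        candidates = []
        for score, path in beam:
            for t in tags:
                candidates.append((score
                                   + logp(trans.get(path[-1], {}), t)
                                   + logp(emit.get(t, {}), w),
                                   path + [t]))
        beam = sorted(candidates, reverse=True)[:k]
    return beam
```

With k = 1 this reduces to a greedy decoder; with k equal to the number of tag sequences it becomes exhaustive search, so k trades accuracy for speed.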
[Diagram: beam-search lattice. From Start, every possible tag <tag 1-1> … <tag 1-r> for the first word is scored; the k survivors <tag 1-1'> … <tag 1-k'> are each expanded with all possible tags <tag 2-1> … <tag 2-r> for the second word, and so on.]
MaxEnt Re-ranking
• After beam search, we have the k “best” taggings for our sentence
• Use trained MaxEnt model to select most probable sequence of tags
[Diagram: re-ranking lattice. The k surviving taggings <tag 1-1'> … <tag 1-k'> are traced from Start through to word t, giving k complete candidate paths <tag t-1'> … <tag t-k'> for the MaxEnt model to score.]
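A minimal sketch of the re-ranking step (Python; the feature-extractor signature and feature names are hypothetical, and the weights are assumed to come from a separately trained MaxEnt model):

```python
def rerank(words, candidates, features, weights):
    """Pick the candidate tagging with the highest log-linear score.

    candidates: k-best tag sequences for the same sentence.
    features(word, tag, prev_tag) -> list of active feature names.
    weights: dict mapping feature name -> learned MaxEnt weight.
    """
    def score(tags):
        total, prev = 0.0, "<s>"
        for w, t in zip(words, tags):
            total += sum(weights.get(f, 0.0) for f in features(w, t, prev))
            prev = t
        return total
    return max(candidates, key=score)
```

Because the candidates all cover the same sentence, the normalization constant of the MaxEnt distribution cancels, so comparing unnormalized log-linear scores suffices.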
Results
• Features
• Current word
• Previous tag
• <Word AND previous tag>
• Word contains a numeral
• Suffixes: "-ing", "-ness", "-ity", "-ed", "-able", "-s", "-ion", "-al", "-ive", "-ly"
• Word is capitalized
• Word is hyphenated
• Word is all uppercase
• Word is all uppercase with a numeral
• Word is capitalized and a word ending in "Co." or "Inc." is found within 3 words ahead
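Most of the listed features can be sketched as a simple extractor (Python; the feature names are illustrative, and the "Co./Inc. within 3 words ahead" feature is omitted because it needs the rest of the sentence as context):

```python
def extract_features(word, prev_tag):
    """Return the active binary features for one (word, previous-tag) pair."""
    feats = {
        "word=" + word.lower(): 1,
        "prev_tag=" + prev_tag: 1,
        "word+prev=" + word.lower() + "|" + prev_tag: 1,
        "has_digit": int(any(c.isdigit() for c in word)),
        "capitalized": int(word[:1].isupper()),
        "hyphenated": int("-" in word),
        "all_upper": int(word.isupper()),
        "upper_with_digit": int(word.isupper() and any(c.isdigit() for c in word)),
    }
    for suf in ("ing", "ness", "ity", "ed", "able", "s", "ion", "al", "ive", "ly"):
        feats["suffix=-" + suf] = int(word.lower().endswith(suf))
    # keep only the features that fired
    return {f: v for f, v in feats.items() if v}
```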
Results
• Baseline "most frequent class" tagger: 73.41% (24% on <UNK>)
• HMM Viterbi tagger: 92.96% (32.76% on <UNK>)
[Chart: Known Word Accuracy vs. Beam Search Width (K), for K = 1, 2, 3, 4, 5, 10, 20; accuracy axis spans roughly 89.5 to 92.]