Part-of-Speech Tagging and Hidden Markov Model
Bonan Min
Some slides are based on class materials from Thien Huu Nguyen and Ralph Grishman
Parts of Speech (POS)
Role of parts of speech in grammar:
- "preterminals"
- Rules stated in terms of classes of words sharing syntactic properties
  - noun
  - verb
  - adjective
  - ...
Parts of Speech (POS)
The distributional hypothesis: words that appear in similar contexts have similar representations (and similar meanings)
Substitution test for POS: if a word is replaced by another word, does the sentence remain grammatical?
He noticed the elephant before anybody else
- Grammatical substitutions for "elephant": dog, cat, point, features
- Ungrammatical: *what, *and
Substitution Test
These tests can often be too strict; some contexts admit substitutability for some pairs but not others.
He noticed the elephant before anybody else
- *Sandy (both nouns, but common vs. proper)
- He *arrived the elephant before (both verbs, but transitive vs. intransitive)
Parts of Speech (POS)
POS Tag Sets (Categories)
The most influential tag sets were those defined for projects to produce large POS-annotated corpora:
Brown corpus
- 1 million words from a variety of genres
- 87 tags
Penn Treebank
- initially 1 million words of Wall Street Journal text
- later retagged the Brown corpus
- first POS tags, then full parses
- 45 tags (some distinctions captured in parses)
Cross-lingual considerations for POS tags
Penn Treebank POS Tags
Verbs
http://www.personal.psu.edu/faculty/x/x/xxl13/teaching/sp07/apling597e/resources/Tagset.pdf
Nouns
RP (Particle)
Used in combination with a verb: She turned the paper over
Verb + particle = phrasal verb, often non-compositional: turn down, rule out, find out, go on
DT and PDT
DT (Articles)
- Articles (a, the, every, no)
- Indefinite determiners (another, any, some, each)
- That, these, this, those when preceding a noun
- All, both when not preceding another determiner or possessive pronoun
PDT (Predeterminer)
- Determiner-like words that precede an article or possessive pronoun
  - all his marbles
  - both the girls
  - such a good time
PRP and PRP$
PRP (personal pronoun)
- Personal pronouns (I, me, you, he, him, it, etc.)
- Reflexive pronouns (ending in -self): himself, herself
- Nominal possessive pronouns: mine, yours, hers
PRP$ (possessive pronoun)
- Adjectival possessive forms: my, their, its, his, her
Adjectives
JJ (Adjectives)
- General adjectives (happy person, new house)
- Ordinal numbers (fourth cat)
JJR (Comparative adjectives)
- Adjectives with the comparative ending -er and comparative meaning (happier person)
- More and less when used as adjectives (more mail)
JJS (Superlative adjectives)
- Adjectives with the superlative ending -est and superlative meaning (happiest person)
- Most and least when used as adjectives (most mail)
Adverbs
RB (Adverbs)
- Most words that end in -ly (highly, heavily)
- Degree words (quite, too, very)
- Negative markers (not, n't, never)
RBR (Comparative adverbs)
- Adverbs with the comparative ending -er and comparative meaning, e.g., run faster
- More/less, e.g., more expensive
RBS (Superlative adverbs)
- Adverbs with the superlative ending -est and superlative meaning, e.g., run fastest
- Most/least, e.g., most expensive
IN and CC
IN (preposition, subordinating conjunction)
- All prepositions (except to) and subordinating conjunctions
- He jumped on the table because he was excited
CC (coordinating conjunction)
- And, but, nor, or
- Math operators (plus, minus, less, times)
- For (meaning "because"): he asked to be transferred, for he was unhappy
The POS Tagging Task
Task: assign a POS tag to each word
Not trivial: many words have several possible tags
- A dictionary only lists the possible POS tags of a word, independent of context
Why Tag?
POS tagging can help parsing by reducing ambiguity
Can resolve some pronunciation ambiguities for text-to-speech ("desert" as a noun: /ˈdɛzərt/; as a verb: /dɪˈzɜːrt/)
Can resolve some semantic ambiguities
Some Tricky Cases
JJ or VBN
1. If it is gradable (can insert "very") = JJ
   - He was very surprised (JJ)
2. If it can be followed by a "by" phrase = VBN; if that conflicts with #1 above, then = JJ
   - He was invited by some friends of hers (VBN)
   - He was very surprised by her remarks (JJ)
JJ or NNP/NNPS
- Proper names can be adjectives or nouns
  - French cuisine is delicious (JJ)
  - The French tend to be inspired cooks (NNPS)
Some Tricky Cases
NN or VBG
- Only nouns can be modified by adjectives; only gerunds (-ing) can be modified by adverbs
  - Good cooking is something to enjoy (NN)
  - Cooking well is a useful skill (VBG)
IN or RP
- If it can precede or follow the noun phrase = RP
  - She told off her friends
  - She told her friends off
- If it must precede the noun phrase = IN
  - She stepped off the train
  - *She stepped the train off
Quiz [SLP2]
Find the tagging errors in the following sentences:
I/PRP need/VBP a/DT flight/NN from/IN Atlanta/NN
Does/VBZ this/DT flight/NN serve/VB dinner/NNS
I/PRP have/VB a/DT friend/NN living/VBG in/IN Denver/NNP
Can/VBP you/PRP list/VB the/DT nonstop/JJ afternoon/NN flights/NNS
Quiz [SLP2]
Find the tagging errors in the following sentences (corrections shown after each arrow):
- I/PRP need/VBP a/DT flight/NN from/IN Atlanta/NN → Atlanta/NNP
- Does/VBZ this/DT flight/NN serve/VB dinner/NNS → dinner/NN
- I/PRP have/VB a/DT friend/NN living/VBG in/IN Denver/NNP → have/VBP
- Can/VBP you/PRP list/VB the/DT nonstop/JJ afternoon/NN flights/NNS → Can/MD
POS Tagging Methods
Similar to text classification, we would like to use machine learning methods for POS tagging.
Using supervised learning, we need to assemble a text corpus and manually annotate the POS tag of every word in it (e.g., the Brown corpus); these are corpus-based methods.
- We can divide the corpus into training data, development data, and test data
To build a good corpus:
- we must define a task people can do reliably (choose a suitable POS tag set)
- we must provide good documentation for the task, so annotation can be done consistently
- we must measure human performance (through dual annotation and inter-annotator agreement)
- this often requires several iterations of refinement
The Simplest POS Tagging Method
We tag each word with its most likely part of speech (based on the training data)
- this works quite well: about 90% accuracy when trained and tested on similar texts
- although many words have multiple parts of speech, one POS typically dominates within a single text type
How can we take advantage of context to do better?
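A minimal sketch of this most-frequent-tag baseline in Python (the corpus format of (word, tag) sentence lists and the toy training data are illustrative assumptions, not a prescribed setup):

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Count (word, tag) pairs and keep the most frequent tag per word."""
    word_tag_counts = defaultdict(Counter)
    tag_counts = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            word_tag_counts[word][tag] += 1
            tag_counts[tag] += 1
    most_frequent = {w: c.most_common(1)[0][0] for w, c in word_tag_counts.items()}
    default_tag = tag_counts.most_common(1)[0][0]  # fallback for unseen words
    return most_frequent, default_tag

def tag_baseline(words, most_frequent, default_tag):
    """Tag each word with its most likely POS, ignoring context."""
    return [most_frequent.get(w, default_tag) for w in words]

# Toy usage with hypothetical training data
train = [[("fish", "NN"), ("sleep", "VBP")],
         [("they", "PRP"), ("fish", "VBP")]]
model, default = train_baseline(train)
print(tag_baseline(["they", "fish", "sleep"], model, default))  # e.g. ['PRP', 'NN', 'VBP'] (ties broken arbitrarily)
```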
POS Tagger as Sequence Labeling
Sequence labeling: given a sequence of observations $x = x_1, x_2, \ldots, x_n$, we need to assign a label/type/class $y_i$ to each observation $x_i \in x$, leading to the label sequence $y = y_1, y_2, \ldots, y_n$ for $x$, where $y_i \in Y$ and $Y$ is the set of possible labels.
For POS tagging, $x$ can be an input sentence where $x_i$ is the $i$-th word in the sentence, and $y_i$ can be the POS tag of $x_i$ in $x$ ($Y$ is the set of possible POS tags in our data). E.g.,
$x$ = Does this flight serve dinner
$y$ = VBZ DT NN VB NN
Sequence Labeling
As in text classification, we want to estimate the distribution from the training data:
$$P(y \mid x) = P(y_1, y_2, \ldots, y_n \mid x_1, x_2, \ldots, x_n)$$
So, we can obtain the predicted label sequence for $x$ by:
$$y^* = \operatorname*{argmax}_y P(y \mid x) = \operatorname*{argmax}_y P(y_1, y_2, \ldots, y_n \mid x_1, x_2, \ldots, x_n)$$
Hidden Markov Model (HMM)
Using Bayes' rule:
$$\operatorname*{argmax}_y P(y \mid x) = \operatorname*{argmax}_y \frac{P(x \mid y)\, P(y)}{P(x)} = \operatorname*{argmax}_y P(x \mid y)\, P(y) = \operatorname*{argmax}_y P(x_1, \ldots, x_n \mid y_1, \ldots, y_n)\, P(y_1, \ldots, y_n)$$
First-order Markov assumption: the probability of the label at the current step depends only on the label at the previous step, so:
$$P(y_1, y_2, \ldots, y_n) = \prod_{t=1}^{n} P(y_t \mid y_{<t}) = \prod_{t=1}^{n} P(y_t \mid y_{t-1})$$
Independence assumption: the probability of the current word depends only on its label:
$$P(x_1, \ldots, x_n \mid y_1, \ldots, y_n) = \prod_{t=1}^{n} P(x_t \mid x_{<t}, y) = \prod_{t=1}^{n} P(x_t \mid y_t)$$
So, in an HMM, we need to obtain two types of probabilities:
- The transition probabilities: $P(y_t \mid y_{t-1})$
- The emission probabilities: $P(x_t \mid y_t)$
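Combining the two assumptions (and taking $y_0 = \text{start}$), the joint probability the decoder will maximize factorizes into per-position transition and emission terms:
$$P(x, y) = P(x \mid y)\, P(y) = \prod_{t=1}^{n} P(x_t \mid y_t)\, P(y_t \mid y_{t-1})$$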
Parameter Estimation
Using maximum likelihood estimators as in Naïve Bayes (i.e., just counting):
$$P(y_t \mid y_{t-1}) = \frac{c(y_{t-1}, y_t)}{c(y_{t-1})} \qquad\qquad P(x_t \mid y_t) = \frac{c(x_t, y_t)}{c(y_t)}$$
where $c(y_{t-1}, y_t)$ is how many times $y_{t-1}$ and $y_t$ appear together (in adjacent positions) in the training data, $c(y_{t-1})$ is how many times $y_{t-1}$ appears, and $c(x_t, y_t)$ is how many times $x_t$ appears with tag $y_t$.
With smoothing:
$$P(x_t \mid y_t) = \frac{\alpha + c(x_t, y_t)}{V\alpha + c(y_t)}$$
where $V$ is the number of outcomes being smoothed over (the vocabulary size).
How many probabilities do we have?
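A counting-based sketch of these estimators (the dictionary layout, the `<start>`/`<end>` markers, and the add-α value are illustrative assumptions):

```python
from collections import Counter, defaultdict

def estimate_hmm(tagged_sentences, alpha=0.1):
    """Maximum-likelihood transition and (smoothed) emission probabilities."""
    transition_counts = defaultdict(Counter)  # c(y_{t-1}, y_t)
    emission_counts = defaultdict(Counter)    # c(x_t, y_t)
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<start>"
        for word, tag in sentence:
            transition_counts[prev][tag] += 1
            emission_counts[tag][word] += 1
            vocab.add(word)
            prev = tag
        transition_counts[prev]["<end>"] += 1

    def transition(tag, prev_tag):
        counts = transition_counts[prev_tag]
        return counts[tag] / sum(counts.values()) if counts else 0.0

    def emission(word, tag, V=len(vocab)):
        counts = emission_counts[tag]
        # add-alpha smoothing so unseen (word, tag) pairs get a small non-zero probability
        return (alpha + counts[word]) / (V * alpha + sum(counts.values()))

    return transition, emission
```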
Transition Probabilities
Emission Probabilities
Hidden State Network
Transition probabilities for a toy two-state (noun/verb) model with start and end states:
- From start: P(noun | start) = 0.8, P(verb | start) = 0.2
- From noun: P(verb | noun) = 0.8, P(noun | noun) = 0.1, P(end | noun) = 0.1
- From verb: P(noun | verb) = 0.2, P(verb | verb) = 0.1, P(end | verb) = 0.7
Decoding
Given the transition and emission probabilities $P(y_t \mid y_{t-1})$ and $P(x_t \mid y_t)$, we need to find the best label sequence $y^* = y_1^*, y_2^*, \ldots, y_n^*$ for the input sentence $x = x_1, x_2, \ldots, x_n$ via:
$$y^* = \operatorname*{argmax}_y P(y \mid x) = \operatorname*{argmax}_y \frac{P(x, y)}{P(x)} = \operatorname*{argmax}_y P(x, y) = \operatorname*{argmax}_y P(x_1, \ldots, x_n, y_1, \ldots, y_n)$$
This requires enumerating all possible label sequences (paths) $y$, of which there are exponentially many:
- E.g., using the Penn Treebank tag set with 45 tags, a sentence of length 5 has 45^5 = 184,528,125 possible sequences
- A sentence of length 20 has 45^20 ≈ 1.16e33 possible sequences
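To make that cost concrete, a brute-force decoder would literally score every tag sequence; a sketch is shown below (it reuses the `transition`/`emission` callables assumed in the estimation sketch above, and is only feasible for tiny tag sets and very short sentences):

```python
from itertools import product

def brute_force_decode(words, tags, transition, emission, start="<start>", end="<end>"):
    """Score every possible tag sequence and keep the best (O(len(tags)**len(words)) time)."""
    best_seq, best_prob = None, 0.0
    for seq in product(tags, repeat=len(words)):        # exponentially many candidate paths
        prob, prev = 1.0, start
        for word, tag in zip(words, seq):
            prob *= transition(tag, prev) * emission(word, tag)
            prev = tag
        prob *= transition(end, prev)                   # transition into the end state
        if prob > best_prob:
            best_seq, best_prob = list(seq), prob
    return best_seq, best_prob
```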
Greedy Decoder
The simplest decoder (tagger) assigns tags deterministically from left to right
- selects $y_t^*$ to maximize $P(x_t \mid y_t) \cdot P(y_t \mid y_{t-1})$
- does not take advantage of right context
Can we do better?
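A sketch of such a greedy tagger (again assuming the `transition`/`emission` interface used above):

```python
def greedy_decode(words, tags, transition, emission, start="<start>"):
    """Pick the locally best tag at each position, left to right."""
    result, prev = [], start
    for word in words:
        best_tag = max(tags, key=lambda t: emission(word, t) * transition(t, prev))
        result.append(best_tag)
        prev = best_tag
    return result
```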
Viterbi Algorithm
Basic idea: if an optimal path through a sequence uses label $L$ at time $t$, then it must have used an optimal path to get to label $L$ at time $t$.
We can thus discard all non-optimal paths up to label $L$ at time $t$.
Let $v_t(s)$ be the probability that the HMM is in state (label) $s$ after seeing the first $t$ observations (words) and passing through the most probable state sequence $y_1, y_2, \ldots, y_{t-1}$:
$$v_t(s) = \max_{y_1, y_2, \ldots, y_{t-1}} P(x_1, x_2, \ldots, x_t, y_1, y_2, \ldots, y_{t-1}, y_t = s)$$
Introducing the $\text{start}$ and $\text{end}$ states to represent the beginning and the end of the sentence ($y_0 = \text{start}$, $y_{n+1} = \text{end}$), the probability of the optimal label sequence is:
$$v_{n+1}(\text{end}) = \max_{y_1, y_2, \ldots, y_n} P(x_1, x_2, \ldots, x_n, y_0 = \text{start}, y_1, y_2, \ldots, y_n, y_{n+1} = \text{end})$$
Viterbi Algorithm
$$v_t(s) = \max_{y_1, y_2, \ldots, y_{t-1}} P(x_1, x_2, \ldots, x_t, y_0 = \text{start}, y_1, y_2, \ldots, y_{t-1}, y_t = s)$$
Initialization ($t = 0$):
$$v_0(s) = \begin{cases} 1 & \text{if } s = \text{start} \\ 0 & \text{otherwise} \end{cases}$$
Recurrence ($t > 0$):
$$v_t(s) = \max_{s' \in S}\big[v_{t-1}(s')\, P(s \mid s')\, P(x_t \mid s)\big]$$
$$\text{backtrack}_t(s) = \operatorname*{argmax}_{s' \in S}\big[v_{t-1}(s')\, P(s \mid s')\, P(x_t \mid s)\big]$$
Termination ($t = n + 1$): the optimal probability is $v_{n+1}(\text{end})$; follow the backtrack links (starting at $\text{backtrack}_{n+1}(\text{end})$) to retrieve the optimal path.
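A direct implementation sketch of this recurrence (same assumed `transition`/`emission` interface as above; a real tagger would work in log space to avoid underflow):

```python
def viterbi_decode(words, tags, transition, emission, start="<start>", end="<end>"):
    """Dynamic-programming decoder: O(len(tags)**2 * len(words)) time."""
    v = [{start: 1.0}]          # v[t][s] = best probability of any path ending in state s at step t
    backtrack = [{}]
    for t, word in enumerate(words, 1):
        v.append({})
        backtrack.append({})
        for s in tags:
            scores = {s_prev: v[t - 1].get(s_prev, 0.0) * transition(s, s_prev) * emission(word, s)
                      for s_prev in v[t - 1]}
            best_prev = max(scores, key=scores.get)
            v[t][s] = scores[best_prev]
            backtrack[t][s] = best_prev
    # Termination: transition into the end state
    final = {s: v[-1][s] * transition(end, s) for s in v[-1]}
    best_last = max(final, key=final.get)
    # Follow the backtrack pointers to recover the optimal tag sequence
    path = [best_last]
    for t in range(len(words), 1, -1):
        path.append(backtrack[t][path[-1]])
    return list(reversed(path)), final[best_last]
```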
Example
Input sentence: "Fish sleep"
(Using the transition probabilities of the hidden state network above.)
Word Emission Probabilities
Word emission probabilities P(word | state) for a two-word language: "fish" and "sleep"
Suppose in our training corpus,
- "fish" appears 8 times as a noun and 5 times as a verb
- "sleep" appears twice as a noun and 5 times as a verb
Emission probabilities:
- Noun: P(fish | noun) = 0.8, P(sleep | noun) = 0.2
- Verb: P(fish | verb) = 0.5, P(sleep | verb) = 0.5
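These values are just the relative frequencies of each word within its tag, e.g.:
$$P(\text{fish} \mid \text{noun}) = \frac{c(\text{fish}, \text{noun})}{c(\text{noun})} = \frac{8}{8 + 2} = 0.8, \qquad P(\text{fish} \mid \text{verb}) = \frac{5}{5 + 5} = 0.5$$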
Viterbi Probabilities
The Viterbi trellis has a row for each state and a column for each time step:

|       | 0 | 1 | 2 | 3 |
|-------|---|---|---|---|
| start |   |   |   |   |
| verb  |   |   |   |   |
| noun  |   |   |   |   |
| end   |   |   |   |   |
Init (t = 0):

|       | 0 | 1 | 2 | 3 |
|-------|---|---|---|---|
| start | 1 |   |   |   |
| verb  | 0 |   |   |   |
| noun  | 0 |   |   |   |
| end   | 0 |   |   |   |
Token 1: fish

|       | 0 | 1       | 2 | 3 |
|-------|---|---------|---|---|
| start | 1 | 0       |   |   |
| verb  | 0 | .2 * .5 |   |   |
| noun  | 0 | .8 * .8 |   |   |
| end   | 0 | 0       |   |   |
Token 1: fish (products evaluated):

|       | 0 | 1   | 2 | 3 |
|-------|---|-----|---|---|
| start | 1 | 0   |   |   |
| verb  | 0 | .1  |   |   |
| noun  | 0 | .64 |   |   |
| end   | 0 | 0   |   |   |
Token 2: sleep (if "fish" is a verb):

|       | 0 | 1   | 2            | 3 |
|-------|---|-----|--------------|---|
| start | 1 | 0   | 0            |   |
| verb  | 0 | .1  | .1 * .1 * .5 |   |
| noun  | 0 | .64 | .1 * .2 * .2 |   |
| end   | 0 | 0   | -            |   |
Token 2: sleep (if "fish" is a noun):

|       | 0 | 1   | 2                       | 3 |
|-------|---|-----|-------------------------|---|
| start | 1 | 0   | 0                       |   |
| verb  | 0 | .1  | .005 vs. .64 * .8 * .5  |   |
| noun  | 0 | .64 | .004 vs. .64 * .1 * .2  |   |
| end   | 0 | 0   | -                       |   |
Token 2: sleep (both candidates evaluated):

|       | 0 | 1   | 2              | 3 |
|-------|---|-----|----------------|---|
| start | 1 | 0   | 0              |   |
| verb  | 0 | .1  | .005 vs. .256  |   |
| noun  | 0 | .64 | .004 vs. .0128 |   |
| end   | 0 | 0   | -              |   |
Token 2: sleep — take the maximum of the two candidates in each cell and set back pointers.
After taking the maxima:

|       | 0 | 1   | 2     | 3 |
|-------|---|-----|-------|---|
| start | 1 | 0   | 0     |   |
| verb  | 0 | .1  | .256  |   |
| noun  | 0 | .64 | .0128 |   |
| end   | 0 | 0   | -     |   |
Token 3: end

|       | 0 | 1   | 2     | 3                        |
|-------|---|-----|-------|--------------------------|
| start | 1 | 0   | 0     | 0                        |
| verb  | 0 | .1  | .256  | -                        |
| noun  | 0 | .64 | .0128 | -                        |
| end   | 0 | 0   | -     | .256 * .7 vs. .0128 * .1 |
Token 3: end — take the maximum and set back pointers.
Final trellis and decode:

|       | 0 | 1   | 2     | 3         |
|-------|---|-----|-------|-----------|
| start | 1 | 0   | 0     | 0         |
| verb  | 0 | .1  | .256  | -         |
| noun  | 0 | .64 | .0128 | -         |
| end   | 0 | 0   | -     | .256 * .7 |

Decode (following the back pointers): fish = noun, sleep = verb
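For reference, the Viterbi sketch given earlier reproduces these numbers on this toy model (the dictionaries below simply encode the transition and emission probabilities from the slides):

```python
# Toy HMM from the fish/sleep example
transitions = {("<start>", "noun"): 0.8, ("<start>", "verb"): 0.2,
               ("noun", "verb"): 0.8, ("noun", "noun"): 0.1, ("noun", "<end>"): 0.1,
               ("verb", "noun"): 0.2, ("verb", "verb"): 0.1, ("verb", "<end>"): 0.7}
emissions = {("noun", "fish"): 0.8, ("noun", "sleep"): 0.2,
             ("verb", "fish"): 0.5, ("verb", "sleep"): 0.5}

transition = lambda tag, prev: transitions.get((prev, tag), 0.0)
emission = lambda word, tag: emissions.get((tag, word), 0.0)

path, prob = viterbi_decode(["fish", "sleep"], ["noun", "verb"], transition, emission)
print(path, prob)  # ['noun', 'verb'], ≈ 0.1792 (= .256 * .7)
```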
Complexity for Viterbi
Time = $O(s^2 n)$ for $s$ states (labels) and $n$ words.
(Relatively fast: for 40 states and 20 words, 40² × 20 = 32,000 steps.)