School of Computing, Faculty of Engineering
Machine Learning PoS-Taggers
COMP3310 Natural Language Processing
Eric Atwell, Language Research Group
(with thanks to Katja Markert, Marti Hearst, and other contributors)
Reminder
Puns play on our assumptions about the next word…
… e.g., they present us with an unexpected homonym (bends)
ConditionalFreqDist() counts word-pairs: word bigrams
Used for story generation, speech recognition, …
Parts of Speech: groups words into grammatical categories
… and separates different functions of a word
In English, many words are ambiguous: 2 or more PoS-tags
Very simple tagger: tag with the likeliest tag for the word
Better PoS-taggers: to come…
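`ConditionalFreqDist()` is essentially a table of frequency counts indexed by a condition. A minimal standard-library sketch of the same idea (not NLTK's implementation) uses `collections.Counter` with the previous word as the condition, so each entry counts the words seen right after it; the sentence is a made-up example.

```python
from collections import Counter, defaultdict

def bigram_counts(words):
    """Map each word (the condition) to a Counter of the words that follow it."""
    cfd = defaultdict(Counter)
    for prev, curr in zip(words, words[1:]):
        cfd[prev][curr] += 1
    return cfd

words = "the cat sat on the mat and the cat slept".split()
cfd = bigram_counts(words)
print(cfd["the"].most_common(1))  # [('cat', 2)]
```

The most common follower of a word is the "expected" next word that a pun can subvert.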
Taking context into account
Theory behind some example Machine Learning PoS-taggers
Example implementations in NLTK
Machine Learning from a PoS-tagged training corpus
Statistical (N-Gram/Markov) taggers:
learn table of 1/2/3/N-tag sequence frequencies
Brill (transformation-based) tagger:
learn likeliest tag for each word ignoring context,
then learn rules to change tag to fit context
NB you don’t have to use NLTK – just useful to illustrate
Training and Testing of Machine Learning Algorithms
Algorithms that “learn” from data see a set of examples and try to generalize from them.
Training set:
• Examples trained on
Test set:
• Also called held-out data and unseen data
• Use this for evaluating your algorithm
• Must be separate from the training set; otherwise, you cheated!
“Gold standard” evaluation corpus
• An evaluation set that a community has agreed on and uses as a common benchmark.
• Not “seen” until development is finished – ONLY for evaluation
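A minimal sketch of holding out a test set, assuming the corpus is a list of tagged sentences. The `train_test_split` helper, the 25% split, and the fixed shuffle seed are illustrative choices, not part of the original slides; splitting by document or genre instead of shuffling is also common.

```python
import random

def train_test_split(tagged_sents, test_fraction=0.1, seed=0):
    """Shuffle a copy of the sentences and hold out a fraction as test data."""
    sents = list(tagged_sents)
    random.Random(seed).shuffle(sents)  # fixed seed: reproducible split
    n_test = max(1, int(len(sents) * test_fraction))
    return sents[n_test:], sents[:n_test]  # (training set, test set)

# Hypothetical toy corpus: 20 one-word tagged "sentences"
corpus = [[("word%d" % i, "NN")] for i in range(20)]
train, test = train_test_split(corpus, test_fraction=0.25)
print(len(train), len(test))  # 15 5
```

Because the two sets are disjoint slices of one shuffled list, no test sentence can leak into training.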
Cross-Validation of Learning Algorithms
Cross-validation set
• Part of the training set.
Used for tuning parameters of the algorithm without “polluting” (tuning to) the test data.
• You can train on x%, and then cross-validate on the remaining (100-x)%
• E.g., train on 90% of the training data, cross-validate (test) on the remaining 10%
• Repeat several times with different splits
• This allows you to choose the best settings to then use on the real test set.
• You should only evaluate on the test set at the very end, after you’ve gotten your algorithm as good as possible on the cross-validation set.
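Repeating the 90%/10% split so that every part of the training data is used exactly once for validation gives k-fold cross-validation. A sketch with a hypothetical `k_fold_splits` helper; forming folds by interleaved slicing (`data[i::k]`) is one arbitrary choice, and contiguous blocks work just as well.

```python
def k_fold_splits(data, k=10):
    """Yield (train, validation) pairs; each item is validated on exactly once."""
    data = list(data)
    for i in range(k):
        validation = data[i::k]  # every k-th item, starting at offset i
        train = [x for j, x in enumerate(data) if j % k != i]
        yield train, validation

data = list(range(20))
folds = list(k_fold_splits(data, k=10))
print(len(folds), folds[0][1])  # 10 [0, 10]
```

To tune a setting, average its validation score over all folds and keep the best setting for the final test-set run.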
Strong Baselines
When designing NLP algorithms, you need to evaluate them by comparing to others.
Baseline Algorithm:
• An algorithm that is relatively simple but can be expected to do “ok”
• Should get the best score possible by doing the obvious thing.
A Tagging Baseline
Find the most likely tag for the most frequent words
• Frequent words are ambiguous
• You’re likely to see frequent words in any collection
• Will always see “to” but might not see “armadillo”
How to do this?
• First find the most frequent words and their most likely tags in the training data
• Train a tagger that looks up these results in a table
Find the most frequent words and the most likely tag of each
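The slide's own code is not in this transcript. As a stand-in, here is a sketch with a hypothetical `most_likely_tags` helper that counts words and, for each of the `top_n` most frequent ones, keeps its most frequent tag; the tagged words are toy data. Ties between equally frequent tags are broken by first occurrence, since that is how `Counter.most_common` orders equal counts.

```python
from collections import Counter, defaultdict

def most_likely_tags(tagged_words, top_n):
    """Return {word: most frequent tag} for the top_n most frequent words."""
    word_freq = Counter(word for word, _ in tagged_words)
    tag_freq = defaultdict(Counter)  # word -> Counter of the tags seen with it
    for word, tag in tagged_words:
        tag_freq[word][tag] += 1
    return {word: tag_freq[word].most_common(1)[0][0]
            for word, _ in word_freq.most_common(top_n)}

# Hypothetical toy data: a flat list of (word, tag) pairs
tagged_words = [("the", "DT"), ("race", "NN"), ("to", "TO"), ("race", "VB"),
                ("the", "DT"), ("race", "NN"), ("armadillo", "NN")]
print(most_likely_tags(tagged_words, top_n=2))  # {'race': 'NN', 'the': 'DT'}
```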
Use our own tagger class
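The tagger class from the slide is also missing from the transcript. Below is a hypothetical `LookupTagger` that looks words up in a `{word: tag}` table and returns `None` for words outside it. In NLTK, a similar lookup tagger can be built by passing such a table as `nltk.UnigramTagger(model=...)`.

```python
class LookupTagger:
    """Tag words from a fixed {word: tag} table; unknown words get None."""

    def __init__(self, model):
        self.model = dict(model)

    def tag(self, words):
        return [(w, self.model.get(w)) for w in words]

tagger = LookupTagger({"the": "DT", "to": "TO", "race": "NN"})
print(tagger.tag(["the", "armadillo", "race"]))
# [('the', 'DT'), ('armadillo', None), ('race', 'NN')]
```

Returning `None` rather than guessing makes it easy to hand unknown words to another tagger later.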
N-Grams
The N stands for how many terms are used
• Unigram: 1 term (0th order)
• Bigram: 2 terms (1st order)
• Trigram: 3 terms (2nd order)
• Usually don’t go beyond this
You can use different kinds of terms, e.g.:
• Character based n-grams
• Word-based n-grams
• POS-based n-grams
Ordering
• Often adjacent, but not required
We use n-grams to help determine the context in which some linguistic phenomenon happens.
E.g., look at the words before and after a period to decide whether it marks the end of a sentence.
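A minimal sketch of extracting contiguous n-grams from any sequence, so the same function covers word, character, and PoS-tag n-grams. The `ngrams` helper here is illustrative; NLTK provides `nltk.util.ngrams` for the same purpose.

```python
def ngrams(seq, n):
    """Return the contiguous n-grams of seq as a list of tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

print(ngrams("to race tomorrow".split(), 2))  # [('to', 'race'), ('race', 'tomorrow')]
print(ngrams("race", 3))                      # [('r', 'a', 'c'), ('a', 'c', 'e')]
print(ngrams(["TO", "VB", "NN"], 2))          # [('TO', 'VB'), ('VB', 'NN')]
```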
Tagging with lexical frequencies
Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NN
People/NNS continue/VBP to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN
Problem: assign a tag to race given its lexical frequency
Solution: we choose the tag that has the greater probability
• P(race|VB)
• P(race|NN)
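One step is worth spelling out: the lexical likelihoods P(race|VB) and P(race|NN) are compared after weighting each by how frequent the tag is. By Bayes' rule, and because the denominator P(race) is the same for both tags:

$$
\hat{t} = \operatorname*{arg\,max}_{t \in \{\mathrm{VB},\,\mathrm{NN}\}} P(t \mid \mathit{race})
        = \operatorname*{arg\,max}_{t} \frac{P(\mathit{race} \mid t)\,P(t)}{P(\mathit{race})}
        = \operatorname*{arg\,max}_{t} P(\mathit{race} \mid t)\,P(t)
$$

A Markov (HMM) tagger replaces the prior P(t) with a transition probability that depends on the previous tag, such as P(VB|TO).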
Unigram Tagger
Train on a set of sentences
Keep track of how many times each word is seen with each tag.
After training, associate with each word its most likely tag.
• Problem: many words never seen in the training data.
• Solution: have a default tag to “back off” to.
Unigram tagger with Backoff
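The slide's code is not in the transcript. Below is a simplified standard-library sketch of a unigram tagger that falls back to a default tag for unseen words. The class names echo NLTK's, but this is not NLTK's code. In NLTK the same setup is written along the lines of `nltk.UnigramTagger(train_sents, backoff=nltk.DefaultTagger('NN'))`. The training sentences are toy data.

```python
from collections import Counter, defaultdict

class DefaultTagger:
    """Last resort: assign the same tag to every word."""
    def __init__(self, tag):
        self.default = tag

    def choose_tag(self, word):
        return self.default

class UnigramTagger:
    """Tag each word with its most frequent training tag; back off for unseen words."""
    def __init__(self, train_sents, backoff=None):
        counts = defaultdict(Counter)  # word -> Counter of tags seen with it
        for sent in train_sents:
            for word, tag in sent:
                counts[word][tag] += 1
        self.model = {w: c.most_common(1)[0][0] for w, c in counts.items()}
        self.backoff = backoff

    def choose_tag(self, word):
        tag = self.model.get(word)  # None if the word was never seen
        if tag is None and self.backoff is not None:
            tag = self.backoff.choose_tag(word)
        return tag

    def tag(self, words):
        return [(w, self.choose_tag(w)) for w in words]

# Hypothetical toy training data (Penn Treebank-style tags)
train = [[("the", "DT"), ("race", "NN"), ("is", "VBZ"), ("on", "IN")],
         [("to", "TO"), ("race", "VB")],
         [("a", "DT"), ("race", "NN")]]

plain = UnigramTagger(train)
backed_off = UnigramTagger(train, backoff=DefaultTagger("NN"))
print(plain.tag(["the", "armadillo"]))       # [('the', 'DT'), ('armadillo', None)]
print(backed_off.tag(["the", "armadillo"]))  # [('the', 'DT'), ('armadillo', 'NN')]
```

Note that `backed_off.tag(["to", "race"])` still tags "race" as NN: a unigram tagger ignores context.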
What’s wrong with unigram?
Most frequent tag isn’t always right!
Need to take the context into account
• Which sense of “to” is being used?
• Which sense of “like” is being used?
N-gram tagger
Uses the preceding N-1 predicted tags
Also uses the current word itself, as the unigram tagger does
Bigram Tagging
• For tagging, in addition to considering the token’s type (the word itself), the context also includes the tag of the preceding token (in general, the tags of the N-1 preceding tokens)
• What is the most likely tag for word n, given word n itself and tag n-1?
• The tagger picks the tag which is most likely for that context.
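Below is a sketch of an n-gram tagger without backoff. It assumes, as NLTK's n-gram taggers do, that the context for word i is the tuple of the previous n-1 predicted tags plus word i itself. The `NgramTagger` class and the training sentences are simplified illustrations, not NLTK's code. With `n=2`, the previous tag is enough to separate the verb and noun uses of "race".

```python
from collections import Counter, defaultdict

class NgramTagger:
    """Pick the likeliest tag for word i given the previous n-1 tags and word i."""
    def __init__(self, n, train_sents):
        self.n = n
        counts = defaultdict(Counter)  # (previous tags, word) -> Counter of tags
        for sent in train_sents:
            tags = [t for _, t in sent]
            for i, (word, tag) in enumerate(sent):
                context = (tuple(tags[max(0, i - n + 1):i]), word)
                counts[context][tag] += 1
        self.model = {c: tc.most_common(1)[0][0] for c, tc in counts.items()}

    def tag(self, words):
        tags = []
        for i, word in enumerate(words):
            context = (tuple(tags[max(0, i - self.n + 1):i]), word)
            tags.append(self.model.get(context))  # None if the context is unseen
        return list(zip(words, tags))

# Hypothetical toy training data
train = [[("they", "PRP"), ("want", "VBP"), ("to", "TO"), ("race", "VB")],
         [("the", "DT"), ("race", "NN"), ("is", "VBZ"), ("on", "IN")]]

bigram = NgramTagger(2, train)
print(bigram.tag(["they", "want", "to", "race"]))
# [('they', 'PRP'), ('want', 'VBP'), ('to', 'TO'), ('race', 'VB')]
print(bigram.tag(["the", "race"]))  # [('the', 'DT'), ('race', 'NN')]
```

Because the word at the start of a sentence has no previous tag, its context uses an empty tuple.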
Combining Taggers using Backoff
Use more accurate algorithms when we can; back off to wider coverage when needed.
• Try tagging the token with the 1st order tagger.
• If the 1st order tagger is unable to find a tag for the token, try finding a tag with the 0th order tagger.
• If the 0th order tagger is also unable to find a tag, use the default tagger to find a tag.
Important point:
• Bigram and trigram taggers use the previous tag context to assign new tags. If they see a tag of “None” in the previous context, they will output “None” too.
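The backoff chain described above can be sketched by giving each tagger an optional `backoff` that it consults whenever its own context is unseen. This is a simplified illustration: the class names echo NLTK's, but the code is not NLTK's, and the training data is a toy corpus. The bigram tagger without backoff turns one unknown word into a run of `None` tags. The chain of bigram, then unigram, then default tags every word.

```python
from collections import Counter, defaultdict

class DefaultTagger:
    """Assigns one fixed tag to every word."""
    def __init__(self, tag):
        self.default = tag

    def choose_tag(self, words, i, history):
        return self.default

class NgramTagger:
    """Context = (previous n-1 tags, current word); unseen contexts go to `backoff`."""
    def __init__(self, n, train_sents, backoff=None):
        self.n, self.backoff = n, backoff
        counts = defaultdict(Counter)
        for sent in train_sents:
            words = [w for w, _ in sent]
            tags = [t for _, t in sent]
            for i in range(len(sent)):
                counts[self.context(words, i, tags)][tags[i]] += 1
        self.model = {c: tc.most_common(1)[0][0] for c, tc in counts.items()}

    def context(self, words, i, history):
        return (tuple(history[max(0, i - self.n + 1):i]), words[i])

    def choose_tag(self, words, i, history):
        tag = self.model.get(self.context(words, i, history))
        if tag is None and self.backoff is not None:
            tag = self.backoff.choose_tag(words, i, history)
        return tag

def tag(tagger, words):
    """Tag left to right, feeding each predicted tag into the history."""
    history = []
    for i in range(len(words)):
        history.append(tagger.choose_tag(words, i, history))
    return list(zip(words, history))

# Hypothetical toy training data
train = [[("they", "PRP"), ("want", "VBP"), ("to", "TO"), ("race", "VB")],
         [("the", "DT"), ("race", "NN"), ("is", "VBZ"), ("on", "IN")]]

bigram_only = NgramTagger(2, train)
t0 = DefaultTagger("NN")
t1 = NgramTagger(1, train, backoff=t0)
t2 = NgramTagger(2, train, backoff=t1)

sentence = ["we", "want", "to", "race"]
print(tag(bigram_only, sentence))
# [('we', None), ('want', None), ('to', None), ('race', None)]
print(tag(t2, sentence))
# [('we', 'NN'), ('want', 'VBP'), ('to', 'TO'), ('race', 'VB')]
```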
Demonstrating the n-gram taggers
Trained on brown.tagged(‘a’), tested on brown.tagged(‘b’)
Backs off to a default of ‘nn’
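The results table for this demonstration is not in the transcript, so no scores are reproduced here. The sketch below only shows how tagging accuracy on held-out data is computed. The `accuracy` helper and the two test sentences are hypothetical, and they use lowercase Brown-style tags to match the slide's ‘nn’ default.

```python
def accuracy(tag_fn, gold_sents):
    """Fraction of tokens whose predicted tag matches the gold-standard tag."""
    correct = total = 0
    for sent in gold_sents:
        words = [w for w, _ in sent]
        predicted = tag_fn(words)
        for (_, gold), (_, guess) in zip(sent, predicted):
            correct += gold == guess
            total += 1
    return correct / total if total else 0.0

def default_nn(words):
    """Baseline: tag every word 'nn'."""
    return [(w, "nn") for w in words]

test_sents = [[("the", "at"), ("race", "nn")], [("to", "to"), ("race", "vb")]]
print(accuracy(default_nn, test_sents))  # 0.25
```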
Combining Taggers
The bigram backoff tagger did worse than the unigram! Why?
Why does it get better again with trigrams?
How can we improve these scores?
Rule-Based Tagger
The Linguistic Complaint
• Where is the linguistic knowledge of a tagger?
• Just a massive table of numbers
• Aren’t there any linguistic insights that could emerge from the data?
• We could thus use handcrafted rules to tag input sentences; for example, if a word follows a determiner, tag it as a noun.
• Constraint Grammar (CG) tagger: a PhD student spends 3+ years coding a large set of these rules (for English, Finnish, …)
• Machine Learning researchers would prefer to use ML to extract a large set of such rules from a PoS-tagged training corpus
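Here is a hand-crafted rule of the kind described above, as a minimal sketch. The `apply_determiner_rule` helper and the first-pass tags are hypothetical. The second call shows why hand-written rule sets grow so large: this rule wrongly turns the adjective in "the big race" into a noun.

```python
def apply_determiner_rule(tagged):
    """Hand-written rule: retag the word right after a determiner (DT) as a noun (NN)."""
    out = list(tagged)
    for i in range(1, len(out)):
        if out[i - 1][1] == "DT":
            out[i] = (out[i][0], "NN")
    return out

# Hypothetical first-pass tags, e.g. from a lookup table
print(apply_determiner_rule([("the", "DT"), ("race", "VB"), ("is", "VBZ")]))
# [('the', 'DT'), ('race', 'NN'), ('is', 'VBZ')]
print(apply_determiner_rule([("the", "DT"), ("big", "JJ"), ("race", "NN")]))
# [('the', 'DT'), ('big', 'NN'), ('race', 'NN')]  <- wrong for the adjective
```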
The Brill tagger
An example of Transformation-Based Learning
• Basic idea: do a quick job first (using frequency), then revise it using contextual rules.
Very popular (freely available, works fairly well)
A supervised method: requires a tagged corpus
Brill Tagging: In more detail
Start with simple (less accurate) rules… learn better ones from a tagged corpus
• Tag each word initially with its most likely PoS
• Examine a set of candidate transformations to see which most improves tagging decisions compared to the tagged corpus
• Re-tag the corpus using the best transformation
• Repeat until, e.g., performance doesn’t improve
• Result: tagging procedure (ordered list of transformations) which can be applied to new, untagged text
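The learning loop can be sketched as greedy transformation-based learning. This is a much-simplified illustration with several simplifications:

* It uses a single rule template ("change tag a to b when the previous tag is c").
* It tries every tag triple exhaustively.
* It applies each rule to all positions at once, based on the tags before the rule fires.
* It runs on a toy corpus.

Brill's tagger uses many templates over neighbouring words and tags. NLTK ships a fuller implementation in its `nltk.tag.brill` and `nltk.tag.brill_trainer` modules.

```python
from collections import Counter, defaultdict

def apply_rule(sents, rule):
    """Rule (a, b, c): change tag a to b when the previous tag is c.
    Conditions are checked against the tags before the rule fires."""
    a, b, c = rule
    result = []
    for tags in sents:
        new = list(tags)
        for i in range(1, len(tags)):
            if tags[i] == a and tags[i - 1] == c:
                new[i] = b
        result.append(new)
    return result

def errors(sents, gold):
    """Number of tokens whose tag differs from the gold tag."""
    return sum(t != g for tags, gtags in zip(sents, gold)
               for t, g in zip(tags, gtags))

def learn_rules(tagged_sents, max_rules=10):
    """Greedy TBL with one 'previous tag' template; returns (lexicon, ordered rules)."""
    words = [[w for w, _ in s] for s in tagged_sents]
    gold = [[t for _, t in s] for s in tagged_sents]
    # Step 1: initial tagging with each word's most frequent tag
    counts = defaultdict(Counter)
    for ws, ts in zip(words, gold):
        for w, t in zip(ws, ts):
            counts[w][t] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    tags = [[lexicon[w] for w in ws] for ws in words]
    tagset = sorted({t for ts in gold for t in ts})
    rules = []
    # Remaining steps: find the transformation that most reduces errors, apply it, repeat
    for _ in range(max_rules):
        best, best_err = None, errors(tags, gold)
        for a in tagset:
            for b in tagset:
                if a == b:
                    continue
                for c in tagset:
                    err = errors(apply_rule(tags, (a, b, c)), gold)
                    if err < best_err:
                        best, best_err = (a, b, c), err
        if best is None:  # no candidate improves the tagging: stop
            break
        rules.append(best)
        tags = apply_rule(tags, best)
    return lexicon, rules

# Hypothetical toy tagged corpus
corpus = [[("they", "PRP"), ("want", "VBP"), ("to", "TO"), ("race", "VB")],
          [("the", "DT"), ("race", "NN"), ("is", "VBZ"), ("on", "IN")],
          [("a", "DT"), ("race", "NN")]]
lexicon, rules = learn_rules(corpus)
print(lexicon["race"], rules)  # NN [('NN', 'VB', 'TO')]
```

The learned output is the lexicon plus an ordered rule list. To tag new text, look each word up in the lexicon, then apply the rules in order.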
An example
Examples:
• They are expected to race tomorrow.
• The race for outer space.
Tagging algorithm:
1. Tag all uses of “race” as NN (most likely tag in the Brown corpus)
• They are expected to race/NN tomorrow
• the race/NN for outer space
2. Use a transformation rule to replace the tag NN with VB for all uses of “race” preceded by the tag TO:
• They are expected to race/VB tomorrow
• the race/NN for outer space
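The two steps of the example can be sketched as follows. The tags on words other than "race" are assumed Penn Treebank-style first-pass tags. `nn_to_vb_after_to` is a hypothetical helper that encodes only this one rule.

```python
def nn_to_vb_after_to(tagged):
    """Slide's rule: retag 'race' from NN to VB when the previous tag is TO."""
    out = list(tagged)
    for i in range(1, len(out)):
        word, tag = out[i]
        if word.lower() == "race" and tag == "NN" and out[i - 1][1] == "TO":
            out[i] = (word, "VB")
    return out

# Step 1 output (race/NN everywhere); other tags are hypothetical
s1 = [("They", "PRP"), ("are", "VBP"), ("expected", "VBN"), ("to", "TO"),
      ("race", "NN"), ("tomorrow", "NN")]
s2 = [("The", "DT"), ("race", "NN"), ("for", "IN"), ("outer", "JJ"), ("space", "NN")]
print(nn_to_vb_after_to(s1)[4])  # ('race', 'VB')
print(nn_to_vb_after_to(s2)[1])  # ('race', 'NN')
```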
Example Rule Transformations
Sample Final Rules
Summary: N-gram/Markov and Transformation/Brill PoS-Taggers
Theory behind some example Machine Learning PoS-taggers
Example implementations in NLTK
Machine Learning from a PoS-tagged training corpus
Statistical (N-Gram/Markov) taggers:
learn table of 1/2/3/N-tag sequence frequencies
If not enough data for N, back off to N-1 patterns
Brill (transformation-based) tagger:
learn likeliest tag for each word ignoring context,
then learn rules to change tag to fit context
NB you don’t have to use NLTK – just useful to illustrate