Taking the Kitchen Sink Seriously:
An Ensemble Approach to Word Sense Disambiguation
by Christopher Manning et al.
Overview
● 23 student WSD projects combined in a 2-layer voting scheme (an ensemble of ensemble classifiers).
● Performed well on SENSEVAL-2: 4th place out of 21 supervised systems on the English Lexical Sample task.
● Offers some valuable lessons for both WSD and ensemble methods in general.
System Overview
● 23 different "1st order" classifiers.
– Independently developed WSD systems.
– Use a variety of algorithms (naïve Bayes, n-gram, etc.).
● These 1st order classifiers are combined into a variety of 2nd order classifiers/voting mechanisms.
– 2nd order classifiers vary with respect to:
● The algorithm used to combine the 1st order classifiers.
● The number of voters: each 2nd order classifier takes the top k 1st order classifiers, where k is one of {1, 3, 5, 7, 9, 11, 13, 15}.
Voting Algorithms
● Majority vote (each vote has weight 1).
● Weighted voting, with weights determined by EM.
– Tries to choose weights that maximize the likelihood of 2nd order training instances, where the probability of a sense (given the votes) is defined as the sum of weighted votes for that sense.
● Maximum entropy using features derived from the votes of the 1st order classifiers.
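As a rough illustration of the first two schemes, here is a minimal Python sketch; the classifier names, senses, and weights are invented, and in the paper the weights come from EM rather than being fixed by hand:

```python
from collections import defaultdict

def majority_vote(votes):
    """Each 1st order classifier casts one vote of weight 1."""
    tally = defaultdict(int)
    for sense in votes.values():
        tally[sense] += 1
    return max(tally, key=tally.get)

def weighted_vote(votes, weights):
    """P(sense | votes) taken as the normalized sum of the weights of the
    classifiers that voted for that sense (the paper learns weights by EM)."""
    tally = defaultdict(float)
    for clf, sense in votes.items():
        tally[sense] += weights[clf]
    total = sum(tally.values())
    probs = {s: w / total for s, w in tally.items()}
    return max(probs, key=probs.get), probs

# Toy example with invented classifier names, senses, and weights.
votes = {"nb": "bank/river", "ngram": "bank/finance", "knn": "bank/finance"}
print(majority_vote(votes))                                              # bank/finance
print(weighted_vote(votes, {"nb": 0.6, "ngram": 0.25, "knn": 0.15})[0])  # bank/river
```

The toy output shows why the weights matter: the two schemes can disagree whenever a minority voter carries enough learned weight.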
Classifier Construction Process
● For each word:
– Train each 1st order classifier on ¾ of the training data.
– Use the remaining ¼ to rank the performance of the 1st order classifiers.
– For each 2nd order classifier:
● Take the top k 1st orders for this word.
● Train the 2nd order on ¾ of the training data using this ensemble.
– Rank the performance of the 2nd orders with the held-out ¼ of training data.
– Take the top 2nd order as the classifier for this word; retrain it on all the training data.
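Putting the steps together, here is a hedged Python sketch of this per-word pipeline; the fit/predict interface, the factory functions for 2nd order combiners, and all names are assumptions for illustration, not the paper's code:

```python
# Candidate ensemble sizes, as listed in the system overview.
KS = (1, 3, 5, 7, 9, 11, 13, 15)

def accuracy(clf, data):
    return sum(clf.predict(x) == y for x, y in data) / len(data)

def build_classifier_for_word(first_orders, second_order_factories, data):
    cut = 3 * len(data) // 4
    train, heldout = data[:cut], data[cut:]        # 3/4 train, 1/4 for ranking
    # Train every 1st order classifier, then rank them on the held-out quarter.
    trained = [clf.fit(train) for clf in first_orders]
    ranked = sorted(trained, key=lambda c: accuracy(c, heldout), reverse=True)
    # Build every 2nd order candidate: one per (combination algorithm, k) pair.
    candidates = []
    for make_second_order in second_order_factories:
        for k in KS:
            second = make_second_order(ranked[:k]).fit(train)
            candidates.append((accuracy(second, heldout), second))
    best = max(candidates, key=lambda c: c[0])[1]
    return best.fit(data)                          # retrain the winner on all data
```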
Results
● 61.7% accuracy in SENSEVAL-2 competition (4th place).
● After competition, improved performance:
– Used global performance (i.e., over all words) as a tie breaker for the rankings of both 1st and 2nd order classifiers (sketched below).
– Improved accuracy to 63.9% (would have been 2nd).
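A minimal sketch of that tie-breaking idea, assuming per-word and global held-out accuracies are already computed and keyed by classifier name:

```python
def rank_with_global_tiebreak(names, word_score, global_score):
    # Per-word accuracy is the primary key; global accuracy breaks ties.
    return sorted(names, key=lambda n: (word_score[n], global_score[n]),
                  reverse=True)
```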
Results for 2nd Order Classifiers
● Results are averaged over all words.
● Note MaxEnt's ability to resist dilution as weaker voters are added (larger k).
Evaluating Effects of Combination
● We want different classifiers to make different mistakes.
● We can measure this differentiation (error independence) as the average, over all pairs of 1st order classifiers, of the fraction of errors that are shared: the smaller the shared fraction, the more independent the errors (a sketch of one such measure follows below).
● When error independence and word difficulty grow, the advantage of combination grows.
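The slide does not pin down the exact denominator, so here is one plausible Python sketch of the measure, using the overlap of two classifiers' error sets as a fraction of their combined errors:

```python
from itertools import combinations

def mean_shared_error_fraction(errors):
    """`errors[name]` is the set of test-instance ids that classifier
    `name` got wrong. Averages the pairwise error overlap; lower values
    mean more independent errors, and more to gain from combination."""
    fractions = []
    for a, b in combinations(errors, 2):
        union = errors[a] | errors[b]
        if union:
            fractions.append(len(errors[a] & errors[b]) / len(union))
    return sum(fractions) / len(fractions)

# Toy example: "nb" and "ngram" share two errors, "knn" errs elsewhere.
errors = {"nb": {1, 2, 3}, "ngram": {2, 3, 4}, "knn": {5, 6}}
print(mean_shared_error_fraction(errors))  # ~0.167: fairly independent
```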
Lessons for WSD
● Every word is a separate problem.
– All 1st and 2nd order classifiers had some words on which they did the best.
● Implementation details:
– Large or small window sizes work better than medium window sizes (see the sketch after this list).
– This suggests that senses are determined both at a very local, collocational level and at a very general, topical level.
– Smoothing is very important.
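To make the window-size point concrete, here is an illustrative (invented) sketch of bag-of-words context extraction at the two extremes:

```python
def window_features(tokens, i, size):
    """Bag of words within `size` positions of the target token at index i."""
    lo, hi = max(0, i - size), min(len(tokens), i + size + 1)
    return {tokens[j] for j in range(lo, hi) if j != i}

tokens = "he sat on the bank of the river watching the water".split()
print(window_features(tokens, 4, 2))   # narrow, collocational: {'on', 'the', 'of'}
print(window_features(tokens, 4, 25))  # wide, topical: whole sentence minus 'bank'
```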
Lessons for Ensemble Methods
● Variety within the ensemble is desirable.
– Qualitatively different approaches are better than minor perturbations of similar approaches.
– We can measure the extent to which this ideal is achieved.
● Variety in combination algorithms helps as well.
– In particular, it can help with overfitting (because different algorithms will start overtraining at different points).