Using Error-Correcting Codes For Text Classification
Transcript of Using Error-Correcting Codes For Text Classification
Using Error-Correcting Codes For Text Classification
Rayid Ghani ([email protected])
Center for Automated Learning & Discovery, Carnegie Mellon University
This presentation can be accessed at http://www.cs.cmu.edu/~rayid/icmltalk
Outline

- Review of ECOC
- Previous Work
- Types of Codes
- Experimental Results
- Semi-Theoretical Model
- Drawbacks
- Conclusions & Work in Progress
Overview of ECOC

- Decompose a multiclass problem into multiple binary problems
- The conversion can be independent of or dependent on the data (it does depend on the number of classes)
- Any learner that can learn binary functions can then be used to learn the original multivalued function
ECOC Picture

[Figure: the three classes A, B, and C are each assigned a 4-bit codeword (0111, 0100, 1001) defined by four binary functions f1-f4.]
Training ECOC

- Given m distinct classes
- Create an m x n binary matrix M
- Each class is assigned ONE row of M
- Each column of the matrix divides the classes into TWO groups
- Train the base classifier to learn the n binary problems
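The training steps above can be sketched in a few lines of Python (a minimal illustration, not the talk's actual code: `make_code_matrix` and `binary_labels` are hypothetical names, and the random construction is just one of the code types discussed later):

```python
import random

def make_code_matrix(m, n, seed=0):
    """Randomly draw an m x n binary matrix M: one row (codeword) per class,
    one column per binary subproblem. Reject draws in which some column is
    constant (it would not split the classes into two groups) or two rows
    coincide (two classes would share a codeword)."""
    rng = random.Random(seed)
    while True:
        M = [[rng.randint(0, 1) for _ in range(n)] for _ in range(m)]
        cols_split = all(0 < sum(row[j] for row in M) < m for j in range(n))
        rows_distinct = len({tuple(row) for row in M}) == m
        if cols_split and rows_distinct:
            return M

def binary_labels(y, M, j):
    """Relabel multiclass labels y (ints 0..m-1) for the j-th binary problem."""
    return [M[c][j] for c in y]
```

The base classifier is then trained n times, once on `(X, binary_labels(y, M, j))` for each column j.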
Testing ECOC

To test a new instance:
- Apply each of the n classifiers to the new instance
- Combine the predictions to obtain a binary string (codeword) for the new point
- Classify to the class with the nearest codeword (usually Hamming distance is used as the distance measure)
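The nearest-codeword decoding step can be sketched as follows (an illustrative fragment, assuming the n bit predictions arrive as a 0/1 list; `hamming` and `decode` are hypothetical names):

```python
def hamming(a, b):
    """Number of bit positions in which codewords a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(bits, M):
    """Classify to the class whose codeword (row of M) is nearest, in
    Hamming distance, to the predicted bit string."""
    return min(range(len(M)), key=lambda c: hamming(bits, M[c]))
```

For example, with the codewords 0111, 0100, 1001 as rows of M, a predicted string that matches some codeword except in one flipped bit still decodes to that codeword's class, which is the error-correction property the rest of the talk relies on.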
Previous Work

- Combine with Boosting: ADABOOST.OC (Schapire, 1997)
Types of Codes

- Random
- Algebraic
- Constructed/Meaningful
Experimental Setup

- Generate the code
- Choose a base learner
Dataset

Industry Sector Dataset: consists of company web pages classified into 105 economic sectors.

- Standard stoplist
- No stemming
- Skip all MIME and HTML headers
- Experimental approach similar to McCallum et al. (1997) for comparison purposes
Results

Classification accuracies on five random 50-50 train-test splits of the Industry Sector dataset with a vocabulary size of 10000. ECOC - 88% accurate!

[Figure: Comparison with NBC - bar chart of classification accuracy (%) for the Naive Bayes Classifier vs. 63-bit ECOC over Trials 1-5.]
How does the length of the code matter?

|              | Naive Bayes Classifier | 15-bit ECOC | 31-bit ECOC | 63-bit ECOC |
|--------------|------------------------|-------------|-------------|-------------|
| Accuracy (%) | 65.3                   | 77.4        | 83.6        | 88.1        |

Table 2: Average classification accuracy on 5 random 50-50 train-test splits of the Industry Sector dataset with a vocabulary size of 10000 words selected using Information Gain.

- Longer codes mean larger codeword separation
- The minimum Hamming distance of a code C is the smallest distance between any pair of distinct codewords in C
- If the minimum Hamming distance is h, then the code can correct up to floor((h-1)/2) errors
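These two quantities are easy to compute directly; a minimal sketch (illustrative names and toy codewords, not the codes used in the experiments):

```python
from itertools import combinations

def min_hamming_distance(M):
    """Smallest Hamming distance between any pair of distinct codewords."""
    return min(sum(x != y for x, y in zip(a, b))
               for a, b in combinations(M, 2))

def correctable_errors(h):
    """A code with minimum Hamming distance h corrects floor((h-1)/2) errors."""
    return (h - 1) // 2
```

This matches the pairs reported in the next slide's table: minimum distances of 5, 11, and 31 give 2, 5, and 15 correctable errors respectively.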
Theoretical Evidence

Model ECOC by a binomial distribution B(n, p), where n = length of the code and p = probability of each bit being classified incorrectly (the table's p(average) column reports the average per-bit accuracy, i.e. 1-p).

| No. of bits | Training size | Hmin | Emin | p (average) | Theoretical | Experimental |
|-------------|---------------|------|------|-------------|-------------|--------------|
| 15          | 20            | 5    | 2    | 0.846       | 58.68       | 64.54        |
| 15          | 50            | 5    | 2    | 0.895       | 79.64       | 77.37        |
| 15          | 80            | 5    | 2    | 0.907       | 84.23       | 79.42        |
| 31          | 20            | 11   | 5    | 0.847       | 66.53       | 71.76        |
| 31          | 50            | 11   | 5    | 0.899       | 91.34       | 83.57        |
| 31          | 80            | 11   | 5    | 0.908       | 93.97       | 84.76        |
| 63          | 50            | 31   | 15   | 0.897       | 99.95       | 88.12        |
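Under this model, the Theoretical column in the table is the binomial probability that at most Emin of the n bits are misclassified (so decoding still recovers the right class). A small sketch that reproduces it (`theoretical_accuracy` is an illustrative name; `p_bit` is the average per-bit accuracy from the table):

```python
from math import comb

def theoretical_accuracy(n, p_bit, e_max):
    """P(at most e_max of n independent bits are wrong), with the number of
    bit errors modelled as Binomial(n, 1 - p_bit)."""
    q = 1.0 - p_bit
    return sum(comb(n, k) * q**k * p_bit**(n - k) for k in range(e_max + 1))
```

For the first table row, `theoretical_accuracy(15, 0.846, 2)` gives approximately 0.587, matching the reported 58.68%.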
[Figure: Theoretical vs. experimental accuracy (%), vocabulary size = 10000, for codes of length 15, 31, and 63 bits.]
Size Matters?

[Figure: Variation of accuracy (%) with code length and training size per class, for SBC and 15-bit, 31-bit, and 63-bit codes.]
Size does NOT matter!

[Figure: Percent decrease in error with training size and length of code, for 15-bit, 31-bit, and 63-bit codes.]
Choosing Codes
Interesting Observations

- NBC does not give good probability estimates; using ECOC results in better estimates.
Drawbacks

- Can be computationally expensive
- Random codes throw away the real-world nature of the data by picking random partitions to create artificial binary problems
Conclusion

- Improves classification accuracy considerably!
- Extends a binary learner to a multiclass learner
- Can be used when training data is sparse
Future Work

- Use meaningful codes (a hierarchy, or codes distinguishing between particularly difficult classes)
- Use artificial datasets
- Combine ECOC with Co-Training or Shrinkage Methods
- Sufficient and necessary conditions for optimal behavior