ECOC for Text Classification Hybrids of EM & Co-Training (with Kamal Nigam) Learning to build a...

  • date post

    20-Dec-2015

Page 1: ECOC for Text Classification Hybrids of EM & Co-Training (with Kamal Nigam) Learning to build a monolingual corpus from the web (with Rosie Jones) Effect.

ECOC for Text Classification

Hybrids of EM & Co-Training (with Kamal Nigam)

Learning to build a monolingual corpus from the web (with Rosie Jones)

Effect of Smoothing on Naive Bayes for text classification (with Tong Zhang)

Hypertext Categorization using link and extracted information (with Sean Slattery & Yiming Yang)

Some Recent work

Page 2:

Using Error-Correcting Codes For Text Classification

Rayid Ghani
Center for Automated Learning & Discovery
Carnegie Mellon University

This presentation can be accessed at http://www.cs.cmu.edu/~rayid/talks/

Page 3:

Outline
  • Introduction to ECOC
  • Intuition & Motivation
  • Some Questions?
  • Experimental Results
  • Semi-Theoretical Model
  • Types of Codes
  • Drawbacks
  • Conclusions

Page 4:

Introduction
  • Decompose a multiclass classification problem into multiple binary problems
  • One-Per-Class Approach (moderately expensive)
  • All-Pairs (very expensive)
  • Distributed Output Code (efficient, but what about performance?)
  • Error-Correcting Output Codes (?)
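
As a rough illustration of the cost labels on this slide, the sketch below counts how many binary classifiers each decomposition needs for m classes. The function name is hypothetical, and the distributed-code count assumes the shortest code that still gives every class a distinct codeword (ceil(log2 m) bits). ECOC is left out because its code length n is a design choice.

```python
def binary_problem_counts(m):
    """Number of binary classifiers needed for m classes (illustrative helper)."""
    if m < 2:
        raise ValueError("need at least two classes")
    return {
        "one_per_class": m,                  # one "class vs. rest" problem per class
        "all_pairs": m * (m - 1) // 2,       # one problem per unordered pair of classes
        "distributed": (m - 1).bit_length()  # ceil(log2(m)) bits, assuming a minimal code
    }


# 105 classes, the size of the Industry Sector dataset used later in the talk.
print(binary_problem_counts(105))
# {'one_per_class': 105, 'all_pairs': 5460, 'distributed': 7}
```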

Page 6:

Is it a good idea?
  • Larger margin for error, since errors can now be “corrected”
  • One-per-class is a code with minimum Hamming distance (HD) = 2
  • Distributed codes have low HD
  • The individual binary problems can be harder than before
  • Useless unless number of classes > 5

Page 7:

Training ECOC

Given m distinct classes:

1. Create an m x n binary matrix M.

2. Each class is assigned ONE row of M.

3. Each column of the matrix divides the classes into TWO groups.

4. Train the base classifiers to learn the n binary problems.

Example code matrix (rows: classes A to D; columns: binary classifiers f1 to f5):

     f1  f2  f3  f4  f5
A     0   0   1   1   0
B     1   0   1   0   0
C     0   1   1   1   0
D     0   1   0   0   1
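
A minimal sketch of steps 2 and 3, using the 4 x 5 matrix from this slide: each multiclass label is replaced by its codeword, and reading the result column by column gives the n binary training problems. The function name and the toy label list are hypothetical, and fitting the base classifiers (step 4) is deliberately omitted.

```python
# Code matrix from the slide: one codeword (row) per class, one column per binary problem.
CODE_MATRIX = {
    "A": [0, 0, 1, 1, 0],
    "B": [1, 0, 1, 0, 0],
    "C": [0, 1, 1, 1, 0],
    "D": [0, 1, 0, 0, 1],
}


def binary_targets(labels, code_matrix):
    """Return one list of 0/1 targets per column of the code matrix."""
    n_bits = len(next(iter(code_matrix.values())))
    return [[code_matrix[y][j] for y in labels] for j in range(n_bits)]


# Hypothetical training labels; a real run would pair each label with a document.
labels = ["A", "C", "B", "D", "C"]
targets = binary_targets(labels, CODE_MATRIX)
for j, column_targets in enumerate(targets, start=1):
    print(f"f{j}: {column_targets}")
# f1: [0, 0, 1, 0, 0]
# f2: [0, 1, 0, 1, 1]
# f3: [1, 1, 1, 0, 1]
# f4: [1, 1, 0, 0, 1]
# f5: [0, 0, 0, 1, 0]
```

Each printed list would be the target vector for one base classifier trained on the same documents.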


Page 9:

Testing ECOC

To test a new instance:

Apply each of the n classifiers to the new instance

Combine the predictions to obtain a binary string (codeword) for the new point

Classify to the class with the nearest codeword (usually Hamming distance is used as the distance measure)
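
The decoding step can be sketched as follows, assuming the 4 x 5 code matrix from the training slide and the predicted bit string 1 1 1 1 0 that appears for instance X on the picture slide. The function names are hypothetical, and ties are broken by taking the first class listed, which is an arbitrary choice made here.

```python
CODE_MATRIX = {
    "A": [0, 0, 1, 1, 0],
    "B": [1, 0, 1, 0, 0],
    "C": [0, 1, 1, 1, 0],
    "D": [0, 1, 0, 0, 1],
}


def hamming(a, b):
    """Number of positions in which two equal-length bit lists differ."""
    if len(a) != len(b):
        raise ValueError("codewords must have the same length")
    return sum(x != y for x, y in zip(a, b))


def decode(predicted_bits, code_matrix):
    """Return the class whose codeword is nearest to the predicted bits."""
    return min(code_matrix, key=lambda c: hamming(code_matrix[c], predicted_bits))


x_bits = [1, 1, 1, 1, 0]  # outputs of f1..f5 for the new instance X
distances = {c: hamming(w, x_bits) for c, w in CODE_MATRIX.items()}
print(distances)                    # {'A': 2, 'B': 2, 'C': 1, 'D': 4}
print(decode(x_bits, CODE_MATRIX))  # C
```

Here X differs from C's codeword only in the first bit, so a single wrong binary prediction would still leave C as the nearest class.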

Page 10:

ECOC - Picture

[Figure: diagram of the four classes A, B, C and D]

     f1  f2  f3  f4  f5
A     0   0   1   1   0
B     1   0   1   0   0
C     0   1   1   1   0
D     0   1   0   0   1

X     1   1   1   1   0    (outputs of f1 to f5 for a new instance X)

Page 14:

Single classifier – learns a complex boundary once

Ensemble – learns a complex boundary multiple times

ECOC – learns a “simple” boundary multiple times

Page 15:

Questions?
  • How well does it work?
  • How long should the code be?
  • Do we need a lot of training data?
  • What kind of codes can we use?
  • Are there intelligent ways of creating the code?

Page 16:

Previous Work
  • Combine with Boosting: ADABOOST.OC (Schapire, 1997), (Guruswami & Sahai, 1999)
  • Local Learners (Ricci & Aha, 1997)
  • Text Classification (Berger, 1999)

Page 17:

Experimental Setup
  • Generate the code: BCH Codes
  • Choose a Base Learner: Naive Bayes Classifier as used in text classification tasks (McCallum & Nigam 1998)

Page 18:

Dataset
  • Industry Sector Dataset: consists of company web pages classified into 105 economic sectors
  • Standard stoplist
  • No stemming
  • Skip all MIME headers and HTML tags
  • Experimental approach similar to McCallum et al. (1998) for comparison purposes

Page 19:

Results

Industry Sector Data Set

Method             Accuracy
Naïve Bayes        66.1%
Shrinkage [1]      76%
ME [2]             79%
ME w/ Prior [3]    81.1%
ECOC 63-bit        88.5%

ECOC reduces the error of the Naïve Bayes Classifier by 66%

1. (McCallum et al. 1998) 2, 3. (Nigam et al. 1999)
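
The 66% figure can be checked from the accuracies on this slide, taking error as 100% minus accuracy (a worked check added here, not part of the original slide):

```latex
\[
\frac{(100 - 66.1) - (100 - 88.5)}{100 - 66.1}
  = \frac{33.9 - 11.5}{33.9}
  = \frac{22.4}{33.9}
  \approx 0.66
\]
```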

Page 20:

The Longer the Better!

                Naive Bayes Classifier   15-bit ECOC   31-bit ECOC   63-bit ECOC
Accuracy (%)    65.3                     77.4          83.6          88.1

Table 2: Average classification accuracy on 5 random 50-50 train-test splits of the Industry Sector dataset with a vocabulary size of 10000 words selected using Information Gain.

Longer codes mean larger codeword separation

The minimum Hamming distance of a code C is the smallest distance between any pair of distinct codewords in C

If the minimum Hamming distance is h, then the code can correct ⌊(h-1)/2⌋ errors
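
A small sketch of these two definitions, assuming codewords are given as lists of bits. The helper names are hypothetical. The one-per-class code is built here only to check the earlier claim that its minimum distance is 2, and the values 5, 11 and 31 are the Hmin figures quoted later in the talk for the 15-, 31- and 63-bit codes.

```python
from itertools import combinations


def min_hamming_distance(codewords):
    """Smallest Hamming distance between any pair of distinct codewords."""
    return min(
        sum(a != b for a, b in zip(u, v))
        for u, v in combinations(codewords, 2)
    )


def correctable_errors(h):
    """Number of bit errors a code with minimum distance h can correct."""
    return (h - 1) // 2


# One-per-class code for 4 classes: row i has a single 1 in position i.
one_per_class = [[int(i == j) for j in range(4)] for i in range(4)]
h = min_hamming_distance(one_per_class)
print(h, correctable_errors(h))  # 2 0

for h_min in (5, 11, 31):
    print(h_min, correctable_errors(h_min))
# 5 2
# 11 5
# 31 15
```

With a minimum distance of 2, the one-per-class code cannot correct even a single wrong bit, which is the motivation for longer codes.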

Page 21:

Size Matters?

[Figure: Variation of accuracy with code length and training size. x-axis: training size per class (0 to 100); y-axis: accuracy (%), 40 to 100; series: SBC, 15bit, 31bit, 63bit.]

Page 22:

Size does NOT matter!

[Figure: Percent decrease in error with training size and length of code. x-axis: training size (0 to 100); y-axis: % decrease in error (30 to 70); series: 15bit, 31bit, 63bit.]

Page 23:

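Semi-Theoretical Model

Model ECOC by a Binomial Distribution B(n,p)

n = length of the code
p = probability of each bit being classified incorrectly

Hmin = minimum Hamming distance, Emax = maximum number of bit errors that can be corrected, Pave = average per-bit accuracy (the tabulated values .85 to .91 are accuracies, i.e. 1 - p)

# of Bits   Hmin   Emax   Pave   Accuracy
15          5      2      .85    .59
15          5      2      .89    .80
15          5      2      .91    .84
31          11     5      .85    .67
31          11     5      .89    .91
31          11     5      .91    .94
63          31     15     .89    .99

\[
\text{Accuracy} = \sum_{i=0}^{E_{\max}} \binom{n}{i} \, p_{ave}^{\,n-i} \, (1 - p_{ave})^{i}
\]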
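
A minimal sketch of this model, assuming the accuracy is the binomial probability that at most Emax of the n bits are wrong when each bit is independently correct with probability Pave. The function name is hypothetical. Small differences from the Accuracy column are to be expected if the tabulated Pave values are rounded.

```python
from math import comb


def ecoc_accuracy(n, e_max, p_bit):
    """P(at most e_max of n independent bits are wrong), each bit correct w.p. p_bit."""
    return sum(
        comb(n, i) * p_bit ** (n - i) * (1 - p_bit) ** i
        for i in range(e_max + 1)
    )


# (n, Emax, Pave) rows taken from the table on this slide.
rows = [
    (15, 2, 0.85), (15, 2, 0.89), (15, 2, 0.91),
    (31, 5, 0.85), (31, 5, 0.89), (31, 5, 0.91),
    (63, 15, 0.89),
]
for n, e_max, p_bit in rows:
    print(n, e_max, p_bit, round(ecoc_accuracy(n, e_max, p_bit), 2))
```

The independence of bit errors is the strong assumption here; correlated errors across columns would make the real accuracy lower than this estimate.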

Page 24:

Theoretical vs. Experimental Accuracy (Vocab size = 10000)

[Figure: accuracy (%), 0 to 100, against length of code (15, 15, 15, 31, 31, 31, 63); series: Theoretical, Experimental.]

Page 25:

[Figure: six arrangements of the four newsgroups Talk.misc.religion, Comp.sys.ibm.hardware, Comp.os.windows and Alt.atheism, labelled 99%, 73%, 68% and 81%, 86%, 87%.]

Page 26:

Types of Codes

Data-Independent    Data-Dependent

Algebraic

Random

Hand-Constructed

Adaptive

Page 27:

What is a Good Code?
  • Row Separation
  • Column Separation (independence of errors for each binary classifier)
  • Efficiency (for long codes)
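
Row and column separation can be measured as pairwise Hamming distances over the rows and the columns of the code matrix. The sketch below assumes the small 4 x 5 example matrix from the training slide; the helper names are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions in which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def separation(vectors):
    """Return (min, max) pairwise Hamming distance over a list of bit vectors."""
    dists = [hamming(u, v) for u, v in combinations(vectors, 2)]
    return min(dists), max(dists)


# Rows are the codewords for classes A to D; columns are the binary problems f1 to f5.
rows = [
    [0, 0, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 0, 1],
]
cols = [list(c) for c in zip(*rows)]

print("row separation (min, max):", separation(rows))     # (1, 4)
print("column separation (min, max):", separation(cols))  # (1, 4)
```

A column distance equal to the number of classes, as between f3 and f5 here, means the two columns are complements and pose the same binary problem, so very large column distances are not automatically a good sign.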

Page 28:

Choosing Codes

              Random                        Algebraic
Row Sep       On average, for long codes    Guaranteed
Col Sep       On average, for long codes    Can be guaranteed
Efficiency    No                            Yes

Page 29:

Experimental Results

Code             Min Row HD   Max Row HD   Min Col HD   Max Col HD   Error Rate
15-Bit BCH       5            15           49           64           20.6%
19-Bit Hybrid    5            18           15           69           22.3%
15-bit Random    2 (1.5)      13           42           60           24.1%

Page 30:

Drawbacks
  • Can be computationally expensive
  • Random codes throw away the real-world nature of the data by picking random partitions to create artificial binary problems

Page 31:

Current Work
  • Combine ECOC with Co-Training to use unlabeled data
  • Automatically construct optimal / adaptive codes

Page 32:

Conclusion
  • Performs well on text classification tasks
  • Can be used when training data is sparse
  • Algebraic codes perform better than random codes for a given code length
  • Hand-constructed codes may not be the answer

Page 33:

Background
  • Co-training seems to be the way to go when there is (and maybe even when there isn’t) a feature split in the data
  • Reported results on co-training only deal with very small (toy) problems, mostly binary classification tasks (Blum & Mitchell 98, Nigam & Ghani 2000)

Page 34:

Co-Training Challenge
  • Task: Apply cotraining to a 65 class dataset containing 130,000 training examples
  • Result: Cotraining fails!

Page 35:

Solution?
  • ECOC seems to work well when there are a large number of classes
  • ECOC decomposes a multiclass problem into several binary problems
  • Cotraining works well with binary problems

Combine ECOC and Cotrain

Page 36:

Algorithm
  • Learn each bit for ECOC using a cotrained classifier

Page 37:

Dataset (Job Descriptions)
  • 65 classes
  • 32000 examples
  • Two feature sets: Title, Description

Page 38:

Class Distribution

[Figure: percentage of examples (0 to 12) for each class.]

Page 39:

Results

10% Train, 50% unlabeled, 40% test

  • NB: 40.3%
  • ECOC: 48.9%
  • EM: 30.83%
  • CoTraining
  • ECOC-EM
  • ECOC-Cotrain
  • ECOC-CoEM