Text Classification Using Naive Bayes
Transcript of Text Classification Using Naive Bayes
8/8/2019 Text Classification Using Naive Bayes
http://slidepdf.com/reader/full/text-classification-using-naive-bayes 1/26
Bayesian Classifiers Part 2
Contents
Simple Text Classification Using Naïve Bayes
Bayesian Belief Networks (Bayes Nets)
SIMPLE TEXT CLASSIFICATION USING NAÏVE BAYES
Learning to Classify Text
Learn_Naïve_Bayes_Text (Examples, V )
Classify_Naïve_Bayes_Text (Doc)
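The two procedures named above can be sketched in Python. This is a minimal sketch of the standard multinomial formulation; the snake_case names, the data layout, and the use of Laplace (add-one) smoothing are illustrative choices, not taken from the slides:

```python
import math
from collections import Counter

def learn_naive_bayes_text(examples, vocabulary):
    """examples: list of (document_words, label) pairs.
    Returns priors P(c) and smoothed word likelihoods P(w|c)."""
    priors, likelihoods = {}, {}
    for c in set(label for _, label in examples):
        docs_c = [words for words, label in examples if label == c]
        priors[c] = len(docs_c) / len(examples)
        # Concatenate all class-c text, keeping only vocabulary words
        text_c = [w for words in docs_c for w in words if w in vocabulary]
        counts = Counter(text_c)
        n = len(text_c)
        # Laplace smoothing: (count + 1) / (n + |Vocabulary|)
        likelihoods[c] = {w: (counts[w] + 1) / (n + len(vocabulary))
                          for w in vocabulary}
    return priors, likelihoods

def classify_naive_bayes_text(doc, priors, likelihoods, vocabulary):
    """Return argmax_c log P(c) + sum over doc words of log P(w|c)."""
    def score(c):
        s = math.log(priors[c])
        for w in doc:
            if w in vocabulary:
                s += math.log(likelihoods[c][w])
        return s
    return max(priors, key=score)
```

Log-probabilities are used in the classifier so that products of many small likelihoods do not underflow.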
Twenty Newsgroups (Joachims, 1996)
1,000 training documents from each of 20 groups (20,000 documents in total).
Use two-thirds of them in learning to classify new documents according to which newsgroup each came from.
Newsgroups:
comp.graphics, misc.forsale, comp.os.ms-windows.misc, rec.autos, comp.sys.ibm.pc.hardware, rec.motorcycles, comp.sys.mac.hardware, rec.sport.baseball, comp.windows.x, rec.sport.hockey, alt.atheism, sci.space, soc.religion.christian, sci.crypt, talk.religion.misc, sci.electronics, talk.politics.mideast, sci.med, talk.politics.misc, talk.politics.guns
Naive Bayes: 89% classification accuracy
Random guess: 1/20 = 5%
An article from rec.sport.hockey
Path: cantaloupe.srv.cs.cmu.edu!das-news.harvard.edu!ogicse!uwm.edu
From: [email protected] (John Doe)
Subject: Re: This year's biggest and worst (opinion)...
Date: 5 Apr 93 09:53:39 GMT
I can only comment on the Kings, but the most obvious candidate for pleasant surprise is Alex Zhitnik. He came highly touted as a defensive defenseman, but he's clearly much more than that. Great skater and hard shot (though wish he were more accurate). In fact, he pretty much allowed the Kings to trade away that huge defensive liability Paul Coffey. Kelly Hrudey is only the biggest disappointment if you thought he was any good to begin with. But, at best, he's only a mediocre goaltender. A better choice would be Tomas Sandstrom, though not through any fault of his own, but because some thugs in Toronto decided
Learning Curve for 20 Newsgroups
Accuracy vs. training set size (1/3 withheld for testing)
(Note that the x-axis is in log scale.)
Problems In Classifying Text
Frequent words, e.g., "the", "of"
Words with insignificant occurrence, e.g., occurring less than three times
Remove them from the vocabulary!
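Both prunings can be sketched in a few lines of Python (`build_vocabulary` and its parameters are assumed names for illustration, not from the slides):

```python
from collections import Counter

def build_vocabulary(documents, stopwords, min_count=3):
    """Keep words that are not stopwords and occur at least min_count times
    across the whole corpus."""
    counts = Counter(w for doc in documents for w in doc)
    return {w for w, c in counts.items()
            if c >= min_count and w not in stopwords}
```

A stopword list handles the frequent function words, while the count threshold drops words too rare to yield reliable probability estimates.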
BAYESIAN BELIEF NETWORKS (BAYES NETS)
Overview
Bayesian Belief Networks
Learning Bayesian Networks:
- Data is fully observable and the network structure is known:
  the conditional probability tables are estimated from training data (as in the Naïve Bayes classifier)
- The network structure is known, but the data is partially observable:
  the conditional probability tables can be obtained in a manner similar to obtaining neural network weights (gradient ascent);
  another technique uses the EM algorithm
- Data is partially observable and the network structure is unknown: ?
Bayesian Belief Networks
Interesting because:
- The Naive Bayes assumption of conditional independence is too restrictive
- But learning is intractable without some such assumptions...
- Bayesian belief networks describe conditional independence among subsets of variables
- This allows combining prior knowledge about (in)dependencies among variables with observed training data
Also called Bayes Nets
Conditional Independence
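As a reminder, the standard definition (for discrete random variables $X$, $Y$, $Z$) is:

```latex
% X is conditionally independent of Y given Z iff
(\forall x_i, y_j, z_k)\;
  P(X = x_i \mid Y = y_j, Z = z_k) = P(X = x_i \mid Z = z_k)
% written more compactly as
P(X \mid Y, Z) = P(X \mid Z)
```

For example, Thunder is conditionally independent of Rain given Lightning: $P(\mathit{Thunder} \mid \mathit{Rain}, \mathit{Lightning}) = P(\mathit{Thunder} \mid \mathit{Lightning})$.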
Bayesian Belief Network
The network represents a set of conditional independence assertions:
- Each node is asserted to be conditionally independent of its nondescendants, given its immediate predecessors.
- Directed acyclic graph
Bayesian Belief Network
Inference in Bayesian Networks
How can one infer the values of one or more network variables, given observed values of others?
- The Bayes net contains all information needed for this inference
- If only one variable has an unknown value, it is easy to infer
- In the general case, the problem is NP-hard

In practice, one can succeed in many cases:
- Exact inference methods work well for some network structures
- Monte Carlo methods simulate the network randomly to calculate approximate solutions
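The Monte Carlo approach can be sketched with rejection sampling on a toy network. The chain structure Storm → Lightning → Thunder and all probabilities below are illustrative assumptions, not values from the slides:

```python
import random

def sample_net():
    """Forward-sample one joint assignment from a toy chain network:
    Storm -> Lightning -> Thunder (assumed structure and CPT values)."""
    storm = random.random() < 0.4
    lightning = random.random() < (0.7 if storm else 0.05)
    thunder = random.random() < (0.9 if lightning else 0.01)
    return (storm, lightning, thunder)

def estimate(query_idx, evidence_idx, evidence_val, n=200_000, seed=0):
    """Rejection sampling: estimate P(query | evidence) by simulating the
    network n times and keeping only samples consistent with the evidence."""
    random.seed(seed)
    kept = hits = 0
    for _ in range(n):
        s = sample_net()
        if s[evidence_idx] == evidence_val:
            kept += 1
            hits += s[query_idx]
    return hits / kept

# Estimate P(Storm = T | Thunder = T): evidence raises the storm probability
# well above the prior P(Storm) = 0.4.
p_storm_given_thunder = estimate(query_idx=0, evidence_idx=2, evidence_val=True)
```

Rejection sampling is the simplest Monte Carlo scheme; it wastes samples when the evidence is rare, which is why likelihood weighting and MCMC are preferred in practice.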
Learning of Bayesian Networks
Several variants of this learning task:
- The network structure might be known or unknown
- Training examples might provide values of all network variables, or just some

If the structure is known and all variables are observed:
- Then it is as easy as training a Naive Bayes classifier

Suppose the structure is known and the variables are partially observable:
- e.g., observe ForestFire, Storm, BusTourGroup, Thunder, but not Lightning, Campfire
- Similar to training a neural network with hidden units
- In fact, one can learn the network's conditional probability tables using gradient ascent!
- Converges to the network h that (locally) maximizes P(D|h)
Learning of Bayesian Networks

Maximization of P(D|h)

In principle, it is easy:
- Calculate P(D|h) for each h and return the h with maximum P(D|h)

In practice, h contains many, many continuous variables:
- Use a gradient descent (ascent) method

In general, h contains discrete variables, too:
- Use an algorithm for combinatorial optimization, such as the simulated annealing method
Gradient Ascent for Bayes Nets
Let $w_{ijk}$ denote one entry in the conditional probability table for variable $Y_i$ in the network:

$$w_{ijk} = P(Y_i = y_{ij} \mid \mathrm{Parents}(Y_i) = \text{the list } u_{ik} \text{ of values})$$

e.g., if $Y_i = \mathit{Campfire}$, then $u_{ik}$ might be $\langle \mathit{Storm} = T, \mathit{BusTourGroup} = F \rangle$.
Gradient Ascent for Bayes Nets
$$\frac{\partial \ln P_h(D)}{\partial w_{ijk}}
= \frac{\partial}{\partial w_{ijk}} \ln \prod_{d \in D} P_h(d)
= \sum_{d \in D} \frac{\partial \ln P_h(d)}{\partial w_{ijk}}
= \sum_{d \in D} \frac{1}{P_h(d)} \frac{\partial P_h(d)}{\partial w_{ijk}}$$

Summing over the possible values $y_{ij'}$ of $Y_i$ and $u_{ik'}$ of its parents, and using $P_h(y_{ij'} \mid u_{ik'}) = w_{ij'k'}$:

$$= \sum_{d \in D} \frac{1}{P_h(d)} \frac{\partial}{\partial w_{ijk}}
\sum_{j',k'} P_h(d \mid y_{ij'}, u_{ik'})\, P_h(y_{ij'}, u_{ik'})$$

$$= \sum_{d \in D} \frac{1}{P_h(d)} \frac{\partial}{\partial w_{ijk}}
\sum_{j',k'} P_h(d \mid y_{ij'}, u_{ik'})\, w_{ij'k'}\, P_h(u_{ik'})$$
Gradient Ascent for Bayes Nets
Only the term with $j' = j$, $k' = k$ depends on $w_{ijk}$, so the derivative picks it out:

$$\frac{\partial \ln P_h(D)}{\partial w_{ijk}}
= \sum_{d \in D} \frac{1}{P_h(d)}\, P_h(d \mid y_{ij}, u_{ik})\, P_h(u_{ik})$$

Applying Bayes' theorem to rewrite $P_h(d \mid y_{ij}, u_{ik})$:

$$= \sum_{d \in D} \frac{1}{P_h(d)} \cdot
\frac{P_h(y_{ij}, u_{ik} \mid d)\, P_h(d)}{P_h(y_{ij}, u_{ik})}\, P_h(u_{ik})
= \sum_{d \in D} \frac{P_h(y_{ij}, u_{ik} \mid d)\, P_h(u_{ik})}{P_h(y_{ij}, u_{ik})}$$

$$= \sum_{d \in D} \frac{P_h(y_{ij}, u_{ik} \mid d)\, P_h(u_{ik})}{P_h(y_{ij} \mid u_{ik})\, P_h(u_{ik})}
= \sum_{d \in D} \frac{P_h(y_{ij}, u_{ik} \mid d)}{w_{ijk}}$$
Gradient Ascent for Bayes Nets
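Putting the gradient to work, gradient ascent repeatedly updates each table entry and then renormalizes (a LaTeX sketch consistent with the standard formulation; $\eta$ denotes an assumed small learning rate):

```latex
% one ascent step on each CPT entry, using training data D:
w_{ijk} \leftarrow w_{ijk}
  + \eta \sum_{d \in D} \frac{P_h(y_{ij}, u_{ik} \mid d)}{w_{ijk}}
% then renormalize so each table row remains a valid distribution:
\sum_{j} w_{ijk} = 1, \qquad 0 \le w_{ijk} \le 1
```

The inner probabilities $P_h(y_{ij}, u_{ik} \mid d)$ must themselves be computed by inference in the current network, which is why each ascent step can be expensive.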
More on Learning Bayes Nets
Summary: Bayesian Belief Networks

Combine prior knowledge with observed data
- Q: how does prior knowledge enter the network?
The impact of prior knowledge (when correct!) is to lower the sample complexity

Active research area:
- Extend from boolean to real-valued variables
- Parameterized distributions instead of tables
- Extend to first-order instead of propositional systems
- More effective inference methods
- ...