Text Classification Using Naive Bayes
Transcript of Text Classification Using Naive Bayes
8/8/2019 Text Classification Using Naive Bayes
http://slidepdf.com/reader/full/text-classification-using-naive-bayes 1/26
Bayesian Classifiers Part 2
Contents
Simple Text Classification Using Naïve Bayes
Bayesian Belief Networks (Bayes Nets)
SIMPLE TEXT CLASSIFICATION USING NAÏVE BAYES
Learning to Classify Text
Learn_Naïve_Bayes_Text (Examples, V )
Classify_Naïve_Bayes_Text (Doc)
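The two procedures named above can be sketched in Python. This is a minimal sketch of the standard multinomial formulation; the snake_case names, the data layout, and the use of Laplace (add-one) smoothing are illustrative choices, not taken from the slides:

```python
import math
from collections import Counter

def learn_naive_bayes_text(examples, vocabulary):
    """examples: list of (document_words, label) pairs.
    Returns priors P(c) and smoothed word likelihoods P(w|c)."""
    priors, likelihoods = {}, {}
    for c in set(label for _, label in examples):
        docs_c = [words for words, label in examples if label == c]
        priors[c] = len(docs_c) / len(examples)
        # Concatenate all class-c text, keeping only vocabulary words
        text_c = [w for words in docs_c for w in words if w in vocabulary]
        counts = Counter(text_c)
        n = len(text_c)
        # Laplace smoothing: (count + 1) / (n + |Vocabulary|)
        likelihoods[c] = {w: (counts[w] + 1) / (n + len(vocabulary))
                          for w in vocabulary}
    return priors, likelihoods

def classify_naive_bayes_text(doc, priors, likelihoods, vocabulary):
    """Return argmax_c log P(c) + sum over doc words of log P(w|c)."""
    def score(c):
        s = math.log(priors[c])
        for w in doc:
            if w in vocabulary:
                s += math.log(likelihoods[c][w])
        return s
    return max(priors, key=score)
```

Log-probabilities are used in the classifier so that products of many small likelihoods do not underflow.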
Twenty Newsgroups (Joachims, 1996)
1,000 training documents from each of 20 groups (20,000 documents in total).
Use two-thirds of them in learning to classify new documents according to which newsgroup each came from.
Newsgroups:
comp.graphics, misc.forsale, comp.os.ms-windows.misc, rec.autos, comp.sys.ibm.pc.hardware, rec.motorcycles, comp.sys.mac.hardware, rec.sport.baseball, comp.windows.x, rec.sport.hockey, alt.atheism, sci.space, soc.religion.christian, sci.crypt, talk.religion.misc, sci.electronics, talk.politics.mideast, sci.med, talk.politics.misc, talk.politics.guns
Naive Bayes: 89% classification accuracy
Random guess: 1/20 = 5%
An article from rec.sport.hockey
Path: cantaloupe.srv.cs.cmu.edu!das-news.harvard.edu!ogicse!uwm.edu
From: [email protected] (John Doe)
Subject: Re: This year's biggest and worst (opinion)...
Date: 5 Apr 93 09:53:39 GMT
I can only comment on the Kings, but the most obvious candidate for pleasant surprise is Alex Zhitnik. He came highly touted as a defensive defenseman, but he's clearly much more than that. Great skater and hard shot (though wish he were more accurate). In fact, he pretty much allowed the Kings to trade away that huge defensive liability Paul Coffey. Kelly Hrudey is only the biggest disappointment if you thought he was any good to begin with. But, at best, he's only a mediocre goaltender. A better choice would be Tomas Sandstrom, though not through any fault of his own, but because some thugs in Toronto decided
Learning Curve for 20 Newsgroups
Accuracy vs. training set size (1/3 withheld for testing)
(Note that the x-axis is in log scale.)
Problems In Classifying Text
Frequent words, e.g., "the", "of"
Words with insignificant occurrence, e.g., occurring less than three times
Remove them from the vocabulary!
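Both prunings can be sketched in a few lines of Python (`build_vocabulary` and its parameters are assumed names for illustration, not from the slides):

```python
from collections import Counter

def build_vocabulary(documents, stopwords, min_count=3):
    """Keep words that are not stopwords and occur at least min_count times
    across the whole corpus."""
    counts = Counter(w for doc in documents for w in doc)
    return {w for w, c in counts.items()
            if c >= min_count and w not in stopwords}
```

A stopword list handles the frequent function words, while the count threshold drops words too rare to yield reliable probability estimates.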
BAYESIAN BELIEF NETWORKS (BAYES NETS)
Overview
Bayesian Belief Networks
Learning Bayesian Networks:
- Data is fully observable and the network structure is known:
  the conditional probability tables are estimated from training data (as in the Naïve Bayes classifier)
- The network structure is known, but the data is partially observable:
  the conditional probability tables can be obtained in a manner similar to obtaining neural network weights (gradient ascent);
  another technique uses the EM algorithm
- Data is partially observable and the network structure is unknown: ?
Bayesian Belief Networks
Interesting because:
- The Naive Bayes assumption of conditional independence is too restrictive
- But learning is intractable without some such assumptions...
- Bayesian belief networks describe conditional independence among subsets of variables
- This allows combining prior knowledge about (in)dependencies among variables with observed training data
Also called Bayes Nets
Conditional Independence
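As a reminder, the standard definition (for discrete random variables $X$, $Y$, $Z$) is:

```latex
% X is conditionally independent of Y given Z iff
(\forall x_i, y_j, z_k)\;
  P(X = x_i \mid Y = y_j, Z = z_k) = P(X = x_i \mid Z = z_k)
% written more compactly as
P(X \mid Y, Z) = P(X \mid Z)
```

For example, Thunder is conditionally independent of Rain given Lightning: $P(\mathit{Thunder} \mid \mathit{Rain}, \mathit{Lightning}) = P(\mathit{Thunder} \mid \mathit{Lightning})$.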
Bayesian Belief Network
The network represents a set of conditional independence assertions:
- Each node is asserted to be conditionally independent of its nondescendants, given its immediate predecessors.
- Directed acyclic graph
Bayesian Belief Network
Inference in Bayesian Networks
How can one infer the values of one or more network variables, given observed values of others?
- The Bayes net contains all information needed for this inference
- If only one variable has an unknown value, it is easy to infer
- In the general case, the problem is NP-hard

In practice, one can succeed in many cases:
- Exact inference methods work well for some network structures
- Monte Carlo methods simulate the network randomly to calculate approximate solutions
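The Monte Carlo approach can be sketched with rejection sampling on a toy network. The chain structure Storm → Lightning → Thunder and all probabilities below are illustrative assumptions, not values from the slides:

```python
import random

def sample_net():
    """Forward-sample one joint assignment from a toy chain network:
    Storm -> Lightning -> Thunder (assumed structure and CPT values)."""
    storm = random.random() < 0.4
    lightning = random.random() < (0.7 if storm else 0.05)
    thunder = random.random() < (0.9 if lightning else 0.01)
    return (storm, lightning, thunder)

def estimate(query_idx, evidence_idx, evidence_val, n=200_000, seed=0):
    """Rejection sampling: estimate P(query | evidence) by simulating the
    network n times and keeping only samples consistent with the evidence."""
    random.seed(seed)
    kept = hits = 0
    for _ in range(n):
        s = sample_net()
        if s[evidence_idx] == evidence_val:
            kept += 1
            hits += s[query_idx]
    return hits / kept

# Estimate P(Storm = T | Thunder = T): evidence raises the storm probability
# well above the prior P(Storm) = 0.4.
p_storm_given_thunder = estimate(query_idx=0, evidence_idx=2, evidence_val=True)
```

Rejection sampling is the simplest Monte Carlo scheme; it wastes samples when the evidence is rare, which is why likelihood weighting and MCMC are preferred in practice.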
Learning of Bayesian Networks
Several variants of this learning task:
- The network structure might be known or unknown
- Training examples might provide values of all network variables, or just some

If the structure is known and all variables are observed:
- Then it is as easy as training a Naive Bayes classifier

Suppose the structure is known and the variables are partially observable:
- e.g., observe ForestFire, Storm, BusTourGroup, Thunder, but not Lightning, Campfire
- Similar to training a neural network with hidden units
- In fact, one can learn the network's conditional probability tables using gradient ascent!
- Converges to the network h that (locally) maximizes P(D|h)
Learning of Bayesian Networks

Maximization of P(D|h)

In principle, it is easy:
- Calculate P(D|h) for each h and return the h with maximum P(D|h)

In practice, h contains many, many continuous variables:
- Use a gradient descent (ascent) method

In general, h contains discrete variables, too:
- Use an algorithm for combinatorial optimization, such as the simulated annealing method
Gradient Ascent for Bayes Nets
Let $w_{ijk}$ denote one entry in the conditional probability table for variable $Y_i$ in the network:

$$w_{ijk} = P(Y_i = y_{ij} \mid \mathrm{Parents}(Y_i) = \text{the list } u_{ik} \text{ of values})$$

e.g., if $Y_i = \mathit{Campfire}$, then $u_{ik}$ might be $\langle \mathit{Storm} = T, \mathit{BusTourGroup} = F \rangle$.
Gradient Ascent for Bayes Nets
$$\frac{\partial \ln P_h(D)}{\partial w_{ijk}}
= \frac{\partial}{\partial w_{ijk}} \ln \prod_{d \in D} P_h(d)
= \sum_{d \in D} \frac{\partial \ln P_h(d)}{\partial w_{ijk}}
= \sum_{d \in D} \frac{1}{P_h(d)} \frac{\partial P_h(d)}{\partial w_{ijk}}$$

Summing over the possible values $y_{ij'}$ of $Y_i$ and $u_{ik'}$ of its parents, and using $P_h(y_{ij'} \mid u_{ik'}) = w_{ij'k'}$:

$$= \sum_{d \in D} \frac{1}{P_h(d)} \frac{\partial}{\partial w_{ijk}}
\sum_{j',k'} P_h(d \mid y_{ij'}, u_{ik'})\, P_h(y_{ij'}, u_{ik'})$$

$$= \sum_{d \in D} \frac{1}{P_h(d)} \frac{\partial}{\partial w_{ijk}}
\sum_{j',k'} P_h(d \mid y_{ij'}, u_{ik'})\, w_{ij'k'}\, P_h(u_{ik'})$$
Gradient Ascent for Bayes Nets
Only the term with $j' = j$, $k' = k$ depends on $w_{ijk}$, so the derivative picks it out:

$$\frac{\partial \ln P_h(D)}{\partial w_{ijk}}
= \sum_{d \in D} \frac{1}{P_h(d)}\, P_h(d \mid y_{ij}, u_{ik})\, P_h(u_{ik})$$

Applying Bayes' theorem to rewrite $P_h(d \mid y_{ij}, u_{ik})$:

$$= \sum_{d \in D} \frac{1}{P_h(d)} \cdot
\frac{P_h(y_{ij}, u_{ik} \mid d)\, P_h(d)}{P_h(y_{ij}, u_{ik})}\, P_h(u_{ik})
= \sum_{d \in D} \frac{P_h(y_{ij}, u_{ik} \mid d)\, P_h(u_{ik})}{P_h(y_{ij}, u_{ik})}$$

$$= \sum_{d \in D} \frac{P_h(y_{ij}, u_{ik} \mid d)\, P_h(u_{ik})}{P_h(y_{ij} \mid u_{ik})\, P_h(u_{ik})}
= \sum_{d \in D} \frac{P_h(y_{ij}, u_{ik} \mid d)}{w_{ijk}}$$
Gradient Ascent for Bayes Nets
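Putting the gradient to work, gradient ascent repeatedly updates each table entry and then renormalizes (a LaTeX sketch consistent with the standard formulation; $\eta$ denotes an assumed small learning rate):

```latex
% one ascent step on each CPT entry, using training data D:
w_{ijk} \leftarrow w_{ijk}
  + \eta \sum_{d \in D} \frac{P_h(y_{ij}, u_{ik} \mid d)}{w_{ijk}}
% then renormalize so each table row remains a valid distribution:
\sum_{j} w_{ijk} = 1, \qquad 0 \le w_{ijk} \le 1
```

The inner probabilities $P_h(y_{ij}, u_{ik} \mid d)$ must themselves be computed by inference in the current network, which is why each ascent step can be expensive.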
More on Learning Bayes Nets
Summary: Bayesian Belief Networks

Combine prior knowledge with observed data
- Q: how does prior knowledge enter the network?
The impact of prior knowledge (when correct!) is to lower the sample complexity

Active research area:
- Extend from boolean to real-valued variables
- Parameterized distributions instead of tables
- Extend to first-order instead of propositional systems
- More effective inference methods
- ...