Text Classification and Naïve Bayes - Texas A&M...

Page 1: Text Classification and Naïve Bayes - Texas A&M Universityfaculty.cse.tamu.edu/.../Fall17/l3_textClassification_naivebayes.pdf · Text Classification and Naïve Bayes The Task of

Text Classification and Naïve Bayes

The Task of Text Classification

Many slides are adapted from slides by Dan Jurafsky

Page 2:

Is this spam?

Page 3:

Who wrote which Federalist papers?

• 1787-8: anonymous essays try to convince New York to ratify the U.S. Constitution: Jay, Madison, Hamilton.
• Authorship of 12 of the letters in dispute
• 1963: solved by Mosteller and Wallace using Bayesian methods

[Portraits: James Madison, Alexander Hamilton]

Page 4:

Male or female author?

1. By 1925 present-day Vietnam was divided into three parts under French colonial rule. The southern region embracing Saigon and the Mekong delta was the colony of Cochin-China; the central area with its imperial capital at Hue was the protectorate of Annam…

2. Clara never failed to be astonished by the extraordinary felicity of her own name. She found it hard to trust herself to the mercy of fate, which had managed over the years to convert her greatest shame into one of her greatest assets…

S. Argamon, M. Koppel, J. Fine, A. R. Shimoni, 2003. “Gender, Genre, and Writing Style in Formal Written Texts,” Text, volume 23, number 3, pp. 321–346

Page 5:

Positive or negative movie review?

• unbelievably disappointing
• Full of zany characters and richly applied satire, and some great plot twists
• this is the greatest screwball comedy ever filmed
• It was pathetic. The worst part about it was the boxing scenes.

Page 6:

What is the subject of this article?

MEDLINE Article → ? → MeSH Subject Category Hierarchy

• Antagonists and Inhibitors
• Blood Supply
• Chemistry
• Drug Therapy
• Embryology
• Epidemiology
• …

Page 7:

Text Classification

• Assigning subject categories, topics, or genres
• Spam detection
• Authorship identification
• Age/gender identification
• Language identification
• Sentiment analysis
• …

Page 8:

Text Classification: definition

• Input:
  – a document d
  – a fixed set of classes C = {c1, c2, …, cJ}
• Output: a predicted class c ∈ C

Page 9:

Classification Methods: Hand-coded rules

• Rules based on combinations of words or other features
  – spam: black-list-address OR (“dollars” AND “have been selected”)
• Accuracy can be high
  – If rules carefully refined by expert
• But building and maintaining these rules is expensive
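
As a rough illustration, the spam rule above could be written as a small Python predicate. The blacklist contents and the function name `is_spam` are assumptions made for this sketch; the slide does not name any addresses or an implementation.

```python
# Hypothetical blacklist; the slide does not list any actual addresses.
BLACKLIST = {"promo@example.com"}

def is_spam(sender: str, body: str) -> bool:
    """Rule from the slide: black-list-address OR ("dollars" AND "have been selected")."""
    text = body.lower()
    on_blacklist = sender.lower() in BLACKLIST
    phrase_match = "dollars" in text and "have been selected" in text
    return on_blacklist or phrase_match
```

Every new spam pattern needs another hand-written clause, which is one way to see why maintaining such rules gets expensive.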

Page 10:

Classification Methods: Supervised Machine Learning

• Input:
  – a document d
  – a fixed set of classes C = {c1, c2, …, cJ}
  – a training set of m hand-labeled documents (d1, c1), …, (dm, cm)
• Output:
  – a learned classifier γ: d → c

Page 11:

Classification Methods: Supervised Machine Learning

• Any kind of classifier
  – Naïve Bayes
  – Logistic regression, maxent
  – Support-vector machines
  – k-Nearest Neighbors
  – …

Page 13:

Text Classification and Naïve Bayes

Formalizing the Naïve Bayes Classifier

Page 14:

Naïve Bayes Intuition

• Simple (“naïve”) classification method based on Bayes rule
• Relies on very simple representation of document
  – Bag of words

Page 15:

Bayes’ Rule Applied to Documents and Classes

• For a document d and a class c

$$P(c \mid d) = \frac{P(d \mid c)\,P(c)}{P(d)}$$

Page 16:

Naïve Bayes Classifier (I)

MAP is “maximum a posteriori” = most likely class:

$$c_{MAP} = \operatorname*{argmax}_{c \in C} P(c \mid d)$$

Bayes Rule:

$$= \operatorname*{argmax}_{c \in C} \frac{P(d \mid c)\,P(c)}{P(d)}$$

Dropping the denominator, since P(d) is the same for every class:

$$= \operatorname*{argmax}_{c \in C} P(d \mid c)\,P(c)$$
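
A small numeric check can make the last step concrete: dividing every class score by the same P(d) rescales the scores but cannot change which class wins. The likelihoods, priors, and class names below are invented for illustration only.

```python
# Hypothetical values for a single document d; not taken from the slides.
likelihood = {"spam": 0.002, "ham": 0.0005}   # P(d | c)
prior = {"spam": 0.3, "ham": 0.7}             # P(c)

# P(d) by the law of total probability: sum over classes of P(d | c) P(c).
evidence = sum(likelihood[c] * prior[c] for c in prior)

posterior = {c: likelihood[c] * prior[c] / evidence for c in prior}   # P(c | d)
unnormalized = {c: likelihood[c] * prior[c] for c in prior}           # P(d | c) P(c)

c_map_full = max(posterior, key=posterior.get)
c_map_dropped = max(unnormalized, key=unnormalized.get)
```

Both `c_map_full` and `c_map_dropped` select the same class; only the posterior values sum to 1.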

Page 17:

Naïve Bayes Classifier (II)

$$c_{MAP} = \operatorname*{argmax}_{c \in C} P(d \mid c)\,P(c)$$

Document d represented as features x1..xn

$$= \operatorname*{argmax}_{c \in C} P(x_1, x_2, \ldots, x_n \mid c)\,P(c)$$

Page 18:

Naïve Bayes Classifier (IV)

$$c_{MAP} = \operatorname*{argmax}_{c \in C} P(x_1, x_2, \ldots, x_n \mid c)\,P(c)$$

• P(c): How often does this class occur? We can just count the relative frequencies in a corpus.
• P(x1, x2, …, xn | c): O(|X|^n · |C|) parameters. Could only be estimated if a very, very large number of training examples was available.
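
To get a feel for the size of that parameter count, the sketch below compares it with the number of parameters needed when there is only one P(x | c) per word and class, which is what the independence assumptions of multinomial Naïve Bayes allow. The vocabulary size, document length, and class count are made-up values.

```python
vocab_size = 10_000   # |X|, hypothetical
n_positions = 5       # n, hypothetical (real documents are far longer)
n_classes = 2         # |C|, hypothetical

# One probability per (x1, ..., xn, c) combination.
full_joint = vocab_size ** n_positions * n_classes

# One probability P(x | c) per word-class pair.
per_word = vocab_size * n_classes
```

Even for five-word documents the full joint table has on the order of 10^20 entries, against 20,000 for the per-word model.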

Page 19:

Multinomial Naïve Bayes Independence Assumptions

$$P(x_1, x_2, \ldots, x_n \mid c)$$

• Bag of Words assumption: Assume position doesn’t matter
• Conditional Independence: Assume the feature probabilities P(xi | cj) are independent given the class c.

$$P(x_1, \ldots, x_n \mid c) = P(x_1 \mid c) \cdot P(x_2 \mid c) \cdot P(x_3 \mid c) \cdots P(x_n \mid c)$$

Page 20:

The bag of words representation

γ(
I love this movie! It's sweet, but with satirical humor. The dialogue is great and the adventure scenes are fun… It manages to be whimsical and romantic while laughing at the conventions of the fairy tale genre. I would recommend it to just about anyone. I've seen it several times, and I'm always happy to see it again whenever I have a friend who hasn't seen it yet.
) = c

Page 21:

The bag of words representation

γ(
great      2
love       2
recommend  1
laugh      1
happy      1
...        ...
) = c
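
A bag of words is just a table of word counts with the order thrown away. One possible way to build it in Python is sketched below; the tokenization rule (lower-casing and keeping runs of letters and apostrophes) and the sample sentence are assumptions of this sketch, not something the slides specify.

```python
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Lower-case the text, split it into word tokens, and count them; word order is discarded."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

# Hypothetical sample text.
bow = bag_of_words("I love this movie! I love it, and it's great.")
```

Because only counts are kept, two documents containing the same words in a different order get identical representations.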

Page 22:

Bag of words for document classification

Test document: parser, language, label, translation, … → ?

Class               Words
Machine Learning    learning, training, algorithm, shrinkage, network, ...
NLP                 parser, tag, training, translation, language, ...
Garbage Collection  garbage, collection, memory, optimization, region, ...
Planning            planning, temporal, reasoning, plan, language, ...
GUI                 ...

Page 23:

Applying Multinomial Naive Bayes Classifiers to Text Classification

positions ← all word positions in test document

$$c_{NB} = \operatorname*{argmax}_{c_j \in C} P(c_j) \prod_{i \in positions} P(x_i \mid c_j)$$

Page 25:

Text Classification and Naïve Bayes

Naïve Bayes: Learning

Page 26:

Learning the Multinomial Naïve Bayes Model

Sec. 13.3

• First attempt: maximum likelihood estimates
  – simply use the frequencies in the data

$$\hat{P}(w_i \mid c_j) = \frac{count(w_i, c_j)}{\sum_{w \in V} count(w, c_j)}$$

$$\hat{P}(c_j) = \frac{doccount(C = c_j)}{N_{doc}}$$
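
The two maximum likelihood estimates are plain relative frequencies, so they can be sketched with a couple of counters. The three-document corpus and the helper names `p_class` and `p_word` are invented for this illustration.

```python
from collections import Counter

# Hypothetical labeled corpus of (document, class) pairs.
train = [
    ("great fun great", "pos"),
    ("love this movie", "pos"),
    ("boring plot", "neg"),
]

doc_counts = Counter(label for _, label in train)
word_counts = {label: Counter() for label in doc_counts}
for text, label in train:
    word_counts[label].update(text.split())

def p_class(c: str) -> float:
    # doccount(C = c) / N_doc
    return doc_counts[c] / len(train)

def p_word(w: str, c: str) -> float:
    # count(w, c) divided by the total number of word tokens seen in class c
    return word_counts[c][w] / sum(word_counts[c].values())
```

Note that `p_word("great", "neg")` is exactly 0 here, because "great" never occurs in a "neg" document.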

Page 27:

Parameter estimation

$$\hat{P}(w_i \mid c_j) = \frac{count(w_i, c_j)}{\sum_{w \in V} count(w, c_j)}$$

fraction of times word wi appears among all words in documents of topic cj

• Create mega-document for topic j by concatenating all docs in this topic
  – Use frequency of w in mega-document

Page 28:

Problem with Maximum Likelihood

Sec. 13.3

• What if we have seen no training documents with the word fantastic and classified in the topic positive (thumbs-up)?

$$\hat{P}(\text{"fantastic"} \mid \text{positive}) = \frac{count(\text{"fantastic"}, \text{positive})}{\sum_{w \in V} count(w, \text{positive})} = 0$$

• Zero probabilities cannot be conditioned away, no matter the other evidence!

$$c_{MAP} = \operatorname*{argmax}_{c} \hat{P}(c) \prod_i \hat{P}(x_i \mid c)$$
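
The effect is easy to reproduce: a single zero factor wipes out the whole product, however strong the rest of the evidence is. The probabilities below are hypothetical unsmoothed estimates chosen only to show this.

```python
from math import prod

# Hypothetical MLE estimates of P(w | positive); "fantastic" was never seen with this class.
p_word_given_positive = {"great": 0.05, "love": 0.04, "fantastic": 0.0}
prior_positive = 0.5

doc = ["great", "love", "great", "fantastic"]
score = prior_positive * prod(p_word_given_positive[w] for w in doc)

# The same document without the unseen word keeps a non-zero score.
score_without = prior_positive * prod(p_word_given_positive[w] for w in doc[:-1])
```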

Page 29:

Laplace (add-1) smoothing: unknown words

Add one extra word to the vocabulary, the “unknown word” wu

$$\hat{P}(w_u \mid c) = \frac{count(w_u, c) + 1}{\left(\sum_{w \in V} count(w, c)\right) + |V| + 1}$$

$$= \frac{1}{\left(\sum_{w \in V} count(w, c)\right) + |V| + 1}$$
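
The second line follows because the unknown word never occurs in training, so its count is 0. A minimal sketch of the estimate, with one extra vocabulary slot reserved for the unknown word, is given below; the function name `laplace_prob` and the counts are assumptions made for illustration.

```python
from collections import Counter

def laplace_prob(word: str, class_counts: Counter, vocab: set) -> float:
    """Add-1 estimate of P(word | c), with one extra vocabulary slot for the unknown word."""
    total = sum(class_counts[w] for w in vocab)          # sum over w in V of count(w, c)
    count = class_counts[word] if word in vocab else 0   # unseen words map to w_u, whose count is 0
    return (count + 1) / (total + len(vocab) + 1)

# Hypothetical counts for one class and a three-word vocabulary.
counts = Counter({"great": 3, "love": 2})
vocab = {"great", "love", "boring"}

p_unknown = laplace_prob("fantastic", counts, vocab)
p_total = sum(laplace_prob(w, counts, vocab) for w in vocab) + p_unknown
```

With these counts the denominator is 5 + 3 + 1 = 9, so the unknown word gets probability 1/9 and the smoothed probabilities still sum to 1.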

Page 30:

Underflow Prevention: log space

• Multiplying lots of probabilities can result in floating-point underflow.
• Since log(xy) = log(x) + log(y)
  – Better to sum logs of probabilities instead of multiplying probabilities.
• Class with highest un-normalized log probability score is still most probable.
• Model is now just max of sum of weights

$$c_{NB} = \operatorname*{argmax}_{c_j \in C} \left[ \log P(c_j) + \sum_{i \in positions} \log P(x_i \mid c_j) \right]$$
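
The underflow is easy to trigger with double-precision floats. In the sketch below the per-word probability and the document length are arbitrary illustrative values; the product collapses to zero while the sum of logs stays an ordinary finite number.

```python
import math

p = 1e-5   # hypothetical per-word probability
n = 100    # hypothetical document length

product = 1.0
for _ in range(n):
    product *= p   # the true value, 1e-500, is below the smallest positive double

log_sum = sum(math.log(p) for _ in range(n))   # the same quantity in log space
```

Since log is monotonic, comparing `log_sum` values across classes picks the same class as comparing the products would, if the products were representable.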

Page 32:

Text Classification and Naïve Bayes

Multinomial Naïve Bayes: A Worked Example

Page 33:

          Doc  Words                                Class
Training  1    Chinese Beijing Chinese              c
          2    Chinese Chinese Shanghai             c
          3    Chinese Macao                        c
          4    Tokyo Japan Chinese                  j
Test      5    Chinese Chinese Chinese Tokyo Japan  ?

Estimators (the vocabulary has 6 words; the extra +1 in the denominator is the unknown word, which gives the 7 used below):

$$\hat{P}(c) = \frac{N_c}{N}$$

$$\hat{P}(w \mid c) = \frac{count(w, c) + 1}{count(c) + |V| + 1}$$

Priors:
P(c) = 3/4
P(j) = 1/4

Conditional Probabilities:
P(Chinese | c) = (5+1)/(8+7) = 6/15
P(Tokyo | c) = (0+1)/(8+7) = 1/15
P(Japan | c) = (0+1)/(8+7) = 1/15
P(Chinese | j) = (1+1)/(3+7) = 2/10
P(Tokyo | j) = (1+1)/(3+7) = 2/10
P(Japan | j) = (1+1)/(3+7) = 2/10

Choosing a class:
P(c | d5) ∝ 3/4 × (6/15)^3 × 1/15 × 1/15 ≈ 0.0002
P(j | d5) ∝ 1/4 × (2/10)^3 × 2/10 × 2/10 ≈ 0.00008

The score for c is larger, so document 5 is assigned class c.
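
The arithmetic of this example can be re-derived in a few lines of Python. The sketch assumes add-1 smoothing with the six training words plus one unknown-word slot (which is where the 7 in the denominators comes from) and uses exact fractions; the helper names `prior`, `cond`, and `score` are made up for this illustration.

```python
from collections import Counter
from fractions import Fraction

train = [
    ("Chinese Beijing Chinese", "c"),
    ("Chinese Chinese Shanghai", "c"),
    ("Chinese Macao", "c"),
    ("Tokyo Japan Chinese", "j"),
]
test_doc = "Chinese Chinese Chinese Tokyo Japan".split()

vocab = {w for text, _ in train for w in text.split()}   # 6 distinct words
doc_counts = Counter(label for _, label in train)
word_counts = {label: Counter() for label in doc_counts}
for text, label in train:
    word_counts[label].update(text.split())

def prior(c):
    return Fraction(doc_counts[c], len(train))

def cond(w, c):
    # Add-1 smoothing; |V| + 1 reserves a slot for the unknown word.
    return Fraction(word_counts[c][w] + 1, sum(word_counts[c].values()) + len(vocab) + 1)

def score(c, doc):
    s = prior(c)
    for w in doc:
        s *= cond(w, c)
    return s

scores = {c: score(c, test_doc) for c in doc_counts}
predicted = max(scores, key=scores.get)
```

The exact scores are 2/9375 (about 0.0002) for c and 1/12500 (0.00008) for j, so `predicted` is "c".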

Page 34:

Summary: Naive Bayes is Not So Naive

• Robust to Irrelevant Features
  Irrelevant Features cancel each other without affecting results
• Very good in domains with many equally important features
  Decision Trees suffer from fragmentation in such cases – especially if little data
• Optimal if the independence assumptions hold: If assumed independence is correct, then it is the Bayes Optimal Classifier for problem
• A good dependable baseline for text classification
  – But we will see other classifiers that give better accuracy

Page 36:

Text Classification and Naïve Bayes

Text Classification: Evaluation

Page 37:

The 2-by-2 contingency table

              correct  not correct
selected      tp       fp
not selected  fn       tn

Page 38:

Precision and recall

• Precision: % of selected items that are correct
• Recall: % of correct items that are selected

              correct  not correct
selected      tp       fp
not selected  fn       tn
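
In terms of the table cells, precision is tp / (tp + fp) and recall is tp / (tp + fn). A minimal sketch, with made-up counts:

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of selected items that are correct."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Fraction of correct items that are selected."""
    return tp / (tp + fn)

# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
p = precision(tp=8, fp=2)
r = recall(tp=8, fn=8)
```

Here the classifier is right about most of what it selects (precision 0.8) but finds only half of the correct items (recall 0.5).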

Page 39:

A combined measure: F

• A combined measure that assesses the P/R tradeoff is F measure (weighted harmonic mean):

$$F = \frac{1}{\alpha \frac{1}{P} + (1 - \alpha) \frac{1}{R}} = \frac{(\beta^2 + 1) P R}{\beta^2 P + R}$$

• People usually use balanced F1 measure
  – i.e., with β = 1 (that is, α = ½): F = 2PR/(P+R)
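
The two forms of the F measure agree when α = 1/(β² + 1), which can be checked numerically. The function names and the precision and recall values in this sketch are illustrative assumptions.

```python
def f_beta(p: float, r: float, beta: float = 1.0) -> float:
    """(beta^2 + 1) P R / (beta^2 P + R)."""
    b2 = beta * beta
    return (b2 + 1) * p * r / (b2 * p + r)

def f_alpha(p: float, r: float, alpha: float = 0.5) -> float:
    """Weighted harmonic mean: 1 / (alpha / P + (1 - alpha) / R)."""
    return 1 / (alpha / p + (1 - alpha) / r)

# Hypothetical precision and recall.
f1 = f_beta(0.8, 0.5)                  # balanced F1 = 2PR / (P + R)
f2 = f_beta(0.8, 0.5, beta=2.0)        # beta > 1 weights recall more heavily
f2_alt = f_alpha(0.8, 0.5, alpha=1 / (2.0**2 + 1))
```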

Page 40:

Confusion matrix c

• For each pair of classes <c1, c2> how many documents from c1 were incorrectly assigned to c2?
  – c3,2: 90 wheat documents incorrectly assigned to poultry

Docs in test set  Assigned UK  Assigned poultry  Assigned wheat  Assigned coffee  Assigned interest  Assigned trade
True UK           95           1                 13              0                1                  0
True poultry      0            1                 0               0                0                  0
True wheat        10           90                0               1                0                  0
True coffee       0            0                 0               34               3                  7
True interest     -            1                 2               13               26                 5
True trade        0            0                 2               14               5                  10

Page 41:

Per class evaluation measures

Sec. 15.2.4

Recall: Fraction of docs in class i classified correctly:

$$\frac{c_{ii}}{\sum_j c_{ij}}$$

Precision: Fraction of docs assigned class i that are actually about class i:

$$\frac{c_{ii}}{\sum_j c_{ji}}$$

Accuracy: (1 - error rate) Fraction of docs classified correctly:

$$\frac{\sum_i c_{ii}}{\sum_j \sum_i c_{ij}}$$
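
With the confusion matrix stored as a list of rows (true class) by columns (assigned class), recall divides a diagonal entry by its row sum and precision divides it by its column sum. The 3-class matrix below is invented for illustration.

```python
# Hypothetical confusion matrix: rows are true classes, columns are assigned classes.
c = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 3, 7],
]
n = len(c)

def recall(i: int) -> float:
    # c_ii divided by the sum of row i (all docs truly in class i)
    return c[i][i] / sum(c[i][j] for j in range(n))

def precision(i: int) -> float:
    # c_ii divided by the sum of column i (all docs assigned to class i)
    return c[i][i] / sum(c[j][i] for j in range(n))

# Sum of the diagonal divided by the sum of all entries.
accuracy = sum(c[i][i] for i in range(n)) / sum(sum(row) for row in c)
```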

Page 42:

Micro- vs. Macro-Averaging

Sec. 15.2.4

• If we have more than one class, how do we combine multiple performance measures into one quantity?
• Macroaveraging: Compute performance for each class, then average. Average on classes
• Microaveraging: Collect decisions for each instance from all classes, compute contingency table, evaluate. Average on instances

Page 43:

Micro- vs. Macro-Averaging: Example

Sec. 15.2.4

Class 1
                 Truth: yes  Truth: no
Classifier: yes  10          10
Classifier: no   10          970

Class 2
                 Truth: yes  Truth: no
Classifier: yes  90          10
Classifier: no   10          890

Micro Ave. Table
                 Truth: yes  Truth: no
Classifier: yes  100         20
Classifier: no   20          1860

• Macroaveraged precision: (0.5 + 0.9)/2 = 0.7
• Microaveraged precision: 100/120 = .83
• Microaveraged score is dominated by score on common classes
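
The two averages in this example can be recomputed from the "Classifier: yes" rows of the per-class tables. The dictionary layout below is just one convenient way to hold those counts.

```python
# (tp, fp) for each class, read from the "Classifier: yes" rows of the two tables.
per_class = {"Class 1": (10, 10), "Class 2": (90, 10)}

# Macro: compute precision per class, then average over classes.
macro = sum(tp / (tp + fp) for tp, fp in per_class.values()) / len(per_class)

# Micro: pool the counts into one table, then compute precision once.
tp_total = sum(tp for tp, _ in per_class.values())
fp_total = sum(fp for _, fp in per_class.values())
micro = tp_total / (tp_total + fp_total)
```

Class 2 contributes 100 of the 120 pooled "yes" decisions, which is why the micro average sits close to its precision of 0.9.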

Page 44:

Development Test Sets and Cross-validation

Training set | Development Test Set | Test Set

• Metric: P/R/F1 or Accuracy
• Unseen test set
  – avoid overfitting (‘tuning to the test set’)
  – more conservative estimate of performance
  – Cross-validation over multiple splits
    • Handle sampling errors from different datasets
  – Pool results over each split
  – Compute pooled dev set performance

[Figure: cross-validation splits. Each split takes a different part of the Training Set as the Dev Test portion; the Test Set is held out separately.]
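
One way to picture cross-validation with pooled results is sketched below. The round-robin fold assignment, the labels, and the majority-class "classifier" are all stand-ins chosen to keep the example short; they are not taken from the slides.

```python
from collections import Counter

def k_fold_splits(n_items: int, k: int):
    """Yield (train_idx, dev_idx); each index is in the dev portion of exactly one split."""
    for fold in range(k):
        dev = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, dev

# Hypothetical labels; the classifier is a majority-class baseline used only as a placeholder.
labels = ["pos", "pos", "neg", "pos", "neg", "pos", "pos", "neg", "pos", "pos"]

correct = 0
for train_idx, dev_idx in k_fold_splits(len(labels), k=5):
    majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    correct += sum(labels[i] == majority for i in dev_idx)   # pool decisions over splits

pooled_dev_accuracy = correct / len(labels)
```

A final test set would still be kept apart from all of these splits and used only once, after tuning on the pooled dev results.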
