Transcript of Dan Jurafsky's "Text Classification" slides (used in Wuwei Lan's course, SP19 3521)

Dan Jurafsky

Text Classification

• Assigning subject categories, topics, or genres
• Spam detection
• Authorship identification
• Age/gender identification
• Language identification
• Sentiment analysis
• …

Text Classification: definition

• Input:
  • a document d
  • a fixed set of classes C = {c1, c2, …, cJ}

• Output: a predicted class c ∈ C

Classification Methods: Supervised Machine Learning

• Input:
  • a document d
  • a fixed set of classes C = {c1, c2, …, cJ}
  • a training set of m hand-labeled documents (d1, c1), …, (dm, cm)

• Output:
  • a learned classifier γ: d → c

Classification Methods: Supervised Machine Learning

• Any kind of classifier:
  • Naïve Bayes
  • Logistic regression
  • Support-vector machines
  • k-Nearest Neighbors
  • …

Naïve Bayes Intuition

• Simple ("naïve") classification method based on Bayes rule
• Relies on a very simple representation of the document:
  • Bag of words

The bag of words representation

I love this movie! It's sweet, but with satirical humor. The dialogue is great and the adventure scenes are fun… It manages to be whimsical and romantic while laughing at the conventions of the fairy tale genre. I would recommend it to just about anyone. I've seen it several times, and I'm always happy to see it again whenever I have a friend who hasn't seen it yet.

γ(document) = c

The bag of words representation

γ(word counts) = c

great      2
love       2
recommend  1
laugh      1
happy      1
…          …
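The counts above can be produced with a few lines of Python; this is a minimal sketch in which lowercasing and whitespace splitting stand in for real tokenization, and the snippet text is illustrative.

```python
from collections import Counter

# Illustrative review snippet; a bag of words keeps only word counts
# and discards all position information.
doc = "I love this movie. It's sweet, and I love the great humor."
bag = Counter(doc.lower().split())

print(bag["love"])  # 2
```

A `Counter` is exactly a multiset of tokens, which is all the bag-of-words model needs.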

Multinomial Naïve Bayes Independence Assumptions

P(x1, x2, …, xn | c)

• Bag of Words assumption: assume position doesn't matter
• Conditional independence: assume the feature probabilities P(xi | cj) are independent given the class c:

P(x1, …, xn | c) = P(x1 | c) · P(x2 | c) · P(x3 | c) · … · P(xn | c)

Learning the Multinomial Naïve Bayes Model                                Sec. 13.3

• First attempt: maximum likelihood estimates
  • simply use the frequencies in the data

P̂(wi | cj) = count(wi, cj) / Σ_{w∈V} count(w, cj)

P̂(cj) = doccount(C = cj) / Ndoc
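These maximum-likelihood estimates are just ratios of counts; a small sketch with two made-up training documents:

```python
from collections import Counter

# Unsmoothed MLE estimates from raw counts; the two documents and
# their labels are illustrative only.
docs = [("chinese beijing chinese", "c"), ("tokyo japan chinese", "j")]

class_counts = Counter(label for _, label in docs)
word_counts = {c: Counter() for c in class_counts}
for text, label in docs:
    word_counts[label].update(text.split())

p_c = class_counts["c"] / len(docs)  # P̂(c): fraction of docs with class c
p_chinese_c = (word_counts["c"]["chinese"]
               / sum(word_counts["c"].values()))  # P̂(chinese | c) = 2/3

print(p_c, p_chinese_c)
```

Note that any word unseen in a class gets probability 0 here, which is exactly the problem smoothing fixes on the next slides.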

Multinomial Naïve Bayes: Learning

• From the training corpus, extract Vocabulary

• Calculate the P(cj) terms
  • For each cj in C do
    • docsj ← all docs with class = cj
    • P(cj) ← |docsj| / |total # documents|

• Calculate the P(wk | cj) terms
  • Textj ← single doc containing all of docsj
  • For each word wk in Vocabulary
    • nk ← # of occurrences of wk in Textj
    • P(wk | cj) ← (nk + α) / (n + α·|Vocabulary|), where n is the total number of word tokens in Textj
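The learning procedure above can be sketched directly in Python; a minimal trainer with add-α smoothing, where the function name and the example documents are illustrative.

```python
from collections import Counter

def train_nb(docs, alpha=1.0):
    """Train a multinomial Naive Bayes model with add-alpha smoothing.

    docs: list of (text, class_label) pairs; tokens are whitespace-split.
    Returns (priors, cond): P(c) and P(w | c) dictionaries.
    """
    classes = {label for _, label in docs}
    vocab = {w for text, _ in docs for w in text.split()}
    priors, cond = {}, {}
    for c in classes:
        class_texts = [text for text, label in docs if label == c]
        priors[c] = len(class_texts) / len(docs)
        # "Textj": one mega-document holding all docs of class c
        mega = Counter(w for text in class_texts for w in text.split())
        n = sum(mega.values())  # total word tokens in class c
        cond[c] = {w: (mega[w] + alpha) / (n + alpha * len(vocab))
                   for w in vocab}
    return priors, cond

priors, cond = train_nb([("chinese beijing chinese", "c"),
                         ("tokyo japan chinese", "j")])
print(cond["c"]["chinese"])  # (2+1)/(3+4) = 3/7
```

With α = 1 this is the add-1 (Laplace) smoothing used in the worked example below.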

Choosing a class: P(c | d5)

With add-1 smoothing:

P̂(w | c) = (count(w, c) + 1) / (count(c) + |V|)
P̂(c) = Nc / N

          Doc  Words                                 Class
Training   1   Chinese Beijing Chinese               c
           2   Chinese Chinese Shanghai              c
           3   Chinese Macao                         c
           4   Tokyo Japan Chinese                   j
Test       5   Chinese Chinese Chinese Tokyo Japan   ?

Priors:
P(c) = 3/4    P(j) = 1/4

Conditional probabilities:
P(Chinese | c) = (5+1)/(8+6) = 6/14 = 3/7
P(Tokyo | c)   = (0+1)/(8+6) = 1/14
P(Japan | c)   = (0+1)/(8+6) = 1/14
P(Chinese | j) = (1+1)/(3+6) = 2/9
P(Tokyo | j)   = (1+1)/(3+6) = 2/9
P(Japan | j)   = (1+1)/(3+6) = 2/9

Choosing a class:
P(c | d5) ∝ 3/4 · (3/7)³ · 1/14 · 1/14 ≈ 0.0003
P(j | d5) ∝ 1/4 · (2/9)³ · 2/9 · 2/9 ≈ 0.0001
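The arithmetic in the worked example above is easy to check by hand in Python; a sketch that just plugs in the smoothed estimates from the slide:

```python
# Probabilities from the worked example (add-1 smoothing).
p_c, p_j = 3/4, 1/4
p_chinese_c, p_tokyo_c, p_japan_c = 3/7, 1/14, 1/14
p_chinese_j, p_tokyo_j, p_japan_j = 2/9, 2/9, 2/9

# Test doc d5 = "Chinese Chinese Chinese Tokyo Japan"
score_c = p_c * p_chinese_c**3 * p_tokyo_c * p_japan_c
score_j = p_j * p_chinese_j**3 * p_tokyo_j * p_japan_j
print(round(score_c, 4), round(score_j, 4))  # ≈ 0.0003 vs ≈ 0.0001
```

Since score_c > score_j, the classifier picks class c, matching the slide.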

Underflow Prevention: log space

• Multiplying lots of probabilities can result in floating-point underflow.
• Since log(xy) = log(x) + log(y):
  • better to sum logs of probabilities instead of multiplying probabilities.
• The class with the highest un-normalized log probability score is still the most probable.
• The model is now just a max of a sum of weights:

cNB = argmax_{cj∈C} [ log P(cj) + Σ_{i∈positions} log P(xi | cj) ]
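A minimal sketch of the log-space version, reusing the worked-example probabilities; because log is monotonic, the argmax decision is unchanged.

```python
import math

def log_score(prior, cond_probs):
    """log P(c) + sum of log P(xi | c) over the document's positions."""
    return math.log(prior) + sum(math.log(p) for p in cond_probs)

# d5 = "Chinese Chinese Chinese Tokyo Japan", probabilities from the example
score_c = log_score(3/4, [3/7] * 3 + [1/14, 1/14])
score_j = log_score(1/4, [2/9] * 5)
print(score_c > score_j)  # True: same decision, no underflow risk
```

Sums of moderate negative numbers stay well inside floating-point range even for very long documents, which is the whole point of the transformation.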

Summary: Naive Bayes is Not So Naive

• Very fast, low storage requirements
• Robust to irrelevant features
  • Irrelevant features cancel each other without affecting results
• Very good in domains with many equally important features
  • Decision trees suffer from fragmentation in such cases, especially with little data
• Optimal if the independence assumptions hold: if the assumed independence is correct, then it is the Bayes Optimal Classifier for the problem
• A good, dependable baseline for text classification
  • But we will see other classifiers that give better accuracy

Text Classification: Evaluation

The 2-by-2 contingency table

               correct   not correct
selected       tp        fp
not selected   fn        tn

Precision and recall

• Precision: % of selected items that are correct
• Recall: % of correct items that are selected

               correct   not correct
selected       tp        fp
not selected   fn        tn
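From the table, precision and recall are simple ratios; a sketch with illustrative counts:

```python
# Illustrative contingency-table counts.
tp, fp, fn, tn = 30, 10, 20, 940

precision = tp / (tp + fp)  # fraction of selected items that are correct
recall = tp / (tp + fn)     # fraction of correct items that are selected
print(precision, recall)    # 0.75 0.6
```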

A combined measure: F

• A combined measure that assesses the P/R tradeoff is the F measure (weighted harmonic mean):

F = 1 / (α·(1/P) + (1−α)·(1/R)) = (β² + 1)·P·R / (β²·P + R)

• The harmonic mean is a very conservative average; see IIR §8.3
• People usually use the balanced F1 measure
  • i.e., with β = 1 (that is, α = ½): F = 2PR/(P + R)
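The β-form of the formula is a one-liner; a sketch, with the precision and recall values chosen for illustration:

```python
def f_beta(p, r, beta=1.0):
    """Weighted harmonic mean of precision and recall."""
    return (beta**2 + 1) * p * r / (beta**2 * p + r)

# With beta = 1 this reduces to the balanced F1 = 2PR/(P+R).
print(f_beta(0.75, 0.6))  # 2*0.75*0.6/1.35 = 2/3
```

β > 1 weights recall more heavily, β < 1 weights precision more heavily.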

More Than Two Classes: Sets of binary classifiers                         Sec. 14.5

• Dealing with any-of or multivalue classification
  • A document can belong to 0, 1, or more than 1 classes.
• For each class c ∈ C
  • Build a classifier γc to distinguish c from all other classes c′ ∈ C
• Given test doc d,
  • Evaluate it for membership in each class using each γc
  • d belongs to any class for which γc returns true

More Than Two Classes: Sets of binary classifiers                         Sec. 14.5

• One-of or multinomial classification
  • Classes are mutually exclusive: each document is in exactly one class
• For each class c ∈ C
  • Build a classifier γc to distinguish c from all other classes c′ ∈ C
• Given test doc d,
  • Evaluate it for membership in each class using each γc
  • d belongs to the one class with the maximum score
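The one-of decision rule is just an argmax over per-class scores; a sketch in which the scorer functions and class names are stand-ins for trained binary classifiers.

```python
def classify_one_of(doc, scorers):
    """Pick the single class whose scorer gives doc the maximum score.

    scorers: dict mapping class name -> function(doc) -> score.
    """
    return max(scorers, key=lambda c: scorers[c](doc))

# Toy scorers: raw keyword counts stand in for real classifier scores.
scorers = {"sports": lambda d: d.count("game"),
           "politics": lambda d: d.count("vote")}
print(classify_one_of("the game was a great game", scorers))  # sports
```

For the any-of case on the previous slide, one would instead threshold each score independently and return the set of classes that pass.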

Confusion matrix c

• For each pair of classes <c1, c2>, how many documents from c1 were incorrectly assigned to c2?
  • c3,2: 90 wheat documents incorrectly assigned to poultry

Docs in test set   Assigned UK  Assigned poultry  Assigned wheat  Assigned coffee  Assigned interest  Assigned trade
True UK                 95            1                13               0                 1                 0
True poultry             0            1                 0               0                 0                 0
True wheat              10           90                 0               1                 0                 0
True coffee              0            0                 0              34                 3                 7
True interest            -            1                 2              13                26                 5
True trade               0            0                 2              14                 5                10

Per-class evaluation measures                                             Sec. 15.2.4

Recall: fraction of docs in class i classified correctly:

    cii / Σj cij

Precision: fraction of docs assigned class i that are actually about class i:

    cii / Σj cji

Accuracy (1 − error rate): fraction of docs classified correctly:

    Σi cii / Σi Σj cij
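These per-class measures fall out of the confusion matrix directly; a sketch with a small illustrative 2-class matrix, using the convention c[i][j] = docs truly in class i assigned to class j.

```python
# Illustrative confusion matrix: rows = true class, columns = assigned class.
c = [[8, 2],
     [1, 9]]

def recall(i):
    # diagonal entry over the row sum (all docs truly in class i)
    return c[i][i] / sum(c[i])

def precision(i):
    # diagonal entry over the column sum (all docs assigned class i)
    return c[i][i] / sum(row[i] for row in c)

accuracy = sum(c[i][i] for i in range(len(c))) / sum(map(sum, c))
print(recall(0), precision(0), accuracy)  # 0.8, 8/9, 0.85
```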

Micro- vs. Macro-Averaging                                                Sec. 15.2.4

• If we have more than one class, how do we combine multiple performance measures into one quantity?
• Macroaveraging: compute performance for each class, then average.
• Microaveraging: collect decisions for all classes, compute one contingency table, evaluate.

Micro- vs. Macro-Averaging: Example                                       Sec. 15.2.4

Class 1:
                 Truth: yes   Truth: no
Classifier: yes      10           10
Classifier: no       10          970

Class 2:
                 Truth: yes   Truth: no
Classifier: yes      90           10
Classifier: no       10          890

Micro Ave. Table (pooled):
                 Truth: yes   Truth: no
Classifier: yes     100           20
Classifier: no       20         1860

• Macroaveraged precision: (0.5 + 0.9)/2 = 0.7
• Microaveraged precision: 100/120 ≈ 0.83
• The microaveraged score is dominated by the score on the common classes
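The two averages in this example can be reproduced directly from the (tp, fp) counts in the tables; a minimal sketch:

```python
# (tp, fp) per class, read off the two class tables above.
class1 = (10, 10)   # precision 10/20 = 0.5
class2 = (90, 10)   # precision 90/100 = 0.9

# Macro: average the per-class precisions.
macro = sum(tp / (tp + fp) for tp, fp in (class1, class2)) / 2

# Micro: pool the counts into one table, then compute precision once.
tp_all = class1[0] + class2[0]
fp_all = class1[1] + class2[1]
micro = tp_all / (tp_all + fp_all)

print(macro, micro)  # macro ≈ 0.7, micro = 100/120 ≈ 0.83
```

Class 2 has ten times the positives of class 1, so its strong precision dominates the micro average but counts only once in the macro average.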

Development Test Sets and Cross-validation

• Metric: P/R/F1 or accuracy
• Unseen test set
  • avoids overfitting ("tuning to the test set")
  • more conservative estimate of performance
• Cross-validation over multiple splits
  • handles sampling errors from different datasets
  • pool results over each split
  • compute pooled dev set performance

[Figure: the data is divided into a training set, a development test set, and a held-out test set; under cross-validation, the training/dev split rotates across multiple folds while the test set stays untouched.]
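The rotating train/dev splits in the figure can be sketched as a simple k-fold generator; the function name and the toy data are illustrative, and a real setup would shuffle the data first and keep a separate held-out test set.

```python
def kfold(items, k=3):
    """Yield (train, dev) splits: each fold serves once as the dev set."""
    folds = [items[i::k] for i in range(k)]  # round-robin assignment
    for i in range(k):
        dev = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, dev

data = list(range(6))
for train, dev in kfold(data):
    print(dev, train)
```

Per-split dev results are then pooled into one overall performance estimate, as described above.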