The Naïve Bayes Classifier - svivek · •The naïve Bayes Classifier •Learning the naïve Bayes...


Machine Learning

The Naïve Bayes Classifier

Today’s lecture

• The naïve Bayes Classifier
• Learning the naïve Bayes Classifier
• Practical concerns

Where are we?

We have seen Bayesian learning
– Using a probabilistic criterion to select a hypothesis
– Maximum a posteriori and maximum likelihood learning
– You should know what the difference between them is

We could also learn functions that predict probabilities of outcomes
– Different from using a probabilistic criterion to learn

Maximum a posteriori (MAP) prediction, as opposed to MAP learning

MAP prediction

Using the Bayes rule for predicting $y$ given an input $\mathbf{x}$

\[ P(Y = y \mid X = \mathbf{x}) = \frac{P(X = \mathbf{x} \mid Y = y)\, P(Y = y)}{P(X = \mathbf{x})} \]

The left-hand side is the posterior probability of the label being $y$ for this input $\mathbf{x}$

Predict the label $y$ for the input $\mathbf{x}$ using

\[ \arg\max_y \frac{P(X = \mathbf{x} \mid Y = y)\, P(Y = y)}{P(X = \mathbf{x})} = \arg\max_y P(X = \mathbf{x} \mid Y = y)\, P(Y = y) \]

The denominator $P(X = \mathbf{x})$ does not depend on $y$, so it can be dropped from the argmax.

Don’t confuse with MAP learning, which finds a hypothesis by

\[ h_{MAP} = \arg\max_h P(D \mid h)\, P(h) \]

MAP prediction

Predict the label $y$ for the input $\mathbf{x}$ using

\[ \arg\max_y P(X = \mathbf{x} \mid Y = y)\, P(Y = y) \]

– $P(X = \mathbf{x} \mid Y = y)$: likelihood of observing this input $\mathbf{x}$ when the label is $y$
– $P(Y = y)$: prior probability of the label being $y$

All we need are these two sets of probabilities

Example: Tennis again

Prior: without any other information, what is the prior probability that I should play tennis?

Play tennis   P(Play tennis)
Yes           0.3
No            0.7

Likelihood: on days that I do play tennis, what is the probability that the temperature is T and the wind is W?

Temperature   Wind     P(T, W | Tennis = Yes)
Hot           Strong   0.15
Hot           Weak     0.4
Cold          Strong   0.1
Cold          Weak     0.35

Likelihood: on days that I don’t play tennis, what is the probability that the temperature is T and the wind is W?

Temperature   Wind     P(T, W | Tennis = No)
Hot           Strong   0.4
Hot           Weak     0.1
Cold          Strong   0.3
Cold          Weak     0.2

Input: Temperature = Hot (H), Wind = Weak (W)

Should I play tennis?

argmax_y P(H, W | play?) P(play?)

P(H, W | Yes) P(Yes) = 0.4 × 0.3 = 0.12
P(H, W | No) P(No) = 0.1 × 0.7 = 0.07

MAP prediction = Yes
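The tennis calculation can be reproduced in a few lines. The sketch below copies the prior and the two likelihood tables from the slide; the function names `map_scores` and `map_predict` are illustrative choices, not part of the lecture.

```python
# Prior P(Tennis) and joint likelihood tables P(T, W | Tennis), copied from the slide.
prior = {"Yes": 0.3, "No": 0.7}
likelihood = {
    "Yes": {("Hot", "Strong"): 0.15, ("Hot", "Weak"): 0.4,
            ("Cold", "Strong"): 0.1, ("Cold", "Weak"): 0.35},
    "No": {("Hot", "Strong"): 0.4, ("Hot", "Weak"): 0.1,
           ("Cold", "Strong"): 0.3, ("Cold", "Weak"): 0.2},
}


def map_scores(x):
    """Return the unnormalized posterior P(x | y) P(y) for every label y."""
    return {y: likelihood[y][x] * prior[y] for y in prior}


def map_predict(x):
    """Return the label with the highest unnormalized posterior."""
    scores = map_scores(x)
    return max(scores, key=scores.get)


scores = map_scores(("Hot", "Weak"))
prediction = map_predict(("Hot", "Weak"))
print(scores, prediction)
```

The scores are not normalized, so they do not sum to 1; dividing each by their sum would give the posterior, but the argmax is the same either way.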

How hard is it to learn probabilistic models?

      O   T   H   W   Play?
 1    S   H   H   W   -
 2    S   H   H   S   -
 3    O   H   H   W   +
 4    R   M   H   W   +
 5    R   C   N   W   +
 6    R   C   N   S   -
 7    O   C   N   S   +
 8    S   M   H   W   -
 9    S   C   N   W   +
10    R   M   N   W   +
11    S   M   N   S   +
12    O   M   H   S   +
13    O   H   N   W   +
14    R   M   H   S   -

Outlook:     S(unny), O(vercast), R(ainy)
Temperature: H(ot), M(edium), C(ool)
Humidity:    H(igh), N(ormal), L(ow)
Wind:        S(trong), W(eak)

We need to learn
1. The prior P(Play?)
2. The likelihoods P(x | Play?)

Prior P(Play?)
• A single number (Why only one?)

Likelihood P(X | Play?)
• There are 4 features, with 3, 3, 3 and 2 possible values respectively
• For each value of Play? (+/-), we need a value for each possible assignment: P(O, T, H, W | Play?)
• (3 ⋅ 3 ⋅ 3 ⋅ 2 − 1) = 53 parameters in each case, one for each assignment

How hard is it to learn probabilistic models?

In general

Prior P(Y)
• If there are k labels, then k − 1 parameters (why not k?)

Likelihood P(X | Y)
• If there are d Boolean features:
  – We need a value for each possible $P(x_1, x_2, \cdots, x_d \mid y)$ for each $y$
  – $k(2^d - 1)$ parameters

Need a lot of data to estimate this many numbers!
– High model complexity
– If there is very limited data, high variance in the parameters

How can we deal with this?

Answer: Make independence assumptions

Recall: Conditional independence

Suppose X, Y and Z are random variables

X is conditionally independent of Y given Z if the probability distribution of X is independent of the value of Y when Z is observed

\[ P(X \mid Y, Z) = P(X \mid Z) \]

Or equivalently

\[ P(X, Y \mid Z) = P(X \mid Z)\, P(Y \mid Z) \]
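A small numerical check can make the definition concrete. The sketch below builds a joint distribution over three binary variables that is conditionally independent by construction (all numbers are made up for illustration) and confirms that P(X | Y, Z) matches P(X | Z) for every assignment.

```python
import itertools

# Hypothetical distribution built so that X and Y are conditionally independent given Z.
p_z = {0: 0.3, 1: 0.7}
p_x_given_z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # p_x_given_z[z][x]
p_y_given_z = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.6, 1: 0.4}}  # p_y_given_z[z][y]

joint = {
    (x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
    for x, y, z in itertools.product([0, 1], repeat=3)
}


def prob(**fixed):
    """Sum the joint over all assignments consistent with the fixed values of x, y, z."""
    return sum(
        p for (x, y, z), p in joint.items()
        if all({"x": x, "y": y, "z": z}[name] == value for name, value in fixed.items())
    )


# Largest difference between P(X | Y, Z) and P(X | Z) over all assignments.
max_gap = max(
    abs(prob(x=x, y=y, z=z) / prob(y=y, z=z) - prob(x=x, z=z) / prob(z=z))
    for x, y, z in itertools.product([0, 1], repeat=3)
)
print(max_gap)
```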

Modeling the features

$P(x_1, x_2, \cdots, x_d \mid y)$ required $k(2^d - 1)$ parameters

What if all the features were conditionally independent given the label? (The Naïve Bayes Assumption)

That is,

\[ P(x_1, x_2, \cdots, x_d \mid y) = P(x_1 \mid y)\, P(x_2 \mid y) \cdots P(x_d \mid y) \]

Requires only d numbers for each label. kd parameters overall. Not bad!
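To get a feel for the savings, the two parameter counts can be compared directly. This is a minimal sketch for k labels and d Boolean features; the function names are illustrative.

```python
def full_joint_params(k, d):
    """Parameters for the full table P(x_1, ..., x_d | y): k * (2^d - 1)."""
    return k * (2 ** d - 1)


def naive_bayes_params(k, d):
    """Parameters for the factored likelihood: one number per feature per label."""
    return k * d


for d in (4, 10, 30):
    print(d, full_joint_params(2, d), naive_bayes_params(2, d))
```

The full table grows exponentially in d, while the naïve Bayes likelihood grows linearly.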

The Naïve Bayes Classifier

Assumption: Features are conditionally independent given the label Y

To predict, we need two sets of probabilities
– Prior $P(y)$
– For each $x_j$, we have the likelihood $P(x_j \mid y)$

Decision rule

\[ h_{NB}(\mathbf{x}) = \arg\max_y P(y)\, P(x_1, x_2, \cdots, x_d \mid y) = \arg\max_y P(y) \prod_j P(x_j \mid y) \]
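The decision rule translates almost directly into code. The sketch below is one possible implementation, not the lecture's: it sums log probabilities instead of multiplying probabilities (a common choice to avoid numerical underflow, which leaves the argmax unchanged), and the data layout `likelihood[y][j][v]` and all numbers are assumptions made for illustration.

```python
import math


def nb_predict(x, prior, likelihood):
    """Naive Bayes decision rule: argmax_y P(y) * prod_j P(x_j | y).

    prior[y] is P(y); likelihood[y][j][v] is P(x_j = v | y).
    Sums of logs replace the product; this assumes every probability
    that is looked up is strictly positive.
    """
    def log_score(y):
        return math.log(prior[y]) + sum(
            math.log(likelihood[y][j][v]) for j, v in enumerate(x)
        )
    return max(prior, key=log_score)


# Hypothetical numbers for two binary features, for illustration only.
prior = {"+": 0.5, "-": 0.5}
likelihood = {
    "+": [{0: 0.2, 1: 0.8}, {0: 0.4, 1: 0.6}],
    "-": [{0: 0.7, 1: 0.3}, {0: 0.5, 1: 0.5}],
}
print(nb_predict((1, 1), prior, likelihood))
print(nb_predict((0, 0), prior, likelihood))
```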

Decision boundaries of naïve Bayes

What is the decision boundary of the naïve Bayes classifier?

Consider the two class case. We predict the label to be + if

\[ P(y = +) \prod_j P(x_j \mid y = +) > P(y = -) \prod_j P(x_j \mid y = -) \]

that is, if

\[ \frac{P(y = +) \prod_j P(x_j \mid y = +)}{P(y = -) \prod_j P(x_j \mid y = -)} > 1 \]

Taking log and simplifying, we get

\[ \log \frac{P(y = - \mid \mathbf{x})}{P(y = + \mid \mathbf{x})} = \mathbf{w}^T \mathbf{x} + b \]

This is a linear function of the feature space!

Easy to prove. See note on course website
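The linearity claim can be checked numerically for Boolean features. The sketch below assumes Bernoulli likelihoods with made-up parameters `p`, `a[j]` and `b[j]` (the P(y = +), P(x_j = 1 | y = +) and P(x_j = 1 | y = −) of this example), derives a weight vector and bias by collecting the terms that multiply each x_j, and compares the linear form against the log-odds computed directly from the naïve Bayes products.

```python
import itertools
import math

# Hypothetical parameters for d = 3 Boolean features (illustration only).
p = 0.4               # P(y = +)
a = [0.9, 0.2, 0.6]   # a[j] = P(x_j = 1 | y = +)
b = [0.3, 0.5, 0.1]   # b[j] = P(x_j = 1 | y = -)


def log_odds(x):
    """log P(y = - | x) / P(y = + | x), computed from the naive Bayes products."""
    pos = p * math.prod(aj if xj else 1 - aj for aj, xj in zip(a, x))
    neg = (1 - p) * math.prod(bj if xj else 1 - bj for bj, xj in zip(b, x))
    return math.log(neg / pos)


# Weights and bias obtained by collecting the terms that multiply each x_j.
w = [math.log(bj * (1 - aj) / (aj * (1 - bj))) for aj, bj in zip(a, b)]
bias = math.log((1 - p) / p) + sum(
    math.log((1 - bj) / (1 - aj)) for aj, bj in zip(a, b)
)

# Largest difference between the direct log-odds and w^T x + bias over all 2^d inputs.
max_gap = max(
    abs(log_odds(x) - (sum(wj * xj for wj, xj in zip(w, x)) + bias))
    for x in itertools.product([0, 1], repeat=len(a))
)
print(max_gap)
```

The evidence term P(x) cancels in the ratio, which is why the products alone are enough to compute the log-odds.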

Today’s lecture

• The naïve Bayes Classifier
• Learning the naïve Bayes Classifier
• Practical concerns

Learning the naïve Bayes Classifier

• What is the hypothesis function h defined by?
  – A collection of probabilities
    • Prior for each label: $P(y)$
    • Likelihoods for feature $x_j$ given a label: $P(x_j \mid y)$

Suppose we have a dataset $D = \{(\mathbf{x}_i, y_i)\}$ with m examples, and we want to learn the classifier in a probabilistic way
– What is a probabilistic criterion to select the hypothesis?

A note on convention for this section:
• Examples in the dataset are indexed by the subscript $i$ (e.g. $\mathbf{x}_i$)
• Features within an example are indexed by the subscript $j$
• The $j$th feature of the $i$th example will be $x_{ij}$

Learning the naïve Bayes Classifier

Maximum likelihood estimation

\[ h_{ML} = \arg\max_h P(D \mid h) \]

Here h is defined by all the probabilities used to construct the naïve Bayes decision

Given a dataset $D = \{(\mathbf{x}_i, y_i)\}$ with m examples

Each example in the dataset is independent and identically distributed, so we can represent $P(D \mid h)$ as this product

\[ h_{ML} = \arg\max_h \prod_{i=1}^{m} P(\mathbf{x}_i, y_i \mid h) \]

Each factor asks “What probability would this particular h assign to the pair $(\mathbf{x}_i, y_i)$?”

\[ h_{ML} = \arg\max_h \prod_{i=1}^{m} P(\mathbf{x}_i \mid y_i, h)\, P(y_i \mid h) \]

The Naïve Bayes assumption ($x_{ij}$ is the $j$th feature of $\mathbf{x}_i$)

\[ h_{ML} = \arg\max_h \prod_{i=1}^{m} P(y_i \mid h) \prod_j P(x_{ij} \mid y_i, h) \]

How do we proceed? Take the log, which turns the products into sums without changing the argmax

\[ h_{ML} = \arg\max_h \sum_{i=1}^{m} \log P(y_i \mid h) + \sum_{i=1}^{m} \sum_j \log P(x_{ij} \mid y_i, h) \]

What next?

We need to make a modeling assumption about the functional form of these probability distributions

Learning the naïve Bayes Classifier

Maximum likelihood estimation

For simplicity, suppose there are two labels 1 and 0 and all features are binary

• Prior: $P(y = 1) = p$ and $P(y = 0) = 1 - p$
  – That is, the prior probability is from the Bernoulli distribution.

• Likelihood for each feature given a label
  – $P(x_j = 1 \mid y = 1) = a_j$ and $P(x_j = 0 \mid y = 1) = 1 - a_j$
  – $P(x_j = 1 \mid y = 0) = b_j$ and $P(x_j = 0 \mid y = 0) = 1 - b_j$
  – That is, the likelihood of each feature is also from the Bernoulli distribution.

h consists of p, all the a’s and b’s

Learning the naïve Bayes Classifier

Maximum likelihood estimation

• Prior: $P(y = 1) = p$ and $P(y = 0) = 1 - p$

\[ P(y_i \mid h) = p^{[y_i = 1]} (1 - p)^{[y_i = 0]} \]

$[z]$ is called the indicator function or the Iverson bracket. Its value is 1 if the argument $z$ is true and zero otherwise

• Likelihood for each feature given a label
  – $P(x_j = 1 \mid y = 1) = a_j$ and $P(x_j = 0 \mid y = 1) = 1 - a_j$
  – $P(x_j = 1 \mid y = 0) = b_j$ and $P(x_j = 0 \mid y = 0) = 1 - b_j$

\[ P(x_{ij} \mid y_i, h) = a_j^{[y_i = 1,\, x_{ij} = 1]} (1 - a_j)^{[y_i = 1,\, x_{ij} = 0]}\, b_j^{[y_i = 0,\, x_{ij} = 1]} (1 - b_j)^{[y_i = 0,\, x_{ij} = 0]} \]

Learning the naïve Bayes Classifier

Substituting and deriving the argmax, we get

\[ P(y = 1) = p = \frac{\sum_i [y_i = 1]}{m} \]

\[ P(x_j = 1 \mid y = 1) = a_j = \frac{\sum_i [y_i = 1,\, x_{ij} = 1]}{\sum_i [y_i = 1]} \]

\[ P(x_j = 1 \mid y = 0) = b_j = \frac{\sum_i [y_i = 0,\, x_{ij} = 1]}{\sum_i [y_i = 0]} \]

Each estimate is a count of matching examples divided by the count of examples with that label (or, for the prior, by the total number of examples m).
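The maximum likelihood estimates for the binary case reduce to counting. A minimal sketch, assuming labels in {0, 1}, binary features, and at least one example of each label; the function name `fit_bernoulli_nb` and the toy dataset are illustrative.

```python
def fit_bernoulli_nb(X, y):
    """Maximum likelihood estimates for binary features and labels in {0, 1}.

    Returns (p, a, b) with p = P(y = 1), a[j] = P(x_j = 1 | y = 1)
    and b[j] = P(x_j = 1 | y = 0). Assumes both labels occur in y.
    """
    m, d = len(X), len(X[0])
    n1 = sum(y)   # Count(y_i = 1)
    n0 = m - n1   # Count(y_i = 0)
    p = n1 / m
    a = [sum(x[j] for x, yi in zip(X, y) if yi == 1) / n1 for j in range(d)]
    b = [sum(x[j] for x, yi in zip(X, y) if yi == 0) / n0 for j in range(d)]
    return p, a, b


# Hypothetical toy dataset: 6 examples, 2 binary features.
X = [(1, 0), (1, 1), (0, 1), (0, 0), (1, 1), (0, 1)]
y = [1, 1, 1, 0, 0, 0]
p, a, b = fit_bernoulli_nb(X, y)
print(p, a, b)
```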

Let’s learn a naïve Bayes classifier

With the assumption that all our probabilities are from the Bernoulli distribution

      O   T   H   W   Play?
 1    S   H   H   W   -
 2    S   H   H   S   -
 3    O   H   H   W   +
 4    R   M   H   W   +
 5    R   C   N   W   +
 6    R   C   N   S   -
 7    O   C   N   S   +
 8    S   M   H   W   -
 9    S   C   N   W   +
10    R   M   N   W   +
11    S   M   N   S   +
12    O   M   H   S   +
13    O   H   N   W   +
14    R   M   H   S   -

P(Play = +) = 9/14        P(Play = −) = 5/14

P(O = S | Play = +) = 2/9
P(O = R | Play = +) = 3/9
P(O = O | Play = +) = 4/9

And so on, for other attributes and also for Play = −
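These fractions can be recovered by counting over the 14 examples. In the sketch below each row of the table is encoded as a five-character string (Outlook, Temperature, Humidity, Wind, Play?); that encoding and the variable names are choices made for this example.

```python
from collections import Counter
from fractions import Fraction

# The 14 examples from the table: Outlook, Temperature, Humidity, Wind, Play?
ROWS = ["SHHW-", "SHHS-", "OHHW+", "RMHW+", "RCNW+", "RCNS-", "OCNS+",
        "SMHW-", "SCNW+", "RMNW+", "SMNS+", "OMHS+", "OHNW+", "RMHS-"]

# Prior: fraction of examples with each label.
label_counts = Counter(row[4] for row in ROWS)
prior = {y: Fraction(c, len(ROWS)) for y, c in label_counts.items()}

# Likelihood for Outlook: count each (value, label) pair, then normalize by the label count.
outlook_counts = Counter((row[0], row[4]) for row in ROWS)
outlook_given = {
    (o, y): Fraction(outlook_counts[(o, y)], label_counts[y])
    for o in "SOR" for y in "+-"
}

print(prior["+"], prior["-"])
print(outlook_given[("S", "+")], outlook_given[("R", "+")], outlook_given[("O", "+")])
```

`Fraction` keeps the estimates exact, though it prints them in lowest terms.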

Naïve Bayes: Learning and Prediction

• Learning
  – Count how often features occur with each label. Normalize to get likelihoods
  – Priors from fraction of examples with each label
  – Generalizes to multiclass

• Prediction
  – Use learned probabilities to find highest scoring label
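Putting learning and prediction together gives a complete, if minimal, classifier. The sketch below trains on the 14 tennis examples (each row encoded as Outlook, Temperature, Humidity, Wind, Play?) and scores a new day; the query "SCHS" (Sunny, Cool, High humidity, Strong wind) and the function names are assumptions for illustration.

```python
from collections import Counter

# The 14 examples from the table: Outlook, Temperature, Humidity, Wind, Play?
ROWS = ["SHHW-", "SHHS-", "OHHW+", "RMHW+", "RCNW+", "RCNS-", "OCNS+",
        "SMHW-", "SCNW+", "RMNW+", "SMNS+", "OMHS+", "OHNW+", "RMHS-"]


def learn(rows):
    """Estimate priors and per-feature likelihoods by counting and normalizing."""
    label_counts = Counter(row[-1] for row in rows)
    prior = {y: c / len(rows) for y, c in label_counts.items()}
    pair_counts = Counter(
        (j, row[j], row[-1]) for row in rows for j in range(len(row) - 1)
    )
    # likelihood[(j, v, y)] = P(x_j = v | y)
    likelihood = {key: c / label_counts[key[2]] for key, c in pair_counts.items()}
    return prior, likelihood


def predict(x, prior, likelihood):
    """Return the highest scoring label and all scores P(y) * prod_j P(x_j | y)."""
    scores = {}
    for y in prior:
        score = prior[y]
        for j, v in enumerate(x):
            # A (feature, value, label) combination never seen in training gets probability 0.
            score *= likelihood.get((j, v, y), 0.0)
        scores[y] = score
    return max(scores, key=scores.get), scores


prior, likelihood = learn(ROWS)
label, scores = predict("SCHS", prior, likelihood)
print(label, scores)
```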

Today’s lecture

• The naïve Bayes Classifier
• Learning the naïve Bayes Classifier
• Practical concerns + an example

Important caveats with Naïve Bayes

1. Features need not be conditionally independent given the label
   – Just because we assume that they are doesn’t mean that that’s how they behave in nature
   – We made a modeling assumption because it makes computation and learning easier

2. Not enough training data to get good estimates of the probabilities from counts

Page 72

Important caveats with Naïve Bayes

1. Features are not conditionally independent given the label

All bets are off if the naïve Bayes assumption is not satisfied

And yet, very often used in practice because of simplicity
Works reasonably well even when the assumption is violated

Page 75

Important caveats with Naïve Bayes

2. Not enough training data to get good estimates of the probabilities from counts

The basic operation for learning likelihoods is counting how often a feature occurs with a label.

What if we never see a particular feature with a particular label?
E.g.: Suppose we never observe Temperature = cold with PlayTennis = Yes

Should we treat those counts as zero? But that will make the probabilities zero

Answer: Smoothing
• Add fake counts (very small numbers so that the counts are not zero)
• The Bayesian interpretation of smoothing: Priors on the hypothesis (MAP learning)
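One common way to add fake counts is add-α (Laplace) smoothing. The sketch below assumes that form; the function name `smoothed_likelihood` and its parameters are illustrative and not taken from the slides. It adds `alpha` to every count and adds `alpha` times the number of possible feature values to the denominator, so the estimates for one feature still sum to one.

```python
def smoothed_likelihood(count_xy, count_y, num_values, alpha=1.0):
    """Estimate P(x = v | y) with add-alpha smoothing.

    count_xy   -- how often value v was seen together with label y
    count_y    -- how often label y was seen
    num_values -- number of distinct values the feature can take
    alpha      -- size of the fake count added to every value (assumed > 0)
    """
    return (count_xy + alpha) / (count_y + alpha * num_values)


# In the tennis table, O = O never occurs with Play = - (0 of 5 examples),
# and O takes three values (S, O, R).
p_unseen = smoothed_likelihood(0, 5, 3)
print(p_unseen)  # 0.125
```

With `alpha = 1` the unseen pair gets probability 1/8 instead of 0, while the seen values O = R (2 of 5) and O = S (3 of 5) become 3/8 and 4/8.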

Page 76

Example: Classifying text

• Instance space: Text documents
• Labels: Spam or NotSpam
• Goal: To learn a function that can predict whether a new document is Spam or NotSpam

How would you build a Naïve Bayes classifier?

Let us brainstorm

How to represent documents?
How to estimate probabilities?
How to classify?

Page 77

Example: Classifying text

1. Represent documents by a vector of words
   A sparse vector consisting of one feature per word

2. Learning from N labeled documents
   1. Priors
   2. For each word w in vocabulary:

Page 81

Example: Classifying text

1. Represent documents by a vector of words
   A sparse vector consisting of one feature per word

2. Learning from N labeled documents
   1. Priors
   2. For each word w in vocabulary:
      How often does a word occur with a label?

Page 82

Example: Classifying text

1. Represent documents by a vector of words
   A sparse vector consisting of one feature per word

2. Learning from N labeled documents
   1. Priors
   2. For each word w in vocabulary:
      Smoothing
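The steps above can be sketched end to end on a toy corpus. This is a minimal sketch under stated assumptions, not the lecture's exact estimator: it assumes a word-count (multinomial) model, add-one smoothing, and scoring in log space to avoid underflow. The four documents and the names `train` and `classify` are made up for the example.

```python
import math
from collections import Counter


def train(docs):
    """docs: list of (list_of_words, label) pairs."""
    label_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in label_counts}
    for words, label in docs:
        word_counts[label].update(words)
    vocab = {w for words, _ in docs for w in words}

    # Priors: fraction of documents with each label.
    log_prior = {y: math.log(c / len(docs)) for y, c in label_counts.items()}

    # Likelihoods: how often each word occurs with each label,
    # with one fake count per vocabulary word (add-one smoothing).
    log_like = {}
    for y in label_counts:
        total = sum(word_counts[y].values())
        for w in vocab:
            log_like[(w, y)] = math.log((word_counts[y][w] + 1) / (total + len(vocab)))
    return log_prior, log_like, vocab


def classify(model, words):
    """Return the label with the highest log-score; unknown words are skipped."""
    log_prior, log_like, vocab = model
    scores = {
        y: log_prior[y] + sum(log_like[(w, y)] for w in words if w in vocab)
        for y in log_prior
    }
    return max(scores, key=scores.get)


# Hypothetical toy corpus, for illustration only.
docs = [
    ("win money now".split(), "Spam"),
    ("free money offer".split(), "Spam"),
    ("meeting schedule today".split(), "NotSpam"),
    ("project meeting notes".split(), "NotSpam"),
]
model = train(docs)
print(classify(model, "free money".split()))  # Spam
```

Skipping words outside the vocabulary is one possible design choice; another is to reserve an extra "unknown word" slot when smoothing.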

Page 83

Continuous features

• So far, we have been looking at discrete features
  – $P(x_j \mid y)$ is a Bernoulli trial (i.e. a coin toss)

• We could model $P(x_j \mid y)$ with other distributions too
  – This is a separate assumption from the independence assumption that naïve Bayes makes
  – E.g.: For real valued features, $(X_j \mid Y)$ could be drawn from a normal distribution

• Exercise: Derive the maximum likelihood estimate when the features are assumed to be drawn from the normal distribution
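For checking the exercise afterwards: for a normal distribution, the maximum likelihood estimates are the sample mean and the sample variance that divides by n (not n - 1), computed separately for each feature and each label. A minimal sketch, with the illustrative names `fit_gaussian` and `gaussian_log_density`, assuming at least two distinct values per feature and label so that the variance is nonzero:

```python
import math


def fit_gaussian(values):
    """Maximum likelihood mean and variance for one feature under one label."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # MLE divides by n
    return mean, var


def gaussian_log_density(x, mean, var):
    """log N(x; mean, var), used in place of a counted likelihood."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


# Hypothetical feature values observed with one label.
mean, var = fit_gaussian([1.0, 2.0, 3.0])
print(mean)  # 2.0
```

At prediction time, the log-density of each feature value would be added to the log-prior of the label, in the same way as the log-likelihoods of discrete features.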

Page 84

Summary: Naïve Bayes

• Independence assumption
  – All features are independent of each other given the label

• Maximum likelihood learning: Learning is simple
  – Generalizes to real valued features

• Prediction via MAP estimation
  – Generalizes to beyond binary classification

• Important caveats to remember
  – Smoothing
  – Independence assumption may not be valid

• Decision boundary is linear for binary classification
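The last point can be checked with a short derivation, sketched here for binary features $x_j \in \{0, 1\}$. The symbols $a_j = P(x_j = 1 \mid y = +)$ and $b_j = P(x_j = 1 \mid y = -)$ are notation introduced only for this sketch, and both are assumed to lie strictly between 0 and 1 (which smoothing guarantees). Taking the log of the ratio of the two posterior scores and using the independence assumption gives:

```latex
\begin{align*}
\log\frac{P(y = + \mid x)}{P(y = - \mid x)}
  &= \log\frac{P(y = +)}{P(y = -)}
     + \sum_j \left[ x_j \log\frac{a_j}{b_j}
     + (1 - x_j) \log\frac{1 - a_j}{1 - b_j} \right] \\
  &= w_0 + \sum_j w_j x_j, \\
w_j &= \log\frac{a_j (1 - b_j)}{b_j (1 - a_j)}, \qquad
w_0 = \log\frac{P(y = +)}{P(y = -)} + \sum_j \log\frac{1 - a_j}{1 - b_j}.
\end{align*}
```

The classifier predicts $+$ exactly when $w_0 + \sum_j w_j x_j \ge 0$, which is a linear threshold function of $x$.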