Posted on 29-Nov-2021
Naïve Bayes Discussion
Computational Linguistics: Jordan Boyd-Graber, University of Maryland, Lecture 1A
Computational Linguistics: Jordan Boyd-Graber | UMD Naïve Bayes Discussion | 1 / 13
Roadmap
- Content Questions
- Administrivia Questions
- NB Exercise
Content Questions
Administrivia Questions
Administrivia Announcements
- Use Piazza
- Submit server
- TA office hours
Administrivia Questions
NB Example
Documents
D1 (Spam): abuja man
D2 (Ham): man dog
D3 (Spam): cialis deal
D4 (Ham): logistic mother logistic abuja
D5 (Spam): abuja deal
D6 (Ham): bagel deal
D7 (Spam): cialis dog

What's |C| and |V|?
|C| = 2 (spam vs. ham)
|V| = 8: 'deal', 'dog', 'bagel', 'logistic', 'mother', 'cialis', 'abuja', 'man'
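As a sanity check, the class set and vocabulary can be computed directly from the toy corpus. This is a minimal sketch; the `docs` list simply transcribes documents D1-D7 from the slide.

```python
# Toy corpus transcribed from documents D1-D7 on the slide.
docs = [
    ("spam", ["abuja", "man"]),
    ("ham",  ["man", "dog"]),
    ("spam", ["cialis", "deal"]),
    ("ham",  ["logistic", "mother", "logistic", "abuja"]),
    ("spam", ["abuja", "deal"]),
    ("ham",  ["bagel", "deal"]),
    ("spam", ["cialis", "dog"]),
]

classes = {label for label, _ in docs}             # the set C
vocab = {w for _, tokens in docs for w in tokens}  # the set V (types, not tokens)

print(len(classes), len(vocab))  # 2 8
```

Note that |V| counts word types, so "logistic" contributes once even though D4 contains it twice.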
NB Example
Background Probabilities
Using the seven documents above (four spam, three ham): what's P̂(c_j) for each class?
NB Example
Background Probabilities
With add-one smoothing, the prior is P̂(c_j) = (N_c + 1) / (N + |C|), where N_c is the number of documents in class c_j, N = 7 is the total number of documents, and |C| = 2.

- For spam:
  P̂(c_j = spam) = (N_c + 1) / (N + |C|) = (4 + 1) / (7 + 2) = 5/9

- For ham:
  P̂(c_j = ham) = (N_c + 1) / (N + |C|) = (3 + 1) / (7 + 2) = 4/9
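The smoothed priors can be verified with exact fractions. A small sketch; the document counts are read directly off the slide.

```python
from fractions import Fraction

# 4 spam docs, 3 ham docs, N = 7 documents, |C| = 2 classes (from the slide).
doc_counts = {"spam": 4, "ham": 3}
N, C = 7, 2

# Add-one smoothed prior: (N_c + 1) / (N + |C|)
prior = {c: Fraction(n + 1, N + C) for c, n in doc_counts.items()}
print(prior["spam"], prior["ham"])  # 5/9 4/9
```

A nice side effect of the smoothing formula: the priors still sum to exactly 1.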
NB Example
Conditional Probabilities
For the documents above: what's the conditional probability P̂(w = dog | c) for each class?
NB Example
Conditional Probabilities
Each class has 8 training tokens, |V| = 8, and "dog" appears once in each class.

- For spam:
  P̂(w = dog | c) = (T_cw + 1) / ((Σ_{w'∈V} T_cw') + |V|) = (1 + 1) / (8 + 8) = 1/8

- For ham:
  P̂(w = dog | c) = (T_cw + 1) / ((Σ_{w'∈V} T_cw') + |V|) = (1 + 1) / (8 + 8) = 1/8
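The smoothing rule generalizes to any word once the counts are in place. A sketch, assuming the counts from the slide: `T` holds per-class occurrence counts for "dog", each class has 8 training tokens, and |V| = 8.

```python
from fractions import Fraction

# Per-class counts for "dog" (once in spam via D7, once in ham via D2),
# total tokens per class, and vocabulary size |V| = 8.
T = {"spam": {"dog": 1}, "ham": {"dog": 1}}
class_tokens = {"spam": 8, "ham": 8}
V = 8

def p_word(w, c):
    """Add-one smoothed P(w | c) = (T_cw + 1) / (sum over w' of T_cw' + |V|)."""
    return Fraction(T[c].get(w, 0) + 1, class_tokens[c] + V)

print(p_word("dog", "spam"), p_word("dog", "ham"))  # 1/8 1/8
```

The `.get(w, 0)` default is what makes unseen words safe: a zero count still yields a nonzero smoothed probability of 1/16.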
NB Example
Prediction
What if you saw a document with the word "dog"?

- For spam:
  P(c|d) ∝ P(c) ∏_{1≤i≤n_d} P(w_i | c) = 5/9 · 1/8 ≈ 0.07

- For ham:
  P(c|d) ∝ P(c) ∏_{1≤i≤n_d} P(w_i | c) = 4/9 · 1/8 ≈ 0.06

These aren't probabilities; they only show spam is the more likely class. To recover the real posterior probabilities, divide each score by the sum of the two.
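The two scores are only proportional to the posterior; normalizing by their sum gives real probabilities. A sketch using this slide's numbers: since P(dog | c) = 1/8 for both classes, the likelihoods cancel and the posterior equals the prior.

```python
from fractions import Fraction

# Unnormalized scores for the one-word document "dog" (from the slide).
score = {
    "spam": Fraction(5, 9) * Fraction(1, 8),  # approx. 0.07
    "ham":  Fraction(4, 9) * Fraction(1, 8),  # approx. 0.06
}

# Normalize by the sum to get the actual posterior P(c | d).
Z = sum(score.values())
posterior = {c: s / Z for c, s in score.items()}
print(posterior["spam"], posterior["ham"])  # 5/9 4/9
```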
NB Example
Conditional Probabilities
Now for "logistic", which never appears in spam and appears twice in ham:

- For spam:
  P̂(w = logistic | c) = (T_cw + 1) / ((Σ_{w'∈V} T_cw') + |V|) = (0 + 1) / (8 + 8) = 1/16

- For ham:
  P̂(w = logistic | c) = (T_cw + 1) / ((Σ_{w'∈V} T_cw') + |V|) = (2 + 1) / (8 + 8) = 3/16
NB Example
Prediction
What if you saw a document with the words "logistic" "logistic" "dog"?

- For spam:
  P(c|d) ∝ P(c) ∏_{1≤i≤n_d} P(w_i | c) = 5/9 · 1/8 · 1/16 · 1/16 ≈ 0.0002

- For ham:
  P(c|d) ∝ P(c) ∏_{1≤i≤n_d} P(w_i | c) = 4/9 · 1/8 · 3/16 · 3/16 ≈ 0.002

The ham score is about seven times larger, so the document is classified as ham.
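The final prediction can be replayed end to end. A sketch, assuming the priors and smoothed conditionals derived on the earlier slides:

```python
from fractions import Fraction

# Smoothed priors and conditionals computed on the earlier slides.
prior = {"spam": Fraction(5, 9), "ham": Fraction(4, 9)}
cond = {
    "spam": {"logistic": Fraction(1, 16), "dog": Fraction(1, 8)},
    "ham":  {"logistic": Fraction(3, 16), "dog": Fraction(1, 8)},
}

doc = ["logistic", "logistic", "dog"]

score = {}
for c in prior:
    s = prior[c]
    for w in doc:
        s *= cond[c][w]  # naive Bayes: multiply per-word likelihoods
    score[c] = s

print("prediction:", max(score, key=score.get))  # ham
```

With exact fractions the spam score is 5/18432 (≈ 0.0002) and the ham score is 1/512 (≈ 0.002), matching the slide. In practice these products are computed as sums of log probabilities to avoid underflow on longer documents.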