Transcript of 04-naive-bayes-annotated.pdf
Geoff Gordon—10-701 Machine Learning—Fall 2013

Page 1

Related reading

• Bishop 2.5: nearest neighbor and Parzen windows

• Bishop 3-3.1: least squares for regression

• Bishop 4-4.1: linear classifiers

• Bishop p46, p380: naive Bayes


Page 2

Bayes rule

• recall def of conditional:
‣ P(a | b) = P(a ∧ b) / P(b), provided P(b) ≠ 0
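Spelling out the step from this definition to Bayes rule: the joint probability can be factored in either order.

```latex
P(a \wedge b) = P(a \mid b)\,P(b) = P(b \mid a)\,P(a)
\quad\Rightarrow\quad
P(a \mid b) = \frac{P(b \mid a)\,P(a)}{P(b)} \qquad (P(b) \neq 0)
```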


Page 3

Bayes rule: sum version

• P(a | b) = P(b | a) P(a) / P(b)
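The "sum version" presumably refers to expanding the denominator by the sum rule (law of total probability):

```latex
P(a \mid b) = \frac{P(b \mid a)\,P(a)}{P(b)}
            = \frac{P(b \mid a)\,P(a)}{\sum_{a'} P(b \mid a')\,P(a')}
```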


Page 4

Bayes rule in ML

• P(model | data) = P(data | model) P(model) / P(data)


Page 5

Bayes rule vs. MAP vs. MLE

• P(model | data) = P(data | model) P(model) / P(data)
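For reference, the standard estimators being contrasted: full Bayes keeps the whole posterior over models, MAP keeps only its mode, and MLE additionally drops the prior.

```latex
\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta}\; P(\text{data} \mid \theta)\,P(\theta)
\qquad
\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta}\; P(\text{data} \mid \theta)
```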


Page 6

Jerzy Neyman

Frequentist vs. Bayes

• Nature as adversary vs. Nature as probability distribution

• Probability as long-run frequency of repeatable events vs. odds for bets I'm willing to take


rev. Thomas Bayes

FIGHT!!!

see also: http://www.xkcd.com/1132/

Page 7

Test for a rare disease

• About 0.1% of all people are infected

• Test detects all infections

• Test is highly specific: 1% false positive

• You test positive. What is the probability you have the disease?
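Working the numbers with Bayes rule (sensitivity 100%, false positive rate 1%, prevalence 0.1%):

```latex
P(\text{disease} \mid +) = \frac{1.0 \times 0.001}{1.0 \times 0.001 + 0.01 \times 0.999}
                         = \frac{0.001}{0.01099} \approx 0.091
```

So a positive result implies only about a 9% chance of infection, despite the test's accuracy.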


Page 8

Test for a rare disease

Bonus: what is the probability that an average med student gets this question wrong?

Page 9

Follow-up test

• Test 2: detects 90% of infections, 5% false positives
‣ P(+disease | +test1, +test2) =
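Assuming the two tests are conditionally independent given disease status (the assumption developed on the next slides), the posterior after both tests come back positive is:

```latex
P(\text{disease} \mid +_1, +_2)
  = \frac{1.0 \times 0.9 \times 0.001}
         {1.0 \times 0.9 \times 0.001 \;+\; 0.01 \times 0.05 \times 0.999}
  = \frac{0.0009}{0.0013995} \approx 0.64
```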


Page 10

Independence
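For reference, the standard definition:

```latex
% a and b are independent iff
P(a \wedge b) = P(a)\,P(b)
\quad\Longleftrightarrow\quad
P(a \mid b) = P(a) \text{ whenever } P(b) \neq 0
```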


Page 11

Conditional independence
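For reference, the standard definition:

```latex
% a and b are conditionally independent given c iff
P(a \wedge b \mid c) = P(a \mid c)\,P(b \mid c)
\quad\Longleftrightarrow\quad
P(a \mid b \wedge c) = P(a \mid c) \text{ whenever } P(b \wedge c) \neq 0
```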


Page 12

xkcd.com

London taxi drivers: a survey found a positive and significant correlation between the number of accidents and wearing coats. The authors concluded that coats could hinder drivers' movements and cause accidents, and a new law was prepared to prohibit drivers from wearing coats while driving. Finally, another study pointed out that people wear coats when it rains…

Conditionally Independent


slide credit: Barnabas

Page 13

humor credit: xkcd

More on the importance of conditioning


Page 14

Samples


Page 15

Recall: spam filtering


Page 16

Bag of words
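A minimal sketch of the idea, with a made-up vocabulary and email (illustrative only): each email is reduced to binary word-presence features, and word order is thrown away.

```python
from collections import Counter

def bag_of_words(text, vocabulary):
    """Map a piece of text to binary word-presence features, ignoring word order."""
    counts = Counter(text.lower().split())   # crude whitespace tokenization, for illustration
    return {w: int(counts[w] > 0) for w in vocabulary}

# Hypothetical vocabulary and email, purely for illustration:
vocab = ["award", "million", "meeting", "lump"]
email = "you have won an award a lump sum of five million"
print(bag_of_words(email, vocab))   # {'award': 1, 'million': 1, 'meeting': 0, 'lump': 1}
```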


Page 17

A ridiculously naive assumption

• Assume: each word's presence in the email is conditionally independent of every other word's, given the class (spam or not spam)

• Clearly false: words such as "Five" and "Million" co-occur far more often than independence would allow

• Given this assumption, use Bayes rule

Page 18

Graphical model

[Figure: naive Bayes graphical model: class node "spam" with arrows to feature nodes x1, x2, …, xn; in plate notation, spam → xi with a plate over i = 1..n]

Page 19

Naive Bayes

• P(spam | email ∧ award ∧ program ∧ for ∧ internet ∧ users ∧ lump ∧ sum ∧ of ∧ Five ∧ Million)
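Under the naive (conditional independence) assumption, with x_i standing for the i-th word's presence, this posterior factors as:

```latex
P(\text{spam} \mid x_1, \ldots, x_n)
  \;\propto\; P(\text{spam}) \prod_{i=1}^{n} P(x_i \mid \text{spam})
```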


Page 20

In log space

z_spam = ln( P(email | spam) P(award | spam) ... P(Million | spam) P(spam) )

z_~spam = ln( P(email | ~spam) ... P(Million | ~spam) P(~spam) )

Page 21

Collect terms

z_spam = ln( P(email | spam) P(award | spam) ... P(Million | spam) P(spam) )

z_~spam = ln( P(email | ~spam) ... P(Million | ~spam) P(~spam) )

z = z_spam − z_~spam

Page 22

Linear discriminant
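A minimal sketch of why the collected log-space score is a linear discriminant, using a Bernoulli naive Bayes with made-up probabilities (the vocabulary and numbers are illustrative, not from the lecture):

```python
import math

# Illustrative (made-up) Bernoulli naive Bayes parameters: P(word present | class).
vocab         = ["award", "million", "meeting"]
p_word_spam   = {"award": 0.40, "million": 0.30, "meeting": 0.05}
p_word_ham    = {"award": 0.01, "million": 0.02, "meeting": 0.20}
p_spam, p_ham = 0.5, 0.5

def z_score(x):
    """x[w] = 1 if word w appears in the email, else 0 (Bernoulli features)."""
    z_spam = math.log(p_spam)
    z_ham  = math.log(p_ham)
    for w in vocab:
        z_spam += math.log(p_word_spam[w] if x[w] else 1 - p_word_spam[w])
        z_ham  += math.log(p_word_ham[w]  if x[w] else 1 - p_word_ham[w])
    return z_spam - z_ham        # z > 0  =>  classify as spam

# Collecting terms turns the same score into a linear function of x: z = bias + sum_w weight[w] * x[w]
bias = math.log(p_spam / p_ham) + sum(
    math.log((1 - p_word_spam[w]) / (1 - p_word_ham[w])) for w in vocab)
weight = {w: math.log(p_word_spam[w] / p_word_ham[w])
             - math.log((1 - p_word_spam[w]) / (1 - p_word_ham[w])) for w in vocab}

x = {"award": 1, "million": 1, "meeting": 0}
assert abs(z_score(x) - (bias + sum(weight[w] * x[w] for w in vocab))) < 1e-9
print(z_score(x))   # positive, so this email would be labeled spam
```

Each word's weight is its log likelihood ratio between the two classes, which is why the naive Bayes score is linear in the word-indicator features.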


Page 23

Intuitions


Page 24

How to get probabilities?

Discrete Distributions

• Bernoulli distribution: Ber(p)

Suppose a coin with head probability p is tossed n times. What is the probability of getting k heads and n−k tails?

• Binomial distribution: Bin(n, p)
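The question above is answered by the binomial pmf:

```latex
P(k \text{ heads in } n \text{ tosses}) = \binom{n}{k}\, p^{k} (1-p)^{n-k}
```

Estimating p from observed data by maximum likelihood gives the familiar count ratio p̂ = k/n, one standard way to obtain the per-word probabilities that naive Bayes needs.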

Page 25

Improvements
