Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review...

50
DS 4400 Alina Oprea Associate Professor, CCIS Northeastern University October 25 2018 Machine Learning and Data Mining I

Transcript of Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review...

Page 1: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

DS 4400

Alina Oprea

Associate Professor, CCIS

Northeastern University

October 25 2018

Machine Learning and Data Mining I

Page 2: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Review

2

Page 3: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Outline

• Naïve Bayes classifier

– Density Estimation

• Application

– Document classification

• Review of supervised learning methods

• Metrics for evaluating classifiers

3

Page 4: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Prior and Joint Probabilities

4

Page 5: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Density Estimation

5

Page 6: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Example – Learning Joint Probability Distribution

6

Page 7: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Pros and Cons of Density Estimators

• Pros

– Density Estimators can learn distribution of training data

– Can compute probability for a record

– Can do inference (predict likelihood of record)

• Cons

– Can overfit to the training data and not generalize to test data

– Curse of dimensionality

7

Naïve Bayes classifier fixes these cons!

Page 8: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Bayes’ Rule

8

Page 9: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Bayes’ Rule

9

Page 10: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

LDA

10

• Classify to one of k classes

• Logistic regression computes directly

– P 𝑌 = 1 𝑋 = 𝑥

– Assume sigmoid function

• LDA uses Bayes Theorem to estimate it

– P 𝑌 = 𝑘 𝑋 = 𝑥 =P 𝑋 = 𝑥 𝑌 = 𝑘 P[𝑌=𝑘]

P[𝑋=𝑥]

– Let 𝜋𝑘 = P[𝑌 = 𝑘] be the prior probability of class k and 𝑓𝑘 𝑥 = P 𝑋 = 𝑥 𝑌 = 𝑘

Discriminative model

Generative model

Page 11: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

LDA

11

Assume 𝑓𝑘 𝑥 is Gaussian!Unidimensional case (d=1)

Assumption: 𝜎1 = …𝜎𝑘 = σ

Page 12: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Classification Setting

12

P 𝑌 = 𝑘 𝑋 = 𝑥 =P[𝑌 = 𝑘]P 𝑋 = 𝑥 𝑌 = 𝑘

P[𝑋 = 𝑥]

P 𝑌 = 𝑘 𝑋 = 𝑥 =P[𝑌 = 𝑘]P 𝑋1 = 𝑥1 ∧⋯∧ 𝑋𝑑= 𝑥𝑑 𝑌 = 𝑘

P[𝑋1 = 𝑥1 ∧⋯∧ 𝑋𝑑= 𝑥𝑑]

Page 13: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Naïve Bayes Classifier

13

P[𝑌 = 𝑘]P 𝑋1 = 𝑥1 ∧⋯∧ 𝑋𝑑= 𝑥𝑑 𝑌 = 𝑘

P[𝑋1 = 𝑥1 ∧⋯∧ 𝑋𝑑= 𝑥𝑑]P 𝑌 = 𝑘 𝑋 = 𝑥

Page 14: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Naïve Bayes Classifier

14

Page 15: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Training Naïve Bayes

15

Page 16: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Training Naïve Bayes

16

3/4 1/4

Page 17: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Training Naïve Bayes

17

1

Page 18: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Training Naïve Bayes

18

0

Page 19: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Training Naïve Bayes

19

2/3

Page 20: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Training Naïve Bayes

20

1

Page 21: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Laplace Smoothing

21

𝑘

𝑘

Page 22: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Using the Naïve Bayes Classifier

22

𝑘 𝑘

𝑘 𝑘

P[𝑌 = 𝑘]P 𝑋1 = 𝑥1 ∧ ⋯∧ 𝑋𝑑= 𝑥𝑑 𝑌 = 𝑘

P[𝑋1 = 𝑥1 ∧ ⋯∧ 𝑋𝑑= 𝑥𝑑]P 𝑌 = 𝑘 𝑋 = 𝑥

Page 23: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Naïve Bayes Classifier

23

𝑘 𝑘

• For each class label 𝑘1. Estimate prior 𝑃[𝑌 = 𝑘] from the data 2. For each value 𝑣 of attribute 𝑋𝑗

• Estimate P[𝑋𝑗 = 𝑣|𝑌 = 𝑘]

Page 24: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Computing Probabilities

24

𝑘 𝑘 𝑘

𝑘

𝑘 𝑘

Page 25: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Naïve Bayes Summary

25

Page 26: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Document Classification

26

Page 27: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Document Classification

27

Page 28: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Text Classification: Examples

28

Page 29: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Bag of Words Representation

29

Page 30: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Bag of Words Representation

30

Page 31: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Another View of Naïve Bayes

31

Page 32: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Another View of Naïve Bayes

32

Page 33: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Document Classification with Naïve Bayes

33

Page 34: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Review Naïve Bayes

• Density Estimators can estimate joint probability distribution from data

• Risk of overfitting and curse of dimensionality

• Naïve Bayes assumes that features are independent given labels– Reduces the complexity of density estimation

– Even though the assumption is not always true, Naïve Bayes works well in practice

• Applications: text classification with bag-of-words representation– Naïve Bayes becomes a linear classifier

34Generative model

Page 35: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Traditional learning

• Linear classifiers – Logistic regression, LDA, perceptrons

• Decision trees

• Ensembles – Random Forests

– AdaBoost

• SVM– Linear SVM

– Kernels

• Naïve Bayes

35

Page 36: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Confusion Matrix

36

Page 37: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Accuracy and Error

37

Page 38: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Precision & Recall

38

Page 39: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

F-Score

39

Page 40: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

A Word of Caution

40

Page 41: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Receiver Operating Characteristic (ROC)

41

Page 42: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Performance Depends on Threshold

42

Page 43: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

ROC Example

43

Page 44: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

ROC Example

44

Page 45: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

ROC Curve

45

Page 46: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

ROC Curve

46

Page 47: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

ROC Curve

47

Page 48: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Area Under the ROC Curve

48

Page 49: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Comparing Supervised Learning

49

Page 50: Machine Learning and Data Mining I · 2018. 10. 27. · Machine Learning and Data Mining I. Review 2. Outline •Naïve Bayes classifier –Density Estimation ... 11 Assume 𝑓 𝑥is

Acknowledgements

• Slides made using resources from:

– Andrew Ng

– Eric Eaton

– David Sontag

– Andrew Moore

• Thanks!