Naïve Bayes - Kangwon
cs.kangwon.ac.kr/.../2015_MachineLearning/07_naive_bayes.pdf
Machine Learning
Naïve Bayes
2015.08.01.
𝑠𝑖𝑔𝑚𝑎 𝜶
Probability Basics
• Prior, conditional, and joint probability for random variables
• Prior probability: P(X)
• Conditional probability: P(X1|X2), P(X2|X1)
• Joint probability: X = (X1, X2), P(X) = P(X1, X2)
• Relationship: P(X1, X2) = P(X2|X1)P(X1) = P(X1|X2)P(X2)
• Independence: P(X2|X1) = P(X2), P(X1|X2) = P(X1), P(X1, X2) = P(X1)P(X2)
• Bayesian rule
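The product rule above can be checked numerically; a minimal sketch in Python, using a hypothetical joint distribution over two binary variables X1 and X2:

```python
# Hypothetical joint distribution P(X1, X2) over two binary variables.
P = {
    (0, 0): 0.1, (0, 1): 0.3,
    (1, 0): 0.2, (1, 1): 0.4,
}

def marginal_x1(x1):
    # P(X1 = x1), obtained by summing the joint over X2.
    return sum(p for (a, _), p in P.items() if a == x1)

def cond_x2_given_x1(x2, x1):
    # P(X2 = x2 | X1 = x1) = P(X1, X2) / P(X1).
    return P[(x1, x2)] / marginal_x1(x1)

# Product rule: P(X1, X2) = P(X2 | X1) P(X1), for every outcome.
for (x1, x2), p in P.items():
    assert abs(p - cond_x2_given_x1(x2, x1) * marginal_x1(x1)) < 1e-12
```

Note that this particular joint is not independent: P(X1=1)P(X2=1) = 0.6 × 0.7 = 0.42, which differs from P(X1=1, X2=1) = 0.4.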
• Bayesian Rule
Probabilistic Classification
• Establishing a probabilistic model for classification
• Discriminative model: directly estimate the posterior P(C|X), where C = c1, ..., cL (class labels) and X = (X1, ..., Xn) (input features)

[Figure: a discriminative probabilistic classifier takes an input x = (x1, x2, ..., xn) and outputs the posteriors P(c1|x), P(c2|x), ..., P(cL|x)]
Probabilistic Classification
• Establishing a probabilistic model for classification (cont.)
• Generative model: estimate the class-conditional likelihood P(X|C)
• Classifies by the patterns in the data
• Given a label, examine the data to capture the relationship between the data and the label
Bayes' Theorem
• Bayes' theorem (alternatively Bayes' law or Bayes' rule) describes the probability of an event based on conditions that might be related to the event.
• It relates the prior and posterior probabilities of two random variables.
• It shows how the posterior probability is updated when new evidence is presented.
• P(A) = prior probability of hypothesis A
• P(B) = prior probability of training data B
• P(A|B) = probability of A given B
• P(B|A) = probability of B given A

P(A|B) = P(B|A)P(A) / P(B)
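Bayes' rule can be exercised on made-up numbers; a minimal sketch, where the prior and the two likelihood values are hypothetical and P(B) comes from the law of total probability:

```python
# Hypothetical values for illustration only.
p_a = 0.3             # prior P(A)
p_b_given_a = 0.8     # likelihood P(B|A)
p_b_given_not_a = 0.2 # likelihood P(B|not A)

# Total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' rule: P(A|B) = P(B|A)P(A) / P(B).
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 4))  # 0.6316
```

Observing B raises the probability of A from the prior 0.3 to roughly 0.63, which is exactly the "posterior updated by new evidence" reading above.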
Bayes' Theorem
• MAP classification rule
• MAP: Maximum A Posteriori
• Assign x to c* if
  P(C = c*|X = x) > P(C = c|X = x), c ≠ c*, c = c1, ..., cL
• Generative classification with the MAP rule
• Apply Bayes' rule:
  P(C = ci|X = x) = P(X = x|C = ci)P(C = ci) / P(X = x)
                  ∝ P(X = x|C = ci)P(C = ci), for every ci
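Since P(X = x) is the same for every class, the MAP decision only needs the products P(X = x|C = ci)P(C = ci); a minimal sketch with two hypothetical classes and made-up probabilities:

```python
# Hypothetical priors P(C = c) and likelihoods P(X = x | C = c)
# for a fixed input x, two classes.
priors = {"c1": 0.6, "c2": 0.4}
likelihood = {"c1": 0.1, "c2": 0.3}

# MAP rule: score each class by likelihood * prior, take the argmax.
scores = {c: likelihood[c] * priors[c] for c in priors}
c_star = max(scores, key=scores.get)
print(c_star)  # "c2", since 0.4 * 0.3 = 0.12 > 0.6 * 0.1 = 0.06
```

Dividing each score by their sum would recover the true posteriors, but the argmax is unchanged, which is why the normalizer P(X = x) can be dropped.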
Naïve Bayes
• Applying Bayes' rule directly requires learning the joint probability P(X1, ..., Xn|C) over all the data: difficult
• e.g., 10 binary features give 2^10 joint configurations
• Thus, assume that all input features are conditionally independent given the class: the Naïve Bayes rule
• The conditional probability of each feature is assumed independent of the other features
• Number of conditional probabilities to estimate: 2^n → 2n
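The saving can be counted directly; a minimal sketch for n binary features, where the full joint table over P(X1, ..., Xn|C) has one entry per feature configuration, while the factorized model only needs a small table P(Xi|C) per feature:

```python
n = 10  # number of binary features

# Full joint P(X1, ..., Xn | C): one entry per configuration of n bits.
joint_params = 2 ** n

# Naive Bayes P(Xi | C): n features, each with 2 possible values.
naive_params = 2 * n

print(joint_params, naive_params)  # 1024 20
```

The gap is exponential in n, which is the whole motivation for the conditional-independence assumption.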
Naïve Bayes
• Naïve Bayes
• By the chain rule and the conditional independence assumption:
  P(X1, X2, ..., Xn|C) = P(X1|X2, ..., Xn, C)P(X2, ..., Xn|C)   (chain rule)
                       = P(X1|C)P(X2, ..., Xn|C)                (independence)
                       = P(X1|C)P(X2|C)...P(Xn|C)
                       = ∏_i P(Xi|C)
• MAP classification rule: for x = (x1, x2, ..., xn), assign x to c* if
  [P(x1|c*)...P(xn|c*)]P(c*) > [P(x1|c)...P(xn|c)]P(c), c ≠ c*, c = c1, ..., cL
Example
• Example: Play Tennis
Example
• Learning Phase
Outlook      Play=Yes  Play=No
Sunny          2/9       3/5
Overcast       4/9       0/5
Rain           3/9       2/5

Temperature  Play=Yes  Play=No
Hot            2/9       2/5
Mild           4/9       2/5
Cool           3/9       1/5

Humidity     Play=Yes  Play=No
High           3/9       4/5
Normal         6/9       1/5

Wind         Play=Yes  Play=No
Strong         3/9       3/5
Weak           6/9       2/5

P(Play=Yes) = 9/14, P(Play=No) = 5/14
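The learned tables can be written down as data and sanity-checked; a minimal sketch that encodes the slide's conditional probability tables as exact fractions and verifies that each per-class column sums to 1:

```python
from fractions import Fraction as F

# Conditional probability tables P(feature value | Play), from the slide.
cpt = {
    "Outlook":     {"Yes": {"Sunny": F(2, 9), "Overcast": F(4, 9), "Rain": F(3, 9)},
                    "No":  {"Sunny": F(3, 5), "Overcast": F(0, 5), "Rain": F(2, 5)}},
    "Temperature": {"Yes": {"Hot": F(2, 9), "Mild": F(4, 9), "Cool": F(3, 9)},
                    "No":  {"Hot": F(2, 5), "Mild": F(2, 5), "Cool": F(1, 5)}},
    "Humidity":    {"Yes": {"High": F(3, 9), "Normal": F(6, 9)},
                    "No":  {"High": F(4, 5), "Normal": F(1, 5)}},
    "Wind":        {"Yes": {"Strong": F(3, 9), "Weak": F(6, 9)},
                    "No":  {"Strong": F(3, 5), "Weak": F(2, 5)}},
}
prior = {"Yes": F(9, 14), "No": F(5, 14)}

# Each table must be a valid distribution over its feature values.
for feature, by_class in cpt.items():
    for label, table in by_class.items():
        assert sum(table.values()) == 1, (feature, label)
assert sum(prior.values()) == 1
```

The denominators are simply the class counts (9 Yes, 5 No out of 14), so each entry is count(feature value, class) / count(class).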
Example
• Test Phase
• Given a new instance, predict its label:
  x' = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
• Look up the tables built in the learning phase
• Decision making with the MAP rule

P(Outlook=Sunny|Play=Yes) = 2/9        P(Outlook=Sunny|Play=No) = 3/5
P(Temperature=Cool|Play=Yes) = 3/9     P(Temperature=Cool|Play=No) = 1/5
P(Humidity=High|Play=Yes) = 3/9        P(Humidity=High|Play=No) = 4/5
P(Wind=Strong|Play=Yes) = 3/9          P(Wind=Strong|Play=No) = 3/5
P(Play=Yes) = 9/14                     P(Play=No) = 5/14

P(Yes|x') ∝ [P(Sunny|Yes)P(Cool|Yes)P(High|Yes)P(Strong|Yes)]P(Play=Yes) ≈ 0.0053
P(No|x') ∝ [P(Sunny|No)P(Cool|No)P(High|No)P(Strong|No)]P(Play=No) ≈ 0.0206

Since P(Yes|x') < P(No|x'), we label x' as "No".
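The test-phase arithmetic above can be reproduced in a few lines; a minimal sketch for x' = (Sunny, Cool, High, Strong), using the looked-up values from the learning-phase tables:

```python
# Class priors and the per-feature likelihoods for x', from the tables.
prior = {"Yes": 9 / 14, "No": 5 / 14}
likelihoods = {
    "Yes": [2 / 9, 3 / 9, 3 / 9, 3 / 9],  # Sunny, Cool, High, Strong | Yes
    "No":  [3 / 5, 1 / 5, 4 / 5, 3 / 5],  # Sunny, Cool, High, Strong | No
}

# Naive Bayes score: product of feature likelihoods times the prior.
score = {}
for c in prior:
    s = prior[c]
    for p in likelihoods[c]:
        s *= p
    score[c] = s

print(round(score["Yes"], 4), round(score["No"], 4))  # 0.0053 0.0206
label = max(score, key=score.get)
print(label)  # No
```

These scores are unnormalized posteriors; dividing by their sum would give the actual P(Yes|x') and P(No|x'), but the MAP decision ("No") is the same either way.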