Transcript of "Classification: Probabilistic Generative Model" (lecture slides, speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016)
Classification: Probabilistic Generative Model
Classification
• Credit Scoring
• Input: income, savings, profession, age, past financial history ……
• Output: accept or refuse
• Medical Diagnosis
• Input: current symptoms, age, gender, past medical history ……
• Output: which kind of disease
• Handwritten character recognition
• Face recognition
• Input: image of a face, output: person
Classification as a function: input x → function f → output class n
(e.g., handwritten character recognition: input an image of the character "金", output "金")
Example Application
[Figure: three example functions mapping a Pokémon image to its type, 𝑓(image) = type]
Example Application
• Total: sum of all the stats that come after this, a general guide to how strong a Pokémon is
• HP: hit points, or health, defines how much damage a pokemon can withstand before fainting
• Attack: the base modifier for normal attacks (e.g., Scratch, Punch)
• Defense: the base damage resistance against normal attacks
• SP Atk: special attack, the base modifier for special attacks (e.g. fire blast, bubble beam)
• SP Def: the base damage resistance against special attacks
• Speed: determines which pokemon attacks first each round
Can we predict the "type" of a Pokémon based on this information?
Example stats: Total 320, HP 35, Attack 55, Defense 40, SP Atk 50, SP Def 50, Speed 90
(Stats are from the Pokémon games, NOT Pokémon cards or Pokémon Go.)
How to do Classification
• Training data for Classification
$(x^1, \hat{y}^1), (x^2, \hat{y}^2), \ldots, (x^N, \hat{y}^N)$
Classification as Regression? (binary classification as example)
Training: Class 1 means the target is +1; Class 2 means the target is -1
Testing: output closer to +1 → class 1; closer to -1 → class 2
• Multi-class: Class 1 means the target is 1; Class 2 means the target is 2; Class 3 means the target is 3 …… also problematic: the numbering imposes an ordering and closeness between classes that usually does not exist.
• Regression penalizes examples that are "too correct" … (Bishop, p. 186)
[Figure: two classes in the (x₁, x₂) plane with targets +1 and -1; the model y = b + w₁x₁ + w₂x₂ gives the boundary b + w₁x₁ + w₂x₂ = 0. Examples with y ≫ 1 are "too correct": regression shifts the boundary toward them to decrease the squared error, which hurts classification.]
Ideal Alternatives
• Function (Model):
  f(x): output = class 1 if g(x) > 0, else output = class 2
• Loss function:
  $$L(f) = \sum_n \delta\left(f(x^n) \neq \hat{y}^n\right)$$
  (the number of times f gets incorrect results on the training data)
• Find the best function:
  • Example: Perceptron, SVM (not today)
Two Boxes
Box 1 Box 2
P(B1) = 2/3 P(B2) = 1/3
P(Blue|B1) = 4/5
P(Green|B1) = 1/5
P(Blue|B2) = 2/5
P(Green|B2) = 3/5
Given that we draw a blue ball from one of the boxes, where does it come from?

$$P(B_1|\text{Blue}) = \frac{P(\text{Blue}|B_1)P(B_1)}{P(\text{Blue}|B_1)P(B_1) + P(\text{Blue}|B_2)P(B_2)}$$
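The box example works out numerically; a minimal pure-Python sketch using the numbers on this slide:

```python
# Bayes' rule for the two-box example:
# P(B1) = 2/3, P(B2) = 1/3, P(Blue|B1) = 4/5, P(Blue|B2) = 2/5
p_b1, p_b2 = 2 / 3, 1 / 3
p_blue_b1, p_blue_b2 = 4 / 5, 2 / 5

# Posterior probability that a blue ball came from Box 1
p_b1_blue = (p_blue_b1 * p_b1) / (p_blue_b1 * p_b1 + p_blue_b2 * p_b2)
print(p_b1_blue)  # 0.8
```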
Two Classes
Given an x, which class does it belong to?

$$P(C_1|x) = \frac{P(x|C_1)P(C_1)}{P(x|C_1)P(C_1) + P(x|C_2)P(C_2)}$$

Generative model: $P(x) = P(x|C_1)P(C_1) + P(x|C_2)P(C_2)$

Estimating the probabilities from training data: the priors P(C1), P(C2) and the class-conditional probabilities P(x|C1), P(x|C2) for Class 1 and Class 2.
Prior
Class 1 Class 2
P(C1) P(C2)
Water Normal
Water and Normal type with ID < 400 for training, rest for testing
Training: 79 Water, 61 Normal
P(C1) = 79 / (79 + 61) = 0.56
P(C2) = 61 / (79 + 61) = 0.44
Probability from Class
Water type: 79 in total
P(a new Pokémon x | Water) = ?
Each Pokémon is represented as a vector of its attributes: its feature vector.
P(x|C1) = ?
Probability from Class - Feature
• Considering Defense and SP Defense
Water-type examples: $x^1, x^2, x^3, \ldots, x^{79}$, e.g., (Defense, SP Defense) = (65, 64), (48, 50), (103, 45), …
P(x|Water) = ? Is it 0 for a new point that never appeared in the training data? No:
Assume the points are sampled from a Gaussian distribution.
Gaussian Distribution

$$f_{\mu,\Sigma}(x) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$

Input: vector x; output: the probability density of sampling x.
The shape of the function is determined by the mean μ and the covariance matrix Σ.
[Figure (from https://blog.slinuxer.com/tag/pca): contours over (x₁, x₂) for μ = [0, 0] and μ = [2, 3], both with Σ = [[2, 0], [0, 2]]; changing μ shifts the distribution.]
Gaussian Distribution

$$f_{\mu,\Sigma}(x) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$

Input: vector x; output: the probability density of sampling x.
The shape of the function is determined by the mean μ and the covariance matrix Σ.
[Figure (from https://blog.slinuxer.com/tag/pca): contours over (x₁, x₂) with the same μ = [0, 0] but Σ = [[2, 0], [0, 6]] vs Σ = [[2, -1], [-1, 2]]; changing Σ changes the shape and orientation.]
Probability from Class
Assume the points are sampled from a Gaussian distribution: find the Gaussian distribution behind them, then compute the probability for new points.
Water type: $x^1, x^2, x^3, \ldots, x^{79}$

$$\mu = \begin{bmatrix} 75.0 \\ 71.3 \end{bmatrix},\qquad \Sigma = \begin{bmatrix} 874 & 327 \\ 327 & 929 \end{bmatrix}$$

$$f_{\mu,\Sigma}(x) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$

For a new x, evaluate $f_{\mu,\Sigma}(x)$. How to find μ and Σ?
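As a sanity check, the density above can be evaluated directly; a minimal pure-Python sketch for the 2-D case (`gauss2d` is a hypothetical helper, with μ and Σ taken from this slide's Water-type estimates):

```python
import math

def gauss2d(x, mu, sigma):
    """Density of a 2-D Gaussian with mean mu and 2x2 covariance sigma."""
    (a, b), (c, d) = sigma
    det = a * d - b * c                                # |Sigma|
    inv = [[d / det, -b / det], [-c / det, a / det]]   # Sigma^{-1}
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # Mahalanobis term (x - mu)^T Sigma^{-1} (x - mu)
    m = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    # (2*pi)^(D/2) with D = 2 gives 2*pi
    return math.exp(-0.5 * m) / (2 * math.pi * math.sqrt(det))

mu_water = [75.0, 71.3]
sigma_water = [[874.0, 327.0], [327.0, 929.0]]
print(gauss2d(mu_water, mu_water, sigma_water))  # density is highest at the mean
```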
Maximum Likelihood
A Gaussian with any mean μ and covariance matrix Σ can generate the points $x^1, x^2, x^3, \ldots, x^{79}$, but with different likelihood.
Likelihood of a Gaussian with mean μ and covariance matrix Σ = the probability of the Gaussian sampling $x^1, x^2, x^3, \ldots, x^{79}$:

$$L(\mu,\Sigma) = f_{\mu,\Sigma}(x^1)\, f_{\mu,\Sigma}(x^2)\, f_{\mu,\Sigma}(x^3) \cdots f_{\mu,\Sigma}(x^{79})$$

[Figure: e.g., a Gaussian with $\mu = \begin{bmatrix}75\\75\end{bmatrix}$, $\Sigma = \begin{bmatrix}5&0\\0&5\end{bmatrix}$ near the data and one with $\mu = \begin{bmatrix}150\\140\end{bmatrix}$, $\Sigma = \begin{bmatrix}2&-1\\-1&2\end{bmatrix}$ far from it give different likelihoods for the same points.]
Maximum Likelihood
We have the "Water" type Pokémons: $x^1, x^2, x^3, \ldots, x^{79}$
We assume $x^1, x^2, x^3, \ldots, x^{79}$ are generated from the Gaussian $(\mu^*, \Sigma^*)$ with the maximum likelihood:

$$L(\mu,\Sigma) = f_{\mu,\Sigma}(x^1)\, f_{\mu,\Sigma}(x^2)\, f_{\mu,\Sigma}(x^3) \cdots f_{\mu,\Sigma}(x^{79}),\qquad \mu^*,\Sigma^* = \arg\max_{\mu,\Sigma} L(\mu,\Sigma)$$

The solution is the sample average and covariance:

$$\mu^* = \frac{1}{79}\sum_{n=1}^{79} x^n,\qquad \Sigma^* = \frac{1}{79}\sum_{n=1}^{79}(x^n - \mu^*)(x^n - \mu^*)^T$$
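The closed-form estimates above are just the sample mean and the (biased, divide-by-N) sample covariance; a small sketch with made-up 2-D points, not the actual Pokémon data:

```python
def mle_gaussian(points):
    """Maximum-likelihood mean and covariance for a list of 2-D points."""
    n = len(points)
    mu = [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]
    # Sigma* = (1/n) * sum (x - mu)(x - mu)^T  (note: divides by n, not n - 1)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        dx = [p[0] - mu[0], p[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += dx[i] * dx[j] / n
    return mu, s

points = [[1.0, 2.0], [3.0, 6.0], [5.0, 4.0]]
mu, sigma = mle_gaussian(points)
print(mu)  # [3.0, 4.0]
print(sigma)
```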
Maximum Likelihood
Class 1: Water Class 2: Normal
$$\mu^1 = \begin{bmatrix}75.0\\71.3\end{bmatrix},\ \Sigma^1 = \begin{bmatrix}874&327\\327&929\end{bmatrix};\qquad \mu^2 = \begin{bmatrix}55.6\\59.8\end{bmatrix},\ \Sigma^2 = \begin{bmatrix}847&422\\422&685\end{bmatrix}$$
Now we can do classification
$$P(C_1|x) = \frac{P(x|C_1)P(C_1)}{P(x|C_1)P(C_1) + P(x|C_2)P(C_2)}$$

If $P(C_1|x) > 0.5$, x belongs to class 1 (Water).

P(C1) = 79 / (79 + 61) = 0.56 and P(C2) = 61 / (79 + 61) = 0.44
$P(x|C_1) = f_{\mu^1,\Sigma^1}(x)$ and $P(x|C_2) = f_{\mu^2,\Sigma^2}(x)$, with

$$\mu^1 = \begin{bmatrix}75.0\\71.3\end{bmatrix},\ \Sigma^1 = \begin{bmatrix}874&327\\327&929\end{bmatrix};\qquad \mu^2 = \begin{bmatrix}55.6\\59.8\end{bmatrix},\ \Sigma^2 = \begin{bmatrix}847&422\\422&685\end{bmatrix}$$
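Putting the pieces together, P(C1|x) follows from the two fitted Gaussians and the priors; a pure-Python sketch using the estimates above (`gauss2d` and `posterior_c1` are hypothetical helper names, and the test point is made up):

```python
import math

def gauss2d(x, mu, sigma):
    """2-D Gaussian density f_{mu,Sigma}(x)."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    m = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.exp(-0.5 * m) / (2 * math.pi * math.sqrt(det))

# Estimates from the slides
p_c1, p_c2 = 0.56, 0.44
mu1, sigma1 = [75.0, 71.3], [[874.0, 327.0], [327.0, 929.0]]
mu2, sigma2 = [55.6, 59.8], [[847.0, 422.0], [422.0, 685.0]]

def posterior_c1(x):
    """P(C1|x) via Bayes' rule."""
    a1 = gauss2d(x, mu1, sigma1) * p_c1
    a2 = gauss2d(x, mu2, sigma2) * p_c2
    return a1 / (a1 + a2)

x = [100.0, 100.0]  # a made-up point with high Defense and SP Defense
print("Water" if posterior_c1(x) > 0.5 else "Normal")
```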
How's the result? With the two features (Defense, SP Defense): 47% accuracy on testing data.
With all seven features (Total, HP, Att, SP Att, Def, SP Def, Speed): 54% accuracy …
($\mu^1, \mu^2$: 7-dim vectors; $\Sigma^1, \Sigma^2$: 7×7 matrices)
[Figure: $P(C_1|x)$ over the 2-D feature plane; blue points: C1 (Water), red points: C2 (Normal); the regions $P(C_1|x) > 0.5$ and $P(C_1|x) < 0.5$ are separated by the decision boundary.]
Modifying Model
Class 1: Water, Class 2: Normal

$$\mu^1 = \begin{bmatrix}75.0\\71.3\end{bmatrix},\ \Sigma^1 = \begin{bmatrix}874&327\\327&929\end{bmatrix};\qquad \mu^2 = \begin{bmatrix}55.6\\59.8\end{bmatrix},\ \Sigma^2 = \begin{bmatrix}847&422\\422&685\end{bmatrix}$$

Use the same Σ for both classes → fewer parameters.
Modifying Model
• Maximum likelihood
"Water" type Pokémons: $x^1, x^2, x^3, \ldots, x^{79}$ (mean $\mu^1$); "Normal" type Pokémons: $x^{80}, x^{81}, x^{82}, \ldots, x^{140}$ (mean $\mu^2$); shared covariance Σ.
Find $\mu^1, \mu^2, \Sigma$ maximizing the likelihood $L(\mu^1, \mu^2, \Sigma)$:

$$L(\mu^1,\mu^2,\Sigma) = f_{\mu^1,\Sigma}(x^1)\, f_{\mu^1,\Sigma}(x^2) \cdots f_{\mu^1,\Sigma}(x^{79}) \times f_{\mu^2,\Sigma}(x^{80})\, f_{\mu^2,\Sigma}(x^{81}) \cdots f_{\mu^2,\Sigma}(x^{140})$$

$\mu^1$ and $\mu^2$ are the same as before (the class means), and

$$\Sigma = \frac{79}{140}\Sigma^1 + \frac{61}{140}\Sigma^2$$

Ref: Bishop, chapter 4.2.2
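The shared Σ is just the class-size-weighted average of the two covariances; a quick sketch with the 2-D estimates from the slides:

```python
n1, n2 = 79, 61
sigma1 = [[874.0, 327.0], [327.0, 929.0]]
sigma2 = [[847.0, 422.0], [422.0, 685.0]]

n = n1 + n2
# Sigma = (N1/N) * Sigma1 + (N2/N) * Sigma2
sigma = [[(n1 * sigma1[i][j] + n2 * sigma2[i][j]) / n for j in range(2)]
         for i in range(2)]
print(sigma)
```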
Modifying Model
The same covariance matrix
With all seven features (Total, HP, Att, SP Att, Def, SP Def, Speed), accuracy improves from 54% to 73%.
The boundary is linear.
Three Steps
• Function Set (Model):
  $$P(C_1|x) = \frac{P(x|C_1)P(C_1)}{P(x|C_1)P(C_1) + P(x|C_2)P(C_2)}$$
  If $P(C_1|x) > 0.5$, output class 1; otherwise, output class 2.
• Goodness of a function:
  • The mean μ and covariance Σ that maximize the likelihood (the probability of generating the data)
• Find the best function: easy
Probability Distribution
• You can always use the distribution you like
For $x = [x_1, x_2, \ldots, x_k, \ldots, x_K]$, if you assume all the dimensions are independent, then you are using a Naive Bayes classifier:

$$P(x|C_1) = P(x_1|C_1)\, P(x_2|C_1) \cdots P(x_k|C_1) \cdots P(x_K|C_1)$$

where each $P(x_k|C_1)$ is a 1-D Gaussian. For binary features, you may assume they come from Bernoulli distributions.
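A minimal sketch of the naive Bayes factorization, assuming each $P(x_k|C_1)$ is a 1-D Gaussian (the per-dimension parameters here are illustrative, taken from the diagonal of the earlier Water-type estimate):

```python
import math

def gauss1d(x, mu, var):
    """1-D Gaussian density."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Illustrative per-dimension parameters for one class (variances only:
# the off-diagonal covariances are dropped by the independence assumption)
mus = [75.0, 71.3]
variances = [874.0, 929.0]

def naive_likelihood(x):
    """P(x|C1) = product over independent dimensions of P(x_k|C1)."""
    p = 1.0
    for xk, mu, var in zip(x, mus, variances):
        p *= gauss1d(xk, mu, var)
    return p

print(naive_likelihood([75.0, 71.3]))
```

Dropping the off-diagonal covariance terms is exactly what the independence assumption does; whether that helps or hurts depends on the data.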
Posterior Probability
$$P(C_1|x) = \frac{P(x|C_1)P(C_1)}{P(x|C_1)P(C_1) + P(x|C_2)P(C_2)} = \frac{1}{1 + \frac{P(x|C_2)P(C_2)}{P(x|C_1)P(C_1)}} = \frac{1}{1 + \exp(-z)} = \sigma(z)$$

where

$$z = \ln\frac{P(x|C_1)P(C_1)}{P(x|C_2)P(C_2)}$$

and σ(z) is the sigmoid function.
Warning of Math
Posterior Probability
$$P(C_1|x) = \sigma(z),\qquad z = \ln\frac{P(x|C_1)P(C_1)}{P(x|C_2)P(C_2)} = \ln\frac{P(x|C_1)}{P(x|C_2)} + \ln\frac{P(C_1)}{P(C_2)}$$

$$\frac{P(C_1)}{P(C_2)} = \frac{\frac{N_1}{N_1+N_2}}{\frac{N_2}{N_1+N_2}} = \frac{N_1}{N_2}$$

$$P(x|C_1) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma^1|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu^1)^T (\Sigma^1)^{-1} (x-\mu^1)\right)$$

$$P(x|C_2) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma^2|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu^2)^T (\Sigma^2)^{-1} (x-\mu^2)\right)$$
$$\ln\frac{P(x|C_1)}{P(x|C_2)} = \ln\frac{\frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma^1|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu^1)^T (\Sigma^1)^{-1} (x-\mu^1)\right)}{\frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma^2|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu^2)^T (\Sigma^2)^{-1} (x-\mu^2)\right)}$$

$$= \ln\frac{|\Sigma^2|^{1/2}}{|\Sigma^1|^{1/2}} - \frac{1}{2}\left[(x-\mu^1)^T (\Sigma^1)^{-1} (x-\mu^1) - (x-\mu^2)^T (\Sigma^2)^{-1} (x-\mu^2)\right]$$

So

$$z = \ln\frac{|\Sigma^2|^{1/2}}{|\Sigma^1|^{1/2}} - \frac{1}{2}\left[(x-\mu^1)^T (\Sigma^1)^{-1} (x-\mu^1) - (x-\mu^2)^T (\Sigma^2)^{-1} (x-\mu^2)\right] + \ln\frac{N_1}{N_2}$$
Expanding the quadratic terms:

$$(x-\mu^1)^T (\Sigma^1)^{-1} (x-\mu^1) = x^T (\Sigma^1)^{-1} x - x^T (\Sigma^1)^{-1} \mu^1 - (\mu^1)^T (\Sigma^1)^{-1} x + (\mu^1)^T (\Sigma^1)^{-1} \mu^1$$
$$= x^T (\Sigma^1)^{-1} x - 2 (\mu^1)^T (\Sigma^1)^{-1} x + (\mu^1)^T (\Sigma^1)^{-1} \mu^1$$

$$(x-\mu^2)^T (\Sigma^2)^{-1} (x-\mu^2) = x^T (\Sigma^2)^{-1} x - 2 (\mu^2)^T (\Sigma^2)^{-1} x + (\mu^2)^T (\Sigma^2)^{-1} \mu^2$$

Therefore

$$z = \ln\frac{|\Sigma^2|^{1/2}}{|\Sigma^1|^{1/2}} - \frac{1}{2} x^T (\Sigma^1)^{-1} x + (\mu^1)^T (\Sigma^1)^{-1} x - \frac{1}{2} (\mu^1)^T (\Sigma^1)^{-1} \mu^1 + \frac{1}{2} x^T (\Sigma^2)^{-1} x - (\mu^2)^T (\Sigma^2)^{-1} x + \frac{1}{2} (\mu^2)^T (\Sigma^2)^{-1} \mu^2 + \ln\frac{N_1}{N_2}$$
End of Warning
With $\Sigma^1 = \Sigma^2 = \Sigma$, the $\ln\frac{|\Sigma^2|^{1/2}}{|\Sigma^1|^{1/2}}$ term vanishes and the two $x^T \Sigma^{-1} x$ terms cancel, so

$$z = \underbrace{(\mu^1-\mu^2)^T \Sigma^{-1}}_{w^T}\, x \;\underbrace{{}- \frac{1}{2}(\mu^1)^T \Sigma^{-1} \mu^1 + \frac{1}{2}(\mu^2)^T \Sigma^{-1} \mu^2 + \ln\frac{N_1}{N_2}}_{b}$$

$$P(C_1|x) = \sigma(z) = \sigma(w \cdot x + b)$$

In the generative model, we estimate $N_1, N_2, \mu^1, \mu^2, \Sigma$; then we have w and b.
How about directly finding w and b?
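The identification of w and b can be verified numerically: with a shared Σ, σ(w·x + b) matches the posterior computed directly by Bayes' rule. A pure-Python sketch using the slides' 2-D estimates (helper names are hypothetical):

```python
import math

def inv2(s):
    """Inverse and determinant of a 2x2 matrix."""
    (a, b), (c, d) = s
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]], det

def gauss2d(x, mu, sigma):
    """2-D Gaussian density f_{mu,Sigma}(x)."""
    inv, det = inv2(sigma)
    dx = [x[0] - mu[0], x[1] - mu[1]]
    m = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.exp(-0.5 * m) / (2 * math.pi * math.sqrt(det))

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Estimates from the slides
n1, n2 = 79, 61
mu1, mu2 = [75.0, 71.3], [55.6, 59.8]
s1 = [[874.0, 327.0], [327.0, 929.0]]
s2 = [[847.0, 422.0], [422.0, 685.0]]
# Shared covariance: class-size-weighted average
sigma = [[(n1 * s1[i][j] + n2 * s2[i][j]) / (n1 + n2) for j in range(2)]
         for i in range(2)]
inv, _ = inv2(sigma)

# w^T = (mu1 - mu2)^T Sigma^{-1}
d = [mu1[0] - mu2[0], mu1[1] - mu2[1]]
w = [d[0] * inv[0][0] + d[1] * inv[1][0],
     d[0] * inv[0][1] + d[1] * inv[1][1]]

def quad(mu):
    """mu^T Sigma^{-1} mu."""
    return (mu[0] * (inv[0][0] * mu[0] + inv[0][1] * mu[1])
            + mu[1] * (inv[1][0] * mu[0] + inv[1][1] * mu[1]))

# b = -1/2 mu1^T S^-1 mu1 + 1/2 mu2^T S^-1 mu2 + ln(N1/N2)
b = -0.5 * quad(mu1) + 0.5 * quad(mu2) + math.log(n1 / n2)

x = [100.0, 100.0]  # an arbitrary test point
p_sigmoid = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
l1 = gauss2d(x, mu1, sigma) * n1  # priors proportional to class sizes
l2 = gauss2d(x, mu2, sigma) * n2
p_bayes = l1 / (l1 + l2)
print(p_sigmoid, p_bayes)  # the two agree
```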
Reference
• Bishop: Chapter 4.1 – 4.2
• Data: https://www.kaggle.com/abcsds/pokemon
• Useful posts:
  • https://www.kaggle.com/nishantbhadauria/d/abcsds/pokemon/pokemon-speed-attack-hp-defense-analysis-by-type
  • https://www.kaggle.com/nikos90/d/abcsds/pokemon/mastering-pokebars/discussion
  • https://www.kaggle.com/ndrewgele/d/abcsds/pokemon/visualizing-pok-mon-stats-with-seaborn/discussion
Acknowledgment
• Thanks to 江貫榮 for spotting a date error on the course webpage
• Thanks to 范廷瀚 for providing Pokémon domain knowledge
• Thanks to Victor Chen for finding typos in the slides