Machine Learning: Bayesian Decision Theory and Classification
S.V.N. (vishy) Vishwanathan
University of California, Santa Cruz
October 21, 2016
S.V. N. Vishwanathan (UCSC) CMPS242 1 / 21
Outline
1 Binary Classification
2 Generative Models
    Gaussian Generative Model
    Naive Bayes
3 Discriminative Classifiers
    Logistic Regression
Binary Classification
Problem Setting: Binary Classification
Data: $x = (x_1, x_2, \ldots, x_N)^\top$
Labels: $t = (t_1, t_2, \ldots, t_N)^\top$ with $t_i \in \{0, 1\}$; $t = 1$ implies class $C_1$ and $t = 0$ implies class $C_2$.
Let us call the two classes $C_1$ and $C_2$, and let $p(C_1) = \pi$ and $p(C_2) = 1 - \pi$.
Basic Idea
Estimate $p(C_i \mid x)$ and predict $C_1$ if $p(C_1 \mid x) > p(C_2 \mid x)$.
Key Problem: How to estimate $p(C_i \mid x)$?
Two philosophies:
    Generative
    Discriminative
Generative Models
The posterior follows from Bayes' rule:
\[ p(C_i \mid x) = \frac{p(x \mid C_i) \, p(C_i)}{p(x)} \]
Decision function: predict $C_1$ if
\[ p(C_1 \mid x) > p(C_2 \mid x) \]
Applying Bayes' rule to both sides:
\[ \frac{p(x \mid C_1) \, p(C_1)}{p(x)} > \frac{p(x \mid C_2) \, p(C_2)}{p(x)} \]
Since $p(x) > 0$ is common to both sides, it cancels:
\[ p(x \mid C_1) \, p(C_1) > p(x \mid C_2) \, p(C_2) \]
Taking logarithms, which preserves the inequality:
\[ \ln p(x \mid C_1) + \ln p(C_1) > \ln p(x \mid C_2) + \ln p(C_2) \]
With $p(C_1) = \pi$ and $p(C_2) = 1 - \pi$:
\[ \ln p(x \mid C_1) + \ln \pi > \ln p(x \mid C_2) + \ln (1 - \pi) \]
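The log-form decision rule can be sketched in a few lines of Python. This is only an illustration: the helper names (`gaussian_logpdf_1d`, `predict_class`) and the two one-dimensional Gaussian class-conditional densities are assumptions chosen for the example, not part of the slides.

```python
import math

def gaussian_logpdf_1d(x, mean, var):
    """Log density of a 1-D Gaussian; stands in for ln p(x | C_k)."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var

def predict_class(x, log_lik_c1, log_lik_c2, prior_c1):
    """Return 1 if ln p(x|C1) + ln pi > ln p(x|C2) + ln(1 - pi), else 2."""
    score_c1 = log_lik_c1(x) + math.log(prior_c1)
    score_c2 = log_lik_c2(x) + math.log(1.0 - prior_c1)
    return 1 if score_c1 > score_c2 else 2

# Hypothetical class-conditional densities: C1 ~ N(2, 1) and C2 ~ N(-2, 1).
def log_lik_c1(x):
    return gaussian_logpdf_1d(x, 2.0, 1.0)

def log_lik_c2(x):
    return gaussian_logpdf_1d(x, -2.0, 1.0)

print(predict_class(1.5, log_lik_c1, log_lik_c2, prior_c1=0.5))
print(predict_class(-0.5, log_lik_c1, log_lik_c2, prior_c1=0.5))
# A strong prior on C1 can overturn the likelihood's preference for C2.
print(predict_class(-0.5, log_lik_c1, log_lik_c2, prior_c1=0.99))
```

Working with log scores rather than raw products avoids underflow when the likelihoods are very small, which is one practical reason for the logarithmic form of the rule.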
Generative Models: Gaussian Generative Model
Class-Conditional Gaussian Distribution
\[ p(x \mid C_k) = \mathcal{N}(x \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma_k|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) \right\} \]
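As a rough sketch, the logarithm of this density can be evaluated with NumPy as below. The function name `gaussian_log_density` is an assumption made for illustration; the code uses `np.linalg.slogdet` and `np.linalg.solve` instead of forming the determinant and the inverse explicitly.

```python
import numpy as np

def gaussian_log_density(x, mean, cov):
    """ln N(x | mean, cov) for a D-dimensional Gaussian."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    dim = mean.shape[0]
    diff = x - mean
    # slogdet returns (sign, ln|det|); the sign is +1 for a valid covariance.
    _, log_det = np.linalg.slogdet(cov)
    # Solve cov @ z = diff rather than computing the inverse of cov.
    mahalanobis_sq = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (dim * np.log(2.0 * np.pi) + log_det + mahalanobis_sq)

# Sanity check: a standard 2-D Gaussian has density 1 / (2 pi) at its mean.
value = gaussian_log_density([0.0, 0.0], [0.0, 0.0], np.eye(2))
print(np.isclose(value, -np.log(2.0 * np.pi)))
```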
Decision Rule
\[ \ln p(x \mid C_1) + \ln p(C_1) > \ln p(x \mid C_2) + \ln p(C_2) \]
Substituting the Gaussian class-conditional densities:
\[ \ln \left( \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma_1|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu_1)^\top \Sigma_1^{-1} (x - \mu_1) \right\} \right) + \ln p(C_1) > \ln \left( \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma_2|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu_2)^\top \Sigma_2^{-1} (x - \mu_2) \right\} \right) + \ln p(C_2) \]
The $(2\pi)^{D/2}$ factors cancel. Expanding the quadratic forms and collecting terms gives
\[ \frac{1}{2} x^\top \left( \Sigma_2^{-1} - \Sigma_1^{-1} \right) x + \mu_1^\top \Sigma_1^{-1} x - \mu_2^\top \Sigma_2^{-1} x + b > 0 \]
where
\[ b = \ln \left( \frac{\pi}{1 - \pi} \right) - \frac{1}{2} \ln \frac{|\Sigma_1|}{|\Sigma_2|} + \frac{1}{2} \mu_2^\top \Sigma_2^{-1} \mu_2 - \frac{1}{2} \mu_1^\top \Sigma_1^{-1} \mu_1 \]
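One way to check the quadratic decision rule is to compare it numerically with the difference of the two log scores it was derived from. The sketch below does this with NumPy; the function names and all parameter values are hypothetical and chosen only for the check.

```python
import numpy as np

def gaussian_log_density(x, mean, cov):
    """ln N(x | mean, cov)."""
    diff = x - mean
    _, log_det = np.linalg.slogdet(cov)
    return -0.5 * (len(mean) * np.log(2.0 * np.pi) + log_det
                   + diff @ np.linalg.solve(cov, diff))

def quadratic_discriminant(x, prior_c1, mean1, cov1, mean2, cov2):
    """Left-hand side of the quadratic rule; predict C1 when it is positive."""
    prec1 = np.linalg.inv(cov1)
    prec2 = np.linalg.inv(cov2)
    _, log_det1 = np.linalg.slogdet(cov1)
    _, log_det2 = np.linalg.slogdet(cov2)
    b = (np.log(prior_c1 / (1.0 - prior_c1))
         - 0.5 * (log_det1 - log_det2)
         + 0.5 * mean2 @ prec2 @ mean2
         - 0.5 * mean1 @ prec1 @ mean1)
    quadratic = 0.5 * x @ (prec2 - prec1) @ x
    linear = mean1 @ prec1 @ x - mean2 @ prec2 @ x
    return quadratic + linear + b

# Hypothetical parameters and test point.
prior_c1 = 0.3
mean1 = np.array([1.0, 0.0])
cov1 = np.array([[2.0, 0.3], [0.3, 1.0]])
mean2 = np.array([-1.0, 0.5])
cov2 = np.array([[1.0, -0.2], [-0.2, 1.5]])
x = np.array([0.4, -0.7])

# Direct form: (ln p(x|C1) + ln pi) - (ln p(x|C2) + ln(1 - pi)).
direct = (gaussian_log_density(x, mean1, cov1) + np.log(prior_c1)
          - gaussian_log_density(x, mean2, cov2) - np.log(1.0 - prior_c1))
score = quadratic_discriminant(x, prior_c1, mean1, cov1, mean2, cov2)
print(np.isclose(score, direct))
```

Because the covariances differ, the term in $x^\top (\Sigma_2^{-1} - \Sigma_1^{-1}) x$ does not vanish and the decision boundary is a quadratic surface.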
Special Case: $\Sigma_i = \Sigma$
With a shared covariance the quadratic term vanishes and the rule becomes
\[ \underbrace{(\mu_1 - \mu_2)^\top \Sigma^{-1}}_{w^\top} x + b > 0 \]
that is,
\[ w^\top x + b > 0 \]
where
\[ b = \ln \left( \frac{\pi}{1 - \pi} \right) + \frac{1}{2} \mu_2^\top \Sigma^{-1} \mu_2 - \frac{1}{2} \mu_1^\top \Sigma^{-1} \mu_1 \]
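A minimal sketch of the shared-covariance case, assuming NumPy and a hypothetical helper name `linear_discriminant_params`: it computes $w = \Sigma^{-1} (\mu_1 - \mu_2)$ and $b$ from the formulas above.

```python
import numpy as np

def linear_discriminant_params(prior_c1, mean1, mean2, cov):
    """Return (w, b) such that the rule is: predict C1 if w @ x + b > 0."""
    # w = cov^{-1} (mean1 - mean2), obtained by solving a linear system.
    w = np.linalg.solve(cov, mean1 - mean2)
    b = (np.log(prior_c1 / (1.0 - prior_c1))
         + 0.5 * mean2 @ np.linalg.solve(cov, mean2)
         - 0.5 * mean1 @ np.linalg.solve(cov, mean1))
    return w, b

# Hypothetical example: identity covariance, means mirrored about the origin,
# equal priors. The boundary is then the line x_1 = 0.
w, b = linear_discriminant_params(
    prior_c1=0.5,
    mean1=np.array([1.0, 0.0]),
    mean2=np.array([-1.0, 0.0]),
    cov=np.eye(2),
)
x = np.array([0.5, 3.0])
print(w, b, w @ x + b > 0)
```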
Parameter Estimation via MLE
The likelihood of the data and labels is
\[ p(x, t \mid \pi, \mu_1, \mu_2, \Sigma) = \prod_{n=1}^{N} \left[ \pi \cdot \mathcal{N}(x_n \mid \mu_1, \Sigma) \right]^{t_n} \cdot \left[ (1 - \pi) \cdot \mathcal{N}(x_n \mid \mu_2, \Sigma) \right]^{1 - t_n} \]
and the log-likelihood is
\[ \ln p(x, t \mid \pi, \mu_1, \mu_2, \Sigma) = \sum_{n=1}^{N} \left\{ t_n \ln \pi + (1 - t_n) \ln (1 - \pi) + t_n \ln \mathcal{N}(x_n \mid \mu_1, \Sigma) + (1 - t_n) \ln \mathcal{N}(x_n \mid \mu_2, \Sigma) \right\} \]
Focus on $\pi$
Only these terms of the log-likelihood depend on $\pi$:
\[ \sum_{n=1}^{N} \left\{ t_n \ln \pi + (1 - t_n) \ln (1 - \pi) \right\} \]
Take the gradient with respect to $\pi$ and set it to zero:
\[ \pi = \frac{1}{N} \sum_{n=1}^{N} t_n = \frac{N_1}{N} = \frac{N_1}{N_1 + N_2} \]
where $N_1$ and $N_2$ are the numbers of data points in classes $C_1$ and $C_2$.
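The estimate of $\pi$ is just the fraction of points labelled $t_n = 1$. The sketch below computes it and checks, on hypothetical labels, that nearby values of $\pi$ do not give a higher log-likelihood; the function names are assumptions made for the example.

```python
import math

def estimate_prior(labels):
    """MLE of pi = p(C1): the fraction of points with t_n = 1."""
    return sum(labels) / len(labels)

def prior_log_likelihood(labels, prior_c1):
    """Terms of the log-likelihood that depend on pi."""
    return sum(t * math.log(prior_c1) + (1 - t) * math.log(1.0 - prior_c1)
               for t in labels)

labels = [1, 0, 0, 1, 1, 0, 1, 1]  # hypothetical labels: N1 = 5, N2 = 3
pi_hat = estimate_prior(labels)
print(pi_hat)  # 0.625

# The estimate should score at least as well as nearby values of pi.
for other in (pi_hat - 0.05, pi_hat + 0.05):
    assert prior_log_likelihood(labels, pi_hat) >= prior_log_likelihood(labels, other)
```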
Focus on $\mu_1$
Only the terms $\sum_{n=1}^{N} t_n \ln \mathcal{N}(x_n \mid \mu_1, \Sigma)$ of the log-likelihood depend on $\mu_1$.
Take the gradient with respect to $\mu_1$ and set it to zero:
\[ \mu_1 = \frac{1}{N_1} \sum_{n=1}^{N} t_n x_n \]
A similar calculation for $\mu_2$ gives
\[ \mu_2 = \frac{1}{N_2} \sum_{n=1}^{N} (1 - t_n) x_n \]
Focus on $\Sigma$
Take the gradient of the log-likelihood with respect to $\Sigma$ and set it to zero:
\[ \Sigma = \frac{N_1}{N} S_1 + \frac{N_2}{N} S_2 \]
\[ S_1 = \frac{1}{N_1} \sum_{n : t_n = 1} (x_n - \mu_1) (x_n - \mu_1)^\top \]
\[ S_2 = \frac{1}{N_2} \sum_{n : t_n = 0} (x_n - \mu_2) (x_n - \mu_2)^\top \]
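The closed-form estimates for $\pi$, $\mu_1$, $\mu_2$ and the shared $\Sigma$ can be collected into one fitting routine. The following is a sketch under the assumption that the data arrive as an $N \times D$ NumPy array with a 0/1 label vector; the function name and the toy data are hypothetical.

```python
import numpy as np

def fit_shared_covariance_gaussians(X, t):
    """Maximum-likelihood estimates (pi, mu1, mu2, Sigma) for the two-class model."""
    X = np.asarray(X, dtype=float)  # shape (N, D)
    t = np.asarray(t, dtype=float)  # shape (N,), entries in {0, 1}
    n_total = X.shape[0]
    n1 = t.sum()
    n2 = n_total - n1
    prior_c1 = n1 / n_total
    # Class means: label-weighted averages of the data points.
    mean1 = (t[:, None] * X).sum(axis=0) / n1
    mean2 = ((1.0 - t)[:, None] * X).sum(axis=0) / n2
    # Per-class scatter matrices S1 and S2.
    diff1 = X[t == 1] - mean1
    diff2 = X[t == 0] - mean2
    S1 = diff1.T @ diff1 / n1
    S2 = diff2.T @ diff2 / n2
    # Shared covariance: class-size-weighted average of S1 and S2.
    cov = (n1 / n_total) * S1 + (n2 / n_total) * S2
    return prior_c1, mean1, mean2, cov

# Hypothetical toy data: two points per class.
X = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [2.0, 0.0]]
t = [1, 1, 0, 0]
prior_c1, mean1, mean2, cov = fit_shared_covariance_gaussians(X, t)
print(prior_c1, mean1, mean2)
print(cov)
```

Note that these are maximum-likelihood estimates, so the scatter matrices divide by $N_1$ and $N_2$ rather than by $N_k - 1$ as an unbiased estimator would.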
Generative Models: Naive Bayes
Class-Conditional Distribution
For simplicity, let each component $x_i \in \{0, 1\}$ and assume the components are conditionally independent given the class:
\[ p(x \mid C_k) = \prod_{i=1}^{D} \mu_{ki}^{x_i} (1 - \mu_{ki})^{1 - x_i} \]
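As an illustration, the log of this product of Bernoulli terms can be computed as a sum over the components. The function name and the parameter values below are assumptions made for the example.

```python
import math

def bernoulli_log_likelihood(x, mu):
    """ln p(x | C_k) for binary features x and per-feature parameters mu."""
    return sum(xi * math.log(mi) + (1 - xi) * math.log(1.0 - mi)
               for xi, mi in zip(x, mu))

# Hypothetical parameters mu_{ki} for one class and one binary input.
mu_k = [0.9, 0.2, 0.5]
x = [1, 0, 1]

# The product form gives 0.9 * (1 - 0.2) * 0.5 = 0.36.
log_lik = bernoulli_log_likelihood(x, mu_k)
print(math.isclose(math.exp(log_lik), 0.36))
```

The sketch assumes every $\mu_{ki}$ lies strictly between 0 and 1; a parameter equal to 0 or 1 would make one of the logarithms undefined.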
Decision Rule
\[ \ln p(x \mid C_1) + \ln p(C_1) > \ln p(x \mid C_2) + \ln p(C_2) \]
Substituting the class-conditional distribution:
\[ \sum_{i=1}^{D} \left[ x_i \ln \mu_{1i} + (1 - x_i) \ln (1 - \mu_{1i}) \right] + \ln p(C_1) > \sum_{i=1}^{D} \left[ x_i \ln \mu_{2i} + (1 - x_i) \ln (1 - \mu_{2i}) \right] + \ln p(C_2) \]
Moving everything to the left-hand side:
\[ \sum_{i=1}^{D} \left( x_i \ln \frac{\mu_{1i}}{\mu_{2i}} + (1 - x_i) \ln \left( \frac{1 - \mu_{1i}}{1 - \mu_{2i}} \right) \right) + \ln \left( \frac{\pi}{1 - \pi} \right) > 0 \]
Collecting the terms that multiply $x_i$:
\[ \underbrace{\sum_{i=1}^{D} x_i \cdot \ln \frac{\mu_{1i} \cdot (1 - \mu_{2i})}{\mu_{2i} \cdot (1 - \mu_{1i})}}_{w^\top x} + \underbrace{\sum_{i=1}^{D} \ln \left( \frac{1 - \mu_{1i}}{1 - \mu_{2i}} \right) + \ln \left( \frac{\pi}{1 - \pi} \right)}_{b} > 0 \]
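The naive Bayes rule is therefore linear in $x$. A small sketch, with hypothetical parameter values and function names, builds $w$ and $b$ and checks over every binary input of length 3 that $w^\top x + b$ equals the difference of the two log scores. Note that the bias sums the $\ln \frac{1 - \mu_{1i}}{1 - \mu_{2i}}$ terms over all $D$ features.

```python
import itertools
import math

def naive_bayes_linear_params(mu1, mu2, prior_c1):
    """Weights w_i and bias b of the linear form of the naive Bayes rule."""
    w = [math.log(m1 * (1.0 - m2) / (m2 * (1.0 - m1)))
         for m1, m2 in zip(mu1, mu2)]
    b = (sum(math.log((1.0 - m1) / (1.0 - m2)) for m1, m2 in zip(mu1, mu2))
         + math.log(prior_c1 / (1.0 - prior_c1)))
    return w, b

def log_joint(x, mu, prior):
    """ln p(x | C_k) + ln p(C_k) for binary features."""
    return (sum(xi * math.log(m) + (1 - xi) * math.log(1.0 - m)
                for xi, m in zip(x, mu))
            + math.log(prior))

# Hypothetical parameters.
mu1 = [0.8, 0.3, 0.6]
mu2 = [0.2, 0.5, 0.6]
prior_c1 = 0.4
w, b = naive_bayes_linear_params(mu1, mu2, prior_c1)

for x in itertools.product([0, 1], repeat=3):
    linear_score = sum(wi * xi for wi, xi in zip(w, x)) + b
    direct = log_joint(x, mu1, prior_c1) - log_joint(x, mu2, 1.0 - prior_c1)
    assert math.isclose(linear_score, direct, abs_tol=1e-12)
print("linear form matches the direct rule on all 8 inputs")
```

The third feature has the same parameter in both classes, so its weight is zero and it has no influence on the decision.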
Discriminative Classifiers
Rewriting the Model
\[ p(C_1 \mid x) = \frac{p(x \mid C_1) \cdot p(C_1)}{p(x)} \]
Expanding the denominator over the two classes:
\[ p(C_1 \mid x) = \frac{p(x \mid C_1) \cdot p(C_1)}{p(x \mid C_1) \cdot p(C_1) + p(x \mid C_2) \cdot p(C_2)} \]
\[ p(C_1 \mid x) = \frac{\exp(a_1)}{\exp(a_1) + \exp(a_2)} \]
where $a_k = \ln \left[ p(x \mid C_k) \cdot p(C_k) \right]$. Dividing numerator and denominator by $\exp(a_1)$:
\[ p(C_1 \mid x) = \frac{1}{1 + \exp(-a)} = \sigma(a) \]
where $a = a_1 - a_2$.
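The equivalence between the normalized form and the sigmoid of $a = a_1 - a_2$ can be checked numerically. In this sketch the likelihood and prior values are hypothetical, and the sigmoid is written in a branch form that avoids overflow in `math.exp` for large negative arguments.

```python
import math

def sigmoid(a):
    """Logistic sigmoid 1 / (1 + exp(-a)), evaluated without overflow."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def posterior_c1(lik_c1, lik_c2, prior_c1):
    """p(C1 | x) by direct normalization of the two joint probabilities."""
    joint1 = lik_c1 * prior_c1
    joint2 = lik_c2 * (1.0 - prior_c1)
    return joint1 / (joint1 + joint2)

# Hypothetical values of p(x | C1), p(x | C2) and pi for a single input x.
lik_c1, lik_c2, prior_c1 = 0.2, 0.05, 0.3

a1 = math.log(lik_c1 * prior_c1)
a2 = math.log(lik_c2 * (1.0 - prior_c1))
print(math.isclose(posterior_c1(lik_c1, lik_c2, prior_c1), sigmoid(a1 - a2)))
```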
Key Idea
Recall that in the Gaussian case with $\Sigma_i = \Sigma$
\[ a = \ln \frac{p(x \mid C_1) \cdot p(C_1)}{p(x \mid C_2) \cdot p(C_2)} \]
\[ a = \ln p(x \mid C_1) + \ln p(C_1) - \ln p(x \mid C_2) - \ln p(C_2) \]
\[ a = \underbrace{(\mu_1 - \mu_2)^\top \Sigma^{-1}}_{w^\top} \cdot x + b = \underbrace{\mu_1^\top \Sigma^{-1}}_{w_1^\top} \cdot x - \underbrace{\mu_2^\top \Sigma^{-1}}_{w_2^\top} \cdot x + b \]
where
\[ b = \ln \left( \frac{\pi}{1 - \pi} \right) + \frac{1}{2} \mu_2^\top \Sigma^{-1} \mu_2 - \frac{1}{2} \mu_1^\top \Sigma^{-1} \mu_1 \]
Why not model $a$ directly as $w^\top x + b$ for some arbitrary $w$?
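To make the connection concrete, the sketch below checks in one dimension that the Bayes posterior of the shared-variance Gaussian model equals $\sigma(w x + b)$ with $w$ and $b$ as defined above. The parameter values and helper names are hypothetical; a discriminative model would instead treat $w$ and $b$ as free parameters rather than deriving them from $\mu_1$, $\mu_2$, $\Sigma$ and $\pi$.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def sigmoid(a):
    """Logistic sigmoid; adequate for the moderate values of a used here."""
    return 1.0 / (1.0 + math.exp(-a))

# Hypothetical 1-D generative model with a shared variance.
mean1, mean2, var, prior_c1 = 1.0, -0.5, 2.0, 0.4

# 1-D versions of w = Sigma^{-1} (mu1 - mu2) and of the bias b.
w = (mean1 - mean2) / var
b = (math.log(prior_c1 / (1.0 - prior_c1))
     + 0.5 * mean2 ** 2 / var
     - 0.5 * mean1 ** 2 / var)

for x in (-3.0, 0.0, 0.7, 2.5):
    joint1 = gaussian_pdf(x, mean1, var) * prior_c1
    joint2 = gaussian_pdf(x, mean2, var) * (1.0 - prior_c1)
    bayes_posterior = joint1 / (joint1 + joint2)
    assert math.isclose(bayes_posterior, sigmoid(w * x + b), rel_tol=1e-9)
print("posterior equals sigmoid(w * x + b) at all test points")
```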