COSC 522 – Machine Learning
Discriminant Functions
Hairong Qi, Gonzalez Family Professor
Electrical Engineering and Computer Science
University of Tennessee, Knoxville
https://www.eecs.utk.edu/people/hairong-qi/
Email: [email protected]
Course Website: http://web.eecs.utk.edu/~hqi/cosc522/
Recap from Previous Lecture
• Definition of supervised learning (vs. unsupervised learning)
• The difference between the training set and the test set
• The difference between classification and regression
• Definition of "features", "samples", and "dimension"
• From histogram to probability density function (pdf)
• In Bayes' formula, what is the conditional pdf? The prior probability? The posterior probability?
• What does the normalization factor (or evidence) do?
• What is the Bayesian decision rule, or MPP?
• What are decision regions?
• How to calculate the conditional probability of error and the overall probability of error?
• What are the cost function (or objective function) and the optimization method used in MPP?
Recap
Maximum Posterior Probability (MPP): for a given $x$, if $P(\omega_1 \mid x) > P(\omega_2 \mid x)$, then $x$ belongs to class 1; otherwise, class 2, where

$$P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)}$$

Overall probability of error:

$$P(\text{error}) = \int_{\mathcal{R}_1} P(\omega_2 \mid x)\, p(x)\, dx + \int_{\mathcal{R}_2} P(\omega_1 \mid x)\, p(x)\, dx$$
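To make the recap concrete, here is a minimal NumPy/SciPy sketch of the MPP rule for two classes, assuming 1-D Gaussian class-conditional densities; the means, standard deviations, and priors below are illustrative values only.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical 1-D class-conditional Gaussians and priors (illustrative values).
mu, sigma = [0.0, 2.0], [1.0, 1.0]   # class means and standard deviations
prior = [0.6, 0.4]                   # P(w1), P(w2)

def posterior(x):
    """Bayes formula: P(wj|x) = p(x|wj) P(wj) / p(x)."""
    likelihoods = np.array([norm.pdf(x, mu[j], sigma[j]) for j in range(2)])
    joint = likelihoods * np.array(prior)   # p(x|wj) P(wj)
    return joint / joint.sum()              # normalize by the evidence p(x)

x = 0.8
post = posterior(x)
print(post, "-> class", np.argmax(post) + 1)  # MPP: pick the larger posterior
```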
Questions
• What is a discriminant function?
• What is a multivariate Gaussian (or normal density function)?
• What is the covariance matrix and what is its dimension?
• What would the covariance matrix look like if the features are independent from each other?
• What would the covariance matrix look like if the features are independent from each other AND have the same spread in each dimension?
• What is the minimum (Euclidean) distance classifier? Is it a linear or quadratic classifier (machine)? What does the decision boundary look like?
• What are the assumptions made when using a minimum (Euclidean) distance classifier?
• What is the minimum (Mahalanobis) distance classifier? Is it a linear or quadratic classifier (machine)? What does the decision boundary look like?
• What are the assumptions made when using a minimum (Mahalanobis) distance classifier?
• What does the decision boundary look like for a quadratic classifier?
• What are the cost functions for the discriminant functions? And what is the optimization method used to find the best solution?
Multi-variate Gaussian
Linear and Quadratic Machines
and their assumptions
Discriminant Function
One way to represent a pattern classifier is through discriminant functions $g_i(x)$: the classifier assigns a feature vector $x$ to class $\omega_i$ if

$$g_i(x) > g_j(x) \quad \text{for all } j \neq i$$

For the two-class case, a single discriminant suffices:

$$g(x) = g_1(x) - g_2(x) = P(\omega_1 \mid x) - P(\omega_2 \mid x)$$
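A minimal sketch of this decision rule: given any set of discriminant functions $g_i(x)$, classification is just an argmax. The three linear discriminants below are hypothetical toy values, chosen only to show the mechanics.

```python
import numpy as np

# Three hypothetical linear discriminants g_i(x) = w_i . x + b_i (toy values).
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])  # one weight row per class
b = np.array([0.0, -0.5, 0.2])

def classify(x):
    g = W @ x + b                 # evaluate g_i(x) for every class i
    return int(np.argmax(g))      # assign x to the class with the largest g_i

print(classify(np.array([2.0, 1.0])))  # -> index of the winning class
```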
Multivariate Normal Density
$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right]$$

where
$\mathbf{x}$: $d$-component column vector
$\boldsymbol{\mu}$: $d$-component mean vector
$\Sigma$: $d$-by-$d$ covariance matrix
$|\Sigma|$: its determinant
$\Sigma^{-1}$: its inverse

$$\mathbf{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_d \end{bmatrix}, \qquad \boldsymbol{\mu} = \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_d \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \sigma_{11} & \cdots & \sigma_{1d} \\ \vdots & \ddots & \vdots \\ \sigma_{d1} & \cdots & \sigma_{dd} \end{bmatrix}$$

When $d = 1$:

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right]$$
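The density above can be evaluated directly from the formula. As a sanity check, the sketch below compares a hand-rolled implementation against scipy.stats.multivariate_normal; the mean and covariance values are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(x, mu, Sigma):
    """Evaluate the d-variate normal density p(x) from the formula above."""
    d = len(mu)
    diff = x - mu
    maha = diff @ np.linalg.inv(Sigma) @ diff   # (x-mu)^T Sigma^{-1} (x-mu)
    norm_const = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * maha) / norm_const

mu = np.array([0.0, 1.0])                    # illustrative mean
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])   # illustrative covariance
x = np.array([0.5, 0.5])

print(gaussian_pdf(x, mu, Sigma))
print(multivariate_normal(mu, Sigma).pdf(x))  # should agree
```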
Discriminant Function for Normal Density
Taking the natural logarithm of the class-conditional Gaussian density

$$p(x \mid \omega_i) = \frac{1}{(2\pi)^{d/2}\, |\Sigma_i|^{1/2}} \exp\left[ -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) \right]$$

gives the discriminant function

$$\begin{aligned} g_i(x) &= \ln p(x \mid \omega_i) + \ln P(\omega_i) \\ &= -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - \frac{d}{2} \ln 2\pi - \frac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i) \\ &= -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - \frac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i) \end{aligned}$$

where the constant $-\frac{d}{2} \ln 2\pi$ is dropped in the last step because it is the same for every class.
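A minimal sketch of this discriminant, computed in log space exactly as derived above; the class means, covariances, and priors are illustrative values. (The $-\frac{d}{2}\ln 2\pi$ term is kept here; dropping it shifts every $g_i(x)$ equally and does not change the decision.)

```python
import numpy as np

def g(x, mu, Sigma, prior):
    """g_i(x) = -1/2 (x-mu)^T Sigma^{-1} (x-mu) - d/2 ln 2pi - 1/2 ln|Sigma| + ln P(w_i)."""
    d = len(mu)
    diff = x - mu
    maha = diff @ np.linalg.inv(Sigma) @ diff
    return (-0.5 * maha
            - 0.5 * d * np.log(2 * np.pi)
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(prior))

# Two illustrative classes with different means/covariances, equal priors.
mu1, S1 = np.array([0.0, 0.0]), np.eye(2)
mu2, S2 = np.array([2.0, 2.0]), np.array([[1.5, 0.2], [0.2, 0.8]])
x = np.array([1.0, 1.2])
scores = [g(x, mu1, S1, 0.5), g(x, mu2, S2, 0.5)]
print("assign to class", int(np.argmax(scores)) + 1)
```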
Case 1: $\Sigma_i = \sigma^2 I$
The features are statistically independent and have the same variance. Geometrically, the samples fall in equal-size hyperspherical clusters. Decision boundary: a hyperplane of $d-1$ dimensions.

$$\Sigma = \begin{bmatrix} \sigma^2 & & 0 \\ & \ddots & \\ 0 & & \sigma^2 \end{bmatrix}, \qquad |\Sigma| = \sigma^{2d}, \qquad \Sigma^{-1} = \begin{bmatrix} \frac{1}{\sigma^2} & & 0 \\ & \ddots & \\ 0 & & \frac{1}{\sigma^2} \end{bmatrix}$$
Linear Discriminant Function and Linear Machine
$$\begin{aligned} g_i(x) &= -\frac{\|x - \mu_i\|^2}{2\sigma^2} + \ln P(\omega_i) \\ &= -\frac{x^T x - 2\mu_i^T x + \mu_i^T \mu_i}{2\sigma^2} + \ln P(\omega_i) \\ &= \frac{\mu_i^T x}{\sigma^2} - \frac{\mu_i^T \mu_i}{2\sigma^2} + \ln P(\omega_i) \end{aligned}$$

where $\|x - \mu_i\|$ is the Euclidean norm (distance), $\|x - \mu_i\|^2 = (x - \mu_i)^T (x - \mu_i)$, and the $x^T x$ term is dropped in the last step because it is the same for every class. The result is linear in $x$, hence a linear machine.
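A minimal sketch of the linear machine above, precomputing the weight vector $w_i = \mu_i / \sigma^2$ and bias $w_{i0} = -\mu_i^T \mu_i / (2\sigma^2) + \ln P(\omega_i)$ from the derivation; the means, shared variance, and priors are illustrative.

```python
import numpy as np

# Illustrative parameters: two class means, shared variance, unequal priors.
mus = np.array([[0.0, 0.0], [2.0, 1.0]])
sigma2 = 1.5
priors = np.array([0.7, 0.3])

# Linear machine: g_i(x) = w_i^T x + w_i0, per the derivation above.
W = mus / sigma2                                              # w_i = mu_i / sigma^2
w0 = -np.sum(mus * mus, axis=1) / (2 * sigma2) + np.log(priors)

def classify(x):
    return int(np.argmax(W @ x + w0))   # largest g_i(x) wins

print(classify(np.array([1.0, 0.4])))
```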
11
Minimum-Distance Classifier
When $P(\omega_i)$ is the same for all $c$ classes, the discriminant function is effectively measuring the distance from $x$ to each of the $c$ mean vectors, and $x$ is assigned to the class with the nearest mean:

$$g_i(x) = -\frac{\|x - \mu_i\|^2}{2\sigma^2}$$
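A minimal sketch of the minimum Euclidean distance classifier: with equal priors and $\Sigma_i = \sigma^2 I$, maximizing $g_i(x)$ is the same as picking the nearest class mean. The means below are illustrative.

```python
import numpy as np

means = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])  # illustrative class means

def min_euclidean(x):
    # Equal priors, Sigma_i = sigma^2 I: argmax g_i(x) == argmin ||x - mu_i||^2
    d2 = np.sum((means - x) ** 2, axis=1)   # squared Euclidean distances
    return int(np.argmin(d2))

print(min_euclidean(np.array([2.5, 0.4])))  # -> 1 (nearest mean is [3, 0])
```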
Case 2: $\Sigma_i = \Sigma$
The covariance matrices for all the classes are identical, but not a scalar multiple of the identity matrix. Geometrically, the samples fall in hyperellipsoidal clusters of equal size and shape. Decision boundary: a hyperplane of $d-1$ dimensions.

$$\begin{aligned} g_i(x) &= \ln p(x \mid \omega_i) + \ln P(\omega_i) \\ &= -\frac{1}{2} (x - \mu_i)^T \Sigma^{-1} (x - \mu_i) + \ln P(\omega_i) \\ &= \left( \Sigma^{-1} \mu_i \right)^T x - \frac{1}{2} \mu_i^T \Sigma^{-1} \mu_i + \ln P(\omega_i) \end{aligned}$$

Here terms that are the same for every class are dropped, and $(x - \mu_i)^T \Sigma^{-1} (x - \mu_i)$ is the squared Mahalanobis distance from $x$ to $\mu_i$.
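A minimal sketch of the Case 2 classifier: with a shared $\Sigma$ and equal priors, the rule reduces to minimum squared Mahalanobis distance. The shared covariance and the means below are illustrative.

```python
import numpy as np

means = np.array([[0.0, 0.0], [3.0, 1.0]])    # illustrative class means
Sigma = np.array([[2.0, 0.8], [0.8, 1.0]])    # shared covariance (illustrative)
Sigma_inv = np.linalg.inv(Sigma)

def min_mahalanobis(x):
    # Equal priors, shared Sigma: argmax g_i(x) == argmin (x-mu_i)^T Sigma^{-1} (x-mu_i)
    d2 = [(x - m) @ Sigma_inv @ (x - m) for m in means]
    return int(np.argmin(d2))

print(min_mahalanobis(np.array([1.8, 0.9])))
```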
Case 3: $\Sigma_i$ arbitrary
The covariance matrices are different for each category, which yields a quadratic classifier. Decision boundary: hyperquadratic for 2-D Gaussians.

$$\begin{aligned} g_i(x) &= \ln p(x \mid \omega_i) + \ln P(\omega_i) \\ &= -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - \frac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i) \\ &= -\frac{1}{2} x^T \Sigma_i^{-1} x + \mu_i^T \Sigma_i^{-1} x - \frac{1}{2} \mu_i^T \Sigma_i^{-1} \mu_i - \frac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i) \end{aligned}$$
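A minimal sketch of the Case 3 quadratic classifier, evaluating the full $g_i(x)$ per class since $|\Sigma_i|$ no longer cancels across classes; all parameter values below are illustrative.

```python
import numpy as np

# Two illustrative classes with different covariance matrices, equal priors.
params = [
    (np.array([0.0, 0.0]), np.array([[1.0, 0.0], [0.0, 1.0]]), 0.5),
    (np.array([2.0, 0.0]), np.array([[3.0, 1.0], [1.0, 2.0]]), 0.5),
]

def quadratic_g(x, mu, Sigma, prior):
    diff = x - mu
    return (-0.5 * diff @ np.linalg.inv(Sigma) @ diff
            - 0.5 * np.log(np.linalg.det(Sigma))   # |Sigma_i| differs per class: keep it
            + np.log(prior))

x = np.array([1.0, 0.5])
scores = [quadratic_g(x, m, S, p) for m, S, p in params]
print("assign to class", int(np.argmax(scores)) + 1)
```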
Bayes Decision Rule

$$P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)}$$

Maximum Posterior Probability: for a given $x$, if $P(\omega_1 \mid x) > P(\omega_2 \mid x)$, then $x$ belongs to class 1; otherwise, class 2.

Discriminant Function: the classifier assigns a feature vector $x$ to class $\omega_i$ if $g_i(x) > g_j(x)$ for all $j \neq i$.
Case 1: Minimum Euclidean Distance (Linear Machine), $\Sigma_i = \sigma^2 I$
Case 2: Minimum Mahalanobis Distance (Linear Machine), $\Sigma_i = \Sigma$
Case 3: Quadratic Classifier, $\Sigma_i$ arbitrary
All three cases assume Gaussian pdfs.
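To tie the summary together, here is an end-to-end sketch under the same assumptions: estimate $\mu_i$ and $\Sigma_i$ from synthetic training data, then classify a test point under each of the three covariance assumptions. The data-generating parameters and the simple pooled-covariance choice are illustrative, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-class Gaussian training data (illustrative parameters).
X1 = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], size=200)
X2 = rng.multivariate_normal([2, 2], [[1.5, -0.4], [-0.4, 0.7]], size=200)

def fit(X):
    """Maximum-likelihood estimates of the mean and covariance."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

(mu1, S1), (mu2, S2) = fit(X1), fit(X2)
x = np.array([1.0, 1.0])

# Case 1: Sigma_i = sigma^2 I  -> minimum Euclidean distance.
case1 = np.argmin([np.sum((x - m) ** 2) for m in (mu1, mu2)])

# Case 2: shared Sigma (pooled) -> minimum Mahalanobis distance.
Sp = 0.5 * (S1 + S2)                        # simple pooled covariance
Sp_inv = np.linalg.inv(Sp)
case2 = np.argmin([(x - m) @ Sp_inv @ (x - m) for m in (mu1, mu2)])

# Case 3: arbitrary Sigma_i -> quadratic discriminant (equal priors assumed).
def g(x, mu, S):
    d = x - mu
    return -0.5 * d @ np.linalg.inv(S) @ d - 0.5 * np.log(np.linalg.det(S))

case3 = np.argmax([g(x, mu1, S1), g(x, mu2, S2)])

print("Case 1:", case1 + 1, "Case 2:", case2 + 1, "Case 3:", case3 + 1)
```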