Huang, Kaizhu — Classifier based on mixture of density trees. CSE Department, The Chinese University of Hong Kong.
Posted 21-Dec-2015
Slide 1

Huang, Kaizhu
Classifier based on mixture of density trees
CSE Department, The Chinese University of Hong Kong
Slide 2
Basic Problem

Given a dataset

{(X1, C), (X2, C), …, (XN−1, C), (XN, C)}

Here Xi stands for a training datum and C stands for the class label. Assuming we have m classes, we estimate the probability

P(Ci | X), i = 1, 2, …, m    (1)

The classifier is then denoted by:

c(X) = argmax_i P(Ci | X)

The key point is: how can we estimate the posterior probability (1)?
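As a minimal sketch of the decision rule (with hypothetical posterior values; estimating P(Ci | X) itself is the subject of the following slides):

```python
# Minimal sketch of c(X) = argmax_i P(C_i | X), assuming the
# posteriors for one input X have already been estimated somehow.
def classify(posteriors):
    """posteriors: dict mapping class label -> estimated P(C_i | X)."""
    return max(posteriors, key=posteriors.get)

# Hypothetical posterior estimates for a single input X:
print(classify({"c1": 0.2, "c2": 0.7, "c3": 0.1}))  # -> c2
```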
Slide 3
Density Estimation Problem

Given a dataset D

{X1, X2, …, XN−1, XN}

where each Xi is an instance of an m-variable vector

{v1, v2, …, vm−1, vm}

the goal is to find a joint distribution P(v1, v2, …, vm−1, vm) that maximizes the log-likelihood (the negative of the empirical entropy):

Q = argmax_P Σ_{i=1}^{N} log P(Xi)
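A tiny sketch of this objective on hypothetical discrete data, illustrating that the empirical distribution scores at least as high as any other candidate P:

```python
import math

# Sketch: the objective sum_{i=1}^{N} log P(X_i) for a candidate
# distribution P over a discrete state space (hypothetical data).
def log_likelihood(P, data):
    """P: dict mapping state -> probability; data: list of observed states."""
    return sum(math.log(P[x]) for x in data)

data = ["a", "a", "b"]
P1 = {"a": 2 / 3, "b": 1 / 3}   # the empirical distribution of data
P2 = {"a": 0.5, "b": 0.5}       # a uniform alternative
assert log_likelihood(P1, data) > log_likelihood(P2, data)
```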
Slide 4
Interpretation

Define Ω as the state space. Then

(1/N) Σ_{i=1}^{N} log P(Xi) = Σ_{x∈Ω} P̂(x) log P(x)

where P̂ is the empirical distribution of D. According to information theory, the negative of this quantity measures how many bits are needed to describe D based on the probability distribution P. Finding the maximizing P is thus actually finding the well-known MDL (minimum description length) model.
Slide 5
Graphical Density Estimation

- Naive Bayesian network (NB)
- Tree-augmented naive Bayesian network (TANB)
- Chow-Liu tree network (CL)
- Mixture-of-trees network (MT)
Slide 6
(Figure: graphical structures of the NB, TANB, Chow-Liu, and mixture-of-trees models; the mixture of trees is fitted with EM.)
Slide 7
Naive Bayesian Network

Given the problem in Slide 3, we make the following assumption about the variables: all the variables are independent, given the class label. Then the joint distribution P can be written as:

P(v1, v2, …, vm−1, vm | C) = Π_{i=1}^{m} P(vi | C)
Slide 8
Structure of NB

1. With this structure, it is easy to estimate the joint distribution, since we can obtain each P(vi | C) by the following simple accumulation (assume vi is a discrete variable and k is one of vi's values):

P(vi = k | C) = N_{vi=k, C} / N_C

where N_{vi=k, C} is the number of class-C training examples with vi = k, and N_C is the number of class-C training examples.
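The counting estimate above can be sketched as follows, on a tiny hypothetical dataset (`estimate_conditional` is an illustrative helper, not from the slides):

```python
# Sketch of the accumulation P(v_i = k | C) = N_{v_i=k, C} / N_C
# on hypothetical data: each entry is (feature tuple, class label).
def estimate_conditional(data, i, k, c):
    """Estimate P(v_i = k | C = c) by counting."""
    n_c = sum(1 for features, label in data if label == c)
    n_kc = sum(1 for features, label in data if label == c and features[i] == k)
    return n_kc / n_c

data = [((0, 1), "A"), ((0, 0), "A"), ((1, 1), "B"), ((0, 1), "B")]
# P(v_0 = 0 | C = "A"): both class-A rows have v_0 = 0.
assert estimate_conditional(data, 0, 0, "A") == 1.0
# P(v_0 = 0 | C = "B"): one of the two class-B rows.
assert estimate_conditional(data, 0, 0, "B") == 0.5
```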
Slide 9
Chow-Liu Tree Network

Given the problem in Slide 3, we make the following assumption about the variables:

Each variable has a direct dependence relationship with just one other variable and is conditionally independent of the other variables, given the class label.

Thus the joint distribution can be written as:

P(v1, v2, …, vm−1, vm | C) = Π_{i=1}^{m} P(v_{l(i)} | v_{l(j(i))}, C), with 0 ≤ j(i) < i for i = 1, 2, …, m (j(i) = 0 meaning no parent, i.e. P(v_{l(i)} | C)),

where l is an unknown permutation of the integers 1, 2, …, m.
Slide 10
Example of CL Method

Fig. 2 is an example of a CL tree, where:
1. v3 is conditionally dependent only on v4 and conditionally independent of the other variables: P(v3|v4,B) = P(v3|v4)
2. v5 is conditionally dependent only on v4 and conditionally independent of the other variables: P(v5|v4,B) = P(v5|v4)
3. v2 is conditionally dependent only on v3 and conditionally independent of the other variables: P(v2|v3,B) = P(v2|v3)
4. v1 is conditionally dependent only on v3 and conditionally independent of the other variables: P(v1|v3,B) = P(v1|v3)
Slide 11
P(v1, v2, v3, v4, v5)
= P(v4) P(v1, v2, v3, v5 | v4)
= P(v4) P(v5 | v4) P(v1, v2, v3 | v4, v5)
= P(v4) P(v5 | v4) P(v1, v2, v3 | v4)                    (v5 drops out by conditional independence)
= P(v4) P(v5 | v4) P(v3 | v4) P(v1, v2 | v3, v4)
= P(v4) P(v5 | v4) P(v3 | v4) P(v1, v2 | v3)             (v4 drops out given v3)
= P(v4) P(v5 | v4) P(v3 | v4) P(v2 | v3) P(v1 | v2, v3)
= P(v4) P(v5 | v4) P(v3 | v4) P(v2 | v3) P(v1 | v3)      (v2 drops out given v3)
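As a numerical sanity check, a sketch that evaluates this factorization with hypothetical conditional probability tables (binary variables) confirms it defines a proper distribution:

```python
from itertools import product

# Sketch: evaluate P(v1..v5) = P(v4) P(v5|v4) P(v3|v4) P(v2|v3) P(v1|v3)
# for binary variables, using hypothetical conditional probability tables.
p_v4 = {0: 0.6, 1: 0.4}
p_v5_given_v4 = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}
p_v3_given_v4 = {(0, 0): 0.5, (1, 0): 0.5, (0, 1): 0.9, (1, 1): 0.1}
p_v2_given_v3 = {(0, 0): 0.4, (1, 0): 0.6, (0, 1): 0.5, (1, 1): 0.5}
p_v1_given_v3 = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.3, (1, 1): 0.7}

def joint(v1, v2, v3, v4, v5):
    return (p_v4[v4] * p_v5_given_v4[(v5, v4)] * p_v3_given_v4[(v3, v4)]
            * p_v2_given_v3[(v2, v3)] * p_v1_given_v3[(v1, v3)])

# The factorization defines a proper distribution: it sums to 1.
total = sum(joint(*bits) for bits in product([0, 1], repeat=5))
assert abs(total - 1.0) < 1e-9
```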
Slide 12
CL Tree

P(v1, v2, …, vm−1, vm | C) = Π_{i=1}^{m} P(v_{l(i)} | v_{l(j(i))}, C), 0 ≤ j(i) < i, where l is an unknown permutation of the integers 1, 2, …, m.

• The key point about the CL tree is that we use a multiplication of two-dimensional variable distributions to approximate the high-dimensional distribution.

Then how can we find the multiplication of two-dimensional variable distributions that approximates the high-dimensional distribution optimally?
Slide 13
CL Tree Algorithm

1. Obtain P(vi|vj) and P(vi, vj) for each pair (vi, vj) by an accumulation process.

2. Calculate the mutual information

I(vi, vj) = Σ_{vi, vj} P(vi, vj) log( P(vi, vj) / (P(vi) P(vj)) )

3. Use a maximum spanning tree algorithm to find the optimal tree structure, where the edge weight between two nodes vi, vj is I(vi, vj).

This CL algorithm was proved to be optimal in [1].
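The three steps can be sketched as follows on hypothetical binary data; the maximum spanning tree is grown with Prim's algorithm (an assumption — the slides do not specify which spanning-tree algorithm is used):

```python
import math
from itertools import combinations

# Sketch of the CL procedure: estimate pairwise mutual information by
# counting, then grow a maximum spanning tree over the variables (Prim).
def mutual_information(data, i, j):
    n = len(data)
    pij, pi, pj = {}, {}, {}
    for row in data:
        pij[(row[i], row[j])] = pij.get((row[i], row[j]), 0) + 1 / n
        pi[row[i]] = pi.get(row[i], 0) + 1 / n
        pj[row[j]] = pj.get(row[j], 0) + 1 / n
    return sum(p * math.log(p / (pi[a] * pj[b])) for (a, b), p in pij.items())

def max_spanning_tree(n_vars, weight):
    """Prim's algorithm on the complete graph; weight keys (i, j), i < j."""
    in_tree, edges = {0}, []
    while len(in_tree) < n_vars:
        best = max(((i, j) for i in in_tree for j in range(n_vars)
                    if j not in in_tree),
                   key=lambda e: weight[tuple(sorted(e))])
        edges.append(tuple(sorted(best)))
        in_tree.add(best[1])
    return edges

# Hypothetical data: v0 and v1 perfectly correlated, v2 independent.
data = [(0, 0, 1), (1, 1, 0), (0, 0, 0), (1, 1, 1)]
w = {(i, j): mutual_information(data, i, j) for i, j in combinations(range(3), 2)}
tree = max_spanning_tree(3, w)
assert (0, 1) in tree   # the strongly dependent pair is kept as an edge
assert len(tree) == 2   # a spanning tree over 3 variables has 2 edges
```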
Slide 14
Mixture of Trees (MT)

A mixture-of-trees model is defined to be a distribution of the form:

Q(v) = Σ_{k=1}^{l} λk Tk(v), with λk ≥ 0 for k = 1, …, l and Σ_{k=1}^{l} λk = 1,

where each Tk(v) is a CL tree and l is a given integer.

An MT can be viewed as containing an unobserved choice variable z, which takes values k ∈ {1, …, l}.
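A minimal sketch of evaluating Q(v), with hypothetical stand-in component trees over two binary variables:

```python
# Sketch: evaluating Q(v) = sum_k lambda_k T_k(v) with hypothetical
# component trees written as plain functions over binary vectors.
def t1(v):  # hypothetical component tree 1: v[1] depends on v[0]
    p0 = 0.9 if v[0] == 0 else 0.1
    p1 = 0.8 if v[1] == v[0] else 0.2
    return p0 * p1

def t2(v):  # hypothetical component tree 2: both variables independent
    return 0.5 * 0.5

lambdas = [0.7, 0.3]   # mixture weights; they sum to 1
trees = [t1, t2]

def Q(v):
    return sum(lam * t(v) for lam, t in zip(lambdas, trees))

# Q is a proper distribution: it sums to 1 over all binary vectors.
total = sum(Q((a, b)) for a in (0, 1) for b in (0, 1))
assert abs(total - 1.0) < 1e-9
```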
Slide 15
Slide 16
Difference between MT and CL

- z can be any integer variable; in particular, when the unobserved variable z is the class label, the MT turns into the multi-CL tree.
- CL is a supervised learning algorithm: one tree has to be trained for each class.
- MT is an unsupervised learning algorithm: it treats the class variable as just another part of the training data.
Slide 17
Optimization Problem of MT

Given a data set of observations

D = {x1, x2, …, xN}

we are required to find the mixture of trees Q' that satisfies

Q' = argmax_Q Σ_{i=1}^{N} log Q(Xi)    (1)

This optimization problem in a mixture model can be solved by the EM (Expectation-Maximization) method.
Slide 18

Slide 19

Slide 20
We maximize (7) with respect to λk and Tk with the constraint

Σ_{k=1}^{l} λk = 1

We can then obtain the update equation for λk.

As for the second term of (7), it is in fact a CL procedure, so we can maximize it by finding a CL tree based on the weighted data.
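As a hedged illustration of the EM update for the mixture weights — the standard recipe for any mixture of the form Q(x) = Σk λk Tk(x), not necessarily the exact equations of (7) — one iteration looks like this sketch (the components are hypothetical stand-ins, not actual CL trees):

```python
# Sketch of one EM iteration for the mixture weights lambda_k in
# Q(x) = sum_k lambda_k T_k(x); components are hypothetical densities.
def em_step(data, lambdas, components):
    n, l = len(data), len(lambdas)
    # E-step: responsibilities gamma_k(i) = lambda_k T_k(x_i) / Q(x_i)
    gammas = []
    for x in data:
        probs = [lam * t(x) for lam, t in zip(lambdas, components)]
        total = sum(probs)
        gammas.append([p / total for p in probs])
    # M-step for the weights: lambda_k = (1/n) sum_i gamma_k(i),
    # which satisfies the constraint sum_k lambda_k = 1 automatically.
    return [sum(g[k] for g in gammas) / n for k in range(l)]

data = [0, 0, 1]
components = [lambda x: 0.9 if x == 0 else 0.1,   # favors x = 0
              lambda x: 0.1 if x == 0 else 0.9]   # favors x = 1
new_lambdas = em_step(data, [0.5, 0.5], components)
assert abs(sum(new_lambdas) - 1.0) < 1e-9
assert new_lambdas[0] > new_lambdas[1]  # the data is mostly x = 0
```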
Slide 21
MT in Classifiers

1. Training phase: train the MT model on the training data over the domain {c} × V, where c is the class label and V is the input domain.

2. Testing phase: a new instance x ∈ V is classified by picking the most likely value of the class variable given the settings of the other variables:

c(x) = argmax_c Q(c, x)
Slide 22
Multi-CL in Handwritten Digit Recognition

1. Feature extraction

The four basic configurations above are rotated in the four cardinal directions and applied to the characters in the six overlapping zones shown in the following figure.
Slide 23
Multi-CL in handwritten digit recognition
So we have 4 × 4 × 6 = 96-dimensional feature vectors.
Slide 24
Multi-CL in Handwritten Digit Recognition

1. For a given pattern, we calculate the probability that the pattern belongs to each class (for digits we have 10 classes: 0, 1, 2, …, 9).
2. We choose the class with the maximum probability as the classification result.

Here the probability that the pattern "2" belongs to class 2 is the maximum, so we classify it as digit 2.
Slide 25
Discussion

1. When all of the component trees have the same structure, the MT becomes a TANB model.
2. When z is the class label, the MT becomes a CL model.
3. The MT is a generalization of the CL and naive Bayesian models, so its performance is expected to be better than NB, TANB, and CL.