Applications of Supervised Learning in Bioinformatics
Yen-Jen Oyang
Dept. of Computer Science and Information Engineering
Problem Definition of Supervised Learning (or Data Classification)

In a supervised learning problem, each sample is described by a set of feature values, and each sample belongs to one of the predefined classes. The goal is to derive a set of rules that predicts which class an incoming query sample belongs to, based on a given set of training samples. Supervised learning is also called data classification.
The Vector Space Model

[Figure: an n × m data matrix with columns feature_1, feature_2, ..., feature_m and rows sample_1, sample_2, ..., sample_n; each sample is a feature vector v = (x_1, x_2, \ldots, x_m) \in R^m, and the samples are grouped into Class 1, Class 2, ..., Class C.]
Application of Supervised Learning in Microarray Data Analysis

In microarray data analysis, supervised learning algorithms have been employed to predict the class of an incoming query sample based on the existing samples with known classes.
For example, in the Leukemia data set, there are 72 samples and 7129 genes:
25 Acute Myeloid Leukemia (AML) samples.
38 B-cell Acute Lymphoblastic Leukemia (B-cell ALL) samples.
9 T-cell Acute Lymphoblastic Leukemia (T-cell ALL) samples.
Model of the Leukemia Dataset

[Figure: a 72 × 7129 data matrix with columns gene_1, gene_2, ..., gene_7129 and rows sample_1, sample_2, ..., sample_72; each sample is a feature vector v = (x_1, x_2, \ldots, x_{7129}) \in R^{7129}, and the samples are grouped into Class 1, Class 2, and Class 3.]
Training Process

From the mathematical point of view, the task of a supervised learning algorithm in the training stage is to identify curves that separate samples of different classes. Prediction of the class of an incoming query sample is carried out by referring to the separating curves identified during the training stage.
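As a concrete illustration of the two stages, here is a minimal sketch (an assumed example, not an algorithm from the lecture): a nearest-centroid classifier whose training step computes one mean vector per class, and whose implied separating curve is the perpendicular bisector between the class means.

```python
def train(samples):
    """Training stage: compute one mean vector (centroid) per class
    from (feature_vector, label) pairs."""
    sums, counts = {}, {}
    for x, y in samples:
        counts[y] = counts.get(y, 0) + 1
        sums[y] = [a + b for a, b in zip(sums.get(y, [0.0] * len(x)), x)]
    return {y: [v / counts[y] for v in sums[y]] for y in sums}

def predict(model, x):
    """Prediction stage: refer to the boundary implied by the centroids,
    i.e. pick the class with the nearest mean."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, model[y])))

model = train([([0, 0], "A"), ([1, 1], "A"), ([8, 8], "B"), ([9, 9], "B")])
```

For two classes in the plane, the set of points equidistant from both centroids is exactly the separating line this predictor consults.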
The Basis of Kernel Regression

For a 1-dimensional function f(x), we have

f(x) = \int_{-\infty}^{\infty} f(t)\,\delta(t - x)\,dt = \lim_{\sigma \to 0} \sum_{k} f(k\Delta)\,\Delta\,\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - k\Delta)^2}{2\sigma^2}},

with \Delta \to 0, \sigma \to 0, and \Delta/\sigma \to 0.
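The limit above can be checked numerically: with a fine grid spacing Δ and a narrow width σ (Δ ≪ σ ≪ 1), the weighted sum of Gaussians reproduces a smooth f at any point. A small sketch (the grid span and step sizes are arbitrary choices, not from the lecture):

```python
import math

def gaussian_sum(f, x, delta=0.01, sigma=0.02, span=5.0):
    """Approximate f(x) by sum_k f(k*delta) * delta * N(x; k*delta, sigma),
    i.e. a Riemann sum of the convolution of f with a narrow Gaussian."""
    n = int(span / delta)
    total = 0.0
    for k in range(-n, n + 1):
        mu = k * delta
        total += (f(mu) * delta
                  * math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
                  / (math.sqrt(2 * math.pi) * sigma))
    return total

approx = gaussian_sum(math.sin, 1.0)  # close to sin(1)
```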
Problem Definition of Kernel Density Estimation (KDE) with Gaussian Kernels

Given a set of samples s_1, s_2, \ldots, s_n randomly taken from a probability distribution, we want to find a set of Gaussian functions K(\nu; \mu_i, \sigma_i) and the corresponding weights w_i to obtain an approximate probability density function, i.e.

\hat{f}(\nu) = \sum_i w_i K(\nu; \mu_i, \sigma_i) = \sum_i w_i\,\frac{1}{\sqrt{2\pi}\,\sigma_i}\, e^{-\frac{(\nu - \mu_i)^2}{2\sigma_i^2}} \cong f(\nu).
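A minimal sketch of this construction in one dimension, assuming the simplest choice of equal weights w_i = 1/n and one kernel centred at each sample with a common width (the sample values and width are made up for illustration):

```python
import math

def kde_pdf(samples, sigma):
    """f_hat(v) = (1/n) * sum_i N(v; s_i, sigma): one Gaussian kernel
    per sample, equal weights 1/n."""
    n = len(samples)
    def f_hat(v):
        return sum(math.exp(-(v - s) ** 2 / (2 * sigma ** 2))
                   / (math.sqrt(2 * math.pi) * sigma) for s in samples) / n
    return f_hat

f_hat = kde_pdf([0.0, 0.5, 1.0, 1.5], sigma=0.3)
# A density estimate should integrate to 1; check with a Riemann sum over [-3, 5].
area = sum(f_hat(-3 + 8 * i / 4000) * (8 / 4000) for i in range(4000))
```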
The KDE Based Predictor

The KDE based learning algorithm constructs one approximate probability density function for each class of samples. Prediction is conducted based on the following likelihood function:

L_j(\mathbf{v}) = \frac{|S_j|}{|S|}\,\hat{f}_j(\mathbf{v}),

where |S_j| and |S| are the number of training samples of class j and the total number of training samples of all classes, respectively, and \hat{f}_j(\mathbf{v}) is the approximate probability density function of class-j samples.
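A sketch of the resulting predictor (one-dimensional features and a fixed kernel width for brevity; the class labels and sample values are made up, not taken from the Leukemia data):

```python
import math

def f_hat(v, samples, sigma=1.0):
    """Equal-weight Gaussian KDE for one class of 1-D samples."""
    return sum(math.exp(-(v - s) ** 2 / (2 * sigma ** 2))
               / (math.sqrt(2 * math.pi) * sigma) for s in samples) / len(samples)

def predict(v, classes):
    """Pick the class j maximising L_j(v) = (|S_j| / |S|) * f_hat_j(v)."""
    total = sum(len(s) for s in classes.values())
    return max(classes,
               key=lambda j: (len(classes[j]) / total) * f_hat(v, classes[j]))

classes = {"AML": [1.0, 1.2, 0.8], "ALL": [4.0, 4.5, 5.0, 3.8]}
```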
The Decision Function of the RVKDE Based Predictor

f(\nu) = \frac{1}{n^{+}} \sum_{i=1}^{n^{+}} \frac{1}{(\sqrt{2\pi}\,\sigma_i)^m} \exp\!\left( -\frac{\|\nu - s_i\|^2}{2\sigma_i^2} \right) - \frac{1}{n^{-}} \sum_{j=1}^{n^{-}} \frac{1}{(\sqrt{2\pi}\,\sigma_j)^m} \exp\!\left( -\frac{\|\nu - s_j\|^2}{2\sigma_j^2} \right),

where (i) s_i and s_j are the positive and negative training samples, respectively; (ii) \sigma_i (\sigma_j) is determined by the average distance among the positive (negative) samples in the proximity of s_i (s_j), respectively; (iii) m is the dimension of the feature vector.
With the KDE based predictor, each training sample is associated with a kernel function, typically with a varying width.
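One common way to realise a varying width (a sketch of the idea only; the exact RVKDE bandwidth formula is not reproduced here) is to tie each sample's width to the average distance to its k nearest neighbours, so that samples in sparse regions get wider kernels:

```python
def knn_bandwidths(samples, k=2):
    """Width for each 1-D sample: average distance to its k nearest
    neighbours, so isolated samples get wider kernels."""
    widths = []
    for i, s in enumerate(samples):
        dists = sorted(abs(s - t) for j, t in enumerate(samples) if j != i)
        widths.append(sum(dists[:k]) / k)
    return widths

widths = knn_bandwidths([0.0, 0.1, 0.2, 5.0])  # the outlier 5.0 gets the widest kernel
```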
An Example of Supervised Learning (Data Classification)

Given the data set shown on the next slide, can we figure out a set of rules that predicts the classes of samples?
Data Set

Data      Class   Data      Class   Data      Class
(15,33)   O       (18,28)   ×       (16,31)   O
(9,23)    ×       (15,35)   O       (9,32)    ×
(8,15)    ×       (17,34)   O       (11,38)   ×
(11,31)   O       (18,39)   ×       (13,34)   O
(13,37)   ×       (14,32)   O       (19,36)   ×
(18,32)   O       (25,18)   ×       (10,34)   ×
(16,38)   ×       (23,33)   ×       (15,30)   O
(12,33)   O       (21,28)   ×       (13,22)   ×
Distribution of the Data Set

[Figure: scatter plot of the data set; the "O" samples form a compact cluster (roughly x = 11-18, y = 30-35) surrounded by the scattered "×" samples.]
Rule Based on Observation

If (x - 15)^2 + (y - 30)^2 \le 25, then class "o"; else class "×".
Rule Generated by a Kernel Density Estimation Based Algorithm

Let

f_o(\mathbf{v}) = \frac{1}{10} \sum_{i=1}^{10} \frac{1}{2\pi\sigma_{o_i}^2} \exp\!\left( -\frac{\|\mathbf{v} - c_{o_i}\|^2}{2\sigma_{o_i}^2} \right)

and

f_x(\mathbf{v}) = \frac{1}{14} \sum_{j=1}^{14} \frac{1}{2\pi\sigma_{x_j}^2} \exp\!\left( -\frac{\|\mathbf{v} - c_{x_j}\|^2}{2\sigma_{x_j}^2} \right).

If f_o(\mathbf{v}) \ge f_x(\mathbf{v}), then prediction = "O". Otherwise prediction = "X".
c_{o_i}:      (15,33)  (11,31)  (18,32)  (12,33)  (15,35)  (17,34)  (14,32)  (16,31)  (13,34)  (15,30)
\sigma_{o_i}:  1.723    2.745    2.327    1.794    1.973    2.045    1.794    1.794    1.794    2.027

c_{x_j}:      (9,23)  (8,15)  (13,37)  (16,38)  (18,28)  (18,39)  (25,18)  (23,33)  (21,28)  (9,32)  (11,38)  (19,36)  (10,34)  (13,22)
\sigma_{x_j}:  6.458   10.08   2.939    2.745    5.451    3.287    10.86    5.322    5.070    4.562   3.463    3.587    3.232    6.260
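The slide's kernel tables contain everything needed to evaluate this rule directly. The following sketch reproduces f_o and f_x (2-D kernels, so the normalising factor is 1/(2πσ²)) and classifies a query point:

```python
import math

# Kernel centres and widths copied from the slide's tables.
O_KERNELS = [((15, 33), 1.723), ((11, 31), 2.745), ((18, 32), 2.327),
             ((12, 33), 1.794), ((15, 35), 1.973), ((17, 34), 2.045),
             ((14, 32), 1.794), ((16, 31), 1.794), ((13, 34), 1.794),
             ((15, 30), 2.027)]
X_KERNELS = [((9, 23), 6.458), ((8, 15), 10.08), ((13, 37), 2.939),
             ((16, 38), 2.745), ((18, 28), 5.451), ((18, 39), 3.287),
             ((25, 18), 10.86), ((23, 33), 5.322), ((21, 28), 5.070),
             ((9, 32), 4.562), ((11, 38), 3.463), ((19, 36), 3.587),
             ((10, 34), 3.232), ((13, 22), 6.260)]

def density(v, kernels):
    """(1/n) * sum of 2-D Gaussian kernels 1/(2*pi*s^2) * exp(-||v-c||^2 / (2*s^2))."""
    total = 0.0
    for (cx, cy), s in kernels:
        d2 = (v[0] - cx) ** 2 + (v[1] - cy) ** 2
        total += math.exp(-d2 / (2 * s * s)) / (2 * math.pi * s * s)
    return total / len(kernels)

def classify(v):
    return "O" if density(v, O_KERNELS) >= density(v, X_KERNELS) else "X"
```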