1/15
Strengthening I-ReGEC classifier
G. Attratto, D. Feminiano, and M.R. Guarracino
High Performance Computing and Networking Institute
Italian National Research Council
2/15
Supervised learning
• Supervised learning refers to the capability of a system to learn from a set of input/output pairs: the training set.
3/15
Classification
• Consists of determining a model that groups elements according to given features
• The groups are the classes
4/15
Evaluation of classification methods
• Accuracy: the predictive ability of the model
• Speed: some methods take less time than others
• Robustness: the learned rules and the accuracy do not change considerably across different datasets
• Scalability: the ability to classify very large datasets
5/15
Goals
• Make the choice of examples during training more efficient
• Delete examples that are redundant or contribute too little information
• Strengthen the training set by deleting obsolete knowledge
Build an efficient, scalable, and generalizable model
6/15
Classification techniques
• Decision tree: based on a tree structure (Optimal Tree)
• Bayesian networks: compute posterior probabilities with Bayes' theorem (slow in training)
• Neural networks: simulate the behavior of biological systems (slow in training)
• Support Vector Machine (SVM): computes separating hyperplanes
7/15
SVM: The state of the art
• Find a set of examples (support vectors) representative of the classes
[Figure: linear case and nonlinear case, showing the support vectors, the optimal hyperplane, and the separation margin]
8/15
ReGEC
• Two hyperplanes, each representative of one class
(GEPSVM family)
$x^\top w_1 - \gamma_1 = 0$, $x^\top w_2 - \gamma_2 = 0$
Based on the generalized eigenvalue problem
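A minimal sketch of how two class-representative hyperplanes can be used for prediction, assuming each plane is written as w·x − γ = 0 and a point is assigned to the class whose plane lies closest. The names `plane_distance`, `classify`, and the toy planes are illustrative assumptions, not taken from the slides; computing the planes themselves (the generalized eigenvalue step) is not shown.

```python
import math

def plane_distance(x, w, gamma):
    """Euclidean distance from point x to the hyperplane w.x - gamma = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot - gamma) / norm

def classify(x, planes):
    """Return the label whose hyperplane lies closest to x.

    planes maps a class label to a (w, gamma) pair (hypothetical layout).
    """
    return min(planes, key=lambda label: plane_distance(x, *planes[label]))

# Toy planes (assumed values): x1 = 0 for class "A", x1 = 4 for class "B".
planes = {"A": ([1.0, 0.0], 0.0), "B": ([1.0, 0.0], 4.0)}
print(classify([1.0, 2.0], planes))
```

Unlike an SVM, which separates the classes with a single hyperplane, this rule only needs the distance to each plane, so it extends naturally to one plane per class.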
9/15
I-ReGEC
• Select k points for each class with a clustering technique (k-means): |S| = 2 × k
• Classify the test set with the S points
• Add the misclassified points to the set S incrementally
• Proceed until no misclassified points remain
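The incremental loop above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a 1-nearest-neighbour rule stands in for the ReGEC classifier, the seed for each cluster is the training point nearest its k-means centroid, the points classified at each step are the training points not yet in S, and one misclassified point is added per pass. All function names are hypothetical.

```python
import random

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # Keep the old centre when a cluster ends up empty.
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers

def predict(x, S):
    """Stand-in classifier (1-nearest neighbour on S); NOT ReGEC."""
    return min(S, key=lambda s: sq_dist(x, s[0]))[1]

def incremental_select(X, y, k, seed=0):
    """Return indices of the selected points S."""
    chosen = []
    # Initial S: k points per class, taken from k-means on that class.
    for label in sorted(set(y)):
        idx = [i for i, yi in enumerate(y) if yi == label]
        pts = [X[i] for i in idx]
        for c in kmeans(pts, min(k, len(pts)), seed=seed):
            best = min(idx, key=lambda i: sq_dist(X[i], c))
            if best not in chosen:
                chosen.append(best)
    # Add misclassified points one at a time until none remain.
    while True:
        S = [(X[i], y[i]) for i in chosen]
        wrong = [i for i in range(len(X))
                 if i not in chosen and predict(X[i], S) != y[i]]
        if not wrong:
            return chosen
        chosen.append(wrong[0])

X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(incremental_select(X, y, k=1))
```

The loop always terminates, because each pass either finds no misclassified point or moves one more index into S, and S cannot grow beyond the training set.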
10/15
Strengthening
• Apply I-ReGEC to obtain the training set
• At each iteration, delete one point from the training set
• Re-apply I-ReGEC at each iteration with the new input set S
• Strengthen the set (save the new S) if the accuracy improves
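One possible reading of the steps above, written as a greedy loop. The slides do not fix the order in which points are tried or how accuracy is measured, so both are assumptions here: points are tried in list order, `select` is a hypothetical callable standing in for I-ReGEC, and `accuracy` is a hypothetical callable that scores a selected set on held-out data.

```python
def strengthen(train, select, accuracy):
    """Greedy strengthening sketch.

    train    -- list of training examples
    select   -- callable: training list -> selected set S (stand-in for I-ReGEC)
    accuracy -- callable: S -> float (e.g. accuracy on held-out data)
    """
    best_train = list(train)
    best_S = select(best_train)
    best_acc = accuracy(best_S)
    i = 0
    while i < len(best_train) and len(best_train) > 1:
        # Tentatively delete the i-th point and re-run the selection.
        candidate = best_train[:i] + best_train[i + 1:]
        S = select(candidate)
        acc = accuracy(S)
        if acc > best_acc:
            # Strict improvement: keep the deletion and save the new S.
            best_train, best_S, best_acc = candidate, S, acc
        else:
            # No improvement: put the point back and move on.
            i += 1
    return best_S, best_acc

# Toy usage (assumed data): negative values play the role of harmful examples.
toy_train = [1, -1, 2, -2, 3]
toy_select = list
toy_accuracy = lambda S: sum(1 for v in S if v > 0) / len(S)
print(strengthen(toy_train, toy_select, toy_accuracy))
```

Each pass re-runs the selection once per candidate deletion, which is why the cost grows with the size of the training set.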
11/15
Microarray and matrix
[Figure: gene expression matrix with axes labeled EXAMPLES, FEATURES, and CLASSES]
12/15
Results
DATASET                          ACC. I-ReGEC   No. of points   ACC. Strengthening   No. of points
Alon (62x2000), colon cancer     73.00%         7.78            74.60%               7.78
Golub (72x7129), leukaemia       87.12%         9.44            89.88%               9.44
Nutt (50x12625), glioma          65.20%         7.47            65.20%               7.47
BRCA1 (22x3226), breast cancer   67.50%         4.24            67.50%               4.24
BRCA2 (22x3226), breast cancer   78.50%         5.53            79.50%               5.96
13/15
Results and Diagrams
[Figures: Golub dataset in 2D and in 3D, I-ReGEC vs. Strengthening]
14/15
Conclusions
• The choice of examples has become more efficient
• Redundant or obsolete examples have been deleted
• The training set has been “strengthened”
15/15
Future work
• To optimize the execution time, the Strengthening technique should be integrated into I-ReGEC.