Evaluation of Semi-supervised Learning for Classification of Protein Crystallization Imagery
Madhav Sigdel, İmren Dinç, Semih Dinç, Madhu S. Sigdel, Marc Pusey, PhD, Ramazan Aygun, PhD
Email: [email protected]
Research Lab, Computer Science Department, University of Alabama in Huntsville
IEEE SoutheastCon 2014, Lexington, KY
Outline
- Background
- Motivation
- Semi-supervised Classification
  - Self-Training
  - Yet Another Two-Staged Idea (YATSI)
- Overview of Features
- Experimental Results
- Conclusion
Background
Sample protein crystallization trial images: non-crystals, likely-leads, crystals
Model of robotic system to collect images
Image Categories
- Non-crystals: images without crystals (clear drops, precipitates)
- Likely-leads: images with micro-crystals or high-intensity regions without clear shapes
- Crystals: images with different shapes of crystals (needles, plates, 3D crystals)
Related Work
- Protein crystallization classification has been attempted with a variety of algorithms, such as support vector machines (SVMs), decision trees, and neural networks
- Combination of multiple classifiers (Saitoh et al. 08)
- Trend toward larger training sets to improve classification performance: 79,632 images (Po & Laine 08); 165,351 images (Cumbaa et al. 10)
Motivation
- Expert labeling is very difficult and time-consuming
- Can we build a reliable classification system using a limited number of labeled images?
- Semi-supervised learning
Semi-supervised Classification
- Combines labeled and unlabeled data to improve the learning model
- Examples: self-training, Yet-Another Two-Staged Idea (YATSI), Laplacian SVM, transductive SVM, etc.
- Used for applications such as text classification, spam email detection, and software fault detection
Self-Training
Let L be the set of labeled data and U the set of unlabeled data.
Repeat:
  1. Train a classifier h on the training data L
  2. Classify the data in U with h
  3. Find the subset U' of U with the most confident predictions
  4. L ← L ∪ U'; U ← U − U'
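The loop above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: a nearest-centroid classifier stands in for h, and the confidence score (normalized inverse distance to the class centroids) is an assumption made for the sketch.

```python
import numpy as np

def self_train(X_lab, y_lab, X_unl, conf=0.8, max_iter=10):
    """Self-training sketch: repeatedly fit a simple nearest-centroid
    classifier on L, move confidently predicted points from U into L,
    and stop when no prediction clears the confidence level."""
    X_lab, y_lab, X_unl = X_lab.copy(), y_lab.copy(), X_unl.copy()
    for _ in range(max_iter):
        if len(X_unl) == 0:
            break
        # Train h: one centroid per class (stand-in for the real classifier)
        classes = np.unique(y_lab)
        cents = np.array([X_lab[y_lab == c].mean(axis=0) for c in classes])
        # Classify U with h; confidence = normalized inverse distance
        d = np.linalg.norm(X_unl[:, None, :] - cents[None, :, :], axis=2)
        inv = 1.0 / (d + 1e-9)
        prob = inv / inv.sum(axis=1, keepdims=True)
        pred = classes[prob.argmax(axis=1)]
        confident = prob.max(axis=1) >= conf
        if not confident.any():
            break
        # L <- L u U'; U <- U - U'
        X_lab = np.vstack([X_lab, X_unl[confident]])
        y_lab = np.concatenate([y_lab, pred[confident]])
        X_unl = X_unl[~confident]
    return X_lab, y_lab, X_unl
```

With two well-separated clusters and one labeled point each, the unlabeled points near the clusters are absorbed into L within a couple of iterations.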
Yet-Another Two-Staged Idea (YATSI)
Uses a supervised classification algorithm and a nearest-neighbor algorithm in two stages.
First stage:
  - Generate a prediction model M using the labeled data L
  - Find predictions for the unlabeled data U using M → U' (pre-labeled data)
  - Combine the original labeled data and the pre-labeled data (L + U')
Second stage:
  - Apply k-nearest neighbor on L + U' to determine the final predictions for the unlabeled instances
Driessens et al. 06
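A minimal sketch of the two stages, under stated assumptions: a nearest-centroid classifier stands in for the stage-1 supervised learner, and pre-labeled instances are down-weighted by F·|L|/|U| in the stage-2 k-NN vote, which follows the weighting scheme described by Driessens et al.

```python
import numpy as np

def yatsi(X_lab, y_lab, X_unl, k=3, F=1.0):
    """YATSI sketch. Stage 1: pre-label U with a model trained on L.
    Stage 2: weighted k-NN over L + pre-labeled U decides final labels."""
    # Stage 1: nearest-centroid stand-in for the supervised prediction model
    classes = np.unique(y_lab)
    cents = np.array([X_lab[y_lab == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(X_unl[:, None, :] - cents[None, :, :], axis=2)
    pre = classes[d.argmin(axis=1)]
    # Pool L and pre-labeled U; pre-labeled points get weight F*|L|/|U|
    X_all = np.vstack([X_lab, X_unl])
    y_all = np.concatenate([y_lab, pre])
    w_all = np.concatenate([np.ones(len(X_lab)),
                            np.full(len(X_unl), F * len(X_lab) / len(X_unl))])
    # Stage 2: weighted k-NN vote for each unlabeled instance
    final = np.empty(len(X_unl), dtype=y_lab.dtype)
    for i, x in enumerate(X_unl):
        dist = np.linalg.norm(X_all - x, axis=1)
        dist[len(X_lab) + i] = np.inf          # exclude the point itself
        nn = np.argsort(dist)[:k]
        votes = {c: w_all[nn][y_all[nn] == c].sum() for c in classes}
        final[i] = max(votes, key=votes.get)
    return final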
[Figure: k-nearest-neighbor classification of a query point; with K = 1 it is labeled O, with K = 3 it is labeled X]
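The K = 1 versus K = 3 flip illustrated on the slide can be reproduced with a plain k-NN vote. The data points below are made up for the demonstration: the single nearest neighbor of the query is an O, but two of its three nearest neighbors are X.

```python
import numpy as np

def knn_label(X, y, q, k):
    """Plain k-nearest-neighbor majority vote for a single query point q."""
    idx = np.argsort(np.linalg.norm(X - q, axis=1))[:k]
    vals, counts = np.unique(y[idx], return_counts=True)
    return vals[counts.argmax()]

# Query at the origin: nearest point is an O, next two are X
X = np.array([[1.0, 0.0], [0.0, 1.5], [-1.6, 0.0]])
y = np.array(['O', 'X', 'X'])
q = np.array([0.0, 0.0])
```

Here `knn_label(X, y, q, 1)` gives 'O' while `knn_label(X, y, q, 3)` gives 'X', matching the slide.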
[Slide figures: YATSI algorithm illustration. Stage 1: the prediction model pre-labels the unlabeled points. Stage 2: k-NN over the combined data assigns the final labels.]
Overview of Features
- 3 thresholding techniques
- 6 intensity features
- 9 blob features
- 3 × (6 + 9) = 45-dimensional feature vector
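The 45-dimension layout can be sketched as follows. Only the structure (3 thresholdings × (6 intensity + 9 blob) features) comes from the slide; the specific thresholding rules and statistics below are hypothetical stand-ins, not the paper's actual features.

```python
import numpy as np

def extract_features(img):
    """Hypothetical sketch of the 45-dim feature vector:
    3 thresholding techniques x (6 intensity + 9 blob-like) features."""
    feats = []
    # Three stand-in "thresholding techniques": half-max, mean, 90th percentile
    for t in (0.5 * img.max(), img.mean(), np.percentile(img, 90)):
        mask = img >= t
        fg = img[mask] if mask.any() else np.zeros(1)
        # 6 intensity features of the thresholded foreground
        feats += [fg.mean(), fg.std(), fg.min(), fg.max(),
                  np.median(fg), fg.sum()]
        # 9 blob-like features of the binary mask (area, centroid, spread, extent)
        ys, xs = np.nonzero(mask) if mask.any() else (np.zeros(1, int),) * 2
        feats += [mask.sum(), ys.mean(), xs.mean(),
                  ys.std(), xs.std(),
                  ys.max() - ys.min(), xs.max() - xs.min(),
                  mask.mean(), float(mask.any())]
    return np.array(feats)  # 3 * (6 + 9) = 45 values
```

Concatenating the per-threshold blocks in a fixed order is what yields the 45-dimensional vector per image.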
Dataset
2,250 images: non-crystals, likely-leads, crystals
2-class problem: 67% non-crystals, 33% likely crystals (crystals + likely-leads)
3-class problem: 67% non-crystals, 18% likely-leads, 15% crystals
Experiments - Self-Training
- 2 supervised classifiers: Naïve Bayes (NB), Sequential Minimal Optimization (SMO)
- Confidence level (c) for the first prediction: c = 0.8, 0.9, 0.95
- Training sizes: 1%, 2%, 5%, 10%, 20%
[Self-training results figures]
Experiments - YATSI
- 5 supervised classifiers: Naïve Bayes (NB), Sequential Minimal Optimization (SMO), Decision Tree (J48), Multilayer Perceptron (MLP), Random Forest (RF)
- Number of nearest neighbors: K = 10, 20, 30
- Training sizes: 1%, 2%, 5%, 10%, 20%
[YATSI results figures]
Supervised vs. YATSI
Best Classifiers Comparison
Conclusion
- Compared the performance of two semi-supervised classification techniques: self-training and YATSI
- The Naïve Bayes (NB) and SMO classifiers benefited from the self-training and YATSI approaches
- J48, multilayer perceptron (MLP), and random forest (RF) did not improve under the semi-supervised approaches
- Random forest provided the best classification performance
Future Work
Investigate active learning in combination with semi-supervised learning
Acknowledgement: National Institutes of Health grant GM090453
THANK YOU