Gist 2.3
John H. Phan
MIBLab Summer Workshop
June 28th, 2006
Overview
• Gist 2.3 Tools
– Support Vector Machine (SVM) classification
– Kernel Principal Component Analysis (KPCA)
Gist 2.3 Overview
• Gist is a set of command-line programs written in C
– Primary programs
• SVM and KPCA
– Auxiliary programs
• Ranking and feature selection
– Web interface for the SVM component
Support Vector Machines
• Supervised classification method
• Maximal margin hyperplane
http://www.dtreg.com/svm.htm
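For a linear kernel, the maximal margin hyperplane can be written as f(x) = w·x + b. The predicted class is the sign of f(x), and the gap between the hyperplanes f(x) = +1 and f(x) = −1 is 2/‖w‖. The sketch below only evaluates an already-known hyperplane. The weight vector `w` and bias `b` are made-up values, and finding the maximal-margin `w` is the job of a trainer such as gist-train-svm.

```python
import math

def decision_value(w, b, x):
    """Discriminant f(x) = w . x + b of a linear SVM."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Predicted label: +1 on the positive side of the hyperplane, else -1."""
    return 1 if decision_value(w, b, x) > 0 else -1

def margin_width(w):
    """Distance between the hyperplanes f(x) = +1 and f(x) = -1, i.e. 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Illustrative hyperplane, not the output of any real training run
w, b = [3.0, 4.0], -1.0
print(decision_value(w, b, [1.0, 0.0]))  # 2.0
print(classify(w, b, [0.0, 0.0]))        # -1
print(margin_width(w))                   # 0.4
```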
Primary Gist Programs
• gist-train-svm – train a support vector machine
• gist-classify – classify points with a trained support vector machine
• gist-fast-classify – fast classification optimized for linear kernels
• gist-kpca – kernel principal component analysis
• gist-project – project points onto KPCA components
Auxiliary Gist Programs
• gist-fselect – linear feature selection
• gist-matrix – basic matrix manipulations
• gist-score-svm – score the performance of gist-train-svm and gist-classify
• gist-rfe – recursive feature elimination
• gist-sigmoid – classification probabilities
• gist2html – convert output to HTML
• gist-kernel – create a square kernel matrix
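gist-kernel builds a square kernel matrix, and the SVM can use linear, polynomial, or radial basis kernels. A minimal sketch of such a matrix follows, assuming common textbook parameterizations. The `degree`, `coef0`, and `width` parameters are illustrative and not necessarily the ones Gist uses.

```python
import math

def linear_kernel(x, y):
    # Dot product <x, y>
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    # (<x, y> + coef0) ** degree
    return (linear_kernel(x, y) + coef0) ** degree

def rbf_kernel(x, y, width=1.0):
    # Radial basis: exp(-||x - y||^2 / (2 * width^2))
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * width ** 2))

def kernel_matrix(points, kernel):
    """Square, symmetric matrix K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]

points = [[1, 0], [0, 1], [1, 1]]
print(kernel_matrix(points, linear_kernel))  # [[1, 0, 1], [0, 1, 1], [1, 1, 2]]
```

Swapping the kernel function is all it takes to move from a linear to a nonlinear SVM, because the training and classification steps only ever see the entries of K.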
gist-train-svm
• Train a support vector machine
– Input file is tab-delimited but transposed
– Output file contains 5 columns
• Label, binary classification, SVM weights, predicted classification, discriminant value
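A minimal sketch of transposing a tab-delimited matrix, assuming the common microarray layout (genes in rows, samples in columns) has to be flipped so that each sample, the item being classified, becomes a row. The exact orientation gist-train-svm expects should be checked against its documentation. `transpose_tab_delimited` is a hypothetical helper, not part of Gist.

```python
import csv
import io

def transpose_tab_delimited(text):
    """Swap rows and columns of a tab-delimited table (labels included)."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    # zip(*rows) would silently truncate ragged input, so reject it explicitly
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all rows must have the same number of fields")
    return "".join("\t".join(col) + "\n" for col in zip(*rows))

# Hypothetical genes-by-samples input
genes_by_samples = (
    "gene\tS1\tS2\tS3\n"
    "g1\t0.5\t1.2\t-0.3\n"
    "g2\t2.0\t0.1\t0.7\n"
)
print(transpose_tab_delimited(genes_by_samples), end="")
```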
gist-fselect – Feature Selection
• Fisher criterion score
• t-test
• Welch t-test
• Mann-Whitney
• SAM (significance analysis of microarrays)
• Threshold number of misclassifications
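The Fisher criterion and Welch t-test scores can be sketched per feature as follows. This sketch assumes sample (n − 1) variances and ranks features by the absolute value of the score. gist-fselect's exact formulas and options may differ, and the function names are hypothetical.

```python
import math
from statistics import mean, variance

def fisher_score(pos, neg):
    """Fisher criterion: squared mean difference over the sum of class variances."""
    return (mean(pos) - mean(neg)) ** 2 / (variance(pos) + variance(neg))

def welch_t(pos, neg):
    """Welch's t statistic: mean difference over the unpooled standard error."""
    se = math.sqrt(variance(pos) / len(pos) + variance(neg) / len(neg))
    return (mean(pos) - mean(neg)) / se

def rank_features(pos_rows, neg_rows, score=fisher_score):
    """Return feature indices ordered from most to least discriminative by |score|."""
    n_features = len(pos_rows[0])
    scores = [score([r[j] for r in pos_rows], [r[j] for r in neg_rows])
              for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: abs(scores[j]), reverse=True)

# Rows are examples, columns are features
pos_rows = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
neg_rows = [[4.0, 1.0], [5.0, 0.0], [6.0, 1.0]]
print(rank_features(pos_rows, neg_rows))  # [0, 1]
```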
gist-score-svm
• Compute false and true positives on training and test sets
• Compute area under the ROC curve for training and test sets
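A sketch of the two quantities gist-score-svm reports, computed from labels (+1/−1) and discriminant values. The AUC uses the pairwise (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive example scores higher, with ties counted as one half. The function names and the default threshold of 0 are assumptions for illustration.

```python
def confusion_counts(labels, scores, threshold=0.0):
    """True/false positives and negatives when predicting +1 for score > threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, s in zip(labels, scores):
        if s > threshold:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["fn" if y == 1 else "tn"] += 1
    return counts

def roc_auc(labels, scores):
    """Area under the ROC curve as the Mann-Whitney pairwise win rate (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y != 1]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [1, 1, -1, -1]
scores = [2.0, -0.5, 0.3, -1.0]  # discriminant values
print(confusion_counts(labels, scores))  # {'tp': 1, 'fp': 1, 'tn': 1, 'fn': 1}
print(roc_auc(labels, scores))           # 0.75
```

The pairwise loop costs O(n_pos × n_neg), which is fine for workshop-sized data; sorting the scores once gives the same AUC more cheaply on large sets.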
gist-rfe
• Recursive feature elimination with an SVM (SVM-RFE)
– Initialize the data to contain all features
– Train an SVM on the data
– Rank features by the magnitude of their SVM weights
– Eliminate the lower-ranked 50% of features
– Repeat until 1 feature is left
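The elimination loop can be sketched with the training step passed in as a function. To keep the demo self-contained, `mean_difference_weights` is a hypothetical stand-in that returns per-feature class-mean differences rather than true SVM weights. In SVM-RFE proper you would plug in the weight vector of a trained linear SVM. Features are ranked by the absolute value of their weights.

```python
def rfe(X, y, train_weights, keep_fraction=0.5):
    """Recursive feature elimination.

    X: examples as rows; y: labels (+1/-1).
    train_weights(X_sub, y) must return one weight per column of X_sub.
    Returns feature indices ranked from most to least important.
    """
    remaining = list(range(len(X[0])))
    eliminated = []  # features in the order they were removed
    while len(remaining) > 1:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_weights(X_sub, y)
        # Positions within `remaining`, smallest |weight| first
        order = sorted(range(len(remaining)), key=lambda k: abs(w[k]))
        # Keep the top fraction, but always keep at least one and drop at least one
        n_keep = min(max(1, int(len(remaining) * keep_fraction)), len(remaining) - 1)
        n_drop = len(remaining) - n_keep
        eliminated.extend(remaining[k] for k in order[:n_drop])
        remaining = [remaining[k] for k in sorted(order[n_drop:])]
    # Last survivor first, then the eliminated features in reverse removal order
    return remaining + eliminated[::-1]

def mean_difference_weights(X, y):
    """Hypothetical stand-in for SVM training: per-feature difference of class means."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label != 1]
    return [sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg)
            for j in range(len(X[0]))]

X = [[3.0, 0.1, 1.0, 0.0],
     [2.5, 0.2, 1.0, 0.5],
     [-3.0, 0.0, 0.0, 0.0],
     [-2.0, 0.1, 0.5, 0.5]]
y = [1, 1, -1, -1]
print(rfe(X, y, mean_difference_weights))  # [0, 2, 1, 3]
```

Retraining after every cut matters: a feature's weight can change once correlated features are removed, which a single ranking pass would miss.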
Gist SVM Web Interface
• SVM training and testing
• Normalize data by mean centering or z-score
• Adjust kernel settings (linear, polynomial, or radial basis)
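Column (sample) mean centering and z-scoring, as offered by the web interface, can be sketched as follows. The sketch assumes genes in rows and samples in columns and uses the sample (n − 1) standard deviation. The web interface's exact choices may differ.

```python
from statistics import mean, stdev

def mean_center_columns(rows):
    """Subtract each column's mean (rows = genes, columns = samples)."""
    means = [mean(col) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def zscore_columns(rows):
    """Mean-center each column, then divide by its sample standard deviation."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]  # a constant column gives 0.0 and a ZeroDivisionError below
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

data = [[1.0, 10.0],
        [2.0, 20.0],
        [3.0, 30.0]]
print(mean_center_columns(data))  # [[-1.0, -10.0], [0.0, 0.0], [1.0, 10.0]]
print(zscore_columns(data))       # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```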
• Demo (http://svm.sdsc.edu/svm-intro.html)
Comparison to MAGMA
• Normalizations
– MAGMA: row (gene) mean center, row (gene) median center, column mean center, column median center, row z-score, column z-score, quantile; handles missing values
– Gist (Web): column (sample) mean center, column (sample) z-score
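Quantile normalization, one of MAGMA's options, forces every column to share the same empirical distribution. Each column is sorted, the sorted columns are averaged row by row to form a reference distribution, and each value is replaced by the reference value of the same rank. A minimal sketch, assuming genes in rows and samples in columns and breaking ties by row order. Many implementations average tied ranks instead, and MAGMA's exact variant is not described here.

```python
def quantile_normalize(rows):
    """Give every column (sample) the same distribution: the mean of the sorted columns."""
    n_rows, n_cols = len(rows), len(rows[0])
    cols = [[r[j] for r in rows] for j in range(n_cols)]
    sorted_cols = [sorted(c) for c in cols]
    # Reference distribution: average of the k-th smallest value across all columns
    ref = [sum(sc[k] for sc in sorted_cols) / n_cols for k in range(n_rows)]
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j, c in enumerate(cols):
        # Rank entries within the column; ties keep their original row order
        order = sorted(range(n_rows), key=lambda i: c[i])
        for k, i in enumerate(order):
            out[i][j] = ref[k]
    return out

expression = [[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.0, 6.0],
              [4.0, 2.0, 8.0]]
normalized = quantile_normalize(expression)
# Every column of `normalized` now holds the same sorted values
print([sorted(col) for col in zip(*normalized)])
```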
Comparison to MAGMA
• Classifiers
– MAGMA: SVM, Fisher’s Discriminant, SDF
– Gist (Web): SVM
• Data Representation
– MAGMA: visualization of classifiers, database storage
– Gist (Web): text files, HTML output
Comparison to MAGMA
• Ranking Methods
– MAGMA: resubstitution, cross-validation, bootstrap, bolstering
– Gist (Web): Fisher criterion, t-test, SAM, Mann-Whitney, Welch t-test