Participation in the NIPS 2003 Challenge
Theodor Mader
ETH Zurich, [email protected]
DATASETS
Five datasets were provided for experiments:
• ARCENE: cancer diagnosis
• DEXTER: text categorization
• DOROTHEA: drug discovery
• GISETTE: digit recognition
• MADELON: artificial data
All problems were two-class problems.
BACKGROUND
Feature selection is a vital part of any classification procedure, so it is important to understand common procedures and algorithms and to see how they perform on real-world data. The students had the chance to apply the knowledge acquired in class to real-world datasets, learning about the strengths and weaknesses of the individual approaches.
SUMMARY
In the feature extraction course of the winter semester 2005-2006, the students participated in the NIPS 2003 feature selection challenge. They experimented with different feature selection and classification methods in order to achieve high rankings.
METHODS
The students were provided with a Matlab machine learning package containing a variety of classification and feature extraction algorithms, such as support vector machines and the random probe method.
The results were ranked according to the balanced error rate (BER), i.e. the average of the error rates on each class, which gives both classes equal weight regardless of how many examples they contain. For this purpose a test set was used which was not available to the students.
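As a minimal sketch of the ranking metric (not the challenge's official scoring code), the BER can be computed by averaging the per-class error rates:

```python
import numpy as np

def balanced_error_rate(y_true, y_pred):
    """Balanced error rate: the unweighted average of per-class error rates."""
    errors = []
    for label in np.unique(y_true):
        mask = y_true == label
        errors.append(np.mean(y_pred[mask] != y_true[mask]))
    return float(np.mean(errors))

# Example: 1 of 4 positives wrong (25%), 1 of 2 negatives wrong (50%)
y_true = np.array([1, 1, 1, 1, -1, -1])
y_pred = np.array([1, 1, 1, -1, -1, 1])
print(balanced_error_rate(y_true, y_pred))  # 0.375 = (0.25 + 0.5) / 2
```

Note that the ordinary error rate here would be 2/6 ≈ 0.33; the BER is higher because the smaller class contributes its 50% error with full weight.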
RESULTS (BER in %)
[Bar chart: BER (%), 0-16 scale, comparing Our Entry, Baseline, and Winner]
DOROTHEA: drug discovery
Size: 100000 features, 800 training examples, 350 validation examples
• Our results indicate that feature selection by the TP statistic was already sufficient; the relief criterion did not give much improvement.
• We optimized the number of features in order to beat the baseline model.

Submission:
my_model=chain({TP('f_max=1400'), naive, bias})
CONCLUSIONS
Participating in the NIPS 2003 challenge was a very interesting opportunity. In most cases there was no need for very complicated classifiers: simple techniques, applied correctly and combined with powerful feature selection methods, were faster and more successful. The importance of feature selection was hence shown very clearly. Using only a fraction of the original features improved both classification results and speed significantly.
[Bar chart: DEXTER BER (%), 0-16 scale, comparing Our Entry, Baseline, and Winner]
ARCENE: cancer diagnosis
Size: 10000 features, 100 training examples, 100 validation examples
• Very small number of training examples; merging the training and validation sets improved performance significantly.
• We empirically found that the probe method performed better than the signal-to-noise ratio.
• The optimal number of features was found by cross-validation:
[Plot: cross-validation BER (%) with error bars, 0-16 scale, versus the number of features, 2000-2600]
Submission:
my_classif=svc({'coef0=1', 'degree=3', 'gamma=0', 'shrinkage=0.1'})
my_model=chain({probe(relief,{'p_num=2000', 'f_max=2400'}), normalize, my_classif})
[Bar chart: BER (%), 0-6 scale, comparing Our Entry, Baseline, and Winner]
DEXTER: text categorization
Size: 20000 features, 300 training examples, 300 validation examples
• Also a very small number of training examples, so we merged the training and validation sets.
• Initial experiments with principal component analysis showed no improvement.
• We stuck to the baseline model and optimized the shrinkage and the number of features.

Our final submission:
• Feature selection by signal-to-noise ratio (1210 features kept)
• Normalization of features
• Support vector classifier with linear kernel
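The signal-to-noise criterion used here can be sketched as follows. This is an illustrative Python version of the standard criterion |μ₊ − μ₋| / (σ₊ + σ₋), not the CLOP implementation; the synthetic data and feature index are made up for the demo:

```python
import numpy as np

def s2n_ranking(X, y):
    """Rank features by the signal-to-noise criterion
    |mu+ - mu-| / (sigma+ + sigma-), largest (most informative) first."""
    pos, neg = X[y == 1], X[y == -1]
    mu_diff = np.abs(pos.mean(axis=0) - neg.mean(axis=0))
    sigma_sum = pos.std(axis=0) + neg.std(axis=0)
    scores = mu_diff / (sigma_sum + 1e-12)  # guard against zero variance
    return np.argsort(scores)[::-1]

# Synthetic demo: 50 features, only feature 3 carries class information
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))
y = np.where(rng.random(300) < 0.5, 1, -1)
X[y == 1, 3] += 2.0                 # shift feature 3 for the positive class
top = s2n_ranking(X, y)[:10]
print(3 in top)                     # the informative feature ranks near the top
```

Keeping only the top-k indices of this ranking (k = 1210 in the submission above) discards the uninformative features before training.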
GISETTE: digit recognition
Size: 5000 features, 6000 training examples, 1000 validation examples
• Garbage features were easily removed by feature selection according to the signal-to-noise ratio.
• Applying a Gaussian blur to the input images brought significant improvement.
• Shrinkage further improved the results.
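The Gaussian blur step above can be sketched as a separable 2-D convolution. This is an illustrative Python version, not the CLOP `convolve` module; the 28x28 image shape is a hypothetical choice for the demo, not necessarily how GISETTE features map back to pixels:

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """1-D Gaussian kernel, normalized to sum to 1."""
    x = np.arange(size) - (size - 1) / 2
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(image, size=5, sigma=1.0):
    """Blur a 2-D image by convolving each row, then each column
    (a separable approximation of the 2-D Gaussian filter)."""
    k = gaussian_kernel(size, sigma)
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, image)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, tmp)

# Demo: a hypothetical 28x28 image with a bright square in the interior
img = np.zeros((28, 28))
img[10:18, 10:18] = 1.0
smoothed = blur(img)
```

Because the kernel is normalized, total intensity away from the borders is preserved; the blur simply spreads sharp edges over neighboring pixels, which makes the digit classes less sensitive to small pen-position shifts.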
[Bar chart: BER (%), 0-2 scale, comparing Our Entry, Baseline, and Winner]
Submission:
my_classif=svc({'coef0=1', 'degree=4', 'gamma=0', 'shrinkage=0.5'});
my_model=chain({convolve('dim1=5','dim2=5'), normalize, my_classif})
MADELON: artificial data
Size: 500 features, 2000 training examples, 600 validation examples
• The relief method performed extremely well: with only 20 features, the classification error could be reduced to below 7%.
• Shrinkage proved to be nearly as important as the kernel width for the Gaussian kernel.
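The relief criterion used above can be sketched as follows. This is an illustrative Python version of basic Relief (reward features that separate a sample from its nearest miss more than from its nearest hit), not the CLOP implementation; the synthetic data is made up for the demo:

```python
import numpy as np

def relief_weights(X, y, n_iter=100, rng=None):
    """Basic Relief: for random samples, find the nearest same-class point
    (hit) and nearest other-class point (miss), and reward features whose
    values differ more toward the miss than toward the hit."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        dist = np.abs(X - X[i]).sum(axis=1)     # L1 distances to sample i
        dist[i] = np.inf                        # exclude the sample itself
        same, diff = y == y[i], y != y[i]
        hit = np.where(same & (dist == dist[same].min()))[0][0]
        miss = np.where(diff & (dist == dist[diff].min()))[0][0]
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter

# Synthetic demo: 10 features, only feature 0 tracks the label
rng = np.random.default_rng(1)
y = np.where(rng.random(200) < 0.5, 1, -1)
X = rng.normal(size=(200, 10))
X[:, 0] = y + 0.1 * rng.normal(size=200)
w = relief_weights(X, y)
print(w.argmax())                               # feature 0 gets the top weight
```

Keeping only the highest-weighted features (f_max=20 in the submission below) is what reduced MADELON's error so sharply.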
[Bar chart: BER (%), 6.2-7.4 scale, comparing Our Entry, Baseline, and Winner]
Submission:
my_classif=svc({'coef0=1', 'degree=0', 'gamma=0.5', 'shrinkage=0.3'})
my_model=chain({probe(relief,{'f_max=20', 'pval_max=0'}), standardize, my_classif})