CLOP: A MATLAB® learning object package

http://clopinet.com/CLOP/
support@clopinet.com

What is CLOP?

CLOP stands for Challenge Learning Object Package.

(It was developed for use in machine learning challenges with hundreds of thousands of features and/or examples.)

CLOP is an object-oriented MATLAB package that uses the “Spider” interface.

DATA OBJECTS

data(X, Y)

% Load data:
X=load([data_dir 'gisette_train.data']);
Y=load([data_dir 'gisette_train.labels']);

% Create a data object and examine it:
dat=data(X, Y);
browse(dat, 2);

ALGORITHM OBJECTS

algo(hyperparam)

% Create data objects:
trainD=data(X,Y);
testD=data(Xt,Yt);

% Define some hyperparameters:
hyper = {'degree=3', 'shrinkage=0.1'};

% Create a kernel ridge regression model:
model = kridge(hyper);

% Train it and test it:
[resu, Model] = train(model, trainD);
tresu = test(Model, testD);

% Visualize the results:
roc(tresu);
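To give a feel for what a kernel ridge regression model computes, here is a minimal sketch in plain MATLAB that does not use CLOP. It is not the kridge implementation: the kernel form (1 + x*z')^degree, the reading of shrinkage as the ridge term added to the kernel diagonal, and the toy data are all assumptions made for illustration.

```matlab
% Toy two-class problem (hypothetical data), labels coded +1 / -1.
X = [0.0 0.1; 0.2 0.0; 0.1 0.3; 1.0 0.9; 0.8 1.1; 1.2 1.0];
Y = [-1; -1; -1; 1; 1; 1];

degree = 3;        % mirrors 'degree=3'
shrinkage = 0.1;   % mirrors 'shrinkage=0.1'; assumed to act as the ridge term

% Assumed polynomial kernel: k(x, z) = (1 + x*z')^degree
K = (1 + X * X') .^ degree;

% Kernel ridge regression: solve (K + shrinkage*I) * alpha = Y
n = size(X, 1);
alpha = (K + shrinkage * eye(n)) \ Y;

% Score new points with the same kernel; the sign gives the predicted class
Xt = [0.1 0.1; 1.0 1.0];
Kt = (1 + Xt * X') .^ degree;
scores = Kt * alpha;
labels = sign(scores);
```

Training reduces to one linear solve of size n-by-n, which is why the ridge term matters: it keeps the system well conditioned when the kernel matrix is close to singular.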

COMPOUND MODELS

Preprocessing

% For example, create a smoothing kernel:
my_ker=gauss_ker({'dim1=11', 'dim2=11', 'sigma1=2', 'sigma2=2'});
show(my_ker);

% Create a preprocessing object of type convolve:
my_prepro=convolve(my_ker);

% Perform the preprocessing and visualize the results:
d=train(my_prepro, dat);
browse(d, 2);
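The smoothing step can be pictured without CLOP. The sketch below builds an 11-by-11 Gaussian kernel with sigma = 2 in both directions and convolves it with an image using the built-in conv2. It assumes the kernel is centered, separable, and normalized to sum to 1, which may differ from what gauss_ker actually does; the 28-by-28 test image is hypothetical.

```matlab
% Kernel size and widths, mirroring 'dim1=11', 'dim2=11', 'sigma1=2', 'sigma2=2'.
dim1 = 11; dim2 = 11; sigma1 = 2; sigma2 = 2;

% Coordinates centered on the middle of the kernel: -5 ... 5
x1 = (1:dim1) - (dim1 + 1) / 2;
x2 = (1:dim2) - (dim2 + 1) / 2;

% Separable Gaussian: outer product of two 1-D profiles (assumed form)
g1 = exp(-x1.^2 / (2 * sigma1^2));
g2 = exp(-x2.^2 / (2 * sigma2^2));
ker = g1' * g2;
ker = ker / sum(ker(:));              % normalize so the kernel sums to 1

% Hypothetical 28x28 image with a single bright pixel in the middle
img = zeros(28, 28);
img(14, 14) = 1;

% 'same' keeps the output the same size as the input image
smoothed = conv2(img, ker, 'same');
```

Convolving a single bright pixel reproduces the kernel itself around that pixel, which is a quick way to check that the kernel is centered as intended.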

chain({model1, model2,…})

% Combine preprocessing and kernel ridge regression:
model = chain({my_prepro, kridge(hyper)});

ensemble({model1, model2,…})

% Combine replicas of a base learner:
for k=1:10
    base_model{k}=chain({my_prepro, naive});
end
my_model=ensemble(base_model);
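The idea behind a chain is that each stage receives the output of the previous one. The plain-MATLAB sketch below illustrates only that data flow with function handles. It is a simplification, not how CLOP implements chain: real stages are objects that are trained before being applied, and the stage names and the stand-in classifier here are hypothetical.

```matlab
% Stage 1 (stand-in for a preprocessing object): standardize each column.
center = @(X) X - repmat(mean(X, 1), size(X, 1), 1);
scale  = @(X) X ./ repmat(std(X, 0, 1), size(X, 1), 1);
standardize_stage = @(X) scale(center(X));

% Stage 2 (hypothetical stand-in classifier): sign of the row sums.
classify_stage = @(X) sign(sum(X, 2));

% A "chain" is an ordered list of stages.
stages = {standardize_stage, classify_stage};

% Run the data through the chain: each output feeds the next stage.
X = [1 10; 2 20; 3 30; 4 40];
out = X;
for i = 1:numel(stages)
    out = stages{i}(out);
end
disp(out');
```

Because the stages only agree on "matrix in, matrix out", any stage can be swapped without touching the others, which is the property the compound models rely on.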

BASIC METHODS

train(model, trainD)

% After creating your complex model, just one command: train
model=ensemble({chain({standardize,kridge(hyper)}),chain({normalize,naive})});
[resu, Model] = train(model, trainD);

test(Model, testD)

% After training your complex model, just one command: test
tresu = test(Model, testD);

% You can chain with a “cv” object to perform cross-validation:
cv_model=cv(my_model);
% Just call train and test on it!
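For readers new to cross-validation, the following plain-MATLAB sketch shows the bookkeeping a k-fold scheme performs: every example is held out exactly once while the remaining examples are used for training. It does not use or reproduce the CLOP cv object; the number of examples, the number of folds, and the round-robin fold assignment are assumptions chosen for illustration.

```matlab
n = 10;    % number of training examples (hypothetical)
k = 5;     % number of folds (hypothetical)

% Assign examples to folds 1..k in round-robin order
fold_id = mod((1:n) - 1, k) + 1;

counts = zeros(1, n);   % how many times each example is held out
for f = 1:k
    test_idx  = find(fold_id == f);
    train_idx = find(fold_id ~= f);
    % A model would be trained on train_idx and tested on test_idx here.
    counts(test_idx) = counts(test_idx) + 1;
    fprintf('fold %d: %d train, %d test\n', f, numel(train_idx), numel(test_idx));
end
```

Averaging the per-fold test errors gives an estimate of generalization error that uses every example for testing without ever testing on data the model was trained on.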

BASIC OBJECTS

Some CLOP objects

Basic learning machines

Feature selection, pre- and post-processing

Compound models

BENCHMARKS

MADELON Best BER=6.22 ± 0.57% - n0=20 (4%) – BER0=7.33%

my_classif=svc({'coef0=1', 'degree=0', 'gamma=1', 'shrinkage=1'});

my_model=chain({probe(relief,{'p_num=2000', 'pval_max=0'}), standardize, my_classif})

DOROTHEA Best BER=8.54 ± 0.99% - n0=1000 (1%) – BER0=12.37%

my_model=chain({TP('f_max=1000'), naive, bias});

Competitive baseline methods set new standards for the NIPS 2003 feature selection benchmark, Isabelle Guyon, Jiwen Li, Theodor Mader, Patrick A. Pletscher, Georg Schneider and Markus Uhr, Pattern Recognition Letters, Volume 28, Issue 12, 1 September 2007, Pages 1438-1444.

Dataset   Size     Type            Features  Training examples  Validation examples  Test examples
Arcene    8.7 MB   Dense           10000     100                100                  700
Gisette   22.5 MB  Dense           5000      6000               1000                 6500
Dexter    0.9 MB   Sparse integer  20000     300                300                  2000
Dorothea  4.7 MB   Sparse binary   100000    800                350                  800
Madelon   2.9 MB   Dense           500       2000               600                  1800

Class taught at ETH, Zurich, winter 2005

Task of the students:
• Baseline method provided, BER0 performance and n0 features.
• Get BER<BER0 or BER=BER0 but n<n0.
• Extra credit for beating best challenge entry.
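The BER values quoted on this slide are balanced error rates: the average of the error rate on the positive class and the error rate on the negative class. A minimal plain-MATLAB sketch, assuming labels and predictions are coded as +1 and -1 and using made-up toy vectors:

```matlab
% Hypothetical true labels and predictions, coded +1 / -1.
y_true = [ 1  1  1  1 -1 -1 -1 -1 -1 -1];
y_pred = [ 1  1  1 -1 -1 -1 -1  1  1 -1];

pos = (y_true == 1);
neg = (y_true == -1);

% Per-class error rates
err_pos = sum(y_pred(pos) ~= 1)  / sum(pos);   % fraction of positives misclassified
err_neg = sum(y_pred(neg) ~= -1) / sum(neg);   % fraction of negatives misclassified

% Balanced error rate: mean of the two per-class error rates
ber = 0.5 * (err_pos + err_neg);
fprintf('BER = %.2f%%\n', 100 * ber);
```

Unlike the plain error rate, the BER weights both classes equally, so a classifier cannot score well on an unbalanced dataset simply by always predicting the majority class.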

[Figure: example panels for the five datasets: GISETTE, DOROTHEA, DEXTER, MADELON, ARCENE. The text sample shown in the figure begins: “NEW YORK, October 2, 2001 – Instinet Group Incorporated (Nasdaq: INET), the world’s largest electronic agency securities broker, today announced tha…”]

DEXTER Best BER=3.30 ± 0.40% - n0=300 (1.5%) – BER0=5%

my_classif=svc({'coef0=1', 'degree=1', 'gamma=0', 'shrinkage=0.5'});

my_model=chain({s2n('f_max=300'), normalize, my_classif})

GISETTE Best BER=1.26 ± 0.14% - n0=1000 (20%) – BER0=1.80%

my_classif=svc({'coef0=1', 'degree=3', 'gamma=0', 'shrinkage=1'});

my_model=chain({normalize, s2n('f_max=1000'), my_classif});

ARCENE Best BER=11.9 ± 1.2% - n0=1100 (11%) – BER0=14.7%

my_svc=svc({'coef0=1', 'degree=3', 'gamma=0', 'shrinkage=0.1'});

my_model=chain({standardize, s2n('f_max=1100'), normalize, my_svc})

NIPS 2003 Feature Selection Challenge

NIPS 2006 Model Selection Game

Dataset  CLOP models selected
ADA      2*{sns,std,norm,gentleboost(neural),bias}; 2*{std,norm,gentleboost(kridge),bias}; 1*{rf,bias}
GINA     6*{std,gs,svc(degree=1)}; 3*{std,svc(degree=2)}
HIVA     3*{norm,svc(degree=1),bias}
NOVA     5*{norm,gentleboost(kridge),bias}
SYLVA    4*{std,norm,gentleboost(neural),bias}; 4*{std,neural}; 1*{rf,bias}

First place: Juha Reunanen, cross-indexing-7

sns = shift’n’scale, std = standardize, norm = normalize (some details of hyperparameters not shown)

Dataset  CLOP models selected
ADA      {sns, std, norm, neural(units=5), bias}
GINA     {norm, svc(degree=5, shrinkage=0.01), bias}
HIVA     {std, norm, gentleboost(kridge), bias}
NOVA     {norm, gentleboost(neural), bias}
SYLVA    {std, norm, neural(units=1), bias}

Second place: Hugo Jair Escalante Balderas, BRun2311062

sns = shift’n’scale, std = standardize, norm = normalize (some details of hyperparameters not shown)

Note: entry Boosting_1_001_x900 gave better results, but was older.

[Figure: example panels for the five datasets: NOVA, GINA, HIVA, ADA, SYLVA. The text sample shown in the figure reads:]

Subject: Re: Goalie masks
Lines: 21

Tom Barrasso wore a great mask, one time, last season. It was all black, with Pgh city scenes on it. The "Golden Triangle" graced the top, along with a steel mill on one side and the Civic Arena on the other. On the back of the helmet was the old Pens' logo the current (at the time) Pens logo, and a space for the "new" logo.

Lori

Dataset  Domain               Features  Training examples  Validation examples  Test examples
ADA      Marketing            48        4147               415                  41471
GINA     Digit recognition    970       3153               315                  31532
HIVA     Drug discovery       1617      3845               384                  38449
NOVA     Text classification  16969     1754               175                  17537
SYLVA    Ecology              216       13086              1309                 130857

Proc. IJCNN07, Orlando, FL, Aug. 2007:

PSMS for Neural Networks, H. Jair Escalante, Manuel Montes y Gómez, and Luis Enrique Sucar

Model Selection and Assessment Using Cross-indexing, Juha Reunanen

Credits

The Challenge Learning Object Package (CLOP) is based on code to which many people have contributed:
- The developers of CLOP: Isabelle Guyon and Amir Reza Saffari Azar.
- The creators of The Spider: Jason Weston, André Elisseeff, Gökhan Bakır, Fabian Sinz.
- The developers of the packages attached to CLOP: Olivier Chapelle, Hugo Jair Escalante Balderas (PSMS), Gavin Cawley (LSSVM), Chih-Chung Chang and Chih-Jen Lin (LIBSVM), Jun-Cheng Chen, Kuan-Jen Peng, Chih-Yuan Yan, Chih-Huai Cheng, and Rong-En Fan (LIBSVM Matlab interface), Junshui Ma and Yi Zhao (second LIBSVM Matlab interface), Leo Breiman and Adele Cutler (Random Forests), Ting Wang (RF Matlab interface), Ian Nabney and Christopher Bishop (NETLAB).
- The contributors to other Spider functions or packages: Thorsten Joachims (SVMLight), Chih-Chung Chang and Chih-Jen Lin (LIBSVM), Ronan Collobert (SVM Torch II), Jez Hill, Jan Eichhorn, Rodrigo Fernandez, Holger Froehlich, Gorden Jemwa, Kiyoung Yang, Chirag Patel, Sergio Rojas.
- The authors of the Weka package and the R project, who made code available that was interfaced to Matlab and made accessible to CLOP.

Book with CLOP and datasets

Feature Extraction, Foundations and Applications, Isabelle Guyon, Steve Gunn, et al, Eds.  Springer, 2006 http://clopinet.com/fextract-book/

• CD including CLOP and the data of the NIPS2003 challenge
• Tutorial chapters
• Invited papers on the best results of the challenge