Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification...
Transcript of Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification...
Sparse Wavelet Representations andClassification
Challenge Kaggle in class
DREEM
March 5th 2015
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 1/10
Introduction
Introduction
I Context: challenge kaggle in class proposed by thestart-up Dreem. Binary classification problem.
I Data: EEG signals, 3 sec sampled at 200 Hz on 20subjects.
I Labels: 1 if a Slow Oscillation occurs during the next0.5 sec, 0 else.
I Skewed data set: only 7% of positive exs.I AUC metric.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 2/10
Introduction
Introduction
I Context: challenge kaggle in class proposed by thestart-up Dreem. Binary classification problem.
I Data: EEG signals, 3 sec sampled at 200 Hz on 20subjects.
I Labels: 1 if a Slow Oscillation occurs during the next0.5 sec, 0 else.
I Skewed data set: only 7% of positive exs.I AUC metric.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 2/10
Introduction
Introduction
I Context: challenge kaggle in class proposed by thestart-up Dreem. Binary classification problem.
I Data: EEG signals, 3 sec sampled at 200 Hz on 20subjects.
I Labels: 1 if a Slow Oscillation occurs during the next0.5 sec, 0 else.
I Skewed data set: only 7% of positive exs.I AUC metric.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 2/10
Introduction
Introduction
I Context: challenge kaggle in class proposed by thestart-up Dreem. Binary classification problem.
I Data: EEG signals, 3 sec sampled at 200 Hz on 20subjects.
I Labels: 1 if a Slow Oscillation occurs during the next0.5 sec, 0 else.
I Skewed data set: only 7% of positive exs.
I AUC metric.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 2/10
Introduction
Introduction
I Context: challenge kaggle in class proposed by thestart-up Dreem. Binary classification problem.
I Data: EEG signals, 3 sec sampled at 200 Hz on 20subjects.
I Labels: 1 if a Slow Oscillation occurs during the next0.5 sec, 0 else.
I Skewed data set: only 7% of positive exs.I AUC metric.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 2/10
Introduction
The data
The AUC metric
Classification algorithms we triedScatNet + AUC gradient descent (Herschtal, Raskutti,ICML 2004ScatNet + Mixture of Probabilistic PrincipalComponent Analysis (MPPCA) (Tipping, Bishop,1999)Neural NetworksIdea (cheat ?) use subjects IDs
Results and Conclusion
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 3/10
The data
The data
I Times series
I Proportion of each subject in the training setI Positive rate for each subject in the training set
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 4/10
The data
The dataI Times seriesI Proportion of each subject in the training set
I Positive rate for each subject in the training set
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 4/10
The data
The dataI Times seriesI Proportion of each subject in the training setI Positive rate for each subject in the training set
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 4/10
The AUC metric
The AUC metric
I Area under ROC curve
I Estimator (Wilcoxon-Man-Whitney) :P(f (x+) > f (x−))
I Adapted to skewed data
false positive rate0 0.2 0.4 0.6 0.8 1
tru
e p
os
itiv
e r
ate
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1ROC curve - subj. 3 - (test on 20% of train)
AUC GD - 0.8854MPPCA - 0.8480NN - 0.8318
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 5/10
The AUC metric
The AUC metric
I Area under ROC curveI Estimator (Wilcoxon-
Man-Whitney) :P(f (x+) > f (x−))
I Adapted to skewed data
false positive rate0 0.2 0.4 0.6 0.8 1
tru
e p
os
itiv
e r
ate
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1ROC curve - subj. 3 - (test on 20% of train)
AUC GD - 0.8854MPPCA - 0.8480NN - 0.8318
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 5/10
The AUC metric
The AUC metric
I Area under ROC curveI Estimator (Wilcoxon-
Man-Whitney) :P(f (x+) > f (x−))
I Adapted to skewed data
false positive rate0 0.2 0.4 0.6 0.8 1
tru
e p
os
itiv
e r
ate
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1ROC curve - subj. 3 - (test on 20% of train)
AUC GD - 0.8854MPPCA - 0.8480NN - 0.8318
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 5/10
Classification algorithms we tried
AUC gradient descentI linear classifier :
ˆAUC(~ )β =1
PQ
P∑j=1
Q∑k=1
g(~β.(x+j − x−
k )) (1)
I softening (for differentiability)
R(~β) =1
PQ
P∑j=1
Q∑k=1
sigmoid(~β.(x+j − x−
k )) (2)
I computationally feasible estimator :
Rl(~β) =1Q
Q∑k=1
sigmoid(~β.(x+kmodP − x−
k )) (3)
I progressively increase norm of β =⇒ close to realAUC + no local maximum
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 6/10
Classification algorithms we tried
AUC gradient descentI linear classifier :
ˆAUC(~ )β =1
PQ
P∑j=1
Q∑k=1
g(~β.(x+j − x−
k )) (1)
I softening (for differentiability)
R(~β) =1
PQ
P∑j=1
Q∑k=1
sigmoid(~β.(x+j − x−
k )) (2)
I computationally feasible estimator :
Rl(~β) =1Q
Q∑k=1
sigmoid(~β.(x+kmodP − x−
k )) (3)
I progressively increase norm of β =⇒ close to realAUC + no local maximum
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 6/10
Classification algorithms we tried
AUC gradient descentI linear classifier :
ˆAUC(~ )β =1
PQ
P∑j=1
Q∑k=1
g(~β.(x+j − x−
k )) (1)
I softening (for differentiability)
R(~β) =1
PQ
P∑j=1
Q∑k=1
sigmoid(~β.(x+j − x−
k )) (2)
I computationally feasible estimator :
Rl(~β) =1Q
Q∑k=1
sigmoid(~β.(x+kmodP − x−
k )) (3)
I progressively increase norm of β =⇒ close to realAUC + no local maximum
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 6/10
Classification algorithms we tried
AUC gradient descentI linear classifier :
ˆAUC(~ )β =1
PQ
P∑j=1
Q∑k=1
g(~β.(x+j − x−
k )) (1)
I softening (for differentiability)
R(~β) =1
PQ
P∑j=1
Q∑k=1
sigmoid(~β.(x+j − x−
k )) (2)
I computationally feasible estimator :
Rl(~β) =1Q
Q∑k=1
sigmoid(~β.(x+kmodP − x−
k )) (3)
I progressively increase norm of β =⇒ close to realAUC + no local maximum
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 6/10
Classification algorithms we tried
MPPCA
I Generative model
t |x ∼ N (Wx + µ;σ2Id) , x ∼ N (0; Iq)
I Idea : train a model for each classI Probabilistic model⇒ independent of ratio +/−I Compare probabilities of belonging to each class
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 7/10
Classification algorithms we tried
MPPCA
I Generative model
t |x ∼ N (Wx + µ;σ2Id) , x ∼ N (0; Iq)
I Idea : train a model for each class
I Probabilistic model⇒ independent of ratio +/−I Compare probabilities of belonging to each class
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 7/10
Classification algorithms we tried
MPPCA
I Generative model
t |x ∼ N (Wx + µ;σ2Id) , x ∼ N (0; Iq)
I Idea : train a model for each classI Probabilistic model⇒ independent of ratio +/−
I Compare probabilities of belonging to each class
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 7/10
Classification algorithms we tried
MPPCA
I Generative model
t |x ∼ N (Wx + µ;σ2Id) , x ∼ N (0; Iq)
I Idea : train a model for each classI Probabilistic model⇒ independent of ratio +/−I Compare probabilities of belonging to each class
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 7/10
Classification algorithms we tried
Neural Networks
I Apply on raw data (and not on top of ScatNet) not tooverfit.
I Cross-validations in the hyper-parameters space.I Used as a black box.I Best architecture : 4 hidden layers with 14,13,10,9
neurons, and 400 iterations.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 8/10
Classification algorithms we tried
Neural Networks
I Apply on raw data (and not on top of ScatNet) not tooverfit.
I Cross-validations in the hyper-parameters space.
I Used as a black box.I Best architecture : 4 hidden layers with 14,13,10,9
neurons, and 400 iterations.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 8/10
Classification algorithms we tried
Neural Networks
I Apply on raw data (and not on top of ScatNet) not tooverfit.
I Cross-validations in the hyper-parameters space.I Used as a black box.
I Best architecture : 4 hidden layers with 14,13,10,9neurons, and 400 iterations.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 8/10
Classification algorithms we tried
Neural Networks
I Apply on raw data (and not on top of ScatNet) not tooverfit.
I Cross-validations in the hyper-parameters space.I Used as a black box.I Best architecture : 4 hidden layers with 14,13,10,9
neurons, and 400 iterations.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 8/10
Classification algorithms we tried
Use subjects IDs
I Idea 1 : different behaviors among subjects⇒ train 1classifier/subject
I ... lead to overfittingI and not useful for “ready trained” device
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 9/10
Classification algorithms we tried
Use subjects IDs
I Idea 1 : different behaviors among subjects⇒ train 1classifier/subject
I ... lead to overfittingI and not useful for “ready trained” device
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 9/10
Results and Conclusion
Conclusion
MPPCA NN AUC GD0.81578 0.84976 0.86933
I Best score : linear classifier on scattering transformsI NN : blind but goodI All algorithms : less than 15 minutes on a laptop for
trainingI Possible improvements : ScatNet + AUCSVM
(Brefeld, Scheffer) ? ScatNet+SVM-RBFcross-validated ?
I Reinforcement learning to adapt the algorithm to thesubject using the device.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 10/10
Results and Conclusion
Conclusion
MPPCA NN AUC GD0.81578 0.84976 0.86933
I Best score : linear classifier on scattering transforms
I NN : blind but goodI All algorithms : less than 15 minutes on a laptop for
trainingI Possible improvements : ScatNet + AUCSVM
(Brefeld, Scheffer) ? ScatNet+SVM-RBFcross-validated ?
I Reinforcement learning to adapt the algorithm to thesubject using the device.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 10/10
Results and Conclusion
Conclusion
MPPCA NN AUC GD0.81578 0.84976 0.86933
I Best score : linear classifier on scattering transformsI NN : blind but good
I All algorithms : less than 15 minutes on a laptop fortraining
I Possible improvements : ScatNet + AUCSVM(Brefeld, Scheffer) ? ScatNet+SVM-RBFcross-validated ?
I Reinforcement learning to adapt the algorithm to thesubject using the device.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 10/10
Results and Conclusion
Conclusion
MPPCA NN AUC GD0.81578 0.84976 0.86933
I Best score : linear classifier on scattering transformsI NN : blind but goodI All algorithms : less than 15 minutes on a laptop for
training
I Possible improvements : ScatNet + AUCSVM(Brefeld, Scheffer) ? ScatNet+SVM-RBFcross-validated ?
I Reinforcement learning to adapt the algorithm to thesubject using the device.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 10/10
Results and Conclusion
Conclusion
MPPCA NN AUC GD0.81578 0.84976 0.86933
I Best score : linear classifier on scattering transformsI NN : blind but goodI All algorithms : less than 15 minutes on a laptop for
trainingI Possible improvements : ScatNet + AUCSVM
(Brefeld, Scheffer) ? ScatNet+SVM-RBFcross-validated ?
I Reinforcement learning to adapt the algorithm to thesubject using the device.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 10/10
Results and Conclusion
Conclusion
MPPCA NN AUC GD0.81578 0.84976 0.86933
I Best score : linear classifier on scattering transformsI NN : blind but goodI All algorithms : less than 15 minutes on a laptop for
trainingI Possible improvements : ScatNet + AUCSVM
(Brefeld, Scheffer) ? ScatNet+SVM-RBFcross-validated ?
I Reinforcement learning to adapt the algorithm to thesubject using the device.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 10/10