My First Attempt on Kaggle - Higgs Machine Learning Challenge: 755th and Proud!

The Higgs Machine Learning Challenge is not only a place for PhDs! As an undergraduate with a student license of MATLAB and a couple of dollars for Amazon AWS, I entered in the last 8 days of the challenge and still overtook more than half of the competitors! In this talk, I'll present the challenge, explain my approach, and walk through the code.

Higgs Challenge
My first attempt at Kaggle: 755th and proud!

@dhianadeva

Err… Kaggle?!

Platform for data science competitions
Machine Learning, Big Data, Statistics, Data mining ...

Community for data scientists
Users, leaderboard, forums …

Sponsors!

$$$ponsored competitions!

We don’t need no PhD!

Yes, we can!

My guilty pleasure:
Student license of MATLAB <3

Open source alternatives:
Python + Scikit + Numpy + …
R + randomForest + e1071 + caret + …
Octave!?

Higgs Challenge

Datasets

Training (labeled):
250k events
30 features
Event id, weight and class (s/b)

Test (unlabeled):
550k events
18% Public
82% Private

training.csv
EventId , DER_mass_MMC , … , Weight , Class
100000 , 138.47 , … , 0.00265331133733 , s
100001 , 160.937 , … , 2.23358448717 , b
100002 , -999.0 , … , 2.34738894364 , b
100003 , 143.905 , … , 5.44637821192 , b
…

test.csv
EventId , DER_mass_MMC , … , PRI_jet_all_pt
350000 , -999 , … , -0.0
350001 , 106.398 , … , 47.575
350002 , 117.794 , … , 0.0
350003 , 135.861 , … , 0.0
…

submission.csv
EventId , RankOrder , Class
350000 , 262328 , b
350001 , 201479 , b
350002 , 212810 , b
350003 , 134945 , b
…
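To make the submission format concrete, here is a minimal Python sketch of how such a file could be built from per-event classifier scores. It is not the code used in the talk (that was MATLAB). The function names, the example scores, and the convention that a larger RankOrder means "more signal-like" are assumptions made for illustration; the 0.8 cut-off is borrowed from the "0.8 threshold" step listed later in the slides.

```python
import csv
import io


def build_submission(event_ids, scores, threshold):
    """Return (EventId, RankOrder, Class) rows from per-event signal scores.

    Assumption: RankOrder is a permutation of 1..N in which a larger
    rank means the event looks more signal-like.
    """
    n = len(event_ids)
    # Positions sorted from the least to the most signal-like event.
    order = sorted(range(n), key=lambda i: scores[i])
    rank = [0] * n
    for position, i in enumerate(order, start=1):
        rank[i] = position
    rows = []
    for i in range(n):
        label = "s" if scores[i] >= threshold else "b"
        rows.append((event_ids[i], rank[i], label))
    return rows


def to_csv(rows):
    """Serialize the rows with the header shown on the slide."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["EventId", "RankOrder", "Class"])
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up scores for the four event ids shown on the slide.
example_rows = build_submission(
    [350000, 350001, 350002, 350003],
    [0.10, 0.95, 0.40, 0.85],
    threshold=0.8,
)
example_csv = to_csv(example_rows)
```

The ranking and the class label are derived from the same score, so moving the threshold changes only the Class column, never the RankOrder column.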

End-to-end

A little math...

(Approximate Median Significance)
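The formula on this slide did not survive the transcript. As a hedged reconstruction, the challenge's evaluation metric is usually written as AMS = sqrt(2 * ((s + b + b_reg) * ln(1 + s / (b + b_reg)) - s)), where s and b are the weighted sums of true signal and true background events that were classified as signal, and b_reg = 10 is a regularization constant. The function name and argument names below are illustrative choices, not taken from the talk.

```python
import math


def ams(s, b, b_reg=10.0):
    """Approximate Median Significance.

    s: sum of weights of true signal events selected as signal
    b: sum of weights of true background events selected as signal
    b_reg: regularization term (10 in the challenge definition)
    """
    if s < 0 or b < 0:
        raise ValueError("s and b must be non-negative")
    radicand = 2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(radicand, 0.0))
```

For small s relative to b the value is close to s / sqrt(b + b_reg), which is why selecting a purer set of events can score higher than selecting more of them.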

755th/1785 secrets

I entered in the last 8 days of the 127-day challenge and overtook more than half of the competitors using:

MATLAB 2014b (student license)
Neural Network Toolbox
$20 of EC2 at Amazon Web Services
9 code files totaling 674 words

Neural netwhat?!

Neurons

Inputs Output

For now, a Black box!

Inputs Output

It trains

Output

Inputs

Target

Error
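The training picture above (inputs go in, the output is compared with a target, and the error drives the update) can be sketched with a single sigmoid neuron. This is only an illustration in Python using plain gradient descent on a toy OR problem; the talk itself used MATLAB's Neural Network Toolbox with the trainlm (Levenberg-Marquardt) algorithm, which is a different optimizer. All names, the learning rate, and the toy data are assumptions.

```python
import math


def neuron_output(weights, bias, inputs):
    """Sigmoid of the weighted sum of the inputs."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-z))


def train_step(weights, bias, inputs, target, learning_rate=0.5):
    """One gradient-descent update on the squared error of one sample."""
    output = neuron_output(weights, bias, inputs)
    error = target - output
    # Error scaled by the sigmoid derivative (delta rule).
    delta = error * output * (1.0 - output)
    new_weights = [w + learning_rate * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias + learning_rate * delta
    return new_weights, new_bias, error


# Toy data: logical OR of two inputs.
samples = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
           ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

weights, bias = [0.0, 0.0], 0.0
epoch_errors = []
for epoch in range(2000):
    total = 0.0
    for inputs, target in samples:
        weights, bias, error = train_step(weights, bias, inputs, target)
        total += error * error
    epoch_errors.append(total)
```

Once training is done, "running" the network is just calling neuron_output with the learned weights on new inputs.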

It runs

Inputs Output

Moonlighting!

1. nprtool
2. fixunknowns
3. trainlm
4. processpca
5. 0.8 threshold
6. ams threshold pick
7. hidden neurons pick
8. 0.25*targets + amsweights

8 days!
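One of the steps above, picking the decision threshold by AMS rather than fixing it at 0.8, can be sketched as follows. This is a hypothetical Python illustration, not the MATLAB code from the talk: the function names, the candidate thresholds, and the tiny labeled sample with its weights are all made up. In a real run the weights of a validation subset would also need rescaling to the full dataset before the AMS values are comparable.

```python
import math


def ams(s, b, b_reg=10.0):
    """Approximate Median Significance with regularization term b_reg."""
    radicand = 2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s)
    return math.sqrt(max(radicand, 0.0))


def pick_threshold(scores, labels, weights, candidates):
    """Return the candidate threshold with the highest AMS, and that AMS."""
    best_threshold, best_value = None, -1.0
    for threshold in candidates:
        # Weighted true positives (s) and false positives (b) at this cut.
        s = sum(w for score, label, w in zip(scores, labels, weights)
                if score >= threshold and label == "s")
        b = sum(w for score, label, w in zip(scores, labels, weights)
                if score >= threshold and label == "b")
        value = ams(s, b)
        if value > best_value:
            best_threshold, best_value = threshold, value
    return best_threshold, best_value


# Made-up validation sample: network scores, true classes, event weights.
scores = [0.95, 0.90, 0.85, 0.60, 0.40, 0.20]
labels = ["s", "s", "b", "s", "b", "b"]
weights = [1.0, 1.0, 5.0, 1.0, 5.0, 5.0]

best_threshold, best_value = pick_threshold(
    scores, labels, weights, candidates=[0.1, 0.5, 0.8, 0.9])
```

On this toy sample the search prefers 0.5 over 0.8, because the lower cut picks up an extra signal event without admitting more background weight.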

Some stats...

Day 1

Day 2

Day 3

Day 4

Day 5

Day 6

Day 7

Day 8

Oops! (weighted errors using AMS, regularization, mapstd, … nothing worked!)

Lessons learned

+ Optimize self-learning by doing things from scratch (or from a default baseline)

+ Kaggle is way more fun than studying with traditional datasets (iris, cancer, thyroid...)

+ Data science needs good engineering practices!

+ The competition fact sheet was a great way of assessing what I know I know, what I know I don’t know…

Let’s hack?!

Re-considering PCA
PCD?
Dimensionality Reduction
Stop on best AMS (hack nn toolbox!)
Ensemble
Auto-encoder
MATLAB unit tests
MATLAB continuous integration

Thanks! ;)