Uvrgrp ml
David Callender
• Finished in the top 2% (18th out of >1300) in a 3-year, $3 million machine learning competition
• Studied disease propagation in an urban setting using probabilistic graphical models at Dartmouth College
• Studied computational protein design at the University of Washington
• Studied Mathematical foundations of Quantum Mechanics at Macalester College
Machine Learning in R, circa 2013
a.k.a. Using R on Kaggle
David Callender
• Who will end up in the hospital
• Drug effectiveness
• Computer security: determining employee access needs
• What the salary will be for a given job advertisement
Not Just Kaggle
• Movie recommendations
• Popular productions
• Product recommendations
• Good business opportunities
• The entire Internet
• Probably a lot more too
Talk Outline
• Motivation
• Concepts
• Algorithms
  • Decision Trees and Forests
  • Neural networks
• Kaggle
• Interactive session with R packages
  • randomForest
  • gbm
  • neuralnet
Supervised Learning
Train the model with examples where you know the value of “survived”:

Survived  Pclass  Sex     Age   SibSp  Parch  Fare     Embarked
0         3       male    22    1      0      7.25     S
1         1       female  38    1      0      71.2833  C
1         3       female  26    0      0      7.925    S
1         1       female  35    1      0      53.1     S
0         3       male    35    0      0      8.05     S
0         3       male    33    0      0      8.4583   Q
0         1       male    54    0      0      51.8625  S
0         3       male    2     3      1      21.075   S
1         3       female  27    0      2      11.1333  S
1         2       female  14    1      0      30.0708  C

Use the model to predict the value of “survived”:

Survived  Pclass  Sex     Age   SibSp  Parch  Fare     Embarked
?         3       male    34.5  0      0      7.8292   Q
?         3       female  47    1      0      7        S
?         2       male    62    0      0      9.6875   Q
?         3       male    27    0      0      8.6625   S
?         3       female  22    1      1      12.2875  S
?         3       male    14    0      0      9.225    S
?         3       female  30    0      0      7.6292   Q
?         2       male    26    1      1      29       S
?         3       female  18    0      0      7.2292   C
?         3       male    21    2      0      24.15    S

Predicting survival for passengers of the Titanic
Column types: binary, numeric, categorical
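A minimal sketch of the train-then-predict workflow, written in plain Python rather than R so that it runs without extra packages. The rows come from the Titanic tables on this slide (only Pclass, Sex and Age are kept). The "model" here, the most common outcome for each value of Sex, and the names `fit` and `predict` are hypothetical stand-ins for a real learner, not part of the talk.

```python
from collections import Counter, defaultdict

# Labelled training rows: (Survived, Pclass, Sex, Age)
train = [
    (0, 3, "male", 22), (1, 1, "female", 38), (1, 3, "female", 26),
    (1, 1, "female", 35), (0, 3, "male", 35), (0, 3, "male", 33),
    (0, 1, "male", 54), (0, 3, "male", 2), (1, 3, "female", 27),
    (1, 2, "female", 14),
]
# Unlabelled rows whose outcome we want to predict: (Pclass, Sex, Age)
test = [(3, "male", 34.5), (3, "female", 47), (2, "male", 62)]

def fit(rows):
    """Toy learner (an assumption): most common Survived value per Sex."""
    votes = defaultdict(Counter)
    for survived, _pclass, sex, _age in rows:
        votes[sex][survived] += 1
    return {sex: counts.most_common(1)[0][0] for sex, counts in votes.items()}

def predict(model, rows):
    """Apply the fitted model to rows that have no Survived column."""
    return [model[sex] for _pclass, sex, _age in rows]

model = fit(train)              # training uses rows with known outcomes
predictions = predict(model, test)  # prediction fills in the "?" column
print(predictions)
```

The same two-step shape (fit on labelled rows, predict on unlabelled rows) applies whatever learner replaces the toy one.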
Overfitting
http://en.wikipedia.org/wiki/File:Overfitting_on_Training_Set_Data.pdf | Tomaso Poggio
Decision Trees
http://en.wikipedia.org/wiki/File:CART_tree_titanic_survivors.png | Stephen Milborrow | Made using R
Survived  Pclass  Sex     Age   SibSp  Parch  Fare     Embarked
?         3       male    34.5  0      0      7.8292   Q
?         3       female  47    1      0      7        S
?         2       male    62    0      0      9.6875   Q
?         3       male    27    0      0      8.7      S
?         3       female  22    1      1      12.2875  S
?         3       male    14    0      0      9.225    S
?         3       female  30    0      0      7.6292   Q
?         2       male    26    1      1      29       S
?         3       female  18    0      0      7.2292   C
?         3       male    21    2      0      24.15    S
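To make the idea of a decision tree concrete, here is a small sketch in plain Python that grows a tree by repeatedly choosing the split with the lowest Gini impurity. It is an illustration under stated assumptions, not the CART implementation behind the figure: Sex is encoded as a hypothetical `is_male` flag, only three columns of the ten labelled Titanic rows are used, and `max_depth=3` is an arbitrary choice.

```python
# Training rows from the Titanic table: (is_male, Pclass, Age) and Survived.
# Encoding Sex as is_male (1 or 0) is an assumption made for this sketch.
rows = [(1, 3, 22), (0, 1, 38), (0, 3, 26), (0, 1, 35), (1, 3, 35),
        (1, 3, 33), (1, 1, 54), (1, 3, 2), (0, 3, 27), (0, 2, 14)]
labels = [0, 1, 1, 1, 0, 0, 0, 0, 1, 1]

def gini(ys):
    """Gini impurity of a list of 0/1 labels (0.0 means the node is pure)."""
    p = sum(ys) / len(ys)
    return 2 * p * (1 - p)

def best_split(xs, ys):
    """Return (impurity, feature, threshold) of the best split, or None."""
    best = None
    for j in range(len(xs[0])):
        for t in sorted({x[j] for x in xs}):
            left = [y for x, y in zip(xs, ys) if x[j] <= t]
            right = [y for x, y in zip(xs, ys) if x[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

def build_tree(xs, ys, max_depth=3):
    """Grow a tree recursively; leaves hold the majority class (ties go to 1)."""
    mixed = 0 < sum(ys) < len(ys)
    split = best_split(xs, ys) if max_depth > 0 and mixed else None
    if split is None:
        return int(2 * sum(ys) >= len(ys))
    _, j, t = split
    left = [(x, y) for x, y in zip(xs, ys) if x[j] <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x[j] > t]
    return {
        "feature": j, "threshold": t,
        "left": build_tree([x for x, _ in left], [y for _, y in left], max_depth - 1),
        "right": build_tree([x for x, _ in right], [y for _, y in right], max_depth - 1),
    }

def predict(tree, x):
    """Walk from the root to a leaf and return the leaf's class."""
    while isinstance(tree, dict):
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

tree = build_tree(rows, labels)
print(tree)
print(predict(tree, (1, 3, 34.5)), predict(tree, (0, 3, 47)))
```

On these ten rows the sketch stops after a single split on `is_male`, because that split already separates survivors from non-survivors. With more data the recursion would keep splitting, which is where overfitting becomes a concern.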
Random Forest (RF)

Survived  Pclass  Sex     Age   SibSp  Parch  Fare     Embarked
0         3       male    22    1      0      7.25     S
1         1       female  38    1      0      71.2833  C
1         3       female  26    0      0      7.925    S
1         1       female  35    1      0      53.1     S
0         3       male    35    0      0      8.05     S
0         3       male    33    0      0      8.4583   Q
0         1       male    54    0      0      51.8625  S
0         3       male    2     3      1      21.075   S
1         3       female  27    0      2      11.1333  S
1         2       female  14    1      0      30.0708  C

• Training: Bagging (each tree sees a random sample of the rows) and Random Sub-Spaces (each tree sees a random subset of the columns)
• Prediction: Voting/Avg over the trees
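The three ingredients named on this slide (bagging, random sub-spaces, voting) can be sketched in plain Python as follows. This is a simplified illustration, not the algorithm in R's randomForest package: each "tree" is a one-split stump, one feature subset is drawn per tree, and the `is_male` encoding, `n_trees=25`, `n_features=2` and `seed=0` are all assumptions chosen for the example.

```python
import random
from collections import Counter

# Titanic training rows: (is_male, Pclass, Age) and Survived (encoding assumed).
rows = [(1, 3, 22), (0, 1, 38), (0, 3, 26), (0, 1, 35), (1, 3, 35),
        (1, 3, 33), (1, 1, 54), (1, 3, 2), (0, 3, 27), (0, 2, 14)]
labels = [0, 1, 1, 1, 0, 0, 0, 0, 1, 1]

def gini(ys):
    p = sum(ys) / len(ys)
    return 2 * p * (1 - p)

def majority(ys):
    return Counter(ys).most_common(1)[0][0]

def fit_stump(xs, ys, features):
    """Best single split over the allowed features, or a constant if none helps.

    Returns (feature, threshold, left_label, right_label); feature is None
    for a constant stump.
    """
    best = (gini(ys), None, None, majority(ys), majority(ys))
    for j in features:
        for t in sorted({x[j] for x in xs}):
            left = [y for x, y in zip(xs, ys) if x[j] <= t]
            right = [y for x, y in zip(xs, ys) if x[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
            if score < best[0]:
                best = (score, j, t, majority(left), majority(right))
    return best[1:]

def stump_predict(stump, x):
    j, t, left_label, right_label = stump
    if j is None:
        return left_label
    return left_label if x[j] <= t else right_label

def fit_forest(xs, ys, n_trees=25, n_features=2, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bagging: sample rows with replacement.
        idx = [rng.randrange(len(xs)) for _ in xs]
        # Random sub-space: this tree may only look at a subset of the columns.
        feats = rng.sample(range(len(xs[0])), n_features)
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx], feats))
    return forest

def forest_predict(forest, x):
    """Voting: every tree predicts, the most common answer wins."""
    return majority([stump_predict(s, x) for s in forest])

forest = fit_forest(rows, labels)
print([forest_predict(forest, x) for x in rows])
```

For a numeric target the vote would be replaced by an average of the trees' predictions, which is the "Avg" half of "Voting/Avg".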
AdaBoost & Gradient Boosting
• Initialize a set of weights, one for each training example, all with equal value
• Train a tree with weighted training examples
• Add tree to set of trees
• Make predictions with set of trees
• Adjust weights so that the training examples you got wrong have more weight
• Repeat
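The loop above can be sketched in plain Python. This follows the standard AdaBoost recipe, which differs slightly from the bullets: each tree gets a vote weight `alpha`, and the example weights are adjusted using the newest tree's mistakes. The trees are one-split stumps, the labels are coded as +1/-1, and the ten-point one-dimensional data set is a made-up toy example, not data from the talk. Gradient boosting is not covered by this sketch.

```python
import math

def stump_predict(threshold, polarity, x):
    """A one-split tree: predict `polarity` on the left, `-polarity` on the right."""
    return polarity if x <= threshold else -polarity

def fit_stump(xs, ys, weights):
    """Return (weighted error, threshold, polarity) of the best stump."""
    best = None
    for t in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(t, polarity, x) != y)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best

def adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n                      # equal weight for every example
    ensemble = []
    for _ in range(rounds):
        err, t, polarity = fit_stump(xs, ys, weights)   # train a weighted tree
        err = min(max(err, 1e-10), 1 - 1e-10)           # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)         # this tree's vote weight
        ensemble.append((alpha, t, polarity))           # add tree to the set
        # Misclassified examples (y * prediction == -1) gain weight.
        weights = [w * math.exp(-alpha * y * stump_predict(t, polarity, x))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all trees in the set."""
    score = sum(alpha * stump_predict(t, p, x) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): no single split classifies it correctly.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])
```

No single stump gets more than seven of these ten points right, but the weighted vote of three stumps classifies all of them correctly, which is the point of boosting weak learners.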
Logistic Regression, a.k.a. the Perceptron
• Weighted sum
• Activation function
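A minimal sketch of the two pieces named on this slide, a weighted sum followed by an activation function, in plain Python. The sigmoid activation makes this logistic regression; the classic perceptron uses a hard threshold instead. The single `is_male` feature, the learning rate and the number of epochs are assumptions chosen for the example.

```python
import math

def sigmoid(z):
    """Activation function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Weighted sum of the inputs, passed through the activation function."""
    weighted_sum = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(weighted_sum)

def train(xs, ys, lr=0.5, epochs=200):
    """Batch gradient descent on the log loss (lr and epochs are arbitrary)."""
    weights, bias = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        errors = [predict_proba(weights, bias, x) - y for x, y in zip(xs, ys)]
        for j in range(len(weights)):
            grad = sum(e * x[j] for e, x in zip(errors, xs)) / len(xs)
            weights[j] -= lr * grad
        bias -= lr * sum(errors) / len(xs)
    return weights, bias

# Titanic training rows reduced to one feature: (is_male,) -> Survived.
xs = [(1,), (0,), (0,), (0,), (1,), (1,), (1,), (1,), (0,), (0,)]
ys = [0, 1, 1, 1, 0, 0, 0, 0, 1, 1]
weights, bias = train(xs, ys)
print(predict_proba(weights, bias, (1,)), predict_proba(weights, bias, (0,)))
```

The output of `predict_proba` can be read as an estimated probability of survival; thresholding it at 0.5 gives a class prediction.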
Multilayer Feed-forward Neural Network
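A feed-forward network stacks layers of the weighted-sum-plus-activation unit, with each layer's outputs feeding the next. The sketch below, in plain Python, shows only the forward pass. The weights are hand-picked for illustration (an assumption, not learned and not from the talk) so that a network with two hidden units computes XOR, a function that a single unit cannot represent.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum plus bias per unit, then sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
            for unit_weights, b in zip(weights, biases)]

# Hand-picked weights (hypothetical). Hidden unit 1 acts like OR,
# hidden unit 2 like AND, and the output fires for "OR but not AND".
HIDDEN_W = [[20.0, 20.0], [20.0, 20.0]]
HIDDEN_B = [-10.0, -30.0]
OUTPUT_W = [[20.0, -20.0]]
OUTPUT_B = [-10.0]

def forward(x):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(x, HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = [round(forward(x)) for x in inputs]
print(outputs)
```

In practice the weights are learned from training data, typically by backpropagation, rather than set by hand.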
R’s Popularity
Tools mentioned in Kaggle user profiles
From blog entry by Ben Hammer: http://blog.kaggle.com/2011/11/27/kagglers-favorite-tools/
Summary of Recent Competition Winners
Competition    Position  Algorithm  Other Algs.        Tools
Adzuna Salary  1st       NN*        -                  Python GPU
Adzuna Salary  2nd       NN         -                  C++
Adzuna Salary  3rd       NN         NB, SVM, LR        Python
Merck          1st       NN*        -                  Python GPU
Merck          2nd       GBM & SVM  RF, PCA, KNN, SVM  R & Python
Merck          3rd       RF & SVM   GBM, NN            R
Learning More
• Pedro Domingos at University of Washington
  • www.coursera.org/course/machlearning
  • www.coursera.org/uw
  • A Few Useful Things to Know about Machine Learning. Communications of the ACM
  • homes.cs.washington.edu/~pedrod
• blog.kaggle.com
• ufldl.stanford.edu/wiki/