Uvrgrp ml



Page 1: Uvrgrp ml

David Callender

• Finished in the top 2% (18th out of >1300) in a 3-year, $3 million machine learning competition.

• Studied disease propagation in an urban setting using probabilistic graphical models at Dartmouth College

• Studied computational protein design at the University of Washington

• Studied Mathematical foundations of Quantum Mechanics at Macalester College

Page 2: Uvrgrp ml

Machine Learning in R

circa 2013

David Callender

Page 3: Uvrgrp ml

a.k.a. Using R on Kaggle

Who will end up in the hospital

Drug effectiveness

Computer security: determining employee access needs

What the salary will be for a given job advertisement

Page 4: Uvrgrp ml

Not Just Kaggle

• Movie recommendations

• Popular productions

• Product recommendations

• Good business opportunities

• The entire Internet

• Probably a lot more too

Page 5: Uvrgrp ml

Talk Outline

• Motivation

• Concepts

• Algorithms

• Decision Trees and Forests

• Neural networks

• Kaggle

• Interactive session with R packages

• randomForest

• gbm

• neuralnet

Page 6: Uvrgrp ml

Supervised Learning

Predicting survival for passengers of the Titanic

Train model with examples where you know the value of “survived”

Survived  Pclass  Sex     Age   SibSp  Parch  Fare     Embarked
0         3       male    22    1      0      7.25     S
1         1       female  38    1      0      71.2833  C
1         3       female  26    0      0      7.925    S
1         1       female  35    1      0      53.1     S
0         3       male    35    0      0      8.05     S
0         3       male    33    0      0      8.4583   Q
0         1       male    54    0      0      51.8625  S
0         3       male    2     3      1      21.075   S
1         3       female  27    0      2      11.1333  S
1         2       female  14    1      0      30.0708  C

Use model to predict the value of “survived”

Survived  Pclass  Sex     Age   SibSp  Parch  Fare     Embarked
?         3       male    34.5  0      0      7.8292   Q
?         3       female  47    1      0      7        S
?         2       male    62    0      0      9.6875   Q
?         3       male    27    0      0      8.6625   S
?         3       female  22    1      1      12.2875  S
?         3       male    14    0      0      9.225    S
?         3       female  30    0      0      7.6292   Q
?         2       male    26    1      1      29       S
?         3       female  18    0      0      7.2292   C
?         3       male    21    2      0      24.15    S

Column types: binary, numeric, categorical
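The train-then-predict workflow on this slide can be sketched in a few lines. This is an illustrative Python sketch, not the R code from the talk: it keeps only the Survived, Pclass and Sex columns of the rows above, and the "model" is deliberately trivial (the most common outcome for each value of Sex). The names `fit` and `predict` are hypothetical.

```python
from collections import Counter, defaultdict

# Labeled training rows (Survived, Pclass, Sex), copied from the slide's first table.
train = [
    (0, 3, "male"), (1, 1, "female"), (1, 3, "female"), (1, 1, "female"),
    (0, 3, "male"), (0, 3, "male"), (0, 1, "male"), (0, 3, "male"),
    (1, 3, "female"), (1, 2, "female"),
]

# Unlabeled rows (Pclass, Sex) from the second table; "Survived" is unknown.
test = [
    (3, "male"), (3, "female"), (2, "male"), (3, "male"), (3, "female"),
    (3, "male"), (3, "female"), (2, "male"), (3, "female"), (3, "male"),
]


def fit(rows):
    """Learn the most common Survived value for each value of Sex."""
    counts = defaultdict(Counter)
    for survived, _pclass, sex in rows:
        counts[sex][survived] += 1
    return {sex: c.most_common(1)[0][0] for sex, c in counts.items()}


def predict(model, rows):
    """Look up the learned outcome for each unlabeled row."""
    return [model[sex] for _pclass, sex in rows]


model = fit(train)
predictions = predict(model, test)
print(model)
print(predictions)
```

Any real model follows the same two-phase shape: fit on rows where the label is known, then fill in the "?" column for the rows where it is not.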

Page 7: Uvrgrp ml

Overfitting

http://en.wikipedia.org/wiki/File:Overfitting_on_Training_Set_Data.pdf | Tomaso Poggio
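A small numeric illustration of overfitting, as a hedged Python sketch rather than anything from the talk. The data are synthetic and made up for this example: the true rule is y = 1 when x >= 5, and one training label is flipped to act as noise. A simple threshold model is compared with a nearest-neighbor model that memorizes every training point; all function names are hypothetical.

```python
# Synthetic 1-D data: true rule is y = 1 when x >= 5; the label at x = 2 is flipped (noise).
train = [(0, 0), (1, 0), (2, 1), (3, 0), (4, 0), (5, 1), (6, 1), (7, 1), (8, 1), (9, 1)]
# Held-out points, labeled with the true rule.
test = [(0.4, 0), (1.6, 0), (2.4, 0), (3.4, 0), (4.4, 0),
        (5.6, 1), (6.6, 1), (7.6, 1), (8.6, 1)]


def accuracy(predict, data):
    """Fraction of (x, y) pairs for which predict(x) equals y."""
    return sum(predict(x) == y for x, y in data) / len(data)


def fit_threshold(data):
    """Simple model: predict 1 when x >= t, with t minimizing training errors."""
    candidates = [x for x, _ in data]
    return min(candidates, key=lambda t: sum((x >= t) != bool(y) for x, y in data))


def nearest_neighbor(data):
    """Flexible model: copy the label of the closest training point."""
    return lambda x: min(data, key=lambda row: abs(row[0] - x))[1]


t = fit_threshold(train)


def simple(x):
    return int(x >= t)


flexible = nearest_neighbor(train)

print("threshold:", t)
print("simple   train/test:", accuracy(simple, train), accuracy(simple, test))
print("flexible train/test:", accuracy(flexible, train), accuracy(flexible, test))
```

The memorizing model is perfect on the training set but reproduces the noisy label around x = 2 on held-out points, while the threshold model misses the noisy training point and gets every held-out point right.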

Page 8: Uvrgrp ml

Decision Trees

http://en.wikipedia.org/wiki/File:CART_tree_titanic_survivors.png | Stephen Milborrow | Made using R

Survived  Pclass  Sex     Age   SibSp  Parch  Fare     Embarked
?         3       male    34.5  0      0      7.8292   Q
?         3       female  47    1      0      7        S
?         2       male    62    0      0      9.6875   Q
?         3       male    27    0      0      8.7      S
?         3       female  22    1      1      12.2875  S
?         3       male    14    0      0      9.225    S
?         3       female  30    0      0      7.6292   Q
?         2       male    26    1      1      29       S
?         3       female  18    0      0      7.2292   C
?         3       male    21    2      0      24.15    S
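How a decision tree is grown can be sketched as follows. This is a minimal Python illustration, not the method behind the linked figure or any R package: it assumes binary splits chosen by Gini impurity, uses only Pclass, Sex and Age from the ten labeled training rows shown earlier in the deck, and encodes Sex as a hypothetical `is_male` column.

```python
# Training rows from the Titanic sample: (Survived, Pclass, is_male, Age).
train = [
    (0, 3, 1, 22), (1, 1, 0, 38), (1, 3, 0, 26), (1, 1, 0, 35), (0, 3, 1, 35),
    (0, 3, 1, 33), (0, 1, 1, 54), (0, 3, 1, 2), (1, 3, 0, 27), (1, 2, 0, 14),
]
FEATURES = {"Pclass": 1, "is_male": 2, "Age": 3}  # column index of each feature


def gini(rows):
    """Gini impurity of the Survived labels in rows (0.0 means pure)."""
    if not rows:
        return 0.0
    p = sum(r[0] for r in rows) / len(rows)
    return 2 * p * (1 - p)


def best_split(rows):
    """Return (weighted impurity, feature, threshold) of the best split, or None."""
    best = None
    for name, col in FEATURES.items():
        for t in sorted({r[col] for r in rows})[1:]:
            left = [r for r in rows if r[col] < t]
            right = [r for r in rows if r[col] >= t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, name, t)
    return best


def grow(rows, depth=0, max_depth=2):
    """Split recursively until a node is pure, unsplittable, or max_depth is hit."""
    split = best_split(rows) if depth < max_depth and gini(rows) > 0 else None
    if split is None:
        return round(sum(r[0] for r in rows) / len(rows))  # leaf: majority label
    _, name, t = split
    col = FEATURES[name]
    return (name, t,
            grow([r for r in rows if r[col] < t], depth + 1, max_depth),
            grow([r for r in rows if r[col] >= t], depth + 1, max_depth))


def predict(tree, row):
    """Walk from the root to a leaf; row uses the same column layout as train."""
    while isinstance(tree, tuple):
        name, t, left, right = tree
        tree = left if row[FEATURES[name]] < t else right
    return tree


tree = grow(train)
print(tree)
print(predict(tree, (None, 3, 0, 47)))  # 3rd class, female, age 47
```

On these ten rows a single split on sex already separates the labels perfectly, so the tree stops after one level; with more data the recursion would continue down to `max_depth`.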

Page 9: Uvrgrp ml

Random Forest (RF)

Training: Bagging, Random Sub-Spaces

Survived  Pclass  Sex     Age   SibSp  Parch  Fare     Embarked
0         3       male    22    1      0      7.25     S
1         1       female  38    1      0      71.2833  C
1         3       female  26    0      0      7.925    S
1         1       female  35    1      0      53.1     S
0         3       male    35    0      0      8.05     S
0         3       male    33    0      0      8.4583   Q
0         1       male    54    0      0      51.8625  S
0         3       male    2     3      1      21.075   S
1         3       female  27    0      2      11.1333  S
1         2       female  14    1      0      30.0708  C

Prediction: Voting/Avg
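The three ingredients named on this slide (bagging, random sub-spaces, voting) can be sketched together. This is an illustrative Python sketch, not the randomForest package: it assumes one-split trees (stumps) in place of full trees, so drawing a feature subset per tree coincides with drawing one per split. The `is_male` encoding, the function names and the default parameters are all hypothetical.

```python
import random

# Training rows from the Titanic sample: (Survived, Pclass, is_male, Age).
train = [
    (0, 3, 1, 22), (1, 1, 0, 38), (1, 3, 0, 26), (1, 1, 0, 35), (0, 3, 1, 35),
    (0, 3, 1, 33), (0, 1, 1, 54), (0, 3, 1, 2), (1, 3, 0, 27), (1, 2, 0, 14),
]
N_FEATURES = 3  # feature columns are indices 1..3


def fit_stump(rows, cols):
    """Best rule 'predict a if row[col] < t else b' over the allowed columns."""
    best = None
    for col in cols:
        for t in sorted({r[col] for r in rows}):
            for a, b in ((0, 1), (1, 0)):
                errors = sum((a if r[col] < t else b) != r[0] for r in rows)
                if best is None or errors < best[0]:
                    best = (errors, col, t, a, b)
    return best[1:]


def stump_predict(stump, row):
    col, t, a, b = stump
    return a if row[col] < t else b


def fit_forest(rows, n_trees=25, n_cols=2, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bagging: draw as many rows as the training set has, with replacement.
        sample = [rng.choice(rows) for _ in rows]
        # Random sub-space: each tree only sees a random subset of the columns.
        cols = rng.sample(range(1, N_FEATURES + 1), n_cols)
        forest.append(fit_stump(sample, cols))
    return forest


def forest_predict(forest, row):
    """Majority vote over the individual trees."""
    votes = sum(stump_predict(s, row) for s in forest)
    return int(votes * 2 > len(forest))


forest = fit_forest(train)
print([forest_predict(forest, r) for r in train])
```

For a numeric target the vote would be replaced by an average of the trees' outputs, which is the "Avg" half of "Voting/Avg".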

Page 10: Uvrgrp ml

AdaBoost & Gradient Boosting

• Initialize a set of weights, one for each training example, all with equal value

• Train a tree with weighted training examples

• Add tree to set of trees

• Make predictions with set of trees

• Adjust weights so that the training examples you got wrong have more weight

• Repeat
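The loop above can be sketched as follows. This is a hedged Python illustration of standard AdaBoost with labels coded as +1/-1 and one-split trees (stumps); it is not the gbm package and does not cover gradient boosting. It assumes the usual AdaBoost choices, which the slide does not spell out: each tree gets a vote strength `alpha` from its weighted error, and the weight update uses the newest tree's mistakes. The toy data are made up for the example.

```python
import math

# Toy 1-D data (made up for illustration); no single threshold separates the labels.
X = list(range(10))
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]


def stump_predict(stump, x):
    t, sign = stump
    return sign if x < t else -sign


def fit_stump(X, y, w):
    """Depth-1 tree: the threshold and sign with the lowest weighted error."""
    best = None
    for t in range(min(X), max(X) + 2):
        for sign in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if stump_predict((t, sign), xi) != yi)
            if best is None or err < best[0]:
                best = (err, (t, sign))
    return best


def ensemble_predict(trees, x):
    """Weighted vote of all trees in the set."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in trees)
    return 1 if score > 0 else -1


def adaboost(X, y, rounds):
    w = [1 / len(X)] * len(X)                    # equal initial weights
    trees = []
    for _ in range(rounds):                      # repeat
        err, stump = fit_stump(X, y, w)          # train a tree on weighted examples
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote strength of this tree
        trees.append((alpha, stump))             # add tree to the set of trees
        # Misclassified examples gain weight, the rest lose weight; then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return trees, w


trees, w = adaboost(X, y, rounds=3)
print([ensemble_predict(trees, x) for x in X])
```

On this toy data a single stump classifies 7 of the 10 examples correctly, while the weighted vote of three stumps classifies all 10 correctly.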

Page 11: Uvrgrp ml

Logistic Regression

a.k.a. The Perceptron

Activation Function

Weighted sum
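The two labeled parts of the diagram, a weighted sum followed by an activation function, amount to very little code. A minimal Python sketch, assuming the logistic (sigmoid) activation; the weights below are hypothetical, hand-picked values for two illustrative features and are not fitted to any data.

```python
import math


def sigmoid(z):
    """Logistic activation function: squashes any real number into (0, 1)."""
    return 1 / (1 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Weighted sum of the inputs, passed through the activation function."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical, hand-picked weights for the features (is_male, Pclass).
weights, bias = [-2.5, -0.5], 2.0
print(predict_proba(weights, bias, [1, 3]))  # 3rd-class male
print(predict_proba(weights, bias, [0, 1]))  # 1st-class female
```

Swapping `sigmoid` for a hard threshold (output 1 when the weighted sum is positive, else 0) gives the classic perceptron unit.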

Page 12: Uvrgrp ml

Multilayer Feed-forward Neural Network
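A feed-forward pass through two layers can be sketched with hand-picked weights. This Python illustration is not the neuralnet package and involves no training: it assumes threshold units and weights chosen by hand so that the network computes XOR, a function a single weighted-sum unit cannot represent. All names are hypothetical.

```python
def step(z):
    """Threshold activation: fire (1) when the weighted sum is positive."""
    return 1 if z > 0 else 0


def layer(inputs, weights, biases):
    """One layer: each unit takes a weighted sum of all inputs, then applies step."""
    return [step(b + sum(w * x for w, x in zip(ws, inputs)))
            for ws, b in zip(weights, biases)]


def xor_network(x1, x2):
    # Hidden layer: unit 0 computes OR, unit 1 computes AND (hand-picked weights).
    hidden = layer([x1, x2], weights=[[1, 1], [1, 1]], biases=[-0.5, -1.5])
    # Output layer: fires when OR is on but AND is off.
    return layer(hidden, weights=[[1, -1]], biases=[-0.5])[0]


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

In practice the weights are learned from data and smooth activations such as the logistic function replace the hard threshold, but the layer-by-layer flow of values is the same.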

Page 13: Uvrgrp ml

R’s Popularity

Tools mentioned in Kaggle user profiles

From blog entry by Ben Hammer: http://blog.kaggle.com/2011/11/27/kagglers-favorite-tools/

Page 14: Uvrgrp ml

Summary of Recent Competition Winners

Competition    Position  Algorithm  Other Algs.        Tools
Adzuna Salary  1st       NN*        -                  Python GPU
Adzuna Salary  2nd       NN         -                  C++
Adzuna Salary  3rd       NN         NB, SVM, LR        Python
Merck          1st       NN*        -                  Python GPU
Merck          2nd       GBM & SVM  RF, PCA, KNN, SVM  R & Python
Merck          3rd       RF & SVM   GBM, NN            R

Page 15: Uvrgrp ml

Learning More

• Pedro Domingos at University of Washington

• www.coursera.org/course/machlearning

• www.coursera.org/uw

• A Few Useful Things to Know about Machine Learning. Communications of the ACM

• homes.cs.washington.edu/~pedrod

• blog.kaggle.com

• ufldl.stanford.edu/wiki/