Uvrgrp ml

David Callender

• Finished in the top 2% (18th out of more than 1,300) in a 3-year, $3 million machine learning competition

• Studied disease propagation in an urban setting using probabilistic graphical models at Dartmouth College

• Studied computational protein design at the University of Washington

• Studied the mathematical foundations of quantum mechanics at Macalester College

Machine Learning in R circa 2013

David Callender

a.k.a. Using R on Kaggle

• Who will end up in the hospital

• Drug effectiveness

• Computer security: determining employee access needs

• What will the salary be for a given job advertisement?

Not Just Kaggle

• Movie recommendations

• Popular productions

• Product recommendations

• Good business opportunities

• The entire Internet

• Probably a lot more too

Talk Outline

• Motivation

• Concepts

• Algorithms
  • Decision trees and forests
  • Neural networks

• Kaggle

• Interactive session with R packages
  • randomForest
  • gbm
  • neuralnet

Supervised Learning

Predicting survival for passengers of Titanic

Train the model with examples where you know the value of “survived”:

Survived  Pclass  Sex     Age   SibSp  Parch  Fare     Embarked
0         3       male    22    1      0      7.25     S
1         1       female  38    1      0      71.2833  C
1         3       female  26    0      0      7.925    S
1         1       female  35    1      0      53.1     S
0         3       male    35    0      0      8.05     S
0         3       male    33    0      0      8.4583   Q
0         1       male    54    0      0      51.8625  S
0         3       male    2     3      1      21.075   S
1         3       female  27    0      2      11.1333  S
1         2       female  14    1      0      30.0708  C

Use the model to predict the value of “survived”:

Survived  Pclass  Sex     Age   SibSp  Parch  Fare     Embarked
?         3       male    34.5  0      0      7.8292   Q
?         3       female  47    1      0      7        S
?         2       male    62    0      0      9.6875   Q
?         3       male    27    0      0      8.6625   S
?         3       female  22    1      1      12.2875  S
?         3       male    14    0      0      9.225    S
?         3       female  30    0      0      7.6292   Q
?         2       male    26    1      1      29       S
?         3       female  18    0      0      7.2292   C
?         3       male    21    2      0      24.15    S

Column types: binary (Survived), numeric (e.g. Age, Fare) and categorical (e.g. Sex, Embarked).
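The train-then-predict workflow on this slide can be sketched in a few lines of Python (the talk itself uses R). This is a minimal illustration under simple assumptions, not a model from the talk: it "trains" by recording the survival rate for each value of one column (Sex here) over the ten labelled rows, then predicts “survived” = 1 for any test passenger whose group rate is at least 0.5. The names `fit_group_rates` and `predict` are hypothetical.

```python
from collections import defaultdict

COLUMNS = ("Survived", "Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked")

# Labelled training rows from the slide (Survived is known).
TRAIN = [
    (0, 3, "male", 22, 1, 0, 7.25, "S"),
    (1, 1, "female", 38, 1, 0, 71.2833, "C"),
    (1, 3, "female", 26, 0, 0, 7.925, "S"),
    (1, 1, "female", 35, 1, 0, 53.1, "S"),
    (0, 3, "male", 35, 0, 0, 8.05, "S"),
    (0, 3, "male", 33, 0, 0, 8.4583, "Q"),
    (0, 1, "male", 54, 0, 0, 51.8625, "S"),
    (0, 3, "male", 2, 3, 1, 21.075, "S"),
    (1, 3, "female", 27, 0, 2, 11.1333, "S"),
    (1, 2, "female", 14, 1, 0, 30.0708, "C"),
]

# Unlabelled test rows from the slide (Survived unknown, stored as None).
TEST = [
    (None, 3, "male", 34.5, 0, 0, 7.8292, "Q"),
    (None, 3, "female", 47, 1, 0, 7, "S"),
    (None, 2, "male", 62, 0, 0, 9.6875, "Q"),
    (None, 3, "male", 27, 0, 0, 8.6625, "S"),
    (None, 3, "female", 22, 1, 1, 12.2875, "S"),
    (None, 3, "male", 14, 0, 0, 9.225, "S"),
    (None, 3, "female", 30, 0, 0, 7.6292, "Q"),
    (None, 2, "male", 26, 1, 1, 29, "S"),
    (None, 3, "female", 18, 0, 0, 7.2292, "C"),
    (None, 3, "male", 21, 2, 0, 24.15, "S"),
]


def fit_group_rates(rows, column):
    """'Training': the survival rate for each value of one column."""
    idx = COLUMNS.index(column)
    counts = defaultdict(lambda: [0, 0])  # value -> [survivors, passengers]
    for row in rows:
        counts[row[idx]][0] += row[0]
        counts[row[idx]][1] += 1
    return {value: s / n for value, (s, n) in counts.items()}


def predict(rates, rows, column, default=0.0):
    """'Prediction': 1 if the passenger's group survived at a rate >= 0.5."""
    idx = COLUMNS.index(column)
    return [int(rates.get(row[idx], default) >= 0.5) for row in rows]


rates = fit_group_rates(TRAIN, "Sex")
print(rates)                       # {'male': 0.0, 'female': 1.0}
print(predict(rates, TEST, "Sex"))  # [0, 1, 0, 0, 1, 0, 1, 0, 1, 0]
```

On this tiny sample every woman survived and every man died, so the baseline simply predicts by sex; real models use many more rows and columns.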

Overfitting

[Figure: overfitting on training set data, by Tomaso Poggio. Source: http://en.wikipedia.org/wiki/File:Overfitting_on_Training_Set_Data.pdf]
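A toy illustration of overfitting, using hypothetical data rather than the data behind the figure: five training points follow the trend y = 2x plus alternating noise of ±0.5. A straight line fit by least squares keeps some training error, while the degree-4 polynomial through all five points (Lagrange interpolation) has zero training error but is far off at held-out points, which are scored here against the noise-free trend.

```python
def fit_line(xs, ys):
    """Least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def lagrange(xs, ys, x):
    """Evaluate the unique degree-(n-1) polynomial passing through all n points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


# Hypothetical data: trend y = 2x plus alternating noise of +/-0.5.
train_x = [0, 1, 2, 3, 4]
train_y = [2 * x + (0.5 if x % 2 == 0 else -0.5) for x in train_x]
# Held-out points, scored against the noise-free trend.
test_x = [-1, 5]
test_y = [2 * x for x in test_x]

slope, intercept = fit_line(train_x, train_y)


def line(x):
    return slope * x + intercept


def poly(x):
    return lagrange(train_x, train_y, x)


for name, model in (("line", line), ("degree-4 polynomial", poly)):
    train_err = mse([model(x) for x in train_x], train_y)
    test_err = mse([model(x) for x in test_x], test_y)
    print(f"{name}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

With these numbers the line reports a training MSE of 0.240 and a test MSE of 0.010, while the polynomial reports 0.000 and 240.250: the more flexible model memorizes the noise.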

Decision Trees

[Figure: CART decision tree for Titanic survivors, by Stephen Milborrow, made using R. Source: http://en.wikipedia.org/wiki/File:CART_tree_titanic_survivors.png]

Survived  Pclass  Sex     Age   SibSp  Parch  Fare     Embarked
?         3       male    34.5  0      0      7.8292   Q
?         3       female  47    1      0      7        S
?         2       male    62    0      0      9.6875   Q
?         3       male    27    0      0      8.7      S
?         3       female  22    1      1      12.2875  S
?         3       male    14    0      0      9.225    S
?         3       female  30    0      0      7.6292   Q
?         2       male    26    1      1      29       S
?         3       female  18    0      0      7.2292   C
?         3       male    21    2      0      24.15    S
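CART-style trees like the one in the figure are grown by repeatedly choosing the split that makes the child nodes purest. A minimal sketch of choosing only the first split, assuming Gini impurity as the purity measure (one common choice for classification trees) and using the ten labelled training rows from the Supervised Learning slide. Real implementations, such as the rpart package in R, recurse on each child and prune; this sketch does neither, and `best_split` is a hypothetical name.

```python
COLUMNS = ("Survived", "Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked")

# The ten labelled training rows from the Supervised Learning slide.
TRAIN = [
    (0, 3, "male", 22, 1, 0, 7.25, "S"),
    (1, 1, "female", 38, 1, 0, 71.2833, "C"),
    (1, 3, "female", 26, 0, 0, 7.925, "S"),
    (1, 1, "female", 35, 1, 0, 53.1, "S"),
    (0, 3, "male", 35, 0, 0, 8.05, "S"),
    (0, 3, "male", 33, 0, 0, 8.4583, "Q"),
    (0, 1, "male", 54, 0, 0, 51.8625, "S"),
    (0, 3, "male", 2, 3, 1, 21.075, "S"),
    (1, 3, "female", 27, 0, 2, 11.1333, "S"),
    (1, 2, "female", 14, 1, 0, 30.0708, "C"),
]


def gini(labels):
    """Gini impurity of a non-empty list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows):
    """Try every single-column split; return (weighted Gini, column, rule) of the purest."""
    best = None
    for idx, column in enumerate(COLUMNS[1:], start=1):
        values = sorted({row[idx] for row in rows})
        numeric = all(isinstance(v, (int, float)) for v in values)
        for v in values:
            # Numeric columns split on a threshold; categorical ones on equality.
            goes_left = [(row[idx] <= v) if numeric else (row[idx] == v) for row in rows]
            left = [row[0] for row, g in zip(rows, goes_left) if g]
            right = [row[0] for row, g in zip(rows, goes_left) if not g]
            if not left or not right:
                continue  # not a real split
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                rule = f"{column} <= {v}" if numeric else f"{column} == {v!r}"
                best = (score, column, rule)
    return best


print(best_split(TRAIN))
```

On this sample only the Sex split produces two perfectly pure groups (weighted Gini 0), which is consistent with the figure's tree starting from sex.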

Random Forest (RF)

[Diagram: during training, many trees are built from the Titanic training data using bagging (bootstrap samples of the rows) and random sub-spaces (random subsets of the columns); for prediction, the trees' outputs are combined by voting/averaging.]
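The training and prediction halves of the diagram can be sketched as follows, under several simplifying assumptions: each tree is a single-split stump, Embarked is dropped and Sex is encoded as 1 for female, and each tree draws one random subset of columns (for a stump the single split is the only node, so this matches choosing columns at each split). The function names and parameters (`n_trees`, `n_features`, `seed`) are hypothetical; R's randomForest package grows much deeper trees and is the practical tool.

```python
import random
from collections import Counter

# Titanic training rows from the slide, encoded numerically (Embarked dropped):
# features = (Pclass, Female [1 = female], Age, SibSp, Parch, Fare), label = Survived
TRAIN = [
    ((3, 0, 22, 1, 0, 7.25), 0),
    ((1, 1, 38, 1, 0, 71.2833), 1),
    ((3, 1, 26, 0, 0, 7.925), 1),
    ((1, 1, 35, 1, 0, 53.1), 1),
    ((3, 0, 35, 0, 0, 8.05), 0),
    ((3, 0, 33, 0, 0, 8.4583), 0),
    ((1, 0, 54, 0, 0, 51.8625), 0),
    ((3, 0, 2, 3, 1, 21.075), 0),
    ((3, 1, 27, 0, 2, 11.1333), 1),
    ((2, 1, 14, 1, 0, 30.0708), 1),
]


def majority(labels):
    """Most common label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def gini(labels):
    """Gini impurity of a non-empty list of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(rows, feature_ids):
    """Best single threshold split (by weighted Gini), using only `feature_ids`."""
    labels = [y for _, y in rows]
    best = (gini(labels), None, None, majority(labels), majority(labels))
    for f in feature_ids:
        for t in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            if not right:
                continue  # threshold at the maximum: not a real split
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    return best[1:]  # (feature, threshold, left_label, right_label)


def stump_predict(stump, x):
    f, t, left_label, right_label = stump
    if f is None:  # no useful split found: constant prediction
        return left_label
    return left_label if x[f] <= t else right_label


def fit_forest(rows, n_trees=25, n_features=2, seed=0):
    """Bagging + random sub-spaces: each stump sees a bootstrap sample and a random column subset."""
    rng = random.Random(seed)
    n_cols = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]         # bootstrap sample of rows
        features = rng.sample(range(n_cols), n_features)  # random subset of columns
        forest.append(fit_stump(sample, features))
    return forest


def forest_predict(forest, x):
    """Majority vote over the trees (a regression forest would average instead)."""
    return majority(stump_predict(s, x) for s in forest)


forest = fit_forest(TRAIN)
# Two test passengers from the slide, encoded the same way.
test_rows = [(3, 0, 34.5, 0, 0, 7.8292), (3, 1, 47, 1, 0, 7.0)]
preds = [forest_predict(forest, x) for x in test_rows]
print(preds)
```

An odd number of trees avoids tied votes for a two-class problem.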

AdaBoost & Gradient Boosting

• Initialize a set of weights, one for each training example, all with equal value

• Train a tree with the weighted training examples

• Add the tree to the set of trees

• Make predictions with the set of trees

• Adjust the weights so that the training examples you got wrong have more weight

• Repeat
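A minimal sketch of the loop above, assuming discrete AdaBoost with one-split trees (decision stumps) on one-dimensional data labelled ±1; the data set and function names are hypothetical. In this standard form the reweighting uses the errors of the newest tree, and the final prediction is the sign of the alpha-weighted vote of all trees. Gradient boosting (as in the gbm package) keeps the same add-a-tree-and-repeat loop but fits each new tree to the gradient of a loss function instead of reweighting examples; that variant is not shown.

```python
import math


def stump_predict(t, polarity, x):
    """A one-split tree: predict `polarity` if x > t, otherwise -polarity."""
    return polarity if x > t else -polarity


def fit_stump(xs, ys, w):
    """Pick the threshold/polarity with the lowest weighted training error."""
    best = None
    for t in [min(xs) - 1] + sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if stump_predict(t, polarity, xi) != yi)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best


def adaboost(xs, ys, n_rounds):
    """Discrete AdaBoost on 1-D inputs with labels +1/-1; returns [(alpha, t, polarity)]."""
    n = len(xs)
    w = [1.0 / n] * n                                # equal initial weights
    model = []
    for _ in range(n_rounds):
        err, t, polarity = fit_stump(xs, ys, w)      # train on weighted examples
        err = min(max(err, 1e-10), 1 - 1e-10)        # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)      # this tree's vote weight
        model.append((alpha, t, polarity))           # add it to the set of trees
        # Examples this tree got wrong gain weight; the rest lose weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(t, polarity, xi))
             for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    """Sign of the alpha-weighted vote of all trees in the ensemble."""
    score = sum(alpha * stump_predict(t, p, x) for alpha, t, p in model)
    return 1 if score > 0 else -1


# Hypothetical data: positives at both ends, so no single stump can fit it.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
print([predict(adaboost(xs, ys, 2), x) for x in xs])  # [1, 1, -1, -1, -1, -1]
print([predict(adaboost(xs, ys, 3), x) for x in xs])  # [1, 1, -1, -1, 1, 1]
```

After two rounds the ensemble still misclassifies the right-hand positives; the third round focuses on them because their weights have grown, and the combined vote then fits all six points.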

Logistic Regression a.k.a. the Perceptron

[Diagram: a weighted sum of the inputs, passed through an activation function]
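The weighted-sum-then-activation unit can be sketched as below, assuming a logistic (sigmoid) activation trained by full-batch gradient descent on log-loss; the data and the learning-rate and epoch settings are hypothetical. Strictly speaking, Rosenblatt's perceptron uses a step activation and its own update rule; with a sigmoid activation and log-loss training, the same unit is logistic regression.

```python
import math


def sigmoid(z):
    """Logistic activation function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(xs, ys, lr=0.5, epochs=2000):
    """Full-batch gradient descent on the mean log-loss."""
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            err = predict_proba(w, b, x) - y  # gradient of log-loss w.r.t. the weighted sum
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical one-feature data: label 1 once x is large enough.
xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [0, 0, 1, 1]
w, b = train(xs, ys)
print([round(predict_proba(w, b, x), 3) for x in xs])
print([int(predict_proba(w, b, x) >= 0.5) for x in xs])  # [0, 0, 1, 1]
```

The output is a probability rather than a hard yes/no, which is why logistic regression is often preferred over a step activation when calibrated scores are useful.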

Multilayer Feed-forward Neural Network
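A forward pass through a small multilayer network can be sketched as layers of the weighted-sum-plus-activation units from the previous slide. This is a minimal sketch with hand-picked weights (hypothetical, not learned by training) for a 2-2-1 network that computes XOR, a function no single logistic unit can represent because it is not linearly separable. Packages such as neuralnet instead learn the weights from data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(weights, biases, inputs):
    """One fully connected layer: each unit takes a weighted sum, then applies sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# Hand-picked (not learned) weights for a 2-2-1 network computing XOR.
HIDDEN_W = [[20.0, 20.0], [-20.0, -20.0]]  # unit 1 acts like OR, unit 2 like NAND
HIDDEN_B = [-10.0, 30.0]
OUTPUT_W = [[20.0, 20.0]]                  # output acts like AND of the hidden units
OUTPUT_B = [-30.0]


def forward(x):
    """Feed the inputs forward: hidden layer first, then the single output unit."""
    hidden = layer(HIDDEN_W, HIDDEN_B, x)
    return layer(OUTPUT_W, OUTPUT_B, hidden)[0]


for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, round(forward(x)))
# [0, 0] 0
# [0, 1] 1
# [1, 0] 1
# [1, 1] 0
```

The hidden layer re-represents the inputs (as OR and NAND) so that the output unit only needs a linear boundary.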

R’s Popularity

[Chart: tools mentioned in Kaggle user profiles. From a blog entry by Ben Hamner: http://blog.kaggle.com/2011/11/27/kagglers-favorite-tools/]

Summary of Recent Competition Winners

Competition     Position  Algorithm  Other Algs.         Tools
Adzuna Salary   1st       NN*        -                   Python GPU
Adzuna Salary   2nd       NN         -                   C++
Adzuna Salary   3rd       NN         NB, SVM, LR         Python
Merck           1st       NN*        -                   Python GPU
Merck           2nd       GBM & SVM  RF, PCA, KNN, SVM   R & Python
Merck           3rd       RF & SVM   GBM, NN             R

Learning More

• Pedro Domingos at the University of Washington
  • www.coursera.org/course/machlearning
  • www.coursera.org/uw
  • “A Few Useful Things to Know about Machine Learning,” Communications of the ACM
  • homes.cs.washington.edu/~pedrod

• blog.kaggle.com

• ufldl.stanford.edu/wiki/