QCon Rio - Machine Learning for Everyone

67
@dhianadeva MACHINE LEARNING FOR EVERYONE Demystifying machine learning!

Transcript of QCon Rio - Machine Learning for Everyone

Page 1: QCon Rio - Machine Learning for Everyone

@dhianadeva

MACHINE LEARNINGFOR EVERYONE

Demystifying machine learning!

Page 2: QCon Rio - Machine Learning for Everyone

AGENDA

Goal:

Encourage you to start a machine learning project. Today!

● About me● About you● Machine Learning● Problems● Design● Algorithms● Evaluation● Code snippets● Pay-as-you-go● Competitions

Page 3: QCon Rio - Machine Learning for Everyone

ABOUT MEElectronics Engineering, Software Development and Data Science… Why not?

Page 4: QCon Rio - Machine Learning for Everyone

DHIANA DEVA

Page 5: QCon Rio - Machine Learning for Everyone

NEURALTB

Page 6: QCon Rio - Machine Learning for Everyone

CERN

Page 7: QCon Rio - Machine Learning for Everyone

NEURALRINGER

Page 8: QCon Rio - Machine Learning for Everyone

DJBRAZIL

Page 9: QCon Rio - Machine Learning for Everyone

HIGGS CHALLENGE

Page 10: QCon Rio - Machine Learning for Everyone

ABOUT YOUYou can do it!

Page 11: QCon Rio - Machine Learning for Everyone

FOR ALL

Page 12: QCon Rio - Machine Learning for Everyone

MASSIVE ONLINEOPEN COURSES

Page 13: QCon Rio - Machine Learning for Everyone

OPEN SOURCE TOOLS

Page 14: QCon Rio - Machine Learning for Everyone

OPEN SOURCEPYTHON TOOLS

Page 15: QCon Rio - Machine Learning for Everyone

PAY-AS-YOU-GO SERVICES

Page 16: QCon Rio - Machine Learning for Everyone

MACHINE LEARNINGLearning, machine learning!

Page 17: QCon Rio - Machine Learning for Everyone

EXPECTATIONS

Page 18: QCon Rio - Machine Learning for Everyone

REALITY

Page 19: QCon Rio - Machine Learning for Everyone

FEATUREEXTRACTION

Item { Feature 1Feature 2…Feature N

Page 20: QCon Rio - Machine Learning for Everyone

FEATURESPACE

Page 21: QCon Rio - Machine Learning for Everyone

SUPERVISEDLEARNING

458316,86.513,24.312,64.983,65.8

58317,135.493,2.204,101.966,46.5

458318,-999.0,91.803,113.007,120.

Feature VectorsItems

New item

Machine Learning Algorithm

458316,86.513,24.312,64.983,65.8

Feature Vector

133

Expected Label

Predictive Model

150

623

74

Labels

Page 22: QCon Rio - Machine Learning for Everyone

UNSUPERVISEDLEARNING

Items

New item

Machine Learning Algorithm

458316,86.513,24.312,64.983,65.8

Feature VectorPredictive Model

Better Representation

458316,86.513,24.312,64.983,65.8

58317,135.493,2.204,101.966,46.5

458318,-999.0,91.803,113.007,120.

Feature Vectors

Page 23: QCon Rio - Machine Learning for Everyone

MODELS

Page 24: QCon Rio - Machine Learning for Everyone

BIOLOGICAL MOTIVATION

Page 25: QCon Rio - Machine Learning for Everyone

PROBLEMSI've got 99 problems, but machine learning ain't one!

Page 26: QCon Rio - Machine Learning for Everyone

CLASSIFICATION

?

A

B

Page 27: QCon Rio - Machine Learning for Everyone

CLASSIFICATION

Page 28: QCon Rio - Machine Learning for Everyone

REGRESSION

? 8 15 7 1

11 13 6 3

Page 29: QCon Rio - Machine Learning for Everyone

REGRESSION

Page 30: QCon Rio - Machine Learning for Everyone

CLUSTERING

Page 31: QCon Rio - Machine Learning for Everyone

CLUSTERING

Page 32: QCon Rio - Machine Learning for Everyone

DIMENSIONALITY REDUCTION

Page 33: QCon Rio - Machine Learning for Everyone

DIMENSIONALITY REDUCTION

Page 34: QCon Rio - Machine Learning for Everyone

DESIGN DECISIONS1, 2 steps!

Page 35: QCon Rio - Machine Learning for Everyone

NORMALIZATION

● min-max

● z-score

Page 36: QCon Rio - Machine Learning for Everyone

TRAINING

Page 37: QCon Rio - Machine Learning for Everyone

REGULARIZATION

Page 38: QCon Rio - Machine Learning for Everyone

RELEVANCE ANALYSIS

Page 39: QCon Rio - Machine Learning for Everyone

CROSS VALIDATION

Page 40: QCon Rio - Machine Learning for Everyone

ALGORITHMSCheat sheet included!

Page 41: QCon Rio - Machine Learning for Everyone

CHEAT SHEET

Page 42: QCon Rio - Machine Learning for Everyone

CHEAT SHEET

Page 43: QCon Rio - Machine Learning for Everyone

ALGORITHMS PT. I

Linear Regression

Decision Trees Random Forest

Page 44: QCon Rio - Machine Learning for Everyone

ALGORITHMS PT. II

K-Nearest Neighbors K-Means

Page 45: QCon Rio - Machine Learning for Everyone

NEURAL NETWORKS

Page 46: QCon Rio - Machine Learning for Everyone

SELF-ORGANIZING MAPS

Page 47: QCon Rio - Machine Learning for Everyone

PRINCIPAL COMPONENTS ANALYSIS

Page 48: QCon Rio - Machine Learning for Everyone

T-SNE

Page 49: QCon Rio - Machine Learning for Everyone

EVALUATIONHow you doin'?

Page 50: QCon Rio - Machine Learning for Everyone

PRECISION AND ACCURACY

Page 51: QCon Rio - Machine Learning for Everyone

CONFUSION MATRIX

Precision =TP

Recall =

F1-score =2 * precision * recall

precision + recall

TP + FP

TP

TP + FN

TP = True PositivesTN = True NegativesFP = False PositivesFN = False Negatives

Page 52: QCon Rio - Machine Learning for Everyone

ROC CURVE

Page 53: QCon Rio - Machine Learning for Everyone

A/B TESTS

Page 54: QCon Rio - Machine Learning for Everyone

Code Snippets"Hello, Machine Learning"

Page 55: QCon Rio - Machine Learning for Everyone

MATLAB 101

[x,y] = ovarian_dataset;net = patternnet(5);[net,tr] = train(net,x,y);testX = x(:,tr.testInd);testY = net(testX);

Page 56: QCon Rio - Machine Learning for Everyone

MATLAB 201

net = patternnet(14);net.input.processFcns = {'mapminmax', 'fixunknowns', 'processpca'};net.inputs{1}.processParams{3}.maxfrac = 0.02;net.trainFcn = 'trainlm';net.performFcn = 'mse';net.divideParam.trainRatio = 70/100;net.divideParam.valRatio = 15/100;net.divideParam.testRatio = 15/100;[net, tr] = train(net_config, test_inputs, train_targets);outputs = net(test_inputs);

Page 57: QCon Rio - Machine Learning for Everyone

R

library(randomForest)raw.orig < - read.csv(file="train.txt", header=T, sep="\t")frmla = Metal ~ OTW + AirDecay + Kocfit.rf = randomForest(frmla, data=raw)print(fit.rf)importance(fit.rf)

Page 58: QCon Rio - Machine Learning for Everyone

SCIKIT LEARN

dataset = pd.read_csv('Data/train.csv')target = dataset.Activity.valuestrain = dataset.drop('Activity', axis=1).valuestest = pd.read_csv('Data/test.csv').valuesrf = RandomForestClassifier(n_estimators=100, n_jobs=-1)rf.fit(train, target)predicted_probs = [x[1] for x in rf.predict_proba(test)]importances = rf.feature_importances_

Page 59: QCon Rio - Machine Learning for Everyone

PAY-AS-YOU-GOSERVICESAmazon Machine Learning

Page 60: QCon Rio - Machine Learning for Everyone

AMAZONMACHINE LEARNING

Five easy steps1. Upload csv dataset to Amazon S32. Create Datasource with metadata about uploaded

dataset3. Create ML Model with configurations for model training4. Create Evaluation to analyse and tune model efficiency5. Create Prediction to use trained model with new data

Page 61: QCon Rio - Machine Learning for Everyone

EVALUATION

Page 62: QCon Rio - Machine Learning for Everyone

SDKs

Page 63: QCon Rio - Machine Learning for Everyone

DATA SCIENCE COMPETITIONSChallenge accepted!

Page 64: QCon Rio - Machine Learning for Everyone

KAGGLE

Page 65: QCon Rio - Machine Learning for Everyone

SPONSORED

Page 66: QCon Rio - Machine Learning for Everyone

END TO END

TRAINTRAIN.CSV

TEST.CSV

TRAINED.DAT

RUN SOLUTION.CSV

Page 67: QCon Rio - Machine Learning for Everyone

THANK YOUQuestions?

Dhiana Deva

[email protected]