Post on 21-Apr-2017
ANP126
Machine Learning: Hype or Hit?
Fred Verheul
2
Agenda
1. Introduction: Hype or Hit?!
2. Machine Learning
3. Demo, SAP ICN
4. Skill set for aspiring ML experts
5. Take-aways
4
Machine Learning
"Field of study that gives computers the ability to learn without being explicitly programmed" (Arthur Samuel, 1959)
5
What is Machine Learning?
Traditional Programming: Data + Program → Computer → Output
Machine Learning: Data + Output → Computer → Program
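The contrast in the diagram above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the spam rule, the `num_links` feature, and the training samples are all hypothetical and not taken from the talk.

```python
# Traditional programming: a human writes the rule (the "program").
def is_spam_rule(num_links):
    return num_links > 3  # threshold picked by hand (hypothetical)


# Machine learning: the rule is derived from data plus known outputs.
def learn_threshold(samples):
    """Return the threshold that misclassifies the fewest training samples."""
    candidates = sorted({x for x, _ in samples})

    def errors(threshold):
        return sum((x > threshold) != label for x, label in samples)

    return min(candidates, key=errors)


# Hypothetical training data: (number of links, is it spam?)
training = [(0, False), (1, False), (2, False), (5, True), (7, True), (9, True)]
threshold = learn_threshold(training)


def is_spam_learned(num_links):
    return num_links > threshold
```

In the first function the threshold is an input supplied by the programmer; in the second it is an output computed from the examples.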
6
Examples: Recommender systems
7
Examples: Natural Language Processing
Siri
Google Translate
8
Examples, continued…
Spam filtering
Handwriting recognition
9
ML in the news: IBM Watson
10
ML in the news: DeepMind’s AlphaGo
11
ML in the news: business example
12
Vendor Platforms…
13
Tricking a neural network…
A cat!
Surely also a cat?!
More examples and explanation by Julia Evans (@b0rk)
14
Machine Learning gone wrong
15
Data Mining Fail (by Carina C. Zona)
16
Prediction is hard…
18
CRISP-DM: data mining process
(Figure: CRISP-DM process diagram; two of its phases are annotated "ML important")
19
Data: terminology
• Feature: an input column of the data table
• Target / label: the column to be predicted
• Instance: one row of the data table
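As a small illustration of this terminology, the sketch below splits a table into features and target. The housing data and column names are hypothetical, chosen only to make the roles visible.

```python
# Each row is one instance; the last column is the target (label).
# Hypothetical columns: size in m2, number of rooms, price in thousands.
rows = [
    (50, 2, 150),
    (80, 3, 240),
    (120, 5, 360),
]

X = [row[:-1] for row in rows]  # features of each instance
y = [row[-1] for row in rows]   # target of each instance
```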
20
Examples of ML tasks
Supervised learning
• Regression: target is numeric
• Classification: target is categorical
Unsupervised learning
• Clustering
• Dimensionality reduction
21
Exploratory Data Analysis
22
Data preparation
• Data Cleaning
• Missing Data
• Feature Engineering
  • Normalization
  • Categorical data → numerical features
  • Log-based features or target
  • Date/time-related features
  • Combine features, e.g. by +, -, ×, /
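The feature engineering steps listed above might look roughly like this in plain Python. All values and variable names are made up for illustration, and min-max scaling and one-hot encoding are just one possible choice for normalization and for turning categories into numbers.

```python
import math
from datetime import date


def min_max(values):
    """Normalization: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Categorical to numerical: one 0/1 column per category."""
    categories = sorted(set(values))
    return [[int(v == c) for c in categories] for v in values]


# Hypothetical raw columns
ages = [20, 30, 60]
colours = ["red", "blue", "red"]
incomes = [1_000, 10_000, 100_000]
signup_dates = [date(2017, 4, 21), date(2017, 4, 22), date(2017, 4, 24)]

scaled_ages = min_max(ages)
encoded_colours = one_hot(colours)                      # columns: blue, red
log_incomes = [math.log10(v) for v in incomes]          # log-based feature
weekdays = [d.weekday() for d in signup_dates]          # Monday is 0
income_per_year = [i / a for i, a in zip(incomes, ages)]  # combined feature
```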
23
Modeling: so many algorithms…
24
ML Algorithms: by Representation
Representation: collection of candidate models/programs, aka hypothesis space
Decision trees
Instance-based
Neural networks
Model ensembles
ML Algorithms: by Evaluation
Evaluation: Quality measure for a model
25
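Regression
Example metric: Root Mean Squared Error
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}, with y_i the actual target and \hat{y}_i the predicted value for instance i
Binary classification: confusion matrix
Example: medical test for a disease
• True positives: 8, false positives: 19
• False negatives: 2, true negatives: 971
Accuracy: (8 + 971) / 1000 = 97.9%
Better evaluation metrics:
• Precision: 8 / (8 + 19) ≈ 29.6%
• Recall: 8 / (8 + 2) = 80%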
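The metrics on this slide can be checked with a short sketch. The counts are read off the fractions above (8 true positives, 19 false positives, 2 false negatives, 971 true negatives); the `rmse` helper uses the standard definition of Root Mean Squared Error, which is assumed here rather than copied from the slide.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the mean squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Confusion matrix counts for the medical test example
tp, fp, fn, tn = 8, 19, 2, 971

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)  # of all positive tests, how many are truly ill
recall = tp / (tp + fn)     # of all ill patients, how many test positive
```

Accuracy looks excellent because healthy patients dominate the data, while precision shows that most positive test results are false alarms.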
26
ML Algorithms: by Optimization
Optimization: how the algorithm ‘learns’; depends on representation and evaluation
• Greedy search (an example of combinatorial optimization)
• Gradient descent (or in general: convex optimization)
• Linear programming (or in general: constrained/nonlinear optimization)
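As an illustration of gradient descent, the sketch below fits a single slope w in y = w * x by repeatedly stepping against the gradient of the mean squared error. The data, the learning rate, and the number of steps are assumptions chosen so that this toy problem converges; they are not from the talk.

```python
def fit_slope(xs, ys, learning_rate=0.05, steps=100):
    """Minimize mean((w * x - y)^2) over w with plain gradient descent."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Derivative of the mean squared error with respect to w
        gradient = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * gradient
    return w


# Hypothetical data generated by y = 2 * x
w = fit_slope([1, 2, 3], [2, 4, 6])
```

Here the evaluation measure (mean squared error) determines the gradient, which is one way to see why the optimization method depends on both representation and evaluation.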
27
Algorithms by Optimization: Heuristics
• Hill climbing
• Simulated Annealing
• Nelder-Mead Simplex Method
• Artificial Bee Colony Optimization
• Genetic Algorithms
• Particle Swarm Optimization
• Ant Colony Optimization
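Hill climbing, the first heuristic in the list above, is simple enough to sketch in full. The score function and the neighbourhood (one step left or right on the integers) are hypothetical choices for illustration.

```python
def hill_climb(score, start, neighbours, max_steps=1000):
    """Move to the best neighbour until no neighbour improves the score."""
    current = start
    for _ in range(max_steps):
        best = max(neighbours(current), key=score)
        if score(best) <= score(current):
            return current  # local optimum: no neighbour is better
        current = best
    return current


def score(x):
    return -(x - 7) ** 2  # single peak at x = 7


result = hill_climb(score, start=0, neighbours=lambda x: [x - 1, x + 1])
```

On a function with several peaks this procedure can get stuck in a local optimum, which is the problem that heuristics such as simulated annealing try to mitigate.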
28
Choice of ML-algorithm, considerations
• Size & Dimensionality of training set
• Computational efficiency
• Model building, number of parameters
  • Eager vs lazy learning
  • Online vs batch
• Interpretability
29
Evaluation: training vs test data
5-fold cross validation
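A minimal sketch of how 5-fold cross validation splits the data: every instance lands in the test set exactly once and in the training set the other four times. The helper name `k_fold_indices` is hypothetical, and the sketch does not shuffle, which real data usually requires.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, k=5))
```

A model would be trained on each `train` part and scored on the matching `test` part, and the five scores averaged.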
30
Training error vs test error
31
Overfitting
32
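Distance metrics
(Figure: the two points P and Q, (2,2) and (6,5), on a grid with x from 0 to 9 and y from 0 to 8, connected by the straight line through both points, the horizontal segment y = 2 from x = 2 to x = 6, and the vertical segment x = 6 from y = 2 to y = 5.)
Euclidean distance (L2-norm: || ||2 )
\lVert P - Q \rVert_2 = \sqrt{(x_P - x_Q)^2 + (y_P - y_Q)^2} (length of the straight line between P and Q)
Manhattan distance (L1-norm: || ||1 )
\lVert P - Q \rVert_1 = |x_P - x_Q| + |y_P - y_Q| (horizontal segment plus vertical segment)
Chebyshev distance (L∞-norm: || ||∞ )
\lVert P - Q \rVert_\infty = \max(|x_P - x_Q|, |y_P - y_Q|) (the longer of the two segments)
Number of moves of a King on a chessboard ;-)
Many more: Cosine distance, Edit distance (aka Levenshtein distance), …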
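The three distance metrics can be compared on the two points from the figure, (2,2) and (6,5). This is a small sketch using the standard definitions; which point is labelled P and which Q is an assumption, and it does not affect the results.

```python
import math


def euclidean(p, q):
    """L2-norm: straight-line distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """L1-norm: sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def chebyshev(p, q):
    """L-infinity norm: largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(p, q))


P, Q = (2, 2), (6, 5)
distances = (euclidean(P, Q), manhattan(P, Q), chebyshev(P, Q))
```

The horizontal difference is 4 and the vertical difference is 3, so the same pair of points is 5, 7, or 4 apart depending on the metric.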
35
So you want to be a Data Scientist?
36
CRISP-DM: data mining process
37
Hacking skills
• Programming languages:
• Libraries (examples):
  • TensorFlow, Caffe, Theano, Keras
  • SciPy & scikit-learn
  • Spark MLlib (Scala/Java/Python)
39
More math skills that may be needed…
• Calculus
• Linear Algebra
40
Data Science for Business
• Focuses more on general principles than specific algorithms
• Not math-heavy, does contain some math
• O’Reilly link: http://shop.oreilly.com/product/0636920028918.do
• Book website: http://data-science-for-biz.com/DSB/Home.html
42
What has NOT been covered
• Deep learning / Neural Networks
• Specifics of ML-algorithms
• Tools / Libraries / Code
• SAP Products, like HANA / Predictive Analytics / Vora / …
• Hardware
• …
43
Take-aways
• Goal of ML: generalize from training data (not optimization!!)
• Part of ‘Data Mining Process’, not a goal in and of itself
• No magic! Just some clever algorithms…
• Increasingly important non-technical aspects:
  • Ethics
  • Algorithmic transparency
Thank You
www.soapeople.com
info@soapeople.com
@SOAPEOPLE
Fred Verheul
Big Data Consultant
+31 6 3919 2986
fred.verheul@soapeople.com