Holistic approach to machine learning

@SrcMinistry @MariuszGil

Data processing

We are developers

We love to…

Write code

Write tests

Use DDD/OOP/AOP/SOLID/GRASP/XYZ

What for?

Write code

Make money

Make users happy

Solve problems

Solve problems by writing code, to make users happy and make money

problems

Mapping all problems to DDD/OOP/AOP/SOLID/GRASP/XYZ

Test first

Understand the problem first

Domain knowledge

Ask expert

Real problems

Data classification

Bot detection

Minimize risk of error

+ value estimator

+ chance of sale

+ $ optimization

Tens of thousands of historical transactions

Tens of data components

Hundreds of data components

IF-unsolvable

Machine Learning

The theory

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E

Tom M. Mitchell

Typical ML techniques

Classification, regression, clustering, dimensionality reduction, association learning

[Figure: three scatter plots of data points; horizontal axis: feature 1]

Experience

Typical ML paradigms

Supervised learning, unsupervised learning, reinforcement learning

Accuracy

The practice

data + algo = result

+-------+--------+------+---------+---------+-------+
| brand | model  | year | mileage | service | price |
+-------+--------+------+---------+---------+-------+
| ford  | mondeo | 2005 | 123000  | 9900    | 67000 |
| ford  | mondeo | 2005 | 175000  | 9900    | 30000 |
| ford  | focus  | 2010 | 45000   | 6700    | 30000 |
+-------+--------+------+---------+---------+-------+

Learning Data + Algorithm → Learning → Classifier Model

Real Data → Classifier Model → Classification

Failure recipe

+-------+--------+------+---------+---------+-------+
| brand | model  | year | mileage | service | price |
+-------+--------+------+---------+---------+-------+
| ford  | mondeo | 2005 | 123000  | 9900    | 67000 |
| ford  | mondeo | 2005 | 175000  | 9900    | 30000 |
| ford  | focus  | 2010 | 45000   | 6700    | 30000 |
+-------+--------+------+---------+---------+-------+

+-------+--------+------+---------+---------+--------+-------+
| brand | model  | year | mileage | service | repair | price |
+-------+--------+------+---------+---------+--------+-------+
| ford  | mondeo | 2005 | 123000  | 9000    | 900    | 67000 |
| ford  | mondeo | 2005 | 175000  | 900     | 9000   | 30000 |
| ford  | focus  | 2010 | 45000   | 3700    | 3000   | 30000 |
+-------+--------+------+---------+---------+--------+-------+

+-------+--------+------+---------+---------+--------+-------+
| brand | model  | year | mileage | service | repair | price |
+-------+--------+------+---------+---------+--------+-------+
| ford  | mondeo | 2005 | 123000  | 9000    | 900    | 67000 |
| ford  | mondeo | 2005 | 175000  | 900     | 9000   | 30000 |
| ford  | mondeo | 2005 | 175000  | 900     | 9000   | 45000 |
| ford  | focus  | 2010 | 45000   | 3700    | 3000   | 30000 |
+-------+--------+------+---------+---------+--------+-------+
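The table above contains two Mondeo rows with identical features but different prices, which no algorithm can separate without an additional feature. A minimal sketch of how such rows could be detected before training; the `Car` case class and the `ConflictingRows` object are hypothetical names introduced only for this illustration.

```scala
// Hypothetical record type mirroring the columns of the table above.
final case class Car(brand: String, model: String, year: Int,
                     mileage: Int, service: Int, repair: Int, price: Int)

object ConflictingRows {
  // Group rows by their feature values (everything except price) and keep
  // only the groups in which more than one distinct price occurs.
  def conflicts(rows: Seq[Car]): Map[Car, Set[Int]] =
    rows
      .groupBy(_.copy(price = 0)) // price zeroed, so the key holds features only
      .map { case (key, group) => key -> group.map(_.price).toSet }
      .filter { case (_, prices) => prices.size > 1 }

  // The four rows from the table.
  val rows: Seq[Car] = Seq(
    Car("ford", "mondeo", 2005, 123000, 9000, 900, 67000),
    Car("ford", "mondeo", 2005, 175000, 900, 9000, 30000),
    Car("ford", "mondeo", 2005, 175000, 900, 9000, 45000),
    Car("ford", "focus", 2010, 45000, 3700, 3000, 30000)
  )
}
```

Calling `ConflictingRows.conflicts(ConflictingRows.rows)` returns a single entry, the 175000 mileage Mondeo, with the prices 30000 and 45000.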

+-------+--------+-----+------+---------+---------+--------+-------+
| brand | model  | gen | year | mileage | service | repair | price |
+-------+--------+-----+------+---------+---------+--------+-------+
| ford  | mondeo | 4   | 2005 | 123000  | 9000    | 900    | 67000 |
| ford  | mondeo | 3   | 2005 | 175000  | 900     | 9000   | 30000 |
| ford  | mondeo | 4   | 2005 | 175000  | 900     | 9000   | 45000 |
| ford  | focus  | 4   | 2010 | 45000   | 3700    | 3000   | 30000 |
+-------+--------+-----+------+---------+---------+--------+-------+

+-------+--------+-----+------+---------+---------+--------+------+---------------+-------+
| brand | model  | gen | year | mileage | service | repair | igla | crying German | price |
+-------+--------+-----+------+---------+---------+--------+------+---------------+-------+
| ford  | mondeo | 4   | 2005 | 123000  | 9000    | 900    | 0    | 0             | 67000 |
| ford  | mondeo | 3   | 2005 | 175000  | 900     | 9000   | 1    | 1             | 30000 |
| ford  | mondeo | 4   | 2005 | 175000  | 900     | 9000   | 0    | 0             | 45000 |
| ford  | focus  | 4   | 2010 | 45000   | 3700    | 3000   | 1    | 0             | 30000 |
+-------+--------+-----+------+---------+---------+--------+------+---------------+-------+

Understand your data first

Exploratory analysis

http://blogs.adobe.com/digitalmarketing/wp-content/uploads/2013/08/aq2.jpg

ML pipeline

Raw Data Collection

Pre-processing

Sampling

Training Dataset

Algorithm Training

Optimization

Post-processing

Final model

Pre-processing: Feature Selection, Feature Scaling, Dimensionality Reduction

Performance Metrics

Model Selection

Test Dataset

Final Model Evaluation

Pre-processing: Missing Data, Feature Extraction

Classification

Data Split
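Two of the pre-processing steps named in the pipeline, missing-data handling and feature scaling, can be sketched in a few lines of plain Scala. This is only an illustration under simple assumptions (mean imputation and min-max scaling of a single numeric column); the slides do not say which techniques were actually used, and the `PreProcessing` object is a hypothetical name.

```scala
object PreProcessing {
  // Mean imputation (assumed technique): replaces missing values (None)
  // with the mean of the values that are present.
  def imputeMean(column: Seq[Option[Double]]): Seq[Double] = {
    val present = column.flatten
    require(present.nonEmpty, "at least one value must be present")
    val mean = present.sum / present.size
    column.map(_.getOrElse(mean))
  }

  // Min-max scaling (assumed technique): maps each value of one feature
  // column into [0, 1]. A constant column has no spread and maps to zeros.
  def minMax(column: Seq[Double]): Seq[Double] = {
    require(column.nonEmpty, "column must not be empty")
    val lo = column.min
    val hi = column.max
    if (hi == lo) column.map(_ => 0.0)
    else column.map(v => (v - lo) / (hi - lo))
  }
}
```

For example, `PreProcessing.minMax(Seq(45000.0, 123000.0, 175000.0))` maps the smallest mileage to 0.0 and the largest to 1.0, so a feature measured in hundreds of thousands no longer dominates one measured in single digits.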

Classification algorithms

Linear classification: Logistic Regression, Linear Discriminant Analysis, PLS Discriminant Analysis

Non-linear classification: Mixture Discriminant Analysis, Quadratic Discriminant Analysis, Regularized Discriminant Analysis, Neural Networks, Flexible Discriminant Analysis, Support Vector Machines, k-Nearest Neighbor, Naive Bayes

Decision trees for classification: Classification and Regression Trees, C4.5, PART, Bagging CART, Random Forest, Gradient Boosted Machines, Boosted C5.0

Regression algorithms

Linear regression: Ordinary Least Squares Regression, Stepwise Linear Regression, Principal Component Regression, Partial Least Squares Regression

Non-linear regression / penalized regression: Ridge Regression, Least Absolute Shrinkage and Selection Operator (LASSO), ElasticNet, Multivariate Adaptive Regression Splines (MARS), Support Vector Machines, k-Nearest Neighbor, Neural Network

Decision trees for regression: Classification and Regression Trees, Conditional Decision Tree, Rule System, Bagging CART, Random Forest, Gradient Boosted Machine, Cubist

The algorithm is only one element in the ML chain

Everything may be important for ML

Testing

Test datasets

60% 20% 20%
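A minimal sketch of a 60/20/20 split in plain Scala. It assumes the three parts are the training, validation and test sets, in that order; the slide gives only the percentages. The `DataSplit` object and the default seed are hypothetical choices for the example.

```scala
import scala.util.Random

object DataSplit {
  // Shuffles the rows with a fixed seed (for reproducibility) and cuts them
  // into training (60%), validation (20%) and test (the remaining rows).
  def split[A](rows: Seq[A], seed: Long = 11L): (Seq[A], Seq[A], Seq[A]) = {
    val shuffled = new Random(seed).shuffle(rows)
    val nTrain = rows.size * 6 / 10      // integer arithmetic avoids rounding surprises
    val nValidation = rows.size * 2 / 10
    val (training, rest) = shuffled.splitAt(nTrain)
    val (validation, test) = rest.splitAt(nValidation)
    (training, validation, test)
  }
}
```

Shuffling before cutting matters when the raw data is ordered, for example by date or by brand; without it the test set would not resemble the training set.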

Andrew Ng's rule of ML

Does it do well on the training data?

Does it do well on the test data?

Better features / better parameters

More data

by Andrew Ng
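The questions above can be read as a small decision procedure. The sketch below assumes the usual reading of the flowchart: poor results on the training data call for better features or parameters, and good training results with poor test results call for more data. The `NgRule` object, the `Advice` type and the 0.9 threshold are hypothetical; what counts as "doing well" depends on the problem.

```scala
object NgRule {
  sealed trait Advice
  case object ImproveFeaturesOrParameters extends Advice
  case object GetMoreData extends Advice
  case object Done extends Advice

  // goodEnough is an illustrative threshold, not a value from the slides.
  def advise(trainingAccuracy: Double, testAccuracy: Double,
             goodEnough: Double = 0.9): Advice =
    if (trainingAccuracy < goodEnough) ImproveFeaturesOrParameters // fails on training data
    else if (testAccuracy < goodEnough) GetMoreData                // fails only on test data
    else Done
}
```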

Calculate, measure, apply later

The code

    import org.apache.spark.mllib.classification.{SVMModel, SVMWithSGD}
    import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
    import org.apache.spark.mllib.util.MLUtils

    // `sc` is an existing SparkContext, e.g. the one provided by spark-shell.
    // Load training data in LIBSVM format.
    val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")

    // Split data into training (60%) and test (40%).
    val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
    val training = splits(0).cache()
    val test = splits(1)

    // Run training algorithm to build the model.
    val numIterations = 100
    val model = SVMWithSGD.train(training, numIterations)

    // Clear the default threshold.
    model.clearThreshold()

    // Compute raw scores on the test set.
    val scoreAndLabels = test.map { point =>
      val score = model.predict(point.features)
      (score, point.label)
    }

    // Get evaluation metrics.
    val metrics = new BinaryClassificationMetrics(scoreAndLabels)
    val auROC = metrics.areaUnderROC()

    println("Area under ROC = " + auROC)

    // Save and load model.
    model.save(sc, "myModelPath")
    val sameModel = SVMModel.load(sc, "myModelPath")
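The metric reported by the code above, the area under the ROC curve, can be understood as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way in plain Scala for a small list of (score, label) pairs. It is an illustration of the metric, not Spark's implementation, and the `Auc` object is a hypothetical name.

```scala
object Auc {
  // scoreAndLabels: (score, label) pairs; label 1.0 = positive, 0.0 = negative.
  // Counts, over all positive/negative pairs, how often the positive example
  // has the higher score (ties count as half). Quadratic, fine for small data.
  def areaUnderRoc(scoreAndLabels: Seq[(Double, Double)]): Double = {
    val positives = scoreAndLabels.collect { case (s, l) if l == 1.0 => s }
    val negatives = scoreAndLabels.collect { case (s, l) if l == 0.0 => s }
    require(positives.nonEmpty && negatives.nonEmpty, "need both classes")
    val wins = for (p <- positives; n <- negatives)
      yield if (p > n) 1.0 else if (p == n) 0.5 else 0.0
    wins.sum / (positives.size.toLong * negatives.size)
  }
}
```

A value of 0.5 means the scores rank examples no better than chance, and 1.0 means every positive example is scored above every negative one.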

The art of asking the right questions about the right data

@SrcMinistry

Thanks!

@MariuszGil
