Machine learning overview (with SAS software)

Copyright © 2012, SAS Institute Inc. All rights reserved. MACHINE LEARNING WITH SAS WORKSHOP GETTING THE MOST OUT OF YOUR DATA Longhow Lam

Transcript of Machine learning overview (with SAS software)

Page 1: Machine learning overview (with SAS software)

Copyright © 2012, SAS Institute Inc. All rights reserved.

MACHINE LEARNING WITH SAS WORKSHOP

GETTING THE MOST OUT OF YOUR DATA

Longhow Lam

Page 2: Machine learning overview (with SAS software)

AGENDA AND SOME READING MATERIAL

Intro & positioning of Machine learning

SAS platform for Machine learning

Overview of Specific methods

Some examples

Further reading

An experimental comparison of classification techniques for imbalanced credit scoring data sets using SAS® Enterprise Miner

http://support.sas.com/resources/papers/proceedings12/129-2012.pdf

Benchmarking state-of-the-art classification algorithms for credit scoring: A ten-year update

http://www.business-school.ed.ac.uk/waf/crc_archive/2013/42.pdf

Highly recommended for more detail:

The Elements of Statistical Learning, Hastie, Tibshirani & Friedman

http://www-stat.stanford.edu/~tibs/ElemStatLearn/

Page 3: Machine learning overview (with SAS software)

LONGHOW LAM SHORT BIO

MSc Mathematics (1995), Vrije Universiteit Amsterdam (Dutch "drs." degree in mathematics)

MTD Applied Statistics (1997), Technical University Delft (two-year AIO programme in applied statistics)

10+ years of SAS experience (Base / Stat / Guide / Miner / VA / VS)

10+ years of R experience (An Introduction to R)

10+ years of predictive modeling experience

ABNAMRO – Risk modeler

Basel, Credit risk, ALM models

Business&Decision – Quantitative consultant

ING Belgium, Fortis

Leaseplan, Belgium Post

Experian – data miner

Collection Score, Delphi credit score, consulting

Follow me: @longhowlam

Page 4: Machine learning overview (with SAS software)

INTRO MACHINE LEARNING

Wikipedia:

“Machine learning is a scientific discipline that deals with the construction and study of algorithms that can learn from data. Such algorithms operate by building a model based on inputs and using that to make predictions or decisions, rather than following only explicitly programmed instructions.”

Page 5: Machine learning overview (with SAS software)

MACHINE LEARNING AND SOME OTHER TERMS YOU OFTEN HEAR

[Figure: related terms] Statistical modeling, supervised learning, unsupervised learning, clustering, data mining, machine learning, dimension reduction, association rules, recommender, autoencoders, self-organizing maps

Page 6: Machine learning overview (with SAS software)

SAS SOFTWARE

FOR MACHINE LEARNING (AND DATA MINING)

Page 7: Machine learning overview (with SAS software)

THE ANALYTICS LIFECYCLE

Stages: identify / formulate problem → data preparation → data exploration → transform & select → build model → validate model → deploy model → evaluate / monitor results

Roles: business manager, business analyst, data miner / data scientist, IT systems / management

SAS products shown around the cycle: SAS Enterprise Guide, SAS Visual Analytics, SAS Visual Statistics, Enterprise Miner / Text Miner, SAS IMSTAT / Recommender, SAS Model Manager, SAS In-Database Scoring, SAS Decision Manager

Page 8: Machine learning overview (with SAS software)

EASY TO USE GUI FOR MACHINE LEARNING COMBINED WITH CODE LIBRARIES

PROC hpbnet data = creditdata
     structure = markovblanket;
   model default = x1 LTV income age;
   selection = Y;
RUN;

Page 9: Machine learning overview (with SAS software)

MACHINE LEARNING HIGH PERFORMANCE

Machine learning algorithms designed to run on single-blade or multi-blade distributed-memory environments

Page 10: Machine learning overview (with SAS software)

MACHINE LEARNING WITH SAS EASY DEPLOYABLE

Manage rules + data + models

Deployment flexibility:

• Batch

• Real time

• Stored process

• In database

Drive reuse and consistency

[Figure labels: Model, Data, Rules]

Page 11: Machine learning overview (with SAS software)

PREDICT SOMEONE’S INCOME

Predict someone's income from his/her age

• Collect some data (analytical base table)

• Plot the data (Income against Age)

• Fit a line: Income = 15.2 + 1.102 × Age

IS THIS MACHINE LEARNING?
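
A minimal sketch of how a line such as Income = 15.2 + 1.102 × Age could be estimated by ordinary least squares. The four data points are hypothetical and were generated exactly on the slide's line, so the fit simply recovers the two coefficients; the function name `fit_line` is an illustration, not part of any SAS product.

```python
from statistics import mean

def fit_line(ages, incomes):
    """Ordinary least squares for income = a + b * age."""
    age_mean, income_mean = mean(ages), mean(incomes)
    sxy = sum((x - age_mean) * (y - income_mean) for x, y in zip(ages, incomes))
    sxx = sum((x - age_mean) ** 2 for x in ages)
    b = sxy / sxx                      # slope
    a = income_mean - b * age_mean     # intercept
    return a, b

# Hypothetical data lying exactly on the line shown on the slide
ages = [25, 35, 45, 55]
incomes = [15.2 + 1.102 * age for age in ages]

a, b = fit_line(ages, incomes)
print(f"Income = {a:.1f} + {b:.3f} x Age")
```

With real, noisy data the same two formulas apply; only the recovered coefficients would differ.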

Page 12: Machine learning overview (with SAS software)

MACHINE LEARNING ADDRESSING SOME MODELING ISSUES

The problem may not be linear: X², X³, log(X), sqrt(X), 1/X, …?

You do not have just one input variable: X1, X2, X3, …, X567

Interactions and correlations between input variables (figure: income against age, for male and female)

Analytical base table; derived inputs

Page 13: Machine learning overview (with SAS software)

MACHINE LEARNING WHY IT CAN MATTER € € €

Suppose we have an untargeted direct mailing of 100,000 ‘letters’ to randomly sampled prospects:

Conversion rate is around 1%. Profit per conversion is €80. Cost per mailing is €0.70.

Total ROI = 100,000 × 1% × €80 − 100,000 × €0.70 = €10,000

Now we have a targeted mailing with a machine learning predictive model that uses prospect input data to distinguish between high and low responders.

Page 14: Machine learning overview (with SAS software)

MACHINE LEARNING WHY IT CAN MATTER € € €

Decile  N       Conversion  Profit  Cumulative
1       10,000  2.00%        9,000       9,000
2       10,000  1.50%        5,000      14,000
3       10,000  1.00%        1,000      15,000
4       10,000  1.00%        1,000      16,000
5       10,000  1.00%        1,000      17,000
6       10,000  1.00%        1,000      18,000
7       10,000  1.00%        1,000      19,000
8       10,000  0.80%         -600      18,400
9       10,000  0.50%       -3,000      15,400
10      10,000  0.20%       -5,400      10,000

The profit from using a model to send letters only to the first 7 deciles is now: €19,000 (instead of €10,000)

If you have 100 such campaigns a year, that means an increase of €0.9 mln!
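
The decile table can be reproduced with a few lines of arithmetic. The sketch below takes the conversion rates from the table, the €80 profit per conversion and the €0.70 cost per letter from the earlier slide, and looks for the cut-off with the highest cumulative profit. Function and constant names are illustrative only.

```python
LETTERS_PER_DECILE = 10_000
PROFIT_PER_CONVERSION = 80.0   # euro
COST_PER_LETTER = 0.70         # euro

# Conversion rate per decile, copied from the table above
CONVERSION_RATES = [0.02, 0.015, 0.01, 0.01, 0.01, 0.01, 0.01, 0.008, 0.005, 0.002]

def decile_profits(rates):
    """Profit per decile: conversions times 80 minus letters times 0.70."""
    return [LETTERS_PER_DECILE * (rate * PROFIT_PER_CONVERSION - COST_PER_LETTER)
            for rate in rates]

def best_cutoff(rates):
    """Return (deciles to mail, cumulative profit) with the highest cumulative profit."""
    best_n, best_total, running = 0, 0.0, 0.0
    for n, profit in enumerate(decile_profits(rates), start=1):
        running += profit
        if running > best_total:
            best_n, best_total = n, running
    return best_n, best_total

n_deciles, profit = best_cutoff(CONVERSION_RATES)
print(n_deciles, round(profit))
```

Mailing all ten deciles gives back the untargeted result, because the rates average out to the overall 1% conversion.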

Page 15: Machine learning overview (with SAS software)

MACHINE LEARNING WHY IT CAN MATTER € € €

Decile  N       Conversion  Profit  Cumulative
1       10,000  3.00%       17,000      17,000
2       10,000  2.00%        9,000      26,000
3       10,000  1.40%        4,200      30,200
4       10,000  1.15%        2,200      32,400
5       10,000  1.00%        1,000      33,400
6       10,000  0.60%       -2,200      31,200
7       10,000  0.40%       -3,800      27,400
8       10,000  0.30%       -4,600      22,800
9       10,000  0.10%       -6,200      16,600
10      10,000  0.05%       -6,600      10,000

The profit from using a much better model to send letters only to the first 5 deciles is now: €33,400 (instead of €10,000)

If you have 100 such campaigns a year, that means an increase of €2.34 mln!

Page 16: Machine learning overview (with SAS software)

MACHINE LEARNING WHY IT CAN MATTER? € € €

Decile  N       Conversion  Profit  Cumulative
1       10,000  3.35%       19,800      19,800
2       10,000  2.23%       10,840      30,640
3       10,000  1.30%        3,400      34,040
4       10,000  1.10%        1,800      35,840
5       10,000  1.00%        1,000      36,840
6       10,000  0.55%       -2,600      34,240
7       10,000  0.28%       -4,760      29,480
8       10,000  0.25%       -5,000      24,480
9       10,000  0.05%       -6,600      17,880
10      10,000  0.02%       -6,840      11,040

Now let's suppose we have a slightly better model than the last one. Sending letters only to the first 5 deciles now gives: €36,840

If you have 100 such campaigns a year, that means an increase of €2.68 mln!

Page 17: Machine learning overview (with SAS software)

OVERVIEW OF SPECIFIC

MACHINE LEARNING METHODS

Classical regression

Decision trees

Dimension reduction

Bagging & Boosting

Support vector machines

K-Nearest Neighbour

Neural networks / deep learning

Bayesian networks

Text mining

Recommendation engine

Page 18: Machine learning overview (with SAS software)

“CLASSICAL” REGRESSION

Page 19: Machine learning overview (with SAS software)

LINEAR & LOGISTIC REGRESSION

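Numeric target variable: $\text{Income} = a + b \times \text{Age}$ (figure: Income against Age)

Binary target variable: $P(\text{Churn}) = \dfrac{1}{1 + \exp(a + b \times \text{Age})}$ (figure: P(Churn), between 0 and 1, against Age)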
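
A rough sketch of fitting a logistic regression by gradient ascent on the log-likelihood. It uses the common sign convention P = 1 / (1 + exp(−(a + b·Age))), which is the slide's formula with the signs of a and b flipped. The data are hypothetical, and age is expressed in decades only to keep the simple fixed step size stable.

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_logistic(xs, ys, learning_rate=0.1, n_iter=5000):
    """Gradient ascent on the average log-likelihood of P(y=1) = logistic(a + b*x)."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = y - logistic(a + b * x)   # observed minus predicted probability
            grad_a += err
            grad_b += err * x
        a += learning_rate * grad_a / n
        b += learning_rate * grad_b / n
    return a, b

# Hypothetical data: age in decades, churn indicator (1 = churned)
ages = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5]
churn = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

a, b = fit_logistic(ages, churn)
p_young, p_old = logistic(a + b * 2.0), logistic(a + b * 6.5)
print(round(p_young, 2), round(p_old, 2))
```

In practice a dedicated routine (for example PROC LOGISTIC in SAS) would be used; the loop only illustrates that the model outputs a probability between 0 and 1 that changes smoothly with age.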

Page 20: Machine learning overview (with SAS software)

SPLINE REGRESSION MODELING NON LINEARITIES

Often there is a nonlinear relation

• Transformation of inputs: X², X³, log(X), etc.

• Buckets / binning of variables

• Smoothing splines (figure: Y or logit(Y) against X)

Page 21: Machine learning overview (with SAS software)

SPLINE REGRESSION MODELING NON LINEARITIES

Smoothing splines: piecewise polynomials that are glued together at knots

Two special cases for the smoothing parameter λ:

λ = 0: any function that interpolates the data

λ = ∞: simple least squares line fit

Choose λ by cross validation
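
The formula on the slide did not survive in this transcript. As a hedged reconstruction, the standard smoothing spline criterion (as given in The Elements of Statistical Learning, the reading recommended in the agenda) is the penalized residual sum of squares below; the notation is an assumption, not copied from the slide.

```latex
\hat{f} \;=\; \arg\min_{f} \; \sum_{i=1}^{n} \bigl( y_i - f(x_i) \bigr)^2 \;+\; \lambda \int f''(t)^2 \, dt
```

With λ = 0 the penalty vanishes and any interpolating function minimizes the criterion; as λ → ∞ no curvature is tolerated, so the solution becomes the least squares line, which matches the two special cases listed.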

Page 22: Machine learning overview (with SAS software)

SPLINE REGRESSION OPEL ASTRA CAR EXAMPLE

Data extracted from a car sales site. For many cars we have the kilometres driven and the car price. For the Opel Astra we have 2360 cars.

What is the relation between kilometres driven and car sales price?

[Figures: too much smoothing and too little smoothing]

Page 23: Machine learning overview (with SAS software)

SPLINE REGRESSION OPEL ASTRA CAR EXAMPLE

0.2 is the optimal smoothing parameter

Page 24: Machine learning overview (with SAS software)

Some other car makes/models with spline estimates of car depreciation versus kilometres driven.

Hmmm... my Renault Clio looks nice, but after 50,000 km I only have 46% of the original value left…

Page 25: Machine learning overview (with SAS software)

SPLINE REGRESSION MODELING NON LINEARITIES

In SAS we have the TPSPLINE, LOESS and ADAPTIVEREG procedures. The ADAPTIVEREG procedure fits multivariate regression splines and supports:

• More than one input

• Linear, logistic, Poisson and other GLM regressions

• A combination of regression splines and model selection methods

• Partitioning of data into training, validation, and testing roles

Page 26: Machine learning overview (with SAS software)

DECISION TREES

Page 27: Machine learning overview (with SAS software)

DECISION TREES

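How does it work? A simple example

Suppose we have the following group of people: 50% response, 50% no response

We know their age and marital status

[Figure: decision tree. Root node: 50% / 50%. Split on age: Age ≤ 45 gives 30% / 70%, Age > 45 gives 60% / 40%. A further split on marital status: Married gives 20% / 80%, Divorced or Unmarried gives 60% / 40%]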

Page 28: Machine learning overview (with SAS software)

DECISION TREES REGRESSION & CLASSIFICATION

Target  X1  X2  X3   X4   X5
Y       12  A   456  1.2  X
N       21  B   456  1.5  X
Y       32  A   545  1.3  U
Y       34  C   443  1.1  U
N       23  A   345  1.7  U
N       13  B   567  1.2  X
N       45  A   654  1.9  X
…       …   …   …    …    …
…       …   …   …    …    …
Y       46  A   657  2.1  X

A recursive splitting algorithm:

1. Loop through all inputs

2. Determine per input how to split

3. Take the best input to split

4. On the two new data sets apply 1,2,3 again….

5. Stop somewhere….

• How to split X1 or X2 ?

• When to stop?
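
The recursive splitting steps can be sketched for a regression tree as follows. This is a simplified illustration under stated assumptions: only numeric inputs are handled (the table above also has categorical ones), splits are binary, the split criterion is the summed squared error, and the stopping rule is a hypothetical minimum number of rows per node.

```python
from statistics import mean

def squared_error(targets):
    centre = mean(targets)
    return sum((y - centre) ** 2 for y in targets)

def best_split(rows, targets):
    """Steps 1-3: try every input and threshold, keep the lowest squared error."""
    best = None  # (error, input index, threshold)
    for j in range(len(rows[0])):
        for threshold in sorted({row[j] for row in rows})[1:]:
            left = [y for row, y in zip(rows, targets) if row[j] < threshold]
            right = [y for row, y in zip(rows, targets) if row[j] >= threshold]
            error = squared_error(left) + squared_error(right)
            if best is None or error < best[0]:
                best = (error, j, threshold)
    return best

def grow_tree(rows, targets, min_rows=4):
    """Steps 4-5: recurse on both halves, stop when a node has too few rows."""
    split = best_split(rows, targets) if len(rows) >= min_rows else None
    if split is None:
        return mean(targets)               # leaf: predict the node average
    _, j, threshold = split
    left = [(r, y) for r, y in zip(rows, targets) if r[j] < threshold]
    right = [(r, y) for r, y in zip(rows, targets) if r[j] >= threshold]
    return (j, threshold,
            grow_tree([r for r, _ in left], [y for _, y in left], min_rows),
            grow_tree([r for r, _ in right], [y for _, y in right], min_rows))

def predict(tree, row):
    while isinstance(tree, tuple):         # internal node: follow the split
        j, threshold, left, right = tree
        tree = left if row[j] < threshold else right
    return tree

# Hypothetical data: two numeric inputs, numeric target
rows = [[1, 10], [2, 10], [3, 20], [4, 20]]
targets = [5, 5, 9, 9]
tree = grow_tree(rows, targets)
print(predict(tree, [1.5, 10]), predict(tree, [3.5, 20]))
```

The two open questions on the slide map onto `best_split` (how to split) and the `min_rows` rule (when to stop); real implementations offer more refined choices for both.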

Page 29: Machine learning overview (with SAS software)

DECISION TREES REGRESSION & CLASSIFICATION

How to split? The number of branches per split is usually 2 or 3. More splits will exhaust the data too fast.

Why is the split X1 < t1 better than X1 < s1?

Regression: mean squared error

Classification: misclassification rate, cross-entropy, chi-squared

[Figure: regression tree, mean squared error. Two scatter plots of Y against x, one with split s1 and one with split t1]

Page 30: Machine learning overview (with SAS software)

DECISION TREES REGRESSION & CLASSIFICATION

[Figure: classification tree, misclassification rate. Points along x with split s1 and split t1]
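
For a classification tree, two candidate splits such as s1 and t1 can be compared by the weighted impurity of the two resulting groups. The sketch below implements two of the measures named on the slides (misclassification rate and cross-entropy); the chi-squared criterion is left out. The data and the two thresholds are hypothetical.

```python
import math
from collections import Counter

def proportions(labels):
    return [count / len(labels) for count in Counter(labels).values()]

def misclassification_rate(labels):
    return 1.0 - max(proportions(labels))

def cross_entropy(labels):
    return -sum(p * math.log(p) for p in proportions(labels))

def split_impurity(xs, labels, threshold, impurity):
    """Size-weighted impurity of the groups x < threshold and x >= threshold.

    Assumes the threshold leaves at least one point on each side.
    """
    left = [y for x, y in zip(xs, labels) if x < threshold]
    right = [y for x, y in zip(xs, labels) if x >= threshold]
    return (len(left) * impurity(left) + len(right) * impurity(right)) / len(labels)

# Hypothetical one-dimensional example with two candidate splits
xs = [1, 2, 3, 4, 5, 6, 7, 8]
labels = ["N", "N", "N", "Y", "N", "Y", "Y", "Y"]
s1, t1 = 3, 4

for impurity in (misclassification_rate, cross_entropy):
    print(impurity.__name__,
          round(split_impurity(xs, labels, s1, impurity), 3),
          round(split_impurity(xs, labels, t1, impurity), 3))
```

In this toy example both measures prefer t1, because it isolates a pure group of three "N" points.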

Page 31: Machine learning overview (with SAS software)

DECISION TREES REGRESSION & CLASSIFICATION

When to stop?

Not too early, not too late!

Pruning

Remove parts of the tree

Page 32: Machine learning overview (with SAS software)

DECISION TREES SOME COMMON TYPES

CHAID (chi-squared automatic interaction detection)

C4.5 / C5.0

CART (Classification and Regression Trees)

The difference is mainly in the different splitting options

Page 33: Machine learning overview (with SAS software)

DECISION TREES PROS AND CONS

Pros

• Interaction between variables (figure: tree that splits on male / female, then on Income < 45K and Age < 33)

• Interpretable rules

• Missing values easy to incorporate

Cons

• Unstable

• “Lack of smoothness”: poor fit of obvious (non)linear relations (figures: response rate; Opel Astras)

Page 34: Machine learning overview (with SAS software)

DIMENSION REDUCTION

Page 35: Machine learning overview (with SAS software)

PRINCIPAL COMPONENTS ANALYSIS

Linear transformation of data to uncorrelated data

The transformation W is such that

The largest variance is in the first coordinate

The second largest variance is in the second coordinate

Etc…

Page 36: Machine learning overview (with SAS software)

PRINCIPAL COMPONENTS ANALYSIS

[Figure: scatter plot of data points, X2 against X1]

Page 37: Machine learning overview (with SAS software)

PRINCIPAL COMPONENTS ANALYSIS

Page 38: Machine learning overview (with SAS software)

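PRINCIPAL COMPONENTS ANALYSIS

The Math behind

In general: $P = XW$

With two dimensions:

$$\begin{pmatrix} p_{11} & p_{21} \\ \vdots & \vdots \\ p_{1n} & p_{2n} \end{pmatrix} = \begin{pmatrix} x_{11} & x_{21} \\ \vdots & \vdots \\ x_{1n} & x_{2n} \end{pmatrix} \begin{pmatrix} w_{11} & w_{21} \\ w_{12} & w_{22} \end{pmatrix}$$

$w_{11}$ and $w_{12}$ are the loadings corresponding to the first principal component.

$w_{21}$ and $w_{22}$ are the loadings corresponding to the second principal component.

It turns out that the columns of $W$ are the eigenvectors of the matrix $X^T X$.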
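
For two inputs, the eigenvectors of the 2 × 2 matrix XᵀX can be written in closed form, which makes the statement easy to check by hand. The sketch below assumes the inputs are centered first (a common convention that the slide does not state explicitly) and uses hypothetical data; for more than two inputs a linear algebra library or a SAS procedure such as PRINCOMP would be the practical route.

```python
import math

def pca_2d(x1, x2):
    """Return (eigenvalues, loadings, scores) for two inputs, largest variance first."""
    n = len(x1)
    mean1, mean2 = sum(x1) / n, sum(x2) / n
    c1 = [v - mean1 for v in x1]           # centered columns of X
    c2 = [v - mean2 for v in x2]
    # Entries of the symmetric matrix X^T X = [[a, b], [b, c]]
    a = sum(u * u for u in c1)
    b = sum(u * v for u, v in zip(c1, c2))
    c = sum(v * v for v in c2)
    mid, half = (a + c) / 2, math.hypot((a - c) / 2, b)
    eigenvalues = [mid + half, mid - half]
    if b == 0.0:                           # inputs are already uncorrelated
        loadings = [(1.0, 0.0), (0.0, 1.0)] if a >= c else [(0.0, 1.0), (1.0, 0.0)]
    else:
        loadings = []
        for lam in eigenvalues:
            w1, w2 = b, lam - a            # solves (X^T X) w = lam * w
            norm = math.hypot(w1, w2)
            loadings.append((w1 / norm, w2 / norm))
    # P = X W: one score column per principal component
    scores = [[u * w1 + v * w2 for u, v in zip(c1, c2)] for w1, w2 in loadings]
    return eigenvalues, loadings, scores

# Hypothetical, strongly correlated inputs
x1 = [1, 2, 3, 4, 5]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]
eigenvalues, loadings, scores = pca_2d(x1, x2)
print([round(v, 3) for v in eigenvalues])
```

The two score columns are uncorrelated, and the sum of squares of each column equals the corresponding eigenvalue, which is the "largest variance in the first coordinate" property.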

Page 39: Machine learning overview (with SAS software)

PRINCIPAL COMPONENTS ANALYSIS

Scaling the inputs is important here

Applications of PCA

• Dimension reduction

• Visualisation

• Outlier / anomaly detection

• PCA regression: use principal components instead of the original inputs

Page 40: Machine learning overview (with SAS software)

PRINCIPAL COMPONENTS DIMENSION REDUCTION

$P = XW$, with dimensions (10000 by 100) = (10000 by 100) (100 by 100)

Now only take the first $L$ columns of $W$:

$P_L = X W_L$, with dimensions (10000 by 2) = (10000 by 100) (100 by 2)

For example, for visualization use only the first 2 or 3 columns, so that $P_L$ has only 2 or 3 columns that can be visualized in scatter or contour plots.

Page 41: Machine learning overview (with SAS software)

SINGULAR VALUE DECOMPOSITION

Matrix SVD decomposition: $A = U \Sigma V^T$

$\Sigma$ is diagonal with $r$ singular values ($r$ could be a large number)

Page 42: Machine learning overview (with SAS software)

SINGULAR VALUE DECOMPOSITION

Matrix SVD decomposition: $A = U \Sigma V^T$

$\Sigma$ is diagonal with $r$ singular values ($r$ could be a large number)

Take only $k \ll r$ singular values: $A_k = U_k \Sigma_k V_k^T$

A data point $d$ can now be represented by a $k$-dimensional point

Page 43: Machine learning overview (with SAS software)

SVD EXAMPLE USING MY SON AS AN EXPERIMENT

Original

2448 X 3264 ~ 8 mln numbers

Page 44: Machine learning overview (with SAS software)

SVD EXAMPLE USING MY SON AS AN EXPERIMENT

SVD: 15 largest SV’s

1% of the data

Page 45: Machine learning overview (with SAS software)

SVD EXAMPLE USING MY SON AS AN EXPERIMENT

SVD: 75 largest SV’s

5% of the data
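
The "1%" and "5%" figures are consistent with a simple storage count, sketched below under two assumptions that the slides do not state: the picture is treated as a single 2448 × 3264 matrix of numbers, and a rank-k approximation stores k columns of U, k singular values and k columns of V.

```python
ROWS, COLS = 2448, 3264   # image size from the slide

def rank_k_storage(rows, cols, k):
    """Numbers needed for U_k (rows x k), k singular values and V_k (cols x k)."""
    return k * (rows + cols + 1)

def stored_fraction(k):
    return rank_k_storage(ROWS, COLS, k) / (ROWS * COLS)

for k in (15, 75):
    print(k, f"{100 * stored_fraction(k):.2f}%")
```

The original image holds ROWS × COLS, roughly 8 million, numbers, so keeping 15 or 75 singular values stores about 1% or 5% of that.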

Page 46: Machine learning overview (with SAS software)

VARIABLE CLUSTERING TO REDUCE THE DIMENSION

Variable selection

I have 500 inputs, but maybe there are only ten clusters of inputs

Within one cluster the variables are (strongly) correlated

Then use only one input per cluster for predictive modeling

All inputs: X1, X2, X3, …, X500

Cluster X1, X21, X35, X430, … → use X35

Cluster X17, X29, X353, X490, … → use X29

Cluster X37, X95, X251, X393, … → use X251

Page 47: Machine learning overview (with SAS software)

VARIABLE CLUSTERING TO REDUCE THE DIMENSION

Page 48: Machine learning overview (with SAS software)

VARIABLE CLUSTERING TO REDUCE THE DIMENSION

Page 49: Machine learning overview (with SAS software)

BAGGING & BOOSTING

Page 50: Machine learning overview (with SAS software)

COMBINE MODELS BAGGING & BOOSTING

If one model is not good enough: let multiple models vote for a prediction

Bootstrap Aggregation (Bagging)

[Figure: random samples are drawn from the data and the resulting models are combined into the final model]

This only makes sense if the underlying models are different enough and have some predictive power

Page 51: Machine learning overview (with SAS software)

Bagging & Boosting: Random Forests

Random forests ≈ Bagging with trees

Apply the steps below repeatedly

1. Generate a bootstrap sample

2. Randomly choose m inputs, with m << P (the total number of inputs)

3. Fit a tree on the bootstrap sample with the m inputs (do not prune)

In case of a classification tree:

The random forest prediction is the majority vote of all trees

In case of a regression tree:

The random forest prediction is the average of all trees
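
The three steps and the majority vote can be sketched in a few lines. To keep the example short, the base learner here is a one-split tree (a "stump") rather than a fully grown, unpruned tree, which is a deliberate simplification; the data, the seed and all function names are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels, input_ids):
    """One-split tree restricted to the given inputs (fewest misclassifications)."""
    best = None  # (errors, input id, threshold, left label, right label)
    for j in input_ids:
        for threshold in sorted({row[j] for row in rows})[1:]:
            left = [y for row, y in zip(rows, labels) if row[j] < threshold]
            right = [y for row, y in zip(rows, labels) if row[j] >= threshold]
            left_label, right_label = majority(left), majority(right)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, j, threshold, left_label, right_label)
    if best is None:                       # no split possible in this sample
        label = majority(labels)
        return (0, float("inf"), label, label)
    return best[1:]

def predict_stump(stump, row):
    j, threshold, left_label, right_label = stump
    return left_label if row[j] < threshold else right_label

def fit_forest(rows, labels, n_trees=25, m=1, seed=0):
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]        # 1. bootstrap sample
        input_ids = rng.sample(range(p), m)               # 2. m random inputs
        forest.append(fit_stump([rows[i] for i in idx],   # 3. fit a tree
                                [labels[i] for i in idx], input_ids))
    return forest

def predict_forest(forest, row):
    return majority([predict_stump(stump, row) for stump in forest])

# Hypothetical data with two informative inputs
rows = [[1, 2], [2, 1], [3, 3], [4, 4], [6, 7], [7, 6], [8, 8], [9, 9]]
labels = ["N", "N", "N", "N", "Y", "Y", "Y", "Y"]
forest = fit_forest(rows, labels)
print(predict_forest(forest, [1, 2]), predict_forest(forest, [9, 9]))
```

For a regression forest the `majority` vote would be replaced by the average of the tree predictions, as stated on the slide.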

Page 52: Machine learning overview (with SAS software)

FOREST VS TREE EXAMPLE ON SIMULATED DATA

Decision tree and random forest (100 sub trees) fitted on the simulated data

Page 53: Machine learning overview (with SAS software)

FOREST VS TREE EXAMPLE ON SIMULATED DATA

It is clear to see that the forest can produce much smoother predictions.

Page 54: Machine learning overview (with SAS software)

GRADIENT BOOSTING DON’T LET THE FORMULAS INTIMIDATE YOU

Page 55: Machine learning overview (with SAS software)

GRADIENT BOOSTING SCHEMATIC OVERVIEW

Gradient boosting, M iterations m = 1, 2, …, M

At each successive iteration a base learner $h_m$ (which is a decision tree) is fit on the pseudo residuals, using inputs x, to “correct” the previous learner:

$F_m = F_{m-1} + \gamma \cdot h_m$

[Figure: chain of M steps, each using the inputs x and the pseudo residuals $r_{im}$ of that step ($r_1, r_2, \ldots, r_M$), ending in the final model $F_M$]
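
A minimal sketch of the update F_m = F_{m-1} + γ·h_m for squared-error loss, where the pseudo residuals are simply the ordinary residuals y − F_{m-1}(x). The base learner is a one-split regression tree on a single input; the data, the number of iterations and the value of γ are hypothetical choices.

```python
from statistics import mean

def fit_stump(xs, residuals):
    """Regression tree with one split, fitted to the pseudo residuals."""
    best = None  # (squared error, threshold, left value, right value)
    for threshold in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        left_value, right_value = mean(left), mean(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x < threshold else right_value

def gradient_boost(xs, ys, n_iterations=50, gamma=0.1):
    f0 = mean(ys)                          # F_0: constant model
    fitted = [f0] * len(xs)
    learners = []
    for _ in range(n_iterations):
        residuals = [y - f for y, f in zip(ys, fitted)]   # pseudo residuals
        stump = fit_stump(xs, residuals)                  # base learner h_m
        learners.append(stump)
        fitted = [f + gamma * predict_stump(stump, x)     # F_m = F_{m-1} + gamma * h_m
                  for x, f in zip(xs, fitted)]
    return f0, gamma, learners

def predict(model, x):
    f0, gamma, learners = model
    return f0 + gamma * sum(predict_stump(stump, x) for stump in learners)

def mse(ys, predictions):
    return mean((y - p) ** 2 for y, p in zip(ys, predictions))

# Hypothetical data: a quadratic relation
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [x * x for x in xs]
model = gradient_boost(xs, ys)
print(round(mse(ys, [predict(model, x) for x in xs]), 2))
```

Each iteration only nudges the prediction by a fraction γ of the new tree, which is why many small trees are combined in the final model.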

Page 56: Machine learning overview (with SAS software)

SUPPORT VECTOR MACHINES

Page 57: Machine learning overview (with SAS software)

Support vector machines (SVM)

Suppose we have a separable classification problem. Find a linear decision boundary between the two groups with maximum margin M. So the green line would be better than the blue line.

If the problem is not separable, you have to allow some points to be on the wrong side. These points are penalized. SVM still maximizes the margin M, but with the constraint that the total penalty is smaller than C.

The decision boundary might not be linear in the inputs. We could apply nonlinear mappings to the inputs, e.g. x², x³, or spline(x).

The beauty of SVM is that in the calculation of the decision boundary we do not need to use these transformations explicitly: “the kernel trick”.
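
A small numerical illustration of the kernel trick, assuming a degree-2 polynomial kernel on two-dimensional inputs (a textbook choice, not necessarily the kernel used in the slides). The kernel value computed directly from the original inputs equals the inner product of the explicitly transformed inputs, so the transformation never has to be carried out.

```python
import math

def phi(x):
    """Explicit degree-2 feature map for a 2-dimensional input."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def kernel(x, z):
    """Polynomial kernel k(x, z) = (x . z)^2, computed in the original space."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, -1.0)
print(kernel(x, z), dot(phi(x), phi(z)))
```

Both numbers agree up to rounding; the saving becomes substantial when the explicit feature space is very high dimensional.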

Page 60: Machine learning overview (with SAS software)

SVM UNDERLYING MATHEMATICAL OPTIMIZATION PROBLEMS

Separable classification

Non Separable classification

Non Separable classification rewritten using

Lagrange Dual problem

Kernels to model nonlinear behaviour
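
The formulas on this slide are missing from the transcript. As a hedged reconstruction, the standard formulations from The Elements of Statistical Learning (chapter 12) are given below, in the order of the four headings. The symbols β, β₀, ξᵢ, αᵢ, h and the dual bound C̃ are textbook notation and may differ from what the slide showed; C is the total penalty budget mentioned earlier.

```latex
% Separable classification
\max_{\beta,\,\beta_0,\,\|\beta\|=1} M
\quad \text{subject to} \quad y_i \,(x_i^T \beta + \beta_0) \ge M, \quad i = 1,\dots,n

% Non-separable classification (slack variables \xi_i)
\max_{\beta,\,\beta_0,\,\|\beta\|=1} M
\quad \text{subject to} \quad y_i \,(x_i^T \beta + \beta_0) \ge M (1 - \xi_i), \quad
\xi_i \ge 0, \quad \sum_{i=1}^{n} \xi_i \le C

% Lagrange dual problem
\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i
 - \tfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, y_i y_j \, x_i^T x_j
\quad \text{subject to} \quad 0 \le \alpha_i \le \tilde{C}, \quad \sum_{i=1}^{n} \alpha_i y_i = 0

% Kernels: replace the inner product by a kernel function
x_i^T x_j \;\longrightarrow\; K(x_i, x_j) = \langle h(x_i), h(x_j) \rangle
```

The dual depends on the inputs only through inner products, which is what makes the kernel substitution in the last line possible.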

Page 61: Machine learning overview (with SAS software)

https://www.youtube.com/watch?v=3liCbRZPrZA

Not linearly separable, but in 3D space they are!

Page 62: Machine learning overview (with SAS software)

K – NEAREST NEIGHBOUR

Page 63: Machine learning overview (with SAS software)

K-NN METHOD

• No model is fitted. Given a query point x0, find the k points x1, x2, ..., xk that are closest in distance to x0.

• Classify x0 using the majority vote among the k neighbours.

Example: of the 5 nearest neighbours of x0, 3 are red and 2 are green, so we predict x0 to be red.
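
The method fits in a few lines of code. The sketch below uses Euclidean distance and hypothetical points arranged so that, as in the example, 3 of the 5 nearest neighbours of x0 are red and 2 are green.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Majority vote among the k training points closest to the query."""
    by_distance = sorted(zip(train_points, train_labels),
                         key=lambda pair: math.dist(pair[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical training data
points = [(1, 0), (0, 1), (-1, 0), (0, -1.5), (1.5, 1.5), (5, 5), (6, 5), (5, 6)]
colours = ["red", "red", "red", "green", "green", "green", "green", "green"]
x0 = (0, 0)

print(knn_predict(points, colours, x0, k=5))
```

Changing k changes the answer: with k = 8 all points vote and green wins, which is why k is usually tuned on validation data.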

Page 64: Machine learning overview (with SAS software)

K-NN METHOD

[Figures: 1 nearest neighbour; 15 nearest neighbours]

Page 65: Machine learning overview (with SAS software)

K-NN METHOD

Using different numbers k of nearest neighbours: test and training errors

Despite its simplicity, k-nearest-neighbours has been successfully used in problems like

• Handwritten digits

• Satellite image scenes

• EKG patterns

Page 66: Machine learning overview (with SAS software)

K-NN EXAMPLE DUTCH HOUSE PRICES

Extract asking prices of houses for sale from a Dutch housing site.

For 108K Dutch postal codes (out of 463K) there are one or more houses for sale.

How can we estimate the house value for the postal codes without a house price?

For a postal code with no price, estimate the price by taking the k closest for-sale house prices.

Page 67: Machine learning overview (with SAS software)

Comparing different nearest neighbours in SAS Enterprise Miner

Page 68: Machine learning overview (with SAS software)

K-NN EXAMPLE DUTCH HOUSE PRICES

30% of the data was used as validation set

In Enterprise Miner different values for k were used

k = 5 nearest neighbours has the lowest average squared error

Page 69: Machine learning overview (with SAS software)

Page 70: Machine learning overview (with SAS software)

NEURAL NETWORKS

DEEP LEARNING

Page 71: Machine learning overview (with SAS software)

NEURAL NETWORK LINEAR REGRESSION

$Y = f(X, w) = w_1 + w_2 X_2 + w_3 X_3 + w_4 X_4$

[Figure: a neural network compute node f with inputs 1, X2, X3, X4 and weights w1, w2, w3, w4]

f is the so-called activation function. This could be the logit function, but other choices are possible.

There are four weights w that have to be determined.

Page 72: Machine learning overview (with SAS software)

NEURAL NETWORKS MATHEMATICAL FORMULATION

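The prediction formula for a neural network is given by

$$P(Y \mid X) = g(T_Y), \qquad T_Y = \beta_{0Y} + \beta_Y^T Z, \qquad Z_m = \sigma(\alpha_{0m} + \alpha_m^T X)$$

The functions $g$ and $\sigma$ are defined as

$$g(T_Y) = \frac{e^{T_Y}}{e^{T_N} + e^{T_Y}}, \qquad \sigma(x) = \frac{1}{1 + e^{-x}}$$

In case of a binary classifier, $P(N \mid X) = 1 - P(Y \mid X)$

[Figure: network with inputs X1 (Age), X2 (Income), X3 (Region), X4 (Gender), a hidden layer Z1, Z2, Z3, and outputs Y and N; weights α connect the inputs to the hidden layer and weights β connect the hidden layer to the outputs]

The model weights α and β have to be estimated from the data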
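
The prediction formula can be traced with a short forward pass. All weights below are made-up numbers chosen only to show the mechanics for four inputs, three hidden units and the two outputs Y and N; in practice they are estimated from data.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def predict_proba(x, alpha0, alpha, beta0, beta):
    # Hidden layer: Z_m = sigma(alpha_0m + alpha_m^T X)
    z = [sigmoid(a0 + dot(a, x)) for a0, a in zip(alpha0, alpha)]
    # Output scores: T_c = beta_0c + beta_c^T Z for each class c
    t = {c: beta0[c] + dot(beta[c], z) for c in beta0}
    # g: exponentiate and normalise the class scores
    total = sum(math.exp(v) for v in t.values())
    return {c: math.exp(v) / total for c, v in t.items()}

# Hypothetical weights
alpha0 = [0.1, -0.2, 0.0]
alpha = [[0.5, -0.3, 0.2, 0.1],
         [-0.4, 0.1, 0.3, -0.2],
         [0.2, 0.2, -0.1, 0.4]]
beta0 = {"Y": 0.1, "N": -0.1}
beta = {"Y": [0.7, -0.5, 0.3], "N": [-0.6, 0.4, 0.2]}

probabilities = predict_proba([0.3, 1.2, 0.0, 1.0], alpha0, alpha, beta0, beta)
print({c: round(p, 3) for c, p in probabilities.items()})
```

Because g normalises the two scores, the outputs always satisfy P(N|X) = 1 − P(Y|X).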

Page 73: Machine learning overview (with SAS software)

NEURAL NETWORKS ESTIMATING THE WEIGHTS

Back propagation algorithm

Randomly choose small values for all $w_i$

For each data point (observation):

1. Calculate the neural net prediction

2. Calculate the error $E$ (for example: $E = (\text{actual} - \text{prediction})^2$)

3. Adjust weights $w$ according to: $w_i^{new} = w_i + \Delta w_i$, with $\Delta w_i = -\alpha \, \dfrac{\partial E}{\partial w_i}$

4. Stop if error $E$ is small enough.
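
The update rule can be tried on the simplest possible network: one linear compute node with a bias weight and one input weight. This is only the single-node special case of back propagation (no hidden layer, so no error has to be propagated backwards); the data, the learning rate α = 0.05 and the stopping tolerance are hypothetical.

```python
import random

def train_neuron(xs, targets, learning_rate=0.05, max_epochs=2000,
                 tolerance=1e-6, seed=0):
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in range(2)]     # small random start values
    for _ in range(max_epochs):
        total_error = 0.0
        for x, actual in zip(xs, targets):
            prediction = w[0] + w[1] * x               # 1. prediction
            diff = actual - prediction
            total_error += diff ** 2                   # 2. E = (actual - prediction)^2
            # 3. delta_w = -alpha * dE/dw, with dE/dw0 = -2*diff and dE/dw1 = -2*diff*x
            w[0] += -learning_rate * (-2.0 * diff)
            w[1] += -learning_rate * (-2.0 * diff * x)
        if total_error < tolerance:                    # 4. stop when the error is small
            break
    return w

# Hypothetical data generated from actual = 1 + 2 * x
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
targets = [1.0 + 2.0 * x for x in xs]

w = train_neuron(xs, targets)
print([round(v, 3) for v in w])
```

In a network with hidden layers the same update is applied to every weight, with the partial derivatives obtained through the chain rule.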

Page 74: Machine learning overview (with SAS software)

DEEP LEARNING NEURAL NETWORK WITH MORE THAN 2 HIDDEN LAYERS

Page 75: Machine learning overview (with SAS software)

NEURAL NETS AUTOENCODERS

http://support.sas.com/resources/papers/proceedings14/SAS313-2014.pdf

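Neural networks that use inputs to predict the inputs

[Figure: network with inputs X1, X2, X3, X4, an ENCODE part, a 2-dimensional middle layer, a DECODE part, and outputs X1, X2, X3, X4]

A linear activation function corresponds with 2-dimensional principal components analysis

The 2-dimensional middle layer can be used for visualisation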

Page 76: Machine learning overview (with SAS software)

NEURAL NETS AUTOENCODERS

http://support.sas.com/resources/papers/proceedings14/SAS313-2014.pdf

Often more hidden layers with many nodes

ENCODE DECODE

INPUT OUTPUT = INPUT

Page 77: Machine learning overview (with SAS software)

NEURAL NET CARS EXAMPLE

[Figures: 2-dimensional PCA; autoencoder network 25 – 15 – 2 – 15 – 25]

Page 78: Machine learning overview (with SAS software)

NEURAL NETS AUTOENCODER EXAMPLE

• 1000 images of digits

• Each image has 400 pixels

• So a 400 dimensional input vector X = (x1,…,x400)

• Compare two-dimensional PCA with a neural net autoencoder

Page 79: Machine learning overview (with SAS software)

NEURAL NETS AUTOENCODER EXAMPLE

proc neural

data= autoencoderTraining

dmdbcat= work.autoencoderTrainingCat;

performance compile details cpucount= 12 threads= yes;

/* DEFAULTS: ACT= TANH COMBINE= LINEAR */

/* IDS ARE USED AS LAYER INDICATORS – SEE FIGURE 6 */

/* INPUTS AND TARGETS SHOULD BE STANDARDIZED */

archi MLP hidden= 5;

hidden 300 / id= h1;

hidden 100 / id= h2;

hidden 2 / id= h3 act= linear;

hidden 100 / id= h4;

hidden 300 / id= h5;

input corruptedPixel1 - corruptedPixel400 / id= i level= int std= std;

target pixel1-pixel400 / act= identity id= t level= int std= std;

/* BEFORE PRELIMINARY TRAINING WEIGHTS WILL BE RANDOM */

initial random= 123;

prelim 10 preiter= 10;

run;

Page 80: Machine learning overview (with SAS software)

Two-dimensional representation of 400-dimensional ‘digit’ data

Page 81: Machine learning overview (with SAS software)

BAYESIAN NETWORKS

Page 82: Machine learning overview (with SAS software)

BAYESIAN NETWORKS -- ACYCLIC GRAPHICAL MODELS

• Nodes represent random variables

• Links between nodes represent conditional dependencies

• Conditional probability tables are derived from training data for each node

• Random variables are typically binary or discrete

• The graph structure can be learned from the data

Page 83: Machine learning overview (with SAS software)

Page 84: Machine learning overview (with SAS software)

TEXT MINING

Page 85: Machine learning overview (with SAS software)

TEXT MINING BASICS

“Advanced” word counting

Parse & Filter:
• Part of speech
• Entity detection
• Mixed / numeric / abbrev.
• Stemming
• Spell checks, stop list, synonym list
• Multi-term words

Apply traditional data mining:
• Clustering
• Prediction / machine learning

Page 86: Machine learning overview (with SAS software)

TEXT MINING BASICS

Document 1: “Ik loop over straat in Amsterdam, 1057DK, met mijn fiets” (Dutch: “I walk down the street in Amsterdam, 1057DK, with my bike”)

Document 2: “Zij liep niet maar fietste met haar blauwe fieets, //bitly.com/sdrtw” (Dutch: “She did not walk but cycled on her blue bike [misspelled], //bitly.com/sdrtw”)

Document 3: “Mijn tweewieler is kapot, wat een slecht stuk ijzer, @#$%$@!” (Dutch: “My two-wheeler is broken, what a bad piece of iron, @#$%$@!”)

TERM DOCUMENT MATRIX: A

Terms                           Doc 1   Doc 2   Doc 3
+Fiets (noun)                     1       1       1
Fietsen (verb)                    0       1       0
Blauwe (adjective)                0       1       0
Amsterdam (location)              1       0       0
+Lopen (verb)                     1       1       0
Straat (noun)                     1       0       0
Kapot (adverb)                    0       0       1
Slecht                            0       0       1
Stuk Ijzer                        0       0       1
1057DK (postal code)              1       0       0
//bitly.com/sdrtw (Internet)      0       1       0

• Each text document is a (very) long vector of word counts (often with many zeros!)

• Apply further mining on this matrix A.
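A rough Python sketch of how such a term-document matrix can be built from the three example documents. The `NORMALIZE` and `STOP` lookups are hypothetical hand-made stand-ins for stemming, spell checking, synonym lists and stop lists; multi-term words and entity tagging are left out, so "stuk" and "ijzer" end up as separate terms.

```python
from collections import Counter

docs = [
    "Ik loop over straat in Amsterdam, 1057DK, met mijn fiets",
    "Zij liep niet maar fietste met haar blauwe fieets, //bitly.com/sdrtw",
    "Mijn tweewieler is kapot, wat een slecht stuk ijzer, @#$%$@!",
]

# Hypothetical lookups: misspellings, inflections and synonyms map to a parent term.
NORMALIZE = {"fieets": "fiets", "tweewieler": "fiets",
             "loop": "lopen", "liep": "lopen", "fietste": "fietsen"}
STOP = {"ik", "over", "in", "met", "mijn", "zij", "niet",
        "maar", "haar", "is", "wat", "een"}

def tokenize(text):
    tokens = []
    for raw in text.lower().split():
        tok = raw.strip(",.!?")
        # Drop stop words and tokens without any letter or digit.
        if tok in STOP or not any(ch.isalnum() for ch in tok):
            continue
        tokens.append(NORMALIZE.get(tok, tok))
    return tokens

counts = [Counter(tokenize(d)) for d in docs]
terms = sorted(set().union(*counts))
# Rows are terms, columns are documents, as in the matrix A above.
A = [[c[t] for c in counts] for t in terms]

for term, row in zip(terms, A):
    print(f"{term:20s} {row}")
```

Even for three short sentences most entries are zero, which is the sparsity the slide refers to.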

Page 87: Machine learning overview (with SAS software)

TEXT MINING TERM DOCUMENT MATRIX A

Applying data mining techniques directly to the term document matrix is usually not useful:

• Often more terms than documents

• Rows could be strongly correlated

• Matrix is often very sparse

Apply singular value decomposition (SVD) first.

Page 88: Machine learning overview (with SAS software)

TEXT MINING SVD ON THE TERM DOCUMENT MATRIX A

After the SVD, a document d is no longer a long vector of m word counts but a much shorter vector $\hat{d}$, say of length 300.

Matrix SVD decomposition:

\[ A = U \Sigma V^{T} \]

where $\Sigma$ is diagonal with r singular values (r could be many thousands).

Take only the first k << r singular values:

\[ A_k = U_k \Sigma_k V_k^{T} \]
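To make the "much shorter vector" concrete: one common convention from latent semantic indexing maps a length-m term-count column d to a length-k vector as shown below. This is an assumption for illustration; the slide does not say which projection SAS Text Miner applies.

```latex
\hat{d} = \Sigma_k^{-1} U_k^{T} d,
\qquad U_k \in \mathbb{R}^{m \times k},\quad
\Sigma_k \in \mathbb{R}^{k \times k},\quad
d \in \mathbb{R}^{m},\quad
\hat{d} \in \mathbb{R}^{k}
```

Under this convention, applying the map to the columns of A itself returns the columns of $V_k^{T}$, because $\Sigma_k^{-1} U_k^{T} A = V_k^{T}$.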

Page 89: Machine learning overview (with SAS software)

TEXT MINING APPLICATIONS

Combine structured and unstructured customer data to better predict behaviour (churn / fraud)
• Apply machine learning to create a model f to predict the target

Automatically generate topics within large document collections
• Apply clustering techniques to group documents into clusters (topics)

[Figure: document clusters labelled Topic 1, Topic 2, Topic 3]

Page 90: Machine learning overview (with SAS software)

RECOMMENDATION ENGINE

Which product should I recommend to my customers?

Page 91: Machine learning overview (with SAS software)

RECOMMENDATION ENGINE USER – ITEM MATRIX EXPLICIT RECOMMENDATIONS

Users rated items (products) explicitly

Matrix is often very sparse: e.g. 1 million users × 100K items, with perhaps only ~0.01% of the ratings known

User - Item Matrix – Data

          Item 1  Item 2  Item 3  Item 4  Item 5
User 1      3       2       5       4       5
User 2      -       -       -       1       1
User 3      1       -       2       5       -
User 4      -       -       1       2       5
User 5      2       1       4       2       3
User 6      2       3       -       5       1
User 7      5       1       -       3       4
User 8      -       1       -       4       1
User 9      2       3       2       4       2
User 10     -       1       3       -       1

User 4's Item Ratings

User 4      -       -       1       2       5

After some math…. recommendations are:

User 4      3.21    4.82    1       2       5

Recommend item 2!

Page 92: Machine learning overview (with SAS software)

RECOMMENDATION ENGINE ALGORITHMS IN PROC RECOMMEND

Memory-based algorithms
• Slope one (slope1)
• K nearest neighbors (knn)

Model-based algorithms
• Matrix factorization (SVD - LBFGS)

Market basket analysis
• Association rules mining (arm)

Mixture of different methods
• Clustering (cluster)
• Ensemble

Page 93: Machine learning overview (with SAS software)

RE METHODS SLOPE ONE

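y = x + b, with slope equal to 1

See notes

Item-item based:

\[ r_{ui} = \frac{\sum_{j} w_{ij}\, r_{uj}}{\sum_{j} w_{ij}} \]

Weight $w_{ij}$: the number of users who have rated both items i and j.

Rating $r_{uj}$: the rating for item i estimated from item j (user u's rating of item j plus the average rating difference between items i and j).

Sample rating database

Customer   Item A   Item B   Item C
John         5        3        2
Mark         3        4        ??
Lucy         ??       2        5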
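The weighted Slope One scheme can be sketched in a few lines of Python, applied to the sample rating database above. The function name `slope_one_predict` and the dictionary layout are illustrative choices, not part of PROC RECOMMEND.

```python
# Sample rating database from the slide; missing ratings are simply absent.
ratings = {
    "John": {"A": 5, "B": 3, "C": 2},
    "Mark": {"A": 3, "B": 4},
    "Lucy": {"B": 2, "C": 5},
}

def slope_one_predict(ratings, user, item):
    """Weighted Slope One prediction of `user`'s rating for `item`."""
    numerator = 0.0
    denominator = 0
    for other_item, user_rating in ratings[user].items():
        if other_item == item:
            continue
        # Rating differences (item - other_item) over users who rated both.
        diffs = [r[item] - r[other_item] for r in ratings.values()
                 if item in r and other_item in r]
        if not diffs:
            continue
        weight = len(diffs)             # w_ij: number of co-raters
        avg_diff = sum(diffs) / weight  # b in y = x + b
        numerator += weight * (user_rating + avg_diff)
        denominator += weight
    return numerator / denominator if denominator else None

lucy_a = slope_one_predict(ratings, "Lucy", "A")
mark_c = slope_one_predict(ratings, "Mark", "C")
print(f"Lucy, item A: {lucy_a:.2f}")
print(f"Mark, item C: {mark_c:.2f}")
```

For Lucy, item B contributes 2 + 0.5 with weight 2 and item C contributes 5 + 3 with weight 1, so the prediction for item A is 13/3.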

Page 94: Machine learning overview (with SAS software)

RE METHODS K NEAREST NEIGHBORS

The rating $r_{ui}$ is determined by the ratings “in the neighborhood”:

\[ r_{ui} = \frac{\sum_{j \in N(i;u)} \mathrm{sim}_{ij}\, r_{uj}}{\sum_{j \in N(i;u)} \mathrm{sim}_{ij}} \]

How to determine the neighbors N, and how many (k) to use?

How to compute the similarity/distance measure $\mathrm{sim}_{ij}$ (also written $w_{ij}$)?

• Pearson’s correlation coefficient

• Cosine distance

• Other adjustments

[Figure labels: similarity w, neighbors N]
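A small Python sketch of the item-based neighborhood formula. It assumes cosine similarity computed over the users who rated both items, and takes the k most similar items the user has already rated as the neighborhood N(i;u). The toy ratings and the name `knn_predict` are hypothetical.

```python
import math

# Hypothetical toy ratings: user -> {item: rating}.
ratings = {
    "u1": {"i1": 5, "i2": 4, "i3": 1},
    "u2": {"i1": 4, "i2": 5, "i3": 2},
    "u3": {"i2": 4, "i3": 1},
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def knn_predict(ratings, user, item, k=2):
    """Similarity-weighted average of the user's ratings on the k nearest items."""
    candidates = []
    for j, r_uj in ratings[user].items():
        if j == item:
            continue
        # Users who rated both the target item and candidate item j.
        common = [u for u, r in ratings.items() if item in r and j in r]
        if not common:
            continue
        sim = cosine([ratings[u][item] for u in common],
                     [ratings[u][j] for u in common])
        candidates.append((sim, r_uj))
    neighbors = sorted(candidates, reverse=True)[:k]   # N(i;u)
    total = sum(sim for sim, _ in neighbors)
    return sum(sim * r for sim, r in neighbors) / total if total else None

pred_k1 = knn_predict(ratings, "u3", "i1", k=1)
pred_k2 = knn_predict(ratings, "u3", "i1", k=2)
print(pred_k1, pred_k2)
```

With k = 1 only the most similar item (i2) is used, so the prediction equals the user's rating of that item. Larger k averages over more, and less similar, neighbors.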

Page 95: Machine learning overview (with SAS software)

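RE METHODS PEARSON CORRELATION

$a, b$ : users

$r_{a,p}$ : rating of user $a$ for item $p$

$\bar{r}_a$ : average rating of user $a$

$P$ : set of items rated by both $a$ and $b$

• Possible similarity values lie between −1 and 1

\[ \mathrm{sim}(a,b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2}\; \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \]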
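The Pearson similarity can be sketched in Python as follows, using users 1, 4 and 5 from the user-item matrix shown earlier. One assumption to flag: the averages are taken over the co-rated items P only, while some implementations use each user's overall average instead.

```python
import math

def pearson(ratings_a, ratings_b):
    """Pearson similarity between two users over their co-rated items."""
    common = [p for p in ratings_a if p in ratings_b]   # the set P
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[p] for p in common) / len(common)
    mean_b = sum(ratings_b[p] for p in common) / len(common)
    num = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in common)
    den = (math.sqrt(sum((ratings_a[p] - mean_a) ** 2 for p in common))
           * math.sqrt(sum((ratings_b[p] - mean_b) ** 2 for p in common)))
    return num / den if den else 0.0

# Rows taken from the user-item matrix example.
user1 = {"Item 1": 3, "Item 2": 2, "Item 3": 5, "Item 4": 4, "Item 5": 5}
user5 = {"Item 1": 2, "Item 2": 1, "Item 3": 4, "Item 4": 2, "Item 5": 3}
user4 = {"Item 3": 1, "Item 4": 2, "Item 5": 5}

sim_1_5 = pearson(user1, user5)
sim_1_4 = pearson(user1, user4)
print(f"sim(user 1, user 5) = {sim_1_5:.3f}")
print(f"sim(user 1, user 4) = {sim_1_4:.3f}")
```

Users 1 and 5 rate on different levels but in a similar pattern, which is what the mean-centering in the formula is meant to capture.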

Page 96: Machine learning overview (with SAS software)

RE METHODS K NEAREST NEIGHBORS METHOD

Page 97: Machine learning overview (with SAS software)

RE METHODS MATRIX FACTORIZATION

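How do we fill in the missing data?

Factorize the m × n ratings matrix R (m users, n items) into an m × k matrix U and a k × n matrix V:

\[ R \approx U V \]

• Select loss function (squared error)

• Select the number of hidden factors k

• Optimization problem: L-BFGS, ALS

Predict new rating:

\[ \hat{R}_{ij} = U_i^{T} V_j \]

Minimize prediction error:

\[ \min_{U,V} \sum_{i,j} \left( R_{ij} - U_i^{T} V_j \right)^2 + \lambda \left( \lVert U_i \rVert^2 + \lVert V_j \rVert^2 \right) \]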
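A minimal Python sketch of this regularized factorization on the user-item matrix from the earlier slide. Note the swap: it uses plain stochastic gradient descent rather than the L-BFGS or ALS optimizers named above, and the values of k, λ, the learning rate and the number of epochs are arbitrary illustrative choices.

```python
import random

# User-item ratings from the earlier example; None marks a missing rating.
R = [
    [3, 2, 5, 4, 5],
    [None, None, None, 1, 1],
    [1, None, 2, 5, None],
    [None, None, 1, 2, 5],
    [2, 1, 4, 2, 3],
    [2, 3, None, 5, 1],
    [5, 1, None, 3, 4],
    [None, 1, None, 4, 1],
    [2, 3, 2, 4, 2],
    [None, 1, 3, None, 1],
]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def loss(obs, U, V, lam):
    """Squared error plus L2 penalty, summed over the observed ratings."""
    return sum((r - dot(U[i], V[j])) ** 2
               + lam * (dot(U[i], U[i]) + dot(V[j], V[j]))
               for i, j, r in obs)

def factorize(R, k=2, lam=0.1, lr=0.01, epochs=500, seed=0):
    rng = random.Random(seed)
    m, n = len(R), len(R[0])
    obs = [(i, j, R[i][j]) for i in range(m) for j in range(n)
           if R[i][j] is not None]
    U = [[rng.random() for _ in range(k)] for _ in range(m)]  # user factors
    V = [[rng.random() for _ in range(k)] for _ in range(n)]  # item factors
    history = [loss(obs, U, V, lam)]
    for _ in range(epochs):
        for i, j, r in obs:
            err = r - dot(U[i], V[j])
            for f in range(k):
                u, v = U[i][f], V[j][f]
                U[i][f] += lr * (err * v - lam * u)
                V[j][f] += lr * (err * u - lam * v)
        history.append(loss(obs, U, V, lam))
    return U, V, history

U, V, history = factorize(R)
# Predicted ratings for User 4 (row index 3), including the two missing items.
user4 = [round(dot(U[3], V[j]), 2) for j in range(5)]
print("loss before/after:", round(history[0], 1), round(history[-1], 1))
print("User 4 predictions:", user4)
```

The predictions depend on k, λ and the random start, so they will not reproduce the numbers shown on the user-item matrix slide.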

Page 98: Machine learning overview (with SAS software)

RE METHODS CLUSTER

[Diagram: user/item profile and user/item rating → clustering → kNN within one subgroup → predictions]

Page 99: Machine learning overview (with SAS software)

RE METHOD ASSOCIATION RULE MINING (MARKET BASKET ANALYSIS)

Basic steps for association rule mining

Identify frequent itemsets (rules) in the transaction data:

• IF item A and B THEN item C

• IF item X THEN item Y

Not all rules are interesting; use ‘support’ and ‘lift’ to judge the importance of a rule:

\[ \mathrm{Support}(X,Y) = \frac{\#\ \text{trxs. containing both } X \text{ and } Y}{\text{Total } \#\ \text{trxs.}} \]

\[ \mathrm{Lift} = \frac{\mathrm{Support}(X,Y)}{\mathrm{Support}(X) \cdot \mathrm{Support}(Y)} \]

Support & Lift

Diapers, Beer: 0.8%

Diapers, Candles: 0.018%

For example, a lift of 2.5 means: people who have X are 2.5 times as likely to buy Y as customers in general.
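The two measures are easy to compute directly. The Python sketch below uses eight made-up baskets (hypothetical data, not the figures from the slide) to show the arithmetic.

```python
# Hypothetical toy transaction data: one set of items per basket.
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "candles"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"diapers", "beer", "chips"},
    {"bread", "candles"},
    {"milk", "chips"},
]

def support(transactions, *items):
    """Fraction of transactions that contain all the given items."""
    wanted = set(items)
    return sum(1 for t in transactions if wanted <= t) / len(transactions)

def lift(transactions, x, y):
    """Support(X,Y) / (Support(X) * Support(Y))."""
    return support(transactions, x, y) / (
        support(transactions, x) * support(transactions, y))

s_xy = support(transactions, "diapers", "beer")
l_xy = lift(transactions, "diapers", "beer")
print(f"support(diapers, beer) = {s_xy:.3f}")
print(f"lift(diapers -> beer)  = {l_xy:.2f}")
```

A lift above 1 indicates that the two items occur together more often than would be expected if they were bought independently.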

Page 100: Machine learning overview (with SAS software)

METHOD ENSEMBLE

Linear combination of previous methods

Achieve better performance

Page 101: Machine learning overview (with SAS software)

PROC RECOMMEND recom = rs.IENS;

   * Add a recommendation system;
   ADD rs.IENS / item = item user = user rating = rating;

   * Add tables;
   ADDTABLE LHL1209.IENS_UIR / recom = rs.IENS type = rating vars=(item user rating);

   * Method SVD LBFGS with 20 factors;
   METHOD svd /
      factors = 20
      label = "svd"
      fconv = 1e-3
      gconv = 1e-3
      maxiter = 100
      MAXFEVAL = 5000
      function = L2
      lamda = 0.2
      technique = lbfgs;
   RUN;

   METHOD ARM /
      label = "ARM";
   RUN;

   /* information on the recommender system */
   INFO;
QUIT;

Page 102: Machine learning overview (with SAS software)

/** prediction with the SVD method ***/
PROC RECOMMEND recom = rs.IENS;
   PREDICT /
      method = svd
      label = "svd"
      Num = 3
      users = ("Longhow Lam");
   run;
QUIT;

Page 103: Machine learning overview (with SAS software)

LAST SLIDE

Page 104: Machine learning overview (with SAS software)

PROS AND CONS OF MORE MODERN MACHINE LEARNING

(compared to traditional linear / logistic regression)

CONS
• Unfamiliar to a broader audience, (more) difficult to explain
• Black box approach (you are rejected: “The computer says NO”)
• Often relations can already be modeled with classical regression models
• It allows you to not think about the business problem

PROS
• Often less data prep (manual tuning) necessary (just throw it in the algorithm…)
• Interactions often “automatically” taken into account
• Superior for text mining, image & speech recognition
• Better lift possible (a few percent “for free”)
• It allows you to not think about the business problem

Page 105: Machine learning overview (with SAS software)

WHY SAS FOR MACHINE LEARNING

• Many different techniques

• Easy-to-use GUIs combined with flexible coding

• High performance scalability

• Easily deployable

Page 106: Machine learning overview (with SAS software)

SOME MACHINE LEARNING EXAMPLES

Text mining

Image recognition

Sound recognition

Strange faces

So can a machine read, see and hear?

Page 107: Machine learning overview (with SAS software)

PREDICTING SENTIMENT FROM RESTAURANT REVIEWS

Page 108: Machine learning overview (with SAS software)

IENS REVIEWS COLLECTED AROUND 16,000 REVIEWS AND THEIR SCORES

Used Text Miner to parse and filter the reviews, and to transform them into data points in SVD space.

Page 109: Machine learning overview (with SAS software)

USE MACHINE LEARNING TO PREDICT TARGET WITH THE 300 INPUTS

[Figure: predicted review score vs. given review score]

R² linear regression = 0.5

R² neural net = 0.6

Page 110: Machine learning overview (with SAS software)

Page 111: Machine learning overview (with SAS software)

IENS REVIEWS APPLY MODEL ON ‘NEW REVIEWS’

Page 112: Machine learning overview (with SAS software)

MNIST DATA IN SAS

MODIFIED NATIONAL INSTITUTE OF STANDARDS AND TECHNOLOGY

Page 113: Machine learning overview (with SAS software)

MNIST TRAINING DATA

42,000 pictures of hand-written digits

Each digit is a picture of 28 by 28 pixels

So a 784-dimensional vector

First 100 digits of the MNIST data and their known labels in red

Page 114: Machine learning overview (with SAS software)

MNIST DATA TRYING DIFFERENT LEARNING TECHNIQUES

8-nearest neighbours has the lowest misclassification rate: 3.6% of the digits in the validation set are misclassified.

70/30 training/validation split

PCA regression on the 50 largest PCs

Seven single-layer neural nets: 3, 6, 12, 24, 48, 100, 200 neurons

Seven multi-layer neural nets

Three random forests: 100, 500 and 1000 trees

8, 16 and 24 nearest neighbors

Page 115: Machine learning overview (with SAS software)

MNIST DATA APPLY MODEL ON TEST SET

28,000 digits without known labels.

Our best model predicted the label for these digits.

The first 100 predicted digits, together with the handwritten digits, are displayed here.

Red numbers are predicted labels. We see some obvious mistakes…

Page 116: Machine learning overview (with SAS software)

SPEECH RECOGNITION

DIGITS RECORDED WITH IPHONE

1 2

Page 117: Machine learning overview (with SAS software)

SPEECH RECOGNITION

WAV files consist of ~30,000 points: too much redundancy

Use spectral analysis to convert the signal to the frequency domain

Still too much: apply principal components

TRAIN DATA

8 spoken ‘ones’ in wav files

8 spoken ‘twos’ in wav files
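To illustrate the spectral-analysis step, here is a small Python sketch that converts a signal to the frequency domain with a naive discrete Fourier transform. The 64-point synthetic sine wave is a stand-in for a real recording, and the O(n²) loop is for illustration only; in practice an FFT routine would be used on ~30,000 points.

```python
import cmath
import math

def magnitude_spectrum(signal):
    """Magnitudes of the DFT coefficients for frequency bins 0 .. n/2."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal))
        spectrum.append(abs(coeff) / n)
    return spectrum

# Synthetic stand-in signal: a pure tone with 5 cycles over 64 samples.
n = 64
signal = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]

spectrum = magnitude_spectrum(signal)
peak = max(range(len(spectrum)), key=spectrum.__getitem__)
print("strongest frequency bin:", peak)
```

The 64 time samples collapse to a spectrum with a single dominant bin, which is the kind of redundancy reduction the slide describes before principal components are applied.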

Page 118: Machine learning overview (with SAS software)

SPEECH RECOGNITION

Page 119: Machine learning overview (with SAS software)

SPEECH RECOGNITION

Zero errors on training data

Zero errors on test data

Also 8 ‘ones’ and 8 ‘twos’

In Enterprise Miner:

Neural network with 9 neurons in one hidden layer

Page 120: Machine learning overview (with SAS software)

STRANGE FACE DETECTION COMBO OF OPEN API / R & SAS

Little joke on my colleagues….

Page 121: Machine learning overview (with SAS software)

STRANGE FACE DETECTION COMBO OF OPEN API / R & SAS

Get a free API key for Face++

Their API returns 83 facial landmarks (in JSON format)

Apply advanced analytics on the ABT (analytical base table)
• Which faces are look-alikes? proc cluster (hierarchical clustering)
• Sales faces? Predictive modeling / machine learning
• Who is the Brad Pitt? Nearest neighbor
• Strange faces? proc neural / auto-encoder

Create R script to
• Retrieve the SAS faces from our site
• Put them through the Face++ API
• Collect JSON results and store them in an ABT

Page 122: Machine learning overview (with SAS software)

STRANGE FACE DETECTION LOOK-ALIKE FACES

Page 123: Machine learning overview (with SAS software)

STRANGE FACE DETECTION BRAD PITT LOOK-ALIKES

Page 124: Machine learning overview (with SAS software)

STRANGE FACE DETECTION STRANGE FACES

SAS Faces, Actors Faces

Read more on my blog

Page 125: Machine learning overview (with SAS software)

STRANGE FACE DETECTION COMBO OF OPEN API / R & SAS

SAS Faces, Actors Faces

Read more on my blog