Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) §...

30
1 © 2016 The MathWorks, Inc. Supporting Data Science Workflows with MATLAB Marta Wilczkowiak Eugene McGoldrick MathWorks

Transcript of Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) §...

Page 1: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

1© 2016 The MathWorks, Inc.

Supporting Data Science Workflows with MATLAB

Marta WilczkowiakEugene McGoldrickMathWorks

Page 2: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

2

DomainExpertise

Computer Science

Mathematics Statistics

Page 3: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

3

DomainExpertise

Computer Science

Mathematics Statistics

“ThisyearIwasplanningontryingoutmachinelearningusingTensorFlow orCaffe...butthebarriertoentryseemedveryhighthere

Page 4: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

4

Flickr Anzhelika German

Page 5: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

5

DomainExpertise

Computer Science

Mathematics Statistics

Page 6: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

6

PROTOTYPING, MODELLING &

ANALYSISAPPLICATION

DEVELOPMENT

TESTING: VALIDATION AND VERIFICATION

(ENTERPRISE) IMPLEMENTATION

DomainExpertise

Computer Science

Mathematics Statistics

Page 7: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

7

Examples

Page 8: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

8

Getting started with Machine Learning

§ User: Subject matter expert with no prior experience with machine learning§ Objective: Improve current methods using machine learning § Characteristics: Many methods, but :

– Similar interfaces– Similar problems: overfitting, trade-off false-positives/negatives

Page 9: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

9

MATLAB has Apps

Page 10: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

10

Before After

User had to choose a classifier to get started.

A complex tree is pre-selected so user can get started quickly.

Page 11: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

11

Before After

User has to train each model one by one. New options to allow users to train multiple models.

Page 12: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

12

Before After

User could only see the legend for the first three classes.

User can see the legend for all the classes and can show/hide each class.

Page 13: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

13

Deep Learning

§ User: Subject matter expert with no experience with deep learning§ Objective: Feasibility study on applicability of deep learning § Characteristics:

– Large optimisation problem– GPUs are great to accelerate cost function calculations– Lots of parameters. Effectiveness of developing a network depends on ability to

evaluate impact of different parameters

Page 14: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

14

What is Deep Learning ? Deep learning performs end-end learning by learning features, representations and tasks directly from images, text and sound

Traditional Machine Learning

Machine Learning

ClassificationManual Feature Extraction

TruckûCar ü

Bicycleûll

Deep Learning approach

…𝟗𝟓%𝟑%ll𝟐%

TruckûCar ü

Bicycleû

ll

Convolutional Neural Network (CNN)Learned features

End-to-end learning

Feature learning + Classification

Page 15: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

15

Why is Deep Learning so Popular ?

§ Results: Achieved substantially better results on ImageNet large scale recognition challenge– 95% + accuracy on ImageNet 1000 class

challenge

§ Computing Power: GPU’s and advances to processor technologies have enabled us to train networks on massive sets of data.

§ Data: Availability of storage and access to large sets of labeled data – E.g. ImageNet , PASCAL VoC , Kaggle

Year Error Rate Pre-2012 (traditionalcomputer vision and machine learning techniques)

> 25%

2012 (Deep Learning ) ~ 15%

2015 ( Deep Learning) <5 %

Page 16: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

16

Convolutional Neural Networks

Page 17: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

17

Code…%% Define a CNN architectureconv1 = convolution2dLayer(5,32,'Padding',2, 'BiasLearnRateFactor',2);fc1 = fullyConnectedLayer(64,'BiasLearnRateFactor',2);fc2 = fullyConnectedLayer(10,'BiasLearnRateFactor',2);

layers = [ ...imageInputLayer([32 32 3]);

convolution2dLayer(5,32,'Padding',2, 'BiasLearnRateFactor',2);maxPooling2dLayer(3,'Stride',2);reluLayer();

fullyConnectedLayer(10,'BiasLearnRateFactor',2);

softmaxLayer()classificationLayer()];

opts = trainingOptions('sgdm', 'InitialLearnRate', 0.001 ...

%% Training the CNN

[net, info] = trainNetwork(XTrain, TTrain, layers);

Page 18: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

18

Code…% everything except the last 3 layers.layers = net.Layers(1:end-3);

% Add new fully connected layer for 2 categories.

layers(end+1) = fullyConnectedLayer(2, 'Name', 'fc8_2')

% Add the softmax layer and the classification layer which make up the

% remaining portion of the networks classification layers.layers(end+1) = softmaxLayerlayers(end+1) = classificationLayer()

net = trainNetwork(trainingDS, layers, opts);

Page 19: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

19

IwasawareofmanyoftheprinciplesandtechniquesyoucoveredbuthadnoideatheyweresoeasilyaccessibleinMATLABincomparisontootherpackages(R,Wekaetc.).”

“ThisyearIwasplanningontryingoutmachinelearningusingTensorFlow orCaffe...butthebarriertoentryseemedveryhighthere,soamhappythatwithMATLABIcangetcrackingrelativelyquickly.”

DomainExpertise

Computer Science

Mathematics Statistics

Page 20: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

20

PROTOTYPING, MODELLING &

ANALYSISAPPLICATION

DEVELOPMENT

TESTING: VALIDATION AND VERIFICATION

(ENTERPRISE) IMPLEMENTATION

DomainExpertise

Computer Science

Mathematics Statistics

Page 21: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

21© 2016 The MathWorks, Inc.

Deploying MATLAB Data Analytics on the desktop and in Enterprise Production Systems

Eugene McGoldrick

Page 22: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

22

MATLAB Analytics Deployment Options

§ MATLAB Compiler and Compiler SDK targets many different platforms

– Desktop applications– Web applications– Database Servers– Big Data Solutions

§ Efficient development, testing, and deployment workflow– Single source code base– Optimize and test models in MATLAB– Simple compile and deployment mechanism to give access to analytic models to non

MATLAB users and processes

Page 23: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

23

MATLAB Data Analytics embedded as Excel Add-ins§ Desktop Excel Add-ins with compiler

§ Server based Excel Add-ins with Compiler SDK

MATLAB Production Server

RequestBroker

&ProgramManager

.

.

.

XLAMATLAB Model

MATLAB Runtime

XLA

XLA

XLA

Generated by MATLAB Compiler SDK

Generated by MATLAB Compiler

Page 24: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

24

App ServerWeb Server

Visualize Portfolio()

computeStats()

initSession()

computeFrontier()

ClientCSS

SVG JS

JS

AssetAllocStartHTML

AssetAllocResultJSP

PortOpAssetAlloc

CTF

MATLAB Production Server

Req

uest

Bro

ker

MATLABPortOptIMATLABPortOpt

Tomcat

AssetAllocProc

Scalable Web based solutions

Page 25: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

25

Workflow for Embedding MATLAB Data Analytics Components in Production Systems

§ Financial Institutions deal with Terabytes/Petabytes of data.

– Unreasonable to send the data over the wire … another solution/process is required. – Embed analytic in data engine

§ The development to production process is a two step process, using MATLAB and Deployment products

– Step 1: Bring Data to MATLAB§ Multiple data sources§ Build algorithms/models§ Test§ Compile to target platform component

– Step 2: Bring algorithm to the Data§ Install MATLAB component into the enterprise production applications.§ Same functionality/single source

Page 26: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

26

Database Server

• Optimize numerical processing within databases– Request MATLAB analytics directly from database servers

– Trigger requests based upon database transactions

• Minimize data handling using Database Toolbox

Integration with Databases

Enterprise Application Client

LibraryStored

Procedure

MATLAB Production Server

RequestBroker

&ProgramManager

Page 27: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

27

Integration with Databases 2016a

§ Native in database JSON/Restful call

§ ORACLE, MS SQL Server, SAS support this in database

Database ServerEnterprise Application JSON/

RestfulStored

Procedure

MATLAB Production Server

RequestBroker

&ProgramManager

Page 28: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

28

Database ServerClientLibrary

StoredProcedure

MATLAB Production Server

RequestBroker

&ProgramManager

Computer memory

• Database Server and MATLAB Production Server can reside in the same memory on physical hardware … occupying different areas of memory

• Efficient use of both database and MPS .. Data transfers via http in memory from one application to the other.

• Compiled MATLAB models on MPS can access the database via Database Toolbox calls in the MATLAB code (ODBC or JDBC calls)

Data can be retrieved or written to database from MATLAB Model

Database Server

Integration with Databases

Page 29: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

29

MATLAB Components in Production Databases

MATLAB Production Server can provide predictive analytics in the database§ Oracle (Java, .NET)§ Microsoft SQL Server (.NET)§ Microsoft Access (.NET)§ Netezza (JAVA)§ SAS (JAVA)§ Teradata (JAVA)

§ Thin client with MPS– Java and .NET supported

§ JSON/Restful API§ Central repository for models … Simplifies change management

MATLAB Production Server

RequestBroker

&ProgramManager

Database Server

Page 30: Marta Wilczkowiak Eugene McGoldrick MathWorks...– Desktop applications ... § SAS (JAVA) § Teradata (JAVA) § Thin client with MPS ... Program Manager Database Server. 30 MATLAB

30

MATLAB Big Data Analytic Components - Hadoop

Hadoop Grid

Compiled MATLAB Hadoop Targets

Compiled MATLAB Hadoop Targets

Compiler generated Hadoop components on each node of Hadoop Grid

• MATLAB is the analytics for big data systems such as Hadoop

• Embedded on-node custom algorithms can be developed in MATLAB and deployed to the grid

MATLAB Model

MATLAB Runtime