Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different...

65
Harry Snart [email protected], linkedin.com/in/harrysnart 1 Production ML with Autonomous Data Warehouse

Transcript of Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different...

Page 1: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake

Harry [email protected], linkedin.com/in/harrysnart

1

Production ML with Autonomous Data Warehouse

Page 2: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake

Future and pastTechCasts:

Formerly called the BIWA Summit with the Spatial and Graph Summit

Same great technical content…new name!

www.AnalyticsandDataSummit.org

Submit a topic to share at https://analyticsanddatasummit.org/techcasts/

Page 3: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake

Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |

Helpful LinksORACLE AUTONOMOUS CLOUDhttps://cloud.oracle.com/tryit

ORACLE AUTONOMOUS HANDS ON LAB FOR DEVELOPERShttps://go.oracle.com/e/f2?LP=82486

ORACLE MACHINE LEARNING ON OTNhttps://www.oracle.com/technetwork/database/options/oml/overview/index.html

OML TUTORIALSBasic getting started: https://docs.oracle.com/en/cloud/paas/autonomous-data-warehouse-cloud/omlug/get-started-oracle-machine-learning.html#GUID-2AEC56A4-E751-48A3-AAA0-0659EDD639BACredit_Scoring_100K Targeting Top Customers: https://oracle.github.io/learning-library/workshops/adwc4dev/?version=Self-Guided&page=SG-intro.md&elqTrackId=e57daae9db8d44bfac4a9e6614175e5a&elqaid=82487&elqat=2&source=%3Aow%3Alp%3Acpo%3A%3A

ORACLE ANALYTICS CLOUDExamples: https://www.oracle.com/solutions/business-analytics/data-visualization/examples.html

3

Page 4: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake

Analytics and Data SummitAll Analytics. All Data. No Nonsense.

February 25-27, 2020Oracle campus in Santa Clara, CA

Formerly called the BIWA Summit with the Spatial and Graph Summit

Same great technical content…new name!

www.AnalyticsandDataSummit.org

Page 5: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake

Production Machine Learning with the Autonomous Data Warehouse

Harry Snart

Page 6: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake

Safe harbor statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions.

The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation.

Page 7: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake

What is Machine Learning?

Page 8: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake

Any technique which enables computers to mimic human behavior

AI techniques that give computers the ability to learn without being explicitly programmed to do so A subset of ML which make

the computation of multi-layer neural networks feasible

ARTIFICIAL INTELLIGENCE

MACHINE LEARNING

DEEP LEARNING

1950’s 1960’s 1970’s 1980’s 1990’s 2000’s 2010s

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

ARTIFICIAL INTELLIGENCE IS NOT A NEW CONCEPT

Page 9: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake

9

Types of Machine Learning Algorithms

Regression Classification Unsupervised

Supervised

Page 10: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake

About today’s demo

Using binary classification models to classify labels in a dataset Data from UCI ML repository:

https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)

Has been used in the past to run Kaggle competitions Dataset is cell attributes of breast tissue where a tumour has been found We use a historic dataset to predict whether a given tumour is benign or

malignant

Page 11: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake

Goals of Demo

Use Data Visualization to do Exploratory Data Analysis (EDA)

Profile data

Identify any inputs we might want to exclude if model is overfitting

Identify shape of training dataset

Rapid prototype a model and evaluate using DVML

Use the Autonomous Data Warehouse to Productionise our Model

Load data into ADW

Use the Oracle Machine Learning Notebooks to build an in-database model

Expose model as a set of REST APIs

Document our APIs for Application Developers to use in their Apps

Page 12: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake

Why is it difficult to drop Machine Learning Models

into Production?

Page 13: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake

61%Organizations have Data Science initiatives

Page 14: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake

87%Data Science Projects Fail

Page 15: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake

Data Scientist != Computer Scientist

Page 16: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake

Machine Learning Models

Unsupervised Learning

Clustering

Anomaly Detection

Neural Networks

Regression

Classification

Sentiment Analysis

Time Series

Dimensionality Reduction

Deep Learning

Sound Recognition

Image Recognition

Language Processing

Object Classification

Supervised Learning

Page 17: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake

Application DevelopmentMachine Learning

Page 18: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake

Example - Openstack

+

Apps

RE

ST

Page 19: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake

What Makes Oracle Data Science Solutions Different

ALL Data Management

Integrated Data Management Platform with big data / data lake capabilities

Self-driving, self-tuning, self-recovering data management and administration

Data Science Capabilities

Parallel, scalable Machine Learning algorithms

In-database processing -> move the algorithm to the data

Notebook and GUI development interfaces

Collaborative development environments

Integrated ML Capabilities (Graph, Text, Spatial…)

Enriched with the best of Open Source

High Performance Cloud Infrastructure

Integrated easy to use platform that minimizes the difficulties of development and adoption

Focus on solving business problems by maximizing end-user efficiency

19

Page 20: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake

Oracle Machine Learning Algorithms and Analytics

Copyright © 2019, Oracle and/or its affiliates. All rights reserved.20

Classification• Naïve Bayes• Logistic Regression (GLM)• Decision Tree• Random Forest• Neural Network• Support Vector Machine• Explicit Semantic Analysis

Clustering• Hierarchical K-means• Hierarchical O-cluster• Expectation Maximization (EM)

Anomaly Detection• One-class SVM

Time Series• Forecasting - Exponential

Smoothing• Includes Popular Models

E.G. Holt-winters With Trends, Seasonality, Irregularity, Missing Data

Regression• Linear Model• Generalized Linear Model• Support Vector Machine (SVM)• Stepwise Linear Regression• Neural Network• Lasso

Attribute Importance• Minimum Description Length• Principal Comp Analysis (PCA)• Unsupervised Pair-wise KL Div• CUR Decomposition For Row &

AI

Association Rules• A Priori/ Market Basket

Predictive Queries• Predict, Cluster, Detect, Features

SQL Analytics• SQL Windows• SQL Patterns• SQL Aggregates

Feature Extraction• Principal Comp Analysis (PCA)• Non-negative Matrix Factorization• Singular Value Decomposition (SVD)• Explicit Semantic Analysis (ESA)

Text Mining Support• Algorithms Support Text• Tokenization And Theme Extraction• Explicit Semantic Analysis (ESA) For

Document Similarity

Statistical Functions• Basic Statistics: Min, Max,

Median, Stdev, T-test, F-test, Pearson’s, Chi-sq, ANOVA, Etc.

R and Python Packages• Third-party R And Python Packages

Through Embedded Execution• Spark Mllib Algorithm Integration

Page 21: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake

The Machine Learning Process

21

Business Understanding

Data Understanding

Data Preparation

Modeling Evaluation Deployment

Capture DataDescribe DataExplore Data

Select DataClean DataConstruct DataCombine Data

Select Modeling TechniqueBuild ModelAsses Model

Evaluate resultDecide Next Step

ObjectivesSuccess Criteria

Page 22: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake

The Process – Rapid Prototyping Using OAC

22

Business Understanding

Data Understanding

Data Preparation

Modeling Evaluation Deployment

VisualizationExplain

Data PrepData Flow

Data Flow

Model DetailsVisualization

Page 23: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake

23

Applied before training and applying a model

Automated data preparations

Fill NA Sanitize Feature EncodingStandardize and

clean

• Drop columns with NULL percentage higher than Maximum Null Value Percentage

• Replace NA values• Categorical using most

or least frequent value• Numeric using mean,

max, min or median value

• Same for all columns of the same type

• Remove categorical (text) columns with high number of unique values (> 75%)

• Remove numerical columns that have a uniform distribution

• Remove highly correlated attributes (above 0.75)

• Convert string values to numeric values

• Two ways• OneHot• Indexer

• Optional for some algorithms

• Standardization of Numeric columns

• Remove mean and scale to unit variance

• Purpose to normalize the range of values

Page 24: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake

Drop into Production

Apps

RE

ST

Page 25: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 26: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 27: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 28: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 29: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 30: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 31: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 32: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 33: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 34: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 35: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 36: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 37: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 38: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 39: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 40: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 41: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 42: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 43: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 44: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 45: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 46: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 47: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 48: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 49: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 50: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 51: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 52: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 53: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 54: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 55: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 56: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 57: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 58: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 59: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 60: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 61: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 62: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 63: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 64: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake
Page 65: Production ML with Autonomous Data Warehouse · What Makes Oracle Data Science Solutions Different ALL Data Management Integrated Data Management Platform with big data / data lake

Summary

Using OAC we can explore our data

We can build a prototype predictive model

with OML we can build a scalable, production ready model inside the database

With ORDS we can expose our model as a REST API

Using REST we can deploy our model into custom applications