Oracle Machine Learning Platform Overview
Mark Hornick, Senior Director, Product ManagementData Science and Machine Learning
Move the algorithms, not the data!
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
2
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Agenda
• Oracle Machine Learning family of products
• Supporting multiple personas
• OML component details
• Enabling Oracle Applications use of ML
3
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. | 4
Oracle Machine Learning
OML Microservices**Supporting Oracle ApplicationsImage, Text, Scoring, Deployment,
Model Management
* Coming soon ** For use through Oracle Application only
OML4SQLOracle Advanced Analytics
SQL API
OML4Py*Python API
OML4ROracle R Enterprise
R API
OML Notebookswith Apache Zeppelin on
Autonomous Database
OML4SparkOracle R Advanced Analytics
for Hadoop
Oracle Data MinerOracle SQL Developer extension
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. | 5
New Branding Corresponding Products and Components
Oracle Machine Learning (OML)Oracle Advanced Analytics (OAA) and Oracle R Advanced Analytics for Hadoop (ORAAH)
Oracle Machine Learning for SQL (OML4SQL) Oracle Advanced Analytics (OAA) / Oracle Data Mining (ODM)
Oracle Machine Learning for R (OML4R) Oracle R Enterprise (ORE)
Oracle Machine Learning for Python (OML4Py) Oracle Machine Learning for Python (OML4Py)
Oracle Machine Learning for Spark (OML4Spark) Oracle R Advanced Analytics for Hadoop (ORAAH)
Oracle Machine Learning Notebooks previously known as Oracle Machine Learning
Oracle Data Miner (ODMr)
Note: The official price list product names have not changed.
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Data Scientists
Business and Data Analysts
DBA and IT Professionals
Application and Dashboard Developers
Executives
6
OML empowers Enterprise Users
OracleMachineLearning
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. | 7
Data Scientists
•Popular data science languages: R, Python, SQL•Augment with 3rd party packages•Scalability and performance•Automation-enhanced productivity•Greater enterprise collaboration•Integrate and analyze data across the enterprise
OracleMachineLearning
Data Scientists
Business and Data Analysts
DBA and IT Professionals
Application and Dashboard Developers
Executives
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. | 8
Business and Data Analysts
•Expand analytical tool set with ML•Enable non-ML experts with AutoML•Leverage domain knowledge for better results•Collaborate with Data Scientists and IT
OracleMachineLearning
Data Scientists
Business and Data Analysts
DBA and IT Professionals
Application and Dashboard Developers
Executives
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. | 9
DBA and IT Professionals
OracleMachineLearning
Data Scientists
Business and Data Analysts
DBA and IT Professionals
Application and Dashboard Developers
Executives
• Even greater value from Oracle investment• Support scalability and performance• Simpler, streamlined infrastructure• Maintain data security, backup, recovery• Use SQL, expand to Python and R• Leverage Database and Big Data sources
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Application and Dashboard Developers
• Realize intelligent solutions faster through Oracle stack integration
• Easily uptake data scientists’ R, Python, and SQL scripts and rapidly deploy solutions
• Embed ML in applications and dashboards using SQL, REST, and SODA APIs
Data Scientists
Business and Data Analysts
DBA and IT Professionals
Application and Dashboard Developers
Executives OracleMachineLearning
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Executives
• Benefit from world-class data management technology and support
• Democratize ML across the enterprise to enable better data-driven decisions
• Deploy solutions faster to realize ROI
Data Scientists
Business and Data Analysts
DBA and IT Professionals
Application and Dashboard Developers
Executives OracleMachineLearning
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Vision: Autonomous Database evolves into Autonomous Data Platform
12
Big Data
Application Development
Machine Learning
Data Ingest
Data Analytics
From essentially a database cloud service…
…to a tightly integrated data platform
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. | 13
Vision: Manage and Analyze Cross-Platform Datawith Oracle Machine Learning
Oracle Cloud SQLOML4R
OML4Python
OML4SQL
REST
Popular R IDEs
Popular Python IDEs
SQL Developer
OML Notebooks
SelectUser Interface, e.g.
APIOptions
Cloud or On-premises
Reach broaderData Sources
Oracle Object Storage
Big DataService (HDFS)
NoSQLDatabases
KafkaStreams
Amazon S3
Azure Blob Storage
Oracle Database
Data Lake
OML4Spark
Oracle Big Data SQL
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
CLASSIFICATION– Naïve Bayes– Logistic Regression (GLM)– Decision Tree– Random Forest– Neural Network– Support Vector Machine– Explicit Semantic Analysis
CLUSTERING– Hierarchical K-Means– Hierarchical O-Cluster– Expectation Maximization (EM)
ANOMALY DETECTION– One-Class SVM
TIME SERIES– Forecasting - Exponential Smoothing– Includes popular models
e.g. Holt-Winters with trends, seasonality, irregularity, missing data
REGRESSION– Linear Model– Generalized Linear Model– Support Vector Machine (SVM)– Stepwise Linear regression– Neural Network– LASSO
ATTRIBUTE IMPORTANCE– Minimum Description Length– Principal Component Analysis (PCA)– Unsupervised Pair-wise KL Div – CUR decomposition for row & AI
ASSOCIATION RULES– A priori/ market basket
PREDICTIVE QUERIES– Predict, cluster, detect, features
SQL ANALYTICS– SQL Windows– SQL Patterns– SQL Aggregates
Oracle Machine Learning Algorithms and Analytics
FEATURE EXTRACTION– Principal Comp Analysis (PCA)– Non-negative Matrix Factorization– Singular Value Decomposition (SVD)– Explicit Semantic Analysis (ESA)
TEXT MINING SUPPORT– Algorithms support text columns– Tokenization and theme extraction– Explicit Semantic Analysis (ESA) for
document similarity
STATISTICAL FUNCTIONS– Basic statistics: min, max,
median, stdev, t-test, F-test, Pearson’s, Chi-Sq, ANOVA, etc.
R AND PYTHON PACKAGES– Third-party R and Python Packages
through Embedded Execution– Spark MLlib algorithm integration
14
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
OML algorithms differentiatorsFeature In-Database Spark
No data movement to separate analytical engines
Wide range of ML techniques supported Native Native and Spark MLlib
High performance from parallel, distributed execution Spark 2-based. Use all nodes from Hadoop cluster
Greater scalability from improved memory utilization
Handle narrow, wide, and sparse data Plus star schema, nested data
Automation Data preparation, text mining, partitioned models, AutoML
Multiple language interfaces SQL, R, Python R, Java
High performance R Formula for R Language
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Oracle Machine Learning Notebooksfor Autonomous Database
16
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Oracle Machine Learning Notebooks
• Collaborative UI – Based on Apache Zeppelin
– Supports data scientists, data analysts, application developers, DBAs
– Easy sharing of notebooks and templates with permissions, versioning, and execution scheduling
• Included with Autonomous Database
– Automatically provisioned, managed, backed up
– In-database machine learning algorithms and analytics functions via OML4SQL, and soon to be augmented with Python and R
Autonomous Database as a Data Science Platform
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Oracle Machine Learning for SQL (OML4SQL)for Python (OML4Py)for R (OML4R)
Empower SQL users with immediate access to ML in Oracle Database and Oracle Autonomous Database
Empower data scientists with open source environments
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Traditional Analytics and Data Source Interaction
• Access latency
• Paradigm shift: e.g., R/Python Data Access Language R/Python
• Memory limitation – data size, in-memory processing
• Single threaded
• Issues for backup, recovery, security
• Ad hoc production deployment
DeploymentAd hoc
cron job
Data SourceFlat Files extract / exportread
export load
Data source connectivity packages
Read/Write files using built-in tool capabilities
?
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Oracle Machine Learning for SQL
• Use in-database parallel and distributed machine learning algorithms from SQL and PL/SQL
• ML models as first class database objects
• Export / import models across databases
• Batch and real-time scoring with explanatory predictive details
• Leverage machine learning across SQL-enabled Oracle stack
Component of ADB and Oracle Advanced Analytics option to Oracle Database
SQL InterfacesSQL*PlusSQLDeveloper…
AutonomousDatabase
OML Notebooks
Oracle Databasewith OAA option
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
OML4SQL: Model Build and Real-time Prediction
BEGIN
DBMS_DATA_MINING.CREATE_MODEL(
model_name => 'BUY_INSUR1',
mining_function => dbms_data_mining.classification,
data_table_name => 'CUST_INSUR_LTV',
case_id_column_name => 'CUST_ID',
target_column_name => 'BUY_INSURANCE',
settings_table_name => 'CUST_INSUR_LTV_SET');
END;
Simple SQL Syntax—Classification Model
SELECT prediction_probability(BUY_INSUR1, 'Yes'
USING 3500 as bank_funds, 825 as checking_amount, 400 as credit_balance, 22 as age,
'Married' as marital_status, 93 as MONEY_MONTLY_OVERDRAWN, 1 as house_ownership)
FROM dual;
Model build (PL/SQL)
Real-time scoring (SQL query)
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Oracle Machine Learning for R and Python
• Use Oracle Database as HPC environment
• Use in-database parallel and distributed machine learning algorithms
• Manage R and Python scripts and objects in Oracle Database
• Integrate open source results into applications and dashboards via SQL
• In OML4Py, automated machine learning – AutoML
Components of Oracle Advanced Analytics option to Oracle Database
Database ServerMachine
Client SQL InterfacesSQL*Plus
SQLDeveloperOML4Py OML4R
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Oracle Machine Learning for R and Python
• Transparency layer– Leverage proxy objects so data remain in database
– Overload native functions translating functionality to SQL
– Use familiar R / Python syntax to manipulate database data
• Parallel, distributed algorithms– Scalability and performance
– Exposes in-database algorithms available from OML4SQL
• Embedded execution– Manage and invoke R or Python scripts in Oracle Database
– Data-parallel, task-parallel, and non-parallel execution
– Use open source packages to augment functionality
• In OML4Py, Automated Machine Learning - AutoML– Feature selection, model selection, hyper-parameter tuning
Oracle Advanced Analytics option to Oracle Database
Database ServerMachine
Client SQL InterfacesSQL*Plus
SQLDeveloperOML4Py OML4R
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Example using OML4R interface
Proxy objects for scalability
data.frame
Proxydata.frame
Inherits from
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Transparency Layer
• Leverages proxy objects for database data: oml.DataFrame
# Create table from Pandas DataFrame data
DATA = oml.create(data, table = 'BOSTON')
# Get proxy object to DB table boston
DATA = oml.sync(table = 'BOSTON')
• Uses familiar Python syntax to manipulate database data
• Overloads Python functions translating functionality to SQL
In-database performance from indexes, query optimization, parallelism, partitioning
DATA.shape
DATA.head()
DATA.describe()
DATA.std()
DATA.skew()
TRAIN, TEST =
DATA.split()
TRAIN.shape
TEST.shape
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
In-database modeling using Support Vector Machine
Parallel, Distributed Algorithms
User tables
Oracle Database
from oml import svm
# create proxy object
ONTIME_S = oml.sync(table='ONTIME_S')
# define model object
settings = {'svms_outlier_rate' : 0.01}
svm_mod = svm('anomaly_detection',
svms_kernel_function =
'dbms_data_mining.svms_linear',
**settings)
# build anomaly detection model
svm_mod = svm_mod.fit(x=ONTIME_S, y=None)
# view model object
svm_mod
OML4Py
Python Client
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Embedded Python Execution
User tables
pyq*eval () interface
extproc
Oracle Database
extproc
# user-defined function using sklearn
def build_lm(dat):
from sklearn import linear_model
lm = linear_model.LinearRegression()
X = dat[['PETAL_WIDTH']]
y = dat[['PETAL_LENGTH']]
lm.fit(X, y)
return lm
# select column(s) for partitioning data
index = oml.DataFrame(IRIS['SPECIES'])
# invoke function in parallel on IRIS table
mods = oml.group_apply(IRIS, index,
func=build_lm,
parallel=2)
mods.pull().items()
OML4Py
Python Engine
OML4Py
Python Engine
OML4Py
Python Client
Example of parallel execution for partitioned data flow using third party package
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
AutoML – new with OML4Py
• Auto Feature Selection– Reduce # of features by
identifying most predictive
– Improve performance and accuracy
Increase data scientist productivity – reduce overall compute timeUses in-database algorithms
AutoMSMuch faster than exhaustive search
AutoFS>50% reduction in
features
AutoTuneSignificant score
improvementData Model
• Auto Model Selection– Identify algorithm that
achieves highest model quality
– Find best model faster than with exhaustive search
• Auto Tune Hyperparameters– Significantly improve
model accuracy
– Avoid manual or exhaustive search techniques
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Reduce # features by identifying most relevantImprove performance and accuracy
OML4Py Auto Feature Selection: examples
0
5
10
15
20
25
30
299 9
Trai
nin
g ti
me
(sec
on
ds)
ML training time
0.93
0.94
0.95
0.96
0.97
0.98
0.99
1
299 9
Acc
ura
cy
Prediction Accuracy
33xreduction
+4%
OpenML dataset 312 with 1925 rows, 299 columns OpenML dataset 40996 56K rows, 784 columns
0.5
0.55
0.6
0.65
0.7
0.75
0.8
0.85
0.9
0.95
1
784 309
Acc
ura
cy
Prediction Accuracy
+18%
60% reduction1.3X time reduction to build SVM Gaussian model
97% reduction
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Core APIs Feature Summary
Feature OML4SQL OML4Py OML4R
Transparency Layer n/a
Parallel, Distributed Algorithms
Embedded Execution n/a
Automated Data Preparation
Automated Text Processing
Partitioned Models
Automated Machine Learning (AutoML)
PGX Integration for Graph Analytics implicit
DML table package transparency n/a
Extensible Algorithm Models
SQL plus two most popular open source languages for machine learning
n/a – not applicable
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Oracle Machine Learning for Spark (OML4Spark)supported by Oracle R Advanced Analytics for Hadoop
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Oracle Machine Learning for Spark
• Leverage Spark 2 environment for powerful data preparation and machine learning
• Use data across range of Data Lake sources
• Achieve scalability and performance using full Hadoop cluster
• Parallel and distributed machine learning algorithms from native and Spark MLlib implementations
• Use expressive R Formula specification
R Language API Component to Oracle Big Data Connectors
Java API
HDFS / Hive / Spark DF / Impala / JDBC Sources
BDABDSDIY
OML4Spark
R Client
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Oracle Machine Learning for Spark
• Transparency layer– Proxy objects reference data from file system, HDFS, Hive,
Impala, Spark DataFrame and JDBC sources
– Overloaded R functions translate functionality to native language, e.g., HiveQL for HIVE and Impala
– Users manipulate data via standard R syntax
• Parallel, distributed machine learning algorithms– Scalability and performance leveraging full Hadoop cluster
– Spark-based custom LM, GLM, NN, K-Means plus Spark MLlib algorithms
– Use expressive R Formula specification
• Compute framework with custom R mappers/reducers– Data-parallel and task-parallel execution
– Allows for open source CRAN packages run on the Cluster Nodes
R Language API Component to Oracle Big Data Connectors
Java API
HDFS / Hive / Spark DF / Impala / JDBC Sources
BDABDSDIY
OML4Spark
R Client
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
OML4Spark Performance
• Logistic Regression (GLM)
• Data fits in memory
– Up to 7x faster than Spark MLlib
• Data cannot fit memory
– Able to solve a 10B row model
• Benchmark environment
– ORAAH 2.8.0
– Big Data Appliance X7-2
– 6 Nodes, 256GB of RAM per Node
Formula: cancelled ~ distance + origin + dest + as.factor(month) + as.factor(year) + as.factor(dayofmonth) + as.factor(dayofweek) + as.factor(flightnum)
1
10
100
1,000
10,000
100K 1M 10M 100M 1B 10B
Exec
uti
on
Tim
e (s
eco
nd
s)
Dataset Size (# rows)
OML4Spark vs. Spark MLlibfor GLM Logistic Regression
OML4Spark MLlib
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Enabling Oracle Applications with OML Models and Microservices
36
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
HCM Cloud Workforce Predictions
CRM Sales Cloud Sales Prediction
Retail GBUCustomer Insights, Customer Segmentation
Adaptive Intelligent Applications for Manufacturing
Configure, Price, Quote Cloud
Content and ExperienceUnstructured Data Analytics
Oracle Integration CloudDigital Process Automation
Industry Data ModelsCommunications, SNA, Utilities, Airlines, Retail, …
EBS Spend Classification
Organize spend into logical categories
EBS Depot Repair
Optimize speed, cost, quality of product repair, reuse, recycling
Oracle Identity ManagementAdaptive Access Management
FSGBUAnalytical ApplicationsInfrastructure
Applications integrating Oracle Machine Learning
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Oracle Integration Cloud (OIC)• Digital Process Automation
– Help business users make better decisions by using recommendations from ML models
– Increase automation of human-centric approval workflows
• Used by Oracle SaaS process-centric apps
• PaaS service that exposes OML features– Build models in ADB, deploy via OML Microservices
Oracle Content and Experience (OCE)• Improve Content Discoverability
– Search, organize content, reduce duplication
– Find relevant images/docs during content creation
– Automatic tagging and classification of images, videos, text
– Visual search
• Cloud-based content hub to drive omni-channel content management and accelerate experience delivery
• Leverages OML cognitive microservices
Application Integration Spotlight – Platforms using OML Platform
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Oracle Machine Learning Microservices
• Model Management Services– Building and deploying OML models
• Cognitive Services– Feature Extraction, Image and Text
• Model repository – Store, version, compare ML models
• REST APIs for application integration
• Docker Containers for portability
• Kubernetes support for container management
39
Available now internally to Oracle Applications teams
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Oracle Machine Learning Platform REST APIs
Cognitive Services
FeatureExtraction REST
CognitiveImage REST
CognitiveText REST
ModelRepository REST
ModelDeploy REST
Model Management and Online Scoring Services
Oracle REST Data Services
REST Build / Batch Scoring / Export
Store Models and Metadata
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
Oracle Machine Learning
• Automation, scalability, and performance
• Machine learning model deployment for applications
• Integrated with Oracle Database, Big Data, and Cloud environments
• APIs for REST, SQL, R, and Python
52
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. | 53
For more information…
https://www.oracle.com/database/technologies/datawarehouse-bigdata/machine-learning.html
Top Related