Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions...

67
Copyright © 2015 Oracle and/or its affiliates. All rights reserved. | Big Data Analytics with Oracle Advanced Analytics 12c and Big Data SQL Charlie Berger, MS Engineering, MBA Sr. Director Product Management, Data Mining and Advanced Analytics [email protected] www.twitter.com/CharlieDataMine Faster than a Mouse: Turn Data Mining Strategy into Action Miguel Barrera, Director of Risk Analytics, Fiserv Inc. Julia Minkowski, Risk Analytics Manager, Fiserv Inc. Make Big Data + Analytics Simple

Transcript of Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions...

Page 1: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. |

Big Data Analytics with Oracle Advanced Analytics 12c and Big Data SQL

Charlie Berger, MS Engineering, MBA Sr. Director Product Management, Data Mining and Advanced Analytics [email protected] www.twitter.com/CharlieDataMine

Faster than a Mouse: Turn Data Mining Strategy into Action

Miguel Barrera, Director of Risk Analytics, Fiserv Inc. Julia Minkowski, Risk Analytics Manager, Fiserv Inc.

Make Big Data + Analytics Simple

Page 2: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. |

Data, data everywhere

Oracle Confidential – Internal/Restricted/Highly Restricted

Data Analysis platforms requirements:

• Be extremely powerful and handle large data volumes

• Be easy to learn

• Be highly automated & enable deployment

Growth of Data Exponentially Greater than Growth of Data Analysts!

http://www.delphianalytics.net/more-data-than-analysts-the-real-big-data-problem/ http://uk.emc.com/collateral/analyst-reports/ar-the-economist-data-data-everywhere.pdf

Page 3: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. |

Analytics + Data Warehouse + Hadoop

• Platform Sprawl

– More Duplicated Data

– More Data Movement Latency

– More Security challenges

– More Duplicated Storage

– More Duplicated Backups

– More Duplicated Systems

– More Space and Power

Page 4: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. |

Vision

• Big Data + Analytic Platform for the Era of Big Data and Cloud

–Make Big Data + Analytics Simple

• Any data size, on any computer infrastructure

• Any variety of data, in any combination

–Make Big Data + Analytics Deployment Simple

• As a service, as a platform, as an application

5

Page 5: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. |

Scalable in-database data mining algorithms and R integration

Powerful predictive analytics and deployment platform

Drag and drop workflow, R and SQL APIs

Data analysts, data scientists & developers

Enables enterprise predictive analytics applications

Key Features

Oracle Advanced Analytics Database Option Fastest Way to Deliver Scalable Enterprise-wide Predictive Analytics

Page 6: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. |

OBIEE

Oracle Database Enterprise Edition

Oracle Advanced Analytics Database Architecture Multi-lingual Component of Oracle Database—SQL, SQL Dev/ODMr GUI, R

Oracle Advanced Analytics - Database Option SQL Data Mining & Analytic Functions + R Integration

for Scalable, Distributed, Parallel in-Database ML Execution

SQL Developer

Applications

R Client

Data & Business Analysts R programmers Business Analysts/Mgrs Domain End Users Users

Platform

Oracle Database 12c

Page 7: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. |

Data remains in the Database

Scalable machine learning algorithms (e.g. clustering, regression, decision Trees, SVM, PCA, NB, text, AR, etc.) implemented as SQL functions

Parallelized SQL data mining functions, data preparation and execution of R open-source packages

High-performance parallel scoring of SQL dm functions and R models

Key Features

Oracle Advanced Analytics Database Option Fastest way to deliver enterprise-wide predictive analytics

avings

Model “Scoring” Embedded Data Prep

Data Preparation

Model Building

Oracle Advanced Analytics

Secs, Mins or Hours

Traditional Analytics

Hours, Days or Weeks

Data Extraction

Data Prep & Transformation

Data Mining Model Building

Data Mining Model “Scoring”

Data Prep. & Transformation

Data Import

Page 8: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. |

avings

Model “Scoring” Embedded Data Prep

Data Preparation

Model Building

Oracle Advanced Analytics

Secs, Mins or Hours

Traditional Analytics

Hours, Days or Weeks

Data Extraction

Data Prep & Transformation

Data Mining Model Building

Data Mining Model “Scoring”

Data Prep. & Transformation

Data Import

Lowest Total Cost of Ownership

Leverage investment in Oracle tech stack

Eliminate duplicate data & ETL

Eliminate separate analytical servers

Fastest way to deliver enterprise-wide predictive analytics

Analytical Database Applications

Key Features

Oracle Advanced Analytics Database Option Fastest way to deliver enterprise-wide predictive analytics

Page 9: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. |

Function Algorithms Applicability

Classification

Logistic Regression (GLM) Decision Trees Naïve Bayes Support Vector Machines (SVM)

Classical statistical technique Popular / Rules / transparency Embedded app Wide / narrow data / text

Regression Linear Regression (GLM) Support Vector Machine (SVM)

Classical statistical technique

Wide / narrow data / text

Anomaly Detection

One Class SVM Unknown fraud cases or anomalies

Attribute Importance

Minimum Description Length (MDL) Principal Components Analysis (PCA)

Attribute reduction, Reduce data noise

Association Rules

Apriori Market basket analysis / Next Best Offer

Clustering Hierarchical k-Means Hierarchical O-Cluster Expectation-Maximization Clustering (EM)

Product grouping / Text mining Gene and protein analysis

Feature Extraction

Nonnegative Matrix Factorization (NMF) Singular Value Decomposition (SVD)

Text analysis / Feature reduction

Oracle Advanced Analytics In-Database Data Mining Algorithms—SQL & & GUI Access

A1 A2 A3 A4 A5 A6 A7

F1 F2 F3 F4

Page 10: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. |

Oracle Advanced Analytics

• R language for interaction with the database

• R-SQL Transparency Framework overloads R functions for scalable in-database execution

• Function overload for data selection, manipulation and transforms

• Interactive display of graphical results and flow control as in standard R

• Submit user-defined R functions for execution at database server under control of Oracle Database

• 15+ Powerful data mining algorithms (regression, clustering, AR, DT, etc._

• Run Oracle Data Mining SQL data mining functioning (ORE.odmSVM, ORE.odmDT, etc.)

• Speak “R” but executes as proprietary in-database SQL functions—machine learning algorithms and statistical functions

• Leverage database strengths: SQL parallelism, scale to large datasets, security

• Access big data in Database and Hadoop via SQL, R, and Big Data SQL

Other R packages

Oracle R Enterprise (ORE) packages

R-> SQL Transparency “Push-Down”

• R Engine(s) spawned by Oracle DB for database-managed parallelism

• ore.groupApply high performance scoring

• Efficient data transfer to spawned R engines

• Emulate map-reduce style algorithms and applications

• Enables production deployment and automated execution of R scripts

Oracle Database 12c R-> SQL

Results

In-Database Adv Analytical SQL Functions

R Engine Other R packages

Oracle R Enterprise packages

Embedded R Package Callouts

R

Results

How Oracle R Enterprise Compute Engines Work

1 2 3

Page 11: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. |

You Can Think of Oracle Advanced Analytics Like This… Traditional SQL

– “Human-driven” queries

– Domain expertise

– Any “rules” must be defined and managed

SQL Queries – SELECT

– DISTINCT

– AGGREGATE

– WHERE

– AND OR

– GROUP BY

– ORDER BY

– RANK

Oracle Advanced Analytics - SQL & – Automated knowledge discovery, model

building and deployment

– Domain expertise to assemble the “right” data to mine/analyze

Analytical SQL “Verbs” – PREDICT

– DETECT

– CLUSTER

– CLASSIFY

– REGRESS

– PROFILE

– IDENTIFY FACTORS

– ASSOCIATE

+

Page 12: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. | 13

Page 13: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. | 14

Page 14: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. | 15

Page 15: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. | 16

Page 16: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. | 17

Page 17: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. | 18

Page 18: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. | 19

Page 19: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. | 20

Page 20: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. | 21

Page 21: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. | 22

Page 22: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. | 23

Page 23: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. | 24

Page 24: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. |

Predicting Behavior Identify “Likely Behavior” and their Profiles

Consider: • Demographics • Past purchases • Recent purchases • Customer comments & tweets Unstructured data

also mined by algorithms

Transactional POS data

Generates SQL scripts for deployment

Inline predictive model to augment input data

SQL Joins and arbitrary SQL transforms & queries – power of SQL

Page 25: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. |

SQL Developer/Oracle Data Miner 4.0 New Features SQL Script Generation

– Deploy entire methodology as a SQL script

– Immediate deployment of data analyst’s methodologies

R

Page 26: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. |

Fraud Prediction Demo

drop table CLAIMS_SET; exec dbms_data_mining.drop_model('CLAIMSMODEL'); create table CLAIMS_SET (setting_name varchar2(30), setting_value varchar2(4000)); insert into CLAIMS_SET values ('ALGO_NAME','ALGO_SUPPORT_VECTOR_MACHINES'); insert into CLAIMS_SET values ('PREP_AUTO','ON'); commit; begin dbms_data_mining.create_model('CLAIMSMODEL', 'CLASSIFICATION', 'CLAIMS', 'POLICYNUMBER', null, 'CLAIMS_SET'); end; / -- Top 5 most suspicious fraud policy holder claims select * from (select POLICYNUMBER, round(prob_fraud*100,2) percent_fraud, rank() over (order by prob_fraud desc) rnk from (select POLICYNUMBER, prediction_probability(CLAIMSMODEL, '0' using *) prob_fraud from CLAIMS where PASTNUMBEROFCLAIMS in ('2to4', 'morethan4'))) where rnk <= 5 order by percent_fraud desc;

Automated In-DB Analytical Methodology

POLICYNUMBER PERCENT_FRAUD RNK ------------ ------------- ---------- 6532 64.78 1 2749 64.17 2 3440 63.22 3 654 63.1 4 12650 62.36 5

Automated Monthly “Application”! Just

add:

Create

View CLAIMS2_30

As

Select * from CLAIMS2

Where mydate > SYSDATE – 30

Time measure: set timing on;

Page 27: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. |

Oracle Advanced Analytics

• On-the-fly, single record apply with new data (e.g. from call center)

More Details

Call Center Get Advice

Web Mobile

Branch Office

Social Media

Email

R

Select prediction_probability(CLAS_DT_1_2, 'Yes'

USING 7800 as bank_funds, 125 as checking_amount, 20 as credit_balance, 55 as age, 'Married' as marital_status, 250 as MONEY_MONTLY_OVERDRAWN, 1 as house_ownership)

from dual;

Likelihood to respond:

Page 28: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. | 29

Page 29: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. | 30

Page 30: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. | 31

Page 31: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. |

Introducing Oracle Big Data SQL

32

Massively Parallel SQL Query across Oracle, Hadoop and NoSQL

Oracle Database 12c

Offload Query to Exadata Storage Servers

Small data subset quickly returned

Hadoop & NoSQL

Offload Query to Data Nodes

SQL

data subset

SQL

Page 32: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. |

Manage and Analyze All Data—SQL & Oracle Big Data SQL

33

Store JSON data unconverted in Hadoop

JSON

Oracle Database 12c Oracle Big Data Appliance

SQL

Data analyzed via SQL or R Store business-critical data in Oracle

SQL

Page 33: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. |

More Data Variety—Better Predictive Models

• Increasing sources of relevant data can boost model accuracy

Naïve Guess or Random

100%

0% Population Size

Res

po

nd

ers

Model with 20 variables

Model with 75 variables

Model with 250 variables

Model with “Big Data” and hundreds -- thousands of input variables including: • Demographic data • Purchase POS transactional

data • “Unstructured data”, text &

comments • Spatial location data • Long term vs. recent historical

behavior • Web visits • Sensor data • etc.

100%

Page 34: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. |

Oracle R Advanced Analytics for Hadoop

35

Page 35: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. |

• Fastest Way to Deliver Scalable Enterprise-wide Predictive Analytics

• OAA’s clustering and predictions available in-DB for OBIEE

• Automatic Customer Segmentation, Churn Predictions, and Sentiment Analysis

Pre-Built Predictive Models

Oracle Communications Industry Data Model Example Predictive Analytics Application

Page 36: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. |

New Features

Page 37: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. |

Oracle Data Miner 4.1 New Features

• JSON Query node

JSON Query node extracts BDA data via External Tables and parses out JSON data type and assembles data for data mining

Page 38: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. |

Oracle Data Miner 4.1 New Features

• Oracle Data Miner Workflow API to Manage, Schedule and Run Workflows – PL/SQL APIs to enable

applications to execute workflows immediately or schedule them

– Oracle Scheduler for scheduling functionality

– ODMr repository views can be queried for project and workflow information

– Applications can monitor workflow execution and query generated results

CONNECT DMUSER/DMUSER SET SERVEROUTPUT ON DECLARE v_jobId VARCHAR2(30) := NULL; v_status VARCHAR2(30) := NULL; v_projectName VARCHAR2(30) := 'Project'; v_workflow_name VARCHAR2(30) := 'build_workflow'; v_node VARCHAR2(30) := 'MODEL_COEFFCIENTS'; v_run_mode VARCHAR2(30) := ODMRSYS.ODMR_WORKFLOW.RERUN_NODE_PARENTS; v_failure NUMBER := 0; v_nodes ODMRSYS.ODMR_OBJECT_NAMES := ODMRSYS.ODMR_OBJECT_NAMES(); BEGIN v_nodes.extend(); v_nodes(v_nodes.count) := v_node; v_jobId := ODMRSYS.ODMR_WORKFLOW.WF_RUN(p_project_name => v_projectName, p_workflow_name => v_workflow_name, p_node_names => v_nodes, p_run_mode => v_run_mode, p_start_date => '31-DEC-14 12.00.00 AM AMERICA/NEW_YORK', p_repeat_interval => 'FREQ=MONTHLY;BYMONTHDAY=-1', p_end_date => '31-DEC-15 12.00.00 AM AMERICA/NEW_YORK'); DBMS_OUTPUT.PUT_LINE('Job: '||v_jobId);

Page 39: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. |

Oracle R Advanced Analytics for Hadoop

Oracle Internal - Proprietary 40

Algorithms and Functions

Page 40: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

OAA Links and Resources • Oracle Advanced Analytics Overview:

– OAA presentation— Big Data Analytics in Oracle Database 12c With Oracle Advanced Analytics & Big Data SQL

– Big Data Analytics with Oracle Advanced Analytics: Making Big Data and Analytics Simple white paper on OTN

– Oracle Internal OAA Product Management Wiki and Workspace

• YouTube recorded OAA Presentations and Demos:

– Oracle Advanced Analytics and Data Mining at the YouTube Movies (6 + OAA “live” Demos on ODM’r 4.0 New Features, Retail, Fraud, Loyalty, Overview, etc.)

• Getting Started:

– Link to Getting Started w/ ODM blog entry

– Link to New OAA/Oracle Data Mining 2-Day Instructor Led Oracle University course.

– Link to OAA/Oracle Data Mining 4.1 Oracle by Examples (free) Tutorials on OTN

– Take a Free Test Drive of Oracle Advanced Analytics (Oracle Data Miner GUI) on the Amazon Cloud

– Link to OAA/Oracle R Enterprise (free) Tutorial Series on OTN

• Additional Resources:

– Oracle Advanced Analytics Option on OTN page

– OAA/Oracle Data Mining on OTN page, ODM Documentation & ODM Blog

– OAA/Oracle R Enterprise page on OTN page, ORE Documentation & ORE Blog

– Oracle SQL based Basic Statistical functions on OTN

– BIWA Summit’16, Jan 26-28, 2016 – Oracle Big Data & Analytics User Conference @ Oracle HQ Conference Center

Page 41: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

• Hands-on-Labs • Customer stories, told by the customers • Educational sessions by Practitioners and Direct from Developers • Oracle Keynote presentations • Presentations covering: Advanced Analytics, Big Data, Business Intelligence,

Cloud, Data Warehousing and Integration, Spatial and Graph, SQL • Networking with product management and development professionals

Page 42: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Faster than a Mouse: Turn Data Mining Strategy into Action

Miguel Barrera, Director of Risk Analytics, Fiserv Inc.

Julia Minkowski, Risk Analytics Manager, Fiserv Inc.

Page 43: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

© 2014 Fiserv, Inc. or its affiliates.

Risk Analytics @ Fiserv Electronic Payments

• We prevent $200M in losses every year using data to monitor, understand and anticipate fraud

• We manage risk for $30BB in transfers, servicing 2,500+ financial institutions, including the 27 of the top 30 banks in the US

• A department of 6 people, we operated in start-up mode until we were acquired in 2011 by Fiserv

• We build our risk models, supervise their installation & develop the next-generation of strategies for risk mitigation

Page 44: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

© 2014 Fiserv, Inc. or its affiliates.

What is special about Fraud Prevention?

• Fraud is performed by organized criminal groups using sophisticated technologies and logistics

• Hard to detect: target has low frequency (2 in 10,000)

• Misclassification is expensive

$ Losses if you fail to detect fraud

60% increase in customer attrition if you miss-classify

• The environment changes fast, so you need to adapt quickly

Fraud prevention

is a great field for

the application of

predictive

analytics

Page 45: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

© 2014 Fiserv, Inc. or its affiliates.

Evolution of Model Deployment

3 months to run & deploy Logistic Regression

(using SAS) 1 month to estimate and deploy Trees and GLM

1 week to estimate, 1 week to install rules in online application

1 day to estimate and deploy Trees + GLM models (using Oracle Advanced Analytics)

2008 2010 2013 2015

Page 46: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

© 2014 Fiserv, Inc. or its affiliates.

Lessons we learned for our business

• Complex methods were hard to deploy

because they required large investments in

infrastructure and translation time

• Once we created good predictive attributes,

simple methods (Trees, GLM) were almost as

good as complex estimates (ensemble,

gradient descent)

Page 47: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

© 2014 Fiserv, Inc. or its affiliates.

What to Do?

• Option 1: Get a bigger door

(Netezza/ Hadoop / Pivotal )

• Option 2: Shrink the Elephant

Install Simpler Algorithms

+

Page 48: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

© 2014 Fiserv, Inc. or its affiliates.

Analytics Software + Infrastructure Looking for a cost-effective way to migrate our modeling requirements from other vendors to a scalable infrastructure: IBM/ Spark

• We stopped replicating

data into other software

tables

• We integrated all

preprocessing for the

models into the DB

• We installed OAA

analytics for model

development and

integrated R during 2014

Co

st

Page 49: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

© 2014 Fiserv, Inc. or its affiliates.

Why we like Oracle Advanced Analytics:

Accuracy

Agility

Scalability

• The loss reduction from timely deployment (hours) compensates for

the small increase of highly complex models

• No data transfer needed (in-database)

• New opportunities to combine structured data with unstructured data

• The integration with our DB replication makes re-fit inexpensive

• The same algorithm can scale-up for all other clients

Page 50: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

© 2014 Fiserv, Inc. or its affiliates.

Data Miner Survey 2015 by Rexer Analytics

While 6 out 10 data miners report the data is available for analysis within days of

capture, the time to deploy the models takes substantially longer. For 60% of the

respondents the deployment time will range between 3 weeks and 1year. Everyone

forgets about

deployment –

but is most

important

component!

Page 51: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

© 2014 Fiserv, Inc. or its affiliates.

In Fraud-Mitigation Speed is the Key How long can you wait to deploy a solution?

Page 52: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

© 2014 Fiserv, Inc. or its affiliates.

The Value of Time

© 2015, Oracle Corporation

avings

Model “Scoring” Embedded Data Prep

Data Preparation

Model Building

Oracle Advanced Analytics

Secs, Mins or Hours

Traditional Analytics

Hours, Days or Weeks

Data Extraction

Data Prep & Transformation

Data Mining Model Building

Data Mining Model “Scoring”

Data Prep. & Transformation

Data Import

Page 53: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

© 2014 Fiserv, Inc. or its affiliates.

Accuracy / Agility vs. Cost to Deploy

• Pick the best combination of: • Less days to deployment

• High model accuracy

• Lower Cost

Application Deploy (Days) Accuracy Total Cost

SAS Server 3 0.92 x5

ODM 1 0.90 1

SAS Base 15 0.83 30%

Angoss 12 0.85 10%

Page 54: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

© 2014 Fiserv, Inc. or its affiliates.

Time to Fit & Deploy

Page 55: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

© 2014 Fiserv, Inc. or its affiliates.

How we have leveraged the Oracle-R interface

Expand Data Exploration and Visualization 1. Chart and store the results of existing procedures in the DB 2. Run R matrices for visualization of variable densities Model Fitting 1. Transport R models and legacy code from desktops to the DB 2. All the model documentation stays in the DB and access is public

Deployment 1. Run process directly from PL/SQL and visualize in the application 2. Call R models from stored procedures for scoring and direct action

Page 56: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

© 2014 Fiserv, Inc. or its affiliates.

OAA+ R Improves Exploration and Visualization

1. We process routines in R and save

images in DB

2. We removed the memory size constraint that made R-methods and plots impractical

3. Allowed to integrate leverage our DBAs and add them to the modeling team

Chart and store the results of existing

procedures in the DB

Page 57: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

© 2014 Fiserv, Inc. or its affiliates.

Fit from SQL Developer and Store in DB

• Wrap the script around your existing R code, just like you would with an existing R function

• The server stores the script in a table that you can call and just assemble the code dynamically to fit your models

• The SQL access allow you to preprocess information and sub-set your data as part of SQL calls in screen or procedures, so you can seamlessly integrate into your exploration or production scoring processes.

Page 58: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

© 2014 Fiserv, Inc. or its affiliates.

Store the results directly in the Database

• ggplot creates a chart, prints it to the output, then the

query runs the script and retrieves the output as an

image

• You can wrap a procedure and just run the analysis and

store the output directly into a table in the DB

Page 59: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

© 2014 Fiserv, Inc. or its affiliates.

Store models, data and performance in the DB

Page 60: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

Turning Data Mining Strategy into Action

Page 61: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

© 2014 Fiserv, Inc. or its affiliates.

Stakeholders: Everyone has different incentives

Business Manager

Data Scientist IT Manager

• Preserve Service Level

Agreement

• Reduce Operational Risk

• Preserve Budget

Page 62: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

© 2014 Fiserv, Inc. or its affiliates.

Conflict of Interests?

Cannot agree on success factors?

Wonder why…?

Page 63: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

© 2014 Fiserv, Inc. or its affiliates.

Managing the Quants

• Define clearly the objective and constraints

• Implement SMART* goal setting

• Get familiar with basic analytics concepts

• Establish a time-line for delivery then multiply x 2

• Make sure you understand enough to explain to other executives… you will champion this initiative and negotiate the budgets

Source: Davenport, Tom (2013), “Keep up with your quants", Harvard Business Review, Issue July-August 2013

Page 64: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

© 2014 Fiserv, Inc. or its affiliates.

Take Care of Business (tips for Data Scientists)

• Communicate clearly business level information

• When and what is the expected result

• Present the key concept in 2 phrases

• Avoid technical language for communication

• If asked for more details, then present the “How”

• Provide a Business Dashboard

• Provide the $$ metrics profit/loss reduction

• Show the impact of algorithms deployed / provided

• Current vs. Historical

• Pick the right model - the model that maximizes the ROI

Source: Davenport, Tom (2013), “Keep up with your quants", Harvard Business Review, Issue July-August 2013

Page 65: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

© 2014 Fiserv, Inc. or its affiliates.

Tracking Performance: Dashboard

Our dashboards tracked the key performance metrics:

• Historical Trends for Fraud Rates and Losses (Business KPI)

• Percentage of Transfers affected by Risk Mitigation (Business KPI)

• % of population affected by policy and % of fraud prevented (KPI for Analytics)

• Fraud detection rates for rules installed (KPI for Analytics)

Page 66: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

© 2014 Fiserv, Inc. or its affiliates.

Key Takeaways On Fraud Modeling

• When choosing the tools for fraud management, speed is a critical factor

• Oracle Advance Analytics provided a fast and flexible solution for model building, visualization and integration with production processes

• The additional interface with R has allowed the group to leverage the skillset for both the analytics and data management team, accelerating adoption across our group

Turning Strategy into Action

• Involving the key stakeholders early in the process maximizes your chance for success. Once you have aligned the incentives for the team, selecting the appropriate techniques, tools and infrastructure becomes much simpler

• For data scientists it is important to select their models and projects based on the expected business impact and to translate their findings into the relevant metrics

Page 67: Big Data Analytics with Oracle Advanced Analytics 12c and Big … · implemented as SQL functions Parallelized SQL data mining functions, data preparation and execution of R open-source

© 2014 Fiserv, Inc. or its affiliates. 69

If you have further questions or comments, please contact:

[email protected]

[email protected]