
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Oracle Data Integration Solutions (DIS): An Overview


Market Overview

estimates the DI (data integration) market will reach $2.8 billion

estimates the DQ (data quality) market will reach $1.75 billion

by 2016 with an average growth rate of 18.2%

Exponential Growth in Data Volumes


Oracle Data Integration Solutions and Proven Benefits


Improve Agility
• Deploy Projects Faster
• Reliable Real-Time

Reduce Risk
• Popular, Proven Tools
• Open, Not Proprietary

Reduce Costs
• Better Productivity
• Eliminate ETL Servers

Analytic Data Integration
• Big Data Integration & Governance
• Data Warehouse Integration
• Business Intelligence Applications

Enterprise Data Integration and Governance
• Enterprise Data Quality and Profiling
• Comprehensive, Heterogeneous Data Integration
• Business Glossary and Metadata Management

Business Continuity
• Active-Active for Maximum Availability
• Zero Downtime Migrations
• Data Consolidation / Application Modernization

24 x 7 x 365


Oracle Data Integration 12c: Delivers real-time data integration for Cloud and Big Data

Big Data

Cloud

Apps

Database

• Real-time data replication; optimized for Database 12c and Oracle Exadata

• End-to-end integrated with simplified deployment

• Unified tooling for both structured data sources and Hadoop / NoSQL

• Flexible deployment on-premise or in the Cloud for heterogeneous systems

• Expanded support for 3rd party systems and Oracle Applications in real-time data integration and continuous availability solutions

Oracle Data Integrator
Oracle GoldenGate
Oracle Enterprise Data Quality
Oracle Data Service Integrator
Oracle Metadata Management
Oracle Active Data Guard


Comprehensive Data Integration & Governance Capabilities


Real-Time Data Movement
– Low impact capture, stage in Hadoop
– Continuous data availability

Data Transformation
– Bulk data movement
– Pushdown data processing

Data Federation
– Virtualized Data Services

Data Quality & Verification
– Fix quality at the source
– Verify data consistency

Metadata Management
– Lineage and Impact Analysis
– Business Glossary Semantics

Data Governance Foundation

Oracle Data Integrator (Transformation)

Enterprise Data Quality (Profile, Cleanse, Match and De-duplicate)

Fast Load

Oracle GoldenGate (Movement)

Enterprise Metadata Management & Business Glossary (Business Glossary, Data Lineage, Impact Analysis and Data Provenance)

Data Service Integrator (Federation)

GoldenGate Veridata (Online Data Verification)

ELT Processing on Hadoop or SQL

Continuous Availability


Data Governance Foundation

Differentiated Technical Approach


Dynamic Data Movement
– Real-time CDC is by default, not ETL
– Least invasive on sources
– Proven best performance
– Integrated Oracle capture/apply

No ETL Engines
– Take the processing to the data; don’t move the data to the process
– Leverage your data engines for the workloads (Hadoop or SQL)

Most Heterogeneous
– Leverage open source Hadoop, not proprietary distributions
– Hadoop is the Hub, not ETL tools
– Open metadata standards


Comprehensive, Open & Heterogeneous Data Integration


Hadoop HBase, Hadoop Hive/Flume, HP Enscribe, HP NonStop, HP Neoview, Hypersonic SQL, IBM DB2 i Series, IBM DB2 UDB, IBM DB2 z Series, IBM Informix, IBM Netezza, JMS / MQ, Microsoft Access, Microsoft SQLServer, MySQL, Pivotal Greenplum, PostgreSQL, Salesforce.com, SAP BW / BI, SAP ERP / ECC, SAS, SQL/MP, SQL/MX, Sybase ASE, Sybase IQ, Teradata

Adaptive, Altova, Apache Hcatalog, Apache Hive/HQL, Borland, CA ERwin, Cloudera Impala, COBOL Copybook, DataStax, Embarcadero, EMC ProActivity, GentleWare, Google BigQuery, Grandite, Hadapt Hive, Hortonworks Hive, IBM Cognos, IBM DB2, IBM DataStage, IBM Discovery, IBM Federation Server, IBM Lotus Notes, IBM Netezza, IBM Rational Rose, IBM Rational Architect, Informatica Metadata Mgr., Informatica PowerCenter

CoSORT, ISO SQL Standard (DDL), MapR Hadoop Hive, MicroFocus, Microsoft Access, Microsoft Office Excel, Microsoft Visio, Microsoft SQL Server, Microsoft SSIS, Microsoft Visual Studio, Microstrategy, Magic Draw, OMG CWM Standard, OMG UML Standard, Oracle BI Answers, Oracle BI Enterprise Edition, Oracle BI Server, Oracle DAC, Oracle Data Integrator, Oracle Data Modeler, Oracle Database, Oracle Designer, Oracle Hyperion Applications, Oracle Hyperion Essbase, Oracle Warehouse Builder, Pivotal Greenplum, PostgreSQL

QlikView, SAP BO Crystal Reports, SAP BO Designer, SAP BO Desktop Intelligence, SAP BO Repository, SAP BO Data Integrator, SAP BO Data Steward, SAP Master Data Management, SAP Sybase PowerDesigner, SAP Sybase ASE Database, SAS Data Integration Studio, SAS BI Server, SAS Information Map, SAS Metadata Management, SAS OLAP Server, Select, Sparx Architect, Syncsort, Tableau, Talend, Teradata, Tigris, Visible, W3C DTD & XSD Schema

Operational Integration (Movement / Transformation)

Metadata Harvesting (Glossary, Lineage & Impact Analysis)

Oracle Database, Oracle Exadata, Oracle Big Data Appliance, Oracle TimesTen, Oracle OLAP, Oracle Business Intelligence, Oracle BI Applications, Oracle E-Business Suite, Oracle JD Edwards Enterprise One, Oracle JD Edwards World, Oracle Fusion Applications, Oracle Governance Risk and Compliance, Oracle Fusion AIA, Oracle Retail Applications, Oracle Agile BI / DW, Oracle Agile PLM for Process, Oracle iFlex FlexCUBE, Oracle iFlex Mantas, Oracle Hyperion Applications, Oracle PeopleSoft, Oracle Siebel CRM / OnDemand, Oracle Communications, Oracle WebLogic Server, Oracle Coherence Data Grid, Oracle SOA Suite, Oracle Enterprise Service Bus

+ open APIs and standards based meta-model


Oracle Enterprise Metadata Management


Packaged for two offerings:
• Oracle Enterprise Metadata Management (OEMM): Fully featured enterprise edition product
• Oracle Metadata Management for Oracle Business Intelligence (OMM): Limited for use with OBIEE, no Business Glossary

Key Features:

Report to Source Lineage

Impact Analysis

Model Versioning

Annotations and Tagging

Supports Metadata Standards

Business Glossary

3rd Party BI Metadata

3rd Party ETL Metadata

3rd Party DB Metadata

Big Data Ready


Metadata Harvesting from all Popular Platforms


Oracle Data Enrichment Cloud Service (ODECS)


Data Discovery & Visualization

Desktop Analytics

Enterprise Reporting

Internet

Logs

Unstructured & Structured Data


Oracle GoldenGate Low-Impact, Real-Time Data Integration & Transactional Replication

PERFORMANCE: Low-impact Real-Time Data Integration and Replication

FLEXIBLE: Open, Modular Architecture – Heterogeneous including Cloud and Big Data

RELIABLE: Maintains Transactional Integrity – Resilient against Failures

Real-Time Changed Data Capture

Data Integrator

New DB/ HW/OS/APP

Fully Active Distributed DB

Reporting Database

Data Warehouse

Message Bus

Oracle & Non-Oracle Database(s)

Cloud

Cloud & On-Premises

Big Data

Message Bus


Oracle Data Integrator E-LT: Bulk Data Processing and Fast Data Transformation

Big Data

Cloud

Apps

Database

Oracle Data Integrator

High Performance E-LT

Declarative Design

Extensible Knowledge Modules

Data Services

Structured & Unstructured Data

• Certified for leading technologies to deliver fast time to value

• High-performance, low cost of ownership E-LT architecture

• Lightweight deployment

• Flexible, easy to enrich functionality


Industry Leading Performance: Extremely Fast Execution and Reduced Cost

E-LT provides a flexible architecture for optimized performance on any platform

Benefits
• Leverages set-based transformations
• Improves performance for loading, no network hop
• Takes advantage of existing infrastructure: hardware and software

Conventional ETL Architecture: Extract → Transform → Load

Next Generation Architecture “E-LT”: Extract → Load, with Transform performed in the source and target data engines
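The E-LT idea on this slide can be illustrated with a minimal sketch. It assumes an in-memory SQLite database as a stand-in for the target data engine; the table names `stg_orders` and `fact_sales` and the sample rows are hypothetical and are not taken from the deck. Raw rows are loaded unchanged, and the transformation then runs as one set-based SQL statement inside the database instead of on a separate ETL server.

```python
import sqlite3


def run_elt(rows):
    """Load raw rows into a staging table, then transform inside the database."""
    db = sqlite3.connect(":memory:")  # stand-in for the target data engine
    db.execute("CREATE TABLE stg_orders (customer TEXT, amount REAL)")
    db.execute("CREATE TABLE fact_sales (customer TEXT, total REAL)")
    # Extract + Load: bulk-load the raw rows without reshaping them first.
    db.executemany("INSERT INTO stg_orders VALUES (?, ?)", rows)
    # Transform: a single set-based statement executed by the database engine.
    db.execute(
        "INSERT INTO fact_sales "
        "SELECT UPPER(TRIM(customer)), SUM(amount) "
        "FROM stg_orders GROUP BY UPPER(TRIM(customer))"
    )
    result = db.execute(
        "SELECT customer, total FROM fact_sales ORDER BY customer"
    ).fetchall()
    db.close()
    return result


raw = [(" acme", 10.0), ("Acme ", 5.5), ("globex", 7.0)]
print(run_elt(raw))
```

In a conventional ETL flow the cleansing and aggregation would happen row by row in a middle-tier process before loading; here the only work outside the database is the bulk insert.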


EDQ and ODI: Comprehensive Data Quality Process

Sources
Oracle Enterprise Data Quality: Parsing, Standardization, Cleansing, Matching, Merging
Oracle Data Integrator: E-LT/ETL Process
Targets

1. Profile Data
2. Create new Data Quality Rules
3. Add Data Quality to E-LT/ETL Flow
4. Continuous Quality Monitoring, Quality Alerts

Data Profiling
• Analyze and understand data to build ODI mappings

Automated Processes
• Data De-duplication
• Semantic/Contextual data parsing, cleansing and standardization
• Address Validation & Geolocation > 240 countries
• Invoked in ODI workflow

Measure Ongoing Data Quality
• Assess quality of data in target system. How well is ETL working?


ODI and EDQ: Core Use Cases

Reduce ODI Implementation Time and Risk
• 50% of data warehouse/BI projects have limited acceptance or are outright failures as a result of lack of attention to data quality issues
• ETL mappings should not be solely developed based on specifications
• Data Profiling helps uncover defects, patterns, formats early in the ETL development process
• Use EDQ Profiling to analyze and understand your data and required mappings

Populate a Data Warehouse with High Quality Data
• Avoid making poor decisions based on poor data (avoid garbage-in, garbage-out)
• Platform for Data Governance/Data Stewardship and ongoing quality improvement
• Engage business users in defining and implementing appropriate business rules
• Use EDQ Batch Processing to deliver accurate, consistent and complete data
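As a rough illustration of what profiling a column before writing mappings can look like, here is a minimal sketch in plain Python. It is not EDQ's implementation or API; the function name `profile_column` and the sample phone values are hypothetical. It counts nulls and distinct values and reduces each value to a character pattern (digits become `9`, letters become `A`) so that inconsistent formats stand out.

```python
import re
from collections import Counter


def profile_column(values):
    """Summarise null count, distinct count and character patterns for one column."""
    non_null = [v for v in values if v not in (None, "")]

    def pattern(value):
        # Map digits to 9 and letters to A so that formats can be compared.
        return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))

    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "patterns": Counter(pattern(v) for v in non_null),
    }


phones = ["555-1234", "555-9876", "5551234", None, "", "555-1234"]
report = profile_column(phones)
print(report)
```

A pattern that appears only rarely, such as the unhyphenated phone number in the sample, is the kind of defect that is cheaper to find at this stage than after the mappings are built.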


EDQ Product Architecture

• All Java Server (Stateless)

• Java Webstart Client Applications

• Fully integrated with a single repository and UI

• Batch and Real-time Execution

• Connects to virtually any source/target of data

• Platform Independent


Core Four Plays for Data Integration Solutions

DIS for Business Intelligence & Data Warehousing

Modernization and Consolidations

High Availability

Data Integration for Oracle Applications

24 X 7


DIS for Business Intelligence & Data Warehousing

• Real-time or near-real-time data feeds
• Move to E-LT and remove middle-tier ETL
• Integrated Data Cleansing using EDQ
• Optimized for Exadata
• Make business decisions with real-time data
• Oracle’s BI Apps solutions


Modernization and Consolidation

Infrastructure Modernization
• Move away from legacy to gain better ROI & drive innovation
• Cross platform (DB/OS), Cloud or On-Premises with ease

Data Consolidation to Exadata and Cloud
• DIS is red stack optimized – only ODI can run on Exadata for best data loading performance
• OGG fully supports Oracle on Exadata
• EDQ for de-duplication and cleansing of data


High Availability

Active-Active, Multi-Master, Disaster Recovery
• Simply the best HA solution using Oracle GoldenGate and Active Data Guard
• Make better use of your HA investments

Zero Downtime Operations
• Avoid downtime, planned or unplanned
• Keep production systems making money for the company!
• Reduce risk with fail-back or phased migrations

24 X 7


High Availability Solution: Avoid Planned or Unplanned Downtime

Solution
• Zero downtime operations for any supported database
  o Upgrades, Maintenance for HW, OS, DB or Applications
• Logical, heterogeneous replication with GoldenGate for DR for non-Oracle databases
• Active-Active bi-directional or multi-master replication with GoldenGate
• Physical replication with Active Data Guard
  o Best for Disaster Recovery for Oracle Applications
  o Best for Disaster Recovery for Oracle Database

Benefits
• Ensure business continuity in any situation
• Eliminate planned downtime for maintenance for any supported database with Oracle GoldenGate
• Improve ROI by utilizing standby database
• Mitigate Risk for Applications and infrastructure upgrades and migrations


Data Integration for Oracle Applications

Optimized for Oracle Applications
• Active-active data synchronization
• Integrated operational reporting
• Included in Oracle BI Applications

Data Integration for SOA
• Large, complex transformations
• Direct to database integration connections with no impact on performance, avoiding placing large data traffic on queues


Data Integration for Oracle Applications Solution: Integrated Application Data in Both Batch and Real-Time

Solution
• Oracle Apps Unlimited to Fusion Migrations
• E-Business Suite Application Database Migrations
• E-Business Suite Operational Reporting
• Oracle BI Applications utilizing Oracle Data Integrator
• Siebel CRM Zero Downtime App Upgrades
• ATG Active-Passive or Active-Active
• JD Edwards Zero Downtime App Upgrades
• PeopleSoft Real-Time Integrated Operational Reporting

Benefits
• Trusted, pre-built, certified solutions
• Right tools for the job
• Improved report generation times
• Improved performance on transaction systems


Integration with Oracle Coherence

Oracle TopLink

Tight Integration with Oracle Coherence enables real-time updates to Coherence cache

Refreshes invalidated object in the Coherence cache when the database is directly modified

Coherence users can access real-time data without any changes required to the source system

Oracle Coherence Grid Edition 12.1.2

Oracle & Non-Oracle Database(s)

Capture

Coherence Adapter

Trail Files


Next Major Disruptive Forces

Cloud
Affecting everything, the location of the data or the data processing can be anywhere in public or private cloud data centers

…from IT led ETL to: Data Self Service
Bringing automation and simplicity to data movement, sandboxing, and preparation

…from ELT w/SQL to: Big Data Reservoir
Enterprise scale use of Hadoop for staging, storage and manipulation of all types of data

…from SQL Logs to: Devices & Things
Integrating data that originates from devices, things and any other event sources on the network

…from Federation to: Virtualization
Enabling data consumers to access and manipulate data regardless of its physical location


What Will Transform Data Integration Solutions in the Future?


Big Data Reservoir

Cloud

Data Governance


Oracle Data Integration – Pragmatic Solutions for Cloud


Cloud BI / Analytics
• Oracle Business Intelligence Cloud uses Oracle Data Integration
• Oracle Data Integration also supports non-Oracle BI/Analytics

Cloud SaaS to Mart/EDW
• Bring SaaS Application data into on-premise data warehouses
• Synchronize reference data or master data with SaaS Apps

Cloud Database Sync
• On-premise DBs to managed or private cloud data centers
• Sync local databases with Database as a Service (DBaaS)


On-Premise

Amazon S3 Bucket

Amazon Redshift

FTP

On-Premise Apps to Heterogeneous Cloud BI/Analytics

OGG ODI

EDQ

ODI

OBIEE

Key Opportunity

Provide high volume data movement and data synchronization capabilities between on-premises and Cloud-based resources

Perform E-LT/ETL and Data Quality transformations natively on Cloud BI/Analytics platforms

Avoid using different Data Integration solutions for Cloud and on-premises deployments


On-Premise

SaaS Application Data into On-Premise BI/Analytics

ODI

EDQ

OGG

ODI

Key Opportunity

Integrate natively with on-premises resources and Cloud-based Applications such as Salesforce.com, Sales Cloud, Service Cloud or Eloqua

Offload reporting to eliminate impact on production systems

Provide high volume data movement and data synchronization capabilities for Cloud and on-premises Apps


On-Premise

Database to Database Replication in the Cloud

OGG

OGG

Private Cloud / Managed Cloud

ODI

Key Opportunity

Synchronize data efficiently between on-premises databases and Oracle DBaaS

Consolidate numerous databases into a Private or Public Cloud database infrastructure

Implement a highly available infrastructure for both on-premises and Cloud database deployments


Oracle Data Integration Can Help Now

Unidirectional: Query Offloading, Zero-Downtime Migration, Data Integration (Cloud or On-Premises)

Bi-Directional: Active-Active for Multi-Master/HA (Cloud or On-Premises)

Big Data Delivery: Real-Time and Batch Delivery; Structured Data to Data Reservoir

Data Distribution via Messaging

Cloud Apps Integration


Oracle Data Integration Components in the Cloud

• Oracle GoldenGate with Amazon RDS is available under the “Bring-your-own-license” model in all AWS regions

• Oracle Data Integrator is already being used internally within various Oracle Cloud Applications such as Oracle Sales Cloud (ex- FA CRM)

• Enterprise Data Quality is already being used in some of our Cloud Applications such as the Address Verification Service

• Amazon RDS supports migration and replication across several Oracle Database Editions using Oracle GoldenGate. We do not support nor prevent customers from migrating or replicating across heterogeneous databases

• Oracle Data Integrator is also an integral part of the Oracle Cloud to OBIA Connector offering

• Cloud to On-Premise App replication using OGG has also been proven at customer sites

• ODI can be installed in Cloud environments such as Oracle Cloud or Amazon EC2

• Customers are successfully using ODI with Cloud databases such as Amazon Redshift


Big Data Reservoir

GOV Data Governance Cloud

What Will Transform Data Integration Solutions in the Future?


Why the word “Reservoir?”


https://blogs.oracle.com/bigdata/entry/big_data_and_analytic_top


True Hadoop Opportunity: Big Data Reservoir


Data Discovery
• Data staged / merged in Hadoop to provide single place to explore/discover data
• Support for Exploratory Analytics without time consuming data modeling

Data Preparation
• External data staging and long running batch jobs run in Hadoop to make the most of the DB
• Lower cost data staging and data preparation

Deep Data Storage
• Store more raw detail data for less cost, while keeping aggregates in the DB
• Lower cost storage for questionable business data

Diagram labels: DW; Data Staging & Preparation; New Data Discovery; Detailed, Deep Data


Data Self Service/Reservoir Solution – What is it?


High Level Pattern #1:
• Also used for long term storage of Detail data records (vs. Summary) and other aged data
• Analytics run (a) directly on Hadoop

High Level Pattern #2:
• Hadoop as a pre-processing platform for staging, preparing and transforming data prior to loading the Data Warehouse
• Also used for long term storage of Detail data records (vs. Summary) and other aged data
• Analytics run (a) directly on Hadoop, (b) federated with DW, or (c) only on DW

High Level Pattern #3:
• Hadoop as transparent backend expansion point for Detail data records (vs. Summary) and other aged data
• Also used for long term storage of Detail data records (vs. Summary) and other aged data
• Analytics run only on DW

Diagram labels: Data Flow; Data; Data (optional); DW; Analytics


Logical Architecture – Seamless Data Integration is Crucial


Virtualisation & Query Federation

Enterprise Performance Management

Pre-built & Ad-hoc BI Assets

Information Services

Data Ingestion

Information Interpretation

Access & Performance Layer

Foundation Data Layer

Raw Data Reservoir

Data Science

Data Engines & Poly-structured sources

Content

Docs Web & Social Media

SMS

Structured Data Sources

• Operational Data

• COTS Data

• Streaming & BAM

Immutable raw data reservoir. Raw data at rest is not interpreted

Immutable modelled data. Business Process Neutral form. Abstracted from business process changes

Past, current and future interpretation of enterprise data. Structured to support agile access & navigation

Discovery Lab Sandboxes: Project based data stores to support specific discovery objectives

Rapid Development Sandboxes: Project based data stored to facilitate rapid content / presentation delivery

Data Sources

Master & Reference Data Sources

Data Integration & Governance


Concrete Business Value with Big Data Reservoir


COST: Lower TCO for the Data Warehouse
SPEED: LoB Faster Access to Analytic Data
VALUE: New Types of Analytics for All Data

• Control the costs of the Data Warehouse
• Massive value multipliers for Teradata and Netezza customers
• Put an end to the annual upgrade cycle
• Give analytics to the business earlier in the data lifecycle
• Avoid up front modelling overhead for Discovery
• Empower IT to focus on highest value analytics
• Run BI queries faster
• Support Exploratory Analytics directly from Hadoop
• Run Streaming Analytics from OEP, Storm, Flume etc.
• Drive new business solutions (telematics data, machine data, log data, unstructured data)


Oracle Data Integration with Hadoop


Sources

Oracle Data Integrator (E-LT & ETL)

Enterprise Data Quality (Profile, Cleanse, Match and De-duplicate)

Fast Load

Oracle GoldenGate (Replication)

E-LT & DQ

Enterprise Meta Data Management (Lineage, Impact Analysis and Data Provenance)

Comprehensive data integration platform designed to work with all data.

• Data Replication

– Continuous data staging into Hadoop

• Data Transformation

– Pushdown processing in Hadoop

• Data Federation

– Query Hadoop SQL via JDBC

• Data Quality

– Fix quality at the source or invoke Machine Learning in Hadoop

• Metadata Management

– Lineage and Impact Analysis w/Hadoop

Data Service Integrator (Federation)


Oracle Big Data Connectors

Data Load: Oracle Loader for Hadoop

Data Access: Oracle SQL Connector for HDFS

R Analytics: Oracle R Advanced Analytics on Hadoop

Oracle Data Integrator Application Adapter for Hadoop

XML/XQuery: Oracle XQuery on Hadoop

XQuery R Client

Optimized for Hadoop:
• Maximise parallelism
• Fast performance
• Analyze data on Hadoop using familiar client tools


Oracle Data Integrator for Big Data: Heterogeneous Integration with Hadoop Environments

Supports Hadoop standards

Reverse Engineer Hadoop metadata

Check, Validate and Ensure Data Integrity with Hadoop

Load Data into HDFS/Hive

Generate HiveQL and execute in Hadoop

Leverage existing Hadoop transformations


Simplifies creation of Hadoop and MapReduce code to boost productivity

Integrates big data heterogeneously via industry standards: Hadoop, MapReduce, Hive, NoSQL, HDFS

Unifies integration tooling across unstructured/semi-structured and structured data

Optimizes loading of big data to Oracle Exadata using Oracle Big Data Connectors

Engineered for running on and integrating with Oracle Big Data Appliance via Big Data Connectors

Oracle Data Integrator for Big Data


Oracle GoldenGate for Continuous Streaming to Hadoop

• Leverages GoldenGate & HDFS / Hive Java APIs

• My Oracle Support Documents

  • HDFS – 1586210.1
  • Hive – 1586188.1

• Can also integrate with Flume for delivery to HDFS
  • Flume – 1926867.1

Overview


Oracle Data Integration Can Help Right Now


Any Sources

Staging

Temp

Prod

Files

Files

Detail

MR

MR

Oracle Data Integrator Oracle GoldenGate

Fast Load SQL

#1 – Tools not Spaghetti
• “ETL 101” avoid complex, costly custom coding

#2 – Non-invasive Capture and Staging
• Move data without inefficient batch extracts

#3 – Processing is Taken to the Data
• No separate ETL engine needed
• Eliminate unnecessary data movement
• Reclaim latency and time from network overhead

#4 – Native Hadoop Execution
• Choose the right Hadoop language for your use case
  • HiveQL, Pig, Spark, Storm, Java/MR2, etc.
• Template driven code gen keeps pace w/change on Hadoop platform

#5 – Native SQL Pushdown
• Optimize some join types within the Data Warehouse

#6 – Oracle Optimized
• OGG and ODI certified to run on the Oracle Appliances
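The "template driven code gen" point in #4 can be pictured with a small sketch: one logical mapping rendered through a per-engine template. This is only an illustration under stated assumptions. The `TEMPLATES` dictionary, the `generate` function and the mapping layout are hypothetical and are not ODI Knowledge Modules or any Oracle API.

```python
from string import Template

# Hypothetical templates, one per execution engine (not actual ODI Knowledge Modules).
TEMPLATES = {
    "sql": Template("INSERT INTO $target ($cols)\nSELECT $exprs\nFROM $source"),
    "hive": Template("INSERT INTO TABLE $target\nSELECT $exprs\nFROM $source"),
}


def generate(mapping, engine):
    """Render one logical mapping as a statement for the chosen engine."""
    columns = mapping["columns"]  # target column -> source expression
    return TEMPLATES[engine].substitute(
        target=mapping["target"],
        source=mapping["source"],
        cols=", ".join(columns),
        exprs=", ".join(f"{expr} AS {col}" for col, expr in columns.items()),
    )


mapping = {
    "source": "stg_orders",
    "target": "fact_sales",
    "columns": {"customer": "UPPER(customer)", "total": "amount"},
}
for engine in ("sql", "hive"):
    print(generate(mapping, engine))
    print()
```

Supporting another engine in this sketch means adding a template, not rewriting the mapping, which is the property the slide attributes to template-driven generation.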


Heterogeneous Reservoir with Oracle Data Integration


Flume Hive on MR, Tez, Spark

Logs

OLTP DB

SQOOP

OGG

Pig on MR, Tez, Spark

ODI

SQOOP

Any DW

OGG

Spark

Oozie

OEDQ OEMM

Data Validation & Cleansing

Metadata Mgmt & Lineage

API/File

Hive/HCat, HDFS,HBase

Hive/HCat, HDFS,HBase

NoSQL

Flume


Load to Oracle

OLH/OSCH

Red Stack Reservoir with Oracle Data Integration


Transform Hive

ODI

Hive/HDFS

Federate Hive/HDFS to Oracle

Big Data SQL

Oracle DB OLTP

Load from Oracle

CopyToBDA

Hive/HDFS

Federate Oracle to Hive

Query Provider for Hadoop

OGG OGG Hive/HDFS


Engineered System for Big Data from Oracle


DISK

PCI

FLASH

DRAM

Warm Data

Hottest Data

Active Data

• Engineered data platform

• ODI Data Transformation at the speed of DRAM or the scale of Hadoop

• Utilize each data tier for specialized algorithms & compression

• Speed of DRAM

• I/Os of Flash

• Cost of Disk

• Scale of Hadoop

Hadoop

DISKS Deep Data

Oracle Data Integrator

Oracle GoldenGate

Fully exploit Big Data SQL, In-Memory and No-SQL Advancements from Oracle


Oracle Big Data Strategy

Acquire – Organize – Analyze

Oracle BI Foundation Suite

Oracle Real-Time Decisions

Endeca Information Discovery

Decide

Oracle Big Data Connectors

Oracle Data Integrator

Oracle Advanced Analytics

Oracle Database

Oracle Spatial & Graph

Stream

Oracle Event Processing

Apache Flume

Oracle GoldenGate

Oracle NoSQL Database

Cloudera Hadoop

Oracle R Distribution

Oracle Big Data SQL


Streaming Reservoir with NoSQL and DIS


Transform (Hive, Pig/Oozie, Spark)

ODI

Federate Hive/HDFS

Big Data SQL

Oracle NoSQL

Hive/HDFS

OGG

OGG

Hive/HDFS Any DB

Sensors & Events

Hive/HDFS

OEP

Load to Oracle

OLH/OSCH


GoldenGate and Streaming Data


Sensors

Apps

Apps

Storm / Flume / Spark / Kafka / etc

Hive (high speed apply) & HBase

OGG OGG

OGG OGG

OGG OGG

OGG

OGG

Leverage DB transactions w/in realtime analytic streams

Stage DB records for subsequent processing

Open OGG APIs for capture of non-DBMS events

Non-invasive Capture and Staging

• Move data without batch extracts


Oracle Does Big Data Integration Better


Dynamic Data Movement
– CDC is by default, not an add-on
– Least invasive on sources
– Proven best performance
– Native Oracle capture/apply

vs. Batch Data Movement
– Typical ETL vendors all default to batch data movement in their reference architectures
– Some can “talk the talk” but their CDC tech can’t touch Oracle GoldenGate scale/performance

NoETL Engine
– Take the processing to the data; don’t move the data to the process
– Leverage your data engines for the workloads (Hadoop or SQL)

vs. ETL Engine Must Scale Alongside Hadoop
– Carefully watch how ETL engines scale out; parallelism runs via the Engine – more H/W to buy
– Map out the physical deployment architecture, compare to ODI, the TCO difference will be clear

Most Heterogeneous
– Leverage open source Hadoop, not proprietary distributions
– Hadoop is the Hub, not ETL tools
– Open metadata standards

vs. Proprietary Vendor Lock-in
– One popular ETL vendor puts their engines at the center of the architecture, not Hadoop
– The mainframe of ETL vendors has proprietary features that mainly run in their own distro
– A “fake free” ETL vendor sells proprietary add-ons


Oracle Does Big Data Better: Dynamic Data Movement


HDFS (Files)

HBase (NoSQL)

Hive / Hive Streaming (SQL)

Flume & Storm (Streaming)

Kafka (MPP Pub/Sub)

Spark Streaming (Machine Learning)

Capture Database Transactions and Deliver to Big Data in Real-Time

GoldenGate: Capture → Trail → Pump → Route → Deliver
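The capture, trail and deliver stages named on this slide can be sketched conceptually as follows. This is a toy model and not GoldenGate's implementation or API: the `Source` class, its in-memory `log` list (standing in for a database transaction log), and the `capture` and `deliver` functions are all hypothetical. The point is that changes are read from the log after a checkpoint and replayed on the target, so no batch extract of the source tables is needed.

```python
class Source:
    """Toy source database that records every change in an append-only log."""

    def __init__(self):
        self.rows = {}
        self.log = []  # stand-in for the database transaction log

    def write(self, op, key, value=None):
        if op == "DELETE":
            self.rows.pop(key, None)
        else:  # INSERT or UPDATE
            self.rows[key] = value
        self.log.append((op, key, value))


def capture(source, checkpoint):
    """Read log records written after the checkpoint; return (trail, new checkpoint)."""
    trail = source.log[checkpoint:]
    return trail, len(source.log)


def deliver(trail, target):
    """Replay the captured change records, in order, against the target."""
    for op, key, value in trail:
        if op == "DELETE":
            target.pop(key, None)
        else:
            target[key] = value
    return target


src, target = Source(), {}
src.write("INSERT", 1, "alice")
src.write("INSERT", 2, "bob")
trail, checkpoint = capture(src, 0)
deliver(trail, target)

src.write("UPDATE", 1, "alicia")
src.write("DELETE", 2)
trail, checkpoint = capture(src, checkpoint)
deliver(trail, target)
print(target, checkpoint)
```

The checkpoint is what lets the second capture pick up only the two new changes instead of rereading the whole log.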


Oracle Does Big Data Better: Invented Pushdown Processing


ORCL Investments in ELT/Pushdown Tech

Eras: Eon of Scripts and PL-SQL; Era of Native SQL; Big Data Revolution

Timeline (from the 1990’s): Scripted SQL; Stored Procs; Warehouse Builder; Data Integrator (Heterogeneous); ODI for Columnar DBs; ODI for In-Memory DBs; ODI for Engineered Systems; ODI for Hadoop Hive; ODI for Hadoop NoSQL; ODI for Hadoop Pig & Oozie; ODI for Spark; ODI for …

Oracle’s tool maturity and operational know-how for E-LT is unmatched

10x bigger footprint with E-LT than next closest competitor using “pushdown”

Simple and easy way to blend Hadoop and SQL E-LT execution from one tool


Oracle Does Big Data Better: NoETL Approach


One Logical Design, Many Engine Alternatives

Data Engine: SQL / OLTP Database
• Examples: Oracle DBMS, Any OLTP DBMS, DW Appliances
• Engine I/O: SSD / Disk based
• Best Use: High volumes of transformations on relational data

Data Engine: MapReduce
• Examples: Hive / MR2, Pig / Oozie / MR2
• Engine I/O: SSD / Disk based
• Best Use: Huge batch-like transformations on any data types

Data Engine: In Memory (SQL / Big Data)
• Examples: Oracle InMemory, Hive / Tez / YARN, Spark / YARN, Cloudera Impala
• Engine I/O: D/RAM; with various built in spill to disk approaches
• Best Use: Highly interactive data transformation patterns

Data Engine: Streaming Big Data
• Examples: Storm / YARN, Oracle Event Processor (OEP)
• Engine I/O: D/RAM; “always on” data pipeline
• Best Use: Very low latency transformations

Modern design studio for simple map development

Team-based GUI Tooling for work on Enterprise projects

Integrated lifecycle and metadata management

Automated support for Changed Data Capture

SEPARATE ETL ENGINE NOT REQUIRED!

Data Integrator


Oracle Does Big Data Better: Clear Business Benefits


Proven Technology (RISK)
• Unlike custom coding, a tools based approach is proven to result in lower cost long term operations
• Oracle GoldenGate is industry standard for Data Replication
• Oracle invented E-LT Pushdown processing and is 10x more widely deployed than competitors

Better Architecture (SCALE)
• Oracle GoldenGate provides the most scalable, native integration for database replication
• Oracle Data Integrator provides ultimate scalability and choice for Hadoop data transformations
• Consistent agent-based architecture avoids having multiple, incompatible engines (e.g., old-style ETL tools)

Best for Oracle (COMPLETE)
• Exadata – OGG and ODI are deeply integrated and are the only Replication and ETL processes certified to run on the appliance
• Big Data Appliance – deeply integrated technology part of core reference architecture
• Big Data Connectors – ODI included with core connector technologies for Hadoop

Heterogeneous Access


Big Data Reservoir

Cloud

GOV Data Governance

What Will Transform Data Integration Solutions in the Future?


Core Data Governance Solution Use Cases


System Consolidation/ Migration

Enterprise DQ Services/ Governance

DW/BI Enablement

MDM Enablement

Application Enablement

Compliance


Resources


Oracle Data Integration OracleDataintegration OracleGoldenGate ORCLGoldenGate blogs.oracle.com/dataintegration

Follow us and connect with our community


Questions & Answers
