The Dynamic Duo: How Oracle’s Big Data Appliance & Exadata Deliver Analytics [CON8076]
Mike Sorrels, Sr. VP, Database and Architecture, Regions
Manish Nevrekar, Sr. Database Administrator, Regions
Chris Fox, Director, Enterprise Architecture, Oracle
October 01, 2014, 4:45 - 5:30pm EST
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Today’s Agenda
1. Introduction
2. The Challenge & The Dilemma
3. The Decisions
4. Lessons Learned
5. Things to Consider
Regions Financial Overview
Regions is a Top 20 US Bank with approximately $118 billion in assets
Headquartered in Birmingham, AL
Operations in 16 states in the South and Midwest.
Approximately 1,700 branches and 2,000 ATMs
Product lines include:
• Consumer banking
• Commercial banking
• Wealth management
• Mortgage products and services
• Insurance products and services
The Challenge – We Want To Respond Faster!
• Like Everyone in this Room:
– Regions has been working with data for a long time!
– Regions has lots of sources, warehouses, marts, BI tools, users, etc.
• And…Just Like Everyone Else:
– COMPETITION to keep customers & get new ones is as fierce as ever
– They needed MUCH faster access to customer and operational data
– They wanted to be able to evaluate MUCH more data than in the past
– Their regulatory reporting requirements are intensifying and always changing
– They wanted to provide great data fast
The Dilemma – Old School vs New School
Relational (Old School) vs. Hadoop (New School)
The Dilemma – So, Who Do We Call?
Relational (Old) vs. Hadoop (New)
1. Do we rip-and-replace what we’ve got?
2. Does anyone know Hadoop, Hive or Impala?
3. How can we merge the old with the new?
4. Can an old dog “the DBA” learn new tricks?
The Dilemma – So, Who Do We Call?
Actions: LoB Executives, Data Scientists, Employees & Partners, Financial Analysts, Customer Experience, Machines
All Data: Mixed Data, Social, Operational Data, Machine Events, Historical Data, Transactions
Ken Rudin, Director of Analytics at Facebook, refers to this as “the genius of AND vs. the tyranny of OR” (see his TDWI ’13 presentation).
Both…When They Are Combined…They’re Unstoppable!
Actions: LoB Executives, Data Scientists, Employees & Partners, Financial Analysts, Customer Experience, Machines
Data: Mixed Data, Social, Operational Data, Machine Events, Historical Data, Transactions
Hadoop + Relational
Regions Financial Approach
Data: Acquire, Organize, Analyze, Decide
• InfiniBand: delivers ultra high speed connections for data movement; 4x-10x Regions cloud
• Oracle Big Data Appliance (BDA): single vendor integrates and supports all layers; allows staff to focus on delivering business value; earlier data availability through faster load processing; horizontal scale for growth; lower TCO solution
• Oracle Exadata Database Machine: extremely fast & expandable platform; up to 25x throughput of current systems; used for analytical and transaction applications; scalable, will host other applications in future
• Oracle Enterprise Manager (OEM): administration, monitoring and problem analysis tool; similar tool and views used by DBAs; supplement and cross-train Oracle DBA staff; Oracle’s ‘flagship’ software for integrated platforms
• Oracle Data Integrator (ODI): builds data maps & executes data migration; integrated & engineered for Oracle platforms; simplifies complexities of Big Data technologies; can be leveraged end-to-end, source to marts
• Oracle Big Data Connectors: transparent connection to BDA; reduces duplication of data; allows query drill-thru to BDA
Consumers: Business (1) Data Mart, Business (2) Data Mart, Business (3) Data Mart, Data Services (SOA / API)
Ok…So Where Do I Put The Data?
Hadoop (New): You explore EVERYTHING
Relational (Old): The data is FOCUSED (Gold Tier)
How Do I Access & Move Data Between Them?
Oracle SQL Connector for HDFS
• Provides access to HDFS data as external tables
• Queries data in place, using database compute resources
Oracle Loader for Hadoop
• Provides Data Pump data files on the Hadoop cluster for loading into tables
• Uses MapReduce to transform data to Oracle data types
• Uses compute resources on the Hadoop cluster
Hadoop Relational
How Did We Set Up Exadata & Big Data Appliance to Work Together?
• Sqoop and HDFS_PUT load data from 35+ sources
• Data is converted with JRecord
• Hive tables are created
• External tables are created to access the files in HDFS
• Every day, each of the files is persisted into the database in a new partition
• Views are built on the tables and the external tables
• Partitions are dropped as the data becomes “stale”
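The daily cycle described above (persist each day's files into a new partition, then drop partitions once the data is stale) amounts to a rolling window. A minimal Python sketch of that bookkeeping follows. The function name `roll_partitions`, the `retention_days` parameter and the three-day window are illustrative assumptions, not details of the Regions implementation; a real system would issue the corresponding partition DDL against the database rather than manipulate a list.

```python
from datetime import date, timedelta


def roll_partitions(partitions, load_date, retention_days):
    """Return (kept, dropped) after adding load_date and expiring old partitions.

    partitions: iterable of date objects, one per existing daily partition.
    retention_days: hypothetical window length; the slides give no figure.
    """
    cutoff = load_date - timedelta(days=retention_days)
    # A partition counts as "stale" once its date falls on or before the cutoff.
    dropped = sorted(d for d in partitions if d <= cutoff)
    kept = sorted({d for d in partitions if d > cutoff} | {load_date})
    return kept, dropped


existing = [date(2014, 9, 28), date(2014, 9, 29), date(2014, 9, 30)]
kept, dropped = roll_partitions(existing, date(2014, 10, 1), retention_days=3)
print("kept:", [d.isoformat() for d in kept])
print("dropped:", [d.isoformat() for d in dropped])
```

Keeping the retention rule in one function makes the "stale" threshold explicit and easy to change per data source.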
Ok…So What Data Goes Where?
Hadoop (New):
• You explore EVERYTHING
• The data is MUCH more granular
Relational (Old):
• The data is FOCUSED (Gold)
• The data is FAST
Why Not Just Query Hadoop? (HIVE)
hive> select count(*) from *************;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Starting Job = job_201408051928_44624, Tracking URL = http://0.0.0.0:50030/jobdetails.jsp?jobid=job_201408051928_44624
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201408051928_44624
Output Truncated for Brevity
Ended Job = job_201408051928_44624
MapReduce Jobs Launched:
Job 0: Map: 2503 Reduce: 1 Cumulative CPU: 30850.17 sec HDFS Read: 662156350180 HDFS Write: 11 SUCCESS
Total MapReduce CPU Time Spent: 0 days 8 hours 34 minutes 10 seconds 170 msec
OK
1404377433
Time taken: 236.547 seconds
hive>
Hadoop
That’s REAL SLOW!!!
Why Not Just Query Hadoop? (Exadata HCC)
SQL> select count(*) from *********** PARTITION(**********);
COUNT(*)
-----------------
1404377433
Elapsed: 00:00:04.41
SQL>
Relational
That’s JUST OVER 4 SECONDS!!!
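The two console outputs above can be compared directly. The short Python sketch below only does arithmetic on the figures printed in those outputs (elapsed times and HDFS bytes read); the variable names are illustrative. The two queries are not a controlled benchmark, since the Exadata query reads a single compressed partition, so the ratio is a rough indication only.

```python
# Figures copied from the Hive and SQL*Plus outputs shown on the slides.
hive_seconds = 236.547          # "Time taken: 236.547 seconds"
exadata_seconds = 4.41          # "Elapsed: 00:00:04.41"
hdfs_bytes_read = 662156350180  # "HDFS Read: 662156350180"

# How many times faster the Exadata count(*) returned.
speedup = hive_seconds / exadata_seconds

# Effective scan rate of the Hive job in GB/s (1 GB = 10**9 bytes here).
hive_scan_gb_per_s = hdfs_bytes_read / hive_seconds / 1e9

print(f"Exadata returned about {speedup:.1f}x faster")
print(f"Hive scanned about {hive_scan_gb_per_s:.2f} GB/s from HDFS")
```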
Compression Speeds Up Query Performance!
              Uncompressed   Compressed                   Compression
HDFS data     3 GB           1.75 GB (LZO compression)    1.71 times
Exadata data  3.2 GB         0.425 GB (Warehouse High)    7.5 times
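The compression factors quoted above are simply the uncompressed size divided by the compressed size. A minimal Python check, using only the sizes from the slide (the helper name `compression_ratio` is illustrative):

```python
def compression_ratio(uncompressed_gb, compressed_gb):
    """Return how many times smaller the compressed data is."""
    return uncompressed_gb / compressed_gb


# Sizes as stated on the slide.
hdfs_ratio = compression_ratio(3.0, 1.75)      # LZO on HDFS
exadata_ratio = compression_ratio(3.2, 0.425)  # Warehouse High on Exadata

print(f"HDFS (LZO): {hdfs_ratio:.2f}x")
print(f"Exadata (Warehouse High): {exadata_ratio:.2f}x")
```

Less data on disk means fewer bytes to scan per query, which is one reason the compressed Exadata partition returns so quickly.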
Ok…So What Data Goes Where?
Hadoop (New):
• You explore EVERYTHING
• The data is MUCH more granular
• The data is TRANSFORMED faster
• You can rapidly create NEW data
Relational (Old):
• The data is FOCUSED (Gold Tier)
• The data is FAST
• The data is SECURE
I Need To Secure My Sensitive Data!
Authentication
Hadoop – Sentry:
• Kerberos is used
• User to Service (user running MapReduce)
• Service to Service (Oozie to JobTracker)
• Proxy user (job being run by user through Oozie)
Relational:
• Password based DB authentication

Authorization
Hadoop – Sentry:
• HDFS Unix type (User/Group permissions)
• HiveServer2 RBAC
Relational:
• Role Based authentication
• Row Level Security
• Object Level Preventative Controls

Confidentiality
Hadoop – Sentry:
• TPM on the motherboard
• Passphrase using public/private keys using OpenSSL
Relational:
• Transparent Data Encryption

Audit
Hadoop – Sentry:
• Develop policies, procedures and practices
• Potentially utilize Oracle Audit Vault
Relational:
• Leverage same relational practice currently used
• Consider Oracle Audit Vault
Ok…So What Data Goes Where?
Hadoop (New):
• You explore EVERYTHING
• The data is MUCH more granular
• The data is TRANSFORMED faster
• You can rapidly create NEW data
Relational (Old):
• The data is FOCUSED (Gold Tier)
• The data is FAST
• The data is SECURE
• The data can be JOINED with OPS
Users Don’t Care…They Just Want The Data…Fast!
Combine Data with SQL (Join):
• Risk Data (Relational)
• Application Logs (Hadoop)
• Profit and Loss (Relational)
• Marketing (Relational)
Oracle Confidential – Internal/Restricted/Highly Restricted
So, What Did We Learn On Our Journey?
• Evaluated multiple products and architecture approaches
– Utilize integrated Oracle Engineered Systems
• Overlap implementations for TEST and PRODUCTION environments
– Oracle ACS drives implementation
• Formed a new internal team of technologists
– Leveraged existing associates; hired new associates with related skills
• Support Model
– Cloudera for software support; Oracle for hardware & Exa support
– Leverage Cloudera Manager capabilities
Where do we go from here?
• Expand our Data as a Service
• More data sources and more consumers
• Rapidly provision new data sets and databases with Oracle 12c and Multi-Tenant Migration
• Advanced analytics on Big Data Appliance
• Cluster expansions and scale
• Big Data SQL!
Revolutionize Data Access – Add Big Data SQL!
One fast SQL query, on all your data.
Oracle SQL on Hadoop and beyond
• With a Smart Scan service on Hadoop, as in Exadata
• With native SQL operators
• With the security and certainty of Oracle Database
BIG DATA SQL queries across:
• Profit and Loss (Relational)
• Application Logs (Hadoop)
• Customer Profiles (NoSQL)
Revolutionize Data Speed – Add the In-Memory Option!
[Diagram: In-Memory Analytic Speed. Oracle Exadata (In-Memory, Smart Scan) holds Sales data in memory in both row format and column format; the Big Data Appliance holds Sentiment data in Hadoop; the two are linked by 40Gb/s InfiniBand.]
The Dynamic Duo…Meets the rest of the Team (the Justice League)!
Data Sources
• Content: Docs, Web & Social Media, SMS
• Structured Data Sources: Operational Data, COTS Data, Streaming & BAM
• Master Data & Metadata Mgmt, Ref. Data Sources
• Data Engines & Poly-structured sources, Weather
Data Ingestion/Transform
Information Interpretation
• Raw Data Reservoir: Immutable raw data reservoir. Raw data at rest is not interpreted.
• Foundation Data Layer: Immutable modelled data. Business Process Neutral form. Abstracted from business process changes.
• Access & Performance Layer: Past, current and future interpretation of enterprise data. Structured to support agile access & navigation.
• Discovery Lab Sandboxes: Project based data stores to support specific discovery objectives.
• Rapid Development Sandboxes: Project based data stored to facilitate rapid content / presentation delivery.
Real-Time Access
• Rapid Event Processing: Ingest and Act in Real-Time
• In-Memory Data Grid
Unified SQL Query & Data Virtualization
• Pre-built Analytics & Ad-Hoc
• Data Science
• Enterprise Performance Management
• Mobile
• Information Services
• Next Best Decision
Consumers: LoB Executives, Data Scientists, Employees & Partners, Financial Analysts, Customer Experience, Machines
Stages: Mixed Data Sources, Data Governance, Event Processing, Ingestion and Deep Storage, Transformation and Access, Sandboxes, Consumption/Action
Together…This Team Tackles Every Problem!
[Diagram: the same reference architecture as the previous slide, overlaid with Oracle product labels:]
• BIG DATA APPLIANCE
• DATA INTEGRATION
• DATABASE
• EVENT PROCESSING
• ENDECA ADVANCED ANALYTICS
• HYPERION
• WEBCENTER
• BUSINESS INTELLIGENCE
• ADF MOBILE
• REAL-TIME DECISIONS
• COHERENCE
• MASTER DATA MANAGEMENT
• DATA WAREHOUSING
• BIG DATA SQL
• NO SQL
• METADATA MANAGEMENT
7 Things to Consider on Your Journey
1. Consider how to truly “Revolutionize Data Delivery”…
2. Consider what each technology brings to your platform…
3. Consider where data should live on your platform…
4. Consider how to tackle data governance (Stewards, Tiers of Quality, Lineage)…
5. Consider starting with an architectural approach (Understand How to Grow The Platform)
6. Consider how you will support the entire system (Install, Support, Growth, etc)
7. Consider cross-training the teams…
Hadoop Relational
Conclusion – The Dynamic Duo Always Saves The Day!
• Data Is Manipulated Faster Than Ever as Business Needs Change
• Everyone Gets Cross-Trained & Gets New Skills
• Users Get Faster Access to All Data
• IT Redefines Responsiveness
Hadoop (Big Data Appliance) + Relational (Exadata)
Questions??