John Mallory, EMC @ the Chief Data Officers Forum ANZ - Sydney, Feb 2015
© Copyright 2013 EMC Corporation. All rights reserved.
Transform Your Analytics with a Business Data Lake
John Mallory CTO, Analytics
See More Completely
On Demand Access to All Data
Analyze More Deeply
Act More Precisely
Deeper Insights Better Business Outcomes
Data Nirvana
The Old Way: Bringing Data to Compute
ERP, CRM, RDBMS, Machines Files, Images, Video, Logs, Clickstreams External Data Sources
EDWs Marts Search Servers Document Stores Storage
Complex Architecture
• Many special-purpose systems
• Moving data around
• No complete views
Visibility
• Leaving data behind
• Risk and compliance
• High cost of storage
Time to Data
• Up-front modeling
• Transforms slow
• Transforms lose data
Cost of Analytics
• Existing systems strained
• No agility
• BI backlog
The Result: Data & Application Silos
Finance Manufacturing Marketing Research
60% YoY Growth Floods These Silos *
65% of Capacity Is Copy Data *
*IDC
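To make the two IDC figures concrete, here is a rough arithmetic sketch. It assumes the 60% growth compounds annually at a constant rate and that the copy-data share stays at 65%; the slide states neither, and the 100 TB starting capacity is a hypothetical baseline chosen only for illustration.

```python
import math

GROWTH_RATE = 0.60      # 60% year-over-year growth (figure from the slide)
COPY_DATA_SHARE = 0.65  # 65% of capacity is copy data (figure from the slide)


def capacity_after(years, start_tb=100.0, rate=GROWTH_RATE):
    """Capacity after compounding `rate` annually; start_tb is a hypothetical baseline."""
    return start_tb * (1 + rate) ** years


# Years needed for capacity to double at a constant 60% annual growth rate.
doubling_time_years = math.log(2) / math.log(1 + GROWTH_RATE)

# Capacity after five years, and the part of it that is not copy data.
five_year_tb = capacity_after(5)
unique_tb = five_year_tb * (1 - COPY_DATA_SHARE)

print(f"Doubling time: {doubling_time_years:.2f} years")
print(f"Capacity after 5 years: {five_year_tb:.0f} TB, of which unique: {unique_tb:.0f} TB")
```

Under these assumptions capacity doubles in roughly a year and a half and grows about tenfold in five years, with only about a third of it holding unique data.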
The New Way: Bringing Compute to Data
EDWs
Marts Storage
Search Servers
Documents
Archives
ERP, CRM, RDBMS, Machines Files, Images, Video, Logs, Clickstreams External Data Sources
1. Active archive
• Full fidelity original data
• Indefinite time, any source
• Lowest cost storage
2. Data management, transformations
• One source of data for all analytics
• Persisted state of transformed data
• Significantly faster & cheaper
3. Self-service exploratory BI
• Simple search + BI tools
• “Schema on read” agility
• Reduce BI user backlog requests
4. Multi-workload analytic platform
• Bring applications to data
• Combine different workloads on common data (i.e. SQL + Search)
• True BI agility
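The “schema on read” idea mentioned above can be illustrated with a small sketch. This is not taken from the presentation: the event records, field names, and the read_with_schema helper are all hypothetical. The point is only that raw data is stored once, unmodified, and each consumer applies its own schema when reading.

```python
import json

# Hypothetical raw events, stored exactly as received (full fidelity).
RAW_EVENTS = [
    '{"user": "a", "page": "/home", "ms": 120}',
    '{"user": "b", "page": "/buy", "ms": 340, "sku": "X1"}',
    '{"user": "a", "page": "/buy", "ms": 95, "sku": "X2"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time: select fields, cast types, use None for missing fields."""
    for line in raw_lines:
        record = json.loads(line)
        yield {
            name: (cast(record[name]) if name in record else None)
            for name, cast in schema.items()
        }


# Two consumers project different views from the same raw data.
latency_view = list(read_with_schema(RAW_EVENTS, {"page": str, "ms": float}))
purchase_view = [
    row
    for row in read_with_schema(RAW_EVENTS, {"user": str, "sku": str})
    if row["sku"] is not None
]
```

Neither view required modeling the data up front, and a third consumer could later read fields that nobody planned for, because the raw records were never transformed on the way in.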
Drivers for Business Data Lakes
• Traditional cost models do not scale to “Big Data”
• Very little of the data generated is used, due to the expense and complexity of storing and processing it
• Challenges accessing “atomic”-level raw data
• Time-consuming data transformation limits agility and exploration
• New data sources are emerging that do not fit traditional storage paradigms
What is a Business Data Lake?
• Single Unified Data Pool = “Single Copy of Truth”
• Multiple Access Points & Methods
• Enterprise Data Governance & Protection
• Data Migration Immune
See IDC’s Insight: “Enterprise Data Lake Platforms: Deep Storage for Big Data & Analytics” July 2014
Data Lakes Make Analytics More Efficient
A data lake lets you bring your analytics tools to your data. Data is shared between projects with centralized control & standardized analytics tooling.
• Ingest: Capture data from a wide range of sources, traditional and new.
• Store: Store everything in one environment for cross data-set analysis.
• Analyze: Use advanced algorithms to discover new, predictive patterns.
• Surface: Share insight with business domain experts.
• Act: Build data-driven applications that meet business needs.
Business Value of Analytic Agility
[Chart: Value of Analytics ($) plotted against Complexity; regions labeled BI and Data Science]
• Descriptive Analytics: What happened?
• Diagnostic Analytics: Why did it happen?
• Predictive Analytics: What will happen?
• Prescriptive Analytics: How can we make it happen?
Current State Analytics
[Diagram labels: Existing Enterprise Data Warehouse ($$$$; Highly Summarized / Processed Data); Traditional Data Sources (ERP, HR, SFDC); New Data Sources/Formats (Machine); Load; ETL; Backup Storage; Trash; BI / Analytical Tools]
Business Users:
• “This data doesn’t look right – where’s the detail?”
• “I really need data I know we have, but it’s not accessible”
• “I can’t afford to keep buying more EDW/Datamarts at this growth!”
What is ETL Offload?
[Diagram: Sources, Extract, Hadoop/HDFS (MapReduce & SQL), Load of Cleansed Data, EDW, Load, Business Apps (incl. Data Marts)]
• Sources: increased variety (logs, transactional, etc.)
• Hadoop/HDFS: raw data, massive growth; cost effective $; ideal for new data sources
• EDW: reclaim for higher value apps; improve query performance
Preserve existing EDW without increasing costs and risk
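A minimal sketch of the offload pattern on this slide, under stated assumptions: an in-memory list stands in for raw data kept in Hadoop/HDFS, plain Python stands in for the MapReduce or SQL transforms, and an in-memory sqlite3 database stands in for the EDW. The record layout and the names RAW_LANDING_ZONE, cleanse, and load_into_warehouse are hypothetical and do not come from the presentation.

```python
import sqlite3

# Hypothetical raw records as they might land in the lake, untouched.
RAW_LANDING_ZONE = [
    "2015-02-01,acct-1, 120.50 ",
    "2015-02-01,acct-2,not-a-number",
    "2015-02-02,ACCT-1,79.50",
]


def cleanse(raw_lines):
    """Transform step run outside the warehouse: parse, normalize, skip bad rows."""
    for line in raw_lines:
        day, account, amount = (field.strip() for field in line.split(","))
        try:
            yield day, account.lower(), float(amount)
        except ValueError:
            # The bad row is skipped here but remains in the raw zone for inspection.
            continue


def load_into_warehouse(rows):
    """Load only cleansed rows; sqlite3 stands in for the EDW in this sketch."""
    edw = sqlite3.connect(":memory:")
    edw.execute("CREATE TABLE sales (day TEXT, account TEXT, amount REAL)")
    edw.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    edw.commit()
    return edw


edw = load_into_warehouse(cleanse(RAW_LANDING_ZONE))
totals = edw.execute(
    "SELECT account, SUM(amount) FROM sales GROUP BY account ORDER BY account"
).fetchall()
```

The warehouse only ever sees cleansed rows and spends its capacity on queries, while the raw records, including the one that failed parsing, stay available in the cheaper store.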
The Business Data Lake Approach
Analytic Sandbox
Ad Hoc Analytic Environment
Structured BI Reporting Environment
Data Preparation and Enrichment
Via Hadoop
ALL data fed into Business Data Lake
EDW ETL
Business Data Lake
Offload ETL to Hadoop
Hadoop Enables the Business Data Lake
An ecosystem for storing and processing any data type
• Large community of users and developers
• Easily extended with new interfaces and tools
• Not limited to single data type – can access any data
• Store, process, and analyze any size data sets
Hadoop For The Business Data Lake?
Typical Hadoop
• Direct-attached storage
• Stand-alone servers
• Single purpose
• All commodity environment
Typical Challenges
• Support at scale
• Rapid deployment
• “What Now” factor
• Intensive learning curve
Reintroduces old challenges that IT solved years ago
A Better Business Data Lake Approach
[Diagram labels: Report, Mobile, Analyze, Files, Archive, Web]
National Healthcare Organization Replaces Aging Platform, Adopts Data Lake
Challenge:
• Aging IBM infrastructure could not support new SAS Access and Visual Analytics technology
• Interest in enabling infrastructure to support a for-profit healthcare analytics-as-a-service business
• Sought to provide refined data sets to other insurance companies for their own research; needed a way to cleanse data
Solution:
• Stepwise evolution of platform onto GPDB, one of two certified platform partners for running Visual Analytics
• Established data lake as platform for upload, cleansing and conversion of private data into publicly consumable datasets
Large Telco Reduces Response Time for Regulatory Reports from 1 Week to 1 Day
Challenge:
• Distributed and heterogeneous data infrastructure made it difficult to respond to regulatory report requests
• Data volumes prevented analysis of information across broad timescales
Solution:
• Reports which required more than one week to create can now be turned around the same day
• 1 PB of system storage capacity allows the analysis of all data
• Platform combining PHD and Isilon makes it possible to scale infrastructure for storage without having to worry about compute
Thank You!
Check out Inovalon & EMC video on YouTube
Questions?