John Mallory, EMC @ the Chief Data Officers Forum ANZ - Sydney, Feb 2015


1 © Copyright 2013 EMC Corporation. All rights reserved.

Transform Your Analytics with a Business Data Lake

John Mallory CTO, Analytics


See More Completely

On Demand Access to All Data

Analyze More Deeply

Act More Precisely

Deeper Insights Better Business Outcomes

Data Nirvana


The Old Way: Bringing Data to Compute

ERP, CRM, RDBMS, Machines; Files, Images, Video, Logs, Clickstreams; External Data Sources

EDWs, Marts, Search Servers, Document Stores, Storage

Complex Architecture
• Many special-purpose systems
• Moving data around
• No complete views

Visibility
• Leaving data behind
• Risk and compliance
• High cost of storage

Time to Data
• Up-front modeling
• Transforms slow
• Transforms lose data

Cost of Analytics
• Existing systems strained
• No agility
• BI backlog


The Result: Data & Application Silos

Finance Manufacturing Marketing Research

60% YoY Growth Floods These Silos *

65% of Capacity Is Copy Data *

*IDC


The New Way: Bringing Compute to Data

ERP, CRM, RDBMS, Machines; Files, Images, Video, Logs, Clickstreams; External Data Sources

EDWs, Marts, Storage, Search Servers, Documents, Archives

1. Active archive
• Full fidelity original data
• Indefinite time, any source
• Lowest cost storage

2. Data management, transformations
• One source of data for all analytics
• Persisted state of transformed data
• Significantly faster & cheaper

3. Self-service exploratory BI
• Simple search + BI tools
• “Schema on read” agility
• Reduce BI user backlog requests

4. Multi-workload analytic platform
• Bring applications to data
• Combine different workloads on common data (i.e. SQL + Search)
• True BI agility
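The “schema on read” agility named on this slide can be illustrated with a minimal Python sketch: raw records are stored untouched, and a schema is applied only when a consumer reads them. The records, field names, and the `read_with_schema` helper are hypothetical and not taken from the talk.

```python
import json

# Hypothetical raw records, stored exactly as they arrived (full fidelity).
RAW_LINES = [
    '{"user": "a1", "amount": "19.90", "channel": "web"}',
    '{"user": "b2", "amount": "5", "extra": "kept but unused"}',
]

def read_with_schema(lines, schema):
    """Apply a schema at read time: select fields and cast types per query."""
    for line in lines:
        record = json.loads(line)
        # A field missing from a record becomes None instead of blocking the read.
        yield {
            field: (cast(record[field]) if field in record else None)
            for field, cast in schema.items()
        }

# Two consumers read the same stored data through different schemas.
finance_view = list(read_with_schema(RAW_LINES, {"user": str, "amount": float}))
marketing_view = list(read_with_schema(RAW_LINES, {"user": str, "channel": str}))
```

Because nothing is modeled up front, adding a new view only means passing a different schema; the stored data does not change.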


Drivers for Business Data Lakes

• Traditional cost models do not scale to “Big Data”
• Very little of the data generated is used due to expense and complexity of storing and processing data
• Challenges accessing “Atomic” level raw data
• Time-consuming data transformation limits agility and exploration

NEW DATA SOURCES are emerging that do not meet traditional storage paradigms


What is a Business Data Lake?

• Single Unified Data Pool = “Single Copy of Truth”

• Multiple Access Points & Methods

• Enterprise Data Governance & Protection

• Data Migration Immune

See IDC’s Insight: “Enterprise Data Lake Platforms: Deep Storage for Big Data & Analytics” July 2014


Data Lakes Make Analytics More Efficient

A data lake lets you bring your analytics tools to your data. Data is shared between projects with centralized control & standardized analytics tooling.

• Ingest: Capture data from a wide range of sources, traditional and new.
• Store: Store everything in one environment for cross data-set analysis.
• Analyze: Use advanced algorithms to discover new, predictive patterns.
• Surface: Share insight with business domain experts.
• Act: Build data-driven applications that meet business needs.


Business Value of Analytic Agility

[Chart axes: Complexity; Value of Analytics ($). Chart regions: BI; Data Science]

• Descriptive Analytics: What happened?
• Diagnostic Analytics: Why did it happen?
• Predictive Analytics: What will happen?
• Prescriptive Analytics: How can we make it happen?


Current State Analytics

Existing Enterprise Data Warehouse: $$$$ (Highly Summarized / Processed Data)

[Diagram labels: Traditional Data Sources (ERP, HR, SFDC); New Data Sources/Formats (Machine); Load; ETL; Backup Storage; Trash; BI / Analytical Tools]

Business Users:
• “This data doesn’t look right. Where’s the detail?”
• “I really need data I know we have, but it’s not accessible”
• “I can’t afford to keep buying more EDW/Datamarts at this growth!”


What is ETL Offload?

[Diagram: Sources → Extract → Hadoop/HDFS → Load → EDW → Load → Business Apps (incl. Data Marts)]

• Sources: increased variety (logs, transactional, etc.)
• Hadoop/HDFS with MapReduce & SQL: raw data, massive growth; cost effective ($); ideal for new data sources
• EDW: receives cleansed data; reclaim for higher value apps; improve query performance

Preserve existing EDW without increasing costs and risk
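The ETL offload flow on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory lists stand in for Hadoop/HDFS and the EDW, and the `extract`, `cleanse`, and `load` functions and the cleansing rule are hypothetical, not part of any EMC or Hadoop API.

```python
# Stand-ins for the real systems; both are plain in-memory lists here.
data_lake = []   # plays the role of Hadoop/HDFS: keeps every raw record
warehouse = []   # plays the role of the EDW: receives only cleansed rows

def extract(source_records):
    """Land raw records in the lake unchanged, so no detail is lost."""
    data_lake.extend(source_records)

def cleanse(record):
    """Hypothetical cleansing rule: require an id and cast the amount to float."""
    if not record.get("id"):
        return None
    return {"id": record["id"], "amount": float(record.get("amount", 0))}

def load():
    """Transform inside the lake, then load only the cleansed rows to the EDW."""
    for record in data_lake:
        row = cleanse(record)
        if row is not None:
            warehouse.append(row)

extract([
    {"id": "t1", "amount": "10.50", "note": "raw detail stays in the lake"},
    {"id": "", "amount": "3"},  # fails cleansing, but is still kept raw
])
load()
```

The transformation work runs where the raw data lives, and the warehouse only stores the smaller cleansed result, which is the point of offloading ETL from the EDW.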


The Business Data Lake Approach

Analytic Sandbox

Ad Hoc Analytic Environment

Structured BI Reporting Environment

Data Preparation and Enrichment via Hadoop

ALL data fed into Business Data Lake

EDW ETL

Business Data Lake

Offload ETL to Hadoop


Hadoop Enables the Business Data Lake

An ecosystem for storing and processing any data type

• Large community of users and developers
• Easily extended with new interfaces and tools
• Not limited to a single data type; can access any data
• Store, process, and analyze any size data sets


Hadoop For The Business Data Lake?

Typical Hadoop
• Direct-attached storage
• Stand-alone servers
• Single purpose
• All commodity environment

Typical Challenges
• Support at scale
• Rapid deployment
• “What Now” factor
• Intensive learning curve

Reintroduces old challenges that IT solved years ago


A Better Business Data Lake Approach

[Diagram labels: Report, Mobile, Analyze, Files, Archive, Web]


National Healthcare Organization Replaces Aging Platform, Adopts Data Lake

Challenge:
• Aging IBM infrastructure could not support new SAS Access and Visual Analytics technology
• Interest in enabling infrastructure to support for-profit healthcare analytics as a service business
• Sought to provide refined data sets to other insurance companies for their own research, needed way to cleanse data

Solution:
• Stepwise evolution of platform onto GPDB, one of two certified platform partners for running visual analytics
• Established data lake as platform for upload, cleansing and conversion of private data into publicly consumable datasets


Large Telco Reduces Response Time for Regulatory Reports from 1 Week to 1 Day

Challenge:
• Distributed and heterogeneous data infrastructure made it difficult to respond to regulatory report requests
• Data volumes prevented analysis of information across broad timescales

Solution:
• Reports which required more than one week to create can now be turned around same day
• 1PB system storage capacity allows the analysis of all data
• Platform combining PHD and Isilon allows the ability to scale infrastructure for storage without having to worry about compute


Analytics As A Service


Thank You!

• Check out Inovalon & EMC video on YouTube

• Questions?