Ovum Fireside Chat: Governing the data lake - Understanding what's in there

Fireside Chat with Tony Baer, Ovum Research

Developing a Strategy for Data Lake Governance

Wednesday, May 18, 2016, 1:00 pm EST

Meet today’s speakers

Tony Baer, Principal Analyst, Information Management, Ovum

Tony Baer leads Ovum’s Big Data research area. His coverage focuses on how Big Data must become a first-class citizen in the data center, IT organization, and the business. He has a multi-disciplinary background touching the different tiers of enterprise software. He is an author and sought-after speaker.

Scott Gidley, Vice President of Product, Zaloni

Scott is a nearly 20-year veteran of the data management software and services market. Prior to joining Zaloni, Scott served as senior director of product management at SAS and was previously CTO and cofounder of DataFlux Corporation. Scott received his BS in Computer Science from the University of Pittsburgh.

Delivering on the business of big data

•  Award-winning provider of enterprise data lake management solutions: integrated data lake management platform and self-service data preparation

•  Data Lake Design and Implementation Services: POC, Pilot, Production, Operations, Training

•  Data Science Professional Services

Funded by top-tier technology investors:

Key Findings

•  Data lakes must be managed

•  Data lakes must have the capability to ingest all data & related metadata

•  Data lakes will only succeed if they become shared resources

•  Business users must be prepared to take responsibility for curating data

•  Maturity & readiness of tools, technologies & best practices are works in progress

•  Management & governance of data lakes should be a phased process

Ovum Big Data Report: Developing a Strategy for Data Lake Governance

Data lake is later stage of Hadoop adoption

Figure labels:

Scope: Group, Multi-department, Enterprise

Use cases: Log analytics, Sentiment analysis, DW offload, Data Lake, Exploratory analytics, Line of business analytic applications, Operational analytics

Data lake use case maturity model

Figure labels:

Users: IT, Data Scientists, Business

Use cases: Bulk storage of raw data; Migrate I/O-intensive operations (e.g., ELT); Exploratory analytics; “Deep” analytics (e.g., segmentation, predictive, prescriptive modeling); Line of business analytic applications; Operational analytics

Ovum’s data lake reference architecture

Figure labels:

Components: Availability/Reliability (FT, HA, Backup, DR); Monitoring & troubleshooting; Perimeter Security; Data platform (Hadoop); Query/Analytics tools, programs; Cost Optimization & Integration; Data Inventory; Data Curation; Data-level security; Self-service tier

Legend: Data Lake building block; Hadoop platform management; End user tool

Data lake challenges and complications

•  Ingestion

•  Lack of Visibility

•  Privacy and Compliance

•  Quality Issues

•  Reliance on IT

•  Reusability

•  Rate of Change

•  Skills Gap

•  Complexity

Building: Managing: Delivering:

Zaloni Confidential and Proprietary

Engage the business

•  Discover

•  Enrich

•  Provision

Govern the data in the lake

•  Cleanse

•  Secure

•  Operationalize

Enable the data lake

•  Ingest

•  Organize

•  Catalog

Collaboration key to modern data management

Data Curation: Build your library of information

Physical Inventory: Know/manage what data is in the data lake

Data profiling, data preparation, collaborative data enrichment, catalog, match data, derive master data, record data lineage

Business & Analytics teams; Technology team

Manage data access, track data lineage, tag for security, data retention

Manage data access, tag for security, data retention, lifecycle & workflow, track data lineage
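The inventory and curation activities named on this slide (tag for security, data retention, track data lineage, manage data access) can be pictured as fields on a catalog record. The sketch below is only an illustration under stated assumptions: `CatalogEntry`, `derive`, and every field name are hypothetical and do not describe Zaloni's or Ovum's actual data model.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class CatalogEntry:
    """One hypothetical inventory record for a data set in the lake."""
    name: str
    owner: str
    ingested_on: date
    retention_days: int
    security_tags: set = field(default_factory=set)
    lineage: list = field(default_factory=list)  # names of upstream data sets

    def is_expired(self, today: date) -> bool:
        """True once the retention window has passed."""
        return today > self.ingested_on + timedelta(days=self.retention_days)

    def can_read(self, clearances: set) -> bool:
        """A reader needs every security tag attached to the entry."""
        return self.security_tags <= clearances


def derive(parent: CatalogEntry, name: str, owner: str, today: date) -> CatalogEntry:
    """Register a curated data set, inheriting tags and recording lineage."""
    return CatalogEntry(
        name=name,
        owner=owner,
        ingested_on=today,
        retention_days=parent.retention_days,
        security_tags=set(parent.security_tags),
        lineage=parent.lineage + [parent.name],
    )
```

In this sketch a technology team would create the raw entry at ingest, and a business or analytics team would call `derive` when publishing a curated data set, so security tags and lineage carry over without manual bookkeeping.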

Data lake reference architecture

Diagram labels:

Zones: Source System, Transient Loading Zone, Raw Data, Refined Data, Trusted Data, Discovery Sandbox, Consumption Zone

Source data types: File Data, DB Data, ETL Extracts, Streaming

Systems: OLTP or ODS, Enterprise Data Warehouse, Logs (or other unstructured data), Cloud Services

Data assets: Original unaltered data attributes, Tokenized Data, Reference Data, Master Data

Processing steps: Integrate to common format, Data Validation, Data Cleansing, Aggregations

Discovery activities: Data Wrangling, Data Discovery, Exploratory Analytics

Data lake services: Metadata, Data Quality, Data Catalog, Security

Access: APIs

Users: Business Analysts, Researchers, Data Scientists
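The processing labels in this diagram (Tokenized Data, Data Validation, Data Cleansing, Aggregations) can be read as steps that turn raw records into a refined view. The following is a minimal sketch of that idea, not the platform's implementation: the record fields (`customer_id`, `region`, `amount`), the function names, and the use of a salted SHA-256 digest as a stand-in for tokenization are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def tokenize(value: str, salt: str = "demo-salt") -> str:
    """Replace a sensitive value with a stable, non-reversible token (assumed scheme)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]


def is_valid(record: dict) -> bool:
    """Data validation: require a customer id and a numeric amount."""
    return bool(record.get("customer_id")) and isinstance(record.get("amount"), (int, float))


def cleanse(record: dict) -> dict:
    """Data cleansing: normalise the region field and tokenize the customer id."""
    return {
        "customer": tokenize(record["customer_id"]),
        "region": (record.get("region") or "unknown").strip().lower(),
        "amount": float(record["amount"]),
    }


def refine(raw_records: list) -> dict:
    """Raw to refined: validate, cleanse, then aggregate the amount per region."""
    totals = defaultdict(float)
    for record in raw_records:
        if not is_valid(record):
            continue  # the raw input is not modified; only the refined view skips it
        clean = cleanse(record)
        totals[clean["region"]] += clean["amount"]
    return dict(totals)
```

The raw list is never mutated here, which mirrors the diagram's "Original unaltered data attributes" label: refinement produces a new view rather than overwriting what was ingested.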

DON’T GO IN THE DATA LAKE WITHOUT US
