Buckeye DAMA - Data Virtualization and Data …


Page 1

Data Virtualization and Data Integration

Building a Modern Enterprise Data Architecture

Dave Chiou - Denodo Sales Engineer

Tom LaSalle - Denodo Sales Director

Page 2

Agenda

1. Data Virtualization and Data Integration Market Perspectives

2. Data Virtualization Capabilities and How It Works

3. Data Virtualization and Data Integration Styles

4. Modern Data Architecture Examples

5. Data Virtualization Benefits and ROI

*** BREAK ***

1. Data Virtualization Demonstration

2. Participant Discussion Forum

Page 3

Gartner Gives DV Its Highest Maturity Rating

“Data Virtualization can be deployed with low risk and effort to achieve maximum value.”

Source: https://www.gartner.com/en/newsroom/press-releases/2018-09-11-gartner-hype-cycle-for-data-management-positions-three-technologies-in-the-innovation-trigger-phase-in-2018

Page 4

Data virtualization has emerged as a mature data delivery style, with more than 35% of the surveyed organizations making extensive use of it in production-level deployments for both analytics and operational use cases.

By 2020, organizations utilizing data virtualization as a data delivery style will spend 45% less than those who do not on building and managing data integration processes for connecting distributed data assets.

Through 2022, 60% of all organizations will implement data virtualization as one key delivery style in their data integration architecture.

Source: Gartner 2018 Data Virtualization Market Guide

Page 5

IT and Business Going in Different Directions

BI Benchmark Report

High Cost - IT spends ~1% of Revenue on ETL & Storage

▪ 75% of data stored is not used – largely wasted

▪ 90% of all queries are for current data

▪ Lots of data is not available in the EDW or data lakes

Long Time – Months to Build ETL Process & DataMarts

▪ 2+ months to add a new data source to an EDW

▪ 1 – 2 months to build a complex dashboard or report

Data Challenges

By 2020

▪ 500% growth in Data & Device Avalanche

▪ Due to lack of data accessibility today, < 0.5% of all data is ever analyzed and used

Source:

Business Speeding Up

To remain competitive, by 2020, Business Decision Speed & Analysis Sophistication Requires 300% Increase

Source:

Page 6

Solution to IT/Business divergence:

Data Challenges

By 2020

▪ 500% growth in Data & Device Avalanche

▪ Due to lack of data accessibility today, < 0.5% of all data is ever analyzed and used

Source:

Business Speeding Up

To remain competitive, by 2020, Business Decision Speed & Analysis Sophistication Requires 300% Increase

Source:

Data Virtualization:

The only agile data delivery platform that enables:

▪ IT and Business to move at different speeds so

▪ IT can store data in the most efficient way w/o affecting the business &

▪ Business can use the best tool to make decisions without affecting IT

▪ Add new data sources and consumers without limitations

FedEx for Data

Page 7

Rising Complexity of Data Management

• Exponential growth of data and wide variety of disparate data sources (NoSQL, IoT, Open Source, SaaS applications)

• Adding capacity to existing physical data warehouse is expensive. High level of effort to integrate and model data.

• Expansion of Big Data/Analytics by growing consumers of data

• Migration to Cloud and Hybrid distributed multi-platform deployments – Develop Modern Data Architecture

• Reduce or eliminate Data Latency

• Need for better Data Governance

• Security and Data Privacy requirements

• Need for Agile Self-Service BI

Page 8

http://mattturck.com/bigdata2018/

2018 Big Data and AI Landscape – “Increased Complexity”

Page 9

Gartner Logical Data Warehouse: the Path to the Future

Sources: www.gartner.com/en/documents/3871182 and www.datavirtualizationblog.com/virtual-data-lake-business-user/

Page 10

“Connect” vs “Collect”

Logical Data Warehouse Benefits from a Customer Implementation

50% less time vs traditional data warehouse approaches

3 hours vs 3 days: sourcing data for BI vs traditional ETL methods

Data from different technologies/sources can be easily combined

LDW helped to free up resources to work on other Enterprise projects.

Page 11

Six Essential Capabilities of Data Virtualization

Unified Data Integration and Rapid Delivery of Data to Business

1. Single Access Point to Data – consumers decoupled from data sources (location agnostic)

2. Semantic/Abstraction Layer - Data in business friendly form (abstracts data source formats)

3. Real-time information, Zero replication

4. Access from any Tool / Protocol – ODBC / JDBC / Data service / API Layer, etc.

5. Centralized Metadata, Security & Governance

6. Self-Service Data Services

DATA VIRTUALIZATION LAYER

Page 12

How Does It Work?

Data Virtualization Platform

Sources: Hadoop, Web Site, REST Web Service, Multidimensional, Salesforce, S3 Bucket, RDBMS/EDW

Base View (Source Abstraction): Client Address, Client Type, Company, Invoicing, Service Usage, Product, Logs, Web Incidents

Combine, Transform & Integrate: Customer, Invoice, Product, Service Usage, Incident, combined into Customer 360°

Publish: SQL, SOAP, REST, ODATA, etc.; Denodo’s Information Self Service
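The layering on this slide (sources, base views that abstract each source, a combined view, a published result) can be sketched in a few lines. This is only an illustration under stated assumptions, not Denodo's implementation: an in-memory SQLite database stands in for what would be separate physical systems, and every table, view, and column name (crm_client, billing_invoice, bv_customer, bv_invoice, customer_360) is hypothetical.

```python
import sqlite3

# One in-memory database stands in for two separate physical sources
# (hypothetical CRM and billing systems with made-up rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crm_client (client_id INTEGER, name TEXT, client_type TEXT);
    CREATE TABLE billing_invoice (client_id INTEGER, amount REAL);
    INSERT INTO crm_client VALUES (1, 'Acme', 'Company'), (2, 'Bolt', 'Company');
    INSERT INTO billing_invoice VALUES (1, 100.0), (1, 50.0), (2, 75.0);

    -- Base views: one per source, hiding source-specific column names.
    CREATE VIEW bv_customer AS
        SELECT client_id AS customer_id, name, client_type FROM crm_client;
    CREATE VIEW bv_invoice AS
        SELECT client_id AS customer_id, amount FROM billing_invoice;

    -- Derived view: combine the base views into one business-friendly entity.
    CREATE VIEW customer_360 AS
        SELECT c.customer_id, c.name,
               COUNT(i.amount) AS invoices,
               COALESCE(SUM(i.amount), 0) AS total_invoiced
        FROM bv_customer c
        LEFT JOIN bv_invoice i ON c.customer_id = i.customer_id
        GROUP BY c.customer_id, c.name;
""")


def publish_customer_360():
    """Return the combined view's rows, as a consuming tool would see them."""
    return conn.execute(
        "SELECT customer_id, name, invoices, total_invoiced "
        "FROM customer_360 ORDER BY customer_id"
    ).fetchall()
```

A consumer only queries customer_360; nothing is copied, and a renamed source column would be absorbed by changing the base view alone.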

Page 13

A Modern Data Virtualization Architecture

Current Architecture: Point to Point

1. Connect to disparate data sources

2. Combine related data into views

3. Consume in business applications

DATA CONSUMERS (Analytical, Operational): Enterprise Applications, Reporting, BI, Portals, ESB, Mobile, Web, Users, IoT/Streaming Data

DATA VIRTUALIZATION: Data as a Service; Query, Search and Browse; Multiple protocols and formats; Library of wrappers; Any data or content; Metadata Repository; Execution Engine & Optimizer; Virtual Databases; Semantic Layer; Agile Development; Performance; Resource Management; Lifecycle Management; Data Services; Data Catalog; Governance & Metadata; Security & Data Privacy

DISPARATE DATA SOURCES (More Structured to Less Structured): Databases & Warehouses, Cloud/SaaS Apps, Big Data, NoSQL, Web, XML, Excel, PDF, Word...

Page 14

System                Execution Time   Data Transferred   Optimization Technique

Data Virtualization   9 sec.           4 M                Aggregation push-down

Federation            125 sec.         292 M              None: full scan

SELECT c.id, SUM(s.amount) AS total
FROM customer c JOIN sales s
ON c.id = s.customer_id
GROUP BY c.id

Data Virtualization Optimization is much more efficient than reporting tools’ federation engines

Federation (full scan): Sales (290 M) and Customer (2 M) are transferred in full, then join, then group by.

Data Virtualization (aggregation push-down): group by is applied at the Sales source, so 2 M from Sales and 2 M from Customer are transferred, then join.
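The difference between the two rows of the comparison (292 M versus 4 M transferred) comes from where the GROUP BY runs. The sketch below illustrates the idea on toy data; it is not how any particular engine implements it. The data, function names, and the "rows transferred" counter are all hypothetical, and the rewrite is only valid under the assumptions that the grouping key equals the join key and that customer ids are unique.

```python
from collections import defaultdict

# Hypothetical toy data: (customer_id, amount) sales rows and customer ids.
SALES = [(1, 10.0), (1, 5.0), (2, 7.0), (2, 3.0), (2, 1.0), (3, 4.0)]
CUSTOMERS = [1, 2, 3]


def federate_naive():
    """Pull every row from both sources, then join and group centrally."""
    transferred = len(SALES) + len(CUSTOMERS)
    known = set(CUSTOMERS)
    totals = defaultdict(float)
    for customer_id, amount in SALES:
        if customer_id in known:           # join on customer id
            totals[customer_id] += amount  # group by + sum
    return dict(totals), transferred


def aggregate_at_source():
    """What the sales source would compute if sent the GROUP BY itself."""
    partial = defaultdict(float)
    for customer_id, amount in SALES:
        partial[customer_id] += amount
    return list(partial.items())


def federate_with_pushdown():
    """Transfer one pre-aggregated row per customer, then join centrally."""
    partial = aggregate_at_source()
    transferred = len(partial) + len(CUSTOMERS)
    known = set(CUSTOMERS)
    totals = {cid: total for cid, total in partial if cid in known}
    return totals, transferred
```

Both functions return the same totals, but the push-down variant moves one sales row per customer instead of every sales row, which mirrors the 2 M + 2 M versus 290 M + 2 M figures on the slide.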

Page 15

Data Virtualization, ETL, ESB Compared

Guiding principles on when to use DV versus other methods for data delivery are driven by your non-functional requirements (use cases, TCO, time-to-market)

MIDDLEWARE: ETL / CDC

PURPOSE: Physical Movement and Consolidation

MODEL: DB to DB; Scheduled (ETL) or Event Driven (CDC)

STRENGTHS:

▪ Building DWs and MDM Hubs

▪ Complex workflows and DQ

▪ Historical data and cubes

MIDDLEWARE: Data Virtualization

PURPOSE: Logical Abstraction and Virtual Integration

MODEL: DB and Applications; On demand

STRENGTHS:

▪ Distributed access and delivery

▪ Agility and timeliness

▪ Logical Data Warehouse

MIDDLEWARE: EAI / ESB

PURPOSE: Synchronization and Propagation

MODEL: Application to Application; Event Driven

STRENGTHS:

▪ Business process automation

▪ Transaction propagation

▪ Messaging with small payloads

Page 16

Use Case Summary

Use Case DV ETL ESB

Moving data into EDW or ODS ✔

Migrating EDW (to Cloud) ✔ ✔

Data Unification ✔

Customer 360º ✔

Real-time insights ✔ ✔

Agile Data Marts ✔

Physical Data Marts ✔

Agile Reporting (from EDW + other sources) ✔

Logical Data Warehouse ✔

Data Warehouse Offloading ✔ ✔

Application Synchronization ✔ ✔

Metadata Discovery and Enrichment ✔

Self-Service Analytics ✔

ETL “seeding” (decouple ETL from sources) ✔

Event-Driven Workflows ✔

DV and ETL used in conjunction for solution

Page 17

Data Hub/Data Lake – Modern Data Architecture

Page 18

Modern Data Architecture - Revisited

DATA VIRTUALIZATION

Page 19

Modern Data Architecture - Revisited

Page 20

Data Virtualization Reference Architecture

Page 21

IoT Reference Architecture

Data sources and processing: Streaming Data, Ingestion, Streaming Analytics, Big Data Storage, Other RDBMS (Apps, CRM, SAP, …), Other Sources (SaaS, Salesforce, …), Batch Processing (ETL → EDW), Batch Analytics, Machine Learning

Consumption: Data Discovery, Self-Service, Search, Reporting, Data Insights, Real-Time Decision Management, Alerts, Scorecards, Dashboards, Predictive Analytics, Statistical Analytics (R), Text Analytics, Data Mining

Data Virtualization: Data Services, Security & Governance, Abstraction, Transformation, Federation, Data Catalog (Categorize, Query, Discover, Collaborate), Dynamic Query Optimization, Cost Based Optimizer, Lifecycle Management, Data Caching

Page 22

Benefits of Data Virtualization

• Expose all data needed by users and enable better decision making

• Remove silo barriers of access

• Provide data catalog to allow users to find the data they need

• Expose curated data sets or allow data scientists to explore data

• New data sources can be configured in days rather than months or weeks

• Approximately 40% to 60% cost savings from development

• Approximately 30% to 40% test cycle reduction, resulting in cost savings

• Operational cost reduction by 40% by eliminating possible physical copies

• Zero cost to remove physical copies

• Controlled and audited access to data

• Security based on user roles, not application silos

• Reduced number of data copies floating around the organization

• Visibility into changing data access patterns

• Managed self-service with ‘guard rails’

Faster time to value from data

Better management and security

Reduced data integration costs

Page 23

Benefits of Data Virtualization - Metrics

Value Driver | Metric | Goal | Actual

Time to Develop | Time to develop data service in days | 50% | 90%

Time to Deploy | Time to deploy data service in days | 50% | 90%

TTM | Overall time it takes to make data service available for use | 60% | 90%

Time to Engage | Time it takes for business to engage with IT | 75% | 75%

Performance | Performance of data services | 50% | 60%

Impact Analysis | How fast can we perform impact analysis | 50% | 90%

Enterprise Architectural Alignment | Ease at which data from disparate sources can be integrated | Security, data classification | High

Page 24

Customer-reported projected savings by percentage

ROI and TCO of Data Virtualization

Data Integration Cost reduction

▪ 60-80% savings

Traditional Call Centres, Portals

▪ 30-70% savings

BI and Reporting

▪ 40-60% savings

ETL and Data Warehousing

▪ Project timelines of 6-12 months reduced to 3-6 months

▪ Up to 85% reduction in time

• New sources can be configured in minutes, and fully integrated within days.

• 100’s of application entities can be integrated within weeks.

• New business functionality can be added within days.

• Existing functionality can be enhanced with new data within days.

• Data proliferation can be significantly reduced.

• Common, consistent and timely access to all data via preferred visualization tools.

Page 25

Three Key Takeaways

FIRST Takeaway

Data architectures are getting more complex…and users shouldn’t have to struggle navigating this complexity

SECOND Takeaway

Data Virtualization is a technology that hides and simplifies access to a wide variety of data for many different users – from the ‘casual users’ (with curated data sets) to the power users

THIRD Takeaway

Data Virtualization enables organizations to build a modern, flexible, and extensible data architecture while providing the security and governance needed in regulated environments

Page 26

Demo Scenario

What’s the impact of a new marketing campaign for each country?

Historical sales data offloaded to Hadoop cluster for cheaper storage

Marketing campaigns managed in an external cloud app

Country is part of the customer details table, stored in the DW

Sources: Sales (2.8 million rows), Campaign, Customer (100,000 rows)

Base View (Source Abstraction)

Combine, Transform & Integrate: join, group and sum, join

Consume: Data Catalog, Virtual Table (View)

Role Based Security & Masking

Push Down Optimization & Caching
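The demo question (campaign impact per country) reduces to two joins followed by a group and sum. The following is a rough sketch of that query shape only, not the actual demo: an in-memory SQLite database stands in for the Hadoop cluster, the cloud app, and the DW, and the schema, the sample rows, and the assumption that each sales row carries a campaign_id are all hypothetical.

```python
import sqlite3


def campaign_impact_by_country(campaign_name):
    """Total sales per country for one campaign, over three stand-in sources."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Hypothetical stand-ins: sales (Hadoop), campaign (cloud app),
        -- customer (data warehouse, holds the country).
        CREATE TABLE sales (customer_id INTEGER, campaign_id INTEGER, amount REAL);
        CREATE TABLE campaign (campaign_id INTEGER, name TEXT);
        CREATE TABLE customer (customer_id INTEGER, country TEXT);
        INSERT INTO sales VALUES
            (1, 10, 20.0), (2, 10, 30.0), (3, 10, 5.0), (1, 11, 99.0);
        INSERT INTO campaign VALUES (10, 'spring'), (11, 'fall');
        INSERT INTO customer VALUES (1, 'US'), (2, 'US'), (3, 'ES');
    """)
    rows = conn.execute(
        """
        SELECT cu.country, SUM(s.amount) AS total
        FROM sales s
        JOIN customer cu ON s.customer_id = cu.customer_id   -- adds country
        JOIN campaign ca ON s.campaign_id = ca.campaign_id   -- adds campaign
        WHERE ca.name = ?
        GROUP BY cu.country                                   -- group and sum
        ORDER BY cu.country
        """,
        (campaign_name,),
    ).fetchall()
    conn.close()
    return rows
```

In a data virtualization platform the three tables would stay in their own systems, and the optimizer would decide which parts of the query to push down to each source.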

Page 27

Thanks!

www.denodo.com

© Copyright Denodo Technologies. All rights reserved

Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization from Denodo Technologies.