Buckeye DAMA - Data Virtualization and Data …
Data Virtualization and Data Integration
Building a Modern Enterprise Data Architecture
Dave Chiou - Denodo Sales Engineer: [email protected]
Tom LaSalle - Denodo Sales Director: [email protected]
Agenda
1. Data Virtualization and Data Integration Market Perspectives
2. Data Virtualization Capabilities and How It Works
3. Data Virtualization and Data Integration Styles
4. Modern Data Architecture Examples
5. Data Virtualization Benefits and ROI
*** BREAK ***
1. Data Virtualization Demonstration
2. Participant Discussion Forum
Gartner Gives DV Its Highest Maturity Rating
“Data Virtualization can be deployed with low risk and effort to achieve maximum value.”
Source: https://www.gartner.com/en/newsroom/press-releases/2018-09-11-gartner-hype-cycle-for-data-management-positions-three-technologies-in-the-innovation-trigger-phase-in-2018
Data virtualization has emerged as a mature data delivery style, with more than 35% of the surveyed organizations making extensive use of it in production-level deployments for both analytics and operational use cases.
By 2020, organizations utilizing data virtualization as a data delivery style will spend 45% less than those who do not on building and managing data integration processes for connecting distributed data assets.
Through 2022, 60% of all organizations will implement data virtualization as one key delivery style in their data integration architecture.
Source: Gartner 2018 Data Virtualization Market Guide
IT and Business Going in Different Directions
BI Benchmark Report
High Cost - IT spends ~1% of Revenue on ETL & Storage
▪ 75% of data stored is not used – largely wasted
▪ 90% of all queries are for current data
▪ Lots of data is not available in the EDW or data lakes
Long Time – Months to Build ETL Process & Data Marts
▪ 2+ months to add a new data source to an EDW
▪ 1 to 2 months to build a complex dashboard or report
Data Challenges
By 2020:
▪ 500% growth in Data & Device Avalanche
▪ Due to lack of data accessibility today, < 0.5% of all data is ever analyzed and used
Source:
Business Speeding Up
To remain competitive, by 2020 business decision speed and analysis sophistication require a 300% increase
Source:
Solution to IT/Business divergence:
Data Virtualization: the only agile data delivery platform that enables:
▪ IT and Business to move at different speeds, so that:
  ▪ IT can store data in the most efficient way without affecting the business
  ▪ Business can use the best tool to make decisions without affecting IT
▪ New data sources and consumers to be added without limitations
FedEx for Data
Rising Complexity of Data Management
• Exponential growth of data and wide variety of disparate data sources (NoSQL, IoT, Open Source, SaaS applications)
• Adding capacity to an existing physical data warehouse is expensive. High level of effort to integrate and model data.
• Expansion of Big Data/Analytics by growing consumers of data
• Migration to Cloud and Hybrid distributed multi-platform deployments – Develop Modern Data Architecture
• Reduce or eliminate Data Latency
• Need for better Data Governance
• Security and Data Privacy requirements
• Need for Agile Self-Service BI
2018 Big Data and AI Landscape – “Increased Complexity”
Source: http://mattturck.com/bigdata2018/
Gartner Logical Data Warehouse: the Path to the Future
Sources: www.gartner.com/en/documents/3871182 and www.datavirtualizationblog.com/virtual-data-lake-business-user/
“Connect” vs “Collect”
Logical Data Warehouse Benefits from a Customer Implementation
50% less time vs traditional data warehouse approaches
3 hours vs 3 days: sourcing data for BI vs traditional ETL methods
Data from different technologies/sources can be easily combined
LDW helped to free up resources to work on other Enterprise projects.
Six Essential Capabilities of Data Virtualization
Unified Data Integration and Rapid Delivery of Data to Business
1. Single Access Point to Data – consumers decoupled from data sources (location agnostic)
2. Semantic/Abstraction Layer – data in business-friendly form (abstracts data source formats)
3. Real-time information, Zero replication
4. Access from any Tool / Protocol – ODBC / JDBC / Data service / API Layer, etc.
5. Centralized Metadata, Security & Governance
6. Self-Service Data Services
DATA VIRTUALIZATION LAYER
How Does It Work?
[Diagram: Denodo’s Data Virtualization Platform, from sources at the bottom to consumers at the top]
Sources: Hadoop, Web Site, REST Web Service, Multidimensional, Salesforce, S3 Bucket, RDBMS/EDW
Base View (Source Abstraction): Client, Address, Client Type, Company, Invoicing, Service Usage, Product, Logs, Web, Incidents
Combine, Transform & Integrate: Customer, Invoice, Product, Service Usage, Incident, combined into Customer 360°
Publish: SQL, SOAP, REST, ODATA, etc.; Information Self Service
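The three layers on this slide (source abstraction, combination, publishing) can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for a virtualization engine, which is an assumption made purely for illustration and is not how Denodo is implemented. The source names (`crm`, `billing`), table and column names, and the functions `build_virtual_layer` and `publish_customer_360` are all hypothetical.

```python
import sqlite3


def build_virtual_layer():
    """Create two stand-in sources plus base and derived views over them."""
    conn = sqlite3.connect(":memory:")
    # Two attached in-memory databases play the role of separate physical sources.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS billing")
    conn.execute("CREATE TABLE crm.client (client_id INTEGER, client_name TEXT)")
    conn.execute("CREATE TABLE billing.invoice (cust INTEGER, amt REAL)")
    conn.executemany("INSERT INTO crm.client VALUES (?, ?)",
                     [(1, "Acme"), (2, "Globex"), (3, "Initech")])
    conn.executemany("INSERT INTO billing.invoice VALUES (?, ?)",
                     [(1, 100.0), (1, 50.0), (2, 75.0)])

    # Base views (source abstraction): hide source-specific names behind
    # business-friendly ones. No data is copied; views are evaluated on demand.
    conn.execute("""CREATE TEMP VIEW bv_customer AS
                    SELECT client_id AS customer_id, client_name AS name
                    FROM crm.client""")
    conn.execute("""CREATE TEMP VIEW bv_invoice AS
                    SELECT cust AS customer_id, amt AS amount
                    FROM billing.invoice""")

    # Derived view (combine, transform and integrate): built only on base views.
    conn.execute("""CREATE TEMP VIEW customer_360 AS
                    SELECT c.customer_id, c.name,
                           COUNT(i.amount) AS invoices,
                           COALESCE(SUM(i.amount), 0) AS total_invoiced
                    FROM bv_customer c
                    LEFT JOIN bv_invoice i ON i.customer_id = c.customer_id
                    GROUP BY c.customer_id, c.name""")
    return conn


def publish_customer_360(conn):
    """Publish step: expose the derived view as plain records for any consumer."""
    columns = ("customer_id", "name", "invoices", "total_invoiced")
    cursor = conn.execute(
        "SELECT customer_id, name, invoices, total_invoiced "
        "FROM customer_360 ORDER BY customer_id")
    return [dict(zip(columns, row)) for row in cursor]
```

Consumers only ever query `customer_360`; if a source table is renamed or moved, only the corresponding base view needs to change.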
A Modern Data Virtualization Architecture
Current Architecture: Point to Point
1. Connect to disparate data sources
   DISPARATE DATA SOURCES (more structured to less structured): Databases & Warehouses, Cloud/SaaS Apps, Big Data, NoSQL, Web, XML, Excel, PDF, Word...
   Library of wrappers; any data or content; multiple protocols and formats
2. Combine related data into views
   DATA VIRTUALIZATION: Metadata Repository, Execution Engine & Optimizer, Virtual Databases, Semantic Layer
   Agile Development, Performance, Resource Management, Lifecycle Management, Data Services, Data Catalog, Governance & Metadata, Security & Data Privacy
3. Consume in business applications
   DATA CONSUMERS (Analytical, Operational): Enterprise Applications, Reporting, BI, Portals, ESB, Mobile, Web, Users, IoT/Streaming Data
   Data as a Service; Query, Search and Browse
System              | Execution Time | Data Transferred | Optimization Technique
Data Virtualization | 9 sec.         | 4 M              | Aggregation push-down
Federation          | 125 sec.       | 292 M            | None: full scan
SELECT c.id, SUM(s.amount) as total
FROM customer c JOIN sales s
ON c.id = s.customer_id
GROUP BY c.id
Data Virtualization Optimization is much more efficient than reporting tools’ federation engines
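[Diagram, Federation: Sales (290 M) and Customer (2 M) are transferred in full; join and group by run in the federation engine]
[Diagram, Data Virtualization: group by is pushed down to Sales, so only 2 M from Sales and 2 M from Customer are transferred before the join]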
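The difference between the two plans can be reproduced on a small scale. The following is a minimal sketch, assuming two independent in-memory SQLite databases stand in for the remote Sales and Customer sources; the function names, row counts, and the use of sqlite3 are illustrative assumptions, not the optimizer of any particular product. Both plans answer the query shown on the slide and count how many rows leave the sources.

```python
import sqlite3
from collections import defaultdict


def make_sources(n_customers=100, sales_per_customer=50):
    """Build two separate databases that stand in for remote sources."""
    customer_db = sqlite3.connect(":memory:")
    customer_db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")
    customer_db.executemany("INSERT INTO customer VALUES (?)",
                            [(i,) for i in range(n_customers)])
    sales_db = sqlite3.connect(":memory:")
    sales_db.execute("CREATE TABLE sales (customer_id INTEGER, amount INTEGER)")
    sales_db.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [(c, s + 1) for c in range(n_customers) for s in range(sales_per_customer)])
    return customer_db, sales_db


def federated_full_scan(customer_db, sales_db):
    """No optimization: pull every row, then join and group in the engine."""
    customers = customer_db.execute("SELECT id FROM customer").fetchall()
    sales = sales_db.execute("SELECT customer_id, amount FROM sales").fetchall()
    transferred = len(customers) + len(sales)
    ids = {cid for (cid,) in customers}
    totals = defaultdict(int)
    for cid, amount in sales:
        if cid in ids:              # inner join on customer id
            totals[cid] += amount   # group by customer id, sum amount
    return dict(totals), transferred


def aggregation_push_down(customer_db, sales_db):
    """Push the GROUP BY to the sales source; only partial sums are transferred."""
    customers = customer_db.execute("SELECT id FROM customer").fetchall()
    partial = sales_db.execute(
        "SELECT customer_id, SUM(amount) FROM sales GROUP BY customer_id").fetchall()
    transferred = len(customers) + len(partial)
    ids = {cid for (cid,) in customers}
    totals = {cid: total for cid, total in partial if cid in ids}
    return totals, transferred
```

With the default sizes, both functions return the same totals, but the full scan moves 5,100 rows while the push-down plan moves 200. The rewrite is valid here because the grouping key is the join key and `customer.id` is unique, so aggregating before the join cannot change any sum.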
Data Virtualization, ETL, ESB Compared
Guiding principles on when to use DV versus other methods for data delivery are driven by your non-functional requirements (use cases, TCO, time-to-market).
MIDDLEWARE: ETL, CDC
  PURPOSE: Physical Movement and Consolidation
  MODEL: DB to DB; Scheduled (ETL), Event Driven (CDC)
  STRENGTHS:
  ▪ Building DWs and MDM Hubs
  ▪ Complex workflows and DQ
  ▪ Historical data and cubes
MIDDLEWARE: Data Virtualization
  PURPOSE: Logical Abstraction and Virtual Integration
  MODEL: DB, Applications; On demand
  STRENGTHS:
  ▪ Distributed access and delivery
  ▪ Agility and timeliness
  ▪ Logical Data Warehouse
MIDDLEWARE: EAI / ESB
  PURPOSE: Synchronization and Propagation
  MODEL: Application to Application; Event Driven
  STRENGTHS:
  ▪ Business process automation
  ▪ Transaction propagation
  ▪ Messaging with small payloads
Use Case Summary
Use Case DV ETL ESB
Moving data into EDW or ODS ✔
Migrating EDW (to Cloud) ✔ ✔
Data Unification ✔
Customer 360° ✔
Real-time insights ✔ ✔
Agile Data Marts ✔
Physical Data Marts ✔
Agile Reporting (from EDW + other sources) ✔
Logical Data Warehouse ✔
Data Warehouse Offloading ✔ ✔
Application Synchronization ✔ ✔
Metadata Discovery and Enrichment ✔
Self-Service Analytics ✔
ETL “seeding” (decouple ETL from sources) ✔
Event-Driven Workflows ✔
DV and ETL used in conjunction for solution
Data Hub/Data Lake – Modern Data Architecture
Modern Data Architecture - Revisited
DATA VIRTUALIZATION
Data Virtualization Reference Architecture
IoT Reference Architecture
[Diagram: IoT reference architecture with Data Virtualization between data sources and consumers]
Data sources and processing: Streaming Data Ingestion, Streaming Analytics, Big Data Storage, Other RDBMS (Apps, CRM, SAP, …), Other Sources (SaaS, Salesforce, …), Batch Processing (ETL → EDW), Batch Analytics, Machine Learning
Data Virtualization layer: Data Services, Security & Governance, Abstraction, Transformation, Federation, Data Catalog (Categorize, Query, Discover, Collaborate), Dynamic Query Optimization, Cost Based Optimizer, Lifecycle Management, Data Caching
Consumption and analytics: Data Discovery, Self-Service Search, Reporting, Data Insights, Real-Time Decision Management, Alerts, Scorecards, Dashboards, Predictive Analytics, Statistical Analytics (R), Text Analytics, Data Mining
Benefits of Data Virtualization
Faster time to value from data
• Expose all data needed by users and enable better decision making
• Remove silo barriers of access
• Provide a data catalog to allow users to find the data they need
• Expose curated data sets or allow data scientists to explore data
• New data sources can be configured in days rather than weeks or months
Reduced data integration costs
• Approximately 40% to 60% cost savings in development
• Approximately 30% to 40% test cycle reduction, resulting in cost savings
• Operational cost reduction of 40% by eliminating physical copies where possible
• Zero cost to remove physical copies
Better management and security
• Controlled and audited access to data
• Security based on user roles, not application silos
• Reduced number of data copies floating around the organization
• Visibility into changing data access patterns
• Managed self-service with ‘guard rails’
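The idea of security based on user roles rather than application silos can be sketched as a single policy applied at the shared access layer. This is an illustrative sketch only: the function `apply_role_policy`, the policy structure, and the masking rule (keep the last four characters) are assumptions for this example and do not represent any product's security model.

```python
def apply_role_policy(rows, role, policy):
    """Return copies of rows with columns hidden or masked for the given role.

    `policy` maps role -> {column: "allow" | "mask"}; anything not listed
    is dropped, so access is denied by default.
    """
    rules = policy.get(role, {})
    result = []
    for row in rows:
        visible = {}
        for column, value in row.items():
            action = rules.get(column, "deny")
            if action == "allow":
                visible[column] = value
            elif action == "mask":
                text = str(value)
                # Keep the last four characters, replace the rest with '*'.
                visible[column] = "*" * max(len(text) - 4, 0) + text[-4:]
        result.append(visible)
    return result


# Hypothetical policy: one definition serves every consuming application.
POLICY = {
    "analyst": {"name": "allow", "ssn": "mask"},
    "auditor": {"name": "allow", "ssn": "allow", "salary": "allow"},
}
```

Because every consumer goes through the same layer, the same rows yield different projections per role without each application re-implementing the rules.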
Benefits of Data Virtualization - Metrics
Value Driver | Metric | Goal | Actual
Time to Develop | Time to develop data service in days | 50% | 90%
Time to Deploy | Time to deploy data service in days | 50% | 90%
TTM | Overall time it takes to make data service available for use | 60% | 90%
Time to Engage | Time it takes for business to engage with IT | 75% | 75%
Performance | Performance of data services | 50% | 60%
Impact Analysis | How fast can we perform impact analysis | 50% | 90%
Enterprise Architectural Alignment | Ease with which data from disparate sources can be integrated | Security, data classification | High
ROI and TCO of Data Virtualization
Customer-reported projected savings by percentage
Data Integration Cost reduction
▪ 60-80% savings
Traditional Call Centres, Portals
▪ 30-70% savings
BI and Reporting
▪ 40-60% savings
ETL and Data Warehousing
▪ Project timelines of 6-12 months reduced to 3-6 months
▪ Up to 85% reduction in time
• New sources can be configured in minutes, and fully integrated within days.
• 100s of application entities can be integrated within weeks.
• New business functionality can be added within days.
• Existing functionality can be enhanced with new data within days.
• Data proliferation can be significantly reduced.
• Common, consistent and timely access to all data via preferred visualization tools.
Three Key Takeaways
FIRST Takeaway
Data architectures are getting more complex, and users shouldn’t have to struggle to navigate this complexity
SECOND Takeaway
Data Virtualization is a technology that hides complexity and simplifies access to a wide variety of data for many different users, from the ‘casual users’ (with curated data sets) to the power users
THIRD Takeaway
Data Virtualization enables organizations to build a modern, flexible, and extensible data architecture while providing the security and governance needed in regulated environments
Demo Scenario
What’s the impact of a new marketing campaign for each country?
▪ Historical sales data offloaded to Hadoop cluster for cheaper storage
▪ Marketing campaigns managed in an external cloud app
▪ Country is part of the customer details table, stored in the DW
[Diagram: Sources → Base View (Source Abstraction) → Combine, Transform & Integrate → Consume]
Base views: Sales (2.8 million rows), Campaign, Customer (100,000 rows)
Operations: join, group and sum, join
Features shown: Data Catalog; Virtual Table (View); Role Based Security & Masking; Push Down Optimization & Caching
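The demo question can be expressed as a small pipeline over the three sources. The following is a sketch under stated assumptions: the record layouts (`customer_id`, `campaign_id`, `amount`, `country`, `name`) and the function `campaign_sales_by_country` are hypothetical, since the slide does not show the actual schemas. Step 1 mimics the part that would be pushed down to the sales source; step 2 mimics the joins and the final group and sum in the virtualization layer.

```python
from collections import defaultdict


def campaign_sales_by_country(sales, customers, campaigns):
    """Total sales amount per (campaign name, country)."""
    country_of = {c["id"]: c["country"] for c in customers}   # from the DW
    name_of = {c["id"]: c["name"] for c in campaigns}         # from the cloud app

    # Step 1: partial aggregation, the part a push-down would run at the
    # sales source so that only one row per (customer, campaign) is moved.
    partial = defaultdict(float)
    for sale in sales:
        partial[(sale["customer_id"], sale["campaign_id"])] += sale["amount"]

    # Step 2: join with customer and campaign, then group and sum by country.
    totals = defaultdict(float)
    for (customer_id, campaign_id), amount in partial.items():
        if customer_id in country_of and campaign_id in name_of:
            totals[(name_of[campaign_id], country_of[customer_id])] += amount
    return dict(totals)
```

A short usage example with made-up records:

```python
sales = [
    {"customer_id": 1, "campaign_id": "A", "amount": 10.0},
    {"customer_id": 2, "campaign_id": "A", "amount": 5.0},
    {"customer_id": 1, "campaign_id": "A", "amount": 2.5},
    {"customer_id": 3, "campaign_id": "B", "amount": 4.0},
]
customers = [{"id": 1, "country": "US"}, {"id": 2, "country": "US"},
             {"id": 3, "country": "DE"}]
campaigns = [{"id": "A", "name": "Spring"}, {"id": "B", "name": "Fall"}]
print(campaign_sales_by_country(sales, customers, campaigns))
```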
Thanks!
www.denodo.com [email protected]
© Copyright Denodo Technologies. All rights reserved
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization from Denodo Technologies.