Data Virtualization Reference Architectures: Correctly Architecting your Solutions for Analytical &...
Correctly Architecting your Solutions for Analytical & Operational Uses
Alberto Bengoa, Sr. Product Manager, Denodo
Agenda
1. Introduction
2. Operational use cases
3. Analytical use cases
4. Deployment architectures
5. Optimization of use cases
Introduction
• The Denodo Platform is powerful middleware
  • Horizontal, rather than vertical
• It can be deployed in a range of use cases
  • Each use case is different
  • The use case guides the way we deploy data virtualization
• Two main groups of use cases:
  • Operational (or transactional)
  • Analytical (or informational)
Operational Use Cases
Operational Use Cases: Characteristics
• Online/interactive work: a human is in an interactive session with the software
  • This requires low query latency
• Highly selective queries: each query touches only a small subset of the data
• Short queries: queries are fast and return small data sets
• Transactional: read and write support with many data sources
• High concurrency: typically there are many clients
Operational Use Cases: Example (call center software)
• Goal: retrieve information about a single customer
• Multiple data sources
  • Billing
  • Sales
  • Technical support
• Potentially write back new information about the customer case
• Many (1000s+) concurrent agents solving customer problems
• Short queries
Analytical Use Cases
Analytical Use Cases: Characteristics
• Offline use cases: queries are typically scheduled
  • Latency is not important
• Processing of very big data sets
  • Long-running queries
• Read-only
• Low concurrency
  • Infrequent queries
Analytical Use Cases: Example of LDW (logical data warehouse) reporting
• Create reports across multiple data sources
  • Daily, weekly, monthly, quarterly, yearly…
• Very big input data sets
  • Fact tables (for example, sales)
  • Multiple dimensions
    • Products
    • Territories
• Small output data sets
Reference Architectures
Denodo Platform Reference Architectures: Operational use cases
Denodo Platform Reference Architectures: Analytical use cases
Denodo Platform Reference Architectures: Analytical use cases (extended)
Optimization and Deployments
Optimization of Use Cases: Operational uses (I)
• Optimize queries for lower execution time
  • Online use cases have a human waiting for results
• Increase data source pool sizes
  • Data source pools can be a bottleneck with many concurrent queries
• Configure a high number of concurrent users
  • Make sure no clients are waiting unnecessarily
  • Hardware resources (CPU, RAM) permitting
• Use historical analysis of queries to characterize workloads
  • Know your queries: median and mean duration, number of concurrent queries, etc.
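The "know your queries" advice can be sketched as a small script over a query log. This is only an illustration under stated assumptions: the log format (one `(start_ms, end_ms)` pair per query) and the function name `characterize` are hypothetical, not part of the Denodo Platform.

```python
from statistics import mean, median

def characterize(queries):
    """Summarize a workload given (start_ms, end_ms) pairs, one per query.

    The input format is an assumption made for this sketch.
    """
    durations = [end - start for start, end in queries]

    # Sweep over start/end events to find the peak number of overlapping queries.
    events = []
    for start, end in queries:
        events.append((start, 1))
        events.append((end, -1))
    # At the same instant, process ends (-1) before starts (+1) so that
    # back-to-back queries are not counted as concurrent.
    events.sort()

    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)

    return {
        "mean_ms": mean(durations),
        "median_ms": median(durations),
        "peak_concurrency": peak,
    }

# Hypothetical log: three short overlapping queries and one long one.
log = [(0, 200), (100, 400), (150, 250), (1000, 3000)]
stats = characterize(log)
print(stats)
```

A large gap between the median and the mean, as in this sample, suggests a few long queries are mixed into an otherwise short-query workload; the peak concurrency is one input when sizing data source pools.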
Optimization of Use Cases: Operational uses (II)
• Use partial caching for very frequent queries
  • Partial caching with a low TTL is especially useful during a single agent session
• Scale horizontally: more servers before bigger servers
  • Each query is small, so many servers add capacity and resiliency
  • Bigger servers do not help with the network bottleneck
• Drop queries or hold them in waiting queues: your choice
  • Use case requirements will determine what is best
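The idea behind caching with a low TTL can be sketched as follows. This is a generic illustration, not the Denodo cache engine: the class `TTLCache`, the loader `fetch_customer`, and the 30-second TTL are all assumptions made for the example.

```python
import time

class TTLCache:
    """Cache query results on demand and expire them after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so the example can fake time
        self._entries = {}          # key -> (expiry_time, value)

    def get_or_load(self, key, loader):
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]           # still fresh: no trip to the data source
        value = loader(key)         # missing or expired: query the source
        self._entries[key] = (now + self.ttl, value)
        return value

# Hypothetical data source lookup that records how often it is called.
calls = []
def fetch_customer(customer_id):
    calls.append(customer_id)
    return {"id": customer_id, "name": "example"}

fake_now = [0.0]
cache = TTLCache(ttl_seconds=30, clock=lambda: fake_now[0])

cache.get_or_load(42, fetch_customer)   # first lookup goes to the source
cache.get_or_load(42, fetch_customer)   # repeated lookup in the same session is cached
fake_now[0] = 31.0
cache.get_or_load(42, fetch_customer)   # entry expired, the source is queried again
print(len(calls))
```

Only the results that are actually requested are stored, and a short TTL bounds how stale an agent's view of the customer can become.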
Optimization of Use Cases: Operational uses (III)
• Single sign-on, pass-through credentials
  • Ensure the data source applies security restrictions when the data is accessed
• Set low logging levels (logging is slow)
  • Make sure logging is done only when absolutely necessary
  • Higher levels can be enabled temporarily to diagnose issues
Optimization of Use Cases: Analytical use cases (I)
• It is critical to push down processing to the sources
  • Analyze the query plans and optimize for delegation
  • Pushdown can reduce network data transfers by orders of magnitude
• High memory settings
  • Source pushdown is key, but sometimes processing at the data virtualization layer is needed
• Set up swapping
  • Ensure that the server does not run out of memory
  • Gracefully degrade performance
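The effect of pushdown on data transfer can be illustrated with a toy source. In this sketch an in-memory SQLite database stands in for a remote data source, and the `sales` table and its columns are invented for the example; the rows returned by each query play the role of network traffic.

```python
import sqlite3

# An in-memory SQLite database stands in for a remote data source.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (territory TEXT, amount INTEGER)")
rows = [("north" if i % 2 else "south", i) for i in range(10_000)]
source.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# No pushdown: every row is transferred and aggregated in the upper layer.
transferred = source.execute("SELECT territory, amount FROM sales").fetchall()
local_totals = {}
for territory, amount in transferred:
    local_totals[territory] = local_totals.get(territory, 0) + amount

# Pushdown: the source aggregates, so only one row per territory is transferred.
pushed = source.execute(
    "SELECT territory, SUM(amount) FROM sales GROUP BY territory"
).fetchall()
pushed_totals = dict(pushed)

print(len(transferred), "rows without pushdown,", len(pushed), "rows with pushdown")
```

Both approaches produce the same totals, but the delegated query moves 2 rows instead of 10,000, which is the kind of reduction the query plan analysis is meant to confirm.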
Optimization of Use Cases: Analytical use cases (II)
• Fast drives for swapping
  • SSDs help reduce the cost of accessing secondary storage
• Balance query optimization between speed, memory usage and data source load
  • Depending on the limitations of the source systems, the requirements of the use case, and the hardware of the data virtualization layer
• Set a low number of concurrent connections
  • Ensure each query is not competing for resources with other queries
Optimization of Use Cases: Analytical use cases (III)
• Increase query timeouts
  • Standard timeouts are too short for long-running queries
• Monitor the servers in real time
  • Check the health of the servers and ensure everything is OK
  • Our monitoring tools allow you to inspect query plans in real time to verify correct query behavior
• Use full cache
  • To cache slow sources
  • To store intermediate results that are reused several times throughout the report
Optimization of Use Cases: Analytical use cases (IV)
• Scale vertically: bigger servers before more servers
  • Bigger servers allow faster processing of big volumes of data
  • More servers do not help in a scenario with few concurrent queries
• Multi-level servers for minimal network transfers and maximum pushdown to sources
  • Multiple data virtualization servers, close to geographically distributed data sources
  • Each server aggregates local data
  • Each server shares its aggregation results (small data sets) with the other servers
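The multi-level pattern (aggregate locally, exchange only small partial results) can be sketched as below. The function names, the regions, and the sample rows are hypothetical; the point is only that sums and counts can be merged across servers without moving raw rows.

```python
from collections import Counter

def local_aggregate(rows):
    """Run close to the data: reduce raw (product, amount) rows to partial sums and counts."""
    sums, counts = Counter(), Counter()
    for product, amount in rows:
        sums[product] += amount
        counts[product] += 1
    return sums, counts

def merge(partials):
    """Run on the top-level server: combine the small partial results from each region."""
    total_sums, total_counts = Counter(), Counter()
    for sums, counts in partials:
        total_sums.update(sums)      # Counter.update adds values per key
        total_counts.update(counts)
    # Report (total, average) per product.
    return {p: (total_sums[p], total_sums[p] / total_counts[p]) for p in total_sums}

# Hypothetical raw data held in two geographically separate sources.
europe = [("widget", 10), ("widget", 30), ("gadget", 5)]
americas = [("widget", 20), ("gadget", 15)]

report = merge([local_aggregate(europe), local_aggregate(americas)])
print(report)
```

Averages are merged from sums and counts rather than from per-region averages, since averaging averages would give wrong results when regions hold different numbers of rows.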
Deploying Mixed Solutions
• Use separate servers for separate use cases
  • Do not mix use cases in the same server
  • Very different performance profiles
• Treat your operational and analytical servers separately
  • Assign different hardware resources
  • Assign separate security configuration
  • Assign separate update policies
  • Assign separate code deployment policies
Thanks!
www.denodo.com [email protected]
© Copyright Denodo Technologies. All rights reserved. Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization of Denodo Technologies.