Welcome [tc18.tableau.com] · 2020-01-06 · Hyper compiles each SQL query to machine code and then...
Welcome
Boom Goes the Data Engine! Turbocharging Tableau with Hyper
Tobias Muehlbauer
Jan Finis
#TC18
What is Hyper?
2008: Hyper Started as a Research Project at Technical University Munich
Academic Success
Commercial Spin-Off
Early 2016: Tableau Acquires Hyper
Europe R&D Center in Munich with Now Over 30 Employees
10.5: Replace Tableau Data Engine in Existing Tableau Products
2018+: Data Engine for Prep, Improve Existing Scenarios, and Evaluate New Use Cases
Hyper
Hyper as the Data Engine Replacement
Desktop Online Public Server
Hyper
Hyper as the Data Engine for Prep
Customer Impact in 10.5 and 2018
Speed and size: fast analysis on data of all sizes
• Larger data sets
• Query performance scales linearly with number of CPU cores
Data freshness: faster extract creation and refreshes
• Ingestion at speed of data source
• No post-processing phase
Enterprise ready: improved scalability and performance
• Improved throughput with Hyper and Tableau Server
Learnings
Integration project that replaces a core component
Scalability and performance are an end-to-end story
Every data set and query workload is different
• Continuous performance improvements
• Differences in size of extract files
• Materialization of calculations
Resource usage and deployment guidelines
Why are Databases So Slow?
Pat Hanrahan, Co-Founder and Chief Scientist of Tableau
Keynote at an Academic Database Conference, 2012
Why is Hyper So Fast?
Modern hardware …
... and what it means for database systems
A Changing Hardware Landscape
Main memory capacities are growing fast
A Changing Hardware Landscape
CPUs are based on an increasingly complex super-scalar multi-core architecture
A Changing Hardware Landscape: What Does It Mean for Database Systems?
If a hand-coded program is faster than all databases, then why can't the database just generate this program?
Hyper compiles each SQL query to machine code and then executes this code
Traditional Interpretation vs. Compilation
Traditional interpreting database:
• Interpreter has to handle all possible queries → very general
• Cannot be adapted to the specific query at hand → no query-specific optimizations
• Query execution starts immediately
Compiling database (Hyper):
• Code is generated for the specific query at hand → highly specialized
• Query-specific optimizations are baked into the program → highly optimized for the query at hand
• Execution proceeds in three steps:
1. Generate a highly optimized C program (fast)
2. Compile it to machine code (slow, a few seconds for C code)
3. Execute the machine code
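The contrast can be sketched in miniature. The following is a toy illustration in Python, not Hyper's actual code generator (which emits a real compiled program): a generic interpreter dispatches on the query's expression tree for every row, while a "compiled" query is a specialized function with the predicate baked in.

```python
# Toy contrast between interpreting and "compiling" a query.
# Illustration only; Hyper generates and compiles machine code.

rows = [{"a": i, "b": i * 2} for i in range(10)]

# Interpreted: a generic evaluator walks the expression tree per row.
def eval_expr(expr, row):
    op = expr[0]
    if op == "col":
        return row[expr[1]]
    if op == "const":
        return expr[1]
    if op == "+":
        return eval_expr(expr[1], row) + eval_expr(expr[2], row)
    if op == ">":
        return eval_expr(expr[1], row) > eval_expr(expr[2], row)
    raise ValueError(op)

# The query WHERE a + b > 10, as an expression tree.
query = (">", ("+", ("col", "a"), ("col", "b")), ("const", 10))
interpreted = [r for r in rows if eval_expr(query, r)]

# "Compiled": generate a specialized function with the predicate
# baked in, so there is no per-row dispatch over the tree.
src = "def compiled_filter(row):\n    return row['a'] + row['b'] > 10\n"
namespace = {}
exec(src, namespace)
compiled = [r for r in rows if namespace["compiled_filter"](r)]

assert interpreted == compiled  # same result, specialized execution
```

The specialized function does exactly one thing, so the per-row interpretation overhead (tree walking, operator dispatch) disappears, which is the point of the "highly specialized" column above.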
Why is Compilation Attractive Now?
Analogy: Fetching data from memory = getting a document (1 cycle = 3 feet)
Processing the data = Reading the document
Latency (1 cycle = 3 feet):
• CPU registers: ≈1 cycle
• L1 cache: ≈4 cycles
• L2 cache: ≈10 cycles
• L3 cache: ≈40 cycles
• RAM: ≈200 cycles
• Disk (HDD): ≈4-40 million cycles (the other side of the earth)
For decades, databases “went around the earth” to get your data
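The "other side of the earth" claim checks out arithmetically. The cycle counts below are the approximate figures from the slide; the conversion to miles is our own back-of-the-envelope arithmetic.

```python
# Sanity-check the "1 cycle = 3 feet" analogy.
FEET_PER_CYCLE = 3
FEET_PER_MILE = 5280

latencies = {
    "CPU registers": 1,
    "L1 cache": 4,
    "L2 cache": 10,
    "L3 cache": 40,
    "RAM": 200,
    "Disk (HDD)": 40_000_000,  # upper end of the 4-40M range
}

for level, cycles in latencies.items():
    miles = cycles * FEET_PER_CYCLE / FEET_PER_MILE
    print(f"{level}: {cycles} cycles ≈ {miles:,.4f} miles")

# Disk at 40M cycles: 120M feet ≈ 22,727 miles, i.e. roughly the
# antipode of a ~24,900-mile circumference -- the other side of the
# earth. RAM at 200 cycles is just 600 feet: across the street.
```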
Hyper
Query Optimization
Efficient execution is not enough!
How Well Do Databases Optimize?
Problem: Query optimization is inherently hard
→ Only real experts can write good query optimizers
→ Many existing systems lack various optimizations
→ Only very few researchers in the whole world specialize in query optimization
So, why is Hyper good at optimization?
Prof. Dr. Thomas Neumann, co-founder of Hyper and principal advisor at Tableau, is one of these few researchers.
Thomas is a leading researcher in database systems research specializing in query optimization.
Parallelization
Modern CPUs have lots of cores!
But more cores are only beneficial if the software can keep these cores busy
No parallelization: Almost no utilization
Traditional parallelization in database systems only scales to a few cores; with more cores there is no further speedup
Hyper’s morsel-driven parallelization fully utilizes large numbers of cores (>120)
Morsel Driven Parallelism
Assume you have 4 people (cores) and your goal is to eat a cake (process a query) as fast as possible
How do you do it?
How Traditional Databases Eat Cake
Cut the cake into four equally large pieces
Every person eats one piece
What if one person is slower than the others? (skew)
Hard-to-eat nuts
Piece is larger than anticipated
Distracted by other work
…
In the end, she eats alone, while the others have to wait
Bad CPU utilization
What if new people arrive or have to leave after the cake was cut?
→ Load balancing is hard/no elasticity
Why Skew Really Hurts
Amdahl’s law: The theoretical speedup of a partly parallel task is always limited by the non-parallel part of the task
If not all parts of query execution are fully parallelized, scalability will be limited
The more processors you want to utilize, the better you must parallelize
Skew forces a part of the task to be serial (last person eats alone)
Example: with 95% of the work parallelized, 32 cores yield only about a 13x speedup (roughly 40% utilization)
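The slide's numbers (95% parallel, 32 cores, ~13x, ~40%) follow directly from Amdahl's law:

```python
# Amdahl's law: the speedup of a task where a fraction p is
# parallelizable across n cores; the remaining (1 - p) runs serially.

def amdahl_speedup(p: float, n: int) -> float:
    """Theoretical speedup with parallel fraction p on n cores."""
    return 1.0 / ((1.0 - p) + p / n)

# 95% parallel work on 32 cores:
speedup = amdahl_speedup(0.95, 32)
utilization = speedup / 32

print(f"speedup: {speedup:.1f}x")        # ≈ 12.5x (~13x)
print(f"utilization: {utilization:.0%}") # ≈ 39% (~40%)
```

Even a 5% serial fraction wastes most of a 32-core machine, which is why Hyper parallelizes every phase of query execution rather than only the obviously parallel parts.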
How Hyper Eats Cake
Cut the cake into very small morsels
Everyone grabs a new morsel whenever they finish their current one
A faster eater simply eats more morsels
If morsels are small enough, all eaters finish at roughly the same time (skew resilience)
If a tastier cake becomes available, an eater can switch to it quickly (query prioritization)
Number of eaters per cake can change dynamically (elasticity)
Enabling morsel-driven parallelism in existing database systems is a lot of effort
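The cake analogy maps directly onto a work-queue pattern. Below is a minimal sketch of the idea (the queue, morsel size, and worker function are our own illustration, not Hyper's implementation): workers pull small morsels from a shared queue, so a faster worker simply processes more morsels and all workers finish at roughly the same time.

```python
import threading
from queue import Queue, Empty

# Morsel-driven parallelism sketch: split the input into many small
# morsels; each worker repeatedly grabs the next morsel from a shared
# queue (skew resilience: fast workers just take more morsels).

MORSEL_SIZE = 1000
data = list(range(100_000))

morsels: Queue = Queue()
for start in range(0, len(data), MORSEL_SIZE):
    morsels.put(data[start:start + MORSEL_SIZE])

partial_sums = []
lock = threading.Lock()

def worker() -> None:
    local = 0
    while True:
        try:
            morsel = morsels.get_nowait()
        except Empty:
            break  # cake is finished
        local += sum(morsel)  # "eat" the morsel
    with lock:
        partial_sums.append(local)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

result = sum(partial_sums)
print(result)  # == sum(range(100_000)) == 4999950000
```

Because workers only claim a morsel when they are free, the worker count can change between morsels, which is what gives the elasticity and query-prioritization properties listed above.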
Hyper versus TDE on 32 cores
More than Analytics …
Supporting Transactions and Analytics
Combining a Transactional System and a Data Warehouse is hard
Hyper: The New Data Engine
Extract Creation
Extract Refresh
Federation
Project Maestro Dashboards
Interactive Analysis
Deep Analytics
Hyper allows both efficient management of your data and fast analysis of its latest state.
Please complete the session survey from the My Evaluations menu in your TC18 app