Welcome [tc18.tableau.com] · 2020-01-06 · Hyper compiles each SQL query to machine code and then...
Welcome
Boom Goes the Data Engine! Turbocharging Tableau with Hyper
Tobias Muehlbauer
Jan Finis
#TC18
What is Hyper?
2008: Hyper Started as a Research Project at Technical University Munich
Academic Success
Commercial Spin-Off
Early 2016: Tableau Acquires Hyper
Europe R&D Center in Munich with Now Over 30 Employees
10.5: Replace Tableau Data Engine in Existing Tableau Products
2018+: Data Engine for Prep, Improve Existing Scenarios, and Evaluate New Use Cases
Hyper
Hyper as the Data Engine Replacement
Desktop Online Public Server
Hyper
Hyper as the Data Engine for Prep
Customer Impact in 10.5 and 2018
Speed and size: fast analysis on data of all sizes
• Larger data sets
• Query performance scales linearly with number of CPU cores
Data freshness: faster extract creation and refreshes
• Ingestion at speed of data source
• No post-processing phase
Enterprise ready: improved scalability and performance
• Improved throughput with Hyper and Tableau Server
Learnings
Integration project that replaces a core component
Scalability and performance are an end-to-end story
Every data set and query workload is different
• Continuous performance improvements
• Differences in size of extract files
• Materialization of calculations
Resource usage and deployment guidelines
Why are Databases So Slow?
Pat Hanrahan, Co-Founder and Chief Scientist of Tableau
Keynote at an Academic Database Conference, 2012
Why is Hyper So Fast?
Modern hardware …
... and what it means for database systems
A Changing Hardware Landscape
Main memory capacities are growing fast
A Changing Hardware Landscape
CPUs are based on an increasingly complex super-scalar multi-core architecture
A Changing Hardware Landscape: What Does It Mean for Database Systems?
If a hand-coded program is faster than all databases, then why can't the database just generate this program?
Hyper compiles each SQL query to machine code and then executes this code
Traditional Interpretation vs. Compilation
Traditional interpreting database:
• Interpreter has to handle all possible queries → very general
• Cannot be adapted to the specific query at hand → no query-specific optimizations
• Query execution starts immediately
Compiling database (Hyper):
• Code is generated for the specific query at hand → highly specialized
• Query-specific optimizations are baked into the program → highly optimized for the query at hand
• Execution proceeds in three steps:
1. Generate a highly optimized C program (fast)
2. Compile it to machine code (slow, a few seconds for C code)
3. Execute the machine code
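The contrast can be sketched in miniature. The following is a toy illustration in Python, not Hyper's actual code generator (which emits a real compiled program): a generic interpreter dispatches on the query's expression tree for every row, while a "compiled" query is a specialized function with the predicate baked in.

```python
# Toy contrast between interpreting and "compiling" a query.
# Illustration only; Hyper generates and compiles machine code.

rows = [{"a": i, "b": i * 2} for i in range(10)]

# Interpreted: a generic evaluator walks the expression tree per row.
def eval_expr(expr, row):
    op = expr[0]
    if op == "col":
        return row[expr[1]]
    if op == "const":
        return expr[1]
    if op == "+":
        return eval_expr(expr[1], row) + eval_expr(expr[2], row)
    if op == ">":
        return eval_expr(expr[1], row) > eval_expr(expr[2], row)
    raise ValueError(op)

# The query WHERE a + b > 10, as an expression tree.
query = (">", ("+", ("col", "a"), ("col", "b")), ("const", 10))
interpreted = [r for r in rows if eval_expr(query, r)]

# "Compiled": generate a specialized function with the predicate
# baked in, so there is no per-row dispatch over the tree.
src = "def compiled_filter(row):\n    return row['a'] + row['b'] > 10\n"
namespace = {}
exec(src, namespace)
compiled = [r for r in rows if namespace["compiled_filter"](r)]

assert interpreted == compiled  # same result, specialized execution
```

The specialized function does exactly one thing, so the per-row interpretation overhead (tree walking, operator dispatch) disappears, which is the point of the "highly specialized" column above.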
Why is Compilation Attractive Now?
Analogy: Fetching data from memory = getting a document (1 cycle = 3 feet)
Processing the data = Reading the document
Latency (1 cycle = 3 feet):
• CPU registers: ≈1 cycle
• L1 cache: ≈4 cycles
• L2 cache: ≈10 cycles
• L3 cache: ≈40 cycles
• RAM: ≈200 cycles
• Disk (HDD): ≈4-40 million cycles (the other side of the earth)
For decades, databases “went around the earth” to get your data
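The "other side of the earth" claim checks out arithmetically. The cycle counts below are the approximate figures from the slide; the conversion to miles is our own back-of-the-envelope arithmetic.

```python
# Sanity-check the "1 cycle = 3 feet" analogy.
FEET_PER_CYCLE = 3
FEET_PER_MILE = 5280

latencies = {
    "CPU registers": 1,
    "L1 cache": 4,
    "L2 cache": 10,
    "L3 cache": 40,
    "RAM": 200,
    "Disk (HDD)": 40_000_000,  # upper end of the 4-40M range
}

for level, cycles in latencies.items():
    miles = cycles * FEET_PER_CYCLE / FEET_PER_MILE
    print(f"{level}: {cycles} cycles ≈ {miles:,.4f} miles")

# Disk at 40M cycles: 120M feet ≈ 22,727 miles, i.e. roughly the
# antipode of a ~24,900-mile circumference -- the other side of the
# earth. RAM at 200 cycles is just 600 feet: across the street.
```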
Hyper
Query Optimization
Efficient execution is not enough!
How Well Do Databases Optimize?
Problem: Query optimization is inherently hard
→ Only real experts can write good query optimizers
→ Many existing systems lack various optimizations
→ Only very few researchers in the whole world specialize in query optimization
So, why is Hyper good at optimization?
Prof. Dr. Thomas Neumann, co-founder of Hyper and principal advisor at Tableau, is one of these few researchers.
Thomas is a leading researcher in database systems research specializing in query optimization.
Parallelization
Modern CPUs have lots of cores!
But more cores are only beneficial if the software can keep these cores busy
No parallelization: Almost no utilization
Traditional parallelization in database systems only scales to a few cores; with more cores there is no further speedup
Hyper’s morsel-driven parallelization fully utilizes large numbers of cores (>120)
Morsel Driven Parallelism
Assume you have 4 people (cores) and your goal is to eat a cake (process a query) as fast as possible
How do you do it?
How Traditional Databases Eat Cake
Cut the cake into four equally large pieces
Every person eats one piece
What if one person is slower than the others? (skew)
Hard-to-eat nuts
Piece is larger than anticipated
Distracted by other work
…
In the end, she eats alone, while the others have to wait
Bad CPU utilization
What if new people arrive or have to leave after the cake was cut?
→ Load balancing is hard/no elasticity
Why Skew Really Hurts
Amdahl’s law: The theoretical speedup of a partly parallel task is always limited by the non-parallel part of the task
If not all parts of query execution are fully parallelized, scalability will be limited
The more processors you want to utilize, the better you must parallelize
Skew forces a part of the task to be serial (last person eats alone)
Example: with 95% of the work parallelized, 32 cores yield only about a 13x speedup (roughly 40% utilization)
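The slide's numbers (95% parallel, 32 cores, ~13x, ~40%) follow directly from Amdahl's law:

```python
# Amdahl's law: the speedup of a task where a fraction p is
# parallelizable across n cores; the remaining (1 - p) runs serially.

def amdahl_speedup(p: float, n: int) -> float:
    """Theoretical speedup with parallel fraction p on n cores."""
    return 1.0 / ((1.0 - p) + p / n)

# 95% parallel work on 32 cores:
speedup = amdahl_speedup(0.95, 32)
utilization = speedup / 32

print(f"speedup: {speedup:.1f}x")        # ≈ 12.5x (~13x)
print(f"utilization: {utilization:.0%}") # ≈ 39% (~40%)
```

Even a 5% serial fraction wastes most of a 32-core machine, which is why Hyper parallelizes every phase of query execution rather than only the obviously parallel parts.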
How Hyper Eats Cake
Cut the cake into very small morsels
Everyone grabs a new morsel whenever they finish their current one
A faster eater simply eats more morsels
If morsels are small enough, all eaters finish at roughly the same time (skew resilience)
If a tastier cake becomes available, an eater can switch to it quickly (query prioritization)
Number of eaters per cake can change dynamically (elasticity)
Enabling morsel-driven parallelism in existing database systems is a lot of effort
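The cake analogy maps directly onto a work-queue pattern. Below is a minimal sketch of the idea (the queue, morsel size, and worker function are our own illustration, not Hyper's implementation): workers pull small morsels from a shared queue, so a faster worker simply processes more morsels and all workers finish at roughly the same time.

```python
import threading
from queue import Queue, Empty

# Morsel-driven parallelism sketch: split the input into many small
# morsels; each worker repeatedly grabs the next morsel from a shared
# queue (skew resilience: fast workers just take more morsels).

MORSEL_SIZE = 1000
data = list(range(100_000))

morsels: Queue = Queue()
for start in range(0, len(data), MORSEL_SIZE):
    morsels.put(data[start:start + MORSEL_SIZE])

partial_sums = []
lock = threading.Lock()

def worker() -> None:
    local = 0
    while True:
        try:
            morsel = morsels.get_nowait()
        except Empty:
            break  # cake is finished
        local += sum(morsel)  # "eat" the morsel
    with lock:
        partial_sums.append(local)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

result = sum(partial_sums)
print(result)  # == sum(range(100_000)) == 4999950000
```

Because workers only claim a morsel when they are free, the worker count can change between morsels, which is what gives the elasticity and query-prioritization properties listed above.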
Hyper versus TDE on 32 cores
More than Analytics …
Supporting Transactions and Analytics
Combining a Transactional System and a Data Warehouse is hard
Hyper: The New Data Engine
Extract Creation
Extract Refresh
Federation
Project Maestro Dashboards
Interactive Analysis
Deep Analytics
Hyper allows both efficient management of your data and fast analysis of its latest state.
Please complete the session survey from the My Evaluations menu in your TC18 app