Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives
Uploaded by: huguk
Category: Technology
© 2014 EXASOL AG
Platform performance comparisons, bare metal and cloud hosting alternatives
Dave Shuttleworth, Principal Consultant, EXASOL UK
Agenda
§ Who is Exasol?
§ Benchmark Background and Objectives
§ Why TPC-DS?
§ TPC-DS approach
§ Test platforms
§ Test Results
§ Summary
Exasol company snapshot
§ Founded in 2000 in Nuremberg, Germany
‒ Based on university research begun in the 1990s at Friedrich Alexander University (Erlangen) and University of Jena
§ Employees today: >60
§ 70+ customers / 150+ installations / 300+ OEM customers
§ Offices: Brazil, Germany, Israel, UK and US
§ Core product offering:
‒ EXASolution in-memory RDBMS
‒ EXAPowerlytics analytics platform
§ Key industries: Digital Media, Retail, Financial Services, Healthcare
EXASOL bet on a columnar, in-memory, massively parallel architecture 15 years ago
The first columnar, in-memory, MPP database
Core design principles
§ Speed
§ Smart
§ Simplicity
Culture
§ R&D driven culture
§ We deliver on our promises
§ Open and straightforward to deal with
70+ customers in 11 countries (plus 300+ OEM customers)
We are the benchmark leader
TPC-H is the industry standard benchmark for analytical databases
[Chart: TPC-H results in QphH@1000 GB, axis from 1,000,000 to 4,000,000, for results published between Aug ´11 and April ´14]
§ Leading result: 4,253,937
§ Microsoft: 519,976
§ Vectorwise: 445,529
§ Oracle: 326,454
§ Sybase IQ: 258,474
§ Microsoft: 219,887
§ Oracle: 209,533
§ Oracle: 201,487
§ Microsoft: 134,117
Source: www.tpc.org / May 26, 2014
Unseen scalability
[Chart: TPC-H results at scale factors 100GB, 300GB, 1,000GB, 3,000GB and 10,000GB]
§ #1 TPC-H, dwarfing our followers
§ The bigger the data, the greater the advantage
§ 30TB/100TB results coming soon!
Background & Objectives
§ We wanted to understand the relative performance of Exasol on cloud deployments vs more conventional ‘bare metal’ installations
§ Collaboration with Bigstep gave us the opportunity to include high performance ‘bare metal’ cloud alongside AWS
§ We decided to use TPC-DS as the benchmark test
‒ Independently specified
‒ Most recent benchmark (and most difficult!)
‒ Already experienced with TPC-H
‒ TPC-DS sample results are already appearing for new products
Why use TPC-DS?
§ TPC general characteristics
‒ Broad industry representation (all decisions taken by the TPC board)
‒ Verifiable (audit process)
‒ Domain specific standard tests
‒ Cross-vendor comparisons (performance, TCO)
‒ Use to evaluate new technologies
‒ Eliminate costly in-house benchmark development
§ TPC-DS
‒ Realistic and understandable data model
‒ Complex workload
‒ Large query set
‒ ETL-like update model
‒ Simple and comprehensible metrics
‒ Already some (restricted) test results released for new products
TPC-DS approach
§ Utilities are provided to generate the raw data sets and queries
§ Dial in the scale factor (i.e. overall database size); we used scale factor 1000 (1TB)
§ Generate raw data (more of which later)
§ Load data using the fastest available method
‒ For a fully audited TPC-DS benchmark the load time is taken into consideration
‒ Tuning via indexes, data distribution etc. is allowed at this stage (but any time taken should be included in the ‘set up time’)
§ Generate query scripts
‒ Can generate a ‘qualification script’ for a syntax check, i.e. all queries in sequence
‒ Then generate a series of scripts to be run as individual streams for the ‘throughput’ test
‒ The generation process will create query scripts in different sequences, with different selection criteria etc.; the scale factor determines the number of concurrent streams
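The stream generation step can be illustrated with a minimal Python sketch. It only reorders a list of query numbers for each stream; the TPC-DS kit's own generator also substitutes different selection criteria into each query, which is not modelled here. The function name `make_streams` and its `seed` parameter are assumptions made for illustration and are not part of the TPC-DS tooling.

```python
import random


def make_streams(query_ids, n_streams, seed=0):
    """Return one independently shuffled copy of query_ids per stream.

    Hypothetical helper: it mimics only the 'different sequences' aspect
    of stream generation, not the substitution of selection criteria.
    """
    rng = random.Random(seed)  # fixed seed keeps the orderings reproducible
    streams = []
    for _ in range(n_streams):
        order = list(query_ids)
        rng.shuffle(order)
        streams.append(order)
    return streams


# TPC-DS defines 99 query templates, numbered here 1 to 99.
streams = make_streams(range(1, 100), n_streams=5)
for number, order in enumerate(streams, start=1):
    print(f"stream {number}: starts with queries {order[:5]}")
```

Every stream contains the same queries, so the total work is identical and only the execution order differs between streams.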
TPC-DS overview – Data Model
§ 3 sales channels: Catalog, Web, Store
§ 7 fact tables: Catalog Sales, Catalog Returns, Web Sales, Web Returns, Store Sales, Store Returns, Inventory
§ 2 fact tables for each sales channel
§ 24 tables total
§ At scale factor 1000 (1TB):
‒ Store Sales: 2.87 billion rows
‒ Catalog Sales: 1.43 billion rows
‒ Web Sales: 720 million rows
‒ Inventory: 783 million rows
Source: ‘The making of TPC-DS’, VLDB conference 2006
TPC-DS overview – Data Model
[Diagram: the Store_Sales fact table with related dimension tables: Date_Dim, Time_Dim, Item, Customer, Customer_Address, Customer_Demographics, Household_Demographics, Income_Band, Store, Promotion]
Source: ‘The making of TPC-DS’, VLDB conference 2006
TPC-DS Overview – Data Model
§ Some data has “real world” content:
‒ Last names “Sanchez”, “Ward”, “Roberts”
‒ Addresses such as “630 Railroad, Woodbine, Sullivan County, MO-64253”
§ Data is skewed
‒ Sales are modeled after US census data
‒ More green items than red
‒ Small and large cities
§ Realistic table scaling
§ Non-uniform distributions → challenging for:
‒ statistics collection
‒ query optimizer
TPC-DS Overview – Data Model
[Chart: Distribution of Store Sales over Month; x-axis: Month (1 to 12); y-axis: Store Sales (0 to 600,000)]
§ Group 1: 14% of all sales happen between January and July
§ Group 2: 28% of all sales happen between August and October
§ Group 3: 58% of all sales happen in November and December
Source: ‘The making of TPC-DS’, VLDB conference 2006
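The seasonal skew can be made concrete with a short Python sketch that draws sale months using the three group shares quoted on the slide. Spreading each group's share evenly over its months is an assumption made here for simplicity, and the names `GROUP_SHARES`, `monthly_weights` and `sample_sale_months` are illustrative; none of this is the TPC-DS data generator itself.

```python
import random

# Share of annual store sales per month group, as quoted on the slide.
GROUP_SHARES = {
    (1, 2, 3, 4, 5, 6, 7): 14.0,  # January to July
    (8, 9, 10): 28.0,             # August to October
    (11, 12): 58.0,               # November and December
}


def monthly_weights(group_shares):
    """Spread each group's share evenly over its months (an assumption)."""
    weights = {}
    for months, share in group_shares.items():
        for month in months:
            weights[month] = share / len(months)
    return weights


def sample_sale_months(n, seed=0):
    """Draw n sale months with probability proportional to the weights."""
    weights = monthly_weights(GROUP_SHARES)
    months = sorted(weights)
    rng = random.Random(seed)
    return rng.choices(months, weights=[weights[m] for m in months], k=n)


weights = monthly_weights(GROUP_SHARES)
sales = sample_sale_months(100_000)
holiday_share = sum(1 for month in sales if month >= 11) / len(sales)
print(f"weight per month: Jan {weights[1]:.2f}%, Dec {weights[12]:.2f}%")
print(f"sampled share of sales in Nov and Dec: {holiday_share:.1%}")
```

Under this assumption a single December carries roughly as much weight as fourteen Januaries, which is the kind of non-uniformity that makes statistics collection and query optimisation harder.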
TPC-DS Overview – queries
§ Query language: SQL99 + OLAP analytical extensions
§ Query needs to be executed “as is”
‒ No hints or rewrites allowed, except when approved by TPC
§ 99 different query templates
§ 4 different query types:
‒ Ad-hoc (47 templates): simulates sporadic queries with minimal tuning; implemented via access to Store and Web sales channel tables
‒ Reporting (38 templates): simulates finely tuned reoccurring queries; implemented via access to Catalog sales channel tables
‒ Iterative (4 templates): simulates users issuing sequences of queries; implemented via a sequence of queries where each query adds SQL elements
‒ Data Mining (10 templates): simulates queries feeding data mining tools for further processing; implemented via queries that return a large number of rows
Source: ‘The making of TPC-DS’, VLDB conference 2006
Test platforms
The objective was to use similar 4 node configurations with a similar amount of RAM, but the actual configurations available were these:
Platform | CPU | RAM | Disk
Bigstep | 2 x Intel Xeon E5-2430 CPU, 6-core, 2.2GHz | 96GB | 1 x 750GB SAN (SSD)
AWS | hs1.8xlarge: 16 vCPUs (Intel Xeon) | 117GB | 24 x 2TB HDD
Exasol ‘bare metal’ | DELL PowerEdge R720: 2 x Intel Xeon E5-2680v2 CPU (10-core), 2.8GHz | 128GB | 8 x 1.2TB HDD
The EXASOL database was configured to use the same amount of RAM across all platforms (344GB)
TPC-DS test considerations
§ Some SQL implementations would be unable to run all 99 queries due to unsupported SQL functions, e.g.
‒ SQL 2008 analytical functions such as RANK
‒ INTERSECT, EXCEPT operators
‒ GROUPING and ROLLUP functions
‒ Some subquery syntax
§ Exasol is able to run all queries
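For readers unfamiliar with the constructs listed above, the following Python sketch runs RANK, INTERSECT and EXCEPT against an in-memory SQLite database. SQLite (version 3.25 or later, needed for window functions) is only a convenient stand-in here and says nothing about Exasol or any other product; the two tables borrow TPC-DS table names but use a simplified, hypothetical two-column layout, and the queries are not TPC-DS queries.

```python
import sqlite3

# SQLite is used purely as a stand-in engine; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store_sales (customer TEXT, amount INTEGER);
    CREATE TABLE web_sales   (customer TEXT, amount INTEGER);
    INSERT INTO store_sales VALUES ('Sanchez', 300), ('Ward', 200), ('Roberts', 100);
    INSERT INTO web_sales   VALUES ('Ward', 50), ('Roberts', 75);
""")

# Analytical (window) function: rank customers by store sales amount.
ranked = conn.execute("""
    SELECT customer, RANK() OVER (ORDER BY amount DESC) AS sales_rank
    FROM store_sales
    ORDER BY sales_rank
""").fetchall()

# INTERSECT: customers present in both sales channels.
both_channels = conn.execute("""
    SELECT customer FROM store_sales
    INTERSECT
    SELECT customer FROM web_sales
    ORDER BY customer
""").fetchall()

# EXCEPT: customers who bought in store but never on the web.
store_only = conn.execute("""
    SELECT customer FROM store_sales
    EXCEPT
    SELECT customer FROM web_sales
""").fetchall()

print(ranked)
print(both_channels)
print(store_only)
conn.close()
```

An engine that lacks any one of these constructs cannot execute the affected query templates "as is", which is why comparisons based on a partial query set need care.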
Test findings
§ This test was not intended to show the absolute maximum performance for the TPC-DS benchmark, but as a method of comparing performance across the various platforms
§ These results do NOT constitute a full TPC-DS benchmark as the complete test regime as specified by TPC has not been done
§ No vendor has yet posted a complete audited TPC-DS benchmark result set
§ HOWEVER, the same set of TPC-DS queries was run against the same 1TB data set across all platforms and so the timings can be compared
§ There is some variability between platforms used due to the configurations available
Test Results – Single stream – simple queries
These numbers are based on a subset of 17 queries from the total set of 99 TPC-DS queries. These are relatively simple short-running queries, and they are seen in several other published result sets.
[Chart: time in seconds (0 to 70) per query for Exasol bare metal, Exasol Bigstep and Exasol AWS; queries 3, 7, 34, 42, 43, 46, 52, 53, 55, 59, 63, 65, 68, 73, 79, 89 and 98]
Test Results – Single stream – medium complexity
These numbers are based on a subset of 14 queries from the total set of 99 TPC-DS queries. These are medium complexity queries which typically include some large joins.
[Chart: time in seconds (0 to 60) per query for Exasol bare metal, Exasol Bigstep and Exasol AWS; queries 37, 40, 43, 46, 59, 68, 72, 73, 75, 82, 85, 88, 93 and 99]
Test Results – Single stream – complex queries
These numbers are based on a subset of 15 queries from the total set of 99 TPC-DS queries. These are higher complexity queries, or queries that return larger result sets.
[Chart: time in seconds (0 to 500) per query for Exasol bare metal, Exasol Bigstep and Exasol AWS; queries 1, 9, 13, 16, 23b, 31, 35, 39a, 39b, 59, 64, 71, 78, 94 and 98]
Test Results – Concurrency – raw throughput
1) These numbers are based on the ‘simple subset’ of queries from the total set of 99 TPC-DS queries; these are relatively simple short-running queries
2) The numbers in the grid represent a ‘Queries per hour’ measure, based on the average query run time over the total number of queries run in the overall elapsed time for all queries in all streams to complete
[Chart: queries per hour (0 to 3,000) at 1, 2, 5 and 11 streams for Exasol on AWS, Exasol on Bigstep and Exasol bare metal]
N.B. platform configurations are different
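One plausible reading of the ‘queries per hour’ measure is the total number of queries completed across all streams, divided by the elapsed time for every stream to finish, scaled to an hour. The Python sketch below encodes that reading; the function name `queries_per_hour` is hypothetical, and the elapsed time in the example is an invented figure chosen for round numbers, not a measured result from these tests.

```python
def queries_per_hour(queries_per_stream, n_streams, elapsed_seconds):
    """Throughput as total completed queries per hour of elapsed time.

    Assumes every stream runs the same number of queries and that
    elapsed_seconds covers the period until the last stream finishes.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    total_queries = queries_per_stream * n_streams
    return total_queries * 3600.0 / elapsed_seconds


# Hypothetical example: 5 streams of the 17-query simple subset
# finishing after 306 seconds in total.
qph = queries_per_hour(queries_per_stream=17, n_streams=5, elapsed_seconds=306.0)
print(f"{qph:.1f} queries per hour")
```

Because the numerator grows with the number of streams, throughput rises with concurrency for as long as the elapsed time grows more slowly than the stream count.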
Test Results – Concurrency – Normalised throughput
1) This chart is the same as the previous one, but uses normalised numbers to nullify the difference in CPU performance
2) The better single stream performance for Bigstep is due to SSD disk I/O vs HDD
3) As the number of concurrent streams increases, the benefit of more cores becomes apparent
[Chart: normalised throughput (0 to 1,200) at 1, 2, 5 and 11 streams for Exasol on AWS, Exasol on Bigstep and Exasol bare metal]
Test Results – Price-performance comparison
1) These numbers are based on the ‘simple subset’ of queries from the total set of 99 TPC-DS queries; these are relatively simple short-running queries
2) The numbers in the grid represent a ‘price-performance’ measure, based on the 3 year TCO to achieve query throughput at various concurrency levels
[Chart: Price-Performance (£/QpH) @ 1TB SF; y-axis: £/QpH (0 to 800); x-axis: Concurrency (1, 2, 5 and 11 streams); series: EXASOL on Bare Metal, EXASOL on Bigstep, EXASOL on AWS]
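The £/QpH figure can be read as the 3 year TCO divided by the queries-per-hour throughput at a given concurrency level, so a lower value is better. The Python sketch below shows that arithmetic; the function name `price_per_qph` and both input figures are hypothetical and are not taken from the test results or from any actual pricing.

```python
def price_per_qph(tco_3yr_gbp, qph):
    """Price-performance as pounds of 3 year TCO per query-per-hour."""
    if qph <= 0:
        raise ValueError("qph must be positive")
    return tco_3yr_gbp / qph


# Hypothetical example: a platform costing 150,000 GBP over 3 years
# that sustains 1,000 queries per hour at some concurrency level.
cost = price_per_qph(tco_3yr_gbp=150_000.0, qph=1_000.0)
print(f"{cost:.0f} GBP per QpH")
```

Since TCO is fixed per platform while throughput changes with the number of streams, the same platform yields a different £/QpH value at each concurrency level.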
Bare Metal vs Cloud considerations
§ The choice between bare metal and cloud is not only based on performance (or even price/performance)
§ Bare metal
‒ Complete control of server specification and operating environment
‒ No requirement to move data outside the organisation
§ Cloud deployment
‒ Flexible resource provisioning from a single supplier
‒ Short-term workload requirements
‒ Support capabilities, e.g. speed to fix hardware problems
‒ Technology refresh: take advantage of new technology quickly and more easily
Summary
§ Cloud hosting is definitely viable for these types of analytical and reporting workloads, but for absolute maximum performance a ‘bare-metal’ approach is required
§ For more complex queries, higher performance and throughput, a specialised product such as Exasol is optimal
§ Bigstep’s Full Metal Cloud provides a way of achieving ‘bare metal’ performance with the flexibility of a cloud deployment
§ Be aware of the full scope of the TPC-DS benchmark when comparing products based on individual query timings
‒ Range of query types and syntax
‒ Scale factor
‒ Configuration used to run the test
EXASolo – Community Edition
§ Recently announced: free, fully featured EXASolution instance
‒ Single VM environment (no cluster support)
‒ Supports up to 10GB RAM licence (unlimited data volume)
§ To try it you will need:
‒ 64 bit Windows, Linux or MacOS with > 4GB RAM
‒ A virtual machine player: VirtualBox, VMWare Player, KVM
‒ EXASolo virtual machine image: download from the Exasol website (via User portal)
‒ Optionally, the EXAPlus SQL Client, or use your favourite ODBC/JDBC SQL client