Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

© 2014 EXASOL AG. Platform performance comparisons, bare metal and cloud hosting alternatives. Dave Shuttleworth, Principal Consultant, EXASOL UK

description

Choosing the right database technology and deployment platform can have a major impact on performance and total cost of ownership in production environments. Using the industry TPC-DS benchmark, Dave will present findings from a performance and TCO comparison of EXASOL on dedicated servers, Bigstep bare metal cloud and AWS. The presentation will be a tutorial on performance benchmarking.

Transcript of Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

Page 1: Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

Platform performance comparisons, bare metal and cloud hosting alternatives

Dave Shuttleworth, Principal Consultant, EXASOL UK

Page 2: Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

Agenda

§ Who is Exasol?
§ Benchmark Background and Objectives
§ Why TPC-DS?
§ TPC-DS approach
§ Test platforms
§ Test Results
§ Summary

Page 3: Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

Exasol company snapshot

§ Founded in 2000 in Nuremberg, Germany
‒ Based on university research begun in the 1990s at Friedrich Alexander University (Erlangen) and the University of Jena
§ Employees today: >60
§ 70+ customers / 150+ installations / 300+ OEM customers
§ Offices: Brazil, Germany, Israel, UK and US
§ Core product offering:
‒ EXASolution in-memory RDBMS
‒ EXAPowerlytics analytics platform
§ Key industries: Digital Media, Retail, Financial Services, Healthcare

Page 4: Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

EXASOL bet on a columnar, in-memory, massively parallel architecture 15 years ago

The first columnar, in-memory, MPP database

Core design principles
§ Speed
§ Smart
§ Simplicity

Culture
§ R&D driven culture
§ We deliver on our promises
§ Open and straightforward to deal with

Page 5: Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

70+ customers in 11 countries (plus 300+ OEM customers)

Page 6: Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

We are the benchmark leader

TPC-H is the industry standard benchmark for analytical databases

[Bar chart: QphH@1000GB, axis from 1,000,000 to 4,000,000]

4,253,937
Microsoft 134,117
Oracle 201,487
Oracle 209,533
Microsoft 219,887
Sybase IQ 258,474
Oracle 326,454
Vectorwise 445,529
Microsoft 519,976

Result dates shown on the chart: Oct '11, April '14, June '12, Feb '14, Dec '13, Aug '11, Sept '11, Oct '11, Dec '11

Source: www.tpc.org / May 26, 2014

Page 7: Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

Unseen scalability

[Chart: TPC-H results at 100GB, 300GB, 1,000GB, 3,000GB and 10,000GB]

§ #1 TPC-H - dwarfing our followers
§ The bigger the data, the greater the advantage
§ 30TB/100TB results coming soon!

Page 8: Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

Background & Objectives

§ We wanted to understand the relative performance of Exasol on cloud deployments vs more conventional 'bare metal' installations
§ Collaboration with Bigstep gave us the opportunity to include high performance 'bare metal' cloud alongside AWS
§ We decided to use TPC-DS as the benchmark test
‒ Independently specified
‒ Most recent benchmark (and most difficult!)
‒ Already experienced with TPC-H
‒ TPC-DS sample results are already appearing for new products

Page 9: Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

Why use TPC-DS?

§ TPC general characteristics
‒ Broad industry representation (all decisions taken by the TPC board)
‒ Verifiable (audit process)
‒ Domain specific standard tests
‒ Cross-vendor comparisons (performance, TCO)
‒ Use to evaluate new technologies
‒ Eliminate costly in-house benchmark development
§ TPC-DS
‒ Realistic and understandable data model
‒ Complex workload
‒ Large query set
‒ ETL-like update model
‒ Simple and comprehensible metrics
‒ Already some (restricted) test results released for new products

Page 10: Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

TPC-DS approach

§ Utilities are provided to generate the raw data sets and queries
§ Dial in the scale factor (i.e. overall database size) – we used scale factor 1000 (1TB)
§ Generate raw data (more of which later)
§ Load data using the fastest available method
‒ For a fully audited TPC-DS benchmark the load time is taken into consideration
‒ Tuning via indexes, data distribution etc. is allowed at this stage (but any time taken should be included in the 'set up time')
§ Generate query scripts
‒ Can generate a 'qualification script' for a syntax check, i.e. all queries in sequence
‒ Then generate a series of scripts to be run as individual streams for the 'throughput' test
‒ The generation process will create query scripts in different sequences, with different selection criteria etc. – the scale factor determines the number of concurrent streams
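The throughput step on this slide can be sketched roughly as below. This is a minimal illustration, not the TPC-supplied tooling: the function names are hypothetical, `run_query` stands in for whatever call submits a query to the database, and the sketch only varies the query order per stream, whereas the real generator also varies the selection criteria.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def build_streams(query_ids, n_streams, seed=0):
    """Give every stream the same queries, each in its own shuffled order."""
    rng = random.Random(seed)
    streams = []
    for _ in range(n_streams):
        order = list(query_ids)
        rng.shuffle(order)
        streams.append(order)
    return streams


def run_stream(stream, run_query):
    """Run one stream sequentially and record per-query elapsed seconds."""
    timings = []
    for query_id in stream:
        start = time.perf_counter()
        run_query(query_id)  # hypothetical callable that executes one query
        timings.append((query_id, time.perf_counter() - start))
    return timings


def run_throughput_test(query_ids, n_streams, run_query, seed=0):
    """Run all streams concurrently; return (elapsed_seconds, per-stream timings)."""
    streams = build_streams(query_ids, n_streams, seed)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_streams) as pool:
        results = list(pool.map(lambda s: run_stream(s, run_query), streams))
    return time.perf_counter() - start, results
```

A single-stream run is simply the same call with `n_streams=1`; the overall elapsed time returned here is the figure a throughput measure would be based on.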

Page 11: Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

TPC-DS overview – Data Model

Fact tables: Catalog Returns, Catalog Sales, Inventory, Web Returns, Web Sales, Store Returns, Store Sales

§ 3 sales channels: Catalog - Web - Store
§ 7 fact tables
§ 2 fact tables for each sales channel
§ 24 tables total
§ At scale factor 1000 (1TB):
‒ Store Sales – 2.87 billion rows
‒ Catalog Sales – 1.43 billion rows
‒ Web Sales – 720 million rows
‒ Inventory – 783 million rows

Source: 'The making of TPC-DS' – VLDB conference 2006

Page 12: Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

TPC-DS overview – Data Model

[Diagram: the Store_Sales fact table surrounded by the dimension tables Date_Dim, Time_Dim, Item, Store, Promotion, Customer, Customer_Address, Customer_Demographics, Household_Demographics and Income_Band]

Source: 'The making of TPC-DS' – VLDB conference 2006

Page 13: Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

TPC-DS Overview – Data Model

§ Some data has "real world" content:
‒ Last name "Sanchez", "Ward", "Roberts"
‒ Addresses "630 Railroad, Woodbine, Sullivan County, MO-64253"
§ Data is skewed
‒ Sales are modeled after US census data
‒ More green items than red
‒ Small and large cities
§ Realistic table scaling
§ Non-uniform distributions → challenging for:
‒ statistics collection
‒ query optimizer

Page 14: Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

TPC-DS Overview – Data Model

[Chart: Distribution of Store Sales over Month. x-axis: Month (1 to 12); y-axis: Store Sales (0 to 600,000); legend: Group 1, Group 2, Group 3]

§ 14% of all sales happen between January and July
§ 28% of all sales happen between August and October
§ 58% of all sales happen in November and December

Source: 'The making of TPC-DS' – VLDB conference 2006
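To make the skew concrete, here is a small sketch that samples sale months using the three shares quoted on the slide. It assumes an even split between the months inside each group, which the slide does not state, and the function names are illustrative rather than part of the TPC-DS data generator.

```python
import random
from collections import Counter

# Share of yearly store sales per month group, as quoted on the slide.
GROUP_SHARES = {
    (1, 2, 3, 4, 5, 6, 7): 0.14,  # January to July
    (8, 9, 10): 0.28,             # August to October
    (11, 12): 0.58,               # November and December
}


def month_weights(group_shares=GROUP_SHARES):
    """Per-month probability, ASSUMING an even split inside each group."""
    weights = {}
    for months, share in group_shares.items():
        for month in months:
            weights[month] = share / len(months)
    return weights


def sample_sale_months(n, seed=0):
    """Draw n sale months from the skewed distribution."""
    rng = random.Random(seed)
    weights = month_weights()
    months = sorted(weights)
    return rng.choices(months, weights=[weights[m] for m in months], k=n)


if __name__ == "__main__":
    counts = Counter(sample_sale_months(100_000))
    for month in range(1, 13):
        print(month, counts[month])
```

Under this assumption each of November and December carries 29% of sales while each month from January to July carries 2%, which is the kind of non-uniformity that makes statistics collection and query optimisation harder.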

Page 15: Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

TPC-DS Overview – queries

§ Query language: SQL99 + OLAP analytical extensions
§ Queries need to be executed "as is"
§ No hints or rewrites allowed, except when approved by TPC
§ 99 different query templates
§ 4 different query types:
‒ Reporting (38 templates): simulates finely tuned reoccurring queries; implemented via access to the catalog sales channel tables
‒ Ad-hoc (47 templates): simulates sporadic queries with minimal tuning; implemented via access to the store and web sales channel tables
‒ Iterative (4 templates): simulates users issuing sequences of queries; implemented via a sequence of queries where each query adds SQL elements
‒ Data Mining (10 templates): simulates queries feeding data mining tools for further processing; implemented via queries that return a large number of rows

Source: 'The making of TPC-DS' – VLDB conference 2006

Page 16: Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

Test platforms

The objective was to use similar 4 node configurations with a similar amount of RAM – but the actual configurations available were these:

Platform | CPU | RAM | Disk
Bigstep | 2 x Intel Xeon E5-2430 CPU 6-core 2.2GHz | 96GB | 1 x 750GB SAN (SSD)
AWS | hs1.8xlarge: 16 vCPUs (Intel Xeon) | 117GB | 24 x 2TB HDD
Exasol 'bare metal' | DELL PowerEdge R720: 2 x Intel Xeon E5-2680v2 CPU (10-core) 2.8GHz | 128GB | 8 x 1.2TB HDD

The EXASOL database was configured to use the same amount of RAM across all platforms (344GB)

Page 17: Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

TPC-DS test considerations

§ Some SQL implementations would be unable to run all 99 queries due to unsupported SQL functions, e.g.
‒ SQL 2008 analytical functions such as RANK
‒ INTERSECT, EXCEPT operators
‒ GROUPING and ROLLUP functions
‒ Some subquery syntax
§ Exasol is able to run all queries
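One way to check a candidate database for the constructs listed above is to run a small probe statement per feature and record which ones fail. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in engine so that it is self-contained; the probe statements are illustrative and are not actual TPC-DS queries, and a real check would use the target database's own driver and exception type.

```python
import sqlite3

# Illustrative probes only; these are NOT TPC-DS queries.
PROBES = {
    "RANK": "SELECT a, RANK() OVER (ORDER BY a) FROM t",
    "INTERSECT": "SELECT a FROM t INTERSECT SELECT a FROM t",
    "EXCEPT": "SELECT a FROM t EXCEPT SELECT a FROM t",
    "ROLLUP": "SELECT a, COUNT(*) FROM t GROUP BY ROLLUP (a)",
    "correlated subquery": (
        "SELECT a FROM t t1 "
        "WHERE a > (SELECT AVG(a) FROM t t2 WHERE t2.a <= t1.a)"
    ),
}


def make_demo_connection():
    """In-memory stand-in database with one tiny table named t."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
    return conn


def probe_features(conn, probes=PROBES):
    """Return {feature: bool}, True when the probe statement runs cleanly."""
    supported = {}
    for feature, sql in probes.items():
        try:
            conn.execute(sql).fetchall()
            supported[feature] = True
        except sqlite3.Error:
            supported[feature] = False
    return supported


if __name__ == "__main__":
    for feature, ok in probe_features(make_demo_connection()).items():
        print(f"{feature}: {'supported' if ok else 'not supported'}")
```

Running the same probes against each product under comparison gives a quick view of how many of the 99 query templates are likely to need rewriting.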

Page 18: Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

Test findings

§ This test was not intended to show the absolute maximum performance for the TPC-DS benchmark, but to provide a method of comparing performance across the various platforms
§ These results do NOT constitute a full TPC-DS benchmark, as the complete test regime specified by TPC has not been carried out
§ No vendor has yet posted a complete audited TPC-DS benchmark result set
§ HOWEVER – the same set of TPC-DS queries was run against the same 1TB data set across all platforms, so the timings can be compared
§ There is some variability between the platforms used due to the configurations available

Page 19: Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

Test Results – Single stream – simple queries

These numbers are based on a subset of 17 queries from the total set of 99 TPC-DS queries. These are relatively simple short-running queries, and they are seen in several other published result sets.

[Bar chart: time in seconds (axis 0.0 to 70.0) per query for Exasol bare metal, Exasol Bigstep and Exasol AWS. Queries shown: query34, query3, query42, query43, query46, query52, query53, query55, query59, query63, query65, query68, query73, query79, query7, query89, query98]

Page 20: Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

Test Results – Single stream – medium complexity

These numbers are based on a subset of 14 queries from the total set of 99 TPC-DS queries. These are medium complexity queries which typically include some large joins.

[Bar chart: time in seconds (axis 0.0 to 60.0) per query for Exasol bare metal, Exasol Bigstep and Exasol AWS. Queries shown: query37, query40, query43, query46, query59, query68, query72, query73, query75, query82, query85, query88, query93, query99]

Page 21: Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

Test Results – Single stream – complex queries

These numbers are based on a subset of 15 queries from the total set of 99 TPC-DS queries. These are higher complexity queries, or queries that return larger result sets.

[Bar chart: time in seconds (axis 0 to 500) per query for Exasol bare metal, Exasol Bigstep and Exasol AWS. Queries shown: query1, query9, query13, query16, query23b, query31, query35, query39a, query39b, query59, query64, query71, query78, query94, query98]

Page 22: Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

Test Results – Concurrency – raw throughput

1) These numbers are based on the 'simple subset' of queries from the total set of 99 TPC-DS queries. These are relatively simple short-running queries.
2) The numbers in the grid represent a 'Queries per hour' measure, based on the average query run time over the total number of queries run, within the overall elapsed time for all queries in all streams to complete.

[Bar chart: queries per hour (axis 0.0 to 3,000.0) at 1, 2, 5 and 11 streams for Exasol on AWS, Exasol on Bigstep and Exasol bare metal]

N.B. – platform configurations are different
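As a worked illustration of the 'Queries per hour' measure, the sketch below assumes it is computed as the total number of queries completed across all streams, divided by the overall elapsed time and scaled to one hour. That reading is an interpretation of the slide wording, and the figures in the example are made up for illustration, not taken from the test results.

```python
def queries_per_hour(total_queries, elapsed_seconds):
    """Queries completed per hour of wall-clock time.

    total_queries   -- queries completed across ALL streams
    elapsed_seconds -- time until the last stream finished
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return total_queries * 3600.0 / elapsed_seconds


if __name__ == "__main__":
    # Hypothetical run: 5 streams of 17 queries each, finished in 300 seconds.
    streams, queries_per_stream, elapsed = 5, 17, 300.0
    print(queries_per_hour(streams * queries_per_stream, elapsed))
```

Because the elapsed time is measured until the slowest stream completes, this figure reflects sustained throughput under concurrency rather than the speed of any single query.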

Page 23: Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

Test Results – Concurrency – Normalised throughput

1) This chart is the same as the previous one, but uses normalised numbers to nullify the difference in CPU performance
2) The better single stream performance for Bigstep is due to SSD disk I/O vs HDD
3) As the number of concurrent streams increases, the benefit of more cores becomes apparent

[Bar chart: normalised throughput (axis 0 to 1200) at 1, 2, 5 and 11 streams for Exasol on AWS, Exasol on Bigstep and Exasol bare metal]

Page 24: Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

Test Results – Price-performance comparison

1) These numbers are based on the 'simple subset' of queries from the total set of 99 TPC-DS queries. These are relatively simple short-running queries.
2) The numbers in the grid represent a 'price-performance' measure, based on the 3 year TCO to achieve query throughput at various concurrency levels.

[Chart: Price-Performance (£/QpH) @ 1TB SF. y-axis: £/QpH (0 to 800); x-axis: Concurrency (1, 2, 5 and 11 streams); series: EXASOL on Bare Metal, EXASOL on Bigstep, EXASOL on AWS]
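The £/QpH figure can be read as the 3 year TCO divided by the queries-per-hour throughput achieved at a given concurrency level, so lower is better. The sketch below assumes that simple ratio; the platform labels and all figures in the example are hypothetical and are not the values behind the chart.

```python
def price_performance(tco_3yr_gbp, queries_per_hour):
    """Pounds of 3 year TCO per unit of queries-per-hour throughput."""
    if queries_per_hour <= 0:
        raise ValueError("queries_per_hour must be positive")
    return tco_3yr_gbp / queries_per_hour


if __name__ == "__main__":
    # Hypothetical inputs: (3 year TCO in pounds, QpH at one concurrency level).
    platforms = {
        "platform A": (300_000.0, 1_500.0),
        "platform B": (200_000.0, 800.0),
    }
    for name, (tco, qph) in platforms.items():
        print(f"{name}: {price_performance(tco, qph):.1f} GBP per QpH")
```

Computing the ratio separately for each concurrency level shows how a platform's cost effectiveness changes as more streams are added.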

Page 25: Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

Bare Metal vs Cloud considerations

§ The choice between bare metal and cloud is not based only on performance (or even price/performance)
§ Bare metal
‒ Complete control of server specification and operating environment
‒ No requirement to move data outside the organisation
§ Cloud deployment
‒ Flexible resource provisioning from a single supplier
‒ Short-term workload requirements
‒ Support capabilities, e.g. speed to fix hardware problems
‒ Technology refresh: take advantage of new technology quickly and more easily

Page 26: Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

Summary

§ Cloud hosting is definitely viable for these types of analytical and reporting workloads, but for absolute maximum performance a 'bare-metal' approach is required
§ For more complex queries, higher performance and throughput, a specialised product such as Exasol is optimal
§ Bigstep's Full Metal Cloud provides a way of achieving 'bare metal' performance with the flexibility of a cloud deployment
§ Be aware of the full scope of the TPC-DS benchmark when comparing products based on individual query timings
‒ Range of query types and syntax
‒ Scale factor
‒ Configuration used to run the test

Page 27: Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

EXASolo – Community Edition

§ Recently announced – free, fully featured EXASolution instance
‒ Single VM environment (no cluster support)
‒ Supports up to 10GB RAM licence (unlimited data volume)
§ To try it you will need:
‒ 64 bit Windows, Linux or MacOS with > 4GB RAM
‒ A virtual machine player: VirtualBox, VMWare Player, KVM
‒ EXASolo virtual machine image: download from the Exasol website (via User portal)
‒ Optionally – EXAPlus SQL client, or use your favourite ODBC/JDBC SQL client

Page 28: Dave Shuttleworth - Platform performance comparisons, bare metal and cloud hosting alternatives

Questions?

My contact details:
Email: [email protected]
Twitter: @EXA_DaveS