Download - MariaDB ColumnStore Introduction and MariaDB Use Cases

Transcript
Page 1: MariaDB ColumnStore Introduction and MariaDB Use Cases

1

MariaDB  ColumnStore  and  Use  Cases  Maria  Luisa  Raviol,  Senior  Solu6on  Architect,  MariaDB  Corp.  Massimiliano  Pinto,  Senior  So?ware  Solu6ons  Engineer,  MariaDB  Corp      

Page 2: MariaDB ColumnStore Introduction and MariaDB Use Cases

2

We  should  be  talking  about  the  analy6cs  of  things,    not  the  internet  of  things.    

Jim  DavisCMO,  SAS  

“  

”  

Page 3: MariaDB ColumnStore Introduction and MariaDB Use Cases

3

Current  State  of  Analy5cs  

•  Tradi5onal  OLAP  

o  Cost  to  perform  

o  Appliances  or  Proprietary  Solu6ons    

•  Big-­‐Data  Analy5cs  

o  Scale  to  perform  

o  Non-­‐SQL  Interfaces  

•  Analy5cs  and  Transac5on  Separa5on  

Page 4: MariaDB ColumnStore Introduction and MariaDB Use Cases

4

Why  MariaDB  ColumnStore  

Price  to  Performance  at  Scale  

Data  Analy6cs  using  SQL  or  SPARK  

Unified  Simplicity  (  Transac6on  and  Analy6cs    under  the  same  Roof  )  

Open-­‐Source  GPL2  

SQL

Page 5: MariaDB ColumnStore Introduction and MariaDB Use Cases

5

Row-­‐Oriented  vs  Column-­‐Oriented      Row-oriented: rows stored sequentially in a file

Key   Fname   Lname   State   Zip   Phone   Age   Sales  1   Bugs   Bunny   NJ   11217   (123)  938-­‐3235   34   100  2   Yosemite   Sam   CT   95389   (234)  375-­‐6572   52   500  3   Daffy   Duck   IA   10013   (345)  227-­‐1810   35   200  4   Elmer   Fudd   CT   04578   (456)  882-­‐7323   43   10  5   Witch   Hazel   CT   01970   (567)  744-­‐0991   57   250  

Column-oriented: each column is stored in a separate file. Each column for a given row is at the same offset. Key  1  2  3  4  5  

Fname  Bugs  Yosemite  Daffy  Elmer  Witch  

Lname  Bunny  Sam  Duck  Fudd  Hazel  

State  NJ  CT  IA  CT  CT  

Zip  11217  95389  10013  04578  01970  

Phone  (123)  938-­‐3235  (234)  375-­‐6572  (345)  227-­‐1810  (456)  882-­‐7323  (567)  744-­‐0991  

Age  34  52  35  43  57  

Sales  100  500  200  10  250  

Page 6: MariaDB ColumnStore Introduction and MariaDB Use Cases

6

Why  Customers  Choose  MariaDB  ColumnStore    SCALE  ●  Massively  parallel  architecture  designed  for  big  data  scaling  to  process  petabytes  of  data  

●  Read  performance  scales  linearly  with  data  growth  

SPEED  ●  Excep6onal  performance    

●  Real-­‐6me  response  to  analy6cs  queries  and  High  speed  data  loading  

SECURITY  and  RELIABILITY  ●  Data  with  encryp6on  for  data  in  mo6on,  role  based  access  and  audit  features  of  

MariaDB  Enterprise  

●  Built-­‐in  high  availability  at  access  and  data  layers  

SIMPLICITY  with  POWER  ●  Simplified  management  and  maintenance,  Easy  installa6on  and  scaling    

●  Same  interface  as  MariaDB  and  MySQL,  Aeaches  to  wide  range  of  BI  tools  

Page 7: MariaDB ColumnStore Introduction and MariaDB Use Cases

Columnar  Distributed  Data  Storage  

MariaDB    SQL  Front  End  

Query  Engine  

User  Modules  

Performance    Modules    1   ... Performance  

Modules  N  Performance    Modules    2  

Performance    Modules    3  

Clients  

User  Connec5ons  

7

MariaDB  ColumnStore  Architecture    ▪  User  Module  :  Processes  SQL  Requests  ▪  Performance  Module  :  Distributed  Processing  Engine  

Page 8: MariaDB ColumnStore Introduction and MariaDB Use Cases

8

Data  Storage  -­‐  Extents  and  PMs      

Extent 1 Extent 2

Extent 3 Extent 4

Extent 5 Extent 6

Extent 7 Extent 8

PM 1 PM 2

Extent 1 Extent 2 Extent 3 Extent 4

Extent 5 Extent 6 Extent 7 Extent 8

PM 1 PM 2 PM 4 PM 3

●  Extent  Map  

○  In  memory  meta-­‐data  of  an  extent’s  min,  max  value  for  a  column,  extent’s  physical  block  offset  and  PM  on  which  the  extent  resides  

Page 9: MariaDB ColumnStore Introduction and MariaDB Use Cases

Data  Inges5on    ●  Bulk  data  loadHadoop  is  suitable  for    

○  cpimport  :  CSV  and  Binary  

○  LOAD  DATA  INFILE:  CSV    

●  Apache  Sqoop  Integra6on:  ○  Integra6on  with  cpimport  and  sql  interface  

●  Future  Release  ○  Data  Streaming  from  MariaDB/MySQL  database  to  MariaDB  ColumnStore  cluster  

•  via Kafka

•  Avro data record

Page 10: MariaDB ColumnStore Introduction and MariaDB Use Cases

Data  Inges5on  -­‐  Bulk  Data  Load    ●  cpimport  

○  Fastest  way  to  load  data    •  Load data from CSV file •  Load data from Standard Input •  Load data from Binary Source file

○  Mul6ple  tables  in  can  be  loaded  in  parallel  by  launching  mul6ple  jobs  ○  Read  queries  con6nue  without  being  blocked  ○  Successful  cpimport  is  auto-­‐commieed  ○  In  case  of  errors,  en6re  load  is  rolled  back  

●  LOAD  DATA  INFILE  ○  Tradi6onal  way  of  impor6ng  data  into  any  MariaDB  storage  engine  table  ○  Up  to  2  6mes  slower  than  cpimport  for  large  size  imports    ○  Either  success  or  error  opera6on  can  be  rolled  back  

 

Page 11: MariaDB ColumnStore Introduction and MariaDB Use Cases

Analy5cs   In  database  analy6cs  with  complex  joins,  windowing  func6ons  and  UDFs  Out  of  box  BI  Tools  connec6vity,  Analy6cs  integra6on  with  R  

Scale   •  Columnar,  Massively  Parallel    •  Linear  scalability    

Performance   •  High  performance  adhoc  analysis  •  Consistent  query  response  6me  

High  Availability   Built  in  redundancy  and  high  availability  

Ease  of  Use   •  ANSI  SQL  compa6ble    •  ACID  compliant  •  No  indexes,  No  materialized  views  •  No  manual  par66oning  

Data  Inges5on   CONNECT  Engine  Create  Table  as  Select  High  speed  parallel  data  load  and  extract  

Security   SSL  support,  Audit  Plugin,  Authen6ca6on  Plugin,  Role  Based  Access  

Deployment  Op5ons   On  premise,  AWS,  Hadoop  11

MariaDB  ColumnStore  1.0    

Page 12: MariaDB ColumnStore Introduction and MariaDB Use Cases

•  Harvest  new  value    from  large  historical  datasets  by  deriving  new  insights  •  Support  growth  in  your  business,  while  con6nue  to  deliver  high  service  levels  

for    data  analy6cs    

Rows/DataSize  Scope  

1        100          10,000          1,000,000        100,000,000          10,000,000,000        100,000,000,000  10-­‐100GB      100-­‐1000GB    1-­‐10TB    10-­‐100TB...PB  

           

MariaDB  Enterprise  OLTP  

           

MariaDB  Enterprise  Enterprise  OLAP  

Use  Case:  Scaling  Big  Data  Analy5cs    

12

Page 13: MariaDB ColumnStore Introduction and MariaDB Use Cases

●  Improved  DBA  produc6vity  

●  Familiar  SQL  interfaces  democra6zes  access  to  big  data  to  larger  user  base    

●  Reduced  opera6onal  complexity  

●  Gekng  most  value  out  of  big  data  while  minimizing  DBA  Opex  cost  

Use  Case:  Simplifying  Big  Data  Management    

Page 14: MariaDB ColumnStore Introduction and MariaDB Use Cases

14

Use  Case:  Simplifying  Big  Data  Management    

●  MariaDB  ColumnStore  

●  Libera6on  from  Index    management  

●  Automa6c  par66oning  

●  Easy  to  grow  

●  Micro-­‐batch  bulkload  for  real-­‐6me  data-­‐flow  

Business  Challenge   MariaDB  Solu6on  Complexity  of  data  management  increases  as  data  volume  grows  

●  Tedious  to  keep  up  with  indexes  and  par66oning    as  data  grow  

●  Scaling-­‐out  or  Scaling  up  management  

●  Moving    opera6onal  data  to    big  data  analy6cs  plamorm  in  real-­‐6me  

PM  Node  

cpimport

Source Source Source

UM  Node  

PM  Node  

PM  Node  

Page 15: MariaDB ColumnStore Introduction and MariaDB Use Cases

15

Use  Case:  Scaling  Big  Data  Analy5cs  

●  An  organiza6on  is  genera6ng  large  amount  of  opera6onal  data      

●  Mul6ple  tera-­‐bytes  of  historical  data    

●  With  growth  in  business  and  in  opera6onal  data      

○  Analy6cs  query  performance  degrades      

○  Imprac6cal  to  do  analy6cs      

●  Put  past  data  into  MariaDB  ColumnStore  

●  As  data  grows      

   

●  Perform  analy6cs  without  performance  degrada6on    

●  Linear  Scalability  with  data  growth  

Business  Challenge   MariaDB  Solu6on  

1   2   3  

MariaDB ColumnStore 1.0

Add new node(s)

Page 16: MariaDB ColumnStore Introduction and MariaDB Use Cases

●  Uncover  new  business  opportunity  with  data  explora6on  and  analy6cs  on  petabyte  data  volumes  

●  Generate  real-­‐6me  insights  to  inform  and  enhance  live  customer  interac6ons    

Use  Case:  Discover  Insight    

Page 17: MariaDB ColumnStore Introduction and MariaDB Use Cases

Use  Case:  Discover  Insight  

Challenges  

●  Need  to  analyze  real-­‐6me  and  historical  flight  parameter  data  

●  Too  6me-­‐consuming  to  perform  analy6cs  with  current  toolset  

●  Most  data  analyst  have  SQL  background  

Objec5ves:      ●  Maintain  flight  safety  -­‐  accurately  

predict  part  replacement  t  ●  Provide  high  service  levels  and  

minimize  cost  -­‐    proac6vely  plan  equipment  maintenance  and  re6rement      

Global  Commercial  Avia5on  Manufacturer    

Historical  DATA  Real-­‐6me  in-­‐flight  performance  data  

•  Complex-­‐join,  aggrega6on  and  windowing  func6ons    

•  High  speed  real-­‐6me  performance  

Micro-­‐batch  upload  real-­‐6me  flight  performance  into  MariaDB  ColumnStore    

Analy6cs  DATA  Scien6st  

Familiar  SQL  Interface  

The  company  plans  to  sell  this  solu5on  as  a  service  to  commercial  airliners  

Timely  maintenance  forecast,  part  replacement,  

flight  re5rement  

Page 18: MariaDB ColumnStore Introduction and MariaDB Use Cases

●  Familiar  SQL  interfaces  democra6zes  access  to  big  data  to  larger  user  base  

●  Aeach  wide  range  of  BI  tools  via  MariaDB/MySQL  connectors  

●  Gekng  most  value  out  of  big  data  while  minimizing  Opex  cost  

●  Leverage  Hadoop  deployments  

Use  Case:  Accelerated  Analy5cs  with  SQL  &  SPARK      

Page 19: MariaDB ColumnStore Introduction and MariaDB Use Cases

19

Use  Case:  Accelerated  Analy5cs  with  Hadoop    

●  MariaDB  ColumnStore  OLAP  can  run  on  premise,  on  cloud  or  on  Hadoop  cluster  

●  Ingest  data  from  Hadoop    

●  Mature  ANSI-­‐SQL  compliance  

●  Stellar  performance  :  70  to  80  6mes  faster  than  SQL-­‐on-­‐Hadoop  counterparts  Hive,  Hbase  and  Impala  

●  Mature  interfaces    

Business  Challenge   MariaDB  Solu6on  ●  Large    amount  of  data  in    Hadoop  

●  Hadoop  is  suitable  for    

○  batch  processing  

○  Transforms  via  Map-­‐Reduce  programming  

●  Real-­‐6me  analy6cs  on  Hadoop  

○  Speed  cannot  meet  business  requirement  with  the  Hadoop  tool  set  

●  Shortage  of  Hadoop  skills  for  Data  Scien6st/BA      

○  SQL  interfaces  on  Hadoop  Tools  are  not  mature                                                              

Map  Reduce  HBase   MariaDB  ColumnStore  

Hadoop  Distributed  File  System  

Pig/Hive  

Batch  Processing   High  Performance  analy6cs  

Page 20: MariaDB ColumnStore Introduction and MariaDB Use Cases

20

MariaDB  to  Hadoop  Replica5on  –  Coming  Soon  MariaDB  MaxScale  Binlog-­‐Avro  translator  

●  AVRO  files:  Object  Container  File  consists  of:  •  A  File  Header  

•  4 bytes, ASCII 'O', 'b', 'j', followed by 1. •  File metadata, including the schema. •  The 16-byte, randomly-generated sync marker for this file.

•  One or more file data blocks •  A long indicating the count of objects in this block. •  A long indicating the size in bytes of the serialized objects in the current

block. •  The file's 16-byte sync marker.

Note: each AVRO file contains data related to ONE table only,

Page 21: MariaDB ColumnStore Introduction and MariaDB Use Cases

21

MariaDB  to  Hadoop  Replica5on  –  Coming  Soon  

Master  

Slaves  

Binlog to Avro

Amazon EMR

Amazon RedShift

MaxScale  

Binary log events

Avro or JSON events

MariaDB MaxScale Binlog-Avro translator ○  Replicate binlog events from MariaDB to Kafka Producer ○  Kafka consumers to ingest data into Hadoop or any other custom

data warehouse or application  

Page 22: MariaDB ColumnStore Introduction and MariaDB Use Cases

22

Booking.com:  a  MaxScale  solu6on  Based  in  Amsterdam  since  1996    •  150  offices  worldwide  

•   +590.000  proper6es  in  212  countries  •   42  languages  (website  and  customer  service)    

•  >3000  servers,  ~90%  replica6ng,  around  100  master,  10  to  >  50  slaves,  4  have  >  100  slaves    

Problem?  •  With  so  many  slaves  it’s  easy  to  saturate  the  network  interface  of  the  master  

Solu6on?    •  MariaDB  MaxScale  Binlog  Server,  that  is  a  daemon  that:  !  Downloads binary logs from the master

•  Saves them in the same structure as the master

•  Serves the binary logs to slaves

 

Page 23: MariaDB ColumnStore Introduction and MariaDB Use Cases

23

Booking.com:  a  MaxScale  solu6on  

 

Slaves  

Binlog  Cache  

Master  

MaxScale  MaxScale  

Slaves  

Binlog  Cache  

MariaDB  MaxScale  Binlog  Server:  !  Horizontal  scaling  of  slaves  

without  master  overload  !  Crash  safe  disaster  recovery  !  Master  switch/fail  over  without  

reconfiguring  any  slave  

Page 24: MariaDB ColumnStore Introduction and MariaDB Use Cases

24

MariaDB  ColumnStore  Roadmap    

First  release  •  MariaDB  ColumnStore  (Por6ng  of  InfiniDB  on  MariaDB  10.1)  •  Amazon  EBS  support  •  Create  Table  Like/As  Select    

Future  Releases  •  Spark  Integra6on  •  Data  Streaming  integra6on  with  MaxScale  •  Na6ve  API  for  columnar  file  •  Join  and  Filter  performance  op6miza6on  •  ROLLUP,  CUBE  in  MariaDB  ColumnStore  •  AS  OF  implementa6on    in  MariaDB  Server  •  CONNECT  Engine  support  in  MariaDB  Server  •  SQL  Editor  (OSS  or  3rd  party  partner)      

Subscrip6on  offering      

Page 25: MariaDB ColumnStore Introduction and MariaDB Use Cases

25

●  BETA  release  in  May  2016.    

●  Sign  up  for  no6fica6on  of  BETA  availability  today  

●  Product  Page  heps://mariadb.com/products/mariadb-­‐columnstore  

Learn  more  about  MariaDB  ColumnStore  

Page 26: MariaDB ColumnStore Introduction and MariaDB Use Cases

26

Q&A  

Page 27: MariaDB ColumnStore Introduction and MariaDB Use Cases

27  

Thank  You  Maria  Luisa  Raviol,  Senior  Sales  Engineer  

[email protected]    

Massimiliano  Pinto,  Senior  So?ware  Solu6ons  Engineer  [email protected]