MariaDB ColumnStore Introduction and MariaDB Use Cases

  • Date posted: 17-Jan-2017

  • Category: Software

Transcript of MariaDB ColumnStore Introduction and MariaDB Use Cases

  • 1

    MariaDB ColumnStore and Use Cases

    Maria Luisa Raviol, Senior Solution Architect, MariaDB Corp.

    Massimiliano Pinto, Senior Software Solutions Engineer, MariaDB Corp.

  • 2

    We should be talking about the analytics of things, not the internet of things.

    Jim Davis, CMO, SAS

  • 3

    Current State of Analytics

    Traditional OLAP

    o Cost to perform

    o Appliances or Proprietary Solutions

    Big-Data Analytics

    o Scale to perform

    o Non-SQL Interfaces

    Analytics and Transaction Separation

  • 4

    Why MariaDB ColumnStore

    Price to Performance at Scale

    Data Analytics using SQL or Spark

    Unified Simplicity (Transaction and Analytics under the same roof)

    Open-Source GPL2

    SQL

  • 5

    Row-Oriented vs Column-Oriented

    Row-oriented: rows stored sequentially in a file

    Key  Fname     Lname  State  Zip    Phone           Age  Sales
    1    Bugs      Bunny  NJ     11217  (123) 938-3235  34   100
    2    Yosemite  Sam    CT     95389  (234) 375-6572  52   500
    3    Daffy     Duck   IA     10013  (345) 227-1810  35   200
    4    Elmer     Fudd   CT     04578  (456) 882-7323  43   10
    5    Witch     Hazel  CT     01970  (567) 744-0991  57   250

    Column-oriented: each column is stored in a separate file. Each column for a given row is at the same offset.

    Key    1               2               3               4               5
    Fname  Bugs            Yosemite        Daffy           Elmer           Witch
    Lname  Bunny           Sam             Duck            Fudd            Hazel
    State  NJ              CT              IA              CT              CT
    Zip    11217           95389           10013           04578           01970
    Phone  (123) 938-3235  (234) 375-6572  (345) 227-1810  (456) 882-7323  (567) 744-0991
    Age    34              52              35              43              57
    Sales  100             500             200             10              250
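
    A minimal Python sketch of the two layouts, using the rows from this slide. The function names are illustrative only and the in-memory lists stand in for files; this is not how ColumnStore itself stores data. It shows why an aggregate over one or two columns touches less data in the column-oriented layout.

```python
COLUMNS = ("Key", "Fname", "Lname", "State", "Zip", "Phone", "Age", "Sales")

# Row-oriented layout: each tuple is one complete row, stored one after another.
ROWS = [
    (1, "Bugs", "Bunny", "NJ", "11217", "(123) 938-3235", 34, 100),
    (2, "Yosemite", "Sam", "CT", "95389", "(234) 375-6572", 52, 500),
    (3, "Daffy", "Duck", "IA", "10013", "(345) 227-1810", 35, 200),
    (4, "Elmer", "Fudd", "CT", "04578", "(456) 882-7323", 43, 10),
    (5, "Witch", "Hazel", "CT", "01970", "(567) 744-0991", 57, 250),
]


def to_columns(rows):
    """Pivot rows into one list per column; index i in every list is row i."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def sales_for_state_rows(rows, state):
    """Row-oriented scan: every full row is visited to use two of its fields."""
    s, v = COLUMNS.index("State"), COLUMNS.index("Sales")
    return sum(row[v] for row in rows if row[s] == state)


def sales_for_state_columns(columns, state):
    """Column-oriented scan: only the State and Sales columns are read."""
    pairs = zip(columns["State"], columns["Sales"])
    return sum(sale for st, sale in pairs if st == state)


COLUMN_STORE = to_columns(ROWS)
```

    Both functions return the same total for a given state; the column-oriented version never reads the other six columns.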

  • 6

    Why Customers Choose MariaDB ColumnStore

    SCALE

    Massively parallel architecture designed for big data, scaling to process petabytes of data

    Read performance scales linearly with data growth

    SPEED

    Exceptional performance

    Real-time response to analytics queries and high-speed data loading

    SECURITY and RELIABILITY

    Encryption for data in motion, role-based access and the audit features of MariaDB Enterprise

    Built-in high availability at the access and data layers

    SIMPLICITY with POWER

    Simplified management and maintenance, easy installation and scaling

    Same interface as MariaDB and MySQL, attaches to a wide range of BI tools

  • 7

    MariaDB ColumnStore Architecture

    User Module: processes SQL requests

    Performance Module: distributed processing engine

    [Diagram labels: Clients, User Connections, User Modules (MariaDB SQL Front End, Query Engine), Performance Modules 1 ... N, Columnar Distributed Data Storage]

  • 8

    Data Storage - Extents and PMs

    [Diagram: Extents 1 to 8 spread across PM 1 and PM 2, and the same eight extents spread across PM 1 to PM 4]

    Extent Map

    In-memory metadata recording, for each extent, the min and max value for a column, the extent's physical block offset, and the PM on which the extent resides
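
    A rough Python sketch of how per-extent min/max metadata can be used to skip extents for a range predicate. The field names, value ranges, block offsets and the round-robin PM assignment are assumptions made for illustration; they are not taken from the ColumnStore implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtentEntry:
    extent_id: int
    min_value: int      # smallest column value stored in the extent
    max_value: int      # largest column value stored in the extent
    block_offset: int   # physical block offset (hypothetical numbers below)
    pm: int             # Performance Module holding the extent


def build_sample_map(num_extents=8, num_pms=2, values_per_extent=1000):
    """Hypothetical extent map: consecutive value ranges, PMs assigned round-robin."""
    return [
        ExtentEntry(
            extent_id=i + 1,
            min_value=i * values_per_extent,
            max_value=(i + 1) * values_per_extent - 1,
            block_offset=i * 8192,
            pm=1 + i % num_pms,
        )
        for i in range(num_extents)
    ]


def extents_to_scan(extent_map, low, high):
    """Keep only extents whose [min, max] range overlaps the predicate [low, high]."""
    return [e for e in extent_map if e.max_value >= low and e.min_value <= high]


EXTENT_MAP = build_sample_map()
# A predicate such as "col BETWEEN 2500 AND 3200" only needs two of the eight extents.
CANDIDATES = extents_to_scan(EXTENT_MAP, 2500, 3200)
```

    Because the check uses only the in-memory map, the remaining extents are never read from disk in this sketch.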

  • Data Ingestion

    Bulk data load

    cpimport: CSV and binary

    LOAD DATA INFILE: CSV

    Apache Sqoop integration: integration with cpimport and the SQL interface

    Future release

    Data streaming from a MariaDB/MySQL database to a MariaDB ColumnStore cluster via Kafka

    Avro data record

  • Data Ingestion - Bulk Data Load

    cpimport

    o Fastest way to load data

    o Load data from a CSV file

    o Load data from standard input

    o Load data from a binary source file

    o Multiple tables can be loaded in parallel by launching multiple jobs

    o Read queries continue without being blocked

    o A successful cpimport is auto-committed

    o In case of errors, the entire load is rolled back

    LOAD DATA INFILE

    o Traditional way of importing data into any MariaDB storage engine table

    o Up to 2 times slower than cpimport for large imports

    o The operation can be rolled back whether it succeeds or fails
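
    A small Python sketch of the two load paths, assuming the commonly documented `cpimport [-s delimiter] dbName tblName [loadFile]` invocation and the standard `LOAD DATA INFILE ... FIELDS TERMINATED BY` statement. The database, table and file names are hypothetical, and the helpers only build the command or statement; they do not run anything.

```python
def build_cpimport_cmd(database, table, load_file=None, delimiter=","):
    """Build an argv list for cpimport.

    With load_file=None the command expects the data on standard input,
    which matches the "load data from standard input" mode on the slide.
    """
    cmd = ["cpimport", "-s", delimiter, database, table]
    if load_file is not None:
        cmd.append(load_file)
    return cmd


def load_data_infile_sql(table, load_file, delimiter=","):
    """Build the equivalent LOAD DATA INFILE statement for a CSV file."""
    return (
        f"LOAD DATA INFILE '{load_file}' INTO TABLE {table} "
        f"FIELDS TERMINATED BY '{delimiter}'"
    )


# Hypothetical names, for illustration only.
FILE_CMD = build_cpimport_cmd("sales_db", "orders", "/tmp/orders.csv")
STDIN_CMD = build_cpimport_cmd("sales_db", "orders", delimiter="|")
SQL = load_data_infile_sql("orders", "/tmp/orders.csv")
```

    The argv list can be handed to `subprocess.run(cmd, check=True)` on a node where cpimport is installed; launching several such jobs for different tables corresponds to the parallel loading mentioned above.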

  • 11

    MariaDB ColumnStore 1.0

    Analytics: in-database analytics with complex joins, windowing functions and UDFs; out-of-the-box BI tools connectivity; analytics integration with R

    Scale: columnar, massively parallel, linear scalability

    Performance: high-performance ad hoc analysis, consistent query response time

    High Availability: built-in redundancy and high availability

    Ease of Use: ANSI SQL compatible, ACID compliant; no indexes, no materialized views, no manual partitioning

    Data Ingestion: CONNECT Engine, Create Table as Select, high-speed parallel data load and extract

    Security: SSL support, Audit Plugin, Authentication Plugin, Role Based Access

    Deployment Options: on premise, AWS, Hadoop

  • 12

    Use Case: Scaling Big Data Analytics

    Harvest new value from large historical datasets by deriving new insights

    Support growth in your business while continuing to deliver high service levels for data analytics

    [Chart: Rows / Data Size scope. Rows: 1, 100, 10,000, 1,000,000, 100,000,000, 10,000,000,000, 100,000,000,000. Data size: 10-100 GB, 100-1000 GB, 1-10 TB, 10-100 TB ... PB. Regions: MariaDB Enterprise OLTP, MariaDB Enterprise OLAP]

  • Use Case: Simplifying Big Data Management

    Improved DBA productivity

    Familiar SQL interfaces democratize access to big data for a larger user base

    Reduced operational complexity

    Getting the most value out of big data while minimizing DBA Opex cost

  • 14

    Use Case: Simplifying Big Data Management

    Business Challenge

    o Complexity of data management increases as data volume grows

    o Tedious to keep up with indexes and partitioning as data grows

    o Scaling-out or scaling-up management

    o Moving operational data to a big data analytics platform in real time

    MariaDB Solution: MariaDB ColumnStore

    o Liberation from index management

    o Automatic partitioning

    o Easy to grow

    o Micro-batch bulk load for real-time data flow

    [Diagram labels: three Sources, cpimport, one UM Node, three PM Nodes]
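
    One way to picture the micro-batch bulk load is a loop that groups incoming records into small CSV payloads, each of which would be handed to a bulk loader such as cpimport on standard input. The sketch below only does the batching and CSV encoding; the batch size, record shape and function names are assumptions for illustration.

```python
import csv
import io
from itertools import islice


def micro_batches(records, batch_size):
    """Yield lists of at most batch_size records from any iterable source."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def to_csv_payload(batch):
    """Encode one batch as CSV text, one record per line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(batch)
    return buffer.getvalue()


# Hypothetical operational records: (id, status).
SOURCE = [(1, "new"), (2, "paid"), (3, "new"), (4, "shipped"), (5, "paid")]
PAYLOADS = [to_csv_payload(batch) for batch in micro_batches(SOURCE, 2)]
```

    Smaller batches shorten the delay before data becomes queryable, at the cost of more load jobs; the right size depends on the workload.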

  • 15

    Use Case: Scaling Big Data Analytics

    Business Challenge

    o An organization is generating a large amount of operational data

    o Multiple terabytes of historical data

    o With growth in business and in operational data

    o Analytics query performance degrades

    o Impractical to do analytics

    MariaDB Solution

    o Put past data into MariaDB ColumnStore

    o As data grows

    o Perform analytics without performance degradation

    o Linear scalability with data growth

    [Diagram labels: steps 1, 2, 3; MariaDB ColumnStore 1.0; Add new node(s)]

  • Use Case: Discover Insight

    Uncover new business opportunities with data exploration and analytics on petabyte data volumes

    Generate real-time insights to inform and enhance live customer interactions

  • Use Case: Discover Insight

    Global Commercial Aviation Manufacturer

    Challenges

    o Need to analyze real-time and historical flight parameter data

    o Too time-consuming to perform analytics with the current toolset

    o Most data analysts have a SQL background

    Objectives

    o Maintain flight safety: accurately predict part replacement

    o Provide high service levels and minimize cost: proactively plan equipment maintenance and retirement

    Historical data and real-time in-flight performance data

    Complex joins, aggregation and windowing functions

    High-speed real-time performance

    Micro-batch upload of real-time flight performance data into MariaDB ColumnStore

    Analytics data scientist

    Familiar SQL interface

    Timely maintenance forecast, part replacement, flight retirement

    The company plans to sell this solution as a service to commercial airliners

  • Use Case: Accelerated Analytics with SQL & Spark

    Familiar SQL interfaces democratize access to big data for a larger user base

    Attach a wide range of BI tools via MariaDB/MySQL connectors

    Getting the most value out of big data while minimizing Opex cost

    Leverage Hadoop deployments

  • 19

    Use Case: Accelerated Analytics with Hadoop

    Business Challenge

    o Large amount of data in Hadoop

    o Hadoop is suitable for batch processing

    o Transforms via Map-Reduce programming

    o Real-time analytics on Hadoop

    o Speed cannot meet business requirements with the Hadoop tool set

    o Shortage of Hadoop skills among Data Scientists/BAs

    o SQL interfaces on Hadoop tools are not mature

    MariaDB Solution

    o MariaDB ColumnStore OLAP can run on premise, in the cloud or on a Hadoop cluster

    o Ingest data from Hadoop

    o Mature ANSI SQL compliance

    o Stellar performance: 70 to 80 times faster than SQL-on-Hadoop counterparts Hive, HBase and Impala

    o Mature interfaces

    [Diagram labels: Pig/Hive, Map Reduce, HBase, MariaDB ColumnStore, Hadoop Distributed File System; Batch Processing, High Performance Analytics]

  • 20

    MariaDB to Hadoop Replication (Coming Soon)

    MariaDB MaxScale Binlog-Avro translator

    Avro files: an Object Container File consists of:

    A file header

    o 4 bytes: ASCII 'O', 'b', 'j', followed by the byte 1

    o File metadata, including the schema

    o The 16-byte, randomly generated sync marker for this file

    One or more file data blocks

    o A long indicating the count of objects in this block

    o A long indicating the size in bytes of the serialized objects in the current block

    o The serialized objects

    o The file's 16-byte sync marker

    Note: each Avro file contains data related to ONE table only.
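
    The container layout listed above can be sketched in Python as a tiny reader. This is an illustrative sketch, not the MaxScale translator or the official Avro library: it handles only the uncompressed case, and the sample record name `t1` is hypothetical. Longs use Avro's zig-zag variable-length encoding, and the metadata is an Avro map of string keys to byte values.

```python
import io
import json

MAGIC = b"Obj\x01"   # ASCII 'O', 'b', 'j' followed by the byte 1
SYNC_SIZE = 16


def encode_long(n):
    """Zig-zag then base-128 varint encode a 64-bit signed integer."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def read_long(stream):
    """Decode one zig-zag varint long from a binary stream."""
    shift, value = 0, 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise EOFError("unexpected end of data inside a long")
        value |= (raw[0] & 0x7F) << shift
        if not raw[0] & 0x80:
            return (value >> 1) ^ -(value & 1)
        shift += 7


def read_bytes(stream):
    """Read a length-prefixed byte string."""
    return stream.read(read_long(stream))


def read_header(stream):
    """Return (metadata dict, sync marker) from the file header."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:          # a zero count ends the metadata map
            break
        if count < 0:           # negative count is followed by a byte size
            read_long(stream)
            count = -count
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    return meta, stream.read(SYNC_SIZE)


def read_block(stream, sync):
    """Return (object count, raw serialized objects) for one data block."""
    count, size = read_long(stream), read_long(stream)
    payload = stream.read(size)
    if stream.read(SYNC_SIZE) != sync:
        raise ValueError("sync marker mismatch")
    return count, payload


def build_sample_file(schema, count, payload, sync):
    """Assemble a minimal container file in memory for demonstration."""
    meta = {"avro.schema": json.dumps(schema).encode("utf-8"), "avro.codec": b"null"}
    out = bytearray(MAGIC) + encode_long(len(meta))
    for key, value in meta.items():
        raw_key = key.encode("utf-8")
        out += encode_long(len(raw_key)) + raw_key + encode_long(len(value)) + value
    out += encode_long(0) + sync
    out += encode_long(count) + encode_long(len(payload)) + payload + sync
    return bytes(out)


# Hypothetical single-table schema with one long column and two records (1 and 2).
SCHEMA = {"type": "record", "name": "t1", "fields": [{"name": "id", "type": "long"}]}
SYNC = bytes(range(16))
SAMPLE = build_sample_file(SCHEMA, 2, encode_long(1) + encode_long(2), SYNC)
```

    Reading the sample back with `read_header` and then `read_block` recovers the schema, the sync marker and the two serialized records.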

  • 21

    MariaDB to Hadoop Replication (Coming Soon)

    MariaDB MaxScale Binlog-Avro translator

    o Replicate binlog events from MariaDB to a Kafka producer

    o Kafka consumers ingest data into Hadoop or any other custom data warehouse or application

    [Diagram labels: Master, Slaves, MaxScale (Binlog to Avro), Binary log events, Avro or JSON events, Amazon EMR, Amazon Redshift]

  • 22

    Booking.com: a MaxScale solution

    Based in Amsterdam since 1996

    150 offices worldwide

    590,000+ properties in 212 countries

    42 languages (website and customer service)

    >3000 servers, ~90% replicating, around 100 masters, 10 to >50 slaves, 4 have >100 slaves

    Problem? With so many slaves it's easy to saturate the network interface of the master

    Solution? MariaDB MaxScale Binlog Server, a daemon that:

    o Downloads binary logs from the master

    o Saves them in the same structure as the master

    o Serves the binary logs to slaves

  • 23

    Booking.com: a MaxScale solution

    MariaDB MaxScale Binlog Server:

    o Horizontal scaling of slaves without master overload

    o Crash-safe disaster recovery

    o Master switch/failover without reconfiguring any slave

    [Diagram labels: Master, two MaxScale instances each with a Binlog Cache, Slaves]

  • 24

    MariaDB ColumnStore Roadmap

    First release

    o MariaDB ColumnStore (porting of InfiniDB to MariaDB 10.1)

    o Amazon EBS support

    o Create Table Like/As Select

    Future releases

    o Spark integration

    o Data streaming integration with MaxScale

    o Native API for columnar file

    o Join and filter performance optimization

    o ROLLUP, CUBE in MariaDB ColumnStore

    o AS OF implementation in MariaDB Server

    o CONNECT Engine support in MariaDB Server

    o SQL Editor (OSS or 3rd party partner)

    Subscription offering

  • 25

    Learn more about MariaDB ColumnStore

    BETA release in May 2016.

    Sign up for notification of BETA availability today

    Product page: https://mariadb.com/products/mariadb-columnstore

  • 26

    Q&A

  • 27

    Thank You

    Maria Luisa Raviol, Senior Sales Engineer

    [email protected]

    Massimiliano Pinto, Senior Software Solutions Engineer

    [email protected]