MariaDB ColumnStore Introduction and MariaDB Use Cases
-
1
MariaDB ColumnStore and Use Cases
Maria Luisa Raviol, Senior Solution Architect, MariaDB Corp.
Massimiliano Pinto, Senior Software Solutions Engineer, MariaDB Corp.
-
2
"We should be talking about the analytics of things, not the internet of things."
Jim Davis, CMO, SAS
-
3
Current State of Analytics
Traditional OLAP
o Cost to perform
o Appliances or Proprietary Solutions
Big-Data Analytics
o Scale to perform
o Non-SQL Interfaces
Analytics and Transaction Separation
-
4
Why MariaDB ColumnStore
Price to Performance at Scale
Data Analytics using SQL or Spark
Unified Simplicity (Transactions and Analytics under the same roof)
Open-Source GPL2
SQL
-
5
Row-Oriented vs Column-Oriented
Row-oriented: rows are stored sequentially in a file.
Key  Fname     Lname  State  Zip    Phone           Age  Sales
1    Bugs      Bunny  NJ     11217  (123) 938-3235  34   100
2    Yosemite  Sam    CT     95389  (234) 375-6572  52   500
3    Daffy     Duck   IA     10013  (345) 227-1810  35   200
4    Elmer     Fudd   CT     04578  (456) 882-7323  43   10
5    Witch     Hazel  CT     01970  (567) 744-0991  57   250
Column-oriented: each column is stored in a separate file. The values of a given row sit at the same offset in every column file.
Key:   1, 2, 3, 4, 5
Fname: Bugs, Yosemite, Daffy, Elmer, Witch
Lname: Bunny, Sam, Duck, Fudd, Hazel
State: NJ, CT, IA, CT, CT
Zip:   11217, 95389, 10013, 04578, 01970
Phone: (123) 938-3235, (234) 375-6572, (345) 227-1810, (456) 882-7323, (567) 744-0991
Age:   34, 52, 35, 43, 57
Sales: 100, 500, 200, 10, 250
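The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration using the sample rows from the slide; the function names are hypothetical and in-memory lists stand in for the per-column files, so it says nothing about how ColumnStore stores data on disk.

```python
# Sample data from the slide; one tuple per row.
COLUMNS = ["Key", "Fname", "Lname", "State", "Zip", "Phone", "Age", "Sales"]
ROWS = [
    (1, "Bugs", "Bunny", "NJ", "11217", "(123) 938-3235", 34, 100),
    (2, "Yosemite", "Sam", "CT", "95389", "(234) 375-6572", 52, 500),
    (3, "Daffy", "Duck", "IA", "10013", "(345) 227-1810", 35, 200),
    (4, "Elmer", "Fudd", "CT", "04578", "(456) 882-7323", 43, 10),
    (5, "Witch", "Hazel", "CT", "01970", "(567) 744-0991", 57, 250),
]


def to_columns(rows):
    """Pivot row tuples into one list per column (a stand-in for one file per column)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def total_sales_row_store(rows, state):
    """Row layout: every full row is visited although only State and Sales are needed."""
    state_idx, sales_idx = COLUMNS.index("State"), COLUMNS.index("Sales")
    return sum(row[sales_idx] for row in rows if row[state_idx] == state)


def total_sales_column_store(columns, state):
    """Column layout: only two columns are read; a row's values share the same offset."""
    pairs = zip(columns["State"], columns["Sales"])
    return sum(sales for st, sales in pairs if st == state)


columns = to_columns(ROWS)
print(total_sales_row_store(ROWS, "CT"), total_sales_column_store(columns, "CT"))
```

Both functions return the same total; the column version simply never touches Fname, Phone, and the other columns the query does not need, which is the motivation for columnar storage in analytics.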
-
6
Why Customers Choose MariaDB ColumnStore
SCALE
o Massively parallel architecture designed for big data, scaling to process petabytes of data
o Read performance scales linearly with data growth
SPEED
o Exceptional performance
o Real-time response to analytics queries and high-speed data loading
SECURITY and RELIABILITY
o Encryption for data in motion, role-based access, and the audit features of MariaDB Enterprise
o Built-in high availability at the access and data layers
SIMPLICITY with POWER
o Simplified management and maintenance, easy installation and scaling
o Same interface as MariaDB and MySQL; attaches to a wide range of BI tools
-
7
MariaDB ColumnStore Architecture
o User Module: processes SQL requests
o Performance Module: distributed processing engine
Diagram labels: Clients; User Connections; User Modules (MariaDB SQL Front End, Query Engine); Performance Modules 1, 2, 3 ... N; Columnar Distributed Data Storage
-
8
Data Storage - Extents and PMs
Diagram: Extents 1 to 8 shown first across PM 1 and PM 2, then across PM 1, PM 2, PM 3 and PM 4
Extent Map
In-memory metadata that records, for each extent, the minimum and maximum value of a column, the extent's physical block offset, and the PM on which the extent resides
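One plausible use of this metadata is to skip extents whose min/max range cannot match a query filter. The sketch below is a hypothetical illustration of that idea only: the `Extent` class, its field names, and the sample values are assumptions for the example, not ColumnStore's actual data structures.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """One extent map entry (hypothetical field names)."""
    extent_id: int
    pm: int            # Performance Module on which the extent resides
    min_value: int     # minimum value of the column within the extent
    max_value: int     # maximum value of the column within the extent
    block_offset: int  # physical block offset of the extent


def extents_to_scan(extent_map, low, high):
    """Return the extents whose [min, max] range overlaps the filter range [low, high]."""
    return [e for e in extent_map if e.max_value >= low and e.min_value <= high]


# Made-up sample values for illustration.
EXTENT_MAP = [
    Extent(1, pm=1, min_value=0, max_value=99, block_offset=0),
    Extent(2, pm=2, min_value=100, max_value=199, block_offset=8192),
    Extent(3, pm=1, min_value=200, max_value=299, block_offset=16384),
    Extent(4, pm=2, min_value=300, max_value=399, block_offset=24576),
]

# A filter such as "value BETWEEN 150 AND 250" only needs extents 2 and 3.
hits = extents_to_scan(EXTENT_MAP, 150, 250)
print([(e.extent_id, e.pm) for e in hits])
```

Because each entry also names the PM holding the extent, the same lookup indicates which Performance Modules would need to take part in the scan.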
-
Data Ingestion
Bulk data load
o cpimport: CSV and binary
o LOAD DATA INFILE: CSV
Apache Sqoop integration: integration with cpimport and the SQL interface
Future release
o Data streaming from a MariaDB/MySQL database to a MariaDB ColumnStore cluster via Kafka
o Avro data records
-
Data Ingestion - Bulk Data Load
cpimport
o Fastest way to load data
o Load data from a CSV file
o Load data from standard input
o Load data from a binary source file
o Multiple tables can be loaded in parallel by launching multiple jobs
o Read queries continue without being blocked
o A successful cpimport is auto-committed
o In case of errors, the entire load is rolled back
LOAD DATA INFILE
o Traditional way of importing data into any MariaDB storage engine table
o Up to 2 times slower than cpimport for large imports
o Whether it succeeds or fails, the operation can be rolled back
-
11
MariaDB ColumnStore 1.0
Analytics
o In-database analytics with complex joins, windowing functions and UDFs
o Out-of-the-box BI tools connectivity, analytics integration with R
Scale
o Columnar, massively parallel
o Linear scalability
Performance
o High-performance ad hoc analysis
o Consistent query response time
High Availability
o Built-in redundancy and high availability
Ease of Use
o ANSI SQL compatible
o ACID compliant
o No indexes, no materialized views
o No manual partitioning
Data Ingestion
o CONNECT Engine
o Create Table as Select
o High-speed parallel data load and extract
Security
o SSL support, Audit Plugin, Authentication Plugin, Role-Based Access
Deployment Options
o On premises, AWS, Hadoop
-
12
Use Case: Scaling Big Data Analytics
o Harvest new value from large historical datasets by deriving new insights
o Support growth in your business while continuing to deliver high service levels for data analytics
Chart (Rows / Data Size Scope): rows from 1 to 100,000,000,000; data sizes 10-100 GB, 100-1000 GB, 1-10 TB, 10-100 TB ... PB; regions labelled MariaDB Enterprise OLTP and MariaDB Enterprise OLAP
-
Use Case: Simplifying Big Data Management
o Improved DBA productivity
o Familiar SQL interfaces democratize access to big data for a larger user base
o Reduced operational complexity
o Getting the most value out of big data while minimizing DBA Opex cost
-
14
Use Case: Simplifying Big Data Management
Business Challenge
o Complexity of data management increases as data volume grows
o Tedious to keep up with indexes and partitioning as data grows
o Managing scale-out or scale-up
o Moving operational data to a big data analytics platform in real time
MariaDB Solution: MariaDB ColumnStore
o Liberation from index management
o Automatic partitioning
o Easy to grow
o Micro-batch bulk load for real-time data flow
Diagram labels: Source (x3); cpimport; UM Node; PM Node (x3)
-
15
Use Case: Scaling Big Data Analytics
Business Challenge
o An organization is generating a large amount of operational data
o Multiple terabytes of historical data
o With growth in the business and in operational data, analytics query performance degrades
o Impractical to do analytics
MariaDB Solution
o Put past data into MariaDB ColumnStore
o As data grows, add new node(s)
o Perform analytics without performance degradation
o Linear scalability with data growth
Diagram labels: MariaDB ColumnStore 1.0; steps 1, 2, 3
-
Use Case: Discover Insight
o Uncover new business opportunities with data exploration and analytics on petabyte data volumes
o Generate real-time insights to inform and enhance live customer interactions
-
Use Case: Discover Insight
Global Commercial Aviation Manufacturer
Challenges
o Need to analyze real-time and historical flight parameter data
o Too time-consuming to perform analytics with the current toolset
o Most data analysts have an SQL background
Objectives
o Maintain flight safety: accurately predict part replacement
o Provide high service levels and minimize cost: proactively plan equipment maintenance and retirement
Solution
o Micro-batch upload of real-time flight performance data into MariaDB ColumnStore
o Complex joins, aggregation and windowing functions
o High-speed real-time performance
o Familiar SQL interface
o Timely maintenance forecast, part replacement, flight retirement
Diagram labels: Historical data; Real-time in-flight performance data; Analytics; Data Scientist
The company plans to sell this solution as a service to commercial airlines
-
Use Case: Accelerated Analytics with SQL & Spark
o Familiar SQL interfaces democratize access to big data for a larger user base
o Attach a wide range of BI tools via MariaDB/MySQL connectors
o Getting the most value out of big data while minimizing Opex cost
o Leverage Hadoop deployments
-
19
Use Case: Accelerated Analytics with Hadoop
Business Challenge
o Large amount of data in Hadoop
o Hadoop is suitable for batch processing
o Transforms via MapReduce programming
o Real-time analytics on Hadoop: speed cannot meet business requirements with the Hadoop tool set
o Shortage of Hadoop skills among data scientists and business analysts
o SQL interfaces on Hadoop tools are not mature
MariaDB Solution
o MariaDB ColumnStore OLAP can run on premises, in the cloud or on a Hadoop cluster
o Ingest data from Hadoop
o Mature ANSI SQL compliance
o Stellar performance: 70 to 80 times faster than SQL-on-Hadoop counterparts Hive, HBase and Impala
o Mature interfaces
Diagram labels: Hadoop Distributed File System; MapReduce; HBase; Pig/Hive (batch processing); MariaDB ColumnStore (high-performance analytics)
-
20
MariaDB to Hadoop Replication (Coming Soon)
MariaDB MaxScale Binlog-Avro translator
Avro files: an Object Container File consists of:
A file header
o 4 bytes: ASCII 'O', 'b', 'j', followed by the byte 1
o File metadata, including the schema
o The 16-byte, randomly generated sync marker for this file
One or more file data blocks, each containing
o A long indicating the count of objects in this block
o A long indicating the size in bytes of the serialized objects in the current block
o The serialized objects
o The file's 16-byte sync marker
Note: each Avro file contains data related to ONE table only.
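The header layout described above can be sketched with the Python standard library alone. This is a minimal illustration, assuming the standard Avro binary encoding (zigzag variable-length longs, metadata stored as a map of string keys to byte values under keys such as `avro.schema`); the helper names are hypothetical, and this is not the MaxScale translator's code.

```python
import io
import os

MAGIC = b"Obj\x01"  # ASCII 'O', 'b', 'j', followed by the byte 1


def write_long(value):
    """Encode an Avro long: zigzag, then base-128 variable-length bytes."""
    zz = (value << 1) ^ (value >> 63)
    out = bytearray()
    while zz & ~0x7F:
        out.append((zz & 0x7F) | 0x80)
        zz >>= 7
    out.append(zz)
    return bytes(out)


def read_long(stream):
    """Decode one Avro long from a binary stream."""
    shift, result = 0, 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of data")
        result |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)


def build_header(meta, sync):
    """Build a header: magic, metadata map, 16-byte sync marker."""
    out = bytearray(MAGIC)
    if meta:
        out += write_long(len(meta))
        for key, value in meta.items():
            k = key.encode("utf-8")
            out += write_long(len(k)) + k + write_long(len(value)) + value
    out += write_long(0)  # a zero count terminates the map
    return bytes(out) + sync


def read_header(stream):
    """Parse a header and return (metadata dict, sync marker)."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:
            break
        if count < 0:  # a negative count is followed by the block size in bytes
            read_long(stream)
            count = -count
        for _ in range(count):
            key = stream.read(read_long(stream)).decode("utf-8")
            meta[key] = stream.read(read_long(stream))
    sync = stream.read(16)
    if len(sync) != 16:
        raise ValueError("truncated sync marker")
    return meta, sync


# Round trip with a made-up schema for a hypothetical table "t1".
schema = b'{"type":"record","name":"t1","fields":[]}'
marker = os.urandom(16)
meta, sync = read_header(io.BytesIO(build_header({"avro.schema": schema}, marker)))
print(meta["avro.schema"].decode("utf-8"))
```

The sync marker read from the header is the same 16 bytes that close every data block, which lets a reader verify block boundaries or resynchronize after seeking into the file.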
-
21
MariaDB to Hadoop Replication (Coming Soon)
MariaDB MaxScale Binlog-Avro translator
o Replicate binlog events from MariaDB to a Kafka producer
o Kafka consumers ingest data into Hadoop or any other custom data warehouse or application
Diagram labels: Master; Slaves; Binary log events; MaxScale (Binlog to Avro); Avro or JSON events; Amazon EMR; Amazon Redshift
-
22
Booking.com: a MaxScale solution
o Based in Amsterdam since 1996
o 150 offices worldwide
o 590,000+ properties in 212 countries
o 42 languages (website and customer service)
o >3000 servers, ~90% replicating: around 100 masters, with 10 to >50 slaves each; 4 have >100 slaves
Problem? With so many slaves it's easy to saturate the network interface of the master
Solution? MariaDB MaxScale Binlog Server, a daemon that:
o Downloads binary logs from the master
o Saves them in the same structure as the master
o Serves the binary logs to slaves
-
23
Booking.com: a MaxScale solution
MariaDB MaxScale Binlog Server:
o Horizontal scaling of slaves without master overload
o Crash-safe disaster recovery
o Master switch/failover without reconfiguring any slave
Diagram labels: Master; MaxScale (x2), each with a Binlog Cache; Slaves
-
24
MariaDB ColumnStore Roadmap
First release
o MariaDB ColumnStore (porting of InfiniDB onto MariaDB 10.1)
o Amazon EBS support
o Create Table Like/As Select
Future releases
o Spark integration
o Data streaming integration with MaxScale
o Native API for columnar files
o Join and filter performance optimization
o ROLLUP, CUBE in MariaDB ColumnStore
o AS OF implementation in MariaDB Server
o CONNECT Engine support in MariaDB Server
o SQL editor (OSS or 3rd-party partner)
Subscription offering
-
25
Learn more about MariaDB ColumnStore
o BETA release in May 2016.
o Sign up for notification of BETA availability today
o Product page: https://mariadb.com/products/mariadb-columnstore
-
26
Q&A
-
27
Thank You
Maria Luisa Raviol, Senior Sales Engineer
Massimiliano Pinto, Senior Software Solutions Engineer