UNDERSTANDING THE ARCHITECTURE OF MARIADB COLUMNSTORE
Maria Luisa Raviol
Senior Sales Engineer
MariaDB Corporation
Hybrid workloads
Database workloads
Transactional:
● Current data
● Range queries
● Known queries
Analytical:
● Historical data
● Aggregate queries
● Unknown queries
[Figure: performance range across analytical (AX) and transactional (TX) workloads for a Database (OLTP) and a Data warehouse (OLAP); annotations: "More data", "More customers"]
Database workloads
Transactional:
● Current data
● Range queries
● Known queries
● Row-based storage, indexes, clustered/replicated
Analytical:
● Historical data
● Aggregate queries
● Unknown queries
● Columnar storage, no indexes, distributed
Existing Approaches
Data Warehouses:
● Limited real-time analytics
● Slow releases of product innovation
● Expensive hardware and software
Hadoop / NoSQL (hard to use):
● LIMITED SQL SUPPORT
● DIFFICULT TO INSTALL/MANAGE
● LIMITED TALENT POOL
● DATA LAKE WITH NO DATA MANAGEMENT
Application development vs. BI/reporting + data science
Application (eCommerce):
● Transactional, on the Database (OLTP): "Show me all new products in the science fiction category"
● Analytical, on the Data warehouse (OLAP): "Show me the top products added to shopping carts or purchased today, and with low inventory."
● Actionable insight: "I should buy one now because everyone wants one, and they'll be sold out by the end of the day!"
Hybrid workloads: the problem
● App/dev runs transactions on a Database (OLTP)
● BI/reporting + data science run analytics on a separate Data warehouse (OLAP)
Hybrid workloads: the solution
● App/dev (transactions) and BI/reporting + data science (analytics) run on a single Database (Hybrid)
MARIADB COLUMNSTORE
Row-oriented vs. Column-oriented format
● Row oriented:
○ Rows stored sequentially in a file
○ Scans through every record row by row
● Column oriented:
○ Each column is stored in a separate file
○ Scans only the relevant columns
Row-oriented layout:
ID | Fname | Lname | State | Zip | Phone | Age | Sex
1 | Bugs | Bunny | NY | 11217 | (718) 938-3235 | 34 | M
2 | Yosemite | Sam | CA | 95389 | (209) 375-6572 | 52 | M
3 | Daffy | Duck | NY | 10013 | (212) 227-1810 | 35 | M
4 | Elmer | Fudd | ME | 04578 | (207) 882-7323 | 43 | M
5 | Witch | Hazel | MA | 01970 | (978) 744-0991 | 57 | F
Column-oriented layout (one file per column):
ID: 1 | 2 | 3 | 4 | 5
Fname: Bugs | Yosemite | Daffy | Elmer | Witch
Lname: Bunny | Sam | Duck | Fudd | Hazel
State: NY | CA | NY | ME | MA
Zip: 11217 | 95389 | 10013 | 04578 | 01970
Phone: (718) 938-3235 | (209) 375-6572 | (212) 227-1810 | (207) 882-7323 | (978) 744-0991
Age: 34 | 52 | 35 | 43 | 57
Sex: M | M | M | M | F
SELECT Fname FROM People WHERE State = 'NY'
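To make the difference concrete, here is a toy sketch of how much data each layout reads for the query above. It assumes plain Python lists stand in for files and counts values read rather than disk blocks; the names `row_scan` and `column_scan` are illustrative only, not ColumnStore internals.

```python
# Toy model of the People table; Python lists stand in for files on disk.
HEADER = ("ID", "Fname", "Lname", "State", "Zip", "Phone", "Age", "Sex")
ROWS = [
    (1, "Bugs", "Bunny", "NY", "11217", "(718) 938-3235", 34, "M"),
    (2, "Yosemite", "Sam", "CA", "95389", "(209) 375-6572", 52, "M"),
    (3, "Daffy", "Duck", "NY", "10013", "(212) 227-1810", 35, "M"),
    (4, "Elmer", "Fudd", "ME", "04578", "(207) 882-7323", 43, "M"),
    (5, "Witch", "Hazel", "MA", "01970", "(978) 744-0991", 57, "F"),
]

# Column-oriented copy of the same data: one list ("file") per column.
COLUMNS = {name: [row[i] for row in ROWS] for i, name in enumerate(HEADER)}
STATE, FNAME = HEADER.index("State"), HEADER.index("Fname")


def row_scan(rows):
    """Row store: every record is read in full, row by row."""
    values_read, result = 0, []
    for row in rows:
        values_read += len(row)
        if row[STATE] == "NY":
            result.append(row[FNAME])
    return result, values_read


def column_scan(columns):
    """Column store: only the State (filter) and Fname (projection) files are read."""
    state, fname = columns["State"], columns["Fname"]
    values_read = len(state) + len(fname)
    result = [name for name, s in zip(fname, state) if s == "NY"]
    return result, values_read


print(row_scan(ROWS))        # (['Bugs', 'Daffy'], 40)
print(column_scan(COLUMNS))  # (['Bugs', 'Daffy'], 10)
```

Both return the same answer; the column layout reads 2 of the 8 columns, and that ratio grows with wider tables.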
Single-Row Operations - Insert
Row oriented: new rows appended to the end.
Column oriented: new value added to each file.
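Row-oriented layout (row 6 is the new row):
Key | Fname | Lname | State | Zip | Phone | Age | Sex
1 | Bugs | Bunny | NY | 11217 | (718) 938-3235 | 34 | M
2 | Yosemite | Sam | CA | 95389 | (209) 375-6572 | 52 | M
3 | Daffy | Duck | NY | 10013 | (212) 227-1810 | 35 | M
4 | Elmer | Fudd | ME | 04578 | (207) 882-7323 | 43 | M
5 | Witch | Hazel | MA | 01970 | (978) 744-0991 | 57 | F
6 | Marvin | Martian | CA | 91602 | (818) 761-9964 | 26 | M
Column-oriented layout (the new row adds one value to the end of each column file):
Key: 1 | 2 | 3 | 4 | 5 | 6
Fname: Bugs | Yosemite | Daffy | Elmer | Witch | Marvin
Lname: Bunny | Sam | Duck | Fudd | Hazel | Martian
State: NY | CA | NY | ME | MA | CA
Zip: 11217 | 95389 | 10013 | 04578 | 01970 | 91602
Phone: (718) 938-3235 | (209) 375-6572 | (212) 227-1810 | (207) 882-7323 | (978) 744-0991 | (818) 761-9964
Age: 34 | 52 | 35 | 43 | 57 | 26
Sex: M | M | M | M | F | M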
Columnar inserts are not efficient for singleton insertions (OLTP): one new row touches a single place in a row store but every column file in a column store. Batch loads on column-oriented storage are faster (compression, no indexes).
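A minimal sketch of the trade-off, assuming each "file" is a Python list and that the cost we care about is the number of files touched per operation. The `RowStore` and `ColumnStore` classes are hypothetical illustrations, not ColumnStore's write path.

```python
HEADER = ("Key", "Fname", "Lname", "State", "Zip", "Phone", "Age", "Sex")


class RowStore:
    def __init__(self):
        self.file = []  # one file holding whole records

    def insert(self, row):
        self.file.append(tuple(row))  # one append at the end of one file
        return 1                      # files touched


class ColumnStore:
    def __init__(self, header):
        self.files = {name: [] for name in header}  # one file per column

    def insert(self, row):
        for name, value in zip(self.files, row):
            self.files[name].append(value)  # one append per column file
        return len(self.files)              # files touched for a single row

    def insert_batch(self, rows):
        # Batch load: each column file is extended once with a whole run of values,
        # so the per-file cost is shared by every row in the batch.
        for i, name in enumerate(self.files):
            self.files[name].extend(r[i] for r in rows)
        return len(self.files)


marvin = (6, "Marvin", "Martian", "CA", "91602", "(818) 761-9964", 26, "M")
print(RowStore().insert(marvin))           # 1
print(ColumnStore(HEADER).insert(marvin))  # 8
```

A single row costs 8 file touches in the column layout, but a batch of 1,000 rows still costs only 8, which is why bulk loading suits columnar storage.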
Single-Row Operations - Update
Row oriented: updating 100% of the rows means changing 100% of the blocks on disk.
Column oriented: only the blocks that need to be updated are changed.
Row-oriented layout:
Key | Fname | Lname | State | Zip | Phone | Age | Sex
1 | Bugs | Bunny | NY | 11217 | (718) 938-3235 | 34 | M
2 | Yosemite | Sam | CA | 95389 | (209) 375-6572 | 52 | M
3 | Daffy | Duck | NY | 10013 | (212) 227-1810 | 35 | M
4 | Elmer | Fudd | ME | 04578 | (207) 882-7323 | 43 | M
5 | Witch | Hazel | MA | 01970 | (978) 744-0991 | 57 | F
Column-oriented layout:
Key: 1 | 2 | 3 | 4 | 5
Fname: Bugs | Yosemite | Daffy | Elmer | Witch
Lname: Bunny | Sam | Duck | Fudd | Hazel
State: NY | CA | NY | ME | MA
Zip: 11217 | 95389 | 10013 | 04578 | 01970
Phone: (718) 938-3235 | (209) 375-6572 | (212) 227-1810 | (207) 882-7323 | (978) 744-0991
Age: 34 | 52 | 35 | 43 | 57
Sex: M | M | M | M | F
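A back-of-the-envelope sketch of the update case: change one column in every row. The sizes (1,000,000 rows, 8 equally wide columns, 1,024 values per block) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

# Hypothetical sizes, for illustration only.
NUM_ROWS = 1_000_000
NUM_COLUMNS = 8
VALUES_PER_BLOCK = 1024  # values of one column that fit in one block


def row_store_blocks_touched(rows_updated):
    # Records are stored whole, so one block holds VALUES_PER_BLOCK // NUM_COLUMNS rows.
    rows_per_block = VALUES_PER_BLOCK // NUM_COLUMNS
    return math.ceil(rows_updated / rows_per_block)


def column_store_blocks_touched(rows_updated, columns_updated=1):
    # Only the blocks of the updated column file(s) change.
    return columns_updated * math.ceil(rows_updated / VALUES_PER_BLOCK)


print(row_store_blocks_touched(NUM_ROWS))     # 7813: every block of the table
print(column_store_blocks_touched(NUM_ROWS))  # 977: one column file out of 8
```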
Single-Row Operations - Delete
Row oriented: the row is deleted.
Column oriented: the value is deleted from each file.
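Row-oriented layout (row 6 is the row being deleted):
Key | Fname | Lname | State | Zip | Phone | Age | Sex
1 | Bugs | Bunny | NY | 11217 | (718) 938-3235 | 34 | M
2 | Yosemite | Sam | CA | 95389 | (209) 375-6572 | 52 | M
3 | Daffy | Duck | NY | 10013 | (212) 227-1810 | 35 | M
4 | Elmer | Fudd | ME | 04578 | (207) 882-7323 | 43 | M
5 | Witch | Hazel | MA | 01970 | (978) 744-0991 | 57 | F
6 | Marvin | Martian | CA | 91602 | (818) 761-9964 | 26 | M
Column-oriented layout (deleting row 6 removes its value from each column file):
Key: 1 | 2 | 3 | 4 | 5 | 6
Fname: Bugs | Yosemite | Daffy | Elmer | Witch | Marvin
Lname: Bunny | Sam | Duck | Fudd | Hazel | Martian
State: NY | CA | NY | ME | MA | CA
Zip: 11217 | 95389 | 10013 | 04578 | 01970 | 91602
Phone: (718) 938-3235 | (209) 375-6572 | (212) 227-1810 | (207) 882-7323 | (978) 744-0991 | (818) 761-9964
Age: 34 | 52 | 35 | 43 | 57 | 26
Sex: M | M | M | M | F | M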
Changing the table structure
Row oriented: requires rebuilding the whole table.
Column oriented: a new file is created for the new column.
Column-oriented storage is very flexible for adding columns: no full table rebuild is required.
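Row-oriented layout (every record gains an Active field):
Key | Fname | Lname | State | Zip | Phone | Age | Sex | Active
1 | Bugs | Bunny | NY | 11217 | (718) 938-3235 | 34 | M | Y
2 | Yosemite | Sam | CA | 95389 | (209) 375-6572 | 52 | M | N
3 | Daffy | Duck | NY | 10013 | (212) 227-1810 | 35 | M | N
4 | Elmer | Fudd | ME | 04578 | (207) 882-7323 | 43 | M | Y
5 | Witch | Hazel | MA | 01970 | (978) 744-0991 | 57 | F | N
Column-oriented layout (Active is a new, separate file):
Key: 1 | 2 | 3 | 4 | 5
Fname: Bugs | Yosemite | Daffy | Elmer | Witch
Lname: Bunny | Sam | Duck | Fudd | Hazel
State: NY | CA | NY | ME | MA
Zip: 11217 | 95389 | 10013 | 04578 | 01970
Phone: (718) 938-3235 | (209) 375-6572 | (212) 227-1810 | (207) 882-7323 | (978) 744-0991
Age: 34 | 52 | 35 | 43 | 57
Sex: M | M | M | M | F
Active: Y | N | N | Y | N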
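A toy sketch of adding the Active column, assuming a two-column excerpt (Key, Fname) of the People table and Python lists as files. The helper names are hypothetical; the point is only what each layout has to write.

```python
# Two-column excerpt of the People table (Key, Fname), for brevity.
rows = [(1, "Bugs"), (2, "Yosemite"), (3, "Daffy"), (4, "Elmer"), (5, "Witch")]
columns = {"Key": [1, 2, 3, 4, 5], "Fname": ["Bugs", "Yosemite", "Daffy", "Elmer", "Witch"]}
active = ["Y", "N", "N", "Y", "N"]


def add_column_row_store(rows, values):
    """Every record grows by one field, so every record is rewritten."""
    new_rows = [row + (value,) for row, value in zip(rows, values)]
    records_rewritten = len(new_rows)
    return new_rows, records_rewritten


def add_column_column_store(columns, name, values):
    """Existing column files are left alone; the only write is one new file."""
    columns[name] = list(values)
    files_written = 1
    return columns, files_written


print(add_column_row_store(rows, active)[1])                       # 5
print(add_column_column_store(dict(columns), "Active", active)[1])  # 1
```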
MARIADB COLUMNSTORE
The Architecture
Easier Enterprise Analytics
Single SQL Front-end
• Use a single SQL interface for analytics and OLTP
• Leverage MariaDB security features: encryption for data in motion, role-based access and auditing
Full ANSI SQL
• No more "SQL-like" query languages
• Supports complex joins, aggregations and window functions
Easy to manage and scale
• Eliminates the need for indexes and views
• Automated horizontal/vertical partitioning
• Linearly scalable by adding new nodes as data grows
• Out-of-the-box connection with BI tools
• 90.3% cost reduction per TB per year
MariaDB ColumnStore Architecture
● Application: BI Tool | SQL Client | Custom Big Data App, connecting over ANSI SQL
● MariaDB SQL Front End: UM (User Module)
● Distributed Query Engine: PM (Performance Module)
● Data Storage: Columnar Distributed Data Storage on Local Storage | SAN | NAS | EBS | GlusterFS
Massively parallel, shared-nothing architecture
[Figure: SQL enters the User Module (UM), which sends column primitives down to the Performance Modules (PM) over shared-nothing distributed data storage; intermediate results flow back up to the UM]
• The query is received and parsed by the MariaDB front end on the UM
• The storage engine plugin breaks the query down into primitive operations and distributes them across the PMs
• Primitives are processed on the PMs:
○ One thread works on a range of rows
○ Column restrictions and projections are executed
○ Group by/aggregation is executed against local data
○ Each PM works on primitives in parallel, fully distributed
○ Each primitive executes in a fraction of a second
• Intermediate results are returned to the UM
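The flow above is a scatter/gather pattern. Below is a minimal sketch of it for a query like `SELECT State, COUNT(*), SUM(Age) FROM People WHERE Age > 30 GROUP BY State`, assuming two PMs that each hold their own slice of the People table (including Marvin). A `ThreadPoolExecutor` stands in for separate PM servers; `pm_primitive` and `um_query` are hypothetical names, not ColumnStore functions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shared-nothing layout: each PM owns a slice of the State and Age columns.
PM_DATA = [
    {"State": ["NY", "CA", "NY"], "Age": [34, 52, 35]},  # PM 1
    {"State": ["ME", "MA", "CA"], "Age": [43, 57, 26]},  # PM 2
]


def pm_primitive(local, min_age):
    """Runs on one PM: column restriction plus local group by/aggregation."""
    counts, sums = Counter(), Counter()
    for state, age in zip(local["State"], local["Age"]):
        if age > min_age:
            counts[state] += 1
            sums[state] += age
    return counts, sums  # intermediate results


def um_query(pm_data, min_age):
    """Runs on the UM: scatter primitives to all PMs, then merge the partial aggregates."""
    with ThreadPoolExecutor(max_workers=len(pm_data)) as pool:
        partials = list(pool.map(lambda local: pm_primitive(local, min_age), pm_data))
    counts, sums = Counter(), Counter()
    for partial_counts, partial_sums in partials:
        counts.update(partial_counts)  # Counter.update adds counts
        sums.update(partial_sums)
    return {state: (counts[state], sums[state]) for state in sorted(counts)}


print(um_query(PM_DATA, 30))
# {'CA': (1, 52), 'MA': (1, 57), 'ME': (1, 43), 'NY': (2, 69)}
```

Only small per-group partials cross from the PMs to the UM, not the raw rows.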
Shared Nothing Distributed Data Storage
• Each PM is attached to its own local disk or to a mount point on networked data storage
• PMs access shared distributed data storage
• At run time, each PM accesses the data partitions on the mount point it is attached to
[Figure: User Module (UM) and Performance Modules (PM) over shared-nothing distributed data storage]
Hardware Requirements
● Lots of RAM
○ Minimum 32GB for UM, 16GB for PM
○ Minimum 4GB for trying out a single server on a VM
● Optimised for HDD spindles, will still work with SSD
○ We are looking into SSD optimisation soon
● More cores typically better
○ 8-core minimum recommended
● For AWS m4.4xlarge is the recommended minimum
Disk Storage
Storage Architecture
● Data is stored column by column
● Each column is stored in one or more extents
● Each extent is represented by 1 file
● Each extent is arranged in fixed-size blocks
● Extents are compressed (using Snappy)
● Data is one of:
○ Fixed size (1, 2, 4 or 8 bytes)
○ Dictionary based, with a fixed-size pointer
● Metadata is in an extent map
○ The extent map is in memory
○ The extent map contains metadata on each extent, like min and max values
Column 1: Extent 1 (8 million rows, 8MB~64MB), Extent 2 (8 million rows), ..., Extent M (8 million rows)
Column 2, Column 3, ..., Column N: same layout
Data is automatically arranged by:
• Column: acts as vertical partitioning
• Extents: act as horizontal partitioning
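A small sketch of how an extent map could be derived: each column (vertical partition) is cut into fixed-row extents (horizontal partitions) with min/max recorded per extent. It assumes a toy extent size of 4 rows instead of 8 million, made-up sample values, and a hypothetical `ExtentInfo` record; it does not reproduce ColumnStore's actual extent map format.

```python
from dataclasses import dataclass

EXTENT_ROWS = 4  # hypothetical; ColumnStore uses 8 million rows per extent


@dataclass
class ExtentInfo:
    column: str
    extent_id: int
    first_row: int
    min_value: object
    max_value: object


def build_extent_map(table, extent_rows=EXTENT_ROWS):
    """Split every column into fixed-row extents and record min/max for each one."""
    extent_map = []
    for column, values in table.items():
        for extent_id, start in enumerate(range(0, len(values), extent_rows), start=1):
            chunk = values[start:start + extent_rows]
            extent_map.append(ExtentInfo(column, extent_id, start, min(chunk), max(chunk)))
    return extent_map


# Illustrative values only.
table = {
    "Quantity": [5, 5, 1, 3, 7, 2, 9, 4, 6],
    "Price": [1000, 200, 20, 1600, 50, 300, 80, 900, 10],
}
for info in build_extent_map(table):
    print(info)
```

Because the map holds only a few values per extent, it stays small enough to keep in memory, as the slide notes.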
High Performance Query Processing
Horizontal partitions: Extent 1, Extent 2, Extent 3 (8 million rows each)
Storage Architecture reduces I/O
• Only touch column files that are in filter, projection, group by, and join conditions
• Eliminate disk block touches to partitions outside filter and join conditions
Extent 1: ShipDate: 2016-01-12 - 2016-03-05
Extent 2: ShipDate: 2016-03-05 - 2016-09-23
Extent 3: ShipDate: 2016-09-24 - 2017-01-06
SELECT Item, sum(Quantity) FROM Orders WHERE ShipDate between '2016-01-01' and '2016-01-31' GROUP BY Item
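Id | OrderId | Line | Item | Quantity | Price | Supplier | ShipDate | ShipMode
1 | 1 | 1 | Laptop | 5 | 1000 | Dell | 2016-01-12 | G
2 | 1 | 2 | Monitor | 5 | 200 | LG | 2016-01-13 | G
3 | 2 | 1 | Mouse | 1 | 20 | Logitech | 2016-02-05 | M
4 | 3 | 1 | Laptop | 3 | 1600 | Apple | 2016-01-31 | P
... | ... | ... | ... | ... | ... | ... | ... | ...
8M | ... | ... | ... | ... | ... | ... | 2016-03-05 | ...
8M+1 | ... | ... | ... | ... | ... | ... | 2016-03-05 | ...
... | ... | ... | ... | ... | ... | ... | ... | ...
16M | ... | ... | ... | ... | ... | ... | 2016-09-23 | ...
16M+1 | ... | ... | ... | ... | ... | ... | 2016-09-24 | ...
... | ... | ... | ... | ... | ... | ... | ... | ...
24M | ... | ... | ... | ... | ... | ... | 2017-01-06 | ...
Extent 2 (rows 8M+1 to 16M): ELIMINATED PARTITION
Extent 3 (rows 16M+1 to 24M): ELIMINATED PARTITION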
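The elimination decision only needs the extent map, never the data blocks. A minimal sketch, using the three ShipDate ranges from the example: an extent is kept only if its [min, max] range overlaps the BETWEEN filter. The function name `extents_to_scan` is hypothetical.

```python
from datetime import date

# ShipDate min/max per extent, as listed in the example.
SHIPDATE_EXTENTS = {
    1: (date(2016, 1, 12), date(2016, 3, 5)),
    2: (date(2016, 3, 5), date(2016, 9, 23)),
    3: (date(2016, 9, 24), date(2017, 1, 6)),
}


def extents_to_scan(extent_map, low, high):
    """Keep only extents whose [min, max] overlaps the range [low, high]."""
    return [extent_id for extent_id, (lo, hi) in sorted(extent_map.items())
            if hi >= low and lo <= high]


# WHERE ShipDate between '2016-01-01' and '2016-01-31'
print(extents_to_scan(SHIPDATE_EXTENTS, date(2016, 1, 1), date(2016, 1, 31)))  # [1]
```

A filter on 2016-03-05 would have to read both Extent 1 and Extent 2, since that date sits on their shared boundary.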
Extents and PMs
With 2 PMs:
PM 1: Extent 1, Extent 3, Extent 5, Extent 7
PM 2: Extent 2, Extent 4, Extent 6, Extent 8
With 4 PMs:
PM 1: Extent 1, Extent 5
PM 2: Extent 2, Extent 6
PM 3: Extent 3, Extent 7
PM 4: Extent 4, Extent 8
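The layout above looks like round-robin placement. The sketch below assumes exactly that (extent i goes to PM ((i - 1) mod n) + 1); this rule is read off the diagram, not taken from ColumnStore documentation, and `assign_extents` is a hypothetical helper.

```python
def assign_extents(num_extents, num_pms):
    """Assumed round-robin placement of extents 1..num_extents over PMs 1..num_pms."""
    placement = {pm: [] for pm in range(1, num_pms + 1)}
    for extent in range(1, num_extents + 1):
        placement[(extent - 1) % num_pms + 1].append(extent)
    return placement


print(assign_extents(8, 2))  # {1: [1, 3, 5, 7], 2: [2, 4, 6, 8]}
print(assign_extents(8, 4))  # {1: [1, 5], 2: [2, 6], 3: [3, 7], 4: [4, 8]}
```

Doubling the PMs halves the extents each PM must scan for a full-table query, which is the linear scaling the architecture slides describe.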
Writing Data
Data Loading and Extents
First data load: CSV file, data range 1 ~ 200, 16 million rows
● Extent 1: Min 1, Max 100 (8 million rows)
● Extent 2: Min 105, Max 200 (8 million rows)
Second data load: new CSV file, data range 150 ~ 210, 24 million rows
● Extent 3: Min 150, Max 165 (8 million rows)
● Extent 4: Min 162, Max 192 (8 million rows)
● Extent 5: Min 192, Max 210 (8 million rows)
Inserting Data
● Multiple methods:
○ Single INSERTs
○ INSERT...SELECT
○ LOAD DATA INFILE
○ cpimport
○ Bulk Write API
● Designed for large bulk inserts
● Inserts are appended at the end of extents (or new extents are created)
○ This means reads are not affected
○ A High Water Mark pointing to the last block is moved at the end of the insert
cpimport
● Uses CSV files or piped CSV data
● Fastest way to get data into ColumnStore
● Does minimal data conversion and pipes it straight into the PMs
○ Works by appending new blocks to the table and moving an atomic block pointer (HWM)
○ No UNDO log needed (the atomic pointer is not moved on rollback)
○ Therefore a rollback can leave a gap of 0-64KB in a column
● Can load multiple tables simultaneously
● Can load into multiple PMs for the same table simultaneously
● Can load into specific PMs for physical partitioning by PM
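A toy model of the high water mark idea: loads always append blocks, readers only see blocks below the HWM, a commit is a single pointer move, and a rollback just leaves the pointer alone. Block size, the `ColumnFile` class, and representing abandoned blocks as `None` are all assumptions for illustration; this is not how cpimport lays out bytes on disk.

```python
BLOCK_SIZE = 4  # hypothetical: values per block (real blocks are sized in bytes)


class ColumnFile:
    """Toy append-only column file with a high water mark (HWM)."""

    def __init__(self):
        self.blocks = []  # physical blocks; None marks an abandoned (gap) block
        self.hwm = 0      # readers only see blocks before this index

    def load(self, values, commit=True):
        start = len(self.blocks)
        for i in range(0, len(values), BLOCK_SIZE):
            self.blocks.append(list(values[i:i + BLOCK_SIZE]))  # always appended
        if commit:
            self.hwm = len(self.blocks)  # one pointer move publishes the whole load
        else:
            # Rollback: the HWM is simply not moved, so no old values need restoring;
            # the written blocks are abandoned and remain as a gap.
            for b in range(start, len(self.blocks)):
                self.blocks[b] = None
        return len(self.blocks) - start  # blocks written

    def read(self):
        return [v for block in self.blocks[:self.hwm] if block is not None for v in block]


col = ColumnFile()
col.load([1, 2, 3, 4, 5])          # committed: HWM moves to 2
col.load([6, 7, 8], commit=False)  # rolled back: HWM stays at 2
col.load([9])                      # committed: HWM moves past the gap to 4
print(col.hwm, col.read())         # 4 [1, 2, 3, 4, 5, 9]
```

Readers running during a load keep seeing only the blocks below the old HWM, which is why the slides say reads are not affected.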
A Note About DELETE
● Needs to touch every column and the undo log
○ So very slow
● Also leaves a gap in the column that won't be filled
● Marking rows as deleted in a flag column with an UPDATE query is faster
● Dropping entire partitions is instantaneous
○ Partitions can be disabled first
INSERT...SELECT / LOAD DATA INFILE
● Injects the binary row data from MariaDB into cpimport
● Good for backwards compatibility with tools and for remote loading
● cpimport then injects this data into the column extent files
○ In 1.2 it will use the write API instead
● If autocommit is turned off, this behaves like regular DML instead (slow)
Best Practices
Data Modeling
● Star-schema optimizations are generally a good idea
● Conservative data typing is very important
○ Especially around the fixed-length vs. dictionary boundary (8 bytes)
○ IP Address vs. IP Number
● Break down compound fields into individual fields:
○ Trivializes searching for sub-fields
○ Can avoid dictionary overhead
○ Cost to re-assemble is generally small
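A sketch of both tips using Python's standard `ipaddress` and `re` modules. It assumes the 8-byte fixed-length limit stated above: an IPv4 address as text is 7 to 15 bytes (so it would go to a dictionary), while as a number it fits in 4 bytes; a phone such as "(718) 938-3235" (14 characters) splits into three short fields. The helper names and field names are hypothetical.

```python
import ipaddress
import re


def ip_to_number(ip_text):
    """IPv4 address text (7-15 bytes) -> unsigned 32-bit integer (4 bytes, fixed length)."""
    return int(ipaddress.IPv4Address(ip_text))


def number_to_ip(number):
    """Re-assemble the dotted form when it is needed for display."""
    return str(ipaddress.IPv4Address(number))


PHONE = re.compile(r"\((\d{3})\) (\d{3})-(\d{4})")


def split_phone(phone_text):
    """Break a compound '(AAA) PPP-NNNN' field into short, separately searchable fields."""
    match = PHONE.fullmatch(phone_text)
    if match is None:
        raise ValueError(f"unexpected phone format: {phone_text!r}")
    area_code, prefix, line = match.groups()
    return {"AreaCode": area_code, "Prefix": prefix, "Line": line}


def join_phone(parts):
    """Re-assembly is a cheap string format."""
    return f"({parts['AreaCode']}) {parts['Prefix']}-{parts['Line']}"


print(ip_to_number("192.168.1.10"))   # 3232235786
print(split_phone("(718) 938-3235"))  # {'AreaCode': '718', 'Prefix': '938', 'Line': '3235'}
```

Searching for every number in area code 718 then becomes an equality filter on a short fixed-length column instead of a pattern match on a dictionary-encoded string.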
Data Ingestion
● Avoid single inserts as continuous data feed
● Micro batch with bulk load operation
● Use cpimport or data adapters instead of LOAD DATA INFILE for bulk load
● If you plan to drop partitions in the future based on a particular field, load the data sorted by that field
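A small simulation of why sort order matters, assuming a toy extent size of 1,000 rows and a made-up day-of-year column with values drawn uniformly from 1 to 365. With random order, every extent spans almost the whole year, so a filter on the first 30 days (or dropping "January" partitions) touches all of them; sorted loading confines those days to a single extent.

```python
import random

EXTENT_ROWS = 1000  # hypothetical; ColumnStore extents hold 8 million rows


def extent_ranges(values, extent_rows=EXTENT_ROWS):
    """(min, max) of each extent, in load order."""
    return [(min(values[i:i + extent_rows]), max(values[i:i + extent_rows]))
            for i in range(0, len(values), extent_rows)]


def extents_touched(ranges, low, high):
    """Number of extents whose range overlaps [low, high]."""
    return sum(1 for lo, hi in ranges if hi >= low and lo <= high)


rng = random.Random(42)
days = [rng.randint(1, 365) for _ in range(10_000)]  # made-up day-of-year per row

unsorted_ranges = extent_ranges(days)
sorted_ranges = extent_ranges(sorted(days))

# Rows from the first 30 days of the year:
print(extents_touched(unsorted_ranges, 1, 30), extents_touched(sorted_ranges, 1, 30))  # 10 1
```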
THANK YOU!