UNDERSTANDING THE ARCHITECTURE OF MARIADB COLUMNSTORE

Post on 30-May-2020


Maria Luisa Raviol
Senior Sales Engineer
MariaDB Corporation

Hybrid workloads

Database workloads

Transactional: current data, range queries, known queries

Analytical: historical data, aggregate queries, unknown queries

[Figure: analytical (AX) and transactional (TX) workloads plotted by performance and range, with the labels "More data" and "More customers". Panels: Database (OLTP) and Data warehouse (OLAP).]

Database workloads

Transactional: current data, range queries, known queries. Row-based storage, indexes, clustered/replicated.

Analytical: historical data, aggregate queries, unknown queries. Columnar storage, no indexes, distributed.

Existing Approaches

Data Warehouses: limited real-time analytics; slow releases of product innovation; expensive hardware and software

Hadoop / NoSQL: limited SQL support; difficult to install/manage; limited talent pool; data lake with no data management

Hard to use

[Figure: Database (OLTP) and Data warehouse (OLAP) positioned by performance and range across analytical (AX) and transactional (TX) workloads. Labels: application development; BI/reporting + data science.]

Application (eCommerce)

Transactional: Show me all new products in the science fiction category.

Analytical: Show me the top products added to shopping carts or purchased today, and with low inventory.

Actionable insight: I should buy one now because everyone wants one, and they’ll be sold out by the end of the day!

Data warehouse(OLAP)

Database(OLTP)

Hybrid workloads: the problem

Transactions

App/dev

Analytics

BI/reporting + data science

Data warehouse(OLAP)

Database(Hybrid)

Hybrid workloads: the solution

Transactions Analytics

App/dev

Analytics

BI/reporting + data science

MARIADB COLUMNSTORE
Row-oriented vs. Column-oriented format

Row-oriented vs. Column-oriented format

● Row oriented:
  ○ Rows stored sequentially in a file
  ○ Scans through every record row by row

● Column oriented:
  ○ Each column is stored in a separate file
  ○ Scans only the relevant columns

ID  Fname     Lname  State  Zip    Phone           Age  Sex
1   Bugs      Bunny  NY     11217  (718) 938-3235  34   M
2   Yosemite  Sam    CA     95389  (209) 375-6572  52   M
3   Daffy     Duck   NY     10013  (212) 227-1810  35   M
4   Elmer     Fudd   ME     04578  (207) 882-7323  43   M
5   Witch     Hazel  MA     01970  (978) 744-0991  57   F

Column files:

ID: 1, 2, 3, 4, 5
Fname: Bugs, Yosemite, Daffy, Elmer, Witch
Lname: Bunny, Sam, Duck, Fudd, Hazel
State: NY, CA, NY, ME, MA
Zip: 11217, 95389, 10013, 04578, 01970
Phone: (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
Age: 34, 52, 35, 43, 57
Sex: M, M, M, M, F

SELECT Fname FROM People WHERE State = 'NY'
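The difference between the two layouts for this query can be sketched in a few lines of Python. This is a toy model only: the lists standing in for files and the `values_read` counters are illustrative assumptions, not how ColumnStore stores or reads data.

```python
# Toy comparison of row-oriented and column-oriented layouts for the People table.
COLUMNS = ["ID", "Fname", "Lname", "State", "Zip", "Phone", "Age", "Sex"]
ROWS = [
    (1, "Bugs", "Bunny", "NY", "11217", "(718) 938-3235", 34, "M"),
    (2, "Yosemite", "Sam", "CA", "95389", "(209) 375-6572", 52, "M"),
    (3, "Daffy", "Duck", "NY", "10013", "(212) 227-1810", 35, "M"),
    (4, "Elmer", "Fudd", "ME", "04578", "(207) 882-7323", 43, "M"),
    (5, "Witch", "Hazel", "MA", "01970", "(978) 744-0991", 57, "F"),
]

# Column-oriented layout: one list (standing in for one file) per column.
column_files = {name: [row[i] for row in ROWS] for i, name in enumerate(COLUMNS)}


def query_row_store(rows):
    """SELECT Fname WHERE State = 'NY' by scanning every record, row by row."""
    state_idx = COLUMNS.index("State")
    fname_idx = COLUMNS.index("Fname")
    values_read = 0
    result = []
    for row in rows:
        values_read += len(row)  # reading a row brings all 8 fields with it
        if row[state_idx] == "NY":
            result.append(row[fname_idx])
    return result, values_read


def query_column_store(files):
    """Same query, touching only the State (filter) and Fname (projection) files."""
    values_read = len(files["State"])
    matches = [i for i, state in enumerate(files["State"]) if state == "NY"]
    values_read += len(matches)  # fetch Fname only for matching positions
    return [files["Fname"][i] for i in matches], values_read
```

Both functions return the same names; the row scan reads 40 values for the five-row table, while the column scan reads 7.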

Single-Row Operations - Insert

Row oriented: new rows appended to the end.

Column oriented: new value added to each file.

Row oriented:

Key  Fname     Lname    State  Zip    Phone           Age  Sex
1    Bugs      Bunny    NY     11217  (718) 938-3235  34   M
2    Yosemite  Sam      CA     95389  (209) 375-6572  52   M
3    Daffy     Duck     NY     10013  (212) 227-1810  35   M
4    Elmer     Fudd     ME     04578  (207) 882-7323  43   M
5    Witch     Hazel    MA     01970  (978) 744-0991  57   F
6    Marvin    Martian  CA     91602  (818) 761-9964  26   M   (new row)

Column oriented:

Key: 1, 2, 3, 4, 5
Fname: Bugs, Yosemite, Daffy, Elmer, Witch
Lname: Bunny, Sam, Duck, Fudd, Hazel
State: NY, CA, NY, ME, MA
Zip: 11217, 95389, 10013, 04578, 01970
Phone: (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
Age: 34, 52, 35, 43, 57
Sex: M, M, M, M, F

New row, one value added to each file: 6, Marvin, Martian, CA, 91602, (818) 761-9964, 26, M

Columnar inserts are not efficient for singleton insertions (OLTP): a single-row insert touches one row in a row store, but every column file in a column store. Batch loads on column-oriented storage are faster (compression, no indexes).
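A rough way to see why singleton inserts are costly and batches are not: count how many times a column file has to be touched. The `ColumnTable` class and its `file_writes` counter below are hypothetical names for a toy model, not part of ColumnStore.

```python
class ColumnTable:
    """Toy column store: one Python list per column stands in for one file."""

    def __init__(self, columns):
        self.files = {name: [] for name in columns}
        self.file_writes = 0  # how many times any column file was appended to

    def insert_row(self, row):
        """Singleton insert: every column file is touched once per row."""
        for name, value in zip(self.files, row):
            self.files[name].append(value)
            self.file_writes += 1

    def insert_batch(self, rows):
        """Batch load: every column file is touched once per batch."""
        for i, name in enumerate(self.files):
            self.files[name].extend(row[i] for row in rows)
            self.file_writes += 1


rows = [(key, "NY", 30 + key) for key in range(100)]

singles = ColumnTable(["Key", "State", "Age"])
for row in rows:
    singles.insert_row(row)

batch = ColumnTable(["Key", "State", "Age"])
batch.insert_batch(rows)
```

With three columns and 100 rows, the singleton path appends to a file 300 times and the batch path 3 times, while both end up with identical column contents.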

Single-Row Operations - Update

Row oriented: updating a value in 100% of the rows means changing 100% of the blocks on disk.

Column oriented: only the blocks of the updated column need to change.

Key  Fname     Lname  State  Zip    Phone           Age  Sex
1    Bugs      Bunny  NY     11217  (718) 938-3235  34   M
2    Yosemite  Sam    CA     95389  (209) 375-6572  52   M
3    Daffy     Duck   NY     10013  (212) 227-1810  35   M
4    Elmer     Fudd   ME     04578  (207) 882-7323  43   M
5    Witch     Hazel  MA     01970  (978) 744-0991  57   F

Column files:

Key: 1, 2, 3, 4, 5
Fname: Bugs, Yosemite, Daffy, Elmer, Witch
Lname: Bunny, Sam, Duck, Fudd, Hazel
State: NY, CA, NY, ME, MA
Zip: 11217, 95389, 10013, 04578, 01970
Phone: (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
Age: 34, 52, 35, 43, 57
Sex: M, M, M, M, F

Single-Row Operations - Delete

Row oriented: the row is deleted.

Column oriented: the value is deleted from each file.

Row oriented:

Key  Fname     Lname    State  Zip    Phone           Age  Sex
1    Bugs      Bunny    NY     11217  (718) 938-3235  34   M
2    Yosemite  Sam      CA     95389  (209) 375-6572  52   M
3    Daffy     Duck     NY     10013  (212) 227-1810  35   M
4    Elmer     Fudd     ME     04578  (207) 882-7323  43   M
5    Witch     Hazel    MA     01970  (978) 744-0991  57   F
6    Marvin    Martian  CA     91602  (818) 761-9964  26   M   (row deleted)

Column oriented:

Key: 1, 2, 3, 4, 5
Fname: Bugs, Yosemite, Daffy, Elmer, Witch
Lname: Bunny, Sam, Duck, Fudd, Hazel
State: NY, CA, NY, ME, MA
Zip: 11217, 95389, 10013, 04578, 01970
Phone: (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
Age: 34, 52, 35, 43, 57
Sex: M, M, M, M, F

Deleted row, one value removed from each file: 6, Marvin, Martian, CA, 91602, (818) 761-9964, 26, M

Changing the table structure

Row oriented: requires rebuilding the whole table.

Column oriented: create a new file for the new column.

Column-oriented storage is very flexible for adding columns: no full rebuild is required.

Key  Fname     Lname  State  Zip    Phone           Age  Sex  Active
1    Bugs      Bunny  NY     11217  (718) 938-3235  34   M    Y
2    Yosemite  Sam    CA     95389  (209) 375-6572  52   M    N
3    Daffy     Duck   NY     10013  (212) 227-1810  35   M    N
4    Elmer     Fudd   ME     04578  (207) 882-7323  43   M    Y
5    Witch     Hazel  MA     01970  (978) 744-0991  57   F    N

Column files:

Key: 1, 2, 3, 4, 5
Fname: Bugs, Yosemite, Daffy, Elmer, Witch
Lname: Bunny, Sam, Duck, Fudd, Hazel
State: NY, CA, NY, ME, MA
Zip: 11217, 95389, 10013, 04578, 01970
Phone: (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
Age: 34, 52, 35, 43, 57
Sex: M, M, M, M, F
Active (new file): Y, N, N, Y, N

MARIADB COLUMNSTORE
The Architecture

Easier Enterprise Analytics

Single SQL Front-end
• Use a single SQL interface for analytics and OLTP
• Leverage MariaDB security features: encryption for data in motion, role-based access and auditing

Full ANSI SQL
• No more SQL-“like” queries
• Supports complex joins, aggregation and window functions

Easy to manage and scale
• Eliminates the need for indexes and views
• Automated horizontal/vertical partitioning
• Scales linearly by adding new nodes as data grows
• Out-of-the-box connection with BI tools
• 90.3% cost reduction per TB per year

ANSI SQL

MariaDB ColumnStore Architecture

Application: BI Tool | SQL Client | Custom Big Data App

MariaDB SQL Front End - UM (User Module)

Distributed Query Engine - PM (Performance Module)

Data Storage: Columnar Distributed Data Storage (Local Storage | SAN | NAS | EBS | Gluster FS)

Massively parallel, shared nothing architecture

[Figure: SQL arrives at the User Module (UM); column primitives are passed to the Performance Modules (PM), which sit on the shared nothing distributed data storage.]

• Query received and parsed by the MariaDB Front End on the UM

• Storage Engine Plugin breaks the query down into primitive operations and distributes them across the PMs

• Primitives processed on the PMs
  • One thread working on a range of rows
  • Execute column restrictions and projections
  • Execute group by/aggregation against local data
  • Each PM works on primitives in parallel and fully distributed
  • Each primitive executes in a fraction of a second

• Return intermediate results to the UM

Primitives ↓↓↓↓   Intermediate Results ↑↑↑↑
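The flow above (distribute primitives, aggregate locally, merge intermediate results) can be sketched as follows. This is a minimal illustration under stated assumptions: `PM_DATA`, `run_primitive` and `user_module_query` are hypothetical names, threads stand in for separate PM nodes, and the "primitive" is a simple GROUP BY count on a State column.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Each hypothetical PM holds its own slice of the State column (shared nothing).
PM_DATA = {
    "pm1": ["NY", "CA", "NY"],
    "pm2": ["ME", "MA", "NY"],
}


def run_primitive(states):
    """Runs on a PM: group by / count against local data only."""
    return Counter(states)


def user_module_query():
    """Runs on the UM: send one primitive per PM, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(PM_DATA)) as pool:
        partials = list(pool.map(run_primitive, PM_DATA.values()))
    total = Counter()
    for partial in partials:
        total.update(partial)  # combine intermediate results
    return dict(total)
```

Calling `user_module_query()` yields the per-state counts across both PMs; no PM ever reads another PM's data.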

Shared Nothing Distributed Data Storage
• Each PM is attached to its own local disk or to a mount point on networked data storage
• PMs access shared distributed data storage
• At run time, each PM accesses the data partition on the mount point it is attached to

[Figure: User Module (UM) and Performance Modules (PM) over shared nothing distributed data storage.]

Hardware Requirements

● Lots of RAM
  ○ Minimum 32GB for UM, 16GB for PM
  ○ Minimum 4GB for trying a single server out on a VM

● Optimised for HDD spindles, will still work with SSD
  ○ We are looking into SSD optimisation soon

● More cores typically better
  ○ 8 core minimum recommendation

● For AWS, m4.4xlarge is the recommended minimum

Disk Storage

Storage Architecture

Data is stored column by column
Each column is stored in one or more extents
  Each extent is represented by 1 file
  Each extent is arranged in fixed size blocks
  Extents are compressed (using Snappy)
Data is one of
  Fixed size (1, 2, 4 or 8 bytes)
  Dictionary based with a fixed size pointer
Meta data is in an extent map
  Extent map is in memory
  Extent map contains meta data on each extent, like min and max
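The two value encodings listed above can be illustrated with Python's `struct` module. This is a sketch under assumptions: the 4-byte integer format, the 8-byte pointer width and the function names are chosen for illustration and are not taken from the ColumnStore on-disk format.

```python
import struct


def encode_fixed(values, fmt="<i"):
    """Pack integers as fixed-size values (4 bytes each with the default format)."""
    return b"".join(struct.pack(fmt, value) for value in values)


def encode_dictionary(values):
    """Store each distinct string once; the column itself holds fixed-size pointers.

    The 8-byte pointer width is an assumption made for this sketch.
    """
    dictionary = []
    position = {}
    tokens = []
    for value in values:
        if value not in position:
            position[value] = len(dictionary)
            dictionary.append(value)
        tokens.append(position[value])
    column = b"".join(struct.pack("<q", token) for token in tokens)
    return dictionary, column


def decode_dictionary(dictionary, column):
    """Follow each fixed-size pointer back to its dictionary entry."""
    count = len(column) // 8
    tokens = struct.unpack("<%dq" % count, column)
    return [dictionary[token] for token in tokens]
```

Five ages packed with `encode_fixed` occupy exactly 20 bytes, so any value can be located by offset. A dictionary column pays one extra lookup per value, which is the overhead the data modeling advice later in the deck tries to avoid.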

Column 1 | Column 2 | Column 3 | ... | Column N

Extent 1 (8 million rows, 8MB~64MB)
Extent 2 (8 million rows)
...
Extent M (8 million rows)

Data automatically arranged by
• Column: acts as vertical partitioning
• Extents: act as horizontal partitioning

[Figure: each column is a vertical partition; each extent is a horizontal partition.]

High Performance Query Processing

Horizontal Partition: 8 Million Rows, Extent 1

Horizontal Partition: 8 Million Rows, Extent 2

Horizontal Partition: 8 Million Rows, Extent 3

Storage Architecture reduces I/O

• Only touch column files that are in filter, projection, group by, and join conditions

• Eliminate disk block touches to partitions outside filter and join conditions

Extent 1: ShipDate: 2016-01-12 - 2016-03-05

Extent 2: ShipDate: 2016-03-05 - 2016-09-23

Extent 3: ShipDate: 2016-09-24 - 2017-01-06

SELECT Item, sum(Quantity) FROM Orders WHERE ShipDate between '2016-01-01' and '2016-01-31' GROUP BY Item

Id     OrderId  Line  Item     Quantity  Price  Supplier  ShipDate    ShipMode
1      1        1     Laptop   5         1000   Dell      2016-01-12  G
2      1        2     Monitor  5         200    LG        2016-01-13  G
3      2        1     Mouse    1         20     Logitech  2016-02-05  M
4      3        1     Laptop   3         1600   Apple     2016-01-31  P
...    ...      ...   ...      ...       ...    ...       ...         ...
8M     ...                                                2016-03-05
8M+1   ...                                                2016-03-05
...    ...      ...   ...      ...       ...    ...       ...         ...
16M    ...                                                2016-09-23
16M+1  ...                                                2016-09-24
...    ...      ...   ...      ...       ...    ...       ...         ...
24M    ...                                                2017-01-06

Rows 8M+1 to 16M (Extent 2): ELIMINATED PARTITION

Rows 16M+1 to 24M (Extent 3): ELIMINATED PARTITION
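Partition elimination for the query above can be sketched as a range-overlap test against the min and max values kept in the extent map. The `EXTENT_MAP` list and `extents_to_scan` function are hypothetical names for this illustration; the date ranges are the ones shown for the three ShipDate extents.

```python
from datetime import date

# Min/max ShipDate per extent, as listed for Extents 1 to 3.
EXTENT_MAP = [
    {"extent": 1, "min": date(2016, 1, 12), "max": date(2016, 3, 5)},
    {"extent": 2, "min": date(2016, 3, 5), "max": date(2016, 9, 23)},
    {"extent": 3, "min": date(2016, 9, 24), "max": date(2017, 1, 6)},
]


def extents_to_scan(extent_map, low, high):
    """Keep only extents whose [min, max] range overlaps the BETWEEN range."""
    return [
        entry["extent"]
        for entry in extent_map
        if entry["min"] <= high and entry["max"] >= low
    ]


# WHERE ShipDate BETWEEN '2016-01-01' AND '2016-01-31'
january = extents_to_scan(EXTENT_MAP, date(2016, 1, 1), date(2016, 1, 31))
```

For the January range only Extent 1 survives, so the blocks of Extents 2 and 3 are never read. The check uses metadata alone, which is why it needs no index.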

Extents and PMs

Two PMs:

PM 1      PM 2
Extent 1  Extent 2
Extent 3  Extent 4
Extent 5  Extent 6
Extent 7  Extent 8

Four PMs:

PM 1      PM 2      PM 3      PM 4
Extent 1  Extent 2  Extent 3  Extent 4
Extent 5  Extent 6  Extent 7  Extent 8
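The extent layout in the figure is consistent with extents being handed out to PMs in turn. The sketch below reproduces that layout with a simple round-robin rule; treating the placement as round-robin is an assumption drawn from the figure, and `assign_extents` is a hypothetical helper, not a ColumnStore function.

```python
def assign_extents(num_extents, num_pms):
    """Assign extents 1..num_extents to PMs 1..num_pms in round-robin order."""
    assignment = {pm: [] for pm in range(1, num_pms + 1)}
    for extent in range(1, num_extents + 1):
        pm = (extent - 1) % num_pms + 1
        assignment[pm].append(extent)
    return assignment


two_pms = assign_extents(8, 2)
four_pms = assign_extents(8, 4)
```

With eight extents, doubling the PM count from two to four halves the number of extents each PM has to scan.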

Writing Data

Data Loading and Extents

Data load

CSV file: data range 1 ~ 200, 16 million rows

Extent 1: Min 1, Max 100 (8 million rows)
Extent 2: Min 105, Max 200 (8 million rows)

Second data load

New CSV file: data range 150 ~ 210, 24 million rows

Extent 3: Min 150, Max 165 (8 million rows)
Extent 4: Min 162, Max 192 (8 million rows)
Extent 5: Min 192, Max 210 (8 million rows)
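How the min and max values of each extent arise from load order can be sketched with a scaled-down model. Assumptions for this illustration: four rows per extent instead of 8 million, each load starts a fresh extent, and the `load` function and sample values are hypothetical, chosen so the resulting ranges match the figure.

```python
EXTENT_ROWS = 4  # scaled down; the slides use 8 million rows per extent


def load(extent_map, values, extent_rows=EXTENT_ROWS):
    """Append values in arrival order, cutting a new extent every extent_rows rows."""
    for start in range(0, len(values), extent_rows):
        chunk = values[start:start + extent_rows]
        extent_map.append({
            "extent": len(extent_map) + 1,
            "min": min(chunk),
            "max": max(chunk),
            "rows": len(chunk),
        })
    return extent_map


extent_map = []
# First load: data range 1 ~ 200
load(extent_map, [1, 50, 100, 70, 105, 200, 150, 120])
# Second load: data range 150 ~ 210
load(extent_map, [150, 165, 160, 155, 162, 192, 170, 180, 192, 210, 200, 205])
ranges = [(entry["min"], entry["max"]) for entry in extent_map]
```

The second load produces extents whose ranges overlap Extent 2 (105 to 200), so a filter on values around 160 has to scan several extents. Loading data in sorted order keeps the ranges narrow and makes elimination more effective.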

Inserting Data

● Multiple methods
  ○ Single INSERTs
  ○ INSERT...SELECT
  ○ LOAD DATA INFILE
  ○ cpimport
  ○ Bulk Write API

● Designed for large bulk inserts

● Inserts are appended at the end of extents (or new extents created)
  ○ This means reads are not affected
  ○ A High Water Mark pointing to the last block is moved at the end of the insert

cpimport

● Uses CSV files or piped CSV data

● Fastest way to get data into ColumnStore

● Does minimal data conversion and pipes it straight into the PMs
  ○ Works by appending new blocks to the table and moving an atomic block pointer (HWM)
  ○ No UNDO log needed (atomic pointer not moved on rollback)
  ○ Therefore can cause a gap of 0-64KB in a column

● Can load multiple tables simultaneously

● Can load into multiple PMs for the same table simultaneously

● Can load into specific PMs for physical partitioning by PM
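The high water mark idea can be sketched as an append-only block list plus a pointer that only moves on commit. This is a simplified model: the `ColumnFile` class is hypothetical, it does not model the 0-64KB gap mentioned above, and it assumes the next writer starts again at the HWM after a rollback.

```python
class ColumnFile:
    """Toy column file: readers only ever see blocks below the high water mark."""

    def __init__(self):
        self.blocks = []  # blocks physically written
        self.hwm = 0      # high water mark: number of blocks visible to readers

    def read(self):
        return self.blocks[:self.hwm]

    def bulk_append(self, new_blocks, commit=True):
        # Writers start at the HWM; anything past it is leftover from a rollback.
        del self.blocks[self.hwm:]
        self.blocks.extend(new_blocks)
        if commit:
            # Moving the pointer is the single step that publishes the load.
            self.hwm = len(self.blocks)
        # On rollback the pointer stays put, so no undo log is required.


column = ColumnFile()
column.bulk_append(["block-1", "block-2"])
column.bulk_append(["bad-block"], commit=False)  # rolled back load
after_rollback = column.read()
column.bulk_append(["block-3"])
after_commit = column.read()
```

Readers running during a load keep seeing the old HWM, which is why appends do not affect reads.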

A Note About DELETE

● Needs to touch every column and the undo log
  ○ So it is very slow

● Also leaves a gap in the column that won’t be filled

● Marking rows as deleted in a flag column with an UPDATE query is faster

● Dropping entire partitions is instantaneous
  ○ Partitions can be disabled first

INSERT...SELECT / LOAD DATA INFILE

● Injects the binary row data from MariaDB into cpimport

● Good for backwards compatibility with tools and remote loading

● cpimport then injects this data into the column extent files
  ○ In 1.2 it will use the write API instead

● If autocommit is turned off this will behave like regular DML instead (slow)

Best Practices

Data Modeling

● Star-schema optimizations are generally a good idea

● Conservative data typing is very important

○ Especially around fixed-length vs. dictionary boundary (8 bytes)

○ IP Address vs. IP Number

● Break down compound fields into individual fields:

○ Trivializes searching for sub-fields

○ Can avoid dictionary overhead

○ Cost to re-assemble is generally small
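Two of the modeling points above can be sketched with the Python standard library: storing an IP address as a number that fits a fixed-size column, and breaking a compound phone field into numeric parts. The function names are hypothetical, and `split_phone` assumes the ten-digit format used in the earlier People table.

```python
import ipaddress


def ip_to_number(address):
    """Convert a dotted IPv4 string to an integer that fits in 4 bytes."""
    return int(ipaddress.IPv4Address(address))


def number_to_ip(number):
    """Re-assemble the dotted string from the stored number."""
    return str(ipaddress.IPv4Address(number))


def split_phone(phone):
    """Split '(718) 938-3235' into area code, exchange and line number.

    Assumes exactly ten digits, as in the sample data.
    """
    digits = "".join(ch for ch in phone if ch.isdigit())
    return int(digits[:3]), int(digits[3:6]), int(digits[6:])


stored = ip_to_number("192.168.1.10")
area, exchange, line = split_phone("(718) 938-3235")
```

A dotted address such as "192.168.1.10" is 12 characters and would cross the 8-byte boundary into dictionary storage, while its numeric form stays fixed-length. With the area code in its own column, a search by area code becomes a plain integer comparison.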

Data Ingestion

● Avoid single inserts as a continuous data feed

● Micro-batch with bulk load operations

● Use cpimport or data adapters instead of LOAD DATA INFILE for bulk load

● If you wish to drop partitions in the future based on a particular field, load the data sorted by that field
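Micro-batching a continuous feed can be sketched as a small buffer that emits CSV text once enough rows have arrived. The `MicroBatcher` class and its `flush` callback are hypothetical: in practice the callback would hand the CSV to a bulk loader such as cpimport, but that step is deliberately left out here.

```python
import csv
import io


class MicroBatcher:
    """Collect incoming rows and hand them to a bulk loader in CSV batches."""

    def __init__(self, flush, batch_size=1000):
        self.flush = flush  # hypothetical callable that receives one CSV string
        self.batch_size = batch_size
        self.pending = []

    def add(self, row):
        self.pending.append(row)
        if len(self.pending) >= self.batch_size:
            self.flush_now()

    def flush_now(self):
        """Emit whatever is buffered, for example on a timer or at shutdown."""
        if not self.pending:
            return
        buffer = io.StringIO()
        csv.writer(buffer, lineterminator="\n").writerows(self.pending)
        self.pending = []
        self.flush(buffer.getvalue())


batches = []
batcher = MicroBatcher(batches.append, batch_size=2)
for row in [(1, "Laptop"), (2, "Monitor"), (3, "Mouse")]:
    batcher.add(row)
batcher.flush_now()
```

Three rows with a batch size of two produce two bulk loads instead of three single inserts; a real feed would use a much larger batch size or a time-based flush.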

THANK YOU!