In-Memory Columnar tests using LHC physics analysis benchmark

Maaike Limper

2 June 2014

Test setup

• Single instance
• 32-core machine
• 512 GB memory
• Using Beta version 2


Test bug-fixes in Beta 2

Simultaneous population of multiple tables now OK in Beta 2

Removing table from IMC with “ALTER TABLE … NO INMEMORY” now works OK

Remaining issue reported in the beta forum: re-population when changing in-memory properties


In-Memory Columnar table sizes

Test COMPRESS FOR QUERY vs CAPACITY HIGH

• “EF” -> trigger data, only booleans, best compression
• “MET” -> table with floats & doubles, worst compression


Table name    Original size (GB)    Compress ratio IMC cap. high    Compress ratio IMC query
“photon”      114.57                3.66                            1.96
“electron”    94.67                 3.52                            1.97
“jet”         32.27                 4.82                            -
“muon”        14.52                 3.19                            1.38
“EF”          3.22                  63.46                           22.13
“MET”         2.53                  1.7                             -

Only one compression ratio is listed for “jet” and “MET”.
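The ratios in the table can be turned into approximate in-memory footprints. The sketch below assumes that "compress ratio" means original size divided by in-memory size, which the slides do not state explicitly; the function names are illustrative only. Only the rows with both ratios listed are included.

```python
# (original size in GB, ratio for CAPACITY HIGH, ratio for QUERY), copied from the table.
# "jet" and "MET" are left out because only one ratio is listed for them.
TABLES = {
    "photon": (114.57, 3.66, 1.96),
    "electron": (94.67, 3.52, 1.97),
    "muon": (14.52, 3.19, 1.38),
    "EF": (3.22, 63.46, 22.13),
}


def in_memory_gb(original_gb, compress_ratio):
    # Assumption: compress ratio = original size / in-memory size.
    return original_gb / compress_ratio


def footprints(tables):
    """Return {table: (GB with CAPACITY HIGH, GB with QUERY)}."""
    return {
        name: (in_memory_gb(size, cap_high), in_memory_gb(size, query))
        for name, (size, cap_high, query) in tables.items()
    }


if __name__ == "__main__":
    for name, (cap_high_gb, query_gb) in footprints(TABLES).items():
        print(f"{name:10s} cap. high: {cap_high_gb:6.2f} GB   query: {query_gb:6.2f} GB")
```

Under that assumption, “photon” would need roughly 31 GB with CAPACITY HIGH and roughly 58 GB with QUERY.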

v$inmemory_area

inmemory_size=120 GB
• 64KB POOL: nearly empty (1/4 of im_size)
• 1MB POOL: nearly full (3/4 of im_size)

Add option to use smaller 64 KB POOL for read-only data?
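For reference, the pool split quoted above works out as follows. This is plain arithmetic on the 1/4 and 3/4 fractions from the slide, not a query against v$inmemory_area; the function name is made up for illustration.

```python
def pool_sizes_gb(inmemory_size_gb, small_pool_fraction=0.25):
    """Split the in-memory area into the 64KB pool and the 1MB pool.

    The 1/4 fraction is the one observed on the slide, not a documented constant.
    """
    small_pool = inmemory_size_gb * small_pool_fraction
    return {"64KB POOL": small_pool, "1MB POOL": inmemory_size_gb - small_pool}


if __name__ == "__main__":
    for pool, size_gb in pool_sizes_gb(120).items():
        print(f"{pool}: {size_gb:.0f} GB")
```

With inmemory_size=120 GB this leaves 30 GB in the nearly empty 64KB pool and 90 GB in the nearly full 1MB pool.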


In-Memory Population

By default spawns 2*CPU-cores “space-background-workers” (=64 on my test setup)
• Large memory consumption, system starts using swap space!
• Consumes all CPU in the system

I’ve manually set _max_spacebg_slaves=16 to prevent problems while populating


In-Memory Population


COMPRESS FOR CAPACITY HIGH

IMC population with 16 spacebg-slaves on 32-core machine:
• Each slave takes 100% of 1 CPU core
• Total CPU usage is ~50% of system
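The CPU figures above follow from the worker count relative to the core count. A minimal sketch, assuming each worker saturates exactly one core (as observed here) and that usage cannot exceed 100% of the machine; the function names are illustrative.

```python
def cpu_usage_fraction(workers, cores):
    """Fraction of the machine's CPU used, assuming one saturated core per worker."""
    return min(workers, cores) / cores


def workers_per_core(workers, cores):
    """Values above 1.0 mean more busy workers than cores (oversubscription)."""
    return workers / cores


if __name__ == "__main__":
    cores = 32
    for label, workers in (("default (2*CPU-cores)", 2 * cores), ("manual setting", 16)):
        print(
            f"{label}: {workers} workers, "
            f"{cpu_usage_fraction(workers, cores):.0%} CPU, "
            f"{workers_per_core(workers, cores):.1f} workers per core"
        )
```

With the default of 64 workers there are two busy workers per core, which matches the observation that population consumes all CPU; 16 workers leave half of the cores free.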

In-Memory Population


COMPRESS FOR QUERY (default)

CPU-usage just as high when using default compression

In-Memory Population

• 25 minutes to populate 94.6 GB table with 340 columns, 50 million rows (“electron”), using 16 workers
• Same time to populate with different compression rate


COMPRESS FOR QUERY (default)

COMPRESS FOR CAPACITY HIGH
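The population time quoted for “electron” can be expressed as a rough throughput. The sketch below only does arithmetic on the numbers from the slide and assumes 1 GB = 1024 MB; the function name is illustrative.

```python
def population_rate(size_gb, rows, minutes, workers):
    """Average population throughput, assuming 1 GB = 1024 MB."""
    seconds = minutes * 60
    mb_per_s = size_gb * 1024 / seconds
    return {
        "mb_per_s": mb_per_s,
        "mb_per_s_per_worker": mb_per_s / workers,
        "rows_per_s": rows / seconds,
    }


if __name__ == "__main__":
    rate = population_rate(size_gb=94.6, rows=50_000_000, minutes=25, workers=16)
    print(f"{rate['mb_per_s']:.1f} MB/s in total")
    print(f"{rate['mb_per_s_per_worker']:.1f} MB/s per worker")
    print(f"{rate['rows_per_s']:.0f} rows/s")
```

This comes to roughly 65 MB/s in total, or about 4 MB/s per worker, for either compression setting.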

Measuring query time

In the following slides I compare query time when reading data from the In-Memory Columnar store with query time when reading data stored in the buffer cache

Speed-up factor depends on compression level used for IMC; here I show results for:
• COMPRESS FOR QUERY (default)
• COMPRESS FOR CAPACITY HIGH


Default compression: query time

IMC 40x faster than cache for (very) simple query:



IMC 15x faster than cache for simple query with group-by:



IMC 6x faster than cache for more complex query with window-function:


CAPACITY HIGH: query time

IMC 10x faster than cache for (very) simple query:



IMC 5x faster than cache for simple query with group-by:



IMC 2.5x faster than cache for more complex query with window-function:


Preliminary conclusion

Compression:
• COMPRESS FOR QUERY has 2x less compression
• COMPRESS FOR QUERY on average ~3x faster queries

Default number of workers 2xCPU-cores, too much (I think)
• Can use a lot of CPU and memory, may result in swapping
• No way to stop population once in progress, can hang the DB
• I would recommend #workers=½ CPU-cores

Trying to get good “cache” vs “IMC” benchmark
• Looks good for simple queries,
• but I’d like to test more complex queries as well, in progress…
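The "~3x faster" figure can be checked against the speed-up factors reported on the query-time slides. A small sketch using only those six numbers; the dictionary keys are shorthand labels, not query names from the benchmark.

```python
# Speed-up of IMC over the buffer cache: (COMPRESS FOR QUERY, CAPACITY HIGH).
SPEEDUP_VS_CACHE = {
    "very simple query": (40, 10),
    "group-by": (15, 5),
    "window function": (6, 2.5),
}


def query_vs_capacity_high(speedups):
    """How much faster COMPRESS FOR QUERY is than CAPACITY HIGH, per query type."""
    return {name: query / cap_high for name, (query, cap_high) in speedups.items()}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


if __name__ == "__main__":
    ratios = query_vs_capacity_high(SPEEDUP_VS_CACHE)
    for name, ratio in ratios.items():
        print(f"{name}: {ratio:.1f}x")
    print(f"average: {mean(ratios.values()):.1f}x")
```

The per-query ratios are 4x, 3x and 2.4x, so the advantage of the default compression shrinks as the queries get more complex.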
