In-Memory Columnar tests using LHC physics analysis benchmark

In-Memory Columnar tests using LHC physics analysis benchmark Maaike Limper 2 June 2014


Page 1

In-Memory Columnar tests using LHC physics analysis benchmark

Maaike Limper

2 June 2014

Page 2

Test setup

• Single instance
• 32-core machine
• 512 GB memory
• Using Beta version 2

Page 3

Test bug-fixes in Beta 2

• Simultaneous population of multiple tables now works in Beta 2
• Removing a table from the IMC with “ALTER TABLE … NO INMEMORY” now works
• Remaining issue reported in the beta forum: re-population when changing in-memory properties

Page 4

In-Memory Columnar table sizes

Test COMPRESS FOR QUERY vs CAPACITY HIGH

• “EF”: trigger-data, only booleans, best compression
• “MET”: table with floats & doubles, worst compression

Table name    Original size (GB)    Compress ratio IMC cap. high    Compress ratio IMC query
“photon”      114.57                3.66                            1.96
“electron”     94.67                3.52                            1.97
“jet”          32.27                4.82 *
“muon”         14.52                3.19                            1.38
“EF”            3.22                63.46                           22.13
“MET”           2.53                1.7 *

* Only one compression ratio is given for this table.
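The ratios in the table can be turned into approximate in-memory footprints. The following is a minimal sketch, assuming that "compress ratio" means original size divided by in-memory size (the slide does not define it); the function name is illustrative, and only the rows with both ratios listed are used.

```python
# Values copied from the table: (original size in GB, ratio CAPACITY HIGH, ratio QUERY).
# Rows with only one ratio in the transcript ("jet", "MET") are left out.
TABLES = {
    "photon": (114.57, 3.66, 1.96),
    "electron": (94.67, 3.52, 1.97),
    "muon": (14.52, 3.19, 1.38),
    "EF": (3.22, 63.46, 22.13),
}


def in_memory_size_gb(original_gb, ratio):
    """Estimated in-memory size, ASSUMING ratio = original size / in-memory size."""
    return original_gb / ratio


for name, (size, cap_high, query) in TABLES.items():
    print(
        f"{name:9s} cap. high: {in_memory_size_gb(size, cap_high):6.2f} GB   "
        f"query: {in_memory_size_gb(size, query):6.2f} GB"
    )
```

Under that assumption the “photon” table would need roughly 31 GB with CAPACITY HIGH and roughly 58 GB with QUERY compression.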

Page 5

v$inmemory_area

inmemory_size=120 GB
• 64KB POOL: nearly empty (1/4 of im_size)
• 1MB POOL: nearly full (3/4 of im_size)

Add option to use smaller 64 KB POOL for read-only data?
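To make the cost of the fixed split concrete, here is a small sketch of the arithmetic, using only the inmemory_size and the pool fractions quoted on this slide; the helper name is illustrative.

```python
INMEMORY_SIZE_GB = 120  # inmemory_size from the slide

# Fraction of inmemory_size assigned to each pool, as reported on the slide.
POOL_FRACTIONS = {"64KB POOL": 1 / 4, "1MB POOL": 3 / 4}


def pool_sizes_gb(total_gb, fractions):
    """Split the total in-memory area over the pools according to the given fractions."""
    return {pool: total_gb * fraction for pool, fraction in fractions.items()}


sizes = pool_sizes_gb(INMEMORY_SIZE_GB, POOL_FRACTIONS)
for pool, size in sizes.items():
    print(f"{pool}: {size:.0f} GB")
```

With these numbers about 30 GB sits in the nearly empty 64KB POOL while the 90 GB 1MB POOL is nearly full.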


Page 6

In-Memory Population

By default spawns 2*CPU-cores “space-background-workers” (=64 on my test-setup)
• Large memory consumption, system starts using swap-space!
• Consumes all CPU in system

I’ve manually set _max_spacebg_slaves=16 to prevent problems while populating
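The worker counts above follow from the core count. A minimal sketch of that arithmetic, assuming the 2*CPU-cores default observed on this test setup; the function names are illustrative and nothing here talks to the database. The value 16 used for _max_spacebg_slaves is half of the 32 cores.

```python
def default_workers(cpu_cores):
    """Default number of population workers observed in the test: 2 * CPU cores."""
    return 2 * cpu_cores


def capped_workers(cpu_cores):
    """Cap used in the test: half the CPU cores (at least one worker)."""
    return max(1, cpu_cores // 2)


cores = 32  # test machine from the "Test setup" slide
print(default_workers(cores), capped_workers(cores))
```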


Page 7

In-Memory Population

COMPRESS FOR CAPACITY HIGH

IMC population with 16 spacebg-slaves on 32-core machine:
• Each slave takes 100% of 1 CPU-core
• Total CPU-usage is ~50% of system

Page 8

In-Memory Population

COMPRESS FOR QUERY (default)

CPU-usage just as high when using default compression

Page 9

In-Memory Population

• 25 minutes to populate 94.6 GB table with 340 columns, 50 million rows (“electron”), 16 workers
• Same time to populate with different compression rate
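The population time can be expressed as a throughput. This is a rough sketch using only the figures on this slide; the function name and the 1 GB = 1024 MB conversion are assumptions made for illustration.

```python
def population_rate(size_gb, rows, minutes, workers):
    """Average population throughput over the whole run."""
    seconds = minutes * 60
    mb_per_s = size_gb * 1024 / seconds  # ASSUMES 1 GB = 1024 MB
    return {
        "mb_per_s": mb_per_s,
        "mb_per_s_per_worker": mb_per_s / workers,
        "rows_per_s": rows / seconds,
    }


# "electron" table: 94.6 GB, 50 million rows, 25 minutes, 16 workers
rate = population_rate(94.6, 50_000_000, 25, 16)
for key, value in rate.items():
    print(f"{key}: {value:.1f}")
```

That is roughly 65 MB/s in total, or about 4 MB/s per worker.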


COMPRESS FOR QUERY (default)

COMPRESS FOR CAPACITY HIGH

Page 10

Measuring query time

In the following slides I compare the query time when reading data from the In-Memory Columnar store with the query time when reading data stored in the buffer cache

Speed-up factor depends on the compression level used for IMC, here I show results for:
• COMPRESS FOR QUERY (default)
• COMPRESS FOR CAPACITY HIGH

Page 11

Default compression: query time

IMC 40x faster than cache for (very) simple query

Page 12

Default compression: query time

IMC 15x faster than cache for simple query with group-by

Page 13

Default compression: query time

IMC 6x faster than cache for more complex query with window-function

Page 14

CAPACITY HIGH: query time

IMC 10x faster than cache for (very) simple query

Page 15

CAPACITY HIGH: query time

IMC 5x faster than cache for simple query with group-by

Page 16

CAPACITY HIGH: query time

IMC 2.5x faster than cache for more complex query with window-function

Page 17

Preliminary conclusion

Compression:
• COMPRESS FOR QUERY has 2x less compression
• COMPRESS FOR QUERY on average ~3x faster queries
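The "~3x faster" figure can be cross-checked against the speed-up factors quoted on the query-time slides. A minimal sketch, assuming both compression levels were compared against the same buffer-cache timings, so that the ratio of the two speed-ups is the relative query speed; the names are illustrative.

```python
# Speed-up of IMC over the buffer cache, as quoted on the query-time slides.
SPEEDUP_QUERY = {"simple": 40, "group-by": 15, "window-function": 6}
SPEEDUP_CAPACITY_HIGH = {"simple": 10, "group-by": 5, "window-function": 2.5}


def relative_speed(query, capacity_high):
    """How much faster COMPRESS FOR QUERY is than CAPACITY HIGH, per query type.

    ASSUMES the buffer-cache baseline is identical for both compression levels.
    """
    return {name: query[name] / capacity_high[name] for name in query}


ratios = relative_speed(SPEEDUP_QUERY, SPEEDUP_CAPACITY_HIGH)
average = sum(ratios.values()) / len(ratios)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
print(f"average: {average:.1f}x")
```

The three ratios are 4x, 3x and 2.4x, which averages to about 3.1x.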

Default number of workers 2xCPU-cores, too much (I think)
• Can use a lot of CPU and memory, may result in swapping
• No way to stop population once in progress, can hang the DB
• I would recommend #workers=½ CPU-cores

Trying to get good “cache” vs “IMC” benchmark
• Looks good for simple queries,
• but I’d like to test more complex queries as well, in progress…