In-Memory Columnar tests using LHC physics analysis benchmark
Maaike Limper
2 June 2014
Test setup
• Single instance
• 32-core machine
• 512 GB memory
• Using Beta version 2
Test bug-fixes in Beta 2
Simultaneous population of multiple tables now OK in Beta 2
Removing table from IMC with “ALTER TABLE … NO INMEMORY” now works OK
Remaining issue reported in beta forum: re-population when changing in-memory properties
In-Memory Columnar table sizes
Test COMPRESS FOR QUERY vs CAPACITY HIGH
• “EF” -> trigger-data, only booleans, best compression
• “MET” -> table with floats & doubles, worst compression

Table name   Original size (GB)   Compress ratio IMC cap. high   Compress ratio IMC query
“photon”     114.57               3.66                           1.96
“electron”   94.67                3.52                           1.97
“jet”        32.27                4.82                           (not listed)
“muon”       14.52                3.19                           1.38
“EF”         3.22                 63.46                          22.13
“MET”        2.53                 1.7                            (not listed)
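The ratios in the table can be turned into approximate in-memory footprints. The sketch below assumes that "compress ratio" means original size divided by in-memory size, which the slides do not state explicitly. It only uses the rows that list both ratios, and the names `TABLES` and `in_memory_size_gb` are illustrative.

```python
# Sizes (GB) and compression ratios copied from the table on this slide.
# Only rows that list both ratios are included.
TABLES = {
    "photon": (114.57, 3.66, 1.96),
    "electron": (94.67, 3.52, 1.97),
    "muon": (14.52, 3.19, 1.38),
    "EF": (3.22, 63.46, 22.13),
}


def in_memory_size_gb(original_gb, ratio):
    """Assumption: compress ratio = original size / in-memory size."""
    return original_gb / ratio


# Map each table to (size with CAPACITY HIGH, size with QUERY), in GB.
footprint = {
    name: (in_memory_size_gb(size, cap_high), in_memory_size_gb(size, query))
    for name, (size, cap_high, query) in TABLES.items()
}

for name, (cap_gb, query_gb) in footprint.items():
    print(f"{name:9s} cap. high: {cap_gb:6.2f} GB   query: {query_gb:6.2f} GB")
```

Under that assumption the “photon” table would need roughly 31 GB with CAPACITY HIGH and roughly 58 GB with the default QUERY compression.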
v$inmemory_area
inmemory_size=120 GB
• 64KB POOL: nearly empty (1/4 of im_size)
• 1MB POOL: nearly full (3/4 of im_size)
Add option to use smaller 64 KB POOL for read-only data?
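To make the pool split concrete, the fractions quoted on this slide can be applied to the configured inmemory_size. This is only arithmetic on the slide's numbers, and the variable names are illustrative.

```python
INMEMORY_SIZE_GB = 120  # inmemory_size quoted on the slide

# Fractions quoted on the slide: 64KB pool = 1/4, 1MB pool = 3/4 of inmemory_size.
pool_64kb_gb = INMEMORY_SIZE_GB * 1 / 4
pool_1mb_gb = INMEMORY_SIZE_GB * 3 / 4

print(f"64KB pool: {pool_64kb_gb:.0f} GB (reported as nearly empty)")
print(f"1MB pool:  {pool_1mb_gb:.0f} GB (reported as nearly full)")
```

With these fractions about 30 GB sits in the nearly empty 64KB pool while the 90 GB in the 1MB pool is nearly exhausted, which is the motivation for the question above.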
In-Memory Population
By default spawns 2*CPU-cores “space-background-workers” (=64 on my test setup)
• Large memory consumption, system starts using swap space!
• Consumes all CPU in system
I’ve manually set _max_spacebg_slaves=16 to prevent problems while populating
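The worker counts on this slide follow from the core count. A minimal sketch, assuming the default is exactly twice the number of CPU cores as described above; the function name `default_population_workers` is hypothetical and not an Oracle API.

```python
CPU_CORES = 32  # cores on the test machine described in the test setup


def default_population_workers(cpu_cores):
    """Default number of population workers as described on the slide: 2 * cores."""
    return 2 * cpu_cores


default_workers = default_population_workers(CPU_CORES)
manual_workers = 16  # value chosen for _max_spacebg_slaves in this test

print(f"default workers: {default_workers}")
print(f"manual workers:  {manual_workers} ({manual_workers / CPU_CORES:.0%} of cores)")
```

The manual setting corresponds to one worker for every two cores, so at most half of the machine's cores are busy with population.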
In-Memory Population
COMPRESS FOR CAPACITY HIGH
IMC population with 16 spacebg-slaves on 32-core machine:
• Each slave takes 100% of 1 CPU-core
• Total CPU-usage is ~50% of system
In-Memory Population
COMPRESS FOR QUERY (default)
CPU-usage just as high when using default compression
In-Memory Population
• 25 minutes to populate the 94.6 GB “electron” table (340 columns, 50 million rows) with 16 workers
• Same time to populate with a different compression level
COMPRESS FOR QUERY (default)
COMPRESS FOR CAPACITY HIGH
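As a rough sketch, the population figures quoted on this slide translate into the throughput below. The variable names are illustrative, and the calculation assumes that the 25 minutes cover the whole 94.6 GB table and that the work is spread evenly over the 16 workers.

```python
TABLE_SIZE_GB = 94.6   # size of the "electron" table
ROWS = 50_000_000      # rows in the table
MINUTES = 25           # population time reported on the slide
WORKERS = 16           # population workers used in the test

gb_per_minute = TABLE_SIZE_GB / MINUTES
rows_per_second = ROWS / (MINUTES * 60)
# Assumption: the load is shared evenly between the workers.
gb_per_minute_per_worker = gb_per_minute / WORKERS

print(f"{gb_per_minute:.2f} GB/min overall")
print(f"{rows_per_second:,.0f} rows/s overall")
print(f"{gb_per_minute_per_worker:.3f} GB/min per worker")
```

This gives a little under 4 GB per minute overall, or about a quarter of a GB per minute for each worker.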
Measuring query time
In the following slides I compare the query time when reading data from the In-Memory Columnar store with the query time when reading data stored in the buffer cache
Speed-up factor depends on the compression level used for IMC, here I show results for:
• COMPRESS FOR QUERY (default)
• COMPRESS FOR CAPACITY HIGH
Default compression: query time
IMC 40x faster than cache for (very) simple query:
Default compression: query time
IMC 15x faster than cache for simple query with group-by:
Default compression: query time
IMC 6x faster than cache for more complex query with window-function:
CAPACITY HIGH: query time
IMC 10x faster than cache for (very) simple query:
CAPACITY HIGH: query time
IMC 5x faster than cache for simple query with group-by:
CAPACITY HIGH: query time
IMC 2.5x faster than cache for more complex query with window-function:
Preliminary conclusion
Compression:
• COMPRESS FOR QUERY compresses ~2x less than COMPRESS FOR CAPACITY HIGH
• COMPRESS FOR QUERY gives on average ~3x faster queries
Default number of workers is 2x CPU-cores, which is too many (I think)
• Can use a lot of CPU and memory, may result in swapping
• No way to stop population once in progress, can hang the DB
• I would recommend #workers = ½ CPU-cores
Trying to get a good “cache” vs “IMC” benchmark
• Looks good for simple queries,
• but I’d like to test more complex queries as well, in progress…
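The two compression conclusions can be checked against the numbers from the earlier slides. The sketch below divides the speed-up factors measured with default compression by those measured with CAPACITY HIGH, and does the same for the compression ratios of the tables that list both values. The dictionary names are illustrative, and using a plain arithmetic mean is an assumption about how "on average" was meant.

```python
from statistics import mean

# IMC speed-up over buffer cache, per query type, from the query-time slides.
SPEEDUP_QUERY = {"simple": 40, "group-by": 15, "window-function": 6}
SPEEDUP_CAP_HIGH = {"simple": 10, "group-by": 5, "window-function": 2.5}

# (cap. high ratio, query ratio) for the tables that list both values.
COMPRESS_RATIOS = {
    "photon": (3.66, 1.96),
    "electron": (3.52, 1.97),
    "muon": (3.19, 1.38),
    "EF": (63.46, 22.13),
}

# How much faster COMPRESS FOR QUERY is than CAPACITY HIGH for each query type.
query_vs_cap_high = {
    name: SPEEDUP_QUERY[name] / SPEEDUP_CAP_HIGH[name] for name in SPEEDUP_QUERY
}
# How much more CAPACITY HIGH compresses than COMPRESS FOR QUERY for each table.
cap_high_vs_query = {
    name: cap_high / query for name, (cap_high, query) in COMPRESS_RATIOS.items()
}

mean_speed_gain = mean(query_vs_cap_high.values())
mean_compression_gain = mean(cap_high_vs_query.values())

print(f"QUERY is on average {mean_speed_gain:.1f}x faster than CAPACITY HIGH")
print(f"CAPACITY HIGH compresses on average {mean_compression_gain:.1f}x more")
```

The per-query gains are 4x, 3x and 2.4x, which is consistent with the "~3x faster" statement, and the per-table compression gains range from about 1.8x to 2.9x, consistent with "~2x less compression".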