© Prof. Dr.-Ing. Wolfgang Lehner
Thomas Kissinger, Tim Kiefer, Benjamin Schlegel, Dirk Habich, Daniel Molka, Wolfgang Lehner
ADMS 2014, Hangzhou, China, 2014/09/01
ERIS: A NUMA-AWARE IN-MEMORY STORAGE ENGINE FOR TERA-SCALE ANALYTICAL WORKLOAD
| 2
Motivation
Databases in the many-core era
[Figure: scan throughput [TiB/s] and lookup throughput [billion/s] versus #cores (0 to 448); series: Shared Lookup, ERIS Lookup, Shared Scan, ERIS Scan]
| 4
NUMA Systems
AMD: 8 nodes, 64 cores, 64 GB, max 2 hops
  latency (ns): local 85, remote 196
  bandwidth (GB/s): local 16.4, remote 1.8
SGI: 64 nodes, 512 cores, 8 TB, max 4 hops
  latency (ns): local 81, remote 870
  bandwidth (GB/s): local 36.2, remote 6.5
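To make the gap between local and remote access concrete, the chart values above can be turned into ratios. This is a small sketch only: the dictionary layout and the function name `remote_penalty` are illustrative assumptions, and the numbers are the local/remote values as read from the two charts.

```python
# Latency (ns) and bandwidth (GB/s) as (local, remote) pairs read from the charts.
MACHINES = {
    "AMD": {"latency": (85.0, 196.0), "bandwidth": (16.4, 1.8)},
    "SGI": {"latency": (81.0, 870.0), "bandwidth": (36.2, 6.5)},
}

def remote_penalty(machine):
    """Return (latency increase, bandwidth decrease) of remote vs. local access."""
    local_lat, remote_lat = MACHINES[machine]["latency"]
    local_bw, remote_bw = MACHINES[machine]["bandwidth"]
    return remote_lat / local_lat, local_bw / remote_bw

for name in MACHINES:
    lat, bw = remote_penalty(name)
    print(f"{name}: remote latency {lat:.1f}x higher, remote bandwidth {bw:.1f}x lower")
```

On these numbers the AMD machine pays roughly 2.3x in latency and 9.1x in bandwidth for a remote access, the SGI machine roughly 10.7x and 5.6x.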
In a nutshell
Key-value store with high lookup, scan, and insert performance to support analytical workloads
In-memory storage engine for multi-socket-multi-core systems with large main memories (NUMA)
Predominantly a shared-nothing distributed system (partition per core)
Implemented as partitioned prefix-tree and partitioned column store
Experiments show linear scalability, superior memory and link usage, efficient load balancing
| 8
ERIS Implementation
| 9
ERIS Data Structures
Data structures: prefix tree (lookup) and column store (scan)
[Figure: Prefix Tree with 16 slots per node (0 to 15), consuming 4 bit of the key per level and ending in Key/Value entries; Column Store holding a contiguous sequence of Values]
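The two structures in the figure can be sketched as follows. This is a minimal illustration and not the ERIS implementation: the fixed 32-bit key width, the fully expanded tree depth, and the names `PrefixTree`, `upsert`, `lookup`, and `scan` are assumptions; only the 4 bits per level and the 16 slots per node come from the slide.

```python
from array import array

FANOUT_BITS = 4                    # 4 bits of the key per tree level (from the slide)
FANOUT = 1 << FANOUT_BITS          # 16 slots per node, numbered 0..15
KEY_BITS = 32                      # assumption: fixed-width 32-bit keys
LEVELS = KEY_BITS // FANOUT_BITS   # 8 levels for a 32-bit key

class PrefixTree:
    def __init__(self):
        self.root = [None] * FANOUT

    def _slots(self, key):
        # Yield the 4-bit fragments of the key, most significant first.
        for level in range(LEVELS):
            shift = KEY_BITS - FANOUT_BITS * (level + 1)
            yield (key >> shift) & (FANOUT - 1)

    def upsert(self, key, value):
        node = self.root
        *inner, last = self._slots(key)
        for slot in inner:
            if node[slot] is None:
                node[slot] = [None] * FANOUT   # allocate inner node on demand
            node = node[slot]
        node[last] = (key, value)              # leaf slot holds the key/value pair

    def lookup(self, key):
        node = self.root
        for slot in self._slots(key):
            node = node[slot]
            if node is None:
                return None
        return node[1]

def scan(column, predicate):
    # Column store: values sit contiguously, so a scan is one sequential pass.
    return [v for v in column if predicate(v)]

tree = PrefixTree()
tree.upsert(0x12345678, "a")
column = array("q", [5, 7, 11])
```

Each lookup touches one node per 4-bit fragment, so its cost depends on the key width rather than on the number of stored keys, while the scan never follows a pointer.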
| 10
Partitioned Prefix Tree
Prefix Tree Layout
[Figure: prefix tree layout; panels: Direct Access, Indirect Access]
| 14
ERIS Architecture
[Figure: ERIS architecture]
Multiprocessor 1 … Multiprocessor M, each with:
  Core 1 … Core N with their AEUs
  Local Memory
  Local Memory Manager
NUMA-Optimized High-Throughput Data Command Routing with Global Partition Table (GPT)
Load Balancer with Monitoring and Partition Transfer between AEUs
Autonomous Execution Unit (AEU), running on a core with Local Memory:
  Local Command Buffer
  AEU's Partitions (Column-Store, Index)
  Process Data Commands (i.e., Scan, Lookup, and Insert/Upsert)
  Process Balancing Commands
  Group Data Commands
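The AEU duties named in the diagram (local command buffer, grouping of data commands, processing them against the AEU's own partitions) can be sketched as below. This is one possible reading under stated assumptions: the class `AEU`, its methods, the command tuples, and the choice to apply upserts before lookups within a batch are all hypothetical, and a plain dictionary stands in for the partition's index.

```python
from collections import defaultdict, deque

class AEU:
    """Hypothetical stand-in for an Autonomous Execution Unit."""

    def __init__(self):
        self.command_buffer = deque()   # local command buffer
        self.index = {}                 # stand-in for the AEU's partition

    def submit(self, op, key, value=None):
        self.command_buffer.append((op, key, value))

    def run_once(self):
        # Group the buffered data commands by operation type.
        groups = defaultdict(list)
        while self.command_buffer:
            op, key, value = self.command_buffer.popleft()
            groups[op].append((key, value))
        results = []
        # Assumption: within one batch, upserts are applied before reads.
        for key, value in groups["upsert"]:
            self.index[key] = value
        for key, _ in groups["lookup"]:
            results.append(("lookup", key, self.index.get(key)))
        for _ in groups["scan"]:
            results.append(("scan", sorted(self.index.items())))
        return results

aeu = AEU()
aeu.submit("upsert", 1, "x")
aeu.submit("lookup", 1)
print(aeu.run_once())
```

Because only the owning AEU touches its partition, the sketch needs no locks; that is the shared-nothing property the design relies on.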
| 16
Load Balancing
Load Balancer Implementation
[Figure: partition transfer between AEUs on Multiprocessor 1 and Multiprocessor 2, each with its Local Memory]
Intra-Node Transfer: link
Inter-Node Transfer: copy
Legend: Transfer Command, Raw Data Stream
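The distinction between the two transfer kinds can be sketched as follows. The classes `Partition` and `AEUHandle` and the function `transfer` are hypothetical names for illustration. Python cannot control NUMA placement, so the copy below only marks the point where a real implementation would allocate memory on the target node.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)          # compare partitions by identity, not by content
class Partition:
    data: list                # stands in for the partition's memory

@dataclass
class AEUHandle:
    node: int                 # multiprocessor (NUMA node) the AEU runs on
    partitions: list = field(default_factory=list)

def transfer(partition, src, dst):
    """Move `partition` from AEU `src` to AEU `dst` and return the moved partition."""
    src.partitions.remove(partition)
    if src.node == dst.node:
        # Intra-node transfer: the memory is already local, so just link it.
        moved = partition
    else:
        # Inter-node transfer: copy the raw data so it becomes local to the target.
        moved = Partition(data=list(partition.data))
    dst.partitions.append(moved)
    return moved

a, b, c = AEUHandle(node=0), AEUHandle(node=0), AEUHandle(node=1)
p = Partition(data=[1, 2, 3])
a.partitions.append(p)
transfer(p, a, b)             # same node: linked, no data movement
transfer(p, b, c)             # different node: data is copied
```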
| 18
Load Balancing
Load Balancer Strategies
| 19
ERIS Evaluation
| 20
Evaluation
Lookup/Upsert Throughput Depending on Index Size
[Figure: Lookup and Upsert throughput depending on index size; panels: AMD Machine, SGI Machine]
| 21
Evaluation
Scan Performance
• SGI Machine
• 488 cores, parallel scan
• 8 billion entries in the column store
Scan bandwidth [GB/s]:
ERIS: 2094.1
Interleaved: 273.6
Single RAM: 33.8
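The relative gains follow directly from the three bandwidth values above. A short sketch of the arithmetic; the variable and function names are illustrative, the numbers are from the slide.

```python
SCAN_BANDWIDTH_GBPS = {"ERIS": 2094.1, "Interleaved": 273.6, "Single RAM": 33.8}
CORES = 488

def speedup(baseline):
    """Scan bandwidth of ERIS relative to the given baseline placement."""
    return SCAN_BANDWIDTH_GBPS["ERIS"] / SCAN_BANDWIDTH_GBPS[baseline]

print(f"ERIS vs. Interleaved: {speedup('Interleaved'):.1f}x")
print(f"ERIS vs. Single RAM:  {speedup('Single RAM'):.1f}x")
print(f"ERIS per core:        {SCAN_BANDWIDTH_GBPS['ERIS'] / CORES:.2f} GB/s")
```

That is about 7.7x over interleaved allocation, about 62x over placing all data in a single node's RAM, and about 4.29 GB/s per core.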
| 22
Evaluation
Link and Memory Controller Activity
• AMD Machine
• Scan: 8B Keys
• Lookup: 1B Keys
[Figure: memory controller (MEM) and link (LINK) bandwidth [GB/s] for Scan and Lookup, ERIS versus Shared; bar values in extraction order: 33.8, 41.6, 75.6, 83.8, 122.9, 73.04, 1.2, 17.8]
| 24
Data Command Routing
[Figure: data command routing from AEU 1 to AEU 1 … AEU N]
1. Batch Lookup Target AEU(s): Bitmap Partition Table, Range Partition Table
2. Fill: Local Outgoing Buffers, with a Unicast Buffer and Multicast References per target (To AEU 1 … To AEU N) and a Multicast Buffer
3. Copy to Target: Local Incoming Buffer
Local Incoming Buffer states: Active, Processing
Local Incoming Buffer header: Active (1 bit), Offset (32 bit), Active Writers (31 bit)
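The three routing steps can be sketched for the unicast case as follows. This is a simplified illustration under assumptions: the partition bounds in `LOWER_BOUNDS`, the `incoming` dictionary, and the function names are hypothetical, and the bitmap partition table, the multicast path, and the buffer header bits are left out.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical range partition table: AEU i owns keys from LOWER_BOUNDS[i]
# up to (but excluding) LOWER_BOUNDS[i + 1]; keys are assumed non-negative.
LOWER_BOUNDS = [0, 100, 200, 300]
incoming = {aeu: [] for aeu in range(len(LOWER_BOUNDS))}  # one incoming buffer per AEU

def target_aeu(key):
    return bisect_right(LOWER_BOUNDS, key) - 1

def route(commands):
    # 1. Batch lookup of the target AEU for every command.
    targets = [target_aeu(key) for _, key in commands]
    # 2. Fill the local outgoing buffer kept for each target.
    outgoing = defaultdict(list)
    for target, command in zip(targets, commands):
        outgoing[target].append(command)
    # 3. Copy each outgoing buffer to its target's incoming buffer in one step.
    for target, batch in outgoing.items():
        incoming[target].extend(batch)

route([("lookup", 5), ("lookup", 250), ("upsert", 99), ("lookup", 300)])
print(incoming)
```

Buffering per target turns many small remote writes into a few larger ones, which is the effect the local buffer size parameter controls.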
| 25
Data Command Routing – Evaluation
[Figure: throughput [million routings/s] versus local buffer size [#requests] (0 to 512); series: Raw Routing, Routing w/ Index Lookups]
| 26
Evaluation
L3 Cache Usage – Index Lookup
[Figure: L3 cache miss ratio [%] and throughput [million/s] versus #keys (16M to 2B); series: ERIS, Shared, ERIS L3 Cache, Shared L3 Cache]
| 27
Evaluation
L3 Cache Line State – Index Lookup
• Percentage of all hits
• 1B keys
Cache line state [percent]   ERIS     Shared Index
Modified                     19.4%    16.3%
Exclusive                    76.6%    4.5%
Forward                      2.1%     20.9%
Shared                       1.9%     58.4%