PG-Strom - A FDW module utilizing GPU device
-
Upload
kohei-kaigai -
Category
Technology
-
view
6.859 -
download
3
description
Transcript of PG-Strom - A FDW module utilizing GPU device
PG-Strom ~A FDW module utilizing GPU device~
NEC Europe
SAP Global Competence Center
KaiGai Kohei <[email protected]>
Page 2 PostgreSQL Conference 2012
FDW is fun
Exe
cuto
r
Regular Table
Foreign Table
Foreign Table
Foreign Table
MySQL FDW
Oracle FDW
PG-Strom FDW
Exec
Exec
Exec
Regular Table
SELECT * FROM …
Utilizing External
Computing Resource!
Run on single thread
Page 3 PostgreSQL Conference 2012
Idea of Asynchronous Execution using CPU and GPU
CPU
vanilla PostgreSQL PostgreSQL with PG-Strom
CPU GPU
Synchronization
Itera
tion o
f sc
an t
uple
s and
eva
luation o
f qualif
iers
Larger “chunk” to scan the database at once
Asynchronous memory transfer and execution
Earlier than “Only CPU” scan
: Red means, scan tuples from the database
: Green means, execution of the qualifiers
Page 4 PostgreSQL Conference 2012
World of CPU World of GPU
Backend Process
Architecture of PG-Strom
Postmaster
PG-Strom GPU
Calculation Server
shared chunks shared buffer
shadow tables
regular tables
Backend Process
GPU Kernel
Function
PCI-E x16 Gen2 (16GB/sec)
Backend Process
Executor
PG-Strom
GPU Device Memory
Preload
Exec
Exec
Data Exchange via shared chunk
Massive Parallel
Execution
DMA Transfer
Page 5 PostgreSQL Conference 2012
Data Density and Column-base structure
Chunk B
uffer
of
FT1
value a[]
rowmap
value b[]
value c[]
value d[] <not used>
<not used>
Table: my_schema.ft1.b.cs
10300 {10.23, 7.54, 5.43, … }
Table: my_schema.ft1.c.cs
{‘2010-10-21’, …}
10100 {2.4, 5.6, 4.95, … }
{‘2011-01-23’, …}
{‘2011-08-17’, …}
② Calculation
① Transfer
③ Write-Back
10100
10200
10300
Foreign Table FT1 (a, b, c, d)
row
id m
ap
colu
mn s
tore
of
A
colu
mn s
tore
of
B
colu
mn s
tore
of
C
colu
mn s
tore
of
D
row store of FT1
Shadow Tables
Page 6 PostgreSQL Conference 2012
Benchmark Result
▌CPU: Intel Xeon E5504 (2.0GHz/4core), GPU: Nvidia GeForce GTS450 (128 cuda core)
▌rtbl and ftbl contains 5 million tuples, with same values.
▌All the tuples are already in the shared buffers, so seldom disk i/o happen.
postgres=# SELECT COUNT(*) FROM rtbl
WHERE sqrt((x-256)^2 + (y-128)^2) < 40;
count
-------
25069
(1 row)
Time: 3739.492 ms
postgres=# SELECT COUNT(*) FROM ftbl
WHERE sqrt((x-256)^2 + (y-128)^2) < 40;
count
-------
25069
(1 row)
Time: 227.023 ms
X10 times Faster
GPU Accelerated!
Page 7 PostgreSQL Conference 2012
Future Development
▌Git URL
https://github.com/kaigai/pg_strom
▌v9.3 development
Writable Foreign Table
Sort / Aggregate acceleration using GPU
Inheritance between regular and foreign tables
▌Need your help
Folks who can review the patches
Folks who can provide real-life big data
Folks who can know typical workload of analytic queries
Page 8 PostgreSQL Conference 2012