PG-Strom - A FDW module utilizing GPU device

8
PG-Strom ~A FDW module utilizing GPU device~ NEC Europe SAP Global Competence Center KaiGai Kohei <[email protected]>

description

slides on LT session of PGcon2012

Transcript of PG-Strom - A FDW module utilizing GPU device

Page 1: PG-Strom - A FDW module utilizing GPU device

PG-Strom ~A FDW module utilizing GPU device~

NEC Europe

SAP Global Competence Center

KaiGai Kohei <[email protected]>

Page 2: PG-Strom - A FDW module utilizing GPU device

Page 2 PostgreSQL Conference 2012

FDW is fun

Exe

cuto

r

Regular Table

Foreign Table

Foreign Table

Foreign Table

MySQL FDW

Oracle FDW

PG-Strom FDW

Exec

Exec

Exec

Regular Table

SELECT * FROM …

Utilizing External

Computing Resource!

Run on single thread

Page 3: PG-Strom - A FDW module utilizing GPU device

Page 3 PostgreSQL Conference 2012

Idea of Asynchronous Execution using CPU and GPU

CPU

vanilla PostgreSQL PostgreSQL with PG-Strom

CPU GPU

Synchronization

Itera

tion o

f sc

an t

uple

s and

eva

luation o

f qualif

iers

Larger “chunk” to scan the database at once

Asynchronous memory transfer and execution

Earlier than “Only CPU” scan

: Red means, scan tuples from the database

: Green means, execution of the qualifiers

Page 4: PG-Strom - A FDW module utilizing GPU device

Page 4 PostgreSQL Conference 2012

World of CPU World of GPU

Backend Process

Architecture of PG-Strom

Postmaster

PG-Strom GPU

Calculation Server

shared chunks shared buffer

shadow tables

regular tables

Backend Process

GPU Kernel

Function

PCI-E x16 Gen2 (16GB/sec)

Backend Process

Executor

PG-Strom

GPU Device Memory

Preload

Exec

Exec

Data Exchange via shared chunk

Massive Parallel

Execution

DMA Transfer

Page 5: PG-Strom - A FDW module utilizing GPU device

Page 5 PostgreSQL Conference 2012

Data Density and Column-base structure

Chunk B

uffer

of

FT1

value a[]

rowmap

value b[]

value c[]

value d[] <not used>

<not used>

Table: my_schema.ft1.b.cs

10300 {10.23, 7.54, 5.43, … }

Table: my_schema.ft1.c.cs

{‘2010-10-21’, …}

10100 {2.4, 5.6, 4.95, … }

{‘2011-01-23’, …}

{‘2011-08-17’, …}

② Calculation

① Transfer

③ Write-Back

10100

10200

10300

Foreign Table FT1 (a, b, c, d)

row

id m

ap

colu

mn s

tore

of

A

colu

mn s

tore

of

B

colu

mn s

tore

of

C

colu

mn s

tore

of

D

row store of FT1

Shadow Tables

Page 6: PG-Strom - A FDW module utilizing GPU device

Page 6 PostgreSQL Conference 2012

Benchmark Result

▌CPU: Intel Xeon E5504 (2.0GHz/4core), GPU: Nvidia GeForce GTS450 (128 cuda core)

▌rtbl and ftbl contains 5 million tuples, with same values.

▌All the tuples are already in the shared buffers, so seldom disk i/o happen.

postgres=# SELECT COUNT(*) FROM rtbl

WHERE sqrt((x-256)^2 + (y-128)^2) < 40;

count

-------

25069

(1 row)

Time: 3739.492 ms

postgres=# SELECT COUNT(*) FROM ftbl

WHERE sqrt((x-256)^2 + (y-128)^2) < 40;

count

-------

25069

(1 row)

Time: 227.023 ms

X10 times Faster

GPU Accelerated!

Page 7: PG-Strom - A FDW module utilizing GPU device

Page 7 PostgreSQL Conference 2012

Future Development

▌Git URL

https://github.com/kaigai/pg_strom

▌v9.3 development

Writable Foreign Table

Sort / Aggregate acceleration using GPU

Inheritance between regular and foreign tables

▌Need your help

Folks who can review the patches

Folks who can provide real-life big data

Folks who can know typical workload of analytic queries

Page 8: PG-Strom - A FDW module utilizing GPU device

Page 8 PostgreSQL Conference 2012