GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

Page 1: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

GPGPU Accelerates PostgreSQL

~Unlock the power of multi-thousand cores~

PGconf.EU 2015 Wien

NEC Business Creation Division

The PG-Strom Project

KaiGai Kohei <[email protected]>

Page 2: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

Today’s topic

PGconf.EU 2015 - GPGPU Accelerates PostgreSQL

GPGPU: very powerful computing capability

PostgreSQL: a very functional and widely used database

PG-Strom: the glue between these two promising technologies

Page 3: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

Self Introduction

▌Name: KaiGai Kohei

▌Works for: NEC

▌Role:

Lead of the PG-Strom project

Contribution to PostgreSQL and other OSS projects

Creating business opportunities using this technology

▌PG-Strom Project

Mission: to deliver the power of semiconductor evolution to everyone who wants it

Launched in January 2012 as a personal development project

Project is now funded by NEC

Fully open source project

Page 4: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

Agenda

1. GPU characteristics, and why GPUs?

2. What mediates between PostgreSQL and the GPU

3. How the GPU processes SQL workloads

4. Performance results and analysis

5. Further development

Page 5: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

Features of GPU (Graphics Processing Unit)

▌Characteristics

Massively parallel cores

Much higher DRAM bandwidth

Better price / performance ratio

Advantageous for massive but simple arithmetic operations

Requires a different way of writing software algorithms

SOURCE: CUDA C Programming Guide

Item                            GPU                           CPU
Model                           Nvidia GTX TITAN X            Intel Xeon E5-2690 v3
Architecture                    Maxwell                       Haswell
Launch                          Mar-2015                      Sep-2014
# of transistors                8.0 billion                   3.84 billion
# of cores                      3072 (simple)                 12 (functional)
Core clock                      1.0 GHz                       2.6 GHz, up to 3.5 GHz
Peak FLOPS (single precision)   6.6 TFLOPS                    998.4 GFLOPS (with AVX2)
DRAM size                       12 GB, GDDR5 (384-bit bus)    768 GB/socket, DDR4
Memory bandwidth                336.5 GB/s                    68 GB/s
Power consumption               250 W                         135 W
Price                           $999                          $2,094

Page 6: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

How GPU cores work

[Figure: calculation of $\sum_{i=0}^{N-1} item[i]$ with GPU cores. Sixteen elements, item[0] to item[15], are added pairwise in step 1 to step 4.]

Sum of items[] in log2(N) steps

Inter-core synchronization by hardware functionality
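The log2(N)-step summation described on this slide can be sketched on the CPU as follows. This is a minimal Python simulation rather than GPU code: each pass of the inner loop stands in for one step in which all active cores add a pair of items at the same time, and the name `tree_sum` is an illustrative assumption, not something taken from PG-Strom.

```python
def tree_sum(items):
    """Sum a list in ceil(log2(N)) pairwise steps, as a GPU reduction would."""
    buf = list(items)          # work on a copy; stands in for on-chip memory
    n = len(buf)
    stride = 1
    steps = 0
    while stride < n:
        # On a GPU every addition in this loop runs on its own core in
        # parallel; the hardware barrier between steps is the loop end here.
        for i in range(0, n - stride, 2 * stride):
            buf[i] += buf[i + stride]
        stride *= 2
        steps += 1
    return (buf[0] if buf else 0), steps


total, steps = tree_sum(range(16))   # item[0] .. item[15]
print(total, steps)                  # 120 4
```

With 16 items the loop finishes in 4 steps, matching step.1 to step.4 in the figure; a sequential loop would need 15 dependent additions.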

Page 7: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

Dark Silicon and Semiconductor Trend

▌Process shrink also increases leakage power.

▌Dark silicon: chip area that cannot be powered on because of thermal constraints.

▌People thought: how can these extra transistors be put to use?

Domain-specific processors, like GPUs

[Figure: core layouts across successive process generations ("Next Gen"), with part of the die marked as dark silicon. Inset: Intel Haswell on-chip layout (4-core system).]

Page 8: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

Towards the heterogeneous era...

Software must be designed to mediate between CPU and GPU

Page 9: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

Agenda

1. GPU characteristics, and why GPUs?

2. What mediates between PostgreSQL and the GPU

3. How the GPU processes SQL workloads

4. Performance results and analysis

5. Further development

Page 10: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

What is PG-Strom

▌Core ideas

① GPU native code generation on the fly

② Asynchronous, massively parallel execution

▌Advantages

Transparent acceleration with 100% query compatibility

Commodity hardware and lower system integration cost

Query: SELECT * FROM l_tbl JOIN r_tbl ON l_tbl.lid = r_tbl.rid;

[Figure: the query passes through the PostgreSQL Parser, Planner and Executor. PG-Strom plugs in through the Custom-Scan/Join interface, generates CUDA source code that is compiled by nvrtc, and uses the CUDA driver for DMA data transfer and massively parallel execution.]

Page 11: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

How query is processed

▌Custom-Scan/Join interface (new in v9.5!)

▌PG-Strom provides its own scan/join logic using GPUs

▌It produces the same results, but in a different way

[Figure: SQL (text) goes to the Query Parser, the Parse Tree to the Query Optimizer, and the Query Plan to the Query Executor. Through the Custom Scan/Join Interface (interactions between PostgreSQL and extensions), PG-Strom hands CUDA code to the NVIDIA Run-Time Compiler and sends data chunks to the GPU.]

Page 12: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

(OT) Dive into Custom-Join

[Figure: Built-in JOIN vs. Custom JOIN.

Built-in JOIN: a Join node (NestedLoop, MergeJoin or HashJoin) takes an X-tuple from the inner scan of table-X (TupleDesc of table-X) and a Y-tuple from the outer scan of table-Y (TupleDesc of table-Y) and produces a joined tuple (TupleDesc of Join X-Y).

Custom JOIN: a CustomScan node with scanrelid==0 runs logic implemented by the extension. It loads data from table-X, table-Y or some other data source in whatever way is convenient for the extension, and returns joined tuples in a compatible tuple format whose definition is given at planning time.]

Page 13: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

SQL-to-GPU native code generation (1/3)

postgres=# EXPLAIN VERBOSE
           SELECT cat, count(*), avg(x) FROM t0 WHERE x between y and y + 20.0 GROUP BY cat;
                                         QUERY PLAN
-------------------------------------------------------------------------------------
 HashAggregate (cost=167646.33..167646.36 rows=3 width=12)
   Output: cat, pgstrom.count((pgstrom.nrows())), pgstrom.avg((pgstrom.nrows((x IS NOT NULL))), (pgstrom.psum(x)))
   Group Key: t0.cat
   -> Custom Scan (GpuPreAgg) (cost=24150.22..163315.53 rows=258 width=48)
        Output: cat, pgstrom.nrows(), pgstrom.nrows((x IS NOT NULL)), pgstrom.psum(x)
        Bulkload: On (density: 100.00%)
        Reduction: Local + Global
        Device Filter: ((t0.x >= t0.y) AND (t0.x <= (t0.y + '20'::double precision)))
        Features: format: tuple-slot, bulkload: unsupported
        Kernel Source: /opt/pgsql/base/pgsql_tmp/pgsql_tmp_strom_24905.4.gpu
        -> Custom Scan (BulkScan) on public.t0 (cost=21150.22..159312.94 rows=10000060 width=12)
             Output: id, cat, aid, bid, cid, did, eid, x, y, z
             Features: format: heap-tuple, bulkload: supported
(13 rows)

Page 14: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

SQL-to-GPU native code generation (2/3)

$ less /opt/pgsql/base/pgsql_tmp/pgsql_tmp_strom_24905.4.gpu
    :
STATIC_FUNCTION(bool)
gpupreagg_qual_eval(kern_context *kcxt,
                    kern_data_store *kds,
                    kern_data_store *ktoast,
                    size_t kds_index)
{
    pg_float8_t KPARAM_1 = pg_float8_param(kcxt,1);            /* constant 20.0 */
    pg_float8_t KVAR_8 = pg_float8_vref(kds,kcxt,7,kds_index); /* var of X */
    pg_float8_t KVAR_9 = pg_float8_vref(kds,kcxt,8,kds_index); /* var of Y */

    /* formula expression based on the WHERE clause */
    return EVAL(pgfn_boolop_and_2(kcxt,
                    pgfn_float8ge(kcxt, KVAR_8, KVAR_9),
                    pgfn_float8le(kcxt, KVAR_8,
                        pgfn_float8pl(kcxt, KVAR_9, KPARAM_1))));
}
    :
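To make the translation from the WHERE clause to the nested function calls more concrete, here is a minimal Python sketch of a recursive code generator. Only the emitted function names (`pgfn_boolop_and_2`, `pgfn_float8ge`, `pgfn_float8le`, `pgfn_float8pl`) and the variable names come from the generated source on this slide; the tuple-based expression tree, the `OPERATOR_TO_DEVICE_FUNC` table and `gen_expr` are hypothetical and do not reflect how PG-Strom's generator is actually written.

```python
# Operator -> device function name, taken from the generated source on the slide.
OPERATOR_TO_DEVICE_FUNC = {
    ("and", "bool"): "pgfn_boolop_and_2",
    (">=", "float8"): "pgfn_float8ge",
    ("<=", "float8"): "pgfn_float8le",
    ("+", "float8"): "pgfn_float8pl",
}


def gen_expr(node):
    """Recursively emit source text for a (hypothetical) expression tree.

    A node is either a string naming a kernel variable or parameter, or a
    tuple (operator, type, left, right).
    """
    if isinstance(node, str):
        return node
    op, typ, left, right = node
    func = OPERATOR_TO_DEVICE_FUNC[(op, typ)]
    return f"{func}(kcxt, {gen_expr(left)}, {gen_expr(right)})"


# WHERE x BETWEEN y AND y + 20.0  ==>  (x >= y) AND (x <= (y + 20.0))
where_clause = (
    "and", "bool",
    (">=", "float8", "KVAR_8", "KVAR_9"),
    ("<=", "float8", "KVAR_8", ("+", "float8", "KVAR_9", "KPARAM_1")),
)
source = f"return EVAL({gen_expr(where_clause)});"
print(source)
```

The printed line has the same call nesting as the `return` statement in the generated kernel, which is the point of the sketch: each SQL operator maps to one device function, and the tree shape carries over unchanged.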

Page 15: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

SQL-to-GPU native code generation (3/3)

[Figure: from the SQL query, the PG-Strom GPU code generator dynamically constructs the GPU code for the per-query variant. It is combined (+) with the statically constructed GPU code of the algorithm core (GpuScan, GpuHashJoin, GpuPreAgg, GpuSort, ...) and passed to the NVIDIA Run-Time Compiler (nvrtc). The resulting GPU binary is used for execution and cached for next time.]
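The "cached GPU binary" step in this figure can be illustrated with a small Python sketch: the combined source text is hashed, and the compiler is only invoked when that digest has not been seen before. The class `KernelCache`, the use of SHA-256 as the key, and `fake_compile` (a stand-in for the run-time compiler) are all assumptions made for illustration; the slide does not say how PG-Strom keys or stores its cache.

```python
import hashlib


class KernelCache:
    """Cache compiled binaries by a digest of their source text (illustrative)."""

    def __init__(self, compile_fn):
        self._compile = compile_fn   # stand-in for the run-time compiler
        self._binaries = {}
        self.compile_count = 0

    def get(self, static_src, dynamic_src):
        # The per-query part is combined with the static algorithm core.
        full_src = dynamic_src + "\n" + static_src
        key = hashlib.sha256(full_src.encode("utf-8")).hexdigest()
        if key not in self._binaries:
            self._binaries[key] = self._compile(full_src)
            self.compile_count += 1
        return self._binaries[key]


def fake_compile(src):
    # Placeholder only: a real system would hand src to the GPU compiler.
    return ("binary", len(src))


cache = KernelCache(fake_compile)
core = "/* static GpuPreAgg core */"
first = cache.get(core, "/* qual for query A */")
again = cache.get(core, "/* qual for query A */")   # served from the cache
other = cache.get(core, "/* qual for query B, longer */")
print(cache.compile_count)   # 2
```

Running the same query shape twice triggers one compilation, so the compile cost is paid only the first time a given per-query variant appears.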

Page 16: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

PG-Strom functionalities at Sep-2015

▌Logic

GpuScan ... Simple loop extraction by GPU multithreading

GpuHashJoin ... GPU multithread-based N-way hash join

GpuNestLoop ... GPU multithread-based N-way nested loop

GpuPreAgg ... Row reduction prior to CPU aggregation

GpuSort ... GPU bitonic sort + CPU merge, hybrid sorting

▌Data Types

Numeric ... int2/4/8, float4/8, numeric

Date and Time ... date, time(tz), timestamp(tz), interval

Text ... simple matching only, at this moment

others ... currency

▌Functions

Comparison operators ... <, <=, !=, =, >=, >

Arithmetic operators ... +, -, *, /, %, ...

Mathematical functions ... sqrt, log, exp, ...

Aggregate functions ... min, max, sum, avg, stddev, ...
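The GpuSort entry in the list above describes a hybrid of GPU bitonic sorting and a CPU merge. The following Python sketch shows the general shape of such a hybrid under simplifying assumptions: `bitonic_sort` is the textbook bitonic sorting network run sequentially, `hybrid_sort` and its `chunk_size` parameter are illustrative names, and the input length must be a multiple of a power-of-two chunk size. It is not PG-Strom's implementation.

```python
import heapq


def bitonic_sort(values):
    """In-place bitonic sorting network; len(values) must be a power of two."""
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    k = 2
    while k <= n:
        j = k // 2
        while j > 0:
            # All compare-exchange operations of one (k, j) pass are
            # independent, which is what lets a GPU run them in parallel.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (values[i] > values[partner]) == ascending:
                        values[i], values[partner] = values[partner], values[i]
            j //= 2
        k *= 2
    return values


def hybrid_sort(values, chunk_size=4):
    """Sort fixed-size chunks with the bitonic network, then merge on the CPU."""
    chunks = [bitonic_sort(values[i:i + chunk_size])
              for i in range(0, len(values), chunk_size)]
    return list(heapq.merge(*chunks))


data = [7, 3, 9, 1, 8, 2, 6, 4]
print(hybrid_sort(data))   # [1, 2, 3, 4, 6, 7, 8, 9]
```

The split mirrors the division of labour named on the slide: the fixed, data-independent comparison pattern suits the GPU, while merging already-sorted chunks is a sequential job that the CPU handles well.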

Page 17: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

Agenda

1. GPU characteristics, and why GPUs?

2. What mediates between PostgreSQL and the GPU

3. How the GPU processes SQL workloads

4. Performance results and analysis

5. Further development

Page 18: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

Asynchronous & Pipelining Execution (1/3)

▌Karma of discrete GPU device

The data to be processed has to be moved to the device first.

Thus, people say data-intensive tasks are not a good fit for GPUs.

That is correct, but we can reduce the impact.

Our solution: pipelining

[Figure: timeline of CPU and GPU activity, showing DMA Send, GPU kernel execution and DMA Recv over time.]

Page 19: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

Asynchronous & Pipelining Execution (2/3)

[Figure: a table scan is split into chunks (chunk-i, chunk-(i+1), chunk-(i+2), chunk-(i+3)). Each chunk passes through four stages: Buffer Read, DMA Send, GPU Kernel Exec and DMA Recv, after which processing moves to the next chunk. Across Step-i to Step-(i+4), consecutive chunks are one stage apart, so the stages of different chunks overlap in time.]
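The staggered schedule in this figure can be reproduced with a short Python sketch. It assumes an idealized pipeline in which every stage takes exactly one step, which real DMA transfers and kernels will not do; the function name `pipeline_schedule` is illustrative.

```python
STAGES = ["Buffer Read", "DMA Send", "GPU Kernel Exec", "DMA Recv"]


def pipeline_schedule(n_chunks):
    """Return {step: [(chunk, stage), ...]} for an ideal 4-stage pipeline."""
    schedule = {}
    for chunk in range(n_chunks):
        for offset, stage in enumerate(STAGES):
            # Chunk k enters the first stage at step k and then advances
            # one stage per step.
            schedule.setdefault(chunk + offset, []).append((chunk, stage))
    return schedule


n_chunks = 4
sched = pipeline_schedule(n_chunks)
for step in sorted(sched):
    print(step, sched[step])

pipelined_steps = len(sched)                 # n_chunks + len(STAGES) - 1
serial_steps = n_chunks * len(STAGES)        # one stage at a time, no overlap
print(pipelined_steps, serial_steps)         # 7 16
```

At step 3 all four stages are busy with four different chunks, which is the steady state the figure depicts. With many chunks the total approaches one step per chunk instead of four.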

Page 20: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

Asynchronous & Pipelining Execution (3/3)

▌Downsides of pipelining

Workloads that are too small

Pipeline hazards

Page 21: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

SQL Logic on GPU (1/3) – GpuHashJoin

[Figure: built-in Hash-Join vs. GpuHashJoin of PG-Strom. In both, a hash table is built from the inner relation, probed with the outer relation, and the result is passed to the next step.

Built-in Hash-Join: sequential hash-table search by CPU, then sequential projection by CPU.

GpuHashJoin of PG-Strom: parallel hash-table search by GPU, then parallel projection by GPU. All the CPU does is reference the result of the join.]
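The build and probe phases compared in this figure can be sketched in plain Python. The table and column names (`l_tbl`, `r_tbl`, `lid`, `rid`) follow the example query from the "What is PG-Strom" slide; the function names and the dictionary-based rows are illustrative assumptions. The code runs sequentially; the comment marks the loop whose iterations are independent and could therefore be spread over GPU threads.

```python
from collections import defaultdict


def build_hash_table(inner, key):
    """Build phase: group inner rows by join key."""
    table = defaultdict(list)
    for row in inner:
        table[row[key]].append(row)
    return table


def probe_one(table, outer_row, key):
    """Probe + projection for ONE outer row; independent of every other row."""
    return [{**outer_row, **inner_row}
            for inner_row in table.get(outer_row[key], [])]


def hash_join(inner, outer, inner_key, outer_key):
    table = build_hash_table(inner, inner_key)
    joined = []
    # A built-in hash join walks this loop sequentially on the CPU. Because
    # the iterations do not depend on each other, a GPU can give each outer
    # row its own thread.
    for outer_row in outer:
        joined.extend(probe_one(table, outer_row, outer_key))
    return joined


r_tbl = [{"rid": 1, "rval": "a"}, {"rid": 2, "rval": "b"}]
l_tbl = [{"lid": 2, "lval": "x"}, {"lid": 3, "lval": "y"}, {"lid": 1, "lval": "z"}]
result = hash_join(r_tbl, l_tbl, "rid", "lid")
print(result)
```

The row with `lid = 3` finds no partner and is dropped, as expected for an inner join.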

Page 22: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

OT: Non-uniformly Distributed Hash Table

[Figure: GPU Core-0 to Core-M each take one item of the outer relation (Item[0] to Item[M]), apply the hash function, and probe the inner hash table with the resulting hash value. The inner relation is skewed: many inner items share the hash value matched by Item[3], so that single outer item produces one result row per matching inner item (Item[3] paired with each of them).]

Overflow of Result Buffer

Page 23: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

OT: Non-uniformly Distributed Hash Table

[Figure: the same setup as on the previous slide, but the inner hash table is divided into virtual partitions with extra randomness. A probe against one virtual partition yields only a few of the rows for Item[3].]

Virtual Partition with Extra Randomness

Just fit for Result Buffer

× Retry for each virtual partition
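The overflow-and-retry idea on these two slides can be illustrated with a small Python sketch. It assumes a fixed-capacity result buffer, assigns inner rows to virtual partitions by row position (so rows that share a hash value end up in different partitions), and doubles the partition count after an overflow. The doubling policy and all names are assumptions for illustration; the slides only state that the join is retried per virtual partition.

```python
def probe_partition(inner, outer, part, n_parts, capacity):
    """Join outer rows against one virtual partition of the inner rows.

    Returns None when the result would not fit into the fixed-size buffer.
    """
    result = []
    for o in outer:
        for idx, i in enumerate(inner):
            # Virtual partition: chosen from the row position, not the join
            # key, so rows sharing one hash value are spread across partitions.
            if idx % n_parts != part or i != o:
                continue
            if len(result) == capacity:
                return None          # result buffer overflow
            result.append((o, i))
    return result


def join_with_retry(inner, outer, capacity):
    n_parts = 1
    while True:
        parts = [probe_partition(inner, outer, p, n_parts, capacity)
                 for p in range(n_parts)]
        if all(p is not None for p in parts):
            return [pair for p in parts for pair in p], n_parts
        if n_parts >= len(inner):
            raise ValueError("result buffer too small even for one-row partitions")
        n_parts *= 2                 # retry with finer virtual partitions


inner = [3] * 10 + [1, 2]            # skewed: ten inner items match key 3
outer = [0, 1, 2, 3]
pairs, n_parts = join_with_retry(inner, outer, capacity=4)
print(len(pairs), n_parts)           # 12 4
```

With a buffer of 4 rows, one and two partitions both overflow because of the ten duplicates of key 3; four partitions produce three rows each and fit.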

Page 24: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

SQL Logic on GPU (2/3) – GpuNestLoop (unparameterized)

[Figure: two-dimensional GPU kernel launch. The outer relation (Nx: usually larger; split chunk by chunk on demand) lies along blockDim.x and the inner relation (Ny: relatively small) along blockDim.y. Each thread, for example Thread (X=2, Y=3), evaluates one pair of outer and inner rows.]

Only edge threads reference DRAM to fetch values.

Nx:32 x Ny:32 = 1024

A 32 x 32 matrix (1024 pairs) can be evaluated with only 64 DRAM accesses.
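The arithmetic on this slide (1024 pair evaluations from 64 DRAM accesses) can be checked with a small Python model of one 32 x 32 tile. The model treats two Python lists as on-chip shared memory and counts one DRAM fetch per value loaded into them. Reading "edge threads" as the threads in the first row and first column of the block is an assumption, and `nestloop_tile` is an illustrative name.

```python
def nestloop_tile(outer_chunk, inner_chunk, predicate):
    """Evaluate every (outer, inner) pair of one tile, counting DRAM fetches."""
    dram_fetches = 0
    # Edge threads load one value each into on-chip shared memory,
    # modelled here as two small lists.
    shared_outer = []
    for value in outer_chunk:
        shared_outer.append(value)
        dram_fetches += 1
    shared_inner = []
    for value in inner_chunk:
        shared_inner.append(value)
        dram_fetches += 1
    # Every thread (x, y) then reads only shared memory.
    matches = [(x, y)
               for x, o in enumerate(shared_outer)
               for y, i in enumerate(shared_inner)
               if predicate(o, i)]
    pairs_evaluated = len(shared_outer) * len(shared_inner)
    return matches, pairs_evaluated, dram_fetches


outer = list(range(32))          # Nx = 32
inner = list(range(0, 64, 2))    # Ny = 32
matches, pairs, fetches = nestloop_tile(outer, inner, lambda o, i: o == i)
print(pairs, fetches)            # 1024 64
```

Without the shared copies, each of the 1024 threads would fetch its own two operands, so the tile layout cuts DRAM traffic from 2048 fetches to 64 in this model.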

Page 25: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

SQL Logic on GPU (3/3) – GpuPreAgg

▌GpuPreAgg, placed before Aggregate, reduces the number of rows to be processed.

▌The query is rewritten so that the intermediate aggregation still yields equivalent results.

SELECT count(*), AVG(X), Y FROM Tbl_1 GROUP BY Y;

Plan without GpuPreAgg (bottom to top):

Tbl_1 ... X, Y, Z (table definition)

SeqScan ... outputs X, Y; several million rows

Sort ... outputs X, Y; several million rows

GroupAggregate ... outputs Y, count(*), avg(X); several rows

Result Set

Plan with GpuPreAgg (bottom to top):

Tbl_1 ... X, Y, Z (table definition)

GpuScan ... outputs X, Y; several million rows

GpuPreAgg ... outputs Y, nrows, psum_X; several hundred rows (reduction of rows to be processed)

Sort ... outputs Y, nrows, psum_X; several hundred rows

GroupAggregate ... outputs Y, sum(nrows), avg_ex(nrows, psum); several rows

Result Set
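The rewrite of `count(*)` and `AVG(X)` into partial results can be sketched in Python. Each chunk is reduced independently to one `(nrows, psum_X)` pair per group, and the final step combines those pairs, which is what `sum(nrows)` and `avg_ex(nrows, psum)` do in the plan. The function names are illustrative, and NULL handling (the `x IS NOT NULL` count visible in the EXPLAIN output on an earlier slide) is left out for brevity.

```python
from collections import defaultdict


def pre_aggregate(chunk):
    """Per-chunk partial aggregation: (Y, X) rows -> {Y: [nrows, psum_X]}."""
    partial = defaultdict(lambda: [0, 0.0])
    for y, x in chunk:
        partial[y][0] += 1
        partial[y][1] += x
    return partial


def final_aggregate(partials):
    """CPU side: sum(nrows) and avg_ex(nrows, psum) over the partial rows."""
    total = defaultdict(lambda: [0, 0.0])
    for partial in partials:
        for y, (nrows, psum) in partial.items():
            total[y][0] += nrows
            total[y][1] += psum
    return {y: (nrows, psum / nrows) for y, (nrows, psum) in total.items()}


rows = [("a", 1.0), ("b", 4.0), ("a", 3.0), ("b", 6.0), ("a", 5.0)]
chunks = [rows[:3], rows[3:]]                 # two chunks, reduced independently
result = final_aggregate(pre_aggregate(c) for c in chunks)
print(result)   # {'a': (3, 3.0), 'b': (2, 5.0)}
```

Carrying the row count and the partial sum instead of a partial average is what keeps the result exact: averaging per-chunk averages would give the wrong answer whenever chunks contribute different numbers of rows to a group.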

Page 26: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

Agenda

1. GPU characteristics, and why GPUs?

2. What mediates between PostgreSQL and the GPU

3. How the GPU processes SQL workloads

4. Performance results and analysis

5. Further development

Page 27: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

Microbenchmark (1/2) – total response time

▌SELECT cat, AVG(x) FROM t0 NATURAL JOIN t1 [, ...] GROUP BY cat;

Measures the query response time as the number of inner relations increases

t0: 100M rows; t1 to t10: 100K rows each; all the data was preloaded.

▌Environment:

PostgreSQL v9.5beta1 + PG-Strom (22-Oct), CUDA 7.0 + RHEL6.6 (x86_64)

CPU: Xeon E5-2670v3, RAM: 384GB, GPU: NVIDIA TESLA K20c (2496 cores)

Query Response Time vs. Number of Tables Joined (query response time in seconds)

# of tables involved   PostgreSQL 9.5β   PG-Strom
2                       58.89            12.44
3                       82.51            17.88
4                      102.15            17.56
5                      125.46            21.98
6                      159.83            33.33
7                      194.50            38.14
8                      241.88            50.29
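The speed-up implied by the chart values can be computed directly. The short Python snippet below assumes that the first series of numbers on the slide belongs to PostgreSQL 9.5β and the second to PG-Strom, following the order of the chart legend; the slide text itself does not label the series.

```python
tables = [2, 3, 4, 5, 6, 7, 8]
# Assumed mapping: first series = PostgreSQL 9.5 beta, second = PG-Strom.
postgres_sec = [58.89, 82.51, 102.15, 125.46, 159.83, 194.50, 241.88]
pgstrom_sec = [12.44, 17.88, 17.56, 21.98, 33.33, 38.14, 50.29]

speedups = [pg / strom for pg, strom in zip(postgres_sec, pgstrom_sec)]
for n, s in zip(tables, speedups):
    print(f"{n} tables: {s:.2f}x")
```

Under that assumption the ratio stays between roughly 4.6x and 5.8x across the whole range, peaking at 4 and 5 tables.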

Page 28: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

Microbenchmark (2/2) – Time consumption per component

▌Good acceleration of JOIN and AGGREGATION

▌The acceleration ratio drops when N is larger than 6

This is caused by inaccurate problem size estimation. Why?

[Figure: Time consumption per component (PostgreSQL v9.5β vs PG-Strom). Stacked bars of query response time [sec] for PgSQL and Strom at 2 to 8 tables involved, broken down into Scan, Join, Aggregate and Others.]

Page 29: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

OT: Why accurate problem size estimation is important

▌GpuJoin results must be smaller than the GPU RAM size.

▌Virtual inner partitioning keeps the size of the join results down.

Better size estimation reduces the number of retries.

We can determine the problem size using dynamic parallelism.

Downside: supported only by Kepler or later generations.

[Figure: GpuJoin with virtual inner-partition and iteration. outerRel × innerRel (depth-1) × innerRel (depth-2) × innerRel (depth-3) × innerRel (depth-4) = Results. In one case the result size is larger than GPU RAM; in the other it stays within GPU RAM.]

Page 30: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

TPC-DS Benchmark (1/2) – Total Results

▌What is TPC-DS?

Successor of TPC-H; defines 103 (99+4) kinds of queries

Simulates decision support queries in the retail industry

▌About this benchmark

The series of TPC-DS queries was run sequentially.

We could collect 96 results out of 103.

Environment:

• PostgreSQL v9.5beta1 + PG-Strom (22-Oct), CUDA 7.0 + RHEL6.6 (x86_64)

• CPU: Xeon E5-2670v3, RAM: 384GB, GPU: NVIDIA TESLA K20c (2496 cores)

[Figure: query response time [sec] per TPC-DS query, PostgreSQL v9.5β vs PG-Strom, on a scale of 0 to 2000; one result is marked "anomaly?".]

Page 31: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

TPC-DS Benchmark (2/2) – Total Result

▌Summary

PG-Strom beats PostgreSQL: 76 of 96

PostgreSQL beats PG-Strom: 20 of 96

PG-Strom shows an advantage on most of the simple queries

▌Challenges

Wiser plan construction, to keep GpuXXX nodes out of places where they should not be

Functionality improvement for Aggregation and Scan

[Figure: query response time [sec] per TPC-DS query, PostgreSQL v9.5β vs PG-Strom, on a scale of 0 to 500.]

Page 32: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

Time consumption per workloads (1/3) – PostgreSQL v9.5

▌Scan, Join and Aggregation together clearly explain where the time goes in TPC-DS workloads

[Figure: Time consumption per workloads (PostgreSQL v9.5beta). 100% stacked bars per query, split into Scan, Join, Aggregate and Others.]

Page 33: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

Time consumption per workloads (2/3) – PG-Strom

▌PG-Strom dramatically reduced the time spent in joins.

▌Share of time spent in joins: 37% → 17%

[Figure: "Time consumption per workloads (PostgreSQL v9.5beta)". 100% stacked bars per query, split into Scan, Join, Aggregate and Others.]

Page 34: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

Time consumption per workloads (3/3) – Wow! GpuJoin

The improvement in join time explains 84% of the overall performance improvement

[Figure: scatter plot with linear fit y = 0.962x - 7.289, R² = 0.8442.

x-axis: Improvement of Join Time [sec] = (Join time of PgSQL) – (Join time of PG-Strom)

y-axis: Improvement of Total Time [sec] = (Total time of PgSQL) – (Total time of PG-Strom)

Both axes run from -50.00 to 250.00.]

Page 35: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

Agenda

1. GPU characteristics, and why GPUs?

2. What mediates between PostgreSQL and the GPU

3. How the GPU processes SQL workloads

4. Performance results and analysis

5. Further development

Page 36: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

What we learn from TPC-DS

▌GpuJoin may be a killer feature of PG-Strom

... as long as the problem size estimation is accurate.

The planner needs to be wiser and must not inject GpuJoin in situations where it is not suitable.

▌For Aggregation

GpuPreAgg did not appear as often as we expected

Is there anything the GPU can do to help CPU aggregation?

▌For Scan

Scan performance == data stream bandwidth of GPU input

A single thread limits the bandwidth of the data supply

Page 37: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

Feature enhancements / improvements

▌Target-list calculation on GPU

Pre-calculation of hash values for HashAggregate

Projection offload for complicated mathematical expressions

▌Custom-Scan under Gather node

Allows multiple workers to provide data streams to GPUs

▌Partial Aggregation before Join

Wider use of GpuPreAgg to reduce the number of rows to be joined by CPU

▌Utilization of dynamic parallelism

A smarter approach to the result buffer overflow problem in Join

▌... and more...

Page 38: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

Short term development

1. Software quality improvement

2. Corner case handling

3. Documentation

4. Target-List calculation in GPU

5. Software quality improvement

Page 39: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

WIP Project – In-place Computing

Why export the entire data set just to run a mathematical algorithm?

▌Run complicated mathematical expressions on GPUs, so that the CPU just references the result

Algorithms written in SQL are moved to GPUs automatically, even if only partially

WIP: data-clustering algorithms, like Min-Max, K-means, ...

[Figure: Principle of Near-Data Computing. In one flow, the data is exported from a database with limited computing capability to an external mathematical algorithm, which produces the result (the things we want to know). In the other, the mathematical algorithm runs next to the data and only the result is returned.]

Page 40: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

Future Direction of PG-Strom

Real-life workloads are the only compass for development

Page 41: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

WE WANT YOU TO GET INVOLVED

▌PostgreSQL Users

who are concerned about query execution performance on their workloads

▌Application Developers

who are interested in trying yet another database acceleration methodology

▌PostgreSQL Developers

who are willing to bet on the future of the heterogeneous era

Page 42: GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

Resources

▌Github

https://github.com/pg-strom/devel

https://github.com/pg-strom/devel/issues

▌Documentation

Under construction.... sorry

▌Today’s slides

Will be uploaded soon; check #pgconfeu on Twitter

▌Author’s contact

e-mail: [email protected]

Twitter: @kkaigai
