SQL vs GPU

10
1 SQL vs GPU Richard Wilton Johns Hopkins University Department of Physics and Astronomy

description

SQL vs GPU. Richard Wilton Johns Hopkins University Department of Physics and Astronomy. Rationale. Terabyte-scale data is best managed in a DBMS Some classes of computation are best implemented using a GPU. How to do GPU computation in a terabyte-scale database. - PowerPoint PPT Presentation

Transcript of SQL vs GPU

Page 1: SQL vs GPU

1

SQL vs GPU

Richard Wilton

Johns Hopkins University

Department of Physics and Astronomy

Page 2: SQL vs GPU

2

Rationale

• Terabyte-scale data is best managed in a DBMS• Some classes of computation are best implemented

using a GPU

Page 3: SQL vs GPU

3

How to do GPU computation in a terabyte-scale database

• Do the computation “outside” the database (export the data / compute / re-import the data)

• Do the computation “inside” the database• Something in between

Page 4: SQL vs GPU

IPCSQL code

SQLCLR procedure or function

Special-case functionality

SQL Server Out-of-process server

SQLCLR out-of-process server

Why?• Avoid memory, threading, and permissions restrictions on SQLCLR implementations• Load dynamic-link libraries• Invoke native-code methods• Exploit lower-level APIs (e.g. SqlClient, bulk insert) to facilitate data movement between SQL Server and non-SQL computational resources

The basic concept• Implement computational functionality in a separate process from the SQL Server process• Access that functionality using interprocess communication (IPC)

Page 5: SQL vs GPU

5

IPCSQL code

SQLCLR procedure CUDA functionality

SQL Server Out-of-process server

declare @sql nvarchar(max) set @sql = N'exec SqA1.dbo.SWGPerf @qOffset=311, @chrNum=7, @dimTile=128' exec dbo.ExecSWG @sqlCmd=@sql, @targetTable='##tmpX08'

SQL

SQLCLR implementation (in C#)• Allocates IPC buffer• Initializes IPC buffer with the result set from a specified SQL query • Signals out-of-process server• Waits for response from out-of-process server• Returns data from the IPC buffer as a SQL result set

Out-of-process server implementation (in C#)

CUDA implementation (in C++)

• Loads the CUDA implementation (dynamic link library)• Invokes a method in the DLL with a reference to the IPC buffer• Waits for completion• Signals the SQLCLR implementation• Stand-alone application or Windows service

• Implements GPU functionality• Uses the IPC buffer for both incoming and outgoing data

SQLCLR out-of-process server implementation

Page 6: SQL vs GPU

6

A test case: genomic sequence data

• Nucleic acid sequence data• Data is processed in a “loose” workflow by a

sequence of software tools• Example:

– Align short sequences to a long reference sequence– Identify alignment regions of interest– Map regions of interest to annotation data (genome

map, known variations, etc.)

Page 7: SQL vs GPU

7

A test case: genomic sequence data

• Typical data-management scenario:– All data is maintained in file-system files– Processing software reads and writes files– “Database-like” operations (e.g. joins) are performed

using command-line tools (e.g. Perl scripts)

• Can we use a DBMS as a repository?– All data is maintained in the DBMS– Processing software reads and writes tables– “Database-like” operations (e.g. joins) are performed

using SQL

Page 8: SQL vs GPU

8

A test case: genomic sequence data

• Nucleic acid sequence data• DBMS: data repository• GPU: implementation of sequence alignment algorithms

– Gapped alignment: Smith Waterman– Non-gapped alignment: bit vector mapping through a hash table

Page 9: SQL vs GPU

9

A test case: genomic sequence data

• Results: so far, so good …– Sequence data can be managed using a DBMS

• Use low-level (C#, C++) implementations to manipulate string and binary data

– Sequence data from the DBMS can be processed using GPU implementations

• GPU sequence alignment implementations can exploit task parallelism and execute 10x (or more) faster than equivalent CPU implementations

• But …– DBMS data abstractions (tables, columns) must be mapped to

and from native GPU data formats and layouts– Data movement out of and into the DBMS is slow

Page 10: SQL vs GPU

10

Going forward…

• Compromise: maintain GPU-formatted data as BLOBs in the file system– BLOBs are in database “native” format

– BLOBs correspond to database tables or table partitions

• Much programming and testing remain to be done…