Satisfying Data-Intensive Queries Using GPU Clusters

9
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY HPCDB 2012 - Satisfying Data-Intensive Queries Using GPU Clusters November 11 th , 2012 1 Satisfying Data-Intensive Queries Using GPU Clusters Haicheng Wu, Jeff Young Sudhakar Yalamanchili Computer Architecture and Systems Laboratory Center for Experimental Research in Computer Systems School of Electrical and Computer Engineering Georgia Institute of Technology Sponsors: AIC, AMD, LogicBlox Inc., National Science Foundation, NEC, NVIDIA

description

Satisfying Data-Intensive Queries Using GPU Clusters. Haicheng Wu, Jeff Young Sudhakar Yalamanchili Computer Architecture and Systems Laboratory Center for Experimental Research in Computer Systems School of Electrical and Computer Engineering Georgia Institute of Technology. - PowerPoint PPT Presentation

Transcript of Satisfying Data-Intensive Queries Using GPU Clusters

Page 1: Satisfying Data-Intensive Queries Using GPU Clusters

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

HPCDB 2012 - Satisfying Data-Intensive Queries Using GPU Clusters

November 11th, 2012 1

Satisfying Data-Intensive Queries Using GPU Clusters

Haicheng Wu, Jeff YoungSudhakar Yalamanchili

Computer Architecture and Systems LaboratoryCenter for Experimental Research in Computer Systems

School of Electrical and Computer EngineeringGeorgia Institute of Technology

Sponsors: AIC, AMD, LogicBlox Inc., National Science Foundation, NEC, NVIDIA

Page 2: Satisfying Data-Intensive Queries Using GPU Clusters

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

HPCDB 2012 - Satisfying Data-Intensive Queries Using GPU Clusters

November 11th, 2012 2

Application: Data WarehousingOn-line and off-line analysis

Retail analysis Forecasting Pricing Etc…

Combination of relational data queries and computational kernels

Current applications process 1 to 50 TBs of data [1]

Techniques can be applied to other “Big Data” problems like irregular graphs, sorting

[1] Independent Oracle Users Group. A New Dimension to Data Warehousing: 2011 IOUG Data Warehousing Survey.

……LargeQty(p) <-

Qty(q), q > 1000.

……

Page 3: Satisfying Data-Intensive Queries Using GPU Clusters

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

HPCDB 2012 - Satisfying Data-Intensive Queries Using GPU Clusters

November 11th, 2012 3

Proposed System Model

Red Fox: Compilation and optimization of queries for GPUs Remove need for application developer to optimize applications to run on

GPUs Oncilla: Global Address Space (GAS) layer

Create an API to simplify data movement and scheduling

Page 4: Satisfying Data-Intensive Queries Using GPU Clusters

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

HPCDB 2012 - Satisfying Data-Intensive Queries Using GPU Clusters

November 11th, 2012 4

Red Fox Compilation Flow

RA-to-PTX(nvcc + RA-

Lib)

Primitives Library

Runtime Manager

LogicBlox Front-

End

Language Front-

End

Translation Layer

Back-End

Datalog Queries

Query Plan PTX/Binary Kernel

Kernel Weaver

Page 5: Satisfying Data-Intensive Queries Using GPU Clusters

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

HPCDB 2012 - Satisfying Data-Intensive Queries Using GPU Clusters

November 11th, 2012 5

Relational Algebra Primitives on GPUs

Multi-stage algorithm

Simple primitives are close to maximum performance

More complex primitives could show better performance with newer implementations (in progress with NVIDIA Research)

Raw Performance (NVIDIA C2050)

Fastest known for GPUs!

Practical MAX

Page 6: Satisfying Data-Intensive Queries Using GPU Clusters

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

HPCDB 2012 - Satisfying Data-Intensive Queries Using GPU Clusters

November 11th, 2012 6

Red Fox: TPC-H Q1 Results

GPU computation scales well with problem sizeImproved primitives could lead to further 10x speedup

Page 7: Satisfying Data-Intensive Queries Using GPU Clusters

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

HPCDB 2012 - Satisfying Data-Intensive Queries Using GPU Clusters

November 11th, 2012 7

Oncilla: Fabrics for Accelerator Clouds

Goal: Transparent, efficient host memory aggregation across node for accelerators

Solution: Use Global Address Spaces (GAS) and commodity fabrics (HT, QPI, PCIe, 10GE, IB)

Support in-core databases using software from Red Fox project

Companies: LogicBlox, NVIDIA, AIC

Page 8: Satisfying Data-Intensive Queries Using GPU Clusters

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

HPCDB 2012 - Satisfying Data-Intensive Queries Using GPU Clusters

November 11th, 2012 8

Oncilla aims to combine support for multiple types of data transfer and CUDA-based optimizations under a simplified runtime.

Ex: “oncilla_malloc(2 GB, node2, gpumem)” Enable application developers and schedulers to take advantage of

high-performance GAS without needing to be experts in specialized hardware

Oncilla: Efficient Data Movement

Page 9: Satisfying Data-Intensive Queries Using GPU Clusters

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

HPCDB 2012 - Satisfying Data-Intensive Queries Using GPU Clusters

November 11th, 2012 9

Questions?

For more information:

Red Fox:H. Wu, G. Diamos, H. Cadambi, and S. Yalamanchili, “KernelWeaver: Automatically Fusing

Database Primitives for Efficient GPU Computation,” MICRO, December 2012

http://gpuocelot.gatech.edu/projects/compiler-projects/

Oncilla:http://gpuocelot.gatech.edu/projects/compiler-projects/oncilla-gas-infrastructure/

J. Young, S. Yalamanchili, Commodity Converged Fabrics for Global Address Spaces in Accelerator Clouds,” HPCC, June, 2012