Satisfying Data-Intensive Queries Using GPU Clusters
description
Transcript of Satisfying Data-Intensive Queries Using GPU Clusters
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY
HPCDB 2012 - Satisfying Data-Intensive Queries Using GPU Clusters
November 11th, 2012 1
Satisfying Data-Intensive Queries Using GPU Clusters
Haicheng Wu, Jeff YoungSudhakar Yalamanchili
Computer Architecture and Systems LaboratoryCenter for Experimental Research in Computer Systems
School of Electrical and Computer EngineeringGeorgia Institute of Technology
Sponsors: AIC, AMD, LogicBlox Inc., National Science Foundation, NEC, NVIDIA
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY
HPCDB 2012 - Satisfying Data-Intensive Queries Using GPU Clusters
November 11th, 2012 2
Application: Data WarehousingOn-line and off-line analysis
Retail analysis Forecasting Pricing Etc…
Combination of relational data queries and computational kernels
Current applications process 1 to 50 TBs of data [1]
Techniques can be applied to other “Big Data” problems like irregular graphs, sorting
[1] Independent Oracle Users Group. A New Dimension to Data Warehousing: 2011 IOUG Data Warehousing Survey.
……LargeQty(p) <-
Qty(q), q > 1000.
……
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY
HPCDB 2012 - Satisfying Data-Intensive Queries Using GPU Clusters
November 11th, 2012 3
Proposed System Model
Red Fox: Compilation and optimization of queries for GPUs Remove need for application developer to optimize applications to run on
GPUs Oncilla: Global Address Space (GAS) layer
Create an API to simplify data movement and scheduling
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY
HPCDB 2012 - Satisfying Data-Intensive Queries Using GPU Clusters
November 11th, 2012 4
Red Fox Compilation Flow
RA-to-PTX(nvcc + RA-
Lib)
Primitives Library
Runtime Manager
LogicBlox Front-
End
Language Front-
End
Translation Layer
Back-End
Datalog Queries
Query Plan PTX/Binary Kernel
Kernel Weaver
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY
HPCDB 2012 - Satisfying Data-Intensive Queries Using GPU Clusters
November 11th, 2012 5
Relational Algebra Primitives on GPUs
Multi-stage algorithm
Simple primitives are close to maximum performance
More complex primitives could show better performance with newer implementations (in progress with NVIDIA Research)
Raw Performance (NVIDIA C2050)
Fastest known for GPUs!
Practical MAX
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY
HPCDB 2012 - Satisfying Data-Intensive Queries Using GPU Clusters
November 11th, 2012 6
Red Fox: TPC-H Q1 Results
GPU computation scales well with problem sizeImproved primitives could lead to further 10x speedup
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY
HPCDB 2012 - Satisfying Data-Intensive Queries Using GPU Clusters
November 11th, 2012 7
Oncilla: Fabrics for Accelerator Clouds
Goal: Transparent, efficient host memory aggregation across node for accelerators
Solution: Use Global Address Spaces (GAS) and commodity fabrics (HT, QPI, PCIe, 10GE, IB)
Support in-core databases using software from Red Fox project
Companies: LogicBlox, NVIDIA, AIC
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY
HPCDB 2012 - Satisfying Data-Intensive Queries Using GPU Clusters
November 11th, 2012 8
Oncilla aims to combine support for multiple types of data transfer and CUDA-based optimizations under a simplified runtime.
Ex: “oncilla_malloc(2 GB, node2, gpumem)” Enable application developers and schedulers to take advantage of
high-performance GAS without needing to be experts in specialized hardware
Oncilla: Efficient Data Movement
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY
HPCDB 2012 - Satisfying Data-Intensive Queries Using GPU Clusters
November 11th, 2012 9
Questions?
For more information:
Red Fox:H. Wu, G. Diamos, H. Cadambi, and S. Yalamanchili, “KernelWeaver: Automatically Fusing
Database Primitives for Efficient GPU Computation,” MICRO, December 2012
http://gpuocelot.gatech.edu/projects/compiler-projects/
Oncilla:http://gpuocelot.gatech.edu/projects/compiler-projects/oncilla-gas-infrastructure/
J. Young, S. Yalamanchili, Commodity Converged Fabrics for Global Address Spaces in Accelerator Clouds,” HPCC, June, 2012