Quartile and Outlier Detection on Heterogeneous Clusters using Distributed Radix Sort
The Tradeoffs of Fused Memory Hierarchies in Heterogeneous Architectures
Maestro: Data Orchestration and Tuning for OpenCL Devices
Quantifying NUMA and Contention Effects in Multi-GPU Systems
Accelerating S3D: A GPGPU Case Study