Post on 13-May-2015
Ofer Rosenberg
The GPU continuum workshop, April 25 2013
THE GPGPU CONTINUUM
CONTENT
• Intel’s Compute Continuum
• GPGPU Evolution
• The GPGPU Continuum
• Mobile GPGPU challenges
• GPGPU Continuum challenges
• Towards the Continuum
INTEL’S “COMPUTE CONTINUUM” FROM IDC 2010
GPGPU EVOLUTION
2004 – Stanford University: Brook for GPUs
2006 – AMD releases CTM; NVIDIA releases CUDA
2008 – OpenCL 1.0 released
G80 – 346 GFLOPS
R580 – 375 GFLOPS
Nov 2009 – First hybrid supercomputer in the Top 10: the Chinese Tianhe-1
• 1,024 Intel Xeon E5450 CPUs
• 5,120 Radeon 4870 X2 GPUs
Nov 2010 – First hybrid supercomputer to reach #1 on the Top500 list: Tianhe-1A
• 14,336 Xeon X5670 CPUs
• 7,168 NVIDIA Tesla M2050 GPUs
Tianhe-1: 563 TFLOPS
Tianhe-1A: 2,566 TFLOPS
Source: http://www.top500.org/lists/
2013 – OpenCL on Nexus 4 (Qualcomm Adreno 320) and Nexus 10 (ARM Mali T604)
Android 4.2 adds GPU support for Renderscript
2014 – NVIDIA Tegra 5 will support CUDA
2013 – GPGPU Continuum becomes a reality
THE GPGPU CONTINUUM
• Apple A6 GPU: 25 GFLOPS, < 2 W
• AMD G-T16R: 46 GFLOPS*, 4.5 W
• Intel i7-3770: 511 GFLOPS*, 77 W
• NVIDIA GTX Titan: 4500 GFLOPS, 250 W
• ORNL Titan supercomputer: 27 PFLOPS, 8200 kW
* GFLOPS of CPU+GPU
Take Intel’s vision of the Compute Continuum and aspire to the same for the GPGPU continuum:
a common ecosystem built on a common (SW) architecture
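To make the breadth of the continuum concrete, the figures on this slide can be turned into performance per watt. The sketch below is only back-of-the-envelope arithmetic on the slide's own numbers; the variable and function names are made up for illustration, and the Apple A6 entry uses 2 W as an upper bound on power because the slide only says "< 2W", so its efficiency is a lower bound.

```python
# (name, peak GFLOPS, power in watts), copied from the slide.
DEVICES = [
    ("Apple A6 GPU", 25.0, 2.0),          # slide says "< 2W"; 2 W is an assumed upper bound
    ("AMD G-T16R", 46.0, 4.5),            # GFLOPS of CPU+GPU
    ("Intel i7-3770", 511.0, 77.0),       # GFLOPS of CPU+GPU
    ("NVIDIA GTX Titan", 4500.0, 250.0),
    ("ORNL Titan SC", 27e6, 8200e3),      # 27 PFLOPS and 8200 kW, converted to GFLOPS and W
]


def gflops_per_watt(devices):
    """Return a dict mapping device name to peak GFLOPS per watt."""
    return {name: gflops / watts for name, gflops, watts in devices}


efficiency = gflops_per_watt(DEVICES)

# Ratio between the fastest and the slowest entry in raw peak performance.
peaks = [gflops for _, gflops, _ in DEVICES]
perf_span = max(peaks) / min(peaks)

for name, value in efficiency.items():
    print(f"{name:18s} {value:6.2f} GFLOPS/W")
print(f"Peak performance span: {perf_span:,.0f}x")
```

On these numbers, raw peak performance spans roughly six orders of magnitude, while performance per watt stays within about one order of magnitude.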
INTRO TO LEADING MOBILE GPU VENDORS
Qualcomm Adreno 320
• Part of Snapdragon S4
• Unified Shader
• SIMD4 ?
• Supports OpenCL 1.1 (E)
• 50 GFLOPS
http://kyokojap.myweb.hinet.net/gpu_gflops/
Imagination PowerVR 543
• Used by Apple, Samsung, Motorola, Intel
• Unified Shaders
• Supports OpenCL 1.1 (E)
• 38 GFLOPS (Apple’s MP4 version)
ARM Mali T604
• 4 Cores
• Multiple “pipes” per core
• Supports OpenCL 1.1
• 68 GFLOPS
Vivante CG4000
• Unified Shaders
• 4 Cores, SIMD4 each
• Supports OpenCL 1.2
• 48 GFLOPS
NVIDIA Tegra 4
• 6 X 4-wide Vertex shaders
• 4 X 4-wide Pixel Shaders
• No GPGPU support
• 74 GFLOPS
MOBILE GPGPU CHALLENGES
• Many Different GPU Architectures
• Optimizing for each one sets a high bar on development costs
• Development Tools
• Immature (stability, performance)
• No common SDK / Debugger / Profiler (different per vendor)
• Ecosystem
• Lack of libraries, wizards, middleware → slow and expensive development
• Distribution Model
• Driver updates are part of the OS distribution (no more monthly updates…)
• End users are less likely to update, which sets higher standards for the stability and performance of each driver release
• Security – the unspoken issue (hole) …
GPGPU CONTINUUM CHALLENGES
• Many Different GPU Architectures
• Optimizing for each one sets a high bar on development costs
• Development Tools
• Immature (stability, performance)
• No common SDK / Debugger / Profiler (different per vendor)
• Ecosystem
• Lack of libraries, wizards, middleware → slow and expensive development
• Distribution Model
• End users are less likely to update, which sets higher standards for the stability and performance of each driver release
• Security – the unspoken issue (hole) …
These challenges are a barrier to GPGPU adoption across the continuum
TOWARDS THE CONTINUUM (1) - LANGUAGES
• Welcome to the GPGPU (SW) jungle …
Languages targeting the GPU (diagram):
• OpenCL
• CUDA
• DirectCompute
• RenderScript
• OpenACC
• C++ AMP
• Fortran
• Aparapi (Java)
• PyOpenCL, NumbaPro (Python)
• WebCL
A jungle of languages… but are these the right ones?
TOWARDS THE CONTINUUM (1) - LANGUAGES
• Current GPGPU languages are C/C++ based
• There are bindings for Python, Java, and JavaScript, but kernels are still written in C/C++
• Current developer trends:
• Managed languages (Java, C#)
• Scripting languages (Python, PHP)
• Higher abstraction & manageability:
• More room for tools to excel at optimization
• Mitigates differences between GPU architectures
Data from CodeEval.com, based on 100K+ code samples
https://sites.google.com/site/pydatalog/pypl/PyPL-PopularitY-of-Programming-Language
GPGPU languages need to evolve
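The point about bindings can be illustrated with PyOpenCL, one of the bindings named in this deck. In the sketch below the host side is Python, but the kernel is still an OpenCL C string. The function name `vadd` and the pure-Python fallback are assumptions added for illustration, so the example still runs on a machine without PyOpenCL, NumPy, or an OpenCL device.

```python
# The kernel is OpenCL C source, embedded in Python as a plain string.
KERNEL_SRC = """
__kernel void vadd(__global const float *a,
                   __global const float *b,
                   __global float *out)
{
    int i = get_global_id(0);
    out[i] = a[i] + b[i];
}
"""


def vadd(a, b):
    """Add two equal-length lists of floats, on an OpenCL device if one is usable."""
    try:
        import numpy as np
        import pyopencl as cl

        ctx = cl.create_some_context(interactive=False)
        queue = cl.CommandQueue(ctx)
        a_np = np.asarray(a, dtype=np.float32)
        b_np = np.asarray(b, dtype=np.float32)
        mf = cl.mem_flags
        a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a_np)
        b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b_np)
        out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a_np.nbytes)
        # The C kernel is compiled at run time by the vendor's OpenCL driver.
        program = cl.Program(ctx, KERNEL_SRC).build()
        program.vadd(queue, a_np.shape, None, a_buf, b_buf, out_buf)
        out = np.empty_like(a_np)
        cl.enqueue_copy(queue, out, out_buf)
        return [float(x) for x in out]
    except Exception:
        # Illustration-only fallback: no PyOpenCL, NumPy, or usable device.
        return [x + y for x, y in zip(a, b)]


result = vadd([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(result)
```

Everything the Python developer writes apart from the kernel is buffer and queue management; the computation itself remains in a C dialect, which is the gap this slide points at.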
TOWARDS THE CONTINUUM (2) - SOFTWARE STACK
• Most GPGPU languages already use the LLVM compilation framework
• Slight “flavors” of LLVM IR
• Most languages also possess a similar set of “API capabilities”
• Defining a common stack based on LLVM and a common API will:
• Improve the compiler
• Increase driver quality & stability
• Enable a unified debugger / profiler
• …
Stack (diagram): RenderScript, OpenCL, CUDA, OpenACC → LLVM IR → Vendor X IL → Vendor X GPU
Define GPGPU Virtual Machine based on LLVM
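The shape of such a common stack can be sketched with a toy model: several language front ends lower to one shared intermediate representation, and each vendor supplies a single back end for that representation. Nothing below is LLVM or any vendor's IL; the `Add` instruction, the two front ends, the two back ends, and the emitted text are all hypothetical stand-ins chosen to show the structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Add:
    """Toy stand-in for one instruction of a common IR: dst = lhs + rhs."""
    dst: str
    lhs: str
    rhs: str


def _parse(statement):
    # Accepts only the form "dst = lhs + rhs".
    dst, expr = statement.split("=")
    lhs, rhs = expr.split("+")
    return Add(dst.strip(), lhs.strip(), rhs.strip())


def c_like_frontend(src):
    """Hypothetical C-style front end: statements end with ';'."""
    return _parse(src.strip().rstrip(";"))


def script_like_frontend(src):
    """Hypothetical scripting-style front end: no terminator."""
    return _parse(src.strip())


def vendor_x_backend(ir):
    """Hypothetical vendor back end emitting register-style text."""
    return f"ADD r_{ir.dst}, r_{ir.lhs}, r_{ir.rhs}"


def vendor_y_backend(ir):
    """A second hypothetical vendor back end with a different syntax."""
    return f"vadd {ir.dst} {ir.lhs} {ir.rhs}"


FRONTENDS = {"c_like": c_like_frontend, "script_like": script_like_frontend}
BACKENDS = {"vendor_x": vendor_x_backend, "vendor_y": vendor_y_backend}


def compile_for(src, language, vendor):
    """Lower source to the shared IR, then hand it to one vendor back end."""
    return BACKENDS[vendor](FRONTENDS[language](src))


print(compile_for("c = a + b;", "c_like", "vendor_x"))
print(compile_for("c = a + b", "script_like", "vendor_y"))
```

With a shared IR, supporting N languages on M GPU families takes N front ends plus M back ends rather than N × M separate compilers, and tools such as a debugger or profiler can be written once against the common layer.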
TAKEAWAYS
• GPGPU Continuum is here - from Mobile devices to HPC
• Vision: A common ecosystem built on a common (SW)
architecture
• Challenges: many architectures, immature tools, ecosystem
QUESTIONS
• Q: What about “Heterogeneous Computing” ?
• A: Go back, replace each “GPGPU” with “Heterogeneous
Computing” – and it all fits…
• More ?
SOME SOURCES:
• http://www.nordichardware.com/CPU-Chipset/intel-core-i7-3770k-ivy-bridge-and-the-3d-transistor-is-here/New-graphics-the-biggest-news-in-Ivy-Bridge.html
• http://elrond.informatik.tu-freiberg.de/papers/WorldComp2012/PDP2833.pdf
• http://www.anandtech.com/show/6787/nvidia-tegra-4-architecture-deep-dive-plus-tegra-4i-phoenix-hands-on/5
• http://www.anandtech.com/show/5077/arms-malit658-gpu-in-2013-up-to-10x-faster-than-mali400
• http://www.chipdesignmag.com/pallab/2011/06/30/arm-mali-gpu-unifying-graphics-across-platforms/
• http://en.wikipedia.org/wiki/Adreno#Renaming_to_Adreno
• http://en.wikipedia.org/wiki/PowerVR#Series_5_.28SGX.29
• http://en.wikipedia.org/wiki/Mali_(GPU)
• http://johndayautomotivelectronics.com/?p=12412
• http://www.cnx-software.com/2013/01/19/gpus-comparison-arm-mali-vs-vivante-gcxxx-vs-powervr-sgx-vs-nvidia-geforce-ulp/
• http://www.brightsideofnews.com/print/2013/1/30/rise-of-vivante-fastest-tablet-gpu-on-the-market.aspx
• https://www.uplinq.com/2012/schedule/accelerating-your-android-application-renderscript-and-llvm-0
• http://www.androidauthority.com/adreno-320-features-performance-benchmarks-103269/