Post on 13-Jan-2022
GPGPU Computing
GPGPU ComputingGPGPU Computing
Yong Cao
GPGPU Computing
Wh G hi C d?Wh G hi C d?Why Graphics Card?Why Graphics Card?
It’s powerful!A quiet trend
Copyright © 2009 by Yong Cao
GPGPU Computing
Wh G hi C d?Wh G hi C d?Why Graphics Card?Why Graphics Card?
It’ f l!It’s powerful!Processor Processing Units FLOPs per
U itClock S d
Processing PowerUnit Speed
High End Qual-Core CPU
16 (4 per core) 2 3000 MHz 96 GFLOPs
NVIDIA GTX285 240 3 1476 MHz 1063 GFLOPs
ATI Radeon HD 4870 800 2 750 MHz 1200 GFLOPsATI Radeon HD 4870 800 2 750 MHz 1200 GFLOPs
Copyright © 2009 by Yong Cao
GPGPU Computing
Wh G hi C d?Wh G hi C d?Why Graphics Card?Why Graphics Card?
It’s powerful!It s powerful!Die size 576 mm2
1.4 Billion TransistorsMemory width: 512bitBandwidth 142GB/s
Max board Power:236 WGTX285: 183 W (55nm)
Copyright © 2009 by Yong Cao
GPGPU Computing
Wh G hi C d?Wh G hi C d?Why Graphics Card?Why Graphics Card?
It’ h d hIt’s cheap and everywhere.E.g. NIVIDA sold 100 Million high-end GPGPU de icesdevices.A 128-core Geforce 9800GTX (648 GFLOPs) is $129 on Newegg comon Newegg.com.
Copyright © 2009 by Yong Cao
GPGPU Computing
Wh G hi C d NOW?Wh G hi C d NOW?Why Graphics Card NOW?Why Graphics Card NOW?
B fBefore:Two years ago, everyone is using Graphics API (Cg, GLSL HLSL) for GPGPU programming GLSL, HLSL) for GPGPU programming. Restrict random-read (using Texture), NOT be able to random write (No pointer!)random-write. (No pointer!)
Copyright © 2009 by Yong Cao
GPGPU Computing
Wh G hi C d NOW?Wh G hi C d NOW?Why Graphics Card NOW?Why Graphics Card NOW?
NNow:NIVIDA released CUDA two years ago, since then
Th d f CUDA f iThousands of CUDA software engineersNew job title “CUDA programmer”Around 200 CUDA based technical publications! Around 200 CUDA based technical publications!
Why?Standard C languageStandard C languageSupport Pointer! Random read and write on GPU memory.Work with C++, Fortran,
Copyright © 2009 by Yong Cao
GPGPU Computing
Wh ’ GPU i th tWh ’ GPU i th tWhere’s GPU in the systemWhere’s GPU in the system
Copyright © 2009 by Yong Cao
GPGPU Computing
NVIDIA GPU A hit tNVIDIA GPU A hit tNVIDIA GPU ArchitectureNVIDIA GPU Architecture
Lots of ALUsLots of ALUsLots of ControlFocus on Graphics Focus on Graphics Applicants (Data parallel)( p )
Copyright © 2009 by Yong Cao
GPGPU Computing
NVIDIA GT200 A hit tNVIDIA GT200 A hit tNVIDIA GT200 ArchitectureNVIDIA GT200 Architecture
Thread Processing Cluster
Atomic Memory Access Control
Copyright © 2009 by Yong Cao
GPGPU Computing
NVIDIA GT200 A hit tNVIDIA GT200 A hit tNVIDIA GT200 ArchitectureNVIDIA GT200 Architecture
Multi-Processor
Copyright © 2009 by Yong Cao
GPGPU Computing
E ti M dE ti M d G hiG hi
Host
Execution ModeExecution Mode–– GraphicsGraphics
Vtx Thread Issue
Setup / Rstr / ZCull
Geom Thread Issue Pixel Thread Issue
Input Assembler
SP SP
roce
ssorSP SP SP SP SP SP SP SP SP SP SP SP SP SP
L1
TF
Thre
ad P
r
L1
TF
L1
TF
L1
TF
L1
TF
L1
TF
L1
TF
L1
TF
L2 L2 L2 L2 L2 L2
13
FB FB FB FB FB FB
Copyright © 2009 by Yong Cao, Referencing UIUC ECE498AL Course Notes
NVIDIA G80 GPU
GPGPU Computing
E ti M dE ti M d G l C ti g G l C ti g Host
Execution ModeExecution Mode–– General Computing General Computing
Thread Execution Manager
Input Assembler
Texture Texture Texture Texture Texture Texture Texture TextureTexture
Parallel DataCache
Parallel DataCache
Parallel DataCache
Parallel DataCache
Parallel DataCache
Parallel DataCache
Parallel DataCache
Parallel DataCache
Load/store Load/store Load/store Load/store Load/store Load/store
14
Global Memory
Copyright © 2009 by Yong Cao, Referencing UIUC ECE498AL Course Notes
NVIDIA G80 GPU
GPGPU Computing
GPGPU f G hi P i t f ViGPGPU f G hi P i t f ViGPGPU from Graphics Point of ViewGPGPU from Graphics Point of View
G hi P i g Pi li ( GPU)Graphics Processing Pipeline (on GPU)Fixed function pipelineProgrammable pipeline with Shaders
GPU Processing ModelStream computing model
GPGPU Computing
3D G hi A li ti3D G hi A li ti3D Graphics Applications3D Graphics Applications
DDemos.
GPGPU Computing
GPU F d t l Th G hi Pi liGPU F d t l Th G hi Pi liGPU Fundamentals: The Graphics PipelineGPU Fundamentals: The Graphics Pipeline
X Graphics StateS F F
Application Transform& Light Rasterize Fragment
ProcessingVideoMemory(T t )
Xformed, Lit Ve
AssemblePrimitives
Vertices (3
Screenspace tr
Fragments (pre
Final Pixels (C
o
GPUCPU
(Textures)
ertices (2D)
3D)
iangles (2D)
e-pixels)
olor, Depth)
A simplified graphics pipeline
GPUCPU Render-to-texture
Copyright © 2009 by Yong Cao, Referencing SIGGRAPH 2005 Course Notes from David Luebke
GPGPU Computing
P g bl G hi Pi li P g bl G hi Pi li Sh dSh d
X Graphics StateS F F
Programmable Graphics Pipeline Programmable Graphics Pipeline -- ShadersShaders
TransformApplication eRasterize Shade VideoMemory(T t )
Xformed, Lit Ve
AssemblePrimitives
Vertices (3
Screenspace tr
Fragments (pre
Final Pixels (C
oProcessorFragmentProcessor
VertexProcessor
GPUCPU
(Textures)
ertices (2D)
3D)
iangles (2D)
e-pixels)
olor, Depth)
GPUCPU Render-to-texture
Programmable vertex Programmable fragment processor! processor!
Copyright © 2009 by Yong Cao, Referencing SIGGRAPH 2005 Course Notes from David Luebke
GPGPU Computing
GPU Pi li T fGPU Pi li T fGPU Pipeline: TransformGPU Pipeline: TransformVertex processor (multiple in parallel)
Transform from “world space” to “image space”Compute per-vertex lighting
Copyright © 2009 by Yong Cao, Referencing SIGGRAPH 2005 Course Notes from David Luebke
GPGPU Computing
GPU Pi li R t iGPU Pi li R t iGPU Pipeline: RasterizeGPU Pipeline: Rasterize
RasterizerRasterizerConvert geometric rep. (vertex) to image rep. (fragment)( g )
Fragment = image fragmentPixel + associated data: color, depth, stencil, etc.
Interpolate per-vertex quantities across pixels
Copyright © 2009 by Yong Cao, Referencing SIGGRAPH 2005 Course Notes from David Luebke
GPGPU Computing
Fragment processors (multiple in parallel)
Pixel / Fragment ProcessorPixel / Fragment Processor
Fragment processors (multiple in parallel)Compute a color for each pixelOptionally read colors from textures (images)Optionally read colors from textures (images)
Copyright © 2009 by Yong Cao, Referencing SIGGRAPH 2005 Course Notes from David Luebke
GPGPU Computing
GPU P g i g M d l StGPU P g i g M d l StGPU Programming Model: StreamGPU Programming Model: Stream
St P g i g M d lStream Programming ModelStreams: Stream
An array of data unitsKernels:
Stream
Kernel
Take streams as input, produce streams at outputPerform computation on streamspKernels can be linked together
GPGPU Computing
Wh St ?Wh St ?Why Streams?Why Streams?
Ample computation by exposing parallelismStream expose data parallelism
Multiple stream elements can be processed in parallel
Pi li (t k) ll liPipeline (task) parallelismMultiple tasks can be processed in parallel
Efficient communicationEfficient communicationProducer-consumer localityPredictable memory access patterny p
Optimize for throughput of all elements, not latency of oneProcessing many elements at once allows latency hiding
GPGPU Computing
R di g M t i lR di g M t i lReading MaterialReading Material
NVIDIA CUDA P g i g G id 2 0 Ch t NVIDIA CUDA Programming Guide 2.0, Chapter Onehttp://www nvidia com/object/cuda develop html and looking http://www.nvidia.com/object/cuda_develop.html and looking for documentation
NVIDIA Geforce GTX 280 Technical BriefNVIDIA Geforce GTX 280 Technical Briefhttp://www.nvidia.com/docs/IO/55506/GeForce_GTX_200_GPU_Technical_Brief.pdf
Copyright © 2009 by Yong Cao