GPGPU Computing

GPGPU ComputingGPGPU Computing

Yong Cao

GPGPU Computing

Wh G hi C d?Wh G hi C d?Why Graphics Card?Why Graphics Card?

It’s powerful!A quiet trend

GPGPU Computing

It’ f l!It’s powerful!Processor Processing Units FLOPs per

U itClock S d

Processing PowerUnit Speed

High End Qual-Core CPU

16 (4 per core) 2 3000 MHz 96 GFLOPs

NVIDIA GTX285 240 3 1476 MHz 1063 GFLOPs

ATI Radeon HD 4870 800 2 750 MHz 1200 GFLOPsATI Radeon HD 4870 800 2 750 MHz 1200 GFLOPs

GPGPU Computing

It’s powerful!It s powerful!Die size 576 mm2

1.4 Billion TransistorsMemory width: 512bitBandwidth 142GB/s

Max board Power:236 WGTX285: 183 W (55nm)

GPGPU Computing

It’ h d hIt’s cheap and everywhere.E.g. NIVIDA sold 100 Million high-end GPGPU de icesdevices.A 128-core Geforce 9800GTX (648 GFLOPs) is $129 on Newegg comon Newegg.com.

GPGPU Computing

Wh G hi C d NOW?Wh G hi C d NOW?Why Graphics Card NOW?Why Graphics Card NOW?

B fBefore:Two years ago, everyone is using Graphics API (Cg, GLSL HLSL) for GPGPU programming GLSL, HLSL) for GPGPU programming. Restrict random-read (using Texture), NOT be able to random write (No pointer!)random-write. (No pointer!)

GPGPU Computing

Wh G hi C d NOW?Wh G hi C d NOW?Why Graphics Card NOW?Why Graphics Card NOW?

NNow:NIVIDA released CUDA two years ago, since then

Th d f CUDA f iThousands of CUDA software engineersNew job title “CUDA programmer”Around 200 CUDA based technical publications! Around 200 CUDA based technical publications!

Why?Standard C languageStandard C languageSupport Pointer! Random read and write on GPU memory.Work with C++, Fortran,

GPGPU Computing

Wh ’ GPU i th tWh ’ GPU i th tWhere’s GPU in the systemWhere’s GPU in the system

GPGPU Computing

NVIDIA GPU A hit tNVIDIA GPU A hit tNVIDIA GPU ArchitectureNVIDIA GPU Architecture

Lots of ALUsLots of ALUsLots of ControlFocus on Graphics Focus on Graphics Applicants (Data parallel)( p )

GPGPU Computing

NVIDIA GT200 A hit tNVIDIA GT200 A hit tNVIDIA GT200 ArchitectureNVIDIA GT200 Architecture

Thread Processing Cluster

Atomic Memory Access Control

GPGPU Computing

NVIDIA GT200 A hit tNVIDIA GT200 A hit tNVIDIA GT200 ArchitectureNVIDIA GT200 Architecture

Multi-Processor

GPGPU Computing

E ti M dE ti M d G hiG hi

Execution ModeExecution Mode–– GraphicsGraphics

Vtx Thread Issue

Setup / Rstr / ZCull

Geom Thread Issue Pixel Thread Issue

Input Assembler

ssorSP SP SP SP SP SP SP SP SP SP SP SP SP SP

L2 L2 L2 L2 L2 L2

FB FB FB FB FB FB

NVIDIA G80 GPU

GPGPU Computing

E ti M dE ti M d G l C ti g G l C ti g Host

Execution ModeExecution Mode–– General Computing General Computing

Thread Execution Manager

Input Assembler

Texture Texture Texture Texture Texture Texture Texture TextureTexture

Parallel DataCache

Load/store Load/store Load/store Load/store Load/store Load/store

Global Memory

NVIDIA G80 GPU

GPGPU Computing

GPGPU f G hi P i t f ViGPGPU f G hi P i t f ViGPGPU from Graphics Point of ViewGPGPU from Graphics Point of View

G hi P i g Pi li ( GPU)Graphics Processing Pipeline (on GPU)Fixed function pipelineProgrammable pipeline with Shaders

GPU Processing ModelStream computing model

GPGPU Computing

3D G hi A li ti3D G hi A li ti3D Graphics Applications3D Graphics Applications

DDemos.

GPGPU Computing

GPU F d t l Th G hi Pi liGPU F d t l Th G hi Pi liGPU Fundamentals: The Graphics PipelineGPU Fundamentals: The Graphics Pipeline

X Graphics StateS F F

Application Transform& Light Rasterize Fragment

ProcessingVideoMemory(T t )

Xformed, Lit Ve

AssemblePrimitives

Vertices (3

Screenspace tr

Fragments (pre

Final Pixels (C

GPUCPU

(Textures)

ertices (2D)

iangles (2D)

e-pixels)

olor, Depth)

A simplified graphics pipeline

GPUCPU Render-to-texture

GPGPU Computing

P g bl G hi Pi li P g bl G hi Pi li Sh dSh d

X Graphics StateS F F

Programmable Graphics Pipeline Programmable Graphics Pipeline -- ShadersShaders

TransformApplication eRasterize Shade VideoMemory(T t )

Xformed, Lit Ve

AssemblePrimitives

Vertices (3

Screenspace tr

Fragments (pre

Final Pixels (C

oProcessorFragmentProcessor

VertexProcessor

GPUCPU

(Textures)

ertices (2D)

iangles (2D)

e-pixels)

olor, Depth)

GPUCPU Render-to-texture

Programmable vertex Programmable fragment processor! processor!

GPGPU Computing

GPU Pi li T fGPU Pi li T fGPU Pipeline: TransformGPU Pipeline: TransformVertex processor (multiple in parallel)

Transform from “world space” to “image space”Compute per-vertex lighting

GPGPU Computing

GPU Pi li R t iGPU Pi li R t iGPU Pipeline: RasterizeGPU Pipeline: Rasterize

RasterizerRasterizerConvert geometric rep. (vertex) to image rep. (fragment)( g )

Fragment = image fragmentPixel + associated data: color, depth, stencil, etc.

Interpolate per-vertex quantities across pixels

GPGPU Computing

Fragment processors (multiple in parallel)

Pixel / Fragment ProcessorPixel / Fragment Processor

Fragment processors (multiple in parallel)Compute a color for each pixelOptionally read colors from textures (images)Optionally read colors from textures (images)

GPGPU Computing

GPU P g i g M d l StGPU P g i g M d l StGPU Programming Model: StreamGPU Programming Model: Stream

St P g i g M d lStream Programming ModelStreams: Stream

An array of data unitsKernels:

Stream

Kernel

Take streams as input, produce streams at outputPerform computation on streamspKernels can be linked together

GPGPU Computing

Wh St ?Wh St ?Why Streams?Why Streams?

Ample computation by exposing parallelismStream expose data parallelism

Multiple stream elements can be processed in parallel

Pi li (t k) ll liPipeline (task) parallelismMultiple tasks can be processed in parallel

Efficient communicationEfficient communicationProducer-consumer localityPredictable memory access patterny p

Optimize for throughput of all elements, not latency of oneProcessing many elements at once allows latency hiding

GPGPU Computing

R di g M t i lR di g M t i lReading MaterialReading Material

NVIDIA CUDA P g i g G id 2 0 Ch t NVIDIA CUDA Programming Guide 2.0, Chapter Onehttp://www nvidia com/object/cuda develop html and looking http://www.nvidia.com/object/cuda_develop.html and looking for documentation

NVIDIA Geforce GTX 280 Technical BriefNVIDIA Geforce GTX 280 Technical Briefhttp://www.nvidia.com/docs/IO/55506/GeForce_GTX_200_GPU_Technical_Brief.pdf

GPGPU Computing

Documents

Transcript of GPGPU Computing

Lecture 11: “GPGPU” computing and the CUDA/OpenCL ... · “GPGPU” computing and the CUDA/OpenCL Programming Model Kayvon Fatahalian ... -Data-parallel programing: ZPL, Nesl-Stream

peddie gpgpu

GPGPU stream computing 2009developer.amd.com/wordpress/media/2012/10/ATIGPGPU...• OpenCL/DX11 wrapper for low level GPGPU/stream computing 12 | GDC 09| March, 2009 | Confidential

GPGPU - UCLouvain

Performance modeling in GPGPU computing

GPGPU Computing and Multicore Processors: Exploring …gsaw.org/wp-content/uploads/2018/05/2007evening_michel.pdf · GPGPU Computing and Multicore Processors: Exploring The Spectrum

OpenCL-Based Mobile GPGPU Benchmarking: Methods and …gw2/pdf/iwocl2016_mobile_gpgpu_benchmarking.pdfBenchmarking general-purpose computing on graphics pro-cessing unit (GPGPU) aims

Gpgpu intro

ORIGINAL ARTICLE GPGPU-Based Explicit Finite Element … · 2018-03-23 · the GPGPU, utilizing an array of GPU computing cores. The number of cores a kernel can utilize for its execution

GPGPU - ELTE

Architecture-Aware Optimization Targeting … › groups › nucar › GPGPU › GPGPU-2 › ...Architecture-Aware Optimization Targeting Multithreaded Stream Computing Byunghyun Jang,

GPGPU: HIGH-PERFORMANCE COMPUTING - TUM...Fig. 3. Schematic comparison of CPU and GPU architecture The most popular instrument of general-purpose GPU computing is NVIDIA’s GPGPU

Outline GPU Computing GPGPU-Sim / Manycore Accelerators (Micro)Architecture Challenges: –Branch Divergence (DWF, TBC) –On-Chip Interconnect 2.

General Purpose Computing on Graphical …kom.aau.dk/~zt/courses/Readings_in_VGIS_2008/General Purpose GPU...Computing on Graphical Processing Units (GPGPU GPGPU / ... • • AMD-ATI

Parallel Programming & Cluster Computing: GPGPU

Distributed Password Cracking Platform - de Laat · Compared to Central Processing Unit ( CPU ) computing, GPGPU computing allows for a signi cant increase in the computation capabilities

GPGPU Introduction

GPGPU Tutorial

CS 380 - GPU and GPGPU Programming Lecture 1: Introduction · 2 Lecture Overview Goals • Learn GPU architecture and programming; both for graphics and for computing (GPGPU) •

vfp report zhang - City University of New Yorkjzhang/papers/vfp_report_zhang.pdf · popular General-purpose computing on Graphics Processing Units (GPGPU) technologies. High-end GPGPU