AMD/ATI GPU hardware

Antti P Miettinen

February 15, 2010

ATI, AMD & GPGPU

◮ ATI incorporated 1985 [1]
◮ AMD acquired ATI 2006
◮ GPUs for GPGPU
    ◮ R600: first generation with unified shader model
    ◮ R700: first generation with OpenCL support
    ◮ Latest: R800/Evergreen
◮ GPGPU software
    ◮ Close-to-metal (CTM)
    ◮ Brook, Brook+
    ◮ OpenCL

[1] NVIDIA released first product 1995

Overview

[Figure: R600 block diagram. Labeled blocks: host application, command processor, memory controller, Data Parallel Processor (DPP) array, memory-mapped R600 registers, interrupts. The system-memory address space and the R600 local memory each hold commands, instructions, constants, inputs and outputs. The host passes commands, instructions and data to the R600.]

More details

[Figure: ATI stream processor block diagram. Host side: host application, compute driver, system memory. Device side: stream processor local memory, memory controller, DMA, command processor, ultra-threaded dispatch processor, program counters (four shown), instruction and constant cache, L1 input cache, L2 input cache, output cache, memory read and write cache. System memory and local memory each hold commands, instructions and constants, and inputs and outputs.]

DPP array

[Figure: DPP array. Labels: Ultra-Threaded Dispatch Processor, SIMD engine (four shown), thread processor, stream cores, T-stream core, branch execution unit, general-purpose registers, instruction and control flow.]

Memory hierarchy

◮ registers
    ◮ GPRs
    ◮ constant registers
◮ caches
    ◮ instruction caches per instruction type (CF, ALU etc.)
    ◮ constant cache
    ◮ texture cache per SIMD
    ◮ L2 input cache per memory channel
    ◮ read/write cache
◮ local data share per SIMD
◮ global data share across SIMDs
◮ local memory: memory accessible to the GPU

R700 data sharing

Terminology

AMD/ATI term        NVIDIA term        Description
SIMD engine         multiprocessor     GPU subunit that has a program counter
thread processor    scalar processor   GPU execution subunit
local memory        device memory      memory accessible to the GPU
wavefront           warp               set of threads running in lockstep
local data share    shared memory      memory that can be shared by a thread block
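
To make the wavefront/warp row concrete, the following Python sketch splits the work-items of one thread block into wavefronts. The function name `split_into_wavefronts` is hypothetical, and the wavefront size of 64 is an assumption: the slides do not state a size, and it varies between chips (an NVIDIA warp has 32 threads).

```python
def split_into_wavefronts(work_group_size, wavefront_size=64):
    """Return the work-item ids of each wavefront in one work-group.

    wavefront_size=64 is an assumed value; the real size depends on the GPU.
    """
    if work_group_size <= 0 or wavefront_size <= 0:
        raise ValueError("sizes must be positive")
    return [
        list(range(start, min(start + wavefront_size, work_group_size)))
        for start in range(0, work_group_size, wavefront_size)
    ]


WAVEFRONT_SIZE = 64  # assumption, see above
wavefronts = split_into_wavefronts(work_group_size=160, wavefront_size=WAVEFRONT_SIZE)
for number, items in enumerate(wavefronts):
    # A partially filled wavefront still occupies the SIMD engine in lockstep,
    # so its unused lanes do no useful work.
    idle_lanes = WAVEFRONT_SIZE - len(items)
    print(f"wavefront {number}: work-items {items[0]}..{items[-1]}, idle lanes {idle_lanes}")
```

With 160 work-items the last wavefront is only half full, which is one reason to choose block sizes that are a multiple of the wavefront size.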

Instruction set

◮ Control flow instructions
    ◮ initiate ALU clauses, vertex/texture fetch etc.
    ◮ loops
    ◮ calls, jumps
◮ ALU clauses
    ◮ no control flow (but can use predication)
    ◮ instruction group: 1-5 instructions, 0-2 literals
    ◮ 5-way VLIW: X/Y/Z/W and Trans ALUs
◮ texture/vertex fetch
◮ export (actually read/write)
◮ data share (separate clauses before R800)
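
The 5-way VLIW instruction group can be illustrated with a toy packer in Python. It is only a sketch: the greedy in-order strategy, the tuple format `(dst, srcs, is_transcendental)` and the names `free_slot` and `pack_instruction_groups` are assumptions for illustration, not the behaviour of AMD's shader compiler, and the other encoding constraints of the real ISA (such as literal limits) are ignored. The model assumes that transcendental operations need the Trans ALU while other operations can use any of the five slots.

```python
VECTOR_SLOTS = ("X", "Y", "Z", "W")
TRANS_SLOT = "Trans"


def free_slot(group, is_transcendental):
    """Return the first unused ALU slot that can run the op, or None."""
    candidates = (TRANS_SLOT,) if is_transcendental else VECTOR_SLOTS + (TRANS_SLOT,)
    for slot in candidates:
        if slot not in group:
            return slot
    return None


def pack_instruction_groups(ops):
    """Greedily pack ops, in program order, into groups of at most 5 slots.

    An op joins the current group only if it neither reads nor overwrites
    a register written earlier in that group and a suitable slot is free.
    """
    groups = []
    current = {}     # slot name -> op, for the group being filled
    written = set()  # registers written by the group being filled
    for op in ops:
        dst, srcs, is_transcendental = op
        conflicts = dst in written or any(src in written for src in srcs)
        slot = None if conflicts else free_slot(current, is_transcendental)
        if slot is None:
            # Close the current group and start a new, empty one.
            if current:
                groups.append(current)
            current, written = {}, set()
            slot = free_slot(current, is_transcendental)
        current[slot] = op
        written.add(dst)
    if current:
        groups.append(current)
    return groups


ops = [
    ("r0", ("a", "b"), False),    # r0 = a + b
    ("r1", ("c", "d"), False),    # r1 = c * d
    ("r2", ("a", "c"), False),    # r2 = a - c
    ("r3", ("x",), True),         # r3 = sin(x), needs the Trans ALU
    ("r4", ("r0", "r1"), False),  # r4 = r0 + r1, depends on the first group
    ("r5", ("r4",), True),        # r5 = sqrt(r4), depends on r4
]
groups = pack_instruction_groups(ops)
for number, group in enumerate(groups):
    print(number, {slot: op[0] for slot, op in group.items()})
```

The first four operations are independent and share one instruction group; the dependent operations each force a new group, leaving most slots empty. This is why VLIW utilisation depends on how much instruction-level parallelism the kernel exposes.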

Thread state

◮ program counter is shared by threads within a SIMD

◮ loop state (constant, index)

◮ stack (loop nesting, predicates)

◮ GPRs
    ◮ thread private
    ◮ clause temporary
    ◮ SIMD global

◮ constant registers

◮ previous vector, previous scalar

◮ predication state
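
Because the program counter is shared, a branch that diverges inside a wavefront is handled with predicate masks rather than separate instruction streams. The Python sketch below models this for `y = -x if x < 0 else 2 * x`. It is a simplified illustration: the function `run_wavefront` and its data layout are hypothetical, and real hardware may skip a branch that no thread takes, which is not modelled here.

```python
def run_wavefront(inputs):
    """Run `y = -x if x < 0 else 2 * x` for all lanes in lockstep."""
    lanes = len(inputs)
    outputs = [None] * lanes
    active = [True] * lanes  # predicate mask, one flag per thread
    mask_stack = []          # saved masks for nested control flow
    issued = []              # what the single shared program counter visits

    # IF: save the enclosing mask, keep only lanes whose condition holds.
    mask_stack.append(active)
    condition = [x < 0 for x in inputs]
    active = [parent and cond for parent, cond in zip(mask_stack[-1], condition)]
    issued.append("then")
    for lane in range(lanes):
        if active[lane]:  # masked-off lanes do not commit results
            outputs[lane] = -inputs[lane]

    # ELSE: flip the condition, still limited by the enclosing mask.
    active = [parent and not cond for parent, cond in zip(mask_stack[-1], condition)]
    issued.append("else")
    for lane in range(lanes):
        if active[lane]:
            outputs[lane] = 2 * inputs[lane]

    # ENDIF: restore the enclosing mask.
    active = mask_stack.pop()
    return outputs, issued


outputs, issued = run_wavefront([3, -1, 4, -5])
print(outputs)  # [6, 1, 8, 5]
print(issued)   # ['then', 'else']
```

Both sides of the branch are issued for the whole wavefront, so a divergent branch costs roughly the sum of both paths.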

Simple vector addition

__kernel void
vectorAddition(__global float * output,
               __global float * input0,
               __global float * input1,
               const uint width)
{
    int bx = get_group_id(0);
    int tx = get_local_id(0);
    int idx = bx * get_local_size(0) + tx;
    if (idx >= 0 && idx < width)
        output[idx] = input0[idx] + input1[idx];
}
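
The index arithmetic in the kernel can be checked with a small host-side emulation in Python. This is a sketch, not the way the kernel is actually launched: the function `vector_addition` and the `local_size` parameter are hypothetical stand-ins for the NDRange set up by the host, and the global size is assumed to be rounded up to a whole number of work-groups, which OpenCL 1.x requires when a local size is given. When no global offset is used, `idx` equals `get_global_id(0)`.

```python
def vector_addition(input0, input1, width, local_size):
    """Emulate the kernel over a 1-D range rounded up to whole work-groups."""
    num_groups = (width + local_size - 1) // local_size  # ceiling division
    output = [0.0] * width
    for bx in range(num_groups):      # plays the role of get_group_id(0)
        for tx in range(local_size):  # plays the role of get_local_id(0)
            idx = bx * local_size + tx
            # Same guard as the kernel: work-items past the end do nothing.
            if 0 <= idx < width:
                output[idx] = input0[idx] + input1[idx]
    return output


a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [10.0, 20.0, 30.0, 40.0, 50.0]
result = vector_addition(a, b, width=5, local_size=4)
print(result)  # [11.0, 22.0, 33.0, 44.0, 55.0]
```

With `width=5` and `local_size=4`, two work-groups (8 work-items) are launched and the guard masks off the last three, which is why the kernel takes `width` as an argument.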

RV710 code
