PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern Islands, by David Kaeli

Simulation, Compilation, and Debugging of OpenCL on the Southern Islands
Dana Schaa (AMD), Rafael Ubal, and David Kaeli (Northeastern University, Boston, MA)


Presentation PT-4102 by David Kaeli at the AMD Developer Summit (APU13) November 11-13, 2013.


Simulation, Compilation, and Debugging of OpenCL on the AMD Southern Islands | November 13th, 2013

Simulation Methodology

• Full-OS simulation

An OS runs on the simulator. The simulator implements the complete ISA, and virtualizes native hardware devices, similar to a virtual machine. Accurate simulations, but extremely slow.

• Guest program simulation

An application runs directly on the simulator. The simulator implements the non-privileged subset of the ISA, and virtualizes the system call interface (ABI). Multi2Sim falls in this category.

Application-OS vs. Guest Program

[Figure: two simulator organizations side by side.
Full-system simulator core: runs a full O.S., which in turn runs guest program 1, guest program 2, ...; virtualizes the complete processor ISA and the I/O hardware.
Guest-program simulator core: runs guest program 1, guest program 2, ... directly; virtualizes the user-space subset of the ISA and the system call interface.]

Simulation Methodology: Four-Step Simulation Process

[Figure: the four simulation components and the data they exchange.]

• Disassembler
─ Input: executable ELF file
─ Output: instructions dump
─ Receives instruction bytes and returns instruction fields
• Emulator
─ Input: executable file, program arguments
─ Output: program output
─ Runs one instruction on request and returns instruction information
• Timing simulator
─ Input: executable file, program arguments, processor configuration
─ Output: performance statistics, pipeline trace
• Visual tool
─ Input: user interaction
─ Output: cycle navigation, timing diagrams

Simulation Methodology: Current Architecture Support

• In our latest Multi2Sim SVN repository
─ 4 GPU + 3 CPU architectures supported or in progress
─ This presentation focuses on Southern Islands (and x86)

Architecture            Disasm.       Emulation     Timing simulation   Visual tool
ARM                     X             In progress   –                   –
MIPS                    X             In progress   –                   –
x86                     X             X             X                   X
AMD Evergreen           X             X             X                   X
AMD Southern Islands    X             X             X                   X
NVIDIA Fermi            X             X             In progress         –
NVIDIA Kepler           In progress   –             –                   –

The x86 Emulator: Program Loading

• Emulation of x86 instructions
─ Update x86 registers
─ Update memory map if needed
─ Example: add [bp+16], 0x5
• Emulation of Linux system calls
─ Analyze system call code and arguments
─ Update memory map
─ Update register eax with return value
─ Example: read(fd, buf, count)

[Figure: Initial virtual memory image. Text and initialized data start at 0x08000000, followed by the heap (0x08xxxxxx); the mmap region (not initialized) starts at 0x40000000; the stack, holding the program arguments and environment variables, ends at 0xc0000000.]

[Figure: Initial values for x86 registers (eax, ebx, ecx, ...), with esp as the stack pointer and eip as the instruction pointer.]

The x86 Emulator: Emulation Loop

1) Parse ELF executable
─ Read ELF sections and symbols
─ Initialize code and data
2) Initialize stack
─ Program headers
─ Arguments
─ Environment variables
3) Initialize registers
─ Program entry → eip
─ Stack pointer → esp

[Figure: Emulation loop flowchart. Read the instruction at eip (instruction bytes) → decode the instruction (instruction fields) → is the instruction int 0x80? If yes, emulate the system call; if no, emulate the x86 instruction → move eip to the next instruction, and repeat.]
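The loop in the flowchart can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Multi2Sim's code: the guest program is a list of already-decoded tuples rather than x86 bytes, eip counts instructions instead of bytes, and the write handler simply records the value held in ecx. Only the system call codes (1 for exit, 4 for write) and the -38 (ENOSYS) error value follow the 32-bit Linux ABI.

```python
SYS_EXIT, SYS_WRITE = 1, 4  # 32-bit Linux system call codes


def emulate_syscall(regs, state):
    """Analyze the code in eax, act on it, and leave the return value in eax."""
    code = regs["eax"]
    if code == SYS_EXIT:
        state["running"] = False
    elif code == SYS_WRITE:
        state["output"].append(regs["ecx"])  # toy stand-in for writing a buffer
        regs["eax"] = 1                      # pretend one item was written
    else:
        regs["eax"] = -38                    # -ENOSYS: unsupported system call


def emulate(program):
    """Run a toy program (a list of decoded tuples) until it calls exit."""
    regs = {"eax": 0, "ebx": 0, "ecx": 0, "edx": 0, "eip": 0}
    state = {"running": True, "output": []}
    while state["running"]:
        op, *args = program[regs["eip"]]     # read and "decode" the instruction at eip
        if op == "int" and args == [0x80]:
            emulate_syscall(regs, state)     # system call path
        elif op == "mov":
            regs[args[0]] = args[1]          # regular instruction path
        elif op == "add":
            regs[args[0]] += args[1]
        else:
            raise ValueError(f"unsupported instruction: {op}")
        regs["eip"] += 1                     # move eip to the next instruction
    return regs, state["output"]


program = [
    ("mov", "ecx", 5),
    ("add", "ecx", 2),
    ("mov", "eax", SYS_WRITE),
    ("int", 0x80),
    ("mov", "eax", SYS_EXIT),
    ("int", 0x80),
]
regs, output = emulate(program)
```

Keeping the system call handler separate from the instruction handlers mirrors the Yes/No branch in the flowchart: the guest never reaches a real kernel, so every int 0x80 has to be serviced by the emulator itself.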

OpenCL on the Host: Execution Framework

─ An OpenCL host program performs a set of OpenCL library function calls (API calls).
─ Multi2Sim's OpenCL runtime library, running with guest code, transparently intercepts the call. It communicates with the Multi2Sim driver using system calls with codes not reserved in Linux.
─ An OpenCL driver module (Multi2Sim code) intercepts the ABI call and communicates with the GPU emulator.
─ The GPU emulator updates its internal state based on the message received from the driver.

[Figure: Software stack. User application → (API call) → runtime library → (ABI call) → device driver → (internal interface) → hardware. The user application and the runtime library are user-level code; the device driver is OS-level code.]
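The call path described above (API call, then ABI call through a special system call, then an update of the GPU emulator) can be illustrated with a small Python sketch. Everything here is hypothetical and chosen only for illustration: the system call code 0x7000, the class names, and the two operations mem_alloc and mem_write are assumptions, not Multi2Sim's actual interface.

```python
OPENCL_SYSCALL_CODE = 0x7000  # hypothetical: stands for "a code Linux does not use"


class GpuEmulator:
    """Holds the device state; in this sketch, just a table of buffers."""

    def __init__(self):
        self.buffers = {}

    def alloc(self, size):
        buffer_id = len(self.buffers)
        self.buffers[buffer_id] = [0] * size
        return buffer_id

    def write(self, buffer_id, values):
        self.buffers[buffer_id][: len(values)] = values
        return len(values)


class Driver:
    """Simulator-side module: turns ABI calls into emulator updates."""

    def __init__(self, gpu):
        self.handlers = {"mem_alloc": gpu.alloc, "mem_write": gpu.write}

    def abi_call(self, name, *args):
        return self.handlers[name](*args)


class Simulator:
    """Owns the system call entry point that guest code traps into."""

    def __init__(self):
        self.gpu = GpuEmulator()
        self.driver = Driver(self.gpu)

    def syscall(self, code, *args):
        if code == OPENCL_SYSCALL_CODE:
            return self.driver.abi_call(*args)   # routed to the driver
        raise NotImplementedError(f"system call {code}")


class Runtime:
    """Guest-side library: each API call becomes one special system call."""

    def __init__(self, sim):
        self.sim = sim

    def create_buffer(self, size):
        return self.sim.syscall(OPENCL_SYSCALL_CODE, "mem_alloc", size)

    def write_buffer(self, buffer_id, values):
        return self.sim.syscall(OPENCL_SYSCALL_CODE, "mem_write", buffer_id, values)


sim = Simulator()
runtime = Runtime(sim)
buf = runtime.create_buffer(4)
written = runtime.write_buffer(buf, [1, 2, 3])
```

The host program only ever sees the runtime's functions; the system call is the single point where control crosses from guest code into simulator code.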

OpenCL on the Device: Execution Model

[Figure: Three levels of the execution model. The ND-Range is a grid of work-groups sharing global memory; a work-group is a set of work-items sharing local memory; a work-item runs one instance of __kernel func() { ... } with its own private memory.]

─ Work-items execute multiple instances of the same kernel code.
─ Work-groups are sets of work-items that can synchronize and communicate efficiently.
─ The ND-Range contains all work-groups, which do not communicate with each other and can execute in any order.
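The relation between the three levels can be made concrete with a short Python sketch that enumerates every work-item of an ND-Range. It assumes a zero global offset and a global size that is a multiple of the work-group size in each dimension, so that each global ID equals group ID times local size plus local ID. The function name nd_range is illustrative only.

```python
from itertools import product


def nd_range(global_size, local_size):
    """Yield (group_id, local_id, global_id) for every work-item."""
    assert len(global_size) == len(local_size)
    assert all(g % l == 0 for g, l in zip(global_size, local_size))
    groups = [g // l for g, l in zip(global_size, local_size)]
    for group_id in product(*(range(n) for n in groups)):
        for local_id in product(*(range(l) for l in local_size)):
            # global ID = group ID * local size + local ID, per dimension
            global_id = tuple(
                g * l + i for g, l, i in zip(group_id, local_size, local_id)
            )
            yield group_id, local_id, global_id


items = list(nd_range(global_size=(8,), local_size=(4,)))
items_2d = list(nd_range(global_size=(4, 4), local_size=(2, 2)))
```

With a global size of 8 and a local size of 4, the ND-Range holds two work-groups of four work-items each, and the sixth work-item is local ID 1 of work-group 1.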

The Southern Islands Disassembler

The Southern Islands Disassembler: Vector Addition Kernel

• Source code

    __kernel void vector_add(
        __read_only __global int *src1,
        __read_only __global int *src2,
        __write_only __global int *dst)
    {
        int id = get_global_id(0);
        dst[id] = src1[id] + src2[id];
    }

[Figure: Southern Islands disassembly of the kernel, annotated to show the scalar instructions, the vector instructions (the loads, the addition, the store), and the vector and scalar registers.]

The Southern Islands Disassembler: Conditional Statements

• Source code

    __kernel void if_kernel(__global int *v)
    {
        uint id = get_global_id(0);
        if (id < 5)
            v[id] = 10;
    }

• Assembly code

[Figure: Southern Islands disassembly of the kernel, annotated in four steps: the comparison, save active mask, store value 10, restore active mask.]
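The four annotated steps (comparison, save active mask, store, restore active mask) can be mimicked in Python for a single wavefront. This is a sketch of the general predication idea, assuming a 64-lane wavefront whose lanes are all active on entry; it does not reproduce the actual Southern Islands instructions, and the function name is illustrative.

```python
WAVEFRONT_SIZE = 64


def if_kernel_wavefront(v, first_id=0):
    """Emulate `if (id < 5) v[id] = 10;` for one wavefront of work-items."""
    ids = [first_id + lane for lane in range(WAVEFRONT_SIZE)]
    exec_mask = (1 << WAVEFRONT_SIZE) - 1      # all lanes active on entry

    # The comparison: one condition bit per active lane.
    cond = 0
    for lane, work_item in enumerate(ids):
        if (exec_mask >> lane) & 1 and work_item < 5:
            cond |= 1 << lane

    saved_mask = exec_mask                     # save active mask
    exec_mask &= cond                          # only lanes that passed stay active

    for lane, work_item in enumerate(ids):     # store value 10 on active lanes
        if (exec_mask >> lane) & 1:
            v[work_item] = 10

    exec_mask = saved_mask                     # restore active mask
    return exec_mask


v = [0] * WAVEFRONT_SIZE
final_mask = if_kernel_wavefront(v)
```

No lane actually branches: all 64 lanes walk through the same instructions, and the mask decides which of them are allowed to write.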

The Southern Islands Emulator

The Southern Islands Emulator: Program Loading

• Responsible entity
─ The device driver is the module responsible for setting up an initial state for the hardware, leaving it ready to run the first ISA instruction.
─ Natively, it writes to hardware registers and global memory locations. On Multi2Sim, it calls initialization functions of the emulator.
• Setup
─ Instruction memories in compute units, each with one copy of the ISA section of the kernel binary
─ Initial global memory image, copying global buffers from CPU to GPU memory
─ Kernel arguments
─ ND-Range topology, including number of dimensions and sizes

[Figure: Software stack. User application → (API call) → runtime library → (ABI call) → device driver → (internal interface) → hardware.]

The Southern Islands Emulator: Emulation Loop

• Work-group execution
─ Work-groups can execute in any order. This order is irrelevant for emulation purposes.
─ The chosen policy is executing one work-group at a time, in increasing order of ID for each dimension.
• Wavefront execution
─ Wavefronts within a work-group can also execute in any order, as long as synchronizations are considered.
─ The chosen policy is executing one wavefront at a time until it hits a barrier, if any.

[Figure: Emulation loop flowchart. Split the ND-Range into work-groups (work-group pool). While any work-groups are left, grab a work-group and split it into wavefronts (wavefront pool). While any wavefronts are left, for each running wavefront not stalled in a barrier: read the instruction at the PC, emulate it (update memory and registers), and advance the PC. When no work-groups are left, end.]
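The chosen policy can be sketched as two nested loops in Python. This is a simplified illustration, not Multi2Sim's implementation: the kernel is a list of opaque instruction names, a "barrier" entry stands for a work-group barrier, and all wavefronts are assumed to follow the same control flow so that every wavefront of a work-group reaches each barrier.

```python
def emulate_nd_range(global_size, local_size, wavefront_size, program):
    """Return the order in which (work_group, wavefront, instruction) run."""
    trace = []
    work_group_pool = range(global_size // local_size)
    for wg in work_group_pool:                     # one work-group at a time
        num_wavefronts = -(-local_size // wavefront_size)  # ceiling division
        pc = [0] * num_wavefronts
        stalled = [False] * num_wavefronts
        while any(p < len(program) for p in pc):   # any wavefront left?
            for wf in range(num_wavefronts):
                # Run this wavefront until it finishes or hits a barrier.
                while pc[wf] < len(program) and not stalled[wf]:
                    inst = program[pc[wf]]         # read instruction at PC
                    pc[wf] += 1                    # advance PC
                    if inst == "barrier":
                        stalled[wf] = True
                    else:
                        trace.append((wg, wf, inst))  # "emulate" the instruction
            # Every wavefront has now reached the barrier (or the end).
            stalled = [False] * num_wavefronts
    return trace


trace = emulate_nd_range(8, 4, 2, ["load", "barrier", "store"])
```

In the resulting trace, both wavefronts of a work-group execute their load before either executes its store, which is the ordering the barrier is meant to guarantee.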

The Southern Islands Timing Simulator

The Southern Islands Timing Simulator: The GPU Architecture

─ A command processor receives and processes commands from the host.
─ When the ND-Range is created, an ultra-threaded dispatcher (scheduler) assigns work-groups to compute units as slots become available.

[Figure: GPU block diagram. The command processor feeds the ultra-threaded dispatcher, which feeds compute units 0 to 31. Each compute unit has its own L1 cache, and a crossbar connects the L1 caches to the main memory hierarchy (L2 caches, memory controllers, video memory).]
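The dispatcher's behavior, handing out work-groups whenever a compute unit has a free slot, can be approximated with a small event-driven sketch in Python. The per-work-group cycle counts and the number of slots per compute unit are hypothetical inputs; in a real timing simulation the duration of a work-group is a result of the simulation rather than a parameter.

```python
import heapq
from collections import deque


def dispatch(wg_cycles, num_compute_units, slots_per_cu=1):
    """Assign work-groups to compute-unit slots as the slots free up."""
    pending = deque(range(len(wg_cycles)))
    free_slots = deque(
        cu for cu in range(num_compute_units) for _ in range(slots_per_cu)
    )
    running = []    # heap of (finish_cycle, work_group, compute_unit)
    schedule = []   # (start_cycle, work_group, compute_unit)
    cycle = 0
    while pending or running:
        while pending and free_slots:              # fill every free slot
            wg, cu = pending.popleft(), free_slots.popleft()
            schedule.append((cycle, wg, cu))
            heapq.heappush(running, (cycle + wg_cycles[wg], wg, cu))
        cycle, _, cu = heapq.heappop(running)      # next work-group to finish
        free_slots.append(cu)                      # its slot becomes available
    return schedule, cycle


schedule, total_cycles = dispatch([10, 10, 10, 10], num_compute_units=2)
```

With four equal work-groups and two single-slot compute units, the second pair starts at cycle 10, as soon as the first pair finishes.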

The Southern Islands Timing Simulator: The Compute Unit

─ The instruction memory of each compute unit contains the OpenCL kernel.
─ A front-end fetches instructions and sends them to the appropriate execution unit.
─ There is one scalar unit, one vector memory unit, one branch unit, and one LDS (local data share) unit.
─ There are multiple instances of SIMD units.

[Figure: Compute unit block diagram. The front-end reads from the instruction memory and feeds the scalar unit, vector memory unit, branch unit, LDS unit, and SIMD units 0, 1, 2, ...; the diagram also shows global memory and local memory.]

The Southern Islands Timing Simulator: The Front-End

[Figure: Front-end block diagram. Four wavefront pools feed the fetch stage, which fills the fetch buffers (one per wavefront pool). The issue stage moves instructions from the fetch buffers into the issue buffers of the execution units: one SIMD issue buffer matching each wavefront pool, plus the scalar unit, branch unit, vector memory unit, and LDS unit issue buffers.]

─ Work-groups are split into wavefronts and allocated to wavefront pools.
─ The fetch and issue stages operate in a round-robin fashion.
─ There is one SIMD unit associated with each wavefront pool.
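A round-robin fetch policy over wavefront pools can be sketched as follows in Python. Two details are assumptions made only for this illustration: wavefronts are spread over the pools in round-robin order, and the fetch stage visits one pool per cycle, taking turns among the wavefronts inside that pool. The slides state only that fetch and issue are round-robin.

```python
def allocate_to_pools(wavefronts, num_pools=4):
    """Spread wavefronts over the wavefront pools (assumed round-robin)."""
    pools = [[] for _ in range(num_pools)]
    for index, wavefront in enumerate(wavefronts):
        pools[index % num_pools].append(wavefront)
    return pools


def fetch_order(pools, num_cycles):
    """Return (cycle, pool, wavefront) for each fetch, one pool per cycle."""
    next_slot = [0] * len(pools)
    order = []
    for cycle in range(num_cycles):
        pool = cycle % len(pools)                  # pools take turns
        if pools[pool]:
            wavefront = pools[pool][next_slot[pool] % len(pools[pool])]
            next_slot[pool] += 1                   # wavefronts in a pool take turns
            order.append((cycle, pool, wavefront))
    return order


pools = allocate_to_pools(list(range(6)))
order = fetch_order(pools, 8)
```

Pools holding two wavefronts alternate between them on successive visits, while pools holding one wavefront fetch from it every time their turn comes.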

The Southern Islands Timing Simulator: The SIMD Unit

─ The SIMD unit runs arithmetic-logic vector instructions.
─ There are 4 SIMD units, each one associated with one of the 4 wavefront pools.
─ The SIMD unit pipeline is modeled with 5 stages: decode, read, execute, write, and complete.
─ In the execute stage, a wavefront (64 work-items max.) is split into 4 subwavefronts (16 work-items each). Subwavefronts are pipelined over the 16 stream cores in 4 consecutive cycles.
─ The vector register file is accessed in the read and write stages to consume input and produce output operands, respectively.

[Figure: SIMD unit pipeline. Instructions arrive from the compute unit front-end through the issue buffer and pass through the decode, read, execute, write, and complete stages, separated by the decode, read, execute, and write buffers. The read stage accesses the vector/scalar register file and the write stage accesses the vector register file. In the execute stage, pipelined functional units form 16 SIMD lanes: lane 0 runs work-items 0, 16, 32, 48; lane 1 runs work-items 1, 17, 33, 49; ...; lane 15 runs work-items 15, 31, 47, 63.]
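The lane assignment in the execute stage follows directly from splitting a 64-work-item wavefront into 4 subwavefronts of 16. A short Python sketch makes the mapping explicit; the function names are illustrative only.

```python
def split_into_subwavefronts(work_items, num_lanes=16):
    """Cut a wavefront into consecutive groups of num_lanes work-items."""
    return [
        work_items[i:i + num_lanes]
        for i in range(0, len(work_items), num_lanes)
    ]


def lane_assignment(subwavefronts, num_lanes=16):
    """For each SIMD lane, list the work-items it runs, one per cycle."""
    return {
        lane: [sub[lane] for sub in subwavefronts if lane < len(sub)]
        for lane in range(num_lanes)
    }


wavefront = list(range(64))
subwavefronts = split_into_subwavefronts(wavefront)
lanes = lane_assignment(subwavefronts)
```

Each lane receives one work-item per subwavefront, so a full wavefront occupies the 16 lanes for 4 consecutive cycles.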

The Southern Islands Timing Simulator: The Scalar Unit

─ Runs both arithmetic-logic and memory scalar instructions
─ Modeled with 5 stages – decode, read, execute/memory, write, complete

[Figure: Scalar unit pipeline. Instructions from the compute unit front-end enter the issue buffer and flow through decode (decode buffer), read (read buffer), execute or memory (execute buffer), write (write buffer), and complete. The diagram also shows the scalar register file and the vector register file.]

The Southern Islands Timing Simulator: The Vector Memory Unit

─ Runs vector memory instructions
─ Modeled with 5 stages – decode, read, memory, write, complete
─ Accesses to the global memory hierarchy happen mainly in this unit

[Figure: Vector memory unit pipeline. Instructions from the compute unit front-end enter the issue buffer and flow through decode (decode buffer), read (read buffer), memory (memory buffer), write (write buffer), and complete. The read and write stages access the vector register file; the memory stage accesses global memory.]

The Southern Islands Timing Simulator: The Branch Unit

─ Runs branch instructions, which decide whether to make an entire wavefront jump to a target address depending on the scalar condition code
─ Modeled with 5 stages – decode, read, execute/memory, write, complete

[Figure: Branch unit pipeline. Instructions from the compute unit front-end enter the issue buffer and flow through decode (decode buffer), read (read buffer), execute (execute buffer), write (write buffer), and complete. The read stage accesses the scalar register file (condition codes); the write stage accesses the scalar register file (program counter).]

The Southern Islands Timing Simulator: The Local Data Share (LDS) Unit

─ Runs local memory access instructions
─ Modeled with 5 stages – decode, read, execute/memory, write, complete
─ The memory stage accesses the compute unit local memory for read/write

[Figure: LDS unit pipeline. Instructions from the compute unit front-end enter the issue buffer and flow through decode (decode buffer), read (read buffer), memory (memory buffer), write (write buffer), and complete. The read and write stages access the vector register file; the memory stage accesses local memory.]

The Southern Islands Timing Simulator: Global Memory Hierarchy

─ Fully configurable memory hierarchy, with default values based on the AMD Radeon HD 7970 Southern Islands GPU
─ One 16KB data L1 per compute unit
─ One scalar L1 cache shared by every 4 compute units
─ Six L2 banks with a total size of 128KB, each connected to a DRAM module

[Figure: Memory hierarchy. Compute units CU 0 to CU 31 each have their own L1 cache; each group of four compute units (CU 0 to CU 3, ..., CU 28 to CU 31) shares one scalar cache. An interconnect links these caches to the L2 banks (bank 0, bank 1, ...).]
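The sharing pattern of the default hierarchy can be written down as simple arithmetic. The Python sketch below assumes 32 compute units, as drawn in the figure, and assumes that consecutive compute units are grouped onto the same scalar cache; the constant and function names are illustrative.

```python
NUM_COMPUTE_UNITS = 32        # CU 0 to CU 31, as in the figure
CUS_PER_SCALAR_CACHE = 4      # one scalar L1 shared by every 4 compute units
L1_SIZE_KB = 16               # one 16KB data L1 per compute unit


def scalar_cache_of(compute_unit):
    """Index of the scalar cache serving a compute unit (assumed grouping)."""
    if not 0 <= compute_unit < NUM_COMPUTE_UNITS:
        raise ValueError(f"no such compute unit: {compute_unit}")
    return compute_unit // CUS_PER_SCALAR_CACHE


num_scalar_caches = NUM_COMPUTE_UNITS // CUS_PER_SCALAR_CACHE
total_l1_kb = NUM_COMPUTE_UNITS * L1_SIZE_KB
sharers = {
    cache: [cu for cu in range(NUM_COMPUTE_UNITS) if scalar_cache_of(cu) == cache]
    for cache in range(num_scalar_caches)
}
```

Under these defaults there are 8 scalar caches and 512KB of data L1 in total, with CU 28 to CU 31 sharing the last scalar cache.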

The Southern Islands Visual Tool

The Southern Islands Visual Tool: Main Window and Timing Diagram

─ The main window provides cycle-by-cycle navigation throughout simulation.
─ A dedicated Southern Islands panel contains one widget per compute unit, showing allocated work-groups.
─ The memory hierarchy panel shows caches connected to Southern Islands compute units, and special-purpose scalar caches.

Ongoing Projects: CPU-GPU Cache Coherence Protocol

[Figure: Heterogeneous memory hierarchy. ARM, x86, Evergreen (Evg.), Southern Islands (S.I.), Fermi, ... cores each connect through an L1 cache and an interconnect to L2 bank 0, L2 bank 1, ...]

NMOESI Interface
─ MOESI protocol extended with an additional non-coherent write state: NMOESI
─ CPU and GPU cores with any ISA can be connected to different entry points of the memory hierarchy
─ Processing nodes interact with the memory hierarchy with three types of accesses: load, store, and n-store
─ GPUs and CPUs running OpenCL kernels issue n-store write accesses. The rest issue regular store accesses.
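The rule for choosing among the three access types can be expressed as a small decision function. This Python sketch covers only the selection rule stated above and says nothing about the protocol's state transitions; the enum and parameter names are illustrative.

```python
from enum import Enum


class Access(Enum):
    LOAD = "load"
    STORE = "store"
    NSTORE = "n-store"


def access_type(is_write, is_gpu, runs_opencl_kernel):
    """Pick the access a processing node issues to the memory hierarchy."""
    if not is_write:
        return Access.LOAD
    if is_gpu or runs_opencl_kernel:
        return Access.NSTORE       # non-coherent write
    return Access.STORE            # regular coherent write


gpu_write = access_type(is_write=True, is_gpu=True, runs_opencl_kernel=True)
cpu_kernel_write = access_type(is_write=True, is_gpu=False, runs_opencl_kernel=True)
cpu_host_write = access_type(is_write=True, is_gpu=False, runs_opencl_kernel=False)
cpu_read = access_type(is_write=False, is_gpu=False, runs_opencl_kernel=False)
```

The same CPU core therefore issues different write accesses depending on whether it is running host code or a portion of an OpenCL kernel.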

Ongoing Projects: Cooperative Execution of Work-Groups

[Figure: Cooperative execution timeline. The hardware consists of two x86 cores and four S.I. compute units. The work-groups of the ND-Range (WG-0, WG-1, WG-2, WG-3, ..., WG-N) are distributed over all of them; over time, x86 cores run the host program plus part of the kernel while the S.I. compute units run the kernel.]

• Work-group mapping
─ Portions of ND-Range executed by CPU/GPU cores with different ISAs
─ Work-groups mapped to CPU cores or GPU compute units as they become available
• Attained concurrency
─ x86 cores run both the host program and a portion of the ND-Range
─ Idle regions are removed during the execution of the ND-Range
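Mapping work-groups to whichever device becomes available first can be sketched with a priority queue in Python. The device names and the per-work-group costs are hypothetical numbers chosen for illustration; the sketch only shows why letting CPU cores take work-groups can shorten the ND-Range, and it is not the project's actual scheduler.

```python
import heapq


def map_work_groups(num_work_groups, device_cost):
    """Give each work-group to the device that becomes available first.

    device_cost maps a device name to the (hypothetical) number of cycles
    that device needs to run one work-group.
    """
    ready = [(0, name) for name in sorted(device_cost)]   # (time available, device)
    heapq.heapify(ready)
    assignment = {name: [] for name in device_cost}
    finish = 0
    for wg in range(num_work_groups):
        available_at, name = heapq.heappop(ready)
        assignment[name].append(wg)
        done = available_at + device_cost[name]
        finish = max(finish, done)
        heapq.heappush(ready, (done, name))
    return assignment, finish


cooperative, cooperative_time = map_work_groups(4, {"si-0": 2, "x86-0": 6})
gpu_only, gpu_only_time = map_work_groups(4, {"si-0": 2})
```

In this toy case the slower x86 core still takes one work-group while it would otherwise sit idle, and the ND-Range finishes in 6 time units instead of 8.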

Ongoing Projects: Multi2C – An OpenCL/CUDA Kernel Compiler

─ LLVM-based compiler for OpenCL and CUDA kernels
─ Future release Multi2Sim 4.2 will include a working version
─ Diagrams show progress as per SVN Rev. 1838
• Yellow = under development
• Blue = arriving soon
• Green = supported

[Figure: Compilation flow. vec-add.cl → OpenCL C to LLVM front-end, and vec-add.cu → CUDA to LLVM front-end; both produce vec-add.llvm. From there: LLVM to Southern Islands back-end → vec-add.s → Southern Islands assembler → vec-add.bin; LLVM to Fermi back-end → vec-add.s → Fermi assembler → vec-add.cubin; LLVM to Kepler back-end → vec-add.s → Kepler assembler → vec-add.cubin.]

Ongoing Projects: Simulation of OpenGL Pipelines

• Goal
─ Leverage our Southern Islands pipeline models to execute OpenGL vertex and fragment shaders
• Steps
─ Develop a runtime library to link with guest programs, implementing the OpenGL, GLUT and GLEW library APIs
─ Reverse engineer AMD's OpenGL binary format to decode embedded metadata
─ Timing model of the OpenGL pipeline
• New capabilities targeted
─ Timing simulation of other critical GPU components, such as rasterizer
─ Concurrency evaluation of compute + graphics pipelines

The Multi2Sim Community: Academic Efforts at Northeastern

• The “GPU Programming and Architecture” course
─ We started an unofficial seminar that students can voluntarily attend. The syllabus covers OpenCL programming, GPU architecture, and state-of-the-art research topics on GPUs.
─ Average attendance of ~25 students per semester.
• Undergraduate directed studies
─ Official alternative equivalent to a 4-credit course in which an undergraduate student can optionally enroll, collaborating with Multi2Sim development
• Graduate-level development
─ Multiple ongoing PhD theses using Multi2Sim as a support tool
─ All related development becomes openly available through the SVN repo

The Multi2Sim Community: Academic Publications

• Conference papers

─ Multi2Sim: A Simulation Framework to Evaluate Multicore-Multithreaded Processors, SBAC-PAD, 2007

─ The Multi2Sim Simulation Framework: A CPU-GPU Model for Heterogeneous Computing, PACT, 2012

• Tutorials

─ The Multi2Sim Simulation Framework: A CPU-GPU Model for Heterogeneous Computing, PACT, 2011

─ Programming and Simulating Fused Devices — OpenCL and Multi2Sim, ICPE, 2012

─ Multi-Architecture ISA-Level Simulation of OpenCL, IWOCL, 2013

─ Simulation of OpenCL and APUs on Multi2Sim, ISCA, 2013

The Multi2Sim Community: www.multi2sim.org


The Multi2Sim Community: Sponsors

Thanks! Questions?


The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.

Disclaimer & Attribution