AMD-SPL Runtime Programming Guide
Jiawei
Outline
• What is AMD-SPL runtime
• How to use AMD-SPL runtime
WHAT IS AMD-SPL RUNTIME
The Core of SPL
Encapsulation
Resource management
Workflow control
Optimization
Based on CAL
Goal
Overcome limitations of Brook+
Provide a friendly programming interface for CAL
Support the development of SPL
What is in SPL Runtime
SPL Runtime
Program Management
Buffer Management
Device Management
HOW TO USE SPL RUNTIME
Pre-Requirements
Windows
• Visual Studio 2005
• AMD Stream SDK 1.4 beta
• AMD-SPL 1.0 beta or higher
Linux •……
Add Include Directories
• Add include paths in VS2005
– CAL: “$(CALROOT)\include\”
– SPL: “$(SPLROOT)\include\”
– Runtime: “$(SPLROOT)\include\core\cal”
Note: $(SPLROOT) is the root folder of SPL
Add Library Directories
• Add library directories in VS2005
– CAL:
• “$(CALROOT)\lib\lh32\” (Vista 32-bit)
• “$(CALROOT)\lib\lh64\” (Vista 64-bit)
• “$(CALROOT)\lib\xp32\” (XP 32-bit)
• “$(CALROOT)\lib\xp64\” (XP 64-bit)
– SPL:
• “$(SPLROOT)\lib
Note: $(SPLROOT) is the root folder of SPL
Add Library Dependencies
• Add additional dependencies in VS2005
– CAL:
• aticalrt.lib aticalcl.lib
– SPL:
• amd-spl_d.lib (Debug version)
• amd-spl.lib (Release version)
Header and Namespaces
• Include the proper header files
– #include "cal.h" (CAL header)
– #include "amdspl.h" (SPL header)
– #include "RuntimeDefs.h" (Runtime header)
• Use the namespaces
– using namespace amdspl;
– using namespace amdspl::core::cal;
DEFINE THE IL KERNEL
Code in IL
AMD Stream Kernel Analyzer
Generate IL from Brook+ kernel
• Easier to program
• Difficult to maintain and optimize
Write IL manually
• Difficult to program and understand
• Easier to optimize
• Provides more GPU features
IL Kernel Sample
il_ps_2_0
dcl_output_generic o0
dcl_resource_id(0)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
dcl_input_position_interp(linear_noperspective) v0.xy__
dcl_cb cb0[1]
sample_resource(0)_sampler(0) r1, v0.xy00
add o0, r1, cb0[0]
endmain
end
The Brook+ kernel equivalent:
kernel void k(out float o<>, float i<>, float c)
{
    o = i + c;
}
IL Source String
const char * __sample_program_src__ =
    "il_ps_2_0\n"
    "dcl_output_generic o0\n"
    "dcl_resource_id(0)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)\n"
    "dcl_input_position_interp(linear_noperspective) v0.xy__\n"
    "dcl_cb cb0[1]\n"
    "sample_resource(0)_sampler(0) r1, v0.xy00\n"
    "add o0, r1, cb0[0]\n"
    "endmain\n"
    "end\n";
Kernel Information
• Define the kernel using template class ProgramInfo
– Kernel Parameters
– ID of the Kernel
– Source of the Kernel
template <int outputsT, int inputsT = 0, int constantsT = 0, bool globalsT = false>
class ProgramInfo
{
    ProgramInfo(const char* ID, const char* source) {...}
    ...
};
Define the IL Kernel in SPL
• Define a global object for the kernel
typedef ProgramInfo<1, 1, 1, false> SampleProgram;
SampleProgram sampleProgInfo = SampleProgram("Sample Program", __sample_program_src__);
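To make the role of the template arguments concrete, here is a minimal sketch of how a ProgramInfo-like class could record the number of outputs, inputs and constants at compile time. SketchProgramInfo and all of its members are hypothetical stand-ins written for illustration; the real AMD-SPL class may be organized differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for ProgramInfo (not the AMD-SPL implementation).
// The template arguments describe the kernel's parameter list.
template <int outputsT, int inputsT = 0, int constantsT = 0, bool globalsT = false>
class SketchProgramInfo {
public:
    SketchProgramInfo(const char* id, const char* source)
        : id_(id), source_(source) {}

    // Parameter counts are fixed by the template arguments.
    static constexpr int outputs()   { return outputsT; }
    static constexpr int inputs()    { return inputsT; }
    static constexpr int constants() { return constantsT; }
    static constexpr bool hasGlobal() { return globalsT; }

    const std::string& id() const     { return id_; }
    const std::string& source() const { return source_; }

private:
    std::string id_;      // human-readable kernel ID
    std::string source_;  // IL source text
};
```

With this layout, `SketchProgramInfo<1, 1, 1, false>` would describe a kernel with one output, one input, one constant and no global buffer, matching the shape of the sample kernel.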
INITIALIZE SPL RUNTIME
Initialize SPL Runtime
Get runtime instance
Get device manager
Get buffer manager
Get program manager
Runtime *runtime = Runtime::getInstance();
assert(runtime);
DeviceManager *devMgr = runtime->getDeviceManager();
assert(devMgr);
BufferManager *bufMgr = runtime->getBufferManager();
assert(bufMgr);
ProgramManager* progMgr = runtime->getProgramManager();
assert(progMgr);
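The getInstance() call suggests that Runtime is a process-wide singleton. As a rough illustration only, the sketch below shows one common way such a class can be laid out. SketchRuntime and its members are hypothetical and are not taken from the AMD-SPL sources.

```cpp
#include <cassert>

// Hypothetical lazily created singleton (illustration, not AMD-SPL code).
class SketchRuntime {
public:
    // Create the instance on first use and return the same pointer afterwards.
    static SketchRuntime* getInstance() {
        if (instance_ == nullptr) {
            instance_ = new SketchRuntime();
        }
        return instance_;
    }

    // Explicit teardown; a later getInstance() creates a fresh instance.
    static void destroy() {
        delete instance_;
        instance_ = nullptr;
    }

private:
    SketchRuntime() {}                      // only getInstance() may construct
    SketchRuntime(const SketchRuntime&);    // copying is not allowed
    static SketchRuntime* instance_;
};

SketchRuntime* SketchRuntime::instance_ = nullptr;
```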
Assign Device to SPL
bool r;
r = devMgr->assignDevice(0);
assert(r);
Assign device to device manager
The device manager handles device initialization and destruction.
SPL cannot access a device that has not been assigned to it.
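The assignment rule can be pictured as simple bookkeeping: a device becomes usable only after it has been assigned. The sketch below illustrates that idea with a hypothetical SketchDeviceManager; the class, its constructor argument and isAssigned() are assumptions made for this example and do not describe how AMD-SPL tracks devices internally.

```cpp
#include <cassert>
#include <set>

// Hypothetical bookkeeping for assigned devices (illustration only).
class SketchDeviceManager {
public:
    explicit SketchDeviceManager(int deviceCount) : count_(deviceCount) {}

    // Returns false when the ID does not name an installed device.
    bool assignDevice(int id) {
        if (id < 0 || id >= count_) {
            return false;
        }
        assigned_.insert(id);
        return true;
    }

    // Only assigned devices may be used by the runtime.
    bool isAssigned(int id) const {
        return assigned_.count(id) != 0;
    }

private:
    int count_;               // number of devices in the system
    std::set<int> assigned_;  // IDs handed over to the runtime
};
```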
DO GPGPU COMPUTING
Initialize CPU Buffer
void fillBuffer(float buf[], int size)
{
    for (int i = 0; i < size; i++)
    {
        buf[i] = (float)i;
    }
}
float *cpuInBuf = new float[1024 * 512];
float *cpuOutBuf = new float[1024 * 512];
float constant = 3;
fillBuffer(cpuInBuf, 1024 * 512);
Get Device
• Get the default device
Device* device = devMgr->getDefaultDevice();
OR
• Get device by ID
Device* device = devMgr->getDeviceByID(0);
Load Program
• Load the program using program manager
– Pass in a ProgramInfo instance
Program *prog = progMgr->loadProgram(sampleProgInfo);
assert(prog);
Create Buffers
• Create local buffer for input
• Create remote buffer for output
• Get constant buffer from constant buffer pool
Buffer* inBuf = bufMgr->createLocalBuffer(device, CAL_FORMAT_FLOAT_1, 1024, 512);
assert(inBuf);
Buffer* outBuf = bufMgr->createRemoteBuffer(CAL_FORMAT_FLOAT_1, 1024, 512);
assert(outBuf);
ConstBuffer* constBuf = bufMgr->getConstBuffer(1);
assert(constBuf);
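The buffer dimensions have to agree with the CPU arrays allocated earlier. A small worked calculation, assuming one 4-byte float per element for a single-component float format, shows that a 1024 x 512 buffer holds 524288 elements, or 2 MiB. The helper names below are made up for this example.

```cpp
#include <cassert>
#include <cstddef>

// Number of elements in a 2D buffer.
std::size_t elementCount(std::size_t width, std::size_t height) {
    return width * height;
}

// Size in bytes, given the size of one element (assumed 4 for a float).
std::size_t byteCount(std::size_t width, std::size_t height,
                      std::size_t bytesPerElement) {
    return elementCount(width, height) * bytesPerElement;
}
```

For the sample, `elementCount(1024, 512)` is the same value as the `1024 * 512` used for the `new float[...]` allocations.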
CPU to GPU Data Transfer
• Read in CPU buffer
bool r;
r = inBuf->readData(cpuInBuf, 1024 * 512);
assert(r);
• Set Constant
r = constBuf->setConstant<0>(&constant);
assert(r);
Bind Buffers
• Bind buffers to the program
– Input, Output, Constant, Global
r = prog->bindOutput(outBuf, 0);
assert(r);
r = prog->bindInput(inBuf, 0);
assert(r);
r = prog->bindConstant(constBuf, 0);
assert(r);
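The slides check every return value with assert, which standard C++ compiles out when NDEBUG is defined, as is typical for release builds. One possible alternative, shown here as a sketch, is a small helper that throws on failure. requireOk is a hypothetical name and is not part of AMD-SPL.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: turn a failed bool result into an exception so the
// check also runs in builds where assert is disabled.
inline void requireOk(bool ok, const char* what) {
    if (!ok) {
        throw std::runtime_error(std::string("AMD-SPL call failed: ") + what);
    }
}
```

A binding step could then be written as `requireOk(prog->bindInput(inBuf, 0), "bindInput");`.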
Execute Program
• Define the execution domain
• Run program
• Check the execution event
CALdomain domain = {0, 0, 1024, 512};
Event *e = prog->run(domain);
assert(e);
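The four domain values are an origin (x, y) followed by a width and a height, so {0, 0, 1024, 512} covers the whole 1024 x 512 output buffer. The sketch below mirrors those four fields in a hypothetical SketchDomain struct and checks that a domain stays inside a buffer. It is a host-side sanity check written for illustration, not something AMD-SPL is known to provide.

```cpp
#include <cassert>

// Hypothetical mirror of the four domain fields: origin, then extent.
struct SketchDomain {
    unsigned x;
    unsigned y;
    unsigned width;
    unsigned height;
};

// True when the rectangle lies entirely inside a bufWidth x bufHeight buffer.
bool domainFits(const SketchDomain& d, unsigned bufWidth, unsigned bufHeight) {
    return d.x <= bufWidth && d.width <= bufWidth - d.x &&
           d.y <= bufHeight && d.height <= bufHeight - d.y;
}

// Number of output elements the domain covers.
unsigned long elementsCovered(const SketchDomain& d) {
    return static_cast<unsigned long>(d.width) * d.height;
}
```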
GPU to CPU Data Transfer
• Write the result out to the CPU buffer
r = outBuf->writeData(cpuOutBuf, 1024 * 512);
assert(r);
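Once the output is back in CPU memory, it can be compared with a CPU reference. Because the sample kernel computes o = i + c, a plain loop is enough. matchesReference is a hypothetical helper name; the exact comparison is reasonable here only because the sample data are small whole numbers that float represents exactly.

```cpp
#include <cassert>
#include <cstddef>

// CPU reference check for the sample kernel o = i + c.
// Returns true when every output element equals input + constant.
bool matchesReference(const float* in, const float* out,
                      std::size_t n, float c) {
    for (std::size_t i = 0; i < n; ++i) {
        if (out[i] != in[i] + c) {
            return false;
        }
    }
    return true;
}
```

In the sample this would be called as `matchesReference(cpuInBuf, cpuOutBuf, 1024 * 512, constant)`.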
RELEASE RESOURCE AND CLEAN UP
Unload Program
• Destroy program object
– Unbind all the buffers
• Call Program::unbindAllBuffers();
– Unload module from context
progMgr->unloadProgram(prog);
Destroy/Release Buffers
• Destroy buffers
– InputBuffer, OutputBuffer
• Release ConstBuffer to the pool
bufMgr->destroyBuffer(inBuf);
bufMgr->destroyBuffer(outBuf);
bufMgr->releaseConstBuffer(constBuf);
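Constant buffers are released rather than destroyed because they come from a pool. The sketch below shows the general idea of such a pool: a released buffer is kept and handed out again on the next request. SketchConstBuffer and SketchConstBufferPool are hypothetical, and the actual pooling policy inside AMD-SPL is not described in these slides.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical constant buffer: just a list of float constants.
struct SketchConstBuffer {
    std::vector<float> values;
};

// Hypothetical pool that reuses released buffers instead of reallocating.
class SketchConstBufferPool {
public:
    // Return a buffer with room for `count` constants, reusing a free one
    // when available.
    SketchConstBuffer* acquire(std::size_t count) {
        SketchConstBuffer* buf = nullptr;
        if (!free_.empty()) {
            buf = free_.back();
            free_.pop_back();
        } else {
            owned_.push_back(
                std::unique_ptr<SketchConstBuffer>(new SketchConstBuffer()));
            buf = owned_.back().get();
        }
        buf->values.assign(count, 0.0f);
        return buf;
    }

    // Give the buffer back; the pool keeps ownership of the memory.
    void release(SketchConstBuffer* buf) {
        free_.push_back(buf);
    }

    // Number of buffers ever allocated by this pool.
    std::size_t created() const { return owned_.size(); }

private:
    std::vector<std::unique_ptr<SketchConstBuffer> > owned_;
    std::vector<SketchConstBuffer*> free_;
};
```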
Shutdown Runtime
• Not necessary!
– The runtime is destroyed automatically when the application exits.
Runtime::destroy();
The Whole Program
#include "cal.h"
#include "amdspl.h"
#include "RuntimeDefs.h"

using namespace amdspl;
using namespace amdspl::core::cal;

void fillBuffer(float buf[], int size)
{
    for (int i = 0; i < size; i++)
    {
        buf[i] = (float)i;
    }
}

const char *__sample_program_src__ =
    "il_ps_2_0\n"
    "dcl_output_generic o0\n"
    "dcl_resource_id(0)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)\n"
    "dcl_input_position_interp(linear_noperspective) v0.xy__\n"
    "dcl_cb cb0[1]\n"
    "sample_resource(0)_sampler(0) r1, v0.xy00\n"
    "add o0, r1, cb0[0]\n"
    "endmain\n"
    "end\n";

typedef ProgramInfo<1, 1, 1, false> SampleProgram;
SampleProgram sampleProgInfo = SampleProgram("Sample Program", __sample_program_src__);

int main(void)
{
    // Prepare CPU data
    float *cpuInBuf = new float[1024 * 512];
    float *cpuOutBuf = new float[1024 * 512];
    float constant = 3;
    fillBuffer(cpuInBuf, 1024 * 512);

    // Initialize the runtime and assign a device
    Runtime *runtime = Runtime::getInstance();
    DeviceManager *devMgr = runtime->getDeviceManager();
    BufferManager *bufMgr = runtime->getBufferManager();
    ProgramManager* progMgr = runtime->getProgramManager();
    devMgr->assignDevice(0);
    Device* device = devMgr->getDefaultDevice();

    // Load the program and create buffers
    Program *prog = progMgr->loadProgram(sampleProgInfo);
    Buffer* inBuf = bufMgr->createLocalBuffer(device, CAL_FORMAT_FLOAT_1, 1024, 512);
    Buffer* outBuf = bufMgr->createRemoteBuffer(CAL_FORMAT_FLOAT_1, 1024, 512);
    ConstBuffer* constBuf = bufMgr->getConstBuffer(1);

    // Transfer data, bind buffers, and run
    inBuf->readData(cpuInBuf, 1024 * 512);
    constBuf->setConstant<0>(&constant);
    prog->bindOutput(outBuf, 0);
    prog->bindInput(inBuf, 0);
    prog->bindConstant(constBuf, 0);
    CALdomain domain = {0, 0, 1024, 512};
    Event *e = prog->run(domain);
    outBuf->writeData(cpuOutBuf, 1024 * 512);

    // Release resources and clean up
    progMgr->unloadProgram(prog);
    bufMgr->destroyBuffer(inBuf);
    bufMgr->destroyBuffer(outBuf);
    bufMgr->releaseConstBuffer(constBuf);
    Runtime::destroy();
    delete [] cpuInBuf;
    delete [] cpuOutBuf;
    return 0;
}
THANK YOU!