PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland


Presentation PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland at the AMD Developer Summit (APU13) November 11-13, 2013

Transcript of PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

Page 1: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

USING OPENGL AND DIRECTX FOR HETEROGENEOUS COMPUTE

KARL HILLESLAND

Page 2: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

|      PRESENTATION  TITLE      |      DECEMBER  4,  2013      |      CONFIDENTIAL  2  

AGENDA

- THE GRAPHICS PIPELINE
- PROGRAMMING THE GPU
- FEEDING THE GPU

Page 3: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

The Graphics Pipeline

Page 4: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

GRAPHICS PIPELINE: SHADER CENTRIC

DirectX            OpenGL
Input Assembler    Vertex Puller
Vertex Shader      Vertex Shader
Hull Shader        Tessellation Control Shader
Tessellator        Tessellation Primitive Generator
Domain Shader      Tessellation Evaluation Shader
Geometry Shader    Geometry Shader
Rasterizer         Rasterizer
Pixel Shader       Fragment Shader
Output Merger      Per-Fragment Operations

Page 5: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

GRAPHICS PIPELINE: MORE DETAILS

[Diagram: front half of the pipeline, with the data passed between stages]
- Input Assembler: indices, vertices
- Vertex Shader: thread per vertex; outputs a vertex
- (Collects patches) Hull Shader: patch verts (n1) in; thread per output control point (n2); Patch Constant function; outputs control points and tess factors
- Tessellator: tess factors in; barycentric coordinates out
- (Collects patches) Domain Shader: patch verts (n2) in; thread per DS vertex (n3); outputs a DS vertex
- (Collects prims) Primitive Assembler and Geometry Shader: prim verts in; prims and vertices out
- Continued on the next slide

Page 6: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

[Diagram: back half of the pipeline. Labels: Rasterizer 1 (1 prim) with Hi-Z/Stencil and Hi-Z/Stencil info; Rasterizer 2 with Early-Z/Stencil; Unroller (unrolling, masking); Collects Quads; Pixel Shader; Reordering; Depth/Stencil; Blending; Conversion]

Not shown: any shader stage can read/write to memory, including atomics, filtering*, decompression, and sRGB conversion.

Page 7: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

WHAT’S THE POINT?

- The graphics pipeline has a lot more parts
  - Reorganizes threads
  - Tracks dependencies
  - Reorders
  - Extra fixed-function units
- Are they usable?

Page 8: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

GRAPHICS IN THE NINETIES

- Input Assembler
- Transform and Lighting
- Rasterizer
- Texturing and Fog
- Output Merger

Page 9: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

GPGPU WITHOUT SHADERS

VORONOI DIAGRAMS

- Color according to closest
  - Point
  - Line
- Could be weighted
- Useful for
  - Collision Detection
  - Surface Reconstruction
  - Robot Motion Planning
  - Non-Photorealistic Rendering
  - Surface Simplification
  - Mesh Generation

Page 10: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

VORONOI DIAGRAMS IN THE NINETIES

2-part discrete Voronoi diagram representation:
- Distance: Depth Buffer
- Site IDs: Color Buffer

Simply rasterize the cones using graphics hardware.

Haeberli90, Woo97
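The cone trick can be illustrated on the CPU. The following is only a sketch under stated assumptions: the names `Site`, `VoronoiBuffers`, and `discreteVoronoi` are hypothetical, and the loops stand in for what the rasterizer and a LESS depth test would do in hardware. Each site "draws" a cone whose height at a pixel is the distance to that site; the depth buffer keeps the smallest distance, and the color buffer keeps the ID of the site that produced it.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

struct Site { float x, y; };  // hypothetical: a point site in pixel coordinates

struct VoronoiBuffers {
    int width, height;
    std::vector<float> depth;  // "depth buffer": distance to the closest site so far
    std::vector<int> color;    // "color buffer": ID of that site, -1 if none yet
};

// Emulates rasterizing one distance cone per site with a LESS depth test.
VoronoiBuffers discreteVoronoi(const std::vector<Site>& sites, int width, int height) {
    assert(width > 0 && height > 0);
    VoronoiBuffers buf{width, height,
        std::vector<float>(width * height, std::numeric_limits<float>::infinity()),
        std::vector<int>(width * height, -1)};
    for (int id = 0; id < static_cast<int>(sites.size()); ++id) {
        for (int y = 0; y < height; ++y) {
            for (int x = 0; x < width; ++x) {
                // Cone height at the pixel center = Euclidean distance to the site.
                const float dx = (x + 0.5f) - sites[id].x;
                const float dy = (y + 0.5f) - sites[id].y;
                const float d = std::sqrt(dx * dx + dy * dy);
                const int i = y * width + x;
                if (d < buf.depth[i]) {  // depth test passes: keep the nearer site
                    buf.depth[i] = d;
                    buf.color[i] = id;
                }
            }
        }
    }
    return buf;
}
```

On real hardware the inner two loops disappear: the cone geometry is submitted once per site and the depth test resolves the nearest site per pixel.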

Page 11: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

OPENGL 1 SIMD MACHINE (PEERCY ET AL., SIGGRAPH 2000)

SIMD Concept             OpenGL 1 SIMD
Instruction              OpenGL call (CPU)
SIMD Lane                Pixel
SIMD Lane Input Data     Texel
SIMD Lane Output Data    Fragment
ALU                      Blend Operation
Conditionals             Alpha and Stencil Tests

    float y;
    float4 contrived_example()
    {
        float x = f(u,v);
        if( x*y > 0)
        {
            x = x + g(u,v);
        }
        return x*h(u,v);
    }
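To make the mapping concrete, here is a hedged CPU sketch of how the contrived example might decompose into full-screen passes. Everything here is illustrative: `Buffer`, `Blend`, `pass`, and `contrivedExample` are hypothetical names, the inputs f, g, and h are assumed to be already available as one-value-per-pixel textures, and a boolean mask stands in for the alpha/stencil test.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Buffer = std::vector<float>;  // one value per pixel, i.e. per SIMD lane

enum class Blend { Replace, Add, Modulate };  // the "ALU" operations

// One "instruction": a full-screen pass that blends a texture into the
// framebuffer, skipping pixels where the stencil mask fails.
void pass(Buffer& framebuffer, const Buffer& texture, Blend op,
          const std::vector<bool>* stencil = nullptr) {
    assert(framebuffer.size() == texture.size());
    for (std::size_t i = 0; i < framebuffer.size(); ++i) {
        if (stencil && !(*stencil)[i]) continue;  // lane masked off
        switch (op) {
            case Blend::Replace:  framebuffer[i] = texture[i];  break;
            case Blend::Add:      framebuffer[i] += texture[i]; break;
            case Blend::Modulate: framebuffer[i] *= texture[i]; break;
        }
    }
}

// x = f; if (x*y > 0) x += g; return x*h;  expressed as three passes.
Buffer contrivedExample(const Buffer& f, const Buffer& g, const Buffer& h, float y) {
    Buffer fb(f.size(), 0.0f);
    pass(fb, f, Blend::Replace);                 // x = f(u,v)
    std::vector<bool> stencil(fb.size());
    for (std::size_t i = 0; i < fb.size(); ++i)  // the test writes the mask
        stencil[i] = fb[i] * y > 0.0f;
    pass(fb, g, Blend::Add, &stencil);           // x = x + g(u,v) where the test passed
    pass(fb, h, Blend::Modulate);                // x * h(u,v)
    return fb;
}
```

Each call to `pass` corresponds to one CPU-issued draw; the per-pixel loop is what the hardware runs in parallel.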

Page 12: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

USING EARLY-Z OR STENCIL

Applications of Explicit Early-Z Culling, Real-Time Shading Course, SIGGRAPH 2004.

[Figures: pressure buffer used for sim culling; texture-space blur with back-face culling]
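A rough CPU sketch of the idea, assuming a simulation where only cells above some pressure threshold need the expensive update. The function name, the threshold, and the halving "update" are all hypothetical stand-ins; the point is only that a cheap pass writes a depth mask and the depth test then rejects fragments before the costly shader runs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns how many times the expensive per-pixel update actually ran.
std::size_t simulateWithEarlyZ(std::vector<float>& pressure, float threshold) {
    // Pass 1 (cheap): write a depth mask. 0 = near (blocks later fragments),
    // 1 = far (lets them through).
    std::vector<float> depth(pressure.size());
    for (std::size_t i = 0; i < pressure.size(); ++i)
        depth[i] = pressure[i] > threshold ? 1.0f : 0.0f;

    // Pass 2 (expensive): draw a full-screen quad at z = 0.5 with a LESS test.
    const float quadZ = 0.5f;
    std::size_t invocations = 0;
    for (std::size_t i = 0; i < pressure.size(); ++i) {
        if (!(quadZ < depth[i])) continue;  // early-Z rejects before shading
        pressure[i] *= 0.5f;                // stand-in for the costly simulation step
        ++invocations;
    }
    return invocations;
}
```

The saving comes from the second loop skipping masked cells entirely, which is what early-Z hardware does per fragment.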

Page 13: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

What’s the Point?

The graphics pipeline gives you access to more

Page 14: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

Programming the GPU

Page 15: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

SHADER TYPES

OpenGL                         D3D
Compute (4.3)                  Compute (11)
Vertex (2, ES 2)               Vertex (8)
Tessellation Control (4)       Hull (11)
Tessellation Evaluation (4)    Domain (11)
Geometry (3)                   Geometry (10)
Fragment (2, ES 2)             Pixel (9)

Page 16: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

BASIC GLSL VERTEX SHADER

    #version 430
    in vec3 Position;
    in vec2 UV;
    out PosUV // Not available in GLES
    {
        vec3 vPositionWS;
        vec2 vUV;
    } vs_output;
    uniform mat4x4 mMVP;
    uniform mat4x4 mM;

    void main(void)
    {
        gl_Position = mMVP * vec4(Position, 1.0);
        // mat4 * vec4 yields a vec4; keep xyz for the world-space position
        vs_output.vPositionWS = (mM * vec4(Position, 1.0)).xyz;
        vs_output.vUV = UV;
    }

Page 17: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

BASIC GLSL PIXEL SHADER

    #version 430
    // Not available in GLES. The block name must match the
    // vertex shader's output block (PosUV) for the stages to link.
    in PosUV
    {
        vec3 vPositionWS;
        vec2 vUV;
    } fs_input;

    uniform sampler2D sDiffuse;
    out vec4 color_out;

    void main(void)
    {
        color_out = texture( sDiffuse, fs_input.vUV );
    }

Page 18: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

BASIC HLSL VERTEX SHADER

    struct PosUV //Not available in GLES
    {
        float4 vPositionSS : SV_POSITION;
        float3 vPositionWS : POSITION;
        float2 vUV : TEXCOORD0;
    };

    float4x4 mMVP;
    float4x4 mM;

    PosUV main(
        float3 Position : POSITION,
        float2 UV : TEXCOORD0)
    {
        PosUV vs_output;
        // HLSL uses mul() for matrix-vector products; * is component-wise
        vs_output.vPositionSS = mul(mMVP, float4(Position, 1.0));
        vs_output.vPositionWS = mul(mM, float4(Position, 1.0)).xyz;
        vs_output.vUV = UV;
        return vs_output;
    }

Page 19: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

BASIC HLSL PIXEL SHADER

    struct fsInput
    {
        float3 vPositionWS : POSITION;
        float2 vUV : TEXCOORD0;
    };

    SamplerState sWrapTriLin;
    Texture2D<float4> tDiffuse;

    float4 main(fsInput i) : SV_TARGET
    {
        return tDiffuse.Sample(sWrapTriLin, i.vUV);
    }

Page 20: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

BASIC GEOMETRY SHADER

    layout (triangles) in;
    layout (triangle_strip, max_vertices = 3) out;

    void main(void)
    {
        for(int i=0; i < gl_in.length(); i++)
        {
            gl_Position = gl_in[i].gl_Position;
            EmitVertex();
        }
        EndPrimitive();
    }

Page 21: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

TESSELLATION

D3D11                                    OpenGL 4.0
Hull Shader + Patch Constant Func        Tessellation Control
Tessellator (tess factors, topology)     Tessellator (tess factors, topology)
Domain Shader                            Tessellation Evaluation

Page 22: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

TESSELLATION

D3D11:

    // Hull Shader
    [outputcontrolpoints(4)]
    [patchconstantfunc("ConstantsHS")]
    [domain("quad")]
    [partitioning("integer")]
    [outputtopology("triangle_cw")]
    HS_OUTPUT HullShader(…)

    // Domain Shader
    DS_OUTPUT DomainShader(…)

OpenGL 4.0:

    // Tessellation Control
    layout (vertices = 4) out;
    void TCS(void)
    {
        if (gl_InvocationID == 0)
        {
            gl_TessLevelInner[0] = 2.0;
    …

    // Tessellation Evaluation
    layout (quads, cw, equal_spacing) in;
    void TES(void)
    {
    …

Page 23: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

TESSELLATION CONTROL

    patch out float tessFactor;

    void main(void)
    {
        if (gl_InvocationID == 0)
        {
            gl_TessLevelInner[0] = 2.0;
            tessFactor = 2.0;
        }
        barrier();
        DoSomeWork(tessFactor, gl_InvocationID);
    }

- Tessellation rate can be set by any instance
- Values can be communicated across threads

Page 24: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

COMPUTE SHADERS

- Groups can share local memory
- Threads can be synced at a group level

[Diagram: a global grid (global size x by global size y) divided into thread groups (group size x by group size y), each containing threads]

Page 25: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

OPENGL COMPUTE

    #version 430
    layout(local_size_x = 64, local_size_y = 1, local_size_z = 1) in;

    buffer BlockName { int linearOutput[]; };
    shared int var;

    // "ContrivedSample" on the slide; a GLSL entry point must be named main
    void main()
    {
        const uvec3 localIdx = gl_LocalInvocationID;
        const uvec3 globalIdx = gl_GlobalInvocationID;
        const uvec3 groupIdx = gl_WorkGroupID;

        if(localIdx.x == 0)
            var = int(groupIdx.x); // GLSL has no implicit uint-to-int conversion

        barrier();

        linearOutput[globalIdx.x] = var;
    }

Page 26: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

DIRECT COMPUTE

    RWStructuredBuffer<int> linearOutput;
    groupshared int var;

    [numthreads(64, 1, 1)]
    void ContrivedSample(
        uint3 globalIdx : SV_DispatchThreadID,
        uint3 localIdx : SV_GroupThreadID,
        uint3 groupIdx : SV_GroupID )
    {
        if(localIdx.x == 0)
            var = groupIdx.x;

        GroupMemoryBarrierWithGroupSync();

        linearOutput[globalIdx.x] = var;
    }
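What the contrived sample computes can be checked with a sequential CPU emulation. This is a sketch under assumptions, not how a GPU schedules work: `dispatchContrivedSample` is a hypothetical name, the groups run one after another, and the group barrier is modeled by splitting the kernel into a "before barrier" loop and an "after barrier" loop over the threads of one group.

```cpp
#include <cassert>
#include <vector>

// Emulates a 1D dispatch of numGroups groups with groupSize threads each.
std::vector<int> dispatchContrivedSample(int numGroups, int groupSize) {
    assert(numGroups >= 0 && groupSize > 0);
    std::vector<int> linearOutput(numGroups * groupSize, -1);
    for (int group = 0; group < numGroups; ++group) {
        int sharedVar = 0;  // groupshared / shared: one copy per thread group

        // Phase 1: code before the barrier, run by every thread in the group.
        for (int local = 0; local < groupSize; ++local)
            if (local == 0) sharedVar = group;

        // Barrier: every thread has finished phase 1 before phase 2 starts.

        // Phase 2: code after the barrier.
        for (int local = 0; local < groupSize; ++local) {
            const int global = group * groupSize + local;  // dispatch-thread ID
            linearOutput[global] = sharedVar;
        }
    }
    return linearOutput;
}
```

With 2 groups of 4 threads, the output would be four zeros followed by four ones: each thread reads the value that thread 0 of its own group wrote.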

Page 27: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

PROGRAMMING THE GPU: SYNCHRONIZATION

Page 28: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

MEMORY COHERENCE: GL / DX

[Diagram labels: Dispatch; CS, Mem, CS]

Page 29: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

MEMORY COHERENCE: GL / DX 11.1

[Diagram labels: Draw; VS, GS, FS, RT; Mem; VS, GS, FS]

Page 30: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

MEMORY COHERENCE: GL / DX 11.1

[Diagram labels: Draw; VS, GS, FS, RT; Mem]

Page 31: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

Feeding the GPU

Page 32: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

DRIVER STACKS (WINDOWS)

[Diagram: OpenGL App -> OpenGL32.dll -> OpenGL ICD; DirectX App -> D3D11.dll -> D3D UMD; other boxes: KMD, DXGI]

Page 33: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

DRIVER STACKS (LINUX)

[Diagram labels: App, libGL, DRI, drm, libDRM-radeon, Gallium3D State tracker, Gallium3D WinSys, Hardware layer; annotation: "Or"]

Page 34: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

FEEDING THE GPU: GPU-CPU SYNCHRONIZATION

Page 35: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

DRIVER COMMAND QUEUE

[Diagram: a timeline with two rows, Application and Driver/GPU. The Application issues commands Dr 1 to Dr 6 and Da 1 to Da 6; the Driver/GPU row shows them being consumed later in time. Annotation: "Reorder possible?"]

Page 36: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

CPU/GPU MEMORY SYNCHRONIZATION BY DRIVER

[Diagram labels: App Memory, Driver Copy, GPU Read (repeated along the timeline)]

Hints: Stream, Static, Dynamic; Draw, Read, Copy

Page 37: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

CPU/GPU MEMORY SYNCHRONIZATION: MANUAL

[Diagram: command timeline with Dr 1 to Dr 6 and Da 1 to Da 6. Labels: App Memory, App Copy, GPU Read, Driver Copy, Fence]
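The role of the fence in manual synchronization can be sketched with a toy command queue. This is an assumption-laden illustration: `CommandQueue` and its methods are hypothetical and do not correspond to any OpenGL or Direct3D call, and the "GPU" is emulated by draining the queue on the calling thread. The application records the fence value returned after its draws and waits on it before overwriting memory those draws still read.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

class CommandQueue {
public:
    // App side: enqueue a draw that reads application-managed memory.
    void draw() { pending_.push_back({false, 0}); }

    // App side: enqueue a fence; the returned value signals once every
    // command submitted before it has executed.
    std::uint64_t fence() {
        pending_.push_back({true, ++lastIssued_});
        return lastIssued_;
    }

    // "GPU" side: consume one command in submission order.
    bool executeOne() {
        if (pending_.empty()) return false;
        const Command c = pending_.front();
        pending_.pop_front();
        if (c.isFence) lastCompleted_ = c.fenceValue;
        return true;
    }

    bool isComplete(std::uint64_t value) const { return lastCompleted_ >= value; }

    // App side: block until the fence is reached (here: drain the emulated GPU).
    void wait(std::uint64_t value) {
        assert(value <= lastIssued_);
        while (!isComplete(value) && executeOne()) {}
    }

private:
    struct Command { bool isFence; std::uint64_t fenceValue; };
    std::deque<Command> pending_;
    std::uint64_t lastIssued_ = 0;
    std::uint64_t lastCompleted_ = 0;
};
```

A typical use would be: write data, call `draw()`, call `fence()`, keep working, and call `wait()` on the saved value only right before reusing the memory.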

Page 38: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

FEEDING THE GPU: DATA

Page 39: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

LEGACY OPENGL OBJECT MODEL

- glGenBuffers, glGenTextures, glGenSamplers, …
  - Creates name / handle
- glBindBuffer, glBindTexture
  - Sets as current
- glBufferData, glTexSubImage, glMapBuffer
  - Supplies data

Page 40: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

BUFFER BINDING AND CREATION

OpenGL:

    glBindBuffer(target,name)

DirectX:

    desc.BindFlags = <Target>
    pDevice->CreateBuffer(desc,…)

[Diagram: binding Target -> BufferObject (BufferData; State, Usage)]

Page 41: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

SETTING DATA (SIMPLEST OPTION)

OpenGL:

    glBufferData (target, size, pData, usage)

DirectX:

    desc.Usage = <Usage>
    desc.CPUAccessFlags = <RWUsage>
    pDevice->CreateBuffer(desc,pData,)

[Diagram: binding Target -> BufferObject (data)]

Page 42: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

BUFFER TARGETS

GL Name                Typical Purpose       DX Equivalent
ARRAY                  Vertices              VERTEX
ELEMENT_ARRAY          Indices               INDEX
UNIFORM                Read-only vars        CONSTANT
TEXTURE_BUFFER         Buffer-as-texture     CONSTANT (tbuffer)
SHADER_STORAGE         Read/write            UNORDERED_ACCESS
TRANSFORM_FEEDBACK     Stream out            Stream out
DRAW_INDIRECT          Indirect draw         DRAWINDIRECT
ATOMIC_COUNTER         Global counter var    UAV_FLAG_COUNTER
COPY_READ, _WRITE      Copying (optional)    Staging?
PIXEL_PACK, _UNPACK    GPU <-> CPU           Staging?

Page 43: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

DIRECTX OBJECTS AND VIEWS

- Resource (base class)
  - Usage: default, immutable, dynamic, staging
  - Bind flags: vertex, index, shader resource, …
- Buffer
- Texture2D, …

- DepthStencilView
- RenderTargetView
- ShaderResourceView
- UnorderedAccessView

Page 44: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

OBJECT AND VIEW EXAMPLE

    D3D11_BUFFER_DESC desc;
    desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
    pDevice->CreateBuffer(&desc, data, &pBuffer);

    D3D11_SHADER_RESOURCE_VIEW_DESC srvDesc;
    srvDesc.ViewDimension = D3D11_SRV_DIMENSION_BUFFER;
    pDevice->CreateShaderResourceView(pBuffer, &srvDesc, &pView);

    //at draw time
    pContext->VSSetShaderResources(0, 1, &pView);

Page 45: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

DATA TYPES

[Figure labels: Image, Linear]

Page 46: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

IMMUTABLE TEXTURES (4.2, GLES 3)

OpenGL:

    glGenTextures(1, &texObjName);
    glBindTexture(GL_TEXTURE_2D_ARRAY, texObjName);
    // second argument is the number of mip levels to allocate
    glTexStorage3D(GL_TEXTURE_2D_ARRAY, levels, internalformat,
                   width, height, depth);
    // mip level 0, then x/y/z offsets of 0
    glTexSubImage3D(GL_TEXTURE_2D_ARRAY,
                    0, 0,0,0, width, height, depth,
                    format, type, pData);

DirectX:

    CreateTexture2D( desc, srcDataLayout, pData);

Page 47: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

FEEDING THE GPU: PROGRAMS

Page 48: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

SHADER MANAGEMENT - OPENGL

    GLuint shader = glCreateShader(GL_VERTEX_SHADER);
    glShaderSource(…);
    glCompileShader(shader);

    GLuint program = glCreateProgram();
    glAttachShader(program, shader);
    glLinkProgram(program);
    glUseProgram(program);

[Diagram: Program Object containing Vertex Shader and Pixel Shader]

Page 49: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

BASIC GLSL PIXEL SHADER

    #version 430
    // Not available in GLES. The block name must match the
    // vertex shader's output block (PosUV) for the stages to link.
    in PosUV
    {
        vec3 vPositionWS;
        vec2 vUV;
    } fs_input;

    uniform sampler2D sDiffuse;
    out vec4 color_out;

    void main(void)
    {
        color_out = texture( sDiffuse, fs_input.vUV );
    }

Page 50: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

BASIC GLSL VERTEX SHADER

    #version 430
    in vec3 Position;
    in vec2 UV;
    out PosUV // Not available in GLES
    {
        vec3 vPositionWS;
        vec2 vUV;
    } vs_output;
    uniform mat4x4 mMVP;
    uniform mat4x4 mM;

    void main(void)
    {
        gl_Position = mMVP * vec4(Position, 1.0);
        // mat4 * vec4 yields a vec4; keep xyz for the world-space position
        vs_output.vPositionWS = (mM * vec4(Position, 1.0)).xyz;
        vs_output.vUV = UV;
    }

Page 51: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

SHADER MANAGEMENT - DX

    D3DCompile(source,..,vs_5_0,..,&pByteCode)
    pShader = CreateVertexShader(pByteCode);
    VSSetShader(pShader,0,0);

- No program / link concept in API

Page 52: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

PROGRAM BINARIES

OpenGL:

    glGetProgramBinary(program,…,format,pBinaryOut);

- Program level
- In theory: format choices
- In practice: somewhat final, non-portable

DirectX:

    D3DCompile(source,..,vs_5_0,..,&pByteCode)

- Shader level
- Portable byte code

Page 53: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

DRAW CALLS

OpenGL                               D3D
glDrawArrays                         Draw
glDrawArraysInstanced                DrawInstanced(…,0)
glDrawArraysInstancedBaseInstance    DrawInstanced
glDrawArraysIndirect                 DrawInstancedIndirect
glMultiDrawArrays                    for(int i=0; i<n; ++i) Draw(count[i], start[i]);
glMultiDrawArraysIndirect            for(int i=0; i<n; ++i) DrawInstancedIndirect(…)
glDrawElements                       DrawIndexed
…And so forth
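The multi-draw row of the table can be spelled out as a small sketch. `RecordingContext` is a hypothetical stand-in that only logs calls (it is not a D3D device context), and `multiDrawArrays` is an illustrative helper showing one GL-style multi-draw expanded into a loop of single draws, each taking a vertex count and a start vertex.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical context that records each draw as (vertexCount, startVertex).
struct RecordingContext {
    std::vector<std::pair<int, int>> draws;
    void Draw(int vertexCount, int startVertex) {
        draws.emplace_back(vertexCount, startVertex);
    }
};

// One multi-draw expressed as a loop of single draws.
void multiDrawArrays(RecordingContext& ctx,
                     const std::vector<int>& first,
                     const std::vector<int>& count) {
    assert(first.size() == count.size());
    for (std::size_t i = 0; i < first.size(); ++i)
        ctx.Draw(count[i], first[i]);
}
```

The loop issues one API call per range, whereas the GL entry point hands the whole list to the driver in a single call.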

Page 54: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

COMPUTE SHADERS

OpenGL 4.3                                       D3D11
glDispatchCompute(nGroupsX,nGroupsY,nGroupsZ)    Dispatch(nGroupsX,nGroupsY,nGroupsZ)
glDispatchComputeIndirect(offset)                DispatchIndirect(pResource,offset)
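Both dispatch calls take a number of groups, not a number of threads. A common way to derive that number, shown here as a sketch with a hypothetical helper name, is to round the item count up to a whole number of groups; the shader would then be expected to ignore any thread whose global index falls past the end of the data.

```cpp
#include <cassert>

// Number of thread groups needed so that groups * groupSize >= itemCount.
unsigned groupCount(unsigned itemCount, unsigned groupSize) {
    assert(groupSize > 0);
    // Written without (itemCount + groupSize - 1) to avoid overflow near UINT_MAX.
    return itemCount / groupSize + (itemCount % groupSize != 0 ? 1u : 0u);
}
```

For example, 1000 items with the 64-thread groups used in the compute samples would need 16 groups, of which the last is only partly used.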

Page 55: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

Wrap up

Page 56: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

IMAGE-BASED MODELING

Page 57: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

GENERATING THE MODEL

Render: projection, rasterization, texturing, depth buffering, …

Page 58: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

TressFX HAIR

- AMD technology for high-quality hair rendering
- Thousands of hair strands individually simulated and rendered on the GPU
- DirectCompute physics simulation
- Shader Model 5.0 pixel shader using compute capabilities for rendering

Page 59: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

NOT EXPOSED IN GRAPHICS APIS (YET)

- Local shared memory restricted to
  - Compute
  - Tessellation Control, in a limited sense
- Some OpenCL extensions (e.g., 64-bit atomics)
- Numerical compliance
- Some OpenCL 1.2 additions
- OpenCL 2.0 additions

Page 60: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

SUMMARY

- The graphics pipeline gives you access to different hardware
- Mix and match for the best of both compute and graphics
- There are additional synchronization issues and opportunities

Page 61: PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland

DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.