Computer Graphics 3
Lecture 4: GPU Programming
Dr. Benjamin Mora
University of Wales Swansea
Contents
• Introduction.
• Vertex and Fragment Programs.
• Programming the GPU.
– Assembly Code.
– High Level Languages.
• Examples of applications.
• Conclusion.
Introduction
Introduction
• OpenGL (SGI) shaped the early design of current graphics processors (GPUs).
  – Fixed pipeline.
    • Once the different tests are passed, the fragment color is replaced by the new (textured and interpolated) one.
  – Not realistic enough.
• The graphics pipeline is fed with primitives such as triangles, points, etc., which are rasterized.
• Two main stages:
  – Vertex processing.
  – Fragment (rasterized pixel) processing.
• These two stages have been extended for more realism.
Introduction
• Latest evolutions:
  – Unified shaders.
    • Automatic load balancing of the graphics units between vertex and fragment programs.
    • The smaller the image, the more CPU- and vertex-bound the program is.
    • The larger the image, the more fragment (pixel)-bound the program is.
      – Anti-aliasing and texture filtering settings also affect this balance.
  – Geometry shaders (discussed separately).
Vertex and Fragment Programs
Vertex and Fragment Programs
Daniel Weiskopf, Basics of GPU-Based Programming,
http://www.vis.uni-stuttgart.de/vis04_tutorial/vis04_weiskopf_intro_gpu.pdf
Vertex and Fragment Programs
[Figure: the programmable graphics pipeline. Vertices → Transform and Lighting (replaced by vertex programs: user-defined vertex processing) → Setup → Rasterization → Texture Fetch, Fragment Shading (replaced by fragment programs: user-defined per-pixel processing) → Tests (z, stencil…) → Frame Buffer Blending.]
Programming the GPU
Programming the GPU
• Low-level languages (pseudo-assembler).
  – Help to understand what is possible on the GPU.
  – Large programs are a pain to maintain and optimize.
  – May be specific to the graphics card generation or vendor.
• High-level languages.
  – Easier to write.
  – Early compilers were not very good.
  – Code may be more portable.
• Loops.
Current Low Level Languages (APIs)
• DirectX 9.
  – Vertex shader 2.0.
  – Pixel shader 2.0.
• OpenGL extensions.
  – GL_ARB_vertex_program.
  – GL_ARB_fragment_program.
• Vendor APIs.
  – NVidia vertex and fragment programs.
Current High Level Languages (APIs)
• Microsoft, ATI.
  – High Level Shading Language (HLSL).
• NVidia.
  – Cg.
• OpenGL Shading Language (GLSL).
How to use them?
• Assembly programs:
  – Can be loaded (and compiled) at run-time (OpenGL).
  – Several programs can be loaded at once.
    • The suitable rendering style (i.e., program) can then be applied to each scene primitive.
    • Avoids latency due to pseudo-assembly compilation.
• High-level programs:
  – Are compiled into (pseudo-)assembly code, for example offline before run-time.
  – The resulting assembly code can then be loaded like any assembly program.
Vertex Programs
• Vertex program:
  – Bypasses the T&L unit.
  – GPU instruction set to perform all vertex math.
  – Input: arbitrary vertex attributes.
  – Output: transformed vertex attributes:
    • homogeneous clip-space position (required);
    • colors (front/back, primary/secondary);
    • fog coordinate;
    • texture coordinates;
    • point size.
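The following C sketch is a rough CPU model of the one-vertex-in, one-vertex-out contract of a minimal vertex program. The struct and function names, the row-by-row matrix layout, and the choice of attributes are assumptions made for illustration. A real vertex program runs on the GPU and is written in a shading language, not in C.

```c
/* Hypothetical CPU-side model of a minimal vertex program:
   one vertex in, one vertex out. Names are illustrative only. */
typedef struct { float x, y, z, w; } vec4;

typedef struct {
    vec4 position;   /* object-space position (v[OPOS]) */
    vec4 color;      /* primary color (v[COL0])          */
    vec4 texcoord0;  /* texture coordinate 0 (v[TEX0])   */
} VertexIn;

typedef struct {
    vec4 hpos;       /* homogeneous clip-space position (o[HPOS]), required */
    vec4 col0;       /* primary color (o[COL0])                             */
    vec4 tex0;       /* texture coordinate set 0 (o[TEX0])                  */
} VertexOut;

/* 4-component dot product, like the DP4 instruction. */
static float dp4(vec4 a, vec4 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* mvp[i] holds row i of the modelview-projection matrix (the same for every
   vertex); v holds the attributes of the single vertex being processed. */
VertexOut vertex_program(const vec4 mvp[4], VertexIn v) {
    VertexOut out;
    out.hpos.x = dp4(mvp[0], v.position);
    out.hpos.y = dp4(mvp[1], v.position);
    out.hpos.z = dp4(mvp[2], v.position);
    out.hpos.w = dp4(mvp[3], v.position);
    out.col0 = v.color;       /* pass the color through unchanged      */
    out.tex0 = v.texcoord0;   /* pass the texture coordinate through   */
    return out;
}
```

The one-DP4-per-row structure mirrors the four DP4 instructions that compute o[HPOS] in the NV lighting example of this lecture.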
Vertex Programs
• Customized computation of vertex attributes.
  – Computation of anything that can be interpolated linearly between vertices.
• Limitations:
  – Vertices can be neither generated nor destroyed.
    • Geometry shaders are used for that.
  – No information about the topology or ordering of vertices is available.
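The rasterizer, not the vertex program, performs this interpolation of vertex outputs. Below is a minimal C sketch of linear (barycentric) interpolation of one per-vertex attribute inside a screen-space triangle. It assumes no perspective correction, although real hardware normally interpolates perspective-correctly. The function names are hypothetical.

```c
typedef struct { float x, y; } vec2;

/* Twice the signed area of triangle (a, b, c). */
static float edge(vec2 a, vec2 b, vec2 c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

/* Linearly interpolates the per-vertex values attr[0..2] at point p
   inside the screen-space triangle v[0..2] (no perspective correction). */
float interpolate_at(const vec2 v[3], const float attr[3], vec2 p) {
    float area = edge(v[0], v[1], v[2]);
    float b0 = edge(v[1], v[2], p) / area;  /* barycentric weight of vertex 0 */
    float b1 = edge(v[2], v[0], p) / area;  /* barycentric weight of vertex 1 */
    float b2 = edge(v[0], v[1], p) / area;  /* barycentric weight of vertex 2 */
    return b0 * attr[0] + b1 * attr[1] + b2 * attr[2];
}
```

Because the weights sum to 1 and vary linearly over the triangle, only quantities that can meaningfully be blended linearly (colors, texture coordinates, unnormalized normals) should be passed this way.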
Vertex Programs
• Vertex programs bypass the following OpenGL functionality:
– Vertex transformations.
• The modelview and projection matrix transformations.
– Normal transformations and normalizations.
– Color material.
– Per-vertex lighting.
– Texture coordinate generation.
– Texture matrix transformations.
– Raster position transformation.
– Client-defined clip planes.
– Per-vertex processing in EXT_point_parameters.
– Per-vertex processing in NV_fog_distance.
– Per-vertex point size computations.
Vertex Programs
• What is not replaced?
  – The view frustum clip.
  – Perspective divide (division by w).
  – The viewport transformation.
  – The depth range transformation.
  – Clamping of the primary and secondary colors to [0,1].
  – Primitive assembly and per-fragment operations.
  – Evaluators (except the AUTO_NORMAL normalization).
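The steps that stay fixed-function can be sketched in C using the standard OpenGL formulas for the perspective divide, the viewport transform (glViewport) and the depth-range mapping (glDepthRange). The function name is hypothetical, and clipping is assumed to have already been done.

```c
typedef struct { float x, y, z, w; } vec4;
typedef struct { float x, y, z; } vec3;

/* Fixed-function steps that still run after a vertex program.
   (x0, y0, width, height) mirror glViewport; (n, f) mirror glDepthRange. */
vec3 clip_to_window(vec4 clip, float x0, float y0, float width, float height,
                    float n, float f) {
    /* Perspective divide: clip space -> normalized device coordinates. */
    float xd = clip.x / clip.w;
    float yd = clip.y / clip.w;
    float zd = clip.z / clip.w;
    vec3 win;
    win.x = (xd + 1.0f) * 0.5f * width + x0;        /* viewport transform */
    win.y = (yd + 1.0f) * 0.5f * height + y0;
    win.z = zd * 0.5f * (f - n) + 0.5f * (n + f);   /* depth range transform */
    return win;
}
```

For example, with a 640×480 viewport and the default depth range [0, 1], the clip-space point (0, 0, 0, 1) lands at window position (320, 240) with depth 0.5.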
NV Vertex Programs
• Different versions: 1.0, 1.1, 2.0, 3.0.
• Version 1.0:
  – 12 temporary vector registers (xyzw): R0 to R11.
  – 96 read-only vector registers (xyzw).
    • Specified outside of glBegin/glEnd.
  – 8 matrices.
  – 17 different vertex program instructions.
    • (128 instructions max. per program.)
    • 27 instructions in the shader 3.0 model.
NV Vertex Programs
• Input parameters for the vertices (v[]):
Mnemonic Number Typical Meaning
– OPOS 0 object position
– WGHT 1 vertex weight
– NRML 2 normal
– COL0 3 primary color
– COL1 4 secondary color
– FOGC 5 fog coordinate
– TEX0 8 texture coordinate 0
– TEX1 9 texture coordinate 1
– TEX2 10 texture coordinate 2
– TEX3 11 texture coordinate 3
– TEX4 12 texture coordinate 4
– TEX5 13 texture coordinate 5
– TEX6 14 texture coordinate 6
– TEX7 15 texture coordinate 7
NV Vertex Programs
• New output values for the vertices (o[]):
Mnemonic Typical Meaning
– HPOS Homogeneous clip space position (x,y,z,w)
– COL0 Primary color (front-facing) (r,g,b,a)
– COL1 Secondary color (front-facing) (r,g,b,a)
– BFC0 Back-facing primary color (r,g,b,a)
– BFC1 Back-facing secondary color (r,g,b,a)
– FOGC Fog coordinate (f,*,*,*)
– PSIZ Point size (p,*,*,*)
– TEX0 Texture coordinate set 0 (s,t,r,q)
– TEX1 Texture coordinate set 1 (s,t,r,q)
– TEX2 Texture coordinate set 2 (s,t,r,q)
– TEX3 Texture coordinate set 3 (s,t,r,q)
– TEX4 Texture coordinate set 4 (s,t,r,q)
– TEX5 Texture coordinate set 5 (s,t,r,q)
– TEX6 Texture coordinate set 6 (s,t,r,q)
– TEX7 Texture coordinate set 7 (s,t,r,q)
NV Vertex Programs
• Vertex program instructions (inputs: s = scalar, v = vector; outputs: v = vector, ssss = scalar replicated to all four components):
OpCode Inputs Output Operation
ARL s address register address register load
MOV v v move
MUL v,v v multiply
ADD v,v v add
MAD v,v,v v multiply and add
RCP s ssss reciprocal
RSQ s ssss reciprocal square root
DP3 v,v ssss 3-component dot product
DP4 v,v ssss 4-component dot product
DST v,v v distance vector
MIN v,v v minimum
MAX v,v v maximum
SLT v,v v set on less than
SGE v,v v set on greater than or equal
EXP s v exponential base 2 (approximate)
LOG s v logarithm base 2 (approximate)
LIT v v light coefficients
NV Vertex Programs
• Special instruction manipulation:
  – Use of negated values:
    • MOV R0, -R1;
    • ADD R0, R1, -R2; # R0 <= R1 - R2 (vector operation)
  – Registers can be swizzled:
    • MOV R1, R1.wzyx;
    • ADD R1, R1, R1.xzxy;
  – Effect of ADD R1, R1, R1.xzxy (components x y z w):
    Old R1: 1 3 7 11
    New R1: 2 10 8 14
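To make these operand modifiers concrete, here is a small CPU emulation of swizzling, negation and ADD on 4-component registers. The `reg4` type and the helper names are hypothetical. Only the semantics follow the instructions above.

```c
typedef struct { float v[4]; } reg4;  /* components: x=0, y=1, z=2, w=3 */

/* Reorders src by a 4-letter swizzle such as "wzyx" or "xzxy"
   (any letter other than x, y, z is treated as w). */
reg4 swizzle(reg4 src, const char *pattern) {
    reg4 out;
    for (int i = 0; i < 4; ++i) {
        switch (pattern[i]) {
        case 'x': out.v[i] = src.v[0]; break;
        case 'y': out.v[i] = src.v[1]; break;
        case 'z': out.v[i] = src.v[2]; break;
        default:  out.v[i] = src.v[3]; break;
        }
    }
    return out;
}

/* Negated operand, as in -R2. */
reg4 negate(reg4 a) {
    reg4 out;
    for (int i = 0; i < 4; ++i) out.v[i] = -a.v[i];
    return out;
}

/* ADD dst, a, b: component-wise addition. */
reg4 add(reg4 a, reg4 b) {
    reg4 out;
    for (int i = 0; i < 4; ++i) out.v[i] = a.v[i] + b.v[i];
    return out;
}
```

With R1 = (1, 3, 7, 11), `add(R1, swizzle(R1, "xzxy"))` adds (1, 7, 1, 3) and reproduces the new R1 = (2, 10, 8, 14) shown in the table.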
NV Vertex Programs
• Example: Normal Normalization.
# v[NRML] = (nx,ny,nz)
#
# R0.xyz = normalize(v[NRML])
# R0.w = 1/sqrt(nx*nx + ny*ny + nz*nz)
#
!!VP1.0
MOV R1, v[NRML] ;
DP3 R0.w, R1, R1;
RSQ R0.w, R0.w;
MUL R0.xyz, R1, R0.wwww;
# Then use R0 to compute shading...
MOV o[COL0],...
NV Vertex Programs
!!VP1.0
# Simple specular and diffuse lighting computation with an eye-space normal
#
# c[0-3] = modelview projection (composite) matrix
# c[4-7] = modelview inverse transpose
# c[32] = normalized eye-space light direction (infinite light)
# c[33] = normalized constant eye-space half-angle vector (infinite viewer)
# c[35].x = pre-multiplied monochromatic diffuse light color & diffuse material
# c[35].y = pre-multiplied monochromatic ambient light color & diffuse material
# c[36] = specular color
# c[38].x = specular power
#
# outputs homogeneous position and color
#
DP4 o[HPOS].x, c[0], v[OPOS];
DP4 o[HPOS].y, c[1], v[OPOS];
DP4 o[HPOS].z, c[2], v[OPOS];
DP4 o[HPOS].w, c[3], v[OPOS];
DP3 R0.x, c[4], v[NRML];
DP3 R0.y, c[5], v[NRML];
DP3 R0.z, c[6], v[NRML]; # R0 = n' = transformed normal
DP3 R1.x, c[32], R0; # R1.x = Lpos DOT n'
DP3 R1.y, c[33], R0; # R1.y = hHat DOT n'
MOV R1.w, c[38].x; # R1.w = specular power
LIT R2, R1; # Compute lighting values
MAD R3, c[35].x, R2.y, c[35].y; # diffuse + ambient
MAD o[COL0].xyz, c[36], R2.z, R3; # + specular
END
NV Fragment Programs
• Similar to vertex programs.
  – Same way to load programs.
  – Inputs and outputs are different.
  – Different instruction set.
    • More instructions, but mostly similar ones.
• Versions available: 1.0, 2.0, and 4.0.
  – 64 constant vector registers.
  – 32 registers at 32-bit floating-point precision, or 64 registers at 16-bit floating-point precision.
NV Fragment Programs
Fragment Program Inputs
Register Name Description
f[WPOS] Position of the fragment center (x,y,z,1/w)
f[COL0] Interpolated primary color (r,g,b,a)
f[COL1] Interpolated secondary color (r,g,b,a)
f[FOGC] Interpolated fog distance/coord (z,0,0,0)
f[TEX0] Texture coordinate (unit 0) (s,t,r,q)
f[TEX1] Texture coordinate (unit 1) (s,t,r,q)
f[TEX2] Texture coordinate (unit 2) (s,t,r,q)
f[TEX3] Texture coordinate (unit 3) (s,t,r,q)
f[TEX4] Texture coordinate (unit 4) (s,t,r,q)
f[TEX5] Texture coordinate (unit 5) (s,t,r,q)
f[TEX6] Texture coordinate (unit 6) (s,t,r,q)
f[TEX7] Texture coordinate (unit 7) (s,t,r,q)
NV Fragment Programs
Fragment Program Outputs
Register Name Description
o[COLR] Final RGBA fragment color, fp32 format (color programs)
o[COLH] Final RGBA fragment color, fp16 format (color programs)
o[DEPR] Final fragment depth value, fp32 format
o[TEX0] TEXTURE0 output, fp16 format (combiner programs)
o[TEX1] TEXTURE1 output, fp16 format (combiner programs)
o[TEX2] TEXTURE2 output, fp16 format (combiner programs)
o[TEX3] TEXTURE3 output, fp16 format (combiner programs)
Write access only!
NV Fragment Programs
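Fragment Program Instruction Set (V2.0)
(Suffixes: R/H/X = fp32/fp16/fx12 precision, C = update condition codes, _SAT = clamp the result to [0,1].)
Instruction Inputs Output Description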
ADD[RHX][C][_SAT] v,v v add
COS[RH ][C][_SAT] s ssss cosine
DDX[RH ][C][_SAT] v v derivative relative to x
DDY[RH ][C][_SAT] v v derivative relative to y
DP3[RHX][C][_SAT] v,v ssss 3-component dot product
DP4[RHX][C][_SAT] v,v ssss 4-component dot product
DST[RH ][C][_SAT] v,v v distance vector
EX2[RH ][C][_SAT] s ssss exponential base 2
FLR[RHX][C][_SAT] v v floor
FRC[RHX][C][_SAT] v v fraction
KIL none none conditionally discard fragment
LG2[RH ][C][_SAT] s ssss logarithm base 2
LIT[RH ][C][_SAT] v v compute light coefficients
LRP[RHX][C][_SAT] v,v,v v linear interpolation
MAD[RHX][C][_SAT] v,v,v v multiply and add
MAX[RHX][C][_SAT] v,v v maximum
MIN[RHX][C][_SAT] v,v v minimum
MOV[RHX][C][_SAT] v v move
MUL[RHX][C][_SAT] v,v v multiply
PK2H v ssss pack two 16-bit floats
PK2US v ssss pack two unsigned 16-bit scalars
PK4B v ssss pack four signed 8-bit scalars
PK4UB v ssss pack four unsigned 8-bit scalars
POW[RH ][C][_SAT] s,s ssss exponentiation (x^y)
NV Fragment Programs
Fragment Program Instruction Set (V2.0, continued)
Instruction Inputs Output Description
RCP[RH ][C][_SAT] s ssss reciprocal
RFL[RH ][C][_SAT] v,v v reflection vector
RSQ[RH ][C][_SAT] s ssss reciprocal square root
SEQ[RHX][C][_SAT] v,v v set on equal
SFL[RHX][C][_SAT] v,v v set on false
SGE[RHX][C][_SAT] v,v v set on greater than or equal
SGT[RHX][C][_SAT] v,v v set on greater than
SIN[RH ][C][_SAT] s ssss sine
SLE[RHX][C][_SAT] v,v v set on less than or equal
SLT[RHX][C][_SAT] v,v v set on less than
SNE[RHX][C][_SAT] v,v v set on not equal
STR[RHX][C][_SAT] v,v v set on true
SUB[RHX][C][_SAT] v,v v subtract
TEX[C][_SAT] v v texture lookup
TXD[C][_SAT] v,v,v v texture lookup w/partials
TXP[C][_SAT] v v projective texture lookup
UP2H[C][_SAT] s v unpack two 16-bit floats
UP2US[C][_SAT] s v unpack two unsigned 16-bit scalars
UP4B[C][_SAT] s v unpack four signed 8-bit scalars
UP4UB[C][_SAT] s v unpack four unsigned 8-bit scalars
X2D[RH ][C][_SAT] v,v,v v 2D coordinate transformation
NV Fragment Programs
• Simple Example: Red Colouring of the fragments (i.e., rasterized pixels):
!!FP1.0
DEFINE red={1.0,0,0,0};
MOV o[COLR], red;
END
• Simple example: applying a single texture.
!!FP1.0
TEX R0, f[TEX0], TEX0, 2D; # Last parameter can be 1D, 2D, 3D or RECT
MOV o[COLR],R0;
END
NV Fragment Programs
• Useful instructions:
  – LRP: linear interpolation.
  – SIN, COS, …
  – SGE, SLT, …: set each component to 1.0 or 0.0 according to the comparison.
  – KIL: discard the fragment (stops its computation).
  – Pack and unpack instructions.
• Most instructions execute in 1 cycle (not counting texture accesses).
• Most instructions can update the condition codes (C suffix, e.g., MOV => MOVC), which can then be used to write results conditionally.
• Most instructions can clamp their results between 0 and 1.
  – MOV => MOV_SAT.
• Loops are now possible with the latest generation.
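Here is a scalar C sketch of several of these instructions. The real instructions work component-wise on 4-vectors. The `fragment` function is a hypothetical example program, not an NV API. It uses an SLT-style test to decide whether to discard the fragment, which is the job KIL does on the GPU.

```c
/* LRP dst, a, b, c  ->  a * b + (1 - a) * c */
float lrp(float a, float b, float c) { return a * b + (1.0f - a) * c; }

/* _SAT suffix: clamp the result to [0, 1]. */
float sat(float x) { return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x); }

/* SLT / SGE: 1.0 if the comparison holds, else 0.0. */
float slt(float a, float b) { return a < b ? 1.0f : 0.0f; }
float sge(float a, float b) { return a >= b ? 1.0f : 0.0f; }

/* Hypothetical fragment "program": discard if alpha < threshold (KIL-like),
   otherwise blend c0 -> c1 by t and saturate (LRP followed by _SAT).
   Returns 0 if the fragment is discarded, 1 if a color was written. */
int fragment(float t, float c0, float c1, float alpha, float threshold,
             float *out_color) {
    if (slt(alpha, threshold) != 0.0f) return 0;   /* discarded */
    *out_color = sat(lrp(t, c1, c0));              /* t = 0 gives c0, t = 1 gives c1 */
    return 1;
}
```

Such alpha-test style discards are a common use of KIL, since the discarded fragment never reaches the depth test or the frame buffer.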
(Silly) Limitations
• Most of the limitations exist for performance reasons.
• At the fragment level, the frame buffer cannot really be accessed in read-write mode.
  – The new pixel value cannot be computed from the old one.
  – Full floating-point (FP32) filtering and blending are only available on recent graphics cards (NV 8x00 generation). Previous cards (e.g., the GeForce 7800 series) could only filter and blend at FP16 precision.
  – The actual number of physical registers may be smaller than the number of logical registers.
    • Programs that use a large number of registers run slower.
High Level Languages
• Why?
  – Assembly programming can be tedious for long shaders.
  – Programming and debugging are inefficient and difficult.
  – High-level languages are more portable.
• But:
  – The final code may be slower.
High Level Languages: Cg Overview
• C for Graphics.
  – Syntax similar to C, for easy shader writing.
  – See the Cg manual:
http://developer.nvidia.com/object/cg_toolkit.html
• Vertex and fragment programs take specific input vectors and values, and must return specific outputs.
• The data structures used as input and output parameters of a function need to be declared.
Cg: Inputs
• Two kinds of shader inputs:
  – Varying inputs.
    • Inputs that are specific to each entity processed.
      – Vertex: position, normal, etc.
      – Fragment: interpolated values such as colors, texture coordinates, etc.
  – Uniform inputs.
    • Values that do not change while vertices are streamed.
      – Vertex level: transformation matrix.
      – Fragment level: constant parameters, …
Cg: Vertex Program Inputs
• Supported inputs to a Cg vertex program (binding semantics):
  – POSITION.
  – BLENDWEIGHT.
  – NORMAL.
  – TANGENT.
  – BINORMAL.
  – PSIZE.
  – BLENDINDICES.
  – TEXCOORD0–TEXCOORD7.
• Every parameter can be declared as a float vector of 1 to 4 components (float, float2, float3, float4).
  – float3 myPosition : POSITION;
Cg: Vertex Program Inputs
• Example from the Cg User's Manual:
struct myinputs {
float3 myPosition : POSITION;
float3 myNormal : NORMAL;
float3 myTangent : TANGENT;
float refractive_index : TEXCOORD3;
};
outdata foo(myinputs indata) {
/* ... */
// Within the program, the parameters are referred to as
// “indata.myPosition”, “indata.myNormal”, and so on.
/* ... */
}
Cg: Vertex Program Inputs
• Inputs can be specified directly as function parameters (rather than grouped in a struct).
• Example from the Cg User's Manual:
outdata foo( float3 myPosition : POSITION,
float3 myNormal : NORMAL,
float3 myTangent : TANGENT,
float refractive_index : TEXCOORD3) {
/* ... */
}
Cg: Vertex Program Varying Output
• The vertex program output type should match the fragment program input type.
• The binding semantics help the compiler associate the vertex outputs with the fragment inputs (interoperability).
• The semantics do not actually impose a specific use for those channels.
  – For example, texture coordinates can be used to pass colors or positions.
Cg: Vertex Program Varying Output
• Supported outputs of a vertex program:
  – POSITION.
  – PSIZE.
  – FOG.
  – COLOR0–COLOR1.
  – TEXCOORD0–TEXCOORD7.
Cg: Vertex Program Varying Output
• Example from the Cg User's Manual:
// Vertex program (inside a Cg file…)
struct myvf {
float4 pout : POSITION; // Used for rasterization
float4 diffusecolor : COLOR0;
float4 uv0 : TEXCOORD0;
float4 uv1 : TEXCOORD1;
};
myvf foo(/* ... */) {
myvf outstuff;
/* ... */
return outstuff;
}
Cg: Input/Output Interoperability
• Example from the Cg User's Manual:
struct myvert2frag {
float4 pos : POSITION;
float4 uv0 : TEXCOORD0;
float4 uv1 : TEXCOORD1;
};
// Vertex program
myvert2frag vertmain(...) {
myvert2frag outdata;
/* ... */
return outdata;
}
// Fragment program
void fragmain(myvert2frag indata ) {
float4 tcoord = indata.uv0;
/* ... */
}
Cg: Fragment Program Varying Output
• Two supported outputs: COLOR and DEPTH.
• Examples:
void main(/* ... */, out float4 color : COLOR, out float depth : DEPTH) {
/* ...*/
color = diffuseColor * /* ...*/;
depth = /*...*/;
}
float4 main(/* ... */) : COLOR {
/* ... */
return diffuseColor * /* ... */;
}
Cg: General Coding
• Different variable types can be declared:
  – float, half (16 bits), fixed (12 bits).
  – int, bool.
  – float1, float4, bool4, bool1, …
  – float1x1, float2x2, …
  – Arrays.
• Auxiliary functions can be declared.
• A wide set of functions and operators is also available.
Cg: General Coding
• Control flow:
  – if, else, while, for.
• Function definitions and function overloading.
• Arithmetic operators from C.
• Multiplication function mul():
  – matrix × vector, vector × matrix, matrix × matrix.
• Vector constructor.
• Boolean and comparison operators.
• Swizzle operator.
  – float4 a; => a.xxxx;
• Write mask operator.
  – float4 color = float4(1.0, 1.0, 0.0, 0.0); color.a = 2.0;
• Conditional operator (?:).
Cg: General Coding
• Standard nonprojective texture lookup:
  – tex2D(sampler2D tex, float2 s);
  – texRECT(samplerRECT tex, float2 s);
  – texCUBE(samplerCUBE tex, float3 s);
• Standard projective texture lookup:
  – tex2Dproj(sampler2D tex, float3 sq);
  – texRECTproj(samplerRECT tex, float3 sq);
  – texCUBEproj(samplerCUBE tex, float4 sq);
• Math functions:
  – abs, cos, sin, tan, acos, asin, atan, clamp, determinant, exp, log, floor, lerp, min, max, pow, sqrt, normalize, …
Applications
Application: Procedural Texturing
ref: New York University Media Research Lab, http://mrl.nyu.edu/projects/texture/
• Application of textures that are not image-based.
  – Combination of noise and various math expressions (Perlin noise).
  – Representation of wood, marble, stone, clouds, waves, bumps…
  – Can be computed at the fragment level.
  – Adds computation, but reduces bandwidth.
  – Avoids the issue of texturing curved surfaces.
Application: Phong Shading
ref: New York University Media Research Lab, http://mrl.nyu.edu/projects/texture/
• The traditional OpenGL pipeline implements Gouraud shading (interpolation).
  – Colors and lighting are computed at the vertices, then linearly interpolated.
  – Can miss specular highlights that occur in the middle of a triangle.
• Phong interpolation is better.
  – First, linearly interpolate the normal across the triangle.
  – Then compute Phong shading from the interpolated normal.
Application: Phong Shading
Ian Fergusson, https://www.cis.strath.ac.uk/teaching/ug/classes/52.359/lect13.pdf
Application: Phong Shading
• How to realize a Phong interpolation ?
– Pass the normal as a texture coordinate at the vertex level.
– The texture coordinates will be automatically interpolated at the fragment level.
– In the fragment program, first renormalize the interpolated normal, then compute Phong shading.
Other Applications
• Bump mapping.
  – Can be done at the vertex or at the fragment level.
• Volume rendering.
  – Use of 3D textures.
• GPGPU.
  – General-Purpose computation on Graphics Processing Units.
  – Very high floating-point throughput (many GFLOPS).
  – Scientific calculations such as Fourier transforms.
• Geometry modification (animation, morphing…).