Post on 29-Dec-2015

GPU Shading and Rendering

Shading Technology

8:30 Introduction (:30–Olano)

9:00 Direct3D 10 (:45–Blythe)

Languages, Systems and Demos

10:30 RapidMind (:50–McCool)

11:20 OpenGL Shading Language (:50–Olano)

1:45 Cg / NVIDIA (:50–Kilgard)

2:35 HLSL / ATI (:50–Scheuermann)

GPUs in Production Rendering

3:45 GPU Production Animation (:45–Wexler)

4:30 Interactive Cinematic Shading. Where are we? (:45–Pellacini)

Wrap Up

5:15 Discussion and Q&A (:15–All)

GPU Shading and Rendering: Introduction

Marc Olano

UMBC

America's Army

GPU

• GPU: Graphics Processing Unit

– Designed for real-time graphics

– Present in almost every PC

– Increasing realism and complexity

GPU computation

[Figure: pipeline diagram with boxes CPU, Vertex, Geometry, Fragment, Texture / Buffer, and Displayed Pixels]

Low-level code

!!ARBvp1.0
# Transform the normal to view space
TEMP Nv, Np;
DP3 Nv.x, state.matrix.modelview.invtrans.row[0], vertex.normal;
DP3 Nv.y, state.matrix.modelview.invtrans.row[1], vertex.normal;
DP3 Nv.z, state.matrix.modelview.invtrans.row[2], vertex.normal;
MAD Np, Nv, {.9,.9,.9,0}, {0,0,0,1};

# screen position from vertex
TEMP Vp;
DP4 Vp.x, state.matrix.mvp.row[0], vertex.position;
DP4 Vp.y, state.matrix.mvp.row[1], vertex.position;
DP4 Vp.z, state.matrix.mvp.row[2], vertex.position;
DP4 Vp.w, state.matrix.mvp.row[3], vertex.position;
[…]
# interpolate
MAD Np, Np, -vertex.color.x, Np;
MAD result.position, Vp, vertex.color.x, Np;
END

High-level code

void main() {
    vec4 Kin = gl_Color; // key input

    // screen position from vertex, texture and normal
    vec4 Vp = ftransform();
    vec4 Tp = vec4(gl_MultiTexCoord0.xy*1.8-.9, 0,1);
    vec4 Np = vec4(nn*.9,1);

    // interpolate between Vp, Tp and Np
    gl_Position = Vp;
    gl_Position = mix(Tp,gl_Position,pow(1.-Kin.x,8.));
    gl_Position = mix(Np,gl_Position,pow(1.-Kin.y,8.));

    // copy to output
    gl_TexCoord[0] = gl_MultiTexCoord0;
    gl_TexCoord[1] = Vp;
    gl_TexCoord[3] = Kin;
}
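The interpolation step in the high-level shader relies on GLSL's mix(x, y, a), which returns x*(1-a) + y*a per component. The following is a minimal CPU-side sketch in Python of that blend, not the shader itself. The helper names and the sample positions are assumptions made for illustration.

```python
def mix(x, y, a):
    """Component-wise GLSL-style mix: x*(1-a) + y*a."""
    return [xi * (1.0 - a) + yi * a for xi, yi in zip(x, y)]


def blend_position(vp, tp, np_, kin_x, kin_y):
    """Mirror the shader: start at Vp, move toward Tp and Np as the key inputs rise."""
    pos = vp
    pos = mix(tp, pos, (1.0 - kin_x) ** 8)
    pos = mix(np_, pos, (1.0 - kin_y) ** 8)
    return pos


# Hypothetical sample positions in homogeneous coordinates.
vp = [0.5, 0.5, 0.0, 1.0]
tp = [-0.9, -0.9, 0.0, 1.0]
np_ = [0.0, 0.9, 0.0, 1.0]

at_vertex = blend_position(vp, tp, np_, 0.0, 0.0)   # both keys at 0: stays at Vp
at_texture = blend_position(vp, tp, np_, 1.0, 0.0)  # Kin.x at 1: lands on Tp
print(at_vertex, at_texture)
```

The eighth power keeps the position close to Vp until a key input gets near 1, which is why the blend weights are computed from pow(1.-Kin.x, 8.) rather than from Kin.x directly.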

Non-real-time vs. Real-time

• Non-real-time

– Developed from general CPU code

– Seconds to hours per frame

– 1000s of lines

– “Unlimited” computation, texture, memory, …

• Real-time

– Developed from fixed-function hardware

– Tens of frames per second

– 1000s of instructions

– Limited computation, texture, memory, …

Non-real-time vs. Real-time

• Non-real-time [Figure: Application, then Displacement, Surface, Light, Volume, Atmosphere, and Imager shaders, then Displayed Pixels]

• Real-time [Figure: Application, then Vertex, Geometry, and Fragment stages with Texture / Buffer, then Displayed Pixels]

History (not real-time)

• Testbed [Whitted and Weimer 1981]

• Shade Trees [Cook 1984]

• Image Synthesizer [Perlin 1985]

• RenderMan [Hanrahan and Lawson 1990]

• Multi-pass RenderMan [Peercy et al. 2000]

• GPU acceleration [Wexler et al. 2005]

History (real-time)

• Custom HW [Olano and Lastra 1998]

• Multi-pass standard HW [Peercy et al. 2000]

• Register combiners [NVIDIA 2000]

• Vertex programs [Lindholm et al. 2001]

• Compiling to mixed HW [Proudfoot et al. 2001]

• Fragment programs

• Standardized languages

• Geometry shaders [Blythe 2006]

Choices

• OS: Windows, Mac, Linux

• API: DirectX, OpenGL

• Language: HLSL, GLSL, Cg, …

• Compiler: DirectX, OpenGL, Cg, ASHLI

• Runtime: CgFX, ASHLI, OSG (& others), sample code

Major Commonalities

• Vertex & Fragment/Pixel

• C-like, if/while/for

• Structs & arrays

• Float + small vector and matrix

– Swizzle & mask (a.xyz = b.xxw)

• Common math & shading functions
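As a rough illustration of the swizzle and mask notation above, here is a small Python model of what a.xyz = b.xxw does. Shading languages provide this as built-in syntax; the function names swizzle and masked_assign are hypothetical helpers for this sketch only.

```python
INDEX = {"x": 0, "y": 1, "z": 2, "w": 3}


def swizzle(v, pattern):
    """Read components of v in the order named by pattern, e.g. 'xxw'."""
    return [v[INDEX[c]] for c in pattern]


def masked_assign(dst, mask, values):
    """Return a copy of dst with only the components named by mask overwritten."""
    out = list(dst)
    for c, val in zip(mask, values):
        out[INDEX[c]] = val
    return out


a = [0.0, 0.0, 0.0, 9.0]
b = [1.0, 2.0, 3.0, 4.0]

# Equivalent of a.xyz = b.xxw: a.w is left untouched.
a = masked_assign(a, "xyz", swizzle(b, "xxw"))
print(a)  # [1.0, 1.0, 4.0, 9.0]
```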

GPU Parallelism

[Figure, built up over several slides, showing levels of GPU parallelism:]

• Pipeline: Texture / Buffer, Vertex, Geometry, Fragment

• SPMD parallel: fragment stream

• SIMD parallel: 2x2 block of fragments

• Pipeline (NVIDIA): Shader Unit, Texture Unit, Branch Unit, Fog, L1 Cache, L2 Cache

• Vector parallel, limited MIMD: ALUs

Managing GPU Programming

• Simplified computational model

– Bonus: consistent as hardware changes

• All stages SIMD

– Explicit 4-element SIMD vectors

• Fixed conversion / remapping between each stage

[Figure: Buffer, Vertex (stream), Geometry (stream), Fragment (array)]

Vertex

• One element in / one out

• NO communication

• Can select fragment address
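The vertex stage behaves like a map over a stream: each element is processed on its own and produces exactly one output. A minimal Python sketch of that model follows. The function names and the sample transform are assumptions for illustration, not any API.

```python
def run_vertex_stage(vertices, vertex_program):
    """Apply the program to each element independently: one in, one out."""
    return [vertex_program(v) for v in vertices]


def scale_and_offset(v):
    """Hypothetical per-vertex program; it sees only its own element."""
    x, y = v
    # The output position is what later decides which fragments are covered.
    return (2.0 * x + 1.0, 2.0 * y)


stream = [(0.0, 0.0), (1.0, 0.5), (-1.0, 2.0)]
transformed = run_vertex_stage(stream, scale_and_offset)
print(transformed)  # [(1.0, 0.0), (3.0, 1.0), (-1.0, 4.0)]
```

Because no element can read another element's result, the elements can be processed in any order or in parallel.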

Geometry

• More in the next talk (Blythe)

• One element in / 0 to ~100 out

– Limited by hardware buffer sizes

• Like vertex:

– NO communication

– Can select fragment address
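The geometry stage differs from the vertex stage mainly in its output count: one input element may emit zero or many outputs, up to a hardware limit. The Python sketch below models that as a flat-map. MAX_OUTPUTS and the sample program are assumptions; the real limit depends on the hardware buffer sizes.

```python
MAX_OUTPUTS = 100  # assumed cap standing in for the hardware buffer limit


def run_geometry_stage(primitives, geometry_program):
    """Each input primitive independently emits 0..MAX_OUTPUTS outputs."""
    out = []
    for prim in primitives:
        emitted = list(geometry_program(prim))
        if len(emitted) > MAX_OUTPUTS:
            raise ValueError("geometry program exceeded output limit")
        out.extend(emitted)
    return out


def cull_or_split(segment):
    """Hypothetical program: drop zero-length segments, split the rest in two."""
    a, b = segment
    if a == b:
        return  # emits nothing
    mid = (a + b) / 2.0
    yield (a, mid)
    yield (mid, b)


segments = [(0.0, 4.0), (1.0, 1.0), (2.0, 3.0)]
emitted = run_geometry_stage(segments, cull_or_split)
print(emitted)  # [(0.0, 2.0), (2.0, 4.0), (2.0, 2.5), (2.5, 3.0)]
```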

Fragment

• Biggest computational resource

• One element in / 0 – 1 out

• Cannot change destination address

– I am element x,y in an array, what is my value?

• Effectively no communication

• Conditionals expensive

– Better when fragments in a block take the same branch (block coherence)
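The fragment model can be read as "I am element x,y in an array, what is my value?". The Python sketch below makes that concrete: the program is called once per pixel, may read (gather) from any texture location, and either returns a value for its own location or discards. All names and the sample program are hypothetical.

```python
def run_fragment_stage(width, height, fragment_program, texture):
    """Evaluate the program once per pixel; it returns a value or None (discard)."""
    framebuffer = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            value = fragment_program(x, y, texture)
            if value is not None:          # None models a discarded fragment
                framebuffer[y][x] = value  # destination is fixed at (x, y)
    return framebuffer


def horizontal_average(x, y, texture):
    """Hypothetical program: gather from the left neighbor, discard column 0."""
    if x == 0:
        return None
    return (texture[y][x - 1] + texture[y][x]) / 2.0


texture = [[1.0, 3.0, 5.0],
           [2.0, 4.0, 6.0]]
result = run_fragment_stage(3, 2, horizontal_average, texture)
print(result)  # [[0.0, 2.0, 4.0], [0.0, 3.0, 5.0]]
```

The program never receives a way to write anywhere other than its own (x, y), which is the "cannot change destination address" restriction.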

Program / Multiple Passes

• Communication

– None in one pass

– Arbitrary read addresses between passes

• Data layout

– No persistent per-processor memory

– No penalty to change
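Since elements cannot communicate within a pass, results are combined by writing a buffer in one pass and reading it at arbitrary addresses in the next. A sum reduction is one common way to show this. The Python sketch below is an assumed example, with each loop iteration standing in for one rendering pass.

```python
def reduce_pass(src):
    """One pass: each output element gathers two elements written by the previous pass."""
    half = (len(src) + 1) // 2
    return [
        src[2 * i] + (src[2 * i + 1] if 2 * i + 1 < len(src) else 0.0)
        for i in range(half)
    ]


def multipass_sum(data):
    """Repeat passes until one element remains. Assumes data is non-empty."""
    buf = list(data)
    passes = 0
    while len(buf) > 1:
        buf = reduce_pass(buf)  # output of this pass is the input of the next
        passes += 1
    return buf[0], passes


total, passes = multipass_sum([1.0, 2.0, 3.0, 4.0, 5.0])
print(total, passes)  # 15.0 3
```

Each pass is free to use a different buffer size and layout, which matches the "no penalty to change" point above.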

Multiple passes

• GPGPU

• Non-local effects

– Shadow maps

– Texture space

• Precomputation

– Fix some degrees of freedom

– Factor into functions of 1-3D

– Project input or output into another space
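One way to read the precomputation bullets: fix a parameter ahead of time, tabulate the remaining one-dimensional function in an earlier pass, and replace per-fragment arithmetic with a lookup. The Python sketch below assumes a specular-style power function and a 256-entry table. Both choices, and all names, are illustrative rather than taken from the slides.

```python
def build_table(func, size):
    """Precompute func over [0, 1] at `size` evenly spaced samples."""
    return [func(i / (size - 1)) for i in range(size)]


def lookup(table, u):
    """Linearly interpolated lookup, similar in spirit to a filtered 1D texture fetch."""
    u = min(max(u, 0.0), 1.0)
    pos = u * (len(table) - 1)
    i = min(int(pos), len(table) - 2)
    frac = pos - i
    return table[i] * (1.0 - frac) + table[i + 1] * frac


shininess = 16.0  # the degree of freedom fixed ahead of time (assumed value)
table = build_table(lambda c: c ** shininess, 256)

approx = lookup(table, 0.9)
exact = 0.9 ** shininess
print(approx, exact)
```

The table trades a small interpolation error for a cheaper per-fragment cost, and it has to be rebuilt whenever the fixed parameter changes.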
