A User-Programmable Vertex Engine

Erik Lindholm

Mark Kilgard

Henry Moreton

NVIDIA Corporation

Presented by Han-Wei Shen

Where does the Vertex Engine fit?

[Figure: Traditional Graphics Pipeline. Stages: Transform & Lighting, setup / rasterizer, texture blending, frame-buffer anti-aliasing.]

[Figure: GeForce 3 Vertex Engine. The same pipeline, with a Vertex Program stage shown alongside Transform & Lighting.]

API Support

• Designed to fit into the OpenGL and D3D APIs

• Program mode vs. fixed-function mode

• Load and bind program

• Simple to add to old D3D and OpenGL programs

Programming Model

• Enable vertex program: glEnable(GL_VERTEX_PROGRAM_NV);

• Create vertex program object

• Bind vertex program object

• Execute vertex program object

Create Vertex Program

• Programs (assembly) are defined inline as character strings

    static const GLubyte vpgm[] = "\
    !!VP1.0\
    DP4 o[HPOS].x, c[0], v[0];\
    DP4 o[HPOS].y, c[1], v[0];\
    DP4 o[HPOS].z, c[2], v[0];\
    DP4 o[HPOS].w, c[3], v[0];\
    MOV o[COL0], v[3];\
    END";

Create Vertex Program (2)

• Load and bind vertex programs, similar to texture objects

    glLoadProgramNV(GL_VERTEX_PROGRAM_NV, 7,
                    strlen(programString), programString);

    glBindProgramNV(GL_VERTEX_PROGRAM_NV, 7);

Invoke Vertex Program

• The vertex program is initiated when a vertex is given, i.e., by each vertex call such as:

    glBegin(…)
    glVertex3f(x,y,z)
    …
    glEnd()

Let’s look at the sample program

    static const GLubyte vpgm[] = "\
    !!VP1.0\
    DP4 o[HPOS].x, c[0], v[0];\
    DP4 o[HPOS].y, c[1], v[0];\
    DP4 o[HPOS].z, c[2], v[0];\
    DP4 o[HPOS].w, c[3], v[0];\
    MOV o[COL0], v[3];\
    END";

o[HPOS] = M(c0,c1,c2,c3) * v[0]    What is HPOS?

o[COL0] = v[3]    What is COL0?

The program calculates the clip-space position of the vertex (HPOS) and assigns v[3] to the vertex as its diffuse color (COL0).
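To make the trace concrete, here is a minimal CPU sketch in C of what the sample program computes. The names vec4, dp4 and run_sample_program are hypothetical helpers introduced only for illustration; they are not part of OpenGL or of the vertex program extension.

```c
#include <assert.h>

/* A quad-float register; f[0..3] hold x, y, z, w. */
typedef struct { float f[4]; } vec4;
enum { X, Y, Z, W };

/* DP4: four-component dot product. */
float dp4(vec4 a, vec4 b) {
    return a.f[X] * b.f[X] + a.f[Y] * b.f[Y]
         + a.f[Z] * b.f[Z] + a.f[W] * b.f[W];
}

/* Hypothetical emulation of the sample program.
 * c[0..3] hold the four rows of the transform matrix,
 * v[0] is the input position and v[3] the input color. */
void run_sample_program(const vec4 c[4], const vec4 v[16],
                        vec4 *o_hpos, vec4 *o_col0) {
    assert(c && v && o_hpos && o_col0);
    o_hpos->f[X] = dp4(c[0], v[0]);  /* DP4 o[HPOS].x, c[0], v[0]; */
    o_hpos->f[Y] = dp4(c[1], v[0]);  /* DP4 o[HPOS].y, c[1], v[0]; */
    o_hpos->f[Z] = dp4(c[2], v[0]);  /* DP4 o[HPOS].z, c[2], v[0]; */
    o_hpos->f[W] = dp4(c[3], v[0]);  /* DP4 o[HPOS].w, c[3], v[0]; */
    *o_col0 = v[3];                  /* MOV o[COL0], v[3];         */
}
```

Each DP4 multiplies one matrix row by the position, so the four instructions together form one 4x4 matrix-vector product.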

Programming Model

• Vertex Source: v[0] … v[15] (16x4 registers)

• Program Constants: c[0] … c[95] (96x4 registers)

• Temporary Registers: R0 … R11 (12x4 registers)

• Vertex Program: 128 instructions

• Vertex Output: o[HPOS], o[COL0], o[COL1], o[FOGP], o[PSIZ], o[TEX0] … o[TEX7] (15x4 registers)

• All quad floats

Input Vertex Attributes

• v[0] – v[15]

• Aliased (tracked) with conventional per-vertex attributes (Table 3)

• Use glVertexAttribNV() to explicitly assign values

• Can also specify a scalar value to the vertex attribute array - glVertexAttributesNV()

• Can change values inside or outside a glBegin()/glEnd() pair

Program Constants

• Can only change values outside a glBegin()/glEnd() pair

• No automatic aliasing

• Can be used to track OpenGL matrices (modelview, projection, texture, etc.)

• Example:

    glTrackMatrixNV(GL_VERTEX_PROGRAM_NV, 0,
                    GL_MODELVIEW_PROJECTION_NV, GL_IDENTITY_NV)

  - tracks 4 contiguous program constants starting with c[0]

Program Constants (cont’d)

    DP4 o[HPOS].x, c[0], v[OPOS]
    DP4 o[HPOS].y, c[1], v[OPOS]
    DP4 o[HPOS].z, c[2], v[OPOS]
    DP4 o[HPOS].w, c[3], v[OPOS]

What does it do?

Program Constants (cont’d)

    glTrackMatrixNV(GL_VERTEX_PROGRAM_NV, 4,
                    GL_MODELVIEW, GL_INVERSE_TRANSPOSE_NV)

    DP3 R0.x, c[4], v[NRML]
    DP3 R0.y, c[5], v[NRML]
    DP3 R0.z, c[6], v[NRML]

What does it do?
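One way to answer the question: the three DP3 instructions multiply the normal by the upper 3x3 part of the inverse-transpose modelview matrix, which keeps normals perpendicular to the surface under non-uniform scaling. The C sketch below emulates this on the CPU; vec4, dp3 and transform_normal are hypothetical names, and rows[0..2] stand in for c[4..6].

```c
#include <assert.h>

/* A quad-float register; f[0..3] hold x, y, z, w. */
typedef struct { float f[4]; } vec4;

/* DP3: three-component dot product; the w components are ignored. */
float dp3(vec4 a, vec4 b) {
    return a.f[0] * b.f[0] + a.f[1] * b.f[1] + a.f[2] * b.f[2];
}

/* Emulates the three DP3 instructions.
 * rows[0..2] play the role of c[4..6], i.e. the first three rows of
 * the inverse-transpose modelview matrix. */
vec4 transform_normal(const vec4 rows[3], vec4 normal) {
    vec4 r0 = {{0.0f, 0.0f, 0.0f, 0.0f}};  /* R0.w is never written */
    assert(rows);
    r0.f[0] = dp3(rows[0], normal);  /* DP3 R0.x, c[4], v[NRML] */
    r0.f[1] = dp3(rows[1], normal);  /* DP3 R0.y, c[5], v[NRML] */
    r0.f[2] = dp3(rows[2], normal);  /* DP3 R0.z, c[6], v[NRML] */
    return r0;
}
```

For example, with a modelview matrix that scales x by 2, the inverse transpose scales x by 0.5. A tangent (1, 1, 0) becomes (2, 1, 0), while the normal (1, -1, 0) becomes (0.5, -1, 0), and the two remain perpendicular.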

Hardware Block Diagram

[Figure: Vertex In feeds the Vertex Attribute Buffer (VAB): entries 0 to 15, each 128 bits (32 x 4), with dirty bits. The VAB feeds input buffers (IB, entries 0 to n-1), which feed the Vector FP Core: a SIMD vector unit and a special function unit, with constant memory, instruction memory and registers, swizzle/negate (sw/neg) on source operands and a write mask on results. Results go to output buffers (OB, entries 0 to n-1) and then to Vertex Out.]

HW Block Diagram: Data Path

[Figure: each source operand (X Y Z W) passes through swizzle and negate into the FPU core; the result (X Y Z W) passes through the write mask.]

Instruction Set: The Ops

• 17 instructions total

• MOV, MUL, ADD, MAD, DST

• DP3, DP4

• MIN, MAX, SLT, SGE

• RCP, RSQ, LOG, EXP, LIT

• ARL

Instruction Set: The Core Features

• Immediate access to sources

• Swizzle/negate on all sources

• Write mask on all destinations

• DP3, DP4: the most common graphics ops

• Cross product is MUL+MAD with swizzling

• LIT instruction implements Phong lighting
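The source and destination modifiers can be sketched in C as follows. This is an illustrative CPU model, not the hardware's implementation: read_source, write_dest and comp_index are hypothetical helpers, and swizzles and masks are passed as strings such as "yzxw" and "xz" for readability.

```c
#include <assert.h>
#include <string.h>

/* A quad-float register; f[0..3] hold x, y, z, w. */
typedef struct { float f[4]; } vec4;

/* Map a component letter to its index, or -1 if it is not x, y, z or w. */
int comp_index(char c) {
    switch (c) {
    case 'x': return 0;
    case 'y': return 1;
    case 'z': return 2;
    case 'w': return 3;
    default:  return -1;
    }
}

/* Read a source operand: apply a 4-letter swizzle such as "yzxw",
 * then optionally negate every component. */
vec4 read_source(vec4 src, const char *swizzle, int negate) {
    vec4 out;
    assert(swizzle && strlen(swizzle) == 4);
    for (int i = 0; i < 4; i++) {
        int k = comp_index(swizzle[i]);
        assert(k >= 0);
        out.f[i] = negate ? -src.f[k] : src.f[k];
    }
    return out;
}

/* Write a result: only the components named in the mask change. */
void write_dest(vec4 *dst, vec4 value, const char *mask) {
    assert(dst && mask);
    for (const char *m = mask; *m; m++) {
        int k = comp_index(*m);
        assert(k >= 0);
        dst->f[k] = value.f[k];
    }
}
```

With R0 = (1, 2, 3, 4), reading -R0.yzxw yields (-2, -3, -1, -4); writing that value with the mask xz changes only the x and z components of the destination.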

Dot Product Instruction

DP3 R0.x, R1, R2

R0.x = R1.x * R2.x + R1.y * R2.y + R1.z * R2.z

DP4 R0.x, R1, R2

4-component dot product

MUL instruction

MUL R1, R0, R2 (component-wise multiplication)

R1.x = R0.x * R2.x

R1.y = R0.y * R2.y

R1.z = R0.z * R2.z

R1.w = R0.w * R2.w

MAD instruction

MAD R1, R2, R3, R4

R1 = R2 * R3 + R4

*: component-wise multiplication

Example:

MAD R1, R0.yzxw, R2.zxyw, -R1

What does it do?

Cross Product Coding Example

    # Cross product R2 = R0 x R1
    MUL R2, R0.zxyw, R1.yzxw;
    MAD R2, R0.yzxw, R1.zxyw, -R2;
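To check that the two instructions really produce a cross product, they can be traced on the CPU. The C sketch below mirrors the MUL and MAD step by step; swz, mul, mad, neg and cross are hypothetical helper names introduced for this illustration.

```c
#include <assert.h>

/* A quad-float register; f[0..3] hold x, y, z, w. */
typedef struct { float f[4]; } vec4;
enum { X, Y, Z, W };

/* Swizzle: pick source components in the given order,
 * e.g. swz(r, Z, X, Y, W) models r.zxyw. */
vec4 swz(vec4 s, int a, int b, int c, int d) {
    assert(a >= 0 && a < 4 && b >= 0 && b < 4);
    assert(c >= 0 && c < 4 && d >= 0 && d < 4);
    vec4 r = {{s.f[a], s.f[b], s.f[c], s.f[d]}};
    return r;
}

/* Negate every component of a source operand. */
vec4 neg(vec4 a) {
    vec4 r = {{-a.f[X], -a.f[Y], -a.f[Z], -a.f[W]}};
    return r;
}

/* MUL: component-wise product. */
vec4 mul(vec4 a, vec4 b) {
    vec4 r;
    for (int i = 0; i < 4; i++) r.f[i] = a.f[i] * b.f[i];
    return r;
}

/* MAD: component-wise a * b + c. */
vec4 mad(vec4 a, vec4 b, vec4 c) {
    vec4 r;
    for (int i = 0; i < 4; i++) r.f[i] = a.f[i] * b.f[i] + c.f[i];
    return r;
}

/* R2 = R0 x R1, written exactly as the two-instruction sequence. */
vec4 cross(vec4 r0, vec4 r1) {
    /* MUL R2, R0.zxyw, R1.yzxw; */
    vec4 r2 = mul(swz(r0, Z, X, Y, W), swz(r1, Y, Z, X, W));
    /* MAD R2, R0.yzxw, R1.zxyw, -R2; */
    r2 = mad(swz(r0, Y, Z, X, W), swz(r1, Z, X, Y, W), neg(r2));
    return r2;
}
```

For R0 = (1, 2, 3) and R1 = (4, 5, 6), the MUL leaves (15, 6, 8) in R2 and the MAD turns it into (12, 12, 5) - (15, 6, 8) = (-3, 6, -3), which is the cross product. The w component ends up as R0.w * R1.w - R0.w * R1.w = 0.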

Lighting instruction

LIT R1, R0 (Phong light model)

Input: R0 = (diffuse, specular, ??, shininess)

Output: R1 = (1, diffuse, specular^shininess, 1)

Usually followed by

DP3 o[COL0], c[21], R1 (assuming c[21] is used)

where c[xx] = (ka, kd, ks, ??)
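The LIT and DP3 pair can be sketched in C as below. This follows the simplified description on the slide and is not a specification of the instruction. Two things are assumptions of this sketch: negative diffuse and specular inputs are clamped to zero (with the specular term dropped when the diffuse term is not positive), and the shininess is treated as a non-negative integer so that no math library is needed. The names lit and shade are hypothetical.

```c
#include <assert.h>

/* A quad-float register; f[0..3] hold x, y, z, w. */
typedef struct { float f[4]; } vec4;

/* Sketch of LIT R1, R0.
 * in  = (diffuse, specular, unused, shininess)
 * out = (1, diffuse, specular^shininess, 1)
 * Assumptions: inputs below zero are clamped to zero, and the
 * shininess is truncated to a non-negative integer exponent. */
vec4 lit(vec4 in) {
    float diffuse  = in.f[0] > 0.0f ? in.f[0] : 0.0f;
    float specular = in.f[1] > 0.0f ? in.f[1] : 0.0f;
    vec4 out = {{1.0f, diffuse, 0.0f, 1.0f}};
    assert(in.f[3] >= 0.0f);  /* this sketch needs shininess >= 0 */
    if (diffuse > 0.0f) {
        unsigned n = (unsigned)in.f[3];
        float p = 1.0f;
        for (unsigned i = 0; i < n; i++) p *= specular;
        out.f[2] = p;
    }
    return out;
}

/* DP3 o[COL0], c[21], R1 with c[21] = (ka, kd, ks, ??):
 * ka * 1 + kd * diffuse + ks * specular^shininess. */
float shade(vec4 k, vec4 r1) {
    return k.f[0] * r1.f[0] + k.f[1] * r1.f[1] + k.f[2] * r1.f[2];
}
```

Because LIT writes 1 into the first component, the single DP3 adds the ambient term ka together with the weighted diffuse and specular terms.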

Ready to trace some programs?

Previous Work: Geometry Engine

• High bandwidth + lots of Flops

• Low clock rate

• No architectural continuity

• VERY hard to program

• Some high-level language support (maybe)

• A compromise solution (vtx, prim, pix, …)

Alternative: The CPU

• Low bandwidth + reasonable Flops

• High clock rate

• Excellent architectural continuity

• VERY hard to use efficiently

• Excellent high-level language support

• Flexible, but often too slow

New Design: The Vertex Engine

• Simple hardware for a commodity GPU

• Allows user to manipulate vertex transform

• Simple-to-use programming model

• Superset of fixed-function mode

Why Vertex Processing?

• Very parallel

• Use single vertex programming model

• Hardware can batch or interleave

• KISS

Why Not Primitive Processing?

• Face culling and clipping break parallelism

• Complicates memory accesses

• Inefficient (control takes time)

• Let hardware designers optimize

Programming Model: Vertex I/O

• Streaming vertex architecture

• Source data converted to floats

• Source data loaded

• Run program

• Destination data drained

• Destination data re-formatted for hardware

Hardware Implementation

• Vector SIMD Unit + Special Function Unit

• Multithreaded and pipelined to hide latency

• Any one instruction/cycle

• All instructions equal latency

• Free swizzling/negate/write mask support

Conclusion

• Very simple, efficient implementation

• Allows vertex programming continuity

• Stanford Imagine Architecture

• A work in progress, lots more to come…

• We welcome your feedback
