Performance Tools Jeff Kiel Manager, Developer Performance Tools.

37
Performance Tools Jeff Kiel Manager, Developer Performance Tools

Transcript of Performance Tools Jeff Kiel Manager, Developer Performance Tools.

Page 1: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

Performance Tools

Jeff Kiel

Manager, Developer Performance Tools

Page 2: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

Performance Tools Agenda

Overview of GPU pipeline and Unified Shader

NVIDIA PerfKit 5.0: Driver & GPU Performance DataInstrumented Driver: GPU & driver performance information, GLExpert runtime debugging

PerfSDK: Performance data integrated into your application

PerfHUD: The Direct3D GPU Performance Accelerator

gDEBugger: OpenGL performance analysis and debugging

ShaderPerf: Shader program performance

Page 3: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

GPU Pipelined Architecture (Logical View)

FramebufferFramebuffer

Vertex Vertex AssemblyAssemblyCPUCPU

GPUGPU

BlendingBlendingRasterizerRasterizerVertex Vertex ShaderShaderVertex Vertex ShaderShaderVertex Vertex ShaderShaderVertex Vertex ShaderShader

Vertex Vertex ShaderShaderVertex Vertex ShaderShaderVertex Vertex ShaderShader

GeometryGeometryShaderShader

TextureTexture

Vertex Vertex ShaderShaderVertex Vertex ShaderShaderVertex Vertex ShaderShaderPixel Pixel

ShaderShader

Page 4: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

GPU Pipelined Architecture (Logical View)

FramebufferFramebuffer

Vertex Vertex AssemblyAssemblyCPUCPU

GPUGPU

BlendingBlendingRasterizerRasterizerVertex Vertex ShaderShaderVertex Vertex ShaderShaderVertex Vertex ShaderShaderVertex Vertex ShaderShader

Vertex Vertex ShaderShaderVertex Vertex ShaderShaderVertex Vertex ShaderShader

GeometryGeometryShaderShader

Vertex Vertex ShaderShaderVertex Vertex ShaderShaderVertex Vertex ShaderShaderPixel Pixel

ShaderShader

TextureTexture

Page 5: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

Common Graphics/GPU Problems

New, increasingly complex GPU hardwareGPU is a black boxUnified shaders changes everything

Increasing engine and scene complexityArtists don’t always understand how rendering engines workCPU tuning insufficient (multiple processors, multi-cores)Turn around time for debugging and tuning shaders too longHard to debug API/pipeline setup issues

Page 6: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

Unified Shader Tuning

No longer “pixel shader bound” GPU balances workload

Now just “shader unit bound”

Check workload distribution for optimization opportunities

Typical optimizations may not workClassic: move calculations from pixels to vertices

If #vertices ~= #pixels, no improvement

Page 7: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

NVIDIA PerfKit 5: The Solution!

Instrumented DriverGLExpertPerfHUDPerfSDK PerfAPI Sample Code Helper Classes DocumentationTools

NVIDIA Plug-In for Microsoft PIX for WindowsgDEBugger 3.1 DevCPL

Platforms (x32 & x64)Windows XP & VistaLinux Update Soon!

Page 8: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

PerfKit Instrumented Driver

GLExpert functionality

GPU and Driver Performance CountersOpenGL and Direct3D

Data exported via NVIDIA API and PDH

Simplified Experiments (SimExp)

Collect GPU and driver data, retain performanceTrack intra-frame events & statistics

Gather and collate at end of frame

Performance overhead 1-2%

Page 9: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

GLExpert: What is it?

Helps eliminate driver/CPU performance issues

OpenGL portion of the Instrumented DriverOutput to stdout or debugger

Different groups/levels of information detail

Controlled using environment variables in Linux, DevCPL tab in Windows

Page 10: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

GLExpert: What is it?

Information providedGL Errors: print when raised

Software Fallbacks: indicate when the driver is in fall back

GPU Programs: errors during compile or link

VBOs: show where they reside, mapping details

FBOs: print reasons for unsupported configuration

Future EnhancementsExtensive SLI supportQuadro 5600/GeForce 8 Series extensionsMore detailed pipeline setup messages, buffer object support, fallback information, and more

Page 11: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

PerfKit: Performance Counter Types

SW/Driver Counters: PerfAPI, PDH

Raw GPU Counters: PerfAPI, PDH

Simplified Experiments: PerfAPI

Instrumented GPUs

Quadro FX 5600 & 4500GeForce 8800 GTX, 8600 GTGeForce 7950/7900 GTX & GT

GeForce 7800 GTXGeForce 6800 Ultra & GTGeForce 6600

Page 12: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

OpenGL/Direct3D Driver CountersGeneral

FPSms per frame

DriverDriver frame time (total time spent in driver)Driver sleep time (waiting for GPU) Detailed wait timers (kernel, locks, rendering, etc.)

CountsBatches, vertices, primitivesTriangles and instanced triangles

MemoryTotal usedRender targets, textures, buffers

Page 13: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

GPU Counters

vertex_attribute_count

culled_primitive_countprimitive_counttriangle_countvertex_count

shaded_pixel_count

rop_busy

Texture Texture Unit Unit

(Filtering)(Filtering)

Vertex Vertex AssemblyAssembly

Raster / Raster / ZCullZCull

Raster Raster OperationsOperations

Frame Frame BufferBuffer

(RAM (RAM Memory)Memory)

gpu_idle

Vertex Vertex ShaderShaderVertex Vertex ShaderShaderVertex Vertex ShaderShaderVertex Vertex ShaderShader

Vertex Vertex ShaderShaderVertex Vertex ShaderShaderVertex Vertex ShaderShader

GeometryGeometryShaderShader

Vertex Vertex ShaderShaderVertex Vertex ShaderShaderVertex Vertex ShaderShaderPixel Pixel

ShaderShader

shader_busyvertex, geometry, pixel

ratios

Page 14: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

How do I use PerfKit counters?

PerfAPI: Easy integration of PerfKit Real time performance monitoring using GPU and driver counters, round robin sampling

Simplified Experiments for single frame analysis

PDH: Performance Data Helper for WindowsDriver data, GPU counters, and OS information

Exposed via Perfmon

Good for rapid prototyping

PerfSDK: Sample code and helper classes

Page 15: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

PerfAPI: Real Time

// Somewhere in setupNVPMAddCounterByName(“vertex_shader_busy”);NVPMAddCounterByName (“pixel_shader_busy”);NVPMAddCounterByName (“shader_waits_for_texture”);NVPMAddCounterByName (“gpu_idle”);

// In your rendering loop, sample using namesNVPMSample(NULL, &nNumSamples);NVPMGetCounterValueByName(“vertex_shader_busy”, 0, &nVSEvents, &nVSCycles); NVPMGetCounterValueByName(“pixel_shader_busy”, 0, &nPSEvents, &nPSCycles); NVPMGetCounterValueByName(“shader_waits_for_texture”, 0, &nTexEvents, &nTexCycles); NVPMGetCounterValueByName(“gpu_idle”, 0, &nIdleEvents, &nIdleCycles);

Page 16: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

PerfAPI: SimExp

NVPMAddCounter(“GPU Bottleneck”);NVPMAllocObjects(50);

// Set up the experiment, get pass countNVPMBeginExperiment(&nNumPasses);for(int ii = 0; ii < nNumPasses; ++ii) {

// Scene setup/clear backbufferNVPMBeginPass(ii);

NVPMBeginObject(0);// Draw calls associated with object 0 and flushNVPMEndObject(0);

...

NVPMEndPass(ii);

// End scene/present/swap buffers}

// End experiment and retrieve bottleneckNVPMEndExperiment();NVPMGetCounterValueByName(“GPU Bottleneck”, 0, &nGPUBneck, &nGPUCycles);

Page 17: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

PerfHUD: Direct3D debugging and tuning

One click bottleneck determination

Graphs and debugging tools overlaid on your application

4 screens for targeted analysisPerformance Dashboard

Debug Console

Frame Debugger

Frame Profiler

Drag and drop application on PerfHUD icon

Page 18: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

New! PerfHUD 5.0!

Interactive modelShader Edit and Continue

Render state Modification

Configurable Graphs

Many more features and usability improvements

New technologiesWindows Vista & DirectX 10

Quadro 5600 and GeForce 8800 with Unified Shader Architecture

Page 19: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

Demo: PerfHUD

Company of Heroes used with permission from THQ and Relic Entertainment

Page 20: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

Demo: Performance Dashboard

Company of Heroes used with permission from THQ and Relic Entertainment

Page 21: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

Demo: Performance Dashboard

Company of Heroes used with permission from THQ and Relic Entertainment

Page 22: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

Demo: Performance Dashboard

Company of Heroes used with permission from THQ and Relic Entertainment

Page 23: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

Demo: Frame Debugger

Company of Heroes used with permission from THQ and Relic Entertainment

Page 24: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

Demo: Advanced Frame Debugger

Page 25: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

Demo: Frame Profiler

Company of Heroes used with permission from THQ and Relic Entertainment

Page 26: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

Frame ProfilerOne Touch Performance Analysis

PerfHUD uses PerfSDK

Multiple passes on the scene, sample over 40 performance counters

Need to render THE SAME FRAME until all the counters are read

Must use time-based animation

Do use QueryPerformanceCounter() or timeGetTime()

Don’t use RDTSC or throttle frame rate

Page 27: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

Associated Tools: NVIDIA Plug-In for Microsoft PIX for Windows

Page 28: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

Graphic Remedy’s gDEBugger

OpenGL and OpenGL ES Debugger and ProfilerShorten development timeImprove application qualityOptimize performanceNVIDIA PerfKit and GLExpert integratedNow supports Linux!Windows XP & Vista, x32 & x64Discounted academic licenses availableNVIDIA booth Thursday morning

http://www.gremedy.com

Page 29: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

PerfGraph

Open source tool for graphing performance counters

Supports PerfKit GPU/Driver signals

System performance information (CPU utilization, memory, etc.)

Cross platform

Windows & Linux

Page 30: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

Project Status

PerfKit 5.0 available now: http://developer.nvidia.com/perfkitPerfHUD 5.0

ForceWare Release 100 Driver

GeForce 8800 support

Windows XP & Vista, 32 and 64 bit

Linux 32 and 64 bit PerfSDK/GLExpert in development

PerfGraph: www.sourceforge.org\perfgraph

Instrumented GPUs

Feedback and Support: http://developer.nvidia.com/forums

Quadro FX 5600 & 4500GeForce 8800 SeriesGeForce 7950/7900 GTX & GT

GeForce 7800 GTXGeForce 6800 Ultra & GTGeForce 6600

Page 31: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

v2f BumpReflectVS(a2v IN,uniform float4x4 WorldViewProj,uniform float4x4 World,uniform float4x4 ViewIT)

{v2f OUT;// Position in screen space.OUT.Position = mul(IN.Position, WorldViewProj);// pass texture coordinates for fetching the normal mapOUT.TexCoord.xyz = IN.TexCoord;OUT.TexCoord.w = 1.0;// compute the 4x4 tranform from tangent space to object spacefloat3x3 TangentToObjSpace;// first rows are the tangent and binormal scaled by the bump scaleTangentToObjSpace[0] = float3(IN.Tangent.x, IN.Binormal.x, IN.Normal.x);TangentToObjSpace[1] = float3(IN.Tangent.y, IN.Binormal.y, IN.Normal.y);TangentToObjSpace[2] = float3(IN.Tangent.z, IN.Binormal.z, IN.Normal.z);OUT.TexCoord1.x = dot(World[0].xyz, TangentToObjSpace[0]);OUT.TexCoord1.y = dot(World[1].xyz, TangentToObjSpace[0]);OUT.TexCoord1.z = dot(World[2].xyz, TangentToObjSpace[0]);OUT.TexCoord2.x = dot(World[0].xyz, TangentToObjSpace[1]);OUT.TexCoord2.y = dot(World[1].xyz, TangentToObjSpace[1]);OUT.TexCoord2.z = dot(World[2].xyz, TangentToObjSpace[1]);OUT.TexCoord3.x = dot(World[0].xyz, TangentToObjSpace[2]);OUT.TexCoord3.y = dot(World[1].xyz, TangentToObjSpace[2]);OUT.TexCoord3.z = dot(World[2].xyz, TangentToObjSpace[2]);float4 worldPos = mul(IN.Position, World);// compute the eye vector (going from shaded point to eye) in cube spacefloat4 eyeVector = worldPos - ViewIT[3]; // view inv. transpose contains eye position in world space in last row.OUT.TexCoord1.w = eyeVector.x;OUT.TexCoord2.w = eyeVector.y;OUT.TexCoord3.w = eyeVector.z;return OUT;

}

///////////////// pixel shader //////////////////

float4 BumpReflectPS(v2f IN,uniform sampler2D NormalMap,uniform samplerCUBE EnvironmentMap,

uniform float BumpScale) : COLOR{

// fetch the bump normal from the normal mapfloat3 normal = tex2D(NormalMap, IN.TexCoord.xy).xyz * 2.0 - 1.0;normal = normalize(float3(normal.x * BumpScale, normal.y * BumpScale, normal.z)); // transform the bump normal into cube space// then use the transformed normal and eye vector to compute a reflection vector// used to fetch the cube map// (we multiply by 2 only to increase brightness)float3 eyevec = float3(IN.TexCoord1.w, IN.TexCoord2.w, IN.TexCoord3.w);float3 worldNorm;worldNorm.x = dot(IN.TexCoord1.xyz,normal);worldNorm.y = dot(IN.TexCoord2.xyz,normal);worldNorm.z = dot(IN.TexCoord3.xyz,normal);float3 lookup = reflect(eyevec, worldNorm);return texCUBE(EnvironmentMap, lookup);

}

ShaderPerf 2.0

Inputs:•GLSL, Cg, HLSL•PS1.x,PS2.x,PS3.x•VS1.x,VS2.x, VS3.x •!!FP1.0•!!ARBfp1.0

ShaderPerf

GPU Arch:•Quadro FX series •GeForce 8X00, 7X00•GeForce 6X00 & FX

Outputs:Outputs:•Resulting assembly codeResulting assembly code•# of cycles# of cycles•# of temporary registers# of temporary registers•Pixel/vertex throughputPixel/vertex throughput•Test all fp16 and all fp32Test all fp16 and all fp32

Page 32: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

ShaderPerf: In your pipeline

Test current performance Compare with shader cycle budgets

Test optimization opportunities

Not just Tex/ALU balance: cycles & throughput

Automated regression analysis

Integrated in FX Composer 2.0Artists/TDs code expensive shaders

Achieve optimum performance

Page 33: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

ShaderPerf 2.0 Alpha

Supports Direct3D/HLSL

GeForce 7, 6, and FX series GPUs

ForceWare Release 162 Unified Compiler

Improved vertex performance simulation and throughput calculation with branching

Multiple drivers from one ShaderPerf

Smaller footprint

New programmatic interface

Page 34: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

ShaderPerf 2.0: Beta

Full support for Cg and GLSL, vertex and fragment programs

Support for Quadro 5600 & GeForce 8 series GPUs

Geometry shaders and geometry throughput

Fragment program differencing

Page 35: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

Questions?

Stop by our booth for a hands on demo!

Online: http://developer.nvidia.com/PerfKithttp://developer.nvidia.com/PerfHUDhttp://developer.nvidia.com/ShaderPerf

Feedback and Support: http://developer.nvidia.com/forums

Page 36: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

PerfKit 5

The NVIDIA Developer ToolkitThe NVIDIA Developer Toolkit

GPU Programming Guide

ShaderPerf 2

PerfHUD 5

Conference Presentations

PerfSDK

GLExpert

gDEBugger

NV PIX Plug-in

SDK 10FX Composer 2

Melody

Texture Tools 2

Cg Toolkit

NVSG

Content CreationContent Creation Software Software DevelopmentDevelopment PerformancePerformance DocumentationDocumentation

Videos

mental mill Artist Edition

Books

Page 37: Performance Tools Jeff Kiel Manager, Developer Performance Tools.

© NVIDIA Corporation 2007

GPU Gems 3 Available Now!

SIGGRAPH Bookstore

Major Book Retailers

Includes chapters fromAdobe Systems

Apple

Crytek

Cornell University

Electronic Arts

Havok

Juniper Networks

Microsoft

SEGA

…and many more