Videogame Optimization

Post on 17-Jan-2017

Video Game Optimization Workshop

Amir H. Fassihi Fanafzar Game Studio

Aug 2012

Fanafzar Game Studio

System Design Requirements

•  Functional •  Non Functional

Non Functional Requirements

•  Maintainability •  Extensibility •  Security •  Scalability •  Intellectual Manageability •  Availability •  Portability •  Usability •  Performance

Performance

The amount of work accomplished by a computer system compared to the time and resources used.
•  Short response time
•  High throughput
•  Low utilization of computer resources
•  High availability of applications
•  Fast data compression and decompression
•  High bandwidth / Short data transmission time

Video Games

•  Most x-abilities are important (as in enterprise applications)
    – Even more so for game engines.
•  Performance is REALLY important!
    – For any game or game engine.

System Design

•  Solution for Functional Requirements
•  Solution for Non-Functional Requirements
    – Bulk of the technical efforts
    – Conflicts in Design!
    – Performance as the bad boy in the group
    – Performance as the cream of the crop
    – Performance being directly experienced by the end user

Can you make this?

Optimization

•  “The process of modifying a software system to make some aspects of it work more efficiently or use fewer resources.”

Optimization Lifecycle

1.  Benchmark
2.  Detect (Hotspots and Bottlenecks)
3.  Solve
4.  Check
5.  Go to 1
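
The Benchmark and Check steps both need a repeatable timing measurement. A minimal sketch of such a harness follows; `measureMilliseconds` is a hypothetical helper name, not something from the workshop, and a real benchmark would also pin the hardware and scene being measured.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs `work` `repetitions` times and returns the
// average wall-clock time per run in milliseconds.
template <typename Fn>
double measureMilliseconds(Fn&& work, int repetitions) {
    assert(repetitions > 0);
    using Clock = std::chrono::steady_clock;  // monotonic clock
    const auto start = Clock::now();
    for (int i = 0; i < repetitions; ++i) {
        work();
    }
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / repetitions;
}
```

Recording this number before the Solve step and again at the Check step, on the same input, is what makes the loop meaningful.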

Levels of Optimization

•  System Level
•  Algorithmic Level
•  Micro Level
    – Branch prediction
    – Instruction throughput
    – Latency

Project Lifecycle and Optimization

•  Pre-production
•  Production
•  Post-production

Optimization from High Level to Low Level

Quake Story: High level architectural optimization before low level triangle draw function (Carmack and Abrash) http://www.bluesnews.com/abrash/

Measuring Performance in Games

1.  Set Specification
    1.  Performance Goal (FPS, time)
    2.  Hardware Specification
2.  Define Line Items
    1.  CPU time, RAM, GPU time, Video Mem
    2.  Rendering, Physics, Sound, Gameplay, Misc.

Memory Management (God of War)

[Figure: map of the 32 Meg memory. Labels: 16 Meg for Levels (split into 2), 4 * 1 Meg Enemies, 1.5 Meg Exe, Run Time Data, Perm Data]

•  Establish Hard Rules.
    – 16 Meg for Level Data (Split into 2 Levels)
    – 4 * 1 Meg for Enemies
•  Maintain 60fps

From: Tim Moss 2006 GDC Talk

Tools

•  Profilers (Intel VTune, VS Profiler, …)
    – Total time
    – Self time
    – Calls
•  System Monitors (Nvidia PerfHud, MS PIX, …)
•  System Adjusters (Intel GPA, …)

Holistic Optimization

•  Optimization Process •  CPU Bound •  GPU Bound

CPU Bound, Memory

•  Prefetching Memory •  Memory Cache

Memory Optimization

•  Cache Miss –  Instruction Cache – Data Cache

Memory Hierarchy

Source: Memory Optimization, Christer Ericson, GDC 2003

Data Access Patterns

•  Linear Access Forward
    for (i = 0; i < numData; ++i)
        memArray[i];
•  Linear Access Backward

Data Access Patterns Ctd.

•  Periodic Access
    struct vertex {
        float pos[3];
        float norm[3];
        float textCoord[3];
    };
    // Only pos is touched, so each access skips over norm and textCoord.
    for (i = 0; i < num; ++i)
        vertexArray[i].pos;
•  Random Access

AOS vs. SOA
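
The two layouts can be sketched as follows. The type and function names (`VertexAoS`, `VerticesSoA`, `sumPosX`) are illustrative assumptions, not taken from the slides. With the array-of-structures layout, a loop that only reads one position component still strides over every other attribute; with the structure-of-arrays layout, the same loop walks one contiguous array.

```cpp
#include <cstddef>
#include <vector>

// Array of Structures (AoS): all attributes of one vertex sit together.
struct VertexAoS {
    float pos[3];
    float norm[3];
    float uv[2];
};

// Structure of Arrays (SoA): each attribute has its own contiguous array.
struct VerticesSoA {
    std::vector<float> posX, posY, posZ;
    std::vector<float> normX, normY, normZ;
    std::vector<float> u, v;
};

// AoS: consecutive reads are sizeof(VertexAoS) bytes apart (periodic access).
float sumPosX(const std::vector<VertexAoS>& verts) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < verts.size(); ++i) {
        sum += verts[i].pos[0];
    }
    return sum;
}

// SoA: consecutive reads are adjacent floats (linear access).
float sumPosX(const VerticesSoA& verts) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < verts.posX.size(); ++i) {
        sum += verts.posX[i];
    }
    return sum;
}
```

Which layout wins depends on how the data is actually traversed, so it is worth profiling both against the real access pattern.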

Critical Stride

•  A read stride that keeps mapping successive accesses to the same cache sets can cause cache thrashing

Strip Mining

    for { access pos; }
    for { access norm; }
    ------------------------------------------------------
    for {
        access pos;
        access norm;
    }

Memory

•  Stack
    – Temporal coherence, spatial locality
•  Global
    – No fragmentation, freed at end
•  Heap
    – new, delete, malloc, free
    – No spatial locality, no temporal coherence, fragmentation

Load-Hit-Store

•  Write data to address x and then read the data from address x -> Large stall

•  Writing data all the way to the main memory through all caches -> 40 to 80 CPU cycle delay

•  http://assemblyrequired.crashworks.org/2008/07/08/load-hit-stores-and-the-__restrict-keyword/
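
A minimal sketch of the pattern and one common way around it. The function names are hypothetical, and `__restrict` is a compiler extension (accepted by GCC, Clang and MSVC), not standard C++. In the naive version the compiler must assume `src` may alias `total`, so `*total` is stored and reloaded on every iteration; the second version accumulates in a local variable and writes back once.

```cpp
#include <cstddef>

// Naive: every iteration writes *total and then reads it back.
void accumulateNaive(const int* src, std::size_t n, int* total) {
    for (std::size_t i = 0; i < n; ++i) {
        *total += src[i];
    }
}

// Accumulate in a local (which can stay in a register) and store once.
// __restrict promises the compiler that src and total do not overlap.
void accumulateLocal(const int* __restrict src, std::size_t n, int* __restrict total) {
    int sum = *total;
    for (std::size_t i = 0; i < n; ++i) {
        sum += src[i];
    }
    *total = sum;
}
```

How much this helps depends on the target CPU; the stall is most severe on in-order cores, so the generated assembly is worth checking.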

Load-Hit-Store

Memory Solutions

•  Don’t allocate
•  Linearize allocations
    – Use arrays
•  Memory pools
    – Coherent
    – No fragmentation
    – No construction/destruction
•  Don’t construct or destruct
    – Plain Old Structures (POS)

Memory Solutions

•  Time scoped pools
    – Frame allocator
    – Pool for one level content, discarded at the end
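
A frame allocator can be sketched as a bump pointer over one fixed buffer that is reset once per frame. The class below is an illustrative assumption (the name `FrameAllocator` and its interface are not from the slides); it hands out raw memory only and never runs constructors or destructors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class FrameAllocator {
public:
    explicit FrameAllocator(std::size_t capacity) : buffer_(capacity), offset_(0) {}

    // Returns aligned raw memory, or nullptr when the frame budget is spent.
    // `alignment` must be a power of two.
    void* allocate(std::size_t size, std::size_t alignment = alignof(std::max_align_t)) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::uintptr_t current =
            reinterpret_cast<std::uintptr_t>(buffer_.data()) + offset_;
        const std::uintptr_t aligned =
            (current + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
        const std::size_t padding = static_cast<std::size_t>(aligned - current);
        const std::size_t remaining = buffer_.size() - offset_;
        if (padding > remaining || size > remaining - padding) {
            return nullptr;
        }
        offset_ += padding + size;
        return reinterpret_cast<void*>(aligned);
    }

    // Called once at the end of the frame: everything is released at once.
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }

private:
    std::vector<unsigned char> buffer_;
    std::size_t offset_;
};
```

Allocation is a few arithmetic operations, there is no per-object free, and fragmentation cannot build up because the whole pool is discarded together.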

Memory Manager

“If you don’t have a custom memory manager in your game, you’re a fool (or a PC game developer)” Christer Ericson, Director of Tools and Technology, Sony Santa Monica

Memory Related Solutions

•  Reducing memory footprint at compile time and runtime
•  Algorithms that reduce memory fetching
•  Reduce cache misses
    – Spatial Locality
    – Proper Stride
    – Correct Alignment
•  Increase Temporal Coherence
•  Utilize Pre-fetching
•  Avoid worst-case access patterns that break caching

Pitfalls of Object Oriented Programming

Summary of study (Tony Albrecht, 2009)
•  Case study for CPU side rendering code
•  Just re-organizing data locations was a win
•  + pre-fetching is more win
•  Can you decouple data from objects?
•  Be aware of what the compiler and hardware are doing, watch the generated assembly!

Pitfalls of OOP

•  Optimize for data first, then code
    – Memory access is going to be your biggest bottleneck
•  Simplify Systems
    – KISS
    – Easier to optimize, Easier to parallelize
•  Keep code and data homogeneous
•  Not everything needs to be an object

Pitfalls of OOP

•  You are writing a game – You have control over the input data – Don’t be afraid to pre-format it if needed

•  Design for specifics, not generics

Data Oriented Design

•  Better performance •  Better realization of code optimization •  Often simpler code •  More parallelizable code

CPU Bound: Compute

•  Dominated by arithmetic operations rather than loads and stores

CPU Compute: Solutions

•  Compiler flags (float: precise/fast)
•  Time against Space
    – Use of lookup tables
•  Memoization
•  Function Inlining
•  Branch prediction, out of order execution
    – Branch mis-prediction is much less costly than cache miss
•  Make branches more predictable

CPU Compute: Solutions

•  Remove Branches
    – if (a) z = c; else z = d;
    – z = a * c + (1 - a) * d (with a equal to 0 or 1)
•  Profile Guided Optimization
•  Loop unrolling
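
The branch-removal idea above can be sketched in three equivalent forms. The function names are illustrative assumptions. Modern compilers often turn the plain ternary into a conditional move on their own, so the arithmetic and mask versions are worth keeping only if profiling shows a benefit.

```cpp
#include <cstdint>

// Reference version with a branch (or a conditional move, at the compiler's choice).
std::int32_t selectBranch(bool a, std::int32_t c, std::int32_t d) {
    return a ? c : d;
}

// Arithmetic form from the slide: z = a * c + (1 - a) * d, with a in {0, 1}.
std::int32_t selectArithmetic(bool a, std::int32_t c, std::int32_t d) {
    const std::int32_t flag = static_cast<std::int32_t>(a);  // 0 or 1
    return flag * c + (1 - flag) * d;
}

// Mask form: mask is all ones when a is true, all zeros otherwise.
// The final cast back to a signed type wraps modulo 2^32 on mainstream
// compilers (guaranteed from C++20, implementation-defined before).
std::int32_t selectMask(bool a, std::int32_t c, std::int32_t d) {
    const std::uint32_t mask = 0u - static_cast<std::uint32_t>(a);
    const std::uint32_t bits =
        (static_cast<std::uint32_t>(c) & mask) | (static_cast<std::uint32_t>(d) & ~mask);
    return static_cast<std::int32_t>(bits);
}
```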

Loop Unrolling

    for (i = 0; i < 100; ++i)
        sum += intArray[i];
    ------------------------------------------------------
    // Four independent accumulators; 100 is a multiple of 4, so no remainder loop is needed.
    sum1 = sum2 = sum3 = sum4 = 0;
    for (i = 0; i < 100; i += 4) {
        sum1 += intArray[i];
        sum2 += intArray[i+1];
        sum3 += intArray[i+2];
        sum4 += intArray[i+3];
    }
    sum = sum1 + sum2 + sum3 + sum4;

Virtual Functions

•  How slow are virtual functions really? http://assemblyrequired.crashworks.org/2009/01/19/how-slow-are-virtual-functions-really/
•  1000 iterations over 1024 vectors
•  12,288,000 function calls
•  Virtual: 159.856 ms
•  Direct: 67.962 ms
•  Inline: 8.040 ms

Slow Virtual Functions

•  Problem is not the cost of looking up the indirect function pointer from vtable.

•  The issue lies in “branch prediction” and the way marshalling parameters for the calling convention can get in the way of good instruction scheduling.

Micro Optimization

•  Bit Tricks
    – Bitwise Swap
        •  X ^= Y; Y ^= X; X ^= Y;
    – Bitmasks
        •  isFlagSet = someInt & MY_FLAG; someInt |= Flag2;
        •  Example use: Collisions in Physics
    – Fast Modulo
        •  X % Y equals X & (Y - 1) when Y is a power of 2 (X unsigned)
    – Even and Odd
        •  (X & 1) == 0; // same as X % 2 == 0
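
The tricks above, written out as small functions. All names, including the `FLAG_*` constants, are hypothetical. Two caveats that the one-liners hide: the XOR swap zeroes the value if both arguments are the same object, and the fast modulo is only valid for unsigned values and power-of-two divisors.

```cpp
#include <cassert>
#include <cstdint>

// XOR swap; guarded because swapping an object with itself would zero it.
void xorSwap(std::uint32_t& x, std::uint32_t& y) {
    if (&x == &y) return;
    x ^= y;
    y ^= x;
    x ^= y;
}

// Hypothetical collision-layer flags, one bit each.
constexpr std::uint32_t FLAG_PLAYER = 1u << 0;
constexpr std::uint32_t FLAG_ENEMY  = 1u << 1;

bool isFlagSet(std::uint32_t flags, std::uint32_t flag) {
    return (flags & flag) != 0;
}

bool isPowerOfTwo(std::uint32_t y) {
    return y != 0 && (y & (y - 1)) == 0;
}

// x % y for a power-of-two y, without a division.
std::uint32_t fastModulo(std::uint32_t x, std::uint32_t y) {
    assert(isPowerOfTwo(y));
    return x & (y - 1);
}

bool isEven(std::uint32_t x) {
    return (x & 1u) == 0;
}
```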

Book on Bit Tricks

•  Hacker’s Delight (Henry S. Warren, Addison Wesley, 2003)

Other Micro Optimization

•  Data type conversion
•  SSE Instructions
•  Removing loop invariant code
•  Loop unrolling
•  Cross-.obj optimization
    – Whole program optimization
•  Hardware Specific Optimizations

Vector vs. List

•  Random insertion into and deletion from a C++ vector and a C++ list, compared

•  Data kept sorted in the containers

Vector vs. List Results

Vector vs. List Ctd.

STL iterator debugging

STL Iterator Debugging and Secure SCL http://channel9.msdn.com/Shows/Going+Deep/STL-Iterator-Debugging-and-Secure-SCL

Copy vs. Move

•  Vector of strings with 4 dimensions
•  100 x 100 x 100 x 500
•  Construction: 564 ms
•  Copy Construction: 537 ms
•  Move Construction: 0.001 ms
•  Empty Destruction: 0.001 ms
•  Destruction: 285 ms
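
The gap between copy and move construction comes from what each one touches. A small sketch, using a two-dimensional grid rather than the four-dimensional one measured above; `Grid`, `makeGrid`, `copyOf` and `takeOwnership` are illustrative names.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Grid = std::vector<std::vector<std::string>>;

Grid makeGrid(std::size_t rows, std::size_t cols) {
    return Grid(rows, std::vector<std::string>(cols, "some string payload"));
}

// Copy: allocates and copies every row and every string.
Grid copyOf(const Grid& source) {
    return source;
}

// Move: only the outer vector's internal pointers change hands,
// so the cost does not depend on how many strings are stored.
Grid takeOwnership(Grid&& source) {
    return std::move(source);
}
```

After `takeOwnership(std::move(g))`, `g` should not be read again except to assign to it or destroy it.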

GPU Bound

•  GPU related issues
    – Synchronization
    – Capabilities Management
    – Resource Management
    – Global Ordering
        •  Reflections/Shadows before scene
        •  Opaque front to back/Translucent back to front
        •  Sort by material or texture to reduce state changes
    – Instrumentation
    – Debugging

GPU Optimization Tricks

•  State Changes
•  Draw Call (Most common issue)
•  Instancing and Batching
    – Shader Instancing
    – Hardware Instancing
•  Video RAM
    – Device Resets
    – Resource uploads/locks
•  Minimize Copies
•  Minimize Locks
•  Double Buffer

GPU Optimization Ctd.

•  Fragmentation
    – Power of 2 allocations help
•  Lock culling
    – Debug visualization for those culled
•  Texture debugging
    – Different texture for each mip level

GPU Bound?

•  Spend a long time in API calls (Draw calls or swap/present frame buffer)
•  Front End / Back End
    – Triangles/Geometry
    – Pixels/Shaders
    – Vary each workload and measure performance

Back End

•  Fill Rate (ex. 1000 MP/sec)
    – FPS, Overdraw, resolution
    – Fill Rate / FPS = overdraw * resolution
    – Render Target Format (16 / 32 bit)
    – Blending
        •  Transparency instead of translucency
    – Shading
        •  Pixel shaders
    – Texture Sampling
        •  Format, Filter Mode, Count (DXT1)
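
The fill-rate relation can be rearranged to give an overdraw budget: overdraw = fill rate / (FPS * resolution). A worked sketch follows; the function name is hypothetical, and the 60 FPS target and 1280 x 720 resolution in the usage note are assumed example values, not figures from the slides.

```cpp
#include <cassert>

// Rearranged from: Fill Rate / FPS = overdraw * resolution.
// Returns how many times each pixel can be drawn per frame on average.
double overdrawBudget(double fillRatePixelsPerSecond,
                      double framesPerSecond,
                      double pixelsPerFrame) {
    assert(framesPerSecond > 0.0 && pixelsPerFrame > 0.0);
    return fillRatePixelsPerSecond / (framesPerSecond * pixelsPerFrame);
}
```

With the 1000 MP/sec example above, `overdrawBudget(1000.0e6, 60.0, 1280.0 * 720.0)` comes to roughly 18 layers per pixel as a theoretical ceiling; blending, wider render target formats and heavy pixel shaders all lower the real figure.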

Front End

•  Bottlenecks
    – Vertex Transformation
        •  Lighting calculations, skinning, …
    – Vertex Fetching and caching
        •  Vertex format, indexes (16/32 bit)
    – Tessellation

Other GPU factors

•  Multi-sample antialiasing (MSAA)
    – Downsample from high-res render
    – Can significantly affect fill-rate
•  Lights and Shadows
    – CPU, vertex processing, pixel processing

Forward VS. Deferred

•  Multiple render targets needed for deferred
•  A lot of fill-rate needed for deferred
•  Performance is flattened

Shaders

•  Memory
•  Inter-shader communication
•  Texture sampling (biggest problem with memory)
•  Computation

Other shader notes

•  Shader compilation
•  Shader count
    – Penalty for many shaders in one scene
    – Limits on GPU for shader execution
•  Effect framework
    – CgFX, ColladaFX (by tools like Nvidia FX Composer)
    – Oriented more towards ease of use than performance
    – Engines have their own (Unreal 3, Unity, Source, Torque, Gamebryo)

Networking

•  Throughput
•  Latency
•  Reliability
    – Out of order packets
    – Corrupted
    – Truncated
    – Lost

Reliability

•  User Datagram Protocol (UDP) •  Transmission Control Protocol (TCP)

Game Networking Data

•  Events
    – Guaranteed, Ordered
•  State data
    – Unordered, Not Guaranteed (opportunities for optimization)
    – Unless using lock step simulation

Bandwidth

•  Bitstreams and Bit packing
    – Flag -> one bit
    – Health -> 7 bits
•  Encoding on streams

[Figure: layered stream encoding. Labels: TCP/UDP; BitStream; Decimation, LZW, Huffman; Most Recent State, Events]
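
Bit packing can be sketched with a small writer and reader pair. `BitWriter` and `BitReader` are hypothetical classes written for this illustration (bits are stored least significant first); production bitstreams usually write whole words at a time for speed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class BitWriter {
public:
    // Appends the lowest `bitCount` bits of `value`, least significant bit first.
    void write(std::uint32_t value, unsigned bitCount) {
        assert(bitCount <= 32);
        for (unsigned i = 0; i < bitCount; ++i) {
            if (bitPos_ % 8 == 0) bytes_.push_back(0);  // start a new byte
            if ((value >> i) & 1u) {
                bytes_.back() |= static_cast<std::uint8_t>(1u << (bitPos_ % 8));
            }
            ++bitPos_;
        }
    }
    const std::vector<std::uint8_t>& bytes() const { return bytes_; }

private:
    std::vector<std::uint8_t> bytes_;
    std::size_t bitPos_ = 0;
};

class BitReader {
public:
    // The referenced byte vector must outlive the reader.
    explicit BitReader(const std::vector<std::uint8_t>& bytes) : bytes_(bytes) {}

    // Reads `bitCount` bits in the same order BitWriter wrote them.
    std::uint32_t read(unsigned bitCount) {
        assert(bitCount <= 32 && bitPos_ + bitCount <= bytes_.size() * 8);
        std::uint32_t value = 0;
        for (unsigned i = 0; i < bitCount; ++i) {
            const std::uint32_t bit = (bytes_[bitPos_ / 8] >> (bitPos_ % 8)) & 1u;
            value |= bit << i;
            ++bitPos_;
        }
        return value;
    }

private:
    const std::vector<std::uint8_t>& bytes_;
    std::size_t bitPos_ = 0;
};
```

Writing a one-bit flag followed by a seven-bit health value (0 to 127) fills exactly one byte, where a `bool` plus an `int` sent naively would typically take five.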

Prioritizing Data

•  Fill packet with most important data first
•  Heuristic for most recent data (ex. how close to player)
•  Only send what you must
    – ex. Cull enemy behind the wall

Packets

•  Smaller than 1400 bytes
•  Send packets regularly (Routers allocate bandwidth to those who use it)

Smooth Experience

•  Interpolation
•  Extrapolation
    – Client Side Prediction
    – Dead Reckoning
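
Both techniques reduce to a few lines of vector math. The sketch below uses a hypothetical `Vec2` type and function names, with linear interpolation and constant-velocity dead reckoning as the simplest possible forms; real games typically add smoothing and correction when the next authoritative state arrives.

```cpp
#include <cassert>

struct Vec2 {
    float x;
    float y;
};

// Interpolation: blend between two received states, t in [0, 1].
Vec2 interpolate(Vec2 from, Vec2 to, float t) {
    assert(t >= 0.0f && t <= 1.0f);
    return Vec2{from.x + (to.x - from.x) * t, from.y + (to.y - from.y) * t};
}

// Dead reckoning: extrapolate from the last known position and velocity.
Vec2 deadReckon(Vec2 lastPosition, Vec2 velocity, float secondsSinceUpdate) {
    return Vec2{lastPosition.x + velocity.x * secondsSinceUpdate,
                lastPosition.y + velocity.y * secondsSinceUpdate};
}
```

Interpolation shows the past smoothly at the cost of added latency, while extrapolation shows a guess at the present that can be wrong when the remote object changes direction.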

Profiling Networking

•  Make sure networking code is efficient
    – Measure compute and memory
•  Expose what the networking layer is doing
    – Number of packets
    – Bandwidth for each packet
•  Be aware of situations in which client and server get out of sync.

Mass Storage

•  Hard Drives •  CD, DVD •  Blu-Ray •  Flash Drives

Performance Issues

•  Seek Time
•  Transfer Rate (ex. 75MB/sec)
•  Worst Case
    – 8ms delay between blocks on disk
    – 4KB blocks
    – Loading 1MB -> (1024/4) * 8 = 2048 ms, about 2 secs
    – Loading 1GB -> about 34 min
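
The worst-case arithmetic can be checked with a short function. `worstCaseLoadMs` is a hypothetical name, and the model is the slide's own simplification: every block pays the full delay and transfer time is ignored.

```cpp
#include <cassert>
#include <cstdint>

// Worst case: every block on disk is separated by a full delay.
std::uint64_t worstCaseLoadMs(std::uint64_t totalBytes,
                              std::uint64_t blockBytes,
                              std::uint64_t delayMsPerBlock) {
    assert(blockBytes > 0);
    const std::uint64_t blocks = (totalBytes + blockBytes - 1) / blockBytes;  // round up
    return blocks * delayMsPerBlock;
}
```

For 1 MB in 4 KB blocks at 8 ms each this gives 256 * 8 = 2048 ms. For 1 GB it gives 262,144 * 8 = 2,097,152 ms, roughly 34 to 35 minutes depending on how the per-megabyte figure is rounded.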

Rule

•  No disk IO in the inner loops

IO Profiling is hard

•  File systems optimize themselves based on access patterns
•  Disk will rebalance data based on load and sector failure
•  Disk, disk controller, file system and OS will cache and reorder requests
•  User software may intercept the disk access for virus scanning
•  Good idea to test on fresh machines from time to time

Disk IO performance tips

•  Limit disk access
•  Minimize reads and writes
    – Read larger chunks
•  Asynchronous Access
•  Optimize file order
•  Optimize data for fast loading
    – Space on disk vs. Time to load (ex. decompressing a JPG file)

Disk IO Tips

•  Support development and runtime formats
•  Support dynamic reloading
•  Automate resource processing
•  Centralize resource loading
    – Resource Managers
•  Preload when appropriate
•  Stream
    – First second of sound in memory
    – Small texture mip levels in memory
    – Small mesh LODs in memory

Concurrent Programming

•  Data Parallelism – Scatter Phase – Gather Phase

•  Task Parallelism

Threading Performance Problems

•  Scalability •  Contention •  Balancing

Scalability

•  The achievable speedup is limited by the parallelizable section of an algorithm
•  Amdahl’s Law
    – S(N) = 1 / ((1 – P) + P/N)
    – N: Processors, P: Parallelizable Portion
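
Amdahl's Law is easy to evaluate directly. The function below is a straight transcription of S(N) = 1 / ((1 - P) + P/N); the name `amdahlSpeedup` and the example values in the usage note are assumptions for illustration.

```cpp
#include <cassert>

// S(N) = 1 / ((1 - P) + P / N)
// parallelPortion: P, the fraction of the work that can run in parallel (0..1)
// processors:      N, the number of processors
double amdahlSpeedup(double parallelPortion, int processors) {
    assert(parallelPortion >= 0.0 && parallelPortion <= 1.0);
    assert(processors >= 1);
    return 1.0 / ((1.0 - parallelPortion) + parallelPortion / processors);
}
```

With P = 0.9 and N = 8 the speedup is about 4.7, not 8, and no number of processors can push it past 1 / (1 - P) = 10. Shrinking the serial portion therefore often matters more than adding cores.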

Contention

•  More than one thread accessing the same resource
•  Some solutions
    – Thread Safety (Mutex)
    – Redundant Data
    – Efficient Synchronization (Locks, Atomic Operations, …)

Balancing

•  Ensure all cores are busy •  Eliminate starving

False Sharing

False Sharing Ctd.

    struct vertex {
        float xyz[3];   // data 1
        float tutuv[2]; // data 2
    };
    vertex triList[N];
    ------------------------------------------------------------
    struct vertices {
        float xyz[3][N];
        float tutuv[2][N];
    };
    vertices triList;
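
A different and widely used remedy, not shown on the slide, is to pad per-thread data so that each thread's item occupies its own cache line. The sketch below assumes a 64-byte cache line (`kCacheLineSize` is an assumption; the real value depends on the target CPU) and uses hypothetical type names.

```cpp
#include <cstddef>

constexpr std::size_t kCacheLineSize = 64;  // assumption: typical, but CPU specific

// Packed: the four counters share a cache line, so threads updating
// different counters still invalidate each other's cached copy.
struct PackedCounters {
    long counts[4];
};

// Padded: alignas rounds each counter up to a full cache line.
struct alignas(kCacheLineSize) PaddedCounter {
    long value = 0;
};

struct PaddedCounters {
    PaddedCounter counts[4];  // one cache line per thread
};

static_assert(sizeof(PaddedCounter) == kCacheLineSize,
              "each padded counter should fill exactly one cache line");
```

The cost is memory: four counters grow from at most 32 bytes to 256, so padding is best reserved for data that several threads write frequently.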

Multi-threaded Profiling

•  Look for time spent on synchronization primitives
•  Look out for Heisenbugs!
•  Assess Amdahl’s Law
•  Use multi-threaded profilers

No Synchronization is best

•  Lock-free algorithms are great.
•  Wait-free algorithms are even better!!

Mike Acton notes on wait free coding: http://cellperformance.beyond3d.com/articles/2009/08/roundup-recent-sketches-on-concurrency-data-design-and-performance.html

Managed Languages

•  Execute on a runtime
•  C#, Java, JavaScript, Lua, Python, PHP, ActionScript

Concerns for Profiling

•  Garbage Collector •  Just in Time compiler •  No high accuracy timers •  Allocation can be costly, usually no stack

Managed/Unmanaged

•  Gameplay code is usually not performance critical

•  Bottlenecks can be replaced with native code

Dealing with GC

•  Memory pressure causes GC to run frequently and cause sudden hitches

•  Memory pressure causes big memory footprint and hurts cache efficiency

•  Big total working set needs the GC to check all the pointers

•  Incremental GC behavior is helpful but high pressure can force the GC to run a full collection

Strategies for dealing with GC

•  Less data on heap
•  Your own memory management
•  Memory pooling
•  Keeping temporary objects as class members instead of creating them as local variables

Dealing with JIT

•  JIT activation time is important for performance (startup, after a few function calls, …)

•  Constructors usually left out (Heavy initialization code needs to be in a helper function)

•  JIT might not be available on all platforms

Optimizing Animation

•  Channel Omission •  Quantization •  Sample Frequency and Key Omission •  Curve Based Compression •  Selective Loading and Streaming •  Hardware Skinning
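
Quantization, the second item above, trades precision for size. A minimal sketch, assuming the animation channel has already been normalized to the range [-1, 1] (as quaternion components are); the function names and the choice of 16 bits are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Maps a float in [-1, 1] to a signed 16-bit integer (half the size of a float).
std::int16_t quantize(float value) {
    assert(!std::isnan(value));
    if (value > 1.0f) value = 1.0f;    // clamp out-of-range input
    if (value < -1.0f) value = -1.0f;
    return static_cast<std::int16_t>(std::lround(value * 32767.0f));
}

// Restores an approximation of the original value.
float dequantize(std::int16_t quantized) {
    return static_cast<float>(quantized) / 32767.0f;
}
```

The round trip is lossy: the restored value can differ from the original by up to about 1.5e-5, which is usually invisible for rotations but should be checked against the animation data in question.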

Misc. Optimization Related Topics

•  Mesh LOD •  Animation LOD •  AI LOD •  Collision Detection Spatial Partitioning •  Physics Optimizations (GPU, Sleeps, …)

PIX Test Case

•  PIX (Performance Investigator for Xbox)
•  Part of DirectX SDK
•  Used for DirectX based applications
•  Used for analyzing Garshasp 1 and Garshasp: Temple of the Dragon (Expansion)

Using PIX to Analyze Garshasp

Selecting Measurement Attributes

In-Game HUD

PIX Report

Garshasp Performance Post-Mortem

•  Animation skinning (Intel VTune)
    – Switched to Hardware Skinning
•  Asset Loading
    – Used background thread
•  Draw Calls
    – Dynamic Far-Clip distance
•  High RAM consumption
    – Reduced particle quotas
    – Reduced Area arrangement (changes in camera system needed)
    – Reduced Texture size
    – Better strategies for audio loading/unloading

Garshasp Ctd.

•  Large Video memory usage
    – Changed mesh geometry
    – Better seamlessness strategy
•  Frame rate drops
    – Better use of particles
    – Modifications to camera angles and seamlessness strategy
    – Smaller areas for more even distribution of resource loading.

Some unresolved issues

•  Unoptimized animation system
•  Overdraw
•  Slow Game Object update loop
•  No static batching
    – Use of vertex color for baked color
•  Huge game save data
•  Inefficient texture size usage
•  No sound/video streaming
•  + many more!

Biggest Optimization Related Problem

No internal resource consciousness!

Unity Editor Profiler

Profiler Views

CPU

Deep Calls

Rendering Information

Memory

CPU vs. GPU

References

•  Video Game Optimization, Ben Garney and Eric Preisz
•  “How the left and right brain learned to love one another”, Tim Moss, http://timmoss.blogspot.com/2007/02/it-seems-reasonable-that-my-very-first.html
•  “Optimization is a Full time job”, Maciej Sinilo, http://msinilo.pl/blog/?p=483
•  “Memory Optimization”, Christer Ericson, http://www.research.scea.com/research/pdfs/GDC2003_Memory_Optimization_18Mar03.pdf
•  “A pragmatic approach to optimization”, Niklas Frykholm, http://bitsquid.blogspot.com/2011/12/pragmatic-approach-to-performance.html
•  Hacker’s Delight, Henry S. Warren (Addison Wesley, 2003)
•  Advanced Bit Manipulation-fu, Christer Ericson, http://realtimecollisiondetection.net/blog/?p=78
•  Networking for Programmers, Glenn Fiedler, http://gafferongames.com/networking-for-game-programmers/
•  Source Multiplayer Networking, Valve Software, https://developer.valvesoftware.com/wiki/Source_Multiplayer_Networking
•  False sharing and its effect on memory performance, William J. Bolosky, http://static.usenix.org/publications/library/proceedings/sedms4/full_papers/bolosky.txt
•  Concurrency, Data Design and Performance, Mike Acton, http://cellperformance.beyond3d.com/articles/2009/08/roundup-recent-sketches-on-concurrency-data-design-and-performance.html
•  Diving down the concurrency rabbit hole, Mike Acton, http://www.insomniacgames.com/tech/articles/0809/files/concurrency_rabit_hole.pdf
•  Scalar Quantization, Jonathan Blow, http://number-none.com/product/Scalar%20Quantization/index.html
•  Are we out of memory, Christian Gyrling, http://www.swedishcoding.com/2008/08/31/are-we-out-of-memory/
•  Practical Efficient Memory Management, Jesus De Santos, http://entland.homelinux.com/blog/2008/08/19/practical-efficient-memory-management/
•  Load Hit Store and the restrict keyword, Elan Ruskin, http://assemblyrequired.crashworks.org/2008/07/08/load-hit-stores-and-the-__restrict-keyword/
•  How slow are virtual functions really, Elan Ruskin, http://assemblyrequired.crashworks.org/2009/01/19/how-slow-are-virtual-functions-really/
•  Current Generation Parallelism in Games, Jon Olick, http://s08.idav.ucdavis.edu/olick-current-and-next-generation-parallelism-in-games.pdf
•  Real Life Performance Pitfalls, Alan Murphy, http://www.microsoft.com/en-us/download/confirmation.aspx?id=3539
•  Graphics Programming Black Book, Michael Abrash
•  Zen of Code Optimization, Michael Abrash
•  The Free Lunch is Over, Herb Sutter, http://www.gotw.ca/publications/concurrency-ddj.htm
•  Intel Software Optimization Cookbook, http://www.intel.com/intelpress/sum_swcb2.htm
•  Pitfalls of Object Oriented Programming, Tony Albrecht, http://www.reddit.com/r/programming/comments/ag43j/pitfalls_of_object_oriented_programming_pdf/
•  Microsoft PIX, http://msdn.microsoft.com/en-us/library/ee663275(v=vs.85).aspx
•  Top 10 Myths of Video Game Optimization, http://www.gamasutra.com/view/feature/130296/the_top_10_myths_of_video_game_.php?print=1

Questions?

fassihi@fanafzar.com
