Advances in Real-Time Rendering in Games
Game Worlds from Polygon Soup: Visibility, Spatial Connectivity and Rendering
Hao Chen (Bungie), Ari Silvennoinen (Umbra Software), Natalya Tatarchuk (Bungie)
Talk Outline
• Game Environment
• Previous Approach
• New Approach
• Implementation
• Demo
Halo Reach
More than pretty pixels
• AI perception
• Activation
• Object Attachment
• Audibility
• Caching/Paging
• Path-finding
• Collision/Physics
• Visibility
• Spatial Connectivity
• Rendering
Background
• Cells and portals
• Potentially Visible Sets (PVS)
• Occluder rasterization
– Software rasterization
– Hardware occlusion queries
– GPGPU solutions
• Spatial connectivity
– Watershed transform
– Automatic portal generation
Halo Approach
• Cells and portals
• Watertight shell geometry
• Artists manually placed portals
• Build a BSP tree from shell geometry
• Floodfill BSP leaves into cells
• Build cell connectivity
Pros
• Unified visibility/spatial connectivity
• Precise spatial decomposition
• Inside/outside test
• Great for indoor spaces with natural portals
Cons
• Manual portalization is non-trivial!
• Watertightness is painful for content authoring
• Forces early level design decisions
• Optimized for indoor scenes only
Portalization Example
Polygon Soup!
Polygon Soup
• Just pieces jammed together
• No water-tightness
• No manual portals
• Incremental construction/fast iteration.
• Allow late design changes
General Idea
• Sub-divide the scene
• Voxelize the subdivided volume
• Segment the voxels into regions
• Build a connectivity graph between regions
• Build simplified volumes from voxel regions
2D Example – Path Finding
• Input Scene
• Voxelization
• “Walk-able” voxels
• Distance Field
• 2D Watershed Transform
• Contour
• Nav Mesh
[Recast Library: Mikko Mononen, http://code.google.com/p/recastnavigation/]
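The distance field stage of the 2D example can be sketched as a multi-source breadth-first search. This is a minimal illustration, not Recast's implementation: the grid layout, the 4-connected neighborhood and the name `distance_field` are assumptions made for the example.

```python
from collections import deque

def distance_field(walkable):
    """Steps (4-connected) from every cell to the nearest non-walkable cell.

    walkable is a list of rows of booleans. Non-walkable cells get 0.
    Cells stay None only if the grid contains no non-walkable cell at all.
    """
    rows, cols = len(walkable), len(walkable[0])
    dist = [[None] * cols for _ in range(rows)]
    queue = deque()
    # Seed the search with every blocked cell at distance 0.
    for r in range(rows):
        for c in range(cols):
            if not walkable[r][c]:
                dist[r][c] = 0
                queue.append((r, c))
    # Expand outward one ring at a time.
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist

grid = ["#######",
        "#.....#",
        "#.....#",
        "#.....#",
        "#######"]
field = distance_field([[ch == "." for ch in row] for row in grid])
```

Cells far from any wall get the largest values, which is what a watershed pass would then treat as region centers.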
3D Watershed
Bungie & Zhejiang University
Zhefeng Wu, Xinguo Liu
3D Watershed Transform
Problems
• 3D is considerably harder/slower
• Over-segmentation (small regions)
• Sensitive to scene changes
• Simplified representation non-trivial
• What about visibility?
Collaboration with Umbra
• Automatic portal generation
• Incremental/local updates
• CPU-based solution, low latency
• Same solution for visibility and spatial connectivity
• Handles doors and elevators
• Precise around user-placed portals
• Fast run time / low memory footprint
Umbra Solution
Preprocess: polygon soup → automatic cell and portal generation
Runtime: visibility and connectivity queries
Preprocess Overview
Discretize the scene into voxels
Determine voxel connectivity with respect to input geometry
Propagate connectivity to find connected components
Determine portals between local connected components
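A minimal sketch of the "propagate connectivity to find connected components" step, assuming the voxels have already been classified and the empty ones are given as integer `(x, y, z)` coordinates. The 6-connected neighborhood and the name `connected_components` are assumptions for illustration; the slides do not specify how Umbra propagates connectivity.

```python
FACE_NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                  (0, -1, 0), (0, 0, 1), (0, 0, -1))

def connected_components(empty_voxels):
    """Flood-fill empty voxels; returns {voxel: component label}."""
    labels = {}
    next_label = 0
    for seed in sorted(empty_voxels):      # sorted only to make labels deterministic
        if seed in labels:
            continue
        labels[seed] = next_label
        stack = [seed]
        while stack:
            x, y, z = stack.pop()
            for dx, dy, dz in FACE_NEIGHBORS:
                neighbor = (x + dx, y + dy, z + dz)
                # Solid voxels are simply absent from the set, so they block the fill.
                if neighbor in empty_voxels and neighbor not in labels:
                    labels[neighbor] = next_label
                    stack.append(neighbor)
        next_label += 1
    return labels

room_a = {(x, 0, 0) for x in range(3)}        # x = 0..2
room_b = {(x, 0, 0) for x in range(4, 6)}     # x = 4..5, voxel x = 3 is solid
split = connected_components(room_a | room_b)
joined = connected_components(room_a | room_b | {(3, 0, 0)})
```

With the wall voxel in place the two rooms receive different labels; opening it merges them into a single component.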
Tile Grid
Subdivide the input geometry into tiles
Localizes computation
Distributed computing
Fast local changes
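One way to picture the tile grid is as a conservative binning of triangles by bounding box. The sketch below assumes a uniform grid of cubic tiles and triangles given as three `(x, y, z)` vertices; `bin_triangles` and `tile_size` are hypothetical names, not part of any Umbra API.

```python
import math
from collections import defaultdict

def bin_triangles(triangles, tile_size):
    """Map tile coordinates to the indices of triangles whose AABB touches the tile."""
    tiles = defaultdict(list)
    for index, tri in enumerate(triangles):
        lo = [math.floor(min(v[a] for v in tri) / tile_size) for a in range(3)]
        hi = [math.floor(max(v[a] for v in tri) / tile_size) for a in range(3)]
        for tx in range(lo[0], hi[0] + 1):
            for ty in range(lo[1], hi[1] + 1):
                for tz in range(lo[2], hi[2] + 1):
                    tiles[(tx, ty, tz)].append(index)
    return tiles

triangles = [
    ((1, 1, 1), (2, 1, 1), (1, 2, 1)),     # entirely inside tile (0, 0, 0)
    ((8, 1, 1), (12, 1, 1), (8, 2, 1)),    # straddles tiles (0, 0, 0) and (1, 0, 0)
]
tiles = bin_triangles(triangles, tile_size=10)
```

Because each tile only references the geometry that touches it, tiles can be processed independently (for example on different machines), and an edit to one triangle only invalidates the tiles listed for it.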
Tile Voxelization
Compute a BSP tree for each tile
Subdivide to discretization level
Skip empty space
Leaf nodes = voxels
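The effect of "skip empty space" can be shown with a small recursive subdivision. This is a 2D sketch under stated assumptions: splits are axis-aligned at the midpoint of the longer side, and geometry is stood in for by axis-aligned boxes `(min_x, min_y, max_x, max_y)`. The slides do not say how the per-tile BSP chooses its split planes.

```python
def overlaps(a, b):
    """True when two boxes (min_x, min_y, max_x, max_y) overlap with positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def subdivide(cell, geometry, voxel_size, leaves):
    """Split cell until it is empty or no larger than voxel_size; leaves are the voxels."""
    touching = [g for g in geometry if overlaps(cell, g)]
    x0, y0, x1, y1 = cell
    width, height = x1 - x0, y1 - y0
    if not touching or max(width, height) <= voxel_size:
        leaves.append((cell, bool(touching)))   # (bounds, contains geometry?)
        return
    if width >= height:
        mid = (x0 + x1) / 2
        halves = ((x0, y0, mid, y1), (mid, y0, x1, y1))
    else:
        mid = (y0 + y1) / 2
        halves = ((x0, y0, x1, mid), (x0, mid, x1, y1))
    for half in halves:
        subdivide(half, touching, voxel_size, leaves)

leaves = []
subdivide((0, 0, 4, 4), [(0, 0, 1, 1)], voxel_size=1, leaves=leaves)
```

For this 4x4 tile with one small occupied corner the tree ends in 5 leaves instead of the 16 a uniform grid would need; empty regions stay as large leaves.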
From Voxels to Cells and Portals
Classify voxels
Connect voxels
Local connected components represent view cells
Build portals between connected cells
Voxel Classification
Classify voxels
Connect voxels
Local connected components represent view cells
Build portals between connected cells
Voxel Connectivity
Classify voxels
Connect voxels
Local connected components represent view cells
Build portals between connected cells
Cells
Classify voxels
Connect voxels
Local connected components represent view cells
Build portals between connected cells
Portals
Classify voxels
Build voxel connections
Local connected components represent view cells
Build portals between connected cells
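Since cells are local connected components, neighboring cells meet where tiles meet. The sketch below is a simplified, hypothetical take on building portals along one shared tile boundary, reduced to a single column of facing voxel pairs. The function name, the `None`-for-solid convention and the portal records are assumptions for illustration only.

```python
def boundary_portals(left_edge, right_edge):
    """Group facing empty voxel pairs along a tile boundary into portals.

    left_edge[i] and right_edge[i] are the cell ids of the two voxels that face
    each other across the boundary at position i, or None when a voxel is solid.
    Each contiguous run that joins the same pair of cells becomes one portal.
    """
    portals = []
    current = None
    for i, (a, b) in enumerate(zip(left_edge, right_edge)):
        pair = (a, b) if a is not None and b is not None else None
        if pair is not None and current is not None and current["cells"] == pair:
            current["hi"] = i                      # extend the open run
        else:
            current = None
            if pair is not None:                   # start a new run
                current = {"cells": pair, "lo": i, "hi": i}
                portals.append(current)
    return portals

left = ["A0", "A0", None, "A1", "A1"]
right = ["B0", "B0", "B0", None, "B0"]
portals = boundary_portals(left, right)
```

Here cell `A0` gets a two-voxel portal into `B0`, and `A1` gets a separate one-voxel portal; positions where either side is solid produce no opening.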
Cell Graph
Optimize cells and portals to a runtime cell graph
Runtime algorithms are graph traversals
Graph structure allows limited dynamic changes
Runtime Algorithms
Preprocess: polygon soup → automatic cell and portal generation
Runtime: visibility and connectivity queries
Connectivity Algorithms
Connectivity is encoded in the precomputed cell graph
Connectivity queries are just graph traversals
Examples:
– Find connected region (region == set of view cells)
– Find shortest 3D path
– Intersection queries
– Ray casts
Combinations: ray cast -> connected region -> objects in region
Lots of possibilities for simulation and AI
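To make "connectivity queries are just graph traversals" concrete, here is a small sketch of a cell graph with a connected-region query (breadth-first search) and a shortest-path query (Dijkstra). The class, its method names, the per-portal `cost` and the `closed` set used to model a shut door are all assumptions for the example, not Umbra's runtime API.

```python
import heapq
from collections import deque

class CellGraph:
    def __init__(self):
        self.edges = {}        # cell -> list of (neighbor, cost, portal_id)
        self.closed = set()    # portal ids that are currently shut (e.g. doors)

    def add_portal(self, portal_id, a, b, cost):
        self.edges.setdefault(a, []).append((b, cost, portal_id))
        self.edges.setdefault(b, []).append((a, cost, portal_id))

    def neighbors(self, cell):
        for neighbor, cost, portal_id in self.edges.get(cell, []):
            if portal_id not in self.closed:
                yield neighbor, cost

    def connected_region(self, start):
        seen, queue = {start}, deque([start])
        while queue:
            for neighbor, _ in self.neighbors(queue.popleft()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return seen

    def shortest_path(self, start, goal):
        best, parent, heap = {start: 0.0}, {start: None}, [(0.0, start)]
        while heap:
            dist, cell = heapq.heappop(heap)
            if cell == goal:
                path = []
                while cell is not None:
                    path.append(cell)
                    cell = parent[cell]
                return path[::-1]
            if dist > best[cell]:
                continue                          # stale heap entry
            for neighbor, cost in self.neighbors(cell):
                candidate = dist + cost
                if candidate < best.get(neighbor, float("inf")):
                    best[neighbor], parent[neighbor] = candidate, cell
                    heapq.heappush(heap, (candidate, neighbor))
        return None

graph = CellGraph()
graph.add_portal("p0", "hall", "yard", 4.0)
graph.add_portal("p1", "yard", "armory", 3.0)
graph.add_portal("door", "hall", "armory", 2.0)
```

Closing the door is a small change to the graph: the same path query then routes through the yard instead.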
Visibility Algorithms
Practical analytic visibility in the cell graph
– Axis-aligned portals enable effective algorithms
From-point visibility queries
From-region visibility queries
Volumetric visibility
We can choose to be aggressive or conservative
Potentially Visible Sets
Deterministic conservative visibility
Computation time is directly related to culling efficiency
– Every solution is useful
– Sampling-based visibility solvers can take a long time to converge
Additional use cases:
– Identify visibility hotspots
– Cull always-hidden triangles
– Cull always-hidden lightmaps
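The additional use cases follow directly from having a conservative PVS per cell. A minimal sketch, assuming the PVS is available as a mapping from cell id to a set of triangle ids (the data layout and function names are hypothetical):

```python
def always_hidden(all_triangles, pvs_per_cell):
    """Triangles that appear in no cell's PVS.

    Safe to cull only because each PVS is conservative: if a triangle is
    missing from every set, it cannot be visible from any view cell.
    """
    visible_somewhere = set().union(*pvs_per_cell.values())
    return set(all_triangles) - visible_somewhere

def visibility_hotspot(pvs_per_cell):
    """Cell whose PVS is largest, i.e. the most expensive place to stand."""
    return max(pvs_per_cell, key=lambda cell: len(pvs_per_cell[cell]))

pvs = {"cell0": {0, 1}, "cell1": {1, 2, 5}}
hidden = always_hidden(range(6), pvs)
hotspot = visibility_hotspot(pvs)
```

The same set difference applies to lightmap texels or any other per-primitive data keyed by the PVS.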
Portal Culling
How to traverse 100K+ portals fast?
Recursive algorithm does not scale
– Many paths to one cell: combinatorial explosion
Rasterization-based approach
– BSP-style front-to-back traversal
– Update coverage buffer on entry and exit
– Fast: 16 pixels at a time with 128-bit SIMD vectors
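The reason a front-to-back pass avoids the combinatorial explosion can be sketched with bitmasks: each cell accumulates the screen area through which it is visible and is processed exactly once, however many portal paths lead to it. This is a simplified illustration and not Umbra's algorithm. It assumes a single 16-pixel scanline, a precomputed front-to-back cell order, and portals already rasterized to masks; a Python integer stands in for the packed SIMD row.

```python
ROW_BITS = 16  # one 16-pixel span per integer

def span_mask(x0, x1):
    """Mask with bits x0 .. x1-1 set."""
    return ((1 << (x1 - x0)) - 1) << x0

def visible_cells(front_to_back, portals, camera_cell):
    """portals maps a cell to a list of (destination cell, rasterized portal mask).

    Every destination must also appear in front_to_back. Returns
    {cell: mask of pixels it is seen through} for the visible cells.
    """
    seen_through = {cell: 0 for cell in front_to_back}
    seen_through[camera_cell] = (1 << ROW_BITS) - 1
    for cell in front_to_back:
        window = seen_through[cell]
        if window == 0:
            continue                              # nothing reaches this cell
        for dest, portal in portals.get(cell, []):
            # Clip the portal against what is visible so far and merge paths.
            seen_through[dest] |= window & portal
    return {cell: mask for cell, mask in seen_through.items() if mask}

portals = {
    "A": [("B", span_mask(0, 8)), ("C", span_mask(8, 16))],
    "B": [("D", span_mask(4, 8))],
    "C": [("D", span_mask(8, 10))],
    "D": [("E", span_mask(12, 16))],
}
result = visible_cells(["A", "B", "C", "D", "E"], portals, "A")
```

Cell `D` is reached through both `B` and `C`, yet it is visited once with the union of both windows; `E` is culled because its portal lies outside that union.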
Renderer Integration
• Focus on the pipeline, not on rendering techniques
• Visibility integration with game state extraction and rendering
Halo Reach Game Loop
• Coarse-grain parallelism
– System on a thread
• Explicit synchronization through state mirroring
• Mostly manual load-balancing
Halo Reach Game Loop
HW Thread 0: Simulation loop @ 30 Hz
HW Thread 1: Job kernel
HW Thread 2: Render loop @ 30 Hz
HW Thread 3: Audio loop
HW Thread 4: Job kernel, debug logging
HW Thread 5: Async tasks, I/O, misc.
Halo Reach: Simulation Thread (MP)
HW Thread 0: Simulation loop, 75-100%
HW Thread 1: Job kernel, 20-30%
HW Thread 2: Render loop, 70-100%
HW Thread 3: Audio loop, 50-80%
HW Thread 4: Job kernel, debug logging, 20-30%
HW Thread 5: Async tasks, socket polling, misc., 10-30% with bursts of 100% utilization
Simulation thread timeline: Object Update, Havok Update, Obj move; the frame is then published for rendering (game state mirror)
Halo Reach: Render Thread (MP)
Simulation thread timeline: Object Update, Havok Update, Obj move
Render thread timeline: Player Viewport 1 (PV1: Visib, PV1: Submission), then Player Viewport 2 (PV2: Visib, PV2: Submission)
Halo Reach: Thread Utilization
HW Thread 0: Simulation loop, 75-100% utilized
HW Thread 1: Job kernel, 20-30% utilized
HW Thread 2: Render loop, 70-100% utilized
HW Thread 3: Audio loop, 50-80% utilized
HW Thread 4: Job kernel, debug logging, 20-30% utilized
HW Thread 5: Async tasks, I/O, misc., 10-30% with bursts of 100% utilization
Can We Do Better?
• Observation #1: We don’t need the entire game state for rendering
Gamestate and Visibility
• In Reach, game state extraction happens before we do visibility
– That’s why we have to copy the entire game state
– Expensive (in ms and memory footprint)
Reduce CPU Latency
• Visibility is a large chunk of CPU time on render thread
• Yet CPU time is underutilized
– Underutilized HW threads
– Not to mention other platforms!
Gamestate and Visibility
• But we can invert that operation
• Only copy data for visible objects out of game state
– Only extract data for objects that will be rendered
Extract Post Visibility
• Better: Drive game extraction and processing based on results of visibility
– Only extract data for visible objects (both static and dynamic)
• No need to double buffer the entire game state
– Only buffer game data for the per-frame transient state for visible objects
– Smaller memory footprint
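A minimal sketch of extraction driven by visibility: instead of mirroring the whole game state, only the render-relevant fields of visible objects are copied into a per-frame packet. The `GameObject` and `RenderItem` types and their fields are hypothetical; the slides do not describe Bungie's actual data layout.

```python
from dataclasses import dataclass

@dataclass
class GameObject:
    object_id: int
    position: tuple
    health: int        # gameplay-only state the renderer never reads
    mesh: str

@dataclass(frozen=True)
class RenderItem:
    object_id: int
    position: tuple
    mesh: str

def extract_frame_packet(game_state, visible_ids):
    """Copy render-relevant data for visible objects only."""
    return [RenderItem(obj.object_id, obj.position, obj.mesh)
            for obj in (game_state[i] for i in sorted(visible_ids))]

game_state = {
    0: GameObject(0, (0, 0, 0), 100, "spartan"),
    1: GameObject(1, (50, 0, 0), 80, "elite"),      # not visible this frame
    2: GameObject(2, (5, 0, 2), 100, "warthog"),
}
packet = extract_frame_packet(game_state, visible_ids={2, 0})

# The simulation can keep mutating its own state; the packet is a snapshot.
game_state[0].position = (9, 9, 9)
```

The packet holds two small records rather than a second copy of every object, which is where the memory saving on the slide comes from.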
Better Load Balancing
• Start by splitting off visibility computation into jobs per view
– This includes visibility computations for player, shadow, reflection views
• Visibility jobs can have viewport-to-viewport dependencies
– Can reuse results of one visibility job computation as input to another
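Per-view visibility jobs with view-to-view dependencies can be sketched with the standard library: `graphlib.TopologicalSorter` (Python 3.9+) tracks which jobs are ready, and a thread pool runs them. The job names, the toy visibility sets, and the choice to make the shadow view consume the player view's result are assumptions for illustration, not the engine's job system.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

def run_jobs(jobs, dependencies, workers=4):
    """jobs: name -> callable(results); dependencies: name -> prerequisite names."""
    sorter = TopologicalSorter({name: dependencies.get(name, ()) for name in jobs})
    sorter.prepare()
    results, running = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            # Launch every job whose prerequisites have finished.
            for name in sorter.get_ready():
                running[pool.submit(jobs[name], results)] = name
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                results[name] = future.result()
                sorter.done(name)
    return results

jobs = {
    "player_view": lambda results: {"rock", "tree", "warthog"},
    # Hypothetical dependency: the shadow view starts from the player result.
    "shadow_view": lambda results: results["player_view"] | {"cliff"},
    "reflection_view": lambda results: {"tree"},
}
results = run_jobs(jobs, {"shadow_view": {"player_view"}})
```

Independent views (player and reflection here) can run on whichever threads are idle, while the dependent view is only scheduled once its input exists.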
Reducing Input Latency
• Stagger visibility computation so that it overlaps game object update
– Start static visibility early in the frame using a predicted camera
– Start this before we do object update
Improve CPU Latency
• Run expensive CPU render operations for visible objects only
– Just make sure to run this after visibility
– These would be render-only operations (skinning, cloth sim, polygon sorting) – they do not affect game play
An Improved Game Loop
Frame steps, in the order the slides introduce them (the timeline label Object Update/Move appears from the third step onward):
• Predict camera envelope
• Determine render views next
• Start static objects’ (environment) visibility and broadphase dynamic objects visibility for render views as jobs on available threads: player, shadows, etc.
• Execute object update jobs
• Run render prepare for static environment rendering jobs (bake precompiled command buffers, etc.)
• Finalize camera (poll input)
• Compute narrow phase dynamic objects visibility (as jobs)
• Prepare visible dynamic objects and extract game state data for them (in jobs)
• Execute final prepare jobs to finalize frame packet data
• Publish the frame for rendering
Streamlined Submission Thread
Timeline labels: Render submission job; Object Update, Havok Update, Obj move
Benefits
• Decouple game-state traversal from drawing
• Better CPU utilization with staggered visibility computation
– Earlier results for each frame mean reduced input latency
• Render thread becomes a streamlined kernel processor
A Simple Little Job Tree
Future Work
• Predict dynamic objects visibility with temporal bounding volume
• Fixup after final camera and object positions are known
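One simple reading of a "temporal bounding volume" is a box that encloses an object over the whole prediction interval, so visibility can be tested before the final position is known. The sketch below assumes an axis-aligned box and constant velocity over the interval; both are assumptions for illustration, since the slides do not say how the volume would be built.

```python
def temporal_bounds(box_min, box_max, velocity, dt):
    """AABB that contains the box at every time in [0, dt] under constant velocity."""
    moved_min = [m + v * dt for m, v in zip(box_min, velocity)]
    moved_max = [m + v * dt for m, v in zip(box_max, velocity)]
    lo = tuple(min(a, b) for a, b in zip(box_min, moved_min))
    hi = tuple(max(a, b) for a, b in zip(box_max, moved_max))
    return lo, hi

# Unit box moving +2 units/s in x and -4 units/s in z, predicted half a second ahead.
lo, hi = temporal_bounds((0, 0, 0), (1, 1, 1), velocity=(2, 0, -4), dt=0.5)
```

The enlarged box is conservative, so an object it marks as visible may still be rejected in the fix-up pass once the final camera and object positions are known.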