CHC++: Coherent Hierarchical Culling Revisited
Oliver Mattausch, Jiří Bittner,
Michael Wimmer
Institute of Computer Graphics and Algorithms
Vienna University of Technology
Oliver Mattausch 2
CHC++: Fast occlusion culling algorithm
Occlusion Culling
Render only visible geometry → Output sensitivity
Preprocessing vs. online occlusion culling
Preprocessing: Visibility from a region
Online Occlusion Culling
Query visibility from view point
+ No preprocessing
+ Dynamic scenes
Hardware occlusion queries → # visible pixels
Query bounding box → visible? → render geometry
Naive method: Hierarchical Stop & Wait
For each node: Issue query
Visible → traverse subtree
Invisible → cull subtree
Problem: Query latency
CPU stalls
GPU starvation
[Figure: hierarchical culling example; subtrees marked Cull or Render]
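The stop-and-wait traversal on this slide can be sketched as a small CPU-only simulation. This is not the authors' code: `Node`, `occlusion_query`, and the `visible` flag are hypothetical stand-ins for a real hierarchy and a hardware occlusion query, and the sketch only counts how often the traversal would block on a query result.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    visible: bool                      # stand-in for the GPU's answer
    children: List["Node"] = field(default_factory=list)


def occlusion_query(node: Node) -> bool:
    """Hypothetical blocking query; a real one stalls the CPU until the GPU answers."""
    return node.visible


def stop_and_wait(root: Node):
    """Return (names of rendered leaves, number of blocking waits)."""
    rendered, waits = [], 0
    stack = [root]
    while stack:
        node = stack.pop()
        waits += 1                     # one stall per tested node
        if not occlusion_query(node):
            continue                   # invisible: cull the whole subtree
        if node.children:
            stack.extend(reversed(node.children))  # visible interior: traverse
        else:
            rendered.append(node.name)             # visible leaf: render
    return rendered, waits


scene = Node("root", True, [
    Node("a", True, [Node("a1", True), Node("a2", False)]),
    Node("b", False, [Node("b1", True), Node("b2", True)]),
])
rendered, waits = stop_and_wait(scene)
```

In this toy scene the subtree under `b` is culled with a single query, but every tested node still costs one blocking wait, which is the latency problem the slide names.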
Previous Work
Coherent Hierarchical Culling (CHC) [Bittner04]
Near Optimal Hierarchical Culling (NOHC) [Guthe06]
CHC
While waiting for query result → traverse / render
Keep query queue
Use coherence, assume node stays (in)visible
For previously visible nodes
Don't wait for query result
Issue query
Render geometry
Use result in next frame
[Diagram label: Result available?]
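A minimal sketch of the coherence idea, assuming a flat list of nodes instead of a hierarchy. `chc_frame`, `prev_visible`, and `truly_visible` are hypothetical names, and the set lookup stands in for a real query result. Previously visible nodes are rendered without waiting; their result only updates the assumption for the next frame.

```python
from collections import deque


def chc_frame(nodes, prev_visible, truly_visible):
    """Simulate one frame; return (render order, visibility for next frame)."""
    rendered = []
    query_queue = deque()
    for node in nodes:
        query_queue.append(node)           # issue query, do not wait
        if prev_visible.get(node, False):
            rendered.append(node)          # assumed visible: render right away
    next_visible = {}
    while query_queue:                     # results are fetched later
        node = query_queue.popleft()
        result = node in truly_visible     # stand-in for the query result
        next_visible[node] = result
        if result and not prev_visible.get(node, False):
            rendered.append(node)          # newly visible: render once known
    return rendered, next_visible


nodes = ["a", "b", "c"]
frame1, vis1 = chc_frame(nodes, {}, {"a", "b"})
frame2, vis2 = chc_frame(nodes, vis1, {"b", "c"})
```

In the second frame node `a` is still rendered although it has become invisible; that is the price of assuming coherence, and it is corrected one frame later.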
Problems of CHC
Too many queries
Not GPU friendly
Many state changes
Bounding box query (8 vertices per draw call)
Can be slower (!) than view frustum culling (VFC)
[Figure: most houses visible → bad view point for CHC]
Properties of NOHC
+ Query only if cheaper than rendering
+ Mostly better than view frustum culling
+ Close to self-defined optimum
- Hardware calibration step
- Complex set of rules
Possible to beat the defined optimum
Can reduce cost of queries
Can further reduce # queries
Improved algorithm: CHC++
Reduction of
State changes
Queries
Wait time
Rendered geometry
Keeps simplicity of CHC
Building Blocks of CHC++
Query batching
Reduction of state changes
Reduction of CPU stalls
Multiqueries
Reduction of queries
Randomization
Better distribution of queries
Tight bounding volumes
Reduction of queries
Reduction of rendered geometry
Query Batching: State Changes
Switch between render / query mode → need state change (depth write on / off)
CHC induces one state change per query
Big overhead on modern GPUs!
Idea: Store query candidates in separate queue
Collect n nodes
Switch to query mode
Query all nodes
→ One state change per batch
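The effect of collecting query candidates can be illustrated with a small counting sketch. It is a simplification under stated assumptions: `switches_to_query_mode` is a hypothetical helper, the event list stands in for a real traversal, and only switches into query mode are counted.

```python
def switches_to_query_mode(events, batch_size):
    """Count switches into query mode for a stream of 'render'/'query' events.

    Query candidates are collected and issued together once batch_size
    candidates are waiting; batch_size=1 models one switch per query.
    """
    pending = 0
    switches = 0
    for event in events:
        if event == "query":
            pending += 1
            if pending == batch_size:
                switches += 1          # one switch, then issue all pending queries
                pending = 0
        # 'render' events are drawn in render mode and need no switch here
    if pending:
        switches += 1                  # flush the remaining candidates
    return switches


# Illustrative traversal: 100 queries interleaved with rendering.
events = ["render", "query"] * 100
unbatched = switches_to_query_mode(events, batch_size=1)
batched = switches_to_query_mode(events, batch_size=50)
```

With these illustrative inputs the unbatched run switches 100 times and the batched run twice.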
Query Batching: Previously invisible nodes
[Figure: nodes move from the candidate queue to the query queue; the batched queries are issued after a state change out of render mode]
Query Batching: Previously visible nodes
No dependencies (geometry rendered anyway)
Can issue query at any time
Handle them in separate queue
Issue queries to fill up wait time
Very likely no new state change
Issue rest of queries in next frame
Query Batching: Visualization
CHC: ~100 state changes
CHC++: 2 state changes (max. batch size: 50)
Each color represents a state change
Multiqueries: Idea
Node invisible for long time
Likely to stay invisible (e.g., car engine block)
Cover many nodes with single query
Test q invisible nodes by single multiquery
Invisible → saved (q – 1) queries
Visible → must test individually, wasted one query
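The two outcomes on this slide give a simple expected cost, sketched below. The sketch assumes `p_all_invisible` is the probability that all q nodes are still invisible; the function names are hypothetical.

```python
def expected_queries(q, p_all_invisible):
    """Expected number of queries when q nodes are covered by one multiquery.

    Success (probability p): 1 query in total.
    Failure (probability 1 - p): the multiquery plus q individual queries.
    """
    return 1 + (1 - p_all_invisible) * q


def multiquery_pays_off(q, p_all_invisible):
    """True if the multiquery is expected to cost less than q single queries."""
    return expected_queries(q, p_all_invisible) < q


cost = expected_queries(4, 0.75)
```

Under this model the multiquery is worthwhile only when p > 1/q, so larger groups tolerate a lower success probability but waste more queries when they fail.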
Multiqueries: Minimize #queries
Use history of nodes
Estimate probability that node will still be invisible in frame n if it was invisible in frame n - 1
Measurements behave like an exponential function → an exp() fit is sufficient in practice
[Figure: fitted and measured functions]
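One way such a fitted function could look is sketched below. The form a - b * exp(-t) and the constants are illustrative assumptions, not values taken from these slides; `frames_invisible` is assumed to be the number of consecutive frames the node has already been invisible.

```python
import math


def p_keep(frames_invisible, a=0.99, b=0.7):
    """Assumed model: probability that a node stays invisible one more frame.

    a and b are placeholder constants for illustration only.
    """
    return a - b * math.exp(-frames_invisible)


probabilities = [p_keep(t) for t in range(6)]
```

The estimate rises quickly with the length of the invisible history and saturates just below `a`, which matches the observation that long-invisible nodes are likely to stay invisible.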
Multiqueries: Greedy Algorithm
While node batch not empty
Add node to multiquery
Use cost / benefit model
Query size optimal → issue multiquery
[Figure: visualization; each color represents a multiquery]
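The greedy loop can be sketched as follows. The cost / benefit model here is an assumption made for illustration: benefit is the number of nodes covered, expected cost is 1 + (1 - P) * size, P is the product of per-node keep probabilities (independence assumed), and nodes are added while benefit / cost keeps increasing. `greedy_multiqueries` is a hypothetical name.

```python
def greedy_multiqueries(keep_probabilities):
    """Group nodes, given by their keep probabilities, into multiqueries."""
    batch = sorted(keep_probabilities, reverse=True)   # most stable nodes first
    groups = []
    i = 0
    while i < len(batch):                  # while node batch not empty
        group, p_all, best_value = [], 1.0, 0.0
        while i < len(batch):
            p_new = p_all * batch[i]       # all nodes must stay invisible
            size = len(group) + 1
            value = size / (1 + (1 - p_new) * size)    # benefit / expected cost
            if value <= best_value:
                break                      # query size optimal: issue multiquery
            group.append(batch[i])         # add node to multiquery
            p_all, best_value = p_new, value
            i += 1
        groups.append(group)
    return groups


groups = greedy_multiqueries([0.99, 0.3, 0.99, 0.99])
```

In this example the three stable nodes share one multiquery, while the unstable node is queried on its own.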
Queues in CHC++
traversal queue
v-queue (visible nodes)
i-queue (invisible nodes)
query queue
[Diagram label: Multiquery]
Randomization: Assumed Visibility
Test previously visible nodes each frame → queries wasted
Assume visible for t frames
Frame rate drops every t frames
[Figure: query (Q) timelines per node]
Randomization: Idea
When node becomes visible
Randomize first invocation within [1, t]
Afterwards, test every t frames
[Figure: query (Q) timelines per node]
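The randomized schedule can be sketched in a few lines. `query_frames` and its parameters are hypothetical names, and a seeded generator is used only to make the example repeatable.

```python
import random
from collections import Counter


def query_frames(became_visible_at, t, last_frame, rng):
    """Frames in which a node is re-tested after becoming visible."""
    first = became_visible_at + rng.randint(1, t)    # randomized first test in [1, t]
    return list(range(first, last_frame + 1, t))     # afterwards every t frames


rng = random.Random(0)
t = 5
# 1000 nodes that all become visible in frame 0.
schedules = [query_frames(0, t, 20, rng) for _ in range(1000)]
load = Counter(frame for schedule in schedules for frame in schedule)
```

`load` counts the queries per frame. Without the random offset all 1000 nodes would be re-tested in the same frames; with it, each node is still tested once per window of t frames, but the tests are spread over the window.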
Randomization: Properties
+ Even distribution over frames
+ Nodes tested in regular intervals
+ Very stable for t between 5 and 10
Tried sophisticated models
But they could not beat the simple randomization!
Tight Bounding Volumes: Idea
Optimization for bounding volume hierarchy (BVH)
For each node → query bounding boxes of children (using single query)
Child boxes invisible → cull node, saved 2 queries
Box visible → traverse node
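A toy one-dimensional example shows why the children's boxes can be tighter than the node's own box. Everything here is an assumption for illustration: boxes are pixel intervals, `unoccluded` is the set of pixels not hidden by occluders, and `visible_pixels` stands in for a single occlusion query that covers several boxes.

```python
def visible_pixels(intervals, unoccluded):
    """Toy occlusion query: unoccluded pixels covered by any of the intervals."""
    covered = set()
    for lo, hi in intervals:
        covered.update(range(lo, hi))      # half-open interval [lo, hi)
    return len(covered & unoccluded)


unoccluded = {4, 5}                        # only these pixels are not occluded
children = [(0, 3), (7, 10)]               # bounding intervals of the two children
node_box = (min(lo for lo, _ in children), max(hi for _, hi in children))

loose = visible_pixels([node_box], unoccluded)   # query the node's own box
tight = visible_pixels(children, unoccluded)     # query both child boxes at once
```

The node's own box overlaps the unoccluded gap and would be reported visible, while the single query over the two child boxes returns zero pixels, so the node is culled without querying either child separately.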
Tight Bounding Volumes for Leaves
Subdivide deeper than actual hierarchy depth → tight bounds also for leaves
Better adjusts to shape of objects
[Figure: tight bounds shown in red]
Tight Bounding Volumes: Properties
+ Bounds for interiors reduce queries
+ Bounds for leaves reduce geometry (without overhead of deeper hierarchy)
+ More boxes per query not relevant for hardware
Results
Powerplant (12M triangles)
Pompeii (6M triangles)
Results: Powerplant (12M triangles)
Results: Powerplant (12M triangles)
CHC + Batching + Randomization + Tight Bounds + Multiqueries
Results: Pompeii (6M triangles)
Conclusions
+ Hierarchical culling algorithm based on CHC
+ Kept simplicity of CHC
+ Improved several issues of CHC
+ Always better than view frustum culling
+ Up to 2 – 3 times speedup
+ Better than optimum as defined by NOHC
- Drawback: several parameters
Multiqueries: Vienna (1M triangles)
Questions?
THANK YOU FOR YOUR ATTENTION!
Batching: Render engine integration
Rendering in modern engines
Collect visible objects in render queue
Sort by materials
Render everything at once
Rendering single nodes is inefficient (CHC)
With batching:
Traverse render queue only before switch to query mode
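A minimal sketch of such a render queue, assuming each entry is a hypothetical `(material, object_name)` pair. `flush_render_queue` would be called once, right before the switch to query mode.

```python
from itertools import groupby


def flush_render_queue(render_queue):
    """Sort queued objects by material, group them into draw batches, clear the queue."""
    ordered = sorted(render_queue, key=lambda item: item[0])   # stable sort by material
    batches = [
        (material, [name for _, name in group])
        for material, group in groupby(ordered, key=lambda item: item[0])
    ]
    render_queue.clear()               # everything is rendered at once
    return batches


queue = [("brick", "house1"), ("glass", "window"), ("brick", "house2")]
batches = flush_render_queue(queue)
```

Visible nodes are only appended to the queue during traversal, so the engine keeps its material sorting and still renders all occluders before the batched queries are issued.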
Future Work
(Semi-)automatically find optimal values for parameters
Calibrate parameters during representative walkthrough
Remove overhead also for difficult view points
Stop & Wait example (naive hierarchical method):
[Figure: one query per hierarchy node; subtrees marked Cull or Render]
Results
Powerplant (12M tri)
Pompeii (6M tri)
Vienna (1M tri)