GPU-Based Scene Management for Rendering Large Crowds

Joshua Barczak, AMD, Inc.

Natalya Tatarchuk, AMD, Inc.

Christopher Oat, AMD, Inc.

1 Introduction

Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters. Crowd rendering in large environments presents a number of challenges, including visibility culling, animation, and level of detail (LOD) management. These have traditionally been CPU-based tasks, trading some extra CPU work for a larger reduction in the GPU load, but the per-character cost can be a serious bottleneck. Furthermore, CPU-side scene management is difficult if objects are simulated and animated on the GPU. We present a practical solution that allows rendering of large crowds of characters, from a variety of viewpoints, with stable performance and excellent visual quality. Our system uses DirectX® 10 functionality to perform view-frustum culling, occlusion culling, and LOD selection entirely on the GPU, allowing thousands of GPU-simulated characters to be rendered with full shadows in arbitrary environments. To our knowledge, this is the first system presented that supports this functionality.

2 Scene Management

We start with a vertex buffer containing all of the per-instance data needed to render each character, such as character positions and orientations. In our case, this information is obtained from a GPU-based crowd simulation, but a CPU-based simulation or user input could also be used. A key idea behind our system is the use of geometry shaders that act as filters for a set of character instances. Given the instance data, we want to generate a set of buffers containing the visible instances at each level of detail. We do this by repeatedly rendering the instances as point primitives, using a geometry shader to filter the stream by re-emitting points that pass a particular test. Multiple filtering passes can be chained together by using successive DrawAuto calls, and different tests can be set up simply by using different shaders.
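The filter-and-chain pattern above can be illustrated with a CPU analogue. This is a minimal Python sketch, not code from the system: each "pass" re-emits only the instances that satisfy its predicate, and passes chain by consuming the previous pass's output, as DrawAuto does with stream-out buffers on the GPU. The predicates and instance fields here are hypothetical stand-ins for the shader tests.

```python
# CPU analogue of geometry-shader stream filtering: each pass re-emits
# only the instances that pass its test; passes chain by consuming the
# previous pass's output (the DrawAuto pattern).

def filter_pass(instances, predicate):
    """One filtering pass: emit only instances passing the test."""
    return [inst for inst in instances if predicate(inst)]

# Hypothetical predicates standing in for the shader-side tests.
in_frustum = lambda inst: inst["x"] < 100.0
unoccluded = lambda inst: not inst.get("hidden", False)

instances = [
    {"x": 10.0},                    # visible
    {"x": 50.0, "hidden": True},    # in frustum but occluded
    {"x": 500.0},                   # outside the frustum
]
survivors = filter_pass(filter_pass(instances, in_frustum), unoccluded)
print(len(survivors))  # 1
```

Swapping the predicate corresponds to binding a different shader between passes; no other part of the pipeline changes.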

2.1 Instance Culling

It is straightforward to implement view-frustum culling using a stream filtering pass, as described above. In a frustum culling pass, the vertex shader performs a frustum intersection test against the character's bounding sphere, and the geometry shader re-emits characters that pass the test.
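The per-instance test the vertex shader performs can be sketched on the CPU. This is an illustrative Python version; the plane representation and the example "frustum" are assumptions for the sketch, not details from the system.

```python
# Sphere-vs-frustum test, as a vertex shader would perform per instance.
# A frustum is six inward-facing planes (a, b, c, d); a sphere is culled
# if its center lies farther than -radius behind any one plane.

def sphere_in_frustum(center, radius, planes):
    """Return True if the bounding sphere may intersect the frustum."""
    cx, cy, cz = center
    for a, b, c, d in planes:
        # Signed distance from the sphere center to the plane.
        if a * cx + b * cy + c * cz + d < -radius:
            return False  # entirely behind one plane: cull
    return True

# Example: a unit-cube "frustum" centered at the origin.
planes = [
    ( 1, 0, 0, 1), (-1, 0, 0, 1),   # x = -1 and x = +1 planes
    ( 0, 1, 0, 1), ( 0,-1, 0, 1),   # y planes
    ( 0, 0, 1, 1), ( 0, 0,-1, 1),   # z planes
]
print(sphere_in_frustum((0.0, 0.0, 0.0), 0.5, planes))  # True (inside)
print(sphere_in_frustum((5.0, 0.0, 0.0), 0.5, planes))  # False (culled)
```

On the GPU, the geometry shader simply re-emits the point when this test returns true.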

A unique benefit of our method is that we can also perform occlusion culling by examining the depth buffer in a vertex shader and comparing it to the minimum depth of the character's bounding volume. This allows culling against arbitrary occluders in dynamic environments, without preprocessing, and without the use of queries or predication, which are too expensive to apply per-instance. Prior to culling, we build a Hierarchical Z (Hi-Z) image [Greene et al. 1993] using the information contained in the Z buffer. At cull time, each object chooses a MIP level in the Hi-Z image based on the projected size of its bounding volume, and fetches a fixed number of texels for the occlusion test.
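The Hi-Z test can be sketched as follows. This is a hedged CPU illustration in Python: the helper names are invented, and it fetches a single texel per object for brevity, whereas the system fetches a small fixed number. The Hi-Z chain stores the maximum (farthest) depth of each texel's footprint, so an object is occluded when the nearest depth of its bounding volume is farther than everything in that footprint.

```python
# CPU sketch of Hi-Z occlusion culling: max-reduce the depth buffer into
# a mip chain, pick a mip from the projected size, and compare depths.

def build_hiz(depth):
    """Build a max-reduction mip chain from a square 2^n x 2^n depth buffer."""
    chain = [depth]
    while len(chain[-1]) > 1:
        prev = chain[-1]
        n = len(prev) // 2
        chain.append([[max(prev[2*y][2*x],   prev[2*y][2*x+1],
                           prev[2*y+1][2*x], prev[2*y+1][2*x+1])
                       for x in range(n)] for y in range(n)])
    return chain

def mip_for_size(pixels):
    """Pick the mip level whose texels cover the projected bounds."""
    level = 0
    while (1 << level) < pixels:
        level += 1
    return level

def occluded(hiz, x, y, size_pixels, min_depth):
    level = min(mip_for_size(size_pixels), len(hiz) - 1)
    texel = hiz[level][y >> level][x >> level]
    # Occluded if the object's nearest point is behind the farthest
    # occluder depth anywhere in the covered footprint.
    return min_depth > texel

# 4x4 depth buffer: left half has a near occluder (0.2), right half is empty.
depth = [[0.2, 0.2, 1.0, 1.0] for _ in range(4)]
hiz = build_hiz(depth)
print(occluded(hiz, 0, 0, 2, 0.5))  # True: behind the near occluder
print(occluded(hiz, 2, 0, 2, 0.5))  # False: nothing in front of it
```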

2.2 LOD Selection

After culling, it is still necessary to group visible instances by LOD. In general, the number of LODs can be chosen based on the size and complexity of the environment and the number of characters rendered. In our system, we use a static, three-level LOD scheme, in which the distance to the object centroid determines the LOD. LOD sorting is performed using three successive filtering passes into three output buffers, where each pass emits only the instances that fall into a particular LOD bucket. Note that the culling tests are performed only once per instance, and the results are re-used during the LOD passes.
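The distance-based three-level selection can be sketched in Python. The distance thresholds below are hypothetical; the paper does not give its actual values. On the CPU a single pass suffices, whereas the GPU version uses three filtering passes, one per output buffer.

```python
# Sketch of static three-level LOD selection by distance to the centroid.
import math

LOD_THRESHOLDS = (20.0, 60.0)  # hypothetical near/middle split distances

def select_lod(camera, centroid):
    d = math.dist(camera, centroid)
    if d < LOD_THRESHOLDS[0]:
        return 0  # closest: tessellated, displacement-mapped
    if d < LOD_THRESHOLDS[1]:
        return 1  # middle: conventional rendering
    return 2      # furthest: simplified geometry and shaders

def bucket_by_lod(camera, visible_instances):
    """Group surviving instances into per-LOD buckets."""
    buckets = ([], [], [])
    for inst in visible_instances:
        buckets[select_lod(camera, inst)].append(inst)
    return buckets

cam = (0.0, 0.0, 0.0)
instances = [(5.0, 0.0, 0.0), (30.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
lods = bucket_by_lod(cam, instances)
print([len(b) for b in lods])  # [1, 1, 1]
```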

Figure 1: A crowd of GPU-simulated characters (left). Characters are color-coded by LOD (right).

3 Rendering

After the LOD filtering passes, a GPU query is used to read the number of instances for each LOD, and separate DrawInstanced calls are issued to render the LOD groups. The readback of the instance count is necessary in order to pass the count to the DrawInstanced calls, but the GPU stall it introduces can be avoided if additional rendering work is submitted prior to the readback. We use hardware tessellation and displacement mapping for the closest LOD to provide a high level of detail in close-ups, conventional rendering for the middle LOD, and simplified geometry and shaders for the furthest LOD. To animate our characters, we sample the skeletal animations for each animation cycle (running, digging, etc.) and pack the resulting curves into a texture array, which is sampled by the characters' vertex shaders. This gives us full animation control for each character directly on the GPU.
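The curve packing and playback can be illustrated with a small CPU sketch. The uniform keying and linear interpolation below are assumptions about a typical implementation, not details from the paper; each "texture row" holds one animation cycle, and a character's vertex shader would fetch and blend two adjacent samples exactly as the interpolation here does.

```python
# Sketch of per-character animation playback from packed curves: one row
# per animation cycle, sampled with wrapping linear interpolation.

def sample_curve(curve, t):
    """Sample a uniformly-keyed looping curve at normalized time t."""
    n = len(curve)
    pos = (t % 1.0) * n            # wrap for looping cycles
    i = int(pos)
    frac = pos - i
    a, b = curve[i], curve[(i + 1) % n]
    return a + (b - a) * frac      # linear blend of adjacent keys

# A packed "texture array": two cycles, four keys each (e.g. a joint angle).
cycles = {
    "running": [0.0, 1.0, 0.0, -1.0],
    "digging": [0.0, 0.5, 1.0,  0.5],
}
print(sample_curve(cycles["running"], 0.125))  # 0.5, halfway between keys 0 and 1
```

Because each character carries its own cycle index and phase in its instance data, no CPU involvement is needed per character per frame.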

4 Shadows

A high-quality rendering system requires dynamic shadows cast by characters onto the environment and onto themselves. To manage shadow map resolution, our system implements Parallel-Split Shadow Maps [Zhang et al. 2006]. The view-frustum test described in Section 2.1 is used to ensure that only characters within a particular parallel-split frustum are rendered. Occlusion culling could also be used for shadow maps, but we do not do this in our system, because only characters and smaller scene elements are rendered into the shadow maps and there is little to cull the characters against (shadows cast by terrain are handled separately). We use aggressive filtering to generate soft shadows, which allows us to apply further mesh simplification to the LODs rendered into the shadow maps. For characters in the higher-detail shadow frusta, we use the same simplified geometry that is used for the most distant level of detail during normal rendering. For more distant shadows, we can use a more extreme simplification.
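Split-plane placement is not detailed above; the parallel-split scheme of Zhang et al. blends logarithmic and uniform split distances along the view frustum. The sketch below is a Python illustration of that placement; the blend weight of 0.5 is a conventional choice, not a value taken from the paper.

```python
# Sketch of parallel-split shadow map split placement: blend the
# logarithmic split C_log = n*(f/n)^(i/m) with the uniform split
# C_uni = n + (f-n)*(i/m), per Zhang et al.'s practical scheme.

def pssm_splits(near, far, num_splits, weight=0.5):
    """Return split distances C_0..C_m with C_0 = near and C_m = far."""
    splits = []
    for i in range(num_splits + 1):
        f = i / num_splits
        log_split = near * (far / near) ** f
        uni_split = near + (far - near) * f
        splits.append(weight * log_split + (1.0 - weight) * uni_split)
    return splits

s = pssm_splits(1.0, 100.0, 3)
print([round(x, 2) for x in s])  # [1.0, 19.32, 44.27, 100.0]
```

Each split frustum then gets its own shadow map, and the Section 2.1 frustum test selects the characters rendered into each one.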

References

GREENE, N., KASS, M., AND MILLER, G. 1993. Hierarchical Z-buffer visibility. In SIGGRAPH '93: Proceedings of the 20th Annual Conference on Computer Graphics and Interactive Techniques, ACM, New York, NY, USA, 231–238.

ZHANG, F., SUN, H., XU, L., AND LUN, L. K. 2006. Parallel-split shadow maps for large-scale virtual environments. In VRCIA '06: Proceedings of the 2006 ACM International Conference on Virtual Reality Continuum and Its Applications, ACM, New York, NY, USA, 311–318.