Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic...

27
Realistic Ambient Occlusion In Real-Time Survey of the AOV algorithm and propositions for improvements Master of Science Thesis in Computer Science JOAKIM CARLSSON Chalmers University of Technology University of Gothenburg Department of Computer Science and Engineering Göteborg, Sweden, 2010

Transcript of Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic...

Page 1: Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic Ambient Occlusion In Real-Time Survey of the AOV algorithm and propositions for

Realistic Ambient Occlusion In Real-TimeSurvey of the AOV algorithm and propositions for improvements

Master of Science Thesis in Computer Science

JOAKIM CARLSSON

Chalmers University of TechnologyUniversity of GothenburgDepartment of Computer Science and EngineeringGöteborg, Sweden, 2010

Page 2: Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic Ambient Occlusion In Real-Time Survey of the AOV algorithm and propositions for

The Author grants to Chalmers University of Technology and University of Gothenburgthe non-exclusive right to publish the Work electronically and in a non-commercialpurpose make it accessible on the Internet. The Author warrants that he/she is the author to the Work, and warrants that the Workdoes not contain text, pictures or other material that violates copyright law.

The Author shall, when transferring the rights of the Work to a third party (for example apublisher or a company), acknowledge the third party about this agreement. If the Authorhas signed a copyright agreement with a third party regarding the Work, the Authorwarrants hereby that he/she has obtained any necessary permission from this third party tolet Chalmers University of Technology and University of Gothenburg store the Workelectronically and make it accessible on the Internet.

Realistic Ambient Occlusion In Real-TimeSurvey of the AOV algorithm and propositions for improvements

Joakim Carlsson

© Joakim Carlsson, 2010.

Examiner: Ulf Assarsson

Chalmers University of TechnologyUniversity of GothenburgDepartment of Computer Science and EngineeringSE-412 96 GöteborgSwedenTelephone + 46 (0)31-772 1000

Cover:Image illustrating some of the effects achieved with the technique presented in this thesis.

Department of Computer Science and EngineeringGöteborg, Sweden 2010

Page 3: Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic Ambient Occlusion In Real-Time Survey of the AOV algorithm and propositions for

AbstractAmbient Occlusion is an area that has seen increased activity in the last few years and is an important

aspect of reality in 3D-graphics. Rough approximation algorithms that are fast enough to be used in realgames have appeared, but realistic solutions still have a way to go.

This thesis covers an evaluation of the recently published Ambient Occlusion Volumes algorithm andpresents various attempts of similar approaches to increase the algorithm performance. While no majorbreakthrough were made, it presents some new ideas and further examines some of those presented inthe original paper. The algorithm is currently not fit for real-time rendered ambient occlusion except forapplications dedicated to that purpose, but this might change in the near future, as it relies heavily uponthe GPU which is a fast advancing field of technology.

Page 4: Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic Ambient Occlusion In Real-Time Survey of the AOV algorithm and propositions for

SammanfattningKontaktskuggor är ett område som har sett ett ökat intresse de senaste åren. Grova approximeringsal-

goritmer har dykt upp, men verklighetstrogna lösningar är inte riktigt där än.Denna rapport presenterar en undersökning av den nyligen presenterade algoritmen Ambient Occlu-

sion Volumes. Den går även igenom ett flertal försök med liknande algoritmer i syftet att öka prestandavid körning. Inga större genombrott har gjorts, men den presenterar ett antal nya idéer och går även indjupare på vissa områden än originalrapporten. Algoritmen är ännu för långsam för att användas till attrendera kontaktskuggor i realtid annat än i program som inte behöver övrig tillgång till grafikkortet. Dettakan emellertid komma att ändras inom en snar framtid då grafikkortsteknologin går starkt framåt.

Page 5: Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic Ambient Occlusion In Real-Time Survey of the AOV algorithm and propositions for

CONTENTS CONTENTS

Contents

1 Introduction 21.1 Previous work . . . . . . . . . . . . . . 21.2 Problem statement . . . . . . . . . . . 2

2 Ambient Occlusion Volumes 42.1 Deferred rendering . . . . . . . . . . . 42.2 Ambient occlusion volumes . . . . . . 42.3 Algorithm . . . . . . . . . . . . . . . . 4

3 Suggested improvements 53.1 Pre-calculated self-occlusion . . . . . . 5

3.1.1 Algorithm outline . . . . . . . 53.1.2 Efficiently pre-calculating the

self-occlusion . . . . . . . . . . 53.1.3 Rendering . . . . . . . . . . . . 63.1.4 Performance . . . . . . . . . . . 63.1.5 Conclusions . . . . . . . . . . . 6

3.2 Clipping . . . . . . . . . . . . . . . . . 63.2.1 GPU-GEM . . . . . . . . . . . . 73.2.2 Our method . . . . . . . . . . . 73.2.3 Approximation methods . . . . 83.2.4 Performance & Conclusion . . 8

3.3 Upsampling . . . . . . . . . . . . . . . 93.3.1 Bilateral filter . . . . . . . . . . 93.3.2 Implementation . . . . . . . . . 93.3.3 Results & Performance . . . . . 93.3.4 Conclusion . . . . . . . . . . . 10

3.4 Pre-calculated AO and clipping func-tion . . . . . . . . . . . . . . . . . . . . 103.4.1 Naïve hash function . . . . . . 103.4.2 Using spherical coordinates to

map . . . . . . . . . . . . . . . . 10

3.4.3 Performance evaluation andConclusions . . . . . . . . . . . 10

3.5 Hard-limiting maximum number ofoverdraws . . . . . . . . . . . . . . . . 11

3.6 SSAO using the AOp function . . . . . 113.6.1 Triangle buffer . . . . . . . . . 113.6.2 Computing AO . . . . . . . . . 113.6.3 Advantages and Limitations . 113.6.4 Conclusions . . . . . . . . . . . 12

3.7 Ambient Occlusion lookup table us-ing circle-approximation . . . . . . . . 123.7.1 Main idea . . . . . . . . . . . . 123.7.2 Circle approximation (Pixel

shader) . . . . . . . . . . . . . . 133.7.3 Finding occluded surface area

(Pre-calculated) . . . . . . . . . 133.7.4 Performance . . . . . . . . . . . 133.7.5 Conclusions . . . . . . . . . . . 133.7.6 Further work . . . . . . . . . . 14

3.8 Pre-calculated Volumes . . . . . . . . . 14

4 Performance Evaluation 154.1 Results . . . . . . . . . . . . . . . . . . 15

4.1.1 Simple box . . . . . . . . . . . . 164.1.2 Multiple boxes . . . . . . . . . 174.1.3 Stairs . . . . . . . . . . . . . . . 184.1.4 Indoor scene . . . . . . . . . . . 194.1.5 Sponza . . . . . . . . . . . . . . 204.1.6 Summary . . . . . . . . . . . . 21

4.2 Discussion . . . . . . . . . . . . . . . . 21

5 Conclusions 225.1 Future work . . . . . . . . . . . . . . . 225.2 Conclusion . . . . . . . . . . . . . . . . 22

1

Page 6: Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic Ambient Occlusion In Real-Time Survey of the AOV algorithm and propositions for

1 INTRODUCTION

1 Introduction

Ambient illumination is a term that describes theindirect illumination of distant light, such as lightfrom the sky, or light reflected on distant objects.Ambient occlusion refers to how nearby objects canocclude parts of this ambient light, producing indi-rect shadows or contact shadows. In effect, this is thedarkening of curved or closed surfaces in a scene,for example if we have a car on a cloudy day, wewill have a darkening underneath it due to the am-bient occlusion (the car blocks the ambient light thatwould otherwise have reached the ground).

Computing this term efficiently is currently oneof the most important topics in real time graphics,as it greatly improves the visual quality of images,not only making it more physically correct but alsoproviding a visually pleasing experience.

Much of the current research in ambient occlu-sion deals with utilizing the GPU to produce ambi-ent occlusion in images. An early production-readysolution was the CryTek Screen Space Ambient Oc-clusion (SSAO) presented in [Mittring, 2007] and uti-lized in their graphics engine CryEngine 2. Thiswas followed by a great number of papers on im-provements and variations on this technique, suchas [Bavoil and Sainz, 2009] and [Bavoil et al., 2008].

A slightly different approach to this, but stillin screen space, was introduced in [McGuire, 2009],called Ambient Occlusion Volumes. The algorithmis analytical and produces smooth and near groundtruth results at impressing frame rates, but can notbe considered real-time for production use. This the-sis examines the algorithm and introduces severalattempts to improve it, with the goal of making itsuitable for real-time applications, such as games, onmodern hardware.

1.1 Previous work

A technique described in GPU Gems 2[Matt Pharr, 2005] treats polygon meshes as a setof surface elements. The benefit of this is that cal-culations are not needed between each and everyelement. A fast approximation is then done to com-pute the shadowing from blocking geometry.

A further improvement to the technique de-scribed in GPU Gems 2 is suggested in GPU Gems 3[Boubekeur and Schlick, 2007]. They give a demon-stration on a few changes to the algorithm that in-creases the usefulness and the robustness of the tech-

nique.Another paper which makes use of ideas pre-

sented in GPU Gems 2 [Matt Pharr, 2005] is papernamed "Accelerated Ambient Occlusion Using Spa-tial Subdivision Structures" [Wassenius, 2005]. Thetechnique adds geometry to a spatial subdivisionstructure (an octree) and traverses it to find nearbyoccluders.

Doing ambient occlusion in screen-space has re-cently become very popular. On of the first in-troduced was the Screen-Space Ambient Occlusion(SSAO) [Mittring, 2007], in which the idea is to ap-proximately calculate the ambient occlusion of apixel by sampling the a depth buffer. A papercalled Screen-Space Directional Occlusion (SSDO)[Ritschel et al., 2009] builds on this technique, andenhances it with directional information. The direc-tional information can then used to calculate localindirect illumination, or together with an environ-ment map; directional shadows. This provides aneven more accurate solution than the SSAO solution,with little computational cost.

By precalculating a set of factors around an ob-ject, from which the ambient occlusion factor canbe derived, [Kontkanen and Laine, 2005] were ableto perform ambient occlusion between objects veryquickly. [Malmer et al., 2007] built upon this idea,but stored the occlusion factors in a 3D grid instead,which made the algorithm even faster and can han-dle self-occlusion in the same pass.

[Evans, 2006] also uses 3D textures to estimateambient occlusion, but instead of per object they useone large grid for the whole scene, which they canquickly create and then use to estimate ambient oc-clusion.

[Reinbothe et al., 2009] introduces the idea ofvoxelizing the scene each frame, and then usingthis information to calculate the ambient occlusion,achieving interactive frame rates.

1.2 Problem statement

This section will give a brief analytical motivationfor ambient occlusion. The rendering equation iscommonly defined as:

Lo(x, ω) = Le(x, ω)+

∫Ω

fr(x, ω′, ω)Li(x, ω

′)(ω′·n)dω′

2

Page 7: Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic Ambient Occlusion In Real-Time Survey of the AOV algorithm and propositions for

1.2 Problem statement 1 INTRODUCTION

where Lo is the light exiting from a surface pointx in the direction ω, Le is the emitted illumination,fr is the material properties, Li is incoming light andn is the surface normal.

The incoming light can be divided into twoterms: "near" and "ambient" light, depending onhow far the light travels before it hits the surface.This means the space is divided into two parts, thenear space which is a sphere with a radius d, and theambient space which is everything outside it.

Lo(x, ω) = Le(x, ω) + Ln+∫Ω

fr(x, ω′, ω)V (x, ω)La(x, ω

′)(ω′ · n)dω′

where V (x, ω) is the visibility function, whichis 1 if a ray from x in the direction ω is unob-structed for at least a distance d, and otherwise 0.The Ln term describes the "near" incoming light,such as direct light and indirect light (bouncing from

nearby surfaces), and is described in more detail in[McGuire, 2009]. The last term describes the ambi-ent illumination of the surface point, and a commonapproximation is:

π

(∫Ω

fr(x, ω′, ω)La(x, ω

′)dω′)(

V (x, ω)(ω′·n))dω′)

The first factor of this can be precomputed withonly a small error for diffuse surfaces, thus only thesecond factor needs to be computed. The second fac-tor could be described as the accessibility of the sur-face point, i.e. a number from 0 to 1 describing howmuch ambient light reaches it. Ambient occlusionis generally defined as one minus the accessibility.Calculating this integral efficiently, with an error assmall as possible, for each surface point visible onthe screen is the primary problem that ambient oc-clusion algorithms try to solve.

3

Page 8: Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic Ambient Occlusion In Real-Time Survey of the AOV algorithm and propositions for

2 AMBIENT OCCLUSION VOLUMES

2 Ambient Occlusion Volumes

The AOV (ambient occlusion volumes) algo-rithm calculates the ambient occlusion term in therendering equation. It operates in screen space andmakes use of a deferred rendering pipeline. Wewill now explain this method briefly. For a full ex-planation, this method is thoroughly explained in[McGuire, 2009].

2.1 Deferred rendering

A deferred rendering pipeline is used to create inter-mediate geometry buffers which is of high impor-tance to the AOV algorithm. A common need, isto find the world position of a pixel in screen space,which is where the AOV algorithm operates. In thegeometry buffer, the worldspace normal is storedwhich is also needed for the AOV algorithm.

2.2 Ambient occlusion volumes

Every polygon in the scene is considered to be a po-tential occluder to a pixel. To limit the amount ofpixels to be evaluated, a volume is extruded fromevery polygon which makes up for the space inwhich pixels are considered. The volumes are be-ing extruded both horizontally and vertically fromthe polygon. This is done in the geometry shader.For each vertex in the polygon, an extension vectoris being calculated. The normal, which is computedin the geometry shader, and the extension vectorsare then used to extrude the polygon vertically andhorizontally respectively. It is important to store themagnitude to which we are extruding the volumesin order to later scale the amount of occlusion addedto a pixel according to the distance from the polygonto the pixel in world space in order to avoid hardedges on the volumes.

2.3 Algorithm

For each pixel found by rasterizing the volumes, thecorresponding pixel in worldspace is being calcu-lated using the depth value stored in the geometrybuffer. The polygon, which in this case is a triangle,belonging to the rasterized pixel is then consideredto be a potential occluder to the worldspace pixel.There are four cases which can arise from this set-ting.

1. The triangle is below the surface to the normalof the worldspace pixel

2. Two vertices in the triangle is above the surfaceto the normal of the worldspace pixel

3. One vertex in the triangle is above the surfaceto the normal of the worldspace pixel

4. All vertices are above the surface to the normalof the worldspace pixel

In the first case, no possible occlusion could bedealt to the pixel. In the second and third case, clip-ping has to be done between the triangle and the sur-face to the normal of the worldspace pixel in orderto find the part of the triangle which is above thesurface. In the forth case, the whole polygon is con-sidered.

In the final step of the algorithm, an occlusionvalue is calculated for each considered pixel. Sub-tractive blending is used in order to combine occlu-sion values. For the considered world space pixel,a hemisphere, based on the world space normal isvisualized to explain the following projections. Thetriangle is projected to the surface of the hemisphere,and further projected down to the surface of the nor-mal to the worldspace pixel. The area of the pro-jected triangle directly corresponds to the amount ofocclusion which is weighted with a value in order toavoid hard edges.

4

Page 9: Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic Ambient Occlusion In Real-Time Survey of the AOV algorithm and propositions for

3 SUGGESTED IMPROVEMENTS

3 Suggested improvements

3.1 Pre-calculated self-occlusion

If we look at the ambient term of an object, we cansplit it into two components: self-occlusion and oc-clusion from other objects. For static geometry, theself-occlusion term will never change, since it onlydepends on the geometry itself. This means thatwe should be able to pre-calculate this term andonly calculate the occlusion from other objects live.Dynamic geometry (such as, for example, animatedskinned-mesh objects) are handled as normally doneby the AOV algorithm.

Two major parts of this optimization can be iden-tified; calculating the self-occlusion efficiently andutilizing this pre-calculated self-occlusion in the pro-gram. The different stages of the algorithm is visu-alized in Figure 1.

This technique is similar to light maps[Beam, 2003], with the difference that light mapsoften refer to whole-scene pre-calculated lighting,and not only self-occlusion. Light maps may alsohandle direct illumination.

3.1.1 Algorithm outline

1. (Offline) Create occlusion maps for every ob-ject type

2. (Each frame) Create an object id buffer

3. (Each frame) Render the scene using normallighting, and use the occlusion maps to pro-vide self occlusion

4. (Each frame) Render the Ambient OcclusionVolumes on top of this, but skip pixels whichwould calculate self-occlusion

3.1.2 Efficiently pre-calculating the self-occlusion

The first part of the optimization is the pre-calculation of the self occlusion. What we get fromthis is a texture which we call the occlusion map.This texture contains values defining how muchlight, or occlusion, points on the surface of the ob-ject receives. It is important to UV-map the object insuch a way that no faces overlap in the UV-map, oth-erwise those texels occlusion value would representseveral faces occlusion value.

The occlusion map can be calculated in a lot ofways, and it is unimportant to the second part of

the algorithm exactly how they are constructed. Wehave devised one algorithm that calculates the oc-clusion map on the GPU using the AOV algorithm,which has the benefit of producing results that arevery similar to calculating the AOV self-occlusionterm live, meaning that the difference between animage with pre-calculated self-occlusion and an im-age with normal AOV is very small (as can be seenin Figure 1).

The algorithm can be outlined as follows:

1. Calculate a normal and a world position bufferin texture space (using the UV-coordinate ofthe vertex as position, and outputting the nor-mal and world position as texcoords)

2. Run the AOV algorithm on an empty buffer,but instead of outputting volumes outputquads covering the whole buffer.

The reason this works is because of how the AOValgorithm works. The AOV pixel shader has the fol-lowing input: world position and normal of pixelwe are trying to shade and the triangle that we wantto calculate how much it occludes that pixel. Theworld position and normal are simply sampled fromthe calculated buffers in stage one, and the triangle issimply output in the geometry shader (as normallydone) in the second stage.

A few points should be noted though:

• You can choose to use or omit the gp function.Omitting it is mathematically equal to sam-pling towards infinity when calculating theAO term. This might however not always bedesirable, for example inside a room model,since the room would then be rendered com-pletely black (the AOp term will be one).

• The P triangles outputted in the geometryshader can be offset by a small value alongthe mk vector to prevent "edges" of under-occlusion in the occlusion map.

• After the algorithm is done, we can run a"padding" algorithm on the texture which ba-sically fills texels that are outside the faces onthe occlusion maps with nearby texel informa-tion, similar to the "push-pull" algorithm from[Grossman and Dally, 1998]. This prevents in-correct color "bleeding" when sampling thetexture.

5

Page 10: Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic Ambient Occlusion In Real-Time Survey of the AOV algorithm and propositions for

3.2 Clipping 3 SUGGESTED IMPROVEMENTS

Figure 1: Pre-calculated self-occlusion visualizations

3.1.3 Rendering

To utilize the pre-calculated self-occlusion in the liverendering, we have to perform a few additionalsteps to the original AOV algorithm. The generalidea is to skip calculating self-occlusion on objects,and use the pre-calculated values in those places in-stead.

The first step is to calculate what we call an ob-ject id buffer. This buffer is a screen resolutionbuffer of integers, each integer representing the id ofthe object at that location. Next we clear the screenand then draw the scene normally, but with the self-occlusion maps applied to all objects. This will pro-duce an image with only self occlusion, so after thatwe render the AOV algorithm to calculate occlusionfrom other objects, but with a modification; we onlydraw a pixel if the id of object we are working withis not the same as the id in the object id buffer.

3.1.4 Performance

The early measurements of this technique was madeon a Radeon HD 3470, which is a graphics cardwith a fairly low fillrate (3.2 GPixels/s). All our testscenes were fillrate bound on this card.

We started out implementing the id buffer as aninteger texture, which we rendered to in a separatepass, and then we simply read this texture in theAOV pixel shader and discarded pixels which hadthe same id as the object that was being handled.However, this proved very slow and we only saw arough 10% performance gain compared to standardAOV.

We then implemented the id culling by writingid’s to the stencil buffer while doing the normal ren-dering, and then using the stencil buffer to cull away

the pixels with the same idea, which turned out to bemuch faster. In many of our test scenes we saw per-formance gains of around 50% (i.e. twice the framer-ate).

Measuring the overdraw we noticed a similar re-lationship; in many scenes the overdraw was halvedcompared to normal AOV, which explains the per-formance gain on this graphics card.

3.1.5 Conclusions

This technique is fairly straight forward to imple-ment, and produces almost equal (if the occlusionmap resolution is low), equal or better (the d vari-able can be arbitrarily large in the occlusion map, orgp omitted) results than the normal AOV algorithm.The occlusion map generation is fast, and can becompiled offline once and then simply loaded fromthe hard drive.

The drawbacks are that it requires a UV-mappingwith no overlapping faces (which means we eitherhave to generate those maps in the modelling soft-ware, or compute them in some way; we createdthem in the modelling software for simplicity) andthe extra memory usage of the occlusion maps. An-other concern is that the stencil buffer can only store8-bit values, which means that we can only handle255 objects at a time, more than that and we wouldhave to recalculate the stencil id buffer every 255thobject for the next 255 objects.

3.2 Clipping

When computing contributions from polygon p topixel x, one must find the parts of p that can pos-sibly shade x. If all vertices of p are below the sur-

6

Page 11: Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic Ambient Occlusion In Real-Time Survey of the AOV algorithm and propositions for

3.2 Clipping 3 SUGGESTED IMPROVEMENTS

face to the normal of x, p can not shade x. If all ver-tices lie above the surface to the normal of x, p fullyshades x. However, clipping has to be performedon p when p intersects the surface to the normal ofx. In our case, all polygons are triangles so there aretwo cases where clipping is needed. One where twovertices are above the surface to the normal of x andanother where only one vertex is above the surfaceto the normal of x.

This is a simple task but since it is run for everypixel it has to be done fast. We found two analyticaltechniques. One suggest in the paper which is de-veloped by GPU GEM, and another one developedby ourselves. We also tried to find an approximationalgorithm for solving this problem in order to trade-off visual appeal for performance.

3.2.1 GPU-GEM

This technique is suggested for use in the latest ver-sion of ambient occlusion volumes. As the problemitself, the technique is straight forward. The algo-rithm goes through a series of if-statements in orderto find the correct setting. To find the points of inter-section, this technique computes the distances fromthe vertices of the triangles to the plane. This is illus-trated in figure 2.

Figure 2: Distances d0 and d1 are calculated in orderto find the intersection point i0.

In order to find the intersection point we simplycalculate it:

i0 = (d0/(d0 − d1)) ∗ (v0 − v1)

They choose to structure their if-statements in alogical way that searches to find a setting of verticeswhich matches the actual case. The way of doingthis is the key difference between our algorithm andtheirs. Their structure looks as follows:

if(v0 is above)

if(v1 is above)

if(v2 is below)

//compute intersections from//v2 to v1//v2 to v0

else if(v2 is above)

//all above

... etc ...

3.2.2 Our method

The main difference between this method and GPUGEM’s method is how the if statements are struc-tured. In our algorithm, we first count the numberof vertices that are below the surface to the normalof x. By doing that we can split the if-statements intoblocks, each representing a number of vertices belowthe surface to the normal of x. The structure looks asfollows:

if(below == 3)//all below

if(below == 0)//all above

if(below == 2)//find which vertex is above

if(below == 1)//find which vertex is below

When there are two vertices below the surface wecheck which vertex is above to know which settingthat needs to be handled. We then compute the in-tersection points on the surface to the normal of x

7

Page 12: Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic Ambient Occlusion In Real-Time Survey of the AOV algorithm and propositions for

3.2 Clipping 3 SUGGESTED IMPROVEMENTS

and the vectors from the vertex above to the vertexbelow. When there are only one vertex above thesurface to the normal of x, we look for that particu-lar vertex and compute the intersection points in thesame manner as in the previous case.

3.2.3 Approximation methods

There are no approximation methods easily avail-able online, which is natural since the nature of theproblem itself is so simple. We developed our ownto gain some knowledge into how much visual ap-peal one would have to trade-off to boost perfor-mance. In our best implementation, we managed togain performance but the visual appeal was greatlyimpaired which ruins the soul purpose of AOV. Thealgorithm looks as follows:

Input: Polygon p, Pixelnormal NOutput: Clipped polygon p’

for each Vertex v in Polygon pif(v below plane)

move v up to planereturn true if any v is above plane

else false

What is actually being approximated is the inter-section points of the triangle and the plane. The al-gorithm always returns three vertices in differenceto the analytical version which returns a quad or atriangle dependant on the setting of vertices. Thisis a result from how the intersection points are be-ing approximated. In our algorithm we rise verticesbelow the plane until they lie in the plane. This isillustrated in figure 3.

Figure 3: This figure illustrates how the clippingis approximated by translating the triangle until allpoints are above the surface of the normal to thepixel being shaded.

The error-area shown in figure 3, is dependant ofthe distances from the vertices to the plane. It can bereduced by doing additional computations at the ex-pense of performance. Unfortunately the price is sohigh, the approximation algorithm no longer boostsperformance in comparison to the previously men-tioned methods. So the error-area is the final trade-off to gain additional performance.

3.2.4 Performance & Conclusion

The performance between our method and GPU-GEM’s varies greatly between computers and candiffer up to 20 %. We could not find any way topre-determine which algorithm to use. It is verydifficult to create an approximation algorithm thatboosts performance while not killing off the visuallooks. We were not able to develop a method thatcould out-perform GPU-GEM by a lot, but our mostsuccessful method seems to be at an advantage onmost machines.

8

Page 13: Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic Ambient Occlusion In Real-Time Survey of the AOV algorithm and propositions for

3.3 Upsampling 3 SUGGESTED IMPROVEMENTS

Figure 4: A comparison between the ground truth and sampled pictures at various quality.

3.3 Upsampling

A common way to improve performance for pixelbound methods is to reduce the size of the rendertarget for that particular method. The output is writ-ten to an intermediate low resolution buffer and thenrescaled using filtered sampling in order to recon-struct the information lost from the downsampling.

3.3.1 Bilateral filter

Many papers suggest ways to implement bilateralfilters. What usually varies is the size of the kernelsand the ways of weighing the normals, distancesand depths.

In the AOV-paper a Gaussian function is used toweigh distances, normals and depths, and a 5x5 ker-nel is used to sample from the low resolution texture.

A paper from 2008 [Lei Yang, ] takes a similar ap-proach to the implementation of the bilateral filterbut with some interesting differences. To upsam-ple the low resolution texture, they use a 2x2 ker-nel. They also choose to weigh the distances differ-ently by using a tent function rather than the typicalGaussian. Both papers use the Gaussian function toweigh normals and depths.

f(x) = ae−(x−b)2

2c2

This is the Gaussian function that is used toweigh normals and depths. The width of the Gaus-sian curve is adjusted by modulating a. It needs tobe carefully tuned in order to preserve sharp edgeswhich is usually desired for visually pleasing im-ages.

3.3.2 Implementation

To implement the suggested method in the 2008 pa-per for the AOV method, one has to create a high

resolution and a low resolution geometry buffer. Thelow resolution geometry buffer should be of the sizeas the downsampled render target.

cHi =∑cLj f(xi,xj)g(|nH

i −nLj |,θn)g(|zHi −z

Lj |,θz)∑

f(xi,xj)g(|nHi −nL

j |,θn)g(|zHi −zLj |,θz)

This is the formula which is being applied to ev-ery pixel pi. In our implementation a 2x2 grid wasused so this was run four times per pixel. xi is theposition of pi in the downsampled texture and xj isa sample in the grid. θz and θn is the weight of dif-ference in the normals and depths respectively.

The 2008 paper mentions that small values forθz and θn are desired for maintaining sharp edges.In our implementation we found that θz = 0.1 andθn = 0.1 produced pleasing results.

3.3.3 Results & Performance

The results are showing huge performance gainsfrom downsampling. This was to be expectedsince we are directly attacking the bottleneck of thismethod. When using smaller kernels we can furtheroptimize the upsampling, in particular when usingthe 2x2 instead of the 5x5 grid. The trade-off in vi-sual appearance is small, if even noticeable.

In order to get good results with a smaller kernelone has to be extra cautious about correctly tuningthe weights to the Gaussian functions. In figure 4,visual appearance are shown when using differentscales on the render target.

The downsampled scenes are rendered using a2x2 grid, weighing distances with a tent-functionand depths and normals with a Gaussian function,as suggested in the paper by Yang. The perfor-mance differs a lot between different downsamplingschemes. They are rendered in 290 ms, 105 ms, 65ms and 45 ms respectively.

9

Page 14: Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic Ambient Occlusion In Real-Time Survey of the AOV algorithm and propositions for

3.4 Pre-calculated AO and clipping function 3 SUGGESTED IMPROVEMENTS

Using the same scheme, but with a 5x5 gridthe performance is highly affected and the qualitygained is hardly noticable. The fourth picture ren-dered in 45 ms using the 2x2 while it takes the 5x5grid 83 ms.

An obvious improvement is the second picturewhich is rendered three times as fast as the full reso-lution image while not losing any visual appeal. Thethird picture which downsamples the render targetby almost 11% further increases performance by al-most 100 %, while still not producing any notewor-thy artifacts.

3.3.4 Conclusion

The 2x2 grid suggested in Yang’s paper showpromising results in visual appeals and particularlyin performance. The sampling in the 2x2 is very fastin comparison with the sampling in the 5x5.

3.4 Pre-calculated AO and clipping func-tion

Looking at the pixel shader in the standard AO al-gorithm, we may identify a few things that are po-tentially worth pre-calculating. The input to the ac-tual clipping and AO functions consists of three vari-ables; the triangle, the surface normal at the pointand the world space position of the point (equation1).

t : triangle (3 vertices = 9 floating point values)n : surface normalx : world space positionao : ambient termh : hash value

Figure 5: Definitions

clip(t, n, x)→ t′ (1)AO(t′, n, x)→ ao

Our idea was to use these values to calculate a hashvalue, which we would then use to look up the AO

value (equation 2).

hash(t, n, x)→ h (2)lookup(h)→ ao

Not only can we thus save the calculation of theAOpvalue, but we can also save the very expensive clip-ping calculations, since this can be computed in thelookup table.

3.4.1 Naïve hash function

(t, n, x) consists of a total of 15 floating point val-ues. The most simple solution would be to simplyuse these values to create a hash. If we map eachof these values to only four bits, meaning they arelimited to only 16 values, we will still have 1615 =1.1529215 ∗ 1018 values, which is obviously muchlarger than any memory can hold.

3.4.2 Using spherical coordinates to map

A lot of the information in a naïve mapping wouldhowever be redundant. First of all we can get rid ofx, and use (t − x, n) instead, since it does not mat-ter where in the world a triangle is, the calculationswill be the same. Next, we may also transform thetriangle with n, which gives us (trans(t − x, n)), atotal of nine values. Further more, we may reducethese nice values by projecting the triangle onto thesphere and use spherical coordinates to describe thetriangle (which only requires two values per ver-tex instead of three, since the radius is always onewhen the triangle is projected onto the sphere). Fi-nally, the z-rotation of the triangle is unimportantso we may describe the values in reference to oneof the vertices, leaving us with only five values(aθ, bθ, bϕ− aϕ, cθ, cϕ− aϕ) (where (a, b, c) is the ver-tices of the triangle).

If we use one byte to store each ao value, andmap each of these floating point values to four bits,then the map will be 1 ∗ 165 = 1048576byte = 1MBlarge. Since 1D textures in directx only can be of8192 texels in width, we use 3D textures, meaningwe have to do additional mapping from these fivefloating point values to three integer values.

3.4.3 Performance evaluation and Conclusions

To do an early evaluation of the performance of thistechnique, we implemented a pixel shader that per-formed the lookup in an empty texture. The pixelshader is outlined in figure 6.

10

Page 15: Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic Ambient Occlusion In Real-Time Survey of the AOV algorithm and propositions for

3.5 Hard-limiting maximum number of overdraws 3 SUGGESTED IMPROVEMENTS

1. t′ = ToSphericalCoords(|t− x|)

2. n′ = ToSphericalCoords(n)

3. t′′ = Normalize(t′ − n′)

4. h =Map5FloatTo3Integer(t′′)

5. ao = LookupInTexture(AOTexture, h)

Figure 6: Pixel shader outline

Early measurements of this provided an perfor-mance increase of around 20-30%, which we con-sider low in relation to how complex the method is.

There are however alternative hashingfunctions which may prove more efficient;[Hr?dek et al., 2003] is one example.

3.5 Hard-limiting maximum number ofoverdraws

The biggest bottleneck of the AOV algorithm is thehuge overdraw (around 10-20 times per pixel in av-erage for many of our scenes). To counter this, asimple solution is to hard-limit the number of over-draws (by using for example the stencil buffer).

Implementing this is very straight forward, but,as can be seen in Figure 7, the artifacts are verylarge even at as large limits as 32 times overdraw(for example, the chairs are under-shaded at n =32). Moreover, we could not measure any significantperformance increase at all from this optimization,which we believe is due to the fact that the algorithmis heavily fillrate bound, meaning that even thoughit might be able to skip some pixel calculations thefillrate will still bottleneck the application.

3.6 SSAO using the AOp function

As mentioned earlier, the AOV algorithm is heavilyfillrate bound. The reason it is fillrate bound is be-cause of the AO volumes used to identify pixels atriangle can potentially shade. These volumes over-lap each other a great number of times, and sincethe results are blended together all of them have tobe drawn, which produces a high overdraw count.

Our idea is to turn the relation around; insteadof using volumes to identify pixels a triangle canshade, we try to find triangles that can shade a pixelfor each pixel. This is inspired by the Screen Space

Ambient Occlusion algorithms such as the CrytekSSAO [Mittring, 2007].

The algorithm can be outlined as follows:

1. Create a triangle buffer

2. Draw a fullscreen quad with the triangle bufferas input. For each pixel find triangles from thetriangle buffer that can potentially shade it,and calculate the shading using the AOp func-tion.

3.6.1 Triangle buffer

The triangle buffer can be created by rendering thegeometry using forward-rendering, and in the ge-ometry shader appending information to each ver-tex as to which triangle it belongs to. We then outputthis information in the pixel shader, either in threetargets (containing the three vertices of the triangle)or compressing each vertex into one float value.

This operation is very fast and lightweight,though it does scale with scene complexity.

3.6.2 Computing AO

We then proceed by drawing a fullscreen quad tocompute the AO term of each pixel. For each pixelwe sample a number of triangles from the trianglebuffer. These triangles are then used in the normalAO algorithm, with the exception that we don’t needto use the gp function.

3.6.3 Advantages and Limitations

The primary advantage with this method is that itscales very well with scene complexity. The trianglebuffer creation is the only part of the algorithm thatis proportional to scene complexity.

The AO computation depends on three values;the screen resolution, the number of samples andthe size of the kernel (since a larger kernel decreasescache performance).

However, the major drawback with this methodis that it comes with a large constant factor; the pixelshader of the AO computation is rather heavy sincewe have to try to find the triangles that shade thatpixel.

We also have the problem of choosing a kernel.Several kernels were tried out, amongst them a grid,random sampling and a grid with each sample offsetwith a random number, each of these scaled in size

11

Page 16: Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic Ambient Occlusion In Real-Time Survey of the AOV algorithm and propositions for

3.7 Ambient Occlusion lookup table using circle-approximation 3 SUGGESTED IMPROVEMENTS

Figure 7: Overdraw with hard-limited number of maximum overdraws (n).

Figure 8: SSAO using the AOp function. Using a grid kernel with random offsets for each sample, scaled with thedepth of the pixel and the constant d. Notice that no matter the value of d the kernel cannot find the triangles underthe table (since we are sampling in screen space), and thus we will never have shading below the table.

by the distance of the pixel. All of these had trou-bles finding appropriate sample spots (i.e. findingthe potential triangles that could shade this pixel).

Above all, the problem is finding triangles whichare not visible in screen space (as detailed in Fig-ure 8). This could potentially be solved, at leastpartially by, for example, techniques described in[Bavoil and Sainz, 2009].

3.6.4 Conclusions

Although this technique has its advantages, thedrawbacks makes it in its current state practicallyuseless, as there exists various other SSAO tech-niques that would be preferred.

3.7 Ambient Occlusion lookup table us-ing circle-approximation

Due to the high amount of overdraw for Ambi-ent Occlusion Volumes, reducing the complexity ofcomputations in the pixel shader is essential in or-der to reach good frame rates. This approximationtechnique utilizes pre-calculated values in order toremove the occlusion calculations from the pixelshader.

3.7.1 Main idea

The main idea behind this algorithm is to replace thehemisphere-projected triangles found in the pixelshader-stage of the original algorithm with circles ofthe same area. By doing this the work that needsto be done each frame is reduced to finding the cir-cle approximation for each triangle processed in the

12

Page 17: Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic Ambient Occlusion In Real-Time Survey of the AOV algorithm and propositions for

3.7 Ambient Occlusion lookup table using circle-approximation 3 SUGGESTED IMPROVEMENTS

pixel shader. The ambient occlusion can then befound by doing a lookup in a pre-calculated table.

The pre-calculations performed finds the Ambi-ent Occlusion values for different values of the posi-tion angle θ and the circle size angle ψ and outputsthese to a texture.

3.7.2 Circle approximation (Pixel shader)

Starting out with a triangle P in worldspace coor-dinates and a sphere around point ~x, the trianglevertices are projected onto the sphere surface withnew coordinates A, B and C. By doing this beforefinding the circle we do not have to take into accountthe angle of P in regard to the sphere. We find thecircle radius by using the triangle area

Atriangle =12 |B −A× C −A| = πr2 = Acircle

r =√( 1

2π |B −A× C −A|)

Figure 9: Circle approximation projected onto tangentplane of ~x.

By finding the radius r and the triangle’s cen-troid c, we get a circle in world space. This aloneis not very easy to work with since the AO-valueswanted needs to fit in a single texture. It is easierto work with the circle projected onto the spheresurface which can be represented using two anglesthanks to the ~x->circle vector being orthogonal tothe projected triangle plane.

φ = arctan r|c|

θ = π2 − arccos c·y

|c||y|

3.7.3 Finding occluded surface area (Pre-calculated)

Given the sphere projected circle S(C) with circlesize angle φ we can find the amount of occlusioncast by the approximation circle by projecting thespherical segment surface onto the tangent plane of~x. We thus eliminate the cosine weighted solid angleby changing the integration domain.

AOC = 1π

∫S(C)

(ω · n) dω = 1π

∫T (S(C))

1 d~x

Finding the occlusion from here comes down tofinding the projected area in the tangent plane andcompare it to the unit circle area.

3.7.4 Performance

Running this algorithm showed time reductions of20 − 60% depending on scene complexity and themagnitude of the maximum obscurance distance δ.As expected, the difference in performance is greaterfor more complex scenes and bigger δ-values due toa faster pixel shader program.

3.7.5 Conclusions

Figure 10: The results for this technique did not come outvery well.

Although the speed increase of this algorithm ispromising, especially for bigger scenes than the rel-atively small ones used in these tests, the visual re-sults were far from satisfactory as shown in figure

13

Page 18: Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic Ambient Occlusion In Real-Time Survey of the AOV algorithm and propositions for

3.8 Pre-calculated Volumes 3 SUGGESTED IMPROVEMENTS

10. The main purpose of the Ambient Occlusion Vol-ume algorithm is to deliver results that are very closeto the ground truth and by making too hefty approx-imations this purpose is easily lost.

3.7.6 Further work

Circles are not very good at approximating trian-gles. Using eclipses instead would yield more accu-rate results but is also slightly more computationallyheavy.

One such solution would be to store two anglesthat determine the shape of the ellipse instead of oneas in the circle case. This is not the most accuratesolution as we still need another angle in order torotate the ellipse freely, but it could easily be repre-sented by a 3D-texture.

3.8 Pre-calculated Volumes

A natural proposal for making the Ambient Occlu-sion Volumes algorithm faster is to remove the ge-ometry shader stage by pre-computing the occlusionvolumes for all rigid objects on the CPU.

Performance tests showed that this had no effectat all when running on typical laptop GPUs with lowfillrates, but that it gave a significant boost in per-formance on high performance GPUs. It also scalesvery well with scene complexity if the applicationcan handle the increased memory usage of storingall the extra data.

The visual quality of the images is equal to thoserendered by the original AOV algorithm since theirpixel shader function and the input it receives areequal for both algorithms.

14

Page 19: Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic Ambient Occlusion In Real-Time Survey of the AOV algorithm and propositions for

4 PERFORMANCE EVALUATION

4 Performance Evaluation

This section provides results from benchmarkruns well as a discussion about the work as a whole.

All tests in this section have been run on anNVIDIA Geforce GTX 260 GPU with an Intel Core2 DUO E8500 clocked from 3.16 to 3.80 GHz and4GB of physical memory on a Windows 7 64-bit sys-tem. The languages of choice for all implementa-tions were C# 3.5 and HLSL using DirectX 10 via theSlimDX API. The most important hardware for alltechniques is the graphics card due to the high fill-rate demands of most algorithms.

4.1 Results

All images supplied here were rendered using thebaseline algorithm. Only variants that produced re-sults similar to the baseline with no major artifactsare presented. All tests were run using a δ of 2 me-ters with 1280x720 resolution. No frustum cullingor similar optimization techniques were used. Rele-vant individual settings follow below.

• The limited overdraw method used a cap of 48overdraws per pixel

• Upsampling used the 2x2 grid mentioned in3.3.2

Each scene was tested by letting the camera flyaround the scene in a pre-determined manner so thatthe results gave a good overview of the general per-formance of the algorithm. The camera position androtation for each frame was set independent of ren-dering performance which made the algorithms runon equal terms.

The scenes presented below are:

• Baseline - The normal AOV algorithm as pre-sented by McGuire.

• Baked shadows - Pre-calculated self-occlusionas described in section 3.1.

• Upsampling (1/2) - Upsampling as describedin 3.3 using a render target texture of half thewidth and height of the final resolution.

• Upsampling (1/4) - Like Upsampling (1/2),but with quarter width and height render tar-get texture.

• Limited overdraw - Hard-limiting maximumnumber of overdraws as described in section3.5.

• Pre-calculated volumes - Pre-calculated vol-umes as described in section 3.8.

15

Page 20: Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic Ambient Occlusion In Real-Time Survey of the AOV algorithm and propositions for

4.1 Results 4 PERFORMANCE EVALUATION

4.1.1 Simple box

Figure 11: Simple box scene (14 triangles).

Technique Min. time Avg. time Max. timeBaseline 0.26 1.84 8.80Upsampling (1/4) 0.52 1.57 5.47Baked shadows 0.56 1.64 5.78Pre-calculated volumes 0.32 1.94 11.61Upsampling (1/2) 0.56 2.23 5.18Limited overdraw 0.27 2.73 9.66

Table 1: Rendering times in milliseconds when rendering the scene in figure 11.

16

Page 21: Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic Ambient Occlusion In Real-Time Survey of the AOV algorithm and propositions for

4.1 Results 4 PERFORMANCE EVALUATION

4.1.2 Multiple boxes

Figure 12: Multiple boxes scene (278 triangles).

Technique Min. time Avg. time Max. timeBaseline 0.54 23.69 47.19Upsampling (1/4) 1.18 4.04 7.83Pre-calculated volumes 0.54 7.54 24.83Upsampling (1/2) 1.27 11.2 19.59Baked shadows 0.88 20.30 39.98Limited overdraw 0.77 40.02 81.75

Table 2: Rendering times in milliseconds when rendering the scene in figure 12.

17

Page 22: Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic Ambient Occlusion In Real-Time Survey of the AOV algorithm and propositions for

4.1 Results 4 PERFORMANCE EVALUATION

4.1.3 Stairs

Figure 13: Stairs scene (120 triangles).

Technique Min. time Avg. time Max. timeBaseline 0.31 8.71 25.43Pre-calculated volumes 0.24 1.76 13.75Upsampling (1/4) 0.48 2.24 7.63Upsampling (1/2) 0.5 4.93 10.31Baked shadows 0.89 5.39 12.46Limited overdraw 0.32 12.79 40.51

Table 3: Rendering times in milliseconds when rendering the scene in figure 13.

18

Page 23: Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic Ambient Occlusion In Real-Time Survey of the AOV algorithm and propositions for

4.1 Results 4 PERFORMANCE EVALUATION

4.1.4 Indoor scene

Figure 14: Indoors scene (3678 triangles).

Technique Min. time Avg. time Max. timeBaseline 3.46 66.46 93.93Pre-calculated volumes 2.16 5.11 10.77Upsampling (1/4) 2.77 10.36 16.06Upsampling (1/2) 4.71 30.52 97.86Baked shadows 4.41 61.93 81.53Limited overdraw 3.50 130.00 163.41

Table 4: Rendering times in milliseconds when rendering the scene in figure 14.

19

Page 24: Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic Ambient Occlusion In Real-Time Survey of the AOV algorithm and propositions for

4.1 Results 4 PERFORMANCE EVALUATION

4.1.5 Sponza

Figure 15: Sponza scene (66454 triangles).

Technique Min. time Avg. time Max. timeBaseline 0.24 36.22 81.39Baked shadows - - -Upsampling (1/2) 0.45 20.63 41.86Upsampling (1/4) 0.74 13.48 26.20Limited overdraw 0.31 60.55 139.94Pre-calculated volumes - - -

Table 5: Rendering times in milliseconds when rendering the scene in figure 15.

This scene was too heavy to load for the pre-calculated shadows method and, due to a problem with themodel, the scene would unfortunately not load for the pre-calculated volumes method either.

20

Page 25: Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic Ambient Occlusion In Real-Time Survey of the AOV algorithm and propositions for

4.2 Discussion 4 PERFORMANCE EVALUATION

4.1.6 Summary

Technique Box Multiple Boxes StaircaseBaseline 0.26 / 1.84 / 8.8 0.54 / 23.69 / 47.19 0.31 / 8.71 / 25.43Baked shadows 0.56 / 1.64 / 5.78 0.88 / 20.3 / 39.98 0.89 / 5.39 / 12.46Upsampling (1/2) 0.56 / 2.23 / 5.18 1.27 / 11.20 / 19.59 0.50 / 4.93 / 10.31Upsampling (1/4) 0.52 / 1.57 / 5.47 1.18 / 4.04 / 7.83 0.48 / 2.24 / 7.63Limited overdraw 0.27 / 2.73 / 9.66 0.77 / 40.02 / 81.75 0.32 / 12.79 / 40.51Pre-calc volumes 0.32 / 1.94 / 11.61 0.54 / 7.54 / 24.83 0.24 / 1.76 / 13.75

Technique Indoor SponzaBaseline 3.46 / 66.46 / 93.93 0.24 / 36.22 / 81.39Baked shadows 4.41 / 61.93 / 81.53 -Upsampling (1/2) 4.71 / 30.52 / 97.86 0.45 / 20.63 / 41.86Upsampling (1/4) 2.77 / 10.36 / 16.06 0.74 / 13.48 / 26.20Limited overdraw 3.50 / 130.00 / 163.41 0.31 / 60.55 / 139.94Pre-calc volumes 2.16 / 5.11 / 10.77 -

Table 6: Comparison of time (milliseconds) required to run the benchmark test for the different scenes used in thisperformance evaluation.

Based on the above results we can see that allvariations perform at least slightly better than thebaseline in terms of speed except one, limited over-draw. The maximum number of allowed overdrawsneeded to be quite high which made the possiblegains from the algorithm lose out to its overheads.

Upsampling with 1/4 dimensions on the AOVrender target (1/16th of the original area) looks quitecluttered when moving around most scenes, whilethe 1/2 one fared much better. They both scalepretty well with scene complexity, which makesthem viable candidates for real applications.

An algorithm that scaled very well with scenecomplexity was pre-calculated volumes. Unlike up-sampling, however, there is no gain whatsoeverfrom running this algorithm on a GPU with a lowfillrate as this quite easily becomes the bottleneck.The resulting image is equal to the one produced bythe baseline algorithm.

Baked shadows ran slightly faster than the base-line algorithm, but there was a noticeable drop inshadow quality. This, bundled with the complexityof implementation, makes it hardly worth the effort.

4.2 Discussion

The AOV algorithm has many benefits and is cer-tainly revolutionary in its target area, accurate im-

ages at interactive rates. That being said, it still suf-fers from a few issues such as the high dependencyon fillrate and the difficulty in finding a good sizeon δ. Nevertheless, it is a good thing that it relies soheavily on the GPU since this area currently is ad-vancing at a fast pace, which might mean the algo-rithm could be used for games in a couple of years.

Being an analytic solution it ensures precise andsmooth results with little artifacts. The algorithmdoes not have any problems dealing with non-rigid objects either which makes it good for 3D-modelling, which is its main target application.

This thesis covers a variety of propositions forimprovements; some mentioned by McGuire in hispaper and some that was formed during this thesiswork. The most applicable ones of those presentedin this paper seems to be pre-calculated volumes andupsampling, both coming with their share of down-sides and both presented in the original paper. Thefirst can become a bit heavy on memory usage, whilethe second skimps on image quality.

None of the fresh attempts at improving the algo-rithm were able to make it all the way, though somewere cutting it close. It is possible that some mightend up with better results with slight modifications,but there was not enough time allotted in this thesiswork for further exploration.

21

Page 26: Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic Ambient Occlusion In Real-Time Survey of the AOV algorithm and propositions for

5 CONCLUSIONS

5 Conclusions

5.1 Future work

There are still some features and variants that are yetto be explored. Some are mentioned in the originalpaper by McGuire, but were not used in the tests ofthis report.

The best way to improve the algorithm seems tobe to limit the amount of rasterized pixels, so thatas few unnecessary computations are carried outas possible. One way make the occlusion volumestighter is to implement a dynamic δ that varies basedon the surrounding geometry and the polygon size.However, such an implementation is non-trivial.

The original paper uses a 1D-texture compen-sation map to mitigate the overshadowing effectsthat can occur in geometrically dense areas. Doingthis can increase the visual results with little perfor-mance overhead.

Another method mentioned by McGuire is torepresent the geometry with quads instead of trian-

gles. According to the paper this resulted in a notice-able speedup. Unfortunately, the OpenGL and Di-rectX geometry shaders are unable to output quadri-lateral lists which complicates the implementation.

5.2 Conclusion

This thesis examines the Ambient Occlusion Vol-umes algorithm and presents several attempts of im-proving it. No better solutions than what was pre-sented in the original paper were found, but the re-alized attempts might provide useful information orspring new ideas to others exploring the same area.

The algorithm is currently not sufficient formodern games, it fulfills the requirements of 3D-modelling which is the intended usage. Neverthe-less, due to the current fast advances in GPU perfor-mance, it is far from impossible that we will be see-ing the algorithm used in games a few years fromnow.

22

Page 27: Realistic Ambient Occlusion In Real-Timeuffe/xjobb/Joakim Carlsson-Realistic Ambien… · Realistic Ambient Occlusion In Real-Time Survey of the AOV algorithm and propositions for

REFERENCES REFERENCES

References

[Bavoil and Sainz, 2009] Bavoil, L. and Sainz, M. (2009). Multi-layer dual-resolution screen-space ambientocclusion. In SIGGRAPH ’09: SIGGRAPH 2009: Talks, pages 1–1, New York, NY, USA. ACM.

[Bavoil et al., 2008] Bavoil, L., Sainz, M., and Dimitrov, R. (2008). Image-space horizon-based ambient oc-clusion. In SIGGRAPH ’08: ACM SIGGRAPH 2008 talks, pages 1–1, New York, NY, USA. ACM.

[Beam, 2003] Beam, J. (2003). Dynamic lightmaps in opengl. http://joshbeam.com/articles/dynamic_lightmaps_in_opengl/.

[Boubekeur and Schlick, 2007] Boubekeur, T. and Schlick, C. (2007). GPU Gems 3. Addison-Wesley.

[Evans, 2006] Evans, A. (2006). Fast approximations for global illumination on dynamic scenes. In SIG-GRAPH ’06: ACM SIGGRAPH 2006 Courses, pages 153–171, New York, NY, USA. ACM.

[Grossman and Dally, 1998] Grossman, J. P. and Dally, W. J. (1998). Point sample rendering. In RenderingTechniques, pages 181–192.

[Hr?dek et al., 2003] Hr?dek, J., Kuchar, M., and Skala, V. (2003). Hash functions and triangular meshreconstruction. Computers & Geosciences, 29(6):741 – 751.

[Kontkanen and Laine, 2005] Kontkanen, J. and Laine, S. (2005). Ambient occlusion fields. In Proceedings ofACM SIGGRAPH 2005 Symposium on Interactive 3D Graphics and Games, pages 41–48. ACM Press.

[Lei Yang, ] Lei Yang, Pedro V. Sander, J. L.

[Malmer et al., 2007] Malmer, M., Malmer, F., Assarsson, U., and Holzschuch, N. (2007). Fast precomputedambient occlusion for proximity shadows.

[Matt Pharr, 2005] Matt Pharr, R. F. (2005). GPU Gems 2. Addison-Wesley Professional.

[McGuire, 2009] McGuire, M. (2009). Ambient occlusion volumes. Technical Report CSTR200901,Williamstown, MA, USA.

[Mittring, 2007] Mittring, M. (2007). Finding next gen: Cryengine 2. In SIGGRAPH ’07: ACM SIGGRAPH2007 courses, pages 97–121, New York, NY, USA. ACM.

[Reinbothe et al., 2009] Reinbothe, C., Boubekeur, T., and Alexa, M. (2009). Hybrid ambient occlusion.EUROGRAPHICS 2009 Areas Papers.

[Ritschel et al., 2009] Ritschel, T., Grosch, T., and Seidel, H.-P. (2009). Approximating dynamic global illu-mination in image space. In I3D ’09: Proceedings of the 2009 symposium on Interactive 3D graphics and games,pages 75–82, New York, NY, USA. ACM.

[Wassenius, 2005] Wassenius, C. (2005). Accelerated ambient occlusion using spatial subdivision struc-tures.

23