An Evaluation of Multi-Hit Ray Traversal in a BVH Using...

5
Journal of Computer Graphics Techniques Vol. 4, No. 4, 2015 http://jcgt.org An Evaluation of Multi-Hit Ray Traversal in a BVH Using Existing First-Hit/Any-Hit Kernels: Algorithm Listings and Performance Visualizations This document provides algorithm listings for multi-hit ray traversal using both pro- gressive insertion sort and post-traversal selection sort, as well as performance visu- alizations of divergence and swap counts for all eight scenes used in our evaluation. 1. Algorithm Listings As noted in the main text, hit points must be sorted to meet the ordering constraint of multi-hit ray traversal; we explore two sorting methods: progressive insertion sort during traversal and post-traversal selection sort. Progressive insertion sort. In the first approach, intersections are sorted as they are inserted into the local buffer. As outlined in Algorithm 1, the traversal stack and hit list are initialized (lines 2–3), the ray traversed (lines 4–21), and hit points inserted into the hit list (line 13). Importantly, this algorithm follows the general form of standard BVH traversal. However, when a valid intersection is found (line 11), it is inserted into the hit list in ray-order (lines 12–13). This algorithm flows naturally from a direct application of naive multi-hit traversal, under the constraints imposed by a BVH. This technique strives to maximize cache locality—hit points are likely already in cache during traversal. Specifically, if a hit point needs to be swapped, it will most often do so with its closest neighboring hit points. Thus, when a valid intersection is found, adjacent hit points will likely be valid in cache and can thus be swapped without the penalty of global memory latency. Post-traversal selection sort. In the second approach, intersections are gathered into the local buffer without regard for proper front-to-back ordering. As seen in Algo- rithm 2, the traversal process (lines 2–21) is nearly identical to that of Algorithm 1. However, a valid hit point is simply appended to the local buffer (line 13), and the entire collection is sorted after traversal is complete but before returning to the client (line 22). This approach strives to keep both traversal and sorting operations amenable to SIMD processing. During traversal, rays within a SIMD vector may stall because of 1 ISSN 2331-7418

Transcript of An Evaluation of Multi-Hit Ray Traversal in a BVH Using...

Page 1: An Evaluation of Multi-Hit Ray Traversal in a BVH Using ...jcgt.org/published/0004/04/04/extra.pdfImportantly, this algorithm follows the general form of standard BVH traversal. However,

Journal of Computer Graphics Techniques Vol. 4, No. 4, 2015 http://jcgt.org

An Evaluation of Multi-Hit Ray Traversal in a BVHUsing Existing First-Hit/Any-Hit Kernels:

Algorithm Listings and PerformanceVisualizations

This document provides algorithm listings for multi-hit ray traversal using both pro-gressive insertion sort and post-traversal selection sort, as well as performance visu-alizations of divergence and swap counts for all eight scenes used in our evaluation.

1. Algorithm Listings

As noted in the main text, hit points must be sorted to meet the ordering constraintof multi-hit ray traversal; we explore two sorting methods: progressive insertion sortduring traversal and post-traversal selection sort.

Progressive insertion sort. In the first approach, intersections are sorted as they areinserted into the local buffer. As outlined in Algorithm 1, the traversal stack and hitlist are initialized (lines 2–3), the ray traversed (lines 4–21), and hit points insertedinto the hit list (line 13).

Importantly, this algorithm follows the general form of standard BVH traversal.However, when a valid intersection is found (line 11), it is inserted into the hit list inray-order (lines 12–13). This algorithm flows naturally from a direct application ofnaive multi-hit traversal, under the constraints imposed by a BVH.

This technique strives to maximize cache locality—hit points are likely already incache during traversal. Specifically, if a hit point needs to be swapped, it will mostoften do so with its closest neighboring hit points. Thus, when a valid intersectionis found, adjacent hit points will likely be valid in cache and can thus be swappedwithout the penalty of global memory latency.

Post-traversal selection sort. In the second approach, intersections are gathered intothe local buffer without regard for proper front-to-back ordering. As seen in Algo-rithm 2, the traversal process (lines 2–21) is nearly identical to that of Algorithm 1.However, a valid hit point is simply appended to the local buffer (line 13), and theentire collection is sorted after traversal is complete but before returning to the client(line 22).

This approach strives to keep both traversal and sorting operations amenable toSIMD processing. During traversal, rays within a SIMD vector may stall because of

1 ISSN 2331-7418

Page 2: An Evaluation of Multi-Hit Ray Traversal in a BVH Using ...jcgt.org/published/0004/04/04/extra.pdfImportantly, this algorithm follows the general form of standard BVH traversal. However,

Journal of Computer Graphics TechniquesAn Evaluation of Multi-Hit Ray Traversal

Vol. 4, No. 4, 2015http://jcgt.org

Algorithm 1 Multi-hit ray traversal with progressive sort.

1: function TRAVERSE(root, ray)2: INITIALIZE(travStack, hitList)3: PUSH(travStack, root)4: while !EMPTY(travStack) do5: node← POP(travStack)6: if !INTERSECT(node, ray) then7: continue8: end if9: if ISLEAF(node) then

10: for triangle in node do11: if INTERSECT(triangle, ray) then12: hitData← (t, u, v, tID, ...)13: INSERT(hitList, hitData)14: end if15: end for16: continue17: end if18: f ar← FARCHILD(node)19: PUSH(travStack, f ar)20: node← NEARCHILD(node)21: end while22: return hitList23: end function

neighboring rays that must find and store additional hit points. The stall period is di-rectly proportional to the time required to insert hit-point data into the local buffer. Ifsorting is postponed until after traversal, the potential stall period is reduced. Further-more, divergence among rays during traversal negatively impacts sorting coherence.As shown in Section 3 of the main text, the actual sorting process is significantly morecoherent when the set of intersection points to be sorted is well-known and bounded(after ray traversal) than when this information is unknown (during ray traversal).

2. Performance Visualizations

As noted in the main text, we observe that SIMD utilization is improved by deferringsort until after traversal. Divergence of neighboring rays is an issue even with first-hit ray traversal; sorting intersections during traversal only increases the probabilitythat rays and the corresponding sorting operations will diverge, because the numberof intersections along each ray may differ. Post-traversal sort alleviates this issue and,

2

Page 3: An Evaluation of Multi-Hit Ray Traversal in a BVH Using ...jcgt.org/published/0004/04/04/extra.pdfImportantly, this algorithm follows the general form of standard BVH traversal. However,

Journal of Computer Graphics TechniquesAn Evaluation of Multi-Hit Ray Traversal

Vol. 4, No. 4, 2015http://jcgt.org

Algorithm 2 Multi-hit ray traversal with post-traversal sort.

1: function TRAVERSE(root, ray)2: INITIALIZE(travStack, hitList)3: PUSH(travStack, root)4: while !EMPTY(travStack) do5: node← POP(travStack)6: if !INTERSECT(node, ray) then7: continue8: end if9: if ISLEAF(node) then

10: for triangle in node do11: if INTERSECT(triangle, ray) then12: hitData← (t, u, v, tID, ...)13: APPEND(hitList, hitData)14: end if15: end for16: continue17: end if18: f ar← FARCHILD(node)19: PUSH(travStack, f ar)20: node← NEARCHILD(node)21: end while22: SORT(hitList)23: return hitList24: end function

as noted in Section 2 of the main text, exploits a priori knowledge of the number of hitpoints to sort. Visualizations of divergence are shown for all eight scenes in Figures 1and 2.

We also see that swap counts are reduced when using post-traversal selection sort.Dense scenes, in particular, benefit the most, as more hit points are likely to be out-of-order. While swap counts do not significantly impact overall performance, we never-theless observe a reduction in the amount of work imposed by sorting. Visualizationsof swap counts are shown for all eight scenes in Figures 1 and 2.

3

Page 4: An Evaluation of Multi-Hit Ray Traversal in a BVH Using ...jcgt.org/published/0004/04/04/extra.pdfImportantly, this algorithm follows the general form of standard BVH traversal. However,

Journal of Computer Graphics TechniquesAn Evaluation of Multi-Hit Ray Traversal

Vol. 4, No. 4, 2015http://jcgt.org

#sw

aps

dive

rgen

ce(S

oA)

refe

renc

eim

age

prog

ress

ive

post

-tra

vers

alpr

ogre

ssiv

epo

st-t

rave

rsal

Figu

re1.

Perf

orm

ance

visu

aliz

atio

ns(1

of2)

.H

eatm

apvi

sual

izat

ions

ofsw

apco

unt

and

dive

rgen

cefo

rou

rfil

ter-

base

dm

ulti-

hit

ray

trav

ersa

lte

chni

ques

usin

gth

eco

lor

scal

ein

Figu

re1

ofth

em

ain

text

.W

eob

serv

eth

atpo

st-t

rave

rsal

sele

ctio

nso

rtre

duce

sbo

thsw

apco

unta

nddi

verg

ence

inea

chof

the

eigh

tsce

nes

used

forp

erfo

rman

ceev

alua

tion.

4

Page 5: An Evaluation of Multi-Hit Ray Traversal in a BVH Using ...jcgt.org/published/0004/04/04/extra.pdfImportantly, this algorithm follows the general form of standard BVH traversal. However,

Journal of Computer Graphics TechniquesAn Evaluation of Multi-Hit Ray Traversal

Vol. 4, No. 4, 2015http://jcgt.org

#sw

aps

dive

rgen

ce(S

oA)

refe

renc

eim

age

prog

ress

ive

post

-tra

vers

alpr

ogre

ssiv

epo

st-t

rave

rsal

Figu

re2.

Per

form

ance

visu

aliz

atio

ns(2

of2)

.

5