Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.


Page 1: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Fast Parallel Construction of High-Quality Bounding Volume Hierarchies

Tero Karras, Timo Aila

Page 2: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Ray tracing comes in many flavors

Interactive apps: 1M–100M rays/frame
Architecture & design: 100M–10G rays/frame
Movie production: 10G–1T rays/frame

© Activision 2009, Game trailer by Blur Studio

Courtesy of Delta Tracing

Lucasfilm Ltd.™, Digital work by ILM

Courtesy of Columbia Pictures

NVIDIA

Courtesy of Dassault Systemes

Page 3: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Effective performance

$$\text{effective ray tracing performance} = \frac{\text{number of rays}}{\text{rendering time}}$$

Page 4: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

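Effective performance

$$\text{effective ray tracing performance} = \frac{\text{number of rays}}{\text{rendering time}}$$

$$\text{rendering time} = \text{time to build BVH} + \frac{\text{number of rays}}{\text{ray throughput}}$$

Time to build BVH: "speed"
Ray throughput: "quality"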
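The two formulas on this slide can be evaluated directly. The following is a minimal sketch; the two builder profiles are hypothetical numbers chosen only to illustrate the crossover and are not measurements from the talk.

```python
def rendering_time(num_rays, build_time_s, throughput_rays_per_s):
    """Time for one frame: BVH build plus tracing all rays."""
    return build_time_s + num_rays / throughput_rays_per_s


def effective_performance(num_rays, build_time_s, throughput_rays_per_s):
    """Rays per second once the build time is charged to the frame."""
    return num_rays / rendering_time(num_rays, build_time_s, throughput_rays_per_s)


# Hypothetical builders (illustrative values, not from the slides).
FAST_LOW_QUALITY = dict(build_time_s=0.03, throughput_rays_per_s=200e6)
SLOW_HIGH_QUALITY = dict(build_time_s=10.0, throughput_rays_per_s=400e6)

if __name__ == "__main__":
    for num_rays in (1e6, 1e9, 1e12):
        fast = effective_performance(num_rays, **FAST_LOW_QUALITY)
        slow = effective_performance(num_rays, **SLOW_HIGH_QUALITY)
        print(f"{num_rays:.0e} rays: fast builder {fast / 1e6:.1f} Mrays/s, "
              f"slow builder {slow / 1e6:.1f} Mrays/s")
```

With few rays per frame the build time dominates and the fast builder wins; as the ray count grows, effective performance approaches the ray throughput and the high-quality builder wins.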

Page 5: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Effective performance

$$\text{effective ray tracing performance} = \frac{\text{number of rays}}{\text{rendering time}}$$

$$\text{rendering time} = \text{time to build BVH} + \frac{\text{number of rays}}{\text{ray throughput}}$$

[Plot: effective performance (Mrays/s, 0–450) against number of rays per frame (1M–1T).]

Interactive apps: speed matters
Architecture & design: both matter
Movie production: quality matters

Page 6: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Effective performance

[Plot: effective performance (Mrays/s, 0–450) against number of rays per frame (1M–1T).]

SODA (2.2M tris), NVIDIA GTX Titan, diffuse rays

Page 7: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Effective performance

[Plot: effective performance (Mrays/s, 0–450) against number of rays per frame (1M–1T). Curve: SBVH [Stich et al. 2009] (CPU, 4 cores).]

Build time dominates

Page 8: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Effective performance

[Plot: effective performance (Mrays/s, 0–450) against number of rays per frame (1M–1T). Curves: HLBVH [Garanzha et al. 2011] (GPU); SBVH [Stich et al. 2009] (CPU, 4 cores).]

???

Page 9: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Effective performance

[Plot: effective performance (Mrays/s, 0–450) against number of rays per frame (1M–1T). Curves: our method (GPU); HLBVH [Garanzha et al. 2011] (GPU); SBVH [Stich et al. 2009] (CPU, 4 cores).]

Page 10: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Effective performance

[Plot: effective performance (Mrays/s, 0–450) against number of rays per frame (1M–1T).]

30M–500G rays/frame

97% of SBVH

Best quality–speed tradeoff for a wide range of applications

Page 11: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Treelet restructuring

Idea
- Build a low-quality BVH
- Optimize its node topology
- Look at multiple nodes at once

Page 12: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Treelet restructuring

Idea
- Build a low-quality BVH
- Optimize its node topology
- Look at multiple nodes at once

Treelet
- Subset of a node's descendants

Page 13: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Treelet restructuring

[Diagram labels: treelet root R, treelet leaf]

Idea
- Build a low-quality BVH
- Optimize its node topology
- Look at multiple nodes at once

Treelet
- Subset of a node's descendants

Page 14: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Treelet restructuring

[Diagram labels: treelet root R, treelet leaf, grow]

Idea
- Build a low-quality BVH
- Optimize its node topology
- Look at multiple nodes at once

Treelet
- Subset of a node's descendants
- Grow by turning leaves into internal nodes

Page 15: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Treelet restructuring

[Diagram labels: R, treelet internal node, grow]

Idea
- Build a low-quality BVH
- Optimize its node topology
- Look at multiple nodes at once

Treelet
- Subset of a node's descendants
- Grow by turning leaves into internal nodes

Page 19: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Treelet restructuring

[Diagram label: R]

Idea
- Build a low-quality BVH
- Optimize its node topology
- Look at multiple nodes at once

Treelet
- Subset of a node's descendants
- Grow by turning leaves into internal nodes
- Largest leaves → best results

Page 20: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Treelet restructuring

[Diagram: treelet rooted at R with nodes A–G; labels: treelet leaves, treelet internal nodes]

Idea
- Build a low-quality BVH
- Optimize its node topology
- Look at multiple nodes at once

Treelet
- Subset of a node's descendants
- Grow by turning leaves into internal nodes
- Largest leaves → best results

Valid binary tree in itself

Page 21: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Treelet restructuring

[Diagram: treelet rooted at R with nodes A–G; labels: actual BVH leaf, arbitrary subtree]

Idea
- Build a low-quality BVH
- Optimize its node topology
- Look at multiple nodes at once

Treelet
- Subset of a node's descendants
- Grow by turning leaves into internal nodes
- Largest leaves → best results

Valid binary tree in itself
- Leaves can represent arbitrary subtrees of the BVH
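Treelet formation as described on these slides can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the `Node` class and the name `form_treelet` are hypothetical, "largest" is assumed to mean largest surface area, and the default of 7 leaves is only a placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(eq=False)  # identity comparison, so equal-looking nodes stay distinct
class Node:
    area: float                      # surface area of the node's AABB
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None


def form_treelet(root: Node, n: int = 7) -> Tuple[List[Node], List[Node]]:
    """Grow a treelet below `root` until it has n leaves (or cannot grow).

    Returns (internal_nodes, treelet_leaves). A treelet leaf may be an actual
    BVH leaf or the root of an arbitrary subtree that is left untouched.
    """
    internal = [root]
    leaves = [root.left, root.right]
    while len(leaves) < n:
        # Only treelet leaves that are internal BVH nodes can be expanded.
        candidates = [x for x in leaves if not x.is_leaf()]
        if not candidates:
            break
        largest = max(candidates, key=lambda x: x.area)
        # Turn the largest treelet leaf into a treelet internal node.
        leaves.remove(largest)
        internal.append(largest)
        leaves.extend([largest.left, largest.right])
    return internal, leaves
```

Each expansion removes one treelet leaf and adds two, so a treelet with n leaves always has n − 1 internal nodes.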

Page 22: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Treelet restructuring

[Diagram: treelet rooted at R with nodes A–G]

Restructuring
- Construct optimal binary tree for the same set of leaves
- Replace old treelet

Page 23: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Treelet restructuring

[Diagram: treelet rooted at R with nodes A–G]

Restructuring
- Construct optimal binary tree for the same set of leaves
- Replace old treelet

Reuse the same nodes
- Update connectivity and AABBs
- New AABBs should be smaller

Page 26: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Treelet restructuring

[Diagram: restructured treelet rooted at R with nodes A–G]

Restructuring
- Construct optimal binary tree for the same set of leaves
- Replace old treelet

Reuse the same nodes
- Update connectivity and AABBs
- New AABBs should be smaller

Perfectly localized operation
- Leaves and their subtrees are kept intact
- No need to look at subtree contents

Page 27: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

27

Processing stages

Initial BVH construction

Post-processing

Optimization

Input triangles

One triangle per leaf

Page 28: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

28

Processing stages

Initial BVH construction

Post-processing

Optimization

Parallel LBVH [Karras 2012]

60-bit Morton codes for accurate spatial partitioning
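A 60-bit Morton code can be formed by quantizing each coordinate to 20 bits and interleaving the bits. The sketch below assumes coordinates already normalized to [0, 1] within the scene bounds and an x-y-z interleave order with x in the most significant position; both are assumptions for illustration, and the function names are hypothetical.

```python
BITS_PER_AXIS = 20  # 3 axes x 20 bits = 60-bit code


def expand_bits(v: int) -> int:
    """Spread the low 20 bits of v so that bit i moves to bit 3*i."""
    result = 0
    for i in range(BITS_PER_AXIS):
        result |= ((v >> i) & 1) << (3 * i)
    return result


def quantize(t: float) -> int:
    """Map t in [0, 1] to an integer grid coordinate in [0, 2^20 - 1]."""
    cells = 1 << BITS_PER_AXIS
    return min(max(int(t * cells), 0), cells - 1)


def morton60(x: float, y: float, z: float) -> int:
    """60-bit Morton code of a point with coordinates in [0, 1]."""
    return (expand_bits(quantize(x)) << 2) | (expand_bits(quantize(y)) << 1) | expand_bits(quantize(z))
```

Sorting primitives by this code orders them along a space-filling curve, and each bit of the code corresponds to one spatial median split of the scene bounds.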

Page 29: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

29

Processing stages

Initial BVH construction

Post-processing

Optimization

Parallel bottom-up traversal [Karras 2012]

Restructure multiple treelets in parallel

Page 35: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

35

Processing stages

Initial BVH construction

Post-processing

Optimization

Parallel bottom-up traversal [Karras 2012]

Strict bottom-up order → no overlap between treelets
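One common way to get a strict bottom-up order is to start one thread per leaf and let only the second thread to reach each internal node continue upward. The sketch below simulates this sequentially and is an assumption about the general technique rather than a transcription of the cited method; on a GPU the counter update would be an atomic operation. The names `bottom_up_traversal`, `parent`, and `visit` are hypothetical.

```python
def bottom_up_traversal(parent, leaves, visit):
    """Visit every internal node after both of its children are finished.

    parent: dict mapping node id -> parent id (None for the root).
    leaves: iterable of leaf node ids; each plays the role of one thread.
    visit:  callback invoked once per internal node.
    """
    arrivals = {}  # per-node counter; an atomic counter in a real GPU kernel
    for leaf in leaves:
        node = parent[leaf]
        while node is not None:
            previous = arrivals.get(node, 0)
            arrivals[node] = previous + 1
            if previous == 0:
                # First arrival: the sibling subtree is not done yet, so stop.
                break
            # Second arrival: both children are complete, process this node.
            visit(node)
            node = parent[node]
```

Because a node is processed only by the last thread to arrive, everything below it is already final, which is why treelets rooted at concurrently processed nodes cannot overlap.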

Page 36: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

36

Processing stages

Initial BVH construction

Post-processing

Rinse and repeat (3 times is plenty)

Optimization

Page 37: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

37

Processing stages

Initial BVH construction

Post-processing

Optimization

Collapse subtrees into leaf nodes

Page 38: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

38

Processing stages

Initial BVH construction

Post-processing

Optimization

Collect triangles into linear lists

Prepare them for Woop's intersection test [Woop 2004]

Page 39: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

39

Processing stages

Initial BVH construction

Post-processing

Optimization

Fast GPU ray traversal [Aila et al. 2012]

Page 40: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

40

Processing stages

Initial BVH construction

Post-processing

Optimization

Triangle splitting

Fast GPU ray traversal [Aila et al. 2012]

Page 41: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Processing stages

DRAGON (870K tris), NVIDIA GTX Titan

Stage                      No splits   30% splits
Triangle splitting         -           0.4 ms
Initial BVH construction   5.4 ms      6.6 ms
Optimization               17.0 ms     21.4 ms
Post-processing            1.2 ms      1.6 ms
Total                      23.6 ms     30.0 ms

Fast GPU ray traversal [Aila et al. 2012]

Page 42: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

42

Cost model

Surface area cost model[Goldsmith and Salmon 1987], [MacDonald and Booth 1990]

Track cost and triangle count of each subtree

Minimize SAH cost of the final BVH
- Make collapsing decisions already during optimization
- → Unified processing of leaves and internal nodes
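Tracking a cost and a triangle count per subtree makes the collapse decision a simple comparison. The sketch below assumes the usual SAH form with a node-traversal constant and a triangle-intersection constant; the constant values and the function name `subtree_cost` are illustrative assumptions, and the common normalization by the root's surface area is omitted because it does not change which option is cheaper.

```python
C_INTERNAL = 1.2  # assumed cost of visiting an internal node
C_TRIANGLE = 1.0  # assumed cost of one ray-triangle test


def subtree_cost(area, num_tris, left_cost=None, right_cost=None):
    """Return (cost, collapse) for a subtree under the SAH.

    area:      surface area of the subtree root's AABB
    num_tris:  number of triangles in the subtree
    left_cost, right_cost: already computed child costs, or None for a leaf
    collapse:  True if the subtree is cheapest as a single leaf
    """
    cost_as_leaf = C_TRIANGLE * area * num_tris
    if left_cost is None or right_cost is None:
        return cost_as_leaf, True
    cost_as_internal = C_INTERNAL * area + left_cost + right_cost
    if cost_as_leaf <= cost_as_internal:
        return cost_as_leaf, True
    return cost_as_internal, False
```

For example, a node of area 10 with two unit-area, single-triangle children is cheaper as an internal node (about 14 versus 20), whereas a small node whose children are nearly as large as itself is cheaper collapsed.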

Page 43: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

43

Optimal restructuring

Finding the optimal node topology is NP-hard
- Naive algorithm → […]
- Our approach → […]
- But it becomes very powerful as n grows
- n = 7 treelet leaves is enough for high-quality results

Use fixed-size treelets
- Constant cost per treelet
- → Linear with respect to scene size

Page 44: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

44

Optimal restructuring

Treelet size   Layouts    Quality vs. SBVH *
4              15         78%
5              105        85%
6              945        88%
7              10,395     97%
8              135,135    98%

* SODA (2.2M tris)

Layouts: number of unique ways for restructuring a given treelet
Quality: ray tracing performance after 3 rounds of optimization
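The layout counts in the table match the number of distinct binary trees over n labeled leaves, which is the double factorial (2n − 3)!!. A short check, with the hypothetical function name `num_layouts`:

```python
def num_layouts(n: int) -> int:
    """Number of distinct binary tree topologies over n labeled leaves: (2n-3)!!"""
    count = 1
    for k in range(3, 2 * n - 2, 2):  # 3, 5, ..., 2n - 3
        count *= k
    return count


if __name__ == "__main__":
    for n in range(4, 9):
        print(n, num_layouts(n))
```

The count grows by a factor of 2n − 3 with each extra leaf, which is why exhaustive enumeration of layouts becomes impractical quickly.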

Page 45: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

45

Optimal restructuring

Treelet size   Layouts    Quality vs. SBVH *
4              15         78%
5              105        85%
6              945        88%
7              10,395     97%
8              135,135    98%

* SODA (2.2M tris)

Almost the same thing as tree rotations [Kensler 2008]

Limited options during optimization → easy to get stuck in a local optimum

Varies a lot between scenes

Page 46: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

46

Optimal restructuring

Treelet size   Layouts    Quality vs. SBVH *
4              15         78%
5              105        85%
6              945        88%
7              10,395     97%
8              135,135    98%

* SODA (2.2M tris)

Can still be implemented efficiently

Surely one of these will take us forward

Consistent across scenes

Further improvement is marginal

Page 47: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

47

Algorithm

Dynamic programming
- Solve small subproblems first
- Tabulate their solutions
- Build on them to solve larger subproblems

Subproblem: What's the best node topology for a subset of the leaves?

Page 48: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

48

Algorithm

input: set of n treelet leaves
for k = 2 to n do
    for each subset of size k do
        for each way of partitioning the leaves do
            look up subtree costs
            calculate SAH cost
        end for
        record the best solution
    end for
end for
reconstruct optimal topology

Process subsets from smallest to largest

Record the optimal SAH cost for each

Page 49: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

49

Algorithm

input: set of n treelet leaves
for k = 2 to n do
    for each subset of size k do
        for each way of partitioning the leaves do
            look up subtree costs
            calculate SAH cost
        end for
        record the best solution
    end for
end for
reconstruct optimal topology

Exhaustive search: assign each leaf to left/right subtree

We already know how much the subtrees will cost

Backtrack the partitioning choices
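The pseudocode above can be sketched with subsets encoded as bitmasks. This is a simplified illustration under stated assumptions, not the authors' GPU implementation: boxes are `(min_xyz, max_xyz)` tuples, the internal-node constant `c_i` is an assumed value, leaf costs are passed in, and the collapse-to-leaf option is left out. The name `optimal_treelet` is hypothetical.

```python
def union(a, b):
    """Smallest AABB enclosing boxes a and b."""
    return (tuple(map(min, a[0], b[0])), tuple(map(max, a[1], b[1])))


def area(box):
    """Surface area of an AABB."""
    dx, dy, dz = (hi - lo for lo, hi in zip(box[0], box[1]))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def optimal_treelet(leaf_boxes, leaf_costs, c_i=1.2):
    """Return (cost, topology) of the cheapest binary tree over the leaves.

    The topology is a nested tuple of leaf indices, e.g. ((0, 2), (1, 3)).
    """
    n = len(leaf_boxes)
    size = 1 << n
    boxes = [None] * size            # AABB of every subset of leaves
    cost = [float("inf")] * size     # optimal SAH cost of every subset
    split = [0] * size               # best left partition of every subset
    for s in range(1, size):
        low = s & -s                 # lowest set bit = one leaf of the subset
        i = low.bit_length() - 1
        rest = s ^ low
        boxes[s] = leaf_boxes[i] if rest == 0 else union(boxes[rest], leaf_boxes[i])
        if rest == 0:
            cost[s] = leaf_costs[i]
    # Process subsets from smallest to largest.
    for s in sorted(range(1, size), key=lambda m: bin(m).count("1")):
        if s & (s - 1) == 0:
            continue                 # single leaf, cost already known
        low = s & -s
        rest = s ^ low
        best, best_left = float("inf"), 0
        # Try every partition; keeping `low` on the left skips mirror images.
        q = (rest - 1) & rest
        while True:
            left = low | q
            c = cost[left] + cost[s ^ left]   # look up subtree costs
            if c < best:
                best, best_left = c, left
            if q == 0:
                break
            q = (q - 1) & rest
        cost[s] = c_i * area(boxes[s]) + best
        split[s] = best_left

    def build(s):
        """Backtrack the recorded partitioning choices."""
        if s & (s - 1) == 0:
            return s.bit_length() - 1
        return (build(split[s]), build(s ^ split[s]))

    return cost[size - 1], build(size - 1)
```

Summed over all subsets, the inner loop visits on the order of 3^n (subset, partition) pairs, which stays small for fixed-size treelets.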

Page 50: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

50

Scalar vs. SIMD

Scalar processing
- Each thread processes one treelet
- Need many treelets in flight
- ✗ Spills to off-chip memory
- ✗ Doesn't scale to small scenes
- ✓ Trivial to implement

SIMD processing
- 32 threads collaborate on the same treelet
- Need few treelets in flight
- ✓ Data fits in on-chip memory
- ✓ Easy to fill the entire GPU
- ✗ Need to keep all threads busy

Parallelize over subproblems using a pre-optimized processing schedule (details in the paper)

Page 51: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

51

Scalar vs. SIMD

Scalar processing
- Each thread processes one treelet
- Need many treelets in flight
- ✗ Spills to off-chip memory
- ✗ Doesn't scale to small scenes
- ✓ Trivial to implement

SIMD processing
- 32 threads collaborate on the same treelet
- Need few treelets in flight
- ✓ Data fits in on-chip memory
- ✓ Easy to fill the entire GPU
- ✓ Possible to keep threads busy

Page 52: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

52

Quality vs. speed

Spend less effort on bottom-most nodes
- Low contribution to SAH cost
- Quick convergence

Additional parameter
- Only process subtrees that are large enough
- Trade quality for speed

Double the parameter after each round
- Significant speedup
- Negligible effect on quality

Page 53: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

53

Triangle splitting

Early Split Clipping [Ernst and Greiner 2007]
- Split triangle bounding boxes as a pre-process

[Diagram labels: large triangle; bounding box is not a good approximation; split it!]

Page 54: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

54

Triangle splitting

Early Split Clipping [Ernst and Greiner 2007]
- Split triangle bounding boxes as a pre-process

Resulting boxes provide a tighter bound

Keep going until they are small enough

Page 56: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

56

Triangle splitting

Early Split Clipping [Ernst and Greiner 2007]
- Split triangle bounding boxes as a pre-process

Treat each box as a separate primitive

Triangle itself remains the same

Page 57: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

57

Triangle splitting

Shortcomings of pre-process splitting
- Can hurt ray tracing performance
- Unpredictable memory usage
- Requires manual tuning

Improve with better heuristics
- Select good split planes
- Concentrate splits where they matter
- Use a fixed split budget

Page 58: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

58

Split plane selection

Root node partitions the scene at its spatial median

Reduce node overlap in the initial BVH

Page 59: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

59

Split plane selection

Left child

Right child

Reduce node overlap in the initial BVH

Page 60: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

60

Split plane selection

If a triangle crosses the plane...

...the bounding boxes will overlap

Use the same spatial median as a split plane

Reduce node overlap in the initial BVH

Page 61: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

61

Split plane selection

No overlap

Use the same spatial median as a split plane

Reduce node overlap in the initial BVH

Page 62: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

62

Split plane selection

Splitting one triangle does not help much

Need to split them all to get the benefits

Reduce node overlap in the initial BVH

Page 63: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

63

Split plane selection

Reduce node overlap in the initial BVH

Page 64: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

64

Split plane selection

Same reasoning holds on multiple levels

Reduce node overlap in the initial BVH

Level 0

Level 1

Level 2 Level 2

Level 3

Level 3

Page 65: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

65

Split plane selection

Look at all spatial median planes that intersect a triangle

Split it with the dominant one

Reduce node overlap in the initial BVH

[Diagram labels: spatial median planes at levels 0, 1, 2 and 3]
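Finding the dominant spatial median plane for a triangle's bounding box can be sketched by descending through the median splits of the scene bounds until one of them cuts the box. This is an illustrative sketch under assumptions: the split axis is taken to cycle x, y, z per level, and `max_level=60` is a placeholder limit; the function name `dominant_plane` is hypothetical.

```python
def dominant_plane(box_min, box_max, scene_min, scene_max, max_level=60):
    """Return (level, axis, position) of the first median plane crossing the box.

    Level 0 is the scene's own spatial median; deeper levels halve the cell
    that still contains the box. Returns None if no plane crosses the box
    within max_level levels (for example, for a degenerate box).
    """
    lo = list(scene_min)
    hi = list(scene_max)
    for level in range(max_level):
        axis = level % 3                      # assumed axis order: x, y, z, x, ...
        mid = 0.5 * (lo[axis] + hi[axis])
        if box_min[axis] < mid < box_max[axis]:
            return level, axis, mid           # lowest level = most important plane
        # The box lies entirely on one side: descend into that half.
        if box_max[axis] <= mid:
            hi[axis] = mid
        else:
            lo[axis] = mid
    return None
```

A box straddling the middle of the scene is split at level 0, while a small box tucked into a corner is only reached by a plane several levels down.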

Page 66: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

66

Algorithm

1. Allocate memory for a fixed split budget

Page 67: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

67

Algorithm

1. Allocate memory for a fixed split budget

2. Calculate a priority value for each triangle

Page 68: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

68

Algorithm

1. Allocate memory for a fixed split budget

2. Calculate a priority value for each triangle

3. Distribute the split budget among triangles
   Proportional to their priority values

Page 69: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

69

Algorithm

1. Allocate memory for a fixed split budget

2. Calculate a priority value for each triangle

3. Distribute the split budget among triangles
   Proportional to their priority values

4. Split each triangle recursively
   Distribute remaining splits according to the size of the resulting AABBs
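Step 3 can be sketched as a proportional allocation that never exceeds the budget. The rounding rule below (rounding each share down) is a simplifying assumption for illustration, and `distribute_splits` is a hypothetical name.

```python
def distribute_splits(priorities, budget):
    """Give each triangle a share of the split budget proportional to its priority.

    Shares are rounded down, so the total never exceeds the budget; a few
    splits may remain unused.
    """
    total = sum(priorities)
    if total <= 0:
        return [0] * len(priorities)
    return [int(budget * p / total) for p in priorities]
```

Because the budget is fixed up front, memory for the split primitives can be allocated once before any splitting takes place.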

Page 70: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

70

Split priority

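$$\text{priority} = \left(2^{-\text{level}} \cdot \left(A_{\text{aabb}} - A_{\text{ideal}}\right)\right)^{1/3}$$

- $2^{-\text{level}}$: Crosses an important spatial median plane?
- $A_{\text{aabb}} - A_{\text{ideal}}$: Has large potential for reducing surface area?
- Concentrate on triangles where both apply
- Exponent $1/3$: …but leave something for the rest, too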
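The priority formula translates directly into code. In this sketch `level` is the level of the dominant spatial median plane, and both areas are taken as inputs since the slide does not define how the ideal area is computed; clamping a negative difference to zero is an added assumption, and `split_priority` is a hypothetical name.

```python
def split_priority(level, aabb_area, ideal_area):
    """priority = (2^(-level) * (A_aabb - A_ideal))^(1/3)"""
    potential = max(aabb_area - ideal_area, 0.0)  # assumed clamp, avoids negative roots
    return (2.0 ** -level * potential) ** (1.0 / 3.0)
```

The cube root compresses the range of priorities, so triangles with a modest score still receive part of the split budget.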

Page 71: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

71

Results
- Compare against 4 CPU and 3 GPU builders
- 4-core i7 930, NVIDIA GTX Titan
- Average of 20 test scenes, multiple viewpoints

Page 72: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Ray tracing performance

[Bar chart: ray tracing performance (0%–140%). High-quality CPU builders: SweepSAH [MacDonald], SBVH [Stich], tree rotations [Kensler], iterative reinsertion [Bittner].]

SweepSAH = 100%

SBVH = 131%

Page 73: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Ray tracing performance

[Bar chart: ray tracing performance (0%–140%). High-quality CPU builders: SweepSAH [MacDonald], SBVH [Stich], tree rotations [Kensler], iterative reinsertion [Bittner]. Fast GPU builders: LBVH [Karras], HLBVH [Garanzha], GridSAH [Garanzha].]

Fast GPU builders: 67% – 69%

SBVH = 131%

Almost 2×

Page 74: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Ray tracing performance

[Bar chart: ray tracing performance (0%–140%). Builders: SweepSAH [MacDonald], SBVH [Stich], tree rotations [Kensler], iterative reinsertion [Bittner], LBVH [Karras], HLBVH [Garanzha], GridSAH [Garanzha], TRBVH, TRBVH + 30% split.]

Our method: TRBVH (no splits), TRBVH + 30% split (30% splits)

Page 75: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Ray tracing performance

[Bar chart: ray tracing performance (0%–140%). Builders: SweepSAH [MacDonald], SBVH [Stich], tree rotations [Kensler], iterative reinsertion [Bittner], LBVH [Karras], HLBVH [Garanzha], GridSAH [Garanzha], TRBVH, TRBVH + 30% split.]

96% of SweepSAH

91% of SBVH

Page 76: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Effective performance

[Plot: effective performance (0%–140%) against number of rays per frame (1M–1T). Curves: SweepSAH [MacDonald], SBVH [Stich], tree rotations [Kensler], iterative reinsertion [Bittner].]

Tree rotations [Kensler] and iterative reinsertion [Bittner]: not Pareto-optimal

Page 77: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Effective performance

[Plot: effective performance (0%–140%) against number of rays per frame (1M–1T). Curves: SBVH [Stich], SweepSAH [MacDonald].]

Page 78: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Effective performance

[Plot: effective performance (0%–140%) against number of rays per frame (1M–1T). Curves: SBVH [Stich], SweepSAH [MacDonald], LBVH [Karras], HLBVH [Garanzha], GridSAH [Garanzha].]

HLBVH [Garanzha] and GridSAH [Garanzha]: not Pareto-optimal

Page 79: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Effective performance

[Plot: effective performance (0%–140%) against number of rays per frame (1M–1T). Curves: SBVH [Stich], SweepSAH [MacDonald], LBVH [Karras].]

Page 80: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Effective performance

[Plot: effective performance (0%–140%) against number of rays per frame (1M–1T). Curves: SBVH [Stich], SweepSAH [MacDonald], LBVH [Karras], our method (no splits), our method (30% splits).]

Page 81: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

Effective performance

[Plot: effective performance (0%–140%) against number of rays per frame (1M–1T).]

7M–60G rays/frame → our method is the best choice

Below 7M → LBVH

Above 60G → SBVH

Page 82: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

82

Conclusion

General framework for optimizing trees
- Inherently parallel
- Approximate restructuring → larger treelets?

Practical GPU-based BVH builder
- Best choice in a large class of applications
- Adjustable quality–speed tradeoff

Will be integrated into NVIDIA OptiX

Page 83: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila.

83

Thank you

Acknowledgements
- Samuli Laine
- Jaakko Lehtinen
- Sami Liedes
- David McAllister
- Anonymous reviewers

Anat Grynberg and Greg Ward for CONFERENCE

University of Utah for FAIRY

Marko Dabrovic for SIBENIK

Ryan Vance for BUBS

Samuli Laine for HAIRBALL and VEGETATION

Guillermo Leal Laguno for SANMIGUEL

Jonathan Good for ARABIC, BABYLONIAN and ITALIAN

Stanford Computer Graphics Laboratory for ARMADILLO, BUDDHA and DRAGON

Cornell University for BAR

Georgia Institute of Technology for BLADE