Computer Science Thesis Defense

GPU Ray Tracing with CUDA

By Tom Pitkin

Bill Clark, PhD

Stu Steiner, MS, PhC


Objectives

Develop a sequential CPU and parallel GPU ray tracer

Illustrate the differences in rendering speed and design between a CPU and a GPU ray tracer


Outline

Introduction to Ray Tracing

CUDA

Parallelization with CUDA / Results

Future Work

Questions


What is Ray Tracing?

Rendering technique used in computer graphics

Simulates the behavior of light

Can produce advanced optical effects


Light in the Physical World

Figure: Light Source, Object with Red Reflectivity, Pinhole, Film

The Virtual Camera Model

Eye Position – camera location in 3D space

Reference Point – point in 3D space where the camera is pointing

Orientation Vectors (u, v, n) – camera orientation in 3D space

Image Plane – projected plane of the camera’s field of view

Figure: the orientation vectors n, u and v (Up Vector) at the Eye Position, with the camera pointing at the Reference Point
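The camera model can be sketched in a few lines. This is a minimal sketch, not the thesis implementation: it assumes the common convention in which n points from the reference point back toward the eye (so the camera looks along -n) and derives u and v from a hypothetical `upHint` vector. `Vec3`, `makeCamera`, and the `HD` macro (which expands to `__host__ __device__` only under nvcc) are illustrative names.

```cpp
#include <cassert>
#include <cmath>

// HD marks helpers as callable from host and device code when compiled with
// nvcc; a plain C++ compiler sees an empty macro.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

struct Vec3 { double x, y, z; };

HD inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
HD inline double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
HD inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
HD inline Vec3 normalize(Vec3 a) {
    double len = std::sqrt(dot(a, a));
    assert(len > 0.0);  // a zero vector has no direction
    return {a.x / len, a.y / len, a.z / len};
}

struct Camera { Vec3 eye, u, v, n; };

// Builds the orthonormal (u, v, n) basis. n points from the reference point
// back toward the eye; upHint must not be parallel to the viewing direction.
HD inline Camera makeCamera(Vec3 eye, Vec3 ref, Vec3 upHint) {
    Vec3 n = normalize(sub(eye, ref));
    Vec3 u = normalize(cross(upHint, n));
    Vec3 v = cross(n, u);  // already unit length, since n and u are orthonormal
    return {eye, u, v, n};
}
```

With the eye at (0, 0, 5), the reference point at the origin and an up hint of (0, 1, 0), the basis comes out as u = (1, 0, 0), v = (0, 1, 0), n = (0, 0, 1).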

Ray Generation

Map the physical screen to the image plane

Divide the image plane into a uniform grid of pixel locations

Send a ray through the center of each pixel location

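$\text{Pixel Height} = \dfrac{\text{Image Plane Height}}{\text{Screen Height}}$

$\text{Pixel Width} = \dfrac{\text{Image Plane Width}}{\text{Screen Width}}$

Figure: a ray from the Eye Position through one Pixel of the image plane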
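A sketch of ray generation under stated assumptions: the image plane is centered on the viewing axis at a hypothetical distance `planeDist` along -n, row 0 is the top row of the screen, and ray directions are left unnormalized. Each pixel's width and height on the image plane are the image-plane dimensions divided by the screen resolution, and the +0.5 offset sends the ray through the pixel center. `primaryRay` and its parameters are illustrative names.

```cpp
#include <cassert>
#include <cmath>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

struct Vec3 { double x, y, z; };
struct Ray { Vec3 origin, dir; };

HD inline Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
HD inline Vec3 scale(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }

// Primary ray from the eye through the center of pixel (col, row).
// u, v, n are the camera's orthonormal basis vectors.
HD inline Ray primaryRay(Vec3 eye, Vec3 u, Vec3 v, Vec3 n,
                         double planeW, double planeH, double planeDist,
                         int screenW, int screenH, int col, int row) {
    assert(screenW > 0 && screenH > 0);
    double pixelW = planeW / screenW;  // Image Plane Width / Screen Width
    double pixelH = planeH / screenH;  // Image Plane Height / Screen Height
    double x = -planeW / 2 + (col + 0.5) * pixelW;  // left edge -> pixel center
    double y = planeH / 2 - (row + 0.5) * pixelH;   // top edge -> pixel center
    Vec3 dir = add(add(scale(n, -planeDist), scale(u, x)), scale(v, y));
    return {eye, dir};
}
```

For a 2 × 2 screen mapped onto a 2 × 2 image plane at distance 1, with u, v, n along the x, y, z axes, pixel (0, 0) gets the direction (-0.5, 0.5, -1).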

Ray Intersection Testing

Ray – Sphere Intersection

Ray – Triangle Intersection
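The slides do not say which intersection tests the thesis uses. As an illustration, the sketch below pairs the standard quadratic ray–sphere test with the Möller–Trumbore ray–triangle test, a common choice that takes a triangle as one vertex plus two edges. Both return the ray parameter t of the nearest hit in front of the origin, or -1 for a miss; the tolerance `kEps` is an assumed value.

```cpp
#include <cassert>
#include <cmath>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

struct Vec3 { double x, y, z; };
struct Ray { Vec3 origin, dir; };

HD inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
HD inline double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
HD inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

constexpr double kEps = 1e-9;  // rejects hits at (or just behind) the ray origin

// Ray-sphere: solve |o + t d - c|^2 = r^2, a quadratic in t.
HD inline double intersectSphere(Ray r, Vec3 center, double radius) {
    Vec3 oc = sub(r.origin, center);
    double a = dot(r.dir, r.dir);
    double halfB = dot(oc, r.dir);
    double c = dot(oc, oc) - radius * radius;
    double disc = halfB * halfB - a * c;
    if (disc < 0.0) return -1.0;  // the ray misses the sphere
    double s = std::sqrt(disc);
    double t = (-halfB - s) / a;  // nearer root first
    if (t > kEps) return t;
    t = (-halfB + s) / a;         // origin inside the sphere: use the far root
    return t > kEps ? t : -1.0;
}

// Ray-triangle (Moller-Trumbore): v0 is one vertex, e1 = v1 - v0, e2 = v2 - v0.
HD inline double intersectTriangle(Ray r, Vec3 v0, Vec3 e1, Vec3 e2) {
    Vec3 p = cross(r.dir, e2);
    double det = dot(e1, p);
    if (std::fabs(det) < kEps) return -1.0;  // ray parallel to the triangle's plane
    double inv = 1.0 / det;
    Vec3 s = sub(r.origin, v0);
    double u = dot(s, p) * inv;              // first barycentric coordinate
    if (u < 0.0 || u > 1.0) return -1.0;
    Vec3 q = cross(s, e1);
    double v = dot(r.dir, q) * inv;          // second barycentric coordinate
    if (v < 0.0 || u + v > 1.0) return -1.0;
    double t = dot(e2, q) * inv;
    return t > kEps ? t : -1.0;
}
```

A ray from (0, 0, 5) straight down the -z axis hits the unit sphere at the origin at t = 4, and a ray from (0.25, 0.25, 1) in the same direction hits the triangle (0, 0, 0), (1, 0, 0), (0, 1, 0) at t = 1.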


Phong Reflection Model

Ambient + Diffuse + Specular = Phong Reflection
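A one-channel sketch of the Phong sum, assuming unit vectors N (surface normal), L (toward the light) and V (toward the eye). The coefficient names `ka`, `kd`, `ks` and the exponent `shininess` are conventional, not taken from the thesis; a full renderer would evaluate this per color channel and per light.

```cpp
#include <cassert>
#include <cmath>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

struct Vec3 { double x, y, z; };

HD inline double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// One color channel: ambient + diffuse + specular.
// ambientLight and lightIntensity are the incoming intensities for this channel.
HD inline double phong(Vec3 N, Vec3 L, Vec3 V, double ka, double kd, double ks,
                       double shininess, double ambientLight, double lightIntensity) {
    double ambient = ka * ambientLight;
    double nDotL = dot(N, L);
    if (nDotL <= 0.0) return ambient;  // light is behind the surface: ambient only
    double diffuse = kd * lightIntensity * nDotL;
    // Mirror L about N: R = 2 (N . L) N - L
    Vec3 R = {2.0 * nDotL * N.x - L.x, 2.0 * nDotL * N.y - L.y, 2.0 * nDotL * N.z - L.z};
    double specular = ks * lightIntensity * std::pow(std::fmax(0.0, dot(R, V)), shininess);
    return ambient + diffuse + specular;
}
```

With N = L = V, ka = 0.1, kd = 0.6, ks = 0.3 and unit intensities, the three terms are 0.1 + 0.6 + 0.3 = 1.0; moving the light behind the surface leaves only the ambient 0.1.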

Specular Reflection

Recursive Ray Tracing
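Recursive ray tracing follows a mirror-reflected ray from each hit and blends in the color it returns. To keep the sketch self-contained, the scene is replaced by a hypothetical array `localColor` holding the local color at each successive hit along one reflection chain, and the blend (1 - kr)·local + kr·reflected with reflectivity `kr` is an assumption, not the thesis's formula. `reflect` computes the direction a real tracer would cast next.

```cpp
#include <cassert>
#include <cmath>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

struct Vec3 { double x, y, z; };

HD inline double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Direction of the mirror-reflected ray: r = d - 2 (d . N) N, with N a unit normal.
HD inline Vec3 reflect(Vec3 d, Vec3 N) {
    double k = 2.0 * dot(d, N);
    return {d.x - k * N.x, d.y - k * N.y, d.z - k * N.z};
}

// Hypothetical stand-in for the scene: localColor[i] is the local color of the
// i-th hit along one chain of reflected rays (index 0 = primary hit).
// Each level keeps (1 - kr) of its own color and adds kr times the color
// returned by its reflected ray; recursion stops at maxDepth.
HD inline double traceRecursive(const double* localColor, int depth, int maxDepth, double kr) {
    assert(depth >= 0 && depth < maxDepth);
    if (depth == maxDepth - 1) return localColor[depth];  // deepest ray: no further bounce
    double reflected = traceRecursive(localColor, depth + 1, maxDepth, kr);
    return (1.0 - kr) * localColor[depth] + kr * reflected;
}
```

For local colors {1.0, 0.5, 0.0}, kr = 0.5 and three levels, the result is 0.5·1.0 + 0.5·(0.5·0.5 + 0.5·0.0) = 0.625.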



What is CUDA?

Compute Unified Device Architecture (CUDA)

Parallel computing platform

Developed by Nvidia


Kernel Functions

Specifies the code to be executed in parallel

Single Program, Multiple Data (SPMD)


Kernel Execution

Grids

Blocks

Threads
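To make the grid, block and thread hierarchy and the SPMD style concrete without a GPU, the sketch below emulates a kernel launch on the host. It is not CUDA code: `Dim3`, `launch` and `scaleKernel` are hypothetical names, and the loops run one after another what the GPU would run in parallel. The index arithmetic `blockIdx.x * blockDim.x + threadIdx.x` is the same one a real CUDA kernel uses.

```cpp
#include <cassert>
#include <vector>

// Host-side stand-in for CUDA's dim3 (two dimensions are enough here).
struct Dim3 { int x, y; };

// SPMD kernel body: every instance runs the same code on a different element,
// chosen from its block and thread indices.
void scaleKernel(Dim3 blockIdx, Dim3 blockDim, Dim3 threadIdx, float* data, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) data[i] *= s;                        // the last block may be only partly used
}

// Grid of blocks -> blocks of threads -> one kernel instance per thread.
// A real launch, kernel<<<grid, block>>>(...), runs these instances in parallel.
template <typename Kernel, typename... Args>
void launch(Dim3 grid, Dim3 block, Kernel kernel, Args... args) {
    for (int by = 0; by < grid.y; ++by)
        for (int bx = 0; bx < grid.x; ++bx)
            for (int ty = 0; ty < block.y; ++ty)
                for (int tx = 0; tx < block.x; ++tx)
                    kernel(Dim3{bx, by}, block, Dim3{tx, ty}, args...);
}
```

Launching a grid of 3 blocks with 4 threads each over 10 elements starts 12 instances; the `i < n` guard keeps the last two idle.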


Memory Model

Global Memory

Constant Memory

Texture Memory

Registers

Local Memory

Shared Memory



Thread Organization

2D array of blocks

2D array of threads

Each thread represents a ray

Figure: the Image Plane divided into a grid of blocks, Block (0, 0) through Block (2, 1)
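A host-side sketch of the thread-to-pixel mapping, assuming a hypothetical 16 × 16 block size. The grid size is rounded up with ceiling division so it covers the whole image, and threads in the partly filled edge blocks fall outside the image and do nothing. In a real kernel the same arithmetic would use CUDA's built-in `blockIdx`, `blockDim` and `threadIdx`, with grid and block sizes passed as `dim3` values in the `<<<grid, block>>>` launch; `blocksFor` and `pixelForThread` are illustrative names.

```cpp
#include <cassert>
#include <vector>

struct Dim3 { int x, y; };  // host-side stand-in for CUDA's dim3

// Ceiling division: enough blocks to cover `pixels` when each holds `threadsPerBlock`.
inline int blocksFor(int pixels, int threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return (pixels + threadsPerBlock - 1) / threadsPerBlock;
}

// Maps one thread to the pixel (and so the primary ray) it owns. Returns false
// for threads in partly filled edge blocks that fall outside the image.
inline bool pixelForThread(Dim3 blockIdx, Dim3 blockDim, Dim3 threadIdx,
                           int width, int height, int* col, int* row) {
    *col = blockIdx.x * blockDim.x + threadIdx.x;
    *row = blockIdx.y * blockDim.y + threadIdx.y;
    return *col < width && *row < height;
}
```

A 100 × 75 image needs a 7 × 5 grid of 16 × 16 blocks (112 × 80 threads), and each pixel is claimed by exactly one thread.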

Testing Environment

OS – Ubuntu GNOME Remix 13.04

CPU – Intel Core i7-920

Core Clock – 2.66 GHz

GPU – Nvidia GeForce GTX 570

Core Clock – 742 MHz

CUDA Cores – 480

Memory Clock – 3800 MHz

Video Memory – 1280 MB GDDR5


Test Objects

Teapot

Surfaces: 1

Triangles: 992

Al

Surfaces: 174

Triangles: 7,124

Crocodile

Surfaces: 6

Triangles: 34,404


Single Kernel

Render time (milliseconds, logarithmic axis)

Object (Surfaces / Triangles) | Single Thread | Single Kernel
Crocodile (6 / 34,404) | 1,617,160 (26.95 min) | 5,867 (5.87 sec)
Al (174 / 7,124) | 55,260 (55.26 sec) | 411 (0.41 sec)
Teapot (1 / 992) | 23,003 (23 sec) | 160 (0.16 sec)

Kernel Complexity and Size

Driver timeout

Register Spilling


Replacing Recursion

Iterative Loop

Layer-based stack

Layers store color values returned from rays

Final image from convex combination of layers
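A per-pixel sketch of replacing recursion with a loop and a layer stack. Assumptions: a hypothetical fixed bound `kMaxDepth`, a hypothetical `localColor` array standing in for the colors returned by successive reflected rays, and a blend in which each hit keeps (1 - kr) of its color and passes kr on to the next ray. Under that blend the weights are kr^i(1 - kr) for every layer except the last, which gets kr^(depth-1); they telescope to a sum of 1, so the final color is a convex combination of the layers. Fixed-size local arrays are used because device code cannot use `std::vector`.

```cpp
#include <cassert>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

constexpr int kMaxDepth = 8;  // hypothetical fixed bound on reflection depth

// localColor[i]: local color of the i-th hit along one chain of reflected rays
// (index 0 = primary hit). In a real kernel each iteration would trace the
// next reflected ray instead of reading an array.
HD inline double traceIterative(const double* localColor, int depth, double kr) {
    assert(depth >= 1 && depth <= kMaxDepth);
    double layers[kMaxDepth];   // layer-based stack: one stored color per bounce
    double weights[kMaxDepth];
    double carried = 1.0;       // kr^i: share of the pixel that reaches bounce i
    for (int i = 0; i < depth; ++i) {  // iterative loop instead of a recursive call
        layers[i] = localColor[i];
        weights[i] = (i == depth - 1) ? carried : carried * (1.0 - kr);
        carried *= kr;
    }
    // For 0 <= kr <= 1 the weights are nonnegative and sum to 1.
    double color = 0.0;
    for (int i = 0; i < depth; ++i) color += weights[i] * layers[i];
    return color;
}
```

With colors {1.0, 0.5, 0.0} and kr = 0.5 the weights are 0.5, 0.25 and 0.25, giving 0.625, the same value the recursion (1 - kr)·local + kr·reflected would produce.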


Multi-Kernel

Render time (milliseconds, logarithmic axis)

Object (Surfaces / Triangles) | Single Kernel (Previous Kernel) | Multi-Kernel
Crocodile (6 / 34,404) | 5,867 (5.87 sec) | 13,217 (13.22 sec)
Al (174 / 7,124) | 411 (0.41 sec) | 967 (0.97 sec)
Teapot (1 / 992) | 160 (0.16 sec) | 381 (0.38 sec)

Multi-Kernel with Single-Precision Floating Points

Render time (milliseconds, logarithmic axis)

Object (Surfaces / Triangles) | Multi-Kernel (Previous Kernel) | Multi-Kernel with Single-Precision Floating Points
Crocodile (6 / 34,404) | 13,217 (13.22 sec) | 1,556 (1.56 sec)
Al (174 / 7,124) | 967 (0.97 sec) | 118 (0.12 sec)
Teapot (1 / 992) | 381 (0.38 sec) | 46 (0.05 sec)

Caching Surface Data

Each object’s surface data is cached in shared memory

All threads in the same block have access to the cached surface data

Removes duplicate memory requests

Data reuse
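The caching pattern can be illustrated without a GPU. The sketch below is a host-side emulation, not CUDA code: it models one block whose threads all need every triangle of a surface, copies the surface tile by tile into an array standing in for `__shared__` memory, and counts global-memory reads. `TILE`, the `double` stand-in for triangle data, and summing in place of intersection testing are hypothetical simplifications.

```cpp
#include <cassert>
#include <vector>

constexpr int TILE = 64;  // hypothetical shared-buffer capacity (elements)

struct BlockStats {
    long globalReads;  // reads from the surface array ("global memory")
    double result;     // sum of all per-thread results
};

BlockStats runBlockCached(const std::vector<double>& surface, int blockSize) {
    assert(blockSize > 0);
    std::vector<double> perThread(blockSize, 0.0);
    double shared[TILE];  // stands in for a __shared__ array
    long globalReads = 0;
    const int n = static_cast<int>(surface.size());
    for (int base = 0; base < n; base += TILE) {
        const int count = (n - base < TILE) ? n - base : TILE;
        // Phase 1: cooperative load; thread t copies elements t, t + blockSize, ...
        for (int t = 0; t < blockSize; ++t)
            for (int k = t; k < count; k += blockSize) {
                shared[k] = surface[base + k];
                ++globalReads;
            }
        // On the GPU: __syncthreads() here, so the tile is complete before use.
        // Phase 2: every thread reuses the cached tile (stand-in for intersection tests).
        for (int t = 0; t < blockSize; ++t)
            for (int k = 0; k < count; ++k) perThread[t] += shared[k];
        // On the GPU: __syncthreads() again before the next tile overwrites the buffer.
    }
    double total = 0.0;
    for (double r : perThread) total += r;
    return {globalReads, total};
}
```

For 150 triangles and 32 threads the block performs 150 global reads, one per triangle, where threads reading global memory independently would perform 32 × 150 = 4,800.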


Multi-Kernel with Surface Caching

Render time (milliseconds, logarithmic axis)

Object (Surfaces / Triangles) | Multi-Kernel with Single-Precision Floating Points (Previous Kernel) | Multi-Kernel with Surface Caching
Crocodile (6 / 34,404) | 1,556 (1.56 sec) | 1,007 (1.01 sec)
Al (174 / 7,124) | 118 (0.12 sec) | 133 (0.13 sec)
Teapot (1 / 992) | 46 (0.05 sec) | 30 (0.03 sec)

Simplifying Mesh Data

Triangle data originally stored as three points (vertices)

Optimize data by storing triangles as one point (vertex) and two edges

Calculate edges on host before kernel call

Figure: example triangle with vertices (0, 0), (1, 0) and (0.5, 1)
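A sketch of the host-side preprocessing step, assuming single-precision vertices and the hypothetical types `TriVerts` and `TriEdges`. Both layouts store nine floats per triangle, so the gain is not memory size: the kernel no longer computes the two edge vectors for every ray–triangle test, which edge-based tests such as Möller–Trumbore need.

```cpp
#include <cassert>
#include <vector>

struct Vec3f { float x, y, z; };
struct TriVerts { Vec3f a, b, c; };     // original layout: three vertices
struct TriEdges { Vec3f v0, e1, e2; };  // optimized layout: one vertex and two edges

inline Vec3f sub(Vec3f p, Vec3f q) { return {p.x - q.x, p.y - q.y, p.z - q.z}; }

// Runs once on the host before the kernel launch, so the kernel no longer
// recomputes e1 = b - a and e2 = c - a for every ray-triangle test.
std::vector<TriEdges> toEdgeForm(const std::vector<TriVerts>& mesh) {
    std::vector<TriEdges> out;
    out.reserve(mesh.size());
    for (const TriVerts& t : mesh) out.push_back({t.a, sub(t.b, t.a), sub(t.c, t.a)});
    return out;
}
```

Taking z = 0, the triangle (0, 0), (1, 0), (0.5, 1) becomes v0 = (0, 0, 0), e1 = (1, 0, 0), e2 = (0.5, 1, 0).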

Multi-Kernel with Mesh Optimization

Render time (milliseconds, logarithmic axis)

Object (Surfaces / Triangles) | Multi-Kernel with Surface Caching (Previous Kernel) | Multi-Kernel with Mesh Optimization
Crocodile (6 / 34,404) | 1,007 (1.01 sec) | 873 (0.87 sec)
Al (174 / 7,124) | 133 (0.13 sec) | 127 (0.13 sec)
Teapot (1 / 992) | 30 (0.03 sec) | 27 (0.03 sec)

Final Results

Render time (milliseconds, logarithmic axis)

Object (Surfaces / Triangles) | Single Thread | Multi-Kernel with Intersection Optimization
Crocodile (6 / 34,404) | 1,617,160 (26.95 min) | 873 (0.87 sec)
Al (174 / 7,124) | 55,260 (55.26 sec) | 127 (0.13 sec)
Teapot (1 / 992) | 23,003 (23 sec) | 27 (0.03 sec)


Future Work

Spatial partitioning

Multiple GPUs

Optimize code for different GPUs


Questions?
