Computer Science Thesis Defense

GPU Ray Tracing with CUDA, by Tom Pitkin. Bill Clark, PhD; Stu Steiner, MS, PhC.

description

For my thesis, I developed and compared sequential CPU and parallel GPU implementations of a ray tracer, written in C++ and CUDA, respectively. Here are the presentation slides from my thesis defense.

Transcript of Computer Science Thesis Defense

Page 1: Computer Science Thesis Defense

GPU Ray Tracing with CUDA

BY TOM PITKIN

Bill Clark, PhD

Stu Steiner, MS, PhC


Page 2: Computer Science Thesis Defense

Objectives

Develop a sequential CPU and parallel GPU ray tracer

Illustrate the difference in rendering speed and design of a CPU and GPU ray tracer


Page 3: Computer Science Thesis Defense

Outline

Introduction to Ray Tracing

CUDA

Parallelization with CUDA / Results

Future Work

Questions


Page 4: Computer Science Thesis Defense

What is Ray Tracing?

Rendering technique used in computer graphics

Simulates the behavior of light

Can produce advanced optical effects


Page 5: Computer Science Thesis Defense

Light in the Physical World

[Figure: pinhole camera diagram with the labels Light Source, Object with Red Reflectivity, Pinhole, and Film]

Page 6: Computer Science Thesis Defense

The Virtual Camera Model

Eye Position – camera location in 3D space

Reference Point – point in 3D space where the camera is pointing

Orientation Vectors (u, v, n) – camera orientation in 3D space

Image Plane – projected plane of the camera’s field of view

[Figure: camera diagram showing the Eye Position, the Reference Point, and the orientation vectors u, v (Up Vector), and n]
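The camera parameters above can be turned into an orthonormal basis with two cross products. The following is a minimal sketch, not the thesis code: the names `Vec3`, `CameraBasis`, and `makeCameraBasis` are hypothetical, and it assumes the common convention that n points from the reference point back toward the eye and that the supplied up vector is only a hint that gets corrected.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 normalize(Vec3 a) {
    double len = std::sqrt(dot(a, a));
    return {a.x / len, a.y / len, a.z / len};
}

// Hypothetical container for the three orientation vectors.
struct CameraBasis { Vec3 u, v, n; };

// Assumed convention: n points from the reference point back toward the eye,
// u points to the camera's right, and v is the corrected up vector.
CameraBasis makeCameraBasis(Vec3 eye, Vec3 reference, Vec3 upHint) {
    Vec3 n = normalize(sub(eye, reference));
    Vec3 u = normalize(cross(upHint, n));
    Vec3 v = cross(n, u);  // already unit length because n and u are orthonormal
    return {u, v, n};
}
```

For an eye at (0, 0, 5) looking at the origin with an up hint of (0, 1, 0), this yields u = (1, 0, 0), v = (0, 1, 0), and n = (0, 0, 1).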

Page 7: Computer Science Thesis Defense

Ray Generation

Map the physical screen to the image plane

Divide the image plane into a uniform grid of pixel locations

Send a ray through the center of each pixel location

[Figure: rays leaving the Eye Position through a pixel grid on the image plane; labels: Image Plane Height, Screen Height, Image Plane Width, Screen Width, Pixel]
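The three steps above can be sketched as a single function that maps a pixel index to a ray through the pixel's center. This is an illustrative sketch under stated assumptions: the `Camera` struct and its field names are hypothetical, the pixel size is taken to be the image plane size divided by the screen resolution, row 0 is assumed to be the top of the image, and n is assumed to point away from the viewing direction.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };
struct Ray { Vec3 origin, direction; };

// Hypothetical camera description; the field names are illustrative only.
struct Camera {
    Vec3 eye;                        // eye position
    Vec3 u, v, n;                    // orthonormal basis: right, up, and "backward"
    double planeWidth, planeHeight;  // image plane size in world units
    double planeDistance;            // distance from the eye to the image plane
    int screenWidth, screenHeight;   // screen resolution in pixels
};

// Builds the ray through the center of pixel (px, py).
Ray generateRay(const Camera& cam, int px, int py) {
    // Size of one pixel cell on the image plane.
    double pixelWidth = cam.planeWidth / cam.screenWidth;
    double pixelHeight = cam.planeHeight / cam.screenHeight;

    // Offsets of the pixel center from the center of the image plane.
    double a = (px + 0.5) * pixelWidth - cam.planeWidth / 2.0;
    double b = cam.planeHeight / 2.0 - (py + 0.5) * pixelHeight;

    // Point on the plane relative to the eye: a*u + b*v - distance*n.
    Vec3 d = {a * cam.u.x + b * cam.v.x - cam.planeDistance * cam.n.x,
              a * cam.u.y + b * cam.v.y - cam.planeDistance * cam.n.y,
              a * cam.u.z + b * cam.v.z - cam.planeDistance * cam.n.z};
    double len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    return {cam.eye, {d.x / len, d.y / len, d.z / len}};
}
```

With a 3 × 3 screen, the center pixel (1, 1) produces a ray pointing straight along -n, and pixel (0, 0) produces a ray tilted up and to the left.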

Page 8: Computer Science Thesis Defense

Ray Intersection Testing

Ray – Sphere Intersection

Ray – Triangle Intersection
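As an illustration of the first test, a ray-sphere intersection can be found by substituting the ray equation into the sphere equation and solving the resulting quadratic. The sketch below uses the standard textbook formulation; the slides do not show the thesis's own routine, and the names `intersectSphere` and `tHit` are hypothetical.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Solves |origin + t*dir - center|^2 = radius^2 for the nearest t > 0.
// Returns true and writes the distance to tHit when the ray hits the sphere.
bool intersectSphere(Vec3 origin, Vec3 dir, Vec3 center, double radius, double& tHit) {
    const double eps = 1e-9;  // ignore hits at or behind the ray origin
    Vec3 oc = sub(origin, center);
    double a = dot(dir, dir);
    double b = 2.0 * dot(oc, dir);
    double c = dot(oc, oc) - radius * radius;
    double disc = b * b - 4.0 * a * c;
    if (disc < 0.0) return false;  // no real roots: the ray misses

    double s = std::sqrt(disc);
    double t0 = (-b - s) / (2.0 * a);  // nearer root
    double t1 = (-b + s) / (2.0 * a);  // farther root
    if (t0 > eps) { tHit = t0; return true; }
    if (t1 > eps) { tHit = t1; return true; }  // origin is inside the sphere
    return false;
}
```

A ray from the origin along (0, 0, -1) hits a unit sphere centered at (0, 0, -5) at t = 4.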


Page 9: Computer Science Thesis Defense

Phong Reflection Model

Ambient + Diffuse + Specular = Phong Reflection
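The Phong reflection model sums an ambient, a diffuse, and a specular term. A minimal single-channel sketch follows, assuming unit-length vectors and a light intensity of 1; the coefficient names `ka`, `kd`, `ks`, and `shininess` follow common usage and are not taken from the thesis.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { double x, y, z; };

double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// n: surface normal, l: direction to the light, v: direction to the viewer.
// All three are assumed to be unit length.
double phong(Vec3 n, Vec3 l, Vec3 v, double ka, double kd, double ks, double shininess) {
    double ambient = ka;

    double nDotL = std::max(0.0, dot(n, l));
    double diffuse = kd * nDotL;

    // Mirror direction of the light about the normal: r = 2(n.l)n - l.
    double k = 2.0 * dot(n, l);
    Vec3 r = {k * n.x - l.x, k * n.y - l.y, k * n.z - l.z};
    double rDotV = std::max(0.0, dot(r, v));
    // No highlight when the light is behind the surface.
    double specular = (nDotL > 0.0) ? ks * std::pow(rDotV, shininess) : 0.0;

    return ambient + diffuse + specular;
}
```

When the light and the viewer both sit directly along the normal, the three terms add up to ka + kd + ks; when the light is behind the surface, only the ambient term remains.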

Page 10: Computer Science Thesis Defense

Specular Reflection

Recursive Ray Tracing
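Recursive ray tracing follows a new ray from each reflective hit point in the mirror direction. A small sketch of that direction computation is below, using the standard formula r = d - 2(d·n)n; the function name `reflect` is hypothetical and the normal is assumed to be unit length.

```cpp
struct Vec3 { double x, y, z; };

double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Reflects the incoming direction d about the unit normal n.
Vec3 reflect(Vec3 d, Vec3 n) {
    double k = 2.0 * dot(d, n);
    return {d.x - k * n.x, d.y - k * n.y, d.z - k * n.z};
}
```

A ray travelling along (1, -1, 0) that hits a floor with normal (0, 1, 0) continues along (1, 1, 0).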


Page 11: Computer Science Thesis Defense

Outline

Introduction to Ray Tracing

CUDA

Parallelization with CUDA / Results

Future Work

Questions


Page 12: Computer Science Thesis Defense

What is CUDA?

Compute Unified Device Architecture (CUDA)

Parallel computing platform

Developed by Nvidia


Page 13: Computer Science Thesis Defense

Kernel Functions

Specifies the code to be executed in parallel

Single Program, Multiple Data (SPMD)


Page 14: Computer Science Thesis Defense

Kernel Execution

Grids

Blocks

Threads


Page 15: Computer Science Thesis Defense

Memory Model

Global Memory

Constant Memory

Texture Memory

Registers

Local Memory

Shared Memory


Page 16: Computer Science Thesis Defense

Outline

Introduction to Ray Tracing

CUDA

Parallelization with CUDA / Results

Future Work

Questions


Page 17: Computer Science Thesis Defense

Thread Organization

2D array of blocks

2D array of threads

Each thread represents a ray

[Figure: the Image Plane divided into a 3 × 2 grid of blocks, Block (0, 0) through Block (2, 1)]
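The mapping from a 2D grid of blocks and threads to pixels can be sketched on the host in plain C++. This is an emulation under stated assumptions, not kernel code from the thesis: `Dim2`, `gridSize`, and `pixelForThread` are hypothetical names, the 16 × 16 block size is an arbitrary choice, and the index arithmetic mirrors the usual CUDA expression `blockIdx * blockDim + threadIdx`.

```cpp
struct Dim2 { int x, y; };

// Number of blocks needed to cover the image: ceiling division per axis.
Dim2 gridSize(int imageWidth, int imageHeight, Dim2 block) {
    return {(imageWidth + block.x - 1) / block.x,
            (imageHeight + block.y - 1) / block.y};
}

struct Pixel {
    int x, y;
    bool inside;  // false for padding threads that fall outside the image
};

// Pixel (and therefore ray) handled by one thread of one block.
Pixel pixelForThread(Dim2 blockIdx, Dim2 blockDim, Dim2 threadIdx,
                     int imageWidth, int imageHeight) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    return {x, y, x < imageWidth && y < imageHeight};
}
```

A 48 × 32 image with 16 × 16 blocks needs a 3 × 2 grid, and the last thread of Block (2, 1) handles pixel (47, 31). When the image size is not a multiple of the block size, the extra threads must skip their work, which is what the `inside` flag models.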

Page 18: Computer Science Thesis Defense

Testing Environment

OS – Ubuntu Gnome Remix 13.04

CPU – Core i7-920

Core Clock – 2.66 GHz

GPU – Nvidia GTX 570

Core Clock – 742 MHz

CUDA Cores – 480

Memory Clock – 3800 MHz

Video Memory – GDDR5 1280 MB


Page 19: Computer Science Thesis Defense

Test Objects

Teapot

Surfaces: 1

Triangles: 992

Al

Surfaces: 174

Triangles: 7,124

Crocodile

Surfaces: 6

Triangles: 34,404


Page 20: Computer Science Thesis Defense

Single Kernel

[Chart: render time in milliseconds, logarithmic scale]

Crocodile (Surfaces: 6, Triangles: 34,404): Single Thread 1,617,160 ms (26.95 min); Single Kernel 5,867 ms (5.87 sec)

Al (Surfaces: 174, Triangles: 7,124): Single Thread 55,260 ms (55.26 sec); Single Kernel 411 ms (0.41 sec)

Teapot (Surfaces: 1, Triangles: 992): Single Thread 23,003 ms (23 sec); Single Kernel 160 ms (0.16 sec)

Page 21: Computer Science Thesis Defense

Kernel Complexity and Size

Driver timeout

Register Spilling


Page 22: Computer Science Thesis Defense

Replacing Recursion

Iterative Loop

Layer based stack

Layers store color values returned from rays

Final image from convex combination of layers


Page 23: Computer Science Thesis Defense

Multi-Kernel

[Chart: render time in milliseconds, logarithmic scale]

Crocodile (Surfaces: 6, Triangles: 34,404): Single Kernel (Previous Kernel) 5,867 ms (5.87 sec); Multi-Kernel 13,217 ms (13.22 sec)

Al (Surfaces: 174, Triangles: 7,124): Single Kernel (Previous Kernel) 411 ms (0.41 sec); Multi-Kernel 967 ms (0.97 sec)

Teapot (Surfaces: 1, Triangles: 992): Single Kernel (Previous Kernel) 160 ms (0.16 sec); Multi-Kernel 381 ms (0.38 sec)

Page 24: Computer Science Thesis Defense

Multi-Kernel with Single-Precision Floating Points

[Chart: render time in milliseconds, logarithmic scale]

Crocodile (Surfaces: 6, Triangles: 34,404): Multi-Kernel (Previous Kernel) 13,217 ms (13.22 sec); Multi-Kernel with Single-Precision Floating Points 1,556 ms (1.56 sec)

Al (Surfaces: 174, Triangles: 7,124): Multi-Kernel (Previous Kernel) 967 ms (0.97 sec); Multi-Kernel with Single-Precision Floating Points 118 ms (0.12 sec)

Teapot (Surfaces: 1, Triangles: 992): Multi-Kernel (Previous Kernel) 381 ms (0.38 sec); Multi-Kernel with Single-Precision Floating Points 46 ms (0.05 sec)

Page 25: Computer Science Thesis Defense

Caching Surface Data

Object’s surface data stored in shared memory

All threads in same block have access to cached surface data

Removes duplicate memory requests

Data reuse
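The effect of staging surface data once per block can be emulated on the host by counting global reads. This is a plain C++ sketch, not CUDA: the `Surface` fields and the function name `stageSurfaces` are hypothetical, the outer loop stands in for the threads of one block, and the returned vector stands in for the block's shared memory.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-surface record; the real layout is not shown in the slides.
struct Surface { float r, g, b, reflectivity; };

// Emulates one block copying the surface list into a block-local cache.
// Thread tid loads elements tid, tid + threadsPerBlock, ... so every
// element is read from "global memory" exactly once per block.
std::vector<Surface> stageSurfaces(const std::vector<Surface>& globalSurfaces,
                                   int threadsPerBlock, long& globalReads) {
    std::vector<Surface> shared(globalSurfaces.size());
    for (int tid = 0; tid < threadsPerBlock; ++tid) {  // one iteration per thread
        for (std::size_t i = tid; i < globalSurfaces.size(); i += threadsPerBlock) {
            shared[i] = globalSurfaces[i];
            ++globalReads;
        }
    }
    // On the GPU, a barrier such as __syncthreads() would go here so that
    // no thread reads the cache before every element has been written.
    return shared;
}
```

For the Al model's 174 surfaces and an assumed block of 256 threads, staging costs 174 global reads per block, compared with 256 × 174 = 44,544 if every thread fetched every surface itself.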


Page 26: Computer Science Thesis Defense

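Multi-Kernel with Surface Caching

[Chart: render time in milliseconds, logarithmic scale]

Crocodile (Surfaces: 6, Triangles: 34,404): Multi-Kernel with Single-Precision Floating Points (Previous Kernel) 1,556 ms (1.56 sec); Multi-Kernel with Surface Caching 1,007 ms (1.01 sec)

Al (Surfaces: 174, Triangles: 7,124): Multi-Kernel with Single-Precision Floating Points (Previous Kernel) 118 ms (0.12 sec); Multi-Kernel with Surface Caching 133 ms (0.13 sec)

Teapot (Surfaces: 1, Triangles: 992): Multi-Kernel with Single-Precision Floating Points (Previous Kernel) 46 ms (0.05 sec); Multi-Kernel with Surface Caching 30 ms (0.03 sec)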

Page 27: Computer Science Thesis Defense

Simplifying Mesh Data

Triangle data originally stored as three points (vertices)

Optimize data by storing triangles as one point (vertex) and two edges

Calculate edges on host before kernel call

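[Figure: a triangle with vertices (0, 0), (1, 0), and (0.5, 1), shown next to the same triangle stored as the vertex (0.5, 1) and two edges]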
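Storing one vertex and two edges pays off when the intersection test needs the edge vectors anyway. The sketch below precomputes the edges on the host and then runs the Möller–Trumbore test, which is a swapped-in, well-known algorithm chosen for illustration; the slides do not name the thesis's intersection routine, and `Triangle`, `PackedTriangle`, `pack`, and `intersect` are hypothetical names.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

struct Triangle { Vec3 a, b, c; };            // original layout: three vertices
struct PackedTriangle { Vec3 v0, e1, e2; };   // one vertex and two edges

// Done once on the host, before the kernel launch.
PackedTriangle pack(const Triangle& t) {
    return {t.a, sub(t.b, t.a), sub(t.c, t.a)};
}

// Möller–Trumbore ray-triangle test using the stored edges directly.
bool intersect(const PackedTriangle& tri, Vec3 orig, Vec3 dir, float& tHit) {
    const float eps = 1e-6f;
    Vec3 p = cross(dir, tri.e2);
    float det = dot(tri.e1, p);
    if (std::fabs(det) < eps) return false;    // ray is parallel to the triangle
    float inv = 1.0f / det;
    Vec3 s = sub(orig, tri.v0);
    float u = dot(s, p) * inv;                 // first barycentric coordinate
    if (u < 0.0f || u > 1.0f) return false;
    Vec3 q = cross(s, tri.e1);
    float v = dot(dir, q) * inv;               // second barycentric coordinate
    if (v < 0.0f || u + v > 1.0f) return false;
    float t = dot(tri.e2, q) * inv;            // distance along the ray
    if (t <= eps) return false;                // hit is behind the origin
    tHit = t;
    return true;
}
```

For the triangle in the figure, packing from the vertex (0.5, 1) gives the edges (-0.5, -1) and (0.5, -1). Precomputing them removes two vector subtractions from every ray-triangle test inside the kernel.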

Page 28: Computer Science Thesis Defense

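Multi-Kernel with Mesh Optimization

[Chart: render time in milliseconds, logarithmic scale]

Crocodile (Surfaces: 6, Triangles: 34,404): Multi-Kernel with Surface Caching (Previous Kernel) 1,007 ms (1.01 sec); Multi-Kernel with Mesh Optimization 873 ms (0.87 sec)

Al (Surfaces: 174, Triangles: 7,124): Multi-Kernel with Surface Caching (Previous Kernel) 133 ms (0.13 sec); Multi-Kernel with Mesh Optimization 127 ms (0.13 sec)

Teapot (Surfaces: 1, Triangles: 992): Multi-Kernel with Surface Caching (Previous Kernel) 30 ms (0.03 sec); Multi-Kernel with Mesh Optimization 27 ms (0.03 sec)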

Page 29: Computer Science Thesis Defense

Final Results

[Chart: render time in milliseconds, logarithmic scale]

Crocodile (Surfaces: 6, Triangles: 34,404): Single Thread 1,617,160 ms (26.95 min); Multi-Kernel with Intersection Optimization 873 ms (0.87 sec)

Al (Surfaces: 174, Triangles: 7,124): Single Thread 55,260 ms (55.26 sec); Multi-Kernel with Intersection Optimization 127 ms (0.13 sec)

Teapot (Surfaces: 1, Triangles: 992): Single Thread 23,003 ms (23 sec); Multi-Kernel with Intersection Optimization 27 ms (0.03 sec)

Page 30: Computer Science Thesis Defense

Outline

Introduction to Ray Tracing

CUDA

Parallelization with CUDA / Results

Future Work

Questions


Page 31: Computer Science Thesis Defense

Future Work

Spatial partitioning

Multiple GPUs

Optimize code for different GPUs


Page 32: Computer Science Thesis Defense

Questions?