Computer Science Thesis Defense
GPU Ray Tracing with CUDA
By Tom Pitkin
Bill Clark, PhD
Stu Steiner, MS, PhC
1
Objectives
Develop a sequential CPU and parallel GPU ray tracer
Illustrate the differences in rendering speed and design between a CPU and a GPU ray tracer
2
Outline
Introduction to Ray Tracing
CUDA
Parallelization with CUDA / Results
Future Work
Questions
3
What is Ray Tracing?
Rendering technique used in computer graphics
Simulates the behavior of light
Can produce advanced optical effects
4
Light in the Physical World
[Figure: pinhole camera with labels Light Source, Object with Red Reflectivity, Pinhole, Film]
5
The Virtual Camera Model
Eye Position – camera location in 3D space
Reference Point – point in 3D space where the camera is pointing
Orientation Vectors (u, v, n) – camera orientation in 3D space
Image Plane – projected plane of the camera’s field of view
[Figure: virtual camera showing the Eye Position, the Reference Point, and the orientation vectors u, v (Up Vector), and n]
6
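The camera parameters listed above can be turned into an orthonormal (u, v, n) basis. The sketch below is one common construction, not necessarily the one used in the thesis: it assumes n points from the reference point back toward the eye and that an approximate up vector is supplied. The names Vec3, Camera, makeCamera, and upHint are hypothetical.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

Vec3 normalize(Vec3 a) {
    float len = std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z);
    return {a.x / len, a.y / len, a.z / len};
}

struct Camera { Vec3 eye, u, v, n; };

// Assumed convention: n points from the reference point toward the eye,
// u points to the camera's right, and v is the corrected up vector.
Camera makeCamera(Vec3 eye, Vec3 reference, Vec3 upHint) {
    Vec3 n = normalize(sub(eye, reference));
    Vec3 u = normalize(cross(upHint, n));
    Vec3 v = cross(n, u);  // already unit length because n and u are orthonormal
    return {eye, u, v, n};
}
```

For an eye at (0, 0, 5) looking at the origin with an up hint of (0, 1, 0), this yields u = (1, 0, 0), v = (0, 1, 0), and n = (0, 0, 1).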
Ray Generation
Map the physical screen to the image plane
Divide the image plane into a uniform grid of pixel locations
Send a ray through the center of each pixel location
Pixel Width = Image Plane Width / Screen Width
Pixel Height = Image Plane Height / Screen Height
[Figure: a ray from the Eye Position through a Pixel on the image plane]
7
Ray Intersection Testing
Ray – Sphere Intersection
Ray – Triangle Intersection
8
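As an illustration of ray-sphere intersection testing, the sketch below solves the standard quadratic for a ray with a unit-length direction. The slides do not show the thesis implementation, so the function name intersectSphere and the convention of returning -1 on a miss are assumptions.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Distance along the ray to the nearest hit in front of the origin, or -1 on a miss.
// dir must be unit length, so the quadratic reduces to t^2 + 2bt + c = 0.
float intersectSphere(Vec3 origin, Vec3 dir, Vec3 center, float radius) {
    Vec3 oc = sub(origin, center);
    float b = dot(oc, dir);
    float c = dot(oc, oc) - radius * radius;
    float disc = b * b - c;
    if (disc < 0.0f) return -1.0f;      // the ray misses the sphere
    float root = std::sqrt(disc);
    float t = -b - root;                // nearer of the two roots
    if (t < 0.0f) t = -b + root;        // the origin is inside the sphere
    return t >= 0.0f ? t : -1.0f;
}
```

A ray from the origin along (0, 0, -1) hits a unit sphere centered at (0, 0, -5) at distance 4.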
Phong Reflection Model
Ambient + Diffuse + Specular = Phong Reflection
9
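The sum of the three Phong terms can be sketched for a single light and a single color channel. The coefficient names ka, kd, ks, and shininess are hypothetical, the light intensity is assumed to be 1, and all direction vectors are assumed to be unit length.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// n = surface normal, l = direction toward the light, v = direction toward the viewer.
float phongIntensity(Vec3 n, Vec3 l, Vec3 v,
                     float ka, float kd, float ks, float shininess) {
    float ambient = ka;
    float nDotL = std::max(dot(n, l), 0.0f);
    float diffuse = kd * nDotL;
    // Reflect l about n: r = 2 (n . l) n - l
    float k = 2.0f * dot(n, l);
    Vec3 r = {k * n.x - l.x, k * n.y - l.y, k * n.z - l.z};
    // No highlight when the light is behind the surface.
    float specular = (nDotL > 0.0f)
        ? ks * std::pow(std::max(dot(r, v), 0.0f), shininess)
        : 0.0f;
    return ambient + diffuse + specular;
}
```

With the light and viewer both directly along the normal, the three terms reach their maxima and the result is ka + kd + ks.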
Specular Reflection
Recursive Ray Tracing
10
What is CUDA?
Compute Unified Device Architecture (CUDA)
Parallel computing platform
Developed by Nvidia
12
Kernel Functions
Specifies the code to be executed in parallel
Single Program, Multiple Data (SPMD)
13
Kernel Execution
Grids
Blocks
Threads
14
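The relationship between grids, blocks, and threads can be illustrated with a sequential CPU stand-in for a kernel launch. This is plain C++, not CUDA: the loops below only emulate what the GPU would run in parallel, and the names squareKernel and launchSquare are hypothetical.

```cpp
#include <vector>

// The "kernel": every logical thread runs this same function on a different
// element, which is the single program, multiple data (SPMD) pattern.
void squareKernel(int blockIdx, int blockDim, int threadIdx,
                  const std::vector<int>& in, std::vector<int>& out) {
    int i = blockIdx * blockDim + threadIdx;  // global thread index
    if (i < static_cast<int>(in.size())) {    // guard threads past the end of the data
        out[i] = in[i] * in[i];
    }
}

// Sequential emulation of a launch: visit every thread of every block in the grid.
std::vector<int> launchSquare(const std::vector<int>& in, int blockDim) {
    std::vector<int> out(in.size(), 0);
    int gridDim = (static_cast<int>(in.size()) + blockDim - 1) / blockDim;  // round up
    for (int b = 0; b < gridDim; ++b) {
        for (int t = 0; t < blockDim; ++t) {
            squareKernel(b, blockDim, t, in, out);
        }
    }
    return out;
}
```

With five elements and two threads per block, the grid has three blocks and the sixth thread is masked off by the bounds check.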
Memory Model
Global Memory
Constant Memory
Texture Memory
Registers
Local Memory
Shared Memory
15
Thread Organization
2D array of blocks
2D array of threads
Each thread represents a ray
Image Plane (block layout):
Block (0, 0)   Block (1, 0)   Block (2, 0)
Block (0, 1)   Block (1, 1)   Block (2, 1)
17
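The mapping from a 2D block and thread index to the pixel (and therefore the ray) a thread handles can be sketched as follows. The arithmetic mirrors CUDA's blockIdx * blockDim + threadIdx, but the code is plain C++ and the 16 x 16 block size used in the example is an assumption, not a figure from the slides. The names Dim2, blocksNeeded, pixelForThread, and insideImage are hypothetical.

```cpp
struct Dim2 { int x, y; };

// Blocks needed along one axis to cover `pixels` pixels (ceiling division).
int blocksNeeded(int pixels, int threadsPerBlock) {
    return (pixels + threadsPerBlock - 1) / threadsPerBlock;
}

// Pixel handled by one thread of one block.
Dim2 pixelForThread(Dim2 blockIdx, Dim2 blockDim, Dim2 threadIdx) {
    return {blockIdx.x * blockDim.x + threadIdx.x,
            blockIdx.y * blockDim.y + threadIdx.y};
}

// Threads in edge blocks may fall outside the image and should do no work.
bool insideImage(Dim2 pixel, int width, int height) {
    return pixel.x < width && pixel.y < height;
}
```

Under that assumption, a 48 x 32 pixel image needs a 3 x 2 grid of blocks, matching the layout shown in the figure.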
Testing Environment
OS – Ubuntu GNOME Remix 13.04
CPU – Core i7-920
  Core Clock – 2.66 GHz
GPU – Nvidia GTX 570
  Core Clock – 742 MHz
  CUDA Cores – 480
  Memory Clock – 3800 MHz
  Video Memory – 1280 MB GDDR5
18
Test Objects
Teapot
Surfaces: 1
Triangles: 992
Al
Surfaces: 174
Triangles: 7,124
Crocodile
Surfaces: 6
Triangles: 34,404
19
Single Kernel
Render time in milliseconds
Object (Surfaces; Triangles)   Single Thread            Single Kernel
Crocodile (6; 34,404)          1,617,160 (26.95 min)    5,867 (5.87 sec)
Al (174; 7,124)                55,260 (55.26 sec)       411 (0.41 sec)
Teapot (1; 992)                23,003 (23 sec)          160 (0.16 sec)
20
Kernel Complexity and Size
Driver timeout
Register Spilling
21
Replacing Recursion
Iterative Loop
Layer based stack
Layers store color values returned from rays
Final image from convex combination of layers
22
Multi-Kernel
Render time in milliseconds
Object (Surfaces; Triangles)   Single Kernel (Previous Kernel)   Multi-Kernel
Crocodile (6; 34,404)          5,867 (5.87 sec)                  13,217 (13.22 sec)
Al (174; 7,124)                411 (0.41 sec)                    967 (0.97 sec)
Teapot (1; 992)                160 (0.16 sec)                    381 (0.38 sec)
23
Multi-Kernel with Single-Precision Floating Point
Render time in milliseconds
Object (Surfaces; Triangles)   Multi-Kernel (Previous Kernel)   Multi-Kernel with Single-Precision Floating Point
Crocodile (6; 34,404)          13,217 (13.22 sec)               1,556 (1.56 sec)
Al (174; 7,124)                967 (0.97 sec)                   118 (0.12 sec)
Teapot (1; 992)                381 (0.38 sec)                   46 (0.05 sec)
24
Caching Surface Data
Each object’s surface data is stored in shared memory
All threads in the same block have access to the cached surface data
Removes duplicate memory requests
Data reuse
25
Multi-Kernel with Surface Caching
Render time in milliseconds
Object (Surfaces; Triangles)   Single-Precision (Previous Kernel)   Multi-Kernel with Surface Caching
Crocodile (6; 34,404)          1,556 (1.56 sec)                     1,007 (1.01 sec)
Al (174; 7,124)                118 (0.12 sec)                       133 (0.13 sec)
Teapot (1; 992)                46 (0.05 sec)                        30 (0.03 sec)
26
Simplifying Mesh Data
Triangle data originally stored as three points (vertices)
Optimize data by storing triangles as one point (vertex) and two edges
Calculate edges on host before kernel call
[Figure: example triangle with vertices (0, 0), (1, 0), and (0.5, 1)]
27
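Storing a triangle as one vertex and two edges suits intersection tests that need the edge vectors for every ray. The sketch below pairs that layout with the Möller-Trumbore algorithm, which is an assumption: the slides do not name the intersection routine used in the thesis. The names Triangle, makeTriangle, and intersectTriangle are hypothetical.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// One vertex plus two edges, computed once on the host before any kernel runs.
struct Triangle { Vec3 v0, e1, e2; };

Triangle makeTriangle(Vec3 a, Vec3 b, Vec3 c) {
    return {a, sub(b, a), sub(c, a)};
}

// Moller-Trumbore test: distance along the ray to the hit, or -1 on a miss.
float intersectTriangle(Vec3 origin, Vec3 dir, const Triangle& tri) {
    const float eps = 1e-6f;
    Vec3 p = cross(dir, tri.e2);
    float det = dot(tri.e1, p);
    if (std::fabs(det) < eps) return -1.0f;        // ray is parallel to the triangle
    float inv = 1.0f / det;
    Vec3 s = sub(origin, tri.v0);
    float u = dot(s, p) * inv;                      // first barycentric coordinate
    if (u < 0.0f || u > 1.0f) return -1.0f;
    Vec3 q = cross(s, tri.e1);
    float v = dot(dir, q) * inv;                    // second barycentric coordinate
    if (v < 0.0f || u + v > 1.0f) return -1.0f;
    float t = dot(tri.e2, q) * inv;
    return t > eps ? t : -1.0f;                     // ignore hits behind the origin
}
```

Without the precomputed edges, every ray-triangle test would repeat the two vector subtractions that makeTriangle performs once per triangle.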
Multi-Kernel with Mesh Optimization
Render time in milliseconds
Object (Surfaces; Triangles)   Surface Caching (Previous Kernel)   Multi-Kernel with Mesh Optimization
Crocodile (6; 34,404)          1,007 (1.01 sec)                    873 (0.87 sec)
Al (174; 7,124)                133 (0.13 sec)                      127 (0.13 sec)
Teapot (1; 992)                30 (0.03 sec)                       27 (0.03 sec)
28
Final Results
Render time in milliseconds
Object (Surfaces; Triangles)   Single Thread            Multi-Kernel with Intersection Optimization
Crocodile (6; 34,404)          1,617,160 (26.95 min)    873 (0.87 sec)
Al (174; 7,124)                55,260 (55.26 sec)       127 (0.13 sec)
Teapot (1; 992)                23,003 (23 sec)          27 (0.03 sec)
29
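The final timings can be expressed as speedup factors by dividing the single-thread time by the optimized GPU time. The helper name speedup is hypothetical; the inputs are the millisecond values reported in the final results.

```cpp
// Speedup factor: how many times faster the optimized run is than the baseline.
double speedup(double baselineMs, double optimizedMs) {
    return baselineMs / optimizedMs;
}
```

Using the reported figures, the crocodile renders roughly 1,852 times faster (1,617,160 / 873), Al roughly 435 times faster (55,260 / 127), and the teapot roughly 852 times faster (23,003 / 27).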
Future Work
Spatial partitioning
Multiple GPUs
Optimize code for different GPUs
31
Questions?
32