Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of...

Real-Time Reyes: Programmable Pipelines and Research Challenges

Anjul PatneyUniversity of California, Davis

Real-Time Reyes-Style Adaptive Surface Subdivision

Anjul Patney and John D. OwensSIGGRAPH Asia 2008 (to appear)

http://graphics.idav.ucdavis.edu/publications/print_pub?pub_id=952

Cinematic rendering looks goodImage courtesy: Pixar

High geometric complexityImage courtesy: Pixar

High shading complexityImage courtesy: Pixar

Motion blur, Depth-of-fieldImage courtesy: Pixar

Complex lightingImage courtesy: Pixar

All this is slowImage courtesy: Pixar

Our Goal

• Make it faster– As much as possible in real time

• Use the GPU– Massive available parallelism– Fixed-function texture/raster units– High Programmability

Enter Reyes• Developed in 1980s, offline

• Pixel-accurate smooth surfaces

• Eye-space shading

• Stochastic sampling

• Order-independent transparency

Bound/Split

Dice (tessellate)

Displace

Compose

Sample

Vertex Shade

Geometry Shade

Rasterize

Fragment Shade

Depth-buffer

Reyes Direct3D 10

Bound/Split

Dice (tessellate)

Displace

Compose

Sample

• High-quality geometry

• Convenient surface shading

• Cinematic quality

• Well-behaved (?)

Bound/Split

Dice (tessellate)

Displace

Compose

Sample

• Input:Smooth surfaces

• Output:0.5 x 0.5 pixel quads (micropolygons)

Outline

• Reyes Subdivision – algorithm

• Subdivision on GPU – implementation

• Results

• Limitations

Outline

• Results

• Limitations

Split-Dice

• Recursively split a surface till dicing makes sense

• Uniformly sample it to form a grid of micropolygons

Bound/Cull

Diceable?

No: Split

Yes:Dice

Micropolygon

Post-split surface

Split-Dice is hard

• Split–Recursive, serial–Rapid primitive

generation/destruction

• Dice–Huge memory demand

Can we do this in parallel?

Parallel Split

A lot of independent operations

Regular computation

Analogy: A Dynamic work queue

A B C D E F G H I

A A B C EE F H

Split No ActionCull

How can we do these efficiently?

• Creating new primitives– How to dynamically allocate space?

• Culling unneeded primitives– How to avoid fragmentation?

A child primitive is offset by the queue length

Our Choice – keep it simple…

A B C D E F G H I

A B C E F H A C E

…and get rid of the holes later

A B C E F H A C E

Scan-based compact is fast!

(Sengupta ‘07)

Storage Issues for Dice

• Too many micropolygons– Cannot reject early

• Screen-space buckets (tiles)– In parallel ?– Ideal bucket size ?

Outline

• Results

• Limitations

Platform

• NVIDIA GeForce 8800 GTX– 16 SIMD Multiprocessors– 16KB shared memory– 768 MB memory (no cache)

• NVIDIA CUDA 1.1– Grids/Blocks/Threads– OpenGL interface

Implementation Details

• Input choice: Bicubic Bézier Surfaces– Only affects implementation

• View Dependent Subdivision every frame– Single CPU-GPU transfer

• Final micropolygons written to a VBO– Flat-shaded and displayed

Kernels Implemented

• Dice– Regular, symmetric, parallel– 256 threads per patch– Primitive information in shared memory

• Bound/Split– Hard to ensure efficiency

Bound/Split: Efficiency Goals

• Memory Coherence– Off-chip memory accesses must be

efficient

• Computational Efficiency– Hardware SIMD must be maximally utilized

Memory Coherence during Split

• Compact work-queue after each iteration– Primitives always contiguous in memory

• Structure-Of-Arrays representation– Primitive attributes adjacent in memory

• 99.5% of all accesses fully coalesced

SIMD utilization during split• Intra-Primitive parallelism

– Independent control points– Negligible divergence

• Vectorized Split– 16 Threads per patch

• 90.16% of all branches SIMD coherent

32 threads (1 warp)

Outline

• Results

• Limitations

Results - Killeroo

• 14426 grids

• 5 levels of subdivision

• Bound/Split: 6.99 ms• Dice: 7.21ms

• 29.69 frames/second(19.92 with 16x AA)

Killeroo model courtesy: Headus Inc.

Results - Teapot

• 4823 grids

• 11 levels of subdivision

• Bound/Split: 3.46 ms• Dice: 2.42 ms

• 60.07 frames/second(30.02 with 16x AA)

Results – Random scenes

0 5000 10000 15000 200000

Number of grids

Subdivision time proportional to number

of micropolygons

Teapot Killeroo Random Model (30 patches)

Rendering

CUDA-OpenGL trans-fer

Normal generation

Subdivision

Render time BreakdownRendering lots of

small triangles

Should ideally be zero: CUDA limitation

Results - Screen-space buckets

0 128 256 384 5120

Subdivision time Memory Usage (max.) Memory Usage (avg.)

Bucket width (pixels)

)Acceptable performance,

Small memory footprint

Outline

• Results

• Limitations

Limitations

• Can’t split and dice in parallel

• Uniform dicing

• Cracks

Limitations – Uniform dicing

Limitations - Cracks

Conclusions

• Breadth-first recursive subdivision– Suits GPUs– Works fast

• Dicing Programmable tessellation is fast– 500M micropolygons/sec

• It is time to experiment with alternate graphics pipelines

Reyes in real-time rendering

• Visually superior to polygon pipeline

• Regular, highly parallel workload

• Extremely well-studied

• Good candidate for a real-time system?

Future Work

• Cracks, Displacement mapping

• Rest of Reyes– Offline quality shading– Interactive lighting– Parallel Stochastic Sampling (Wei 2008)– A-buffer (Myers 2007)

Thanks to

• Feedback and suggestions– Per Christensen, Charles Loop, Dave Luebke,

Matt Pharr and Daniel Wexler

• Financial support– DOE Early Career PI Award– National Science Foundation– SciDAC Institute for Ultrascale Visualization

• Equipment support from NVIDIA

Teapot VideoReal-time footage

Killeroo VideoReal-time footage

BACKUP SLIDESReal-Time Reyes-Style Adaptive Surface Subdivision

CUDA Thread Structure

Image courtesy: NVIDIA CUDA Programming Guide, 1.1

CUDA Memory Architecture

Image courtesy: NVIDIA CUDA Programming Guide, 1.1

Displacement Mapping

• Fairly simple if cracks can be avoided– Displace adjacent grids together

Shading & Lighting

• Interactive preview– Lpics / Lightspeed

• Shaders– Intelligent textures– File access

• Shadows

Composite/Filter Stuff

• Stochastic sampling– 20x speedup on a GPU (Wei 2008)

• A-buffer

Special Effects

• Motion Blur and DOF

• Global Illumination

• Ambient Occlusion

Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of...

Documents

Transcript of Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of...

Ritu Patney Technology Specialist – Windows Client | Microsoft Corporation.

Ijpee030204 Reyes

Reyes Correspondence

Point Reyes National Seashore Greening Charrette · Point Reyes National Seashore Greening Charrette June 25-26, 2003 Point Reyes, California Executive Summary Point Reyes National

Parallel View-Dependent Tessellation of Catmull-Clark Surfaces · 2012. 10. 22. · Parallel View-Dependent Tessellation of Catmull-Clark Subdivision Surfaces Anjul Patney 1Mohamed

How to develop a successful Desktop Strategy (Thin, Slate, Thick, VDI... V6) Ritu Patney Optimized Desktop Specialist Microsoft Corporation.

Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney, John Owens University of California, Davis.

Fragment-Parallel Composite and Filter Anjul Patney, Stanley Tzeng, and John D. Owens University of California, Davis.

Angel L. Reyes, III - Reyes Browne Reilley Law Firm | angel@reyeslaw.com | ReyesLaw.com Angel L. Reyes, III ! Angel Reyes, III is Founder and Managing Partner of the law firm Reyes

Fehech, REYEs HEROLES, REYES HEROLES,

Parallel View-Dependent Tessellation of Catmull-Clark Subdivision Surfaces Anjul Patney 1 Mohamed S. Ebeida 2 John D. Owens 1 University of California,

1 Diana Gutierrez Reyes & Isaak Gutierrez Reyes (GUARD/P ... · 1 Diana Gutierrez Reyes & Isaak Gutierrez Reyes (GUARD/P) Case No. 10CEPR00378 Petitioner: Evelyn G. Gutierrez ...

Task Management for Irregular- Parallel Workloads … Management for Irregular-Parallel Workloads on the GPU Stanley Tzeng, Anjul Patney, and John D. Owens University of California,

Reyes León

Reyes v. Reyes

High-Quality Parallel Depth-of- Field Using Line Samples Stanley Tzeng, Anjul Patney, Andrew Davidson, Mohamed S. Ebeida, Scott A. Mitchell, John D. Owens.

Carmen Reyes

Anjul Sharma

Reyes, Jonathan

Spark Summit Presentation by Anjul Bhambhri