Enhancing and Optimizing the Render Cache


Transcript of Enhancing and Optimizing the Render Cache

Page 1: Enhancing and Optimizing the Render Cache

Enhancing and Optimizing the Render Cache

Bruce Walter

Cornell Program of Computer Graphics

George Drettakis

REVES/INRIA Sophia-Antipolis

Donald P. Greenberg

Cornell Program of Computer Graphics

Page 2: Enhancing and Optimizing the Render Cache

Background

Render Cache
• “Interactive Rendering using the Render Cache”, Rendering Workshop 1999
• Goal
- Interactive Rendering
- Exploit frame-to-frame coherence
- Decouple renderer from display framerate
- Reuse “expensive” rendering results

Page 3: Enhancing and Optimizing the Render Cache

Background

Goal: Interactive rendering

[Figure panels: Ray tracing, Path tracing]

Page 4: Enhancing and Optimizing the Render Cache

Background

Modified Visual Feedback Loop

[Diagram labels: user, application, display, image, renderer, asynchronous interface]

Page 5: Enhancing and Optimizing the Render Cache

Background

Reproject rendered points

[Figure panels: Original view, New view]
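Reprojection can be illustrated with a minimal sketch. It assumes a simple pinhole camera that only translates and always looks down the negative z axis; the `Camera` class, the `project` function, and the focal length in pixels are hypothetical names and values chosen for illustration, not taken from the Render Cache implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Camera:
    # Hypothetical pinhole camera: position only, looking down -z, no rotation.
    x: float
    y: float
    z: float
    focal: float = 256.0   # assumed focal length, in pixels
    width: int = 512
    height: int = 512


def project(cam, point):
    """Return (px, py, depth) for a world-space point, or None if not visible."""
    dx, dy, dz = point[0] - cam.x, point[1] - cam.y, point[2] - cam.z
    depth = -dz                       # distance along the viewing direction
    if depth <= 0.0:
        return None                   # behind the camera
    px = math.floor(cam.width / 2 + cam.focal * dx / depth)
    py = math.floor(cam.height / 2 - cam.focal * dy / depth)
    if 0 <= px < cam.width and 0 <= py < cam.height:
        return px, py, depth
    return None                       # outside the image


# The same cached 3D point lands on a different pixel in the new view.
point = (0.5, 0.0, -2.0)
original_view = project(Camera(0.0, 0.0, 0.0), point)
new_view = project(Camera(1.0, 0.0, 0.0), point)
```

Because the cached point stores a 3D position, moving the camera only requires repeating the cheap projection, not calling the renderer again.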

Page 6: Enhancing and Optimizing the Render Cache

Background

[Diagram labels: renderer, renderer, image, Display process, Update Points, Project/Z-Buffer, Depth Cull, Interpolate, Sampling]

Page 7: Enhancing and Optimizing the Render Cache

Background

Results after each stage

[Figure panels: Projection, Depth cull, Interpolation]

Page 8: Enhancing and Optimizing the Render Cache

Background

Sampling

[Figure panels: Displayed image, Priority image, Requested pixels]
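The slide does not say how the requested pixels are chosen from the priority image. The sketch below assumes the simplest possible rule, taking the highest-priority pixels up to a fixed budget; this is an illustrative assumption and not necessarily what the Render Cache does. The function name `request_pixels` is hypothetical.

```python
import heapq


def request_pixels(priority, budget):
    """Pick up to `budget` (x, y) coordinates with the highest nonzero priority.

    `priority` is a list of rows; larger values mean a new sample is wanted more.
    """
    candidates = (
        (value, (x, y))
        for y, row in enumerate(priority)
        for x, value in enumerate(row)
        if value > 0
    )
    best = heapq.nlargest(budget, candidates)
    return [coord for _, coord in best]


priority_image = [
    [0, 5, 0],
    [9, 0, 1],
]
requested = request_pixels(priority_image, budget=2)
```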

Page 9: Enhancing and Optimizing the Render Cache

Related Work

Faster ray engines
• Optimize and parallelize
- E.g., Wald et al.
Hardware-based display
• Mesh-based
- E.g., Tapestry, Holodeck, Tole et al.
• Texture-based
- E.g., Corrective textures

Page 10: Enhancing and Optimizing the Render Cache

Motivation

Render Cache works well
• Can enable interactive use of higher quality ray-based renderers.
… but needs improvement
• Images too small (256x256)
• Gaps often visible during camera motion
• Not fast enough in tracking shading changes

Page 11: Enhancing and Optimizing the Render Cache

Enhancements

Tiled Z-Buffer
• Better scalability and memory coherence
Larger Interpolation Prefilter
• Can fill larger gaps between points
Predictive Sampling
• Improved quality during camera motion
Point Eviction
• Faster update of shading changes

Page 12: Enhancing and Optimizing the Render Cache

Enhancements

Code Optimization
• Use of SIMD (MMX/SSE/SSE2)
• Data layout, branch conversions, etc.
Publicly Available
• For evaluation, comparison, or use
- Non-commercial binary release
- URL is in the paper

Page 13: Enhancing and Optimizing the Render Cache

Memory Coherence

Change from R10K to Pentium 4
• Cache reduced from 4MB to 256K
• Clock increased from 195MHz to 1.7GHz
- Cache misses much more expensive
Change from 256x256 to 512x512
• Point data ~ 5MB, Image data ~ 3MB
- Much bigger than cache
Projection and Z-Buffer problematic

Page 14: Enhancing and Optimizing the Render Cache

Projection and Z-Buffer

[Diagram: Point Cloud (5MB), Image (3MB)]

Random order memory access
- Read/modify/write operation is memory latency limited

Page 15: Enhancing and Optimizing the Render Cache

Tiled Projection and Z-Buffer

[Diagram: Point Cloud (5MB), Tile Buckets (4MB), Image (3MB)]

Divide image into tiles
- Tiles sized to fit in cache
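As a rough worked example of "tiles sized to fit in cache", the sketch below uses the figures quoted on the earlier Memory Coherence slide (about 3MB of image data for 512x512 pixels, a 256K cache). The power-of-two square tile shape and the helper name `largest_square_tile` are assumptions for illustration; the slides do not state the tile size actually used.

```python
def largest_square_tile(cache_bytes, bytes_per_pixel):
    """Largest power-of-two tile side whose pixel data fits in the cache."""
    side = 1
    while (2 * side) ** 2 * bytes_per_pixel <= cache_bytes:
        side *= 2
    return side


image_bytes = 3 * 2**20                            # ~3MB of image data (from the slides)
bytes_per_pixel = image_bytes // (512 * 512)       # about 12 bytes per pixel
cache_bytes = 256 * 2**10                          # 256K cache (from the slides)

tile_side = largest_square_tile(cache_bytes, bytes_per_pixel)
tile_bytes = tile_side * tile_side * bytes_per_pixel
```

Under these assumptions a 128x128 tile occupies 192K, which leaves some room in the cache for the bucket data being read while the tile is processed.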

Page 16: Enhancing and Optimizing the Render Cache

Tiled Projection and Z-Buffer

[Diagram: Point Cloud (5MB), Tile Buckets (4MB), Image (3MB)]

Project and bucket sort by tile

Page 17: Enhancing and Optimizing the Render Cache

Tiled Projection and Z-Buffer

[Diagram: Point Cloud (5MB), Tile Buckets (4MB), Image (3MB)]

Z-Buffer each tile separately

Page 18: Enhancing and Optimizing the Render Cache

Tiled Projection and Z-Buffer

[Diagram: Point Cloud (5MB), Tile Buckets (4MB), Image (3MB)]

Uses more memory and instructions
- But it is faster (25ms instead of 42ms)
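The two passes described on the preceding slides (project and bucket sort by tile, then Z-buffer each tile separately) can be sketched as follows. This is an illustration of the structure only: the function name `tiled_zbuffer`, the tuple layout, and the tile size of 128 are assumptions, and a Python version will not show the cache-related speedup reported in the slides.

```python
from collections import defaultdict


def tiled_zbuffer(projected, width, height, tile=128):
    """projected: iterable of (px, py, depth, point_id), already in image space.

    Returns a dict mapping (px, py) to the (depth, point_id) of the nearest point.
    """
    tiles_x = (width + tile - 1) // tile

    # Pass 1: bucket sort the projected points by the tile they fall into.
    buckets = defaultdict(list)
    for px, py, depth, pid in projected:
        if 0 <= px < width and 0 <= py < height:
            tile_index = (py // tile) * tiles_x + (px // tile)
            buckets[tile_index].append((px, py, depth, pid))

    # Pass 2: Z-buffer one tile at a time, so that all read/modify/write
    # traffic stays within a region small enough to remain in cache.
    zbuf = {}
    for tile_index in sorted(buckets):
        for px, py, depth, pid in buckets[tile_index]:
            best = zbuf.get((px, py))
            if best is None or depth < best[0]:
                zbuf[(px, py)] = (depth, pid)
    return zbuf


points = [(3, 3, 5.0, "a"), (3, 3, 2.0, "b"), (200, 10, 1.0, "c"), (600, 0, 1.0, "d")]
result = tiled_zbuffer(points, width=512, height=512)
```

The extra bucket storage and the second pass are the "more memory and instructions" the slide mentions; the gain comes from replacing random access across the whole image with access confined to one tile.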

Page 19: Enhancing and Optimizing the Render Cache

Interpolation Filters

Larger filters
• Fill larger gaps in point data
• Generally more expensive
• Result in more blurring of the image
The previous Render Cache
• Used a 3x3 weighted filter
- Can only fill very small gaps
- Introduces only a small amount of blurring

Page 20: Enhancing and Optimizing the Render Cache

Prefilter

Add a larger “backup” filter
• Results used only when 3x3 filter fails
• Uses a uniform 7x7 filter
- Can be computed cheaply
• Can fill in much larger gaps
• Does not affect sampling priorities
• Actually executed first then overwritten
- Hence the name “prefilter”
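The order of operations described above (run the 7x7 prefilter first, then overwrite with the 3x3 result wherever it succeeds) can be sketched on a single-channel image. Several details are assumptions made for illustration: missing pixels are represented by `None`, the 3x3 filter is taken to "fail" when its window contains no samples, and the 3x3 weights (4 at the center, 2 on edges, 1 on corners) are hypothetical since the slides do not give them.

```python
def window_average(img, x, y, radius, weight):
    """Weighted average of the available samples around (x, y), or None."""
    h, w = len(img), len(img[0])
    total = norm = 0.0
    for j in range(max(0, y - radius), min(h, y + radius + 1)):
        for i in range(max(0, x - radius), min(w, x + radius + 1)):
            if img[j][i] is not None:
                wgt = weight(i - x, j - y)
                total += wgt * img[j][i]
                norm += wgt
    return total / norm if norm > 0 else None


def uniform(dx, dy):
    return 1.0


def weighted_3x3(dx, dy):
    # Assumed weights: 4 at the center, 2 on edges, 1 on corners.
    return (2 - abs(dx)) * (2 - abs(dy))


def interpolate(img):
    h, w = len(img), len(img[0])
    # Prefilter: uniform 7x7 (radius 3), executed first for every pixel.
    out = [[window_average(img, x, y, 3, uniform) for x in range(w)]
           for y in range(h)]
    # Main filter: weighted 3x3 overwrites the prefilter wherever it succeeds.
    for y in range(h):
        for x in range(w):
            fine = window_average(img, x, y, 1, weighted_3x3)
            if fine is not None:
                out[y][x] = fine
    return out


sparse = [[8.0, None, None, None, None, None, None, None]]
filled = interpolate(sparse)
```

This brute-force version does not reflect the "computed cheaply" point; a uniform filter can typically be evaluated with running sums, which is one reason to prefer uniform weights for the larger window.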

Page 21: Enhancing and Optimizing the Render Cache

Prefilter

[Figure panels: 3x3 filter only, 7x7 prefilter only, Both filters]

Page 22: Enhancing and Optimizing the Render Cache

Predictive Sampling

Sampling is purely reactive
• Helps to guide sparse sampling
• Samples returned in later frame
- Problem when large new regions become visible
Predict large gaps ahead of time
• Project using a predicted camera
• Request samples before they are needed

Page 23: Enhancing and Optimizing the Render Cache

Predictive Sampling

Projection is expensive
• 47% of original render cache cost
Use simplified projection
• No Z-Buffer
- Only need to find regions with no points
• Reduced resolution
- 1/4 width and height (1/16 # of pixels)
• Store only 1 byte per pixel
- Occupancy image fits easily in cache
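A minimal sketch of the simplified projection described above: points already projected with the predicted camera are marked in a quarter-resolution occupancy image with one byte per cell and no depth test, and the cells left empty are candidates for early sample requests. The function names and the input format are hypothetical, and how the camera is predicted is outside the scope of this sketch.

```python
def occupancy_image(projected, width, height, scale=4):
    """Mark which low-resolution cells receive at least one projected point.

    projected: iterable of (px, py) pixel coordinates under the predicted camera.
    """
    ow, oh = width // scale, height // scale
    occ = bytearray(ow * oh)                 # 1 byte per low-resolution pixel
    for px, py in projected:
        if 0 <= px < width and 0 <= py < height:
            occ[(py // scale) * ow + (px // scale)] = 1   # no Z-buffer needed
    return occ, ow, oh


def empty_cells(occ, ow):
    """Low-resolution cells with no points: regions to request samples for."""
    return [(i % ow, i // ow) for i, v in enumerate(occ) if v == 0]


# For a 512x512 image the occupancy image is 128x128 bytes, i.e. 16K.
full_occ, full_w, full_h = occupancy_image([], 512, 512)

# Tiny example: an 8x8 image becomes a 2x2 occupancy image.
occ, ow, oh = occupancy_image([(0, 0), (5, 1)], 8, 8)
gaps = empty_cells(occ, ow)
```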

Page 24: Enhancing and Optimizing the Render Cache

Predictive Sampling

[Figure panels: No Prediction, With Prediction]

Example during rapid camera rotation

Page 25: Enhancing and Optimizing the Render Cache

Algorithm Overview

[Diagram labels: renderer, renderer, image, Update Points, Prediction, Project/Sort, Z-Buffer, Depth Cull, Prefilter, Interpolate, Sampling]

Page 26: Enhancing and Optimizing the Render Cache

Point Eviction

Stale data can be worse than no data
• Points may live a long time at high ratios
- Not enough new samples to overwrite old
• Color change detection already exists
- Enhances sampling in regions of change
- Works by aging nearby points
Evict points beyond an age limit
• Speeds image convergence
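The interaction between aging and eviction can be sketched as below. The `CachedPoint` class, the neighborhood radius, the age penalty, and the age limit are all hypothetical values chosen for illustration; the slides state only that nearby points are aged when a color change is detected and that points beyond an age limit are evicted.

```python
from dataclasses import dataclass


@dataclass
class CachedPoint:
    pixel: tuple      # (x, y) where the point currently projects
    color: float
    age: int = 0


def age_points(points, changed_pixels, radius=2, penalty=8):
    """Age every point by one frame, plus a penalty near detected color changes."""
    for p in points:
        p.age += 1
        for cx, cy in changed_pixels:
            if abs(p.pixel[0] - cx) <= radius and abs(p.pixel[1] - cy) <= radius:
                p.age += penalty          # likely stale: push toward eviction
                break


def evict(points, age_limit):
    """Drop points older than the limit so stale shading stops being displayed."""
    return [p for p in points if p.age <= age_limit]


cache = [CachedPoint((0, 0), 0.5), CachedPoint((10, 10), 0.7)]
age_points(cache, changed_pixels=[(1, 1)])
survivors = evict(cache, age_limit=5)
```

Removing an old point leaves a gap, which the interpolation and sampling stages then fill with fresh data instead of continuing to reproject an outdated color.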

Page 27: Enhancing and Optimizing the Render Cache

SIMD Optimizations

Utilize MMX/SSE/SSE2 instructions
• Project four points at once
• Process R, G, B channels simultaneously
• Add memory prefetches
- Automatic prefetch works well for linear access
• Convert branches to data dependencies
- Compares set masks of zeroes or ones
- Use boolean operations instead of branches
• Roughly a factor of two total speedup
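The branch-to-data-dependency conversion can be illustrated by emulating it on 32-bit integers. This is a sketch of the idea only, written in Python rather than with SIMD intrinsics: a comparison yields a mask of all ones or all zeroes, and boolean operations then select between two values without an `if`. The function names are hypothetical, and depths are assumed to be stored as unsigned 32-bit integers.

```python
MASK32 = 0xFFFFFFFF


def compare_lt(a, b):
    """Emulate a SIMD compare: all ones if a < b, otherwise all zeroes."""
    return -(a < b) & MASK32


def select(mask, if_set, if_clear):
    """Pick if_set where mask bits are one and if_clear where they are zero."""
    return (if_set & mask) | (if_clear & ~mask & MASK32)


def zbuffer_update(old_depth, old_id, new_depth, new_id):
    """Keep the nearer of two points using masks instead of a branch."""
    closer = compare_lt(new_depth, old_depth)
    return select(closer, new_depth, old_depth), select(closer, new_id, old_id)


kept_new = zbuffer_update(old_depth=100, old_id=1, new_depth=40, new_id=2)
kept_old = zbuffer_update(old_depth=40, old_id=2, new_depth=100, new_id=3)
```

In a SIMD implementation the same pattern applies to several lanes at once, so one comparison and a few boolean operations replace several unpredictable branches.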

Page 28: Enhancing and Optimizing the Render Cache

Results

[Figure panels: Ray trace only (1.8 fps), Render Cache (9 fps)]

Single 1.7GHz processor - rotating camera

Page 29: Enhancing and Optimizing the Render Cache

Results

Timing: 62.1 ms (up to 16 fps)
• 512x512 image, render cache only
• 1.7GHz Pentium 4 processor

[Chart: frame time by stage: Update Points, Prediction, Project, Z-Buffer, Depth Cull, Prefilter, Filter / Smooth, Sampling]
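The quoted figures can be checked with a few lines of arithmetic. The helper names are hypothetical; the inputs are the numbers stated in the slides (62.1 ms per frame, 1.8 fps versus 9 fps, and 42 ms versus 25 ms for projection and Z-buffering).

```python
def frames_per_second(frame_ms):
    """Upper bound on frame rate for a given per-frame cost in milliseconds."""
    return 1000.0 / frame_ms


def speedup(before, after):
    """Ratio of an old cost to a new cost (values greater than 1 mean faster)."""
    return before / after


max_fps = frames_per_second(62.1)            # about 16.1, matching "up to 16 fps"
display_gain = 9.0 / 1.8                     # Render Cache vs. ray trace only
tiling_gain = speedup(42.0, 25.0)            # untiled vs. tiled, about 1.68
```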

Page 30: Enhancing and Optimizing the Render Cache

Scalability with Image Size

[Chart: Frame Size (Pixels), 0 to 1,600,000, plotted against Frame Time (ms), 0 to 350; labeled points at 512x512 and 1200x1200]

Page 31: Enhancing and Optimizing the Render Cache

Results

Try it for yourself
• Download publicly available binary
- Includes Render Cache and simple Ray Tracer
- Requires a Pentium 4 and Java Web Start
- Free for evaluation and internal use
- http://www.graphics.cornell.edu/research/interactive/rendercache

Demo

Page 32: Enhancing and Optimizing the Render Cache

The End