Practical Parallel Processing for Today’s Rendering Challenges SIGGRAPH 2001 Course 40

Los Angeles, CA

Speakers

Alan Chalmers, University of Bristol

Tim Davis, Clemson University

Erik Reinhard, University of Utah

Toshi Kato, SquareUSA

Schedule

Introduction

Parallel / Distributed Rendering Issues

Classification of Parallel Rendering Systems

Practical Applications

Summary / Discussion

Schedule

Introduction (Davis)

Parallel / Distributed Rendering Issues

Classification of Parallel Rendering Systems

Practical Applications

Summary / Discussion

The Need for Speed

Graphics rendering is time-consuming

• large amount of data in a single image

• animations much worse

Demand continues to rise for high-quality graphics

Rendering and Parallel Processing

A holy union

Many graphics rendering tasks can be performed in parallel

Often “embarrassingly parallel”

3-D Graphics Boards

Getting better

Perform “tricks” with texture mapping

Steve Jobs’ remark on constant frame rendering time

Parallel / Distributed Rendering

Fundamental Issues

• Task Management

Task subdivision, Migration, Load balancing

• Data Management

Data distributed across system

• Communication

Schedule

Introduction

Parallel / Distributed Rendering Issues (Chalmers)

Classification of Parallel Rendering Systems

Practical Applications

Summary / Discussion

Introduction

“Parallel processing is like a dog’s walking on its hind legs. It is not done well, but you are surprised to find it done at all”

[Steve Fiddes (apologies to Samuel Johnson)]

• Co-operation

• Dependencies

• Scalability

• Control

Co-operation

Solution of a single problem

• One person takes a certain time to solve the problem

• Divide problem into a number of sub-problems

• Each sub-problem solved by a single worker

• Reduced problem solution time

BUT

• co-operation overheads

Working Together

Overheads

• access to pool

• collision avoidance

Dependencies

Divide a problem into a number of distinct stages

• Parallel solution of one stage must finish before the next can start

• May be too severe → no parallel solution

each sub-problem dependent on previous stage

• Dependency-free problems

order of task completion unimportant

BUT co-operation still required

Building with Blocks

Figure: strictly sequential vs. dependency-free

Scalability

Upper bound on the number of workers

• Additional workers will NOT improve solution time

• Shows how suitable a problem is for parallel processing

• A given problem has a finite number of sub-problems

no gain from more workers than tasks

• Upper bound may be (a lot) less than the number of tasks

bottlenecks

Bottleneck at Doorway


More workers may result in LONGER solution time
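
The scalability and doorway slides can be made concrete with a toy cost model in Python. It is a sketch under stated assumptions, not a model from the course: every task takes the same `task_time`, and the doorway is a hypothetical `access_cost` that each worker pays once, one worker at a time.

```python
import math


def solution_time(num_tasks: int, workers: int, task_time: float,
                  access_cost: float = 0.0) -> float:
    """Toy model of solution time for equal-sized, dependency-free tasks.

    - ceil(num_tasks / workers) rounds of work: workers beyond num_tasks sit idle.
    - access_cost (an assumption for illustration) is a serialised 'doorway'
      cost paid once per worker, so it grows with the number of workers.
    """
    rounds = math.ceil(num_tasks / workers)
    return rounds * task_time + workers * access_cost


# Upper bound: with 8 tasks, more than 8 workers gives no further gain.
for p in (1, 2, 4, 8, 16):
    print(p, solution_time(8, p, task_time=1.0))

# With a doorway cost, adding workers can make things slower.
print(solution_time(8, 4, 1.0, access_cost=0.5))  # 2 rounds + 2.0 of queuing
print(solution_time(8, 8, 1.0, access_cost=0.5))  # 1 round + 4.0 of queuing
```

With 8 tasks, going from 8 to 16 workers leaves the time unchanged, and with `access_cost=0.5` four workers finish sooner (4.0) than eight (5.0).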

Control

Required by all parallel implementations

• What constitutes a task

• When has the problem been solved

• How to deal with multiple stages

• Forms of control

centralised

distributed

Control Required

Sequential

Parallel

Inherent Difficulties

Failure to successfully complete

• Sequential solution

deficiencies in algorithm or data

• Parallel solution

deficiencies in algorithm or data

deadlock

data consistency

Novel Difficulties

Factors arising from implementation

• Deadlock

processor waiting indefinitely for an event

• Data consistency

data is distributed amongst processors

• Communication overheads

latency in message transfer

Evaluating Parallel Implementations

Realisation penalties

• Algorithmic penalty

nature of the algorithm chosen

• Implementation penalty

need to communicate

concurrent computation & communication activities

idle time

Solution Times

Task Management

Providing tasks to the processors

• Problem decomposition

algorithmic decomposition

domain decomposition

• Definition of a task

• Computational Model

Problem Decomposition

Exploit parallelism

• Inherent in algorithm

algorithmic decomposition

parallelising compilers

• Applying same algorithm to different data items

domain decomposition

need for explicit system software support

Abstract Definition of a Task

• Principal Data Item (PDI): the data item to which the algorithm is applied

• Additional Data Items (ADIs) - needed to complete computation

Computational Models

Determines the manner in which tasks are allocated to PEs (processing elements)

• Maximise PE computation time

• Minimise idle time

load balancing

• Evenly allocate tasks amongst the processors

Data Driven Models

All PDIs allocated to specific PEs before computation starts

Each PE knows a priori which PDIs it is responsible for

Balanced (geometric decomposition)

• evenly allocate tasks amongst the processors

• if the number of PDIs is not an exact multiple of the number of PEs, some PEs do one extra task

$$\text{portion at each PE} = \frac{\text{number of PDIs}}{\text{number of PEs}}$$
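
A minimal sketch of balanced data driven allocation follows; the name `balanced_allocation` and the use of contiguous blocks of PDI indices are assumptions made for illustration.

```python
def balanced_allocation(num_pdis: int, num_pes: int) -> list[range]:
    """Balanced data driven allocation: contiguous blocks of PDI indices, fixed
    before computation starts. When num_pdis is not a multiple of num_pes, the
    first (num_pdis % num_pes) PEs each receive one extra PDI."""
    base, extra = divmod(num_pdis, num_pes)
    allocation, start = [], 0
    for pe in range(num_pes):
        count = base + (1 if pe < extra else 0)
        allocation.append(range(start, start + count))
        start += count
    return allocation


for pe, pdis in enumerate(balanced_allocation(10, 3)):
    print(f"PE {pe}: PDIs {list(pdis)}")
```

Each PE knows its block a priori, so no task requests are exchanged during computation.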

Balanced Data Driven

$$\text{solution time} = \text{initial distribution} + \frac{24}{3} + \text{result collation}$$

Demand Driven Model

Task computation time unknown

• Work is allocated dynamically as PEs become idle

PEs no longer bound to particular PDIs

• PEs explicitly demand new tasks

• Task supplier process must satisfy these demands

Dynamic Allocation of Tasks

$$\text{solution time} = \frac{\text{total comp time for all PDIs}}{\text{number of PEs}} + 2 \times \text{total comms time}$$

Task Supplier Process

Simple demand driven task supplier

PROCESS Task_Supplier()
Begin
   remaining_tasks := total_number_of_tasks
   (* initialise all processors with one task, guarding against fewer tasks than PEs *)
   FOR p = 1 TO MIN(number_of_PEs, total_number_of_tasks)
      SEND task TO PE[p]
      remaining_tasks := remaining_tasks - 1
   ENDFOR
   WHILE results_outstanding DO
      RECEIVE result FROM PE[i]
      IF remaining_tasks > 0 THEN
         SEND task TO PE[i]
         remaining_tasks := remaining_tasks - 1
      ENDIF
   ENDWHILE
End (* Task_Supplier *)
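
The same supplier can be sketched in runnable Python, with threads standing in for PEs and queues standing in for SEND/RECEIVE. The function name `task_supplier`, the per-PE inbox queues and the sentinel used to shut PEs down are illustrative assumptions; `results_outstanding` becomes an explicit counter, and errors raised by `work` are not handled. Because of CPython's global interpreter lock, threads show the protocol rather than a real speedup for CPU-bound work.

```python
import queue
import threading
from typing import Callable, Iterable

_NO_MORE = object()  # sentinel meaning "no task left" / "shut down"


def task_supplier(tasks: Iterable, num_pes: int, work: Callable) -> list:
    """Demand driven task supplier with threads as simulated PEs."""
    inbox = [queue.Queue() for _ in range(num_pes)]   # SEND ... TO PE[p]
    results: queue.Queue = queue.Queue()              # RECEIVE ... FROM PE[i]

    def pe(pe_id: int) -> None:
        # Each PE computes whatever it is sent and reports (pe_id, result).
        while (task := inbox[pe_id].get()) is not _NO_MORE:
            results.put((pe_id, work(task)))

    threads = [threading.Thread(target=pe, args=(p,)) for p in range(num_pes)]
    for t in threads:
        t.start()

    pending = iter(tasks)
    outstanding = 0
    # initialise all processors with one task (fewer tasks than PEs is allowed)
    for p in range(num_pes):
        task = next(pending, _NO_MORE)
        if task is _NO_MORE:
            break
        inbox[p].put(task)
        outstanding += 1

    collected = []
    while outstanding > 0:                    # WHILE results_outstanding DO
        pe_id, result = results.get()         # RECEIVE result FROM PE[i]
        outstanding -= 1
        collected.append(result)
        task = next(pending, _NO_MORE)
        if task is not _NO_MORE:              # IF remaining_tasks > 0 THEN
            inbox[pe_id].put(task)            # SEND task TO PE[i]
            outstanding += 1

    for box in inbox:                         # tell every PE to stop
        box.put(_NO_MORE)
    for t in threads:
        t.join()
    return collected


print(sorted(task_supplier(range(10), num_pes=3, work=lambda x: x * x)))
```

Results arrive in completion order rather than task order, which is why the usage line sorts them.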

Load Balancing

All PEs should complete at the same time

• Some PEs busy with complex tasks

• Other PEs available for easier tasks

• Computation effort of each task unknown

hot spot at end of processing → unbalanced solution

• Any knowledge about hot spots should be used
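
To see why dynamic allocation helps, and why a hot spot at the end still hurts, the sketch below compares finishing times for the two models. Assumptions: communication costs are ignored, task times are known to the simulator (not to the PEs), and the task lists are made up for illustration.

```python
import heapq


def static_finish_time(task_times: list[float], num_pes: int) -> float:
    """Balanced data driven: contiguous blocks of tasks fixed before starting."""
    base, extra = divmod(len(task_times), num_pes)
    finish, start = [], 0
    for pe in range(num_pes):
        count = base + (1 if pe < extra else 0)
        finish.append(sum(task_times[start:start + count]))
        start += count
    return max(finish)


def demand_finish_time(task_times: list[float], num_pes: int) -> float:
    """Demand driven: each task, in order, goes to whichever PE is idle first."""
    idle_at = [0.0] * num_pes          # min-heap of the times each PE becomes idle
    for t in task_times:
        earliest = heapq.heappop(idle_at)
        heapq.heappush(idle_at, earliest + t)
    return max(idle_at)


# A hot spot of expensive tasks at the end of the task list.
times = [1.0] * 9 + [10.0] * 3
print(static_finish_time(times, 3), demand_finish_time(times, 3))

# One very expensive task issued last still unbalances demand driven allocation;
# knowing about it and issuing it first helps.
late = [1.0] * 9 + [30.0]
print(demand_finish_time(late, 3), demand_finish_time(sorted(late, reverse=True), 3))
```

Here the balanced data driven model finishes at 31 time units against 13 for demand driven. With a single 30-unit task issued last, demand driven finishes at 33; issuing it first, as knowledge of the hot spot would allow, brings this down to 30.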

Task Definition & Granularity

Computational elements

• Atomic element (ray-object intersection)

sequential problem’s lowest computational element

• Task (trace complete path of one ray)

parallel problem’s smallest computational element

• Task granularity

number of atomic units in one task

Task Packet

Unit of task distribution

• Informs a PE of which task(s) to perform

• Task packet may include

indication of which task(s) to compute

data items (the PDI and (possibly) ADIs)

• Task packet for a ray tracer → one or more rays to be traced
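
A ray tracer task packet might be represented as below. `TaskPacket`, `make_packets` and the `rays_per_packet` parameter are hypothetical names; each primary ray is identified by its pixel, and ADIs are optional.

```python
from dataclasses import dataclass, field


@dataclass
class TaskPacket:
    """Hypothetical task packet for a ray tracer."""
    rays: list[tuple[int, int]]  # which tasks: pixel (x, y) of each primary ray (the PDIs)
    adis: dict[str, object] = field(default_factory=dict)  # optional ADIs sent along


def make_packets(width: int, height: int, rays_per_packet: int) -> list[TaskPacket]:
    """Group the primary rays of a width x height image into packets of
    rays_per_packet rays (the last packet may be smaller)."""
    pixels = [(x, y) for y in range(height) for x in range(width)]
    return [TaskPacket(pixels[i:i + rays_per_packet])
            for i in range(0, len(pixels), rays_per_packet)]


packets = make_packets(4, 4, rays_per_packet=3)
print(len(packets), packets[0].rays, packets[-1].rays)
```

Larger packets mean fewer task requests per image but coarser units of distribution for load balancing.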

Algorithmic Dependencies

Algorithm adopted for parallelisation:

• May specify order of task completion

• Dependencies MUST be preserved

• Algorithmic dependencies introduce:

synchronisation points → distinct problem stages

data dependencies → careful data management
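
Where an algorithm has distinct stages, each stage can still be processed in parallel, with a synchronisation point between stages. A minimal sketch, assuming each stage is a function applied independently to every item (`run_stages` is an illustrative name):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def run_stages(stages: list[Callable], items: Iterable, num_workers: int = 4) -> list:
    """Apply each stage to every item in parallel. Within a stage the tasks are
    dependency-free (completion order does not matter); between stages there is
    a synchronisation point: stage k+1 starts only once all of stage k is done."""
    data = list(items)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for stage in stages:
            # list() blocks until every task of this stage has completed
            data = list(pool.map(stage, data))
    return data


print(run_stages([lambda x: x + 1, lambda x: x * 2], range(5)))
```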

Distributed Task Management

Centralised task supply

• All requests for new tasks go to the System Controller

bottleneck

• Significant delay in fetching new tasks

Distributed task supply

• task requests handled remotely from System Controller

• spread of communication load across system

• reduced time to satisfy task request

Preferred Bias Allocation

Combining Data driven & Demand driven

• Balanced data driven

tasks allocated in a predetermined manner

• Demand driven

tasks allocated dynamically on demand

• Preferred Bias: Regions are purely conceptual

enables the exploitation of any coherence
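
One way to picture preferred bias allocation: each PE demands tasks from its own conceptual region first and only then helps elsewhere. This is a hypothetical sketch; the rule for choosing which region to help (the one with most work left, taken from its far end) is an assumption, not taken from the course.

```python
def preferred_bias_next(pe: int, regions: list[list[int]]):
    """Return the next task for `pe`, or None when all regions are empty.

    regions[r] holds the remaining task ids of conceptual region r, and PE r
    prefers region r. A PE whose own region is exhausted helps the region with
    the most work left, taking from the far end so it stays out of the owner's
    way (an assumption, made to keep each owner's tasks coherent).
    """
    if regions[pe]:
        return regions[pe].pop(0)          # demand driven, biased to own region
    busiest = max(range(len(regions)), key=lambda r: len(regions[r]))
    if regions[busiest]:
        return regions[busiest].pop()
    return None


regions = [[0, 1], [2, 3, 4, 5]]           # e.g. scanlines of two image halves
print([preferred_bias_next(0, regions) for _ in range(3)])   # PE 0 finishes early
print(preferred_bias_next(1, regions))
```

Because regions are only conceptual, no task is tied to a PE; the bias merely makes neighbouring tasks likely to run on the same PE.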

Conceptual Regions

• task allocation no longer arbitrary

Data Management

Providing data to the processors

• World model

• Virtual shared memory

• Data manager process

local data cache

requesting & locating data

• Consistency

Remote Data Fetches

Advanced data management

• Minimising communication latencies

Prefetching

Multi-threading

Profiling

• Multi-stage problems

Data Requirements

Requirements may be large

• Fit in the local memory of each processor

world model

• Too large for each local memory

distributed data

provide virtual world model/virtual shared memory

Virtual Shared Memory (VSM)

Providing a conceptual single memory space

• Memory is in fact distributed

• Request is the same for both local & remote data

• Speed of access may be (very) different

Levels of VSM support (higher level to lower level):

System software: provided by DM process

Compiler: HPF, ORCA

Operating system: coherent paging

Hardware: DDM, DASH, KSR-1
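
A data manager offering the same request for local and remote data can be sketched as follows. `DataManager`, `owner_of` and the `memories` list (standing in for every PE's local memory) are hypothetical; a real remote read would send a request message and wait for the reply.

```python
from typing import Callable, Hashable


class DataManager:
    """Hypothetical DM process for one PE: one read() call serves both local
    and remote data items (virtual shared memory), with a local cache."""

    def __init__(self, pe_id: int, owner_of: Callable[[Hashable], int],
                 memories: list[dict]):
        self.pe_id = pe_id
        self.owner_of = owner_of      # data item -> PE holding the master copy
        self.memories = memories      # stands in for every PE's local memory
        self.cache: dict = {}
        self.remote_fetches = 0

    def read(self, item: Hashable):
        owner = self.owner_of(item)
        if owner == self.pe_id:
            return self.memories[owner][item]       # local access: fast
        if item not in self.cache:
            self.remote_fetches += 1                # remote fetch: slow, PE may idle
            self.cache[item] = self.memories[owner][item]
        return self.cache[item]


memories = [{0: "sphere", 2: "plane"}, {1: "cone", 3: "torus"}]
dm = DataManager(pe_id=0, owner_of=lambda item: item % 2, memories=memories)
print(dm.read(0), dm.read(1), dm.read(1), dm.remote_fetches)
```

The cache here is never invalidated, which is exactly the consistency issue raised on the next slide.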

Consistency

Read/write can result in inconsistencies

• Distributed memory

multiple copies of the same data item

• Updating such a data item

update all copies of this data item

invalidate all other copies of this data item
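
The two ways of keeping copies consistent after a write can be sketched with one dictionary per PE. This ignores message passing and concurrent writes; `write` and its `policy` parameter are illustrative names.

```python
def write(copies: list[dict], writer: int, item, value,
          policy: str = "invalidate") -> None:
    """Write `item` on PE `writer`, then keep the other copies consistent.

    copies[pe] is PE pe's local copy of the data items it holds.
    policy "update": send the new value to every other copy.
    policy "invalidate": delete every other copy; a later read must re-fetch.
    """
    if policy not in ("update", "invalidate"):
        raise ValueError(f"unknown policy: {policy}")
    copies[writer][item] = value
    for pe, local in enumerate(copies):
        if pe == writer or item not in local:
            continue
        if policy == "update":
            local[item] = value
        else:
            del local[item]


copies = [{"x": 1}, {"x": 1}, {"y": 0}]
write(copies, writer=0, item="x", value=2, policy="update")
print(copies)
write(copies, writer=1, item="x", value=5)
print(copies)
```

Updating keeps every copy readable but sends the value to all holders; invalidating sends less data now at the price of a remote fetch on the next read.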

Minimising Impact of Remote Data

Failure to find a data item locally → remote fetch

• Time to find data item can be significant

• Processor idle during this time

• Latency difficult to predict

e.g. depends on current message densities

• Data management must minimise this idle time

Data Management Techniques

Hiding the Latency

• Overlapping the communication with computation

prefetching

multi-threading

Minimising the Latency

• Reducing the time of a remote fetch

profiling

caching

Prefetching

Exploiting knowledge of data requests

• A priori knowledge of data requirements

nature of the problem

choice of computational model

• DM can prefetch them (up to some specified horizon)

available locally when required

overlapping communication with computation
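
Prefetching up to a horizon can be sketched with a thread pool that keeps a few fetches in flight while the current task computes. `process_with_prefetch`, `fetch`, `compute` and `slow_fetch` are hypothetical; the sleep merely stands in for communication latency.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def process_with_prefetch(requirements: Sequence, fetch: Callable,
                          compute: Callable, horizon: int = 2) -> list:
    """Run one task per entry of `requirements` (the data item each task needs,
    known a priori), keeping up to `horizon` fetches in flight ahead of the
    task being computed, so communication overlaps with computation."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    results = []
    with ThreadPoolExecutor(max_workers=horizon) as pool:
        in_flight = deque(pool.submit(fetch, item) for item in requirements[:horizon])
        for i in range(len(requirements)):
            data = in_flight.popleft().result()   # ideally already local when needed
            if i + horizon < len(requirements):   # keep `horizon` fetches in flight
                in_flight.append(pool.submit(fetch, requirements[i + horizon]))
            results.append(compute(data))         # overlaps with the pending fetches
    return results


def slow_fetch(item):
    """Hypothetical remote fetch: the sleep stands in for communication latency."""
    time.sleep(0.05)
    return item * 10


print(process_with_prefetch([1, 2, 3, 4, 5], slow_fetch, compute=lambda d: d + 1))
```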

Multi-Threading

Keeping PE busy with useful computation

• Remote data fetch → current task stalled

• Start another task (Processor kept busy)

separate threads of computation (BSP)

• Disadvantages: Overheads

Context switches between threads

Increased message densities

Reduced local cache for each thread
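
The trade-off can be illustrated with a toy latency-hiding model, which is an assumption for illustration rather than the model behind the course results. It captures how extra threads fill stall time up to a plateau, but not the cache effects that, per the next slide, make too many threads harmful.

```python
def pe_utilisation(threads: int, compute: float, latency: float,
                   switch: float = 0.0) -> float:
    """Toy latency-hiding model (an assumption, not from the course).

    Each thread repeatedly computes for `compute` time, then stalls for
    `latency` on a remote fetch. While one thread stalls, another runs;
    every switch between threads costs `switch`. Returns the fraction of
    time the PE spends on useful computation.
    """
    period = compute + switch + latency           # one thread's compute/stall cycle
    saturated = compute / (compute + switch)      # PE never idle; only switch overhead left
    return min(threads * compute / period, saturated)


for k in (1, 2, 4, 8):
    print(k, pe_utilisation(k, compute=1.0, latency=3.0))
```

With compute 1.0 and latency 3.0, utilisation rises from 0.25 with one thread to 1.0 with four; more threads add nothing in this model.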

Results for Multi-Threading

• More than the optimal number of threads reduces performance

• “Cache 22” situation

less local cache → more data misses → more threads

Profiling

Reducing the remote fetch time

• At the end of computation all data requests are known

if known then can be prefetched

• Monitor data requests for each task

build up a “picture” of possible requirements

• Exploit spatial coherence (with preferred bias allocation)

prefetch those data items likely to be required
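
Profiling combined with spatial coherence might look like this: record which data items each image tile's tasks requested, then treat the items used by neighbouring tiles as prefetch candidates. `RequestProfiler` and the tile-based neighbourhood are hypothetical choices.

```python
from collections import defaultdict


class RequestProfiler:
    """Hypothetical profiler: records which data items each image tile's tasks
    requested, then predicts requirements for a tile from its neighbours."""

    def __init__(self) -> None:
        self.profile: defaultdict = defaultdict(set)

    def record(self, tile: tuple[int, int], item) -> None:
        self.profile[tile].add(item)

    def predict(self, tile: tuple[int, int]) -> set:
        """Spatial coherence: items used by this tile or its 8 neighbours are
        likely to be needed, so they are candidates for prefetching."""
        x, y = tile
        likely = set()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                likely |= self.profile.get((x + dx, y + dy), set())
        return likely


p = RequestProfiler()
p.record((0, 0), "sphere")
p.record((1, 0), "plane")
p.record((5, 5), "cube")
print(p.predict((1, 1)))
```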

Spatial Coherence

Schedule

Introduction

Parallel / Distributed Rendering Issues

Classification of Parallel Rendering Systems (Davis)

Practical Applications

Summary / Discussion

Classification of Parallel Rendering Systems

Parallel rendering performed in many ways

Classification by

• task subdivision

polygon rendering

ray tracing

• hardware

parallel hardware

distributed computing

Classification by Task Subdivision

Original rendering task broken into smaller pieces to be processed in parallel

Depends on type of rendering

Goals

• maximize parallelism

• minimize overhead, including communication

Task Subdivision in Polygon Rendering

Rendering many primitives

Polygon rendering pipeline

• geometry processing (transformation, clipping, lighting)

• rasterization (scan conversion, visibility, shading)

Polygon Rendering Pipeline

Graphics database traversal → Geometry processing (G G G G …) → Rasterization (R R R R …) → Display

Primitive Processing and Sorting

View processing of primitives as a sorting problem

• primitives can fall anywhere on or off the screen

Sorting can be done in either software or hardware, but mostly done in hardware

Primitive Processing and Sorting

Sorting can occur at various places in the rendering pipeline

• during geometry processing (sort-first)

• between geometry processing and rasterization (sort-middle)

• during rasterization (sort-last)

Sort-first

Graphics database (arbitrarily partitioned) → redistribute “raw” primitives (pre-transform) → Geometry processing (G G G G …) → Rasterization (R R R R) → Display

Sort-first Method

Each processor (renderer) assigned a portion of the screen

Primitives arbitrarily assigned to processors

Processors perform enough calculations to send primitives to correct renderers

Processors then perform geometry processing and rasterization for their primitives in parallel
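
The sorting step, deciding which renderers need a primitive, can be sketched as a bounding-box test against equal screen regions. Assumptions: vertices have already been projected to pixel coordinates, and one renderer owns each of `cols` by `rows` regions; `overlapping_regions` is an illustrative name. The same binning is what sort-middle performs after full geometry processing.

```python
def overlapping_regions(screen_xy, screen_w, screen_h, cols, rows):
    """Screen regions (col, row) overlapped by a primitive's screen bounding box.

    screen_xy holds the primitive's vertices, assumed already projected to pixel
    coordinates (the 'enough calculations' step). The screen is split into
    cols x rows equal regions, one per renderer.
    """
    xs = [x for x, _ in screen_xy]
    ys = [y for _, y in screen_xy]
    if max(xs) < 0 or max(ys) < 0 or min(xs) >= screen_w or min(ys) >= screen_h:
        return []                           # off screen: no renderer needs it
    tile_w, tile_h = screen_w / cols, screen_h / rows
    c0 = max(0, int(min(xs) // tile_w))
    c1 = min(cols - 1, int(max(xs) // tile_w))
    r0 = max(0, int(min(ys) // tile_h))
    r1 = min(rows - 1, int(max(ys) // tile_h))
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


# 100 x 100 screen, 2 x 2 renderers
print(overlapping_regions([(10, 10), (40, 10), (20, 30)], 100, 100, 2, 2))
print(overlapping_regions([(40, 40), (60, 40), (50, 60)], 100, 100, 2, 2))
```

The second triangle straddles all four regions, which is the duplication of effort listed in the discussion; a bounding box is also conservative and may name regions the primitive does not actually cover.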

Screen Subdivision

Sort-first Discussion

+ Communication costs can be kept low

- Duplication of effort if primitives fall into more than one screen area

- Load imbalance if primitives concentrated

- Very few, if any, sort-first renderers built

Sort-middle

Graphics database (arbitrarily partitioned) → Geometry processing (G G G G) → redistribute screen-space primitives → Rasterization (R R R R) → Display

Sort-middle Method

Primitives arbitrarily assigned to renderers

Each renderer performs geometry processing on its primitives

Primitives then redistributed to rasterizers according to screen region

Sort-middle Discussion

+ Natural breaking point in graphics pipeline

- Load imbalance if primitives concentrated in particular screen regions

+ Several successful hardware implementations

• Pixel-Planes 5

• SGI RealityEngine

Sort-last

Graphics database (arbitrarily partitioned) → Geometry processing (G G G G …) → Rasterization (R R R R) → redistribute pixels, samples, or fragments (compositing) → Display

Sort-last Method

Primitives arbitrarily distributed to renderers

Each renderer computes pixel values for its primitives

Pixel values are then sent to processors according to screen location

Rasterizers perform visibility and compositing
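
For opaque surfaces, the visibility part of sort-last compositing reduces to keeping the nearest sample per pixel. This sketch composites the whole screen in one place for brevity (in the method above, each processor would run the same loop over its own screen locations) and does not handle transparency. `depth_composite` is an illustrative name.

```python
import math


def depth_composite(partials, background=None):
    """Sort-last compositing: each renderer supplies a full-screen
    (colours, depths) pair for its own primitives; per pixel the nearest
    sample wins."""
    n = len(partials[0][0])
    colours = [background] * n
    depths = [math.inf] * n
    for colour, depth in partials:
        for i in range(n):
            if depth[i] < depths[i]:               # visibility: keep the nearer sample
                depths[i] = depth[i]
                colours[i] = colour[i]
    return colours, depths


a = (["red", "red", None], [1.0, 5.0, math.inf])      # renderer A's pixels
b = (["green", "green", None], [2.0, 3.0, math.inf])  # renderer B's pixels
print(depth_composite([a, b]))
```

Every renderer contributes a value for every pixel it covers, which is why pixel traffic can be high.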

Sort-last Discussion

+ Less prone to load imbalance

- Pixel traffic can be high

+ Some working systems

• Denali

Task Subdivision in Ray Tracing

Ray tracing often prohibitively expensive on single processor

Prime candidate for parallelization

• each pixel can be rendered independently

Processing easily subdivided

• image space subdivision

• object space subdivision

• object subdivision

Image Space Subdivision

Image Space Subdivision Discussion

+ Straightforward

+ High parallelism possible

- Entire scene database must reside on each processor

• need adequate storage

+ Low processor communication

Image Space Subdivision Discussion

- Load imbalance possible

• screen space may be further subdivided

+ Used in many parallel ray tracers

• works better with MIMD machines

• distributed computing environments
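
Image space subdivision with a shared scene can be sketched as below: the image is split into tiles, and a pool hands the next tile to whichever worker is free. `shade` is a hypothetical stand-in for tracing one primary ray. Threads keep the example self-contained; under CPython a CPU-bound `shade` would need processes (for example `ProcessPoolExecutor` with a picklable, module-level function) to run truly in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def tiles(width, height, size):
    """Image space subdivision into size x size tiles (edge tiles may be smaller)."""
    for y0 in range(0, height, size):
        for x0 in range(0, width, size):
            yield x0, y0, min(x0 + size, width), min(y0 + size, height)


def render_image(width, height, shade, tile_size=16, workers=4):
    """Render every tile independently; the pool hands the next tile to whichever
    worker is free, so smaller tiles also help with load imbalance."""
    def render_tile(bounds):
        x0, y0, x1, y1 = bounds
        return bounds, [[shade(x, y) for x in range(x0, x1)] for y in range(y0, y1)]

    image = [[None] * width for _ in range(height)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for (x0, y0, x1, y1), block in pool.map(render_tile, tiles(width, height, tile_size)):
            for dy, row in enumerate(block):
                image[y0 + dy][x0:x1] = row
    return image


# `shade` stands in for tracing one primary ray; here a trivial gradient.
img = render_image(5, 3, shade=lambda x, y: x + 10 * y, tile_size=2, workers=3)
print(img)
```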

Object Space Subdivision

3-D object space divided into voxels

Each voxel assigned to a processor

Rays are passed from processor to processor as voxel space is traversed
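
Which processors a ray visits follows from walking it through the voxel grid. The sketch below does this in 2-D with a uniform-grid traversal in the style of Amanatides and Woo, a technique chosen here for illustration; the slab assignment `slab_owner` is hypothetical, the ray is assumed to start inside the grid, and only hand-offs between processors are counted, not intersection tests.

```python
import math


def voxels_along_ray(origin, direction, grid_n, voxel_size=1.0):
    """Visit, in order, the voxels of a grid_n x grid_n 2-D grid crossed by a
    ray that starts inside the grid. A 3-D version adds a z axis the same way."""
    if direction[0] == 0 and direction[1] == 0:
        raise ValueError("direction must be non-zero")
    cell = [int(origin[0] // voxel_size), int(origin[1] // voxel_size)]
    step, t_max, t_delta = [0, 0], [math.inf, math.inf], [math.inf, math.inf]
    for axis in (0, 1):
        d = direction[axis]
        if d != 0:
            step[axis] = 1 if d > 0 else -1
            wall = (cell[axis] + (1 if d > 0 else 0)) * voxel_size  # next voxel wall
            t_max[axis] = (wall - origin[axis]) / d   # ray parameter t at that wall
            t_delta[axis] = voxel_size / abs(d)       # t needed to cross one voxel
    visited = []
    while 0 <= cell[0] < grid_n and 0 <= cell[1] < grid_n:
        visited.append(tuple(cell))
        axis = 0 if t_max[0] < t_max[1] else 1        # cross the nearer wall first
        cell[axis] += step[axis]
        t_max[axis] += t_delta[axis]
    return visited


def slab_owner(voxel):
    """Hypothetical assignment for a 4 x 4 grid: x < 2 on PE 0, x >= 2 on PE 1."""
    return voxel[0] // 2


def ray_messages(path, owner):
    """How many times the ray must be sent on to another processor."""
    return sum(1 for a, b in zip(path, path[1:]) if owner(a) != owner(b))


path = voxels_along_ray((0.5, 0.5), (1.0, 0.0), grid_n=4)
print(path, ray_messages(path, slab_owner))
```

Each hand-off is a message, which is where the high communication noted in the discussion comes from.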

Object Space Subdivision Discussion

+ Each processor needs only scene information associated with its voxel(s)

- Rays must be tracked through voxel space

+ Load balance good

- Communication can be high

+ Some successful systems

Object Partitioning

Each object in the scene is assigned to a processor

Rays passed as messages between processors

Processors check for intersection

Object Partitioning Discussion

+ Load balancing good

- Communication high due to ray message traffic

- Fewer implementations

Schedule

Introduction

Parallel / Distributed Rendering Issues

Classification of Parallel Rendering Systems

Practical Applications

• Rendering at Clemson / Distributed Computing and Spatial/Temporal Coherence (Davis)

• Interactive Ray Tracing

• Parallel Rendering and the Quest for Realism: The Kilauea Massively Parallel Ray Tracer

Summary / Discussion

Practical Experiences at Clemson

Problems with Rendering

Current Resources

Deciding on a Solution

A New Render Farm

A Demand for Rendering

Computer Animation course

3 SIGGRAPH animation submissions

• render over semester break

Current Resources

dedicated lab

• 8 SGI O2s (R12000, 384 MB)

general-purpose lab

• 4 SGI O2s

shared lab

• dual-pipe Onyx2 (8 R12000, 8 GB)

• 10 SGI O2s (R12000, 256 MB)

offices

• 5 SGI O2s

Resource Problems

Rendering prohibits interactive sessions

Little organized control over resources

• users must be self-monitoring

m renders on n machines → 1 render on n/m machines

Disk space

Cross-platform distributed rendering to PCs problematic

• security (rsh)

• distributed rendering software

• directory paths

Short-term Solutions

Distributed rendering restricted to late night

Resources partitioned

Page 82: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Problems with Maya

Video

Traditional distributed computing problems
• dropped frames
• incomplete frames
• tools developed

Page 83: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Problems with Maya

Tools (DropCheck)
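The slides name DropCheck but do not describe how it works. As a hypothetical sketch only, assuming each frame is written to its own file named `<name>.<frame>.<ext>` (for example `shot.0042.iff`) and that an incomplete frame shows up as a file much smaller than the typical frame, a dropped/incomplete-frame check could look like this; `check_frames`, `scan_directory` and the size threshold are illustrative assumptions:

```python
import os
import re
from statistics import median
from typing import Dict, List, Tuple

# Hypothetical naming convention: <name>.<frame>.<ext>, e.g. "shot.0042.iff".
FRAME_RE = re.compile(r"^(?P<name>.+)\.(?P<frame>\d+)\.(?P<ext>\w+)$")

def check_frames(sizes: Dict[str, int], first: int, last: int,
                 min_fraction: float = 0.5) -> Tuple[List[int], List[int]]:
    """Return (dropped, incomplete) frame numbers in first..last.

    `sizes` maps file names to sizes in bytes. A frame is "dropped" if no file
    exists for it, and "incomplete" if its file is smaller than `min_fraction`
    times the median size of the frames found (an assumed heuristic).
    """
    found: Dict[int, int] = {}
    for fname, size in sizes.items():
        m = FRAME_RE.match(fname)
        if m:
            found[int(m.group("frame"))] = size
    dropped = [f for f in range(first, last + 1) if f not in found]
    if not found:
        return dropped, []
    typical = median(found.values())
    incomplete = sorted(f for f, s in found.items()
                        if first <= f <= last and s < min_fraction * typical)
    return dropped, incomplete

def scan_directory(path: str, first: int, last: int) -> Tuple[List[int], List[int]]:
    """Apply check_frames to the files in one render output directory."""
    sizes = {e.name: e.stat().st_size for e in os.scandir(path) if e.is_file()}
    return check_frames(sizes, first, last)
```

A re-render list is then simply `dropped + incomplete` for the frame range that was farmed out.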

Page 84: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Problems with Maya

Tools (Load Scan)

Page 85: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Problems with Maya

Animation inconsistencies
• next slide

Some frames would not render

Particle system inconsistencies

Page 86: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Problems with Maya

Page 87: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Rendering Tips

Layering

Page 88: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Rendering Tips

Layering

Page 89: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Deciding on a Solution - RenderDrive

RenderDrive by ART (Advanced Rendering Technology)
• network appliance for ray tracing
• 16-48 specialized processors
• claims speedups of 15-40 times over a Pentium III
• 768 MB to 1.5 GB memory
• 4 GB hard disk cache

Page 90: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Deciding on a Solution - RenderDrive

• plug-in interface to Maya
• RenderMan ray tracer
• $15K - $25K

Page 91: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Deciding on a Solution - PCs

Network of PCs as a render farm

10 PCs, each with a 1.4 GHz processor, 1 GB memory, and a 40 GB hard drive

Maya will run under Windows 2000 or Linux (Maya 4.0)

Distributed rendering software not included for Windows 2000

Page 92: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Deciding on a Solution - PCs Win

RenderDrive had some unusual anomalies

Interactive capabilities

Scan-line or ray tracing

Distributed rendering software may be included

Problems with security still exist
• shared file system

Page 93: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Schedule

Introduction Parallel / Distributed Rendering Issues Classification of Parallel Rendering Systems Practical Applications

• Rendering at Clemson / Distributed Computing and Spatial/Temporal Coherence (Davis)

• Interactive Ray Tracing

• Parallel Rendering and the Quest for Realism: The Kilauea Massively Parallel Ray Tracer

Summary / Discussion

Page 94: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Agenda

Background

Temporal Depth-Buffer

Frame Coherence Algorithm

Parallel Frame Coherence Algorithm

Page 95: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Background - Ray Tracing

Closest to physical model of light

High cost in terms of time / complexity

Page 96: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Background - Frame Coherence

Frame coherence
• those pixels that do not change from one frame to the next
• derived from object and temporal coherence

We should not have to re-compute those pixels whose values will not change
• writing pixels to frame files

Page 97: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Background - Test Animation

Glass Bounce (60 frames at 320x240; 5 objects)

Page 98: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Background - Frame Coherence

Page 99: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Previous Work

Frame coherence
• moving camera/static world [Hubschman and Zucker 81]
• estimated frames [Badt 88]
• stereoscopic pairs [Adelson and Hodges 93/95]
• 4D bounding volumes [Glassner 88]
• voxels and ray tracking [Jevans 92]
• incremental ray tracing [Murakami 90]

Page 100: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Previous Work (cont.)

Distributed computing
• Alias and 3D Studio
• most major productions starting with Toy Story [Henne 96]

Page 101: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Goals

Render exactly the same set of frames in much less time

Work in conjunction with other optimization techniques

Run on a variety of platforms

Extend a currently popular ray tracer (POV-Ray) to allow for general use

Page 102: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Temporal Depth-Buffer

Similar to traditional z-buffer

For each pixel, store a temporal depth in frame units

[Figure: an object in frames 1, 2 and 3, and the resulting temporal depth-buffer (values in frame units)]

5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 3 3 3 5 5 5
5 5 5 5 5 5 5 3 3 3 5 5 5
5 5 5 5 5 2 2 2 3 3 5 5 5
5 5 5 5 5 2 2 2 3 3 5 5 5
5 5 5 5 2 2 2 2 5 5 5 5 5
5 5 5 5 2 2 2 2 5 5 5 5 5
5 5 5 5 2 2 2 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5
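To make the temporal depth-buffer concrete: when rendering frame f, only pixels whose entry equals f have to be recomputed, and every other pixel is copied from the previous frame. A minimal sketch, assuming the buffer is stored as a list of rows of integers; the values are copied from the figure's 13x10 grid, and `pixels_to_recompute` is an illustrative name:

```python
from typing import List, Tuple

# Temporal depth-buffer from the figure: one frame number per pixel.
TBUF = [
    [5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
    [5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
    [5, 5, 5, 5, 5, 5, 5, 3, 3, 3, 5, 5, 5],
    [5, 5, 5, 5, 5, 5, 5, 3, 3, 3, 5, 5, 5],
    [5, 5, 5, 5, 5, 2, 2, 2, 3, 3, 5, 5, 5],
    [5, 5, 5, 5, 5, 2, 2, 2, 3, 3, 5, 5, 5],
    [5, 5, 5, 5, 2, 2, 2, 2, 5, 5, 5, 5, 5],
    [5, 5, 5, 5, 2, 2, 2, 2, 5, 5, 5, 5, 5],
    [5, 5, 5, 5, 2, 2, 2, 5, 5, 5, 5, 5, 5],
    [5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
]

def pixels_to_recompute(tbuf: List[List[int]], frame: int) -> List[Tuple[int, int]]:
    """(row, column) of every pixel whose temporal depth equals `frame`.

    All other pixels keep their value from the previous frame.
    """
    return [(r, c) for r, row in enumerate(tbuf)
            for c, depth in enumerate(row) if depth == frame]

for f in (2, 3, 4):
    print(f, len(pixels_to_recompute(TBUF, f)))  # prints: 2 17, then 3 10, then 4 0
```

For this figure, frame 2 touches 17 of the 130 pixels and frame 3 touches 10, which is where the savings come from.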

Page 103: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Frame Coherence Algorithm

Page 104: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Frame Coherence Algorithm

Page 105: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Frame Coherence Algorithm

Identify volume within 3D object space where movement occurs

Divide volume uniformly into voxels

For each voxel, create a list of frame numbers in which changing objects inhabit this voxel
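The three steps above can be sketched as follows. Assumptions made only for illustration: each moving object is described by one axis-aligned bounding box per frame (`boxes[frame] = [(lo, hi), ...]`), the movement volume is the union of those boxes, and the volume is divided into `res` voxels per axis, so voxels may be non-cubical. A box that touches a voxel face is counted in both voxels, which errs on the safe side:

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, float, float]
Box = Tuple[Vec, Vec]  # (min corner, max corner)

def movement_volume(boxes: Dict[int, List[Box]]) -> Box:
    """Step 1: union of all per-frame bounding boxes of the moving objects."""
    all_boxes = [b for frame_boxes in boxes.values() for b in frame_boxes]
    lo = tuple(min(b[0][d] for b in all_boxes) for d in range(3))
    hi = tuple(max(b[1][d] for b in all_boxes) for d in range(3))
    return lo, hi

def voxel_frame_lists(boxes: Dict[int, List[Box]],
                      res: Sequence[int]) -> Dict[Tuple[int, int, int], List[int]]:
    """Steps 2 and 3: uniform voxels, each with a sorted list of the frames in
    which a moving object's bounding box overlaps it."""
    vlo, vhi = movement_volume(boxes)
    # Voxel size per axis; guard against a flat volume along some axis.
    size = [((vhi[d] - vlo[d]) / res[d]) or 1.0 for d in range(3)]

    def cell(x: float, d: int) -> int:
        return min(max(int((x - vlo[d]) / size[d]), 0), res[d] - 1)

    lists = defaultdict(set)
    for frame, frame_boxes in boxes.items():
        for lo, hi in frame_boxes:
            for i in range(cell(lo[0], 0), cell(hi[0], 0) + 1):
                for j in range(cell(lo[1], 1), cell(hi[1], 1) + 1):
                    for k in range(cell(lo[2], 2), cell(hi[2], 2) + 1):
                        lists[(i, j, k)].add(frame)
    return {v: sorted(fs) for v, fs in lists.items()}
```

Voxels that never receive a moving object simply have no entry, which is what lets most pixels skip recomputation.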

Page 106: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Frame Coherence Algorithm

In each frame, track rays through voxels for each pixel

From the voxels traversed, find the one with the lowest frame number

Record that number in the temporal depth-buffer

Page 107: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Frame Coherence Algorithm

for each frame of the animation
    for each pixel that needs to be computed for this frame
        trace the rays for this pixel
        for each voxel that any of these rays intersect
            get the next frame number to compute
        set the t-buffer entry to the lowest frame number found
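A runnable version of the pseudocode above, as a sketch under assumptions: `voxels_hit(x, y, f)` is a hypothetical callback returning the voxels crossed by the pixel's rays (a real tracer would record them while tracing), `shade(x, y, f)` stands in for tracing the pixel, and `occupancy` maps each voxel to its sorted frame list. One detail is our own assumption rather than something the slides state: a voxel is also treated as changing in the frame right after an occupied frame, so that a pixel is refreshed when a moving object leaves the voxel.

```python
import bisect
from typing import Callable, Dict, Hashable, Iterable, List, Sequence

def next_change(occupied: Sequence[int], frame: int, never: int) -> int:
    """First frame after `frame` in which this voxel may look different."""
    i = bisect.bisect_right(occupied, frame)       # first occupied frame > frame
    nxt = occupied[i] if i < len(occupied) else never
    if i > 0 and occupied[i - 1] == frame:          # occupied now: object may leave
        nxt = min(nxt, frame + 1)
    return nxt

def render_animation(width: int, height: int, num_frames: int,
                     voxels_hit: Callable[[int, int, int], Iterable[Hashable]],
                     occupancy: Dict[Hashable, List[int]],
                     shade: Callable[[int, int, int], object]) -> List[List[List[object]]]:
    never = num_frames + 1
    tbuf = [[1] * width for _ in range(height)]     # every pixel computed in frame 1
    image = [[None] * width for _ in range(height)]
    frames = []
    for f in range(1, num_frames + 1):              # for each frame of the animation
        for y in range(height):
            for x in range(width):
                if tbuf[y][x] != f:                 # not due: keep previous value
                    continue
                image[y][x] = shade(x, y, f)        # trace the rays for this pixel
                # Lowest "next frame" over all voxels the rays crossed.
                tbuf[y][x] = min((next_change(occupancy.get(v, []), f, never)
                                  for v in voxels_hit(x, y, f)), default=never)
        frames.append([row[:] for row in image])    # write the frame
    return frames
```

Because only pixels whose t-buffer entry equals the current frame are shaded, the cost per frame tracks the number of changing pixels rather than the image size.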

Page 108: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Frame Coherence Algorithm

[Figure: frames 1, 2 and 3 with the corresponding temporal depth-buffer (values in frame units)]

5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 3 3 3 5 5 5
5 5 5 5 5 5 5 3 3 3 5 5 5
5 5 5 5 5 2 2 2 3 3 5 5 5
5 5 5 5 5 2 2 2 3 3 5 5 5
5 5 5 5 2 2 2 2 5 5 5 5 5
5 5 5 5 2 2 2 2 5 5 5 5 5
5 5 5 5 2 2 2 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5

Page 109: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Voxel Volume

Uniform voxel spatial subdivision

Voxels can be non-cubical

Ways to determine voxel volume
• user-supplied
• pre-processing phase
  - active voxel marking
  - in a distributed environment, done by master or slave or both

Page 110: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Frame Coherence Example

Page 111: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Test Animation

Pool Shark (620 frames at 640x480; 174 objects)

Page 112: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Test Animations - Problem

Bounding box problem

Page 113: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Results

Measure | Standard algorithm | Frame coherence algorithm | Ratio of frame coherence to standard | Speedup
Total number of rays | 47,841,269 | 13,259,380 | 0.27 | --
Total parse time | 0:48 | 1:30 | 1.88 | --
First frame rendering time | 6:34 | 8:49 | 1.34 | 0.75
Average frame rendering time | 7:15 | 3:05 | 0.43 | 2.33
Total frame rendering time | 5:26:55 | 2:19:51 | 0.43 | 2.33

Page 114: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Frame Coherence Discussion

Localized movement can have global effects

Performance depends on both the number and complexity of recomputed pixels

Issues
• overhead
• antialiasing
• motion blur

Page 115: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Temporal Depth-Buffer Discussion

Uses less memory than other methods

Simple

Can be used with other algorithms

Page 116: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Parallel Frame Coherence Algorithm

Distributed computing environment

1-8 Sun Sparc Ultra 5 processors running at 270 MHz

Coarse-grain parallelism

Load balancing
• divide work among processors
• keep data together for frame coherence

Page 117: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Load Balancing

Image space subdivision
• each processor computes a subregion for the entire length of the run

Recursively subdivide subsequences to keep processors busy
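A minimal sketch of the two ideas on this slide, with details that are our assumptions: the screen is cut into one horizontal band per processor, each band initially owns the whole frame range, and when a processor runs out of work, the longest remaining frame subsequence of any band is split in half. The original owner keeps the first half so it continues to benefit from frame coherence, while the new owner must render its first frame from scratch. `Task`, `initial_tasks` and `split_largest` are illustrative names:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Task:
    rows: Tuple[int, int]    # image rows [start, end) of the band
    frames: Tuple[int, int]  # frames [first, last] still to render, inclusive

def initial_tasks(height: int, num_procs: int, num_frames: int) -> List[Task]:
    """One image band per processor, each covering the entire run."""
    bounds = [height * p // num_procs for p in range(num_procs + 1)]
    return [Task((bounds[p], bounds[p + 1]), (1, num_frames))
            for p in range(num_procs)]

def split_largest(tasks: List[Task]) -> Optional[Task]:
    """Hand the second half of the longest remaining subsequence to an idle processor."""
    busiest = max(tasks, key=lambda t: t.frames[1] - t.frames[0], default=None)
    if busiest is None or busiest.frames[1] - busiest.frames[0] < 1:
        return None                          # nothing left that can be split
    first, last = busiest.frames
    mid = (first + last + 1) // 2
    busiest.frames = (first, mid - 1)        # owner keeps the coherent first half
    new = Task(busiest.rows, (mid, last))    # new owner starts with a full frame
    tasks.append(new)
    return new
```

Applying `split_largest` repeatedly gives the recursive subdivision of subsequences; in practice `frames` would be updated as each owner completes frames.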

Page 118: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Screen Subdivision

Page 119: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Load Balancing

Coarse bin packing: find block with smallest number of computed frames

Keep statistics on average first frame time and average coherent frame time

Find a hole in the sequence

Leave some free frames before new start
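The slide lists the steps but not the exact arithmetic, so the sketch below is one possible reading under stated assumptions. Each block keeps the set of frames already assigned to it, a hole is the longest run of unassigned frames, and the new start frame is chosen so that the current owner (working forward from the hole start) and the new processor (which first pays for a non-coherent first frame) are expected to finish together. Processor speed is taken as relative throughput. This equal-finish-time rule is our assumption and is not necessarily the formula used in the course:

```python
from typing import Dict, Set, Tuple

def pick_block(computed: Dict[str, Set[int]]) -> str:
    """Coarse bin packing: the block with the fewest computed frames."""
    return min(computed, key=lambda b: len(computed[b]))

def largest_hole(assigned: Set[int], num_frames: int) -> Tuple[int, int]:
    """Longest run [start, end] of frames 1..num_frames not yet assigned.

    Returns (0, -1) when no frame is free.
    """
    best, run_start = (0, -1), None
    for f in range(1, num_frames + 2):          # extra sentinel frame closes the last run
        free = f <= num_frames and f not in assigned
        if free and run_start is None:
            run_start = f
        elif not free and run_start is not None:
            if f - run_start > best[1] - best[0] + 1:
                best = (run_start, f - 1)
            run_start = None
    return best

def start_frame(hole_start: int, hole_end: int, first_time: float,
                avg_time: float, cur_speed: float, new_speed: float) -> float:
    """Start frame for the new processor inside the hole (equal-finish-time model)."""
    # Frames the current owner completes while the new processor renders its
    # expensive first frame: the free frames left before the new start.
    tmp_start = hole_start + (first_time / avg_time) * (cur_speed / new_speed)
    # Split the rest of the hole in proportion to the two processors' speeds.
    return tmp_start + (hole_end - tmp_start) * cur_speed / (cur_speed + new_speed)
```

For instance, for a hole from frame 11 to 20 with the timing values used in the following example (first frame time 30, average frame time 15, processor speeds 4 and 3), this model gives a start near frame 17.3; the course's own formula may place it differently.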

Page 120: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Load Balancing Example

[Figure: timeline of frames 10 to 20 marking hole start, tmp start, start frame and hole end; first frame time = 30, avg frame time = 15, current processor speed = 4, new processor speed = 3. Tmp start is derived from hole start, first frame time / avg frame time and the ratio of the two processor speeds; start frame is derived from tmp start, (hole end - tmp start) / 2 and the same speed ratio.]

Page 121: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Results - Parallel Frame Coherence

Measure | Standard algorithm | Parallel with 8 machines | Speedup | Parallel frame coherence with 8 machines | Speedup
Total number of rays | 47,841,269 | 49,161,582 | 1.03 | 18,299,347 | 0.38
Total parse time | 0:48 | -- | -- | -- | --
First frame rendering time | 6:34 | -- | -- | -- | --
Average frame rendering time | 7:15 | 1:05 | 6.7 | :34 | 12.9
Total frame rendering time | 5:26:55 | 49:49 | 6.6 | 25:47 | 12.9

Page 122: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Results

Measure | Standard algorithm | Frame coherence algorithm | Ratio of frame coherence to standard | Speedup
Total number of rays | 15,731,252 | 6,386,883 | 0.41 | --
Total parse time | 0:11 | 0:19 | 1.73 | --
First frame rendering time | 2:39 | 3:19 | 1.25 | 0.80
Average frame rendering time | 2:42 | 1:39 | 0.61 | 1.64
Total frame rendering time | 2:42:26 | 1:39:02 | 0.61 | 1.64

Number of processors | Total number of rays | Ratio to single processor | Average frame rendering time | Total rendering time | Speedup
1 | 15,731,252 | 1.00 | 2:42 | 2:42:26 | 1.00
2 | 5,890,290 | 0.37 | :38 | 38:25 | 4.23
4 | 5,913,926 | 0.38 | :22 | 22:12 | 7.31
8 | 6,063,338 | 0.39 | :16 | 16:28 | 9.86
12 | 6,086,781 | 0.39 | :12 | 11:37 | 13.98
16 | 6,323,673 | 0.40 | :11 | 10:50 | 14.99

Page 123: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Another Test Animation

Soda Worship (60 frames at 160x120; 839 objects)

Page 124: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Another Test Animation

Page 125: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Results

Measure | Standard algorithm | Frame coherence algorithm | Ratio of frame coherence to standard | Speedup
Total number of rays | 44,454,548 | 19,944,939 | 0.45 | --
Total parse time | 3:06 | 3:47 | 1.04 | --
First frame rendering time | 27:54 | 29:14 | 1.07 | 0.94
Average frame rendering time | 28:07 | 15:07 | 0.54 | 1.86
Total frame rendering time | 28:10:10 | 15:11:27 | 0.54 | 1.85

Number of processors | Total number of rays | Ratio to single processor | Average frame rendering time | Total rendering time | Speedup
1 | 44,454,548 | 1.00 | 28:10 | 28:10:10 | 1.00
2 | 22,163,526 | 0.50 | 15:11 | 11:48:11 | 2.39
4 | 22,286,422 | 0.50 | 7:45 | 4:27:26 | 6.32
8 | 22,409,023 | 0.50 | 3:58 | 2:16:34 | 12.38
12 | 23,125,140 | 0.52 | 2:38 | 1:31:05 | 18.56
16 | 23,180,741 | 0.52 | 2:02 | 1:12:15 | 23.39

Page 126: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Results Discussion

Good speedup

Multiplicative speedup when frame coherence and distributed computing are combined

Speedup limitations
• voxel approximation
• writing pixels to frame files (communication)

Page 127: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Conclusions

Frame coherence algorithm combined with distributed computing provides good speedup

Algorithm scales well

Techniques are useful and accessible to a wide variety of users

Benefits depend on inherent properties of the animation

Page 128: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Shameless Advertisement

Master of Fine Arts in Computing (MFAC)
• special effects and animation courses
• two-year program

Clemson Computer Animation Festival in Fall 2002

Page 129: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Schedule

Introduction Parallel / Distributed Rendering Issues Classification of Parallel Rendering Systems Practical Applications

• Rendering at Clemson / Distributed Computing and Spatial/Temporal Coherence

• Interactive Ray Tracing (Reinhard)

• Parallel Rendering and the Quest for Realism: The Kilauea Massively Parallel Ray Tracer

Summary / Discussion

Page 130: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Overview

Introduction

Interactive ray tracer

Animation and interactive ray tracing

Sample re-use techniques

Page 131: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Introduction

Page 132: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Interactive Ray Tracing

Renders effects not available using other rendering algorithms

Feasible on high-end supercomputers provided suitable hardware is chosen

Scales sub-linearly in scene complexity

Scales almost linearly in number of processors

Page 133: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Hardware Choices

Shared memory vs. distributed memory

Latency and throughput for pixel communication

Choice: shared memory
• This section of the course focuses on SGI Origin series supercomputers

Page 134: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Shared Memory

Shared address space

Physically distributed memory

ccNUMA architecture

Page 135: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

SGI Origin 2000 Architecture

Page 136: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Implications

ccNUMA machines are easy to program, but it is more difficult to generate efficient code

Memory mapping and processor placement may be important for certain applications

This topic returns later in this course

Page 137: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Overview

Introduction

Interactive ray tracer

Animation and interactive ray tracing

Sample re-use techniques

Page 138: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Interactive Ray Tracing

Page 139: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Basic Algorithm

Master-slave configuration

Master (display thread) displays results and farms out ray tasks

Slaves produce new rays

Task size reduced towards end of each frame
• Load balancing
• Cache coherence
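A sketch of the master-slave scheme with shrinking task sizes. Assumptions: pixels are numbered 0..N-1, `trace(p)` is a hypothetical stand-in for tracing one pixel, and task sizes follow a guided self-scheduling rule (roughly the remaining work divided by twice the number of workers, never below a minimum). Large early tasks keep cache coherence high; small final tasks even out the load at the end of the frame. Python threads stand in for processors here, so this shows the scheduling only; in CPython they do not trace in parallel.

```python
import queue
import threading
from typing import Callable, List, Tuple

def make_tasks(num_pixels: int, num_workers: int,
               min_size: int = 16) -> List[Tuple[int, int]]:
    """(start, count) tasks whose size shrinks towards the end of the frame."""
    tasks, start = [], 0
    while start < num_pixels:
        remaining = num_pixels - start
        size = min(remaining, max(min_size, remaining // (2 * num_workers)))
        tasks.append((start, size))
        start += size
    return tasks

def render_frame(num_pixels: int, num_workers: int,
                 trace: Callable[[int], float]) -> List[float]:
    framebuffer = [0.0] * num_pixels
    work = queue.Queue()
    for task in make_tasks(num_pixels, num_workers):
        work.put(task)                       # master farms out ray tasks

    def worker() -> None:                    # slave: take tasks until none are left
        while True:
            try:
                start, count = work.get_nowait()
            except queue.Empty:
                return
            for p in range(start, start + count):
                framebuffer[p] = trace(p)    # each pixel written by exactly one worker

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return framebuffer                       # master would now display the frame
```

Usage: `render_frame(640 * 480, 8, my_trace)` returns one value per pixel, where `my_trace` is whatever per-pixel routine the renderer provides.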

Page 140: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Tracing a Single Ray

Use spatial subdivisions for ray acceleration (assumed familiar)

Use grid or bounding volume hierarchy

Could be optimized further, but good results have been obtained with these acceleration structures

Efficiency mainly due to low level optimization
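The slides assume familiarity with grid traversal. For reference, here is a minimal sketch of the standard incremental voxel walk through a uniform grid (the 3D DDA of Amanatides and Woo), yielding voxel indices in the order the ray visits them. The grid origin, cell size and resolution are parameters of this sketch, and the ray is assumed to start inside the grid:

```python
import math
from typing import Iterator, Sequence, Tuple

def grid_traverse(origin: Sequence[float], direction: Sequence[float],
                  grid_min: Sequence[float], cell: Sequence[float],
                  res: Sequence[int]) -> Iterator[Tuple[int, int, int]]:
    """Yield the (i, j, k) cells pierced by a ray starting inside the grid."""
    if not any(direction):
        raise ValueError("direction must be non-zero")
    idx, step, t_max, t_delta = [], [], [], []
    for d in range(3):
        i = int(math.floor((origin[d] - grid_min[d]) / cell[d]))
        idx.append(min(max(i, 0), res[d] - 1))
        if direction[d] > 0:
            step.append(1)
            boundary = grid_min[d] + (idx[d] + 1) * cell[d]
        elif direction[d] < 0:
            step.append(-1)
            boundary = grid_min[d] + idx[d] * cell[d]
        else:                                   # never crosses a boundary on this axis
            step.append(0)
            t_max.append(math.inf)
            t_delta.append(math.inf)
            continue
        t_max.append((boundary - origin[d]) / direction[d])  # t at first boundary
        t_delta.append(cell[d] / abs(direction[d]))         # t between boundaries
    while all(0 <= idx[d] < res[d] for d in range(3)):
        yield tuple(idx)
        d = min(range(3), key=lambda a: t_max[a])           # nearest boundary axis
        idx[d] += step[d]
        t_max[d] += t_delta[d]
```

A ray tracer intersects the objects listed in each yielded cell and stops at the first cell that contains a hit lying inside that cell, which is what makes the cost sub-linear in scene size.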

Page 141: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Low Level Optimization

Ray tracing in general:
• Ray coherence: neighboring rays tend to intersect the same objects

• Cache coherence: objects previously intersected are likely to still reside in cache for current ray

• Memory access patterns are important (next slide)

Page 142: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Memory Access

On SGI Origin series computers:
• Memory allocated for a specific process may be located elsewhere in the machine, so reading it may be expensive
• Processes may migrate to other processors when executing a system call; the whole cache then becomes invalid, and previously local memory may now be remote and more expensive to access

Page 143: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Memory Access (2)

Pin down processes to processors

Allocate memory close to the processes that will use it

Use sysmp and sproc for processor placement

Use mmap or dplace for memory placement
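sysmp, sproc, mmap and dplace are IRIX facilities. As a rough modern analogue only (a swapped-in technique, not what the slides describe), a Linux process can pin itself to a CPU with Python's `os.sched_setaffinity`. The sketch plans a round-robin placement of workers over the CPUs this process is allowed to use; each worker would then call `pin_current_process` as the first thing it does. Memory placement is not covered here.

```python
import os
from typing import List

def plan_placement(num_workers: int) -> List[int]:
    """Round-robin CPU assignment over the CPUs this process may run on."""
    if hasattr(os, "sched_getaffinity"):       # Linux
        cpus = sorted(os.sched_getaffinity(0))
    else:                                      # no affinity API in os on this platform
        cpus = list(range(os.cpu_count() or 1))
    return [cpus[w % len(cpus)] for w in range(num_workers)]

def pin_current_process(cpu: int) -> bool:
    """Pin the calling process to `cpu`; returns False where unsupported."""
    if not hasattr(os, "sched_setaffinity"):
        return False
    os.sched_setaffinity(0, {cpu})             # 0 means "the calling process"
    return True
```

Pinning once at start-up keeps each worker on one processor, so its cache contents and nearby memory stay useful, which is the goal the slide describes.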

Page 144: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Further Low Level Optimizations

Know the architecture you work on (Appendix III.A in the course notes)

Use profiling to find expensive bits of code and cache misses (Appendix III.B in the course notes)

Use padding to fit important data structures on a single cache line
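A small illustration of padding, written with Python's ctypes so that the structure layout is explicit. The 128-byte line size is an assumption chosen for illustration; use the figure for the processor at hand. Padding each per-processor record to exactly one line means that, provided the array itself starts on a line boundary, a record never straddles two lines and two processors never write to the same line (false sharing). `RayStats` and its fields are hypothetical.

```python
import ctypes

CACHE_LINE = 128  # assumed cache-line size in bytes

class RayStats(ctypes.Structure):
    """Per-processor counters, padded to exactly one cache line."""
    _fields_ = [
        ("rays_traced", ctypes.c_uint64),
        ("pixels_done", ctypes.c_uint64),
        ("cache_misses", ctypes.c_uint64),
        ("_pad", ctypes.c_uint8 * (CACHE_LINE - 3 * 8)),  # fill the rest of the line
    ]

assert ctypes.sizeof(RayStats) == CACHE_LINE

# One record per processor. The records are line-sized, but ctypes does not
# guarantee line-aligned addresses; that needs an aligned allocation.
stats = (RayStats * 8)()
stats[0].rays_traced += 1
```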

Page 145: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Frameless Rendering

Display pixel as soon as it is computed

No concept of frames

• Perceptually preferable

• Equivalent of a full frame takes longer to compute

• Less efficient exploitation of cache coherence

• This alternative will return later in this course
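A minimal sketch of frameless rendering, assuming pixels are revisited in a fixed shuffled order (a random order is a common choice, because it spreads updates evenly over the screen) and that `shade(p, t)` is a hypothetical routine that samples the scene at time `t`. Every update is written straight into the visible buffer and there is no frame boundary at which buffers are swapped:

```python
import random
from typing import Callable, Iterator, List

def frameless(num_pixels: int, shade: Callable[[int, float], float],
              framebuffer: List[float], clock: Callable[[], float],
              seed: int = 0) -> Iterator[int]:
    """Update one pixel at a time, indefinitely, in a shuffled order.

    Each update samples the scene at the current time and is visible at once;
    yields the index of every pixel it updates.
    """
    order = list(range(num_pixels))
    random.Random(seed).shuffle(order)
    while True:
        for p in order:
            framebuffer[p] = shade(p, clock())   # no double buffering, no swap
            yield p
```

After `num_pixels` updates every pixel has been refreshed once, which is the "equivalent of a full frame"; individual pixels, however, show different moments in time.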

Page 146: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Overview

Introduction

Interactive ray tracer

Animation and interactive ray tracing

Sample re-use techniques

Page 147: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Animation and Interactive Ray Tracing

Page 148: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Why Animation?

Once interactive rendering is feasible, walk-through is not enough

Desire to manipulate the scene interactively

Render preprogrammed animation paths

Page 149: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Issues to Be Addressed

What stops us from animating objects?

• Answer: spatial subdivisions

• Acceleration structures normally built during pre-processing

• They assume objects are stationary

Page 150: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Possible Solutions

Target applications that require a small number of objects to be manipulated/animated
• Render these objects separately
  - Traversal cost will be linear in the number of animated objects
  - Only feasible for an extremely small number of objects

Page 151: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Possible Solutions (2)

Target small number of manipulated or animated objects
• Modify existing spatial subdivisions
  - For each frame delete object from data structure
  - Update object’s coordinates
  - Re-insert object into data structure
• This is our preferred approach

Page 152: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Spatial Subdivision

Should be able to deal with:
• Rapid basic operations, such as insertion and deletion of objects
• Growth of the scene extent caused by user manipulation

Page 153: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Subdivisions Investigated

Regular grid

Hierarchical grid
• Borrows from octree spatial subdivision
• In our case this is a full tree: all leaf nodes are at the same depth

Both acceleration structures are investigated in the next few slides

Page 154: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Regular Grid Data Structure

We assume familiarity with spatial subdivisions!

Page 155: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Object Insertion Into Grid

Compute bounding box of object

Compute overlap of bounding box with grid voxels

Object is inserted into overlapping voxels

Object deletion works similarly
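The insertion, deletion and per-frame update steps can be sketched as follows. Deletion reuses the voxel list recorded at insertion time, so the object's old bounding box is not needed. The slides do not say how the interactive grid copes with expanding scenes; purely as an assumption for this sketch, voxel indices wrap around modulo the grid resolution, so an object outside the original extent still maps to some voxel and the grid never has to be rebuilt. Traversal must then check that a hit lies within the current voxel's stretch of the ray before accepting it (not shown). `InteractiveGrid` and its methods are illustrative names.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Cell = Tuple[int, int, int]

class InteractiveGrid:
    """Uniform grid supporting per-object insert, delete and update."""

    def __init__(self, origin: Sequence[float], cell: Sequence[float],
                 res: Sequence[int]):
        self.origin, self.cell, self.res = origin, cell, res
        self.cells: Dict[Cell, Set[Hashable]] = defaultdict(set)
        self.where: Dict[Hashable, List[Cell]] = {}

    def _overlap(self, lo: Sequence[float], hi: Sequence[float]) -> List[Cell]:
        """Voxels overlapped by a bounding box (indices wrap modulo res)."""
        axes = []
        for d in range(3):
            i0 = math.floor((lo[d] - self.origin[d]) / self.cell[d])
            i1 = math.floor((hi[d] - self.origin[d]) / self.cell[d])
            n = self.res[d]
            span = range(i0, i1 + 1) if i1 - i0 + 1 < n else range(n)
            axes.append(sorted({i % n for i in span}))
        return [(x, y, z) for x in axes[0] for y in axes[1] for z in axes[2]]

    def insert(self, obj: Hashable, lo: Sequence[float], hi: Sequence[float]) -> None:
        cells = self._overlap(lo, hi)            # bounding box -> overlapping voxels
        for c in cells:
            self.cells[c].add(obj)
        self.where[obj] = cells

    def delete(self, obj: Hashable) -> None:
        for c in self.where.pop(obj, []):        # same voxels as at insertion
            self.cells[c].discard(obj)

    def update(self, obj: Hashable, lo: Sequence[float], hi: Sequence[float]) -> None:
        """Per-frame step: delete, move (new bounding box), re-insert."""
        self.delete(obj)
        self.insert(obj, lo, hi)
```

The cost of an update grows with the number of voxels the object's bounding box covers, which matches the slide's observation that insertion cost depends on relative object size.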

Page 156: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Extensions to Regular Grid

Dealing with expanding scenes requires

• Modifications to object insertion/deletion

• Ray traversal

Page 157: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Extensions to Regular Grid (2)

Page 158: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Features of New Grid Data Structure

We call this an ‘Interactive Grid’
• Straightforward object insertion/deletion

• Deals with expanding scenes

• Insertion cost depends on relative object size

• Traversal cost somewhat higher than for regular grid

Page 159: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Hierarchical Grid

Objectives
• Reduce insertion/deletion cost for larger objects

• Retain advantages of interactive grid

Page 160: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Hierarchical Grid (2)

Page 161: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Hierarchical Grid (3)

Build full octree with all leaf nodes at the same level
• Allow objects to reside in leaf nodes as well as in nodes higher up in the hierarchy
• Each object can be inserted into one or more voxels of at most one level in the hierarchy
• Small objects reside in leaf nodes; large objects reside elsewhere in the hierarchy
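The slides do not say how the level for an object is chosen. One simple rule, used here purely as an assumption: put each object at the deepest level whose voxel size is at least the object's largest bounding-box extent. An interval no longer than a voxel overlaps at most two voxels per axis, so every insertion or deletion touches at most 2 x 2 x 2 = 8 voxels regardless of object size. In this sketch level 0 is the root and level L has 2^L voxels per axis over a cubical scene of side `scene_size`:

```python
from typing import List, Sequence, Tuple

def choose_level(lo: Sequence[float], hi: Sequence[float],
                 scene_size: float, depth: int) -> int:
    """Deepest level whose voxels are at least as large as the object."""
    extent = max(hi[d] - lo[d] for d in range(3))
    for level in range(depth, -1, -1):           # from leaves towards the root
        if scene_size / 2 ** level >= extent:
            return level
    return 0

def cells_at_level(lo: Sequence[float], hi: Sequence[float],
                   scene_min: Sequence[float], scene_size: float,
                   level: int) -> List[Tuple[int, int, int]]:
    """Voxels of one level overlapped by the bounding box (clamped to the grid)."""
    n = 2 ** level
    size = scene_size / n

    def idx(x: float, d: int) -> int:
        return min(max(int((x - scene_min[d]) // size), 0), n - 1)

    axes = [range(idx(lo[d], d), idx(hi[d], d) + 1) for d in range(3)]
    return [(x, y, z) for x in axes[0] for y in axes[1] for z in axes[2]]
```

Small objects therefore end up in the leaves, and large objects are stored higher up in a handful of big voxels instead of in many small ones.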

Page 162: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Hierarchical Grid (4)

Features:
• Deals with expanding scenes similarly to the interactive grid

• Reduced insertion/deletion cost

• Traversal cost somewhat higher than interactive grid

Page 163: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Test Scenes

Page 164: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Video

Page 165: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Measurements

We measure
• Traversal cost of
  - Interactive grid
  - Hierarchical grid
  - Regular grid
• Object update rates of
  - Interactive grid
  - Hierarchical grid

Page 166: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Framerate vs. Grid Size (Sphereflake)

Page 167: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Framerate vs. Grid Size (Triangles)

Page 168: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Framerate Over Time (Sphereflake)

Page 169: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Framerate Over Time (Triangles)

Page 170: Practical Parallel Processing for Today’s Rendering Challenges  SIGGRAPH 2001 Course 40

Conclusions

Interactive manipulation of ray traced scenes is both desirable and feasible using these modifications to regular and hierarchical grids

The modifications have only a slight impact on traversal cost (more results are available in the course notes)

Page 171

Overview

Introduction

Interactive ray tracer

Animation and interactive ray tracing

Sample re-use techniques

Page 172

Sample Re-use Techniques

Page 173

Brute Force Ray Tracing

Enables interactive ray tracing

Does not allow large image sizes

Does not scale to scenes with high depth complexity

Page 174

Solution

Exploit temporal coherence

Re-use results from previous frames

Page 175

Practical Solutions

Tapestry (Simmons et al. 2000)

• Focuses on complex lighting simulation

Render cache (Walter et al. 1999)

• Addresses scene complexity issues

• Explained next

Parallel render cache (Reinhard et al. 2000)

• Builds on Walter’s render cache

• Explained next

Page 176

Render Cache Algorithm

Basic setup:

• One front-end for:

  Displaying pixels

  Managing previous results

• Parallel back-end for:

  Producing new pixels

Page 177

Render Cache Front-end

Frame-based rendering

For each frame do:

• Project existing points

• Smooth image and display

• Select new rays using heuristics

• Request samples from back-end

• Insert new points into point cloud
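The per-frame loop above can be sketched as a toy, single-process Python program. Everything concrete here is an assumption for illustration, not the render cache's actual code: a pinhole camera at (0, 0, cam_z) looking down -z, a grey-level point cloud, a z-buffer for projection, "every pixel left empty" as the ray-selection heuristic, and a hypothetical `trace` callback standing in for the parallel back-end. Smoothing is omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float
    z: float
    color: float  # grey level, for brevity


def project(p, cam_z, width, height, focal):
    """Project a point with an assumed pinhole camera at (0, 0, cam_z) looking down -z."""
    depth = cam_z - p.z
    if depth <= 0:
        return None  # behind the camera
    u = math.floor(width / 2 + focal * p.x / depth)
    v = math.floor(height / 2 + focal * p.y / depth)
    if 0 <= u < width and 0 <= v < height:
        return u, v, depth
    return None


def render_frame(cloud, cam_z, width, height, focal, trace):
    """One front-end frame: project, (smooth), select rays, request, insert."""
    zbuf = [[math.inf] * width for _ in range(height)]
    image = [[None] * width for _ in range(height)]
    # 1. Project existing points; the nearest point wins each pixel.
    for p in cloud:
        hit = project(p, cam_z, width, height, focal)
        if hit is not None:
            u, v, depth = hit
            if depth < zbuf[v][u]:
                zbuf[v][u] = depth
                image[v][u] = p.color
    # 2. Smoothing and display are omitted in this sketch.
    # 3. Select new rays: here, simply every pixel left empty.
    requests = [(u, v) for v in range(height) for u in range(width)
                if image[v][u] is None]
    # 4. Request samples from the hypothetical back-end callback.
    new_points = [trace(u, v) for u, v in requests]
    # 5. Insert the new points into the point cloud for later frames.
    cloud.extend(new_points)
    return image, len(requests)
```

Calling `render_frame` twice from the same viewpoint requests every pixel the first time and none the second; moving the camera leaves holes that trigger new requests, which is where the re-use of previous frames pays off.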

Page 178

Render Cache

Page 179

Render Cache (2)

Point reprojection is relatively cheap

Smooth camera movement for small images

Does not scale to large images or large numbers of renderers: the front-end becomes the bottleneck

Page 180

Parallel Render Cache

Aim: remove the front-end bottleneck

• Distribute point reprojection functionality

• Integrate point reprojection with renderers

• Front-end only displays results
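One way to picture this distribution: the image is cut into tiles, each renderer runs the whole reprojection loop for the tiles it owns, and the front-end merely pastes finished tiles into the display. The tile size, the static round-robin assignment and all function names below are assumptions for illustration; the slides do not specify how the parallel render cache assigns tiles.

```python
def make_tiles(width, height, tile_w, tile_h):
    """Cover the image with (x0, y0, x1, y1) tiles; edge tiles may be smaller."""
    return [(x, y, min(x + tile_w, width), min(y + tile_h, height))
            for y in range(0, height, tile_h)
            for x in range(0, width, tile_w)]


def assign_tiles(tiles, n_renderers):
    """Static round-robin assignment: renderer r owns tiles r, r + n, r + 2n, ..."""
    return {r: tiles[r::n_renderers] for r in range(n_renderers)}


def render_tile(tile, shade):
    """Stand-in for one renderer's reprojection plus ray tracing of a tile."""
    x0, y0, x1, y1 = tile
    return tile, [[shade(x, y) for x in range(x0, x1)] for y in range(y0, y1)]


def display(width, height, finished_tiles):
    """Front-end: only copies finished tiles into the frame buffer."""
    frame = [[None] * width for _ in range(height)]
    for (x0, y0, x1, y1), pixels in finished_tiles:
        for j, row in enumerate(pixels):
            frame[y0 + j][x0:x1] = row
    return frame
```

In this picture each renderer keeps only the points of its own tiles, so a point that crosses a tile border during camera motion is not available to the neighbouring renderer; in our reading, that is one plausible source of artifacts at tile boundaries.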

Page 181

Parallel Render Cache (2)

Page 182

Parallel Render Cache (3)

Features:

• Scalable behavior for scene complexity

• Scalable in number of processors

• Allows larger images to be rendered

• Retains artifacts from render cache

• Introduces new artifacts

Page 183

Artifacts

Render cache artifacts at tile boundaries

Image deteriorates during camera movement

These artifacts are deemed more acceptable than a loss of smooth camera movement!

Page 184

Video

Page 185

Test Scenes

Page 186

Results

Sub-parts of the algorithm are measured individually:

• Measure time per call to subroutine

• Sum over all processors and all invocations

• Afterwards divide by number of processors and number of invocations

• Results are measured in events per second per processor
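The averaging described above can be written out as a small helper. It assumes each call processes a fixed number of events (for example one sample or one reprojected point); `call_times[p]` holds the measured duration of every call on processor p. The function name is hypothetical.

```python
def events_per_second_per_processor(call_times, events_per_call=1):
    """Average per-processor throughput from per-call timings.

    call_times: one list of call durations (seconds) per processor.
    """
    # Sum over all processors and all invocations.
    total_time = sum(sum(times) for times in call_times)
    n_calls = sum(len(times) for times in call_times)
    if n_calls == 0 or total_time == 0:
        raise ValueError("no timing data")
    # When every processor makes the same number of calls N, n_calls equals
    # processors * N, so this is the division by processors and invocations.
    mean_call_time = total_time / n_calls
    return events_per_call / mean_call_time
```

For example, two processors each timing two calls that total one second per processor give a mean of 0.5 s per call, i.e. 2 calls per second per processor.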

Page 187

Scalability (Teapot Model)

Page 188

Scalability (Room Model)

Page 189

Samples Per Second

Page 190

Reprojections Per Second

Page 191

Conclusions

Exploiting temporal coherence gives significantly smoother results than brute-force ray tracing alone

This comes at the cost of some artifacts, which require further investigation

(More results are available in the course notes)

Page 192

Acknowledgements

Thanks to:

• Steven Parker for writing the interactive ray tracer in the first place

• Brian Smits, Peter Shirley and Charles Hansen for involvement in the animation and parallel point reprojection projects

• Bruce Walter and George Drettakis for the render cache source code

Page 193

Schedule

Introduction

Parallel / Distributed Rendering Issues

Classification of Parallel Rendering Systems

Practical Applications

• Rendering at Clemson / Distributed Computing and Spatial/Temporal Coherence

• Interactive Ray Tracing

• Parallel Rendering and the Quest for Realism: The Kilauea Massively Parallel Ray Tracer (Kato)

Summary / Discussion

Page 194

Outline

What is Kilauea?

Parallel ray tracing & photon mapping

Kilauea architecture

Shading logic

Rendering results

Page 195

Outline

What is Kilauea?

Parallel ray tracing & photon mapping

Kilauea architecture

Shading logic

Rendering results

Page 196

Objective

Global illumination

Extremely complex scenes

Page 197

Parallel Processing

Hardware:

• Multi-CPU machine

• Linux PC cluster

Software:

• Threading (Pthreads)

• Message passing (MPI)

Page 198

Our Render Farm

Page 199

Global Illumination

Photon map

Page 200

Ray Tracing Renderer

[Figure: ray tracing renderer stages Read Scene, Ray Tracing, Shading and Output, shown against machines A, B and C]

Page 201

Ray Tracing Renderer

[Figure: ray tracing renderer stages Read Scene, Ray Tracing, Shading and Output, shown against machine groups A B C, D E F and G H I]

Page 202

Ray Tracing Renderer

[Figure: ray tracing renderer stages Read Scene, Ray Tracing, Shading and Output, shown against machine groups A B C, D E F and G H I]

Page 203

Outline

What is Kilauea?

Parallel ray tracing & photon mapping

Kilauea architecture

Shading logic

Rendering results

Page 204

Parallel Ray Tracing

Simple case

Complex case

Page 205

Parallel Ray Tracing

Simple case

Complex case

Page 206

Accel Grid

Hierarchical uniform grid

[Figure: acceleration grid built over the scene data]

Page 207

Simple Case (scene distribution)

[Figure: the scene data is copied to both Machine A and Machine B]

Page 208

Simple Case (ray tracing)

[Figure: the screen is divided between Machine A and Machine B]
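Because every machine holds the full scene in the simple case, any pixel can be traced on any machine and the screen can be split freely. The sketch below assumes an interleaved-scanline split, a common choice for load balance; the slides do not say how Kilauea divides the screen, and `shade` is a hypothetical stand-in for tracing one pixel.

```python
def rows_for_machine(machine, n_machines, height):
    """Interleaved scanlines: machine m renders rows m, m + n, m + 2n, ..."""
    return range(machine, height, n_machines)


def render_share(machine, n_machines, width, height, shade):
    """Trace every pixel of this machine's rows against its full scene copy."""
    return {(x, y): shade(x, y)
            for y in rows_for_machine(machine, n_machines, height)
            for x in range(width)}


def merge(shares):
    """Combine the disjoint per-machine shares into one image keyed by pixel."""
    image = {}
    for share in shares:
        if image.keys() & share.keys():
            raise ValueError("a pixel was rendered twice")
        image.update(share)
    return image
```

Interleaving rows spreads expensive image regions over all machines, so no machine is stuck with, say, the whole horizon of a scene.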

Page 209

Parallel Ray Tracing

Simple case

Complex case

Page 210

Complex Case (scene distribution)

[Figure: the scene data is distributed randomly between Machine A and Machine B]
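In the complex case the scene is too large to copy, so each primitive lives on exactly one machine. A minimal sketch of a random assignment follows; the seeded generator and the function name are our choices. A random split should, on average, balance memory and per-ray work, since no machine receives a spatially coherent chunk of the scene.

```python
import random


def distribute_randomly(primitives, n_machines, seed=0):
    """Give each primitive to exactly one machine, chosen uniformly at random."""
    rng = random.Random(seed)  # seeded so every run produces the same split
    parts = [[] for _ in range(n_machines)]
    for prim in primitives:
        parts[rng.randrange(n_machines)].append(prim)
    return parts
```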

Page 211

Complex Case (accel grid construction)

Independent construction

Aligned by table

[Figure: Machine A and Machine B each build their own acceleration grid]
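Each machine builds its acceleration structure only over its own primitives. One simple way to keep independently built grids aligned, assumed here rather than taken from the slides, is to give every machine the same global scene bounds and resolution, so that cell (i, j, k) denotes the same region of space everywhere. The sketch uses a single-level uniform grid for brevity, whereas Kilauea uses a hierarchical uniform grid; how its alignment table works is not described.

```python
import math
from collections import defaultdict


def cell_span(lo, hi, origin, cell, res):
    """Cell indices covered by [lo, hi] along one axis, clamped to the grid."""
    i0 = min(res - 1, max(0, math.floor((lo - origin) / cell)))
    i1 = min(res - 1, max(0, math.floor((hi - origin) / cell)))
    return range(i0, i1 + 1)


def build_grid(boxes, scene_min, scene_max, res):
    """Uniform grid over shared scene bounds.

    boxes: (prim_id, (min_x, min_y, min_z), (max_x, max_y, max_z)) tuples,
    only for the primitives held by this machine.
    """
    cell = [(scene_max[a] - scene_min[a]) / res for a in range(3)]
    grid = defaultdict(list)
    for pid, bmin, bmax in boxes:
        spans = [cell_span(bmin[a], bmax[a], scene_min[a], cell[a], res)
                 for a in range(3)]
        for i in spans[0]:
            for j in spans[1]:
                for k in spans[2]:
                    grid[(i, j, k)].append(pid)
    return grid
```

Because the bounds are shared, merging the per-machine grids cell by cell gives exactly the grid one machine would have built over the whole scene.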

Page 212

Complex Case (ray tracing)

[Figure: Machine A and Machine B both trace rays for the screen, and their results are compared]
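When the scene is split, a ray's true hit is the nearest of the hits found on all machines. We read the slide's "Compare Results" as that reduction; this is our interpretation. In the sketch, each hypothetical machine holds a few spheres, traces the ray locally, and reports its nearest hit or `None`.

```python
import math
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    t: float       # distance along the ray
    prim_id: int
    machine: str


def intersect_sphere(origin, direction, center, radius, eps=1e-9):
    """Smallest t > eps where origin + t * direction hits the sphere (unit direction)."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(a * d for a, d in zip(oc, direction))
    cc = sum(a * a for a in oc) - radius * radius
    disc = b * b - cc
    if disc < 0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):
        if t > eps:
            return t
    return None


def local_trace(machine, spheres, origin, direction) -> Optional[Hit]:
    """Nearest hit among this machine's own primitives only."""
    best = None
    for pid, center, radius in spheres:
        t = intersect_sphere(origin, direction, center, radius)
        if t is not None and (best is None or t < best.t):
            best = Hit(t, pid, machine)
    return best


def compare_results(hits) -> Optional[Hit]:
    """Keep the closest of the per-machine hits (None means no hit there)."""
    found = [h for h in hits if h is not None]
    return min(found, key=lambda h: h.t) if found else None
```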

Page 213

Outline

What is Kilauea?

Parallel ray tracing & photon mapping

Kilauea architecture

Shading logic

Rendering results

Page 214

Parallel Photon Mapping

Photon trace

Photon lookup

Page 215

Parallel Photon Mapping

Photon trace

Photon lookup

Page 216

Photon Tracing (simple case)

[Figure: traced photons are stored in a single photon map]

Page 217

Photon Tracing (complex case)

[Figure: photons are stored randomly in Photon Map A on Machine A or Photon Map B on Machine B]

Page 218

Parallel Photon Mapping

Photon trace

Photon lookup

Page 219

Photon Lookup (simple case)

[Figure: Machine A and Machine B each hold a photon map; on each machine a lookup request returns an irradiance value]

Page 220

Photon Lookup (complex case)

[Figure: Machine A holds Photon Map A and Machine B holds Photon Map B; labels: lookup request, irradiance calculation, irradiance value copy]
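With photons stored randomly, no single machine holds all the photons near a shading point, so a lookup has to consult every map. The sketch below assumes a fixed-radius density estimate (total photon power within radius r divided by the disc area πr²). Under that assumption the per-machine partial estimates simply add up to the estimate from the full map. This is not necessarily what Kilauea does, and with k-nearest-neighbour lookups the partial results would not combine this simply.

```python
import math


def partial_irradiance(photons, x, r):
    """Fixed-radius estimate from one machine's photons: power within r / (pi r^2).

    photons: list of (position, power) pairs; power is a scalar for brevity.
    """
    r2 = r * r
    power = sum(p for pos, p in photons
                if sum((a - b) ** 2 for a, b in zip(pos, x)) <= r2)
    return power / (math.pi * r2)


def distributed_irradiance(photon_maps, x, r):
    """Every machine answers the lookup request; the partial values are summed."""
    return sum(partial_irradiance(m, x, r) for m in photon_maps)
```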

Page 221

Outline

What is Kilauea?

Parallel ray tracing & photon mapping

Kilauea architecture

Shading logic

Rendering results

Page 222

Task

Mtask, Wtask, Btask, Stask, Rtask

Atask, Etask, Ltask, Ptask, Otask

Page 223

Task Assignment

[Figure: tasks assigned to Machine A, Machine B and Machine C]

Page 224

Roles of Tasks

[Figure: roles of tasks T, S, R and A; labels: pixel, Compare]

Page 225

Task Configuration

[Figure: task configuration on Machine A and Machine B; tasks A, A, T S R]

Page 226

Task Configuration

[Figure: task configuration on Machine A and Machine B; tasks A, A, T S R, T S R]

Page 227

Task Configuration

[Figure: task configuration across Machines A to F; each pair of machines (A and B, C and D, E and F) shows tasks A, A, T S R, T S R]

Page 228

Task Interaction

[Figure: task interaction between Machine A and Machine B; tasks A, A, T S R, T S R; label: pixel]

Page 229

Task Interaction

[Figure: task interaction between Machine A and Machine B; tasks A, A, T S R, T S R; label: pixel]

Page 230

Task Interaction

[Figure: task interaction between Machine A and Machine B; tasks A, A, T S R, T S R; label: pixel]

Page 231

Task Interaction

[Figure: task interaction between Machine A and Machine B; tasks A, A, T S R, T S R; label: pixel]

Page 232

Task Interaction

[Figure: task interaction between Machine A and Machine B; tasks A, A, T S R, T S R; labels: pixel, Compare]

Page 233

Task Interaction

[Figure: task interaction between Machine A and Machine B; tasks A, A, T S R, T S R; labels: pixel, Compare]

Page 234

Task Interaction (simple case)

[Figure: simple-case task interaction between Machine A and Machine B; tasks A, A, T S R, T S R]

Page 235

Roles of Tasks (photon map)

[Figure: roles of tasks with photon maps; tasks T, S, R, A, A, L, P, P; labels: Lookup, Photon Map A, Photon Map B]

Page 236

Task Configuration (photon map)

[Figure: task configuration with photon maps on Machine A and Machine B; tasks A, A, L, P, P, R S T]

Page 237

Task Configuration (photon map)

[Figure: task configuration with photon maps on Machine A and Machine B; tasks T S R, A, A, L, P, P, L, R S T]

Page 238

Task Interaction (photon map)

[Figure: task interaction with photon maps between Machine A and Machine B; tasks T S, L, P, P, L, S T]

Page 239

Task Interaction (photon map)

[Figure: task interaction with photon maps between Machine A and Machine B; tasks T S, L, P, P, L, S T; label: photon]

Page 240

Task Interaction (photon map)

[Figure: task interaction with photon maps between Machine A and Machine B; tasks T S, L, P, P, L, S T; label: photon]

Page 241

Task Interaction (photon map)

[Figure: task interaction with photon maps between Machine A and Machine B; tasks T S, L, P, P, L, S T; labels: photon, Lookup]

Page 242

Task Configuration (simple photon)

[Figure: simple photon-map task configuration on Machine A and Machine B; tasks T S R, A, A, L, P, P, L, R S T]

Page 243

Task Priority

[Figure: task priority scale from low to high; labels: pixel, Compare, photon, T, S, R, L, A, P, Lookup]
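The priority slide suggests that each task serves its incoming messages by priority rather than strictly by arrival order. A minimal sketch with Python's `heapq` follows. The message kinds echo the slide's labels, but the numeric priorities are placeholders, because the slide's low-to-high ordering cannot be recovered from the text; the class name is hypothetical.

```python
import heapq
import itertools


class TaskQueue:
    """Messages are served highest priority first, FIFO within a priority."""

    def __init__(self, priorities):
        self._priorities = priorities   # message kind -> integer priority
        self._heap = []
        self._seq = itertools.count()   # tie-breaker that preserves arrival order

    def push(self, kind, payload):
        # heapq is a min-heap, so the priority is negated.
        entry = (-self._priorities[kind], next(self._seq), kind, payload)
        heapq.heappush(self._heap, entry)

    def pop(self):
        _, _, kind, payload = heapq.heappop(self._heap)
        return kind, payload

    def __len__(self):
        return len(self._heap)


# Placeholder priorities (hypothetical values, not Kilauea's).
PRIORITIES = {"pixel": 0, "photon": 1, "compare": 2, "lookup": 3}
```

One plausible rationale for such a scheme is that answering requests other machines are waiting on (such as lookups) before starting new work keeps the whole cluster busy.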

Page 244

Outline

What is Kilauea?

Parallel ray tracing & photon mapping

Kilauea architecture

Shading logic

Rendering results

Page 245

Parallel Shading Problem

[Figure: point P with normal N, incident direction I and a reflection ray]

Cp = Cs + Cr

Page 246

Parallel Shading Problem

[Figure: point P with normal N, incident direction I and a reflection ray, with Machine A and Machine B marked]

Cp = Cs + Cr

Page 247

Parallel Shading Problem (solution)

[Figure: labels A, B, C, D, E]

Page 248

Parallel Shading Problem (solution)

[Figure: labels A, B, C, D, E]

A: C = Cs + Cr

B: C = Cs + Cr

Page 249

Parallel Shading Problem (solution)

[Figure: labels A, B, C, D, E]

A: C = Cs + Cr

B: C = Cs + Cr

Page 250

Parallel Shading Problem (solution)

[Figure: labels A, B, C, D, E]

A: C = Cs + Cr

B: C = Cs + Cr

Page 251

Parallel Shading Problem (solution)

[Figure: labels A, B, C, D, E]

A: C = Cs + Cr

B: C = Cs + Cr

Page 252

Parallel Shading Problem (solution)

[Figure: labels A, B, C, D, E]

A: C = Cs + Cr

B: C = Cs + Cr

Page 253

Parallel Shading Problem (solution)

[Figure: labels A, B, C, D, E]

A: C = Cs + Cr

B: C = Cs + Cr

Page 254

Parallel Shading Problem (solution)

[Figure: labels A, B, C, D, E]

A: C = Cs + Cr

B: C = Cs + Cr

Page 255

Decomposing Shading Computation

[Figure: a single shading calculation]

Page 256

Decomposing Shading Computation

[Figure: the shading calculation split into funcA and funcB, plus an outside task]

Page 257

Decomposing Shading Computation

[Figure: the shading calculation split into funcA and funcB plus an outside task, shown again as two SPOTs plus an outside task]

Page 258

SPOT

SPOT = Method + Data

[Figure: a SPOT with its data slot]

Page 259

SPOT Condition

Page 260

Parallel Shading Solution using SPOT

[Figure: SPOT A and SPOT B across Machine A and Machine B; labels: Outside Task, Cs, Cr, C = Cs + Cr, Reflection Ray]
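The SPOT idea (a method bundled with data slots, run once its condition holds) resembles a dataflow node or a continuation. The sketch below is our reading, not Kilauea's implementation: we assume the SPOT condition is "all data slots filled", and we treat Cs as the locally computed shading colour and Cr as the colour returned along the reflection ray, which may arrive later from another machine. The class name, slot names and `on_done` callback are hypothetical.

```python
class Spot:
    """A method plus named data slots; the method runs once every slot is filled."""

    def __init__(self, method, slot_names, on_done):
        self.method = method
        self.slots = dict.fromkeys(slot_names)  # all slots start empty (None)
        self.on_done = on_done                  # receives the method's result
        self.fired = False

    def fill(self, name, value):
        if name not in self.slots:
            raise KeyError(name)
        self.slots[name] = value
        # Assumed SPOT condition: every slot holds a value.
        if not self.fired and all(v is not None for v in self.slots.values()):
            self.fired = True
            self.on_done(self.method(**self.slots))


def add_colors(Cs, Cr):
    """C = Cs + Cr, component-wise on RGB tuples."""
    return tuple(s + r for s, r in zip(Cs, Cr))
```

The design point is that Machine A never blocks waiting for Cr: it fills Cs, moves on to other work, and the addition runs whenever the reflection result arrives.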

Page 261

Parallel Shading Solution using SPOT

[Figure: a network of SPOTs spread across Machine A and Machine B, connected through an outside task; labels A, B, C]

Page 262

Shader SPOT Network Example

Page 263

Outline

What is Kilauea?

Parallel ray tracing & photon mapping

Kilauea architecture

Shading logic

Rendering results

Page 264

Rendering Results

Test machine specification:

• 1 GHz dual Pentium III

• 512 MB memory

• 100BaseT Ethernet

• 18 machines connected via 100BaseT switch

Page 265

Quatro

700,223 triangles, 1 area point & sky light, 1280 x 692

18 machines: 7 min 19 sec

Page 266

Quatro: single Atask test

[Figure: speedup versus number of machines (1 to 17), series raytrace, all and linear; rendering time (h:m:s) versus number of machines, series all and raytrace]
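The charts plot speedup against the number of machines. The usual definitions for reading them are speedup S(p) = T(1) / T(p) and efficiency E(p) = S(p) / p, where T(p) is the rendering time on p machines; "linear" corresponds to S(p) = p. The small helpers below take times in the charts' h:m:s format. The function names are ours, and any numbers used with them are illustrative, not values read from the charts.

```python
def hms_to_seconds(hms):
    """Convert 'h:m:s' (as on the chart axes) to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s


def speedup(t1, tp):
    """S(p) = T(1) / T(p)."""
    return t1 / tp


def efficiency(t1, tp, p):
    """E(p) = S(p) / p; 1.0 means perfectly linear scaling."""
    return speedup(t1, tp) / p
```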

Page 267

Jeep

715,059 triangles, 1 directional & sky light, 1280 x 692

18 machines: 8 min 27 sec

Page 268

Jeep4

2,859,636 triangles, 1 directional & sky light, 1280 x 692

18 machines: 12 min 38 sec, 2 Atasks x 1

Page 269

Jeep4: 2 Atasks test

1 Atask group = 2 machines

[Figure: speedup versus number of Atask groups (1 to 9), series raytrace, all and linear; rendering time (h:m:s) versus number of Atask groups, series all and raytrace]

Page 270

Jeep8

5,719,072 triangles, 1 directional & sky light, 1280 x 692

16 machines: 18 min 43 sec, 4 Atasks x 4

Page 271

Escape POD

468,321 triangles, 1 directional & sky light, 1280 x 692

18 machines: 14 min 55 sec

Page 272

ansGun

20,279 triangles, 1 spot & sky light, 1280 x 960

18 machines: 16 min 38 sec

Page 273

SCN101

787,255 triangles, 1 area light, 1280 x 692

18 machines: 9 min 10 sec

Page 274

Video

Page 275

Conclusion / Future Work

We achieved:

• Close-to-linear parallel performance

• Highly extensible architecture

We will achieve even more:

• Speed

• Stability

• Usability (user interface)

• Etc.

Page 276

Additional Information

Kilauea live rendering demo:

• Booth #1927, SquareUSA

http://www.squareusa.com/kilauea/

Page 277

Schedule

Introduction

Parallel / Distributed Rendering Issues

Classification of Parallel Rendering Systems

Practical Applications

Summary / Discussion (Chalmers)

Page 278

Summary

Page 279

Contact Information

Alan Chalmers [email protected]

Tim Davis [email protected]

Toshi Kato http://www.squareusa.com/kilauea/

Erik Reinhard [email protected]

Slides http://www.cs.clemson.edu/~tadavis