DDDDRRaW: A Prototype Toolkit for Distributed Real-Time Rendering on Commodity Clusters

Thu D. Nguyen and Christopher Peery, Department of Computer Science, Rutgers University

John Zahorjan, Department of Computer Science & Engineering, University of Washington

IPDPS 2001

Overview

Improve real-time rendering performance using distributed rendering on commodity clusters

• Real-time rendering -> interactive rendering applications

• Improve performance -> Render more complex scenes at interactive rates

Why real-time rendering?

• A critical component of an increasing number of continuous media applications

Virtual reality, data visualization, CAD, flight simulators, etc.

• Rendering performance will continue to be a bottleneck

Model complexity is increasing as fast as (or faster than) hardware performance

Part of the challenge is to leverage increasingly powerful hardware accelerators


Challenges

How to structure the distributed renderer to leverage hardware-assisted rendering

• Information that is useful for work partitioning and assignment may be hidden in the hardware rendering pipeline

How to minimize non-parallelizable overheads (to avoid being limited by Amdahl's law)

How to decouple the bandwidth requirement from the complexity of the scene and the cluster size


Image Layer Decomposition (ILD)

Per-frame rendering load is partitioned using ILD

• Presented at IPDPS 2000

We briefly review ILD because it shapes DDDDRRaW's architecture and performance

Basic idea: assign scene objects to nodes so that the sets of objects assigned to different nodes are not mutually occlusive
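Non-mutual occlusion is what lets the display node merge partial images without depth: with the layers sorted back to front, each one is simply painted over the running result with the Porter-Duff "over" operator. A minimal sketch, assuming each layer arrives as a flat list of premultiplied RGBA pixels and that the back-to-front layer order is already known; the function names are hypothetical, not DDDDRRaW's API.

```python
from typing import List, Tuple

# Premultiplied-alpha pixel (r, g, b, a), each component in [0, 1].
Pixel = Tuple[float, float, float, float]
Layer = List[Pixel]  # one partial image, flattened in row-major order


def over(src: Pixel, dst: Pixel) -> Pixel:
    """Porter-Duff 'over': draw src in front of dst (premultiplied alpha)."""
    k = 1.0 - src[3]
    return tuple(s + k * d for s, d in zip(src, dst))


def composite_back_to_front(layers: List[Layer], background: Pixel) -> Layer:
    """Composite equally sized layers ordered back to front; no depth values needed."""
    image = [background] * len(layers[0])
    for layer in layers:
        # A later layer may occlude an earlier one but never the reverse,
        # so painting each layer over the running result is sufficient.
        image = [over(src, dst) for src, dst in zip(layer, image)]
    return image


# Two 2-pixel layers: an opaque red object behind an opaque blue one at pixel 0.
back = [(1.0, 0.0, 0.0, 1.0), (0.0, 0.0, 0.0, 0.0)]
front = [(0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 0.0, 0.0)]
result = composite_back_to_front([back, front], (0.0, 0.0, 0.0, 1.0))
print(result)
```

Passing the layers in the wrong order lets the red object show through at pixel 0, which is why the assignment must respect a visibility order.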

Advantages of using ILD

• Do not need the 2D screen positions of polygons

This information may be hidden inside the graphics pipeline

• Do not need Z-buffer information

This reduces the required bandwidth by at least 50%
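The "at least 50%" figure follows from per-pixel sizes: whenever depth is stored with at least as many bits as color, dropping it removes at least half of each pixel's payload. A quick check, assuming a 640x480 window, 24-bit RGB color, and 24- or 32-bit depth (illustrative values, not measurements from the slides):

```python
def layer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed size of one transmitted image layer, in bytes."""
    return width * height * bits_per_pixel // 8


def depth_saving(color_bits: int, depth_bits: int) -> float:
    """Fraction of per-pixel data saved by sending color only instead of color + depth."""
    return depth_bits / (color_bits + depth_bits)


# Assumed: 640x480 window, 24-bit RGB color, and 24- or 32-bit depth values.
for depth_bits in (24, 32):
    with_depth = layer_bytes(640, 480, 24 + depth_bits)
    color_only = layer_bytes(640, 480, 24)
    print(f"{depth_bits}-bit Z: {with_depth} B -> {color_only} B "
          f"({depth_saving(24, depth_bits):.0%} saved)")
```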


Image Layer Decomposition (ILD)

[Figure: spatial partitioning of the scene into cells numbered 1-6]


ILD: Work Assignment

A non-mutually-occlusive assignment is legal for back-to-front compositing

Use a heuristic-based algorithm to

• Balance load across the cluster

• Minimize the screen real estate covered by each assignment

[Figure: an example legal assignment of cells 1-6 to rendering nodes]
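One simple way to produce a legal, load-balanced assignment is sketched below. It is a hypothetical illustration, not the paper's heuristic (which also minimizes screen coverage): sort the spatial cells into a back-to-front visibility order for the current viewpoint (for an octree, a viewpoint-dependent farthest-child-first traversal gives such an order), then cut that sequence into P contiguous runs of roughly equal estimated load. The per-cell load values (for example, polygon counts) are assumed inputs.

```python
from typing import Dict, List, Sequence


def assign_contiguous(order: Sequence[int], load: Dict[int, float], p: int) -> List[List[int]]:
    """Split cells, given in back-to-front visibility order, into at most p contiguous runs.

    Run k is rendered by node k. Because runs are contiguous in visibility order,
    cells in a later run may occlude cells in an earlier run but never the reverse,
    so the resulting image layers can be composited back to front.
    """
    remaining = sum(load[c] for c in order)
    runs: List[List[int]] = []
    current: List[int] = []
    current_load = 0.0
    for i, cell in enumerate(order):
        current.append(cell)
        current_load += load[cell]
        nodes_left = p - len(runs)
        cells_left = len(order) - i - 1
        # Close the run once it holds a fair share of the work still unassigned;
        # the last node always takes whatever remains.
        if nodes_left > 1 and cells_left > 0 and current_load >= remaining / nodes_left:
            runs.append(current)
            remaining -= current_load
            current, current_load = [], 0.0
    if current:
        runs.append(current)
    return runs


# Six equally loaded cells in a hypothetical visibility order, split across 3 nodes.
print(assign_contiguous([3, 5, 4, 1, 6, 2], {c: 1.0 for c in range(1, 7)}, 3))
```

Cutting only along the visibility order keeps every assignment legal by construction; balancing screen coverage, as the real heuristic does, would require choosing among such cuts using projected cell extents.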


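Implementation: Architecture

[Figure: DDDDRRaW architecture. The application (App.) and a DDDDRRaW library instance run on the display node, which takes the VRML scene, display window, and viewpoint, sends work assignments to the DDDDRRaW library instances on the rendering nodes, and receives partial images back for display.]

Display node:

• Partitioning

• Assignment

• Decompress

• Compositing

Rendering nodes:

• Rendering

• Compress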


Implementation Details

Implemented an optimization to ILD: dynamic selection of octants to be rendered

• Minimizes the overhead of geometric transformations caused by polygon splitting during scene decomposition

Compression of image layers before communication

• Reduces the bandwidth requirement to accommodate slower networks (e.g., 100 Mb/s LANs)
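The slides do not say which compression scheme DDDDRRaW uses. Run-length encoding is one plausible fit, since a layer rendered from only part of the scene is mostly background; the lossless sketch below is an assumption for illustration, encoding one scanline of packed RGB values as (count, value) pairs.

```python
from itertools import groupby
from typing import Hashable, List, Sequence, Tuple


def rle_encode(pixels: Sequence[Hashable]) -> List[Tuple[int, Hashable]]:
    """Collapse runs of identical pixel values into (count, value) pairs."""
    return [(len(list(run)), value) for value, run in groupby(pixels)]


def rle_decode(pairs: Sequence[Tuple[int, Hashable]]) -> List[Hashable]:
    """Expand (count, value) pairs back into the original pixel sequence."""
    out: List[Hashable] = []
    for count, value in pairs:
        out.extend([value] * count)
    return out


# A mostly empty scanline: background (0) with a short span of object pixels.
row = [0] * 10 + [0xFF0000, 0xFF0000, 0x00FF00] + [0] * 5
encoded = rle_encode(row)
assert rle_decode(encoded) == row
print(encoded)
```

Eighteen pixels shrink to four pairs here; the gain grows with the fraction of background in each layer, which in turn depends on how much screen area the assignment gives each node.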

Use dynamic clipping to enforce octant boundaries for scenes with smooth shading and/or texturing

• A simplification to ease implementation of the prototype; this clipping could (and should) be done statically

• Adds 20-25% overhead for 5 of our 6 test scenes; this overhead would not be present in a production system
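Enforcing octant boundaries means clipping each polygon to its octant's axis-aligned box so that nothing is drawn into a neighboring cell's layer. The slides do not describe DDDDRRaW's clipping code; the sketch below substitutes the standard Sutherland-Hodgman algorithm, applied one box face at a time, on vertices given as (x, y, z) tuples. Per-vertex colors and texture coordinates, which smooth shading and texturing depend on, would be interpolated with the same parameter t; that step is omitted here.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def _inside(v: Vec3, axis: int, bound: float, keep_below: bool) -> bool:
    """True if v lies on the kept side of the plane coordinate[axis] == bound."""
    return v[axis] <= bound if keep_below else v[axis] >= bound


def _crossing(a: Vec3, b: Vec3, axis: int, bound: float) -> Vec3:
    """Point where segment a->b crosses the plane coordinate[axis] == bound."""
    t = (bound - a[axis]) / (b[axis] - a[axis])
    return tuple(a[k] + t * (b[k] - a[k]) for k in range(3))


def clip_to_box(poly: List[Vec3], lo: Vec3, hi: Vec3) -> List[Vec3]:
    """Clip a convex polygon to the axis-aligned box [lo, hi] (Sutherland-Hodgman)."""
    out = list(poly)
    for axis in range(3):
        for bound, keep_below in ((lo[axis], False), (hi[axis], True)):
            if not out:
                return out  # polygon lies entirely outside the box
            inp, out = out, []
            for i, cur in enumerate(inp):
                prev = inp[i - 1]  # index -1 wraps to the last vertex
                cur_in = _inside(cur, axis, bound, keep_below)
                prev_in = _inside(prev, axis, bound, keep_below)
                if cur_in != prev_in:
                    # The edge prev->cur crosses this face: emit the crossing point.
                    out.append(_crossing(prev, cur, axis, bound))
                if cur_in:
                    out.append(cur)
    return out


# A 2x2 square in the z = 0 plane, clipped to a cell whose x-extent ends at 1.
square = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 2.0, 0.0), (0.0, 2.0, 0.0)]
print(clip_to_box(square, (-1.0, -1.0, -1.0), (1.0, 3.0, 1.0)))
```

Doing this per frame is what makes the clipping "dynamic"; clipping once when the scene is decomposed would remove the per-frame cost noted above.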


Performance Measurement

Application: VRML viewer

• VRweb – http://www.iicm.edu/vrwave

Collected 6 VRML scenes from the web

• Use fixed paths through the scenes to measure performance in terms of average frame rate (frames/sec)
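Average frame rate over a fixed path can be computed as total frames divided by total elapsed time, which differs from averaging per-frame rates. The slides do not say which definition was used; the sketch below contrasts the two on hypothetical frame times, and defines speed-up as the ratio of a configuration's average frame rate to the sequential renderer's.

```python
from typing import Sequence


def avg_fps(frame_ms: Sequence[float]) -> float:
    """Frames completed divided by total elapsed seconds along the path."""
    return len(frame_ms) / (sum(frame_ms) / 1000.0)


def mean_instantaneous_fps(frame_ms: Sequence[float]) -> float:
    """Mean of per-frame rates 1000/t; gives fast frames more weight than avg_fps."""
    return sum(1000.0 / t for t in frame_ms) / len(frame_ms)


def speedup(config_fps: float, sequential_fps: float) -> float:
    """Speed-up of a configuration's average frame rate over the sequential renderer."""
    return config_fps / sequential_fps


# Hypothetical per-frame times (ms) along a fixed path: one 20 fps and one 10 fps frame.
times = [50.0, 100.0]
print(avg_fps(times), mean_instantaneous_fps(times))
```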

Two clusters representing different points in the technology spectrum

• Cluster of 5 SGI O2s: 180 MHz MIPS R5000, 256 MB memory, SGI Graphics Accelerator, 100 Mb/s switched Ethernet LAN, IRIX 6.5.7

• Cluster of 13 PCs: 800 MHz Pentium III, 512 MB memory, Giganet 1 Gb/s cLAN, Red Hat Linux (kernel 2.2.14), Mesa 3D library version 3.2


Two Test Scenes


Overheads on SGI O2s

Time per operation (ms):

Operation            P=1     P=2     P=4
ILD                  2.08    1.97    8.68
Clear image buffer   3.50    3.50    3.50
Decompress          18.08   22.84   30.28
Display frame        0.18    0.18    0.18
Compress            36.03   27.13   17.70

ILD, Decompress, and Display frame run on the display node; Compress runs on the rendering nodes.


Overheads on PCs

Time per operation (ms):

Operation            P=1     P=4     P=8     P=12
ILD                  2.62    2.63    2.63    2.70
Clear image buffer   4.98    5.01    5.37    5.24
Decompress           3.29    4.11    4.33    4.46
Display frame       15.79   15.34   15.73   15.73
Compress             7.42    7.52    7.46    7.79

ILD, Decompress, and Display frame run on the display node; Compress runs on the rendering nodes.
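Display-node work that does not shrink as P grows puts a ceiling on frame rate, which is the Amdahl's-law concern raised under Challenges. As a back-of-the-envelope sketch, assume ILD, decompression, and frame display run serially on the display node once per frame and nothing overlaps them; the PC measurements above then bound the frame rate as follows. Clear image buffer is left out because the slide does not say which node performs it, and compositing time is not listed separately, so the real ceiling is lower.

```python
# Display-node overheads on the PC cluster, in ms, copied from the table above.
P_VALUES = [1, 4, 8, 12]
DISPLAY_NODE_MS = {
    "ILD": [2.62, 2.63, 2.63, 2.70],
    "Decompress": [3.29, 4.11, 4.33, 4.46],
    "Display frame": [15.79, 15.34, 15.73, 15.73],
}


def frame_rate_ceiling(i: int) -> float:
    """Upper bound on fps if these display-node steps run serially for every frame."""
    serial_ms = sum(times[i] for times in DISPLAY_NODE_MS.values())
    return 1000.0 / serial_ms


for i, p in enumerate(P_VALUES):
    print(f"P={p:2d}: at most {frame_rate_ceiling(i):.1f} fps")
```

The ceiling drops slightly as P grows because decompression cost rises with the number of layers.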


Speed-up of Average Frame Rate on O2s

[Figure: speed-up of average frame rate on the SGI O2 cluster for six scenes (Aztec City, Chamber, Hall, Coronary, Left Lung, CS Building); series: Sequential, P=1, P=4; y-axis: speed-up]


Speed-up of Average Frame Rate on PCs

[Figure: speed-up of average frame rate vs. number of rendering nodes (P = 1 to 12) on the PC cluster for CS Building, Hall, Chamber, Aztec City, and Coronary; y-axis: speed-up]


Speed-up of Rendering Component on PCs

[Figure: speed-up of the rendering component vs. number of rendering nodes (P = 1 to 12) on the PC cluster for Aztec City and Coronary; y-axis: speed-up]


Conclusions

An ILD-based distributed renderer can significantly improve real-time rendering performance on commodity hardware

DDDDRRaW currently scales to modestly sized clusters

• This limitation is due to non-optimal hardware configurations

• It is NOT because more suitable hardware is unavailable

• We expect good scalability to clusters of 16-32 nodes

Overlapping communication with computation increases the average frame rate, but ONLY at the expense of increased frame latency

• The problem is CPU contention between rendering and communication

• Either dedicated hardware is needed, or this optimization should only be applied once the renderer reaches 10-15 fps, the nominal interactive frame rate

Project URL: www.cs.washington.edu/research/ddddrraw/


Overlapping Communication & Computation

Communication and compression are significant sources of overhead

Apply a standard parallel optimization: overlap communication of the rendered image layers for one frame with rendering of the next

This requires pipelining DDDDRRaW


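The DDDDRRaW Pipeline

[Figure: three-stage pipeline across the display node and the rendering nodes]

Stage 1 (display node): ILD, Send

Stage 2 (rendering nodes): Receive, Render, Compress, Send

Stage 3 (display node): Receive, Decompress, Composite & Display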
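The frame-rate/latency trade-off can be seen with a simple model of the three stages. Without overlap, each frame runs every stage before the next frame starts; with overlap, a frame can finish every time the slowest stage does, but each frame still visits every stage. The stage times and the uniform contention factor below are hypothetical, a crude stand-in for the CPU contention between rendering and communication that the slides identify.

```python
from typing import Dict, Sequence


def pipeline_model(stage_ms: Sequence[float], contention: float = 1.0) -> Dict[str, float]:
    """Compare sequential and overlapped (pipelined) execution of per-frame stages.

    `contention` is a hypothetical slowdown factor applied to every stage when
    stages overlap and compete for the same CPU (1.0 means no contention).
    """
    total = sum(stage_ms)
    slowed = [t * contention for t in stage_ms]
    return {
        "sequential_fps": 1000.0 / total,
        "sequential_latency_ms": total,
        # With overlap, a frame completes every time the slowest stage finishes.
        "pipelined_fps": 1000.0 / max(slowed),
        # Each frame still passes through every stage; queueing can only add to this.
        "pipelined_min_latency_ms": sum(slowed),
    }


# Hypothetical stage times (ms) for stages 1-3 and a 30% contention slowdown.
for name, value in pipeline_model([5.0, 60.0, 25.0], contention=1.3).items():
    print(f"{name}: {value:.1f}")
```

Even with no contention the pipelined latency is no better than the sequential one; any slowdown from sharing the CPU then shows up directly as added latency, while throughput still improves as long as the bottleneck stage stays shorter than the sum of all stages.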


Average Frame Rates

[Figure: average frame rate (fps) for Aztec City, Chamber, Hall, Coronary, Left Lung, and CS Building; series: Avg Seq, Avg ST, Avg MT]


Average Frame Latency

[Figure: average frame latency (ms) for Aztec City, Chamber, Hall, Coronary, Left Lung, and CS Building; series: Avg ST, Avg MT]