Efficient Parallel Refinement for Hierarchical Radiosity on a DSM computer
iMAGIS is a joint project of CNRS - INPG - INRIA - UJF
iMAGIS-GRAVIR / IMAG
Efficient Parallel Refinement for Hierarchical Radiosity on a DSM computer
François X. Sillion, Jean-Marc Hasenfratz
iMAGIS
Radiosity
Hierarchical Radiosity
• Hierarchical representation (mesh)
• Interactions computed at appropriate level
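The idea of computing each interaction at an appropriate level can be illustrated with a toy refinement routine. This is only a sketch under stated assumptions: the `Patch` class, the four-way subdivision, and the area-product oracle are hypothetical stand-ins, not the authors' data structures or error criterion.

```python
from dataclasses import dataclass, field


@dataclass
class Patch:
    """Hypothetical mesh element: an area plus lazily created children."""
    area: float
    children: list = field(default_factory=list)

    def subdivide(self):
        # Create the four children once and reuse them afterwards.
        if not self.children:
            self.children = [Patch(self.area / 4.0) for _ in range(4)]
        return self.children


def refine(src, rcv, eps, links):
    """Link src and rcv at the coarsest level the oracle accepts."""
    # Assumed oracle: the product of areas is a crude proxy for transfer.
    if src.area * rcv.area <= eps:
        links.append((src, rcv))
        return
    # Otherwise split the larger patch and retry one level down.
    if src.area >= rcv.area:
        for child in src.subdivide():
            refine(child, rcv, eps, links)
    else:
        for child in rcv.subdivide():
            refine(src, child, eps, links)


coarse, fine = [], []
refine(Patch(1.0), Patch(1.0), eps=1.0, links=coarse)  # accepted at the top level
refine(Patch(1.0), Patch(1.0), eps=0.1, links=fine)    # both patches split once
```

A tighter tolerance pushes the links further down the hierarchy, so the number of links grows with the requested accuracy.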
Strategies for Hierarchical Radiosity
• Gathering
  – Memory consuming (store links)
  – Easier dynamic modifications
• Shooting
  – Memory efficient
  – Requires heuristic to decide shooting level
  – Links recomputed as needed
Parallel Approaches
• Two approaches:
  – Data exchange via message-passing algorithms
  – Shared memory
• Partial solutions possible if “natural” partitioning exists (e.g. inside buildings) [Fun96, FY97]
• Virtual interfaces are harder to handle [RAPP97]
• Load balancing problem [Cav99]
Scheduler
• Force all link refinement operations through a scheduler object
• Natural place for
  – Parallel synchronization
  – Orientation and steering of calculation
• Advantages of using a scheduler:
  – Global view of all pending tasks at any given time
  – Task extraction can be made according to various selection criteria
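A minimal sketch of what routing every refinement operation through one scheduler object might look like. The `Scheduler` interface, the FIFO policy, and the `toy_refine` stand-in for link refinement are all assumptions made for illustration; the slides do not specify the actual interface.

```python
from collections import deque


class Scheduler:
    """Holds every pending refinement task (hypothetical interface)."""

    def __init__(self):
        self._pending = deque()

    def push(self, task):
        self._pending.append(task)

    def pop(self):
        # The selection criterion lives here; FIFO is only a placeholder.
        return self._pending.popleft()

    def __len__(self):
        # Global view: how many tasks are pending right now.
        return len(self._pending)


def refine_all(scheduler, refine):
    """Drain the scheduler; refine(task) returns the child tasks it creates."""
    processed = 0
    while len(scheduler) > 0:
        task = scheduler.pop()
        for child in refine(task):
            scheduler.push(child)  # new work also goes through the scheduler
        processed += 1
    return processed


def toy_refine(depth):
    # Toy stand-in: a "link" is its remaining depth and splits four ways.
    return [depth - 1] * 4 if depth > 0 else []


scheduler = Scheduler()
scheduler.push(2)
total = refine_all(scheduler, toy_refine)  # 1 + 4 + 16 tasks processed
```

Because the refinement loop only ever calls `push` and `pop`, the extraction policy can be changed without touching the refinement code.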
Example (sequential) schedulers
• Stack scheduler (depth-first refinement)
• Priority scheduler
  – Use simple structure (heap)
  – Hierarchical level (breadth-first)
  – Size, energy, error
  – Interactive user control
• Random scheduler...
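The three sequential policies listed above could be sketched as follows. Class names, the `push`/`pop` signature, and the task names are hypothetical. Using the hierarchical level as the priority value is one way to obtain breadth-first order, assumed here for illustration.

```python
import heapq
import itertools
import random


class StackScheduler:
    """LIFO extraction, which gives depth-first refinement."""

    def __init__(self):
        self._items = []

    def push(self, task, priority=0.0):
        self._items.append(task)  # priority is ignored by this policy

    def pop(self):
        return self._items.pop()

    def __len__(self):
        return len(self._items)


class PriorityScheduler:
    """Heap-based extraction: smallest priority value first."""

    def __init__(self):
        self._heap = []
        self._count = itertools.count()  # tie-breaker keeps insertion order

    def push(self, task, priority=0.0):
        heapq.heappush(self._heap, (priority, next(self._count), task))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


class RandomScheduler:
    """Uniformly random extraction."""

    def __init__(self, seed=None):
        self._items = []
        self._rng = random.Random(seed)

    def push(self, task, priority=0.0):
        self._items.append(task)

    def pop(self):
        # Swap a random element to the end so removal is O(1).
        i = self._rng.randrange(len(self._items))
        self._items[i], self._items[-1] = self._items[-1], self._items[i]
        return self._items.pop()

    def __len__(self):
        return len(self._items)


def drain(scheduler, tasks):
    """Push (name, level) pairs, then pop everything and return the order."""
    for name, level in tasks:
        scheduler.push(name, priority=level)
    return [scheduler.pop() for _ in range(len(scheduler))]


tasks = [("root", 0), ("child-a", 1), ("grandchild", 2), ("child-b", 1)]
stack_order = drain(StackScheduler(), tasks)
priority_order = drain(PriorityScheduler(), tasks)
random_order = drain(RandomScheduler(seed=0), tasks)
```

The priority value could equally be a size, energy, or error estimate; only the number passed to `push` changes.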
Architecture
[Diagram: Main / GUI, Solver, and multiple Refiner components]
Synchronization
1. Scheduler
  – Single object talks to all refiners => Danger!
  – Use simple blocks of refinement jobs
2. Hierarchical data structure
  – Consistency of hierarchical scene structure
3. Interactions
  – Links or energy representations
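To illustrate why handing out blocks of jobs eases contention on the single scheduler, here is a minimal sketch using Python threads. `BlockScheduler`, `refiner`, and `run` are hypothetical names. The task list is fixed up front, whereas real refinement creates new links as it goes, so termination would need more care than shown here.

```python
import threading


class BlockScheduler:
    """Single shared scheduler that hands out blocks of tasks under a lock."""

    def __init__(self, tasks, block_size):
        if block_size < 1:
            raise ValueError("block_size must be at least 1")
        self._tasks = list(tasks)
        self._block_size = block_size
        self._lock = threading.Lock()
        self.lock_acquisitions = 0  # rough proxy for contention

    def next_block(self):
        with self._lock:
            self.lock_acquisitions += 1
            block = self._tasks[-self._block_size:]
            del self._tasks[-self._block_size:]
            return block  # empty list means no work is left


def refiner(scheduler, process, results, results_lock):
    """Worker loop: fetch a block, process it outside the scheduler lock."""
    while True:
        block = scheduler.next_block()
        if not block:
            return
        local = [process(task) for task in block]
        with results_lock:
            results.extend(local)


def run(tasks, block_size, n_threads):
    scheduler = BlockScheduler(tasks, block_size)
    results = []
    results_lock = threading.Lock()
    threads = [
        threading.Thread(
            target=refiner,
            args=(scheduler, lambda t: t * t, results, results_lock),
        )
        for _ in range(n_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(results), scheduler.lock_acquisitions


small_blocks = run(range(100), block_size=1, n_threads=4)
large_blocks = run(range(100), block_size=10, n_threads=4)
```

With 100 tasks and 4 threads, blocks of 10 need 14 scheduler accesses instead of 104, at the cost of coarser load balancing near the end of the run.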
Test scenes
VRLab - 51 182 polygons
Aircraft - 184 456 polygons
Office - 5 285 polygons
Measurements
• Hardware architecture:
  – ccNUMA SGI 2000 computer with 64 microprocessors
  – Limited to 40 R10000 microprocessors at 195 MHz
[Diagram labels: Node 0, Node 1, ..., Node 511; Proc A, Proc B; Hub Chip; Mem & Dir; IO Xbar; IO Ctrls; R; Scalable Interconnect Network]
Measurements
• Time measurements:
  – Refinement: times system call, which returns clock ticks
  – Memory access, cache access…: perfex software tool, which uses the 31 hardware counters of the R10000
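As a rough illustration of timing a computation with the times call, the sketch below uses Python's `os.times()`, which wraps the same system call but reports seconds rather than raw clock ticks. The helper names and the `busy` workload are hypothetical. `os.sysconf("SC_CLK_TCK")` gives the tick rate on Unix-like systems only.

```python
import os


def cpu_seconds():
    """User plus system CPU time of this process, in seconds."""
    t = os.times()
    return t.user + t.system


def timed(fn, *args):
    """Run fn(*args) and return (result, CPU seconds consumed)."""
    start = cpu_seconds()
    result = fn(*args)
    return result, cpu_seconds() - start


def busy(n):
    # Stand-in workload; a real run would time a refinement pass instead.
    return sum(i * i for i in range(n))


result, elapsed = timed(busy, 200_000)

# Clock ticks per second, for converting seconds back to ticks (Unix only).
ticks_per_second = os.sysconf("SC_CLK_TCK") if hasattr(os, "sysconf") else None
```

CPU time summed over all threads of a process is not the same as wall-clock time, so the two should not be mixed when comparing runs.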
Results
CPU refinement time
[Plots: refinement time [sec] versus # proc for Iterations 1 to 4. Panels: VRLab (logarithmic time axis, 0 to 40 processors), Aircraft (0 to 40 processors), Office (0 to 30 processors)]
Results
Speed-up
[Plots: speed-up versus # proc (up to 40). First plot: VRLab, Office, and Aircraft against the ideal speed-up. Second plot: Iterations 1 to 4 against the ideal speed-up]
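The slides do not define speed-up explicitly. Assuming the usual definition, with $T(p)$ the refinement time measured on $p$ processors, speed-up and parallel efficiency would read:

```latex
S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}
```

Under that assumption the "Ideal" curve corresponds to $S(p) = p$, that is, $E(p) = 1$.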
Results
Influence of the size of link blocks on overall CPU time
[Plot: Office scene, time [sec] versus # proc (0 to 40), for blocks of 1 link, 10 links, and 100 links]
Results
Memory used before and during the iterations
[Plot: memory usage [MB] and number of links at Load and Iterations 1 to 4, for the Aircraft, TD, BMW, and VRLab scenes]
Conclusions
• Very simple atomic tasks
• Easily managed with a single scheduler structure
• Easily implemented on top of an existing radiosity simulation code
  – Thread setup
  – New link creation upon refinement decision
Future work
• Understanding the peculiar behaviour observed for the aircraft scene
• Dealing with graphics resources for “optimized” calculations using graphics hardware
Acknowledgements
• Peter Kipfer contributed to the design and early implementation of this work.
• Thanks to Centre Charles Hermite for providing access to its computational resources.
• Laurent Alonso provided useful advice on performance questions.
• This work was supported in part by the European Union’s ESPRIT project #24944, ARCADE (“Making Radiosity Usable”).