Interactive parallel visualization of large particle datasets

18
Interactive parallel visualization of large particle datasets Kevin Liang a, * , Patricia Monger a , Huge Couchman b a Research & HPC Support, McMaster University, Hamilton, ON, Canada L8S 4M1 b Department of Physics and Astronomy, McMaster University, Hamilton, ON, Canada L8S 4M1 Received 16 February 2004; revised 20 December 2004 Available online 14 April 2005 Abstract This paper presents a new interactive parallel visualization method for large particle data- sets by directly rendering individual particles based on a parallel rendering cluster. A frame rate of 9 frames-per-second is achieved for 256 3 particles using 7 render nodes and a display node. This provides real time interaction and interactive exploration of large datasets, which has been a challenge for scientific visualization and other real time data mining applications. A dynamic data distribution technique is designed for highlighting a subset of the particle volume. It maintains load balance of the system and minimizes network traffic by reconfigur- ing the rendering chain. Experiments show that on a given subset, interactive manipulation of the subset usually requires less than 3% of the particles inside the subset to be redistributed among all render nodes. The method can be easily extended to other large datasets such as hydrodynamic turbulence, fluid dynamics, and so on. Ó 2005 Elsevier B.V. All rights reserved. Keywords: Interactive graphics; Parallel visualization; High performance scientific visualization; Distri- buted/network graphics; Volume rendering 0167-8191/$ - see front matter Ó 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.parco.2005.02.008 * Corresponding author. Tel.: +1 604 540 0891; fax: +1 604 540 0891. E-mail address: [email protected] (K. Liang). www.elsevier.com/locate/parco Parallel Computing 31 (2005) 243–260

Transcript of Interactive parallel visualization of large particle datasets

Page 1: Interactive parallel visualization of large particle datasets

www.elsevier.com/locate/parco

Parallel Computing 31 (2005) 243–260

Interactive parallel visualization of largeparticle datasets

Kevin Liang a,*, Patricia Monger a, Huge Couchman b

a Research & HPC Support, McMaster University, Hamilton, ON, Canada L8S 4M1b Department of Physics and Astronomy, McMaster University, Hamilton, ON, Canada L8S 4M1

Received 16 February 2004; revised 20 December 2004

Available online 14 April 2005

Abstract

This paper presents a new interactive parallel visualization method for large particle data-

sets by directly rendering individual particles based on a parallel rendering cluster. A frame

rate of 9 frames-per-second is achieved for 2563 particles using 7 render nodes and a display

node. This provides real time interaction and interactive exploration of large datasets, which

has been a challenge for scientific visualization and other real time data mining applications.

A dynamic data distribution technique is designed for highlighting a subset of the particle

volume. It maintains load balance of the system and minimizes network traffic by reconfigur-

ing the rendering chain. Experiments show that on a given subset, interactive manipulation of

the subset usually requires less than 3% of the particles inside the subset to be redistributed

among all render nodes. The method can be easily extended to other large datasets such as

hydrodynamic turbulence, fluid dynamics, and so on.

� 2005 Elsevier B.V. All rights reserved.

Keywords: Interactive graphics; Parallel visualization; High performance scientific visualization; Distri-

buted/network graphics; Volume rendering

0167-8191/$ - see front matter � 2005 Elsevier B.V. All rights reserved.

doi:10.1016/j.parco.2005.02.008

* Corresponding author. Tel.: +1 604 540 0891; fax: +1 604 540 0891.

E-mail address: [email protected] (K. Liang).

Page 2: Interactive parallel visualization of large particle datasets

244 K. Liang et al. / Parallel Computing 31 (2005) 243–260

1. Introduction

One of the scientific visualization problems is to interactively visualize simulated

large particle datasets, usually containing more than a hundred million particles.

Three methods have been used to visualize large particle datasets. Researchers viewindividual particles directly, convert the particles to volumetric data representing

particle density [7], or explore the particles using a combined direct- and volume-

based rendering method [14]. Direct particle rendering takes longer time on desktop

graphics workstations; therefore, it is usually difficult to obtain interactive frame

rate. Volume rendering and combined methods can provide interactive frame rates

but have the limitation of missing fine structures. In this paper, we investigate a di-

rect rendering method using a parallel rendering cluster, the Sepia [8] from Hewlett–

Packard. The aim is to develop a system which can interactively explore 2563 and5123 particle datasets without missing fine structures.

Interactive particle rendering is very attractive to scientists who are interested in

exploring particle datasets in full detail and from many different perspectives. Recent

research has shown that it helps study the development of cosmic structure, such as

star formation, planet formation, galaxy evolution, and so on [5,9,13]. Rendering a

large particle dataset, however, is so computationally and graphically intensive that a

single desktop computer is unable to handle the problem. In this work, we propose

to use a parallel rendering cluster with a Sepia-2a compositing board [8] at eachnode. The cluster has 8 nodes, configured as 7 render nodes and a display node.

Using a dynamic rendering routing technique and a set of optimal particles visual

control methods, a frame rate of 9 frames-per-second (fps) is obtained for 2563 par-

ticle datasets, such as the one shown in Fig. 1. An interactive system built using this

Fig. 1. Direct rendering of a 2563 particle dataset.

Page 3: Interactive parallel visualization of large particle datasets

K. Liang et al. / Parallel Computing 31 (2005) 243–260 245

method allows scientists to interactively navigate the particle volume and to study

simulation results using various particle visual representations.

For 5123 particle datasets, the system runs at 1 fps using 7 render nodes and a dis-

play node. By using more render nodes, integrating more advanced rendering tech-

niques, and applying other optimization techniques, we expect to obtain higherframe rate.

1.1. Related work

Visualization of simulated particle datasets has focused on direct particle render-

ing and volumetric rendering. The direct rendering method views each particle using

a simple geometric primitive, often a single point. Volume-based particle rendering

first converts the particles to volumetric data and then displays the particles asiso-surfaces or as a series of closely-spaced parallel texture mapped planes [1]. Direct

rendering has limitations for interactive exploration of large particle datasets using

desktop graphics workstations. A workstation either is unable to process such large

datasets or takes too long for rendering a single frame.

Using a hierarchical data structure and principal component analysis (PCA) tech-

nique, Hopf and Ertl [3,4] proposed a splatting method to interactively visualize

large scattered data on desktop workstations. The method first creates a hierarchy

based on a principal component analysis clustering procedure. The hierarchical datastructure separates clusters from point data. Clusters at lower hierarchy levels are

rendered as splats during interaction to accelerate the rendering process as a

trade-off of image quality on standard workstations. Other methods, such as those

proposed in [11,15], are more focused on rendering surfaces.

In volume rendering, the range covered by the particle dataset is evenly divided

into voxels, and each voxel value is assigned a density based on the number of par-

ticles inside it. The data are then converted into a texture and rendered on the screen

as a series of parallel texture-mapped planes, creating the illusion of volume. Thismethod can provide real-time visualization when using lower resolutions [7]. A major

drawback of volume rendering is its missing fine structures due to limited voxel res-

olution. A combined method [14] provides a compromise solution to interactive visu-

alization of large particle datasets, but it suffers the same low resolution and missing

fine structure problems as volume rendering does.

1.2. Sepia-based parallel volume rendering

Moll et al. [8] proposed a scalable 3D compositing architecture, called Sepia, to

interactively render partitioned datasets. A Sepia cluster consists of a number of

nodes, each with a Sepia-2a compositing card that implements a sort-last image com-

position engine. It provides scalability for interactive volume rendering [6] and visu-alization of large scientific datasets [2]. A high bandwidth compositing network

makes interactivity possible by its speed, scalability, and low latency. Each node

in the cluster can be configured as either a render node or a display node. Render

Page 4: Interactive parallel visualization of large particle datasets

246 K. Liang et al. / Parallel Computing 31 (2005) 243–260

nodes, the workers, are responsible for portions of rendering task. A display node,

the master, is the last node in the Sepia rendering pipeline. It can be used to control

user interaction, coordinate operations among render nodes, and project composited

images to the target display.

As discussed in Section 1.1, existing researches on interactive direct rendering oflarge particle datasets have a number of limitations. Using a single desktop work-

station, we cannot visualize massive datasets interactively by rendering millions of

colored points for each frame, or a trade-off of rendering speed versus image quality

as in [4,7,14] has to be made. As parallel computing and advanced clusters become

more popular among researchers dealing with large datasets, a new method is ex-

pected not only to maintain all the fine features of the particle volume, but also to

provide interactive frame rate using parallel rendering technique on a rendering

cluster.The method presented in this paper uses parallel rendering techniques on a

scalable rendering cluster, the Sepia. Using a Sepia system avoids the memory

problem as experienced in stand-alone graphics workstations and allows for deal-

ing with increasingly large scientific datasets. In addition a scalable application

can be easily designed on the Sepia system. The more render nodes in the system,

the larger particle datasets can be processed or the higher frame rates can be

achieved.

2. Data distribution and parallel visualization

In this work, we focus on datasets from a series of simulations in a periodic vol-

ume. The dataset shown in Fig. 1 is a simulation of a collection of galaxies in a peri-

odic volume. It is a pure gravity simulation in which only the mass of the galaxies,

not the stars or gas, was being simulated. The dataset shows the final time of the sim-

ulation, i.e. the current age of the universe. One of the uses for these simulations is tounderstand properties of clusters of galaxies. Each bright point in the image is a large

galaxy or group of galaxies evolved within a Lambda cold dark matter cosmology to

the current time. Studying the simulation results help cosmologists better understand

the nature and origin of the universe [13].

In this applications, the dataset is organized as randomly distributed 3D points.

Each particle is defined by its 3D position, mass, density, and velocity.

2.1. Static data distribution

Based on a static workload distribution method, an equal portion of particles is

initially allocated to each render node. This static equal distribution strategy guaran-

tees that no individual node becomes the performance bottleneck because each has

almost same number of particles to process. The current static distribution is solely

based on the data organization of the simulation results, in which particles are ran-

domly distributed within the volume. This data organization obviously prevents

Page 5: Interactive parallel visualization of large particle datasets

K. Liang et al. / Parallel Computing 31 (2005) 243–260 247

some better sorting methods to improve the rendering process. A pre-process proce-

dure could be applied to sort the particles into a hierarchical data structure such as

octree first, as in [4], and then the static distribution method is used to distribute the

particles based on the sorted hierarchy of the datasets. In this case, the master pro-

gram has to load all the particles into its memory and sort the dataset, which isimpossible for our current system when dealing with 5123 particle datasets. In addi-

tion, for a series of datasets, this will greatly increase the data loading time and dis-

tribution time. Therefore, currently we simply distribute particles based on the

original particle organization of the dataset. The dataset is partitioned into equal

portion of particles and each render node concurrently loads only its portion of

the particles.

2.2. Dynamic data distribution

When exploring the particle internal structure, scientists often want to exam a

chosen subset in detail while being able to see the entire structure of the particle data-

set. One way is to apply different color maps to the subset and non-subset particles

respectively. This, however, can�t provide a clear view of the subset particles if there

are many clusters of particles around the subset. Another approach is to use alpha

blending operation. Particles inside the subset are rendered as opaque points while

particles outside the subset as transparent points. In this case, particles need to beredistributed among the render nodes in order to apply alpha blending operation

along the rendering pipeline. Subset particles need to be rendered before the rest

of the particles. This can be done by designating a few render nodes for processing

subset particles and remaining render nodes for the rest of the particles. The number

of render nodes designated for subset particles can be determined based on the num-

ber of particles within the subset. The basic strategy is to maintain the particle load

balance among all render nodes. This is possible because the last render node for the

subset can have particles outside the subset. One way to do this is to choose the firstfew render nodes in the initial rendering pipeline as the designated subset render

nodes. All particles within the subset at each render node are moved to the desig-

nated subset render nodes. Others are moved to the remaining render nodes. This

is obviously not an efficient solution because it involves massive particle exchange

among render nodes.

A better way is to choose render nodes with higher number of subset particles

for subset render nodes and to change the rendering pipeline according to the

number of subset particles contained in each render node. This will minimizethe data exchange among all render nodes as only a small portion of the parti-

cles at most nodes needs to be exchanged. In the new pipeline, the subset render

nodes will be moved to the front of other render nodes. This can be achieved

using Sepia�s dynamic routing functions. For each node, a new upstream node

and a downstream node are set based on the number of subset particles at each

node.

Page 6: Interactive parallel visualization of large particle datasets

Algorithm: Dynamic particle distribution

• choose a subset.• find the bounding box of the subset.• calculate the number of particles within the subset at each node.• calculate the total number of subset particles.• determine the number of render nodes required for the subset, i.e.,

nsubset ¼ ntotal �psubsetptotal

where nsubset and ntotal are the number of render nodes for subset particles and

total number of render nodes respectively; psubset and ptotal are the number of

subset particles and total number of particles respectively.• select the top nsubset nodes that have more particles in the subset.• exchange particles between nodes so that the first nsubset � 1 nodes contains

only subset particles, the nsubset�th node contains subset particles and possibly

particles outside the subset, and other nodes contains only particles outside the

subset.• change the rendering chain so that the nsubset nodes are in front of other

nodes in the rendering pipeline.• reset the upstream and downstream nodes for each node.

248 K. Liang et al. / Parallel Computing 31 (2005) 243–260

The redistribution process uses a similar strategy to that of the initial distribution

in which each node gets approximately equal share of particles. The particle redistri-

bution process is detailed in the dynamic particle distribution algorithm. Theoreti-cally all particles outside the subset should be sorted first in respect to current

view point before being rendered. The dynamic routing algorithm does not take into

account of such sorting because our focus is on the subset particles. Particles outside

the subset are rendered to simply provide an overall structure.

Fig. 2 shows an example of dynamic routing on a 6 render node case. At first, the

system creates a rendering pipeline according to the number of render nodes allo-

cated to the current running session. This initial pipeline is usually based on the

physical node configuration. Each render node, except the first and the last, isdirectly connected to its upstream render node and its downstream render node.

node 0 node 1 node 2 node 3 node 5node 4

node 0 node 2 node 4 node 5node 1 node 3

initial rendering node chain

new rendering node chain

Fig. 2. Dynamic routing rendering pipeline.

Page 7: Interactive parallel visualization of large particle datasets

K. Liang et al. / Parallel Computing 31 (2005) 243–260 249

The first render node has no upstream render node. The last render node has no

downstream render node and is connected to the display node. The top part of

Fig. 2 shows an initial rendering pipeline when using 6 render nodes. Each render

node renders its share of particles and combines its rendering result with the result

from the previous node in the pipeline (details are described in Section 2.4). Thecomposited result is then transferred to the next render node in the pipeline.

During dynamic data redistribution, the system first determines that two nodes

are required for rendering subset particles. It then chooses those two nodes with

the most subset particles, i.e. nodes 1 and 3 in this case. All particles outside the sub-

set on node 1 will be redistributed to node 0, 2, 4, or 5. Depending on the total num-

ber of particles in the subset, node 3 may only need to redistribute part of its particles

outside the subset to node 0, 2, 4, or 5. Particles inside the subset on nodes 0, 2, 4,

and 5 are transferred to either node 1 or node 3 in exchange of particles outside thesubset. The redistribution algorithm maintains the particle load balance among all

render nodes. In general, the number of particles on each render node remains the

same as before the redistribution. The 6 node rendering pipeline is finally changed

so that the two render nodes containing subset particles are set in front of all other

render nodes, as illustrated in Fig. 2.

2.3. Rendering particle primitives

In direct rendering, each particle is represented using a single point primitive. All

point primitives use the same radius, which is one in our examples. Using a uniform

point representation avoids OpenGL rendering overhead that may be incurred by

changing the point size for each particle and performing complex transformations re-

quired to position different primitives. In this case, particles at each node can be ren-

dered to a display list in a single loop or using a single OpenGL array rendering call.

Particle densities are usually used to reveal the structure of the particle volume.

To visualize the particles, particle densities are transformed to RGB or RGBA colorsbased on a given color map. This is usually done using a logarithm mapping which

maps the density of a particle to a specific color in the color map, i.e.

ðr; g; b; aÞ ¼ mapðlogðdÞÞwhere (r,g,b,a) is the mapped values of RGB and alpha channels, d is the density of

a given particle, and map( ) is a function from a given logarithmic density to a color.

Linear mapping and other non-linear mappings can be applied as well according tothe application�s characteristics. Non-linear color mapping method can be conve-

niently used to highlight particles in a particular range of densities. Images in this

work are produced using a variety of color maps, such as rainbow color map, light

hue color map, and black-to-yellow color map.

2.4. Image composition scheme

The Sepia cluster offers a variety of compositing operations, such as Z-compari-son using OpenGL comparison operators, alpha blending, full scene anti-aliasing,

Page 8: Interactive parallel visualization of large particle datasets

Fig. 3. Particle image compositing on four render nodes.

Fig. 4. Subset particles rendered using alpha blending.

250 K. Liang et al. / Parallel Computing 31 (2005) 243–260

and user down-loadable re-programmable computational logic. For direct particle

visualization, we use Z-comparison to composite the rendered image at each node

with the image from its upstream node in the pipeline. Fig. 3 illustrates rendered par-

ticles at each node and composited particles after each node in a case of 4 render

nodes and a display node.

Using alpha blending, a chosen subset of the particle dataset can be highlighted

using the dynamic particle distribution algorithm. As shown in Fig. 4, particles out-

side the selected subset are rendered as transparent particles while particles withinthe subset are rendered as opaque particles. This allows for detailed study of parti-

cular regions of particles while users are exploring the whole particle dataset.

Page 9: Interactive parallel visualization of large particle datasets

K. Liang et al. / Parallel Computing 31 (2005) 243–260 251

2.5. Rendering synchronization

In a parallel rendering system, operations on different nodes need to be synchro-

nized so that particles are rendered at the same time on all render nodes in response

to the same command. Otherwise, it is very likely that some render nodes havedifferent view points from others. This obviously will result in a compositing image

that is not a correct representation of the given particle dataset.

The system uses an event-driven mechanism [12] for operation synchronization

among all nodes using event handles. The master listens to and accepts user requests

and concurrently dispatches the requests to all render nodes. A frame controller is

used at both master node and render nodes. The master first sends a command to

all render nodes to initialize the controller. It then sends each command to the render

nodes, along with the frame controller. At each step, the render nodes respond to themaster�s commands and run only commands associated with the same frame control-

ler. This ensures that all render nodes are working on the same frame at the same

time.

3. Results and discussion

3.1. Implementation

An interactive visualization system has been implemented using OpenGL, Qt

toolkit and C++ on a HP�s Visualization Market Development System, the Sepia-

2 cluster, running NPACI Rocks clustering software [10]. The cluster contains 8

nodes, one display node and 7 render nodes. Each node is a HP Workstation

xw8000 with dual Intel(R) XeonTM 2.40GHz processors, 1GB memory, a sepia-2a

compositing card, a nVIDIA Quadra4 900 XGL graphics card with 128MB memory,

an Ethernet card and a ServerNet II card. A graphical user interface is designed toallow scientists and researchers to interactively manipulate and explore large particle

datasets. Communications among nodes in the system are handled using Unix sock-

et-based message passing interface.

At each render node, a set of OpenGL display lists are used for directly rendering

particles as point primitives. Particles at each render node are built into the display

lists when particle properties are changed. Using OpenGL display lists greatly en-

hances the system performance when users navigate the particle dataset. As OpenGL

display lists have to be stored in the graphics memory, this may become the con-straints of the system performance for increasing number of particles. In this case,

vertex arrays may be a better choice than display lists. We are still investigating this

on the 5123 particle datasets.

3.2. Particles visual representations

Figs. 5 and 6 show different visual representations of the 2563 particle cube. In

Fig. 5, a front view of the directly rendered particles is modeled using an inverse

Page 10: Interactive parallel visualization of large particle datasets

Fig. 5. Particles represented using inverse rainbow color map: (a) front view and (b) a close look.

Fig. 6. Particles represented using light hue color map (left) and from black to yellow linear color map

(right).

252 K. Liang et al. / Parallel Computing 31 (2005) 243–260

rainbow color map. The right image shows a close look at a specific part of the par-

ticle cube. Clusters of red particles represent high density areas. Fig. 6 (left) shows

the same 2563 particles cube using another color map, in which high density particles

are mapped to brown while low density particles are mapped to light black. A linear

color mapping from black to yellow is applied to the same particle cube to create a

different particle visual representation in Fig. 6 (right).

3.3. Results on 2563 particle dataset

A comprehensive test has been done for the 2563 particle dataset using 1–7 render

node(s) and a display node. In general, the performance is constrained by particle

rendering, data acquisition, and image composition. Particle rendering depends on

the number of particles at the render nodes. Data acquisition and image composition

are dependent on the Sepia hardware and its associated visualization toolkits.

Page 11: Interactive parallel visualization of large particle datasets

K. Liang et al. / Parallel Computing 31 (2005) 243–260 253

There are two types of operations, navigation and particle update.

• Navigation operations include zooming, rotation, and panning/translation. These

operations allow users to look around and to go inside the particle volume.

Because the particles visual representations are not changed, the system simplycalls the OpenGL display lists created while doing particle update operations, pro-

viding a high interactive frame rate.

• Particle update operations change the particles attributes and build the OpenGL

display lists. The particle attributes can be changed by modifying particles� posi-tions, colors and alpha channels using different color maps and/or different color

mapping methods. Particle update operations allow users to explore the overall

structure and fine features of the particle volume, and to study a specific region

of interest in the particle volume.

Table 1 shows system timings for individual navigation operations on 4, 5, 6, and 7

render nodes. The time shown in the table is taken during a series of navigation oper-

ations. Therefore, it includes only one data acquisition and image compositing time

that is independent on the number of render nodes. Table 2 shows time used for

different particle update operations.

As shown in Table 1, the time used for navigation operations remains the same,

allowing interactive flying-through the particle volume. This is because each rendernode processes about 4 million particles and it takes very little time to update a frame

by calling the already built OpenGL display lists. Our experience showed that the

number of display lists, 1024 in this test, is crucial to obtain this performance on

the given graphics card. The performance would drop dramatically if fewer display

lists are used or the number of particles on each node is over a certain limit. This can

be seen from Fig. 7 as the navigation operation time drops rapidly from about 0.78 s

on 1 render node to about 0.112 s on 3 render nodes.

Table 1

Navigation operation time (s) on 2563 particle dataset

Operation Nodes

4 5 6 7

Zoom in/out 0.1119 0.1127 0.1123 0.1123

Rotate 0.1119 0.1127 0.1124 0.1123

Pan/translate 0.1119 0.1127 0.1125 0.1124

Table 2

Particle update operation time (s) on 2563 particle dataset

Operation Nodes

4 5 6 7

Particle update 3.05 2.513 2.06 1.56

Changing color map 3.06 2.46 2.15 1.60

Sorting by color 3.21 2.47 2.15 1.58

Changing color mapping 3.20 2.50 2.06 1.60

Page 12: Interactive parallel visualization of large particle datasets

0.80

0.70

Time (sec)

acquisition and composition time

particle rendering time

overall operation time

1 2

0.10

0.20

0.30

0.40

0.50

0.60

Nodes3 4 65 7

Fig. 7. Navigation operation time (s) vs. the number of render nodes.

254 K. Liang et al. / Parallel Computing 31 (2005) 243–260

On the other hand, as shown in Table 2, particle update operations take much

longer time than navigation operations because they need to build the display lists

for rendering all the particles. Their performance is totally dependent on the render-ing capabilities of each render node. The less the number of particles at each render

node, the faster the operation.

In practice, scientists are more interested in navigation operations because these

operations provide sufficient power for them to study the particle dataset fromdifferent

view points, to fly through the particle cube, and to dive into specific regions for de-

tailed analysis. Based on the data in Table 1, a frame rate of 1/0.112 = 8.928 fps is ob-

tained for navigation operations on the 2563 particle dataset using 4 or more render

nodes. This result is very satisfactory for interactive visualization of 2563 particle data-sets. Operations such as rotation, zooming in and out, and panning run very smoothly.

Fig. 7 illustrates navigation operation time against the number of nodes. As seen

in the figure, the rendering time drops as the number of nodes increases from 1 to 3

nodes. From 3 to 7 nodes, the rendering time is close to 0 and the overall navigation

operation time is dominated by the acquisition and compositing time. For particle

update operations, as shown in Fig. 8, the rendering time drops as the number of ren-

der nodes is increased. This illustrates that the presented method is scalable for sys-

tems with up to 7 render nodes. The particle update operation time is dependent onthe graphics processing capability at each render node. The performance gain is

mainly from the speedup of individual nodes. For instance, for system running on

4 and 6 nodes, each node takes (64 � 42) * 2562 (or 34%) less particles in 6 nodes

rendering than in 4 nodes rendering, and, therefore, the overall performance is

improved by about 30%.

Page 13: Interactive parallel visualization of large particle datasets

8.00

Time (sec)

acquisition and composition time

particle rendering time

overall operation time

7.00

1 2

2.00

3.00

4.00

5.00

6.00

1.00

Nodes3 4 5 6 7

Fig. 8. Particle update operation time (s) vs. the number of render nodes.

Table 3

Time (s) split for navigation operations

Operation Nodes

4 5 6 7

Rendering particles 0.0007 0.0006 0.0006 0.0006

Acquisition and composition 0.1082 0.1121 0.1118 0.1118

K. Liang et al. / Parallel Computing 31 (2005) 243–260 255

In Table 3, the navigation operation time is split into particle rendering time and

image data acquisition and composition time for system running on 4, 5, 6, and 7

render nodes. The rendering time, as mentioned earlier, is the time used for calling

a set of display lists built in a particle update operation. The data show that, with4 or more render nodes, the navigation operation time is completely dependent on

the acquisition and compositing time, which is constrained by the Sepia hardware.

In conclusion, the performance of the current system is basically determined by

the number of render nodes, especially for particle update operations. The image data

acquisition and composition time at each node is dependent on the Sepia data trans-

fer network and data compositing engine. When using more than 3 render nodes, the

Sepia data acquisition and compositing engine has been identified as the bottleneck

for navigation operations of the system.

3.4. Minimization of data distribution for subset visualization

A dynamic load balance and data redistribution method, as described in Section

2.2, is used to highlight sub-volume of a large particle dataset. Table 4 shows the

Page 14: Interactive parallel visualization of large particle datasets

Fig. 9. Subset particle distribution for uniformly distributed datasets on four render nodes. Left: initial

distribution and a central subset, Center: moving subset in Y direction, Right: expanding subset in X

direction.

Table 4

Comparison between static and dynamic routing rendering schemes

Operation Routing Nodes involved Particles transferred

Static Dynamic Static Dynamic

Initial sub-volume 4 3 156,700 51,474

Changing X 4 3 85,314 797

Changing Y 4 2 78,099 871

Changing Z 4 3 151,125 45,667

Expanding X 4 3 187,540 6396

Expanding Y 4 3 168,058 3165

Expanding Z 4 3 161,913 3854

Table 5

Subset particle distributions of subset manipulation among 6 render nodes

Operation Nodes

0 1 2 3 4 5

Initial sub-volume 0 14,878 105,226 36,596 0 0

Moving along X 0 60 84,517 737 0 0

Moving along Y 0 0 77,228 871 0 0

Moving along Z 0 0 105,458 34,603 11,064 0

Expanding X 0 5537 181,143 860 0 0

Expanding Y 0 1577 164,893 1588 0 0

Expanding Z 0 627 158,059 3227 0 0

256 K. Liang et al. / Parallel Computing 31 (2005) 243–260

number of nodes and the number of particles involved in the data redistribution

using both static and dynamic routing rendering schemes. Table 5 shows the distri-bution of particles inside a given subset on typical subset operations, such as initial

setup, changing the subset position, and changing the size of the subset.

In general, as illustrated in Table 5, more render nodes are involved in the manip-

ulation of the subset visualization using the static routing rendering scheme than

using the dynamic routing rendering scheme. For an initial rendering node chain

Page 15: Interactive parallel visualization of large particle datasets

K. Liang et al. / Parallel Computing 31 (2005) 243–260 257

shown in Fig. 2, all particles inside the subset need to be transferred to node 0, in

exchange with particles outside the subset in order to maintain the load balance

among all render nodes. For instance, to move a given subset [(�0.25,�0.25,

�0.25), (0.25,0.25,0.25)] along X axis from its initial center position (0,0,0) to

(0.04,0,0), static routing rendering scheme needs to transfer all particles inside thenew subset on render nodes 1, 2 and 3 to node 0 and to receive corresponding num-

ber of particles outside the subset from node 0. This involves exchanging 60 particles

between nodes 0 and 1, 84,517 particles between nodes 0 and 2, and 737 particles be-

tween nodes 0 and 3. The total number of particles transferred for this operation is

170,628. In contrast, the dynamic routing rendering scheme requires only 60 particles

exchanged between nodes 1 and 2 and 737 particles between nodes 3 and 2. The

number of particles transferred for the operation is 1594, which is only 0.93% of

the particles transferred in static routing rendering scheme. The worst case is to movethe subset along Z axis, in which 45,667 particles, about 30% of the particles to be

transferred in static routing rendering scheme, need to be transferred among 3 render

nodes. This is due to the clustering property of the dataset.

Although the data shown in Tables 4 and 5 were based on a single operation on

the subset [(�0.25,�0.25,�0.25), (0.25,0.25,0.25)] at its initial center position,

experiment showed that it is arbitrary to define a subset at any position within the

dataset volume using dynamic routing. In fact, the system performance is the same

while the given subset is being moved across the volume.There are cases in which there might be no difference between static routing and

dynamic routing if the first render node contains all the subset particles. Subsequent

moving or changing of the subset, however, will change the data distribution and

massive data redistribution will be required using static routing scheme. In particu-

lar, such moving or changing of the subset may result in that the first render node

contains no subset particles. As a result, less particles will be involved in data redis-

tribution in dynamic routing than in static routing, similar to the case illustrated in

Tables 4 and 5.For uniformly distributed datasets, all render nodes may be configured to have

equal number of particles within a given initial subset using static data distribution

strategy, which may imply that static routing would not have any disadvantages over

dynamic routing. Such uniformity of subset particle distribution is immediately bro-

ken as soon as the subset is either moved or its size is changed. For instance, in a four

render node system configuration, as illustrated in Fig. 9, a quarter of particles can

be initially allocated to each render node and a subset at the volume center contains

equal number of particles from four render nodes. This balance will be broken whenthe subset moves in X or Y directions, or changes its size in either X or Y directions,

but not both. At such point, one render node will contain more subset particles than

others and will become the dominant subset render node. Since such dominant node

could be any of the four render nodes during interactive exploration of the dataset,

subset particles at this node will have to be redistributed, resulting in more data

transferring in static routing than dynamic routing.

In a Sepia-based system, the overhead introduced by re-arranging the rendering

chain is very small, the dynamic routing rendering method, therefore, greatly

Page 16: Interactive parallel visualization of large particle datasets

258 K. Liang et al. / Parallel Computing 31 (2005) 243–260

enhanced the performance of the system. In most cases, the number of particles

involved in data redistribution is only a small percentage of the total number of par-

ticles in the subset. Assuming that it takes the same amount of time for each particle

to be transferred, based on Table 4, the time used for particle data redistribution

using the dynamic routing rendering scheme is, in general, less than 3% of that usingstatic routing rendering scheme.

3.5. Results on 5123 particle dataset

A preliminary test on 5123 particle dataset has been done using 6 and 7 render

nodes and a display node. As shown in Table 6, the system takes much longer time

to perform the particle update operations than on 2563 particles. For a 5123 particles

dataset, each render node processes 8 times more particles. The large number of par-ticles pushes the system to the limit of both memory and graphics processing

capabilities.

Similarly, time used for navigation operations is greatly increased as well. The

number of particles on each render node is much more over the limit at which the

performance of calling OpenGL display lists drops dramatically. This is similar to

the 2563 case when the system runs on 1 or 2 render nodes.

The frame rate is about 1 fps for navigation operations when the system is running

on 7 render nodes. The more render nodes, the faster the rendering operations.To be able to navigate the particle volume smoothly, a much higher frame rate for

5123 particle datasets is expected. Several techniques are currently under investiga-

tion. First, current system uses a serial mode for data acquisition and image compo-

sition, i.e. image composition process starts only after the data acquisition process

has finished. Using a parallel mode will reduce about 50% combined time for data

acquisition and image composition. Second, a vertex program is to be applied to ren-

der particle points directly using the graphics processor. This may considerably

increase the rendering speed and relieve the graphics memory stress. Third, advanceddata structure, such as the hierarchical data structure in [4], for representing the par-

ticle volume is being investigated. In this case, software based rendering pre-process

can be done to reduce the workload on the graphics processor.

3.6. Limitations

The method presented in this paper renders all particles using uniform rendering

primitives. Either all particles are rendered as opaque points or some of them are

Table 6

Operation time (s) on 5123 particle datasets

Operation Nodes

6 7

Navigation 1.157 1.007

Particle update 189.5 154.58

Page 17: Interactive parallel visualization of large particle datasets

K. Liang et al. / Parallel Computing 31 (2005) 243–260 259

rendered as transparent points. This may make it difficult to identify exact particle

structures when particles in a given dataset are highly clustered. Although different

color mapping methods and subset visualization technique help to highlight partic-

ular part of the particles, the overall structure of the particle dataset, especially inter-

nal structure, is hard to be recognized. In this case, we may want to render allparticles as transparent points. This will allow us to visualize the overall particles

structure and internal features in the same way as in texture mapping based volume

rendering method.

4. Conclusions and future work

We have presented both static and dynamic data distribution schemes and directparticle visualization techniques for interactively viewing large particle datasets using

the Sepia rendering cluster. The results on 2563 particle dataset show that the system

provides an interactive rate at about 9 fps for common visualization tasks, such as

zooming, rotation, and translation. The system allows for interactively changing par-

ticles� visual properties and highlighting particular regions of interests for detailed

analysis. An 1 fps is obtained at an initial test on 5123 particle dataset using 6 or 7

render nodes and a display node. Using a dynamic routing rendering scheme, less ren-

der nodes are involved in particle data redistribution and only a small percentage ofthe particles inside the subset is required to be transferred between render nodes. In

general, the time used for particle data redistribution using the dynamic routing ren-

dering scheme is usually less than 3% of that using static routing rendering scheme.

The most challenging task is to improve the frame rate for the 5123 particle data-

sets. We are currently investigating a number of techniques. These include parallel-

ization of the data acquisition and image compositing, integrating better data

structures for representing particle volumes, and applying other rendering optimiza-

tion techniques.

Acknowledgement

The authors would like to thank Hewlett Packard for providing the Sepia system

and technical support to the system. We would also like to thank the department of

Research and High Performance Computing Support and SharcNet at McMaster

University for the use of the Sepia cluster. We are very grateful for James Wadsleyfor his particle dataset and helpful discussions on many practical particle visualiza-

tion issues from the point of view of physics scientists. We appreciate very much to

the anonymous reviewers for their constructive suggestions.

References

[1] B. Cabral, N. Cam, J. Foran, Accelerated volume rendering and tomographic reconstruction using

texture mapping hardware, in: 1994 Symposium on Volume Visualization, ACM Press, NY, USA,

1994, ISBN 0-89791-741-3.

Page 18: Interactive parallel visualization of large particle datasets

260 K. Liang et al. / Parallel Computing 31 (2005) 243–260

[2] A. Heirich, L. Moll, Scalable distributed visualization using off-the-shelf components, in: 1999 IEEE

Symposium on Parallel and Large-data Visualization and Graphics, ACM Press, NY, USA, 1999, pp.

55–59.

[3] M. Hopf, T. Ertl, Hierarchical splatting of scattered data, in: Visualization �03, IEEE, 2003, pp. 433–440.

[4] M. Hopf, M. Luttenberger, T. Ertl, Hierarchical splatting of scattered 4d data, Computer Graphics

and Applications 24 (4) (2004) 64–72.

[5] A. Jenkins, C.S. Frenk, F.R. Pearce, P.A. Thomas, J.M. Colberg, S.D.M. White, H.M.P. Couchman,

J.A. Peacock, G. Efstathiou, A.H. Nelson, The Virgo Consortium, Evolution of structure in cold

dark matter universes, Astrophysical Journal 499 (1998) 20–40.

[6] S. Lombeyda, L. Moll, M. Shand, D. Breen, A. Heirich, Scalable interactive volume rendering using

off-the-shelf components, in: IEEE 2001 Symposium on Parallel and Large-data Visualization and

Graphics, IEEE Press, NJ, USA, 2001, pp. 115–121.

[7] M. Meissner, U. Hoffmann, W. Straßer, Enabling classification and shading for 3d texture mapping

based volume rendering using opengl and extensions, in: IEEE Visualization �99, IEEE, 1999, pp.207–214.

[8] L. Moll, A. Heirich, M. Shand, epia: scalable 3D compositing using PCI Pamette, in: K.L. Pocek, J.

Arnold (Eds.), IEEE Symposium on FPGAs for Custom Computing Machines, IEEE Computer

Society Press, Los Alamitos, CA, 1999, pp. 146–155.

[9] D.R. Nadeau, J.D. Genetti, S. Napear, B. Pailthorpe, C. Emmart, E. Wesselak, D. Davidson,

Visualizing stars and emission nebulas, in: Proc. Eurographics �2000, Eurographics, October 2000.

[10] P.M. Papadopoulos, M.J. Katz, G. Bruno, NPACI Rocks: tools and techniques for easily deploying

manageable Linux clusters, Concurrency and Computation: Practice and Experience 15 (7–8) (2003)

707–725.

[11] S. Rusinkiewwicz, M. Levoy, QSplat: a multiresolution point rendering system for large meshes,

in: SIGGRAPH �00, ACM SIGGRAPH, 2000, pp. 343–352.

[12] Y.-P. Shan, An event-driven model-view-controller framework for smalltalk, SIGPLAN Not. 24 (10)

(1989) 347–352.

[13] J.W. Wadsley, J. Stadel, T. Quinn, Gasoline: a flexible, parallel implementation of TreeSPH, New

Astronomy 9 (February) (2004) 137–158.

[14] B. Wilson, K.-L. Ma, J. Qiang, R. Ryne, Interactive visualization of particle beams for accelerator

design, in: ICCS 2002 Workshop on High Performance Computing in Particle Accelerator Science

and Technology, pp. 352–361. 2002 International Conference on Computational Science, April 21–24

2002. Amsterdam, The Netherlands.

[15] M. Zwicker, H. Phister, J. van Baar, M. Gross, EWA volume splatting, in: Visualization �01, IEEE,2001, pp. 29–36.