Interactive parallel visualization of large particle datasets
-
Upload
kevin-liang -
Category
Documents
-
view
216 -
download
3
Transcript of Interactive parallel visualization of large particle datasets
![Page 1: Interactive parallel visualization of large particle datasets](https://reader035.fdocuments.us/reader035/viewer/2022080310/57501e201a28ab877e8f127a/html5/thumbnails/1.jpg)
www.elsevier.com/locate/parco
Parallel Computing 31 (2005) 243–260
Interactive parallel visualization of largeparticle datasets
Kevin Liang a,*, Patricia Monger a, Huge Couchman b
a Research & HPC Support, McMaster University, Hamilton, ON, Canada L8S 4M1b Department of Physics and Astronomy, McMaster University, Hamilton, ON, Canada L8S 4M1
Received 16 February 2004; revised 20 December 2004
Available online 14 April 2005
Abstract
This paper presents a new interactive parallel visualization method for large particle data-
sets by directly rendering individual particles based on a parallel rendering cluster. A frame
rate of 9 frames-per-second is achieved for 2563 particles using 7 render nodes and a display
node. This provides real time interaction and interactive exploration of large datasets, which
has been a challenge for scientific visualization and other real time data mining applications.
A dynamic data distribution technique is designed for highlighting a subset of the particle
volume. It maintains load balance of the system and minimizes network traffic by reconfigur-
ing the rendering chain. Experiments show that on a given subset, interactive manipulation of
the subset usually requires less than 3% of the particles inside the subset to be redistributed
among all render nodes. The method can be easily extended to other large datasets such as
hydrodynamic turbulence, fluid dynamics, and so on.
� 2005 Elsevier B.V. All rights reserved.
Keywords: Interactive graphics; Parallel visualization; High performance scientific visualization; Distri-
buted/network graphics; Volume rendering
0167-8191/$ - see front matter � 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.parco.2005.02.008
* Corresponding author. Tel.: +1 604 540 0891; fax: +1 604 540 0891.
E-mail address: [email protected] (K. Liang).
![Page 2: Interactive parallel visualization of large particle datasets](https://reader035.fdocuments.us/reader035/viewer/2022080310/57501e201a28ab877e8f127a/html5/thumbnails/2.jpg)
244 K. Liang et al. / Parallel Computing 31 (2005) 243–260
1. Introduction
One of the scientific visualization problems is to interactively visualize simulated
large particle datasets, usually containing more than a hundred million particles.
Three methods have been used to visualize large particle datasets. Researchers viewindividual particles directly, convert the particles to volumetric data representing
particle density [7], or explore the particles using a combined direct- and volume-
based rendering method [14]. Direct particle rendering takes longer time on desktop
graphics workstations; therefore, it is usually difficult to obtain interactive frame
rate. Volume rendering and combined methods can provide interactive frame rates
but have the limitation of missing fine structures. In this paper, we investigate a di-
rect rendering method using a parallel rendering cluster, the Sepia [8] from Hewlett–
Packard. The aim is to develop a system which can interactively explore 2563 and5123 particle datasets without missing fine structures.
Interactive particle rendering is very attractive to scientists who are interested in
exploring particle datasets in full detail and from many different perspectives. Recent
research has shown that it helps study the development of cosmic structure, such as
star formation, planet formation, galaxy evolution, and so on [5,9,13]. Rendering a
large particle dataset, however, is so computationally and graphically intensive that a
single desktop computer is unable to handle the problem. In this work, we propose
to use a parallel rendering cluster with a Sepia-2a compositing board [8] at eachnode. The cluster has 8 nodes, configured as 7 render nodes and a display node.
Using a dynamic rendering routing technique and a set of optimal particles visual
control methods, a frame rate of 9 frames-per-second (fps) is obtained for 2563 par-
ticle datasets, such as the one shown in Fig. 1. An interactive system built using this
Fig. 1. Direct rendering of a 2563 particle dataset.
![Page 3: Interactive parallel visualization of large particle datasets](https://reader035.fdocuments.us/reader035/viewer/2022080310/57501e201a28ab877e8f127a/html5/thumbnails/3.jpg)
K. Liang et al. / Parallel Computing 31 (2005) 243–260 245
method allows scientists to interactively navigate the particle volume and to study
simulation results using various particle visual representations.
For 5123 particle datasets, the system runs at 1 fps using 7 render nodes and a dis-
play node. By using more render nodes, integrating more advanced rendering tech-
niques, and applying other optimization techniques, we expect to obtain higherframe rate.
1.1. Related work
Visualization of simulated particle datasets has focused on direct particle render-
ing and volumetric rendering. The direct rendering method views each particle using
a simple geometric primitive, often a single point. Volume-based particle rendering
first converts the particles to volumetric data and then displays the particles asiso-surfaces or as a series of closely-spaced parallel texture mapped planes [1]. Direct
rendering has limitations for interactive exploration of large particle datasets using
desktop graphics workstations. A workstation either is unable to process such large
datasets or takes too long for rendering a single frame.
Using a hierarchical data structure and principal component analysis (PCA) tech-
nique, Hopf and Ertl [3,4] proposed a splatting method to interactively visualize
large scattered data on desktop workstations. The method first creates a hierarchy
based on a principal component analysis clustering procedure. The hierarchical datastructure separates clusters from point data. Clusters at lower hierarchy levels are
rendered as splats during interaction to accelerate the rendering process as a
trade-off of image quality on standard workstations. Other methods, such as those
proposed in [11,15], are more focused on rendering surfaces.
In volume rendering, the range covered by the particle dataset is evenly divided
into voxels, and each voxel value is assigned a density based on the number of par-
ticles inside it. The data are then converted into a texture and rendered on the screen
as a series of parallel texture-mapped planes, creating the illusion of volume. Thismethod can provide real-time visualization when using lower resolutions [7]. A major
drawback of volume rendering is its missing fine structures due to limited voxel res-
olution. A combined method [14] provides a compromise solution to interactive visu-
alization of large particle datasets, but it suffers the same low resolution and missing
fine structure problems as volume rendering does.
1.2. Sepia-based parallel volume rendering
Moll et al. [8] proposed a scalable 3D compositing architecture, called Sepia, to
interactively render partitioned datasets. A Sepia cluster consists of a number of
nodes, each with a Sepia-2a compositing card that implements a sort-last image com-
position engine. It provides scalability for interactive volume rendering [6] and visu-alization of large scientific datasets [2]. A high bandwidth compositing network
makes interactivity possible by its speed, scalability, and low latency. Each node
in the cluster can be configured as either a render node or a display node. Render
![Page 4: Interactive parallel visualization of large particle datasets](https://reader035.fdocuments.us/reader035/viewer/2022080310/57501e201a28ab877e8f127a/html5/thumbnails/4.jpg)
246 K. Liang et al. / Parallel Computing 31 (2005) 243–260
nodes, the workers, are responsible for portions of rendering task. A display node,
the master, is the last node in the Sepia rendering pipeline. It can be used to control
user interaction, coordinate operations among render nodes, and project composited
images to the target display.
As discussed in Section 1.1, existing researches on interactive direct rendering oflarge particle datasets have a number of limitations. Using a single desktop work-
station, we cannot visualize massive datasets interactively by rendering millions of
colored points for each frame, or a trade-off of rendering speed versus image quality
as in [4,7,14] has to be made. As parallel computing and advanced clusters become
more popular among researchers dealing with large datasets, a new method is ex-
pected not only to maintain all the fine features of the particle volume, but also to
provide interactive frame rate using parallel rendering technique on a rendering
cluster.The method presented in this paper uses parallel rendering techniques on a
scalable rendering cluster, the Sepia. Using a Sepia system avoids the memory
problem as experienced in stand-alone graphics workstations and allows for deal-
ing with increasingly large scientific datasets. In addition a scalable application
can be easily designed on the Sepia system. The more render nodes in the system,
the larger particle datasets can be processed or the higher frame rates can be
achieved.
2. Data distribution and parallel visualization
In this work, we focus on datasets from a series of simulations in a periodic vol-
ume. The dataset shown in Fig. 1 is a simulation of a collection of galaxies in a peri-
odic volume. It is a pure gravity simulation in which only the mass of the galaxies,
not the stars or gas, was being simulated. The dataset shows the final time of the sim-
ulation, i.e. the current age of the universe. One of the uses for these simulations is tounderstand properties of clusters of galaxies. Each bright point in the image is a large
galaxy or group of galaxies evolved within a Lambda cold dark matter cosmology to
the current time. Studying the simulation results help cosmologists better understand
the nature and origin of the universe [13].
In this applications, the dataset is organized as randomly distributed 3D points.
Each particle is defined by its 3D position, mass, density, and velocity.
2.1. Static data distribution
Based on a static workload distribution method, an equal portion of particles is
initially allocated to each render node. This static equal distribution strategy guaran-
tees that no individual node becomes the performance bottleneck because each has
almost same number of particles to process. The current static distribution is solely
based on the data organization of the simulation results, in which particles are ran-
domly distributed within the volume. This data organization obviously prevents
![Page 5: Interactive parallel visualization of large particle datasets](https://reader035.fdocuments.us/reader035/viewer/2022080310/57501e201a28ab877e8f127a/html5/thumbnails/5.jpg)
K. Liang et al. / Parallel Computing 31 (2005) 243–260 247
some better sorting methods to improve the rendering process. A pre-process proce-
dure could be applied to sort the particles into a hierarchical data structure such as
octree first, as in [4], and then the static distribution method is used to distribute the
particles based on the sorted hierarchy of the datasets. In this case, the master pro-
gram has to load all the particles into its memory and sort the dataset, which isimpossible for our current system when dealing with 5123 particle datasets. In addi-
tion, for a series of datasets, this will greatly increase the data loading time and dis-
tribution time. Therefore, currently we simply distribute particles based on the
original particle organization of the dataset. The dataset is partitioned into equal
portion of particles and each render node concurrently loads only its portion of
the particles.
2.2. Dynamic data distribution
When exploring the particle internal structure, scientists often want to exam a
chosen subset in detail while being able to see the entire structure of the particle data-
set. One way is to apply different color maps to the subset and non-subset particles
respectively. This, however, can�t provide a clear view of the subset particles if there
are many clusters of particles around the subset. Another approach is to use alpha
blending operation. Particles inside the subset are rendered as opaque points while
particles outside the subset as transparent points. In this case, particles need to beredistributed among the render nodes in order to apply alpha blending operation
along the rendering pipeline. Subset particles need to be rendered before the rest
of the particles. This can be done by designating a few render nodes for processing
subset particles and remaining render nodes for the rest of the particles. The number
of render nodes designated for subset particles can be determined based on the num-
ber of particles within the subset. The basic strategy is to maintain the particle load
balance among all render nodes. This is possible because the last render node for the
subset can have particles outside the subset. One way to do this is to choose the firstfew render nodes in the initial rendering pipeline as the designated subset render
nodes. All particles within the subset at each render node are moved to the desig-
nated subset render nodes. Others are moved to the remaining render nodes. This
is obviously not an efficient solution because it involves massive particle exchange
among render nodes.
A better way is to choose render nodes with higher number of subset particles
for subset render nodes and to change the rendering pipeline according to the
number of subset particles contained in each render node. This will minimizethe data exchange among all render nodes as only a small portion of the parti-
cles at most nodes needs to be exchanged. In the new pipeline, the subset render
nodes will be moved to the front of other render nodes. This can be achieved
using Sepia�s dynamic routing functions. For each node, a new upstream node
and a downstream node are set based on the number of subset particles at each
node.
![Page 6: Interactive parallel visualization of large particle datasets](https://reader035.fdocuments.us/reader035/viewer/2022080310/57501e201a28ab877e8f127a/html5/thumbnails/6.jpg)
Algorithm: Dynamic particle distribution
• choose a subset.• find the bounding box of the subset.• calculate the number of particles within the subset at each node.• calculate the total number of subset particles.• determine the number of render nodes required for the subset, i.e.,
nsubset ¼ ntotal �psubsetptotal
where nsubset and ntotal are the number of render nodes for subset particles and
total number of render nodes respectively; psubset and ptotal are the number of
subset particles and total number of particles respectively.• select the top nsubset nodes that have more particles in the subset.• exchange particles between nodes so that the first nsubset � 1 nodes contains
only subset particles, the nsubset�th node contains subset particles and possibly
particles outside the subset, and other nodes contains only particles outside the
subset.• change the rendering chain so that the nsubset nodes are in front of other
nodes in the rendering pipeline.• reset the upstream and downstream nodes for each node.
248 K. Liang et al. / Parallel Computing 31 (2005) 243–260
The redistribution process uses a similar strategy to that of the initial distribution
in which each node gets approximately equal share of particles. The particle redistri-
bution process is detailed in the dynamic particle distribution algorithm. Theoreti-cally all particles outside the subset should be sorted first in respect to current
view point before being rendered. The dynamic routing algorithm does not take into
account of such sorting because our focus is on the subset particles. Particles outside
the subset are rendered to simply provide an overall structure.
Fig. 2 shows an example of dynamic routing on a 6 render node case. At first, the
system creates a rendering pipeline according to the number of render nodes allo-
cated to the current running session. This initial pipeline is usually based on the
physical node configuration. Each render node, except the first and the last, isdirectly connected to its upstream render node and its downstream render node.
node 0 node 1 node 2 node 3 node 5node 4
node 0 node 2 node 4 node 5node 1 node 3
initial rendering node chain
new rendering node chain
Fig. 2. Dynamic routing rendering pipeline.
![Page 7: Interactive parallel visualization of large particle datasets](https://reader035.fdocuments.us/reader035/viewer/2022080310/57501e201a28ab877e8f127a/html5/thumbnails/7.jpg)
K. Liang et al. / Parallel Computing 31 (2005) 243–260 249
The first render node has no upstream render node. The last render node has no
downstream render node and is connected to the display node. The top part of
Fig. 2 shows an initial rendering pipeline when using 6 render nodes. Each render
node renders its share of particles and combines its rendering result with the result
from the previous node in the pipeline (details are described in Section 2.4). Thecomposited result is then transferred to the next render node in the pipeline.
During dynamic data redistribution, the system first determines that two nodes
are required for rendering subset particles. It then chooses those two nodes with
the most subset particles, i.e. nodes 1 and 3 in this case. All particles outside the sub-
set on node 1 will be redistributed to node 0, 2, 4, or 5. Depending on the total num-
ber of particles in the subset, node 3 may only need to redistribute part of its particles
outside the subset to node 0, 2, 4, or 5. Particles inside the subset on nodes 0, 2, 4,
and 5 are transferred to either node 1 or node 3 in exchange of particles outside thesubset. The redistribution algorithm maintains the particle load balance among all
render nodes. In general, the number of particles on each render node remains the
same as before the redistribution. The 6 node rendering pipeline is finally changed
so that the two render nodes containing subset particles are set in front of all other
render nodes, as illustrated in Fig. 2.
2.3. Rendering particle primitives
In direct rendering, each particle is represented using a single point primitive. All
point primitives use the same radius, which is one in our examples. Using a uniform
point representation avoids OpenGL rendering overhead that may be incurred by
changing the point size for each particle and performing complex transformations re-
quired to position different primitives. In this case, particles at each node can be ren-
dered to a display list in a single loop or using a single OpenGL array rendering call.
Particle densities are usually used to reveal the structure of the particle volume.
To visualize the particles, particle densities are transformed to RGB or RGBA colorsbased on a given color map. This is usually done using a logarithm mapping which
maps the density of a particle to a specific color in the color map, i.e.
ðr; g; b; aÞ ¼ mapðlogðdÞÞwhere (r,g,b,a) is the mapped values of RGB and alpha channels, d is the density of
a given particle, and map( ) is a function from a given logarithmic density to a color.
Linear mapping and other non-linear mappings can be applied as well according tothe application�s characteristics. Non-linear color mapping method can be conve-
niently used to highlight particles in a particular range of densities. Images in this
work are produced using a variety of color maps, such as rainbow color map, light
hue color map, and black-to-yellow color map.
2.4. Image composition scheme
The Sepia cluster offers a variety of compositing operations, such as Z-compari-son using OpenGL comparison operators, alpha blending, full scene anti-aliasing,
![Page 8: Interactive parallel visualization of large particle datasets](https://reader035.fdocuments.us/reader035/viewer/2022080310/57501e201a28ab877e8f127a/html5/thumbnails/8.jpg)
Fig. 3. Particle image compositing on four render nodes.
Fig. 4. Subset particles rendered using alpha blending.
250 K. Liang et al. / Parallel Computing 31 (2005) 243–260
and user down-loadable re-programmable computational logic. For direct particle
visualization, we use Z-comparison to composite the rendered image at each node
with the image from its upstream node in the pipeline. Fig. 3 illustrates rendered par-
ticles at each node and composited particles after each node in a case of 4 render
nodes and a display node.
Using alpha blending, a chosen subset of the particle dataset can be highlighted
using the dynamic particle distribution algorithm. As shown in Fig. 4, particles out-
side the selected subset are rendered as transparent particles while particles withinthe subset are rendered as opaque particles. This allows for detailed study of parti-
cular regions of particles while users are exploring the whole particle dataset.
![Page 9: Interactive parallel visualization of large particle datasets](https://reader035.fdocuments.us/reader035/viewer/2022080310/57501e201a28ab877e8f127a/html5/thumbnails/9.jpg)
K. Liang et al. / Parallel Computing 31 (2005) 243–260 251
2.5. Rendering synchronization
In a parallel rendering system, operations on different nodes need to be synchro-
nized so that particles are rendered at the same time on all render nodes in response
to the same command. Otherwise, it is very likely that some render nodes havedifferent view points from others. This obviously will result in a compositing image
that is not a correct representation of the given particle dataset.
The system uses an event-driven mechanism [12] for operation synchronization
among all nodes using event handles. The master listens to and accepts user requests
and concurrently dispatches the requests to all render nodes. A frame controller is
used at both master node and render nodes. The master first sends a command to
all render nodes to initialize the controller. It then sends each command to the render
nodes, along with the frame controller. At each step, the render nodes respond to themaster�s commands and run only commands associated with the same frame control-
ler. This ensures that all render nodes are working on the same frame at the same
time.
3. Results and discussion
3.1. Implementation
An interactive visualization system has been implemented using OpenGL, Qt
toolkit and C++ on a HP�s Visualization Market Development System, the Sepia-
2 cluster, running NPACI Rocks clustering software [10]. The cluster contains 8
nodes, one display node and 7 render nodes. Each node is a HP Workstation
xw8000 with dual Intel(R) XeonTM 2.40GHz processors, 1GB memory, a sepia-2a
compositing card, a nVIDIA Quadra4 900 XGL graphics card with 128MB memory,
an Ethernet card and a ServerNet II card. A graphical user interface is designed toallow scientists and researchers to interactively manipulate and explore large particle
datasets. Communications among nodes in the system are handled using Unix sock-
et-based message passing interface.
At each render node, a set of OpenGL display lists are used for directly rendering
particles as point primitives. Particles at each render node are built into the display
lists when particle properties are changed. Using OpenGL display lists greatly en-
hances the system performance when users navigate the particle dataset. As OpenGL
display lists have to be stored in the graphics memory, this may become the con-straints of the system performance for increasing number of particles. In this case,
vertex arrays may be a better choice than display lists. We are still investigating this
on the 5123 particle datasets.
3.2. Particles visual representations
Figs. 5 and 6 show different visual representations of the 2563 particle cube. In
Fig. 5, a front view of the directly rendered particles is modeled using an inverse
![Page 10: Interactive parallel visualization of large particle datasets](https://reader035.fdocuments.us/reader035/viewer/2022080310/57501e201a28ab877e8f127a/html5/thumbnails/10.jpg)
Fig. 5. Particles represented using inverse rainbow color map: (a) front view and (b) a close look.
Fig. 6. Particles represented using light hue color map (left) and from black to yellow linear color map
(right).
252 K. Liang et al. / Parallel Computing 31 (2005) 243–260
rainbow color map. The right image shows a close look at a specific part of the par-
ticle cube. Clusters of red particles represent high density areas. Fig. 6 (left) shows
the same 2563 particles cube using another color map, in which high density particles
are mapped to brown while low density particles are mapped to light black. A linear
color mapping from black to yellow is applied to the same particle cube to create a
different particle visual representation in Fig. 6 (right).
3.3. Results on 2563 particle dataset
A comprehensive test has been done for the 2563 particle dataset using 1–7 render
node(s) and a display node. In general, the performance is constrained by particle
rendering, data acquisition, and image composition. Particle rendering depends on
the number of particles at the render nodes. Data acquisition and image composition
are dependent on the Sepia hardware and its associated visualization toolkits.
![Page 11: Interactive parallel visualization of large particle datasets](https://reader035.fdocuments.us/reader035/viewer/2022080310/57501e201a28ab877e8f127a/html5/thumbnails/11.jpg)
K. Liang et al. / Parallel Computing 31 (2005) 243–260 253
There are two types of operations, navigation and particle update.
• Navigation operations include zooming, rotation, and panning/translation. These
operations allow users to look around and to go inside the particle volume.
Because the particles visual representations are not changed, the system simplycalls the OpenGL display lists created while doing particle update operations, pro-
viding a high interactive frame rate.
• Particle update operations change the particles attributes and build the OpenGL
display lists. The particle attributes can be changed by modifying particles� posi-tions, colors and alpha channels using different color maps and/or different color
mapping methods. Particle update operations allow users to explore the overall
structure and fine features of the particle volume, and to study a specific region
of interest in the particle volume.
Table 1 shows system timings for individual navigation operations on 4, 5, 6, and 7
render nodes. The time shown in the table is taken during a series of navigation oper-
ations. Therefore, it includes only one data acquisition and image compositing time
that is independent on the number of render nodes. Table 2 shows time used for
different particle update operations.
As shown in Table 1, the time used for navigation operations remains the same,
allowing interactive flying-through the particle volume. This is because each rendernode processes about 4 million particles and it takes very little time to update a frame
by calling the already built OpenGL display lists. Our experience showed that the
number of display lists, 1024 in this test, is crucial to obtain this performance on
the given graphics card. The performance would drop dramatically if fewer display
lists are used or the number of particles on each node is over a certain limit. This can
be seen from Fig. 7 as the navigation operation time drops rapidly from about 0.78 s
on 1 render node to about 0.112 s on 3 render nodes.
Table 1
Navigation operation time (s) on 2563 particle dataset
Operation Nodes
4 5 6 7
Zoom in/out 0.1119 0.1127 0.1123 0.1123
Rotate 0.1119 0.1127 0.1124 0.1123
Pan/translate 0.1119 0.1127 0.1125 0.1124
Table 2
Particle update operation time (s) on 2563 particle dataset
Operation Nodes
4 5 6 7
Particle update 3.05 2.513 2.06 1.56
Changing color map 3.06 2.46 2.15 1.60
Sorting by color 3.21 2.47 2.15 1.58
Changing color mapping 3.20 2.50 2.06 1.60
![Page 12: Interactive parallel visualization of large particle datasets](https://reader035.fdocuments.us/reader035/viewer/2022080310/57501e201a28ab877e8f127a/html5/thumbnails/12.jpg)
0.80
0.70
Time (sec)
acquisition and composition time
particle rendering time
overall operation time
1 2
0.10
0.20
0.30
0.40
0.50
0.60
Nodes3 4 65 7
Fig. 7. Navigation operation time (s) vs. the number of render nodes.
254 K. Liang et al. / Parallel Computing 31 (2005) 243–260
On the other hand, as shown in Table 2, particle update operations take much
longer time than navigation operations because they need to build the display lists
for rendering all the particles. Their performance is totally dependent on the render-ing capabilities of each render node. The less the number of particles at each render
node, the faster the operation.
In practice, scientists are more interested in navigation operations because these
operations provide sufficient power for them to study the particle dataset fromdifferent
view points, to fly through the particle cube, and to dive into specific regions for de-
tailed analysis. Based on the data in Table 1, a frame rate of 1/0.112 = 8.928 fps is ob-
tained for navigation operations on the 2563 particle dataset using 4 or more render
nodes. This result is very satisfactory for interactive visualization of 2563 particle data-sets. Operations such as rotation, zooming in and out, and panning run very smoothly.
Fig. 7 illustrates navigation operation time against the number of nodes. As seen
in the figure, the rendering time drops as the number of nodes increases from 1 to 3
nodes. From 3 to 7 nodes, the rendering time is close to 0 and the overall navigation
operation time is dominated by the acquisition and compositing time. For particle
update operations, as shown in Fig. 8, the rendering time drops as the number of ren-
der nodes is increased. This illustrates that the presented method is scalable for sys-
tems with up to 7 render nodes. The particle update operation time is dependent onthe graphics processing capability at each render node. The performance gain is
mainly from the speedup of individual nodes. For instance, for system running on
4 and 6 nodes, each node takes (64 � 42) * 2562 (or 34%) less particles in 6 nodes
rendering than in 4 nodes rendering, and, therefore, the overall performance is
improved by about 30%.
![Page 13: Interactive parallel visualization of large particle datasets](https://reader035.fdocuments.us/reader035/viewer/2022080310/57501e201a28ab877e8f127a/html5/thumbnails/13.jpg)
8.00
Time (sec)
acquisition and composition time
particle rendering time
overall operation time
7.00
1 2
2.00
3.00
4.00
5.00
6.00
1.00
Nodes3 4 5 6 7
Fig. 8. Particle update operation time (s) vs. the number of render nodes.
Table 3
Time (s) split for navigation operations
Operation Nodes
4 5 6 7
Rendering particles 0.0007 0.0006 0.0006 0.0006
Acquisition and composition 0.1082 0.1121 0.1118 0.1118
K. Liang et al. / Parallel Computing 31 (2005) 243–260 255
In Table 3, the navigation operation time is split into particle rendering time and
image data acquisition and composition time for system running on 4, 5, 6, and 7
render nodes. The rendering time, as mentioned earlier, is the time used for calling
a set of display lists built in a particle update operation. The data show that, with4 or more render nodes, the navigation operation time is completely dependent on
the acquisition and compositing time, which is constrained by the Sepia hardware.
In conclusion, the performance of the current system is basically determined by
the number of render nodes, especially for particle update operations. The image data
acquisition and composition time at each node is dependent on the Sepia data trans-
fer network and data compositing engine. When using more than 3 render nodes, the
Sepia data acquisition and compositing engine has been identified as the bottleneck
for navigation operations of the system.
3.4. Minimization of data distribution for subset visualization
A dynamic load balance and data redistribution method, as described in Section
2.2, is used to highlight sub-volume of a large particle dataset. Table 4 shows the
![Page 14: Interactive parallel visualization of large particle datasets](https://reader035.fdocuments.us/reader035/viewer/2022080310/57501e201a28ab877e8f127a/html5/thumbnails/14.jpg)
Fig. 9. Subset particle distribution for uniformly distributed datasets on four render nodes. Left: initial
distribution and a central subset, Center: moving subset in Y direction, Right: expanding subset in X
direction.
Table 4
Comparison between static and dynamic routing rendering schemes
Operation Routing Nodes involved Particles transferred
Static Dynamic Static Dynamic
Initial sub-volume 4 3 156,700 51,474
Changing X 4 3 85,314 797
Changing Y 4 2 78,099 871
Changing Z 4 3 151,125 45,667
Expanding X 4 3 187,540 6396
Expanding Y 4 3 168,058 3165
Expanding Z 4 3 161,913 3854
Table 5
Subset particle distributions of subset manipulation among 6 render nodes
Operation Nodes
0 1 2 3 4 5
Initial sub-volume 0 14,878 105,226 36,596 0 0
Moving along X 0 60 84,517 737 0 0
Moving along Y 0 0 77,228 871 0 0
Moving along Z 0 0 105,458 34,603 11,064 0
Expanding X 0 5537 181,143 860 0 0
Expanding Y 0 1577 164,893 1588 0 0
Expanding Z 0 627 158,059 3227 0 0
256 K. Liang et al. / Parallel Computing 31 (2005) 243–260
number of nodes and the number of particles involved in the data redistribution
using both static and dynamic routing rendering schemes. Table 5 shows the distri-bution of particles inside a given subset on typical subset operations, such as initial
setup, changing the subset position, and changing the size of the subset.
In general, as illustrated in Table 5, more render nodes are involved in the manip-
ulation of the subset visualization using the static routing rendering scheme than
using the dynamic routing rendering scheme. For an initial rendering node chain
![Page 15: Interactive parallel visualization of large particle datasets](https://reader035.fdocuments.us/reader035/viewer/2022080310/57501e201a28ab877e8f127a/html5/thumbnails/15.jpg)
K. Liang et al. / Parallel Computing 31 (2005) 243–260 257
shown in Fig. 2, all particles inside the subset need to be transferred to node 0, in
exchange with particles outside the subset in order to maintain the load balance
among all render nodes. For instance, to move a given subset [(�0.25,�0.25,
�0.25), (0.25,0.25,0.25)] along X axis from its initial center position (0,0,0) to
(0.04,0,0), static routing rendering scheme needs to transfer all particles inside thenew subset on render nodes 1, 2 and 3 to node 0 and to receive corresponding num-
ber of particles outside the subset from node 0. This involves exchanging 60 particles
between nodes 0 and 1, 84,517 particles between nodes 0 and 2, and 737 particles be-
tween nodes 0 and 3. The total number of particles transferred for this operation is
170,628. In contrast, the dynamic routing rendering scheme requires only 60 particles
exchanged between nodes 1 and 2 and 737 particles between nodes 3 and 2. The
number of particles transferred for the operation is 1594, which is only 0.93% of
the particles transferred in static routing rendering scheme. The worst case is to movethe subset along Z axis, in which 45,667 particles, about 30% of the particles to be
transferred in static routing rendering scheme, need to be transferred among 3 render
nodes. This is due to the clustering property of the dataset.
Although the data shown in Tables 4 and 5 were based on a single operation on
the subset [(�0.25,�0.25,�0.25), (0.25,0.25,0.25)] at its initial center position,
experiment showed that it is arbitrary to define a subset at any position within the
dataset volume using dynamic routing. In fact, the system performance is the same
while the given subset is being moved across the volume.There are cases in which there might be no difference between static routing and
dynamic routing if the first render node contains all the subset particles. Subsequent
moving or changing of the subset, however, will change the data distribution and
massive data redistribution will be required using static routing scheme. In particu-
lar, such moving or changing of the subset may result in that the first render node
contains no subset particles. As a result, less particles will be involved in data redis-
tribution in dynamic routing than in static routing, similar to the case illustrated in
Tables 4 and 5.For uniformly distributed datasets, all render nodes may be configured to have
equal number of particles within a given initial subset using static data distribution
strategy, which may imply that static routing would not have any disadvantages over
dynamic routing. Such uniformity of subset particle distribution is immediately bro-
ken as soon as the subset is either moved or its size is changed. For instance, in a four
render node system configuration, as illustrated in Fig. 9, a quarter of particles can
be initially allocated to each render node and a subset at the volume center contains
equal number of particles from four render nodes. This balance will be broken whenthe subset moves in X or Y directions, or changes its size in either X or Y directions,
but not both. At such point, one render node will contain more subset particles than
others and will become the dominant subset render node. Since such dominant node
could be any of the four render nodes during interactive exploration of the dataset,
subset particles at this node will have to be redistributed, resulting in more data
transferring in static routing than dynamic routing.
In a Sepia-based system, the overhead introduced by re-arranging the rendering
chain is very small, the dynamic routing rendering method, therefore, greatly
![Page 16: Interactive parallel visualization of large particle datasets](https://reader035.fdocuments.us/reader035/viewer/2022080310/57501e201a28ab877e8f127a/html5/thumbnails/16.jpg)
258 K. Liang et al. / Parallel Computing 31 (2005) 243–260
enhanced the performance of the system. In most cases, the number of particles
involved in data redistribution is only a small percentage of the total number of par-
ticles in the subset. Assuming that it takes the same amount of time for each particle
to be transferred, based on Table 4, the time used for particle data redistribution
using the dynamic routing rendering scheme is, in general, less than 3% of that usingstatic routing rendering scheme.
3.5. Results on 5123 particle dataset
A preliminary test on 5123 particle dataset has been done using 6 and 7 render
nodes and a display node. As shown in Table 6, the system takes much longer time
to perform the particle update operations than on 2563 particles. For a 5123 particles
dataset, each render node processes 8 times more particles. The large number of par-ticles pushes the system to the limit of both memory and graphics processing
capabilities.
Similarly, time used for navigation operations is greatly increased as well. The
number of particles on each render node is much more over the limit at which the
performance of calling OpenGL display lists drops dramatically. This is similar to
the 2563 case when the system runs on 1 or 2 render nodes.
The frame rate is about 1 fps for navigation operations when the system is running
on 7 render nodes. The more render nodes, the faster the rendering operations.To be able to navigate the particle volume smoothly, a much higher frame rate for
5123 particle datasets is expected. Several techniques are currently under investiga-
tion. First, current system uses a serial mode for data acquisition and image compo-
sition, i.e. image composition process starts only after the data acquisition process
has finished. Using a parallel mode will reduce about 50% combined time for data
acquisition and image composition. Second, a vertex program is to be applied to ren-
der particle points directly using the graphics processor. This may considerably
increase the rendering speed and relieve the graphics memory stress. Third, advanceddata structure, such as the hierarchical data structure in [4], for representing the par-
ticle volume is being investigated. In this case, software based rendering pre-process
can be done to reduce the workload on the graphics processor.
3.6. Limitations
The method presented in this paper renders all particles using uniform rendering
primitives. Either all particles are rendered as opaque points or some of them are
Table 6
Operation time (s) on 5123 particle datasets
Operation Nodes
6 7
Navigation 1.157 1.007
Particle update 189.5 154.58
![Page 17: Interactive parallel visualization of large particle datasets](https://reader035.fdocuments.us/reader035/viewer/2022080310/57501e201a28ab877e8f127a/html5/thumbnails/17.jpg)
K. Liang et al. / Parallel Computing 31 (2005) 243–260 259
rendered as transparent points. This may make it difficult to identify exact particle
structures when particles in a given dataset are highly clustered. Although different
color mapping methods and subset visualization technique help to highlight partic-
ular part of the particles, the overall structure of the particle dataset, especially inter-
nal structure, is hard to be recognized. In this case, we may want to render allparticles as transparent points. This will allow us to visualize the overall particles
structure and internal features in the same way as in texture mapping based volume
rendering method.
4. Conclusions and future work
We have presented both static and dynamic data distribution schemes and directparticle visualization techniques for interactively viewing large particle datasets using
the Sepia rendering cluster. The results on 2563 particle dataset show that the system
provides an interactive rate at about 9 fps for common visualization tasks, such as
zooming, rotation, and translation. The system allows for interactively changing par-
ticles� visual properties and highlighting particular regions of interests for detailed
analysis. An 1 fps is obtained at an initial test on 5123 particle dataset using 6 or 7
render nodes and a display node. Using a dynamic routing rendering scheme, less ren-
der nodes are involved in particle data redistribution and only a small percentage ofthe particles inside the subset is required to be transferred between render nodes. In
general, the time used for particle data redistribution using the dynamic routing ren-
dering scheme is usually less than 3% of that using static routing rendering scheme.
The most challenging task is to improve the frame rate for the 5123 particle data-
sets. We are currently investigating a number of techniques. These include parallel-
ization of the data acquisition and image compositing, integrating better data
structures for representing particle volumes, and applying other rendering optimiza-
tion techniques.
Acknowledgement
The authors would like to thank Hewlett Packard for providing the Sepia system
and technical support to the system. We would also like to thank the department of
Research and High Performance Computing Support and SharcNet at McMaster
University for the use of the Sepia cluster. We are very grateful for James Wadsleyfor his particle dataset and helpful discussions on many practical particle visualiza-
tion issues from the point of view of physics scientists. We appreciate very much to
the anonymous reviewers for their constructive suggestions.
References
[1] B. Cabral, N. Cam, J. Foran, Accelerated volume rendering and tomographic reconstruction using
texture mapping hardware, in: 1994 Symposium on Volume Visualization, ACM Press, NY, USA,
1994, ISBN 0-89791-741-3.
![Page 18: Interactive parallel visualization of large particle datasets](https://reader035.fdocuments.us/reader035/viewer/2022080310/57501e201a28ab877e8f127a/html5/thumbnails/18.jpg)
260 K. Liang et al. / Parallel Computing 31 (2005) 243–260
[2] A. Heirich, L. Moll, Scalable distributed visualization using off-the-shelf components, in: 1999 IEEE
Symposium on Parallel and Large-data Visualization and Graphics, ACM Press, NY, USA, 1999, pp.
55–59.
[3] M. Hopf, T. Ertl, Hierarchical splatting of scattered data, in: Visualization �03, IEEE, 2003, pp. 433–440.
[4] M. Hopf, M. Luttenberger, T. Ertl, Hierarchical splatting of scattered 4d data, Computer Graphics
and Applications 24 (4) (2004) 64–72.
[5] A. Jenkins, C.S. Frenk, F.R. Pearce, P.A. Thomas, J.M. Colberg, S.D.M. White, H.M.P. Couchman,
J.A. Peacock, G. Efstathiou, A.H. Nelson, The Virgo Consortium, Evolution of structure in cold
dark matter universes, Astrophysical Journal 499 (1998) 20–40.
[6] S. Lombeyda, L. Moll, M. Shand, D. Breen, A. Heirich, Scalable interactive volume rendering using
off-the-shelf components, in: IEEE 2001 Symposium on Parallel and Large-data Visualization and
Graphics, IEEE Press, NJ, USA, 2001, pp. 115–121.
[7] M. Meissner, U. Hoffmann, W. Straßer, Enabling classification and shading for 3d texture mapping
based volume rendering using opengl and extensions, in: IEEE Visualization �99, IEEE, 1999, pp.207–214.
[8] L. Moll, A. Heirich, M. Shand, epia: scalable 3D compositing using PCI Pamette, in: K.L. Pocek, J.
Arnold (Eds.), IEEE Symposium on FPGAs for Custom Computing Machines, IEEE Computer
Society Press, Los Alamitos, CA, 1999, pp. 146–155.
[9] D.R. Nadeau, J.D. Genetti, S. Napear, B. Pailthorpe, C. Emmart, E. Wesselak, D. Davidson,
Visualizing stars and emission nebulas, in: Proc. Eurographics �2000, Eurographics, October 2000.
[10] P.M. Papadopoulos, M.J. Katz, G. Bruno, NPACI Rocks: tools and techniques for easily deploying
manageable Linux clusters, Concurrency and Computation: Practice and Experience 15 (7–8) (2003)
707–725.
[11] S. Rusinkiewwicz, M. Levoy, QSplat: a multiresolution point rendering system for large meshes,
in: SIGGRAPH �00, ACM SIGGRAPH, 2000, pp. 343–352.
[12] Y.-P. Shan, An event-driven model-view-controller framework for smalltalk, SIGPLAN Not. 24 (10)
(1989) 347–352.
[13] J.W. Wadsley, J. Stadel, T. Quinn, Gasoline: a flexible, parallel implementation of TreeSPH, New
Astronomy 9 (February) (2004) 137–158.
[14] B. Wilson, K.-L. Ma, J. Qiang, R. Ryne, Interactive visualization of particle beams for accelerator
design, in: ICCS 2002 Workshop on High Performance Computing in Particle Accelerator Science
and Technology, pp. 352–361. 2002 International Conference on Computational Science, April 21–24
2002. Amsterdam, The Netherlands.
[15] M. Zwicker, H. Phister, J. van Baar, M. Gross, EWA volume splatting, in: Visualization �01, IEEE,2001, pp. 29–36.