EUROGRAPHICS ’02 STAR – State of The Art Report

Interactive High-Quality Volume Rendering with Flexible Consumer Graphics Hardware

Klaus Engel and Thomas Ertl

Visualization and Interactive Systems Group, University of Stuttgart

Abstract

Recently, the classic rendering pipeline in 3D graphics hardware has become flexible by means of programmable geometry engines and rasterization units. This development is primarily driven by the mass market of computer games and entertainment software, whose demand for new special effects and more realistic 3D environments induced a reconsideration of the once static rendering pipeline. Besides the impact on visual scene complexity in computer games, these advances in flexibility provide an enormous potential for new volume rendering algorithms. Thereby, they enable previously unseen quality as well as improved performance for scientific visualization and make it possible to visualize hidden features contained within volumetric data.

The goal of this report is to deliver insight into the new possibilities that programmable state-of-the-art graphics hardware offers to the field of interactive, high-quality volume rendering. We cover different slicing approaches for texture-based volume rendering, non-polygonal iso-surfaces, dot-product shading, environment-map shading, shadows, pre- and post-classification, multi-dimensional classification, high-quality filtering, pre-integrated classification and pre-integrated volume rendering, large volume visualization, and volumetric effects.

1. Introduction

The massive innovation pressure exerted on the manufacturers of PC graphics hardware by the computer games market has led to very powerful consumer 3D graphics accelerators. The latest development in this field is the introduction of programmable graphics hardware that allows computer game developers to implement new special effects and more complex and compelling scenes. As a result, the current state-of-the-art graphics chips, e.g. the NVIDIA GeForce4 or the ATI Radeon 8500, are not only competing with professional graphics workstations but often surpass the abilities of such hardware regarding speed, quality, and programmability. Therefore, this kind of hardware is becoming increasingly attractive for scientific visualization. Due to the new programmability offered by graphics hardware, many popular visualization algorithms are now being mapped efficiently onto graphics hardware. Moreover, this innovation leads to completely new algorithms. Especially in the field of volume graphics, which has always had very high computation and resource requirements, the last few years have brought a large number of new and optimized algorithms. In return for the innovations delivered by the gaming industry, scientific volume visualization can provide new optimized algorithms for volumetric effects in entertainment applications.

This report covers the newest volume rendering algorithms and their implementation on programmable consumer graphics hardware. Specialized volume rendering hardware [34, 30] is not the focus of this report. If details on the implementation of a specific algorithm are given, we focus on OpenGL and its extensions [32], although Microsoft's DirectX/Direct3D API provides similar functionality for a comparable implementation.

In this report, we restrict ourselves to volume data defined on rectilinear grids. In such grids, the volume data are comprised of samples located at grid points, which are equispaced along each volume axis, and can therefore easily be stored in a texture map. We will start with some basic, yet important, considerations that are required to produce satisfying visualization results. After a brief introduction to the new features of consumer graphics hardware, we will outline the basic principle of texture-based volume rendering with different slicing approaches. Subsequently, we introduce several optimized algorithms that represent the state of the art in hardware-accelerated volume rendering. Finally, we summarize current restrictions of consumer graphics hardware and give some conclusions with an outlook on future developments.

2. Basics

Although volumetric data is defined over a continuous three-dimensional domain (R³), measurements and simulations provide volume data as 3D arrays, where each of the scalar values that comprise the volume data set is obtained by sampling the continuous domain at a discrete location. These values are referred to as voxels (volume elements) and are usually quantized to 8, 16, or 32 bit accuracy and saved as fixed-point or floating-point numbers. Figure 1 shows such a representation of a volumetric object as a collection of a large number of voxels.

Figure 1: Voxel representation of a volumetric object after it has been discretized.

2.1. Reconstruction

In order to reconstruct [25, 31] the original continuous signal from the voxels, a reconstruction filter is applied that calculates a scalar value for the continuous three-dimensional domain (R³) by performing a convolution of the discrete function with a filter kernel. It has been proved that the "perfect", or ideal, reconstruction kernel is provided by the sinc filter [33]:

\[ \mathrm{sinc}(x) = \frac{\sin(\pi x)}{\pi x} \quad (1) \]

As this reconstruction filter has an unlimited extent, in practice simpler reconstruction filters like tent or box filters are applied (see Figure 2). Current graphics hardware provides linear, bilinear, and trilinear filtering for magnification and pre-filtering with mip-mapping and anisotropic filters for minification. However, due to the availability of multi-textures and flexible per-fragment operations, consumer graphics hardware also allows filters with higher quality (see Chapter 7).
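As a point of reference for the tent filter just mentioned, the following sketch (our own illustration, not code from the report) shows the trilinear reconstruction that graphics hardware performs for 3D-texture magnification; the Volume structure, its row-major memory layout, and the function names are assumptions made for this example, and bounds checking is omitted.

```cpp
#include <cmath>
#include <cstddef>

struct Volume {
    const float* data;     // voxel values, x varies fastest
    int nx, ny, nz;
    float at(int x, int y, int z) const {
        return data[(std::size_t(z) * ny + y) * nx + x];
    }
};

// Tent (linear) filter applied along each axis: eight neighbors, seven lerps.
float trilinear(const Volume& v, float x, float y, float z) {
    int x0 = int(std::floor(x)), y0 = int(std::floor(y)), z0 = int(std::floor(z));
    float fx = x - x0, fy = y - y0, fz = z - z0;
    auto lerp = [](float a, float b, float t) { return a + t * (b - a); };
    float c00 = lerp(v.at(x0, y0, z0),         v.at(x0 + 1, y0, z0),         fx);
    float c10 = lerp(v.at(x0, y0 + 1, z0),     v.at(x0 + 1, y0 + 1, z0),     fx);
    float c01 = lerp(v.at(x0, y0, z0 + 1),     v.at(x0 + 1, y0, z0 + 1),     fx);
    float c11 = lerp(v.at(x0, y0 + 1, z0 + 1), v.at(x0 + 1, y0 + 1, z0 + 1), fx);
    return lerp(lerp(c00, c10, fy), lerp(c01, c11, fy), fz);
}
```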

Figure 2: Three reconstruction filters: (a) box, (b) tent, and (c) sinc filters.

Given a reconstruction filter and the three-dimensional array of voxels, the data is visualized by sending rays from the eye through each pixel of the image plane through the volume and sampling the data with a constant sampling rate (ray-casting) [23]. According to the Nyquist theorem, the original signal can be reproduced as long as the sampling rate is at least twice that of the highest frequency contained in the original signal. This is, of course, a concern of the volume acquisition during the measurements and simulations, so this problem does not concern us here. In the same manner, the sampled signal may be reconstructed as long as the sampling rate along the rays is at least twice the highest frequency contained in the sampled volume. As a consequence of the Nyquist theorem we have to choose a sampling rate that satisfies these needs. However, we will show in Chapter 8 that, due to high frequencies introduced during the classification, which is applied to each filtered sample of the volume data set, it is in general not sufficient to sample the volume with the Nyquist frequency of the volume data. Because of these observations we will later present a classification scheme that allows us to sample the volume with the Nyquist frequency of the volume data by pre-integrating ray segments in a pre-processing step (see Section 8).

2.2. Classification

Classification is another crucial subtask in the visualization of volume data, i.e. the mapping of volume samples to RGBA values. The classification step is introduced by transfer functions for color densities c̃(s) and extinction densities τ(s), which map scalar values s = s(x) to colors and extinction coefficients. The order of classification and filtering strongly influences the resulting images, as demonstrated in Figure 3. The image shows the results of pre- and post-classification for the example of a 16³ voxel hydrogen orbital volume and a high-frequency transfer function for the green color channel.

It can be observed that pre-classification, i.e. classification before filtering, does not reproduce high frequencies in the transfer function. In contrast, post-classification, i.e. classification after filtering, reproduces high frequencies in the transfer function on the slice polygons.
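Written out with a compact (and purely illustrative) shorthand, where \(\mathcal{I}\) denotes the filtering/interpolation step and τ one of the transfer functions defined above, the two orders are

\[ \text{pre-classification:}\; \big(\mathcal{I}\,\tau(s)\big)(\vec{x}), \qquad \text{post-classification:}\; \tau\big((\mathcal{I}\,s)(\vec{x})\big). \]

Because τ is in general non-linear, the two expressions are not interchangeable: pre-classification interpolates already classified values and therefore cannot restore sharp features of the transfer function.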

2.3. Ray Integration

Figure 3: Comparison of pre-classification and post-classification. Alternate orders of classification and filtering lead to completely different results. For clarification a random transfer function is used for the green color channel. Piecewise linear transfer functions are employed for the other color channels. Note that, in contrast to pre-classification, post-classification reproduces the high frequencies contained within the transfer function.

After the filtering and classification of the volume data, an integration along viewing rays through the volume is required [23, 27]. A common approximation used in the visualization of volume data is the density-emitter model, which assumes that light is emitted and absorbed at each point in the volume while neglecting scattering and frequency-dependent effects.

We denote a ray cast into the volume by x(t), and parameterize it by the distance t to the eye. The scalar value corresponding to this position on a ray is denoted by s(x(t)). Since we employ an emission-absorption optical model, the volume rendering integral we are using integrates absorption coefficients τ(s(x(t))) (accounting for the absorption of light) and colors c(s(x(t))) (accounting for light emitted by particles) along a ray.

The volume rendering integral can now be used to obtain the integrated "output" color C, subsuming both color (emission) and opacity (absorption) contributions along a ray up to a certain distance D into the volume (see Figure 4):

\[ C = \int_0^D c\big(s(\vec{x}(t))\big)\, e^{-\int_0^t \tau\left(s(\vec{x}(t'))\right)\,dt'}\, dt \quad (2) \]

The equation states that at each position in the volume, light is emitted according to the term c(s(x(t))), and is absorbed by the volume at all positions along the ray in front of the light emission position according to the term τ(s(x(t'))).


Figure 4: Integration along a ray through the volume.

By discretizing the integral with a constant sampling distance d between successive samples along the ray, we can now introduce the opacity values A, "well known" from alpha blending, by defining

\[ A_i = 1 - e^{-\tau\left(s(\vec{x}(i\,d))\right)\, d} \quad (3) \]

Similarly, the color (emission) of the i-th ray segment can be approximated by:

\[ C_i = c\big(s(\vec{x}(i\,d))\big)\, d \quad (4) \]

Having approximated both the emissions and absorptions along a ray, we can now state the approximate evaluation of the volume rendering integral as (denoting the number of samples by n = D/d):

\[ C_{\mathrm{approx}} = \sum_{i=0}^{n} C_i \prod_{j=0}^{i-1} (1 - A_j) \quad (5) \]

Equation 5 can be evaluated iteratively by alpha blending [36, 3] in either back-to-front or front-to-back order. The following iterative formulation evaluates Equation 5 in back-to-front order by stepping i from n-1 to 0:

\[ C'_i = C_i + (1 - A_i)\, C'_{i+1} \quad (6) \]
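Equation 6 maps directly onto a simple compositing loop. The following sketch is our own single-channel illustration (assuming pre-computed arrays of the opacity-weighted colors C_i and opacities A_i for one viewing ray); it mirrors what back-to-front alpha blending computes per fragment:

```cpp
#include <vector>

// colors[i] and alphas[i] hold C_i and A_i for the samples of one viewing ray.
// Back-to-front evaluation of Equation 6: C'_i = C_i + (1 - A_i) * C'_{i+1}.
float compositeBackToFront(const std::vector<float>& colors,
                           const std::vector<float>& alphas) {
    float accumulated = 0.0f;                              // start behind the last sample
    for (int i = static_cast<int>(colors.size()) - 1; i >= 0; --i)
        accumulated = colors[i] + (1.0f - alphas[i]) * accumulated;
    return accumulated;                                    // C'_0: the final ray color
}
```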

2.4. Shading

In the above considerations we did not take the effects of external light sources into account. Instead, we assumed a simple shading, i.e. we identified the primary color assigned in the classification with c(s(x(t))).

The most popular local illumination model is the Phong model [35, 4], which computes the lighting as a linear combination of three different terms, an ambient, a diffuse, and a specular term:

\[ I_{\mathrm{Phong}} = I_{\mathrm{ambient}} + I_{\mathrm{diffuse}} + I_{\mathrm{specular}} \quad (7) \]

Ambient illumination is modeled by a constant term, \(I_{\mathrm{ambient}} = k_a = \mathrm{const}\). Without the ambient term, parts of the geometry that are not directly lit would be completely black. In the real world such indirect illumination effects are caused by light reflected from other surfaces.

Diffuse reflection refers to light which is reflected with equal intensity in all directions (Lambertian reflection). The brightness of a dull, matte surface is independent of the viewing direction and depends only on the angle of incidence φ between the direction \(\vec{l}\) of the light source and the surface normal \(\vec{n}\). The diffuse illumination term is written as \(I_{\mathrm{diffuse}} = I_p\, k_d \cos\varphi = I_p\, k_d\, (\vec{l} \cdot \vec{n})\). Here, \(I_p\) is the intensity emitted from the light source, and the surface property \(k_d\) is a constant between 0 and 1 specifying the amount of diffuse reflection as a material-specific constant.

Specular reflection is exhibited by every shiny surface and causes so-called highlights. The specular lighting term incorporates the vector \(\vec{v}\) from the object to the viewer's eye into the lighting computation. Light is reflected in the direction of reflection \(\vec{r}\), which is the direction of light \(\vec{l}\) mirrored about the surface normal \(\vec{n}\). For efficiency, the reflection vector \(\vec{r}\) can be replaced by the halfway vector \(\vec{h}\), giving \(I_{\mathrm{specular}} = I_p\, k_s \cos^n \alpha = I_p\, k_s\, (\vec{h} \cdot \vec{n})^n\). The material property \(k_s\) determines the amount of specular reflection. The exponent n is called the shininess of the surface and is used to control the size of the highlights.
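For reference, a minimal per-sample evaluation of Equation 7 could look as follows; the vector type, the clamping of negative dot products, and all names are our own assumptions for this sketch, with the normal, light, and halfway vectors assumed to be normalized:

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };
inline float dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// I_Phong = I_ambient + I_diffuse + I_specular for one sample (Equation 7).
float phong(const Vec3& n, const Vec3& l, const Vec3& h,
            float ka, float kd, float ks, float Ip, float shininess) {
    float ambient  = ka;                                                       // constant term
    float diffuse  = Ip * kd * std::max(0.0f, dot(l, n));                      // Lambertian term
    float specular = Ip * ks * std::pow(std::max(0.0f, dot(h, n)), shininess); // highlight
    return ambient + diffuse + specular;
}
```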

3. Programmable Consumer Graphics Hardware

The typical architecture of today's consumer graphics processing units (GPUs) is characterized by a configurable rendering pipeline employing an object-order approach, which applies several geometric transformations to primitives like points, lines, and polygons (geometry processing) before the primitives are rasterized (rasterization) and written into the frame buffer after several fragment operations (see Figure 5). The design as a strict pipeline with only local knowledge allows many optimizations; for example, the geometry stage is able to work on new vertex data while the rasterization stage is processing earlier data. However, the current trend is to replace the concept of a configurable pipeline with the concept of a programmable pipeline. The programmability is introduced in the vertex processing stage as well as in the rasterization stage. We will mainly be concerned with programmable units in the rasterization stage, since the major tasks in texture-based volume rendering are done in this stage, i.e. in volume rendering the geometry processing of the proxy geometry is very simple while the rasterization is quite complex.


Figure 5: The graphics pipeline that is used in common PC consumer graphics hardware.

At the time of this writing (summer 2002), the two most important vendors of programmable GPUs are NVIDIA and ATI. The current state-of-the-art consumer graphics processors are the NVIDIA GeForce4 and the ATI Radeon 8500. In the following, we will focus on the programmable per-fragment operation extensions of these two manufacturers.

Traditional lighting calculations were done on a per-vertex basis, i.e. the primary (or diffuse) color was calculated at the vertices and linearly interpolated over the interior of a triangle by the rasterizer. One of the first possibilities for per-pixel shading calculations was introduced for computer games that employ pre-calculated light maps and decal maps for global illumination effects in walkable 3D scenes. The texels of the decal texture map are modulated with the texels of the light texture map. This model requires applying two texture maps to a polygon, i.e. multi-textures are needed. The way the multiple textures are combined into a single RGBA value in traditional OpenGL multi-texturing is defined by means of several hard-wired texture environment modes.

The OpenGL multi-texturing pipeline has various drawbacks; in particular, it is very inflexible and cannot accommodate the capabilities of today's consumer graphics hardware. Starting with the original NVIDIA register combiners, which are comprised of a register-based execution model and programmable input and output routing and operations, the current trend is toward writing a fragment shader in an assembly language that is downloaded to the graphics hardware and executed for each fragment.

The NVIDIA model for programmable fragment shading currently consists of two distinct stages: texture shaders [14] and register combiners [12].

Figure 6: Comparison of traditional multi-texturing and multi-texturing using NVIDIA's texture shader concept.

Texture shaders are the interface for programmable texture fetch operations. For each of the four multi-textures a fetch operation can be selected from one of 23 pre-defined texture shader programs, whereas the GeForce4 offers 37 different such programs [15]. In contrast to traditional multi-texturing, the results of a previous texture fetch may be employed as a parameter for a subsequent texture fetch (see Figure 6). Additionally, each texture fetch operation is capable of performing a math operation before the actual texture fetch. A very powerful option is dependent texture fetches, which use the results of previous texture fetches as texture coordinates for subsequent texture fetches. An example of one of these texture shader programs is dependent alpha-red texturing, where the texture unit for which this mode is selected takes the alpha and red outputs from a previous texture unit as 2D texture coordinates for a dependent texture fetch in the texture bound to the texture unit. Thus it performs a dependent texture fetch, i.e. a texture fetch operation that depends on the outcome of a fetch executed by another unit.

The major drawback of the texture shader model is that it requires the use of one of several fixed-function programs instead of allowing arbitrary programmability.

After all texture fetch operations have been executed (either by standard OpenGL texturing or using texture shaders), the register combiners mechanism may be used for flexible color combination operations, employing a register-based execution model. The register combiners interface is exposed through two OpenGL extensions: GL_NV_register_combiners [12] and GL_NV_register_combiners2 [13].


Figure 7: A general combiner stage of NVIDIA's register combiner unit.

Current NVIDIA GPUs feature eight general and one final combiner stage. Each general combiner stage (see Figure 7) has four input variables (A, B, C, and D). These input variables can be occupied with one of the possible input parameters, e.g. the result of a texture fetch operation or the primary color. As all computations are performed in the range of -1 to 1, an input mapping is required for each variable that maps the input colors to that range. A maximum of three math operations may be computed per general combiner stage, e.g. a component-wise weighted sum AB + CD and two component-wise products AB and CD, or two dot products A·B and C·D. The results are modified by scale and bias operations and used in subsequent general combiner stages. The RGB and the alpha channels are handled independently, i.e. the alpha portion of each general combiner can perform a different math operation than the RGB portion.


Figure 8: The final combiner stage of NVIDIA's register combiner unit.

After all enabled general combiner stages have been executed, a single final combiner stage (see Figure 8) generates the final fragment color, which is then passed on to fragment tests and alpha blending.


In contrast to the NVIDIA approach, fragment shading on the Radeon 8500 uses a unified model that subsumes both texture fetch and color combination operations in a single fragment shader [8] program. The fragment shader interface is exposed through a single OpenGL extension: GL_ATI_fragment_shader.


Figure 9: The ATI fragment shader unit.

The fragment shader unit is divided into two (currently) or more passes, each of which may perform six texture sampling and routing operations followed by eight arithmetic calculations (see Figure 9). The second pass allows dependent texture fetches with the results from the first pass. Such results are stored in a set of temporary registers.

On the Radeon 8500, the input registers used by a fragment shader consist of six RGBA registers (GL_REG_0_ATI to GL_REG_5_ATI), corresponding to this architecture's six texture units. Furthermore, two interpolated colors and eight constant RGBA registers (GL_CON_0_ATI to GL_CON_7_ATI) are available to provide additional color input to a fragment shader.

Each of the six possible sampling operations can be occupied with a texture fetch from the corresponding texture unit. Each of the eight fragment operations can be occupied with one of the following instructions:

- MOV: Moves one register into another.
- ADD: Adds one register to another and stores the result in a third register.
- SUB: Subtracts one register from another and stores the result in a third register.
- MUL: Multiplies two registers component-wise and stores the result in a third register.
- MAD: Multiplies two registers component-wise, adds a third, and stores the result in a fourth register.
- LERP: Performs linear interpolation between two registers, getting interpolation weights from a third, and stores the result in a fourth register.
- DOT3: Performs a three-component dot product and stores the replicated result in a third register.
- DOT4: Performs a four-component dot product and stores the replicated result in a third register.
- DOT2_ADD: The same as DOT3, however the third component is assumed to be 1.0 and therefore not actually multiplied.
- CND: Moves one of two registers into a third, depending on whether the corresponding component in a fourth register is greater than 0.5.
- CND0: The same as CND, but the conditional is a comparison with 0.0.

The components of input registers to each of these instructions may be replicated, and the output can be masked for each component, which allows for flexible routing of color components. Scaling, bias, negation, complementation, and saturation (clamping against 0.0) are also supported. Furthermore, instructions are issued separately for RGB and alpha components, although a single pair of RGB and alpha instructions counts as a single instruction.

In general, it can be said that the ATI fragment shader model is much easier to use than the NVIDIA extensions providing similar functionality, and it also offers more flexibility with regard to dependent texture fetches. However, each model allows specific operations that the other is not able to perform.

4. Texture-Based Volume Visualization

As discussed in the previous section, current consumer graphics hardware is based on an object-order rasterization approach, i.e. primitives (polygons, lines, points) are scan-converted and written pixel-per-pixel into the frame buffer. Since the volume data does not consist of such primitives per se, a proxy geometry is defined for each individual slice through the volume data. Each slice is textured with the corresponding data from the volume. The volume is reconstructed during rasterization on the slice polygon by applying a convolution of the volume data with a filter kernel, as discussed in Section 2. The entire volume can be represented by a stack of such slices, if the number of slices satisfies the restrictions imposed by the Nyquist theorem.

There exist at least three different slicing approaches. View-aligned slicing in combination with 3D textures [5] is the most common approach (see Figure 10). In this case, the volume is stored in a single 3D texture, and three texture coordinates from the vertices of the slices are interpolated over the inside of the slice polygons. These three texture coordinates are employed during rasterization for fetching filtered texels from the 3D texture map.
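A minimal sketch of such a slicing loop is given below (our own illustration, not code from the report, assuming OpenGL 1.2 headers with 3D texturing): computeSlicePolygon() is a hypothetical helper that intersects a view-aligned plane with the volume bounding box and returns the polygon vertices in the volume's [0,1]³ object/texture space, and blending is configured for back-to-front compositing as in Equation 6.

```cpp
#include <GL/gl.h>
#include <vector>

struct Vertex3 { float x, y, z; };

// Hypothetical helper: clip a view-aligned plane at the given depth against the
// volume bounding box and return the resulting polygon (implementation omitted).
std::vector<Vertex3> computeSlicePolygon(float planeDepth);

void renderViewAlignedSlices(GLuint volumeTexture, int numSlices) {
    glEnable(GL_TEXTURE_3D);
    glBindTexture(GL_TEXTURE_3D, volumeTexture);
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);     // back-to-front compositing

    for (int i = 0; i < numSlices; ++i) {                  // back to front
        float depth = 1.0f - (i + 0.5f) / numSlices;
        std::vector<Vertex3> poly = computeSlicePolygon(depth);
        glBegin(GL_POLYGON);
        for (const Vertex3& v : poly) {
            glTexCoord3f(v.x, v.y, v.z);                   // trilinear fetch from the 3D texture
            glVertex3f(v.x, v.y, v.z);                     // proxy geometry in volume space
        }
        glEnd();
    }
    glDisable(GL_BLEND);
    glDisable(GL_TEXTURE_3D);
}
```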



Figure 10: The 3D texture-based approach uses view-aligned slices as proxy geometry.


Figure 11: The 2D texture-based approach uses object-aligned slices as proxy geometry.

If only 2D texture mapping is supported by the graphics hardware, i.e. the hardware is able to perform bi-linear interpolation, the slices have to be aligned orthogonally to one of the three major axes of the volume. For this so-called object-aligned slicing (see Figure 11), the volume data is stored in several two-dimensional texture maps. To prevent unfavorable alignments of the slices with the viewer's line of sight, which would allow the viewer to see in between individual slices, one slice stack for each major axis is stored. During rendering, the slice stack that is most perpendicular to the viewer's line of sight is chosen for rendering (see Figure 12).


Figure 12: Choosing the most perpendicular slice stack to the viewer's line of sight during object-aligned slicing. Between (C) and (D) another slice stack is chosen for rendering.

There are major disadvantages when using object-aligned slices with standard 2D texturing. First, the volume data must be tripled, which is critical especially in the context of the limited memory of consumer graphics hardware. Second, the number of slices that are rendered is limited to the resolution of the volume, because the insertion of interpolated slices would further increase the memory consumption. Typically, undersampling occurs most visibly on the side of the volume along the currently used major axis. Another disadvantage is that switching from one slice stack to another when rotating the volume leads to an abrupt change of the currently used sampling points, which becomes visible as a popping effect (see Figure 13). Finally, the distance between sampling points depends on the viewing angle, as outlined in Figure 14. A constant sampling distance is, however, necessary in order to obtain correct results.


Figure 13: Abrupt change of the location of sampling points when switching from one slice stack (A) to another (B).


Figure 14: The distance between adjacent sampling points depends on the viewing angle.

However, almost all of these disadvantages are circumvented by using multi-textures and programmable rasterization units. The basic idea is to render arbitrary, tri-linearly interpolated, object-aligned slices by mapping two adjacent texture slices to a single slice polygon by means of multi-textures [37]. The texture environment of the two 2D textures performs two bi-linear interpolations whilst the third interpolation is done in the programmable rasterization unit. This unit is programmed to compute a linear interpolation of two bi-linearly interpolated texels from the adjacent slices. A linear interpolation in the fragment stage is implementable on a wide variety of consumer graphics hardware architectures. A register combiner setup for the NVIDIA GeForce series is illustrated in Figure 15. The interpolation factor α (variable D) is mapped into a constant color register and inverted by means of a corresponding input mapping to obtain the factor 1 - α (variable B). The two slices that enclose the position of the slice to be rendered are configured as texture 0 (variable A) and texture 1 (variable C). The combiner is configured to calculate AB + CD; thus the final fragment contains the linearly interpolated result corresponding to the specified fractional slice position.

Figure 15: Register combiners configuration for the interpolation of intermediate slices.
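The following sketch shows how this configuration could be expressed with the GL_NV_register_combiners API (our own illustration following Figure 15; it assumes the extension's entry points and enums are available, e.g. via glext.h, and that the two slice textures are already bound to texture units 0 and 1). The alpha portion requires the analogous setup and is omitted here.

```cpp
#include <GL/gl.h>
#include <GL/glext.h>   // assumed to declare the NV_register_combiners entry points

void setupSliceInterpolationCombiners(float alpha) {
    // Constant color 0 carries the fractional slice position in all components.
    const GLfloat c0[4] = { alpha, alpha, alpha, alpha };
    glCombinerParameterfvNV(GL_CONSTANT_COLOR0_NV, c0);
    glCombinerParameteriNV(GL_NUM_GENERAL_COMBINERS_NV, 1);

    // A = slice i (texture 0), B = 1 - alpha, C = slice i+1 (texture 1), D = alpha.
    glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_A_NV,
                      GL_TEXTURE0_ARB, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
    glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_B_NV,
                      GL_CONSTANT_COLOR0_NV, GL_UNSIGNED_INVERT_NV, GL_RGB);
    glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_C_NV,
                      GL_TEXTURE1_ARB, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
    glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_D_NV,
                      GL_CONSTANT_COLOR0_NV, GL_UNSIGNED_IDENTITY_NV, GL_RGB);

    // Sum output AB + CD goes to spare 0.
    glCombinerOutputNV(GL_COMBINER0_NV, GL_RGB,
                       GL_DISCARD_NV, GL_DISCARD_NV, GL_SPARE0_NV,
                       GL_NONE, GL_NONE, GL_FALSE, GL_FALSE, GL_FALSE);

    // Final combiner: pass spare 0 through unchanged (A*B with B = 1, C = D = 0).
    glFinalCombinerInputNV(GL_VARIABLE_A_NV, GL_SPARE0_NV, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
    glFinalCombinerInputNV(GL_VARIABLE_B_NV, GL_ZERO, GL_UNSIGNED_INVERT_NV, GL_RGB);
    glFinalCombinerInputNV(GL_VARIABLE_C_NV, GL_ZERO, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
    glFinalCombinerInputNV(GL_VARIABLE_D_NV, GL_ZERO, GL_UNSIGNED_IDENTITY_NV, GL_RGB);

    glEnable(GL_REGISTER_COMBINERS_NV);
}
```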

With the aid of this extension it is now possible to freely adjust the sampling rate without increasing the required memory. Furthermore, by adapting the sampling distance to the viewing angle, the sampling rate is held constant, at least for orthogonal projections. There remains the problem of tripled memory consumption, yet this problem is solvable with a small modification of the above approach. For that purpose view-aligned slice polygons similar to those of the 3D texture-based approach are computed (see Figure 10). By intersecting view-aligned slices with a single object-aligned slice stack we obtain small stripes, each of which is bounded by two adjacent object-aligned slices (see Figure 16). Instead of a constant interpolation factor it is now necessary to linearly interpolate the factor from 0 to 1 from one slice to the adjacent slice on the stripe polygon. By enabling Gouraud shading for the stripe polygons and using primary colors of 0 and 1 for the corresponding vertices of the stripe polygon, the primary color is linearly interpolated and can therefore be employed as the interpolation factor. An appropriate register combiner configuration is shown in Figure 17.


Figure 16: Interpolation of view-aligned slices by rendering slab polygons.


Figure 17: Register combiners configuration for rendering tri-linearly interpolated stripe polygons.

Consequently, a volume rendering algorithm similar to the 3D texture-based approach is possible with 2D textures and simple per-fragment operations. The only remaining problem is the subdivision of the slice polygons into stripes and the larger number of resulting polygons to be rendered. In order to circumvent a potentially expensive calculation of the stripe polygon vertices, an analogous algorithm with object-aligned stripes can be implemented. Since volume rendering is usually fill-rate or memory-bandwidth bound and the number of polygons to be rendered is still quite limited for rendering stripe polygons, frame rates similar to a 3D texture-based algorithm are possible.


Figure 18: For perspective projections the sampling rate varies for each viewing ray (left: parallel projection, right: perspective projection).

A general disadvantage of a planar proxy geometry is the still varying sampling distance inside the volume for each viewing ray when using perspective projections (see Figure 18). This problem is circumventable by rendering concentric spherical shells around the camera position [21, 9] (see Figure 19). These shells are generated by clipping tessellated spheres against the viewing frustum and the volume bounding box. However, the setup and rendering of shells is more complicated than rendering planar slice polygons. Since the pixel-to-pixel difference due to unequal sampling distances is often not visible, shells should only be used for extreme perspectives, i.e. large fields of view.

All of the above slicing approaches render infinitesimally thin slices into the framebuffer, whereby the number of slices determines the sampling rate. In contrast to this slice-by-slice approach, pre-integrated volume rendering employs a slab-by-slab algorithm, i.e. the space in between two adjacent slices (or shells) is rendered into the framebuffer. Details on pre-integrated volume rendering will be presented in Chapter 8.



Figure 19: Shell rendering provides a constant sampling rate for all viewing rays.

A continuous volume is reconstructed on the slice polygons by the graphics hardware by applying a reconstruction filter. Current graphics hardware supports pre-filtering mechanisms like mip-mapping and anisotropic filtering for minification, and linear, bilinear, and tri-linear filters for magnification. However, as shown in Section 7, multi-textures and programmable per-fragment operations allow high-quality filters to be applied to the volume data on the fly.

The classification step in hardware-accelerated volume rendering is performed either before or after the filtering step. The differences between pre- and post-classification were already demonstrated in Section 2.2. Pre-classification can be implemented in a pre-processing step by using the CPU to transform the scalar volume data into an RGBA texture containing the colors and alpha values from the transfer function. However, as the memory consumption of RGBA textures is quite heavy, NVIDIA's GPUs support the GL_EXT_paletted_texture extension, which uses 8 bit indexed textures and performs the transfer function lookup in the graphics hardware. Unfortunately, there does not exist an equivalent extension for post-classification. However, NVIDIA's GL_NV_texture_shader and ATI's GL_ATI_fragment_shader extensions make it possible to implement post-classification in graphics hardware using dependent texture fetches. For that, first a filtered scalar value is fetched from the volume, which is then used as a texture coordinate for a dependent lookup into a texture containing the RGBA values of the transfer function. The volume texture and the transfer function texture are bound to a slice polygon by means of multi-textures. This also enables us to implement multi-dimensional transfer functions, i.e. a 2D transfer function is implemented with a 2D dependent texture and a 3D transfer function is possible by using a 3D dependent texture (see Section 6).
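A hedged sketch of this dependent-texture setup with the GL_NV_texture_shader extension is shown below (our own illustration; it assumes the extension's enums and the ARB multi-texture entry points are available from glext.h, that the slice texture stores the scalar value in its alpha and red channels, and that tfTex holds the RGBA transfer function):

```cpp
#include <GL/gl.h>
#include <GL/glext.h>   // assumed to declare the NV_texture_shader enums

void setupDependentPostClassification(GLuint volumeSliceTex, GLuint tfTex) {
    glEnable(GL_TEXTURE_SHADER_NV);

    glActiveTextureARB(GL_TEXTURE0_ARB);                  // stage 0: fetch the filtered scalar
    glBindTexture(GL_TEXTURE_2D, volumeSliceTex);
    glTexEnvi(GL_TEXTURE_SHADER_NV, GL_SHADER_OPERATION_NV, GL_TEXTURE_2D);

    glActiveTextureARB(GL_TEXTURE1_ARB);                  // stage 1: dependent lookup
    glBindTexture(GL_TEXTURE_2D, tfTex);                  // RGBA transfer function
    glTexEnvi(GL_TEXTURE_SHADER_NV, GL_SHADER_OPERATION_NV,
              GL_DEPENDENT_AR_TEXTURE_2D_NV);             // (alpha, red) of stage 0 become (s, t)
    glTexEnvi(GL_TEXTURE_SHADER_NV, GL_PREVIOUS_TEXTURE_INPUT_NV, GL_TEXTURE0_ARB);
}
```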

5. Illumination

Realistic lighting of volumetric data greatly enhances depth perception. For lighting calculations a per-voxel gradient is required, which is determined directly from the volume data by investigating the neighborhood of the voxel. Although the newest graphics hardware will enable the calculation of the gradient at each voxel on the fly, in the majority of cases the voxel gradient is pre-computed in a pre-processing step.

For scalar volume data the gradient vector is defined by the first-order derivative of the scalar field I(x, y, z), which is given by the partial derivatives of I in the x-, y-, and z-direction:

\[ \vec{\nabla} I = (I_x, I_y, I_z) = \left( \frac{\partial}{\partial x} I,\; \frac{\partial}{\partial y} I,\; \frac{\partial}{\partial z} I \right) \quad (8) \]

The length of this vector defines the local variation of the scalar field and is computed using the following equation:

\[ \left\| \vec{\nabla} I \right\| = \sqrt{ I_x^2 + I_y^2 + I_z^2 } \quad (9) \]

Note that for dot-product lighting calculations the gradient vector must be normalized to a length of 1 to obtain correct results. Due to the fact that the gradients are normalized during pre-calculation on a per-voxel basis and interpolated tri-linearly in space, interpolated gradients will generally not be normalized. Thus, instead of dot-product lighting calculations, often light- or environment-maps are employed, because in this case the lookup of the incoming light purely depends on the direction of the gradient (see Section 5.3). Another possibility are normalization maps, which return a normalized gradient for each lookup with an unnormalized gradient. The normalized gradient is then used for further lighting calculations.

One of the most common approaches to estimating the gradient is based on the first term of a Taylor expansion. With this central differences method, the directional derivative in the x-direction is calculated by evaluating

\[ I_x(x, y, z) = I(x+1, y, z) - I(x-1, y, z) \quad \text{with } x, y, z \in \mathbb{N} \quad (10) \]

The derivatives in the other two directions are calculated analogously. A common way to store pre-computed gradient vectors for hardware-accelerated volume rendering is to store the three components of the gradient and the scalar value of the volume data as an RGBA texture:

\[ \vec{\nabla} I = \begin{pmatrix} I_x \\ I_y \\ I_z \end{pmatrix} \mapsto \begin{pmatrix} R \\ G \\ B \end{pmatrix}, \qquad I \mapsto A \quad (11) \]
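A pre-processing sketch along these lines is shown below (our own illustration with assumed names and a row-major volume layout): the gradient of Equation 10 is computed per voxel, normalized, remapped to [0, 255], and packed together with the scalar value according to Equation 11.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// volume holds nx*ny*nz scalar values in [0,1], x varying fastest.
std::vector<std::uint8_t> buildGradientTexture(const std::vector<float>& volume,
                                               int nx, int ny, int nz) {
    auto at = [&](int x, int y, int z) {
        x = std::max(0, std::min(x, nx - 1));      // clamp at the volume borders
        y = std::max(0, std::min(y, ny - 1));
        z = std::max(0, std::min(z, nz - 1));
        return volume[(std::size_t(z) * ny + y) * nx + x];
    };
    std::vector<std::uint8_t> rgba(std::size_t(nx) * ny * nz * 4);
    std::size_t o = 0;
    for (int z = 0; z < nz; ++z)
        for (int y = 0; y < ny; ++y)
            for (int x = 0; x < nx; ++x) {
                float gx = at(x + 1, y, z) - at(x - 1, y, z);   // central differences (Eq. 10)
                float gy = at(x, y + 1, z) - at(x, y - 1, z);
                float gz = at(x, y, z + 1) - at(x, y, z - 1);
                float len = std::sqrt(gx * gx + gy * gy + gz * gz);
                if (len > 0.0f) { gx /= len; gy /= len; gz /= len; }
                rgba[o++] = std::uint8_t((gx * 0.5f + 0.5f) * 255.0f);  // gradient -> RGB
                rgba[o++] = std::uint8_t((gy * 0.5f + 0.5f) * 255.0f);
                rgba[o++] = std::uint8_t((gz * 0.5f + 0.5f) * 255.0f);
                rgba[o++] = std::uint8_t(at(x, y, z) * 255.0f);         // scalar  -> A (Eq. 11)
            }
    return rgba;
}
```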

Use of gradients for the visualization of non-polygonal shaded iso-surfaces and for volume shading is demonstrated in the following subsections.


Figure 20: A step in the transfer function or an alpha test with GL_GEQUAL generates a single-sided iso-surface. Note that the artifacts in the iso-surface are removed by very high sampling rates.

5.1. Non-polygonal Iso-surfaces

Aside from the explicit reconstruction of threshold surfaces from volume data in a pre-processing step before rendering, there also exist techniques for the visualization of iso-surfaces during rendering. The underlying idea is to reject fragments that lie over (and/or under) the iso-value from being rendered, by applying a per-fragment test or by assigning a transparent alpha value during classification. Westermann proposed to use the OpenGL alpha test to reject fragments that do not actually contribute to an iso-surface with a given iso-value [41]. The OpenGL alpha test allows us to define an alpha test reference value and a comparison test that rejects fragments based on the comparison of the alpha channel of the fragment with the given reference value. With the RGBA texture setup from the last section, setting the alpha test reference value to the iso-value and the comparison test to GL_GEQUAL (greater or equal) visualizes one side of the iso-surface, while a GL_LEQUAL (less or equal) test visualizes the other side (see Figure 20). An alpha test with GL_EQUAL will lead to a two-sided iso-surface. Rezk-Salama et al. [37] extended Westermann's approach to use programmable graphics hardware for the shading calculations.
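The alpha-test setup itself is only a few OpenGL calls; the following sketch (our own, with an assumed isoValue parameter in [0,1]) shows the single-sided case:

```cpp
#include <GL/gl.h>

// Non-polygonal iso-surface via the OpenGL alpha test. The scalar value is
// assumed to reside in the texture's alpha channel (RGBA setup of Section 5).
void renderIsoSurfaceSlices(float isoValue) {
    glDisable(GL_BLEND);
    glEnable(GL_ALPHA_TEST);
    glAlphaFunc(GL_GEQUAL, isoValue);  // GL_LEQUAL shows the other side, GL_EQUAL both
    // ... render the textured slice polygons here (see Section 4) ...
    glDisable(GL_ALPHA_TEST);
}
```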

A similar result is obtained by using appropriate transfer functions for classification. Single-sided iso-surfaces are generated by using a step function from fully transparent to fully opaque values or vice versa, whilst peaks in the transfer function cause double-sided iso-surfaces. The thickness of the iso-surfaces is determined by the width of the peak. As peaks in the transfer function generate high frequencies in the classified volume data, holes in the surface will only be removed by employing very high sampling rates (see Figure 21). It should be noted that only post-classification will fill the holes in the surfaces for high sampling rates, since pre-classification does not reconstruct the high frequencies of the transfer function on the slice polygons properly. For a solution to this problem we refer to Section 8.

5.2. Fragment Shading

With the texture setup introduced in the previous section, interpolated gradients are available during rasterization on a per-fragment basis. In order to integrate the Phong illumination model [35] into a single-pass volume rendering algorithm, dot products and component-wise products must be computed with per-fragment arithmetic operations. This mechanism is provided by modern PC graphics hardware through advanced per-fragment operations. The standard OpenGL extension EXT_texture_env_dot3 provides a simple mechanism for dot-product calculations. Additionally, NVIDIA provides the mechanism of texture shaders and register combiners for texture fetching and arithmetic operations, whilst ATI combines the functionality of these two units into the fragment shader API.

Although similar implementations are possible with ATI's fragment shader extension, the following examples employ NVIDIA's register combiners. The combiner setup for diffuse illumination with two independent light sources is displayed in Figure 22. In this setup two textures are used for each slice polygon: one for the pre-calculated gradients and one for the volume scalars. The first combiner calculates the two dot products of the gradient with the directions of the light sources. The second combiner multiplies these dot products with the colors of the corresponding light sources and sums up the results. In the final combiner stage the RGB value after classification is added to the RGB result and the transparency of the fragment is set to the alpha value obtained from classification.

Figure 21: Three peaks in the transfer function generate three double-sided iso-surfaces. Note that the holes in the iso-surfaces are removed by very high sampling rates.

Figure 22: NVIDIA register combiner setup for diffuse illumination with two independent light sources.

Specular and diffuse illumination is achieved by using the register combiner setup displayed in Figure 23. Here, the first combiner calculates a single dot product of the gradient with a single light source direction. The result is used in the second general combiner for the calculation of a diffuse lighting term, and at the same time the dot product is squared. In the final combiner the squared dot product is squared twice before the ambient, diffuse, and specular lighting terms are summed up.

5.3. Light/Reflection Maps

In the above lighting calculations the complexity of the lighting conditions is limited by the number of arithmetic operations (or combiner stages) that are allowed on a per-fragment basis. Additionally, due to the current unavailability of square root and division operations in the fragment stage, the interpolated gradients cannot be renormalized. Dot-product lighting calculations with unnormalized gradients will lead to shading artifacts.

In order to allow more complex lighting conditions and prevent artifacts, the concept of reflection maps is applicable. A reflection map caches the incident illumination from all directions at a single point in space. The general assumption for the justification of reflection maps is that the object to which the reflection map is applied is small with respect to the environment that contains it.

For a lookup into a reflection map on a per-fragment basis, a dependent texture fetch operation with the normal from a previous fetch is required. The coordinates for the lookup into a diffuse environment map are directly computed from the normal vector, whereas the reflection vectors for reflection map lookups are a function of both the normal vectors and the viewing directions.

Figure 23: NVIDIA register combiner setup for diffuse and specular illumination. The additional sum (+) is achieved using the spare0 and secondary color registers of the final combiner stage.

Figure 24: Example of a cube environment map.

Besides spherical reflection maps, cube maps have become more and more popular within the past years (see Figure 24). In this case the environment is projected onto the six sides of a cube, which are indexed with the largest absolute component of the normal or reflection vector. The dependent lookup into such a cube map is supported by ATI's fragment shader and NVIDIA's texture shader OpenGL extensions. With NVIDIA's texture shader extension four texture units are required. The first texture unit fetches a filtered normal from the volume. Since the reflection map is generated in world coordinate space, the current modeling matrix is required to transform the normals into world space. The texture coordinates of the three remaining texture stages are used to pass the current modeling matrix \((s_i, t_i, r_i)\), i = 1, 2, 3, and the eye vector \((q_1, q_2, q_3)\) to the rasterization unit as vectors \(v_i = (s_i, t_i, r_i, q_i)\), i = 1, 2, 3 (see Figure 25). From this information the GPU calculates a normal in world space for a diffuse lookup, or a reflection vector in world space for a reflection map lookup. The values from those lookups are finally combined using the register combiner extension (see Figure 26 for results).

Figure 25: Texture shader configuration for a lookup into a reflection cube map. For a diffuse cube map lookup, the last fetch operation is replaced by a DOT_PRODUCT_TEXTURE_CUBE_MAP_NV operation.

Due to the fact that the upload time of updated cube maps is negligible, cube maps are ideally suited for complex, dynamic lighting conditions.

5.4. Shadows

Shadows are another important visual cue for the perception of 3D structures. Behrens and Ratering [2] proposed a hardware model for computing shadows in volume data. They compute a second volume that captures the light arriving at a certain volume sample. During rasterization the colors after the transfer function are multiplied with these values to produce attenuated colors. However, this approach suffers from blurry shadows and too dark surfaces due to the interpolation of the coarse light intensity volume. This effect is referred to as attenuation leakage.

Figure 26: Iso-surface of the engine block with diffuse cube map (left) and specular reflection map (right).

Figure 27: Example volume renderings with shadow computations.

An alternative approach was proposed by Kniss [18]. Instead of a shadow volume, an offscreen render buffer accumulates the amount of light from the light's point of view. In order to render the same slices for light attenuation and volume slicing, Kniss proposed to slice the volume with a slice axis that is the halfway vector between the view and light directions. For a single headlight the same slices are employed for accumulating opacity and for volume rendering (Figure 28(a)). The amount of light arriving from the light source at a particular slice is one minus the accumulated opacity from the slices before it. Given the situation in Figure 28(b), normally a separate shadow volume would have to be created. With the approach of Kniss, the halfway vector s between the light source direction l and the view vector v is used as the slicing axis (Figure 28(c)). In order to have an optimal slicing direction for each possible configuration, the negative view vector is used for angles greater than 90 degrees between the view and the light vector (Figure 28(d)). Note that, in order to ensure a consistent sampling rate, the slice spacing along the slice direction must be corrected.
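The choice of the slicing axis can be summarized in a few lines of vector arithmetic (our own sketch; it assumes v and l are normalized and both point away from the volume sample, toward the eye and the light respectively):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };
static float dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vec3 normalize(const Vec3& a) {
    float len = std::sqrt(dot(a, a));
    return { a.x / len, a.y / len, a.z / len };
}

// Halfway-vector slicing axis: for angles below 90 degrees between view and
// light the halfway vector of v and l is used; beyond that, the halfway vector
// of -v and l, so that one slice stack serves both the eye and the light pass.
Vec3 shadowSliceAxis(const Vec3& v, const Vec3& l) {
    if (dot(v, l) >= 0.0f)
        return normalize({ v.x + l.x, v.y + l.y, v.z + l.z });
    return normalize({ -v.x + l.x, -v.y + l.y, -v.z + l.z });
}
```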

Given the above slicing approach and a front-to-back slice rendering order, a two-pass volume rendering algorithm will provide the desired results. First, a hardware-accelerated offscreen render buffer is initialized with 1 - light_intensity.


Figure 28: Modified slicing approach for light transport computation.

Alternatively, to create effects like spotlights, the buffer is initialized with an arbitrary image. Then each slice is rendered in a first pass from the observer's point of view, using the offscreen buffer to modulate the brightness of the samples. The two textures are applied to a slice polygon by means of multi-textures. Then, in the second pass, the slice is rendered from the light's point of view to calculate the intensity of the light arriving at the next layer. Light is attenuated by accumulating the opacity for each sample with the over operator. Optimized performance is achievable by utilizing a hardware-accelerated offscreen buffer that can be directly used as a texture. Such functionality is provided in OpenGL by the render-to-texture extension. The approach renders one light source per volume rendering pass. Multiple light sources require additional rendering passes; the resulting images are weighted and summed. Figure 27 demonstrates results of the approach. As images created with this approach often appear too dark because of missing scattering effects, Kniss also proposed to use the approach for scattering computations [19] to simulate translucent material.


6. Multi-Dimensional Classification

The term classification was introduced in Section 2 as the process of mapping scalar values from the volume to RGBA values using a one-dimensional transfer function. Figure 29 shows a one-dimensional histogram of a CT data set (see Figure 31), in which three basic materials can be identified.


Figure 29: A 1D histogram of the CT human head data set (black: log scale, grey: linear scale). The colored regions (A,B,C) identify basic materials.

However, it is often advantageous to have a classification that depends on multiple scalar values. The two-dimensional histogram in Figure 30, based on the scalar value and the first-order derivative, makes it possible to identify the materials as well as the material boundaries of the same CT data set (see Figure 31).

Figure 30: A log-scale 2D histogram. Materials (A,B,C) and material boundaries (D,E,F) can be distinguished.

Multi-dimensional transfer functions were first proposed by Levoy [22]. They provide a very effective way to extract materials and their boundaries for both scalar and multivariate data. However, the manipulation of multi-dimensional transfer functions is extremely difficult. Kindlmann [16] proposed the semi-automatic generation of transfer functions, whilst Kniss [17] proposed a set of direct manipulation widgets that make specifying such transfer functions intuitive and convenient.

Figure 31: CT scan of a human head showing the materials and boundaries identified by a two-dimensional transfer function.

Since identifying good transfer functions is a difficult task, the interactive manipulation of all transfer function parameters is mandatory. Fortunately, dependent texture reads, which were introduced in the latest generation of consumer graphics hardware, are particularly suited for an interactive classification with multi-dimensional transfer functions. Instead of the one-dimensional dependent texture from Section 4, the transfer function is transformed into a two- or three-dimensional RGBA texture with color and opacity values from the transfer function. The volume itself is defined as a LUMINANCE_ALPHA texture for two scalars per voxel and as an RGB texture for three scalars per voxel.

Figure 32: Texture shader setup for a three-dimensional transfer function. Stage 0 fetches the per-voxel scalars while stage 1 performs a dependent texture lookup into the 3D transfer function.

Figure 32 shows the texture shader setup for NVIDIA's GeForce4 chip. During rendering, the first texture stage fetches filtered texels from the volume, containing the three scalars defined for that voxel in the RGB components. Texture stage 1 performs dependent texture lookups with the RGB components from stage 0 as texture coordinates into the 3D transfer function volume. Consequently, a post-classification with the 3D transfer function is performed.


It should be noted that a 3D dependent texture lookup is currently only supported by the texture_shader3 extension of the GeForce4 chip. In contrast, the ATI Radeon8500 as well as the NVIDIA GeForce3 and GeForce4 chips all support 2D dependent lookups.

7. High-Quality Filtering

As outlined in Section 2, the accurate reconstruction of the original volume from the sampled volume data requires an appropriate reconstruction filter. Unfortunately, current graphics hardware only supports linear filters for magnification. However, Hadwiger et al.10 have shown that multi-textures and flexible rasterization hardware allow arbitrary filter kernels to be evaluated during rendering.

The filtering of a signal can be described as the convolution of the signal function s with a filter kernel function h:

$$ g(t) = (s * h)(t) = \int_{-\infty}^{+\infty} s(t - t')\, h(t')\, dt' \qquad (12) $$

In the discretized form this leads to

$$ g_t = \sum_{i=-I}^{I} s_{t-i}\, h_i \qquad (13) $$

with the half width of the filter kernel denoted by I. Basically this means that we have to collect the contributions of neighboring input samples, multiplied by the corresponding filter values, to get a new filtered output sample. Instead of this gathering approach, Hadwiger et al. use a distributing approach for a hardware-accelerated implementation. That is, the contribution of an input sample is distributed to its neighboring output samples, instead of the other way around. This order was chosen since it allows the contribution of a single relative input sample to be collected for all output samples simultaneously. The term relative input sample denotes the relative offset of an input sample to the position of an output sample. The final result is obtained by adding the results of multiple rendering passes, whereby the number of input samples that contribute to an output sample determines the number of passes.
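For reference, the gathering form of Equation (13) is straightforward to write down on the CPU; the sketch below reconstructs one output sample from its neighboring input samples and an arbitrary discrete filter kernel (all names are illustrative).

```cpp
#include <vector>

// Discrete convolution (Equation 13): collect the contributions of the
// neighboring input samples weighted by the filter kernel h[-I..I].
double filterSample(const std::vector<double>& s,   // input samples
                    const std::vector<double>& h,   // kernel, size 2*I + 1
                    int t)                          // output sample index
{
    const int I = static_cast<int>(h.size()) / 2;   // half width of the kernel
    double g = 0.0;
    for (int i = -I; i <= I; ++i) {
        const int idx = t - i;
        if (idx >= 0 && idx < static_cast<int>(s.size()))
            g += s[idx] * h[i + I];
    }
    return g;
}
```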

Figure 33 demonstrates this in the example of a one-dimensional tent filter. As one left-hand and one right-hand neighbor input sample contribute to each output sample, a two-pass approach is necessary. In the first pass, the input samples are shifted right half a voxel distance by means of texture coordinates. The input samples are stored in a texture map that uses nearest-neighbor interpolation and is bound to the first texture stage of the multi-texture unit (see Figure 34). Nearest-neighbor interpolation is needed to access the original input samples over the complete half extent of the filter kernel. The filter kernel is divided into two tiles. One filter tile is stored in a second texture map, mirrored and repeated via the GL_REPEAT wrap mode. This texture is bound to the second stage of the multi-texture unit. During rasterization, the values fetched by the first multi-texture unit are multiplied with the result of the second multi-texture unit. The result is added into the frame buffer. In the second pass, the input samples are shifted left half a voxel distance by means of texture coordinates. For a symmetric filter the same unmirrored filter tile is reused. The result is again added to the frame buffer to obtain the final result.

Figure 33: Distributing the contributions of all “left-hand” (a), and all “right-hand” (b) neighbors, when using a tent filter as a simple example for the algorithm.
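A minimal CPU analogue of this two-pass distribution scheme is sketched below for upsampling a 1D signal with a tent filter of width two; the nearest-neighbor fetch and the filter-tile weight correspond to texture 0 and texture 1 in Figure 34 (the function and parameter names are illustrative).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Two-pass "distributing" reconstruction with a tent filter (width two).
// Pass 0 adds the contribution of each output sample's left-hand input
// neighbor, pass 1 that of its right-hand neighbor: nearest-neighbor fetch
// (texture 0) multiplied by the filter tile value (texture 1), additively
// blended into the output.
std::vector<double> reconstructTent(const std::vector<double>& input, int upsample)
{
    std::vector<double> output((input.size() - 1) * upsample + 1, 0.0);

    for (int pass = 0; pass < 2; ++pass) {                       // left / right neighbor
        for (std::size_t k = 0; k < output.size(); ++k) {
            const double x    = static_cast<double>(k) / upsample; // resampling point
            const double frac = x - std::floor(x);                 // position inside cell
            const std::size_t neighbor =
                static_cast<std::size_t>(std::floor(x)) + pass;    // nearest-neighbor fetch
            if (neighbor >= input.size()) continue;
            // tent filter tile: left neighbor weighted (1 - frac), right neighbor frac
            const double weight = (pass == 0) ? (1.0 - frac) : frac;
            output[k] += input[neighbor] * weight;                 // additive blend
        }
    }
    return output;
}
```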

Figure 34: Tent filter (width two) used for reconstruction of a one-dimensional function in two passes. Imagine the values of the output samples added together from top to bottom.

If a given hardware architecture supports 2n multi-textures, the number of required passes can be reduced by a factor of n, i.e., two multi-texture units calculate the result of what would otherwise be a single pass. Note that the method outlined above does not consider area-averaging filters, since it is assumed that magnification is desired instead of minification. For minification, pre-filtering approaches like mip-mapping are advantageous. Figure 35 demonstrates the benefit of bi-cubic filtering using a B-spline filter kernel over a standard bi-linear interpolation.


Figure 35: Using a high-quality reconstruction filter for volume rendering. This image compares bi-linear interpolation of object-aligned slices (A) with bi-cubic filtering using a B-spline filter kernel (B).

8. Pre-Integrated Volume Rendering

Obviously, as outlined in Section 2, post-classification will reproduce high frequencies of the transfer function. However, as observed in Section 5.1, high frequencies (e.g. iso-surface peaks) are only reproduced on the slice polygons. In order to visualize details of the transfer function in between slice polygons, additional tri-linearly interpolated slices must be rendered. As this places higher rasterization demands on the graphics hardware, frame rates decrease considerably.

8.1. Pre-Integrated Classification

In order to overcome the limitations discussed above, the approximation of the volume rendering integral has to be improved. In fact, many improvements have been proposed, e.g., higher-order integration schemes, adaptive sampling, etc. However, these methods do not explicitly address the problem of high Nyquist frequencies of the color after classification c̃(s(x)) and of the extinction coefficient after classification τ(s(x)) resulting from non-linear transfer functions. On the other hand, the goal of pre-integrated classification38 is to split the numerical integration into two integrations: one for the continuous scalar field s(x) and one for the transfer functions c̃(s) and τ(s), in order to avoid the problematic product of Nyquist frequencies.

The first step is the sampling of the continuous scalar field s(x) along a viewing ray. Note that the Nyquist frequency for this sampling is not affected by the transfer functions.

Figure 36: Scheme of the parameters determining the color and opacity of the i-th ray segment: the scalar value s_f = s(x(i d)) on the front slice, the scalar value s_b = s(x((i+1) d)) on the back slice, and the segment length d.

For the purpose of pre-integrated classification, the sampled values define a one-dimensional, piecewise linear scalar field. The volume rendering integral for this piecewise linear scalar field is efficiently computed by one table lookup for each linear segment. The three arguments of the table lookup are the scalar value at the start (front) of the segment s_f := s(x(i d)), the scalar value at the end (back) of the segment s_b := s(x((i+1) d)), and the length of the segment d (see Figure 36). More precisely, the opacity α_i of the i-th segment is approximated by

$$ \alpha_i \approx 1 - \exp\Bigl(-\int_{i\,d}^{(i+1)\,d} \tau\bigl(s(\mathbf{x}(\lambda))\bigr)\, d\lambda\Bigr) \approx 1 - \exp\Bigl(-\int_{0}^{1} \tau\bigl((1-\omega)\, s_f + \omega\, s_b\bigr)\, d\, d\omega\Bigr) \qquad (14) $$

Thus, α_i is a function of s_f, s_b, and d (or of s_f and s_b only, if the lengths of the segments are equal). The (associated) colors C̃_i are approximated correspondingly:

$$ \tilde{C}_i \approx \int_{0}^{1} \tilde{c}\bigl((1-\omega)\, s_f + \omega\, s_b\bigr)\, \exp\Bigl(-\int_{0}^{\omega} \tau\bigl((1-\omega')\, s_f + \omega'\, s_b\bigr)\, d\, d\omega'\Bigr)\, d\, d\omega \qquad (15) $$

Analogously to α_i, C̃_i is a function of s_f, s_b, and d. Thus, pre-integrated classification approximates the volume rendering integral by evaluating the following equation:

$$ I \approx \sum_{i=0}^{n} \tilde{C}_i \prod_{j=0}^{i-1} (1 - \alpha_j) $$

with colors C̃_i pre-computed according to Equation (15) and opacities α_i pre-computed according to Equation (14). For a non-associated color transfer function, i.e., when substituting c̃(s) by τ(s) c(s), we also employ Equation (14) for the approximation of α_i and the following approximation of


the associated color C̃^τ_i:

$$ \tilde{C}^{\tau}_i \approx \int_{0}^{1} \tau\bigl((1-\omega)\, s_f + \omega\, s_b\bigr)\, c\bigl((1-\omega)\, s_f + \omega\, s_b\bigr)\, \exp\Bigl(-\int_{0}^{\omega} \tau\bigl((1-\omega')\, s_f + \omega'\, s_b\bigr)\, d\, d\omega'\Bigr)\, d\, d\omega \qquad (16) $$

Note that pre-integrated classification always computes associated colors, whether a transfer function for associated colors c̃(s) or for non-associated colors c(s) is employed.

In either case, pre-integrated classification allows a continuous scalar field s(x) to be sampled without the need to increase the sampling rate for any non-linear transfer function. Therefore, pre-integrated classification has the potential to improve the accuracy (less undersampling) and the performance (fewer samples) of a volume renderer at the same time.

One of the major disadvantages of pre-integrated classification is the need to integrate a large number of ray segments for each new transfer function, dependent on the front and back scalar values and the ray segment length. Consequently, an interactive modification of the transfer function is not possible. Therefore, several modifications to the computation of the ray segments were proposed7 that lead to an enormous speedup of the integration calculations. However, this requires neglecting the attenuation within a ray segment. Yet, this is a common approximation for post-classified volume rendering and is well justified for small products τ(s) d. The dimensionality of the lookup table can easily be reduced by assuming constant ray segment lengths d. This assumption is correct for orthogonal projections and view-aligned proxy geometry, and it is a good approximation for perspective projections and view-aligned proxy geometry, as long as extreme perspectives are avoided. For perspective projections and shell-based proxy geometry the assumption is again correct. In the following hardware-accelerated implementation, two-dimensional lookup tables for the pre-integrated ray segments are employed, thus a constant ray segment length is assumed.
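To make the table-based evaluation concrete, the sketch below pre-computes a two-dimensional lookup table of pre-integrated colors and opacities from a 1D transfer function, assuming a constant segment length d and, as in the accelerated variant7, neglecting attenuation within a segment. The table resolution, the transfer function layout and all names are illustrative assumptions, not the authors' original code.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct RGBA { float r, g, b, a; };

// Pre-computes a 2D pre-integration table from a 1D transfer function.
// tf[s] holds the (non-associated) color c(s) in r,g,b and the extinction
// coefficient tau(s) in a. Self-attenuation within a segment is neglected.
std::vector<RGBA> preIntegrate(const std::vector<RGBA>& tf, float d)
{
    const int n = static_cast<int>(tf.size());

    // Integral functions (prefix sums) of tau(s) and tau(s)*c(s).
    std::vector<float> T(n + 1, 0.0f), Kr(n + 1, 0.0f), Kg(n + 1, 0.0f), Kb(n + 1, 0.0f);
    for (int s = 0; s < n; ++s) {
        T [s + 1] = T [s] + tf[s].a;
        Kr[s + 1] = Kr[s] + tf[s].a * tf[s].r;
        Kg[s + 1] = Kg[s] + tf[s].a * tf[s].g;
        Kb[s + 1] = Kb[s] + tf[s].a * tf[s].b;
    }

    std::vector<RGBA> table(n * n);
    for (int sb = 0; sb < n; ++sb) {
        for (int sf = 0; sf < n; ++sf) {
            const int lo = std::min(sf, sb), hi = std::max(sf, sb);
            const float len = static_cast<float>(hi - lo);
            float tauAvg, r, g, b;
            if (len > 0.0f) {
                // Averages of tau and tau*c over the linear ramp from sf to sb.
                tauAvg = (T [hi] - T [lo]) / len;
                r      = (Kr[hi] - Kr[lo]) / len * d;
                g      = (Kg[hi] - Kg[lo]) / len * d;
                b      = (Kb[hi] - Kb[lo]) / len * d;
            } else {                        // sf == sb: constant scalar value
                tauAvg = tf[sf].a;
                r = tf[sf].a * tf[sf].r * d;
                g = tf[sf].a * tf[sf].g * d;
                b = tf[sf].a * tf[sf].b * d;
            }
            table[sb * n + sf] = { r, g, b, 1.0f - std::exp(-tauAvg * d) };
        }
    }
    return table;
}
```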

8.2. Texture-Based Pre-Integrated Volume Rendering

The utilization of flexible graphics hardware for a hardware-accelerated implementation of pre-integrated volume rendering for volume data on cartesian grids was first proposed in 7. The texture maps (either three-dimensional or two-dimensional textures) contain the scalar values of the volume, just as for post-classification. As each pair of adjacent slices (either view-aligned or object-aligned) corresponds to one slab of the volume (see Figure 37), the texture maps of two adjacent slices have to be mapped onto one slice (either the front or the back slice) by means of multiple textures. Thus, the scalar values along a viewing ray of both slices (front and back) are fetched from texture maps during the rasterization of the polygon for one slab. These two scalar values are utilized as texture coordinates for a third, dependent texture fetch operation. This fetch performs the lookup of pre-integrated colors and opacities from a two-dimensional texture map.

Figure 37: A slab of the volume between two slices. The scalar value on the front (back) slice for a particular viewing ray is called s_f (s_b).
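The per-fragment work of this scheme can be summarized in a few lines along one viewing ray: sample the front and back slice of each slab, look up the pre-integrated segment, and composite. The CPU sketch below illustrates this with the table layout from the previous example; the sampling input and function names are illustrative.

```cpp
#include <cstddef>
#include <vector>

struct RGBA { float r, g, b, a; };

// Composites one viewing ray through the slabs using the pre-integrated
// 2D lookup table (front scalar, back scalar) -> (associated color, opacity).
RGBA integrateRay(const std::vector<RGBA>& table, int n,
                  const std::vector<int>& raySamples)   // scalar at each slice
{
    RGBA result = { 0, 0, 0, 0 };
    for (std::size_t i = 0; i + 1 < raySamples.size(); ++i) {
        const int sf = raySamples[i];          // scalar on the front slice
        const int sb = raySamples[i + 1];      // scalar on the back slice
        const RGBA seg = table[sb * n + sf];   // dependent "texture" lookup
        // Front-to-back compositing of associated colors (over operator).
        result.r += (1.0f - result.a) * seg.r;
        result.g += (1.0f - result.a) * seg.g;
        result.b += (1.0f - result.a) * seg.b;
        result.a += (1.0f - result.a) * seg.a;
    }
    return result;
}
```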

The opacities of the dependent texture map are calculated according to Equation (14), while the colors are computed according to Equation (15) if the transfer function specifies associated colors c̃(s), and according to Equation (16) if it specifies non-associated colors c(s). In either case the compositing Equation (5) is used for blending, as the dependent texture map always contains associated colors.

NVIDIA's texture shader extension provides a texture shader operation that employs the previous texture shader's green and blue (or red and alpha) colors as the (s, t) coordinates for a non-projective 2D texture lookup. Unfortunately, this operation cannot be used here, as the coordinates have to be fetched from two separate 2D textures. Instead, as a workaround, the dot product texture shader is used, which computes the dot product of the stage's (s, t, r) coordinates and a vector derived from a previous stage's texture lookup (see Figure 38). The results of two such dot product texture shader operations are employed as coordinates for a dependent texture lookup. Here the dot product is only required to extract the front and back volume scalars. This is achieved by storing the volume scalars in the red components of the textures and applying a dot product with a constant vector v = (1, 0, 0)^T. The texture shader extension allows us to define which previous texture fetch the dot product refers to via the GL_PREVIOUS_TEXTURE_INPUT_NV texture environment. The first dot product is set to use the fetched front texel values as its previous texture stage, the second uses the back texel value.


In this approach, the second dot product performs the texture lookup into the dependent texture via texture coordinates obtained from two different textures. As ATI's fragment shader OpenGL extension is more flexible than the NVIDIA counterpart, a simpler setup is possible. The front and back scalar values are fetched analogously to the NVIDIA setup. After the fetches the two scalar values are contained in the red components of two temporary registers. Then the scalar value from one of the temporary registers is moved to the green component of the other register with a GL_MOV_ATI instruction. In the second phase of the fragment shader program, a 2D dependent texture fetch with the red and green components of this temporary register fetches the pre-integrated ray segment from a texture map.
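Under the ATI_fragment_shader extension, the setup described above might look roughly like the sketch below, assuming the front slice is bound to texture unit 0, the back slice to unit 1 and the pre-integration table to unit 2. This is only a sketch assembled from the extension's documented API, not the authors' original code, so the details should be checked against the extension specification.

```cpp
#include <GL/gl.h>
#include <GL/glext.h>   // ATI_fragment_shader enums

// Sketch of a two-phase ATI fragment shader for pre-integrated rendering.
// Assumes a valid context, resolved extension entry points and a bound
// fragment shader object (glBindFragmentShaderATI).
void setupPreIntegrationShader()
{
    glBeginFragmentShaderATI();

    // Phase 1: fetch the volume scalar from the front and back slice.
    glSampleMapATI(GL_REG_0_ATI, GL_TEXTURE0_ARB, GL_SWIZZLE_STR_ATI);
    glSampleMapATI(GL_REG_1_ATI, GL_TEXTURE1_ARB, GL_SWIZZLE_STR_ATI);
    // Move the back scalar (red of reg 1) into the green component of reg 0,
    // so that reg 0 holds (s_front, s_back, ...).
    glColorFragmentOp1ATI(GL_MOV_ATI, GL_REG_0_ATI, GL_GREEN_BIT_ATI, GL_NONE,
                          GL_REG_1_ATI, GL_RED, GL_NONE);

    // Phase 2: dependent 2D lookup into the pre-integration table (unit 2)
    // using reg 0 as texture coordinates, then route the result to the output.
    glSampleMapATI(GL_REG_2_ATI, GL_REG_0_ATI, GL_SWIZZLE_STR_ATI);
    glColorFragmentOp1ATI(GL_MOV_ATI, GL_REG_0_ATI, GL_NONE, GL_NONE,
                          GL_REG_2_ATI, GL_NONE, GL_NONE);
    glAlphaFragmentOp1ATI(GL_MOV_ATI, GL_REG_0_ATI, GL_NONE,
                          GL_REG_2_ATI, GL_NONE, GL_NONE);

    glEndFragmentShaderATI();
}
```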

Figure 38: Texture shader setup for a dependent 2D texture lookup with texture coordinates obtained from two source textures. If 3D textures are employed, the first two fetch operations are replaced by the corresponding 3D texture fetches.

For direct volume rendering without lighting, the fetched texel from the last dependent texture operation is used without further processing and blended into the frame buffer with the OpenGL blending function glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA). A comparison of the results of pre-classification, post-classification and pre-integrated classification is shown in Figure 39.
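The corresponding blending state for compositing the associated colors back-to-front is a single OpenGL call; the sketch below shows the minimal state setup, assuming a valid rendering context.

```cpp
#include <GL/gl.h>

// Back-to-front compositing of associated (pre-multiplied) colors:
// destination = source + (1 - source.alpha) * destination.
void setupCompositing()
{
    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA);
}
```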

Figure 39: Comparison of the results of pre-, post- and pre-integrated classification for a random transfer function. Note that pre-classification does not reproduce high frequencies of the transfer function, that post-classification reproduces the high frequencies on the slice polygons, but that pre-integrated classification produces the best visual result due to the reconstruction of high frequencies within the volume.

Besides their suitability for evaluating volume rendering quality, high-frequency transfer functions also allow homogeneous regions and small variations in the volume data to be identified (see Figure 40, right). Random transfer functions are a way to visualize all iso-surfaces in the volume data at once, because a random transfer function consists of a large number of peaks, each of which represents a single iso-surface. Homogeneous regions have a low iso-surface density. In contrast, large variations, i.e. large gradients, are identifiable by applying gradient-weighted opacity. The usual way to implement gradient weighting is to store a pre-computed gradient magnitude per voxel and to use a two-dimensional transfer function to enhance boundary structures. However, a directional derivative in the view direction is implicitly contained within the pre-integrated volume rendering scheme, simply by assigning higher opacity to texels that are further away from the diagonal of the dependent texture containing the pre-integrated ray segments (see Figure 40, left). Note that entries near the diagonal of the dependent texture correspond to ray segments with similar front and back scalar values, i.e. low gradient magnitude, whereas entries further away from the diagonal correspond to high directional gradient magnitude.


Figure 40: Gradient-weighted opacity allows regions of large variation in the data to be identified (left), while high-frequency transfer functions allow small variations to be identified (right). Note that homogeneous and inhomogeneous regions inside the engine data set can easily be distinguished.

8.3. Iso-surfaces and Shading

One of the additional advantages of pre-integrated classification is the possibility of classifying each ray segment in the pre-processing step regarding the existence of iso-surfaces in that segment. An iso-surface is contained within a ray segment if the iso-value lies in between the scalar values at the front and the back slice position along the ray. Simply by initializing a transparent dependent texture and then coloring those texels whose ray segments are intersected by an iso-surface with the corresponding color and alpha value, the rendering of iso-surfaces becomes possible. Certainly, this is not restricted to a single iso-surface per ray segment. If more than one iso-surface intersects a ray segment, the iso-surfaces are blended accordingly and the resulting color and alpha value is used for the corresponding texel in the dependent texture. Additionally, the back and front of the iso-surface can be colored independently. This is demonstrated in Figure 41, which shows a dependent texture used for visualizing 10 independently colored iso-surfaces at once. Certainly, visualizing multiple iso-surfaces at once has no impact on frame rates, as it only requires the dependent texture to be modified.
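Filling such a dependent texture for a single double-sided iso-surface can be sketched as follows: a texel (s_f, s_b) receives the surface color whenever the iso-value lies between the two scalars, and the two orientations (scalar increasing or decreasing across the surface) can receive different colors. The table resolution and all names are illustrative.

```cpp
#include <vector>

struct RGBA8 { unsigned char r, g, b, a; };

// Builds a 256 x 256 dependent texture that renders one iso-surface.
// A ray segment (sf, sb) contains the iso-surface if the iso-value lies
// between sf and sb; the two crossing directions get different colors.
std::vector<RGBA8> isoSurfaceTable(int iso, RGBA8 increasingColor, RGBA8 decreasingColor)
{
    std::vector<RGBA8> table(256 * 256, RGBA8{0, 0, 0, 0});   // fully transparent
    for (int sb = 0; sb < 256; ++sb)
        for (int sf = 0; sf < 256; ++sf) {
            if (sf <= iso && iso <= sb)       // scalar increasing across the surface
                table[sb * 256 + sf] = increasingColor;
            else if (sb <= iso && iso <= sf)  // scalar decreasing across the surface
                table[sb * 256 + sf] = decreasingColor;
        }
    return table;
}
```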

For shading calculations it is common to employ RGBA textures that hold the volume gradient in the RGB components and the volume scalar in the ALPHA component. As dot3 dot-products on the NVIDIA architecture are required to extract the front and back volume scalars from an RGBA texel, the scalar data has to be stored in one of the RGB components (here in red). The first gradient component is stored in the ALPHA component in return. In the register combiners, the red and alpha values are swapped back for shading calculations with RGB gradients. This workaround is not necessary on the ATI Radeon8500 board, since this architecture allows more flexible per-fragment operations, i.e. the scalar value in the alpha channel can simply be moved to another color channel for a dependent texture fetch.

After the color and alpha values have been fetched from the dependent texture, the gradients from the front and the back slice are available for shading calculations. Note that due to the limited number of fetch operations on the current NVIDIA GPUs and the limited number of dependency phases on the ATI GPUs, light maps cannot be employed for shading. Instead, dot-product shading is used to evaluate an ambient, diffuse and specular lighting term. The gradient is linearly interpolated to the position of the intersection of the iso-surface inside a ray segment. If multiple iso-surfaces are contained within a ray segment, the gradient is interpolated to the position of the front-most iso-surface, or a weighted combination of multiple iso-surface gradients is used if multiple semi-transparent iso-surfaces are visible28. The result of such a shading calculation is shown in Figure 41.

Figure 41: Multiple shaded non-polygonal iso-surfaces that were extracted with the corresponding dependent texture (bottom, right). Note that the front and back face of the iso-surfaces have different colors.

Beyond that, volume shading29 can easily be integrated into the pre-integration scheme. As for iso-surfaces, RGBA textures with gradients and scalar values are employed. Instead of the special dependent iso-surface texture, the pre-integrated dependent texture is required again. For shading, the gradient is averaged between the front and back gradient. Figure 42 shows the result of pre-integrated volume shading.

Figure 42: Pre-integrated volume shaded rendering of convection flow in the earth's crust.

The volume shading approach is also useful as an alternative approach for iso-surface visualization. As discussed in Section 5.1, a peak in the transfer function defines a double-sided iso-surface. Due to the pre-integration of ray segments, the resulting iso-surfaces do not have holes, as opposed to post-classification (see Figures 43 and 21 for a comparison).

Figure 43: Visualization of iso-surfaces using transfer function peaks (bottom, left). In contrast to post-classification, pre-integrated shaded volume rendering does not lead to holes in the iso-surfaces for a low number of slices.

9. Large Volumes

One of the main limitations of employing current PC consumer graphics hardware for volume rendering is the relatively small amount of available on-board texture memory. Currently, a maximum of 128 MB is supported by NVIDIA's GeForce4 and ATI's Radeon8500 boards. Due to the quite slow bus bandwidth of current PC architectures, the utilization of the main memory in combination with a bricking approach5 is often impractical. Although texture compression has been available in consumer graphics hardware for quite some time in the form of S3's texture compression extension EXT_texture_compression_s3tc and the equivalent extension NV_texture_compression_vtc for 3D textures, the compression ratios and quality are quite limited. Multi-resolution methods39, 21 have still not provided satisfactory results.

Figure 44: Data structure for vector quantization of volume textures. Each byte of the index data specifies one data block consisting of 2 × 2 × 2 voxels.

However, due to the indirect memory access functionality provided by the dependent texture fetch operations of modern PC graphics hardware, decoders for compressed volume data can be integrated directly into the rasterization pipeline. Kraus et al.20 proposed to use dependent texture fetches for adaptive texture maps and vector quantization of volume data. The basic idea of vector quantization of volume data is illustrated in Figure 44. Instead of the original volume data, the rendered 3D texture contains indices that reference one vector of voxels in a codebook. In the implementation of Kraus, the codebook includes 256 vectors consisting of 8 bytes, corresponding to 2 × 2 × 2 voxels of the original volume data. Thus, each cell of the index data specifies the complete data of eight voxels with just one byte, i.e. the compression ratio is about 8 : 1, since the size of the relatively small codebook may be ignored. The codebook is computed in a pre-processing step with the help of one of the numerous vector quantization algorithms.
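The decoding step that such a dependent fetch replaces is simple: the index volume selects a codebook entry, and the position within the 2 × 2 × 2 block selects the voxel inside that entry. A CPU sketch of this decode, with illustrative names and the 256-entry, 8-byte-block layout described above, is given below.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Codebook: 256 vectors of 8 bytes, each holding a 2 x 2 x 2 voxel block.
using Codebook = std::array<std::array<std::uint8_t, 8>, 256>;

// Decodes one voxel (x, y, z) of the original volume from the index volume
// (dimensions dimX/2 x dimY/2 x dimZ/2) and the codebook.
std::uint8_t decodeVoxel(const std::vector<std::uint8_t>& indexVolume,
                         const Codebook& codebook,
                         int dimX, int dimY, int x, int y, int z)
{
    const int bx = x / 2, by = y / 2, bz = z / 2;            // block coordinates
    const int blocksX = dimX / 2, blocksY = dimY / 2;
    const std::uint8_t index =
        indexVolume[(bz * blocksY + by) * blocksX + bx];     // dependent "fetch"
    const int offset = (z % 2) * 4 + (y % 2) * 2 + (x % 2);  // voxel within block
    return codebook[index][offset];
}
```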

Since the dependent fetch from the codebook is done after the filtering step, currently nearest-neighbor interpolation is employed. In a hardware implementation on future graphics hardware, a vector quantization decoding unit would be placed in front of the texture filtering unit to facilitate tri-linear interpolation.

10. Volumetric FX

In return for the advantages the scientific visualization community draws from the rapid development of consumer graphics hardware, we can provide advanced algorithms that facilitate new volumetric effects in computer games. In order to capture the characteristics of many volumetric objects such as clouds, smoke, trees, hair, and fur, high-frequency details are essential. Ebert's6 approach for modeling clouds uses a coarse technique for modeling the macrostructure and procedural noise-based simulations for the microstructure (see Figure 45). This technique was adapted by Kniss19 to interactive volume rendering through two volume perturbation approaches which are efficient on modern graphics hardware. The first approach perturbs texture coordinates and is useful for perturbing boundaries in the volume data, e.g. the boundary of the cloud in Figure 45. The second approach perturbs the volume itself, which has the effect that materials appear to have inhomogeneities.

Both volume perturbation approaches employ a small 3D perturbation volume with 32³ voxels. Each texel is initialized with four random 8-bit numbers, stored as RGBA components, and blurred slightly to hide the artifacts caused by trilinear interpolation. Texel access is then set to repeat. An additional rendering pass is required for both approaches due to the limited number of textures which can be applied to a polygon simultaneously, and the limited number of sequential dependent texture reads permitted. The additional pass occurs before the steps outlined in the previous section. Multiple copies of the noise texture are applied to each slice at different scales. They are then weighted and summed per pixel. To animate the perturbation, a different offset is added to each noise texture's coordinates and updated each frame.
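Generating such a perturbation volume amounts to filling a small RGBA texture with random values and smoothing it slightly; the following sketch creates the 32³ noise volume on the CPU. A simple box blur stands in for the unspecified smoothing, and all names are illustrative.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Creates a 32^3 RGBA noise volume: four random 8-bit values per texel,
// then a small box blur per channel to soften trilinear interpolation artifacts.
std::vector<std::uint8_t> makePerturbationVolume()
{
    const int N = 32;
    std::vector<std::uint8_t> noise(N * N * N * 4), blurred(noise.size());
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> dist(0, 255);
    for (auto& v : noise) v = static_cast<std::uint8_t>(dist(rng));

    auto at = [&](int x, int y, int z, int c) -> std::uint8_t {
        // wrap around so the texture tiles seamlessly (GL_REPEAT behaviour)
        x = (x + N) % N; y = (y + N) % N; z = (z + N) % N;
        return noise[((z * N + y) * N + x) * 4 + c];
    };

    for (int z = 0; z < N; ++z)
        for (int y = 0; y < N; ++y)
            for (int x = 0; x < N; ++x)
                for (int c = 0; c < 4; ++c) {
                    int sum = 0;
                    for (int dz = -1; dz <= 1; ++dz)
                        for (int dy = -1; dy <= 1; ++dy)
                            for (int dx = -1; dx <= 1; ++dx)
                                sum += at(x + dx, y + dy, z + dz, c);
                    blurred[((z * N + y) * N + x) * 4 + c] =
                        static_cast<std::uint8_t>(sum / 27);
                }
    return blurred;
}
```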

In the case of modifying the location of the data access for the volume, the three components of the noise texture form a vector which is added to the texture coordinates for the volume data per pixel. Offset textures in current graphics hardware would be ideally suited to add the noise vector to each per-fragment texture coordinate. Unfortunately, NVIDIA's texture shader extension does not facilitate 3D offset textures. Instead, the perturbed data is rendered to a pixel buffer that is used instead of the original volume data. Figure 46 illustrates this process. Notice that the high-frequency content is created by allowing the noise texture to repeat.

In the case of modifying the volume itself, the scalar value for each fragment is modified with the weighted sum of the three noise textures. Then this modified scalar is employed for the transfer function lookup.

Figure 45: Procedural clouds. The image on the top shows the underlying data (64³). The center image shows the perturbed volume. The bottom image shows the perturbed volume lit from behind with low-frequency noise added to the indirect attenuation to achieve subtle iridescence effects.

Figure 47 shows a fireball that is rendered with a small modification of the pre-integrated volume rendering approach from Section 8. Instead of an indexed 3D texture containing just the scalar values from a single volume, an RGB texture is employed that contains the volume scalars in the red channel, a low-frequency noise texture in the green channel and a high-frequency noise texture in the blue channel. If we apply texture coordinates (α, β, γ) instead of the constant (1, 0, 0) texture coordinates as outlined in Figure 38, the three volumes contained in the color channels are weighted using the dot product before the dependent texture lookup is performed. Thus, if we store a radial distance volume in the red color channel, it is perturbed. Flames are animated by changing the weighting factors (α, β, γ), while an outwards movement is achieved by color cycling the transfer function.


Figure 46: An example of texture coordinate perturbation in 2D. A shows a square polygon mapped with the original texture that is to be perturbed. B shows a low-resolution perturbation texture applied to the polygon multiple times at different scales. These offset vectors are weighted and summed to offset the original texture coordinates, as seen in C. The texture is then read using the modified texture coordinates, producing the image seen in D.

11. Limitations

Even though there has been an enormous development in the consumer graphics market in the last few years, current low-cost graphics hardware still has its drawbacks.

First, the computational precision of current PC graphics hardware is quite limited, especially in the rasterization stage of the rendering pipeline. Frame buffers currently support 32 bits, i.e. 8 bits per color channel. NVIDIA's register combiners extension has 9 bits of precision for a range from -1 to 1, while the texture shader unit has floating point precision. ATI's fragment shader unit supports 16 bits of precision for a range of values from -8 to 8, and 13 bits in the range -1 to 1. Note that for calculations with 16 bits of precision, it is necessary to expand the values to the full range of -8 to 8. Textures currently have 8 bits of precision per color channel. A first step towards higher precision is NVIDIA's HILO texture format, which supports two channels per texture with 16 bits of precision. The result of this limited precision are artifacts that can frequently be observed in hardware-accelerated renderings. One solution was already introduced: the pre-computation of values with the full precision of the CPU in a pre-processing step, as employed for pre-integrated volume rendering.

Figure 47: Pre-integrated volume rendering of a fireball. The fireball effect is obtained by mixing different volumes during rendering: (1) radial distance volume with a high-frequency fire transfer function, (2) Perlin noise volume with a fire transfer function, (3) weighted combination of the distance volume and two Perlin noise volumes, (4) like (3), but with higher weights for the Perlin noise volumes.

Another drawback is the limited programmability of the current units in the rendering pipeline. NVIDIA supports four texture fetches in a pass, followed by eight general combiner operations and one final combiner operation. ATI's fragment shaders currently support two phases, each with eight texture fetches followed by eight math operations. More complex operations require the computation to be split into multiple rendering passes. Hardware-accelerated off-screen buffers (PBuffers) and the possibility to render directly to a texture (render-to-texture OpenGL extension) help to limit the loss in performance caused by slow read-backs for subsequent rendering passes.

As already discussed in Section 9, the limited amount of texture memory in combination with the low bandwidth of the AGP bus is one of the main obstacles for volume rendering on low-cost PC graphics hardware. Whilst the available texture memory might be sufficient for computer games, the currently employed brute-force approaches of rasterizing the complete volume make it necessary to keep the complete data set in texture memory. Current frame rates are mainly limited by rasterization power and memory bandwidth. Progress in this field is thus dependent on the development of faster memory in the memory market. Besides texture compression as discussed in Section 9, early z-tests or embedded DRAM with wide busses might be future solutions to this problem.

From the application developer's point of view, the variety of different (and highly incompatible) extensions that give access to the proprietary features of each manufacturer's graphics hardware on a low abstraction layer makes it hard to develop programs that work on all graphics boards. Shading languages that raise the abstraction layer are currently under development for the future OpenGL 2.0 and DirectX 9 APIs. A first big step is the Stanford shading language, which has already been available for quite a while26.

12. Summary and Conclusions

This report presented the latest algorithms that utilize flexible consumer graphics hardware for high-quality volume rendering. We have shown a number of new, highly optimized methods that permit an interactive visualization of quite large volumetric data sets with yet unseen speed and quality.

Apart from these algorithms, multi-textures and programmable rasterization units provide a number of additional options, like per-fragment clipping with arbitrary clipping geometry40, speedup of rendering using parallel rasterization37 and the rendering of translucent materials19. The parallelization of multiple consumer graphics boards into a rendering cluster has been investigated by several authors11, 24.

We believe that the trend towards more programmability of the graphics pipeline will carry on. A fully programmable rasterization unit that facilitates arbitrary per-fragment programs might even be available in the next generation of consumer graphics hardware. Early z-tests will prevent unnecessary rasterization and memory accesses. Another trend is the development towards floating point precision in the frame buffer and in textures. This is particularly useful for multi-pass approaches, which currently suffer from errors accumulated over multiple rendering passes. Instead of merely rendering geometry created from scientific data, programmable graphics hardware will increasingly be used for filtering and mapping tasks in scientific data visualization in the future.

The next generation of consumer graphics hardware is just on the horizon. 3DLabs just announced1 the P10 Visual Processor Unit (VPU), which will be fully programmable and support the upcoming standards OpenGL 2.0 and DirectX 9. NVIDIA's and ATI's next generation GPUs will follow in fall 2002.

Acknowledgements

We would like to thank Joe Kniss, Christof Rezk-Salama, Markus Hadwiger and Martin Kraus for providing images and illustrations for this report.

Web-Links

Simian:
http://www.cs.utah.edu/~jmk/simian/

Pre-Integrated Volume Rendering:
http://wwwvis.informatik.uni-stuttgart.de/~engel/

Interactive Volume Rendering:
http://www9.informatik.uni-erlangen.de/Persons/Rezk/

High-Quality filtering on PC graphics hardware:
http://www.vrvis.at/vis/research/hq-hw-reco/algorithm.html

Siggraph course on PC volume graphics:
http://www.siggraph.org/s2002/conference/courses/crs42.html

References

1. 3DLabs Press Release. http://www.3dlabs.com/whatsnew/pressreleases/pr02/02-05-03-vpu.htm, Sunnyvale, CA, May 3, 2002. 23

2. U. Behrens and R. Ratering. Adding Shadows to a Texture-Based Volume Renderer. In 1998 Volume Visualization Symposium, pages 39–46, 1998. 12

3. J. F. Blinn. Jim Blinn's Corner: Image Compositing – Theory. IEEE Computer Graphics and Applications, 14(3), 1994. 4

4. J. Blinn. Models of Light Reflection for Computer Synthesized Pictures. Computer Graphics, 11(2):192–198, 1977. 4

5. B. Cabral, N. Cam and J. Foran. Accelerated Volume Rendering and Tomographic Reconstruction Using Texture Mapping Hardware. ACM Symposium on Volume Visualization, 1994. 6, 20

6. D. Ebert, F. K. Musgrave, D. Peachey, K. Perlin and S. Worley. Texturing and Modeling: A Procedural Approach. Academic Press, July 1998. 21

7. Klaus Engel, Martin Kraus, and Thomas Ertl. High-Quality Pre-Integrated Volume Rendering Using Hardware-Accelerated Pixel Shading. Proc. of Eurographics/SIGGRAPH Graphics Hardware Workshop 2001, 2001. 17

8. D. Ginsburg, E. Hart and J. Mitchell. ATI_fragment_shader, OpenGL Extension Registry: http://oss.sgi.com/projects/ogl-sample/registry/ATI/fragment_shader.txt. 6

9. R. Grzeszczuk, C. Henn and R. Yagel. Advanced Geometric Techniques for Ray Casting Volumes. In SIGGRAPH 98 Course Nbr. 4, Orlando, FL, 1998. 8

10. M. Hadwiger, T. Theußl, H. Hauser, and E. Gröller. Hardware-Accelerated High-Quality Filtering on PC Hardware. Proc. of Vision, Modeling, and Visualization 2001, pages 105–112, 2001. 15

11. Greg Humphreys, Ian Buck, Matthew Eldridge and Pat Hanrahan. Distributed Rendering for Scalable Displays. Proceedings of Supercomputing, 2000. 23

12. Mark J. Kilgard. NV_register_combiners, OpenGL Extension Registry: http://oss.sgi.com/projects/ogl-sample/registry/NV/register_combiners.txt. 5

13. Mark J. Kilgard. NV_register_combiners2, OpenGL Extension Registry: http://oss.sgi.com/projects/ogl-sample/registry/NV/register_combiners2.txt. 5

14. Mark J. Kilgard. NV_texture_shader, OpenGL Extension Registry: http://oss.sgi.com/projects/ogl-sample/registry/NV/texture_shader.txt. 5


15. Mark J. Kilgard. NV_texture_shader2, OpenGL Extension Registry: http://oss.sgi.com/projects/ogl-sample/registry/NV/texture_shader2.txt. 5

16. Gordon Kindlmann and James W. Durkin. Semi-Automatic Generation of Transfer Functions for Direct Volume Rendering. In IEEE Symposium on Volume Visualization, pages 79–86, 1998. 14

17. Joe Kniss, Gordon Kindlmann and Charles Hansen. Interactive Volume Rendering Using Multi-Dimensional Transfer Functions and Direct Manipulation Widgets. IEEE Visualization 2001. 14

18. Joe Kniss, Gordon Kindlmann and Charles Hansen. Multi-Dimensional Transfer Functions for Interactive Volume Rendering. IEEE Transactions on Visualization and Computer Graphics, 2002. 13

19. Joe Kniss, Simon Premoze, Charles Hansen and David Ebert. Interactive Translucent Volume Rendering and Procedural Modeling. IEEE Visualization 2002. 13, 21, 23

20. Martin Kraus and Thomas Ertl. Adaptive Texture Maps. Proc. of Eurographics/SIGGRAPH Graphics Hardware Workshop 2002, 2002. 20

21. E. LaMar, B. Hamann and K. Joy. Multiresolution Techniques for Interactive Texture-based Volume Visualization. Proc. IEEE Visualization, 1999. 8, 20

22. Marc Levoy. Display of Surfaces from Volume Data. IEEE Computer Graphics & Applications, 8(5):29–37, 1988. 14

23. Marc Levoy. Efficient ray tracing of volume data. ACM Transactions on Graphics, 9(3):245–261, July 1990. 2, 3

24. M. Magallon, M. Hopf, and T. Ertl. Parallel Volume Rendering using PC Graphics Hardware. In Pacific Graphics, 2001. 23

25. S. R. Marschner and R. J. Lobb. An evaluation of reconstruction filters for volume rendering. In Proceedings of IEEE Visualization '94, pages 100–107, 1994. 2

26. William R. Mark, Svetoslav Tzvetkov and Pat Hanrahan. A Real-Time Procedural Shading System for Programmable Graphics Hardware. In Proceedings of SIGGRAPH '01, pages 159–170, 2001. 23

27. Nelson Max. Optical Models for Direct Volume Rendering. IEEE Transactions on Visualization and Computer Graphics, 1(2):99–108, 1995. 3

28. M. Meißner, S. Guthe and W. Strasser. Interactive Lighting Models and Pre-Integration for Volume Rendering on PC Graphics Accelerators. In Proceedings of Graphics Interface, Conference on Human Computer Interaction and Computer Graphics, 2002. 19

29. M. Meißner, U. Hoffmann and W. Straßer. Enabling Classification and Shading for 3D Texture Based Volume Rendering Using OpenGL and Extensions. Visualization '99, 1999. 19

30. M. Meißner, U. Kanus and W. Straßer. VIZARD II, A PCI-Card for Real-Time Volume Rendering. Proc. Eurographics/SIGGRAPH Workshop on Graphics Hardware, 61–68, 1998. 1

31. D. P. Mitchell and A. N. Netravali. Reconstruction filters in computer graphics. In Proceedings of SIGGRAPH '88, pages 221–228, 1988. 2

32. M. Segal and K. Akeley. The OpenGL Graphics System: A Specification. http://www.opengl.org. 1

33. A. V. Oppenheim and R. W. Schafer. Digital Signal Processing. Prentice Hall, Englewood Cliffs, 1975. 2

34. H. Pfister, J. Hardenbergh, J. Knittel, H. Lauer and L. Seiler. The VolumePro real-time ray-casting system. Proc. of SIGGRAPH '99, 251–260, 1999. 1

35. B. T. Phong. Illumination for Computer Generated Pictures. Communications of the ACM, 18(6):311–317, June 1975. 4, 10

36. T. Porter and T. Duff. Compositing digital images. ACM Computer Graphics (Proc. of SIGGRAPH '84), 18:253–259, 1984. 4

37. Christof Rezk-Salama, Klaus Engel, Michael Bauer, Günther Greiner, and Thomas Ertl. Interactive Volume Rendering on Standard PC Graphics Hardware Using Multi-Textures and Multi-Stage Rasterization. Proc. of Eurographics/SIGGRAPH Graphics Hardware Workshop 2000, 2000. 7, 10, 23

38. S. Röttger, M. Kraus, and T. Ertl. Hardware-Accelerated Volume and Isosurface Rendering Based On Cell-Projection. Proc. of IEEE Visualization 2000, pages 109–116, 2000. 16

39. M. Weiler, R. Westermann, C. Hansen, K. Zimmerman and T. Ertl. Level-Of-Detail Volume Rendering via 3D Textures. Proceedings of IEEE VolVis 2000, 7–13, 2000. 20

40. D. Weiskopf, K. Engel and T. Ertl. Texture-Based Volume Clipping via Fragment Operations. Proc. of IEEE Visualization 2002, 2002. 23

41. R. Westermann and T. Ertl. Efficiently Using Graphics Hardware in Volume Rendering Applications. Proc. of SIGGRAPH '98, 169–178, 1998. 10
