How NOT to Lie with Visualization http://www.research.ibm.com/dx/proceedings/pravda/truevis.htm


How NOT to Lie with Visualization

Bernice E. Rogowitz [email protected]

Lloyd A. Treinish [email protected]

IBM Thomas J. Watson Research Center Yorktown Heights, NY

Introduction

How data are represented visually has a powerful effect on how the structure in those data is perceived. For example, in Figure 1, four representations of an MRI scan of a human head are shown. The only difference between these images is the mapping of color to data values, yet the four representations look very different. Furthermore, the inferences an analyst would draw from these representations would vary considerably. That is, variations in the method of representing the data can significantly influence the user's perception and interpretation of the data.


Figure 1. Four colormaps applied to a slice of an MRI scan of a human head. They demonstrate how the representation can influence the interpretation of the data.

The importance of visual representation has been a lively topic at the annual IEEE Computer Society Visualization conferences. This concept was first publicized by Huff [1954] in his book How to Lie with Statistics. In this book and in the How to Lie with Visualization sessions at those conferences, the major concern is how the interpretation of data can be subverted by manipulating the data representation. In this article, we take a converse tack and ask: how can the interpretation of data be enhanced? To address this question, we consider the structure of the data, the perception of the visual dimensions used in visualization, and the task the analyst is trying to solve.

The Complexity of Mapping Data on Visual Representations

Modern interactive systems give the user free rein over the mapping of data onto visual dimensions, and the number of visual dimensions available for data representation is exploding. A visualization can use x, y, and z to represent the spatial dimensions of an object, color can be mapped onto a surface representing a fourth, the surface can be deformed according to a fifth, isocontour lines can represent a sixth, coloring them can represent a seventh, glyphs on the surface can represent a few more, not to mention animation, transparency, and stereo. This great flexibility, however, can open a Pandora's box of problems for the user, and can easily give rise to visual representations which do not adequately represent the structure in the data or which introduce misleading visual artifacts.

The appropriate use of color is an area of particular consternation. This is partly because the perceptual impact of a color cannot be reliably predicted from a knowledge of the red, green and blue components generally made available to users. Furthermore, even if the three perceptual dimensions of color are surfaced to the users, they may not be aware that different aspects of the color signal communicate different characteristics of the data. Without guidance about the physical or psychophysical properties of color, or about which colormaps are most appropriate for which types of data, the user is at a loss, even if the system provides a colormap editor or a library of pre-computed colormaps.

One common way developers of visualization software address this problem is to provide users with a default colormap. The most common default colormap, shown in the top left panel of Figure 1, maps the lowest value in the variable to blue, the highest value to red, and interpolates in color space (red, green, blue) to produce a color scale. This rainbow hue colormap is widely used in visualization, but produces several well-documented artifacts (e.g., Lefkowitz and Herman [1992]; Robertson [1988]; Rogowitz, Ling and Kellogg [1992]). In this MRI image, for example, the colormap creates perceived contours which do not reflect discrete transitions in the data, structures in the data which fall within one of these artificial bands are not represented, and attention is drawn to the yellow areas because they are the brightest, not because they are in any way the most important.
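The brightness artifact described above can be checked numerically. The following is a minimal sketch, not the colormap used in Figure 1: the five control colors (blue, cyan, green, yellow, red) and the use of Rec. 709 luma weights as a brightness proxy are assumptions made for illustration only.

```python
# Hypothetical rainbow control points: blue -> cyan -> green -> yellow -> red.
RAINBOW_STOPS = [
    (0.0, 0.0, 1.0),
    (0.0, 1.0, 1.0),
    (0.0, 1.0, 0.0),
    (1.0, 1.0, 0.0),
    (1.0, 0.0, 0.0),
]


def rainbow(t):
    """Map t in [0, 1] to RGB by piecewise-linear interpolation in RGB space."""
    t = min(max(t, 0.0), 1.0)
    scaled = t * (len(RAINBOW_STOPS) - 1)
    i = min(int(scaled), len(RAINBOW_STOPS) - 2)
    frac = scaled - i
    lo, hi = RAINBOW_STOPS[i], RAINBOW_STOPS[i + 1]
    return tuple(a + (b - a) * frac for a, b in zip(lo, hi))


def luma(rgb):
    """Rough brightness proxy: Rec. 709 weights applied directly to RGB."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


# Sample the scale at nine evenly spaced data values.
samples = [i / 8 for i in range(9)]
lumas = [luma(rainbow(t)) for t in samples]

# A faithful magnitude scale would get steadily brighter; this one does not.
is_monotonic = all(a <= b for a, b in zip(lumas, lumas[1:]))
brightest_t = samples[lumas.index(max(lumas))]
```

Under these assumptions the brightness proxy rises and falls along the scale and peaks at the yellow control point, three quarters of the way through the data range, which is consistent with the observation that yellow regions attract attention regardless of their importance.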

Giving users tools for creating accurate and effective representations of their data, furthermore, is not confined to the selection of colormaps. There is also confusion in the application of contours, transparency, depth, and animation, especially since the perception of these dimensions can often interact. For example, if a blue and a red object are placed behind a translucent green object, you might expect that both objects would maintain their color, but be tinged by the color of the transparent layer. This is true for the blue object, which appears bluish-green, but not for the red object, which appears yellow. This effect is well understood within the context of the algorithm by which transparency is generally computed and principles of additive color mixture, but can produce surprising results for the user (Rogowitz and Treinish [1993a]).
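The transparency example can be reproduced with a few lines of arithmetic. This sketch assumes a standard per-channel "over" blend with an opacity of 0.5; the article does not state which transparency algorithm or opacity it has in mind, so both are illustrative choices.

```python
import colorsys


def composite_over(front, back, alpha):
    """Blend a translucent front color over an opaque back color, per channel."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(front, back))


def hue_degrees(rgb):
    """Hue angle of an RGB color in degrees (0 = red, 60 = yellow, 180 = cyan)."""
    h, _s, _v = colorsys.rgb_to_hsv(*rgb)
    return h * 360.0


GREEN = (0.0, 1.0, 0.0)
BLUE = (0.0, 0.0, 1.0)
RED = (1.0, 0.0, 0.0)

# Assumed opacity of the translucent green layer.
ALPHA = 0.5

blue_behind_green = composite_over(GREEN, BLUE, ALPHA)
red_behind_green = composite_over(GREEN, RED, ALPHA)

blue_hue = hue_degrees(blue_behind_green)  # lands between blue and green
red_hue = hue_degrees(red_behind_green)    # lands on yellow, not on red
```

With these settings the blue object keeps a hue between blue and green, while the red object's hue moves to yellow, because the red and green channels end up equal and the blue channel is empty.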

Using Perceptual Rules to Guide the Design Process

Since most users do not want to become experts in human perception, our strategy is to incorporate guidance directly into the visualization software to aid in the visual design process (Rogowitz and Treinish [1993a]; [1993b]; [1994]; and Bergman, Rogowitz and Treinish [1995]). In our approach, which we call PRAVDA (Perceptual Rule-based Architecture for Visualizing Data Accurately), rules filter the choices offered to the user, based on principles of human perception, attention, and color theory.

In the case of colormap selection, for example, we have constructed a library of colormaps and a set of perceptual rules which constrain the set of colormaps offered to the user. These rules are parameterized by metadata about 1) data type, 2) data spatial frequency, 3) visualization task, and 4) other design choices made by the user. Three colormaps designed for different visualization tasks are compared with the default colormap in Figure 1. The isomorphic colormap (upper right) is designed to produce a faithful representation of the structure in the data. In this isomorphic colormap, equal steps in data value correspond to equal perceptual steps in the color scale. The segmented colormap (lower left) is designed to delineate regions visually. The highlighting colormap (lower right) is designed to draw the user's attention to regions in the image which have certain characteristic features. This colormap was designed to draw attention to areas which have data values near the median of the range.

The four colormaps in Figure 1 clearly demonstrate how different mappings of data onto color scales produce different representations of the data. The goal of our work is to understand how different information in the data is communicated by specific characteristics of the visual representation, and to harness this knowledge so it can be used routinely in visualization. The rest of this article focuses on the colormap problem, describing the perceptual rules and metadata required to drive colormap selection.

Faithfully Representing the Structure in the Data

In order to accurately represent the structure in the data, it is important to understand the relationship between data structure and visual representation. For nominal data, objects should be distinguishably different, but since the data themselves are not ordered, there should be no perceptual ordering in the representation. For ordinal data, objects should be perceptually discriminable, but the ordering of the objects should be apparent in the representation. In interval data, equal steps in data value should appear as steps of equal perceived magnitude in the representation. In ratio data, values increase and decrease monotonically about a true zero or other threshold, which should be preserved in the data representation.

One important application of scientific visualization is to represent the magnitude of a variable at every spatial position. In many cases, the interpretation of the data depends on having the visual picture accurately represent the structure in the data. In order to accurately represent interval data, for example, the visual dimension chosen should appear continuous to the user. Candidate colormaps which preserve the monotonic relationship between data values and perceived magnitude can be drawn from psychophysical scaling experiments. Stevens [1966], for example, identified a set of sensory dimensions for which a monotonic increase in stimulus intensity produced a monotonic increase in perceived magnitude. In particular, he found the shape of this relationship to be a power law, with each sensory dimension characterized by its exponent. Perceived magnitude obeys a power relationship with physical luminance over a very large range of gray scales, which may explain why grayscale colormaps are commonly used in medical imaging. Another dimension which displays this behavior is color saturation, the progression of a color from vivid to pastel.
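The power-law relationship mentioned above can be written as perceived magnitude = k * intensity ** exponent. The sketch below inverts that relationship to find stimulus intensities whose perceived magnitudes are evenly spaced. The exponent value 0.33 is an illustrative assumption, not a figure taken from this article, and the function names are hypothetical.

```python
def perceived_magnitude(intensity, exponent, k=1.0):
    """Power-law model: perceived magnitude as a function of stimulus intensity."""
    return k * intensity ** exponent


def intensities_for_equal_steps(n_steps, exponent):
    """Intensities in [0, 1] whose modeled perceived magnitudes are evenly spaced."""
    return [(i / (n_steps - 1)) ** (1.0 / exponent) for i in range(n_steps)]


# Assumed exponent, chosen only to illustrate a compressive (exponent < 1) scale.
ASSUMED_EXPONENT = 0.33

levels = intensities_for_equal_steps(5, ASSUMED_EXPONENT)
perceived = [perceived_magnitude(v, ASSUMED_EXPONENT) for v in levels]
```

With an exponent below one, the physical intensities needed for equal perceived steps are bunched toward the low end of the range, which is why a scale that is linear in physical intensity would not look evenly spaced under this model.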

The top row of Figure 1 compares the effectiveness of the default colormap and a colormap designed to produce an isomorphic representation of interval data. Looking at the color bar for the default colormap we see bands of colors, not a gradual increase across the range. For example, nearly the entire range from 50 to 100 looks uniformly cyan. Although the data change by almost a factor of 2, all the values in this range look identical. This is also true for magnetic resonances in the range from 125 to 200, which appear to be green. This colormap produces a contoured impression, and masks the subtle variations in MRI intensity.

In contrast, the isomorphic representation used in the upper right, although less dramatic, more accurately reflects the underlying structure in the data. In this colormap, luminance and saturation both increase monotonically with data value. That is, brightness increases monotonically and hue, which begins as a pure vivid blue, becomes more and more pastel. This colormap produces a monotonic increase in perceived magnitude over the range. Using this colormap, structures which are invisible using the rainbow hue map can be easily seen. For example, the spatial structure in the midbrain and striate cortex, which appears uniform green in the default map, is highly detailed in the isomorphic map. Given the artifacts introduced by the default colormap, it is easy to understand why the medical community has been so cautious about adding color to their visual representations.
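As a rough illustration of a scale that gets steadily brighter while moving from a vivid blue toward a pastel one, the sketch below interpolates between two endpoint colors and checks that a simple brightness proxy increases at every step. The endpoint colors, the 0 to 250 data range, and the luma weights are assumptions; this is not the colormap shown in Figure 1, and monotonic luma alone does not guarantee equal perceptual steps.

```python
# Hypothetical endpoints: a dark vivid blue and a light pastel blue.
DARK_BLUE = (0.0, 0.0, 0.5)
PALE_BLUE = (0.85, 0.92, 1.0)


def luma(rgb):
    """Rough brightness proxy: Rec. 709 weights applied directly to RGB."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def blue_ramp(t):
    """Map t in [0, 1] to a color between DARK_BLUE and PALE_BLUE."""
    t = min(max(t, 0.0), 1.0)
    return tuple(lo + (hi - lo) * t for lo, hi in zip(DARK_BLUE, PALE_BLUE))


def apply_colormap(values, vmin, vmax, cmap):
    """Normalize data values to [0, 1] and look up a color for each."""
    span = vmax - vmin
    return [cmap((v - vmin) / span) for v in values]


colors = apply_colormap([0, 50, 100, 150, 200, 250], 0, 250, blue_ramp)
lumas = [luma(c) for c in colors]

# Every increase in data value produces an increase in the brightness proxy.
strictly_increasing = all(a < b for a, b in zip(lumas, lumas[1:]))
```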


The Importance of Spatial Frequency

Not all isomorphic colormaps, however, are appropriate for all data sets because different components of the color signal are processed differently by the human visual system. One very important distinction for visualization is that the components of the color signal have different spatial sensitivities. The luminance component in a color (the brightness/darkness component) is critical for carrying information about high spatial frequency variations in the data. If the colormap does not contain a monotonic luminance variation, fine resolution information will not be seen. Conversely, the saturation and hue components in color are critical for carrying information about low spatial frequency variations in the data. A colormap which only varies in luminance (e.g., a grayscale image) cannot adequately communicate information about gradual changes in the spatial structure of the data.

This means that the balance of luminance and saturation variation in an isomorphic colormap depends on the spatial frequency of the data. Interval data with high-spatial-frequency information call for a monotonic scale with a strong luminance component; interval data with low-spatial-frequency information call for a monotonic scale with a strong saturation component.

These ideas are illustrated in Figure 2, which shows a luminance-based colormap (left side) and saturation-based colormap (right side) applied to low-spatial-frequency data (top) and high-spatial-frequency data (bottom). In all four cases, continuous data are mapped onto isomorphic colormaps, so contouring and other artifacts have already been eliminated. This figure thus highlights additional advantages of taking spatial frequency into account.


Figure 2. Isomorphic colormaps for low and high spatial frequency data. The top row shows low spatial frequency data from a weather model. The bottom row shows high spatial frequency data from a radar scan. The high frequency colormap (left) reveals more detail in the radar data. The low frequency colormap (right) reveals more structure in the weather data.

The data in the top row are output from a weather model which computes, among other things, the variation in relative humidity over a geographic region. The structure of this low-spatial-frequency variation is practically lost when the data are depicted with a map designed for depicting high spatial frequency information (top left). The right-hand map, designed to expose low spatial-frequency structure, gives the analyst more information, especially in regions where the humidity changes slowly over the geography such as near the lower central portion below 65%. Also, in the lower right-hand corner of the image, the infusion of the high humidity air into the low-humidity area is clearly seen as a yellow stream, virtually invisible with the colormap intended for high-spatial-frequency data.

The images in the bottom row show a radial sweep from a weather radar sensor, measuring the high-spatial frequency variation of reflected intensity (e.g., from thick clouds). The high spatial frequency map (left) gives a good representation of the finely detailed structure of these data, and also reveals sampling artifacts introduced by the sensor. The low spatial frequency colormap (right) blurs the fine detail and, because the values above the mean are a different hue, puts inappropriate emphasis on these regions, shown in yellow.

Colormaps for Segmentation Tasks

The rules for providing isomorphic colormaps for ratio and interval data are also effective in creating maps for segmenting data. The luminance component conveys monotonicity for high spatial frequency data, while the saturation component can be used to convey monotonicity in low spatial-frequency data. Since the steps are explicitly defined, however, luminance steps can also be effectively used for low spatial-frequency data. In creating a segmented colormap, it is important that the segments are each discriminably different from one another, which limits the number of steps which can be represented. We have found that more steps can be effectively discriminated for low spatial-frequency data than for high.

Figure 3 shows a five-level segmented colormap (left side) and a ten-level segmented colormap (right side) applied to low-spatial frequency data (top) and high-spatial frequency data (bottom). For low spatial-frequency data (top row), additional levels provide additional information. In this case, additional features of the earth's magnetic field in the southern hemisphere are revealed. For example, in the right-hand image, the gradient about the south magnetic pole is clearer. By contrast, increasing the number of colormap steps does not reveal additional features of the high spatial-frequency cloud fraction observations (bottom row); instead, it effectively blurs the segmentation.


Figure 3. Segmented colormaps for low and high spatial frequency data. The top row shows low spatial frequency data of the earth's magnetic field. The bottom row shows high spatial frequency cloud fraction data. The high frequency colormap (right) reveals more information about the structure of the low-frequency data, but reduces the information communicated for the high-frequency data.
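A segmented colormap of the kind compared above can be sketched as a quantization of the data range into a fixed number of steps, each shown at its own luminance level. The equal-width bins and evenly spaced gray levels below are assumptions made for illustration; the article does not specify how the segment boundaries or colors in Figure 3 were chosen.

```python
def segment_index(value, vmin, vmax, levels):
    """Return which of `levels` equal-width bins a data value falls into."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    t = (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)
    # The top of the range belongs to the last bin rather than a new one.
    return min(int(t * levels), levels - 1)


def gray_steps(levels):
    """Evenly spaced gray levels from black (0.0) to white (1.0)."""
    return [i / (levels - 1) for i in range(levels)]


def segmented_gray(value, vmin, vmax, levels):
    """Gray level assigned to a data value by a `levels`-step segmented map."""
    return gray_steps(levels)[segment_index(value, vmin, vmax, levels)]


# The same data value lands in a different step under 5 and 10 levels.
five_level = segment_index(50, 0, 100, 5)
ten_level = segment_index(50, 0, 100, 10)
```

Doubling the number of levels halves the luminance difference between neighboring steps, which is one way to see why the number of usable steps is limited by how easily adjacent segments can be told apart.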

Colormaps for Highlighting Tasks

Rules for selecting colormaps which highlight particular features in the data can be drawn from the literature on attention (e.g., Treisman and Gelade [1980]; Julesz [1981]). For visualization, this requires a user to identify ranges of data to highlight perceptually, such as shown in the lower right of Figure 1.

Using these principles, it is possible to construct colormaps which highlight particular ranges in the data. An interesting extension of this approach is illustrated in Figure 4, which displays data from the visible part of the spectrum remotely-sensed from space.


Figure 4. Two isomorphic colormaps applied to remotely sensed data, and a highlighting scheme which identifies regions of interest without disturbing the perceived spatial structure of the data.

The left-hand panels display these data using two isomorphic colormaps designed for high-spatial-frequency data. The right-hand panel shows how color can be used to highlight a region of interest without disturbing the perception of other aspects of the data. Across the entire image, the luminance component of the colormap is identical. Within the regions of interest, however, the hue component is varied, producing three distinct, semantically differentiable regions, one blue, one green and one yellow. This method has been used successfully to help users mark regions of interest and to highlight for the user regions in the image which display certain characteristics, such as containing data which match a template.
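The idea of tinting a region of interest while leaving luminance untouched can be sketched with the YIQ color model in Python's standard `colorsys` module, where Y is a luma channel and I and Q carry chroma. The choice of YIQ, the tint values, and the function name are assumptions for illustration; the article does not say which color space was used for Figure 4.

```python
import colorsys

# Hypothetical chroma offset (I, Q) that pushes a gray pixel toward blue.
BLUE_TINT = (-0.10, 0.05)


def highlight_pixel(gray_level, in_region, tint=BLUE_TINT):
    """Return an RGB color whose luma equals gray_level.

    Pixels outside the region of interest stay neutral gray; pixels inside
    receive a chroma offset while the luma channel is left unchanged.
    """
    i, q = tint if in_region else (0.0, 0.0)
    return colorsys.yiq_to_rgb(gray_level, i, q)


# One scan line of gray levels; the middle two pixels are in the region.
gray_line = [0.3, 0.4, 0.5, 0.6]
region_mask = [False, True, True, False]

colored = [highlight_pixel(g, m) for g, m in zip(gray_line, region_mask)]
lumas_after = [colorsys.rgb_to_yiq(*rgb)[0] for rgb in colored]
```

Because only the chroma channels change, the luma recovered from each colored pixel matches the original gray level, provided the tint is small enough that the result stays inside the displayable RGB range.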

Complementary Visual Techniques

An important task in visualization is to represent data from many sources simultaneously. The image at the top of Figure 5 is derived from three spectral bands of another remotely-sensed image. These data are displayed in a typical fashion, mapping the values of each spectral band to levels of red, green and blue in the image. This representation provides a crude classification of the pixels. Each pixel in these data has also been categorized into 5 classes using an external land-use classification scheme. This information could be displayed to the user by coloring the pixels according to class membership, with a different color for each class.


Figure 5. Two approaches to studying the results of a land-use classification model for remotely sensed data. The top figure shows a typical pixel-based colormap. The bottom figure illustrates a graphical approach to examining differences between classes with respect to two spectral bands.

If the spatial regions occupied by the classes are sufficiently large, each pixel could be mapped onto isoluminant blue, cyan, green, yellow, and red, as described above, to highlight the different categories without perturbing the spatial structure of the data.

The six panels on the bottom illustrate a complementary method for using color to understand the semantics of class membership for such a data set. In this representation, each pixel has also been assigned a color according to its class membership. The coloring is used, however, to study the behavior of the different classes in terms of relationships among the various spectral bands. The top left plot shows the relationship between IR1 and IR2, the near-infrared and far-infrared bands. These bands are highly correlated (r=0.92). The next five plots show this same relationship separately for each of the five classes. Even looking at this one bivariate relationship reveals that the red and blue classes are different from the whole population and from the other classes in that there is a much smaller correlation between these two bands, that the green and yellow classes are the only classes with low values in both infrared bands, and the cyan class is the only one with high values in both infrared bands. This type of analysis allows the user to gain insight into the semantics of class membership.
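The comparison between the overall correlation and the per-class correlations can be sketched as follows. The pixel values are made up purely to show the mechanics and are not the data behind Figure 5; the class labels and function names are likewise hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_by_class(pixels):
    """Group (label, band_a, band_b) records by label and correlate each group."""
    groups = {}
    for label, a, b in pixels:
        xs, ys = groups.setdefault(label, ([], []))
        xs.append(a)
        ys.append(b)
    return {label: pearson_r(xs, ys) for label, (xs, ys) in groups.items()}


# Made-up pixels: (class label, near-infrared value, far-infrared value).
pixels = [
    ("red", 1, 3), ("red", 2, 1), ("red", 3, 3), ("red", 4, 1),
    ("cyan", 10, 20), ("cyan", 11, 22), ("cyan", 12, 24),
]

overall_r = pearson_r([p[1] for p in pixels], [p[2] for p in pixels])
per_class_r = correlation_by_class(pixels)
```

In this toy data set the two bands are strongly correlated across all pixels, yet the correlation within the "red" class is weak and negative, which mirrors the kind of difference between the whole population and individual classes that the per-class plots are meant to expose.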

PRAVDAColor

Figure 6 shows the PRAVDA rule-based colormap selection tool incorporated into an IBM Visualization Data Explorer (Abram and Treinish [1995]) program. In this visual program, data are imported into Data Explorer and flow into a module called PRAVDAColor. This tool computes metadata about the spatial frequency of the data and about data type (e.g., ordinal, interval or ratio), and asks the user to select the goal of the visual representation (e.g., isomorphic, segmentation, or highlighting) via a control panel widget.

Figure 6. Data Explorer visual program incorporating PRAVDAColor, demonstrating selection of candidate isomorphic colormaps for a low-spatial-frequency data set.

These metadata flow to rules which constrain the set of colormaps offered to the user. In this case, three colormaps have been offered to the user. Since the simulated jet engine noise data shown in this example are low spatial frequency interval data, and the task selected is isomorphic, these colormaps all encode variations in magnitude as variations in the saturation of opponent-process pairs. Clicking on any of the colormaps applies them directly to the data, and the user is free to vary the range of the colormap. In this case, the full range of the first colormap has been selected and the data are represented by a blue/yellow saturation scale.
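The way metadata can narrow a colormap library may be sketched as a simple filter. This is not the PRAVDAColor module or its rule set: the library entries, field names, and the single rule encoded here (luminance-dominant maps for high spatial frequency, saturation-dominant maps for low) are hypothetical, and data type is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Colormap:
    name: str
    task: str     # "isomorphic", "segmentation", or "highlighting"
    channel: str  # dominant varying channel: "luminance", "saturation", "hue"


# Hypothetical library; the names are placeholders, not PRAVDA's colormaps.
LIBRARY = [
    Colormap("gray ramp", "isomorphic", "luminance"),
    Colormap("blue/yellow saturation", "isomorphic", "saturation"),
    Colormap("red/green saturation", "isomorphic", "saturation"),
    Colormap("five gray steps", "segmentation", "luminance"),
    Colormap("median highlight", "highlighting", "hue"),
]


def candidate_colormaps(task, spatial_frequency, library=LIBRARY):
    """Return the colormaps a rule set might offer for the given metadata."""
    if spatial_frequency not in ("low", "high"):
        raise ValueError("spatial_frequency must be 'low' or 'high'")
    maps = [m for m in library if m.task == task]
    if task == "isomorphic":
        # High-frequency detail is carried by luminance, gradual change by saturation.
        wanted = "luminance" if spatial_frequency == "high" else "saturation"
        maps = [m for m in maps if m.channel == wanted]
    return maps


offered = [m.name for m in candidate_colormaps("isomorphic", "low")]
```

For low-spatial-frequency data and an isomorphic task, this sketch offers only the saturation-based entries, in the same spirit as the candidates described for the jet engine noise example.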

Conclusions

Modern systems for creating visualizations have evolved to the extent that non-experts can create meaningful representations of their data. However, it is still not easy enough, mainly because the visual effects of processing, realizing and rendering data are not well-understood by the user, and the process of creating visualizations is largely ad hoc. Often countless iterations are spent trying to get a color right, to draw attention to a particular juxtaposition in the data, or to understand why a feature on the display screen does not seem to correlate to a physical phenomenon.

Our approach emphasizes a migration from a tool-based visualization system to a rule-based system which helps the user navigate through a complex design space. Since the design process is iterative, the application of the rules is under interactive user control. The rules we have implemented so far draw on knowledge from the areas of human perception and color theory, but this structure could easily be extended to incorporate expertise from other domains. The goal of this system is to help users make better, faster representations of their data.

Acknowledgments

This work is partially supported under NASA grant CAN NCC5-101.

The authors wish to acknowledge Lawrence Bergman's contribution to the development of PRAVDAColor and John Gerth's extension of the PRAVDAColor isomorphic maps to region-of-interest highlighting in satellite images. We would also like to thank Vittorio Castelli and Ed Kalin for assistance with the image classification work described in Figure 5.

MRI data are available courtesy of New York University, New York, NY. Cloud fraction and magnetic field data are available courtesy of NASA/Goddard Space Flight Center, Greenbelt, MD. Humidity data are available courtesy of NOAA Forecast Systems Laboratory, Boulder, CO. Radar data are available courtesy of Sigmet, Inc., Westford, MA. Jet engine noise data are available courtesy of CRAFT, Inc., Dublin, PA. Remotely sensed images are available courtesy of EDC, USGS, Sioux Falls, SD.

The analysis and visualization at the bottom of Figure 5 were produced using the Diamond software, which was developed at the IBM T. J. Watson Research Center and is marketed by SPSS, Inc.

All other figures were produced with the IBM Visualization Data Explorer, which was developed at the IBM T. J. Watson Research Center.

References

Abram, G. and L. Treinish. An Extended Data-Flow Architecture for Data Analysis and Visualization. Proceedings of the IEEE Visualization 1995 Conference, pp. 263-270, October 1995.

Bergman, L. D., B. E. Rogowitz and L. A. Treinish. A Rule-based Tool for Assisting Colormap Selections. Proceedings of the IEEE Visualization 1995 Conference, pp. 118-125, October 1995.

Huff, D. How to Lie with Statistics. New York: Norton, 1954.

Julesz, B. Textons, the elements of texture perception, and their interactions. Nature, 290, 12, 91-97, 1981.

Lefkowitz, H. and G. T. Herman. Color Scales for Image Data. IEEE Computer Graphics and Applications, 12, n. 1, pp. 72-80, January 1992.

Robertson, P. K. Visualizing Color Gamuts: A User Interface for the Effective Use of Perceptual Color Spaces in Data Displays. IEEE Computer Graphics and Applications, 8, pp. 50-63, September 1988.

Rogowitz, B. E., D. T. Ling and W. A. Kellogg. Task Dependence, Veridicality, and Pre-Attentive Vision: Taking Advantage of Perceptually-Rich Computer Environments. Proceedings of the SPIE Symposium, 1666, Human Vision, Visual Processing and Digital Display III, pp. 504-513, February 1992.

Rogowitz, B. E. and L. A. Treinish. Data Structures and Perceptual Structures. Proceedings of the SPIE Symposium, 1913, Human Vision, Visual Processing and Digital Display IV, pp. 600-612, February 1993.

Rogowitz, B. E. and L. A. Treinish. An Architecture for Perceptual Rule-Based Visualization. Proceedings of the IEEE Visualization 1993 Conference, pp. 236-243, October 1993.

Rogowitz, B. E. and L. A. Treinish. Using Perceptual Rules in Interactive Visualization. Proceedings of the SPIE Symposium, 2179, Human Vision, Visual Processing and Digital Display V, pp. 287-295, February 1994.

Stevens, S. S. Matching Functions Between Loudness and Ten Other Continua. Perception and Psychophysics, 1, pp. 5-8, 1966.

Treisman, A. and G. Gelade. A Feature Integration Theory of Attention. Cognitive Psychology, 18, pp. 643-662, 1980.