BYJeff ReY heeR, michaeL BostocK, anD VaDim oGieVetsKY …digitalcuration.umaine.edu › resources...

9
JUNE 2010 | VOL. 53 | NO. 6 | COMMUNICATIONS OF THE ACM 59 DOI:10.1145/1743546.1743567 Article development led by queue.acm.org A survey of powerful visualization techniques, from the obvious to the obscure. BY JEFFREY HEER, MICHAEL BOSTOCK, AND VADIM OGIEVETSKY A Tour Through the Visualization Zoo help engage more diverse audiences in exploration and analysis. The challenge is to create effective and engaging visu- alizations that are appropriate to the data. Creating a visualization requires a number of nuanced judgments. One must determine which questions to ask, identify the appropriate data, and select effective visual encodings to map data values to graphical features such as position, size, shape, and color. The challenge is that for any given data set the number of visual encodings—and thus the space of possible visualization designs—is extremely large. To guide this process, computer scientists, psy- of valuable information on how we conduct our businesses, governments, and personal lives. To put the informa- tion to good use, we must find ways to explore, relate, and communicate the data meaningfully. The goal of visualization is to aid our understanding of data by leveraging the human visual system’s highly tuned ability to see patterns, spot trends, and identify outliers. Well-designed visual representations can replace cognitive calculations with simple perceptual in- ferences and improve comprehension, memory, and decision making. By mak- ing data more accessible and appeal- ing, visual representations may also THANKS TO ADVANCES in sensing, networking, and data management, our society is producing digital information at an astonishing rate. According to one estimate, in 2010 alone we will generate 1,200 exabytes—60 million times the content of the Library of Congress. Within this deluge of data lies a wealth

Transcript of BYJeff ReY heeR, michaeL BostocK, anD VaDim oGieVetsKY …digitalcuration.umaine.edu › resources...

  • june 2010 | vol. 53 | no. 6 | communications of the acm 59

    Doi:10.1145/1743546.1743567

    Article development led by queue.acm.org

    A survey of powerful visualization techniques, from the obvious to the obscure.

    BY JeffReY heeR, michaeL BostocK, anD VaDim oGieVetsKY

    a tour through the Visualization zoo

    help engage more diverse audiences in exploration and analysis. The challenge is to create effective and engaging visu-alizations that are appropriate to the data.

    Creating a visualization requires a number of nuanced judgments. One must determine which questions to ask, identify the appropriate data, and select effective visual encodings to map data values to graphical features such as position, size, shape, and color. The challenge is that for any given data set the number of visual encodings—and thus the space of possible visualization designs—is extremely large. To guide this process, computer scientists, psy-

    of valuable information on how we conduct our businesses, governments, and personal lives. To put the informa-tion to good use, we must find ways to explore, relate, and communicate the data meaningfully.

    The goal of visualization is to aid our understanding of data by leveraging the human visual system’s highly tuned ability to see patterns, spot trends, and identify outliers. Well-designed visual representations can replace cognitive calculations with simple perceptual in-ferences and improve comprehension, memory, and decision making. By mak-ing data more accessible and appeal-ing, visual representations may also

    ThAnKs To ADVA n Ces in sensing, networking, and data management, our society is producing digital information at an astonishing rate. According to one estimate, in 2010 alone we will generate 1,200 exabytes—60 million times the content of the Library of Congress. Within this deluge of data lies a wealth

  • 60 communications of the acm | june 2010 | vol. 53 | no. 6

    practice

    chologists, and statisticians have stud-ied how well different encodings facili-tate the comprehension of data types such as numbers, categories, and net-works. For example, graphical percep-tion experiments find that spatial po-sition (as in a scatter plot or bar chart) leads to the most accurate decoding of numerical data and is generally prefer-able to visual variables such as angle, one-dimensional length, two-dimen-sional area, three-dimensional volume, and color saturation. Thus, it should be no surprise that the most common data graphics, including bar charts, line charts, and scatter plots, use posi-tion encodings. Our understanding of graphical perception remains incom-plete, however, and must appropriately be balanced with interaction design and aesthetics.

    This article provides a brief tour through the “visualization zoo,” show-casing techniques for visualizing and interacting with diverse data sets. In many situations, simple data graphics will not only suffice, they may also be preferable. Here we focus on a few of the more sophisticated and unusual techniques that deal with complex data sets. After all, you don’t go to the zoo to see chihuahuas and raccoons; you go to admire the majestic polar bear, the graceful zebra, and the terrifying Suma-tran tiger. Analogously, we cover some of the more exotic (but practically use-ful) forms of visual data representation, starting with one of the most common, time-series data; continuing on to sta-tistical data and maps; and then com-pleting the tour with hierarchies and networks. Along the way, bear in mind that all visualizations share a common “DNA”—a set of mappings between data properties and visual attributes such as position, size, shape, and col-or—and that customized species of vi-sualization might always be construct-ed by varying these encodings.

    Each visualization shown here is accompanied by an online interactive example that can be viewed at the URL displayed beneath it. The live examples were created using Protovis, an open source language for Web-based data visualization. To learn more about how a visualization was made (or to copy and paste it for your own use), see the online version of this article available on the ACM Queue site at http://queue.

    -1.0x

    0.0x

    1.0x

    2.0x

    3.0x

    4.0x

    5.0x

    Gai

    n /

    Los

    s Fa

    ctor

    S&P 500MSFT

    AMZN

    IBM

    AAPL

    GOOG

    Jan 2005

    time-series Data: figure 1a. index chart of selected technology stocks, 2000–2010.

    Source: yahoo! Finance; http://hci.stanford.edu/jheer/files/zoo/ex/time/index-chart.html

    2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010

    Agriculture

    Business services

    Construction

    Education and Health

    FinanceGovernmentInformation

    Leisure and hospitality

    ManufacturingMining and ExtractionOtherSelf-employedTransportation and Utilities

    Wholesale and Retail Trade

    time-series Data: figure 1b. stacked graph of unemployed u.s. workers by industry, 2000–2010.

    Source: u.S. bureau of Labor Statistics; http://hci.stanford.edu/jheer/files/zoo/ex/time/stack.html

    Self-employed Agriculture

    Other Leisure and hospitality

    Education and Health Business services

    Finance Information

    Transportation and Utilities Wholesale and Retail Trade

    Manufacturing Construction

    Mining and Extraction Government

    time-series Data: figure 1c. small multiples of unemployed u.s. workers, normalized by industry, 2000–2010.

    Source: u.S. bureau of Labor Statistics; http://hci.stanford.edu/jheer/files/zoo/ex/time/multiples.html

    time-series Data: figure 1d. horizon graphs of u.s. unemployment rate, 2000–2010.

    Source: u.S. bureau of Labor Statistics; http://hci.stanford.edu/jheer/files/zoo/ex/time/horizon.html

  • practice

    june 2010 | vol. 53 | no. 6 | communications of the acm 61

    acm.org/detail.cfm?id=1780401/. All example source code is released into the public domain and has no restric-tions on reuse or modification. Note, however, that these examples will work only on a modern, standards-compli-ant browser supporting scalable vector graphics (SVG). Supported browsers in-clude recent versions of Firefox, Safari, Chrome, and Opera. Unfortunately, In-ternet Explorer 8 and earlier versions do not support SVG and so cannot be used to view the interactive examples.

    time-series Data Sets of values changing over time—or, time-series data—is one of the most common forms of recorded data. Time-varying phenomena are central to many domains such as finance (stock prices, exchange rates), science (temperatures, pollution levels, electric potentials), and public policy (crime rates). One of-ten needs to compare a large number of time series simultaneously and can choose from a number of visualizations to do so.

    Index Charts. With some forms of time-series data, raw values are less im-portant than relative changes. Consider investors who are more interested in a stock’s growth rate than its specific price. Multiple stocks may have dra-matically different baseline prices but may be meaningfully compared when normalized. An index chart is an inter-active line chart that shows percentage changes for a collection of time-series data based on a selected index point. For example, the image in Figure 1a shows the percentage change of select-ed stock prices if purchased in January 2005: one can see the rocky rise enjoyed by those who invested in Amazon, Ap-ple, or Google at that time.

    Stacked Graphs. Other forms of time-series data may be better seen in aggregate. By stacking area charts on top of each other, we arrive at a visual summation of time-series values—a stacked graph. This type of graph (some-times called a stream graph) depicts aggregate patterns and often supports drill-down into a subset of individual series. The chart in Figure 1b shows the number of unemployed workers in the U.S. over the past decade, subdivided by industry. While such charts have prov-en popular in recent years, they do have some notable limitations. A stacked

    graph does not support negative num-bers and is meaningless for data that should not be summed (temperatures, for example). Moreover, stacking may make it difficult to accurately interpret trends that lie atop other curves. Inter-active search and filtering is often used to compensate for this problem.

    Small Multiples. In lieu of stacking, multiple time series can be plotted within the same axes, as in the index chart. Placing multiple series in the same space may produce overlapping curves that reduce legibility, however. An alternative approach is to use small multiples: showing each series in its own chart. In Figure 1c we again see the number of unemployed workers, but normalized within each industry category. We can now more accurately see both overall trends and seasonal patterns in each sector. While we are considering time-series data, note that small multiples can be constructed for just about any type of visualization: bar charts, pie charts, maps, among others. This often produces a more effective vi-sualization than trying to coerce all the data into a single plot.

    Horizon Graphs. What happens when you want to compare even more time series at once? The horizon graph is a technique for increasing the data density of a time-series view while pre-serving resolution. Consider the five graphs shown in Figure 1d. The first one is a standard area chart, with posi-tive values colored blue and negative values colored red. The second graph “mirrors” negative values into the same region as positive values, doubling the data density of the area chart. The third chart—a horizon graph—doubles the data density yet again by dividing the graph into bands and layering them to create a nested form. The result is a chart that preserves data resolution but uses only a quarter of the space. Al-though the horizon graph takes some time to learn, it has been found to be more effective than the standard plot when the chart sizes get quite small.

    statistical Distributions Other visualizations have been de-signed to reveal how a set of numbers is distributed and thus help an analyst better understand the statistical prop-erties of the data. Analysts often want to fit their data to statistical models, ei-

    ther to test hypotheses or predict future values, but an improper choice of mod-el can lead to faulty predictions. Thus, one important use of visualizations is exploratory data analysis: gaining in-sight into how data is distributed to inform data transformation and mod-eling decisions. Common techniques include the histogram, which shows the prevalence of values grouped into bins, and the box-and-whisker plot, which can convey statistical features such as the mean, median, quartile boundaries, or extreme outliers. In addition, a number of other techniques exist for assessing a distribution and examining interac-tions between multiple dimensions.

    Stem-and-Leaf Plots. For assessing a collection of numbers, one alternative to the histogram is the stem-and-leaf plot. It typically bins numbers accord-ing to the first significant digit, and then stacks the values within each bin by the second significant digit. This minimal-istic representation uses the data itself to paint a frequency distribution, re-placing the “information-empty” bars of a traditional histogram bar chart and allowing one to assess both the overall distribution and the contents of each bin. In Figure 2a, the stem-and-leaf plot shows the distribution of completion rates of workers completing crowd-sourced tasks on Amazon’s Mechani-cal Turk. Note the multiple clusters: one group clusters around high levels of completion (99%–100%); at the oth-er extreme is a cluster of Turkers who complete only a few tasks (~10%) in a group.

    Q-Q Plots. Though the histogram and the stem-and-leaf plot are common tools for assessing a frequency distribu-tion, the Q-Q (quantile-quantile) plot is a more powerful tool. The Q-Q plot com-pares two probability distributions by graphing their quantiles against each other. If the two are similar, the plotted values will lie roughly along the central diagonal. If the two are linearly related, values will again lie along a line, though with varying slope and intercept.

    Figure 2b shows the same Mechani-cal Turk participation data compared with three statistical distributions. Note how the data forms three distinct components when compared with uni-form and normal (Gaussian) distribu-tions: this suggests that a statistical model with three components might

  • 62 communications of the acm | june 2010 | vol. 53 | no. 6

    practice

    be more appropriate, and indeed we see in the final plot that a fitted mixture of three normal distributions provides a better fit. Though powerful, the Q-Q plot has one obvious limitation in that its effective use requires that viewers possess some statistical knowledge.

    SPLOM (Scatter Plot Matrix). Other visualization techniques attempt to represent the relationships among multiple variables. Multivariate data occurs frequently and is notoriously hard to represent, in part because of the difficulty of mentally picturing data in more than three dimensions. One technique to overcome this problem is to use small multiples of scatter plots showing a set of pairwise relations among variables, thus creating the SP-LOM (scatter plot matrix). A SPLOM en-ables visual inspection of correlations between any pair of variables.

    In Figure 2c a scatter plot matrix is used to visualize the attributes of a da-tabase of automobiles, showing the re-lationships among horsepower, weight, acceleration, and displacement. Addi-tionally, interaction techniques such as brushing-and-linking—in which a selection of points on one graph high-lights the same points on all the other graphs—can be used to explore pat-terns within the data.

    Parallel Coordinates. As shown in Figure 2d, parallel coordinates (||-co-ord) take a different approach to visu-alizing multivariate data. Instead of graphing every pair of variables in two dimensions, we repeatedly plot the data on parallel axes and then connect the corresponding points with lines. Each poly-line represents a single row in the database, and line crossings between dimensions often indicate inverse cor-relation. Reordering dimensions can aid pattern-finding, as can interactive querying to filter along one or more di-mensions. Another advantage of paral-lel coordinates is that they are relatively compact, so many variables can be shown simultaneously.

    mapsAlthough a map may seem a natural way to visualize geographical data, it has a long and rich history of design. Many maps are based upon a carto-graphic projection: a mathematical function that maps the 3D geometry of the Earth to a 2D image. Other maps

    0 1 1 1 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 5 6 7 8 8 8 8 8 8 9

    1 0 0 0 0 1 1 1 1 2 2 3 3 3 3 4 4 4 4 5 5 6 7 7 8 9 9 9 9 9

    2 0 0 1 1 1 5 7 8 9

    3 0 0 1 2 3 3 3 4 6 6 8 8

    4 0 0 1 1 1 1 3 3 4 5 5 5 6 7 8 9

    5 0 2 3 5 6 7 7 7 9

    6 1 2 6 7 8 9 9 9

    7 0 0 0 1 6 7 9

    8 0 0 1 2 3 4 4 4 4 4 4 4 5 6 7 7 7 9

    9 1 3 3 5 7 8 8 8 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9

    10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

    statistical Distributions: figure 2a. stem-and-leaf plot of mechanical turk participation rates.

    Source: Stanford Visualization group; http://hci.stanford.edu/jheer/files/zoo/ex/stats/stem-and-leaf.html

    Uniform Distribution

    0%

    50%

    100%

    0% 50% 100%

    Gaussian Distribution0% 50% 100%

    Fitted Mixture of 3 Gaussians0% 50% 100%T

    urke

    r T

    ask

    Gro

    up C

    ompl

    etio

    n %

    statistical Distributions: figure 2b. Q-Q plots of mechanical turk participation rates.

    Source: Stanford Visualization group; http://hci.stanford.edu/jheer/files/zoo/ex/stats/qqplot.html

    cylinders displacement weight horsepower acceleration mpg year

    3 68 cubic inch 1613 lbs 46 hp 8 (0 to 60mph) 9 miles/gallon 70

    8 455 cubic inch 5140 lbs 230 hp 25 (0 to 60mph) 47 miles/gallon 82

    statistical Distributions: figure 2d. Parallel coordinates of automobile data.

    Source: ggobi; http://hci.stanford.edu/jheer/files/zoo/ex/stats/parallel.html

    horsepower

    20003000

    40005000 10 15 20

    100200

    300400

    50

    100

    150

    200

    2000

    3000

    4000

    5000

    weight

    2000

    3000

    4000

    5000

    10

    15

    20

    acceleration

    10

    15

    20

    50 100150

    200

    100

    200

    300

    400

    20003000

    40005000

    10 15 20

    displacement

    United States European Union Japan

    statistical Distributions: figure 2c. scatter plot matrix of automobile data.

    Source: ggobi; http://hci.stanford.edu/jheer/files/zoo/ex/stats/splom.html

  • practice

    june 2010 | vol. 53 | no. 6 | communications of the acm 63

    knowingly distort or abstract geo-graphic features to tell a richer story or highlight specific data.

    Flow Maps. By placing stroked lines on top of a geographic map, a flow map can depict the movement of a quantity in space and (implicitly) in time. Flow lines typically encode a large amount of multivariate information: path points, direction, line thickness, and color can all be used to present dimensions of information to the viewer. Figure 3a is a modern interpretation of Charles Mi-nard’s depiction of Napoleon’s ill-fated march on Moscow. Many of the greatest flow maps also involve subtle uses of distortion, as geography is bended to accommodate or highlight flows.

    Choropleth Maps. Data is often col-lected and aggregated by geographi-cal areas such as states. A standard approach to communicating this data is to use a color encoding of the geo-graphic area, resulting in a choropleth map. Figure 3b uses a color encoding to communicate the prevalence of obe-sity in each state in the U.S. Though this is a widely used visualization tech-nique, it requires some care. One com-mon error is to encode raw data values (such as population) rather than using normalized values to produce a densi-ty map. Another issue is that one’s per-ception of the shaded value can also be affected by the underlying area of the geographic region.

    Graduated Symbol Maps. An alterna-tive to the choropleth map, the gradu-ated symbol map places symbols over an underlying map. This approach avoids confounding geographic area with data values and allows for more dimensions to be visualized (for example, symbol size, shape, and color). In addition to simple shapes such as circles, gradu-ated symbol maps may use more com-plicated glyphs such as pie charts. In Figure 3c, total circle size represents a state’s population, and each slice indi-cates the proportion of people with a specific BMI rating.

    Cartograms. A cartogram distorts the shape of geographic regions so that the area directly encodes a data variable. A common example is to redraw every country in the world sizing it propor-tionally to population or gross domes-tic product. Many types of cartograms have been created; in Figure 3d we use the Dorling cartogram, which represents

    0°-10°-20°-30°

    18 Oct24 Oct09 Nov

    14 Nov24 Nov

    28 Nov01 Dec06 Dec07 Dec

    maps: figure 3a. flow map of napoleon’s march on moscow, based on the work of charles minard.

    http://hci.stanford.edu/jheer/files/zoo/ex/maps/napoleon.html

    WY

    WI

    WV

    WA

    OR

    NV CO

    SD

    NDMT

    ID

    UT

    NMAZ

    CA

    NE

    KS

    OK

    TX LA

    MS AL GA

    ARTN

    SC

    NC

    FL

    MN

    IA

    MO

    IL

    KY

    IN

    MI

    OH

    VA

    DEMD

    PANJ

    NY

    CT RIMA

    NH

    ME

    VT

    14 - 17%

    17 - 20%

    20 - 23%

    23 - 26%

    26 - 29%

    29 - 32%

    32 - 35%

    maps: figure 3b. choropleth map of obesity in the u.s., 2008.

    Source: national Center for Chronic Disease Prevention and Health Promotion; http://hci.stanford.edu/jheer/files/zoo/ex/maps/choropleth.html

    Obese

    Overweight

    Normal

    maps: figure 3c. Graduated symbol map of obesity in the u.s., 2008.

    Source: national Center for Chronic Disease Prevention and Health Promotion; http://hci.stanford.edu/jheer/files/zoo/ex/maps/symbol.html

    WY

    WI

    WV

    WA

    OR

    NVCO

    SD

    NDMT

    ID

    UT

    NMAZCA

    NE

    KS

    OK

    TX

    LA

    MS

    AL

    GA

    ARTN

    SC

    NC

    FL

    MN

    IA

    MO

    IL

    KY

    IN

    MI

    OH

    VA

    DEMD

    PA

    NJ

    NY

    CT

    RI

    MA

    NH

    ME

    VT

    10M

    1M

    5M

    100K14 - 17%17 - 20%20 - 23%23 - 26%26 - 29%29 - 32%32 - 35%

    maps: figure 3d. Dorling cartogram of obesity in the u.s., 2008.

    Source: national Center for Chronic Disease Prevention and Health Promotion; http://hci.stanford.edu/jheer/files/zoo/ex/maps/cartogram.html

  • 64 communications of the acm | june 2010 | vol. 53 | no. 6

    practice

    each geographic region with a sized circle, placed so as to resemble the true geographic configuration. In this ex-ample, circular area encodes the total number of obese people per state, and color encodes the percentage of the to-tal population that is obese.

    hierarchiesWhile some data is simply a flat collec-tion of numbers, most can be organized into natural hierarchies. Consider: spa-tial entities, such as counties, states, and countries; command structures for businesses and governments; soft-ware packages and phylogenetic trees. Even for data with no apparent hierar-chy, statistical methods (for example, k-means clustering) may be applied to organize data empirically. Special visu-alization techniques exist to leverage hierarchical structure, allowing rapid multiscale inferences: micro-observa-tions of individual elements and mac-ro-observations of large groups.

    Node-link diagrams. The word tree is used interchangeably with hierarchy, as the fractal branches of an oak might mirror the nesting of data. If we take a two-dimensional blueprint of a tree, we have a popular choice for visualizing hierarchies: a node-link diagram. Many different tree-layout algorithms have been designed; the Reingold-Tilford al-gorithm, used in Figure 4a on a package hierarchy of software classes, produces a tidy result with minimal wasted space.

    An alternative visualization scheme is the dendrogram (or cluster) algorithm, which places leaf nodes of the tree at the same level. Thus, in the diagram in Fig-ure 4b, the classes (orange leaf nodes) are on the diameter of the circle, with the packages (blue internal nodes) in-side. Using polar rather than Cartesian coordinates has a pleasing aesthetic, while using space more efficiently.

    We would be remiss to overlook the indented tree, used ubiquitously by operating systems to represent file directories, among other applications (see Figure 4c). Although the indented tree requires excessive vertical space and does not facilitate multiscale infer-ences, it does allow efficient interactive exploration of the tree to find a specific node. In addition, it allows rapid scan-ning of node labels, and multivariate data such as file size can be displayed adjacent to the hierarchy.

    flare

    analytics

    cluster

    Agglom

    erativeCluster

    Com

    munityS

    tructureH

    ierarchicalCluster

    MergeE

    dge

    graph

    Betw

    eennessCentrality

    LinkDistance

    MaxFlow

    MinC

    utS

    hortestPaths

    SpanningT

    ree

    optimization

    AspectR

    atioBanker

    animate

    Easing

    FunctionSequence

    ISchedulable

    Parallel

    Pause

    Scheduler

    Sequence

    Transition

    TransitionE

    ventT

    ransitionerT

    ween

    interpolate

    ArrayInterpolator

    ColorInterpolator

    DateInterpolator

    InterpolatorM

    atrixInterpolatorN

    umberInterpolator

    ObjectInterpolator

    PointInterpolator

    RectangleInterpolator

    data

    DataField

    DataS

    chema

    DataS

    etD

    ataSource

    DataT

    ableD

    ataUtil

    converters

    Converters

    Delim

    itedTextC

    onverterG

    raphMLC

    onverterID

    ataConverter

    JS

    ON

    Converter

    display

    DirtyS

    priteLineS

    priteR

    ectSprite

    TextS

    prite

    flex

    FlareVis

    physics

    DragForce

    GravityForce

    IForceN

    BodyForce

    Particle

    Sim

    ulationS

    pringS

    pringForce

    query

    AggregateE

    xpressionA

    ndA

    rithmetic

    Average

    BinaryE

    xpressionC

    omparison

    Com

    positeExpression

    Count

    DateU

    tilD

    istinctE

    xpressionE

    xpressionIteratorFn If IsALiteralM

    atchM

    aximum

    Minim

    umN

    otO

    rQ

    ueryR

    angeS

    tringUtil

    Sum

    Variable

    Variance

    Xorm

    ethods

    _ addandaveragecountdistinctdiveq fn gt gteiff isalt ltem

    axm

    inm

    odm

    ulneqnotor orderbyrangeselectstddevsubsumupdatevariancew

    herexor

    scale

    IScaleM

    apLinearS

    caleLogS

    caleO

    rdinalScale

    QuantileS

    caleQ

    uantitativeScale

    RootS

    caleS

    caleS

    caleType

    Tim

    eScale

    util

    Arrays

    Colors

    Dates

    Displays

    FilterG

    eometry

    IEvaluable

    IPredicate

    IValueP

    roxyM

    athsO

    rientationP

    ropertyS

    hapesS

    ortS

    tatsS

    tringsheap

    FibonacciHeap

    HeapN

    ode

    math

    DenseM

    atrixIM

    atrixS

    parseMatrix

    palette

    ColorP

    aletteP

    aletteS

    hapePalette

    SizeP

    alette

    vis

    Visualization

    axis

    Axes

    Axis

    AxisG

    ridLineA

    xisLabelC

    artesianAxes

    controls

    AnchorC

    ontrolC

    lickControl

    Control

    ControlList

    DragC

    ontrolE

    xpandControl

    HoverC

    ontrolIC

    ontrolP

    anZoomC

    ontrolS

    electionControl

    TooltipC

    ontrol

    data

    Data

    DataList

    DataS

    priteE

    dgeSprite

    NodeS

    priteS

    caleBinding

    Tree

    TreeB

    uilderrender

    Arrow

    Type

    EdgeR

    endererIR

    endererS

    hapeRenderer

    events

    DataE

    ventS

    electionEvent

    TooltipE

    ventV

    isualizationEvent

    legendLegendLegendItemLegendR

    ange

    operator

    IOperator

    Operator

    OperatorList

    OperatorS

    equenceO

    peratorSw

    itchS

    ortOperator

    distortionB

    ifocalDistortion

    Distortion

    FisheyeDistortion

    encoder

    ColorE

    ncoderE

    ncoderP

    ropertyEncoder

    ShapeE

    ncoderS

    izeEncoder

    filterFisheyeT

    reeFilterG

    raphDistanceFilter

    VisibilityFilter

    labelLabelerR

    adialLabelerS

    tackedAreaLabeler

    layout

    AxisLayout

    BundledE

    dgeRouter

    CircleLayout

    CircleP

    ackingLayoutD

    endrogramLayout

    ForceDirectedLayout

    IcicleTreeLayout

    IndentedTreeLayout

    LayoutN

    odeLinkTreeLayout

    PieLayout

    RadialT

    reeLayoutR

    andomLayout

    StackedA

    reaLayoutT

    reeMapLayout

    hierarchies: figure 4a. Radial node-link diagram of the flare package hierarchy.

    http://hci.stanford.edu/jheer/files/zoo/ex/hierarchies/tree.html

    933KB47KB14KB25KB

    6KB97KB29KB23KB

    4KB29KB

    1KB1KB0KB

    10KB2KB9KB2KB1KB

    87KB30KB

    161KB422KB

    16KB33KB

    flareanalytics

    clustergraphoptimization

    animatedatadisplayflexphysics

    DragForceGravityForceIForceNBodyForceParticleSimulationSpringSpringForce

    queryscaleutilvis

    Visualizationaxis

    43KB107KB

    6KB35KB

    179KB1KB2KB5KB4KB2KB1KB

    13KB14KB11KB16KB

    105KB6KB3KB9KB

    11KB4KB8KB4KB3KB

    controlsdataeventslegendoperator

    IOperatorOperatorOperatorListOperatorSequenceOperatorSwitchSortOperatordistortionencoderfilterlabellayout

    AxisLayoutBundledEdgeRouterCircleLayoutCirclePackingLayoutDendrogramLayoutForceDirectedLayoutIcicleTreeLayoutIndentedTreeLayout

    hierarchies: figure 4c. indented tree layout of the flare package hierarchy.

    http://hci.stanford.edu/jheer/files/zoo/ex/hierarchies/indent.html

    flare

    anal

    ytic

    s

    clus

    ter

    Agg

    lom

    erat

    iveC

    lust

    erC

    omm

    unity

    Str

    uctu

    reH

    iera

    rchi

    calC

    lust

    erM

    erge

    Edg

    egr

    aph

    Bet

    wee

    nnes

    sCen

    tral

    ityLi

    nkD

    ista

    nce

    Max

    Flow

    Min

    Cut

    Sho

    rtes

    tPat

    hsS

    pann

    ingT

    ree

    optim

    izat

    ion

    Asp

    ectR

    atio

    Ban

    ker

    anim

    ate

    Easi

    ngFu

    nctio

    nSeq

    uenc

    e

    ISch

    edul

    able

    Para

    llel

    Paus

    eSc

    hedu

    ler

    Sequ

    ence

    Tran

    sitio

    nTr

    ansi

    tionE

    vent

    Tran

    sitio

    ner

    Twee

    n

    inter

    pola

    te

    Arra

    yInt

    erpo

    lato

    r

    Colo

    rInt

    erpo

    lato

    r

    Date

    Inte

    rpol

    ator

    Inte

    rpol

    ator

    Mat

    rixIn

    terp

    olat

    or

    Num

    berI

    nter

    polat

    or

    Objec

    tInt

    erpo

    lator

    Point

    Inte

    rpola

    tor

    Recta

    ngleI

    nter

    polat

    or

    data

    DataF

    ield

    DataS

    chem

    a

    DataS

    et

    DataS

    ource

    DataT

    able

    DataU

    til

    conver

    ters

    Conver

    ters

    Delimi

    tedTex

    tConve

    rter

    GraphM

    LConve

    rter

    IDataC

    onverte

    r

    JSONCo

    nverter

    display

    DirtySpri

    te

    LineSprite

    RectSprite

    TextSprite

    flexFlareV

    is

    physics

    DragForce

    GravityForce

    IForceNBodyForceParticleSimulationSpringSpringForce

    query

    AggregateExpressionAndArithmeticAverageBinaryExpressionComparisonCompositeExpression

    CountDateUtilDistinctExpressionExpressionIterator

    FnIfIsALiteralMatchMaximumMinimum

    NotOrQueryRangeStringUtil

    SumVariableVariance

    Xor

    methods _addandaverage

    count

    distinct

    diveqfngtgteiffisaltltemaxm

    inm

    odm

    ulneqnotororderbyrangeselectstddevsubsumupdatevariancew

    here

    xor

    scal

    e

    ISca

    leM

    apLi

    near

    Sca

    leLo

    gSca

    leO

    rdin

    alS

    cale

    Qua

    ntile

    Sca

    le

    Qua

    ntita

    tiveS

    cale

    Roo

    tSca

    leS

    cale

    Sca

    leT

    ype

    Tim

    eSca

    le

    util

    Arr

    ays

    Col

    ors

    Dat

    es

    Dis

    play

    sFi

    lter

    Geo

    met

    ry

    IEva

    luab

    le

    IPre

    dica

    te

    IVal

    uePr

    oxy

    Mat

    hs

    Orie

    ntat

    ion

    Prop

    erty

    Shap

    esSort

    Stat

    s

    Strin

    gs

    heap

    Fibon

    acciH

    eap

    Heap

    Node

    math

    Dens

    eMat

    rixIMatr

    ix

    Spars

    eMatr

    ix

    palet

    te

    Color

    Palet

    tePa

    lette

    Shap

    ePale

    tte

    SizeP

    alette

    vis

    Visual

    ization

    axis

    AxesAx

    isAxis

    GridLineAxis

    Label

    Cartesia

    nAxes

    controls

    AnchorCo

    ntrolClickCont

    rolControlContr

    olListDragContro

    lExpandControlHo

    verControlICo

    ntrolPanZoomControlSe

    lectionControlTooltipControl

    data

    DataDataList

    DataSpriteEdgeSpriteNodeSprite

    ScaleBindingTree

    TreeBuilderrender

    ArrowType

    EdgeRenderer

    IRenderer

    ShapeRenderer events

    DataEvent

    SelectionEvent

    TooltipEvent

    VisualizationEventlegend

    Legend

    LegendItem

    LegendRange

    operator

    IOperator

    Operator

    OperatorList

    OperatorSequence

    OperatorSwitch

    SortOperator

    distortion

    BifocalDistortion

    Distortion

    FisheyeDistortion

    encoder

    ColorEncoderEncoder

    PropertyEncoder

    ShapeEncoder

    SizeEncoder

    filter

    FisheyeTreeFilter

    GraphDistanceFilter

    VisibilityFilter

    label

    Labeler

    RadialLabeler

    StackedAreaLabeler

    layout

    AxisLayout

    BundledEdgeR

    outerC

    ircleLayout

    CircleP

    ackingLayoutD

    endrogramLayout

    ForceDirectedLayout

    IcicleTreeLayout

    IndentedTreeLayout

    LayoutN

    odeLinkTreeLayoutP

    ieLayoutR

    adialTreeLayout

    Random

    LayoutS

    tackedAreaLayout

    TreeM

    apLayout

    hierarchies: figure 4b. cartesian node-link diagram of the flare package hierarchy.

    http://hci.stanford.edu/jheer/files/zoo/ex/hierarchies/cluster-radial.html

  • practice

    june 2010 | vol. 53 | no. 6 | communications of the acm 65

    Adjacency Diagrams. The adjacency diagram is a space-filling variant of the node-link diagram; rather than draw-ing a link between parent and child in the hierarchy, nodes are drawn as solid areas (either arcs or bars), and their placement relative to adjacent nodes reveals their position in the hierarchy. The icicle layout in Figure 4d is similar to the first node-link diagram in that the root node appears at the top, with child nodes underneath. Because the nodes are now space-filling, however, we can use a length encoding for the size of software classes and packages. This reveals an additional dimension that would be difficult to show in a node-link diagram.

    The sunburst layout, shown in Fig-ure 4e, is equivalent to the icicle lay-out, but in polar coordinates. Both are implemented using a partition layout, which can also generate a node-link diagram. Similarly, the previous cluster layout can be used to generate a space-filling adjacency diagram in either Car-tesian or polar coordinates.

    Enclosure Diagrams. The enclosure diagram is also space filling, using containment rather than adjacency to represent the hierarchy. Introduced by Ben Shneiderman in 1991, a treemap recursively subdivides area into rect-angles. As with adjacency diagrams, the size of any node in the tree is quickly revealed. The example shown in Figure 4f uses padding (in blue) to emphasize enclosure; an alternative saturation encoding is sometimes used. Squarified treemaps use approxi-mately square rectangles, which offer better readability and size estimation than a naive “slice-and-dice” subdivi-sion. Fancier algorithms such as Vo-ronoi and jigsaw treemaps also exist but are less common.

    By packing circles instead of sub-dividing rectangles, we can produce a different sort of enclosure diagram that has an almost organic appear-ance. Although it does not use space as efficiently as a treemap, the “wast-ed space” of the circle-packing layout, shown in Figure 4g, effectively reveals the hierarchy. At the same time, node sizes can be rapidly compared using area judgments.

    NetworksIn addition to organization, one aspect

    flare

    analytics

    cluster

    graph

    optimization

    animate

    interpolate

    data

    converters

    display

    flex

    physics

    query

    methods

    scale

    util heap

    math

    palette

    vis

    axis

    controls data

    render

    events

    legend

    operatordistortion

    encoder

    filterlabel

    layout

    hierarchies: figure 4g. nested circles layout of the flare package hierarchy.

    http://hci.stanford.edu/jheer/files/zoo/ex/hierarchies/pack.html Source: The Flare Toolkit http://flare.prefuse.org

    flar

    e

    anal

    ytic

    s

    clus

    ter

    grap

    h

    anim

    ate

    Eas

    ing

    Tra

    nsiti

    on

    Tra

    nsiti

    oner

    inte

    rpol

    ate

    Inte

    rpol

    ator

    data

    conv

    erte

    rsG

    raph

    MLC

    onve

    rter

    disp

    lay

    Dir

    tyS

    prite

    Tex

    tSpr

    ite

    phys

    ics

    NB

    odyF

    orce

    Sim

    ulat

    ion

    quer

    y

    Que

    ry

    met

    hods

    scal

    e

    util

    Arr

    ays

    Col

    ors

    Dat

    es

    Dis

    play

    s

    Geo

    met

    ry

    Mat

    hs

    Sha

    pes

    Str

    ings

    heap

    Fibo

    nacc

    iHea

    p

    mat

    h

    pale

    tte

    vis

    Vis

    ualiz

    atio

    n

    axis

    Axi

    s

    cont

    rols

    Too

    ltip

    Con

    trol

    data

    Dat

    a

    Dat

    aLis

    t

    Dat

    aSpr

    ite

    Nod

    eSpr

    ite

    Sca

    leB

    indi

    ng

    Tre

    eBui

    lder

    rend

    er

    lege

    ndLe

    gend

    Lege

    ndR

    ange

    oper

    ator

    dist

    ortio

    n

    enco

    der

    filte

    r

    labe

    lLa

    bele

    r

    layo

    ut

    Cir

    cleL

    ayou

    t

    Cir

    cleP

    acki

    ngLa

    yout

    Forc

    eDir

    ecte

    dLay

    out

    Nod

    eLin

    kTre

    eLay

    out

    Rad

    ialT

    reeL

    ayou

    t

    Sta

    cked

    Are

    aLay

    out

    Tre

    eMap

    Layo

    ut

    hierarchies: figure 4d. icicle tree layout of the flare package hierarchy.

    http://hci.stanford.edu/jheer/files/zoo/ex/hierarchies/icicle.html

    flare

    anal

    ytic

    scl

    uste

    r

    grap

    hM

    axFl

    owM

    inC

    utan

    imat

    eEa

    sing

    Tran

    sitio

    nTr

    ansit

    ioner

    interp

    olate

    Inter

    polat

    or

    data

    conver

    ters

    GraphM

    LConve

    rter

    display

    DirtySpri

    te

    TextSprite

    physics

    NBodyForce

    Simulation

    query

    Query

    methods

    scale

    util

    Arrays

    Colors

    Dates

    Displays

    Geom

    etry

    Maths

    Shapes

    Str

    ings

    heap

    Fibo

    nacc

    iHea

    pm

    ath

    pale

    tte

    vis

    Visu

    aliz

    atio

    n

    axis

    Axi

    sco

    ntro

    ls

    Selec

    tionC

    ontro

    l

    Toolt

    ipCon

    trol

    data

    Data

    DataLis

    tData

    Sprite

    NodeSprite

    ScaleBinding

    TreeBuilder

    render

    legend

    Legend

    LegendRange

    operator

    distortion

    encoder

    filter

    label

    Labeler

    layout

    CircleLayoutCirclePackingLayout

    ForceDirectedLayout

    LayoutN

    odeLinkTreeLayout

    RadialT

    reeLayout

    StackedA

    reaLayout

    TreeM

    apLayout

    hierarchies: figure 4e. sunburst (radial space-filling) layout of the flare package hierarchy.

    http://hci.stanford.edu/jheer/files/zoo/ex/hierarchies/sunburst.html

    flare

    analyticstics

    clusteromerativemunityStru

    HierarchicalCluste

    MergeEdgeanalytgraph

    weennessCenLinkDistanc

    MaxFlowMinChortestPat

    SpanningTree optimizationAspectRatioBanker

    animateEasing

    FunctionSequence chedul

    Parallel

    Pause

    Scheduler

    Sequence

    Transition

    sition

    Transitioner

    Tween

    interpolateyInterp

    lorInterpoInter

    Interpolator

    trixInterpo

    berInter

    ectInterp

    tInter

    angleInter

    data

    DataFie

    DataSchema

    DataSe

    DataSourceataTa

    DataUtil

    data

    converters

    nver

    itedTextCo

    aphMLConve

    ataConv

    SONConver display

    DirtySprite

    LineSprite

    RectSprit

    TextSprite

    flexFlareVis

    physics

    ragFor

    avityFor

    Fo

    NBodyForce

    Particle

    Simulation

    Spring

    SpringForc

    query

    egateExpr

    And

    Arithmetic

    Average

    naryExpress

    Comparison

    mpositeExpre

    Coun

    DateUtil

    Distinc

    Expression

    pressionIter

    Fn

    If IsA

    Literal

    Match

    aximnim

    Not

    Or

    Query

    Range

    StringUtil

    Sum

    VariablVariance

    Xor

    methods

    _

    add

    anderacoun

    stin

    div eq

    fn

    gt

    gteiff

    isa

    ltlte

    maxmin

    modmul

    neq not

    orrder

    range

    elec

    tddesub

    sum

    pdarian

    wher

    xor

    scale

    IScaleMap

    inearSca

    LogScale

    OrdinalScale

    uantileSc

    uantitativeSca

    RootScal

    Scale

    ScaleType

    TimeScale

    flare

    utilArrays

    Colors

    Dates

    Displays

    Filter

    Geometry

    valuredialuePr

    Maths

    ientat

    Property

    Shapes

    Sort

    Stats

    Strings

    heapFibonacciHeap pN

    mathDenseMatrix

    IMatrSparseMatri

    flare

    palette

    ColorPalette

    Palette

    hapePalezePale

    vis

    Visualization

    axis

    Axes

    Axis

    GridisLaCartesianAxes

    viscontrols

    nchorCont

    ClickContr

    Contro

    ControlList

    DragControl

    ExpandControHoverContro

    Cont

    anZoomCont

    SelectionControl

    TooltipControl

    data

    Data DataList

    DataSprite

    dgeSpr

    NodeSprite ScaleBinding

    TreeTreeBuilder

    renderrowT

    EdgeRendereend

    peRend

    eventsDataEven

    lectionEv

    oltipEv

    lizatio

    legend

    Legend

    egendIteLegendRange

    operator

    Operat

    Operator

    OperatorList

    OperatorSequenc

    eratorSw

    SortOperat

    distortionifocalDistort

    DistortionsheyeDistort

    encoder

    lorEnco

    EncoderpertyEnc

    apeEnco

    izeEncod

    filterheyeTreeFaphDistanceF

    VisibilityFilte

    label

    Labeler

    RadialLabelkedAreaL

    operatorlayout

    AxisLayout ledEdgeR

    CircleLayout

    CirclePackingLayo

    ndrogramLa

    ForceDirectedLayou

    cicleTreeLay

    dentedTreeLa

    Layout

    NodeLinkTreeLayou

    PieLayout

    RadialTreeLayout

    om

    StackedAreaLayout

    TreeMapLayou

    hierarchies: figure 4f. treemap layout of the flare package hierarchy.

    http://hci.stanford.edu/jheer/files/zoo/ex/hierarchies/treemap.html

  • 66 communications of the acm | june 2010 | vol. 53 | no. 6

    practice

    of data that we may wish to explore through visualization is relationship. For example, given a social network, who is friends with whom? Who are the central players? What cliques ex-ist? Who, if anyone, serves as a bridge between disparate groups? Abstractly, a hierarchy is a specialized form of net-work: each node has exactly one link to its parent, while the root node has no links. Thus node-link diagrams are also used to visualize networks, but the loss of hierarchy means a different al-gorithm is required to position nodes.

    Mathematicians use the formal term graph to describe a network. A central challenge in graph visualiza-tion is computing an effective layout. Layout techniques typically seek to po-sition closely related nodes (in terms of graph distance, such as the number of links between nodes, or other met-rics) close in the drawing; critically, unrelated nodes must also be placed far enough apart to differentiate rela-tionships. Some techniques may seek to optimize other visual features—for example, by minimizing the number of edge crossings.

    Force-directed Layouts. A common and intuitive approach to network lay-out is to model the graph as a physical system: nodes are charged particles that repel each other, and links are damp-ened springs that pull related nodes together. A physical simulation of these forces then determines the node posi-tions; approximation techniques that avoid computing all pairwise forces enable the layout of large numbers of nodes. In addition, interactivity allows the user to direct the layout and jiggle nodes to disambiguate links. Such a force-directed layout is a good starting point for understanding the structure of a general undirected graph. In Figure 5a we use a force-directed layout to view the network of character co-occurrence in the chapters of Victor Hugo’s classic novel, Les Misérables. Node colors de-pict cluster memberships computed by a community-detection algorithm.

    Arc Diagrams. An arc diagram, shown in Figure 5b, uses a one-dimen-sional layout of nodes, with circular arcs to represent links. Though an arc diagram may not convey the overall structure of the graph as effectively as a two-dimensional layout, with a good ordering of nodes it is easy to identify

    networks: figure 5a. force-directed layout of Les Misérables character co-occurrences.

    http://hci.stanford.edu/jheer/fi les/zoo/ex/networks/force.html

    Myr

    iel

    Nap

    oleo

    nM

    lle. B

    aptis

    tine

    Mm

    e. M

    aglo

    ireC

    ount

    ess

    de L

    oG

    ebor

    and

    Cha

    mpt

    erci

    erC

    rava

    tteC

    ount

    Old

    Man

    Laba

    rreVa

    ljean

    Mar

    guer

    iteM

    me.

    de

    RIs

    abea

    uG

    erva

    isTh

    olom

    yes

    List

    olie

    rFa

    meu

    ilBl

    ache

    ville

    Favo

    urite

    Dah

    liaZe

    phin

    eFa

    ntin

    eM

    me.

    The

    nard

    ier

    Then

    ardi

    erC

    oset

    teJa

    vert

    Fauc

    hele

    vent

    Bam

    atab

    ois

    Perp

    etue

    Sim

    plic

    eSc

    auffl

    aire

    Wom

    an 1

    Judg

    eC

    ham

    pmat

    hieu

    Brev

    etC

    heni

    ldie

    uC

    oche

    paille

    Pont

    mer

    cyBo

    ulat

    ruel

    leEp

    onin

    eAn

    zelm

    aW

    oman

    2M

    othe

    r In

    noce

    ntG

    ribie

    rJo

    ndre

    tteM

    me.

    Bur

    gon

    Gav

    roch

    eG

    illeno

    rman

    dM

    agno

    nM

    lle. G

    illeno

    rman

    dM

    me.

    Pon

    tmer

    cyM

    lle. V

    aubo

    isLt

    . Gille

    norm

    and

    Mar

    ius

    Baro

    ness

    TM

    abeu

    fEn

    jolra

    sC

    ombe

    ferre

    Prou

    vaire

    Feui

    llyC

    ourfe

    yrac

    Baho

    rel

    Boss

    uet

    Joly

    Gra

    ntai

    reM

    othe

    r Pl

    utar

    chG

    ueul

    emer

    Babe

    tC

    laqu

    esou

    sM

    ontp

    arna

    sse

    Tous

    sain

    tC

    hild

    1C

    hild

    2Br

    ujon

    Mm

    e. H

    uche

    loup

    networks: figure 5b. arc diagram of Les Misérables character co-occurrences.

    http://hci.stanford.edu/jheer/fi les/zoo/ex/networks/arc.html

    Child 1

    Chi

    ld 1

    Child 2

    Chi

    ld 2

    Mother Plutarch

    Mot

    her

    Plu

    tarc

    h

    Gavroche

    Gav

    roch

    e

    Marius

    Mar

    ius

    Mabeuf

    Mab

    euf

    Enjolras

    Enj

    olra

    s

    Combeferre

    Com

    befe

    rre

    Prouvaire

    Pro

    uvai

    re

    Feuilly

    Feu

    illy

    Courfeyrac

    Cou

    rfey

    rac

    Bahorel

    Bah

    orel

    Bossuet

    Bos

    suet

    Joly

    Joly

    Grantaire

    Gra

    ntai

    re

    Mme. Hucheloup

    Mm

    e. H

    uche

    loup

    Jondrette

    Jond

    rette

    Mme. Burgon

    Mm

    e. B

    urgo

    n

    Boulatruelle

    Bou

    latr

    uelle

    Cosette

    Cos

    ette

    Woman 2

    Wom

    an 2

    Gillenormand

    Gill

    enor

    man

    d

    Magnon

    Mag

    non

    Mlle. Gillenormand

    Mlle

    . G

    illen

    orm

    and

    Mme. Pontmercy

    Mm

    e. P

    ontm

    ercy

    Mlle. Vaubois

    Mlle

    . V

    aubo

    is

    Lt. Gillenormand

    Lt.

    Gill

    enor

    man

    d

    Baroness T

    Bar

    ones

    s T

    Toussaint

    Tou

    ssai

    nt

    Mme. Thenardier

    Mm

    e. T

    hena

    rdie

    r

    Thenardier

    The

    nard

    ier

    Javert

    Jave

    rt

    Pontmercy

    Pon

    tmer

    cy

    Eponine

    Epo

    nine

    Anzelma

    Anz

    elm

    a

    Gueulemer

    Gue

    ulem

    er

    Babet

    Bab

    et

    Claquesous

    Cla

    ques

    ous

    Montparnasse

    Mon

    tpar

    nass

    e

    Brujon

    Bru

    jon

    Marguerite

    Mar

    guer

    ite

    Tholomyes

    Tho

    lom

    yes

    Listolier

    List

    olie

    r

    Fameuil

    Fam

    euil

    Blacheville

    Bla

    chev

    ille

    Favourite

    Fav

    ourit

    e

    Dahlia

    Dah

    lia

    Zephine

    Zep

    hine

    Fantine

    Fan

    tine

    Perpetue

    Per

    petu

    e

    Labarre

    Laba

    rre

    Valjean

    Val

    jean

    Mme. de R

    Mm

    e. d

    e R

    Isabeau

    Isab

    eau

    Gervais

    Ger

    vais

    Bamatabois

    Bam

    atab

    ois

    Simplice

    Sim

    plic

    e

    Scaufflaire

    Sca

    uffla

    ire

    Woman 1

    Wom

    an 1

    Judge

    Judg

    e

    Champmathieu

    Cha

    mpm

    athi

    eu

    Brevet

    Bre

    vet

    Chenildieu

    Che

    nild

    ieu

    Cochepaille

    Coc

    hepa

    ille

    Myriel

    Myr

    iel

    Napoleon

    Nap

    oleo

    n

    Mlle. Baptistine

    Mlle

    . B

    aptis

    tine

    Mme. Magloire

    Mm

    e. M

    aglo

    ire

    Countess de Lo

    Cou

    ntes

    s de

    Lo

    Geborand

    Geb

    oran

    d

    Champtercier

    Cha

    mpt

    erci

    er

    Cravatte

    Cra

    vatte

    Count

    Cou

    nt

    Old Man

    Old

    Man

    Fauchelevent

    Fau

    chel

    even

    t

    Mother Innocent

    Mot

    her

    Inno

    cent

    Gribier

    Grib

    ier

    networks: figure 5c. matrix view of Les Misérables character co-occurrences.

    http://hci.stanford.edu/jheer/fi les/zoo/ex/networks/matrix.htmlSource: http://www-personal.umich.edu/~mejn/netdata

  • practice

    june 2010 | vol. 53 | no. 6 | communications of the acm 67

    cliques and bridges. Further, as with the indented-tree layout, multivariate data can easily be displayed alongside nodes. The problem of sorting the nodes in a manner that reveals underlying cluster structure is formally called seriation and has diverse applications in visualiza-tion, statistics, and even archaeology.

    Matrix Views. Mathematicians and computer scientists often think of a graph in terms of its adjacency matrix: each value in row i and column j in the matrix corresponds to the link from node i to node j. Given this representa-tion, an obvious visualization then is: just show the matrix! Using color or sat-uration instead of text allows values as-sociated with the links to be perceived more rapidly.

    The seriation problem applies just as much to the matrix view, shown in Figure 5c, as to the arc diagram, so the order of rows and columns is im-portant: here we use the groupings generated by a community-detection algorithm to order the display. While path-following is more difficult in a matrix view than in a node-link dia-gram, matrices have a number of com-pensating advantages. As networks get large and highly connected, node-link diagrams often devolve into giant hairballs of line crossings. In matrix views, however, line crossings are im-possible, and with an effective sort-ing one quickly can spot clusters and bridges. Allowing interactive group-ing and reordering of the matrix facili-tates even deeper exploration of net-work structure.

    conclusionWe have arrived at the end of our tour and hope the reader has found the ex-amples both intriguing and practical. Though we have visited a number of visual encoding and interaction tech-niques, many more species of visualiza-tion exist in the wild, and others await discovery. Emerging domains such as bioinformatics and text visualization are driving researchers and designers to continually formulate new and creative representations or find more powerful ways to apply the classics. In either case, the DNA underlying all visualizations remains the same: the principled map-ping of data variables to visual features such as position, size, shape, and color.

    As you leave the zoo and head back

    into the wild, try deconstructing the various visualizations crossing your path. Perhaps you can design a more ef-fective display?

    Additional Resources

    Few, S. Now I See It: Simple Visualization Techniques for Quantitative Analysis. Analytics Press, 2009.

    Tufte, E. The Visual Display of Quantitative Information. Graphics Press, 1983.

    Tufte, E.Envisioning Information. Graphics Press, 1990.

    Ware, C.Visual Thinking for Design. Morgan Kaufmann, 2008.

    Wilkinson, L.The Grammar of Graphics. Springer, 1999.

    Visualization Development Tools

    Prefuse: Java API for information visualization.

    Prefuse Flare: ActionScript 3 library for data visualization in the Adobe Flash Player.

    Processing: Popular language and IDE for graphics and interaction.

    Protovis: JavaScript tool for Web-based visualization.

    The Visualization Toolkit: Library for 3D and scientific visualization.

    Related articles on queue.acm.org

    A Conversation with Jeff heer, Martin Wattenberg, and Fernanda Viégashttp://queue.acm.org/detail.cfm?id=1744741

    Unifying Biological Image Formats with hDF5Matthew T. Dougherty, Michael J. Folk, Erez Zadok, Herbert J. Bernstein, Frances C. Bernstein, Kevin W. Eliceiri, Werner Benger, Christoph Besthttp://queue.acm.org/detail.cfm?id=1628215

    Jeffrey heer is an assistant professor of computer science at Stanford university, where he works on human-computer interaction, visualization, and social computing. He led the design of the Prefuse, Flare, and Protovis visualization toolkits.

    Michael Bostock is currently a Ph.D. student in the Department of Computer Science at Stanford university. before attending Stanford, he was a staff engineer at google, where he developed search quality evaluation methodologies.

    Vadim Ogievetsky is a master’s student at Stanford university specializing in human-computer interaction. He is a core contributor to Protovis, an open-source Web-based visualization toolkit.

    © 2010 ACM 0001-0782/10/0600 $10.00

    all visualizations share a common “Dna”—a set of mappings between data properties and visual attributes such as position, size, shape, and color—and customized species of visualization might always be constructed by varying these encodings.