
(A bit about) Data Visualization
ICPSR Biennial Meeting

October 2, 2015

Ryan Womack ([email protected])
Data Librarian, Rutgers University

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


Introduction

What this talk IS:

Discusses standard techniques of data visualization, the day-to-day power tools for understanding data

Reviews various graphical techniques, from early to recent, from simple to advanced

Presents principles of good data presentation, and shows the R implementation of many functions


Introduction

What this talk is NOT:

It is not about “infographics”, the beautiful, heavily customized products of expert graphic designers. [See 1 and 2 for more discussion]

It is not about the cognitive science aspects of data perception [wish I knew more about this!]

It is not about how to use R or other software [although code is provided for those who are interested]

It is not necessarily a balanced survey of all data visualization. In particular, it is light on graph networks, clustering, and trees [not my expertise]

Very little mapping, too [Others are better at this]


Setup

Most of the graphics examples that are not web accessible are run in R.

R is open source software available at http://r-project.org

RStudio is a useful, freely available editor, found at http://rstudio.com

Workshop materials, including R scripts, supplemental images, and data, are available for download from http://ryanwomack.com/ICPSR2015

The R script file contains working demonstrations of many of the concepts mentioned here for you to try on your own.


Outline

Why?

Whirlwind tour of historical data viz

Standard visualization vs. some less commonly used examples

3-D and Animation

Interactivity, data exploration

A little bit of big data


Why Data Visualization?

Data visualization can:

provide clear understanding of patterns in data

detect hidden structures in data

condense information


Anscombe’s Quartet

For example, see Anscombe’s quartet (image source: http://commons.wikimedia.org/wiki/File:Anscombe%27s_quartet_3.svg):
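The point of the quartet can be checked directly, since R ships the data as the built-in `anscombe` data frame. The sketch below is an illustration added here, not taken from the workshop script: it computes the usual summary statistics for each of the four x/y pairs and then plots them. The helper name `summary_table` is only a label chosen for this example.

```r
# Anscombe's quartet: four data sets with near-identical summaries
data(anscombe)

quartet <- lapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)
  c(mean_x = mean(x), mean_y = mean(y),
    var_x = var(x), var_y = var(y),
    cor_xy = cor(x, y),
    intercept = unname(coef(fit)[1]),
    slope = unname(coef(fit)[2]))
})
summary_table <- round(do.call(rbind, quartet), 2)
rownames(summary_table) <- paste("set", 1:4)
print(summary_table)

# The plots tell four different stories despite the matching numbers
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  plot(x, y, main = paste("Set", i), xlim = c(3, 20), ylim = c(2, 14))
  abline(lm(y ~ x), col = "red")
}
par(op)
```

The table rows agree to two decimal places, while the four panels show a linear cloud, a curve, a line with one outlier, and a vertical stack with one leverage point.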


Links to DataViz sites

Some examples of good data visualization (and fancy infographics) can be found at:

Information Aesthetics

Chart Porn

Eagereyes

DataVis.ca

VizWiz

US Census Data Visualization Gallery


Bad Graphs

Pie Charts are known to be problematic

Clutter and other issues can ruin graphics

Novel or nonsensical?

For more bad ideas, try:

Junk Charts

Ten Worst Graphs

WTFviz


Pie Chart Examples

image source: http://peltiertech.com/WordPress/3d-pie-charts/


Pie Chart Examples

image source: http://ndevisual.wordpress.com/tag/uses-of-pie-charts/


Pie Chart Examples

image source: http://www.nbcchicago.com/news/local/FOX-News-Chart-Fails-Math-73711092.html


Pie Chart Examples

image source: http://tips.vovici.com/content/111031 swb


Pie Chart Examples

image source: http://tips.vovici.com/content/111031 swb


Clutter Example

image source: http://junkcharts.typepad.com/junk_charts/2013/03/which-software-is-responsible-for-this.html


Playfair

Astronomical observations, charts, and maps led in graphical innovation prior to 1800. See also Classic Data Visualizations

William Playfair is the pioneer of the line chart, bar chart, time series plot, and pie chart.

Playfair, W. (1786). Commercial and Political Atlas: Representing, by Copper-Plate Charts, the Progress of the Commerce, Revenues, Expenditure, and Debts of England, during the Whole of the Eighteenth Century.

Playfair, W. (1801). Statistical Breviary.

Both republished in The Commercial and Political Atlas and Statistical Breviary, 2005, Cambridge University Press.


Playfair Examples


Playfair Examples


Minard

Charles Joseph Minard was the next influential data graphic creator after Playfair.

Minard’s flow map of Napoleon’s Russian campaign is celebrated by Tufte and others as one of the greatest information graphics.

It embodies an ideal of highly compressed informative elements, presented with style

Six variables: size, location in 2 dimensions, the direction of the army, temperature, date [and group]

This is a one-off design that crosses into infographics, but it can be reproduced in R and other software.


Minard Examples


Minard Examples


Fisher and Tukey

In the 20th century, statisticians such as Ronald Fisher and John Tukey continued to advance graphical methods for the analysis of data.

Fisher emphasized plotting the data to understand relationships.

Tukey’s Exploratory Data Analysis emphasized the use of graphics to understand the data during analysis, rather than the final presentation to an outside audience.

Tukey created the box and whiskers plot and the stem and leaf plot.


Tufte

Edward R. Tufte’s series of books, beginning with The Visual Display of Quantitative Information, have become the most widely known works on data visualization.

There is considerable overlap between the various publications

Tufte’s ideal is highly compressed, elegant, and informative data, as expressed in dense printed graphics

Tufte sometimes emphasizes beauty and design to the detriment of simplicity and clarity [e.g., train schedules]

“Graphical elegance is often found in simplicity of design and complexity of data.”

“Beautiful graphics do not traffic with the trivial.”


Train Schedule from Marey


Tufte’s principles

Tufte has developed and popularized numerous principles and terminology:

Graphics reveal data - show the data without distorting it - “above all else show the data”

Small multiple - understanding one slice makes understanding others easier

Lie factor - effect shown/effect in reality

Graphical Integrity - no lies, let data vary, not design

Data density - maximize data/ink ratio

Sparklines - seems they haven’t caught on

chartjunk - self-explanatory

Powerpoint is responsible for most of the world’s sorrows [The Cognitive Style of Powerpoint]


Lie Factor

image source: http://www.datavis.ca/gallery/lie-factor.php
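The lie factor is simple arithmetic: the relative change shown in the graphic divided by the relative change in the data. As a worked sketch, the function below (its name and arguments are chosen for this example) applies that ratio to the figures usually quoted from Tufte's fuel-economy example, a line growing from 0.6 to 5.3 inches to represent a standard rising from 18 to 27.5 miles per gallon. Treat the numbers as illustrative inputs rather than a verified transcription of the pictured chart.

```r
# Lie factor = (size of effect shown in graphic) / (size of effect in data)
lie_factor <- function(shown_from, shown_to, data_from, data_to) {
  effect_shown <- (shown_to - shown_from) / shown_from
  effect_data  <- (data_to - data_from) / data_from
  effect_shown / effect_data
}

# Graphic: 0.6 in -> 5.3 in (about +783%); data: 18 -> 27.5 mpg (about +53%)
lf <- lie_factor(shown_from = 0.6, shown_to = 5.3,
                 data_from = 18, data_to = 27.5)
round(lf, 1)
```

A value near 1 means the graphic is proportionate to the data; here the ratio comes out a little under 15, meaning the drawing exaggerates the change roughly fifteenfold.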


Cleveland

William Cleveland’s Elements of Graphing Data and Visualizing Data pioneered systematic considerations of data legibility

Cleveland is particularly known for promoting the dot plot as an alternative to bars and pies.

The dot plot provides clarity and easy comparison of data.

Cleveland also pioneered Trellis graphics

Trellis graphics emphasize comparison of multiple panels of data

The lattice package implements Trellis graphics in R

See Cleveland.pdf for a summary of Cleveland’s recommendations
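To make the Trellis idea concrete, here is a minimal lattice sketch using the built-in `mtcars` and `iris` data (the data sets and variable choices are assumptions made for this illustration, not the workshop's own examples). The `|` in the formula asks for one panel per level of the conditioning variable, with shared axes so the panels can be compared directly.

```r
library(lattice)

# One panel per cylinder count; identical scales make comparison easy
p_panels <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
                   layout = c(3, 1),
                   xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
                   main = "Fuel economy by weight, conditioned on cylinders")
print(p_panels)

# A scatterplot matrix of the four iris measurements, grouped by species
p_splom <- splom(~ iris[1:4], groups = Species, data = iris,
                 auto.key = list(columns = 3))
print(p_splom)
```

Lattice functions return objects rather than drawing immediately, which is why `print()` is called explicitly; this matters inside scripts and functions.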


Scatterplot matrix


The Grammar of Graphics

The Grammar of Graphics, by Leland Wilkinson, was extremely influential in thinking about graphics

Grammar means “rules for art and science”

The Grammar of Graphics specifies rules both mathematical and aesthetic

Earlier graph producers focused on aesthetics of static content

Dynamic graphics and scientific visualization, by contrast, require sophisticated designs to enable brushing, drill-down, zooming, linking

The Grammar of Graphics is easily adapted to this approach

ggplot2 was developed by Hadley Wickham as an implementation of the Grammar of Graphics
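A short ggplot2 sketch shows what "grammar" means in practice: a plot is assembled from separate components (data, aesthetic mappings, geometric layers, a statistical summary, facets, labels) joined with `+`. The data set and variable choices below are assumptions for illustration, and the code only runs the plotting part if ggplot2 is installed.

```r
p <- NULL
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mtcars, aes(x = wt, y = mpg)) +          # data + mappings
    geom_point(aes(colour = factor(cyl))) +             # geometric layer
    geom_smooth(method = "lm", formula = y ~ x,
                se = FALSE, colour = "black") +         # statistical layer
    facet_wrap(~ am) +                                  # small multiples
    labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
         colour = "Cylinders")                          # annotation
  print(p)
}
```

Swapping any one component, for example `geom_point()` for another geom or `facet_wrap(~ am)` for another split, changes the graphic without rewriting the rest of the specification.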


From Barchart to Dot Plot

The Cleveland dot plot

use to compare labeled quantities, ordered lists
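Base R can draw both forms, so the comparison is easy to try. This sketch uses the twelve least fuel-efficient cars in the built-in `mtcars` data, a choice made only for this example, and sorts them so the ordering carries information in both charts.

```r
# Named, sorted vector: the twelve lowest mpg values in mtcars
mpg <- head(sort(setNames(mtcars$mpg, rownames(mtcars))), 12)

op <- par(mfrow = c(1, 2), mar = c(4, 8, 3, 1))
barplot(mpg, horiz = TRUE, las = 1, cex.names = 0.7,
        main = "Bar chart", xlab = "Miles per gallon")
dotchart(mpg, cex = 0.7, pch = 19,
         main = "Cleveland dot plot", xlab = "Miles per gallon")
par(op)
```

The dot plot uses far less ink for the same values, and because position rather than bar length carries the value, its axis does not have to start at zero.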


Figure: Bar chart v. Dot Plot


Visualizing Distributions of Data

Box and Whiskers Plot

Illustrates quantiles and outliers. There is also a Tufte version.

Violin plot

Blends density information with box and whiskers style (in an artistic manner)
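The difference between the two displays shows up clearly on bimodal data. The sketch below uses the built-in `faithful` eruption times and draws the violin by hand in base R, mirroring a kernel density estimate around a centre line. This hand-rolled version is a stand-in chosen to avoid extra packages; dedicated functions exist in packages such as vioplot and ggplot2.

```r
x <- faithful$eruptions

# Five-number summary used by the box plot:
# lower whisker, lower hinge, median, upper hinge, upper whisker
stats <- boxplot.stats(x)$stats
print(stats)

op <- par(mfrow = c(1, 2))
boxplot(x, main = "Box plot", ylab = "Eruption time (min)")

# Violin: a density estimate mirrored left and right of x = 1
d <- density(x)
half_width <- d$y / max(d$y) * 0.4
plot(NA, xlim = c(0.5, 1.5), ylim = range(d$x), xaxt = "n",
     xlab = "", ylab = "Eruption time (min)", main = "Violin plot")
polygon(c(1 - half_width, rev(1 + half_width)), c(d$x, rev(d$x)),
        col = "grey80")
segments(1, stats[2], 1, stats[4], lwd = 4)   # interquartile range
points(1, stats[3], pch = 21, bg = "white")   # median
par(op)
```

The box plot summarizes the eruptions as a single box, while the violin reveals the two clusters of short and long eruptions.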


Figure: Box Plot v. Violin Plot


Visualizing Categorical Data

Beyond the pie chart

The mosaic plot allows multiple categories to be displayed on the same graph, but can be complicated to interpret.

The spineplot is a variant of the mosaic plot, plotting proportions in 2 dimensions.
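All three displays are available in base R graphics. As an illustration, the sketch below uses the built-in `Titanic` contingency table (a choice made for this example): a pie chart of one variable, a mosaic plot of three variables, and a spineplot of a two-way table.

```r
# Two-way table: passenger class by survival
survival_by_class <- margin.table(Titanic, c(1, 4))
print(survival_by_class)

op <- par(mfrow = c(1, 3))
pie(margin.table(Titanic, 1), main = "Pie: class only")
mosaicplot(~ Class + Sex + Survived, data = Titanic, color = TRUE,
           main = "Mosaic: class, sex, survival")
spineplot(survival_by_class, main = "Spineplot: survival by class")
par(op)
```

In the mosaic plot, tile area is proportional to the cell count, so both the size of each group and the split within it are visible at once. The spineplot keeps bar width proportional to class size and shows the survival proportion as the vertical split.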


Figure: Pie Chart v. Mosaic Plot


Maps and Glyphs

Maps are obviously an important and widespread way of presenting data.

We examine a few examples of choropleth maps, in which shading indicates data levels

See also Interactive Maps in R and 5 kinds of Interactive maps in Plot.ly for further exploration

Glyphs present iconic representations of data elements.

Weather maps often use glyphs.

A more dynamic example is here.

As an R example, consider Chernoff faces and the aplpack package. Also, Smiley faces [and many more graph variants in this chapter].


Figure: Choropleth Map v. Chernoff Faces


3-D

3-D scatterplots

cloud (lattice)

contour plots

to plot standardized levels of data

wireframe plots

to present a 3-D surface representation of data

rgl (a separate package containing several 3d plotting functions and animation)

mosaic3d extends the mosaic paradigm to three dimensions
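The lattice functions named above can be tried on built-in data. This sketch uses `mtcars` for the 3-D scatterplot and the `volcano` elevation matrix for the contour and wireframe plots; the data sets and styling arguments are assumptions made for this illustration. The rgl and mosaic3d options are not shown because they need additional packages.

```r
library(lattice)

# 3-D scatterplot: z ~ x * y
p_cloud <- cloud(mpg ~ wt * hp, data = mtcars,
                 xlab = "Weight", ylab = "Horsepower", zlab = "MPG")

# Contour plot: lines of equal height on a grid of elevations
p_contour <- contourplot(volcano, cuts = 10,
                         xlab = "Row", ylab = "Column")

# Wireframe: the same grid drawn as a shaded 3-D surface
p_wire <- wireframe(volcano, shade = TRUE, aspect = c(61 / 87, 0.4),
                    xlab = "Row", ylab = "Column", zlab = "Height")

print(p_cloud)
print(p_contour)
print(p_wire)
```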


Animation

Animation is an easy way to step through data over time, or to provide comparisons of different views of data

R makes animation easy with the animation package

Just enclose a sequence of graphics in the animation command to generate interactive HTML (or GIF, SWF, LaTeX, video).
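As a sketch of that pattern, the code below defines one function that draws a single frame (the built-in `AirPassengers` series up to a given year) and then loops over years inside `saveHTML()` from the animation package. The frame function, file name, and data choice are assumptions for this example. The export step only runs in an interactive session with the package installed, because it writes files to the working directory.

```r
years <- 1949:1960

# Draw one frame: the series from its start through December of `year`
draw_frame <- function(year) {
  plot(window(AirPassengers, end = c(year, 12)),
       xlim = c(1949, 1961), ylim = range(AirPassengers),
       xlab = "Year", ylab = "Passengers (thousands)",
       main = paste("Air passengers through", year))
  invisible(year)
}

if (interactive() && requireNamespace("animation", quietly = TRUE)) {
  animation::ani.options(interval = 0.5)   # seconds between frames
  animation::saveHTML({
    for (yr in years) draw_frame(yr)
  }, htmlfile = "air_passengers.html")
}
```

Keeping the frame logic in its own function makes it easy to preview a single frame with `draw_frame(1955)` before generating the whole sequence.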


Interactive DataViz - Principles

Why aren’t all of our graphs interactive?

Brushing is used to select data points and track them through various analyses.

Drilling down, zooming, and subsetting are also interactive techniques.

Data displays can be linked so that a selection in one panel modifies the output displayed in another panel.

Interactivity is especially useful for data exploration, studying multidimensional relationships.


Interactive Data in Practice

There are many R packages that allow for interactive data work in a graphical user interface, including:

playwith - versatile package that works with any graphics function. Graphics can be explored, edited, and exported.

requires separate installation of GTK+ on your computer [method varies by OS]


googleVis

In many contexts, visualizing the relationships between data elements is made easier by viewing related data interactively.

Making this easy are googleVis and other “Vis” packages, e.g. bdvis for biodiversity or rainfreq.

A Library example - comparing selected ARL Statistics for public CIC universities


Interactive Data on the Web - rCharts

rCharts is a package that uses javascript to create interactive visualizations.

Lattice-style commands are used.

The package can output javascript for use in an HTML page.

Some commands depend on supplemental javascript libraries that must be installed, such as NVD3

Can embed in documents too, with slidify


Interactive Data on the Web - shiny

The shiny package is developed by the RStudio folks

You can learn shiny in half a day via the online tutorial

More custom control of the design is possible with shiny, in comparison to other do-it-all packages

Graphics use familiar R syntax (including ggplot2), with wrappers to implement web functionality

Every shiny app has the same structure: two R scripts saved together in a directory [ui and server files]

You must install the shiny server to deliver pages via the web
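The two-part structure can be sketched in a few lines. In the layout described above, the `ui` object would live in ui.R and the `server` function in server.R; here they are shown together for brevity. The input and output ids (`bins`, `hist`) and the choice of the `faithful` data are assumptions for this example, and the app only launches in an interactive session with shiny installed.

```r
# server: turns inputs into outputs using ordinary R plotting code
server <- function(input, output) {
  output$hist <- shiny::renderPlot({
    hist(faithful$eruptions, breaks = input$bins,
         main = "Old Faithful eruptions", xlab = "Duration (min)")
  })
}

if (requireNamespace("shiny", quietly = TRUE)) {
  # ui: describes the page layout and the controls
  ui <- shiny::fluidPage(
    shiny::titlePanel("A minimal shiny app"),
    shiny::sliderInput("bins", "Number of bins:",
                       min = 5, max = 50, value = 20),
    shiny::plotOutput("hist")
  )
  app <- shiny::shinyApp(ui = ui, server = server)
  if (interactive()) shiny::runApp(app)
}
```

Moving the slider changes `input$bins`, which causes the `renderPlot()` expression to re-run and the histogram to redraw; the plotting code itself is plain R.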


Interactive Data on the Web - shiny, cont.

There are samples built into the shiny package.

You can build a Census Explorer of your own with these instructions from Ari Lamstein.

You can see more in the shiny gallery

rCharts works with shiny too.


Interactive Data on the Web - ggvis

The ggvis package is ALSO developed by the RStudio folks

Think ggplot meets shiny

Similar syntax to ggplot

Some ability to add interactive controls

Can embed in shiny for web access


Interactive Data on the Web - radiant

Radiant is another new R interface built with shiny

The following links demonstrate capabilities:

vnijs.shinyapps.io/base

vnijs.shinyapps.io/quant

vnijs.shinyapps.io/marketing

By automating the mechanics of interacting with data, we can focus on exploring and understanding.


Other (non-R) options for Web visualization

D3.js, free at http://d3js.org/

Inkscape, free at https://inkscape.org/

Tableau, free 1-year student license at http://www.tableau.com/academic/students

Plot.ly environment at http://plot.ly


Interactive Power

Population pyramids are one example where interactivity + animation = insight.

Populationpyramid.net - for all countries, basic animation

The German Population Pyramid from Destatis is even more interactive

Doing it in R is possible with these instructions (Part 1) and (Part 2)



Big Data

Big data presents special issues for data visualization

While many techniques and graphics are the same, exploration and plotting must be optimized for the size of the data set

Representation of the complexity of the data may require special techniques

hexbin

bigvis
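The idea behind binned displays can be sketched in base R without either package. The example below simulates 100,000 points (simulated data, used only for illustration), counts them on a 50 x 50 rectangular grid, and plots the counts as an image. Note that this uses rectangular bins as a simple stand-in; the hexbin package applies the same idea with hexagonal cells.

```r
set.seed(1)
n <- 1e5
x <- rnorm(n)
y <- x + rnorm(n)

# Assign every point to a grid cell and count points per cell
nbins <- 50
counts <- table(cut(x, nbins), cut(y, nbins))

op <- par(mfrow = c(1, 2))
plot(x, y, pch = ".", main = "Raw scatterplot (overplotted)")
z <- matrix(as.numeric(counts), nrow = nbins)
image(log1p(z), col = hcl.colors(20), axes = FALSE,
      main = "Binned counts (log scale)", xlab = "x bins", ylab = "y bins")
par(op)
```

However many points there are, the binned version draws only 2,500 cells, and the colour scale shows the density that overplotting hides in the raw scatterplot.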


bigvis

bigvis was an experimental package by Hadley Wickham to deal with the issues of Big Data

There is a Preprint and R Meetup presentation by Hadley Wickham

Complete code is available at https://github.com/hadley/bigvis-infovis

Target: process 100 million observations in under 5 seconds.

Fundamental principle: No need for more data points than there are pixels on the screen.

“ggstat” package has been mentioned as a future project that will incorporate these ideas.


bigvis steps

Condense (bin, condense)

Smooth (smooth, best_h, peel)

Visualize (autoplot plus standard methods)
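The three steps can be imitated with base R to see what each one contributes. This sketch does not call bigvis itself: fixed-width binning with `tabulate()` stands in for the condense step, a moving average with `stats::filter()` stands in for the smooth step, and an ordinary `plot()` stands in for autoplot. The simulated data, bin width, and window size are all assumptions made for this example.

```r
set.seed(42)
x <- rexp(1e5, rate = 1 / 10)

# 1. Condense: replace raw values with counts in fixed-width bins
width <- 0.5
bin_id <- as.integer(floor(x / width)) + 1L
condensed <- tabulate(bin_id)
centers <- (seq_along(condensed) - 0.5) * width

# 2. Smooth: a 5-bin moving average over the condensed counts
smoothed <- as.numeric(stats::filter(condensed, rep(1 / 5, 5), sides = 2))

# 3. Visualize: plot the summary, not the raw observations
plot(centers, condensed, type = "h", col = "grey60",
     xlab = "x", ylab = "Count per bin",
     main = "Condense, smooth, visualize")
lines(centers, smoothed, col = "red", lwd = 2)
```

After condensing, every later step works on a few hundred bin counts instead of the original observations, which is what makes the approach scale.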


Trelliscope (Tessera)

Tessera is developed by Purdue, Pacific Northwest National Laboratory, and Mozilla. Launched in November 2014, this project holds a lot of promise.

Running in the R environment, Tessera provides its own commands that execute across a cluster, easing the burden of analysis in this environment.

The datadr package “divides and recombines” in a manner similar to MapReduce, providing a simplified interface to Hadoop.

Tessera has its own visualization interface, Trelliscope, that can handle views across many variables and observations. Described in this paper.

Tessera’s Bootcamp is a good introduction, or try the quickstart.

Live demo is here.
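The divide-and-recombine pattern can be illustrated on a single machine with base R. The sketch below does not use datadr or Hadoop: `split()`, `lapply()`, and `rbind()` stand in for the divide, apply, and recombine stages, and the `iris` data and the per-group regression are assumptions made for this example.

```r
# Divide: one subset per species
pieces <- split(iris, iris$Species)

# Apply: fit the same model independently to each subset
fits <- lapply(pieces, function(d) {
  coef(lm(Sepal.Length ~ Petal.Length, data = d))
})

# Recombine: stack the per-subset results into one table
combined <- do.call(rbind, fits)
print(round(combined, 3))
```

Because each subset is processed independently, the apply stage is the part that a cluster can run in parallel; only the small per-subset results need to be brought back together.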


Infographics links

Although not covered here, the following links are a sampling of infographics sites for your later enjoyment:

Data Storytelling in Video

Art of Data Visualization - in spite of its title, more on the infographics side

Parisian Subway Traffic and New York Subway Inequality

Tulp Interactive

Mapping London and London Riots + Twitter

YouTube Trends Map

Global Burden of Disease Visualizations

and the Tree of Life


Keep Exploring

Data Visualization represents a nearly infinite world of possibility for exploration:

plunge into programming

deep dives into data

indulge in interactivity

...have fun and keep learning! [e.g., R-bloggers.com]


References

There is also an online bibliography of references to accompany this presentation on my home page.
