A Backstage Tour of ggplot2 with Hadley Wickham

ggplot2: A backstage tour
Hadley Wickham
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics, Rice University
February 2012 (Wednesday, February 8, 2012)

description

ggplot2, developed by Rice University’s Hadley Wickham, is one of R’s most popular and widely used packages. Its exploratory graphics capabilities are driving the use of R as a complement to legacy analytics tools such as SAS. SAS is well regarded for its strength in data management and "production" statistics, where you know what you want to do and need to do it repeatedly. R, on the other hand, is strong in data analysis and exploration, where figuring out what is needed is the biggest challenge. In this way, SAS and R are strong companions. This webinar provides an all-access pass to Hadley’s latest work. He’ll discuss:

* A brief overview of ggplot2, and how it differs from other plotting systems
* A sneak peek at some of the new features coming in the next version of ggplot2
* What he has learned about good development practices in the 5 years since he started developing ggplot
* Some of the internals of ggplot2, and how he is gradually making it easier for others to contribute

Transcript of A Backstage Tour of ggplot2 with Hadley Wickham

Page 2: A Backstage Tour of ggplot2 with Hadley Wickham

1. Why ggplot2?

2. Sneak peek and new features

3. Best practices

4. Questions


Page 3: A Backstage Tour of ggplot2 with Hadley Wickham

Poll: What graphics system are you currently using?


Page 4: A Backstage Tour of ggplot2 with Hadley Wickham

Why ggplot2?


Page 5: A Backstage Tour of ggplot2 with Hadley Wickham

[Figure: plot titled "WHC" of whc (y-axis, −0.3 to 0.2) against day (x-axis, 20 to 80); labelled "02H02M12H", 2004]

Page 6: A Backstage Tour of ggplot2 with Hadley Wickham


Page 7: A Backstage Tour of ggplot2 with Hadley Wickham

“Nothing is as practical as a good theory”—Kurt Lewin

“[A good model] will bring together in a coherent way things that previously appeared unrelated and which also will provide a basis for dealing systematically with new situations”—David Cox


Page 8: A Backstage Tour of ggplot2 with Hadley Wickham

A plot is made up of multiple layers.
A layer consists of data, a set of mappings between variables and aesthetics, a geometric object and a statistical transformation.
Scales control the details of the mapping.
All components are independent and reusable.
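The components listed on this slide can be spelled out one by one in code. The following is a minimal sketch, assuming a current ggplot2 release rather than the 0.9 version discussed in the talk; the choice of the `mpg` variables and the axis title are illustrative assumptions, not taken from the slides.

```r
library(ggplot2)

# One layer written out in full: data, aesthetic mappings, a geometric
# object, a statistical transformation and a position adjustment.
p <- ggplot() +
  layer(
    data = mpg,
    mapping = aes(x = displ, y = hwy, colour = class),
    geom = "point",
    stat = "identity",
    position = "identity"
  ) +
  # A scale controls the details of one mapping (here: displ -> x position).
  scale_x_continuous(name = "Engine displacement (litres)")

# Components are independent and reusable: the same mapping object feeds
# two layers that use different geoms and stats.
m <- aes(x = displ, y = hwy)
p2 <- ggplot(mpg, m) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

print(p)
print(p2)
```

In everyday use the shortcut `geom_point()` stands in for the explicit `layer()` call; the long form is shown only to make each component visible.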


Page 9: A Backstage Tour of ggplot2 with Hadley Wickham

Interesting ggplot example
Layered grammar + ggplot2

James Cheshire, http://bit.ly/xqHhAs

Page 10: A Backstage Tour of ggplot2 with Hadley Wickham

Charlotte Wickham, http://cwick.co.nz/

Page 11: A Backstage Tour of ggplot2 with Hadley Wickham

David B Sparks, http://bit.ly/hn54NW

Page 12: A Backstage Tour of ggplot2 with Hadley Wickham

Claudia Beleites, http://bit.ly/yNqlpz

Page 13: A Backstage Tour of ggplot2 with Hadley Wickham

Poll: What resources are most helpful to you when improving your R skills?


Page 14: A Backstage Tour of ggplot2 with Hadley Wickham

Learning ggplot2

ggplot2 mailing list: http://groups.google.com/group/ggplot2
stackoverflow: http://stackoverflow.com/tags/ggplot2
Lattice to ggplot2 conversion: http://learnr.wordpress.com/?s=lattice
Cookbook for common graphics: http://wiki.stdout.org/rcookbook/Graphs/
ggplot2 book: http://amzn.com/0387981403

Page 15: A Backstage Tour of ggplot2 with Hadley Wickham

Sneak peek


Page 16: A Backstage Tour of ggplot2 with Hadley Wickham

Poll: Why do you use visualisation?


Page 17: A Backstage Tour of ggplot2 with Hadley Wickham

# Getting started

# To get the CRAN version
install.packages("ggplot2")

# To get the development version
install.packages("devtools")
library(devtools)
dev_mode() # don't overwrite your existing install
install_github("ggplot2") # newer devtools releases expect a "user/repo" string here

Page 18: A Backstage Tour of ggplot2 with Hadley Wickham

Development version

CRAN version


Page 19: A Backstage Tour of ggplot2 with Hadley Wickham

New geoms to deal with overplotting (by Winston Chang)

[Figure: scatterplot of hwy (y-axis, 15 to 45) against class (x-axis: 2seater, compact, midsize, minivan, pickup, subcompact, suv)]

Page 20: A Backstage Tour of ggplot2 with Hadley Wickham

New geoms to deal with overplotting (by Winston Chang)

qplot(class, hwy, data = mpg)

[Figure: scatterplot of hwy (y-axis, 15 to 45) against class (x-axis: 2seater, compact, midsize, minivan, pickup, subcompact, suv)]

Page 21: A Backstage Tour of ggplot2 with Hadley Wickham

qplot(class, hwy, data = mpg, geom = "jitter")

[Figure: jittered scatterplot of hwy (y-axis, 15 to 45) against class (x-axis: 2seater, compact, midsize, minivan, pickup, subcompact, suv)]

Page 22: A Backstage Tour of ggplot2 with Hadley Wickham

qplot(class, hwy, data = mpg, geom = "violin")

[Figure: violin plot of hwy (y-axis, 15 to 45) against class (x-axis: 2seater, compact, midsize, minivan, pickup, subcompact, suv)]

Page 23: A Backstage Tour of ggplot2 with Hadley Wickham

[Figure: dot plot of hwy (y-axis, 10 to 45) against class (x-axis: 2seater, compact, midsize, minivan, pickup, subcompact, suv)]

Page 24: A Backstage Tour of ggplot2 with Hadley Wickham

qplot(class, hwy, data = mpg, geom = "dotplot", stackdir = "center",
  binaxis = "y", stackratio = 1, binwidth = 1)

[Figure: dot plot of hwy (y-axis, 10 to 45) against class (x-axis: 2seater, compact, midsize, minivan, pickup, subcompact, suv)]
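The `qplot()` calls on these slides can also be written with the full `ggplot()` syntax, which is worth knowing because `qplot()` has since been deprecated in recent ggplot2 releases. A minimal sketch, assuming a current ggplot2 install; the object names (`base`, `p_point`, and so on) are illustrative only.

```r
library(ggplot2)

# Shared data and mappings, reused by every variant below.
base <- ggplot(mpg, aes(class, hwy))

p_point  <- base + geom_point()    # plain scatterplot: heavy overplotting
p_jitter <- base + geom_jitter()   # random noise spreads coincident points
p_violin <- base + geom_violin()   # density estimate per class
p_dot    <- base + geom_dotplot(   # stacked dots, binned along y
  binaxis = "y", stackdir = "center", stackratio = 1, binwidth = 1
)

print(p_jitter)
print(p_violin)
print(p_dot)
```

Because all four variants share `base`, switching between them only changes the geom, which mirrors the `geom = "..."` argument in the `qplot()` versions.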

Page 25: A Backstage Tour of ggplot2 with Hadley Wickham

Better legends (by Kohske Takahashi)

[Figure: scatterplot of y against x (both 0.0 to 1.0) with points coloured by colour; legend keys 0.5, 1.0, 1.5, 2.0, 2.5]

Page 26: A Backstage Tour of ggplot2 with Hadley Wickham

Better legends (by Kohske Takahashi)

df <- data.frame(x = runif(100), y = runif(100))
df$colour <- with(df, x ^ 2 + y + runif(100))

qplot(x, y, data = df, colour = colour)

[Figure: scatterplot of y against x (both 0.0 to 1.0) with points coloured by colour; legend keys 0.5, 1.0, 1.5, 2.0, 2.5]

Page 27: A Backstage Tour of ggplot2 with Hadley Wickham

qplot(x, y, data = df, colour = colour) +
  guides(colour = guide_legend(nrow = 2, byrow = TRUE))

[Figure: the same scatterplot of y against x, with the colour legend keys (0.5, 1.0, 1.5, 2.0, 2.5) laid out in two rows]

Page 28: A Backstage Tour of ggplot2 with Hadley Wickham

qplot(x, y, data = df, colour = colour) +
  guides(colour = guide_colorbar())

[Figure: the same scatterplot of y against x, with a continuous colour bar (0.5 to 2.5) in place of the discrete legend keys]

Page 29: A Backstage Tour of ggplot2 with Hadley Wickham

qplot(x, y, data = df, colour = colour, alpha = I(1/4))

Page 30: A Backstage Tour of ggplot2 with Hadley Wickham

qplot(x, y, data = df, colour = colour, alpha = I(1/4)) +
  guides(colour = guide_legend(
    override.aes = list(alpha = 1, size = 2)))

Page 31: A Backstage Tour of ggplot2 with Hadley Wickham

# Better layout

df <- data.frame(x = 1:10, y = 10:1, colour = 1:2)
qplot(x, y, data = df) + coord_fixed()
qplot(x, y, data = df) + facet_wrap(~ colour)

# Internally, there has been a big rewrite of
# the facetting data processing and rendering
# systems. This lays the foundation for new
# features, and fixes some annoying long-standing
# bugs.

Page 32: A Backstage Tour of ggplot2 with Hadley Wickham

# Speed improvements

system.time(print(qplot(carat, price, data = diamonds)))

# Includes new tools for figuring out what's
# taking all the time
benchplot(qplot(carat, price, data = diamonds))

# See also geom_raster and geom_map

# Still a lot of work to do. The emphasis in
# ggplot2 is reducing the amount of thinking
# time by making it easier to go from the plot in
# your brain to the plot on the page.
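To see where the time goes without relying on `benchplot()`, the main stages can be timed by hand. This is a rough sketch, assuming a current ggplot2 release; it approximates the kind of breakdown `benchplot()` reports and is not its implementation. The variable names are illustrative.

```r
library(ggplot2)
library(grid)

p <- ggplot(diamonds, aes(carat, price)) + geom_point()

# Stage 1: compute statistics, train scales and lay out the panels.
t_build <- system.time(built <- ggplot_build(p))

# Stage 2: convert the built plot into a table of grid graphical objects.
t_gtable <- system.time(gt <- ggplot_gtable(built))

# Stage 3: draw those objects on the current graphics device.
t_draw <- system.time({
  grid.newpage()
  grid.draw(gt)
})

timings <- c(
  build  = t_build[["elapsed"]],
  gtable = t_gtable[["elapsed"]],
  draw   = t_draw[["elapsed"]]
)
print(timings)
```

The absolute numbers depend on the machine and on the graphics device, so only the relative sizes of the stages are informative.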


Page 33: A Backstage Tour of ggplot2 with Hadley Wickham

30 s with geom_tile, 8 s with annotation_raster

Page 34: A Backstage Tour of ggplot2 with Hadley Wickham

library(ggplot2)
library(reshape2)
library(RgoogleMaps)
library(ggmap)

# Thefts inside a latitude/longitude bounding box
theft <- subset(crime, offense == "theft" & lat > 29 & lat < 30.2 & lon > -95.8)

lonr <- range(theft$lon)
latr <- range(theft$lat)

# Fetch a map tile covering the data and convert it to a raster
h_map <- GetMap.bbox(lonr, latr, size = c(1024, 1024))
h_raster <- as.raster(h_map$myTile)

# Approach 1: draw the map as a single raster annotation
benchplot(ggplot(theft, aes(lon, lat)) +
  annotation_raster(h_raster, lonr[1], lonr[2], latr[1], latr[2]) +
  geom_density2d(colour = "black"))

# Approach 2: melt the raster to one row per pixel and draw it with geom_tile
h_data <- melt(as.matrix(h_raster))
h_data$lat <- seq(latr[2], latr[1], length = 640)[h_data$Var1]
h_data$lon <- seq(lonr[1], lonr[2], length = 640)[h_data$Var2]

benchplot(ggplot(theft, aes(lon, lat)) +
  geom_tile(aes(fill = value), data = h_data) +
  scale_fill_identity() +
  geom_density2d(colour = "black"))

Page 35: A Backstage Tour of ggplot2 with Hadley Wickham

ggplot2 0.9 scheduled for release on March 1


Page 36: A Backstage Tour of ggplot2 with Hadley Wickham

Poll: How big is your data?


Page 37: A Backstage Tour of ggplot2 with Hadley Wickham

# Future work: big visualisation
# (Sponsored by Revolution Analytics)

# How can you make a plot of 100 million
# observations?

# In less than one minute.

Page 38: A Backstage Tour of ggplot2 with Hadley Wickham


Page 39: A Backstage Tour of ggplot2 with Hadley Wickham


Page 40: A Backstage Tour of ggplot2 with Hadley Wickham


Page 41: A Backstage Tour of ggplot2 with Hadley Wickham


Page 42: A Backstage Tour of ggplot2 with Hadley Wickham


Page 43: A Backstage Tour of ggplot2 with Hadley Wickham

~100,000 points
0.06 s to bin
0.20 s to convert
6.0 s to plot

Page 44: A Backstage Tour of ggplot2 with Hadley Wickham

~100,000 points
0.06 s to bin
0.20 s to convert
6.0 s to plot

~1.2 million
10 s to bin
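The timings above separate binning from plotting. The idea of binning first, so that the plot only has to draw a fixed number of cells however many observations there are, can be sketched in base R. This is an illustration under stated assumptions, not the implementation used in the talk: `bin2d()` is a hypothetical helper, the data are simulated, and the 100 x 100 grid is an arbitrary choice.

```r
# Hypothetical helper: count the points falling in each cell of an
# nbins x nbins grid spanning the range of the data.
bin2d <- function(x, y, nbins = 100) {
  # Map each coordinate to a bin index in 1..nbins; the maximum value
  # would land in bin nbins + 1, so it is clamped into the last bin.
  to_bin <- function(v) {
    pmin(nbins, 1 + floor((v - min(v)) / (max(v) - min(v)) * nbins))
  }
  bx <- to_bin(x)
  by <- to_bin(y)
  # Give every cell a single id, then count all ids in one pass.
  counts <- tabulate((by - 1) * nbins + bx, nbins = nbins * nbins)
  matrix(counts, nrow = nbins, ncol = nbins)  # rows: x bins, columns: y bins
}

set.seed(1)   # simulated data, not the data from the talk
n <- 1e5
x <- rnorm(n)
y <- x + rnorm(n)

t_bin  <- system.time(counts <- bin2d(x, y))
t_plot <- system.time(image(counts, xlab = "x bin", ylab = "y bin"))

print(c(bin = t_bin[["elapsed"]], plot = t_plot[["elapsed"]]))
```

After binning, the plotting cost depends on the grid size (10,000 cells here) rather than on `n`, which is what makes the approach attractive for very large data.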


Page 45: A Backstage Tour of ggplot2 with Hadley Wickham

Best practices


Page 46: A Backstage Tour of ggplot2 with Hadley Wickham

Poll: How do you learn about new packages?


Page 47: A Backstage Tour of ggplot2 with Hadley Wickham

Package best practices

• Namespace
• Documentation
• Unit tests
• Read the source!

• (ggplot2 is not always the best example: it was my second R package. I have now written around 30, and I now know a lot more!)

Page 48: A Backstage Tour of ggplot2 with Hadley Wickham


Page 49: A Backstage Tour of ggplot2 with Hadley Wickham

# Namespaces

library(ggplot2)
ddply

# Note that plyr, reshape etc aren't automatically
# loaded. This is good development practice -
# it's better to be explicit than implicit.

# Look at the NAMESPACE file.
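The effect of a namespace can also be inspected from a running session. A small sketch, assuming a current ggplot2 install (whose dependencies differ from the 2012 version, so plyr may not be involved at all); the variable names are illustrative.

```r
library(ggplot2)

# Attaching ggplot2 puts only the names it exports on the search path.
exports <- getNamespaceExports("ggplot2")
print(c("aes", "coord_flip", "annotate") %in% exports)

# Packages that ggplot2 uses internally are not attached as a side effect.
plyr_attached <- "package:plyr" %in% search()
print(plyr_attached)

# Being explicit: name the namespace at the call site, so the code does
# not depend on what happens to be attached.
p <- ggplot2::ggplot(ggplot2::mpg, ggplot2::aes(displ, hwy)) +
  ggplot2::geom_point()
print(p)
```

The `pkg::name` form is what a package author relies on when a dependency is imported rather than attached.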


Page 50: A Backstage Tour of ggplot2 with Hadley Wickham

export("%+%")
export(aes_all)
export(aes_auto)
export(aes_string)
export(aes)
export(annotate)
export(annotation_custom)
export(annotation_map)
export(annotation_raster)
export(autoplot)
export(benchplot)
export(borders)
export(continuous_scale)
export(coord_cartesian)
export(coord_equal)
export(coord_fixed)
export(coord_flip)
export(coord_map)
export(coord_polar)
...

Page 51: A Backstage Tour of ggplot2 with Hadley Wickham

# Unit tests

# Look in tests/ or inst/tests/

library(testthat)
test_package("ggplot2")
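For readers who have not seen a testthat file, a single test looks roughly like the sketch below. The test itself is a made-up example written for a current ggplot2 and testthat; it is not taken from ggplot2's own test suite.

```r
library(testthat)
library(ggplot2)

# Hypothetical test: jittering moves points but must not add or drop any.
test_that("a jitter layer keeps one row per observation", {
  p <- ggplot(mpg, aes(class, hwy)) + geom_jitter()
  expect_equal(nrow(layer_data(p, 1)), nrow(mpg))
})
```

Files of such tests live in the package's test directory, and `test_package()` runs them all against the installed package.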


Page 52: A Backstage Tour of ggplot2 with Hadley Wickham

# Documentation

# Function level in man/
?geom_point
?facet_wrap
package?ggplot2

# Vignettes in inst/doc
# (ggplot2 doesn't have any)

# Publications
citation("ggplot2")

Page 53: A Backstage Tour of ggplot2 with Hadley Wickham

Questions


Page 54: A Backstage Tour of ggplot2 with Hadley Wickham

Learning ggplot2

ggplot2 mailing list: http://groups.google.com/group/ggplot2
stackoverflow: http://stackoverflow.com/tags/ggplot2
Lattice to ggplot2 conversion: http://learnr.wordpress.com/?s=lattice
Cookbook for common graphics: http://wiki.stdout.org/rcookbook/Graphs/
ggplot2 book: http://amzn.com/0387981403