Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka...

79
Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@ inundata) for original slides February 22, 2015

Transcript of Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka...

Page 1: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Data Visualization Using R & ggplot2

Naupaka Zimmerman (@naupakaz)Andrew Tredennick (@ATredennick)

Hat tip to Karthik Ram (@ inundata) for original slides

February 22, 2015

Page 2: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Some housekeeping

Install some packages

install.packages("ggplot2", dependencies = TRUE)install.packages("plyr")install.packages("ggthemes")install.packages("reshape2")

Page 3: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Section 1

Why ggplot2?

Page 4: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Why ggplot2?

I More elegant & compact code than with base graphics

I More aesthetically pleasing defaults than lattice

I Very powerful for exploratory data analysis

Page 5: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Why ggplot2?

I ‘gg’ is for ‘grammar of graphics’ (term by Lee Wilkinson)

I A set of terms that defines the basic components of a plot

I Used to produce figures using coherant, consistant syntax

Page 6: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Why ggplot2?

I Supports a continuum of expertise:

I Easy to get started, plenty of power for complex figures

Page 7: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Section 2

The Grammar

Page 8: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Some terminology

I data I Must be a data.frame

I Gets pulled into the ggplot() object

Page 9: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

The iris dataset

head(iris)

## Sepal.Length Sepal.Width Petal.Length Petal.Width Species## 1 5.1 3.5 1.4 �.2 setosa## 2 4.9 3.� 1.4 �.2 setosa## 3 4.7 3.2 1.3 �.2 setosa## 4 4.6 3.1 1.5 �.2 setosa## 5 5.� 3.6 1.4 �.2 setosa## 6 5.4 3.9 1.7 �.4 setosa

Page 10: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

plyr and reshape are key for using R

These two packages are the swiss army knives of R.I plyr

1. ddply (data frame to data frame ply)1.1 split

1.2 apply

1.3 combine

2. llply (list to list ply)3. join

Page 11: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

plyr

iris[1:2, ]

## Sepal.Length Sepal.Width Petal.Length Petal.Width Species## 1 5.1 3.5 1.4 �.2 setosa## 2 4.9 3.� 1.4 �.2 setosa

# Note the use of the . function to allow Species to be used# without quotingddply(iris, .(Species), summarize,

mean.Sep.Wid = mean(Sepal.Width, na.rm = TRUE))

## Species mean.Sep.Wid## 1 setosa 3.428## 2 versicolor 2.77�## 3 virginica 2.974

Page 12: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

plyr and reshape are key for using R

These two packages are the swiss army knives of R.I reshape

1. melt2. dcast (data frame output)3. acast (vector/matrix/array output)

Page 13: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

reshape2

iris[1:2, ]

## Sepal.Length Sepal.Width Petal.Length Petal.Width Species## 1 5.1 3.5 1.4 �.2 setosa## 2 4.9 3.� 1.4 �.2 setosa

df <- melt(iris, id.vars = "Species")df[1:2, ]

## Species variable value## 1 setosa Sepal.Length 5.1## 2 setosa Sepal.Length 4.9

Page 14: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

reshape2

df[1:2, ]

## Species variable value## 1 setosa Sepal.Length 5.1## 2 setosa Sepal.Length 4.9

dcast(df, Species ˜ variable, mean)

## Species Sepal.Length Sepal.Width Petal.Length## 1 setosa 5.��6 3.428 1.462## 2 versicolor 5.936 2.77� 4.26�## 3 virginica 6.588 2.974 5.552## Petal.Width## 1 �.246## 2 1.326## 3 2.�26

Page 15: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Section 3

Aesthetics

Page 16: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Some terminology

I data

I aesthetics

I How your data are represented visuallyI a.k.a. mapping

I which data on the x

I which data on the y

I but also: color, size, shape, transparency

Page 17: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Let’s try an example

myplot <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width))summary(myplot)

## data: Sepal.Length, Sepal.Width, Petal.Length,## Petal.Width, Species [15�x5]## mapping: x = Sepal.Length, y = Sepal.Width## faceting: facet_null()

Page 18: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Section 4

Geoms

Page 19: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Some terminology

I data

I aesthetics

I geometry

I The geometric objects in the plot

I points, lines, polygons, etc

I shortcut functions: geom point(),geom bar(), geom line()

Page 20: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Basic structure

ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width))+ geom_point()

myplot <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width))myplot + geom_point()

I Specify the data and variables inside the ggplot function.

I Anything else that goes in here becomes a global setting.

I Then add layers: geometric objects, statistical models, andfacets.

Page 21: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Quick note

I Never use qplot - short for quick plot.

I You‘ll end up unlearning and relearning a good bit.

Page 22: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Let’s try an example

ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +geom_point()

2.0

2.5

3.0

3.5

4.0

4.5

5 6 7 8Sepal.Length

Sepal.W

idth

Page 23: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Changing the aesthetics of a geom:Increase the size of points

ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +geom_point(size = 3)

2.0

2.5

3.0

3.5

4.0

4.5

5 6 7 8Sepal.Length

Sepal.W

idth

Page 24: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Changing the aesthetics of a geom:Add some color

ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +geom_point(size = 3)

2.0

2.5

3.0

3.5

4.0

4.5

5 6 7 8Sepal.Length

Sepal.W

idth Species

setosaversicolorvirginica

Page 25: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Changing the aesthetics of a geom:Di↵erentiate points by shape

ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +geom_point(aes(shape = Species), size = 3)

# Why aes(shape = Species)?

2.0

2.5

3.0

3.5

4.0

4.5

5 6 7 8Sepal.Length

Sepal.W

idth Species

setosaversicolorvirginica

Page 26: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Exercise 1

# Make a small sample of the diamonds datasetd2 <- diamonds[sample(1:dim(diamonds)[1], 1���), ]

Then generate this plot below.

0

5000

10000

15000

1 2 3carat

price

colorDEFGHIJ

Page 27: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Section 5

Stats

Page 28: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Some terminology

I data

I aesthetics

I geometry

I stats

I Statistical transformations and datasummary

I All geoms have associated default stats, andvice versa

I e.g. binning for a histogram or fitting alinear model

Page 29: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Built-in stat example: Boxplots

See ?geom boxplot for list of options

library(MASS)ggplot(birthwt, aes(factor(race), bwt)) + geom_boxplot()

1000

2000

3000

4000

5000

1 2 3factor(race)

bwt

Page 30: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Built-in stat example: Boxplots

myplot <- ggplot(birthwt, aes(factor(race), bwt)) + geom_boxplot()summary(myplot)

## data: low, age, lwt, race, smoke, ptl, ht, ui, ftv,## bwt [189x1�]## mapping: x = factor(race), y = bwt## faceting: facet_null()## -----------------------------------## geom_boxplot: outlier.colour = black, outlier.shape = 16, outlier.size = 2, notch = FALSE, notchwidth = �.5, varwidth = FALSE## stat_boxplot:## position_dodge: (width = NULL, height = NULL)

Page 31: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Section 6

Facets

Page 32: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Some terminology

I data

I aesthetics

I geometry

I stats

I facets

I Subsetting data to make lattice plots

I Really powerful

Page 33: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Faceting: single column, multiple rows

ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +geom_point() +facet_grid(Species ˜ .)

2.02.53.03.54.04.5

2.02.53.03.54.04.5

2.02.53.03.54.04.5

setosaversicolor

virginica

5 6 7 8Sepal.Length

Sepal.W

idth Species

setosaversicolorvirginica

Page 34: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Faceting: single row, multiple columns

ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +geom_point() +facet_grid(. ˜ Species)

setosa versicolor virginica

2.0

2.5

3.0

3.5

4.0

4.5

5 6 7 8 5 6 7 8 5 6 7 8Sepal.Length

Sepal.W

idth Species

setosaversicolorvirginica

Page 35: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

or just wrap your facets

ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +geom_point() +facet_wrap( ˜ Species) # notice lack of .

setosa versicolor virginica

2.0

2.5

3.0

3.5

4.0

4.5

5 6 7 8 5 6 7 8 5 6 7 8Sepal.Length

Sepal.W

idth Species

setosaversicolorvirginica

Page 36: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Section 7

Scales

Page 37: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Some terminology

I data

I aesthetics

I geometry

I stats

I facets

I scales

I Control the mapping from data toaesthetics

I Often used for adjusting color mapping

Page 38: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Colors

aes(color = variable) # mappingcolor = "black" # setting

# Or add it as a scalescale_fill_manual(values = c("color1", "color2"))

Page 39: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

The RColorBrewer package

library(RColorBrewer)display.brewer.all()

Page 40: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Using a color brewer palette

df <- melt(iris, id.vars = "Species")ggplot(df, aes(Species, value, fill = variable)) +

geom_bar(stat = "identity", position = "dodge") +scale_fill_brewer(palette = "Set1")

0

2

4

6

8

setosa versicolor virginicaSpecies

value

variableSepal.LengthSepal.WidthPetal.LengthPetal.Width

Page 41: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Manual color scale

ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +geom_point() +facet_grid(Species ˜ .) +scale_color_manual(values = c("red", "green", "blue"))

2.02.53.03.54.04.5

2.02.53.03.54.04.5

2.02.53.03.54.04.5

setosaversicolor

virginica

5 6 7 8Sepal.Length

Sepal.W

idth Species

setosaversicolorvirginica

Page 42: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Refer to a color chart for beautful visualizations

http://tools.medialab.sciences-po.fr/iwanthue/

Page 43: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Adding a continuous scale to an axis

library(MASS)ggplot(birthwt, aes(factor(race), bwt)) +

geom_boxplot(width = .2) +scale_y_continuous(labels = (paste�(1:4, " Kg")),

breaks = seq(1���, 4���, by = 1���))

1 Kg

2 Kg

3 Kg

4 Kg

1 2 3factor(race)

bwt

Page 44: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Commonly used scales

scale_fill_discrete(); scale_colour_discrete()scale_fill_hue(); scale_color_hue()scale_fill_manual(); scale_color_manual()scale_fill_brewer(); scale_color_brewer()scale_linetype(); scale_shape_manual()

Page 45: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Section 8

Coordinates

Page 46: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Some terminology

I data

I aesthetics

I geometry

I stats

I facets

I scales

I coordinates

I Not going to cover this in detail

I e.g. polar coordinate plots

Page 47: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Section 9

Putting it all together with more examples

Page 48: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Section 10

Histograms

Page 49: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

See ?geom histogram for list of options

h <- ggplot(faithful, aes(x = waiting))h + geom_histogram(binwidth = 3�, colour = "black")

0

50

100

150

0 50 100 150waiting

count

Page 50: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

h <- ggplot(faithful, aes(x = waiting))h + geom_histogram(binwidth = 8, fill = "steelblue",colour = "black")

0

20

40

60

30 50 70 90waiting

count

Page 51: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Section 11

Line plots

Page 52: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

climate <- read.csv("data/climate.csv", header = T)ggplot(climate, aes(Year, Anomaly1�y)) +

geom_line()

0.0

0.5

1920 1950 1980Year

Anom

aly10y

climate <- read.csv(text =

RCurl::getURL(https://raw.github.com/karthikram/ggplot-lecture/master/climate.csv))

Page 53: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

We can also plot confidence regions

climate <- read.csv("data/climate.csv", header = T)ggplot(climate, aes(Year, Anomaly1�y)) +

geom_ribbon(aes(ymin = Anomaly1�y - Unc1�y,ymax = Anomaly1�y + Unc1�y),fill = "blue", alpha = .1) +

geom_line(color = "steelblue")

0.0

0.5

1920 1950 1980Year

Anom

aly10y

Page 54: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Section 12

Bar plots

Page 55: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

ggplot(iris, aes(Species, Sepal.Length)) +geom_bar(stat = "identity")

0

100

200

300

setosa versicolor virginicaSpecies

Sepal.Length

Page 56: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

df <- melt(iris, id.vars = "Species")ggplot(df, aes(Species, value, fill = variable)) +

geom_bar(stat = "identity")

0

250

500

750

setosa versicolor virginicaSpecies

value

variableSepal.LengthSepal.WidthPetal.LengthPetal.Width

Page 57: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

ggplot(df, aes(Species, value, fill = variable)) +geom_bar(stat = "identity", position = "dodge")

0

2

4

6

8

setosa versicolor virginicaSpecies

value

variableSepal.LengthSepal.WidthPetal.LengthPetal.Width

What’s going on with the y axis?

Page 58: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

ggplot(df, aes(Species, value, fill = variable)) +geom_bar(stat = "identity", position="dodge", color="black")

0

2

4

6

8

setosa versicolor virginicaSpecies

value

variableSepal.LengthSepal.WidthPetal.LengthPetal.Width

Page 59: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Exercise 3

Using the d2 dataset you created earlier, generate this plot below.Take a quick look at the data first to see if it needs to be binned.

0

25

50

75

I1 SI2 SI1 VS2 VS1 VVS2 VVS1 IFclarity

coun

t

cutFairGoodVery GoodPremiumIdeal

Page 60: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Section 13

Density Plots

Page 61: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Density plots

ggplot(faithful, aes(waiting)) + geom_density()

0.00

0.01

0.02

0.03

50 60 70 80 90waiting

density

Page 62: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Density plots

ggplot(faithful, aes(waiting)) +geom_density(fill = "blue", alpha = �.1)

0.00

0.01

0.02

0.03

50 60 70 80 90waiting

density

Page 63: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

ggplot(faithful, aes(waiting)) +geom_line(stat = "density")

0.01

0.02

0.03

50 60 70 80 90waiting

density

Page 64: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Section 14

Adding smoothers

Page 65: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +geom_point(aes(shape = Species), size = 3) +geom_smooth(method = "lm")

2.0

2.5

3.0

3.5

4.0

4.5

5 6 7 8Sepal.Length

Sepal.W

idth Species

setosaversicolorvirginica

Page 66: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +geom_point(aes(shape = Species), size = 3) +geom_smooth(method = "lm") +facet_grid(. ˜ Species)

setosa versicolor virginica

2.0

2.5

3.0

3.5

4.0

4.5

5 6 7 8 5 6 7 8 5 6 7 8Sepal.Length

Sepal.W

idth Species

setosaversicolorvirginica

Page 67: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Section 15

Themes

Page 68: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Adding themes

Themes are a great way to define custom plots.

+ theme()# see ?theme() for more options

Page 69: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

A more basic theme

ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +geom_point(size = 1.2, shape = 16) +facet_wrap( ˜ Species) +theme_bw()

setosa versicolor virginica

2.0

2.5

3.0

3.5

4.0

4.5

5 6 7 8 5 6 7 8 5 6 7 8Sepal.Length

Sepal.W

idth Species

setosaversicolorvirginica

Page 70: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

A themed plot

ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +geom_point(size = 1.2, shape = 16) +facet_wrap( ˜ Species) +theme(legend.key = element_rect(fill = NA),

legend.position = "bottom",strip.background = element_rect(fill = NA),axis.title.y = element_text(angle = �))

Page 71: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

A themed plot

setosa versicolor virginica

2.0

2.5

3.0

3.5

4.0

4.5

5 6 7 8 5 6 7 8 5 6 7 8Sepal.Length

Sepal.Width

Species setosa versicolor virginica

Page 72: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

ggthemes library

install.packages(ggthemes)library(ggthemes)# Then add one of these themes to your plot+ theme_stata()+ theme_excel()+ theme_wsj()+ theme_solarized()

Page 73: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Fan of Wes Anderson movies?

Page 74: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Yup, that’s a thing

# install.packages(wesanderson)library("wesanderson")# display a palettewes_palette("Royal2", 5)

Royal2

Page 75: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Section 16

Create functions to automate your plotting

Page 76: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Write functions for day to day plots

my_custom_plot <- function(df, title = "", ...) {ggplot(df, ...) +ggtitle(title) +whatever_geoms() +theme(...)

}

Then just call your function to generate a plot. It’s a lot easier tofix one function that do it over and over for many plots

plot1 <- my_custom_plot(dataset1, title = "Figure 1")

Page 77: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Section 17

Publication quality figures

Page 78: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

I If the plot is on your screen

ggsave(˜/path/to/figure/filename.png)

I If your plot is assigned to an object

ggsave(plot1, file = "˜/path/to/figure/filename.png")

I Specify a size

ggsave(file = "/path/to/figure/filename.png", width = 6,height =4)

I or any format (pdf, png, eps, svg, jpg)

ggsave(file = "/path/to/figure/filename.eps")ggsave(file = "/path/to/figure/filename.jpg")ggsave(file = "/path/to/figure/filename.pdf")

Page 79: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@

Further help

I You’ve just scratched the surface with ggplot2.

I Practice

I Read the docs (either locally in R or athttp://docs.ggplot2.org/current/)

I Work together