Intro to ggplot2 - Sheffield R Users Group, Feb 2015

Posted on 17-Jul-2015



Introduction to ggplot2

Paul Richards, ScHARR, The University of Sheffield

Thursday, February 26, 2015

Introduction

- ggplot2 is a package written by Hadley Wickham
- Powerful but easy-to-use functions for 2D graphics
- Based on the “Grammar of Graphics” theory by Leland Wilkinson
- Use install.packages("ggplot2") to install the latest version

Concepts

- ggplot2 works with data frames (data.frame objects)
- aesthetic = a visual feature we can see on the graphic (shape, size, colour etc.)
- We map from data to aesthetics (e.g. a different colour per group)
- layer = geometric object + data/aesthetics + statistical transformation
- A graphic will also have one or more scales and a co-ordinate system
- It may also have facets: the plot is split into subsets by some characteristic(s)
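To make the vocabulary above concrete, here is a minimal sketch that spells out each component explicitly, assuming ggplot2 is installed. Most of these pieces are normally left at their defaults. The variable name `p` and the choice to facet by Species are illustrative only; Species is simply the one categorical column in iris.

```r
library(ggplot2)

p <- ggplot(data = iris,                                   # data: a data.frame
            mapping = aes(x = Sepal.Length,                # aesthetics mapped from columns
                          y = Petal.Length,
                          colour = Species)) +
  geom_point(stat = "identity") +       # layer: point geom + identity stat (no transformation)
  scale_colour_discrete(name = "Species") +  # scale: data values -> colours, plus the legend
  coord_cartesian() +                   # co-ordinate system (this is the default one)
  facet_wrap(~ Species)                 # facets: one panel per species
print(p)
```

Dropping the `stat`, scale and coord lines gives the same plot, because those are the defaults.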

Quick note on “plot”

- The “plot” function in base R is a generic: it dispatches to many different methods
- Its behaviour depends on the class of the object supplied
- It can require some manual tinkering to get the output we want
- Example, using the “iris” data: plot petal length vs sepal length
  - a different colour for each species
  - point size varies with sepal width

Scatterplot with “plot”

with(iris, plot(Sepal.Length, Petal.Length, col = Species, cex = Sepal.Width))

[Figure: base R scatterplot of Petal.Length against Sepal.Length]

Problems

- Cannot specify the dataset within the function call, hence the with() wrapper
- No legend: it has to be added manually via legend(), which is fiddly to use
- Arguments are a bit “dumb”: cex uses the raw Sepal.Width values, so we need to rescale Sepal.Width to get sensible point sizes
- Need to use different functions to add new geometric objects, e.g. regression lines
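For comparison, this is roughly the manual work the problems above imply in base R. It is a sketch only. The divisor of 2 used to rescale Sepal.Width and the names `fits` and `i` are our own illustrative choices. Fitting one regression line per species is an assumed example of "adding new geometric objects".

```r
# Base R version: the legend, point sizes and regression lines are all manual.
with(iris, plot(Sepal.Length, Petal.Length,
                col = Species,            # factor codes 1-3 index the colour palette
                cex = Sepal.Width / 2))   # rescaled by hand (factor 2 is arbitrary)

# No legend is drawn automatically; its colours must be kept in sync by hand
legend("topleft", legend = levels(iris$Species), col = 1:3, pch = 1)

# Each regression line needs its own model fit and its own abline() call
fits <- lapply(split(iris, iris$Species),
               function(d) lm(Petal.Length ~ Sepal.Length, data = d))
for (i in seq_along(fits)) {
  abline(fits[[i]], col = i)   # colour i matches that species' factor code
}
```

Note that abline() draws each line across the full width of the plot, not just over the range of that species' data. This is one more thing we would have to fix by hand.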

qplot

qplot(Sepal.Length, Petal.Length, data = iris, color = Species, size = Sepal.Width)

[Figure: the same scatterplot drawn with qplot(), with automatic legends for Species (setosa, versicolor, virginica) and Sepal.Width (2.0 to 4.0)]

qplot()

- qplot() is the nearest equivalent to plot() in ggplot2
- For single-layer plots it is easy enough to use
- Use the geom argument to change the plot type

Boxplot example

qplot(Species, Sepal.Length, data = iris, geom = "boxplot")

[Figure: boxplots of Sepal.Length for each Species (setosa, versicolor, virginica)]

Violin example

qplot(Species, Sepal.Length, data = iris, geom = "violin")

[Figure: violin plots of Sepal.Length for each Species (setosa, versicolor, virginica)]

Histogram example

qplot(Sepal.Length, data = iris, fill = Species)

[Figure: histogram of Sepal.Length (y axis: count), filled by Species]
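Each qplot() call above corresponds to a ggplot() call with the matching geom_*() layer. This matters in practice because recent ggplot2 releases (3.4.0 onwards) deprecate qplot(). The sketch below shows the equivalents we would expect. The object names are our own, and the histogram uses ggplot2's default of 30 bins, as qplot() does.

```r
library(ggplot2)

# ggplot() equivalents of the three qplot() calls above
p_box    <- ggplot(iris, aes(x = Species, y = Sepal.Length)) + geom_boxplot()
p_violin <- ggplot(iris, aes(x = Species, y = Sepal.Length)) + geom_violin()
p_hist   <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram()   # default: 30 bins, bars stacked by fill group

print(p_box)
print(p_violin)
print(p_hist)
```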

ggplot()

- For multi-layer plots, or where more flexibility is required
- ggplot() sets up the default data and aesthetic mappings
- Add layers using the “+” operator and the appropriate functions
- All aesthetic mappings are wrapped in the aes() function
- Global settings (e.g. set all points to “red”) go outside aes()

Iris scatterplot again

ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point(aes(size = Sepal.Width))

[Figure: the iris scatterplot drawn with ggplot(), with legends for Species and Sepal.Width (2.0 to 4.0)]

Alternative

ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point(size = 5)

[Figure: the same scatterplot with every point at a fixed size; only the Species legend remains]
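The difference between mapping inside aes() and setting outside it is a common stumbling block, so here is a small sketch contrasting the two. Inside aes(), the string "red" is treated as a data value and passed through the colour scale. Outside aes(), it is applied literally to every point. The object names are illustrative.

```r
library(ggplot2)

# Inside aes(): "red" is a constant *data value*, mapped through the colour scale
p_mapped <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(aes(colour = "red"))

# Outside aes(): a fixed *setting* applied to every point, with no legend
p_set <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(colour = "red")

print(p_mapped)
print(p_set)
```

With this setup we would expect p_mapped to draw its points in the scale's first default colour (not red), with a legend entry labelled "red". p_set should draw red points and no legend.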

Adding to plots

- ggplots are objects, so you can save them as you go
- You can then add new layers etc. to the saved object

gg1 <- ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point(aes(size = Sepal.Width))

gg2 <- gg1 + geom_smooth()
gg2

[Figure: iris plot with a loess smoother for each species]
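Because a ggplot is an ordinary R object, adding a layer returns a new object and leaves the saved one untouched. A quick way to see this is to count the layers stored in each object, as in the sketch below.

Here method and formula are passed to geom_smooth() explicitly only to silence its informative message. For groups of fewer than 1000 points, ggplot2's default is already loess with y ~ x.

```r
library(ggplot2)

gg1 <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, colour = Species)) +
  geom_point(aes(size = Sepal.Width))

# Adding a layer creates a new object; gg1 still has only its point layer
gg2 <- gg1 + geom_smooth(method = "loess", formula = y ~ x)

length(gg1$layers)   # number of layers in the saved object
length(gg2$layers)   # one more layer after adding the smoother
print(gg2)
```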

Add contours

gg3 <- gg2 + geom_density2d()
gg3

[Figure: the previous plot with 2D density contours added]

Note on statistical transformations

- The loess smoother and contours in the previous plot are not part of the data
- In base plot we would have to calculate them first
- In ggplot2 the stat_*() functions do such transformations for us
- Often the corresponding geom_*() function does this automatically
- For more flexibility, use the stat_*() function and specify the geom you need
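As an example of choosing the geom for a stat, the sketch below uses stat_smooth() to fit a linear model but draws it with the plain "line" geom instead of the default ribbon-and-line. The linear model, the red colour and the object names are our own choices. layer_data() lets us inspect what the stat computed, which we can check against a direct lm() fit.

```r
library(ggplot2)

# stat_smooth() fits the model; geom = "line" draws only the fitted line
# (the line geom does not use ymin/ymax, so no confidence ribbon appears)
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x, geom = "line", colour = "red")
print(p)

# The data computed by the stat for layer 2 (the smoother)
smooth_data <- layer_data(p, 2)
head(smooth_data[, c("x", "y")])
```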

Histogram example using stat_bin

- stat_bin performs a 1D “binning” transformation, i.e. a histogram transformation

ggplot(data = iris, aes(x = Sepal.Length, fill = Species)) +
  stat_bin(binwidth = 1, position = "dodge")

[Figure: dodged histogram of Sepal.Length (y axis: count), with one bar per Species in each bin]
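The stat_bin() call above and geom_histogram() with the same arguments should compute identical counts, because geom_histogram() uses stat_bin() under the hood. The sketch below checks this, and also checks that the per-bin counts add back up to the 150 flowers. The object names are illustrative.

```r
library(ggplot2)

p_stat <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  stat_bin(binwidth = 1, position = "dodge")        # stat, default geom "bar"
p_geom <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(binwidth = 1, position = "dodge")  # geom, default stat "bin"

# Counts computed by the binning stat, one row per (bin, species) pair
counts <- layer_data(p_stat)$count
print(p_stat)
```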

Density plot with facet

- Use facet_grid() to subset plots by up to 2 variables
- The function takes a formula as its main argument
- Row variable on the left, column variable on the right
- Use . if no variable is needed on that side

ggplot(data = iris, aes(x = Sepal.Length)) +
  geom_density(aes(fill = Species)) +
  facet_grid(Species ~ .)

[Figure: density plots of Sepal.Length (y axis: density), one row of panels per Species (setosa, versicolor, virginica)]
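To see how the formula controls the layout, the sketch below puts the same variable on each side of `~` and compares the panel grids ggplot2 builds. facet_wrap() is included as the one-variable alternative that wraps panels into a grid. The object names are our own.

```r
library(ggplot2)

base <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_density(aes(fill = Species))

p_rows <- base + facet_grid(Species ~ .)  # Species on the left: one row per species
p_cols <- base + facet_grid(. ~ Species)  # Species on the right: one column per species
p_wrap <- base + facet_wrap(~ Species)    # one variable, panels wrapped into a grid

print(p_rows)
print(p_cols)
print(p_wrap)
```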

Scale + axis control

- Adding titles, axis labels etc. works just like adding layers
- Use the “+” operator with the appropriate functions

ggplot(USArrests, aes(x = Assault, y = Murder)) +
  geom_point(aes(size = UrbanPop)) +
  labs(title = "Violent Crime Rates by US State, 1973",
       x = "Arrests for Assault (per 100 000)",
       y = "Arrests for Murder (per 100 000)")

[Figure: scatterplot titled “Violent Crime Rates by US State, 1973”; x axis: Arrests for Assault (per 100 000); y axis: Arrests for Murder (per 100 000); point size by UrbanPop]
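Scale functions also address the “dumb arguments” problem from the base R example: we never rescale UrbanPop ourselves, we just tell the size scale what output range to use. The sketch below does this. The range c(1, 8), the legend title and the y axis starting at 0 are our own illustrative choices.

```r
library(ggplot2)

p <- ggplot(USArrests, aes(x = Assault, y = Murder)) +
  geom_point(aes(size = UrbanPop)) +
  scale_size(name = "Urban population (%)",
             range = c(1, 8)) +            # smallest and largest point sizes
  scale_y_continuous(limits = c(0, NA))    # start the y axis at 0, upper limit from data
print(p)

# The scale maps the data range of UrbanPop onto the requested size range
range(layer_data(p)$size)
```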

Logged x axis

ggplot(USArrests, aes(x = Assault, y = Murder)) +
  geom_point(aes(size = UrbanPop)) +
  labs(title = "Violent Crime Rates by US State, 1973",
       x = "Arrests for Assault (per 100 000)",
       y = "Arrests for Murder (per 100 000)") +
  scale_x_log10()

[Figure: the same scatterplot with a log10-scaled x axis]
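One consequence worth knowing: a scale transformation such as scale_x_log10() is applied to the data before any stat runs. The axis labels show the original units, but the built data are on the log10 scale. The sketch below checks this with layer_data(). The object names are our own.

```r
library(ggplot2)

p_log <- ggplot(USArrests, aes(x = Assault, y = Murder)) +
  geom_point(aes(size = UrbanPop)) +
  scale_x_log10()
print(p_log)

# x values after the scale transformation (labels still show arrests per 100 000)
built_x <- layer_data(p_log)$x
head(cbind(original = USArrests$Assault, built = built_x))
```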

Conclusion

- ggplot2 works best with “long” format data
- One row per observation, rather than different observations in different columns
- See the “reshape2” package for easy conversion between “wide” and “long” data formats
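As a sketch of the wide-to-long conversion, assuming the reshape2 package is installed, melt() stacks iris's four measurement columns into one value column plus a column naming the measurement. The new column names "measurement" and "cm" are our own choice. Once the data are long, one facet per measurement needs only a single facet_wrap() call.

```r
library(reshape2)
library(ggplot2)

# Wide: one column per measurement. Long: one row per (flower, measurement) pair.
iris_long <- melt(iris, id.vars = "Species",
                  variable.name = "measurement", value.name = "cm")
head(iris_long)

p_long <- ggplot(iris_long, aes(x = cm, fill = Species)) +
  geom_density(alpha = 0.5) +
  facet_wrap(~ measurement, scales = "free")
print(p_long)
```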

Where to learn more

- Web documentation is a good place to start
  - http://docs.ggplot2.org
- Lots of examples on blogs, Stack Overflow etc.
- We have only scratched the surface here!
- Why not bring some example data visualisations to the next meeting?
- Tweet your plots @Sheffield_R_