Intro to ggplot2 - Sheffield R Users Group, Feb 2015


Introduction to ggplot2

Paul Richards, ScHARR, The University of Sheffield

Thursday, February 26, 2015


Introduction

- ggplot2 is a package written by Hadley Wickham
- Powerful but easy-to-use functions for 2D graphics
- Based on the “Grammar of Graphics” theory by Leland Wilkinson
- Use install.packages() to install the latest version


Concepts

- ggplot2 works with data.frames
- aesthetic = a feature we can see on the graphic (shape, size, colour etc.)
- map from data to aesthetics (e.g. a different colour per group)
- layer = geometric object + data/aesthetics + statistical transformation
- a graphic will also have scale(s) and a co-ordinate system
- it may also have facets, subsetting the plot by some characteristic(s)
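As a rough sketch of how these concepts fit together, each component can be written out explicitly. The choice of variables, the facet and the axis name are illustrative assumptions, not taken from the slides:

```r
library(ggplot2)

p <- ggplot(data = iris,                      # data: a data.frame
            aes(x = Sepal.Length,             # aesthetic mappings
                y = Petal.Length,
                colour = Species)) +
  geom_point(stat = "identity",               # layer: geom + stat + position
             position = "identity") +
  scale_x_continuous(name = "Sepal length") + # scale for the x aesthetic
  coord_cartesian() +                         # co-ordinate system
  facet_wrap(~ Species)                       # facets: one panel per species

p
```

In everyday use the stat, position, scale and co-ordinate system are usually left at their defaults.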


Quick note on “plot”

- The “plot” function in base R is just a wrapper for lots of methods
- Behaviour depends on what object is supplied
- Can require some manual tinkering to get it to work as required
- Example: using the “iris” data, plot petal length vs sepal length
  - Different colour for each species
  - Point size varies by sepal width


Scatterplot with “plot”

with(iris, plot(Sepal.Length, Petal.Length,
                col = Species, cex = Sepal.Width))

[Figure: scatterplot with Sepal.Length on the x axis (4.5 to 8.0) and Petal.Length on the y axis (1 to 7).]


Problems

- Cannot specify the dataset within the function
- No legend; it has to be added manually via legend(), which is fiddly to use
- Arguments are a bit “dumb”: we need to rescale Sepal.Width to get better point sizes
- Need to use different functions to add new geometric objects, e.g. regression lines
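A sketch of the manual workarounds these points describe, in base R. The rescaling range (0.5 to 2.5) and the legend position are arbitrary assumptions chosen for illustration:

```r
# Rescale Sepal.Width by hand so the point sizes fall between 0.5 and 2.5
rng <- range(iris$Sepal.Width)
point_cex <- 0.5 + 2 * (iris$Sepal.Width - rng[1]) / diff(rng)

plot(iris$Sepal.Length, iris$Petal.Length,
     col = iris$Species, cex = point_cex,
     xlab = "Sepal.Length", ylab = "Petal.Length")

# The legend is built manually; colours 1:3 follow the factor level codes
legend("topleft", legend = levels(iris$Species),
       col = seq_along(levels(iris$Species)), pch = 1)

# A regression line needs a separate model fit and a separate drawing function
fit <- lm(Petal.Length ~ Sepal.Length, data = iris)
abline(fit)
```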


qplot

library(ggplot2)
qplot(Sepal.Length, Petal.Length, data = iris,
      color = Species, size = Sepal.Width)

[Figure: scatterplot of Petal.Length against Sepal.Length, with legends for Species (setosa, versicolor, virginica) and Sepal.Width (2.0 to 4.0).]


qplot()

- qplot() is the nearest equivalent to plot() in ggplot2
- For single-layer plots this is easy enough to use
- Use the geom argument to change the plot type
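Note that qplot() is marked as deprecated in more recent ggplot2 releases (3.4.0 onwards), so the same single-layer plots may be safer written with ggplot(). A minimal sketch of likely equivalents, assuming the iris data used throughout; the object names are hypothetical:

```r
library(ggplot2)

# Scatterplot: colour and size mapped to variables
p_scatter <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length,
                              colour = Species, size = Sepal.Width)) +
  geom_point()

# Changing the plot type means swapping the geom_*() function
p_box    <- ggplot(iris, aes(x = Species, y = Sepal.Length)) + geom_boxplot()
p_violin <- ggplot(iris, aes(x = Species, y = Sepal.Length)) + geom_violin()

p_box
```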


Boxplot example

qplot(Species, Sepal.Length, data = iris, geom = "boxplot")

[Figure: boxplots of Sepal.Length (y axis) for each Species (setosa, versicolor, virginica).]


Violin example

qplot(Species, Sepal.Length, data = iris, geom = "violin")

[Figure: violin plots of Sepal.Length (y axis) for each Species (setosa, versicolor, virginica).]


Histogram example

qplot(Sepal.Length, data = iris, fill = Species)

[Figure: histogram with Sepal.Length on the x axis and count on the y axis, filled by Species (setosa, versicolor, virginica).]


ggplot()

- For multilayer plots or where more flexibility is required
- ggplot() sets up the default data and aesthetic mappings
- Add layers using the “+” operator and the appropriate functions
- All aesthetic mappings are wrapped in the aes() function
- Global changes (e.g. set all points to “red”) go outside aes()
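The difference between mapping an aesthetic and setting it globally can be sketched as follows. The object names are hypothetical and the iris variables are assumed for illustration:

```r
library(ggplot2)

base <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length))

# Mapping: colour varies with a variable, so it goes inside aes()
p_mapped <- base + geom_point(aes(colour = Species))

# Setting: one fixed colour for every point, so it goes outside aes()
p_fixed <- base + geom_point(colour = "red")

p_fixed
```

A common slip is writing aes(colour = "red"), which treats "red" as a data value with a single level rather than as a colour name.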


Iris scatterplot again

ggplot(data = iris, aes(x = Sepal.Length,
                        y = Petal.Length, color = Species)) +
  geom_point(aes(size = Sepal.Width))

[Figure: scatterplot of Petal.Length against Sepal.Length, with legends for Species (setosa, versicolor, virginica) and Sepal.Width (2.0 to 4.0).]


Alternative

ggplot(data = iris, aes(x = Sepal.Length,
                        y = Petal.Length, color = Species)) +
  geom_point(size = 5)

[Figure: scatterplot of Petal.Length against Sepal.Length with fixed-size points, with a legend for Species (setosa, versicolor, virginica).]


Adding to plots

- ggplots are objects, so you can save them as you go
- You can then add new layers etc. to the saved object

gg1 <- ggplot(data = iris, aes(x = Sepal.Length,
                               y = Petal.Length, color = Species)) +
  geom_point(aes(size = Sepal.Width))

gg2 <- gg1 + geom_smooth()
gg2


Iris plot with loess smoother

[Figure: the iris scatterplot of Petal.Length against Sepal.Length with a loess smoother added, with legends for Species (setosa, versicolor, virginica) and Sepal.Width (2.0 to 4.0).]


Add contours

gg3 <- gg2 + geom_density2d()
gg3


Add contours

[Figure: the iris scatterplot of Petal.Length against Sepal.Length with smoother and density contours added, with legends for Species (setosa, versicolor, virginica) and Sepal.Width (2.0 to 4.0).]


Note on statistical transformations

- The loess smoother and contours in the previous plot are not part of the data
- In base plot we would have to calculate them first
- In ggplot2 the stat_*() functions do such transformations for us
- Often the corresponding geom_*() function does this automatically
- For more flexibility, use the stat_*() function and specify the geom you need
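A small sketch of the last two points, assuming the iris data and an arbitrary bin width of 0.5. geom_histogram() uses the binning stat by default, while stat_bin() with an explicit geom draws the same computed counts in a different way:

```r
library(ggplot2)

base <- ggplot(iris, aes(x = Sepal.Length))

# geom_*() route: the binning transformation happens automatically
p_bars <- base + geom_histogram(binwidth = 0.5)

# stat_*() route: same transformation, but drawn with a geom of our choice
p_points <- base + stat_bin(binwidth = 0.5, geom = "point")

p_points
```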


Histogram example using stat_bin

- stat_bin performs a 1d “binning” transformation
- i.e. a histogram transformation

ggplot(data = iris, aes(x = Sepal.Length, fill = Species)) +
  stat_bin(binwidth = 1, position = "dodge")


Histogram example using stat_bin

[Figure: dodged histogram with Sepal.Length on the x axis and count on the y axis (0 to 30), filled by Species (setosa, versicolor, virginica).]


Density plot with facet

- Use facet_grid() to subset plots by up to 2 variables
- The function takes a formula as its main argument
- Row variable on the left, column variable on the right
- Use . if no variable is needed

ggplot(data = iris, aes(x = Sepal.Length)) +
  geom_density(aes(fill = Species)) +
  facet_grid(Species ~ .)


Density plot with facet

[Figure: density of Sepal.Length in three stacked panels, one per Species (setosa, versicolor, virginica), with a fill legend for Species.]


Scale + axis control

- Adding titles, axis labels etc. is similar to adding layers
- Use the “+” operator with the appropriate functions

ggplot(USArrests, aes(x = Assault, y = Murder)) +
  geom_point(aes(size = UrbanPop)) +
  labs(title = "Violent Crime Rates by US State, 1973",
       x = "Arrests for Assault (per 100 000)",
       y = "Arrests for Murder (per 100 000)")


Scale + axis control

[Figure: “Violent Crime Rates by US State, 1973”: scatterplot of Arrests for Murder (per 100 000) against Arrests for Assault (per 100 000), with point size showing UrbanPop (40 to 90).]


Logged x axis

ggplot(USArrests, aes(x = Assault, y = Murder)) +
  geom_point(aes(size = UrbanPop)) +
  labs(title = "Violent Crime Rates by US State, 1973",
       x = "Arrests for Assault (per 100 000)",
       y = "Arrests for Murder (per 100 000)") +
  scale_x_log10()


Logged x axis

[Figure: “Violent Crime Rates by US State, 1973”: the same scatterplot with a log-scaled x axis for Arrests for Assault (per 100 000), with point size showing UrbanPop (40 to 90).]


Conclusion

- ggplot2 works best with “long” format data
- One row per observation, rather than different observations in different columns
- See the “reshape2” package for easy conversion between “wide” and “long” data formats
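A minimal sketch of a wide-to-long conversion with reshape2, assuming the package is installed. The column names "measurement" and "cm" are hypothetical choices for this example:

```r
library(reshape2)
library(ggplot2)

# iris is "wide": one column per measurement.
# melt() stacks the four measurement columns into one row per observation.
iris_long <- melt(iris, id.vars = "Species",
                  variable.name = "measurement", value.name = "cm")

# With long data, the measurement type can be used like any other variable
ggplot(iris_long, aes(x = Species, y = cm)) +
  geom_boxplot() +
  facet_wrap(~ measurement, scales = "free_y")
```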


Where to learn more

- Web documentation is a good place to start
- http://docs.ggplot2.org
- Lots of examples on blogs, stackoverflow etc.
- We have only scratched the surface here!
- Why not bring some example data visualisations to the next meeting?
- Tweet your plots @Sheffield_R_