Intro to ggplot2 - Sheffield R Users Group, Feb 2015
Introduction to ggplot2
Paul Richards, ScHARR, The University of Sheffield
Thursday, February 26, 2015
Introduction
- ggplot2 is a package written by Hadley Wickham
- Powerful but easy-to-use functions for 2D graphics
- Based on the “Grammar of Graphics” theory by Leland Wilkinson
- Use install.packages("ggplot2") to install the latest version
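A minimal setup sketch, assuming a standard CRAN installation. It installs ggplot2 only if it is missing, attaches it, and prints the installed version. Checking the version is worthwhile because some functions shown in this talk have changed since 2015.

```r
# Install ggplot2 from CRAN only if it is not already available
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Attach the package so ggplot(), aes(), geom_*() etc. can be called directly
library(ggplot2)

# Report the installed version
print(packageVersion("ggplot2"))
```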
Concepts
- ggplot2 works with data frames (data.frame objects)
- aesthetic = a feature we can see on the graphic (shape, size, colour, etc.)
- We map from data to aesthetics (e.g. a different colour per group)
- layer = geometric object + data/aesthetic mappings + statistical transformation
- A graphic will also have scale(s) and a coordinate system
- It may also have facets: the plot subset by some characteristic(s)
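As a sketch of how these concepts appear in code, the plot below spells out each component explicitly. The layer is written with the low-level layer() function to show that it combines a geom, a stat and a position; geom_point() is the usual shorthand for the same thing. The "Dark2" palette is just an illustrative choice of scale.

```r
library(ggplot2)

p <- ggplot(data = iris,                              # data: a data.frame
            aes(x = Sepal.Length, y = Petal.Length,
                colour = Species)) +                  # mapping: columns -> aesthetics
  layer(geom = "point", stat = "identity",            # layer: geom + stat + position
        position = "identity",
        params = list(na.rm = FALSE)) +
  scale_colour_brewer(palette = "Dark2") +            # scale: Species -> colours
  coord_cartesian() +                                 # coordinate system (the default one)
  facet_wrap(~ Species)                               # facets: one panel per species

print(p)
```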
Quick note on “plot”
- The plot() function in base R is a generic: it is just a wrapper for lots of methods
- Its behaviour depends on what object is supplied
- It can require some manual tinkering to get it to work as required
- Example, using the “iris” data: plot petal length vs sepal length
  - Different colour for each species
  - Point size varies by sepal width
Scatterplot with “plot”
with(iris, plot(Sepal.Length, Petal.Length,
                col = Species, cex = Sepal.Width))
[Figure: base R scatterplot of Petal.Length against Sepal.Length, coloured by species, point size by Sepal.Width, no legend]
Problems
- Cannot specify the dataset within the function call
- No legend; it has to be added manually via legend(), which is fiddly to use
- Arguments are a bit “dumb”: we need to rescale Sepal.Width to get better point sizes
- Need to use different functions to add new geometric objects, e.g. regression lines
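To make these problems concrete, here is a sketch of the extra base R work needed to approach the ggplot2 result. The divide-by-2 point-size rescaling is an arbitrary choice for illustration, and the regression line is a single overall fit, not one per species.

```r
with(iris, {
  plot(Sepal.Length, Petal.Length,
       col = Species,             # factor codes 1..3 index the default palette
       cex = Sepal.Width / 2)     # hand-picked rescaling of point size
  # A separate function call is needed for each extra geometric object
  abline(lm(Petal.Length ~ Sepal.Length))
})

# The legend is built by hand and must be kept in sync with the colours above
legend("topleft", legend = levels(iris$Species),
       col = seq_along(levels(iris$Species)), pch = 1)
```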
qplot
qplot(Sepal.Length, Petal.Length, data = iris,
      color = Species, size = Sepal.Width)
[Figure: qplot scatterplot of Petal.Length against Sepal.Length, with legends for Species (setosa, versicolor, virginica) and Sepal.Width]
qplot()
- qplot() is the nearest equivalent to plot() in ggplot2
- For single-layer plots it is easy enough to use
- Use the geom argument to change the plot type
Boxplot example
qplot(Species, Sepal.Length, data = iris, geom = "boxplot")
[Figure: boxplots of Sepal.Length for setosa, versicolor and virginica]
Violin example
qplot(Species, Sepal.Length, data = iris, geom = "violin")
[Figure: violin plots of Sepal.Length for setosa, versicolor and virginica]
Histogram example
qplot(Sepal.Length, data = iris, fill = Species)
[Figure: stacked histogram of Sepal.Length (count), filled by Species]
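qplot() was deprecated in ggplot2 3.4.0, so the three examples above may emit warnings in current versions. A sketch of the equivalent ggplot() calls follows; the geom argument simply becomes the matching geom_*() function. Passing bins = 30 is an assumption that matches the default qplot() histogram while silencing the binning message.

```r
library(ggplot2)

# geom = "boxplot" becomes geom_boxplot()
p_box <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot()

# geom = "violin" becomes geom_violin()
p_violin <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_violin()

# With a single x variable, qplot() defaulted to a histogram
p_hist <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(bins = 30)
```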
ggplot()
- For multi-layer plots, or where more flexibility is required
- ggplot() sets up the default data and aesthetic mappings
- Add layers using the “+” operator and the appropriate functions
- All aesthetic mappings are wrapped in the aes() function
- Global changes (e.g. set all points to “red”) go outside aes()
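The difference between mapping inside aes() and setting a value outside it is a common stumbling block. This sketch contrasts the two, plus the frequent mistake of putting a constant inside aes(), which maps the string "red" as if it were data and colours the points with the default first hue instead.

```r
library(ggplot2)

# Mapping: colour depends on the data, so it goes inside aes();
# ggplot2 builds a legend automatically
p_mapped <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(aes(colour = Species))

# Setting: one fixed value for every point goes outside aes(); no legend
p_set <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(colour = "red")

# Mistake: a constant inside aes() is treated as a one-level variable,
# giving default-coloured points and a legend entry labelled "red"
p_wrong <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(aes(colour = "red"))
```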
Iris scatterplot again
ggplot(data = iris, aes(x = Sepal.Length,
                        y = Petal.Length, color = Species)) +
  geom_point(aes(size = Sepal.Width))
[Figure: the same scatterplot as the qplot version, with legends for Species and Sepal.Width]
Alternative
ggplot(data = iris, aes(x = Sepal.Length,
                        y = Petal.Length, color = Species)) +
  geom_point(size = 5)
[Figure: scatterplot with all points at a fixed size of 5, legend for Species only]
Adding to plots
- ggplots are objects, so you can save them as you go
- You can then add new layers etc. to the saved object
gg1 <- ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length,
                               color = Species)) +
  geom_point(aes(size = Sepal.Width))
gg2 <- gg1 + geom_smooth()
gg2
[Figure: Iris plot with loess smoother, one smoothed curve per species, legends for Species and Sepal.Width]
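Because a ggplot is an ordinary R object, nothing is drawn until it is printed, and you can inspect what each layer will draw. This sketch uses layer_data() to look at the computed loess curves in the second layer. Passing method and formula explicitly is an assumption made only to silence the message geom_smooth() prints when choosing them itself.

```r
library(ggplot2)

gg1 <- ggplot(data = iris, aes(x = Sepal.Length, y = Petal.Length,
                               colour = Species)) +
  geom_point(aes(size = Sepal.Width))

# Adding a layer returns a new object; gg1 itself is not modified
gg2 <- gg1 + geom_smooth(method = "loess", formula = y ~ x)

# Layer 1 holds the raw points; layer 2 holds the fitted curves
# (one group per species, inherited from the colour mapping)
smooth_data <- layer_data(gg2, i = 2)
head(smooth_data[, c("x", "y", "ymin", "ymax", "group")])

# Printing the object is what actually draws the plot
print(gg2)
```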
Add contours
gg3 <- gg2 + geom_density2d()
gg3
[Figure: the previous plot with 2D density contours added]
Note on statistical transformations
- The loess smoother and contours in the previous plot are not part of the data
- In base plot we would have to calculate them first
- In ggplot2 the stat_*() functions do such transformations for us
- Often the corresponding geom_*() function does this automatically
- For more flexibility, use the stat_*() function and specify the geom you need
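As one illustration of pairing a stat with a chosen geom, stat_summary() computes a summary of y at each x and hands the result to whatever geom you name. In this sketch it draws the mean Sepal.Length per species as a large red point over the jittered raw data. Using mean and the jitter settings shown are illustrative choices.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  # Raw observations, jittered sideways so they do not overlap
  geom_jitter(width = 0.1, alpha = 0.4) +
  # The stat computes one mean per species; geom = "point" draws it
  stat_summary(fun = mean, geom = "point", size = 4, colour = "red")

print(p)
```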
Histogram example using stat_bin
- stat_bin() performs a 1D “binning” transformation
- i.e. a histogram transformation
ggplot(data = iris, aes(x = Sepal.Length, fill = Species)) +
  stat_bin(binwidth = 1, position = "dodge")
[Figure: histogram example using stat_bin; dodged bars of count against Sepal.Length (binwidth 1), filled by Species]
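To see the transformation stat_bin() performs, layer_data() returns the binned data frame it computed: one row per bin per species, with count, xmin and xmax columns. The second plot is a sketch showing the same stat feeding a different geom (lines), which gives frequency polygons.

```r
library(ggplot2)

p <- ggplot(data = iris, aes(x = Sepal.Length, fill = Species)) +
  stat_bin(binwidth = 1, position = "dodge")

# The data ggplot2 actually draws, after binning
binned <- layer_data(p)
head(binned[, c("fill", "x", "xmin", "xmax", "count")])

# Same stat, different geom
p_poly <- ggplot(data = iris, aes(x = Sepal.Length, colour = Species)) +
  stat_bin(binwidth = 1, geom = "line")
```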
Density plot with facet
- Use facet_grid() to subset plots by up to 2 variables
- The function takes a formula as its main argument
- Row variable on the left, column variable on the right
- Use . if no variable is needed
ggplot(data = iris, aes(x = Sepal.Length)) +
  geom_density(aes(fill = Species)) +
  facet_grid(Species ~ .)
[Figure: density plot with facet; one density panel per species in rows (setosa, versicolor, virginica), density against Sepal.Length, filled by Species]
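A sketch of the other facet_grid() layouts described above: the variable on the right of ~ gives columns, and two variables give a full grid. The Wide column is a hypothetical grouping derived here only for illustration (sepal width above or below its median); it is not part of iris.

```r
library(ggplot2)

# Rows: variable on the left of ~
p_rows <- ggplot(data = iris, aes(x = Sepal.Length)) +
  geom_density(aes(fill = Species)) +
  facet_grid(Species ~ .)

# Columns: variable on the right of ~ (this replaces the previous facet)
p_cols <- p_rows + facet_grid(. ~ Species)

# Two variables: rows by species, columns by a hypothetical derived grouping
iris2 <- transform(iris,
                   Wide = ifelse(Sepal.Width > median(Sepal.Width),
                                 "wide sepal", "narrow sepal"))
p_grid <- ggplot(iris2, aes(x = Sepal.Length)) +
  geom_density() +
  facet_grid(Species ~ Wide)
```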
Scale + axis control
- Adding titles, axis labels etc. is similar to adding layers
- Use the “+” operator with the appropriate functions
ggplot(USArrests, aes(x = Assault, y = Murder)) +
  geom_point(aes(size = UrbanPop)) +
  labs(title = "Violent Crime Rates by US State, 1973",
       x = "Arrests for Assault (per 100 000)",
       y = "Arrests for Murder (per 100 000)")
[Figure: Violent Crime Rates by US State, 1973; Arrests for Murder against Arrests for Assault (per 100 000), point size by UrbanPop]
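labs() sets several labels in one call; ggtitle(), xlab() and ylab() are shorthand for single labels, and legend titles are labels too. This sketch also adds theme_bw() to show that a complete theme is added with “+” like everything else. The legend title text and the choice of theme are illustrative assumptions.

```r
library(ggplot2)

p <- ggplot(USArrests, aes(x = Assault, y = Murder)) +
  geom_point(aes(size = UrbanPop)) +
  ggtitle("Violent Crime Rates by US State, 1973") +
  xlab("Arrests for Assault (per 100 000)") +
  ylab("Arrests for Murder (per 100 000)") +
  labs(size = "Urban population (%)") +  # legend title for the size mapping
  theme_bw()                             # swap the default grey theme

print(p)
```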
Logged x axis
ggplot(USArrests, aes(x = Assault, y = Murder)) +
  geom_point(aes(size = UrbanPop)) +
  labs(title = "Violent Crime Rates by US State, 1973",
       x = "Arrests for Assault (per 100 000)",
       y = "Arrests for Murder (per 100 000)") +
  scale_x_log10()
[Figure: the same plot with a log10-scaled x axis]
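One consequence worth knowing: scale transformations such as scale_x_log10() are applied to the data before any stat runs. In this sketch, the linear fit from geom_smooth() is therefore computed on log10(Assault), and layer_data() confirms that the x values ggplot2 works with are already on the log scale.

```r
library(ggplot2)

p_log <- ggplot(USArrests, aes(x = Assault, y = Murder)) +
  geom_point(aes(size = UrbanPop)) +
  geom_smooth(method = "lm", formula = y ~ x) +  # fitted on log10(Assault)
  scale_x_log10()

# x values after the scale transformation, as seen by the stats
head(layer_data(p_log, i = 1)$x)
```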
Conclusion
- ggplot2 works best with “long” format data
- One row per observation, rather than different observations in different columns
- See the “reshape2” package for easy conversion between “wide” and “long” data formats
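A short sketch of the wide-to-long conversion with reshape2, as suggested above, assuming reshape2 is installed. iris stores four measurements per flower in separate columns; melt() stacks them into a single value column with a variable column naming the measurement, which can then drive a facet.

```r
library(ggplot2)
library(reshape2)

# Wide: 150 rows, one column per measurement
# Long: 600 rows (150 flowers x 4 measurements)
iris_long <- melt(iris, id.vars = "Species")
head(iris_long)

# The measurement name is now data, so it can be used for faceting
p <- ggplot(iris_long, aes(x = value, fill = Species)) +
  geom_density(alpha = 0.5) +
  facet_wrap(~ variable, scales = "free")
```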
Where to learn more
- Web documentation is a good place to start
- http://docs.ggplot2.org
- Lots of examples on blogs, Stack Overflow etc.
- We have only scratched the surface here!
- Why not bring some example data visualisations to the next meeting?
- Tweet your plots @Sheffield_R_