OsamaMahmoud *** Website: E-mail ......Osama Mahmoud Data visualisation techniques using R 2/44 Base...

Post on 14-Dec-2020

1 views 0 download

Transcript of OsamaMahmoud *** Website: E-mail ......Osama Mahmoud Data visualisation techniques using R 2/44 Base...

Data visualisation techniques using R

****************Osama Mahmoud

***Website: http://osmahmoud.comE-mail: o.mahmoud@bristol.ac.uk

*****30 April 2019

Goals

In this session, we aim to cover:What the R base graphics and their types are.How to produce base plots in R.How to save your generated graphics.How to use ggplot2 in R to produce advanced graphics.What the building-blocks of ggplot2 graphs are.

Osama Mahmoud Data visualisation techniques using R 1 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Contents

1 Introduction to data visualisation

2 Overview of base graphics

3 Advanced graphics

4 Plot building-blocks

Osama Mahmoud Data visualisation techniques using R 2 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Types of R graphics

Base graphics.Grid graphics.Lattice graphics.ggplot2 graphics.

Osama Mahmoud Data visualisation techniques using R 3 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Base graphics

Osama Mahmoud Data visualisation techniques using R 4 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Installing the R package: BristolVis

> install.packages("drat")

> drat::addRepo("statcourses")

> install.packages("BristolVis", type="source")

Osama Mahmoud Data visualisation techniques using R 5 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Overview of base graphics

Graphics can be easily created in R and subsequently included inWord, Power Point, LATEX, etc.Let’s plot a simple graphic:

> plot(rate ~ conc, data = Puromycin)This implicitly opens a new graphics window.

0.0 0.2 0.4 0.6 0.8 1.0

5010

015

020

0

conc

rate

Osama Mahmoud Data visualisation techniques using R 6 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Overview of base graphics

Change settings of active graphic window> par(mfrow = c(1,2), mar = c(4,4,0,0) + 0.1)> x <- seq(-3,3, 0.1)> plot(x, x^2, type = ’l’)> plot(x, x^3, type = ’l’)

−3 −2 −1 0 1 2 3

02

46

8

x

x^2

−3 −2 −1 0 1 2 3

−20

−10

010

20

x

x^3

Osama Mahmoud Data visualisation techniques using R 7 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Overview of base graphics

Close active graphics window:

> dev.off()

Close all graphics windows:> graphics.off()

Osama Mahmoud Data visualisation techniques using R 8 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Histograms

Osama Mahmoud Data visualisation techniques using R 9 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Histograms

Simple histogram (next slide left)

> data(birthweight, package = “BristolVis”)> hist(birthweight$weightgain)

with a fixed set of “bars”: (breaks give number of cut-points)

> hist(birthweight$weightgain, breaks = 3)with a defined scale for the y-axis:

> hist(birthweight$weightgain, breaks = 3, ylim = c(0,10))with colored bars:

> hist(birthweight$weightgain, breaks = 3, ylim = c(0,10), col = gray(4:8 / 8))

with arbitrary text, vertical and double text size (next slide right)

> text(1500, 5, "some text", srt = 90, cex = 2)

Osama Mahmoud Data visualisation techniques using R 10 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Histograms

Histogram of birthweight$weightgain

birthweight$weightgain

Fre

qu

en

cy

500 1000 1500 2000 2500 3000 3500

01

23

45

Histogram of the weight gain

WG

Fre

qu

en

cy

0 1000 2000 3000 4000

02

46

81

0

so

me

text

Osama Mahmoud Data visualisation techniques using R 11 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Bar plots

Osama Mahmoud Data visualisation techniques using R 12 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Bar plots

Simple bar plot (next slide left)

> data(med, package = “BristolVis”)> treatment = table(med$treatment)> barplot(treatment, ylim = c(0, 500))

stacked bar plot:

> health_treat = table(med$health, med$treatment)> barplot(health_treat, ylim = c(0, 500))

with juxtaposed bars:barplot(health_treat, ylim = c(0, 300), beside = TRUE)

with legend (next slide right)

> barplot(health_treat, ylim = c(0, 300), beside = TRUE,legend.text = rownames(health_treat))

Osama Mahmoud Data visualisation techniques using R 13 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Bar plots

Existing New

01

00

20

03

00

40

05

00

Existing New

Poor

Fair

Good

nu

mb

er

of

pa

tie

nts

05

01

00

15

02

00

25

03

00

Osama Mahmoud Data visualisation techniques using R 14 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Box plots

Osama Mahmoud Data visualisation techniques using R 15 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Box plots

Simple box plot (next slide left)

> boxplot(birthweight$weight)

grouped by sex:

> boxplot(weight ~ sex, data = birthweight)

with new labels for sex, main title and a defined y-scale (next slide right)

> boxplot(weight ~ sex, data = birthweight, names =c("girls", "boys"), ylab = "birth weight [g]", ylim =c(2000, 4000), main = "Distribution of birth weight")

with additional line:> abline(h = 3000, col = "gray", lty = 4)

Osama Mahmoud Data visualisation techniques using R 16 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Box plots2

40

02

60

02

80

03

00

03

20

03

40

03

60

0

girls boys

20

00

25

00

30

00

35

00

40

00

Distribution of birth weight

bir

th w

eig

ht

[g]

Osama Mahmoud Data visualisation techniques using R 17 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Scatter plots

Osama Mahmoud Data visualisation techniques using R 18 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Scatter plots

Simple scatter plot (next slide left)

> plot(weightgain ~ weight, data = birthweight)

for a subset:

> plot(weightgain ~ weight, data = birthweight, subset =(sex == "F"))

with colored solid points by sex and add a legend (next slide right)

> plot(weightgain ~ weight, data = birthweight, subset =(sex == "M"), col = "blue", pch = 20)> points(weightgain ~ weight, data = birthweight, subset =(sex == "F"), col = "red", pch = 20) # add girls data> legend("bottomleft", title = "sex", legend = c("girls","boys"), col = c("red","blue"), pch = 20) # add legend

Osama Mahmoud Data visualisation techniques using R 19 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Scatter plots

2400 2600 2800 3000 3200 3400 3600

10

00

15

00

20

00

25

00

30

00

weight

we

igh

tga

in

3000 3200 3400 3600

15

00

20

00

25

00

30

00

Weight gain (for boys and girls)

weight [g]

we

igh

t g

ain

(g

)

sex

girls

boys

Osama Mahmoud Data visualisation techniques using R 20 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Examples of the graphic library

An overview of available demos for the built-in R library (graphics):> demo(graphics)

Osama Mahmoud Data visualisation techniques using R 21 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Saving plots

Osama Mahmoud Data visualisation techniques using R 22 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Saving plots

Graphics can be saved directly from within R using different devices,e.g. postscript(), pdf(), bmp(), (see ?Devices)For publications vector graphics such as PDFs or postscript files arepreferable to pixel graphics such as jpg, bmp, etc.

> pdf("Fig1.pdf", width = 10, height = 7)> boxplot(weight ~ sex, data = birthweight)> dev.off()

Osama Mahmoud Data visualisation techniques using R 23 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Advanced Graphics Usingggplot2

Osama Mahmoud Data visualisation techniques using R 24 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Remember: Using base graphics> plot(med[med$health=="Poor",]$age, med[med$health=="Poor",]$time,xlim=c(30,72), ylim=c(0.5,13))> points(med[med$health=="Fair",]$age, med[med$health=="Fair",]$time,col=2)

> points(med[med$health=="Good",]$age, med[med$health=="Good",]$time,

col=4)

●●

●●

●●

● ●

●●

●●

●●

● ●

●●

●●●

●●

●●

●●●●

●●

●●

●●

30 40 50 60 70

02

46

810

12

med[med$health == "Poor", ]$age

med

[med

$hea

lth =

= "P

oor"

, ]$t

ime

●● ●

●●

● ●●

●●

●● ●●

●●●

● ●●

●●●●

● ●

●●

●●

●●

●●

●● ●

●●

●●

●●

● ●●

●●●

●●

●●

●●

● ●

● ●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

● ●●

●●

● ●●

●●●

●●●

●●●

●●

●●

●●

● ● ●

●●

● ●

●● ●

●●

●●

●●●

●● ●●

●●●

●●

● ● ●

●●●

●●

● ● ●

●●

●●

●●

●●

●●●

●●

●●

● ●

● ●

● ●●

●●●

●●

●●●

●●

●●

●●

●●●

●● ●●●

●●

●●

●●

●● ●●

●●●●

●●

●● ●● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●● ●

●● ●

●●

●●

●●

●● ●●●

●●●

●●

● ●●●

● ●●

●●

●●

●●

● ●●●

●●

●●

●●

●●●

●●

●●

●●

● ●

●● ●

●●

●●

●●

●●

●●●

●●

●●● ●

●●

●●

● ●

●●

●● ●

●●

● ●●

●●

●●

● ●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●●

●●

●●

●● ●

Osama Mahmoud Data visualisation techniques using R 25 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Remember: Using base graphics

We had to manually set the scales using the xlim and ylimparameters.

We had not created a legend. We would need to use the legendfunction to create one.

The default axis labels were terrible!

Osama Mahmoud Data visualisation techniques using R 26 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Equivalent graphics using ggplot2

> library(ggplot2)> g = ggplot(data=med, aes(x=age, y=time))> g + geom_point(aes(colour=health))

●●

●●

● ●

● ●

●●

●●

● ●

● ●

●● ●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●

● ●

●●

●●

●●

●●

● ●

● ●

●●

● ●

●●

● ●

●●

●●

●●●

● ●

● ●

●●

● ● ●

●●

●●

●●

●●

●●

●●

●●

●●

0.0

2.5

5.0

7.5

10.0

12.5

40 50 60 70

age

time

factor(health)

Poor

Fair

Good

Osama Mahmoud Data visualisation techniques using R 27 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Factor of point sizes

g + geom_point(aes(size=health))

●●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●●

●●●●

●●●

● ●

●●

● ●●

●●

●●

●●

●● ●

● ●

●●

●●●●

●● ●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●●

●●● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●●

●●●

●●

●●

●●

●●●

●●

● ●

●●

●●●

●●

●●

●●●

●●

●●

●●●●

●●

●●

● ●●

●●

●●●

●●●●

● ●

●●

●●●

●●

●●●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●●

●●

●●

●●

●● ●●

●●

0.0

2.5

5.0

7.5

10.0

12.5

40 50 60 70

age

time

health

Poor

Fair

Good

Osama Mahmoud Data visualisation techniques using R 28 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Example of a line chart

g + geom_line(aes(colour=health, size = health)))

0.0

2.5

5.0

7.5

10.0

12.5

40 50 60 70

age

time

health

Poor

Fair

Good

Osama Mahmoud Data visualisation techniques using R 29 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Geometric objects

Points, bars and lines are examples of geom’s. Some useful standardgeoms and their equivalent base graphic counter part:

Plot Name Geom Base graphicBar chart bar barplotBox-and-whisker boxplot boxplotHistogram histogram histLine plot line plot and linesScatter plot point plot and points

Osama Mahmoud Data visualisation techniques using R 30 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

More complicated functions

The idea of graphical layers, enables constructing more complicatedfunctions, e.g.:p = ggplot(mpg, aes(x = displ, y = cty)) +geom_point(aes(colour=factor(cyl))) +stat_smooth(aes(colour=factor(cyl)))

Osama Mahmoud Data visualisation techniques using R 31 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Understanding ggplot2 philosophy

Each ggplot command adds iteratively layers. A single layer maycomprise of four elements:

an aesthetic and data mapping;a geometric object (geom);a statistical transformation (stat);a position adjustment, i.e. how should overlapped objects be handled.

Osama Mahmoud Data visualisation techniques using R 32 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Understanding ggplot2 philosophy

For example, the command:g + geom_point(aes(colour=health))

actually calls (in the background) the command:g + layer(data = med, #inheritedmapping = aes(color=health), #x and y are inherited.stat = "identity",geom = "point",position = "identity",params = list(na.rm=FALSE))

Osama Mahmoud Data visualisation techniques using R 33 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Plot building-blocks

Osama Mahmoud Data visualisation techniques using R 34 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Initial plot object

An initial ggplot object, can be setup using the ggplot() functionwhich has two arguments:data (takes a data frame)an aesthetic mapping (creates default aesthetic attributes)

g = ggplot(data=mpg, mapping=aes(x=displ, y=cty,colour=factor(cyl)))Or equivelently,g = ggplot(mpg, aes(displ, cty, colour=factor(cyl)))

doesn’t actually produce anything to be displayed, it just sets the intialplot object. We need to add layers for that to happen.

Osama Mahmoud Data visualisation techniques using R 35 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

The geom_ functions

The geom_ functions perform the actual rendering in a plot, e.g. a linegeom will create a line plot and a point geom creates a scatter plot.

Each geom has a list of aesthetics that it accepts such as x , y ,colour and size.

However, some geoms have unique elements. For example, thegeom_errorbar requires arguments ymax and ymin.

The full list of aesthetics can be displayed by:

> ggplot2:::.all_aesthetics

Osama Mahmoud Data visualisation techniques using R 36 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

The geom_ functions

This table gives names and descriptions of some commonly usedgeoms:

Name Descriptionabline Line, specified by slope and interceptboxplot Box and whiskers plotdensity Kernel density plothistogram Histogramsjitter Individual points are jittered to avoid overlapstep Connect observations by stairs

Osama Mahmoud Data visualisation techniques using R 37 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Combining geoms

I will show how to combine more that one geom function to produce abit more complex plots. If we consider the mpg data set, a base ggplotobject:g = ggplot(mpg, aes(x=factor(cyl), y=hwy)) will do nothing.Now we’ll create a boxplot: (g1 = g + geom_boxplot())

●●

20

30

40

5 6 7

cyl

hwy

Osama Mahmoud Data visualisation techniques using R 38 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Combining geoms

Previous figure was a boxplot of all the mpg data, a more useful plotwould be to have individual boxplots conditional on number of cylinders:(g2 = g + geom_boxplot(aes(x=factor(cyl), group=cyl)))

●●

●●

20

30

40

4 5 6 8

factor(cyl)

hwy

Osama Mahmoud Data visualisation techniques using R 39 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Combining geoms

We are not restricted to a single geom. When data sets are reasonablysmall, it is useful to display the data on top of the boxplots:(g3 = g2 + geom_dotplot(aes(x=factor(cyl), group=cyl),binaxis="y", stackdir="center", binwidth=0.25,stackratio=2))

●●

●●

● ● ● ●

● ●

● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ●

● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ●

● ● ● ● ● ● ●

● ● ● ●

● ●

● ●

● ●

● ●

● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ●

● ●

● ● ● ● ● ● ●

● ● ● ●

● ● ● ●

● ● ●

● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ●

● ● ●

● ● ● ● ●

● ●

● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ●

● ● ● ● ● ●

● ● ●

● ●

● ● ●

● ●

● ●

20

30

40

4 5 6 8

cyl

hwy

Osama Mahmoud Data visualisation techniques using R 40 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Standard plots

There are a few standard geom ’s that are particular useful:

geom_line : a line plot.geom_boxplot : produces a boxplot.geom_point : a scatter plot.geom_dotplot: a dot plot.geom_bar : produces a standard barplot that counts the x values.geom_text : adds labels to specified points (as geom_point but drawlabels rather than points).geom_raster: Similar to levelplot (heatmap).

Osama Mahmoud Data visualisation techniques using R 41 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Useful links

Choice of graphic coloursNames of colours in various formats:http://colorbrewer2.org/

R Course: Shiny web-pageThe RVis tool for learners of data visualisation using R:http://bristol-medical-stat.bristol.ac.uk:3838/RVis/

R package: BristolVis web-pageThe BristolVis package that associated with these materials:https://github.com/statcourses/BristolVis

Osama Mahmoud Data visualisation techniques using R 42 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

References

888

Osama Mahmoud Data visualisation techniques using R 43 / 44

Base graphics Introduction Base graphics Advanced graphics Building-blocks

Thank You

Osama Mahmoud Data visualisation techniques using R 44 / 44