Data Visualization using R How to get, manage, and present data to tell a compelling science story...
![Page 1: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/1.jpg)
Data Visualization using R
How to get, manage, and present data to tell a compelling science story
William Gunn
@mrgunn
Head of Academic Outreach, Mendeley
![Page 2: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/2.jpg)
1. A short history of graphical presentation of data
2. Introduction to R
3. Finding, cleaning, and presenting data
4. Reproducibility and data sharing
![Page 3: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/3.jpg)
Data viz has a long history
John Snow’s cholera map helped communicate the idea that cholera was a water-borne disease.
![Page 4: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/4.jpg)
Florence Nightingale used dataviz
![Page 5: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/5.jpg)
Modernization of dataviz
![Page 6: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/6.jpg)
Chart junk: good, bad, and ugly
Which presentation is better?
![Page 7: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/7.jpg)
![Page 8: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/8.jpg)
It can be elegant…
![Page 9: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/9.jpg)
![Page 10: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/10.jpg)
Tufte
![Page 11: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/11.jpg)
Tufte
![Page 12: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/12.jpg)
How our eyes and brain perceive
It takes 200 ms to initiate an eye movement, but the red dot can be found in 100 ms or less. This is due to pre-attentive processing.
![Page 13: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/13.jpg)
Shape is a little slower than color!
![Page 14: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/14.jpg)
Pre-attentive processing fails!
![Page 15: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/15.jpg)
There are many “primitive” properties which we perceive
• Length
• Width
• Size
• Density
• Hue
• Color intensity
• Depth
• 3-D orientation
![Page 16: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/16.jpg)
Length
![Page 17: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/17.jpg)
Width
![Page 18: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/18.jpg)
Density
![Page 19: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/19.jpg)
Hue
![Page 20: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/20.jpg)
Color Intensity
![Page 21: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/21.jpg)
Depth
![Page 22: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/22.jpg)
3D orientation
![Page 23: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/23.jpg)
![Page 24: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/24.jpg)
Types of color schemes
• Sequential – suited for ordered data that progress from low to high. Use light colors for low values and dark colors for high values.
• Diverging – uses hue to show the breakpoint and intensity to show divergent extremes.
• Qualitative – uses different colors to represent different categories. Beware of using hue/saturation to highlight unimportant categories.
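The three scheme types can be tried out in base R. This is a minimal sketch, assuming R 3.6 or later, where `hcl.colors()` is available; the palette names "Blues", "Blue-Red", and "Dark 2" come from `hcl.pals()` and are stand-ins chosen for illustration, not the ColorBrewer palettes themselves (those are available through the separate RColorBrewer package).

```r
# Base R only; palette names are assumptions picked from hcl.pals().
n_classes <- 5

# Sequential: one hue; rev = TRUE puts the lightest color first (low values).
seq_cols <- hcl.colors(n_classes, palette = "Blues", rev = TRUE)

# Diverging: two hues that meet at a neutral midpoint.
div_cols <- hcl.colors(n_classes, palette = "Blue-Red")

# Qualitative: distinct hues for unordered categories.
qual_cols <- hcl.colors(n_classes, palette = "Dark 2")

# Draw one row of swatches per scheme.
op <- par(mfrow = c(3, 1), mar = c(1, 1, 2, 1))
barplot(rep(1, n_classes), col = seq_cols,  axes = FALSE, main = "Sequential")
barplot(rep(1, n_classes), col = div_cols,  axes = FALSE, main = "Diverging")
barplot(rep(1, n_classes), col = qual_cols, axes = FALSE, main = "Qualitative")
par(op)
```

Each call returns a character vector of hex color codes, so the result can be passed to any plotting function that accepts a `col` argument.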
![Page 25: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/25.jpg)
Sequential
http://colorbrewer2.org/
![Page 26: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/26.jpg)
Diverging
![Page 27: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/27.jpg)
Qualitative
![Page 28: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/28.jpg)
Tips for maps
• Keep it to 5-7 data classes
• ~8% of men are red-green colorblind
• Diverging schemes don’t do well when printed or photocopied
• Colors will often render differently on different screens, especially low-end LCD screens
• http://colorbrewer2.org
![Page 29: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/29.jpg)
Part 2
Introduction to R
![Page 30: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/30.jpg)
Why R?
• Open source tool
• Huge variety of packages for any kind of analysis
• Saves time repeating data processing steps
• Allows working with more diverse types of data and much larger datasets than Excel
• Processing is much faster than Excel
• Scripts are easily shareable, promoting reproducible work
![Page 31: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/31.jpg)
.csv and .xls / xlsx
• Excel files are designed to hold the appearance of the spreadsheet in addition to the data.
• R just wants the data, so always save as .csv if you have tabular data
![Page 32: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/32.jpg)
data structures
• x <- c(1,2,3,4,5,6,7,8,9,10)
• x
• length(x)
• x[1]
• x[2]
• x <- c(1:10)
• x
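The commands on this slide can be run as one short script. The comments are added for illustration and describe what each line should show at the console.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)  # c() combines values into a vector
x            # typing a name prints the object
length(x)    # number of elements: 10
x[1]         # first element; R counts from 1, not 0
x[2]         # second element

x <- c(1:10) # 1:10 generates the same sequence; the c() wrapper is optional
x
```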
![Page 33: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/33.jpg)
types of data
• y <- c("abc", "def", "g", "h", "i")
• y
• class(y)
• y[2]
• length(y)
• data can be integer (1,2,3,…), numeric (1.0, 2.3, …), character (a, b, c,…), logical (TRUE, FALSE) or other things
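A quick way to see these types is to ask R for the class of a few values. This is a small sketch; note that a plain number such as `1` is stored as numeric, and the `L` suffix (or the `:` operator) is what produces an integer.

```r
# class() reports how R stores a value.
types <- sapply(list(1L, 1, 2.3, "abc", TRUE), class)
types       # "integer" "numeric" "numeric" "character" "logical"

class(1:3)  # "integer": the colon operator yields integers here

y <- c("abc", "def", "g", "h", "i")
class(y)    # "character"
length(y)   # 5
```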
![Page 34: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/34.jpg)
Vectors
• R can hold data organized a few different ways
• vectors – hold a single type of data: (1,2,3,4) but not (1,2,3,x,y,z)
• lists – can hold heterogeneous data
  – 1
  – 2
  – a
    • x
• arrays – multi-dimensional
• dataframes – lists of vectors - like spreadsheets
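The four containers can be compared side by side. This is an illustrative sketch; the variable names and values are made up for the example.

```r
# Vector: one type only. Mixing numbers and text coerces everything to character.
v_num   <- c(1, 2, 3, 4)
v_mixed <- c(1, 2, 3, "x", "y", "z")
class(v_mixed)   # "character"

# List: elements keep their own types and can be nested.
l <- list(1, 2, "a", list("x"))
class(l[[1]])    # "numeric"
class(l[[3]])    # "character"

# Array: a vector with dimensions, here 2 x 3 x 4.
a <- array(1:24, dim = c(2, 3, 4))
dim(a)

# Data frame: a list of equal-length vectors, one per column.
df <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)
is.list(df)      # TRUE
str(df)
```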
![Page 35: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/35.jpg)
Vector operations
• x + 1
• x
• sum(x)
• mean(x)
• mean(x+1)
• x[2] <- x[2]+1
• x
• x+c(2:3)
• x[2:10] + c(2:3)
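The last two commands rely on recycling: when two vectors differ in length, R repeats the shorter one to match the longer. The sketch below walks through the slide's commands, assuming `x` starts as `1:10`.

```r
x <- 1:10

x + 1        # adds 1 to every element; x itself is unchanged
sum(x)       # 55
mean(x)      # 5.5
mean(x + 1)  # 6.5

x[2] <- x[2] + 1   # modify one element in place; x is now 1 3 3 4 5 6 7 8 9 10
x

# c(2:3) is recycled five times to match the ten elements of x.
r1 <- x + c(2:3)
r1           # 3 6 5 7 7 9 9 11 11 13

# Nine elements is not a multiple of two, so R still recycles but warns that
# the longer object length is not a multiple of the shorter object length.
r2 <- x[2:10] + c(2:3)
r2           # 5 6 6 8 8 10 10 12 12
```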
![Page 36: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/36.jpg)
working with lists
• y <- list(name = "Bob", age = 24)
• y
• y$name
• y[1]
• y[[1]]
• class(y[1])
• class(y[[1]])
• y <- list(y$name, "Sue")
• y$name
• y$age[2] <- list(33)
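The difference between `[` and `[[` is the main point of these commands. The sketch below separates the steps; `y2` and `people` are hypothetical names added here so that the original `y` is not overwritten.

```r
y <- list(name = "Bob", age = 24)

y$name          # "Bob": access an element by name
y[1]            # single brackets return a list containing the element
y[[1]]          # double brackets return the element itself
class(y[1])     # "list"
class(y[[1]])   # "character"

# Building a new list from values drops the names,
# so looking up $name on the result returns NULL.
y2 <- list(y$name, "Sue")
names(y2)       # NULL
y2$name         # NULL

# One possible way to hold two people: named elements that are vectors.
people <- list(name = c("Bob", "Sue"), age = c(24, 33))
people$age[2]   # 33
```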
![Page 37: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/37.jpg)
Loading data
• data<-read.csv("C:/Users/William Gunn/Desktop/Dropbox/Scripting/Data/traffic_accidents/accidents2010_all.csv", header = TRUE, stringsAsFactors = FALSE)
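To try `read.csv()` without the original file, the sketch below writes a tiny CSV to a temporary location and reads it back with the same arguments. The column names and rows are invented for illustration and are not taken from the accidents dataset.

```r
# Write a small hypothetical CSV to a temporary file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("accident_id,severity,road_type",
             "1,slight,single carriageway",
             "2,serious,roundabout"),
           csv_path)

# header = TRUE uses the first row as column names;
# stringsAsFactors = FALSE keeps text columns as character rather than factor.
accidents <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

str(accidents)   # structure: column names, types, first values
head(accidents)  # first rows
```

Forward slashes work in file paths on Windows as well, which is why the path on the slide uses them.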
![Page 38: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/38.jpg)
Selecting subsets of data
• "["
• "$"
• which
• grep and grepl
• subset
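Each of these tools can be shown on a small data frame. This is a sketch with made-up data; the column names `id`, `severity`, and `casualties` are hypothetical.

```r
df <- data.frame(id = 1:5,
                 severity = c("slight", "serious", "slight", "fatal", "serious"),
                 casualties = c(1, 2, 1, 3, 4),
                 stringsAsFactors = FALSE)

df[2, ]             # "[" with row, column: second row, all columns
df[, "severity"]    # one column by name
df$casualties       # "$" extracts a column as a vector

idx <- which(df$casualties > 1)       # positions where the condition is TRUE
idx                                   # 2 4 5

starts_s <- grep("^s", df$severity)   # grep returns matching positions
starts_s                              # 1 2 3 5
ends_ous <- grepl("ous$", df$severity) # grepl returns TRUE/FALSE per element
ends_ous                              # FALSE TRUE FALSE FALSE TRUE

# subset filters rows with a condition and optionally selects columns.
serious <- subset(df, casualties > 1 & severity == "serious",
                  select = c(id, casualties))
serious
```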
![Page 39: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/39.jpg)
PLOTS
• ggplot2 – an implementation of the “grammar of graphics” in R
• a set of graph types and a way of mapping variables to graph features
• graph types are called “geoms”
• mappings are “aesthetics”
• graphs are built up by layering geoms
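The ideas above can be sketched in a few lines, assuming the ggplot2 package is installed. The example uses the `mtcars` dataset that ships with R; the choice of variables is arbitrary and only for illustration.

```r
library(ggplot2)

# aes() maps variables to graph features: weight to x, mileage to y,
# and number of cylinders to color.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +                                           # first layer: points
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)  # second layer: fitted lines

print(p)  # draws the plot; at the console, typing p does the same
```

Because the plot is an object, more layers can be added later with `+` without rewriting the earlier code.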
![Page 40: Data Visualization using R How to get, manage, and present data to tell a compelling science story William Gunn @mrgunn Head of Academic Outreach, Mendeley.](https://reader031.fdocuments.us/reader031/viewer/2022013011/551c1dc1550346b24f8b5a32/html5/thumbnails/40.jpg)
Types of geoms
• point – scatterplot (dot plot) – takes x,y coords of points
• abline – line layer – takes slope, intercept
• line – connect points with a line
• smooth – fit a curve
• bar – bar chart (a histogram when the data are binned) – takes vector of data
• boxplot – box and whiskers
• density – to show relative distributions
• errorbar – what it says on the tin
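Several of these geoms can be tried on the built-in `mtcars` data. This is a sketch that assumes ggplot2 is installed; the variables, bin count, and the summary table `stats` are illustrative choices, not part of the original slides.

```r
library(ggplot2)

# point + abline: scatterplot with a straight line from a linear fit.
fit <- lm(mpg ~ wt, data = mtcars)
p_point <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_abline(intercept = coef(fit)[1], slope = coef(fit)[2])

# bar counts rows per category; histogram bins a continuous variable.
p_bar  <- ggplot(mtcars, aes(factor(cyl))) + geom_bar()
p_hist <- ggplot(mtcars, aes(mpg)) + geom_histogram(bins = 10)

# boxplot and density compare distributions across groups.
p_box  <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot()
p_dens <- ggplot(mtcars, aes(mpg, fill = factor(cyl))) + geom_density(alpha = 0.4)

# errorbar needs explicit ymin and ymax, here mean +/- one standard deviation.
m <- tapply(mtcars$mpg, mtcars$cyl, mean)
s <- tapply(mtcars$mpg, mtcars$cyl, sd)
stats <- data.frame(cyl = factor(names(m)), mean = as.numeric(m), sd = as.numeric(s))
p_err <- ggplot(stats, aes(cyl, mean)) +
  geom_point() +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2)

print(p_point)  # print any of the objects above to draw it
```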