Introduction Key attributes Capabilities Examples Summary
Why I love R
Alastair Sanderson
School of Physics & Astronomy, University of Birmingham
2012-03-20
Outline
1 Introduction
2 Key attributes
3 Capabilities
4 Examples
5 Summary
Outline
Key attributes of R:
free/open source
widely used
well documented
a high level programming language
Capabilities of R:
data handling
statistical/numerical analysis & modelling
data visualisation
reproducible research
Some examples using R
Summary
R is... Freely available
Free/open source software
...with free/open source IDEs (integrated development environments), e.g.:
RStudio: http://rstudio.org/
Emacs Speaks Statistics (ESS): http://ess.r-project.org/
According to John Chambers, in his excellent book "Software for Data Analysis: Programming with R", the mission of R is
"...to enable the best and most thorough exploration of data possible"
...with the associated prime directive, that
"the computations and the software should be trustworthy: they should do what they claim, and be seen to do so."
R is... Widely used (1/2)
Mature software (v1.0 released in 2000); runs on a wide range of platforms; follows an annual development & update cycle
Large & growing user base (1-2 million)
Many user-contributed packages (>3500), very easy to install:
install.packages("mypkg") # download & install a package ("mypkg" is a placeholder name)
library("mypkg") # load it into the current session
Figure: Google searches for "R plot": solidly growing interest
R is... Widely used (2/2)
R is used by major institutions, e.g. Google, Facebook, the New York Times, New Scientist, etc.
Track R's popularity: http://r4stats.com/popularity
R is the favourite tool of Kaggle users (Kaggle is a platform for predictive modelling and analytics competitions): http://blog.kaggle.com/2011/11/27/kagglers-favorite-tools
R is... Well documented
Excellent help pages, richly cross-linked:
help(package="base") # List package contents
?data.frame # help on a specific function
?Syntax # help on a general topic
news(package="ggplot2") # details of recent changes
demo("plotmath") # demonstrate maths annotation
browseVignettes() # view index of vignettes in web browser
See "documentation" links at http://www.r-project.org for manuals, wiki, R Journal, books, etc.
CRAN "task views": http://cran.r-project.org/web/views/
Type a function's name without parentheses to view its R source code
Many R bloggers, aggregated at http://www.r-bloggers.com/
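A quick way to try the source-code tip above, using `sd()` and the generic `summary()` from base R as examples (both are standard functions; nothing here is specific to this talk):

```r
## Typing a function's name without parentheses prints its R source code
sd
## The pieces of a function can also be inspected separately
args(sd)     # argument list only
formals(sd)  # arguments and their defaults, as a list
body(sd)     # the body expression

## Many functions are S3 generics, whose source only shows the dispatch call
summary
methods(summary)                    # class-specific methods of the generic
getAnywhere("summary.data.frame")   # source of one method, wherever it lives
```

For generics, the interesting code lives in the methods, which is why `methods()` and `getAnywhere()` are useful companions to plain printing.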
R is... A high level programming language
Functional, object-oriented language: http://cran.r-project.org/doc/manuals/R-lang.html
http://cran.r-project.org/doc/manuals/R-ints.html
Debugger, code profiling (Rprof), etc.: http://cran.r-project.org/doc/manuals/R-exts.html
Can link to compiled code (C, C++, Fortran): see ?Foreign; e.g. seamless integration of R with C++: http://cran.r-project.org/web/packages/Rcpp/index.html
Support for parallel computing: http://cran.r-project.org/web/views/HighPerformanceComputing.html
Writing packages in R: http://cran.r-project.org/doc/contrib/Leisch-CreatingPackages.pdf
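To make "functional, object oriented" concrete, here is a small sketch: functions as values (`Map`, `Reduce`, a function that returns a function) and an S3 class with its own `print` method. The `measurement` class and `compose` helper are hypothetical, made up purely for illustration.

```r
## Functional programming: functions are ordinary values
compose <- function(f, g) function(x) f(g(x))   # returns a new function
abs_via_sqrt <- compose(sqrt, function(x) x^2)
abs_via_sqrt(-3)                                 # 3
sums <- Map(function(a, b) a + b, 1:3, 4:6)      # list of 5, 7, 9
running <- Reduce(`+`, 1:5, accumulate = TRUE)   # 1 3 6 10 15

## S3 object orientation: a class is an attribute; generics dispatch on it.
## "measurement" is a hypothetical class used only for illustration.
measurement <- function(value, unit) {
  structure(list(value = value, unit = unit), class = "measurement")
}
print.measurement <- function(x, ...) {
  cat(x$value, " ", x$unit, "\n", sep = "")
  invisible(x)
}
m <- measurement(3.2, "keV")
m          # auto-printing dispatches to print.measurement()
```

S3 is the lightest of R's object systems; S4 (`?setClass`) and reference classes (`?setRefClass`) offer stricter alternatives.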
Data handling
Data input/output: http://cran.r-project.org/doc/manuals/R-data.html
Data structures: e.g. vectors, factors, arrays/matrices, lists/data frames: http://cran.r-project.org/doc/manuals/R-intro.html
?Extract # details of operators to extract/replace parts of data structures
?apply; ?sapply; ?aggregate; ?sweep # vectorised operations in R
?subset; ?transform; ?match; ?merge # data access & join commands
?regex # details of regular expressions capabilities
install.packages("stringr") # make it easier to work with strings
install.packages("RODBC") # Open Database Connectivity interface
Very powerful & convenient data manipulation packages:
plyr : http://cran.r-project.org/web/packages/plyr/index.html
reshape2: http://cran.r-project.org/web/packages/reshape2/index.html
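A minimal sketch of the base R data-handling commands listed above (`subset`, `transform`, `merge`, `aggregate`, regular expressions), applied to two small hypothetical tables invented for the example:

```r
## Hypothetical toy tables sharing a key column "id"
stars <- data.frame(id  = c("A1", "B2", "C3", "D4"),
                    mag = c(12.1, 9.8, 15.3, 11.0),
                    stringsAsFactors = FALSE)
types <- data.frame(id   = c("A1", "B2", "C3"),
                    type = c("G", "K", "G"),
                    stringsAsFactors = FALSE)

bright <- subset(stars, mag < 13)                     # keep rows meeting a condition
bright <- transform(bright, flux = 10^(-0.4 * mag))   # add a derived column
joined <- merge(bright, types, by = "id")             # inner join: D4 has no type, so it drops out
agg    <- aggregate(mag ~ type, data = joined, FUN = mean)  # group-wise summary
is_AB  <- grepl("^[AB]", stars$id)                    # regular expression match on ids
agg
```

`merge()` defaults to an inner join; `all.x = TRUE` would keep unmatched rows from the first table, filled with `NA`.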
Statistical analysis & modelling
A brief summary:
?Distributions # details of supported statistical distributions
?RNG # details of random number generation
help(package="stats") # list contents of base stats package
?NA # support for missing data
library("cluster") # cluster analysis
library("boot") # bootstrap resampling
library("survival") # survival analysis
?lm; ?nls; ?anova # linear/non-linear regression/ANOVA
See also these CRAN task views: http://cran.r-project.org/web/views/Distributions.html
http://cran.r-project.org/web/views/Multivariate.html
http://cran.r-project.org/web/views/MachineLearning.html
http://cran.r-project.org/web/views/Spatial.html
http://cran.r-project.org/web/views/Robust.html
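A short sketch touching several items on this slide: reproducible random numbers (`?RNG`), linear and non-linear regression (`lm`, `nls`), an ANOVA table, and `NA` handling. The straight-line and exponential-decay data are simulated, hypothetical examples, not real measurements.

```r
set.seed(42)                                  # reproducible random numbers (see ?RNG)
x <- runif(100, min = 0, max = 10)
y <- 2 + 3 * x + rnorm(100, sd = 0.5)         # hypothetical straight line plus noise
fit <- lm(y ~ x)                              # linear regression
coef(fit)                                     # close to intercept 2, slope 3
anova(fit)                                    # ANOVA table for the fit

## Non-linear least squares for an exponential decay
tt <- seq(0, 5, by = 0.25)
z  <- 5 * exp(-0.8 * tt) + rnorm(length(tt), sd = 0.05)
nfit <- nls(z ~ A * exp(-k * tt), start = list(A = 4, k = 1))
coef(nfit)                                    # close to A = 5, k = 0.8

## Missing values (NA) propagate unless removed explicitly
mean(c(1, NA, 3))                             # NA
mean(c(1, NA, 3), na.rm = TRUE)               # 2
```

`nls()` needs starting values; reasonable guesses like these usually suffice for a well-behaved model.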
Numerical analysis & modelling
Wide variety of capabilities, e.g:
?integrate # numerical integration
?optim; ?nlminb # general-purpose optimisation
?D # symbolic differentiation
install.packages("deSolve") # solve differential equations
library("Matrix") # sparse and dense matrix classes and methods
?spline; ?smooth.spline # spline interpolation
library("splines") # regression spline functions and classes
?prcomp # Principal Components Analysis
See also the following CRAN task views: http://cran.r-project.org/web/views/Optimization.html
http://cran.r-project.org/web/views/HighPerformanceComputing.html
http://cran.r-project.org/web/views/ChemPhys.html
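A compact sketch of some numerical tools listed above, using only base R and the built-in `USArrests` data set (chosen here just as a convenient example). The Rosenbrock test function is a standard optimisation benchmark, not something from the talk.

```r
## Numerical integration: the standard normal density integrates to 1
integrate(dnorm, -Inf, Inf)

## General-purpose optimisation: Rosenbrock function, minimum at (1, 1)
rosen <- function(p) (1 - p[1])^2 + 100 * (p[2] - p[1]^2)^2
opt <- optim(c(-1.2, 1), rosen, method = "BFGS")
opt$par

## Symbolic differentiation returns an unevaluated R expression
dexpr <- D(quote(sin(x) * x^2), "x")   # product rule applied symbolically
dexpr
eval(dexpr, list(x = 1))               # evaluate the derivative at x = 1

## Cubic spline interpolation through points sampled from x^2
sf <- splinefun(1:5, (1:5)^2)
sf(2.5)

## Principal components of a built-in data set (variables standardised)
pc <- prcomp(USArrests, scale. = TRUE)
summary(pc)
```

Without an analytic gradient, `optim()` falls back on finite differences, so its answer is close to, but not exactly, (1, 1).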
Data visualisation
help(package="graphics") # contents of base graphics package
?Devices # details of available output devices
library(lattice) # excellent for highly structured data
library(grid) # lower-level hierarchical graphics
install.packages("ggplot2") # fantastic graphics package!
install.packages("rgl") # interactive graphics
CRAN task view on graphics: http://cran.r-project.org/web/views/Graphics.html
ggplot2 web resources:
http://had.co.nz/ggplot2/
http://crantastic.org/packages/ggplot2
see Andy's talk
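A minimal sketch of base graphics written to a file device (`?Devices`), plus the same curves as a lattice trellis plot. The output file name is a hypothetical choice; ggplot2 and rgl are left out so the example needs only packages shipped with R.

```r
## Base graphics drawn on a PDF file device; the file name is hypothetical
outfile <- file.path(tempdir(), "demo_plot.pdf")
pdf(outfile, width = 6, height = 4)
x <- seq(0, 2 * pi, length.out = 200)
plot(x, sin(x), type = "l", col = "blue",
     xlab = "x", ylab = expression(sin(x)))   # plotmath label (see demo("plotmath"))
lines(x, cos(x), lty = 2)
legend("topright", legend = c("sin", "cos"),
       col = c("blue", "black"), lty = c(1, 2))
dev.off()                                     # close the device and write the file

## lattice: the same curves as a conditioned (trellis) plot, one panel per function
library(lattice)
df <- data.frame(x  = rep(x, 2),
                 y  = c(sin(x), cos(x)),
                 fn = rep(c("sin", "cos"), each = length(x)))
p <- xyplot(y ~ x | fn, data = df, type = "l")
## print(p) draws it on the current device
```

lattice (and ggplot2) build a plot object first and draw it only when printed, which makes plots easy to store, modify and re-render.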
Reproducible research
"The term reproducible research was first proposed by Jon Claerbout at Stanford
University and refers to the idea that the ultimate product of research is the
paper along with the full computational environment used to produce the results
in the paper such as the code, data, etc. necessary for reproduction of the
results and building upon the research."
quote from http://en.wikipedia.org/wiki/Reproducibility
Sweave (see ?Sweave): http://www.statistik.lmu.de/~leisch/Sweave/
xtable package - export tables to LaTeX or HTML
Using Emacs Org mode (http://orgmode.org) with R:
http://orgmode.org/worg/org-contrib/babel/languages/ob-doc-R.html
http://orgmode.org/worg/org-contrib/babel/uses.html
RC package (see Ian's talk next meeting)
CRAN task view: http://cran.r-project.org/web/views/ReproducibleResearch.html
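A minimal Sweave round trip, as a sketch: it writes a tiny hypothetical `report.Rnw` into a temporary directory, runs its R chunk on the built-in `cars` data, and produces `report.tex` with the results (including an inline `\Sexpr{}` value) filled in. Compiling the `.tex` to PDF would additionally need a LaTeX installation.

```r
## A minimal, hypothetical Sweave source: one R chunk plus an inline result
rnw <- c("\\documentclass{article}",
         "\\begin{document}",
         "<<fit, echo=TRUE>>=",
         "fit <- lm(dist ~ speed, data = cars)",
         "coef(fit)",
         "@",
         "The slope is \\Sexpr{round(coef(fit)[2], 2)} ft per mph.",
         "\\end{document}")

old <- setwd(tempdir())        # Sweave writes its output to the working directory
writeLines(rnw, "report.Rnw")
Sweave("report.Rnw")           # runs the R code and writes report.tex
tex <- readLines("report.tex")
setwd(old)                     # restore the original working directory
```

Because the numbers in `report.tex` are regenerated from the code and data on every run, text and results cannot silently drift apart.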
Hybrid R plot/image graphics
Figure: Integrating spatial data with a map, using ggplot2 in R
http://blog.revolutionanalytics.com/2012/02/what-are-the-most-popular-bike-routes-in-london.html
A New York Times graphic using R
Figure: Michael Jackson's Billboard rankings vs. the Beatles (top, in red) and U2 (bottom, in red)
These charts were done mostly in R and were published within hours of Michael Jackson's death:
http://blog.revolutionanalytics.com/2009/06/nyt-charts-michael-jacksons-pop-hits.html
See the full, interactive graphic here:
http://www.nytimes.com/interactive/2009/06/25/arts/0625-jackson-graphic.html
Dates & time series
> dt <- as.Date("2012-03-20")
> dt - as.Date("01/Jan/12", format="%d/%b/%y")
Time difference of 79 days
> months(dt)
[1] "March"
> weekdays(dt)
[1] "Tuesday"
For more information:
?DateTimeClasses; ?Dates
Time series handling & modelling:
?acf # auto-correlation
?ccf # cross-correlation
?arima # Fit ARIMA models to univariate time series
install.packages(c("zoo", "forecast")) # very useful time series packages
CRAN task view: http://cran.r-project.org/web/views/TimeSeries.html
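A small sketch of the base time-series tools listed above: simulate an AR(1) series, inspect its auto-correlation, fit an ARIMA model and forecast. The coefficient 0.7 and series length are hypothetical choices for the example.

```r
set.seed(1)
## Simulated AR(1) series with coefficient 0.7 (hypothetical data)
x <- arima.sim(model = list(ar = 0.7), n = 500)
r <- acf(x, lag.max = 5, plot = FALSE)      # sample auto-correlations, lags 0 to 5
fit <- arima(x, order = c(1, 0, 0))         # fit an AR(1) model with a mean term
coef(fit)                                   # ar1 close to 0.7
fc <- predict(fit, n.ahead = 3)             # 3-step-ahead forecasts and standard errors
fc$pred

## Regular date sequences, e.g. weekly time stamps
weeks <- seq(as.Date("2012-03-20"), by = "week", length.out = 4)
weeks
```

For an AR(1) process the lag-1 auto-correlation should be close to the AR coefficient, which is a quick sanity check on the fit.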
Visualising multivariate galaxy data
2D kernel density estimates; semi-transparency; colour/shape/size encoding
http://www.sr.bham.ac.uk/~ajrs/talks/SandersonAlastair_user2011_talk.pdf
Example (galaxy spatial distribution)
Figure: Declination vs. Right Ascension (RA axis running from 200.0 down to 197.5), with legends for Luminosity (Solar; 10^9 to 10^10.5), Scaled velocity (-2 to 2) and Morphology (Early, Late, ?)
Example (histogram of galaxy velocities)
Figure: Number of galaxies vs. Galaxy velocity (km/s), over roughly 1500 to 3500 km/s, with a Morphology legend (Early, Late, ?)
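A sketch of two of the techniques named on this slide, a 2D kernel density estimate and semi-transparent points, using `MASS::kde2d` and base graphics rather than the ggplot2 code behind the original figures. The positions are simulated clumps, not real galaxy data, and the output file name is hypothetical.

```r
library(MASS)   # provides kde2d()
set.seed(2)
## Hypothetical positions: two Gaussian clumps (not real galaxy data)
ra  <- c(rnorm(150, 199.0, 0.30), rnorm(50, 198.3, 0.15))
dec <- c(rnorm(150, -16.3, 0.30), rnorm(50, -15.6, 0.15))
dens <- kde2d(ra, dec, n = 100)            # 2D kernel density on a 100 x 100 grid

pdf(file.path(tempdir(), "kde_demo.pdf"))
plot(ra, dec, pch = 16,
     col = rgb(0, 0, 1, alpha = 0.3),      # 30% opaque points reveal overplotting
     xlim = rev(range(ra)),                # RA increasing to the left, as on sky plots
     xlab = "Right Ascension", ylab = "Declination")
contour(dens, add = TRUE)                  # density contours over the points
dev.off()
```

Semi-transparency and density contours address the same problem from two sides: both keep dense regions readable when many points overlap.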
World Bank data demo - part 1
Load package, download & save data
require("WDI") # load R package to access World Bank data
MSdata <- WDI(indicator="IT.CEL.SETS.P2", start=1990, end=2010)
save(MSdata, file="MSdata.RData")
Plot curves for each country & median line for all countries
## Use shorter name for data column:
MSdata <- transform(MSdata, MSpc = IT.CEL.SETS.P2)
require(ggplot2)
ggplot(data=MSdata, aes(year, MSpc, group=country)) +
geom_line(alpha=0.1) + # semi-transparent lines: 10% of normal
## add median curve for all countries (i.e. over-ride grouping):
stat_summary(fun.y=median, geom="line", aes(group=1), colour="blue") +
scale_y_log10() +
ylab("Mobile cellular subscriptions per 100 people")
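Since the demo above needs network access, here is an offline sketch of what the blue median curve represents, computed with base `aggregate()` on a hypothetical toy data frame in the same long format as `MSdata` (the values are invented, not World Bank data):

```r
## Hypothetical toy data in the same long format as MSdata (not World Bank values)
toy <- data.frame(country = rep(c("A", "B", "C"), each = 3),
                  year    = rep(2008:2010, times = 3),
                  MSpc    = c(1, 2, 4,   10, 20, 40,   0, 5, 9))

## Median over countries for each year: the quantity that
## stat_summary(fun.y = median, aes(group = 1)) computes per year
med <- aggregate(MSpc ~ year, data = toy, FUN = median)
med

## A log scale cannot show zeros: log10(0) is -Inf, so ggplot2 drops
## such points (with a warning) once scale_y_log10() is applied
sum(toy$MSpc == 0)
log10(0)
```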
Fraction of population with mobile phones
Figure: Mobile cellular subscriptions per 100 people (log scale, 1e-03 to 1e+01) vs. year, 1990 to 2010, one line per country
World Bank data demo - part 2
Plot the 20 countries that most recently got mobile phones
require(plyr) # extremely useful package
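## Year when a country with no subscriptions first registered some.
## any() is evaluated per country via ave(); inside subset() alone it would
## be computed over the whole data set and so filter nothing.
firstMS <- ddply(subset(MSdata, MSpc == 0 &
                   ave(MSpc > 0, country, FUN = function(x) any(x, na.rm = TRUE))),
                 .(country), summarise, year1 = max(year) + 1)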
## sort by the year mobile phone use first starts:
firstMS <- firstMS[order(firstMS$year1), ]
## Create dotplot:
ggplot(data=tail(firstMS, 20), aes(year1, reorder(country, year1))) +
geom_point() +
xlab("Year of first mobile cellular subscriptions") + ylab("")
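The same "first year" rule can be sketched in base R with `split()` and `sapply()`, applying the test for zero and positive values per country. The toy data frame below is hypothetical and chosen so each case is visible: A always has subscriptions, B starts in 2010, C in 2009, and D never registers any.

```r
## Hypothetical toy data (not World Bank values)
toy <- data.frame(country = rep(c("A", "B", "C", "D"), each = 4),
                  year    = rep(2007:2010, times = 4),
                  MSpc    = c(1, 2, 3, 4,   0, 0, 0, 1,
                              0, 0, 2, 5,   0, 0, 0, 0))

## Same rule as the slide: last year with zero subscriptions, plus one,
## applied only to countries that have both zero and positive values
first_year <- function(d) {
  zero_years <- d$year[which(d$MSpc == 0)]
  if (length(zero_years) > 0 && any(d$MSpc > 0, na.rm = TRUE)) {
    max(zero_years) + 1
  } else {
    NA_real_
  }
}
years <- sapply(split(toy, toy$country), first_year)
years <- sort(years)      # sort() drops the NA entries by default
years
```

Countries that always had, or never had, subscriptions get `NA` and fall out of the sorted result, which is the behaviour the slide's comment describes.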
Year that mobile phone usage starts
Figure: dot plot of the year of first mobile cellular subscriptions (x axis 1998 to 2008) for 20 countries: Swaziland; Ethiopia; Nepal; Syrian Arab Republic; Chad; Liberia; Mauritania; Sierra Leone; Somalia; Afghanistan; Iraq; Mayotte; Micronesia, Fed. Sts.; Sao Tome and Principe; Bhutan; Comoros; Guinea-Bissau; Eritrea; Tuvalu; Korea, Dem. Rep.
Conclusions
It is free/open source software
It has a rapidly growing user base (1-2 million)
It is widely used in academia, business & industry
"Today, all of the Fortune 500 companies use R for their data analyses"
http://www.r-bloggers.com/open-source-is-opening-data-to-predictive-analytics/
I love R because it empowers the individual by enabling
cutting-edge processing, analysis & visualisation of data,
based on trustworthy computations and software.
Birmingham R User Meeting (BRUM)
http://www.birminghamR.org
Alastair Sanderson: http://www.sr.bham.ac.uk/~ajrs