Introduction to R Introductions What is R? RStudio Layout Summary Statistics Your First R Graph 17 September 2014 Sherubtse Training

Introduction to R

Introductions
What is R?
RStudio Layout
Summary Statistics
Your First R Graph

17 September 2014 Sherubtse Training

Ellen [email protected]

1772-4714

Your Turn

• Name & department

• Research or activities for which you might be using R in the future (i.e., why are you here?)

[Compile group email list]

What is R?

... an open source programming language for statistical computing and graphics

... with 5895 statistical “add-on” packages contributed by R-users since 1993

... and an extensive online “help” community (internet discussion site)

PROGRAM | Windows | Mac OS | Linux | Cost (USD) | # of ANOVA Types Formally Supported | # of Regression Types Formally Supported
R | YES | YES | YES | FREE | 7 | 13
S-PLUS | YES | NO | YES | $2,399/year | 6 | 6
Minitab | YES | NO | NO | $1,395 | 6 | 5
SPSS | YES | YES | YES | $2,250/year | 6 | 7
SAS | YES | NO | YES | $6,000 | 7 | 13

(data from Wikipedia)

Why Use R?

Compatible with many operating systems

FREE!

Flexible & powerful statistical program

Why Use R?

“R has really become the second language for people coming out of grad school now, and there’s an amazing amount of code being written for it” (Max Kuhn, Pfizer statistician)

“The popularity of R at universities could threaten SAS Institute, the privately held business software company that specializes in data analysis software” (New York Times article, 6 January 2009)

“Companies as diverse as Google, Pfizer, Merck, Bank of America, the InterContinental Hotels Group and Shell use [R]... Companies like Google and Pfizer say they use the [R] software for just about anything they can” (New York Times article, 6 January 2009)

Besides Statistics, What Can R Do?

• Publication-quality charts & graphs

• Dynamic graphing & animations

• Link with many other applications, e.g., Google Earth, GIS, etc.

... and many, MANY other things

Migratory path of a turkey vulture in 2009 (red), 2010 (blue), 2011 (green)

Abundance data for two different species (blue, red) plotted on Google Earth map

LET’S GET STARTED...

Installing R (www.r-project.org)

1 Go to www.r-project.org
2 Choose a CRAN mirror
3 Choose your operating system
4 Choose the ‘base’ installation
5 Download

Installing RStudio (www.rstudio.com)

RStudio Layout

CONSOLE for typing code for immediate execution and seeing the output

RStudio Layout

SOURCE PANE for writing and editing code (a script) to execute later (like a notepad); also a good way to store notes for this training

RStudio Layout

- WORKSPACE tab shows active variables & data

- HISTORY tab records all the code you execute

RStudio Layout

- FILES tab shows files & folders in your workspace

- PLOTS tab for viewing graphs

- PACKAGES tab for installing & updating R packages

- HELP tab for R help

CREATE FILES & SET YOUR CURRENT WORKING DIRECTORY

1 Browse to your working directory

2 Set your working directory

getwd() to see your current directory
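The same step can be done from the console. A minimal sketch follows; the folder paths in the commented-out setwd() lines are hypothetical placeholders for your own folder, and the runnable part borrows R's temporary folder (tempdir()) so it works on any computer.

```r
# Show the current working directory (where R reads and saves files by default)
getwd()

# To change it, give setwd() the path to your own folder.
# These paths are hypothetical examples; replace them with a real folder:
# setwd("C:/Users/yourname/R-training")  # Windows (note the forward slashes)
# setwd("~/R-training")                  # Mac or Linux

# Runnable demonstration: move into R's temporary folder, then move back
old_wd <- setwd(tempdir())  # setwd() invisibly returns the previous directory
getwd()                     # now reports the temporary folder
setwd(old_wd)               # restore the original working directory
```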

WHAT IS A WORKING DIRECTORY?

1) Create a test script with variables – check the global environment & history
2) Save the test script – where does it save?
3) Close RStudio, then open it from the script – what is the working directory?
4) Close RStudio, then open it from the application – what is the working directory?

NOTE: You can also save the workspace under a specific name (Session > Save Workspace As...)
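A small test script along these lines could be used for step 1, with save.image() and load() as a code alternative to the Session menu. The variable names and the file name training_workspace.RData are made-up examples, and the file is written to R's temporary folder rather than your working directory.

```r
# A small "test script": create a few variables (example values only)
site_name <- "Sherubtse"
counts <- c(12, 7, 30)

# ls() lists the objects in the global environment (what the WORKSPACE tab shows)
ls()

# Code alternative to Session > Save Workspace As...
# The file name is a hypothetical example, saved in a temporary folder here
ws_file <- file.path(tempdir(), "training_workspace.RData")
save.image(file = ws_file)

# Remove the variables, then restore them from the saved file
rm(site_name, counts)
load(ws_file)
counts   # the vector is back
```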

TO SET A DEFAULT DIRECTORY (this is the directory RStudio uses when you open it from the application):

MAC: RStudio > Preferences
WINDOWS: Tools > Global Options

R Help

help.start() to access some R resources off-line

help(topic_name) OR ?topic_name if the relevant package is loaded (use quotes for words with spaces)

help.search("topic name") OR ??"topic name" to search the help of all installed packages, including those that are not loaded
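A sketch of these help commands typed into the console, using sd() (standard deviation) as the example topic. help.start() is commented out because it opens a web browser.

```r
# Off-line R resources in a web browser (commented out: it opens a browser)
# help.start()

# Help for a function in a loaded package; these two lines are equivalent
help(sd)
?sd

# Search the help of all installed packages, loaded or not;
# quotes are needed because the search phrase contains a space
help.search("standard deviation")
??"standard deviation"
```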

www.rseek.org

Packages in this list are already installed. Mark boxes to load the ones you want to use.

Now unload the package (untick its box) and type library(datasets) in the console
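Loading and unloading packages can also be done in code. In this sketch, ggplot2 is only an example of an add-on package to install (it needs an internet connection), so those lines are commented out.

```r
# Load an installed package so its functions and data become available
library(datasets)

# search() lists the attached packages; "package:datasets" should appear
search()

# Installing a package needs an internet connection and is done only once.
# "ggplot2" is just an example package name; uncomment to try it:
# install.packages("ggplot2")
# library(ggplot2)

# Code equivalent of unticking a package's box in the PACKAGES tab:
# detach("package:datasets")
```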

Quakes Dataset

• Read the documentation for this dataset:
  – What do these data describe?
  – How many data records are there?
  – What are the 5 variables included in this dataset?

• Type the dataset name in the console

• Now try these functions (use your Source pane!): names(quakes), head(quakes), tail(quakes)
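One way to work through the Quakes exercise in the Source pane (a sketch; run each line with Ctrl+Enter, or Cmd+Enter on a Mac).

```r
library(datasets)     # quakes ships with R's datasets package

?quakes               # opens the documentation for the dataset
quakes                # typing the name prints the whole data frame
names(quakes)         # the column (variable) names
head(quakes)          # the first 6 rows
tail(quakes)          # the last 6 rows
head(quakes, n = 10)  # the n argument changes how many rows are shown
```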

Quakes Dataset

• Examine how the history pane relates to the console, and use it to search & run previous commands (or use the up-arrow key). How would you send commands to a script?

• Learn what these functions do: summary(), ncol(), nrow(), dim(), View()
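A sketch of the functions above applied to quakes. View() is commented out because it opens RStudio's data viewer and needs an interactive session.

```r
library(datasets)  # quakes comes from the datasets package

summary(quakes)    # min, quartiles, median and mean of each column
ncol(quakes)       # number of columns
nrow(quakes)       # number of rows (data records)
dim(quakes)        # rows and columns together, as c(rows, columns)
# View(quakes)     # spreadsheet-style viewer; needs an interactive session such as RStudio
```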

Quakes Dataset

• Examine a single column of data: quakes$mag

• Calculate the mean and standard deviation of quake magnitudes

??"standard deviation" or help.search("standard deviation")
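A minimal sketch of the calculation; mags is just a convenient, made-up name for the extracted column.

```r
library(datasets)       # quakes comes from the datasets package

mags <- quakes$mag      # $ extracts one column as a numeric vector
mean(mags)              # average magnitude
sd(mags)                # standard deviation of the magnitudes
sqrt(var(mags))         # same value: sd() is the square root of the variance
```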

Missing Values

• Missing values in your data should be entered as NA (not available), not as blanks

• For some functions, you need to specify in the function arguments what to do with missing values in the data (e.g., na.rm=TRUE)

• Use the function is.na() to determine whether there are missing values in a data set

• What does sum(is.na(x)) tell us?

Create the vector x <- c(5, 10, 15, NA, 25), then calculate the mean of the vector
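A sketch of what happens with the vector above; the values in the comments follow from the arithmetic (5 + 10 + 15 + 25 = 55, and 55 / 4 = 13.75).

```r
x <- c(5, 10, 15, NA, 25)

mean(x)                # NA: one missing value makes the result missing
mean(x, na.rm = TRUE)  # 13.75: the NA is dropped before averaging
is.na(x)               # FALSE FALSE FALSE TRUE FALSE
sum(is.na(x))          # 1: each TRUE counts as 1, so this is the number of NAs

# The same idea works on a whole data frame
sum(is.na(quakes))     # number of missing values in the quakes dataset
```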

Your First R Graph

Quakes Dataset

Is there a relationship between the magnitude of an earthquake and the number of stations reporting it?

What kind of graph would we plot? How would we expect the graph to look?

Use plot() to create this graph (which package does this function belong to?)

plot(quakes$mag, quakes$stations)

ALTERNATIVE CODE: with(quakes, plot(mag, stations))

How would you use with() to calculate the mean of quakes$mag?
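A sketch of the plotting code above, plus one possible answer to the with() question.

```r
library(datasets)  # quakes comes from the datasets package

# Scatterplot: magnitude on the x-axis, number of reporting stations on the y-axis
plot(quakes$mag, quakes$stations)

# with() evaluates an expression using the columns of a data frame by name
with(quakes, plot(mag, stations))

# One possible answer: the mean magnitude via with()
with(quakes, mean(mag))
```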

Learn to Use Function Arguments

Play around with sending arguments to the plot() function. For now, try:

main="Fiji earthquakes since 1964", xlab="Magnitude", ylab="# of Stations Reporting", col="green"

Then export the graph as a jpeg file & save it (easiest to resize before you export)
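Exporting from the PLOTS tab is the easiest route; as an alternative, here is a sketch of setting the arguments and writing a JPEG file from code. The file name quakes_mag_stations.jpg and the 800 × 600 pixel size are arbitrary examples, and the file goes to R's temporary folder; give jpeg() a plain file name instead to save it in your working directory.

```r
library(datasets)  # quakes comes from the datasets package

# Scatterplot with a title, axis labels and point colour set through arguments
plot(quakes$mag, quakes$stations,
     main = "Fiji earthquakes since 1964",
     xlab = "Magnitude",
     ylab = "# of Stations Reporting",
     col  = "green")

# Code alternative to exporting from the PLOTS tab: open a JPEG device,
# draw the plot, then close the device with dev.off() to write the file.
# Hypothetical file name, written to a temporary folder here.
jpg_file <- file.path(tempdir(), "quakes_mag_stations.jpg")
jpeg(jpg_file, width = 800, height = 600)  # size in pixels
plot(quakes$mag, quakes$stations,
     main = "Fiji earthquakes since 1964",
     xlab = "Magnitude", ylab = "# of Stations Reporting",
     col = "green")
dev.off()
```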

TODAY'S REVIEW

How would you...
• list the first few records of a dataset
• see a dataset in a spreadsheet-style (Excel-like) view
• find out the number of rows & columns in a dataset
• calculate the standard deviation of a vector
• get summary information on a dataset
• find out if there are missing values in a dataset

TODAY'S REVIEW

How would you...
• find out the current working directory
• create a simple scatterplot
• list the column names for a dataset
• load a package
• find out the number of rows in a dataset
• calculate the mean of a vector