R: A Gentle Introduction
Vega Bharadwaj | George Mason University Data Services
Part I: Why R?
What do YOU know about R and why do you want to learn it?
Reasons to use R
• Free and open-source
• User-created “packages”, free to download, extend R with thousands of additional tools
• Highly customizable graphing capabilities
• Lots of free documentation and tutorials available on the web
• Beware: some of it is outdated or inaccurate
• You don’t need to be a computer scientist to code in R
• Packages do a lot of the “programming” (applying fundamental CS concepts) for you
R for historians
https://programminghistorian.org/lessons/data_wrangling_and_management_in_R
R for scientists
https://rcompanion.org/rcompanion/d_10.html
R for social scientists
http://personality-project.org/r/psych/HowTo/factor.pdf
Part II: Why code?
What is pointing and clicking?
• Clicking, dragging, and using buttons and specified text boxes in user-friendly applications to do things
Drawbacks of pointing and clicking
• Usually keeps no automatic record of the steps you took
• The things you can do are limited by the number of buttons available
Example: Microsoft Word
• Suppose you had to edit a Microsoft Word document…
• Move paragraph 1 between paragraphs 3 and 4
• Relabel all paragraphs in chronological order
• Move paragraph 2 to the end
• Remove all paragraph labels
• Create headlines by boldfacing the first line of each paragraph and separating it from the rest of the paragraph with a single line break
• Indent each paragraph below headlines
• Change font to 12 pt. “Times New Roman”
Example: Microsoft Word, 2
• Troubles with this situation:
• Word’s capabilities are limited
• Lots of rearranging to do (human effort)
• No buttons available through Word that can automatically put paragraphs in the order you want
• No way to document all these steps while ensuring 100% accuracy
• Step-by-step instructions written in English can be ambiguous
• What if you had to hand this task over to someone else?
What is coding?
• The process of writing out a list of instructions for a computer to read, interpret, and do
Summary: pointing and clicking vs. coding
| | Pointing and clicking | Coding |
|---|---|---|
| SPECIFICITY | Limited by the number of buttons available | Can do things that can't be done through pointing and clicking alone |
| REPRODUCIBILITY | No built-in way to keep track of steps while minimizing error | By its nature, code keeps track of everything you do and saves these steps for future use |
Part III: R & RStudio
R and RStudio
• R is a programming language designed for statistical computing
• Uses code to allow you to efficiently reshape datasets, perform statistical tests, and create graphics
• RStudio is an integrated development environment (IDE) for R
• Offers point-and-click shortcuts for some R commands (e.g., importing a dataset, installing packages)
• Provides a user-friendly visual interface in which to code
R vs. RStudio
Open RStudio
Different ways to code
• Console
• Quickly enter temporary commands
• Script file
• A text file in which you save blocks of code you want to rerun later
Exercise: R as a calculator
• Type “5-3” into the console and hit the “Enter” key
• Things to note:
• “>” is the prompt showing where to enter your input; never type it in yourself!
• “[1]” is the position (index) of the first value printed on that line of output
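The same kind of arithmetic can be typed at the console prompt. In this minimal sketch, the comments show what the console prints for each line; the extra expressions beyond “5-3” are illustrative additions.

```r
# Arithmetic at the console: R evaluates each expression and prints the result
5 - 3        # console shows: [1] 2
5 * 3        # console shows: [1] 15
(5 - 3) / 4  # console shows: [1] 0.5
```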
Exercise: R as a calculator, 2
• Type “5-” into the console and hit the “Enter” key
• Observe what happens
• Type “3” and hit the “Enter” key
• “+” indicates R is expecting more input
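The same rule applies inside a script: a line ending in an operator is incomplete, so R keeps reading the next line. A small sketch (the object name `result` is just an illustrative choice):

```r
# The first line ends with "-", so the expression is incomplete.
# At the console you would see the "+" prompt here, waiting for the 3.
result <- 5 -
  3

result  # [1] 2
```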
Exercise: R as a calculator, 3.1
• Create a new R script
Script files (.R)
• Script files are how you save the R code you want to rerun later
• Anything after a “#” on a line is a comment, which R does not interpret as code
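A commented version of the calculator exercise might look like the sketch below; the comment text itself is only an example of what you could write.

```r
# Exercise: R as a calculator
# Lines that start with "#" are skipped when you run them
5 - 3  # text after "#" on a code line is ignored too
5 + 3
```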
Exercise: R as a calculator, 3.2
• Type “5-3” on the first line of your R script and “5+3” on the second line
Exercise: R as a calculator, 3.3
• Highlight the first line only and click the “Run” button
Exercise: R as a calculator, 3.4
• Examine the output in the “Console”
• Do the same for “5+3”
Concepts: the things you code
• Objects (NOUNS)
• The things you work with in R, e.g., datasets and the results of statistical analyses
• Functions (VERBS)
• The actions you perform in R, usually on objects
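To make the noun/verb idea concrete, here is a small sketch: one object holding some made-up ages, and three base R functions acting on it. The object name `ages` and its values are hypothetical.

```r
# Object (noun): a numeric vector of made-up ages, stored under a name
ages <- c(22, 38, 26, 35)

# Functions (verbs): actions applied to the object
length(ages)  # [1] 4
mean(ages)    # [1] 30.25
max(ages)     # [1] 38
```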
Functions in Excel
• Functions are indicated by their name, followed IMMEDIATELY by parentheses; in the formula below, SUM is the function
• Arguments go inside the parentheses; they are references to objects (in this case, the cells A1 and A2) or other values that tell the function what to do
=SUM(A1, A2)
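R's built-in `sum()` follows the same pattern as Excel's SUM. In this sketch, the objects `a1` and `a2` are hypothetical stand-ins for the values in cells A1 and A2.

```r
# Hypothetical stand-ins for the values in cells A1 and A2
a1 <- 4
a2 <- 6

# Same shape as =SUM(A1, A2): function name, then arguments in parentheses
sum(a1, a2)  # [1] 10
```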
Object creation
• Use “<-” to create new objects
• Type the object name into the console to get its value
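A minimal sketch of creating, printing, and updating an object; the name `difference` is an arbitrary example.

```r
# Create an object named difference that stores the result of 5 - 3
difference <- 5 - 3

# Typing the object's name prints its value
difference  # [1] 2

# Objects can be reused in later calculations, and reassigned
difference <- difference * 10
difference  # [1] 20
```

Reassigning with `<-` replaces the old value without warning, which is one reason to choose descriptive names.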
Exercise: objects & functions
• Add the following to your R script and run each line:
Pay attention to spacing!
Exercise: objects & functions, 2
• Examine the output and the “Environment”
Other types of objects
Packages
• A bundle of functions, data, and documentation that you can download (usually from CRAN) and use in R
• In RStudio you can load them by ticking checkboxes in the Packages pane, but we will do so using code
Summary: what you do in R
Using code…
1. Create objects (“nouns”)
2. Use functions (“verbs”) to do things to objects
Part IV: Working with Data in R
What is a CSV file?
• CSV stands for “comma-separated values”
• Preferred format for working with data in R
• Can be opened in Excel
• Why CSV over Excel format?
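• Base R cannot read .XLSX files without an extra package, and spreadsheet features such as multiple sheets and merged cells can complicate importing; a CSV file is plain text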
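Because a CSV file is plain text (one row per line, commas between values, usually a header row), R can read it directly. This sketch uses base R's `read.csv()` with its `text =` argument so no file is needed; the three rows are made up for illustration.

```r
# A tiny, made-up CSV as plain text: a header row, then one row per record
csv_text <- "name,age,fare
Ana,29,100
Ben,2,80
Cy,30,8"

# read.csv() parses comma-separated text into a data frame
people <- read.csv(text = csv_text)
people
str(people)  # shows each column's name and detected type
```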
CSV: Excel vs. text editor
Functions and datasets
• When working with datasets, it may be necessary to work with more complicated functions
• Arguments passed without a name (“positional”) are matched by their position, so they must appear in the order the function expects; named arguments (written with “=”) can appear in any order
read.table(datafile, header=TRUE, sep=",")
• read.table: function
• datafile: positional argument
• header=TRUE: named argument
• sep=",": named argument
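The sketch below checks that claim: it writes a small, made-up CSV to a temporary file, then reads it twice, once with `datafile` passed by position and once with every argument named in a different order. The column values are hypothetical.

```r
# Write a small, made-up CSV to a temporary file so the example is self-contained
datafile <- tempfile(fileext = ".csv")
writeLines(c("pclass,age", "1,29", "3,2"), datafile)

# datafile is positional: it is matched to read.table's first parameter, file.
# header and sep are named, so their order does not matter.
d1 <- read.table(datafile, header = TRUE, sep = ",")
d2 <- read.table(sep = ",", header = TRUE, file = datafile)

identical(d1, d2)  # [1] TRUE
```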
Other function examples
help()
library(ggplot2)
read.table(datafile, header=TRUE, sep=",")
read.table(datafile,header=TRUE,sep=",")
ggplot(mydata, aes(age, fare)) + geom_point(aes(color = factor(survived)))
Script and dataset management
• For every project, create a dedicated folder on your computer in which to store all your datasets (.CSV files) and scripts (.R files)
Set working directory
• Point R to your project folder so it can find your files by name
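In code, base R's `getwd()` and `setwd()` do this (in RStudio, the Session > Set Working Directory menu does the same). In this sketch, `tempdir()` is only a stand-in for a real project folder; the commented path is hypothetical.

```r
# Check where R is currently looking for files
getwd()

# Point R at the project folder. tempdir() is a stand-in here; you would use
# your own path instead, e.g. setwd("C:/Users/you/r_workshop") (hypothetical)
old_wd <- setwd(tempdir())  # setwd() invisibly returns the previous folder
getwd()

# Files in the working directory can now be referred to by name alone
list.files()

setwd(old_wd)  # return to the original folder
```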
Save R script
NO NEED TO SPECIFY FILE EXTENSION! RStudio adds “.R” automatically.
Exercise: Load in data
Load in data, 2
Load in data, 3
Load in data, 4
Copy & paste into R script
Load in data, 5
• Highlight and run each line
• Examine output in the CONSOLE
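The exact import code from the workshop isn't reproduced here. As one base R alternative, `read.csv()` loads a CSV into a data frame. To keep this sketch self-contained it first writes a few made-up rows to a temporary file; in practice you would pass your own file name (the `"titanic.csv"` in the comment is hypothetical).

```r
# Write a few made-up rows to a temporary CSV. With your own data you would
# skip this and call, e.g., read.csv("titanic.csv") (hypothetical file name).
csv_path <- file.path(tempdir(), "passengers_demo.csv")
writeLines(c(
  "pclass,gender,age,fare,survived",
  "1,female,29,100,1",
  "2,male,40,30,0",
  "3,male,22,8,0"
), csv_path)

# Load the CSV into a data frame object named titanic_r
titanic_r <- read.csv(csv_path)

nrow(titanic_r)    # number of rows loaded
# View(titanic_r)  # in RStudio, opens the same data viewer as the Environment pane
```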
Load in data, 6
• You can close the dataset viewer tab and click “titanic_r” in the Environment pane to open it again
Installing and loading packages
• To install a package, use the install.packages() function followed by the package name in double quotes
• To load a package, use the library() function followed by the package name without quotes
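A common pattern is to install a package only when it is missing, then load it. This is a sketch, not the workshop's exact code: `requireNamespace()` checks whether ggplot2 is installed, and `repos` names a CRAN mirror so `install.packages()` also works outside an interactive session. It needs internet access if ggplot2 is not yet installed.

```r
# Install ggplot2 only if it is not already available (requires internet access)
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}

# Load it for this session; no quotes needed around the name in library()
library(ggplot2)
```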
Using the ggplot2 package
• Copy and paste the following text into your script (after inserting some line breaks):
# install.packages("ggplot2")
library(ggplot2)
• REMEMBER: the “#” indicates a comment that is not interpreted by R as code
• We left this function as a comment because ggplot2 is already installed
Exercise: Exploratory data analysis
• Copy and paste the following lines of code into your script (after inserting some line breaks):
head(titanic_r)
str(titanic_r)
summary(titanic_r$gender)
table(titanic_r$pclass, titanic_r$gender)
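Here is what each of those functions does, sketched on a small made-up data frame rather than the workshop data. One caveat: `summary()` counts categories only when a column is a factor. Since R 4.0.0, `read.csv()` keeps text columns as character by default, and `summary()` on a character column reports only its length, class, and mode, so this sketch converts `gender` with `factor()` first.

```r
# A small, made-up data frame standing in for titanic_r
passengers <- data.frame(
  pclass = c(1, 1, 2, 3, 3, 3),
  gender = c("female", "male", "female", "male", "male", "female")
)

head(passengers, 3)  # first 3 rows (the default is 6)
str(passengers)      # column names and types

# summary() counts categories only for factors, so convert the text column first
passengers$gender <- factor(passengers$gender)
summary(passengers$gender)  # female: 3, male: 3

# Two-way table of counts: rows are pclass values, columns are gender levels
table(passengers$pclass, passengers$gender)
```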
Exercise: Create graphs
• Copy and paste the following lines of code into your script (after inserting some line breaks):
qplot(pclass, fill = gender, data = titanic_r)
ggplot(titanic_r, aes(age, fare)) + geom_point(aes(color = factor(survived)))
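Recent versions of ggplot2 (3.4.0 and later) deprecate `qplot()`, so here is a sketch of both charts written with `ggplot()`. The data frame `demo` is made up for illustration; `geom_bar()` counts rows per category, and wrapping `pclass` in `factor()` makes it a set of categories rather than a continuous number.

```r
library(ggplot2)

# Made-up rows standing in for titanic_r
demo <- data.frame(
  pclass   = c(1, 1, 2, 3, 3, 3),
  gender   = c("female", "male", "female", "male", "male", "female"),
  age      = c(29, 48, 24, 30, 19, 22),
  fare     = c(90, 60, 26, 8, 7, 9),
  survived = c(1, 0, 1, 0, 0, 1)
)

# Bar chart: number of passengers per class, bars filled by gender
bars <- ggplot(demo, aes(x = factor(pclass), fill = gender)) +
  geom_bar()

# Scatter plot: fare against age, points colored by survival
scatter <- ggplot(demo, aes(age, fare)) +
  geom_point(aes(color = factor(survived)))

print(bars)
print(scatter)
```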
Things to remember
• R is case-sensitive
• Don't put a space between a function name and its opening parenthesis (R allows it, but it is harder to read)
• Comment every block of code
• Leave a blank line between code blocks
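A quick sketch of what case-sensitivity means in practice; the object names are arbitrary examples.

```r
# R is case-sensitive: Age and age are two different objects
Age <- 30
age <- 25
Age == age     # [1] FALSE

# Different capitalization does not match an existing object
exists("AGE")  # [1] FALSE
```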
For more coding practice:
http://infoguides.gmu.edu/learn_r/101
Workshop resources:
https://dataservices.gmu.edu/workshops/r