R: A Gentle Introduction
Vega Bharadwaj | George Mason University Data Services
Part I: Why R?
What do YOU know about R and why do you want to learn it?
Reasons to use R
• Free and open-source
• User-created “packages”, free to download, greatly extend what you can do in R
• Highly customizable graphing capabilities
• Lots of free documentation and tutorials available on the web
• Beware of bad documentation, too
• You don’t need to be a computer scientist to code in R
• Packages do a lot of the “programming” (applying fundamental CS concepts) for you
R for historians
https://programminghistorian.org/lessons/data_wrangling_and_management_in_R
R for scientists
https://rcompanion.org/rcompanion/d_10.html
R for social scientists
http://personality-project.org/r/psych/HowTo/factor.pdf
Part II: Why code?
What is pointing and clicking?
• Clicking, dragging, and using buttons and specified text boxes in user-friendly applications to do things
Drawbacks of pointing and clicking
• Does not usually involve an automatic way to keep track of all of your steps
• The things you can do are limited by the number of buttons available
Example: Microsoft Word
• Suppose you had to edit a Microsoft Word document…
• Move paragraph 1 between paragraphs 3 and 4
• Relabel all paragraphs in chronological order
• Move paragraph 2 to the end
• Remove all paragraph labels
• Create headlines by boldfacing the first line of each paragraph and separating it from the rest of the paragraph with a single line break
• Indent each paragraph below headlines
• Change font to 12 pt. “Times New Roman”
Example: Microsoft Word, 2
• Troubles with this situation:
• Word’s capabilities are limited
• Lots of rearranging to do (human effort)
• No buttons available through Word that can automatically put paragraphs in the order you want
• No way to document all these steps while ensuring 100% accuracy
• Instructions written in plain English can be ambiguous
• What if you had to hand this task over to someone else?
What is coding?
• The process of writing out a list of instructions for a computer to read, interpret, and do
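As a rough illustration, the first three steps of the Word example could be written as code. This is a minimal sketch in R, the language this workshop uses. The paragraph text and the object names (paragraphs, after_move, final_order) are made up for the example, and “paragraph 2” is assumed to mean the second paragraph after relabeling.

```r
# Four stand-in paragraphs (hypothetical text)
paragraphs <- c("first", "second", "third", "fourth")

# Move paragraph 1 between paragraphs 3 and 4
after_move <- paragraphs[c(2, 3, 1, 4)]

# Relabeling is automatic: the positions are simply 1, 2, 3, 4 again.
# Move the paragraph now in position 2 to the end
final_order <- after_move[c(1, 3, 4, 2)]

# Resulting order: second, first, fourth, third
final_order
```

Every step is written down, so the same instructions can be rerun or handed to someone else without ambiguity.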
Summary: pointing and clicking vs. coding
SPECIFICITY
• Pointing and clicking: Limited by number of buttons available
• Coding: Do certain things that can’t be done through pointing and clicking alone
REPRODUCIBILITY
• Pointing and clicking: No innate way to keep track of steps while minimizing error
• Coding: By nature of coding, keep track of everything you do and save these steps for future use
Part III: R & RStudio
R and RStudio
• R is a programming language for statisticians
• Uses code to allow you to efficiently reshape datasets, perform statistical tests, and create graphics
• RStudio is an integrated development environment (IDE) for R
• Translates some R commands into point-and-click features
• Provides a user-friendly visual interface in which to code
R vs. RStudio
Open RStudio
Different ways to code
• Console
• Quickly enter temporary commands
• Script file
• A text document in which you save blocks of code you will want to rerun later
Exercise: R as a calculator
• Type “5-3” into the console and hit the “Enter” key
• Things to note:
• “>” indicates where you should enter your input—never type this in yourself!
• “[1]” indicates that the value beside it is the first element of the output
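For reference, the exchange looks roughly like the sketch below. The line starting with “#” shows what R prints and is not typed in; the “>” prompt is left out because R supplies it.

```r
5 - 3
# [1] 2
```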
Exercise: R as a calculator, 2
• Type “5-” into the console and hit the “Enter” key
• Observe what happens
• Type “3” and hit the “Enter” key
• “+” indicates R is expecting more input
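The same continuation rule applies in a script: an expression that ends with an operator carries on to the next line. A minimal sketch:

```r
# The first line ends with "-", so R keeps reading on the next line
5 -
  3
# [1] 2
```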
Exercise: R as a calculator, 3.1
• Create a new R script
Script files (.R)
• Script files are how you save the R code you want to rerun later
• Every line that begins with “#” is a comment, not interpreted by R as code
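A short sketch of how comments look in a script. The arithmetic is arbitrary; the point is that R ignores everything after a “#”, whether the comment fills the whole line or follows code on the same line.

```r
# A full-line comment: R skips this line entirely
2 * 3   # text after "#" on the same line as code is also skipped
# [1] 6
```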
Exercise: R as a calculator, 3.2
• Type “5-3” into your R script, then type “5+3” on the next line
Exercise: R as a calculator, 3.3
• Highlight the first line only and click the “Run” button
Exercise: R as a calculator, 3.4
• Examine the output in the “Console”
• Do the same for “5+3”
Concepts: the things you code
• Objects (NOUNS)
• The things you work with in R, e.g. datasets and the results of statistical analyses
• Functions (VERBS)
• The actions you perform in R, usually on objects
Functions in Excel
• Functions are indicated by their name, followed IMMEDIATELY by parentheses (SUM in the example below)
• Arguments are references to objects (in this case, specific cells) or other types of descriptors that provide information to the function
=SUM(A1, A2)
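R functions follow the same name-plus-parentheses pattern as Excel. A brief sketch using two functions from base R, sum() and round(); the numbers are arbitrary.

```r
# Function name, then parentheses containing the arguments
sum(5, 3)
# [1] 8

# "digits = 2" is an argument telling round() how many decimals to keep
round(3.14159, digits = 2)
# [1] 3.14
```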
Object creation
• Use “<-” to create new objects
• Type the object name into the console to get its value
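A minimal sketch of object creation. The object names difference and total are made up for illustration.

```r
# Store the result of 5 - 3 in an object called "difference"
difference <- 5 - 3

# Typing the name prints the stored value
difference
# [1] 2

# Objects can be handed to functions as arguments
total <- sum(difference, 10)
total
# [1] 12
```

After these lines run, both objects also appear in RStudio's “Environment” pane.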
Exercise: objects & functions
• Add the following to your R script and run each line:
Pay attention to spacing!
Exercise: objects & functions, 2
• Examine the output and the “Environment”
Other types of objects
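The slide's own examples are not reproduced in this text. As an assumption about what it covered, the sketch below shows a few object types commonly met in R; all names and values are made up.

```r
ages <- c(29, 35, 42)                  # numeric vector
first_names <- c("Ann", "Ben", "Cleo") # character vector
is_adult <- ages >= 18                 # logical vector (TRUE/FALSE values)

# A data frame: a table whose columns are vectors of equal length
people <- data.frame(name = first_names, age = ages)

class(ages)     # "numeric"
class(people)   # "data.frame"
```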
Packages
• A set of functions and object templates available to download and use directly through RStudio
• In RStudio, you can load them by ticking checkboxes, but we will do so using code
Summary: what you do in R
Using code…
1. Create objects (“nouns”)
2. Use functions (“verbs”) to do things to objects
Part IV: Working with Data in R
What is a CSV file?
• CSV stands for “comma-separated values”
• Preferred format for working with data in R
• Can be opened in Excel
• Why CSV over Excel format?
• .XLSX files can cause problems in R: base R has no built-in function for reading them, so extra packages are needed
CSV: Excel vs. text editor
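The Excel and text-editor screenshots are not reproduced here. As a stand-in, this sketch writes a tiny CSV file with made-up values to a temporary location, prints it the way a text editor would show it, and then reads it into R as a table. The file contents and the names csv_path and passengers are assumptions for illustration.

```r
# Three lines of comma-separated text: a header row and two data rows
csv_lines <- c("name,age,fare", "Alice,29,211.34", "Bob,35,26.55")

# Save them to a temporary .csv file
csv_path <- tempfile(fileext = ".csv")
writeLines(csv_lines, csv_path)

# Text-editor view: plain text with commas between values
cat(readLines(csv_path), sep = "\n")

# Spreadsheet-like view: R splits each line at the commas
passengers <- read.csv(csv_path)
passengers
```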
Functions and datasets
• When working with datasets, it may be necessary to work with more complicated functions
• Arguments without an equals sign (“positional”) must always be in the same spot whenever the function is called
read.table(datafile, header=TRUE, sep=",")
• Function: read.table
• Positional argument: datafile
• Named argument: header=TRUE
• Named argument: sep=","
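A small sketch of the difference between positional and named arguments. It writes a made-up two-row file to a temporary location so that read.table() has something to read; the data and the object names a and b are assumptions for illustration.

```r
# Write a tiny comma-separated file to a temporary location (made-up data)
datafile <- tempfile(fileext = ".csv")
writeLines(c("age,fare", "29,211.34", "35,26.55"), datafile)

# Positional argument first, then the named arguments
a <- read.table(datafile, header = TRUE, sep = ",")

# Named arguments can swap places; the positional one stays in its spot
b <- read.table(datafile, sep = ",", header = TRUE)

# Both calls produce the same table
identical(a, b)
# [1] TRUE
```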
Other function examples
help()
library(ggplot2)
read.table(datafile, header=TRUE, sep=",")
read.table(datafile,header=TRUE,sep=",")
ggplot(mydata, aes(age, fare)) + geom_point(aes(color =factor(survived)))
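When a function call looks unfamiliar, R can describe the function for you. A brief sketch using read.table() from the examples above; args() and help() are both part of base R.

```r
# List a function's arguments and their default values
args(read.table)

# Open the help page (these two lines do the same thing)
help("read.table")
?read.table
```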
Script and dataset management
• For every project, you must create a unique folder on your computer in which to store all your datasets (.CSV files) and scripts (.R files)
Set working directory
• Direct R to the right file folder
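The working directory can also be set with code instead of menus. A minimal sketch using base R's getwd() and setwd(); the folder name r_workshop is hypothetical and is created in a temporary location so the example does not touch your own files.

```r
# Create a stand-in project folder (hypothetical name, temporary location)
project_dir <- file.path(tempdir(), "r_workshop")
dir.create(project_dir, showWarnings = FALSE)

# Point R at the folder; setwd() returns the previous folder
old_dir <- setwd(project_dir)

# Check where R is now looking, and what files it can see there
getwd()
list.files()

# Go back to the folder R was using before
setwd(old_dir)
```

In your own project, replace project_dir with the path to the folder that holds your .CSV and .R files.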
Save R script
NO NEED TO SPECIFY FILE EXTENSION!
Exercise: Load in data
Load in data, 2
Load in data, 3
Load in data, 4
Copy & paste into R script
Load in data, 5
• Highlight and run each line
• Examine output in the CONSOLE
Load in data, 6
• You can close the dataset tab and click “titanic_r” in ENVIRONMENT to open it up again
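The code pasted on these slides is not reproduced in this text. Below is a minimal sketch of what loading the data might look like. The file name titanic_r.csv is an assumption (only the object name titanic_r comes from the slides), and the file is expected to sit in the working directory.

```r
# Assumed file name; change it to match the file in your project folder
data_file <- "titanic_r.csv"

if (file.exists(data_file)) {
  # Read the CSV file and store it as an object called titanic_r
  titanic_r <- read.csv(data_file)
  # Show the first six rows in the console
  head(titanic_r)
  # View(titanic_r)  # in RStudio, opens the dataset in its own tab
} else {
  message("Could not find ", data_file, " in ", getwd())
}
```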
Installing and loading packages
• To install a package, use the install.packages() function followed by the package name in double quotes
• To load a package, use the library() function followed by the package name without quotes
Using the ggplot2 package
• Copy and paste the following text into your script (after inserting some line breaks):
# install.packages("ggplot2")
library(ggplot2)
• REMEMBER: the “#” indicates a comment that is not interpreted by R as code
• We left this function as a comment because ggplot2 is already installed
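If you are not sure whether a package is installed, one option is to check before loading it. This is a sketch rather than the workshop's own method; it relies on base R's requireNamespace(), and the object name pkg is made up.

```r
pkg <- "ggplot2"

if (requireNamespace(pkg, quietly = TRUE)) {
  # The package is installed, so load it
  library(pkg, character.only = TRUE)
} else {
  # The package is missing; say how to get it instead of failing
  message("Package not installed. Run install.packages(\"", pkg, "\") first.")
}
```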
Exercise: Exploratory data analysis
• Copy and paste the following lines of code into your script (after inserting some line breaks):
head(titanic_r)
str(titanic_r)
summary(titanic_r$gender)
table(titanic_r$pclass, titanic_r$gender)
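To show what these functions do without the workshop file, the sketch below runs them on a six-row stand-in dataset. The values are made up; only the column names (pclass, gender, age, fare, survived) come from the slides. One thing to watch: in R 4.0.0 and later, read.csv() keeps text columns as character rather than factor, so summary() on a text column reports only its length and type. Wrapping the column in factor() gives counts instead.

```r
# Stand-in data (made-up values) with the same column names as titanic_r
titanic_demo <- data.frame(
  pclass   = c(1, 1, 2, 3, 3, 3),
  gender   = c("female", "male", "female", "male", "male", "female"),
  age      = c(29, 35, 42, 22, 30, 18),
  fare     = c(211.34, 151.55, 26.00, 7.25, 8.05, 7.75),
  survived = c(1, 0, 1, 0, 0, 1),
  stringsAsFactors = FALSE
)

head(titanic_demo, 3)     # first three rows
str(titanic_demo)         # column names, types, and a preview of values
summary(titanic_demo$age) # minimum, quartiles, mean, maximum

# Counts per category for a text column
summary(factor(titanic_demo$gender))

# Cross-tabulation: passenger class (rows) by gender (columns)
counts <- table(titanic_demo$pclass, titanic_demo$gender)
counts
```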
Exercise: Create graphs
• Copy and paste the following lines of code into your script (after inserting some line breaks):
qplot(pclass, fill=gender, data=titanic_r)
ggplot(titanic_r, aes(age, fare)) + geom_point(aes(color = factor(survived)))
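The scatterplot above can be built up in steps. This sketch assumes ggplot2 is installed and uses a small stand-in dataset with made-up values; the object names titanic_demo and p are hypothetical.

```r
library(ggplot2)

# Stand-in data (made-up values) with the columns the plot needs
titanic_demo <- data.frame(
  age      = c(29, 35, 42, 22, 30, 18),
  fare     = c(211.34, 151.55, 26.00, 7.25, 8.05, 7.75),
  survived = c(1, 0, 1, 0, 0, 1)
)

# Start with the data and the x/y columns, then add a layer of points
# colored by whether each passenger survived
p <- ggplot(titanic_demo, aes(age, fare)) +
  geom_point(aes(color = factor(survived)))

# Add readable axis and legend titles
p <- p + labs(x = "Age", y = "Fare", color = "Survived")

# Printing the object draws the plot
p
```

Saving the plot as an object makes it easy to add layers or labels later without retyping the whole command.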
Things to remember
• R is case-sensitive
• By convention, leave no space between a function name and its opening parenthesis
• Comment every block of code
• Leave line breaks after every code block
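A short sketch of the first reminder, case sensitivity, written in the commented, spaced-out style the other reminders describe. The object name my_value is made up.

```r
# Store a value under a lower-case name
my_value <- 10

# R is case-sensitive: "my_value" and "My_Value" are different names
exists("my_value")
# [1] TRUE
exists("My_Value")
# [1] FALSE
```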
For more coding practice:
http://infoguides.gmu.edu/learn_r/101
Workshop resources:
https://dataservices.gmu.edu/workshops/r