Intro to R
Stephanie Lee
Dept of Sociology, CSSCR
University of Washington
September 2009
Class Outline
I. What is R?
II. The R Environment
III. Reading in Data
IV. Viewing and Manipulating Data
V. Data Analysis
What is R?
R is frequently thought of as another statistics package, like SPSS, Stata or SAS.
While many people use R for statistical analysis, R is actually a full programming environment.
R is completely command-driven.
There are very few menu items, so you must use the R language to do anything.
Another important distinction between traditional stats packages and R is that R is object-oriented.
Why Use R?
Free!
Extremely flexible
Many additional packages available
Excellent graphics
Disadvantages
Steep learning curve
Difficult data entry
Download R
Download R: http://cran.r-project.org
Available for Linux, MacOS, and Windows
The R Environment
A traditional stats program like SPSS or Stata only contains one rectangular dataset at a time. All analysis is done on the current dataset.
In contrast, the R environment is like a sandbox.
It can contain a large number of different objects.
R is also function-driven. Functions act on objects and return objects. Functions themselves are objects, too!
[Diagram: input arguments (objects) -> function works its black-box magic! -> output (objects)]
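As a small illustration of functions taking objects in and handing objects back, here is a minimal sketch. The names x, m, and average are made up for this example.

```r
# A function takes objects as arguments and returns an object.
x <- c(2, 4, 6)     # input object: a numeric vector
m <- mean(x)        # output object: a single number, saved as m
m

# Functions are objects too, so they can be assigned to a new name
# and passed to other functions as arguments.
average <- mean                           # 'average' is a hypothetical name
average(x)
class(mean)                               # the class of a function object
sapply(list(a = 1:3, b = 4:6), average)   # a function passed as an argument
```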
[Diagram: Rectangular Dataset (Excel, SPSS, Stata, SAS): a single table with Cases 1-5 as rows and Variables 1-3 as columns]
[Diagram: R Environment (Object-Oriented): many separate objects side by side, such as Function 1, Function 2, Results, Vector 1, Vector 2, Matrix, Data Frame, String, Numeric Value]
Help Function
help(function name)
help.search("search term")
Note: R is case-sensitive!
Try: help(help), ls()
Sometimes one help file will contain information for several functions.
Usage: Shows syntax for command and required arguments (input) and any default values for arguments.
Value: the output object of the function
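The information in the Usage and Value sections can also be inspected from the console. The sketch below uses sd() purely as an arbitrary example; the name result is made up.

```r
args(sd)                    # prints the argument list with any default values
names(formals(sd))          # the argument names only
formals(sd)$na.rm           # the default value of the na.rm argument

result <- sd(c(2, 4, 6))    # the "Value": the object the function returns
class(result)
```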
Setting Up Our Data
> library(datasets)
> mtcars
> ?mtcars
> write.csv(mtcars, "C:/temp/cars.csv")
Creating Objects
Assignment operator: = or <-
Objects need to be assigned a name; otherwise the result is only printed to the console and is not saved in the environment.
c() is a useful function for creating vectors
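A minimal sketch of the difference between printing a result and saving it. The object names and values here are invented for illustration.

```r
ages <- c(23, 35, 41)             # assignment with <-
people = c("Ann", "Bob", "Cy")    # assignment with = works the same way here

c(1, 2, 3) * 2                    # not assigned: printed, then gone
doubled <- c(1, 2, 3) * 2         # assigned: kept in the environment

ls()                              # lists the objects that now exist
doubled
```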
Reading in Data
read.table(filename, ...)
> cars = read.csv("C:/temp/cars.csv")
I prefer the CSV (comma-separated values) format. Almost every stats program will export to this format.
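The round trip from R to a CSV file and back can be sketched as follows. This example writes to a temporary folder rather than C:/temp so that it runs on any system; csv_path and cars2 are names made up for the sketch.

```r
library(datasets)

# Assumed location: a temporary folder instead of C:/temp
csv_path <- file.path(tempdir(), "cars.csv")
write.csv(mtcars, csv_path)

# read.csv() is read.table() with defaults suited to CSV files
cars <- read.csv(csv_path)

# Roughly the same call written out with read.table()
cars2 <- read.table(csv_path, header = TRUE, sep = ",")

dim(cars)
dim(cars2)
colnames(cars)[1]   # the saved row names come back as a column
```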
Viewing Data
What does the dataset look like?
> str(cars)
> colnames(cars)
> dim(cars)
> nrow(cars)
> ncol(cars)
You can also assign column names with colnames() (and row names with rownames()).
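A short sketch of these viewing functions, including assignment of row names. To keep the example self-contained, cars is rebuilt from mtcars as a stand-in for the CSV file read in earlier; cars_named is a made-up name.

```r
# Stand-in for the data frame read from the CSV file
cars <- data.frame(X = rownames(mtcars), mtcars, stringsAsFactors = FALSE)
rownames(cars) <- NULL

str(cars)          # compact summary of every column
dim(cars)          # number of rows, number of columns
nrow(cars)
ncol(cars)
colnames(cars)

# The same functions can be used on the left side of an assignment
cars_named <- cars
rownames(cars_named) <- cars_named$X
head(cars_named, 3)
```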
Common Mode Types
Mode        Possible Values
Logical     TRUE or FALSE or NA
Integer     Whole numbers
Numeric     Real numbers
Character   Single character or string (in double quotes)
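One way to check what kind of value you have is class(). A minimal sketch:

```r
class(TRUE)       # logical
class(NA)         # NA on its own is also logical
class(3L)         # integer (the L suffix marks a whole number)
class(3.14)       # numeric
class("hello")    # character
```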
Common Object Types
Object       Modes                        More than one mode?
vector       Logical, Char, or Numeric    No
factor       Logical, Char, or Numeric    No
matrix       Logical, Char, or Numeric    No
data frame   Logical, Char, and Numeric   Yes
Creating Objects
Object       Create Function
vector       c(), vector()
factor       factor()
matrix       matrix()
data frame   data.frame()
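The two tables above can be tried out directly. In this sketch all object names and values are invented; note how a vector is forced into a single mode while a data frame keeps a different mode per column.

```r
v <- c(1, "a", TRUE)       # mixed input is coerced to one mode
class(v)                   # everything became character

f <- factor(c("low", "high", "low"))
levels(f)                  # the distinct categories

m <- matrix(1:6, nrow = 2) # 2 rows, 3 columns, all integer
dim(m)

df <- data.frame(id = 1:2,
                 label = c("a", "b"),
                 flag = c(TRUE, FALSE),
                 stringsAsFactors = FALSE)
sapply(df, class)          # each column keeps its own mode
```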
Viewing Data: Indexing
datasetname[rownum, columnnum]
> cars[1,4]     displays value at row 1, column 4
> cars[2:5, 6]  displays rows 2-5, column 6
> cars[, 2]     displays all rows, column 2
> cars[4,]      displays row 4, all columns
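A runnable sketch of the four indexing patterns. As an assumption for self-containment, cars is rebuilt from mtcars in the same shape as the CSV file read in earlier.

```r
# Stand-in for the data frame read from the CSV file
cars <- data.frame(X = rownames(mtcars), mtcars, stringsAsFactors = FALSE)
rownames(cars) <- NULL

cars[1, 4]       # one value: row 1, column 4
cars[2:5, 6]     # a vector: rows 2-5 of column 6
cars[, 2]        # a vector: all rows of column 2
cars[4, ]        # a one-row data frame: row 4, all columns

# Columns can also be picked by name instead of by position
cars[1:3, c("X", "mpg")]
```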
Viewing Data
You can also access columns (variables) using the ‘$’ symbol if the data frame has column names:
> cars$mpg
> cars$wt
Manipulating Data
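When we wrote the CSV file, the row names (the car names) were saved as an unnamed first column, which read.csv() labeled “X”. Now we can give that first column (variable) a better name than “X”.
> colnames(cars) = c("name", colnames(cars)[2:ncol(cars)])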
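An equivalent and slightly shorter way to rename only the first column is to assign into a single element of colnames(). This is a sketch, with cars rebuilt from mtcars as a stand-in for the CSV file read in earlier.

```r
# Stand-in for the data frame read from the CSV file
cars <- data.frame(X = rownames(mtcars), mtcars, stringsAsFactors = FALSE)
rownames(cars) <- NULL

colnames(cars)[1]             # the name given by read.csv()
colnames(cars)[1] <- "name"   # replace just that one name
colnames(cars)
```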
> str(cars)
R has the unfortunate habit (by default in versions before 4.0.0) of turning columns of character strings into factors (categorical data) when it reads in data.
> cars$name = as.character(cars$name)
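The conversion can be seen in a small sketch. Here the factor behavior is switched on explicitly with stringsAsFactors = TRUE so the example behaves the same in any R version; cars is built from mtcars as a stand-in.

```r
# Force the older default so the name column becomes a factor
cars <- data.frame(name = rownames(mtcars), mtcars, stringsAsFactors = TRUE)
rownames(cars) <- NULL
class(cars$name)                        # factor

cars$name <- as.character(cars$name)    # convert back to plain strings
class(cars$name)                        # character

# read.csv() accepts the same argument, so the conversion can be
# avoided at read time with read.csv(path, stringsAsFactors = FALSE)
```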
Manipulating Data: Operators
Arithmetic: + - * / ^
Comparison:
<      less than
>      greater than
<=     less than or equal to
>=     greater than or equal to
==     is equal to
!=     is not equal to
Logical:
!      not
&      and
|      or
xor()  exclusive or
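A brief sketch of these operators applied to a small made-up vector x. Comparison and logical operators work element by element and return logical vectors.

```r
2 + 3 * 4 ^ 2          # usual precedence: ^ first, then *, then +

x <- c(1, 5, 10)       # hypothetical values
x > 4                  # FALSE TRUE TRUE
x == 5                 # FALSE TRUE FALSE
x != 5                 # TRUE FALSE TRUE

!(x > 4)               # not
x > 4 & x < 10         # and
x < 2 | x > 9          # or
xor(TRUE, FALSE)       # exclusive or: exactly one side is TRUE
```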
Manipulating Data
Viewing subsets of data using column names and operators:
> cars[cars$vs == 1,]
> cars[cars$cyl >= 6,]
> cars$name[cars$hp > 100]
> cars$name[cars$wt > 3]
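What makes these subsets work is that the comparison produces a logical vector, one TRUE or FALSE per row, which is then used as the row index. A sketch, with cars rebuilt from mtcars as a stand-in and is_heavy as a made-up name:

```r
# Stand-in for the renamed data frame
cars <- data.frame(name = rownames(mtcars), mtcars, stringsAsFactors = FALSE)
rownames(cars) <- NULL

is_heavy <- cars$wt > 3          # logical vector, one entry per car
head(is_heavy)
sum(is_heavy)                    # TRUE counts as 1, so this counts matches

cars$name[is_heavy]              # same result as cars$name[cars$wt > 3]

# Conditions can be combined, and columns selected by name
cars[cars$cyl >= 6 & cars$mpg > 20, c("name", "mpg", "cyl")]
```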
Analyzing Data
What do the variables look like?
> table(cars$gear)
> hist(cars$qsec)
> mean(cars$mpg)
> sd(cars$mpg)
> cor(cars$mpg, cars$wt)
> mean(cars$mpg[cars$cyl == 4])
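The last command computes a mean for one subgroup. As a sketch of one possible extension, tapply() computes the same kind of mean for every group at once. Here cars is simply a copy of mtcars, and by_cyl is a made-up name.

```r
cars <- mtcars                        # stand-in for the data read from CSV

table(cars$gear)                      # counts per category
mean(cars$mpg)
sd(cars$mpg)
cor(cars$mpg, cars$wt)

mean(cars$mpg[cars$cyl == 4])         # mean mpg for 4-cylinder cars only

# Mean mpg for each value of cyl in one call
by_cyl <- tapply(cars$mpg, cars$cyl, mean)
by_cyl
```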
Manipulating Data
Transforming variables:
> wt.lb = cars$wt * 1000
In mtcars, wt is recorded in thousands of pounds, so this converts it to pounds. It creates a new vector called wt.lb of length 32 (our number of cases).
We can use wt.lb without “adding” it to our dataframe.
But if you like the rectangular dataset concept, you can column bind it to the existing dataframe:
> cars = cbind(cars, wt.lb)
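A sketch of both approaches side by side: keeping wt.lb as a free-standing vector, and attaching it to the data frame. The $ assignment shown at the end is an alternative to cbind(), not something from the slides; cars and cars_bound are stand-in names.

```r
cars <- mtcars                    # stand-in for the data read from CSV

wt.lb <- cars$wt * 1000           # free-standing vector, one value per car
length(wt.lb)
mean(wt.lb)                       # usable without being in the data frame

cars_bound <- cbind(cars, wt.lb)  # column bind, as on the slide
ncol(cars_bound)

cars$wt.lb <- cars$wt * 1000      # alternative: create the column directly
identical(cars_bound$wt.lb, cars$wt.lb)
```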
Data Analysis
Hypothesis Testing: t.test(), prop.test()
Regression: lm(), glm()
Data Analysis: OLS Regression
> regr = lm(cars$mpg ~ wt.lb + cars$hp + cars$cyl)
The output of the regression is also an object. We’ve named it regr.
> summary(regr)
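Because the regression result is an object, it can be inspected piece by piece. The sketch below fits the same model using the data argument of lm(), which lets the formula refer to columns by bare name; this is a stylistic alternative to the cars$ form above, and cars is a stand-in built from mtcars.

```r
cars <- mtcars                          # stand-in for the data read from CSV
cars$wt.lb <- cars$wt * 1000

# Same model, with columns looked up inside 'cars'
regr <- lm(mpg ~ wt.lb + hp + cyl, data = cars)

class(regr)          # the kind of object lm() returns
names(regr)          # the components stored inside it
coef(regr)           # just the estimated coefficients
summary(regr)        # the full regression table
```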
Saving Data
You can use write.csv() or write.table() to save your dataset.
When you quit R, it will ask if you want to save the workspace. This includes all the objects you have created, but it does not include the code you’ve written. You can also use save.image() to save the workspace.
You should always save your code in a *.r file.
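A sketch of the saving options. The file names and the temporary folder are assumptions made so the example runs anywhere; save() and load() are shown as a way to store selected objects rather than the whole workspace.

```r
out_dir <- tempdir()   # assumed location; use your own project folder

cars <- data.frame(name = rownames(mtcars), mtcars, stringsAsFactors = FALSE)
rownames(cars) <- NULL

# Save the dataset; row.names = FALSE avoids an extra unnamed column
csv_file <- file.path(out_dir, "cars_clean.csv")
write.csv(cars, csv_file, row.names = FALSE)

# Save one object, or the whole workspace
obj_file <- file.path(out_dir, "cars.RData")
save(cars, file = obj_file)
save.image(file.path(out_dir, "workspace.RData"))

# Objects saved this way can be restored later with load()
rm(cars)
load(obj_file)
dim(cars)
```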
Other Useful Functions
> ifelse()
> is.na()
> match()
> match()
> merge()
> apply()
> order()
> sort()
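A quick taste of each of these functions on tiny made-up objects (x, m, df1, and df2 are invented for this sketch):

```r
x <- c(3, NA, 1, 2)

is.na(x)                       # which values are missing
ifelse(is.na(x), 0, x)         # replace missing values with 0
sort(x)                        # sorted values; NA is dropped by default
order(x)                       # positions that would sort x; NA goes last

match("b", c("a", "b", "c"))   # position of the first match

m <- matrix(1:6, nrow = 2)
apply(m, 1, sum)               # apply sum() to each row

df1 <- data.frame(id = 1:3, a = c("x", "y", "z"))
df2 <- data.frame(id = 2:4, b = c(TRUE, FALSE, TRUE))
merged <- merge(df1, df2, by = "id")   # keeps ids found in both
merged
```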
Other Resources
Main R website: http://www.r-project.org
UW CSSS Intro to R
UW CSDE Intro to R
UCLA Statistical Computing: http://www.ats.ucla.edu/stat
Advanced Topics
More on factors
Lists (data type)
Loops
String manipulation
Writing your own functions
Graphics