Ann Arbor ASA ‘Up and Running’ With R
Prepared by volunteers of the Ann Arbor Chapter of the American Statistical Association, in cooperation with the Department of Statistics and the Center for Statistical Consultation and Research of the University of Michigan
October 27th, 2010
Ann Arbor ASA (Up and Running with R)
R Class Agenda
Brief Introduction to R
Using R Help
Introduction to Functions Available in R
Working with Data
Importing/Exporting Data
Graphs
Simple Models
Writing Functions/Programming
What is R?
R is a computing language commonly used for statistical analysis
R is open source, which means that the source code is available to all users
R is free software; download it at http://www.r-project.org/
More About R
Most statistical analysis in R is done with pre-defined functions, which are available in many different packages.
When you download R, you have access to many functions from the ‘base’ package.
More advanced functions require that you download other packages.
What can you do with R?
Many topics in statistics are readily available, such as linear modeling, linear mixed modeling, multivariate analysis, clustering, non-parametric methods, and classification
R is well known for producing high-quality graphics. Simple plots are easy, and with a little more practice, users can produce publishable graphics!
Time to Launch R
Find R on your computer: Start>Statistical Software Packages>R
Go to the File menu and click ‘New script’
This opens the editor window, where we will type our script
It is more convenient to type here than directly in the workspace (the R console)
Try typing in both the workspace and the editor window
Data Objects in R
Users create different data objects in R
Data objects can be variables, arrays of numbers, character strings, functions, and more complicated data structures
‘<-’ assigns a value to a name of your choice
Type ‘a<-7’ in your editor window
Submit this command by highlighting it and pressing Ctrl+R
Practice creating different data objects and submitting them to the workspace
Data Objects in R (cont’d)
Type ‘objects()’ to see that you have created the object ‘a’ during this R session
You can recall previously submitted commands with the up/down arrow keys on your keyboard
You can remove this object by typing ‘rm(a)’
Try removing some objects you created, and then type ‘objects()’ to check that they are no longer listed
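Here is a short sketch of the create, list and remove cycle described above. The object names greeting and v are arbitrary examples and are not part of the class materials.

```r
# Create a few data objects with the assignment operator <-
a <- 7
greeting <- "hello"
v <- c(1, 2, 3)

# objects() lists the names of the objects in the workspace
print(objects())

# Remove one object, then check that it is gone
rm(a)
print(exists("a"))  # FALSE once a has been removed
```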
Getting Help in R
To get help on a specific function, type ‘help(name of function)’ OR ‘?name of function’, for example ‘help(hist)’ or ‘?hist’
If the function is not in a package you have loaded, type ‘??name of function’ to search the help pages of all installed packages
Try searching for help on ‘hist’ or ‘lm’
Two popular R resource websites: Rseek.org and nabble.com
A Simple Example to Get You Started
To set up a vector named x, use the R command ‘x<-c(5,4,3,6)’
This is an assignment statement using the function c(), which creates a vector by concatenating its arguments
Perform vector/matrix arithmetic: ‘v<- 3*x - 5’ multiplies every element of x by 3 and then subtracts 5
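The vector example can be run as a small script. It shows that R applies arithmetic to each element in turn.

```r
# c() concatenates its arguments into a vector
x <- c(5, 4, 3, 6)

# Arithmetic works element by element: 3*5 - 5, 3*4 - 5, 3*3 - 5, 3*6 - 5
v <- 3 * x - 5
print(v)  # [1] 10  7  4 13
```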
R Reference Card (created by Tom Short)
There are thousands of functions available in R, but this Reference Card covers enough of them to give you a strong working knowledge
Let’s take a minute to look at the organization of the Reference Card and try out a few of the functions!
Generating Sequences/Replicating Objects
Sequences: submit the following commands
‘seq(-5, 5, by=.2)’
‘seq(length=51, from=-5, by=.2)’
Both produce the sequence from -5 to 5 with a distance of .2 between consecutive values (51 values in all)
Replications: submit the following commands
‘rep(x, times=5)’ repeats the whole vector x 5 times
‘rep(x, each=5)’ repeats each element of x 5 times before moving on to the next element
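The sketch below checks that the two seq() calls agree. It then contrasts times= and each= on the vector x from the earlier example, using 2 repetitions instead of 5 so the output stays short.

```r
x <- c(5, 4, 3, 6)

# Two ways to get the same 51 evenly spaced values from -5 to 5
s1 <- seq(-5, 5, by = 0.2)
s2 <- seq(length.out = 51, from = -5, by = 0.2)
print(length(s1))         # 51
print(all.equal(s1, s2))  # TRUE

# times= repeats the whole vector; each= repeats every element in turn
print(rep(x, times = 2))  # [1] 5 4 3 6 5 4 3 6
print(rep(x, each = 2))   # [1] 5 5 4 4 3 3 6 6
```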
Working with Data Sets
There are many data sets available for use in R; type ‘data()’ to see what’s available
We will work with the trees data set: type ‘data(trees)’, and the data set is ready to use in R
The following are useful commands:
‘summary(trees)’ – summary of variables
‘dim(trees)’ – dimensions of the data set
‘names(trees)’ – variable names
‘attach(trees)’ – attach the variables so they can be referred to by name (e.g. Height instead of trees$Height)
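Here are the commands above as one script. detach() is added at the end to undo attach(), so later code is not affected by the attached names.

```r
# trees comes with R (datasets package); data() loads a copy into the workspace
data(trees)

print(dim(trees))    # [1] 31  3
print(names(trees))  # [1] "Girth"  "Height" "Volume"
print(summary(trees))

# attach() lets us write Height instead of trees$Height; detach() undoes it
attach(trees)
print(mean(Height))
detach(trees)
```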
Extracting Data
R has saved the data set trees as a data frame object; check this by typing ‘class(trees)’
R indexes a data frame in row/column format: data.frame[rows, columns]
Type ‘trees[c(1:2),2]’ – the first 2 rows of the 2nd column
Type ‘trees[3,c("Height","Girth")]’ – columns can also be referenced by name
Type ‘trees[-c(10:20),"Height"]’ – skips rows 10-20 for the variable Height
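This sketch runs the three indexing examples and notes the type of result each one returns.

```r
data(trees)
print(class(trees))           # "data.frame"

# Rows 1-2 of column 2 (Height); a single column comes back as a plain vector
first_two <- trees[c(1:2), 2]
print(first_two)

# Row 3, with columns chosen by name; the result is still a data frame
row3 <- trees[3, c("Height", "Girth")]
print(row3)

# A negative index drops rows: every Height except rows 10-20
height_rest <- trees[-c(10:20), "Height"]
print(length(height_rest))    # 20, i.e. 31 rows minus the 11 dropped
```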
Extracting Data (continued)
The subset() command is very useful for extracting data in a logical manner. The 1st argument is the data, and the 2nd argument is the logical condition the rows must meet
‘subset(trees, Height>80)’ – rows where tree height > 80
‘subset(trees, Height<70 & Girth>10)’ – rows where tree height < 70 AND tree girth > 10
‘subset(trees, Height<60 | Girth>11)’ – rows where tree height < 60 OR tree girth > 11
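Running the three subset() calls and counting the rows each keeps is a quick way to confirm what each condition does. The variable names tall, short_thick and either are only illustrative.

```r
data(trees)

tall        <- subset(trees, Height > 80)               # single condition
short_thick <- subset(trees, Height < 70 & Girth > 10)  # both must hold
either      <- subset(trees, Height < 60 | Girth > 11)  # at least one holds

# Number of rows kept by each condition
print(c(tall = nrow(tall), short_thick = nrow(short_thick), either = nrow(either)))
```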
Importing Data
The most common (and easiest) file type to import is a text file, which you read with the read.table() command
R needs to be told where the file is located:
You can set the working directory, which tells R where your files are located, by typing ‘setwd("C:\\Users\\hicksk\\Desktop")’
OR you can point to the working directory through the menus by going to File>Change dir… and choosing the location of your files
OR you can include the full location of your file in your read.table() command
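Here is a minimal sketch of the working-directory option and the full-path option. So that it runs on any computer, it assumes a small hypothetical file, example.txt, written to R's temporary folder instead of the furniture file on a desktop.

```r
# Hypothetical example file in a temporary folder, so the code runs anywhere
folder <- tempdir()
writeLines(c("Area Cost", "100 250", "150 300"),
           file.path(folder, "example.txt"))

# Option 1: set the working directory, then give only the file name
old <- setwd(folder)       # setwd() returns the previous working directory
d1 <- read.table("example.txt", header = TRUE)
setwd(old)                 # restore the original working directory

# Option 2: give the full location of the file to read.table()
d2 <- read.table(file.path(folder, "example.txt"), header = TRUE)
print(identical(d1, d2))   # TRUE: both options read the same data
```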
Using the read.table() command
Go to the ASA Ann Arbor Chapter’s website, look under the R Classes section, open ‘furniture.zip’, and save the files to your desktop
Remember, we must tell R where these files are located to read them in properly:
read.table("C:\\Users\\hicksk\\Desktop\\furniture.txt",header=TRUE,sep="")
It is important to use double backslashes \\ (or single forward slashes /) rather than a single backslash \
Tell R whether your data have column names with header=TRUE or header=FALSE
sep="" (the default) tells R that the fields are separated by white space
Using read.table() (cont’d)
Remember, another way of specifying the file’s location is to set the working directory first and then read in the file:
setwd("C:\\Users\\hicksk\\Desktop")
read.table("furniture.txt",header=TRUE,sep="")
OR we had the option of pointing to the location by going to File>Change dir… and choosing the folder that contains the file. We could then read the file as above by typing ‘read.table("furniture.txt",header=TRUE,sep="")’
read.table(), read.csv() and Missing Values
It is also popular to import csv files, since Excel spreadsheets are easily saved as csv files
read.csv() and read.table() are very similar, although they handle missing values differently
read.csv() automatically assigns ‘NA’ to empty (missing) numeric fields
read.table(), with its default white-space separator, cannot tell where an empty field is, so it will not load data with missing values; you must enter ‘NA’ for the missing values before reading the file into R
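The difference can be seen without editing any files by passing small hypothetical data sets through the text= argument, with the first Area value missing in both.

```r
# Hypothetical two-column data in which the first Area value is missing
csv_text   <- "Area,Cost\n,250\n150,300"
space_text <- "Area Cost\n 250\n150 300"

# read.csv(): the empty field between the commas becomes NA
d_csv <- read.csv(text = csv_text)
print(d_csv)

# read.table() with the default white-space separator cannot locate the
# empty field, so the first data line has too few values and reading fails
d_tab <- tryCatch(read.table(text = space_text, header = TRUE),
                  error = function(e) e)
print(inherits(d_tab, "error"))  # TRUE
```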
read.table(), read.csv() and Missing Values (cont’d)
Let’s remove a data entry from both “furniture.txt” and “furniture.csv”: from the first row, erase 100 from the Area column
Now try to read in the data from these two files using read.table() and read.csv()
You should see that you cannot read the data in using the read.table() command unless you input an entry for the missing value
Other Options for Importing Data
When you download R, you should automatically have obtained the foreign package (it is installed with R as a recommended package)
By submitting ‘library(foreign)’, you will have many more options for importing data: read.xport(), read.spss(), read.dta(), read.mtp()
For more information on these options, simply submit ‘help(read.XXXX)’, e.g. ‘help(read.spss)’
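We have no Stata file to work with, so this sketch writes the built-in trees data to a temporary Stata file with foreign's write.dta() and then reads it back with read.dta(). It assumes the foreign package is installed, as it is in a standard R installation.

```r
library(foreign)  # installed with R as a recommended package

# Round trip: write trees to a temporary Stata (.dta) file, then read it back
f <- tempfile(fileext = ".dta")
write.dta(trees, f)
trees_dta <- read.dta(f)
print(dim(trees_dta))    # same dimensions as trees: 31 rows, 3 columns
print(names(trees_dta))
```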
Exporting Data
You can export data by using the write.table() command:
‘write.table(trees, "treesDATA.txt", row.names=FALSE, sep=",")’
trees specifies that we want the trees data set exported
"treesDATA.txt" is the name of the file to be written. By default, the file is written to the working directory already specified unless you give a full location
row.names=FALSE tells R that we do not wish to preserve the row names
sep="," tells R to separate the values with commas (a comma-delimited file)
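A quick way to check an export is to read the file back in. This sketch writes to R's temporary folder instead of the working directory, so it leaves no file behind in your project.

```r
data(trees)
out <- file.path(tempdir(), "treesDATA.txt")

# Comma-delimited, header row included, no row names
write.table(trees, out, row.names = FALSE, sep = ",")
print(readLines(out, n = 2))   # the header line and the first data line

# Because the file is comma-delimited, read.csv() can read it straight back
back <- read.csv(out)
print(dim(back))
```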
Furniture Data Set
Let’s assign a name to the furniture data set as we read it in so we can do some analysis:
furn<-read.table("furniture.txt",sep="",header=TRUE)
(The shorthand h=T also works: R partially matches h to header, and T stands for TRUE.)
To get a better understanding of our data set, use some useful commands: dim(furn), summary(furn), names(furn), attach(furn)
Graphs in R Using the Furniture Data
R can produce both very simple and very complex graphs
We will only get a brief introduction today, but I encourage you to investigate further
Let’s start by making a simple scatter plot of the Area and Cost variables from our furniture data set:
plot(Area,Cost,main="Area vs Cost",xlab="Area",ylab="Cost")
We have told R to put Area on the x-axis and Cost on the y-axis, and we have provided a title and axis labels
Graphs in R
Let’s look at the distribution of our variables using some different graphs in R:
hist(Area) – histogram of Area
hist(Cost) – histogram of Cost
boxplot(Cost ~ Type) – boxplot of Cost by Type
We can make the boxplot much prettier:
boxplot(Cost ~ Type, main="Boxplot of Cost by Type", col=c("orange", "green", "blue"), xlab="Type", ylab="Cost")
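The furniture file is not reproduced here, so this sketch uses the built-in trees data as a stand-in. trees has no grouping variable like Type, so a hypothetical two-level factor is made from Height with cut(). Both hist() and boxplot() also return the numbers they draw, which makes the plots easy to check.

```r
data(trees)

# Histogram with a title; hist() also returns the bin counts it plotted
h <- hist(trees$Girth, main = "Histogram of Girth", xlab = "Girth")
print(sum(h$counts))  # 31, one count per tree

# Hypothetical grouping factor standing in for Type: short vs tall trees
height_group <- cut(trees$Height, breaks = c(60, 75, 90),
                    labels = c("short", "tall"))

# Boxplot of Volume by group, with colors and axis labels
b <- boxplot(trees$Volume ~ height_group, main = "Boxplot of Volume by Height Group",
             col = c("orange", "green"), xlab = "Height group", ylab = "Volume")
print(b$stats)        # one column of summary statistics per group
```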
Graphs in R
We can also look at a scatter plot matrix of all variables in a data set by using the pairs() function: pairs(furn)
Or we can look at the correlation/covariance matrix of the numeric variables (columns 2 and 3): cor(furn[,c(2:3)]) and cov(furn[,c(2:3)])
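Here the built-in trees data stands in for the furniture data, since all three of its columns are numeric. The sketch also checks that the correlation matrix is the covariance matrix rescaled by the standard deviations, using cov2cor() from the stats package.

```r
data(trees)

pairs(trees)            # scatter plot matrix of all three variables

r <- cor(trees)         # correlation matrix
s <- cov(trees)         # covariance matrix
print(round(r, 2))

# Correlation = covariance divided by the product of the standard deviations
print(all.equal(r, cov2cor(s)))  # TRUE
```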
Graphs in R/Simple Models
Let’s perform a simple linear regression using the furniture data set:
m1<-lm(Cost ~ Area)
summary(m1)
coef(m1)
fitted.values(m1)
residuals(m1)
We can also plot the residuals against the fitted values: plot(fitted.values(m1), residuals(m1))
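This is the same workflow with the built-in trees data standing in for the furniture data: Girth plays the role of the predictor and Height the response. The check at the end shows how the pieces fit together, since each observed value equals its fitted value plus its residual.

```r
data(trees)
m1 <- lm(Height ~ Girth, data = trees)

print(summary(m1))
print(coef(m1))   # intercept and slope

# Fitted value + residual recovers each observed Height
print(all.equal(unname(fitted.values(m1) + residuals(m1)), trees$Height))

# Residuals against fitted values, to look for patterns
plot(fitted.values(m1), residuals(m1))
```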
Graphs in R/Simple Models
Let’s continue with our scatter plot of Area and Cost, adding a regression line (solid) and a lowess smooth (dashed):
plot(Area, Cost, main="Cost Regression Example", xlab="Area", ylab="Cost")
abline(lm(Cost~Area), col=3, lty=1)
lines(lowess(Cost~Area), col=3, lty=2)
Now let’s interactively add a legend:
legend(locator(1), c("Linear", "Lowess"), lty=c(1,2), col=3)
Click on your graph to place the legend where you wish!
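A non-interactive sketch of the same plot, using the built-in trees data in place of the furniture data. Here the legend is placed with the keyword "topleft" rather than locator(1), so the code can run as a script without any clicking.

```r
data(trees)

plot(trees$Girth, trees$Height, main = "Height Regression Example",
     xlab = "Girth", ylab = "Height")
abline(lm(Height ~ Girth, data = trees), col = 3, lty = 1)  # solid fitted line
lines(lowess(trees$Girth, trees$Height), col = 3, lty = 2)  # dashed lowess smooth

# Legend at a fixed position; line types and colors match the lines drawn
leg <- legend("topleft", c("Linear", "Lowess"), lty = c(1, 2), col = 3)
```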
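Graphs in R/Simple Models
Now let’s identify different points on the graph: identify(Area, Cost, row.names(furn))
Clicking near a point labels it with its row name, which makes it easy to identify outliers
We can use the locator() command to quantify differences between the regression fit and the lowess line: locator(2) returns the coordinates of the two points you click
Now let’s compare the predicted values of Cost when Area is equal to 250 by clicking on the regression line and on the lowess line at Area = 250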
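The same comparison can also be made numerically instead of by clicking. This sketch uses the built-in trees data, with Girth = 15 as an arbitrary stand-in for Area = 250. It gets the linear prediction from predict() and the lowess value by linear interpolation along the lowess curve with approx().

```r
data(trees)

fit <- lm(Height ~ Girth, data = trees)
lw  <- lowess(trees$Girth, trees$Height)

# Linear-model prediction at Girth = 15
p_lin <- predict(fit, newdata = data.frame(Girth = 15))

# Lowess value at Girth = 15, interpolated between the curve's points
# (ties = mean averages the repeated Girth values)
p_low <- approx(lw$x, lw$y, xout = 15, ties = mean)$y

print(c(linear = unname(p_lin), lowess = p_low, difference = unname(p_lin) - p_low))
```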
Multiple Regression
Now let’s do a multiple regression using both Area and Type as predictors in the model:
m2<-lm(Cost ~ Area + Type)
summary(m2)
Now let’s see whether this larger model is significantly better than the simple model by using ANOVA: anova(m1, m2)
The ANOVA table compares the two nested regression models by testing the null hypothesis that the Type predictor does not need to be in the model. Since the p-value is < .05, we have evidence to conclude that Type is an important predictor.
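A sketch of the nested-model comparison, using the built-in trees data as a stand-in: m1 uses Girth alone, and m2 adds Volume (this mirrors Practice Problem #3). The second row of the ANOVA table tests whether the extra term is needed.

```r
data(trees)

m1 <- lm(Height ~ Girth, data = trees)           # simpler model
m2 <- lm(Height ~ Girth + Volume, data = trees)  # adds one predictor

a <- anova(m1, m2)
print(a)

# Row 2 tests the single extra term (1 degree of freedom)
print(a$Df[2])
print(a[["Pr(>F)"]][2])  # a small p-value favours keeping Volume
```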
Writing Functions
You can easily write your own programs and functions in R
Type in the following function named f1:
f1<-function(m,n) {
  result<-m + n
  return(result)
}
Now type ‘f1(3,5)’, and you should see that your function ran for the values 3 and 5 and returned 8
Working with If-Then Statements
An if-then statement runs its code only when the logical condition is TRUE. For example, since 10>5, an if statement with that condition printed “GO BLUE”
You can tell R to do multiple items using the following structure:
if (logical condition) {
  do this
  and this
  and this
}
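Here is a minimal if-then sketch consistent with the “GO BLUE” description; the exact code used in class is assumed. The second statement shows several commands grouped in braces, with msg as an illustrative name.

```r
# Runs only because the condition 10 > 5 is TRUE
if (10 > 5) print("GO BLUE")

# Several statements run together when grouped in braces
if (10 > 5) {
  msg <- "GO BLUE"
  print(msg)
  print(nchar(msg))  # 7 characters
}
```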
If-Else Conditions
We can make if-then statements slightly more complex using if-else conditions. Here’s an example:
if (4>5) {
  print("Happy Halloween")
  print("BOO")
} else {
  print("Merry XMAS")
  print("HO HO HO")
}
Since 4>5 is FALSE, R runs the else branch and prints “Merry XMAS” and “HO HO HO”
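For Loop/While Loop
For loops can be quite helpful when writing functions. Here’s an example, which prints the values 2 through 6:
for (i in 1:5) {
  print(i+1)
}
While loops are also quite handy. Here’s an example:
f2<-function(x) {
  while (x<5) {
    x<-x+1
    print(x)
  }
}
f2(-5)
This prints the values -4 through 5: the loop keeps adding 1 as long as x is still less than 5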
Practice Problem #1
Create a sequence that starts at 0 and goes to 5 with a step of 0.5
Replicate ‘a b c’ 3 times
Replicate ‘a’ 3 times, ‘b’ 3 times, ‘c’ 3 times in one command
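One possible solution to Practice Problem #1, using the seq() and rep() arguments from earlier (the variable names are only illustrative):

```r
steps     <- seq(0, 5, by = 0.5)                # 0.0 0.5 1.0 ... 5.0
abc_times <- rep(c("a", "b", "c"), times = 3)   # a b c a b c a b c
abc_each  <- rep(c("a", "b", "c"), each = 3)    # a a a b b b c c c

print(steps)
print(abc_times)
print(abc_each)
```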
Practice Problem #2
Make a histogram of the “Girth” variable from the ‘trees’ data set. Include a title.
Make a boxplot of the “Height” variable from the ‘trees’ data set. Color it blue and label your axes.
Make a scatter plot of Girth and Height. Add the regression line.
Practice Problem #3
Create a simple linear model with Girth as the predictor and Height as the response. Extract the coefficients.
Now add Volume to the model. How can we tell if this model is preferred to the simpler model?
Practice Problem #4
Fix x at a number smaller than 5. Use a ‘while loop’ to create a sequence that starts at x and increases by 2 until you reach 20.
Create a function that will return the product of any two numbers.
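One possible solution to Practice Problem #4. It assumes x starts at 0 and reads “until you reach 20” as stopping at the last value that does not exceed 20. The function name product is illustrative.

```r
x <- 0                   # any starting value below 5
values <- x
while (x + 2 <= 20) {    # stop before going past 20
  x <- x + 2
  values <- c(values, x)
}
print(values)            # 0 2 4 ... 20

# A function returning the product of any two numbers
product <- function(m, n) {
  m * n
}
print(product(3, 5))     # 15
```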
Thank you for your attention!
Additional R Resources:
R project home: http://www.r-project.org
R documentation: http://www.r-project.org/other-docs.html
R help forum: http://www.nabble.com/R-help-f13820.html
R Journal: http://journal.r-project.org/
R Graphical Gallery: http://addictedtor.free.fr/graphiques/
R Graphical Manual: http://bm2.genes.nig.ac.jp/RGM2/
R Seek: http://www.rseek.org/
Acknowledgements/References
Thank you to Brady West for allowing the use of his R introductory materials.
http://www.r-project.org
http://addictedtor.free.fr/graphiques/