Introduction to R 21/11/2016
C3BI
Vincent Guillemot & Anne Biton
R: presentation and installation
How to install and use it?
- Follow the steps: you don’t need advanced rights to install it!
- Open the R GUI.
- Test a command: plot(-10:10, (-10:10)^2).
- Open an R script and save it in your working directory.
. . .
The few commands you must know
Command      What it does
read.table   Read a tabulated file.
write.table  Write a matrix or data frame.
plot         Command for graphical representation.
x <- 1       Assign something (here 1) to object x.
1:10         Create a vector containing integers 1 to 10.
x[1:10]      Extract a subvector from x.
. . .
Command      What it does
c(2, 5)      Create a vector containing 2 and 5.
A[, 2:5]     Extract columns 2 to 5 of matrix A.
DF$variable  Extract from data frame DF its column called variable.
?rnorm       Get help on the function called rnorm.
??gaussian   Get help on the topic gaussian.
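As a rough illustration of the indexing commands in the table, here is a minimal sketch. The objects x, A and DF and their contents are made up for the example and do not come from the course data.

```r
# Assign a vector to x, then extract a subvector by position.
x <- c(2, 5, 8, 11)
x[2:3]                 # second and third elements of x

# Build a small matrix (hypothetical values) and extract columns 2 to 3.
A <- matrix(1:12, nrow = 3, ncol = 4)
A[, 2:3]

# Build a small data frame (hypothetical values) and extract a column by name.
DF <- data.frame(id = 1:3, variable = c(0.5, 1.5, 2.5))
DF$variable
```

Typing ?matrix or ?data.frame in the console opens the help page of the functions used here.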
A beginner’s test
- If you already know the previous commands → move to the back of the room; you can work independently on the handout and the exercises, and go home whenever you are finished.
- If you are not familiar with these commands, move to the front of the room.
- In any case, please ask us any R-related question during the class!
Basic commands
Prompt
A prompt is a special character that appears in the R console:
- > means that R is waiting for an R command;
- + means that R is waiting for the end of the current command;
- A blank prompt means that R is computing something.
E.g., type:
1+12*3
Sys.sleep(10)
Brackets
Brackets  Use
()        In functions, e.g. sin(2*pi).
[]        While indexing, e.g. x[1:2].
{}        In code blocks, e.g.
{
  x <- rnorm(10)
  y <- x[1:2]
  mean(y)
}
Writing your scripts: survival tips
1. Use spaces:
   - x <- -1 is OK,
   - x<--1 not so much. . .
2. Indent!
3. Save your scripts, it’s so easy with RStudio.
4. Comment, comment, comment (use #).
“You are collaborating with at least one person: your future self!”
– ∼ Hadley Wickham
What this course is about
- Write short R programs
- Read and predict the outcome of simple R functions / programs
- Make graphical representations
- Read data and write tables
To go further (or more slowly), there’s a lot of material available online: Quick-R, TryR, Data Camp, cookbook-r etc.
Types of exercises
Three levels:
1. Copy & paste some code and see what it does.
2. Read some code and explain what it does.
3. Create your own code to answer a question.
Ex. 1
1. Copy and execute the following command: log(exp(2)).
2. What does this code do: log10(10^3)?
3. Find a function to run a t-test.
R Objects
Types. . .
The type of an object is directly tied to the way it is stored in memory:
- character: let <- "a"
- double: nbr <- 2.0
- integer: intg <- 1L
- logical: TRUE or T or FALSE or F
- Particular values: NA, +Inf, NaN
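The types listed above can be checked with typeof. The sketch below reuses the object names from the list; the checks on the particular values use 0/0 and 1/0 as convenient ways to produce NaN and +Inf.

```r
let  <- "a"
nbr  <- 2.0
intg <- 1L

typeof(let)    # "character"
typeof(nbr)    # "double"
typeof(intg)   # "integer"
typeof(TRUE)   # "logical"

# Particular values
is.na(NA)          # TRUE
is.nan(0 / 0)      # TRUE, because 0/0 is NaN
is.finite(1 / 0)   # FALSE, because 1/0 is +Inf
```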
Types. . . and classes
The class of an object describes how different values are structured within the object:
- vector: v <- c("a", "b", "a")
- factor: fac <- factor(v)
- matrix: M <- matrix(1:4, 2, 2)
- data.frame: D <- data.frame(v, fac)
- list,
- etc.
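To see what R reports for the objects defined above, one can call class and the is.* predicates. Note that for a plain vector, class returns the type of its elements rather than the word "vector".

```r
v   <- c("a", "b", "a")
fac <- factor(v)
M   <- matrix(1:4, 2, 2)
D   <- data.frame(v, fac)

class(v)           # "character": a plain vector reports the type of its elements
class(fac)         # "factor"
is.matrix(M)       # TRUE
is.data.frame(D)   # TRUE
levels(fac)        # "a" "b": the distinct values of the factor
```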
Transformations
- as.integer
- as.numeric
- as.character
- as.factor
- as.vector
- . . .
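A short sketch of these conversion functions on made-up values follows. The last two lines show a common pitfall when converting a factor whose levels look like numbers.

```r
as.integer(3.9)      # 3: the decimal part is dropped
as.numeric("2.5")    # 2.5
as.character(1:3)    # "1" "2" "3"

# Pitfall: converting a factor directly to numeric returns the level codes,
# not the values printed on screen.
f <- factor(c("10", "20", "10"))
as.numeric(f)                  # 1 2 1
as.numeric(as.character(f))    # 10 20 10
```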
Classes
Here are the classes that you need to know:
- vectors and factors,
- matrices,
- data-frames,
- lists,
- functions.
FAMuSSS
FAMuSSS: The Functional Single Nucleotide Polymorphisms Associated with Human Muscle Size and Strength Study
Load an RData file
In the RData file famusss.RData, there is an example of each of the 5 R classes we mentioned:
Name       Class       Content
ndrm.diff  Vector      Difference in strength in the non-dominant arm
snp1       Factor      SNP rs577x located in the gene ACTN3
M          Matrix      Matrix containing the age, height and weight of the individuals
D          Data-frame  Sample data extracted from the FAMuSSS data
L          List        List containing various objects
bmi        Function    Computes the BMI of an individual from their weight (lb) and height (in)
Ex. 2
1. Load the objects with the following command:
load("famusss.RData")
2. Print all the objects: what type of data do they contain?
3. What is the BMI of a 70-inch-tall person weighing 150 lb?
4. What does L$Dimensions do? What does names(L) do?
5. Extract the element called GenderTable from L.
Vectors
Create them with the combine function c or with the : operator:
x <- c(1, 10, -4, 5.0)
i <- 1:10
Access elements of a vector with square brackets:
x[1]
## [1] 1
x[3:4]
## [1] -4 5
Factors
You can create factors in a number of ways, one of them is with the function gl:
f0 <- gl(n = 3, k = 6, labels = c("CRTL", "A", "B"))
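Another common way to create a factor is the function factor, applied to an existing vector. The sketch below uses a made-up vector of treatment labels (treatment and f1 are hypothetical names) and sets the order of the levels explicitly.

```r
# Hypothetical treatment labels, one per observation.
treatment <- c("A", "B", "A", "CTRL", "B", "A")

# levels = ... fixes the order of the levels (otherwise they are sorted).
f1 <- factor(treatment, levels = c("CTRL", "A", "B"))

levels(f1)    # "CTRL" "A" "B"
nlevels(f1)   # 3
table(f1)     # number of observations per level
```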
Ex. 3
1. What does f0 == "A" do?
2. What do rep and seq do?
3. Create a vector called v of length 18.
4. What does v[f0 == "A"] do?
5. Extract from v the values for which f0 is equal to "B".
Matrices
A matrix is a two-dimensional kind of vector:
A <- matrix(0, 2, 2)
B <- matrix(c("un","deux","trois","quatre"), 2, 2)
A[1,]
## [1] 0 0
B[2,2]
## [1] "quatre"
Data frames
A data frame is a two-dimensional structure that allows different types for its columns:
D <- data.frame(a=1:10, b=letters[1:10], cos=cos(1:10))
D[1:2,2:3]
##   b        cos
## 1 a  0.5403023
## 2 b -0.4161468
D$a[3]
## [1] 3
D[[1]]
## [1] 1 2 3 4 5 6 7 8 9 10
Lists
In R, data frames are special lists:
L <- list(1:10, b=3, f=cos, char=letters[5:7])
names(L)
## [1] "" "b" "f" "char"
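Elements of a list can be extracted by position with [[ ]] or by name with $. The sketch below reuses the list L defined above and checks the claim about data frames with is.list on a small made-up data frame (D2 is a hypothetical name).

```r
L <- list(1:10, b = 3, f = cos, char = letters[5:7])

L[[1]]        # first element, extracted by position
L$b           # element named b
L[["char"]]   # element named char: "e" "f" "g"
L$f(0)        # the stored function cos applied to 0, which gives 1

# A data frame is a list of columns of equal length.
D2 <- data.frame(a = 1:3, b = letters[1:3])
is.list(D2)   # TRUE
```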
Block of code
A block groups several commands so that they are all executed at once!
{
  a <- 1
  b <- 2
}
It is used in
- functions,
- loops (for, while. . . ),
- control-flow constructs (?Control).
Functions
- Syntax: f <- function(arg1=, ...) {Commands}.
- f ends with a return.
- What can f return? Whatever you like (e.g. in a list).
- Indent!
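As an illustration of the syntax above, here is a minimal sketch of a function that returns several results in a list. The name summarise_vector and its arguments are hypothetical and chosen only for this example.

```r
# Hypothetical helper: return the mean and standard deviation of a numeric vector.
summarise_vector <- function(x, na.rm = TRUE) {
  m <- mean(x, na.rm = na.rm)
  s <- sd(x, na.rm = na.rm)
  return(list(mean = m, sd = s))   # several results gathered in one list
}

res <- summarise_vector(c(1, 2, 3, NA))
res$mean   # 2
res$sd     # 1
```

Returning a named list lets the caller pick each result with $, as with any other list.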
Ex. 4
- Create a matrix filled with random numbers (rnorm).
- Compute the sum of each column (colSums).
- Which elements are > 0?
- Create a second matrix filled with 1s. It should have the same dimensions as the first matrix.
- Combine it with the first matrix (rbind or cbind).
- Write a function returning the square and the square root of a positive real number.
for loops
Repeat a block n times, depending on an iterator i.
for (i in 1:10) {
  j <- i^2 + i + 1
  print(j)
}
In general, we want to save the result:
s <- rep(NA, 10)
for (i in 1:10) {
  s[i] <- i^2 + i + 1
}
s
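For simple arithmetic like this, R can also compute all the values in one vectorised expression. The sketch below compares the two approaches; the names s_loop and s_vec are introduced only for the comparison.

```r
i <- 1:10

# Loop version, storing each result in a pre-allocated vector.
s_loop <- rep(NA, 10)
for (k in seq_along(i)) {
  s_loop[k] <- i[k]^2 + i[k] + 1
}

# Vectorised version: the arithmetic is applied to every element of i at once.
s_vec <- i^2 + i + 1

all.equal(s_loop, s_vec)   # TRUE: both approaches give the same values
```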
Ex. 5
What does this loop do?
library(tm)
library(stringr)
aveu <- removePunctuation(scan("phedre.txt", what = ""))
nba <- 0 ; nbe <- 0 ; nbi <- 0 ; nbo <- 0 ; nbu <- 0
for (mot in aveu) {
  nba <- nba + str_count(mot, "a")
  nbe <- nbe + str_count(mot, "e")
  nbi <- nbi + str_count(mot, "i")
  nbo <- nbo + str_count(mot, "o")
  nbu <- nbu + str_count(mot, "u")
}
c(a=nba, e=nbe, i=nbi, o=nbo, u=nbu)
if, else
The random p-value generator:
r <- runif(1)
if (r < 0.05) {
  print("Youpi !")
} else if (r < 0.1) {
  print("I still trust my result!")
} else {
  print(" :'( ")
}
Read and write data
Many available commands
Command      Read  Save
data         Yes   No
load         Yes   No
save         No    Yes
read.table   Yes   No
write.table  No    Yes
read.*       Yes   No
write.*      No    Yes
Correspondence
Figure 1: diagrammer
data
- Example: data(cars).
- Before and after: ls().
- Class of the loaded object: class(cars).
- Quick object exploration: str(cars).
- Only the beginning of the table: head(cars).
Working directory
You may (will) want to change the working directory in which your commands will look for data and save your outputs.
You can do this:
- with the commands setwd and getwd,
- in a much simpler way with RStudio: Session → Set working directory → . . .
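A minimal sketch of the command-line route follows. It uses the session's temporary directory as a stand-in for a real project folder, which is an assumption made only so that the example runs anywhere.

```r
old_wd <- getwd()    # remember the current working directory
setwd(tempdir())     # switch to another folder (here a temporary one, as a stand-in)
getwd()              # check where R now reads and writes files
setwd(old_wd)        # go back to the original directory
```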
Tabulated data
- Column names,
- lines separated with an EOL (end of line),
- column separator (tab, ;, etc.),
- the same number of columns per line.
long and wide formats: a wide table
##     ctrl trt1 trt2
##  1: 4.17 4.81 6.31
##  2: 5.58 4.17 5.12
##  3: 5.18 4.41 5.54
##  4: 6.11 3.59 5.50
##  5: 4.50 5.87 5.37
##  6: 4.61 3.83 5.29
##  7: 5.17 6.03 4.92
##  8: 4.53 4.89 6.15
##  9: 5.33 4.32 5.80
## 10: 5.14 4.69 5.26
long and wide formats: a long table
##     values  ind
##  1:   4.17 ctrl
##  2:   5.58 ctrl
##  3:   5.18 ctrl
##  4:   6.11 ctrl
##  5:   4.50 ctrl
## ---
## 26:   5.29 trt2
## 27:   4.92 trt2
## 28:   6.15 trt2
## 29:   5.80 trt2
## 30:   5.26 trt2
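One way to move between the two formats in base R is with stack and unstack. The sketch below assumes that the tables above come from R's built-in PlantGrowth data set, which they appear to match; the slides themselves do not say how the tables were produced.

```r
# PlantGrowth ships with R: one column of weights, one column of groups (long format).
wide <- unstack(PlantGrowth, weight ~ group)   # long -> wide: one column per group
head(wide, 3)

long <- stack(wide)                            # wide -> long: columns values and ind
head(long, 3)

dim(wide)   # 10 rows, 3 columns
dim(long)   # 30 rows, 2 columns
```

Packages such as data.table or tidyr offer richer reshaping tools, but the base functions are enough for a table like this one.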
read.table
5 important parameters:
- file → where the file is,
- header → whether the first line contains the names of the columns,
- sep → column separator,
- dec → decimal separator (3,1419 or 3.1419?),
- skip → how many lines should be skipped.
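The five parameters can be tried on a small file. The sketch below first writes a made-up semicolon-separated file to a temporary location (file content and column names are hypothetical), then reads it back.

```r
# Create a small example file: one free-text line, a header, then two records
# that use ";" between columns and "," as decimal separator.
path <- tempfile(fileext = ".csv")
writeLines(c("exported from a hypothetical spreadsheet",
             "name;height",
             "Ana;1,62",
             "Bob;1,80"), path)

dat <- read.table(file   = path,   # where the file is
                  header = TRUE,   # the first line read holds the column names
                  sep    = ";",    # column separator
                  dec    = ",",    # decimal separator
                  skip   = 1)      # ignore the first line of the file
str(dat)
```

With dec set correctly, the height column is read as numbers rather than as text.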
write.table
4 important parameters:
- x → matrix or data.frame to save,
- file → where the file should be stored,
- sep → column separator,
- dec → decimal separator (3,1419 or 3.1419?).
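A minimal round-trip sketch of these parameters, using a made-up data frame and a temporary file. The arguments row.names and quote are not among the four listed above; they are standard write.table arguments used here to keep the file compact.

```r
D_out <- data.frame(id = 1:3, value = c(0.5, 1.25, 2))   # hypothetical data
out   <- tempfile(fileext = ".txt")

write.table(x = D_out, file = out, sep = ";", dec = ",",
            row.names = FALSE,   # do not write the row numbers
            quote = FALSE)       # do not put quotes around text
readLines(out)                   # look at the raw content of the file

# Reading it back with the same sep and dec recovers the values.
back <- read.table(out, header = TRUE, sep = ";", dec = ",")
all.equal(back$value, D_out$value)
```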
save and load
- save can write any R object into an RData file.
- load reads RData files.
Example:
x <- 1:10 ; a <- "toto" ; objetaunomtreslong <- pi
save(x, a, objetaunomtreslong, file="Sauvegarde.RData")
rm(list=ls())
load("Sauvegarde.RData")
Plots
plot
Syntax: plot(object, ...)!
Parameter    Role
main         Main title
xlab & ylab  Axis title
xlim & ylim  Axis limits
type         Type of graph: points, lines etc. . .
col          Color, e.g. “black”, “red”, “green”. . .
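The parameters in the table can all be combined in a single call. Here is a minimal sketch on made-up data (a parabola), with titles, limits and colour chosen arbitrarily.

```r
x <- seq(0, 10, by = 0.1)

plot(x, x^2,
     main = "A parabola",               # main title
     xlab = "x", ylab = "x squared",    # axis titles
     xlim = c(0, 10), ylim = c(0, 100), # axis limits
     type = "l",                        # draw a line instead of points
     col  = "red")                      # colour of the line
```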
Ex. 6
Apply plot to any function, e.g. choose one among the already built-in functions: sin, cos, exp, log, sqrt. . .
Ex. 7
With plot and grid, reproduce this plot:
Figure 2: image
Add points, and lines or a function
You can draw a graph on an existing plot with the following commands:
- points to add points,
- lines to add lines,
- plot(f, add=TRUE, ...) to add a function.
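The three commands can be sketched on one figure. The curves below are arbitrary choices made for illustration: a sine curve as the base plot, then a point, a dashed line and the built-in function cos added on top.

```r
x <- seq(0, 2 * pi, length.out = 50)

plot(x, sin(x), type = "l", main = "Adding to an existing plot")  # base plot
points(pi / 2, 1, pch = 19)                        # add a point at the maximum of sin
lines(x, 0.5 * sin(x), lty = 2, col = "red")       # add a dashed line
plot(cos, from = 0, to = 2 * pi, add = TRUE, col = "blue")  # add the function cos
```

The first call creates the plot; the following ones only draw on it, so the axis limits are those of the first call.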
Ex. 8
1. Generate two variables, x and y, linearly linked to one another (do not forget to add some noise).
2. Represent the scatter-plot of the two variables with plot.
3. Add to the plot the underlying linear model with lines or plot.
Colors, dashes, symbols and width
4 important parameters:
- pch: to choose the type of point (circle, triangle, etc.),
- lty: (line type) to choose the line type,
- col: (color) to choose the color,
- lwd: (line width) to set the width.
legend
Argument  Meaning
x, y      Legend position. . .
legend    Legend text.
bty       Type of box = "o" (with) or "n" (without).
Ex. 9
Add a legend to this graph:
plot(1:10, type="b", col="steelblue", lwd=2)
1. Add a legend at the following coordinates: (1, 7).
2. Add a legend without a box around it, in the upper left corner of the graph.
3. Add the legend wherever you want it with locator(1).
Combining plots. . .
. . . is easy with layout!
1. Create the layout, a matrix indicating the positions and order of the plots.
2. Plot the graphs to populate the layout.
Ex:
x <- rnorm(100)     # Data
M <- rbind(1, 2:3)  # 3 graphs in the layout
layout(M)           # Create the layout and put the
plot(x)             # 1st ...
hist(x)             # 2nd ...
boxplot(x)          # and 3rd graphs
Here is the layout we used:
1
2 3
The resulting plot
[Figure: scatter plot of x against Index (top), "Histogram of x" with Frequency on the y-axis (bottom left), and boxplot of x (bottom right).]