Using the R software
R is an open-source, comprehensive statistical package that is increasingly used around the world.
R project web site: http://www.r-project.org/
Find the mirror nearest to you when downloading
Wise to read this first!
Install with PDF Manual included !!
Check this box!
(All boxes can be checked if you have enough disk space.)
Start → (All) Programs → R → R 2.9.0
R does not come with a typical menu-driven graphical user interface (GUI).
Everything must be entered at the command level (or by writing scripts).
Entering data
Functions: c, matrix, cbind, data.frame, read.table
Help on functions is available in the R GUI from
Help → R functions (text) …
Entering data from keyboard
Example: We want to enter the vector x = (1, 2) and the matrix
$$a = \begin{pmatrix} 2 & 1 \\ 1 & -1 \end{pmatrix}$$
To enter something (whatever) we use the assignment operator “<-”
The function c() combines individual values (comma-separated) into a vector
Assigning a vector:
Printing the value on screen:
Either enter the variable or use the function print()
Note that the output begins with [1]. This is the index of the first element shown on that line of output; a vector in R has no row or column orientation of its own
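A minimal sketch of assigning and printing the vector from the example, written out as it could be typed at the prompt:

```r
# Assign the vector x = (1, 2) with the assignment operator and c()
x <- c(1, 2)

# Two equivalent ways of printing the value on screen
x
print(x)
```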
Listing defined objects (vectors, matrices, data frames):
Use the function ls() with no arguments
What if we just use ls ?
The source code of the function ls() is printed on screen
Removing objects:
Use the function rm()
(Enter x again before continuing.)
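Listing and removing objects can be sketched as follows. The helper variable still_there is a hypothetical name used only to record the check.

```r
x <- c(1, 2)
ls()                           # names of the objects in the workspace
rm(x)                          # remove x
still_there <- "x" %in% ls()   # checks whether x is still listed
x <- c(1, 2)                   # enter x again
```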
Assigning a matrix:
Alternative 1: Use the function matrix()
a<-matrix(values,nrow=m,ncol=n)
values is a set of values enclosed in c(), or an already defined vector.
m is the number of rows and n is the number of columns of the matrix. The number of values should equal m·n; otherwise values are recycled or dropped, possibly with a warning.
The values are entered column-wise.
The argument names nrow= and ncol= can be omitted if the values are given in this order
Note the double indexing, first number for row and second number for column
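A small sketch of column-wise filling and double indexing. The 2 x 3 matrix m is a hypothetical example chosen so that the two index positions give different values.

```r
# Six values, entered column-wise into 2 rows and 3 columns
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
m

m[1, 2]   # row 1, column 2
m[2, 1]   # row 2, column 1

# byrow = TRUE fills row-wise instead
m_row <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)
```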
Identifiers skipped
If the numbers of rows and columns are “erroneously” specified:
Note! A result is still produced, but the fourth value is omitted.
Alternative 2: Concatenating (already existing) columns
Use the function cbind()
…with already existing columns (vectors):
Note! The columns will now be indexed by the original column (vector) names
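A sketch of building the example matrix from two existing vectors. The vector names a.1 and a.2 are assumptions borrowed from the headers of the data file used later.

```r
a.1 <- c(2, 1)          # first column
a.2 <- c(1, -1)         # second column
a <- cbind(a.1, a.2)    # bind the vectors together as columns
a

colnames(a)             # the columns carry the vector names
a[, "a.2"]              # a column can be picked out by name
```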
Collecting vectors and matrices with the same number of rows in a data frame
Use the function data.frame(object 1, object 2, … , object k)
Matrices need to be protected, otherwise each column of a matrix will be identified as a single object in the data frame.
Protection is made with the function I()
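The effect of I() can be seen by comparing the number of objects in the two data frames below. The names d1 and d2 are hypothetical.

```r
x <- c(1, 2)
a <- matrix(c(2, 1, 1, -1), 2, 2)

d1 <- data.frame(x, a)      # unprotected: each matrix column becomes its own object
d2 <- data.frame(x, I(a))   # protected: the matrix is kept as one object

ncol(d1)
ncol(d2)
```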
Objects within a data frame can be called upon using the syntax
dataframe$object
Names of objects within a data frame can be retrieved, set or changed with the function
names()
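A short sketch of both ideas: picking out objects with $ and changing names with names(). The new name "amat" is hypothetical.

```r
x <- c(1, 2)
a <- matrix(c(2, 1, 1, -1), 2, 2)
d <- data.frame(x, I(a))

d$x                          # the vector x inside the data frame
names(d)                     # current names

names(d) <- c("x", "amat")   # rename the second object
d$amat                       # the matrix is now reached under the new name
```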
Reading from an external data file
Assume we have our data stored on the file demo.dat in directory D:\undv\732A26
x a.1 a.2
1 2 1
2 1 -1
Set correct working directory in R:
Note! The path must be specified with forward slashes (/), as on Unix, and not with single backslashes (\), as in DOS. Backslashes only work when doubled (\\).
To see which is the current working directory:
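A sketch of setting and checking the working directory. It uses R's temporary directory as a stand-in so that it runs anywhere; on the machine in the slides the call would presumably be setwd("D:/undv/732A26").

```r
old <- setwd(tempdir())   # setwd() returns the previous working directory
getwd()                   # shows the current working directory
setwd(old)                # go back to where we started
```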
To read from the file, use the function
read.table(filename,header=logical_value,sep=separator)
filename is the name of the file enclosed in double quotes (" "). It can be specified with the whole path if the file is not in the current working directory
logical_value is set to TRUE if the columns in the file have headers, otherwise it should be set to FALSE (it is set automatically if omitted, but the result may be “unexpected”)
separator is set to the character that separates the columns in the file (default is "", meaning any white space such as blanks or tabs)
Note! read.table treats every column of the file as a separate variable of a data frame, i.e. it cannot be used to read a matrix directly into the workspace
The columns of a stored matrix must be recombined to create the matrix
The matrix can be added to the data frame by using cbind()
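The reading steps can be sketched as follows. To keep the example runnable, a copy of the small file is first written to R's temporary directory as a stand-in for demo.dat; the object names d and a are assumptions.

```r
# Stand-in for demo.dat, written to a temporary location
f <- file.path(tempdir(), "demo.dat")
writeLines(c("x a.1 a.2", "1 2 1", "2 1 -1"), f)

d <- read.table(f, header = TRUE)   # data frame with columns x, a.1 and a.2
a <- cbind(d$a.1, d$a.2)            # recombine the stored columns into a matrix
d <- cbind(d, a = I(a))             # add the (protected) matrix to the data frame
d
```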
Writing to an external file
The function
write.table(dataframe,filename,append=logical_value,sep=separator,
quote=logical_value,row.names=logical_value,col.names=logical_value)
can be used for different formats of the output
dataframe is the name of the data frame to be written on file
filename is the name of the file to write to
logical_value is either TRUE or FALSE
separator is the character placed between the columns, as for read.table
If append=FALSE (default) a file will be created and any existing file with that name will be destroyed. If append=TRUE the data frame will be added (vertical concatenation) to an existing file.
Examples:
Exploring demo1.dat with Notepad (“Anteckningar” in Swedish)
Row numbers!
Nothing in output will be quoted
Tab-separated, but the first header does not line up vertically with the first column. The first column of the file is the row number.
(append=FALSE is default and can therefore be omitted for new file creation)
Row numbers have now been removed and headers correspond vertically with the columns.
Note! Multiple lines can be used for a command input. A carriage return before the command is completed opens a new line with the prompt “+”
Column names (headers) have been removed.
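The output variants described above might look like this. The data frame d and the temporary file location are assumptions made so that the sketch is runnable; the exact calls on the slides may have differed.

```r
d <- data.frame(x = c(1, 2), a.1 = c(2, 1), a.2 = c(1, -1))
f <- file.path(tempdir(), "demo1.dat")

# Default format: quoted headers and row numbers in the first column
write.table(d, f)

# Tab-separated, nothing quoted, no row numbers (overwrites the file)
write.table(d, f, sep = "\t", quote = FALSE, row.names = FALSE)

# Append the same rows once more, this time without headers
write.table(d, f, append = TRUE, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

out <- readLines(f)   # inspect the file contents from within R
out
```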
Calculation
The ordinary arithmetic operators “+”, “-”, “*” and “/” work element-wise
For matrix multiplication use “%*%”
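The difference between the two kinds of product, sketched with the example matrix a (the names elementwise and matprod are hypothetical):

```r
a <- matrix(c(2, 1, 1, -1), 2, 2)

elementwise <- a * a     # each element multiplied by itself
matprod     <- a %*% a   # ordinary matrix product

elementwise
matprod
```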
Matrix operators/functions:
transpose: b=t(a) gives $b = a^{T}$
inverse: b=solve(a) gives $b = a^{-1}$ (when needed)
QR-factorization:
qr=qr(a) (additional arguments possible)
qr.Q(qr) returns Q
qr.R(qr) returns R
x=qr.solve(A,b) solves A·x = b
Here $A = QR$, where $A$ is an $m \times n$ matrix, $Q$ is an $m \times n$ orthogonal matrix with $Q^{T}Q = I$, and $R$ is an $n \times n$ upper triangular matrix.
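A sketch that checks these operations numerically on the example matrix. The object names ainv, dec, Q and R are hypothetical.

```r
a <- matrix(c(2, 1, 1, -1), 2, 2)

t(a)                  # transpose
ainv <- solve(a)      # inverse
a %*% ainv            # identity matrix, up to rounding

dec <- qr(a)          # QR-factorization
Q <- qr.Q(dec)
R <- qr.R(dec)
Q %*% R               # reproduces a, up to rounding
t(Q) %*% Q            # identity matrix, up to rounding
```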
Solving a linear system of equations, regression estimation
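$$\begin{cases} 2x_1 + x_2 = 1 \\ x_1 - x_2 = 2 \end{cases} \iff \begin{pmatrix} 2 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \iff a \cdot x = b$$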
Regression model
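$$y_i = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \varepsilon_i, \quad i = 1, \ldots, n$$
$$y = X\beta + \varepsilon, \qquad X = \begin{pmatrix} 1 & x_{1,1} & x_{2,1} \\ 1 & x_{1,2} & x_{2,2} \\ \vdots & \vdots & \vdots \\ 1 & x_{1,n} & x_{2,n} \end{pmatrix}$$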
Alternatively:
“reg” becomes an object as output from qr
This object has a number of members (coef, res, fitted)
A more comprehensive regression analysis is done with the function lm() (linear model)
Use help("lm") to learn more about this function
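A sketch of both routes to the regression estimates. It assumes that the members mentioned above (coef, res, fitted) correspond to the functions qr.coef(), qr.resid() and qr.fitted(); the simulated y, the seed and the object names are hypothetical.

```r
x1 <- c(2, 3, 5, 6, 9, 10, 10, 12, 13, 15)
x2 <- c(1, 0, 0, 1, 0, 1, 1, 0, 1, 1)
set.seed(1)                                       # hypothetical seed
y <- 12 + 1.1 * x1 - 4.7 * x2 + rnorm(10, 0, 2)   # simulated responses

X <- cbind(1, x1, x2)          # design matrix with a column of ones
reg <- qr(X)                   # QR object
b_qr   <- qr.coef(reg, y)      # estimated coefficients
res    <- qr.resid(reg, y)     # residuals
fit_qr <- qr.fitted(reg, y)    # fitted values

fit <- lm(y ~ x1 + x2)         # the same model with lm()
coef(fit)
summary(fit)                   # standard errors, t-tests, R-squared, ...
```

Both routes should give the same coefficient estimates; lm() adds the inferential output.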
Putting it together in a script
Gather the command lines in a text file and give it the extension “.r”
Run the script file with the command source()
a<-matrix(c(2,1,1,-1),2,2)
b<-c(1,2)
x=qr.solve(a,b)
print(x)
Store in
d:\undv\732A26\macro.r
“#” precedes a comment
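Writing and running the script can be sketched as below. The script is stored in R's temporary directory rather than in d:\undv\732A26 so that the example runs on any machine; the comments inside the script are added for illustration.

```r
script <- file.path(tempdir(), "macro.r")
writeLines(c(
  "a<-matrix(c(2,1,1,-1),2,2)  # coefficient matrix",
  "b<-c(1,2)                   # right-hand side",
  "x=qr.solve(a,b)             # solve a %*% x = b",
  "print(x)"
), script)

source(script)   # runs the commands in the file and prints x
```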
When exiting R
Workspace can be saved for future sessions:
save.image("core.RData") saves the workspace into file core.RData where core is replaced by a suitable filename base.
To restore a saved workspace:
load("core.RData")
To exit from R type q()
More programming
Regular sequences:
Note! "<-" can be reversed ("->", with the object name on the right)
and most often "<-" can be replaced by "="
Repeating patterns
Note! The argument name should be specified (times or each); an unnamed second argument is taken as times
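Regular sequences and repeating patterns can be sketched as follows. The object names s1 to s5, r1 and r2 are hypothetical.

```r
s1 <- 1:5                         # 1 2 3 4 5
s2 <- seq(1, 9, by = 2)           # 1 3 5 7 9
s3 <- seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00

1:5 -> s4                         # reversed assignment
s5 = 5:1                          # "=" used instead of "<-"

r1 <- rep(c(1, 2), times = 3)     # 1 2 1 2 1 2
r2 <- rep(c(1, 2), each = 3)      # 1 1 1 2 2 2
```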
Looping and conditioning
Conditions must be within parentheses.
Normally: Put “else” directly after “}”, on the same line
Equality conditions must be given with the operator "=="
Multiple statements following a for, if, else or while must be enclosed in braces ({ }) and separated by semicolons (;) or line breaks
runif(1) gives a random U(0,1) number
General usage: runif(n,a,b), where n is the number of values; defaults: a=0, b=1
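A small sketch combining for, while, if/else and runif(). The coin-tossing setup, the seed and the variable names are hypothetical and only serve as illustration.

```r
set.seed(123)              # hypothetical seed, for reproducibility only
heads <- 0
tails <- 0
for (i in 1:100) {
  u <- runif(1)            # one random U(0,1) number
  if (u < 0.5) {
    heads <- heads + 1
  } else {                 # "else" directly after "}"
    tails <- tails + 1
  }
}

k <- 0
while (k < 3) { k <- k + 1; print(k) }   # two statements separated by ";"

if (k == 3) print("done")                # equality is tested with "=="
```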
A more complex example: Simulating regression data
Script:
x1=c(2,3,5,6,9,10,10,12,13,15) # First x-variable
x2=c(1,0,0,1,0,1,1,0,1,1) # Second x-variable
y<-as.numeric(1:10) # Dimensioning y
for (i in (1:10)) {
# Computing y using beta1=1.1 and beta2=-4.7
# Random error is normal with mean 0 and standard deviation 2
y[i]=12+1.1*x1[i]-4.7*x2[i]+rnorm(1,0,2) }
plot(x1,y) # generates a scatter plot y vs. x1
# Estimating the coefficients:
x=cbind(rep(1,each=10),x1,x2)
b=qr.solve(x,y)
print(b)
Store in file regress.r
Suppose we would like to get empirically derived confidence limits for β1, i.e. without relying on the normal distribution.
beta1<-as.numeric(1:500) # Dimensioning array of b1-values
x1=c(2,3,5,6,9,10,10,12,13,15) # First x-variable
x2=c(1,0,0,1,0,1,1,0,1,1) # Second x-variable
y<-as.numeric(1:10) # Dimensioning y
for (trial in 1:500) {
for (i in (1:10)) {
# Computing y using beta1=1.1 and beta2=-4.7
# Random error is normal with mean 0 and standard deviation 2
y[i]=12+1.1*x1[i]-4.7*x2[i]+rnorm(1,0,2) }
# Estimating the coefficients:
x=cbind(rep(1,each=10),x1,x2)
b=qr.solve(x,y)
# Storing b1 in array
beta1[trial]=b[2] }
Store in file regress2.r
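Once the 500 estimates are available, empirical limits can be read off from their sample quantiles. The sketch below is a compact variant of the script that uses replicate() instead of the double loop; the 95% level, the seed and the name limits are assumptions.

```r
set.seed(1)                                   # hypothetical seed
x1 <- c(2, 3, 5, 6, 9, 10, 10, 12, 13, 15)    # First x-variable
x2 <- c(1, 0, 0, 1, 0, 1, 1, 0, 1, 1)         # Second x-variable
X  <- cbind(1, x1, x2)

# 500 simulated data sets; keep the estimate of beta1 from each
beta1 <- replicate(500, {
  y <- 12 + 1.1 * x1 - 4.7 * x2 + rnorm(10, 0, 2)
  qr.solve(X, y)[2]
})

# Empirical 2.5th and 97.5th percentiles of the simulated estimates
limits <- quantile(beta1, c(0.025, 0.975))
limits
```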
Bootstrapping the estimated 90th percentile of a sample
Assume we wish to assess the 90th percentile of a sample from a Poisson distribution.
This means that we wish to assess the properties of the sample percentile as an estimator of the population percentile in terms of
bias
95% confidence limits
Simulate a sample of 40 observations from a Po(7)-distribution, show an initial histogram of the sample values.
Draw 500 pseudo-samples with replacement from the original sample
In each pseudo-sample, compute the sample percentile
Collect the pseudo-sample percentiles, translate them by subtracting the original sample percentile and estimate bias and 95% percentile confidence limits
Formulae for 90th sample percentile:
Let x(1), … , x(n) denote the sample arranged in ascending order, i.e. x(1) is the smallest value and x(n) is the largest value
Calculate i = 0.90·n
If i is not an integer, let the 90th percentile be x(⌊i⌋ + 1), where ⌊i⌋ is the integer part of i
If i is an integer, let the 90th percentile be (x(i) + x(i + 1))/2
This construction ensures that
at least 90% of the sample values are ≤ 90th percentile
at least 10% of the sample values are ≥ 90th percentile
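The rule can be wrapped in a small function and tried on two simple samples, one for each case. The function name perc90 is hypothetical.

```r
# 90th sample percentile according to the rule above
perc90 <- function(x) {
  xs <- sort(x)              # x(1), ..., x(n)
  i  <- 0.90 * length(xs)    # decimal order
  if (i > floor(i)) {
    xs[floor(i) + 1]         # i is not an integer
  } else {
    (xs[i] + xs[i + 1]) / 2  # i is an integer
  }
}

perc90(1:40)   # i = 36 is an integer: (36 + 37)/2 = 36.5
perc90(1:45)   # i = 40.5 is not an integer: the 41st value, 41
```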
# R-script for illustrating bootstrapping of the 90th sample percentile
n=40 # Sample size
b=500 # Number of bootstrap replications
pvec<-as.numeric(1:b) # Dimensioning vector of bootstrapped estimates
x=rpois(n,7) # Generate 40 independent Po(7)-observations
hist(x,main="Histogram from sample data",xlab=NULL)
xsort=sort(x) # Sort the data
p90index=0.90*n # Calculate decimal order for 90th percentile
if (p90index-floor(p90index)>0) {
p90=xsort[floor(p90index)+1] # 90th perc. if decimal order is non-integer
} else {
p90=(xsort[floor(p90index)]+xsort[floor(p90index)+1])/2 # 90th perc. if decimal order is integer
}
# Bootstrapping loop
for (i in 1:b) {
u=floor(n*runif(n,0,1)+1); # Vector of n integers drawn uniformly from {1,2,...,n}
xstar=x[u]; # Pseudo sample
xstarsort=sort(xstar);
if (p90index-floor(p90index)>0) {
p90star=xstarsort[floor(p90index)+1]} else { # Copying estimation method
p90star=(xstarsort[p90index]+xstarsort[p90index+1])/2}
pvec[i]=p90star;
}
pvec_sort=sort(pvec) # Sorting the bootstrapped estimates
pvec_trans=pvec_sort-p90 # Subtracting original estimate from sorted bootstr. est.
# histogram of translated bootstrap estimates
readline("Press <Enter> to show next graph")
hist(pvec_trans,main="Histogram of p90star-p90",xlab=NULL)
# Finding 2.5th and 97.5th percentiles:
L025index=0.025*b
U975index=0.975*b
if (L025index-floor(L025index)>0) { L025=pvec_trans[floor(L025index)+1] } else { L025=(pvec_trans[L025index]+pvec_trans[L025index+1])/2 }
if (U975index-floor(U975index)>0) { U975=pvec_trans[floor(U975index)+1] } else { U975=(pvec_trans[U975index]+pvec_trans[U975index+1])/2 }
# Bias estimate:
bias=mean(pvec_trans)
# 95% percentile confidence interval:
lower=p90-U975
upper=p90-L025
output<-data.frame(p90,bias,lower,upper)
names(output)<-c("90th perc.","Bias","Lower 95% limit","Upper 95% limit")
print(output)
Much more to find out!
• Use the PDF manual (read at least the first chapter)
• Use the help function (help("function") or ?function)
• Use Google (search for “R: what you are looking for”)