Introduction to R · 2016. 2. 29. · Reinhard Furrer, UZH I-Math, 12. 2. 2014 NZZ.ch Introduction...


Reinhard Furrer, UZH

I-Math, 12. 2. 2014 NZZ.ch

Introduction to R

Contents

2

• Basics
• Data handling and storing
• Plotting
• Linear models
• Simple programming tricks

3

Part 1

Basics

4

• What is R?
• The R-environment
• Getting started
• R rules

What is R?

5

• R is a language and environment for statistical computing and graphics.
• R provides a wide variety of statistical and graphical techniques, and is highly extensible.
• R produces well-designed publication-quality plots with a careful choice of default values.
• R is available as Free Software under the terms of the Free Software Foundation’s GNU General Public License in source code form.

What is R?

6

Crude classification:

• Symbolic software:
  – Mathematica
  – Maple
  – Magma
  – . . .
• Numeric software:
  – MATLAB, Octave
  – NCL, IDL
  – . . .
  – R

The R-environment: micro

7

• R is an integrated suite of software facilities
• Emphasis on statistical analysis and graphical display
• Perform an entire analysis from raw data to reports
• Essentially command line interpreted, links to precompiled code are possible

The R-environment: macro

8

Due to licence:

• freely available: cran.r-project.org
• huge community
• many packages (>5100): cran.r-project.org/web/packages/
• abundant documentation in form of: FAQs (cran.r-project.org/doc/FAQ/R-FAQ.html), manuals (cran.r-project.org/manuals.html or cran.r-project.org/other-docs.html), wikis, books, . . . see www.r-project.org
• several mailing lists: www.r-project.org/mail.html

The R-environment: macro

9

Slides are mainly based on the following sources:

• An Introduction to R (IR): cran.r-project.org/doc/manuals/R-intro.pdf
• The R Primer (RP): www.stat.washington.edu/cggreen/rprimer/
• The R Inferno (RI): www.burns-stat.com/pages/Tutor/R_inferno.pdf

and some 10 years of personal use . . .

Getting started: install R

10

Done through “The Comprehensive R Archive Network” (CRAN):

cran.r-project.org

Easy to follow instructions in Chapter 1 of RP:

www.stat.washington.edu/cggreen/rprimer/

Getting started: run R (Linux)

11

Launch R in your console:

furrer@furrer-laptop:~/teaching/intro2R> R

R version 2.15.0 (2012-03-30)
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: i686-pc-linux-gnu (32-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>

Getting started: run R

12

RStudio

Runs under Windows, Linux, OS X (free; AGPLv3) rstudio.org

Getting started: run R

13

Tinn-R (Tinn stands for the recursive acronym ’Tinn is not Notepad’)

Runs under Windows (free; GPL) sciviews.org/Tinn-R

Getting started: run R

14

EMACS environment for R (and other statistics software)

Runs under Windows, Linux, OS X (GPL)

Getting started

15

> pi

[1] 3.141593

> cos( pi)

[1] -1

> 2 + 2.3

[1] 4.3

> sqrt( -1) # Oops

[1] NaN

> myvar <- exp( -2.3) # Assigning

> print( myvar)

[1] 0.1002588

> print( myvar, digits=16)

[1] 0.1002588437228037

RStudio

Hands-on tasks 1

16

1. Open RStudio and familiarize yourself with it.

2. What is the 15th digit of π?

3. Interpret the result of sin( pi).

Getting started

17

> nrcyclones <- c(6, 5, 4, 6, 6, 3, 12, 7, 4, 2, 6, 7, 4)

> # "c" is a function... creating a vector out of its elements

> summary( nrcyclones)

Min. 1st Qu. Median Mean 3rd Qu. Max.

2.000 4.000 6.000 5.538 6.000 12.000

> hist( nrcyclones)

[Figure: Histogram of nrcyclones; x-axis: nrcyclones, y-axis: Frequency]

Getting started

18

> plot( nrcyclones, type="b")

[Figure: nrcyclones plotted against Index, points connected by lines]

> cor( nrcyclones[-1], nrcyclones[-13])

[1] -0.1113836

Getting started

19

> par( mfrow=c(1,2))

> acf( nrcyclones)

> pacf( nrcyclones)

[Figure: two panels titled “Series nrcyclones”; left: ACF against Lag, right: Partial ACF against Lag]

Getting started

20

> help( acf)
acf                    package:stats                    R Documentation

Auto- and Cross- Covariance and -Correlation Function Estimation

Description:

     The function 'acf' computes (and by default plots) estimates of
     the autocovariance or autocorrelation function. Function 'pacf'
     is the function used for the partial autocorrelations. Function
     'ccf' computes the cross-correlation or cross-covariance of two
     univariate series.

Usage:

     acf(x, lag.max = NULL,
         type = c("correlation", "covariance", "partial"),
         plot = TRUE, na.action = na.fail, demean = TRUE, ...)

     pacf(x, lag.max, plot, na.action, ...)

     ## Default S3 method:
     pacf(x, lag.max = NULL, plot = TRUE, na.action = na.fail,
          ...)

Getting started: getting help

21

Various possibilities:

> ?mean # Shortcut for help( mean)

> ?"%*%" # The quotes are required!

> help.start() # Interactive html-based help!

Further illustrative help is accessed via:

> example("image") # example code in the help of "image"

> demo("image") # run the demo "image"

> demo() # lists all available demos

We hardly use the following command:

> q()

R rules

22

• R is case-sensitive.
• Variable names, function names, etc., should contain only alphanumeric characters (A-Z, a-z, 0-9), the “.” (and “_”). They cannot be a reserved word or start with a digit or “_”.
• Commands are separated by semicolons (“;”) or by a newline. Commands are grouped with curly braces ({ }).
• # is the comment sign. The remainder of the line is ignored.
• If a command is not complete at the end of a line, R will give a continuation prompt, “+ ”, on subsequent lines until the command is complete.
• As long as they are matched, single quotes (’) and double quotes (") are equivalent.
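The rules above can be tried out in a few lines. This is only a minimal sketch; the names myVar, myvar, a, b and total are made up for illustration.

```r
# Case sensitivity: myVar and myvar are two different objects
myVar <- 1
myvar <- 2
myVar == myvar               # FALSE

# Two commands on one line, separated by a semicolon
a <- 3; b <- 4

# Curly braces group commands; the value of the last one is returned
total <- { a2 <- a^2; b2 <- b^2; a2 + b2 }
total                        # 25

# Matched single and double quotes give the same string
identical('text', "text")    # TRUE
```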

R rules: reserved words

23

The reserved words in R’s parser are:

if, else, repeat, while, for, in, next, break, function

TRUE, FALSE, NULL, Inf, NaN, NA and NA-specific types.

... and ...-derivatives, which are used to refer to arguments

passed down from an enclosing function.

There are (unprotected) short cuts T and F, for TRUE and FALSE:

> T

[1] TRUE

> T <- F # How not to do it!!

> T

[1] FALSE

R rules: functions and operators

24

Most R statements are composed of functions and operators:

> y <- sqrt(2 + 2)

consists of the + operator followed by the √ -function and then the

assign operator.

Functions are of the form function( list of arguments )

Operators are of the form lhs operator rhs

Hands-on tasks 2

25

1. What are operators and what are functions in the following calls:

2 + 1

sin( pi)

2 + cos( 0)

2. What does the function median calculate?

3. Notice the difference between ?mean, ?"mean" and ?in, ?"in".

4. Create a variable named my1var containing log( 3).

5. Which of the following are valid variable names:

yo, beHappy!, I_am_2, myvar;val, getvar1, getvar$char.

6.? Many operators can be used as functions: "operator"(lhs, rhs).

Compare: 2 + 2 and "+"(2,2)

R rules: syntax

26

R has the following operators (highest to lowest precedence):

:: :::             access variables in a name space
$ @                component / slot extraction
[ [[               indexing
^                  exponentiation (right to left)
- +                unary minus and plus
:                  sequence operator
%any%              special operators
* /                multiply, divide
+ -                (binary) add, subtract
< > <= >= == !=    ordering and comparison
!                  negation
& &&               and
| ||               or
~                  as in formulae
-> ->>             rightwards assignment
<- <<-             assignment (right to left)
=                  assignment (right to left)
?                  help (unary and binary)
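A few expressions show how the precedence order plays out in practice. This is a small sketch with arbitrarily chosen numbers; parentheses are always the safe way to make the intended grouping explicit.

```r
# Exponentiation binds tighter than unary minus
-2^2          # same as -(2^2)
(-2)^2

# Exponentiation associates right to left
2^3^2         # same as 2^(3^2)

# The sequence operator binds tighter than multiplication and addition
1:3 * 2       # same as (1:3) * 2
1:3 + 1       # same as (1:3) + 1

# ... but less tightly than exponentiation
1:2^2         # same as 1:(2^2)
```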

Hands-on tasks 3

27

1. Compare:

1:-3

1:(-3)

-1:3

-(1:3)

2. Compare:

2^1/2

2^(1/2)

3.? Be aware of floating point arithmetic:

pi==3.14159265358979

pi==3.141592653589793

pi==3.141592653589793116

28

Part 2

Data handling and storage

29

• Objects
• Indexing
• Functions
• Reading from files

Objects

30

R uses the following “core” objects:

• vectors
• matrices
• arrays
• factors
• lists
• data frames
• functions

Objects: vectors

31

Intrinsic attributes: mode and length

> v <- 1:4

> v

[1] 1 2 3 4

The mode is one of logical, numeric, complex, character (or raw).

> length( v)

[1] 4

> mode( v)

[1] "numeric"

> mode( 1i) # to give another example

[1] "complex"

The mode numeric has storage mode integer or double.
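The distinction between mode and storage mode can be checked directly. The following is a minimal sketch; the names v_int and v_dbl are chosen for illustration only.

```r
v_int <- 1:4            # the colon operator yields integers
v_dbl <- c(1, 2, 3, 4)  # numeric constants are doubles by default

mode(v_int); mode(v_dbl)     # both "numeric"
storage.mode(v_int)          # "integer"
storage.mode(v_dbl)          # "double"
typeof(v_int)                # typeof reports the storage mode as well

# The values compare as equal although the storage modes differ
all(v_int == v_dbl)
```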

Hands-on tasks 4

32

1. All elements of a vector are of the same mode.

What is the mode of c("char", pi), c(2,1i)?

2. Interpret the result of sqrt(-1) and sqrt(-1+0i)

3. is.integer and as.integer query and coerce to integer format.

What type is the output of length (two ways to verify)?

4.? Compare the results of identical(1,1.0) and

identical( as.integer(1),1.0)

5.? What is the result and storage mode of 3L, 3L*1, 3L*1L, 3L/1L,

3L/3L?

Objects: vectors: generation

33

Concatenation operator:

> v <- c( 1, 2, 3, 4)

Generate sequences (several additional possibilities exist):

> seq( 4) # identical to 1:4

[1] 1 2 3 4

> seq( 1, 12, by=2)

[1] 1 3 5 7 9 11

> seq( 1, by=2, length.out=12)

[1] 1 3 5 7 9 11 13 15 17 19 21 23

> rep( 1:4, 2) # identical to rep.int( 1:4, 2)

[1] 1 2 3 4 1 2 3 4

> rep( 1:4, each=2)

[1] 1 1 2 2 3 3 4 4

> rep( 1:4, 2:5) # identical to rep( 1:4, times=2:5)

[1] 1 1 2 2 2 3 3 3 3 4 4 4 4 4

Hands-on tasks 5

34

1. Interpret the output of the following calls:

seq( from=1, to=13, by=2)

seq( from=1, to=13, length.out=3)

seq( from=1, by=2, length.out=3)

seq( from=1, to=12, by=2, length.out=3)

2. What calls generate the sequence: 1, 4, 4, 7, 7, 7, 10, 10, 10,

10, 13, 13, 13, 13, 13?

3. Create a sequence containing TRUE and FALSE according to the

parity of the last sequence.

4. Why is it not advisable to use the command: c <- c(1, 2, 3, 4)?

Objects: matrices

35

A vector with (minimal) attribute dim

> m <- matrix( 1:16, 4, 4)

> m

[,1] [,2] [,3] [,4]

[1,] 1 5 9 13

[2,] 2 6 10 14

[3,] 3 7 11 15

[4,] 4 8 12 16

> length( m)

[1] 16

> attributes( m)

$dim

[1] 4 4

Objects: matrices

36

A matrix can contain additional attributes

> rownames( m) <- paste( "r", 1:4, sep="")

> attributes( m)

$dim

[1] 4 4

$dimnames

$dimnames[[1]]

[1] "r1" "r2" "r3" "r4"

$dimnames[[2]]

NULL

The function attr( object, name) can be used to specify an attribute:

> attr( m, "dim") <- c(2, 8) # What is the result?

Objects: matrices: generation

37

> m1 <- matrix( 1:8, nrow=4, ncol=4, byrow=TRUE) # recycling

> m2 <- diag( 1:4)

> m3 <- cbind( 1:3, 2:4, 1)

> m3

[,1] [,2] [,3]

[1,] 1 2 1

[2,] 2 3 1

[3,] 3 4 1

> t( m3) # transpose

[,1] [,2] [,3]

[1,] 1 2 3

[2,] 2 3 4

[3,] 1 1 1

Hands-on tasks 6

38

1. What is the effect of dim( m) <- c( 2, 8)? Try other values.

2. What is the result of

matrix( 1:7, nrow=4, ncol=4)

diag( m1)

rbind( 1:3, 2:4, 1)

cbind( rbind( 1:2, 3:4), 0) ?

3. Construct a block diagonal matrix with 2 blocks of sizes 2×2.

Objects: arrays

39

Arrays are higher-dimensional “matrices”

> a <- array( 1:24, c( 3, 4, 2))

> a

, , 1

[,1] [,2] [,3] [,4]

[1,] 1 4 7 10

[2,] 2 5 8 11

[3,] 3 6 9 12

, , 2

[,1] [,2] [,3] [,4]

[1,] 13 16 19 22

[2,] 14 17 20 23

[3,] 15 18 21 24

Hands-on tasks 7

40

1. What is the length of a?

2. What are its attributes?

3. aperm is the generalization of t.

Trace the elements of aperm(a,c(2,1,3)) and aperm(a,c(3,2,1)).

Objects: factors

41

Strange concept, neither numeric nor character.

> as.factor( 1:3)

[1] 1 2 3

Levels: 1 2 3

> as.factor( 1:3) + 1

[1] NA NA NA

Used in the context of categorical data.
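A short sketch of how factors behave with categorical data. The data values and the names size, f and g are hypothetical.

```r
# Hypothetical categorical data
size <- c("small", "large", "medium", "small", "large")
f <- factor(size, levels = c("small", "medium", "large"))

levels(f)        # the categories, in the order given above
as.integer(f)    # internal integer codes: 1 3 2 1 3
table(f)         # counts per category

# To recover numbers stored as factor levels, go through character
g <- as.factor(c(10, 20, 10))
as.numeric(as.character(g))   # 10 20 10
as.integer(g)                 # 1 2 1  (codes, not the values)
```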

Objects: lists

42

A vector whose elements can be of ‘any’ type.

> l <- list(1:2, as.factor(1:2), paste(1:2))

> l

[[1]]

[1] 1 2

[[2]]

[1] 1 2

Levels: 1 2

[[3]]

[1] "1" "2"

> length(l)

[1] 3

Objects: data frames

43

Matrix-like structures, in which the columns can be of different types.

> d <- data.frame( m)

> d

X1 X2 X3 X4 X5 X6 X7 X8

1 1 3 5 7 9 11 13 15

2 2 4 6 8 10 12 14 16

> attributes( d)

$names

[1] "X1" "X2" "X3" "X4" "X5" "X6" "X7" "X8"

$row.names

[1] 1 2

$class

[1] "data.frame"

Objects: data frames

44

While rownames and colnames are for matrices, names and row.names are

for data frames.

> names( d)

[1] "X1" "X2" "X3" "X4" "X5" "X6" "X7" "X8"

> row.names( d)

[1] "1" "2"

Luckily, the former work as well:

> colnames( d)

[1] "X1" "X2" "X3" "X4" "X5" "X6" "X7" "X8"

> rownames( d)

[1] "1" "2"

In general, work with dimnames.
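As a sketch of why dimnames is convenient: the same accessor works for matrices and for data frames. The objects mm and dd below are made up for illustration.

```r
mm <- matrix(1:6, nrow = 2)
dimnames(mm) <- list(c("r1", "r2"), c("a", "b", "c"))

dd <- data.frame(mm)                # row and column names carry over
dimnames(dd)                        # list of row names and column names
dimnames(dd)[[2]] <- c("x", "y", "z")
names(dd)                           # "x" "y" "z"
```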

Hands-on tasks 8

45

1. Can factors be ordered?

2. What is the difference between l[1] and l[[1]] ?

(use is.list(..) to probe the result).

3. Internally, a data.frame is a list with class data.frame .

Check d[[3]] .

4. What is the length of d? Is the result intuitive?

Objects: functions

46

R is built upon itself. Many of the functions are “visible”:

> sd
function (x, na.rm = FALSE)
sqrt(var(if (is.vector(x)) x else as.double(x), na.rm = na.rm))
<bytecode: 0x25d9408>
<environment: namespace:stats>

More later . . .

Objects: coercion and testing

47

An object type obj usually has three associated functions:

obj() , as.obj() , and is.obj() .

> is.matrix( a)

[1] FALSE

> as.matrix( v) # here equivalent to "matrix(v)"

[,1]

[1,] 1

[2,] 2

[3,] 3

[4,] 4

Hands-on tasks 9

48

1. Notice the difference between matrix( a, nrow=3)

and as.matrix( a, nrow=3)

2. What is the result of c( 0, NULL, 3),

is.array( m), is.matrix( m)

is.array( a), is.matrix( a)

3. Not all coercions work. What is the result of

as.integer( pi)

as.integer( 2i)

as.numeric( "a")

Objects: summary

49

Source: RI

Indexing

50

Basically, extraction is done via the [ operator:

> v

[1] 1 2 3 4

> v[1]

[1] 1

> v[-c(2:3)] # or v[-c(2,3)] or v[-(2:3)]

[1] 1 4

Similarly, replacement is done via the [<- operator:

> v[ 1] <- 1.1

> v[-c(2:3)] <- c(2.2, 3.3)

> v

[1] 2.2 2.0 3.0 3.3

Indexing: vectors

51

Extraction is done via the [ operator:

> v

[1] 2.2 2.0 3.0 3.3

> v[ c(1, 4)]

[1] 2.2 3.3

> v[-c(1, 4)]

[1] 2 3

> v[c(TRUE, FALSE, TRUE, FALSE)]

[1] 2.2 3.0

> v[c(TRUE, FALSE, TRUE)] # note the recycling!

[1] 2.2 3.0 3.3

Extraction for (very) long vectors:

> tail( v, 2)

[1] 3.0 3.3

> head( v, -1)

[1] 2.2 2.0 3.0
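Logical index vectors are most often produced by a comparison rather than typed by hand. A minimal sketch, using a separate vector w (a name chosen here) with the same values as v:

```r
w <- c(2.2, 2.0, 3.0, 3.3)      # same values as v on this slide

w > 2.1                          # logical vector: TRUE FALSE TRUE TRUE
w[w > 2.1]                       # elements satisfying the condition
which(w > 2.1)                   # their positions: 1 3 4
w[w > 2.1 & w < 3.2]             # combine conditions with &

# Named elements can be extracted by name
names(w) <- c("a", "b", "c", "d")
w[c("a", "d")]
```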

Indexing: matrices

52

> m <- matrix( 1:16, 4, 4)

> m[2, 3]

[1] 10

> m[1,]

[1] 1 5 9 13

> m[,1]

[1] 1 2 3 4

> m[ c(1,8,12)] # ordered columnwise

[1] 1 8 12

> m[ c(1,2,4), c(4,2,1)] # note the ordering

[,1] [,2] [,3]

[1,] 13 5 1

[2,] 14 6 2

[3,] 16 8 4

> m[cbind( c(1,2,4), c(4,2,1))] # What is the result when using rbind?

[1] 13 6 4

Indexing: matrices

53

If the matrix has appropriate dimnames attributes:

> rownames( m) <- paste( "r", 1:4, sep="")

> m

[,1] [,2] [,3] [,4]

r1 1 5 9 13

r2 2 6 10 14

r3 3 7 11 15

r4 4 8 12 16

> m["r1",]

[1] 1 5 9 13

> m[,1, drop=FALSE]

[,1]

r1 1

r2 2

r3 3

r4 4

Indexing: matrices

54

Extract or replace the diagonal values:

> n <- min( dim( m))

> diag( m)

[1] 1 6 11 16

> diag( m) <- -(1:n)

How to extract the values above the diagonal?

> m[ (1:(n-1))*(n+1)]

[1] 5 10 15

> m

[,1] [,2] [,3] [,4]

r1 -1 5 9 13

r2 2 -2 10 14

r3 3 7 -3 15

r4 4 8 12 -4

Hands-on tasks 10

55

1. Suppose that m only has rownames, interpret the result of m[,"c1"].

2. Use diag to extract the values above the diagonal.

3. Set the values of m below the diagonal to -1.

4. Compare m[cbind( c(1,2,4), c(4,2,1))] and the result when using

rbind instead?

Indexing: lists

56

Extraction is done via the [, [[ and $ operators:

> l[[1]]

[1] 1 2

> l[1]

[[1]]

[1] 1 2

> ll <- list( a=2, b=3, cde=10)

> ll$a

[1] 2

> ll$c # note the partial matching

[1] 10

Indexing: data frames

57

Column extraction is also possible with $ operator:

> d$X1 # a data frame is primarily a list!

[1] 1 2

> d[,1]

[1] 1 2

> d[,"X1"]

[1] 1 2

Similarly:

> d[1,]

X1 X2 X3 X4 X5 X6 X7 X8

1 1 3 5 7 9 11 13 15

> d["1",]

X1 X2 X3 X4 X5 X6 X7 X8

1 1 3 5 7 9 11 13 15

Indexing: other details

58

• Matrices are stored column-wise.
• Arrays are stored with the first index varying fastest.
• Objects can have length zero, e.g. v[0].
• Indexing starts at one; an index vector may also consist entirely of negative values, which exclude the corresponding elements.
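The storage order can be verified by comparing single-index and multi-index extraction. This is a sketch with made-up objects mat and arr; the position formula assumes the dimensions c(3, 4, 2) used below.

```r
mat <- matrix(1:6, nrow = 2)     # filled column by column
mat[3]                           # third stored element: row 1, column 2
mat[1, 2]                        # the same element

arr <- array(1:24, c(3, 4, 2))
i <- 2; j <- 3; k <- 2
# Position in the underlying vector: the first index varies fastest
pos <- i + (j - 1) * 3 + (k - 1) * 3 * 4
arr[pos] == arr[i, j, k]         # TRUE

length(mat[0])                   # 0: a zero-length result
mat[-(1:4)]                      # negative indices exclude elements
```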

Hands-on tasks 11

59

1. What happens if ll <- list( a=2, b=3, cd=10, ce=12) is indexed

with ll$c?

2. What elements are extracted with m[1:6], a[1:4*2]?

3. Let exist <- 1:14. What elements are extracted with exist[-c(1:3)],

exist[c(1:3)]? What is the result of exist[-1:3]

4.? Examine the code

nonexist[2] <- 1

nonexist <- numeric(0)

length(nonexist)

nonexist[0]

nonexist[1]

nonexist[2] <- 1

nonexist

Functions

60

Example:

> x <- mean( x, trim=.1)

General structure:

> res <- fcn( defarg1, defarg2,..., optarg1, optarg2, ...)

• res may be NULL
• Required arguments need to be in order.
• Optional arguments are name matched.
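Argument matching can be illustrated with a small user-defined function. The function standardize and its arguments are hypothetical and serve only as an example.

```r
# Hypothetical function: 'x' is required, 'center' and 'scale' have defaults
standardize <- function(x, center = mean(x), scale = sd(x)) {
  (x - center) / scale
}

z <- c(2, 4, 6)
standardize(z)                         # defaults are used
standardize(z, 0, 2)                   # positional matching: center=0, scale=2
standardize(z, scale = 2, center = 0)  # named arguments, in any order
standardize(z, sc = 2)                 # partial matching of 'scale' also works
```

Naming optional arguments in full is the more robust habit, since partial matching can become ambiguous when a function gains new arguments.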

Functions: “Math” group

61

Math(x, ...): abs, sign, sqrt, floor, ceiling, trunc

round, signif, exp, log, expm1, log1p

cos, sin, tan, acos, asin, atan

cosh, sinh, tanh, acosh, asinh, atanh

lgamma, gamma, digamma, trigamma

cumsum, cumprod, cummax, cummin

Ops(e1, e2): "+", "-", "*", "/", "^", "%%", "%/%"

"&", "|", "!"

"==", "!=", "<", "<=", ">=", ">"

Summary(..., na.rm=FALSE): all, any, sum, prod, min, max, range

Complex(z): Arg, Conj, Im, Mod, Re

Hands-on tasks 12

62

1. What is the result of min( c( 1, 3, NA)) ?

Is there a difference to min( 1, 3, NA) ?

How to get the result of 1?

2. What is the result of 17 %% 7 and 17 %/% 7 ? Why?

3.? It is possible to define functions without a function name:

(function(x,y) { z <- x**2 + y**2; x+y+z } )(0:7, 1)

Functions: matrices

63

For matrices, special operators are defined:

> m1 <- m2 <- matrix(1, 2, 2)

> m1[2, 2] <- 2

> m1 %*% m2

[,1] [,2]

[1,] 2 2

[2,] 3 3

> solve( m1)

[,1] [,2]

[1,] 2 -1

[2,] -1 1

> det( m1)

[1] 1

Functions: matrices: factorization

64

> svd( m1) # X = U D V'
$d
[1] 2.618034 0.381966

$u
           [,1]       [,2]
[1,] -0.5257311 -0.8506508
[2,] -0.8506508  0.5257311

$v
           [,1]       [,2]
[1,] -0.5257311 -0.8506508
[2,] -0.8506508  0.5257311

> chol( m1) # X = R' R
     [,1] [,2]
[1,]    1    1
[2,]    0    1
> eigen( m1) # X = G D G' ## We see eigen and chol again!
$values
[1] 2.618034 0.381966

$vectors
          [,1]       [,2]
[1,] 0.5257311 -0.8506508
[2,] 0.8506508  0.5257311

Functions: matrices: factorization

65

> qr( m1)
$qr
           [,1]       [,2]
[1,] -1.4142136 -2.1213203
[2,]  0.7071068  0.7071068

$rank
[1] 2

$qraux
[1] 1.7071068 0.7071068

$pivot
[1] 1 2

attr(,"class")
[1] "qr"

There are several additional functions associated: qr.qy, qr.qty, . . .

Hands-on tasks 13

66

Let M <- m1 %*% t( m1)

1. What is the eigendecomposition of M ?

2. What are the singular values of the same matrix?

3. Propose several approaches to construct an inverse of

M + diag( 2)

4. How can you calculate the trace of an arbitrary matrix A ?

Functions: probability distributions

67

General construct of prefix and root.

• prefix: d density, p CDF, q quantile, r random numbers
• root: beta, binom, pois, norm, t, and many more

For example:

> runif( 5)

[1] 0.2282756 0.1472576 0.8364201 0.8430635 0.0640814

> dnorm( 0)

[1] 0.3989423

> qt( 0.975, df=1)

[1] 12.7062

Parameters are “quite” standard, consult the help.
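The prefix and root construct can be seen at work for the normal and binomial distributions. A minimal sketch with arbitrarily chosen parameter values:

```r
u <- rnorm(5, mean = 10, sd = 2)  # r: five random numbers
dnorm(10, mean = 10, sd = 2)      # d: density evaluated at the mean
pnorm(1.96)                       # p: CDF, about 0.975
qnorm(0.975)                      # q: quantile, about 1.96

# p and q are inverse to each other for continuous distributions
all.equal(qnorm(pnorm(1.5)), 1.5)

# For discrete distributions q returns the smallest value with CDF >= p
pbinom(qbinom(0.5, size = 10, prob = 0.3), size = 10, prob = 0.3) >= 0.5
```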

Functions: apply

68

Applying a function to margins of an array or matrix.

> d

X1 X2 X3 X4 X5 X6 X7 X8

1 1 3 5 7 9 11 13 15

2 2 4 6 8 10 12 14 16

> apply( d, 2, mean)

X1 X2 X3 X4 X5 X6 X7 X8

1.5 3.5 5.5 7.5 9.5 11.5 13.5 15.5

> apply( d, 1, range)

[,1] [,2]

[1,] 1 2

[2,] 15 16

> apply( d, 1, function(x, tr) { x[2] - mean(x, trim=tr)}, tr=.4)

[1] -5 -5
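For common summaries there are alternatives to apply. The sketch below rebuilds a data frame with the same values as d under the hypothetical name dat and compares three ways of computing column means.

```r
dat <- data.frame(matrix(1:16, nrow = 2))   # same values as d above

col_means <- apply(dat, 2, mean)
all.equal(col_means, colMeans(dat))          # dedicated function
all.equal(col_means, sapply(dat, mean))      # a data frame is a list of columns

# apply collects its results column-wise: one column per row of dat here
rng <- apply(dat, 1, range)
dim(rng)                                     # 2 2
```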

Hands-on tasks 14

69

1. Draw a normal sample of size 100 and draw a histogram of the

sample.

What is the mean and standard deviation of the sample?

2. Repeat the previous exercise 1000 times and calculate the mean

of the means and the standard deviations.

3. How do the results compare to the ones from your peers?

Is there a way to “homogenize” the procedure?

Reading from files: data files

70

• Several possibilities of reading ASCII files:

> read.table(file, header = FALSE, sep = "")

> read.csv(file, header = TRUE, sep = ",", quote="\"")

> scan(file, ...)

• scan is a powerful (complex) alternative.
• Fixed-width formatted data is read with read.fwf.
• Common open source storage formats are supported: netCDF, GRIB, HDF, . . . (specific packages need to be loaded).
• Base R cannot read Excel files directly (proprietary format); this requires a contributed package or an export to text.

Reading from files: R code/objects

71

• R “source code” is read and evaluated with source("filename.R")
• R data files are read with load("file.RData")
• To save R objects use

> save.image()

> save(..., file="file.RData") # symbols or character strings

Note the save.image question when quitting R.
• data() lists all the available datasets in the search path (directly available). data( package=.packages( all.available=TRUE)) lists all the available datasets.
• data( name, package="packagename") loads name from the package packagename.
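A round trip through a text file and through the R-native format can be sketched as follows. The data frame dat and the temporary file names are made up for illustration.

```r
# Hypothetical data, written to temporary files
dat <- data.frame(id = 1:3, value = c(0.5, 1.5, 2.5))
csv_file   <- tempfile(fileext = ".csv")
rdata_file <- tempfile(fileext = ".RData")

write.csv(dat, csv_file, row.names = FALSE)
dat_csv <- read.csv(csv_file)            # header = TRUE, sep = "," by default
all.equal(dat, dat_csv)

save(dat, file = rdata_file)             # R-native format
rm(dat)
load(rdata_file)                         # restores the object under its name
exists("dat")
```

Note that load recreates objects under their original names, so it can silently overwrite objects of the same name in the workspace.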

Hands-on tasks 15

72

On www.math.uzh.ch/furrer/software/workshop/ the three datasets

data1.dat, data2.dat and data3.dat are deposited (use entire link).

1. Download the datasets and look at the content thereof.

What are the differences?

2. Load these three datasets into R, by properly keeping column and

row names of the original data.

Try to specify directly the URL instead of the filename, what

do you notice?

3. Save one of the datasets in R-native format.

4.? Are there ways to reduce the file size?

73

Part 3

Plotting

74

• Plotting in R
• High-level plotting (HLP) functions
• Low-level plotting (LLP) functions
• Interactive graphics functions
• Graphical parameters

Plotting in R

75

R distinguishes different plotting type functions:

• High-level plotting (HLP) functions create a new plot on the graphics device, possibly with axes, labels, titles and so on.
• Low-level plotting (LLP) functions add more information to an existing plot, such as extra points, lines and labels.
• Interactive graphics functions allow you to interactively add information to, or extract information from, an existing plot, using a pointing device such as a mouse.

R maintains a list of graphical parameters which can be manipulated

to customize your plots.

Plotting in R: workflow

76

General workflow:

1. Choosing a device (screen, PDF file, . . . )

2. Setting graphical parameters

3. Calling a high-level plotting function

4. Calling low-level plotting functions

5. More calls to high-level and low-level functions

6. Closing the device

Simplest example (i.e., point 3 only):

> plot(0)

Plotting in R: workflow: example

77

> x <- rnorm( 100) # 100 random numbers

> pdf( "figure1.pdf") # Output to a PDF file

> par( mfrow=c(1, 2)) # Two panels for this plot

> hist( x) # high-level call

> abline( v=mean( x)) # low-level call

> qqnorm( x) # second high-level call

> dev.off() # close the device

produces:

[Figure: two panels; left: Histogram of x (Frequency against x) with a vertical line at the mean; right: Normal Q-Q Plot (Sample Quantiles against Theoretical Quantiles)]

Plotting in R: workflow

78

• If no device is open, the default one will be used (usually screen).
• When producing files, dev.off() is required.
• Each new high-level plot overwrites the current area, unless specified differently (usually, add=TRUE).
• Several devices can be open, only one is active. Use dev.cur() and dev.set() to inquire and set the active device.
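The workflow and the remarks above can be combined into one short script. This is a sketch only; the output file is a temporary file and the plotted functions are arbitrary choices.

```r
fig <- tempfile(fileext = ".pdf")   # hypothetical output file
pdf(fig, width = 7, height = 4)     # 1. choose a device
par(mfrow = c(1, 2))                # 2. set graphical parameters
curve(sin, 0, 2 * pi)               # 3. high-level call: starts a new plot
abline(h = 0, lty = 2)              # 4. low-level call: adds to it
curve(cos, 0, 2 * pi)               # 5. next high-level call: second panel
curve(sin, 0, 2 * pi, add = TRUE, col = 2)  # add=TRUE draws into the same panel
dev.off()                           # 6. close the device, the file is written
file.exists(fig)
```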

HLP functions

79

Common high-level plotting functions:

plot(x, y)         most basic plotting command, flexible
hist(x)            histogram (specify breaks for discrete data)
boxplot(x)         boxplot of one or several variables
qqnorm(y)          quantile-quantile plot (empirical vs normal)
qqplot(x, y)       quantile-quantile plot (empirical vs arbitrary)
pairs(x)           scatterplots for multidimensional data
curve(expr)        plots a function
image(x, y, z)     z = f(x, y) is provided in a matrix
contour(x, y, z)   z = f(x, y) is provided in a matrix
persp(x, y, z)     basic 3D plotting with shading

Hands-on tasks 16

80

1. Draw a random sample of size 15 from a normal distribution.

Plot a histogram and superimpose the true density.

2. Repeat the experiment 100 times and superimpose a histogram

of the means.

HLP functions: 3D plotting

81

Consider $X_1, \dots, X_n \overset{\text{iid}}{\sim} \mathcal{N}(\mu, \sigma^2)$.

Investigate the likelihood function $L(\mu, \sigma) = \prod_{i=1}^{n} f_X(x_i; \mu, \sigma)$.

For numerical stability, we work with the log-likelihood.

> mu <- 2

> sigma <- 2

> n <- 20

> x <- rnorm(n,mu,sigma)

> loglikelihood <- function(pars, x) {

+ return( sum( dnorm( x, pars[1], pars[2], log=T) ) )

+ }

HLP functions: 3D plotting

82

Evaluate the log-likelihood over a grid

> ns <- 50

> m <- seq( 1, to=4, length=ns)

> s <- seq( 1, to=5, length=ns)

> grid <- expand.grid( m, s)

> ll <- apply( grid, 1, loglikelihood, x=x) # What is ll?

> llmat <- matrix( ll, ns) # What is dim(llmat)? Why?

> image( m, s, llmat)

[Figure: image plot of llmat; x-axis: m, y-axis: s]

HLP functions: 3D plotting

83

> ncol <- 64

> mx <- unlist( grid[ which.max( ll),])

> image( m, s, llmat, col=topo.colors(ncol),

+ xlab=expression(mu), ylab=expression(sigma))

> abline( v=mx[1], h=mx[2])

[Figure: image plot of llmat in topo colors, with a vertical and a horizontal line through the grid maximum; x-axis: µ, y-axis: σ]

HLP functions: 3D plotting

84

> image( m, s, llmat, col=topo.colors(64),

+ xlab=expression(mu), ylab=expression(sigma))

> abline( v=mx[1], h=mx[2])

> box()

> contour( m, s, llmat, add=T)

[Figure: the same image plot with contour lines of the log-likelihood superimposed (labelled levels from −70 to −40); x-axis: µ, y-axis: σ]

HLP functions: 3D plotting

85

> persp( m, s, llmat)

[Figure: perspective plot of llmat over m and s]

HLP functions: 3D plotting

86

> persp( m, s, llmat, phi=45, theta=30)

[Figure: perspective plot of llmat over m and s, viewed with phi=45 and theta=30]

HLP functions: 3D plotting

87

> persp( m, s, llmat, phi=45, theta=30, axes=FALSE, box=FALSE)

HLP functions: 3D plotting

88

> zfacet <- llmat[-1,-1]+llmat[-1,-ns]+llmat[-ns,-1]+llmat[-ns,-ns]

> facetcol <- cut( zfacet, ncol)

> brcol <- colorRampPalette( c("white","yellow", "red") )

> persp( m, s, llmat, phi=45, theta=30, axes=FALSE, box=FALSE,

+ col=brcol( ncol)[facetcol]) -> out

HLP functions: 3D plotting

89

> persp( m, s, llmat, phi=45, theta=30, axes=FALSE, box=FALSE,

+ col=brcol(ncol)[facetcol], border=NA)

> points( trans3d(mx[1], mx[2], max( llmat), out), cex=4, col=4,

+ pch=4)

HLP functions: 3D plotting

90

Grid search delivers maximum:

> c( value=max(ll), mx)

value Var1 Var2

-39.924118 2.408163 1.816327

Numerical optimum is at:

> par <- optim( mx, function( theta) -loglikelihood( theta, x),

+ method="L-BFGS-B", lower=c( -Inf, 0))

> c( value=-par[["value"]], par[["par"]])

value Var1 Var2

-39.913951 2.381047 1.780258

> par[4] # _ALWAYS_ check!

$convergence

[1] 0

For a maximization, set control$fnscale to a negative value.
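The remark on fnscale can be sketched as follows: instead of minimizing the negative log-likelihood, the log-likelihood itself is maximized. The seed, the starting values c(1, 1) and the small positive lower bound for sigma are assumptions made for this example; for the normal model the result can be compared with the closed-form estimates.

```r
set.seed(1)                              # arbitrary seed for reproducibility
xs <- rnorm(20, mean = 2, sd = 2)        # simulated sample

loglik <- function(pars, x) sum(dnorm(x, pars[1], pars[2], log = TRUE))

# fnscale = -1 turns optim's minimization into a maximization
fit <- optim(c(1, 1), loglik, x = xs,
             method = "L-BFGS-B", lower = c(-Inf, 1e-3),
             control = list(fnscale = -1))
fit$convergence                          # 0 indicates success

# Closed-form maximum likelihood estimates for comparison
mle <- c(mean(xs), sqrt(mean((xs - mean(xs))^2)))
rbind(optim = fit$par, closed.form = mle)
```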

Hands-on tasks 17

91

Draw a random sample of size two from a normal density.

1. Plot the log-likelihood as a function of x1 and x2.

2. Plot the log-likelihood as a function of µ and σ.

LLP functions

92

Common low-level plotting functions:

points, lines      similar to plot
title              main/sub above/below the panel
abline             v, h, or intercept/slope
text               like points with text instead
mtext              quite flexible
legend             flexible through many parameters
axis               add additional axis (see xaxt, yaxt)
box                around the panel
arrows, segments   . . .
polygon            . . .
rect               . . .

Interactive graphics function

93

• locator(n=512): gets n coordinates of the graphics cursor when the left mouse button is pressed.
• identify(x, y, n=length(x)): after a left mouse button click, reads the position and searches for the closest point among x, y. Returns the indices of the identified points.
• Both functions quit when any other button is pressed.
• For more interaction, use package rgl.

Graphical parameters

94

The function par queries and sets plotting parameters (similar to options for “system” parameters).

> par("bty") # Frame is a rectangle

[1] "o"

> par(bty="n") # no frame/box is drawn

> par("bty")

[1] "n"

Many options are available, see for example:

> par()

?par is my most frequent help call.

Graphical parameters

95

Further parameters:

adj            text adjustment (.5 is default, centering)
bg fg          background and foreground (default) color
cex, cex._     magnification of text and symbols relative to the default
col, col._     color specification (numbers 0:7, words, rgb hex string)
las            rotation style of axis labels
lty            line type (1=solid, 2=dashed, 3=dotted, ...)
lwd            line width
mfrow, mfcol   array of subplots filled by row/column
new            if TRUE the next HLP will not clean the frame
pch            specifying the symbol used for points
pty            if s use square plotting area
xaxs, yaxs     i for precise axis bounds
xaxt, yaxt     n to suppress axis drawing
xlog, ylog     if TRUE use logarithmic scale

where “_”: axis, lab, main, sub

Graphical parameters

96

mai and omi (in inches) or mar and oma (in “lines”):

As well as mgp, (defaults to c(3,1,0)). . .

Graphical parameters: example

97

> sample <- rt(100, df=2)

> boxplot( sample)

[Figure: boxplot of sample with default graphical parameters]

Graphical parameters: example

98

> par(bty="l", col=5, col.main=2, cex=2)

> boxplot( sample, main="Boxplot")

[Figure: boxplot of sample with title “Boxplot”, drawn with the modified parameters]

Graphical parameters: example

99

> par(bty="l", col.main=2, col.axis=4, cex=2, mai=c(.1,.7,.5,.1),

+ mgp=c(3,.8,0), adj=1, las=1, pch="-")

> boxplot( sample, main="Boxplot", col=5)

[Figure: boxplot of sample with horizontal axis labels and the title “Boxplot” right-adjusted]

Graphical parameters: example

100

> par(bty="l", col.main=2, col.axis=4, cex=2, mai=c(.1,.7,.5,.1),

+ mgp=c(3,.8,0), adj=1, las=1, pch="-")

> boxplot( sample, col=5)

> title("Boxplot", adj=.5)

[Figure: boxplot of sample with horizontal axis labels and the title “Boxplot” centered]

Hands-on tasks 18

101

Create the following plot. The data is available at:

www.math.uzh.ch/furrer/software/workshop/wheat.csv

[Figure: plot titled “Durum” showing US production (mio bushel) and price (USD per bushel) on two vertical axes for the years 2008/09 to 2011/12]

Hands-on tasks 19

102

Create the following plot.

[Figure: two panels showing f(x) against x for x from −3 to 3; left: e^x and ln(x); right: cosh(x) and arcosh(x)]

103

Part 4

Linear models

104

• A regression example
• Objects of class formula
• lm object
• Another regression example
• Other uses of formula objects

A regression example

105

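Suppose we have a response $y$ for a set of predictors $x_1, \dots, x_p$.

Assume a linear model

$$y_i = \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \varepsilon_i, \qquad \varepsilon_i \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2), \quad i = 1, \dots, n,$$

in matrix notation $y = X\beta + \varepsilon$.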

Given response and predictors “solve” the regression problem:

• What are the estimates β̂?
• Which predictors are significant?
• Is the model adequate?
• . . .
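For the first question, the least squares estimate solves the normal equations, $\hat\beta = (X^\top X)^{-1} X^\top y$. The sketch below computes it by hand for simulated data and compares it with lm; the seed and the names x1, yy, X and beta_hat are assumptions of this example, not part of the slides.

```r
set.seed(2)                          # arbitrary seed
n  <- 10
x1 <- runif(n, -1, 2)                # hypothetical predictor
yy <- 1 + 1 * x1 + rnorm(n, sd = 0.5)

X <- cbind(1, x1)                    # design matrix with intercept column
beta_hat <- solve(crossprod(X), crossprod(X, yy))   # solves (X'X) b = X'y

fit <- lm(yy ~ x1)
cbind(normal.eq = drop(beta_hat), lm = coef(fit))
```

In practice lm is preferable: it uses a QR decomposition, which is numerically more stable than forming X'X explicitly.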

A regression example

106

Artificial data, so we know the “truth”:

> n <- 10

> x <- runif( n, -1, 2)

> beta <- c( 1, 1)

> sigma <- .5

> y <- beta[1] + beta[2]*x + rnorm( n, sd=sigma)

> plot( x, y)

[Figure: scatterplot of y against x]

A regression example

107

A linear model is fitted with

> lm1 <- lm( y~x)
> summary( lm1)

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-1.1663 -0.3133  0.1224  0.3003  0.6425

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.9993     0.2257   4.426 0.002208 **
x             1.0673     0.2031   5.255 0.000769 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.577 on 8 degrees of freedom
Multiple R-squared: 0.7754, Adjusted R-squared: 0.7473
F-statistic: 27.62 on 1 and 8 DF, p-value: 0.000769

A regression example

108

> coef( lm1)

(Intercept) x

0.9992531 1.0673167

> fitted( lm1)

1 2 3 4 5 6 7

0.7820819 1.1234585 1.7661843 2.8399724 0.5777119 2.8085353 2.9567395

8 9 10

2.0477779 1.9463282 0.1297729

> resid( lm1)

1 2 3 4 5

-0.39579007 0.23662769 0.32153818 0.17254164 -0.12536026

6 7 8 9 10

0.64252432 0.07220797 -0.37600485 -1.16633597 0.61805134

A regression example

109

> par( mfrow=c(2, 2))

> plot( lm1)

[Figure: the four diagnostic panels produced by plot( lm1): Residuals vs Fitted, Normal Q−Q, Scale−Location, and Residuals vs Leverage (with Cook's distance contours). Observations 6, 9 and 10 are labelled.]

A regression example


> pre <- predict( lm1, newdata=data.frame(x=0))

> pre

1

0.9992531

> plot( x, y)

> points( 0, pre, col=2, cex=2)

[Figure: scatter plot of y against x with the prediction at x = 0 added as a large red circle.]

A regression example


> new <- data.frame( x = seq(-2, 3, by=0.25))

> pred.w.plim <- predict( lm1, new, interval="prediction")

> pred.w.clim <- predict( lm1, new, interval="confidence")

> plot( x, y)

> points( 0, pre, col=2, cex=2)

> matlines( new$x, cbind(pred.w.clim, pred.w.plim[,-1]), lty=1)

[Figure: scatter plot of y against x with the prediction at x = 0, the fitted line, and the confidence and prediction bands.]
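To see why two sets of bands appear, one can compare interval widths: the prediction interval also accounts for the noise of a new observation and is therefore wider than the confidence interval for the mean. A sketch on simulated data (seed and object names are assumptions):

```r
set.seed(3)
x <- runif(10, -1, 2)
y <- 1 + x + rnorm(10, sd = 0.5)
fit <- lm(y ~ x)

new  <- data.frame(x = c(-2, 0.5, 3))
clim <- predict(fit, new, interval = "confidence")
plim <- predict(fit, new, interval = "prediction")

# Both matrices have the columns fit, lwr and upr
width.c <- clim[, "upr"] - clim[, "lwr"]
width.p <- plim[, "upr"] - plim[, "lwr"]
cbind(x = new$x, confidence = width.c, prediction = width.p)
```

Both widths grow as x moves away from the observed data, which is why the bands fan out at the edges of the plot.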

Objects of class formula


General structure: LHS ~ RHS

I ~ is used to define a model formula

I LHS is usually a single vector, the response

I RHS is of the form

op_1 term_1 op_2 term_2 ...

where each op_i is either + or - and each term_i is a formula expression consisting of factors, vectors or matrices connected by formula operators.

I Examples of formula operators are in IR, p. 52.

I I(object): object is treated "as is", which inhibits the interpretation of operators as model operators.

I offset(object): a term in the linear model with known coefficient (= 1).
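A few formula objects in action, as a sketch. The data frame d and its columns are made up for illustration; only the formula syntax is the point.

```r
set.seed(4)
d <- data.frame(x = runif(30, 0, 2), z = runif(30))   # hypothetical data
d$y <- 1 + 2 * d$x^2 + d$z + rnorm(30, sd = 0.1)

f1 <- y ~ x + z                # intercept, x and z
f2 <- y ~ x + z - 1            # "- 1" removes the intercept
f3 <- y ~ I(x^2)               # I(): ^ is arithmetic here, not a formula operator
f4 <- y ~ I(x^2) + offset(z)   # coefficient of z is fixed at 1

class(f1)                      # "formula"
names(coef(lm(f2, data = d)))  # "x" "z"
names(coef(lm(f4, data = d)))  # "(Intercept)" "I(x^2)"
```

The offset term does not show up among the coefficients because nothing is estimated for it.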

lm object


Generics for lm objects:

plot, print, summary, residuals (resid), coef, predict, add1, drop1, step, deviance, formula, anova, vcov, kappa, effects

There exist some more . . .
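The complete list for the running session can be queried with methods(). A short sketch, using the cars dataset that ships with R as a stand-in:

```r
# All methods registered for class "lm" in the current session
m <- methods(class = "lm")
print(m)

# A generic dispatches on the class of its first argument
fit <- lm(dist ~ speed, data = cars)
class(fit)        # "lm"
vcov(fit)         # estimated covariance matrix of the coefficients
deviance(fit)     # residual sum of squares
```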

Another regression example


> pairs( swiss, panel = panel.smooth, main = "swiss data",

+ col = 3 + (swiss$Catholic > 50), gap=0)

[Figure: scatterplot matrix titled "swiss data" of the variables Fertility, Agriculture, Examination, Education, Catholic and Infant.Mortality, with a smoother in each panel; point color indicates whether Catholic exceeds 50.]

Another regression example


> summary( lmswiss <- lm(Fertility ~ . , data = swiss))

Call:
lm(formula = Fertility ~ ., data = swiss)

Residuals:
     Min       1Q   Median       3Q      Max
-15.2743  -5.2617   0.5032   4.1198  15.3213

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      66.91518   10.70604   6.250 1.91e-07 ***
Agriculture      -0.17211    0.07030  -2.448  0.01873 *
Examination      -0.25801    0.25388  -1.016  0.31546
Education        -0.87094    0.18303  -4.758 2.43e-05 ***
Catholic          0.10412    0.03526   2.953  0.00519 **
Infant.Mortality  1.07705    0.38172   2.822  0.00734 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.165 on 41 degrees of freedom
Multiple R-squared:  0.7067,    Adjusted R-squared:  0.671
F-statistic: 19.76 on 5 and 41 DF,  p-value: 5.594e-10

Another regression example


> drop1( lmswiss, test="F")

Single term deletions

Model:
Fertility ~ Agriculture + Examination + Education + Catholic +
    Infant.Mortality
                 Df Sum of Sq    RSS    AIC F value    Pr(>F)
<none>                        2105.0 190.69
Agriculture       1    307.72 2412.8 195.10  5.9934  0.018727 *
Examination       1     53.03 2158.1 189.86  1.0328  0.315462
Education         1   1162.56 3267.6 209.36 22.6432 2.431e-05 ***
Catholic          1    447.71 2552.8 197.75  8.7200  0.005190 **
Infant.Mortality  1    408.75 2513.8 197.03  7.9612  0.007336 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Another regression example


> add1( lm( Fertility ~ 1, data=swiss), ~ Agriculture +

+ Examination + Education + Catholic + Infant.Mortality)

Single term additions

Model:
Fertility ~ 1
                 Df Sum of Sq    RSS    AIC
<none>                        7178.0 238.34
Agriculture       1     894.8 6283.1 234.09
Examination       1    2994.4 4183.6 214.97
Education         1    3162.7 4015.2 213.04
Catholic          1    1543.3 5634.7 228.97
Infant.Mortality  1    1245.5 5932.4 231.39

Other uses of formula objects


I Functions like plot or boxplot can be fed with a formula object.

I Generalized linear models, extensions of linear models:

glm( formula, family = gaussian, data, weights, subset, ...)
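Both points can be sketched with the swiss data used above. The derived column CatholicMajority and the choice of predictors are assumptions for illustration.

```r
# Formula interface of boxplot(): response ~ grouping variable
sw <- swiss
sw$CatholicMajority <- sw$Catholic > 50      # hypothetical grouping column
boxplot(Fertility ~ CatholicMajority, data = sw,
        xlab = "Catholic > 50", ylab = "Fertility")

# With family = gaussian, glm() reproduces the linear model fit
fit.lm  <- lm(Fertility ~ Education + Catholic, data = swiss)
fit.glm <- glm(Fertility ~ Education + Catholic, family = gaussian, data = swiss)
all.equal(coef(fit.lm), coef(fit.glm))       # TRUE

# Other families use the same formula syntax, e.g. a logistic regression
fit.bin <- glm(CatholicMajority ~ Education, family = binomial, data = sw)
coef(fit.bin)
```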


Part 5

Programming tricks


I Search path

I Scripting

I Functions

I Writing packages

I Customize the environment

I Writing documents

Search path


R objects of a session are stored in environments.

The global environment is called the workspace.

> ls()

[1] "a" "beta" "brcol" "d"

[5] "facetcol" "grid" "l" "ll"

[9] "llmat" "lm1" "loglikelihood" "m"

[13] "m1" "m2" "m3" "mu"

[17] "mx" "myvar" "n" "ncol"

[21] "nrcyclones" "ns" "par" "s"

[25] "sample" "sigma" "v" "x"

[29] "y" "zfacet"

> rm( m1, m2, m3, facet, loglikelihood, nrcyclones, facetcol, grid,

+ llmat, zfacet, ncol, mx, brcol, ll, myvar, lm1, sample)

> ls()

[1] "a" "beta" "d" "l" "m" "mu" "n" "ns"

[9] "par" "s" "sigma" "v" "x" "y"

Search path


To list all environments or databases:

> search()

[1] ".GlobalEnv" "package:stats" "package:graphics"

[4] "package:grDevices" "package:utils" "package:datasets"

[7] "package:methods" "Autoloads" "package:base"

Variables are searched for in the databases until an appropriate match

is found.
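A consequence of this lookup order is masking: an object in the workspace hides an object of the same name further down the path. A small sketch, assuming a standard session in which only the default packages are attached:

```r
# Where is a name found?
find("sd")        # typically "package:stats"
find("mean")      # typically "package:base"

# A workspace object masks the constant pi from package:base
pi <- 3
pi                # 3: the first match on the search path wins
base::pi          # the original is still reachable via ::

# Removing the workspace object makes the original visible again
rm(pi)
pi
```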

Search path: data frames


attach allows you to put the “columns” of the argument in your

“search path”, i.e., they are directly accessible.

> X1

Error in try(X1) : object 'X1' not found

> attach( d) # reverse is done with a detach(d)

> X1

[1] 1 2

> search()

[1] ".GlobalEnv" "d" "package:stats"

[4] "package:graphics" "package:grDevices" "package:utils"

[7] "package:datasets" "package:methods" "Autoloads"

[10] "package:base"

> detach( d)

> search()[1:3]

[1] ".GlobalEnv" "package:stats" "package:graphics"
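An alternative that leaves the search path untouched is with(), which evaluates one expression with the columns of a data frame visible. A sketch; the data frame below is a hypothetical stand-in for d:

```r
d <- data.frame(X1 = 1:2, X2 = c(2.5, 4))   # hypothetical stand-in for d

# Columns are visible inside the expression only
res <- with(d, X1 + X2)
res                                         # 3.5 6.0

# The search path is unchanged
search()[1:2]
```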

Hands-on tasks 20


1. What does the command rm( list=ls()) do?

2. Attach d, change an entry in X1, then attach d again.

What do you notice?

Scripting


I Save R commands in a file.

File is executed with source( filename ),

where filename is a character string.

I Scripting is faster than line by line evaluation.

I Better programming practice compared to history re-evaluation!

I Make use of #.

I Add plenty of spaces or newlines to structure the code.
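A minimal sketch of the workflow. The script is written to a temporary file here only to keep the example self-contained; normally it would be a file edited by hand, and the object names are made up.

```r
# Create a tiny script file (stand-in for a hand-written one)
script <- tempfile(fileext = ".R")
writeLines(c(
  "# draw a sample and summarise it",
  "set.seed(5)",
  "z <- rnorm(100)",
  "z.mean <- mean(z)"
), script)

# Execute all commands of the file; objects end up in the workspace
source(script)
z.mean

# echo = TRUE additionally prints each command as it is evaluated
source(script, echo = TRUE)
```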

Scripting: flow control


I if-statements:

> if(condition) expr

> if(condition) cons.expr else alt.expr

I Control:

> stop('message')

> warning('message') # evaluation is continued

I Loops:

> for(var in seq) expr

> while(condition) expr

> repeat expr # needs a break

Most loops can be avoided by “vectorizing” the commands.
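The constructs above in a small runnable sketch (the vector x is made up; next, which skips to the next iteration, is not on the slide):

```r
x <- c(3, -1, 4, -1, 5)

# if / else on a single condition
if (any(x < 0)) cat("negative entries present\n") else cat("all non-negative\n")

# for: sum of the positive entries
total <- 0
for (xi in x) {
  if (xi < 0) next            # skip negative values
  total <- total + xi
}
total                          # 12

# while: halve until below 1
v <- 10
while (v >= 1) v <- v / 2
v                              # 0.625

# repeat needs an explicit break
k <- 0
repeat {
  k <- k + 1
  if (k^2 > 50) break
}
k                              # 8

# The for loop above, vectorized
sum(x[x > 0])                  # 12
```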

Scripting: flow control: vectorizing


Instead of:

> rns <- matrix(0, 90, 100)

> sol <- numeric( 90)

> for ( i in 1:90) {

+ rns[i,] <- rnorm(100)

+ sol[i] <- mean( rns[i,])

+ }

> rns

Use:

> rns <- array( rnorm( 90*100), c(90,100))

> sol <- apply( rns, 1, mean)
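For row or column means and sums there are specialised functions that avoid even the apply() call. A sketch with an assumed seed:

```r
set.seed(6)
rns <- array(rnorm(90 * 100), c(90, 100))

sol.apply <- apply(rns, 1, mean)     # one call of mean() per row
sol.fast  <- rowMeans(rns)           # specialised, usually faster

all.equal(sol.apply, sol.fast)       # TRUE
```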

Hands-on tasks 21


1. Convince yourself that if ( cond ) expr and if(cond)expr

are equivalent (note the spaces).

2. Create a script executing a few commands and evaluate the script.

E.g. drawing 1000 random numbers from a gamma distribution,

plotting the histogram and indicating the mean and median with

vertical lines.

3. Introduce a statement that causes an error in the last call. What do

you notice?

Functions


I A function is defined by an assignment of the form

> functionname <- function(arg_1, arg_2, ...) expression

expression is usually a series of R expressions (evaluations) grouped

by { and }.

I The last (evaluated) expression is returned.

I It is recommended to use return() or invisible() explicitly.
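The difference between the two, sketched with two hypothetical functions:

```r
# return(): the value is returned and printed at top level
center <- function(x, scale = FALSE) {
  z <- x - mean(x)
  if (scale) z <- z / sd(x)
  return(z)
}

# invisible(): the value is returned but not printed unless it is assigned or used
report <- function(x) {
  cat("n =", length(x), " mean =", mean(x), "\n")
  invisible(mean(x))
}

center(c(1, 2, 3))        # -1 0 1
report(c(1, 2, 3))        # prints the cat() line only
m <- report(c(1, 2, 3))   # the value is still available
m                         # 2
```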

Functions


Example:

two functions that transform between Cartesian (x, y) and polar coordinates (θ, ρ):

> cart2polar <- function(x) {

+ return( cbind( atan2(x[,2], x[,1]), sqrt( x[,1]^2 + x[,2]^2)))

+ }

> polar2cart <- function(x) {

+ return( cbind( x[,2]*cos(x[,1]), x[,2]*sin(x[,1])) )

+ }

> n <- 1500

> po <- cbind( runif(n, 0, 2*pi), runif( n, 0, 1))

Functions


> par( pty="s")

> plot( polar2cart( po))

[Figure: square scatter plot of the 1500 points polar2cart(po), which fill the unit disc; both axes, labelled polar2cart(po)[,1] and polar2cart(po)[,2], range from −1 to 1.]

Functions


Some input checking might be useful:

> cart2polar <- function(x) {

+ if ((length(dim(x))!=2) || (dim(x)[2]!=2))

+ stop("Need a nx2 matrix/array")

+ return( cbind(atan2(x[,2],x[,1]), sqrt( x[,1]^2+x[,2]^2)))

+ }

> cart2polar(rep(1,2))

Error in cart2polar(rep(1, 2)) : Need a nx2 matrix/array

> cart2polar(cbind(1,2))

[,1] [,2]

[1,] 1.107149 2.236068

Hands-on tasks 22


1. Extend the function cart2polar such that an optional argument

allows scaling of the coordinates.

2. Extend the function polar2cart such that input in degrees is

possible.

Packages


I All R functions and datasets are stored in packages.

I Only when a package is loaded are its contents available.

This is done both for efficiency and to aid package developers,

who are protected from name clashes with other code.

I Packages come along with help files for each function and dataset!

I A few packages are standard and loaded by default:

stats, graphics, grDevices, utils, datasets, methods, base.

I There are more than 5100 packages publicly available on CRAN,

and the number increases daily.

Packages


I To see which packages are installed at your site, issue

> library()

I To see which packages are currently loaded, use

> search()

I To load a package, use

> library( abind)

I To remove a package from the search path (it stays installed), use

> detach( package:abind)

I A basic description of the package is often given by

> help( "package.name")
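A defensive sketch of loading and detaching, assuming nothing about whether abind is installed at your site: requireNamespace() returns FALSE instead of raising an error when a package is missing.

```r
pkg <- "abind"   # package name from the slide; may or may not be installed

if (requireNamespace(pkg, quietly = TRUE)) {
  library(pkg, character.only = TRUE)              # attach: appears in search()
  print(paste0("package:", pkg) %in% search())
  detach(paste0("package:", pkg), character.only = TRUE)
} else {
  message("Package '", pkg, "' is not installed; ",
          "install.packages(\"", pkg, "\") would fetch it from CRAN.")
}
```

The argument character.only = TRUE is needed because the package name is held in a variable rather than typed literally.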

RStudio

Packages: namespaces


Packages have a NAMESPACE

:: accessing public (exported) objects

::: accessing private (non-exported) objects

Works for not-loaded packages as well!

> exists( "diag.spam")

[1] FALSE

> spam::diag.spam( 1)

     [,1]
[1,]    1
Class 'spam'

> spam::.spam.addsparsefull

Error : '.spam.addsparsefull' is not an exported object from 'namespace:spam'

> # The following would work:

> # spam:::.spam.addsparsefull
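The same mechanism can be tried without installing anything, using packages that ship with R. In this sketch, format.perc is assumed to be an internal (non-exported) helper of stats, as it is in current R versions.

```r
# tools is installed with R but not attached by default
"package:tools" %in% search()
tools::file_ext("script.R")                        # "R"

# :: on an exported function
stats::sd(c(1, 2, 3))                              # 1

# Exported names of a namespace
head(sort(getNamespaceExports("stats")))

# An internal helper is not among the exports, but ::: reaches it
"format.perc" %in% getNamespaceExports("stats")    # FALSE
is.function(stats:::format.perc)                   # TRUE
```

Relying on ::: in your own code is fragile, since internal functions may change without notice.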

Packages: writing packages


I Disseminate R code (globally or locally)

I Thorough code and documentation checking

Documentation:

cran.r-project.org/doc/manuals/R-exts.html

cran.r-project.org/doc/contrib/Leisch-CreatingPackages.pdf

Customize the environment


Within RStudio, set preferences (→ Tools → Options)

Customize the environment


Global and local initialization files (Section 10.8 in IR).

I global: file taken from the R_PROFILE environment variable

I local: .Rprofile in any directory

Launching R executes (“sources”)

1. site profile

2. user profile (local or home)

3. .RData

4. .First()

Customize the environment


Example:

> .First <- function() {

+ library( spam)

+ source( "/home/furrer/R/usefulfcn.R")

+ options( width=120)

+ }

Similarly, before closing R, .Last() is executed:

> .Last <- function() {

+ cat( "Thanks for using R - good night or enjoy your coffee\n")

+ }

Customize the environment: ESS


ESS: Emacs Speaks Statistics

Emacs environment for R (and other statistics software)

Writing documents


Sweave() merges LaTeX with R code and R code output

within one document.

Structure of a LaTeX file with embedded R code:

<<tag, eval=TRUE, echo=TRUE, fig=TRUE>>=

plot( x, y, xlab='Diameter', ylab='Height')

@

This chunk prints and evaluates the code and includes the figure.
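A minimal end-to-end sketch: a small source file (the name example.Rnw and its content are made up) is written to a temporary directory and processed with Sweave(). Only the .tex file is produced here; compiling it to PDF requires a LaTeX installation.

```r
# Write a minimal Sweave source file (hypothetical content)
wd  <- tempfile("sweave")
dir.create(wd)
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "The mean is computed below.",
  "<<tag, eval=TRUE, echo=TRUE>>=",
  "x <- c(1, 2, 3)",
  "mean(x)",
  "@",
  "\\end{document}"
), file.path(wd, "example.Rnw"))

# Sweave() writes example.tex into the working directory
old <- setwd(wd)
utils::Sweave("example.Rnw", quiet = TRUE)
tex <- readLines("example.tex")
setwd(old)

# The .tex file contains the code (Sinput) and its output (Soutput)
grep("Sinput|Soutput", tex, value = TRUE)
```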

Documentation:

stat.ethz.ch/R-manual/R-devel/library/utils/doc/Sweave.pdf

This presentation has been prepared with Sweave and the LaTeX package pfuef.