Statistical Programming with R

Lecture 1: Basic Concepts

Bisher M. [email protected]

Department of Mathematics, Faculty of Science,

The Islamic University of Gaza

2019-2020, Semester 1

Simple R Expressions

A user types expressions to the R interpreter. R responds by computing and printing the answers. (In each example, the line after the expression is the answer from the machine.)

> # "*" is the symbol for multiplication.

> # Everything following a # sign is assumed to be a

> # comment and is ignored by R.

> 1 + 2

[1] 3

> 1/2

[1] 0.5

> 17^2

[1] 289

> 1 + 2 * 3

[1] 7

> (1 + 2) * 3

[1] 9
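
The last two expressions show that multiplication is carried out before addition unless parentheses say otherwise. The same kind of precedence applies to the power operator and the minus sign. A minimal sketch (the variable names are illustrative and not part of the lecture):

```r
# Exponentiation is evaluated before the unary minus, so this is -(2^2).
without_parens <- -2^2

# Parentheses force the negation to happen first.
with_parens <- (-2)^2

print(without_parens)  # -4
print(with_parens)     # 4
```

When in doubt about the order of evaluation, adding parentheses costs nothing and makes the intent explicit.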

R Functions

> sqrt(2)

[1] 1.414214

> exp(2)

[1] 7.389056

> sin(1)

[1] 0.841471

> 4 * atan(1)

[1] 3.141593

> abs(3-7) # Absolute value of 3-7

[1] 4

Named storage

R has a workspace that provides a way of naming the values produced by computations. A name/value pair stored by R is called a variable. To assign the value 10 to the variable x, you can enter

> x=10 # or x<-10 (<- read as a single symbol)

From now on x has the value 10 and can be used in subsequent arithmetic expressions.

> x

[1] 10

> x + x

[1] 20

A variable's value can be changed by performing a new assignment to the name.

> x=12 # Names are case-sensitive:

> x + x # X and x do not refer to the same variable.

[1] 24

Rather than working with individual data values, R computations operate on vectors of values.

This reflects the fact that statistical computations generally take place on collections of values rather than individual ones.

The key point about vectors is that they contain values which are all of the same basic type (numbers, complex numbers, character strings, etc.).

For the time being, we'll confine ourselves to discussing numeric vectors (vectors whose elements are numbers).

Vectors Continued...

A numeric vector is a list of numbers. The c() function is used to collect things together (concatenate them) into a vector. We can type

> c(-1, 5, 9)

[1] -1 5 9

Again, we can assign this to a named object:

> X <- c(-1, 5, 9) # now X is a 3-element vector

To see the contents of X, simply type

> X

[1] -1 5 9

If you also type x, you will obtain (Why???)

> x

[1] 12

One very useful way of generating vectors is to use the sequence operator (:). The expression z1:z2 generates the sequence of integers ranging from z1 to z2.

> 1:45

[1] 1 2 3 4 5 6 7 8 9 10 11 12 13

[14] 14 15 16 17 18 19 20 21 22 23 24 25 26

[27] 27 28 29 30 31 32 33 34 35 36 37 38 39

[40] 40 41 42 43 44 45

> 7:-5

[1] 7 6 5 4 3 2 1 0 -1 -2 -3 -4 -5
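
One detail worth knowing when using the sequence operator inside larger expressions is that : is evaluated before the arithmetic operators +, -, * and /. The sketch below illustrates this; the names n, s1 and s2 are illustrative only.

```r
n <- 5

# ':' is applied first, so this is (1:n) - 1.
s1 <- 1:n - 1
print(s1)  # 0 1 2 3 4

# Parentheses are needed to get the sequence from 1 to n - 1.
s2 <- 1:(n - 1)
print(s2)  # 1 2 3 4
```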

Combining Vectors

The function c() can be used to combine both vectors and scalars into larger vectors.

> y = c(1, 2, 3, 4)

> c(y, 10)

[1] 1 2 3 4 10

> c(y, y)

[1] 1 2 3 4 1 2 3 4

In fact, R stores scalar values like 10 as vectors of length one, so that all arguments in the expressions above are vectors.
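
The claim that a scalar is a vector of length one can be checked directly. This is a small sketch using the base functions length() and is.vector(); the names scalar and combined are illustrative.

```r
y <- c(1, 2, 3, 4)
scalar <- 10

# A single number is stored as a vector with one element.
print(length(scalar))     # 1
print(is.vector(scalar))  # TRUE

# It can even be subscripted like any other vector.
print(scalar[1])          # 10

# Combining a 4-element vector with a 1-element vector gives 5 elements.
combined <- c(y, scalar)
print(length(combined))   # 5
```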

Vector Arithmetic

Because 'everything is a vector', R handles vector arithmetic quite easily and intuitively.

> (w <- 1:5) # create and print a vector of consecutive integers

[1] 1 2 3 4 5

> w+2 # scalar addition

[1] 3 4 5 6 7

> 2*w #scalar multiplication

[1] 2 4 6 8 10

> w^2 #raise each component to the second power

[1] 1 4 9 16 25

> 2^w #raise 2 to the first through fifth power

[1] 2 4 8 16 32

> w # w itself has not been changed

[1] 1 2 3 4 5

> w <- w * 2

> w # it is now changed

[1] 2 4 6 8 10

More examples of vector arithmetic:

> z = c(1, 3, 2, 10, 5); w = 1:5 # use a semicolon to separate statements

> z+w

[1] 2 5 5 14 10

> z*w

[1] 1 6 6 40 25

> z/w

[1] 1.0000000 1.5000000 0.6666667 2.5000000 1.0000000

> z^w

[1] 1 9 8 10000 3125

> sum(z) #sum of elements in z

[1] 21

> cumsum(z) #cumulative sum vector

[1] 1 4 6 16 21

> max(z) #maximum

[1] 10

> min(z) # minimum

[1] 1

The Recycling Rule

When the vectors have different lengths, the shorter one is extended by recycling: its values are repeated, starting at the beginning. For example, consider what happens when vectors of different lengths are added.

> c(1, 2, 3, 4) + c(1, 2)

[1] 2 4 4 6

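This result is explained by the recycling rule, which R uses to define the meaning of this kind of calculation. The shorter vector c(1, 2) is repeated until it is as long as the longer one, giving c(1, 2, 1, 2), and the two vectors are then added element by element: 1+1, 2+2, 3+1, 4+2.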
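
The recycling step can be made explicit with rep(), which repeats a vector up to a requested length. This is only a sketch of what R does implicitly; the names long, short, extended, recycled and partial are illustrative.

```r
long  <- c(1, 2, 3, 4)
short <- c(1, 2)

# Repeat 'short' from its beginning until it is as long as 'long'.
extended <- rep(short, length.out = length(long))
print(extended)  # 1 2 1 2

# Implicit recycling gives the same answer as the explicit version.
recycled <- long + short
print(recycled)                              # 2 4 4 6
print(identical(recycled, long + extended))  # TRUE

# If the longer length is not a multiple of the shorter one, R still
# recycles but issues a warning (suppressed here to keep the output clean).
partial <- suppressWarnings(c(1, 2, 3) + c(1, 2))
print(partial)  # 2 4 4
```

The warning in the last case is a useful signal: unequal lengths that do not divide evenly are often a mistake rather than intended recycling.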










Binary Operations

The following binary operations all obey the recycling rule.

+ addition

- subtraction

* multiplication

/ division

^ raising to a power

%% remainder after division (modulo)

%/% integer division

The Integer Division, Remainder and Modulo Operators

The value of the integer division z1 %/% z2 is computed by dividing z1 by z2 and then rounding down to the nearest integer.

> 11 %/% 3

[1] 3

The result of the remainder expression z1 %% z2 is defined as z1 − z2 × (z1 %/% z2).

> 11 %% 3

[1] 2 # 11 - 3 * (11 %/% 3) = 11 - 3 * 3 = 2

floor() and ceiling() round to the nearest integer not greater than, and not less than, their argument, respectively.

> floor(11/3)

[1] 3

> 11 %/% 3 == floor(11/3)

[1] TRUE

> ceiling(11/3)

[1] 4



The modulo operator is useful in integer computations,

> 11 %% 3

[1] 2

> 1:10 %% 2

[1] 1 0 1 0 1 0 1 0 1 0

but it can also be used with more general numbers

> 13.5 %% 2

[1] 1.5

> 13.5 %/% 2

[1] 6
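
The definition z1 %% z2 = z1 − z2 × (z1 %/% z2) also covers negative and non-integer operands, because %/% always rounds down. The following sketch checks the identity on a few values chosen for illustration (the vectors z1 and z2 and the names q and r are not from the lecture).

```r
z1 <- c(11, -11, 13.5)
z2 <- c(3, 3, 2)

q <- z1 %/% z2  # integer division, rounds down
r <- z1 %% z2   # remainder

print(q)  # 3 -4 6
print(r)  # 2 1 1.5

# The quotient and remainder always rebuild the original value ...
print(all(z1 == z2 * q + r))       # TRUE
# ... and the quotient agrees with floor() of ordinary division.
print(all(q == floor(z1 / z2)))    # TRUE
```

Note that -11 %/% 3 is -4 rather than -3, since rounding down moves away from zero for negative quotients; the remainder is then 1, not -2.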

Extracting elements from vectors

Individual elements can be extracted from vectors by specifying their index. The third element can be extracted from v = 3:8 as follows.

> v[3]

[1] 5

Square brackets ([ ]) are used for subscripting, and can be applied to any subscriptable value. It is also possible to extract subvectors by specifying vectors of indices.

> v[c(2, 4)]

[1] 4 6

The sequence operator provides a useful way of extracting consecutive elements from a vector.

> v[2:4]

[1] 4 5 6
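
A few further subscripting cases are sketched below, assuming the same vector v = 3:8. The names last, repeated and beyond are illustrative.

```r
v <- 3:8

# length() gives the index of the last element.
last <- v[length(v)]
print(last)      # 8

# An index may appear more than once; the element is then repeated.
repeated <- v[c(1, 1, 6)]
print(repeated)  # 3 3 8

# Asking for a position beyond the end gives NA rather than an error.
beyond <- v[10]
print(beyond)    # NA
```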

Negative subscripts

Negative indices can be used to exclude certain elements. For example, we can select all but the second element of v as follows:

> v[-2]

[1] 3 5 6 7 8

The third through fifth elements of v can be excluded as follows:

> v[-(3:5)]

[1] 3 4 8

Do not mix positive and negative subscripts. To see what happens, consider

> v[c(-2,4)]

Error in v[c(-2, 4)] : only 0's may be mixed with negative subscripts

The problem is that it is not clear what is to be extracted: do we want the fourth element of v before or after removing the second one?
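
The ambiguity can be avoided by subscripting in two explicit steps, so that the order of removal and selection is stated by the code itself. A minimal sketch, assuming v = 3:8 as before (the names without_second, after_removal and before_removal are illustrative):

```r
v <- 3:8

# Step 1: drop the second element.
without_second <- v[-2]
print(without_second)  # 3 5 6 7 8

# Step 2: take the fourth element of what is left.
after_removal <- without_second[4]
print(after_removal)   # 7

# Compare with the fourth element of the original vector.
before_removal <- v[4]
print(before_removal)  # 6
```

The two readings give different answers (7 and 6), which is why R refuses to guess.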

Changing Vector Subsets

As well as extracting the values at particular positions in a vector, it is possible to reset their values. This is done by putting the subset to be modified on the left-hand side of the assignment with the replacement value(s) on the right.

> y = 1:10

> y[4:6] = 0

> y

[1] 1 2 3 0 0 0 7 8 9 10
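
In the example above a single replacement value is recycled across the three selected positions. The sketch below shows two other possibilities: supplying one replacement value per position, and using a negative subscript on the left-hand side. The replacement values are chosen for illustration.

```r
y <- 1:10

# One replacement value for each selected position.
y[4:6] <- c(40, 50, 60)

# A negative subscript selects everything except positions 1 to 8,
# so this resets positions 9 and 10.
y[-(1:8)] <- 0

print(y)  # 1 2 3 40 50 60 7 8 0 0
```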

Special Numerical Values: Infinity

When 1 is divided by 0, mathematics defines the result to be infinite. This kind of special result is also produced by R.

> 1 / 0

[1] Inf

Here, Inf represents positive infinity. There is also a negative infinity.

> -1 / 0

[1] -Inf

Properties of Infinity

Infinities have all the properties you would expect. For example:

> 1 + Inf

[1] Inf


> 1000 / Inf

[1] 0
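
Infinite values can also be tested for and compared. The following sketch uses the base functions is.finite() and is.infinite(); the names pos and neg are illustrative.

```r
pos <- 1 / 0
neg <- -1 / 0

print(is.infinite(pos))  # TRUE
print(is.finite(pos))    # FALSE

# Inf is larger than any finite number, and -Inf is smaller.
print(pos > 1e308)       # TRUE
print(neg < pos)         # TRUE

# Arithmetic with infinities behaves as expected.
print(pos + pos)         # Inf
print(-pos == neg)       # TRUE
```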

Special Numerical Values: Not a Number

R also has a special value, called NaN, which indicates that a numerical result is undefined. For example, dividing 0 by 0 gives NaN.

> 0 / 0

[1] NaN

Subtracting infinity from infinity also gives NaN.

> Inf - Inf

[1] NaN

Some mathematical functions will also produce NaN results.

> sqrt(-1)

[1] NaN

Warning message:

In sqrt(-1) : NaNs produced
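
Because NaN does not compare equal to anything, including itself, the reliable way to detect it is the base function is.nan(). A short sketch (the names undefined and root are illustrative):

```r
undefined <- 0 / 0

print(is.nan(undefined))          # TRUE

# is.nan() is vectorized: only the middle element is NaN.
print(is.nan(c(1, 0 / 0, Inf)))   # FALSE TRUE FALSE

# Comparing NaN with == does not give TRUE; the result is NA.
print(undefined == undefined)     # NA

# sqrt(-1) also yields NaN; the warning is suppressed here.
root <- suppressWarnings(sqrt(-1))
print(is.nan(root))               # TRUE
```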

Special Numerical Values: Not Available

R has a particular value which is used to indicate that a value is missing or not available. The value is indicated by NA. Any arithmetic expression which contains NA will produce NA as a result.

> 1 + sin(NA)

[1] NA

The value NA is usually used for statistical observations where the value could not be recorded, for example, when a survey researcher visits a house and no one is home.
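
Missing values are detected with the base function is.na(), not with ==, since a comparison with NA is itself NA. The sketch below uses a hypothetical vector of survey counts in which the second household could not be reached.

```r
# Hypothetical data: NA marks the house where no one was home.
visits <- c(4, NA, 6)

missing <- is.na(visits)
print(missing)         # FALSE TRUE FALSE

# TRUE counts as 1 in sum(), so this counts the missing observations.
n_missing <- sum(missing)
print(n_missing)       # 1

# Arithmetic propagates NA only at the missing position.
shifted <- visits + 1
print(shifted)         # 5 NA 7

# Comparing with NA never gives TRUE or FALSE.
compared <- visits == NA
print(compared)        # NA NA NA
```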

Some Functions for Vectors

• unique(): returns a vector containing one element for each unique value in the vector.

• duplicated(): returns a logical vector which tells if elements of a vector are duplicated with regard to previous ones.

• rev(): reverses the order of elements in a vector.

• sort(): sorts the elements in a vector.

• append(): appends or inserts elements in a vector.

• sum(): returns the sum of the elements of a vector.

• min(): returns the minimum value in a vector.

• max(): returns the maximum value in a vector.

• range(): returns a vector containing the minimum and maximum values in a vector.

• prod(): returns the product of all the values present in a vector.
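
Some of these functions take optional arguments that change their behaviour. The sketch below shows the after argument of append() and the decreasing argument of sort(), and uses the logical vector from duplicated() as a subscript. The data values and variable names are illustrative.

```r
x <- 1:4
z <- 6:3
m <- append(x, z)  # 1 2 3 4 6 5 4 3

# 'after' controls where the new values are inserted.
inserted <- append(x, 99, after = 2)
print(inserted)    # 1 2 99 3 4

# 'decreasing = TRUE' sorts from largest to smallest.
descending <- sort(c(5, -3, 4, 8, 2), decreasing = TRUE)
print(descending)  # 8 5 4 2 -3

# A logical vector can be used as a subscript: keep the repeated values.
repeats <- m[duplicated(m)]
print(repeats)     # 4 3

# Dropping the duplicates gives the same result as unique().
print(identical(m[!duplicated(m)], unique(m)))  # TRUE
```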

> x=1:4; y=c(5,-3,4,8,2); z=6:3

> rev(x)

[1] 4 3 2 1

> sort(y)

[1] -3 2 4 5 8

> m=append(x,z)

> m

[1] 1 2 3 4 6 5 4 3

> unique(m)

[1] 1 2 3 4 6 5

> duplicated(m)

[1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE


Summary Functions: min, max and range

The functions min and max return the minimum and maximum values contained in any of their arguments, and the function range returns a vector of length 2 containing the minimum and maximum of the values in the arguments.

> max(1:100)

[1] 100

> max(1:100, Inf)

[1] Inf

> range(1:100)

[1] 1 100

Summary Functions: sum and prod

The functions sum and prod compute the sum and product of all the elements in their arguments.

> sum(1:100)

[1] 5050

> prod(1:10)

[1] 3628800

Summary Functions and NA

In any of these summary functions, the presence of NA or NaN values in any of the arguments will produce a result which is NA or NaN.

> min(NA, 100)

[1] NA

NA and NaN values can be disregarded by specifying the additional argument na.rm=TRUE.

> min(10, 20, NA, na.rm = TRUE)

[1] 10
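
The same argument works for the other summary functions. A brief sketch with made-up observations (the vector obs and the result names are illustrative):

```r
obs <- c(10, 20, NA, 5)

total <- sum(obs)
print(total)      # NA

total_rm <- sum(obs, na.rm = TRUE)
print(total_rm)   # 35

span <- range(obs, na.rm = TRUE)
print(span)       # 5 20

# NaN values are treated the same way.
largest <- max(c(1, NaN, 3))
print(largest)    # NaN

largest_rm <- max(c(1, NaN, 3), na.rm = TRUE)
print(largest_rm) # 3
```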

Cumulative Summaries

There are also cumulative variants of the summary functions.

> cumsum(1:10)

[1] 1 3 6 10 15 21 28 36 45 55

> cumprod(1:10)

[1] 1 2 6 24 120

[6] 720 5040 40320 362880 3628800

> cummax(1:10)

[1] 1 2 3 4 5 6 7 8 9 10

> cummax(10:1)

[1] 10 10 10 10 10 10 10 10 10 10

> cummin(1:10)

[1] 1 1 1 1 1 1 1 1 1 1

> cummin(10:1)

[1] 10 9 8 7 6 5 4 3 2 1

These cumulative summary functions do not have an na.rm argument.
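
As a consequence, a missing value makes every later entry of a cumulative summary missing as well. One possible workaround, sketched below, is to replace the missing entries before accumulating. Treating a missing value as 0 is an assumption that only makes sense for some data, and the variable names are illustrative.

```r
v <- c(1, 2, NA, 4)

# Everything from the NA onwards becomes NA.
running <- cumsum(v)
print(running)         # 1 3 NA NA

# Assumption: a missing value contributes nothing to the running total.
filled <- v
filled[is.na(filled)] <- 0

running_filled <- cumsum(filled)
print(running_filled)  # 1 3 3 7
```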

Non-vectorized functions

Although most functions in R are vectorized, returning objects which are the same size and shape as their input, some will always return a single logical value.

any() tests if any of the elements of its arguments meet a particular condition; all() tests if they all do.

> x = c(7,3,12,NA,13,8)

> any(

[1] TRUE

> all(x > 0)

[1] NA

> all(x > 0, na.rm=TRUE)

[1] TRUE
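
The any() call in the example above is cut off in the source. One call that is consistent with the printed result for this x is any(is.na(x)), but that is an assumption rather than a recovery of the original text. The sketch below uses it, together with a few other conditions chosen to show how NA interacts with any() and all().

```r
x <- c(7, 3, 12, NA, 13, 8)

# Assumed completion of the truncated call: is any element missing?
has_missing <- any(is.na(x))
print(has_missing)  # TRUE

# One TRUE is enough for any(), so the NA does not matter here.
any_big <- any(x > 12)
print(any_big)      # TRUE

# With no TRUE found, the NA leaves the answer undetermined.
any_huge <- any(x > 20)
print(any_huge)     # NA

# One FALSE is enough for all() to return FALSE despite the NA.
all_gt5 <- all(x > 5)
print(all_gt5)      # FALSE

all_pos <- all(x > 0)
print(all_pos)      # NA

all_pos_rm <- all(x > 0, na.rm = TRUE)
print(all_pos_rm)   # TRUE
```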

Non-vectorized functions Cont.

identical() tests if two objects are exactly the same.
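
"Exactly the same" is a strict requirement: values that print alike, or that are equal up to rounding error, are not necessarily identical. The sketch below contrasts identical() with the base function all.equal(), which compares numbers up to a small tolerance. The variable names are illustrative.

```r
# Floating-point rounding: sqrt(2)^2 is very close to 2 but not exactly 2.
squared <- sqrt(2)^2
exact_match <- identical(squared, 2)
print(exact_match)  # FALSE

# all.equal() allows for a small numerical tolerance.
near_match <- isTRUE(all.equal(squared, 2))
print(near_match)   # TRUE

# The storage type matters too: 1:3 holds integers, c(1, 2, 3) holds doubles.
same_type <- identical(1:3, c(1, 2, 3))
print(same_type)    # FALSE
```

Wrapping all.equal() in isTRUE() is the usual idiom, because all.equal() returns a description of the difference, not FALSE, when the values differ.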

> set.seed(5)

> x <- rnorm(3)

> x

[1] -0.8408555 1.3843593 -1.2554919

> set.seed(5)

> y <- rnorm(3)

> y

[1] -0.8408555 1.3843593 -1.2554919

> identical(x, y)

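[1] TRUE

> z <- c(-0.8408555, 1.3843593, -1.2554919)

> identical(x, z)

[1] FALSE

The result is FALSE because z holds only the digits that were printed, while x stores the values at full precision.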


End of Lecture 1. Thank you!