Training on R for Students Higher Education Quality Enhancement Project (HEQEP) Software Training...

Training on R for Students Higher Education Quality Enhancement Project (HEQEP) Software Training Program Organized by Department of Statistics Rajshahi University, Rajshahi-6205, Bangladesh. (An Open Source Package) Date: March 22-23, 2013 Lecture 3


Training on R for Students

Higher Education Quality Enhancement Project (HEQEP)

Software Training Program

Organized by

Department of Statistics

Rajshahi University, Rajshahi-6205, Bangladesh.

(An Open Source Package)

Date: March 22-23, 2013

Lecture 3


Programming with R


Matrices

X<-matrix(rpois(20,1.5),nrow=4)   # rpois() draws a random sample from a Poisson distribution
X
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    2    5    3
[2,]    1    1    3    1    3
[3,]    3    1    0    2    2
[4,]    1    0    2    1    0

Suppose that the rows refer to four different trials and we want to label the rows ‘Trial.1’ etc. We employ the function rownames to do this. We could use the paste function but here we take advantage of the prefix option:

rownames(X)<-rownames(X,do.NULL=FALSE, prefix="Trial.")
X
        [,1] [,2] [,3] [,4] [,5]
Trial.1    1    0    2    5    3
Trial.2    1    1    3    1    3
Trial.3    3    1    0    2    2
Trial.4    1    0    2    1    0
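The paste alternative mentioned for the row labels can be sketched as follows. This is only an illustration: the seed is an arbitrary assumption added so that the random matrix is reproducible, and it will not reproduce the numbers shown on the slide.

```r
# Sketch: label matrix rows with paste() instead of the prefix option.
set.seed(42)                              # assumption: arbitrary seed, for reproducibility only
X <- matrix(rpois(20, 1.5), nrow = 4)     # 4 x 5 matrix of Poisson counts
rownames(X) <- paste("Trial.", 1:nrow(X), sep = "")
rownames(X)                               # "Trial.1" "Trial.2" "Trial.3" "Trial.4"
```

The prefix option is shorter, while paste gives full control over the label text.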


Matrices

For the columns we want to supply a vector of different names for the five drugs involved in the trial, and use this to specify the colnames(X):

drug.names<-c("aspirin", "paracetamol", "nurofen", "hedex", "placebo")
colnames(X)<-drug.names
X
        aspirin paracetamol nurofen hedex placebo
Trial.1       1           0       2     5       3
Trial.2       1           1       3     1       3
Trial.3       3           1       0     2       2
Trial.4       1           0       2     1       0

Alternatively, you can use the dimnames function to give names to the rows and/or columns of a matrix. In this example we want the rows to be unlabelled (NULL) and the column names to be of the form ‘drug.1’, ‘drug.2’, etc.

dimnames(X)<-list(NULL, paste("drug.",1:5,sep=""))
X
     drug.1 drug.2 drug.3 drug.4 drug.5
[1,]      1      0      2      5      3
[2,]      1      1      3      1      3
[3,]      3      1      0      2      2
[4,]      1      0      2      1      0


Making data frames (1)

We illustrate how to construct a data frame from the following car data.

Make        Model          Cylinder  Weight  Mileage  Type
Honda       Civic          V4        2170    33       Sporty
Chevrolet   Beretta        V4        2655    26       Compact
Ford        Escort         V4        2345    33       Small
Eagle       Summit         V4        2560    33       Small
Volkswagen  Jetta          V4        2330    26       Small
Buick       Le Sabre       V6        3325    23       Large
Mitsubishi  Galant         V4        2745    25       Compact
Dodge       Grand Caravan  V6        3735    18       Van
Chrysler    New Yorker     V6        3450    22       Medium
Acura       Legend         V6        3265    20       Medium


Making data frames (2)

Make <- c("Honda","Chevrolet","Ford","Eagle","Volkswagen","Buick","Mitsubishi",
          "Dodge","Chrysler","Acura")

Model <- c("Civic","Beretta","Escort","Summit","Jetta","Le Sabre","Galant",
           "Grand Caravan","New Yorker","Legend")

Cylinder <- c(rep("V4",5),"V6","V4",rep("V6",3))

Weight <- c(2170, 2655, 2345, 2560, 2330, 3325, 2745, 3735, 3450, 3265)

Mileage <- c(33, 26, 33, 33, 26, 23, 25, 18, 22, 20)

Type <- c("Sporty","Compact",rep("Small",3),"Large","Compact","Van",
          rep("Medium",2))

# rep("V4",5) instructs R to repeat "V4" five times.


Making data frames (3)

Now the data.frame() function combines the six vectors into a single data frame.

Car <- data.frame(Make,Model,Cylinder,Weight,Mileage,Type)
Car
         Make         Model Cylinder Weight Mileage    Type
1       Honda         Civic       V4   2170      33  Sporty
2   Chevrolet       Beretta       V4   2655      26 Compact
3        Ford        Escort       V4   2345      33   Small
4       Eagle        Summit       V4   2560      33   Small
5  Volkswagen         Jetta       V4   2330      26   Small
6       Buick      Le Sabre       V6   3325      23   Large
7  Mitsubishi        Galant       V4   2745      25 Compact
8       Dodge Grand Caravan       V6   3735      18     Van
9    Chrysler    New Yorker       V6   3450      22  Medium
10      Acura        Legend       V6   3265      20  Medium


A few operations on the data frame Car (1)

names(Car)
[1] "Make"     "Model"    "Cylinder" "Weight"   "Mileage"  "Type"

Car[1,]
   Make Model Cylinder Weight Mileage   Type
1 Honda Civic       V4   2170      33 Sporty

Car[10,4]
[1] 3265

Car$Mileage
 [1] 33 26 33 33 26 23 25 18 22 20

mean(Car$Mileage)   # average mileage of the 10 vehicles
[1] 25.9

min(Car$Weight)     # minimum of car weights
[1] 2170


A few operations on the data frame Car (2)

table(Car$Type)   # gives a frequency table

Compact   Large  Medium   Small  Sporty     Van
      2       1       2       3       1       1

table(Car$Make, Car$Type)   # cross tabulation

             Compact Large Medium Small Sporty Van
  Acura            0     0      1     0      0   0
  Buick            0     1      0     0      0   0
  Chevrolet        1     0      0     0      0   0
  Chrysler         0     0      1     0      0   0
  Dodge            0     0      0     0      0   1
  Eagle            0     0      0     1      0   0
  Ford             0     0      0     1      0   0
  Honda            0     0      0     0      1   0
  Mitsubishi       1     0      0     0      0   0
  Volkswagen       0     0      0     1      0   0


Making data frames (6)

Make.Small <- Car$Make[Car$Type == "Small"]

summary(Car$Mileage)   # gives summary statistics
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  18.00   22.25   25.50   25.90   31.25   33.00
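The Make.Small line relies on logical subsetting, which may be easier to follow in a self-contained sketch. Only three of the six columns are rebuilt here; the names is.small and Small.Cars are illustrative and do not come from the slides.

```r
# Sketch: logical subsetting on part of the car data.
Make <- c("Honda","Chevrolet","Ford","Eagle","Volkswagen","Buick","Mitsubishi",
          "Dodge","Chrysler","Acura")
Type <- c("Sporty","Compact",rep("Small",3),"Large","Compact","Van",rep("Medium",2))
Mileage <- c(33, 26, 33, 33, 26, 23, 25, 18, 22, 20)
Car <- data.frame(Make, Type, Mileage, stringsAsFactors = FALSE)

is.small   <- Car$Type == "Small"     # logical vector, TRUE for rows 3, 4 and 5
Make.Small <- Car$Make[is.small]      # makes of the small cars
Small.Cars <- Car[is.small, ]         # whole rows: note the comma after the condition
small.mean <- mean(Small.Cars$Mileage)  # mean mileage of the small cars only
```

The same logical vector can be reused to pick a single column or complete rows, so the condition needs to be written only once.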


Rank, Sorting and Order

Price<- scan()
1: 325 201 157 162 164 101 211 188 95 117 188 121
13:
Read 12 items

ranks<-rank(Price)
sorted<-sort(Price)
ordered<-order(Price)   # position
view<-data.frame(Price, ranks, sorted, ordered)
view
   Price ranks sorted ordered
1    325  12.0     95       9
2    201  10.0    101       6
3    157   5.0    117      10
4    162   6.0    121      12
5    164   7.0    157       3
6    101   2.0    162       4
7    211  11.0    164       5
8    188   8.5    188       8
9     95   1.0    188      11
10   117   3.0    201       2
11   188   8.5    211       7
12   121   4.0    325       1
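The relationship between the three functions can be checked with a short sketch. The prices are typed in with c() rather than scan() so that the block runs without keyboard input.

```r
# Sketch: how rank(), sort() and order() relate to each other.
Price <- c(325, 201, 157, 162, 164, 101, 211, 188, 95, 117, 188, 121)

ordered   <- order(Price)        # positions of the values, smallest first
by.order  <- Price[ordered]      # indexing by order() gives the sorted vector
ranks     <- rank(Price)         # tied values (the two 188s) share the average rank
highfirst <- Price[order(Price, decreasing = TRUE)]   # largest first
```

Because order() returns positions, it can also be used to sort the rows of a data frame by one of its columns, which sort() alone cannot do.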


Evaluating Functions with apply, sapply and lapply

apply( arr, margin, fct )

Applies the function fct along some dimensions of the array arr, according to margin (1 for rows, 2 for columns), and returns a vector or array of the appropriate size.

The apply function is used for applying functions to the rows or columns of matrices or data frames.


For example: apply

(X<-matrix(1:24,nrow=4))
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    5    9   13   17   21
[2,]    2    6   10   14   18   22
[3,]    3    7   11   15   19   23
[4,]    4    8   12   16   20   24

apply(X,1,sum)   # to obtain the row totals (four of them)
[1] 66 72 78 84

apply(X,2,sum)   # to obtain the column totals (six of them)
[1] 10 26 42 58 74 90

apply(X,1,sqrt)
apply(X,2,sqrt)

apply(X,1,sample)
apply(X,1,function(x) x^2+x)
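One detail of the sqrt calls is easy to miss: when the function returns a vector for each row, apply stores each result as a column, so the row-wise result comes back transposed. A brief sketch (the names by.row and by.col are illustrative):

```r
# Sketch: shape of the result when the applied function returns a vector.
X <- matrix(1:24, nrow = 4)

by.row <- apply(X, 1, sqrt)   # each row's result becomes a column: 6 x 4
by.col <- apply(X, 2, sqrt)   # each column's result stays a column: 4 x 6
dim(by.row)
dim(by.col)

row.totals <- apply(X, 1, sum)   # same totals as the built-in rowSums(X)
```

For simple totals and means, the built-in rowSums, colSums, rowMeans and colMeans give the same answers as apply with sum or mean.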


Vector and sapply

If you want to apply a function to a vector then use sapply (rather than apply for matrices or margins of matrices). Here is the code to generate a list of sequences from 1:3 up to 1:7:

sapply(3:7, seq)
[[1]]
[1] 1 2 3

[[2]]
[1] 1 2 3 4

[[3]]
[1] 1 2 3 4 5

[[4]]
[1] 1 2 3 4 5 6

[[5]]
[1] 1 2 3 4 5 6 7

The function sapply is most useful with complex iterative calculations.


Example: sapply

a<-seq(0.01,0.2,.005)

Now we can use sapply to apply the sum of squares function to each of these values of a (without writing a loop), and plot the resulting sum of squares against the parameter value a:

sumsq<- function(x) {sum(x^2)}   # function that produces the sum of squares
plot(a, sapply(a, sumsq), type="l")


Lists and lapply

lapply( li, fct )   # to each element of the list li, the function fct is applied.

a<-c("a","b","c","d")
b<-c(1,2,3,4,4,3,2,1)
c<-c(T,T,F)
list.object<-list(a,b,c)   # create a list object
class(list.object)         # to see the class type
[1] "list"
list.object                # to see the contents of the list we just type its name:
[[1]]
[1] "a" "b" "c" "d"

[[2]]
[1] 1 2 3 4 4 3 2 1

[[3]]
[1]  TRUE  TRUE FALSE

The function lapply applies a specified function to each of the elements of a list in turn (without the need for specifying a loop, and not requiring us to know how many elements there are in the list).


Lists and lapply

# To know the length of each of the vectors making up the list:
lapply(list.object, length)

# To find out the class, we apply the function class to the list:
lapply(list.object, class)

# To find the mean:
lapply(list.object, mean)
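To make these three calls concrete, the sketch below rebuilds the same list and collects the results. Two choices here are assumptions of the sketch rather than part of the slides: sapply is used only to simplify the list output into a plain vector, and the mean is taken over the numeric and logical elements only, because mean() on a character vector returns NA with a warning.

```r
# Sketch: lapply() on the three-element list, with sapply() to simplify the output.
list.object <- list(c("a","b","c","d"), c(1,2,3,4,4,3,2,1), c(TRUE,TRUE,FALSE))

lengths.list <- lapply(list.object, length)     # a list holding 4, 8 and 3
lengths.vec  <- sapply(list.object, length)     # the same values as a plain vector
classes      <- sapply(list.object, class)      # one class name per element
means        <- lapply(list.object[2:3], mean)  # 2.5 and 2/3 (TRUE counts as 1)
```

lapply always returns a list, while sapply returns a vector whenever every result has length one.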


Working with Vectors and Logical Subscripts

Take the example of a vector containing the 11 numbers 0 to 10:

x<-0:10

There are two quite different kinds of things we might want to do with this. We might want to add up the values of the elements:

sum(x)   # adds up the values of the xs
[1] 55

Alternatively, we might want to count the elements that passed some logical criterion. Suppose we wanted to know how many of the values were less than 5:

sum(x<5)   # counts up the number of cases that pass the logical
           # condition ‘x is less than 5’
[1] 5


When we counted the number of cases, the counting was applied to the entire vector, using sum(x<5). To find the sum of the values of x that are less than 5:

sum(x[x<5])
[1] 10

The logical condition x<5 is either true or false:

x<5
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
[10] FALSE FALSE

Imagine false as being numeric 0 and true as being numeric 1. Then the vector of subscripts [x<5] is five 1s followed by six 0s:

1*(x<5)
 [1] 1 1 1 1 1 0 0 0 0 0 0

Now imagine multiplying the values of x by the values of the logical vector:

x*(x<5)
 [1] 0 1 2 3 4 0 0 0 0 0 0

When the function sum is applied, it gives us the answer we want: the sum of the values of the numbers 0+1+2+3+4=10.

sum(x*(x<5))
[1] 10

This produces the same answer as sum(x[x<5]).
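The counting and summing idioms can be put side by side in one sketch. The last line, which uses mean() to get a proportion, is an addition that follows from the same TRUE = 1, FALSE = 0 coercion; the variable names are illustrative.

```r
# Sketch: counting versus summing with a logical condition.
x <- 0:10

n.below     <- sum(x < 5)          # how many values are below 5
total.below <- sum(x[x < 5])       # sum of the values that are below 5
same.total  <- sum(x * (x < 5))    # the same sum, via multiplication by 0/1
prop.below  <- mean(x < 5)         # proportion of values below 5
```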


Addresses within Vectors

There are two important functions for finding addresses within arrays. The function which is very easy to understand.

y<-c(8,3,5,7,6,6,8,9,2,3,9,4,10,4,11)
y
 [1]  8  3  5  7  6  6  8  9  2  3  9  4 10  4 11

Suppose we wanted to know which elements of y contained values bigger than 5. We type

which(y>5)
[1]  1  4  5  6  7  8 11 13 15

Notice that the answer to this enquiry is a set of subscripts. We don’t use subscripts inside the which function itself. The function is applied to the whole array. To see the values of y that are larger than 5, we just type

y[y>5]
[1]  8  7  6  6  8  9  9 10 11

Note that this is a shorter vector than y itself, because values of 5 or less have been left out:

length(y)
[1] 15
length(y[y>5])
[1] 9


Finding Closest Values

Finding the value in a vector that is closest to a specified value is straightforward using which. Here, we want to find the value of y that is closest to 8.9:

which(abs(y-8.9)==min(abs(y-8.9)))
[1]  8 11

The closest values to 8.9 are in locations 8 and 11. But just how close to 8.9 are the 8th and 11th values? We use 8 and 11 as subscripts on y to find this out:

y[c(8,11)]
[1] 9 9

Thus, we can write a function to return the closest value to a specified value sv

closest<-function(xv, sv){
  xv[which(abs(xv-sv)==min(abs(xv-sv)))]
}

and run it like this:

closest(y,8.9)
[1] 9 9
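If only one closest value is wanted, a possible variant uses the base function which.min, which returns the position of the first minimum. The function name closest.first is hypothetical and is not part of the slides.

```r
# Sketch: return only the first closest value, using which.min().
y <- c(8, 3, 5, 7, 6, 6, 8, 9, 2, 3, 9, 4, 10, 4, 11)

closest.first <- function(xv, sv) {
  xv[which.min(abs(xv - sv))]     # which.min() gives the first position of the minimum
}

first.value    <- closest.first(y, 8.9)
first.position <- which.min(abs(y - 8.9))
```

The version with which and == keeps all ties, so the choice depends on whether ties matter for the problem at hand.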


The sample Function

y<-c(8,3,5,7,6,6,8,9,2,3,9,4,10,4,11)
y
 [1]  8  3  5  7  6  6  8  9  2  3  9  4 10  4 11

and here are two samples of y:

sample(y)
 [1]  8  8  9  9  2 10  6  7  3 11  5  4  6  3  4
sample(y)
 [1]  9  3  9  8  8  6  5 11  4  6  4  7  3  2 10

The order of the values is different each time that sample is invoked, but the same numbers are shuffled in every case. This is called sampling without replacement.

You can specify the size of the sample as a second argument:

sample(y, 5)
[1]  9  4 10  8 11
sample(y, 5)
[1] 9 3 4 2 8

The option replace=T allows for sampling with replacement:

sample(y, replace=T)
 [1]  9  6 11  2  9  4  6  8  8  4  4  4  3  9  3
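Because sample gives a different result on every call, it can help to fix the random seed when a result has to be reproduced. The sketch below assumes an arbitrary seed value; the samples it draws will differ from those printed on the slide.

```r
# Sketch: reproducible sampling with set.seed().
y <- c(8, 3, 5, 7, 6, 6, 8, 9, 2, 3, 9, 4, 10, 4, 11)

set.seed(2013)                           # assumption: arbitrary seed
shuffled  <- sample(y)                   # a permutation of y (without replacement)
five      <- sample(y, 5)                # five values drawn without replacement
with.repl <- sample(y, replace = TRUE)   # same length as y, values may repeat
```

Running the block again from set.seed() onwards reproduces exactly the same three samples.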


Some general programming guidelines

Writing a computer program to solve a problem can usually be reduced to following this sequence of steps:

1 Understand the problem.
2 Work out a general idea how to solve it.
3 Translate your general idea into a detailed implementation.
4 Check: Does it work? Is it good enough?
  If yes, you are done!
  If no, go back to step 2.


Flow control

1. The for() statement
2. The if() statement
3. The while() loop
4. The repeat loop, and the break and next statements


Flow control

The for statement (1)

The for() statement allows one to specify that a certain operation should be repeated a fixed number of times.

Syntax

for (variable in sequence) expression

Or,

for (variable in sequence) {
  expression
  expression
  expression
}


Example: The for statement

sum.x <- 0
for (i in 1:5) {
  sum.x <- sum.x + i
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

sum.x
[1] 15

print()   # prints a single R object
cat()     # prints multiple objects, one after the other
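The difference between print() and cat() shows up when several values are reported on one line inside a loop. A small sketch, with the variable name total chosen for illustration:

```r
# Sketch: cat() joins several objects on one line; "\n" ends the line.
total <- 0
for (i in 1:5) {
  total <- total + i
  cat("i =", i, " running total =", total, "\n")
}
```

Unlike print(), cat() does not add the [1] index prefix and does not start a new line unless "\n" is supplied.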


Example: for loop

The Fibonacci sequence is a famous sequence in mathematics. The first two elements are defined as [1, 1]. Subsequent elements are defined as the sum of the preceding two elements. For example, the third element is 2 (= 1+1), the fourth element is 3 (= 1+2), the fifth element is 5 (= 2+3), and so on.

To obtain the first 12 Fibonacci numbers in R, we can use

Fibonacci <- numeric(12) # numeric array of size 12

Fibonacci[1] <- Fibonacci[2] <- 1

for (i in 3:12) Fibonacci[i] <- Fibonacci[i - 2] + Fibonacci[i - 1]

To see all 12 values, type in

Fibonacci

[1] 1 1 2 3 5 8 13 21 34 55 89 144


The for statement (2)

What is the output of the following statement?

for (x in 1:10) print(sqrt(x))

It prints the square roots of the integers one to ten.


Conditional execution (1)

The if() statements:

The if() statement allows us to control which statements are executed.

Syntax

if (condition) {commands when TRUE}

Or,

if (condition) {commands when TRUE} else {commands when FALSE}

That is, if (expr1) expr2 else expr3

where expr1 must evaluate to a single logical value and the result of the entire expression is then evident.


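Conditional execution: if statement

[Flowchart: from Entry, the test Expression is evaluated; if True (1), Statement 1 is executed; if False, Statement 1 is skipped; control then passes to Exit.]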


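Conditional execution: if – else statement

[Flowchart: from Entry, the Test Expression is evaluated; if True (1), the Body of if is executed; if False (0), the Body of else is executed; control then passes to Exit.]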


Example: Conditional execution (2)

A simple example:

x <- 3
if (x > 2) y <- 2 * x else y <- 3 * x

Since x > 2 is TRUE, y is assigned 2 * 3 = 6. If it hadn’t been true, y would have been assigned the value of 3 * x.
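Since if() expects a single logical value, the same rule applied to a whole vector is usually written with the base function ifelse(), which tests each element in turn. The sketch below is an addition to the slides, using an illustrative three-element vector.

```r
# Sketch: the same rule applied element by element with ifelse().
x <- c(1, 3, 5)
y <- ifelse(x > 2, 2 * x, 3 * x)   # 3*1 for the first element, 2*3 and 2*5 for the others
y
```

For a single value, if ... else remains the natural choice; ifelse() is meant for vectors.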


The while() loop

Sometimes we want to repeat statements, but the pattern of repetition isn’t known in advance. We need to do some calculations and keep going as long as a condition holds. The while() statement accomplishes this.

Syntax

while (condition) {statements}

The condition is evaluated, and if it evaluates to FALSE, nothing more is done. If it evaluates to TRUE the statements are executed, condition is evaluated again, and the process is repeated.


Example: while loop

Suppose we want to list all Fibonacci numbers less than 300. We don’t know beforehand how long this list is, so we wouldn’t know how to stop the for() loop at the right time, but a while() loop is perfect:

Fib1 <- 1
Fib2 <- 1
Fibonacci <- c(Fib1)   # start with the first element only; the loop appends the rest
while (Fib2 < 300) {
  Fibonacci <- c(Fibonacci, Fib2)
  oldFib2 <- Fib2
  Fib2 <- Fib1 + Fib2
  Fib1 <- oldFib2
}

To see the final result of the computation, type

Fibonacci
 [1]   1   1   2   3   5   8  13  21  34  55  89 144 233


The repeat loop, and the break and next statements

Sometimes we don’t want a fixed number of repetitions of a loop, and we don’t want to put the test at the top of the loop the way it is in a while() loop. In this situation we can use a repeat loop. This loop repeats until we execute a break statement.

Syntax

repeat { statements }

This causes the statements to be repeated endlessly. The statements should normally include a break statement, typically in the form

if (condition) break

but this is not a requirement of the syntax.

The break statement causes the loop to terminate immediately. Break statements can also be used in for() and while() loops. The next statement causes control to return immediately to the top of the loop; it can also be used in any loop.

The repeat loop and the break and next statements are used relatively infrequently.


Example: repeat, break

We can repeat Newton’s algorithm from the previous example using a repeat loop:

x <- x0 <- .5
tolerance <- 0.000001
repeat {
  f <- x^3 + 2 * x^2 - 7
  if (abs(f) < tolerance) break
  f.prime <- 3 * x^2 + 4 * x
  x <- x - f / f.prime
}
x

This version removes the need to duplicate the line that calculates f.

#****** Program using while

x <- x0 <- .5
f <- x^3 + 2 * x^2 - 7
tolerance <- 0.000001
while (abs(f) > tolerance) {
  f.prime <- 3 * x^2 + 4 * x
  x <- x - f / f.prime
  f <- x^3 + 2 * x^2 - 7
}
x
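Both loops above run until the tolerance is met, so they would never stop if the iteration failed to converge. One possible safeguard, shown here as a sketch rather than as part of the original example, wraps the same update in a function with an iteration cap. The function name newton and the arguments max.iter and f.prime are assumptions of this sketch.

```r
# Sketch: Newton's method with an upper limit on the number of iterations.
newton <- function(f, f.prime, x0, tolerance = 1e-6, max.iter = 100) {
  x <- x0
  for (iter in 1:max.iter) {
    fx <- f(x)
    if (abs(fx) < tolerance) {
      return(list(root = x, iterations = iter - 1))   # converged
    }
    x <- x - fx / f.prime(x)                          # Newton update
  }
  stop("no convergence within max.iter iterations")
}

res <- newton(f       = function(x) x^3 + 2 * x^2 - 7,
              f.prime = function(x) 3 * x^2 + 4 * x,
              x0      = 0.5)
res$root
```

Passing f and its derivative as arguments means the same function can be reused for other equations.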


Writing functions (1)

A function is defined by an assignment of the form

> name <- function(arg_1, arg_2, ...) expression

The expression is an R expression, (usually a grouped expression), that uses the arguments, arg_i, to calculate a value. The value of the expression is the value returned for the function.

A call to the function then usually takes the form

> name(expr_1, expr_2, ...)

and may occur anywhere a function call is legitimate.


Writing functions (2)

Example 1: Write a function to compute standard deviation

sd <- function(x){   # note: this masks the built-in sd() in the current session
  sqrt(var(x))
}

If X = 9, 5, 2, 3, 7; type

x <- c(9,5,2,3,7)
sd(x)
[1] 2.863564

Exercise: Calculate the coefficient of variation, that is, the standard deviation of a variable divided by its mean.
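One possible answer to the exercise is sketched below. The function name cv is an assumption, and the standard deviation is computed as sqrt(var(x)) so that the block does not depend on whether sd has been redefined.

```r
# Sketch: coefficient of variation = standard deviation divided by the mean.
cv <- function(x) {
  sqrt(var(x)) / mean(x)
}

x <- c(9, 5, 2, 3, 7)
cv.x <- cv(x)     # 2.863564 / 5.2
cv.x
```

The coefficient of variation is only meaningful when the mean is not zero, so a fuller version might check for that first.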


Example: Geometric mean as a function

insects<-c(1,10,1000,10,1)
mean(insects)
[1] 204.4

To calculate a geometric mean by finding the antilog (exp) of the average of the logarithms (log) of the data:

exp(mean(log(insects)))
[1] 10

So a function to calculate geometric mean of a vector of numbers x:

geometric<-function (x) {
  exp(mean(log(x)))
}

and testing it with the insect data

geometric(insects)
[1] 10


Writing functions (3)


Writing functions (4)

Example 3: Calculation of Grade point and Letter grade from score or marks.

grade <- function (s) {
  n <- length(s)
  gp <- matrix(0, nrow = n, ncol = 1)   # gp means Grade Point
  lg <- matrix(0, nrow = n, ncol = 1)   # lg means Letter Grade
  for (i in 1:n) {
    if (s[i] < 40){
      gp[i] = 0.00; lg[i]= "F"
    } else if (s[i] >= 40 && s[i] < 45){
      gp[i] = 2.00; lg[i] = "D"
    }


Writing functions (5)

    else if (s[i] >= 45 && s[i] < 50){
      gp[i] = 2.25; lg[i] = "C"
    } else if (s[i] >= 50 && s[i] < 55){
      gp[i] = 2.50; lg[i] = "C+"
    } else if (s[i] >= 55 && s[i] < 60){
      gp[i] = 2.75; lg[i] = "B-"
    } else if (s[i] >= 60 && s[i] < 65){
      gp[i] = 3.00; lg[i] = "B"
    } else if (s[i] >= 65 && s[i] < 70){
      gp[i] = 3.25; lg[i] = "B+"
    }


Writing functions (6)

    else if (s[i] >= 70 && s[i] < 75){
      gp[i] = 3.50; lg[i] = "A-"
    } else if (s[i] >= 75 && s[i] < 80){
      gp[i] = 3.75; lg[i] = "A"
    } else{
      gp[i] = 4.00; lg[i] = "A+"
    }
  }   # end of for loop
  return(list(Grade.Point = gp, Letter.Grade = lg))
}   # end of function

score <- c(80, 45, 55, 90, 75, 38, 62)
result <- grade(score)
result
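The chain of else if tests can also be written without a loop. The sketch below is an alternative, not the method used on the slides: it uses the base function findInterval() to look up the band of each score, with the same cut points and grades as above. The names grade.lookup, breaks, points and grades are illustrative.

```r
# Sketch: vectorised grade lookup with findInterval() (same bands as grade()).
grade.lookup <- function(s) {
  breaks <- c(40, 45, 50, 55, 60, 65, 70, 75, 80)    # lower limits of the bands above F
  points <- c(0.00, 2.00, 2.25, 2.50, 2.75, 3.00, 3.25, 3.50, 3.75, 4.00)
  grades <- c("F", "D", "C", "C+", "B-", "B", "B+", "A-", "A", "A+")
  band <- findInterval(s, breaks) + 1                # 1 = below 40, ..., 10 = 80 or more
  data.frame(Score = s, Grade.Point = points[band], Letter.Grade = grades[band],
             stringsAsFactors = FALSE)
}

score  <- c(80, 45, 55, 90, 75, 38, 62)
result <- grade.lookup(score)
result
```

Keeping the cut points and grades in vectors means a change to the grading scheme only touches those three lines.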


Writing functions (7)

Example 4: Write a function to calculate the two sample t-statistic. This is an artificial example. The function is defined as follows:

twosam <- function(y1, y2) {
  n1 <- length(y1); n2 <- length(y2)
  yb1 <- mean(y1); yb2 <- mean(y2)
  s1 <- var(y1); s2 <- var(y2)
  s <- ((n1-1)*s1 + (n2-1)*s2)/(n1+n2-2)
  tst <- (yb1 - yb2)/sqrt(s*(1/n1 + 1/n2))
  deg.free <- n1+n2-2
  return(list(test.stat = tst, df=deg.free))
}


Writing functions (8)

If,

x <- c(37, 29, 35, 28, 24, 36, 40, 37, 33, 28, 39)
y <- c(22, 32, 27, 30, 24, 34, 32, 20, 24, 25, 28, 26, 26)

Then

twosam(x, y)
$test.stat
[1] 3.307523

$df
[1] 22
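As a check on the hand-written statistic, the same pooled-variance t-test is available in base R through t.test() with var.equal = TRUE. The sketch below compares the two; the object names tt, t.builtin, df.builtin and t.manual are illustrative.

```r
# Sketch: compare the hand-computed pooled t statistic with t.test().
x <- c(37, 29, 35, 28, 24, 36, 40, 37, 33, 28, 39)
y <- c(22, 32, 27, 30, 24, 34, 32, 20, 24, 25, 28, 26, 26)

tt <- t.test(x, y, var.equal = TRUE)   # pooled-variance two-sample t-test
t.builtin  <- unname(tt$statistic)     # t statistic
df.builtin <- unname(tt$parameter)     # degrees of freedom, n1 + n2 - 2

# The same statistic computed directly from the formula used in twosam()
n1 <- length(x); n2 <- length(y)
s  <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)
t.manual <- (mean(x) - mean(y)) / sqrt(s * (1/n1 + 1/n2))
```

t.test() also reports the p-value and a confidence interval, which the artificial twosam() example leaves out.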


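Example 4: Maximum likelihood estimation (Gamma distribution as an example)

Maximum likelihood estimation

The pdf of Gamma distribution

$$f(x;\alpha,\lambda)=\frac{\lambda^{\alpha}}{\Gamma(\alpha)}\,x^{\alpha-1}e^{-\lambda x},\qquad x>0$$

The likelihood and log-likelihood are

$$L(x_1,x_2,\ldots,x_n;\alpha,\lambda)=\prod_{i=1}^{n}f(x_i;\alpha,\lambda)=\left(\frac{\lambda^{\alpha}}{\Gamma(\alpha)}\right)^{n}\left(\prod_{i=1}^{n}x_i\right)^{\alpha-1}e^{-\lambda\sum_{i=1}^{n}x_i}$$

$$\log L=n\alpha\log(\lambda)-n\log(\Gamma(\alpha))+(\alpha-1)\sum_{i=1}^{n}\log(x_i)-\lambda\sum_{i=1}^{n}x_i$$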


Maximum likelihood estimation

mle() estimates parameters by the method of maximum likelihood, using iterative numerical methods to minimize the negative log-likelihood (which is the same as maximizing the log-likelihood).

This requires specifying the analytical expression of the negative log-likelihood as an argument and giving some starting estimates for the parameters.


Maximum likelihood estimation

x.gam <- rgamma(200, rate=0.5, shape=3.5)
# sample of size n=200 from a gamma distribution with
# λ=0.5 (rate parameter) and α=3.5 (shape parameter)

library(stats4)   # loading package stats4 for mle()
logL <- function(lambda, alfa) {
  n <- 200
  x <- x.gam
  temp1 <- -n*alfa*log(lambda)+n*log(gamma(alfa))
  temp2 <- -(alfa-1)*sum(log(x))+lambda*sum(x)
  temp1+temp2   # -log-likelihood function
}
est <- mle(minuslogl = logL, start = list(lambda = 2, alfa = 1))


Maximum likelihood estimation

summary(est)
Maximum likelihood estimation

Call:
mle(minuslogl = logL, start = list(lambda = 2, alfa = 1))

Coefficients:
        Estimate Std. Error
lambda 0.5350503 0.05485034
alfa   3.8613209 0.37065500

-2 log L: 1051.109
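A hand-written negative log-likelihood is easy to get wrong, so it can be worth checking it against the built-in gamma density before passing it to mle(). The sketch below does this at one parameter value. The seed and the function names negLL.formula and negLL.dgamma are assumptions of the sketch, and lgamma() is used in place of log(gamma()) only for numerical stability.

```r
# Sketch: check the analytical negative log-likelihood against dgamma().
set.seed(123)                                   # assumption: arbitrary seed
x.gam <- rgamma(200, rate = 0.5, shape = 3.5)

negLL.formula <- function(lambda, alfa, x = x.gam) {
  n <- length(x)
  -n * alfa * log(lambda) + n * lgamma(alfa) -
    (alfa - 1) * sum(log(x)) + lambda * sum(x)
}

negLL.dgamma <- function(lambda, alfa, x = x.gam) {
  -sum(dgamma(x, shape = alfa, rate = lambda, log = TRUE))
}

v1 <- negLL.formula(0.5, 3.5)
v2 <- negLL.dgamma(0.5, 3.5)
c(formula = v1, dgamma = v2)                    # the two values should agree
```

If the two values agree at a few trial parameter values, the formula is very likely coded correctly.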


Thank You