
First Year R Programming Workshop

Prof. R. Willingale

October 3, 2014

Department of Physics and Astronomy
University of Leicester
University Road
Leicester LE1 7RH

Telephone +44-116-252-3556
Internet http://www.star.le.ac.uk/zrw

Email [email protected]

Contents

1 Introduction
1.1 What is R?
1.2 Getting started with R
1.3 The R working directory
1.4 Quitting R
1.5 Getting R help
1.6 The R Console
2 Workshop Tasks
2.1 Using R as a scientific calculator


University of Leicester, Department of Physics and Astronomy, First Year R Programming Workshop

Document: R1, Issue: 1.0, Date: October 3, 2014

2.1.1 Exercise 1 - Simple calculations using R
2.2 Plotting graphs with R
2.2.1 Exercise 2 - Sketching your own curves
2.3 User defined R functions
2.3.1 Exercise 3 - Defining your own functions
2.4 Programming using R script files
2.4.1 Exercise 4 - Creating and running your own script
2.5 Creating, reading and writing data files with R
2.5.1 Exercise 5 - Creating your own data files
2.6 Linear regression
2.6.1 Exercise 6 - A script to perform linear fitting
2.7 Including measurement errors in your analysis
2.7.1 Exercise 7 - Plotting error bars
2.8 Numerical differentiation and integration
2.8.1 Exercise 8 - Decoding and modifying algorithms in R scripts
2.9 Images and multi-dimensional arrays in R
2.9.1 Exercise 9 - Plotting a function of 2 variables
2.10 R object types and attributes
2.10.1 Exercise 10 - Playing with data
2.11 Practising R programming
2.11.1 Exercise 11 - 2-D random walks - Brownian Motion
3 Summary


1 Introduction

This 1st year Workshop is an introduction to computer programming and using computation as a tool for learning and doing physics, mathematics, data analysis and statistics.

You will be using a programming environment called R.

The 1st year workshop comprises 6 hours of contact time in which you will learn how to use a computer, running R, to perform simple tasks. You will be able to apply these tasks immediately in laboratory and project work. The workshop is a precursor to a more extensive 2nd year workshop which has 12 hours of contact time. The 2nd year workshop will extend your knowledge of R and introduce you to programming in C.

The workshop is arranged as a sequence of basic recipes for programming using R. We will introduce a series of simple computational tasks and show you how to perform these tasks using the R environment. Included with each topic heading below are snippets of code which show you how to carry out each task. At the end of each section is an exercise which will give you practice in using R to solve your computing problems. You will extend these basic recipes when using R in computational, laboratory and project work in future courses and modules.

This script is a pdf file and can be found at

http://www.star.le.ac.uk/zrw/compshop/R_workshop_1st.pdf

We suggest you download the file to your Desktop for ease of use.

1.1 What is R?

R is an open-source environment for statistical computing and visualisation but it also provides a more general programming environment beyond statistics. There are no restrictions on access or use - it’s free and nothing is hidden or proprietary.

There are many high quality specialised procedures for performing statistical and analysis tasks in a wide variety of contributed packages that are freely available and easy to integrate into your personal version of R.

R uses a dialect of the S language, which was developed by John Chambers and colleagues at Bell Laboratories in the 1970s and 1980s - the same institution that developed C and UNIX.


Web addresses you will find useful are:

http://www.r-project.org/

http://cran.r-project.org/doc/manuals/R-lang.html

Advantages of R

• Full programming capabilities - all elements of structural/procedural programming are available in an accessible form.

• Graphics for visualisation of data etc. are fully integrated into the environment and are simple to use.

• All the source code is published and open. Many libraries and applications are freely available to do a wide variety of analysis tasks.

• You can link R to routines you write in C or Fortran and use R as a front-end to drive all your own software.

Disadvantages of R

• You have to learn the S language. You can’t just drag and click. But if you learn S you will be familiar with all the elements of computer programming languages and subsequently learning C, Fortran, Perl, IDL and many other languages will be straightforward.

• R is an interpreter not a compiler. This means that some types of programming tasks may be rather slow compared to using C, Fortran or some other compiled language. The time penalty is often a rather large factor of 10 or more. So instead of getting a result in seconds it may take minutes or even hours. However, modern computers are so fast this is not a problem for the novice. In the 2nd year we will teach you how to write routines in C and run them from within R so you can get around this limitation.

1.2 Getting started with R

Use the Windows Run Program menu or type R at the command line prompt on a Linux machine. On a University IT Windows machine:

Start-->All Programs-->R for Windows-->R 3.0.0 (64-bit) click


The R Console will appear and some header text will be listed. On my MacBook the header looks like:

R version 2.14.0 (2011-10-31)

Copyright (C) 2011 The R Foundation for Statistical Computing

ISBN 3-900051-07-0

Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.

You are welcome to redistribute it under certain conditions.

Type ’license()’ or ’licence()’ for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.

Type ’contributors()’ for more information and

’citation()’ on how to cite R or R packages in publications.

Type ’demo()’ for some demos, ’help()’ for on-line help, or

’help.start()’ for an HTML browser interface to help.

Type ’q()’ to quit R.

[R.app GUI 1.42 (5933) x86_64-apple-darwin9.8.0]

[History restored from /Users/richardwillingale/Rwork/.Rapp.history]

>

You use R by typing instructions in response to the prompt in the R Console. In the header text listed above the prompt is a greater than sign followed by a space at the start of the last line (on the left-hand side). In the following text all lines you type into R start with the prompt > .

R is available on all the University IT Windows machines but you can easily get it running on your own personal laptop under Windows, MacOS or Linux. We strongly recommend you go to the R website address given above and download R onto your own machine, if you have one.


1.3 The R working directory

When R starts up it sets a “working directory” which it uses as a default to save files and to look for existing files associated with the user. You can find out the current working directory using the following command. You should type the command followed by a carriage return.

> getwd()

[1] "Z:/My Documents"

By default it will be set to your My Documents folder.

In the Windows version of the R GUI the user can set the “working directory” from the File menu at the top left.

File-->Change dir... click

This will bring up a dialog box which allows you to select a directory. You should use a directory within your own personal directory space. We advise you to create a new directory (within your My Documents directory) called Rwork. All your R related files will then be kept separate from all the rest. You can also set the working directory using the command setwd() in the R console. My home directory on UoL IT Windows is on the Z drive so I use the following:

> setwd("Z:/My Documents/Rwork")

This will only work for you if the directory Rwork exists in your My Documents directory. If not you must create it first using Windows Explorer. If you omit the drive and My Documents then it will look for the specified directory within the current directory. So when you first start up R, and if you have created the directory My Documents/Rwork, the following will suffice.

> setwd("Rwork")

You can always check which directory is the current “working directory” using the command getwd() in the R Console.


> getwd()

[1] "Z:/My Documents/Rwork"

If you want to list all the files in the current working directory then use the command list.files().

> list.files()

[1] "chaos3.c" "convolve.c"

[3] "convolve.o" "convolve.R"

...
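The directory check and creation described above can also be done from within R. The following is a minimal sketch: the directory name Rwork_demo is a hypothetical name used only for illustration, and it is created inside R's temporary directory so that nothing in your own file space is touched. The functions file.path(), tempdir(), file.exists() and dir.create() are part of base R.

```r
# Build a path to a hypothetical directory inside R's temporary area
demo_dir <- file.path(tempdir(), "Rwork_demo")

# Create the directory only if it does not already exist
if (!file.exists(demo_dir)) dir.create(demo_dir)

old_dir <- getwd()          # remember the starting directory
setwd(demo_dir)             # move into the new directory
print(basename(getwd()))    # confirm the change
print(list.files())         # a newly created directory holds no files
setwd(old_dir)              # return to the starting directory
```

Saving the starting directory in old_dir and restoring it at the end means the sketch leaves your session where it began.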

1.4 Quitting R

There are many ways to quit an R session. You can click on the red cross at the top-right of the GUI or you can type the command:

> q()

Either way you will be given the option to “Save workspace image?” Initially we recommend that you select Don’t Save. Later you may find occasions where saving the current workspace is useful. Note that you must set the current working directory to a directory for which you have write permission if you want to use the Save option.

1.5 Getting R help

Extensive help is available. Use the Help menu at the top of the GUI screen:

Help--> Html help click for general browsing

Help--> R functions (text)... click for help on a known function/command

For example, try using the R functions help and type getwd into the dialog box.
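Help can also be requested by typing commands at the prompt rather than using the menus. The lines below are a short sketch using the base R functions help(), ? and apropos(); the variable name matches is chosen only for illustration.

```r
# Open the help page for a known function (two equivalent forms)
help("getwd")
?getwd

# List the objects whose names contain a given string
matches <- apropos("getwd")
print(matches)

# Search the installed documentation by keyword (uncomment to try it)
# help.search("working directory")
```

apropos() is handy when you remember only part of a function name.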


1.6 The R Console

You can select and edit previous lines which have been typed into the R Console using control key strokes. Hold the ctrl key down (with your left-little-finger) and hit the relevant character:

ctrl-p go to previous line

ctrl-n go to next line

ctrl-a go to start of current line

ctrl-e go to end of current line

ctrl-b move 1 character back along current line

ctrl-f move 1 character forward along current line

After a bit of practice you will find that you can type, select and edit text efficiently without moving your hands away from the keyboard.

You can copy and paste text into the Console (or R text editor) using:

ctrl-c copy text selected using mouse to clipboard

ctrl-v paste text from clipboard to position of cursor

There is a “complete word facility”. Use the TAB character (using your left-little-finger) in an attempt to complete long command names or file names etc. which you have only half typed.

More information about navigation in the R console is available from the help system:

Help--> Console

2 Workshop Tasks

2.1 Using R as a scientific calculator

You can do simple arithmetic by typing, for example, 5+3 followed by carriage return:

> 5+3

[1] 8


The commands typed are all in the form of “expressions” and these are “parsed” by the R interpreter and then “executed”. The “parsing” and “execution” is initiated by the carriage return typed at the end of the input line. Execution often produces a result and this result is listed out or saved for subsequent use. In the above example the “expression” is a simple arithmetic addition. The default action is to list the answer at the console. The [1] listed above indicates that the answer is a single (scalar) result and so has the index, or counter, 1.

The “expressions” comprise objects, operators and functions (functions are sometimes called methods in the jargon). So in the above 5 and 3 are primitive objects (numbers) and + is an operator. All functions or methods have parentheses after the name. These parentheses contain any function arguments but must be present even if there are no arguments, e.g. as in the command getwd() mentioned above. Actually all the named elements in R can be considered as instances of objects and the S language which is used by R incorporates elements of “object oriented programming” which you may have heard of. However, we will not be concerned with the details of that here. Instead we will concentrate on procedural programming centred on actions and functions rather than object oriented programming which is centred on data structures and objects.

All the usual mathematical functions are available. So for example:

> exp(-5.8)*(1-sin(0.4))

[1] 0.001848569

Instead of listing the result you can assign it to an object name. So we can do the abovein two stages:

> a<- exp(-5.8)

> b<- 1-sin(0.4)

> a*b

[1] 0.001848569

The assignment operator is <-. Strangely -> also works:

> 1-sin(0.4) -> b

> b

[1] 0.6105817

Assignment involves the movement of a result or value from one place to another hence the arrow-like assignment operator. An equals sign, =, also works as an assignment operator with the implied movement working right to left. This is used in some circumstances in R and is common in many other languages.
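As a quick check, the three assignment forms described above can be compared directly. This is a minimal sketch; the object names a, b and c1 are chosen only for illustration.

```r
a <- exp(-5.8)        # leftward arrow, the usual form
exp(-5.8) -> b        # rightward arrow
c1 = exp(-5.8)        # equals sign

print(c(a, b, c1))                           # three identical values
print(identical(a, b) && identical(b, c1))   # TRUE
```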

More interestingly objects in R don’t have to be scalar quantities (one value) but can be lists or a sequence of numbers (i.e. a vector). There are many ways of generating simple sequences in R:

> s<- 1:20

> s

[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

> r<- seq(length=10, from=-3, to=5)

> r

[1] -3.0000000 -2.1111111 -1.2222222 -0.3333333 0.5555556 1.4444444 2.3333333 3.2222222

[9] 4.1111111 5.0000000

You can now see what the [] listed out really refers to. It is the index of the left-most item on each line.

> t<- rep(0, length=10)

> t

[1] 0 0 0 0 0 0 0 0 0 0

A range specification like 1:20 and commands seq() and rep() create regular lists or sequences. The combine function, c(), is useful to generate a list or vector of any sequence of values:

> x<- c(0.5, 0.8, 0.81, 0.9)

> x

[1] 0.50 0.80 0.81 0.90
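A few further variations on seq(), rep() and c() are sketched below. The object names u, v and w are chosen only for illustration; the by and times arguments and the length() function are part of base R.

```r
u <- seq(from=0, to=1, by=0.25)   # fixed step size instead of fixed length
print(u)                          # 0.00 0.25 0.50 0.75 1.00

v <- rep(c(1, 2), times=3)        # repeat a short vector three times
print(v)                          # 1 2 1 2 1 2

w <- c(u, v)                      # c() also joins existing vectors together
print(length(w))                  # 11
```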

You can use lists or vectors as arguments for functions in which case the function is applied to each vector element in turn:

> x<- (1:20)*0.1

> x

[1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0


> y<- sin(x)

> y

[1] 0.09983342 0.19866933 0.29552021 0.38941834 0.47942554 0.56464247 0.64421769 0.71735609

[9] 0.78332691 0.84147098 0.89120736 0.93203909 0.96355819 0.98544973 0.99749499 0.99957360

[17] 0.99166481 0.97384763 0.94630009 0.90929743

Furthermore if two vectors are added (or multiplied etc.) together the result is a new vector with elements which are the sum (product etc.) of the individual elements:

> x<- (1:20)*0.1

> yp<- sin(x)/x

> yp

[1] 0.9983342 0.9933467 0.9850674 0.9735459 0.9588511 0.9410708 0.9203110 0.8966951 0.8703632

[10] 0.8414710 0.8101885 0.7766992 0.7411986 0.7038927 0.6649967 0.6247335 0.5833322 0.5410265

[19] 0.4980527 0.4546487
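The element-by-element behaviour is easier to see with short vectors. In this small sketch the names p and q are chosen only for illustration.

```r
p <- c(1, 2, 3, 4)
q <- c(10, 20, 30, 40)

print(p + q)    # 11 22 33 44
print(p * q)    # 10 40 90 160
print(p * 2)    # the scalar 2 is reused for every element: 2 4 6 8
```

The last line shows why (1:20)*0.1 works: the single value is combined with each element of the vector in turn.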

The R operator to raise a number to a power is ^. So to find the square:

> 11^2

[1] 121

> x<- c(5, 6, 7)

> x^2

[1] 25 36 49

In R the built-in constant pi is the ratio of the circumference of a circle to its diameter.

> pi

[1] 3.141593

All the arithmetic in R is performed using double precision real numbers. The default number of digits used for listing results etc. is 7. If you want to see a result to higher precision then you can use the format() function.

> format(pi, digits=15)

[1] "3.14159265358979"

In some instances the number to be printed may be an integer but it will be listed as a real number using scientific notation because of the limited number of digits. For example the factorial() function gives:


> factorial(16)

[1] 2.092279e+13

We can list the answer to full integer precision using:

> format(factorial(16), digits=9)

[1] "20922789888000"

The format() function also works with lists or vectors:

> format(factorial(c(13, 13, 15, 16)), digits=9)

[1] "    6227020800" "    6227020800" " 1307674368000" "20922789888000"
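The quotation marks in the output above show that format() returns character strings rather than numbers. If you want a number with fewer digits that can still be used in arithmetic, the base R functions round() and signif() are one option. A brief sketch, with s as an illustrative name:

```r
print(round(pi, 3))       # 3.142  (3 decimal places)
print(signif(pi, 3))      # 3.14   (3 significant figures)

s <- format(pi, digits=3)
print(s)                  # "3.14"
print(is.character(s))    # TRUE: a string, not a number
```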

There are several functions which return a scalar statistic from a vector: sum(), mean(), median(), max(), min(). For example:

> mean(c(5.5, 8.4, 10.3, 1.2))

[1] 6.35
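The other functions listed above can be tried on the same four values. In this sketch the vector name d is chosen only for illustration.

```r
d <- c(5.5, 8.4, 10.3, 1.2)

print(sum(d))      # 25.4
print(mean(d))     # 6.35
print(median(d))   # 6.95, the average of the two middle values
print(max(d))      # 10.3
print(min(d))      # 1.2
```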

2.1.1 Exercise 1 - Simple calculations using R

(i) The radius of the Earth is 6378.1 km and the acceleration due to gravity at the surface is $g = 9.807\ \mathrm{m\,s^{-2}}$. Use R to find the acceleration due to gravity at a distance of $10^4$ km from the centre of the Earth.

(ii) What is the volume of the Earth assuming it is spherical?

(iii) Use R to estimate the fractional error when the exponential function, exp(x), is approximated by the first 5 terms of the series

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots$$

if x = 1.5. You should use the sum() function to perform the summation of the series terms.

(iv) A full listing of the R Base Package functions available can be found using help. Use the following command to browse the full menu


> help("base")

or alternatively

Help--> R functions (text)... click type base into dialog box

(v) After you have completed a number of calculations using R you will have various objects defined in the current environment. You can list these objects using the ls() command in the R Console:

> ls()

This is often useful if you have forgotten what variables you have defined.
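Objects that are no longer needed can be deleted with the base R function rm(). In this minimal sketch, tmp_value is a hypothetical object created only for illustration.

```r
tmp_value <- 42                  # create an object
print("tmp_value" %in% ls())     # TRUE: it appears in the listing

rm(tmp_value)                    # delete it
print("tmp_value" %in% ls())     # FALSE: it has gone
```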

2.2 Plotting graphs with R

Plotting a simple graph using vectors is easy:

> x<- (1:200)*0.1

> y<- sin(x)/x

> plot(x, y, type="l")

By default the plot appears in a separate window with annotated axes etc.

The argument of the plot() function called type controls the form of the graph: "l" joins the points with lines, "p" plots points. Further options can be found using the help system.

You can create a PDF file of the plot using:

> pdf("myplot.pdf")

> plot(x, y, type="l")

> dev.off()


The file, which in the above is called myplot.pdf, will be saved in the current working directory (see Section 1.3).

Alternatively you can create bitmap files using the commands bmp(), jpeg(), png() or tiff(). In each case you can specify the width and height in pixels and the pointsize for plotted text.
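A bitmap file is produced in the same way as the PDF above. The sketch below writes a PNG file; the file location (R's temporary directory) and the particular width, height and pointsize values are assumptions chosen for illustration, and it assumes your R installation has PNG support available.

```r
x <- (1:200)*0.1
y <- sin(x)/x

# Hypothetical output file, placed in R's temporary directory
png_file <- file.path(tempdir(), "myplot.png")

png(png_file, width=800, height=600, pointsize=14)   # open the bitmap device
plot(x, y, type="l")                                 # draw into the file
dev.off()                                            # close the device and write the file

print(file.exists(png_file))
```

As with pdf(), nothing is written to disk until dev.off() is called.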

It is often useful to use logarithmic axes. The following example shows how to produce a log-log plot. Note that the samples in x are set logarithmically so that the resulting plot looks suitably smooth.

> lx<- seq(-1, 1, length=1000)

> x<- 10^lx

> y<- exp(-1/x)

> yy<- (1-exp(-x))

> yyy<- (1-exp(-x))^2.5

> plot(x, y, type="l", xlim=c(0.1,10.), ylim=c(0.001,1.0), log="xy",

xlab="distance", ylab="density", main="density profile")

> lines(x, yy, type="l", col="green")

> lines(x, yyy, type="l", col="blue")

> lines(x, y, type="l", col="red")

In this example we have also demonstrated how to plot more than one curve in a single frame. The initial plot() call sets up the limits and mapping of the axes. The graphical parameters xlab, ylab and main are also included to label the axes and provide a main title for the plot. Subsequent calls to lines() add further curves with different colours.
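When several curves share a frame it helps to say which is which. One way, sketched here, is the base graphics function legend(); the position "bottomright" and the label text are choices made only for this illustration.

```r
lx <- seq(-1, 1, length=1000)
x <- 10^lx
y <- exp(-1/x)
yy <- 1-exp(-x)

plot(x, y, type="l", col="red", log="xy",
     xlim=c(0.1, 10), ylim=c(0.001, 1),
     xlab="distance", ylab="density", main="density profile")
lines(x, yy, col="green")

# One label and one colour per curve, drawn as line samples (lty=1)
legend("bottomright", legend=c("exp(-1/x)", "1-exp(-x)"),
       col=c("red", "green"), lty=1)
```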

You can get more detailed control over elements of the plot by splitting the plot() call into individual functions.

> plot.new()

> plot.window(c(0.1, 10.), c(0.001,1.0), log="xy")

> title(xlab="distance", ylab="density", main="density profile")

> axis(1); axis(2)

> lines(x, y, type="l", col="green")

Note that the line axis(1); axis(2) contains two function calls separated by a semi-colon. Short lines can always be entered like this, using the semi-colon as a separator, to save space and make things look neater.


2.2.1 Exercise 2 - Sketching your own curves

The Lorentzian function has the form

$$\frac{p}{x^2 + w}$$

(i) Use R to sketch the function for various values of the constants p and w.

(ii) What properties of the function do the constants p and w control?

(iii) How would you modify the function so that the shape remains the same but the peak occurs at a specified position $x = x_0$? Demonstrate that your answer works using R to sketch the new function.

2.3 User defined R functions

You can assign an expression to an object name to define your own R function.

> sinc<- function(x) {

y<- sin(x)/x

return(y)

}

> x<- (1:20)*0.1

> sinc(x)

[1] 0.9983342 0.9933467 0.9850674 0.9735459 0.9588511 0.9410708 0.9203110 0.8966951 0.8703632

[10] 0.8414710 0.8101885 0.7766992 0.7411986 0.7038927 0.6649967 0.6247335 0.5833322 0.5410265

[19] 0.4980527 0.4546487

If the result is invalid R records the fact:

> sinc(0)

[1] NaN

Here the result NaN means Not-a-Number.
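Special values such as NaN can be detected with the base R functions is.nan() and is.finite(). A short sketch, with r as an illustrative name:

```r
r <- c(0/0, 1/0, -1/0, 1)

print(r)               # NaN Inf -Inf 1
print(is.nan(r))       # TRUE FALSE FALSE FALSE
print(is.finite(r))    # FALSE FALSE FALSE TRUE
```

Note that dividing a non-zero number by zero gives Inf (infinity) rather than NaN.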

The function can have more than one argument and if it is complicated you can spread the definition over several lines. In such a case the curly brackets define the so-called scope of the function such that the function definition comprises a sequence of lines. The last line within the curly brackets, return(y), makes it clear which result/object is returned as the value of the function. So, for example, we can set up a lorentzian function over three lines.

lorentzian<- function(x, a, w) {

y<- 1/(x^2+w)

y<- a*y

return(y)

}

x<- seq(-10, 10, length=1000)

y<- lorentzian(x, 2, 2)

plot(x, y, type="l")
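When a function has several arguments they can be given by position or by name. The sketch below repeats a compact version of the lorentzian definition so that it runs on its own; the result names y1, y2 and y3 and the values used are chosen only for illustration.

```r
lorentzian <- function(x, a, w) {
  y <- a/(x^2+w)
  return(y)
}

y1 <- lorentzian(0, 2, 4)          # arguments matched by position
y2 <- lorentzian(x=0, a=2, w=4)    # arguments matched by name
y3 <- lorentzian(w=4, a=2, x=0)    # named arguments can come in any order

print(c(y1, y2, y3))               # 0.5 0.5 0.5
```

Naming the arguments makes a call easier to read and guards against putting values in the wrong order.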

2.3.1 Exercise 3 - Defining your own functions

(i) Define the functions sinc(x) and lorentzian(x,a,w), as given above, in R.

(ii) Check that they return sensible results.

(iii) Plot both functions on the same plot over the range −10 < x < 10. Make sure that the y-limits of the plot are set so that the functions don’t fall off the bottom or overshoot the top.

2.4 Programming using R script files

While reading the previous sections it may have struck you that doing complicated things with R can involve a lot of tedious typing. Instead of typing in commands (expressions) at the R prompt in the Console you can type them into a script (ASCII text) file and then get R to read the commands from the file. Once the file has been created you don’t have to re-type everything to repeat the execution and the file acts as a convenient record of what you did. The R console provides you with an editor which you can use to produce a source file but you can also use any text editor of your choice.

You start the editor and create a new script file using the File menu:

File-->New script click


This will open the editor window. When this window is in focus anything you type goes into the ASCII file. When you have finished typing in the script you can save the file using the save button in the editor menu bar. You don’t need to close the editor window, in fact it’s a good idea to leave it open ready for further editing. When saved the file will automatically be given the name extension .R indicating that it is an R script file. The file will be saved in the current working directory (see above in Section 1.3). The file will be saved permanently so you can return and use it during a later session.

Once you have created the source file you can get R to execute it using the command:

> source("mysource.R")

Or alternatively you can use the File menu:

File-->Source script click

If you want to edit an existing source file (because you made a mistake or you want to change the program) then you can go back to the editor window. If you closed the editor window or are starting a new session you must use the File menu:

File-->Open script click

The process of editing a “source file”, saving that file and then running the program is common to many computer languages. When you are developing or modifying a program you will use the cycle edit-save-run-edit-save-run... repeatedly. The R GUI has been designed to make this process easy and efficient. You will use a similar cycle when you start to learn the C language in the 2nd year but because C is a compiled language there is an extra step (or steps) in the cycle to compile and link the program prior to execution, run. Note that, in the following text, lines of R code which are intended to be typed into a source file are not prefixed by the > prompt.
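The save-then-run cycle can be tried entirely from the console. In this minimal sketch the script is written with the base R function writeLines() rather than the editor; the file name demo_script.R and the object names demo_x and demo_total are hypothetical, and the file is placed in R's temporary directory.

```r
# Hypothetical script file in R's temporary directory
script_file <- file.path(tempdir(), "demo_script.R")

# "Edit and save": write three lines of R code to the file
writeLines(c("# a short demonstration script",
             "demo_x <- (1:5)^2",
             "demo_total <- sum(demo_x)"),
           script_file)

# "Run": execute the lines as if they had been typed at the prompt
source(script_file)

print(demo_total)    # 55
```

Objects created by the script remain available in the session after source() returns. Adding the argument echo=TRUE to source() lists each line as it is executed, which can help when tracking down an error.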

2.4.1 Exercise 4 - Creating and running your own script

(i) Use the R script editor to create an R source file containing the following program:

# this code defines a function and then uses it to plot a graph

sincscalar<- function(x) {


if(x==0) y<- 1.0 else y<- sin(x)/x

return(y)

}

sinc<- function(x) {

y<- lapply(x, sincscalar)

return(y)

}

x<- seq(length=100, from=0, to=10) # define a sequence of x values

y<- sinc(x) # calculate sinc for each x value

plot(x, y, type="l")

To save time and typing effort you can copy and paste the script from this pdf file into the R script editor. Use the mouse cursor to highlight the script above in the Acrobat Reader window containing this document. Copy the text to the clipboard using ctrl-c. Now move to the R script editor window. Paste the text into the editor using ctrl-v.

Within this program it’s important to note that the x and y used in the definition of sincscalar, sinc and the main program are distinct. Variables declared and used within the scope of a function are not the same as variables used elsewhere. The usual connection between any function and the rest of the code is through the list of arguments (within the brackets, ()) and the return() at the end of the function definition before the closing curly bracket. Variables declared and used outside a function can be used within a function (such variables are effectively global) but variables declared and used within a function are not available outside of that function. It is good practice to avoid the use of global variables inside functions and to pass all required values as arguments of the function if possible.
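The separation between variables inside and outside a function can be seen in a few lines. In this sketch the names double_it and inner are hypothetical and used only for illustration.

```r
x <- 10                        # x in the main program

double_it <- function(x) {     # this x is the function argument
  inner <- 2*x                 # inner exists only while the function runs
  return(inner)
}

print(double_it(3))       # 6: the function used its own x
print(x)                  # 10: the main-program x is unchanged
print(exists("inner"))    # FALSE: inner is not visible out here
```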

The code within R is divided into packages and each package provides all the functions etc. to perform a particular suite of analysis tasks. R has a namespace management system for code in packages which allows the package writer to specify which variables are exported to package users and which variables should be imported from other packages. All this is far beyond what you require at present but will be useful if you write more complicated code in the future.

(ii) Save the file using a name of your choice (here we assume you call it first_script.R).

(iii) Once saved you can go to the R Console and type the following command:

> source("first_script.R")

This will set up the functions and plot the result. If it fails with some error associated with the program in the file you must go back to the editor, modify the source code to fix the error, save the file again and finally issue the source() command again.

Note that we have defined two functions: sincscalar calculates the results for a scalar input but uses the if() else structure to trap when x==0. The == operator returns a logical value, TRUE if x is equal to zero in this case, FALSE otherwise. So the scalar function won’t fail when x is zero but returns the correct value of the function which is 1 when x is 0.

This scalar function is then applied to a list, one element at a time, using the supplied function lapply(). You should look up lapply in the Help to get a full description of what it does.
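As a small sketch of what lapply() hands back, the lines below apply the same scalar function to three values and compare the result with that of the closely related function sapply(), which tries to simplify the list to a vector. The three test values are chosen purely for illustration.

```r
# Same scalar function as in the script above
sincscalar <- function(x) {
  if (x == 0) y <- 1.0 else y <- sin(x)/x
  return(y)
}

x <- c(0, pi/2, pi)

ylist <- lapply(x, sincscalar)   # a list with one element per x value
yvec <- sapply(x, sincscalar)    # the same values simplified to a numeric vector

print(class(ylist))              # "list"
print(class(yvec))               # "numeric"
print(yvec[1])                   # 1, the value trapped at x == 0
```

A numeric vector is usually more convenient than a list if you want to do further arithmetic on the results.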

(iv) Try using the sincscalar function with a list or vector argument. Why does it fail?

Note that anything on a line after a hash character, #, is ignored and therefore indicates a comment from/for the programmer/user. When writing any computer program you should/must include comments so that if you or anybody else comes to look at the program in a later life you/they can understand what it does and how it does it.

2.5 Creating, reading and writing data files with R

In order to use R to analyse your data from, say, a laboratory experiment you need to input the data in some way. If there are only a few data points then you can quickly type them directly into an R script using the c() function.

v<- c(1.0, 1.053, 1.105, 1.158)

t<- c(292.3, 281.8, 284.5, 279.5)

p<- c(1.202, 1.117, 1.062, 0.990)

A better way, especially if there are many data points or if you are going to add more data points later, is to create a data file. You can use the R editor (or some external text editor) to create a tabulation of results from some experiment. Suppose you create a file gas.dat which looks like the following.

V_L T_K P_At

1.000 292.3 1.202

1.053 281.8 1.117

1.105 284.5 1.062

1.158 279.5 0.990

1.211 279.9 0.940

1.263 271.7 0.882

1.316 272.3 0.850

1.368 264.9 0.793

1.421 266.8 0.738

1.474 262.9 0.725

1.526 258.7 0.681

1.579 254.2 0.647

1.632 253.1 0.634

1.684 252.3 0.602

1.737 243.2 0.595

1.789 241.1 0.568

1.842 242.8 0.544

1.895 245.1 0.535

1.947 237.2 0.508

2.000 240.3 0.483

Start the editor as before

File-->New script click

When you have typed in the data you must save the file. By default the file extension will be set to .R which will be confusing so you should specify the complete file name including some other name extension, for example gas.dat.

It is easy to get R to read this file into what is called a “data frame”. The following code reads the data file, prints out a listing of the data and finally plots a graph.

gasdata<- read.table("gas.dat", header=TRUE)

print(gasdata)

x<- log10(gasdata$V_L)

y<- log10(gasdata$P_At)

# Plot the data as points

plot(x, y)

The data frame here is an object called gasdata. The columns in the original file become components of the data frame which are referenced using the names gasdata$V_L etc. The component names in the data frame have been taken from the header line in the original data file.
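If you want to experiment with data frames before creating gas.dat, the sketch below reads the first three rows of the table from a character string using the text argument of read.table(). The object names gastext and gassmall are made up for this illustration.

```r
# Three rows in the same layout as gas.dat, supplied as text instead of a file
gastext <- "V_L T_K P_At
1.000 292.3 1.202
1.053 281.8 1.117
1.105 284.5 1.062"

gassmall <- read.table(text=gastext, header=TRUE)

print(names(gassmall))      # component names taken from the header line
print(gassmall$T_K)         # one column, returned as a numeric vector
print(gassmall[2, ])        # one row, selected by row index
print(nrow(gassmall))       # number of data rows (the header is not counted)
```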

The print() is what is called a generic function or method. What such a function does depends on the type of object given in the argument. In the case above the object gasdata is a data frame and in this case print() produces a neat tabulation of the columns within the data frame. Instead of listing the data frame in the Console you can redirect the output to a file using the command sink("filename"). In the following example we create a file called gas.txt. The file will be saved in the current working directory (see Section 1.3).

gasdata<- read.table("gas.dat", header=TRUE)

sink("gas.txt")

print(gasdata)

sink()

sink() closes the file and restores the output to the console. This process of opening an output channel (to a file), writing to the file, and finally closing the file, is common practice in many computer languages.

Another way of producing a tabulation file is to use

write.table(gasdata, file="gas.txt")

If the object given as the first argument is not a data frame then write.table() will attempt to convert it into a data frame before printing it out. The exact format used, including the separators and use of quotes, can be controlled by other arguments. To find out details use the Help system. There is a nice symmetry between the functions read.table() and write.table() but otherwise it makes little difference which method you use. The sink() - print() - sink() sequence provides a more general capability for writing text files using R.
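The sketch below illustrates two of the formatting arguments, row.names and quote, and then reads the file back to check that nothing was lost. It writes to a temporary file (an assumption made here so that none of your own files are overwritten) and uses a made-up two-row data frame.

```r
a <- data.frame(V_L=c(1.000, 1.053), P_At=c(1.202, 1.117))

outfile <- tempfile(fileext=".txt")    # temporary file so nothing is overwritten

# row.names=FALSE and quote=FALSE give a plain layout like that of gas.dat
write.table(a, file=outfile, row.names=FALSE, quote=FALSE)

cat(readLines(outfile), sep="\n")      # show the file contents in the Console

b <- read.table(outfile, header=TRUE)  # read the tabulation back in
print(all.equal(a, b))                 # TRUE if the round trip preserved the data
```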

2.5.1 Exercise 5 - Creating your own data files

(i) Use the editor to create your own version of the data file gas.dat. Remember you can copy and paste the table from this script into the R editor window (see above).

(ii) Write an R script file which reads this data file into a data frame, lists the data in the Console and plots the data points.

(iii) Modify the R script so that it also writes the data to a new file gas.txt. Have a look at this new file to see what format R uses when it writes such a tabulation.

When you are making measurements in the laboratory or during a research project you should get into the habit of saving your results in a tabulated ASCII file using a format compatible with R. You will then be able to write a simple R script to analyse and plot your data and perform calculations to produce the final results.

2.6 Linear regression

It is often the case that the relationship between a dependent variable, y, and independent variable, x, is expected to be linear, i.e. the variables obey a relationship of the form

$$ y = mx + c $$

where m and c are constants: the gradient of the line and the intercept of the line with the y-axis (when x = 0).

In some experiment we may make measurements of y at a series of points x. The data file gas.dat you created above contains measurements of the temperature and pressure of a gas at a series of values of the volume. These gas measurements were made under adiabatic conditions so we expect the relationship between pressure and volume to be

$$ P = C V^{-\gamma} $$

where C and γ are constants. The physics behind this relationship is covered in the Core Module, Light and Matter, PA1120. If we take the logarithm of both sides of this equation we get a linear relationship

$$ \log_{10}(P) = \log_{10}(C) - \gamma \log_{10}(V) $$

where log10(C) is now the intercept and −γ is the gradient. In Exercise 5 above you plotted log10(P) vs. log10(V) and you can see that the data seem to obey a linear relationship although there is some scatter because the pressure values include measurement errors. Using these measurements we should be able to estimate a value for γ by finding the “best fit” line. A crude way of doing this would be using a ruler, pencil and eyeball. A better way is to use a numerical process known variously as Linear Regression, Least Squares Linear Fitting or Linear Modelling. Details about the Principle of Least Squares are given in your Laboratory Handbook and will be covered in more detail in the Probability and Statistics section of the mathematics Module PA1710.

There are a number of ways of doing Linear Regression with R. The code below illustrates two of them.

gasdata<- read.table("gas.dat", header=TRUE)

print(gasdata)

x<- log10(gasdata$V_L)

y<- log10(gasdata$P_At)

# Plot the data

plot(x, y, xlab="log10(volume L)", ylab="log10(pressure At)")

# Use a linear model to perform a simple linear regression

lmfit<- lm(y~x)

print(summary(lmfit))

# Plot regression line

abline(lmfit)

# Do the same using the simpler lsfit function

fit<- lsfit(x, y, intercept=TRUE)

print(summary(fit))

abline(fit, col="red")

The lm() function produces what is called a linear model (hence the name lm). The argument y~x means y depends on x. If y and x are 1-d vectors then the linear model is y = mx + c as described above. The object lmfit, which is created by the linear model process above, is complicated. It is not just a single scalar result or even just a single vector. It is a structured object which contains many results/statistics from the fitting procedure. The function abline(lmfit) plots the regression line using the information from the linear model object. The function summary() produces a text summary of the results.

An alternative is to use the function lsfit() which performs a least squares fit as the name suggests.

In both cases the intercept and gradient (c and m above) are found in a vector called coefficients which is held in the results structure called lmfit$coefficients (or fit$coefficients). The intercept is lmfit$coefficients[1] and the gradient is lmfit$coefficients[2]. This illustrates how you refer to a variable within a structure (using the $ separator) and index a single element in a vector or list. If there are N elements in the list/vector then each element is referenced using an index in square brackets. The first element is [1] and the last is [N].
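To see how the coefficients vector is indexed without needing the gas data, here is a minimal sketch using made-up points scattered about the line y = 2x + 1. The data values are invented for illustration; coef() is a standard extractor function which returns the same vector as lmfit$coefficients.

```r
# Made-up data for illustration: a straight line y = 2x + 1 with small offsets
x <- c(1, 2, 3, 4, 5)
y <- 2*x + 1 + c(0.1, -0.1, 0.05, -0.05, 0.0)

lmfit <- lm(y~x)
fit <- lsfit(x, y, intercept=TRUE)

# Both methods hold the intercept and gradient in a coefficients vector
print(coef(lmfit))
print(fit$coefficients)

# Index the vector to pick out single values
intercept <- lmfit$coefficients[1]
gradient <- lmfit$coefficients[2]
cat("intercept =", intercept, " gradient =", gradient, "\n")

# The standard errors reported by summary() are held in a matrix
print(summary(lmfit)$coefficients[, "Std. Error"])
```

The element names differ between the two results, (Intercept) and x for lm() but Intercept and X for lsfit(), which is why indexing by position is used here.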

You can list just the gradient value (γ for the gas) using the following

cat("The value of gamma for the gas is ", -fit$coefficients[2], "\n")

The cat() function converts each of the arguments to character strings and concatenates them all together as a single line of text and then prints out the result. It provides a neat way of listing out your results from analysis using R. The third argument above, "\n", is a string containing the newline character, which ends the line of text. Alternatively you can specify fill=TRUE (or fill=T) to force a new line.

cat("The value of gamma for the gas is ", -fit$coefficients[2], fill=T)

It is also useful to include text labels etc. on plots. This can be done using the text() command. For example:

text(x,y,paste("gamma for the gas is ", format(-fit$coefficients[2],digits=5)))

This will put the text “gamma for the gas is 1.294” centred at the plotting position x,y. Here x and y stand for the coordinates of a single position on the plot; if you pass the full data vectors the label is drawn at every data point. You should look at the help on text for further options. In the above we have used the function paste() which acts like the cat() function but doesn’t print out the result. We have already used the format() function in Section 2.1.
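The difference between paste() and cat() can be seen in a few lines. The value g below is made up for illustration and is not a result from the gas data.

```r
g <- 1.29384                           # made-up value for illustration

label <- paste("gamma for the gas is", format(g, digits=4))
print(label)                           # paste() returns a character string
cat(label, "\n")                       # cat() writes the text to the Console

# sep controls what paste() puts between the pieces (a single space by default)
print(paste("gamma", "=", format(g, digits=3), sep=""))
```

Because paste() returns a string rather than printing it, the result can be stored and passed on to other functions such as text().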

2.6.1 Exercise 6 - A script to perform linear fitting

(i) Create and run a script to find the value of γ for the gas used to produce the gas.dat data tabulation.

(ii) Add a line to the script to print out the estimated value of γ for the gas. Plot out the data points and the best fit linear regression line and include a text label on your plot giving the resulting value for γ, the ratio of specific heats for the gas.

(iii) Try using both lm() and lsfit() to perform the linear regression. You should get exactly the same coefficients. However you will see that the summary() you get from lm() is more detailed. In particular, the linear model summary produces an estimate of the standard error on each of the coefficients whereas the least squares fit does not. You will learn more about estimating the standard error on results in the Probability and Statistics section of mathematics Module PA1710.

2.7 Including measurement errors in your analysis

When you make measurements in the laboratory you will record an estimate of the error. It is important that you include these errors in your analysis. Below are a few lines of R code which show you how to plot error bars on a set of data points.

# Illustrate how to plot error bars

x<- c(1.95, 2.90, 3.85, 4.80, 5.75, 6.70)

y<- c(1.38, 4.16, 6.19, 6.35, 6.92, 11.1)

sigma<- c(1.3, 1.3, 1.3, 1.3, 1.3, 1.3)

ylo<- y-sigma

yhi<- y+sigma

plot(x, y, type="p", pch=19, ylim=c(min(ylo), max(yhi)))

segments(x, ylo, x, yhi)

The Greek letter lower-case sigma, σ, is often used to represent the Standard Error hence the use of this name above.
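As a possible variation on the script above, the base graphics function arrows() can draw the same vertical bars with small flat caps at each end. This is only a sketch of one alternative, not the method used in the workshop script, and it uses the same data values.

```r
x <- c(1.95, 2.90, 3.85, 4.80, 5.75, 6.70)
y <- c(1.38, 4.16, 6.19, 6.35, 6.92, 11.1)
sigma <- rep(1.3, length(x))           # the same error on every point

plot(x, y, type="p", pch=19, ylim=range(y-sigma, y+sigma))

# angle=90 turns the arrow heads into flat caps; code=3 draws one at both ends
arrows(x, y-sigma, x, y+sigma, length=0.05, angle=90, code=3)
```

Using rep() to build sigma avoids typing the same value six times and keeps the vector the same length as x if more points are added.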

2.7.1 Exercise 7 - Plotting error bars

(i) Look up the segments() function in the R Help.

(ii) Write your own version of a script to plot error bars for the y-values on a graph.

(iii) Modify the script to include error bars on the x-values as well.

2.8 Numerical differentiation and integration

Numerical differentiation and integration are often used in computational physics and there are many sophisticated algorithms to estimate differentials and integrals which are optimized for speed and accuracy depending on the type of function to be analysed.

Below is a simple R script to estimate the gradient of a function. The function is sampled at np points x as the array y. In this example we have chosen y to be the sinc function, sin(x)/x. The key lines are those which assign yp and xp. We are using the syntax y[2:np] etc. to specify sub-vectors or vector slices over an index range so xp and yp both have np − 1 elements. Because we have chosen the sinc function we can cheat and also calculate the analytical result. This is held in the array ya. The accuracy of this simple algorithm is controlled by the choice of the step size dx.

# Simple numerical differentiation

dx<- 0.01

x<- seq(from=0.1, to=10, by=dx)

np<- length(x)

# Set function

y<- sin(x)/x

# Estimate gradient between adjacent pairs

# brackets are needed: 1:np-1 means (1:np)-1 because : binds tighter than -
yp<- (y[2:np]-y[1:(np-1)])/dx

# Set points at which gradient estimated

xp<- x[1:(np-1)]+dx/2

# Calculate the analytical result

ya<- cos(xp)/xp-sin(xp)/xp^2

# Now plot the results

par(mfrow=c(2,2))

plot(x, y, type="l", main="f(x)=sin(x)/x")

plot(xp, yp, type="l", main="numerical estimate of df/dx")

plot(xp, ya, type="l", main="analytical df/dx")

plot(xp, ya-yp, type="l", main="error")

The command par(mfrow=c(2,2)) splits the plotting surface into sub-frames. Each subsequent plot() command will use the next sub-frame in sequence so we get four graphs on one page.
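To explore how the step size dx controls the accuracy of the differentiation script, the sketch below wraps the same calculation in a function and tries three step sizes. The function name graderr is made up for this illustration, and the differences between adjacent samples are formed with the base function diff(). For this mid-point scheme you would expect the error to fall by roughly a factor of 100 each time dx is reduced by a factor of 10, but check this for yourself.

```r
# Mean absolute error of the finite-difference gradient of sin(x)/x
graderr <- function(dx) {
  x <- seq(from=0.1, to=10, by=dx)
  y <- sin(x)/x
  yp <- diff(y)/diff(x)              # differences between adjacent samples
  xp <- x[-length(x)] + dx/2         # mid-points between adjacent samples
  ya <- cos(xp)/xp - sin(xp)/xp^2    # analytical gradient
  return(mean(abs(ya-yp)))
}

for(dx in c(0.1, 0.01, 0.001)) {
  cat("dx =", dx, " mean error =", graderr(dx), "\n")
}
```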

The next script performs a complementary numerical integration. When you perform a numerical integration you are necessarily calculating a definite integral. In the example below the definite integral is

$$ \int_{x_{lo}}^{x_p} \left( \frac{\cos(x)}{x} - \frac{\sin(x)}{x^2} \right) dx $$

The lower limit is fixed but the upper limit is set to a range of values held in the vector xp. The integral is approximated by the cumulative sum function cumsum(). So the routine returns a series of integral results. The accuracy and number of integrals returned are controlled by the variable np.

# Simple numerical integration

xlo<- -10.01

xhi<- -0.01

np<- 1000

dx<- (xhi-xlo)/np

# set x sample positions

x<- seq(from=xlo+dx/2, to=xhi-dx/2, by=dx)

# set upper limit for each sample

xp<- x+dx/2

# Set function to be integrated

y<- cos(x)/x-sin(x)/x^2

# perform integral using cumulative sum

yp<- cumsum(y)*dx

# calculate analytical result for definite integral

ya<- sin(xp)/xp-sin(xlo)/xlo

# Now plot the results

par(mfrow=c(2,2))

plot(x, y, type="l", main="f(x)=cos(x)/x-sin(x)/x^2")

plot(xp, yp, type="l", main="numerical estimate of integral")

plot(xp, ya, type="l", main="analytical integral")

plot(xp, ya-yp, type="l", main="error")
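To see what the cumulative sum is doing in the script above, it can help to try the same scheme on a function with a very simple integral. The sketch below, which uses made-up limits and only 10 samples, integrates f(x) = 2x from 0 and compares the running result with the analytical value, the square of the upper limit. For a straight line the mid-point samples give the exact answer apart from rounding.

```r
xlo <- 0; xhi <- 1; np <- 10
dx <- (xhi-xlo)/np

x <- seq(from=xlo+dx/2, to=xhi-dx/2, by=dx)   # mid-point of each strip
xp <- x + dx/2                                # upper edge of each strip

y <- 2*x                                      # function to be integrated
yp <- cumsum(y)*dx                            # running integral up to each xp

print(cbind(xp, yp, exact=xp^2))              # compare with the analytical result
```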

2.8.1 Exercise 8 - Decoding and modifying algorithms in R scripts

(i) Study the R scripts above carefully. Make sure you understand how they produce a numerical approximation to a differential and a definite integral.

(ii) The numerical integration script uses np samples across the range xlo-xhi. The first sample is at position xlo+dx/2 and the last is at xhi-dx/2. With such an arrangement each sample carries equal weight so we can use the cumulative sum function without complication. The well-known trapezium rule (sometimes called the trapezoidal rule) for performing a numerical integration employs a slightly different sample sequence. The first and the last samples are at the limits xlo and xhi respectively. In this case there are np samples but only np-1 gaps (think of fence posts and fence panels). Using the trapezium rule gives the numerical approximation to the integral as

$$ I = \int_{x_{lo}}^{x_{hi}} f(x)\,dx \approx \Delta x \left[ \frac{1}{2}\left(f_1 + f_{np}\right) + \sum_{i=2}^{np-1} f_i \right] $$

Write your own version of the numerical integration script but modify it to employ the trapezium rule. You will have to change the lines which assign the increment dx, the sample positions x and xp and the definite integral values yp.

2.9 Images and multi-dimensional arrays in R

Above we have only considered 1-dimensional lists or vectors. Each element can be selected as a scalar using an index. So in the data frame read in above we can pick a given value of the volume

> gasdata$V_L[4]

[1] 1.158

You can refer to a subset or slice of a vector using the syntax we introduced in the section on numerical differentiation and integration.

> gasdata$V_L[4:10]

[1] 1.158 1.211 1.263 1.316 1.368 1.421 1.474

Objects with multiple subscripts or indices are called arrays. The dimensionality of an array is held in a dim attribute which is associated with the object concerned. Suppose we set up a vector of 24 elements then we can set the dimensions as follows:

> x<- (1:24)

> dim(x)<- c(6, 2, 2)

> x

, , 1

[,1] [,2]

[1,] 1 7

[2,] 2 8

[3,] 3 9

[4,] 4 10

[5,] 5 11

[6,] 6 12

, , 2

[,1] [,2]

[1,] 13 19

[2,] 14 20

[3,] 15 21

[4,] 16 22

[5,] 17 23

[6,] 18 24

We can pick a single element:

> x[2, 2, 1]

[1] 8
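The listing above shows that the first index varies fastest when a vector is given dimensions. As a short sketch, the position of element [i, j, k] in the original vector can be worked out by hand and compared with the value R returns. The variable pos is introduced here only for illustration.

```r
x <- 1:24
dim(x) <- c(6, 2, 2)

i <- 2; j <- 2; k <- 1
# Position of element [i, j, k] in the original vector (first index varies fastest)
pos <- i + (j-1)*6 + (k-1)*6*2
print(pos)                  # 8
print(x[i, j, k])           # 8, because the vector held the values 1:24

print(dim(x))               # the dim attribute: 6 2 2
print(x[ , 1, 2])           # a slice: every value of the first index for j=1, k=2
```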

The following R script sets up a 2-d array which samples a function f(x, y).

# define a 2-d function

fxy<- function(x, y) {

z<- sin(x+y)*cos(x^2+y^2)

return(z)

}

# select an area on the x,y plane

xleft<- -2; xright<- 4; nx<- 600

ybot<- -1; ytop<- 3; ny<- 400

# set up grid positions

xsam<- (xright-xleft)/nx

ysam<- (ytop-ybot)/ny

x<- seq(from=xleft+xsam/2, by=xsam, length.out=nx)

y<- seq(from=ybot+ysam/2, by=ysam, length.out=ny)

# generate sample array using outer product and 2-d function

z<- outer(x, y, fxy)

# plot rendering of the sample array

layout(1)

image(x, y, z, col=rainbow(200), useRaster=T, asp=1)

All the real work is done by the function outer(x,y,fxy). This generates an array (matrix) from the outer product of the vectors x and y by applying the function fxy to every pair of values (x[i], y[j]). If the indices of x and y are i and j then element [i,j] of the array holds fxy(x[i], y[j]). In R's matrix terminology i is the row index and j is the column index, but image() draws x along the horizontal axis, so in the picture i counts across the columns of pixels and j counts up the rows.
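A small sketch makes the indexing of the outer product easier to check by eye. The function used here, 10x + y, is made up so that each element of the result shows which x and y values produced it. Note that outer() calls the function once with whole vectors of x and y values, so the function must be written using vector arithmetic.

```r
fxy <- function(x, y) {
  return(10*x + y)            # made-up function so each element is easy to check
}

x <- c(1, 2, 3)
y <- c(1, 2)

z <- outer(x, y, fxy)
print(dim(z))                 # 3 2: length(x) rows and length(y) columns
print(z)
print(z[3, 2] == fxy(x[3], y[2]))   # TRUE: z[i, j] is fxy(x[i], y[j])
```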

The plotting function image() produces a picture which represents the array using a colour mapping specified by col=rainbow(200). The useRaster argument produces a bitmap for the plot (which is fast, rather than plotting polygons which is slow) and is only applicable if the x-y grid is regular. The asp=1 specifies the aspect ratio between the 2 axes.

The layout(1) command switches the plotting layout back to 1 frame per page, cancelling any previous layout setting like par(mfrow=c(2,2)). Alternatively you could close the plotting device before starting a new plot using dev.off().

Suppose we want to search for local minima or maxima of the function in the array. To do this we need to compare each point (pixel) with the 8 points (pixels) which surround it. We must employ what is called a repetition structure or “for loop” that looks like the following in R.

for(i in seq) {

do these lines for each i value from seq

}

So if the seq is set to the sequence 1:20 the lines within curly brackets will be repeated with i=1, i=2, ... i=20. Below is code which searches for minima and maxima in the sampled array using “for loops”. There are two nested loops. The outer loop counts through the first index, i, and the inner loop counts through the second index, j. Within these loops we pick out a tile of 3 by 3 elements. The function which.min() treats its argument as a vector and returns the vector index corresponding to the element with the minimum value. For a 3 by 3 array the centre corresponds to vector index 5. So if this value is returned then we must be at a local minimum. The function which.max() does the same thing for the maximum. When a maximum or minimum is found the script plots a white (maximum) or black (minimum) dot. Note that the command points() works in the same way as lines(). It adds points to the existing plot.

# search for local minima or maxima

for(i in 2:(nx-1)) {

for(j in 2:(ny-1)) {

tile<- z[(i-1):(i+1), (j-1):(j+1)]

im<- which.min(tile)

if(im==5) {

points(x[i], y[j], type="p", pch=19, col="black")

}

im<- which.max(tile)

if(im==5) {

points(x[i], y[j], type="p", pch=19, col="white")

}

}

}
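The test used inside the loops above can be tried on a single tile. The sketch below uses a made-up 3 by 3 matrix whose smallest value sits at the centre.

```r
# A 3 by 3 tile whose smallest value is at the centre
tile <- matrix(c(5, 4, 6,
                 3, 1, 7,
                 8, 9, 2), nrow=3, ncol=3)

print(which.min(tile))    # 5: the centre element when counted as a vector
print(which.max(tile))    # 8: element [2, 3], so the centre is not a maximum
```

Remember that matrix() fills the array column by column, so the values typed on each line above become a column of the tile, not a row.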

In R most repetition loops are implicit rather than explicit. Suppose we have

x<- 1:20

y<- sin(x)

You could achieve the same result using an explicit for loop.

for(i in 1:20) {

x[i]<- i

y[i]<- sin(x[i])

}

In R using an explicit “for loop” is inefficient because R is an interpreted language not a compiled language. The implicit “for loop” contained in the R statement y<- sin(x) is both syntactically neat and much more efficient. A similar vector arithmetic syntax using implicit loops is available in Fortran 90 and many modern languages. If the language is compiled then the explicit loops will be as fast as the implicit loops.
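You can measure the difference for yourself using the function system.time(). The sketch below times both forms on a vector of 200000 elements (a size chosen arbitrarily for this illustration); the times you get will depend on your computer and your version of R, so none are quoted here.

```r
n <- 200000
x <- seq(from=0, to=10, length.out=n)

# Implicit loop: one vectorised call
t1 <- system.time(y1 <- sin(x))

# Explicit loop: one call per element
y2 <- numeric(n)                  # create the result vector before the loop
t2 <- system.time(for(i in 1:n) { y2[i] <- sin(x[i]) })

print(t1["elapsed"])
print(t2["elapsed"])
print(identical(y1, y2))          # both approaches give the same values
```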

2.9.1 Exercise 9 - Plotting a function of 2 variables

(i) Implement a script to plot the function sin(x + y) cos(x^2 + y^2) as a false colour image.

(ii) Add code to search for and mark the positions of local maxima and minima of the function.

(iii) Check that the positions plotted do, indeed, correspond to local maxima and minima of the function plotted.

2.10 R object types and attributes

R was designed to manipulate and analyse data and therefore it includes a wide range of data object types. In this section we review the object types you have already encountered above and introduce a few more details which you will find useful.

• Vectors - the basic unit of data in R is a vector, an ordered collection of elements all of one type - logical, numeric (integer or double), complex, character or raw. Special values include NA (not available or missing data), NaN (not a number) and +/- Inf (infinity). If there is only one element then the vector becomes a scalar.

• Lists - these are generic vectors in which elements can be mixed, of any type including list itself, i.e. lists are recursive - you can have lists of lists. Elements within a list can have a name attribute.

• Functions - take other objects as arguments (an ordered group of objects enclosed in parentheses ()), perform some action and return an object as the result. Functions are objects in their own right. They can be manipulated in the same way as any other object - they can be assigned, passed as arguments and returned from functions.

• Matrices and arrays - special types of vector which have a dim (dimensions) attribute. A matrix has two dimensions specified using a vector of length 2 giving the number of rows and columns in the matrix. An array has a dim of length n which can be > 2.

• Data frames - special kind of list in which all the elements are vectors of the same length. So a data frame takes the form of a data table which can be indexed using a row and column.

• Factors - represent the category or rank of data. For example small/medium/large, male/female or numerical order. Factors are also known as “category” and “enumerated type” or ENUM.
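The object types listed above can be inspected from the Console using functions such as typeof(), class(), dim() and levels(). The sketch below builds one small made-up example of most of the types; the single-letter names are chosen only for this illustration.

```r
v <- c(1.5, 2.5, NA)                       # numeric vector with a missing value
l <- list(a=1, b="text", c=list(2, 3))     # list holding mixed types and another list
m <- matrix(1:6, nrow=2, ncol=3)           # vector with a dim attribute
d <- data.frame(x=1:3, y=c("a", "b", "c"))
f <- factor(c("small", "large", "small"))

print(typeof(v))          # "double"
print(is.na(v))           # FALSE FALSE TRUE
print(class(l$c))         # "list"
print(dim(m))             # 2 3
print(is.list(d))         # TRUE: a data frame is a special kind of list
print(levels(f))          # "large" "small"
print(c(0/0, 1/0, -1/0))  # NaN Inf -Inf
```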

There are many ways to define and generate vectors. We have already come across the functions c(), seq() etc.

A list in which each element is named can be produced using the list() function. This is a useful way of collecting together related data values in a single object.

x<- c(1.2,2.3,5.5,8.8,10.1,11.2)

y<- c("up","down","up","up","up","down")

experiment1<-list(date="12 Nov 2012",distance=x,direction=y)
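Once the list exists its named elements can be picked out with the $ separator or with double square brackets, as this short sketch shows. The final line is one example of combining two elements of the list: it averages the distances recorded when the direction was "up".

```r
x <- c(1.2,2.3,5.5,8.8,10.1,11.2)
y <- c("up","down","up","up","up","down")
experiment1 <- list(date="12 Nov 2012",distance=x,direction=y)

print(names(experiment1))            # "date" "distance" "direction"
print(experiment1$distance[3])       # 5.5
print(experiment1[["date"]])         # same as experiment1$date
print(mean(experiment1$distance[experiment1$direction == "up"]))
```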

We have already seen how we define a new function and assign it to a name.

ringing<-function(t,tc,omega) {

return(exp(-t/tc)*sin(omega*t))

}

Matrices and arrays can be set up using the functions matrix() and array().

> y<-matrix(1:20, nrow=5,ncol=4)

> y

[,1] [,2] [,3] [,4]

[1,] 1 6 11 16

[2,] 2 7 12 17

[3,] 3 8 13 18

[4,] 4 9 14 19

[5,] 5 10 15 20

In the exercises above you used the function read.table() to create a data frame using a tabulation of data in a file. You can create a data frame in situ using the data.frame() function. This is useful if you want to list data in tabular form either in the R Console or in an external data file. In the lines below we set up a vector t and then use sapply() to create a sampled array of the function ringing() as defined above. The results are then put into a data frame using the function data.frame().

t<- seq(from=0.0, to=100.0, length.out=1000)

tc<- 20.0

omega<- 5.0

f<- sapply(t,ringing,tc,omega)

a<- data.frame(seconds=t,amplitude=f)

plot(a,type="l")

write.table(a,file="ringing.dat")

Factors in R are stored as a vector of integer values which correspond to the different possible values (or levels) the factor can take. Typically there are a small number of possible values (levels) and these are stored as character strings in R. So setting up a factor can be an efficient way of storing character strings. The direction list in our experiment1 above can be converted to a factor using:

b<- factor(experiment1$direction)

b

[1] up down up up up down

Levels: down up

When we print out the factor at the console we get the original list of values but in addition we get the levels identified as well. We can use the function table() to produce a contingency table of the levels. This shows the number of occurrences at each level.

table(b)

b

down up

2 4

Factors are important in statistical modelling but such details are beyond the scope of this workshop.
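The integer storage of a factor described above can be seen directly using as.integer(). This short sketch rebuilds the same factor from the direction values so that it can be run on its own.

```r
b <- factor(c("up","down","up","up","up","down"))

print(levels(b))          # "down" "up": levels are sorted alphabetically
print(as.integer(b))      # 2 1 2 2 2 1: the integer code stored for each element
print(nlevels(b))         # 2
print(as.character(b))    # recover the original character strings
```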

2.10.1 Exercise 10 - Playing with data

(i) Try producing your own tabulation of the ringing function as defined above. What sort of physical system might produce a signal which behaves like this? What do the constants tc and omega do?

(ii) R has an extensive list of demonstration data sets. You can get a list of these using

data()

Each data set takes the form of a predefine object. For example airquality is a tabulationof airquality measurements in New York. Within this data set the airquality$Month

is given. Use factor() and table() to establish the number of days sampled in eachmonth. Can you identify which of the variables tabulated, Ozone,Solar.R,Wind,Tempare correlated?

(iii) We encourage you to have a look at other data sets to see how they are specified and which object types are used. Try plotting a few graphs to see how the data look.
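As a hint for this exercise, the same kind of analysis can be tried on a different built-in data set. The sketch below uses mtcars, chosen here only for illustration and not part of the exercise, to count the rows at each level of a column and to compute a correlation coefficient with the standard R function cor().

```r
# mtcars is one of the data sets listed by data(); used here only as an illustration
cyl <- factor(mtcars$cyl)      # number of cylinders, treated as a factor
counts <- table(cyl)           # number of cars at each level
print(counts)

# cor() returns the correlation coefficient between two vectors:
# values near +1 or -1 indicate strong correlation, values near 0 indicate little
rho <- cor(mtcars$mpg, mtcars$wt)
print(rho)

# A quick visual check of the same relationship
plot(mtcars$wt, mtcars$mpg, xlab="wt", ylab="mpg")
```

Note that airquality contains missing values (NA) in some columns, so for that data set cor() needs the extra argument use="complete.obs" to skip incomplete rows.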

2.11 Practising R programming

R is a tool for solving problems. In the preceding sections we have introduced all the main language constructs that are used to write a script or computer program in R. To get the solution to a problem using a computer you have to:

(i) Express the problem in computational terms without reference to a particular programming language. This may include defining variables or objects, constructing a logical tree of ifs and buts, setting up iterative sums or repeated loops, defining functions with arguments and returned values, etc.

(ii) Decide on how to implement the above in the language you are going to use. R contains many in-built functions which can do much of the work for you.

(iii) Write a draft program or script in R.

(iv) Run the draft script to produce an answer which you already know to check things are working correctly. This is arguably the most important stage in the process. You must be sure that the program is behaving in the way you think it is behaving.

(v) Edit the draft script so that it solves the problem and run it.
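Stage (iv) can be illustrated with a short sketch. Here a trapezium-rule integration (the function name trapezium and the choice of 1001 sample points are made up for this example) is first run on sin(x) from 0 to π, where the exact answer is known to be 2, before it would be trusted on a problem with an unknown answer.

```r
# Hypothetical helper: trapezium-rule integral of tabulated values y(x)
trapezium <- function(x, y) {
  n <- length(x)
  sum(diff(x) * (y[-1] + y[-n]) / 2)
}

# Test case with a known answer: the integral of sin(x) from 0 to pi is exactly 2
x <- seq(0, pi, length.out=1001)
known <- 2
result <- trapezium(x, sin(x))

# Compare with the known answer before using the function on a new problem
print(result)
print(abs(result - known) < 1e-5)
```

If the comparison fails, the fault lies in the script and not in the problem, which is much easier to diagnose at this stage than after moving on to stage (v).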

2.11.1 Exercise 11 - 2-D random walks - Brownian Motion

Brownian motion is the observed jitter of very small particles suspended in a gas (or liquid). It was originally seen by the botanist Robert Brown in 1827 while looking at pollen grains in water using a microscope. In 1905 Einstein published a paper that attributed the random motion of the grains to collisions with individual water molecules. We can ask the question: How far do we expect the particles to travel in a given time?

This is clearly a statistical problem so we might more properly ask: What is the average distance moved after some fixed time? How does that average distance depend on the fixed time?

It’s easy to set up a simplified mathematical description of the motion in 2 dimensions (the projection seen through the microscope). We assume that the grain moves with constant speed and collides with a molecule after moving a distance jump = 1. After the collision the new direction is specified by an angle theta as shown in Fig. 1. This angle is to be chosen from a uniform distribution from 0 to 2π radians (equal probabilities). The components of the jump are $dx = \cos\theta$ and $dy = \sin\theta$. If we do this n.step times we can find the final position by summing the dx and dy values, $x = \sum_{1}^{\mathrm{n.step}} dx$, $y = \sum_{1}^{\mathrm{n.step}} dy$. The distance travelled is then $r = \sqrt{x^2 + y^2}$. This completes stage (i) above. We have expressed the problem in computational (mathematical) terms.

Figure 1: One step of a 2-D random walk

We now have to convert this into R. Here are a few lines which provide the key steps.

# generate n.step random directions (angles)

n.step <- 100

theta <- runif(n.step, 0, 2*pi)

jump <- 1

# compute the x and y step sizes

dx <- jump * cos(theta)

dy <- jump * sin(theta)

# compute the cumulative x and y positions

x <- c(0, cumsum(dx))

y <- c(0, cumsum(dy))

The function runif() provides samples from a uniform random distribution. The function cumsum() calculates the cumulative sum of a vector. The result is a vector in which each element is the sum of the elements before and including the current element of the argument vector. We used this previously when doing numerical integration.
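A small example may make the behaviour of these two functions concrete. The vector v is an arbitrary choice made for this illustration.

```r
# cumsum() returns the running total of a vector
v <- c(1, 2, 3, 4)
cumsum(v)                # 1 3 6 10

# The last element of the cumulative sum equals the plain sum
s <- cumsum(v)
s[length(s)] == sum(v)   # TRUE

# runif(n, min, max) returns n samples drawn uniformly between min and max
theta <- runif(5, 0, 2*pi)
all(theta >= 0 & theta <= 2*pi)   # TRUE
```

In the random walk the cumulative sums give the position after every step, which is what is needed to plot the path, while the last element alone gives the final position.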

(i) Write an R script incorporating the lines above to generate and plot a 2-D random walk with 100 steps.

(ii) Use the script (and in particular the plot) to verify that your program is working correctly.

(iii) Add lines to calculate, plot and list the distance travelled.

(iv) Now add lines so that the script performs n.walks = 1000 walks and calculates the distance travelled for each walk. This will include a loop something like:

x <- array(0, dim=n.walks)

y <- array(0, dim=n.walks)

for (i in 1:n.walks) {

theta <- runif(n.step, 0, 2*pi)

dx <- jump * cos(theta)

dy <- jump * sin(theta)

x[i] <- sum(dx)

y[i] <- sum(dy)

}

r <- sqrt(x^2 + y^2)

(v) Use your script to find out how the average distance travelled in the 2-D random walk depends on the number of steps (or equivalently the time of observation).
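One possible way to organise part (v), offered as a sketch and not as the expected solution, is to wrap the loop above in a function so that it can be called for several values of n.step. The function name mean.distance and the list of step counts are inventions for this example.

```r
# Hypothetical helper: average distance travelled over n.walks walks of n.step steps
mean.distance <- function(n.step, n.walks=1000, jump=1) {
  r <- numeric(n.walks)
  for (i in 1:n.walks) {
    theta <- runif(n.step, 0, 2*pi)
    x <- sum(jump * cos(theta))
    y <- sum(jump * sin(theta))
    r[i] <- sqrt(x^2 + y^2)
  }
  mean(r)
}

# Arbitrary choice of walk lengths to compare
steps <- c(10, 20, 50, 100, 200, 500)
r.mean <- sapply(steps, mean.distance)

# Logarithmic axes make any power-law dependence appear as a straight line
plot(steps, r.mean, log="xy", xlab="n.step", ylab="average distance")
```

Because the walks are random, the values in r.mean change slightly from run to run. Increasing n.walks reduces this scatter at the cost of a longer run time.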

3 Summary

On completion of the workshop you should be familiar with using R to perform the following.

• Simple scientific calculations

• Plotting graphs

• Defining and using functions

• Using R scripts as computer programs

• Creating, reading and writing ASCII data files

• Linear fitting to data sets

• Plotting measured data points including error bars

• Performing simple numerical differentiation and integration

• Plotting and analysing functions of 2 variables or 2-d images

• Writing an R script to solve simple mathematical/physical problems

You don’t need to remember all the details of the R syntax required. You can always use the R scripts you have created as the starting point for new scripts/programs. However, you should also now be familiar with the following elements of structural/procedural programming.

• declaring/defining and assigning objects or variables

• arithmetic and logical operators

• in-built functions like sin(), cos(), etc.

• indexing multi-dimensional objects, vectors, matrices and arrays

• listing results and controlling the precision of the output

• Input/Output (IO) of data and data files

• defining and using functions (methods or routines)

• referencing components of objects (in R using the object$val syntax)

• if(this) {do something } else {do some other thing}

• for(i in seq) {repeat this for each i}

• handling different object types, vectors, lists and arrays

These elements are common to all structural/procedural programming languages including C and Fortran.
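As a reminder of how several of these elements fit together, here is a compact sketch. The function classify and the list result are names made up for this illustration.

```r
# Define a function with an argument and a returned value
classify <- function(v) {
  if (v >= 0) {
    "non-negative"
  } else {
    "negative"
  }
}

# A list object whose components are referenced with the $ syntax
result <- list(values=c(-1.5, 0, 2.25), labels=character(3))

# Loop over the index of each element, storing a label for each value
for (i in seq_along(result$values)) {
  result$labels[i] <- classify(result$values[i])
}

# List the values to 2 significant figures alongside their labels
print(data.frame(value=signif(result$values, 2), label=result$labels))
```

The same structure (a function, a loop over an index and a conditional test) would carry over to C or Fortran with only the syntax changed.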