intro_to_R1

  • 8/12/2019 intro_to_R1

    1/36

    A Short Introduction to R

    By Richard Harris, School of Geographical Sciences, University of Bristol

    A Short Introduction to R by Richard Harris is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

    Based on a work at www.social-statistics.org.

    You are free:

    to Share: to copy, distribute and transmit the work

    to Remix: to adapt the work

    Under the following conditions:

    Attribution: You must attribute the work in the following manner: Based on A Short Introduction to R by Richard Harris (www.social-statistics.org).

    Noncommercial: You may not use this work for commercial purposes. Use for education in a recognised higher education institution (a University) is permissible.

    Share Alike: If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.

    With the understanding that:

    Waiver: Any of the above conditions can be waived if you get permission from the copyright holder (Richard Harris, rich.harris@bris.ac.uk)

    Public Domain: Where the work or any of its elements is in the public domain under applicable law, that status is in no way affected by the license.

    Other Rights: In no way are any of the following rights affected by the license:

    Your fair dealing or fair use rights, or other applicable copyright exceptions and limitations;

    The author's moral rights;

    Rights other persons may have either in the work itself or in how the work is used, such as publicity or privacy rights.

    Notice: For any reuse or distribution, you must make clear to others the license terms of this work (which applies also to derivatives).

    (Document version 1, 2012)


    Introduction

    This document presents a short introduction to R highlighting some geographical functionality.

    Specifically, it provides:

    A basic overview of the 'nuts and bolts' of R (Session 1)

    Some examples of data analysis and simple mapping in R (Session 2)

    Some further information about the workings of R (Session 3)

    The document is provided in good faith and the contents have been tested by the author. However, use is entirely at the user's risk. Absolutely no responsibility or liability is accepted by the author for consequences arising from howsoever this document is used. It is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License (see above).

    Before starting the following should be considered:

    First, in the original document the pages and, more unusually, the lines are numbered. The reason is educational: it makes directing a class to a specific part of a page easier and faster. For other readers, the line numbers can be ignored.

    Second, the sessions presume that, as well as R, a number of additional R packages (libraries) have been installed and are available to use. The complete list of packages used is RgoogleMaps, png, sp and spdep. To install these packages, use

    > install.packages(c("RgoogleMaps", "png", "sp", "spdep"))

    Further instructions for how to install packages can be found in Section 3.5.1, 'Installing and loading one or more of the packages'.
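    Before installing anything it can help to see which of the four packages are already present. The following is only a sketch: the object names needed, have and missing.pkgs are illustrative, and the installation line is left commented out so that nothing is downloaded by accident.

```r
# Packages named in the text above
needed <- c("RgoogleMaps", "png", "sp", "spdep")

# installed.packages() returns a matrix whose row names are package names
have <- needed %in% rownames(installed.packages())
missing.pkgs <- needed[!have]

if (length(missing.pkgs) > 0) {
  message("Not yet installed: ", paste(missing.pkgs, collapse = ", "))
  # install.packages(missing.pkgs)   # uncomment to install the missing ones
} else {
  message("All four packages are already installed")
}
```

    Installed packages still need to be loaded with library() in each new R session.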

    Third, each session is written to be completed in a single sitting. If that is not possible, then it would normally be possible to stop at a convenient point, save the workspace before quitting R, then reload the saved workspace when you wish to continue. Note, however, that whilst the additional packages (libraries) need only be installed once, they must be loaded each time you begin again in R and require them. Any objects that were attached before quitting R also need to be attached again to take you back to the point at which you left off. See the sections entitled 'Saving and loading workspaces', 'Attaching a data frame' and 'Installing and loading one or more of the packages'.


    Session 1: Getting Started with R

    This session provides a brief introduction to how R works and introduces some of the more common commands and procedures.

    1.1 About R

    R is an open source software package, licensed under the GNU General Public Licence. You can obtain and install it for free, with versions available for PCs, Macs and Linux. To find out what's available, go to the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/

    Being free is not necessarily a good reason to use R. However, R is not just free, it is also well developed, well documented, widely used and well supported by an extensive user community. It is not just software for 'hobbyists'. It is widely used in research, both academic and commercial.

    In his book R in a Nutshell (O'Reilly, 2010), Joseph Adler writes, "R is very good at plotting graphics, analyzing data, and fitting statistical models using data that fits in the computer's memory."

    Nevertheless, no software is a perfect tool for every job and Adler adds that "it's not good at storing data in complicated structures, efficiently querying data, or working with data that doesn't fit in the computer's memory."

    To these caveats it should be added that R does not offer spreadsheet editing of data to the level found, for example, in Microsoft Excel. Consequently, it is often easier to prepare and 'clean' data prior to loading them into R. There is an add-in to R that provides some integration with Excel. Go to http://rcom.univie.ac.at/ and look for RExcel.

    A possible barrier to learning R is that it is generally command-line driven. That is, the user types a command that the software interprets and responds to. This can be daunting for those who are used to extensive graphical user interfaces (GUIs) with drop-down menus, tabs, pop-up menus, left or right-clicking and other navigational tools to steer you through a process. It may mean that R takes a while longer to learn; however, that time is well spent. Once you know the commands it is usually much faster to type them than to work through a series of menu options. They can be easily edited to change things such as the size or colour of symbols on a graph, and a log or script of the commands can be saved for use on another occasion.

    Saying that, a fairly simple and platform independent GUI called R Commander can be installed (see http://cran.r-project.org/web/packages/Rcmdr/index.html). Field et al.'s book Discovering Statistics Using R provides a comprehensive introduction to statistical analysis in R using both command-lines and R Commander.

    1.2 Getting Started

    Assuming R has been installed in the normal way on your computer, clicking on the link/shortcut to R on the desktop will open the RGui, offering some drop-down menu options, and also the R Console, within which R commands are typed and executed. The appearance of the RGui differs a little depending upon the operating system being used (Windows, Mac or Linux) but having used one it should be fairly straightforward to navigate around another.


    1.3.2 Logging

    You can save the contents of the R Console window to a text file. The easiest way to do this is to click on the R Console (to take the focus from the Scripting window) and then use File → Save History (in Windows) or File → Save As (Mac). Note that graphics are not usually plotted in the R Console and therefore need to be saved separately.

    1.4 Some R Basics

    1.4.1 Functions, assignments and getting help

    It is helpful to understand R as an object-oriented system that assigns information to objects within the current workspace. The workspace is simply all the objects that have been created or loaded since beginning the session in R. Look at it this way: the objects are like box files, containing useful information, and the workspace is a larger storage box, keeping the information together. A useful feature of this is that R can operate on multiple tables of data at once: they are just stored as separate objects within the workspace.

    To view the objects currently in the workspace, type

    > ls()
    character(0)

    Doing this runs the function ls(), which lists the contents of the workspace. The result, character(0), indicates that the workspace is empty (assuming it currently is).

    To find out more about a function, type ? or help with the function name,

    > ?ls()
    > help(ls)

    This will provide details about the function, including examples of its use. It will also list the arguments required to run the function, some of which may be optional and some of which may have default values which can be changed if required. Consider, for example,

    > ?log()

    A required argument is x, which is the data value or values. Typing log() omits any data and generates an error. However, log(100) works just fine. The argument base takes a default value of e (that is, exp(1)), which is approximately 2.72 and means the natural logarithm is calculated. Using log(100, base=10) gives the common logarithm, which can also be calculated using the convenience function log10(100).
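    The behaviour of the base argument can be checked directly. This is a small sketch using only base R; the object names are illustrative.

```r
x <- 100

natural <- log(x)              # base is left at its default, exp(1)
common <- log(x, base = 10)    # base supplied by name
also.common <- log10(x)        # the convenience function
binary <- log(8, 2)            # arguments matched by position: x = 8, base = 2

print(c(natural = natural, common = common, binary = binary))
print(isTRUE(all.equal(common, also.common)))
```

    Supplying base by name, as in log(x, base = 10), tends to be easier to read than relying on the position of the argument.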

    The results of mathematical expressions can be assigned to objects, as can the outcome of many commands executed in the R Console. When the object is given a name that is different to other objects within the current workspace, a new object will be created. Where the name and object already exist, the previous contents of the object will be over-written, without warning, so be careful!

    > a <- 10 - 5
    > print(a)
    [1] 5
    > b <- 10 * 2
    > print(b)
    [1] 20
    > print(a * b)
    [1] 100
    > a <- a * b
    > print(a)
    [1] 100

    In these examples the assignment is achieved using the combination of < and -, as in a <- 100. Alternatively, 100 -> a could be used or, more simply, a = 100. The print(..) command can often be omitted, though it is useful, and sometimes necessary (for example, when what you had hoped would appear on screen doesn't).

    > f = a * b
    > print(f)
    [1] 2000
    > f
    [1] 2000
    > sqrt(b)
    [1] 4.472136
    > print(sqrt(b), digits=3)   # The additional parameter now specifies
                                 # the number of significant figures
    [1] 4.47
    > c(a,b)   # The c(...) function combines its arguments
    [1] 100  20
    > c(a,sqrt(b))
    [1] 100.000000   4.472136
    > print(c(a,sqrt(b)), digits=3)
    [1] 100.00   4.47

    Although the naming of objects is flexible, there are some exceptions,

    > 2a <- 10
    Error: unexpected symbol in "2a"

    Note also that R is case sensitive, so a and A are different objects

    > a <- 10
    > A <- 20
    > a == A
    [1] FALSE

    The following is not sensible because it won't appear in the workspace, although it is there,

    > .a <- 10
    > ls()
    [1] "a" "b" "f"
    > .a
    [1] 10
    > rm(.a, A)   # Removes the objects .a and A (see below)
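    Objects whose names begin with a dot are hidden from ls() by default, but they can be listed with the all.names argument. The sketch below uses a separate scratch environment (env, an illustrative name) rather than the workspace itself, so that it does not disturb any existing objects.

```r
env <- new.env()                 # a scratch environment standing in for the workspace
assign("a", 10, envir = env)
assign(".a", 10, envir = env)    # the name begins with a dot, so it is hidden

visible <- ls(env)                        # hidden objects are not listed
everything <- ls(env, all.names = TRUE)   # hidden objects are included

print(visible)
print(everything)
```

    In the workspace itself the equivalent call is ls(all.names = TRUE).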

    1.4.2 Removing objects from the workspace

    From typing ls() we know that the workspace is no longer empty. To remove an object from the workspace it can be referenced explicitly or indirectly by its position in the workspace. To see how the second of these options will work, type

    > ls()
    [1] "a" "b" "f"

    The output returned from the ls() function here is a vector of length three where the first element is the first object (alphabetically) in the workspace, the second is the second object, and so forth. We can access specific elements by using notation of the form ls()[index.number]. So, the first element, the first object in the workspace, can be obtained using,

    > ls()[1]   # Get the brackets right! Some rounded, some square
    [1] "a"
    > ls()[2]
    [1] "b"

    Note how the square brackets [...] are being used to reference specific elements within the vector. Similarly,

    > ls()[3]
    [1] "f"
    > ls()[c(1,3)]
    [1] "a" "f"
    > ls()[c(1,2,3)]
    [1] "a" "b" "f"
    > ls()[c(1:3)]   # 1:3 means the numbers 1 to 3
    [1] "a" "b" "f"

    Using the remove function, rm(...), the first and third objects in the workspace can be removed using

    > rm(list=ls()[c(1,3)])
    > ls()
    [1] "b"

    Alternatively, objects can be removed by name

    > rm(b)

    To delete all the objects in the workspace and therefore empty it, type the following code but (be warned!) there is no undo function. Whenever rm(...) is used the objects are deleted permanently.

    > rm(list=ls())
    > ls()
    character(0)   # In other words, the workspace is empty
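    Because deletion is permanent, it can be safer to list the names that are about to be removed before removing them. The following sketch does this for objects matching a pattern. It works in a scratch environment, and the object names (tmp1, tmp2, keep) are made up for illustration.

```r
env <- new.env()                           # scratch environment, not the workspace
for (nm in c("tmp1", "tmp2", "keep")) assign(nm, 1, envir = env)

doomed <- ls(env, pattern = "^tmp")        # names beginning with "tmp"
print(doomed)                              # inspect before deleting

rm(list = doomed, envir = env)             # remove only those objects
remaining <- ls(env)
print(remaining)
```

    In the workspace the envir argument can be dropped, giving rm(list = ls(pattern = "^tmp")).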

    1.4.3 Saving and loading workspaces

    Because objects are deleted permanently, a sensible precaution prior to using rm(...) is to save the workspace. To do so permits the workspace to be reloaded if necessary and the objects recovered. One way to save the workspace is to use

    > save.image(file.choose(new=T))

    Alternatively, the drop-down menus can be used (File → Save Workspace in the Windows version of the RGui). In either case, type the extension .RData manually else it risks being omitted, making it harder to locate and reload what has been saved. Try creating a couple of objects in your workspace and then save it with the name workspace1.RData

    To load a previously saved workspace, use

    > load(file.choose())

    or the drop-down menus.

    When quitting R, it will prompt to save the workspace image. If the option to save is chosen it will be saved to the file .RData within the working directory. Assuming that directory is the default one, the workspace will be reloaded automatically each and every time R is opened, which could be useful or it could be irritating. To stop it, locate and delete the file. The current working directory is identified using the get working directory function, getwd(), and changed most easily using the drop-down menus.

    > getwd()
    [1] "/Users/ggrjh"

    Your working directory will differ from the above.

    A good strategy for file management is to create a new folder for each project in R, saving the workspace regularly in it using a naming convention such as Dec_8_1.RData, Dec_8_2.RData etc. That way you can easily find and recover work.
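    Saving and reloading can also be done without the file chooser by giving the file name directly. This is a sketch only: it writes to a temporary folder (so as not to clutter the working directory) and restores the objects into a separate environment, here called restored, to show that they come back intact.

```r
a <- 10
b <- 20

# Save two named objects to a file with an explicit .RData extension
tmp <- file.path(tempdir(), "workspace1.RData")
save(a, b, file = tmp)

# Load the file into a separate environment to inspect what it contains
restored <- new.env()
load(tmp, envir = restored)
print(ls(restored))
print(get("b", envir = restored))
```

    To save every object in the workspace rather than named ones, save.image() accepts a file name in the same way.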

    1.5 Quitting R

    Before quitting R, you may wish to save the workspace. To quit R use either the drop-down menus or

    > q()

    As promised, you will be prompted whether to save the workspace. Answering yes will save the workspace to the file .RData in the current working directory (see section 1.4.3, 'Saving and loading workspaces', above). To exit without the prompt, use

    > q(save = "no")

    Or, more simply,

    > q("no")

    1.6 Getting Help

    In addition to the use of the ? or help(...) documentation and the material available at CRAN, http://cran.r-project.org, R has an active user community. Helpful mailing lists can be accessed from www.r-project.org/mail.html.

    Perhaps the best all round introduction to R is An Introduction to R, which is freely available at CRAN (http://cran.r-project.org/manuals.html) or by using the drop-down Help menus in the RGui. It is clear and succinct.

    I also have a free introduction to statistical analysis in R which accompanies the book Statistics for Geography and Environmental Science. It can be obtained from http://www.social-statistics.org/?p=354.

    There are many books available. My favourite, with a moderate level statistical leaning and written with clarity is,

    Maindonald, J. & Braun, J., 2007. Data Analysis and Graphics using R (2nd edition). Cambridge: CUP.

    I also find useful,

    Adler, J., 2010. R in a Nutshell. O'Reilly: Sebastopol, CA.

    Crawley, MJ, 2005. Statistics: An Introduction using R. Chichester: Wiley (which is a shortened version of The R Book by the same author).

    Field, A., Miles, J. & Field, Z., 2012. Discovering Statistics Using R. London: Sage

    However, none of these books is about mapping or spatial analysis (of particular interest to me as a geographer). For that, the authoritative guide making the links between geographical information [...] is Applied Spatial Data Analysis with R. Berlin: Springer.

    Also helpful is,

    Ward, M.D. & Skrede Gleditsch, K., 2008. Spatial Regression Models. London: Sage. (Which uses R code examples).

    The following book has a short section of maps as well as other graphics in R (and is also, as the title suggests, good for practical guidance on how to analyse surveys using cluster and stratified sampling, for example):

    Lumley, T., 2010. Complex Surveys. A Guide to Analysis Using R. Hoboken, NJ: Wiley.

    Springer publish an ever-growing series of books under the banner Use R! If you are interested in visualization, time-series analysis, Bayesian approaches, econometrics, data mining, ..., then you'll find something of relevance at http://www.springer.com/series/6[...]

    Next the number of columns and rows, and a check (row-by-row) to see if the data are complete (have no missing data).

    > ncol(schools.data)
    > nrow(schools.data)
    > complete.cases(schools.data)

    It is not the most comprehensive check but everything appears to be in order.
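    With many rows, the output of complete.cases() is long and easy to misread, so it can help to summarise it. The sketch below uses a small made-up data frame (toy) in place of schools.data, since the real file is not available here.

```r
# A made-up data frame with two deliberately missing values
toy <- data.frame(attainment = c(27.1, NA, 30.4),
                  fsm = c(0.21, 0.35, NA))

n.incomplete <- sum(!complete.cases(toy))   # rows with at least one NA
na.by.column <- colSums(is.na(toy))         # missing values in each column

print(n.incomplete)
print(na.by.column)
```

    Applied to schools.data, a result of zero from the first calculation would correspond to the statement that everything appears to be in order.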

    2.3 Some simple graphics

    The file schools.csv contains information about the location and some attributes of schools in Greater London (in 2008). The locations are given as a grid reference (Easting, Northing). The information is not real but is realistic. It should not, however, be used to make inferences about real schools in London.

    Of particular interest is the average attainment on leaving primary school of pupils entering their first year of secondary school. Do some schools in London attract higher attaining pupils more than others? The variable attainment contains this information.

    A stripchart and then a histogram will show that (not surprisingly) there is variation in the average prior attainment by school.

    > attach(schools.data)
    > stripchart(attainment, method="stack", xlab="Mean Prior Attainment by School")
    > hist(attainment, col="light blue", border="dark blue", freq=F, ylim=c(0,0.30),
    + xlab="Mean attainment")

    Here the histogram is scaled so the total area sums to one. To this we can add a rug plot,

    > rug(attainment)

    also a density curve, a Normal curve for comparison and a legend.

    > lines(density(sort(attainment)))
    > xx <- seq(from=23, to=35, by=0.1)
    > yy <- dnorm(xx, mean(attainment), sd(attainment))
    > lines(xx, yy, lty="dotted")
    > rm(xx, yy)
    > legend("topright", legend=c("density curve","Normal curve"),
    + lty=c("solid","dotted"))
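    The statement that the histogram is scaled so the total area sums to one can be checked numerically. The sketch below uses simulated values (x) as a stand-in for the attainment variable, which is an assumption made only so the example is self-contained.

```r
set.seed(1)
x <- rnorm(200, mean = 28, sd = 2)   # hypothetical stand-in for attainment

h <- hist(x, plot = FALSE)           # compute the bins without drawing anything

# Area of each bar is its height (density) multiplied by its width
area <- sum(h$density * diff(h$breaks))
print(area)
```

    This is the scaling used when freq=F is set, and it is what allows the density and Normal curves to be drawn on the same vertical axis as the bars.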

    It would be interesting to know if attainment varies by school type. A simple way to consider this is to produce a box plot. The data contain a series of dummy variables for each of a series of school types (Voluntary Aided Church of England: coe = 1; Voluntary Aided Roman Catholic: rc = 1; Voluntary controlled faith school: vol.con = 1; another type of faith school: other.faith = 1; a selective school (with an entrance exam): selective = 1). We will combine these into a single, categorical variable then produce the box plot showing the distribution of average attainment by school type.

    First the categorical variable:

    > school.type <- rep("Not Faith/Selective", times=nrow(schools.data))
    > school.type[coe==1] <- "VA CoE"
    > school.type[rc==1] <- "VA RC"
    > school.type[vol.con==1] <- "VC"
    > school.type[other.faith==1] <- "Other Faith"
    > school.type[selective==1] <- "Selective"
    > school.type <- factor(school.type)
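    The recoding of dummy variables into a single factor can be tried on a very small example. This is a sketch under stated assumptions: the data frame toy and its values are made up, and only two of the dummy variables are included to keep it short.

```r
# Four made-up schools: one CoE, one selective, two that are neither
toy <- data.frame(coe = c(1, 0, 0, 0),
                  selective = c(0, 0, 1, 0),
                  attainment = c(27.5, 26.1, 31.2, 25.8))

# Start with the default label, then overwrite it wherever a dummy equals 1
type <- rep("Not Faith/Selective", times = nrow(toy))
type[toy$coe == 1] <- "VA CoE"
type[toy$selective == 1] <- "Selective"
type <- factor(type)

# Mean attainment for each school type
means <- tapply(toy$attainment, type, mean)
print(means)
```

    The order of the assignments matters when a row has more than one dummy equal to 1: the last matching assignment wins.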

    > levels(school.type)
    [1] "Not Faith/Selective" "Other Faith" "Selective" [etc.]

    Now the box plots:

    > par(mai=c(1,1.4,0.5,0.5))   # Changes the graphic margins
    > boxplot(attainment ~ school.type, horizontal=T, xlab="Mean attainment", las=1,
    + cex.axis=0.8)   # Includes options to draw the boxes and labels horizontally
    > abline(v=mean(attainment), lty="dashed")   # Adds the mean value to the plot
    > legend("topright", legend="Grand Mean", lty="dashed")

    Figure 2.1. A histogram with annotation in R

    Figure 2.2. Mean prior attainment by school type

    Not surprisingly, the selective schools recruit the pupils with highest average prior attainment.


    schools in London by the proportion of their intake who are free school meal eligible. (The result is the regression line shown on the scatterplot above).

    The second adds a variable giving the proportion of the intake of a white ethnic group.

    The third adds a dummy variable indicating whether the school is selective or not.

    > model1 <- lm(attainment ~ fsm, data=schools.data)
    > summary(model1)

    Call:
    lm(formula = attainment ~ fsm, data = schools.data)

    Residals/ Min 1S Median -S Ma2.5541 !.41- !.1153 !.54 -.3351

    Loe&&icients/ =stimate Btd. =%%o% t 'ale H%(>t)(Gnte%cept) 26.316! !.115 25.12 72e13 $$$

    &sm 3.36 !.-3!- 15.14 72e13 $$$Bigni&. codes/ ! U$$$V !.!!1 U$$V !.!1 U$V !.! U.V !.1 U V 1

    Residal standa%d e%%o%/ 1.145 on -3 deg%ees o& &%eedomMltiple Rs*a%ed/ !.4,?dCsted Rs*a%ed/ !.4-3@statistic/ --!.- on 1 and -3 P@, p'ale/ 7 2.2e13

    > model2 <- lm(attainment ~ fsm + white, data=schools.data)
    > summary(model2)

    Call:
    lm(formula = attainment ~ fsm + white, data = schools.data)

    Residals/ Min 1S Median -S Ma2.62 !.426 !.1-- !.111 -.45-4

    Loe&&icients/ =stimate Btd. =%%o% t 'ale H%(>t)(Gnte%cept) -!.12! !.1646 12.21 7 2e13 $$$&sm 4.2!2 !.21 14.2! 7 2e13 $$$;hite !.5422 !.2463 -.12 !.!!163 $$

    Bigni&. codes/ ! U$$$V !.!!1 U$$V !.!1 U$V !.! U.V !.1 U V 1

    Residal standa%d e%%o%/ 1.13 on -3 deg%ees o& &%eedomMltiple Rs*a%ed/ !.554, ?dCsted Rs*a%ed/ !.56@statistic/ 14-.6 on 2 and -3 P@, p'ale/ 7 2.2e13

    > model3 <- update(model2, . ~ . + selective)
    > summary(model3)

    Call:
    lm(formula = attainment ~ fsm + white + selective, data = schools.data)


    Residals/ Min 1S Median -S Ma2.3232 !.32! !.!-4 !.3!4 -.321

    Loe&&icients/ =stimate Btd. =%%o% t 'ale H%(>t)(Gnte%cept) 26.14!3 !.1356 142.412 72e13 $$$

    &sm .2-51 !.-61 1.53 72e13 $$$;hite !.2266 !.226 1.!22 !.-!4selecti'e -.435 !.2--5 1.542 72e13 $$$Bigni&. codes/ ! U$$$V !.!!1 U$$V !.!1 U$V !.! U.V !.1 U V 1

    Residal standa%d e%%o%/ !.6156 on -3- deg%ees o& &%eedomMltiple Rs*a%ed/ !.352-, ?dCsted Rs*a%ed/ !.3463@statistic/ 26.5 on - and -3- P@, p'ale/ 7 2.2e13

    Looking at the adjusted R-squared value, each model appears to be an improvement on the one that precedes it (marginally so for model 2). However, looking at the last model (3), we may suspect that we could drop the white ethnicity variable with no significant loss in the amount of variance explained. An analysis of variance confirms that to be the case.

    > model4 <- update(model3, . ~ . - white)
    > anova(model4, model3)
    Analysis of Variance Table

    Model 1/ attainment N &sm + selecti'eModel 2/ attainment N &sm + ;hite + selecti'e Res.P& RBB P& Bm o& B* @ H%(>@)1 -3 -!4.22 -3- -!3. 1 !.55222 1.!4 !.-!4

    The residual error, measured by the residual sum of squares (RSS), is not very different for the two models, and that difference, 0.882, is not significant (F = 1.045, p = 0.307).
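    The comparison of nested models can be reproduced in outline with simulated data. In this sketch all the names (x1, x2, y, full, reduced) and values are made up, and x2 is constructed to have no real effect on y, so dropping it would not be expected to matter much.

```r
set.seed(42)
n <- 200
x1 <- runif(n)
x2 <- runif(n)
y <- 30 - 5 * x1 + rnorm(n)          # x2 plays no part in generating y

full <- lm(y ~ x1 + x2)
reduced <- update(full, . ~ . - x2)  # drop x2, keep everything else

tab <- anova(reduced, full)          # F test for the dropped term
print(tab)

# The 'Sum of Sq' entry is the difference between the two RSS values
rss.diff <- tab$RSS[1] - tab$RSS[2]
print(rss.diff)
```

    The simpler model goes first in the call to anova(), which matches the order used for model4 and model3 in the text.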

    2.5 Some simple maps

    The schools data contain geographical coordinates and are therefore geographical data. Consequently they can be mapped. The simplest way for point data is to use a 2-dimensional plot, making sure the aspect ratio is fixed correctly.

    > plot(Easting, Northing, asp=1, main="Map of London schools")

    Amongst the attribute data for the schools, the variable esl gives the proportion of pupils who speak English as an additional language. It would be interesting for the size of the symbol on the map to be proportional to it.

    > plot(Easting, Northing, asp=1, main="Map of London schools",
    + cex=sqrt(esl*5))

    It might also be nice to add a little colour to the map. We might, for example, change the default plotting 'character' to a filled circle with a yellow background.

    > plot(Easting, Northing, asp=1, main="Map of London schools",
    + cex=sqrt(esl*5), pch=21, bg="yellow")

    A more interesting option would be to have the circles filled with a colour gradient that is related to a second variable in the data (the proportion of pupils eligible for free school meals, for example).

    To achieve this, we can begin by creating a simple colour palette:

    > palette <- c("yellow","orange","red","purple")

    We now cut the free school meals eligibility variable into quartiles (four classes, each containing approximately the same number of observations).

    > map.class <- cut(fsm, quantile(fsm), labels=FALSE, include.lowest=TRUE)

    What has happened is that the fsm variable has been split into four groups with the value 1 given to the first quarter of the data (schools with the lowest proportions of eligible pupils), the value 2 given to the next quarter, then 3, and finally the value 4 for schools with the highest proportions of FSM eligible pupils.

    There are, then, now four map classes and the same number of colours in the palette. Schools in map class 1 (and with the lowest proportion of fsm-eligible pupils) will be coloured yellow, the next class will be orange, and so forth.

    Bringing it all together,

    > plot(Easting, Northing, asp=1, main="Map of London schools",
    + cex=sqrt(esl*5), pch=21, bg=palette[map.class])

    It would be good to add a legend, and perhaps a scale bar and North arrow. Nevertheless, as a first map in R this isn't too bad!

    Figure 2.3. A simple point map in R
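    The way cut() and quantile() work together to assign a colour to each point can be seen without drawing a map. In this sketch p is a vector of made-up proportions standing in for the fsm variable, and cols is an illustrative name for the resulting vector of colours.

```r
set.seed(7)
p <- runif(100)                       # hypothetical proportions, one per school

# quantile(p) gives the minimum, the three quartiles and the maximum,
# which cut() then uses as the boundaries of four classes
cls <- cut(p, quantile(p), labels = FALSE, include.lowest = TRUE)
counts <- table(cls)
print(counts)

# Indexing the palette by class number gives one colour per observation
cols <- c("yellow", "orange", "red", "purple")[cls]
print(head(cols))
```

    Without include.lowest=TRUE the smallest value would fall outside the first class and be given NA, leaving that point without a fill colour.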

    Why don't we be a bit more ambitious and overlay the map on a Google Maps tile, adding a legend as we do so? This requires us to load an additional library for R and to have an active Internet connection.

    > library(RgoogleMaps)

    If it hasn't been installed, it could be using install.packages(c("RgoogleMaps","png")) (which installs both it and another package, png, that it requires for many functions).

    Assuming that the data frame, schools.data, remains in the workspace and attached (it will be if you have followed the instructions above), and that the colour palette created above has not been deleted, then the map shown in Figure 2.4 is created with the following code:

    > MyMap <- MapBackground(lat=Lat, lon=Long)
    > PlotOnStaticMap(MyMap, Lat, Long, cex=sqrt(esl*5), pch=21,
    + bg=palette[map.class])
    > legend("topleft", legend=paste("<",tapply(fsm, map.class, max)), pch=21,
    + pt.bg=palette, pt.cex=1.5, bg="white", title="P(FSM-eligible)")
    > legVals <- seq(from=0.2,to=1,by=0.2)
    > legend("topright", legend=round(legVals,3), pch=21, pt.bg="white",
    + pt.cex=sqrt(legVals*5), bg="white", title="P(ESL)")

    Remember that the data are simulated. The points shown on the map are not the true locations of schools in London.

    Figure 2.4. A slightly less simple map produced in R

    2.6 Some simple geographical analysis

    Remember the regression models from earlier? It would be interesting to test the assumption that the residuals exhibit independence by looking for spatial dependencies. To do this we will consider to what degree the residual value for any one school correlates with the mean residual value for its six nearest other schools (the choice of six is completely arbitrary).

    First, we will take a copy of the schools data and convert that into an explicitly spatial object in R:

    > detach(schools.data)
    > schools.xy <- schools.data
    > library(sp)
    > attach(schools.xy)
    > coordinates(schools.xy) <- c("Easting", "Northing")
    > # Converts into a spatial object
    > class(schools.xy)
    > detach(schools.xy)
    > proj4string(schools.xy) <- CRS("+proj=tmerc datum=OSGB36")
    > # Sets the Coordinate Referencing System

    Second, we find the six nearest neighbours for each school.

    > library(spdep)
    > nearest.six <- knearneigh(schools.xy, k=6, RANN=F)
    > # RANN = F to override the use of the RANN package that may not be installed

    We can learn from this that the six nearest schools to the first school in the data (row 1) are schools 5, 38, 2, 40, 223 and 6:

    > nearest.six$nn[1,]
    [1]   5  38   2  40 223   6

    The neighbours object, nearest.six, is an object of class knn:

    > class(nearest.six)

    It is next converted into the more generic class of neighbours.

    > neighbours <- knn2nb(nearest.six)
    > class(neighbours)
    [1] "nb"
    > summary(neighbours)
    Neighbour list object:
    Number of regions: 367
    Number of nonzero links: 2202
    Percentage nonzero weights: 1.634877
    Average number of links: 6
    [etc.]

    The connections between each point and its neighbours can then be plotted. It may take a few minutes.

    > plot(neighbours, coordinates(schools.xy))

    Having identified the six nearest neighbours to each school we could give each equal weight in a spatial weights matrix or, alternatively, decrease the weight with distance away (so the first nearest neighbour gets most weight and the sixth nearest the least). Creating a matrix with equal weight given to all neighbours is straightforward.

    > spatial.weights <- nb2listw(neighbours)

    (The other possibility will not be considered further here but is achieved by creating then supplying a list of general weights to the function)

    We now have all the information required to test whether there are spatial dependencies in the residuals. The answer is yes (Moran's I = 0.218, p < 0.001, indicating positive spatial autocorrelation).

    > lm.morantest(model4, spatial.weights)

    Dloal Mo%ans G &o% %eg%ession %esidals

    data/model/ lm(&o%mla 9 attainment N &sm + selecti'e, data 9 schools.data);eights/ spatial.;eightsMo%an G statistic standa%d de'iate 9 4.612, p'ale 9 1.2-e1alte%nati'e hpothesis/ g%eate%sample estimates/

    se%'ed Mo%ans G =pectation Ka%iance!.215161352 !.!!-554! !.!!!454!115
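    The idea behind the test can be illustrated in base R, without the sp or spdep packages. The following is a sketch under stated assumptions, not the calculation performed by lm.morantest(): the coordinates (xy) and residuals (e) are simulated, each point's six nearest neighbours are given equal weight, and only the Moran's I statistic itself is computed (no expectation, variance or p-value).

```r
set.seed(1)
n <- 50
k <- 6                                  # number of nearest neighbours
xy <- cbind(runif(n), runif(n))         # made-up point locations
e <- rnorm(n)
e <- e - mean(e)                        # centred, as regression residuals are

# Distances between all pairs of points; a point is not its own neighbour
d <- as.matrix(dist(xy))
diag(d) <- Inf

# Row-standardised weights: 1/k for each of the k nearest neighbours
W <- matrix(0, n, n)
for (i in seq_len(n)) {
  nn <- order(d[i, ])[1:k]
  W[i, nn] <- 1 / k
}

# Mean residual among each point's neighbours
lagged <- as.vector(W %*% e)

# Moran's I for row-standardised weights: sum(e * We) / sum(e^2)
moran.I <- sum(e * lagged) / sum(e^2)
print(moran.I)
```

    Because the simulated residuals here are independent of location, the statistic would be expected to be close to zero, in contrast to the positive value reported for the schools data.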

    > save.image(file.choose(new=T))

    > rm(list=ls())   # Be careful, it deletes everything!

    Session 3: A Litt$e ,ore about the wor(ings o& R

    2his session provides a little ore guidances on the 3inner (or#ings3 of R% All the coands are

    contained in file session$%R and can be run using it see 3Scripting3on p%@-%

    3.1 Classes and types

    Let us create two objects, each a vector containing ten elements. The first will be the numbers from one to ten, recorded as integers. The second will be the same sequence but now recorded as real numbers (that is, 'floating point' numbers, those with a decimal place).

    > b <- 1:10
    > b
     [1]  1  2  3  4  5  6  7  8  9 10
    > c <- seq(from=1.0, to=10.0, by=1)
    > c
     [1]  1  2  3  4  5  6  7  8  9 10

    Note that in the second case, we could just type,

    > c <- seq(1, 10, 1)
    > c
     [1]  1  2  3  4  5  6  7  8  9 10

    This works because if we don't explicitly define the argument (so omit from=1 etc.) then R will assume that we are giving values to the arguments in their default order, which in this case is from, to and by. Type ?seq and look under Usage for this to make a little more sense.
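The matching of arguments by position or by name can be checked directly. The object names `a1`, `a2` and `a3` below are made up for this sketch.

```r
# Three ways of calling seq() that give the same values
a1 <- seq(from = 1, to = 10, by = 1)   # all arguments named
a2 <- seq(1, 10, 1)                    # matched by position: from, to, by
a3 <- seq(by = 1, to = 10, from = 1)   # named arguments can come in any order

identical(a1, a2)   # TRUE
identical(a1, a3)   # TRUE
```

Naming the arguments takes more typing but makes the call easier to read and protects against getting the order wrong.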

    In any case, the two objects, b and c, are printed the same on screen but one is an object of class integer whereas the other is an object of class numeric and of type double (double precision in the memory space).

    > class(b)
    [1] "integer"
    > class(c)
    [1] "numeric"
    > typeof(c)
    [1] "double"

    Often it is possible to coerce an object from one class and type to another.

    > b <- 1:10
    > class(b)
    [1] "integer"
    > b <- as.double(b)
    > class(b)
    [1] "numeric"
    > typeof(b)
    [1] "double"
    > class(c)
    [1] "numeric"
    > c <- as.integer(c)
    > class(c)
    [1] "integer"
    > c
     [1]  1  2  3  4  5  6  7  8  9 10
    > c <- as.character(c)
    > class(c)
    [1] "character"
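Coercion does not always preserve the values. The short sketch below, with made-up object names `d` and `n`, contrasts a harmless conversion with two that lose or fail to recover information.

```r
# Coercion that keeps the values intact
b <- 1:10
d <- as.double(b)
class(b); class(d)          # "integer" then "numeric"

# Coercion that loses information
as.integer(3.9)             # 3: the decimal part is dropped, not rounded
as.integer("12")            # 12: a character string holding a number is converted

# Coercion that fails for some elements gives NA (normally with a warning)
n <- suppressWarnings(as.numeric(c("1", "2", "three")))
is.na(n)                    # FALSE FALSE TRUE
```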


    > set.seed(101)
    > var2 <- 3 * var1 + 10 + rnorm(100, 0, 25)
    # which, because n, mean and sd are the first three arguments into rnorm
    # is the same as writing var2 <- 3 * var1 + 10 + rnorm(n=100, mean=0, sd=25)
    > head(var2)

    Next, the two variables are gathered together in a data table, of class data frame, where each row is an observation and each column is a variable. There is more about data frames on page 27, in Section 3.2 ('Data frames').

    > mydata <- data.frame(x = var1, y = var2)
    > class(mydata)
    [1] "data.frame"
    > head(mydata)
    > nrow(mydata) # The number of rows in the data
    [1] 100
    > ncol(mydata) # The number of columns
    [1] 2

    In this case, plotting the data frame will produce a scatter plot. (The line of best fit also shown in Figure 3.1 will be added shortly).

    > plot(mydata)

    If there had been more than two columns in the data table, or if they had not been arranged in x, y order, then the plot could be produced by referencing the columns directly. All the following are equivalent:


    > with(mydata, plot(x, y)) # Here the order is x, y
    > with(mydata, plot(y ~ x)) # Here it is y ~ x
    > plot(mydata$x, mydata$y)
    > plot(mydata[,1], mydata[,2]) # Plot using the first and second columns
    > plot(mydata[,2] ~ mydata[,1])

    The attach(...) command could also be used. This is introduced in Section 3.2.2, 'Attaching a data frame'.

    > model1 <- lm(y ~ x, data=mydata) # lm is short for linear model
    > class(model1)
    [1] "lm"

    model1 is an object of class lm, short for linear model. Using the summary(...) function summarises the relationship between y and x.

    > summary(model1)
    Call:
    lm(formula = y ~ x, data = mydata)
    [etc.]

    Now using the plot(...) function on the object of class lm has an effect that is somewhat different from the previous two cases. It produces a series of diagnostic plots to help check the assumptions of the model.
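Beyond printing a summary or drawing diagnostic plots, a fitted model can be queried for individual results. The sketch below is self-contained and simulates its own data, so the seed, the values and the names `x`, `y` and `fit` are assumptions for illustration and will not reproduce the numbers in the text.

```r
# Simulate a linear relationship with known slope (3) and intercept (10)
set.seed(42)                                   # hypothetical seed
x <- rnorm(100, mean = 100, sd = 20)
y <- 3 * x + 10 + rnorm(100, mean = 0, sd = 25)

fit <- lm(y ~ x)

coef(fit)                  # estimated intercept and slope
confint(fit)               # 95% confidence intervals for both
summary(fit)$r.squared     # proportion of variance in y explained by x
head(residuals(fit))       # the first few residuals

# plot(fit) would draw the diagnostic plots described above
```

Extracting values with functions such as `coef` is usually more reliable than copying them from the printed summary.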


    > names(mydata)
    [1] "x" "y"

    or with

    > colnames(mydata)
    [1] "x" "y"

    The row names appear to be the numbers from 1 to 100 (the number of rows in the data), though actually they are character data:

    > rownames(mydata)
     [1] "1" "2" "3" "4" "5" "6" "7" "8" [etc.]
    > class(rownames(mydata))
    [1] "character"

    The column names can be changed either individually or together. Individually:

    > names(mydata)[1] <- "v1"
    > names(mydata)[2] <- "v2"
    > names(mydata)
    [1] "v1" "v2"

    And all at once:

    > names(mydata) <- c("x","y")
    > names(mydata)
    [1] "x" "y"

    ... as can the row names,

    > rownames(mydata)[1] <- "0"
    > rownames(mydata)
     [1] "0" "2" "3" "4" "5" "6" "7" "8" [etc.]
    > rownames(mydata) = seq(from=0, by=1, length.out=nrow(mydata))
    > rownames(mydata)
     [1] "0" "1" "2" "3" "4" "5" "6" "7" "8" [etc.]

    The above can be especially useful when merging data tables with GIS shapefiles in R (because the first entry in an attribute table for a shapefile is usually given an ID of 0). Otherwise, it is usually easiest for the first row in a data table to be labelled 1, so let's put them back to how they were.

    > rownames(mydata) = 1:nrow(mydata)
    > rownames(mydata)
     [1] "1" "2" "3" "4" "5" "6" "7" "8" [etc.]

    3.2.1 Referencing rows and columns in a data frame

    The square bracket notation can be used to index specific rows, columns or cells in the data frame. For example:

    > mydata[1,] # The first row of data
    > mydata[2,] # The second row of data
    > round(mydata[2,],2) # The second row, rounded to 2 decimal places
    > mydata[nrow(mydata),] # The final row of the data
    > mydata[,1] # The first column of data
    > mydata[,2] # The second column, which is also ...
    > mydata[,ncol(mydata)] # ... the final column of data
    > mydata[1,1] # The data in the first row of the first column
    > mydata[5,2] # The data in the fifth row of the second column
    > round(mydata[5,2],0)

    Specific columns of data can also be referenced using the $ notation

    > mydata$x # Equivalent to mydata[,1] because the column name is x
    > mydata$y
    > summary(mydata$x)
    > summary(mydata$y)
    > mean(mydata$x)
    > median(mydata$y)
    > sd(mydata$x) # Gives the standard deviation of x
    > boxplot(mydata$x)
    > boxplot(mydata$x, horizontal=T, main="Boxplot of variable x")

    (Boxplots are sometimes said to be easier to read when drawn horizontally)

    One way to avoid the use of the $ notation is to use the function with(...) instead:

    > with(mydata, var(x)) # Gives the variance of x
    > with(mydata, plot(x, xlab="Observation number"))
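The different ways of referencing parts of a data frame can be compared on a small example. The data frame `df` and its values are made up for this sketch.

```r
df <- data.frame(x = c(10, 20, 30), y = c(1.234, 5.678, 9.1011))

df[2, ]                 # second row, returned as a one-row data frame
df[, 2]                 # second column, returned as a plain vector
df[3, 2]                # a single cell
round(df[2, ], 2)       # rounding works on a whole row

# Three equivalent routes to the same column or statistic
same <- c(identical(df[, 1], df$x),
          identical(df$x, df[["x"]]),
          identical(with(df, mean(x)), mean(df[, 1])))
same                    # TRUE TRUE TRUE
```

Note that a single row comes back as a data frame whereas a single column comes back as a vector, which matters when the result is passed to another function.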

    3.2.2 Attaching a data frame

    Sometimes any of the ways to access a specific part of a data table becomes tiresome and it is useful to reference the column or variable name directly. For example, instead of having to type mean(mydata[,1]), mean(mydata$x) or with(mydata, mean(x)) it would be easier just to refer to the variable of interest, x, as in mean(x).

    To achieve this the attach(...) command is used. Compare, for example,

    > mean(x)
    Error in mean(x) : object 'x' not found

    (which generates an error because there is not an object called x in the workspace; it is only a column name within the data frame mydata) with

    > attach(mydata)
    > mean(x)


    which works fine. If, to use the earlier analogy, objects in R's workspace are like box files, then now you have opened one up and its contents (which include the variable x) are visible.

    To detach the contents of the data frame use detach(...)

    > detach(mydata)
    > mean(x)
    Error in mean(x) : object 'x' not found

    It is sensible to use detach when the data frame is no longer being used or else confusion can arise when multiple data frames contain the same column names, as in the following example:

    > attach(mydata)
    > mean(x) # This will give the mean of mydata$x
    > mydata2 = data.frame(x = 1:10, y=11:20)
    > head(mydata2)
      x  y
    1 1 11
    2 2 12
    3 3 13
    4 4 14
    5 5 15
    6 6 16
    > attach(mydata2)
    The following object(s) are masked from 'mydata':
        x, y
    > mean(x) # This will now give the mean of mydata2$x
    [1] 5.5
    > detach(mydata2)
    > mean(x) # Once again the mean of mydata$x
    > detach(mydata)
    > rm(mydata2)
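The masking problem can be avoided altogether by not attaching anything and naming the data frame in each call. In the sketch below the data frames `d1` and `d2` and their values are made up for illustration.

```r
# Two data frames that both contain a column called x
d1 <- data.frame(x = c(100, 102, 104))
d2 <- data.frame(x = 1:10)

# with() looks for x inside the named data frame only, so nothing is masked
m1 <- with(d1, mean(x))
m2 <- with(d2, mean(x))
m1   # 102
m2   # 5.5

# Nothing has been added to the search path, so x itself is still not found
exists("x", inherits = FALSE)   # FALSE, unless an object x was created earlier
```

This is a little more typing than attach(...) but there is never any doubt about which x is being used.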

    3.2.3 Sub-setting the data table and logical queries

    Subsets of a data frame can be created by referencing specific rows within it. For example, imagine we want a table only of those observations that have a value above the mean of some variable.

    > attach(mydata)
    > subset <- which(x > mean(x))
    > class(subset)
    [1] "integer"
    > subset
     [1]  2  4  5  7  8  9 11 12 [etc.]
    > mydata.sub <- mydata[subset,]
    > head(mydata.sub)

    Note how the row names of this subset (2, 4, 5, 7, 8, 9 and so on) have been inherited from the parent data frame.

    A more direct approach is to define the subset as a logical vector that is either true or false dependent upon whether a condition is met.


    > subset <- x > mean(x)
    > class(subset)
    [1] "logical"
    > subset
     [1] FALSE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE [etc.]
    > mydata.sub <- mydata[subset,]
    > head(mydata.sub)

    A yet more parsimonious way of achieving the same is:

    > mydata.sub <- mydata[x > mean(x),]
    # Selects those rows that meet the logical condition, and all columns
    > head(mydata.sub)

    In the same way, to select those rows where x is greater than or equal to the mean of x and y is greater than or equal to the mean of y

    > mydata.sub <- mydata[x >= mean(x) & y >= mean(y),]
    # The symbol & is used for and

    Or, those rows where x is less than the mean of x or y is less than the mean of y

    > mydata.sub <- mydata[x < mean(x) | y < mean(y),]
    # The symbol | is used for or
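The logical queries above can be traced by hand on a data frame small enough to check. The data frame `df`, its values and the result names are made up for this sketch, and the columns are referenced with the $ notation rather than relying on attach(...).

```r
df <- data.frame(x = c(1, 5, 3, 8, 6), y = c(10, 2, 7, 9, 1))
mean(df$x)   # 4.6
mean(df$y)   # 5.8

# Logical vector: TRUE where x is above its mean
above <- df$x > mean(df$x)
above        # FALSE TRUE FALSE TRUE TRUE
df[above, ]  # rows 2, 4 and 5, keeping their original row names

# Both conditions must hold (and): only row 4 qualifies
both <- df[df$x >= mean(df$x) & df$y >= mean(df$y), ]

# At least one condition must hold (or): rows 1, 2, 3 and 5
either <- df[df$x < mean(df$x) | df$y < mean(df$y), ]

# The built-in subset() function gives the same rows as df[above, ]
same <- subset(df, x > mean(x))
```

Because base R already has a function called subset, it may be safer in your own scripts to give the logical vector a different name, as `above` does here.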

    3.2.4 Missing data

    Missing data is given the value NA. For example,

    > mydata[1,1] = NA
    > mydata[2,2] = NA
    > head(mydata)

    R will, by default, report NA or an error when some calculations are tried with missing data:

    > mean(mydata$x)
    [1] NA
    > quantile(mydata$y)
    Error in quantile.default(mydata$y) :
      missing values and NaN's not allowed if 'na.rm' is FALSE

    To overcome this, the default can be changed or the missing data removed.

    > mean(mydata$x, na.rm=T)
    > quantile(mydata$y, na.rm=T) # Divides the data into quartiles

    For the second, there are various ways to remove the missing data. For example ...

    > subset <- !is.na(mydata$x)

    ... creates a logical vector which is true where the data values of x are not missing (the ! in the expression means not):

    > head(subset)
    [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE

    Using the subset,

    > x2 <- mydata$x[subset]
    > mean(x2)

    More succinctly,

    > with(mydata, mean(x[!is.na(x)]))

    Alternatively, a new data frame could be created without any missing data whereby any row with any missing value is omitted.

    > subset <- complete.cases(mydata)
    > head(subset)
    [1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE
    > mydata.complete = mydata[subset,]
    > head(mydata.complete)
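The two strategies for handling missing data, changing the default or removing the affected rows, can be compared on a small example. The data frame `df` and its values are made up for this sketch; `na.omit` is a base R function that gives the same rows as indexing with `complete.cases`.

```r
df <- data.frame(x = c(NA, 2, 4, 6), y = c(1, NA, 5, 7))

# Strategy 1: tell the function to ignore missing values
mean(df$x)                 # NA
mean(df$x, na.rm = TRUE)   # 4

# Count the missing values in each column before deciding what to do
colSums(is.na(df))         # one missing value in x, one in y

# Strategy 2: drop every row that has a missing value in any column
keep <- complete.cases(df)
keep                       # FALSE FALSE TRUE TRUE
clean <- df[keep, ]
nrow(clean)                # 2

# na.omit() is a shortcut that keeps the same rows
identical(rownames(na.omit(df)), rownames(clean))   # TRUE
```

Dropping whole rows discards the valid values in those rows too (here, y in row 1 and x in row 2), so using na.rm on individual calculations can retain more of the data.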

    3.2.5 Reading data from a file into a data frame

    The accompanying file schools.csv contains information about the location and some attributes of schools in Greater London (in 2008). The locations are given as a grid reference (Easting, Northing). The information is not real but is realistic. It should not, however, be used to make inferences about real schools in London.

    A standard way to read a file into a data frame, with cases corresponding to lines and variables to fields in the file, is to use the read.table(...) command.

    > ?read.table

    In the case of schools.csv, it is comma delimited and has column headers. Looking through the arguments for read.table the data might be read into R using

    > schools.data <- read.table("schools.csv", header=T, sep=",")

    This will only work if the file is located in the working directory, else the location (path) of the file must be given, for example by choosing the file interactively:


    > schools.data <- read.table(file.choose(), header=T, sep=",")

    Looking through the usage of read.table in the R help page, a variant of the command is found where the defaults are for comma delimited data. So, most simply, we could use,

    > schools.data <- read.csv(file.choose())
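The relationship between read.table and read.csv can be tried without the schools file. The sketch below writes a tiny comma delimited file to a temporary location and reads it back both ways; the file contents, the column names and the object names `tmp`, `d1` and `d2` are made up for illustration.

```r
# Write a small comma delimited file with a header row to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c("school,FSM,selective",
             "A,0.25,0",
             "B,0.10,1",
             "C,0.40,0"), tmp)

# Read it with read.table, stating the header and separator explicitly ...
d1 <- read.table(tmp, header = TRUE, sep = ",")

# ... and with read.csv, where those are already the defaults
d2 <- read.csv(tmp)

all.equal(d1, d2)   # the two approaches give the same data frame here
str(d2)             # a compact check of column names and types

unlink(tmp)         # remove the temporary file
```

Supplying a path in this way, rather than using file.choose(), makes a script repeatable because it does not depend on someone clicking on the right file.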

    Having read in the data, some basic checks of it are helpful,

    > head(schools.data, n=3)
      FSM EAL SEN white blk.car blk.afr indian pakistani [etc.]
    [etc.]
    # Views the first three lines of the data
    > ncol(schools.data)
    [1] 17
    > nrow(schools.data)
    [1] 366
    > summary(schools.data)
    [etc.]

    It seems to be fine.

    For more about importing and exporting data in R, consult the R help document, R Data Import/Export.

    3.3 Lists

    A list is a little like a data frame but offers a more flexible way to gather objects of different classes together. For example,

    > mylist <- list(schools.data, model1, "a")
    > class(mylist)
    [1] "list"

    To find the number of components in a list, use length(...),

    > length(mylist)
    [1] 3

    Here the first component is the data frame containing the schools data. The second component is the linear model created earlier. The third is the character "a". To reference a specific component, double square brackets are used:

    > head(mylist[[1]], n=3)
      FSM EAL SEN white blk.car blk.afr indian pakistani [etc.]
    [etc.]
    > summary(mylist[[2]])
    Call:
    [etc.]


    > class(mylist[[3]])
    [1] "character"

    The double square brackets can be combined with single ones. For example,

    > mylist[[1]][1,]
      FSM EAL SEN white blk.car blk.afr indian pakistani [etc.]
    [etc.]

    is the first row of the schools data. The first cell of the same data is

    > mylist[[1]][1,1]
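The difference between single and double square brackets on a list is easy to miss, so a small sketch may help. The list `lst`, its component names and its contents are made up for illustration; the list in the text is unnamed, whereas naming the components, as here, also allows them to be referenced with the $ notation.

```r
lst <- list(numbers = 1:3,
            label   = "a",
            table   = data.frame(v = c(10, 20)))

class(lst[1])            # "list": single brackets return a shorter list
class(lst[[1]])          # "integer": double brackets return the component itself

lst$label                # "a": named components can be reached with $
lst[["table"]][2, "v"]   # 20: double brackets, then ordinary data frame indexing

sapply(lst, class)       # the class of every component in one call
```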

    3.4 Writing a function

    In brief, a function is written in R in the following way,

    > function.name <- function(list of arguments) {
    + function code
    + return(result)
    + }

    So, a simple function to divide the product of two numbers by their sum could be,

    > my.function <- function(x1, x2) {
    + result <- (x1 * x2) / (x1 + x2)
    + return(result)
    + }

    Now running the function

    > my.function(3, 7)
    [1] 2.1
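A function can also check its inputs before doing any calculation. The variant below is a sketch only: the name `safe.ratio` and the wording of the error message are made up, and it relies on the fact that R returns the last evaluated expression when return(...) is not called explicitly.

```r
safe.ratio <- function(x1, x2) {
  # Stop early if any pair sums to zero, to avoid dividing by zero
  if (any(x1 + x2 == 0)) {
    stop("x1 + x2 must not be zero")
  }
  (x1 * x2) / (x1 + x2)   # the last evaluated expression is returned
}

safe.ratio(3, 7)                 # 2.1
safe.ratio(c(1, 2), c(3, 4))     # works element by element on vectors

# Catch the error and keep its message rather than halting the script
msg <- tryCatch(safe.ratio(1, -1), error = function(e) conditionMessage(e))
msg                              # "x1 + x2 must not be zero"
```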

    3.5 R packages for mapping and spatial data analysis

    By default, R comes with a base set of packages and methods for data analysis and visualization. However, there are many other packages available, too, that greatly extend R's value and functionality. These packages are listed alphabetically at http://cran.r-project.org/web/packages/available_packages_by_name.html.

    Because there are so many, it can be useful to browse the packages by topic (at http://cran.r-project.org/web/views/). The topic, or 'task view' of particular interest here is the analysis of spatial data: http://cran.r-project.org/web/views/Spatial.html
