Computing with large data sets
Richard Bonneau, spring 2009
Lecture 2 (week 1): introduction to R
Thursday, January 22, 2009
other notes, courses, lectures about R and S
v22.0480: computing with data, Richard Bonneau
Ingo Ruczinski and Rafael Irizarry (Johns Hopkins Biostat):
http://www.biostat.jhsph.edu/~bcaffo/statcomp/index.html
http://www.biostat.jhsph.edu/~ririzarr/Teaching/688/
Roger D. Peng (JHU):
http://www.biostat.jhsph.edu/~rpeng/
Read the manual !!:
http://cran.r-project.org/doc/manuals/R-intro.html
S history
S is a language and system for organizing, visualizing, and analyzing data.
S was started at Bell Labs in 1976.
The language has evolved through several major versions to become the most widely used environment for research in data analysis and statistics.
In 1998, S became the first statistical system to receive the Software System Award, the top software award from the ACM.
(For a great account of the early history of S see the paper on the course website: http://www.research.att.com/areas/stat/doc/94.11.ps )
R history and facts
R is an environment for data analysis and visualization. R is an open source implementation of the S language (S-Plus is a commercial implementation of the S language). The current version of R (September 2004) is 1.9.1.
The R Core group consists of Doug Bates, John Chambers, Peter Dalgaard, Robert Gentleman, Kurt Hornik, Stefano Iacus, Ross Ihaka, Friedrich Leisch, Thomas Lumley, Martin Maechler, Guido Masarotto, Paul Murrell, Brian Ripley, Duncan Temple Lang, and Luke Tierney.
Join the R Foundation for Statistical Computing: http://www.r-project.org/
1991 Ross Ihaka and Robert Gentleman begin work on a project that will ultimately become R.
1992 Design and implementation of pre-R.
1993 The first announcement of R.
1995 R available by ftp under the GPL.
1996 A mailing list is started and maintained by Martin Maechler at ETH.
1997 The R core group is formed.
1999 DSC meeting in Vienna, the first time many R core members meet.
2000 R 1.0.0 is released.
2009 R is still very actively developed and available for all platforms, open source, pervasive in bioinformatics and several other fields.
playing around
> x <- 65.56
> y <- 4
> x <- 2.0
> y <- c(2,4,6)
> x * y
[1]  4  8 12
> x * 2
[1] 4
> y * y
[1]  4 16 36
> sqrt( -1 )
[1] NaN
Warning message:
In sqrt(-1) : NaNs produced
> sqrt(-1+0i)
[1] 0+1i
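A minimal sketch of what x * y is doing in the session above: arithmetic on vectors is elementwise, and the shorter vector is recycled to the length of the longer one. The names scaled, squares, dot and recycled are made up for this illustration.

```r
# Elementwise arithmetic with recycling.
x <- 2
y <- c(2, 4, 6)

scaled  <- x * y       # the single value 2 is reused for every element of y
squares <- y * y       # elementwise product, NOT a dot product
dot     <- sum(y * y)  # a dot product needs an explicit sum

# Recycling also applies when neither vector has length 1:
# c(1, 2) is reused twice to match the four elements on the right.
recycled <- c(1, 2) + c(10, 20, 30, 40)

print(scaled)
print(dot)
print(recycled)
```

Recycling is convenient for scalars, but with two longer vectors it can hide a length mismatch, so it is worth checking length() when the result looks odd.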
playing around
> y <- 1:10
> y ^^ 2
Error: syntax error
> y ^ 2
 [1]   1   4   9  16  25  36  49  64  81 100
> y
 [1]  1  2  3  4  5  6  7  8  9 10
> y <- jitter( y )
> y
 [1]  1.003289  2.011200  2.965646  3.885774  4.909870  5.993501  6.907029  7.956502  8.902033
[10] 10.104556
> class( y )
[1] "numeric"
> class( x )
[1] "numeric"
> length( x )
[1] 1
> length( y )
[1] 10
> dim( y )
NULL
> dim( x )
NULL
playing around
> z <- matrix( sample(y), nrow = 5, ncol = 5)
> z
           [,1]         [,2]       [,3]         [,4]       [,5]
[1,]   90.42972    0.7650916   90.42972    0.7650916   90.42972
[2,] 9743.39636 6225.7870318 9743.39636 6225.7870318 9743.39636
[3,] 3973.01279  250.6684005 3973.01279  250.6684005 3973.01279
[4,]  560.98542 1253.9400470  560.98542 1253.9400470  560.98542
[5,] 2420.98800   15.4225674 2420.98800   15.4225674 2420.98800
> dim(z)
[1] 5 5
> length( z )
[1] 25
> summary( y )
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
   0.7651  130.5000  907.5000 2454.0000 3585.0000 9743.0000 
playing around
> hist( y )
> hist( y, nclass = 20 )
> hist( y )
> pdf("hist.l2.pdf")
> hist( y )
> dev.off()
quartz 
     2 
> x <- 1:20
> y <- runif( length( x ) )
> plot( x, y )
> abline(h=0.5, lty=2, col="green",lwd=2)
> pdf("sample-sesion.pdf")
> plot( x, y )
> abline(h=0.5, lty=2, col="green",lwd=2)
> dev.off()
quartz 
     2 
[Figures: "Histogram of y" (x-axis: y, y-axis: Frequency); scatter plot of y against x with a dashed green horizontal line at y = 0.5]
using built in examples
> ? heatmap   ### then cut and paste in examples
> require(graphics); require(grDevices)
> x <- as.matrix(mtcars)
> rc <- rainbow(nrow(x), start=0, end=.3)
> cc <- rainbow(ncol(x), start=0, end=.3)
> hv <- heatmap(x, col = cm.colors(256), scale="column",
+               RowSideColors = rc, ColSideColors = cc, margins=c(5,10),
+               xlab = "specification variables", ylab= "Car Models",
+               main = "heatmap(<Mtcars data>, ..., scale = \"column\")")

## mtcars is a data structure provided as an example
## of how to use heatmap()
##
> str( mtcars )
'data.frame': 32 obs. of 11 variables:
 $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num 160 160 108 258 360 ...
 $ hp  : num 110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num 2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num 16.5 17.0 18.6 19.4 17.0 ...
 $ vs  : num 0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num 1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
> ?mtcars   ## for description of what it actually is ##
[Figure: heatmap(<Mtcars data>, ..., scale = "column"); columns: specification variables (cyl, am, vs, carb, wt, drat, gear, qsec, mpg, hp, disp); rows: Car Models]
dumping functions : example code galore
> hist   ### type the function name with no "( )" or arguments
function (x, ...) 
UseMethod("hist")
<environment: namespace:graphics>

You get info, but not the working code, when the function is a generic (as here, where hist only dispatches with UseMethod) or is part of the main R code (part of the base or core).

> heatmap   ### for higher level functions
            ### or defined functions
            ### you get the code
function (x, Rowv = NULL, Colv = if (symm) "Rowv" else NULL,
    distfun = dist, hclustfun = hclust, reorderfun = function(d, w) reorder(d, w),
    add.expr, symm = FALSE, revC = identical(Colv, "Rowv"),
    scale = c("row", "column", "none"), na.rm = TRUE, margins = c(5, 5),
    ColSideColors, RowSideColors, cexRow = 0.2 + 1/log10(nr),
    cexCol = 0.2 + 1/log10(nc), labRow = NULL, labCol = NULL,
    main = NULL, xlab = NULL, ylab = NULL, keep.dendro = FALSE,
    verbose = getOption("verbose"), ...)
{
    scale <- if (symm && missing(scale))
        "none"
    else match.arg(scale)
    if (length(di <- dim(x)) != 2 || !is.numeric(x))
        stop("'x' must be a numeric matrix")
    ...truncated
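When typing a function name only shows a UseMethod() call, the working code usually lives in a method. A rough sketch of how one might dig it out, using getS3method() from the utils package; the variable names are made up for illustration.

```r
# hist is a generic: its body is just the dispatch call.
hist_body  <- deparse(body(graphics::hist))
is_generic <- any(grepl("UseMethod", hist_body))

# The default method holds the code that actually builds the histogram.
hist_default <- utils::getS3method("hist", "default")

# Functions implemented inside the R core are "primitive": no R code to show.
sum_is_primitive <- is.primitive(sum)

# formals() lists the arguments of a function without printing its whole body.
heatmap_args <- names(formals(stats::heatmap))

print(is_generic)
print(class(hist_default))
print(sum_is_primitive)
print(head(heatmap_args))
```

methods(hist) lists every method registered for the generic, which is a handy first step before asking for a specific one.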
R basic types / atomic classes of objects
> x <- character()   ### char, strings, vectors or both
> x <- "test1"
> x
[1] "test1"
> x[1] <- "test1"
> x[2] <- "test1"
> x[3] <- "test2"
> x
[1] "test1" "test1" "test2"
> class(x)
[1] "character"

> x <- numeric()   ### double floats, vectors of floats
> x <- complex()   ### complex numbers
> x
complex(0)
> x <- logical()   ## logicals, can be used
                   ## to index other objects
> x <- 12
> class(x)
[1] "numeric"
> x <- 12L   ### force integer
> class(x)
[1] "integer"

> x <- Inf   ## infinity
> x
[1] Inf
> x <- NA   ### missing values are NA or NaN
> x
[1] NA
> is.na( x )   ### built in functions help in dealing
               ### with NAs
[1] TRUE
> x <- logical()
> x
logical(0)
> x <- NaN
> is.na( x )
[1] TRUE
NA, NaN, empty/missing values
Values can be missing for lots of good reasons.
Technical:
- the measurement failed (it was cloudy that night, the probe for that DNA was synthesized incorrectly)
Budgetary/Social:
- we could only afford to measure so many points / attributes
- people will only answer 15 minutes of questions...
Bugs (incorrect explicit type coercion)
Values not filled in YET
see also: is.nan(), is.null(), as.null()

> x <- Inf   ## infinity
> x
[1] Inf
### this IS a number

> x <- NA   ### missing values are NA or NaN
> x
[1] NA
> is.na( x )   ### built in functions help in dealing
               ### with NAs
[1] TRUE

> ### messed up explicit coercion
> x <- c( "f" , "fg" )
> as.numeric ( x )
[1] NA NA
Warning message:
NAs introduced by coercion
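A small sketch of how NA, NaN and Inf behave differently once they are inside a vector. The test vector and the variable names are made up for illustration.

```r
x <- c(1.5, NA, 3, NaN, Inf)

na_flags     <- is.na(x)      # TRUE for NA and also for NaN
nan_flags    <- is.nan(x)     # TRUE only for NaN
finite_flags <- is.finite(x)  # FALSE for NA, NaN and Inf

n_missing <- sum(na_flags)    # count of missing values (NA or NaN)

m_all    <- mean(x)                  # missing values propagate into the result
m_narm   <- mean(x, na.rm = TRUE)    # NA and NaN dropped, but Inf is still there
m_finite <- mean(x[is.finite(x)])    # only ordinary numbers are averaged

print(na_flags)
print(nan_flags)
print(c(m_narm, m_finite))
```

Note that na.rm = TRUE does not protect you from Inf, which is a legitimate number as far as R is concerned.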
R basic types: vectors
Integers
> x <- 1:12
> class( x)
[1] "integer"
> x <- c(1L, 2L, 3L)
> x
[1] 1 2 3

Numeric
> x <- c(1, 2, 3.2)
> x
[1] 1.0 2.0 3.2

Logical
> x <- c( TRUE, TRUE, FALSE)
> x
[1]  TRUE  TRUE FALSE

Logical from conditional statement
> x <- c("azure", "red", "green", "red")
> x
[1] "azure" "red"   "green" "red"
> x == "azure"
[1]  TRUE FALSE FALSE FALSE

> x <- c(1, 2, 3.2)
> x
[1] 1.0 2.0 3.2
> x < 2.1
[1]  TRUE  TRUE FALSE

Integer indexes from conditionals
> which ( x < 2.1 )
[1] 1 2
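The logical vectors above are mostly useful as indexes. A short sketch, with made-up variable names, of selecting, counting and assigning through a condition:

```r
x <- c(1, 2, 3.2)

keep    <- x < 2.1       # logical vector, one entry per element of x
small   <- x[keep]       # select the elements where keep is TRUE
idx     <- which(keep)   # the same selection as integer positions
n_small <- sum(keep)     # TRUE counts as 1, FALSE as 0

# Assignment through a logical index changes only the matching elements.
x[x >= 2.1] <- 0

print(small)
print(idx)
print(x)
```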
R basic types: vectors
> x <- numeric( 10 )   ## a length 10 numeric vector
> x   ## short for print(x)
 [1] 0 0 0 0 0 0 0 0 0 0

> x <- character( 10 )
> x
 [1] "" "" "" "" "" "" "" "" "" ""
> x[ length(x) + 1] <- "a"
> x
 [1] ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  "a"
> x <- c(x, "b")
> x
 [1] ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  "a" "b"
> ### attributes
> length ( x )
[1] 12
> names( x )
NULL
> str( x )
 chr [1:12] "" "" "" "" "" "" "" "" "" "" "a" "b"

> x <- 1:5   ## loading attributes
> names( x ) <- c("one", "two", "three", "four", "five")
> x
  one   two three  four  five 
    1     2     3     4     5 
> names( x )
[1] "one"   "two"   "three" "four"  "five"
> class( x )
[1] "integer"
creative ways of making nasty bugs
> ## you can, but shouldn't do nutz stuff like this
> x <- c( 1, "two" )
> x
[1] "1"   "two"
> class(x )
[1] "character"
> y <- c(1,0,TRUE, FALSE)
> y
[1] 1 0 1 0
> class( y )
[1] "numeric"
> y <- c( "true", TRUE, FALSE)   ## nuts!
> y
[1] "true"  "TRUE"  "FALSE"
> class( y )
[1] "character"
> ## creative ways of writing nasty nasty bugs

R variables, vectors and matrices assume the type first specified OR loaded.
Assigning different types later in the code will often override this initial type.
For example:

> x <- 1:10
> x <- example.function( x )   ## function returns a character vec
> x <- length( x ) * pi
> x <- FALSE

x has been 4 types in 4 lines of code
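One way to watch the type drift happen is to record class() and typeof() after each assignment. This is only a sketch: describe() is a hypothetical helper, and as.character() stands in for the unspecified example.function().

```r
# Hypothetical helper: report class and storage type in one string.
describe <- function(v) paste(class(v), typeof(v), sep = "/")

x  <- 1:10
t1 <- describe(x)    # integer vector

x  <- as.character(x)    # stand-in for a function that returns a character vector
t2 <- describe(x)

x  <- length(x) * pi     # now a double
t3 <- describe(x)

x  <- FALSE              # now a logical
t4 <- describe(x)

print(c(t1, t2, t3, t4))

# A cheap guard: fail early if a variable is not the type the next step expects.
stopifnot(is.logical(x))
```

Putting a stopifnot() like the last line at the top of a function is a low-cost way to catch this kind of bug close to where it starts.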
factors ...
Making a factor vector
> youare <- as.factor ( c("M", "F", "F", "U" ) )
> youare
[1] M F F U
Levels: F M U

> youare <- rep( 1, 10)
> youare
 [1] 1 1 1 1 1 1 1 1 1 1
> ?runif
> y <- runif( 10 )
> youare[ y > 0.5 ] <- "big"
> youare[ y <= 0.5 ] <- "small"
> youare
 [1] "big"   "big"   "big"   "big"   "big"   "small" "big"   "small" "big"   "big"
> as.factor(youare)
 [1] big   big   big   big   big   small big   small big   big
Levels: big small

Factors are integers with a label, but the labels are stored much more efficiently (once for the whole vector of factors).
Using factors is better in that they have meaningful attributes ... why say 1, 2, 3 as integers when you can say “male”, “female”, “undetermined”?
Many functions (e.g. functions that aim to classify instances based on vectors of mixed attributes) use factors.
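A short sketch of the "integers with a label" idea: the integer codes and the level labels can be pulled apart, and mixing them up is a classic bug. Variable names are made up for illustration.

```r
youare <- factor(c("M", "F", "F", "U"))

codes  <- as.integer(youare)   # the stored integer codes
labs   <- levels(youare)       # the labels, stored once
counts <- table(youare)        # factors tabulate by label

# Pitfall: a factor whose labels look like numbers.
f     <- factor(c("10", "20", "10"))
wrong <- as.numeric(f)                  # gives the codes, not the values
right <- as.numeric(as.character(f))    # go through the labels first

print(codes)
print(labs)
print(wrong)
print(right)
```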
forcing type conversions, explicit coercion
> ### explicit coercion --- forcing the type
> x <- character( "1", "2", "3", "4", "0", "0" )
Error in character("1", "2", "3", "4", "0", "0") : 
  unused argument(s) ("2", "3", "4", "0", "0")
> x <- c( "1", "2", "3", "4", "0", "0" )
> class ( x )
[1] "character"
> x
[1] "1" "2" "3" "4" "0" "0"
> x <- as.numeric( x )
> x
[1] 1 2 3 4 0 0
> str( x)
 num [1:6] 1 2 3 4 0 0
> as.logical( x )
[1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE
> as.complex( x )
[1] 1+0i 2+0i 3+0i 4+0i 0+0i 0+0i
> as.integer( x )
[1] 1 2 3 4 0 0

* Remember: coercing to the type you expect is often a good way of checking that you've read in OR computed what you think you have ... e.g. coercing a character vector to numeric produces NAs wherever a value is not a number, and those NAs lead you to the bugs.
So declaring and coercing types is a good idea even if R doesn't strictly require it.
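The "coerce and look for NAs" check can be wrapped in a small function. This is a sketch only; to_numeric_checked() is a hypothetical helper, not something R provides.

```r
# Hypothetical helper: coerce to numeric and stop if anything failed to convert.
to_numeric_checked <- function(x) {
  out <- suppressWarnings(as.numeric(x))
  bad <- is.na(out) & !is.na(x)   # NAs created by the coercion itself
  if (any(bad)) {
    stop("could not convert: ", paste(x[bad], collapse = ", "))
  }
  out
}

ok <- to_numeric_checked(c("1", "2", "3.5"))

# The second call fails; tryCatch() captures the message so we can look at it.
msg <- tryCatch(
  to_numeric_checked(c("1", "fg")),
  error = function(e) conditionMessage(e)
)

print(ok)
print(msg)
```

Values that were already NA in the input are let through on purpose; only the NAs introduced by the coercion are treated as errors.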
coercion of matrix objects
> x <- c(1,2,3,4,0,0)
> x
[1] 1 2 3 4 0 0
> matrix( x, ncol = 2, nrow = 2 )
     [,1] [,2]
[1,]    1    3
[2,]    2    4
> matrix( x, ncol = 2, nrow = 3 )
     [,1] [,2]
[1,]    1    4
[2,]    2    0
[3,]    3    0
> matrix( x, ncol = 2, nrow = 4 )
     [,1] [,2]
[1,]    1    0
[2,]    2    0
[3,]    3    1
[4,]    4    2
Warning message:
In matrix(x, ncol = 2, nrow = 4) :
  data length [6] is not a sub-multiple or multiple of the number of rows [4]
> ### but it still did it !!!!! is this a feature or a bug waiting to happen?

> x <- c(1,2,3,4,0,0)
> dim(x) <- c(3,2)
> x
     [,1] [,2]
[1,]    1    4
[2,]    2    0
[3,]    3    0
> ### but the dim has to match the length?
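If the silent truncation and recycling in matrix() worries you, one option is to check the length yourself before building the matrix. A sketch, with strict_matrix() as a hypothetical helper:

```r
# Hypothetical helper: refuse to build a matrix unless the data fits exactly.
strict_matrix <- function(data, nrow, ncol) {
  stopifnot(length(data) == nrow * ncol)
  matrix(data, nrow = nrow, ncol = ncol)
}

x <- c(1, 2, 3, 4, 0, 0)

m <- strict_matrix(x, nrow = 3, ncol = 2)   # 6 values, 3 x 2: fine

# 6 values do not fill a 4 x 2 matrix, so this call stops with an error.
attempt <- try(strict_matrix(x, nrow = 4, ncol = 2), silent = TRUE)
failed  <- inherits(attempt, "try-error")

print(m)
print(failed)
```

This gives matrix() the same strictness that dim<- already has.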
matrix names
> x <- c(NA, NA, 1)
> x
[1] NA NA  1
> is.na(x)
[1]  TRUE  TRUE FALSE
>
> y <- matrix( x, ncol = 2, nrow = 3 )
>
> dim(y )
[1] 3 2
> y
     [,1] [,2]
[1,]   NA   NA
[2,]   NA   NA
[3,]    1    1
> y[ is.na(y) ] <- 0.01641428
> y
           [,1]       [,2]
[1,] 0.01641428 0.01641428
[2,] 0.01641428 0.01641428
[3,] 1.00000000 1.00000000
> y[1,2] <- 5.67
> rownames( y ) <- c( "eq", "er", "es")
> colnames( y ) <- c("qr", "rq" )
> dimnames( y )
[[1]]
[1] "eq" "er" "es"

[[2]]
[1] "qr" "rq"

> y
           qr         rq
eq 0.01641428 5.67000000
er 0.01641428 0.01641428
es 1.00000000 1.00000000
>
matrices
> ## matrices are filled starting in the upper left corner and then running down
> ## the column. The first index is the row, and the second is the column.
>
> y <- 1:10
> dim(y) <- c(2,5)
> y
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
> dim(y) <- c(5,2)
> y
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10
> dim(y) <- c(5,5)   ### oops?
Error in dim(y) <- c(5, 5) : 
  dims [product 25] do not match the length of object [10]
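A quick sketch contrasting the default column-wise fill with byrow = TRUE, and showing the [row, column] order of the indexes. Variable names are made up for illustration.

```r
y <- 1:10

by_col <- matrix(y, nrow = 2)                 # default: fill down each column
by_row <- matrix(y, nrow = 2, byrow = TRUE)   # fill along each row instead

a <- by_col[2, 3]   # row 2, column 3 of the column-filled matrix
b <- by_row[2, 3]   # row 2, column 3 of the row-filled matrix

# Transposing the row-filled matrix gives the column-filled 5 x 2 matrix.
same <- identical(t(by_row), matrix(y, nrow = 5))

print(by_col)
print(by_row)
print(c(a, b))
```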
rbind, cbind
> x <- 1:10
> y <- 10:1
> z <- c(1:5, 5:1)
> xyz <- rbind( x,y,z )
> xyz
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
x    1    2    3    4    5    6    7    8    9    10
y   10    9    8    7    6    5    4    3    2     1
z    1    2    3    4    5    5    4    3    2     1
> xyz <- cbind( x,y,z )
> xyz
       x  y z
 [1,]  1 10 1
 [2,]  2  9 2
 [3,]  3  8 3
 [4,]  4  7 4
 [5,]  5  6 5
 [6,]  6  5 5
 [7,]  7  4 4
 [8,]  8  3 3
 [9,]  9  2 2
[10,] 10  1 1
>

> ## adding to a matrix one row at a time
> xyz <-rbind( xyz, c( 3,4,5) )
> xyz
       x  y z
 [1,]  1 10 1
 [2,]  2  9 2
 [3,]  3  8 3
 [4,]  4  7 4
 [5,]  5  6 5
 [6,]  6  5 5
 [7,]  7  4 4
 [8,]  8  3 3
 [9,]  9  2 2
[10,] 10  1 1
[11,]  3  4 5
> ## could do a similar thing with cbind()
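Growing a matrix with rbind() inside a loop works, but every call copies the whole matrix. When the final size is known, a common alternative is to allocate it once and fill rows in place. A sketch with made-up names:

```r
n <- 5

# Version 1: grow one row at a time (simple, but copies on every iteration).
grown <- NULL
for (i in 1:n) {
  grown <- rbind(grown, c(i, i^2, i^3))
}

# Version 2: allocate the full matrix first, then fill each row.
filled <- matrix(NA_real_, nrow = n, ncol = 3)
for (i in 1:n) {
  filled[i, ] <- c(i, i^2, i^3)
}

print(grown)
print(all.equal(unname(grown), filled))
```

For a handful of rows the difference does not matter; for large data sets the preallocated version is usually much faster.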
lists
Making a list
> p1 <- list()
> ### Declare a list
> p1$x <- 1
> p2$x <- 2.0
Error in p2$x <- 2 : object "p2" not found
> p1$x <- 2.0
> p2 <- list()
> p2$x <- 3.0
> p2$y <- 2.0
> p2$y <- 2.0
> p1$y <- 2.0
> p1
$x
[1] 2

$y
[1] 2

> p2
$x
[1] 3

$y
[1] 2

Making a list of lists

> all.p <- list()
> all.p[[1]] <- p1
> all.p[[2]] <- p2
> all.p
[[1]]
[[1]]$x
[1] 2

[[1]]$y
[1] 2


[[2]]
[[2]]$x
[1] 3

[[2]]$y
[1] 2

Naming and accessing lists:

> names( all.p ) <- c("p1","p2")
> all.p
$p1
$p1$x
[1] 2

$p1$y
[1] 2


$p2
$p2$x
[1] 3

$p2$y
[1] 2

> all.p$p1
$x
[1] 2

$y
[1] 2

> all.p$p1$x
[1] 2
> all.p[[1]]$x
[1] 2
> all.p[[1]][[2]]
[1] 2
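The three ways of reaching into a list do slightly different things. A sketch using the same p1 / p2 idea; the extra variable names are made up for illustration.

```r
p1 <- list(x = 2, y = 2)
p2 <- list(x = 3, y = 2)
all.p <- list(p1 = p1, p2 = p2)

sub_list <- p1["x"]     # single bracket: still a list (of length 1)
value    <- p1[["x"]]   # double bracket: the element itself
same     <- p1$x        # $ is shorthand for [[ ]] with a literal name

# [[ ]] also accepts a name held in a variable, which $ does not.
field    <- "y"
by_name  <- p1[[field]]

# Pull the same element out of every sub-list.
xs <- sapply(all.p, function(p) p$x)

print(class(sub_list))
print(value)
print(xs)
```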
lists are a great way to return and pass data
> x <- rnorm( 20, 1.2, 0.5 )   ## 20 draws from a normal N(1.2, 0.5)
> hist.x <- hist( x )
> hist.x
$breaks
[1] 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0

$counts
[1] 2 3 2 6 2 3 0 2

$intensities
[1] 0.4999999 0.7500000 0.5000000 1.5000000 0.5000000 0.7500000 0.0000000 0.5000000

$density
[1] 0.4999999 0.7500000 0.5000000 1.5000000 0.5000000 0.7500000 0.0000000 0.5000000

$mids
[1] 0.5 0.7 0.9 1.1 1.3 1.5 1.7 1.9

$xname
[1] "x"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"

> class(hist.x )
[1] "histogram"
## so it is not 'just' a list
## more on that later
[Figure: "Histogram of rnorm(2000, 1.2, 0.5)" (x-axis: rnorm(2000, 1.2, 0.5), y-axis: Frequency)]
a strange thing lists do ... name autocompletion
> x <- rnorm( 20, 1.2, 0.5 )   ## 20 draws from a normal N(1.2, 0.5)
> hist.x <- hist( x )
> hist.x
$breaks
[1] 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0

$counts
[1] 2 3 2 6 2 3 0 2

$intensities
[1] 0.4999999 0.7500000 0.5000000 1.5000000 0.5000000 0.7500000 0.0000000 0.5000000

$density
[1] 0.4999999 0.7500000 0.5000000 1.5000000 0.5000000 0.7500000 0.0000000 0.5000000

$mids
[1] 0.5 0.7 0.9 1.1 1.3 1.5 1.7 1.9

$xname
[1] "x"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"

> class(hist.x )
[1] "histogram"
> ## so it is not 'just' a list
> ## more on that later
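The "autocompletion" in the slide title is partial matching: the $ operator accepts any unique prefix of an element name. A sketch of how this behaves, using hist(..., plot = FALSE) so no plot window is opened; the variable names are made up for illustration.

```r
set.seed(1)
x <- rnorm(20, 1.2, 0.5)
hist.x <- hist(x, plot = FALSE)

full   <- hist.x$breaks
short  <- hist.x$br        # "br" is a unique prefix, so $ finds breaks

strict <- hist.x[["br"]]                  # [[ ]] matches exactly by default
loose  <- hist.x[["br", exact = FALSE]]   # partial matching on request

# An ambiguous prefix matches nothing at all.
amb  <- list(mean = 1, median = 2)
none <- amb$me

print(identical(short, full))
print(is.null(strict))
print(is.null(none))
```

Partial matching saves typing at the console, but in scripts a typo or a newly added element can silently change what a short name refers to, so spelling names out in full is the safer habit.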
dataframes
Data frames are tables of data. Most of the time you get them by reading in tab-delimited tables or flat files with read.table().
Let's look at an example data.frame that most installs of R should have loaded.

> class( USJudgeRatings )
[1] "data.frame"
> str(USJudgeRatings)
'data.frame': 43 obs. of 12 variables:
 $ CONT: num 5.7 6.8 7.2 6.8 7.3 6.2 10.6 7 7.3 8.2 ...
 $ INTG: num 7.9 8.9 8.1 8.8 6.4 8.8 9 5.9 8.9 7.9 ...
 $ DMNR: num 7.7 8.8 7.8 8.5 4.3 8.7 8.9 4.9 8.9 6.7 ...
 $ DILG: num 7.3 8.5 7.8 8.8 6.5 8.5 8.7 5.1 8.7 8.1 ...
 ...
 $ RTEN: num 7.8 8.7 7.8 8.7 4.8 8.6 9 5 8.8 7.9 ...

> pairs( USJudgeRatings[,1:5] )   ## this function knows
                                  ## what to do with a
                                  ## dataframe
> USJudgeRatings$CONT
 [1] 5.7 6.8 7.2 6.8 ...
> USJudgeRatings[,1]
 [1] 5.7 6.8 7.2 6.8 ...
[Figure: pairs() scatterplot matrix of the first five columns of USJudgeRatings; panels: CONT, INTG, DMNR, DILG, CFMG]
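A small sketch of the read.table() route to a data frame. To keep it self-contained the tab-delimited "file" is a made-up text string passed through the text argument; with a real file you would give its path instead.

```r
# Made-up tab-delimited table: a header line followed by three rows.
txt <- paste(
  "name\tscore\tgroup",
  "a\t1.5\tx",
  "b\t2.5\ty",
  "c\t4.0\tx",
  sep = "\n"
)

df <- read.table(text = txt, header = TRUE, sep = "\t",
                 stringsAsFactors = FALSE)

col_by_name  <- df$score          # one column, as a vector
col_by_index <- df[, 2]           # the same column by position
group_x      <- df[df$group == "x", ]          # rows picked by a condition
mean_x       <- mean(df$score[df$group == "x"])

str(df)
print(group_x)
print(mean_x)
```

Each column keeps its own type, which is the main difference from a matrix.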
dataframes
coerce a data.frame to a matrix
> x <- as.matrix ( USJudgeRatings )
> str( x )
 num [1:43, 1:12] 5.7 6.8 7.2 6.8 7.3 6.2 10.6 7 7.3 8.2 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:43] "AARONSON,L.H." "ALEXANDER,J.M." "ARMENTANO,A.J." "BERDON,R.I." ...
  ..$ : chr [1:12] "CONT" "INTG" "DMNR" "DILG" ...
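The coercion above gives a numeric matrix because every column of USJudgeRatings is numeric. A matrix can hold only one type, so a single character column changes the result. A sketch with two made-up data frames:

```r
num_df <- data.frame(a = 1:2, b = c(0.5, 1.5))
mix_df <- data.frame(a = 1:2, b = c("u", "v"))

m_num <- as.matrix(num_df)   # all columns numeric: numeric matrix
m_mix <- as.matrix(mix_df)   # one character column: everything becomes character

print(is.numeric(m_num))
print(is.character(m_mix))
str(m_mix)
```

Checking is.numeric() on the result is a quick way to find out that a column you expected to be numbers was read in as text.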
reading in code
# ~bonneau/v22-class/ > cat mean.vec.R
## function to report the mean of a vector
mean.vec <- function ( x, na.remove = TRUE ) {
  ## is.numeric() is TRUE for both "numeric" and "integer" vectors
  if ( is.numeric( x ) ) {
    return( mean(x, na.rm = na.remove) )
  } else {
    return( NULL )   ## we could also return a NA
  }
}

# ~bonneau/v22-class/ > R

...R startup...

> ? source
> source( file = "mean.vec.R" )   ## you might need a path ...
> mean.vec( c( 2,3) )
[1] 2.5
> mean.vec( c( 2, 3, NA) )
[1] 2.5
> mean.vec( c("ps", "qs") )
NULL
>
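To try the source() workflow without touching your class directory, you can write the function to a temporary file first. A sketch: the temporary path is an assumption of this example, and the function body is a compact variant of mean.vec, not the file from the slide.

```r
# Write a small script to a temporary file, then load it with source().
path <- file.path(tempdir(), "mean.vec.R")
writeLines(c(
  "## function to report the mean of a vector",
  "mean.vec <- function(x, na.remove = TRUE) {",
  "  if (is.numeric(x)) mean(x, na.rm = na.remove) else NULL",
  "}"
), path)

source(path)   # defines mean.vec in the workspace

r1 <- mean.vec(c(2, 3))
r2 <- mean.vec(c(2, 3, NA))
r3 <- mean.vec(c("ps", "qs"))

print(r1)
print(r2)
print(is.null(r3))
```

After editing the file, calling source() again replaces the old definition, which makes the edit / source / test loop quick.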
homework and reading for next time
1. Read the R manual.
2. Non-graded homework: make a function that:
- given a matrix, returns a vector containing the means of each row
- given a list of numeric vectors, returns the mean of each vector in the list

For test data either use the link to “small test expression matrix” or use a built-in R data object (like volcano):
> dim( volcano )
[1] 87 61
> str( volcano )
 num [1:87, 1:61] 100 101 102 103 104 105 105 106 107 108 ...
> dim(volcano )
[1] 87 61
> ? image

Use loops, and don’t worry about NAs for now. This is not a graded assignment, but give it a try to get your feet wet.
If you want a hint stay after class ... next lecture we’ll play with plotting and graphics. If you’re confused there will be time to catch up next week.