11 Data Structures
-
Upload
hadley-wickham -
Category
Technology
-
view
1.430 -
download
6
description
Transcript of 11 Data Structures
![Page 1: 11 Data Structures](https://reader033.fdocuments.us/reader033/viewer/2022042813/547e952eb379595e2b8b5525/html5/thumbnails/1.jpg)
Hadley Wickham
Stat405Data structures
Wednesday, 30 September 2009
![Page 2: 11 Data Structures](https://reader033.fdocuments.us/reader033/viewer/2022042813/547e952eb379595e2b8b5525/html5/thumbnails/2.jpg)
1. Outline for next couple of weeks
2. Basic data types
3. Vectors, matrices & arrays
4. Lists & data.frames
Wednesday, 30 September 2009
![Page 3: 11 Data Structures](https://reader033.fdocuments.us/reader033/viewer/2022042813/547e952eb379595e2b8b5525/html5/thumbnails/3.jpg)
More details about the underlying data structures we’ve been using so far.
Then pulling apart those data structures (particularly data frames), operating on each piece and putting them back together again.
An alternative to for loops when each piece is independent.
Outline
Wednesday, 30 September 2009
![Page 4: 11 Data Structures](https://reader033.fdocuments.us/reader033/viewer/2022042813/547e952eb379595e2b8b5525/html5/thumbnails/4.jpg)
Vector
Matrix
Array
List
Data frame
1d
2d
nd
Same types Different types
Wednesday, 30 September 2009
![Page 5: 11 Data Structures](https://reader033.fdocuments.us/reader033/viewer/2022042813/547e952eb379595e2b8b5525/html5/thumbnails/5.jpg)
character
numeric
logical
mode()
length() A scalar is a vector of length 1
as.character(c(T, F))
as.character(seq_len(5))
as.logical(c(0, 1, 100))
as.logical(c("T", "F", "a"))
as.numeric(c("A", "100"))
as.numeric(c(T, F))
When vectors of different types occur in an expression, they will be automatically coerced to the same type: character > numeric > logical
names() Optional, but useful
Technically, these are all atomic vectorsWednesday, 30 September 2009
![Page 6: 11 Data Structures](https://reader033.fdocuments.us/reader033/viewer/2022042813/547e952eb379595e2b8b5525/html5/thumbnails/6.jpg)
Your turn
Experiment with automatic coercion. What is happening in the following cases?
104 & 2 < 4
mean(diamonds$cut == "Good")
c(T, F, T, T, "F")
c(1, 2, 3, 4, F)
Wednesday, 30 September 2009
![Page 7: 11 Data Structures](https://reader033.fdocuments.us/reader033/viewer/2022042813/547e952eb379595e2b8b5525/html5/thumbnails/7.jpg)
Matrix (2d)Array (>2d)
a <- seq_len(12)
dim(a) <- c(1, 12)
dim(a) <- c(4, 3)
dim(a) <- c(2, 6)
dim(a) <- c(3, 2, 2)
1 5 9
2 6 10
3 7 11
4 8 12
1 2 3 4 5 6 7 8 9 10 11 12
1 3 5 7 9 11
2 4 6 8 10 12
(1, 12)
(4, 3) (2, 6)Just like a vector. Has mode() and length().
Create with matrix() or array(), or from a vector by setting dim()
as.vector() converts back to a vector
Wednesday, 30 September 2009
![Page 8: 11 Data Structures](https://reader033.fdocuments.us/reader033/viewer/2022042813/547e952eb379595e2b8b5525/html5/thumbnails/8.jpg)
ListIs also a vector (so has mode, length and names), but is different in that it can store any other vector inside it (including lists).
Use unlist() to convert to a vector. Use as.list() to convert a vector to a list.
c(1, 2, c(3, 4))
list(1, 2, list(3, 4))
c("a", T, 1:3)
list("a", T, 1:3)
a <- list(1:3, 1:5)
unlist(a)
as.list(a)
b <- list(1:3, "a", "b")
unlist(b)
Technically a recursive vectorWednesday, 30 September 2009
![Page 9: 11 Data Structures](https://reader033.fdocuments.us/reader033/viewer/2022042813/547e952eb379595e2b8b5525/html5/thumbnails/9.jpg)
Data frame
List of vectors, each of the same length. (Cross between list and matrix)
Different to matrix in that each column can have a different type
Wednesday, 30 September 2009
![Page 10: 11 Data Structures](https://reader033.fdocuments.us/reader033/viewer/2022042813/547e952eb379595e2b8b5525/html5/thumbnails/10.jpg)
# How do you convert a matrix to a data frame?
# How do you convert a data frame to a matrix?
# What is different?
# What does these subsetting operations do?
# Why do they work? (Remember to use str)
diamonds[1]
diamonds[[1]]
diamonds[["cut"]]
Wednesday, 30 September 2009
![Page 11: 11 Data Structures](https://reader033.fdocuments.us/reader033/viewer/2022042813/547e952eb379595e2b8b5525/html5/thumbnails/11.jpg)
x <- sample(12)
# What's the difference between a & b?
a <- matrix(x, 4, 3)
b <- array(x, c(4, 3))
# What's the difference between x & y
y <- matrix(x, 12)
# How are these subsetting operations different?
a[, 1]
a[, 1, drop = FALSE]
a[1, ]
a[1, , drop = FALSE]
Wednesday, 30 September 2009
![Page 12: 11 Data Structures](https://reader033.fdocuments.us/reader033/viewer/2022042813/547e952eb379595e2b8b5525/html5/thumbnails/12.jpg)
1d names() length() c()
2dcolnames()rownames()
ncol()nrow()
cbind()rbind()
nd dimnames() dim() abind()(special package)
Wednesday, 30 September 2009
![Page 13: 11 Data Structures](https://reader033.fdocuments.us/reader033/viewer/2022042813/547e952eb379595e2b8b5525/html5/thumbnails/13.jpg)
b <- seq_len(10)
a <- letters[b]
# What sort of matrix does this create?
rbind(a, b)
cbind(a, b)
# Why would you want to use a data frame here?
# How would you create it?
Wednesday, 30 September 2009
![Page 14: 11 Data Structures](https://reader033.fdocuments.us/reader033/viewer/2022042813/547e952eb379595e2b8b5525/html5/thumbnails/14.jpg)
load(url("http://had.co.nz/stat405/data/quiz.rdata"))
# What is a? What is b?
# How are they different? How are they similar?
# How can you turn a in to b?
# How can you turn b in to a?
# What are c, d, and e?
# How are they different? How are they similar?
# How can you turn one into another?
# What is f?
# How can you extract the first element?
# How can you extract the first value in the first
# element?
Wednesday, 30 September 2009
![Page 15: 11 Data Structures](https://reader033.fdocuments.us/reader033/viewer/2022042813/547e952eb379595e2b8b5525/html5/thumbnails/15.jpg)
# a is numeric vector, containing the numbers 1 to 10# b is a list of numeric scalars# they contain the same values, but in a different formatidentical(a[1], b[[1]])identical(a, unlist(b))identical(b, as.list(a))
# c is a named list# d is a data.frame# e is a numeric matrix# From most to least general: c, d, eidentical(c, as.list(d))identical(d, as.data.frame(c))identical(e, data.matrix(d))
Wednesday, 30 September 2009
![Page 16: 11 Data Structures](https://reader033.fdocuments.us/reader033/viewer/2022042813/547e952eb379595e2b8b5525/html5/thumbnails/16.jpg)
# f is a list of matrices of different dimensions
f[[1]]f[[1]][1, 2]
Wednesday, 30 September 2009
![Page 17: 11 Data Structures](https://reader033.fdocuments.us/reader033/viewer/2022042813/547e952eb379595e2b8b5525/html5/thumbnails/17.jpg)
# What does these subsetting operations do?
# Why do they work? (Remember to use str)
diamonds[1]
diamonds[[1]]
diamonds[["cut"]]
diamonds[["cut"]][1:10]
diamonds$cut[1:10]
Wednesday, 30 September 2009
![Page 18: 11 Data Structures](https://reader033.fdocuments.us/reader033/viewer/2022042813/547e952eb379595e2b8b5525/html5/thumbnails/18.jpg)
Vectors x[1:4] —
MatricesArrays
x[1:4, ]x[, 2:3, ]
x[1:4, , drop = F]
Listsx[[1]]x$name
x[1]
Wednesday, 30 September 2009
![Page 19: 11 Data Structures](https://reader033.fdocuments.us/reader033/viewer/2022042813/547e952eb379595e2b8b5525/html5/thumbnails/19.jpg)
# What's the difference between a & b?
a <- matrix(x, 4, 3)
b <- array(x, c(4, 3))
# What's the difference between x & y
y <- matrix(x, 12)
# How are these subsetting operations different?
a[, 1]
a[, 1, drop = FALSE]
a[1, ]
a[1, , drop = FALSE]
Wednesday, 30 September 2009
![Page 20: 11 Data Structures](https://reader033.fdocuments.us/reader033/viewer/2022042813/547e952eb379595e2b8b5525/html5/thumbnails/20.jpg)
1d c()
2dmatrix()
data.frame()t()
nd array() aperm()
Wednesday, 30 September 2009
![Page 21: 11 Data Structures](https://reader033.fdocuments.us/reader033/viewer/2022042813/547e952eb379595e2b8b5525/html5/thumbnails/21.jpg)
b <- seq_len(10)
a <- letters[b]
# What sort of matrix does this create?
rbind(a, b)
cbind(a, b)
# Why would you want to use a data frame here?
# How would you create it?
Wednesday, 30 September 2009