20130215 Reading data into R
![Page 1: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/1.jpg)
Reading and Manipulating
Data in R
2013-02-15 @HSPH
Kazuki Yoshida, M.D., MPH-CLE student
FREEDOM TO KNOW
![Page 2: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/2.jpg)
Reading data in
- Usually the first task in real-life data analysis.
![Page 3: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/3.jpg)
Supported formats
- .RData (native) files: load()
- .csv files: read.csv()
- .xls/.xlsx files: gdata::read.xls() or xlsx::read.xlsx()
- .sas7bdat files: sas7bdat::read.sas7bdat()
- .dta files: foreign::read.dta()
- and more: http://cran.r-project.org/doc/manuals/R-data.html
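As a sketch of the native format, the round trip below saves two objects to a temporary .RData file and restores them with load(). The object names and the temporary path are illustrative assumptions, not files from the course.

```r
# Create two objects and save them to a temporary .RData file
x <- 1:5
y <- c("a", "b")
tmp <- file.path(tempdir(), "example.RData")
save(x, y, file = tmp)

# Remove them, then restore them; load() returns the names of the restored objects
rm(x, y)
loaded <- load(tmp)
print(loaded)
```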
![Page 4: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/4.jpg)
foreign::read.dta()
- foreign: package name (packages add functions)
- read.dta: function name
- Functions are followed by (), in which you specify arguments.
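The same package::function() pattern works with any installed package. This small sketch uses stats, which ships with every R installation, so nothing extra needs to be installed; the numbers are made up.

```r
# "::" calls a function from a package without attaching the package first
m1 <- stats::median(c(4, 1, 7))

# Arguments go inside the parentheses; named arguments such as na.rm change behavior
m2 <- stats::median(x = c(4, 1, NA, 7), na.rm = TRUE)
print(c(m1, m2))
```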
![Page 5: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/5.jpg)
Create a folder for this group
![Page 6: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/6.jpg)
Open RStudio
![Page 7: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/7.jpg)
Make sure your working directory is correct
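The working directory can also be checked and changed from the console. In this sketch tempdir() stands in for your group folder; that substitution is an assumption made only so the code runs anywhere.

```r
# Where is R currently reading and writing files?
old <- getwd()
print(old)

# Switch to another folder (tempdir() is a stand-in for your own folder)
setwd(tempdir())
print(getwd())
print(list.files())  # files R can see from here

# Go back to where we started
setwd(old)
```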
![Page 8: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/8.jpg)
Download files
- Rosner (ASCII, comma-separated and Stata): http://www.cengage.com/cgi-wadsworth/course_products_wp.pl?fid=M20bI&product_isbn_issn=9780538733496
- Hernan (Excel and SAS): http://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/
![Page 9: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/9.jpg)
.csv files
http://www.wondergraphs.com/img/SFO_Landings.csv
![Page 10: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/10.jpg)
For comma-, tab-, or space-separated text
![Page 11: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/11.jpg)
new.dat <- read.csv("file.csv")
- new.dat: name of object to create
- <-: assignment operator
- read.csv: function to read .csv files
- "file.csv": file name here
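A self-contained sketch: it writes a tiny made-up CSV to a temporary file and reads it back with read.csv(). The column names and values are hypothetical.

```r
# Write a small comma-separated file; the first line holds the column names
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,age,sex",
             "1,34,F",
             "2,51,M",
             "3,28,F"), tmp)

# read.csv() assumes a header row and comma separators by default
new.dat <- read.csv(tmp)
str(new.dat)
```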
![Page 12: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/12.jpg)
Space-separated
http://www.biostat.harvard.edu/~fitzmaur/ala2e/tlc.dat
![Page 13: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/13.jpg)
read.table("file.dat")
or
read.table("file.dat", header = TRUE)  # when the first line holds column names
http://www.biostat.harvard.edu/~fitzmaur/ala2e/tlc.dat
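To see what the header argument changes, this sketch reads the same made-up space-separated file twice. The file contents are hypothetical and are not the TLC data linked above.

```r
tmp <- tempfile(fileext = ".dat")
writeLines(c("id trt y0 y1",
             "1 P 30.8 26.9",
             "2 A 26.5 14.8"), tmp)

# Default header = FALSE: the first line is treated as data, columns become V1, V2, ...
no.header <- read.table(tmp)

# header = TRUE: the first line supplies the column names
with.header <- read.table(tmp, header = TRUE)
print(no.header)
print(with.header)
```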
![Page 14: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/14.jpg)
Tab-separated
![Page 15: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/15.jpg)
read.delim("file.tsv")
http://www.brookscole.com/cgi-wadsworth/course_products_wp.pl?fid=M20b&flag=student&product_isbn_issn=9780495384960&disciplinenumber=1038&template=AUS
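read.delim() is essentially read.table() with a tab separator and a header row preset (plus a few other defaults). The sketch below, using a hypothetical file, reads the same data both ways for comparison.

```r
tmp <- tempfile(fileext = ".tsv")
writeLines(c("id\tgroup\tscore",
             "1\tA\t12.5",
             "2\tB\t9.0"), tmp)

tsvdat <- read.delim(tmp)

# Roughly equivalent call spelled out with read.table()
same <- read.table(tmp, sep = "\t", header = TRUE)
print(tsvdat)
```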
![Page 16: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/16.jpg)
Excel files
![Page 17: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/17.jpg)
Install the xlsx package
![Page 18: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/18.jpg)
Just click the box to load the package
![Page 19: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/19.jpg)
To install/load a package
install.packages("package", dependencies = TRUE)  # once per computer
library(package)  # once per R session
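A common pattern is to install a package only when it is missing and then load it. load_or_install() below is a hypothetical helper written for this sketch, not a function from any package; it is demonstrated on tools, which ships with R, so nothing is downloaded.

```r
# Hypothetical helper: install pkg only if it is not already available, then attach it
load_or_install <- function(pkg) {
  # requireNamespace() returns FALSE (instead of an error) when pkg is missing
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, dependencies = TRUE)
  }
  # character.only = TRUE lets library() take the name from a variable
  library(pkg, character.only = TRUE)
}

load_or_install("tools")
```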
![Page 20: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/20.jpg)
xlsdat <- read.xlsx("file.xls", 1)
- xlsdat: name of object to create
- <-: assignment operator
- read.xlsx: function to read .xls/.xlsx files
- "file.xls": file name here
- 1: sheet number
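The xlsx package depends on Java, so this round-trip sketch only runs when xlsx is installed and working. It writes a made-up data frame to a temporary .xlsx file and reads the first sheet back, naming the sheet argument explicitly.

```r
if (requireNamespace("xlsx", quietly = TRUE)) {
  tmp <- tempfile(fileext = ".xlsx")
  # Hypothetical data; row.names = FALSE keeps R's row names out of the sheet
  xlsx::write.xlsx(data.frame(id = 1:3, score = c(10, 20, 30)), tmp,
                   row.names = FALSE)
  # sheetIndex = 1 is the same "1" passed positionally above
  xlsdat <- xlsx::read.xlsx(tmp, sheetIndex = 1)
  print(xlsdat)
}
```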
![Page 21: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/21.jpg)
![Page 22: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/22.jpg)
SAS native files
library(sas7bdat)
sasdat <- read.sas7bdat("file.sas7bdat")
![Page 23: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/23.jpg)
SAS xport files
library(foreign)
xptdat <- read.xport("file.xpt")
ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/nhanes/2009-2010/DEMO_F.xpt
![Page 24: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/24.jpg)
![Page 25: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/25.jpg)
library(foreign)
statadat <- read.dta("file.dta")
http://www.biostat.harvard.edu/~fitzmaur/ala2e/headache.dta
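foreign can also write .dta files, which allows a self-contained round trip. This sketch uses made-up data; note that read.dta() handles Stata formats up to version 12, so files saved by newer Stata versions may need another package.

```r
library(foreign)

# Hypothetical data with a factor, which Stata stores as value labels
df <- data.frame(id = 1:3,
                 arm = factor(c("placebo", "active", "active")),
                 y = c(2.5, 3.1, 4.0))
tmp <- tempfile(fileext = ".dta")
write.dta(df, tmp)

# convert.factors = TRUE (the default) turns value labels back into factors
statadat <- read.dta(tmp)
str(statadat)
```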
![Page 26: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/26.jpg)
Fixed-width files
![Page 27: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/27.jpg)
fwfdat <- read.fwf("file.txt", widths = c(3, 5, ...))
Use widths = list(c(3, 5, ...), c(5, 7, ...)) when each subject spans multiple lines
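Both forms of widths are shown below on small hypothetical files: a plain vector when each subject is one line, and a list of vectors (one per line) when each subject spans two lines. The field layout and column names are assumptions for illustration.

```r
# One line per subject: id (3 chars), sex (1 char), age (2 chars)
tmp <- tempfile(fileext = ".txt")
writeLines(c("001M34",
             "002F51"), tmp)
fwfdat <- read.fwf(tmp, widths = c(3, 1, 2),
                   col.names = c("id", "sex", "age"))

# Two lines per subject: line 1 holds id and sex, line 2 holds age and sbp
tmp2 <- tempfile(fileext = ".txt")
writeLines(c("001M", "34120",
             "002F", "51135"), tmp2)
fwf2 <- read.fwf(tmp2, widths = list(c(3, 1), c(2, 3)),
                 col.names = c("id", "sex", "age", "sbp"))
print(fwfdat)
print(fwf2)
```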
![Page 28: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/28.jpg)
Manipulating data in R
- Objects
- Classes
- Various data objects
![Page 29: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/29.jpg)
Objects
- Just about everything named in R is an object.
- An object is a container that
  - knows its class (e.g., "I have numbers inside!")
  - has contents (e.g., the actual numbers)
![Page 30: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/30.jpg)
Examples of objects
- data, which you use for analysis (various classes)
- functions, which perform analysis (function class)
- results, which come out of analysis (various classes)
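class() reports the class of any object. In this sketch, with made-up numbers, a data object, a function, and a result each report their own class.

```r
dat <- data.frame(x = c(1, 2, 3, 4), y = c(2.1, 3.9, 6.2, 7.8))  # data (made up)
fit <- lm(y ~ x, data = dat)  # applying a function to data produces a result

class(dat)  # data object
class(lm)   # function object
class(fit)  # result object
```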
![Page 31: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/31.jpg)
Classes of data values inside data objects
- Numeric: continuous variables
- Factor: categorical variables
- Logical: TRUE/FALSE binary variables
- etc.
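The three value classes listed above can be created directly; the variable names and values here are hypothetical.

```r
age    <- c(34, 51, 28)               # numeric
sex    <- factor(c("F", "M", "F"))    # factor, with levels F and M
smoker <- c(TRUE, FALSE, FALSE)       # logical

class(age)
class(sex)
class(smoker)
levels(sex)
```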
![Page 32: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/32.jpg)
Class?
- An object’s class tells R how the object should be handled.
- For example, summarizing data should work differently for numbers and categories!
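summary() illustrates this: the same call gives quantiles and a mean for a numeric vector but counts per level for a factor. The data are made up.

```r
age <- c(34, 51, 28, 45)
sex <- factor(c("F", "M", "F", "F"))

summary(age)  # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
summary(sex)  # number of observations in each level
```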
![Page 33: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/33.jpg)
Data objects
- Vector (contains a single class of data values)
- List (contains multiple classes of data values)
![Page 34: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/34.jpg)
Data objects
- Vector (contains a single class of data values)
  - Array, including matrix
- List (contains multiple classes of data values)
  - Data frame
![Page 35: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/35.jpg)
Vector
- Smallest building block of data objects
- Single dimension
- Combination of values of the same class
- vec1 <- c(2013, 2, 15, -10)  # combine
- vec2 <- 1:16  # integers 1 to 16
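Because a vector holds a single class, combining values of different classes coerces all of them to one common class (here, character). The mixed example is an illustration added for this sketch.

```r
vec1 <- c(2013, 2, 15, -10)
vec2 <- 1:16

length(vec1)
class(vec2)

# Mixing classes: everything is converted to character
mixed <- c(1, "a", TRUE)
class(mixed)
print(mixed)
```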
![Page 36: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/36.jpg)
Array
- Vector folded into a multidimensional structure
- A 2-dimensional array is a matrix
- vec3 <- 1:16
- dim(vec3) <- c(4, 4)  # 4 x 4 structure
- dim(vec3) <- c(2, 2, 4)  # 2 x 2 x 4 structure
- arr1 <- array(1:60, dim = c(3, 4, 5))
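Folding fills the structure column by column, so in the 4 x 4 version the value in row 1, column 2 is 5. This sketch repeats the slide's steps and checks the shape after each fold.

```r
vec3 <- 1:16
dim(vec3) <- c(4, 4)
first.view <- vec3[1, 2]      # column 1 holds 1:4, column 2 starts at 5
was.matrix <- is.matrix(vec3)

dim(vec3) <- c(2, 2, 4)       # same 16 values, now three dimensions
arr1 <- array(1:60, dim = c(3, 4, 5))
dim(arr1)
```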
![Page 37: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/37.jpg)
List
- Combination of any values or objects
- Can contain objects of multiple classes
- e.g., a list of two vectors, a matrix, and three arrays
- list1 <- list(first = 1:17, second = matrix(letters, 13, 2))
- list2 <- list(alpha = c(1, 4, 5, 7), beta = c("h", "s", "p", "h"))
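Inspecting the two lists shows that each element keeps its own class and shape. sapply() is used here only to list the class of every element at once.

```r
list1 <- list(first = 1:17, second = matrix(letters, 13, 2))
list2 <- list(alpha = c(1, 4, 5, 7), beta = c("h", "s", "p", "h"))

length(list1)           # number of elements
dim(list1$second)       # the matrix inside keeps its dimensions
list1$second[13, 2]     # last letter, filled column by column
sapply(list2, class)    # numeric and character side by side
```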
![Page 38: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/38.jpg)
Data frame
- Special case of a list
- List of same-length vectors, vertically aligned
- df1 <- data.frame(list2)
- list3 <- list(small = letters, large = LETTERS, number = 1:26)
- df2 <- data.frame(list3)
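This sketch confirms that a data frame is still a list and that the columns must line up: vectors of lengths 3 and 2 cannot be aligned, so data.frame() stops with an error (caught here with tryCatch).

```r
list2 <- list(alpha = c(1, 4, 5, 7), beta = c("h", "s", "p", "h"))
list3 <- list(small = letters, large = LETTERS, number = 1:26)
df1 <- data.frame(list2)
df2 <- data.frame(list3)

dim(df1)      # rows, columns
dim(df2)
is.list(df2)  # a data frame is a special list
head(df2, 3)

# Unequal lengths that do not recycle evenly cannot form a data frame
bad <- tryCatch(data.frame(list(a = 1:3, b = 1:2)), error = function(e) "error")
```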
![Page 39: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/39.jpg)
Access by indexes
- letters[3]  # 1-dimensional object
- arr1[1, 2, 3]  # 3-dimensional object
- arr1[1, , 3]  # implies 1, (all), 3
- df2[ , 3]  # implies (all rows), column 3
- list1[[1]]  # lists need [[ ]] to extract an element
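The objects are recreated below so the indexing runs on its own. It also shows why lists need double brackets: [ ] returns a smaller list, while [[ ]] returns the element itself.

```r
arr1 <- array(1:60, dim = c(3, 4, 5))
list1 <- list(first = 1:17, second = matrix(letters, 13, 2))
df2 <- data.frame(list(small = letters, large = LETTERS, number = 1:26))

letters[3]         # third letter
arr1[1, 2, 3]      # one cell of the 3 x 4 x 5 array
arr1[1, , 3]       # row 1 of slice 3, all columns
df2[ , 3]          # the whole third column
class(list1[1])    # still a list
class(list1[[1]])  # the integer vector inside
```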
![Page 40: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/40.jpg)
Access named elements
- list3
- list3$small
- list3[["small"]]
- df2$large
- df2[, "large"]
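The $ and [[ "name" ]] forms return the same element, and for a data frame column $ matches [, "name"]. This sketch checks that; it assumes R 4.0 or later, where data.frame() keeps character columns as character rather than converting them to factors.

```r
list3 <- list(small = letters, large = LETTERS, number = 1:26)
df2 <- data.frame(list3)

list3$small[1:3]
identical(list3$small, list3[["small"]])

df2$large[1:3]
identical(df2$large, df2[, "large"])
```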
![Page 41: 20130215 Reading data into R](https://reader038.fdocuments.us/reader038/viewer/2022103114/5563a517d8b42a2b6a8b521e/html5/thumbnails/41.jpg)