R Data Import/Export

Dr. Jieh-Shan George YEH
jsyeh@pu.edu.tw

Save and Load R Data

• Data in R can be saved as .Rdata files with function save().

getwd()
setwd("c:\\temp")
a <- 1:10
save(a, file="dumData.Rdata")
rm(a)
load("dumData.Rdata")
print(a)
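
The same round trip can be sketched without changing the working directory. This is only an illustration: it assumes a temporary file from tempfile() in place of c:\\temp, and the names rdata_path, restored and loaded_names are made up for the example. Loading into a separate environment is a choice made here so that existing objects are not overwritten.

```r
# Save two objects to a temporary .Rdata file (assumption: tempfile()
# stands in for the c:\temp directory used on the slide).
rdata_path <- tempfile(fileext = ".Rdata")
a <- 1:10
b <- letters[1:3]
save(a, b, file = rdata_path)

# load() restores objects under their original names. Loading into a
# separate environment keeps them apart from the current workspace.
restored <- new.env()
loaded_names <- load(rdata_path, envir = restored)

print(loaded_names)   # names of the objects that were restored
print(restored$a)
unlink(rdata_path)    # remove the temporary file
```

load() returns the names of the restored objects, which is a convenient way to see what a file contained.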

Fixed-width-format files

cat("2 3 5 7", "11 13 17 19", file="ex1.data", sep="\n")
scan(file="ex1.data", what=list(x=0, y="", z=0), flush=TRUE)

cat("TITLE extra line", "2 3 5 7", "11 13 17", file = "ex2.data", sep = "\n")
pp <- scan("ex2.data", skip = 1, quiet = TRUE)
scan("ex2.data", skip = 1)
scan("ex2.data", skip = 1, nlines = 1) # only 1 line after the skipped one

pp2<-scan("ex2.data", what = list("","","")) # flush is F -> read "7"

pp3<-scan("ex2.data", what = list("","",""), flush = TRUE)

unlink("ex2.data") # unlink deletes the file
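
The effect of flush can be seen on a small file with trailing text on each line. This is a sketch with made-up contents; the file and the names flush_path and rec are assumptions for illustration, not part of the original examples.

```r
# Hypothetical file: three wanted fields per line, then trailing text.
flush_path <- tempfile(fileext = ".data")
writeLines(c("a 1 x trailing comment", "b 2 y more text"), flush_path)

# 'what' describes one record: character, numeric, character.
# flush = TRUE discards everything after the third field on each line.
rec <- scan(flush_path, what = list(id = "", n = 0, tag = ""),
            flush = TRUE, quiet = TRUE)
str(rec)
unlink(flush_path)
```

Without flush = TRUE the trailing words would be read as the fields of further records.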

Import from and Export to .CSV Files

• Create a data frame df1 and save it as a .CSV file with write.csv().

• The data frame is then loaded from the file into df2 with read.csv().

var1 <- 1:5
var2 <- (1:5) / 10
var3 <- c("R", "and", "Data Mining", "Examples", "Case Studies")
df1 <- data.frame(var1, var2, var3)
names(df1) <- c("VariableInt", "VariableReal", "VariableChar")
write.csv(df1, "dummmyData.csv", row.names = FALSE)
df2 <- read.csv("dummmyData.csv")
print(df2)
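
One way to check that the column types survive the round trip is sketched below. It assumes a temporary file instead of the working directory, and it sets stringsAsFactors = FALSE explicitly (an addition made here) so that the text column is character regardless of the R version.

```r
csv_path <- tempfile(fileext = ".csv")
df1 <- data.frame(VariableInt  = 1:5,
                  VariableReal = (1:5) / 10,
                  VariableChar = c("R", "and", "Data Mining",
                                   "Examples", "Case Studies"),
                  stringsAsFactors = FALSE)
write.csv(df1, csv_path, row.names = FALSE)

# read.csv() guesses each column's type from the text in the file.
df2 <- read.csv(csv_path, stringsAsFactors = FALSE)
str(df2)                    # shows the type inferred for each column
print(all.equal(df1, df2))  # reports any differences between the two
unlink(csv_path)
```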

Scan

• One common use of scan is to read in a large matrix. Suppose file matrix.dat just contains the numbers for a 200 x 2000 matrix.

• Then we can use

A <- matrix(scan("matrix.dat", n = 200*2000), 200, 2000, byrow = TRUE)

On one test this took 1 second (under Linux, 3 seconds under Windows on the same machine), whereas

A <- as.matrix(read.table("matrix.dat"))

took 10 seconds (and more memory), and

A <- as.matrix(read.table("matrix.dat", header = FALSE, nrows = 200, comment.char = "", colClasses = "numeric"))

took 7 seconds.
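
A scaled-down sketch can confirm that both approaches return the same values. It assumes a 20 x 50 matrix written by write.table() as a stand-in for matrix.dat; the names M, mat_path, A and B are illustrative only.

```r
# Small stand-in for matrix.dat: 20 x 50 instead of 200 x 2000.
M <- matrix(as.numeric(1:1000), nrow = 20, ncol = 50)
mat_path <- tempfile(fileext = ".dat")
write.table(M, mat_path, row.names = FALSE, col.names = FALSE)

# scan() returns one long vector in file order, so byrow = TRUE is
# needed to rebuild the rows as they appear in the file.
A <- matrix(scan(mat_path, n = 20 * 50, quiet = TRUE), 20, 50, byrow = TRUE)

# read.table() returns a data frame with columns V1, V2, ...
B <- as.matrix(read.table(mat_path, header = FALSE, nrows = 20,
                          comment.char = "", colClasses = "numeric"))

print(all.equal(A, M))
print(all.equal(unname(B), M))  # drop the V1.. column names first
unlink(mat_path)
```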

• Note that timings can depend on the type read and the data.

writeLines(as.character((1+1e6):2e6), "ints.dat")
xi <- scan("ints.dat", what=integer(0), n=1e6) # 0.77s
xn <- scan("ints.dat", what=numeric(0), n=1e6) # 0.93s
xc <- scan("ints.dat", what=character(0), n=1e6) # 0.85s
xf <- as.factor(xc) # 2.2s
DF <- read.table("ints.dat") # 4.5s

code <- c("LMH", "SJC", "CHCH", "SPC", "SOM")
writeLines(sample(code, 1e6, replace=TRUE), "code.dat")
y <- scan("code.dat", what=character(0), n=1e6) # 0.44s
yf <- as.factor(y) # 0.21s
DF <- read.table("code.dat") # 4.9s
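
Timings like those in the comments can be measured with system.time(). The sketch below assumes a much smaller n so that it runs quickly, and a temporary file; the numbers it prints will differ from machine to machine and from those quoted above.

```r
# Smaller n than the 1e6 used on the slides, so the sketch runs quickly.
n <- 1e4
ints_path <- tempfile(fileext = ".dat")
writeLines(as.character(seq_len(n)), ints_path)

# system.time() reports user, system and elapsed seconds for one expression.
t_int <- system.time(xi <- scan(ints_path, what = integer(0),   n = n, quiet = TRUE))
t_chr <- system.time(xc <- scan(ints_path, what = character(0), n = n, quiet = TRUE))
t_tab <- system.time(DF <- read.table(ints_path))

# Collect the elapsed times in one named vector for comparison.
elapsed <- c(scan_integer   = t_int[["elapsed"]],
             scan_character = t_chr[["elapsed"]],
             read_table     = t_tab[["elapsed"]])
print(elapsed)
unlink(ints_path)
```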

zz <- read.csv("mr.csv", strip.white = TRUE)
zzz <- cbind(zz[gl(nrow(zz), 1, 4*nrow(zz)), 1:2], stack(zz[, 3:6]))
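
What the cbind/stack expression does is easier to see on a small data frame. The contents of mr.csv are not shown in the slides, so the data below is a hypothetical stand-in with two identifying columns followed by four measurement columns.

```r
# Hypothetical stand-in for mr.csv: two id columns, then four measurements.
zz <- data.frame(id    = c("s1", "s2", "s3"),
                 group = c("A", "B", "A"),
                 t1 = c(1.1, 2.1, 3.1),
                 t2 = c(1.2, 2.2, 3.2),
                 t3 = c(1.3, 2.3, 3.3),
                 t4 = c(1.4, 2.4, 3.4))

# gl(n, 1, 4 * n) yields 1..n repeated four times, so the id columns are
# replicated once per measurement column. stack() puts the measurements in
# one 'values' column, with the source column name in 'ind'.
zzz <- cbind(zz[gl(nrow(zz), 1, 4 * nrow(zz)), 1:2], stack(zz[, 3:6]))
print(zzz)
```

The result has one row per subject and measurement, which is the long layout many modelling functions expect.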

read.table

• HousePrice <- read.table("houses.data")

• HousePrice <- read.table("houses.data", header=TRUE)
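
The difference between the two calls shows up in the column names and types. The sketch below assumes made-up contents for a file like houses.data (a header line followed by data rows); the column names and values are illustrative only.

```r
# Hypothetical contents: a header line, then two data rows.
houses_path <- tempfile(fileext = ".data")
writeLines(c("Price Floor Rooms",
             "52.00 111.0 5",
             "54.75 128.0 6"), houses_path)

no_header   <- read.table(houses_path)                 # header line read as data
with_header <- read.table(houses_path, header = TRUE)  # header line names the columns

print(names(no_header))
print(names(with_header))
str(with_header)
unlink(houses_path)
```

When the header line is read as data, every column contains text, so the numeric columns are no longer numeric.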

scan() function

• inp <- scan("input.dat", list("",0,0))

• inp <- scan("input.dat", list(id="", x=0, y=0))

• X <- matrix(scan("light.dat", 0), ncol=5, byrow=TRUE)
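
A named list passed as what gives a named list back, which converts directly to a data frame. The sketch below assumes made-up contents for a file like input.dat; input_path and inp_df are illustrative names.

```r
# Hypothetical contents: an identifier followed by two numbers per line.
input_path <- tempfile(fileext = ".dat")
writeLines(c("p1 0.5 10", "p2 1.5 20", "p3 2.5 30"), input_path)

# The types of the list elements set the type of each field, and
# the list names become the names of the result.
inp <- scan(input_path, what = list(id = "", x = 0, y = 0), quiet = TRUE)
inp_df <- as.data.frame(inp, stringsAsFactors = FALSE)
print(inp_df)
unlink(input_path)
```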

BUILT IN DATASETS

Accessing built in datasets

• Around 100 datasets are supplied with R (in package datasets)

data()
data(infert)

• To access data from a particular package, use the package argument

data(package="rpart")
data(Puromycin, package="datasets")
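
The listing returned by data() can also be used programmatically. The sketch below loads into a separate environment through the envir argument, which is a choice made for this example so that the workspace is left untouched; available and ds_env are illustrative names.

```r
# List the datasets a package provides. The result holds a 'results'
# matrix with columns such as "Item" and "Title".
available <- data(package = "datasets")$results
print(head(available[, c("Item", "Title")]))

# Load one dataset into a private environment rather than the workspace.
ds_env <- new.env()
data(Puromycin, package = "datasets", envir = ds_env)
str(ds_env$Puromycin)
```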

Editing data

• edit() is useful for making small changes once a data set has been read. The commands

data(car90, package="rpart")
xnew <- edit(car90)

open car90 in a spreadsheet-like editor and assign the edited copy to xnew.

• If you want to alter the original dataset xold, the simplest way is to use fix(xold), which is equivalent to xold <- edit(xold).

• To enter new data via the spreadsheet interface, use

xnew <- edit(data.frame())

PACKAGE ‘XLSX’

Package ‘xlsx’

• http://cran.r-project.org/web/packages/xlsx/xlsx.pdf

install.packages("xlsx")
require(xlsx)

# example of reading xlsx sheets
file <- system.file("tests", "test_import.xlsx", package = "xlsx")
res <- read.xlsx(file, 2) # read the second sheet

# example of writing xlsx sheets
# USArrests contains statistics, in arrests per 100,000 residents, for assault,
# murder, and rape in each of the 50 US states in 1973. Also given is the
# percent of the population living in urban areas.
file <- paste(tempfile(), "xlsx", sep=".")
write.xlsx(USArrests, file=file)

res <- read.xlsx("mydata.xlsx", 1, encoding="utf-8") # read the first sheet
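
A write-then-read round trip might look like the sketch below. It is guarded so that it only runs when the xlsx package can be loaded (the package also needs a working Java installation); the sheet name "arrests" and the variable names are assumptions for illustration.

```r
# Round trip through a temporary .xlsx file, run only if 'xlsx' loads.
round_trip <- NULL
if (requireNamespace("xlsx", quietly = TRUE)) {
  xlsx_path <- tempfile(fileext = ".xlsx")
  # Write the data frame to a named sheet, then read that sheet back.
  xlsx::write.xlsx(USArrests, file = xlsx_path, sheetName = "arrests")
  round_trip <- xlsx::read.xlsx(xlsx_path, sheetName = "arrests")
  str(round_trip)
  unlink(xlsx_path)
}
```

Selecting a sheet by name rather than by index keeps the code working if sheets are later reordered.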

Output to connections

zz <- file("ex.data", "w") # open an output file connection
cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = zz, sep = "\n")
cat("One more line\n", file = zz)
close(zz)
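
The same pattern works with writeLines(), which ends every element with a newline. The sketch below uses writeLines() in place of cat() and a temporary file in place of ex.data; both are substitutions made for this illustration.

```r
out_path <- tempfile(fileext = ".data")
con <- file(out_path, "w")          # open for writing; stays open across calls
writeLines(c("TITLE extra line", "2 3 5 7"), con)
writeLines("One more line", con)    # follows on, since the connection is still open
close(con)                          # flush and release the file

written <- readLines(out_path)      # read the file back to check its contents
print(written)
unlink(out_path)
```

Because the connection stays open between calls, the second write continues where the first stopped instead of overwriting the file.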

Output to connections

## capture R output: use examples from help(lm)
zz <- textConnection("ex.lm.out", "w")
sink(zz)
example(lm, prompt.prefix = "> ")
sink()
close(zz)
## now 'ex.lm.out' contains the output for further processing.
## Look at it by, e.g.,
cat(ex.lm.out, sep = "\n")
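
A smaller version of the same capture is sketched below, alongside capture.output(), which wraps the sink-to-connection steps in one call. The variable names captured and captured2 are illustrative.

```r
zz <- textConnection("captured", "w")  # output goes to character vector 'captured'
sink(zz)                               # divert console output to the connection
print(1:3)
sink()                                 # restore normal output
close(zz)

# capture.output() performs the same capture in a single call.
captured2 <- capture.output(print(1:3))

print(captured)
print(identical(captured, captured2))
```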

Input from connections

## read in file created in last example
readLines("ex.data")
unlink("ex.data")
## read listing of current directory (Unix)
readLines(pipe("ls -1"))
## read listing of current directory (Windows)
readLines(pipe("dir"))
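
An explicitly opened connection keeps its position between reads, so a file can be consumed in pieces. The sketch below assumes a temporary file with made-up contents; in_path, title and rest are illustrative names.

```r
in_path <- tempfile(fileext = ".data")
writeLines(c("TITLE extra line", "2 3 5 7", "11 13 17"), in_path)

con <- file(in_path, "r")        # an open connection remembers its position
title <- readLines(con, n = 1)   # first line only
rest  <- readLines(con)          # everything after it
close(con)

print(title)
print(rest)
unlink(in_path)
```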