Reading Data into R
![Page 1: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/1.jpg)
Reading data into R
2012-09-28 @HSPH
Kazuki Yoshida, M.D., MPH-CLE student
FREEDOM TO KNOW
![Page 2: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/2.jpg)
Previously in this group
- Group Website: http://rpubs.com/kaz_yos/useR_at_HSPH
- Introduction to R
![Page 3: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/3.jpg)
Menu
- What statistics is all about
- Data-reading functions in R
- Installing packages
- Reading Excel files
- Reading other files
![Page 4: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/4.jpg)
Statistics is the study of the collection, organization, analysis, interpretation, and presentation of data.
http://en.wikipedia.org/wiki/Statistics
http://mediacrushllc.com/2012/internet-statistics-2012/
![Page 5: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/5.jpg)
No data, no life
No statistics
![Page 6: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/6.jpg)
http://echrblog.blogspot.com/2011/04/statistics-on-states-with-systemic-or.html
Loading data is the first step
![Page 7: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/7.jpg)
Supported
- .RData (native): load()
- .csv: read.csv()
- .xls/.xlsx: library(gdata) or library(XLConnect)
- .sas7bdat: read.sas7bdat() via library(sas7bdat)
- .dta: read.dta() via library(foreign)
- and more...
http://cran.r-project.org/doc/manuals/R-data.html
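The native .RData route can be tried without any external file. The following is a minimal sketch, not taken from the slides: the data frame `dat` and its values are made up, and the file is written to a temporary location.

```r
# Made-up example data (an assumption for illustration only)
dat <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.8))

# Save the object to a temporary .RData file
rdata_file <- tempfile(fileext = ".RData")
save(dat, file = rdata_file)

# Remove the object, then restore it with load()
rm(dat)
loaded_names <- load(rdata_file)  # load() returns the names of the restored objects

loaded_names
head(dat)
```

Note that load() restores objects under the names they were saved with, so there is no assignment with `<-` as there is for read.csv().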
![Page 8: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/8.jpg)
library() packages
![Page 9: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/9.jpg)
http://r4stats.com/articles/popularity/
4000+ user-contributed packages
Fast development
![Page 10: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/10.jpg)
Downside: not much can be done without packages
![Page 11: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/11.jpg)
CRAN
![Page 12: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/12.jpg)
Comprehensive R Archive Network
http://cran.r-project.org/web/packages/available_packages_by_date.html
![Page 13: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/13.jpg)
Let’s try
![Page 14: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/14.jpg)
Open RStudio
![Page 16: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/16.jpg)
Source | Console
Plot | Workspace
switched
![Page 17: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/17.jpg)
Menu: RStudio - Preferences
My configuration
Source | Console
Plot | Workspace
![Page 18: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/18.jpg)
Menu: RStudio - Preferences
My configuration
Configure CRAN mirror
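The mirror can also be set from code instead of the Preferences dialog. This is a sketch under one assumption: the mirror URL `https://cloud.r-project.org` is my choice for illustration, and the slides do not say which mirror was used.

```r
# Start from the current repository settings so other entries are kept
repos <- getOption("repos")

# Point the CRAN entry at a mirror (the URL is an assumption for illustration)
repos["CRAN"] <- "https://cloud.r-project.org"
options(repos = repos)

# Check which mirror install.packages() will now use
getOption("repos")
```

The setting lasts for the current session only; the RStudio preference is the persistent equivalent.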
![Page 19: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/19.jpg)
Use .CSV if possible
http://www.edrugsearch.com/edsblog/cvs-takes-on-wal-marts-generic-drug-prices-with-a-gimmicky-twist/#.UEfft0J8z0d
Comma Separated Values
![Page 20: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/20.jpg)
.csv
http://www.wondergraphs.com/img/SFO_Landings.csv
![Page 21: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/21.jpg)
read.csv("file.csv")
http://www.wondergraphs.com/img/SFO_Landings.csv
Careful: big file!
![Page 22: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/22.jpg)
new.dat <- read.csv("file.csv")
new.dat: name of a dataset here
read.csv: function to read .csv files
"file.csv": file name here
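To try the assignment pattern without downloading anything, a small .csv file can be written to a temporary location first. This is only a sketch: the column names and values below are made up and are not the contents of SFO_Landings.csv.

```r
# Write a tiny made-up .csv file (contents are an assumption for illustration)
csv_file <- tempfile(fileext = ".csv")
writeLines(c("airline,year,landings",
             "United,2011,1200",
             "Delta,2011,800"),
           csv_file)

# Read the whole file into a data frame
new.dat <- read.csv(csv_file)
str(new.dat)

# For a big file, reading only the first rows is a cheap way to preview it
preview <- read.csv(csv_file, nrows = 1)
preview
```

Without the `new.dat <-` part, the data would only be printed to the console and not kept.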
![Page 23: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/23.jpg)
Alternatively:
new.dat <- read.csv(file.choose())
new.dat: name of a dataset here
read.csv: function to read .csv files
file.choose: function to open a file-choose dialogue
![Page 24: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/24.jpg)
Space separated
http://www.biostat.harvard.edu/~fitzmaur/ala2e/tlc.dat
![Page 25: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/25.jpg)
read.table("file.dat")
or
read.table("file.dat", header = T)
http://www.biostat.harvard.edu/~fitzmaur/ala2e/tlc.dat
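The difference between the two calls shows up when the file starts with a row of column names. The sketch below uses a made-up space-separated file; its column names and values are assumptions and are not taken from tlc.dat.

```r
# Write a tiny made-up space-separated file with a header row
dat_file <- tempfile(fileext = ".dat")
writeLines(c("id group week0 week1",
             "1 P 30.8 26.9",
             "2 A 26.5 14.8"),
           dat_file)

# Without header = TRUE the first line is read as data
# and the columns are named V1, V2, ...
no_header <- read.table(dat_file)
no_header

# With header = TRUE the first line supplies the column names
with_header <- read.table(dat_file, header = TRUE)
with_header
```

If the file has no header row, the first form is the right one.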
![Page 26: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/26.jpg)
tab-separated
![Page 27: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/27.jpg)
read.delim("file.tsv")
http://www.brookscole.com/cgi-wadsworth/course_products_wp.pl?fid=M20b&flag=student&product_isbn_issn=9780495384960&disciplinenumber=1038&template=AUS
![Page 28: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/28.jpg)
For comma-, tab-, or space-separated text
Let’s try!
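One way to try all three at once is to write the same small table with different separators and read each file back. This is a sketch with made-up data; the helper `write_rows` is hypothetical and exists only for this example.

```r
# Made-up table: one header row and two data rows
rows <- list(c("id", "dose"), c("1", "10"), c("2", "20"))

# Hypothetical helper: write the rows to a temporary file with a given separator
write_rows <- function(sep, ext) {
  f <- tempfile(fileext = ext)
  writeLines(vapply(rows, paste, character(1), collapse = sep), f)
  f
}

from_csv   <- read.csv(write_rows(",", ".csv"))                    # comma separated
from_tsv   <- read.delim(write_rows("\t", ".tsv"))                 # tab separated
from_space <- read.table(write_rows(" ", ".dat"), header = TRUE)   # space separated

# read.csv() and read.delim() are wrappers around read.table(),
# so the same result is available by setting sep explicitly
from_table <- read.table(write_rows(",", ".csv"), header = TRUE, sep = ",")

all.equal(from_csv, from_tsv)
all.equal(from_csv, from_space)
all.equal(from_csv, from_table)
```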
![Page 29: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/29.jpg)
http://www.last.fm/music/Excel/+images/285200
http://www.biography.com/people/bill-gates-9307520
Excel files are prevalent
![Page 30: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/30.jpg)
http://www.hsph.harvard.edu/faculty/miguel-hernan/files/nhefs_book.xls
http://www.hsph.harvard.edu/faculty/miguel-hernan/causal-inference-book/
http://www.philipcoppens.com/matrixconstructs.html
We will use publicly available data
![Page 31: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/31.jpg)
install.packages("gdata", dep = T)
library(gdata)
read.xls("file.xls")
Perl configuration necessary on Windows
http://cran.r-project.org/web/packages/gdata/INSTALL
![Page 32: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/32.jpg)
install.packages("XLConnect", dep = T)
library(XLConnect)
readWorksheet(loadWorkbook("file.xls"), sheet = 1)
install.packages("XLConnect", type = "source") on Mac
Define a function for simplicity:
my.read.xls <- function(file) readWorksheet(loadWorkbook(file), sheet = 1)
my.read.xls("file.xls")
![Page 33: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/33.jpg)
To install a package
install.packages("package", dep = T)
"package": package name here
dep: short for dependencies
T: short for TRUE
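In a script that is run more than once, it can be convenient to install only when the package is missing. The sketch below is an assumption about how one might do that: `install_if_missing` is a hypothetical helper, not part of R, and it spells out `dependencies = TRUE` in full. The base package "utils" is used so that the example runs without a download; for the slides it would be called with "gdata" or "XLConnect".

```r
# Hypothetical helper (not part of R): install a package only if it is missing
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # dependencies = TRUE is the long form of dep = T
    install.packages(pkg, dependencies = TRUE)
  }
  # Report whether the package is available now
  requireNamespace(pkg, quietly = TRUE)
}

# "utils" ships with R, so nothing is downloaded here
ok <- install_if_missing("utils")
ok
```

Writing TRUE instead of T is slightly safer in scripts, because T is an ordinary variable that can be overwritten while TRUE cannot.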
![Page 34: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/34.jpg)
![Page 35: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/35.jpg)
To load a package
library(package)
package name here
double quotes "" can be omitted
![Page 36: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/36.jpg)
Just click box
![Page 37: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/37.jpg)
Install package
Load package
Read the chosen xls file into nhefs
![Page 38: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/38.jpg)
![Page 39: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/39.jpg)
install.packages("sas7bdat", dep = T)
library(sas7bdat)
read.sas7bdat("file.sas7bdat")
http://www.biostat.harvard.edu/~fitzmaur/ala2e/smoking.sas7bdat
![Page 40: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/40.jpg)
library(foreign)
read.xport("file.xpt")
ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/nhanes/2009-2010/DEMO_F.xpt
![Page 41: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/41.jpg)
![Page 42: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/42.jpg)
library(foreign)
read.dta("file.dta")
http://www.biostat.harvard.edu/~fitzmaur/ala2e/headache.dta
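If no Stata file is at hand, read.dta() can be tried on a file created from R with write.dta(), which is also in the foreign package. This is a sketch with made-up data, assuming foreign is installed (it normally ships with R as a recommended package).

```r
library(foreign)

# Made-up data (an assumption for illustration, not the headache dataset)
original <- data.frame(id = 1:3, pain = c(2.5, 4, 1.5))

# Write a Stata .dta file to a temporary location, then read it back
dta_file <- tempfile(fileext = ".dta")
write.dta(original, dta_file)
restored <- read.dta(dta_file)

str(restored)
```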
![Page 43: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/43.jpg)
http://www.drugs.com/top200_2003.html
HTML table
![Page 44: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/44.jpg)
install.packages("XML", dep = T)
library(XML)
readHTMLTable("http://www.drugs.com/top200_2003.html", which = 2, skip.rows = 1)
http://www.drugs.com/top200_2003.html
![Page 45: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/45.jpg)
Fixed width
![Page 46: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/46.jpg)
read.fwf("file.txt", widths = c(3, 5, ...))
Use widths = list(c(3, 5, ...), c(5, 7, ...)) for multiple rows per subject
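A fixed-width file has no separators, so the column widths carry all the structure. The sketch below uses a made-up file with three fields of widths 3, 6, and 2; the field names and values are assumptions for illustration.

```r
# Write a tiny made-up fixed-width file: id (3 chars), name (6 chars), age (2 chars)
fwf_file <- tempfile(fileext = ".txt")
writeLines(c("001Alice 34",
             "002Bob   27"),
           fwf_file)

# widths gives the number of characters in each field, left to right
dat <- read.fwf(fwf_file,
                widths = c(3, 6, 2),
                col.names = c("id", "name", "age"),
                strip.white = TRUE,          # drop the padding spaces
                stringsAsFactors = FALSE)    # keep names as character strings
dat
```

The id column is read as a number here, so leading zeros are dropped; passing colClasses = "character" would keep them.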
![Page 47: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/47.jpg)
Important functions
- install.packages("PackageName", dep = T)
- library(PackageName)
- str(dataset)
- summary(dataset)
- head(dataset)
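The three inspection functions can be tried on any data frame right after reading it. The sketch below uses `mtcars`, a dataset built into R, as a stand-in for a freshly read dataset; that choice is mine and not from the slides.

```r
# mtcars is built into R and stands in for a dataset that was just read
dataset <- mtcars

str(dataset)       # structure: number of rows, column names, column types
summary(dataset)   # per-column summaries (min, quartiles, mean, max)

first_rows <- head(dataset)   # first six rows by default
first_rows
```

Checking str() first is a quick way to spot columns that were read with an unexpected type.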
![Page 48: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/48.jpg)
![Page 49: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/49.jpg)
Appendix: Probability Functions
![Page 50: Reading Data into R](https://reader038.fdocuments.us/reader038/viewer/2022103016/554e812ab4c905f66a8b5509/html5/thumbnails/50.jpg)
| | -norm | -t | -binom | -pois | what it does |
|---|---|---|---|---|---|
| d- | dnorm | dt | dbinom | dpois | density (mass), given x-axis |
| p- | pnorm | pt | pbinom | ppois | return probability, given x-axis (quantile) |
| q- | qnorm | qt | qbinom | qpois | return quantile (x-axis), given prob. |
| -test | library(BSDA): z.test, zsum.test | t.test, library(BSDA): tsum.test | binom.test | poisson.test | return p-value and confidence interval |
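The naming scheme can be checked directly at the console with base R functions. The numbers passed in below are arbitrary choices for illustration.

```r
# d-: density (or probability mass) at a given x
density_at_0 <- dnorm(0)                         # standard normal density at 0
mass_3_of_10 <- dbinom(3, size = 10, prob = 0.5) # P(X = 3) for Binomial(10, 0.5)

# p-: cumulative probability up to a given x
prob_below <- pnorm(1.96)                        # P(Z <= 1.96), standard normal
prob_pois  <- ppois(2, lambda = 3)               # P(X <= 2) for Poisson(3)

# q-: quantile for a given probability (inverse of p-)
quantile_z <- qnorm(prob_below)                  # recovers 1.96
quantile_t <- qt(0.975, df = 10)                 # t quantile, 10 degrees of freedom

# -test: returns a p-value and a confidence interval
res <- binom.test(7, 10, p = 0.5)
res$p.value
res$conf.int
```

The q- and p- functions are inverses of each other for continuous distributions, which is why qnorm(pnorm(1.96)) gives back 1.96.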