04 Reports
If you’re using a laptop, start installing LaTeX now, following the instructions on the website.
Thursday, 2 September 2010
Hadley Wickham
Stat405
Statistical reports
1. More subsetting.
2. Missing values.
3. Statistical reports: data, code, graphics & written report
Office hours
Me: before class, DH 2056
Garrett: Wednesday, 3pm, DH 1041
Lab access: you should now have it
Saving results
# Prints to screen
diamonds[diamonds$x > 10, ]
# Saves to new data frame
big <- diamonds[diamonds$x > 10, ]
# Overwrites existing data frame. Dangerous!
diamonds <- diamonds[diamonds$x < 10, ]
diamonds <- diamonds[1, 1]
diamonds
# Uh oh!
rm(diamonds)
str(diamonds)
# Phew!
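Why does rm() bring the data back? A minimal sketch, assuming the built-in mtcars data as a stand-in for diamonds so that it runs without ggplot2: the assignment creates a copy in your workspace that hides the packaged dataset, and rm() deletes only that copy.

```r
# mtcars (from the datasets package) stands in for diamonds here.
mtcars <- mtcars[1, 1]   # creates a workspace copy that masks the original
is.data.frame(mtcars)    # FALSE: only a single number is left

rm(mtcars)               # removes the workspace copy, not the packaged data
is.data.frame(mtcars)    # TRUE: the name resolves to the original again
```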
Your turn
Create a logical vector that selects diamonds with equal x & y. Create a new dataset that only contains these values.
Create a logical vector that selects diamonds with incorrect/unusual x, y, or z values. Create a new dataset that omits these values. (Hint: do this one variable at a time)
equal_dim <- diamonds$x == diamonds$y
equal <- diamonds[equal_dim, ]

y_big <- diamonds$y > 10
z_big <- diamonds$z > 6

x_zero <- diamonds$x == 0
y_zero <- diamonds$y == 0
z_zero <- diamonds$z == 0
zeros <- x_zero | y_zero | z_zero

bad <- y_big | z_big | zeros
good <- diamonds[!bad, ]
Missing values
Data errors
Typically, removing the entire row because of one error is overkill. It is better to selectively replace problem values with missing values.
In R, missing values are indicated by NA.
Expression Guess Actual
5 + NA
NA / 2
sum(c(5, NA))
mean(c(5, NA))
NA < 3
NA == 3
NA == NA
NA behaviour
Missing values propagate
Use is.na() to check for missing values
Many functions (e.g. sum and mean) have an na.rm argument to remove missing values prior to computation.
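The three points above can be tried directly at the console. A short sketch using only base R; the vector x and the names total and average are made up for illustration.

```r
# Missing values propagate through arithmetic and comparisons
5 + NA            # NA
NA == NA          # NA: two unknown values might or might not be equal

# is.na() is the reliable way to test for missing values
x <- c(5, NA)
is.na(x)          # FALSE TRUE

# na.rm = TRUE drops missing values before computing
total <- sum(x, na.rm = TRUE)      # 5
average <- mean(x, na.rm = TRUE)   # 5
```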
# Can use subsetting + <- to change individual values
diamonds$x[diamonds$x == 0] <- NA
diamonds$y[diamonds$y == 0] <- NA
diamonds$z[diamonds$z == 0] <- NA

y_big <- !is.na(diamonds$y) & diamonds$y > 10
diamonds$y[y_big] <- diamonds$y[y_big] / 10
z_big <- !is.na(diamonds$z) & diamonds$z > 6
diamonds$z[z_big] <- diamonds$z[z_big] / 10
Your turn
What happens if you don’t remove the missing values during the subsetting replacement? Why?
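One way to explore this question is on a small vector before touching the real data. The sketch below uses a made-up vector y (not the diamonds data) and catches the error with tryCatch() so the script keeps running. It relies on base R's rule that an NA in the index of an assignment is only allowed when the replacement is a single value.

```r
# Toy measurements: one missing value and one value recorded 10x too large
y <- c(5, NA, 50)

# Replacing with a single value: an NA in the index is tolerated
y0 <- y
y0[y0 == 0] <- NA          # fine; nothing matches, so y0 is unchanged

# Replacing with several values: an NA in the index is an error
result <- tryCatch({
  y[y > 10] <- y[y > 10] / 10
  "no error"
}, error = function(e) conditionMessage(e))
result

# Guarding with !is.na() makes the index purely TRUE/FALSE
y_big <- !is.na(y) & y > 10
y[y_big] <- y[y_big] / 10
y                          # 5 NA 5
```

R cannot tell whether the element at the NA position should consume one of the replacement values, so it refuses rather than guess.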
Statistical reports
Regardless of whether you go into academia or industry, you need to be able to present your findings.
And you should be able to do more than just present them: you should be able to reproduce them.
Data (.csv) +
Code (.r) +
Graphics (.png, .pdf) +
Written report (.tex)
In one directory
Working directory
Set your working directory to specify where files will be loaded from and saved to.
From the terminal (Linux or Mac): the working directory is the directory you’re in when you start R.
On Windows: File | Change dir.
On the Mac: ⌘-D
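The working directory can also be checked and changed from code with base R's getwd() and setwd(). A minimal sketch, with tempdir() standing in for your own report directory:

```r
getwd()                      # show the current working directory

old <- setwd(tempdir())      # switch; setwd() returns the previous directory
getwd()                      # now the temporary directory
# Relative paths such as "my-plot.pdf" are now resolved here

setwd(old)                   # switch back when done
```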
Data
So far we’ve just used built-in datasets.
Next week we’ll learn how to use external data.
Code
Workflow
At the end of each interactive session, you want a summary of everything you did
Two options:
Save everything that you did with savehistory("filename.r"), then remove the unimportant bits
Build up the important bits as you go
Up to you - I prefer the second
R editor
Linux: gedit (copy and paste - see website)
Windows: File | New Script (press F5 to send line)
Mac: File | New document (press command-enter to send)
Code is communication!
Code presentation
Use comments (#) to describe what you are doing and to create scannable headings in your code.
Every comma should be followed by a space, and every mathematical operator (+, -, =, *, / etc) should be surrounded by spaces. Parentheses do not need spaces.
Lines should be at most 80 characters. If you have to break up a line, indent the following piece.
qplot(table,depth,data=diamonds)
qplot(table,depth,data=diamonds)+xlim(50,70)+ylim(50,70)
qplot(table-depth,data=diamonds,geom="histogram")
qplot(table/depth,data=diamonds,geom="histogram",binwidth=0.01)+xlim(0.8,1.2)
# Table and depth -------------------------
qplot(table, depth, data = diamonds)
qplot(table, depth, data = diamonds) +
  xlim(50, 70) + ylim(50, 70)

# Is there a linear relationship?
qplot(table - depth, data = diamonds, geom = "histogram")

# This bin width seems the most revealing
qplot(table / depth, data = diamonds, geom = "histogram",
  binwidth = 0.01) + xlim(0.8, 1.2)
# Also tried: 0.05, 0.005, 0.002
Graphics
Saving graphics
# Uses size on screen:
ggsave("my-plot.pdf")
ggsave("my-plot.png")

# Specify size
ggsave("my-plot.pdf", width = 6, height = 6)

# Remember to set your working directory!
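When ggsave() is called without a plot, it saves the last plot displayed. In a script it can be safer to store the plot in a variable and pass it explicitly. A sketch under some assumptions: it requires the ggplot2 package, uses ggplot() rather than qplot() (which recent ggplot2 releases deprecate), and writes to tempdir() as a stand-in for your report directory.

```r
library(ggplot2)

# Build the plot and keep it in a variable instead of relying on the
# last plot displayed
p <- ggplot(diamonds, aes(table, depth)) + geom_point()

# tempdir() is a stand-in for your report directory
out <- file.path(tempdir(), "my-plot.pdf")
ggsave(out, plot = p, width = 6, height = 6)  # size in inches

file.exists(out)
```

Changing the file extension to .png would produce a raster image instead, since ggsave() picks the output format from the file name.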
PDF: vector based (can zoom in infinitely). Good for most plots.
PNG: raster based (made up of pixels). Good for plots with thousands of points.
Your turn
Recreate some of the graphics from previous lectures and save them.
Experiment with the scale and height and width settings.
Modify the template to include them.
Written report
LaTeX
We are going to use LaTeX, an open-source document typesetting system, to produce our reports.
It is widespread in statistics: if you ever write a journal article, you will probably write it in LaTeX.
(It is less useful if you’re not in grad school, but still an important skill.)
Edit-Compile-Preview
Edit: a text document with special formatting
Compile: to produce a pdf
Preview: with a pdf viewer
See web page for system specifics.
LaTeX
Template
Sections
Images
Figures and cross-references
Verbatim input (for code)
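To make the list above concrete, here is a rough sketch that writes a tiny report skeleton from R. It is not the course template: the file names report-sketch.tex and my-code.r, the section titles, and the label fig:table-depth are invented for illustration, and it assumes the standard graphicx and verbatim LaTeX packages. Backslashes are doubled because they sit inside R strings.

```r
# Each element of the vector is one line of the .tex file
tex <- c(
  "\\documentclass{article}",
  "\\usepackage{graphicx}  % provides \\includegraphics",
  "\\usepackage{verbatim}  % provides \\verbatiminput",
  "\\begin{document}",
  "",
  "\\section{Table and depth}",
  "Figure~\\ref{fig:table-depth} shows depth against table.",
  "",
  "\\begin{figure}[htbp]",
  "  \\centering",
  "  \\includegraphics[width=0.6\\linewidth]{my-plot.pdf}",
  "  \\caption{Depth against table for the diamonds data.}",
  "  \\label{fig:table-depth}",
  "\\end{figure}",
  "",
  "\\section{Code}",
  "\\verbatiminput{my-code.r}",
  "",
  "\\end{document}"
)

# tempdir() is a stand-in for your report directory
tex_file <- file.path(tempdir(), "report-sketch.tex")
writeLines(tex, tex_file)
```

Compiling the result with pdflatex would additionally need my-plot.pdf and my-code.r to exist next to the .tex file.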
Your turn
# Get the sample report
wget http://had.co.nz/stat405/resources/sample-report.zip
unzip sample-report.zip

cd sample-report
gedit template.tex &
pdflatex template.tex
evince template.pdf
# Experiment!
Your turn
If not on Linux, follow the instructions on the class website.
If you feel comfortable, start on homework 2.
Homework