R Lecture 5 Naomi Altman Department of Statistics.
Example: Regression
The data are available at http://www.stat.psu.edu/~jls/stat511/homework/body.dat
?read.table
body=read.table("body.txt",header=T)
plot(body$hips,body$weight)
plot(body$waist,body$weight)
?formula
lm.out=lm(weight~hips+waist,data=body)
attributes(lm.out)
Formulas
lm fits the regression of Y on a set of X variables.
The variable for Y and the predictors are denoted by a formula of the form Y ~ X1 + X2 + ..., as in weight~hips+waist above.
You can also use formulas in other contexts, e.g.
plot(weight~waist, data=body)
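A minimal sketch of the formula notation, assuming the body data are not at hand: the data frame `sim` below is simulated (the column names follow the lecture, the numbers are made up), and `group` is a hypothetical helper column.

```r
# Simulated stand-in for the body data; values are invented for illustration.
set.seed(1)
sim <- data.frame(hips = rnorm(50, 95, 6), waist = rnorm(50, 80, 8))
sim$weight <- -60 + 0.8 * sim$hips + 0.6 * sim$waist + rnorm(50, sd = 3)

# Response on the left of ~, predictors joined by + on the right.
fit <- lm(weight ~ hips + waist, data = sim)
coef(fit)   # one intercept and one slope per predictor

# The same notation is accepted by other functions, e.g. aggregate().
sim$group <- cut(sim$waist, 3)                     # hypothetical grouping column
group_means <- aggregate(weight ~ group, data = sim, FUN = mean)
group_means
```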
Object Oriented Programming in R
or how a bunch of smart programming types made R easier to use and harder to program - at least in the eyes of a statistician
In the bad old days
If I wanted to write a function similar to something already in R, I would edit the R code:
myFun=edit(Rfun)
myDensity=edit(density)
Sometimes the R code would call a C or C++ program, but the code for that is also available.
But now ...
plot
boxplot
rnorm
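What typing these names shows can be sketched as follows. The point is that a generic such as plot displays only its dispatcher, and the working code has to be looked up by generic and class name; `plot_lm` is just an illustrative variable name.

```r
# The body of a generic is only the dispatcher, not the working code.
body(plot)

# The function that does the work is retrieved by generic name and class.
plot_lm <- getS3method("plot", "lm")
is.function(plot_lm)

# rnorm is not generic; its R code simply hands off to compiled code.
body(rnorm)
```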
Classes and Generic Functions
I have already mentioned that one of the attributes an R object can have is a class.
A generic function is a function that captures the class of an object and then calls another function to do the actual work. If the function is called fun and the class is called cls, the function that does the work is (almost always) called fun.cls.
If there is no suitable fun.cls, then fun.default is used.
e.g.
plot(body$hips,body$weight)
plot(lm.out)
plot.default
plot.lm
methods(plot)
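The fun.cls naming rule can be tried on a toy generic. This is only a sketch: the generic `describe` and the class `cls` are hypothetical names, not part of R.

```r
# Hypothetical generic with one class-specific method and a default.
describe <- function(object, ...) UseMethod("describe")
describe.cls <- function(object, ...) "method for class cls"
describe.default <- function(object, ...) "default method"

x <- structure(list(value = 1), class = "cls")
describe(x)      # dispatches to describe.cls
describe(1:3)    # no method matches an integer vector, so describe.default runs
```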
Classes
Actually, a class can be a pair
c("first","second") in which "first" "inherits from", i.e. is a special case of, "second". In practice, this means that it has all the components of class "second" objects but possibly some additional ones.
If there is no fun.first, then the generic function will search for fun.second. Only if there is also no fun.second will fun.default be used.
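The search order can be checked with a small sketch, using the lecture's placeholder names fun, first and second (none of these exist in R until they are defined here).

```r
# A generic with a method for "second" and a default, but none for "first".
fun <- function(object, ...) UseMethod("fun")
fun.second <- function(object, ...) "fun.second"
fun.default <- function(object, ...) "fun.default"

obj <- structure(list(), class = c("first", "second"))
fun(obj)             # no fun.first, so fun.second is used
fun(unclass(obj))    # a plain list with no class attribute: fun.default is used
```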
e.g. plot
uses plot.lm on an object with class "lm"
and also on an object with class c("glm","lm")
'inherits' indicates whether its first argument inherits from any of the classes specified in the 'what' argument
glm.out=glm(weight~hips+waist,data=body)
class(glm.out)
"glm" "lm"
inherits(lm.out,"lm")
inherits(glm.out,"lm")
inherits(lm.out,"glm")
inherits(glm.out,"glm")
plot.lm
plot.glm
plot(glm.out)
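A runnable version of these checks, assuming the built-in cars data as a stand-in for the body data (the names `lm.cars` and `glm.cars` are made up for this sketch):

```r
# Built-in cars data used only so the example runs without the body file.
lm.cars  <- lm(dist ~ speed, data = cars)
glm.cars <- glm(dist ~ speed, data = cars)

class(lm.cars)
class(glm.cars)

inherits(lm.cars, "lm")     # TRUE
inherits(glm.cars, "lm")    # TRUE: "glm" objects inherit from "lm"
inherits(lm.cars, "glm")    # FALSE: inheritance runs one way only
inherits(glm.cars, "glm")   # TRUE
```

Because the glm fit inherits from "lm", plot() on it can fall back to plot.lm when no plot.glm method is available.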
unclass
If you remove the class, most objects are just lists.
lm.out
unclass(lm.out)
For example, the "lm" objects are lists with the following components:
"coefficients" "residuals" "effects" "rank" "fitted.values" "assign" "qr" "df.residual" "xlevels" "call" "terms" "model"
Some of these components are obvious. Some of them are matrix computations that can be used to compute, e.g., the leverages and Cook's distance (notice that these have not been stored). Some of them may be empty - they are used primarily when the predictor variable is a factor (ANOVA).
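As a sketch of how the stored pieces are used, the leverages and Cook's distances can be computed on demand from a fitted model. The built-in cars data stand in for the body data here, which is an assumption made only so the example runs.

```r
fit <- lm(dist ~ speed, data = cars)

names(unclass(fit))          # the components of the underlying list

lev <- hatvalues(fit)        # leverages, computed from the fit, not stored in it
cd  <- cooks.distance(fit)   # Cook's distances, likewise computed on request

# The leverages can also be recovered directly from the stored QR decomposition.
lev_qr <- rowSums(qr.Q(fit$qr)^2)
all.equal(unname(lev), unname(lev_qr))
```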
Why use classes
For the user: less to think about
e.g. you can try generic functions like plot and summary with any output
For the programmer: provides a framework
e.g. you might think about having a plot.myfun and summary.myfun for the function you are writing
also, you can use inheritance so that you do not need to write your own functions
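A minimal sketch of that framework, assuming a made-up function `myfun` that returns an object of class "myfun". A print method is shown instead of a plot method to keep the example short; the idea is the same.

```r
# Hypothetical function whose result carries its own class.
myfun <- function(x) {
  out <- list(mean = mean(x), sd = sd(x), n = length(x))
  class(out) <- "myfun"
  out
}

# Methods named generic.class are picked up automatically by the generics.
summary.myfun <- function(object, ...) {
  c(estimate = object$mean, std.error = object$sd / sqrt(object$n))
}
print.myfun <- function(x, ...) {
  cat("myfun fit: mean", format(x$mean), "from", x$n, "observations\n")
  invisible(x)
}

res <- myfun(c(2, 4, 6, 8))
res            # auto-printing calls print.myfun
summary(res)   # calls summary.myfun
```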
Generic Functions
Functions that act on many different types of objects are termed "generic functions".
Examples include:
plot print
summary coefficients
anova residuals
Generic Functions
We have already seen that generic functions behave differently for different classes. The idea is that the user should not have to remember a lot of different function names.
Generic functions are a "good thing" when you want R to do what someone else thinks it should do and can be a "bad thing" when you are trying to do something else with your data.
Generic Functions
The form of the generic function "genfun" is
genfun=function (object, ...) {
UseMethod("genfun")
}
Generic Functions
We can use UseMethod to give aliases to the same function.
genfun=function (object, ...) {
UseMethod("genfun")
}
gen=function (object, ...) {
UseMethod("genfun")
}
gfun=function (object, ...) {
UseMethod("genfun")
}
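A short sketch of the alias idea: both functions below dispatch to the same genfun methods. The two methods are invented for the example.

```r
genfun <- function(object, ...) UseMethod("genfun")
gen    <- function(object, ...) UseMethod("genfun")   # alias: same dispatch target

# Illustrative methods only.
genfun.character <- function(object, ...) "genfun.character"
genfun.default   <- function(object, ...) "genfun.default"

genfun("a")   # uses genfun.character
gen("a")      # the alias reaches the same method
gen(1)        # no method for numbers here, so genfun.default is used
```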
Generic Functions
If you want an argument other than the first to be the one whose class controls the generic function, then that argument must be passed to UseMethod as its second argument.
genfun=function(x,y,z,...){
UseMethod("genfun",z)
}
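To see the effect, here is a sketch in which only the class of z decides which method runs. The two methods are made up, and the built-in cars data frame is used just as a convenient object with a class.

```r
genfun <- function(x, y, z, ...) UseMethod("genfun", z)

# Illustrative methods; dispatch looks at z, not at x.
genfun.data.frame <- function(x, y, z, ...) "z is a data frame"
genfun.default    <- function(x, y, z, ...) "default"

genfun(1, 2, cars)   # z is a data frame, so genfun.data.frame is used
genfun(cars, 2, 3)   # x is a data frame but z is not, so genfun.default is used
```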
Generic Functions
If UseMethod finds that the calling object inherits from a class, it searches for a function "genfun.class". If there is no function that matches the class, it looks through the inheritance list. If there is no match, or no class, the function "genfun.default" is used.
Generic Functions
There is a lot more on this in the "S Poetry" manual - it looks very complete to me.
I have been writing programs in S/R since 1981, and have not needed to create classes or methods but ...
Generic Functions
I have often used an existing function to create new functions - I have been confused by failing to understand generic functions (especially "summary" and "print").
One way to become well-known is to distribute your methodology as an R package. To be distributed from CRAN or other project repositories, your package must adhere to R programming standards.
Generic Functions
Some of the newer packages (particularly packages for bioinformatics) rely heavily on the use of Generic Functions, and you can never understand what they are doing without understanding at least the basics of this material.
Slots
I was not able to find an intuitive definition for "slot" so this is my own heuristic.
An object is a list with a class.
A slot is a function that extracts data from an object.
It may be one of the elements stored in the object, or a derived data element.
Slots
For example: an lm object includes the list:
"coefficients" "residuals" "effects" "rank" "fitted.values" "assign" "qr" "df.residual" "xlevels" "call" "terms" "model"
We might build a new class, "Elm" (extended "lm")
Slots
Suppose we wanted to write a method that draws a histogram of any of the dependent variable, residuals, studentized residuals, or fitted values.
We could have a method of the form:
hist.Elm=function(object,slot)
Our slots would be: dependent, residuals, student, fitted
Slots
If we set class(lm.out)=c("Elm","lm")
then
hist(lm.out,residuals) would extract the residuals from the list and draw the histogram.
hist(lm.out,student) would compute the studentized residuals (which are not stored) and draw the histogram.
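One way such a method might look is sketched below, under several assumptions: the slot is passed as a character string, rstudent() (externally studentized residuals) is taken as the meaning of "student", the first argument is named x to match the hist generic, and the built-in cars data replace the body data.

```r
# Sketch of a hist method for the hypothetical class "Elm".
hist.Elm <- function(x, slot = c("dependent", "residuals", "student", "fitted"), ...) {
  slot <- match.arg(slot)
  values <- switch(slot,
    dependent = model.response(model.frame(x)),  # stored in the model component
    residuals = residuals(x),                    # stored in the object
    student   = rstudent(x),                     # derived, not stored
    fitted    = fitted(x))                       # stored in the object
  hist(values, main = paste("Histogram of", slot), xlab = slot, ...)
}

fit <- lm(dist ~ speed, data = cars)
class(fit) <- c("Elm", "lm")     # "Elm" extends "lm", so lm methods still apply

h <- hist(fit, "student")        # dispatches to hist.Elm and draws the histogram
```

Because "lm" is kept as the second class, functions such as residuals() and rstudent() still find their lm methods inside hist.Elm.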
Slots
By convention, the slots of an object can be extracted either by:
objectname@slotname
or
slotname(objectname)
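In R's S4 system, slots are named components declared with setClass, and the @ operator extracts them. The sketch below uses a made-up class "Fit" and a made-up accessor `coefs`; an accessor of the form slotname(objectname) works only if the class author has defined one.

```r
library(methods)

# Hypothetical S4 class with two declared slots.
setClass("Fit", slots = c(coefficients = "numeric", residuals = "numeric"))
obj <- new("Fit", coefficients = c(a = 1, b = 2), residuals = c(0.5, -0.5))

obj@residuals            # objectname@slotname
slot(obj, "residuals")   # the same extraction, with the slot given by name
slotNames("Fit")         # the slots declared for the class

# Hypothetical accessor function, defined as an S4 generic with one method.
setGeneric("coefs", function(object) standardGeneric("coefs"))
setMethod("coefs", "Fit", function(object) object@coefficients)
coefs(obj)
```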
Slots
Again, I have used S/R for many years without writing or even encountering slots.
But some of the recent packages use this programming concept, so it is important to understand it.
My understanding is that slots are used primarily in areas like data-mining and microarrays, where the data storage requirements are large.
Learning to Use Objects and other Extensions
Calling C or C++ from R: Writing R Extensions
Object oriented programming in R (S3 protocol): R Language Definition
Object oriented programming in R (S4 protocol): R Internals