Bob Muenchen, Author of R for SAS and SPSS Users, Co-Author of R for Stata Users
[email protected], http://r4stats.com
Copyright © 2010, 2011, Robert A Muenchen. All rights reserved.
What is R?
R’s Advantages
R’s Disadvantages
Installing and Maintaining R
Ways of Running R
An Example Program
Where to Learn More
“The most powerful statistical computing language on the planet.” -Norman Nie, Developer of SPSS
Language + package + environment for graphics and data analysis
Free and open source
Created by Ross Ihaka & Robert Gentleman in 1996 & extended by many more
An implementation of the S language by John Chambers and others
R has 4,950 add-ons, or nearly 100,000 procs
Source: http://r4stats.com/popularity
1. Data input & management (data step)
2. Analytics & graphics procedures (proc step)
3. Macro language
4. Matrix language
5. Output management systems (ODS/OMS)
R integrates these all seamlessly.
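A rough sketch of what that integration can look like: all five jobs done in one short R session. The data, variable names, and the helper fit_one() are invented for illustration and are not from the original slides.

```r
# Hypothetical data, invented for illustration only
set.seed(1)
d <- data.frame(x = 1:10)
d$y <- 2 * d$x + rnorm(10)

# 1. Data management (like a data step): create a new variable
d$logx <- log(d$x)

# 2. Analytics (like a proc step): fit a regression
fit <- lm(y ~ logx, data = d)

# 3. "Macro" work: an ordinary function written in the same language
fit_one <- function(predictor, data) {
  lm(reformulate(predictor, response = "y"), data = data)
}
fits <- lapply(c("x", "logx"), fit_one, data = d)

# 4. Matrix language: solve the normal equations for the same model
X <- cbind(1, d$logx)
beta <- solve(t(X) %*% X, t(X) %*% d$y)

# 5. Output management: results are objects you can pick apart
coefs <- coef(summary(fit))            # a numeric matrix
slope_p <- coefs["logx", "Pr(>|t|)"]   # one p-value, ready for reuse
```

Each step reads and writes ordinary R objects, so no separate macro, matrix, or output-management language is needed to pass results along.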
* SAS Approach;
DATA A; SET A;
logX = log(X);
PROC REG;
MODEL Y = logX;
# R Approach
lm( Y ~ log(X) )
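To see that the one-line R version gives the same fit as a two-step version, here is a small sketch on simulated data. The data values and object names are assumptions made up for this example.

```r
# Hypothetical data, invented for illustration only
set.seed(42)
X <- runif(50, min = 1, max = 100)
Y <- 3 + 2 * log(X) + rnorm(50, sd = 0.5)

# Two-step approach: create the variable first, then model it
logX <- log(X)
fit_two_step <- lm(Y ~ logX)

# One-step approach: transform inside the model formula
fit_inline <- lm(Y ~ log(X))

# The estimates agree; only the coefficient label differs
unname(coef(fit_two_step))
unname(coef(fit_inline))
names(coef(fit_inline))
```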
Vast selection of analytics & graphics
New methods are available sooner
Many packages can run R (SAS, SPSS, Excel…)
Its object orientation “does the right thing”
Its language is powerful & fully integrated
Procedures you write are on an equal footing
It is the universal language of data analysis
It runs on any computer
Being open source, you can study and modify it
It is free
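* Using SAS;
PROC TTEST DATA=classroom;
  CLASS gender;
  VAR score;
# In R
t.test(score ~ gender, data=classroom)
# The non-formula form of t.test() has no data= argument,
# so name the data frame for each variable
t.test(classroom$posttest, classroom$pretest, paired=TRUE)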
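A minimal sketch of both tests on a small data frame. The classroom values below are invented for illustration. Note that only the formula form of t.test() accepts data=; for the paired call, with() is one way to supply the data frame.

```r
# Hypothetical classroom data, invented for illustration only
classroom <- data.frame(
  gender   = factor(c("f", "f", "f", "f", "m", "m", "m", "m")),
  pretest  = c(70, 65, 80, 75, 60, 72, 68, 77),
  posttest = c(78, 70, 85, 80, 66, 75, 74, 83)
)
classroom$score <- classroom$posttest

# Independent-samples test: the formula method accepts data=
independent <- t.test(score ~ gender, data = classroom)

# Paired test: the default method has no data= argument,
# so evaluate the call inside the data frame with with()
paired <- with(classroom, t.test(posttest, pretest, paired = TRUE))

independent$method
paired$method
```

The same function name covers both designs, and the object it returns can be printed or picked apart, for example with paired$p.value.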
Language is somewhat harder to learn
Help files are sparse & complex
Must find R and its add-ons yourself
Graphical user interfaces not as polished
Most R functions hold data in main memory
  Rule-of-thumb: 10 million values per gigabyte
  SAS/SPSS: billions of records
Several efforts underway to break R’s memory limit, including Revolution Analytics’ distribution
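The memory point can be checked on your own machine. This sketch measures how much memory one million numeric values occupy; the object names are made up for the example. Raw storage for 10 million double-precision values is about 80 MB, so the rule of thumb above presumably leaves generous headroom for the working copies made during an analysis.

```r
# One million double-precision values
x <- numeric(1e6)

# Bytes used by the vector itself (8 bytes per value plus a small header)
bytes <- as.numeric(object.size(x))
bytes / 1e6          # megabytes for one million values

# Raw storage for the slide's 10 million values, in megabytes
raw_mb <- 10e6 * 8 / 1e6
raw_mb
```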
Base R plus Recommended Packages like:
  Base SAS, SAS/STAT, SAS/GRAPH, SAS/IML Studio
  SPSS Stat. Base, SPSS Stat. Advanced, Regression
Tested via extensive validation programs
But add-on packages written by…
  Professor who invented the method?
  A student interpreting the method?
Email support is free, quick, 24-hours: www.r-project.org/mail.html
Stackoverflow.com
Quora.com
Crossvalidated: stats.stackexchange.com/questions/tagged/r
Phone support available commercially
1. Go to cran.r-project.org, the Comprehensive R Archive Network
2. Download binaries for Base & run
3. Add-ons:
   install.packages("myPackage")
4. To update:
   update.packages()
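Steps 3 and 4 can be wrapped so a script installs an add-on only when it is missing. This is a sketch, not part of the original slides: install_if_missing() is a hypothetical helper name, and "stats" is used in the demo call only because it ships with R.

```r
# Hypothetical helper: install a package only if it is not already present.
# Returns TRUE (invisibly) when the package is available afterwards.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" is part of every R installation, so nothing is downloaded here
install_if_missing("stats")

# To update everything without a prompt for each package, uncomment:
# update.packages(ask = FALSE)
```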
Comprehensive R Archive Network
Crantastic.com
Inside-R.org
R4Stats.com
Run code interactively
Submit code from Excel, SAS, SPSS,…
Point-n-click using Graphical User Interfaces (GUIs)
Batch mode
run ExportDataSetToR("mydata");
submit/r;
mydata$workshop <- factor(mydata$workshop)
summary(mydata)
endsubmit;
GET FILE='mydata.sav'.
BEGIN PROGRAM R.
mydata <- spssdata.GetDataFromSPSS(
  variables = c("workshop gender q1 to q4"),
  missingValueToNA = TRUE,
  row.label = "id" )
summary(mydata)
END PROGRAM.
28
29 30
31 32
33 34
A company focused on R development & support
Run by SPSS founder Norman Nie
Their enhanced distribution of R: Revolution R Enterprise
Free for colleges and universities, including for outside consulting
mydata <- read.csv("mydata.csv")
print(mydata)
mydata$workshop <- factor(mydata$workshop)
summary(mydata)
plot( mydata$q1, mydata$q4 )
myModel <- lm( q4~q1+q2+q3, data=mydata )
summary( myModel )
anova( myModel )
plot( myModel )
> mydata <- read.csv("mydata.csv")
> print(mydata)
  workshop gender q1 q2 q3 q4
1        1      f  1  1  5  1
2        2      f  2  1  4  1
3        1      f  2  2  4  3
4        2   <NA>  3  1 NA  3
5        1      m  4  5  2  4
6        2      m  5  4  5  5
7        1      m  5  3  4  4
8        2      m  4  5  5  5
> mydata$workshop <-factor(mydata$workshop)
> summary(mydata)
 workshop  gender
 1:4       f   :3
 2:4       m   :4
           NA's:1
       q1             q2             q3              q4
 Min.   :1.00   Min.   :1.00   Min.   :2.000   Min.   :1.00
 1st Qu.:2.00   1st Qu.:1.00   1st Qu.:4.000   1st Qu.:2.50
 Median :3.50   Median :2.50   Median :4.000   Median :3.50
 Mean   :3.25   Mean   :2.75   Mean   :4.143   Mean   :3.25
 3rd Qu.:4.25   3rd Qu.:4.25   3rd Qu.:5.000   3rd Qu.:4.25
 Max.   :5.00   Max.   :5.00   Max.   :5.000   Max.   :5.00
                               NA's   :1.000
> myModel <- lm(q4 ~ q1+q2+q3, data=mydata)
> summary(myModel)
Call:
lm(formula = q4 ~ q1 + q2 + q3, data = mydata)

Residuals:
      1       2       3       5       6       7       8
-0.3113 -0.4261  0.9428 -0.1797  0.0765  0.0225 -0.1246

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -1.3243     1.2877  -1.028    0.379
q1            0.4297     0.2623   1.638    0.200
q2            0.6310     0.2503   2.521    0.086
q3            0.3150     0.2557   1.232    0.306

Multiple R-squared: 0.9299, Adjusted R-squared: 0.8598
F-statistic: 13.27 on 3 and 3 DF, p-value: 0.03084
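If you do not have mydata.csv, the example can be tried by typing the data in directly. This sketch copies the values from the print(mydata) listing shown earlier. Declaring gender as a factor explicitly is an addition, made because recent versions of R no longer convert text columns to factors automatically.

```r
# Data typed in from the print(mydata) listing
mydata <- data.frame(
  workshop = c(1, 2, 1, 2, 1, 2, 1, 2),
  gender   = factor(c("f", "f", "f", NA, "m", "m", "m", "m")),
  q1 = c(1, 2, 2, 3, 4, 5, 5, 4),
  q2 = c(1, 1, 2, 1, 5, 4, 3, 5),
  q3 = c(5, 4, 4, NA, 2, 5, 4, 5),
  q4 = c(1, 1, 3, 3, 4, 5, 4, 5)
)
mydata$workshop <- factor(mydata$workshop)

summary(mydata)

# lm() drops observation 4 because q3 is missing there,
# which is why the residuals skip from 3 to 5
myModel <- lm(q4 ~ q1 + q2 + q3, data = mydata)
summary(myModel)
```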
R for SAS and SPSS Users, Muenchen
R for Stata Users, Muenchen & Hilbe
R Through Excel: A Spreadsheet Interface for Statistics, Data Analysis, and Graphics, Heiberger & Neuwirth
Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery, Williams
R is powerful, extensible, free
Download it from CRAN
Academics download Revolution R Enterprise for free at www.revolutionanalytics.com
You run it many ways & from many packages
Several graphical user interfaces are available
R's programming language is the way to access its full power
Slides: r4stats.com/misc/webinar
Presentation: bit.ly/R-sas-spss