Getting Started with R
© 2014 Impact Analytix, LLC
quickly make a positive impact
R Part 1: Getting Started, Language Basics and Data Visualization
Jen Underwood
Founder & Principal Consultant
Impact Analytix, LLC
Impact Analytix, LLC
o Impact Analytix, LLC is a boutique business intelligence and predictive analytics firm based in Tampa, Florida.
o Jen Underwood, Founder & Principal Consultant
  • ~20 years of business intelligence industry experience
  • Former Global Microsoft BI and Analytics Technical Product Manager and seasoned Big-Four Consulting BI Practice Lead
  • Passionate technology blogger, evangelist and volunteer: TDWI, BeyeNETWORK, PASS, SharePoint Conference, and Microsoft TechEd
  • Bachelor of Business Administration degree, University of Wisconsin Milwaukee
  • Post Graduate Certificate, Computer Science - Data Mining, University of California, San Diego
Getting Started
Introduction
o R is a popular open source statistical platform
o Base install with ~8 packages
o Extended via packages
  o R package foreign allows for reading data from SAS, SPSS and others
  o RODBC, xlsx and many others for connectivity
o Huge and growing global developer community
o Commonly used R Tools
  o The R Project for Statistical Computing: download the latest R from CRAN
    http://www.r-project.org
  o R Studio IDE, popular for development
    http://www.rstudio.com
    Free, open source, and works great on Windows, Mac, and Linux
R
o RGUI.exe is the default user interface to the command line language
o The R Console prompt is >
o To quit type q(); to clear the console press Ctrl + L
o Case-sensitive: "jen", "Jen" and "jEn" are three different names
o For help use ?, help() or help(function name)
o Use print(variable name) to see variable content, or type the variable at the command prompt and hit Enter
o Use # for comments in R
o Use <- for assigning a variable

> x <- "hello world"
> x
[1] "hello world"
> 1+2
[1] 3
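A minimal sketch of the points above (case sensitivity, comments, and the two ways of showing a variable). The variable names are made up for illustration.

```r
# Names are case-sensitive: these are two different variables
jen <- "lower case"
Jen <- "capitalized"

print(jen)   # print() shows a variable's content
Jen          # at the console, typing the name alone does the same

# Compare the two variables; they hold different values
same <- identical(jen, Jen)
print(same)
```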
R Studio
Source R files
Console for interactive work
Variables and command history
Installed packages, help, other goodies
Development Environment
o Check the working directory with getwd(); set it with setwd("path")
o Save the workspace with save() or save.image(), or use the menu
o Restore it with load("file name")
o install.packages("package name") or use the Tools > Install Packages menu
o Popular repository for R packages
  http://cran.r-project.org
o Use library("package name") to load a package only when needed to save memory
o detach(package:packagename) detaches the package; add unload = TRUE to also unload it
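The workflow above can be sketched as follows. This is an illustration only: it assumes a temporary directory as the working directory, a made-up object called scores, and the tools package (shipped with R) as the package to load and detach.

```r
# Work in a temporary directory so the real workspace is not touched
old_wd <- getwd()
setwd(tempdir())

scores <- c(90, 85, 77)             # hypothetical object to persist
save(scores, file = "demo.RData")   # write it to disk

rm(scores)                          # remove it from memory
load("demo.RData")                  # restore it from the file
print(scores)

# Load a package only while it is needed, then detach it again
library("tools")
print(file_ext("demo.RData"))       # file_ext() comes from the tools package
detach("package:tools")

setwd(old_wd)                       # restore the original working directory
```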
Development Environment
o Objects
  o Variables, arrays of numbers, strings, functions, structures
  o Use memory
  o objects() to see them
  o rm(object name) to remove them
  o If saved, stored in the working directory in a file called .RData
o R function calls: function.name(arguments, options)
o Vectors, Lists, Arrays or Matrices, and Data Frames
o A Data Frame in R is like a database table
o Many ways to get data into and out of R
o R Data Import/Export Help has a plethora of options to work with data
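A short sketch of creating, listing and removing objects. The object names and values are invented for the example.

```r
heights <- c(1.70, 1.82, 1.65)                  # numeric vector
people  <- data.frame(name   = c("Ann", "Bob", "Cy"),
                      height = heights)         # data frame: rows are cases, columns are variables

print(objects())           # lists the objects in the current workspace
rm(heights)                # removes the vector; the data frame keeps its own copy
print(exists("heights"))   # checks whether the name can still be found
print(nrow(people))        # the data frame is unaffected
```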
Reading from Files
o Import Dataset menu in R Studio
o Copy from the clipboard with read.delim("clipboard") (on Windows) or using scan()
o read.table and its variants read a file in table format and create a data frame from it, with cases corresponding to lines and variables to fields in the file

read.csv(file, header = TRUE, sep = ",", quote = "\"",
         dec = ".", fill = TRUE, comment.char = "", ...)

read.delim(file, header = TRUE, sep = "\t", quote = "\"",
           dec = ".", fill = TRUE, comment.char = "", ...)

read.xlsx(file, sheetIndex, sheetName = NULL, rowIndex = NULL,
          startRow = NULL, endRow = NULL, colIndex = NULL,
          as.data.frame = TRUE, header = TRUE, colClasses = NA,
          keepFormulas = FALSE, encoding = "unknown", ...)

o Other packages like gdata or XLConnect

wb <- loadWorkbook("C:\\Users\\Jen\\Documents\\BikeBuyers.xlsx")
readWorksheet(wb, sheet = "BikeBuyers", startRow = 0, endRow = 10,
              startCol = 0, endCol = 0)
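To try read.csv without an existing file, the sketch below first writes a few made-up rows to a temporary file and then reads them back. The column names and values are invented and are not the BikeBuyers data.

```r
# Create a small CSV file to read (illustrative data only)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("CustomerId,Age,BikeBuyer",
             "1,34,Yes",
             "2,51,No",
             "3,28,Yes"),
           csv_path)

# header = TRUE uses the first line as column names;
# stringsAsFactors = FALSE keeps text columns as character
buyers <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

str(buyers)   # one row per line of the file, one column per field
```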
Connecting to a Database
o RODBC package for database connectivity, also many vendor specific R packages available
o Other packages like sqlutils for database query and procedure calling functionality
odbcConnect(dsn, uid = "", pwd = "", ...)
odbcDriverConnect(connection = "", case, believeNRows = TRUE,
colQuote, tabQuote = colQuote,
interpretDot = TRUE, DBMSencoding = "",
rows_at_time = 100, readOnlyOptimize = FALSE)
odbcConnectAccess(access.file, uid = "", pwd = "", ...)
odbcConnectExcel(xls.file, readOnly = TRUE, ...)
Querying a Database

Function                                                      Description
odbcConnect(dsn, uid = "", pwd = "")                          Open a connection to an ODBC database
sqlFetch(channel, sqtable)                                    Read a table from an ODBC database into a data frame
sqlQuery(channel, query)                                      Submit a query to an ODBC database and return the results
sqlSave(channel, mydf, tablename = sqtable, append = FALSE)   Write or update (append = TRUE) a data frame to a table in the ODBC database
sqlDrop(channel, sqtable)                                     Remove a table from the ODBC database
close(channel)                                                Close the connection
Querying a Database
# RODBC Example
# import data from a DBMS
library(RODBC)
# "mydsn", "Jen" and "demo" stand in for a configured DSN and its credentials
myconn <- odbcConnect("mydsn", uid = "Jen", pwd = "demo")
demoDf <- sqlQuery(myconn, "select top 10 * from dbo.FactInternetSales")
close(myconn)
Writing to Files
o writeClipboard exports vector data (Windows only)
o write.table converts the object to a data frame and prints it to a file or connection

write.table(x, file = "foo.csv", sep = ",", col.names = NA, qmethod = "double")
write.csv(x, file = "foo.csv")
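A small round trip, writing a data frame with write.csv and reading it back. The sales data frame and the file location are invented for the example.

```r
sales <- data.frame(region = c("East", "West"),
                    units  = c(120L, 95L),
                    stringsAsFactors = FALSE)

out_path <- file.path(tempdir(), "sales.csv")

# row.names = FALSE leaves out the row-number column
write.csv(sales, file = out_path, row.names = FALSE)

# Read the file back to check what was written
round_trip <- read.csv(out_path, stringsAsFactors = FALSE)
print(round_trip)
```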
R Language Basics
Basics
o Expressions: 1+1, 10*10, "Hello World"
o Logical Values: TRUE or FALSE
o Variables: store values into a variable using <-
  x <- 42
o Functions: a name and one or more arguments in parentheses
  sum(1,3,4)
  sqrt(16)
  help(plot)
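The basics above combine naturally. A brief sketch, with variable names chosen only for illustration:

```r
answer <- 10 * 10          # store the value of an expression
is_big <- answer > 50      # comparisons produce logical values
total  <- sum(1, 3, 4)     # function call with three arguments
root   <- sqrt(16)         # function call with one argument

print(answer)
print(is_big)
print(total)
print(root)
```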
Basics
o Vectors: numbers, strings, logical values, or any other type, as long as they're all the same type; c() (combine) creates a new vector by combining a list of values
  c(4, 7, 9)
o Matrices: 2-dimensional arrays
  matrix(0, 3, 4)
o Data Frames: similar to a database table or an Excel spreadsheet
  demoDF <- data.frame(word = c("king", "joy", "pen"))
  demoDF2 <- read.csv("C:\\Users\\Jen\\Documents\\BikeBuyers.csv")
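A quick sketch of the three structures. It also shows what happens when types are mixed in c(): R converts everything to one common type. The column names are made up.

```r
v     <- c(4, 7, 9)            # numeric vector
mixed <- c(1, "two", TRUE)     # mixed input is coerced to a single type (character)
m     <- matrix(0, 3, 4)       # 3 rows, 4 columns, filled with 0

words <- c("king", "joy", "pen")
df <- data.frame(word    = words,
                 n_chars = nchar(words),   # number of characters in each word
                 stringsAsFactors = FALSE)

print(class(mixed))
print(dim(m))
str(df)
```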
R Data Visualization
Graphic Packages
o R graphics functions can be grouped into three types:
  o High level plotting functions that create a graph, often with axis labels and titles
  o Low level plotting functions that allow additional information to be added to an existing graph, or that allow graphs to be drawn from scratch
  o Interactive graphics functions that allow extraction of information
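A sketch of the first two types using the built-in cars data set. It draws to a temporary PDF file so it also runs without a screen; at an interactive console the pdf() and dev.off() lines can be dropped. The titles, labels and colors are arbitrary choices.

```r
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)                      # send graphics output to a file

# High level: creates the whole graph with axes, labels and a title
plot(cars$speed, cars$dist,
     main = "Stopping distance by speed",
     xlab = "Speed (mph)", ylab = "Distance (ft)")

# Low level: add information to the existing graph
abline(lm(dist ~ speed, data = cars), col = "red")   # fitted straight line
text(8, 100, "linear fit", col = "red")              # annotation at x = 8, y = 100

dev.off()                           # close the file

# Interactive functions such as identify() and locator() need an
# on-screen device, so they are not called here.
```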
Graphic Examples
> example(plot)
> example(barplot)
> example(boxplot)
> example(dotchart)
> example(coplot)
> example(hist)
> example(fourfoldplot)
> example(stars)
> example(image)
> example(contour)
> example(filled.contour)
> example(persp)
Graphing with Sample Data Sets
o R comes with a package of base datasets
  library(help = "datasets")
o Use the print function to explore content
  print(iris)
o Start to play/explore using R visualizations
  plot(iris$Petal.Length, iris$Petal.Width)

  install.packages("ggplot2")
  library("ggplot2")
  qplot(Sepal.Length, Petal.Length, data = iris, color = Species)
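For exploring iris without installing anything, the sketch below uses base R only. It colors points by species, similar in spirit to the qplot call, and writes to a temporary PDF so it runs without a screen. The summary step and the styling choices are additions for illustration.

```r
print(head(iris))   # first rows instead of the full 150

# Mean petal length per species
means <- tapply(iris$Petal.Length, iris$Species, mean)
print(means)

plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
plot(iris$Petal.Length, iris$Petal.Width,
     col = as.integer(iris$Species),   # one color per species
     pch = 19,
     xlab = "Petal length", ylab = "Petal width")
legend("topleft", legend = levels(iris$Species), col = 1:3, pch = 19)
dev.off()
```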
Additional R Resources
Resources
o Free O’Reilly R School
  http://tryr.codeschool.com
o CRAN Intro
  http://cran.r-project.org/doc/manuals/R-intro.html
o R Tutor
  http://www.r-tutor.com/r-introduction
o One Page Survival Guide
  http://www.datasciencecentral.com/profiles/blogs/one-page-r-a-survival-guide-to-data-science-with-r
o Popular Books
  o R Cookbook, R in a Nutshell, R for Business Analytics
www.impactanalytix.com
quickly make a positive impact