Getting Started with R

quickly make a positive impact
R Part 1: Getting Started, Language Basics and Data Visualization
Jen Underwood, Founder & Principal Consultant, Impact Analytix, LLC
813.435.5344
[email protected]
www.impactanalytix.com

Page 1: Getting Started with R

© 2014 Impact Analytix, LLC

quickly make a positive impact

R Part 1: Getting Started, Language Basics and Data Visualization

Jen Underwood
Founder & Principal Consultant
Impact Analytix, LLC

[email protected]

Page 2: Getting Started with R

Impact Analytix, LLC

o Impact Analytix, LLC is a boutique business intelligence and predictive analytics firm based in Tampa, Florida.

o Jen Underwood, Founder & Principal Consultant
  • ~20 years of business intelligence industry experience
  • Former Global Microsoft BI and Analytics Technical Product Manager and seasoned Big-Four Consulting BI Practice Lead
  • Passionate technology blogger, evangelist and volunteer: TDWI, BeyeNETWORK, PASS, SharePoint Conference, and Microsoft TechEd
  • Bachelor of Business Administration degree, University of Wisconsin Milwaukee
  • Post Graduate Certificate, Computer Science - Data Mining, University of California, San Diego

Page 3: Getting Started with R

Getting Started

Page 4: Getting Started with R

Introduction

o R is a popular open source statistical platform
o Base install with ~8 packages
o Extended via packages
o The foreign package allows for reading data from SAS, SPSS and others
o RODBC, xlsx and many others for connectivity
o Huge and growing global developer community

o Commonly used R tools
  • The R Project for Statistical Computing: download the latest R from CRAN, http://www.r-project.org
  • RStudio IDE, popular for development, http://www.rstudio.com
    Free, open source, and works great on Windows, Mac, and Linux

Page 5: Getting Started with R

R

o Rgui.exe is the default user interface to the command-line language
o The R Console prompt is >
o To quit, type q(); to clear the console, press Ctrl + L
o Case-sensitive: "jen", "Jen" and "jEn" are three different names
o For help, use ?, help() or help(function name)
o Use print(variable name) to see a variable's content, or type the variable at the command prompt and hit Enter
o Use # for comments in R
o Use <- to assign a value to a variable

> x <- "hello world"
> x
[1] "hello world"
> 1+2
[1] 3
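The console basics above can be tried in one short session. This is a minimal sketch; the variable names are only illustrative, and the help calls are commented out because they open a help viewer.

```r
# Assign with <- and inspect by typing the name or calling print()
x <- "hello world"
print(x)

# Names are case-sensitive: jen, Jen and jEn are three separate variables
jen <- 1
Jen <- 2
jEn <- 3
total <- jen + Jen + jEn

# Open the help page for a function (uncomment in an interactive session)
# ?print
# help(print)
```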

Page 6: Getting Started with R

RStudio

Source R files

Console for interactive work

Variables and command history

Installed packages, help, other goodies

Page 7: Getting Started with R

Development Environment

o Check the working directory with getwd(); set it with setwd("path")
o save() or save.image(), or use the menu
o load("file name")
o install.packages("package name") or use the Tools > Install Packages menu
o Popular repository for R: http://cran.r-project.org
o Use library("package name") to load a package only when needed to save memory
o detach("package:package name") unloads the package
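A short sketch of the workflow above, assuming a temporary folder as the working directory. The object name scores and the file name scores.RData are made up for illustration; the tools package is used because it ships with R, so no install.packages() call is needed.

```r
# getwd() reports the working directory, setwd() changes it
old_dir <- getwd()
setwd(tempdir())

# Save one object to a file, remove it, then restore it with load()
scores <- c(90, 85, 77)
save(scores, file = "scores.RData")
rm(scores)
load("scores.RData")

# Attach a package only when needed, then detach it again
library("tools")
detach("package:tools")

# Return to the original working directory
setwd(old_dir)
```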

Page 8: Getting Started with R

Development Environment

o Objects
  • Variables, arrays of numbers, strings, functions, structures
  • Use memory
  • objects() to see them
  • rm(object name) to remove them
  • If saved, stored in the working directory in a file called .RData
o R function calls: function.name(arguments, options)
o Vectors, Lists, Arrays or Matrices, and Data Frames
o A Data Frame in R is like a database table
o Many ways to get data into and out of R
o R Data Import/Export Help has a plethora of options to work with data
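The object types listed above might look like this in practice. The names and values are invented for illustration only.

```r
# One object of each common kind
v  <- c(1.5, 2.5, 3.5)                        # vector
l  <- list(label = "demo", size = 3)          # list
m  <- matrix(1:6, nrow = 2, ncol = 3)         # matrix
df <- data.frame(id = 1:3,
                 item = c("a", "b", "c"))     # data frame

# List the objects in the workspace, then remove one
objects()
rm(v)
v_still_listed <- "v" %in% objects()
```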

Page 9: Getting Started with R

Reading from Files

o Import Dataset menu in RStudio
o Copy from the clipboard with read.delim("clipboard") or use scan()
o read.csv and read.delim read a file in table format and create a data frame from it, with cases corresponding to lines and variables to fields in the file

read.csv(file, header = TRUE, sep = ",", quote = "\"", dec = ".", fill = TRUE, comment.char = "", ...)

read.delim(file, header = TRUE, sep = "\t", quote = "\"", dec = ".", fill = TRUE, comment.char = "", ...)

o The xlsx package reads Excel files

read.xlsx(file, sheetIndex, sheetName = NULL, rowIndex = NULL, startRow = NULL, endRow = NULL, colIndex = NULL, as.data.frame = TRUE, header = TRUE, colClasses = NA, keepFormulas = FALSE, encoding = "unknown", ...)

o Other packages like gdata or XLConnect

wb <- loadWorkbook("C:\\Users\\Jen\\Documents\\BikeBuyers.xlsx")

readWorksheet(wb, sheet = "BikeBuyers", startRow = 0, endRow = 10, startCol = 0, endCol = 0)
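To try read.csv without an existing file, the sketch below first writes a small CSV to a temporary location and then reads it back. The column names and values are made up for illustration and are not the BikeBuyers data.

```r
# Create a small sample CSV in a temporary file (illustrative data only)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,bikes",
             "1,Ann,2",
             "2,Bob,0",
             "3,Cy,1"),
           csv_path)

# Read it into a data frame: one row per line, one column per field
buyers <- read.csv(csv_path, header = TRUE, sep = ",")

# Inspect the structure and the first rows
str(buyers)
head(buyers)
```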

Page 10: Getting Started with R

Connecting to a Database

o RODBC package for database connectivity; many vendor-specific R packages are also available

o Other packages like sqlutils for database query and procedure calling functionality

odbcConnect(dsn, uid = "", pwd = "", ...)

odbcDriverConnect(connection = "", case, believeNRows = TRUE,
                  colQuote, tabQuote = colQuote,
                  interpretDot = TRUE, DBMSencoding = "",
                  rows_at_time = 100, readOnlyOptimize = FALSE)

odbcConnectAccess(access.file, uid = "", pwd = "", ...)

odbcConnectExcel(xls.file, readOnly = TRUE, ...)

Page 11: Getting Started with R

Querying a Database

Function                                   Description
odbcConnect(dsn, uid = "", pwd = "")       Open a connection to an ODBC database
sqlFetch(channel, sqtable)                 Read a table from an ODBC database into a data frame
sqlQuery(channel, query)                   Submit a query to an ODBC database and return the results
sqlSave(channel, mydf,                     Write or update (append = TRUE) a data frame to a table
        tablename = sqtable,               in the ODBC database
        append = FALSE)
sqlDrop(channel, sqtable)                  Remove a table from the ODBC database
close(channel)                             Close the connection

Page 12: Getting Started with R

Querying a Database

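# RODBC Example
# import data from a DBMS
library(RODBC)

# open a connection to the ODBC data source "mydsn"
myconn <- odbcConnect("mydsn", uid = "Jen", pwd = "demo")

# run a query and store the result as a data frame
demoDf <- sqlQuery(myconn, "select top 10 * from dbo.FactInternetSales")

# close the connection when done
close(myconn)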
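The same connect, query, close pattern can be wrapped in a function so the connection is closed even when the query fails. This is only a sketch: build_top_query and fetch_top_rows are hypothetical helper names, not part of RODBC, and the "select top" syntax assumes SQL Server as in the example above. The wrapper needs the RODBC package and a configured DSN, so it is defined here but not called.

```r
# Hypothetical helper: build a "select top n" query string (SQL Server syntax)
build_top_query <- function(table, n = 10) {
  sprintf("select top %d * from %s", as.integer(n), table)
}

# Hypothetical wrapper: open the DSN, run the query, always close the channel
fetch_top_rows <- function(dsn, table, n = 10, uid = "", pwd = "") {
  if (!requireNamespace("RODBC", quietly = TRUE)) {
    stop("The RODBC package is not installed")
  }
  channel <- RODBC::odbcConnect(dsn, uid = uid, pwd = pwd)
  # odbcConnect returns -1 instead of a connection object on failure
  if (!inherits(channel, "RODBC")) {
    stop("Could not open a connection to DSN: ", dsn)
  }
  on.exit(RODBC::odbcClose(channel), add = TRUE)
  RODBC::sqlQuery(channel, build_top_query(table, n))
}

# The query text can be checked without a database
query_text <- build_top_query("dbo.FactInternetSales", 10)
print(query_text)
```

With a working DSN, a call such as fetch_top_rows("mydsn", "dbo.FactInternetSales", uid = "Jen", pwd = "demo") would be expected to return the same data frame as the example above.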

Page 13: Getting Started with R

Writing to Files

o writeClipboard exports vector data to the clipboard
o write.table converts an object to a data frame if needed and prints it to a file or connection

write.table(x, file = "foo.csv", sep = ",", col.names = NA, qmethod = "double")

write.csv(x, file = "foo.csv")
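A quick way to check an export is to write a data frame and read it straight back. The sketch below uses made-up data and writes to a temporary folder; row.names = FALSE is a choice made here to keep the row numbers out of the file.

```r
# Illustrative data frame to export
x <- data.frame(id = 1:3, amount = c(10.5, 20, 7.25))

# Write it as CSV into a temporary folder
out_path <- file.path(tempdir(), "foo.csv")
write.csv(x, file = out_path, row.names = FALSE)

# Read the file back to confirm the round trip
back <- read.csv(out_path)
print(back)
```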

Page 14: Getting Started with R

R Language Basics

Page 15: Getting Started with R

Basics

o Expressions: 1+1, 10*10, "Hello World"

o Logical Values: TRUE or FALSE

o Variables: store values into a variable using <-
  x <- 42

o Functions: a name and one or more arguments in parentheses
  sum(1,3,4)
  sqrt(16)
  help(plot)
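Put together, the basics above might look like this at the console. The variable names is_big, total and root are only illustrative.

```r
# Expressions are evaluated as soon as they are entered
1 + 1
10 * 10
"Hello World"

# A comparison produces a logical value
is_big <- 10 * 10 > 50

# Store a value in a variable with <-
x <- 42

# Call functions by name with arguments in parentheses
total <- sum(1, 3, 4)
root  <- sqrt(16)
```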

Page 16: Getting Started with R

Basics

o Vectors: numbers, strings, logical values, or any other type, as long as they're all the same type; c (combine) creates a new vector by combining a list of values
  c(4, 7, 9)

o Matrices: 2-dimensional array
  matrix(0, 3, 4)

o Data Frames: similar to a database table or an Excel spreadsheet
  demoDF <- data.frame(word = c("king", "joy", "pen"))
  demoDF2 <- read.csv("C:\\Users\\Jen\\Documents\\BikeBuyers.csv")
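The three structures above can be compared side by side. This is a small sketch with invented values; note how c() converts mixed inputs to a single type, and how a data frame holds one vector per column.

```r
# A vector holds values of one type
v <- c(4, 7, 9)

# Mixed inputs are coerced to a common type (here: character)
mixed <- c(1, "a", TRUE)
class(mixed)

# A matrix filled with 0, with 3 rows and 4 columns
m <- matrix(0, 3, 4)
dim(m)

# A data frame with two columns of different types (illustrative data)
df <- data.frame(word = c("king", "joy", "pen"),
                 letters = c(4, 3, 3))
df$word[2]
```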

Page 17: Getting Started with R

R Data Visualization

Page 18: Getting Started with R

Graphic Packages

o R graphics functions can be grouped into three types:
  • High-level plotting functions that create a graph, often with axis labels and titles
  • Low-level plotting functions that allow additional information to be added to an existing graph, or that allow graphs to be drawn from scratch
  • Interactive graphics functions that allow extraction of information
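The three types can be seen in one small base-graphics sketch. The data are made up, and pdf(NULL) is used here only so the code also runs without a screen; in an interactive session you would leave out the pdf(NULL) and dev.off() lines. The interactive calls are commented out because they wait for mouse clicks.

```r
# Draw to a null device so the sketch runs without a display
pdf(NULL)

x <- 1:10
y <- x^2

# High-level function: creates the graph with title and axis labels
plot(x, y, main = "Squares", xlab = "x", ylab = "x squared")

# Low-level functions: add information to the existing graph
lines(x, y, lty = 2)
abline(h = 50, col = "red")
text(3, 80, "y = x^2")

# Interactive functions: extract information with the mouse
# identify(x, y)
# locator(1)

dev.off()
```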

Page 19: Getting Started with R

Graphic Examples

> example(plot)
> example(barplot)
> example(boxplot)
> example(dotchart)
> example(coplot)
> example(hist)
> example(fourfoldplot)
> example(stars)
> example(image)
> example(contour)
> example(filled.contour)
> example(persp)

Page 20: Getting Started with R

Graphing with Sample Data Sets

o R comes with a package of base datasets
  library(help = "datasets")

o Use the print function to explore content
  print(iris)

o Start to play/explore using R visualizations
  plot(iris$Petal.Length, iris$Petal.Width)

  install.packages("ggplot2")
  library("ggplot2")
  qplot(Sepal.Length, Petal.Length, data = iris, color = Species)
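Before plotting, it can help to look at the size and contents of the dataset. The sketch below uses only base R and the built-in iris data; the names species_counts and mean_petal are illustrative, and pdf(NULL) is there only so the plot call runs without a screen.

```r
# Load the built-in iris dataset and check its size
data(iris)
dim(iris)
head(iris, 3)

# Count rows per species and average petal length per species
species_counts <- table(iris$Species)
mean_petal <- tapply(iris$Petal.Length, iris$Species, mean)
print(mean_petal)

# Base scatter plot, colored by species
pdf(NULL)
plot(iris$Petal.Length, iris$Petal.Width,
     col = as.integer(iris$Species),
     xlab = "Petal length", ylab = "Petal width")
dev.off()
```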

Page 21: Getting Started with R

Additional R Resources

Page 22: Getting Started with R

Resources

o Free O’Reilly R School: http://tryr.codeschool.com
o CRAN Intro: http://cran.r-project.org/doc/manuals/R-intro.html
o R Tutor: http://www.r-tutor.com/r-introduction
o One Page Survival Guide: http://www.datasciencecentral.com/profiles/blogs/one-page-r-a-survival-guide-to-data-science-with-r
o Popular Books
  • R Cookbook, R in a Nutshell, R for Business Analytics

Page 23: Getting Started with R

www.impactanalytix.com

quickly make a positive impact