R - Get Started I - Sanaitics

GET STARTED IN R
Module I

What is R?
• A language and environment for statistical computing and graphics
• Wide variety of statistical and graphical techniques built in
• Free and open-source software
• Compiles and runs on a wide variety of UNIX platforms, Windows and macOS

R Environment
• Most functionality is provided through built-in functions
• Basic functions are available by default
• Other functions are contained in packages that can be attached
• All datasets created in the session are held in memory
• The output of one function can be used as input to other functions

R-CRAN
• The Comprehensive R Archive Network
• A worldwide network of web servers storing identical, up-to-date versions of code and documentation for R
• Use the CRAN mirror nearest to you to download the R setup faster
• http://cran.r-project.org/

SECTION 1
Introduction to R and User Interface (UI)

The R-UI
The R console
-- Type in the command
-- Press Enter
-- See the output

Create Your Data Set
# Create vectors x, y, z on the R Console
x<-c(12,23,45)
y<-c(13,21,6)
z<-c("a","b","c")
# Combine them in a table
data1<-data.frame(x,y,z)
# Type the data name for output
data1

Your data in Table mode
data1<-edit(data1)
Edit your table, add new values and close the window. edit() returns the edited copy, so assign it back to data1 to save your updates.

Export your updated table
write.csv(data1,file.choose())
Select the path, name your file and click Save
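
A script that runs unattended cannot open a file dialog. The sketch below is one way to do the same export without file.choose(); the temporary file path and the row.names = FALSE choice are assumptions made for illustration, not part of the original slide.

```r
# Small data frame like the one built earlier in this deck
data1 <- data.frame(x = c(12, 23, 45), y = c(13, 21, 6), z = c("a", "b", "c"))

# Hypothetical output location: a temporary file instead of a dialog-chosen path
out_file <- tempfile(fileext = ".csv")

# row.names = FALSE avoids writing the 1, 2, 3 row labels as an extra column
write.csv(data1, file = out_file, row.names = FALSE)

# Read the file back to confirm what was written
check <- read.csv(out_file)
print(check)
```

Replace out_file with a real path such as "C:/Users/Documents/R/data1.csv" to keep the file.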

Data Set Types
Vector: 1 dimension. All elements have the same data type (numeric, character, logical, factor)
Matrix: 2 dimensions
Array: 2 or more dimensions
Data frame: 2 dimensions. Table-like data object allowing different data types for different columns
List: collection of data objects; each element of a list is a data object
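
The object types listed above can be sketched as follows. All object names and values here are made up for illustration.

```r
v  <- c(12, 23, 45)                                   # numeric vector
m  <- matrix(1:6, nrow = 2, ncol = 3)                 # matrix: 2 rows, 3 columns
a  <- array(1:24, dim = c(2, 3, 4))                   # array with 3 dimensions
df <- data.frame(id = 1:3, name = c("a", "b", "c"))   # columns of different types
l  <- list(numbers = v, table = df)                   # list holding other objects

dim(v)      # NULL: a plain vector has no dim attribute
dim(m)      # 2 3
dim(a)      # 2 3 4
str(df)     # one line per column with its type
names(l)    # "numbers" "table"
```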

Useful In-Built Functions
# Use the $ sign after the data set name to refer to a specific column
min(data1$x)
max(data1$y)
length(data1$x)
mean(data1$y)
# Gives the unique categories in the data. levels() needs a factor; since R 4.0.0
# data.frame() keeps strings as character, so convert with factor() first
levels(factor(data1$z))

Save Your Script
Open a new script, write your commands and run them with the F5 key. Save the file in a folder of your choice so that the script can be reused later.

R Workspace
• Objects that you create during an R session are held in memory; the collection of objects that you currently have is called the workspace
• This workspace is not saved on disk unless you tell R to do so
• This means that your objects are lost if you close R without saving them, or worse, if R or your system crashes during a session

R Workspace
• If you have saved a workspace image, R restores the workspace the next time you start it
• So all your previously saved objects are available again
• You can also explicitly load a saved workspace, for example the workspace image of someone else
• Go to the 'File' menu and select 'Load workspace'
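
The same save-and-restore cycle can be done from the console. This is a minimal sketch; the file name my_workspace.RData and the temporary directory are assumptions for illustration.

```r
x <- c(12, 23, 45)
y <- c(13, 21, 6)

# Hypothetical file location. save() stores the named objects;
# save.image() would store every object in the workspace instead.
ws_file <- file.path(tempdir(), "my_workspace.RData")
save(x, y, file = ws_file)

rm(x, y)        # simulate a fresh session by removing the objects
exists("y")     # FALSE in a clean session

load(ws_file)   # restore the saved objects under their original names
x
y
```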

R Workspace
# Display last 25 commands
history()
# Display all previous commands
history(max.show=Inf)
# Save your command history to a file. Default is ".Rhistory"
savehistory(file="myfile")
# Recall your command history. Default is ".Rhistory"
loadhistory(file="myfile")

Types of Variables
• Type: logical, integer, double, complex, character, factor or raw
• Type conversion functions have the naming convention as.xxx
• For example, as.integer(3.2) returns the integer 3, and as.character(3.2) returns the string "3.2".
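
A few more conversions in the same as.xxx family, as a short sketch. The input values are arbitrary examples.

```r
as.integer(3.2)        # 3 (the fractional part is dropped)
as.character(3.2)      # "3.2"
as.numeric("4.5")      # 4.5
as.logical("TRUE")     # TRUE

# typeof() reports the storage type of an object
typeof(3.2)            # "double"
typeof(3L)             # "integer"
typeof("a")            # "character"

# Text that cannot be parsed becomes NA (R also issues a warning)
bad <- suppressWarnings(as.integer("abc"))
is.na(bad)             # TRUE
```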

Factors
• Tell R that a variable is nominal by making it a factor
• The factor stores the nominal values as a vector of integers in the range [1...k] (where k is the number of unique values in the nominal variable), and an internal vector of character strings (the original values) mapped to these integers
• An ordered factor is used to represent an ordinal variable

Factors
• R will treat factors as nominal variables and ordered factors as ordinal variables in statistical procedures and graphical analyses
• You can use options in the factor() and ordered() functions to control the mapping of integers to strings (overriding the alphabetical ordering)
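
The integer coding and the levels option described above can be sketched like this. The vector size and its values are hypothetical.

```r
size <- c("medium", "low", "high", "low")   # hypothetical survey answers

# Default: levels are sorted alphabetically -> high, low, medium
f_nominal <- factor(size)
levels(f_nominal)
as.integer(f_nominal)     # internal integer codes: 3 2 1 2

# Ordered factor with an explicit, meaningful level order
f_ordinal <- factor(size, levels = c("low", "medium", "high"), ordered = TRUE)
levels(f_ordinal)
f_ordinal < "high"        # comparisons are allowed on ordered factors
```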

Help in R
• If you know the topic but not the exact function
help.search("topic")
• If you know the exact function
help(functionname)

In-Memory Computing
• Storage of information in the main RAM of dedicated servers rather than in complicated relational databases operating on comparatively slow disk drives
• Helps business customers, including retailers, banks and utilities, to quickly detect patterns, analyze massive data volumes on the fly, and perform their operations quickly

Import .csv Data File
basic_salary<-read.table("C:/Users/Documents/R/BASIC SALARY DATA.csv", header=TRUE, sep=",")
• The file path must be written with / as the separator
• header=TRUE if the first line of the file contains the column names
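
To try the import without the original file, the sketch below first writes a tiny CSV to a temporary location. The names and salary figures in it are invented for illustration and are not the deck's actual data.

```r
# Build a small hypothetical CSV so the example runs without the original file
csv_file <- tempfile(fileext = ".csv")
writeLines(c("First_Name,Location,ba",
             "Alan,MUMBAI,17990",
             "Agatha,DELHI,12390"), csv_file)

# header = TRUE: the first line holds the column names; sep = ",": comma-delimited
salary <- read.table(csv_file, header = TRUE, sep = ",")

dim(salary)     # number of rows and columns
names(salary)   # column names taken from the header line
str(salary)     # type of each column
```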

Data Imported
Use the head() function to get an idea of what your data looks like
head(basic_salary)

Manual Import
basic_salary<-read.table(file.choose(), header=TRUE, sep=",")
Select the file of interest and click Open to import

Checks after Import
# number of rows and columns
dim(basic_salary)
# structure: type and first values of each column
str(basic_salary)
# column names
names(basic_salary)

Importing .txt File
#importing a text file delimited by blanks/spaces
sample1<-read.table(file.choose(),header=TRUE)
sample1
#importing a comma separated text file
new<-read.table(file.choose(), header=TRUE, sep=",")
new
#another method for importing a comma separated text file
new1<-read.csv(file.choose(),header=TRUE)
new1

SECTION 2
Data Management: Creating Subsets

Using Indices
• Row level subsetting: display some selected rows
• Subsetting by putting a condition on observations
• Column level subsetting: display some selected columns
• Subsetting by putting a condition on variable names
• Display selected rows for selected columns
• Subsetting by putting conditions on variables and observations
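
The kinds of index-based subsetting listed above can be sketched on a small stand-in table. The five rows below are hypothetical; only the column names follow the deck's basic_salary data.

```r
# Hypothetical stand-in for the deck's basic_salary data
basic_salary <- data.frame(
  First_Name = c("Alan", "Agatha", "Rajesh", "Ameet", "Neha"),
  Grade      = c("GR1", "GR2", "GR1", "GR2", "GR1"),
  Location   = c("DELHI", "MUMBAI", "MUMBAI", "DELHI", "MUMBAI"),
  ba         = c(17990, 12390, 19250, 14780, 19235)
)

basic_salary[2:4, ]                        # rows 2 to 4, all columns
basic_salary[basic_salary$ba > 15000, ]    # rows meeting a condition
basic_salary[, c("First_Name", "ba")]      # columns selected by name

# Row condition and column selection at once
rows_cols <- basic_salary[basic_salary$Location == "MUMBAI" &
                            basic_salary$ba > 15000,
                          c("First_Name", "ba")]
rows_cols
```

The general form is data[rows, columns]; leaving either side of the comma empty keeps all rows or all columns.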

Row Sub-Setting
# Display rows 5 to 10
basic_salary[c(5:10), ]

Row Sub-Setting
# Display only selected rows
basic_salary[c(1,3,5,8), ]

Condition on Observations
basic_salary1 <- basic_salary[basic_salary$Location=="DELHI" & basic_salary$ba > 20000, ]
head(basic_salary1)

More than one value of Variable
# List further values inside c() to match several locations at once
basic_salary2 <- basic_salary[basic_salary$Location %in% c("MUMBAI"), ]
head(basic_salary2)

Column Sub-Setting
basic_salary3<-basic_salary[ , 1:2]
head(basic_salary3)

Condition on Variable Names
basic_salary4 <- basic_salary[ ,c("First_Name","Location", "ba")]
head(basic_salary4)

Selected Rows for Selected Columns
basic_salary5<-basic_salary[c(1,5,8,4), c(1,2)]
basic_salary5

Condition on Variables & Observations
basic_salary6<-basic_salary[basic_salary$Grade=="GR1" & basic_salary$ba >15000 , c("First_Name","Location", "ba") ]
head(basic_salary6)

Subset Function
• Condition on observations
• Condition on variable names
• Condition on observations and variable names
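
A compact sketch of the three uses of subset() named above, on a hypothetical five-row table (the values are invented; the column names follow the deck).

```r
# Hypothetical stand-in for the deck's basic_salary data
basic_salary <- data.frame(
  First_Name = c("Alan", "Agatha", "Rajesh", "Ameet", "Neha"),
  Grade      = c("GR1", "GR2", "GR1", "GR2", "GR1"),
  Location   = c("DELHI", "MUMBAI", "MUMBAI", "DELHI", "MUMBAI"),
  ba         = c(17990, 12390, 19250, 14780, 19235)
)

# Condition on observations: column names are used directly, without $
delhi_high <- subset(basic_salary, Location == "DELHI" & ba > 15000)

# Condition on variable names
name_ba <- subset(basic_salary, select = c(First_Name, ba))

# Both together
gr1_high <- subset(basic_salary, Grade == "GR1" & ba > 15000,
                   select = c(First_Name, Location))
gr1_high
```

Unlike indexing with [ ], subset() drops rows for which the condition evaluates to NA.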

Condition on Observations
basic_salary7<-subset(basic_salary, Location=="DELHI" & ba > 20000)
head(basic_salary7)

Condition on Variable Names
basic_salary8<-subset(basic_salary,select=c(Location,First_Name,ba))
head(basic_salary8)

Condition on Observations & Variables
basic_salary9<-subset(basic_salary,Grade=="GR1" & ba>15000,select=c(First_Name,Grade,Location))
head(basic_salary9)

Using Head Function
# First 5 rows
head(basic_salary, n=5)

Using Tail Function
# Last 5 rows
tail(basic_salary, n=5)

Using Split Function
# split() returns a list with one data frame per Location, named after the locations
basic_salary10<-split(basic_salary,basic_salary$Location)
# Pick a group by name; data.frame(basic_salary10[1]) also works but prefixes every column name with "DELHI."
delhi_salary<-basic_salary10[["DELHI"]]
head(delhi_salary)

Using Split Function
mumbai_salary<-basic_salary10[["MUMBAI"]]
head(mumbai_salary,5)
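
What split() returns is easier to see on a tiny example. The rows below are hypothetical; the sketch shows that the groups are named after the values of the splitting column and how the two ways of extracting a group differ.

```r
# Hypothetical stand-in for the deck's basic_salary data
basic_salary <- data.frame(
  First_Name = c("Alan", "Agatha", "Rajesh", "Ameet"),
  Location   = c("DELHI", "MUMBAI", "MUMBAI", "DELHI"),
  ba         = c(17990, 12390, 19250, 14780)
)

by_location <- split(basic_salary, basic_salary$Location)
names(by_location)                        # "DELHI" "MUMBAI"

# [[ ]] returns the data frame itself, column names unchanged
delhi_salary <- by_location[["DELHI"]]
names(delhi_salary)

# [ ] returns a one-element list; wrapping it in data.frame() prefixes the names
names(data.frame(by_location["DELHI"]))
```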

Delete Cases
# Keep only rows that are neither Grade GR1 nor located in MUMBAI
basic_salary11<-basic_salary[!(basic_salary$Grade=="GR1") & !(basic_salary$Location=="MUMBAI"), ]
basic_salary11

Delete Cases
# Drop rows with ba above 14500
basic_salary12<-basic_salary[ !(basic_salary$ba>14500), ]
basic_salary12

Recoding Based on Quartiles
# probs must be passed with =, not <-, so that it is a named argument
brk1<-quantile(basic_salary$ba,probs=seq(0,1,0.25))
brk1
basic_salary13<-within(basic_salary,salary_quartile<-cut(ba,breaks=brk1,labels=1:4,include.lowest=TRUE))

Recoding Based on Quartiles
head(basic_salary13)

Recoding Based on Deciles
brk2<-quantile(basic_salary$ba,probs=seq(0,1,0.10))
brk2
basic_salary14<-within(basic_salary,salary_deciles<-cut(ba,breaks=brk2,labels=1:10,include.lowest=TRUE))

Recoding Based on Deciles
tail(basic_salary14)

Recoding Based on User Cut Points
brk3<-c(min(basic_salary$ba),15000,18000,max(basic_salary$ba))
brk3
basic_salary15<-within(basic_salary,salary_partition<-cut(ba,breaks=brk3,labels=1:3,include.lowest=TRUE))

Recoding Based on User Cut Points
# Display rows 20 and 30
basic_salary15[c(20,30),]
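
How quantile() and cut() work together in the recoding slides can be checked on a short vector. The eight salary values and the band labels below are hypothetical.

```r
ba <- c(12000, 14000, 15000, 16500, 18000, 19000, 21000, 25000)  # hypothetical salaries

# Quartile break points: minimum, 25%, 50%, 75%, maximum
brk <- quantile(ba, probs = seq(0, 1, 0.25))
brk

# include.lowest = TRUE keeps the minimum value inside the first interval
salary_quartile <- cut(ba, breaks = brk, labels = 1:4, include.lowest = TRUE)
table(salary_quartile)      # two values fall in each quartile here

# User-defined cut points work the same way
salary_band <- cut(ba, breaks = c(min(ba), 15000, 18000, max(ba)),
                   labels = c("low", "mid", "high"), include.lowest = TRUE)
data.frame(ba, salary_quartile, salary_band)
```

If several quantiles coincide, for example with many tied salaries, cut() stops with a "breaks are not unique" error, so it is worth printing the break points first.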

Remove Variables
#using remove.vars function (need to install and load the "gdata" package)
library(gdata)
basic_salary<-remove.vars(basic_salary,c("Last_Name", "Location"))
Removing variable 'Name'
Removing variable 'Place'
names(basic_salary)
[1] "X" "First_Name" "Grade" "ba"
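
The same result can be had without an extra package. This is a base R sketch rather than the deck's method; the two-row table is hypothetical.

```r
# Hypothetical stand-in for the deck's basic_salary data
basic_salary <- data.frame(
  First_Name = c("Alan", "Agatha"),
  Last_Name  = c("Brown", "Williams"),
  Grade      = c("GR1", "GR2"),
  Location   = c("DELHI", "MUMBAI"),
  ba         = c(17990, 12390)
)

drop_cols <- c("Last_Name", "Location")

# Keep every column whose name is not in drop_cols;
# drop = FALSE keeps a data frame even if only one column remains
trimmed <- basic_salary[, !(names(basic_salary) %in% drop_cols), drop = FALSE]
names(trimmed)
```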

SECTION 3
Sorting & Merging

Sorting Data
# attach() makes the columns available by name; order(-ba) sorts by ba in descending order
attach(basic_salary)
bs_sorted1<-basic_salary[order(-ba), ]
head(bs_sorted1)

Sorting by Multiple Variables
# Sort by First_Name, then by ba within equal names (both ascending)
bs_sorted2<-basic_salary[order(First_Name, ba), ]
head(bs_sorted2)
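
Sorting can also be written without attach(), by referring to columns with $. A minimal sketch on hypothetical rows:

```r
# Hypothetical stand-in for the deck's basic_salary data
basic_salary <- data.frame(
  First_Name = c("Alan", "Agatha", "Rajesh", "Ameet", "Neha"),
  Location   = c("DELHI", "MUMBAI", "MUMBAI", "DELHI", "MUMBAI"),
  ba         = c(17990, 12390, 19250, 14780, 19235)
)

# Descending by ba
bs_desc <- basic_salary[order(-basic_salary$ba), ]

# Location ascending, then ba descending within each location
bs_multi <- basic_salary[order(basic_salary$Location, -basic_salary$ba), ]

# Unary minus does not work on character columns; use decreasing = TRUE instead
bs_names <- basic_salary[order(basic_salary$First_Name, decreasing = TRUE), ]

bs_desc
bs_multi
```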

Merging by Variables
Create the two datasets used in the joins that follow: sal_data and bonus_data

Outer Join
# all=TRUE keeps the rows of both datasets
outerjoin<-merge(sal_data,bonus_data,by=c("Employee_ID"),all=TRUE)
outerjoin

Inner Join
# Default: keeps only the Employee_ID values present in both datasets
innerjoin<-merge(sal_data,bonus_data,by="Employee_ID")
innerjoin

Left Join
# all.x=TRUE keeps every row of the first dataset
leftjoin<-merge(sal_data,bonus_data,by=c("Employee_ID"),all.x=TRUE)
leftjoin

Right Join
# all.y=TRUE keeps every row of the second dataset
right_join<-merge(sal_data,bonus_data,by="Employee_ID",all.y=TRUE)
right_join
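
The four joins can be compared side by side on two small tables. The contents of sal_data and bonus_data below are hypothetical; only the dataset names and the key Employee_ID come from the slides.

```r
# Hypothetical datasets sharing the key column Employee_ID
sal_data <- data.frame(Employee_ID  = c("E-1001", "E-1002", "E-1004"),
                       Basic_Salary = c(16070, 14740, 13490))
bonus_data <- data.frame(Employee_ID = c("E-1001", "E-1003", "E-1004"),
                         Bonus       = c(1600, 500, 800))

innerjoin <- merge(sal_data, bonus_data, by = "Employee_ID")                # ids in both
outerjoin <- merge(sal_data, bonus_data, by = "Employee_ID", all = TRUE)    # ids in either
leftjoin  <- merge(sal_data, bonus_data, by = "Employee_ID", all.x = TRUE)  # all of sal_data
rightjoin <- merge(sal_data, bonus_data, by = "Employee_ID", all.y = TRUE)  # all of bonus_data

outerjoin   # unmatched cells are filled with NA
```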

Merging Cases (append)
# rbind function
Appending two datasets with the rbind function requires both datasets to have exactly the same number of variables with exactly the same names.
If the datasets do not have the same number of variables, variables can be either dropped or created so that both match.
or else
# smartbind function can be used
# need to install and load the gtools package to use this function
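
Creating the missing variable before appending, as described above, might look like this. The two tables jan and feb and their values are hypothetical.

```r
# Two hypothetical datasets; feb has one extra variable (Bonus)
jan <- data.frame(Employee_ID = c("E-1001", "E-1002"), ba = c(16070, 14740))
feb <- data.frame(Employee_ID = "E-1003", ba = 13490, Bonus = 500)

# rbind() needs matching column names, so create the missing column first
jan$Bonus <- NA

all_rows <- rbind(jan, feb)
all_rows
```

rbind() matches data frame columns by name, so the column order in the two datasets does not have to be identical.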

R-String Functions
# substr(x, 1, 1) keeps only the first character of each Location
basic_salary$Location<-substr(basic_salary$Location,1,1)
head(basic_salary)

Replace Function
basic_salary$Location<-substr(basic_salary$Location,1,1)
head(basic_salary)
table(basic_salary$Location)

Contd.
replace Function
replace(x, list, values) replaces the values in x at the indices given in list with those given in values. If necessary, the values in values are recycled.
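
A short sketch of replace() in use. The one-letter location codes are hypothetical, chosen to resemble the output of the substr() step.

```r
loc <- c("M", "D", "M", "D")    # hypothetical one-letter location codes

# Logical index: every "M" is replaced; the single value is recycled
loc_full <- replace(loc, loc == "M", "MUMBAI")
loc_full

# Numeric index: replace the element at position 2
filled <- replace(c(5, NA, 7), 2, 0)
filled

# replace() returns a modified copy; loc itself is unchanged
loc
```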

Deleting missing cases
# complete.cases(), na.exclude() and na.omit() are for dealing manually with NAs in a dataset
x<-c(12,NA,13,12)
y<-c(25,48,NA,NA)
data<-data.frame(x,y)
data
data1<-na.omit(data)
data1
   x  y
1 12 25
data[complete.cases(data),]
   x  y
1 12 25
na.exclude(data)
   x  y
1 12 25

Thank you!