Univariate data analysis
Loading Nations.txt
- We want to load Nations.txt located in ...
- C:/Program Files/R/R-2.13.0/library/Rcmdr/etc/ ...
- And call it mydata
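A minimal sketch of how the same step might look in the Script Window, assuming the file is whitespace-separated with a header row. Because the full path is elided above, the sketch writes a small stand-in file (made-up values, hypothetical file name, only three columns) to a temporary directory and reads that instead of the real Nations.txt.

```r
# Stand-in for Nations.txt: made-up values written to a temporary file,
# so the example runs without the real data set.
path <- file.path(tempdir(), "Nations_standin.txt")
writeLines(c("GDP region contraception",
             "1000 Africa 20",
             "25000 Europe NA",
             "8000 Asia 55"), path)

# header=TRUE takes the variable names from the first line of the file;
# the string NA is read as a missing value.
mydata <- read.table(path, header=TRUE)

names(mydata)   # variable names
str(mydata)     # type of each variable
```

With the real file, only the path passed to read.table would change.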
Extracting variables from the data set
- To refer to a variable, type name-dataset$name-variable
- Put the $ sign between the name of the data set and the variable you want to see.
names(mydata)
mydata$GDP
Graphical displays - barchart
Graphical displays - barchart cont.
- This one is with the default settings
[Figure: bar chart; x-axis: region (Africa, Americas, Asia, Europe, Oceania); y-axis: Frequency, 0 to 50]
Graphical displays - barchart cont.
- Use the Script Window to obtain a 'pretty' barchart: col sets the color and main sets the main title. You can store the value returned by barplot (the positions of the bar midpoints) in a variable b by writing b = barplot(...)
barplot(table(mydata$region),xlab="region",
ylab="Frequency",col="blue",main="My Barchart")
# For all options of command barplot , type:
?barplot
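As a hedged illustration of storing the result of barplot in b, the sketch below uses a short made-up vector of region labels in place of mydata$region. It relies on barplot returning the x positions of the bar midpoints, which can then be passed to text to label each bar.

```r
# Made-up region labels standing in for mydata$region
region <- c("Africa", "Africa", "Asia", "Europe", "Asia", "Africa")
freq <- table(region)              # frequency of each region

# barplot() draws the chart and returns the x positions of the bar midpoints
b <- barplot(freq, xlab="region", ylab="Frequency",
             col="blue", main="My Barchart",
             ylim=c(0, max(freq) + 1))   # leave room for the labels

# Use the stored midpoints to print each frequency just above its bar
text(x=b, y=as.vector(freq), labels=as.vector(freq), pos=3)
```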
Graphical displays - barchart cont.
- This is the result
[Figure: bar chart titled "MY FIRST, REALLY COOL BARCHART"; x-axis: region (Africa, Americas, Asia, Europe, Oceania); y-axis: Frequency, 0 to 50]
Graphical displays - histogram
- A histogram is a graphical display of tabulated frequencies, shown as bars. It shows how many cases (or what proportion of cases) fall into each of several intervals.
- Procedure:
1. Graph ⇒ Histogram
2. Select the variable of interest
3. Select the axis scaling
4. OK
Graphical displays - histogram
- For all options of the hist command, type: ?hist
- Use the menu and/or modify the code in the Script Window to change the color, etc., and get stats
- Set right to FALSE to exclude the right end point of the intervals, so that each interval has the form [a, b)
hist(mydata$GDP, right=FALSE, col="red")
- Other useful options include, for example,
xlab="GDP", main="My Histogram"
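Putting these options together, a sketch along the following lines draws a labelled histogram and keeps the underlying statistics. The data are simulated stand-ins for mydata$GDP, so the values themselves are arbitrary.

```r
set.seed(1)
gdp <- rexp(50, rate=1/5000)   # simulated values standing in for mydata$GDP

# Assigning the result stores the statistics used to draw the histogram
h <- hist(gdp, right=FALSE, col="red",
          xlab="GDP", main="My Histogram")

h$breaks   # end points of the intervals
h$counts   # number of observations in each interval
```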
Graphical displays - histogram cont.
- This is the result
[Figure: histogram titled "Histogram of mydata$GDP"; x-axis: mydata$GDP, 0 to 40000; y-axis: Frequency, 0 to 140]
Graphical displays - boxplot
- A boxplot visualises data through its five-number summary: the smallest observation (minimum), lower quartile (Q1), median (Q2), upper quartile (Q3), and largest observation (maximum).
- A quartile is any of the three values which divide the sorted dataset into four equal parts, so that each part represents one fourth of the sampled population.
- Outliers, i.e. points which lie more than 1.5 times the interquartile range (Q3 - Q1) below Q1 or above Q3, are marked individually.
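The outlier rule can be checked by hand. The sketch below uses a small made-up vector and the quartiles returned by quantile; note that boxplot itself works with hinges, which can differ slightly from these quartiles for some sample sizes, so this is an illustration of the rule rather than an exact reproduction of what boxplot computes.

```r
x <- c(2, 4, 5, 7, 8, 9, 11, 12, 40)   # made-up data with one extreme value

q <- quantile(x, c(0.25, 0.5, 0.75))   # Q1, median (Q2), Q3
q1 <- q[[1]]
q3 <- q[[3]]
iqr <- q3 - q1                         # interquartile range

# Points beyond these limits are drawn individually as outliers
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr
outliers <- x[x < lower | x > upper]

c(min(x), q, max(x))   # five-number summary
outliers

# boxplot.stats() reports the values that boxplot() would mark as outliers
boxplot.stats(x)$out
```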
Graphical displays - boxplot
- Select the variable of interest
- Plot by groups: allows you to have boxplots side by side by splitting the variable by a categorical variable.
- Identify outliers with mouse: this option allows you to hover over an outlier data point and determine its position in the dataset.
- OK
Graphical displays - boxplot
- For all options of the boxplot command, type: ?boxplot
- Use the menu and/or modify the code in the Script Window to change the color, etc., and get stats
boxplot(GDP ~ region, xlab="region", ylab="GDP",
data=mydata, col=1:5)
Graphical displays - boxplot cont.
- Can be obtained by group if applicable (here by region)
[Figure: side-by-side boxplots of GDP by region (Africa, Americas, Asia, Europe, Oceania); y-axis: GDP, 0 to 40000; outliers marked as individual points]
Saving graphs
Numerical summaries
- mean, quasi-standard deviation (the sample standard deviation, with divisor n - 1), min, first quartile, median (second quartile), third quartile, max, sample size, number of missing values
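These quantities can also be computed one by one in the Script Window. The following sketch uses a made-up vector with two missing values; the variable names n and n_missing are chosen here for illustration.

```r
x <- c(12, 15, NA, 20, 31, NA, 18)   # made-up values with missing data

mean(x, na.rm=TRUE)       # mean of the observed values
sd(x, na.rm=TRUE)         # sample standard deviation (divisor n - 1)
quantile(x, na.rm=TRUE)   # min, Q1, median, Q3, max

n <- sum(!is.na(x))       # sample size (observed values only)
n_missing <- sum(is.na(x))   # number of missing values
c(n=n, missing=n_missing)
```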
Numerical summaries
- Statistics ⇒ Summaries ⇒ Numerical summary
- If you have multiple groups (e.g. control versus treatment), click on summarize by groups and select the appropriate variable
- OK
Numerical summaries
- Can be obtained by group if applicable (here by region)
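Outside the menu, one possible way to get summaries by group is tapply from base R. The sketch below assumes a small made-up data frame d with the columns GDP and region; it is not the command that the menu generates.

```r
# Made-up data frame with a numeric variable and a grouping variable
d <- data.frame(GDP = c(500, 900, 1300, 20000, 30000, NA),
                region = c("Africa", "Africa", "Africa",
                           "Europe", "Europe", "Europe"))

# Apply a summary function to GDP within each region, ignoring NA values
group_mean <- tapply(d$GDP, d$region, mean, na.rm=TRUE)
group_sd   <- tapply(d$GDP, d$region, sd, na.rm=TRUE)

group_mean
group_sd
```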
Numerical summaries
Coefficient of variation: $CV = \dfrac{s}{\bar{x}}$
- Coefficient of variation by hand (compute the mean and SD ignoring the missing values coded as NA!)
s = sd(mydata$contraception , na.rm=TRUE)
xbar = mean(mydata$contraception , na.rm=TRUE)
CV = s/xbar
CV
Numerical summaries
Coefficients of kurtosis and skewness:
$b_2 = \dfrac{m_4}{s^4} - 3$
$b_1 = \dfrac{m_3}{s^3}$
- You have to load the library e1071
library(e1071)
?kurtosis
?skewness
kurtosis(mydata$contraception , na.rm=TRUE)
skewness(mydata$contraception , na.rm=TRUE)
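The two formulas can also be evaluated by hand in base R. The sketch below assumes that m3 and m4 are the third and fourth central sample moments (divisor n) and that s is the sample standard deviation (divisor n - 1); to my understanding this matches the default type = 3 in e1071, but ?skewness and ?kurtosis should be consulted to confirm. The data are made up.

```r
x <- c(1, 2, 3, 4, 5)   # made-up, symmetric data

xbar <- mean(x)
s  <- sd(x)                  # sample standard deviation (divisor n - 1)
m3 <- mean((x - xbar)^3)     # third central moment (divisor n)
m4 <- mean((x - xbar)^4)     # fourth central moment (divisor n)

b1 <- m3 / s^3               # coefficient of skewness
b2 <- m4 / s^4 - 3           # coefficient of kurtosis

b1
b2
```

For symmetric data such as these, the skewness coefficient is zero.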
Frequency distribution - categorical data
- Categorical variables are measured on a nominal scale, i.e. their values are labels.
- The values that can be taken are called levels.
- Categorical variables have no numerical meaning, but are often coded for ease of data entry and processing in spreadsheets.
- For example, gender is often coded as male=1 and female=2. Data can thus be entered as characters (e.g. 'normal') or as numbers (e.g. 0, 1, 2).
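As a sketch of how such a coded variable might be handled in R, the example below turns a made-up vector of gender codes into a factor with labelled levels and tabulates its frequency distribution. The vector gender_code is hypothetical.

```r
gender_code <- c(1, 2, 2, 1, 2)   # made-up codes: 1 = male, 2 = female

# Attach labels to the numeric codes
gender <- factor(gender_code, levels=c(1, 2), labels=c("male", "female"))

levels(gender)               # the levels of the categorical variable
counts <- table(gender)      # absolute frequencies
counts
prop.table(counts)           # relative frequencies
```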
Frequency distribution - numerical data
- Use the Script Window to obtain the frequency distribution.
- First load the library agricolae, then get the stats from the histogram, then use table.freq
library(agricolae)
h = hist(mydata$contraception ,
right=FALSE , plot=FALSE)
table.freq(h)
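If agricolae is not installed, a similar table can be assembled from the histogram object with base R alone. This is an alternative sketch, not the output format of table.freq; the data, the break points, and the column names are made up for illustration.

```r
x <- c(3, 7, 12, 18, 21, 25, 33, 38, 41, 47)   # made-up numerical data

# Compute the histogram statistics without drawing the plot
h <- hist(x, breaks=seq(0, 50, by=10), right=FALSE, plot=FALSE)

# One row per interval: limits, absolute, relative and cumulative frequency
freq <- data.frame(lower      = head(h$breaks, -1),
                   upper      = h$breaks[-1],
                   frequency  = h$counts,
                   relative   = h$counts / sum(h$counts),
                   cumulative = cumsum(h$counts))
freq
```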
Modifying the dataset: Compute a new variable
- Data ⇒ Manage variables in active dataset ⇒ compute new variables
- Enter new variable name
- An expression (equation) is written to reflect the calculation required.
Modifying the dataset: Compute a new variable
The table of operators (shown on the original slide) lists the operators available, with examples of how they can be used. Double-clicking on a variable in the current variables box sends the variable to the expression.
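In the Script Window, computing a new variable amounts to assigning an expression to a new column. The sketch below uses a made-up data frame d; the population column and the new variable names are hypothetical.

```r
# Made-up data frame standing in for the active dataset
d <- data.frame(GDP = c(500, 2000, 30000),
                population = c(10, 4, 60))

# New variables are created by assigning an expression to a new column
d$logGDP    <- log(d$GDP)              # natural logarithm
d$GDP_total <- d$GDP * d$population    # product of two existing variables

d
```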
Converting numeric variables to factors
- Data ⇒ Manage variables in active dataset ⇒ Convert numeric variables to factors
- Select the variables.
Converting numeric variables to factors
- You can generate a new variable by entering a name in the box new variable name, or overwrite the original variable.
1. The levels can be kept as numbers by selecting use numbers
2. Or recoded to names by selecting supply level names
- OK
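The two options correspond roughly to the following base R calls. This is a sketch with a made-up numeric variable treatment and hypothetical level names, not the exact code the dialog generates.

```r
d <- data.frame(treatment = c(0, 1, 2, 1, 0))   # made-up numeric codes

# "use numbers": the levels are the numbers themselves, stored as text
d$treatment_num <- as.factor(d$treatment)

# "supply level names": each code is recoded to a name
d$treatment_fac <- factor(d$treatment, levels=c(0, 1, 2),
                          labels=c("control", "low", "high"))

levels(d$treatment_num)
levels(d$treatment_fac)
```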
Sub-dividing data by columns (variables)
- Data ⇒ active dataset ⇒ subset active dataset
- Hold the CTRL key to select the variables you wish to keep
- Give the new dataset a name
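A rough Script Window equivalent of these steps, assuming a made-up data frame d and a hypothetical name newdata for the result:

```r
# Made-up data frame standing in for the active dataset
d <- data.frame(GDP = c(500, 2000, 30000),
                region = c("Africa", "Asia", "Europe"),
                contraception = c(20, 55, NA))

# Keep only the selected variables and store them under a new name
newdata <- subset(d, select=c(GDP, region))

names(newdata)
```

The same result can be obtained with d[, c("GDP", "region")].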
Sub-dividing data by rows (and variables if you wish)
- Data ⇒ active dataset ⇒ subset active dataset
- Select the variables you wish to include in the new dataset
- Write a subset expression, which is a rule to drive the selection of rows
Sub-dividing data by rows (and variables if you wish)
Note: If you use a text value (such as a level name) in an expression, you need to surround it with double quotes, e.g. "name"
Example: GENDER == "Female" & AGE <= 25
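The example expression can be tried directly with subset. The data frame people and its values are made up for this sketch; only the variable names GENDER and AGE come from the example.

```r
# Made-up data frame with the variables used in the subset expression
people <- data.frame(GENDER = c("Female", "Male", "Female", "Female"),
                     AGE    = c(22, 24, 31, 25),
                     SCORE  = c(7, 5, 9, 6))

# Rows are chosen by the subset expression, columns by select
young_women <- subset(people, GENDER == "Female" & AGE <= 25,
                      select=c(GENDER, AGE))
young_women
```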
Plot time series
Note: Time series are plotted with a different method from that used for ordinary variables.
Example: Simulate 24 observations from a given time series. Plot the observations.
x = rnorm (24) + 100
plot(ts(x,start =1992) , ylab="levels")
DotPlots I
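Example: Simulate 100 observations from a time series given two years.
Note: Better use the library lattice
library(lattice)   # provides dotplot()
# Year is stored as a factor so that dotplot() treats it as the grouping variable
thing = data.frame(Returns = rnorm(100, 10, 2),
                   Year = factor(c(rep("A", 50), rep("B", 50))))
dev.new()   # open a new graphics window (portable alternative to X11())
dotchart(thing$Returns, xlab="Returns")
dev.new()
dotplot(thing$Returns ~ thing$Year,
        ylab="Returns", xlab="years")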
DotPlots II
DotPlots III