1 Descriptive Stats R
8/16/2019 1 Descriptive Stats R
Descriptive Statistics with R
1. Initial steps
   1. Read in the data
   2. What variables are in the file?
2. Measures of central tendency
   1. What to use when?
   2. Sum and mean
   3. Median
   4. Mode
   5. Trimmed mean to remove influence of outliers
3. Measures of variability
   1. Range
   2. Quartiles and IQR
   3. Variance and sd
   4. Mean absolute deviation, median absolute deviation
4. Measures of shape
   1. Skewness
   2. Kurtosis
5. Summary of a variable
6. Describing a data frame
   1. Descriptive statistics separately for each group
   2. Summarizing an entire dataframe
7. Standard scores (z)
Initial Steps
Read in the data
• use the load() function for .Rdata files
• use read.table() or read.csv() for plain-text files
> setwd("~/Documents/statistics/probability_and_statistics_with_R/navarro_datasets")
> load("aflsmall.Rdata")
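For data that arrive as plain text rather than as an .Rdata file, the read.csv() route might look roughly like the sketch below. The file and its two columns are made up for illustration and written to a temporary path so the snippet runs on its own; they are not part of the AFL dataset.

```r
# Hypothetical data written to a temporary CSV so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("team,margin", "Hawthorn,56", "Melbourne,31", "Carlton,8"), csv_path)

# read.csv() expects a header row and comma separators by default.
games <- read.csv(csv_path, stringsAsFactors = FALSE)

str(games)   # column names and types
head(games)  # first few rows
```

read.table() works the same way but needs header = TRUE and sep = "," spelled out for a file like this one.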
What variables are in the file?
Two ways:
• use head()
• load the lsr package and use the who() function
> library(lsr)
> who()
   -- Name --      -- Class --   -- Size --
   afl.finalists   factor        400
   afl.margins     numeric       176
   x               integer       1
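If the lsr package is not installed, a similar inventory can be put together with base R. This is only a sketch: the two objects are small made-up stand-ins, not the real contents of the .Rdata file.

```r
# Made-up stand-ins for the objects stored in the .Rdata file.
margins   <- c(56, 31, 56, 8, 32)
finalists <- factor(c("Hawthorn", "Melbourne", "Carlton", "Melbourne"))

# Name, class and length of each object, collected into one small table.
inventory <- data.frame(
  name  = c("margins", "finalists"),
  class = c(class(margins), class(finalists)),
  size  = c(length(margins), length(finalists)),
  stringsAsFactors = FALSE
)
print(inventory)
```

In an interactive session, ls() lists every object in the workspace and str() gives a compact view of any one of them.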
Measures of central tendency
What to use when
Measure   Data type
Mean      Ratio, Interval
Median    Ordinal (usually); also Ratio, Interval
Mode      Nominal (usually); also Ordinal, Ratio, Interval
Sum and mean
> sum(afl.margins)
[1] 6213
> sum(afl.margins[1:5])  # sum of a subset of data
[1] 183
> sum(afl.margins[1:5]) / 5  # mean of a subset of data
[1] 36.6
> mean(x = afl.margins)  # x is the argument passed to mean()
[1] 35.30114
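The link between the two functions is simply that the mean is the sum divided by the number of values. A minimal check, using five made-up margins (assumed values, chosen only for illustration):

```r
# Five made-up winning margins.
x <- c(56, 31, 56, 8, 32)

total  <- sum(x)             # add up all values
n      <- length(x)          # how many values there are
by_hand <- total / n         # mean computed from its definition

# mean() should agree with the by-hand calculation.
print(c(by_hand = by_hand, builtin = mean(x)))
```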
Median
Usage:
• ordinal data
• ratio data
• interval data
To find the median by hand, first sort the values:
> sort(x = afl.margins)
 [1]  0  0  1  1  1  1  2  2  3  3  3  3  3  3  3  3  4  4  5
[20]  6  7  7  8  8  8  8  8  9  9  9  9  9  9 10 10 10 10 10 ...
> median(x = afl.margins)
[1] 30.5
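The sort-then-pick-the-middle procedure can be sketched as follows. The six values are made up; with an even number of observations the median is the average of the two middle values, which is why it can fall between observed scores.

```r
# Six made-up values (even length, so there are two middle values).
x <- c(8, 31, 32, 56, 56, 14)

s <- sort(x)
n <- length(s)

# Odd n: take the single middle value. Even n: average the two middle values.
by_hand <- if (n %% 2 == 1) s[(n + 1) / 2] else mean(s[c(n / 2, n / 2 + 1)])

print(c(by_hand = by_hand, builtin = median(x)))
```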
Mode
Who has played the most finals?
> print(afl.finalists)
[1] Hawthorn  Melbourne Carlton   Melbourne
[5] Hawthorn  Carlton   Melbourne Carlton ...
Get a frequency table:
> table(afl.finalists)
afl.finalists
   Adelaide    Brisbane     Carlton Collingwood ...
         26          25          26          28
> modeOf(x = afl.finalists)  # modeOf() and maxFreq() come from the lsr package
[1] "Geelong"
> maxFreq(x = afl.finalists)
[1] 39
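Base R has no built-in function for the statistical mode, so without lsr one option is to read it off the frequency table. A rough sketch with a made-up vector of team names (not the real finalists data):

```r
# Made-up team names.
teams <- c("Hawthorn", "Melbourne", "Carlton", "Melbourne", "Hawthorn",
           "Carlton", "Melbourne", "Carlton", "Carlton")

freq <- table(teams)                 # frequency of each team

# The mode is the name (or names, if tied) with the highest frequency.
mode_count <- max(freq)
mode_value <- names(freq)[as.vector(freq) == mode_count]

print(mode_value)
print(mode_count)
```

Keeping every name that reaches the maximum means ties are reported rather than silently dropped.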
Trimmed mean to remove influence of outliers
> dataset <- c(-15, 2, 3, 4, 5, 6, 7, 8, 9, 12)
> mean(x = dataset)
[1] 4.1
> median(x = dataset)
[1] 5.5
> mean(x = dataset, trim = .1)  # trim by 10%: one value on either side
[1] 5.5
# here the trimmed mean is the same as the median
For the afl.margins dataset:
> mean(x = afl.margins, trim = .05)
[1] 33.75
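What trim does can be made explicit by dropping the extreme values by hand. The sketch below uses the ten-value example and assumes the usual rule that floor(n * trim) observations are removed from each end.

```r
dataset <- c(-15, 2, 3, 4, 5, 6, 7, 8, 9, 12)
trim <- 0.1

n <- length(dataset)
k <- floor(n * trim)          # number of values dropped from EACH end (here 1)

s    <- sort(dataset)
kept <- s[(k + 1):(n - k)]    # drops -15 and 12
by_hand <- mean(kept)

print(c(by_hand = by_hand, builtin = mean(dataset, trim = trim)))  # both 5.5
```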
Measures of variability
> range(afl.margins)  # returns the minimum and maximum, not their difference
[1]   0 116
> quantile(x = afl.margins, probs = c(.25, .75))  # gives 25th and 75th percentile
  25%   75%
12.75 50.50
> IQR(x = afl.margins)  # width of the interval covered by the middle half of the data
[1] 37.75
> var(afl.margins)
[1] 679.8345
> sd(afl.margins)
[1] 26.07364
> mean(abs(afl.margins - mean(afl.margins)))  # mean absolute deviation
[1] 21.10124
> mad(afl.margins)  # median absolute deviation
[1] 28.9107
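Each of these measures can be rebuilt from its definition, which helps when checking what the functions report. The sketch below uses a small made-up vector; note that mad() multiplies the raw median absolute deviation by a default constant of 1.4826, which is why it is not simply the median of the absolute deviations.

```r
# Made-up data.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

width <- max(x) - min(x)                      # range() itself returns c(min, max)

q          <- quantile(x, probs = c(.25, .75))
iqr_manual <- unname(q[2] - q[1])             # 75th minus 25th percentile

var_manual <- sum((x - mean(x))^2) / (n - 1)  # sample variance (divides by n - 1)
sd_manual  <- sqrt(var_manual)

mean_abs_dev <- mean(abs(x - mean(x)))        # mean absolute deviation
mad_manual   <- 1.4826 * median(abs(x - median(x)))  # scaled median absolute deviation

print(c(iqr = iqr_manual, var = var_manual, sd = sd_manual,
        mean_abs_dev = mean_abs_dev, mad = mad_manual))
```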
Measures of shape
Skewness (a measure of asymmetry) and kurtosis:
> library(psych)
> skew(x = afl.margins)
[1] 0.7671555  # the data are quite skewed
> kurtosi(x = afl.margins)  # note the spelling!
[1] 0.02962633
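For intuition, skewness and excess kurtosis can be sketched from central moments. The two helper functions below are hypothetical names using one common moment-based definition; psych's skew() and kurtosi() may apply a slightly different small-sample adjustment, so the numbers need not match theirs exactly.

```r
# Moment-based skewness: third central moment scaled by variance^(3/2).
moment_skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Moment-based excess kurtosis: fourth central moment scaled by variance^2, minus 3.
moment_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

symmetric <- c(1, 2, 3, 4, 5)    # made-up symmetric data
right_tail <- c(1, 2, 3, 4, 10)  # made-up data with one large value

print(moment_skew(symmetric))    # zero for symmetric data
print(moment_skew(right_tail))   # positive: long tail to the right
print(moment_kurtosis(right_tail))
```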
Summary of a variable
> summary(object = afl.margins)  # argument is numeric
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   0.00   12.75   30.50   35.30   50.50  116.00
> summary(object = afl.finalists)  # argument is a factor
   Adelaide    Brisbane     Carlton Collingwood ...
         26          25          26          28
> f2 <- as.character(afl.finalists)  # factor to character vector
> summary(object = f2)
   Length     Class      Mode
      400 character character
> describe(x = afl.margins)
  var   n mean    sd median trimmed   mad min max range skew kurtosis   se
1   1 176 35.3 26.07   30.5   32.82 28.91   0 116   116 0.77     0.03 1.97
For a logical vector:
e.g. how many blowouts were there?
Blowout = a game in which the winning margin exceeds 50 points.
> blowouts <- afl.margins > 50
> blowouts
 [1]  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE
[14] FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
...
> summary(object = blowouts)
   Mode   FALSE    TRUE    NA's
logical     132      44       0
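Because R treats TRUE as 1 and FALSE as 0 in arithmetic, the count and the proportion of blowouts can also be read off with sum() and mean(). A small sketch with made-up margins:

```r
# Made-up winning margins.
margins <- c(56, 31, 56, 8, 32, 71, 12)

blowouts <- margins > 50        # logical vector: TRUE where the margin exceeds 50

n_blowouts    <- sum(blowouts)  # number of TRUE values
prop_blowouts <- mean(blowouts) # proportion of TRUE values

print(n_blowouts)     # 3
print(prop_blowouts)
```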
Describing a dataframe
> load("clinicaltrial.Rdata")
> who(TRUE)
...
> by(data = clin.trial, INDI...
...
                             Mean   :0.7222
                             3rd Qu.:1.3000
                             Max.   :1.7000
clin.trial$therapy: CBT
       drug         therapy    mood.gain
 placebo :3   no.therapy:0   Min.   :0.300
 anxifree:3   CBT       :9   1st Qu.:0.800
 joyzepam:3                  Median :1.100
                             Mean   :1.044
                             3rd Qu.:1.300
                             Max.   :1.800
Use aggregate() to group by multiple variables.
e.g.
> aggregate(formula = mood.gain ~ drug + therapy, data = clin.trial, FUN = sd)
      drug    therapy mood.gain
1  placebo no.therapy 0.2000000
2 anxifree no.therapy 0.2000000
3 joyzepam no.therapy 0.2081666
4  placebo        CBT 0.3000000
5 anxifree        CBT 0.2081666
6 joyzepam        CBT 0.2645751
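The same grouping idea can be tried on a toy data frame without loading any files. The data frame below is made up (its column names and values are assumptions, not the clinical trial data); it shows aggregate() for two grouping variables and tapply() for one.

```r
# Made-up data: two drugs crossed with two therapy conditions.
trial <- data.frame(
  drug    = factor(c("placebo", "placebo", "anxifree", "anxifree",
                     "placebo", "placebo", "anxifree", "anxifree")),
  therapy = factor(c("none", "none", "none", "none",
                     "CBT", "CBT", "CBT", "CBT")),
  gain    = c(0.2, 0.4, 0.5, 0.7, 0.6, 1.0, 1.1, 1.3)
)

# One row per drug-by-therapy combination, with the mean gain in each cell.
group_means <- aggregate(gain ~ drug + therapy, data = trial, FUN = mean)
print(group_means)

# For a single grouping variable, tapply() returns a named vector instead.
by_therapy <- tapply(trial$gain, trial$therapy, mean)
print(by_therapy)
```

Swapping FUN = mean for FUN = sd gives the per-group standard deviations instead.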
Summarizing an entire dataframe
> summary(clin.trial)
       drug         therapy    mood.gain
 placebo :6   no.therapy:9   Min.   :0.1000
 anxifree:6   CBT       :9   1st Qu.:0.4250
 joyzepam:6                  Median :0.8500
                             Mean   :0.8833
                             3rd Qu.:1.3000
                             Max.   :1.8000
> describe(x = clin.trial)  # load psych package first
          var  n mean   sd median trimmed  mad min max range skew kurtosis   se
drug*       1 18 2.00 0.84   2.00    2.00 1.48 1.0 3.0   2.0 0.00    -1.66 0.20
therapy*    2 18 1.50 0.51   1.50    1.50 0.74 1.0 2.0   1.0 0.00    -2.11 0.12
mood.gain   3 18 0.88 0.53   0.85    0.88 0.67 0.1 1.8   1.7 0.13    -1.44 0.13
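The asterisk that describe() puts after drug and therapy marks variables that are not numeric; their statistics are computed on the underlying integer codes of the factor levels, so a "mean drug of 2" has no substantive meaning. The sketch below illustrates where such a number comes from, using a made-up factor.

```r
# Made-up factor with three levels in a fixed order.
drug <- factor(c("placebo", "anxifree", "joyzepam",
                 "placebo", "anxifree", "joyzepam"),
               levels = c("placebo", "anxifree", "joyzepam"))

codes <- as.numeric(drug)   # integer codes: placebo = 1, anxifree = 2, joyzepam = 3
print(codes)

# The "mean" of the codes depends only on level order and group sizes.
print(mean(codes))          # 2
```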
Standard scores (z)
> x <- c(3, 10, 8, 4, 9, 11, 6)
> mean(x)
[1] 7.285714
> sd(x)
[1] 3.039424
> z <- ((10 - mean(x)) / sd(x))  # calculating z score for 10
> z
[1] 0.8930265
To calculate the percentile rank of the z-score, use pnorm():
> pnorm(0.8930625)
[1] 0.8140881
Interpretation:
• z = 0.8930. The individual score is 0.89 sd above the mean.
• pnorm value: If 10 had been a score for laziness, then that individual is lazier than about 81.4% of people, assuming laziness scores are roughly normally distributed.
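Standardizing every score at once, rather than one value at a time, might look like the sketch below. It uses the seven scores from the example; scale() is base R and subtracts the mean and divides by the standard deviation by default.

```r
x <- c(3, 10, 8, 4, 9, 11, 6)

# z-score for every observation, from the definition.
z_all <- (x - mean(x)) / sd(x)

# scale() does the same centering and scaling; it returns a one-column matrix.
z_scale <- as.numeric(scale(x))

z_10  <- z_all[x == 10]   # z-score of the observation equal to 10
pct_10 <- pnorm(z_10)     # percentile rank under a normal distribution

print(round(z_all, 3))
print(z_10)
print(pct_10)
```

Standardized scores always have mean 0 and standard deviation 1, which is a quick sanity check on the calculation.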
Handling missing values
> partial.data <- c(10, 20, NA, 30)
> mean(x = partial.data)
[1] NA
> mean(x = partial.data, na.rm = TRUE)
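A short sketch of the usual workflow for missing values: count them first, then decide whether to drop them. The vector is the four-value example; na.rm = TRUE tells mean() to ignore NA entries.

```r
partial.data <- c(10, 20, NA, 30)

n_missing <- sum(is.na(partial.data))         # how many values are missing: 1

mean_all <- mean(partial.data)                # NA, because one value is unknown
mean_obs <- mean(partial.data, na.rm = TRUE)  # mean of 10, 20 and 30: 20

print(n_missing)
print(mean_all)
print(mean_obs)
```

The same na.rm argument is accepted by sum(), median(), var(), sd() and several other summary functions.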