1 Descriptive Stats R

  • 8/16/2019 1 Descriptive Stats R

    Descriptive Statistics with R

    1. Initial steps
        1. Read in the data
        2. What variables are in the file?
    2. Measures of central tendency
        1. What to use when?
        2. Sum and mean
        3. Median
        4. Mode
        5. Trimmed mean to remove influence of outliers
    3. Measures of variability
        1. Range
        2. Quartiles and IQR
        3. Variance and sd
        4. Mean absolute deviation, median absolute deviation
    4. Measures of shape
        1. Skewness
        2. Kurtosis
    5. Summary of a variable
    6. Describing a data frame
        1. Descriptive statistics separately for each group
        2. Summarizing an entire dataframe
    7. Standard scores (z)

    Initial Steps

    Read in the data

    • use the load() function

    • read.table or read.csv

    > setwd("~/Documents/statistics/probability_and_statistics_with_R/navarro_datasets")
    > load("aflsmall.Rdata")
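The two ways of reading data listed above can be tried without the original files. The following is a minimal sketch, assuming a small made-up data frame (`toy`) and temporary file paths; none of these names come from the original session.

```r
# Hypothetical data written to temporary files, so the example does not
# depend on any file from the original session.
toy <- data.frame(team = c("A", "B", "C"), margin = c(12, 30, 7))

# Route 1: plain-text file read with read.csv().
tmp_csv <- tempfile(fileext = ".csv")
write.csv(toy, tmp_csv, row.names = FALSE)
toy_csv <- read.csv(tmp_csv)   # returns a data frame; header row gives the names
str(toy_csv)

# Route 2: .Rdata file restored with load().
tmp_rdata <- tempfile(fileext = ".Rdata")
save(toy, file = tmp_rdata)
rm(toy)                        # remove the object to show that load() brings it back
restored <- load(tmp_rdata)    # load() returns the names of the restored objects
print(restored)
```

Note the difference in style: `read.csv()` returns a value that you assign yourself, while `load()` recreates objects under the names they were saved with.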

    What variables are in the file?

    Two ways:

    • use head()

    • load lsr package and use who() function

    > library(lsr)
    > who()
       -- Name --      -- Class --   -- Size --
       afl.finalists   factor        400
       afl.margins     numeric       176
       x               integer       1

    Measures of central tendency

    What to use when

    Measure   Data type
    Mean      Ratio, Interval
    Median    Ordinal (usually), also Ratio, Interval
    Mode      Nominal (usually), also Ordinal, Ratio, Interval

    Sum and mean

    > sum(afl.margins)
    [1] 6213
    > sum(afl.margins[1:5])  # sum of a subset of data
    [1] 183
    > sum(afl.margins[1:5]) / 5  # mean of a subset of data
    [1] 36.6
    > mean(x = afl.margins)  # x is the argument passed to mean()
    [1] 35.30114
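The subsetting pattern above can be tried on any numeric vector. A minimal sketch, assuming a hypothetical vector `margins` (made-up values, not the AFL data):

```r
# Hypothetical margins; the values are made up for illustration.
margins <- c(56, 31, 56, 8, 32, 14, 36)

first_five <- margins[1:5]                              # elements 1 through 5
first_five_sum <- sum(first_five)
first_five_mean <- first_five_sum / length(first_five)  # mean by hand

# mean() on the same subset gives the same result as dividing the sum by hand.
print(first_five_mean)
print(mean(margins[1:5]))
```

Dividing by `length(first_five)` rather than a hard-coded 5 keeps the calculation correct if the subset changes.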

    Median

    Usage:

    • ordinal data

    • ratio data

    • interval data

    For median, first sort:

    > sort(x = afl.margins)
      [1]   0   0   1   1   1   1   2   2   3   3   3   3   3   3   3   3   4   4   5
     [20]   6   7   7   8   8   8   8   8   9   9   9   9   9   9  10  10  10  10  10 ...
    > median(x = afl.margins)
    [1] 30.5
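The sort-then-pick-the-middle idea can be spelled out by hand. A minimal sketch, assuming a hypothetical even-length vector `scores`; with an even number of values the median is the average of the two middle ones:

```r
# Hypothetical values (even length), made up for illustration.
scores <- c(56, 31, 56, 8, 32, 14)

sorted <- sort(scores)
n <- length(sorted)

manual_median <- if (n %% 2 == 1) {
  sorted[(n + 1) / 2]                  # odd n: the single middle value
} else {
  mean(sorted[c(n / 2, n / 2 + 1)])    # even n: average the two middle values
}

print(manual_median)
print(median(scores))                  # built-in result for comparison
```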

    Mode

    Who has played the most finals?

    > print(afl.finalists)
      [1] Hawthorn  Melbourne Carlton   Melbourne
      [5] Hawthorn  Carlton   Melbourne Carlton ...

    Get a frequency table:

    > table(afl.finalists)

    afl.finalists
       Adelaide    Brisbane     Carlton Collingwood ...
             26          25          26          28 ...
    > modeOf(x = afl.finalists)
    [1] "Geelong"
    > maxFreq(x = afl.finalists)
    [1] 39
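If the lsr package is not available, the same two answers can be read off a frequency table in base R. A minimal sketch, assuming a short hypothetical factor `teams` (made-up values, not the real finalists data):

```r
# Hypothetical finalists; values are made up so that one team is clearly modal.
teams <- factor(c("Hawthorn", "Melbourne", "Carlton", "Melbourne",
                  "Hawthorn", "Carlton", "Melbourne", "Carlton", "Carlton"))

freq <- table(teams)                          # counts per level
modal_count <- max(freq)                      # highest frequency
modal_value <- names(freq)[freq == modal_count]

print(freq)
print(modal_value)   # more than one name is returned if there is a tie
print(modal_count)
```

Returning every level that reaches the maximum count is a deliberate choice here: a data set can have more than one mode.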

    Trimmed mean to remove influence of outliers

    > dataset <- c(-15,2,3,4,5,6,7,8,9,12)
    > mean(x=dataset)
    [1] 4.1
    > median(x=dataset)
    [1] 5.5
    > mean(x=dataset, trim=.1)  # trim by 10%, one value on either side
    [1] 5.5  # trimmed mean is same as median

    For afl.margins dataset:

    > mean(x=afl.margins, trim=.05)
    [1] 33.75
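To see what `trim` does, the 10% trimmed mean of the ten-value dataset above can be reproduced by hand: sort, drop one value from each end, and average the rest. This is a sketch of the idea, not necessarily how `mean()` is implemented internally; the helper names (`k`, `kept`) are made up.

```r
dataset <- c(-15, 2, 3, 4, 5, 6, 7, 8, 9, 12)

n <- length(dataset)
k <- floor(n * 0.1)                      # values dropped from each end (here 1)
kept <- sort(dataset)[(k + 1):(n - k)]   # drops -15 and 12

manual_trimmed <- mean(kept)
builtin_trimmed <- mean(dataset, trim = 0.1)

print(kept)
print(manual_trimmed)
print(builtin_trimmed)
```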

    Measures of variability

    > range(afl.margins)
    [1]   0 116
    > quantile(x = afl.margins, probs = c(.25, .75))  # gives 25th and 75th percentile
      25%   75%
    12.75 50.50
    > IQR(x = afl.margins)  # width of the range in which the middle half of the data sits
    [1] 37.75

    > var(afl.margins)
    [1] 679.8345
    > sd(afl.margins)
    [1] 26.07364
    > mean(abs(afl.margins - mean(afl.margins)))  # mean absolute deviation
    [1] 21.10124
    > mad(afl.margins)  # median absolute deviation (scaled by 1.4826 by default)
    [1] 28.9107
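Each of the spread measures above can be rebuilt from its definition, which also makes the default scaling in `mad()` visible. A minimal sketch, assuming a hypothetical vector `values` (made-up numbers):

```r
# Hypothetical values, made up for illustration.
values <- c(56, 31, 56, 8, 32, 14, 36)

# Sample variance and sd: squared deviations divided by n - 1.
manual_var <- sum((values - mean(values))^2) / (length(values) - 1)
manual_sd <- sqrt(manual_var)

# IQR: distance between the 75th and 25th percentiles.
manual_iqr <- diff(quantile(values, probs = c(0.25, 0.75), names = FALSE))

# Median absolute deviation: raw version, and the version mad() returns
# by default, which multiplies the raw value by the constant 1.4826.
raw_mad <- median(abs(values - median(values)))
scaled_mad <- 1.4826 * raw_mad

print(c(var = manual_var, sd = manual_sd, iqr = manual_iqr))
print(c(raw_mad = raw_mad, scaled_mad = scaled_mad))
```

To get the raw median absolute deviation directly, call `mad(values, constant = 1)`.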

    Measures of shape

    Skewness (measure of asymmetry) and kurtosis:

    > library(psych)
    > skew(x=afl.margins)
    [1] 0.7671555  # the data are quite skewed
    > kurtosi(x=afl.margins)  # note the spelling!
    [1] 0.02962633
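For intuition about what these two numbers measure, here is a base-R sketch of the plain moment-based estimators. The function names `moment_skew` and `moment_kurtosis` are made up for this example. psych's `skew()` and `kurtosi()` take a `type` argument that selects among small-sample variants, so their results may differ slightly from these.

```r
# Moment-based skewness: third central moment over the 1.5 power of the second.
moment_skew <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

# Moment-based excess kurtosis: fourth central moment over the squared
# second moment, minus 3 so that a normal distribution scores about 0.
moment_kurtosis <- function(v) {
  d <- v - mean(v)
  mean(d^4) / mean(d^2)^2 - 3
}

symmetric <- c(1, 2, 3, 4, 5)       # hypothetical, symmetric around 3
right_skewed <- c(1, 2, 3, 4, 20)   # hypothetical, one large value on the right

print(moment_skew(symmetric))       # zero for perfectly symmetric data
print(moment_skew(right_skewed))    # positive: long right tail
print(moment_kurtosis(symmetric))   # negative: flatter than a normal curve
```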

    Summary of a variable

    > summary(object = afl.margins)  # argument is numeric
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
       0.00   12.75   30.50   35.30   50.50  116.00
    > summary(object = afl.finalists)  # argument is a factor
       Adelaide    Brisbane     Carlton Collingwood ...
             26          25          26          28 ...
    > f2 <- as.character(afl.finalists)  # factor to character vector
    > summary(object = f2)
       Length     Class      Mode
          400 character character

    > describe(x = afl.margins)
      var   n mean    sd median trimmed   mad min max range skew kurtosis   se
    1   1 176 35.3 26.07   30.5   32.82 28.91   0 116   116 0.77     0.03 1.97

    For a logical vector:

    e.g. how many blowouts were there? Blowout = a game in which the winning margin exceeds 50 points.

    > blowouts <- afl.margins > 50
    > blowouts
      [1]  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE
     [14] FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE ...
    > summary(object = blowouts)
       Mode   FALSE    TRUE    NA's
    logical     132      44       0
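Because R treats TRUE as 1 and FALSE as 0 in arithmetic, a logical vector can be counted and averaged directly. In the summary above, 44 TRUE out of 176 games works out to 25% blowouts. A minimal sketch on a hypothetical vector `margins` (made-up values):

```r
# Hypothetical margins, made up for illustration.
margins <- c(56, 31, 56, 8, 32, 14, 36)

blowouts <- margins > 50          # element-wise comparison gives a logical vector

n_blowouts <- sum(blowouts)       # TRUE counts as 1, so this counts the blowouts
prop_blowouts <- mean(blowouts)   # proportion of games that were blowouts

print(n_blowouts)
print(prop_blowouts)
```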

    Describing a dataframe

    > load("clinicaltrial.Rdata")
    > who(TRUE [...]
    [...] by(data = clin.trial, INDI[...]

                                 Mean   :0.7222
                                 3rd Qu.:1.3000
                                 Max.   :1.7000
    clin.trial$therapy: CBT
           drug         therapy    mood.gain
     placebo :3   no.therapy:0   Min.   :0.300
     anxifree:3   CBT       :9   1st Qu.:0.800
     joyzepam:3                  Median :1.100
                                 Mean   :1.044
                                 3rd Qu.:1.300
                                 Max.   :1.800

    Use aggregate() to group by multiple variables.

    e.g.

    > aggregate(formula = mood.gain ~ drug + therapy, data = clin.trial, FUN = sd)
          drug    therapy mood.gain
    1  placebo no.therapy 0.2000000
    2 anxifree no.therapy 0.2000000
    3 joyzepam no.therapy 0.2081666
    4  placebo        CBT 0.3000000
    5 anxifree        CBT 0.2081666
    6 joyzepam        CBT 0.2645751
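The grouping formula reads as "outcome ~ grouping variables". A minimal sketch on a hypothetical data frame `trial` (made-up values, not the clinical trial data), using `mean` as the summary function. The formula is passed positionally here, which works regardless of what the first argument is named in a given R version.

```r
# Hypothetical trial data: 2 drugs x 2 therapies, two people per cell.
trial <- data.frame(
  drug      = rep(c("placebo", "anxifree"), each = 4),
  therapy   = rep(c("no.therapy", "CBT"), times = 4),
  mood.gain = c(0.5, 0.6, 0.3, 0.9, 0.4, 1.0, 0.2, 1.1)
)

# One row per drug/therapy combination, with the mean mood.gain in each.
group_means <- aggregate(mood.gain ~ drug + therapy, data = trial, FUN = mean)
print(group_means)

# Pull out a single cell of the result.
placebo_cbt <- group_means$mood.gain[group_means$drug == "placebo" &
                                     group_means$therapy == "CBT"]
print(placebo_cbt)
```

Swapping `FUN = mean` for `FUN = sd` gives the same layout as the output above, with standard deviations in the last column.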

    Summarizing an entire dataframe

    > summary(clin.trial)
           drug         therapy    mood.gain
     placebo :6   no.therapy:9   Min.   :0.1000
     anxifree:6   CBT       :9   1st Qu.:0.4250
     joyzepam:6                  Median :0.8500
                                 Mean   :0.8833
                                 3rd Qu.:1.3000
                                 Max.   :1.8000

    > describe(x=clin.trial)  # load psych package first
              var  n mean   sd median trimmed  mad min max range skew kurtosis   se
    drug*       1 18 2.00 0.84   2.00    2.00 1.48 1.0 3.0   2.0 0.00    -1.66 0.20
    therapy*    2 18 1.50 0.51   1.50    1.50 0.74 1.0 2.0   1.0 0.00    -2.11 0.12
    mood.gain   3 18 0.88 0.53   0.85    0.88 0.67 0.1 1.8   1.7 0.13    -1.44 0.13
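In the describe() output, rows marked with an asterisk are factors that were converted to integer codes before the statistics were computed, so their mean and sd describe the codes rather than anything meaningful about the groups. A minimal base-R sketch of that distinction, assuming a hypothetical data frame `trial` (made-up values):

```r
# Hypothetical data frame with one factor and one numeric column.
trial <- data.frame(
  drug      = factor(c("placebo", "placebo", "anxifree", "anxifree")),
  mood.gain = c(0.5, 0.3, 0.4, 1.0)
)

# What a factor looks like once converted to numbers: codes follow the
# level order (alphabetical by default), here anxifree = 1, placebo = 2.
drug_codes <- as.integer(trial$drug)
print(drug_codes)

# Restrict numeric summaries to the columns that really are numeric.
is_num <- vapply(trial, is.numeric, logical(1))
num_stats <- sapply(trial[is_num], function(col) {
  c(mean = mean(col), sd = sd(col), median = median(col))
})
print(num_stats)
```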

    Standard scores (z)

    > x <- c(3,10,8,4,9,11,6)
    > mean(x)
    [1] 7.285714
    > sd(x)
    [1] 3.039424
    > z <- ((10 - mean(x)) / sd(x))  # calculating z score for 10
    > z
    [1] 0.8930265

    To calculate the percentile rank of the z-score, use pnorm():

    > pnorm(.8930625)
    [1] 0.8140881

    Interpretation:

    • z = 0.8930. The individual score is 0.89 sd above the mean.

    • pnorm value: If 10 had been a score for laziness, then that individual is lazier than about 81.4% of people, assuming laziness scores are normally distributed.
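The same calculation can be done for every score at once, and the result of the z-score step can be passed to `pnorm()` directly instead of retyping the number. A minimal sketch using the vector from the example above; the names `z_all`, `z_scaled`, `z_10` and `p_10` are made up here.

```r
x <- c(3, 10, 8, 4, 9, 11, 6)

# z-scores for all values: distance from the mean in units of sd.
z_all <- (x - mean(x)) / sd(x)

# scale() performs the same centring and scaling; as.numeric() drops the
# matrix wrapper and attributes it returns.
z_scaled <- as.numeric(scale(x))

# z-score and normal-theory percentile rank for the score of 10.
z_10 <- z_all[x == 10]
p_10 <- pnorm(z_10)

print(round(z_all, 3))
print(z_10)
print(p_10)
```

Note that `pnorm()` gives the proportion of a normal distribution below the z-score. With only seven observations, that is a model-based estimate rather than a count of the people sampled.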

    Handling missing values

    > partial.data <- c(10, 20, NA, 30)
    > mean(x = partial.data)
    [1] NA
    > mean(x = partial.data, na.rm = TRUE)
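A short sketch of how missing values propagate and how `na.rm` changes the result. The values in `partial.data` are assumed for illustration, and the other variable names are made up.

```r
# Assumed example values with one missing observation.
partial.data <- c(10, 20, NA, 30)

mean_with_na <- mean(partial.data)                # NA: one missing value poisons the result
mean_no_na <- mean(partial.data, na.rm = TRUE)    # missing value dropped before averaging
sd_no_na <- sd(partial.data, na.rm = TRUE)        # na.rm works the same way in sd()
n_missing <- sum(is.na(partial.data))             # how many values were dropped

print(mean_with_na)
print(mean_no_na)
print(n_missing)
```

Checking `n_missing` before relying on `na.rm = TRUE` is a useful habit, since the argument silently reduces the sample size.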