Introducing The R Software


Introducing the R software: Free statistics at your fingertips


https://xkcd.com/1478/

Kamarul Imran Musa
MD, M Community Medicine
Associate Professor (Epidemiology and Statistics)
Dept of Community Medicine, School of Medical Sciences, Universiti Sains Malaysia, Health Campus
Email: [email protected]

Overview of presentation
- A bit on data and people dealing with data
- Statistical software choices
- Our main course: R
- Different flavors of R
- Our experiences with R at Health Campus
- Data analysis now and in the future

Data now. Data in the future?
What is data? Facts and statistics collected together for reference and analysis.

https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ (Harvard Business Review)

Scientist and data
- Are you a scientist?
- Do we work with data?
- Scientist + data: are we data scientists?
- What is new for data science in health and medicine?

What does a data scientist do? Data scientists are inquisitive:

http://www-01.ibm.com/software/data/infosphere/data-scientist/

Scientists use tools to work with data
- We need tools in science.
- What is the right tool for a data scientist? In my case, tools to deal with data.
- Scientists use a tool, or tools, to manipulate data, producing results that they then have to make sense of.
- Which tools are available to help scientists work best with their data?

A statistical software package

Choices of statistical software: there are many, so don't get spoiled.
The usual questions for a scientist doing data analysis:
- What choices do you have?
- Which one are you familiar with?
- Popularity?
- Cost?
- Capability?
- After-sales support?
- Does it meet scientific rigor?

IBM SPSS: everyone knows it
- Popular, easy and user-friendly
- How about the cost?
- When does the license expire? For USM, usually every July.
- What happens when it expires? NOTHING works.

http://www.ibm.com/marketplace/cloud/spss-statistics/us/en-us?step=Plan

Stata: fewer people know it, but it is amazing
- Does not expire, but is upgradeable
- Much cheaper than SPSS
- Balanced between code use and point-and-click use
- Powerful

http://www.stata.com/order/new/edu/single-user-licenses/

Who is using what software?
Number of scholarly articles found in the most recent complete year (2014) for each software package. In order of number of articles:
1. SPSS
2. SAS
3. R
4. Stata

http://r4stats.com/articles/popularity/

The number of scholarly articles found in each year by Google Scholar. Only the top six classic statistics packages are shown.


Not-so-new kid on the block: R

What (almost) everybody knows about R
R is GNU S, a freely available language and environment for statistical computing and graphics, which provides a wide variety of statistical and graphical techniques: linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, etc.
Questions:
- What can R do?
- What is special about R?
- Does R have a future?
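Here is a small illustration of what base R can do with nothing extra installed. This minimal sketch uses the built-in mtcars dataset to fit a linear model, run a statistical test and do a clustering step, three of the techniques listed above. The variables (mpg, wt, am) and the choice of three clusters are picked only for illustration and are not from the original talk.

```r
# Minimal sketch: base R only, using the built-in mtcars dataset.

# Linear modelling: fuel economy (mpg) as a function of car weight (wt)
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(summary(fit)))  # estimates, standard errors, t and p values

# Statistical test: Welch two-sample t-test of mpg,
# automatic (am = 0) vs manual (am = 1) transmission
tt <- t.test(mpg ~ am, data = mtcars)
print(tt)

# Clustering: k-means with 3 clusters on standardized mpg and wt
set.seed(123)  # k-means uses random starts; fix the seed so results repeat
km <- kmeans(scale(mtcars[, c("mpg", "wt")]), centers = 3, nstart = 10)
print(table(km$cluster))  # number of cars in each cluster
```

Each call returns an object (an "lm", an "htest", a "kmeans" result) that you can inspect further or pass to other functions. This is part of why R is more flexible than reading output off a menu-driven package.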

R and the R GUI: https://cran.r-project.org/

Revolution R
Microsoft owns Revolution R.

http://www.revolutionanalytics.com/revolution-r-enterprise

Revolution R Enterprise and Revolution R Open

RStudio IDE: https://www.rstudio.com/
- Highly recommended as the place to start
- It is an interface for R
- Requires users to download and install R first from CRAN

RStudio IDE: features
- Clean interface
- Organized
- Integrated with many brilliant built-in tools

DEMO


How does R fit into data analysis now and in the future?


Recognition


Reproducibility (DEMO)


Reproducibility in research
The Associate Editor for reproducibility (AER) will handle submissions of reproducible articles.
- Data: The analytic data from which the principal results were derived are made available on the journal's Web site.
- Code: Any computer code, software, or other computer instructions that were used to compute published results are provided.
- Reproducible: An article is designated as reproducible if the AER succeeds in executing the code on the data provided and produces results matching those that the authors claim are reproducible.
http://biostatistics.oxfordjournals.org/content/10/3/405.full
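The AER check described above amounts to this: run the authors' code on the authors' data and confirm that the results match what they reported. The following minimal base-R sketch illustrates that principle only and is not the journal's actual procedure. run_analysis() is a hypothetical stand-in for an article's analysis code, the built-in mtcars data stand in for the analytic data, and temporary .rds files stand in for the files posted on the journal's site.

```r
# Hypothetical analysis code standing in for the authors' published script
run_analysis <- function(data) {
  fit <- lm(mpg ~ wt, data = data)
  coef(fit)  # the "published results": intercept and slope
}

# Author side: save the analytic data and the reported results
data_file    <- tempfile(fileext = ".rds")
results_file <- tempfile(fileext = ".rds")
saveRDS(mtcars, data_file)
saveRDS(run_analysis(mtcars), results_file)

# Reviewer side: load the provided data, re-run the code, compare
provided_data    <- readRDS(data_file)
reported_results <- readRDS(results_file)
rerun_results    <- run_analysis(provided_data)

# all.equal() allows for tiny floating-point differences between machines
reproducible <- isTRUE(all.equal(rerun_results, reported_results))
cat("Reproducible:", reproducible, "\n")
```

all.equal() is used instead of identical() because a re-run on a different machine can differ in the last few decimal places without the result being meaningfully different.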

On-the-fly reports using R Markdown (DEMO)
- Produce reports on the fly
- In HTML or PDF formats
Benefits:
- Save time
- Reduce errors: no more copy-and-paste
- Pretty output

Integration with other software (DEMO)
- LaTeX
- Stata
- WinBUGS
- SPSS
- SAS
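As a small example of moving data between R and Stata, the foreign package can write and read Stata .dta files. foreign is one of R's recommended packages and is normally installed along with R. The package also provides read.spss() for SPSS files and read.xport() for SAS transport files. This sketch round-trips the built-in mtcars data through a temporary .dta file and assumes foreign is available. foreign only handles older .dta formats (up to Stata 12), so files saved by newer Stata releases may need another package such as haven.

```r
library(foreign)  # recommended package, usually shipped with R

# Export an R data frame to Stata format, then read it back
dta_file <- tempfile(fileext = ".dta")
write.dta(mtcars, dta_file)        # R -> Stata .dta
from_stata <- read.dta(dta_file)   # Stata .dta -> R data frame

str(from_stata[, c("mpg", "cyl", "wt")])
```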

Our experience with R
- No experience yet with undergraduates
- Started teaching R to DrPH candidates this academic session
- Personally introduced R 2 years ago
Common resistance:
- Totally command-driven
- Steep learning curve
- Limited resources, especially books on R (that was 2 years ago; not a problem now)
- You need to know your statistics
- Not for data entry
- Very difficult to view and manipulate variables

How's the feedback from users?
- No formal study or assessment of their experience
- Users seem to like R because it opens up creativity
- R pushes users to explore more and challenge themselves
- R is not boring like point-and-click (menu-driven) software
- They seem to like R Markdown for on-the-fly reports

The BIG question: stick to R? And R only?
- Yes, you may
- Hmm, maybe not:
  - Specialized software for data entry
  - Software for data cleaning
  - Software for data mining

But yes, one software package is enough for 95% of us.

Embrace R and abandon others?
I love R:
- Lots of data analysis
- Creating publications: HTML, PDF
- Spatial data analysis
- Bayesian: WinBUGS, INLA
But I do love Stata too:
- Data cleaning
- Variable manipulation
And I use EpiData for data entry.
But yes, I have left SPSS.

Are two better than one?

East-coast data science user group
My blog: https://designdataanalysis.wordpress.com