Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R...

43
Overview Introduction Analysing data: The iris data example Using R for data analysis Daniel M¨ ullensiefen Goldsmiths, University of London August 18, 2009 Daniel M¨ ullensiefen Goldsmiths, University of London Using R for data analysis

Transcript of Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R...

Page 1: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

Using R for data analysis

Daniel MullensiefenGoldsmiths, University of London

August 18, 2009

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 2: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

IntroductionWhat’s it good for?R and its competitorsCore characteristicsHistory

Analysing data: The iris data exampleGetting data inSummarising data

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 3: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

What’s it good for?R and its competitorsCore characteristicsHistory

R is good for

I Flexible Data Analysis (programmable)

I Using different analysis techniques

I Data Visualisation

I Numeric Accuracy

I Rapid prototyping of analysis / process modelsI Pre-processing data from different sources

I textfiles (.txt) and binary files (e.g. SPSS .sav, Excel)I Audio filesI databasesI texts (linguistic data)

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 4: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

What’s it good for?R and its competitorsCore characteristicsHistory

R is good for

I Flexible Data Analysis (programmable)

I Using different analysis techniques

I Data Visualisation

I Numeric Accuracy

I Rapid prototyping of analysis / process modelsI Pre-processing data from different sources

I textfiles (.txt) and binary files (e.g. SPSS .sav, Excel)I Audio filesI databasesI texts (linguistic data)

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 5: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

What’s it good for?R and its competitorsCore characteristicsHistory

R is good for

I Flexible Data Analysis (programmable)

I Using different analysis techniques

I Data Visualisation

I Numeric Accuracy

I Rapid prototyping of analysis / process modelsI Pre-processing data from different sources

I textfiles (.txt) and binary files (e.g. SPSS .sav, Excel)I Audio filesI databasesI texts (linguistic data)

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 6: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

What’s it good for?R and its competitorsCore characteristicsHistory

R is good for

I Flexible Data Analysis (programmable)

I Using different analysis techniques

I Data Visualisation

I Numeric Accuracy

I Rapid prototyping of analysis / process modelsI Pre-processing data from different sources

I textfiles (.txt) and binary files (e.g. SPSS .sav, Excel)I Audio filesI databasesI texts (linguistic data)

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 7: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

What’s it good for?R and its competitorsCore characteristicsHistory

R is good for

I Flexible Data Analysis (programmable)

I Using different analysis techniques

I Data Visualisation

I Numeric Accuracy

I Rapid prototyping of analysis / process models

I Pre-processing data from different sourcesI textfiles (.txt) and binary files (e.g. SPSS .sav, Excel)I Audio filesI databasesI texts (linguistic data)

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 8: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

What’s it good for?R and its competitorsCore characteristicsHistory

R is good for

I Flexible Data Analysis (programmable)

I Using different analysis techniques

I Data Visualisation

I Numeric Accuracy

I Rapid prototyping of analysis / process modelsI Pre-processing data from different sources

I textfiles (.txt) and binary files (e.g. SPSS .sav, Excel)

I Audio filesI databasesI texts (linguistic data)

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 9: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

What’s it good for?R and its competitorsCore characteristicsHistory

R is good for

I Flexible Data Analysis (programmable)

I Using different analysis techniques

I Data Visualisation

I Numeric Accuracy

I Rapid prototyping of analysis / process modelsI Pre-processing data from different sources

I textfiles (.txt) and binary files (e.g. SPSS .sav, Excel)I Audio files

I databasesI texts (linguistic data)

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 10: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

What’s it good for?R and its competitorsCore characteristicsHistory

R is good for

I Flexible Data Analysis (programmable)

I Using different analysis techniques

I Data Visualisation

I Numeric Accuracy

I Rapid prototyping of analysis / process modelsI Pre-processing data from different sources

I textfiles (.txt) and binary files (e.g. SPSS .sav, Excel)I Audio filesI databases

I texts (linguistic data)

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 11: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

What’s it good for?R and its competitorsCore characteristicsHistory

R is good for

I Flexible Data Analysis (programmable)

I Using different analysis techniques

I Data Visualisation

I Numeric Accuracy

I Rapid prototyping of analysis / process modelsI Pre-processing data from different sources

I textfiles (.txt) and binary files (e.g. SPSS .sav, Excel)I Audio filesI databasesI texts (linguistic data)

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 12: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

What’s it good for?R and its competitorsCore characteristicsHistory

R is considered less good for

I Graphical User Interfaces

I Internet programming

I Low-level programming

I ...

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 13: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

What’s it good for?R and its competitorsCore characteristicsHistory

R is considered less good for

I Graphical User Interfaces

I Internet programming

I Low-level programming

I ...

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 14: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

What’s it good for?R and its competitorsCore characteristicsHistory

R is considered less good for

I Graphical User Interfaces

I Internet programming

I Low-level programming

I ...

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 15: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

What’s it good for?R and its competitorsCore characteristicsHistory

R is considered less good for

I Graphical User Interfaces

I Internet programming

I Low-level programming

I ...

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 16: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

What’s it good for?R and its competitorsCore characteristicsHistory

R compares to

I Matlab (open source, community driven, not commercial)

I SPSS, SAS, Stata (programming language, not program)

I Weka (driven by community, not individuals)

I SciPy and other software libraries (entire language specialisedfor data analysis)

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 17: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

What’s it good for?R and its competitorsCore characteristicsHistory

R compares to

I Matlab (open source, community driven, not commercial)

I SPSS, SAS, Stata (programming language, not program)

I Weka (driven by community, not individuals)

I SciPy and other software libraries (entire language specialisedfor data analysis)

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 18: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

What’s it good for?R and its competitorsCore characteristicsHistory

R compares to

I Matlab (open source, community driven, not commercial)

I SPSS, SAS, Stata (programming language, not program)

I Weka (driven by community, not individuals)

I SciPy and other software libraries (entire language specialisedfor data analysis)

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 19: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

What’s it good for?R and its competitorsCore characteristicsHistory

R compares to

I Matlab (open source, community driven, not commercial)

I SPSS, SAS, Stata (programming language, not program)

I Weka (driven by community, not individuals)

I SciPy and other software libraries (entire language specialisedfor data analysis)

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 20: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

What’s it good for?R and its competitorsCore characteristicsHistory

Pros and Cons

I Huge community support

I Cross-plattform and command-line based

I Interactive: interpreted not complied

I Mainly functional

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 21: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

What’s it good for?R and its competitorsCore characteristicsHistory

Pros and Cons

I Huge community support

I Cross-plattform and command-line based

I Interactive: interpreted not complied

I Mainly functional

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 22: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

What’s it good for?R and its competitorsCore characteristicsHistory

Pros and Cons

I Huge community support

I Cross-plattform and command-line based

I Interactive: interpreted not complied

I Mainly functional

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 23: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

What’s it good for?R and its competitorsCore characteristicsHistory

Pros and Cons

I Huge community support

I Cross-plattform and command-line based

I Interactive: interpreted not complied

I Mainly functional

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 24: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

What’s it good for?R and its competitorsCore characteristicsHistory

How R came about

I 1976: John Chambers releases 1st version of S: Language forstatistics, stochastic simulation and data visualisation

I 1995: Ross Ihaka and Robert Gentleman release R as GPL

I 1998: Comprehensive R Archive Network (CRAN) founded

I 2001: R News published for 1st time

I 2004: 1st useR! conference

I 2009: More than 1000 packages available on CRAN

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 25: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

What’s it good for?R and its competitorsCore characteristicsHistory

How R came about

I 1976: John Chambers releases 1st version of S: Language forstatistics, stochastic simulation and data visualisation

I 1995: Ross Ihaka and Robert Gentleman release R as GPL

I 1998: Comprehensive R Archive Network (CRAN) founded

I 2001: R News published for 1st time

I 2004: 1st useR! conference

I 2009: More than 1000 packages available on CRAN

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 26: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

What’s it good for?R and its competitorsCore characteristicsHistory

How R came about

I 1976: John Chambers releases 1st version of S: Language forstatistics, stochastic simulation and data visualisation

I 1995: Ross Ihaka and Robert Gentleman release R as GPL

I 1998: Comprehensive R Archive Network (CRAN) founded

I 2001: R News published for 1st time

I 2004: 1st useR! conference

I 2009: More than 1000 packages available on CRAN

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 27: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

What’s it good for?R and its competitorsCore characteristicsHistory

How R came about

I 1976: John Chambers releases 1st version of S: Language forstatistics, stochastic simulation and data visualisation

I 1995: Ross Ihaka and Robert Gentleman release R as GPL

I 1998: Comprehensive R Archive Network (CRAN) founded

I 2001: R News published for 1st time

I 2004: 1st useR! conference

I 2009: More than 1000 packages available on CRAN

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 28: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

What’s it good for?R and its competitorsCore characteristicsHistory

How R came about

I 1976: John Chambers releases 1st version of S: Language forstatistics, stochastic simulation and data visualisation

I 1995: Ross Ihaka and Robert Gentleman release R as GPL

I 1998: Comprehensive R Archive Network (CRAN) founded

I 2001: R News published for 1st time

I 2004: 1st useR! conference

I 2009: More than 1000 packages available on CRAN

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 29: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

What’s it good for?R and its competitorsCore characteristicsHistory

How R came about

I 1976: John Chambers releases 1st version of S: Language forstatistics, stochastic simulation and data visualisation

I 1995: Ross Ihaka and Robert Gentleman release R as GPL

I 1998: Comprehensive R Archive Network (CRAN) founded

I 2001: R News published for 1st time

I 2004: 1st useR! conference

I 2009: More than 1000 packages available on CRAN

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 30: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

Getting data inSummarising data

Basic data in and out

I Start R

I Save file fromhttp://www.doc.gold.ac.uk/˜mas03dm/teaching/r/iris.data.txtto R‘s working directory (using getwd()

I Get data into R using the read.table command (usefuloperations help(read.table) and assignment operator“←”)

I Change the species label of the 3rd observations to your ownfirst name (using the indexing function [ , ]), save thisdataset (using write.table())

I Remove the altered dataset (using rm()) and get the originaldataset in again

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 31: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

Getting data inSummarising data

Basic data in and out

I Start R

I Save file fromhttp://www.doc.gold.ac.uk/˜mas03dm/teaching/r/iris.data.txtto R‘s working directory (using getwd()

I Get data into R using the read.table command (usefuloperations help(read.table) and assignment operator“←”)

I Change the species label of the 3rd observations to your ownfirst name (using the indexing function [ , ]), save thisdataset (using write.table())

I Remove the altered dataset (using rm()) and get the originaldataset in again

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 32: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

Getting data inSummarising data

Basic data in and out

I Start R

I Save file fromhttp://www.doc.gold.ac.uk/˜mas03dm/teaching/r/iris.data.txtto R‘s working directory (using getwd()

I Get data into R using the read.table command (usefuloperations help(read.table) and assignment operator“←”)

I Change the species label of the 3rd observations to your ownfirst name (using the indexing function [ , ]), save thisdataset (using write.table())

I Remove the altered dataset (using rm()) and get the originaldataset in again

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 33: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

Getting data inSummarising data

Basic data in and out

I Start R

I Save file fromhttp://www.doc.gold.ac.uk/˜mas03dm/teaching/r/iris.data.txtto R‘s working directory (using getwd()

I Get data into R using the read.table command (usefuloperations help(read.table) and assignment operator“←”)

I Change the species label of the 3rd observations to your ownfirst name (using the indexing function [ , ]), save thisdataset (using write.table())

I Remove the altered dataset (using rm()) and get the originaldataset in again

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 34: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

Getting data inSummarising data

Basic data in and out

I Start R

I Save file fromhttp://www.doc.gold.ac.uk/˜mas03dm/teaching/r/iris.data.txtto R‘s working directory (using getwd()

I Get data into R using the read.table command (usefuloperations help(read.table) and assignment operator“←”)

I Change the species label of the 3rd observations to your ownfirst name (using the indexing function [ , ]), save thisdataset (using write.table())

I Remove the altered dataset (using rm()) and get the originaldataset in again

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 35: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

Getting data inSummarising data

Data summary and plots

I Summarise dataset (summary(), str() )

I Plot 1st column vs 2nd column (plot())

I Attach dataset to search path (attach())

I Plot Species vs Petal.Width and give graph a title and axesnames

I Plot histogram of Petal.Length (hist()

I Plot scattergram of full dataset(plot(dataset,col=Species))

I Add non-parametric smoother (plot(dataset,col=Species, panel=panel.smooth))

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 36: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

Getting data inSummarising data

Data summary and plots

I Summarise dataset (summary(), str() )

I Plot 1st column vs 2nd column (plot())

I Attach dataset to search path (attach())

I Plot Species vs Petal.Width and give graph a title and axesnames

I Plot histogram of Petal.Length (hist()

I Plot scattergram of full dataset(plot(dataset,col=Species))

I Add non-parametric smoother (plot(dataset,col=Species, panel=panel.smooth))

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 37: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

Getting data inSummarising data

Data summary and plots

I Summarise dataset (summary(), str() )

I Plot 1st column vs 2nd column (plot())

I Attach dataset to search path (attach())

I Plot Species vs Petal.Width and give graph a title and axesnames

I Plot histogram of Petal.Length (hist()

I Plot scattergram of full dataset(plot(dataset,col=Species))

I Add non-parametric smoother (plot(dataset,col=Species, panel=panel.smooth))

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 38: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

Getting data inSummarising data

Data summary and plots

I Summarise dataset (summary(), str() )

I Plot 1st column vs 2nd column (plot())

I Attach dataset to search path (attach())

I Plot Species vs Petal.Width and give graph a title and axesnames

I Plot histogram of Petal.Length (hist()

I Plot scattergram of full dataset(plot(dataset,col=Species))

I Add non-parametric smoother (plot(dataset,col=Species, panel=panel.smooth))

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 39: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

Getting data inSummarising data

Data summary and plots

I Summarise dataset (summary(), str() )

I Plot 1st column vs 2nd column (plot())

I Attach dataset to search path (attach())

I Plot Species vs Petal.Width and give graph a title and axesnames

I Plot histogram of Petal.Length (hist()

I Plot scattergram of full dataset(plot(dataset,col=Species))

I Add non-parametric smoother (plot(dataset,col=Species, panel=panel.smooth))

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 40: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

Getting data inSummarising data

Data summary and plots

I Summarise dataset (summary(), str() )

I Plot 1st column vs 2nd column (plot())

I Attach dataset to search path (attach())

I Plot Species vs Petal.Width and give graph a title and axesnames

I Plot histogram of Petal.Length (hist()

I Plot scattergram of full dataset(plot(dataset,col=Species))

I Add non-parametric smoother (plot(dataset,col=Species, panel=panel.smooth))

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 41: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

Getting data inSummarising data

Data summary and plots

I Summarise dataset (summary(), str() )

I Plot 1st column vs 2nd column (plot())

I Attach dataset to search path (attach())

I Plot Species vs Petal.Width and give graph a title and axesnames

I Plot histogram of Petal.Length (hist()

I Plot scattergram of full dataset(plot(dataset,col=Species))

I Add non-parametric smoother (plot(dataset,col=Species, panel=panel.smooth))

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 42: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

Getting data inSummarising data

More plots and a function

I Do boxplot(Petal.Length Species,notch=TRUE). Whatare the notches?

I Set the graphical device to be split into a 2x2 panel: op ←par(mfrow = c(2,2)

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis

Page 43: Using R for data analysis - Goldsmiths, University of Londonmas03dm/papers/r_intro.pdf · Using R for data analysis Daniel Mullensiefen Goldsmiths, University of London August 18,

OverviewIntroduction

Analysing data: The iris data example

Getting data inSummarising data

More plots and a function

I Do boxplot(Petal.Length Species,notch=TRUE). Whatare the notches?

I Set the graphical device to be split into a 2x2 panel: op ←par(mfrow = c(2,2)

Daniel Mullensiefen Goldsmiths, University of London Using R for data analysis