Introductory Statistics with R, Dalgaard P (2002), ISBN 0387954759; 267 pages; £21.00, €29.95, $44.95...

end of each chapter. This enables readers to reproduce results, as the data and code are also available on the internet. The website listed is incorrect, but a quick internet search rectifies the situation (omitting the ‘tilde’ in the web address should make it work). Unfortunately, for confidentiality reasons, not all the data are available. It would have been nicer if the authors had disguised some aspects of the data to make them available.

All the chapters use examples and methodology to motivate the S-Plus code. A majority of the chapters work through examples step by step, highlighting methodological issues whilst discussing how to use S-Plus. For some of the trickier analyses, functions have been written and supplied at the end of each chapter. Some chapters present the methodology and examples in detail, treating the S-Plus aspects as secondary (and concealing the latter in an appendix in most cases). My preference was for the former style.

Rather bizarrely, Chapter 11 is the only chapter that does not supply the S-Plus functions in the appendix. Instead the reader is expected to download them from the internet (not easy as I was travelling on a train at the time!). When I did check the website, two of the functions (N.HW and QuickSize.Fisher) appeared not to be available.

Chapter 13 (‘Analysis of Variance: a Comparison between SAS and S-Plus’) is probably the best starting point for SAS users. It offers a direct comparison between how things are done in both packages. Output from both is discussed, with clarification of what at first sight may appear to be differences between the two.

In summary, I feel the book has achieved its aim to describe statistical methods used in the pharmaceutical industry and their implementation using S-Plus. It is not an introduction to statistical methods in the pharmaceutical industry or to S-Plus (it does not claim to be either) and therefore readers will need some familiarity with both.

I would recommend this book to M.Sc. and Ph.D. students in biostatistics and statisticians working in the pharmaceutical industry wishing to learn more about S-Plus. Absolute beginners in S-Plus would probably be better off learning and experimenting with S-Plus or R before tackling this text.

Saghir Bashir
SBTC Limited
[email protected]
(DOI: 10.1002/pst.076)

Introductory Statistics with R
Dalgaard P (2002)
ISBN 0387954759; 267 pages; £21.00, €29.95, $44.95
Springer; http://www.springer.de/cgi/svcat/search_book.pl?isbn=0387954759

This book was written by one of the R Development Core Team members, and the author’s familiarity with the language is obvious. He provides the reader with information in a natural way, including numerous pieces of R code and figures. R is an implementation of the S language, originally developed by John Chambers, Richard Becker and Allan Wilks at AT&T Bell Laboratories in the 1980s. R is similar to S-Plus, a commercial package from the Insightful Corporation, in that they are both interpretations of the S language, but R is written under the GNU General Public License and is available as free software. For more information on R visit its website at http://www.R-project.org or download it from http://cran.R-project.org.

There is always a danger with writing a book that depends on a specific software package, or rather the version of it just prior to printing the book. Published in 2002, this book refers to R version 1.5.0, and at the time of this review the version was already at 1.7.0. The good news is that I believe R should not change substantially in the near future and that this book is a solid introduction to not only basic data analysis and exploration, but also the fundamental concepts of the S language, as implemented in R. The author does provide a web page for the book with a brief description and errata (http://www.biostat.ku.dk/~pd/ISwR.html).

The first three chapters offer a nice blend of basic statistical techniques and R syntax, with brief descriptions of the numerous data sets analysed throughout the book. These chapters are sufficient to provide the reader with the necessary background for future chapters. The author then introduces additional statistical concepts, such as hypothesis testing and correlation. Here, the graphical capabilities of R begin to be realized. The ability to visualize several pieces of information in the same figure is crucial for modern-day exploratory data analysis. Contingency tables, power and sample-size calculations are then briefly discussed. Two chapters are devoted to linear regression and, more generally, linear models. Again, the author provides numerous graphical summaries of the models and their diagnostics. Logistic regression and survival analysis are the last two topics covered.

Appendices at the end of the book include instructions on how to download and install R, the help files for all data sets in the author’s R package, ISwR, and most importantly a compendium of basic R commands. Exercises are provided at the end of each chapter and make use of the ISwR package. Installation of the ISwR package on my SGI workstation was seamless, and reproducing R code from the book worked every time.

This book is concise and published in paperback, making the price quite reasonable. I highly recommend this book for new R users and as a supplementary textbook for those teaching introductory statistics who want their students to learn a flexible programming environment.

Brandon Whitcher
GlaxoSmithKline
(DOI: 10.1002/pst.077)

Book Reviews 233
Copyright © 2003 John Wiley & Sons, Ltd. Pharmaceut. Statist. 2003; 2: 229–233
