Applied Statistics Using SPSS, STATISTICA, MATLAB and R, 2nd Edition by Joaquim P. Marques de Sá

International Statistical Review (2007), 75, 3, 409–438 doi:10.1111/j.1751-5823.2007.00030.x

Short Book Reviews
Editor: Simo Puntanen

Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage
Zdravko Markov, Daniel T. Larose
Wiley, 2007, xvi + 218 pages, £ 38.95 / € 56.00, hardcover
ISBN: 978-0-471-66655-4

Table of contents

Part I: Web structure mining
1. Information retrieval and web search
2. Hyperlink-based ranking
Part II: Web content mining
3. Clustering
4. Evaluating clustering
5. Classification
Part III: Web usage mining
6. Introduction to web usage mining
7. Preprocessing for web usage mining
8. Exploratory data analysis for web usage mining
9. Modeling for web usage mining: clustering, association, and classification

Readership: Graduate and advanced undergraduate students in computer science and engineering, web designers and researchers.

This book is the third volume in a data mining series, with the two previous ones, authored by Daniel Larose, entitled Discovering Knowledge in Data: An Introduction to Data Mining and Data Mining Methods and Models. The book stands independently of the earlier two, addressing the particular data mining topics of the web. It is divided into three parts concerned with, respectively, mining the structure, content, and usage of the web.

Web mining is an increasingly important sub-domain of data mining. It has multiple aspects – and, of course, intersects heavily with other application domains of data mining. This book aims to explain the operation of the relevant data mining algorithms and tools by leading the reader through applications of them to small data sets, and then illustrating them by applying them to large real data sets. The authors have used the Weka software system (a public domain data analysis collection) and SPSS Clementine to illustrate. An extensive range of exercises is included.

The book is supported by a website. Although it was not complete at the time of the review, the aim is to include the data sets used in the book and links to other data mining resources. For faculty who adopt the book for teaching, there is intended to be a further restricted-access website which includes solutions to the exercises, PowerPoint presentations of each chapter, sample data mining projects, and other resources.

Overall, the book is highly accessible and clearly presented. In my view it provides an excellent introduction to mining the web.

David J. Hand

Mathematics Department, Imperial College

London SW7 2AZ, UK


© 2007 The Author. Journal compilation © 2007 International Statistical Institute. Published by Blackwell Publishing Ltd, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA.


Dynamic Data Assimilation: A Least Squares Approach
John M. Lewis, S. Lakshmivarahan, Sudarshan Dhall
Cambridge University Press, 2006, xxii + 654 pages, £ 80.00, hardcover
ISBN: 978-0-521-85155-8

Table of contents

1. Synopsis
2. Pathways into data assimilation: illustrative examples
3. Applications
4. Brief history of data assimilation
5. Linear least squares estimation: method of normal equations
6. A geometric view: projection and invariance
7. Nonlinear least squares estimation
8. Recursive least squares estimation
9. Matrix methods
10. Optimization: steepest descent method
11. Conjugate direction/gradient methods
12. Newton and quasi-Newton methods
13. Principles of statistical estimation
14. Statistical least squares estimation
15. Maximum likelihood method
16. Bayesian estimation method
17. From Gauss to Kalman: sequential, linear minimum variance estimation
18. Data assimilation – static models: concepts and formulation
19. Classical algorithms for data assimilation
20. 3DVAR – a Bayesian formulation
21. Spatial digital filters
22. Dynamical data assimilation: the straight line problem
23. First-order adjoint method: linear dynamics
24. First-order adjoint method: nonlinear dynamics
25. Second-order adjoint method
26. The 4DVAR problem: a statistical and a recursive view
27. Linear filtering – Part I: Kalman filter
28. Linear filtering – Part II
29. Nonlinear filtering
30. Reduced rank filters
31. Predictability: a stochastic view
32. Predictability: a deterministic view

Readership: First-year graduate students in meteorology, physics, industrial engineering, petroleum and geological engineering, and computer science, as well as researchers and practitioners.

This book is about combining observations with dynamic models. The book is divided into eight parts, beginning with examples and history, working through deterministic approaches and computational techniques, covering statistical issues (at a fairly basic level, to make the book self-contained, covering least squares and maximum likelihood estimation, as well as Bayesian methods), moving on to stochastic and dynamic models, and finishing with two chapters on predictability. Although the practical examples in the book emphasize oceanography, meteorology, and related areas, the techniques described in the book apply equally to a large number of other problems and situations. The book is comprehensive and extensive and provides an excellent core text and reference work on ‘dynamic data assimilation’, regardless of the particular application domain.

The authors encourage the use of MATLAB for classroom instruction, and there are exercises, though these are limited in number and would need to be supplemented if the book were to be used as a course text.

David J. Hand


London SW7 2AZ, UK



Fundamentals of Clinical Research: Bridging Medicine, Statistics and Operations
Antonella Bacchieri, Giovanni Della Cioppa
Springer, 2007, xxv + 343 pages, US$ 99 / € 76.92, hardcover
ISBN: 978-88-470-0491-7

Table of contents

1. Variability of biological phenomena and measurement errors
2. Distinctive aspects of a biomedical study: observational and experimental studies
3. Observational studies
4. Defining the treatment effect
5. Probability, inference and decision making
6. The choice of the sample
7. The choice of treatments
8. Experimental design: fallacy of ‘before-after’ comparisons in uncontrolled studies
9. Experimental design: the randomized blinded study as an instrument to reduce bias
10. Experimental designs
11. Study variants applicable to more than one type of design: equivalence studies, interim analyses, adaptive plans and repeated measurements
12. The drug development process and the phases of clinical research

Readership: The authors do not describe their intended audience, but I would guess that he or she is the clinical researcher who wants an intelligent discussion of the (mainly) statistical aspects of clinical research. It would make useful background reading for a course in clinical trials. However, this would need to be supplemented by a more practical book, even for those who are only consumers of research, which explains how to interpret what is actually published in clinical research.

This book was originally published in Italian in 2004. I think it exemplifies the contrast between the more philosophical approach of southern Europe and the more pragmatic Anglo-Saxon approach of many UK/US books. The objectives of the book have been summarized by the authors as: Integrate medical and statistical components of clinical research; Do justice to the operational and practical requirements of clinical research; Give space to the ethical implications of methodological issues in clinical research. The book is well written. There are few issues with which I would disagree. Given the background of the authors, clinical research is taken to mean largely clinical trials, particularly pharmacological clinical trials, with only one chapter devoted to observational studies. The book is essentially an extended essay on the application of statistical inference in clinical research. There are no worked examples and no exercises. The level of mathematics is kept deliberately simple. There are very few examples from the medical literature and some of these are fictitious. An example of the style is the discussion of the problems of sample size calculations. The authors discuss power and give the usual diagram of overlapping normal distributions for the null and alternative hypotheses. They then discuss nine individual components of a power calculation. However, after this the reader would be unable to carry out a sample size calculation, and no real example is given. They don’t even give the ‘16 over delta squared’ rule.
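For reference, the ‘16 over delta squared’ rule is a common shorthand for the per-group sample size in a two-arm comparison of means. With a two-sided 5% significance level and 80% power, the normal-approximation formula n = 2(z_0.975 + z_0.80)² / Δ² is close to 16/Δ², where Δ is the difference in means divided by the common standard deviation. The sketch below is illustrative only and is not taken from the book: the function names are hypothetical, and it assumes equal group sizes, a known common standard deviation and a normal approximation.

```python
from statistics import NormalDist


def n_per_group(delta, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for comparing two means.

    delta is the standardized difference (difference in means divided by
    the common standard deviation). Hypothetical helper for illustration.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    return 2 * (z_alpha + z_beta) ** 2 / delta ** 2


def n_rule_of_thumb(delta):
    """The '16 over delta squared' shortcut (80% power, two-sided 5% level)."""
    return 16 / delta ** 2


# For a standardized difference of 0.5 the shortcut gives 16 / 0.25 = 64 per
# group, slightly above the value from the full normal-approximation formula.
exact = n_per_group(0.5)
approx = n_rule_of_thumb(0.5)
```

The shortcut works because 2(1.96 + 0.84)² is about 15.7, which rounds up to 16; in practice the result would be rounded up to a whole number of participants.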

The longest chapter is that discussing probability, inference and decision making, which also discusses frequentist and Bayesian methods. Confidence intervals are explained, but don’t feature in the index, and estimation is not emphasized. The concept of maximum likelihood is introduced, but not tied in with data analysis. There are four chapters on designs of clinical trials and a final chapter on the drug development process and the phases of clinical trials.

M. J. Campbell

Medical Statistics Group, ScHARR, University of Sheffield

Regent’s Court, 30 Regent St., Sheffield S1 4DA, UK


Bayesian Process Monitoring, Control and Optimization
Enrique del Castillo, Bianca M. Colosimo (Editors)
Chapman & Hall/CRC, 2007, 336 pages, US$ 89.95, hardcover
ISBN: 978-1-58488-544-3

Table of contents

1. An introduction to Bayesian inference in process monitoring, control and optimization
2. Modern numerical methods in Bayesian computation
3. A Bayesian approach to statistical process control
4. Empirical Bayes process monitoring techniques
5. A Bayesian approach to monitoring the mean of a multivariate normal process
6. Two-sided Bayesian X control charts for short production runs
7. Bayes’ rule of information and monitoring in manufacturing integrated circuits
8. A Bayesian approach to signal analysis of pulse trains
9. Bayesian approaches to process monitoring and process adjustment
10. A review of Bayesian reliability approaches to multiple response surface optimization
11. An application of Bayesian statistics to sequential empirical optimization
12. Bayesian estimation from saturated factorial designs

Readership: Applied statisticians working in industry, process engineers and quality engineers working in manufacturing, academics working in industrial and manufacturing engineering, applied statistics, and operations research departments.

This book provides an up-to-date survey of the applications of Bayesian statistics in three specific fields of industrial engineering and industrial statistics: process monitoring, process control, and process optimization. The adoption of Bayesian techniques in these areas has lagged behind other application areas such as biostatistics and econometrics. This book will help to close that gap. There will be a steep learning curve for many industrial engineers and statisticians learning Bayesian methods, but there will be a good payoff for those who make the effort.

Bill Bolstad

Department of Statistics, University of Waikato

Private Bag 3105, Hamilton, New Zealand




Optimum Experimental Designs, with SAS
Anthony Atkinson, Alexander Donev, Randall Tobias
Oxford University Press, 2007, xvi + 511 pages, £ 35.00 / € 56, softcover (€ 114 hardcover)
ISBN: 978-0-19-929660-6

Table of contents

I Background
1. Introduction
2. Some key ideas
3. Experimental strategies
4. The choice of a model
5. Models and least squares
6. Criteria for a good experiment
7. Standard designs
8. The analysis of experiments
II Theory and applications
9. Optimum design theory
10. Criteria of optimality
11. D-optimum designs
12. Algorithms for the construction of exact D-optimum designs
13. Optimum experimental design with SAS
14. Experiments with both qualitative and quantitative factors
15. Blocking response surface designs
16. Mixture experiments
17. Nonlinear models
18. Bayesian optimum designs
19. Design augmentation
20. Model checking and designs for discriminating between models
21. Compound design criteria
22. Generalized linear models
23. Response transformation and structured variances
24. Time-dependent models with correlated observations
25. Further topics
26. Exercises

Readership: Together with its many examples, the book will be a very valuable aid for all researchers and students who have to deal with, or are interested in, the efficient design of statistical experiments.

The book is a major revision, and considerable extension, of the successful 1992 monograph Optimum Experimental Designs by the first two authors. Whereas the old version covered 328 pages, the new book comprises 511 pages. The section on References has grown from 9 to 24 pages, indicating that much of the growth is due to bringing the material up to date.

The contents are presented in two parts, offering a very apt blend of theoretical development and practical examples. The eight chapters of Part I provide the background material, such as model choice, least squares method, and design standards, all aspects being illustrated with real-world examples.

Emphasis is on Part II, where the 13 chapters range from abstract optimum design criteria to concrete algorithms for design construction. Much of the algorithmic discussion centers around the SAS software, as reflected by the extended title of the new book. Presumably the extension is owed to the third author, who is affiliated with the SAS Institute. Readers who are not SAS fans will easily translate the methods to R or some other high-level statistical computing language.

The re-arrangement and extension of the material that grew out of the practically and computationally oriented approach includes model discrimination, compound criteria, generalized linear models, time-dependent models, and more. It is the view towards applications and design calculations which makes this book unique.

Friedrich Pukelsheim

Institut für Mathematik, Universität Augsburg

DE-86135 Augsburg, Germany




An Introduction to Categorical Data Analysis, 2nd Edition
Alan Agresti
Wiley, 2007, xvii + 372 pages, £ 52.95 / € 76.00, hardcover
ISBN: 978-0-471-22618-5

Table of contents

1. Introduction
2. Contingency tables
3. Generalized linear models
4. Logistic regression
5. Building and applying logit and loglinear models
6. Multicategory logit models
7. Loglinear models for contingency tables
8. Models for matched pairs
9. Modeling correlated, clustered responses
10. Random effects: Generalized linear mixed models
11. A historical tour of categorical data analysis

Readership: Students of statistics and biostatistics, applied statisticians, research workers in the biomedical and social sciences, practicing scientists involved in data analyses.

The previous edition (1996) [Short Book Reviews, Vol. 16, No 3, 1996] of this very popular text has been used with a large number of students and as a result this revised edition contains even more exercises (nearly 300), most of them requiring computer solutions. Writing in an applied, nontechnical style, the author illustrates methods using a wide variety of real data: almost 200 analyses of data sets are presented.

The material in an appendix that demonstrates the use of SAS for nearly all methods presented in the book is updated to reflect technical advances since the publication of the first edition. The companion webpage contains information about the use of other software, such as S-Plus and R, Stata, and SPSS. A set of solutions to some problems is provided.

The main change from the first edition is the addition of two new chapters discussing methods for clustered correlated categorical data along with improvements in major software. Chapter 9 deals with marginal models, including the generalized estimating equations (GEE) approach. Chapter 10 deals with random effects models through generalized linear models. Earlier chapters and appendices are updated.

This edition places more emphasis on logistic regression modeling and less emphasis on loglinear models. The book provides a unified perspective, based on generalized linear models, that connects methods for analysing categorical data with ordinary regression and ANOVA models. Introductory material on generalized linear models now includes information on negative binomial regression. This is an excellent practical book to be used in conjunction with relevant courses or for reference. Readers who are also interested in the mathematical concepts and derivations should consult Agresti’s book Categorical Data Analysis (2nd edition, Wiley 2002).

Erkki P. Liski

Department of Mathematics, Statistics and Philosophy

FI-33014 University of Tampere, Finland




Introduction to Variance Estimation, 2nd Edition
Kirk M. Wolter
Springer, 2007, xiv + 447 pages, € 69.50 / US$ 89.95, hardcover
ISBN: 978-0-387-32917-8

Table of contents

1. Introduction
2. The method of random groups
3. Variance estimation based on balanced half-samples
4. The jackknife method
5. The bootstrap method
6. Taylor series methods
7. Generalized variance functions
8. Variance estimation for systematic sampling
9. Summary of methods for complex surveys

Readership: Survey statisticians, graduate students.

The book is a second, revised edition of the one published more than 20 years ago, in 1985. The majority of the material has remained unchanged; a new chapter on the bootstrap is added and the material on Taylor series variance estimation is extended. The book systematically covers a number of important variance estimation methods meant for large complex sample surveys. The resampling methods (random groups, balanced half-samples, jackknife and bootstrap), Taylor series and generalized variance functions methods are described. The design-based approach for the variability mechanism is assumed. Variance estimation under systematic sampling, both for equal and unequal probability cases, is elaborated more thoroughly in a separate chapter.

For each method both theory and applications are given. Theoretical results are formulated in theorems and proved where necessary. Numerous examples emphasize basic principles in variance estimation as well as their applications in real surveys. The presented methods differ from the standard ones, given elsewhere, by their applicability in samples with nonresponse, frame imperfections and other deficiencies, e.g. increased complexity of the sampling mechanism due to cost and operational issues. The book, gathering widely scattered variance estimation methods into one source, is a good reference book for survey methodologists and researchers.

Imbi Traat

Institute of Mathematical Statistics

University of Tartu, J. Liivi 2, 50409 Tartu, Estonia




Mathematical Models of Social Evolution: A Guide for the Perplexed
Richard McElreath, Robert Boyd
The University of Chicago Press, 2007, xiii + 407 pages, US$ 25.00, softcover (US$ 62.50 hardcover)
ISBN: 978-0-226-55827-1

Table of contents

1. Theoretician’s laboratory
2. Animal conflict
3. Altruism & inclusive fitness
4. Reciprocity
5. Animal communication
6. Selection among groups
7. Sex allocation
8. Sexual selection
Appendixes
A. Facts about derivatives
B. Facts about random variables
C. Calculating binomial expectations
D. Numerical solution of the Kokko et al. model
E. Solutions to problems

Readership: Graduate students, researchers, or anyone interested in the mathematical foundations of evolutionary biology.

The subtitle of the book characterizes its contents succinctly. A solid mathematical foundation has been developed for many current topics in evolutionary and behavioural ecology, including – but not limited to – the evolution of altruism, aggression, sex allocation, and sexual selection. However, many textbooks that address these topics assume that students are not interested in (or capable of?) following the mathematical derivations of the key results. This book is an excellent exception. Actually, one could state that the first half of its title is perhaps misleading, in that the book is broader than it suggests: the book covers a wider range of topics than what many might assume when hearing the term “social evolution”. We find not only an excellent derivation of Hamilton’s Rule but also a lucid account of how it relates to the Price equation, and a very clear explanation of why individual-level selection is usually stronger than selection at the group level. The book also derives, from first principles, several “classic” models of phenomena that have been intensively researched, such as mate choice or dispersal.

Each chapter ends with problems that give students further insight, with answers at the end of the book. The bibliography is not complete, in the sense of attempting to form an exhaustive list of all theoretical work up to date, but instead it points readers selectively towards the papers from which most insight can be gained. I find this filtering work useful. All in all, this is an excellent book that I have already recommended my own students should read, and probably reread.

Hanna Kokko

Laboratory of Ecological and Evolutionary Dynamics

Dept. of Biological and Environmental Sciences, PO Box 65

FI-00014 University of Helsinki, Finland




Statistical Development of Quality in Medicine
Per Winkel, Nien Fan Zhang
Wiley, 2007, xiii + 263 pages, £ 45.00 / € 67.50, hardcover
ISBN: 978-0-470-02777-6

Table of contents

Introduction – on quality of health care in general
Part I Control Charts
1. Theory of statistical process control
2. Shewhart control charts
3. Time-weighted control charts
Part II Risk Adjustment
5. Tools for risk adjustment
6. Risk-adjusted control charts
7. Risk-adjusted comparison of health care providers
Part III Learning and Quality Assessment
8. Learning curves
9. Assessing the quality of clinical processes

Readership: People directly or indirectly involved in the quality assurance of clinical work, including physicians, nurses, administrators, and students of areas such as epidemiology and bioengineering.

This book is about the quality of measurement procedures, tests, provisions, systems, and practices in clinical medicine. In the introduction, the authors note that the purpose of measuring health care system performance is to improve it, and that such improvement may be achieved by the selection of health care providers or by improvements in the care provided by a given provider. In the first case, the health care measures serve to make the providers accountable and to enable informed choice, and in the second case, the measures serve primarily internally for the given provider. The authors comment that “data for accountability are usually summary statistics so far removed in time and so coarsely granulated that they contain little or no information useful for caregivers interested in improvement” – so that different measures are needed for the two approaches.

After the introductory chapter, the book is divided into three parts: on control charts, risk adjustment, and learning and quality assessment. The last part discusses so-called “learning curves” – the improvement of various measures of a new procedure as people become more familiar with it – and issues such as benchmarking and dealing with processes that are not in control. There are brief cautionary discussions of analysing observational data and of the consequences of publishing performance data. There is an appendix giving basic statistical concepts.

There is some discussion of “gaming” – avoidance of high risk cases, inflation of pre-operative co-morbidities, and so on – but I would have liked to see more discussion of Goodhart’s law – the tendency for particular measures of complex systems to become less useful as measures of quality as time progresses, simply because increasing attention is paid to them, and less to other, equally important aspects.

David J. Hand


London SW7 2AZ, UK




Matrix Methods in Data Mining and Pattern Recognition
Lars Eldén
SIAM, 2007, x + 224 pages, US$ 69, softcover
ISBN: 978-0-898716-26-9

Table of contents

Part I: Linear Algebra Concepts and Matrix Decompositions
1. Vectors and matrices in data mining and pattern recognition
2. Vectors and matrices
3. Linear systems and least squares
4. Orthogonality
5. QR decomposition
6. Singular value decomposition
7. Reduced rank least squares models
8. Tensor decomposition
9. Clustering and non-negative matrix factorization
Part II: Data Mining Applications
10. Classification of handwritten digits
11. Text mining
12. Page ranking for a web search engine
13. Automatic key word and key sentence extraction
14. Face recognition using tensor SVD
Part III: Computing the Matrix Decompositions
15. Computing eigenvalues and singular values

Readership: Undergraduates who have taken introductory courses in scientific computing/numerical analysis, and graduate students in data mining and pattern recognition who need an introduction to linear algebra.

The book has two main parts dealing with, respectively, linear algebra and data mining applications, and a small third one describing the principles underlying eigenvalue and singular value extraction, as applied to dense and large sparse matrices. It is not a cookbook, but is intended to describe a set of tools which might be applicable, or which might be modified to be applicable, to various data mining and pattern recognition problems.

The procedural orientation of the first part of the book makes it highly readable – to the extent that I would recommend it to any student having trouble with the ideas, who might find that this book gives a very helpful and revealing alternative perspective. The chapters begin at a very elementary level and build up to topics which include QR decomposition, SVD, reduced-rank least squares models, and tensor decomposition.

The five data mining applications illustrated in the second part are primarily from text processing and image processing domains. I think this is a pity – I would have liked to see an example from a commercial application area such as customer value management to complement these illustrations.

Overall, I think this book would make attractive supplementary reading for a course on linear algebra, especially in the context of statistics, data mining, pattern recognition, or machine learning.

David J. Hand


London SW7 2AZ, UK




Festschrift for Tarmo Pukkila on his 60th Birthday
Erkki P. Liski, Jarkko Isotalo, Jarmo Niemelä, Simo Puntanen, George P.H. Styan (Editors)
Dept. of Mathematics, Statistics and Philosophy, Ser. A, 368, Univ. of Tampere, Finland, 2006, 383 pages, € 40, hardcover
ISBN: 978-951-44-6620-5

Table of contents

1. S. Puntanen, G.P.H. Styan: A conversation with Tarmo Mikko Pukkila
2. S. Puntanen, G.P.H. Styan: Some comments on the research publications of Tarmo Mikko Pukkila
3. K. Brännäs, J. Hellström: Very small samples and additional non-sample information in forecasting
4. D.R. Brillinger, B.S. Stewart, C.L. Littnan: A meandering hylje
5. R.W. Farebrother: On the trail of Trotter’s 1957 translation of Gauss’s work on the theory of least squares
6. C.W.J. Granger, Y. Jeon: Building econometric models with large data sets
7. T. Kollo, G. Pettere: Copula models for estimating outstanding claim provisions
8. S. Koreisha, Y. Fang: Dealing with serial correlation in regression
9. L.A. Koskinen: Statistical applications in Finnish pension insurance
10. E.P. Liski: Normalized ML and the MDL principle for variable selection in linear regression
11. T. Mathew, K. Nordström: Exhibiting latent linear associations in large data sets
12. J.K. Merikoski, A. Virtanen: Bounds for the Perron root using the sum of entries of matrix powers
13. S. Mustonen: Logarithmic mean for several arguments
14. L. Nordberg: On the reliability of performance rankings
15. K. Nordhausen, H. Oja, D.E. Tyler: On the efficiency of invariant multivariate sign and rank tests
16. M. Balinski, F. Pukelsheim: Matrices and politics
17. J. Isotalo, S. Puntanen, G.P.H. Styan: On the role of the constant term in linear regression
18. C.R. Rao: Familial correlations
19. J. Rantala: On joint and separate history of probability, statistics and actuarial science
20. I. Tabus, J. Rissanen: Normalized Maximum Likelihood models for logit regression
21. A.J. Scott, C. Wild: Calculating efficient semi-parametric estimators for a broad class of missing-data problems
22. K.R. Shah, Bikas K. Sinha: Universal optimality for the joint estimation of parameters
23. P. Yimprayoon, M. Tiensuwan, Bimal K. Sinha: Some statistical aspects of assessing agreement: theory and applications
24. S.-O. Troschke, G. Trenkler: Mean square error optimal linear plus quadratic combination of forecasts
25. S. Puntanen, G.P.H. Styan: A photo album for Tarmo Mikko Pukkila

Readership: Students and researchers interested in statistics, mathematics and applications; general.

The Festschrift is published to celebrate Dr. Tarmo Pukkila’s 60th birthday, which was on 26 March 2006, and to honour his mathematical, statistical and administrative work devoted to the academic and public communities. It was presented to him in a special session of the 15th International Workshop on Matrices and Statistics, held in Uppsala, 13–17 June 2006.

The Festschrift comprises an interview with Dr. Pukkila, a detailed annotated list of his research publications, 23 invited and refereed papers, and a colourful photo album. The papers are all written by prominent researchers and his friends, colleagues and collaborators. The interview reveals his life and work with interesting stories covering “early years”, “first international contacts”, “towards professorship”, “towards the rectorship in 1987” and “to the ministry of social affairs and health in 1993”. As seen from the contents, the papers reflect his contributions in the various areas, both academically and administratively. In particular, the paper “Some Comments on the Research Publications of Tarmo Mikko Pukkila” lists his 27 co-authors with an authorship matrix and a figure of linkage between his co-authors and the four groups, and gives a statistical analysis of his research interests and publications. The papers address diverse issues involving or applicable to problems in mathematics, statistics, economics, finance, insurance, actuarial science, politics, history, ecology and environmental studies, among many others. Most papers provide ideas, methods and results with tables, graphs, examples or empirical illustrations, all towards practical applications.

The Festschrift is indeed a great gift to Dr. Tarmo Pukkila and an appealing collection of the latest research work for us (students and researchers). It is available through the Bookshop TAJU and through Granum, a virtual bookstore of Finnish scientific books and magazines; see http://mtl.uta.fi/pukkila60/.

Shuangzhe Liu

University of Canberra, Canberra ACT 2601, Australia


Optimization Methods in Finance
Gérard Cornuéjols, Reha Tütüncü
Cambridge University Press, 2007, xii + 345 pages, US$ 70.00, hardcover
ISBN: 978-0-521-86170-0

Table of contents

1. Introduction
2. Linear programming: Theory and algorithms
3. LP models: Asset/liability cash-flow matching
4. LP models: Asset pricing and arbitrage
5. Nonlinear programming: Theory and algorithms
6. NLP models: Volatility estimation
7. Quadratic programming: Theory and algorithms
8. QP models: Portfolio optimization
9. Conic optimization tools
10. Conic optimization models in finance
11. Integer programming: Theory and algorithms
12. Integer programming models: Constructing an index fund
13. Dynamic programming methods
14. DP models: Option pricing
15. DP models: Structuring asset-backed securities
16. Stochastic programming: Theory and algorithms
17. Stochastic programming models: Value-at-Risk and Conditional Value-at-Risk
18. Stochastic programming models: Asset/liability management
19. Robust optimization: Theory and tools
20. Robust optimization models in finance
Appendix A: Convexity
Appendix B: Cones
Appendix C: A probability primer
Appendix D: The revised simplex method

Readership: Master students in Financial Engineering, Finance, Computational Finance; Introductory or upper level undergraduate students in Operations Research, Management Science, Applied Mathematics; Practitioners working in Mathematical or Computational Finance.



This book presents a comprehensive overview of how optimization models, methods and software can be used to efficiently and accurately solve optimization problems related to finance. Thus the reader is introduced to all major classes of optimization problems. However, no attempt is made to cover the most advanced techniques. For instance, dynamic programming is restricted to binomial trees, whereas stochastic programming is limited to finite sample spaces. Nevertheless, the wide range of intuitive applications, from asset allocation to risk management, compensates for this.

The authors alternate chapters discussing the theory and efficient solution methods with chapters illustrating their use in modelling problems. The material is supported by numerous worked examples, exercises, and case studies. This provides a significant amount of guidance and practical experience, which contributes to the understanding of how the specific optimization algorithms work.

Max Fehr

ETH Zurich, Institut fur Operations Research

HG G 22.1, Ramistrasse 101, 8092 Zurich, Switzerland


Applied Statistics Using SPSS, STATISTICA, MATLAB and R, 2nd Edition
Joaquim P. Marques de Sá
Springer, 2007, xxiv + 506 pages, € 59.95, hardcover
ISBN: 978-3-540-71971-7

Table of contents

1. Introduction
2. Presenting and summarizing the data
3. Estimating data parameters
4. Parametric tests of hypotheses
5. Non-parametric tests of hypotheses
6. Statistical classification
7. Data regression
8. Data structure analysis
9. Survival analysis
10. Directional data
Appendix A – Short survey on probability theory
Appendix B – Distributions
Appendix C – Point estimation
Appendix D – Tables
Appendix E – Datasets
Appendix F – Tools

Readership: Students, professionals and research workers with undergraduate knowledge level of mathematics, who are interested in statistical methods and statistical program packages.

The book is a fairly large treatment of many basic statistical methods and procedures. It presents both theoretical issues and a wide variety of applications, examples and exercises. These cover such areas as engineering, medicine, biology, psychology, economics, geology, and astronomy. In addition, there are five computer programs (SPSS, Statistica, Matlab, R, and Excel) that are tightly integrated in the text, from the very first chapter to the very last appendix. Excel has a somewhat minor role compared to the other packages, and it is not mentioned in the title of the book. The second edition has added R to the selection of programs.



I like the approach of using multiple programs in one book, but perhaps I would have given more weight to the programs, decreasing the number of mechanical calculations and formulas. On some occasions I felt that the programs were used more as calculators than as efficient tools of statistical analysis.

The introduction wisely reminds us that we should not apply the methods mechanically and that we should be critical towards the results given by the programs. However, this wisdom is not so clearly seen in the chapters concerning, for example, testing, stepwise methods, or outliers. Since the book intends to be a reference book, I think the words of wisdom given in the introduction should have been better integrated in the text. I am afraid that only a few people will read the introduction if they use this as a reference book.

Two chapters deserve special attention: Chapter 6 on Statistical Classification and Chapter 10 on Directional Data. I would like to see much more emphasis on these chapters, as they are not among the most typical treatments in other books, and because they are quite challenging. Correspondingly, I would decrease the number of traditional statistical tests, which now occupy about one quarter of the book. Some tests could well be replaced by randomized tests, putting more weight on the computer programs.

There are 32 data sets described and included on the CD. Most of them consist of some measurements from various fields. Surprisingly, I did not find any discussion on the measurements as a source of uncertainty in the statistical analysis and inference. The measurement errors are only mentioned briefly in Appendix B when describing the origin of the normal distribution. In this sense the book seems to follow the traditional statistical point of view, where the uncertainty is seen to be caused mainly by sampling, and the errors of measurement are more or less neglected.

All in all, I would readily recommend this book to those who are interested in the program packages mentioned in the title.

Kimmo Vehkalahti

Department of Mathematics and Statistics





Contributions to Probability and Statistics: Applications and Challenges
(Proceedings of the International Statistics Workshop held at University of Canberra, Australia, 4–5 April 2005)
Peter Brown, Shuangzhe Liu, Dharmendra Sharma (Editors)
World Scientific, 2006, x + 311 pages, US$ 106.00, hardcover
ISBN: 978-981-270-391-0

Table of contents

Part A: Mathematics and Statistics in Society
1. A. Daly & R. Lloyd: Estimating internet access for welfare recipients in Australia
2. A. Hayashi: Two classification methods of individuals for educational data and an application
3. R. Kelly & P.E.T. Lewis: Measurement of skill and skill change
4. S. Puntanen & G.P.H. Styan: Some comments about Issai Schur (1875–1941) and the early history of Schur complements

Part B: Applications of Statistics
5. J. Gani: Estimating the number of SARS cases in Mainland China in 2002–3
6. K.J. King & J. Chapman: Using statistics to determine the effectiveness of prescribed burning
7. G. Pollard: A fair tennis scoring system for doubles in the presence of sun and wind effects – An application of probability

Part C: Theoretical Issues in Probability and Statistics
8. J.J. Hunter: Perturbed Markov chains
9. J. Isotalo, S. Puntanen & G.P.H. Styan: Matrix tricks for linear statistical models: A short review of our personal top fourteen
10. S. Liu: On influence diagnostics in multivariate regression models under elliptical distributions
11. C.-Y. Lu, Y. Gao & B. Zhang: A necessary condition for admissibility of nonnegative quadratic estimators of variance components when the moments are not necessarily as under normality
12. S. Nargis & A. Richardson: Small-sample performance of robust methods in logistic regression
13. H. Neudecker: On the asymptotic distribution of the ‘natural’ estimator of Cronbach’s alpha with standardized variates under nonnormality, ellipticity and normality
14. H. Neudecker & G. Trenkler: On the approximate variance of a non-linear function of random variables
15. G. Trenkler: On oblique and orthogonal projectors
16. B.X. Zhang, X.Z. Xu & X. Li: Fiducial inference of means and variances from normal populations under order restrictions

Part D: Probabilistic Models in Economics and Finance
17. R. Gay: When large claims are extremes
18. R. Ghori, S.E. Ahmed & A.A. Hussein: Shrinkage estimation of Gini Index
19. C.C. Heyde & K. Au: On the problem of discriminating between the tails of distributions

Part E: Numerical Methods
20. M. Hegland: An approximate maximum a posteriori method with Gaussian process priors
21. F. Shadabi, D. Sharma, R. Cox & N. Petrovsky: Data extraction for improved prediction outcomes in organ transplantation – a hybrid approach
22. G.J. Williams: Mining multiple models

Readership: Researchers and graduate students of statistics, or anyone interested in having a taste of the colourful world of statistics.

This book includes the proceedings of the International Statistics Workshop held at the University of Canberra, Canberra, Australia on 4–5 April 2005. It appears to be more than just a typical collection of conference papers, however. In keeping with its challenging title, the book covers a wide variety of topics, both theoretical and applied. In total, it includes 22 papers and five abstracts contributed by authors from nine different countries. Together the local and the international contributions constitute a definitely interesting mixture, presented nicely in six parts.



The papers in Parts A and B are of general interest, giving an overall picture of the role of statistics in society. These papers would be ideal reading for anybody wondering what statistics is for and what sort of things it might cover. One example is a huge biographical article on Issai Schur and the Schur complement (39 pages, including 174 references as well as photographs and footnotes). Examples of statistical research in contemporary real-world problems are highlighted by papers on Internet usage in Australia and SARS cases in China.

Part C is more theoretically oriented, although its title is also quite general. Nearly half of the papers belong to this part. They address a large variety of technical issues on topics such as Markov chains, linear models, regression diagnostics, and projectors. Parts D and E include three papers each. Part D presents some applications of probabilistic models, and Part E concerns numerical methods. The whole is supplemented by five abstracts that are listed briefly in Part F.

I was happy to learn that conference proceedings may be useful and interesting as a book, as long as the book is well edited, has a clear structure and a wide variety of contents.

Kimmo Vehkalahti

Department of Mathematics and Statistics



Applied Asymptotics: Case Studies in Small-Sample Statistics
A.R. Brazzale, A.C. Davison, N. Reid
Cambridge University Press, 2007, viii + 236 pages, £ 35.00 / US$ 65, hardcover
ISBN: 978-0-521-84703-2

Table of contents

1. Introduction
2. Uncertainty and approximation
3. Simple illustrations
4. Discrete data
5. Regression with continuous responses
6. Some case studies
7. Further topics
8. Likelihood approximations
9. Numerical implementation
10. Problems and further results
Appendices – Some numerical techniques

Readership: This is likely to be quite wide – academic statisticians and postgraduate students, of course, but also users of statistics in academia, research institutes, industry and commerce (in particular, the modern finance industry).

The theme of the book is to show how standard first-order asymptotics, which work well for “large” samples, can be refined to give more accurate results for “small” samples. The aim is to obtain improved approximations to the distributions of pivotal quantities and test statistics. A rival method is straightforward simulation, or “parametric bootstrapping” as it has come to be known nowadays. However, in some cases that can be computationally costly and even infeasible for small samples.
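To make the simulation route concrete, here is a minimal Python sketch of a parametric bootstrap interval for the rate of an exponential sample. It is not taken from the book (whose own code is in R); the exponential model, the function names, the toy data and the number of replicates are all assumptions chosen for illustration. The sketch exploits the fact that, under this model, the ratio of the estimated rate to the true rate is a pivot.

```python
import random


def mle_rate(sample):
    """Maximum likelihood estimate of the rate of an exponential sample."""
    return len(sample) / sum(sample)


def parametric_bootstrap_interval(sample, n_boot=2000, level=0.95, seed=0):
    """Interval for the exponential rate from a simulated pivot distribution.

    Samples are simulated from the fitted model and the ratio
    (bootstrap estimate / original estimate) stands in for the pivot
    (estimate / true rate).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    rate_hat = mle_rate(sample)
    ratios = []
    for _ in range(n_boot):
        resample = [rng.expovariate(rate_hat) for _ in range(n)]
        ratios.append(mle_rate(resample) / rate_hat)
    ratios.sort()
    alpha = 1.0 - level
    lower_q = ratios[int((alpha / 2) * (n_boot - 1))]
    upper_q = ratios[int((1 - alpha / 2) * (n_boot - 1))]
    # Invert the pivot: rate lies between rate_hat / upper_q and rate_hat / lower_q.
    return rate_hat / upper_q, rate_hat / lower_q


# Toy sample of size five (illustrative values only).
observed = [0.8, 1.9, 0.3, 2.4, 1.1]
lower, upper = parametric_bootstrap_interval(observed)
```

Each interval costs `n_boot` simulated samples, which hints at why simulation can become expensive when it has to be repeated across many models or parameter values.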

The authors acknowledge the usual reservation that higher-order asymptotics place strong reliance on the correctness of the model: since “all models are wrong” and model checking is problematic with small samples, the whole exercise might be regarded as pointless. However, accepting that the models are pragmatically correct, the authors here show some impressive results. The improved accuracy of higher-order asymptotics in the examples shown here is quite eye-catching, even when applied to samples of size one. One is left with the uncomfortable feeling of laziness in routinely relying on the usual (first-order) theory, though the authors are too polite to put this into words.

The methods are very well exemplified, sensibly confining the treatment to simple, parametric, well-known models. Data, R-code and a website are given so that readers can try things out for themselves. There are illustrations in Chapters 3 and 7, focused case studies in Chapters 4 and 5, and three more detailed case studies in Chapter 6; the latter have the substance of brief journal articles. The theory is outlined in Chapter 2 and more detail is given in Chapter 8. Chapter 9 gives a detailed route map for implementing the techniques. Chapter 10 gives exercises, both theoretical and practical.

M.J. Crowder

Room 530, Mathematics Department, Imperial College London

Huxley Building, 180 Queen’s Gate, London SW7 2BZ, UK


The R Book
Michael J. Crawley
Wiley, 2007, viii + 942 pages, £ 55.00 / € 82.50, hardcover
ISBN: 978-0-470-51024-7

Table of contents

1. Getting started
2. Essentials of the R language
3. Data input
4. Data frames
5. Graphics
6. Tables
7. Mathematics
8. Classical tests
9. Statistical modelling
10. Regression
11. Analysis of variance
12. Analysis of covariance
13. Generalized linear models
14. Count data
15. Count data in tables
16. Proportional data
17. Binary response variables
18. Generalized additive models
19. Mixed-effects models
20. Non-linear regression
21. Tree models
22. Time series analysis
23. Multivariate statistics
24. Spatial statistics
25. Survival analysis
26. Simulation models
27. Changing the look of graphics

Readership: Undergraduates, postgraduates and professionals in science, engineering, statistics, economics, geography, social science and medicine.

This book is an introduction to the R environment for beginners and can be used as a reference text by more experienced users of R. The reader will need to have R installed on their machine to be able to try out the examples used in the text. This is a simple exercise and can be done via the website http://cran.r-project.org/ by following the instructions on screen. In Chapter 1 the author explains how to do this, how to run R and how to get help.

The early chapters assume no knowledge of statistics or computing; however, it is assumed that the reader has read these chapters in order to be able to follow the later chapters. The book has been written assuming that the reader has no background in mathematics or statistics and covers a wide range of statistical methods, from elementary classical tests through regression and analysis of variance to generalized linear modelling – an ambitious task for any author. The methods are introduced with a range of worked examples providing the reader with an inclusive guide to R. This is the most comprehensive text on R that I have seen among those currently available.

The text also contains a wealth of references for the reader to pursue on related issues. It is a book to be recommended to all who wish to use R and to have a comprehensive R reference manual.

Susan Starkings

London South Bank University

103 Borough Road, London SE1 0AA, UK


Analysing Ecological Data
Alain F. Zuur, Elena N. Ieno, Graham M. Smith
Springer, 2007, xxvi + 672 pages, £ 50.50 / € 64.95, hardcover
ISBN: 978-0-387-45967-7

Table of contents

1. Introduction
2. Data management and software
3. Advice for teachers
4. Exploration
5. Linear regression
6. Generalised linear modelling
7. Additive and generalised additive modelling
8. Introduction to mixed modelling
9. Univariate tree models
10. Measures of association
11. Ordination – First encounter
12. Principal component analysis and redundancy analysis
13. Correspondence analysis and canonical correspondence analysis
14. Introduction to discriminant analysis
15. Principal coordinate analysis and non-metric multidimensional scaling
16. Time series analysis – Introduction
17. Common trends and sudden changes
18. Analysis and modelling lattice data
19. Spatially continuous data analysis and modelling
20. Univariate methods to analyse abundance of decapod larvae
21. Analysing presence and absence data for flatfish distribution in the Tagus estuary, Portugal
22. Crop pollination by honeybees in an Argentinean pampas system using additive mixed modelling
23. Investigating the effects of rice farming on aquatic birds with mixed modelling
24. Classification trees and radar detection of birds for North Sea wind farms
25. Fish stock identification through neural network analysis of parasite fauna
26. Monitoring for change: using generalised least squares, nonmetric multidimensional scaling, and the Mantel test on western Montana grasslands
27. Univariate and multivariate analysis applied on a Dutch sandy beach community
28. Multivariate analyses of South-American zoobenthic species – spoilt for choice
29. Principal component analysis applied to harbour porpoise fatty acid data
30. Multivariate analysis of morphometric turtle data – size and shape
31. Redundancy analysis and additive modelling applied on savanna tree data
32. Canonical correspondence analysis of lowland pasture vegetation in the humid tropics of Mexico
33. Estimating common trends in Portuguese fisheries landings
34. Common trends in demersal communities on the Newfoundland-Labrador Shelf
35. Sea level change and salt marshes in the Wadden Sea: a time series analysis
36. Time series analysis of Hawaiian waterbirds
37. Spatial modelling of forest community features in the Volzhsko-Kamsky reserve

Readership: Undergraduates, postgraduates and scientists engaged in areas of the environmental sciences and ecological research.

The material presented in this book has been developed and used by the authors in teaching statistics to its intended readership. The text is divided into two parts – the first covers applied statistical theory and the second presents 18 case studies which illustrate the application of data exploration techniques and approaches to statistical analysis. The case studies illustrate the use of particular statistical techniques rather than completely answering a specific ecological question, and as such only provide a brief glimpse into real world problems and their solution. The case studies are divided into four groups – univariate techniques (six chapters), multivariate techniques (seven chapters), time series methods (four chapters), and spatial statistics (disappointingly, only one chapter). I have no doubt that for undergraduate students the main strength of the book will be the breadth of topics covered by the case studies – ranging from terrestrial ecology to marine biology. However, to me personally, and to the more advanced readership, the case studies will appear rather superficial; the authors could have expanded the presentation of these at the expense of limiting the number included in the text. It would have been helpful to have an author index, in addition to the subject index, as this would aid personal study and research. The book contains a number of minor typographical errors and the formatting of the references is not consistent throughout.

C.M. O’Brien

Cefas Lowestoft Laboratory

Pakefield Road, Lowestoft, Suffolk NR33 0HT, UK




Random Graph Dynamics
Rick Durrett
Cambridge University Press, 2006, ix + 212 pages, US$ 55, hardcover
ISBN: 978-0-521-86656-9

Table of contents

1. Overview
2. Erdos-Renyi random graphs
3. Fixed degree distributions
4. Power laws
5. Small worlds
6. Random walks
7. CHKNS model

Readership: Students and researchers in stochastics.

Networks with random structures are an area of wide-ranging interest, both theoretical and applied. The area is seeing important developments in electronic networking, graphical models, expert systems, biological and hereditary networks, and other fields. Among the mathematically most advanced topics in the area, with strong relations to physics, are percolation, spin glasses and phase transitions.

The book under review is not a comprehensive account of the mathematical stochastics of random networks. Rather, it provides a panorama of a number of the most fascinating questions concerning the dynamic behaviour of various types of graphs equipped with stochastic properties. Among the topics discussed are scale-free random graphs, epidemics and percolation, Potts models and the contact process, and the CHKNS model.

The ideas behind the questions and the proofs of the probabilistic results are set out in a sparkling and highly informative manner. Thus, if you are not familiar with the stochastic dynamics of random graphs and want a masterly and very readable introduction to the area, this book is likely to please you.

Ole E. Barndorff-Nielsen

Department of Mathematical Sciences

University of Aarhus, DK-8000 Århus C, Denmark


Statistics Using SAS® Enterprise Guide®
James B. Davis
SAS Press, 2007, vi + 772 pages, € 104.00 / US$ 99.95, softcover
ISBN: 978-1-59047-566-9

This book is a technical guide for learning how to perform basic statistical analysis using SAS. It is a good beginner’s guide for those with little or no statistical background. The text contains a wealth of practical examples, which are easy to follow and provide the reader with hands-on experience with a user-friendly statistical package. This book can easily be used to support courses which use SAS.

Exercises with working example SAS code are included, and the code can be downloaded from the companion website at http://support.sas.com/publishing/bbu/companion_site/57255.html.

Susan Starkings

London South Bank University

103 Borough Road, London SE1 0AA, UK


Data Preparation for Analytics Using SAS®
Gerhard Svolba
SAS Press, 2006, xxi + 408 pages, € 71.00 / US$ 67.95, softcover
ISBN: 978-1-59994-047-2

This practical book centres on its many macros and SAS code examples, accompanied by useful tips. A sample chapter, an author interview and the SAS macros and code examples used in this book can be downloaded from the companion website at http://support.sas.com/publishing/bbu/companion_site/60502.html.

This is not an introductory text but is aimed at researchers who are involved in using advanced analytical methods such as regression analysis, clustering methods, survival analysis, decision trees, neural networks or time series forecasting. The text covers all data preparation tasks, using data extraction and data transformation with SAS data steps and procedure steps.

Susan Starkings

London South Bank University, UK


Pharmaceutical Statistics Using SAS®: A Practical Guide
Alex Dmitrienko, Christy Chuang-Stein, Ralph D’Agostino
SAS Press, 2007, vi + 444 pages, € 73.00 / US$ 69.95, softcover
ISBN: 978-1-59047-886-8

This book starts with a review of statistical problems in drug development. The remaining 13 chapters deal with the statistical approaches used in drug discovery experiments, animal toxicology studies and clinical trials.

Examples from real studies, with relevant SAS code, are presented, and the complete SAS code and data sets used in this book are available on the book’s companion website http://support.sas.com/publishing/bbu/companion_site/60622.html. The book also contains a wealth of references, at the end of each chapter, for the reader to pursue on drug development. The authors clearly demonstrate extensive knowledge in the subject area and link statistical applications to relevant research problems in the pharmaceutical industry.

Susan Starkings




Learning SAS® by Example: A Programmer’s Guide
Ron Cody
SAS Press, 2007, xxviii + 626 pages, € 73.00 / US$ 69.95, softcover
ISBN: 978-1-59994-165-3

This book covers most SAS programming techniques, from the basics through to the advanced topics. The text is divided into four main sections, namely (a) Getting Started, (b) DATA Step Processing, (c) Presenting and Summarising Your Data and (d) Advanced Topics. Learning by example is a good approach, and this book has plenty of examples for you to work through. Every example is followed by a detailed explanation of how the program works. At the end of each chapter there are numerous problems for the reader to attempt, and solutions to the odd-numbered questions are supplied at the back of the book. Solutions to all problems are on the companion website http://support.sas.com/publishing/bbu/companion_site/60864.html.

Susan Starkings


Elementary Statistics Using JMP®
Sandra D. Schlotzhauer
SAS Publishing, 2007, viii + 458 pages, € 83.50 / US$ 79.95, softcover
ISBN: 978-1-59994-375-6

This book shows how to use JMP for basic statistical analyses and explains the statistical methods used, i.e. when to use them, what the assumptions are and how to interpret the results. Prior JMP experience, statistical knowledge or programming experience is not required to follow the text. The book starts with how to get data into JMP, continues through graphical and data summary methods, and then moves on to statistical inference. It is a useful text for researchers from both academia and industry who need to perform elementary statistical analyses on their data. The supporting companion website is http://support.sas.com/publishing/bbu/companion_site/61382.html.

Susan Starkings




Testing 1-2-3: Experimental Design with Applications in Marketing and Service Operations
Johannes Ledolter, Arthur J. Swersey
Stanford University Press, 2007, xii + 300 pages, US$ 65, hardcover
ISBN: 978-0-8047-5612-9

Table of contents

1. Introduction
2. A review of basic statistical concepts
3. Testing differences among several means: completely randomized and randomized complete block experiments
4. Two-level factorial experiments
5. Two-level fractional factorial designs
6. Plackett-Burman designs
7. Experiments with factors at three or more levels
8. Nonorthogonal designs and computer software for design construction and data analysis
Appendix: Case studies

Readership: Students in business administration or anyone interested in experimental design.

The book is a textbook on experimental design with special emphasis on marketing applications. It may be used as a textbook for students in business administration or as a general textbook on experimental design. It is also suitable for self-study because of its clear writing style. It is only assumed that the reader has taken an introductory course in statistics. However, the book aims to be self-contained and starts with a review of basic statistical concepts.

The basic ideas and concepts are well motivated and illustrated using lively examples. The text also discusses computer output from several packages, in particular, Minitab and JMP. Each chapter ends with a section “Nobody Asked Us, But . . .”, where the techniques are further elaborated. For those who have a stronger mathematical background and are interested in deeper theoretical understanding, additional explanations are provided in the appendices following the chapters.

In addition to the examples presented in the main text and the exercises, the book contains 13 case studies from real-life situations. These case studies help the students develop their fluency in the techniques they have learned and give them the opportunity to see how the experiments were actually carried out.

In my opinion, the authors have succeeded in demonstrating the power of experimental design in practical business problems. The presented techniques are relevant and the theory and applications are in good balance.

Arto Luoma






Case Study Research: Principles and Practices
John Gerring
Cambridge University Press, 2007, x + 265 pages, US$ 24.99, softcover (US$ 70.00 hardcover)
ISBN: 978-0-521-67656-4

Table of contents

1. The conundrum of the case study

Part I: Thinking about case studies
2. What is a case study? The problem of definition
3. What is a case study good for? Case study versus Large-N cross-case analysis

Part II: Doing case studies
4. Preliminaries
5. Techniques for choosing cases
6. Internal validity: an experimental template
7. Internal validity: process tracing
Epilogue: single-outcome studies

Readership: Social scientists and postgraduate students of statistics.

Often, it is far easier to undertake a quantitative analysis in statistics than to question the underlying premise of a particular technique and methodology. In this book the author provides a general understanding of the case study, as well as the tools and techniques necessary for its successful implementation. Issues pertaining to single-case and cross-case studies are discussed and the role of case studies in facilitating causal analysis is emphasised. Within the text, chapters question some of the usual assumptions applied to case study research and the author presents a few wisely chosen examples as illustrations. There are numerous references and an extensive bibliography – with many of the references taken from political and social sciences. The statistical references are few in number but include important papers by Phil Dawid on causal inference, Bradley Efron on maximum likelihood and decision analysis, and papers by Rosenbaum and his co-authors on statistical estimates of causal effects based on matching techniques.

The book is not easy to read and it will either capture your imagination or it will leave you wondering why such a text has been written. Personally, I think that the time spent reading the text was well worth the effort and has given me much to think about.

C.M. O’Brien

Centre for Environment, Fisheries & Aquaculture Science

Pakefield Road, Lowestoft, Suffolk NR33 0HT, UK




Bayesian Inference for Gene Expression and Proteomics
Kim-Anh Do, Peter Muller, Marina Vannucci (Editors)
Cambridge University Press, 2006, xviii + 437 pages, US$ 75, hardcover
ISBN: 978-0-521-86092-5

Table of contents

1. An introduction to high-throughput bioinformatics data
2. Hierarchical mixture models for expression profiles
3. Bayesian hierarchical models for inference in microarray data
4. Bayesian process-based modeling of two-channel microarray experiments: estimating absolute mRNA concentrations
5. Identification of biomarkers in classification and clustering of high-throughput data
6. Modeling nonlinear gene interactions using Bayesian MARS
7. Models for probability of under- and overexpression: the POE scale
8. Sparse statistical modelling in gene expression genomics
9. Bayesian analysis of cell cycle gene expression
10. Model-based clustering for expression data via a Dirichlet process mixture model
11. Interval mapping for Expression Quantitative Trait Loci mapping
12. Bayesian mixture model for gene expression and protein profiles
13. Shrinkage estimation for SAGE data using a mixture Dirichlet prior
14. Analysis of mass spectrometry data using Bayesian wavelet-based functional mixed models
15. Nonparametric models for proteomic peak identification and quantification
16. Bayesian modeling and inference for sequence motif discovery
17. Identifying of DNA regulatory motifs and regulators by integrating gene expression and sequence data
18. A misclassification model for inferring transcriptional regulatory networks
19. Estimating cellular signaling from transcription data
20. Computational methods for learning Bayesian networks from high-throughput biological data
21. Bayesian networks and informative priors: transcriptional regulatory network models
22. Sample size choice for microarray experiments

Readership: This book is suitable for two broad audiences. For researchers with a backgroundin Bayesian statistics, this book provides a detailed look at how Bayesian techniques arebeing applied in genomic and proteomic research. For researchers without a strong statisticalbackground, the book provides examples of the application of these techniques to real data sets,and in many cases includes links to freely available software packages that implement thesemethods.

The first chapter of the book provides an excellent overview of the different types of highthroughput data most commonly encountered by bioinformaticians, as well as providing linksto sources of publicly available bioinformatic data. Beyond the introduction, the self-containedchapters are written by separate groups of authors, with each describing the application ofBayesian methodology to a particular problem in genomic or proteomic data analysis. Roughlyhalf of the book (chapters 2 to 11) is devoted to the analysis of data from gene expressionmicroarray experiments, reflecting both the popularity, and the prevalence, gained by thistechnology over the past decade. The remainder of the book covers the analysis of data producedby other high-throughput methods (e.g., Serial Analysis of Gene Expression (SAGE), MassSpectrometry (MS), DNA sequencing), as well as devoting four chapters to Regulatory NetworkInference, a current “hot topic” within the field of bioinformatics.



The book provides a wonderfully broad collection of examples of how Bayesian methods have been used to provide elegant solutions to problems in this area, and includes detailed information about both model development and data analysis. As it focuses solely on the use of Bayesian analysis methods, this book is most suitable for those researchers who already have some familiarity either with Bayesian techniques, or with the structure and analysis of bioinformatic data.

Mik Black

Department of Biochemistry, 710 Cumberland Street

University of Otago, PO Box 56, Dunedin, New Zealand


Introduction to Clustering Large and High-Dimensional Data
Jacob Kogan
Cambridge University Press, 2007, 222 pages, US$ 75, hardcover (US$ 30 softcover)
ISBN: 978-0-521-85267-8

Table of contents

1. Introduction and motivation

2. Quadratic k-means algorithm

3. BIRCH

4. Spherical k-means algorithm

5. Linear algebra techniques

6. Information-theoretic clustering

7. Clustering with optimization techniques

8. k-means clustering with divergence

9. Assessment of clustering results

10. Appendix: Optimization and linear algebra background

Readership: Undergraduate and graduate students in applied mathematics, statistics, computer sciences, and engineering, or anyone interested in the mathematical background of clustering.

The aim of cluster analysis is to group the individuals in a data set into subsets or clusters. Clustering techniques can be used to discover natural subgroups or hidden structures of the data without any prior knowledge of the group membership. This book describes and discusses the k-means algorithms and some other important clustering algorithms in great detail. The k-means technique is perhaps the most popular clustering algorithm used in applications.

The book is first motivated by the document clustering problem. It then focuses on different k-means clustering techniques (quadratic k-means, spherical k-means, information theoretical k-means, k-means with divergences). The Principal Direction Divisive Partitioning (PDDP) and Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithms are also discussed. Both PDDP and BIRCH are designed to generate partitions of large data sets; here they also serve as tools to get initial partitions for k-means techniques. The last chapter is devoted to the assessment (internal and external criteria) of clustering methods.
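For readers unfamiliar with the technique, the basic batch iteration behind quadratic k-means can be sketched in a few lines. This is a minimal illustration under stated assumptions, not code from the book: the function name `kmeans`, the toy data, and the hand-picked initial centroids are all invented for the example, and squared Euclidean distance is assumed.

```python
def kmeans(points, centroids, max_iter=100):
    """Batch k-means with squared Euclidean distance (illustrative sketch)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        labels = []
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            j = dists.index(min(dists))
            labels.append(j)
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members)
                          for d in range(len(c)))
                )
            else:
                new_centroids.append(c)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # no centroid moved, so the partition is stable
        centroids = new_centroids
    return centroids, labels


# Toy data: two well-separated groups in the plane (made up for this sketch).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

The result depends on the initial partition, which is why methods such as PDDP and BIRCH are of interest as ways of producing a starting point.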

The book is clearly written and well organised, and it provides a nice mathematical introduction to k-means and related algorithms. The book is often concerned with mathematical results, and the reader should be mathematically oriented. To meet readers at different mathematical levels, the treatment is however kept as simple as possible. The algorithms are described with care. To motivate the reader and to deepen the theory, theoretical problems (with selected solutions at the end of the book), suggested projects as well as numerical experiments are provided. Each chapter ends with some bibliographical notes.

Hannu Oja

Tampere School of Public Health



Matrix Algebra: Theory, Computations, and Applications in Statistics
James E. Gentle
Springer, 2007, xxii + 528 pages, US$ 89.95 / € 69.95, hardcover
ISBN: 978-0-387-70872-0

Table of contents

1. Basic vector/matrix structure and notation

2. Vectors and vector spaces

3. Basic properties of matrices

4. Vector/matrix derivatives and integrals

5. Matrix transformations and factorizations

6. Solution of linear systems

7. Evaluation of eigenvalues and eigenvectors

8. Special matrices and operations useful in modeling and data analysis

9. Selected applications in statistics

10. Numerical methods

11. Numerical linear algebra

12. Software for numerical linear algebra

Readership: Students of a course in matrix algebra for statistics, or in statistical computing. Supplementary text for various courses in linear models or multivariate statistics.

Recently, quite a number of books on matrices related to statistics have been published, and here is one more. Matrices are such essential tools that there is surely room for good new matrix books. In the Preface, James Gentle mentions some of the recent matrix books, pointing out that they are all useful. As he says, the computational orientation of this book is probably the main difference between it and these other books.

Gentle has written an excellent Preface which describes well the leading principles of his writing style. For example, there are no proofs, definitions, theorems, and end-of-proofs in the usual style. Gentle has abandoned that style: he assumes that “the reader is engaged in the development”. That is a somewhat unusual style, but it makes the book more narrative and puts in “many words”; things develop smoothly, so to say.

I never thought that one could write a matrix book with statistical applications without having C.R. Rao in the references, yet here it is. Gentle has deliberately not paid too much attention to the origins of the results, but he admits that several of them might have appeared in C.R. Rao’s work.

One of the author’s favorite phrases is the following (appearing word for word in several places in the book): “This is an instance of a principle that we will encounter repeatedly: the form of a mathematical expression and the way the expression should be evaluated in actual practice may be quite different.”
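A standard illustration of this principle, chosen here for the purpose and not taken from the book, is the sample variance. The “textbook” form $(\sum x_i^2 - n\bar{x}^2)/(n-1)$ is algebraically identical to the two-pass form $\sum (x_i - \bar{x})^2/(n-1)$, yet in floating-point arithmetic the first can lose all its significant digits when the mean is large relative to the spread. The sketch below assumes IEEE double precision; the function names and the data are invented for the example.

```python
def variance_textbook(xs):
    """Sample variance via (sum of squares - n * mean^2) / (n - 1).

    Algebraically correct, but subtracts two huge, nearly equal numbers.
    """
    n = len(xs)
    mean = sum(xs) / n
    return (sum(x * x for x in xs) - n * mean * mean) / (n - 1)


def variance_two_pass(xs):
    """Sample variance via the sum of squared deviations from the mean."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Made-up data with a large offset; the deviations from the mean are
# -6, -3, 3, 6, so the exact sample variance is 90 / 3 = 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

textbook = variance_textbook(data)
two_pass = variance_two_pass(data)

print("textbook form:", textbook)
print("two-pass form:", two_pass)
```

On this input the two-pass form recovers the exact value 30.0, because every intermediate quantity is exactly representable, while the textbook form cannot: the squares are of order 10^18, where adjacent doubles are 128 apart, so the small difference of 90 is lost to rounding.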

I find this an extensive, personal, and easy-to-read matrix book of high quality. Recommended.

Simo Puntanen






Parameter Estimation for Scientists and Engineers
Adriaan van den Bos
Wiley, 2007, xiv + 273 pages, £ 50.50 / € 79.90, hardcover
ISBN: 978-0-470-14781-8

Table of contents

1. Introduction

2. Parametric models of observations

3. Distributions of observations

4. Precision and accuracy

5. Precise and accurate estimation

6. Numerical methods for parameter estimation

7. Solutions or partial solutions to problems

Appendix A: Statistical Results

Appendix B: Vectors and Matrices

Appendix C: Positive Semidefinite and Positive Definite Matrices

Appendix D: Vector and Matrix Differentiation

Readership: Applied scientists and engineers.

In his Preface the author says that, in his experience, scientists and engineers are often not aware of estimators other than least squares. He therefore proposes to show them that there is a better-informed approach based on a set of ‘coherent, generally applicable principles and notions’ in Statistics. The stated aim sounds eminently laudable. However, (i) the criticism of scientists and engineers is a little out of date and (ii) the statistical coverage in this book is rather restricted.

I must justify my comments in (i) and (ii). Firstly, the widespread availability of easy-to-use statistical software over the past few years has transformed the situation. Nowadays, users can apply sophisticated methodology to their data at the click of a mouse. A major role of the statistician has become that of a moderator, attempting to restrain the reckless use of statistical packages.

Secondly, the restricted coverage in this book gives the impression of an engineer or scientist having used some statistical methods in the course of his work and presenting them to his target audience as a rounded picture. My quarrel with this restricted coverage is that it gives undue prominence to some aspects and omits altogether other important areas of statistics. Undue prominence includes 30 pages on the Cramér-Rao lower bound in Chapter 4 and 40 pages on numerical optimisation in Chapter 6 (omitting to mention modern standard algorithms like DFP, BFGS and Nelder-Mead). Most users of applied statistics would be unconcerned with either of these topics. Omissions include, for example, confidence intervals, analysis of variance, simulation, reliability of systems (components in series and parallel), reliability and survival analysis, extreme values, categorical data, longitudinal data, spatial data, time series, etc.

More detailed criticism concerns a lack of organisation. For example, the ‘expectation model’ is introduced in Section 2.4.1 without explaining what an expectation is: the reader just has to make do with the phrase ‘theoretical mean values’ and then the E notation is used (for ‘mathematical expectation’ now) without further ado. (It is not until Chapter 3 that expectation is defined formally, and then abruptly for the multivariate case by a multiple integral or summation in equations 3.3 and 3.14.) Later in Section 2.4.1 ‘the variance’ appears to be defined as the square of the ‘standard deviation’ without telling us what the latter is. (In Chapter 4 equation 4.4 defines the standard deviation as the square root of an expression that one deduces to be a variance, since the notation ‘var’ is used.) There are many other instances of such statistical eccentricity, including a discussion of the ‘Fisher score vector’ in Chapter 3 while the likelihood function does not appear until Chapter 5.



In summary, I regret to say that I could not recommend this book to the stated target audience.

M.J. Crowder

Room 530, Mathematics Department, Imperial College London

Huxley Building, 180 Queen’s Gate, London SW7 2BZ, UK


Modelling for Field Biologists and Other Interesting People
Hanna Kokko
Cambridge University Press, 2007, xii + 230 pages, £ 27.99, softcover (£ 65.00 hardcover)
ISBN: 978-0-521-53856-5

Table of contents

1. Modelling philosophy

2. Population genetics

3. Quantitative genetics

4. Optimization methods

5. Dynamic optimization

6. Game theory

7. Self-consistent games and evolutionary invasion analysis

8. Individual-based simulations

9. Concluding remarks

Appendix: A quick guide to MATLAB

Readership: Field biologists and evolutionary ecologists.

This is a remarkable book. It is written in a disarmingly informal, light-hearted style, yet still manages to convey considerable insight into the challenging realm of mathematical modeling in biology. Anyone who has tried to teach biology students the value of mathematical modeling will recognize the difficulties: Most biology students neither enjoy the challenge of mathematical thinking nor find it easy. Through a cleverly crafted sequence of increasingly complex, provocative examples, the author does a masterful job of tackling both of these obstacles to learning.

When I first skimmed through the book, I was concerned that the author might have glossed over critically important topics in a subject that is often misused. I do remain somewhat apprehensive. There is much truth in her comments (in the introductory chapter on modeling philosophy) on the counterproductive divide between the more theoretical mathematical modelers and most biologists. Yet I found her cursory dismissal of the role of formal mathematical deduction to be unhelpful in this regard.

I experienced similar, but only minor, qualms about later chapters. Her material on optimization theory provides an appropriate demonstration of the role of calculus, and she does a fine job of weaving in an introduction to MATLAB throughout the book. Yet her passing remarks on matrix algebra in the MATLAB appendix give no hint of the value of this fundamental component of mathematics. A chapter introducing matrix calculations through models for age-structured population dynamics would have been a valuable addition.
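To indicate the kind of calculation meant here, an age-structured model of the Leslie-matrix type projects a vector of age-class counts forward by one matrix-vector product per time step. The sketch below is an illustration only and is not drawn from the book: the three age classes, the fecundities (first row), the survival rates (sub-diagonal), and the helper name `project` are all hypothetical.

```python
def project(matrix, n):
    """One time step: multiply the projection matrix by the age-class vector."""
    return [sum(a * x for a, x in zip(row, n)) for row in matrix]


# Hypothetical projection matrix for three age classes.
# Row 0 holds per-capita fecundities; the sub-diagonal holds survival rates.
leslie = [
    [0.0, 1.5, 2.0],   # offspring produced by classes 0, 1 and 2
    [0.5, 0.0, 0.0],   # half of class 0 survives into class 1
    [0.0, 0.75, 0.0],  # three quarters of class 1 survive into class 2
]

n = [100.0, 0.0, 0.0]  # start with 100 newborns (made-up initial state)
history = [n]
for _ in range(3):
    n = project(leslie, n)
    history.append(n)

for t, state in enumerate(history):
    print(t, state)
```

Repeating the product shows how the age distribution evolves; the long-run growth rate is governed by the dominant eigenvalue of the matrix, which is where matrix algebra earns its place in such models.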

Nonetheless, these misgivings are indeed relatively minor. The author’s engaging introduction to this challenging subject will serve the more mathematically intimidated students well. If the book motivates even a small fraction of the intended audience to pursue mathematical modeling in more depth, then it will be a very notable success.

Rick Routledge

Department of Statistics and Actuarial Science

Simon Fraser University, Burnaby, BC, Canada V5A 1S6




Linear Models and Generalizations: Least Squares and Alternatives, 3rd Edition
C. Radhakrishna Rao, Helge Toutenburg, Shalabh, Christian Heumann (with Contributions by Michael Schomaker)
Springer, 2008, xx + 570 pages, US$119.00 / € 89.95, hardcover
ISBN: 978-3-540-74226-5

Table of contents

1. Introduction

2. The simple linear regression model

3. The multiple linear regression model and its extensions

4. The generalized linear regression model

5. Exact and stochastic linear restrictions

6. Prediction in the generalized regression model

7. Sensitivity analysis

8. Analysis of incomplete data sets

9. Robust regression

10. Models for categorical response variables

A. Matrix algebra

B. Tables

C. Software for linear regression models

Readership: Researchers, graduate students, anyone interested in linear statistical models.

This book is the third version of the book originally written by C. Radhakrishna Rao and Helge Toutenburg in 1997. Now, after ten years, there are 200 pages more, reflecting the fact that something has been going on with linear models. As the authors state in the Preface, most chapters are updated with recent developments in the area of linear models and more topics are included.

The book contains a massive amount of useful results related to the world of linear models. So much that it is impossible to go through in a single graduate course, but surely that is not the authors’ intention. I find my life more comfortable when I have this book on my bookshelf while checking whether some results have appeared in the literature. This is a book which is a natural source book for a student and researcher of linear models.

As for the writing style, I personally prefer to use different symbols for vectors and scalars, but this is not the only book where the difference is not made. Moreover, I would have appreciated an author index. As a human being, I paid attention to a reference to my paper Puntanen (1986), which the authors (page 155) claim presents an overview of conditions for OLSE to be BLUE. That, unfortunately, is not true, since that “paper” is a six-line letter to the editor where only one condition is mentioned. However, I do have the impression that in general the text is written with great care and, of course, with great skill under the leadership of Professor C. Radhakrishna Rao.

This is a very useful book and the authors deserve congratulations.

Simo Puntanen




