Margins of Error: A Study of Reliability in Survey Measurement by Duane F. Alwin


International Statistical Review (2008), 76, 2, 300–328 doi:10.1111/j.1751-5823.2008.00054.x

Short Book Reviews
Editor: Simo Puntanen

Adaptive Design Theory and Implementation Using SAS and R
Mark Chang
Chapman & Hall/CRC, 2007, xxii + 418 pages, £49.99 / US$79.00, hardcover
ISBN: 978-1-58488-962-5

Table of contents

1. Introduction
2. Classic design
3. Theory of adaptive design
4. Method with direct combination of P-values
5. Method with inverse-Normal P-values
6. Implementation of K-stage adaptive designs
7. Conditional error function method
8. Recursive adaptive design
9. Sample-size re-estimation design
10. Multiple-endpoint adaptive design
11. Drop-loser and add-arm design
12. Biomarker-adaptive design
13. Adaptive treatment switching and crossover
14. Response-adaptive allocation design
15. Adaptive dose finding design
16. Bayesian adaptive design
17. Planning, execution, analysis, and reporting
18. Paradox – debates in adaptive designs
A. Random number generation
B. Implementing adaptive designs in R

Readership: Biostatisticians, medical researchers, and those engaged in areas of pharmaceutical research and development.

This easy-to-read book provides the reader with a unified and concise presentation of adaptive design theories, together with computer programs written in SAS and R for the design and simulation of adaptive trials. Compared to a classic trial design with static features, an adaptive design allows for changing or modifying the characteristics of a trial based on accumulated information gathered during an on-going trial. Such adaptations are made because they can improve the efficiency of the trial design, provide earlier remedies, and reduce the time and cost of drug development – clearly, in addition to their ethical importance! Practical examples are provided throughout the text that are motivated by real issues in aspects of clinical trials – particularly, stopping a trial early if either the risk to subjects outweighs the benefit or there is evidence of efficacy for a safe drug. The last chapter in the book presents a discussion of controversial issues surrounding statistical theories, and highlights future research and the application of adaptive designs. The text, computer programs and data sets will be of value to both practitioners and students alike.
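As a toy illustration of why interim looks have to be handled with care – and hence why a formal theory of adaptive designs is needed – the following R sketch (my own minimal example under simple assumptions, not code from the book) simulates a two-look trial under the null hypothesis and shows that re-using the fixed-sample 1.96 cutoff at both looks inflates the overall type I error rate:

# Two-look trial on standardized treatment-control differences, true effect zero.
# Testing at the nominal 1.96 cutoff at both the interim and the final analysis
# rejects far more often than 5%, motivating adjusted (adaptive) boundaries.
set.seed(1)
one_trial <- function(n_per_stage = 50) {
  x  <- rnorm(2 * n_per_stage)                               # differences under H0
  z1 <- mean(x[1:n_per_stage]) * sqrt(n_per_stage)           # interim z statistic
  z2 <- mean(x) * sqrt(2 * n_per_stage)                      # final z statistic
  (abs(z1) > 1.96) || (abs(z2) > 1.96)                       # reject at either look?
}
mean(replicate(20000, one_trial()))                          # roughly 0.08, not 0.05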

C. M. O'Brien: [email protected]
Centre for Environment, Fisheries & Aquaculture Science

Pakefield Road, Lowestoft, Suffolk NR33 0HT, UK

© 2008 The Author. Journal compilation © 2008 International Statistical Institute. Published by Blackwell Publishing Ltd, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA.


Statistical Design and Analysis of Stability Studies
Shein-Chung Chow
Chapman & Hall/CRC, 2007, xvii + 330 pages, £49.99 / US$84.55, hardcover
ISBN: 978-1-58488-905-2

Table of contents

1. Introduction
2. Accelerated testing
3. Expiration dating period
4. Stability designs
5. Stability analysis with fixed batches
6. Stability analysis with random batches
7. Stability analysis with a mixed effects model
8. Stability analysis with discrete responses
9. Stability analysis with multiple components
10. Stability analysis with frozen drug products
11. Stability testing for dissolution
12. Current issues and recent developments
A. Guidance for industry
B. SAS macro files for STAB system for stability analysis

Readership: Biostatisticians, medical researchers, and those engaged in areas of pharmaceutical research and development.

The labelled shelf-life of a drug product provides the consumer with the assurance that the drug product will retain its strength, quality and purity throughout the expiration period of the drug product. Drug shelf-life is supported by stability data collected from stability studies conducted under appropriate storage conditions. Distribution of a drug following inadequate studies can potentially present a significant risk to public health, and recalls for drug products can be costly – both in administration costs (together with lost reputation) and possible penalties for the pharmaceutical company producing the drug product.

This book provides a comprehensive and unified presentation of the principles and methodologies of design and analysis of stability studies. The text is well written and provides a well-balanced summary of current regulatory perspectives. The primary SAS programs (macros) – comprising STAB – for estimating the expiration dating period of a drug product based on linear regression analysis are listed in Appendix B, and these macros help to reinforce the theory presented in the text.
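For readers unfamiliar with the regression-based approach, the following R sketch shows the standard idea in its simplest single-batch form, with hypothetical assay data; it is only an illustration of the principle, not a reproduction of the STAB macros:

# Hypothetical stability data: assay (% of label claim) measured over 24 months.
months <- c(0, 3, 6, 9, 12, 18, 24)
assay  <- c(100.2, 99.5, 99.1, 98.4, 97.9, 96.8, 95.9)
fit <- lm(assay ~ months)

# Estimated shelf life: earliest time at which the one-sided 95% lower confidence
# bound for the mean assay crosses the 90% specification limit
# (a two-sided 90% interval gives the one-sided 95% bound).
grid  <- data.frame(months = seq(0, 60, by = 0.1))
lower <- predict(fit, grid, interval = "confidence", level = 0.90)[, "lwr"]
min(grid$months[lower < 90])                       # estimated shelf life in months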

The potential readership for this book is limited and specialized, although the text will appeal to all those engaged in the design and analysis of stability studies – studies that increasingly play an important role in drug safety and quality assurance.

C. M. O'Brien: [email protected]
Centre for Environment, Fisheries & Aquaculture Science

Pakefield Road, Lowestoft, Suffolk NR33 0HT, UK

Sample Size Calculations in Clinical Research, Second Edition
Shein-Chung Chow, Jun Shao, Hansheng Wang
Chapman & Hall/CRC, 2007, xiv + 465 pages, £49.99 / US$89.95, hardcover
ISBN: 978-1-58488-982-3

Table of contents

1. Introduction
2. Considerations prior to sample size calculation
3. Comparing means
4. Large sample tests for proportions
5. Exact tests for proportions
6. Tests for goodness-of-fit and contingency tables
7. Comparing time-to-event data
8. Group sequential methods
9. Comparing variabilities
10. Bioequivalence testing
11. Dose response studies
12. Microarray studies
13. Bayesian sample size calculation
14. Nonparametrics
15. Sample size calculation in other areas
Appendix: Table of quantiles

Readership: Clinical scientists and biostatisticians in the pharmaceutical industry, regulatory agencies, and academia.

This book continues the aims of the series to provide useful reference texts on important topics in biostatistics. Clinical development is an integral part of pharmaceutical development. It is a lengthy and costly process for providing accurate and reliable assessment of the efficacy and safety of pharmaceutical entities under investigation. This text provides procedures and formulae for the determination of sample size and the appropriate calculation of power for the hypotheses that reflect study objectives under a valid study design. Each chapter presents a brief background on context, regulatory requirements, statistical design and methods for data analysis, recent development, and related references. Practical issues are discussed, together with appropriate statistical tests, and each chapter is balanced by the judicious use of examples. The revised text in this Second Edition will appeal to both practitioners and students alike.
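To give a flavour of the kind of calculation the book catalogues (this example is not taken from the text), the usual normal-approximation formula for comparing two means, n per group = 2*sigma^2*(z_{1-alpha/2} + z_{1-beta})^2 / delta^2, can be coded and cross-checked against base R in a few lines:

# Sample size per group for a two-sided two-sample comparison of means.
n_per_group <- function(delta, sigma, alpha = 0.05, power = 0.80) {
  ceiling(2 * sigma^2 * (qnorm(1 - alpha / 2) + qnorm(power))^2 / delta^2)
}
n_per_group(delta = 5, sigma = 10)                 # about 63 per group
power.t.test(delta = 5, sd = 10, power = 0.80)     # base-R t-test version gives a similar n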

C. M. O'Brien: [email protected]
Centre for Environment, Fisheries & Aquaculture Science

Pakefield Road, Lowestoft, Suffolk NR33 0HT, UK

Negative Binomial Regression
Joseph M. Hilbe
Cambridge University Press, 2007, xii + 251 pages, £40.00 / US$65.10, hardcover
ISBN: 978-0-521-85772-7

Table of contents

1. Overview of count response models
2. Methods of estimation
3. Poisson regression
4. Overdispersion
5. Negative binomial regression
6. Negative binomial regression: modeling
7. Alternative variance parameterizations
8. Problems with zero counts
9. Negative binomial with censoring, truncation, and sample selection
10. Negative binomial panel models
A. Negative binomial log-likelihood functions
B. Deviance functions
C. Stata negative binomial – ML algorithm
D. Negative binomial variance functions
E. Data sets

Readership: Applied and bio-statisticians, medical scientists, teachers and students of statistics courses.

In this book the author explores both the theory and practice of the negative binomial – providing the reader with guidelines of how best to implement the model into their research. The text has been written to provide the reader with a thorough understanding of the negative binomial model and its many variations. Without exaggeration, the negative binomial is now one of the most common methods used to accommodate over-dispersion when modelling count response data. The text is based on seminars and classes related to count response models that the author has taught over the past 20 years.


The Stata package has been used extensively throughout the text to analyse the majority of the examples. However, a brief review of other software available at the time of writing (GENSTAT, GLIM, R, SAS, S-PLUS) for fitting negative binomial models is presented. The text is well written and easy to read, but once started it is difficult to put down, as each chapter unfolds the intricacies of the distribution.
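For readers on the R side, a minimal over-dispersion example of the kind the book treats at length might look as follows (simulated data and MASS::glm.nb, one of the R routes alluded to above; the book's own examples are mostly in Stata):

# Hypothetical overdispersed counts: variance = mu + mu^2/size exceeds the Poisson variance.
library(MASS)
set.seed(2)
x  <- runif(500)
mu <- exp(0.5 + 1.2 * x)
y  <- rnbinom(500, size = 1.5, mu = mu)

pois <- glm(y ~ x, family = poisson)
nb   <- glm.nb(y ~ x)
c(poisson_AIC = AIC(pois), negbin_AIC = AIC(nb))   # NB usually fits these data far better
nb$theta                                            # estimated dispersion (size) parameter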

C. M. O'Brien: [email protected]
Centre for Environment, Fisheries & Aquaculture Science

Pakefield Road, Lowestoft, Suffolk NR33 0HT, UK

Life Distributions: Structure of Nonparametric, Semiparametric, and Parametric Families
Albert W. Marshall, Ingram Olkin
Springer, 2007, xx + 782 pages, €69.95 / US$89.95, hardcover
ISBN: 978-0-387-20333-1

Table of contents

1. Preliminaries
2. Ordering distributions: Descriptive statistics
3. Mixtures
4. Nonparametric families: densities and hazard rates
5. Nonparametric families: origins in reliability theory
6. Nonparametric families: inequalities for moments and survival functions
7. Semiparametric families
8. Exponential distributions
9. Parametric extensions of the exponential distribution
10. Gompertz and Gompertz–Makeham distributions
11. Pareto and F distributions and their parametric extensions
12. Logarithmic distributions
13. Inverse Gaussian distributions
14. Distributions with bounded support
15. Additional parametric families
16. Covariate models
17. Several types of failure; competing risks
18. Characterizations through coincidences of semiparametric families
19. More about semiparametric families
20. Some topics from probability theory
21. Convexity and total positivity
22. Functional equations
23. The Gamma and Beta functions
24. Some topics from calculus and analysis

Readership: Postgraduate researchers and practitioners in Survival Analysis.

This is not the 'usual' treatise on Survival Analysis. Instead of setting out methodology for analysing lifetime data it concentrates on the probability models that are used for such analyses. So, the subject is the intrinsic properties of survival distributions, that is to say of distributions of non-negative random variables. The authors and their collaborators have a distinguished track record of research in this field and the book reflects this. For the student, probably one with a degree in statistics, mathematics or engineering, the book provides a wealth of material both for background facts and for further research. For the working statistician it explains connections between distributions that might be adopted as models for data; it also reveals some of those unforeseen, perhaps unwanted, consequences that accompany such assumptions.

One can pick out a variety of aspects that give the book its character. One or two that caught my eye are as follows.

(i) It is familiar to specify a survival model in terms of its survivor or hazard function. But here one finds other ways: the specification can be based on the reverse hazard function, the mean residual lifetime, the TTT transform (look them up! – a small numerical sketch follows this list), etc. (Incidentally, the authors define hazard rates and hazard functions in a certain way.)

(ii) There is detailed discussion of different types of parameter and their roles. In addition to the usual location and scale parameters there are others that define shape in various ways. It is useful to have this dissection available.

(iii) The book is divided into seven parts and some of these deal separately with nonparametric families (one part), semiparametric families (two parts) and parametric families (one part). This comprises more than the usual level of detail and is one of the things that give the book its particular flavour.
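The scaled total time on test (TTT) transform mentioned in (i) is easy to compute from a sample; the following R lines are a generic illustration on simulated Weibull lifetimes, not an example from the book:

# Scaled TTT transform: TTT_i = sum of the i smallest lifetimes + (n - i) * t_(i),
# scaled by TTT_n. A roughly diagonal plot suggests an exponential model;
# a concave shape (as here, Weibull shape > 1) points to an increasing hazard.
set.seed(3)
t <- sort(rweibull(200, shape = 2, scale = 1))
n <- length(t)
ttt <- cumsum(t) + (n - seq_len(n)) * t
plot(seq_len(n) / n, ttt / ttt[n], type = "l",
     xlab = "i/n", ylab = "scaled TTT", main = "Scaled TTT transform")
abline(0, 1, lty = 2)                              # exponential reference line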

Martin Crowder: [email protected]
Mathematics Department, Imperial College

London SW7 2AZ, UK

Financial Surveillance
Marianne Frisen (Editor)
Wiley, 2008, viii + 264 pages, US$120.00 / £60.00 / €84.00, hardcover
ISBN: 978-0-470-06188-6

Table of contents

1. Introduction to financial surveillance (Marianne Frisen)
2. Statistical models in finance (Helgi Tomasson)
3. The relation between statistical surveillance and technical analysis in finance (David Bock, Eva Andersson, Marianne Frisen)
4. Evaluations of likelihood-based surveillance of volatility (David Bock)
5. Surveillance of univariate and multivariate linear time series (Yarema Okhrin, Wolfgang Schmid)
6. Surveillance of univariate and multivariate nonlinear time series (Yarema Okhrin, Wolfgang Schmid)
7. Sequential monitoring of optimal portfolio weights (Vasyl Golosnoy, Wolfgang Schmid, Iryna Okhrin)
8. Likelihood-based surveillance for continuous-time processes (Helgi Tomasson)
9. Conclusions and future directions (Marianne Frisen)

Readership: Advanced level students in statistics, economics, and finance. Readers from statistics or finance wanting to learn more about the other discipline.

The editor of this book defines the aim of financial surveillance as being 'to signal at the optimal trading time'. Implicit in this is the detection of that optimal time, and this problem is the core of the book. It thus describes statistical tools for monitoring an on-going stream of information and deciding, at each stage, whether action should be taken. Of course, other situations with this same abstract description occur in many other places, so it is not surprising that a large range of models and techniques have been developed for approaching such problems. This also means that the book would be valuable for readers beyond those with a narrow financial interest.

Topics covered by the various chapters of the book include early Shewhart, CUSUM and EWMA methods, more elaborate ML approaches, performance measures of surveillance systems, multivariate problems, general statistical approaches to modelling financial time series, the relation between surveillance methods and technical analysis, linear and nonlinear time series, optimal portfolio weights, and continuous time processes. The perspective is very much a statistical one, and no attempt is made to examine the parallel computational intelligence work on this problem.

The opening sentence of the final chapter nicely conveys the editor's perspective on the subject matter, saying 'financial methods have developed from mathematics and probability theory to statistical methods for decision theory.' This makes it clear that statistics builds on the mathematics and probability theory, and is something beyond those.

I was a little disappointed that, in the final chapter, the editor did not give more specific indications of where she felt that future work was needed, restricting herself to general observations of likely directions. Nonetheless, overall the book does provide a very good introduction to the problems and statistical methods for financial surveillance. I shall certainly recommend it to my students.

David J. Hand: [email protected]
Mathematics Department, Imperial College

London SW7 2AZ, UK

New Drug Development: Design, Methodology, and Analysis
J. Rick Turner
Wiley, 2007, xxi + 270 pages, £50.50 / €66.40, hardcover
ISBN: 978-0-470-07373-5

Table of contents

PART I: INTRODUCTION
1. New drug development
2. The regulatory environment for new drug development
PART II: DRUG DISCOVERY AND NONCLINICAL RESEARCH
3. Drug discovery
4. Nonclinical research
PART III: DESIGN, METHODOLOGY, AND ANALYSIS
5. Design and methodology in clinical trials
6. Statistical analysis
7. Statistical significance: employment of hypothesis testing
8. Clinical significance: employment of confidence intervals
9. Sample-size estimation
PART IV: LIFECYCLE CLINICAL DEVELOPMENT
10. Safety assessment in clinical trials
11. Efficacy assessment in clinical trials
12. Pharmaceutical and biopharmaceutical drug manufacturing
13. Postmarketing surveillance and evidence-based medicine
PART V: INTEGRATIVE DISCUSSION
14. Unifying themes and concluding comments

Readership: Entry-level professionals in the pharmaceutical, biotechnology, and contract research organisation industries, as well as seasoned clinical research professionals who wish to refresh their knowledge in areas outside their immediate fields of expertise.

This book gives a comprehensive presentation of all of the stages involved in drug development, from identifying a potentially useful drug candidate, through regulatory issues, to post-marketing surveillance. The coverage is indeed extensive – and therefore necessarily not very detailed. Just to indicate the breadth of this coverage, here is a fairly random and incomplete selection from the very many section headings: pharmacokinetics, clinical trial design, case report forms, project management, descriptive and inferential statistics, anova, multiple testing, confidence intervals, ethical issues, safety assessment, equivalence trials, adaptive designs, commercial manufacturing, quality control, and many other topics. PowerPoint slides to accompany the book are available on the web.

As is indicated by the above selective list, this is not a statistics text. It does not include mathematical formulae. However, the role that statistics plays in the book is nicely illustrated by a comment made in Chapter 14: 'the discipline of Statistics is regarded as a wide-ranging discipline that provides critical assistance in study design at all stages of new drug development and provides information that facilitates decision-making at all stages of this process.'

Overall, this volume would provide an excellent introduction for anyone entering the pharmaceutical industry. It would also make superb background reading for medical statistics MSc students.

David J. Hand: [email protected]
Mathematics Department, Imperial College

London SW7 2AZ, UK

Probabilistic Networks and Expert Systems: Exact Computational Methods for Bayesian Networks
Robert G. Cowell, A. Philip Dawid, Steffen L. Lauritzen, David J. Spiegelhalter
Springer, 2007, xii + 324 pages, £30.50 / €39.95 / US$49.95, softcover
ISBN: 978-0-387-71823-1

Table of contents

1. Introduction
2. Logic, uncertainty, and probability
3. Building and using probabilistic networks
4. Graph theory
5. Markov properties on graphs
6. Discrete networks
7. Gaussian and mixed discrete-Gaussian networks
8. Discrete multistage decision networks
9. Learning about probabilities
10. Checking models against data
11. Structural learning

Readership: Researchers, graduate students – involved in the theory and applications of probabilistic expert systems.

This is a re-issue of an earlier classic by the authors. Given that much of the book was at the cutting edge when written and came out of the new or almost-new research of the authors, it is remarkably clear, well written, and well illustrated with both toy and real-life examples. The introductory pages contain several suggestions about how to read the book, depending on motivation and background.

With an overview and quick history of non-probabilistic expert systems taken care of in Chapters 1 and 2, a systematic study of probabilistic expert systems begins in Chapter 3. This chapter deals with relatively simple examples, including a real-life medical diagnostic system, namely, the CHILD network for diagnosing the possible disease of a newborn with symptoms of "cyanosis (blue baby)" or breathlessness. The network is basically a tree, so it illustrates in a simple way how graph theory and conditional independence can help build a complex inference engine based on "local computation". Chapter 3 also provides a preview of the theory to come. So far the book is easy reading.
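The simplest possible instance of such propagation – a two-node parent-child network rather than the CHILD network itself, with made-up probabilities – can be written out by direct enumeration in a few lines of R:

# Toy disease -> symptom network: posterior probability of the disease given an
# observed symptom, obtained by enumerating the joint distribution. Evidence on
# the child node updates belief in the parent, the essence of "propagation" on trees.
p_disease            <- 0.01
p_symptom_given_d    <- 0.90
p_symptom_given_notd <- 0.05
joint <- c(d_s    = p_disease * p_symptom_given_d,
           notd_s = (1 - p_disease) * p_symptom_given_notd)
joint["d_s"] / sum(joint)           # about 0.15, up from the 0.01 prior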


The level changes with Chapter 4. It develops the necessary theory of decomposable graphs, and Chapter 5 uses such graphs and a Markovian probability structure on those graphs to represent components of expert knowledge as nodes of the graph and the interrelations among the nodes through the edges and the Markov probability structure. Chapter 6 starts with discrete probability networks and explains carefully both the local computations and the so-called propagation of uncertainty in the system when some piece of evidence enters the system or is changed. More complex networks are needed to represent probabilistic knowledge, available decisions and corresponding utilities, and to manipulate all these ingredients for optimal decision making via backward induction. This is done in Chapter 8. I enjoyed sailing through the first three chapters as well as plodding through the next four (Chapters 4, 5, 6 and 8), which constitute the core of the book.

Chapter 7 is interesting but not essential, while Chapters 9, 10 and 11 are somewhat tentative and could have benefited from a postscript from the authors. My advice to readers would be to choose between Chapters 1, 2, and 3 or the more demanding option of Chapters 1 through 6. Both would be very rewarding.

Jayanta K. Ghosh: [email protected]
Department of Statistics, Purdue University

West Lafayette, IN 47909, USA

Bayesian Networks and Decision Graphs, 2nd Edition
Finn V. Jensen, Thomas D. Nielsen
Springer, 2007, xvi + 448 pages, €64.95 / £50.00 / US$79.95, hardcover
ISBN: 978-0-387-68281-5

Table of contents

1. Prerequisites on probability theory
Part I: Probabilistic graphical models
2. Causal and Bayesian networks
3. Building models
4. Belief updating in Bayesian networks
5. Analysis tools for Bayesian networks
6. Parameter estimation
7. Learning the structure of Bayesian networks
8. Bayesian networks as classifiers
Part II: Decision graphs
9. Graphical languages for specification of decision problems
10. Solution methods for decision graphs
11. Methods for analyzing decision problems

Readership: Graduate students, researchers, readers in both academia and industry who want to learn enough theory and technology to be able to build expert systems.

Bayesian networks for learning and decision making were pioneered in the eighties and nineties by various people, among whom Shafer, Pearl, Dawid, Lauritzen and Spiegelhalter deserve special mention. The present book provides a very readable but also rigorous and comprehensive introduction to the subject. It would make a very good text for a graduate or an advanced undergraduate course.

The exposition in Chapters 1 through 3 is very well structured. They introduce the reader quickly but smoothly to Bayesian networks with lots of motivation, real-life examples, and clarification of many subtle points. This part of the book is very suitable for a reader who wants an easy but comprehensive introduction to what it is all about.


Chapter 4 on belief updating on the basis of new evidence is quite technical, but all the technical concepts are first motivated and then illustrated with simple examples before formal definition or presentation of theorems.

The remaining chapters of Part 1, namely Chapters 5 through 8, provide interesting information on various aspects of learning, like sensitivity analysis or parameter estimation, as well as applications like classification or learning the architecture of a latent network from the way it works. Especially the last topic is currently at the cutting edge of understanding relatively simple biological or communication systems.

Part 2 deals with decision making based on Bayesian networks. It has all the strengths of Part 1. It could be used for a second course on Bayesian networks. As the authors point out, the mathematical level is considerably higher than in Part 1.

Altogether, this is a very useful book for anyone interested in learning Bayesian networks without tears.

Jayanta K. Ghosh: [email protected]
Department of Statistics, Purdue University

West Lafayette, IN 47909, USA

Nonlinear Dimensionality Reduction
John A. Lee, Michel Verleysen
Springer, 2007, xviii + 308 pages, €64.95 / £50.00 / US$79.95, hardcover
ISBN: 978-0-387-39350-6

Table of contents

1. High-dimensional data
2. Characteristics of an analysis method
3. Estimation of the intrinsic dimension
4. Distance preservation
5. Topology preservation
6. Method comparisons
7. Conclusions
A. Matrix calculus
B. Gaussian variables
C. Optimization
D. Vector quantization
E. Graph building
F. Implementation issues

Readership: Statisticians, computer scientists working in data mining, practicing scientists with sufficient background in statistics or computational learning, graduate students in statistics or computer science.

If you have not followed actively the work in dimensionality reduction (DR), you may not know that there is a great deal more out there nowadays than just principal components analysis, multidimensional scaling or Sammon's nonlinear mapping. Many of the interesting new ideas in DR have emerged in the neural network and data mining communities, and the book by Lee and Verleysen presents a comprehensive summary of the state of the art of the field in a very accessible manner. It is the only book I know that offers such a thorough and systematic account of this interesting and important area of research. The authors manage to explain the different DR methods in a remarkably unified and transparent manner. If complemented with a set of exercises, the book could perhaps even be used as a text for a university course in dimensionality reduction.


After discussing the peculiarities of high-dimensional spaces and describing the basic concepts of manifolds in Chapter 1, the book presents in Chapter 2 the general goals of dimensionality reduction using principal components analysis as an example. The authors also put forward 12 discriminating criteria – such as linear vs. nonlinear, continuous vs. discrete and batch vs. online algorithm – that can be used to classify the various DR methods into different categories. Chapter 3 presents methods for estimation of the intrinsic dimensionality of the data manifold. Chapters 4 and 5 describe the various DR methods in detail, evaluate their performance using the same two toy examples throughout all comparisons, discuss the advantages and disadvantages of the methods, and give references to the available software. In Chapter 6 a further thorough comparison of the various methods is presented using both artificial examples and real data. Chapter 7 finally gives conclusions, discusses once more the relative merits of the DR methods from different points of view, proposes taxonomies and suggests a general sequence of data-analysis steps one might take to apply dimensionality reduction in practice. The authors also give advice as to which particular methods might turn out to be useful given the overall nature of the data at hand and the final objective of dimensionality reduction. The book contains several appendices including such standard methodological background as matrices and Gaussian variables, but it also gives descriptions of vector quantization and graph building that both figure prominently in many DR approaches.
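For orientation, the two "classical" starting points the book moves beyond can be run in base R on a toy curved manifold; the example below is my own and deliberately shows that a linear projection cannot unroll a nonlinear structure, which is the motivation for the newer methods the book surveys:

# Linear PCA and classical metric MDS on a noisy spiral embedded in 3-D.
set.seed(5)
theta <- runif(300, 0, 3 * pi)
X <- cbind(theta * cos(theta), theta * sin(theta), rnorm(300, sd = 0.1))

Y_pca <- prcomp(X)$x[, 1:2]                 # linear 2-D projection
Y_mds <- cmdscale(dist(X), k = 2)           # classical MDS on Euclidean distances

par(mfrow = c(1, 2))
plot(Y_pca, main = "PCA", col = rainbow(300)[rank(theta)])
plot(Y_mds, main = "Classical MDS", col = rainbow(300)[rank(theta)])
# Neither embedding "unrolls" the spiral: points far apart along the curve stay mixed.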

Reading the book is quite enjoyable – getting rid of the remaining typos and polishing the use of English a bit more would have made it even more so.

Lasse Holmstrom: [email protected]
Department of Mathematical Sciences, P.O. Box 3000

FI-90014 University of Oulu, Finland

Multiple Testing Procedures with Applications to Genomics
Sandrine Dudoit, Mark J. van der Laan
Springer, 2008, xxxiv + 588 pages, €69.95 / £54.00 / US$84.95, hardcover
ISBN: 978-0-387-49316-9

Table of contents

1. Multiple hypothesis testing
2. Test statistics null distribution
3. Overview of multiple testing procedures
4. Single-step multiple testing procedures for controlling general type I error rates, Θ(FVn)
5. Step-down multiple testing procedures for controlling the family-wise error rate
6. Augmentation multiple testing procedures for controlling generalized tail probability error rates
7. Resampling-based empirical Bayes multiple testing procedures for controlling generalized tail probability error rates
8. Simulation studies: assessment of test statistics null distributions
9. Identification of differentially expressed and co-expressed genes in high-throughput gene expression experiments
10. Multiple tests of association with biological annotation metadata
11. HIV-1 sequence variation and viral replication capacity
12. Genetic mapping of complex human traits using single nucleotide polymorphisms: the ObeLinks project
13. Software implementation
A. Summary of multiple testing procedures
B. Miscellaneous mathematical and statistical results
C. SAS code

Readership: Researchers in Statistical Genetics.


The subject matter, as described in the first sentence of Chapter 1, grabs the attention: "simultaneous test of thousands, or even millions, of null hypotheses". Phew, this is a long way from the Neyman–Pearson Lemma. For good measure, such problems occur for "high-dimensional multivariate distributions, with complex and unknown dependence structures". So, the task seems doubly impossible.

The material is quite technical, based on research published over the last few years by the authors and a few others. There are many theoretical results, proofs and derivations, plus a large number of figures and tables and even 24 pages of colour plates. The authors could never be accused of laziness: everything is covered at great length and in great detail. The text abounds with overviews, scene-setting and literature reviews, which is probably necessary to balance the equation-heavy sections.

Chapters 1 and 2 set out the framework of multiple hypothesis testing. (Here one can learn what a Type III error is.) Chapters 3 to 7 give a catalogue of "multiple testing procedures", a phrase that occurs in each chapter heading (and the book title). Chapter 8 presents some simulations for assessment of the methods. In the second half of the book, Chapters 9 to 12 cover some biological/genetic applications and Chapter 13 describes a software package and some SAS macros.

The core ideas, as presented in Chapter 2, involve a set of test statistics and rejection/critical regions for the hypotheses to be tested. The true joint distribution of the test statistics is unknown (inaccessible) but is replaced by a proxy: this is called a "test statistics null distribution", which is known or estimated by bootstrap resampling. Its vital characteristic is that it should tend to yield a larger Type I error rate than the true joint distribution. So, by controlling the accessible error rate one controls the unknown true one. Explicit constructions for "test statistics null distributions" are given in Sections 2.3 and 2.4.
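A small permutation analogue of this resampling idea (a generic max-T sketch, not the authors' exact constructions) shows the mechanics in R: the joint null behaviour of all test statistics is estimated by resampling, and a single cutoff controls the family-wise error rate:

# Two-sample t statistics for 200 "genes", 10 vs 10 samples, 5 true signals.
set.seed(6)
m <- 200; n <- 20
group <- rep(0:1, each = n / 2)
X <- matrix(rnorm(m * n), m, n)
X[1:5, group == 1] <- X[1:5, group == 1] + 2

tstat <- function(X, g) {
  a <- X[, g == 1, drop = FALSE]; b <- X[, g == 0, drop = FALSE]
  (rowMeans(a) - rowMeans(b)) /
    sqrt(apply(a, 1, var) / ncol(a) + apply(b, 1, var) / ncol(b))
}
obs  <- tstat(X, group)
maxT <- replicate(1000, max(abs(tstat(X, sample(group)))))   # permutation null of the maximum
cutoff <- quantile(maxT, 0.95)                               # single-step cutoff at FWER 0.05
which(abs(obs) > cutoff)                                     # typically flags (some of) features 1-5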

In summary, the methodology here is ambitious in tackling a tough problem. At first glance the tools seem a little specialised, but they could well become more widespread in time. The central ideas are certainly novel and interesting and will, in my opinion, receive much critical attention now that they appear in book form.

Martin Crowder: [email protected]
Mathematics Department, Imperial College

London SW7 2AZ, UK

Elementary Bayesian Biostatistics
Lemuel A. Moye
Chapman & Hall/CRC, 2007, xxi + 376 pages, £42.99 / US$79.95, hardcover
ISBN: 978-1-58488-724-9

Table of contents

1. Basic probability and Bayes theorem
2. Compounding and the law of total probability
3. Intermediate compounding and prior distributions
4. Completing your first Bayesian computations
5. When worlds collide
6. Developing prior probability
7. Using posterior distributions: loss and risk
8. Putting it all together
9. Bayesian sample size
10. Predictive power and adaptive procedures
11. Is my problem a Bayes problem?
12. Conclusions and commentary


Readership: Strongly recommended to teachers and students of Bayesian biostatistics.

This is a fun book for teaching oneself (or others) both some fundamental principles of epidemiology and clinical trials and fundamental principles of probability and statistical inference, from the point of view of a practising clinical scientist who is also a very knowledgeable, no-nonsense Bayesian. What makes it very different from common textbooks is its blending of history, controversy (about probability, statistics, and clinical studies), real-life examples, and wise practical advice.

The Prologue, also nicknamed Opening Salvos, provides a short tongue-in-cheek history of the Bayesian paradigm, right up to the conflict about P-values. Chapter 5 describes a partly ironic picture of two statisticians, a Frequentist and a Bayesian, struggling to win over the soul of a physician.

The other chapters provide a very readable introduction to basic probability models, inference questions, and Bayesian answers without calculus and Markov chain Monte Carlo.
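The flavour of such a "Bayesian computation without MCMC" is the conjugate update; the following R lines, with hypothetical trial numbers of my own, show a beta-binomial posterior for a response rate:

# Flat Beta(1, 1) prior on a response probability, updated by hypothetical trial data.
a0 <- 1; b0 <- 1
successes <- 14; failures <- 26
a1 <- a0 + successes; b1 <- b0 + failures
c(posterior_mean = a1 / (a1 + b1),
  lower95 = qbeta(0.025, a1, b1), upper95 = qbeta(0.975, a1, b1))
1 - pbeta(0.25, a1, b1)      # posterior probability that the response rate exceeds 0.25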

Chapter 6 is a must-read even for a seasoned Bayesian – I hope I qualify as one such. This chapter advises "separating prior knowledge from prior belief", often the (ignorant) prior belief of subject-matter experts, and the need to have a "counter-intuitive" prior which puts positive mass on those parts of the parameter space that are currently counter-intuitive. Two wonderful examples, pages 179 through 182, present a compelling account of the damage that can be caused by prior belief that is not based on prior knowledge. In this connection, I would only add that clinical trials are experiments conducted to prove a hypothesis to a whole community of clinical scientists, not experiments for a Bayesian scientist's own learning. It is in such contexts that even subjective Bayesians, like Don Rubin, Jay Kadane, and even Savage, would advocate apparently non-Bayesian methods like randomization as a relatively easy alternative to a full Bayesian protocol that can guard itself against unknown bias.

Chapters 7 and 8 have a good discussion of loss functions and posterior risk, and their use in decision making. Chapter 9 discusses how a Bayesian may choose a suitable sample size, a problem that is often not discussed in Bayesian textbooks but which is very important in clinical trials. Chapter 10 lists some of the recent practical gains and changes in the protocol for clinical trials, namely predictive power and adaptive randomization, both introduced by Bayesians. Chapter 11 provides an overview and many further practical insights.

Jayanta K. Ghosh: [email protected]
Department of Statistics, Purdue University

West Lafayette, IN 47909, USA

Introduction to Applied Bayesian Statistics and Estimation for Social Scientists
Scott M. Lynch
Springer, 2007, xxviii + 364 pages, €59.95 / £46.00 / US$74.95, hardcover
ISBN: 978-0-387-71264-2

Table of contents

1. Introduction
2. Probability theory and classical statistics
3. Basics of Bayesian statistics
4. Modern model estimation part 1: Gibbs sampling
5. Modern model estimation part 2: Metropolis–Hastings sampling
6. Evaluating Markov chain Monte Carlo algorithms and model fit
7. The linear regression model
8. Generalized linear models
9. Introduction to hierarchical models
10. Introduction to multivariate regression models
11. Conclusion
A. Background mathematics
B. The central limit theorem, confidence intervals, and hypothesis tests

Readership: Social scientists.

This is a good introduction to computation-intensive, i.e. modern, Bayesian analysis. The author writes that he aimed at "a highly applied book showing how to use MCMC" to do a complete Bayesian analysis of "typical social science data" with "typical social science models". This goal is admirably achieved without assuming any background other than familiarity with standard models in social science and some classical statistical analysis.

Chapter 1 reviews probability; Bayesian ideas are introduced in Chapter 3, keeping in mind necessary general ideas as well as the needs of social scientists. The next three chapters provide a hands-on introduction to MCMC, namely Gibbs and Metropolis–Hastings sampling, the evaluation of convergence and mixing of the algorithms, and finally the use of the MCMC output for estimation of model parameters and model fit, as well as answers to the relevant practical questions that the data are supposed to answer. There is not much material on testing via Bayes factors or prediction via the currently very popular model selection or model averaging. Apparently, the typical social scientist is less in need of those than, say, biologists and other appliers of statistics. However, these topics are introduced and an explanation is given as to why they are not essential in an introductory text for social science.
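For readers who have never seen the machinery, a bare-bones random-walk Metropolis–Hastings sampler in R looks like the following; this is only a sketch of the general algorithm on a deliberately trivial model (normal mean, known variance, normal prior), not an example from Lynch's book, chosen because the exact posterior is available as a check:

set.seed(7)
y <- rnorm(50, mean = 2, sd = 1)                                       # data
log_post <- function(mu) sum(dnorm(y, mu, 1, log = TRUE)) + dnorm(mu, 0, 10, log = TRUE)

draws <- numeric(5000); mu <- 0
for (i in seq_along(draws)) {
  prop <- mu + rnorm(1, sd = 0.5)                                      # random-walk proposal
  if (log(runif(1)) < log_post(prop) - log_post(mu)) mu <- prop        # accept/reject
  draws[i] <- mu
}
mean(draws[-(1:1000)])       # after burn-in, close to the analytical posterior mean (about mean(y))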

The last three chapters cover linear regression, generalized linear models, hierarchical Bayesian modeling and analysis, and (multivariate) regression data with multiple outcomes. All are covered well. Pains are taken to show how and why the Bayesian analysis can answer more questions in a more practically relevant way.

My favorite is the treatment of hierarchical Bayes, with a clear distinction of both the hierarchy of parameters and the hierarchy of measurements as in panel or multilevel data. All these applications, but especially Chapters 8 and 9, bring out the advantage of Bayesian methods over classical statistics.

Throughout the book Lynch points out, coherently and persuasively, the many practical advantages of being an objective Bayesian. A strength of the book is the close inter-weaving of philosophy, computation and bread-and-butter analysis of real-life social science problems with high dimensions, missing data, and other complications like using past data as in the voting example.

The chapters are very well integrated by analysing a couple of examples by the somewhat different methods of different chapters.

My only critical remark relates to the use of posterior predictive distributions for model fit. This is probably all right for large data sets, the sample size being much bigger than the parameter dimension, but in other cases it may lead to overfitting caused by double use of the same data. See, for example, Bayarri and Berger (JASA 2000) and Ghosh, Purkayastha and Samanta (Handbook of Statistics, vol. 25, eds Dipak K. Dey and C. R. Rao, Elsevier, 2005).

Jayanta K. Ghosh: [email protected]
Department of Statistics, Purdue University

West Lafayette, IN 47909, USA


Information Criteria and Statistical Modeling
Sadanori Konishi, Genshiro Kitagawa
Springer, 2008, xii + 276 pages, €64.95 / £50.00 / US$79.95, hardcover
ISBN: 978-0-387-71886-6

Table of contents

1. Concept of statistical modeling
2. Statistical models
3. Information criterion
4. Statistical modeling by AIC
5. Generalized information criterion (GIC)
6. Statistical modeling by GIC
7. Theoretical development and asymptotic properties of the GIC
8. Bootstrap information criterion
9. Bayesian information criteria
10. Various model evaluation criteria

Readership: Researchers and graduate students in statistics, computer science and engineering, anyone interested in statistical modeling.

This book explains the basic ideas of model evaluation and presents the definition and derivation of the AIC and related criteria, including the BIC. Both theoretical and practical aspects, together with a wide range of practical applications, are discussed. The authors introduce a generalized information criterion (GIC) that relaxes the assumptions imposed on the AIC. The GIC can be applied to evaluate statistical models constructed by various types of estimation procedures, including robust estimation and maximum penalized likelihood. This generalized criterion has not been available in standard textbooks. The Bayesian approach to model evaluation is discussed in Chapter 9. There the authors mention that the minimum description length (MDL) criterion coincides with the BIC. However, they completely neglect the last two decades in the development of MDL theory, which reveals that MDL is much more than the BIC. The book makes a major contribution to the understanding of statistical modeling. Researchers interested in statistical modeling will find a lot of interesting material in it.
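The simplest everyday use of these criteria – comparing candidate regression models by AIC and BIC – takes only a few lines of base R (simulated data of my own; the GIC itself is not available in base R):

set.seed(8)
x <- runif(100); z <- runif(100)
y <- 1 + 2 * x + rnorm(100, sd = 0.5)       # z is irrelevant by construction

m1 <- lm(y ~ x)
m2 <- lm(y ~ x + z)
m3 <- lm(y ~ poly(x, 4) + z)
data.frame(model = c("y~x", "y~x+z", "y~poly(x,4)+z"),
           AIC = AIC(m1, m2, m3)$AIC,
           BIC = BIC(m1, m2, m3)$BIC)       # both criteria typically favour the simplest model here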

Erkki P. Liski: [email protected]
Department of Mathematics and Statistics
FI-33014 University of Tampere, Finland

Statistics of Financial Markets: An Introduction, 2nd Edition
Jurgen Franke, Wolfgang K. Hardle, Christian M. Hafner
Springer, 2008, xxii + 501 pages, €64.95 / £50.00 / US$89.95, softcover
ISBN: 978-3-540-76269-0

Table of contents

Part I. Option pricing
1. Derivatives
2. Introduction to option management
3. Basic concepts of probability theory
4. Stochastic processes in discrete time
5. Stochastic integrals and differential equations
6. Black–Scholes option pricing model
7. Binomial model for European options
8. American options
9. Exotic options
10. Models for the interest rate and interest rate derivatives
Part II. Statistical models of financial time series
11. Introduction: definitions and concepts
12. ARIMA time series models
13. Time series with stochastic volatility
14. Non-parametric concepts for financial time series
Part III. Selected financial applications
15. Pricing options with flexible volatility estimators
16. Value at risk and backtesting
17. Copulae and value at risk
18. Statistics of extreme risks
19. Neural networks
20. Volatility risk of option portfolios
21. Nonparametric estimators for the probability of default
22. Credit risk management
Appendix 1. Integration theory
Appendix 2. Portfolio strategies

Readership: Anyone interested in the statistical analysis of financial data.

The first edition of this book (424 pages) appeared in 2004. The current edition (495 pages) contains an extensive update on copulas (Chapter 17) and a more detailed discussion of GARCH models (Chapter 13). Also various sections, especially those relevant for risk management, have been updated; for instance, Chapter 22 (new) is on credit risk management. As a consequence, the text is rather encyclopaedic in character. In total, the book contains 22 chapters. This implies that throughout, depth has to be traded against breadth. Also the quality of the various chapters is rather variable: no doubt, the core competence of the three authors in the field of time series analysis is evident. A definite bonus is the on-line availability of R and Matlab code. For a broad reference text of this type I would have wished for more careful proofreading, especially at the level of the references, and a more intelligently structured and more detailed index. As examples concerning the references, just check Artzner and Sklar. Also I find it strange that, with a substantially updated chapter on copulas, there is no reference included to the statistical work of Genest and co-workers. The famous Pickands–Balkema–de Haan Theorem is not co-credited to Balkema and de Haan. Finally, in a book containing chapters on copulas and credit risk management, I would have expected a discussion of the Li model. I somehow have the feeling that the authors were rushed into "getting the job done", which is a pity as many of the topics treated will appeal to a broad readership.
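The risk-management material starts from quantities as simple as historical value at risk, which in R is essentially one line (simulated returns below stand in for real profit-and-loss data; this is a generic illustration, not an example from the book):

set.seed(9)
returns <- rnorm(1000, mean = 0.0004, sd = 0.01)   # stand-in for daily log returns
-quantile(returns, probs = 0.01)                   # one-day 99% historical VaR, roughly 0.023 (2.3%)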

Paul Embrechts: [email protected]
Department of Mathematics, ETH Zurich, HG F 42.3

8092 Zurich, Switzerland

Introduction to Bioinformatics
Anna Tramontano
Chapman & Hall/CRC, 2006, xiv + 174 pages, £30.59 / US$59.95, softcover
ISBN: 978-1-58488-569-6

Table of contents

1. The data: storage and retrieval
2. Genome sequence analysis
3. Protein evolution
4. Similarity searches in databases
5. Amino acid sequence analysis
6. Prediction of the three-dimensional structure of a protein
7. Homology modeling
8. Fold recognition methods
9. New fold modeling
10. The "omics" universe


Readership: Students of bioinformatics and computational biology.

This book provides a nice summary of introductory topics in bioinformatics, suitable for higher-level undergraduates with some biological background looking to enter the field, or masters-level graduate students. In addition to biology, the reader would benefit from having basic mathematical and computational skills.

Tramontano begins each chapter with a glossary of relevant terms, a useful tool for those new to bioinformatics. Each chapter ends with helpful references and a series of questions for the reader to answer. The questions range in difficulty and many require the reader to explore online software and databases. The author discusses topics such as sequence alignments and protein structures, explaining several methods and evaluating their strengths and weaknesses. The descriptions of the motivations behind the problems in bioinformatics are good and appropriate for a book of this level. The discussion of topics such as microarrays is done only superficially. While this book is not the foremost reference for bioinformatics, the subject matter is informative and well written for an introductory book.

Frank Mannino: [email protected]
Statistics Unit, GlaxoSmithKline Pharmaceuticals
1250 South Collegeville Road, Collegeville, PA 19426-0989, USA

Generalized Linear Models for Insurance Data
Piet de Jong, Gillian Z. Heller
Cambridge University Press, 2008, x + 196 pages, £35.00 / US$70.00, hardcover
ISBN: 978-0-521-87914-9

Table of contents

1. Insurance data
2. Response distributions
3. Exponential family responses and estimation
4. Linear modeling
5. Generalized linear models
6. Models for count data
7. Categorical responses
8. Continuous responses
9. Correlated data
10. Extensions to the generalized linear model
Appendix 1. Computer code and output

Readership: Students of generalized linear models; particularly, but not exclusively, actuarial students.

The authors, who are employed by Macquarie University in Sydney, Australia, have combined their experiences (in teaching actuarial students and in analyzing insurance data) in this slim, well-written volume. It is extremely clear and has many useful diagrams. The only possible drawback for some readers is the use of SAS/STAT (Version 9.1) and R (where SAS lacks convenience) in the data analyses Appendix (pp. 150–191, a quarter of the book). Of course, authors have to select some computing system, and it seems reasonable to believe that those employing other systems will be able to use the detailed analyses exhibited in the Appendix to check their own calculations. Overall verdict: Highly recommended!
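The R side of the kind of analysis the book covers is typified by a claim-frequency GLM with an exposure offset; the lines below use simulated policy data of my own, not the book's data sets:

# Poisson claim counts with a log link and log(exposure) offset.
set.seed(10)
n <- 5000
age      <- sample(18:70, n, replace = TRUE)
exposure <- runif(n, 0.2, 1)                               # policy years in force
lambda   <- exposure * exp(-2 + 0.015 * (50 - age))        # younger drivers riskier, by construction
claims   <- rpois(n, lambda)

freq <- glm(claims ~ age, family = poisson, offset = log(exposure))
summary(freq)$coefficients                                  # age effect close to the simulated -0.015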

Norman R. Draper: [email protected]
Department of Statistics, University of Wisconsin – Madison

1300 University Avenue, Madison, WI 53706-1532, USA


An Introduction to the Theory of Point Processes, Volume II: General Theory and Structure, 2nd Edition
Daryl J. Daley, David Vere-Jones
Springer, 2008, xviii + 566 pages, €69.95 / £54.00 / US$89.95, hardcover
ISBN: 978-0-387-21337-8

Table of contents

9. Basic theory of random measures and point processes
10. Special classes of processes
11. Convergence concepts and limit theorems
12. Stationary point processes and random measures
13. Palm theory
14. Evolutionary processes and predictability
15. Spatial point processes

Readership: Statisticians and probabilists working on applied or theoretical aspects of point process modeling; PhD students in these areas; researchers applying point process models.

The second volume is an updated version of Chapters 9–15 of the first edition. The order of the presentation is changed, and new material includes point processes related to Markov processes and chains, modelling of long-range dependency and self-similarity, fractal dimensions, and limit theorems related to equilibrium. Together with Volume I this is a rich source of information on the theory and applications of point process methods.

Esko Valkeila: [email protected]
Helsinki University of Technology, Institute of Mathematics

P.O. Box 1100, FI-02015 TKK, Finland

Reliability and Risk: A Bayesian Perspective
Nozer D. Singpurwalla
Wiley, 2006, xvi + 371 pages, €87.80 / £65.00 / US$130.00, hardcover
ISBN: 978-0-470-85502-7

Table of contents

1. Introduction and overview
2. The quantification of uncertainty
3. Exchangeability and indifference
4. Stochastic models of failure
5. Parametric failure data analysis
6. Composite reliability: signatures
7. Survival in dynamic environments
8. Point processes for event histories
9. Non-parametric Bayes methods in reliability
10. Survivability of co-operative, competing and vague systems
11. Reliability and survival in econometrics and finance
Appendix A. Markov chain Monte Carlo simulation
Appendix B. Fourier series models and the power spectrum
Appendix C. Network survivability and Borel's paradox

Readership: Practitioners and research workers in reliability and survivability, graduate students attending a course in reliability and risk analysis.

Nozer D. Singpurwalla has written an interesting and unusual book on reliability. He begins with risk evaluation and risk management associated with the occurrence of an extreme event like failure for a mechanical device or death of a human being. Management of risk leads to subjective probability (Chapter 2), Bayesian modeling (Chapter 3) and Bayesian decision theory (Chapter 2). Then follow the usual models (Chapters 4 and 5), but with lots of insight obtained by basing them on or linking them to shock models, shot noise processes, point processes etc. Chapter 6 contains a lot of new material and an invitation to use signatures as covariates in survival analysis.

Chapters 1 through 8 provide new approaches to reliability, new ideas like hazard potentials, and sophisticated modeling insights that provide a better understanding of even classical bivariate exponentials. Chapter 9 and the first half of Chapter 10 are rather specialized but fit quite well with the earlier chapters. The rest of the book, namely the second half of Chapter 10 and Chapter 11, is rather speculative; interesting problems and analogies are introduced and partly explored.

Singpurwalla seems to be at his best in probabilistic modeling of reality. He has written what must be one of the first books on reliability written from a subjective, Bayesian point of view. However, his Bayesian belief doesn't stand in the way of presenting a good survey of alternative approaches at various points in the book.

There are many interesting topics, usually not covered in reliability texts, e.g., modeling based on copulas, prediction based on Kalman filters, signatures, fuzzy sets, neural networks. Important questions are raised regarding the last three. How do you use a signature as a covariate in reliability analysis? How can you combine probability theory and a fuzzy description of reality to get a better picture of reliability? How can you devise the architecture of a neural network that can learn about the survival function from survival data?

As the author mentions in his preface, the book can be read in several different ways, as a text for a graduate-level course on reliability or as a source book for "information and open problems." This book has been a joy to read for this reviewer.

Jayanta K. Ghosh: [email protected]
Department of Statistics, Purdue University

West Lafayette, IN 47909, USA

Design, Evaluation, and Analysis of Questionnaires for Survey Research
Willem E. Saris, Irmtraud N. Gallhofer
Wiley, 2007, viii + 376 pages, £ 47.50 / € 60.00 / US$ 89.95, hardcover
ISBN: 978-0-470-11495-7

Table of contents

Introduction
Part I. The three steps procedure to design requests for an answer
1. Concepts-by-postulation and concepts-by-intuition
2. From social science concepts-by-intuition to assertions
3. The formulation of requests for an answer
Part II. Choices involved in questionnaire design
4. Specific survey research features of requests for an answer
5. Response alternatives
6. The structure of open ended and closed survey items
7. Survey items in batteries
8. Mode of data collection and other choices
Part III. The effects of survey characteristics on data quality
9. Criteria for the quality of survey measures
10. The estimation of reliability, validity and method effects
11. The split ballot MTMM designs
12. The estimation of the effects of measurement characteristics on the quality of survey questions
Part IV. Applications in social science research
13. The prediction and improvement of survey requests by SQP
14. The quality of measures for concepts-by-postulation
15. Correction for measurement error in survey data analysis
16. Coping with measurement error in cross-cultural research

Readership: Researchers and practitioners in any field interested in developing better questionnaires and understanding the uncertainties of the survey process. Suitable for advanced undergraduate or beginning graduate-level survey research courses.

The questionnaire is the heart of survey research, and therefore its design is crucial. This book focuses on methods for designing a good survey questionnaire. It is not a simple task, although some people might think so, and there are a number of issues that need to be considered. Only a few of those are related to statistics, so it is easy to believe that designing a good questionnaire requires teamwork.

The book consists of four parts, preceded by an excellent, short introduction to survey research. Part I shows how to define the concepts of interest, present them as statements and transform them into questions. This requires expertise in the field as well as linguistic skills. The latter are somewhat specific to the language used, although many of the topics are language-independent.

The challenges of measurement come along with the response alternatives in Part II. The examples are mainly taken from the European Social Survey, and they are used to demonstrate and illustrate the methods and techniques in all parts of the book. For anyone interested in this book I would also recommend Alwin (2007), which has examples from American surveys.

Parts III and IV go deeper into statistical models and criteria for the quality of survey questions. Sometimes it feels that the authors may have had problems balancing the content optimally. Some of the more complex issues have been skipped, and some topics are given merely by presenting chunks of minimally documented LISREL code. The survey quality predictor software sounds interesting, although it is heavily language-dependent.

To conclude, I would say that this book is quite inspiring, giving many practical ideas for survey research, especially for designing better questionnaires.

Reference

Alwin, Duane F. (2007). Margins of Error: A Study of Reliability in Survey Measurement. Wiley.

Kimmo Vehkalahti: [email protected]
Department of Mathematics and Statistics
FI-00014 University of Helsinki, Finland


Margins of Error: A Study of Reliability in Survey Measurement
Duane F. Alwin
Wiley, 2007, xvi + 389 pages, £ 52.95 / € 66.70 / US$ 99.95, hardcover
ISBN: 978-0-470-08148-8

Table of contents

1. Measurement errors in surveys
2. Sources of survey measurement error
3. Reliability theory for survey measures
4. Reliability methods for multiple measures
5. Longitudinal methods for reliability estimation
6. Using longitudinal data to estimate reliability parameters
7. The source and content of survey questions
8. Survey question context
9. Formal properties of survey questions
10. Attributes of respondents
11. Reliability estimation for categorical latent variables
12. Final thoughts and future directions

Readership: Statisticians and researchers in the fields of survey research and public opinion, for example. Suitable for undergraduate or graduate-level survey methodology courses.

The theme of this book is reliability, and how it is a necessary condition for valid measurement in any empirical science. The book takes a broad view of this topic, and hence the word Survey in its title is well justified. Indeed, beginning from the data-gathering process of typical survey research, the author deconstructs it into six major elements of the response process, clearly showing the points where the various measurement errors may occur, and how they are related to other sources of error, such as sampling, non-response, and coverage.

The book reflects the author's 40 years of experience in the field. Therefore, the ideas and points are presented very clearly throughout the book. Some parts of the book are more demanding theoretically, but most of the content can be understood quite easily without complex mathematical formulas.

The examples cover nearly 500 survey measures obtained in surveys conducted at the University of Michigan. These examples are used throughout the book to clarify the issues in practice. For anyone interested in this book I would also recommend Saris & Gallhofer (2007), which in turn has examples from the European Social Survey. I would say it is interesting to make comparisons between European and American surveys.

The last chapter, titled Final Thoughts and Future Directions, is an excellent ending for the book. Experts in survey research could begin by reading that chapter first, as it gives the most important points of the book in a nutshell and refers nicely back to the previous chapters. In addition, it includes valuable thoughts and ideas for future research.

Experience talks in this book. Truly recommended.

Reference

Saris, Willem E. & Gallhofer, Irmtraud N. (2007). Design, Evaluation, and Analysis of Questionnaires for Survey Research. Wiley.

Kimmo Vehkalahti: [email protected]
Department of Mathematics and Statistics
FI-00014 University of Helsinki, Finland


Indirect Sampling
Pierre Lavallee
Springer, 2007, xvi + 245 pages, £ 46.00 / € 59.95 / US$ 72.95, hardcover
ISBN: 978-0-387-70778-5

Table of contents

1. Introduction
2. Description and use of the GWSM
3. Literature review
4. Properties
5. Other generalisations
6. Application in longitudinal surveys
7. GWSM and calibration
8. Non-response
9. GWSM and record linkage
10. Conclusion

Readership: Sampling statisticians and researchers designing sample surveys, postgraduate students of statistics studying sample design theory.

In sound sample survey design based on selection probabilities, an often-encountered major hurdle is the cost of constructing a suitable sampling frame. The ideal frame is a single, complete, up-to-date list of the eligible potential respondents, without duplications. The high cost of producing and maintaining such frames, even for major Official Statistics agencies, is one reason cluster sampling and its variants are so often used. In cluster sampling, areas are sampled, and areas link to respondents, so that the frame used can be the list of areas rather than respondents. Cluster sampling is an example of indirect sampling, since it uses a frame which links to each potential respondent, rather than requiring the more extensive and expensive frame which would list all potential respondents directly. As the book elaborates, other examples of indirect sampling are network sampling, multiplicity estimation, adaptive cluster sampling, and certain types of snowball sampling.

In its simplest form, indirect sampling starts with two populations: population A, for which we have a frame, and population B, which consists of the potential survey respondents and is assumed to consist of clusters (e.g. households), the characteristics and size of which may not be known at the time of survey design. Each cluster in B, but not necessarily each unit in each cluster, is assumed to be linked to at least one element in population A. A sample is drawn from A, and its linked units in B are sampled together with all other units in the same cluster in B. With known selection probabilities for the units in population A, an unbiased estimate of the number of links in A for each sampled unit in B can be formed, where unbiasedness is defined over all possible samples (which in general may or may not include any given sampled cluster in the achieved sample). For the sampled clusters, the actual number of such links is either known a priori or collected from sampled units in the selected clusters. After the actual and estimated numbers of links are summed separately for each sampled cluster, their ratio (which will generally be more than one for sampled clusters, since it includes in inverse form the a priori probabilities of sampling each element in each sampled cluster) then becomes the weight for all units in that sampled cluster when estimating means and totals. By construction, the denominator of this ratio is known without error and the expected value (over all possible samples) of the numerator equals the denominator, so the expected value of the ratio is one, and the method is unbiased even though the number of links from B back to A is generally known only for sampled units (i.e. sampled clusters). This method does of course result in some loss of efficiency relative to direct sampling of the respondents, but direct sampling would often be considerably more expensive for the same accuracy since it would require construction of a complete frame for population B.
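The weight construction described above can be made concrete in a small simulation. The following R sketch is illustrative only and is not taken from the book: the population sizes, the one-link-per-frame-unit structure, and the Poisson-type sampling of population A are assumptions made purely for the example. It builds the ratio weights for the clusters reached through the frame sample and uses them to estimate the population total of a survey variable.

# Illustrative simulation of generalised-weight-share-style ratio weights
# (assumed setup, not code from the book under review).
set.seed(3)
nA <- 500                                     # frame population A
n_clusters <- 100                             # clusters (e.g. households) in target population B
cluster_size <- sample(1:4, n_clusters, replace = TRUE)
cluster_of_unit <- rep(seq_len(n_clusters), times = cluster_size)
nB <- length(cluster_of_unit)                 # units in population B
y <- rpois(nB, lambda = 10)                   # survey variable of interest

# Each frame unit links to exactly one unit in B; every B unit gets at least one link.
link_B <- c(seq_len(nB), sample(seq_len(nB), nA - nB, replace = TRUE))
actual_links <- tabulate(link_B, nbins = nB)  # known number of links into each B unit

pi_A <- runif(nA, 0.05, 0.20)                 # known selection probabilities in A
sA <- which(rbinom(nA, 1, pi_A) == 1)         # Poisson sample from the frame

# Unbiased estimate of the number of links into each B unit, from the frame sample.
est_links <- sapply(seq_len(nB), function(j) sum(1 / pi_A[sA[link_B[sA] == j]]))

# Ratio weight per cluster: estimated links / actual links, summed within each cluster.
w_cluster <- tapply(est_links, cluster_of_unit, sum) / tapply(actual_links, cluster_of_unit, sum)

# Clusters reached through the frame sample are surveyed in full.
hit <- cluster_of_unit %in% unique(cluster_of_unit[link_B[sA]])
est_total <- sum(y[hit] * w_cluster[cluster_of_unit[hit]])
c(estimate = est_total, true_total = sum(y))

Because the estimated link counts are unbiased for the actual link counts, the expected value of each cluster weight is one, which is the unbiasedness argument sketched in the paragraph above.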


In its simplest form, the estimator for indirect sampling uses an indicator function to show whether or not there is a link between a unit in A and a unit in B but, without loss of estimator unbiasedness, it is shown in Indirect Sampling that estimator variance can be minimised by altering the scale of the linking (in both numerator and denominator of the ratio) so that the scaled realisations are not necessarily zero or one. Certain types of two-stage indirect sampling which redefine clusters so that estimator unbiasedness remains unaffected are also possible, as discussed in Chapter 5.

This book is without doubt the current, indispensable reference on indirect sampling. As well as a detailed exposition of the theory, it provides a clear, simple outline of the underlying generalised weight share method (GWSM) for constructing sampling weights, and is leavened throughout by simple examples that illustrate the core ideas behind the theory. This enables the book to be read on two levels, one expository and the other as a reference. The connection between the GWSM and the Horvitz–Thompson (H–T) estimator is clearly outlined, as is the duality of the estimator when defined in A and in B. There are chapters on generalising indirect sampling, longitudinal surveys, calibration, non-response, record linkage, and a conclusion which summarises a range of real-world uses of the method and suggests that indirect sampling has the "potential to solve [complex estimation problems] in a simple manner".

The index is very brief; for example, the H–T estimator and its central duality role with respect to the two populations, detailed in equation (4.1) and the pages that follow, are not referenced. This brevity may make the use of the book for reference purposes more difficult, but there are named subsections in the table of contents, and few of these are more than four pages in length, which, given the book's length, mitigates this difficulty. There is occasional awkwardness in English usage, but this is a minor matter and does not mar the meaning.

One of the potential issues for indirect sampling remains the inefficiency of its estimators, since the weights used for sampled units are 'smoothed' to be the same for all units in each sampled cluster. As outlined in the book, the form of clustering may be redefined. Once redefined, however, the method requires that all units in sampled clusters be sampled, so that for a fixed cost (as for adaptive cluster sampling), the number of sampled clusters (which usually drives the variance estimate) must be reduced in comparison with a sampling scheme which allows cluster subsampling. One possibility not explored in the book is that, for a fixed cost, and at the price of some bias but reduced mean square error, it may be possible to randomly subsample at least the larger sampled clusters in B and use a (biased) ratio estimator for the weight for each such sub-sampled cluster. Of course there are other possibilities for extension too. Generally, because of the very large possible cost savings, indirect sampling methods have considerable merit and promise, which is one reason the conclusion of the book notes that the "more indirect sampling is studied, the more its potential to solve, in a simple manner, complex estimation problems is discovered".

Stephen Haslett: [email protected]
Institute of Fundamental Sciences – Statistics

Massey University, PO Box 11222, Palmerston North, New Zealand


Studying Human Populations: An Advanced Course in Statistics
Nicholas T. Longford
Springer, 2008, xvi + 474 pages, £ 48.50 / € 62.95 / US$ 79.95, hardcover
ISBN: 978-0-387-98735-4

Table of contents

1. ANOVA and ordinary regression
2. Maximum likelihood estimation
3. Sampling
4. The Bayesian paradigm
5. Incomplete data
6. Imperfect measurement
7. Experiments and observational studies
8. Clinical trials
9. Random coefficients
10. Generalised linear models
11. Longitudinal and time-series analysis
12. Meta-analysis and estimating many quantities

Appendix: A refresher

Readership: Postgraduate students of statistics, statistical analysts and other professionals interested in the design and analysis of studies with responses from human subjects.

The preface notes several overarching themes: many statistical problems can be viewed as missing-data problems; small mean square error (efficiency) and its unbiased estimation (honesty) need to be emphasised; analysis is of information rather than of a single dataset; and study design is more important than analysis per se. The statistical topics are wide-ranging. It is the themes, rather than the topics, that give the book its coherence.

The frequentist perspective, definitions of discrete and continuous distributions, independence, density and distribution functions, common classes of distributions, sample design and measurement, statistical calculus, and simple hypothesis testing and confidence intervals are assumed known, although most of these topics are summarised in the appendices. This assumption of previous learning is the reason that the book, while published as a text in statistics, is classified as advanced. The publisher specifically notes that the book follows a novel curriculum. The link to the references is confined to a short paragraph at the end of each chapter, so that, given the author's publications constitute eighteen of the 198 references, it may not be completely clear, even to advanced students, which parts of the book are the author's own views and material. The index is somewhat limited, and could usefully be expanded in any future edition, given the length of the book. For example, although mentioned in the text, and some even as chapter subheadings, "sample size", "randomness", "multivariate distributions", "coherence", "incomplete data", and "honesty" are not indexed. These factors may complicate use as a textbook.

One of the core ideas of the book, implicit in the dedication to the fictional Shenki Xhadni (see, for example, Longford, N.T., Letters to the Editor, Significance, Vol. 4, Issue 1, March 2007, p. 46), is that hypothesis testing generally fails to take account of actual or perceived consequences, so that events with low probability but severe outcomes may alter the balance in favour of an alternative, even if statistically insignificant. There are links between this idea and prospect theory (Kahneman, D. & Tversky, A. (1979). Prospect theory: an analysis of decision under risk. Econometrica, Vol. 47, No. 2, pp. 263–291). Prospect theory has obvious parallels too with economic utility theory, and endeavours to model real-life choices rather than optimal decisions by using a sum, over all choices, of products of two functions, the first depending on the (real or perceived) value of a consequence, the other on the probability of its outcome. One possible limitation with this type of approach, which remains to be fully explored as the alternative to hypothesis testing and Bayesian methods that Longford proposes it be, is Kenneth Arrow's impossibility theorem (Arrow, K. J. (1951). Social Choice and Individual Values. John Wiley and Sons, New York). Arrow's result indicates that where value judgements are personal rather than objective (as they are in voting contexts, or in Shenki Xhadni's situation), such a decision framework can only be applied in a coherent manner by an individual operating in isolation, not universally by two or more people choosing among three or more alternatives.
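To make the kind of valuation prospect theory uses concrete, here is a minimal R sketch. It is not from Longford's book; the Tversky–Kahneman functional forms and the parameter values are assumptions chosen purely for illustration.

# Minimal prospect-theory valuation (assumed Tversky-Kahneman forms and
# parameter values; illustrative only, not code from the book under review).
pt_value <- function(x, alpha = 0.88, beta = 0.88, lambda = 2.25) {
  # Value function: concave for gains, convex and steeper for losses.
  ifelse(x >= 0, x^alpha, -lambda * (-x)^beta)
}
pt_weight <- function(p, gamma = 0.61) {
  # Probability weighting: small probabilities are overweighted.
  p^gamma / (p^gamma + (1 - p)^gamma)^(1 / gamma)
}
prospect_value <- function(outcomes, probs) {
  # A prospect is valued as the sum over outcomes of weighted probability times value.
  sum(pt_weight(probs) * pt_value(outcomes))
}
# A rare but severe loss attached to an otherwise small gain, versus a sure small gain:
prospect_value(c(-1000, 10), c(0.01, 0.99))
prospect_value(5, 1)

In this toy example the low-probability severe loss dominates the valuation of the first prospect, which is exactly the consequence-sensitive behaviour the reviewer contrasts with a bare significance test.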

In summary, the book provides a useful collection of material at postgraduate level and beyond, under broad themes. As a postgraduate textbook, its novel emphasis, its limited indexing and references, and its range of statistical sophistication suggest it may be better as a course supplement than as a sole text. For the wider readership, it can generally serve as a useful, consolidated reference on an extensive variety of topics.

Stephen Haslett: [email protected]
Institute of Fundamental Sciences – Statistics

Massey University, PO Box 11222, Palmerston North, New Zealand

Factor Analysis at 100: Historical Developments and Future Directions
Robert Cudeck, Robert C. MacCallum (Editors)
Lawrence Erlbaum Associates, 2007, xiv + 381 pages, US$ 38.70, hardcover
ISBN: 978-0-8058-5347-6

Table of contents

1. Factor analysis in the year 2004: still spry at 100
2. Three faces of factor analysis
3. Remembering L. L. Thurstone
4. Rethinking Thurstone
5. Factor analysis and its extensions
6. On the origins of latent curve models
7. Five steps in the structural factor analysis of longitudinal data
8. Factorial invariance: historical perspectives and new problems
9. Factor analysis models as approximations
10. Common factors versus components: principals and principles, errors and misconceptions
11. Understanding human intelligence since Spearman
12. Factoring at the individual level: some matters for the second century of factor analysis
13. Developments in the factor analysis of individual time series
14. Factor analysis and latent structure of categorical and metric data
15. Rotation methods, algorithms, and standard errors
16. A review of nonlinear factor analysis and nonlinear structural equation modeling

Readership: Statisticians interested in the current knowledge and the future of factor analysis and other latent variable models, as well as the history of these methods and the history of statistics in general. Some parts are more demanding mathematically, but overall the level of mathematical detail is not high.

This book is all about factor analysis (FA): its history, development, developers, theory, applications, and variations during the past 100 years. Sixteen clearly written chapters by established authors constitute a volume full of interesting details, both methodological and historical.

No, it is not just history. It is also a fresh look at the future, as the title suggests. But the exceptional history of FA deserves serious attention too, because it reflects so much of the history of statistical science in general. Indeed, FA can well be taken as one of the great success stories of statistics, although it was earlier overcriticized by many respected statisticians.

Today – and in the future – FA is seen as a general statistical model which links together various areas and methods of statistics, such as latent variable models, linear and nonlinear structural equation models, multivariate time series models, and longitudinal models – practically anything that consists of multidimensional structures and different sources of errors. The book is worth reading out of curiosity alone, but many of the chapters will also serve well as supplementary material for courses on these topics.

Kimmo Vehkalahti: [email protected]
Department of Mathematics and Statistics
FI-00014 University of Helsinki, Finland

Data Manipulation with R
Phil Spector
Springer, 2008, x + 154 pages, € 44.95 / £ 34.50 / US$ 54.95, softcover
ISBN: 978-0-387-74730-9

Table of contents

1. Data in R
2. Reading and writing data
3. R and databases
4. Dates
5. Factors
6. Subscripting
7. Character manipulation
8. Data aggregation
9. Reshaping data

Readership: Students and researchers using R.

We have recently seen a phenomenal growth in the use of R for data analysis and graphics, as evidenced by the number of R-related books now published and by the breadth of the presentations at the useR! conferences. This book, a volume in Springer's Use R! series, has a different emphasis from most R-related books, which focus on a particular type of statistical model or graphical technique or a particular audience. Instead, this comprehensive, compact and concise book provides all R users with a reference and guide to the mundane but terribly important topic of data manipulation in R. New R users, and even some experienced users, can be frustrated when trying to input, rearrange and summarize their data, especially when working with large data sets. On the R mailing lists we frequently see comments that this is awkward or slow or takes too much memory. Those who know the S language well (R is an implementation of the S language) and have taken the trouble to read the "R Data Import/Export" manual can usually suggest ways of making this process less agonizing but, until now, this knowledge was scattered across many different pieces of documentation or required considerable experience and experimentation to acquire. Phil Spector has taken his considerable experience – he was one of the first users of S outside of Bell Labs – and provided it to neophyte and expert alike in a compact volume which I have already recommended to students and faculty in my department. He even provides a brief introduction to databases and the Structured Query Language (SQL) in Chapter 3. This is a book that should be read and kept close at hand by everyone who uses R regularly. The time spent reading it will soon be recovered through more efficient use of R.
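Purely as an illustration of the kind of everyday task the book addresses (the data frame and column names below are hypothetical, not examples taken from the book), here is a short base-R sketch of aggregating and reshaping a data set:

# Hypothetical data: survey scores by region and year (not an example from the book).
set.seed(1)
surveys <- data.frame(
  region = rep(c("North", "South"), each = 50),
  year   = rep(2006:2007, times = 50),
  score  = rnorm(100, mean = 50, sd = 10)
)
# Aggregate: mean score for each region-year combination.
means <- aggregate(score ~ region + year, data = surveys, FUN = mean)
# Reshape: one row per region, one column per year.
wide <- reshape(means, idvar = "region", timevar = "year", direction = "wide")
print(wide)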

Douglas M. Bates: [email protected]
Department of Statistics, University of Wisconsin

1300 University Avenue, Madison, WI 53706-1532, USA

Data Quality and Record Linkage Techniques
Thomas N. Herzog, Fritz J. Scheuren, William E. Winkler
Springer, 2007, xiv + 234 pages, € 34.95 / £ 27.00 / US$ 44.95, softcover
ISBN: 978-0-387-69502-0

Table of contents

1. Introduction
Part 1: Data quality: what it is, why it is important, and how to achieve it
2. What is data quality and why should we care?
3. Examples of entities using data to their advantage/disadvantage
4. Properties of data quality and metrics for measuring it
5. Basic data quality tools
Part 2: Specialized tools for database improvement
6. Mathematical preliminaries for specialized data quality techniques
7. Automatic editing and imputation of sample survey data
8. Record linkage – methodology
9. Estimating the parameters of the Fellegi–Sunter record linkage model
10. Standardization and parsing
11. Phonetic coding systems for names
12. Blocking
13. String comparator metrics for typographical error
Part 3: Record linkage case studies
14. Duplicate FHA single-family mortgage records: a case study of data problems, consequences, corrective steps
15. Record linkage case studies in the medical, biomedical, and highway safety areas
16. Constructing list frames and administrative lists
17. Social security and related topics
Part 4: Other topics
18. Confidentiality: maximizing access to micro-data while protecting privacy
19. Review of record linkage software
20. Summary chapter

Readership: Practitioners in database management and information retrieval, official statisticians, survey statisticians.

Databases pervade our lives. This book provides a gentle but thorough introduction, by three acknowledged experts, to the topic of database quality and techniques for linking records within and between databases. The book is written as a primer rather than a detailed reference for researchers, but an extensive reference list is provided for follow-up on more technical material. Apart from the first section of Chapter 6 (which considers conditional probability) and the first few sections of Chapter 9 (which consider the Fellegi–Sunter record linkage model and the EM algorithm), the mathematical level is deliberately low, seldom extending beyond logarithms and ratios, in order to reach the widest possible audience. Even for complicated topics, the focus is on explanation of principles, examples, case studies and anecdotes, rather than the underlying mathematics. Applications are given from mortgage guarantee and social insurance, medicine, biomedicine, and highway safety. There is also material on construction of list frames and administrative lists, and good information is provided on currently available computer packages. Paradoxically perhaps, this emphasis seems just as it needs to be to help ensure that the mathematical and statistical aspects of databases and their management are not relegated to the status of 'esoteric curiosity'.
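For readers unfamiliar with the Fellegi–Sunter model mentioned above, the following minimal R sketch shows the basic idea of field-level agreement weights. It is not code from the book; the m- and u-probabilities and the decision thresholds are invented for illustration, whereas in practice they would be estimated, for instance by the EM algorithm the book discusses.

# Minimal Fellegi-Sunter match scoring (illustrative probabilities and thresholds;
# not code from the book under review).
m <- c(name = 0.95, birthyear = 0.90, zip = 0.85)  # P(field agrees | records truly match)
u <- c(name = 0.05, birthyear = 0.10, zip = 0.15)  # P(field agrees | records do not match)

fs_score <- function(agree) {
  # Sum of log2 likelihood ratios: positive for agreements, negative for disagreements.
  sum(ifelse(agree, log2(m / u), log2((1 - m) / (1 - u))))
}

# A candidate record pair agreeing on name and zip code but not on birth year:
total <- fs_score(c(name = TRUE, birthyear = FALSE, zip = TRUE))
if (total > 3) "link" else if (total < -3) "non-link" else "clerical review"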

The core problem the book does much to remedy is that the principles it explores are too often perceived as of little importance or use by commercial database designers and managers, and even sometimes by official statisticians. In order to reach this wider audience, the book is a considerably lighter read than I was expecting (which was not a disappointment), and I think this impression would be shared by any graduate in statistics, and even by many whose statistical skills were rather below this level.

The book provides a good, sound, verbal introduction and summary, and a useful point of departure into the more technical side of database quality and record linkage problems. In summary, it should be a core sourcebook for non-mathematical statisticians in official statistics agencies, and for database designers and managers in government and commerce. It also provides a useful introduction to this important topic, and a comprehensive reference list for further study, for professional statisticians and academics.

Stephen Haslett: [email protected]
Institute of Fundamental Sciences – Statistics

Massey University, PO Box 11222, Palmerston North, New Zealand

Sampling Techniques for Forest Inventories
Daniel Mandallaz
Chapman and Hall/CRC, 2007, xv + 256 pages, £ 42.99 / US$ 79.95, hardcover
ISBN: 978-1-58488-976-2

Table of contents

1. Introduction and terminology
2. Sampling finite populations: the essentials
3. Sampling finite populations: advanced topics
4. Forest inventory: one stage sampling schemes
5. Forest inventory: two stage sampling schemes
6. Forest inventory: advanced topics
7. Geostatistics
8. Case study
9. Optimal sampling schemes for forest inventory
10. The Swiss National Forest Inventory
11. Estimating change and growth
12. Transect-sampling
Appendices
A. Simulations
B. Conditional expectations and variances
C. Solutions to selected exercises

Readership: Graduates and professionals in forestry and forestry management, applied statisticians, survey statisticians, and statisticians interested in the statistical theory of sampling for forest inventory.

This is an important reference for those wanting to understand the theory of sampling in forest inventory, and also for those with graduate or postgraduate-level skills in statistics who apply these techniques in the forestry industry. Despite its length, the book provides reasonably thorough coverage of the theory of statistics applied to forest inventories.


The general topic is a very broad one. The author has consequently been forced to choose to some extent, and has opted for mathematical rigour over coverage of a wider range of forestry-related topics or an extensive collection of case studies to illustrate principles. Applied statistics books in specified application areas often do not indicate how formulae have been derived, so this focus makes the book a welcome and useful addition to the forest inventory literature, even if its rigour may restrict readership. As the author states in the preface, "This exposition is as general and concise as possible". The approach does however have some disadvantages, as it makes practical points more difficult to see, and the need for brevity means that some terms (e.g. stochastic integral, order of approximation, convergence in probability, asymptotic equivalence) remain essentially undefined. To extend readership, explicit definitions of, or at least more references to, these concepts would have been a useful addition. A certain level of knowledge of forest inventory is also assumed. The book has a nice treatment of design-based, model-based and model-assisted survey sampling, including the Horvitz–Thompson estimator and more advanced topics such as three-stage element sampling and model-based and model-assisted estimation procedures such as GREG (generalised regression estimation). Use is made of anticipated variance for optimal design of forest inventories, and this is illustrated using data from the Swiss National Forest Inventory. Estimation of growth, and transect sampling using a stereological approach, are outlined. There is also material on methods of sampling, developed by the author, that involve using locally observed density as a random function. As noted in Chapter 6, "... microscopic models at tree level and macroscopic models at point level are generally incompatible ...", so that while local observed density models hold promise, further research is warranted before they become a common alternative to more standard forestry sampling schemes.
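For readers less familiar with the design-based machinery referred to here, a minimal R sketch of the Horvitz–Thompson estimator of a population total is given below. It is illustrative only; the simulated 'tree volumes' and the invented inclusion probabilities are assumptions for the example, not material from the book.

# Minimal Horvitz-Thompson estimate of a population total (illustrative only;
# the 'tree volume' data and inclusion probabilities are simulated, not from the book).
set.seed(2)
N  <- 1000
y  <- rgamma(N, shape = 2, scale = 50)      # e.g. tree volumes for the whole population
pi <- runif(N, 0.02, 0.10)                  # known inclusion probabilities
s  <- which(rbinom(N, 1, pi) == 1)          # Poisson sampling with those probabilities
ht_total <- sum(y[s] / pi[s])               # Horvitz-Thompson estimator of sum(y)
c(estimate = ht_total, true_total = sum(y))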

The book makes extensive and generally careful use of matrix algebra, although I would have liked to see matrix properties more clearly stated in some places; e.g. equation (3.24) involves a covariance matrix that is itself the product of a covariance matrix and a sample weighting matrix (since the product can only be symmetric, as required, if both matrices in the product are diagonal or share (or, where eigenvalues are equal, can be constructed to share) all eigenvectors). The coverage of small area estimation is brief and limited to situations where all small areas contain sampled elements. Although the emphasis is on estimating summary statistics (e.g. totals and means), there is also material on analysis of complex survey data and on modelling of relationships between variables. Additional references to all three topics would have been useful. The assumption is made (p. 54) that all responses (i.e. measurements) are "assumed to be error free". This is a rather strong assumption in forest inventory, where height, volume, and even DBH (diameter at breast height) can sometimes be difficult to measure, and extension of the methods to situations where there is measurement error would possibly be a useful future addition (especially since, even if height and DBH are unbiased, estimated volume need not be). In summary, however, this is a very useful, up-to-date reference book on the theory of statistics as it should be applied to forest inventory.

Stephen Haslett: [email protected]
Institute of Fundamental Sciences – Statistics

Massey University, PO Box 11222, Palmerston North, New Zealand


Statistical Methods for Human Rights
Jana Asher, David Banks, Fritz J. Scheuren (Editors)
Springer, 2008, xxii + 339 pages, € 32.95 / £ 25.50 / US$ 39.95, softcover
ISBN: 978-0-387-72836-0

Table of contents

Part I Statistical Thinking on Human Rights Topics
1. Introduction
2. The statistics of genocide
3. Why estimate direct and indirect casualties from war? The rule of proportionality and casualty estimates
4. Statistical thinking and data analysis: enhancing human rights work
Part II Recent Projects
5. Hidden in plain sight: X.X. burials and the desaparecidos in the department of Guatemala 1977–1986
6. The demography of conflict-related mortality in Timor-Leste 1974–1999: reflections on empirical quantitative measurement of civilian killings, disappearances, and famine-related deaths
7. Afghan refugee camp surveys in Pakistan 2002
8. Metagora: an experiment in the measurement of democratic governance
Part III History and Future Possibilities
9. Human rights of statisticians and statistics of human rights: early history of the American Statistical Association's committee on scientific freedom and human rights
10. Obtaining evidence for the international criminal court using data and quantitative analysis
11. New issues in human rights statistics
12. Statistics and the millennium development goals
Part IV A Final Word of Warning
13. Using population data systems to target vulnerable population subgroups and individuals: Issues and incidents

Readership: Researchers and graduate students.

This book describes the statistics that underlie social science research in human rights. It contains case studies, methodology, and research papers that discuss the fundamental measurement issues. It is intended as an introduction to applied human rights work.

Publications on human rights issues have been around for some time now; however, this book is unique in that it investigates the statistics used in the service of human rights and democracy. The book is an international collection of papers on current research in the area of human rights. It is full of interesting and useful examples to use when teaching statistics, particularly for social science students. The chapters contain a summary/conclusion at the end and a wealth of references for the reader to pursue. It is not a textbook with questions and answers to be used in a classroom, so educators will have to use the material in the chapters to provide their own questions to assess learning.

A useful book that helps researchers present and measure human rights issues with clarity.

Susan Starkings: [email protected]
London South Bank University

103 Borough Road, London SE1 0AA, UK
