
Springer Texts in Statistics

More information about this series at http://www.springer.com/series/417

Series Editors

R. DeVeaux

S.E. Fienberg

I. Olkin


Matthew A. Carlton • Jay L. Devore

Probability with Applications in Engineering, Science, and Technology

Second Edition


Matthew A. Carlton
Department of Statistics
California Polytechnic State University
San Luis Obispo, CA, USA

Jay L. Devore
Department of Statistics
California Polytechnic State University
San Luis Obispo, CA, USA

ISSN 1431-875X    ISSN 2197-4136 (eBook)
Springer Texts in Statistics
ISBN 978-3-319-52400-9    ISBN 978-3-319-52401-6 (eBook)
DOI 10.1007/978-3-319-52401-6

Library of Congress Control Number: 2017932278

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Additional material to this book can be downloaded from http://extras.springer.com.

© Springer International Publishing AG 2014, 2017


Preface

Purpose

Our objective is to provide a post-calculus introduction to the subject of probability that

• Has mathematical integrity and contains some underlying theory
• Shows students a broad range of applications involving real problem scenarios
• Is current in its selection of topics
• Is accessible to a wide audience, including mathematics and statistics majors (yes, there are a few of the latter, and their numbers are growing), prospective engineers and scientists, and business and social science majors interested in the quantitative aspects of their disciplines
• Illustrates the importance of software for carrying out simulations when answers to questions cannot be obtained analytically

A number of currently available probability texts are heavily oriented toward a rigorous mathematical development of probability, with much emphasis on theorems, proofs, and derivations. Even when applied material is included, the scenarios are often contrived (many examples and exercises involving dice, coins, cards, and widgets). So in our exposition we have tried to achieve a balance between mathematical foundations and the application of probability to real-world problems. It is our belief that the theory of probability by itself is often not enough of a “hook” to get students interested in further work in the subject. We think that the best way to persuade students to continue their probabilistic education beyond a first course is to show them how the methodology is used in practice. Let’s first seduce them (figuratively speaking, of course) with intriguing problem scenarios and applications. Opportunities for exposure to mathematical rigor will follow in due course.

Content

The book begins with an Introduction, which contains our attempt to address the following question: “Why study probability?” Here we are trying to tantalize students with a number of intriguing problem scenarios—coupon collection, birth and death processes, reliability engineering, finance, queuing models, and various conundrums involving the misinterpretation of probabilistic information (e.g., Benford’s Law and the detection of fraudulent data, birthday problems, and the likelihood of having a rare disease when a diagnostic test result is positive). Most of the exposition contains references to recently published results. It is not necessary or even desirable to cover very much of this motivational material in the classroom. Instead, we suggest that instructors ask their students to read selectively outside class (a bit of pleasure reading at the very beginning of the term should not be an undue burden!). Subsequent chapters make little reference to the examples herein, and separating out our “pep talk” should make it easier to cover as little or as much as an instructor deems appropriate.

Chapter 1 covers sample spaces and events, the axioms of probability and derived properties, counting, conditional probability, and independence. Discrete random variables and distributions are the subject of Chap. 2, and Chap. 3 introduces continuous random variables and their distributions. Joint probability distributions are the focus of Chap. 4, including marginal and conditional distributions, expectation of a function of several variables, correlation, modes of convergence, the Central Limit Theorem, reliability of systems of components, the distribution of a linear combination, and some results on order statistics. These four chapters constitute the core of the book.

The remaining chapters build on the core in various ways. Chapter 5 introduces methods of statistical inference—point estimation, the use of statistical intervals, and hypothesis testing. In Chap. 6 we cover basic properties of discrete-time Markov chains. Various other random processes and their properties, including stationarity and its consequences, Poisson processes, Brownian motion, and continuous-time Markov chains, are discussed in Chap. 7. The final chapter presents some elementary concepts and methods in the area of signal processing.

One feature of our book that distinguishes it from the competition is a section at the end of almost every chapter that considers simulation methods for getting approximate answers when exact results are difficult or impossible to obtain. Both the R software and Matlab are employed for this purpose.

Another noteworthy aspect of the book is the inclusion of roughly 1100 exercises; the first four core chapters together have about 700 exercises. There are numerous exercises at the end of each section and also supplementary exercises at the end of every chapter. Probability at its heart is concerned with problem solving. A student cannot hope to really learn the material simply by sitting passively in the classroom and listening to the instructor. He/she must get actively involved in working problems. To this end, we have provided a wide spectrum of exercises, ranging from straightforward to reasonably challenging. It should be easy for an instructor to find enough problems at various levels of difficulty to keep students gainfully occupied.

Mathematical Level

The challenge for students at this level should be to master the concepts and methods to a sufficient degree that problems encountered in the real world can be solved. Most of our exercises are of this type, and relatively few ask for proofs or derivations. Consequently, the mathematical prerequisites and demands are reasonably modest. Mathematical sophistication and quantitative reasoning ability are, of course, crucial to the enterprise. Univariate calculus is employed in the continuous distribution calculations of Chap. 3 as well as in obtaining maximum likelihood estimators in the inference chapter. But even here the functions we ask students to work with are straightforward—generally polynomials, exponentials, and logs. A stronger background is required for the signal processing material at the end of the book (we have included a brief mathematical appendix as a refresher for relevant properties). Multivariate calculus is used in the section on joint distributions in Chap. 4 and thereafter appears rather rarely. Exposure to matrix algebra is needed for the Markov chain material.

Recommended Coverage

Our book contains enough material for a year-long course, though we expect that many instructors will use it for a single term (one semester or one quarter). To give a sense of what might be reasonable, we now briefly describe three courses at our home institution, Cal Poly State University (in San Luis Obispo, CA), for which this book is appropriate. Syllabi with expanded course outlines are available for download on the book’s website at Springer.com.

Course 1
  Title: Introduction to Probability and Simulation
  Main audience: Statistics and math majors
  Prerequisites: Univariate calculus, computer programming
  Sections covered: 1.1–1.6; 2.1–2.6, 2.8; 3.1–3.4, 3.8; 4.1–4.3, 4.5

Course 2
  Title: Introduction to Probability Models
  Main audience: Statistics and math majors
  Prerequisites: Univariate calculus, computer programming, matrix algebra
  Sections covered: 1.1–1.6; 2.1–2.5, 2.8; 3.1–3.4, 3.8; 4.1–4.3, 4.5, 4.8; 6.1–6.5; 7.5

Course 3
  Title: Probability and Random Processes for Engineers
  Main audience: Electrical and computer engineering majors
  Prerequisites: Multivariate calculus, continuous-time signals incl. Fourier analysis
  Sections covered: 1.1–1.5; 2.1–2.5; 3.1–3.5; 4.1–4.3, 4.5, 4.7; 7.1–7.3, 7.5–7.6; 8.1–8.2

Both of the first two courses place heavy emphasis on computer simulation of random phenomena; instructors typically have students work in R. As is evident from the lists of sections covered, Introduction to Probability Models takes the earlier material at a faster pace in order to leave a few weeks at the end for Markov chains and some other applications (typically reliability theory and a bit about Poisson processes). In our experience, the computer programming prerequisite is essential for students’ success in those two courses.

The third course listed, Probability and Random Processes for Engineers, is our university’s version of the traditional “random signals and noise” course offered by many electrical engineering departments. Again, the first four chapters are covered at a somewhat accelerated pace, with about 30–40% of the course dedicated to time and frequency representations of random processes (Chaps. 7 and 8). Simulation of random phenomena is not emphasized in our course, though we make liberal use of Matlab for demonstrations.

We are able to cover as much material as indicated on the foregoing syllabi with the aid of a not-so-secret weapon: we prepare and require that students bring to class a course booklet. The booklet contains most of the examples we present as well as some surrounding material. A typical example begins with a problem statement and then poses several questions (as in the exercises in this book). After each posed question there is some blank space so the student can either take notes as the solution is developed in class or else work the problem on his/her own if asked to do so. Because students have a booklet, the instructor does not have to write as much on the board as would otherwise be necessary and the student does not have to do as much writing to take notes. Both the instructor and the students benefit.

We also like to think that students can be asked to read an occasional subsection or even section on their own and then work exercises to demonstrate understanding, so that not everything needs to be presented in class. For example, we have found that assigning a take-home exam problem that requires reading about the Weibull and/or lognormal distributions is a good way to acquaint students with them. But instructors should always keep in mind that there is never enough time in a course of any duration to teach students all that we’d like them to know. Hopefully students will like the book enough to keep it after the course is over and use it as a basis for extending their knowledge of probability!

Acknowledgments

We gratefully acknowledge the plentiful feedback provided by the following reviewers: Allan Gut, Murad Taqqu, Mark Schilling, and Robert Heiny.

We very much appreciate the production services provided by the folks at SPi Technologies. Our production editors, Sasireka K. and Maria David, did a first-rate job of moving the book through the production process and were always prompt and considerate in communications with us. Thanks to our copyeditors at SPi for employing a light touch and not taking us too much to task for our occasional grammatical and stylistic lapses. The staff at Springer U.S. has been especially supportive during both the developmental and production stages; special kudos go to Michael Penn and Rebekah McClure.

A Final Thought

It is our hope that students completing a course taught from this book will feel as passionately about the subject of probability as we still do after so many years of living with it. Only teachers can really appreciate how gratifying it is to hear from a student after he/she has completed a course that the experience had a positive impact and maybe even affected a career choice.

San Luis Obispo, CA Matthew A. Carlton

San Luis Obispo, CA Jay L. Devore


Contents

1 Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1 Sample Spaces and Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1.1 The Sample Space of an Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1.2 Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.1.3 Some Relations from Set Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.1.4 Exercises: Section 1.1 (1–12) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.2 Axioms, Interpretations, and Properties of Probability . . . . . . . . . . . . . . . . . . . . . 7

1.2.1 Interpreting Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.2.2 More Probability Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.2.3 Determining Probabilities Systematically . . . . . . . . . . . . . . . . . . . . . . . 13

1.2.4 Equally Likely Outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.2.5 Exercises: Section 1.2 (13–30) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.3 Counting Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

1.3.1 The Fundamental Counting Principle . . . . . . . . . . . . . . . . . . . . . . . . . . 18

1.3.2 Tree Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

1.3.3 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

1.3.4 Combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

1.3.5 Exercises: Section 1.3 (31–49) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

1.4 Conditional Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

1.4.1 The Definition of Conditional Probability . . . . . . . . . . . . . . . . . . . . . . . 30

1.4.2 The Multiplication Rule for P(A ∩ B) . . . . . . . . . . . . . . . . . . . . . . . . . . 32

1.4.3 The Law of Total Probability and Bayes’ Theorem . . . . . . . . . . . . . . . . 34

1.4.4 Exercises: Section 1.4 (50–78) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

1.5 Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

1.5.1 P(A ∩ B) When Events Are Independent . . . . . . . . . . . . . . . . . . . . . . . . 44

1.5.2 Independence of More than Two Events . . . . . . . . . . . . . . . . . . . . . . . . 45

1.5.3 Exercises: Section 1.5 (79–100) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

1.6 Simulation of Random Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

1.6.1 The Backbone of Simulation: Random Number Generators . . . . . . . . . . 51

1.6.2 Precision of Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

1.6.3 Exercises: Section 1.6 (101–120) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

1.7 Supplementary Exercises (121–150) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

2 Discrete Random Variables and Probability Distributions . . . . . . . . . . . . . . . . . . . . 67

2.1 Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

2.1.1 Two Types of Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

2.1.2 Exercises: Section 2.1 (1–10) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70


2.2 Probability Distributions for Discrete Random Variables . . . . . . . . . . . . . . . . . . . 71

2.2.1 A Parameter of a Probability Distribution . . . . . . . . . . . . . . . . . . . . . . . 74

2.2.2 The Cumulative Distribution Function . . . . . . . . . . . . . . . . . . . . . . . . . . 75

2.2.3 Another View of Probability Mass Functions . . . . . . . . . . . . . . . . . . . . . 78

2.2.4 Exercises: Section 2.2 (11–28) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

2.3 Expected Value and Standard Deviation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

2.3.1 The Expected Value of X . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

2.3.2 The Expected Value of a Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

2.3.3 The Variance and Standard Deviation of X . . . . . . . . . . . . . . . . . . . . . . 88

2.3.4 Properties of Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

2.3.5 Exercises: Section 2.3 (29–48) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

2.4 The Binomial Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

2.4.1 The Binomial Random Variable and Distribution . . . . . . . . . . . . . . . . . . 97

2.4.2 Computing Binomial Probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

2.4.3 The Mean and Variance of a Binomial Random Variable . . . . . . . . . . . . 101

2.4.4 Binomial Calculations with Software . . . . . . . . . . . . . . . . . . . . . . . . . . 102

2.4.5 Exercises: Section 2.4 (49–74) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

2.5 The Poisson Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

2.5.1 The Poisson Distribution as a Limit . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

2.5.2 The Mean and Variance of a Poisson Random Variable . . . . . . . . . . . . . 110

2.5.3 The Poisson Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

2.5.4 Poisson Calculations with Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

2.5.5 Exercises: Section 2.5 (75–89) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

2.6 Other Discrete Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

2.6.1 The Hypergeometric Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

2.6.2 The Negative Binomial and Geometric Distributions . . . . . . . . . . . . . . . 117

2.6.3 Alternative Definition of the Negative Binomial Distribution . . . . . . . . . 120

2.6.4 Exercises: Section 2.6 (90–106) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

2.7 Moments and Moment Generating Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

2.7.1 The Moment Generating Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

2.7.2 Obtaining Moments from the MGF . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

2.7.3 MGFs of Common Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

2.7.4 Exercises: Section 2.7 (107–128) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

2.8 Simulation of Discrete Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

2.8.1 Simulations Implemented in R and Matlab . . . . . . . . . . . . . . . . . . . . . . 134

2.8.2 Simulation Mean, Standard Deviation, and Precision . . . . . . . . . . . . . . . 135

2.8.3 Exercises: Section 2.8 (129–141) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

2.9 Supplementary Exercises (142–170) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140

3 Continuous Random Variables and Probability Distributions . . . . . . . . . . . . . . . . . 147

3.1 Probability Density Functions and Cumulative Distribution Functions . . . . . . . . . 147

3.1.1 Probability Distributions for Continuous Variables . . . . . . . . . . . . . . . . . 148

3.1.2 The Cumulative Distribution Function . . . . . . . . . . . . . . . . . . . . . . . . . . 152

3.1.3 Using F(x) to Compute Probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . 154

3.1.4 Obtaining f(x) from F(x) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

3.1.5 Percentiles of a Continuous Distribution . . . . . . . . . . . . . . . . . . . . . . . . 156

3.1.6 Exercises: Section 3.1 (1–18) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

3.2 Expected Values and Moment Generating Functions . . . . . . . . . . . . . . . . . . . . . . 162

3.2.1 Expected Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162

3.2.2 Moment Generating Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

3.2.3 Exercises: Section 3.2 (19–38) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168


3.3 The Normal (Gaussian) Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

3.3.1 The Standard Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

3.3.2 Non-standardized Normal Distributions . . . . . . . . . . . . . . . . . . . . . . . . . 175

3.3.3 The Normal MGF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178

3.3.4 The Normal Distribution and Discrete Populations . . . . . . . . . . . . . . . . . 179

3.3.5 Approximating the Binomial Distribution . . . . . . . . . . . . . . . . . . . . . . . 180

3.3.6 Normal Distribution Calculations with Software . . . . . . . . . . . . . . . . . . 182

3.3.7 Exercises: Section 3.3 (39–70) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182

3.4 The Exponential and Gamma Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187

3.4.1 The Exponential Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188

3.4.2 The Gamma Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190

3.4.3 The Gamma MGF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193

3.4.4 Gamma and Exponential Calculations with Software . . . . . . . . . . . . . . . 193

3.4.5 Exercises: Section 3.4 (71–83) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194

3.5 Other Continuous Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196

3.5.1 The Weibull Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196

3.5.2 The Lognormal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199

3.5.3 The Beta Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201

3.5.4 Exercises: Section 3.5 (84–100) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202

3.6 Probability Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205

3.6.1 Sample Percentiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205

3.6.2 A Probability Plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206

3.6.3 Departures from Normality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

3.6.4 Beyond Normality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

3.6.5 Probability Plots in Matlab and R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213

3.6.6 Exercises: Section 3.6 (101–111) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213

3.7 Transformations of a Random Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216

3.7.1 Exercises: Section 3.7 (112–128) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220

3.8 Simulation of Continuous Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . 221

3.8.1 The Inverse CDF Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221

3.8.2 The Accept–Reject Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224

3.8.3 Built-In Simulation Packages for Matlab and R . . . . . . . . . . . . . . . . . . . 227

3.8.4 Precision of Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227

3.8.5 Exercises: Section 3.8 (129–139) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228

3.9 Supplementary Exercises (140–172) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230

4 Joint Probability Distributions and Their Applications . . . . . . . . . . . . . . . . . . . . . . 239

4.1 Jointly Distributed Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239

4.1.1 The Joint Probability Mass Function for Two Discrete Random Variables . . . . . . . . . . . . 240

4.1.2 The Joint Probability Density Function for Two Continuous Random Variables . . . . . . . . . 241

4.1.3 Independent Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245

4.1.4 More Than Two Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . 246

4.1.5 Exercises: Section 4.1 (1–22) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249

4.2 Expected Values, Covariance, and Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . 255

4.2.1 Properties of Expected Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256

4.2.2 Covariance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257

4.2.3 Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259


4.2.4 Correlation Versus Causation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262

4.2.5 Exercises: Section 4.2 (23–42) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262

4.3 Properties of Linear Combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264

4.3.1 The PDF of a Sum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268

4.3.2 Moment Generating Functions for Linear Combinations . . . . . . . . . . . . . 270

4.3.3 Exercises: Section 4.3 (43–65) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272

4.4 Conditional Distributions and Conditional Expectation . . . . . . . . . . . . . . . . . . . . 277

4.4.1 Conditional Distributions and Independence . . . . . . . . . . . . . . . . . . . . . 279

4.4.2 Conditional Expectation and Variance . . . . . . . . . . . . . . . . . . . . . . . . . . 280

4.4.3 The Laws of Total Expectation and Variance . . . . . . . . . . . . . . . . . . . . . 281

4.4.4 Exercises: Section 4.4 (66–84) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286

4.5 Limit Theorems (What Happens as n Gets Large) . . . . . . . . . . . . . . . . . . . . . . . . 290

4.5.1 Random Samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290

4.5.2 The Central Limit Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293

4.5.3 Other Applications of the Central Limit Theorem . . . . . . . . . . . . . . . . . 297

4.5.4 The Law of Large Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299

4.5.5 Exercises: Section 4.5 (85–102) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300

4.6 Transformations of Jointly Distributed Random Variables . . . . . . . . . . . . . . . . . . 302

4.6.1 The Joint Distribution of Two New Random Variables . . . . . . . . . . . . . 303

4.6.2 The Joint Distribution of More Than Two New Variables . . . . . . . . . . . 306

4.6.3 Exercises: Section 4.6 (103–110) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307

4.7 The Bivariate Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309

4.7.1 Conditional Distributions of X and Y . . . . . . . . . . . . . . . . . . . . . . . . . . . 311

4.7.2 Regression to the Mean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312

4.7.3 The Multivariate Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . 312

4.7.4 Bivariate Normal Calculations with Software . . . . . . . . . . . . . . . . . . . . 313

4.7.5 Exercises: Section 4.7 (111–120) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313

4.8 Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315

4.8.1 The Reliability Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315

4.8.2 Series and Parallel Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317

4.8.3 Mean Time to Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320

4.8.4 Hazard Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321

4.8.5 Exercises: Section 4.8 (121–132) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323

4.9 Order Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326

4.9.1 The Distributions of Yn and Y1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326

4.9.2 The Distribution of the ith Order Statistic . . . . . . . . . . . . . . . . . . . . . . . 328

4.9.3 The Joint Distribution of the n Order Statistics . . . . . . . . . . . . . . . . . . . 329

4.9.4 Exercises: Section 4.9 (133–142) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331

4.10 Simulation of Joint Probability Distributions and System Reliability . . . . . . . . . . 332

4.10.1 Simulating Values from a Joint PMF . . . . . . . . . . . . . . . . . . . . . . . . . . . 332

4.10.2 Simulating Values from a Joint PDF . . . . . . . . . . . . . . . . . . . . . . . . . . . 334

4.10.3 Simulating a Bivariate Normal Distribution . . . . . . . . . . . . . . . . . . . . . . 336

4.10.4 Simulation Methods for Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338

4.10.5 Exercises: Section 4.10 (143–153) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340

4.11 Supplementary Exercises (154–192) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342

5 The Basics of Statistical Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351

5.1 Point Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352

5.1.1 Estimates and Estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354

5.1.2 Assessing Estimators: Accuracy and Precision . . . . . . . . . . . . . . . . . . . . 357

5.1.3 Exercises: Section 5.1 (1–23) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360


5.2 Maximum Likelihood Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366

5.2.1 Some Properties of MLEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372

5.2.2 Exercises: Section 5.2 (24–36) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373

5.3 Confidence Intervals for a Population Mean . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375

5.3.1 A Confidence Interval for a Normal Population Mean . . . . . . . . . . . . . . 376

5.3.2 A Large-Sample Confidence Interval for μ . . . . . . . . . . . . . . . . . . . . . . 380

5.3.3 Software for Confidence Interval Calculation . . . . . . . . . . . . . . . . . . . . . 381

5.3.4 Exercises: Section 5.3 (37–50) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382

5.4 Testing Hypotheses About a Population Mean . . . . . . . . . . . . . . . . . . . . . . . . . . 386

5.4.1 Hypotheses and Test Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386

5.4.2 Test Procedures for Hypotheses About a Population Mean μ . . . . . . . . . 388

5.4.3 P-Values and the One-Sample t Test . . . . . . . . . . . . . . . . . . . . . . . . . . . 389

5.4.4 Errors in Hypothesis Testing and the Power of a Test . . . . . . . . . . . . . . 392

5.4.5 Software for Hypothesis Test Calculation . . . . . . . . . . . . . . . . . . . . . . . 395

5.4.6 Exercises: Section 5.4 (51–76) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396

5.5 Inferences for a Population Proportion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401

5.5.1 Confidence Intervals for p . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401

5.5.2 Hypothesis Testing for p . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403

5.5.3 Software for Inferences about p . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405

5.5.4 Exercises: Section 5.5 (77–97) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405

5.6 Bayesian Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409

5.6.1 The Posterior Distribution of a Parameter . . . . . . . . . . . . . . . . . . . . . . . 410

5.6.2 Inferences from the Posterior Distribution . . . . . . . . . . . . . . . . . . . . . . . 413

5.6.3 Further Comments on Bayesian Inference . . . . . . . . . . . . . . . . . . . . . . . 413

5.6.4 Exercises: Section 5.6 (98–106) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414

5.7 Supplementary Exercises (107–138) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 416

6 Markov Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423

6.1 Terminology and Basic Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423

6.1.1 The Markov Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426

6.1.2 Exercises: Section 6.1 (1–10) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428

6.2 The Transition Matrix and the Chapman–Kolmogorov Equations . . . . . . . . . . . . 431

6.2.1 The Transition Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431

6.2.2 Computation of Multistep Transition Probabilities . . . . . . . . . . . . . . . . . 432

6.2.3 Exercises: Section 6.2 (11–22) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436

6.3 Specifying an Initial Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 440

6.3.1 A Fixed Initial State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443

6.3.2 Exercises: Section 6.3 (23–30) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444

6.4 Regular Markov Chains and the Steady-State Theorem . . . . . . . . . . . . . . . . . . . . 446

6.4.1 Regular Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446

6.4.2 The Steady-State Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 448

6.4.3 Interpreting the Steady-State Distribution . . . . . . . . . . . . . . . . . . . . . . . 450

6.4.4 Efficient Computation of Steady-State Probabilities . . . . . . . . . . . . . . . . 451

6.4.5 Irreducible and Periodic Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453

6.4.6 Exercises: Section 6.4 (31–43) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454

6.5 Markov Chains with Absorbing States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457

6.5.1 Time to Absorption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458

6.5.2 Mean Time to Absorption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461

6.5.3 Mean First Passage Times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465

6.5.4 Probabilities of Eventual Absorption . . . . . . . . . . . . . . . . . . . . . . . . . . . 466

6.5.5 Exercises: Section 6.5 (44–58) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469


6.6 Simulation of Markov Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472

6.6.1 Exercises: Section 6.6 (59–66) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 479

6.7 Supplementary Exercises (67–82) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481

7 Random Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489

7.1 Types of Random Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489

7.1.1 Classification of Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493

7.1.2 Random Processes Regarded as Random Variables . . . . . . . . . . . . . . . . 493

7.1.3 Exercises: Section 7.1 (1–10) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 494

7.2 Properties of the Ensemble: Mean and Autocorrelation Functions . . . . . . . . . . . . 496

7.2.1 Mean and Variance Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496

7.2.2 Autocovariance and Autocorrelation Functions . . . . . . . . . . . . . . . . . . . 499

7.2.3 The Joint Distribution of Two Random Processes . . . . . . . . . . . . . . . . . 502

7.2.4 Exercises: Section 7.2 (11–24) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503

7.3 Stationary and Wide-Sense Stationary Processes . . . . . . . . . . . . . . . . . . . . . . . . 504

7.3.1 Properties of Wide-Sense Stationary Processes . . . . . . . . . . . . . . . . . . . 508

7.3.2 Ergodic Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511

7.3.3 Exercises: Section 7.3 (25–40) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514

7.4 Discrete-Time Random Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 516

7.4.1 Special Discrete Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 518

7.4.2 Exercises: Section 7.4 (41–52) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 520

7.5 Poisson Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 522

7.5.1 Relation to Exponential and Gamma Distributions . . . . . . . . . . . . . . . . . 524

7.5.2 Combining and Decomposing Poisson Processes . . . . . . . . . . . . . . . . . . 526

7.5.3 Alternative Definition of a Poisson Process . . . . . . . . . . . . . . . . . . . . . . 528

7.5.4 Nonhomogeneous Poisson Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . 530

7.5.5 The Poisson Telegraphic Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531

7.5.6 Exercises: Section 7.5 (53–72) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 532

7.6 Gaussian Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 535

7.6.1 Brownian Motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536

7.6.2 Brownian Motion as a Limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 538

7.6.3 Further Properties of Brownian Motion . . . . . . . . . . . . . . . . . . . . . . . . . 538

7.6.4 Variations on Brownian Motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541

7.6.5 Exercises: Section 7.6 (73–85) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541

7.7 Continuous-Time Markov Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 544

7.7.1 Infinitesimal Parameters and Instantaneous Transition Rates . . . . . . . . . . 546

7.7.2 Sojourn Times and Transitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 548

7.7.3 Long-Run Behavior of Continuous-Time Markov Chains . . . . . . . . . . . . 552

7.7.4 Explicit Form of the Transition Matrix . . . . . . . . . . . . . . . . . . . . . . . . . 554

7.7.5 Exercises: Section 7.7 (86–97) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 556

7.8 Supplementary Exercises (98–114) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 559

8 Introduction to Signal Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563

8.1 Power Spectral Density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563

8.1.1 Properties of the Power Spectral Density . . . . . . . . . . . . . . . . . . . . . . . . 566

8.1.2 Power in a Frequency Band . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 569

8.1.3 White Noise Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 570

8.1.4 Power Spectral Density for Two Processes . . . . . . . . . . . . . . . . . . . . . . 572

8.1.5 Exercises: Section 8.1 (1–21) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 573

8.2 Random Processes and LTI Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 576

8.2.1 Statistical Properties of the LTI System Output . . . . . . . . . . . . . . . . . . . 577


8.2.2 Ideal Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 580

8.2.3 Signal Plus Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 583

8.2.4 Exercises: Section 8.2 (22–38) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 586

8.3 Discrete-Time Signal Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 589

8.3.1 Random Sequences and LTI Systems . . . . . . . . . . . . . . . . . . . . . . . . . . 591

8.3.2 Random Sequences and Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 593

8.3.3 Exercises: Section 8.3 (39–50) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 595

Appendix A: Statistical Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597

Appendix B: Background Mathematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 609

Appendix C: Important Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 615

Answers to Odd-Numbered Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 637

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 639


Introduction: Why Study Probability?

Some of you may enjoy mathematics for its own sake—it is a beautiful subject which provides many wonderful intellectual challenges. Of course students of philosophy would say the same thing about their discipline, ditto for students of linguistics, and so on. However, many of us are not satisfied just with aesthetics and mental gymnastics. We want what we’re studying to have some utility, some applicability to real-world problems. Fortunately, mathematics in general and probability in particular provide a plethora of tools for answering important professional and societal questions. In this section, we’ll attempt to provide some preliminary motivation before forging ahead.

The initial development of probability as a branch of mathematics goes back over 300 years, where it had its genesis in connection with questions involving games of chance. One of the earliest recorded instances of probability calculation appeared in correspondence between the two very famous mathematicians, Blaise Pascal and Pierre de Fermat. The issue was which of the following two outcomes of die-tossing was more favorable to a bettor: (1) getting at least one 6 in four rolls of a fair die (“fair” here means that each of the six outcomes 1, 2, 3, 4, 5, and 6 is equally likely to occur) or (2) getting at least one pair of 6s when two fair dice are rolled 24 times in succession. By the end of Chap. 1, you shouldn’t have any difficulty showing that there is a slightly better than 50-50 chance of (1) occurring, whereas the odds are slightly against (2) occurring.
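
As a quick preview, both probabilities follow from the complement rule developed in Chap. 1 (“at least one” is the complement of “none”). A minimal sketch in R, one of the two software packages used throughout this book:

    # P(at least one 6 in four rolls) = 1 - P(no 6 in four rolls)
    p1 <- 1 - (5/6)^4      # about 0.518: slightly better than 50-50
    # P(at least one double-6 in 24 rolls of two dice) = 1 - P(no double-6 in 24 rolls)
    p2 <- 1 - (35/36)^24   # about 0.491: odds slightly against
    c(p1, p2)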

Games of chance have continued to be a fruitful area for the application of probability methodology. Savvy poker players certainly need to know the odds of being dealt various hands, such as a full house or straight (such knowledge is necessary but not at all sufficient for achieving success in card games, as such endeavors also involve much psychology). The same holds true for the game of blackjack. In fact, in 1962 the mathematics professor Edward O. Thorp published the book Beat the Dealer; in it he employed probability arguments to show that as cards were dealt sequentially from a deck, there were situations in which the likelihood of success favored the player rather than the dealer. Because of this work, casinos changed the way cards were dealt in order to prevent card-counting strategies from bankrupting them. A recent variant of this is described in the paper “Card Counting in Continuous Time” (Journal of Applied Probability, 2012: 184-198), in which the number of decks utilized is large enough to justify the use of a continuous approximation to find an optimal betting strategy.

In the last few decades, game theory has developed as a significant branch of mathematics devoted to the modeling of competition, cooperation, and conflict. Much of this work involves the use of probability properties, with applications in such diverse fields as economics, political science, and biology. However, especially over the course of the last 60 years, the scope of probability applications has expanded way beyond gambling and games. In this section, we present some contemporary examples of how probability is being used to solve important problems.


Software Use in Probability

Modern probability applications often require the use of a calculator or software. Of course, we rely on machines to perform every conceivable computation, from adding numbers to evaluating definite integrals. Many calculators and most computer software packages even have built-in functions that make a number of specific probability calculations more convenient; we will highlight these throughout the text. But the real utility of modern software comes from its ability to simulate random phenomena, which proves invaluable in the analysis of very complicated probability models. We will introduce the key elements of probability simulation in Sect. 1.6 and then revisit simulation in a variety of settings throughout the book.
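
To give the flavor of what such a simulation looks like, here is a minimal R sketch that estimates the first Pascal–Fermat probability from the previous page by brute force; the replication count of 100,000 is an arbitrary illustrative choice:

    set.seed(42)                      # fix the random number stream for reproducibility
    nreps <- 100000                   # number of simulated four-roll experiments
    rolls <- matrix(sample(1:6, 4 * nreps, replace = TRUE), nrow = nreps)
    mean(apply(rolls == 6, 1, any))   # proportion with at least one 6; close to 1 - (5/6)^4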

Numerous software packages can be used to implement a simulation. We will focus on two: Matlab and R. Matlab is a powerful engineering software package published by MathWorks; many universities and technology companies have a license for Matlab. A freeware package called Octave has been designed to implement the majority of Matlab functions using identical syntax; consult http://www.gnu.org/software/octave/. (Readers using Mac OS or Windows rather than GNU/Linux will find links to compatible versions of Octave on this same website.) R is a freeware statistical software package maintained by a core user group. The R base package and numerous add-ons are available at http://cran.r-project.org/.

Throughout this textbook, we will provide side-by-side Matlab and R code for both probability computations and simulation. It is not the goal, however, to serve as a primer in either language (certainly, some prior knowledge of elementary programming is required). Both software packages have extensive help menus and active online user support groups. Readers interested in a more thorough treatment of these software packages should consult Matlab Primer by Timothy A. Davis or The R Book by Michael J. Crawley.

Modern Application of Classic Probability Problems

The coupon collector problem has been well known for decades in the probability community. As an example, suppose each box of a certain type of cereal contains a small toy. The manufacturer of this cereal has included a total of ten toys in its cereal boxes, with each box being equally likely to yield one of the ten toys. Suppose you want to obtain a complete set of these toys for a young relative or friend. Clearly you will have to purchase at least ten boxes, and intuitively it would seem as though you might have to purchase many more than that. How many boxes would you expect to have to purchase in order to achieve your goal? Methods from Chap. 4 can be used to show that the average number of boxes required is 10(1 + 1/2 + 1/3 + ⋯ + 1/10). If instead there are n toys, then n replaces 10 in this expression. And when n is large, more sophisticated mathematical arguments yield the approximation n(ln(n) + .58).
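
The exact expectation and the large-n approximation are easy to compare numerically; a brief R sketch for the ten-toy example (the constant .58 approximates the Euler–Mascheroni constant 0.5772...):

    n <- 10
    exact  <- n * sum(1 / (1:n))    # 10(1 + 1/2 + ... + 1/10), about 29.3 boxes on average
    approx <- n * (log(n) + 0.58)   # n(ln(n) + .58), about 28.8; improves as n grows
    c(exact, approx)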

The article “A Generalized Coupon Collector Problem” (Journal of Applied Probability, 2011: 1081-1094) mentions applications of the classic problem to dynamic resource allocation, hashing in computer science, and the analysis of delays in certain wireless communication channels (in this latter application, there are n users, each receiving packets of data from a transmitter). The generalization considered in the article involves each cereal box containing d different toys, with the purchaser then selecting the least collected toy thus far. The expected number of purchases to obtain a complete collection is again investigated, with special attention to the case of n being quite large. An application to the wireless communication scenario is mentioned.


Applications to Business

The article “Newsvendor-Type Models with Decision-Dependent Uncertainty” (Mathematical Methods of Operations Research, 2012, published online) begins with an overview of a class of decision problems involving uncertainty. In the classical newsvendor problem, a seller has to choose the amount of inventory to obtain at the beginning of a selling season. This ordering decision is made only once, with no opportunity to replenish inventory during the season. The amount of demand D is uncertain (what we will call in Chap. 2 a random variable). The cost of obtaining inventory is c per unit ordered, the sale price is r per unit, and any unsold inventory at the end of the season has a salvage value of v per unit. The optimal policy, that which maximizes expected profit, is easily characterized in terms of the probability distribution of D (this distribution specifies how likely it is that various values of D will occur).
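
The characterization referred to is the classical critical-fractile result: expected profit is maximized by ordering the smallest quantity q with P(D ≤ q) ≥ (r − c)/(r − v). A brief R sketch; the unit values and the normal demand model below are hypothetical illustrations, not taken from the article:

    cost <- 5; price <- 12; salvage <- 2             # the c, r, and v of the text (hypothetical)
    fractile <- (price - cost) / (price - salvage)   # critical fractile (r - c)/(r - v) = 0.7
    qnorm(fractile, mean = 100, sd = 20)             # optimal order if D ~ Normal(100, 20^2): about 110 units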

In the revenue management problem, there are S units of inventory to sell. Each unit is sold for a price of either r1 or r2 (r1 > r2). During the first phase of the selling season, customers arrive who will buy at the price r2 but not at r1. In the second phase, customers arrive who will pay the higher price. The seller wishes to know how much of the initial inventory should be held in reserve for the second phase. Again the general form of the optimal policy that maximizes expected profit is easily determined in terms of the distributions for demands in the two periods. The article cited in the previous paragraph goes on to consider situations in which the distribution(s) of demand(s) must be estimated from data and how such estimation affects decision making.
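
For the two-price version, the classical solution (often called Littlewood’s rule, and not spelled out in the article) reserves the smallest number of units y for the second phase such that P(D2 > y) ≤ r2/r1, where D2 is phase-two demand. A hedged R sketch with a hypothetical Poisson demand model:

    r1 <- 200; r2 <- 120                 # high and low prices, r1 > r2 (hypothetical values)
    y <- qpois(1 - r2/r1, lambda = 30)   # smallest y with P(D2 <= y) >= 1 - r2/r1,
    y                                    # i.e., P(D2 > y) <= r2/r1, assuming D2 ~ Poisson(30)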

A cornerstone of probabilistic inventory modeling is a general result established more than

50 years ago: Suppose that the amount of inventory of a commodity is reviewed every T time periods

to decide whether more should be ordered. Under rather general conditions, it was shown that the

optimal policy—the policy that minimizes the long-run expected cost—is to order nothing if the

current level of inventory is at least an amount s but to order enough to bring the inventory level up to an amount S if the current level is below s. The values of s and S are determined by various costs, the

price of the commodity, and the nature of demand for the commodity (how customer orders and order

amounts occur over time).
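The (s, S) rule itself takes only a few lines to state in code. The sketch below simulates periodic review under such a policy (the demand distribution, the values of s and S, and the instant-delivery assumption are all ours, for illustration only):

    import random

    def simulate_sS(s, S, periods=10000, seed=1):
        # Review every period; order up to S whenever inventory drops below s
        rng = random.Random(seed)
        level, orders = S, 0
        for _ in range(periods):
            demand = rng.randint(0, 20)     # hypothetical per-period demand
            level = max(level - demand, 0)  # unmet demand is lost
            if level < s:                   # reorder point reached
                orders += 1
                level = S                   # order up to S (delivery assumed instant)
        return orders / periods             # long-run fraction of periods with an order

    print(simulate_sS(s=15, S=60))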

The article “A Periodic-Review Base-Stock Inventory System with Sales Rejection” (Operations Research, 2011: 742-753) considers a policy appropriate when backorders are possible and lost sales

may occur. In particular, an order is placed every T time periods to bring inventory up to some level S. Demand for the commodity is filled until the inventory level reaches a sales rejection threshold M for

some M < S. Various properties of the optimal values of M and S are investigated.

Applications to the Life Sciences

Examples of the use of probability and probabilistic modeling can be found in many subdisciplines of

the life sciences. For example, Pseudomonas syringae is a bacterium which lives on leaf surfaces. The

article “Stochastic Modeling of Pseudomonas Syringae Growth in the Phyllosphere” (Mathematical Biosciences, 2012: 106-116) proposed a probabilistic (synonymous with “stochastic”) model called a

birth and death process with migration to describe the aggregate distribution of such bacteria and

determine the mechanisms which generated experimental data. The topic of birth and death processes

is considered briefly in Chap. 7 of our book.

Another example of such modeling appears in the article “Means and Variances in Stochastic

Multistage Cancer Models” (Journal of Applied Probability, 2012: 590-594). The authors discuss a widely used model of carcinogenesis in which division of a healthy cell may give rise to a healthy cell

and a mutant cell, whereas division of a mutant cell may result in two mutant cells of the same type or

possibly one of the same type and one with a further mutation. The objective is to obtain an


expression for the expected number of cells at each stage and also a quantitative assessment of how

much the actual number might deviate from what is expected (this is what the “variance” measures).

Epidemiology is the branch of medicine and public health that studies the causes and spread of

various diseases. Of particular interest to epidemiologists is how epidemics are propagated in one or

more populations. The general stochastic epidemic model assumes that a newly infected individual is

infectious for a random amount of time having an exponential distribution (this distribution is

discussed in Chap. 3) and during this infectious period encounters other individuals at times

determined by a Poisson process (one of the topics in Chap. 7). The article “The Basic Reproduction Number and the Probability of Extinction for a Dynamic Epidemic Model” (Mathematical Biosciences, 2012: 31-35) considers an extension in which the population of interest consists of a

fixed number of subpopulations. Individuals move between these subpopulations according to a

Markov transition matrix (the subject of Chap. 6) and infectives can only make infectious contact

with members of their current subpopulation. The effect of variation in the infectious period on the

probability that the epidemic ultimately dies out is investigated.

Another approach to the spread of epidemics is based on branching processes. In the simplest such

process, a single individual gives birth to a random number of individuals; each of these in turn gives

birth to a random number of progeny, and so on. The article “The Probability of Containment for

Multitype Branching Process Models for Emerging Epidemics” (Journal of Applied Probability, 2011: 173-188) uses a model in which each individual “born” to an existing individual can have one

of a finite number of severity levels of the disease. The resulting theory is applied to construct a

simulation model of how influenza spread in rural Thailand.

Applications to Engineering and Operations Research

We want products that we purchase and systems that we rely on (e.g., communication networks,

electric power grids) to be highly reliable—have long lifetimes and work properly during those

lifetimes. Product manufacturers and system designers therefore need to have testing methods that

will assess various aspects of reliability. In the best of all possible worlds, data bearing on reliability

could be obtained under normal operating conditions. However, this may be very time consuming

when investigating components and products that have very long lifetimes. For this reason, there has

been much research on “accelerated” testing methods which induce failure or degradation in a much

shorter time frame. For products that are used only a fraction of the time in a typical day, such as

home appliances and automobile tires, acceleration might entail operating continuously in time but

under otherwise normal conditions. Alternatively, a sample of units could be subjected to stresses

(e.g., temperature, vibration, voltage) substantially more severe than what is usually experienced.

Acceleration can also be applied to entities in which degradation occurs over time—stiffness of

springs, corrosion of metals, and wearing of mechanical components. In all these cases, probability

models must then be developed to relate lifetime behavior under such acceleration to behavior in

more customary situations. The article “Overview of Reliability Testing” (IEEE Transactions on Reliability, 2012: 282-291) gives a survey of various testing methodologies and models. The article

“A Methodology for Accelerated Testing by Mechanical Actuation of MEMS Devices” (Microelectronics Reliability, 2012: 1382-1388) applies some of these ideas in the context of predicting lifetimes

for micro-electro-mechanical systems.

An important part of modern reliability engineering deals with building redundancy into various

systems in order to decrease substantially the likelihood of failure. A k-out-of-n:G system works or is

good only if at least k amongst the n constituent components work or are good, whereas a k-out-of-n:F system fails if and only if at least k of the n components fail. The article “Redundancy Issues in

Software and Hardware Systems: An Overview” (Intl. Journal of Reliability, Quality, and Safety Engineering, 2011: 61-98) surveys these and various other systems that can improve the performance


of computer software and hardware. The so-called triple modular redundant systems, with 2-out-of-3:

G configuration, are now commonplace (e.g., Hewlett-Packard’s original NonStop server, and a

variety of aero, auto, and rail systems). The article “Reliability of Various 2-Out-of-4:G Redundant

Systems with Minimal Repair” (IEEE Transactions on Reliability, 2012: 170-179) considers using a

Poisson process with time-varying rate function to model how component failures occur over time so

that the rate of failure increases as a component ages; in addition, a component that fails undergoes

repair so that it can be placed back in service. Several failure modes for combined k-out-of-n systems

are studied in the article “Reliability of Combined m-Consecutive-k-out-of-n:F and Consecutive-kc-out-of-n:F Systems” (IEEE Transactions on Reliability, 2012: 215-219); these have applications in

the areas of infrared detecting and signal processing.
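For independent components that each work with the same probability p, the reliability of a k-out-of-n:G system is just a binomial tail probability. A minimal sketch (the component reliability .95 is a made-up number for illustration):

    from math import comb

    def k_out_of_n_reliability(k, n, p):
        # P(at least k of the n independent components work)
        return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

    # Triple modular redundancy: a 2-out-of-3:G system with component reliability .95
    print(k_out_of_n_reliability(2, 3, 0.95))  # about .9928, better than a single component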

A compelling reason for manufacturers to be interested in reliability information about their

products is that they can establish warranty policies and periods that help control costs. Many

warranties are “one dimensional,” typically characterized by an interval of age (time). However,

some warranties are “two dimensional” in that warranty conditions depend on both age and cumula-

tive usage; these are common in the automotive industry. The article “Effect of Use-Rate on System

Lifetime and Failure Models for 2D Warranty” (Intl. Journal of Quality and Reliability Management, 2011: 464-482) describes how certain bivariate probability models for jointly describing the behavior

of time and usage can be used to investigate the reliability of various system configurations.

The word queue is used chiefly by the British to mean “waiting line,” i.e., a line of customers or

other entities waiting to be served or brought into service. The mathematical development of models

for how a waiting line expands and contracts as customers arrive at a service facility, enter service,

and then finish began in earnest in the middle part of the 1900s and continues unabated today as new

application scenarios are encountered.

For example, the arrival and service of patients at some type of medical unit are often described by

the notation M/M/s, where the first M signifies that arrivals occur according to a Poisson process, the

second M indicates that the service time of each patient is governed by an exponential probability

distribution, and there are s servers available for the patients. The article “Nurse Staffing in Medical

Units: A Queueing Perspective” (Operations Research, 2011: 1320-1331) proposes an alternative

closed queueing model in which there are s nurses within a single medical unit servicing n patients,

where each patient alternates between requiring assistance and not needing assistance. The perfor-

mance of the unit is characterized by the likelihood that delay in serving a patient needing assistance

will exceed some critical threshold. A staffing rule based on the model and assumptions is developed;

the resulting rule differs significantly from the fixed nurse-to-patient staffing ratios mandated by the

state of California.
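For the basic open M/M/s model described above (as opposed to the article’s closed model), the probability that an arriving patient must wait has a classical closed form, the Erlang C formula. A sketch, with made-up arrival and service rates:

    from math import factorial

    def erlang_c(arrival_rate, service_rate, s):
        # Probability that an arrival must wait in an M/M/s queue
        a = arrival_rate / service_rate   # offered load
        rho = a / s                       # utilization; the formula requires rho < 1
        top = a**s / (factorial(s) * (1 - rho))
        bottom = sum(a**k / factorial(k) for k in range(s)) + top
        return top / bottom

    # Hypothetical unit: 8 arrivals/hour, each server completes 3/hour, 4 servers
    print(round(erlang_c(8, 3, 4), 3))   # about .378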

A variation on the medical unit situation just described occurs in the context of call centers, where

effective management entails a trade-off between operational costs and the quality of service offered

to customers. The article “Staffing Call Centers with Impatient Customers” (Operations Research, 2012: 461-474) considers an M/M/s queue in which customers who have to wait for service may

become frustrated and abandon the facility (don’t you sometimes feel like doing that in a doctor’s

office?). The behavior of such a system when the workload is large is investigated, with particular attention to the staffing principle that relates the number of servers to the square root of the workload offered to the

call center.

The methodology of queueing can also be applied to find optimal settings for traffic signals. The

article “Delays at Signalized Intersections with Exhaustive Traffic Control” (Probability in the Engineering and Informational Sciences, 2012: 337-373) utilizes a “polling model,” which entails

multiple queues of customers (corresponding to different traffic flows) served by a single server in

cyclic order. The proposed vehicle-actuated rule is that traffic lights stay green until all lanes within a

group are emptied. The mean traffic delay is studied for a variety of vehicle interarrival-time

distributions in both light-traffic and heavy-traffic situations.


Suppose two different types of customers, primary and secondary, arrive for service at a facility

where the servers have different service rates. How should customers be assigned to the servers? The

article “Managing Queues with Heterogeneous Servers” (Journal of Applied Probability, 2011: 435-452) shows that the optimal policy for minimizing mean wait time has a “threshold structure”:

for each server, there is a different threshold such that a primary customer will be assigned to that

server if and only if the queue length of primary customers meets or exceeds the threshold.

Applications to Finance

The most explosive growth in the use of probability theory and methodology over the course of the

last several decades has undoubtedly been in the area of finance. This has provided wonderful career

opportunities for people with advanced degrees in statistics, mathematics, engineering, and physics

(the son-in-law of one of the authors earned a Ph.D. in mechanical engineering and taught for several

years, but then switched to finance). Edward O. Thorp, whom we previously met as the man who

figured out how to beat blackjack, subsequently went on to success in finance, where he earned much

more money managing hedge funds and giving advice than he could ever have hoped to earn in

academia (those of us in academia love it for the intangible rewards we get—psychic income, if

you will).

One of the central results in mathematical finance is the Black-Scholes theorem, named after the two economists who discovered it (Myron Scholes later shared a Nobel prize for this work; Fischer Black died before the award was made). To get the flavor of what is involved here, a

bit of background is needed. Suppose the present price of a stock is $20 per share, and it is known that

at the end of 1 year, the price will either double to $40 or decrease to $10 per share (where those prices

are expressed in current dollars, i.e., taking account of inflation over the 1-year period). You can enter

into an agreement, called an option contract, that allows you to purchase y shares of this stock (for any value y) 1 year from now for the amount cy (again in current dollars). In addition, right now you can

buy x shares of the stock for 20x with the objective of possibly selling those shares 1 year from now.

The values x and y are both allowed to be negative; if, for example, x were negative, then you would

actually be selling shares of the stock now that you would have to purchase at either a cost of $40 per

share or $10 per share 1 year from now. It can then be shown that there is only one value of c, specifically 50/3, for which the gain from this investment activity is 0 regardless of the choices of

x and y and the value of the stock 1 year from now. If c is anything other than 50/3, then there is an

arbitrage, an investment strategy involving choices of x and y that is guaranteed to result in a

positive gain.
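The computation underlying such statements is one-period binomial pricing: find the probability p under which the stock itself is a fair bet, then price any contract as its expected payoff under p. A sketch (zero interest rate, as in the passage; the strike price K = 30 is purely our illustrative assumption, since the exact contract terms determine the value 50/3 quoted above):

    def no_arbitrage_price(s0, up, down, payoff):
        # p makes the stock a fair bet: s0 = p*up + (1 - p)*down
        p = (s0 - down) / (up - down)
        # The unique no-arbitrage price is the expected payoff under p
        return p * payoff(up) + (1 - p) * payoff(down)

    # Stock at $20 moves to $40 or $10, so p = 1/3; price a call with hypothetical strike K
    K = 30
    print(no_arbitrage_price(20, 40, 10, lambda s: max(s - K, 0)))  # 10/3, about 3.33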

A general result called the Arbitrage Theorem specifies conditions under which a collection of

investments (or bets) has expected return 0 as opposed to there being an arbitrage strategy. The basis

for the Black-Scholes theorem is that the variation in the price of an asset over time is described by a

stochastic process called geometric Brownian motion (see Sect. 7.6). The theorem then specifies a fair

price for an option contract on that asset so that no arbitrage is possible.

Modern quantitative finance is very complex, and many of the basic ideas are unfamiliar to

most novices (like the authors of this text!). It is therefore difficult to summarize the content of

recently published articles as we have done for some other application areas. But a sampling

of recently published titles emphasizes the role of probability modeling. Articles that appeared in

the 2012 Annals of Finance included “Option Pricing Under a Stressed Beta Model” and “Stochastic

Volatility and Stochastic Leverage”; in the 2012 Applied Mathematical Finance, we found

“Determination of Probability Distribution Measures from Market Prices Using the Method of

Maximum Entropy in the Mean” and “On Cross-Currency Models with Stochastic Volatility and

Correlated Interest Rates”; the 2012 Quantitative Finance yielded “Probability Unbiased Value-at-


Risk Estimators” and “A Generalized Birth-Death Stochastic Model for High-Frequency Order Book

Dynamics.”

If the application of mathematics to problems in finance is of interest to you, there are now many

excellent master’s-level graduate programs in quantitative finance. Entrance to these programs

typically requires a very solid background in undergraduate mathematics and statistics (including

especially the course for which you are using this book). Be forewarned, though, that not all

financially savvy individuals are impressed with the direction in which finance has recently moved.

Former Federal Reserve Chairman Paul Volcker was quoted not long ago as saying that the ATM

cash machine was the most significant financial innovation of the last 20 years; he has been a very

vocal critic of the razzle-dazzle of modern finance.

Probability in Everyday Life

In the hopefully unlikely event that you do not end up using probability concepts and methods in your

professional life, you still need to face the fact that ideas surrounding uncertainty are pervasive in our

world. We now present some amusing and intriguing examples to illustrate this.

The behavioral psychologists Amos Tversky and Daniel Kahneman spent much of their academic

careers carrying out studies to demonstrate that human beings frequently make logical errors when

processing information about uncertainty (Kahneman won a Nobel prize in economics for his work,

and Tversky would surely have also done so had the awards been given posthumously). Consider the

following variant of one Tversky-Kahneman scenario. Which of the following two statements is more

likely?

(A) Dr. D is a former professor.

(B) Dr. D is a former professor who was accused of inappropriate relations with some students,

investigation substantiated the charges, and he was stripped of tenure.

T-K’s research indicated that many people would regard statement B as being more likely, since it

gives a more detailed explanation of why Dr. D is no longer a professor. However, this is incorrect.

Statement B implies statement A. One of our basic probability rules will be that if one event B is

contained in another event A (i.e., if B implies A), then the smaller event B is less likely to occur or

have occurred than the larger event A. After all, other possible explanations for A are that Dr. D is

deceased or that he is retired or that he deserted academia for investment banking—all of those plus B

would figure into the likelihood of A.

The survey article “Judgment under Uncertainty: Heuristics and Biases” (Science, 1974: 1124-1131) by T-K described a certain town served by two hospitals. In the larger hospital about 45 babies

are born each day, whereas about 15 are born each day in the smaller one. About 50% of births are

boys, but of course the percentage fluctuates from day to day. For a 1-year period, each hospital

recorded days on which more than 60% of babies born were boys. Each of a number of individuals

was then asked which of the following statements he/she thought was correct: (1) the larger hospital

recorded more such days, (2) the smaller hospital recorded more such days, or (3) the number of such

days was about the same for the two hospitals. Of the 95 participants, 21 chose (1), 21 chose (2), and

53 chose (3). In Chap. 5 we present a general result which implies that the correct answer is in fact (2),

because the sample percentage is less likely to stray from the true percentage (in this case about 50%)

when the sample size is large rather than small.
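The general result can be previewed with a direct binomial computation: treating each day’s births as fair-coin flips, the chance that more than 60% are boys is larger at the smaller hospital. A sketch:

    from math import comb

    def prob_more_than_60pct_boys(n):
        # P(X > 0.6n) where X ~ Binomial(n, 1/2) counts boys among n births
        cutoff = int(0.6 * n)
        return sum(comb(n, k) for k in range(cutoff + 1, n + 1)) / 2**n

    print(round(prob_more_than_60pct_boys(15), 3))  # smaller hospital: about .15
    print(round(prob_more_than_60pct_boys(45), 3))  # larger hospital: about .07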

In case you think that mistakes of this sort are made only by those who are unsophisticated or

uneducated, here is yet another T-K scenario. Each of a sample of 80 physicians was presented with

the following information on treatment for a particular disease:


With surgery, 10% will die during treatment, 32% will die within a year, 66% will die within 5 years. With

radiation, 0% will die during treatment, 23% will die within a year, 78% will die within 5 years.

Each of the 87 physicians in a second sample was presented with the following information:

With surgery, 90% will survive the treatment, 68% will survive at least 1 year, and 34% will survive at least

5 years. With radiation, 100% will survive the treatment, 77% will survive at least 1 year, and 22% will survive

at least 5 years.

When each physician was asked to indicate whether he/she would recommend surgery or radiation

based on the supplied information, 50% of those in the first group said surgery whereas 84% of those

in the second group said surgery.

The distressing thing about this conclusion is that the information provided to the first group of

physicians is identical to that provided to the second group, but described in a slightly different way.

If the physicians were really processing information rationally, there should be no significant

difference between the two percentages.

It would be hard to find a book containing even a brief exposition of probability that did not

contain examples or exercises involving coin tossing. Many such scenarios involve tossing a “fair”

coin, one that is equally likely to result in H (head side up) or T (tail side up) on any particular toss.

Are real coins actually fair, or is there a bias of some sort? Various analyses have shown that the result

of a coin toss is predictable at least to some degree if initial conditions (position, velocity, angular

momentum) are known. In practice, most people who toss coins (e.g., referees in a football game

trying to determine which team will kick off and which will receive) are not conversant in the physics

of coin tossing. The mathematician and statistician Persi Diaconis, who was a professional magician

for 10 years prior to earning his Ph.D. and mastered many coin and card tricks, has engaged in

ongoing collaboration with other researchers to study coin tossing. One result of these investigations

was the conclusion based on physics that for a caught coin, there is a slight bias toward heads—about

.51 versus .49. It is not, however, clear under which real-world circumstances this or some other bias

will occur.

Simulation of fair-coin tossing can be done using a random number generator available in many

software packages (about which we’ll say more shortly). If the resulting random number is between

0 and .5, we say that the outcome of the toss was H, and if the number is between .5 and 1, then a T

occurred (there is an obvious modification of this to incorporate bias). Now consider the following

sequence of 200 Hs and Ts:

THTHTTTHTTTTTHTHTTTHTTHHHTHHTHTHTHTTTTHHTTHHTTHHHTHHHTTHHHTTTHHHTHHHHTTTHTHTHHHHTHTTTHHHTHHTHTTTHHTHHHTHHHHTTHTHHTHHHTTTHTHHHTHHTTTHHHTTTTHHHTHTHHHHTHTTHHTTTTHTHTHTTHTHHTTHTTTHTTTTHHHHTHTHHHTTHHHHHTHH

Did this sequence result from actually tossing a fair coin (equivalently, using computer simulation

as described), or did it come from someone who was asked to write down a sequence of 200 Hs and Ts

that he/she thought would come from tossing a fair coin? One way to address this question is to focus

on the longest run of Hs in the sequence of tosses. This run is of length 4 for the foregoing sequence.

Probability theory tells us that the expected length of the longest run in a sequence of n fair-coin

tosses is approximately log2(n) − 2/3. For n = 200, this formula gives an expected longest run of

length about 7. It can also be shown that there is less than a 10% chance that the longest run will have

a length of 4 or less. This suggests that the given sequence is fictitious rather than real, as in fact was

the case; see the very nice expository article “The Longest Run of Heads” (Mathematics Magazine, 1990: 196-207).
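Both the simulation recipe and the longest-run check take only a few lines of code. A sketch using the uniform-random-number mapping described above:

    import math
    import random

    def toss_sequence(n, seed=None):
        # H if the uniform random number is below .5, otherwise T
        rng = random.Random(seed)
        return ''.join('H' if rng.random() < 0.5 else 'T' for _ in range(n))

    def longest_head_run(seq):
        # Longest block of consecutive Hs: split on T and take the longest piece
        return max(len(run) for run in seq.split('T'))

    n = 200
    print(longest_head_run(toss_sequence(n)))
    print(math.log2(n) - 2 / 3)   # expected longest run: about 7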

As another example, consider giving a fair coin to each of the two authors of this textbook. Carlton

tosses his coin repeatedly until obtaining the sequence HTT. Devore tosses his coin repeatedly until

the sequence HTH is observed. Is Carlton’s expected number of tosses to obtain his desired sequence


the same as Devore’s, or is one expected number of tosses smaller than the other? Most students to

whom we have posed these questions initially answered that the two expected numbers should be the

same. But this is not true. Some rather tricky probability arguments can be used to show that Carlton’s

expected number of tosses is eight, whereas Devore expects to have to make ten tosses. Very

surprising, no? A bit of intuition makes this more plausible. Suppose Carlton merrily tosses away

until at some point he has just gotten HT. So he is very excited, thinking that just one more toss will

enable him to stop tossing the coin and move on to some more interesting pursuit. Unfortunately his

hopes are dashed because the next toss is an H. However, all is not lost, as even though he must

continue tossing, at this point he is partway toward reaching his goal of HTT. If Devore sees HT at

some point and gets excited by light at the end of the tunnel but then is crushed by the appearance of a

T rather than an H, he essentially has to start over again from scratch. The charming nontechnical

book Probabilities: The Little Numbers That Rule Our Lives by Peter Olofsson has more detail on this

and other probability conundrums.
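Readers who distrust the tricky arguments can check the 8-versus-10 claim by simulation; a sketch:

    import random

    def mean_tosses_until(pattern, trials=100000, seed=42):
        # Average number of fair-coin tosses until the pattern first appears
        rng = random.Random(seed)
        total = 0
        for _ in range(trials):
            seq = ''
            while not seq.endswith(pattern):
                seq += 'H' if rng.random() < 0.5 else 'T'
            total += len(seq)
        return total / trials

    print(mean_tosses_until('HTT'))  # close to 8 (Carlton's sequence)
    print(mean_tosses_until('HTH'))  # close to 10 (Devore's sequence)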

One of the all-time classic probability puzzles that stump most people is called the Birthday Problem. Consider a group of individuals, all of whom were born in the same year (one that did not

have a February 29). If the group size is 400, how likely is it that at least two members of the group

share the same birthday? Hopefully a moment’s reflection will bring you to the realization that a

shared birthday here is a sure thing (100% chance), since there are only 365 possible birthdays for the

400 people. On the other hand, it is intuitively quite unlikely that there will be a shared birthday if the

group size is only five; in this case we would expect that all five individuals would have different

birthdays.

Clearly as the group size increases, it becomes more likely that two or more individuals will have

the same birthday. So how large does the group size have to be in order for it to be more likely than

not that at least two people share a birthday (i.e., that the likelihood of a shared birthday is more than

50%)? Which one of the following four group-size categories do you believe contains the correct

answer to this question?

(1) At least 100
(2) At least 50 but less than 100
(3) At least 25 but less than 50
(4) Fewer than 25

When we have asked this of students in our classes, a substantial majority opted for the first two

categories. Very surprisingly, the correct answer is category (4). In Chapter 1 we will show that with

as few as 23 people in the group, it is a bit more likely than not that at least two group members will

have the same birthday.
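The calculation is short enough to preview here: find the probability that all the birthdays differ, then subtract from 1. A sketch:

    def prob_shared_birthday(k):
        # 1 minus P(all k birthdays are distinct) among 365 equally likely days
        p_distinct = 1.0
        for i in range(k):
            p_distinct *= (365 - i) / 365
        return 1 - p_distinct

    print(round(prob_shared_birthday(23), 3))  # about .507, just over 50%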

Two people having the same birthday implies that they were born within 24 h of one another, but

the converse is not true; e.g., one person might be born just before midnight on a particular day and

another person just after midnight on the next day. This implies that it is more likely that two people

will have been born within 24 h of one another than it is that they have the same birthday. It follows

that a smaller group size than 23 is needed to make it more likely than not that at least two people will

have been born within 24 h of one another. In Sect. 4.9 we show how this group size can be

determined.

Two people in a group having the same birthday is an example of a coincidence, an accidental and

seemingly surprising occurrence of events. The fact that even for a relatively small group size it is

more likely than not that this coincidence will occur should suggest that coincidences are often not as

surprising as they might seem. This is because even if a particular coincidence (e.g., “graduated from

the same high school” or “visited the same small town in Croatia”) is quite unlikely, there are so many

opportunities for coincidences that quite a few are sure to occur.


Back to the follies of misunderstanding medical information: Suppose the incidence rate of a

particular disease in a certain population is 1 in 1000. The presence of the disease cannot be detected

visually, but a diagnostic test is available. The diagnostic test correctly detects 98% of all diseased

individuals (this is the sensitivity of the test, its ability to detect the presence of the disease), and 93% of non-diseased individuals test negative for the disease (this is the specificity of the test, an indicator of how specific the test is to the disease under consideration). Suppose a single individual randomly

selected from the population is given the test and the test result is positive. In light of this information,

how likely is it that the individual will have the disease?

First note that if the sensitivity and the specificity were both 100%, then it would be a sure thing

that the selected individual has the disease. The reason this is not a sure thing is that the test

sometimes makes mistakes. Which one of the following five categories contains the actual likelihood

of having the disease under the described conditions?

1. At least a 75% chance (quite likely)

2. At least 50% but less than 75% (moderately likely)

3. At least 25% but less than 50% (somewhat likely)

4. At least 10% but less than 25% (rather unlikely)

5. Less than 10% (quite unlikely)

Student responses to this question have overwhelmingly been in categories (1) or (2)—another

case of intuition going awry. The correct answer turns out to be category (5). In fact, even in

light of the positive test result, there is still only a bit more than a 1% chance that the individual is

diseased!

What is the explanation for this counterintuitive result? Suppose we start with 100,000 individuals

from the population. Then we’d expect 1 in 1000 of those, or 100, to be diseased (from the 1 in 1000

incidence rate) and 99,900 to be disease free. From the 100 we expect to be diseased, we’d expect

98 positive test results (98% sensitivity). And from the 99,900 we expect to be disease free, we’d

expect 7% of those, or 6993, to yield positive test results (since the specificity is 93%). Thus we expect many more false positives

than true positives. This is because the disease is quite rare and the diagnostic test is rather good but

not stunningly so. (In case you think our sensitivity and specificity are low, consider a certain D-dimer

test for the presence of a pulmonary embolism; its sensitivity and specificity are 88% and 75%,

respectively.)
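The expected-counts argument is exactly Bayes’ rule in disguise; translating it into code reproduces the numbers above:

    def posterior_prob(incidence, sensitivity, specificity):
        # P(diseased | positive test) by Bayes' rule
        true_pos = incidence * sensitivity               # expected true positive rate
        false_pos = (1 - incidence) * (1 - specificity)  # expected false positive rate
        return true_pos / (true_pos + false_pos)

    # 1-in-1000 incidence, 98% sensitivity, 93% specificity: 98 vs. 6993 per 100,000
    print(round(posterior_prob(1 / 1000, 0.98, 0.93), 4))  # .0138, a bit over 1%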

Later in Chapter 1 (Example 1.31) we develop probability rules which can be used to show that the

posterior probability of having the disease conditional on a positive test result is .0138—a bit over

1%. This should make you very cautious about interpreting the results of diagnostic tests. Before you

panic in light of a positive test result, you need to know the incidence rate for the condition under

consideration and both the sensitivity and specificity of the test. There are also implications for

situations involving detection of something other than a disease. Consider airport procedures that are

used to detect the presence of a terrorist. What do you think is the incidence rate of terrorists at a given

airport, and how sensitive and specific do you think detection procedures are? The overwhelming

majority of positive test results will be false, greatly inconveniencing those who test positive!

Here’s one final example of probability applied in everyday life: One of the following columns

contains the value of the closing stock index as of August 8, 2012, for each of a number of countries,

and the other column contains fake data obtained with a random number generator. Just by looking at

the numbers, without considering context, can you tell which column is fake and which is real?


China          2264     3058
Japan          8881     9546
Britain        5846     7140
Canada       11,781     6519
Euro area       797      511
Austria        2053     4995
France         3438     2097
Germany        6966     4628
Italy        14,665     8461
Spain           722      598
Norway          480     1133
Russia         1445     4100
Sweden         1080     2594
Turkey       64,699   35,027
Hong Kong    20,066   42,182
India        17,601     3388
Pakistan     14,744   10,076
Singapore      3052     5227
Thailand       1214     7460
Argentina      2459     2159
⋮                 ⋮        ⋮

The key to answering this question is a result called Benford’s Law. Suppose you start reading

through a particular issue of a publication like the New York Times or The Economist, and each time

you encounter any number (the amount of donations to a particular political candidate, the age of an

actor, the number of members of a union, and so on), you record the first digit of that number. Possible

first digits are 1, 2, 3, . . ., or 9. In the long run, how frequently do you think each of these nine possible

first digits will be encountered? Your first thought might be that each one should have the same long-

run frequency, 1/9 (roughly 11%). But for many sets of numbers this turns out not to be the case.

Instead the long-run frequency is given by the formula log10[(x + 1)/x], which gives .301, .176, .125,

. . ., .051, .046 for x = 1, 2, 3, . . ., 8, 9. Thus a leading digit is much more likely to be 1, 2, or 3 than

7, 8, or 9.
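Tabulating Benford’s Law, and testing data against it, takes only a line or two apiece; a sketch:

    import math

    def benford_frequency(d):
        # Long-run frequency of leading digit d under Benford's Law
        return math.log10((d + 1) / d)

    for d in range(1, 10):
        print(d, round(benford_frequency(d), 3))   # .301, .176, .125, ..., .051, .046

    def leading_digit(n):
        # First digit of a positive integer
        return int(str(n)[0])

    # Leading digits of the first few entries in the left-hand column above
    print([leading_digit(v) for v in (2264, 8881, 5846, 11781, 797, 2053, 3438)])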

Examination of the foregoing lists of numbers shows that the first column conforms much more

closely to Benford’s Law than does the second column. In fact, the first column is real, whereas the

second one is fake. For Benford’s Law to be valid, it is generally required that the set of numbers

under consideration span several orders of magnitude. It does not work, for example, with batting

averages of major league baseball players, most of which are between .200 and .299, or with fuel

efficiency ratings (miles per gallon) for automobiles, most of which are currently between 15 and 30.

Benford’s Law has been employed to detect fraud in accounting reports, and in particular to detect

fraudulent tax returns. So beware when you file your taxes next year!

This list of amusing probability appetizers could be continued for quite a while. Hopefully what

we have shown thus far has sparked your interest in knowing more about the discipline. So without

further ado . . .
