1614 probability-models and concepts


Page 1: 1614 probability-models and concepts

PROBABILITY: MODELS & CONCEPTS

Page 2: 1614 probability-models and concepts

Chance, Consequence & Strategy:

Likelihood or Probability

Since there is little in life that occurs with absolute certainty, probability theory has found application in virtually every field of human endeavor.

Page 3: 1614 probability-models and concepts

Why Probability Theory?

As we observe the universe about us, wonderful Craftsmanship can be seen.

As we examine the elements of this creation we discover that there is incredible order, but also variation therein.

Probability theory seeks to describe the variation or randomness within order so that underlying order may be better understood.

Once understood, strategies can be more effectively formulated and their risks evaluated.

Page 4: 1614 probability-models and concepts

Objective Assessment: A Priori & A Posteriori Probability

A priori means “before the fact”; probability assessments of this sort typically rely on a study of the traits of the phenomenon under consideration. Based on theory.

A posteriori means “after the fact”. This approach to likelihood assessment is also called the “relative frequency” approach. Based on repeated observation.

Page 5: 1614 probability-models and concepts

Likelihood Concepts: Events

• As we observe a phenomenon, we generally note that varying, and sometimes even “identical”, conditions do not always give rise to identical results. As a phenomenon is repeatedly observed, the various possible results can be thought of as “events”.

Page 6: 1614 probability-models and concepts

Mutually Exclusive Events

• Any number of events are said to be mutually exclusive if they have no overlap or commonality.

“Nothing is impossible Mario; improbable, unlikely maybe, but not impossible.” Luigi Mario speaking to brother, Mario Mario in the movie, “Super Mario Bros.”

Page 7: 1614 probability-models and concepts

Mutually Exclusive & Exhaustive Events

• A collection of events is exhaustive if, taken in totality, they account for all possible results or outcomes.

[Figure: two non-overlapping regions, A and B.] A and B are mutually exclusive.

Page 8: 1614 probability-models and concepts

Intersection & Union of Events

The intersection of two or more events is like the intersection of two streets: it is the property they share in common.

The intersection of events A and B is symbolized by A ∩ B (also written AB).

The union of two or more events is the totality of results captured by these events. The union of two events A and B is symbolized by A ∪ B.

Page 9: 1614 probability-models and concepts

Notation & Definitions

The probability of the event A is given by: P(A)

The probability of A ∩ B is P(A ∩ B) = P(A) + P(B) - P(A ∪ B), where P(A ∪ B) is the probability of the union of events A and B.

The conditional probability of the event A given that the event B has occurred is: P(A|B) = P(A ∩ B)/P(B)

DEPENDENCE & INDEPENDENCE

Two events A and B are said to be independent if and only if: P(A|B) = P(A) and P(B|A) = P(B)

It follows from this that if A and B are independent then P(A ∩ B) = P(A)*P(B)

This is the multiplication rule for independent events.

Page 10: 1614 probability-models and concepts

A Service Sector Example: Fast Food Clientele

A leading fast food restaurant chain routinely and randomly surveys its customers in an effort to continually improve its ability to serve its clientele. Two primary questions on the survey address frequency of customer patronage and primary reason for this patronage. Results of last month’s survey of 1,000 customers are recorded in the following table.

Page 11: 1614 probability-models and concepts

Survey of 1000 Customers: Frequency of and Reason for Patronage

Reason               occasional  moderate  frequent  TOTALS
menu/food                    60       120        30     210
customer relations           75       180        45     300
value/cost                   35       200        40     275
location/access              60        80        25     165
other reason                 20        20        10      50
TOTALS                      250       600       150    1000

Page 12: 1614 probability-models and concepts

Marginal Probability

• Marginal probabilities can be thought of as the probabilities of being in the various margins of the table. For example, the marginal probability of a customer patronizing the restaurant chain due to menu, regardless of frequency of patronage, is:

• P(menu) = 210/1000 = .21

• The various marginal probabilities for this example are determined and represented graphically as follows. The graphs are “marginal probability distributions”.

Page 13: 1614 probability-models and concepts

Frequency of Patronage: Marginal Probability Distribution

• Occasional Patrons: P(occasional) = 250/1000 = .25

• Moderate Patrons: P(moderate) = 600/1000 = .60

• Frequent Patrons: P(frequent) = 150/1000 = .15

[Figure: bar chart of the marginal probabilities (0 to 0.6) for occasional, moderate, and frequent patrons.]

Page 14: 1614 probability-models and concepts

Reasons for Patronage: Marginal Probability Distribution

• Menu: P(Menu) = 210/1000 = .21

• C.Rel.: P(CR) = 300/1000 = .30

• Value: P(Value) = 275/1000 = .275

• Location: P(Loc) = 165/1000 = .165

• Other: P(Other) = 50/1000 = .05

[Figure: bar chart of the marginal probabilities (0 to 0.3) for menu, cust. rel., value, location, and other.]
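The marginal distributions above can be recomputed from the survey table. A minimal sketch in Python; the dictionary layout and variable names are our own choices for illustration, and only the counts come from the slides.

```python
# Survey counts copied from the cross-tabulation of 1000 customers.
# Outer keys: reason for patronage; inner keys: frequency of patronage.
counts = {
    "menu/food":          {"occasional": 60, "moderate": 120, "frequent": 30},
    "customer relations": {"occasional": 75, "moderate": 180, "frequent": 45},
    "value/cost":         {"occasional": 35, "moderate": 200, "frequent": 40},
    "location/access":    {"occasional": 60, "moderate": 80,  "frequent": 25},
    "other reason":       {"occasional": 20, "moderate": 20,  "frequent": 10},
}

n = sum(sum(row.values()) for row in counts.values())  # grand total

# Marginal distribution of reason: row total / grand total.
p_reason = {reason: sum(row.values()) / n for reason, row in counts.items()}

# Marginal distribution of frequency: column total / grand total.
col_totals = {}
for row in counts.values():
    for freq, c in row.items():
        col_totals[freq] = col_totals.get(freq, 0) + c
p_freq = {freq: total / n for freq, total in col_totals.items()}

print(p_reason)
print(p_freq)
```

Each marginal distribution sums to 1, which is a quick check that no cell of the table was mistyped.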

Page 15: 1614 probability-models and concepts

Joint Probability

• Consider the cross-tabulation relating the two traits:

– frequency of patronage, and

– primary reason for patronage

• Joint probabilities are probabilities of intersections of the categories (or events) of two traits. As an example, the joint probability that a customer is moderate in their patronage and their primary reason for patronage is the menu is given by

– P(moderate ∩ menu) = P(A ∩ B) = 120/1000 = .120.

• A graphical representation of the complete joint probability distribution follows.

Page 16: 1614 probability-models and concepts

Reasons & Frequency of Patronage: Joint Probabilities

[Figure: bar chart of joint probabilities (0 to 0.2) by frequency of patronage (occasional, moderate, frequent), with one bar per reason (menu, cust. rel., value, location, other).]

Page 17: 1614 probability-models and concepts

Conditional Probability

Conditional probability can be thought of as probability determined in the mode of either “what if” or “given that”.

For example, we might ask, “what is the probability that a customer’s primary reason for patronage is value (A), given that the customer is frequent (B) in their patronage?”

This is symbolized by P(A|B) and is calculated as P(A ∩ B)/P(B), where the vertical line “|” is read as “given that”.

Thus a “conditional probability” is equal to the probability of the appropriate intersection, divided by the marginal probability of the given event.

Page 18: 1614 probability-models and concepts

Reasons for Frequent Patrons: Conditional Probabilities

[Figure: bar chart of the conditional probabilities (0 to 0.4) of each reason (menu, cust. rel., value, location, other) among frequent patrons.]

• P(value | frequent) = P(value ∩ frequent)/P(frequent) = (40/1000) / (150/1000) = .04/.15 = .267

• This is represented by the “red” bar above. The entire “conditional probability distribution of reasons for patronage by frequent customers” is displayed above.
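The whole conditional distribution for frequent patrons follows from the same division. A small sketch, assuming only the “frequent” column of the survey table; the variable names are illustrative.

```python
# Counts for frequent customers, copied from the "frequent" column of the survey table.
frequent = {
    "menu/food": 30,
    "customer relations": 45,
    "value/cost": 40,
    "location/access": 25,
    "other reason": 10,
}
n = 1000                                  # customers surveyed
p_frequent = sum(frequent.values()) / n   # marginal P(frequent) = 150/1000

# P(reason | frequent) = P(reason ∩ frequent) / P(frequent)
cond = {reason: (c / n) / p_frequent for reason, c in frequent.items()}

for reason, p in cond.items():
    print(f"P({reason} | frequent) = {p:.3f}")
```

Because every joint probability is divided by the same marginal, the conditional probabilities again sum to 1.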

Page 19: 1614 probability-models and concepts

Independence & Dependence

Recall that two events, A & B, are mutually independent if and only if P(A|B) = P(A) and P(B|A) = P(B)

Are the events A & B independent, where:

A = Primary patronage reason is customer relations

B = Customer is frequent in patronage

Recall that P(A) = .30, that P(B) = .15 and that P(A ∩ B) = 45/1000 = .045, so that

P(A|B) = P(A ∩ B)/P(B) = .045/.15 = .30 = P(A)

P(B|A) = P(A ∩ B)/P(A) = .045/.30 = .15 = P(B)

Indeed, A & B are independent.

Page 20: 1614 probability-models and concepts

Independence - Key Concept

If two events, A & B, are independent then the occurrence of one of the two events does not change the LIKELIHOOD or probability that the other of the two events will occur.

Occurrence of one of the two events does alter the MANNER in which the other of the two events may occur.

Page 21: 1614 probability-models and concepts

Dependence

If two events A & B are dependent then P(A|B) will not equal P(A) and, similarly, P(B|A) will not equal P(B).

Let A = primary reason for patronage is menu.

Let B = frequency of patronage is moderate.

We have P(A|B) = 120/600 = .20, which is not equal to P(A) = 210/1000 = .21.

P(B|A) = 120/210 = .57, which is not equal to P(B) = 600/1000 = .60.

In this case, even though the values are comparable, they are not equal => dependence.
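Both the independence and the dependence example can be checked the same way, by comparing P(A|B) with P(A). A minimal sketch; the helper name `compare` is hypothetical, and the counts are taken from the survey table.

```python
from math import isclose

n = 1000  # customers surveyed

def compare(joint_count, count_a, count_b):
    """Return (P(A), P(A|B), whether the two agree) from raw table counts."""
    p_a = count_a / n
    p_a_given_b = joint_count / count_b
    return p_a, p_a_given_b, isclose(p_a, p_a_given_b)

# A = customer relations (300), B = frequent (150), A ∩ B = 45
cr_frequent = compare(45, 300, 150)

# A = menu (210), B = moderate (600), A ∩ B = 120
menu_moderate = compare(120, 210, 600)

print("customer relations vs frequent:", cr_frequent)
print("menu vs moderate:", menu_moderate)
```

With survey data, exact equality is a strict standard: the menu/moderate pair differs only slightly (.20 versus .21), so in practice a formal test would be needed to decide whether such a gap is more than sampling noise.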

Page 22: 1614 probability-models and concepts

Dependence - Key Concept

If two events A & B are dependent, then occurrence of one of the two events will alter the likelihood and the manner in which the other of the two events may occur.

In the case of mutually exclusive events, occurrence of one of the two events will preclude occurrence of the other event.

Mutually exclusive events (each with nonzero probability) are always dependent.

Page 23: 1614 probability-models and concepts

Probability Models

Page 24: 1614 probability-models and concepts

Probability Models

Probability models are mathematical descriptions of the behavior of one or more variables. The ability to anticipate, to some degree, the behavior of a variable can be useful in risk assessment and strategy formulation.

Three commonly used models, the binomial, Poisson, and normal models, are introduced.

Random variables described by these models may be either ‘discrete’ or ‘continuous’.

Page 25: 1614 probability-models and concepts

Mean, Variance and Standard Deviation of a Random Variable

The mean of a random variable (r.v.) Y is denoted by µ_Y. For a discrete r.v. Y this is calculated as: µ_Y = Σ y_i P(y_i). This is the weighted average of the values of Y. For continuous random variables, integration replaces summation.

The variance and standard deviation of the r.v. Y are represented by σ²_Y and σ_Y, respectively. For a discrete r.v. Y, these are:

σ²_Y = Σ P(y_i)(y_i - µ_Y)² and σ_Y = √(σ²_Y)
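The three formulas translate directly into code. The sketch below uses a hypothetical three-point distribution chosen purely for illustration; it does not come from the slides.

```python
from math import sqrt

# Hypothetical pmf for illustration only: value y_i -> probability P(y_i).
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

# Mean: probability-weighted average of the values.
mean = sum(y * p for y, p in pmf.items())

# Variance: probability-weighted average squared distance from the mean.
variance = sum(p * (y - mean) ** 2 for y, p in pmf.items())

# Standard deviation: square root of the variance.
std_dev = sqrt(variance)

print(mean, variance, std_dev)
```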

Page 26: 1614 probability-models and concepts

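The Poisson Model

A classic army problem: cavalry soldiers were occasionally killed when kicked by their own horse or mule.

An army had to plan for this problem.

The Poisson model helps to do so: in a well-known 1898 study, Bortkiewicz showed that yearly horse-kick deaths in Prussian army corps closely followed a Poisson distribution.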

Page 27: 1614 probability-models and concepts

Poisson Conditions

The Poisson model (or distribution) is commonly applicable when:

• We are modeling events which occur only “rarely”, where “rare” means “rare relative to opportunity for occurrence”.

• Our random variable will be the “number of occurrences of the event over the region of opportunity for occurrence”.

Page 28: 1614 probability-models and concepts

Poisson Conditions: Region of Opportunity

Examples of region of opportunity include:

• number of customers arriving per minute (or any other time unit);

• number of phone calls arriving at a switchboard per unit time;

• number of scars on the surface of a compact disk.

Generally “region of opportunity” is defined either temporally or spatially.

Page 29: 1614 probability-models and concepts

The Poisson Model is Integral to the Study of

Queueing Theory

Page 30: 1614 probability-models and concepts

The Poisson Model

• Defining our random variable as Y = “number of occurrences of the event over the region of opportunity”, y = 0, 1, 2, 3, ..., we have the Poisson probability model:

• P(y) = µ^y e^(-µ) / y! for y = 0, 1, 2, 3, ...

• where µ is the mean or average number of occurrences of the event over the region of opportunity and e = 2.7183 is the natural base.

Page 31: 1614 probability-models and concepts

Estimation of the Process Mean, µ

• The mean of the Poisson process is µ,

• The variance of the process is also µ, that is σ² = µ,

• so that the standard deviation is σ = √µ.

• In the following example we proceed as though µ is of known value. When this is not the case we simply estimate µ with X̄, the mean of the sample.

Page 32: 1614 probability-models and concepts

First Federal Bank of Centreville: A Queueing Example

First Federal Bank (FFB) of Centreville has an automatic teller machine (ATM) near the entrance of the bank.

Long lines at the ATM have sometimes led to congestion and perhaps a diminishing clientele. With a view toward improved customer service, FFB is considering the addition of one or more ATMs or, possibly, relocation of the current ATM.

During peak hours ATM users arrive in a manner described by a Poisson distribution with a mean of 1.7 customers per minute.

Page 33: 1614 probability-models and concepts

First Federal Bank of Centreville: The Probability Distribution

• What is the probability that no customers arrive in one minute during a peak business period?

• Solution: P(0) = 1.7^0 e^(-1.7)/0! = .1827

• Similarly, P(1) = 1.7^1 e^(-1.7)/1! = .3106

• Determine probabilities for 2, 3, ..., 9 customers.

The probability distribution appears on the next slide.

Page 34: 1614 probability-models and concepts

FFB of Centreville: Probability Distribution

Poisson Probabilities with µ = 1.7

 x    P(X = x)
 0    0.1827
 1    0.3106
 2    0.2640
 3    0.1496
 4    0.0636
 5    0.0216
 6    0.0061
 7    0.0015
 8    0.0003
 9    0.0001
10    0.0000

Poisson Cumulative Probabilities with µ = 1.7

 x    P(X ≤ x)
 0    0.1827
 1    0.4932
 2    0.7572
 3    0.9068
 4    0.9704
 5    0.9920
 6    0.9981
 7    0.9996
 8    0.9999
 9    1.0000
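Both tables can be regenerated from the Poisson formula. A minimal sketch using only the Python standard library; the function name `poisson_pmf` is ours, not from the slides.

```python
from math import exp, factorial

def poisson_pmf(y, mu):
    """P(Y = y) for a Poisson random variable with mean mu."""
    return mu ** y * exp(-mu) / factorial(y)

mu = 1.7  # mean arrivals per minute at the ATM during peak hours

pmf = [poisson_pmf(y, mu) for y in range(11)]

# Cumulative probabilities P(Y <= y) are running sums of the pmf.
cdf = []
running = 0.0
for p in pmf:
    running += p
    cdf.append(running)

for y in range(11):
    print(f"{y:2d}  {pmf[y]:.4f}  {cdf[y]:.4f}")
```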

Page 35: 1614 probability-models and concepts

First Federal Bank of Centreville: ATM Customer Probabilities

[Figure: bar chart of Poisson probabilities (0 to 0.35) for 0 through 9 customers per minute.]

Page 36: 1614 probability-models and concepts

First Federal Bank of Centreville: CDF Graph

[Figure: cumulative probabilities (0 to 1) for 0 through 9 customers per minute.]

The cdf graph above was constructed by adding the appropriate Poisson probabilities.

Page 37: 1614 probability-models and concepts

First Federal Bank of Centreville: Key Considerations

Key factors that FFB should address prior to making a decision include:

• What is the service rate (how quickly do customers complete their ATM transactions)?

• If long lines are forming during peak hours, the service rate may be less than the customer arrival rate, and addition of one or more ATMs may be necessary.

• If the problem is congestion, rather than excessive wait to use the ATM, the solution may be to simply move the ATM.

Page 38: 1614 probability-models and concepts

Model Adequacy: Chi-Square Goodness of Fit Testing

Page 39: 1614 probability-models and concepts

DOES THIS MODEL FIT? Chi-Square Goodness-of-Fit Tests

The purpose of χ² goodness-of-fit tests is to evaluate whether a particular probability distribution does an adequate job of modeling the behavior of the process under consideration. This sort of test can be applied to any model.

A “skeleton” or template for the chi-square goodness-of-fit test follows.

Page 40: 1614 probability-models and concepts

χ² Goodness of Fit Test - General Layout

1) H0: p1 = p10, p2 = p20, ... , pk = pk0

HA: at least one pi ≠ pi0

2) n = _______ α = _______

3) DR (decision rule): Reject H0 in favor of HA iff χ²calc > χ²crit = ___. Otherwise, fail to reject (FTR) H0.

4) χ²calc = Σ (Oi - n·pi0)²/(n·pi0) = Σ (Oi - Ei)²/Ei

5) Interpretation: Should relate to whether the hypothesized model adequately describes behavior of the process under consideration.

Page 41: 1614 probability-models and concepts

Generic Example: A computer manufacturer produces a disk drive which has three major causes of failure (A, B, C) and a variety of minor failure causes (D).

Suppose that historic failure rates are:

Due to A: .20   Due to B: .35   Due to C: .30   Due to D: .15

The manufacturer has worked on A, B, and C and believes that failures due to these causes have been reduced, so that, while fewer failures will occur, it is more likely that when one occurs, it will be due to D. To examine this claim the manufacturer will sample 200 failed disk drives manufactured since process changes were made. IF THE CHANGES HAD NO IMPACT then the numbers of these failed drives due to causes A, B, C, and D that would be EXPECTED are:

EA = npA0 = 200(.20) = 40   EB = npB0 = 200(.35) = 70

EC = npC0 = 200(.30) = 60   ED = npD0 = 200(.15) = 30

Upon observation, suppose that we had OA = 28, OB = 66, OC = 46, OD = 60. Test the appropriate hypothesis at the α = .05 level.

CONTINUED NEXT PAGE

Page 42: 1614 probability-models and concepts

Failure Mode Profile Example - Continued

1) H0: pA = .20, pB = .35, pC = .30, pD = .15

HA: at least one pi ≠ pi0 for i = A, B, C, D

2) n = 200 α = .05

3) DR: Reject H0 in favor of HA iff χ²calc > χ²crit = 7.8147. Otherwise, FTR H0. Note: There are (k-1) = 3 degrees of freedom.

4) χ²calc = Σ (Oi - n·pi0)²/(n·pi0) = Σ (Oi - Ei)²/Ei

= (28-40)²/40 + (66-70)²/70 + (46-60)²/60 + (60-30)²/30 = 3.6000 + 0.2286 + 3.2667 + 30.0000 = 37.0953

5) Interpretation: Since χ²calc exceeds χ²crit, we can conclude that the historic failure mode distribution no longer applies (reject H0 in favor of HA). So how has the distribution changed? The answer is embedded in the individual category contributions to χ²calc: larger contributions indicate where the changes have occurred. There are reductions in A and C and no obvious change in B, while the various failures that make up D now comprise a (proportionally) larger share of the failures.
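The calculation in step 4, including the per-category contributions used in the interpretation, can be sketched as follows. The critical value is copied from the slide rather than computed, and the variable names are illustrative.

```python
# Observed failure counts and historic (null-hypothesis) proportions from the example.
observed = {"A": 28, "B": 66, "C": 46, "D": 60}
p0 = {"A": 0.20, "B": 0.35, "C": 0.30, "D": 0.15}

n = sum(observed.values())                       # 200 failed drives
expected = {k: n * p for k, p in p0.items()}     # E_i = n * p_i0

# Contribution of each category: (O_i - E_i)^2 / E_i
contrib = {k: (observed[k] - expected[k]) ** 2 / expected[k] for k in observed}
chi2_calc = sum(contrib.values())

chi2_crit = 7.8147   # tabled value for alpha = .05 and 3 df, as given on the slide
reject_h0 = chi2_calc > chi2_crit

for k in observed:
    print(f"{k}: O={observed[k]:3d}  E={expected[k]:5.1f}  contribution={contrib[k]:.4f}")
print(f"chi-square = {chi2_calc:.4f}, reject H0: {reject_h0}")
```

Printing the contributions alongside O and E makes the direction of each change visible, which the test statistic alone does not show.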

Page 43: 1614 probability-models and concepts

Chi-Square Goodness of Fit Test for the Poisson Distribution

A sample of 120 minutes selected during rush periods at FFB gave the following numbers of customers arriving during each of those 120 minutes. Are these data consistent with a Poisson distribution with a mean of 1.7 customers per minute, as previously stated? Test the appropriate hypothesis at the α = .10 level of significance.

Number of Customers   0    1    2    3    4 or more
Frequency             25   42   35   9    9

Page 44: 1614 probability-models and concepts

FFB of Centreville: Poisson Goodness of Fit Test

Customers/minute   Prob.    Obs (O)   Exp (E)   (O-E)²/E
0                  0.1827    25       21.924    0.4316
1                  0.3106    42       37.272    0.5998
2                  0.2640    35       31.680    0.3479
3                  0.1496     9       17.952    4.4640
≥ 4                0.0932     9       11.184    0.4265
Total              1.00     120      120        6.2698 = χ²calc

With α = .10 and (k-1) = 4 df, the critical value is 7.7794.

Page 45: 1614 probability-models and concepts

FFB of Centreville - Continued

1) H0: the number of customers arriving per minute is Poisson distributed with a mean of 1.7. OR p(0) = .1827 p(1) = .3106 p(2) = .2640 p(3) = .1496 p(4+) = .0932

HA: the number of customers arriving per minute is not Poisson with µ = 1.7

2) n = 120 and α = .10

3) DR: Reject H0 in favor of HA iff χ²calc > χ²crit = 7.7794. Otherwise, FTR H0. (NOTE - THERE ARE 4 DF)

4) χ²calc = 6.2698 (calculations on previous slide)

5) FTR H0. In this case, the number of customers arriving per minute during the business rush at FFB of Centreville is reasonably well-modeled by a Poisson distribution with a mean of 1.7.

As a modification: if we had not had information about the mean number of customers arriving per minute, we would have had to estimate this value with the sample mean and then determine the estimated probabilities. This would have cost an additional degree of freedom (i.e., df = (k-1) - 1 = 3).
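The goodness-of-fit calculation for the ATM data can be sketched as below. The code uses unrounded Poisson probabilities, so the statistic differs slightly in the third decimal from the slide value of 6.2698, which was built from probabilities rounded to four places. The critical value is copied from the slide.

```python
from math import exp, factorial

mu = 1.7                         # hypothesized mean arrivals per minute
observed = [25, 42, 35, 9, 9]    # minutes with 0, 1, 2, 3, and 4-or-more arrivals
n = sum(observed)                # 120 sampled minutes

# Poisson probabilities for 0..3; the last cell is the remaining tail P(Y >= 4).
probs = [mu ** y * exp(-mu) / factorial(y) for y in range(4)]
probs.append(1 - sum(probs))

expected = [n * p for p in probs]
chi2_calc = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

chi2_crit = 7.7794               # alpha = .10, 4 df, as given on the slide
reject_h0 = chi2_calc > chi2_crit

print(f"chi-square = {chi2_calc:.4f}, reject H0: {reject_h0}")
```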

Page 46: 1614 probability-models and concepts

Binomial Conditions

Suppose that there are:

• two possible outcomes to an experiment which are mutually exclusive and exhaustive (refer to these generically as “success” and “failure”);

• a predetermined sample size, n;

• the probability of “success” is a constant, p, and the probability of “failure” is a constant, (1-p);

• the condition of one item is not influenced by the condition of any other item (this is called independence).

Collectively, these are the binomial conditions.

Page 47: 1614 probability-models and concepts

Binomial Probability Model

• When the binomial conditions are present, and the random variable Y is defined as the number of “successes” out of n items sampled, then the model which determines probabilities for the various values of Y is given by:

• P(Y = y) = [nCy] p^y (1-p)^(n-y)

• where nCy = n!/[y!(n-y)!] is read as the number of combinations of n things selected y-at-a-time,

• with x! = x(x-1)(x-2)...(1) for any positive integer x (and 0! = 1),

• so that, for example, 5! = 5(4)(3)(2)(1) = 120

Page 48: 1614 probability-models and concepts

Binomial Mean, Variance and Standard Deviation

• Although the formulas previously presented can be used to determine the values of µ_Y, σ²_Y and σ_Y, the following results are more easily applied in the binomial case:

µ_Y = np

σ²_Y = np(1-p)

σ_Y = √(np(1-p))

Page 49: 1614 probability-models and concepts

Estimation of p

The binomial parameter, p, is thought of as the “probability that any single item sampled is identified as a ‘success’”.

Frequently this value will be unknown and will need to be estimated from sample information.

p is estimated as simply x/n, where x is the number of ‘successes’ in the sample of n items. This estimate is often denoted by p̂. Similarly, the estimate of (1-p) is (1-p̂).

Page 50: 1614 probability-models and concepts

The Electronix Store

In a competitive local retail electronics market, the probability that a randomly selected “customer” browsing in The Electronix Store will make a purchase is .2.

If 6 “customers” are randomly selected, what is the probability that exactly 2 of these individuals will make a purchase?

This and similar questions can be addressed via the binomial distribution.

Page 51: 1614 probability-models and concepts

The Electronix Store

We identify:

n = 6 customers

p = .2 = probability that a customer buys

Y = number of the six customers who buy

Thus we see that:

µ_Y = np = 6(.2) = 1.2 Customers

σ²_Y = np(1-p) = 6(.2)(.8) = .96

σ_Y = √.96 = .98 Customers

Page 52: 1614 probability-models and concepts

The Electronix Store

• We have:

– P(0) = .2621   P(1) = .3932

– P(2) = .2458   P(3) = .0819

– P(4) = .0154 = {6!/[4!2!]}(.2)^4(.8)^2 = 15(.0016)(.64)

– P(5) = .0015   P(6) = .0001 (or .000064)
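The seven probabilities, together with the mean and standard deviation from the previous slide, can be reproduced with a few lines. A minimal sketch; `binom_pmf` is our own helper name.

```python
from math import comb, sqrt

def binom_pmf(y, n, p):
    """P(Y = y): probability of y successes in n independent trials."""
    return comb(n, y) * p ** y * (1 - p) ** (n - y)

n, p = 6, 0.2   # six browsing customers, each buying with probability .2

pmf = [binom_pmf(y, n, p) for y in range(n + 1)]

mean = n * p                     # np
std_dev = sqrt(n * p * (1 - p))  # square root of np(1-p)

for y, prob in enumerate(pmf):
    print(f"P({y}) = {prob:.4f}")
print(f"mean = {mean:.2f}, standard deviation = {std_dev:.2f}")
```

The shortcut mean np agrees with the general definition Σ y·P(y) applied to this pmf.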

Page 53: 1614 probability-models and concepts

The Electronix Store: Customer Purchase Probabilities

[Figure: bar chart of P(Y = y) (0 to 0.4) for y = 0 through 6 purchasing customers.]

Page 54: 1614 probability-models and concepts

The Electronix Store

We may require answers to such questions as:

• “What is the probability that no more than two of six customers make a purchase?”

• “What is the probability that at least four of six customers make a purchase?”

• “How many cash registers are needed?”

Answers to these and similar questions can be investigated through study such as we have undertaken.

Page 55: 1614 probability-models and concepts

The Electronix Store: Cumulative Probabilities

[Figure: cumulative probabilities (0 to 1) for 0 through 6 purchasing customers.]

• A tabulation of the “less than or equal to” probabilities is called a “cumulative distribution function” or cdf. The Electronix Store cdf appears above.
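The first two questions posed earlier (“no more than two” and “at least four”) are answered by summing the appropriate binomial probabilities. A short sketch under the same n = 6, p = .2 assumptions; the variable names are illustrative.

```python
from math import comb

n, p = 6, 0.2
pmf = [comb(n, y) * p ** y * (1 - p) ** (n - y) for y in range(n + 1)]

# cdf[y] = P(Y <= y), built as running sums of the pmf.
cdf = []
running = 0.0
for prob in pmf:
    running += prob
    cdf.append(running)

p_at_most_two = cdf[2]          # P(Y <= 2)
p_at_least_four = 1 - cdf[3]    # P(Y >= 4) = 1 - P(Y <= 3)

print(f"P(no more than two buy) = {p_at_most_two:.4f}")
print(f"P(at least four buy)    = {p_at_least_four:.4f}")
```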

Page 56: 1614 probability-models and concepts

The Electronix Store: Strategy

Application of this information might spark discussion on:

• staffing decisions,

• sales representative specialization,

• focus on merchandise, value, and customer service.

Page 57: 1614 probability-models and concepts

χ² Goodness-of-Fit Test: Binomial Example

Oil & Gas Exploration is both expensive and risky. The average cost of a “dry hole” is in excess of $20 million. New technologies are always under development in an effort to reduce the likelihood of drilling a “dry hole”, with the result being increased profitability. Suppose an experimental technology has been developed that claims to have an 80% success rate (i.e., only 20% dry holes). This technology was tested by drilling four holes and counting the number of productive wells. This was done 100 times, each time counting the number of productive wells. The data are recorded below:

Number of productive wells   0    1    2    3    4
Observed Frequency           3    6    22   41   28

Test the appropriate hypothesis at the α = .01 level of significance.

Page 58: 1614 probability-models and concepts

Oil & Gas Exploration Example

1) H0: the new technology delivers success according to a binomial distribution with p = .8, or p(0 or 1) = .0272 p(2) = .1536 p(3) = .4096 p(4) = .4096 (NOTE - SEE NEXT PAGE FOR THESE VALUES)

HA: the new technology does not deliver success according to a binomial distribution with p = .8.

2) n = 100 and α = .01

3) DR: Reject H0 in favor of HA iff χ²calc > χ²crit = 11.3449. Otherwise, FTR H0.

4) χ²calc = 21.4705 (calculations on next slide)

5) Reject H0 in favor of HA. In this case, note that “O” tends to be greater than “E” for lower numbers of successful wells, and the reverse for higher numbers of successful wells. This indicates that the success rate of the new technology is LESS THAN THE CLAIMED 80% rate.

Page 59: 1614 probability-models and concepts

Hits   Prob     Count   Expected
0      0.0016    3       0.16
1      0.0256    6       2.56
2      0.1536   22      15.36
3      0.4096   41      40.96
4      0.4096   28      40.96

Combined   C-Prob   C-Count   C-Expect   (O-E)^2/E
0-1        0.0272    9         2.72      14.4994
2          0.1536   22        15.36       2.8704
3          0.4096   41        40.96       0.0000
4          0.4096   28        40.96       4.1006

X^2calc = 21.4705
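The table above combines the 0 and 1 categories because their individual expected counts are small. A sketch of the same calculation follows; the pooling step and variable names are our own, and the critical value is copied from the slide.

```python
from math import comb

wells, p = 4, 0.8                  # four holes per trial, claimed success rate
observed = [3, 6, 22, 41, 28]      # trials with 0, 1, 2, 3, 4 productive wells
n = sum(observed)                  # 100 trials

probs = [comb(wells, k) * p ** k * (1 - p) ** (wells - k) for k in range(wells + 1)]

# Pool the "0" and "1" categories, as on the slide, before computing chi-square.
obs_c = [observed[0] + observed[1]] + observed[2:]
prob_c = [probs[0] + probs[1]] + probs[2:]
exp_c = [n * q for q in prob_c]

chi2_calc = sum((o - e) ** 2 / e for o, e in zip(obs_c, exp_c))

chi2_crit = 11.3449                # alpha = .01, 3 df, as given on the slide
reject_h0 = chi2_calc > chi2_crit

print(f"chi-square = {chi2_calc:.4f}, reject H0: {reject_h0}")
```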

Page 60: 1614 probability-models and concepts

Modified Oil & Gas Exploration Example (still binomial)

If p were unknown, then it would have to be estimated from the data. There is a cost to this: a lost degree of freedom. In general df = (k - 1) - m, where

k = number of categories,

-1 because the probabilities across all categories add to one (lacking only one probability, we can determine the other),

m = the number of parameters that must be estimated.

In this case, the estimate of p is this: a total of 400 wells were drilled (100 fields at 4 wells each). The number of productive wells was (3*0 + 6*1 + 28*2 + 41*3 + 22*4) = 273

So that our estimate of p is 273/400 = .6825. The modified calculations follow.

Page 61: 1614 probability-models and concepts

Modified Oil & Gas Exploration Example

MTB > pdf;
SUBC> binomial n=4 p=.6825.

BINOMIAL WITH N = 4 P = 0.682500

K      P(X = K)   Observed   Expected   (O-E)²/E
0      0.0102
1      0.0874
0-1    0.0976       9          9.76     0.0592   (combine these)
2      0.2817      28         28.17     0.0010
3      0.4037      41         40.37     0.0098
4      0.2170      22         21.70     0.0041

0.0742 = calculated value of χ²

MTB > invcdf .99;
SUBC> chis 2.
0.9900 9.2103 = critical value

Clearly we would FTR H0. So if you combine the information, you have not rejected the binomial distribution altogether, though you did reject the binomial distribution with p = .8. The binomial distribution with p = .6825 does an excellent job of modeling the performance of this new oil & gas exploration technology.
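The modified test can be sketched in the same way. Note that the tally on these slides pairs 28 with two wells and 22 with four wells, the reverse of the earlier frequency table; the sketch below uses the counts exactly as printed here. Because it keeps unrounded probabilities, its statistic differs slightly from the printed 0.0742. The critical value is copied from the output above.

```python
from math import comb

wells = 4
observed = [3, 6, 28, 41, 22]      # counts as printed on this slide (see note above)
n = sum(observed)                  # 100 trials

# Estimate p as total productive wells / total wells drilled.
productive = sum(k * o for k, o in enumerate(observed))
p_hat = productive / (n * wells)

probs = [comb(wells, k) * p_hat ** k * (1 - p_hat) ** (wells - k)
         for k in range(wells + 1)]

# Pool the "0" and "1" categories, then compute chi-square.
obs_c = [observed[0] + observed[1]] + observed[2:]
exp_c = [n * (probs[0] + probs[1])] + [n * q for q in probs[2:]]
chi2_calc = sum((o - e) ** 2 / e for o, e in zip(obs_c, exp_c))

k_categories, m_estimated = len(obs_c), 1
df = (k_categories - 1) - m_estimated      # (4 - 1) - 1 = 2

chi2_crit = 9.2103                 # alpha = .01, 2 df, from the output above
reject_h0 = chi2_calc > chi2_crit

print(f"p_hat = {p_hat:.4f}, df = {df}, chi-square = {chi2_calc:.4f}, reject H0: {reject_h0}")
```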

Page 62: 1614 probability-models and concepts

The Normal Probability Model

The “normal” or “Gaussian” distribution is the most commonly used of all probability models. This distribution is known perhaps most familiarly as the “bell curve”. The normal distribution serves as the assumed model of behavior for various phenomena, generally as an approximation. It is also foundational to the development of numerous commonly used statistical methods.

Page 63: 1614 probability-models and concepts

The Normal Distribution

• The normal distribution is described by the mathematical expression:

f(x) = (1/√(2πσ²)) exp(-(x-µ)²/(2σ²))

X is a random variable with mean µ and standard deviation σ; exp denotes e = 2.7183, the natural base, raised to the power expressed in the ( ). As will be seen, we need not work with the formula above.

Page 64: 1614 probability-models and concepts
Page 65: 1614 probability-models and concepts

The Normal Probability Model

[Figure: histogram representation of the normal distribution, vertical axis 0 to 0.2.]

A histogram representation of the normal distribution might appear as this one.

The normal distribution is symmetric about its mean, µ. It is also well-tabled as the “standard normal distribution” with µ = 0 and σ = 1.

Page 66: 1614 probability-models and concepts

Table Use - Relationships

Since the normal distribution is

• a probability distribution, with total area under the curve equal to 1, and

• symmetric about its mean, µ,

we have, writing A(Z*) for the tabled area under the curve between 0 and Z*:

P(Z > Z*) = .5 - A(Z*) where Z* > 0

A(-Z*) = A(Z*) by symmetry.

Knowing these few relationships, any needed probabilities can be found. Only positive values of Z need be tabled.

Page 67: 1614 probability-models and concepts

Z Table Use Examples

• Using available Z tables determine:

• A(1.33) and A(-1.33)

• The probability of being between Z = -1.33 and +1.33.

• The probability that Z is at most 1.33

• The probability that Z is at least 1.33

• The probability that Z is at most -1.33

• The probability that Z is between -.75 and +1.2

• The probability that Z is between +.50 and +1.2

Page 68: 1614 probability-models and concepts

[Figure: standard normal curve with Z = -1.33, 0, .5, .75, 1.2 and 1.33 marked on the horizontal axis.]
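The table exercises can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library in place of a printed table; the helper `A` mirrors the tabled area between 0 and Z, and the variable names are ours.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def A(z):
    """Area under the standard normal curve between 0 and z (symmetric in z)."""
    return Z.cdf(abs(z)) - 0.5

area_133 = A(1.33)                               # A(1.33), equal to A(-1.33)
p_between_sym = Z.cdf(1.33) - Z.cdf(-1.33)       # P(-1.33 < Z < 1.33) = 2 * A(1.33)
p_at_most = Z.cdf(1.33)                          # P(Z <= 1.33) = .5 + A(1.33)
p_at_least = 1 - Z.cdf(1.33)                     # P(Z >= 1.33) = .5 - A(1.33)
p_at_most_neg = Z.cdf(-1.33)                     # P(Z <= -1.33) = .5 - A(1.33)
p_between_a = Z.cdf(1.2) - Z.cdf(-0.75)          # A(.75) + A(1.2)
p_between_b = Z.cdf(1.2) - Z.cdf(0.50)           # A(1.2) - A(.50)

for name, value in [("A(1.33)", area_133), ("between -1.33 and 1.33", p_between_sym),
                    ("at most 1.33", p_at_most), ("at least 1.33", p_at_least),
                    ("at most -1.33", p_at_most_neg), ("between -.75 and 1.2", p_between_a),
                    ("between .50 and 1.2", p_between_b)]:
    print(f"{name}: {value:.4f}")
```

Each answer is a sum or difference of tabled areas, so the results should agree with the Z table to about four decimal places.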

Page 69: 1614 probability-models and concepts

Z Table - Selected Portions

Z     0.00   0.01   0.02   0.03   0.04   0.05   ...   0.09
0.0   .0000  .0040  .0080  .0120  .0160  .0199  ...   .0359
0.5   .1915  .1950  .1985  .2019  .2054  .2088  ...   .2224
0.7   .2580  .2611  .2642  .2673  .2704  .2734  ...   .2852
1.2   .3849  .3869  .3888  .3907  .3925  .3944  ...   .4015
1.3   .4032  .4049  .4066  .4082  .4099  .4115  ...   .4177

Page 70: 1614 probability-models and concepts

Inverse Use of the Z Table

In application, there are two common variations requiring opposite use of tables of the standard normal distribution.

We have illustrated the first variation where, given one or more values of Z, we can determine the needed area under the curve (e.g. the needed probability).

The “inverse” situation is one in which an area under the curve is designated, and the corresponding value(s) of Z are obtained.

Page 71: 1614 probability-models and concepts

Inverse Use of the Z Table

The inverse approach is to:

• locate the appropriate area or probability in the body of the table,

• then move to the corresponding top and left table margins to identify the appropriate value(s) of Z.

From this we have X = µ + Zσ

Page 72: 1614 probability-models and concepts

Application of the Inverse Normal

[Figure: normal curve with a known area A(Z) marked and the corresponding value of Z shown as “?”.]

Page 73: 1614 probability-models and concepts

The Normal Distribution in General

We can determine probabilities for any normally distributed process performance measure or PPM, X, by determining the corresponding value of Z, that is, Z = (X - µ)/σ

Inversely, given an area under the curve, we can determine a needed value of X as: X = µ + Zσ

Page 74: 1614 probability-models and concepts

The SUPER Market

The SUPER Market, a major metropolitan area superstore chain, offers delivery service to addresses within a defined region.

The SUPER Market guarantees delivery within two hours of the time that the order is received. If this guarantee is not met, the customer receives a 10% discount for each 30 minutes late.

Page 75: 1614 probability-models and concepts

The SUPER Market

• Delivery time is approximately normally distributed with an average delivery time of 1 hour and 20 minutes and a standard deviation of 20 minutes. That is, µ = 80 min. and σ = 20 min.

Guaranteed delivery within two hours!

Page 76: 1614 probability-models and concepts

The SUPER Market: Time to Delivery

• Inverse Problems

• Given a designated probability, what is the corresponding value of Z and, in turn, X = delivery time?
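Both directions can be sketched for the delivery-time model with µ = 80 and σ = 20 minutes. The forward question uses the two-hour guarantee from the slides; the 95% target in the inverse question is a hypothetical value chosen only to illustrate the method.

```python
from statistics import NormalDist

mu, sigma = 80, 20                       # minutes, from the slides
delivery = NormalDist(mu=mu, sigma=sigma)

# Forward problem: probability that a delivery misses the two-hour guarantee.
z_guarantee = (120 - mu) / sigma         # Z = (X - mu) / sigma = 2.0
p_late = 1 - delivery.cdf(120)

# Inverse problem: the time within which a target share of deliveries arrive.
target = 0.95                            # hypothetical probability, for illustration
x_target = delivery.inv_cdf(target)      # equivalent to X = mu + Z * sigma
z_target = (x_target - mu) / sigma

print(f"Z at guarantee = {z_guarantee:.2f}, P(late) = {p_late:.4f}")
print(f"{target:.0%} of deliveries arrive within {x_target:.1f} minutes (Z = {z_target:.3f})")
```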

Page 77: 1614 probability-models and concepts

A Goodness of Fit Test for the Normal Distribution

IS DELIVERY TIME NORMAL? To determine whether delivery times for the SUPER Market are, within reason, normally distributed, we would select a random sample of delivery times and apply any of a number of goodness of fit techniques.

While the chi-square goodness of fit test could be applied, a graphical procedure, the normal probability plot, will be illustrated. This is augmented by a more formal procedure, the Anderson-Darling test.

To proceed we will select a sample of, say, 40 delivery times. These appear on the next slide.

Page 78: 1614 probability-models and concepts

40 Sampled Delivery Times

56 89 123 97 68 79 80 96 74 108
86 65 102 96 90 88 67 87 58 71
72 83 90 59 76 73 82 88 63 114
86 54 109 43 69 47 90 96 52 117

            N    Mean    Median   Std. Dev.
Del_Time    40   81.07   82.50    19.45
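The summary line can be reproduced from the 40 sampled times. A minimal sketch using the standard library; note that `stdev` is the sample standard deviation (divisor n - 1), which is what matches the value reported above.

```python
from statistics import mean, median, stdev

# The 40 sampled delivery times (minutes), copied from the slide.
times = [56, 89, 123, 97, 68, 79, 80, 96, 74, 108,
         86, 65, 102, 96, 90, 88, 67, 87, 58, 71,
         72, 83, 90, 59, 76, 73, 82, 88, 63, 114,
         86, 54, 109, 43, 69, 47, 90, 96, 52, 117]

n = len(times)
avg = mean(times)
med = median(times)
sd = stdev(times)   # sample standard deviation, divisor n - 1

print(f"N = {n}, mean = {avg:.3f}, median = {med:.2f}, std. dev. = {sd:.3f}")
```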

Page 79: 1614 probability-models and concepts

Normal Probability Plot: Sampled Delivery Times from the SUPER Market

[Figure: normal probability plot of Del_Time (40 to 120 minutes) against cumulative probability (.001 to .999).]

Anderson-Darling Normality Test

Average: 81.075

Std Dev: 19.448

N of data: 40

A-Squared: 0.166

p-value: 0.934

Normally distributed values should plot VERY close to a straight line. While this is a judgment call, a more objective approach is to examine the p-value from the Anderson-Darling test: if the p-value is less than α, then normality is questionable.
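The A-Squared statistic can be approximated by hand. The sketch below assumes the usual computational form, A² = -n - (1/n) Σ (2i-1)[ln F(y_(i)) + ln(1 - F(y_(n+1-i)))], where the y_(i) are the sorted observations and F is the normal cdf fitted with the sample mean and sample standard deviation. Whether the plotting software applies any further adjustment is not stated in the slides, so the result should be close to, but need not exactly equal, the reported value. The p-value is not computed here.

```python
from math import log
from statistics import NormalDist, mean, stdev

# The 40 sampled delivery times (minutes), copied from the slide.
times = [56, 89, 123, 97, 68, 79, 80, 96, 74, 108,
         86, 65, 102, 96, 90, 88, 67, 87, 58, 71,
         72, 83, 90, 59, 76, 73, 82, 88, 63, 114,
         86, 54, 109, 43, 69, 47, 90, 96, 52, 117]

n = len(times)
fitted = NormalDist(mean(times), stdev(times))   # normal model fitted to the sample

# F evaluated at the sorted observations y_(1) <= ... <= y_(n).
F = [fitted.cdf(x) for x in sorted(times)]

# Anderson-Darling statistic; F[n - i] is y_(n+1-i) in zero-based indexing.
a_squared = -n - sum(
    (2 * i - 1) * (log(F[i - 1]) + log(1 - F[n - i]))
    for i in range(1, n + 1)
) / n

print(f"A-squared = {a_squared:.3f}")
```

A small A-Squared means the sorted data track the fitted normal cdf closely, which is consistent with the large p-value and the near-linear probability plot.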