Statistics for Managers Using Microsoft Excel, 3rd Edition

Post on 03-Jan-2016



Transcript of Statistics for Managers Using Microsoft Excel, 3rd Edition

Statistics for Managers Using Microsoft Excel

3rd Edition

Chapter 5: The Normal Distribution and Sampling Distributions

Chapter Topics

The normal distribution
The standardized normal distribution
Evaluating the normality assumption
The exponential distribution

Chapter Topics (continued)

Introduction to sampling distribution
Sampling distribution of the mean
Sampling distribution of the proportion
Sampling from finite population

Continuous Probability Distributions

Continuous random variable: values from an interval of numbers; absence of gaps
Continuous probability distribution: distribution of a continuous random variable
Most important continuous probability distribution: the normal distribution

The Normal Distribution

“Bell shaped”
Symmetrical
Mean, median and mode are equal
Interquartile range equals 1.33 $\sigma$
Random variable has infinite range

[Figure: bell-shaped curve of f(X) against X, with the mean, median and mode at the center]

The Mathematical Model

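$$f(X) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{1}{2\sigma^{2}}(X-\mu)^{2}}$$

$f(X)$: density of random variable $X$
$\pi \approx 3.14159$; $e \approx 2.71828$
$\mu$: population mean
$\sigma$: population standard deviation
$X$: value of random variable ($-\infty < X < \infty$)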

Expectation

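$$\begin{aligned}
E(X) &= \int_{-\infty}^{\infty} x\,\frac{1}{\sqrt{2\pi\sigma^{2}}}\,e^{-(x-\mu)^{2}/2\sigma^{2}}\,dx \\
&= \int_{-\infty}^{\infty} (x-\mu)\,\frac{1}{\sqrt{2\pi\sigma^{2}}}\,e^{-(x-\mu)^{2}/2\sigma^{2}}\,d(x-\mu) + \mu\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^{2}}}\,e^{-(x-\mu)^{2}/2\sigma^{2}}\,dx \\
&= 0 + \mu = \mu
\end{aligned}$$

The first integral is 0 because its integrand is odd about $\mu$; the second is the total area under the density, which is 1.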

Variance

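$$\begin{aligned}
E\left[(X-\mu)^{2}\right] &= \int_{-\infty}^{\infty} (x-\mu)^{2}\,\frac{1}{\sqrt{2\pi\sigma^{2}}}\,e^{-(x-\mu)^{2}/2\sigma^{2}}\,dx \\
&= \sigma^{2}\int_{-\infty}^{\infty} y^{2}\,\frac{1}{\sqrt{2\pi}}\,e^{-y^{2}/2}\,dy \qquad \left(y = \frac{x-\mu}{\sigma}\right) \\
&= \sigma^{2}
\end{aligned}$$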

Many Normal Distributions

By varying the parameters $\mu$ and $\sigma$, we obtain different normal distributions

There are an infinite number of normal distributions

Finding Probabilities

Probability is the area under the curve!

$P(c \le X \le d) = ?$

[Figure: area under the curve f(X) between X = c and X = d]

Which Table to Use?

An infinite number of normal distributions means an infinite number of tables to look up!

Solution: The Cumulative Standardized Normal Distribution

Cumulative Standardized Normal Distribution Table (Portion)

Z      .00     .01     .02
0.0    .5000   .5040   .5080
0.1    .5398   .5438   .5478
0.2    .5793   .5832   .5871
0.3    .6179   .6217   .6255

Probabilities: for Z = 0.12, the cumulative area is .5478 (shaded area exaggerated)

$\mu_Z = 0$, $\sigma_Z = 1$

Only One Table is Needed

Standardizing Example

$$Z = \frac{X-\mu}{\sigma} = \frac{6.2-5}{10} = 0.12$$

Normal Distribution: $\mu = 5$, $\sigma = 10$, $X = 6.2$
Standardized Normal Distribution: $\mu_Z = 0$, $\sigma_Z = 1$, $Z = 0.12$

(Shaded area exaggerated)

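Example: $P(2.9 \le X \le 7.1) = .1664$

$$Z = \frac{X-\mu}{\sigma} = \frac{2.9-5}{10} = -.21 \qquad Z = \frac{X-\mu}{\sigma} = \frac{7.1-5}{10} = .21$$

Normal Distribution: $\mu = 5$, $\sigma = 10$, interval from 2.9 to 7.1
Standardized Normal Distribution: $\mu_Z = 0$, $\sigma_Z = 1$, interval from -0.21 to 0.21

The area on each side of the mean is .0832, so the total is .0832 + .0832 = .1664. (Shaded area exaggerated)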

Example: $P(2.9 \le X \le 7.1) = .1664$ (continued)

Cumulative Standardized Normal Distribution Table (Portion)

Z      .00     .01     .02
0.0    .5000   .5040   .5080
0.1    .5398   .5438   .5478
0.2    .5793   .5832   .5871
0.3    .6179   .6217   .6255

For Z = 0.21, the cumulative area is .5832 ($\mu_Z = 0$, $\sigma_Z = 1$; shaded area exaggerated)

Example: $P(2.9 \le X \le 7.1) = .1664$ (continued)

Cumulative Standardized Normal Distribution Table (Portion)

Z       .00     .01     .02
-0.3    .3821   .3783   .3745
-0.2    .4207   .4168   .4129
-0.1    .4602   .4562   .4522
0.0     .5000   .4960   .4920

For Z = -0.21, the cumulative area is .4168 ($\mu_Z = 0$, $\sigma_Z = 1$; shaded area exaggerated), so $P(2.9 \le X \le 7.1) = .5832 - .4168 = .1664$
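The table lookup in this example can be cross-checked in code. The following is a minimal sketch using Python's standard `statistics.NormalDist` (Python is not part of the slides, which use Excel and PHStat); the variable names are illustrative only.

```python
from statistics import NormalDist

# Values from the slide example: mean 5, standard deviation 10.
mu, sigma = 5, 10
lower, upper = 2.9, 7.1

# Standardize both endpoints: Z = (X - mu) / sigma.
z_lower = (lower - mu) / sigma   # about -0.21
z_upper = (upper - mu) / sigma   # about  0.21

# Cumulative areas under the standardized normal curve (mean 0, sd 1).
std_normal = NormalDist()
area_lower = std_normal.cdf(z_lower)
area_upper = std_normal.cdf(z_upper)

# Probability of the interval is the difference of the cumulative areas.
prob = area_upper - area_lower
print(round(area_lower, 4), round(area_upper, 4), round(prob, 4))
```

The computed probability should agree with the table result .1664 up to the rounding of the four-decimal table entries.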

Normal Distribution in PHStat

PHStat | Probability & Prob. Distributions | Normal …

Example in Excel spreadsheet

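Example: $P(X \ge 8) = .3821$

$$Z = \frac{X-\mu}{\sigma} = \frac{8-5}{10} = .30$$

Normal Distribution: $\mu = 5$, $\sigma = 10$, $X = 8$
Standardized Normal Distribution: $\mu_Z = 0$, $\sigma_Z = 1$, $Z = 0.30$

The area to the right of Z = 0.30 is .3821. (Shaded area exaggerated)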

Example: $P(X \ge 8) = .3821$ (continued)

Cumulative Standardized Normal Distribution Table (Portion)

Z      .00     .01     .02
0.0    .5000   .5040   .5080
0.1    .5398   .5438   .5478
0.2    .5793   .5832   .5871
0.3    .6179   .6217   .6255

For Z = 0.30, the cumulative area is .6179 ($\mu_Z = 0$, $\sigma_Z = 1$; shaded area exaggerated), so $P(X \ge 8) = 1 - .6179 = .3821$

Finding Z Values for Known Probabilities

What is Z given that the area between 0 and Z is 0.1217?

Cumulative Standardized Normal Distribution Table (Portion)

Z      .00     .01     .02
0.0    .5000   .5040   .5080
0.1    .5398   .5438   .5478
0.2    .5793   .5832   .5871
0.3    .6179   .6217   .6255

The cumulative area is .5000 + .1217 = .6217, which the table gives for $Z = .31$ ($\mu_Z = 0$, $\sigma_Z = 1$; shaded area exaggerated)

Recovering X Values for Known Probabilities

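$$X = \mu + Z\sigma = 5 + (.30)(10) = 8$$

Normal Distribution: $\mu = 5$, $\sigma = 10$, $X = ?$
Standardized Normal Distribution: $\mu_Z = 0$, $\sigma_Z = 1$, $Z = 0.30$

Area to the right of Z: .3821; area between 0 and Z: .1179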
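Reading the table backwards (from a probability to Z, then from Z to X) can also be sketched in code. The example below assumes Python's standard `statistics.NormalDist`, whose `inv_cdf` method plays the role of the reverse table lookup; it is an illustration, not the PHStat procedure.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Cumulative area .6217 (that is, .5000 + .1217) corresponds to Z of about 0.31.
z_a = std_normal.inv_cdf(0.6217)

# An upper-tail area of .3821 means a cumulative area of 1 - .3821 = .6179.
z_b = std_normal.inv_cdf(1 - 0.3821)

# Recover X from Z with X = mu + Z * sigma (mu = 5, sigma = 10 on the slide).
mu, sigma = 5, 10
x = mu + z_b * sigma

print(round(z_a, 2), round(z_b, 2), round(x, 1))
```

Because the table entries are rounded to four decimals, the recovered Z values match the slide values only after rounding to two decimals.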

Assessing Normality

Not all continuous random variables are normally distributed

It is important to evaluate how well the data set is approximated by a normal distribution

Assessing Normality (continued)

Construct charts
For small- or moderate-sized data sets, do the stem-and-leaf display and box-and-whisker plot look symmetric?
For large data sets, does the histogram or polygon appear bell-shaped?

Compute descriptive summary measures
Do the mean, median and mode have similar values?
Is the interquartile range approximately 1.33 $\sigma$?
Is the range approximately 6 $\sigma$?

Assessing Normality (continued)

Observe the distribution of the data set
Do approximately 2/3 of the observations lie between mean $\pm$ 1 standard deviation?
Do approximately 4/5 of the observations lie between mean $\pm$ 1.28 standard deviations?
Do approximately 19/20 of the observations lie between mean $\pm$ 2 standard deviations?

Evaluate normal probability plot
Do the points lie on or close to a straight line with positive slope?
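The numeric checks on the last two slides can be collected into one helper. This is a rough sketch in Python; the function name `normality_checks` and the simulated data are hypothetical and not part of the slides. The range check is only a loose guide, since the range of a normal sample keeps growing with the sample size.

```python
import random
from statistics import mean, stdev, quantiles

def normality_checks(data):
    """Return the rule-of-thumb ratios from the slides for one data set."""
    m, s = mean(data), stdev(data)
    q1, _, q3 = quantiles(data, n=4)  # first, second and third quartiles

    def share_within(k):
        # Fraction of observations within k standard deviations of the mean.
        return sum(abs(x - m) <= k * s for x in data) / len(data)

    return {
        "iqr_over_sd": (q3 - q1) / s,                   # about 1.33 if normal
        "range_over_sd": (max(data) - min(data)) / s,   # roughly 6 if normal
        "within_1_sd": share_within(1),                 # about 2/3
        "within_1.28_sd": share_within(1.28),           # about 4/5
        "within_2_sd": share_within(2),                 # about 19/20
    }

# Hypothetical data: 10,000 simulated normal values (mean 50, sd 10).
random.seed(0)
sample = [random.gauss(50, 10) for _ in range(10_000)]
results = normality_checks(sample)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

Ratios far from these guide values suggest that the normal distribution is a poor approximation for the data.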

Assessing Normality (continued)

Normal probability plot
Arrange data into ordered array
Find corresponding standardized normal quantile values
Plot the pairs of points with observed data values on the vertical axis and the standardized normal quantile values on the horizontal axis
Evaluate the plot for evidence of linearity
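The steps for the normal probability plot can be sketched as follows. The slides do not say how the quantile values are assigned; this sketch assumes the common convention of matching the i-th smallest of n observations to the cumulative area i/(n + 1). The function name and the data are hypothetical.

```python
from statistics import NormalDist

def normal_plot_points(data):
    """Pair each ordered observation with a standardized normal quantile."""
    ordered = sorted(data)                      # step 1: ordered array
    n = len(ordered)
    std_normal = NormalDist()
    # Step 2 (assumed convention): i-th smallest value gets area i / (n + 1).
    z_values = [std_normal.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    # Step 3: (Z on the horizontal axis, observed value on the vertical axis).
    return list(zip(z_values, ordered))

ages = [34, 41, 29, 52, 47, 38, 45]  # hypothetical data
for z, x in normal_plot_points(ages):
    print(f"{z:6.2f}  {x}")
```

Plotting these pairs and judging whether they fall close to a straight line is the final step listed on the slide.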

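Assessing Normality (continued)

Normal Probability Plot for Normal Distribution

Look for Straight Line!

[Figure: observed values X (vertical axis, 30 to 90) plotted against standardized normal quantiles Z (horizontal axis, -2 to 2)]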

Normal Probability Plot

[Figure: four normal probability plots, for Left-Skewed, Right-Skewed, Rectangular and U-Shaped distributions; each plots X (30 to 90) against Z (-2 to 2)]

Exponential Distributions

$$P(\text{arrival time} < X) = 1 - e^{-\lambda X}$$

$X$: any value of continuous random variable
$\lambda$: the population average number of arrivals per unit of time
$1/\lambda$: average time between arrivals
$e \approx 2.71828$

e.g.: Drivers arriving at a toll bridge; customers arriving at an ATM

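Exponential Distributions (continued)

Describes time or distance between events
Used for queues

Density function: $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$, the derivative of $P(\text{arrival time} < x) = 1 - e^{-\lambda x}$

Parameter: $\lambda$

[Figure: density f(X) against X for $\lambda = 0.5$ and $\lambda = 2.0$]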

Example

e.g.: Customers arrive at the checkout line of a supermarket at the rate of 30 per hour. What is the probability that the arrival time between consecutive customers is greater than five minutes?

$\lambda = 30$, $X = 5/60$ hours

$$P(\text{arrival time} > X) = 1 - P(\text{arrival time} \le X) = 1 - \left(1 - e^{-30(5/60)}\right) = e^{-2.5} = .0821$$
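The arithmetic in this example is short enough to check directly. Below is a minimal Python sketch; the function name `prob_arrival_time_greater` is hypothetical and stands in for the PHStat routine.

```python
import math

def prob_arrival_time_greater(rate, x):
    """P(arrival time > x) = e^(-rate * x) for an exponential distribution."""
    return math.exp(-rate * x)

rate_per_hour = 30     # lambda: 30 customers per hour
x_hours = 5 / 60       # five minutes expressed in hours, the same unit as lambda

p = prob_arrival_time_greater(rate_per_hour, x_hours)
print(round(p, 4))
```

The key step is expressing X in the same time unit as the rate, which is why five minutes becomes 5/60 hours.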

Exponential Distribution in PHStat

PHStat | Probability & Prob. Distributions | Exponential

Example in Excel spreadsheet

Why Study Sampling Distributions

Sample statistics are used to estimate population parameters
e.g.: $\bar{X} = 50$ estimates the population mean $\mu$

Problems: different samples provide different estimates
Large samples give better estimates; large samples cost more
How good is the estimate?

Approach to solution: theoretical basis is sampling distribution

Sampling Distribution

Theoretical probability distribution of a sample statistic

Sample statistic is a random variable, e.g.: sample mean, sample proportion

Results from taking all possible samples of the same size

Developing Sampling Distributions

Assume there is a population …
Population size $N = 4$
Random variable, X, is age of individuals
Values of X: 18, 20, 22, 24, measured in years (individuals A, B, C, D)

Developing Sampling Distributions (continued)

Summary Measures for the Population Distribution

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{18+20+22+24}{4} = 21$$

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i-\mu)^{2}}{N}} = 2.236$$

[Figure: Uniform Distribution, P(X) (0 to .3) against X for A (18), B (20), C (22), D (24)]

Developing Sampling Distributions (continued)

All Possible Samples of Size n = 2 (16 samples taken with replacement)

1st Obs    2nd Observation
           18       20       22       24
18         18,18    18,20    18,22    18,24
20         20,18    20,20    20,22    20,24
22         22,18    22,20    22,22    22,24
24         24,18    24,20    24,22    24,24

16 Sample Means

1st Obs    2nd Observation
           18    20    22    24
18         18    19    20    21
20         19    20    21    22
22         20    21    22    23
24         21    22    23    24

Developing Sampling Distributions (continued)

Sampling Distribution of All Sample Means

16 Sample Means

1st Obs    2nd Observation
           18    20    22    24
18         18    19    20    21
20         19    20    21    22
22         20    21    22    23
24         21    22    23    24

[Figure: Sample Means Distribution, $P(\bar{X})$ (0 to .3) against $\bar{X}$ = 18, 19, 20, 21, 22, 23, 24]

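Developing Sampling Distributions (continued)

Summary Measures of Sampling Distribution

$$\mu_{\bar{X}} = \frac{\sum_{i=1}^{N} \bar{X}_i}{N} = \frac{18+19+19+\cdots+24}{16} = 21$$

$$\sigma_{\bar{X}} = \sqrt{\frac{\sum_{i=1}^{N} \left(\bar{X}_i-\mu_{\bar{X}}\right)^{2}}{N}} = \sqrt{\frac{(18-21)^{2}+(19-21)^{2}+\cdots+(24-21)^{2}}{16}} = 1.58$$

(Here $N = 16$, the number of possible samples.)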

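Comparing the Population with its Sampling Distribution

Population ($N = 4$): $\mu = 21$, $\sigma = 2.236$
[Figure: P(X) (0 to .3) against X for A (18), B (20), C (22), D (24)]

Sample Means Distribution ($n = 2$): $\mu_{\bar{X}} = 21$, $\sigma_{\bar{X}} = 1.58$
[Figure: $P(\bar{X})$ (0 to .3) against $\bar{X}$ = 18, 19, 20, 21, 22, 23, 24]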
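The four-person population is small enough to enumerate every sample in code. The sketch below, in Python with only standard-library modules, lists all 16 samples of size 2 drawn with replacement and recomputes the summary measures; the variable names are illustrative.

```python
from collections import Counter
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [18, 20, 22, 24]   # ages of individuals A, B, C, D
n = 2                           # sample size

# Population summary measures (pstdev divides by N, as on the slide).
mu = mean(population)
sigma = pstdev(population)

# All possible ordered samples of size n taken with replacement.
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

# Summary measures of the sampling distribution of the mean.
mu_xbar = mean(sample_means)
sigma_xbar = pstdev(sample_means)

# Probability of each distinct sample mean (count / number of samples).
counts = Counter(sample_means)
distribution = {m: counts[m] / len(samples) for m in sorted(counts)}

print(len(samples), mu, round(sigma, 3))
print(mu_xbar, round(sigma_xbar, 2), round(sigma / sqrt(n), 2))
print(distribution)
```

The standard deviation of the 16 sample means equals $\sigma/\sqrt{n}$, which is the relationship used for sampling with replacement.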

Properties of Summary Measures

$\mu_{\bar{X}} = \mu$, i.e., $\bar{X}$ is unbiased

Standard error (standard deviation) of the sampling distribution, $\sigma_{\bar{X}}$, is less than the standard error of other unbiased estimators

For sampling with replacement: $\sigma_{\bar{X}} = \sigma/\sqrt{n}$; as $n$ increases, $\sigma_{\bar{X}}$ decreases

Unbiasedness

[Figure: two sampling distributions $P(\bar{X})$ against $\bar{X}$, one labeled Unbiased and one labeled Biased]

Less Variability

[Figure: Sampling Distribution of Median compared with Sampling Distribution of Mean, $P(\bar{X})$ against $\bar{X}$]

Effect of Large Sample

[Figure: sampling distributions $P(\bar{X})$ against $\bar{X}$ for a larger sample size and a smaller sample size]

When the Population is Normal

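Central Tendency: $\mu_{\bar{X}} = \mu$

Variation (Sampling with Replacement): $\sigma_{\bar{X}} = \sigma/\sqrt{n}$

Population Distribution: $\mu = 50$, $\sigma = 10$

Sampling Distributions ($\mu_{\bar{X}} = 50$)
$n = 4$: $\sigma_{\bar{X}} = 5$
$n = 16$: $\sigma_{\bar{X}} = 2.5$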

When the Population is Not Normal

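Central Tendency: $\mu_{\bar{X}} = \mu$

Variation (Sampling with Replacement): $\sigma_{\bar{X}} = \sigma/\sqrt{n}$

Population Distribution: $\mu = 50$, $\sigma = 10$

Sampling Distributions ($\mu_{\bar{X}} = 50$)
$n = 4$: $\sigma_{\bar{X}} = 5$
$n = 30$: $\sigma_{\bar{X}} = 1.8$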
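The standard errors quoted on these two slides follow from $\sigma_{\bar{X}} = \sigma/\sqrt{n}$ with $\sigma = 10$. A short Python sketch (the helper name `standard_error` is hypothetical) reproduces them:

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard error of the mean for sampling with replacement."""
    return sigma / sqrt(n)

sigma = 10  # population standard deviation from the slides
for n in (4, 16, 30):
    print(n, round(standard_error(sigma, n), 1))
```

Quadrupling the sample size from 4 to 16 halves the standard error, from 5 to 2.5.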

Central Limit Theorem

As sample size gets large enough, the sampling distribution of $\bar{X}$ becomes almost normal regardless of the shape of the population

How Large is Large Enough?

For most distributions, n > 30
For fairly symmetric distributions, n > 15
For normal distribution, the sampling distribution of the mean is always normally distributed
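The central limit theorem can be illustrated with a small simulation. The sketch below is an assumption-laden example rather than anything from the slides: it draws 5,000 samples of size n = 30 from a right-skewed exponential population (mean 1, standard deviation 1) and checks whether the sample means behave roughly like a normal distribution with standard error $\sigma/\sqrt{n}$.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(1)       # fixed seed so the run is repeatable
n = 30               # sample size
num_samples = 5000   # number of simulated samples

# Skewed population: exponential with rate 1 (mean 1, standard deviation 1).
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

center = mean(sample_means)      # should be close to the population mean, 1
spread = pstdev(sample_means)    # should be close to 1 / sqrt(30)
expected_se = 1 / sqrt(n)

# Share of sample means within one standard error of the population mean;
# about 2/3 is expected if the sampling distribution is close to normal.
share = sum(abs(m - 1) <= expected_se for m in sample_means) / num_samples

print(round(center, 3), round(spread, 3), round(expected_se, 3), round(share, 3))
```

Repeating the run with a much smaller n (for example 2) would show a visibly skewed distribution of sample means.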

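Example: $\mu = 8$, $\sigma = 2$, $n = 25$; $P(7.8 < \bar{X} < 8.2) = ?$

$$P(7.8 < \bar{X} < 8.2) = P\left(\frac{7.8-8}{2/\sqrt{25}} < \frac{\bar{X}-\mu_{\bar{X}}}{\sigma_{\bar{X}}} < \frac{8.2-8}{2/\sqrt{25}}\right) = P(-.5 < Z < .5) = .3830$$

Sampling Distribution: $\mu_{\bar{X}} = 8$, $\sigma_{\bar{X}} = 2/\sqrt{25} = .4$, interval from 7.8 to 8.2
Standardized Normal Distribution: $\mu_Z = 0$, $\sigma_Z = 1$, interval from -0.5 to 0.5

The area on each side of the mean is .1915.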
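This probability can be verified without the table. The following Python sketch uses the standard `statistics.NormalDist` to model the sampling distribution of the mean directly; it is a cross-check under the slide's values, not the PHStat procedure.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8, 2, 25

# Standard error of the mean: sigma / sqrt(n) = 2 / 5 = 0.4.
se = sigma / sqrt(n)

# Sampling distribution of the sample mean: normal with mean mu and sd se.
xbar_dist = NormalDist(mu, se)
p = xbar_dist.cdf(8.2) - xbar_dist.cdf(7.8)

print(round(se, 1), round(p, 4))
```

The result should agree with the table answer .3830 up to table rounding.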

Population Proportions

Categorical variable
e.g.: Gender, voted for Bush, college degree

Proportion of population having a characteristic: $p$

Sample proportion provides an estimate of $p$:

$$p_S = \frac{X}{n} = \frac{\text{number of successes}}{\text{sample size}}$$

If two outcomes, X has a binomial distribution
Possess or do not possess characteristic

Sampling Distribution of Sample Proportion

Approximated by normal distribution when $np \ge 5$ and $n(1-p) \ge 5$

Mean: $\mu_{p_S} = p$

Standard error: $\sigma_{p_S} = \sqrt{\dfrac{p(1-p)}{n}}$

$p$ = population proportion

[Figure: Sampling Distribution, $P(p_S)$ (0 to .3) against $p_S$ (0 to 1)]

Standardizing Sampling Distribution of Proportion

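$$Z = \frac{p_S - \mu_{p_S}}{\sigma_{p_S}} = \frac{p_S - p}{\sqrt{\dfrac{p(1-p)}{n}}}$$

Sampling Distribution: mean $\mu_{p_S}$, standard error $\sigma_{p_S}$
Standardized Normal Distribution: $\mu_Z = 0$, $\sigma_Z = 1$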

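Example: $n = 200$, $p = .4$; $P(p_S < .43) = ?$

$$P(p_S < .43) = P\left(\frac{p_S - \mu_{p_S}}{\sigma_{p_S}} < \frac{.43 - .4}{\sqrt{\dfrac{.4(1-.4)}{200}}}\right) = P(Z < .87) = .8078$$

Sampling Distribution: mean $\mu_{p_S}$, standard error $\sigma_{p_S}$, $p_S = .43$
Standardized Normal Distribution: $\mu_Z = 0$, $\sigma_Z = 1$, $Z = .87$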
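The proportion example can be reproduced step by step. This Python sketch uses the standard `statistics.NormalDist`; it computes the probability twice, once with Z rounded to two decimals as in a table lookup and once unrounded, because the two differ slightly. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 200, 0.4
cutoff = 0.43

# Conditions for the normal approximation: np >= 5 and n(1 - p) >= 5.
approximation_ok = n * p >= 5 and n * (1 - p) >= 5

# Standard error of the sample proportion and the standardized cutoff.
se = sqrt(p * (1 - p) / n)
z = (cutoff - p) / se

std_normal = NormalDist()
prob_table = std_normal.cdf(round(z, 2))  # Z rounded to .87, as in the table
prob_exact = std_normal.cdf(z)            # Z left unrounded

print(approximation_ok, round(z, 2), round(prob_table, 4), round(prob_exact, 4))
```

The slide's .8078 corresponds to the rounded Z; the unrounded calculation gives a slightly smaller value.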

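Sampling from Finite Population

Modify standard error if sample size ($n$) is large relative to population size ($N$): $n > .05N$ or $n/N > .05$

Use finite population correction factor (fpc)

Standard error with FPC:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$

$$\sigma_{p_S} = \sqrt{\frac{p(1-p)}{n}}\sqrt{\frac{N-n}{N-1}}$$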
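The finite population correction can be sketched as a pair of small functions. The function names and the numbers in the usage lines (σ = 10, n = 100, N = 1000) are hypothetical and chosen only so that n/N = .10 exceeds the .05 threshold.

```python
from math import sqrt

def fpc(N, n):
    """Finite population correction factor: sqrt((N - n) / (N - 1))."""
    return sqrt((N - n) / (N - 1))

def se_mean_finite(sigma, n, N):
    """Standard error of the mean with the correction applied."""
    return sigma / sqrt(n) * fpc(N, n)

def se_proportion_finite(p, n, N):
    """Standard error of the proportion with the correction applied."""
    return sqrt(p * (1 - p) / n) * fpc(N, n)

# Hypothetical values: the sample is 10% of the population.
sigma, n, N = 10, 100, 1000
print(n / N > 0.05)                              # correction is warranted
print(round(sigma / sqrt(n), 4))                 # uncorrected standard error
print(round(se_mean_finite(sigma, n, N), 4))     # corrected standard error
print(round(se_proportion_finite(0.4, n, N), 4))
```

The factor is always at most 1, so the corrected standard error is never larger than the uncorrected one.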