5-1 Chapter 5 Theory & Problems of Probability & Statistics Murray R. Spiegel Sampling Theory.

Post on 31-Mar-2015


5-1

Chapter 5

Theory & Problems of

Probability & Statistics

Murray R. Spiegel

Sampling Theory

5-2

Outline Chapter 5

Population X

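mean and variance: $\mu$, $\sigma^2$

Sample

mean and variance: $\bar{X}$, $\hat{s}^2$

Sample Statistics

$\bar{X}$: mean and variance $\mu_{\bar{X}}$, $\sigma^2_{\bar{X}}$

$\hat{s}^2$: mean and variance $\mu_{\hat{s}^2}$, $\sigma^2_{\hat{s}^2}$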

5-3

Outline Chapter 5

Distributions

Population

Samples Statistics

Mean

Proportions

Differences and Sums

Variances

Ratios of Variances

5-4

Outline Chapter 5

Other ways to organize samples

Frequency Distributions

Relative Frequency Distributions

Computation of Statistics for Grouped Data

mean

variance

standard deviation

5-5

Population Parameters

A population - random variable X

probability distribution (function) f(x)

probability function

- discrete variable f(x)

density function

- continuous variable

f(x) function of several parameters, i.e.:

mean: $\mu$, variance: $\sigma^2$

want to know parameters for each f(x)

5-6

Example of a Population

5 project engineers in department

total experience of (X) 2, 3, 6, 8, 11 years

company performing statistical report

employees expertise based on experience

survey must include:

average experience

variance

standard deviation

5-7

Mean of Population

average experience (mean):

$\mu = \frac{2 + 3 + 6 + 8 + 11}{5} = \frac{30}{5} = 6$ years

5-8

Variance of Population

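variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{n}$

$\sigma^2 = \frac{(2-6)^2 + (3-6)^2 + (6-6)^2 + (8-6)^2 + (11-6)^2}{5}$

$\sigma^2 = \frac{16 + 9 + 0 + 4 + 25}{5} = \frac{54}{5} = 10.8$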

5-9

Standard Deviation of Population

standard deviation:

$\text{s.d.} = \sqrt{\sigma^2}$

$\text{s.d.} = \sqrt{10.8} = 3.29$
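The three population parameters above can be checked with a few lines of Python. This is a minimal sketch using the standard library only; the variable names are illustrative and not part of the slides.

```python
import statistics

# Years of experience of the five project engineers (the whole population).
experience = [2, 3, 6, 8, 11]

mu = statistics.fmean(experience)          # population mean
sigma2 = statistics.pvariance(experience)  # population variance (divisor N)
sigma = statistics.pstdev(experience)      # population standard deviation

print(mu, sigma2, round(sigma, 2))
```

Note that `pvariance` and `pstdev` divide by N, which matches the population formulas on these slides; `variance` and `stdev` would divide by N - 1 instead.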

5-10

Sample Statistics

What if we don't have the whole population?

Take random samples from the population

estimate population parameters

make inferences

let's see how

How much experience is in the company?

hire for feasibility study

performance study

5-11

Sampling Example

manager assigns engineers at random

each time chooses first engineer she sees

same engineer could do both

let's say she picks (2,2)

mean of sample: $\bar{X}$ = (2+2)/2 = 2

you want to make inferences about true µ

5-12

Samples of 2

with replacement: she will go to the project department twice

pick engineer randomly

potentially 25 possible teams

25 samples of size two

5 * 5 = 25

order matters (6, 11) is different from (11, 6)

5-13

Population of Samples

All possible combinations are:

(2,2) (2,3) (2,6) (2,8) (2,11)

(3,2) (3,3) (3,6) (3,8) (3,11)

(6,2) (6,3) (6,6) (6,8) (6,11)

(8,2) (8,3) (8,6) (8,8) (8,11)

(11,2) (11,3) (11,6) (11,8) (11,11)

5-14

Population of Averages

Average experience, or sample means $\bar{X}_i$, are:

(2) (2.5) (4) (5) (6.5)

(2.5) (3) (4.5) (5.5) (7)

(4) (4.5) (6) (7) (8.5)

(5) (5.5) (7) (8) (9.5)

(6.5) (7) (8.5) (9.5) (11)

5-15

Mean of Population Means

And mean of sampling distribution of means is:

$\mu_{\bar{X}} = \frac{(2) + (2.5) + (4) + (5) + \dots + (11)}{25} = \frac{150}{25} = 6$

This confirms theorem that states:

$E(\bar{X}) = \mu_{\bar{X}} = \mu = 6$
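The enumeration of all 25 ordered samples can be reproduced in code. The sketch below assumes sampling with replacement and sample size 2, as in the example; `itertools.product` generates the ordered pairs.

```python
from itertools import product

experience = [2, 3, 6, 8, 11]

# All ordered samples of size 2 drawn with replacement: 5 * 5 = 25 teams.
samples = list(product(experience, repeat=2))
sample_means = [(a + b) / 2 for a, b in samples]

# Mean of the sampling distribution of means versus the population mean.
mean_of_means = sum(sample_means) / len(sample_means)
population_mean = sum(experience) / len(experience)

print(len(samples), mean_of_means, population_mean)
```

Both means come out equal, which is the statement E(X-bar) = mu for this small population.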

5-16

Variance of Sample Means

variance of sampling distribution of means: $(\bar{X}_i - \mu_{\bar{X}})^2$

(2-6)^2 (2.5-6)^2 (4-6)^2 (5-6)^2 (6.5-6)^2

(2.5-6)^2 (3-6)^2 (4.5-6)^2 (5.5-6)^2 (7-6)^2

(4-6)^2 (4.5-6)^2 (6-6)^2 (7-6)^2 (8.5-6)^2

(5-6)^2 (5.5-6)^2 (7-6)^2 (8-6)^2 (9.5-6)^2

(6.5-6)^2 (7-6)^2 (8.5-6)^2 (9.5-6)^2 (11-6)^2

5-17

Variance of Sample Means

Calculating values:

16 12.25 4 1 0.25

12.25 9 2.25 0.25 1

4 2.25 0 1 6.25

1 0.25 1 4 12.25

0.25 1 6.25 12.25 25

5-18

Variance of Sample Means

variance is:

$\sigma^2_{\bar{X}} = \frac{\sum (\bar{X}_i - \mu_{\bar{X}})^2}{25} = \frac{135}{25} = 5.4$

Therefore standard deviation is

$\sigma_{\bar{X}} = \sqrt{5.4} = 2.32$

5-19

Variance of Sample Means

These results hold for theorem:

$\sigma^2_{\bar{X}} = \frac{\sigma^2}{n}$

Where n is size of samples. Then we see that:

$\sigma^2_{\bar{X}} = \frac{\sigma^2}{n} = \frac{10.8}{2} = 5.40$
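A short check of the variance theorem for sampling with replacement, again as a sketch on the five-engineer population. The sample size `n` is a parameter here, so other sizes can be tried; nothing beyond the standard library is assumed.

```python
from itertools import product

experience = [2, 3, 6, 8, 11]
n = 2  # sample size used in the example

mu = sum(experience) / len(experience)
sigma2 = sum((x - mu) ** 2 for x in experience) / len(experience)

# Means of every ordered sample of size n drawn with replacement.
means = [sum(s) / n for s in product(experience, repeat=n)]

# Variance of the sampling distribution of means (divisor = number of samples).
var_means = sum((m - mu) ** 2 for m in means) / len(means)

print(var_means, sigma2 / n)
```

The two printed values agree, illustrating that the variance of the sample mean is the population variance divided by n.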

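5-20

Math Proof: mean of $\bar{X}$

$\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n}$

$E(\bar{X}) = \frac{E(X_1) + E(X_2) + E(X_3) + \dots + E(X_n)}{n}$

$E(\bar{X}) = \frac{\mu + \mu + \mu + \dots + \mu}{n}$

$E(\bar{X}) = \mu$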

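5-21

Math Proof: variance of $\bar{X}$

$\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n}$

For independent $X_1, \dots, X_n$:

$\mathrm{Var}(\bar{X}) = \sigma^2_{\bar{X}} = \frac{\sigma_x^2 + \sigma_x^2 + \sigma_x^2 + \dots + \sigma_x^2}{n^2}$

$= \frac{n \sigma_x^2}{n^2} = \frac{\sigma_x^2}{n}$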

5-22

Sampling Means No Replacement

manager picks two engineers at same time

order doesn't matter

order (6, 11) is same as order (11, 6)

5 choose 2: 5!/((2!)(5-2)!) = 10

10 possible teams, or 10 samples of size two.

5-23

Sampling Means No Replacement

All possible combinations are:

(2,3) (2,6) (2,8) (2,11) (3,6)

(3,8) (3,11) (6,8) (6,11) (8,11)

corresponding sample means are:

(2.5) (4) (5) (6.5) (4.5)

(5.5) (7) (7) (8.5) (9.5)

mean of corresponding sample of means is:

$\mu_{\bar{X}} = \frac{2.5 + 4 + 5 + \dots + 9.5}{10} = 6$

5-24

Sampling Variance No Replacement

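variance of sampling distribution of means is:

$\sigma^2_{\bar{X}} = \frac{\sum (\bar{X}_i - \mu_{\bar{X}})^2}{10} = \frac{(2.5-6)^2 + (4-6)^2 + \dots + (9.5-6)^2}{10} = 4.05$

standard deviation is:

$\sigma_{\bar{X}} = \sqrt{\frac{\sum (\bar{X}_i - \mu_{\bar{X}})^2}{10}} = \sqrt{4.05} = 2.01$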

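5-25

Theorems on Sampling Distributions with No Replacements

1. $\mu_{\bar{X}} = \mu = 6$

2. $\sigma^2_{\bar{X}} = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1} = \frac{10.8}{2} \cdot \frac{5-2}{5-1} = \frac{10.8}{2} \cdot \frac{3}{4} = 4.05$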
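The no-replacement case can be verified the same way. In this sketch `itertools.combinations` lists the 10 unordered teams, and `fpc` is an illustrative name for the finite-population factor (N - n)/(N - 1).

```python
from itertools import combinations

experience = [2, 3, 6, 8, 11]
N, n = len(experience), 2

mu = sum(experience) / N
sigma2 = sum((x - mu) ** 2 for x in experience) / N

# Every unordered sample of size n drawn without replacement.
means = [sum(c) / n for c in combinations(experience, n)]

mean_of_means = sum(means) / len(means)
var_means = sum((m - mu) ** 2 for m in means) / len(means)

# Finite-population correction applied to sigma^2 / n.
fpc = (N - n) / (N - 1)
print(len(means), mean_of_means, var_means, sigma2 / n * fpc)
```

The enumerated variance and the corrected formula give the same value, and the mean of the sample means still equals the population mean.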

5-26

Sum Up Theorems on Sampling Distributions

Theorem I: Expected value of sample mean = population mean

$E(\bar{X}) = \mu_{\bar{X}} = \mu$, where $\mu$ is the mean of the population

Theorem II: infinite population or sampling with replacement

variance of the sample mean is

$E[(\bar{X} - \mu)^2] = \sigma^2_{\bar{X}} = \sigma^2/n$

$\sigma^2$: variance of population

5-27

Theorems on Sampling Distributions

Theorem III: population size is N

sampling with no replacement

sample size is n

then variance of the sample mean is:

$\sigma^2_{\bar{X}} = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}$

5-28

Theorems on Sampling Distributions

Theorem IV: population normally distributed

mean $\mu$, variance $\sigma^2$

then sample mean normally distributed

mean $\mu$, variance $\sigma^2/n$

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$

5-29

Theorems on Sampling Distributions

Theorem V:

samples are taken from distribution

mean $\mu$, variance $\sigma^2$

(not necessarily normally distributed)

standardized variable is asymptotically normal

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
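Theorem V can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be exponential with mean 1 (a clearly non-normal choice made here, not from the slides), and the sample size, number of repetitions and seed are arbitrary.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

mu, sigma = 1.0, 1.0   # mean and s.d. of the assumed exponential(1) population
n, reps = 40, 4000     # sample size and number of simulated samples

z_values = []
for _ in range(reps):
    sample = [random.expovariate(1.0) for _ in range(n)]
    x_bar = statistics.fmean(sample)
    # Standardize the sample mean as in Theorem V.
    z_values.append((x_bar - mu) / (sigma / math.sqrt(n)))

# For a standard normal, about 95% of values fall inside (-1.96, 1.96).
inside = sum(abs(z) < 1.96 for z in z_values) / reps

print(round(statistics.fmean(z_values), 3),
      round(statistics.pstdev(z_values), 3),
      round(inside, 3))
```

The standardized means should have mean near 0, standard deviation near 1, and roughly 95% of values inside the central interval, even though the underlying population is skewed.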

5-30

Sampling Distribution of Proportions

Population properties:

* Infinite

* Binomially Distributed

( p “success”; q=1-p “fail”)

Consider all possible samples of size n

statistic for each sample

= proportion P of success

5-31

Sampling Distribution of Proportions

Sampling distribution of proportions:

mean: $\mu_P = p$

std. deviation: $\sigma_P = \sqrt{\frac{pq}{n}} = \sqrt{\frac{p(1-p)}{n}}$

5-32

Sampling Distribution of Proportions

large values of n (n>30): sampling distribution for P approximates normal distribution

finite population, sampling without replacement

standardized P is

$Z = \frac{P - p}{\sqrt{pq/n}}$

5-33

Example Proportions

Oil service company

explores for oil

according to geological department

37% chances of finding oil

drill 150 wells

P(0.4<P<0.6)=?

5-34

Example Proportions

$Z = \frac{P - p}{\sqrt{pq/n}}$

P(0.4<P<0.6)=?

$P\left(\frac{0.4 - 0.37}{\sqrt{0.37 \times 0.63/150}} < \frac{P - 0.37}{\sqrt{pq/n}} < \frac{0.6 - 0.37}{\sqrt{0.37 \times 0.63/150}}\right) = ?$

5-35

Example Proportions

P(0.4<P<0.6)=P(0.76<Z<5.83)

=normsdist(5.83)-normsdist(0.76)= 0.223

Think about mean, variance and distribution of

np the number of successes
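The normal approximation for the proportion can be evaluated directly. This is a sketch using `statistics.NormalDist` from the standard library in place of the spreadsheet function; `prob_between` is a hypothetical helper name. It is run for n = 150, the number of wells stated in the example, and also for a hypothetical n = 15, to show how strongly the standardized limits depend on n.

```python
import math
from statistics import NormalDist

def prob_between(p, n, lo, hi):
    """Normal approximation to P(lo < P < hi) for a sample proportion P."""
    se = math.sqrt(p * (1 - p) / n)        # sigma_P = sqrt(pq/n)
    z_lo, z_hi = (lo - p) / se, (hi - p) / se
    std_normal = NormalDist()               # mean 0, standard deviation 1
    return z_lo, z_hi, std_normal.cdf(z_hi) - std_normal.cdf(z_lo)

for n in (150, 15):
    z_lo, z_hi, prob = prob_between(0.37, n, 0.4, 0.6)
    print(n, round(z_lo, 2), round(z_hi, 2), round(prob, 3))
```

With n = 150 the standard error is about 0.039, so the limits are roughly 0.76 and 5.83; with n = 15 the standard error is about 0.125 and the limits are roughly 0.24 and 1.84.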

5-36

Sampling Distribution of Sums & Differences

Suppose we have two populations.

Population XA XB

Sample of size nA nB

Compute statistic SA SB

Samples are independent

Sampling distribution for SA and SB gives

mean: $\mu_{S_A}$, $\mu_{S_B}$

variance: $\sigma^2_{S_A}$, $\sigma^2_{S_B}$

5-37

Sampling Distribution of Sums and Differences

combination of 2 samples from 2 populations

sampling distribution of sums or differences

$S = S_A \pm S_B$

For new sampling distribution we have:

mean: $\mu_S = \mu_{S_A} \pm \mu_{S_B}$

variance: $\sigma^2_S = \sigma^2_{S_A} + \sigma^2_{S_B}$

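5-38

Sampling Distribution of Sums and Differences

two populations $X_A$ and $X_B$

$S_A = \bar{X}_A$ and $S_B = \bar{X}_B$: sample means

mean: $\mu_{\bar{X}_A + \bar{X}_B} = \mu_{\bar{X}_A} + \mu_{\bar{X}_B} = \mu_A + \mu_B$

variance: $\sigma^2_{\bar{X}_A + \bar{X}_B} = \sigma^2_{\bar{X}_A} + \sigma^2_{\bar{X}_B} = \frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}$

Sampling from infinite population

Sampling with replacement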

5-39

Example Sampling Distribution of Sums

You are leasing oil fields from

two companies for two years

lease expires at end of each year

randomly assigned a new lease for next year

Company A - two oil fields

production XA: 300, 700 million barrels

Company B two oil fields

production XB: 500, 1100 million barrels

5-40

Population Means

• Average oil field size of company A:

$\mu_{X_A} = \frac{300 + 700}{2} = 500$

• Average oil field size of company B:

$\mu_{X_B} = \frac{500 + 1100}{2} = 800$

$\mu_{X_A} + \mu_{X_B} = 500 + 800 = 1300$

5-41

Population Variances

Company A - two oil fields

production XA: 300, 700 million barrels

Company B two oil fields

production XB: 500, 1100 million barrels

$\sigma^2_{X_A}$ = [(300 - 500)^2 + (700 - 500)^2]/2 = 40,000

$\sigma^2_{X_B}$ = [(500 - 800)^2 + (1100 - 800)^2]/2 = 90,000

5-42

Example Sampling Distribution of Sums

Interested in total production: XA + XB

Compute all possible leases assignments

Two choices XA, Two choices XB

XAi XBi

{300, 500}

{300, 1100}

{700, 500}

{700, 1100}

5-43

Example Sampling Distribution of Sums

XAi XBi

{300, 500}

{300, 1100}

{700, 500}

{700, 1100}

Then for each of the 4 possibilities –

4 choices year 1, four choices year 2 = 4*4 samples

5-44

Example Sampling Distribution of Sums

Samples XAi XBi XAi XBi

Year 1 300 500 300 1100

Year 2 300 500 300 500

Year 1 300 500 300 1100

Year 2 300 1100 300 1100

Year 1 300 500 300 1100

Year 2 700 500 700 500

Year 1 300 500 300 1100

Year 2 700 1100 700 1100

5-45

Example Sampling Distribution of Sums

Samples XAi XBi XAi XBi

Year 1 700 500 700 1100

Year 2 300 500 300 500

Year 1 700 500 700 1100

Year 2 300 1100 300 1100

Year 1 700 500 700 1100

Year 2 700 500 700 500

Year 1 700 500 700 1100

Year 2 700 1100 700 1100

5-46

Compute Sum and Means of each sample

Means XAi+XBi Mean XAi+XBi Mean

Year 1 800 800 1400 1100

Year 2 800   800  

Year 1 800 1100 1400 1400

Year 2 1400   1400  

Year 1 800 1000 1400 1300

Year 2 1200   1200  

Year 1 800 1300 1400 1600

Year 2 1800   1800  

5-47

Compute Sum and Means of each Sample

Means XAi+XBi Mean XAi+XBi Mean

Year 1 1200 1000 1800 1300

Year 2 800   800  

Year 1 1200 1300 1800 1600

Year 2 1400   1400  

Year 1 1200 1200 1800 1500

Year 2 1200   1200  

Year 1 1200 1500 1800 1800

Year 2 1800   1800  

5-48

Mean of Sum of Sample Means

Population of Samples

{800, 1100, 1000, 1300, 1100, 1400, 1300, 1600, 1000, 1300, 1200, 1500, 1300, 1600, 1500, 1800}

$\mu_{\bar{X}_A + \bar{X}_B}$ =

(800 + 1100 + 1000 + 1300 + 1100 + 1400 + 1300 + 1600 + 1000 + 1300 + 1200 + 1500 + 1300 + 1600 + 1500 + 1800) / 16

= 1300

5-49

Mean of Sum of Sample Means

This illustrates theorem on means:

$\mu_{\bar{X}_A + \bar{X}_B} = 1300 = \mu_{X_A} + \mu_{X_B} = 500 + 800 = 1300$

What about the variance of $\bar{X}_A + \bar{X}_B$?

5-50

Variance of Sum of Means

Population of samples

{800, 1100, 1000, 1300, 1100, 1400, 1300, 1600, 1000, 1300, 1200, 1500, 1300, 1600, 1500, 1800}

$\sigma^2_{\bar{X}_A + \bar{X}_B}$ = {(800 - 1300)^2 + (1100 - 1300)^2 + (1000 - 1300)^2 + (1300 - 1300)^2 + (1100 - 1300)^2 + (1400 - 1300)^2 + (1300 - 1300)^2 + (1600 - 1300)^2 + (1000 - 1300)^2 + (1300 - 1300)^2 + (1200 - 1300)^2 + (1500 - 1300)^2 + (1300 - 1300)^2 + (1600 - 1300)^2 + (1500 - 1300)^2 + (1800 - 1300)^2}/16

= 65,000

5-51

Variance of Sum of Means

$\sigma^2_{\bar{X}_A + \bar{X}_B} = \frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B} = \frac{40{,}000}{2} + \frac{90{,}000}{2} = 65{,}000$

This illustrates theorem on variances
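The 16 lease assignments and the two theorems can be checked by enumeration. This sketch assumes, as in the example, two years of leases with each field assigned at random with replacement; `pvar` is a small helper defined here for the population variance.

```python
from itertools import product

fields_a = [300, 700]    # Company A production, million barrels
fields_b = [500, 1100]   # Company B production, million barrels
years = 2                # sample size for each company

def pvar(values):
    """Population variance (divisor = number of values)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# Two-year sample means for each company (4 ordered assignments each).
mean_a = [sum(s) / years for s in product(fields_a, repeat=years)]
mean_b = [sum(s) / years for s in product(fields_b, repeat=years)]

# Sum of the two sample means for all 4 * 4 = 16 combinations.
sums = [a + b for a, b in product(mean_a, mean_b)]

mean_sum = sum(sums) / len(sums)
var_sum = pvar(sums)
print(len(sums), mean_sum, var_sum,
      pvar(fields_a) / years + pvar(fields_b) / years)
```

The enumerated variance matches the sum of the two population variances, each divided by its sample size.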

5-52

Normalize to Make Inferences on Means

$Z = \frac{(\bar{X}_A \pm \bar{X}_B) - (\mu_A \pm \mu_B)}{\sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}}}$

5-53

Estimators for Variance

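Two choices:

$S^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n}$

use for populations

$\hat{s}^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n-1}$, with $E(\hat{s}^2) = \sigma^2$

unbiased, better for smaller samples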

5-54

Sampling Distribution of Variances

All possible random samples of size n

each sample has a variance

all possible variances

give sampling distribution of variances

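sampling distribution of related random variable

$\frac{nS^2}{\sigma^2} = \frac{(n-1)\hat{s}^2}{\sigma^2} = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{\sigma^2}$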

5-55

Example Population of Samples

All possible teams are:

(2,2) (2,3) (2,6) (2,8) (2,11)

(3,2) (3,3) (3,6) (3,8) (3,11)

(6,2) (6,3) (6,6) (6,8) (6,11)

(8,2) (8,3) (8,6) (8,8) (8,11)

(11,2) (11,3) (11,6) (11,8) (11,11)

5-56

Compute Variance for Each Sample

sample variances corresponding to each of the 25 possible

choices that the manager makes (divisor n = 2), $S^2$:

0 0.25 4 9 20.25

0.25 0 2.25 6.25 16

4 2.25 0 1 6.25

9 6.25 1 0 2.25

20.25 16 6.25 2.25 0

e.g. for (2,11): $\frac{(2 - 6.5)^2 + (11 - 6.5)^2}{2} = 20.25$
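The difference between the two variance estimators shows up clearly when they are averaged over all 25 samples. In this sketch, `biased` uses divisor n and `unbiased` uses divisor n - 1; both names are illustrative.

```python
from itertools import product

experience = [2, 3, 6, 8, 11]
n = 2

mu = sum(experience) / len(experience)
sigma2 = sum((x - mu) ** 2 for x in experience) / len(experience)  # 10.8

biased, unbiased = [], []
for sample in product(experience, repeat=n):
    x_bar = sum(sample) / n
    ss = sum((x - x_bar) ** 2 for x in sample)  # sum of squared deviations
    biased.append(ss / n)          # divisor n
    unbiased.append(ss / (n - 1))  # divisor n - 1

mean_biased = sum(biased) / len(biased)
mean_unbiased = sum(unbiased) / len(unbiased)
print(mean_biased, mean_unbiased, sigma2)
```

Averaged over all equally likely samples, the divisor-n estimator gives (n - 1)/n times the population variance, while the divisor-(n - 1) estimator recovers the population variance itself.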

5-57

Sampling Distribution of Variance

Population of Variances

mean

variance

distribution

$\frac{(n-1)\hat{s}^2}{\sigma^2} \sim \chi^2_{n-1}$

5-58

What if Unknown Population Variance?

X is Normal($\mu$, $\sigma^2$)

to make inference on means we normalize

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

5-59

Unknown Population Variance

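$\chi^2_{n-1} = \frac{(n-1)\hat{s}^2}{\sigma^2}$

$t_{n-1} = \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)\hat{s}^2}{\sigma^2} \Big/ (n-1)}} = \frac{\bar{X} - \mu}{\hat{s}/\sqrt{n}}$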

5-60

Unknown Population Variance

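$P\left(t_{n-1,c_1} \le \frac{\bar{X} - \mu}{\hat{s}/\sqrt{n}} \le t_{n-1,c_2}\right)$

Use in the same way as for normal

except use different tables

α = 0.05

$P\left(-2.0639 \le \frac{\bar{X} - \mu}{\hat{s}/\sqrt{n}} \le 2.0639\right) = 1 - 0.05$

n = 25, =tinv(0.05,24) = 2.0639

[Figure: t density with critical values -2.06 and 2.06 marked]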
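The spreadsheet value tinv(0.05, 24) can be reproduced without tables. The sketch below is one possible standard-library approach, not the method used on the slides: it integrates the Student t density with Simpson's rule and finds the two-sided critical value by bisection. All function names are illustrative.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|], using symmetry about zero."""
    if x < 0:
        return 1 - t_cdf(-x, df, steps)
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_two_sided_critical(alpha, df):
    """Value c with P(-c <= t <= c) = 1 - alpha, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 1 - alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

crit = t_two_sided_critical(0.05, 24)
print(round(crit, 4))
```

For α = 0.05 and 24 degrees of freedom this should agree with the tabulated critical value to about four decimals.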

5-61

Uses t-statistics

Will use for testing

means, sums, and differences of means

small samples when variable is normal

substitute sample variance in for true

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

$t_{n-1} = \frac{\bar{X} - \mu}{\hat{s}/\sqrt{n}}$

5-62

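Uses t-statistics

sums and differences of means

$Z = \frac{(\bar{X}_1 \pm \bar{X}_2) - (\mu_1 \pm \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0,1)$

unknown variance

$t_{n_1+n_2-2} = \frac{(\bar{X}_1 \pm \bar{X}_2) - (\mu_1 \pm \mu_2)}{\sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} \cdot \frac{n_1+n_2}{n_1 n_2}}}$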

5-63

Uses $\chi^2$ statistic

$\chi^2_{n-1} = \frac{(n-1)\hat{s}^2}{\sigma^2}$

Inference on Variance

Large sample test

5-64

Inferences

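F Statistic

$F_{df_1, df_2} = \frac{\chi^2_{df_1}/df_1}{\chi^2_{df_2}/df_2} = \frac{\dfrac{(n_1-1)s_1^2}{\sigma_1^2} \Big/ (n_1-1)}{\dfrac{(n_2-1)s_2^2}{\sigma_2^2} \Big/ (n_2-1)}$

$\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \sim F_{df_1, df_2}$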

5-65

F Statistic

Other tests

groups of coefficients

5-66

Other Statistics

Medians

n > 30, sampling distribution of medians

nearly normal if X is normal

$\mu_{med} = \mu$

$\sigma_{med} = \sigma \sqrt{\frac{\pi}{2n}} = \frac{1.2533\,\sigma}{\sqrt{n}}$

5-67

Frequency Distributions

If sample or population is large

difficult to compute statistics

(i.e. mean, variance, etc)

Organizing RAW DATA is useful

arrange into CLASSES or categories

determine number in each class

Class Frequency or Frequency Distribution

5-68

Frequency Distributions - Example

Example of Frequency Distribution:

middle size oil company

portfolio of 100 small oil reservoirs

reserves vary from 89 to 300 million barrels

5-69

Frequency Distributions - Example

arrange data into categories

create table showing ranges of reservoirs sizes

number of reservoirs in each range

Reserves   Number of Fields
50-100     4
101-150    21
151-200    42
201-250    27
251-300    6
TOTAL      100

5-70

Frequency Distributions - Example

Class intervals are in ranges of 50 million barrels

Each class interval is represented by its midpoint (class mark)

e.g. 200 up to 250 will be represented by 225

Can plot data

histogram

polygon

This plot represents the frequency distribution

5-71

Frequency Distributions Plotted - Example

[Figure: histogram of No. of Fields (axis 0 to 45) against Reserves (mmb) (axis 25 to 325) for the reserves table]

5-72

Relative Frequency Distributions and Ogives

number of individuals

- frequency distribution

- empirical probability distribution

percentage of individual

- relative frequency distribution

empirical cumulative probability distribution

- ogive

5-73

Percent Ogives

OGIVE for oil company portfolio of reservoirs

Shows percent of reservoirs with less than x reserves
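The relative frequencies and the percent ogive follow directly from the reserves table. A minimal sketch, assuming the upper class limits 100, 150, 200, 250 and 300 million barrels as the x-values of the ogive:

```python
from itertools import accumulate

upper_limits = [100, 150, 200, 250, 300]  # upper class limits, million barrels
frequencies = [4, 21, 42, 27, 6]          # number of fields in each class

total = sum(frequencies)

# Relative frequency distribution: share of fields in each class.
relative = [f / total for f in frequencies]

# Percent ogive: cumulative percentage of fields at or below each upper limit.
cumulative_percent = [100 * c / total for c in accumulate(frequencies)]

for limit, rel, cum in zip(upper_limits, relative, cumulative_percent):
    print(limit, rel, cum)
```

Plotting `cumulative_percent` against `upper_limits` gives the ogive; the last point always reaches 100%.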

5-74

Computation of Statistics for Grouped Data

can calculate mean and variance from grouped data

5-75

Computation of Statistics for Grouped Data

take 420 samples of an ore body

measure % concentration of Zinc (Zn)

frequency distribution of lab results

5-76

Computation of Statistics for Grouped Data

% Weight  Frequency    % Weight  Frequency
1.00      2            1.55      28
1.05      5            1.60      14
1.10      11           1.65      22
1.15      21           1.70      18
1.20      33           1.75      15
1.25      41           1.80      4
1.30      53           1.85      2
1.35      42           1.90      2
1.40      38           1.95      3
1.45      31           2.00      1
1.50      34           TOTAL     420

5-77

Computation of Statistics for Grouped Data

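mean will then be:

$\bar{x} = \frac{\sum f_i x_i}{n} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_k x_k}{n}$

$n = \sum f_i = f_1 + f_2 + \dots + f_k$

And in our example:

$\bar{x} = \frac{\sum f_i x_i}{n} = \frac{2 \times 1.00 + 5 \times 1.05 + \dots + 31 \times 1.45 + \dots + 1 \times 2.00}{420} \approx 1.40$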

5-78

Computation of Statistics for Grouped Data

variance will then be:

$S^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n} = \frac{f_1 (x_1 - \bar{x})^2 + f_2 (x_2 - \bar{x})^2 + \dots + f_k (x_k - \bar{x})^2}{n}$

5-79

Computation of Statistics for Grouped Data

And in our example:

$S^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n}$

$S^2 = \frac{2(1.00 - 1.40)^2 + 5(1.05 - 1.40)^2 + \dots + 1(2.00 - 1.40)^2}{420}$

$S^2 = 0.0365$
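The grouped-data formulas are easy to apply in code. The sketch below types in the zinc frequency table and computes the mean and the divisor-n variance; the dictionary name `zinc` is illustrative.

```python
# % weight of Zn (class value) -> frequency, from the table of 420 lab results.
zinc = {
    1.00: 2, 1.05: 5, 1.10: 11, 1.15: 21, 1.20: 33, 1.25: 41, 1.30: 53,
    1.35: 42, 1.40: 38, 1.45: 31, 1.50: 34, 1.55: 28, 1.60: 14, 1.65: 22,
    1.70: 18, 1.75: 15, 1.80: 4, 1.85: 2, 1.90: 2, 1.95: 3, 2.00: 1,
}

n = sum(zinc.values())  # total number of samples

# Grouped mean: sum of f_i * x_i divided by n.
mean = sum(f * x for x, f in zinc.items()) / n

# Grouped variance with divisor n: sum of f_i * (x_i - mean)^2 divided by n.
variance = sum(f * (x - mean) ** 2 for x, f in zinc.items()) / n

print(n, round(mean, 3), round(variance, 4))
```

Carried to three decimals the mean is about 1.406, which the slides quote as 1.40; the variance agrees with the quoted 0.0365.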

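5-80

Computation of Statistics for Grouped Data

Similar formulas are available for higher moments:

$m_r = \frac{\sum f_i (x_i - \bar{x})^r}{n} = \frac{f_1 (x_1 - \bar{x})^r + f_2 (x_2 - \bar{x})^r + \dots + f_k (x_k - \bar{x})^r}{n}$

$m'_r = \frac{\sum f_i x_i^r}{n} = \frac{f_1 x_1^r + f_2 x_2^r + \dots + f_k x_k^r}{n}$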

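5-81

Sum up Chapter 5

Population X

mean and variance: $\mu$, $\sigma^2$

distribution

A Sample

statistic from sample

usually mean and variance: $\bar{X}$, $\hat{s}^2$

5-82

Sum up Chapter 5

Sample Statistics

$\bar{X}$: mean and variance $\mu_{\bar{X}}$, $\sigma^2_{\bar{X}}$

$\hat{s}^2$: mean and variance $\mu_{\hat{s}^2}$, $\sigma^2_{\hat{s}^2}$

Distribution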

Distribution

5-83

Sum Up Chapter 5

Samples Statistics

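Mean $\bar{X}$ ~ $\mu$, $\sigma^2/n$

Distribution

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

$t_{n-1} = \frac{\bar{X} - \mu}{\hat{s}/\sqrt{n}}$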

5-84

Sum Up Chapter 5

Samples Statistics

Proportions P ~ p, p(1-p)/n

n>30

Distribution

$Z = \frac{P - p}{\sqrt{pq/n}}$

5-85

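Sum Up Chapter 5

Samples Statistics

Differences and Sums

$\bar{X}_1 \pm \bar{X}_2$ ~ $\mu_1 \pm \mu_2$, $\sigma_1^2/n_1 + \sigma_2^2/n_2$

Distribution

$Z = \frac{(\bar{X}_1 \pm \bar{X}_2) - (\mu_1 \pm \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0,1)$

$t_{n_1+n_2-2} = \frac{(\bar{X}_1 \pm \bar{X}_2) - (\mu_1 \pm \mu_2)}{\sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} \cdot \frac{n_1+n_2}{n_1 n_2}}}$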

5-86

Sum Up Chapter 5

Samples Statistics

Variances

Distribution

$\chi^2_{n-1} = \frac{(n-1)\hat{s}^2}{\sigma^2}$

Mean = n-1

Variance = 2(n-1)

5-87

Sum Up Chapter 5

Samples Statistics

Ratios of Variances

$\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \sim F_{df_1, df_2}$

5-88

Sum up Chapter 5

Other ways to organize samples

Frequency Distributions

Relative Frequency Distributions

Computation of Statistics for Grouped Data

mean

variance

standard deviation

5-89

THAT'S ALL FOR CHAPTER 5

THANK YOU!!