PRBE002 Slides Session 05

download PRBE002 Slides Session 05

of 18

Transcript of PRBE002 Slides Session 05

  • 7/26/2019 PRBE002 Slides Session 05

    1/18

    6/02/2014

    1

    Chapter 10Sampling distributions

    3

    Introduction

    In real life, calculating the parametersof populations is prohibitive becausepopulations are very large.

    Rather than investigating the wholepopulation, we take a sample, calculatea statisticrelated to theparameterofinterest and make an inference.

    The sampling distribution of the statisticis the tool that tells us how close thestatistic is to the parameter.

  • 7/26/2019 PRBE002 Slides Session 05

    2/18

    6/02/2014

    2

    4

    Sampling Distributions

    A sampling distribution is created by, as thename suggests, s am p l i n g .

    The method we will employ to derive thesampling distribution uses the r u l e s o f p r o b a b i l i t y and the l a w s o f e x p e c t ed v al u e a n d v a r ia n ce .

    For example, consider the roll of one and twodice

    5

    x 1 2 3 4 5 6

    P(x) 1/6 1/6 1/6 1/6 1/6 1/6

    10.1 Sampling Distribution of theSample Mean

    Sampling distribution of a single dieA fair die is thrown infinitely many times,with the random variable X = Number of spotsshowing on any throw.The probability distribution of X is:

    and the mean and variance are calculated as:

    6

    Sampling Distribution of Two Dice

    A sampling distribution of the sample mean iscreated by looking at all samples of size n=2(i.e. two dice) and their means

    While there are 36 possible samples of size2, there are only 11 values for , and some(e.g. =3.5) occur more frequently thanothers (e.g. =1).

  • 7/26/2019 PRBE002 Slides Session 05

    3/18

    6/02/2014

    3

    7

    The s am p l i n g d i s t r i b u t i o n of is shown below:

    P( )

    Sampling Distribution of Two Dice

    8

    Compare

    Compare the distribution of X

    with the sampling distribution of .

    As well, note that:

    9

    Generalise

    We can generalise the mean and variance of thesampling of two dice:

    to n-dice:

    The standard deviation ofthe sampling distribution ofthe sample mean is calledthe s t a n d a r d e r r o r :

  • 7/26/2019 PRBE002 Slides Session 05

    4/18

    6/02/2014

    4

    10

    1

    1

    1

    6

    6

    6

    Notice that is smaller

    thanx. The larger the sample

    size the smaller . Therefore,

    tends to fall closer to, as the

    sample size increases.

    11

    The variance of the sample mean is smallerthan the variance of the population.

    1 2 3

    Also,

    Expected value of the population = (1 + 2 + 3)/3 = 2

    Mean = 1.5 Mean = 2.5Mean = 2.

    Population 1.51.51.51.51.51.51.51.51.51.51.51.51.5

    2.52.52.52.52.52.52.52.52.52.52.52.52.5

    22222222222

    Expected value of the sample mean = (1.5 + 2 + 2.5)/3 = 2

    Compare the variability of the population

    to the variability of the sample mean.Let us take samplesof two observations.

    12

    Central Limit Theorem

    If a random sample is drawn from a normalpopulation, then the sampling distribution ofthe sample mean is normally distributed forall values of n (sample size).

    If a random sample is drawn from anypopulation, the sampling distribution of thesample mean is approximately normal for asufficiently large sample size.

  • 7/26/2019 PRBE002 Slides Session 05

    5/18

    6/02/2014

    5

    13

    Central Limit Theorem

    The larger the sample size, the more closely

    the sampling distribution of will resemble anormal distribution.

    In most practical situations, a sample size of30 may be sufficiently large to allow us to usethe normal distribution as an approximationfor the sampling distribution of .

    14

    Sampling Distribution of the Sample Mean

    15

    We can standardise the sampling distribution ofthe sample mean as

    Sampling Distribution of the Sample Mean

  • 7/26/2019 PRBE002 Slides Session 05

    6/18

    6/02/2014

    6

    16

    The summaries above assume that the population

    is infinitely large. However, if the population is finitethe standard error is

    where N is the population size and

    is the f i n i t e p o p u l a t i o n c o r r e c t i o n f a c t o r .

    Sampling Distribution of the Sample Mean

    17

    Sampling Distribution of the Sample Mean

    If the population size is large relative to thesample size the finite population correctionfactor is close to 1 and can be ignored.

    We will treat any population that is at least 20times larger than the sample size as large.

    In practice, most applications involve

    populations that qualify as large. As a consequence the finite population

    correction factor is usually omitted.

    18

    The weight of each 32g chocolate bar isnormally distributed with a mean of 32.2 gand a standard deviation of 0.3 g.

    Find the probability that, if a customer buysone chocolate bar, that bar will weigh morethan 32 g.

    Solution The random variable X is the weight of a

    chocolate bar.

    = 32.2

    0.7486

    x = 32

    Example 10.1

  • 7/26/2019 PRBE002 Slides Session 05

    7/18

    6/02/2014

    7

    19

    = 32.2

    0.7486

    x = 32

    Find the probability that, if a customer buysa pack of 4 bars, the mean weight of the 4

    bars will be more than 32 g.Solution

    The random variable here is the meanweight per chocolate bar, . We want

    0.9082

    20

    Example

    The average weekly income of graduates oneyear after graduation is $600. Suppose thedistribution of weekly income has a standarddeviation of $100. What is the probability that25 randomly-selected graduates have anaverage weekly income of less than $550?

    Solution

    Let X be the weekly income of graduates one

    year after graduation.

    21

    ExampleThe average weekly income of graduatesone year after graduation is $600. Supposethe distribution of weekly income has astandard deviation of $100.

    1. What is the probability that 25 randomly-selected graduates have an average weeklyincome of less than $550?

    2. If a random sample of 25 graduates actuallyhad an average weekly income of $550, whatwould you conclude about the validity of theclaim that the average weekly income is$600?

  • 7/26/2019 PRBE002 Slides Session 05

    8/18

    6/02/2014

    8

    22

    Solution

    Let X be the weekly income of graduates one

    year after graduation.1.

    2. With = 600, the probability of having asample mean of 550 is very low (0.0062).So the claim that the average weeklyincome is $600 is probably unjustified.

    It would be more reasonable toassume that is smaller than $600,because then a sample mean of $550becomes more probable.

    23

    To make inferences about population parameterswe use sampling distributions.

    The symmetry of the normal distribution along withthe sample distribution of the mean lead to:

    - Z.025 Z.025

    Using Sampling Distributions forInference

    24

    1.96 1.960

    0.025 0.025

    0.025 0.025

    Standard normal distribution Z

    Normal distribution of

  • 7/26/2019 PRBE002 Slides Session 05

    9/18

    6/02/2014

    9

    25

    Conclusion

    There is a 95% chance that the samplemean falls within the interval [560.8,639.2] if the population mean is 600.

    Since the sample mean was 550, thepopulation mean is probably not 600.

    26

    In general

    27

    Creating the Sampling Distributionby Computer Simulation

    By producing data sets of randomnumbers that come from givendistributions, we can verify probabilisticand statistical characteristics.

    We simulate a dice-tossing experiment(creating the distribution of theaverage).

    Effects of an increasing sample size onthe distribution of the mean are shown.

  • 7/26/2019 PRBE002 Slides Session 05

    10/18

    6/02/2014

    10

    28

    Simulation of Dice Tossing

    n = 2 n = 5

    n = 10Mean = 3.494

    Stand. dev. = 0.544

    Mean = 3.486

    Stand. dev. = 1.215

    Mean = 3.495

    Stand. dev. = 0.749

    29

    Type in the variable values and the probability (type 1/6).

    Create a histogram

    for the distribution

    of the mean.

    Type the bin.

    Excel

    Creating a

    simulated

    distribution

    of the mean

    Calculate the means.

    Create samples

    of size two.

    30

    The parameter of interest for nominal datais theproportion of times a particularoutcome (success) occurs.

    To estimate the population proportionp weuse the sample proportion .

    The sampling distribution of is binomial.

    We prefer to use normal approximation tothe binomial distribution to makeinferences about .

    p

    10.2 Sampling Distribution of the

    Sample Proportion

    p

    p

  • 7/26/2019 PRBE002 Slides Session 05

    11/18

    6/02/2014

    11

    31

    Normal Approximation to BinomialBinomial distribution with n=20 and p=.5 with anormal approximation superimposed (=10 and

    =2.24)

    32

    Normal Approximation to Binomial

    Binomial distribution with n=20 and p=.5 with a normalapproximation superimposed ( =10 and =2.24).

    where did these values come from?!

    From Section 7.6 we saw that:

    Hence:

    and

    33

    Normal approximation to the binomialworks best when

    the number of experiments (sample size) islarge, and

    the probability of successp is close to 0.5.

    For the approximation to provide goodresults:

    np 5; n(1 p) 5

    Normal Approximation to Binomial

  • 7/26/2019 PRBE002 Slides Session 05

    12/18

    6/02/2014

    12

    34

    To calculate P(X=10) usingthe normal distribution, we

    can find the area under thenormal curve between 9.5and 10.5

    P(X = 10) P(9.5 < Y < 10.5)

    where Y is a normal random variable approximatingthe binomial random variable X.

    Normal Approximation to Binomial

    35

    P(X = 10) P(9.5 < Y < 10.5)

    where Y is a normal random variable approximatingthe binomial random variable X.

    In fact:

    P(X = 10) = .176

    while

    P(9.5 < Y < 10.5) = .1742

    t h e a p p r o x i m a t i on i s q u i t e

    g o o d .

    Normal Approximation to Binomial

    36

    Approximate the binomial probabilityP(X = 10) when n = 20 and p = 0.5.

    The parameters of the normaldistribution used to approximate thebinomial are:

    = np; = np(1 p)

    Example

  • 7/26/2019 PRBE002 Slides Session 05

    13/18

    6/02/2014

    13

    37

    109.5 10.5

    P(XBinomial = 10)~= P(9.5

  • 7/26/2019 PRBE002 Slides Session 05

    14/18

    6/02/2014

    14

    40

    The Laurier companys brand has a market

    share of 30%. In a survey, 1 000consumers were asked which brand theyprefer. What is the probability that morethan 32% of the respondents say theyprefer the Laurier brand?

    Example

    41

    Solution

    The number of respondents who prefer Laurieris binomial with n = 1000 and p = 0.30.

    Also, np = 1000(0.3) = 300 > 5n(1 p) = 1000(1 0.3) = 700 > 5.

    Therefore, is normal with mean p = 0.30and standard error

    Hence

    42

    Sampling Distribution of theDifference between Two Means,

    The difference between two means canbecome a parameter of interest when thecomparison between two populations isstudied.

    To make an inference about 1 2 weobserve the distribution of .

  • 7/26/2019 PRBE002 Slides Session 05

    15/18

    6/02/2014

    15

    43

    Applying the laws of expected valueand variance we have:

    The distribution of is normalwith mean 1 2 and standarddeviation of if

    the two samples are independent

    the original populations are normallydistributed.

    44

    If the original populations are notnormally distributedbut the samplesizes are 30 or more, the distributionof is approximately normal.

    45

    The mean salaries of MBA graduates fromtwo universities (1 and 2), after 5 years,are $62 000 (stand. dev. = $14 500) and$60 000 (stand. dev. = $18 300).What isthe probability that a sample mean ofUniversity-1 graduates will exceed thesample mean of University-2 graduates?(n1 = 50; n2 = 60)

    Solution

    As the sample sizes are more than 30, thedistribution of is approximatelynormal.

    Example

  • 7/26/2019 PRBE002 Slides Session 05

    16/18

    6/02/2014

    16

    1 2 = 62 000 60 000 = $2 000

    Th e r e i s a b o u t a 7 4% c h a n c e t h a t t h e s am p l e m e a n

    s t a r t i n g s a l ar y o f U . # 1 g r a d s w i l l e x c e e d t h a t o f U .

    # 2 . 46

    47

    In Chapters 7 and 8 we introducedprobability distributions, which allowed usto make probability statements aboutvalues of the random variable.

    A prerequisite of this calculation isknowledge of the distribution and the

    relevant parameters.

    From Here to Inference

    48

    In Example 7.9, we needed to know thatthe probability that Pat Statsdud guessesthe correct answer is 20% (p = .2) andthat the number of correct answers(successes) in 10 questions (trials) is abinomial random variable.

    We then could compute the probability ofany number of successes.

    From Here to Inference

  • 7/26/2019 PRBE002 Slides Session 05

    17/18

    6/02/2014

    17

    In Example 8.3, we needed to know thatthe amount of time to assemble acomputer is normally distributed with amean of 60 minutes and a standarddeviation of 8 minutes.

    These three bits of information allowed usto calculate the probability of variousvalues of the random variable.

    From Here to Inference

    49

    50

    The figure below symbolically represents the use ofprobability distributions.

    Simply put, knowledge of the population and itsparameter(s) allows us to use the probabilitydistribution to make probability statements aboutindividual members of the population.

    Probability Distribution ---------- Individual

    From Here to Inference

    51

    In this chapter we developed the samplingdistribution, wherein knowledge of the parameter(s)and some information about the distribution allowus to make probability statements about a samplestatistic.

    ----- Statistic

    From Here to Inference

  • 7/26/2019 PRBE002 Slides Session 05

    18/18

    6/02/2014

    52

    Statistics works by reversing the direction of

    the flow of knowledge in the previous figure.The next figure displays the character of statistical inference.

    Starting in Chapter 11, we will assume thatmost population parameters are unknown. Thestatistics practitioner will sample from thepopulation and compute the required statistic.The sampling distribution of that statistic willenable us to draw inferences about theparameter.

    From Here to Inference

    53

    From Here to Inference

    Statistic ------ Parameter