Week 8_2014s2


  • 8/10/2019 Week 8_2014s2

    1/31

  • 8/10/2019 Week 8_2014s2

    2/31

    2

    Week 8 topics

    Confidence intervals for the population mean

    More on hypothesis testing

    Type I and Type II errors

    p-values

    Power

  • 8/10/2019 Week 8_2014s2

    3/31

    3

    Interval estimation: review

    Point estimators produce a single estimate of the parameter of interest

    In many real-world situations, some notion of the margin of error would be useful

    Interval estimators produce an interval (i.e., a range of values) and a degree of confidence associated with that interval

    Hence the name "confidence interval"

    How often would you expect the true population parameter to be in this (sample-specific) interval?

  • 8/10/2019 Week 8_2014s2

    4/31

    4

    Interval estimation for means

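    Suppose \(\bar{X} \sim N(\mu, \sigma^2/n)\) and thus \(Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)\)

    Consider a symmetric interval around this standardized value of \(\bar{X}\):

    \[P\left(-\text{bound} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \text{bound}\right) = P(-\text{bound} < Z < \text{bound}) = 1 - \alpha\]

    Say we choose \(\alpha = .05\) (\(\alpha/2 = .025\)), so that bound \(= 1.96\) (from tables):

    \[P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = .95\]

    Now we can rearrange this probability statement to yield:

    \[P\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = .95\]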

  • 8/10/2019 Week 8_2014s2

    5/31

    5

    Interval estimation

    Then the endpoints \(\bar{X} \pm 1.96\,\sigma/\sqrt{n}\) define a confidence interval!

    Remember: The endpoints of the interval are themselves random variables

    We have constructed a random interval

    \(\mu\) is a constant

    For a particular sample (and sample mean value), \(\mu\) is either in the confidence interval, or it is not

    If 100 size-\(n\) samples were drawn, we would expect the intervals from 95 of them to include \(\mu\)
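
The "95 out of 100" interpretation can be checked by simulation. The sketch below is illustrative only: the population values (mean 10, sigma 2), the sample size of 30, and the function names are assumptions chosen for the demonstration, not values from the lecture. It draws many samples from a normal population with known sigma, builds the 1.96-sigma interval around each sample mean, and reports the fraction of intervals that contain the true mean.

```python
import math
import random


def z_interval(sample, sigma, z=1.96):
    """Known-sigma confidence interval for the mean: x_bar +/- z * sigma / sqrt(n)."""
    n = len(sample)
    x_bar = sum(sample) / n
    half_width = z * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


def coverage(mu, sigma, n, repetitions, seed=0):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lower, upper = z_interval(sample, sigma)
        if lower < mu < upper:
            hits += 1
    return hits / repetitions


# Hypothetical population: mu = 10, sigma = 2, samples of size 30.
print(coverage(mu=10.0, sigma=2.0, n=30, repetitions=5000))
```

The printed fraction should sit close to 0.95. Any single interval either contains the true mean or does not; the 95% describes the procedure across repeated samples.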

  • 8/10/2019 Week 8_2014s2

    6/31

    6

    Confidence intervals

    CIs for means and proportions typically have a similar structure

    Centred at sample statistics

    Endpoints are the sample statistic plus or minus some multiple of the standard error (if we don't know sigma) or standard deviation (if we do know sigma) of the sampling distribution

    The multiple is determined by the confidence level chosen by the investigator

    Remember: If you don't know sigma and have a small sample, use the t-distribution tables to get your bounds, not the Z!

  • 8/10/2019 Week 8_2014s2

    7/31

    7

    Selecting sample size

    Recall the auditor Clare from last lecture

    Suppose she is OK with assuming she knows sigma, and needs to decide on a sample size

    She wants a sample size that yields a margin of error of $4, and she is willing to set the confidence level at 90%

    We can now write down the CI and use it to solve for the sample size \(n\) that she requires.

    Recall, \(\sigma\) = $30.334, by assumption (based on historical data)

    \[\bar{X} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \quad \text{defines the confidence interval}\]

    Thus we require

    \[z_{.05}\,\frac{\sigma}{\sqrt{n}} = 4 \quad \text{or} \quad 1.645 \times \frac{30.334}{\sqrt{n}} = 4 \quad \Rightarrow \quad n \approx 156\]
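
The sample-size calculation can be reproduced in a few lines. This is a minimal sketch using Python's standard library; the function name `required_sample_size` is an illustrative choice, and the inputs are the slide's values (sigma = 30.334, margin of error = 4, 90% confidence). Solving z * sigma / sqrt(n) = margin for n gives n = (z * sigma / margin)^2, rounded up to the next whole observation.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence):
    """Smallest n for which z_{alpha/2} * sigma / sqrt(n) <= margin (sigma assumed known)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.645 for 90% confidence
    return math.ceil((z * sigma / margin) ** 2)


print(required_sample_size(sigma=30.334, margin=4.0, confidence=0.90))  # 156
```

Rounding up rather than to the nearest integer ensures the margin of error does not exceed the target.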

  • 8/10/2019 Week 8_2014s2

    8/31

    8

    Hypothesis testing examples and concepts, again

    Maintained or null hypothesis

      Some statement about a population parameter

      Let \(X\) be the weight of precooked meat, with mean \(\mu\)

      Then the null hypothesis is \(H_0: \mu = 0.25\)

    Alternative hypothesis

      Will depend on the research objective

      Some possibilities here:

        \(H_1: \mu \ne 0.25\), two-tailed hypothesis test (so a value too extreme in either direction violates the tenet of "Truth in Advertising")

        \(H_1: \mu < 0.25\), one (lower)-tailed hypothesis test (so a value too low violates the minimum standard of a trading standards agency or consumer advocacy group)

  • 8/10/2019 Week 8_2014s2

    9/31

    9

    Hypothesis testing examples and concepts

    Recall how data are used to test a null hypothesis:

      Proceed by comparing a test statistic with the value specified in \(H_0\) and decide whether the difference is:

        Small enough to be attributable to random sampling errors: do not reject \(H_0\), or

        So large that \(H_0\) is more likely not to be correct: reject \(H_0\)

    Formally define a rejection (or critical) region

      Values of the test statistic that are so extreme they lead us to reject \(H_0\) in favour of \(H_1\)

    Other values of the test statistic that are not so extreme lie in the non-critical region

  • 8/10/2019 Week 8_2014s2

    10/31

    10

    Quality control at McDonald's

    A quarter-pounder with cheese is presumed to comprise 0.25 pounds (0.11 kg) of precooked meat

    Consider \(H_0: \mu = 0.25\), \(H_1: \mu < 0.25\)

    A sample of 25 hamburgers produces sample mean weights (in pounds!) of: (a) 0.24 (b) 0.23 (c) 0.28 (d) 0.21

    Which of these represents evidence against \(H_0\)?

    Which of these would lead you to reject \(H_0\)?

    For which are you most likely to reject \(H_0\)?

  • 8/10/2019 Week 8_2014s2

    11/31

    11

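    Quality control at McDonald's

    Assume the weight of hamburger meat is \(X \sim N(\mu, \sigma^2)\)

    \[H_0: \mu = 0.25\]

    \[H_1: \mu < 0.25\]

    We need to determine the rejection region, which will be:

    Reject \(H_0\) if \(\bar{X} < \bar{x}_L\), which implies

    \[P(\bar{X} < \bar{x}_L) = P\left(Z < \frac{\bar{x}_L - .25}{\sigma/\sqrt{n}}\right) = \alpha\]

    Thus if we set \(\alpha\) and know \(\sigma\) and \(n\), then we can determine \(\bar{x}_L\).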

  • 8/10/2019 Week 8_2014s2

    12/31

  • 8/10/2019 Week 8_2014s2

    13/31

    13

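    Quality control at McDonald's

    [Figure: standard normal density over z from -3 to 3, with the lower-tail rejection region of area \(\alpha = 0.05\) to the left of z = -1.645; on the \(\bar{x}\) scale the critical value is 0.2336, with the sampling distribution centred at \(\mu = 0.25\)]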

  • 8/10/2019 Week 8_2014s2

    14/31

    Quality control at McDonald's

    Our choice of significance level matters!

    Suppose \(\alpha = 0.01\)

    Our new rejection region is then z < -2.33 instead of z < -1.645, or in terms of the sample mean, the cutoff is 0.2267 rather than 0.2336

    Does it make sense that the critical value is lower on the number line when \(\alpha = 0.01\) than when \(\alpha = 0.05\)?

    Thus, case (b) with a sample mean of 0.23 would now NOT lead to rejection of the null hypothesis
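
The two cutoffs and the resulting decisions for cases (a) to (d) can be checked numerically. This is a sketch under the example's stated values: null mean 0.25, n = 25, and sigma = 0.05 (the value the lecture uses for this example when it later computes Type II errors). The helper name `lower_tail_cutoff` is illustrative.

```python
import math
from statistics import NormalDist


def lower_tail_cutoff(mu0, sigma, n, alpha):
    """Sample-mean cutoff below which H0: mu = mu0 is rejected in a lower-tailed z-test."""
    z_alpha = NormalDist().inv_cdf(alpha)  # negative, e.g. about -1.645 for alpha = 0.05
    return mu0 + z_alpha * sigma / math.sqrt(n)


# The four candidate sample means from the example, in pounds.
sample_means = {"a": 0.24, "b": 0.23, "c": 0.28, "d": 0.21}

for alpha in (0.05, 0.01):
    cutoff = lower_tail_cutoff(mu0=0.25, sigma=0.05, n=25, alpha=alpha)
    rejected = [label for label, x_bar in sample_means.items() if x_bar < cutoff]
    print(f"alpha={alpha}: cutoff={cutoff:.4f}, reject H0 for cases {rejected}")
```

At alpha = 0.05 the cutoff is about 0.2336, so cases (b) and (d) reject; at alpha = 0.01 it drops to about 0.2267 and only case (d) rejects.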

  • 8/10/2019 Week 8_2014s2

    15/31

  • 8/10/2019 Week 8_2014s2

    16/31

    16

    p-values

    How do I choose the significance level \(\alpha\)?

      No rules

      Conventional choices are \(\alpha\) = 0.1, 0.05, or 0.01

      In the McDonald's example, we saw it could matter

    Why do I have to choose a particular \(\alpha\)?

      You don't (though doing so can helpfully bind your hands)

      You can calculate the empirical significance level, or p-value

    The p-value associated with a given test statistic is the probability of obtaining a value of the test statistic as extreme as or more extreme than that observed, given that the null hypothesis is true

      "More extreme" depends on the form of the alternative hypothesis

  • 8/10/2019 Week 8_2014s2

    17/31

    17

    p-values

    From the skills test example of last week:

    \[p\text{-value} = P(\bar{X} > 7.09) = P\left(\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} > \frac{7.09 - 6.60}{1/5}\right) \approx P(Z > 2.46) = .0069\]

    Thus, it is very unlikely (less than a 1% chance) to find such an extreme value for the sample mean test score if \(H_0\) is correct

    There is strong evidence to reject \(H_0\)

    Put another way: for any choice of significance level (\(\alpha\)) greater than .0069, we would reject \(H_0\)

    We can only calculate this if we assume the sampling distribution of \(\bar{X}\) is centred at a particular value, which we assume to be the population mean under the null!

  • 8/10/2019 Week 8_2014s2

    18/31

  • 8/10/2019 Week 8_2014s2

    19/31

    Student progress in BES...

    Key features of past results for the 10-mark quiz

      On average, students do well

      Median = 9, mean = 8.5, only 5.3% with mark

  • 8/10/2019 Week 8_2014s2

    20/31

    20

    Student progress in BES...

    Q1: Suppose a student randomly chose another BES student to help him with his work

      What is the probability that the chosen student's test mark is at least 9?

    Q2: Define a "good" tutorial (of 20 students) to be one where the mean mark is at least 9

      What is the probability that any given student is in a "good" tutorial?

    Q3: Was the semester 1 2014 BES cohort any different from what we have seen in the past?

      Test this hypothesis using data from a randomly-selected s1 2014 tutorial of size 25, in which the mean mark is 8.8

  • 8/10/2019 Week 8_2014s2

    21/31

  • 8/10/2019 Week 8_2014s2

    22/31

    22

    Student progress in BES...

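    Q3: \(H_0: \mu = 8.5\), \(H_1: \mu \ne 8.5\)

    By CLT, \(\bar{X} \sim N(8.5,\ 1.88^2/25)\). We can calculate the p-value associated with our sample mean, assuming the two-tailed test stated above.

    \[p = 2\,P(\bar{X} > 8.8) = 2\,P\left(Z > \frac{8.8 - 8.5}{1.88/\sqrt{25}}\right) = 2\,P(Z > 0.80) = 2 \times 0.2119 = 0.4238\]

    (Here 8.8 is the sample mean in our sample!)

    At, say, \(\alpha = 0.01\) (or any other conventional level!), there is insufficient evidence to reject \(H_0\).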
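
The two-tailed p-value for Q3 can be computed directly instead of read from tables. A minimal sketch using the slide's numbers (null mean 8.5, sigma 1.88, n = 25, observed sample mean 8.8); the function name `two_tailed_p_value` is illustrative. Because no rounding of z to two decimals takes place, the result differs slightly from the table-based 0.4238.

```python
import math
from statistics import NormalDist


def two_tailed_p_value(x_bar, mu0, sigma, n):
    """Two-tailed p-value for a z-test of H0: mu = mu0 with known sigma."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    # Probability of a statistic at least this far from zero in either direction.
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


p = two_tailed_p_value(x_bar=8.8, mu0=8.5, sigma=1.88, n=25)
print(f"p-value = {p:.4f}")
for alpha in (0.10, 0.05, 0.01):
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(f"alpha = {alpha}: {decision}")
```

The p-value is roughly 0.42, far above every conventional significance level, so the conclusion matches the slide.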

  • 8/10/2019 Week 8_2014s2

    23/31

    Hypothesis testing: A note about types of errors

    Concepts about errors one can make during hypothesis testing are similar in statistical inference and in the judicial system (introduced in last lecture)

    Recall the McDonald's quality control example

      A quarter-pounder with cheese is presumed to comprise 0.25 pounds (0.11 kg) of precooked meat

      Given data (a sample of hamburgers), we can evaluate this implicit claim

      Yet we might conclude that their hamburgers do contain 0.25 pounds when in fact they don't (false negative)

      Or, we might conclude that their hamburgers don't contain 0.25 pounds when in fact they do (false positive)

  • 8/10/2019 Week 8_2014s2

    24/31

    24

    Hypothesis testing examples and concepts

    Type I errors occur when we reject a true null hypothesis

      Only possible to make this error when the null is true

      Denote P(Type I error) = P(Reject \(H_0\) | \(H_0\) true) = \(\alpha\), the significance level

    Type II errors occur when we don't reject a false null hypothesis

      Only possible to make this error when the null is false

      Denote P(Type II error) = P(Do not reject \(H_0\) | \(H_0\) not correct) = \(\beta\)

      P(Type II error) depends on what the actual (alternative) parameter value is!

  • 8/10/2019 Week 8_2014s2

    25/31

    25

    Calculating probability of Type II errors

    Recall our McDonald's example: Suppose we conduct this one-tailed test:

    \(H_0: \mu = 0.25\), \(H_1: \mu < 0.25\)

    With \(n = 25\), \(\sigma = 0.05\), and \(\alpha = 0.05\), we previously found the relevant decision rule to be a rejection of \(H_0\) if we find that our sample mean < 0.2336

    Suppose that in fact, McDonald's only puts 0.24 pounds in their quarter pounder. Will we detect this?

    \(\beta\) = the probability that we get a test statistic that makes us fail to reject the null, given that an alternative is true

    \[\beta = P(\bar{X} > .2336 \mid \mu = .24) = P\left(Z > \frac{.2336 - .24}{0.05/\sqrt{25}}\right) = P(Z > -0.64) = .7389\]

    We now assume the sampling distribution is centred at the alternative!

    There is a 74% chance of not detecting the discrepancy!

  • 8/10/2019 Week 8_2014s2

    26/31

    26

    Power of a test

    Power (in statistics): The probability of correctly rejecting a false null hypothesis

    P(Do not reject \(H_0\) | \(H_0\) not correct) = \(\beta\)

    P(Reject \(H_0\) | \(H_0\) not correct) = Power = \(1 - \beta\)

    From the prior slide:

    \[\beta = P(Z > -0.64) = .7389\]

    So, given a true population parameter of 0.24 pounds of meat in the quarter pounder, the power of this test is:

    \[1 - \beta = 1 - 0.7389 = 0.2611\]
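
The Type II error probability and power for the hamburger example can be verified with a short calculation. This sketch assumes the example's values (null mean 0.25, true mean 0.24, sigma 0.05, n = 25, alpha 0.05); the function name `type_ii_error_lower_tail` is illustrative. It computes the cutoff without rounding, so the output differs in the third decimal place from the table-based 0.7389 and 0.2611.

```python
import math
from statistics import NormalDist


def type_ii_error_lower_tail(mu0, mu_true, sigma, n, alpha):
    """P(fail to reject H0) for a lower-tailed z-test when the true mean is mu_true."""
    standard_error = sigma / math.sqrt(n)
    # Decision rule is set under H0: reject when the sample mean is below the cutoff.
    cutoff = mu0 + NormalDist().inv_cdf(alpha) * standard_error
    # Under the alternative, the sampling distribution is centred at mu_true.
    sampling_dist = NormalDist(mu=mu_true, sigma=standard_error)
    return 1.0 - sampling_dist.cdf(cutoff)


beta = type_ii_error_lower_tail(mu0=0.25, mu_true=0.24, sigma=0.05, n=25, alpha=0.05)
print(f"beta  = {beta:.4f}")
print(f"power = {1.0 - beta:.4f}")
```

The two steps mirror the slides: the cutoff comes from the distribution under the null, while the probability of landing above it is evaluated under the alternative.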

  • 8/10/2019 Week 8_2014s2

    27/31

    27

    Power of a test

    Suppose a company is considering installing an additional press to continuously extrude copper. The investment is only viable if the press extrudes more than 170 metres of copper per hour. This suggests the following hypothesis test:

    \(H_0: \mu = 170\); \(H_1: \mu > 170\);

    with the firm investing in an additional press upon rejection of the null.

    A large random sample of 400 production hours from existing similar presses in the plant has a sample mean of 176 m/h and a sample standard deviation of 65 m/h. Further suppose that this sample is large enough to invoke the CLT.

  • 8/10/2019 Week 8_2014s2

    28/31

    28

    Power of a test

    Say the firm sets up the hypothesis test using \(\alpha = 0.01\).

    \(H_0: \mu = 170\); \(H_1: \mu > 170\)

    \(n = 400\), \(\alpha = 0.01\), \(s = 65\)

    Decision rule: Reject \(H_0\) if sample mean > 177.57

    The firm would hate to make a mistake over this critical investment. If the new press had a mean production of 180 m/h, it would be a very attractive investment. What is the power of the test if, in actual fact, \(\mu = 180\)?

    \[\beta = P(\bar{X} < 177.57 \mid \mu = 180) = P\left(Z < \frac{177.57 - 180}{65/\sqrt{400}}\right) = P(Z < -0.75) = 0.2266\]

    \[1 - \beta = 1 - 0.2266 = 0.7734\]

    Verify at home!!

  • 8/10/2019 Week 8_2014s2

    29/31

    29

    Power of a test

    What happens to power if we increase \(\alpha\) to 0.05?

      The power increases: \(1 - \beta = 1 - 0.0764 = 0.9236\).

    What happens to power if we increase \(n\) to 1000?

      The power increases: \(1 - \beta \approx 1 - 0 \approx 1\)

  • 8/10/2019 Week 8_2014s2

    30/31

    Power of a test...

    Summary: \(H_0: \mu = 170\); \(H_1: \mu = 180\)

      If \(\alpha = 0.01\), then \(\beta = 0.2266\) (Power = 0.7734) when \(n = 400\)

      If \(\alpha = 0.05\), then \(\beta = 0.0764\) (Power = 0.9236) when \(n = 400\)

      If \(\alpha = 0.05\), then \(\beta \approx 0.0000\) (Power \(\approx 1\)) when \(n = 1000\)

    With a different alternative, e.g. \(H_1: \mu = 178\) (verify!):

      If \(\alpha = 0.05\), then \(\beta = 0.2075\) (Power = 0.7925) when \(n = 400\)

    This (closer-to-the-null) alternative is harder to detect!

    P(Z
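
One way to take up the "verify!" invitation for the copper-press example is to compute the power for each scenario in the summary. The sketch below assumes the slide's values (null mean 170, standard deviation 65 treated as known) and an upper-tailed z-test; the function name `power_upper_tail` is illustrative. Results use unrounded critical values, so they can differ from the table-based figures in the third or fourth decimal place.

```python
import math
from statistics import NormalDist


def power_upper_tail(mu0, mu_true, sigma, n, alpha):
    """P(reject H0: mu = mu0) for an upper-tailed z-test when the true mean is mu_true."""
    standard_error = sigma / math.sqrt(n)
    # Reject H0 when the sample mean exceeds this cutoff (set under H0).
    cutoff = mu0 + NormalDist().inv_cdf(1.0 - alpha) * standard_error
    # Evaluate the rejection probability with the sampling distribution centred at mu_true.
    return 1.0 - NormalDist(mu=mu_true, sigma=standard_error).cdf(cutoff)


# (true mean, sample size, significance level) for each row of the summary.
scenarios = [
    (180, 400, 0.01),
    (180, 400, 0.05),
    (180, 1000, 0.05),
    (178, 400, 0.05),
]
for mu_true, n, alpha in scenarios:
    power = power_upper_tail(mu0=170, mu_true=mu_true, sigma=65, n=n, alpha=alpha)
    print(f"mu={mu_true}, n={n}, alpha={alpha}: beta={1.0 - power:.4f}, power={power:.4f}")
```

The output shows the same pattern as the summary: power rises with a larger alpha or a larger sample, and falls when the true mean is closer to the null value.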

  • 8/10/2019 Week 8_2014s2

    31/31

    31

    Progress report

    We now have procedures to test hypotheses in a range of circumstances, using the sample proportion and the sample mean

    We can use the standard normal tables and the t-tables, as appropriate, in generating confidence intervals and testing hypotheses

    We know the mistakes we could make in our testing, and about the power of our tests.

    Next week: Chi-squared tests.

    After that, the final broad module of the course begins: Linear regression!