Chapter 9 Hypothesis Testing - Queen's...


  • Chapter 9

    Hypothesis Testing

    9.1 Testing Hypotheses

    • With our knowledge of interval estimation, we can now consider hypothesis tests.

    • An Example of an Hypothesis Test: Statisticians at Employment and Immigration Canada believe that the average duration of unemployment in Alberta is less than 6 weeks.

    They want to test:

    \( H_0: \mu \le 6 \)

    \( H_1: \mu > 6 \)

    NOTE:

    1. \( H_0 \) is called the null hypothesis. It is typically what we are interested in. It is a maintained hypothesis that is held to be true unless sufficient evidence to the contrary is obtained.

    2. \( H_1 \) is called the alternative hypothesis. It is a hypothesis against which the null hypothesis is tested and which will be held to be true if the null is held false.


    Figure 9.1: The Null Hypothesis, \( H_0 \) (slide from Statistics for Business and Economics, 6e, © 2007 Pearson Education, Inc., Chap 10-4). The null hypothesis states the (numerical) assumption to be tested, for example that the average number of TV sets in U.S. homes is equal to three (\( H_0: \mu = 3 \)). It is always about a population parameter (\( H_0: \mu = 3 \)), not about a sample statistic (\( H_0: \bar{X} = 3 \)).

    9.2 A Two-Sided Hypothesis Test Example

    A firm's sales records show that customers spend on average $550 per month on their product. They wish to know whether this has changed this year, using a significance level of .01. They survey 30 customers and find that the mean expenditure is $510 with a sample standard deviation of $90. The calculations below work in units of $100, so that \( \mu_0 = 5.5 \), \( \bar{X} = 5.1 \) and \( s = .9 \).

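    We can follow the steps outlined earlier:

    1. Formulate the null and alternative hypotheses

    \( H_0: \mu = 5.50 \) (null hypothesis)

    \( H_1: \mu \neq 5.50 \) (alternative hypothesis)

    2. Level of significance of the test: say \( \alpha = .01 \)

    3. Calculate the test statistic as

    \[ t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{5.1 - 5.5}{.9/\sqrt{30}} = -2.434 \]

    4. Critical region (rejection rule):

    reject \( H_0 \) if \( |t| > t_{\alpha/2,\,n-1} \)

    The critical value is \( t_{.01/2,\,29} = 2.756 \) (\( n = 30 \)).

    We do not reject \( H_0 \), since \( |t| = 2.434 < t_{.01/2,\,29} = 2.756 \).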
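The four steps above can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the variable names are illustrative, and the critical value 2.756 is simply copied from the text rather than computed.

```python
import math

# Sample information from the sales example (in units of $100).
mu_0 = 5.5    # mean under the null hypothesis
x_bar = 5.1   # sample mean
s = 0.9       # sample standard deviation
n = 30        # sample size

# Critical value t_{.01/2, 29}, copied from the text (an input, not computed).
t_crit = 2.756

# Step 3: t statistic = (sample mean - hypothesized mean) / standard error.
std_err = s / math.sqrt(n)
t_stat = (x_bar - mu_0) / std_err

# Step 4: two-sided rule, reject H0 when |t| exceeds the critical value.
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Changing `t_crit` to the critical value for a larger significance level shows how the same statistic can lead to a different decision.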

    9.3 Interpretation and Notes

    1. We say the null hypothesis that the population mean is equal to 5.5 is not rejected at the 1% level of significance. This wording makes it clear that it might be rejected if we were to choose a higher level of significance (try \( \alpha = .10 \)). Notice we never "accept" a null hypothesis.

    2. The idea is that the difference between \( \bar{X} \) and \( \mu_0 \) is not significant, since it could arise from sampling variability under the null. Even if \( H_0 \) were true we would expect to see some samples with \( \bar{X} < 5.5 \).

    3. If we reject \( H_0 \), it implies the difference between \( \bar{X} \) and \( \mu_0 \) is too large to be attributed to ordinary sampling variability.

    4. The whole trick in hypothesis testing is to figure out the correct statistic to construct (one with a known sampling distribution) and then to find a rejection region. Drawing a picture often helps one avoid mistakes with rejection regions.

    5. We know that if the \( X_i \) are normal then we can use the t-distribution in situations where we are estimating the variance.

    6. On the other hand, if the \( X_i \) are not normally distributed we can appeal to the central limit theorem, so that \( \bar{X} \) is approximately normally distributed as \( n \) gets large and we can still use the t-distribution. Also, as we have seen, for \( n > 30 \) the t-distribution is close to the normal, and critical values are often calculated directly from the normal tables.

    9.4 Confidence Intervals and Hypothesis Tests

    • A final relation between hypothesis tests and confidence intervals can be stated.

    • If one calculates a 95% confidence interval for \( \mu \) and finds that the value \( \mu_0 \) is contained in the interval, then we know that the null hypothesis (\( H_0: \mu = \mu_0 \)) would not be rejected at the 5% level of significance against a two-sided alternative.

    • Similarly, if the confidence interval did not contain the value \( \mu_0 \), then the null hypothesis is rejected.

    • Once we have calculated the confidence interval, we have in fact obtained all the possible null hypotheses that would be retained at the chosen significance level for this particular sample.

    • For the sales example, the two-sided 99% confidence interval (matching \( \alpha = .01 \)) is:

    \[ \bar{X} \pm t_{.01/2,\,29} \times \frac{s}{\sqrt{n}} \]

    • The calculation gives (4.65, 5.55), which contains 5.5, the value under the null hypothesis.
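The link between the interval and the test can be made concrete with a small Python sketch. It is only an illustration under the assumptions of the sales example: the helper name `retained` is hypothetical, and the critical value is again taken from the text.

```python
import math

x_bar, s, n = 5.1, 0.9, 30
t_crit = 2.756  # t_{.01/2, 29}, copied from the text

# Two-sided 99% confidence interval for the population mean.
half_width = t_crit * s / math.sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width


def retained(mu_0):
    """True if H0: mu = mu_0 would NOT be rejected at the 1% level (two-sided)."""
    return lower <= mu_0 <= upper


print(f"99% CI: ({lower:.3f}, {upper:.3f})")
for mu_0 in (4.5, 5.0, 5.5, 6.0):
    print(mu_0, "retained" if retained(mu_0) else "rejected")
```

Every hypothesized value inside the interval is retained and every value outside it is rejected, which is exactly the statement made in the bullets above.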

    9.5

    9.6 Definitions and Terms

    1. Test Statistic: A test statistic is a random variable whose value determines whether we reject or do not reject the null hypothesis.

    2. Decision or Rejection Rule: A decision rule specifies the set of values for the test statistic for which the null hypothesis \( H_0 \) will be rejected and the set of values for which \( H_0 \) will not be rejected.

    3. Critical Region or Rejection Region: The critical region of a test consists of all the values of the test statistic for which \( H_0 \) will be rejected.

    4. Non-Rejection Region: The non-rejection region of a test consists of all the values of the test statistic for which \( H_0 \) will not be rejected.

    5. Critical Values: Critical values of a test statistic separate the critical region from the non-rejection region.

    6. Level of Significance (usually denoted as \( \alpha \)): The level of significance of a test is the probability that the test statistic lies in the critical region or rejection region when \( H_0 \) is true.

    7. Two-Sided Alternative: An alternative hypothesis involving all possible values of a population parameter other than the value specified by a simple null hypothesis.

    8. One-Sided Alternative: An alternative hypothesis involving all possible values of a population parameter on either one side or the other of (that is, either greater than or less than) the value specified by a simple null hypothesis.

    9.7 Summary of Concepts of a Hypothesis Test

    • Judgments in the form of hypothesis tests involve an a priori assumption about the value of an unknown parameter.

    • If the sample information provides evidence against the null hypothesis we reject it; otherwise we do not reject it.

    • The evidence from a sample is summarized in the form of a test statistic, which is used in arriving at a verdict concerning the hypothesis.

    9.8 Steps in Conducting an Hypothesis Test

    1. Formulate the null and alternative hypotheses (\( H_0 \), \( H_1 \)).

    2. Choose the level of significance \( \alpha \) and hence define the critical value (i.e. divide the region into rejection and non-rejection regions).

    3. Calculate the test statistic using sample information.

    4. If the calculated statistic falls within the rejection region, reject the null hypothesis; if it is in the non-rejection region, do not reject the null hypothesis.

    9.9 A Generic Example for a Two-Sided Alternative:

    [Transparency 9.4]

    1. Formulate Null and Alternative Hypotheses

    \( H_0: \mu = \mu_0 \)

    \( H_1: \mu \neq \mu_0 \)

    2. Choose level of Significance \( \alpha \) (say 5% level)

    3. Calculate test statistic (with data)

    \[ t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \]

    We are testing on the basis of sample information whether \( \mu = \mu_0 \) (where \( \mu_0 \) is a specified (known) number). In this case the test statistic is a t-statistic. If the null hypothesis is true (\( \mu = \mu_0 \)) then this statistic is distributed as \( t_{n-1} \).

    4. Critical Region for decision rule: Reject \( H_0 \) if

    • \( |t| > t_{\alpha/2,\,n-1} \)

    Do not Reject \( H_0 \) if:

    \( -t_{\alpha/2,\,n-1} \le t \le t_{\alpha/2,\,n-1} \)

    • Notice carefully that there are really two critical values for two-sided tests:

    \( \pm t_{\alpha/2,\,n-1} \)

    9.10 One-Sided Alternatives

    The hypothesis test above was for a two-sided alternative:

    \( H_1: \mu \neq \mu_0 \)

    • Suppose in the previous example it was thought that sales probably had fallen from 5.5 (everyone was confident that they could rule out a rise in sales).

    • We might wish to incorporate this belief right into the hypothesis test.

    • This is accomplished by a one-sided alternative.

    • Redo the example for this case:

    1. Formulate the hypotheses (for some specified value of \( \mu_0 \)):

    \( H_0: \mu \ge 5.5 \)

    \( H_1: \mu < 5.5 \)

    Notice that the alternative is narrowed in the direction where we think sales are (in the event that the null is false).

    2. The level of significance is still \( \alpha = .01 \).

    3. The test statistic is unchanged.

    4. The decision rule is based on the critical value \( t_{\alpha,\,n-1} \) (not \( t_{\alpha/2,\,n-1} \)), so that we reject the null hypothesis if

    • \( t < -t_{\alpha,\,n-1} \)

    • Otherwise we retain or do not reject \( H_0 \).

    • The calculated value is unchanged at -2.434 but \( -t_{.01,\,29} = -2.462 \), which means that we barely retain the null hypothesis for the one-sided alternative.

    9.11 Notes on One-Sided Alternatives

    • This is an example of a one-sided test, since the alternative hypothesis includes either the less than "<" or the greater than ">" condition.

    \( H_0: \mu \ge \mu_0 \)

    \( H_1: \mu < \mu_0 \)

    • We could change the nature of the critical value (and hence the rejection and non-rejection regions) by changing the hypothesis test to:

    \( H_0: \mu \le \mu_0 \)

    \( H_1: \mu > \mu_0 \)

    [Transparency 9.3]

    • In this case we would calculate the same test statistic as above but the rejection rule would be

    \( t > t_{\alpha,\,n-1} \)

    • The inequality in \( H_1 \) is a useful memory aid to decide whether you want to use the positive critical value (">") or the negative critical value ("<").

    • Of course, for the two-sided alternative ("≠") you use ± the critical value.
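The memory aid can be written down as a single decision function. This is a hedged Python sketch rather than anything from the notes: the function name `reject_null` and the labels "two-sided", "less" and "greater" are illustrative choices, and the caller is assumed to supply the appropriate positive critical value from a t table.

```python
def reject_null(t_stat, t_crit, alternative):
    """Apply the rejection rule for a t test.

    t_crit must be the positive critical value appropriate to the test:
    t_{alpha/2, n-1} for "two-sided", t_{alpha, n-1} for "less" or "greater".
    """
    if alternative == "two-sided":   # H1: mu != mu_0, use +/- critical value
        return abs(t_stat) > t_crit
    if alternative == "less":        # H1: mu < mu_0, use negative critical value
        return t_stat < -t_crit
    if alternative == "greater":     # H1: mu > mu_0, use positive critical value
        return t_stat > t_crit
    raise ValueError(f"unknown alternative: {alternative!r}")


# Sales example: same statistic, two different alternatives.
t_stat = -2.434
two_sided = reject_null(t_stat, 2.756, "two-sided")  # t_{.01/2, 29}
one_sided = reject_null(t_stat, 2.462, "less")       # t_{.01, 29}
print(two_sided, one_sided)
```

For the sales example both calls return `False`, although the one-sided test comes much closer to rejection because its critical value is smaller in absolute value.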

    9.11.1 Reason for a One-Sided Alternative

    • We note that since \( t_{\alpha,\,n-1} < t_{\alpha/2,\,n-1} \), for the same calculated value it is possible to retain the null hypothesis for the two-sided alternative while rejecting it for the one-sided alternative.

    • Whether we want to reject the null or not depends on whether it is true or not.

    • We never know whether the null hypothesis is true (after all, why would we test it if we knew?).

    9.12 Type I & Type II Errors

    • It is very easy to lose sight of the fact that we DO NOT KNOW whether the null is true or not (if we did, why would we need to do any test?).

    There are 2 kinds of errors we can make:

    1. Type I Error: Rejecting \( H_0 \) when \( H_0 \) is true

    2. Type II Error: Not rejecting \( H_0 \) when \( H_0 \) is false

                          H0 is true             H0 is false
    Do not reject H0      correct (1 − α)        Type II error (β)
    Reject H0             Type I error (α)       correct (1 − β) (called power)

    9.13 Probability of Type I and II Errors

    \( \alpha = P(\text{Type I Error}) = P(\text{we reject } H_0 \mid H_0 \text{ is true}) = P(\text{test statistic lies in the rejection region} \mid H_0 \text{ is true}) \)

    \( \beta = P(\text{Type II Error}) = P(\text{we do not reject } H_0 \mid H_0 \text{ is false}) \)

    \( \text{Power} = 1 - \beta \)

    • Power measures the probability of correctly rejecting \( H_0 \) when \( H_0 \) is false.
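The definition of the Type I error probability can be illustrated by simulation. The sketch below is not from the notes and rests on stated assumptions: data are drawn from a normal population in which the null is true (mean 5.5, standard deviation 0.9, n = 30, borrowed from the sales example), the two-sided 1% rule with critical value 2.756 is applied, and the seed and number of replications are arbitrary.

```python
import math
import random


def t_statistic(sample, mu_0):
    """One-sample t statistic: (mean - mu_0) / (s / sqrt(n))."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return (mean - mu_0) / math.sqrt(var / n)


random.seed(12345)            # arbitrary seed, for reproducibility only
mu_0, sigma, n = 5.5, 0.9, 30
t_crit = 2.756                # t_{.01/2, 29}, copied from the text
n_reps = 20000

# Draw many samples for which H0 is TRUE and count how often we reject anyway.
rejections = 0
for _ in range(n_reps):
    sample = [random.gauss(mu_0, sigma) for _ in range(n)]
    if abs(t_statistic(sample, mu_0)) > t_crit:
        rejections += 1

type_i_rate = rejections / n_reps
print(f"estimated P(Type I error) = {type_i_rate:.4f}")
```

The estimated rejection rate should come out close to the chosen level of significance, .01, which is what the definition of the Type I error probability says.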

    9.13.1 Example of Probability of Type I and II

    1. What is P(Type I Error)? Answer: \( \alpha \), which for the above example is .01.

    2. What is P(Type II Error) and Power?

    • To answer this question we must consider values for \( \mu \) that are in \( H_1 \) and calculate the probabilities of retaining \( H_0 \) under various values in the alternative.

    • The null can be false in MANY ways under the alternative.

    9.14 Power Calculation

    • Let us calculate the probability of retaining \( H_0: \mu = 5.5 \) when the true \( \mu = 5.1 \) (which also happens to be the sample mean, but other values could also be chosen).

    • Suppose Truth: \( \mu = 5.1 \)

    • Test Null at: \( \mu = 5.5 \)

    • What is our Decision Rule? Rejection rule: reject \( H_0 \) if \( |t| > 2.756 \)

    9.14.1 Calculating P(Type II Error) and Power

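    1. We assume that the variance is unchanged under \( H_0 \) and \( H_1 \) and use the estimate \( s/\sqrt{n} \).

    2. We want to calculate what the critical values are in terms of \( \bar{X} \).

    3. We know that we retain \( H_0: \mu = 5.5 \) whenever our calculated t-statistic satisfies \( |t| < 2.756 \):

    \[ P\left\{ -2.756 \le \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \le 2.756 \right\} = .99 \]

    which after some manipulation can be written

    \[ P\left\{ \mu_0 - 2.756 \times \frac{s}{\sqrt{n}} \le \bar{X} \le \mu_0 + 2.756 \times \frac{s}{\sqrt{n}} \right\} = .99 \]

    Substituting \( \mu_0 = 5.5 \) and the estimated standard deviation gives:

    \[ P\left\{ 5.5 - 2.756 \times \frac{.9}{\sqrt{30}} \le \bar{X} \le 5.5 + 2.756 \times \frac{.9}{\sqrt{30}} \right\} = .99 \]

    4. This leads to 99% critical values in terms of the sample mean \( \bar{X} \):

    (5.047, 5.953) (9.1)

    This gives all the values for the sample mean that would not be rejected for \( H_0: \mu = 5.5 \) at the \( \alpha = .01 \) level of significance. Note that our sample mean \( \bar{X} = 5.1 \) is in the interval and hence we did not reject the null hypothesis. We want to find out the probability of being in the interval (5.047, 5.953) for various values in the alternative; we start with \( \mu = 5.1 \).

    5. Calculate the probability of a Type II Error

    \[ \begin{aligned} \beta &= P\{\text{Type II Error}\} \\ &= P\{\text{do not reject } H_0 \mid H_0 \text{ is false}\} \\ &= P\{-t_{\alpha/2,\,n-1} \le t \le t_{\alpha/2,\,n-1} \mid H_0 \text{ is false}\} \\ &= P\{5.047 \le \bar{X} \le 5.953 \mid \mu = 5.1\} \\ &= P\left\{ \frac{5.047 - 5.1}{.9/\sqrt{30}} \le \frac{\bar{X} - \mu}{s/\sqrt{n}} \le \frac{5.953 - 5.1}{.9/\sqrt{30}} \right\} \\ &= P\{-.3225 \le t \le 5.1911\} \\ &\approx P\{t \le .3225\} = .626 \end{aligned} \]

    \( \text{Power} = 1 - \beta = 1 - .626 = .374 \)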
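The calculation in steps 1 to 5 can be sketched in Python as follows. This is an illustration under stated assumptions, not the notes' own code: the function name `type_ii_error` is hypothetical, and the standard normal distribution is used as an approximation to the t-distribution with 29 degrees of freedom, so results may differ slightly in the third decimal from values read off t tables.

```python
import math
from statistics import NormalDist

mu_0, s, n = 5.5, 0.9, 30
t_crit = 2.756                      # t_{.01/2, 29}, copied from the text
std_err = s / math.sqrt(n)

# Non-rejection interval for the sample mean, centred on mu_0.
lower = mu_0 - t_crit * std_err
upper = mu_0 + t_crit * std_err


def type_ii_error(mu_true):
    """P(lower <= X-bar <= upper) when the true mean is mu_true.

    Assumption: the standardized sample mean is treated as standard normal.
    """
    z = NormalDist()
    return z.cdf((upper - mu_true) / std_err) - z.cdf((lower - mu_true) / std_err)


beta = type_ii_error(5.1)
power = 1 - beta
print(f"non-rejection interval: ({lower:.3f}, {upper:.3f})")
print(f"beta = {beta:.3f}, power = {power:.3f}")
```

Calling `type_ii_error` over a grid of values of the true mean gives the points needed to trace out a power curve.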

    9.15 Notes on Power

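    • The probability of a Type II error when testing \( H_0: \mu = 5.5 \) when the true value is \( \mu = 5.1 \) is therefore .626.

    • Note that the interval (5.047, 5.953) is not the same as the confidence interval: the former is centred on \( \mu_0 = 5.5 \), the latter on \( \bar{X} = 5.1 \).

    • The confidence interval for the population mean is

    \[ \bar{X} \pm t_{.01/2,\,29} \times \frac{s}{\sqrt{n}} \]

    which gives (4.65, 5.55).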

    • We can repeat this calculation for all possible alternatives under \( H_1: \mu \neq 5.5 \).

    • For example, let us do another calculation on the other side of the null, say \( \mu = 5.7 \):

    \[ \begin{aligned} \beta &= P(5.047 \le \bar{X} \le 5.953 \mid \mu = 5.7) \\ &= P\left( \frac{5.047 - 5.7}{.9/\sqrt{30}} \le \frac{\bar{X} - \mu}{s/\sqrt{n}} \le \frac{5.953 - 5.7}{.9/\sqrt{30}} \right) \\ &= P(-3.974 \le t \le 1.594) = P(t \le 1.594) = .9441 \end{aligned} \]

    • Now we can calculate the Probability of Type II Error and Power for a variety of values under \( H_1 \):

    Power Calculations for Testing \( H_0: \mu = 5.5 \)

    Value under H1     Probability of Type II Error = β     Power = 1 − β
    μ = 4.4            0                                    1
    μ = 4.5            .001                                 .9999
    μ = 4.6            .003                                 .997
    μ = 4.7            .017                                 .98
    μ = 4.8            .067                                 .93
    μ = 5.0            .386                                 .614
    μ = 5.1            .626                                 .374
    μ = 5.49999        .99                                  .01
    ...
    μ = 5.7            .944                                 .056
    ...

    • Power Curve: Plot of power on the y-axis and the value of \( \mu \) on the x-axis

    9.15.1 Notes on Type I and II Error

    • We can make a Type I error only when we reject \( H_0 \) and a Type II error only when we do not.

    • We want both \( \alpha \) and \( \beta \) to be small.

    • While we would like both Type I and Type II Errors to be as small as possible, there is in fact a trade-off.

    • Suppose the null hypothesis is that those charged with crimes are innocent.

    • Then a legal test which never convicts the innocent (has \( \alpha = 0 \)) would free many who are guilty (large \( \beta \)).

    • Lowering \( \alpha \) will result in a wider non-rejection region, which makes it more likely that a false null hypothesis will be retained.

    • To see this, redo the above exercise with \( \alpha = .005 \) and .05.

    • Since our interest is usually centered on the null hypothesis, \( \alpha \) is usually chosen to be small: 10 percent or less.

    • The null and alternative are not treated symmetrically; rejecting the null does not imply that the alternative is true.

    • The alternative is not under test.

    • Do not say "we accept the alternative".

    • We have seen that the closer the true value of \( \mu \) in \( H_1 \) is to the value under \( H_0 \), the larger is the probability of Type II error and hence the lower is the power.

    9.16 Prob- or p-Values [Transparency 9.10 and 9.11]

    • It is arbitrary that the rejection/non-rejection of a test depends on the choice of \( \alpha \).

    • An alternative way to report one's results is to quote the test statistic with a p-value, or prob-value.

    • This allows the user to choose a particular \( \alpha \) and make their own decision using the reported p-value.

    • A p-value is simply the probability, computed under the null hypothesis, that a test statistic is at least as large (in absolute value) as the one actually calculated.

    • It is simply the area in the statistic's density beyond the point actually observed.

    9.16.1 Example of a p-value

    In our example of the expenditure on customer sales our test statistic was -2.43. This has a p-value of:

    \( p = P(t < -2.43) = .011 \)

    • For two-sided alternatives you will see authors report the p-value as .011 × 2 = .022 (reflecting that both large positives and negatives of the statistic are possible).

    • If the null hypothesis \( H_0: \mu = 5.5 \) (tested against \( H_1: \mu \neq 5.5 \)) is true, there is only about a 2% chance of observing a sample mean at least as far from 5.5 as 5.1.

    • p-values can be found from tables in the textbook or from the tables built into computer packages like STATA.

    9.16.2 Interpretation and Use of P-Values

    If the p-value is less than a chosen level \( \alpha \) of the test, one rejects the null hypothesis at the \( \alpha \) level.

    \( p\text{-value} < \alpha \Rightarrow \text{Reject } H_0 \) (9.2)

    \( p\text{-value} \ge \alpha \Rightarrow \text{Do not Reject } H_0 \) (9.3)

    In the above example

    \( p\text{-value} = 0.022 \)

    Therefore for \( \alpha = .05 \) we would reject \( H_0 \), but for \( \alpha = .01 \) we would retain \( H_0 \).

    • One can also report p-values and let readers decide on their own significance levels.

    • We now have all the tools necessary to do any hypothesis test.

    • In the rest of this chapter we will consider other applications.
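For readers without tables at hand, the p-value in the sales example can be approximated directly. The sketch below is an illustration only: `t_pdf` and `t_cdf` are hypothetical helper names, and the cumulative probability is obtained by Simpson's rule applied to the Student t density rather than by any statistical package, so the last digit may differ from a printed table.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x): 0.5 plus the signed integral of the density from 0 to x.

    The integral uses Simpson's rule; steps must be even.
    """
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


t_stat, df = -2.434, 29
p_one_sided = t_cdf(t_stat, df)     # P(t < -2.434)
p_two_sided = 2 * p_one_sided

# Decision rules (9.2) and (9.3) for two conventional levels.
for alpha in (0.05, 0.01):
    decision = "reject H0" if p_two_sided < alpha else "do not reject H0"
    print(f"alpha = {alpha}: {decision}")
```

The one-sided probability comes out at roughly .011 and the two-sided p-value at roughly .02, so the null is rejected at the 5% level but retained at the 1% level, in line with the text.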

    9.17 Testing Proportions

    • We can test hypotheses about the number of successes \( X \) in \( n \) trials, or about the proportion of successes \( p \).

    • In the binomial distribution we know the standard deviation of \( X \) and \( \hat{p} \) under the null, which we can use together with the standard normal tables (in fact better approximation results can often be obtained by using t-distributions) for one- or two-sided tests.

    \( H_0: p = p_0 \)

    \( H_1: p \neq p_0 \)

    • Form the test statistic (either a Z statistic or t, depending on the degrees of freedom):

    \[ z \;(\text{or } t) = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} \]

    9.17.1 Example of Test for Population Proportion

    Let us return to the mini survey conducted by Employment and Immigration Canada. They survey \( n = 9 \) unemployed persons. We have seen how they tested an hypothesis about the mean duration of unemployment. Now suppose they want to learn the proportion of searchers who receive a job offer within the first six weeks of unemployment. Of the 9 people surveyed, 2 receive such offers. Suppose that:

    \( H_0: p \ge .5 \)

    \( H_1: p < .5 \)

    Let \( \alpha = .05 \).

    • Then the rejection rule is: reject \( H_0 \) if \( t < -1.860 \) (\( t_{.05,\,8} = 1.860 \)).

    • The test statistic is:

    \[ t = \frac{.222 - .5}{\sqrt{.5(.5)/9}} = -1.667 \]

    • The null hypothesis is not rejected at the 5 percent level.

    • You can see in this example that the null is not rejected, even though there seems to be a large gap between the agency's hypothesis and the sample proportion.

    • The null is not easily rejected because the sample is so small, so that sampling variability is large.

    • The p-value for this problem is \( P(t < -1.667) = .067 \), again showing that the hypothesis would not be rejected at the 5 percent level.
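The arithmetic of this example is easy to reproduce. In the following Python sketch (illustrative variable names; the critical value 1.860 is copied from the text), note that the standard error is evaluated at the hypothesized proportion, not at the sample proportion.

```python
import math

successes, n = 2, 9
p_0 = 0.5          # proportion under the null hypothesis
t_crit = 1.860     # t_{.05, 8}, copied from the text

p_hat = successes / n

# Standard error under the null: uses p_0, since H0 is maintained as true.
std_err_null = math.sqrt(p_0 * (1 - p_0) / n)
t_stat = (p_hat - p_0) / std_err_null

# One-sided rule for H1: p < .5, reject when t is below the negative critical value.
reject_h0 = t_stat < -t_crit

print(f"p_hat = {p_hat:.3f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Re-running the sketch with the same sample proportion but a larger `n` shows how quickly the statistic moves into the rejection region as sampling variability falls.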

  • Chapter 10

    Hypothesis Testing: Additional Topics

    10.1 Tests of Differences of Population Means

    • Suppose that we have two samples with \( n_1 \) observations, a mean \( \bar{X}_1 \), and sample standard deviation \( s_1 \) in the first, and \( n_2 \) observations, a mean \( \bar{X}_2 \), and sample standard deviation \( s_2 \) in the second.

    • Data with this property are most likely to arise from experiments in which two treatments are applied. The testing problem is:

    \( H_0: \mu_1 = \mu_2 \)

    which can be written as:

    \( H_0: \mu_1 - \mu_2 = 0 \)

    • A general hypothesis test of differences is:

    \( H_0: \mu_1 - \mu_2 = D_0 \)

    • where \( D_0 \) is the hypothesized difference (usually \( D_0 = 0 \)).

    10.2 Testing Differences when Variances are the same

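    • We assume that the two populations have the same variance (see Chapters 7 and 8),

    \( \sigma_1^2 = \sigma_2^2 = \sigma^2 \)

    • The population standard deviation of the difference (assuming independence) is

    \[ \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sigma \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]

    • A pooled estimate of \( \sigma^2 \) is

    \[ s_p^2 = \frac{\sum_{i=1}^{n_1} (X_{1i} - \bar{X}_1)^2 + \sum_{i=1}^{n_2} (X_{2i} - \bar{X}_2)^2}{n_1 + n_2 - 2} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \]

    • Then we multiply the root of this by \( \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \).

    • This gives us the estimated standard deviation of the difference in sample means:

    \[ s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_p^2 \times \left[ \frac{1}{n_1} + \frac{1}{n_2} \right]} \]

    • The test statistic is:

    \[ t = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}} \]

    • As before we have three different rejection rules (all use the same test statistic) depending on the alternative:

    \( H_0: \mu_1 = \mu_2 \)

    \( H_1: \mu_1 \neq \mu_2 \)

    • Rejection Rule: if \( |t| > t_{\alpha/2,\,n_1+n_2-2} \) then reject \( H_0 \)

    \( H_0: \mu_1 \le \mu_2 \)

    \( H_1: \mu_1 > \mu_2 \)

    • Rejection Rule: if \( t > t_{\alpha,\,n_1+n_2-2} \) then reject \( H_0 \)

    \( H_0: \mu_1 \ge \mu_2 \)

    \( H_1: \mu_1 < \mu_2 \)

    • Rejection Rule: if \( t < -t_{\alpha,\,n_1+n_2-2} \) then reject \( H_0 \)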

    10.2.1 Example of Testing Differences in Population Means

    A market research firm wishes to know if the mean number of hours of TV watching per week is the same for teenage boys as for teenage girls. The following data were obtained:

    Boys:  \( n_1 = 20 \), \( \bar{X}_1 = 24.5 \), \( s_1^2 = 64 \)
    Girls: \( n_2 = 12 \), \( \bar{X}_2 = 28.7 \), \( s_2^2 = 71 \)

    Carry out a hypothesis test that boys and girls watch the same number of hours of TV at the 5% level of significance.

    \( H_0: \mu_1 = \mu_2 \)

    \( H_1: \mu_1 \neq \mu_2 \)

    • Why two-sided? We do not have any prior reason to believe that one group watches more than the other.

    \[ t = \frac{24.5 - 28.7 - 0}{\sqrt{\left[ \frac{66.57}{20} + \frac{66.57}{12} \right]}} = -1.41 \]

    since

    \[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(19)64 + (11)71}{30} = 66.57 \]

    • As \( t = -1.41 \) lies in the non-rejection region (\( t_{.05/2,\,30} = \pm 2.042 \)) we do not reject the hypothesis at the 5% level of significance that boys and girls watch the same number of TV hours.

    • Note that \( P(t < -1.41) = .084 \) and therefore the p-value = 2 × .084 = .168.
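The pooled-variance calculation can be wrapped in a small function. This Python sketch is illustrative only: the function name `pooled_t` and its argument names are not from the notes, and the critical value 2.042 is copied from the text.

```python
import math


def pooled_t(n1, mean1, var1, n2, mean2, var2, d_0=0.0):
    """Two-sample t statistic assuming equal population variances.

    var1 and var2 are sample variances; d_0 is the hypothesized difference.
    Returns (t statistic, pooled variance, degrees of freedom).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_stat = (mean1 - mean2 - d_0) / std_err
    return t_stat, pooled_var, df


# TV-watching example: boys (sample 1) versus girls (sample 2).
t_stat, pooled_var, df = pooled_t(20, 24.5, 64, 12, 28.7, 71)
reject_h0 = abs(t_stat) > 2.042     # t_{.05/2, 30}, copied from the text

print(f"pooled variance = {pooled_var:.2f}, t = {t_stat:.2f}, df = {df}")
print("reject H0" if reject_h0 else "do not reject H0")
```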

    10.3 Testing Differences when Variances are Different

    • Tests are conducted exactly the same way as before except that the formula for \( s_{\bar{X}_1 - \bar{X}_2} \) and the degrees of freedom are different:

    \[ s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]

    and the degrees of freedom are given by a crazy formula that can be found in the book:

    \[ \nu = \frac{\left[ \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right]^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} \]

    • For convenience I have always used \( \nu = n_1 + n_2 - 2 \) and hoped I was not too far off.

    • Redo the above TV example by not assuming the variances are the same. Is there any difference to your conclusions?
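As a starting point for the exercise, here is a hedged Python sketch of the unequal-variance calculation. It assumes the degrees-of-freedom formula referred to in the notes is the usual Satterthwaite approximation; the function name `unequal_var_t` is hypothetical.

```python
import math


def unequal_var_t(n1, mean1, var1, n2, mean2, var2, d_0=0.0):
    """Two-sample t statistic without assuming equal variances.

    Returns (t statistic, approximate degrees of freedom), where the
    degrees of freedom use the Satterthwaite approximation (an assumption
    about which formula the notes intend).
    """
    v1, v2 = var1 / n1, var2 / n2          # variance of each sample mean
    std_err = math.sqrt(v1 + v2)
    t_stat = (mean1 - mean2 - d_0) / std_err
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df


# TV-watching example again, now without pooling the variances.
t_stat, df = unequal_var_t(20, 24.5, 64, 12, 28.7, 71)
print(f"t = {t_stat:.3f}, approximate df = {df:.1f}")
```

With these data the statistic is about -1.39 on roughly 22 degrees of freedom. The two-sided 5% critical value for 22 degrees of freedom is about 2.07, so the conclusion does not change.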

    10.4 Testing Differences of Population Proportions

    • Recall that the variance of a sample proportion is estimated by \( \hat{p}(1 - \hat{p})/n \).

    • Then if we have two independent samples, for hypotheses about the difference between the two population proportions (and hypotheses are always about populations)

    \( H_0: p_1 - p_2 = D_0 \)

    \( H_1: p_1 - p_2 \neq D_0 \)

    • the test statistic is:

    \[ z = \frac{(\hat{p}_1 - \hat{p}_2) - D_0}{\sqrt{\hat{p}_1(1 - \hat{p}_1)/n_1 + \hat{p}_2(1 - \hat{p}_2)/n_2}} \]

    • Often the null hypothesis will be that the two proportions are equal:

    \( H_0: p_1 = p_2 \)

    • In the equality of proportion case the formula simplifies to:

    \[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_0(1 - \hat{p}_0)(1/n_1 + 1/n_2)}} \]

    where

    \[ \hat{p}_0 = \frac{X_1 + X_2}{n_1 + n_2} \]

    10.4.1 Example of Testing Differences in Population Proportions

    In a sample of 400 products produced by Machine 1, 23 were defective and in a sample of 400 products produced by Machine 2, 17 were defective.

    Test:

    \( H_0: p_1 - p_2 = 0 \)

    against

    \( H_1: p_1 - p_2 \neq 0 \)

    using a 5% level of significance.

    Answer

    From the question we know

    \( \hat{p}_1 = \frac{23}{400} \), \( \hat{p}_2 = \frac{17}{400} \)

    The pooled estimate is:

    \[ \hat{p}_0 = \frac{X_1 + X_2}{n_1 + n_2} = \frac{23 + 17}{400 + 400} = .05 \]

    Therefore

    \[ z = \frac{\frac{23}{400} - \frac{17}{400}}{\sqrt{(.05)(.95)(1/400 + 1/400)}} = \frac{.015}{.01541} = 0.9733 \]

    • Since \( z \) is in the non-rejection region (-1.96, 1.96) we do not reject \( H_0 \) at the 5% level of significance.

    • Note that since \( P(Z > .9733) = .165 \), the p-value = .165 × 2 = 0.33.
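The pooled two-proportion statistic can be evaluated directly from the defect counts. The Python sketch below is an illustration with a hypothetical function name; it uses the standard normal distribution from the Python standard library for the p-value and the 1.96 critical value quoted in the text.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2 using the pooled proportion.

    Returns (z statistic, pooled proportion).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_err, pooled


# Machine 1: 23 defective out of 400; Machine 2: 17 defective out of 400.
z_stat, pooled = two_proportion_z(23, 400, 17, 400)

# Two-sided p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
reject_h0 = abs(z_stat) > 1.96

print(f"pooled = {pooled:.3f}, z = {z_stat:.4f}, p-value = {p_value:.3f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

Evaluated with these counts, the statistic is a little under 1 and falls well inside the 5% non-rejection region.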

    10.5 Testing the Hypothesis \( \mu_1 = \mu_2 \) with Paired Data

    • In testing whether the difference between the two means was different from 0 or \( D_0 \), we assumed that the two samples were independent.

    • Hence the estimated variance of the difference was the sum of the two variances.

    • We could do the testing under the assumption that the variances were the same (pooled variance) or different.

    • On occasion, we may have paired data.

    • This is data that is grouped or paired so that the variation in responses between the members of any pair is less than the variation between members of different pairs.

    • We can improve the efficiency (lower the variance) of the experiment by randomizing the two treatments over the two members of each pair.

    • We restrict the randomization so that each treatment is given to one member of each pair, and obtain a separate estimate of the difference between the treatment effects for each pair.

    • The variation among the pairs is not included in our estimate of the variance.

    • Hence if this variation is large relative to the variation within pairs, the variance from paired tests will be smaller than that from a completely randomized (independent) sampling experiment.

    • This motivates the use of twins in some experiments.

    • In economics we seldom (ever?) have paired data and so we will not pursue this matter.

    • In hospital, clinical and drug testing settings, paired data often arise.