Introduction to Hypothesis Testing

Post on 05-Jan-2016

Transcript of Introduction to Hypothesis Testing

1

Introduction to Hypothesis Testing

Chapter 11

2

Introduction

• The purpose of hypothesis testing is to determine whether there is enough statistical evidence supporting a certain belief about a parameter.

• Examples

– Is there statistical evidence in a random sample of potential customers that supports the hypothesis that more than p% of all potential customers will purchase a new product?

– Is the hypothesis that a certain drug is effective supported by the level of improvement in patients’ conditions after treatment with the drug, compared with that of another group of patients who were given a placebo?

3

• Two hypotheses are defined.

H0: The null hypothesis. Under this hypothesis we specify our current belief about the parameter we test (μ = 170, p = .4, etc.).

H1: The alternative hypothesis. Under this hypothesis we specify a range of values for the parameter tested (μ > 170, p ≠ .4, etc.), effected by some action taken. This is the hypothesis we try to prove!

11.1 Concepts of Hypothesis Testing

4

• The two hypotheses are stated, and a test is run to determine whether a sample statistic supports the rejection of H0 in favor of H1.

Concepts of Hypothesis Testing

H0: μ = 170

H1: μ > 170

5

The Concept of Hypothesis Testing

Let’s assume H0 is true: μ = 170.

A sample is drawn. Assume the sample mean x̄ = 180.

If x̄ = 180, we have little incentive to believe μ > 170, because x̄ and μ are relatively close.

[Figure: μ = 170 and x̄ = 180 marked on the x̄ axis.]

6

The Concept of Hypothesis Testing

Let’s assume H0 is true: μ = 170.

A sample is drawn. Now assume the sample mean x̄ = 250.

If x̄ = 250, we have much more incentive to believe μ > 170, because x̄ falls far above μ.

[Figure: μ = 170 and x̄ = 250 marked on the x̄ axis.]

The question is: how far is far? Is 250 sufficiently larger than 170 for us to believe that μ > 170?

7

The Concept of Hypothesis Testing

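Let’s assume H0 is true: μ = 170.

[Figure: sampling distribution of x̄ centered at μ = 170, with the tail area beyond x̄ = 250 marked.] This is the probability that x̄ ≥ 250 when μ = 170.

You may want to think about it as follows. If μ were greater than 170, the sampling distribution would be centered to the right of 170, and the corresponding tail area is the probability that x̄ ≥ 250 when μ > 170.

As you can see, it becomes more likely that x̄ ≥ 250 when μ > 170.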

8

The Concept of Hypothesis Testing

• We’ll look next at the probability that x̄ ≥ 250 as a tool to help decide whether we should reject H0.

• This idea will be further discussed (with a somewhat more computational flavor) as Example 1 is presented next.

• Pay attention!

9

11.2 Testing the Population Mean when the Population Standard Deviation is Known

• Example 1: Department store new billing system

– A new billing system for a department store will be cost-effective only if the mean monthly account is more than $170.

– A sample of 400 accounts has a mean of $178.

– If the accounts are approximately normally distributed with σ = $65, can we conclude that the new system will be cost-effective? (Can we conclude from the sample result that the population mean of the accounts is greater than $170?)

10

• Example 1 - Solution

– The population of interest is the credit accounts at the store.

– We want to show that the mean account for all customers is greater than $170. This is what you want to prove:

H1: μ > 170

– The null hypothesis must specify the values of the parameter not included in H1:

H0: μ ≤ 170

Testing the Population Mean (σ is Known)

11

Testing the Population Mean (σ is Known)

• To better understand the hypothesis testing concept, let us ask the following question:

– If H0 is true (μ = 170), how likely is it that a sample of 400 accounts has a sample mean at least as large as 178?

– Answer: By the central limit theorem,

$$P(\bar{x} \ge 178) = P\left(Z \ge \frac{178 - 170}{65/\sqrt{400}}\right) = 0.0069$$

– To illustrate: by sheer chance, out of 10,000 samples of 400 accounts each, only 69 samples will have a sample mean of 178 or more, if indeed μ = 170.

– It seems there must be another reason (rather than just “chance”) why the event x̄ ≥ 178 has occurred.

– Most likely μ > 170, which explains better why x̄ ≥ 178. That is, H0 should be rejected in favor of H1.
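The tail probability quoted on this slide can be checked numerically. The following is a minimal sketch, assuming Python 3.8+ and its standard `statistics.NormalDist`; the variable names are illustrative and not part of the original slides.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 170.0    # population mean assumed under H0
sigma = 65.0   # known population standard deviation
n = 400        # sample size
x_bar = 178.0  # observed sample mean

# Standard error of the sample mean and the standardized value of x_bar
std_error = sigma / sqrt(n)
z = (x_bar - mu0) / std_error

# Upper-tail probability P(Z >= z) under the standard normal distribution
tail_prob = 1.0 - NormalDist().cdf(z)

print(f"z = {z:.4f}, P(x_bar >= 178 when mu = 170) = {tail_prob:.4f}")
```

Multiplying `tail_prob` by 10,000 gives roughly the 69 samples mentioned in the illustration.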

12

Types of Errors

• When testing the hypotheses, two types of errors may occur when deciding whether to reject H0 based on the sample result.

– Type I error: Reject H0 when it is true.

– Type II error: Do not reject H0 when it is false.

13

Type I and Type II Errors in Example 1

• Example 1 - continued

– Type I error: Believe that μ > 170 when the real value of μ is 170 (reject H0 in favor of H1 when H0 is true).

– Type II error: Believe that μ ≤ 170 when the real value of μ is greater than 170 (do not reject H0 when it is false).

14

Controlling the probability of conducting a type I error

• Recall: H0: μ ≤ 170, H1: μ > 170. Since the alternative hypothesis has the form μ > μ0, H0 is rejected if x̄ is sufficiently large!

[Figure: sampling distribution of x̄ under H0, centered at μ = 170, with the critical value marked in the right tail.]

Our job is to determine a critical value for the sample mean. H0 is rejected if the sample mean exceeds that critical value.

15

Controlling the probability of conducting a type I error

• Recall: H0: μ ≤ 170, H1: μ > 170.

Note: x̄ may exceed a critical value (leading to the rejection of H0) even though the population mean may still be 170. We don’t want the probability of this event to exceed some acceptable value (α).

[Figure: sampling distribution of x̄ under H0, centered at μ = 170, with the critical value marked in the right tail.]

So how do we determine this critical value? We turn to the type I error and limit the probability that it occurs.

16

Approaches to Testing

• There are two approaches to testing whether the sample mean supports the alternative hypothesis (H1):

– The rejection region method, which is mandatory for manual testing (but can also be used when testing is supported by statistical software).

– The p-value method, which is mostly used when statistical software is available.

• Both involve an upper limit we set on the probability of committing a type I error.

17

The null hypothesis is rejected in favor of the alternative hypothesis if a test statistic falls in

the rejection region.


The Rejection Region Method

18

Example 1 – solution continued

• Recall: H0: μ ≤ 170, H1: μ > 170.

• Define a critical value x̄_L for x̄ that is just large enough to reject the null hypothesis.

• Reject the null hypothesis if x̄ ≥ x̄_L.

The Rejection Region Method of a Right Hand Tail Test

19

• Let the probability of committing a type I error be α (also called the significance level).

• Find a critical value of the sample mean that is just large enough to guarantee that the actual probability of committing a type I error does not exceed α.

Determining the Critical Value for the Rejection Region of a Right Hand Tail Test

20

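Example 1 – solution continued

Determining the Critical Value for a Right Hand Tail Test

$$P(\text{commit a type I error}) = P(\text{reject } H_0 \text{ when } H_0 \text{ is true}) = P(\bar{x} \ge \bar{x}_L \text{ when } \mu = 170) = \alpha$$

From the central limit theorem:

$$P\left(Z \ge \frac{\bar{x}_L - \mu}{\sigma/\sqrt{n}}\right) = \alpha$$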

21

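Example 1 – solution continued

Determining the Critical Value for a Right Hand Tail Test

Since $P(Z \ge z_\alpha) = \alpha$ and $P\left(Z \ge \frac{\bar{x}_L - \mu}{\sigma/\sqrt{n}}\right) = \alpha$, it follows that

$$\frac{\bar{x}_L - \mu}{\sigma/\sqrt{n}} = z_\alpha$$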

22

Determining the Critical Value for a Right Hand Tail Test

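Example 1 – solution continued

Simple algebra applied to $\frac{\bar{x}_L - \mu}{\sigma/\sqrt{n}} = z_\alpha$ gives

$$\bar{x}_L = \mu + z_\alpha \frac{\sigma}{\sqrt{n}}$$

If we select α = 0.05, then z_.05 = 1.645 and

$$\bar{x}_L = 170 + 1.645 \cdot \frac{65}{\sqrt{400}} = 175.34$$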
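The critical-value formula can be evaluated directly. This is a small sketch under stated assumptions: the helper name `right_tail_critical_value` is hypothetical, and z_α is taken from Python's `statistics.NormalDist` rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist

def right_tail_critical_value(mu0, sigma, n, alpha):
    """Critical sample mean x_L for a right-tail z-test (hypothetical helper)."""
    # z_alpha is the point with upper-tail area alpha: P(Z >= z_alpha) = alpha
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return mu0 + z_alpha * sigma / sqrt(n)

x_L = right_tail_critical_value(mu0=170.0, sigma=65.0, n=400, alpha=0.05)
print(f"critical value x_L = {x_L:.3f}")
```

With an unrounded z_α the result is about 175.35; the slide's 175.34 is the same quantity computed with the table value 1.645 and truncated to two decimals.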

23

Determining the Critical Value for a Right Hand Tail Test

Reject the null hypothesis if x̄ ≥ 175.34.

Conclusion: Since the sample mean (178) is greater than the critical value of 175.34, there is sufficient evidence to infer that the mean monthly balance is greater than $170 at the 5% significance level.

24

Determining the Critical Value for a Right Hand Tail Test

Interpretation: The null hypothesis is rejected in favor of the alternative hypothesis because the sample mean falls in the rejection region. We may still be in error when rejecting the null hypothesis, since μ could be 170, but the chance that we make such a mistake is not greater than 5% (the significance level).

25

The standardized test statistic

– Instead of using the statistic x̄, we can use the standardized value z:

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

– If the hypotheses are H0: μ = μ0 and H1: μ > μ0, then the rejection region is

$$z \ge z_\alpha$$

26

• Example 1 - continued

– We redo this example using the standardized test statistic. Recall: H0: μ ≤ 170, H1: μ > 170.

– Test statistic:

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{178 - 170}{65/\sqrt{400}} = 2.46$$

– Rejection region: z > z_.05 = 1.645.

The standardized test statistic

27

• Example 1 - continued

The standardized test statistic

Reject the null hypothesis if Z ≥ 1.645.

Conclusion: Since Z = 2.46 > 1.645, reject the null hypothesis in favor of the alternative hypothesis.
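The standardized version of the test can be sketched as a short function. The name `right_tail_z_test` and its return shape are assumptions made for illustration; only the numbers come from Example 1.

```python
from math import sqrt
from statistics import NormalDist

def right_tail_z_test(x_bar, mu0, sigma, n, alpha):
    """Return (z, z_alpha, reject) for H1: mu > mu0 with known sigma (hypothetical helper)."""
    z = (x_bar - mu0) / (sigma / sqrt(n))          # standardized test statistic
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)    # boundary of the rejection region
    return z, z_alpha, z > z_alpha

z, z_alpha, reject = right_tail_z_test(x_bar=178.0, mu0=170.0, sigma=65.0, n=400, alpha=0.05)
print(f"z = {z:.2f}, z_alpha = {z_alpha:.3f}, reject H0: {reject}")
```

The decision is the same as the one reached with the unstandardized critical value, since the two rejection regions describe the same event.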

28

• Ask the question: How probable is it to obtain a sample mean at least as extreme as 178, if the population mean is 170 (H0 is true)?

The P-value Method

29

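P-value method

The probability of observing a test statistic at least as extreme as 178, given that μ = 170, is

$$P(\bar{x} \ge 178 \text{ when } \mu = 170) = P\left(z \ge \frac{178 - 170}{65/\sqrt{400}}\right) = P(z \ge 2.4615) = .0069$$

This probability is the p-value.

[Figure: sampling distribution of x̄ centered at μ_x̄ = 170, with the tail area beyond x̄ = 178 labeled as the p-value.]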

30

Interpreting the p-value

Because the probability that the sample mean will assume a value of more than 178 when μ = 170 is so small (.0069), there are reasons to believe that μ > 170.

Note how the event x̄ ≥ 178 is rare under H0, when μ_x̄ = 170, but it becomes more probable under H1, when μ_x̄ > 170.

[Figure: two sampling distributions of x̄, one under H0: μ_x̄ = 170 and one under H1: μ_x̄ > 170, with x̄ = 178 marked.]

31

We can conclude that the smaller the p-value the more statistical evidence exists to support the alternative hypothesis.


Interpreting the p-value

32

The p-value provides information about the amount of statistical evidence that supports the alternative hypothesis.

The p-value of a test is the probability of observing a test statistic at least as extreme as the one computed, given that the null hypothesis is true.

P-value – Summary

33

• Describing the p-value

– If the p-value is less than 1%, there is overwhelming evidence that supports the alternative hypothesis.

– If the p-value is between 1% and 5%, there is strong evidence that supports the alternative hypothesis.

– If the p-value is between 5% and 10%, there is weak evidence that supports the alternative hypothesis.

– If the p-value exceeds 10%, there is no evidence that supports the alternative hypothesis.

Interpreting the p-value
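The verbal scale for describing a p-value can be written as a simple lookup. This is only a sketch: the function name `describe_evidence` is hypothetical, and the handling of values that fall exactly on a boundary (1%, 5%, 10%) is an assumption, since the slide does not say which category they belong to.

```python
def describe_evidence(p_value):
    """Map a p-value to the slide's verbal description of evidence for H1."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < 0.01:
        return "overwhelming"
    if p_value < 0.05:      # assumption: exactly 1% counts as "strong"
        return "strong"
    if p_value < 0.10:      # assumption: exactly 5% counts as "weak"
        return "weak"
    return "none"           # assumption: exactly 10% counts as "none"

print(describe_evidence(0.0069))  # the p-value found in Example 1
```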

34

The p-value and the rejection region methods

– The p-value can be used when making decisions based on rejection region methods as follows:

– Compare the p-value to α. Reject the null hypothesis only if the p-value < α; otherwise, do not reject the null hypothesis.

[Figure: sampling distribution of x̄ centered at μ_x̄ = 170, with the critical value x̄_L = 175.34 (α = 0.05) and the sample mean x̄ = 178 (p-value = 0.0069) marked.]

Note: 0.0069 < 0.05!

35

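Left Hand Tail Test

H0: μ ≥ μ0

H1: μ < μ0

[Figure: sampling distribution of x̄ with a critical value in the left tail; reject H0 if x̄ falls to the left of the critical value.]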

36

An Example for a Left Hand Tail Test

• The SSA envelope plan example

– The chief financial officer of FedEx believes that including a stamped self-addressed (SSA) envelope in the monthly invoice sent to customers will decrease the amount of time it takes for customers to pay their monthly bills.

– Currently, customers return their payments in 22 days on average, with a standard deviation of 6 days.

37

• The SSA envelope example – continued

– A random sample of 220 customers was selected and SSA envelopes were included with their invoice packs.

– The time it took customers to pay their bills was recorded (see SSA).

– Can the CFO conclude that the plan will be successful at the 10% significance level?

An Example for a Left Hand Tail Test

38

• The SSA envelope example – Solution

– The parameter tested is the population mean of the payment time (μ).

– Since the CFO wants to prove that the plan will be successful, we test whether H1: μ < 22.

– Accordingly, the null hypothesis is H0: μ ≥ 22.

An Example for a Left Hand Tail Test

39

• The SSA envelope example – Solution continued

– The rejection region: It makes sense to believe that μ < 22 if the sample mean is sufficiently smaller than 22.

– Thus, reject the null hypothesis if x̄ ≤ x̄_L.

An Example for a Left Hand Tail Test

[Figure: sampling distribution of x̄ centered at 22, with the rejection region to the left of the critical value x̄_L.]

40

The Standardized Rejection Region for a Left Hand Tail Test

• The standardized rejection region is:

$$z \le -z_\alpha$$

• Note that α is small (certainly less than 50%), so the critical Z value must be negative.

[Figure: standard normal curve centered at 0, with the rejection region to the left of −z_α.]

41

• The SSA envelope example – Solution continued

• The standardized approach: From the data we find that the sample mean is x̄ = 21.44, so

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{21.44 - 22}{6/\sqrt{220}} = -1.384$$

An Example for a Left Hand Tail Test

With α = .10, z_α = z_.10 = 1.285, so −z_.10 = −1.285.

Conclusion: Since −1.384 < −1.285, reject the null hypothesis.

42

• The SSA envelope example – Solution continued

The p-value approach for a Left Hand Tail Test

The p-value = P(Z < −1.384) = .0831 and α = 0.1. Since .0831 < .1 (p-value < α), reject the null hypothesis.

[Figure: standard normal curve with the p-value area to the left of −1.384 and the critical value −1.285 marked.]
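Both left-tail approaches for the SSA example can be reproduced in a few lines. This is a sketch using Python's `statistics.NormalDist`; variable names are illustrative. Note that the library gives a critical value of about −1.2816, slightly different from the table value −1.285 used on the slides, which does not change the decision here.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 22.0, 6.0, 220   # hypothesized mean, known sigma, sample size
x_bar, alpha = 21.44, 0.10       # observed sample mean, significance level

std_normal = NormalDist()
z = (x_bar - mu0) / (sigma / sqrt(n))

# Left-tail test: the rejection region is z <= -z_alpha
critical_z = std_normal.inv_cdf(alpha)   # negative, equals -z_alpha
reject_by_region = z < critical_z

# p-value approach: the area to the left of the observed z
p_value = std_normal.cdf(z)
reject_by_p_value = p_value < alpha

print(f"z = {z:.3f}, critical z = {critical_z:.3f}, p-value = {p_value:.4f}")
print(f"reject H0: {reject_by_region} (rejection region), {reject_by_p_value} (p-value)")
```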

43

An Example for a Two Tail Test

H0: μ = μ0

H1: μ ≠ μ0

[Figure: sampling distribution of x̄ with a critical value in each tail; reject H0 if x̄ falls below the lower critical value or above the upper critical value.]

44

• Example 2

– AT&T has been challenged by competitors whose rates arguably resulted in lower bills.

– A statistician believes the monthly mean and standard deviation of the long-distance bills for all AT&T residential customers are $17.85 and $3.87, respectively.

An Example for a Two Tail Test

45

• Example 2 - continued

– A random sample of 25 customers is selected and the customers’ bills are recalculated using a leading competitor’s rates.

– Assuming the standard deviation is indeed $3.87, can we infer that there is a difference between AT&T’s bills and the competitor’s bills (on average)?

An Example for a Two Tail Test

46

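An Example for a Two Tail Test: The Rejection Region approach

• Solution

– Is the mean different from 17.85? (see ATT)

H0: μ = 17.85

H1: μ ≠ 17.85

– Define a two tail rejection region of the form x̄ ≤ x̄_L1 or x̄ ≥ x̄_L2.

[Figure: sampling distribution of x̄ centered at 17.85, with the critical values x̄_L1 and x̄_L2 marked in the two tails.]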

47

An Example for a Two Tail Test: The Rejection Region approach

Solution - continued

Even under H0 (μ = 17.85), x̄ can fall far above or far below 17.85, in which case we erroneously state that μ ≠ 17.85.

We do not want this erroneous rejection of H0 to occur too frequently, say not more than α = 5% of the time. Splitting α equally between the two tails:

$$P(\bar{x} \le \bar{x}_{L1}) = \frac{\alpha}{2} = 0.025, \qquad P(\bar{x} \ge \bar{x}_{L2}) = \frac{\alpha}{2} = 0.025$$

48

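An Example for a Two Tail Test: The Rejection Region approach

Solution - continued

$$\bar{x}_{L1} = \mu_0 - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 17.85 - 1.96 \cdot \frac{3.87}{\sqrt{25}} = 16.33$$

$$\bar{x}_{L2} = \mu_0 + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 17.85 + 1.96 \cdot \frac{3.87}{\sqrt{25}} = 19.36$$

From the sample we have: x̄ = 19.13.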

49

An Example for a Two Tail Test: The Rejection Region approach

Solution - continued

From the sample we have: x̄ = 19.13.

Since x̄ = 19.13 falls between the two critical values (16.33 and 19.36), do not reject the null hypothesis.

50

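An Example for a Two Tail Test: Standardized approach

Solution - continued

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{19.13 - 17.85}{3.87/\sqrt{25}} = 1.65$$

The rejection region is z ≤ −z_α/2 = −1.96 or z ≥ z_α/2 = 1.96, with α/2 = 0.025 in each tail.

Since −1.96 < 1.65 < 1.96, do not reject the null hypothesis.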

51

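An Example for a Two Tail Test: P-value approach

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{19.13 - 17.85}{3.87/\sqrt{25}} = 1.65$$

The p-value = P(Z < −1.65) + P(Z > 1.65) = 2 P(Z > 1.65) > .05. The two tail areas combined form the p-value.

[Figure: standard normal curve with the critical values −z_α/2 = −1.96 and z_α/2 = 1.96 (α/2 = 0.025 in each tail) and the tail areas beyond −1.65 and 1.65 marked.]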

52

Conclusion: There is insufficient evidence to infer that there is a difference between the bills of AT&T and the competitor, at 5% significance level.
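The three two-tail approaches for Example 2 (critical sample means, standardized statistic, p-value) can be checked together. This is a minimal sketch with illustrative variable names, assuming Python's `statistics.NormalDist` for the normal quantiles.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 17.85, 3.87, 25   # hypothesized mean, known sigma, sample size
x_bar, alpha = 19.13, 0.05        # observed sample mean, significance level

std_normal = NormalDist()
std_error = sigma / sqrt(n)

# Rejection region approach: alpha is split equally between the two tails
z_half = std_normal.inv_cdf(1.0 - alpha / 2.0)
lower = mu0 - z_half * std_error
upper = mu0 + z_half * std_error

# Standardized approach
z = (x_bar - mu0) / std_error
reject = abs(z) > z_half

# p-value approach: both tail areas beyond |z| are combined
p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))

print(f"critical values: {lower:.3f}, {upper:.3f}")
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

The upper critical value comes out near 19.37 when rounded; the slide's 19.36 is the same number truncated.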

53

11.3 Calculating the Probability of a Type II Error

• To properly interpret the results of a hypothesis test, we need to

– specify an appropriate significance level or judge the p-value of a test;

– understand the relationship between Type I and Type II errors.

• How do we compute a type II error probability?

54

Calculating the Probability of a Type II Error

• To calculate the probability of a type II error we need to…

– Express the rejection region directly, in terms of the parameter hypothesized (not standardized).

– Specify the alternative value under H1:

H0: μ = μ0

H1: μ = μ1

55

Calculating the Probability of a Type II Error

• Let us revisit Example 1.

– The null hypothesis was H0: μ = 170.

– Specify the alternative value under H1: let the alternative value be μ = 180 (rather than just μ > 170), so H1: μ = 180.

[Figure: sampling distributions of x̄ under H0: μ = 170 and under H1: μ = 180.]

56

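Calculating the Probability of a Type II Error

• Let us revisit Example 1.

– Express the rejection region directly, not in standardized terms: the rejection region was x̄ ≥ 175.34 with α = .05.

[Figure: sampling distributions of x̄ under H0: μ = 170 and under H1: μ = 180, with the critical value 175.34 and the right-tail area α = .05 under H0 marked.]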

57

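Calculating the Probability of a Type II Error

– A type II error occurs when a false H0 is not rejected.

– H0 is false when μ = 180.

– H0 is not rejected when x̄ ≤ 175.34.

– So, the probability that a type II error occurs is the probability that x̄ ≤ 175.34 when μ = 180.

[Figure: sampling distributions of x̄ under H0: μ = 170 and under H1: μ = 180, with the area to the left of 175.34 under H1 marked.]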

58

Calculating the Probability of a Type II Error

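To summarize:

$$\beta = P(\bar{x} \le 175.34 \text{ when } \mu = 180) = P\left(z \le \frac{175.34 - 180}{65/\sqrt{400}}\right) = .0764$$

[Figure: sampling distribution of x̄ under the true H1: μ = 180, with the area to the left of 175.34 marked as β.]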
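The type II error probability for Example 1 can be verified with a short calculation. This sketch takes the critical value 175.34 from the slides as given; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_L = 175.34      # critical value from the rejection region (taken from the slides)
mu1 = 180.0       # specific alternative value chosen under H1
sigma, n = 65.0, 400

# beta = P(x_bar <= x_L when mu = mu1), standardized with the mean under H1
z = (x_L - mu1) / (sigma / sqrt(n))
beta = NormalDist().cdf(z)

print(f"z = {z:.3f}, beta = {beta:.4f}")
```

The result is close to .076; the slide's .0764 corresponds to rounding z to −1.43 before reading the normal table.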

59

• A hypothesis test is effectively defined by the significance level α and by the sample size n.

• The probability of a type II error can be controlled by

– changing α, and/or

– changing the sample size.

Judging the Test

60

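Effects on β of changing α

• Increasing the significance level α decreases the value of β, and vice versa.

[Figure: sampling distributions of x̄ centered at μ = 170 and μ = 180 with the critical value x̄_L; moving x̄_L changes α and β in opposite directions.]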

61

Judging the Test

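• Increasing the sample size n reduces β.

Recall:

$$z_\alpha = \frac{\bar{x}_L - \mu}{\sigma/\sqrt{n}}, \quad \text{thus} \quad \bar{x}_L = \mu + z_\alpha \frac{\sigma}{\sqrt{n}}$$

So, by increasing the sample size, σ/√n decreases, and x̄_L grows smaller.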

62

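Judging the Test

Graphical demonstration: note what happens when n increases.

$$\bar{x}_L = \mu + z_\alpha \frac{\sigma}{\sqrt{n}}$$

x̄_L moves to the left, thus β grows smaller.

[Figure: two panels, labeled small n and larger n, each showing the sampling distributions of x̄ under H0: μ = 170 and H1: μ = 180 with the critical value x̄_L.]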

63

Judging the Test

• Increasing the sample size reduces β.

• In Example 1, suppose n increases from 400 to 1000.

$$\bar{x}_L = \mu + z_\alpha\frac{\sigma}{\sqrt{n}} = 170 + 1.645 \cdot \frac{65}{\sqrt{1000}} = 173.38$$

$$\beta = P\left(Z \le \frac{173.38 - 180}{65/\sqrt{1000}}\right) = P(Z \le -3.22) \approx 0$$

• The probability of committing a type I error remains 5%, but the probability of committing a type II error drops dramatically.
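The effect of the sample size on β can be explored by recomputing the critical value for each n. The function below is a sketch; its name `type_ii_error` and signature are assumptions made for illustration, and it covers only the right-tail case with a single alternative value.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu0, mu1, sigma, n, alpha):
    """Return (x_L, beta) for a right-tail z-test of mu0 against a specific mu1 > mu0."""
    std_error = sigma / sqrt(n)
    # Critical value keeps the type I error probability at alpha for every n
    x_L = mu0 + NormalDist().inv_cdf(1.0 - alpha) * std_error
    # beta = P(x_bar <= x_L when mu = mu1)
    beta = NormalDist().cdf((x_L - mu1) / std_error)
    return x_L, beta

for n in (400, 1000):
    x_L, beta = type_ii_error(mu0=170.0, mu1=180.0, sigma=65.0, n=n, alpha=0.05)
    print(f"n = {n}: x_L = {x_L:.2f}, beta = {beta:.4f}, power = {1.0 - beta:.4f}")
```

Holding α fixed and passing a larger `n` pulls the critical value toward 170 and narrows both sampling distributions, which is why β shrinks.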

64

• Power of a test

– The power of a test is defined as 1 − β.

– It represents the probability of rejecting the null hypothesis when it is false.

Judging the Test

65

Optional: Determining the Sample Size for a Hypothesis Test about the Population Mean (known σ)

• It has been shown that β and n are inversely related (increasing n decreases β).

• So, for a desired value of β we can determine the required sample size.

• The formula to determine n is:

$$n = \frac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_0 - \mu_1)^2}$$

• For a two tailed test, z_α/2 replaces z_α.

66

Optional: Determining the Sample Size for a one tail test – Example

• Example 8: Determine the sample size needed to test H0: μ = 100 against H1: μ = 130, if the significance level is 2.5% and the desired probability of a type II error is 8%. The population standard deviation is known to be 30.

• Solution: z_α = z_.025 = 1.96; z_β = z_.08 = 1.405.

$$n = \frac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_0 - \mu_1)^2} = \frac{(1.96 + 1.405)^2 \cdot 30^2}{(100 - 130)^2} = 11.32$$

The selected sample size is therefore n = 12.
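The sample-size formula for a one tail test can be wrapped in a small function. This is a sketch: the name `required_sample_size` is hypothetical, and the z values are computed with `statistics.NormalDist` instead of being read from a table.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(mu0, mu1, sigma, alpha, beta):
    """Return (exact n, n rounded up) for a one tail z-test with known sigma."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # upper-tail point for alpha
    z_beta = NormalDist().inv_cdf(1.0 - beta)    # upper-tail point for beta
    n_exact = (z_alpha + z_beta) ** 2 * sigma ** 2 / (mu0 - mu1) ** 2
    # Round up so the type II error probability does not exceed the target
    return n_exact, ceil(n_exact)

n_exact, n_selected = required_sample_size(mu0=100.0, mu1=130.0, sigma=30.0,
                                           alpha=0.025, beta=0.08)
print(f"n = {n_exact:.2f}, selected sample size = {n_selected}")
```

Rounding up rather than to the nearest integer is the conservative choice, matching the slide's move from 11.32 to 12.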