Week 8_2014s2

8/10/2019 Week 8_2014s2


Week 8 topics

Confidence intervals for the population mean

More on hypothesis testing

Type I and Type II errors

p-values

Power


Interval estimation: review

Point estimators produce a single estimate of the parameter of interest

In many real-world situations, some notion of the margin of error would be useful

Interval estimators produce an interval (i.e., a range of values) and a degree of confidence associated with that interval

Hence the name "confidence interval"

How often would you expect the true population parameter to be in this (sample-specific) interval?


Interval estimation for means

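Suppose

$$\bar{X} \sim N(\mu, \sigma^2/n) \quad\text{and thus}\quad Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

Consider a symmetric interval around this standardized value of X̄:

$$P(-\text{bound} < Z < \text{bound}) = P\left(-\text{bound} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \text{bound}\right) = 1 - \alpha$$

Say we choose α = .05 (α/2 = .025), so bound = 1.96 (from tables):

$$P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = .95$$

Now we can rearrange this probability statement to yield:

$$P\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = .95$$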
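The rearranged probability statement can be sketched in a few lines of Python. This is only an illustration: the function name `z_interval` and the sample numbers are made up for the example, and sigma is assumed known, as in the derivation.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    """Two-sided confidence interval for the mean when sigma is known."""
    alpha = 1 - conf
    bound = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when conf = 0.95
    half_width = bound * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical numbers, chosen only for illustration.
low, high = z_interval(xbar=50.0, sigma=10.0, n=25)
print(round(low, 2), round(high, 2))
```

Using `inv_cdf` in place of the printed tables gives the same bound of about 1.96 for a 95% interval.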


Interval estimation

Then the endpoints X̄ ± 1.96 σ/√n define a confidence interval!

Remember: The endpoints of the interval are themselves random variables

We have constructed a random interval

μ is a constant

For a particular sample (and sample mean value), μ is either in the confidence interval, or it is not

If 100 size-n samples were drawn, we would expect about 95 of the resulting intervals to include μ
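The "95 out of 100" interpretation can be checked with a small simulation. The sketch below assumes a hypothetical normal population (the values of `mu`, `sigma` and `n` are invented for the example) and counts how often the random interval covers the fixed μ.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(0)                      # fixed seed so the run is repeatable
mu, sigma, n = 10.0, 2.0, 30        # hypothetical population and sample size
bound = NormalDist().inv_cdf(0.975)
half_width = bound * sigma / sqrt(n)

trials = 10_000
covered = 0
for _ in range(trials):
    # Draw one size-n sample and compute its sample mean.
    xbar = fmean(random.gauss(mu, sigma) for _ in range(n))
    # The interval is random; mu is fixed.
    if xbar - half_width < mu < xbar + half_width:
        covered += 1

coverage = covered / trials
print(coverage)
```

The printed proportion should be close to 0.95, although any single interval either contains μ or does not.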


Confidence intervals

CIs for means and proportions typically have a similar structure

Centred at sample statistics

Endpoints lie some multiple of the standard error (if we don't know sigma) or standard deviation (if we do know sigma) of the sampling distribution away from the centre

The multiple is determined by the confidence level chosen by the investigator

Remember: If you don't know sigma and have a small sample, use the t-distribution tables to get your bounds, not the Z!


Selecting sample size

Recall the auditor Clare from last lecture

Suppose she is OK with assuming she knows sigma, and needs to decide on a sample size

She wants a sample size that yields a margin of error of $4, and she is willing to set the confidence level at 90%

We can now write down the CI and use it to solve for the sample size n that she requires.

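$$\bar{X} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \quad\text{defines the confidence interval}$$

Thus we require

$$z_{.05}\,\frac{\sigma}{\sqrt{n}} = 4 \quad\text{or}\quad \frac{1.645 \times 30.334}{\sqrt{n}} = 4 \;\Rightarrow\; n \approx 156$$

Recall, σ = $30.334, by assumption (based on historical data)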
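As a quick check of the sample-size calculation, the sketch below solves the margin-of-error condition for n using the slide's values. Rounding up to the next whole observation is an assumption about how the final n is chosen.

```python
from math import ceil
from statistics import NormalDist

sigma = 30.334      # assumed known, from historical data (as on the slide)
margin = 4.0        # desired margin of error in dollars
conf = 0.90

z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.645
n_exact = (z * sigma / margin) ** 2            # solve z * sigma / sqrt(n) = margin for n
n_required = ceil(n_exact)                     # round up to a whole observation
print(round(z, 3), round(n_exact, 2), n_required)
```

Halving the margin of error would roughly quadruple the required n, because n grows with the square of z·σ/margin.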


Hypothesis testing examples and concepts, again

Maintained or null hypothesis

Some statement about a population parameter

Let X be the weight of precooked meat, with mean μ

Then the null hypothesis is H0: μ = 0.25

Alternative hypothesis

Will depend on the research objective

Some possibilities here:

H1: μ ≠ 0.25, two-tailed hypothesis test (so a value too extreme in either direction violates the tenet of "Truth in Advertising")

H1: μ < 0.25, one (lower)-tailed hypothesis test (so a value too low violates the minimum standard of a trading standards agency or consumer advocacy group)



Hypothesis testing examples and concepts

Recall how data are used to test a null hypothesis:

Proceed by comparing a test statistic with the value specified in H0 and decide whether the difference is:

Small enough to be attributable to random sampling errors: do not reject H0, or

So large that H0 is more likely not to be correct: reject H0

Formally define a rejection (or critical) region

Values of the test statistic that are so extreme they lead us to reject H0 in favour of H1

Other values of the test statistic that are not so extreme lie in the non-critical region


Quality control at McDonald's

A quarter-pounder with cheese is presumed to comprise 0.25 pounds (0.11 kg) of precooked meat

Consider H0: μ = 0.25, H1: μ < 0.25

A sample of 25 hamburgers produces sample mean weights (in pounds!) of: (a) 0.24 (b) 0.23 (c) 0.28 (d) 0.21

Which of these represents evidence against H0?

Which of these would lead you to reject H0?

For which are you most likely to reject H0?


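Quality control at McDonald's

Assume the weight of hamburger meat is X ~ N(μ, σ²)

$$H_0: \mu = 0.25$$

$$H_1: \mu < 0.25$$

We need to determine the rejection region, which will be:

Reject H0 if X̄ < x_L, which implies

$$P(\bar{X} < x_L) = P\left(Z < \frac{x_L - .25}{\sigma/\sqrt{n}}\right) = \alpha$$

Thus if we set α and know σ and n, then we can determine x_L.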
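A minimal sketch of solving for the cutoff x_L, using n = 25 from the example and the values α = 0.05 and σ = 0.05 that the slides quote later for this test. The dictionary of sample means simply restates cases (a) to (d).

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 0.25, 0.05, 25, 0.05   # sigma and alpha as quoted later in the slides

z_crit = NormalDist().inv_cdf(alpha)          # lower-tail critical value, about -1.645
x_L = mu0 + z_crit * sigma / sqrt(n)          # cutoff on the sample-mean scale

sample_means = {"a": 0.24, "b": 0.23, "c": 0.28, "d": 0.21}
# True means "reject H0" under the rule: reject if the sample mean is below x_L.
decisions = {label: xbar < x_L for label, xbar in sample_means.items()}
print(round(x_L, 4), decisions)
```

Under these inputs the cutoff is about 0.2336, so cases (b) and (d) fall in the rejection region while (a) and (c) do not.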


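Quality control at McDonald's

[Figure: standard normal density, z axis from -3 to 3. The lower-tail rejection region of area 0.05 lies to the left of z = -1.645, which corresponds to a sample mean of 0.2336; the centre of the distribution corresponds to μ = 0.25.]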


Quality control at McDonald's

Our choice of significance level matters!

Suppose α = 0.01

Our new rejection region is then z < -2.33 instead of z < -1.645, or in terms of the sample mean, the cutoff is 0.2267 rather than 0.2336

Does it make sense that the critical value is lower on the number line when α = 0.01 than when α = 0.05?

Thus, case (b) with a sample mean of 0.23 would now NOT lead to rejection of the null hypothesis
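The effect of the significance level on the cutoff can be reproduced directly. The helper name `lower_cutoff` is made up for this sketch; the inputs are the ones used in the example (n = 25, σ = 0.05).

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 0.25, 0.05, 25

def lower_cutoff(alpha):
    """Sample-mean cutoff for the lower-tailed test at significance level alpha."""
    return mu0 + NormalDist().inv_cdf(alpha) * sigma / sqrt(n)

cut_05 = lower_cutoff(0.05)
cut_01 = lower_cutoff(0.01)
xbar_b = 0.23                       # case (b)
print(round(cut_05, 4), round(cut_01, 4))
print(xbar_b < cut_05, xbar_b < cut_01)
```

A smaller α demands more extreme evidence, so the cutoff moves further into the lower tail and case (b) no longer qualifies.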


p-values

How do I choose the significance level α?

No rules

Conventional choices are α = 0.1, 0.05, or 0.01

In the McDonald's example, we saw it could matter

Why do I have to choose a particular α?

You don't (though doing so can helpfully "bind your hands")

You can calculate the empirical significance level, or p-value

The p-value associated with a given test statistic is the probability of obtaining a value of the test statistic as or more extreme than that observed, given that the null hypothesis is true

"More extreme" depends on the form of the alternative hypothesis


p-values

From the skills test example of last week:

$$p\text{-value} = P(\bar{X} > 7.09) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > 2.46\right) = P(Z > 2.46) = .0069$$

Thus, it is very unlikely (less than a 1% chance) to find such an extreme value for the sample mean test score if H0 is correct

There is strong evidence to reject H0

Put another way: for any choice of significance level (α) greater than .0069, we would reject H0

We can only calculate this if we assume the sampling distribution of X̄ is centred at a particular value, which we assume to be the population mean under the null!
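The upper-tail p-value for a standardized statistic of 2.46 can be checked without tables. This sketch takes only that z value from the slide and compares the result with the conventional significance levels.

```python
from statistics import NormalDist

z_obs = 2.46                              # standardized sample mean from the slide
p_value = 1 - NormalDist().cdf(z_obs)     # upper-tail probability P(Z > z_obs)
print(round(p_value, 4))

# Reject H0 whenever the chosen significance level exceeds the p-value.
for alpha in (0.10, 0.05, 0.01):
    print(alpha, "reject H0" if p_value < alpha else "do not reject H0")
```

Because the p-value is below 0.01, H0 is rejected at all three conventional levels.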


Student progress in BES...

Key features of past results for the 10-mark quiz

On average, students do well

Median = 9, mean = 8.5, only 5.3% with mark



Q1: Suppose a student randomly chose another BES student to help him with his work

What is the probability that the chosen student's test mark is at least 9?

Q2: Define a "good tutorial" (of 20 students) to be one where the mean mark is at least 9

What is the probability that any given student is in a good tutorial?

Q3: Was the semester 1 2014 BES cohort any different from what we have seen in the past?

Test this hypothesis using data from a randomly-selected s1 2014 tutorial of size 25, in which the mean mark is 8.8



Q3: H0: μ = 8.5; H1: μ ≠ 8.5

By CLT, X̄ ~ N(8.5, 1.88²/25). We can calculate the p-value associated with our sample mean, assuming the two-tailed test stated above.

$$p = 2 \times P(\bar{X} > 8.8) = 2 \times P\left(Z > \frac{8.8 - 8.5}{1.88/\sqrt{25}}\right) = 2 \times P(Z > 0.80) = 2 \times 0.2119 = 0.4238$$

(8.8 is the sample mean in our sample!)

At α = 0.01, say (or any other conventional level!), there is insufficient evidence to reject H0.
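A short sketch of the two-tailed calculation for Q3, using the values quoted on the slide. The z statistic is not rounded to two decimals here, so the result differs slightly from the table-based 0.4238.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar = 8.5, 1.88, 25, 8.8   # values quoted on the slide

z = (xbar - mu0) / (sigma / sqrt(n))             # about 0.80
p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # two-tailed: double the tail area
print(round(z, 2), round(p_value, 3))
```

The p-value of roughly 0.42 is far above 0.10, 0.05 and 0.01, so H0 is not rejected at any conventional level.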


Hypothesis testing: A note about types of errors

Concepts about errors one can make during hypothesis testing are similar in statistical inference and in the judicial system (introduced in last lecture)

Recall the McDonald's quality control example

A quarter-pounder with cheese is presumed to comprise 0.25 pounds (0.11 kg) of precooked meat

Given data (a sample of hamburgers), we can evaluate this implicit claim

Yet we might conclude that their hamburgers do contain 0.25 pounds when in fact they don't (false negative)

Or, we might conclude that their hamburgers don't contain 0.25 pounds when in fact they do (false positive)



and concepts

Type I errors occur when we reject a true null hypothesis

Only possible to make this error when the null is true

Denote P(Type I error) = P(Reject H0 | H0 true) = α (the significance level)

Type II errors occur when we don't reject a false null hypothesis

Only possible to make this error when the null is false

Denote P(Type II error) = β

P(Do not reject H0 | H0 not correct) = β

P(Type II error) depends on what the actual (alternative) parameter value is!
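That the Type I error rate equals the significance level can be illustrated by simulation. The sketch below reuses the McDonald's inputs (n = 25, σ = 0.05, α = 0.05) and, as a simplifying assumption, draws sample means directly from their sampling distribution under a true H0.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)                                    # fixed seed so the run is repeatable
mu0, sigma, n, alpha = 0.25, 0.05, 25, 0.05
se = sigma / sqrt(n)
cutoff = mu0 + NormalDist().inv_cdf(alpha) * se   # reject H0 if xbar < cutoff

trials = 20_000
# H0 is true here: sample means are drawn from N(mu0, se^2).
rejections = sum(random.gauss(mu0, se) < cutoff for _ in range(trials))
type_1_rate = rejections / trials
print(type_1_rate)
```

The printed rejection rate should sit near 0.05, the chosen α.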


Calculating probability of Type II errors

Recall our McDonald's example: Suppose we conduct this one-tailed test:

H0: μ = 0.25, H1: μ < 0.25

With n = 25, α = 0.05, and σ = 0.05, we previously found the relevant decision rule to be a rejection of H0 if we find that our sample mean < 0.2336

Suppose that in fact, McDonald's only puts 0.24 pounds in their quarter pounder. Will we detect this?

$$\beta = P(\bar{X} > .2336 \mid \mu = .24) = P\left(Z > \frac{.2336 - .24}{0.05/\sqrt{25}}\right) = P(Z > -0.64) = .7389$$

We now assume the sampling distribution is centred at the alternative!

β is the probability that we get a test statistic that makes us fail to reject the null, given that an alternative is true

There is a 74% chance of not detecting the discrepancy!
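The Type II error probability can be reproduced by centring the sampling distribution at the alternative, as the slide describes. A minimal sketch using the slide's rounded cutoff of 0.2336:

```python
from math import sqrt
from statistics import NormalDist

cutoff, sigma, n = 0.2336, 0.05, 25   # decision rule and inputs from the slide
mu_alt = 0.24                         # the alternative assumed to be true

# Sampling distribution of the sample mean, centred at the alternative.
xbar_dist = NormalDist(mu=mu_alt, sigma=sigma / sqrt(n))
beta = 1 - xbar_dist.cdf(cutoff)      # P(fail to reject) = P(xbar > cutoff)
print(round(beta, 4))
```

The only change from the Type I calculation is the centre of the distribution: μ = 0.24 replaces the null value 0.25.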


Power of a test

Power (in statistics): The probability of correctly rejecting a false null hypothesis

P(Do not reject H0 | H0 not correct) = β

P(Reject H0 | H0 not correct) = Power = 1 − β

From the prior slide:

$$\beta = P(Z > -0.64) = .7389$$

So, given a true population parameter of 0.24 pounds of meat in the quarter pounder, the power of this test is:

$$1 - \beta = 1 - 0.7389 = 0.2611$$
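Because power depends on the true parameter value, it can help to evaluate it at more than one alternative. In the sketch below the function name `power` is made up, and the alternatives 0.23 and 0.22 are hypothetical additions; only 0.24 comes from the slides.

```python
from math import sqrt
from statistics import NormalDist

cutoff, sigma, n = 0.2336, 0.05, 25   # rejection rule from the slides: reject if xbar < cutoff

def power(mu_alt):
    """P(reject H0) when the true mean is mu_alt, for the lower-tailed test."""
    return NormalDist(mu_alt, sigma / sqrt(n)).cdf(cutoff)

for mu_alt in (0.24, 0.23, 0.22):     # 0.23 and 0.22 are extra, hypothetical alternatives
    print(mu_alt, round(power(mu_alt), 4))
```

Power rises as the true mean moves further below the null value of 0.25, since larger discrepancies are easier to detect.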


Power of a test

Suppose a company is considering installing an additional press to continuously extrude copper. The investment is only viable if the press extrudes more than 170 metres of copper per hour. This suggests the following hypothesis test:

H0: μ = 170; H1: μ > 170;

with the firm investing in an additional press upon rejection of the null.

A large random sample of 400 production hours from existing similar presses in the plant has a sample mean of 176 m/h and a sample standard deviation of 65 m/h. Further suppose that this sample is large enough to invoke the CLT.


Power of a test

Say the firm sets up the hypothesis test using α = 0.01.

H0: μ = 170; H1: μ > 170

n = 400, α = 0.01, s = 65

Decision rule:

Reject H0 if sample mean > 177.57

The firm would hate to make a mistake over this critical investment. If the new press had a mean production of 180 m/h, it would be a very attractive investment. What is the power of the test if, in actual fact, μ = 180?

$$\beta = P(\bar{X} < 177.57 \mid \mu = 180) = P\left(Z < \frac{177.57 - 180}{65/\sqrt{400}}\right) = P(Z < -0.75) = 0.2266$$

$$1 - \beta = 1 - 0.2266 = 0.7734$$

Verify at home!!
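One way to verify the result at home is the sketch below. It uses the exact normal critical value rather than the table value 2.33, so the cutoff and power differ from the slide in the third or fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu0, mu_alt, s, n, alpha = 170, 180, 65, 400, 0.01
se = s / sqrt(n)                                      # 3.25

cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se   # upper-tail rule: reject if xbar > cutoff
beta = NormalDist(mu_alt, se).cdf(cutoff)             # P(xbar <= cutoff | mu = 180)
power = 1 - beta
print(round(cutoff, 2), round(beta, 4), round(power, 4))
```

The sample standard deviation s stands in for σ here, which relies on the large sample size mentioned in the example.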


Power of a test

What happens to power if we increase α to 0.05?

The power increases:

$$1 - \beta = 1 - 0.0764 = 0.9236$$

What happens to power if we increase n to 1000?

The power increases:

$$1 - \beta \approx 1 - 0 = 1$$


Power of a test...

Summary: H0: μ = 170; H1: μ = 180

If α = 0.01, then β = 0.2266 (Power = 0.7734) when n = 400

If α = 0.05, then β = 0.0764 (Power = 0.9236) when n = 400

If α = 0.05, then β ≈ 0.0000 (Power ≈ 1) when n = 1000

With a different alternative, e.g. H1: μ = 178 (verify!):

If α = 0.05, then β = 0.2075 (Power = 0.7925) when n = 400

This (closer-to-the-null) alternative is harder to detect!
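The summary figures can be recomputed with one small function. The name `upper_tail_power` is made up for this sketch, and it uses exact normal quantiles, so results may differ from the table-based numbers in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_power(mu0, mu_alt, s, n, alpha):
    """Power of the test H0: mu = mu0 vs H1: mu > mu0 when the true mean is mu_alt."""
    se = s / sqrt(n)
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se   # reject if xbar > cutoff
    return 1 - NormalDist(mu_alt, se).cdf(cutoff)

# (true mean, sample size, significance level) for each row of the summary
scenarios = [
    (180, 400, 0.01),
    (180, 400, 0.05),
    (180, 1000, 0.05),
    (178, 400, 0.05),
]
results = {sc: upper_tail_power(170, sc[0], 65, sc[1], sc[2]) for sc in scenarios}
for (mu_alt, n, alpha), pw in results.items():
    print(f"mu={mu_alt} n={n} alpha={alpha}: power={pw:.4f}")
```

The pattern matches the summary: power rises with a larger α or a larger n, and falls when the alternative is closer to the null.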

P(Z


Progress report

We now have procedures to test hypotheses in a range of circumstances, using the sample proportion and the sample mean

We can use the standard normal tables and the t-tables, as appropriate, in generating confidence intervals and testing hypotheses

We know the mistakes we could make in our testing, and about the power of our tests.

Next week: Chi-squared tests.

After that, the final broad module of the course begins: Linear regression!
