Statistical Techniques I EXST7005 Lets go Power and Types of Errors.

35
Statistical Techniques I EXST7005 Lets go Power and Types of Errors

Transcript of Statistical Techniques I EXST7005 Lets go Power and Types of Errors.

Statistical Techniques IEXST7005

Lets go

Power and Types of Errors

Logic of Test of Hypothesis is based on the selected probability, (significance level) for the test statistic (Z) which determines the range of what would be expected due to chance alone assuming H0 is true. Significance level notation, commonly used levels

and terminology"Significant" = 0.05"Highly significant" = 0.01

Hypothesis testing Concepts

Errors!

Is it possible, when we do a test of hypothesis that we are wrong? Yes, unfortunately. It is always possible that we

are wrong.Furthermore, there are two types of error that we

could make!

Data indicates: H0 is true

Data indicates: H0 is false

True result: H0 is true

NO ERROR Type I Error: Reject TRUE H0

True result: H0 is false

Type II Error: Fail to Reject FALSE H0

NO ERROR

Types of error

Type I Error

Type I error is the REJECTION of a true Null Hypothesis.This type of error is also called (alpha) error.This is the value we choose as the "level of

significance" So we actually set the probability of making this

type of error. The probability of a type I error =

Type II Error Type II error is the FAILURE TO REJECT of a Null Hypothesis that is false.This type of error is also called (beta) error.We do not set this value, but we call the probability

of a type II error = Furthermore, in practice we will never know this

value. This is another reason we cannot "accept" the null hypothesis, because it is possible that we are wrong and we cannot state the probability of this type of error.

So what can we do?

The good news, it is only possible to make one error at a time.If you reject H0, then maybe you made a type I

error, but you have not made a type II error. If you fail to reject H0, then maybe you made a

type II error, but you have not made a type I error.

The probability of Type II Error

This is a probability that we will not know. This probability is called

However, we can do several things to make the error smaller. So this will be our objective.

First, let's look at how these errors occur.

Type II Error

10 12 14 16 18 20 22 24 26-4 -3 -2 -1 0 1 2 3 4

Examine an hypothesized distribution (below) that we believe to have a mean of 18.

Type II Error

We are going to do a 2 tailed test with an value of 0.05. Our critical limits will be ±1.96.

10 12 14 16 18 20 22 24 26-4 -3 -2 -1 0 1 2 3 4

10 12 14 16 18 20 22 24 26

-4 -3 -2 -1 0 1 2 3 4

Type II Error

So we will reject any test statistic over 1.96 (or under -1.96). But lets suppose the null hypothesis is false!!! Lets suppose that the alternate hypothesis is true. Then the hypothesized distribution is not real, there is another "real" distribution that we are sampling from. What might it look like?

Type II Error

10 12 14 16 18 20 22 24 26 27 28

Here is the real distribution. It actually has a mean of 22, but we don't know that. If we did, we would not have hypothesized a mean of 18!

Type II Error

10 12 14 16 18 20 22 24 26 27 28

So where on the real distribution is our critical limit. This is the key question.

10 12 14 16 18 20 22 24 26

-4 -3 -2 -1 0 1 2 3 4

Critical value

Type II Error

Now we draw our sample from the real distribution. If our result says reject the H0, we make no error.

10 12 14 16 18 20 22 24 26

-4 -3 -2 -1 0 1 2 3 4

Critical value

10 12 14 16 18 20 22 24 26 27 28

Type II Error

But if our result causes us to fail to reject, we err.

10 12 14 16 18 20 22 24 26

-4 -3 -2 -1 0 1 2 3 4

Critical value

10 12 14 16 18 20 22 24 26 27 28

Type II Error

And in this case it appears we have pretty close to a 50-50 chance of going either way.

Critical value

10 12 14 16 18 20 22 24 26 27 28

Type II Error So we take our sample and do our test. Will we err? Maybe, and maybe not. If our sample happens to fall in the red area, we will not reject H0.

We would err.

10 12 14 16 18 20 22 24 26 27 28

Type II Error

But if our sample happens to fall in the yellow area, we will reject H0.

No error, we draw the correct conclusion.

10 12 14 16 18 20 22 24 26 27 28

Probability of error

For = 0.05, our critical limit would be 1.96. The critical limit translates to a value on the original scale of Yi = + Zi = 18 + 1.96(2) = 18 ± 3.92. The lower bound is 14.08 and the upper bound is 21.92. The lower bound is so far down on the real distribution that the probability of getting a sample that falls there is near zero. The upper bound is the one that falls in the middle of the "real" distribution.

Probability of error (continued)

In this ficticious case we know that the true mean is 22 (we wouldn't normally know the true mean.

Since we know the true mean, we can calculate the probability of drawing a sample above and below the critical limit (21.92 on the Y scale, 0.04 on the Z scale of the real distribution).

Probability of error (continued)

The probability of falling below this value, and of making a type II error is 0.484, or about 48.4%. This is beta ()

The probability of falling above this value, and of NOT making a type II error is 0.516, or 51.6%.

So in this case we can calculate , the probability of a Type II error. In practice we cannot usually know these.

Type II Error Probability

Since we don't actually know what the true mean is (or we wouldn't be hypothesizing something else), we cannot know in practice the type II error rate (). However, it is affected by a number of things, and we can know about these.

We define a new term POWER, this is the probability of NOT making a type II error (=1-). This was 0.516 in our example.

Power and Type II Error First, power is effected by the distance between the hypothesized mean () and true mean ().

Power and Type II Error Second, power is effected by the value chosen for Type I error ().

10 12 14 16 18 20 22 24 26 27 2810 12 14 16 18 20 22 24 26

-4 -3 -2 -1 0 1 2 3 4

10 12 14 16 18 20 22 24 26 27 2810 12 14 16 18 20 22 24 26

-4 -3 -2 -1 0 1 2 3 4

Power and Type II Error

And third, power is effected by the variability or spread of the distribution.

10 12 14 16 18 20 22 24 26 27 28

10 12 14 16 18 20 22 24 26 27 28

Controlling Power

Which of the 3 factors influencing power can you control?For testing means you may be able to control

sample size (n). This reduces the variability and increases power.

You probably cannot influence the difference between and .

You can choose any value of . However, this cannot be too small or Type II error becomes more likely. Too large and Type I error becomes likely.

POWER OF A TEST

The capability of the test to reject H0 when it is false is called Power = 1 -

How would we use our knowledge of factors affecting power to increase the power of our tests of hypothesis?

METHODS OF INCREASING THE POWER OF A TEST

METHODS OF INCREASING THE POWER OF A TESTIncrease the significance level (eg. from =0.01 to =0.05)

– If H0 is true we would increase , the probability of a Type I error.

– If H0 is false then we decrease , the probability of a Type II error, and by decreasing , we are increasing the POWER of test.

METHODS OF INCREASING THE POWER OF A TEST (continued)

For a given , the POWER can be increased by .... increasing n, so Y (= /n) decreases, and the

amount of overlap between the real and hypothesized distributions decreases.

For example, lets suppose we are conducting a test of the hypothesis H0:=0 against the alternative H0:0. We believe 0 = 50 and we set = 0.05. We also know that 2 = 100 and that n = 25.

POWER OF A TEST (continued)

From this information we can calculate that Y = /n = 10/5 = 2.

Then the Critical Region is P(|Z| Z0) = 0.05 and Z0 = 1.96.

And our critical value on the original scale is Yi = + Zi =50 + 1.96(2) = 53.92.

POWER OF A TEST (continued)

Now lets suppose that the REAL population mean is 54. CALCULATE P(Y 53.92), given that the TRUE mean is 54.

Z = (53.92 - 54)/2 = -0.08/2 =- 0.04. The probability of a TYPE II error () is the probability of not drawing a sample that falls above this value and not rejecting the false null hypothesis. The value of = is P(Z-0.04) = 0.4840.

POWER OF A TEST (continued)

So for an experiment with n = 25, the power is 1 - = 1 - 0.4840 = 0.516.

But suppose we had a larger n, say 100. Now Y = /n = 10/10 = 1.

The Critical Region stays at Z0 = 1.96, but on the original scale this is now Yi = + Zi =50 + 1.96(1) = 51.96.

For a true mean of 54 we now get Z=(51.96-54)/1 = -2.04/1 = -2.04.

POWER OF A TEST (continued)

The value of = is P(Z-2.04) = 0.0207, and the power for this test is 1 - = 0.9793.

The result,With n = 25, the power is 0.5160. With n = 100, the power is 0.9793.

This is why statisticians recommend larger sample sizes so strongly. We may never really know what power is, but we know how to reduce it.

Summary

Hypothesis testing is prone to two types of errors, one we control () and one we do not (). Type I error is the REJECTION of a true Null

Hypothesis.Type II error is the FAILURE TO REJECT a Null

Hypothesis that is false. Not only do we not control TYPE II error, we probably do not even know its value.

Summary (continued)

The "Power" of a test is 1 = We can influence power, and hopefully reduce it, by Controlling the distance between and (not

really likely)Selecting a value of that is not too small (0.05

and 0.01 are the usual values) Getting a larger sample size, usually the factor

most under the control of the investigator.