Statistical Hypothesis Testing. Suppose you have a random variable X ( number of vehicle accidents...

Statistical Hypothesis Testing

What you need:

1) A good sample statistic or estimator for f(X), call it m(X1,X2,…,Xn).

What is a good sample statistic? Example: f(X) = E(X) = expected value of X. A bad statistic: you have 100 samples, and you use the sample statistic m = (X1 + X2 + … + X10)/10 for your sample mean. A better statistic: m = (X1 + X2 + … + Xn)/n. It is good since it is an unbiased statistic (E(m) = theoretical mean of D, in the above example) and has minimum variance among linear unbiased statistics. The bad statistic is also unbiased, but it throws away 90 of the samples and so has a much larger variance. Other criteria for goodness are consistency and linearity.
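The two statistics can be compared by simulation. This is only a sketch: the normal distribution with mean 5 and standard deviation 2, the number of trials, and the name compare_estimators are assumptions made for illustration, not part of the original example. Both statistics average close to the true mean, but the one built from only the first 10 samples should show roughly ten times the variance.

```python
import random
import statistics

def compare_estimators(n_trials=2000, n=100, seed=0):
    """Simulate both sample statistics on the same data (hypothetical setup)."""
    rng = random.Random(seed)
    first10, full = [], []
    for _ in range(n_trials):
        # Assumed distribution D: normal with mean 5 and standard deviation 2.
        xs = [rng.gauss(5.0, 2.0) for _ in range(n)]
        first10.append(sum(xs[:10]) / 10)   # uses only the first 10 samples
        full.append(sum(xs) / n)            # uses all n samples
    return (statistics.mean(first10), statistics.variance(first10),
            statistics.mean(full), statistics.variance(full))

m10, v10, m100, v100 = compare_estimators()
print("first 10 :", m10, v10)
print("all 100  :", m100, v100)
```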

2) Large enough sample size

The more data, the better the sample estimate is. The law of large numbers states that the sample mean (X1 + X2 + … + Xn)/n converges to the true mean of the distribution D as n -> ∞, provided that the population mean exists and the Xi’s are independent and all have the same distribution D.
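A quick way to see the law of large numbers at work is to average simulated rolls of a fair die, whose true mean is 3.5. The die, the seed and the function name sample_mean_of_die are assumptions chosen for illustration.

```python
import random

def sample_mean_of_die(n, seed=1):
    """Average of n simulated fair-die rolls (the true mean is 3.5)."""
    rng = random.Random(seed)
    return sum(rng.randint(1, 6) for _ in range(n)) / n

# The sample mean should drift toward 3.5 as n grows.
for n in (10, 1000, 100000):
    print(n, sample_mean_of_die(n))
```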

3) f(X) exists and is finite

Implicit in the previous statements is the assumption that f(X) exists and has a finite value. For example, if f(X) = E(X), then we’re assuming that the distribution D has a finite mean. This may not be the case, say when D is a Cauchy distribution. The simplest Cauchy distribution has pdf (probability density function)

$pdf(x) = \frac{1}{\pi (1 + x^2)}$

It can be seen that the mean

$E(X) = \int_{-\infty}^{\infty} x \, pdf(x) \, dx = \frac{1}{2\pi} \left[ \log(1 + x^2) \right]_{-\infty}^{\infty}$

is not defined and neither is the variance, so it does not make sense to test for E(X) = µ0. Be careful what you’re testing for. By the way, the ratio of two independent normal N(0,1) random variables is Cauchy distributed.
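The Cauchy case can be explored numerically. The sketch below builds Cauchy samples as ratios of two independent N(0,1) draws, as noted above, and prints running sample means next to the sample median. The seed, the sample sizes and the name cauchy_samples are assumptions. The check that about half of the samples fall in (-1, 1) follows from integrating the pdf 1/(π(1 + x²)) over that interval.

```python
import random
import statistics

def cauchy_samples(n, seed=2):
    """Draw n standard Cauchy values as ratios of independent N(0,1) draws."""
    rng = random.Random(seed)
    return [rng.gauss(0, 1) / rng.gauss(0, 1) for _ in range(n)]

xs = cauchy_samples(100000)

# Running sample means: there is no finite mean for them to converge to,
# so more data does not guarantee a steadier value.
for n in (100, 1000, 10000, 100000):
    print(n, sum(xs[:n]) / n)

# The median is well defined (0 for the standard Cauchy).
print("median:", statistics.median(xs))

# Prob(|X| < 1) = 1/2 for the standard Cauchy distribution.
frac_inside = sum(1 for x in xs if abs(x) < 1) / len(xs)
print("fraction with |x| < 1:", frac_inside)
```

If a location parameter is what you care about for such data, testing the median rather than the mean is one option.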

No assumption is made about the normality of the distribution D, and it does not have to be normal. However, it is known that if D has finite mean and finite variance, and if the sample size n is large (~100 or more), then the statistic

m(X1,X2,…,Xn) = (X1 + X2 + … + Xn)/n

is approximately normally distributed, with mean equal to the mean of D and variance equal to the variance of D divided by n. This is the so-called Central Limit Theorem of probability. This result is often applied to justify the use of normal distribution statistics in large datasets.
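The Central Limit Theorem can be checked on a distribution that is clearly not normal. In the sketch below, D is assumed to be exponential with rate 1 (mean 1, variance 1), and n = 100. The function name and the number of trials are also assumptions. If the sample mean is close to normal, about 95% of the standardized sample means should land between -1.96 and 1.96.

```python
import math
import random

def fraction_within_1_96(n=100, trials=5000, seed=3):
    """Fraction of standardized sample means of Exp(1) data inside [-1.96, 1.96]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        m = sum(rng.expovariate(1.0) for _ in range(n)) / n
        z = (m - 1.0) * math.sqrt(n)   # Exp(1) has mean 1 and variance 1
        if abs(z) <= 1.96:
            hits += 1
    return hits / trials

# An exactly normal statistic would give about 0.95.
print(fraction_within_1_96())
```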

It is important to observe that the distribution D’ (not identical to D) of the test statistic m is known completely, given that the null hypothesis is true.

How does it work:

Let m = m(X1,X2,…,Xn) be our test statistic. m itself is a random variable, with probability distribution D’. By assuming that the null hypothesis is true, the distribution D’ is determined completely, and we can find the value q (either directly or using lookup tables) such that

Prob( Z > m ) = q

where Z is a random variable with distribution D’ and m is the observed value of the test statistic. We then select a number α (called the significance level, usually = .05), and

reject H0 : if q < α
accept H0 : if q > α
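The decision rule is short enough to write out directly. In this sketch the null distribution D’ is assumed to be Normal(30, 1) purely for illustration, and the names upper_tail_q and decide are hypothetical. The boundary case q = α is not covered by the rule above, so treating it as "accept" is an assumption here.

```python
from statistics import NormalDist

def upper_tail_q(m, null_dist):
    """q = Prob(Z > m), where Z follows the null distribution D'."""
    return 1.0 - null_dist.cdf(m)

def decide(q, alpha=0.05):
    """Reject H0 when q < alpha; otherwise accept H0 (q == alpha is an assumed choice)."""
    return "reject H0" if q < alpha else "accept H0"

# Hypothetical null distribution D' of the test statistic: Normal(mean=30, sd=1).
d_prime = NormalDist(mu=30.0, sigma=1.0)
for m in (30.5, 32.0):
    q = upper_tail_q(m, d_prime)
    print(m, round(q, 4), decide(q))
```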

What can happen:

                H0 True                  H0 False
Accept          p                        β (type II error)
Reject          1 - p (type I error)     1 - β

α = 1 - p is the significance level of the test. This is usually set to a value of 0.05. This means that we want the probability of making a type I error (a false positive) to be small, i.e.

Prob(Reject H0 | H0 is true) = 0.05

Another way of interpreting this: if H0 is in fact true, there is only a 5% chance that the test will wrongly reject it. Note that accepting the null hypothesis H0 is a weaker conclusion than rejecting H0. It only means that the test provides no evidence to contradict our assumption that H0 is true.
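The meaning of the significance level can be checked by simulation: generate many datasets for which H0 really is true, test each one, and count how often H0 is rejected. The sketch assumes normal data with known standard deviation and a one-sided test on the mean. These choices, the parameter values and the name simulated_type_one_rate are all assumptions for illustration.

```python
import math
import random
from statistics import NormalDist

def simulated_type_one_rate(mu0=0.0, sigma=1.0, n=20, alpha=0.05,
                            trials=4000, seed=4):
    """Fraction of datasets generated under H0 (mean = mu0) that the test rejects."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha)   # one-sided critical value
    rejections = 0
    for _ in range(trials):
        m = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        z = (m - mu0) / (sigma / math.sqrt(n))
        if z > z_crit:
            rejections += 1
    return rejections / trials

# Should come out close to alpha = 0.05.
print(simulated_type_one_rate())
```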

In some cases, it may be desirable to minimize type II errors as well. The power of a test is defined by

Prob(Reject H0 | H0 is false) = 1- β

The more powerful a test, the less chance of making a type II error. The function

H(u,m) = Prob(Z > m | f(X)=u)

is known as the power function of the test.
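For a concrete picture of a power function, take the simplest case where the sample mean is normal with known standard deviation. That case, the values µ0 = 30, σ = 3, n = 8 and the name power_function are assumptions for illustration only. Here the second argument is the cutoff that the statistic must exceed for H0 to be rejected.

```python
import math
from statistics import NormalDist

def power_function(u, cutoff, sigma, n):
    """H(u, cutoff) = Prob(sample mean > cutoff | true mean = u), known sigma."""
    return 1.0 - NormalDist(mu=u, sigma=sigma / math.sqrt(n)).cdf(cutoff)

# Hypothetical setup: H0 mean 30, known sigma 3, n = 8, significance level 0.05.
mu0, sigma, n, alpha = 30.0, 3.0, 8, 0.05
cutoff = NormalDist(mu=mu0, sigma=sigma / math.sqrt(n)).inv_cdf(1.0 - alpha)

# At u = mu0 the value equals alpha; it rises toward 1 as u moves above mu0.
for u in (30.0, 31.0, 32.0, 33.0):
    print(u, round(power_function(u, cutoff, sigma, n), 3))
```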

Example: Annual rainfall data for 8 years, in inches, at some locality: 34.1, 33.7, 27.4, 31.1, 30.9, 35.2, 28.4, 32.1

Historically known to be normally distributed with mean = 30 inches but with unknown variability. It is hypothesized that the mean annual rainfall has increased as of late.

H0 : µ = 30
H1 : µ > 30

Sample statistic m = $\bar{X}$ = 31.6. s2 = sample variance = 7.5. Note that

$\frac{m - \mu}{s / \sqrt{8}}$

has Student’s t distribution with 8-1=7 degrees of freedom, therefore

$\mathrm{Prob}(Z > 31.6 \mid \mu = 30) = \mathrm{Prob}\left( \frac{(Z - 30)\sqrt{8}}{\sqrt{7.5}} > \frac{1.6\sqrt{8}}{\sqrt{7.5}} \right) = .072 > .05$

So we accept the null hypothesis at the .05 significance level. No evidence to support µ > 30 inches with this test. Suppose we have evidence from another dataset collected nearby that the mean annual rainfall is actually 31 in. Taking the observed value 31.6 as the cutoff, the power of our test is then

$\mathrm{Prob}(Z > 31.6 \mid \mu = 31) = \mathrm{Prob}\left( \frac{(Z - 31)\sqrt{8}}{\sqrt{7.5}} > \frac{0.6\sqrt{8}}{\sqrt{7.5}} \right) = .28$

with β=1-.28 = .72.
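The rainfall numbers can be reproduced with the Python standard library alone. Since the standard library has no Student’s t distribution, the sketch integrates the t density numerically with Simpson’s rule. The helper names t_pdf and t_upper_tail and the number of integration steps are assumptions. The code uses the unrounded mean and variance, so its results differ slightly from the rounded figures quoted above. The last line repeats the calculation for µ = 31 with the observed mean as the cutoff, as in the example.

```python
import math
import statistics

rain = [34.1, 33.7, 27.4, 31.1, 30.9, 35.2, 28.4, 32.1]

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Prob(T > t) as 0.5 minus the Simpson's-rule integral of the density on [0, t]."""
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

n = len(rain)
m = statistics.mean(rain)              # sample mean
s2 = statistics.variance(rain)         # sample variance (divides by n - 1)
se = math.sqrt(s2 / n)                 # estimated standard error of the mean

t_stat = (m - 30.0) / se
q = t_upper_tail(t_stat, n - 1)
print("mean:", m, "variance:", s2)
print("t statistic:", t_stat, "q:", q)
print("reject H0" if q < 0.05 else "accept H0")

# Same calculation with the mean taken to be 31, observed mean as the cutoff.
print("Prob(Z > m | mu = 31):", t_upper_tail((m - 31.0) / se, n - 1))
```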

Conclusion:

Hypothesis testing is just another tool for analyzing data. It cannot be relied upon by itself as proof or disproof of an assumption. It may not even be practical. Independent verification of the test result using other methods is always necessary.

“Absence of evidence is not the same as evidence of absence.” - Carl Sagan
