1 URBDP 591 A Lecture 12: Statistical Inference Objectives Sampling Distribution Principles of...

1

URBDP 591 A Lecture 12: Statistical Inference

Objectives

• Sampling Distribution

• Principles of Hypothesis Testing

• Statistical Significance

2

Inferential Statistics

• Based on probability theory.

• The question that inferential statistics answers is whether the difference between the sample results and the population results is too great to be due to chance alone.

3

Sampling Distributions

• A sampling distribution is the probability distribution of sample statistics.

• A sampling distribution involves sample statistics that are distributed around a population parameter.

• The central limit theorem:

– (1) the sampling distribution will approximate a normal distribution,

– (2) the mean of the sampling distribution will be equal to the population parameter.

4

The Central Limit Theorem

• For any population with finite variance, whatever its shape or form, the distribution of means of samples of a constant size taken from that population will be approximately normal if the sample size is sufficiently large (as a rule of thumb, at least 30).

• The standard deviation of the sampling distribution, or standard error, is equal to the standard deviation of the population divided by the square root of the sample size: $\sigma_{\bar{X}} = \sigma / \sqrt{n}$.
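A small simulation can make the two statements above concrete. The sketch below is only an illustration: the skewed (exponential) population, the seed, and the names `sample_means`, `observed_se` and `predicted_se` are assumptions chosen for the example, not part of the lecture.

```python
import random
import statistics


def sample_means(population, n, draws, rng):
    """Draw `draws` samples of size n (with replacement) and return their means."""
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(draws)]


rng = random.Random(591)  # fixed seed so the sketch is repeatable

# A deliberately skewed population; the exponential shape is an arbitrary choice.
population = [rng.expovariate(1.0) for _ in range(20_000)]
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

n = 30
means = sample_means(population, n, draws=5_000, rng=rng)

observed_se = statistics.pstdev(means)  # spread of the simulated sample means
predicted_se = sigma / n ** 0.5         # standard error: sigma / sqrt(n)

print(f"population mean {mu:.3f}, mean of sample means {statistics.fmean(means):.3f}")
print(f"observed SE {observed_se:.3f}, predicted SE {predicted_se:.3f}")
```

The mean of the sample means should sit close to the population mean, and the observed spread should sit close to the predicted standard error, even though the population itself is far from normal.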

9

Hypothesis testing

• Step 1: Set up hypothesis

– you should determine whether it is a 1-tailed or 2-tailed test

• Step 2: Compute test statistics

• Step 3: Determine p-value of the test statistic

– for a pre-determined α, you can find the corresponding critical limit

• Step 4: Draw conclusion

– reject H0 if p-value < α (i.e., the test statistic is beyond the critical limit)

– do not reject H0 if p-value > α (i.e., the test statistic is within the critical limit)

11

B-IBI = 60 (no change despite development)

B-IBI 60

15

Test Statistics

Very general statement:

Test statistic = (obtained difference) / (difference expected by chance)

Basic form of all tests of significance:

Test statistic = (sample statistic – hypothesized population parameter) / (standard error of the distribution of the test statistic)

e.g., z-score for use with sample means:

$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$

16

Hypothesis tests

Test statistic = (discrepancy in some measure from what is expected under H0) / (sampling variation in the measure)

e.g.

t = (difference between two sample means) / (standard error of the difference)

F = (variance of group means) / (sampling variance within groups)

$\chi^2$ = (difference between observed and expected counts) / (variability in counts)

17

Interpreting p-values

• The p-value quantifies the role of chance: it is the probability of a result at least as extreme as the one observed if H0 is true

• Large p-value

– Result may be due to chance

• Small p-value

– Result unlikely to be due to chance

– Conclude that a statistically significant difference exists

18

The logic of statistical testing

1. Assume the observed effect is due to chance. (This is the null hypothesis, written H0.)

2. Find the probability of obtaining an effect at least as large as the one observed when H0 is true. (This is the p-value.)

3. If p is small, it is implausible that the effect is due to chance and we reject H0. (We call this result statistically significant.)

4. If p is large, the effect could be due to chance and we retain H0 as plausible. (We call this result not statistically significant.)
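This logic can be mimicked directly by simulation: generate many samples in a world where H0 is true and count how often chance alone produces an effect at least as large as the observed one. The sketch below borrows the figures from the single-mean example in this lecture (hypothesized mean 8, S.D. 4, n = 100, observed mean 9); the function name `simulated_p_value`, the seed, and the number of simulations are assumptions made for illustration.

```python
import random
import statistics


def simulated_p_value(mu0, sigma, n, observed_mean, sims, rng):
    """Estimate a two-sided p-value by simulating sample means under H0."""
    observed_effect = abs(observed_mean - mu0)
    hits = 0
    for _ in range(sims):
        # Step 1: assume H0, so every observation comes from N(mu0, sigma^2).
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        # Step 2: does chance alone give an effect at least as large as observed?
        if abs(statistics.fmean(sample) - mu0) >= observed_effect:
            hits += 1
    return hits / sims


rng = random.Random(12)  # fixed seed so the sketch is repeatable
p_sim = simulated_p_value(mu0=8.0, sigma=4.0, n=100,
                          observed_mean=9.0, sims=4000, rng=rng)
print(f"simulated p-value: {p_sim:.4f}")
```

A simulated p-value this small means that an effect of this size is rare when only chance is at work, which is the basis for rejecting H0 in step 3.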

19

Drawing conclusion for one-tailed tests

20

Drawing conclusion for two-tailed tests

21

A single population mean

– Suppose we want to study the effect of development on bird species richness at a random sample of sites (n = 100).

– We measure species richness (X) after development, and mean species richness is 9.

– Assume X follows a Normal distribution with a S.D. of 4.

22

Step 1: Set up hypothesis

H0: $\mu = 8$

H1: $\mu \neq 8$

This is a 2-tailed test.

23

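Step 2: Compute test statistics

$\bar{x} = (x_1 + x_2 + \dots + x_{100}) / 100 = 9$

If $x \sim N(\mu_0, \sigma^2)$, then $\bar{x} \sim N(\mu_0, \sigma^2/n)$, so

$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$

If $\sigma$ is known to be 4 and the hypothesized mean is $\mu_0 = 8$, then the test statistic is

$z = (9.0 - 8.0) / (4.0 / \sqrt{100}) = 2.5$

Under H0, z follows a Normal distribution with mean 0 and variance 1.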
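The arithmetic for the test statistic in Step 2 can be checked with a few lines of Python. This is a minimal sketch; the function name `z_statistic` and its parameter names are chosen here for illustration.

```python
import math


def z_statistic(sample_mean, mu0, sigma, n):
    """z = (sample mean - hypothesized mean) / (sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu0) / standard_error


# Values from the bird species richness example: mean 9, H0 mean 8, S.D. 4, n = 100.
z = z_statistic(sample_mean=9.0, mu0=8.0, sigma=4.0, n=100)
print(round(z, 2))  # 2.5
```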

24

Step 3: Determine p-value

• For $\alpha = 0.05$, $Z_{0.05/2} = 1.96$ (from Normal table)

Since z = 2.5 > 1.96, the p-value is < 0.05

• For $\alpha = 0.01$, $Z_{0.01/2} = 2.58$ (from Normal table)

Since z = 2.5 < 2.58, the p-value is > 0.01
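Instead of a Normal table, the critical values and the p-value itself can be obtained from Python's standard library. The sketch below assumes the two-tailed z-test of this example with z = 2.5; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
z = 2.5                    # test statistic from the single-mean example

# Two-tailed p-value: probability of |Z| >= |z| when H0 is true.
p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))
print(f"p-value = {p_two_sided:.4f}")

decisions = {}
for alpha in (0.05, 0.01):
    critical = std_normal.inv_cdf(1 - alpha / 2)  # Z_{alpha/2}
    reject_by_critical = abs(z) > critical
    reject_by_p = p_two_sided < alpha
    decisions[alpha] = (reject_by_critical, reject_by_p)
    print(f"alpha = {alpha}: critical value {critical:.2f}, "
          f"reject H0 = {reject_by_p}")
```

The critical-value rule and the p-value rule always give the same decision, because both compare the same tail area with α.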

25

Step 4: Draw conclusion

• We reject H0 at the 5% level as p-value < 0.05 and conclude that bird species richness is significantly different from 8 at the 5% significance level.

• Notice that we cannot reject H0 at the 1% level as p-value > 0.01, so bird species richness is not statistically different from 8 at the 1% significance level.

26

Difference between two population means

– Suppose we want to study the effect of two development patterns, A and B, on bird species richness. We randomly select 52 sites, which will be developed with either high density development (development type A) or low density development (development type B).

– We measure species richness (X) after development. The means for treatments A and B are 9 and 8 respectively.

– Assume a Normal distribution, with S.D.s for treatments A and B of 4 and 4.5.

27

Step 1: Set up hypothesis

H0: $\mu_A = \mu_B$

H1: $\mu_A \neq \mu_B$

or

H0: $\mu_A - \mu_B = 0$

H1: $\mu_A - \mu_B \neq 0$

28

Step 2: Compute test statistics

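$SE(\bar{x}_A - \bar{x}_B) = \sqrt{\sigma_A^2/n_A + \sigma_B^2/n_B}$

$= \sqrt{4^2/50 + 4.5^2/50} = 0.851$ (with $n_A = n_B = 50$ sites per group, the sample size implied by this standard error)

Test statistic is:

$z = [(\bar{x}_A - \bar{x}_B) - (\mu_A - \mu_B)] / SE(\bar{x}_A - \bar{x}_B)$

$= [(9 - 8) - 0] / 0.851$

$= 1.18$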
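The two-sample calculation can be sketched in Python as follows. The group means (9 and 8) and S.D.s (4 and 4.5) come from the example; the group sizes of 50 each are an assumption, chosen because they reproduce z = 1.18, and the function name `two_sample_z` is illustrative.

```python
import math
from statistics import NormalDist


def two_sample_z(mean_a, mean_b, sd_a, sd_b, n_a, n_b, null_diff=0.0):
    """Return (standard error, z, two-sided p-value) for a two-sample z-test."""
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = ((mean_a - mean_b) - null_diff) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p


# n_a = n_b = 50 is an assumed group size, not stated in the example itself.
se, z, p = two_sample_z(mean_a=9.0, mean_b=8.0, sd_a=4.0, sd_b=4.5,
                        n_a=50, n_b=50)
print(f"SE = {se:.3f}, z = {z:.2f}, p-value = {p:.3f}")
```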

29

Step 3: Determine p-value

• For $\alpha = 0.05$, $Z_{0.05/2} = 1.96$

Since z = 1.18 < 1.96, the p-value is > 0.05

Step 4: Draw conclusion

– We do not reject H0 at the 5% level as p-value > 0.05, and conclude that bird species richness under the two treatments is not statistically different at the 5% significance level. In other words, the effects of the two treatments are not statistically different.

30

One-tailed vs. two-tailed tests

– If there is an effect, the effect may either be positive (get better) or negative (get worse).

– A two-tailed test studies the existence of the effect in either direction (i.e., a positive or negative effect).

– A one-tailed test studies the existence of the effect in one direction (e.g., a positive effect).

– A one-tailed test is appropriate provided that we have a priori knowledge about the direction of the effect.

– This is called a directional hypothesis.
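The practical difference between the two kinds of test shows up in the p-value. The sketch below assumes a z-test and reuses z = 2.5 from the single-mean example; the function name `z_test_p_value` and the labels `"two-sided"`, `"greater"` and `"less"` are naming choices made for this illustration.

```python
from statistics import NormalDist


def z_test_p_value(z, alternative="two-sided"):
    """p-value of a z statistic for a two-tailed or a one-tailed alternative."""
    phi = NormalDist().cdf
    if alternative == "two-sided":  # effect in either direction
        return 2 * (1 - phi(abs(z)))
    if alternative == "greater":    # directional: positive effect only
        return 1 - phi(z)
    if alternative == "less":       # directional: negative effect only
        return phi(z)
    raise ValueError(f"unknown alternative: {alternative!r}")


z = 2.5
p_two = z_test_p_value(z, "two-sided")
p_one = z_test_p_value(z, "greater")
print(f"two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
```

For an effect in the hypothesized direction, the one-tailed p-value is half the two-tailed one, which is why the direction has to be fixed from prior knowledge and not chosen after seeing the data.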

31

Statistical significance vs. practical importance

• If the sample size is unnecessarily large...

– Differences may be established as statistically significant, and yet be too small to have any practical consequences.

• The optimum sample size…

– is just large enough to detect differences of a size which the researcher believes to be of practical importance. This firstly involves a professional assessment of how large a difference is important, followed by a power analysis to determine the required sample size.
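A power analysis of this kind can be sketched for the simplest case, a two-tailed one-sample z-test with known S.D. The formula used, n = ((z_{α/2} + z_β) σ / δ)², is a common textbook approximation and is not given in the lecture; the function name `required_sample_size`, the 80% power target, and the example difference of 1 species are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def required_sample_size(delta, sigma, alpha=0.05, power=0.80):
    """Approximate n needed to detect a true difference `delta` from the
    hypothesized mean with a two-tailed one-sample z-test (known sigma)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value Z_{alpha/2}
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


# Illustrative inputs: S.D. 4 (as in the example) and a difference of 1 species
# judged to be of practical importance.
n_needed = required_sample_size(delta=1.0, sigma=4.0)
print(n_needed)
```

Halving the difference that counts as important roughly quadruples the required sample size, which is why the professional judgement about the size of an important difference has to come first.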
