CHP400: Community Health Program - II. Research Methodology. Data analysis: Hypothesis testing, Statistical Inference, t-test and χ² test (Test of Significance)


Content

- Definition of hypothesis
- Statistical hypotheses
- Hypothesis testing
- Performing the t-test
- Interpretation of the t-test
- The contingency table
- Performing the chi-square test
- The assumptions / limitations of the chi-square test
- Causal associations

Statistical Hypothesis: Definition

A tentative explanation for an observation, phenomenon, or scientific problem that can be tested by further investigation.

Something taken to be true for the purpose of argument or investigation.

A mere assumption or guess.

Statistical Hypotheses

The best way to determine whether a statistical hypothesis is true would be to examine the entire population.

Since that is often impractical, researchers typically examine a random sample from the population.

If the sample data are not consistent with the statistical hypothesis, the hypothesis is rejected.

Statistical Hypothesis testing

Hypothesis testing is the use of statistics to judge whether sample data are consistent with a given hypothesis.

Hypothesis testing consists of four steps.

1- Null hypothesis: H0 and Alternative hypothesis: HA

2- Test statistic

3- Compute the "P-value"

4- Compare the P-value to an acceptable significance value (α)

1- Null hypothesis: H0 and Alternative hypothesis: HA

Formulate the null hypothesis (commonly, that the observations are "not different" or "not associated") and the alternative hypothesis (commonly, that the observations show a "real difference" or "association").

2- Test statistic

Identify a statistical test that can be used to assess the null hypothesis (χ² test, t-test, etc.), and calculate the test statistic.

3- Compute the "P-value"

The P-value is the probability of obtaining a test statistic equal to the observed value, or more extreme, assuming that the null hypothesis is true. The smaller the P-value, the stronger the evidence against the null hypothesis.

4- Compare the P-value to an acceptable significance value (α)

Compare the P-value to an acceptable significance value (sometimes called an alpha value). If P ≤ α, the observed effect is statistically significant: the null hypothesis is rejected and the alternative hypothesis is supported.
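Step 4 amounts to a single comparison. The following is a minimal Python sketch of that decision rule; the function name `decide` and the default α = 0.05 are illustrative choices, not something the slides prescribe.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the step-4 decision for a given P-value and significance level."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"


print(decide(0.01))  # P-value below alpha
print(decide(0.20))  # P-value above alpha
```

Note that the second outcome is worded as "fail to reject H0": a large P-value does not show that the null hypothesis is true.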

The t-test (Test of Significance)

1. Test for a single mean: Is the sample mean equal to the predefined population mean?

2. Test for a difference in means: Is the mean energy expenditure in obese women equal to the mean energy expenditure in lean women?

3. Test for paired observations: Did the treatment confer any significant benefit?

The t-test examines the difference between means.

[Figure: the 1st group mean and the 2nd group mean. Is there a difference?]

What does difference mean?

[Figure: three panels labeled medium variability, high variability, and low variability]

The difference in means is the same for all three cases. Which one shows the greatest difference?

[Figure: the difference between means, the sample size, and the variability of the data combine in the t-test to give t]

A statistical difference is a function of the difference between means relative to the variability:

$$ t\text{-value} = \frac{\text{difference between group means}}{\text{variability of groups}} = \frac{\bar{X}_1 - \bar{X}_2}{SE(\bar{X}_1 - \bar{X}_2)} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} $$

t-test for difference in means

Given below are the 24-hour total energy expenditures (MJ/day) in groups of lean and obese women. Examine whether the obese women's mean energy expenditure is significantly higher.

Lean: 6.1 7.2 7.5 10.9 8.5 5.5 7.6 7.9 9.1 8.1 8.3 8.4 10.2

Obese: 8.8 9.2 9.2 9.7 9.7 10.0 11.5 11.8 12.8
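The group sizes and means can be checked directly from the listed observations. This is a small sketch using Python's standard `statistics` module; the variable names `lean` and `obese` are illustrative. The standard deviations it prints come from the rounded values listed here, so they need not match a summary computed from unrounded data.

```python
from statistics import mean, stdev

# 24-hour total energy expenditure (MJ/day), copied from the slide
lean = [6.1, 7.2, 7.5, 10.9, 8.5, 5.5, 7.6, 7.9, 9.1, 8.1, 8.3, 8.4, 10.2]
obese = [8.8, 9.2, 9.2, 9.7, 9.7, 10.0, 11.5, 11.8, 12.8]

for name, values in (("Lean", lean), ("Obese", obese)):
    # stdev() is the sample standard deviation (divides by n - 1)
    print(f"{name}: n = {len(values)}, mean = {mean(values):.2f}, SD = {stdev(values):.2f}")
```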

Solution

Null hypothesis (H0): Obese women's mean energy expenditure is equal to the lean women's mean energy expenditure.

Data Summary

             Lean     Obese
N            13       9
Mean (X̄)     8.10     10.30
S            1.38     1.25

H0: μ1 - μ2 = 0

HA: μ1 - μ2 ≠ 0

α = 0.05

df = 13 + 9 - 2 = 20

Critical value(s): read from the t table at α = 0.05 (2 tail) and df = 20.

Critical values of t, by α (2 tail, columns) and df (rows)

df     0.1       0.05      0.02      0.01      0.005     0.002     0.001
1      6.3138    12.7065   31.8193   63.6551   127.3447  318.4930  636.0450
2      2.9200    4.3026    6.9646    9.9247    14.0887   22.3276   31.5989
3      2.3534    3.1824    4.5407    5.8408    7.4534    10.2145   12.9242
4      2.1319    2.7764    3.7470    4.6041    5.5976    7.1732    8.6103
17     1.7396    2.1098    2.5669    2.8983    3.2224    3.6458    3.9651
18     1.7341    2.1009    2.5524    2.8784    3.1966    3.6105    3.9216
19     1.7291    2.0930    2.5395    2.8609    3.1737    3.5794    3.8834
20     1.7247    2.0860    2.5280    2.8454    3.1534    3.5518    3.8495
21     1.7207    2.0796    2.5176    2.8314    3.1352    3.5272    3.8193
49     1.6766    2.0096    2.4049    2.6800    2.9397    3.2651    3.5004
50     1.6759    2.0086    2.4033    2.6778    2.9370    3.2614    3.4960
51     1.6753    2.0076    2.4017    2.6757    2.9343    3.2579    3.4917
106    1.6593    1.9826    2.3620    2.6230    2.8670    3.1689    3.3847
107    1.6592    1.9824    2.3617    2.6225    2.8664    3.1681    3.3838
149    1.6551    1.9760    2.3516    2.6092    2.8494    3.1458    3.3570
150    1.6551    1.9759    2.3515    2.6090    2.8491    3.1455    3.3565
151    1.6550    1.9758    2.3513    2.6088    2.8489    3.1451    3.3561
199    1.6525    1.9720    2.3452    2.6008    2.8387    3.1317    3.3401
200    1.6525    1.9719    2.3451    2.6007    2.8385    3.1315    3.3398

Critical value: from the t table, with df = 9 + 13 - 2 = 20 and α = 0.05 (2 tail), t(0.05, 20) = 2.086. H0 is rejected if |t| > 2.086.

Calculating the test statistic:

$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)} $$

$S_p^2$: pooled variance

$S_1^2$: variance of sample 1

$S_2^2$: variance of sample 2

$n_1$: size of sample 1

$n_2$: size of sample 2

First, estimate the common variance as a weighted average of the two sample variances, using the degrees of freedom as weights:

$$ S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{1.38^2\,(13 - 1) + 1.25^2\,(9 - 1)}{(13 - 1) + (9 - 1)} = 1.767 $$
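The pooled-variance arithmetic can be reproduced in a few lines. This is a sketch only: the function name `pooled_variance` and its parameter names are illustrative, and the inputs are the standard deviations and sample sizes from the data summary.

```python
def pooled_variance(s1: float, n1: int, s2: float, n2: int) -> float:
    """Weighted average of two sample variances, weighted by degrees of freedom."""
    df1, df2 = n1 - 1, n2 - 1
    return (df1 * s1**2 + df2 * s2**2) / (df1 + df2)


# Lean group: S = 1.38, n = 13; obese group: S = 1.25, n = 9
sp2 = pooled_variance(s1=1.38, n1=13, s2=1.25, n2=9)
print(f"pooled variance = {sp2:.4f}")
```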

Then substitute the pooled variance and the sample means into the formula for t:

$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{8.1 - 10.3}{\sqrt{1.767\left(\frac{1}{13} + \frac{1}{9}\right)}} = -3.82, \qquad |t| = 3.82 $$
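As a check on the substitution, here is a short Python sketch of the same calculation. The function name `two_sample_t` is illustrative; the inputs are the summary values used in the worked example, with the lean group as sample 1.

```python
from math import sqrt


def two_sample_t(mean1: float, mean2: float, sp2: float, n1: int, n2: int) -> float:
    """Pooled-variance t statistic for the difference between two means."""
    standard_error = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error


t = two_sample_t(mean1=8.1, mean2=10.3, sp2=1.767, n1=13, n2=9)
print(f"t = {t:.2f}, |t| = {abs(t):.2f}")
```

The sign of t only reflects which group is labeled sample 1; the comparison with the two-tailed critical value uses |t|.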

Inference: The calculated |t| (3.82) is higher than the table t (2.086 at α = 0.05, df = 20). This is evidence that the mean energy expenditure in the obese group is significantly (p < 0.05) higher than that in the lean group.
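Instead of looking up a critical value, the two-sided P-value can be approximated numerically. The sketch below integrates the Student t density with Simpson's rule using only the standard library. This is an illustrative choice of method, not the procedure used in the slides, and the function names `t_pdf` and `two_sided_p` are hypothetical; a statistics package would normally be used.

```python
from math import gamma, pi, sqrt


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided P-value via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * central_area


p = two_sided_p(3.82, df=20)
print(f"two-sided p = {p:.4f}; significant at 0.05: {p < 0.05}")
```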

χ² test (Test of Significance)

Chi-square test

A physician wants to know whether the proportion of esophageal cancer diagnosed as multiple primary tumors (MPT) differs between females and males. He selected a random sample of 100 cases of esophageal cancer; 60 were male and 40 were female. Of the 60 male esophageal cancer patients, 40 were diagnosed as MPT. Of the 40 female esophageal cancer patients, 10 were diagnosed as MPT.

Out of 30 women who had uterine cancer, 20 claimed to have used estrogens. Out of 30 women without uterine cancer, 10 claimed to have used estrogens.

Exposure (estrogen)    Outcome (cancer)     Total
                       Yes       No
Yes                    20        10         30
No                     10        20         30
Total                  30        30         60
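The marginal totals of the contingency table follow from the four cell counts. This minimal sketch stores the counts from the estrogen example in a nested dictionary; that layout is an illustrative choice.

```python
# Rows: estrogen exposure; columns: uterine cancer (counts from the example)
observed = {
    "Yes": {"Yes": 20, "No": 10},
    "No": {"Yes": 10, "No": 20},
}

row_totals = {exposure: sum(cells.values()) for exposure, cells in observed.items()}
col_totals = {
    outcome: sum(cells[outcome] for cells in observed.values())
    for outcome in ("Yes", "No")
}
grand_total = sum(row_totals.values())

print("Row totals:", row_totals)
print("Column totals:", col_totals)
print("Grand total:", grand_total)
```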

Steps

A: Calculating the χ² value

1. Calculate the expected frequency (E) for each cell:

$$ E = \frac{\text{row total} \times \text{column total}}{\text{total number of observations}} $$

2. For each cell, subtract the expected frequency from the observed frequency (O):

$$ O - E $$

3. For each cell, square the result of (O - E) and divide by the expected frequency (E):

$$ \frac{(O - E)^2}{E} $$

Observed frequencies, with expected frequencies in parentheses:

Exposure (estrogen)    Outcome (cancer)       Total
                       Yes        No
Yes                    20 (15)    10 (15)     30
No                     10 (15)    20 (15)     30
Total                  30         30          60
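The expected frequencies in parentheses come from the rule "row total x column total / total number of observations". This is a minimal sketch; the function name `expected_counts` and the list-of-rows layout are illustrative.

```python
def expected_counts(observed):
    """Expected frequency per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: estrogen use (yes, no); columns: uterine cancer (yes, no)
observed = [[20, 10], [10, 20]]
print(expected_counts(observed))  # [[15.0, 15.0], [15.0, 15.0]]
```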

4. Add the results of step (3) for all the cells.

Formula:

$$ \chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(20 - 15)^2}{15} + \frac{(10 - 15)^2}{15} + \frac{(10 - 15)^2}{15} + \frac{(20 - 15)^2}{15} = 1.67 + 1.67 + 1.67 + 1.67 = 6.68 $$

(6.67 if the terms are not rounded before summing.)

B: Deciding the p value

1. Calculate the degrees of freedom, (r - 1)(c - 1):

df = (2 - 1)(2 - 1) = 1 x 1 = 1

2. Look up the χ² table.
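Steps 1 to 4 and the degrees of freedom can be combined into one short function. This is a sketch under the same table layout as the example (rows for exposure, columns for outcome); the function name `chi_square` is illustrative, and no continuity correction is applied, matching the hand calculation.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # step 1: expected frequency
            statistic += (o - e) ** 2 / e          # steps 2-4: sum of (O - E)^2 / E
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


stat, df = chi_square([[20, 10], [10, 20]])
print(f"chi-square = {stat:.2f}, df = {df}")
```

Without intermediate rounding the statistic is 6.67 rather than 6.68; the conclusion is unchanged.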

Critical values of χ², by upper-tail probability (columns) and df (rows)

df     0.10      0.05      0.025     0.01      0.005
1      2.706     3.841     5.024     6.635     7.879
2      4.605     5.991     7.378     9.210     10.597
3      6.251     7.815     9.348     11.345    12.838
4      7.779     9.488     11.143    13.277    14.860
22     30.813    33.924    36.781    40.289    42.796
23     32.007    35.172    38.076    41.638    44.181
24     33.196    36.415    39.364    42.980    45.559
30     40.256    43.773    46.979    50.892    53.672
40     51.805    55.758    59.342    63.691    66.766
50     63.167    67.505    71.420    76.154    79.490
60     74.397    79.082    83.298    88.379    91.952
100    118.498   124.342   129.561   135.807   140.169

Inference: The calculated χ² (6.68) is higher than the table χ² (3.841 at α = 0.05, df = 1). This is evidence of an association between the use of estrogens and uterine cancer.
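For a 2 x 2 table (df = 1) the P-value can also be computed directly, because a chi-square variable with 1 degree of freedom is the square of a standard normal variable. The sketch below uses that identity with `math.erfc`; the function name `chi2_p_value_df1` is illustrative, and the shortcut applies to df = 1 only.

```python
from math import erfc, sqrt


def chi2_p_value_df1(chi2: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # P(Z^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z
    return erfc(sqrt(chi2 / 2))


p = chi2_p_value_df1(6.68)
print(f"p = {p:.4f}; significant at 0.05: {p < 0.05}")
```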

Assumptions / Limitations

- Data are from a random sample.
- A sufficiently large sample size is required (at least 20).
- Actual count data (not percentages).
- All expected counts are ≥ 1, and at least 80% of expected counts are ≥ 5.
- Observations must be independent.
- Does not prove causality.
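The rule on expected counts is easy to check before running the test. This is a minimal sketch; the function name `expected_counts_ok` and its parameter names are illustrative, with defaults taken from the thresholds listed above.

```python
def expected_counts_ok(expected, minimum=1, usual=5, share=0.8):
    """True if every expected count is >= minimum and at least `share` are >= usual."""
    cells = [e for row in expected for e in row]
    all_above_minimum = all(e >= minimum for e in cells)
    fraction_usual = sum(e >= usual for e in cells) / len(cells)
    return all_above_minimum and fraction_usual >= share


print(expected_counts_ok([[15, 15], [15, 15]]))       # True
print(expected_counts_ok([[0.5, 9.5], [4.5, 85.5]]))  # False: one expected count is below 1
```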

Causal Associations

Relationship between variables
- Not statistically associated
- Statistically associated
  - Non-causal
  - Causal
    - Indirectly causal
    - Directly causal