Statistical Significance for a two-way table

Inference for a two-way table

• We often gather data and arrange them in a two-way table to see if two categorical variables are related to each other.

• Look for an association between the row and column variables.

• Is the association in the sample evidence of an association between these variables in the entire population?

• Or could the sample association easily arise just from the error in random sampling?

Example: Aspirin and Heart Attacks

| Group | Heart Attack | No Heart Attack | Total | Heart Attacks (%) | Rate per 1000 |
|---|---|---|---|---|---|
| Aspirin | 104 | 10,933 | 11,037 | 0.94 | 9.4 |
| Placebo | 189 | 10,845 | 11,034 | 1.71 | 17.1 |
| Total | 293 | 21,778 | 22,071 | | |

Aspirin Group: Percentage who had heart attacks = 0.94%

Placebo Group: Percentage who had heart attacks = 1.71%

Difference: only 1.71% – 0.94% = 0.77%

• Are we convinced by the data that there is a real relationship in the population between taking aspirin and risk of heart attack?

• Need to assess if the relationship is statistically significant.

• Experiment included over 22,000 men, so even a small difference could be statistically significant.
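The percentages and rates per 1000 in the aspirin example can be reproduced from the raw counts. The following is a minimal sketch; the dictionary layout and the helper name `attack_percentage` are illustrative assumptions, not part of the original slides.

```python
# Counts taken from the aspirin example: heart attacks and group totals.
counts = {
    "Aspirin": {"heart_attack": 104, "total": 11037},
    "Placebo": {"heart_attack": 189, "total": 11034},
}


def attack_percentage(group):
    """Percentage of a group that had a heart attack (hypothetical helper)."""
    return 100 * group["heart_attack"] / group["total"]


percentages = {name: attack_percentage(g) for name, g in counts.items()}
# A rate per 1000 is ten times the percentage.
rates_per_1000 = {name: 10 * pct for name, pct in percentages.items()}
difference = percentages["Placebo"] - percentages["Aspirin"]

for name in counts:
    print(f"{name}: {percentages[name]:.2f}% ({rates_per_1000[name]:.1f} per 1000)")
print(f"Difference: {difference:.2f} percentage points")
```

The difference is small in absolute terms, which is why its significance has to be judged against the very large sample size.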

Example: Ease of Pregnancy for Smokers and Nonsmokers

Difference in percentage pregnant in the first cycle (nonsmokers vs. smokers): 41% – 29% = 12%

Larger difference, but based on only 586 subjects. Convincing?

Step 1: Stating the Hypotheses

Example 1: Aspirin and Heart Attacks

Null Hypothesis: There is no relationship between taking aspirin and risk of heart attack in the population.

Alternative Hypothesis: There is a relationship between taking aspirin and risk of heart attack in the population.

Example 2: Ease of Pregnancy and Smoking

Null Hypothesis: Smokers and nonsmokers are equally likely to get pregnant in the 1st cycle in the population of women trying to get pregnant.

Alternative Hypothesis: Smokers and nonsmokers are not equally likely to get pregnant in the 1st cycle in the population of women trying to get pregnant.
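In symbols, the hypotheses for Example 2 can be written as a comparison of two population proportions. The notation below is an assumption introduced for illustration: p_smoker and p_nonsmoker stand for the population proportions of smokers and nonsmokers who get pregnant in the 1st cycle.

```latex
H_0:\; p_{\text{smoker}} = p_{\text{nonsmoker}}
\qquad\text{versus}\qquad
H_a:\; p_{\text{smoker}} \neq p_{\text{nonsmoker}}
```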

The chi-square test

To see if the data give evidence against the null hypothesis of "no relationship," compare the counts in the two-way table with the counts we would expect if there really were no relationship.

If the observed counts are far from the expected counts, that's the evidence we were seeking. The test uses a statistic that measures how far apart the observed and expected counts are.

• The chi-square statistic is a sum of terms, one for each cell in the table.

• Because chi-square measures how far the observed counts are from what would be expected if the null hypothesis were true, large values are evidence against the null hypothesis.

• The sampling distribution of the chi-square statistic is not a Normal distribution. It is a right-skewed distribution that allows only nonnegative values because chi-square can never be negative.

Expected count = (row total) × (column total) / (table total)
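The expected-count formula can be applied to every cell of a table at once. The sketch below uses the observed counts from the aspirin example; the function name `expected_counts` and the list-of-rows layout are assumptions made for illustration.

```python
def expected_counts(observed):
    """Expected count per cell: (row total) * (column total) / (table total)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]


# Observed counts from the aspirin example.
# Rows: Aspirin, Placebo. Columns: Heart Attack, No Heart Attack.
observed = [[104, 10933], [189, 10845]]
expected = expected_counts(observed)

for label, row in zip(["Aspirin", "Placebo"], expected):
    print(label, [round(x, 2) for x in row])
```

The expected counts keep the same row and column totals as the observed table, so in a 2×2 table only one cell needs to be computed from the formula and the rest follow by subtraction.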

Step 2: Collect data and summarize with a ‘test statistic’.

Chi-square statistic: compares data in sample to what would be expected if no relationship between variables in the population.

Step 3: Determine how unlikely test statistic would be if the null hypothesis were true.

p-value: probability of observing a test statistic as extreme as the one observed or more so, if the null hypothesis is really true. (For chi-square: more extreme = larger value of chi-square statistic.)

Step 4: Make a decision.

For a 2×2 table, if the chi-square statistic is at least 3.84, the p-value is 0.05 or less, so conclude the relationship in the population is real. That is, we reject the null hypothesis and conclude the relationship is statistically significant.
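The link between the 3.84 cutoff and a p-value of 0.05 can be checked numerically. This sketch assumes a table with two rows and two columns, so the chi-square distribution has 1 degree of freedom; in that case the upper-tail probability equals erfc(sqrt(x / 2)). The function names are illustrative.

```python
import math


def chi_square_p_value_df1(statistic):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom.

    Valid only for 2x2 tables (an assumption of this sketch).
    """
    return math.erfc(math.sqrt(statistic / 2))


def is_significant(statistic, alpha=0.05):
    """Step 4: reject the null hypothesis when the p-value is alpha or less."""
    return chi_square_p_value_df1(statistic) <= alpha


# The p-value at the 3.84 cutoff is about 0.05.
print(round(chi_square_p_value_df1(3.84), 3))
```

Larger statistics give smaller p-values, which is why "more extreme" means "larger" for the chi-square test.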

Ease of Pregnancy and Smoking

| Pregnancy Occurred After | First Cycle | Two or More Cycles | Total | Percentage in First |
|---|---|---|---|---|
| Smoker | 29 | 71 | 100 | 29% |
| Nonsmoker | 198 | 288 | 486 | 41% |
| Total | 227 | 359 | 586 | 38.7% |

1. Compute the expected numbers.

Expected number of smokers pregnant after 1st cycle: (100)(227)/586 = 38.74

Can find the remaining expected numbers by subtraction.

| Pregnancy Occurred After | First Cycle | Two or More Cycles | Total |
|---|---|---|---|
| Smoker | 38.74 | 100 – 38.74 = 61.26 | 100 |
| Nonsmoker | 227 – 38.74 = 188.26 | 486 – 188.26 = 297.74 | 486 |
| Total | 227 | 359 | 586 |

Example 2: Ease of Pregnancy and Smoking

2. Compare Observed and Expected counts.

(observed count – expected count)^2 / (expected count)

First cell: (29 – 38.74)^2 / (38.74) = 2.45

Remaining cells shown in the tables below.

Observed counts (expected counts in parentheses):

| Pregnancy Occurred After | First Cycle | Two or More Cycles | Total |
|---|---|---|---|
| Smoker | 29 (38.74) | 71 (61.26) | 100 |
| Nonsmoker | 198 (188.26) | 288 (297.74) | 486 |
| Total | 227 | 359 | 586 |

Contribution of each cell to the chi-square statistic:

| Pregnancy Occurred After | First Cycle | Two or More Cycles |
|---|---|---|
| Smoker | 2.45 | 1.55 |
| Nonsmoker | 0.50 | 0.32 |

3. Compute the chi-square statistic.

chi-square statistic = 2.45 + 1.55 + 0.50 + 0.32 = 4.82

What is your conclusion?
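The three computation steps for the pregnancy example can be sketched in a few lines. The variable names are illustrative assumptions; the counts come from the observed table, and the 3.84 cutoff is the one given in Step 4.

```python
# Observed counts. Rows: Smoker, Nonsmoker.
# Columns: First Cycle, Two or More Cycles.
observed = [[29, 71], [198, 288]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
table_total = sum(row_totals)

# 1. Expected counts: (row total) * (column total) / (table total).
expected = [[r * c / table_total for c in col_totals] for r in row_totals]

# 2. Per-cell contributions: (observed - expected)^2 / expected.
contributions = [
    [(obs - exp) ** 2 / exp for obs, exp in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

# 3. The chi-square statistic is the sum over all cells.
chi_square = sum(sum(row) for row in contributions)
print(f"chi-square statistic = {chi_square:.2f}")

# Step 4 decision rule for a 2x2 table.
if chi_square >= 3.84:
    print("Reject the null hypothesis: the relationship is statistically significant.")
else:
    print("Do not reject the null hypothesis.")
```

Since 4.82 is larger than 3.84, the sketch takes the first branch, which matches the decision rule stated in Step 4.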

Minitab Results for Example: Ease of Pregnancy and Smoking

P-Value = 0.028

Example: Aspirin and Heart Attacks

| Group | Heart Attack | No Heart Attack | Total | Heart Attacks (%) | Rate per 1000 |
|---|---|---|---|---|---|
| Aspirin | 104 | 10,933 | 11,037 | 0.94 | 9.4 |
| Placebo | 189 | 10,845 | 11,034 | 1.71 | 17.1 |
| Total | 293 | 21,778 | 22,071 | | |

Chi-square statistic = 25.01: highly statistically significant, with p-value < 0.00001.
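The aspirin result can be checked with the shortcut form of the chi-square statistic for a 2×2 table, n(ad – bc)^2 / [(a+b)(c+d)(a+c)(b+d)], which is algebraically equivalent to summing (observed – expected)^2 / expected over the four cells. This is a sketch under the assumption that no continuity correction is applied; the function name `chi_square_2x2` is illustrative, and the p-value uses the 1-degree-of-freedom identity erfc(sqrt(x / 2)).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: Aspirin, Placebo. Columns: Heart Attack, No Heart Attack.
statistic = chi_square_2x2(104, 10933, 189, 10845)

# Upper-tail probability for 1 degree of freedom.
p_value = math.erfc(math.sqrt(statistic / 2))

print(f"chi-square statistic = {statistic:.2f}")
print(f"p-value below 0.00001: {p_value < 0.00001}")
```

Even though the difference in heart attack rates is only 0.77 percentage points, the large sample makes the statistic far exceed the 3.84 cutoff.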