Chapter 12
The Analysis of Categorical Data and
Goodness-of-Fit Tests
Copyright (c) 2001 Brooks/Cole, a division of Thomson Learning, Inc.
Univariate Categorical Data
Univariate categorical data are best summarized in a one-way frequency table. For example, consider the following observations of faculty status from a sample of faculty in a large university system.

| Category | Full Professor | Associate Professor | Assistant Professor | Instructor | Adjunct/Part time |
| --- | --- | --- | --- | --- | --- |
| Frequency | 22 | 31 | 25 | 35 | 41 |
Univariate Categorical Data
A local newsperson might be interested in testing hypotheses about the proportions of the population that fall in each of the categories. For example, the newsperson might want to test whether the five categories occur with equal frequency throughout the whole university system.
To deal with this type of question we need to establish some notation.
Notation
k = number of categories of a categorical variable
$\pi_1$ = true proportion for category 1
$\pi_2$ = true proportion for category 2
⋮
$\pi_k$ = true proportion for category k
(note: $\pi_1 + \pi_2 + \cdots + \pi_k = 1$)
Hypotheses
H0: $\pi_1$ = hypothesized proportion for category 1
$\pi_2$ = hypothesized proportion for category 2
⋮
$\pi_k$ = hypothesized proportion for category k
Ha: H0 is not true, so at least one of the true category proportions differs from the corresponding hypothesized value.
Expected Counts
For each category, the expected count for that category is the product of the total number of observations with the hypothesized proportion for that category.
Expected Counts - Example
Consider the sample of faculty from a large university system and recall that the newsperson wanted to test to see if each of the groups occurred with equal frequency.
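
| Category | Full Professor | Associate Professor | Assistant Professor | Instructor | Adjunct/Part time | Total |
| --- | --- | --- | --- | --- | --- | --- |
| Frequency | 22 | 31 | 25 | 35 | 41 | 154 |
| Hypothesized Proportion | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 1 |
| Expected Count | 30.8 | 30.8 | 30.8 | 30.8 | 30.8 | 154 |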
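
As a rough sketch of the expected-count rule (expected count = total number of observations times the hypothesized proportion), the faculty example can be worked in Python. The variable names are illustrative and do not come from the slides.

```python
# Observed counts from the faculty example, in the order: full professor,
# associate professor, assistant professor, instructor, adjunct/part time.
observed = [22, 31, 25, 35, 41]

# Hypothesized proportions under "all five categories equally likely".
hypothesized = [0.2, 0.2, 0.2, 0.2, 0.2]

n = sum(observed)  # total number of observations

# Expected count = total observations * hypothesized proportion.
expected = [n * p for p in hypothesized]

for obs, exp in zip(observed, expected):
    print(f"observed = {obs:3d}, expected = {exp:.1f}")
```

Each expected count comes out to 30.8, which matches the table.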
Goodness-of-fit statistic, $\chi^2$
The goodness-of-fit statistic, $\chi^2$, results from first computing the quantity

$$\frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$

for each cell.
The value of the $\chi^2$ statistic is the sum of these terms:

$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$
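
The statistic described on this slide (squared difference between observed and expected count, divided by the expected count, summed over all cells) can be sketched as a small Python function. The function name is hypothetical, and the data are the faculty counts from the earlier slide.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Faculty example: five categories, each with expected count 30.8.
observed = [22, 31, 25, 35, 41]
expected = [30.8] * 5

statistic = chi_square_statistic(observed, expected)
print(f"chi-square = {statistic:.2f}")
```

The sketch assumes every expected count is positive; a zero expected count would make the corresponding term undefined.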
Chi-square Distributions
[Figure: chi-square density curves for df = 1, 2, 3, 4, 5, 8, 10, and 15, plotted for x from 0 to 25.]
Upper-tail Areas for Chi-squared Distributions

| Right-tail area | df = 1 | df = 2 | df = 3 | df = 4 | df = 5 |
| --- | --- | --- | --- | --- | --- |
| > .100 | < 2.70 | < 4.60 | < 6.25 | < 7.77 | < 9.23 |
| 0.100 | 2.70 | 4.60 | 6.25 | 7.77 | 9.23 |
| 0.095 | 2.78 | 4.70 | 6.36 | 7.90 | 9.37 |
| 0.090 | 2.87 | 4.81 | 6.49 | 8.04 | 9.52 |
| 0.085 | 2.96 | 4.93 | 6.62 | 8.18 | 9.67 |
| 0.080 | 3.06 | 5.05 | 6.75 | 8.33 | 9.83 |
| 0.075 | 3.17 | 5.18 | 6.90 | 8.49 | 10.00 |
| 0.070 | 3.28 | 5.31 | 7.06 | 8.66 | 10.19 |
| 0.065 | 3.40 | 5.46 | 7.22 | 8.84 | 10.38 |
| 0.060 | 3.53 | 5.62 | 7.40 | 9.04 | 10.59 |
| 0.055 | 3.68 | 5.80 | 7.60 | 9.25 | 10.82 |
| 0.050 | 3.84 | 5.99 | 7.81 | 9.48 | 11.07 |
| 0.045 | 4.01 | 6.20 | 8.04 | 9.74 | 11.34 |
| 0.040 | 4.21 | 6.43 | 8.31 | 10.02 | 11.64 |
| 0.035 | 4.44 | 6.70 | 8.60 | 10.34 | 11.98 |
| 0.030 | 4.70 | 7.01 | 8.94 | 10.71 | 12.37 |
| 0.025 | 5.02 | 7.37 | 9.34 | 11.14 | 12.83 |
| 0.020 | 5.41 | 7.82 | 9.83 | 11.66 | 13.38 |
| 0.015 | 5.91 | 8.39 | 10.46 | 12.33 | 14.09 |
| 0.010 | 6.63 | 9.21 | 11.34 | 13.27 | 15.08 |
| 0.005 | 7.87 | 10.59 | 12.83 | 14.86 | 16.74 |
| 0.001 | 10.82 | 13.81 | 16.26 | 18.46 | 20.51 |
| < .001 | > 10.82 | > 13.81 | > 16.26 | > 18.46 | > 20.51 |

| Right-tail area | df = 6 | df = 7 | df = 8 | df = 9 | df = 10 |
| --- | --- | --- | --- | --- | --- |
| > .100 | < 10.64 | < 12.01 | < 13.36 | < 14.68 | < 15.98 |
| 0.100 | 10.64 | 12.01 | 13.36 | 14.68 | 15.98 |
| 0.095 | 10.79 | 12.17 | 13.52 | 14.85 | 16.16 |
| 0.090 | 10.94 | 12.33 | 13.69 | 15.03 | 16.35 |
| 0.085 | 11.11 | 12.50 | 13.87 | 15.22 | 16.54 |
| 0.080 | 11.28 | 12.69 | 14.06 | 15.42 | 16.75 |
| 0.075 | 11.46 | 12.88 | 14.26 | 15.63 | 16.97 |
| 0.070 | 11.65 | 13.08 | 14.48 | 15.85 | 17.20 |
| 0.065 | 11.86 | 13.30 | 14.71 | 16.09 | 17.44 |
| 0.060 | 12.08 | 13.53 | 14.95 | 16.34 | 17.71 |
| 0.055 | 12.32 | 13.79 | 15.22 | 16.62 | 17.99 |
| 0.050 | 12.59 | 14.06 | 15.50 | 16.91 | 18.30 |
| 0.045 | 12.87 | 14.36 | 15.82 | 17.24 | 18.64 |
| 0.040 | 13.19 | 14.70 | 16.17 | 17.60 | 19.02 |
| 0.035 | 13.55 | 15.07 | 16.56 | 18.01 | 19.44 |
| 0.030 | 13.96 | 15.50 | 17.01 | 18.47 | 19.92 |
| 0.025 | 14.44 | 16.01 | 17.53 | 19.02 | 20.48 |
| 0.020 | 15.03 | 16.62 | 18.16 | 19.67 | 21.16 |
| 0.015 | 15.77 | 17.39 | 18.97 | 20.51 | 22.02 |
| 0.010 | 16.81 | 18.47 | 20.09 | 21.66 | 23.20 |
| 0.005 | 18.54 | 20.27 | 21.95 | 23.58 | 25.18 |
| 0.001 | 22.45 | 24.32 | 26.12 | 27.87 | 29.58 |
| < .001 | > 22.45 | > 24.32 | > 26.12 | > 27.87 | > 29.58 |
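
Table entries like these can be checked numerically. The sketch below computes the right-tail area of a chi-squared distribution using only the Python standard library. It relies on a standard recurrence for the regularized upper incomplete gamma function, which is not taken from the slides, and the function name is hypothetical. Because the table values are rounded, agreement is only approximate.

```python
import math


def chi2_upper_tail(x, df):
    """Area to the right of x under the chi-squared curve with df degrees of freedom."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a = 1.0                        # start from df = 2: area = exp(-x/2)
        area = math.exp(-y)
    else:
        a = 0.5                        # start from df = 1: area = erfc(sqrt(x/2))
        area = math.erfc(math.sqrt(y))
    # Each step raises df by 2: Q(a + 1, y) = Q(a, y) + y^a * e^(-y) / Gamma(a + 1).
    while a < df / 2.0:
        area += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return area


# Spot-check a few entries from the 0.050 row of the table.
for df, critical_value in [(1, 3.84), (2, 5.99), (3, 7.81), (5, 11.07), (10, 18.30)]:
    print(f"df = {df:2d}, x = {critical_value:5.2f}, "
          f"right-tail area = {chi2_upper_tail(critical_value, df):.3f}")
```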
Goodness-of-Fit Test Procedure
Hypotheses:
H0: $\pi_1$ = hypothesized proportion for category 1
$\pi_2$ = hypothesized proportion for category 2
⋮
$\pi_k$ = hypothesized proportion for category k
Ha: H0 is not true
Test statistic:

$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$
Goodness-of-Fit Test Procedure
P-values: When H0 is true and all expected counts are at least 5, $\chi^2$ has approximately a chi-squared distribution with df = k - 1.
Therefore, the P-value associated with the computed test statistic value is the area to the right of $\chi^2$ under the df = k - 1 chi-squared curve.
Goodness-of-Fit Test Procedure
Assumptions:
1. Observed cell counts are based on a random sample.
2. The sample size is large. The sample size is large enough for the chi-squared test to be appropriate as long as every expected count is at least 5.
Example
Consider the newsperson's desire to determine whether the faculty of a large university system are equally distributed among the five categories. Let us test this hypothesis at a significance level of 0.05.
Let $\pi_1$, $\pi_2$, $\pi_3$, $\pi_4$, and $\pi_5$ denote the proportions of all faculty in this university system that are full professors, associate professors, assistant professors, instructors, and adjunct/part time, respectively.
H0: $\pi_1 = 0.2$, $\pi_2 = 0.2$, $\pi_3 = 0.2$, $\pi_4 = 0.2$, $\pi_5 = 0.2$
Ha: H0 is not true
Example
Significance level: $\alpha$ = 0.05
Assumptions: As we saw in an earlier slide, the expected counts were all 30.8, which is greater than 5. Although we do not know for sure how the sample was obtained, for the purposes of this example we shall assume the selection procedure generated a random sample.
Test statistic:

$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$
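Example
Calculation: recall

| Category | Full Professor | Associate Professor | Assistant Professor | Instructor | Adjunct/Part time | Total |
| --- | --- | --- | --- | --- | --- | --- |
| Frequency | 22 | 31 | 25 | 35 | 41 | 154 |
| Hypothesized Proportion | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 1 |
| Expected Count | 30.8 | 30.8 | 30.8 | 30.8 | 30.8 | 154 |

$$\begin{aligned}
\chi^2 &= \frac{(22 - 30.8)^2}{30.8} + \frac{(31 - 30.8)^2}{30.8} + \frac{(25 - 30.8)^2}{30.8} + \frac{(35 - 30.8)^2}{30.8} + \frac{(41 - 30.8)^2}{30.8} \\
&= 2.514 + 0.001 + 1.092 + 0.573 + 3.378 \\
&= 7.56
\end{aligned}$$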
Example
P-value: The P-value is based on a chi-squared distribution with df = 5 - 1 = 4. The computed value of $\chi^2$, 7.56, is smaller than 7.77, the lowest value of $\chi^2$ in the table for df = 4, so the P-value is greater than 0.100.
Conclusion: Since the P-value > 0.05 = $\alpha$, H0 cannot be rejected. There is not sufficient evidence to refute the claim that the proportion of faculty in each of the different categories is the same.
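
The table only bounds this P-value from below (greater than 0.100). As a hedged illustration, the area can be computed directly: for df = 4 the right-tail area has the closed form exp(-x/2)(1 + x/2). This is a standard identity for even degrees of freedom and is not stated on the slides. The function name is hypothetical.

```python
import math


def upper_tail_df4(x):
    """Right-tail area of the chi-squared distribution with df = 4."""
    half = x / 2.0
    return math.exp(-half) * (1.0 + half)


statistic = 7.56  # computed chi-square value from the example
alpha = 0.05      # significance level chosen in the example

p_value = upper_tail_df4(statistic)
reject_h0 = p_value < alpha
print(f"P-value = {p_value:.3f}, reject H0: {reject_h0}")
```

The computed area is slightly above 0.10, consistent with the bound read from the table and with the decision not to reject H0.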
Tests for Homogeneity and Independence in a Two-Way Table
Data resulting from observations made on two different categorical variables can be summarized using a tabular format. For example, consider a data set obtained from a sample of 79 students taking elementary statistics, in which each student's gender and principal form of vision correction were recorded. The table is on the next slide.
Tests for Homogeneity and Independence in a Two-Way Table
This is an example of a two-way frequency table, or contingency table.
The numbers in the 6 cells are the observed cell counts.

| | Contacts | Glasses | None |
| --- | --- | --- | --- |
| Female | 5 | 9 | 11 |
| Male | 5 | 22 | 27 |
Tests for Homogeneity and Independence in a Two-Way Table
Marginal totals are obtained by adding the observed cell counts in each row and also in each column.
The sum of the column marginal totals (or the row marginal totals) is called the grand total.

| | Contacts | Glasses | None | Row Marginal Total |
| --- | --- | --- | --- | --- |
| Female | 5 | 9 | 11 | 25 |
| Male | 5 | 22 | 27 | 54 |
| Column Marginal Total | 10 | 31 | 38 | 79 |
Tests for Homogeneity in a Two-Way Table
Typically, with a two-way table used to test homogeneity, the rows indicate different populations and the columns indicate different categories or vice versa.
For a test of homogeneity, the central question is whether the category proportions are the same for all of the populations.
Tests for Homogeneity in a Two-Way Table
When the rows indicate the populations, the expected count for a cell is simply the overall proportion (over all populations) that fall in the category times the number in the population. To illustrate:

| | Contacts | Glasses | None | Row Marginal Total |
| --- | --- | --- | --- | --- |
| Female | 5 | 9 | 11 | 25 |
| Male | 5 | 22 | 27 | 54 |
| Column Marginal Total | 10 | 31 | 38 | 79 |

$\frac{10}{79}$ = overall proportion of students using contacts
54 = total number of male students
$\frac{10}{79} \cdot 54 \approx 6.84$ = expected number of males that use contacts as primary vision correction
Tests for Homogeneity in a Two-Way Table
The expected count for each cell, which represents what would be expected if there were no difference between the groups under study, can be found easily by using the following formula.

$$\text{Expected cell count} = \frac{(\text{Row total})(\text{Column total})}{\text{Grand total}}$$
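
The formula above can be sketched in Python for a table of any size. The function name and the list-of-rows layout are assumptions made for illustration. The data are the vision-correction counts for the 79 students.

```python
def expected_counts(table):
    """Expected count for each cell: (row total)(column total) / (grand total)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Observed counts; columns are contacts, glasses, none.
observed = [
    [5, 9, 11],   # Female
    [5, 22, 27],  # Male
]

expected = expected_counts(observed)
for label, row in zip(["Female", "Male"], expected):
    print(label, [round(value, 2) for value in row])
```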
Tests for Homogeneity in a Two-Way Table
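
| | Contacts | Glasses | None | Row Marginal Total |
| --- | --- | --- | --- | --- |
| Female | 5 ($\frac{25 \cdot 10}{79}$) | 9 ($\frac{25 \cdot 31}{79}$) | 11 ($\frac{25 \cdot 38}{79}$) | 25 |
| Male | 5 ($\frac{54 \cdot 10}{79}$) | 22 ($\frac{54 \cdot 31}{79}$) | 27 ($\frac{54 \cdot 38}{79}$) | 54 |
| Column Marginal Total | 10 | 31 | 38 | 79 |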
Tests for Homogeneity in a Two-Way Table
Expected counts are in parentheses.

| | Contacts | Glasses | None | Row Marginal Total |
| --- | --- | --- | --- | --- |
| Female | 5 (3.16) | 9 (9.81) | 11 (12.03) | 25 |
| Male | 5 (6.84) | 22 (21.19) | 27 (25.97) | 54 |
| Column Marginal Total | 10 | 31 | 38 | 79 |
Comparing Two or More Populations Using the $\chi^2$ Statistic
Hypotheses:
H0: The true category proportions are the same for all of the populations (homogeneity of populations).
Ha: The true category proportions are not all the same for all of the populations.
Comparing Two or More Populations Using the $\chi^2$ Statistic
Test statistic:

$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$

The expected cell counts are estimated from the sample data (assuming that H0 is true) using the formula

$$\text{Expected cell count} = \frac{(\text{Row total})(\text{Column total})}{\text{Grand total}}$$
Comparing Two or More Populations Using the $\chi^2$ Statistic
P-value: When H0 is true, $\chi^2$ has approximately a chi-squared distribution with
df = (number of rows - 1)(number of columns - 1)
The P-value associated with the computed test statistic value is the area to the right of $\chi^2$ under the chi-squared curve with the appropriate df.
Comparing Two or More Populations Using the $\chi^2$ Statistic
Assumptions:
1. The data consist of independently chosen random samples.
2. The sample size is large: all expected counts are at least 5. If some expected counts are less than 5, rows or columns of the table may be combined to achieve a table with satisfactory expected counts.
Example
The following data come from a clinical trial of a drug regimen used in treating a type of cancer, lymphocytic lymphomas.* The 273 patients were randomly divided into two groups, with one group receiving cytoxan plus prednisone (CP) and the other receiving BCNU plus prednisone (BP). The responses to treatment were graded on a qualitative scale. The two-way table summary of the results is on the following slide.
* Ezdinli, E., S., Berard, C. W., et al. (1976) Comparison of intensive versus moderate chemotherapy of lymphocytic lymphomas: a progress report. Cancer, 38, 1060-1068.
Example
Set up and perform an appropriate hypothesis test at the 0.05 level of significance.

| | Complete Response | Partial Response | No Change | Progression | Row Marginal Total |
| --- | --- | --- | --- | --- | --- |
| BP | 26 | 51 | 21 | 40 | 138 |
| CP | 31 | 59 | 11 | 34 | 135 |
| Column Marginal Total | 57 | 110 | 32 | 74 | 273 |
Example
Hypotheses:
H0: The true response to treatment proportions are the same for both treatments (homogeneity of populations).
Ha: The true response to treatment proportions are not all the same for both treatments.
Significance level: $\alpha$ = 0.05
Test statistic:

$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$
Example
Assumptions: All expected cell counts are at least 5, and samples were chosen independently, so the $\chi^2$ test is appropriate. Expected counts are in parentheses.

| | Complete Response | Partial Response | No Change | Progression | Row Marginal Total |
| --- | --- | --- | --- | --- | --- |
| BP | 26 (28.81) | 51 (55.60) | 21 (16.18) | 40 (37.41) | 138 |
| CP | 31 (28.19) | 59 (54.40) | 11 (15.82) | 34 (36.59) | 135 |
| Column Marginal Total | 57 | 110 | 32 | 74 | 273 |
Example
Calculations:

$$\begin{aligned}
\chi^2 &= \frac{(26 - 28.81)^2}{28.81} + \frac{(51 - 55.60)^2}{55.60} + \frac{(21 - 16.18)^2}{16.18} + \frac{(40 - 37.41)^2}{37.41} \\
&\quad + \frac{(31 - 28.19)^2}{28.19} + \frac{(59 - 54.40)^2}{54.40} + \frac{(11 - 15.82)^2}{15.82} + \frac{(34 - 36.59)^2}{36.59} \\
&= 0.275 + 0.381 + 1.439 + 0.180 + 0.281 + 0.390 + 1.471 + 0.184 \\
&= 4.60
\end{aligned}$$

The two-way table for this example has 2 rows and 4 columns, so the appropriate df is (2-1)(4-1) = 3. Since 4.60 < 6.25, the P-value > 0.10 > $\alpha$ = 0.05, so H0 is not rejected. There is not sufficient evidence to conclude that the responses are different for the two treatments.
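
The whole calculation for the clinical trial can be reproduced with a short Python sketch. The function name is hypothetical. The decision step compares the statistic with 7.81, the tabled value for df = 3 and right-tail area 0.050, rather than computing an exact P-value.

```python
def chi_square_two_way(table):
    """Return (chi-square statistic, df) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Columns: complete response, partial response, no change, progression.
observed = [
    [26, 51, 21, 40],  # BP
    [31, 59, 11, 34],  # CP
]

statistic, df = chi_square_two_way(observed)
critical_value = 7.81  # df = 3, right-tail area 0.050, from the table of upper-tail areas
print(f"chi-square = {statistic:.2f}, df = {df}, "
      f"reject H0 at 0.05: {statistic > critical_value}")
```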
Comparing Two or More Populations Using the $\chi^2$ Statistic

$$\text{Expected cell count} = \frac{(\text{Row total})(\text{Column total})}{\text{Grand total}}$$

P-value: When H0 is true, $\chi^2$ has approximately a chi-squared distribution with
df = (number of rows - 1)(number of columns - 1)
The P-value associated with the computed test statistic value is the area to the right of $\chi^2$ under the chi-squared curve with the appropriate df.
Example II
The following data come from a study of the length of stay of 336 patients in a psychiatric observation ward prior to being sent to another location. The observed variables are length of stay and four categories combining gender and status of admission.
Example II
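
Columns: type of patient. Rows: number of days in the ward.

| Number of days in the ward | Male voluntary | Female voluntary | Male certified | Female certified | Total |
| --- | --- | --- | --- | --- | --- |
| 1 | 5 | 9 | 4 | 11 | 29 |
| 2 | 16 | 25 | 18 | 18 | 77 |
| 3 | 20 | 34 | 20 | 28 | 102 |
| 4 | 10 | 17 | 6 | 8 | 41 |
| 5 | 5 | 15 | 1 | 12 | 33 |
| 6 | 3 | 8 | 0 | 5 | 16 |
| 7 | 3 | 7 | 0 | 5 | 15 |
| 8 | 5 | 11 | 1 | 6 | 23 |
| Total | 67 | 126 | 50 | 93 | 336 |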
Example III
The following data come from a study of Hodgkin's disease, a cancer of the lymph nodes. The observed variables are histological type and response to treatment 3 months after it had begun.
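LP = lymphocyte dominance
NS = nodular sclerosis
MC = mixed cellularity
LD = lymphocyte depletion

Columns: response. Rows: histological type.

| Histological type | Positive | Partial | None | Total |
| --- | --- | --- | --- | --- |
| LP | 74 | 18 | 12 | 104 |
| NS | 68 | 16 | 12 | 96 |
| MC | 154 | 54 | 58 | 266 |
| LD | 18 | 10 | 44 | 72 |
| Total | 314 | 98 | 126 | 538 |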
$\chi^2$ Test for Independence
The $\chi^2$ test statistic and procedures can also be used to investigate the association between two categorical variables in a single population.
Hypotheses:
H0: The two variables are independent.
Ha: The two variables are not independent.
$\chi^2$ Test for Independence
Test statistic:

$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$

The expected cell counts are estimated from the sample data (assuming that H0 is true) using the formula

$$\text{Expected cell count} = \frac{(\text{Row total})(\text{Column total})}{\text{Grand total}}$$
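
A short derivation may help show why this formula corresponds to independence. The notation is introduced here for illustration and does not appear on the slides: $n$ is the grand total, $R_i$ is the total for row $i$, and $C_j$ is the total for column $j$. If the two variables are independent, the probability of falling in row $i$ and column $j$ is the product of the two marginal probabilities, which are estimated by $R_i/n$ and $C_j/n$:

$$\text{Expected count for cell } (i, j) = n \cdot \frac{R_i}{n} \cdot \frac{C_j}{n} = \frac{R_i \, C_j}{n} = \frac{(\text{Row total})(\text{Column total})}{\text{Grand total}}$$

This is the same expression used for the test of homogeneity, which is why the two tests share the same statistic and degrees of freedom.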
$\chi^2$ Test for Independence

$$\text{Expected cell count} = \frac{(\text{Row total})(\text{Column total})}{\text{Grand total}}$$

P-value: When H0 is true, $\chi^2$ has approximately a chi-squared distribution with
df = (number of rows - 1)(number of columns - 1)
The P-value associated with the computed test statistic value is the area to the right of $\chi^2$ under the chi-squared curve with the appropriate df.
$\chi^2$ Test for Independence
Assumptions:
1. The observed counts are from a random sample.
2. The sample size is large: all expected counts are at least 5. If some expected counts are less than 5, rows or columns of the table may be combined to achieve a table with satisfactory expected counts.
Example
Consider the two categorical variables, gender and principal form of vision correction, for the sample of students used earlier in this presentation.
We shall now test whether gender and the principal form of vision correction are independent.

| | Contacts | Glasses | None | Row Marginal Total |
| --- | --- | --- | --- | --- |
| Female | 5 | 9 | 11 | 25 |
| Male | 5 | 22 | 27 | 54 |
| Column Marginal Total | 10 | 31 | 38 | 79 |
Example
Hypotheses:
H0: Gender and principal method of vision correction are independent.
Ha: Gender and principal method of vision correction are not independent.
Significance level: We have not chosen one, so we shall look at the practical significance level.
Test statistic:

$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$
Example
Assumptions: We are assuming that the sample of students was randomly chosen.
For the $\chi^2$ test to be appropriate, all expected cell counts must be at least 5. The expected counts are shown in parentheses.

| | Contacts | Glasses | None | Row Marginal Total |
| --- | --- | --- | --- | --- |
| Female | 5 (3.16) | 9 (9.81) | 11 (12.03) | 25 |
| Male | 5 (6.84) | 22 (21.19) | 27 (25.97) | 54 |
| Column Marginal Total | 10 | 31 | 38 | 79 |
Example
Assumptions: Notice that the expected count is less than 5 in the cell corresponding to Female and Contacts, so we should combine the columns for Contacts and Glasses to get

| | Contacts or Glasses | None | Row Marginal Total |
| --- | --- | --- | --- |
| Female | 14 ($\frac{41 \cdot 25}{79}$) | 11 ($\frac{38 \cdot 25}{79}$) | 25 |
| Male | 27 ($\frac{41 \cdot 54}{79}$) | 27 ($\frac{38 \cdot 54}{79}$) | 54 |
| Column Marginal Total | 41 | 38 | 79 |

| | Contacts or Glasses | None | Row Marginal Total |
| --- | --- | --- | --- |
| Female | 14 (12.97) | 11 (12.03) | 25 |
| Male | 27 (28.03) | 27 (25.97) | 54 |
| Column Marginal Total | 41 | 38 | 79 |
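
Combining columns to repair a small expected count can be sketched in Python as follows. The helper name and the column indices are assumptions made for illustration. The data are the original 2 x 3 vision-correction table.

```python
def combine_columns(table, first, second):
    """Return a new table in which column `second` is added into column `first`."""
    merged = []
    for row in table:
        new_row = list(row)
        new_row[first] += new_row[second]
        del new_row[second]
        merged.append(new_row)
    return merged


# Columns: contacts, glasses, none.
observed = [
    [5, 9, 11],   # Female
    [5, 22, 27],  # Male
]

combined = combine_columns(observed, 0, 1)  # contacts + glasses

# Recompute the expected counts for the combined table.
row_totals = [sum(row) for row in combined]
col_totals = [sum(col) for col in zip(*combined)]
grand_total = sum(row_totals)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

smallest = min(min(row) for row in expected)
print(combined)
print(f"smallest expected count = {smallest:.2f}")
```

After the merge, every expected count is above 5, so the large-sample assumption is met.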
Example
Calculations:

$$\begin{aligned}
\chi^2 &= \frac{(14 - 12.97)^2}{12.97} + \frac{(11 - 12.03)^2}{12.03} + \frac{(27 - 28.03)^2}{28.03} + \frac{(27 - 25.97)^2}{25.97} \\
&= 0.081 + 0.087 + 0.038 + 0.040 \\
&= 0.246
\end{aligned}$$

The contingency table for this example has 2 rows and 2 columns, so the appropriate df is (2-1)(2-1) = 1. Since 0.246 < 2.70, the P-value is substantially greater than 0.10. H0 would not be rejected for any reasonable significance level. There is not sufficient evidence to conclude that gender and vision correction are related. (That is, for all practical purposes, one would find it reasonable to assume that gender and need for vision correction are independent.)
Example
Minitab would provide the following output if the frequency table was input as shown.

    Chi-Square Test: Contacts or Glasses, None

    Expected counts are printed below observed counts

           Contacts     None    Total
        1        14       11       25
              12.97    12.03

        2        27       27       54
              28.03    25.97

    Total        41       38       79

    Chi-Sq = 0.081 + 0.087 + 0.038 + 0.040 = 0.246
    DF = 1, P-Value = 0.620
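
The figures in this output can be cross-checked without Minitab. For df = 1 the right-tail area equals erfc(sqrt(x/2)), because a chi-squared variable with one degree of freedom is the square of a standard normal variable. That identity is standard but is not stated on the slides. The sketch below uses unrounded expected counts, and its names are illustrative.

```python
import math

# Combined table: columns are "contacts or glasses" and "none".
observed = [
    [14, 11],  # Female
    [27, 27],  # Male
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Sum (observed - expected)^2 / expected over the four cells.
statistic = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        statistic += (count - expected) ** 2 / expected

# Right-tail area for df = 1.
p_value = math.erfc(math.sqrt(statistic / 2.0))

print(f"Chi-Sq = {statistic:.3f}, DF = 1, P-Value = {p_value:.3f}")
```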