8/3/2019 Ken Black QA 5th chapter 12 Solution
Chapter 12: Analysis of Categorical Data
LEARNING OBJECTIVES
This chapter presents several nonparametric statistics that can be used to analyze data
enabling you to:
1. Understand the chi-square goodness-of-fit test and how to use it.
2. Analyze data using the chi-square test of independence.
CHAPTER TEACHING STRATEGY
Chapter 12 covers the two most prevalent chi-square tests: the chi-square
goodness-of-fit test and the chi-square test of independence. These two techniques are
important because they give the statistician a tool that is particularly useful for analyzing
nominal data (even though independent variable categories can sometimes have ordinal
or higher levels of measurement). It should be emphasized that there are many instances in
business research where the data gathered are merely categorical identifications. For
example, in segmenting the marketplace (consumers or industrial users), information is
gathered regarding gender, income level, geographical location, political affiliation,
religious preference, ethnicity, occupation, size of company, type of industry, etc. On
these variables, the measurement is often a tallying of the frequency of occurrence of
individuals, items, or companies in each category. The subject of the research is given no
"score" or "measurement" other than a 0/1 for being a member or not of a given category.
These two chi-square tests are well suited to analyzing such data.
The chi-square goodness-of-fit test examines the categories of one variable to
determine if the distribution of observed occurrences matches some expected or
theoretical distribution of occurrences. It can be used to determine if some standard or
previously known distribution of proportions is the same as some observed distribution of
proportions. It can also be used to validate the theoretical distribution of occurrences of
phenomena such as random arrivals, which are often assumed to be Poisson distributed.
Note that the degrees of freedom, k - 1 for a given set of expected values or for
the uniform distribution, change to k - 2 for an expected Poisson distribution and to k - 3
for an expected normal distribution. To conduct a chi-square goodness-of-fit test against
an expected Poisson distribution, the value of lambda must be estimated from the
observed data, which costs one additional degree of freedom. With the normal
distribution, both the mean and the standard deviation of the expected distribution are
estimated from the observed values, which costs two additional degrees of freedom
beyond the k - 1 value.
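The statistic and the degrees-of-freedom rule described above can be sketched in a few lines of Python. This is only an illustration, not part of the original solutions: the function names `chi_square_gof` and `gof_degrees_of_freedom` are assumptions introduced here, and the data are the observed and expected frequencies from Problem 12.1 below.

```python
def chi_square_gof(observed, expected):
    """Return the goodness-of-fit statistic: sum of (fo - fe)^2 / fe."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


def gof_degrees_of_freedom(k, estimated_parameters=0):
    """df = k - 1, minus one more for each parameter estimated from the data.

    estimated_parameters is 0 for given or uniform expected values,
    1 for Poisson (lambda), and 2 for normal (mean and standard deviation).
    """
    return k - 1 - estimated_parameters


# Observed and expected frequencies from Problem 12.1
observed = [53, 37, 32, 28, 18, 15]
expected = [68, 42, 33, 22, 10, 8]

chi2 = chi_square_gof(observed, expected)
df = gof_degrees_of_freedom(len(observed))
print(f"chi-square = {chi2:.3f}, df = {df}")
```

The computed statistic is then compared with the tabled critical value for the chosen alpha and df, exactly as in the worked solutions.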
The chi-square test of independence compares the observed frequencies
across the categories of two variables to expected values to determine whether the
two variables are independent. If the variables are not independent,
they are dependent, or related. This allows business researchers to reach some
conclusions about such questions as: Is smoking independent of gender? Is the type of
housing preferred independent of geographic region? The chi-square test of
independence is often used as a tool for preliminary analysis of data gathered in
exploratory research, where the researcher has little idea of which variables are
related to which, and the data are nominal. This test is particularly useful with
demographic-type data.
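As a rough sketch of the mechanics, the expected frequency of each cell is (row total)(column total)/n, and the statistic sums (fo - fe)^2/fe over all cells with df = (r - 1)(c - 1). The helper name `chi_square_independence` is an assumption made for this illustration; the table is the 2 x 3 contingency table from Problem 12.15 below.

```python
def chi_square_independence(table):
    """Return (chi2, df, expected) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected frequency under independence: (row total)(column total) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(table, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Problem 12.15: rows are Publishing and Comp. Hard.; columns are Air, Train, Truck
table_12_15 = [[32, 12, 41], [5, 6, 24]]
chi2, df, expected = chi_square_independence(table_12_15)
print(f"chi-square = {chi2:.2f}, df = {df}")
```

Because the sketch carries unrounded expected values, its result can differ from the hand calculations in the last decimal place.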
A word of warning is appropriate here. When an expected frequency is small, the
observed chi-square value can be inordinately large, increasing the possibility
of committing a Type I error. Research on this problem has yielded varying results,
with some authors indicating that expected values as low as two or three are acceptable
and others demanding that expected values be ten or more. In this text, we
have settled on the widely accepted criterion of five or more.
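One way to apply the five-or-more criterion is to fold a small open-ended upper category into its neighbor, as several of the solutions below do. The following sketch handles only the upper tail and is an illustration under that assumption; `collapse_small_tail` is a hypothetical helper name, and the numbers are the Poisson expected frequencies from Problem 12.3.

```python
def collapse_small_tail(observed, expected, minimum=5.0):
    """Merge the last category into the previous one while its expected frequency is too small."""
    obs, exp = list(observed), list(expected)
    while len(exp) > 1 and exp[-1] < minimum:
        last_obs, last_exp = obs.pop(), exp.pop()
        obs[-1] += last_obs
        exp[-1] += last_exp
    return obs, exp


# Problem 12.3 before collapsing: categories 0, 1, 2, and 3 or more
obs, exp = collapse_small_tail([28, 17, 11, 5], [24.803, 22.320, 10.047, 3.831])
print(obs, [round(e, 3) for e in exp])
```

Remember that collapsing reduces k, so the degrees of freedom must be computed from the number of categories that remain.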
CHAPTER OUTLINE
12.1 Chi-Square Goodness-of-Fit Test
Testing a Population Proportion Using the Chi-square Goodness-of-Fit
Test as an Alternative Technique to the z Test
12.2 Contingency Analysis: Chi-Square Test of Independence
KEY TERMS
Categorical Data Chi-Square Test of Independence
Chi-Square Distribution Contingency Analysis
Chi-Square Goodness-of-Fit Test Contingency Table
SOLUTIONS TO THE ODD-NUMBERED PROBLEMS IN CHAPTER 12
12.1     $f_o$     $f_e$     $(f_o - f_e)^2 / f_e$
          53        68         3.309
          37        42         0.595
          32        33         0.030
          28        22         1.636
          18        10         6.400
          15         8         6.125

Ho: The observed distribution is the same as the expected distribution.
Ha: The observed distribution is not the same as the expected distribution.

Observed $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 18.095$

df = k - 1 = 6 - 1 = 5, $\alpha = .05$

$\chi^2_{.05,5} = 11.0705$

Since the observed $\chi^2 = 18.095 > \chi^2_{.05,5} = 11.0705$, the decision is to reject the
null hypothesis.

The observed frequencies are not distributed the same as the expected
frequencies.
12.2     $f_o$     $f_e$     $(f_o - f_e)^2 / f_e$
          19        18         0.056
          17        18         0.056
          14        18         0.889
          18        18         0.000
          19        18         0.056
          21        18         0.500
          18        18         0.000
          18        18         0.000
$\sum f_o = 144$   $\sum f_e = 144$   1.557

Ho: The observed frequencies are uniformly distributed.
Ha: The observed frequencies are not uniformly distributed.

$\bar{x} = \frac{\sum f_o}{k} = \frac{144}{8} = 18$

In this uniform distribution, each $f_e = 18$

df = k - 1 = 8 - 1 = 7, $\alpha = .01$

$\chi^2_{.01,7} = 18.4753$

Observed $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 1.557$

Since the observed $\chi^2 = 1.557 < \chi^2_{.01,7} = 18.4753$, the decision is to fail to
reject the null hypothesis.

There is no reason to conclude that the frequencies are not uniformly
distributed.
12.3    Number    $f_o$    (Number)($f_o$)
          0        28          0
          1        17         17
          2        11         22
          3         5         15
                   61         54

Ho: The frequency distribution is Poisson.
Ha: The frequency distribution is not Poisson.

$\lambda = \frac{54}{61} \approx 0.9$

                 Expected       Expected
    Number      Probability     Frequency
       0           .4066         24.803
       1           .3659         22.320
       2           .1647         10.047
     ≥ 3           .0628          3.831

Since $f_e$ for ≥ 3 is less than 5, collapse categories 2 and ≥ 3:

    Number    $f_o$     $f_e$      $(f_o - f_e)^2 / f_e$
       0       28      24.803        0.412
       1       17      22.320        1.268
     ≥ 2       16      13.878        0.324
               61      60.993        2.004

df = k - 2 = 3 - 2 = 1, $\alpha = .05$

$\chi^2_{.05,1} = 3.8415$

Observed $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 2.004$

Since the observed $\chi^2 = 2.004 < \chi^2_{.05,1} = 3.8415$, the decision is to fail to reject
the null hypothesis.

There is insufficient evidence to reject the hypothesis that the distribution is
Poisson, so the Poisson assumption is retained.
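The expected frequencies in this solution come from the Poisson formula P(x) = e^(-lambda) lambda^x / x!, with the last class taken as an open-ended tail. A minimal sketch of that calculation follows; `poisson_expected` is a hypothetical helper name, and lambda is rounded to 0.9 as in the solution above.

```python
import math


def poisson_expected(lam, n, max_count):
    """Expected frequencies for counts 0..max_count-1 plus an open class 'max_count or more'."""
    probs = [math.exp(-lam) * lam ** x / math.factorial(x) for x in range(max_count)]
    probs.append(1.0 - sum(probs))  # remaining probability goes to the open-ended class
    return [p * n for p in probs]


observed = [28, 17, 11, 5]                 # counts of 0, 1, 2, and 3 or more
expected = poisson_expected(0.9, sum(observed), 3)

# Merge class 2 with the open class because the latter has an expected frequency below 5
obs_c = observed[:2] + [sum(observed[2:])]
exp_c = expected[:2] + [sum(expected[2:])]

chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(obs_c, exp_c))
df = len(obs_c) - 2                        # k - 2: one extra df lost for estimating lambda
print(f"chi-square = {chi2:.3f}, df = {df}")
```

Small differences from the hand calculation are expected because the table probabilities are rounded to four decimals.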
12.4
    Category    f (observed)    Midpt. M      fM         $fM^2$
     10-20            6            15          90         1,350
     20-30           14            25         350         8,750
     30-40           29            35       1,015        35,525
     40-50           38            45       1,710        76,950
     50-60           25            55       1,375        75,625
     60-70           10            65         650        42,250
     70-80            7            75         525        39,375
            $n = \sum f = 129$   $\sum fM = 5,715$   $\sum fM^2 = 279,825$

$\bar{x} = \frac{\sum fM}{\sum f} = \frac{5{,}715}{129} = 44.3$

$s = \sqrt{\frac{\sum fM^2 - \frac{(\sum fM)^2}{n}}{n - 1}} = \sqrt{\frac{279{,}825 - \frac{(5{,}715)^2}{129}}{128}} = 14.43$

Ho: The observed frequencies are normally distributed.
Ha: The observed frequencies are not normally distributed.

For Category 10-20                                   Prob
    $z = \frac{10 - 44.3}{14.43} = -2.38$            .4913
    $z = \frac{20 - 44.3}{14.43} = -1.68$          - .4535
    Expected prob.:                                  .0378

For Category 20-30                                   Prob
    for x = 20, z = -1.68                            .4535
    $z = \frac{30 - 44.3}{14.43} = -0.99$          - .3389
    Expected prob.:                                  .1146
For Category 30-40                                   Prob
    for x = 30, z = -0.99                            .3389
    $z = \frac{40 - 44.3}{14.43} = -0.30$          - .1179
    Expected prob.:                                  .2210

For Category 40-50                                   Prob
    for x = 40, z = -0.30                            .1179
    $z = \frac{50 - 44.3}{14.43} = 0.40$           + .1554
    Expected prob.:                                  .2733

For Category 50-60                                   Prob
    $z = \frac{60 - 44.3}{14.43} = 1.09$             .3621
    for x = 50, z = 0.40                           - .1554
    Expected prob.:                                  .2067

For Category 60-70                                   Prob
    $z = \frac{70 - 44.3}{14.43} = 1.78$             .4625
    for x = 60, z = 1.09                           - .3621
    Expected prob.:                                  .1004

For Category 70-80                                   Prob
    $z = \frac{80 - 44.3}{14.43} = 2.47$             .4932
    for x = 70, z = 1.78                           - .4625
    Expected prob.:                                  .0307

For x < 10:
Probability between 10 and the mean, 44.3, = (.0378 + .1146 + .2210
+ .1179) = .4913. Probability < 10 = .5000 - .4913 = .0087
For x > 80:
Probability between 80 and the mean, 44.3, = (.0307 + .1004 + .2067 + .1554) =
.4932. Probability > 80 = .5000 - .4932 = .0068

    Category     Prob      expected frequency
     < 10       .0087      .0087(129) =  1.12
     10-20      .0378      .0378(129) =  4.88
     20-30      .1146                   14.78
     30-40      .2210                   28.51
     40-50      .2733                   35.26
     50-60      .2067                   26.66
     60-70      .1004                   12.95
     70-80      .0307                    3.96
     > 80       .0068                    0.88

Due to the small sizes of expected frequencies, category < 10 is folded into 10-20
and > 80 into 70-80.

    Category    $f_o$     $f_e$     $(f_o - f_e)^2 / f_e$
     10-20        6       6.00        .000
     20-30       14      14.78        .041
     30-40       29      28.51        .008
     40-50       38      35.26        .213
     50-60       25      26.66        .103
     60-70       10      12.95        .672
     70-80        7       4.84        .964
                                     2.001

Calculated $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 2.001$

df = k - 3 = 7 - 3 = 4, $\alpha = .05$

$\chi^2_{.05,4} = 9.4877$

Since the observed $\chi^2 = 2.001 < \chi^2_{.05,4} = 9.4877$, the decision is to fail to reject
the null hypothesis. There is not enough evidence to declare that the observed
frequencies are not normally distributed.
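The class probabilities in this solution were read from a z table after rounding each z to two decimals. As a cross-check, the same probabilities can be computed directly from the normal cumulative distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library; `normal_class_probabilities` is a hypothetical helper name, and the mean and standard deviation are the rounded values 44.3 and 14.43 from the solution.

```python
from statistics import NormalDist


def normal_class_probabilities(mean, sd, boundaries):
    """Probabilities for (-inf, b0), [b0, b1), ..., [b_last, +inf) under a normal curve."""
    dist = NormalDist(mu=mean, sigma=sd)
    cdfs = [dist.cdf(b) for b in boundaries]
    probs = [cdfs[0]]                                    # lower open-ended tail
    probs += [hi - lo for lo, hi in zip(cdfs, cdfs[1:])]  # interior classes
    probs.append(1.0 - cdfs[-1])                         # upper open-ended tail
    return probs


boundaries = [10, 20, 30, 40, 50, 60, 70, 80]
probs = normal_class_probabilities(44.3, 14.43, boundaries)
expected = [p * 129 for p in probs]

labels = ["< 10"] + [f"{a}-{b}" for a, b in zip(boundaries, boundaries[1:])] + ["> 80"]
for label, p, fe in zip(labels, probs, expected):
    print(f"{label:>6}  {p:.4f}  {fe:6.2f}")
```

The values agree with the table-based figures only to about two decimal places, because the hand solution rounds z before looking up each probability.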
12.5  Definition              $f_o$   Exp. Prop.   $f_e$               $(f_o - f_e)^2 / f_e$
      Happiness                42       .39        227(.39) = 88.53      24.46
      Sales/Profit             95       .12        227(.12) = 27.24     168.55
      Helping Others           27       .18                   40.86       4.70
      Achievement/Challenge    63       .31                   70.37       0.77
                              227                                       198.48

Ho: The observed frequencies are distributed the same as the expected
frequencies.
Ha: The observed frequencies are not distributed the same as the expected
frequencies.

Observed $\chi^2 = 198.48$

df = k - 1 = 4 - 1 = 3, $\alpha = .05$

$\chi^2_{.05,3} = 7.8147$

Since the observed $\chi^2 = 198.48 > \chi^2_{.05,3} = 7.8147$, the decision is to reject the
null hypothesis.

The observed frequencies for men are not distributed the same as the
expected frequencies, which are based on the responses of women.
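When the expected distribution is given as proportions, as it is here, each expected frequency is simply the sample size times the proportion. A brief sketch of that step follows; `expected_from_proportions` is a hypothetical helper name introduced only for this illustration.

```python
def expected_from_proportions(proportions, n):
    """Scale expected proportions (which should sum to 1) up to a sample of size n."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("expected proportions should sum to 1")
    return [p * n for p in proportions]


# Problem 12.5: Happiness, Sales/Profit, Helping Others, Achievement/Challenge
observed = [42, 95, 27, 63]
proportions = [0.39, 0.12, 0.18, 0.31]

n = sum(observed)
expected = expected_from_proportions(proportions, n)
chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
print(f"n = {n}, chi-square = {chi2:.2f}")
```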
12.6    Age     $f_o$   Prop. from survey   $f_e$                  $(f_o - f_e)^2 / f_e$
       10-14     22          .09            (.09)(212) = 19.08       0.45
       15-19     50          .23            (.23)(212) = 48.76       0.03
       20-24     43          .22                         46.64       0.28
       25-29     29          .14                         29.68       0.02
       30-34     19          .10                         21.20       0.23
       ≥ 35      49          .22                         46.64       0.12
                212                                                  1.13

Ho: The distribution of observed frequencies is the same as the distribution of
expected frequencies.
Ha: The distribution of observed frequencies is not the same as the distribution of
expected frequencies.

$\alpha = .01$, df = k - 1 = 6 - 1 = 5

$\chi^2_{.01,5} = 15.0863$

The observed $\chi^2 = 1.13$

Since the observed $\chi^2 = 1.13 < \chi^2_{.01,5} = 15.0863$, the decision is to fail to reject
the null hypothesis.

There is not enough evidence to declare that the distribution of observed
frequencies is different from the distribution of expected frequencies.
12.7    Age     $f_o$     M       fM        $fM^2$
       10-20     16      15       240        3,600
       20-30     44      25     1,100       27,500
       30-40     61      35     2,135       74,725
       40-50     56      45     2,520      113,400
       50-60     35      55     1,925      105,875
       60-70     19      65     1,235       80,275
                231     $\sum fM = 9,155$   $\sum fM^2 = 405,375$

$\bar{x} = \frac{\sum fM}{n} = \frac{9{,}155}{231} = 39.63$

$s = \sqrt{\frac{\sum fM^2 - \frac{(\sum fM)^2}{n}}{n - 1}} = \sqrt{\frac{405{,}375 - \frac{(9{,}155)^2}{231}}{230}} = 13.6$

Ho: The observed frequencies are normally distributed.
Ha: The observed frequencies are not normally distributed.

For Category 10-20                                   Prob
    $z = \frac{10 - 39.63}{13.6} = -2.18$            .4854
    $z = \frac{20 - 39.63}{13.6} = -1.44$          - .4251
    Expected prob.                                   .0603

For Category 20-30                                   Prob
    for x = 20, z = -1.44                            .4251
    $z = \frac{30 - 39.63}{13.6} = -0.71$          - .2611
    Expected prob.                                   .1640

For Category 30-40                                   Prob
    for x = 30, z = -0.71                            .2611
    $z = \frac{40 - 39.63}{13.6} = 0.03$           + .0120
    Expected prob.                                   .2731
For Category 40-50                                   Prob
    $z = \frac{50 - 39.63}{13.6} = 0.76$             .2764
    for x = 40, z = 0.03                           - .0120
    Expected prob.                                   .2644

For Category 50-60                                   Prob
    $z = \frac{60 - 39.63}{13.6} = 1.50$             .4332
    for x = 50, z = 0.76                           - .2764
    Expected prob.                                   .1568

For Category 60-70                                   Prob
    $z = \frac{70 - 39.63}{13.6} = 2.23$             .4871
    for x = 60, z = 1.50                           - .4332
    Expected prob.                                   .0539

For < 10:
Probability between 10 and the mean = .0603 + .1640 + .2611 = .4854
Probability < 10 = .5000 - .4854 = .0146

For > 70:
Probability between 70 and the mean = .0120 + .2644 + .1568 + .0539 = .4871
Probability > 70 = .5000 - .4871 = .0129
     Age      Probability    $f_e$
     < 10       .0146        (.0146)(231) =  3.37
     10-20      .0603        (.0603)(231) = 13.93
     20-30      .1640                       37.88
     30-40      .2731                       63.09
     40-50      .2644                       61.08
     50-60      .1568                       36.22
     60-70      .0539                       12.45
     > 70       .0129                        2.98

The expected frequencies for categories < 10 and > 70 are less than 5.
Collapse < 10 into 10-20 and > 70 into 60-70.

     Age      $f_o$     $f_e$     $(f_o - f_e)^2 / f_e$
     10-20     16      17.30        0.10
     20-30     44      37.88        0.99
     30-40     61      63.09        0.07
     40-50     56      61.08        0.42
     50-60     35      36.22        0.04
     60-70     19      15.43        0.83
                                    2.45

df = k - 3 = 6 - 3 = 3, $\alpha = .05$

$\chi^2_{.05,3} = 7.8147$

Observed $\chi^2 = 2.45$

Since the observed $\chi^2 = 2.45 < \chi^2_{.05,3} = 7.8147$, the decision is to fail to reject the
null hypothesis.

There is no reason to reject that the observed frequencies are normally
distributed.
12.8     Number        f       (f)(number)
           0          18           0
           1          28          28
           2          47          94
           3          21          63
           4          16          64
           5          11          55
       6 or more       9          54
              $\sum f = 150$   $\sum f \cdot (\text{number}) = 358$

$\lambda = \frac{\sum f \cdot (\text{number})}{\sum f} = \frac{358}{150} = 2.4$

Ho: The observed frequencies are Poisson distributed.
Ha: The observed frequencies are not Poisson distributed.

         Number      Probability    $f_e$
           0           .0907        (.0907)(150) = 13.61
           1           .2177        (.2177)(150) = 32.66
           2           .2613                       39.20
           3           .2090                       31.35
           4           .1254                       18.81
           5           .0602                        9.03
       6 or more       .0358                        5.36

         $f_o$     $f_e$     $(f_o - f_e)^2 / f_e$
          18      13.61        1.42
          28      32.66        0.66
          47      39.20        1.55
          21      31.35        3.42
          16      18.81        0.42
          11       9.03        0.43
           9       5.36        2.47
                              10.37

The observed $\chi^2 = 10.37$

$\alpha = .01$, df = k - 2 = 7 - 2 = 5, $\chi^2_{.01,5} = 15.0863$

Since the observed $\chi^2 = 10.37 < \chi^2_{.01,5} = 15.0863$, the decision is to fail to reject
the null hypothesis.

There is not enough evidence to reject the claim that the observed
frequencies are Poisson distributed.
12.9  H0: p = .28        n = 270    x = 62
      Ha: p ≠ .28

                          $f_o$     $f_e$                 $(f_o - f_e)^2 / f_e$
      Spend More            62     270(.28) =  75.6         2.44656
      Don't Spend More     208     270(.72) = 194.4         0.95144
      Total                270                270.0         3.39800

The observed value of $\chi^2$ is 3.398

$\alpha = .05$ and $\alpha/2 = .025$     df = k - 1 = 2 - 1 = 1

$\chi^2_{.025,1} = 5.02389$

Since the observed $\chi^2 = 3.398 < \chi^2_{.025,1} = 5.02389$, the decision is to fail to
reject the null hypothesis.
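With only two categories, the goodness-of-fit statistic equals the square of the one-sample z statistic for a proportion, which is why the chapter outline describes it as an alternative to the z test. The sketch below checks that identity numerically for the data in Problem 12.9; the variable names are assumptions made for illustration.

```python
import math

n, x, p0 = 270, 62, 0.28       # sample size, number of successes, hypothesized proportion

# Chi-square goodness-of-fit with two categories: success and failure
observed = [x, n - x]
expected = [n * p0, n * (1 - p0)]
chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# One-sample z statistic for a proportion
z = (x / n - p0) / math.sqrt(p0 * (1 - p0) / n)

print(f"chi-square = {chi2:.3f}, z = {z:.3f}, z squared = {z ** 2:.3f}")
```

The equality follows algebraically: both expressions reduce to (x - n p0)^2 / (n p0 (1 - p0)).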
12.10  H0: p = .30        n = 180    x = 42
       Ha: p < .30

                        $f_o$     $f_e$               $(f_o - f_e)^2 / f_e$
       Provide            42     180(.30) =  54         2.6666
       Don't Provide     138     180(.70) = 126         1.1429
       Total             180                180         3.8095

The observed value of $\chi^2$ is 3.8095

$\alpha = .05$     df = k - 1 = 2 - 1 = 1

$\chi^2_{.05,1} = 3.8415$

Since the observed $\chi^2 = 3.8095 < \chi^2_{.05,1} = 3.8415$, the decision is to fail to
reject the null hypothesis.
12.11
                        Variable Two
      Variable One      203      326   |   529
                         68      110   |   178
                        271      436   |   707

Ho: Variable One is independent of Variable Two.
Ha: Variable One is not independent of Variable Two.

$e_{11} = \frac{(529)(271)}{707} = 202.77$      $e_{12} = \frac{(529)(436)}{707} = 326.23$

$e_{21} = \frac{(178)(271)}{707} = 68.23$       $e_{22} = \frac{(178)(436)}{707} = 109.77$

Observed frequencies with expected frequencies in parentheses:

                        Variable Two
      Variable One      203 (202.77)     326 (326.23)   |   529
                         68 (68.23)      110 (109.77)   |   178
                        271              436            |   707

$\chi^2 = \frac{(203 - 202.77)^2}{202.77} + \frac{(326 - 326.23)^2}{326.23} + \frac{(68 - 68.23)^2}{68.23} + \frac{(110 - 109.77)^2}{109.77}$
$= .00 + .00 + .00 + .00 = 0.00$

$\alpha = .01$, df = (c-1)(r-1) = (2-1)(2-1) = 1

$\chi^2_{.01,1} = 6.6349$

Since the observed $\chi^2 = 0.00 < \chi^2_{.01,1} = 6.6349$, the decision is to fail to reject
the null hypothesis.

Variable One is independent of Variable Two.
12.12
                        Variable Two
      Variable One       24     13     47     58   |   142
                         93     59    187    244   |   583
                        117     72    234    302   |   725

Ho: Variable One is independent of Variable Two.
Ha: Variable One is not independent of Variable Two.

$e_{11} = \frac{(142)(117)}{725} = 22.92$      $e_{12} = \frac{(142)(72)}{725} = 14.10$

$e_{13} = \frac{(142)(234)}{725} = 45.83$      $e_{14} = \frac{(142)(302)}{725} = 59.15$

$e_{21} = \frac{(583)(117)}{725} = 94.08$      $e_{22} = \frac{(583)(72)}{725} = 57.90$

$e_{23} = \frac{(583)(234)}{725} = 188.17$     $e_{24} = \frac{(583)(302)}{725} = 242.85$

Observed frequencies with expected frequencies in parentheses:

                        Variable Two
      Variable One       24 (22.92)    13 (14.10)    47 (45.83)     58 (59.15)    |   142
                         93 (94.08)    59 (57.90)   187 (188.17)   244 (242.85)   |   583
                        117            72           234            302            |   725

$\chi^2 = \frac{(24 - 22.92)^2}{22.92} + \frac{(13 - 14.10)^2}{14.10} + \frac{(47 - 45.83)^2}{45.83} + \frac{(58 - 59.15)^2}{59.15}
+ \frac{(93 - 94.08)^2}{94.08} + \frac{(59 - 57.90)^2}{57.90} + \frac{(187 - 188.17)^2}{188.17} + \frac{(244 - 242.85)^2}{242.85}$
$= .05 + .09 + .03 + .02 + .01 + .02 + .01 + .01 = 0.24$

$\alpha = .01$, df = (c-1)(r-1) = (4-1)(2-1) = 3, $\chi^2_{.01,3} = 11.3449$

Since the observed $\chi^2 = 0.24 < \chi^2_{.01,3} = 11.3449$, the decision is to fail to
reject the null hypothesis.

Variable One is independent of Variable Two.
12.13
                                 Social Class
                           Lower    Middle    Upper
      Number of    0          7       18         6   |    31
      Children     1          9       38        23   |    70
                   2 or 3    34       97        58   |   189
                   >3        47       31        30   |   108
                             97      184       117   |   398

Ho: Social Class is independent of Number of Children.
Ha: Social Class is not independent of Number of Children.

$e_{11} = \frac{(31)(97)}{398} = 7.56$       $e_{31} = \frac{(189)(97)}{398} = 46.06$

$e_{12} = \frac{(31)(184)}{398} = 14.33$     $e_{32} = \frac{(189)(184)}{398} = 87.38$

$e_{13} = \frac{(31)(117)}{398} = 9.11$      $e_{33} = \frac{(189)(117)}{398} = 55.56$

$e_{21} = \frac{(70)(97)}{398} = 17.06$      $e_{41} = \frac{(108)(97)}{398} = 26.32$

$e_{22} = \frac{(70)(184)}{398} = 32.36$     $e_{42} = \frac{(108)(184)}{398} = 49.93$

$e_{23} = \frac{(70)(117)}{398} = 20.58$     $e_{43} = \frac{(108)(117)}{398} = 31.75$

Observed frequencies with expected frequencies in parentheses:

                                 Social Class
                           Lower          Middle         Upper
      Number of    0        7 (7.56)      18 (14.33)      6 (9.11)    |    31
      Children     1        9 (17.06)     38 (32.36)     23 (20.58)   |    70
                   2 or 3  34 (46.06)     97 (87.38)     58 (55.56)   |   189
                   >3      47 (26.32)     31 (49.93)     30 (31.75)   |   108
                           97            184            117           |   398

$\chi^2 = \frac{(7 - 7.56)^2}{7.56} + \frac{(18 - 14.33)^2}{14.33} + \frac{(6 - 9.11)^2}{9.11} + \frac{(9 - 17.06)^2}{17.06}
+ \frac{(38 - 32.36)^2}{32.36} + \frac{(23 - 20.58)^2}{20.58} + \frac{(34 - 46.06)^2}{46.06} + \frac{(97 - 87.38)^2}{87.38}
+ \frac{(58 - 55.56)^2}{55.56} + \frac{(47 - 26.32)^2}{26.32} + \frac{(31 - 49.93)^2}{49.93} + \frac{(30 - 31.75)^2}{31.75}$
$= .04 + .94 + 1.06 + 3.81 + .98 + .28 + 3.16 + 1.06 + .11 + 16.25 + 7.18 + .10 = 34.97$

$\alpha = .05$, df = (c-1)(r-1) = (3-1)(4-1) = 6

$\chi^2_{.05,6} = 12.5916$

Since the observed $\chi^2 = 34.97 > \chi^2_{.05,6} = 12.5916$, the decision is to reject the
null hypothesis.

Number of children is not independent of social class.
12.14
                        Type of Music Preferred
                     Rock     R&B     Coun    Clssic
      Region   NE     140      32       5       18   |   195
               S      134      41      52        8   |   235
               W      154      27       8       13   |   202
                      428     100      65       39   |   632

Ho: Type of music preferred is independent of region.
Ha: Type of music preferred is not independent of region.

$e_{11} = \frac{(195)(428)}{632} = 132.06$     $e_{23} = \frac{(235)(65)}{632} = 24.17$

$e_{12} = \frac{(195)(100)}{632} = 30.85$      $e_{24} = \frac{(235)(39)}{632} = 14.50$

$e_{13} = \frac{(195)(65)}{632} = 20.06$       $e_{31} = \frac{(202)(428)}{632} = 136.80$

$e_{14} = \frac{(195)(39)}{632} = 12.03$       $e_{32} = \frac{(202)(100)}{632} = 31.96$

$e_{21} = \frac{(235)(428)}{632} = 159.15$     $e_{33} = \frac{(202)(65)}{632} = 20.78$

$e_{22} = \frac{(235)(100)}{632} = 37.18$      $e_{34} = \frac{(202)(39)}{632} = 12.47$

Observed frequencies with expected frequencies in parentheses:

                        Type of Music Preferred
                     Rock            R&B           Coun          Clssic
      Region   NE    140 (132.06)    32 (30.85)     5 (20.06)    18 (12.03)   |   195
               S     134 (159.15)    41 (37.18)    52 (24.17)     8 (14.50)   |   235
               W     154 (136.80)    27 (31.96)     8 (20.78)    13 (12.47)   |   202
                     428            100            65            39           |   632

$\chi^2 = \frac{(140 - 132.06)^2}{132.06} + \frac{(32 - 30.85)^2}{30.85} + \frac{(5 - 20.06)^2}{20.06} + \frac{(18 - 12.03)^2}{12.03}
+ \frac{(134 - 159.15)^2}{159.15} + \frac{(41 - 37.18)^2}{37.18} + \frac{(52 - 24.17)^2}{24.17} + \frac{(8 - 14.50)^2}{14.50}
+ \frac{(154 - 136.80)^2}{136.80} + \frac{(27 - 31.96)^2}{31.96} + \frac{(8 - 20.78)^2}{20.78} + \frac{(13 - 12.47)^2}{12.47}$
$= .48 + .04 + 11.31 + 2.96 + 3.97 + .39 + 32.04 + 2.91 + 2.16 + .77 + 7.86 + .02 = 64.91$

$\alpha = .01$, df = (c-1)(r-1) = (4-1)(3-1) = 6

$\chi^2_{.01,6} = 16.8119$

Since the observed $\chi^2 = 64.91 > \chi^2_{.01,6} = 16.8119$, the decision is to
reject the null hypothesis.

Type of music preferred is not independent of region of the country.
12.15
                              Transportation Mode
                              Air     Train    Truck
      Industry  Publishing     32       12       41   |    85
                Comp.Hard.      5        6       24   |    35
                               37       18       65   |   120

H0: Transportation Mode is independent of Industry.
Ha: Transportation Mode is not independent of Industry.

$e_{11} = \frac{(85)(37)}{120} = 26.21$     $e_{21} = \frac{(35)(37)}{120} = 10.79$

$e_{12} = \frac{(85)(18)}{120} = 12.75$     $e_{22} = \frac{(35)(18)}{120} = 5.25$

$e_{13} = \frac{(85)(65)}{120} = 46.04$     $e_{23} = \frac{(35)(65)}{120} = 18.96$

Observed frequencies with expected frequencies in parentheses:

                              Transportation Mode
                              Air            Train          Truck
      Industry  Publishing    32 (26.21)     12 (12.75)     41 (46.04)   |    85
                Comp.Hard.     5 (10.79)      6 (5.25)      24 (18.96)   |    35
                              37             18             65           |   120

$\chi^2 = \frac{(32 - 26.21)^2}{26.21} + \frac{(12 - 12.75)^2}{12.75} + \frac{(41 - 46.04)^2}{46.04}
+ \frac{(5 - 10.79)^2}{10.79} + \frac{(6 - 5.25)^2}{5.25} + \frac{(24 - 18.96)^2}{18.96}$
$= 1.28 + .04 + .55 + 3.11 + .11 + 1.34 = 6.43$

$\alpha = .05$, df = (c-1)(r-1) = (3-1)(2-1) = 2

$\chi^2_{.05,2} = 5.9915$

Since the observed $\chi^2 = 6.43 > \chi^2_{.05,2} = 5.9915$, the decision is to
reject the null hypothesis.

Transportation mode is not independent of industry.
12.16
                             Number of Bedrooms
                             ≤ 2       3      ≥ 4
      Number of      1       116     101       57   |   274
      Stories        2        90     325      160   |   575
                             206     426      217   |   849

H0: Number of Stories is independent of number of bedrooms.
Ha: Number of Stories is not independent of number of bedrooms.

$e_{11} = \frac{(274)(206)}{849} = 66.48$      $e_{21} = \frac{(575)(206)}{849} = 139.52$

$e_{12} = \frac{(274)(426)}{849} = 137.48$     $e_{22} = \frac{(575)(426)}{849} = 288.52$

$e_{13} = \frac{(274)(217)}{849} = 70.03$      $e_{23} = \frac{(575)(217)}{849} = 146.97$

$\chi^2 = \frac{(116 - 66.48)^2}{66.48} + \frac{(101 - 137.48)^2}{137.48} + \frac{(57 - 70.03)^2}{70.03}
+ \frac{(90 - 139.52)^2}{139.52} + \frac{(325 - 288.52)^2}{288.52} + \frac{(160 - 146.97)^2}{146.97}$

$\chi^2 = 36.89 + 9.68 + 2.42 + 17.58 + 4.61 + 1.16 = 72.34$

$\alpha = .10$     df = (c-1)(r-1) = (3-1)(2-1) = 2

$\chi^2_{.10,2} = 4.6052$

Since the observed $\chi^2 = 72.34 > \chi^2_{.10,2} = 4.6052$, the decision is to
reject the null hypothesis.

Number of stories is not independent of number of bedrooms.
12.17
                         Mexican Citizens
                           Yes      No
      Type of    Dept.      24      17   |    41
      Store      Disc.      20      15   |    35
                 Hard.      11      19   |    30
                 Shoe       32      28   |    60
                            87      79   |   166

Ho: Citizenship is independent of store type
Ha: Citizenship is not independent of store type

$e_{11} = \frac{(41)(87)}{166} = 21.49$     $e_{31} = \frac{(30)(87)}{166} = 15.72$

$e_{12} = \frac{(41)(79)}{166} = 19.51$     $e_{32} = \frac{(30)(79)}{166} = 14.28$

$e_{21} = \frac{(35)(87)}{166} = 18.34$     $e_{41} = \frac{(60)(87)}{166} = 31.45$

$e_{22} = \frac{(35)(79)}{166} = 16.66$     $e_{42} = \frac{(60)(79)}{166} = 28.55$

Observed frequencies with expected frequencies in parentheses:

                         Mexican Citizens
                           Yes            No
      Type of    Dept.     24 (21.49)     17 (19.51)   |    41
      Store      Disc.     20 (18.34)     15 (16.66)   |    35
                 Hard.     11 (15.72)     19 (14.28)   |    30
                 Shoe      32 (31.45)     28 (28.55)   |    60
                           87             79           |   166

$\chi^2 = \frac{(24 - 21.49)^2}{21.49} + \frac{(17 - 19.51)^2}{19.51} + \frac{(20 - 18.34)^2}{18.34} + \frac{(15 - 16.66)^2}{16.66}
+ \frac{(11 - 15.72)^2}{15.72} + \frac{(19 - 14.28)^2}{14.28} + \frac{(32 - 31.45)^2}{31.45} + \frac{(28 - 28.55)^2}{28.55}$
$= .29 + .32 + .15 + .17 + 1.42 + 1.56 + .01 + .01 = 3.93$

$\alpha = .05$, df = (c-1)(r-1) = (2-1)(4-1) = 3

$\chi^2_{.05,3} = 7.8147$

Since the observed $\chi^2 = 3.93 < \chi^2_{.05,3} = 7.8147$, the decision is to fail to
reject the null hypothesis.

Citizenship is independent of type of store.
12.18  $\alpha = .01$, k = 7, df = 6

H0: The observed distribution is the same as the expected distribution
Ha: The observed distribution is not the same as the expected distribution

Use: $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$

critical $\chi^2_{.01,6} = 16.8119$

       $f_o$     $f_e$     $(f_o - f_e)^2$     $(f_o - f_e)^2 / f_e$
        214      206           64                0.311
        235      232            9                0.039
        279      268          121                0.451
        281      284            9                0.032
        264      268           16                0.060
        254      232          484                2.086
        211      206           25                0.121
                                                 3.100

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 3.100$

Since the observed value of $\chi^2 = 3.1 < \chi^2_{.01,6} = 16.8119$, the decision is to fail to
reject the null hypothesis. The observed distribution is not different from the
expected distribution.
12.19
                       Variable 2
      Variable 1       12     23     21   |    56
                        8     17     20   |    45
                        7     11     18   |    36
                       27     51     59   |   137

$e_{11} = 11.04$     $e_{12} = 20.85$     $e_{13} = 24.12$
$e_{21} = 8.87$      $e_{22} = 16.75$     $e_{23} = 19.38$
$e_{31} = 7.09$      $e_{32} = 13.40$     $e_{33} = 15.50$

$\chi^2 = \frac{(12 - 11.04)^2}{11.04} + \frac{(23 - 20.85)^2}{20.85} + \frac{(21 - 24.12)^2}{24.12}
+ \frac{(8 - 8.87)^2}{8.87} + \frac{(17 - 16.75)^2}{16.75} + \frac{(20 - 19.38)^2}{19.38}
+ \frac{(7 - 7.09)^2}{7.09} + \frac{(11 - 13.40)^2}{13.40} + \frac{(18 - 15.50)^2}{15.50}$
$= .084 + .222 + .403 + .085 + .004 + .020 + .001 + .430 + .402 = 1.652$

df = (c-1)(r-1) = (2)(2) = 4     $\alpha = .05$

$\chi^2_{.05,4} = 9.4877$

Since the observed value of $\chi^2 = 1.652 < \chi^2_{.05,4} = 9.4877$, the decision is to fail
to reject the null hypothesis.
12.20
                                  Location
                               NE       W       S
      Customer   Industrial    230     115      68   |   413
                 Retail        185     143      89   |   417
                               415     258     157   |   830

$e_{11} = \frac{(413)(415)}{830} = 206.5$      $e_{21} = \frac{(417)(415)}{830} = 208.5$

$e_{12} = \frac{(413)(258)}{830} = 128.38$     $e_{22} = \frac{(417)(258)}{830} = 129.62$

$e_{13} = \frac{(413)(157)}{830} = 78.12$      $e_{23} = \frac{(417)(157)}{830} = 78.88$

Observed frequencies with expected frequencies in parentheses:

                                  Location
                               NE              W               S
      Customer   Industrial    230 (206.5)     115 (128.38)    68 (78.12)   |   413
                 Retail        185 (208.5)     143 (129.62)    89 (78.88)   |   417
                               415             258             157          |   830

$\chi^2 = \frac{(230 - 206.5)^2}{206.5} + \frac{(115 - 128.38)^2}{128.38} + \frac{(68 - 78.12)^2}{78.12}
+ \frac{(185 - 208.5)^2}{208.5} + \frac{(143 - 129.62)^2}{129.62} + \frac{(89 - 78.88)^2}{78.88}$
$= 2.67 + 1.39 + 1.31 + 2.65 + 1.38 + 1.30 = 10.70$

$\alpha = .10$ and df = (c - 1)(r - 1) = (3 - 1)(2 - 1) = 2

$\chi^2_{.10,2} = 4.6052$

Since the observed $\chi^2 = 10.70 > \chi^2_{.10,2} = 4.6052$, the decision is to reject the
null hypothesis.

Type of customer is not independent of geographic region.
12.21  Cookie Type          $f_o$
       Chocolate Chip        189
       Peanut Butter         168
       Cheese Cracker        155
       Lemon Flavored        161
       Chocolate Mint        216
       Vanilla Filled        165
                   $\sum f_o = 1,054$

Ho: Cookie Sales is uniformly distributed across kind of cookie.
Ha: Cookie Sales is not uniformly distributed across kind of cookie.

If cookie sales are uniformly distributed, then
$f_e = \frac{\sum f_o}{\text{no. kinds}} = \frac{1{,}054}{6} = 175.67$

       $f_o$      $f_e$      $(f_o - f_e)^2 / f_e$
        189      175.67        1.01
        168      175.67        0.33
        155      175.67        2.43
        161      175.67        1.23
        216      175.67        9.26
        165      175.67        0.65
                              14.91

The observed $\chi^2 = 14.91$

$\alpha = .05$     df = k - 1 = 6 - 1 = 5

$\chi^2_{.05,5} = 11.0705$

Since the observed $\chi^2 = 14.91 > \chi^2_{.05,5} = 11.0705$, the decision is to reject the
null hypothesis.

Cookie Sales is not uniformly distributed by kind of cookie.
12.22
                          Gender
                          M         F
      Bought     Y        207        65   |     272
      Car        N        811       984   |   1,795
                        1,018     1,049   |   2,067

Ho: Purchasing a car or not is independent of gender.
Ha: Purchasing a car or not is not independent of gender.

$e_{11} = \frac{(272)(1{,}018)}{2{,}067} = 133.96$       $e_{12} = \frac{(272)(1{,}049)}{2{,}067} = 138.04$

$e_{21} = \frac{(1{,}795)(1{,}018)}{2{,}067} = 884.04$     $e_{22} = \frac{(1{,}795)(1{,}049)}{2{,}067} = 910.96$

Observed frequencies with expected frequencies in parentheses:

                          Gender
                          M                F
      Bought     Y        207 (133.96)      65 (138.04)   |     272
      Car        N        811 (884.04)     984 (910.96)   |   1,795
                        1,018            1,049            |   2,067

$\chi^2 = \frac{(207 - 133.96)^2}{133.96} + \frac{(65 - 138.04)^2}{138.04} + \frac{(811 - 884.04)^2}{884.04} + \frac{(984 - 910.96)^2}{910.96}$
$= 39.82 + 38.65 + 6.03 + 5.86 = 90.36$

$\alpha = .05$     df = (c-1)(r-1) = (2-1)(2-1) = 1

$\chi^2_{.05,1} = 3.8415$

Since the observed $\chi^2 = 90.36 > \chi^2_{.05,1} = 3.8415$, the decision is to reject the
null hypothesis.

Purchasing a car is not independent of gender.
12.23   Arrivals    $f_o$    ($f_o$)(Arrivals)
           0         26          0
           1         40         40
           2         57        114
           3         32         96
           4         17         68
           5         12         60
           6          8         48
            $\sum f_o = 192$   $\sum (f_o)(\text{arrivals}) = 426$

$\lambda = \frac{\sum (f_o)(\text{arrivals})}{\sum f_o} = \frac{426}{192} = 2.2$

Ho: The observed frequencies are Poisson distributed.
Ha: The observed frequencies are not Poisson distributed.

        Arrivals    Probability    $f_e$
           0          .1108        (.1108)(192) = 21.27
           1          .2438        (.2438)(192) = 46.81
           2          .2681                       51.48
           3          .1966                       37.75
           4          .1082                       20.77
           5          .0476                        9.14
           6          .0249                        4.78

        $f_o$     $f_e$     $(f_o - f_e)^2 / f_e$
         26      21.27        1.05
         40      46.81        0.99
         57      51.48        0.59
         32      37.75        0.88
         17      20.77        0.68
         12       9.14        0.89
          8       4.78        2.17
                              7.25

Observed $\chi^2 = 7.25$

$\alpha = .05$     df = k - 2 = 7 - 2 = 5

$\chi^2_{.05,5} = 11.0705$

Since the observed $\chi^2 = 7.25 < \chi^2_{.05,5} = 11.0705$, the decision is to fail to reject
the null hypothesis. There is not enough evidence to reject the claim that the
observed frequency of arrivals is Poisson distributed.
12.24  Ho: The distribution of observed frequencies is the same as the
       distribution of expected frequencies.
       Ha: The distribution of observed frequencies is not the same as the distribution of
       expected frequencies.

       Soft Drink       $f_o$    proportions    $f_e$                     $(f_o - f_e)^2 / f_e$
       Classic Coke      314       .179         (.179)(1726) = 308.95       0.08
       Pepsi             219       .115         (.115)(1726) = 198.49       2.12
       Diet Coke         212       .097                        167.42      11.87
       Mt. Dew           121       .063                        108.74       1.38
       Diet Pepsi         98       .061                        105.29       0.50
       Sprite             93       .057                         98.32       0.29
       Dr. Pepper         88       .056                         96.66       0.78
       Others            581       .372                        642.07       5.81
              $\sum f_o = 1,726$                                           22.83

Observed $\chi^2 = 22.83$

$\alpha = .05$     df = k - 1 = 8 - 1 = 7

$\chi^2_{.05,7} = 14.0671$

Since the observed $\chi^2 = 22.83 > \chi^2_{.05,7} = 14.0671$, the decision is to reject the
null hypothesis.

The observed frequencies are not distributed the same as the expected frequencies
from the national poll.
12.25
                                      Position
                                                           Systems
                      Manager    Programmer    Operator    Analyst
      Years   0-3        6           37           11          13    |    67
              4-8       28           16           23          24    |    91
              > 8       47           10           12          19    |    88
                        81           63           46          56    |   246

$e_{11} = \frac{(67)(81)}{246} = 22.06$     $e_{23} = \frac{(91)(46)}{246} = 17.02$

$e_{12} = \frac{(67)(63)}{246} = 17.16$     $e_{24} = \frac{(91)(56)}{246} = 20.72$

$e_{13} = \frac{(67)(46)}{246} = 12.53$     $e_{31} = \frac{(88)(81)}{246} = 28.98$

$e_{14} = \frac{(67)(56)}{246} = 15.25$     $e_{32} = \frac{(88)(63)}{246} = 22.54$

$e_{21} = \frac{(91)(81)}{246} = 29.96$     $e_{33} = \frac{(88)(46)}{246} = 16.46$

$e_{22} = \frac{(91)(63)}{246} = 23.30$     $e_{34} = \frac{(88)(56)}{246} = 20.03$

Observed frequencies with expected frequencies in parentheses:

                                      Position
                                                                 Systems
                      Manager       Programmer    Operator       Analyst
      Years   0-3      6 (22.06)    37 (17.16)    11 (12.53)     13 (15.25)   |    67
              4-8     28 (29.96)    16 (23.30)    23 (17.02)     24 (20.72)   |    91
              > 8     47 (28.98)    10 (22.54)    12 (16.46)     19 (20.03)   |    88
                      81            63            46             56           |   246

$\chi^2 = \frac{(6 - 22.06)^2}{22.06} + \frac{(37 - 17.16)^2}{17.16} + \frac{(11 - 12.53)^2}{12.53} + \frac{(13 - 15.25)^2}{15.25}
+ \frac{(28 - 29.96)^2}{29.96} + \frac{(16 - 23.30)^2}{23.30} + \frac{(23 - 17.02)^2}{17.02} + \frac{(24 - 20.72)^2}{20.72}
+ \frac{(47 - 28.98)^2}{28.98} + \frac{(10 - 22.54)^2}{22.54} + \frac{(12 - 16.46)^2}{16.46} + \frac{(19 - 20.03)^2}{20.03}$
$= 11.69 + 22.94 + .19 + .33 + .13 + 2.29 + 2.1 + .52 + 11.2 + 6.98 + 1.21 + .05 = 59.63$

$\alpha = .01$     df = (c-1)(r-1) = (4-1)(3-1) = 6

$\chi^2_{.01,6} = 16.8119$

Since the observed $\chi^2 = 59.63 > \chi^2_{.01,6} = 16.8119$, the decision is to reject the
null hypothesis. Position is not independent of number of years of experience.
12.26  H0: p = .43        n = 315    $\alpha = .05$
       Ha: p ≠ .43        x = 120    $\alpha/2 = .025$

                                    $f_o$     $f_e$                    $(f_o - f_e)^2 / f_e$
       More Work, More Business      120     (.43)(315) = 135.45         1.76
       Others                        195     (.57)(315) = 179.55         1.33
       Total                         315                  315.00         3.09

The observed value of $\chi^2$ is 3.09

$\alpha = .05$ and $\alpha/2 = .025$     df = k - 1 = 2 - 1 = 1

$\chi^2_{.025,1} = 5.0239$

Since $\chi^2 = 3.09 < \chi^2_{.025,1} = 5.0239$, the decision is to fail to reject the null
hypothesis.
12.27
                            Type of College or University
                          Community      Large         Small
                          College        University    College
      Number of    0         25            178           31     |   234
      Children     1         49            141           12     |   202
                   2         31             54            8     |    93
                   ≥ 3       22             14            6     |    42
                            127            387           57     |   571

Ho: Number of Children is independent of Type of College or University.
Ha: Number of Children is not independent of Type of College or University.

$e_{11} = \frac{(234)(127)}{571} = 52.05$      $e_{31} = \frac{(93)(127)}{571} = 20.68$

$e_{12} = \frac{(234)(387)}{571} = 158.60$     $e_{32} = \frac{(93)(387)}{571} = 63.03$

$e_{13} = \frac{(234)(57)}{571} = 23.36$       $e_{33} = \frac{(93)(57)}{571} = 9.28$

$e_{21} = \frac{(202)(127)}{571} = 44.93$      $e_{41} = \frac{(42)(127)}{571} = 9.34$

$e_{22} = \frac{(202)(387)}{571} = 136.91$     $e_{42} = \frac{(42)(387)}{571} = 28.47$

$e_{23} = \frac{(202)(57)}{571} = 20.16$       $e_{43} = \frac{(42)(57)}{571} = 4.19$

Observed frequencies with expected frequencies in parentheses:

                            Type of College or University
                          Community      Large           Small
                          College        University      College
      Number of    0      25 (52.05)     178 (158.60)    31 (23.36)   |   234
      Children     1      49 (44.93)     141 (136.91)    12 (20.16)   |   202
                   2      31 (20.68)      54 (63.03)      8 (9.28)    |    93
                   ≥ 3    22 (9.34)       14 (28.47)      6 (4.19)    |    42
                         127             387             57           |   571

$\chi^2 = \frac{(25 - 52.05)^2}{52.05} + \frac{(178 - 158.6)^2}{158.6} + \frac{(31 - 23.36)^2}{23.36} + \frac{(49 - 44.93)^2}{44.93}
+ \frac{(141 - 136.91)^2}{136.91} + \frac{(12 - 20.16)^2}{20.16} + \frac{(31 - 20.68)^2}{20.68} + \frac{(54 - 63.03)^2}{63.03}
+ \frac{(8 - 9.28)^2}{9.28} + \frac{(22 - 9.34)^2}{9.34} + \frac{(14 - 28.47)^2}{28.47} + \frac{(6 - 4.19)^2}{4.19}$
$= 14.06 + 2.37 + 2.50 + 0.37 + 0.12 + 3.30 + 5.15 + 1.29 + 0.18 + 17.16 + 7.35 + 0.78 = 54.63$

$\alpha = .05$, df = (c - 1)(r - 1) = (3 - 1)(4 - 1) = 6

$\chi^2_{.05,6} = 12.5916$

Since the observed $\chi^2 = 54.63 > \chi^2_{.05,6} = 12.5916$, the decision is to reject the
null hypothesis.

Number of children is not independent of type of College or University.
12.28 The observed chi-square is 30.18 with a p-value of .0000043. The chi-square
goodness-of-fit test indicates that there is a significant difference between the
observed frequencies and the expected frequencies. The distribution of responses
to the question is not the same for adults between 21 and 30 years of age as it
is for others. Marketing and sales people might reorient their efforts aimed at 21- to
30-year-olds away from home improvement and pay more attention to leisure
travel/vacation, clothing, and home entertainment.

12.29 The observed chi-square value for this test of independence is 5.366. The
associated p-value of .252 indicates failure to reject the null hypothesis. There is
not enough evidence here to say that color choice is dependent upon gender.
Automobile marketing people need not worry about which colors especially
appeal to men or to women, because the data show no evidence that car color
depends on gender. In addition, design and production people can determine car
color quotas based on other variables.