STAT 22000 Lecture Slides: Exploring Categorical Data
Yibi Huang, Department of Statistics, University of Chicago
Outline
This set of slides covers Section 1.7 of the text.
• Ways to summarize a single categorical variable
  • frequency tables
  • bar plots, pie charts
• Ways to summarize relationships between two categorical variables
  • two-way contingency tables
  • segmented bar plots, standardized segmented bar plots, mosaic plots
Bar Graphs and Pie Charts
Graphs for Categorical Variables
A categorical variable is summarized by a table showing the count or the percentage of cases in each category, and is often displayed by a bar plot or a pie chart.
Ex: Passengers on Titanic

| Class | Freq | Percent |
|-------|------|---------|
| 1st | 325 | 14.8% |
| 2nd | 285 | 12.9% |
| 3rd | 706 | 32.1% |
| Crew | 885 | 40.2% |
| Total | 2201 | 100% |

[Figure: bar plot of Number of People by Class (1st, 2nd, 3rd, Crew), alongside a pie chart of the same four classes]
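A minimal Python sketch of how the frequency table above can be built from case-level data. It assumes the data arrive as one class label per person; the list `people` is a hypothetical reconstruction from the counts in the table, not an actual Titanic data file.

```python
from collections import Counter

# Hypothetical case-level data: one class label per person,
# rebuilt from the counts in the frequency table above.
people = ["1st"] * 325 + ["2nd"] * 285 + ["3rd"] * 706 + ["Crew"] * 885

def frequency_table(labels, order):
    """Return (category, count, percent) rows in the given category order."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(cat, counts[cat], 100 * counts[cat] / total) for cat in order]

table = frequency_table(people, ["1st", "2nd", "3rd", "Crew"])
for cat, freq, pct in table:
    print(f"{cat:<6}{freq:>6}{pct:>8.1f}%")
print(f"{'Total':<6}{len(people):>6}{100:>8.0f}%")
```

The category order is passed in explicitly because a bar plot (or table) of a categorical variable may list categories in any order; here the natural class order is kept.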
Bar plots
A bar plot is a common way to display a single categorical variable. A bar plot where proportions instead of frequencies are shown is called a relative frequency bar plot.
[Figure: two bar plots by Class (1st, 2nd, 3rd, Crew): one showing Number of People, and a relative frequency version showing Percentage of People]
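A relative frequency bar plot differs from the count version only in its vertical scale: every count is divided by the same total, so the bars keep the same relative heights. The sketch below draws text bars from the proportions; the function names and the 50-character width are illustrative choices, not part of the slides.

```python
counts = {"1st": 325, "2nd": 285, "3rd": 706, "Crew": 885}

def relative_frequencies(counts):
    """Divide each count by the grand total so the values sum to 1."""
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}

def text_bars(props, width=50):
    """One line per category; bar length is proportional to its share."""
    return [f"{cat:<5}{'#' * round(p * width)} {p:.1%}" for cat, p in props.items()]

props = relative_frequencies(counts)
for line in text_bars(props):
    print(line)
```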
How are Bar Plots Different From Histograms?
• Bar plots are used for displaying distributions of categorical variables, while histograms are used for numerical variables.
• The horizontal axis in a histogram is a number line, hence the order of the bars cannot be changed, while in a bar plot the categories can be listed in any order (though some orderings make more sense than others, especially for ordinal variables).
Why Do We Recommend Bar Plots Over Pie Charts?
In a pie chart, the areas of the slices represent the percentages of the categories. However, it is generally more difficult to compare group sizes in a pie chart than in a bar plot, especially when categories have nearly identical counts or proportions.
[Figure: pie chart of people on the Titanic by class: 1st, 2nd, 3rd, Crew]
Without looking at the counts, can you tell from the pie which class has the fewest people?
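One way to see why the pie is hard to read: a slice's central angle is 360° times its proportion. Using the class counts from the frequency table, the 1st- and 2nd-class slices differ by only about 6.5 degrees, a gap the eye judges poorly, while bars of heights 325 and 285 side by side are easy to compare. A minimal sketch:

```python
counts = {"1st": 325, "2nd": 285, "3rd": 706, "Crew": 885}
total = sum(counts.values())

# Each slice's central angle is its share of the full 360-degree circle.
angles = {cat: 360 * n / total for cat, n in counts.items()}
for cat, deg in angles.items():
    print(f"{cat:<5}{deg:6.1f} degrees")

gap = angles["1st"] - angles["2nd"]
print(f"1st vs 2nd differ by {gap:.1f} degrees")
```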
Why Do We Recommend Bar Plots Over Pie Charts? (Cont’d)
It’s much easier to make a wrong pie chart than a wrong bar plot. In a pie chart, the categories must make up a whole. There is no such restriction for a bar plot.
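The requirement that pie-chart categories make up a whole can be phrased as a check: the slices must be non-negative parts of one total, so their percentages sum to 100%. A minimal sketch; the function name `is_valid_pie`, the 0.5-point rounding tolerance, and the survey numbers are hypothetical, chosen only for illustration.

```python
import math

def is_valid_pie(percentages, tol=0.5):
    """True if the values could be slices of one pie: all non-negative
    and summing to 100% (within a rounding tolerance, in percentage points)."""
    if any(p < 0 for p in percentages):
        return False
    return math.isclose(sum(percentages), 100, abs_tol=tol)

# Titanic class shares: the four classes partition everyone on board.
print(is_valid_pie([14.8, 12.9, 32.1, 40.2]))  # True
# Hypothetical "select all that apply" survey: shares overlap and
# exceed 100%, so a pie chart would be wrong (a bar plot is fine).
print(is_valid_pie([60, 45, 30]))              # False
```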
Another Wrong Pie Chart
http://www.youtube.com/watch?v=-rbyhj8uTT8
Two-Way Contingency Tables
A table that summarizes data for two categorical variables is called a contingency table.
E.g., the breakdown of people on the Titanic by class and survival status:

| Class | Died | Survived | Total |
|-------|------|----------|-------|
| 1st | 122 | 203 | 325 |
| 2nd | 167 | 118 | 285 |
| 3rd | 528 | 178 | 706 |
| Crew | 673 | 212 | 885 |
| Sum | 1490 | 711 | 2201 |

The marginal totals give the distributions of the two variables, e.g.,
• overall, 1490 died and 711 survived
• there were 325, 285, and 706 passengers in the 1st, 2nd, and 3rd classes, and 885 crew members
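The marginal totals are simply the row sums and column sums of the cell counts. A short sketch, storing the table as a nested list (a layout chosen here for illustration):

```python
classes = ["1st", "2nd", "3rd", "Crew"]
outcomes = ["Died", "Survived"]
# counts[i][j] = number of people in class i with outcome j
counts = [
    [122, 203],
    [167, 118],
    [528, 178],
    [673, 212],
]

row_totals = [sum(row) for row in counts]        # people per class
col_totals = [sum(col) for col in zip(*counts)]  # died / survived overall
grand_total = sum(row_totals)

print(dict(zip(classes, row_totals)))
print(dict(zip(outcomes, col_totals)))
print(grand_total)
```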
Overall Proportions
Dividing the cell counts in a contingency table by the overall total, we get the proportions of observations in the combinations of the two variables.

| Class | Survived: No | Survived: Yes | Total |
|-------|--------------|---------------|-------|
| 1st | 122/2201 ≈ 0.06 | 203/2201 ≈ 0.09 | 325/2201 ≈ 0.15 |
| 2nd | 167/2201 ≈ 0.08 | 118/2201 ≈ 0.05 | 285/2201 ≈ 0.13 |
| 3rd | 528/2201 ≈ 0.24 | 178/2201 ≈ 0.08 | 706/2201 ≈ 0.32 |
| Crew | 673/2201 ≈ 0.31 | 212/2201 ≈ 0.10 | 885/2201 ≈ 0.40 |
| Sum | 1490/2201 ≈ 0.68 | 711/2201 ≈ 0.32 | 1 |

E.g., of the people on the Titanic,
• 122/2201 ≈ 6% were in the 1st class and died in the disaster
• 212/2201 ≈ 10% were crew members who survived
Note that the marginal totals give the distributions of the two variables, e.g.,
• Overall, 711/2201 ≈ 32% of the people survived
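A minimal sketch of the overall proportions: every cell is divided by the same grand total, so all cells together sum to 1. The dictionary layout is an illustrative choice.

```python
counts = {
    "1st":  {"No": 122, "Yes": 203},
    "2nd":  {"No": 167, "Yes": 118},
    "3rd":  {"No": 528, "Yes": 178},
    "Crew": {"No": 673, "Yes": 212},
}
n = sum(sum(row.values()) for row in counts.values())  # overall total

# Divide every cell by the overall total.
overall = {cls: {s: c / n for s, c in row.items()} for cls, row in counts.items()}

for cls, row in overall.items():
    print(cls, {s: round(p, 2) for s, p in row.items()})

# The "Yes" column sums to the overall survival proportion.
print(round(sum(row["Yes"] for row in overall.values()), 2))
```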
Row Proportions
The row proportions (cell counts divided by the corresponding row totals) give the proportion of people who survived in each of the four classes.

| Class | Survived: No | Survived: Yes | Total |
|-------|--------------|---------------|-------|
| 1st | 122/325 ≈ 0.38 | 203/325 ≈ 0.62 | 1 |
| 2nd | 167/285 ≈ 0.59 | 118/285 ≈ 0.41 | 1 |
| 3rd | 528/706 ≈ 0.75 | 178/706 ≈ 0.25 | 1 |
| Crew | 673/885 ≈ 0.76 | 212/885 ≈ 0.24 | 1 |

E.g.,
• 203/325 ≈ 62% of people in the 1st class survived.
• 178/706 ≈ 25% of people in the 3rd class survived.
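A sketch of the row-proportion computation: each cell is divided by its own row total, so every row sums to 1. The helper name `row_proportions` is illustrative.

```python
counts = {
    "1st":  {"No": 122, "Yes": 203},
    "2nd":  {"No": 167, "Yes": 118},
    "3rd":  {"No": 528, "Yes": 178},
    "Crew": {"No": 673, "Yes": 212},
}

def row_proportions(table):
    """Divide each cell by its row total, so every row sums to 1."""
    result = {}
    for row_name, row in table.items():
        row_total = sum(row.values())
        result[row_name] = {col: c / row_total for col, c in row.items()}
    return result

survival_by_class = row_proportions(counts)
for cls, row in survival_by_class.items():
    print(f"{cls:<5} survived: {row['Yes']:.0%}")
```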
Column Proportions
The column proportions (cell counts divided by the corresponding column totals) give the breakdown of the four classes among those who died and among those who survived.

| Class | Survived: No | Survived: Yes |
|-------|--------------|---------------|
| 1st | 122/1490 ≈ 0.08 | 203/711 ≈ 0.29 |
| 2nd | 167/1490 ≈ 0.11 | 118/711 ≈ 0.17 |
| 3rd | 528/1490 ≈ 0.35 | 178/711 ≈ 0.25 |
| Crew | 673/1490 ≈ 0.45 | 212/711 ≈ 0.30 |
| Sum | 1 | 1 |

• Among those who survived, 203/711 ≈ 29% were in the 1st class.
• Among those who died, 673/1490 ≈ 45% were crew members.
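The column version of the same computation divides each cell by its column total, so each column (not each row) sums to 1. A minimal sketch with an illustrative helper name:

```python
counts = {
    "1st":  {"No": 122, "Yes": 203},
    "2nd":  {"No": 167, "Yes": 118},
    "3rd":  {"No": 528, "Yes": 178},
    "Crew": {"No": 673, "Yes": 212},
}

def column_proportions(table):
    """Divide each cell by its column total, so every column sums to 1."""
    columns = list(next(iter(table.values())).keys())
    col_totals = {col: sum(row[col] for row in table.values()) for col in columns}
    return {
        row_name: {col: row[col] / col_totals[col] for col in columns}
        for row_name, row in table.items()
    }

class_given_outcome = column_proportions(counts)
print(f"{class_given_outcome['1st']['Yes']:.0%} of survivors were in the 1st class")
print(f"{class_given_outcome['Crew']['No']:.0%} of those who died were crew members")
```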
Independence of Two Categorical Variables
If the row proportions do not change from row to row, we say the two categorical variables are independent. Otherwise, we say they are associated.
E.g., if the survival rates do not change from class to class, we say ‘survival’ is independent of ‘class’. In the Titanic data, the survival of passengers is associated with the class they were in because the survival rates differ substantially from class to class.
We can also define two categorical variables to be independent if the column proportions do not vary from column to column, since the two conditions are equivalent (why?)
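A numerical illustration (not a proof) of the equivalence in the last paragraph. It uses exact fractions so that "do not change" can be tested as exact equality; the table `made_up` is hypothetical, built so that every row has the same 30% "Yes" rate. Column proportions of a table are the row proportions of its transpose, so the same helper checks both conditions.

```python
from fractions import Fraction

def row_proportions(table):
    """Exact row proportions: each cell divided by its row total."""
    return [[Fraction(c, sum(row)) for c in row] for row in table]

def all_rows_equal(props):
    """True if every row of proportions matches the first row."""
    return all(row == props[0] for row in props)

def transpose(table):
    return [list(col) for col in zip(*table)]

titanic = [[122, 203], [167, 118], [528, 178], [673, 212]]  # (Died, Survived)
made_up = [[70, 30], [140, 60], [350, 150]]  # hypothetical: 30% "Yes" in every row

for name, table in [("titanic", titanic), ("made_up", made_up)]:
    same_rows = all_rows_equal(row_proportions(table))
    # Column proportions = row proportions of the transposed table.
    same_cols = all_rows_equal(row_proportions(transpose(table)))
    print(name, same_rows, same_cols)
```

With real data the row proportions are almost never exactly equal, so in practice one judges whether they differ substantially, as in the Titanic example.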
Exercise
The table below shows the breakdown of injury cases in the U.S. in a certain year by circumstance and gender.¹ Counts are in millions.

| Gender | Work | Home | Other | Total |
|--------|------|------|-------|-------|
| Male | 8.0 | 9.8 | 17.8 | 35.6 |
| Female | 1.3 | 11.6 | 12.9 | 25.8 |
| Total | 9.3 | 21.4 | 30.7 | 61.4 |

• What proportion of injury cases occurred at work? 9.3/61.4 ≈ 0.15
• What proportion of injury cases occurred at work and involved women? 1.3/61.4 ≈ 0.02
¹Source: Vital and Health Statistics, published by the National Center for Health Statistics
Exercise (Cont’d)

| Gender | Work | Home | Other | Total |
|--------|------|------|-------|-------|
| Male | 8.0 | 9.8 | 17.8 | 35.6 |
| Female | 1.3 | 11.6 | 12.9 | 25.8 |
| Total | 9.3 | 21.4 | 30.7 | 61.4 |

• Among all injury cases involving women, what proportion occurred at work? 1.3/25.8 ≈ 0.05
• Among all injury cases that occurred at work, what proportion involved women? 1.3/9.3 ≈ 0.14
• Is the circumstance of injury cases independent of the gender of the victims? No: only 5% of injury cases involving women occurred at work, compared with 8.0/35.6 ≈ 22% of injury cases involving men.
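The exercise answers can be reproduced in a few lines; the sketch labels each one by which kind of proportion it is (overall, joint, row, or column). The dictionary keys are illustrative labels, with "|" read as "among".

```python
# Injury cases in millions, from the table above.
injuries = {
    "Male":   {"Work": 8.0, "Home": 9.8, "Other": 17.8},
    "Female": {"Work": 1.3, "Home": 11.6, "Other": 12.9},
}
gender_totals = {g: sum(row.values()) for g, row in injuries.items()}
work_total = sum(row["Work"] for row in injuries.values())
grand_total = sum(gender_totals.values())

answers = {
    "at work": work_total / grand_total,                                       # marginal
    "at work and female": injuries["Female"]["Work"] / grand_total,            # overall cell
    "at work | female": injuries["Female"]["Work"] / gender_totals["Female"],  # row
    "female | at work": injuries["Female"]["Work"] / work_total,               # column
    "at work | male": injuries["Male"]["Work"] / gender_totals["Male"],        # row
}
for question, p in answers.items():
    print(f"{question:<20}{p:.2f}")
```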
Segmented Bar and Mosaic Plots
Segmented Bar Plots
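
| Class | Survived: No | Survived: Yes | Total |
|-------|--------------|---------------|-------|
| 1st | 122 | 203 | 325 |
| 2nd | 167 | 118 | 285 |
| 3rd | 528 | 178 | 706 |
| Crew | 673 | 212 | 885 |
| Sum | 1490 | 711 | 2201 |
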
[Figure: segmented bar plot of frequency (Freq) by Class (1st, 2nd, 3rd, Crew), each bar split into Survived = No and Survived = Yes]
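In a segmented bar plot, each class gets one bar whose total height is the row total, split into stacked segments. The sketch computes the bottom and top of each segment, which is what a plotting routine needs to draw the stack; putting "No" below "Yes" follows the table's column order and is otherwise an arbitrary choice.

```python
counts = {
    "1st":  [("No", 122), ("Yes", 203)],
    "2nd":  [("No", 167), ("Yes", 118)],
    "3rd":  [("No", 528), ("Yes", 178)],
    "Crew": [("No", 673), ("Yes", 212)],
}

def stack_segments(segments):
    """Return (label, bottom, top) for segments stacked upward from 0."""
    bottom = 0
    stacked = []
    for label, height in segments:
        stacked.append((label, bottom, bottom + height))
        bottom += height
    return stacked

bars = {cls: stack_segments(segs) for cls, segs in counts.items()}
for cls, segs in bars.items():
    print(cls, segs)  # the top of the last segment equals the row total
```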
Standardized Segmented Bar Plots
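
| Class | Survived: No | Survived: Yes | Total |
|-------|--------------|---------------|-------|
| 1st | 0.38 | 0.62 | 1 |
| 2nd | 0.59 | 0.41 | 1 |
| 3rd | 0.75 | 0.25 | 1 |
| Crew | 0.76 | 0.24 | 1 |
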
[Figure: standardized segmented bar plot by Class (1st, 2nd, 3rd, Crew); every bar has height 1 and is split into Survived = No and Survived = Yes]
Standardized segmented bar plots are convenient for comparing row proportions and for determining whether the two variables are independent.
However, the row totals are lost after standardization.
Mosaic Plots
• bar widths = row totals
• segment lengths within a bar = row proportions
[Figure: mosaic plot of Survived (No, Yes) by Class (1st, 2nd, 3rd, Crew)]
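$$
\begin{aligned}
\text{segment area} &= (\text{bar width}) \times (\text{segment length}) \\
&= \text{row total} \times (\text{row proportion}) \\
&= \text{row total} \times \frac{\text{cell count}}{\text{row total}} \\
&= \text{cell count}
\end{aligned}
$$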
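The area identity for mosaic plots can be checked numerically. In this sketch the bar width is the row total and the segment length is the row proportion, kept as an exact fraction; in an actual plot both would be rescaled to fit the figure, which multiplies every area by the same constant and so preserves the comparison.

```python
from fractions import Fraction

counts = {
    "1st":  {"No": 122, "Yes": 203},
    "2nd":  {"No": 167, "Yes": 118},
    "3rd":  {"No": 528, "Yes": 178},
    "Crew": {"No": 673, "Yes": 212},
}

for cls, row in counts.items():
    width = sum(row.values())               # bar width = row total
    for outcome, cell in row.items():
        length = Fraction(cell, width)      # segment length = row proportion
        area = width * length               # width x length
        assert area == cell                 # ... equals the cell count
        print(f"{cls:<5}{outcome:<4} width={width:<4} "
              f"length={float(length):.2f} area={area}")
```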
Exercise 1.68 Raise Taxes on the Rich or the Poor
The mosaic plot below shows the relationship between political party affiliation and views on whether it’s better to raise taxes on the rich or on the poor, for a random sample of registered voters taken nationally in 2015.
[Figure: mosaic plot of party affiliation (Democrat, Republican, Indep / Other) versus views on taxes (Raise taxes on the rich, Raise taxes on the poor, Not sure)]
Which political party affiliation is the least common in the sample: Democrats, Republicans, or Indep/Other?
Ans: Indep/Other.
Based on this sample, which political party affiliation had the highest percentage supporting raising taxes on the rich? Which had the lowest?
Ans: Democrats the highest, Republicans the lowest.
What percentage of Democrats (in this sample) supported raising taxes on the rich?
(a) below 25%
(b) between 25% and 50%
(c) between 50% and 75%
(d) over 75%
In this sample, which of the following groups contains the greatest number of subjects?
(a) Democrats who supported raising taxes on the rich.
(b) Democrats who supported raising taxes on the poor.
(c) Republicans who supported raising taxes on the rich.
(d) Republicans who supported raising taxes on the poor.
Based on the mosaic plot, do views on raising taxes and political affiliation appear to be independent?
Instead of looking at survival rates in the four classes, we can also look at the breakdown of the four classes among those who survived and among those who died.
[Figure: three plots of Class (1st, 2nd, 3rd, Crew) within each Survived group (No, Yes): a segmented bar plot of frequencies, a standardized segmented bar plot, and a mosaic plot]
Ways to Inspect Relationships Between Variables
• numerical vs. numerical
  • scatterplots
• categorical vs. categorical
  • contingency tables
  • segmented bar plots, standardized segmented bar plots, mosaic plots
• categorical vs. numerical
  • side-by-side boxplots
  • histograms by group on the same horizontal axis
Example (Diamonds)
Mosaic plot: Carat Weight vs. Quality of Cut
[Figure: mosaic plot of Quality of Cut (Fair, Good, Very Good, Premium, Ideal) by Carat, for carat weights 0.40 to 0.79]
[Figure: mosaic plot of Quality of Cut (Fair, Good, Very Good, Premium, Ideal) by Carat, for carat weights 0.80 to 1.24]
[Figure: mosaic plot of Quality of Cut (Fair, Good, Very Good, Premium, Ideal) by Carat, for carat weights 1.25 to 1.79]
[Figure: mosaic plot of Quality of Cut (Fair, Good, Very Good, Premium, Ideal) by Carat, for carat weights 1.80 to 2.29]
From the mosaic plots, we can see that the proportion of diamonds with a low-quality cut increases substantially whenever the carat weight reaches one of the benchmarks (0.5, 0.7, 0.9, 1, 1.2, 1.5, 2, ...). Diamonds with carat weights just above those benchmarks generally have a better quality of cut than those right at the benchmarks.
Possible reasons:
Diamond cutters would want to get the heaviest diamond possible out of a rough stone. They might increase the depth of a diamond to increase its carat weight, but this results in a loss of brilliance due to light leakage.