Copyright © 2012 Pearson Education. Chapter 4 Displaying and Describing Categorical Data.

Chapter 4

Displaying and Describing Categorical Data

4.1 Summarizing a Categorical Variable

The Three Rules of Data Analysis

Make a picture. Make a picture. Make a picture.

Pictures …

• reveal things that can’t be seen in a table of numbers.

• show important features and patterns in the data.

• provide an excellent means for reporting findings to others.

4.1 Summarizing a Categorical Variable

A frequency table organizes data by recording each category name together with the total count for that category, as in the table below.

The names of the categories label each row in the frequency table.

When the number of categories gets too large (around 12), values may be lumped together in an “Other” category.
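
As a minimal sketch, the Python below builds a frequency table from a list of category labels and, assuming a hypothetical `max_categories` cutoff (the slide suggests roughly a dozen), keeps the most common categories and lumps the rest into “Other”. The visit-source labels and counts are made up for illustration.

```python
from collections import Counter

def frequency_table(values, max_categories=12):
    """Count the cases in each category. If there are more than
    max_categories categories, keep the (max_categories - 1) most common
    and lump the remaining cases into a single "Other" row."""
    counts = Counter(values)
    if len(counts) <= max_categories:
        return dict(counts.most_common())
    kept = counts.most_common(max_categories - 1)
    other = sum(counts.values()) - sum(n for _, n in kept)
    table = dict(kept)
    table["Other"] = other
    return table

# Hypothetical data: the source of each visit to a web site.
visits = ["Google"] * 6 + ["Direct"] * 3 + ["Yahoo"] * 2 + ["Email", "Blog"]
for category, count in frequency_table(visits, max_categories=4).items():
    print(f"{category:<8}{count:>4}")
```

Here five categories exceed the cutoff of four, so “Email” and “Blog” (one visit each) become an “Other” row with a count of 2.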

4.1 Summarizing a Categorical Variable

A relative frequency table displays the percentages that lie in each category rather than the counts. (See the table below.)

The percentage of the data in each category is found by dividing the count in each category by the total number of cases and multiplying by 100.
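
The calculation just described can be sketched in a few lines of Python. The counts are hypothetical, chosen so the percentages come out as round numbers.

```python
def relative_frequencies(counts):
    """Convert category counts to percentages: divide each count by the
    total number of cases and multiply by 100."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}

# Hypothetical counts (20 cases in all), for illustration only.
counts = {"Google": 10, "Direct": 5, "Yahoo": 3, "Other": 2}
for category, pct in relative_frequencies(counts).items():
    print(f"{category:<8}{pct:6.1f}%")
```

Because every count is divided by the same total, the percentages add up to 100% (apart from rounding, if they are rounded for display).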

4.2 Displaying a Categorical Variable

The Area Principle

The figure shown distorts the data from the frequency tables on the previous slides (which were based on internet sandal sales).

Although the lengths of the sandals match the data, our eyes tend to respond to the area of each sandal, which can be misleading.

4.2 Displaying a Categorical Variable

The Area Principle

The best data displays observe the area principle: the area occupied by a part of the graph should correspond to the magnitude of the value it represents.
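
To see why scaled pictures break the area principle, suppose an icon's width and height are both scaled by the ratio of two values. Its area then grows by the square of that ratio. The sketch below, using hypothetical values, compares that naive scaling with scaling each side by the square root of the ratio, which keeps area proportional to the value.

```python
import math

def icon_area_ratio(value, reference, preserve_area=True):
    """Area of the icon for `value` relative to the icon for `reference`,
    when the icon's width and height are scaled by the same factor."""
    ratio = value / reference
    # Scaling each side by sqrt(ratio) makes area proportional to the value;
    # scaling each side by the ratio itself makes area grow like ratio ** 2.
    side_scale = math.sqrt(ratio) if preserve_area else ratio
    return side_scale ** 2

# Hypothetical: one category has 3 times the count of another.
print(f"{icon_area_ratio(300, 100, preserve_area=False):.2f}")  # 9.00
print(f"{icon_area_ratio(300, 100, preserve_area=True):.2f}")   # 3.00
```

With naive scaling, a value three times as large is drawn with nine times the area, which is the kind of exaggeration the sandal picture produces.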

4.2 Displaying a Categorical Variable

Bar Charts

A bar chart displays the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison.

The bar graph here gives a more accurate visual impression of the sandal data.
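
A plotting library would normally draw the chart. As a dependency-free sketch, the Python below prints a text bar chart in which every bar starts at zero and has a length proportional to its count. The counts are hypothetical.

```python
def text_bar_chart(counts, width=40):
    """Return one line per category. Bar length is proportional to the
    count, so the largest category gets `width` characters and every bar
    starts from zero."""
    largest = max(counts.values())
    lines = []
    for category, n in counts.items():
        bar = "#" * round(width * n / largest)
        lines.append(f"{category:<8}|{bar} {n}")
    return lines

# Hypothetical counts, for illustration only.
for line in text_bar_chart({"Google": 10, "Direct": 5, "Yahoo": 3, "Other": 2}, width=20):
    print(line)
```

Because only the lengths vary and all bars have the same thickness, bar area stays proportional to the count, as the area principle requires.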

4.2 Displaying a Categorical Variable

Bar Charts

If the counts are replaced with percentages, the data can be displayed in a relative frequency bar chart.

The relative frequency bar chart looks the same as the bar chart, but shows the proportion of visits in each category rather than counts.

4.2 Displaying a Categorical Variable

Pie Charts

Pie charts show the whole group of cases as a circle sliced into pieces with sizes proportional to the fraction of the whole in each category. The KEEN Inc. data is displayed below.
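
Each slice's size follows directly from its fraction of the whole: the slice's central angle is that fraction times 360 degrees. A short Python sketch with hypothetical counts:

```python
def pie_slice_angles(counts):
    """Central angle, in degrees, of each pie slice: the category's
    fraction of all cases times 360. The angles add up to 360 degrees."""
    total = sum(counts.values())
    return {category: 360 * n / total for category, n in counts.items()}

# Hypothetical counts, for illustration only.
angles = pie_slice_angles({"Google": 10, "Direct": 5, "Yahoo": 3, "Other": 2})
print(angles)  # {'Google': 180.0, 'Direct': 90.0, 'Yahoo': 54.0, 'Other': 36.0}
```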

4.2 Displaying a Categorical Variable

Before making a bar chart or pie chart, …

• check the Categorical Data Condition: the data are counts or percentages of individuals in categories.

• be sure the categories don’t overlap.

• consider what you are attempting to communicate about the data.
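
Part of this checklist can be checked mechanically for a pie chart. The helper below is hypothetical, and its 0.5-point rounding allowance is an assumption: it flags category percentages that add up to well over 100% (a sign that categories overlap, so some individuals were counted twice) or well under 100% (part of the whole, such as an “Other” category, is missing). Deciding what to communicate still takes judgment.

```python
def check_pie_shares(shares, tolerance=0.5):
    """Hypothetical helper: report whether category percentages can form a
    pie chart. `tolerance` (percentage points) allows for rounding."""
    total = sum(shares.values())
    if total > 100 + tolerance:
        return f"total {total:.1f}% > 100%: categories may overlap"
    if total < 100 - tolerance:
        return f"total {total:.1f}% < 100%: part of the whole is missing"
    return "ok: the categories divide up the whole"

# Hypothetical survey in which respondents could pick more than one answer.
print(check_pie_shares({"A": 50, "B": 40, "C": 30}))
```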

4.2 Displaying a Categorical Variable

Example: U.S. Market Share

An article in The Wall Street Journal (March 16, 2007) reported the 2006 U.S. market share of leading sellers of carbonated drinks, summarized in the following pie chart.

Is this an appropriate display for these data? Explain.

Which company had the largest share of the market?

4.2 Displaying a Categorical Variable

Yes, this is an appropriate display for these data because the categories divide up the whole. Coca-Cola had the largest share of the market.

4.2 Displaying a Categorical Variable

Example: U.S. Market Share

An article in The Wall Street Journal (March 16, 2007) reported the 2006 U.S. market share of leading sellers of carbonated drinks, summarized in the bar chart below. Compare it to the previously shown pie chart.

Which is better for displaying the relative portions of market share?

What is missing from this display that might make it misleading?

4.2 Displaying a Categorical Variable

The pie chart is better for displaying the relative portions of market share. The category “Other” is missing from this display, which might make the graph misleading.

4.3 Exploring Two Categorical Variables: Contingency Tables

Example: Data were collected on the strength of consumers’ preferences for regional foods in their country. The data are displayed in the frequency table and illustrated with a pie chart.

4.3 Exploring Two Categorical Variables: Contingency Tables

To show how opinions on regional foods varied by country, we can display the data in a contingency table, adding country as a second variable.

4.3 Exploring Two Categorical Variables: Contingency Tables

The marginal distribution of a variable in a contingency table is the distribution of that variable alone, ignoring the other variable. Its counts are the row or column totals that appear in the margins of the table.

Here the indicated marginal total shows that there were 1502 respondents from China.
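
Marginal totals are just row and column sums. The sketch below assumes a contingency table stored as a nested dictionary, `{row_category: {column_category: count}}`, which is an illustrative choice. The counts are hypothetical, not the survey's.

```python
def marginal_totals(table):
    """Row totals and column totals of a two-way table stored as
    {row_category: {column_category: count}}."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return row_totals, col_totals

# Hypothetical counts: countries (rows) by response (columns).
table = {
    "Country A": {"Agree": 30, "Neutral": 15, "Disagree": 5},
    "Country B": {"Agree": 10, "Neutral": 20, "Disagree": 20},
}
rows, cols = marginal_totals(table)
print(rows)  # {'Country A': 50, 'Country B': 50}
print(cols)  # {'Agree': 40, 'Neutral': 35, 'Disagree': 25}
```

Both sets of margins add up to the same grand total (100 respondents here).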

4.3 Exploring Two Categorical Variables: Contingency Tables

Each cell of a contingency table (any intersection of a row and column of the table) gives the count for a combination of values of the two variables.

Here the indicated cell shows that 4 respondents from India didn’t know how they felt about the question asked.

4.3 Exploring Two Categorical Variables: Contingency Tables

Rather than counts, a table may display percentages: total percents, row percents, or column percents, which express each cell as a percentage of the grand total, its row total, or its column total, respectively.

We see that 6.74% of all respondents were from China and agreed completely with the question asked.
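
The three kinds of percentages differ only in the denominator. A minimal Python sketch, assuming the same illustrative nested-dictionary layout and hypothetical counts:

```python
def percent_tables(table):
    """Return (total %, row %, column %) versions of a two-way table
    stored as {row_category: {column_category: count}}."""
    grand = sum(sum(cells.values()) for cells in table.values())
    row_totals = {r: sum(cells.values()) for r, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for c, n in cells.items():
            col_totals[c] = col_totals.get(c, 0) + n
    total_pct = {r: {c: 100 * n / grand for c, n in cells.items()}
                 for r, cells in table.items()}
    row_pct = {r: {c: 100 * n / row_totals[r] for c, n in cells.items()}
               for r, cells in table.items()}
    col_pct = {r: {c: 100 * n / col_totals[c] for c, n in cells.items()}
               for r, cells in table.items()}
    return total_pct, row_pct, col_pct

# Hypothetical counts: countries (rows) by response (columns).
table = {
    "Country A": {"Agree": 30, "Neutral": 15, "Disagree": 5},
    "Country B": {"Agree": 10, "Neutral": 20, "Disagree": 20},
}
total_pct, row_pct, col_pct = percent_tables(table)
print(total_pct["Country A"]["Agree"])  # 30.0 (30 of all 100 respondents)
print(row_pct["Country A"]["Agree"])    # 60.0 (30 of Country A's 50)
print(col_pct["Country A"]["Agree"])    # 75.0 (30 of the 40 who agreed)
```

The same cell (30 respondents) is 30%, 60%, or 75% depending on which total it is compared with, which is why a table must say which kind of percentage it shows.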

4.3 Exploring Two Categorical Variables: Contingency Tables

Conditional Distributions

A variable's distribution may be restricted to just those cases that satisfy a specified condition, such as respondents from a single country. This is called a conditional distribution.

Here are the preferences of the respondents from India and from the U.K., which lets us compare their responses.
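
Starting from individual cases, a conditional distribution is found by first keeping only the cases that meet the condition and then computing percentages among them. The sketch below assumes each case is a hypothetical `(group, response)` pair.

```python
from collections import Counter

def conditional_distribution(records, condition):
    """Percentage of each response among only those (group, response)
    records whose group satisfies `condition`."""
    selected = [response for group, response in records if condition(group)]
    n = len(selected)
    if n == 0:
        return {}  # no cases meet the condition
    return {response: 100 * k / n for response, k in Counter(selected).items()}

# Hypothetical individual responses, for illustration only.
records = ([("Country A", "Agree")] * 30 + [("Country A", "Disagree")] * 10
           + [("Country B", "Agree")] * 5 + [("Country B", "Disagree")] * 15)
print(conditional_distribution(records, lambda g: g == "Country A"))
# {'Agree': 75.0, 'Disagree': 25.0}
print(conditional_distribution(records, lambda g: g == "Country B"))
# {'Agree': 25.0, 'Disagree': 75.0}
```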

4.3 Exploring Two Categorical Variables: Contingency Tables

Conditional Distributions

We may display the results of a conditional distribution as a pie chart or as a bar graph.

The data from the previous table is displayed here as a side-by-side bar chart.

4.3 Exploring Two Categorical Variables: Contingency Tables

Conditional Distributions

Variables can be related in many ways, so it is typically easier to ask if they are not related.

In a contingency table, when the distribution of one variable is the same for all categories of another variable, we say that the variables are independent.
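
Independence can be explored by comparing the conditional distributions row by row. Real data are rarely exactly equal, so the hypothetical helper below uses an assumed tolerance of 5 percentage points to decide what counts as “roughly the same”. That threshold is an arbitrary assumption, not a statistical test.

```python
def roughly_independent(table, tolerance=5.0):
    """Hypothetical check: compare the conditional (row-percent)
    distributions of the column variable across rows, and call the
    variables roughly independent if no percentage differs by more than
    `tolerance` points between rows."""
    columns = {c for cells in table.values() for c in cells}
    row_pct = {
        r: {c: 100 * cells.get(c, 0) / sum(cells.values()) for c in columns}
        for r, cells in table.items()
    }
    for c in columns:
        values = [pct[c] for pct in row_pct.values()]
        if max(values) - min(values) > tolerance:
            return False
    return True

same = {"Group 1": {"Yes": 40, "No": 60}, "Group 2": {"Yes": 20, "No": 30}}
different = {"Group 1": {"Yes": 40, "No": 60}, "Group 2": {"Yes": 40, "No": 10}}
print(roughly_independent(same))       # True  (40% Yes in both groups)
print(roughly_independent(different))  # False (40% vs 80% Yes)
```

Note that the groups in `same` have different sizes (100 and 50) but identical conditional distributions, which is what independence is about.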

4.3 Exploring Two Categorical Variables: Contingency Tables

Segmented Bar Charts

Data can be displayed by dividing up bars rather than circles. The result is a segmented bar chart where a bar is divided proportionally into segments corresponding to the percentage in each group.

The data from the conditional distribution pertaining to India and the U.K. are displayed here as segmented bar charts.
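
A segmented bar can be sketched in text: each segment's length is proportional to its percentage. In the Python below (hypothetical percentages), the cumulative boundaries are rounded rather than each segment separately, so the segments always fill exactly `width` characters.

```python
def segmented_bar(percentages, width=50, symbols="#=-.*+"):
    """Text sketch of one segmented bar. Each segment's length is
    proportional to its percentage; rounding the cumulative boundaries
    makes the segment lengths add up to exactly `width`.
    Supports up to len(symbols) categories."""
    total = sum(percentages.values())
    segments, used, cumulative = [], 0, 0.0
    for (category, pct), symbol in zip(percentages.items(), symbols):
        cumulative += pct
        end = round(width * cumulative / total)
        segments.append(symbol * (end - used))
        used = end
    return "".join(segments)

# Hypothetical conditional distribution (percent), for illustration only.
print(segmented_bar({"Agree": 50, "Neutral": 30, "Disagree": 20}, width=20))
# ##########======----
```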

4.3 Exploring Two Categorical Variables: Contingency Tables

Example: Importance of Wealth

The GFK Roper Reports Worldwide survey in 2004 asked “How important is acquiring wealth to you?” The percentages who responded that it was of more than average importance were: 71.9% China, 59.6% France, 76.1% India, 45.5% UK, and 45.3% USA. Look at the following bar chart.

How much larger is the proportion of those who said acquiring wealth was important in India than in the United States? Is that the impression given by the display? How would you improve this display?

4.3 Exploring Two Categorical Variables: Contingency Tables

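The percentages show that India’s 76.1% is less than twice the U.S.’s 45.3% (76.1/45.3 ≈ 1.68), but the graph suggests that India’s percentage is many times as big. Because the vertical scale begins at 40%, the bars show only the parts above 40% (36.1 versus 5.3 points, a ratio of about 6.8), which distorts the visual impression. Start the vertical axis at 0%.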
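
The distortion can be checked with the survey's own India and U.S. percentages. The small Python sketch below compares the true ratio with the ratio of bar heights when the axis starts at 40% instead of 0%.

```python
def apparent_ratio(a, b, baseline):
    """Ratio of bar heights when the vertical axis starts at `baseline`
    instead of 0 (all values in the same units)."""
    return (a - baseline) / (b - baseline)

india, usa = 76.1, 45.3  # percentages reported by the survey
print(f"true ratio:     {india / usa:.2f}")                     # 1.68
print(f"apparent ratio: {apparent_ratio(india, usa, 40):.2f}")  # 6.81
print(f"axis from 0:    {apparent_ratio(india, usa, 0):.2f}")   # 1.68
```

Only a baseline of 0 makes the bar heights proportional to the values.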

4.3 Exploring Two Categorical Variables: Contingency Tables

Example: Google financials

Google Inc. derives revenue from three major sources: advertising revenue from its own websites, advertising revenue from the thousands of third-party websites that make up the Google Network, and licensing and miscellaneous revenue. The following table shows the percentage of all revenue derived from these sources for the period 2002 to 2006.

Are these row or column percentages?

What percent of Google’s revenue came from advertising on its own websites in 2006?

4.3 Exploring Two Categorical Variables: Contingency Tables

These are column percentages: each column adds to 100%, while the row sums are greater than 100%. In 2006, 60% of Google’s revenue came from advertising on its own websites.
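
The reasoning above (check which direction sums to 100%) can be sketched as a small hypothetical helper. The table below is made up for illustration and is not Google's data; the 0.5-point rounding allowance is an assumption.

```python
def percent_direction(table, tolerance=0.5):
    """Hypothetical helper: given a table of percentages as a list of rows,
    return "row" if every row sums to ~100, "column" if every column sums
    to ~100, and "unclear" otherwise (or if both hold)."""
    def near_100(groups):
        return all(abs(sum(g) - 100) <= tolerance for g in groups)
    columns = list(zip(*table))
    if near_100(table) and not near_100(columns):
        return "row"
    if near_100(columns) and not near_100(table):
        return "column"
    return "unclear"

# Hypothetical percentages: rows = revenue sources, columns = years.
table = [
    [48, 52, 57],
    [45, 44, 41],
    [7, 4, 2],
]
print(percent_direction(table))  # column
```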

4.3 Exploring Two Categorical Variables: Contingency Tables

Example: MBAs

A survey of the entering MBA students at a university in the United States classified the country of origin of the students, as seen in the table.

What % of all MBA students were from North America?

What % of the Two-Year MBAs were from North America?

What % of the Evening MBAs were from North America?

4.3 Exploring Two Categorical Variables: Contingency Tables

62.7% of all MBA students were from North America. 62.8% of the Two-Year MBAs were from North America. 62.5% of the Evening MBAs were from North America.

4.3 Exploring Two Categorical Variables: Contingency Tables

What is the marginal distribution of origin?

4.3 Exploring Two Categorical Variables: Contingency Tables

The marginal distribution of origin is 23.9% from Asia, 1.9% Europe, 7.8% Latin America, 3.7% Middle East, and 62.7% North America.

4.3 Exploring Two Categorical Variables: Contingency Tables

Obtain the column percentages and show the conditional distributions of origin by MBA Program.

Do you think that origin of the MBA student is independent of the MBA programs?

4.3 Exploring Two Categorical Variables: Contingency Tables

The origin of the MBA students is not independent of the MBA program, because the conditional distributions appear to be different. For example, the percentage from Latin America is nearly 20% among students in Two-Year programs but less than 1% among those in Evening programs.

• Don’t violate the area principle.

• Keep it honest.

  • The pie chart below is confusing because the percentages add up to more than 100% and the 50% slice looks smaller than half of the pie.

• Keep it honest.

  • The scale of the years changes from two-year intervals to one-year intervals for the last three values, which makes comparison awkward.

• Don’t confuse percentages. Clearly identify what each percentage represents: a percentage of the whole table, of a row, or of a column.

• Don’t forget to look at the variables separately, through the marginal distributions of a contingency table.

• Be sure to use enough individuals in gathering data.

• Don’t overstate your case. You can only conclude what your data suggests. Other studies under other circumstances may find different results.

• Don’t use unfair or inappropriate percentages.

Simpson’s Paradox

The data below suggest that Peter is the more successful sales rep, because his overall success rate is 83% compared with Katrina’s 78%. Yet the same data suggest that Katrina is more successful, because her success rate is higher for each product.

This is known as Simpson’s Paradox. It occurs when percentages are combined inappropriately: each rep’s overall rate is an average of the product rates weighted by that rep’s own mix of products, and the two mixes differ.
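
The reversal is easy to reproduce. The counts below are hypothetical, chosen only to show the paradox, and are not the slide's data: Rep 2 has the higher success rate for each product, but Rep 1 handles mostly the easier product, so Rep 1's overall rate is higher.

```python
def success_rates(record):
    """record maps product -> (successes, attempts). Returns the
    per-product success rates and the overall rate, as percentages."""
    per_product = {p: 100 * s / n for p, (s, n) in record.items()}
    successes = sum(s for s, _ in record.values())
    attempts = sum(n for _, n in record.values())
    return per_product, 100 * successes / attempts

# Hypothetical counts, chosen only to illustrate the paradox.
rep_1 = {"Product A": (90, 100), "Product B": (10, 20)}
rep_2 = {"Product A": (19, 20), "Product B": (60, 100)}
for name, record in [("Rep 1", rep_1), ("Rep 2", rep_2)]:
    per_product, overall = success_rates(record)
    print(name, {p: round(r, 1) for p, r in per_product.items()}, round(overall, 1))
# Rep 1 {'Product A': 90.0, 'Product B': 50.0} 83.3
# Rep 2 {'Product A': 95.0, 'Product B': 60.0} 65.8
```

Comparing the reps product by product, rather than through the combined percentages, avoids the misleading conclusion.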

Make and interpret a frequency table for a categorical variable.

• We can summarize categorical data by counting the number of cases in each category, sometimes expressing the resulting distribution as percentages.

Make and interpret a bar chart or pie chart.

• We display categorical data using the area principle in either a bar chart or a pie chart.

Make and interpret a contingency table.

• When we want to see how two categorical variables are related, we put the counts (and/or percentages) in a two-way table called a contingency table.

Make and interpret bar charts and pie charts of marginal distributions.

• We look at the marginal distribution of each variable (found in the margins of the table). We also look at the conditional distribution of a variable within each category of the other variable.

• Comparing conditional distributions of one variable across categories of another tells us about the association between variables. If the conditional distributions of one variable are (roughly) the same for every category of the other, the variables are independent.

© 2010 Pearson Education