Copyright © 2011 Pearson Education, Inc. Describing Categorical Data Chapter 3.

3.1 Looking At Data

Which hosts send the most visitors to Amazon’s Web site?

Data set consists of 188,996 visits

Host is a categorical variable

To answer this question we must describe the variation in Host

Frequency and Relative Frequency Tables

The distribution of a categorical variable is a list of its values, each paired with its count (frequency)

A frequency table summarizes the distribution of a categorical variable

A relative frequency table shows the proportion (or percentage) in each category
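
The two tables described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `visits` list is hypothetical toy data, not the 188,996 visits from the slides, and the function names are made up for this example.

```python
from collections import Counter

# Hypothetical toy data: each entry is the referring host of one visit.
# Illustrative only; these are not the visits analyzed in the slides.
visits = ["google.com", "msn.com", "google.com", "yahoo.com",
          "google.com", "msn.com", "aol.com", "google.com"]

def frequency_table(values):
    """Return {category: count}, most frequent category first."""
    return dict(Counter(values).most_common())

def relative_frequency_table(values):
    """Return {category: proportion of all observations}."""
    n = len(values)
    return {cat: count / n for cat, count in frequency_table(values).items()}

freq = frequency_table(visits)
rel = relative_frequency_table(visits)

# Print host, count, and percentage side by side.
for host in freq:
    print(f"{host:<12}{freq[host]:>4}{rel[host]:>8.1%}")
```

The counts always sum to the number of observations, and the proportions sum to 1, which is a quick check that no category was dropped.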

3.2 Charts of Categorical Data

Bar Charts and Pie Charts

Unless you need to know exact counts, charts are better than tables for summarizing more than five categories

The two most common displays of a categorical variable are a bar chart and a pie chart

The Bar Chart

Uses horizontal or vertical bars to show the distribution of a categorical variable

Is called a Pareto chart when the categories are sorted by frequency (popular in quality control)

Becomes cluttered with too many categories

Is appropriate for ordinal categorical variables
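
As a rough sketch of the ideas above, the following Python code sorts categories by frequency (the Pareto ordering) and draws a plain-text horizontal bar chart. The data and the helper names `pareto_rows` and `text_bar_chart` are assumptions made for illustration; in practice a plotting library would normally draw the chart. Note that the Pareto ordering suits nominal categories; for an ordinal variable the bars would keep the natural category order instead.

```python
from collections import Counter

def pareto_rows(values, top=None):
    """Return (category, count) pairs sorted by descending count.

    `top` limits the result to the most frequent categories, which
    helps avoid a cluttered chart.
    """
    return Counter(values).most_common(top)

def text_bar_chart(rows, width=40):
    """Render horizontal bars whose lengths are proportional to the counts."""
    largest = max(count for _, count in rows)
    lines = []
    for category, count in rows:
        bar = "#" * round(width * count / largest)
        lines.append(f"{category:<12}|{bar} {count}")
    return "\n".join(lines)

# Hypothetical toy data, not the hosts from the slides.
hosts = ["google.com"] * 8 + ["msn.com"] * 4 + ["yahoo.com"] * 2 + ["aol.com"]

rows = pareto_rows(hosts, top=3)
chart = text_bar_chart(rows)
print(chart)
```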

Bar Chart (Horizontal) of Top 10 Hosts

Bar Chart (Vertical) of Top 10 Hosts

The Pie Chart

Uses wedges of a circle to show the distribution of a categorical variable

Commonly chosen to illustrate market shares or sources of revenue for a company

Less useful than bar charts if we want to compare actual counts (it is easier to compare the lengths of bars than the angles of wedges)
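
To make the wedge idea concrete, here is a small Python sketch that converts counts into wedge angles, with each wedge taking the share of 360 degrees that matches its proportion. The market-share numbers and the function name `wedge_angles` are hypothetical.

```python
def wedge_angles(freq):
    """Map each category to its pie-chart wedge angle in degrees."""
    total = sum(freq.values())
    return {cat: 360 * count / total for cat, count in freq.items()}

# Hypothetical market shares (units sold) for three made-up companies.
sales = {"A": 50, "B": 30, "C": 20}

angles = wedge_angles(sales)
for company, angle in angles.items():
    print(f"{company}: {angle:.0f} degrees")
```

The angles always sum to 360, so a pie chart can show shares of a whole but hides the underlying counts.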

Pie Chart of Top 10 Hosts

3.3 The Area Principle

The Fundamental Rule for Data Displays

The area occupied by a part of the graph/chart that displays data should be proportional to the amount of data it represents

Charts decorated to attract attention often violate the area principle
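
One way to see how decoration breaks the area principle is to check whether drawn areas stay proportional to the values they represent. The sketch below is an illustration only: the numbers are made up, and the helper names are assumptions. It contrasts bars of fixed width (area grows with the value) with decorative icons stretched in both width and height (area grows with the square of the value).

```python
import math

def drawn_area_ratio(value_ratio, scaled_dimensions):
    """Area ratio when a shape is stretched by `value_ratio`
    along `scaled_dimensions` dimensions (1 for a bar, 2 for an icon)."""
    return value_ratio ** scaled_dimensions

def respects_area_principle(values, areas):
    """True if every area is the same multiple of its (positive) value."""
    ratios = [area / value for value, area in zip(values, areas)]
    return all(math.isclose(r, ratios[0]) for r in ratios)

values = [1, 2, 3]                           # hypothetical data values
bar_areas = [10 * v for v in values]         # fixed width, height tracks the value
icon_areas = [10 * v ** 2 for v in values]   # width and height both track the value

print(respects_area_principle(values, bar_areas))   # True
print(respects_area_principle(values, icon_areas))  # False
print(drawn_area_ratio(2, 2))                       # 4: a doubled value looks four times as large
```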

An Example Violating the Area Principle

The Same Example Respecting the Area Principle

4M Example 3.1: ROLLING OVER

Motivation

Are certain types of vehicles more prone to roll-over accidents than others?

Method

Data were gathered from the Fatality Analysis Reporting System (FARS) for roll-over accidents on interstate highways. The cases that make up the rows are accidents that resulted in roll-overs in 2000. The column of interest is the model of the car involved.

Mechanics

Message

Ford Broncos were involved in more than twice as many roll-over accidents as the next-closest model.

4M Example 3.2: CHIP SALES

Motivation

Infineon pled guilty in September 2004 to price fixing for DRAMs. Did Infineon gain a larger share of the market for chips during this period?

Method

Mechanics

Message

Infineon and Samsung increased their shares from 1999 to 2002. The increase appears to have come at the expense of smaller companies.

3.4 Mode and Median

Mode

Category with the highest frequency

The longest bar in a bar chart

The widest slice in a pie chart

Two or more categories can tie for the highest frequency (bimodal or multimodal)
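
A minimal Python sketch of the mode, written so that ties are reported rather than hidden. The function name `modes` and the sample lists are hypothetical.

```python
from collections import Counter

def modes(values):
    """Return every category that attains the highest frequency.

    A single-element result is a unique mode; two elements mean the
    data are bimodal, and more than two mean multimodal.
    """
    counts = Counter(values)
    highest = max(counts.values())
    return [cat for cat, count in counts.items() if count == highest]

print(modes(["x", "y", "x"]))            # ['x']
print(modes(["a", "b", "a", "b", "c"]))  # ['a', 'b']
```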

Median

Not appropriate for nominal data

Data must be ordinal

It is the category label of the middle observation in ordered data
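
The median of an ordinal variable can be sketched as follows. The rating scale and the function name `ordinal_median` are hypothetical, and one convention is an assumption of this sketch rather than something the slides specify: with an even number of observations, the code returns the lower of the two middle labels.

```python
def ordinal_median(values, order):
    """Return the label of the middle observation after sorting by `order`.

    `order` lists the categories from lowest to highest. For an even
    number of observations this sketch returns the lower middle label
    (an assumed convention).
    """
    rank = {label: position for position, label in enumerate(order)}
    ordered = sorted(values, key=rank.__getitem__)
    return ordered[(len(ordered) - 1) // 2]

# Hypothetical ordinal scale and responses.
scale = ["poor", "fair", "good", "excellent"]
ratings = ["good", "poor", "excellent", "fair", "good"]

print(ordinal_median(ratings, scale))  # good
```

The explicit `order` argument is the point of the sketch: without an agreed ordering of the categories, as with nominal data, there is no middle observation to report.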

Best Practices

Use a bar chart to show the frequencies of a categorical variable.

Use a pie chart to show the proportions of a categorical variable.

Preserve the ordering of an ordinal variable.

Respect the area principle.

Show the best plots to answer the motivating question.

Label your chart to show the categories and indicate whether some have been combined or omitted.
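
The last practice can be sketched in code: keep the most frequent categories, combine the rest, and say so in the label. The data, the function name `top_k_with_other`, and the wording of the combined label are all assumptions made for illustration.

```python
from collections import Counter

def top_k_with_other(values, k, other_label="Other"):
    """Keep the k most frequent categories and combine the rest.

    The combined row states how many categories it contains, so the
    chart makes clear that some categories were merged.
    """
    counts = Counter(values).most_common()
    kept, rest = counts[:k], counts[k:]
    if rest:
        label = f"{other_label} ({len(rest)} categories combined)"
        kept.append((label, sum(count for _, count in rest)))
    return kept

# Hypothetical toy data.
hosts = ["google.com"] * 8 + ["msn.com"] * 4 + ["yahoo.com"] * 2 + ["aol.com"]

summary = top_k_with_other(hosts, k=2)
for label, count in summary:
    print(f"{label:<32}{count:>4}")
```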

Pitfalls

Avoid elaborate plots that may be deceptive.

Do not show too many categories.

Do not put ordinal data in a pie chart.

Do not carelessly round data.
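
As one illustration of the rounding pitfall (an example chosen for this sketch, not taken from the slides), percentages rounded independently may no longer add up to 100. The function name `rounded_percentages` and the counts are hypothetical.

```python
def rounded_percentages(freq, digits=0):
    """Round each category's percentage independently to `digits` places."""
    total = sum(freq.values())
    return {cat: round(100 * count / total, digits) for cat, count in freq.items()}

# Hypothetical counts: three equally common categories.
counts = {"A": 1, "B": 1, "C": 1}

percentages = rounded_percentages(counts)
print(percentages)                # {'A': 33.0, 'B': 33.0, 'C': 33.0}
print(sum(percentages.values()))  # 99.0, not 100
```

Keeping an extra digit, or noting that figures are rounded, avoids leaving readers to wonder where the missing percent went.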
