Copyright © 2014, 2011 Pearson Education, Inc. 1 Chapter 3 Describing Categorical Data.

3.1 Looking At Data

Which hosts send the most visitors to Amazon’s Web site?

Data set consists of 188,996 visits

Host is a categorical variable

To answer this question we must describe the variation in Host

Frequency and Relative Frequency Tables

The distribution of a categorical variable is the list of its values, each with its associated count (frequency)

A frequency table summarizes the distribution of a categorical variable

A relative frequency table shows the proportion (or percentage) in each category
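A minimal sketch of how a frequency table and a relative frequency table can be computed. The host labels in `visits` are hypothetical placeholders, not the Amazon data set, and `frequency_table` is an illustrative helper name.

```python
from collections import Counter

# Hypothetical host labels, one per visit (NOT the Amazon data set).
visits = ["google.com", "msn.com", "google.com", "yahoo.com",
          "google.com", "msn.com", "aol.com", "google.com"]

def frequency_table(values):
    """Return (category, count, proportion) rows, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(category, n, n / total) for category, n in counts.most_common()]

table = frequency_table(visits)
for category, count, proportion in table:
    # Count column = frequency table; percent column = relative frequency table.
    print(f"{category:<12}{count:>6}{proportion:>8.1%}")
```

The counts always sum to the number of cases, and the proportions always sum to 1.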

3.2 Charts of Categorical Data

Bar Charts and Pie Charts

Unless you need to know exact counts, charts are better than tables for summarizing more than five categories

The two most common displays of a categorical variable are a bar chart and a pie chart

The Bar Chart

Uses horizontal or vertical bars to show the distribution of a categorical variable

Is called a Pareto chart when the categories are sorted by frequency (popular in quality control)

Becomes cluttered with too many categories

Is appropriate for ordinal categorical variables
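The points above can be sketched with a plain-text bar chart whose categories are sorted by frequency, as in a Pareto chart. The defect labels are hypothetical quality-control data, and `pareto_rows` and `text_bar_chart` are illustrative helper names. A plotting library would normally draw the bars; text is used here only to keep the sketch self-contained.

```python
from collections import Counter

# Hypothetical quality-control data: one defect label per inspected item.
defects = ["scratch"] * 5 + ["dent"] * 2 + ["crack"] * 10 + ["stain"] * 1

def pareto_rows(values):
    """Return (category, count) pairs sorted by descending frequency."""
    return Counter(values).most_common()

def text_bar_chart(rows, width=40):
    """Draw horizontal bars that start at zero, with length proportional to count."""
    largest = max(n for _, n in rows)
    lines = []
    for category, n in rows:
        bar = "#" * round(width * n / largest)
        lines.append(f"{category:<10} {bar} {n}")
    return lines

rows = pareto_rows(defects)
chart = text_bar_chart(rows)
print("\n".join(chart))
```

For an ordinal variable, keep the natural category order rather than sorting by frequency.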

Bar Chart (Horizontal) of Top 10 Hosts

Bar Chart (Vertical) of Top 10 Hosts

The Pie Chart

Uses wedges of a circle to show the distribution of a categorical variable

Commonly chosen to illustrate market shares or sources of revenue for a company

Less useful than bar charts if we want to compare actual counts (easier to compare bars than angles of wedges)

Pie Chart of Top 10 Hosts

3.3 The Area Principle

The Fundamental Rule for Data Displays

The area occupied by a part of the graph/chart that displays data should be proportional to the amount of data it represents

Charts decorated to attract attention often violate the area principle
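One common way decorated charts break the area principle is by scaling a picture in both height and width with the data value. The sketch below compares the area ratio a reader sees in that case with the ratio for fixed-width bars. The function names and the values 30 and 10 are illustrative assumptions, not figures from the slides.

```python
def area_ratio_scaled_icon(value_a, value_b):
    """Area ratio when an icon's height AND width both scale with the value."""
    linear_ratio = value_a / value_b
    return linear_ratio ** 2  # area grows with the square of the linear scale

def area_ratio_bar(value_a, value_b):
    """Area ratio for fixed-width bars, where only the height scales."""
    return value_a / value_b

# Hypothetical values: category A has three times the data of category B.
icon = area_ratio_scaled_icon(30, 10)
bar = area_ratio_bar(30, 10)
print(f"bars show {bar:g}x the area, scaled icons show {icon:g}x")
```

With bars, the area stays proportional to the data. With icons scaled in both directions, a threefold difference in the data is drawn as a ninefold difference in area.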

An Example Violating the Area Principle

The Same Example Respecting the Area Principle

4M Example 3.1: ROLLING OVER

Motivation

Are certain types of vehicles more prone to roll-over accidents than others?

Method

Data were gathered from the Fatality Analysis Reporting System (FARS) for roll-over accidents on interstate highways. The cases that make up the rows are accidents resulting in roll-overs in 2000. The column of interest is the model of the car involved.

Mechanics

Message

Ford Broncos were involved in more than twice as many roll-over accidents as the next-closest model.

4M Example 3.2: SELLING SMARTPHONES TO BUSINESSES

Motivation

Apple, Google, and Research in Motion (RIM) compete aggressively to sell their smartphones to businesses. RIM has dominated with its BlackBerry line, but has that success held up against the intense competition from Apple and Google?

Method

Mechanics

Message

Corporate customers are purchasing more iPhones and Android phones for managers. From 2010 to 2011, BlackBerry sales grew less than sales of iPhones and Android phones. While RIM still had the largest share of the market in 2011, that share had decreased to less than 50%.

3.4 Mode and Median

Mode

Category with the highest frequency

The longest bar in a bar chart

The widest slice in a pie chart

Two or more categories can tie for the highest frequency (bimodal or multimodal)
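A short sketch of finding the mode that also handles ties by returning every category with the highest frequency. The helper name `modes` and the sample labels are illustrative assumptions.

```python
from collections import Counter

def modes(values):
    """Return all categories that share the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    # Counter keeps first-seen order, so tied modes appear in that order.
    return [category for category, n in counts.items() if n == top]

# Hypothetical labels: "a" and "b" tie, so the data are bimodal.
bimodal = modes(["a", "b", "a", "b", "c"])
unimodal = modes(["x", "y", "x"])
print(bimodal, unimodal)
```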

Median

Not appropriate for nominal data

Data must be ordinal

It is the category label of the middle observation in ordered data
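A minimal sketch of the median for ordinal data: sort the observations by the category order, then read the label in the middle. The rating scale and the responses are hypothetical. Returning both middle labels when the number of observations is even is an assumption of this sketch; the slides do not say how to handle that case.

```python
def ordinal_median(values, order):
    """Return the (lower, upper) middle labels of values sorted by `order`.

    When the two labels are equal, that label is the median category.
    """
    rank = {label: position for position, label in enumerate(order)}
    ordered = sorted(values, key=rank.__getitem__)
    n = len(ordered)
    return ordered[(n - 1) // 2], ordered[n // 2]

# Hypothetical rating scale, listed from lowest to highest.
scale = ["Poor", "Fair", "Good", "Excellent"]
ratings = ["Good", "Poor", "Excellent", "Good", "Fair"]

odd_case = ordinal_median(ratings, scale)
even_case = ordinal_median(["Poor", "Fair", "Good", "Excellent"], scale)
print(odd_case, even_case)
```

The explicit `order` argument is what makes this inapplicable to nominal data: without an agreed ordering of the categories there is no "middle" observation.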

Best Practices

Use a bar chart to show the frequencies of a categorical variable.

Use a pie chart to show the proportions of a categorical variable.

Keep the baseline of a bar chart at zero.

Preserve the ordering of an ordinal variable.

Respect the area principle.

Show the best plots to answer the motivating question.

Label your chart to show the categories and indicate whether some have been combined or omitted.
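The advice to keep the baseline of a bar chart at zero can be illustrated numerically. The sketch below computes the ratio of drawn bar heights for a given baseline. The counts 110 and 100 and the baseline 90 are hypothetical, and `drawn_height_ratio` is an illustrative name.

```python
def drawn_height_ratio(value_a, value_b, baseline=0.0):
    """Ratio of bar heights as drawn when the axis starts at `baseline`."""
    return (value_a - baseline) / (value_b - baseline)

# Hypothetical counts that differ by only 10%.
honest = drawn_height_ratio(110, 100)                   # baseline at zero
truncated = drawn_height_ratio(110, 100, baseline=90)   # axis starts at 90
print(f"zero baseline: {honest:.2f}x, truncated baseline: {truncated:.2f}x")
```

With the axis starting at 90, a 10% difference is drawn as one bar twice as tall as the other, which violates the area principle.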

Pitfalls

Avoid elaborate plots that may be deceptive.

Do not show too many categories.

Do not put ordinal data in a pie chart.

Do not carelessly round data.
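One way careless rounding shows up is in a relative frequency table whose rounded percentages no longer add to 100. A small sketch, using three hypothetical categories of equal size and an illustrative helper name:

```python
def rounded_percentages(counts, digits=0):
    """Convert counts to percentages rounded to `digits` decimal places."""
    total = sum(counts)
    return [round(100 * n / total, digits) for n in counts]

# Hypothetical counts: three equally common categories.
whole = rounded_percentages([1, 1, 1])
print(whole, "sum =", sum(whole))
```

Each share is 33.3...%, so rounding to whole percents leaves a total of 99 rather than 100. Keeping another digit, or noting that totals may not add due to rounding, avoids confusing the reader.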