Copyright © 2011 Pearson Education, Inc. Describing Numerical Data Chapter 4.

Describing Numerical Data

Chapter 4

4.1 Summaries of Numerical Variables

Can 500 different songs fit on the iPod Shuffle?

To answer this question, we need to know the typical size of a song and how much song sizes vary around that typical size

We can do this using summary statistics

4.1 Summaries of Numerical Variables

A Subset of the Data

4.1 Summaries of Numerical Variables

The Median

Value in the middle of a sorted list of numerical values (a typical value)

Half of the values fall below the median; half fall above

It is the 50th Percentile
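A minimal Python sketch of the median, for illustration only (the helper name `median` and the sample values are hypothetical). When the number of values is even there is no single middle value, so the usual convention, also followed by Python's `statistics.median`, is to average the two middle values.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: mean of the two middle values

print(median([4.1, 2.9, 3.5]))       # 3.5
print(median([4.1, 2.9, 3.5, 3.0]))  # 3.25
```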

4.1 Summaries of Numerical Variables

Common Percentiles

Lower Quartile = 25th Percentile

Upper Quartile = 75th Percentile

One quarter of the values fall below the lower quartile and one quarter fall above the upper quartile

4.1 Summaries of Numerical Variables

The Interquartile Range (IQR)

IQR = 75th Percentile – 25th Percentile

A measure of variation based on quartiles

Used to accompany the median
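One way to compute the quartiles and the IQR in Python is with the standard `statistics.quantiles` function. Quartile conventions differ between textbooks and software packages, so this sketch assumes `method="inclusive"`, which interpolates so that the minimum is the 0th percentile and the maximum the 100th. The helper names and song sizes are hypothetical.

```python
import statistics

def quartiles(values):
    # "inclusive" treats the minimum as the 0th and the maximum as the 100th percentile;
    # other conventions (e.g. the default "exclusive") can give slightly different values
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q2, q3

def iqr(values):
    q1, _, q3 = quartiles(values)
    return q3 - q1  # spread of the middle half of the data

sizes = [2, 3, 3, 4, 4, 4, 5, 6, 21]  # hypothetical song sizes in MB
print(quartiles(sizes))  # (3.0, 4.0, 5.0)
print(iqr(sizes))        # 2.0
```

Replacing the 21 MB value with any larger number leaves the IQR unchanged, which is why the IQR pairs naturally with the median.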

4.1 Summaries of Numerical Variables

The Range

Range = Maximum – Minimum

Maximum Value = 100th Percentile

Minimum Value = 0th Percentile

Another measure of variation; not preferred because it depends only on the two most extreme values

4.1 Summaries of Numerical Variables

The Five-Number Summary

Minimum
Lower Quartile
Median
Upper Quartile
Maximum
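A small sketch that assembles the five-number summary in Python. It assumes the same quartile convention as Python's `statistics.quantiles` with `method="inclusive"`; the function name and data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)

sizes = [2, 3, 3, 4, 4, 4, 5, 6, 21]  # hypothetical song sizes in MB
print(five_number_summary(sizes))  # (2, 3.0, 4.0, 5.0, 21)
```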

4.1 Summaries of Numerical Variables

The Five-Number Summary for Song Sizes

Minimum = 0.148 MB
Lower Quartile = 2.85 MB
Median = 3.5015 MB
Upper Quartile = 4.32 MB
Maximum = 21.622 MB

4.1 Summaries of Numerical Variables

Summary Statistics for Song Sizes

Median = 3.5015 MB

IQR = 4.32 MB – 2.85 MB = 1.47 MB

Range = 21.622 MB – 0.148 MB = 21.474 MB
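The IQR and range arithmetic can be checked directly from the reported five-number summary of the song sizes; a short Python sketch:

```python
# Five-number summary of the song sizes, as reported (MB)
min_mb, q1_mb, median_mb, q3_mb, max_mb = 0.148, 2.85, 3.5015, 4.32, 21.622

iqr_mb = q3_mb - q1_mb       # spread of the middle half of the songs
range_mb = max_mb - min_mb   # spread between the two most extreme songs

print(f"IQR   = {iqr_mb:.2f} MB")    # IQR   = 1.47 MB
print(f"Range = {range_mb:.3f} MB")  # Range = 21.474 MB
```

The range is many times the IQR because it is driven by a single very large file.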

4.1 Summaries of Numerical Variables

The Mean (Average)

The arithmetic average: the sum of the values divided by the number of values (another typical value)

The symbol y represents the variable of interest

The symbol $\bar{y}$, read “y bar,” represents the mean

4.1 Summaries of Numerical Variables

The Mean (Average)

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$
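The definition of the mean as a short Python function (a sketch; the function name and data are illustrative). With one unusually large value in the data, the mean is pulled upward:

```python
def mean(values):
    """Arithmetic average: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)

sizes = [2, 3, 3, 4, 4, 4, 5, 6, 21]  # hypothetical song sizes in MB
print(mean(sizes))  # 52 / 9, about 5.78: above seven of the nine values
```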

4.1 Summaries of Numerical Variables

The Variance ($s^2$)

A measure of variation based on the mean

How far a value lies from the mean is its deviation; the variance is essentially the average of the squared deviations (the sum of the squared deviations is divided by n − 1 rather than n)

4.1 Summaries of Numerical Variables

The Variance

$$s^2 = \frac{(y_1 - \bar{y})^2 + (y_2 - \bar{y})^2 + \cdots + (y_n - \bar{y})^2}{n - 1}$$
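A sketch of the variance in Python. It sums the squared deviations from the mean and divides by n − 1, the same divisor used by Python's `statistics.variance`. The function name and data are hypothetical.

```python
def variance(values):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("variance needs at least two values")
    y_bar = sum(values) / n
    return sum((y - y_bar) ** 2 for y in values) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values; mean is 5
print(variance(data))  # 32 / 7, about 4.571 (squared units)
```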

4.1 Summaries of Numerical Variables

The Standard Deviation (SD)

Is the square root of the variance

Is a measure of variability in the original units of the data (the variance results in squared units)

$$s = \sqrt{s^2}$$
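Taking the square root of the variance returns the measure of variation to the original units. The sketch below (the helper name is hypothetical) also checks the song-size summaries: the square root of a variance of 2.584 MB² is about 1.607 MB.

```python
import math

def standard_deviation(values):
    n = len(values)
    if n < 2:
        raise ValueError("standard deviation needs at least two values")
    y_bar = sum(values) / n
    s2 = sum((y - y_bar) ** 2 for y in values) / (n - 1)  # variance, in squared units
    return math.sqrt(s2)                                  # back to the original units

print(round(math.sqrt(2.584), 3))  # 1.607 (MB), from the song-size variance of 2.584 MB²
```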

4.1 Summaries of Numerical Variables

Summary Statistics for Song Sizes

Mean = 3.7794 MB

Variance = 2.584 MB²

SD = 1.607 MB

4M Example 4.1: MAKING M&M’s

Motivation

How many M&M’s are needed to fill a bag labeled to weigh 1.6 ounces?

4M Example 4.1: MAKING M&M’s

Method

Data are weights of 72 plain chocolate M&M’s taken from several packages. To get a measure of the amount of variation relative to the typical size, we use the ratio of the standard deviation to the mean (known as the coefficient of variation).

$$c_v = \frac{s}{\bar{y}}$$

4M Example 4.1: MAKING M&M’s

Mechanics

Mean Weight = 0.86 gm

SD = 0.04 gm

cv = 0.04 gm / 0.86 gm = 0.0465
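The coefficient of variation is a simple ratio; because the SD and the mean share the same units, the units cancel and the result is unitless. A minimal sketch with the M&M numbers (the function name is hypothetical):

```python
def coefficient_of_variation(sd, mean):
    """Ratio of the standard deviation to the mean (both in the same units)."""
    return sd / mean

cv = coefficient_of_variation(sd=0.04, mean=0.86)  # M&M weights in grams
print(f"{cv:.4f}")  # 0.0465
```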

4M Example 4.1: MAKING M&M’s

Message

Since the SD is small compared to the mean (a cv of about 5%), the results suggest that 53 pieces are usually enough to fill a bag.

A bag labeled 1.6 ounces weighs about 45.36 grams. Because there is little variability around the typical weight of an M&M, we can estimate the number of pieces needed to fill a 1.6-ounce bag as 45.36/0.86 ≈ 52.7, which rounds up to 53.
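The unit conversion and division in the message can be reproduced as follows. The conversion factor of about 28.3495 grams per ounce is a standard value not stated in the example, and rounding up (so that a bag is not underweight) is an assumption about how the figure of 53 was reached.

```python
import math

GRAMS_PER_OUNCE = 28.3495      # standard ounce-to-gram conversion (assumed, not from the example)
MEAN_PIECE_GRAMS = 0.86        # mean M&M weight from the data

bag_grams = 1.6 * GRAMS_PER_OUNCE        # about 45.36 g
pieces = bag_grams / MEAN_PIECE_GRAMS    # about 52.7 typical pieces
print(round(bag_grams, 2), round(pieces, 1), math.ceil(pieces))  # 45.36 52.7 53
```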

4.2 Histograms and the Distribution of Numerical Data

Histograms

Plot the distribution of a numerical variable by showing counts of values occurring within adjacent intervals

Similar to bar charts but designed for continuous quantitative data (bar charts are only appropriate for discrete categories)
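A histogram starts from counts of values in adjacent, equal-width intervals. The sketch below assumes half-open intervals [a, b), one common convention (software packages differ in how they assign values that land exactly on a boundary); the function name and song sizes are hypothetical. It prints a text histogram, illustrating how a single very large file leaves most of the plot empty.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values falling in adjacent intervals [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for y in values:
        k = int((y - start) // width)  # index of the interval containing y
        if 0 <= k < n_bins:            # values outside the plotted range are skipped
            counts[k] += 1
    return counts

# Hypothetical song sizes in MB, with one very large file
sizes = [2.1, 2.9, 3.2, 3.4, 3.6, 3.8, 4.3, 5.1, 21.6]
counts = histogram_counts(sizes, start=0, width=1, n_bins=22)
for k, c in enumerate(counts):
    print(f"{k:2d}-{k + 1:2d} MB | {'*' * c}")
```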

4.2 Histograms and the Distribution of Numerical Data

Histogram of Song Sizes

4.2 Histograms and the Distribution of Numerical Data

Histogram of Song Sizes

Indicates a few very long songs (outliers)

The graph devotes more than half of its area to showing less than 1% of the songs (white space rule: a graph that is mostly white space can be improved by changing the range of the plot to focus on the data rather than the empty space)

4.3 Boxplot

Graph of the Five Number Summary

4.3 Boxplot

Combining Boxplots with Histograms

Boxplots locate the median and quartiles and highlight outliers

The median splits the area of the histogram in half (unlike the mean, it is resistant or robust to the effects of outliers)
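A quick way to see the resistance of the median is to add one extreme value and compare how far each summary moves. A sketch with Python's `statistics` module and hypothetical song sizes:

```python
import statistics

sizes = [2.9, 3.1, 3.4, 3.5, 3.7, 4.0, 4.3]  # hypothetical song sizes, MB
with_outlier = sizes + [21.6]               # add one very large file

for label, data in (("without outlier", sizes), ("with outlier", with_outlier)):
    print(f"{label}: median = {statistics.median(data):.2f}, "
          f"mean = {statistics.mean(data):.2f}")
```

The median moves by about 0.1 MB, while the mean jumps by more than 2 MB.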

4.3 Boxplot

Boxplot with Histogram of Song Sizes

4.4 Shape of a Distribution

Modes

Position of an isolated peak in a histogram

A histogram with one peak is unimodal, one with two peaks is bimodal, and one with three or more is multimodal
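Deciding which peaks are "isolated" is a judgment call. As a crude, hypothetical helper (not a method from the text), one can count histogram bars that are taller than both neighbours. Real histograms often show small random bumps, and flat-topped peaks are missed, so treat the count only as a starting point.

```python
def count_peaks(counts):
    """Count bars strictly taller than both neighbours (edges compare against 0)."""
    peaks = 0
    for i, c in enumerate(counts):
        left = counts[i - 1] if i > 0 else 0
        right = counts[i + 1] if i < len(counts) - 1 else 0
        if c > left and c > right:
            peaks += 1
    return peaks

print(count_peaks([1, 4, 9, 5, 2, 1]))      # 1: unimodal
print(count_peaks([2, 8, 3, 1, 4, 10, 3]))  # 2: bimodal
print(count_peaks([5, 5, 5, 5]))            # 0: flat, as in a uniform histogram
```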

A histogram with all bars about the same height is uniform

4.4 Shape of a Distribution

Symmetry and Skewness

A distribution is symmetric if the two sides of its histogram are mirror images

A distribution is skewed if one tail of the histogram stretches out farther than the other

4.4 Shape of a Distribution

Distribution of Song Sizes

The mode lies between 3 and 4 MB

The distribution is right skewed (the right tail stretches out farther than the left tail)
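A crude numerical companion to the histogram is to compare how far each tail stretches from the median, using the song-size five-number summary. This sketch (hypothetical helper name) uses only the extremes, so one outlier can dominate it; the histogram remains the better guide to shape.

```python
def tail_lengths(minimum, median, maximum):
    """Distance from the median down to the minimum and up to the maximum."""
    return median - minimum, maximum - median

# Song-size five-number summary values (MB)
lower, upper = tail_lengths(minimum=0.148, median=3.5015, maximum=21.622)
print(f"lower tail: {lower:.2f} MB, upper tail: {upper:.2f} MB")
# lower tail: 3.35 MB, upper tail: 18.12 MB
```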

4M Example 4.2: EXECUTIVE COMPENSATION

Motivation

What can we say about the salaries of CEOs in 2003?

4M Example 4.2: EXECUTIVE COMPENSATION

Method

Data consist of the salaries of 1,501 CEOs, reported in thousands of dollars (obtained from Compustat).

4M Example 4.2: EXECUTIVE COMPENSATION

Mechanics

4M Example 4.2: EXECUTIVE COMPENSATION

Message

The distribution of annual salaries of CEOs in 2003 is unimodal and nearly symmetric around the median of $650,000, but skewed to the right by a long upper tail. The average is $697,000. The largest salary is $4,000,000.

4.4 Shape of a Distribution

Bell-Shaped Distributions and Empirical Rule

A bell-shaped distribution is symmetric and unimodal

The empirical rule uses the standard deviation to describe how data with a bell-shaped distribution cluster around the mean

4.4 Shape of a Distribution

The Empirical Rule
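For bell-shaped data, the empirical rule says that about 68% of the values lie within one standard deviation of the mean, about 95% within two, and nearly all (about 99.7%) within three. The sketch below checks this on simulated normal data; the seed and sample size are arbitrary choices, and real data will match less closely.

```python
import random
import statistics

random.seed(1)
data = [random.gauss(0, 1) for _ in range(100_000)]  # simulated bell-shaped data
y_bar = statistics.fmean(data)
s = statistics.stdev(data)

def share_within(k):
    """Fraction of values within k standard deviations of the mean."""
    return sum(abs(y - y_bar) <= k * s for y in data) / len(data)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.3f}")
```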

4.4 Shape of a Distribution

Standardizing

Converting data to z-scores

Z-scores measure the distance from the mean in units of standard deviations

$$z = \frac{y - \bar{y}}{s}$$
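A sketch of standardizing in Python, using the song-size mean (3.7794 MB) and SD (1.607 MB) from Section 4.1. The function name is hypothetical, and the 5.3864 MB song is a made-up value chosen to sit exactly one SD above the mean; 21.622 MB is the largest song in the data.

```python
def z_scores(values, mean, sd):
    """Distance of each value from the mean, measured in standard deviations."""
    return [(y - mean) / sd for y in values]

MEAN_MB, SD_MB = 3.7794, 1.607  # song-size summaries (MB)

# The mean itself, a hypothetical song one SD above it, and the largest song
z = z_scores([3.7794, 5.3864, 21.622], MEAN_MB, SD_MB)
print([round(v, 1) for v in z])  # [0.0, 1.0, 11.1]
```

A z-score above 11 flags the largest song as far outside the bulk of the data.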

4.5 Epilog

Can 500 different songs fit on the iPod Shuffle?

Because of variation, not every collection of 500 songs will fit: the 500 largest songs won’t. However, based on the typical song size, the amount of variation in song sizes, and the shape of their distribution, we can say that most collections of 500 songs will fit!

Best Practices

Be sure that data are numerical when using histograms and summaries such as the mean and standard deviation.

Summarize the distribution of a numerical variable with a graph.

Choose interval widths appropriate to the data when preparing a histogram.

Best Practices (Continued)

Scale your plots to show data, not empty space.

Anticipate what you will see in a histogram.

Label clearly.

Check for gaps.

Pitfalls

Do not use the methods of this chapter for categorical variables.

Do not assume that all numerical data have a bell-shaped distribution.

Do not ignore the presence of outliers.

Pitfalls (Continued)

Do not remove outliers unless you have a good reason.

Do not forget to take the square root of a variance.
