Chapter 5 Understanding and Comparing Distributions Math2200.


Page 1: Chapter 5 Understanding and Comparing Distributions Math2200.

Chapter 5 Understanding and Comparing Distributions

Math2200

Page 2: Chapter 5 Understanding and Comparing Distributions Math2200.

Example: The Hopkins Memorial Forest

• A 2500-acre reserve in Massachusetts, New York, and Vermont

• Managed by the Williams College Center for Environmental Studies (CES)

• http://www.williams.edu/CES/hopkins.htm

• Average wind speed for every day in 1989

– Important for monitoring storms

Page 3: Chapter 5 Understanding and Comparing Distributions Math2200.

Avg Wind   Day of Year   Month
1.88       1             1
2.57       2             1
4.04       3             1
4.73       4             1
2.49       5             1
2.17       6             1
3.51       7             1
4.59       8             1
4.4        9             1
1.85       10            1
3.17       11            1
3.44       12            1
6.33       13            1
2.39       14            1
3.14       15            1
2.56       16            1
3.11       17            1
1.64       18            1
2.05       19            1
2.98       20            1
4.66       21            1
0.81       22            1
0.72       23            1

Page 4: Chapter 5 Understanding and Comparing Distributions Math2200.
Page 5: Chapter 5 Understanding and Comparing Distributions Math2200.

Five-number summary

Max 8.670

Q3 2.930

Median 1.900

Q1 1.150

Min 0.200
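The summary above covers the full year. As a minimal sketch of how such a summary is computed, the Python below uses only the 23 January values listed on page 3, so its numbers differ from the slide's. The name `five_number_summary` is an illustrative assumption. Quartile conventions vary between textbooks and software; the standard library's "inclusive" method is used here and may not match the method behind the slide.

```python
import statistics

# Average wind speed for January 1-23, 1989 (from the data table on page 3).
wind_jan = [1.88, 2.57, 4.04, 4.73, 2.49, 2.17, 3.51, 4.59, 4.4, 1.85,
            3.17, 3.44, 6.33, 2.39, 3.14, 2.56, 3.11, 1.64, 2.05, 2.98,
            4.66, 0.81, 0.72]


def five_number_summary(values):
    """Return min, Q1, median, Q3, and max of a list of numbers."""
    # quantiles() sorts the data and returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"Min": min(values), "Q1": q1, "Median": median,
            "Q3": q3, "Max": max(values)}


summary = five_number_summary(wind_jan)
for name, value in summary.items():
    print(f"{name:>6} {value:.3f}")
```

For these 23 days the median is 2.98, well above the full-year median of 1.900, which fits the later finding that winter days are windier.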

Page 6: Chapter 5 Understanding and Comparing Distributions Math2200.

Boxplot

• Invented by John W. Tukey

Page 7: Chapter 5 Understanding and Comparing Distributions Math2200.

Constructing Boxplots

1. Draw a single vertical axis spanning the range of the data. Draw short horizontal lines at the lower and upper quartiles and at the median. Then connect them with vertical lines to form a box.

Page 8: Chapter 5 Understanding and Comparing Distributions Math2200.

Constructing Boxplots (cont.)

2. Erect “fences” around the main part of the data.

– The upper fence is 1.5 IQRs above the upper quartile.

– The lower fence is 1.5 IQRs below the lower quartile.

– Note: the fences only help with constructing the boxplot and should not appear in the final display.
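As a worked example of the fence rule, the sketch below plugs in the quartiles from the five-number summary of the wind data (Q1 = 1.150, Q3 = 2.930). The function name `fences` and its parameter `k` are illustrative assumptions.

```python
def fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence), placed k IQRs beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles taken from the five-number summary of the 1989 wind data.
lower, upper = fences(q1=1.150, q3=2.930)
print(f"lower fence = {lower:.3f}, upper fence = {upper:.3f}")
# prints: lower fence = -1.520, upper fence = 5.600
```

The minimum (0.200) lies above the lower fence, so there are no low outliers. The maximum (8.670) lies above the upper fence, so at least one day will be drawn as a high outlier.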

Page 9: Chapter 5 Understanding and Comparing Distributions Math2200.

Constructing Boxplots (cont.)

3. Use the fences to grow “whiskers.”

– Draw lines from the ends of the box up and down to the most extreme data values found within the fences.

– If a data value falls outside one of the fences, we do not connect it with a whisker.

Page 10: Chapter 5 Understanding and Comparing Distributions Math2200.

Constructing Boxplots (cont.)

4. Add the outliers by displaying any data values beyond the fences with special symbols.

– We often (not always) use a different symbol for “far outliers” that are farther than 3 IQRs from the quartiles.
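Steps 2 through 4 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method used to draw the slides' plots: it uses only the 23 January values from the data table, the name `boxplot_parts` is hypothetical, and the quartiles come from the standard library's "inclusive" method.

```python
import statistics

# Average wind speed for January 1-23, 1989 (from the data table on page 3).
wind_jan = [1.88, 2.57, 4.04, 4.73, 2.49, 2.17, 3.51, 4.59, 4.4, 1.85,
            3.17, 3.44, 6.33, 2.39, 3.14, 2.56, 3.11, 1.64, 2.05, 2.98,
            4.66, 0.81, 0.72]


def boxplot_parts(values, k=1.5, far_k=3.0):
    """Find whisker ends, outliers, and far outliers for a boxplot."""
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr

    # Whiskers reach the most extreme values that are still inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    # Values beyond the fences are plotted individually as outliers.
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    # Far outliers lie more than far_k IQRs beyond a quartile.
    far = [x for x in outliers
           if x < q1 - far_k * iqr or x > q3 + far_k * iqr]
    return {"whiskers": (inside[0], inside[-1]),
            "outliers": outliers, "far_outliers": far}


parts = boxplot_parts(wind_jan)
print(parts)
```

For these 23 days the whiskers end at 0.72 and 4.73, and the 6.33 recorded on day 13 falls just beyond the upper fence, so it is drawn as an ordinary outlier rather than a far outlier.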

Page 11: Chapter 5 Understanding and Comparing Distributions Math2200.

How to make a boxplot?

• Draw a single vertical axis spanning the extent of the data

• Draw short horizontal lines at the Q1, median, Q3. Then connect them to make a box.

• Draw ‘fences’

– Upper fence = Q3 + 1.5 * IQR

– Lower fence = Q1 - 1.5 * IQR

• Grow ‘whiskers’

• Add outliers

• TI-83 can make boxplots

Page 12: Chapter 5 Understanding and Comparing Distributions Math2200.
Page 13: Chapter 5 Understanding and Comparing Distributions Math2200.

Comparing groups

• Relationship between a quantitative variable and a categorical variable

– A categorical variable defines groups

• Is it windier in the winter or summer?

– A binary categorical variable

  • Spring/Summer: April – September

  • Fall/Winter: October – March

– A quantitative variable: average wind speed
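A minimal sketch of how the binary season variable can be derived from the Month column and used to split the wind speeds into groups. The function name `season` and the record layout are assumptions. The January rows come from the data table, while the July rows are hypothetical placeholders included only so that both groups appear.

```python
from collections import defaultdict
from statistics import median


def season(month):
    """Map a month number (1-12) to the two-group split used on the slide."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1-12, got {month}")
    return "Spring/Summer" if 4 <= month <= 9 else "Fall/Winter"


# (month, avg_wind) records. January rows are from the data table;
# the July rows are HYPOTHETICAL values, not taken from the data set.
records = [(1, 1.88), (1, 2.57), (1, 4.04),
           (7, 1.2), (7, 0.9), (7, 1.5)]

# The categorical variable defines the groups of the quantitative variable.
groups = defaultdict(list)
for month, wind in records:
    groups[season(month)].append(wind)

for name, values in groups.items():
    print(f"{name}: n = {len(values)}, median = {median(values)}")
```

Once the values are grouped this way, each group can be summarized or plotted separately, for example as side-by-side boxplots.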

Page 14: Chapter 5 Understanding and Comparing Distributions Math2200.
Page 15: Chapter 5 Understanding and Comparing Distributions Math2200.
Page 16: Chapter 5 Understanding and Comparing Distributions Math2200.

Comparison

                 Spring/Summer         Fall/Winter

shape
  mode           unimodal              unimodal
  symmetry       skewed to the right   less skewed
  outlier        no                    yes

center
  mean           1.556                 2.712
  median         1.340                 2.470

spread
  StdDev         1.005                 1.359
  IQR            1.315                 1.865

Page 17: Chapter 5 Understanding and Comparing Distributions Math2200.

Are some months windier than others?

Page 18: Chapter 5 Understanding and Comparing Distributions Math2200.

Summary

• Average wind speed is lower and less variable in the summer, especially July

• Average wind speed is higher and more variable in the winter

• The highest wind speed occurs in November

• More outliers than when plotting for the entire year

Page 19: Chapter 5 Understanding and Comparing Distributions Math2200.

Outliers

• Some outliers are obviously errors

– Misplacing the decimal point

– Digits transposed

– Digits repeated or omitted

– Units may be wrong

– Incorrectly copied

• What to do with outliers?

– If there are any clear outliers and you are reporting the mean and standard deviation, report them with the outliers present and with the outliers removed. The differences may be quite revealing.

– Note: The median and IQR are not likely to be affected by the outliers.
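The advice above can be illustrated with a small sketch. It takes the 23 January values from the data table and introduces one hypothetical recording error: the day-15 value 3.14 is entered as 31.4, a misplaced decimal point. The error, the helper name `summarize`, and the use of the 1.5 IQR rule to flag the outlier are all assumptions made for illustration.

```python
import statistics

# January 1-23, 1989 wind speeds, with day 15 (3.14) deliberately
# mis-entered as 31.4. This error is HYPOTHETICAL, not in the data set.
recorded = [1.88, 2.57, 4.04, 4.73, 2.49, 2.17, 3.51, 4.59, 4.4, 1.85,
            3.17, 3.44, 6.33, 2.39, 31.4, 2.56, 3.11, 1.64, 2.05, 2.98,
            4.66, 0.81, 0.72]


def summarize(values):
    """Return center and spread measures for a list of numbers."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"mean": statistics.fmean(values),
            "sd": statistics.stdev(values),
            "median": med,
            "iqr": q3 - q1}


# Flag outliers with the 1.5 IQR fences, then drop them.
q1, _, q3 = statistics.quantiles(recorded, n=4, method="inclusive")
iqr = q3 - q1
cleaned = [x for x in recorded if q1 - 1.5 * iqr <= x <= q3 + 1.5 * iqr]

with_outliers = summarize(recorded)
without_outliers = summarize(cleaned)
for key in with_outliers:
    print(f"{key:>6}: with = {with_outliers[key]:.3f}, "
          f"without = {without_outliers[key]:.3f}")
```

Removing the single bad value moves the mean and standard deviation far more than it moves the median and IQR, which is why reporting both versions is informative.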

Page 20: Chapter 5 Understanding and Comparing Distributions Math2200.

Timeplots

• For some data sets, we are interested in how the data behave over time. In these cases, we construct timeplots of the data.

Page 21: Chapter 5 Understanding and Comparing Distributions Math2200.

Timeplots

Page 22: Chapter 5 Understanding and Comparing Distributions Math2200.

Re-expressing Skewed Data to Improve Symmetry

When data are skewed, it is hard to summarize them simply with a center and spread.

Can we transform the data to be more symmetric?

Histogram of the annual compensation to CEOs of the Fortune 500 companies in 2005

Page 23: Chapter 5 Understanding and Comparing Distributions Math2200.

Re-expressing Skewed Data to Improve Symmetry (cont.)

• One way to make a skewed distribution more symmetric is to re-express or transform the data by applying a simple function (e.g., logarithmic function or square root).
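A minimal sketch of the effect of a log re-expression. The compensation figures below are hypothetical right-skewed values invented for illustration; they are not the Fortune 500 data behind the histogram. The `skewness` helper (the mean of cubed z-scores) is one common way to quantify asymmetry and is an assumption of this example, not something defined on the slides.

```python
import math
import statistics


def skewness(values):
    """Mean of cubed z-scores: about 0 if symmetric, positive if right-skewed."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((x - mean) / sd) ** 3 for x in values])


# HYPOTHETICAL annual compensation in dollars (not the actual CEO data).
pay = [0.5e6, 0.8e6, 1.2e6, 1.5e6, 2e6, 3e6, 5e6, 9e6, 20e6, 80e6]

# Re-express each value on a log10 scale.
log_pay = [math.log10(x) for x in pay]

skew_raw = skewness(pay)
skew_log = skewness(log_pay)
print(f"skewness before: {skew_raw:.2f}, after log10: {skew_log:.2f}")
```

The re-expressed values are much closer to symmetric, so a center and spread describe them more faithfully. A square root is a milder alternative when the skew is less extreme.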

Page 24: Chapter 5 Understanding and Comparing Distributions Math2200.

What Can Go Wrong?

• Avoid inconsistent scales, either within the display or when comparing two displays.

• Label clearly so a reader knows what the plot displays.

• Beware of outliers

• Be careful when comparing groups that have very different spreads

Page 25: Chapter 5 Understanding and Comparing Distributions Math2200.

What Can Go Wrong? (cont.)