Summarizing and Displaying Measurement Data


Transcript of Summarizing and Displaying Measurement Data

Page 1: Summarizing and Displaying Measurement Data

Thought Questions

1. If you were to read the results of a study showing that daily use of a certain exercise machine resulted in an average 10-pound weight loss, what more would you want to know about the numbers in addition to the average?

(Hint: Do you think everyone who used the machine lost 10 pounds?)

2. A real estate website reported that the median price of single family homes sold in the past 9 months in the local area was $136,900 and the average price was $161,447.

How do you think these values are computed?

Which do you think is more useful to someone considering the purchase of a home, the median or the average?

3. From the roster of a professional football team, the salary of the starting quarterback is an “outlier.”

What do you think this means?

If the players’ ages were considered, would this same quarterback necessarily have a value that is an “outlier”? Explain.

Page 2: Summarizing and Displaying Measurement Data


A workman is known by his tools. - Anonymous proverb

•There are many hundreds of useful tools—statistical methods—for analyzing data and drawing conclusions.

•At the most basic level, we use simple, straightforward tools for which pencil-and-paper arithmetic is adequate.

•At the other end of the spectrum, we use complex, sophisticated procedures that can require the fastest available computing systems for timely results.

•Like all tools, the effectiveness of the statistical methods depends on using them appropriately.

•Even such basic arithmetic tasks as counting, adding, subtracting, taking percentages, ranking in order, making plots, and so forth, are tools that help us to understand the messages hidden in the data.

•More complex tools are often concerned with summarizing data so that we can draw some conclusions without looking at the data in detail.

•Examples of such summarization tools are the mean, the median, and the standard deviation (a measure of the scatter, or dispersion, of the data).

Page 3: Summarizing and Displaying Measurement Data

On the Average

The most commonly used statistical summary measure is a typical value for a set of data.

Why would someone want a typical value for a set of data?

An athlete might want to know the typical time for a particular knee injury to heal.

A regulatory agency might want to know the average cholesterol reduction of a particular drug.

An investor might want to know the typical annual return of mutual funds in an industry sector.

•We usually think of the numerical values of data for a single variable as falling along a line. Therefore, we often call a typical value a measure of “location”.

•The typical value tells where the numerical values fall along some imaginary axis. In the case of the athlete, the values could be anywhere along an imaginary axis from zero to infinity.

•But what is the “location” of the values reported? Somewhere between 50 and 100 days? Some people will heal in 5 days, others may have to wait as long as 300 days.

•We also think of a typical value as a measure of “central tendency”, showing where the data tend to cluster.

Page 4: Summarizing and Displaying Measurement Data

Statistical Measures – Mean and Median

The data below are the annual salaries of 10 business executives (in thousands of dollars):

Raw data: 890, 1,110, 1,460, 1,420, 2,000, 1,430, 1,520, 1,110, 2,400, 1,680

•The arithmetic mean, usually called the mean or the average, is the sum of all data values divided by the number of such values.

•Thus, for the executive salaries, you compute the arithmetic mean by adding up all the salaries and dividing by the number of executives.

•In this case, the total for all the salaries is about $15 million ($15,020,000); dividing by 10 gives a mean executive salary of about $1.5 million ($1,502,000).
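The mean calculation for the ten salaries can be checked in a few lines. This is a minimal sketch; the function name arithmetic_mean is only illustrative, and the values are the ones listed on the slide, in thousands of dollars.

```python
# Annual salaries of the 10 executives, in thousands of dollars (from the slide).
salaries = [890, 1110, 1460, 1420, 2000, 1430, 1520, 1110, 2400, 1680]


def arithmetic_mean(values):
    """Sum of all data values divided by the number of values."""
    return sum(values) / len(values)


total = sum(salaries)                     # total payroll in thousands of dollars
mean_salary = arithmetic_mean(salaries)   # mean salary in thousands of dollars

print(f"total = {total} thousand, mean = {mean_salary} thousand")
```

The exact total is 15,020 (thousand dollars), which the slide rounds to $15 million, so the exact mean is 1,502 rather than a round 1,500.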

Page 5: Summarizing and Displaying Measurement Data


•The arithmetic mean has the most meaning when the values are closely centered, with few exceptional values and tending to symmetry about the mean.

•But suppose that the one executive who earned $1,460,000 has had a profit-sharing bonanza one year and earned $5 million more for a total salary of $6,460,000 instead of $1,460,000.

•While most of the executive salaries are still around $1.5 million and only one other makes more than $2 million, the mean has jumped from $1.5 million to $2 million, an increase in the value of the mean of more than 30%.

•Similar situations often arise wherever money or value is involved. Can you think of one?


Page 6: Summarizing and Displaying Measurement Data

The Median

•Using the next most common measure of central tendency, the median, avoids this problem.

•To get the value of the median, take all the numbers you have collected, and order them by increasing value.

•Once the numbers have been ordered, the median is the middle value (if the number of values is odd) or the average of the two middle values (if the number of values is even).

To get the median of the salaries, order the values as shown below:

890, 1,110, 1,110, 1,420, 1,430, 1,460, 1,520, 1,680, 2,000, 2,400

Then find the middle value (or, as in this case, the average of the middle two values) to get a median executive salary of $1,445,000, that is, ($1,430,000 + $1,460,000) divided by 2.
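The ordering-and-middle-value recipe above might be sketched as follows. The helper name median is illustrative (Python's standard library also offers statistics.median); the data are the slide's salaries in thousands of dollars.

```python
def median(values):
    """Middle value of the ordered data, or the average of the two middle values."""
    ordered = sorted(values)          # order by increasing value
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: a single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the middle two


salaries = [890, 1110, 1460, 1420, 2000, 1430, 1520, 1110, 2400, 1680]
median_salary = median(salaries)      # (1430 + 1460) / 2
print(median_salary)
```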

Page 7: Summarizing and Displaying Measurement Data


The median is the value such that about half of the population has values below it and about half has values above it.

Example: In 1994, the median money income of the U.S. households was about $32,300; about half of the 99 million households had money incomes below $32,300 and about half had money incomes above $32,300.

Salary Example

•Note that in the original data set, the median of $1,445,000 is only a little less than the arithmetic mean of $1.5 million.

•But when the one executive's $1,460,000 salary is increased to $6,460,000, the median barely changes: the middle two values become $1,430,000 and $1,520,000, so the median rises only to $1,475,000.

•At $1,475,000, the median is still typical of the executive salaries. The mean does change substantially, however, and the new mean of $2 million is not typical.
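Recomputing both summaries with the $6,460,000 salary shows how differently they respond to one extreme value. In this sketch (variable names are illustrative; values are in thousands of dollars) the mean moves from 1,502 to 2,002, while the median moves only from 1,445 to 1,475, because the changed salary leaves the middle of the ordered list and 1,520 takes its place.

```python
from statistics import median

# Original salaries, in thousands of dollars (from the slides).
salaries = [890, 1110, 1460, 1420, 2000, 1430, 1520, 1110, 2400, 1680]

# Same list with the 1,460 salary replaced by 6,460 (the profit-sharing year).
with_bonus = [6460 if s == 1460 else s for s in salaries]

mean_before = sum(salaries) / len(salaries)
mean_after = sum(with_bonus) / len(with_bonus)
median_before = median(salaries)
median_after = median(with_bonus)

# Relative change in each summary, as a percentage.
mean_change_pct = 100 * (mean_after - mean_before) / mean_before
median_change_pct = 100 * (median_after - median_before) / median_before

print(f"mean:   {mean_before} -> {mean_after} ({mean_change_pct:.1f}% change)")
print(f"median: {median_before} -> {median_after} ({median_change_pct:.1f}% change)")
```

The mean rises by roughly a third, whereas the median shifts by about two percent, which is why the median remains a reasonable "typical" salary here.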


Page 8: Summarizing and Displaying Measurement Data


Displaying Quantitative Data – Histograms and Stemplots

Always start the summary of data by making a picture

Histograms

•First, slice up the entire span of values covered by the quantitative variable into equal-width piles called bins.

•The bins and the counts in each bin give the distribution of the quantitative variable.
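Slicing a variable into equal-width bins and counting can be sketched as below. The data are the executive salaries from the earlier slides (in thousands of dollars); the bin start of 500, the width of 500, and the rule that a bin includes its left endpoint but not its right are assumptions made for this illustration, and bin_counts is a hypothetical helper name.

```python
def bin_counts(values, start, width, n_bins):
    """Count how many values fall into each equal-width bin.

    Bin i covers [start + i*width, start + (i+1)*width): the left endpoint
    is included and the right endpoint is not. Values outside all bins
    are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts


salaries = [890, 1110, 1460, 1420, 2000, 1430, 1520, 1110, 2400, 1680]
counts = bin_counts(salaries, start=500, width=500, n_bins=4)

# Text histogram: one star per salary in the bin.
for i, c in enumerate(counts):
    left = 500 + i * 500
    print(f"{left:>4}-{left + 500:<4} {'*' * c}")
```

With these bins the counts are 1, 5, 2, and 2; a different choice of width or starting point would give a different picture of the same data.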

Example: Earthquake Magnitudes

•The tsunami of December 26th, 2004, in Sumatra was caused by an earthquake of Magnitude 9.0 on the Richter scale.

•It killed more than 225,000 people, making it the most disastrous tsunami on record. But was the earthquake that caused it truly extraordinary, or did it just happen at an unlucky place and time?

•The U.S. National Geophysical Data Center has information on more than 2400 tsunamis dating back to 2000 B.C.E., and we have estimates of the magnitude of the underlying earthquake for 1240 of them.

Page 9: Summarizing and Displaying Measurement Data


Example: Earthquake Magnitudes

Page 10: Summarizing and Displaying Measurement Data


Relative frequency histogram

A relative frequency histogram displays the percentage of cases in each bin instead of the count.
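Converting counts to percentages is a one-line calculation. As an illustration, the sketch below uses the bin counts from the weight frequency table that appears later in these slides (53 students in nine 20-pound groups); relative_frequencies is a hypothetical helper name.

```python
def relative_frequencies(counts):
    """Convert bin counts to percentages of the total number of cases."""
    total = sum(counts)
    return [100 * c / total for c in counts]


# Counts per 20-pound weight group, 100-120 through 260-280 (from the slides).
weight_counts = [7, 12, 7, 9, 12, 4, 1, 0, 1]
percentages = relative_frequencies(weight_counts)

for c, p in zip(weight_counts, percentages):
    print(f"count {c:>2} -> {p:5.1f}%")
```

The shape of the histogram is unchanged; only the vertical scale differs, and the percentages add up to 100.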

Page 11: Summarizing and Displaying Measurement Data


Stemplots

Stemplots (also called stem-and-leaf displays) show the distribution of a quantitative variable, like histograms do, while preserving the individual values.

Example

Compare the histogram and stem-and-leaf display for the pulse rates of 24 women at a health clinic. Which graphical display do you prefer?

Page 12: Summarizing and Displaying Measurement Data

Constructing a Stemplot

1. First, cut each data value into leading digits (“stems”) and trailing digits (“leaves”).

2. Use the stems to label the bins.

3. Use only one digit for each leaf—either round or truncate the data values to one decimal place after the stem.

Example

•The results of the 1999 Laboratory of Ornithology Christmas Bird Count are displayed in the stem-and-leaf display at the right.

•This display uses split stems to give the display a bit more definition. The lower stem contains leaves with digits 0, 1, 2, 3, 4 and the upper stem contains leaves with digits 5, 6, 7, 8, 9.

Describe the characteristics of the stemplot.

Christmas Bird Count Totals 1999

Data: 228, 178, 186, 162, 183, 160, 206, 181, 160, 166, 206, 157, 163, 177, 156, 175, 153, 167, 153, 162, 152

Key: 18 | 6 = 186 species spotted
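The construction steps, including split stems, can be sketched for the bird count totals as follows. The function name split_stemplot is illustrative, and the rows are printed in ascending order with empty rows kept so that gaps stay visible; the orientation of the original display may differ.

```python
from collections import defaultdict

# Christmas Bird Count totals, 1999 (species spotted), from the slide.
bird_counts = [228, 178, 186, 162, 183, 160, 206, 181, 160, 166, 206,
               157, 163, 177, 156, 175, 153, 167, 153, 162, 152]


def split_stemplot(values):
    """Return stemplot rows with split stems.

    Each value is cut into a stem (leading digits) and a one-digit leaf.
    Every stem gets two rows: leaves 0-4 first, then leaves 5-9.
    """
    leaves = defaultdict(str)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = "low" if leaf <= 4 else "high"
        leaves[(stem, half)] += str(leaf)

    rows = []
    for stem in range(min(values) // 10, max(values) // 10 + 1):
        for half in ("low", "high"):
            rows.append(f"{stem} | {leaves[(stem, half)]}")
    return rows


for row in split_stemplot(bird_counts):
    print(row)
```

Reading the rows with the key, "18 | 6" stands for 186 species, and the lone leaf on the upper 22 stem is the 228 that sits apart from the rest of the counts.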

Page 13: Summarizing and Displaying Measurement Data

Constructing a Stemplot

Weight Data - STAT 208 Class Survey -- Spring, 1997 -- Virginia Commonwealth University

Page 14: Summarizing and Displaying Measurement Data

Key: 20 | 3 means 203 pounds

Stems = 10’s, Leaves = 1’s

Annotated examples: 192 has stem 19 and leaf 2; 152 has stem 15 and leaf 2; 135 has stem 13 and leaf 5.

10 | 0166
11 | 009
12 | 0034578
13 | 00359
14 | 08
15 | 00257
16 | 555
17 | 000255
18 | 000055567
19 | 245
20 | 3
21 | 025
22 | 0
23 |
24 |
25 |
26 | 0

Weight Data - STAT 208 Class Survey -- Spring, 1997 -- Virginia Commonwealth University

Page 15: Summarizing and Displaying Measurement Data

Weight Data: Frequency Table

Weight Group*   Frequency
100-120          7
120-140         12
140-160          7
160-180          9
180-200         12
200-220          4
220-240          1
240-260          0
260-280          1

* Left endpoint is included in the group, right endpoint is not.

[Histogram of the weight data: Weight on the horizontal axis (100 to 280), Frequency on the vertical axis (0 to 14)]
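The frequency table can be rebuilt from the individual weights. The sketch below reads the weights off the stem-and-leaf display of the class survey (stems are tens, leaves are ones) and counts them into 20-pound groups with the left endpoint included and the right endpoint excluded, as the footnote states. The helper name frequency_table is hypothetical.

```python
# Leaves for each stem, as read from the stem-and-leaf display of the weight data.
stemplot = {
    10: "0166", 11: "009", 12: "0034578", 13: "00359", 14: "08",
    15: "00257", 16: "555", 17: "000255", 18: "000055567", 19: "245",
    20: "3", 21: "025", 22: "0", 23: "", 24: "", 25: "", 26: "0",
}

# Rebuild each weight in pounds: stem * 10 + leaf.
weights = [stem * 10 + int(leaf)
           for stem, leaves in stemplot.items()
           for leaf in leaves]


def frequency_table(values, start, stop, width):
    """Count values in groups [left, left + width), from start up to stop."""
    table = {}
    for left in range(start, stop, width):
        right = left + width
        table[(left, right)] = sum(left <= v < right for v in values)
    return table


table = frequency_table(weights, start=100, stop=280, width=20)
for (left, right), count in table.items():
    print(f"{left}-{right}: {count}")
```

The counts come out as 7, 12, 7, 9, 12, 4, 1, 0, 1 for the 53 students, matching the table, so the stemplot and the histogram describe the same data at two levels of detail.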

Page 16: Summarizing and Displaying Measurement Data

Shape, Center, and Spread

When describing a distribution, always report three things: shape, center, and spread.

What is the Shape of the Distribution?

Does the histogram have a single, central hump or several separated humps?

Humps in a histogram are called modes.

A histogram with one main peak is called unimodal; histograms with two peaks are bimodal; histograms with three or more peaks are called multimodal.

[Example histograms: Diastolic Blood Pressure; Proportion of Wins]

Page 17: Summarizing and Displaying Measurement Data

Is the histogram symmetric?

If you can fold the histogram along a vertical line through the middle and have the edges match pretty closely, the histogram is symmetric.

Page 18: Summarizing and Displaying Measurement Data

Outliers

Extreme values, far from the rest of the data.

• May occur naturally

• May occur due to error in recording

• May occur due to error in measuring

• Observational unit may be fundamentally different

[Figure: Number of Books Read for Pleasure; horizontal axis: Number of Books, 0 to 10]

Page 19: Summarizing and Displaying Measurement Data


Boxplots and the 5-Number Summary

The five-number summary of a distribution reports its median, quartiles, and extremes (maximum and minimum).

Example: The five-number summary for the daily wind speed is:

Max 8.67

Q3 2.93

Median 1.90

Q1 1.15

Min 0.20
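The wind-speed data themselves are not reproduced in the slides, so as an illustration the sketch below computes a five-number summary for the ten executive salaries from the earlier slides (in thousands of dollars). Quartile conventions vary between textbooks and software; this version takes the quartiles as the medians of the lower and upper halves and leaves the overall median out of both halves when the count is odd, which is an assumption rather than the slides' stated rule. The function name five_number_summary is illustrative.

```python
from statistics import median


def five_number_summary(values):
    """Return min, Q1, median, Q3, max (needs at least two values).

    Q1 and Q3 are the medians of the lower and upper halves of the ordered
    data; with an odd count, the middle value is excluded from both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return {
        "min": ordered[0],
        "q1": median(lower),
        "median": median(ordered),
        "q3": median(upper),
        "max": ordered[-1],
    }


salaries = [890, 1110, 1460, 1420, 2000, 1430, 1520, 1110, 2400, 1680]
summary = five_number_summary(salaries)
print(summary)
```

Other quartile definitions (for example, interpolating between data values) can give slightly different Q1 and Q3 for the same data.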

A boxplot is a graphical display of the five-number summary.

Constructing a Boxplot

• Draw a single vertical axis spanning the range of the data.

• Draw short horizontal lines at the lower and upper quartiles and at the median.

• Then connect them with vertical lines to form a box.

Page 20: Summarizing and Displaying Measurement Data


Constructing a Boxplot

• Erect “fences” around the main part of the data.

• The upper fence is 1.5 IQRs above the upper quartile.

• The lower fence is 1.5 IQRs below the lower quartile.

Note: the fences only help with constructing the boxplot and should not appear in the final display.

• Use the fences to grow “whiskers.”

• Draw lines from the ends of the box up and down to the most extreme data values found within the fences.

• If a data value falls outside one of the fences, we do not connect it with a whisker.
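Applying the fence rule to the wind-speed summary given earlier (Q1 = 1.15, Q3 = 2.93, Min = 0.20, Max = 8.67) might look like this. The function name fences is illustrative, and results are rounded to two decimals to hide floating-point noise.

```python
def fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence), each k IQRs beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Five-number summary for daily wind speed, from the slides.
minimum, q1, q3, maximum = 0.20, 1.15, 2.93, 8.67

lower_fence, upper_fence = fences(q1, q3)
print(f"IQR = {q3 - q1:.2f}")
print(f"lower fence = {lower_fence:.2f}, upper fence = {upper_fence:.2f}")

# A whisker stops at the most extreme data value still inside its fence.
min_inside = minimum >= lower_fence
max_inside = maximum <= upper_fence
print(f"minimum inside fences: {min_inside}; maximum inside fences: {max_inside}")
```

The IQR is 1.78, so the fences sit at -1.52 and 5.60. The minimum of 0.20 is inside, so the lower whisker reaches it; the maximum of 8.67 is outside, so the upper whisker must stop at some smaller data value, which cannot be determined from the five-number summary alone.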

Page 21: Summarizing and Displaying Measurement Data

Constructing a Boxplot

• Add the outliers by displaying any data values beyond the fences with special symbols.

• We often use a different symbol for “far outliers” that are farther than 3 IQRs from the quartiles.
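The outlier and far-outlier rules can be combined into one small classifier. This is a sketch using the wind-speed quartiles from the earlier slide (Q1 = 1.15, Q3 = 2.93); the function name classify and its return labels are made up for illustration, and the value 6.0 is a hypothetical wind speed used only to show the middle case.

```python
def classify(value, q1, q3):
    """Label a value relative to the 1.5-IQR fences and the 3-IQR limits."""
    iqr = q3 - q1
    if value < q1 - 3 * iqr or value > q3 + 3 * iqr:
        return "far outlier"       # more than 3 IQRs beyond a quartile
    if value < q1 - 1.5 * iqr or value > q3 + 1.5 * iqr:
        return "outlier"           # beyond a fence, but within 3 IQRs
    return "within fences"


q1, q3 = 1.15, 2.93   # wind-speed quartiles from the slides

for speed in (0.20, 6.0, 8.67):   # 6.0 is a hypothetical value
    print(speed, classify(speed, q1, q3))
```

With an IQR of 1.78, the far-outlier limit above Q3 is 2.93 + 5.34 = 8.27, so the maximum wind speed of 8.67 would be drawn with the far-outlier symbol.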

Compare the histogram and boxplot for daily wind speeds:

Page 22: Summarizing and Displaying Measurement Data

Comparing Groups

Boxplots offer an ideal balance of information and simplicity, hiding the details while displaying the overall summary information.

We often plot them side by side for groups or categories we wish to compare.

What do these boxplots tell you?