Chapter 4 Displaying and Summarizing Quantitative Data Math2200.


Page 1: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.

Chapter 4 Displaying and Summarizing Quantitative Data

Math2200

Page 2: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.

Example: Tsunamis and Earthquakes

- The most disastrous tsunami

- Dec 26, 2004, in Sumatra

- Earthquake: magnitude 9.0

- Killed 225,000 people

Page 3: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.

Question

• Was the earthquake that caused it truly unusually big?

• US National Geophysical Data Center

• Data on the magnitude of underlying earthquakes for 1240 historical tsunamis

• How do we learn from the data?

Year     Country          Magnitude
-1300    GREECE           6
-479     GREECE           6.7
-426     GREECE           7.1
-373     GREECE           7.3
-330     GREECE           7
-227     GREECE           7.2
-57      ALBANIA          6.6
-26      CYPRUS ISLAND    7.3
2003     NEW ZEALAND      7.5
2003     JAPAN            8.1
2003     JAPAN            6.8
2003     USA              7.2
2004     INDONESIA        6.5
2004     JAPAN            7.2
2004     JAPAN            7.4
2004     INDONESIA        9

Page 4: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.

Histogram

• Display a quantitative variable by discretizing it into equal-width bins

• Counts for the bins give the distribution of the quantitative variable

• Make a bar chart from these counts and align the bars with the bin values to get a histogram
– Do not leave gaps between bars
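
As a rough illustration of the binning step, the sketch below counts the 16 magnitudes listed in the table on Page 3 into equal-width bins and prints a text histogram. The function name bin_counts, the starting value 6.0, and the bin width 1.0 are choices made for this example only; they are not from the slides.

```python
import math

def bin_counts(values, start, width, n_bins):
    """Count how many values fall into each equal-width bin.

    Bin i covers the half-open interval [start + i*width, start + (i+1)*width).
    Values outside all bins are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        idx = math.floor((v - start) / width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts

# The 16 magnitudes shown in the table on Page 3 (only a small part of the full data set).
magnitudes = [6, 6.7, 7.1, 7.3, 7, 7.2, 6.6, 7.3,
              7.5, 8.1, 6.8, 7.2, 6.5, 7.2, 7.4, 9]

start, width, n_bins = 6.0, 1.0, 4   # assumed bin layout for this example
counts = bin_counts(magnitudes, start, width, n_bins)

# Text histogram: one row per bin, one '#' per observation, no gaps between bins.
for i, c in enumerate(counts):
    lo = start + i * width
    print(f"[{lo:.1f}, {lo + width:.1f}) " + "#" * c)
```

Changing width (and n_bins to match) gives a different picture of the same data, which is the point of the next slide.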

Page 5: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.

Histogram of earthquake magnitude

[Figure: histogram; x-axis: Magnitude (3 to 9), y-axis: Frequency]

Page 6: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.

With different numbers of bins

[Figure: four histograms of the same data drawn with different numbers of bins; x-axis: Magnitude (3 to 9), y-axis: Frequency]

Page 7: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.

Summarize the histogram

• Magnitudes are typically around 7

• Most are between 5.5 and 8.5

• Minimum is around 3

• Maximum is around 9

• Why is there a sharp peak in the middle?

Page 8: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.

Stem-and-Leaf Plot

• John W. Tukey

• Useful for small data sets

• Similar to a histogram, but the bars show the actual data values, not just the counts

5 | 6
6 | 0444
6 | 8888
7 | 2222
7 | 6666
8 | 000044
8 | 8

Pulse rates of 24 women (8|8 means 88 beats/min)
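
A minimal sketch of how a display like the one above could be built, assuming two-digit integer data and stems split in two (leaves 0 to 4 on one line, 5 to 9 on the next). The pulse rates are read off the display itself; the function name stem_and_leaf is made up for this example.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the lines of a split stem-and-leaf display.

    Each value is split into a stem (tens digit) and a leaf (ones digit).
    Every stem gets up to two rows: leaves 0-4 first, then leaves 5-9.
    Rows with no leaves are skipped.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[(stem, leaf >= 5)].append(str(leaf))
    return [f"{stem} | {''.join(leaves)}"
            for (stem, _), leaves in sorted(rows.items())]

# Pulse rates of the 24 women, read back from the display (8|8 means 88 beats/min).
pulses = [56, 60, 64, 64, 64, 68, 68, 68, 68, 72, 72, 72, 72,
          76, 76, 76, 76, 80, 80, 80, 80, 84, 84, 88]

for line in stem_and_leaf(pulses):
    print(line)
```

Because every leaf is a single character, printing in a fixed-width font gives each digit the same width automatically.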

Page 9: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.

Handwriting

• Handwritten digits may not all take up the same width, which violates the area principle

• When you make a stem-and-leaf plot, be sure to give each digit the same width.

Page 10: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.

Dotplot

• Replace the digits in a stem-and-leaf plot with dots

Page 11: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.

How to summarize the distribution of a quantitative variable?

• Shape: mode, symmetry, outliers

• Center: mean, median

• Spread: sd, IQR

Page 12: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.

Shape

• Peak / Mode
– Is there a peak? If so, how many peaks?
– For quantitative variables, the mode is where the peak is
– No peak: uniform
– One peak: unimodal
– Two peaks: bimodal
– More than two peaks: multimodal

Page 13: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.

[Figure: four example histograms titled uniform, unimodal, bimodal, and multimodal; x-axis: x, y-axis: Frequency]

Page 14: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.

Shape

• Symmetry
– Tail: the thinner ends of a distribution
– Skewed: if one tail stretches out farther than the other, we say the histogram is skewed to the side of the longer tail

[Figure: three example histograms titled skewed to the right, symmetric, and skewed to the left; y-axis: Frequency]

Page 15: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.

Shape

• Outliers
– Values that stand away from the body of the distribution
– The judgment is sometimes vague

[Figure: histogram illustrating outliers; y-axis: Frequency]

Page 16: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.

Center

• When a histogram is symmetric and unimodal, the center is obvious
– The corresponding numerical value can be taken as the sample average, also called the sample mean
– The sample mean is where the histogram balances

Page 17: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.

Center

• For a skewed distribution
– The sample mean is dragged toward the longer tail
– Usually, much more than 50% of the values will be smaller (or larger) than the sample mean
– The median is more appropriate

• The median is the value that splits the data in half

Page 18: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.
Page 19: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.
Page 20: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.

Finding the median

• Suppose that we have n numbers

• Order them first
– If n is odd, the median is the middle value, that is, the value in position (n+1)/2
– If n is even, the median is the average of the values in positions n/2 and n/2+1
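
The two cases above can be sketched as follows. The data values are made up for illustration; note that the slide counts positions from 1, while Python list indices start at 0.

```python
def median(values):
    """Median by the rule on this slide: sort, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        # n odd: 1-based position (n+1)/2 is 0-based index n//2
        return ordered[mid]
    # n even: average of 1-based positions n/2 and n/2+1
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [14, 13, 20, 22, 18, 19, 13]        # n = 7 (made-up values)
even_data = [14, 13, 20, 22, 18, 19, 13, 10]   # n = 8 (made-up values)

print(median(odd_data))   # middle of 13 13 14 18 19 20 22
print(median(even_data))  # average of 14 and 18
```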

Page 21: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.

Mean versus median

• Extreme values / outliers:
– The median only considers the order of the values, so it is resistant to extreme values
– The mean is very sensitive to them

• Skewed distribution
– The median is preferred to the mean

• Unimodal and symmetric distribution
– The mean is preferred because it uses more information from the data
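
A small illustration of the resistance of the median, using Python's standard statistics module. The numbers are made up: the second list is the first with its largest value replaced by an extreme one.

```python
import statistics

clean = [10, 12, 13, 15, 20]          # made-up data
with_outlier = [10, 12, 13, 15, 200]  # same data, largest value made extreme

mean_clean = statistics.mean(clean)
mean_outlier = statistics.mean(with_outlier)
median_clean = statistics.median(clean)
median_outlier = statistics.median(with_outlier)

# The mean moves a lot; the median does not move at all.
print("mean:  ", mean_clean, "->", mean_outlier)
print("median:", median_clean, "->", median_outlier)
```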

Page 22: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.

Spread

To quantify the variation

• Range

• Interquartile range (IQR)

• Standard deviation

Page 23: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.

Range

• Range = max – min

• Very sensitive to extreme values

Page 24: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.

Interquartile Range

• Quartiles
– Q1 (lower quartile or the 25th percentile): one quarter of the data lies below Q1
– Q2 (median or the 50th percentile)
– Q3 (upper quartile or the 75th percentile): one quarter of the data lies above Q3

• IQR = Q3 - Q1
– Not sensitive to extreme values

• How to find Q1 and Q3?
– Split the ordered values into two halves using the median
– Q1 is the median of the first half
– Q3 is the median of the second half
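
A sketch of the median-of-halves rule, applied to the 24 pulse rates from the stem-and-leaf slide. One detail is an assumption here: when n is odd, this version leaves the median itself out of both halves. Other conventions include it in both halves, and software packages may use yet other rules, so small differences in Q1 and Q3 are normal.

```python
def median_of_sorted(ordered):
    """Median of a list that is already sorted (needs at least one value)."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    Assumption: for odd n the median is excluded from both halves.
    Needs at least two values.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + n % 2:]  # skip the middle value when n is odd
    return median_of_sorted(lower), median_of_sorted(ordered), median_of_sorted(upper)

# Pulse rates of 24 women, read off the stem-and-leaf display.
pulses = [56, 60, 64, 64, 64, 68, 68, 68, 68, 72, 72, 72, 72,
          76, 76, 76, 76, 80, 80, 80, 80, 84, 84, 88]

q1, q2, q3 = quartiles(pulses)
iqr = q3 - q1
print("Q1 =", q1, "median =", q2, "Q3 =", q3, "IQR =", iqr)
```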

Page 25: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.

Standard deviation

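• Sample variance = (almost) the average of the squared deviations from the mean: the sum is divided by n-1 rather than n

• Standard deviation (sd): the square root of the sample variance
– Sensitive to extreme values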
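
A minimal sketch of the calculation, assuming the usual n-1 divisor for the sample variance. The data are made up, and the second list replaces one value with an extreme one to show how strongly the sd reacts.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n-1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_sd(values):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(values))

data = [10, 12, 14, 16, 18]          # made-up data, mean 14
with_outlier = [10, 12, 14, 16, 58]  # one value made extreme, mean 22

print(sample_variance(data), sample_sd(data))
print(sample_variance(with_outlier), sample_sd(with_outlier))
```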

Page 26: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.

σX in TI-83

Page 27: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.

How to obtain these numbers using TI-83?

• Press STAT

• Move the cursor to CALC

• Press 1

• The screen shows 1-Var Stats

• Enter the list you want the statistics for, for example L1

• Press ENTER, then you will see
– Sample mean, sample sum, sample sum of squares, sample standard deviation (Sx), σx (the same as Sx except dividing by n instead of n-1), sample size n, minimum, Q1, median, Q3, maximum
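
For readers without the calculator, the same list of numbers can be sketched in Python. This is not the calculator's own code: the function name one_var_stats and the dictionary keys are invented for this example, the data are made up, and the quartile rule (median of each half, leaving out the median when n is odd) is an assumption that may not match every calculator or package.

```python
import math

def _median(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def one_var_stats(values):
    """Summary numbers in the order listed on the slide (needs n >= 2)."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    mean = total / n
    ss_dev = sum((x - mean) ** 2 for x in ordered)  # sum of squared deviations
    half = n // 2
    return {
        "mean": mean,
        "sum": total,
        "sum_sq": sum(x * x for x in ordered),
        "Sx": math.sqrt(ss_dev / (n - 1)),   # divides by n-1
        "sigma_x": math.sqrt(ss_dev / n),    # divides by n
        "n": n,
        "min": ordered[0],
        "Q1": _median(ordered[:half]),
        "median": _median(ordered),
        "Q3": _median(ordered[half + n % 2:]),
        "max": ordered[-1],
    }

stats = one_var_stats([2, 4, 4, 4, 5, 5, 7, 9])  # made-up data
for name, value in stats.items():
    print(f"{name:8s} {value}")
```

Sx is always a little larger than σx because it divides the same sum by n-1 instead of n.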

Page 28: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.

Summary

• Make a picture
– Histogram, stem-and-leaf plot, dot plot

• Shape
– How many modes?
– Symmetric?
– Outliers?

• If there are outliers, summarize once with the outliers and once without them

• Center and spread
– Skewed distribution: median and IQR
– Symmetric and unimodal distribution: mean and sd

Page 29: Chapter 4 Displaying and Summarizing Quantitative Data Math2200.

What can go wrong?

• Do not use what we learned in Chapter 4 for a categorical variable
– Do not make a histogram of a categorical variable
– Do not look for the shape, center, and spread of a bar chart
– Do not use mean, sd, IQR, etc. for a categorical variable

• Graphs with bars are not always histograms or bar charts

• Choose a bin width appropriate to the data

• Check the summary numbers. Do they make sense?

• Do not worry about small differences when using different methods
– No need to use too many digits for the summary numbers
– Using one or two more digits than the data is enough

• Do not round in the middle of a calculation

• Multiple modes, outliers (make a picture)