BPS - 3rd Ed. Chapter 2 1
Chapter 2
Describing Distributions with Numbers
BPS - 3rd Ed. Chapter 2 2
Numerical Summaries
Center of distribution– mean– median
Spread of distribution– five-point summary (&
interquartile range) – standard deviation (&
variance)
BPS - 3rd Ed. Chapter 2 3
Mean (Arithmetic Average)
Traditional measure of center Notation (“xbar”): Sum the values and divide by the
sample size (n)
xn
x x xn
xn i
i
n
1 1
1 2
1
x
BPS - 3rd Ed. Chapter 2 4
Mean Illustrative Example: “Metabolic Rate”
Data: Metabolic rates, 7 men (cal/day) :
1792 1666 1362 1614 1460 1867 1439
1600 7
200,11
7
1439186714601614136216661792
x
BPS - 3rd Ed. Chapter 2 5
Median (M)
Half of the ordered values are less than or equal to the median value
Half of the ordered values are greater than or equal to the median value
If n is odd, the median is the middle ordered value If n is even, the median is the average of the two
middle ordered values
BPS - 3rd Ed. Chapter 2 6
Median
Example 1 data: 2 4 6 Median (M) = 4
Example 2 data: 2 4 6 8 Median = 5 (average of 4 and 6)
Example 3 data: 6 2 4 Median 2
(order the values: 2 4 6 , so Median = 4)
BPS - 3rd Ed. Chapter 2 7
Location of the Median L(M)
Location of the median: L(M) = (n+1)÷2 ,
where n = sample size.
Example: If 25 data values are recorded, the
Median is located at position (25+1)/2 = 13 in
ordered array.
BPS - 3rd Ed. Chapter 2 8
Median Illustrative Example
L(M) = (7 + 1) / 2 = 4
Ordered array:
1362 1439 1460 1614 1666 1792 1867 median
Value of median = 1614
Data: Metabolic rates, n = 7:1792 1666 1362 1614 1460 1867 1439
BPS - 3rd Ed. Chapter 2 9
Comparing the Mean & Median Mean = median when data are symmetrical Mean median when data skewed or have
outlier (mean ‘pulled’ toward tail) while the median is more resistant
If we switch this:
1362 1439 1460 1614 1666 1792 1867
to this:
1362 1439 1460 1614 1666 1792 9867
the median is still 1614 but the mean goes from 1600 to 2742.9
BPS - 3rd Ed. Chapter 2 10
Question
The average salary at a high tech company is $250K / year
The median salary is $60K. How can this be? Answer: There are some very highly
paid executives, but most of the workers make modest salaries
BPS - 3rd Ed. Chapter 2 11
Spread = Variability
Variability the amount values spread above and below the center
Can be measured in several ways:
– range (rarely used)
– 5-point summary & inter-quartile range
– variance and standard deviation
BPS - 3rd Ed. Chapter 2 12
Range
Based on smallest (minimum) and largest (maximum) values in the data set:
Range = max min
The range is not a reliable measure of
spread (affected by outliers, biased)
BPS - 3rd Ed. Chapter 2 13
Quartiles
Three numbers which divide the ordered data into four equal sized groups.
Q1 has 25% of the data below it.
Q2 has 50% of the data below it. (Median)
Q3 has 75% of the data below it.
BPS - 3rd Ed. Chapter 2 14
Obtaining the Quartiles Order the data. Find the median
– This is Q2 Look at the lower half of the data (those
below the median)– The “median” of this lower half = Q1
Look at the upper half of the data – The “median” of this upper half = Q3
BPS - 3rd Ed. Chapter 2 15
Illustrative example: 10 ages
AGE (years) values, ordered array (n = 10):
05 11 21 24 27 | 28 30 42 50 52
Q1 Q2 Q3
Q1 = 21
Q2 = average of 27 and 28 = 27.5
Q3 = 42
BPS - 3rd Ed. Chapter 2 16
Weight Data: Sorted n = 53 Median: L(M)=(53+1)/2=27 placing it at 165
L(Q1)=(26+1)/2=13.5 placing it between 127 and 128 (127.5)L(Q3) = 13.5 from the top placing it between 185 and 185
100 124 148 170 185 215101 125 150 170 185 220106 127 150 172 186 260106 128 152 175 187110 130 155 175 192110 130 157 180 194119 133 165 180 195120 135 165 180 203120 139 165 180 210123 140 170 185 212
Q1 = 127.5 Q2 = 165 Q3 = 185
BPS - 3rd Ed. Chapter 2 17
10 016611 00912 003457813 0035914 0815 0025716 55517 00025518 00005556719 24520 321 02522 023242526 0
Weight Data:Quartiles
Q2 = 165
Q3 = 185
Q1 = 127.5
BPS - 3rd Ed. Chapter 2 18
Five-Number Summary
minimum = 100 Q1 = 127.5 M = 165 Q3 = 185 maximum = 260
InterquartileRange (IQR)= Q3 Q1
= 57.5
IQR gives spread of middle 50% of the data
BPS - 3rd Ed. Chapter 2 19
Boxplot
Central box spans Q1 and Q3.
A line in the box marks the median M.
Lines extend from the box out to the minimum and maximum.
BPS - 3rd Ed. Chapter 2 20
M
Weight Data: Boxplot
Q1 Q3min max
100 125 150 175 200 225 250 275
Weight
BPS - 3rd Ed. Chapter 2 21
Quartile extrapolation Quartile divides data set into 4 segment: bottom,
bottom middle, top middle, upper With small data sets extrapolate values Illustrative data: 2, 4, 6, 8
2 | 4 | 6 | 8 Q1 Q2 Q3
Q1 = average of 2 and 4, which is 3Q2 = average of 4 and 5, which is 5Q3 = average of 6 and 8, which is 7
BPS - 3rd Ed. Chapter 2 22
Boxplots useful for comparing two groups (text p. 39)
BPS - 3rd Ed. Chapter 2 23
Variances & Standard Deviation
The most common measures of spread
Based on deviations around the mean
Each data value has a deviation, defined as
xxi
BPS - 3rd Ed. Chapter 2 24
Fig 2.3: Metabolic Rate for 7 men, with their mean (*) and two deviations shown
BPS - 3rd Ed. Chapter 2 25
Variance Find the mean Find the deviation of each value Square the deviations Sum the squared deviations: we call
this the sum of squares, or SS Divide the SS by n-1
(gives typical squared deviation from mean)
BPS - 3rd Ed. Chapter 2 26
Variance Formula
sn
x xii
n2
1
12
1
( )
( )
BPS - 3rd Ed. Chapter 2 27
Standard Deviation Square root of the variance
n
ii xx
nss
1
22 )()1(
1
BPS - 3rd Ed. Chapter 2 28
Variance and Standard DeviationIllustrative Example
Data: Metabolic rates, 7 men (cal/day) :
1792 1666 1362 1614 1460 1867 1439
1600 7
200,11
7
1439186714601614136216661792
x
BPS - 3rd Ed. Chapter 2 29
Variance and Standard Deviation Illustrative Example (cont.)
Observations Deviations Squared deviations
1792 17921600 = 192 (192)2 = 36,864
1666 1666 1600 = 66 (66)2 = 4,356
1362 1362 1600 = -238 (-238)2 = 56,644
1614 1614 1600 = 14 (14)2 = 196
1460 1460 1600 = -140 (-140)2 = 19,600
1867 1867 1600 = 267 (267)2 = 71,289
1439 1439 1600 = -161 (-161)2 = 25,921
sum = 0 SS = 214,870
xxi ix 2xxi
BPS - 3rd Ed. Chapter 2 30
Variance and Standard Deviation Illustrative Example (cont.)
67.811,3517
870,2142
s
calories 24.18967.811,35 s
Notes: (1) Use standard deviation s for descriptive purposes(2) Variance & standard deviation calculated by calculator or computer in practice
BPS - 3rd Ed. Chapter 2 31
Summary Statistics
Two main measures of central location– Mean ( ) – Median (M)
Two main measures of spread– Standard deviation (s)– 5-point summary (interquartile range)
x
BPS - 3rd Ed. Chapter 2 32
Choosing Summary Statistics
Use the mean and standard deviation for reasonably symmetric distributions that are free of outliers.
Use the median and IQR (or 5-point summary) when data are skewed or when outliers are present.
BPS - 3rd Ed. Chapter 2 33
Example: Number of Books Read
0 1 2 4 10 300 1 2 4 10 990 1 2 4 120 1 3 5 130 2 3 5 140 2 3 5 140 2 3 5 150 2 4 5 150 2 4 5 201 2 4 6 20
M
L(M)=(52+1)/2=26.5
BPS - 3rd Ed. Chapter 2 34
Illustrative example: “Books read”
5-point summary: 0, 1, 3, 5.5, 99Note highly asymmetric distribution
“xbar” = 7.06 s = 14.43The mean and standard deviation give false
impression with asymmetric data
0 10 20 30 40 50 60 70 80 90 100 Number of books
Top Related