Post on 17-Jan-2020
Copyright © 1998, Triola, Elementary Statistics Addison Wesley Longman
Chapter 2
Descriptive Statistics
2-1 Overview
2-2 Summarizing Data
2-3 Pictures of Data
2-4 Measures of Central Tendency
2-5 Measures of Variation
2-6 Measures of Position
2-7 Exploratory Data Analysis
Review and Projects
2-1 Overview
Descriptive Statistics
summarizes or describes the important characteristics of a known set of population data
Inferential Statistics
uses sample data to make inferences about a population
Important Characteristics of Data
1. Nature or shape of the distribution, such as bell-shaped, uniform, or skewed
2. Representative score, such as an average
3. Measure of scattering or variation
2-2 Summarizing Data With Frequency Tables
Frequency Table
lists categories (or classes) of scores, along with counts (or frequencies) of the number of scores that fall into each category
Table 2-1: Axial Loads of 0.0109 in. Cans
270 278 250 278 290 274 242 269 257 272 265 263 234 270 273 270 277 294 279 268 230 268 278 268 262
273 201 275 260 286 272 284 282 278 268 263 273 282 285 289 268 208 292 275 279 276 242 285 273 268
258 264 281 262 278 265 241 267 295 283 281 209 276 273 263 218 271 289 223 217 225 283 292 270 262
204 265 271 273 283 275 276 282 270 256 268 259 272 269 270 251 208 290 220 259 282 277 282 256 293
254 223 263 274 262 263 200 272 268 206 280 287 257 284 279 252 280 215 281 291 276 285 287 297 290
228 274 277 286 277 251 278 277 286 277 289 269 267 276 206 284 269 284 268 291 289 293 277 280 274
282 230 275 236 295 289 283 261 262 252 283 277 204 286 270 278 270 283 272 281 288 248 266 256 292
Table 2-2: Frequency Table of Axial Loads of Aluminum Cans
Axial Load   Frequency
200 - 209    9
210 - 219    3
220 - 229    5
230 - 239    4
240 - 249    4
250 - 259    14
260 - 269    32
270 - 279    52
280 - 289    38
290 - 299    14
Frequency Table Definitions
• Class: An interval.
• Lower Class Limit: The left endpoint of a class.
• Upper Class Limit: The right endpoint of a class.
• Class Mark: The midpoint of the class.
• Class Width: The difference between two consecutive lower class limits.
Definition values for the example (Table 2-2)
Lower Class Limits: 200, 210, …
Upper Class Limits: 209, 219, …
Class Marks: 204.5 = (200 + 209)/2, 214.5, …
Class Width: 210 – 200 = 10
Determine the Definition Values for this Frequency Table
Quiz Scores   Frequency
0 - 4         2
5 - 9         5
10 - 14       8
15 - 19       11
20 - 24       7
Classes
Lower Class Limits
Upper Class Limits
Class Marks
Class Width
Constructing A Frequency Table
• 1. Decide on the number of classes.
• 2. Determine the class width by dividing the range by the number of classes (range = highest score – lowest score) and round up.
     class width = round up of (range / number of classes)
• 3. Select for the first lower limit either the lowest score or a convenient value slightly less than the lowest score.
• 4. Add the class width to the starting point to get the second lower class limit.
• 5. List the lower class limits in a vertical column and enter the upper class limits.
• 6. Represent each score by a tally mark in the appropriate class. Total tally marks to find the total frequency for each class.
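The construction steps can be sketched in Python. This is only an illustration under stated assumptions: the function name `build_frequency_table` and its parameters are hypothetical, and the scores are assumed to be integers, so each upper class limit is one less than the next lower class limit.

```python
import math

def build_frequency_table(scores, n_classes, first_lower=None):
    """Return (class_width, [(lower, upper, frequency), ...]).

    Hypothetical helper; assumes integer scores.
    """
    lowest, highest = min(scores), max(scores)
    # Step 2: class width = range / number of classes, rounded up.
    width = math.ceil((highest - lowest) / n_classes)
    # Step 3: start at the lowest score unless a convenient value is given.
    start = lowest if first_lower is None else first_lower
    table = []
    for i in range(n_classes):
        lower = start + i * width      # Steps 4-5: lower class limits
        upper = lower + width - 1      # matching upper class limit
        # Step 6: tally the scores that fall in this class.
        frequency = sum(lower <= s <= upper for s in scores)
        table.append((lower, upper, frequency))
    # Every score must land in some class; otherwise widen or add a class.
    if sum(f for _, _, f in table) != len(scores):
        raise ValueError("classes do not cover all scores")
    return width, table

# Small made-up sample in the range of the axial loads.
width, table = build_frequency_table([200, 209, 210, 255, 297], n_classes=10)
```

With a range of 97 and 10 classes, the width rounds up from 9.7 to 10, which gives the classes 200 - 209 through 290 - 299.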
1. Classes should be mutually exclusive.
2. Include all classes, even if the frequency is zero.
3. Try to use the same width for all classes.
4. Select convenient numbers for class limits.
5. Use between 5 and 20 classes.
6. The sum of the class frequencies must equal the number of original data values.
Guidelines For Frequency
Tables
Relative Frequency Table
relative frequency = class frequency / sum of all frequencies
Relative Frequency Table
Table 2-3 (frequencies from Table 2-2)
Axial Load   Frequency   Relative Frequency
200 - 209    9           9/175 = 0.051
210 - 219    3           3/175 = 0.017
220 - 229    5           5/175 = 0.029
230 - 239    4           0.023
240 - 249    4           0.023
250 - 259    14          0.080
260 - 269    32          0.183
270 - 279    52          0.297
280 - 289    38          0.217
290 - 299    14          0.080
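As a quick check of the relative frequencies, each class frequency from Table 2-2 can be divided by the total of 175. The function name below is an illustrative choice, not something from the slides.

```python
def relative_frequencies(frequencies, ndigits=3):
    """Divide each class frequency by the sum of all frequencies."""
    total = sum(frequencies)
    return [round(f / total, ndigits) for f in frequencies]

# Frequencies from Table 2-2 (classes 200 - 209 through 290 - 299).
table_2_2 = [9, 3, 5, 4, 4, 14, 32, 52, 38, 14]
table_2_3 = relative_frequencies(table_2_2)
```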
Cumulative Frequency Table
Table 2-4 (cumulative frequencies from Table 2-2)
Axial Load      Cumulative Frequency
Less than 210   9
Less than 220   12
Less than 230   17
Less than 240   21
Less than 250   25
Less than 260   39
Less than 270   71
Less than 280   123
Less than 290   161
Less than 300   175
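The cumulative frequencies are running totals of the class frequencies. A minimal sketch using `itertools.accumulate` from the standard library, with variable names chosen here for illustration:

```python
from itertools import accumulate

# Lower class limits and frequencies from Table 2-2; class width is 10.
lower_limits = [200, 210, 220, 230, 240, 250, 260, 270, 280, 290]
frequencies = [9, 3, 5, 4, 4, 14, 32, 52, 38, 14]
class_width = 10

# Running total: each entry counts all scores below the next lower class limit.
cumulative = list(accumulate(frequencies))
labels = [f"Less than {lower + class_width}" for lower in lower_limits]
table_2_4 = list(zip(labels, cumulative))
```

The last cumulative frequency equals the number of original data values (175).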
Frequency Tables
Tables 2-2, 2-3, and 2-4
Axial Load   Frequency   Relative Frequency   Axial Load      Cumulative Frequency
200 - 209    9           0.051                Less than 210   9
210 - 219    3           0.017                Less than 220   12
220 - 229    5           0.029                Less than 230   17
230 - 239    4           0.023                Less than 240   21
240 - 249    4           0.023                Less than 250   25
250 - 259    14          0.080                Less than 260   39
260 - 269    32          0.183                Less than 270   71
270 - 279    52          0.297                Less than 280   123
280 - 289    38          0.217                Less than 290   161
290 - 299    14          0.080                Less than 300   175
Mean
[Figure 2-7: Mean as a Balance Point]
Notation
Σ denotes the summation of a set of values
x is the variable usually used to represent the individual data values
n represents the number of data values in a sample
N represents the number of data values in a population
x̄ is pronounced ‘x-bar’ and denotes the mean of a set of sample values
µ is pronounced ‘mu’ and denotes the mean of all values in a population
Definitions
Mean: the value obtained by adding the scores and dividing the total by the number of scores
Sample: $\bar{x} = \frac{\sum x}{n}$
Population: $\mu = \frac{\sum x}{N}$
Calculators can calculate the mean of data
Definitions
Median
the middle value when scores are arranged in (ascending or descending) order
often denoted by $\tilde{x}$ (pronounced ‘x-tilde’)
is not affected by an extreme value
• 1 1 3 3 4 5 5 5 5 5
  no exact middle -- shared by two numbers
  MEDIAN is (4 + 5)/2 = 4.5
• 5 5 5 3 1 5 1 4 3 5 2
  1 1 2 3 3 4 5 5 5 5 5 (in order)
  exact middle: MEDIAN is 4
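The two cases (an even or odd number of scores) can be sketched as follows. The function is a hand-written illustration; Python's standard `statistics.median` gives the same results.

```python
def median(scores):
    """Middle value of the ranked scores; mean of the two middle values if n is even."""
    ranked = sorted(scores)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]                       # exact middle
    return (ranked[mid - 1] + ranked[mid]) / 2   # middle shared by two numbers

even_case = median([1, 1, 3, 3, 4, 5, 5, 5, 5, 5])     # 10 scores
odd_case = median([5, 5, 5, 3, 1, 5, 1, 4, 3, 5, 2])   # 11 scores
```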
Definitions
Mode
the score that occurs most frequently
Bimodal
Multimodal
No Mode
the only measure of central tendency that can be used with nominal data
Examples
a. 5 5 5 3 1 5 1 4 3 5        • Mode is 5
b. 2 2 2 3 4 5 6 6 6 7 9      • Bimodal (2 and 6)
c. 2 3 6 7 8 9 10             • No Mode
d. 2 2 3 3 3 4                • Mode is 3
e. 2 2 3 3 4 4 5 5            • No Mode
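The examples follow the convention that a data set has no mode when every distinct score occurs equally often. A minimal sketch of that convention, where the function name `modes` and the choice to return an empty list for "No Mode" are assumptions made for illustration:

```python
from collections import Counter

def modes(scores):
    """Return the sorted list of modes; an empty list means 'No Mode'."""
    counts = Counter(scores)
    top = max(counts.values())
    # If all distinct scores are tied for most frequent, report no mode.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == top)

examples = {
    "a": [5, 5, 5, 3, 1, 5, 1, 4, 3, 5],
    "b": [2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9],
    "c": [2, 3, 6, 7, 8, 9, 10],
    "d": [2, 2, 3, 3, 3, 4],
    "e": [2, 2, 3, 3, 4, 4, 5, 5],
}
results = {label: modes(data) for label, data in examples.items()}
```

Note that the standard library's `statistics.multimode` would instead return every value for examples c and e, so it does not follow this slide's "No Mode" convention.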
Definitions
Midrange
the value halfway between the highest and lowest scores
Midrange = (highest score + lowest score) / 2
Round-off rule for measures of central tendency
Carry one more decimal place than is present in the original set of data
An Example of Skewness
Dataset 1 (Symmetric): 3, 4, 4, 5, 5, 5, 6, 6, 7. Mean = 5, Median = 5.
Dataset 2 (Skewed right): 3, 4, 4, 5, 5, 5, 7, 7, 9. Mean = 5.444, Median = 5.
Dataset 3 (Skewed left): 2, 3, 3, 5, 5, 5, 6, 6, 7. Mean = 4.667, Median = 5.
[Figure: frequency histograms of the three datasets (C1, C2, C3)]
Skewness
Figure 2-8 (a) SKEWED LEFT (negatively): Mean, Median, Mode (from left to right)
Figure 2-8 (b) SYMMETRIC: Mode = Mean = Median
Figure 2-8 (c) SKEWED RIGHT (positively): Mode, Median, Mean (from left to right)
Best Measure of Central Tendency
• Advantages - Disadvantages: Table 2-6
Mean from a Frequency Table
use class mark of classes for variable x
Formula 2-2: $\bar{x} = \frac{\sum (f \cdot x)}{\sum f}$
x = class mark
f = frequency
$\sum f = n$
Quiz Scores   Frequency   Class Marks
0 - 4         2           2
5 - 9         5           7
10 - 14       8           12
15 - 19       11          17
20 - 24       7           22
Mean of this frequency table = 14.4
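The value 14.4 can be reproduced by applying Formula 2-2 to the quiz-score table. The function name `grouped_mean` is illustrative only.

```python
def grouped_mean(classes, frequencies):
    """Mean from a frequency table: sum(f * x) / sum(f), with x = class mark."""
    class_marks = [(lower + upper) / 2 for lower, upper in classes]
    weighted_total = sum(f * x for f, x in zip(frequencies, class_marks))
    return weighted_total / sum(frequencies)

quiz_classes = [(0, 4), (5, 9), (10, 14), (15, 19), (20, 24)]
quiz_frequencies = [2, 5, 8, 11, 7]

# (2*2 + 5*7 + 8*12 + 11*17 + 7*22) / 33 = 476 / 33
quiz_mean = grouped_mean(quiz_classes, quiz_frequencies)
```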
Waiting Times of Bank Customers at Different Banks (in minutes)
Jefferson Valley Bank: 6.5 6.6 6.7 6.8 7.1 7.3 7.4 7.7 7.7 7.7
Bank of Providence:    4.2 5.4 5.8 6.2 6.7 7.7 7.7 8.5 9.3 10.0

           Jefferson Valley Bank   Bank of Providence
Mean       7.15                    7.15
Median     7.20                    7.20
Mode       7.7                     7.7
Midrange   7.10                    7.10
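The four measures for the two banks can be checked with Python's standard `statistics` module. The helper `central_tendency` is a name made up for this sketch; midrange is computed by hand because the module has no function for it.

```python
import statistics

jefferson_valley = [6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7, 7.7, 7.7]
providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0]

def central_tendency(scores):
    """Mean, median, mode, and midrange of a list of scores."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        "midrange": (max(scores) + min(scores)) / 2,
    }

for name, waits in [("Jefferson Valley Bank", jefferson_valley),
                    ("Bank of Providence", providence)]:
    summary = central_tendency(waits)
    print(name, {key: round(value, 2) for key, value in summary.items()})
```

Both banks agree on all four measures, even though the Bank of Providence times are visibly more spread out; the measures of central tendency alone do not show that difference.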
Measure of Variation
Range = highest score – lowest score
Measure of Variation
Standard Deviation
a measure of variation of the scores about the mean (average deviation from the mean)
Sample Standard Deviation Formula
Formula 2-4: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
calculators can calculate sample standard deviation of data
Find the standard deviation of the sample data: 2, 3, 4, 5, 5, 5. Here $s^2 = 8/5 = 1.6$ and $s = 1.26$.
Use the shortcut formula to find the standard deviations of the above data and of the waiting times at the two banks.
1) Sample data above: $\sum x^2 = 104$, $\sum x = 24$, $s = 1.26$.
2) Jefferson Valley Bank: $\sum x^2 = 513.27$, $\sum x = 71.5$, $s = 0.48$.
3) Bank of Providence: $\sum x^2 = 541.09$, $\sum x = 71.5$, $s = 1.82$.
Population Standard Deviation
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
calculators can calculate the population standard deviation of data
Symbols for Standard Deviation
                               Sample    Population
Textbook                       s         σ
Some graphics calculators      Sx        σx
Some nongraphics calculators   xσn–1     xσn
Measure of Variation
Variance
standard deviation squared
Notation: $s^2$ (sample variance), $\sigma^2$ (population variance)
use square key on calculator
Variance
Sample Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Population Variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Round-off Rule
for measures of variation
Carry one more decimal place than was present in the original data
Standard Deviation Shortcut Formula
Formula 2-6: $s = \sqrt{\frac{n (\sum x^2) - (\sum x)^2}{n (n - 1)}}$
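The shortcut formula can be checked against the earlier worked values (s = 1.26 for 2, 3, 4, 5, 5, 5 and s = 0.48 and 1.82 for the two banks). The function name `stdev_shortcut` is illustrative; `statistics.stdev` from the standard library is used only as an independent cross-check.

```python
import math
import statistics

def stdev_shortcut(scores):
    """Sample standard deviation via n*sum(x^2) - (sum x)^2 over n*(n - 1)."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

sample = [2, 3, 4, 5, 5, 5]
jefferson_valley = [6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7, 7.7, 7.7]
providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0]

for data in (sample, jefferson_valley, providence):
    shortcut = stdev_shortcut(data)
    reference = statistics.stdev(data)   # definition-based sample std deviation
    print(round(shortcut, 2), round(reference, 2))
```

The shortcut avoids computing the mean first, which is convenient by hand, but it subtracts two large numbers and can lose precision in floating point when the spread is tiny relative to the values.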
Same Means (x̄ = 4), Different Standard Deviations
[FIGURE 2-10: four frequency histograms on a 1 to 7 scale, with s = 0, s = 0.8, s = 1.0, and s = 3.0]
Standard deviation gets larger as spread of data increases.
The Empirical Rule
(applies to bell shaped distributions)
• 68% of data are within 1 standard deviation of the mean (x̄ – s to x̄ + s)
• 95% of data are within 2 standard deviations of the mean (x̄ – 2s to x̄ + 2s)
• 99.7% of data are within 3 standard deviations of the mean (x̄ – 3s to x̄ + 3s)
[FIGURE 2-10: bell-shaped curve; areas on each side of the mean are 0.340 within 1s, 0.135 between 1s and 2s, 0.024 between 2s and 3s, and 0.001 beyond 3s]
Range Rule of Thumb
Range ≈ 4s, or s ≈ Range / 4
x̄ – 2s (minimum)   x̄   x̄ + 2s (maximum)
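A rough sketch of both directions of the rule of thumb follows. The function names are hypothetical, and the results are estimates only, not exact values.

```python
def estimate_stdev_from_range(scores):
    """Rule-of-thumb estimate: s is roughly range / 4."""
    return (max(scores) - min(scores)) / 4

def usual_min_max(mean, s):
    """Rough minimum and maximum 'usual' scores: mean - 2s and mean + 2s."""
    return mean - 2 * s, mean + 2 * s

rough_s = estimate_stdev_from_range([2, 3, 4, 5, 5, 5])   # range is 3
low, high = usual_min_max(mean=4, s=1)
```

For the sample 2, 3, 4, 5, 5, 5 the estimate is 0.75, while the computed sample standard deviation is about 1.26, which shows how coarse the rule can be for very small data sets.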
Chebyshev’s Theorem
applies to distributions of any shape
the proportion (or fraction) of any set of data lying within k standard deviations of the mean is always at least $1 - 1/k^2$, where k is any positive number greater than 1.
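The bound is easy to tabulate and to compare with an actual data set. In this sketch the function names are illustrative, and the check uses the population standard deviation (`statistics.pstdev`) of the Bank of Providence waiting times as an assumed example.

```python
import statistics

def chebyshev_lower_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

def proportion_within(scores, k):
    """Observed proportion of scores within k standard deviations of the mean."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    inside = sum(abs(x - mean) <= k * sd for x in scores)
    return inside / len(scores)

providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0]
bound_2 = chebyshev_lower_bound(2)          # at least 3/4 of the data
observed_2 = proportion_within(providence, 2)
```

For k = 2 the theorem guarantees at least 75% and for k = 3 at least about 89%, which is weaker than the empirical rule's 95% and 99.7% because it makes no assumption about shape.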
Measures of Variation
Summary
• For typical data sets, it is unusual for a
score to differ from the mean by more than
2 or 3 standard deviations.
An application of measure of variation
There are two brands of car tires, A and B. Both have a mean lifetime of 60,000 miles, but Brand A has a standard deviation of lifetime of 1000 miles and Brand B has a standard deviation of lifetime of 3000 miles. Which brand would you prefer?
Quartiles
Q1, Q2, Q3 divide ranked scores into four equal parts
25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
Percentiles
• 99 Percentiles
Finding the Percentile of a Given Score
Percentile of score x = (number of scores less than x / total number of scores) • 100
[1] 200 201 204 204 206 206 208 208 209 215 217 218 220 223 223
[16] 225 228 230 230 234 236 241 242 242 248 250 251 251 252 252
[31] 254 256 256 256 257 257 258 259 259 260 261 262 262 262 262
[46] 262 263 263 263 263 263 264 265 265 265 266 267 267 268 268
[61] 268 268 268 268 268 268 268 269 269 269 269 270 270 270 270
[76] 270 270 270 270 271 271 272 272 272 272 272 273 273 273 273
[91] 273 273 274 274 274 274 275 275 275 275 276 276 276 276 276
[106] 277 277 277 277 277 277 277 277 278 278 278 278 278 278 278
[121] 279 279 279 280 280 280 281 281 281 281 282 282 282 282 282
[136] 282 283 283 283 283 283 283 284 284 284 284 285 285 285 286
[151] 286 286 286 287 287 288 289 289 289 289 289 290 290 290 291
[166] 291 292 292 292 293 293 294 295 295 297
Sorted Axial Loads of 175 Aluminum Cans
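The percentile of a given score only needs a count of the scores below it. A minimal sketch, assuming the scores are already ranked; the function name is illustrative and the small data set is the ranked 11-score example from the median slide, not the axial loads.

```python
from bisect import bisect_left

def percentile_of_score(ranked_scores, x):
    """(number of scores less than x) / (total number of scores) * 100.

    ranked_scores must be sorted from lowest to highest.
    """
    # bisect_left returns the index of the first score >= x,
    # which equals the count of scores strictly less than x.
    below = bisect_left(ranked_scores, x)
    return below / len(ranked_scores) * 100

ranked = [1, 1, 2, 3, 3, 4, 5, 5, 5, 5, 5]
p = percentile_of_score(ranked, 4)   # 5 of the 11 scores are below 4
```

In the sorted axial loads, 17 of the 175 scores are less than 230, so the score 230 is at about the 10th percentile (17/175 • 100 ≈ 9.7).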
Finding the Value of the kth Percentile
1. Start: Rank the data. (Arrange the data in order of lowest to highest.)
2. Compute L = (k/100) n, where
   n = number of scores
   k = percentile in question
3. Is L a whole number?
   Yes: The value of the kth percentile is midway between the Lth score and the next higher score in the ranked set of data. Find Pk by adding the Lth score and the next higher score and dividing the total by 2.
   No: Change L by rounding it up to the next larger whole number. The value of Pk is the Lth score, counting from the lowest.
Examples using the sorted axial loads of 175 aluminum cans listed above:
The 10th percentile: L = 175*10/100 = 17.5, rounded up to 18. So the 10th percentile is the 18th score in the sorted data, i.e., 230.
The 25th percentile: L = 175*25/100 = 43.75, rounded up to 44. The 25th percentile is the 44th score in the sorted data, i.e., 262.
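The percentile procedure can be sketched in Python. This is one possible reading of the flowchart, with the assumptions that k is a whole number from 1 to 99 and that integer arithmetic is used to decide whether L is whole (to avoid floating-point surprises); the function name is hypothetical.

```python
def kth_percentile(scores, k):
    """Value of the kth percentile P_k, for integer k with 1 <= k <= 99."""
    if not 1 <= k <= 99:
        raise ValueError("k must be between 1 and 99")
    ranked = sorted(scores)                 # rank the data, lowest to highest
    n = len(ranked)
    # L = (k / 100) * n, split into whole part and remainder.
    whole, remainder = divmod(k * n, 100)
    if remainder == 0:
        # L is a whole number: midway between the Lth and the next score.
        return (ranked[whole - 1] + ranked[whole]) / 2
    # Otherwise round L up to whole + 1 and take that score (index whole).
    return ranked[whole]

p50_even = kth_percentile([1, 1, 3, 3, 4, 5, 5, 5, 5, 5], 50)    # L = 5
p50_odd = kth_percentile([5, 5, 5, 3, 1, 5, 1, 4, 3, 5, 2], 50)  # L = 5.5
```

For n = 175 the same logic gives L = 17.5 (use the 18th score) for k = 10 and L = 43.75 (use the 44th score) for k = 25, matching the worked examples. Statistical packages often interpolate differently, so their percentiles may not match this procedure exactly.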
Interquartile Range: Q3 – Q1
Semi-interquartile Range: (Q3 – Q1) / 2
Midquartile: (Q1 + Q3) / 2
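These three statistics follow directly from Q1 and Q3. A small sketch, using Q1 = 60 and Q3 = 78 (the quartile values that appear in the pulse-rate boxplot of Figure 2-13) as example inputs; the function name is illustrative.

```python
def quartile_statistics(q1, q3):
    """Interquartile range, semi-interquartile range, and midquartile."""
    return {
        "interquartile_range": q3 - q1,
        "semi_interquartile_range": (q3 - q1) / 2,
        "midquartile": (q1 + q3) / 2,
    }

pulse_stats = quartile_statistics(q1=60, q3=78)
```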
Exploratory Data Analysis
• Used to explore data at a preliminary level
• Few or no assumptions are made about the data
• Tends to involve relatively simple calculations and graphs
Traditional Statistics
• Used to confirm final conclusions about data
• Typically requires some very important assumptions about the data
• Calculations are often complex, and graphs are often unnecessary
Boxplots
Box-and-Whisker Diagram
5 - number summary
Minimum
first quartile Q1
Median
third quartile Q3
Maximum
Boxplots (Box-and-Whisker Diagram)
Figure 2-13 Boxplot of Pulse Rates (Beats per minute) of Smokers
Minimum = 52, Q1 = 60, Median = 68.5, Q3 = 78, Maximum = 90
[Figure 2-14 Boxplots: Normal, Uniform, Skewed]
Outliers
Values that are very far away from most of the data
[Figure: boxplot of Axial Load, axis from 200 to 300]
Class Survey Data
Boxplots for the heights of those who never broke a bone and those who did
[Figure: boxplots of Height (axis from 60 to 75) grouped by Bone (n, y)]
When comparing two or more boxplots, it is necessary to use the same scale.
[Figure: boxplots of PULSE (axis from 40 to 100) grouped by SMOKE (1 = yes, 2 = no)]