Math146 - Chapter 3 Handouts
Page 1 of 39
The Greek Alphabet Source: www.mathwords.com
Some Miscellaneous Tips on Calculations
Examples: Round to the nearest thousandth
0.92431
0.75693
CAUTION! Do not truncate numbers!
Example: 1/6 = 0.166666…
A common mistake is to truncate this decimal, and write it as: ________
Round it off correctly (say, to three decimal places) as: ________
However, “in-between” zeros DO count as significant digits.
Examples: Round to three significant figures
0.20361
0.00059254
A FINAL CAUTION! Be very careful to not “overly” round off intermediate
calculations, if you are going to use those numbers in a subsequent calculation.
A better method is to store those values on your calculator (using the memory
registers), OR to just do the calculation using a single command which will
probably involve the use of a lot of parentheses ( ).
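The cautions above can be checked with a short Python sketch. The helper names `truncate` and `round_sig` are made up for this illustration, and the "times 600" calculation is a hypothetical follow-on step, not something from the handout. Note that Python's built-in `round` resolves exact ties toward the even digit, which may differ from a calculator.

```python
import math

def truncate(value, places):
    """Chop off the digits after `places` decimals (the mistake to avoid)."""
    factor = 10 ** places
    return math.trunc(value * factor) / factor

def round_sig(value, sig_figs):
    """Round to a number of significant figures using the 'g' format."""
    return float(f"{value:.{sig_figs}g}")

one_sixth = 1 / 6
truncated = truncate(one_sixth, 3)   # digits simply dropped
rounded = round(one_sixth, 3)        # correctly rounded

# Rounding an intermediate value too early changes the final answer.
early = round(one_sixth, 1) * 600    # intermediate value rounded first
late = round(one_sixth * 600, 1)     # full precision kept until the end

print(truncated, rounded)
print(round_sig(0.20361, 3), round_sig(0.00059254, 3))
print(early, late)
```

The last line shows why storing full-precision values (or using one long expression with parentheses) is safer than re-entering rounded intermediate results.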
Section 3.1 Measures of Central Tendency
Descriptive measures: used to describe data sets.
Measure of center – a value at the ________ of a data set
Three different measures of center:
1. ________
2. ________
3. ________
1. Mean
Probably the most commonly used measure of center.
Same as ________
Add the values and divide by the total number of values.
Write in symbols as: ________
where, xi = the data values
n = number of values in the sample
Σ is the uppercase Greek letter ‘sigma’, the summation symbol; it means
“add all this stuff up”.
A parameter is a descriptive measure for a ________.
A statistic is a descriptive measure for a ________.
                          Sample statistics    Population parameters
Number of Data Values     ________             ________
Symbol for mean           ________             ________
Formula for mean          ________             ________
Round-Off Rule: Round off your final answer to ________ than
is present in the original set of data.
Example: Contents of a sample of cans of regular Coke have the following
weights in lbs:
0.8192 0.8150 0.8163 0.8211 0.8181 0.8247
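As a minimal sketch of "add the values and divide by the number of values", the Coke weights can be averaged in Python. The function name `sample_mean` is made up here, and reporting five decimal places assumes the round-off rule means one more decimal place than the original data.

```python
weights_lb = [0.8192, 0.8150, 0.8163, 0.8211, 0.8181, 0.8247]

def sample_mean(values):
    """Add the values and divide by how many there are."""
    return sum(values) / len(values)

mean_weight = sample_mean(weights_lb)

# The data have 4 decimal places, so the answer is shown with 5
# (assumed reading of the round-off rule).
print(f"{mean_weight:.5f}")
```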
Advantage of using mean as the measure of center for a data set:
Takes every ________ into account
Much statistical inference that will be performed is based on the mean
Disadvantage:
Can be dramatically affected by a few ________.
Example: What People Earn (see next page)
Is the mean a good measure of center for this data set?
What People Earn*
Occupation   Annual Salary   No.   Sorted Data
Admin. Clerk $38,000 1 $20,000
Real estate agent $103,100 2 $20,000
Professional golfer $5,500,000 3 $23,500
Dogwalker $20,000 4 $25,000
High school counselor $58,900 5 $26,000
Mechanical engineer $47,900 6 $38,000
Mechanical engineer $46,000 7 $39,500
Health-care director $68,000 8 $40,000
Bridal salon owner $25,000 9 $42,000
Private investigator $210,000 10 $46,000
Part-time acupuncturist $40,000 11 $46,000
Surgery resident $39,500 12 $47,900
Housekeeping aide $23,500 13 $54,600
Migrant family liaison $26,000 14 $58,900
Court clerk $54,600 15 $68,000
Sales manager $180,000 16 $103,100
Deputy sheriff $46,000 17 $180,000
Fishing guide $42,000 18 $210,000
Radio host $32,000,000 19 $5,500,000
Lay pastor $20,000 20 $32,000,000
=
Mean =
Median =
* Source: Parade (Tri-City Herald), March 2004
Note: This data is NOT randomly selected – just some data I happened to pick from one
particular page.
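To see how strongly a couple of very large salaries pull the mean, here is a small sketch using Python's standard `statistics` module on the 20 salaries from the table. Dropping the two largest values is only an illustration of sensitivity, not a recommended way to clean data.

```python
from statistics import mean, median

salaries = [38_000, 103_100, 5_500_000, 20_000, 58_900, 47_900, 46_000,
            68_000, 25_000, 210_000, 40_000, 39_500, 23_500, 26_000,
            54_600, 180_000, 46_000, 42_000, 32_000_000, 20_000]

salary_mean = mean(salaries)
salary_median = median(salaries)

# Remove the two largest salaries and compare again.
trimmed = sorted(salaries)[:-2]
trimmed_mean = mean(trimmed)
trimmed_median = median(trimmed)

print(salary_mean, salary_median)
print(round(trimmed_mean), trimmed_median)
```

The mean moves by a large amount when the two extreme salaries are removed, while the median barely changes.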
2. Median
Can help overcome the disadvantage of the mean being dramatically
affected by ________.
Median is physically the ________ when the original
data values are sorted in increasing order.
Symbol for a median is ________
Procedure to find median:
Sort data in order.
If odd number of values, median = ________
If even number of values, median = ________
Example: What People Earn (see previous page)
The median is ________.
Resistant measure – not sensitive to the influence of a few
________.
Because of this, the median is frequently used for particular types of data,
instead of the mean.
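The procedure for finding a median can be sketched in a few lines of Python. This uses the standard definition (middle value for an odd count, average of the two middle values for an even count); the function name and the two tiny data sets are made up for illustration.

```python
def median_by_hand(values):
    """Sort the data, then take the middle value (odd n) or the
    average of the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_example = [7, 1, 5]        # sorted: 1 5 7
even_example = [7, 1, 5, 3]    # sorted: 1 3 5 7

print(median_by_hand(odd_example), median_by_hand(even_example))
```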
Example: Tri-City Housing Market
Source: https://www.tricitiesbusinessnews.com/2019/01/housing-market-slowing-down/
Example: US Household Income, 2018
Source: https://www.census.gov/library/stories/2019/09/us-median-household-income-up-in-2018-from-2017.html
Why do we use median instead of mean for data like housing prices and income?
Consider the population of:
All home selling prices in the Tri-Cities
All household incomes in the U.S.
The median is useful to eliminate the effect of the ________.
The ________ is commonly quoted instead of the ________
for data that is strongly ________, such as:
Example: TV commercial for Abreva
“Studies show 4.1 days median healing time”.
3. ________
Is the value that occurs ________
Used mainly for ________, not numerical data
Can have more than one ________, if more than one value
occurs with the greatest frequency.
If no value is repeated, there is ________.
Example: Class survey, people who have a tattoo or not.
________ people said yes ________ people said no
The mode is ________.
Example: What People Earn (see previously)
Mode: ________ Data set is: ________
Is the mode useful here?
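A rough sketch of finding the most frequent value(s) with Python's `collections.Counter` follows. The function name `modes` is made up, returning an empty list when nothing repeats is an assumed convention, and the yes/no responses are hypothetical rather than the class survey results.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency,
    or an empty list when no value repeats (assumed convention)."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return [value for value, count in counts.items() if count == top]

salaries = [38_000, 103_100, 5_500_000, 20_000, 58_900, 47_900, 46_000,
            68_000, 25_000, 210_000, 40_000, 39_500, 23_500, 26_000,
            54_600, 180_000, 46_000, 42_000, 32_000_000, 20_000]
survey = ["yes", "no", "no", "yes", "no"]   # hypothetical responses

print(modes(salaries))
print(modes(survey))
print(modes([1.2, 3.4, 5.6]))
```

The function works the same way for category labels and for numbers, which is why this measure is the one available for qualitative data.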
How to determine the most appropriate measure of center:
If there are no ________ affecting the mean,
and if the distribution is fairly ________, then the
mean is the most appropriate choice for measure of center, because it
takes ________ of the data values into consideration.
If there ________ extreme values affecting the mean, and if the
distribution is fairly ________ in either direction, then the
median might be a better choice, because it is ________
to the extreme values, and provides the best “typical” value of the data
set.
If the data consists of qualitative data, then the ________ is the
only appropriate measure of center.
The mode is not commonly used with ________ data.
For one reason, if you have continuous data, it is very possible that your
data set will not have a mode, because there are no repeated values. The
only time you would use it with a numerical set of data is if you specifically want
to know what the most frequent data value is.
1. From the class survey, here are the responses for how many of the United
States you have visited (combined with Math 146 Spring 2019 responses):
No. of states (sorted; read down each column)
 1    4    9
 2    5    9
 2    5    9
 2    5   10
 3    5   10
 3    5   10
 3    5   12
 3    6   12
 3    6   13
 3    6   15
 3    6   16
 4    6   21
 4    7   22
 4    7   23
 4    7   26
 4    8   30
 4    8   34
 4    8   35
 4    8   35
 4    8   40
Notice that there are a total of n = 60 data values.
1. Make a dotplot of the data (below), and use the plot to
describe the distribution of the data.
[Dotplot axis: Number of states]
2. Find the following values: Mean = ________ Median = ________ Mode = ________
3. Which measure of center would you quote for this dataset,
and why?
2. Answer and explain (briefly) your answers to the following. Note: one
way to explain is by coming up with a small example data set.
a. Is it possible that the mean might not equal any of the values in a data set?
b. Is it possible that the median might not equal any of the values in a data set?
c. Is it possible that the mean might be smaller than all of the values in a data set?
d. Is it possible that the median might be smaller than all of the values in a data
set?
e. Is it possible that the mean might be larger than only one value in a data set?
f. Is it possible that the median might be larger than only one value in a data set?
3. Think about all of the human beings alive at this moment.
a. Which value do you think is greater: the mean age or the median age of
all human beings alive at this moment? Explain your answer.
b. Which value do you think is greater: the mean age of all human beings
alive at this moment, or the mean age of all Americans alive at this
moment? Explain.
c. Estimate the median age of all human beings alive at this moment.
Estimate the median age of all Americans alive at this moment.
Section 3.2 Measures of Dispersion
Dispersion is the degree to which the data are ________
Example: Class Grades, 10 sample test grades from two sections
Section 1: 26, 43, 57, 64, 65, 79, 82, 88, 92, 104
Section 2: 50, 57, 66, 68, 70, 70, 72, 75, 82, 90
Different Measures of Dispersion/Variation:
1. ________
2. ________
3. ________
4. ________
1. ________
Difference between ________ value and ________ value
To calculate: range = ________
Benefit: Easy to compute
Drawback: ________ as other measures of variation
because it depends only on highest and lowest values.
Example: Section 1: Range = ________ Section 2: Range = ________
2. ________
Preferred measure of variation when the ________ is used as
the measure of center.
Measures the variation of all sample values
Larger values of standard deviation indicate ________
Only equals 0 if all data values are the ________
Units are the same as the units of the original data
A drawback is that it is not ________, because its value
can be strongly affected by a few extreme data values.
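Since the range depends only on the highest and lowest values, it takes one line to compute. The sketch below uses the two sections of test grades; the function name `data_range` and the altered copy of Section 1 are made up to show that changing the middle values leaves the range untouched.

```python
section_1 = [26, 43, 57, 64, 65, 79, 82, 88, 92, 104]
section_2 = [50, 57, 66, 68, 70, 70, 72, 75, 82, 90]

def data_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)

# Hypothetical variation: same extremes, every middle grade replaced by 65.
section_1_altered = [26] + [65] * 8 + [104]

print(data_range(section_1), data_range(section_2))
print(data_range(section_1_altered))
```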
Formula: ________
where, s = standard deviation of a ________
xi = ________
x̄ = mean of ________ values
n = number of data values in the ________
Example: Section 1 Grades
xi (data values)    xi - x̄    (xi - x̄)²
26                  ________  ________
43                  ________  ________
57                  -13       169
64                  -6        36
65                  -5        25
79                  9         81
82                  12        144
88                  18        324
92                  22        484
104                 34        1156
n = 10 data values
Sum of squared deviations = ________ = ________
Sample variance: ________
Sample standard deviation: ________
Same round-off rule: One ________ decimal place than the original
data for the final answer.
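The table method above can be mirrored step by step in Python. This is a sketch that assumes the usual sample formulas (sum of squared deviations divided by n - 1, then a square root); the variable names are made up for illustration.

```python
import math

grades = [26, 43, 57, 64, 65, 79, 82, 88, 92, 104]   # Section 1

n = len(grades)
x_bar = sum(grades) / n

# One row per data value: (x_i, x_i - x_bar, (x_i - x_bar)^2)
rows = [(x, x - x_bar, (x - x_bar) ** 2) for x in grades]
sum_sq = sum(sq for _, _, sq in rows)

sample_variance = sum_sq / (n - 1)    # divide by n - 1 for a sample
sample_sd = math.sqrt(sample_variance)

for x, dev, sq in rows:
    print(f"{x:>4} {dev:>6.0f} {sq:>6.0f}")
print(f"sum of squares = {sum_sq:.0f}")
print(f"s^2 = {sample_variance:.1f}, s = {sample_sd:.1f}")
```

Keeping `x_bar` and `sum_sq` at full precision until the final print follows the earlier caution about not rounding intermediate results.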
In STATDISK: Data/Explore Data – Descriptive Statistics
Section 1: Section 2:
Could tell just by looking at the graphs that Section 1 was more spread out.
Now we have an actual measure of that variation.
Standard deviation for Section 2 is much smaller, because the values are not
spread as far apart; in general the values are closer to the ________.
                        Sample Statistics    Population Parameters
Number of Data Values   n                    N
Mean                    x̄ = Σx / n           µ = Σx / N
Standard deviation      ________             ________
Variance                ________             ________
KEY!!! Standard deviation and variance are closely related – if you have one,
you can calculate the other!
Standard deviation = ________ Variance = ________
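Python's standard `statistics` module has separate functions for the sample and population versions, which makes the distinction in the table easy to try out. The sketch below treats the Section 2 grades first as a sample and then, purely for comparison, as if they were a whole population.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

grades = [50, 57, 66, 68, 70, 70, 72, 75, 82, 90]   # Section 2

s2, s = variance(grades), stdev(grades)             # sample: divide by n - 1
sigma2, sigma = pvariance(grades), pstdev(grades)   # population: divide by N

print(f"sample:     variance = {s2:.2f}, std dev = {s:.2f}")
print(f"population: variance = {sigma2:.2f}, std dev = {sigma:.2f}")

# Having one of the pair gives the other.
print(math.isclose(s, math.sqrt(s2)), math.isclose(sigma ** 2, sigma2))
```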
Empirical Rule
Notice that this rule applies to data having an approximately ________
distribution. It can be used to determine the percentage of data that will lie within
a certain number of standard deviations of the mean.
1. About ________ of all data values fall within 1 standard deviation of the mean.
2. About ________ of all data values fall within 2 standard deviations of the mean.
3. About ________ of all data values fall within 3 standard deviations of the mean.
Note: Can also be used assuming population parameters µ and σ.
Example: Using the Empirical Rule
Men’s pulse data from STATDISK
x̄ = ________
s = ________
x̄ – s = ________
x̄ + s = ________
From the Empirical Rule, would expect about 68% of the
data values to fall within the range of ________ to ________.
No. of values in this range = ________
x̄ – 2s = ________
x̄ + 2s = ________
From the Empirical Rule, would expect about 95% of the
data values to fall within the range of ________ to ________.
No. of values in this range = ________
From the Range Rule of Thumb, are there any unusual
Men’s Pulse Data
(Data Set 1, 12th ed.)
Pulse (bpm)
1 46
2 50
3 52
4 54
5 56
6 56
7 58
8 58
9 60
10 60
11 60
12 60
13 62
14 62
15 64
16 64
17 64
18 66
19 66
20 66
21 68
22 68
23 68
24 68
25 68
26 70
27 70
28 70
29 72
30 74
31 74
32 74
33 76
34 78
35 80
36 80
37 84
38 86
39 88
40 90
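One way to check the Empirical Rule against the men's pulse data is to count how many values fall within k standard deviations of the mean. This is a sketch using the standard `statistics` module; the function name `share_within` is made up for illustration.

```python
from statistics import mean, stdev

men_pulse = [46, 50, 52, 54, 56, 56, 58, 58, 60, 60, 60, 60, 62, 62, 64, 64,
             64, 66, 66, 66, 68, 68, 68, 68, 68, 70, 70, 70, 72, 74, 74, 74,
             76, 78, 80, 80, 84, 86, 88, 90]

def share_within(values, k):
    """Fraction of values within k sample standard deviations of the mean."""
    x_bar, s = mean(values), stdev(values)
    low, high = x_bar - k * s, x_bar + k * s
    inside = [x for x in values if low <= x <= high]
    return len(inside) / len(values)

for k in (1, 2, 3):
    print(k, f"{share_within(men_pulse, k):.1%}")
```

The printed shares can be compared with the percentages the rule predicts for a bell-shaped distribution.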
The following table reports the daily high temperatures (F) in February 2006
for three locations.
Feb Date | Lincoln, Neb | San Luis Obispo, CA | Sedona, AZ
1 53 68 62
2 59 69 64
3 40 77 62
4 36 68 66
5 36 76 61
6 44 71 61
7 46 79 68
8 34 85 68
9 41 87 63
10 39 67 66
11 27 75 57
12 30 81 62
13 61 83 66
14 68 64 63
15 41 57 55
16 26 57 51
17 11 53 48
18 14 54 48
19 28 52 47
20 47 58 49
21 51 58 50
22 53 67 55
23 48 70 61
24 65 67 62
25 36 65 64
26 53 64 67
27 71 63 66
28 73 62 57
1. Use the dot plots of the data below to
rank the three locations in order of
smallest to largest standard deviation:
Smallest std. dev:
Middle std. dev:
Largest std. dev:
2. On the next page, calculate (by hand)
the standard deviation for the
temperature data from San Luis Obispo.
Std. dev. =
3. Using the standard deviation for the
San Luis Obispo data, estimate the
standard deviation for the other two
locations:
Lincoln:
Sedona:
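After ranking the three locations by eye, the estimate can be checked with the standard `statistics.stdev` function. The sketch below stores the table's columns in a dictionary (the name `highs_f` is made up) and sorts the locations from smallest to largest sample standard deviation.

```python
from statistics import stdev

highs_f = {
    "Lincoln, Neb": [53, 59, 40, 36, 36, 44, 46, 34, 41, 39, 27, 30, 61, 68,
                     41, 26, 11, 14, 28, 47, 51, 53, 48, 65, 36, 53, 71, 73],
    "San Luis Obispo, CA": [68, 69, 77, 68, 76, 71, 79, 85, 87, 67, 75, 81,
                            83, 64, 57, 57, 53, 54, 52, 58, 58, 67, 70, 67,
                            65, 64, 63, 62],
    "Sedona, AZ": [62, 64, 62, 66, 61, 61, 68, 68, 63, 66, 57, 62, 66, 63,
                   55, 51, 48, 48, 47, 49, 50, 55, 61, 62, 64, 67, 66, 57],
}

# Locations ordered from least to most spread out.
ranked = sorted(highs_f, key=lambda city: stdev(highs_f[city]))

for city in ranked:
    print(f"{city}: s = {stdev(highs_f[city]):.1f}")
```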
Calculate the standard deviation for the San Luis Obispo data set, using the
following table:
Day High Temp. (F) xi – x̄ (xi – x̄)²
1 68
2 69
3 77
4 68
5 76
6 71 3.25 10.5625
7 79 11.25 126.5625
8 85 17.25 297.5625
9 87 19.25 370.5625
10 67 -0.75 0.5625
11 75 7.25 52.5625
12 81 13.25 175.5625
13 83 15.25 232.5625
14 64 -3.75 14.0625
15 57 -10.75 115.5625
16 57 -10.75 115.5625
17 53 -14.75 217.5625
18 54 -13.75 189.0625
19 52 -15.75 248.0625
20 58 -9.75 95.0625
21 58 -9.75 95.0625
22 67 -0.75 0.5625
23 70 2.25 5.0625
24 67 -0.75 0.5625
25 65 -2.75 7.5625
26 64 -3.75 14.0625
27 63 -4.75 22.5625
28 62 -5.75 33.0625
Sum =
Variance = ________ Standard Deviation = ________
Section 3.4 Measures of Position and Outliers
In this section, we will introduce a number of measures of position, which describe the
________ of a certain data value within the
entire set of data.
z-Scores
z-Scores are standardized values:
z score equals the number of standard deviations that a given data
value is above or below the ________
z score is ________ if value is greater than mean, z score is
________ if value is less than mean
can use z-score to identify ________
To calculate:
For a sample: z = ________
For a population: z = ________
where x = the particular data value.
Round z scores off to ________ decimal places.
The z-scores allow us to compare values from different data sets by providing a
standard basis of comparison.
Example: Comparing Standardized Test Scores
Suppose a college admissions office needs to compare scores of students who
take the Scholastic Aptitude Test (SAT) with those who take the American
College Test (ACT). Among the college’s applicants who take the SAT, scores
have a mean of 1500 and a standard deviation of 240. Among the college’s
applicants who take the ACT, scores have a mean of 21 and a standard
deviation of 6.
Mike scored 1740 on the SAT, and Packard scored 30 on the ACT.
Who did relatively better on their test?
Standardize the comparison by calculating the z-scores:
Mike: z = ________ Packard: z = ________
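The comparison can be sketched in Python using the description of a z-score as the number of standard deviations a value lies above or below the mean. The function name `z_score` is made up, and printing two decimal places is an assumption about the rounding convention.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

z_mike = z_score(1740, mean=1500, std_dev=240)   # SAT
z_packard = z_score(30, mean=21, std_dev=6)      # ACT

print(f"Mike: z = {z_mike:.2f}, Packard: z = {z_packard:.2f}")
```

Because both scores are now on the same standardized scale, the larger z-score marks the relatively better result.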
Identifying Outliers Using z-scores
Note that this method applies only to distributions that are fairly
________, because it is based on the Empirical Rule.
Within a particular data set, the z-score is useful for giving us some idea of the
relative standing of a particular data value:
If a data value has a z-score fairly near 0, it is ________ to the mean,
a very ________ data value.
If a data value has a z-score of less than -2, or greater than +2, that
means it is very ________ from the mean, and very far
away from most of the data values. That would be a less typical data
value.
Ordinary or “usual” values: ________
Unusual values: ________
[Number line of z scores from -3 to 3: ordinary values in the middle, unusual values at both ends]
Percentiles and Quartiles
Percentiles
Percentiles are numbers that divide a data set into ________ equal
parts, with about ________ of the data in each part.
A data set has ________ percentiles:
The interpretation of the “kth percentile” of an observation is that
________ of the observations are less than or equal to the observation.
Example: a data set with 500 values in it, sorted in ascending order
Percentiles would divide it into 100 groups, with 5 data values in each group.
1st 2nd 3rd 4th 5th | 6th 7th 8th 9th 10th | … | 496th 497th 498th 499th 500th
Quartiles – values that divide the data into four roughly equal groups.
There are three Quartiles, Q1, Q2 and Q3
Note that the data has to be sorted in ascending order.
Q1 separates the bottom ________ of the values
Q2 is the ________ – separates bottom ________ from top ________
Q3 separates the top ________.
Note:
If the number of observations in the data set is odd,
________ include the median when determining Q1 and Q3.
Quartiles are a ________ measure.
Example: For twelve (even) data values sorted in ascending order:
2 5 6 10 15 17 24 27 27 28 30 31
Q2 = median = ________.
Q1 = median of bottom half = ________.
Q3 = median of top half = ________.
Example: For eleven (odd) data values sorted in ascending order:
5 6 10 15 17 24 27 27 28 30 31
Q2 = median = ________.
Q1 = median of bottom half = ________.
Q3 = median of top half = ________.
Note that different textbooks or statistical software packages may have slightly
different methods on how to find the quartiles.
STATDISK will always give the same results for the quartiles as the method in
our textbook as long as there are an even number of data values.
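Because quartile conventions differ, it can help to make the convention an explicit switch. The sketch below implements the median-of-halves method; the function name, the `include_median_if_odd` parameter, and its default of leaving the middle value out of both halves are assumptions for illustration, so match the setting to the rule used in class.

```python
from statistics import median

def quartiles(values, include_median_if_odd=False):
    """Q1, Q2, Q3 by the median-of-halves method.

    include_median_if_odd decides whether the middle value joins both
    halves when n is odd; the default (leave it out) is an assumption.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    if n % 2 == 0:
        lower, upper = ordered[:half], ordered[half:]
    elif include_median_if_odd:
        lower, upper = ordered[:half + 1], ordered[half:]
    else:
        lower, upper = ordered[:half], ordered[half + 1:]
    return median(lower), median(ordered), median(upper)

even_data = [2, 5, 6, 10, 15, 17, 24, 27, 27, 28, 30, 31]
odd_data = [5, 6, 10, 15, 17, 24, 27, 27, 28, 30, 31]

print(quartiles(even_data))
print(quartiles(odd_data))
print(quartiles(odd_data, include_median_if_odd=True))
```

For an even number of values the two settings agree, which is consistent with the note that software and textbook methods match in that case.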
Interquartile Range (IQR)
The interquartile range is another measure of ________.
IQR = ________
IQR represents the range of values over which ________ of the data is spread.
Outliers
An outlier is a data value that is ________ from the other data
values, an extreme observation. An outlier could be the result of:
An ________ (measurement, sampling, or recording)
Just an unusually ________ observation
Checking for outliers using Quartiles:
Calculate the fences, cutoff points for determining outliers:
Lower fence = ________
Upper fence = ________
A data value is considered an outlier if:
It is ________ the lower fence, or
It is ________ the upper fence.
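A minimal sketch of the fence check follows. It assumes the common convention of placing the fences 1.5 × IQR beyond Q1 and Q3, which may or may not be the exact form used in class. The quartiles Q1 = 8 and Q3 = 27.5 come from applying the median-of-halves method to the twelve-value example; the function names are made up.

```python
def fences(q1, q3, k=1.5):
    """Lower and upper fences, k * IQR beyond the quartiles (assumed k = 1.5)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, q1, q3):
    """Values that fall below the lower fence or above the upper fence."""
    low, high = fences(q1, q3)
    return [x for x in values if x < low or x > high]

data = [2, 5, 6, 10, 15, 17, 24, 27, 27, 28, 30, 31]
q1, q3 = 8, 27.5

print(fences(q1, q3))
print(find_outliers(data, q1, q3))
```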
Example: Natural Selection (source: Workshop Statistics, 4th edition)
A landmark study on the topic of natural selection was conducted by Hermon
Bumpus in 1898. Bumpus gathered extensive data on house sparrows that were
brought to the Anatomical Laboratory of Brown University in Providence, Rhode
Island, following a particularly severe winter storm. Some of the sparrows were
revived, but some sparrows perished. Bumpus analyzed his data to investigate
whether or not those that survived tended to have distinctive physical
characteristics related to their fitness.
The following sorted data are the total length measurements (in millimeters, from
the tip of the sparrow’s beak to the tip of its tail) for the 24 adult males that died
and the 35 adult males that survived. (note: I also added a column of numbers
next to each data list, just to identify the data values)
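                            Sparrow Died    Sparrow Lived
Minimum                     ________        ________
Q1                          ________        ________
Q2                          ________        ________
Q3                          ________        ________
Maximum                     ________        ________
Interquartile Range (IQR)   ________        ________
Lower Fence                 ________        ________
Upper Fence                 ________        ________
Note: the first five rows that you filled out are called the: ________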
Sparrow Died         Sparrow Lived
No. Length (mm)      No. Length (mm)
1 156 1 153
2 158 2 154
3 160 3 155
4 160 4 155
5 160 5 156
6 161 6 156
7 161 7 157
8 161 8 157
9 161 9 158
10 161 10 158
11 162 11 158
12 162 12 158
13 162 13 158
14 162 14 159
15 162 15 159
16 162 16 159
17 163 17 159
18 163 18 159
19 164 19 160
20 165 20 160
21 165 21 160
22 165 22 160
23 166 23 160
24 166 24 160
       25 160
       26 160
       27 160
       28 161
       29 161
       30 161
       31 161
       32 162
       33 163
       34 165
       35 166
Boxplots
This is a graphical display of the data based on the ________.
Many books call what we are going to make a “modified boxplot”, because we
are going to indicate the outliers on our graph.
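The numbers behind a modified boxplot can be collected before drawing anything. The sketch below does this for the lengths of the sparrows that died. It assumes 1.5 × IQR fences and whiskers that stop at the most extreme values inside the fences (a common convention), and it only accepts an even number of values so that the odd-count quartile rule does not come into play. The function name `boxplot_parts` is made up.

```python
from statistics import median

died_mm = [156, 158, 160, 160, 160, 161, 161, 161, 161, 161, 162, 162,
           162, 162, 162, 162, 163, 163, 164, 165, 165, 165, 166, 166]

def boxplot_parts(values):
    """Box, whisker ends, and outliers for an even-sized data set."""
    ordered = sorted(values)
    if len(ordered) % 2 == 1:
        raise ValueError("this sketch only handles an even number of values")
    half = len(ordered) // 2
    q1, q2, q3 = median(ordered[:half]), median(ordered), median(ordered[half:])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in ordered if low_fence <= x <= high_fence]
    return {
        "box": (q1, q2, q3),
        "whiskers": (inside[0], inside[-1]),   # most extreme non-outliers
        "outliers": [x for x in ordered if x < low_fence or x > high_fence],
    }

parts = boxplot_parts(died_mm)
for name, value in parts.items():
    print(name, value)
```

Any values listed under "outliers" would be drawn as separate marks beyond the whiskers.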
Sparrow died :
Sparrow lived :
Conclusions?
1. What do the boxplots reveal, as far as whether or not there appears to be
a difference in lengths between the sparrows that survived and the sparrows that
died?
2. What type of a study was this: observational, or designed experiment?
3. As such, can we conclude that being shorter caused the sparrows to be
more likely to survive the storm?
PULSE RATES - MEN
Men's Pulse Rates –
SORTED (bpm)
1 46
2 50
3 52
4 54
5 56
6 56
7 58
8 58
9 60
10 60
11 60
12 60
13 62
14 62
15 64
16 64
17 64
18 66
19 66
20 66
21 68
22 68
23 68
24 68
25 68
26 70
27 70
28 70
29 72
30 74
31 74
32 74
33 76
34 78
35 80
36 80
37 84
38 86
39 88
40 90
1. Take your own pulse rate for one minute:
Pulse = ________ beats per minute
2. Calculate your z-score.
z =
Assuming the distribution is bell-shaped, is your data value an
outlier?
3. Find the quartiles (do NOT add your data value):
Q1 =
Q2 =
Q3 =
4. Find the IQR (interquartile range):
IQR =
5. Identify any outliers (circle them):
Lower fence =
Upper fence =
PULSE RATES - WOMEN
Women's Pulse Rates –
SORTED (bpm)
1 56
2 60
3 62
4 62
5 64
6 64
7 66
8 68
9 68
10 72
11 72
12 72
13 72
14 72
15 72
16 72
17 74
18 74
19 76
20 76
21 78
22 78
23 78
24 78
25 78
26 78
27 78
28 80
29 82
30 82
31 82
32 88
33 90
34 90
35 90
36 96
37 98
38 98
39 100
40 104
1. Take your own pulse rate for one minute:
Pulse = ________ beats per minute
2. Calculate your z-score.
z =
Assuming the distribution is bell-shaped, is your data value an
outlier?
3. Find the quartiles (do NOT add your data value):
Q1 =
Q2 =
Q3 =
4. Find the IQR (interquartile range):
IQR =
5. Identify any outliers (circle them):
Lower fence =
Upper fence =
Women’s Pulse Data: Men’s Pulse Data:
mean = mean =
standard deviation = standard deviation =
Who has the relatively higher pulse rate (compared to their respective populations):
A woman with a pulse rate of 100 bpm, or a man with a pulse rate of 88 bpm? Check by
calculating the z-score for both.
Women’s 5-number summary: Men’s 5-number summary:
Min = Min =
Q1 = Q1 =
Q2 = Q2 =
Q3 = Q3 =
Max = Max =
Boxplot of Women’s Pulse Rate Data (indicating potential outliers):
Boxplot of Men’s Pulse Rate Data (indicating potential outliers):
Pulse Rate (bpm)
What shape do the distributions appear to be, and what conclusions can you make based
on the comparison of the two boxplots?