M35 Chapter 4
Transcript of M35 Chapter 4
-
8/12/2019 M35 Chapter 4
Analyzing and Summarizing Data
-
Summarizing Data
Most sets of data show a distinct tendency to group around a central value (a central tendency).
The purpose of a measure of central tendency is to find a single value that best represents an entire distribution of scores.
When people talk about an average value, the middle value, or the most frequent value, they are talking informally about the mean, median, and mode: three measures of central tendency.
-
TERMINOLOGY
Central Tendency
-the extent to which the data values group around a typical or central value
Measures of Central Tendency
-numerical values that locate, in some sense, the center of a set of data
Variation
-the amount of dispersion, or scattering, of values away from a central value
Shape
-the pattern of the distribution of values from the lowest value to the highest value
-
IMPORTANCE
1) To find a representative value
It gives us one value for the distribution, and this value represents the entire distribution.
2) To condense data
An average converts the whole set of figures into just one figure and thus helps condense the data.
3) To make comparisons
To compare two or more distributions, we have to find the representative values of these distributions.
4) To support further statistical analysis
Many techniques of statistical analysis (dispersion, skewness, correlation) are based on measures of central tendency.
-
Mean
-The average with which you are probably most familiar.
-The sample mean is represented by $\bar{x}$ (read "x-bar" or "sample mean").
-The mean is found by adding all the values of the variable x (this sum of x values is symbolized $\sum x$) and dividing the sum by the number of these values, n (the sample size).
Sample Mean = $\bar{x} = \frac{\sum x}{n}$
Population Mean = $\mu = \frac{\sum x}{N}$ (N is the population size)
-
Activity:
Typical Time It Takes To Get Ready In The Morning
If you knew the typical time it takes you to get ready in the morning, you might be able to better plan your morning and minimize any excessive lateness (or earliness) in getting to your destination.
Find the mean for the following times (in minutes), collected on 10 consecutive days.
Day 1 2 3 4 5 6 7 8 9 10
Time (min) 39 29 43 52 39 44 40 31 44 35
Answer:
-
Mean = 39.6 minutes
Even though no individual day in the sample actually had the value 39.6 minutes, allotting about 40 minutes to get ready would be a good rule for planning your mornings.
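The arithmetic behind this answer can be checked with a few lines of Python. This is only a minimal sketch; the function name `sample_mean` is chosen here for illustration and is not part of the slides.

```python
# Getting-ready times (minutes) for the 10 consecutive days in the activity.
times = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]


def sample_mean(values):
    """Return the sum of the values divided by the number of values (n)."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


mean_time = sample_mean(times)
print(f"Sum = {sum(times)}, n = {len(times)}, mean = {mean_time:.1f} minutes")
```

The sum of the ten times is 396, so dividing by n = 10 gives the 39.6 minutes reported on the slide.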
-
What if the time you spent on Day 4 were 102 minutes instead of 52 minutes?
Day 1 2 3 4 5 6 7 8 9 10
Time (min) 39 29 43 102 39 44 40 31 44 35
Find the Mean.
-
Answer:
Mean = 44.6 minutes
The one extreme value has increased the mean from 39.6 to 44.6 minutes.
In contrast to the original mean, which was in the middle of the data, the new mean is greater than 9 of the 10 getting-ready times.
Because of the extreme value, the mean is no longer a good measure of central tendency.
Ranked times (min) with the mean shown in brackets at its position:
Original data: 29 31 35 39 39 [39.6] 40 43 44 44 52
With 102 on Day 4: 29 31 35 39 39 40 43 44 44 [44.6] 102
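To see how a single extreme value pulls the mean away from the middle, the sketch below recomputes the mean for both data sets and counts how many times fall below it. The helper name `mean_and_count_below` is an illustrative assumption, not something defined in the slides.

```python
# The original times and the same times with Day 4 changed from 52 to 102.
original = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]
with_outlier = [39, 29, 43, 102, 39, 44, 40, 31, 44, 35]


def mean_and_count_below(values):
    """Return the mean and the number of values strictly below the mean."""
    mean = sum(values) / len(values)
    below = sum(1 for v in values if v < mean)
    return mean, below


for label, data in [("original", original), ("with outlier", with_outlier)]:
    mean, below = mean_and_count_below(data)
    print(f"{label}: mean = {mean:.1f}, values below the mean = {below} of {len(data)}")
```

For the original data, half of the values lie below the mean; with the outlier, nine of the ten do, which is why the mean no longer describes a typical morning.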
-
Mean
Use the mean to describe the middle of a set of data that does not have an outlier (extreme value).
Advantages:
-Most popular measure in fields such as business, engineering and computer science.
-It is unique - there is only one answer.
-Useful when comparing sets of data.
Disadvantages:
-Affected by extreme values (outliers)
-
Median
-The value of the data that occupies the middle position when the data are ranked in order according to size.
-The sample median is represented by $\tilde{x}$ (read "x-tilde" or "sample median").
-The median is not affected by extreme values, so you can use the median when extreme values are present.
-
Steps in determining the Median:
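1) Rank the data.
2) Determine the depth of the median (the rank of the median value), where n is the sample size:
$d(\tilde{x}) = \frac{n + 1}{2}$
3) Determine the value of the median by counting ranked values up to the position given by the depth. If the depth ends in .5, the median is the average of the two values on either side of that position.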
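The three steps above can be sketched in Python as follows. The function name `median_by_depth` is hypothetical, and averaging the two neighbouring values when the depth ends in .5 is the usual convention assumed here.

```python
def median_by_depth(values):
    """Return (depth, median) using the rank-and-count procedure."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ranked = sorted(values)           # Step 1: rank the data
    n = len(ranked)
    depth = (n + 1) / 2               # Step 2: depth of the median
    position = int(depth)             # whole-number part of the depth (1-based)
    lower = ranked[position - 1]      # Step 3: count in to that position
    if depth == position:             # odd n: the depth is a whole number
        return depth, lower
    upper = ranked[position]          # even n: average the two middle values
    return depth, (lower + upper) / 2


times = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]
depth, median = median_by_depth(times)
print(f"depth = {depth}, median = {median}")
```

For the ten getting-ready times the depth is 5.5, so the median is the average of the 5th and 6th ranked values (39 and 40).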
-
Activity:
A) Find the median for the set of data {6, 3, 8, 5, 3}.
Median = 5 (3rd value)
B) Find the median of the sample 9, 6, 7, 9, 10, 8.
Median = 8.5 (3.5th value)
C) Find the median for both cases:
a) Time (min) 29 31 35 39 39 40 43 44 44 52
b) Time (min) 29 31 35 39 39 40 43 44 44 102
a) Median = 39.5 (5.5th value)
b) Median = 39.5 (5.5th value)
-
Median
Use the median to describe the middle of a set of data that does have an outlier.
Advantages:
-Extreme values (outliers) do not affect the median as strongly as they do the mean
-Easy to calculate and, in some cases, can be obtained by inspection
-It is unique - there is only one answer.
Disadvantages:
-Not capable of further algebraic treatment
-Ranking a large number of data values can be tedious
-
Mode
-The value of x that occurs most frequently
-Can be used with categorical data
-Like the median, the mode is not affected by extreme values
-Often, there is no mode or there are several modes in a set of data
-Distributions can be: unimodal, bimodal, or multimodal
-
Activity: For Categorical Data
Find the mode.
Flavor f
Vanilla 28
Chocolate 22
Strawberry 15
Neapolitan 8
Butter Pecan 12
Rocky Road 9
Fudge Ripple 6
Mode: Vanilla
-
Activity: For Numerical Data
A) Find the mode.
Day 1 2 3 4 5 6 7 8 9 10
Time (min) 39 29 43 52 39 44 40 31 44 35
Mode = 39, 44 (bimodal)
B) The bounced check fees ($) for a sample of 10 banks are:
26 28 20 21 22 25 18 23 15 30
Find the mode.
Mode = no mode
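Finding the mode amounts to counting how often each value occurs. The sketch below uses Python's `collections.Counter`; the function name `modes` is illustrative, and returning an empty list when no value repeats is an assumption that mirrors the "no mode" answer above.

```python
from collections import Counter


def modes(values):
    """Return every most-frequent value, or [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # assumption: no repeated value is reported as "no mode"
    return sorted(value for value, count in counts.items() if count == top)


times = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]
fees = [26, 28, 20, 21, 22, 25, 18, 23, 15, 30]

print("Times:", modes(times))  # two values tie for most frequent (bimodal)
print("Fees:", modes(fees))    # no value repeats, so the list is empty
```

The same function works for categorical data, because `Counter` counts any hashable value, including strings such as flavor names.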
-
Mode
Use the mode when the data are non-numeric or when asked to choose the most popular item.
Advantages:
-Extreme values (outliers) do not affect the mode.
Disadvantages:
-Not necessarily unique - there may be more than one answer
-When no values repeat in the data set, there is no mode, so the measure may seem useless.
-When there is more than one mode, it is difficult to interpret and/or compare.
-
Considerations for Choosing a Measure of Central
Tendency:
For nominal variables, the mode is the only measure that can be used.
For ordinal variables, the mode and the median may be used. The median provides more information (it takes into account the ranking of categories).
For numerical variables, the mode, median and mean may all be calculated. The mean provides the most information about the distribution, but the median is preferred if the distribution has extreme values.
-
Midrange
-The number exactly midway between the lowest data value, L, and the highest data value, H
Midrange = $\frac{L + H}{2}$
-
Activity:
Find the mean, median, mode and midrange.
{6, 7, 8, 9, 9, 10}
Mean = 8.17
Median = 8.5
Mode = 9
Midrange = 8
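All four measures for this data set can be verified with Python's standard `statistics` module. This is a minimal sketch; the midrange has no built-in function, so it is computed here directly from the lowest and highest values.

```python
import statistics

data = [6, 7, 8, 9, 9, 10]

mean = statistics.mean(data)              # sum of values divided by n
median = statistics.median(data)          # average of the two middle values
mode = statistics.mode(data)              # most frequent value
midrange = (min(data) + max(data)) / 2    # (L + H) / 2

print(f"Mean = {mean:.2f}")
print(f"Median = {median}")
print(f"Mode = {mode}")
print(f"Midrange = {midrange}")
```

Note that the mean is 49/6, which the slide rounds to 8.17.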
-
ASSIGNMENT: (1 whole sheet, due: THU, Dec. 12)
1) Compute the mean, median and mode for the set of scores shown in the following frequency distribution table.
X f
7 1
6 1
5 1
4 1
3 4
2 3
1 1
2) Identify the circumstances where the median, instead of the mean, is the preferred measure of central tendency.
-
3) Under what circumstances will the mean, the median, and the mode all have the same value?
4) Under what circumstances is the mode the preferred measure of central tendency?
5) Explain why the mean is often not a good measure of central tendency for a skewed distribution.
6) Draw and determine the shape of the distribution when:
a) The mean, median and mode are equal
b) The mode is lowest, followed by the median and the mean
c) The mean is lowest, followed by the median and the mode