Central Tendency
Introduction to Statistics, Chapter 3
Sep 1, 2009, Class #3
Central Tendency
A statistical measure to determine a single score that defines the center of the distribution
There are many measures of central tendency. The most common are the mean, median, and mode
MEASURES OF CENTRAL TENDENCY
The Mode
The Median
The Mean
The Symmetrical Distribution
The Positively Skewed Distribution
The Negatively Skewed Distribution
Illustrating the mode
[Figure: bar chart of GOV SHOULD ASSIST LOW-INCOME COLLEGE STUDENT; y-axis: Percent (0 to 60); categories DEFIN SHOULD BE, PROBAB SHOULD BE, PROB SHOULD NOT BE, DEFIN SHOULD NOT BE; the tallest bar is labeled Mode]
Mode: the category or score with the largest frequency (or %)
The mode is always a category or score
The mode is not necessarily the category with the majority (more than 50% of the cases)
The mode is the only measure of central tendency for nominal variables
Some distributions are bimodal
GOV SHOULD PROVIDE HOUSING TO POOR
                               Frequency   Percent   Valid Percent   Cumulative Percent
Valid    DEFIN SHOULD BE              80       8.2            19.3                 19.3
         PROBAB SHOULD BE            194      19.8            46.9                 66.2
         PROB SHOULD NOT BE           99      10.1            23.9                 90.1
         DEFIN SHOULD NOT BE          41       4.2             9.9                100.0
         Total                       414      42.3           100.0
Missing  NAP                         524      53.5
         CANT CHOOSE                  18       1.8
         NA                           23       2.3
         Total                       565      57.7
Total                                979     100.0
The mode is the category with the highest frequency (not necessarily the majority)
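As a quick illustration, here is a minimal Python sketch of reading the mode off a frequency table, using the valid-case counts from the table above. The helper name `mode_of_table` is hypothetical; it returns every category tied for the largest frequency, so a bimodal table would yield two categories.

```python
# Valid-case frequencies from the "GOV SHOULD PROVIDE HOUSING TO POOR" table
housing = {
    "DEFIN SHOULD BE": 80,
    "PROBAB SHOULD BE": 194,
    "PROB SHOULD NOT BE": 99,
    "DEFIN SHOULD NOT BE": 41,
}

def mode_of_table(freq):
    """Return the category (or categories, if tied) with the largest frequency."""
    top = max(freq.values())
    return [category for category, count in freq.items() if count == top]

modes = mode_of_table(housing)
valid_n = sum(housing.values())                  # 414 valid cases
share = housing[modes[0]] / valid_n * 100        # valid percent of the modal category

print(modes)             # ['PROBAB SHOULD BE']
print(round(share, 1))   # 46.9 -> the mode, but not a majority
```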
GOVTS RESP: PROVIDE FOR THE UNEMPLOYED
                               Frequency   Percent   Valid Percent   Cumulative Percent
Valid    DEFIN SHOULD BE              56       5.7            13.5                 13.5
         PROBAB SHOULD BE            135      13.8            32.5                 45.9
         PROB SHOULD NOT BE          139      14.2            33.4                 79.3
         DEFIN SHOULD NOT BE          86       8.8            20.7                100.0
         Total                       416      42.5           100.0
Missing  NAP                         524      53.5
         CANT CHOOSE                  17       1.7
         NA                           22       2.2
         Total                       563      57.5
Total                                979     100.0
Bimodal Distribution
A bimodal distribution is a continuous probability distribution with two different modes. These appear as distinct peaks. See below:
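For raw scores, Python's standard `statistics.multimode` (Python 3.8+) returns every value tied for the highest frequency, which is one simple way to flag a bimodal set of scores. The scores below are hypothetical, not from the slides. Note that `multimode` only reports exact ties; a distribution with two clear peaks of unequal height is still usually called bimodal, yet `multimode` would return only the taller peak.

```python
from statistics import multimode

# Hypothetical scores (not from the slides) with two equally tall peaks, at 3 and 8
scores = [1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8, 8, 9]

print(multimode(scores))  # [3, 8] -> two modes, so the scores are bimodal
```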
THE MEDIAN
The median is a measure of central tendency for variables which are at least ordinal
The median represents the exact middle of a distribution
It is the score that divides the distribution into two equal parts
Finding the Median in sorted data
“How satisfied are you with your health insurance?”
Responses of 7 Individuals
very dissatisfied
very satisfied
somewhat satisfied
very dissatisfied
somewhat dissatisfied
somewhat satisfied
very satisfied
Total (N) 7
To locate the median
Arrange the responses in order from lowest to highest (or highest to lowest):
Response
very dissatisfied
very dissatisfied
somewhat dissatisfied
somewhat satisfied   (the middle case = Median)
somewhat satisfied
very satisfied
very satisfied
Summary: Locating the Median with N = Odd
The median is the response associated with the middle case.
You find the middle case by computing (N + 1) / 2. Since N = 7, the middle case is the (7 + 1) / 2 = 4th case.
The response associated with the 4th case is “somewhat satisfied”. Therefore, the median is: somewhat satisfied.
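The odd-N procedure above can be sketched in Python as follows. The category ordering list `ORDER` and the function name `ordinal_median_odd` are assumptions made for illustration; the key point is that ordinal responses must be sorted by their rank, not alphabetically, before taking the (N + 1) / 2-th case.

```python
# Response categories from lowest to highest (this ranking is what makes the variable ordinal)
ORDER = ["very dissatisfied", "somewhat dissatisfied", "somewhat satisfied", "very satisfied"]

# The 7 responses, in the order they were collected
responses = [
    "very dissatisfied", "very satisfied", "somewhat satisfied", "very dissatisfied",
    "somewhat dissatisfied", "somewhat satisfied", "very satisfied",
]

def ordinal_median_odd(values, order):
    """Median of an odd number of ordinal responses: the (N + 1) / 2-th case after sorting."""
    n = len(values)
    if n % 2 == 0:
        raise ValueError("this sketch handles odd N only")
    ranked = sorted(values, key=order.index)   # sort by category rank, not alphabetically
    middle_position = (n + 1) // 2             # 1-based position of the middle case
    return ranked[middle_position - 1]         # convert to a 0-based index

print(ordinal_median_odd(responses, ORDER))   # somewhat satisfied
```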
To locate the median (N = Even)
Suicide rates of cities
7.44, 10.00, 12.26, 12.61, 13.38, 14.11, 14.30, 14.78
The median is located halfway between the two middle cases. When the variable is interval-ratio, we can average the two middle cases (here the 4th and 5th of N = 8):
$\text{Median} = \dfrac{12.61 + 13.38}{2} = 12.995$
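A minimal Python sketch of the even-N rule, applied to the suicide rates above. The function name `median_even` is hypothetical; the standard library's `statistics.median` uses the same "average the two middle cases" rule for even N, so the two agree.

```python
from statistics import median

def median_even(values):
    """Median for an even number of interval-ratio scores: average of the two middle cases."""
    ranked = sorted(values)
    n = len(ranked)
    if n % 2 != 0:
        raise ValueError("this sketch handles even N only")
    lower = ranked[n // 2 - 1]   # the (N / 2)-th case, 1-based
    upper = ranked[n // 2]       # the (N / 2 + 1)-th case
    return (lower + upper) / 2

suicide_rates = [7.44, 10.00, 12.26, 12.61, 13.38, 14.11, 14.30, 14.78]

print(round(median_even(suicide_rates), 3))                 # 12.995
print(median(suicide_rates) == median_even(suicide_rates))  # True
```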
Finding the Median in Frequency Distributions
Construct a cumulative percentage distribution
The observation located at the middle of the distribution is the one with a cumulative percentage of 50%
The median is the category that contains this observation: the first category whose cumulative percentage reaches or passes 50%
POWER OF BUSINESS AND INDUSTRY
                               Frequency   Percent   Valid Percent   Cumulative Percent
Valid    FAR TOO MUCH POWER           49       5.0            12.2                 12.2
         TOO MUCH POWER              148      15.1            36.9                 49.1
         RIGHT AMNT OF POWER         170      17.4            42.4                 91.5
         TOO LITTLE POWER             32       3.3             8.0                 99.5
         FAR TOO LITTLE POWER          2        .2              .5                100.0
         Total                       401      41.0           100.0
Missing  NAP                         524      53.5
         CANT CHOOSE                  37       3.8
         NA                           17       1.7
         Total                       578      59.0
Total                                979     100.0
Median: RIGHT AMNT OF POWER. Look here: the 50th percentile is included in this category, since the cumulative percent rises from 49.1 to 91.5.
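The cumulative-percentage procedure can be sketched in Python using the valid-case counts from the table above. The function names are hypothetical. This sketch picks the first category whose cumulative percent is at least 50; it assumes the categories are listed in their ordinal order, as they are in the table.

```python
from itertools import accumulate

# Valid-case frequencies from the "POWER OF BUSINESS AND INDUSTRY" table, in ordinal order
power = [
    ("FAR TOO MUCH POWER", 49),
    ("TOO MUCH POWER", 148),
    ("RIGHT AMNT OF POWER", 170),
    ("TOO LITTLE POWER", 32),
    ("FAR TOO LITTLE POWER", 2),
]

def cumulative_percents(table):
    """Cumulative valid percent at each category, in table order."""
    counts = [count for _, count in table]
    total = sum(counts)
    return [100 * running / total for running in accumulate(counts)]

def median_category(table):
    """First category whose cumulative percent reaches or passes 50%."""
    for (category, _), cum_pct in zip(table, cumulative_percents(table)):
        if cum_pct >= 50:
            return category
    raise ValueError("table has no cases")

print([round(p, 1) for p in cumulative_percents(power)])  # [12.2, 49.1, 91.5, 99.5, 100.0]
print(median_category(power))                             # RIGHT AMNT OF POWER
```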
Locating Percentiles in a Frequency Distribution
A percentile is a score below which a specific percentage of the distribution falls.
The 75th percentile is a score below which 75% of the cases fall.
The median is the 50th percentile: 50% of the cases fall below it.
Another type of percentile is the quartile: the lower quartile is the 25th percentile and the upper quartile is the 75th percentile.
[Figure: percentile chart with lines labeled 50th Percentile, 85th Percentile, and 97th Percentile]
NUMBER OF CHILDREN
                          Frequency   Percent   Valid Percent   Cumulative Percent
Valid    0                      260      26.6            26.6                 26.6
         1                      161      16.4            16.5                 43.1
         2                      260      26.6            26.6                 69.7
         3                      155      15.8            15.9                 85.6
         4                       70       7.2             7.2                 92.7
         5                       31       3.2             3.2                 95.9
         6                       21       2.1             2.1                 98.1
         7                       11       1.1             1.1                 99.2
         EIGHT OR MORE            8        .8              .8                100.0
         Total                  977      99.8           100.0
Missing  NA                       2        .2
Total                           979     100.0
25th percentile: 0 children (25% included here)
50th percentile: 2 children (50% included here)
80th percentile: 3 children (80% included here)
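The same cumulative-percent logic locates any percentile in a frequency distribution. Below is a minimal Python sketch using the valid-case counts from the NUMBER OF CHILDREN table; the function name `percentile_category` is hypothetical, and it returns the first category whose cumulative percent is at least p.

```python
from itertools import accumulate

# Valid-case frequencies from the "NUMBER OF CHILDREN" table, in order
children = [("0", 260), ("1", 161), ("2", 260), ("3", 155), ("4", 70),
            ("5", 31), ("6", 21), ("7", 11), ("EIGHT OR MORE", 8)]

def percentile_category(table, p):
    """Return the first category whose cumulative percent is at least p (0 < p <= 100)."""
    counts = [count for _, count in table]
    total = sum(counts)
    for (category, _), running in zip(table, accumulate(counts)):
        if 100 * running / total >= p:
            return category
    raise ValueError("p must be between 0 and 100")

for p in (25, 50, 80):
    print(p, percentile_category(children, p))
# 25 0
# 50 2
# 80 3
```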
The Mean
The arithmetic average, obtained by adding up all the scores and dividing by the total number of scores.
$\text{Mean} = M = \dfrac{\sum X}{n}$
where
$\sum X$ = the sum of all the X scores in the sample
$n$ = the number of observations in the sample
Often we use the symbol $\bar{X}$, referred to as “x-bar”, for the mean.
Murder Rate in U.S. Cities: Finding the Mean
CITY             MURDER RATE (per 100,000)
New York                  29.3
Los Angeles               28.9
Chicago                   32.9
Houston                   36.5
Philadelphia              25.0
San Diego                 14.7
Detroit                   58.4
Dallas                    48.8
Phoenix                   12.8
San Antonio               21.8
Honolulu                   3.4
San Jose                   6.6
Baltimore                 40.6
San Francisco             12.9
Jacksonville              19.8
Total                    392.4
$\bar{X} = \dfrac{392.4}{15} = 26.16$
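The mean formula translates directly into code. This minimal Python sketch applies it to the 15 city murder rates; the pairing of rates with cities follows the order in which the slide lists them, and the helper name `mean` is our own (the standard library's `statistics.mean` would give the same value).

```python
# Murder rates (per 100,000) for the 15 cities, in the order listed on the slide
murder_rates = {
    "New York": 29.3, "Los Angeles": 28.9, "Chicago": 32.9, "Houston": 36.5,
    "Philadelphia": 25.0, "San Diego": 14.7, "Detroit": 58.4, "Dallas": 48.8,
    "Phoenix": 12.8, "San Antonio": 21.8, "Honolulu": 3.4, "San Jose": 6.6,
    "Baltimore": 40.6, "San Francisco": 12.9, "Jacksonville": 19.8,
}

def mean(scores):
    """Arithmetic mean: the sum of the scores divided by the number of scores."""
    scores = list(scores)
    return sum(scores) / len(scores)

rates = murder_rates.values()
print(round(sum(rates), 1))    # 392.4
print(round(mean(rates), 2))   # 26.16
```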
Finding the mean in a frequency distribution
When data are arranged in a frequency distribution, we must give each score its proper weight by multiplying it by its frequency. We use the following formula to calculate the mean:
$\bar{X} = \dfrac{\sum fX}{n}$
where
$\bar{X}$ = the mean
$fX$ = a score multiplied by its frequency
$n$ = the total number of cases in the distribution
Calculating the Mean from a Frequency Distribution
# of Children (X)   Frequency (f)   Frequency*X (fX)
0                          12                 0
1                          25                25
2                         733              1466
3                         333               999
4                         183               732
5                          26               130
6                          15                90
7                          12                84
Total                    1339              3526
$\bar{X} = \dfrac{3526}{1339} \approx 2.63$
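Here is a minimal Python sketch of the frequency-distribution formula, using the (X, f) pairs from the table above: each score is weighted by its frequency before dividing by the total number of cases.

```python
# (X, f) pairs from the "# of Children" frequency distribution
distribution = [(0, 12), (1, 25), (2, 733), (3, 333), (4, 183), (5, 26), (6, 15), (7, 12)]

n = sum(f for _, f in distribution)              # total number of cases
sum_fx = sum(x * f for x, f in distribution)     # each score multiplied by its frequency
mean = sum_fx / n

print(n, sum_fx, round(mean, 2))   # 1339 3526 2.63
```

Averaging the eight X values directly (which gives 3.5) would ignore how many cases sit at each score; the weighting is what makes the result the mean of all 1339 cases.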
Weighted Mean
Often it is necessary to combine two sets of scores and then find the overall mean for the combined group
See formula:
$\bar{X} = \dfrac{\sum X_1 + \sum X_2 + \cdots}{n_1 + n_2 + \cdots}$
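A minimal Python sketch of the weighted-mean formula, assuming each group is summarized by its (mean, n) pair so that each group's $\sum X$ can be recovered as mean × n. The function name `combined_mean` and the example groups are hypothetical, not from the slides.

```python
def combined_mean(groups):
    """Overall mean of several groups given (mean, n) pairs: (ΣX1 + ΣX2 + ...) / (n1 + n2 + ...)."""
    total_sum = sum(group_mean * n for group_mean, n in groups)  # recover ΣX for each group
    total_n = sum(n for _, n in groups)
    return total_sum / total_n

# Hypothetical groups: 10 scores averaging 5, and 30 scores averaging 9
print(combined_mean([(5, 10), (9, 30)]))   # 8.0
```

Simply averaging the two group means would give 7, which understates the larger group; weighting by n gives 320 / 40 = 8.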
Changing a score…
Changing any score will change the mean
Introducing a new score or removing a score…
Will usually change the mean (the mean stays the same only if the score added or removed is equal to the mean)
Adding or subtracting a constant…
If you add (or subtract) a constant to (or from) each score, the mean will increase (or decrease) by that constant
Multiplying or dividing by a constant…
Same thing here: if you multiply or divide each score by a constant, the mean will be multiplied or divided by that same constant
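These properties are easy to check numerically. The minimal Python sketch below uses hypothetical scores and a hypothetical constant c = 3; `mean` is a small helper defined here so every result is a float.

```python
def mean(scores):
    """Arithmetic mean as a float."""
    return sum(scores) / len(scores)

scores = [2, 4, 6, 8]   # hypothetical scores, mean = 5.0
c = 3                   # hypothetical constant

base = mean(scores)                              # 5.0
shifted_up = mean([x + c for x in scores])       # 8.0  (increases by c)
shifted_down = mean([x - c for x in scores])     # 2.0  (decreases by c)
scaled_up = mean([x * c for x in scores])        # 15.0 (multiplied by c)
scaled_down = mean([x / 2 for x in scores])      # 2.5  (divided by 2)
added_mean = mean(scores + [5])                  # 5.0  (new score equals the mean: no change)
added_other = mean(scores + [10])                # 6.0  (any other new score changes the mean)

print(base, shifted_up, shifted_down, scaled_up, scaled_down, added_mean, added_other)
```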
Selecting a measure of central tendency…
If at all possible, you should attempt to find all three measures of central tendency. However, if this is not possible, there are situations in which the mean, median, and mode each have their specific "advantages"
Choosing the mean…
The mean is ordinarily the preferred measure of central tendency
The mean, presented along with the variance and the standard deviation, is the "best" measure of central tendency for continuous data
Choosing the median…
There are some situations in which the mean is not the "best" measure of central tendency. In certain situations, the median is the preferred measure. These situations are as follows:
when you know that a distribution is skewed
when you believe that a distribution might be skewed
when you have a small number of subjects
The purpose of reporting the median in these situations is to combat the effect of outliers
Outliers are extreme scores, so they pull the mean toward them while barely moving the median
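A minimal Python sketch of the outlier effect, using the standard library's `statistics.mean` and `statistics.median` on hypothetical scores: adding one extreme score moves the mean a long way while the median barely shifts.

```python
from statistics import mean, median

scores = [2, 3, 3, 4, 5]          # hypothetical scores
with_outlier = scores + [50]      # the same scores plus one extreme high value

print(mean(scores), median(scores))                        # 3.4 3
print(round(mean(with_outlier), 2), median(with_outlier))  # 11.17 3.5
```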
Choosing the mode…
The mode is rarely chosen as the preferred measure of central tendency
The mode is not usually used because the largest frequency of scores might not be at the center
The only situation in which the mode may be preferred over the other two measures of central tendency is when describing discrete categorical data
The mode is preferred in this situation because the greatest frequency of responses is important for describing categorical data
Symmetrical Distributions
A distribution is symmetrical if the frequencies at the right and left tails of the distribution are identical, so that if it is divided into two halves, each will be the mirror image of the other.
In a unimodal symmetrical distribution the mean, median, and mode are identical.
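As a quick check of the last statement, here is a minimal Python sketch on a hypothetical unimodal, symmetrical set of scores, using the standard library's `statistics` functions.

```python
from statistics import mean, median, mode

# Hypothetical unimodal, symmetrical scores: each half mirrors the other around 3
scores = [1, 2, 2, 3, 3, 3, 4, 4, 5]

print(mean(scores) == median(scores) == mode(scores))  # True
```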
Almost Symmetrical Distribution
Mean = 13.4
Median = 13.0
[Figure: histogram of HIGHEST YEAR OF SCHOOL COMPLETED (x-axis: years, 0.0 to 20.0; y-axis: Frequency); Std. Dev = 2.97, Mean = 13.4, N = 975]
Skewed Distribution
A skewed distribution has a few extreme values on one side of the distribution or the other.
Positively skewed distributions: distributions which have one or a few extremely high values (Mean > Median)
Negatively skewed distributions: distributions which have one or a few extremely low values (Mean < Median)
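The mean-versus-median rule of thumb above can be sketched in Python as follows. The function name `skew_direction` and the example scores are hypothetical. This is only the heuristic stated on the slide, not a formal skewness statistic, and it can mislead for some distributions (for example, multimodal or heavily discrete ones).

```python
from statistics import mean, median

def skew_direction(scores, tolerance=1e-9):
    """Classify skew with the rule of thumb: compare the mean with the median."""
    m, md = mean(scores), median(scores)
    if m - md > tolerance:
        return "positive"    # a few high values pull the mean above the median
    if md - m > tolerance:
        return "negative"    # a few low values pull the mean below the median
    return "symmetrical (mean = median)"

print(skew_direction([1, 1, 1, 2, 2, 3, 9]))   # positive
print(skew_direction([1, 7, 8, 8, 9, 9, 9]))   # negative
```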
Positively Skewed Distribution
[Figure: histogram of GOVT INVESTIGATE WORKERS ILLEGAL DRUG USE (x-axis: response codes 1.0 to 4.0; y-axis: Frequency); Std. Dev = .39, Mean = 1.1, N = 474]
Mean = 1.13
Median = 1.0
Example: Personal income is frequently positively skewed because relatively few people have very high incomes, which stretches the right tail; therefore, studies on earnings often report median income. The mean tends to overestimate both the earnings of the most typical earner (the mode) and the earnings represented by the 50th percentile (the median).
Negatively Skewed Distribution
[Figure: histogram of FAVOR PREFERENCE IN HIRING BLACKS (x-axis: response codes 1.0 to 4.0; y-axis: Frequency); Std. Dev = .98, Mean = 3.3, N = 908]
Mean = 3.3
Median = 4.0
Shape of a Distribution
Describes how data are distributed
Measures of shape: symmetric or skewed
Left-Skewed: Mean < Median
Symmetric: Mean = Median
Right-Skewed: Median < Mean
(Source: Basic Business Statistics, 10e, © 2006 Prentice-Hall, Inc., Chap 3-37)
Choosing a Measure of Central Tendency
IF variable is Nominal… Mode
IF variable is Ordinal… Mode or Median (or both)
IF variable is Interval-Ratio and distribution is Symmetrical… Mode, Median, or Mean
IF variable is Interval-Ratio and distribution is Skewed… Mode or Median
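The decision rules above can be encoded as a small lookup. This is a minimal Python sketch of the slide's rules only; the function name `suggested_measures` and the level labels ("nominal", "ordinal", "interval-ratio") are assumptions chosen for illustration.

```python
def suggested_measures(level, skewed=False):
    """Return the measures of central tendency the rules above list for a variable."""
    level = level.lower()
    if level == "nominal":
        return ["mode"]
    if level == "ordinal":
        return ["mode", "median"]
    if level == "interval-ratio":
        # A skewed distribution drops the mean, which outliers would distort
        return ["mode", "median"] if skewed else ["mode", "median", "mean"]
    raise ValueError(f"unknown level of measurement: {level!r}")

print(suggested_measures("ordinal"))                      # ['mode', 'median']
print(suggested_measures("interval-ratio", skewed=True))  # ['mode', 'median']
```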
Credits
http://www.uwm.edu/People/chava/centraltend.ppt
http://simon.cs.vt.edu/SoSci/converted/MMM/choosingct.html
http://www.cvgs.k12.va.us/DIGSTATS/main/descriptv/d_skewd.html