Statistics 1 Measures of central tendency and measures of spread.
Statistics-Measures of dispersions
Transcript of Statistics-Measures of dispersions
![Page 1: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/1.jpg)
MEASURES OF DISPERSIONS
STATISTICS
![Page 2: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/2.jpg)
MEASURES OF DISPERSIONS
• A quantity that measures the variability among the data, or how the data are dispersed about the average, is known as a measure of dispersion, scatter, or variation.
![Page 3: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/3.jpg)
2. Common Measures of Dispersion
• The main measures of dispersion are:
1. Range
2. Mean deviation or the average deviation
3. The variance & the standard deviation
![Page 4: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/4.jpg)
1. RANGE
• It is the difference between the largest and the smallest observation in a set of data.
• Range = xm – xo, where xm is the largest and xo the smallest observation.
• Its relative measure is known as the coefficient of dispersion.
• Coefficient of dispersion = $\frac{x_m - x_o}{x_m + x_o}$
• It is used in recording daily temperatures, stock prices and rates.
• It ignores all the information available in the middle of the data.
• It might give a misleading picture of the spread of the data.
![Page 5: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/5.jpg)
1. RANGE
• Example:
1. Find the range in the following data.
31, 26, 15, 43, 19, 10, 12, 37
Range = xm – xo = 43 – 10 = 33
2. Find the range in the following F.D. (Ungrouped)

| X | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|
| f | 5 | 8 | 12 | 10 | 4 | 2 |

Range = 8 – 3 = 5
3. Find the range in the following F.D. (Grouped)

| X | 10 – 20 | 20 – 30 | 30 – 40 | 40 – 50 | 50 – 60 |
|---|---|---|---|---|---|
| f | 5 | 8 | 12 | 10 | 4 |

Range = 60 – 10 = 50
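The range and its relative measure can be checked with a few lines of Python. This is a minimal sketch; the function names `data_range` and `coefficient_of_dispersion` are illustrative and do not come from the slides.

```python
def data_range(values):
    """Return the largest observation minus the smallest."""
    return max(values) - min(values)


def coefficient_of_dispersion(values):
    """Return (xm - xo) / (xm + xo), the relative measure of the range."""
    xm, xo = max(values), min(values)
    return (xm - xo) / (xm + xo)


data = [31, 26, 15, 43, 19, 10, 12, 37]   # data from the first example
print(data_range(data))                            # 33
print(round(coefficient_of_dispersion(data), 4))   # 33 / 53 = 0.6226
```

For a frequency distribution the frequencies play no role in the range: only the smallest and largest values (or class boundaries) matter.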
![Page 6: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/6.jpg)
MEAN (OR AVERAGE) DEVIATION
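• It is defined as the “arithmetic mean of the absolute deviations measured either from the mean or from the median.”
• $M.D. = \frac{\sum |x - \bar{x}|}{n}$ or $\frac{\sum |x - \text{median}|}{n}$ for ungrouped data.
• $M.D. = \frac{\sum f|x - \bar{x}|}{N}$ or $\frac{\sum f|x - \text{median}|}{N}$ for grouped data, where $N = \sum f$.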
![Page 7: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/7.jpg)
MEAN (OR AVERAGE) DEVIATION
• Example:
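1. Calculate mean deviation from the FD (Ungrouped Data).

| X | f | f.x | I x – 4.9 I | f I x – 4.9 I |
|---|---|---|---|---|
| 2 | 3 | 6 | 2.9 | 8.7 |
| 4 | 9 | 36 | 0.9 | 8.1 |
| 6 | 5 | 30 | 1.1 | 5.5 |
| 8 | 2 | 16 | 3.1 | 6.2 |
| 10 | 1 | 10 | 5.1 | 5.1 |
| Total | Σf = 20 | Σf.x = 98 | | Σ f I x – 4.9 I = 33.6 |

ẋ = 98 / 20 = 4.9
MD (x) = 33.6 / 20 = 1.68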
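The tabular calculation can be reproduced in code. The sketch below assumes the deviations are measured from the mean (as in this example); the function name `mean_deviation` is illustrative.

```python
def mean_deviation(xs, fs):
    """Mean deviation about the mean for values xs with frequencies fs."""
    n = sum(fs)                                          # N = sum of f
    mean = sum(f * x for x, f in zip(xs, fs)) / n        # sum of f.x / N
    return sum(f * abs(x - mean) for x, f in zip(xs, fs)) / n


xs = [2, 4, 6, 8, 10]
fs = [3, 9, 5, 2, 1]
print(round(mean_deviation(xs, fs), 2))   # 1.68
```

For grouped data the same function can be called with the class marks in place of `xs`.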
![Page 8: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/8.jpg)
MEAN (OR AVERAGE) DEVIATION
• Example: Calculate mean deviation from the FD (Grouped Data).

| X | f | Class Mark ( x ) | f.x | I x – 6.57 I | f I x – 6.57 I |
|---|---|---|---|---|---|
| 2 – 4 | 2 | 3 | 6 | 3.57 | 7.14 |
| 4 – 6 | 3 | 5 | 15 | 1.57 | 4.71 |
| 6 – 8 | 6 | 7 | 42 | 0.43 | 2.58 |
| 8 – 10 | 2 | 9 | 18 | 2.43 | 4.86 |
| 10 – 12 | 1 | 11 | 11 | 4.43 | 4.43 |
| Total | Σf = 14 | | Σ f.x = 92 | | Σ f I x – 6.57 I = 23.72 |

ẋ = 92 / 14 = 6.57
M.D = 23.72 / 14 = 1.69
![Page 9: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/9.jpg)
MEAN (OR AVERAGE) DEVIATION
• It is an absolute measure.
• Its relative measure is the coefficient of M.D.
• Coefficient of M.D. = $\frac{M.D.}{\text{mean}}$ or $\frac{M.D.}{\text{median}}$
• It is based on all the observed values.
![Page 10: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/10.jpg)
• EXAMPLES
MEAN (OR AVERAGE) DEVIATION
![Page 11: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/11.jpg)
THE VARIANCE AND STANDARD DEVIATION
• It is defined as “the mean of the squares of the deviations of all the observations from their mean.” Its square root is called the “standard deviation”.
• Usually it is denoted by $\sigma^2$ (for a population) or $S^2$ (for a sample).
• $\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$ for ungrouped data.
![Page 12: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/12.jpg)
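THE VARIANCE AND STANDARD DEVIATION
• $\sigma^2 = \frac{\sum f(x - \bar{x})^2}{N}$ for grouped data.
• It is an absolute measure.
• Its relative measure is the coefficient of variation.
• $C.V. = \frac{S.D.}{\bar{x}} \times 100$
• Shortcut method:
$\sigma^2 = \frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2$ for ungrouped data
$\sigma^2 = \frac{\sum f x^2}{N} - \left(\frac{\sum f x}{N}\right)^2$ for grouped data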
![Page 13: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/13.jpg)
• EXAMPLES
THE VARIANCE ANDSTANDARD DEVIATION
![Page 14: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/14.jpg)
VARIANCE AND STANDARD DEVIATION
• Example:
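1. Calculate Variance and SD from the FD (Ungrouped Data).

| X | f | f.x | X^2 | f.x^2 |
|---|---|---|---|---|
| 2 | 3 | 6 | 4 | 12 |
| 4 | 9 | 36 | 16 | 144 |
| 6 | 5 | 30 | 36 | 180 |
| 8 | 2 | 16 | 64 | 128 |
| 10 | 1 | 10 | 100 | 100 |
| Total | Σf = 20 | Σf.x = 98 | | Σ f.x^2 = 564 |

Using the shortcut method, $\sigma^2 = \frac{\sum f x^2}{N} - \left(\frac{\sum f x}{N}\right)^2$:
var = (564 / 20) - (98 / 20) ^ 2 = 28.2 – 24.01 = 4.19
Sd = √ σ^2 = √ 4.19 = 2.05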
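As a cross-check, the shortcut formula can be compared with the definition (mean of squared deviations) in code. This is a minimal sketch; both function names are illustrative, and the population form (division by N) is assumed, as on the slides.

```python
import math


def variance_shortcut(xs, fs):
    """Shortcut method: sum(f.x^2)/N - (sum(f.x)/N)^2."""
    n = sum(fs)
    mean_of_squares = sum(f * x * x for x, f in zip(xs, fs)) / n
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    return mean_of_squares - mean ** 2


def variance_definition(xs, fs):
    """Definition: sum(f.(x - mean)^2)/N."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / n


xs = [2, 4, 6, 8, 10]
fs = [3, 9, 5, 2, 1]
var = variance_shortcut(xs, fs)
print(round(var, 2))              # 4.19
print(round(math.sqrt(var), 2))   # 2.05
```

Both functions return the same value up to floating-point rounding, which is what makes the shortcut method safe to use by hand.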
![Page 15: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/15.jpg)
VARIANCE AND STANDARD DEVIATION
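• Example: Calculate Variance and Standard deviation from the FD (Grouped Data).

| X | f | Class Mark ( x ) | f.x | x^2 | f.x^2 |
|---|---|---|---|---|---|
| 2 – 4 | 2 | 3 | 6 | 9 | 18 |
| 4 – 6 | 3 | 5 | 15 | 25 | 75 |
| 6 – 8 | 6 | 7 | 42 | 49 | 294 |
| 8 – 10 | 2 | 9 | 18 | 81 | 162 |
| 10 – 12 | 1 | 11 | 11 | 121 | 121 |
| Total | Σf = 14 | | Σ f.x = 92 | | Σ f.x^2 = 670 |

Using the shortcut method, $\sigma^2 = \frac{\sum f x^2}{N} - \left(\frac{\sum f x}{N}\right)^2$:
var = (670 / 14) - (92 / 14) ^ 2 = 47.86 – 43.18 = 4.67
Sd = √ σ^2 = √ 4.67 = 2.16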
![Page 16: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/16.jpg)
Relative Measures of Dispersion
• Coefficient of Range
• Coefficient of Quartile Deviation
• Coefficient of Mean Deviation
• Coefficient of Variation (CV)
![Page 17: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/17.jpg)
Relative Measures of Variation
Coefficient of Range = $\frac{X_{\text{Largest}} - X_{\text{Smallest}}}{X_{\text{Largest}} + X_{\text{Smallest}}}$
Coefficient of Quartile Deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$
Coefficient of Mean Deviation = $\frac{MD}{\text{Mean}}$
![Page 18: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/18.jpg)
Coefficient of Variation (CV)
Can be used to compare two or more sets of data measured in different units, or in the same units but with different average sizes.
$CV = \frac{S}{\bar{X}} \times 100\%$
![Page 19: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/19.jpg)
Use of Coefficient of Variation
Stock A:
Average price last year = $50
Standard deviation = $5
CV_A = (S / X̄) × 100% = ($5 / $50) × 100% = 10%
Stock B:
Average price last year = $100
Standard deviation = $5
CV_B = (S / X̄) × 100% = ($5 / $100) × 100% = 5%
Both stocks have the same standard deviation, but stock B is less variable relative to its price.
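The stock comparison is a one-line calculation per stock. A minimal sketch, with the illustrative function name `coefficient_of_variation`:

```python
def coefficient_of_variation(mean, sd):
    """Return the coefficient of variation as a percentage."""
    return 100 * sd / mean


cv_a = coefficient_of_variation(mean=50, sd=5)    # Stock A
cv_b = coefficient_of_variation(mean=100, sd=5)   # Stock B
print(cv_a, cv_b)   # 10.0 5.0
```

The smaller percentage identifies the series that is less variable relative to its own average.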
![Page 20: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/20.jpg)
Five Number Summary
The five number summary of a data set consists of the minimum value, the first quartile, the second quartile, the third quartile and the maximum value, written in that order: Min, Q1, Q2, Q3, Max.
From the three quartiles we can obtain a measure of central tendency (the median, Q2) and measures of variation of the two middle quarters of the distribution: Q2-Q1 for the second quarter and Q3-Q2 for the third quarter.
![Page 21: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/21.jpg)
The weekly TV viewing times (in hours).
25 41 27 32 43 66 35 31 15 5 34 26 32 38 16 30 37 30 20 21
The array of the above data is given below:
5 15 16 20 21 25 26 27 30 30 31 32 32 34 35 37 38 41 43 66
Five Number Summary
![Page 22: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/22.jpg)
Five Number Summary
Minimum value = 5.0, Maximum value = 66.0
LOCATION of Q1: $\frac{1(20+1)}{4}$th obs. = 5.25th obs. in the data
VALUE of Q1: 5th obs. + 0.25{6th obs. - 5th obs.} = 21 + 0.25{25 - 21} = 22.0 Hrs
LOCATION of Q2: $\frac{2(20+1)}{4}$th obs. = 10.50th obs. in the data
VALUE of Q2: 10th obs. + 0.50{11th obs. - 10th obs.} = 30 + 0.50{31 - 30} = 30.5 Hrs
LOCATION of Q3: $\frac{3(20+1)}{4}$th obs. = 15.75th obs. in the data
VALUE of Q3: 15th obs. + 0.75{16th obs. - 15th obs.} = 35 + 0.75{37 - 35} = 36.5 Hrs
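The location-and-interpolation steps above can be sketched in Python. The function name `quartile` is illustrative, and the k(n+1)/4 location rule is the one used on this slide; statistical packages often use other conventions and may give slightly different quartiles.

```python
def quartile(sorted_values, k):
    """k-th quartile (k = 1, 2, 3) using the k(n+1)/4 location rule."""
    n = len(sorted_values)
    location = k * (n + 1) / 4        # 1-based position, e.g. 5.25
    whole = int(location)             # e.g. 5  -> the 5th observation
    fraction = location - whole       # e.g. 0.25
    if whole < 1:                     # position falls before the first value
        return sorted_values[0]
    if whole >= n:                    # position falls at or after the last value
        return sorted_values[-1]
    lower = sorted_values[whole - 1]
    upper = sorted_values[whole]
    return lower + fraction * (upper - lower)


# Sorted array of weekly TV viewing times from the slide
array = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
         31, 32, 32, 34, 35, 37, 38, 41, 43, 66]
summary = (min(array), quartile(array, 1), quartile(array, 2),
           quartile(array, 3), max(array))
print(summary)   # (5, 22.0, 30.5, 36.5, 66)
```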
![Page 23: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/23.jpg)
Box and Whisker Diagram
A box and whisker diagram, or box-plot, is a graphical means of displaying the five number summary of a set of data. In a box-plot the first quartile is placed at the lower hinge and the third quartile is placed at the upper hinge. The median is placed in between these two hinges. The two lines emanating from the box are called whiskers. The box and whisker diagram was introduced by Professor John W. Tukey.
![Page 24: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/24.jpg)
Construction of Box-Plot
1. Start the box from Q1 and end at Q3
2. Within the box draw a line to represent Q2
3. Draw the lower whisker from the Min. Value up to Q1
4. Draw the upper whisker from Q3 up to the Max. Value
[Figure: box-plot with Min Value, Q1, Q2, Q3 and Max Value labelled]
![Page 25: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/25.jpg)
Construction of Box-Plot
1. Q1 = 22.0, Q3 = 36.5
2. Q2 = 30.5
3. Minimum Value = 5.0
4. Maximum Value = 66.0
[Figure: box-plot of the TV viewing times on a vertical axis from 0 to 70]
![Page 26: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/26.jpg)
Interpretation of Box-Plot
[Figure: box-plot of the TV viewing times on a vertical axis from 0 to 70]
Box-Whisker Plot is useful to identify
• Maximum and Minimum Values in the data
• Median of the data
• IQR = Q3-Q1; a lengthy box indicates more variability in the data
• Shape of the data, from the position of the line within the box:
Line at the center of the box: Symmetrical
Line above the center of the box: Negatively Skewed
Line below the center of the box: Positively Skewed
• Detection of Outliers in the data
![Page 27: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/27.jpg)
Outliers
An outlier is a value that falls well outside the overall pattern of the data. It might be
• the result of a measurement or recording error,
• a member from a different population,
• simply an unusual extreme value.
An extreme value need not be an outlier; it might, instead, be an indication of skewness.
![Page 28: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/28.jpg)
Inner and Outer Fences
If Q1 = 22.0, Q2 = 30.5, Q3 = 36.5, then IQR = Q3 - Q1 = 14.5.
Inner Fences:
Lower Inner Fence = Q1 - 1.5 IQR = 22 - 1.5(36.5 - 22) = 0.25
Upper Inner Fence = Q3 + 1.5 IQR = 36.5 + 1.5(36.5 - 22) = 58.25
Outer Fences:
Lower Outer Fence = Q1 - 3 IQR = 22 - 3(36.5 - 22) = -21.5
Upper Outer Fence = Q3 + 3 IQR = 36.5 + 3(36.5 - 22) = 80.0
![Page 29: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/29.jpg)
Identification of the Outliers
1. The values that lie within the inner fences are normal values
2. The values that lie outside the inner fences but inside the outer fences are possible/suspected/mild outliers
3. The values that lie outside the outer fences are sure outliers
Plot each suspected outlier with an asterisk and each sure outlier with a hollow dot.
[Figure: box-plot of the TV viewing times on a vertical axis from 0 to 80, with 66 marked by an asterisk]
Only 66 is a mild outlier
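The fence formulas and the three identification rules can be combined in a short sketch. The function names `fences` and `classify` and the dictionary keys are illustrative, not part of the slides.

```python
def fences(q1, q3):
    """Inner fences at 1.5 IQR and outer fences at 3 IQR beyond the quartiles."""
    iqr = q3 - q1
    return {
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "lower_outer": q1 - 3 * iqr,
        "upper_outer": q3 + 3 * iqr,
    }


def classify(value, f):
    """Label a value as normal, a mild outlier or a sure outlier."""
    if value < f["lower_outer"] or value > f["upper_outer"]:
        return "sure outlier"
    if value < f["lower_inner"] or value > f["upper_inner"]:
        return "mild outlier"
    return "normal"


f = fences(q1=22.0, q3=36.5)
print(f)   # lower_inner 0.25, upper_inner 58.25, lower_outer -21.5, upper_outer 80.0

array = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
         31, 32, 32, 34, 35, 37, 38, 41, 43, 66]
flagged = [(x, classify(x, f)) for x in array if classify(x, f) != "normal"]
print(flagged)   # [(66, 'mild outlier')]
```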
![Page 30: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/30.jpg)
Uses of Box and Whisker Diagram
Box plots are especially suitable for comparing two or more data sets. In such a situation the box plots are constructed on the same scale.
[Figure: side-by-side box-plots for Male and Female]
![Page 31: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/31.jpg)
Standardized Variable
A variable that has mean “0” and variance “1” is called a standardized variable.
Values of a standardized variable are called standard scores.
Standard scores are unit-less.
Construction:
$Z = \frac{\text{Variable} - \text{Mean of Variable}}{\text{Standard Deviation of Variable}}$
![Page 32: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/32.jpg)
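Standardized Variable

| X | (X − X̄)² | Z | (Z − Z̄)² |
|---|---|---|---|
| 3 | 25 | -1.3624 | 1.8561 |
| 6 | 4 | -0.5450 | 0.2970 |
| 11 | 9 | 0.8174 | 0.6682 |
| 12 | 16 | 1.0899 | 1.1879 |
| Σ = 32 | 54 | 0 | 4.009 |

$\bar{X} = \frac{\sum X}{n} = \frac{32}{4} = 8$
$S_x^2 = \frac{\sum (X - \bar{X})^2}{n} = \frac{54}{4} = 13.5$, so $S_x = 3.67$
$Z = \frac{X - \bar{X}}{S_x}$
$\bar{Z} = \frac{\sum Z}{n} = \frac{0}{4} = 0$
$S_z^2 = \frac{\sum (Z - \bar{Z})^2}{n} = \frac{4.009}{4} \approx 1$
Variable Z has mean “0” and variance “1”, so Z is a standardized variable.
Standard Score at X = 11 is $Z = \frac{X - \bar{X}}{S_x} = \frac{11 - 8}{3.67} = 0.8174$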
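The construction can be verified numerically. This is a minimal sketch with the illustrative function name `standardize`; it divides by n (not n − 1), matching the slide.

```python
import math


def standardize(values):
    """Return the standard scores (X - mean) / S, with S computed using n."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return [(x - mean) / sd for x in values]


z = standardize([3, 6, 11, 12])
z_mean = sum(z) / len(z)
z_var = sum((v - z_mean) ** 2 for v in z) / len(z)
print([round(v, 4) for v in z])
print(round(z_mean, 10), round(z_var, 10))
```

The scores printed here differ from the table in the third decimal place because the code keeps the standard deviation at full precision (about 3.6742), whereas the slide rounds it to 3.67 before dividing.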
![Page 33: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/33.jpg)
Performance evaluation by z-scores
The industry in which sales rep Mr. Atif works has mean annual sales = $2,500 and standard deviation = $500.
The industry in which sales rep Mr. Asad works has mean annual sales = $4,800 and standard deviation = $600.
Last year Mr. Atif’s sales were $4,000 and Mr. Asad’s sales were $6,000.
Which of the representatives would you hire if you have one sales position to fill?
![Page 34: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/34.jpg)
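Performance evaluation by z-scores
Sales rep. Atif:
Industry mean X̄_B = $2,500
S_B = $500
Own sales X_B = $4,000
Z_B = (X_B - X̄_B) / S_B = (4,000 - 2,500) / 500 = 3
Sales rep. Asad:
Industry mean X̄_P = $4,800
S_P = $600
Own sales X_P = $6,000
Z_P = (X_P - X̄_P) / S_P = (6,000 - 4,800) / 600 = 2
Mr. Atif is the better choice: his sales are 3 standard deviations above his industry mean, against 2 for Mr. Asad.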
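The same comparison takes only a few lines of code. The helper name `z_score` is illustrative.

```python
def z_score(value, mean, sd):
    """Number of standard deviations by which value exceeds the mean."""
    return (value - mean) / sd


z_atif = z_score(4000, mean=2500, sd=500)
z_asad = z_score(6000, mean=4800, sd=600)
print(z_atif, z_asad)   # 3.0 2.0
```

Because z-scores are unit-less, they allow performances from industries with different sales levels to be placed on one scale.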
![Page 35: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/35.jpg)
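The Empirical Rule
For a bell-shaped distribution:
• $\bar{X} \pm 1S$ contains about 68% of values
• $\bar{X} \pm 2S$ contains about 95% of values
• $\bar{X} \pm 3S$ contains about 99.7% of values
[Figure: bell-shaped curve centred at X̄ with the 68%, 95% and 99.7% intervals marked]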
![Page 36: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/36.jpg)
Chebyshev’s Theorem
![Page 37: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/37.jpg)
Measures of Skewness
A distribution in which the values equidistant from the centre have equal frequencies is defined to be symmetrical, and any departure from symmetry is called skewness.
1. Length of Right Tail = Length of Left Tail
2. Mean = Median = Mode
3. Sk = 0
a) Sk = (Mean-Mode)/SD
b) Sk = (Q3-2Q2+Q1)/(Q3-Q1)
![Page 38: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/38.jpg)
Measures of Skewness
A distribution is positively skewed if the observations tend to concentrate more at the lower end of the possible values of the variable than at the upper end. A positively skewed frequency curve has a longer tail on the right-hand side.
1. Length of Right Tail > Length of Left Tail
2. Mean > Median > Mode
3. Sk > 0
a) Sk = (Mean-Mode)/SD
b) Sk = (Q3-2Q2+Q1)/(Q3-Q1)
![Page 39: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/39.jpg)
Measures of Skewness
A distribution is negatively skewed if the observations tend to concentrate more at the upper end of the possible values of the variable than at the lower end. A negatively skewed frequency curve has a longer tail on the left side.
1. Length of Right Tail < Length of Left Tail
2. Mean < Median < Mode
3. Sk < 0
a) Sk = (Mean-Mode)/SD
b) Sk = (Q3-2Q2+Q1)/(Q3-Q1)
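Both skewness formulas translate directly into code. In the sketch below the function names are illustrative; the quartiles are those of the TV viewing data from the five number summary, while the mean, mode and SD passed to the first formula are hypothetical values chosen only to show the call.

```python
def skewness_mean_mode(mean, mode, sd):
    """Formula (a): Sk = (Mean - Mode) / SD."""
    return (mean - mode) / sd


def skewness_quartiles(q1, q2, q3):
    """Formula (b): Sk = (Q3 - 2*Q2 + Q1) / (Q3 - Q1)."""
    return (q3 - 2 * q2 + q1) / (q3 - q1)


sk_b = skewness_quartiles(q1=22.0, q2=30.5, q3=36.5)
print(round(sk_b, 4))   # -0.1724, i.e. slightly negatively skewed

sk_a = skewness_mean_mode(mean=50, mode=44, sd=12)   # hypothetical inputs
print(sk_a)             # 0.5, i.e. positively skewed
```

The quartile-based result agrees with the box-plot reading: the median (30.5) lies closer to Q3 than to Q1.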
![Page 40: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/40.jpg)
Measures of Kurtosis
Kurtosis is the degree of peakedness or flatness of a unimodal (single-humped) distribution.
• When the values of a variable are highly concentrated around the mode, the peak of the curve becomes relatively high; the curve is Leptokurtic.
• When the values of a variable have low concentration around the mode, the peak of the curve becomes relatively flat; the curve is Platykurtic.
• A curve that is neither very peaked nor very flat-topped is taken as the basis for comparison and is called Mesokurtic/Normal.
![Page 41: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/41.jpg)
Measures of Kurtosis
![Page 42: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/42.jpg)
Measures of Kurtosis
Coefficient of Kurtosis = $\frac{n \sum (X - \bar{X})^4}{\left[\sum (X - \bar{X})^2\right]^2}$
1. If Coefficient of Kurtosis > 3, the distribution is Leptokurtic.
2. If Coefficient of Kurtosis = 3, the distribution is Mesokurtic.
3. If Coefficient of Kurtosis < 3, the distribution is Platykurtic.
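A minimal sketch of the coefficient and the three-way rule follows. The function names and the small sample are hypothetical, used only to illustrate the formula n Σ(X − X̄)⁴ / [Σ(X − X̄)²]².

```python
def kurtosis_coefficient(values):
    """n * sum((X - mean)^4) / (sum((X - mean)^2))^2."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    sum_fourth = sum((x - mean) ** 4 for x in values)
    return n * sum_fourth / sum_sq ** 2


def kurtosis_label(coefficient):
    """Apply the > 3, = 3, < 3 rule."""
    if coefficient > 3:
        return "Leptokurtic"
    if coefficient < 3:
        return "Platykurtic"
    return "Mesokurtic"


sample = [3, 6, 11, 12]            # hypothetical sample
k = kurtosis_coefficient(sample)   # 4 * 978 / 54**2
print(round(k, 4), kurtosis_label(k))   # 1.3416 Platykurtic
```

With real data the coefficient is rarely exactly 3, so in practice values close to 3 are usually read as approximately mesokurtic.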
![Page 43: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/43.jpg)
Moments about origin

| C.I | f | x | f.x | f.x^2 | f.x^3 | f.x^4 |
|---|---|---|---|---|---|---|
| 2.5 – 2.9 | 2 | 2.7 | 5.4 | 14.58 | 39.366 | 106.2882 |
| 3 – 3.4 | 7 | 3.2 | 22.4 | 71.68 | 229.376 | 734.0032 |
| 3.5 – 3.9 | 17 | 3.7 | 62.9 | 232.73 | 861.101 | 3186.074 |
| 4 – 4.4 | 25 | 4.2 | 105 | 441 | 1852.2 | 7779.24 |
| 4.5 – 4.9 | 20 | 4.7 | 94 | 441.8 | 2076.46 | 9759.362 |
| 5 – 5.4 | 12 | 5.2 | 62.4 | 324.48 | 1687.296 | 8773.939 |
| 5.5 – 5.9 | 9 | 5.7 | 51.3 | 292.41 | 1666.737 | 9500.401 |
| 6 – 6.4 | 8 | 6.2 | 49.6 | 307.52 | 1906.624 | 11821.07 |
| Total (Σ) | 100 | | 453 | 2126.2 | 10319.16 | 51660.38 |
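The column totals in the table lead directly to the moments about the origin. The sketch below assumes the standard definition, r-th moment = Σ f.x^r / Σ f, which the transcript does not spell out; the function name `raw_moment` is illustrative.

```python
def raw_moment(xs, fs, r):
    """r-th moment about the origin for class marks xs with frequencies fs."""
    return sum(f * x ** r for x, f in zip(xs, fs)) / sum(fs)


xs = [2.7, 3.2, 3.7, 4.2, 4.7, 5.2, 5.7, 6.2]   # class marks
fs = [2, 7, 17, 25, 20, 12, 9, 8]               # frequencies

moments = [raw_moment(xs, fs, r) for r in (1, 2, 3, 4)]
print([round(m, 4) for m in moments])
# approximately 4.53, 21.262, 103.1916, 516.6038
```

The first moment about the origin is the mean of the distribution (453 / 100 = 4.53).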
![Page 44: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/44.jpg)
Moments about origin
![Page 45: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/45.jpg)
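Regression

| X (Price) | Y (Quantity Demanded) | x*y | x^2 |
|---|---|---|---|
| 15 | 440 | 6600 | 225 |
| 20 | 430 | 8600 | 400 |
| 25 | 450 | 11250 | 625 |
| 30 | 370 | 11100 | 900 |
| 40 | 340 | 13600 | 1600 |
| 50 | 370 | 18500 | 2500 |
| Σx = 180 | Σy = 2400 | Σx.y = 69650 | Σx^2 = 6250 |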
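The transcript stops at the column totals. Assuming they are meant for the ordinary least-squares line of Y on X, Y = a + bX (an assumption, since the slide does not show the fitted line), the coefficients could be computed as follows. The function name `least_squares_line` is illustrative.

```python
def least_squares_line(xs, ys):
    """Return (a, b) of the least-squares line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)   # slope
    a = sum_y / n - b * sum_x / n                                  # intercept
    return a, b


price = [15, 20, 25, 30, 40, 50]
demand = [440, 430, 450, 370, 340, 370]
a, b = least_squares_line(price, demand)
print(round(a, 2), round(b, 4))   # 482.94 -2.7647
```

Under that assumption the negative slope says that quantity demanded falls by roughly 2.76 units for each unit increase in price.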
![Page 46: Statistics-Measures of dispersions](https://reader034.fdocuments.us/reader034/viewer/2022042509/54827ebdb47959e20c8b47d0/html5/thumbnails/46.jpg)