Measures of dispersion

59
AGBS | Bangalore Measures of Dispersion

Transcript of Measures of dispersion

Page 1: Measures of dispersion

AGBS | Bangalore

Measures of Dispersion

Page 2: Measures of dispersion

Summary Measures

Arithmetic Mean

Median

Mode

Describing Data Numerically

Variance

Standard Deviation

Coefficient of Variation

Range

Interquartile Range

Geometric Mean

Skewness

Central Tendency Variation ShapeQuartiles

Page 3: Measures of dispersion

Quartiles

Quartiles split the ranked data into 4 segments with an equal number of values per segment

25% 25% 25% 25%

The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger

Q2 is the same as the median (50% are smaller, 50% are larger)

Only 25% of the observations are greater than the third quartile

Q1 Q2 Q3

Page 4: Measures of dispersion

Quartile Formulas

Find a quartile by determining the value in the appropriate position in the ranked data, where

First quartile position: Q1 = n/4

Second quartile position: Q2 = n/2 (the median position)

Third quartile position: Q3 = 3n/4

where n is the number of observed values

Page 5: Measures of dispersion

(n = 9)

Q1 is in the 9/4 = 2.25 position of the ranked data

so Q1 = 12.25

Quartiles

Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22

Example: Find the first quartile

Q1 and Q3 are measures of noncentral location Q2 = median, a measure of central tendency

Page 6: Measures of dispersion

Quartiles

Example:(continued)

Overtime Hours

10-15 15-20 20-25 25-30 30-35 35-40 Total

No. of employees

11 20 35 20 8 6 100

Calculate Median, First Quartile, 7th Decile & Interquartile Range

Page 7: Measures of dispersion

Same center, different variation

Measures of Variation

Variation

Variance Standard Deviation

Coefficient of Variation

Range Interquartile Range

Measures of variation give information on the spread or variability of the data values.

Page 8: Measures of dispersion

Range

Simplest measure of variation Difference between the largest and the smallest

values in a set of data:

Range = Xlargest – Xsmallest

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

Range = 14 - 1 = 13

Example:

Page 9: Measures of dispersion

Ignores the way in which data are distributed

Sensitive to outliers

7 8 9 10 11 12

Range = 12 - 7 = 5

7 8 9 10 11 12

Range = 12 - 7 = 5

Disadvantages of the Range

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120

Range = 5 - 1 = 4

Range = 120 - 1 = 119

Page 10: Measures of dispersion

Coefficient of Range

Coefficient of Range = [L-S]/[L+S]

Where,

L = Largest value in the data set

S = Smallest value in the data set

Page 11: Measures of dispersion

Interquartile Range

Can eliminate some outlier problems by using the interquartile range

Eliminate some high- and low-valued observations and calculate the range from the remaining values

Interquartile range = 3rd quartile – 1st quartile = Q3 – Q1

Page 12: Measures of dispersion

Interquartile Range

Median(Q2)

XmaximumX

minimum Q1 Q3

Example:

25% 25% 25% 25%

12 30 45 57 70

Interquartile range = 57 – 30 = 27

Page 13: Measures of dispersion

Coefficient of QD

Coefficient of QD= [Q3-Q1]/[Q3+Q1]

Page 14: Measures of dispersion

Average Deviation

Coefficient of AD = (Average Deviation)/(Mean or Median)

Page 15: Measures of dispersion

Calculate the Coefficient of AD (mean)

Sales (in thousand Rs) No. of days

10-20 3

20-30 6

30-40 11

40-50 3

50-60 2

Page 16: Measures of dispersion

Average (approximately) of squared deviations of values from the mean

Sample variance:

Variance

1-n

)X(XS

n

1i

2i

2∑

=

−=

Where = mean

n = sample size

Xi = ith value of the variable X

X

Page 17: Measures of dispersion

Standard Deviation

Most commonly used measure of variation Shows variation about the mean Is the square root of the variance Has the same units as the original data Sample standard deviation:

1-n

)X(XS

n

1i

2i∑

=

−=

Page 18: Measures of dispersion

Calculation Example:Sample Standard Deviation

Sample Data (Xi) : 10 12 14 15 17 18 18 24

n = 8 Mean = X = 16

4.3087

130

7

16)(2416)(1416)(1216)(10

1-n

)X(24)X(14)X(12)X(10S

2222

2222

==

−++−+−+−=

−++−+−+−=

A measure of the “average” scatter around the mean

Page 19: Measures of dispersion

Population vs. Sample Standard Deviation

Page 20: Measures of dispersion

Population vs. Sample Standard Deviation (Grouped Data)

Page 21: Measures of dispersion

Calculate the Standard Deviation for the following sample

Number of pins knocked down in ten-pin bowling matches

Page 22: Measures of dispersion

Solution

Number of pins knocked down in ten-pin bowling matches

Page 23: Measures of dispersion

Calculate the Standard Deviation for the following sample

Heights of students

Page 24: Measures of dispersion

Solution

Page 25: Measures of dispersion

Calculate the Standard Deviation for the following sample

Number of typing errors

Page 26: Measures of dispersion

Comparing Standard Deviations

Mean = 15.5 S = 3.338 11 12 13 14 15 16 17 18 19 20 21

11 12 13 14 15 16 17 18 19 20 21

Data B

Data A

Mean = 15.5 S = 0.926

11 12 13 14 15 16 17 18 19 20 21

Mean = 15.5 S = 4.567

Data C

Page 27: Measures of dispersion

Measuring variation

Small standard deviation

Large standard deviation

Page 28: Measures of dispersion

Advantages of Variance and Standard Deviation

Each value in the data set is used in the calculation

Values far from the mean are given extra weight (because deviations from the mean are squared)

Page 29: Measures of dispersion

Coefficient of Variation

Measures relative variation Always in percentage (%) Shows variation relative to mean Can be used to compare two or more sets of

data measured in different units

100%X

SCV ⋅

=

Page 30: Measures of dispersion

Comparing Coefficient of Variation

Stock A: Average price last year = $50 Standard deviation = $5

Stock B: Average price last year = $100 Standard deviation = $5

Both stocks have the same

standard deviation, but stock B is less

variable relative to its price

10%100%$50

$5100%

X

SCVA =⋅=⋅

=

5%100%$100

$5100%

X

SCVB =⋅=⋅

=

Page 31: Measures of dispersion

Sample vs. Population CV

Page 32: Measures of dispersion

Calculate!

Calculate the coefficient of variation for the following data set. The price, in cents, of a stock over five trading days

was 52, 58, 55, 57, 59.

Page 33: Measures of dispersion

Solution

Page 34: Measures of dispersion

Pooled Standard Deviation

Page 35: Measures of dispersion

Pooled Standard Deviation

Page 36: Measures of dispersion

Z Scores

A measure of distance from the mean (for example, a Z-score of 2.0 means that a value is 2.0 standard deviations from the mean)

The difference between a value and the mean, divided by the standard deviation

A Z score above 3.0 or below -3.0 is considered an outlier

S

XXZ

−=

Page 37: Measures of dispersion

Z Scores

Example: If the mean is 14.0 and the standard deviation is 3.0,

what is the Z score for the value 18.5?

The value 18.5 is 1.5 standard deviations above the mean

(A negative Z-score would mean that a value is less than the mean)

1.53.0

14.018.5

S

XXZ =−=−=

(continued)

Page 38: Measures of dispersion

Shape of a Distribution

Describes how data are distributed Measures of shape

Symmetric or skewed

Mean = Median Mean < Median Median < Mean

Right-SkewedLeft-Skewed Symmetric

Page 39: Measures of dispersion

If the data distribution is approximately bell-shaped, then the interval:

contains about 68% of the values in the population or the sample

The Empirical Rule

1σμ ±

μ

68%

1σμ ±

Page 40: Measures of dispersion

contains about 95% of the values in the population or the sample

contains about 99.7% of the values in the population or the sample

The Empirical Rule

2σμ ±

3σμ ±

3σμ ±

99.7%95%

2σμ ±

Page 41: Measures of dispersion
Page 42: Measures of dispersion

Regardless of how the data are distributed, at least (1 - 1/k2) x 100% of the values will fall within k standard deviations of the mean (for k > 1)

Examples:

(1 - 1/12) x 100% = 0% ……..... k=1 (μ ± 1σ)

(1 - 1/22) x 100% = 75% …........ k=2 (μ ± 2σ)

(1 - 1/32) x 100% = 89% ………. k=3 (μ ± 3σ)

Chebyshev Rule

withinAt least

Page 43: Measures of dispersion

Calculate!

Calculate the mean & the standard dev. Calculate the percentage of

observations between the mean & 2σ±

Variable Frequency

44-46 3

46-48 24

48-50 27

50-52 21

52-54 5

Page 44: Measures of dispersion

Find the correct Mean & Std. dev.

The Mean & Standard Deviation of a set of 100 observations were worked out as 40 & 5 respectively. It was later learnt that a mistake had been made in data entry – in place of 40 (the correct value), 50 was entered.

Page 45: Measures of dispersion

Skewness

A fundamental task in many statistical analyses is to characterize the location and variability of a data set (Measures of central tendency vs. measures of dispersion)

Both measures tell us nothing about the shape of the distribution

It is possible to have frequency distributions which differ widely in their nature and composition and yet may have same central tendency and dispersion.

Therefore, a further characterization of the data includes skewness

Page 46: Measures of dispersion

Positive & Negative Skew

Positive skewness There are more observations below the mean than

above it When the mean is greater than the median

Negative skewness There are a small number of low observations and a

large number of high ones When the median is greater than the mean

Page 47: Measures of dispersion

Measures of Skew

Skew is a measure of symmetry in the distribution of scores

Positive Skew

Negative Skew

Normal (skew = 0)

Page 48: Measures of dispersion

Measures of Skew

Page 49: Measures of dispersion

The Rules

I. Rule One. If the mean is less than the median, the data are skewed to the left.

II. Rule Two. If the mean is greater than the median, the data are skewed to the right.

Page 50: Measures of dispersion

Karl Pearson Coefficient of Skewness

Page 51: Measures of dispersion

Karl Pearson Coefficient - Properties

Advantage – Uses the data completelyDisadvantage – Is sensitive to extreme values

Page 52: Measures of dispersion

Calculate!

Calculate the coefficient of skewness & comment on its value

Page 53: Measures of dispersion

Calculate!

Calculate the coefficient of skewness & comment on its value

Profits (lakhs) No. of Companies

100-120 17

120-140 53

140-160 199

160-180 194

180-200 327

200-220 208

220-240 2

Page 54: Measures of dispersion

Bowley’s Coefficient of Skewness

Page 55: Measures of dispersion

Bowley’s Coefficient - Properties

Advantage – Is not sensitive to extreme valuesDisadvantage – Does not use the data completely

Page 56: Measures of dispersion

Calculate!

Page 57: Measures of dispersion

Kelly’s Coefficient of Skewness

Page 58: Measures of dispersion

Calculate!

Obtain the limits of daily wages of the central 50% of the workers

Calculate Bowley’s & Kelly’s Coefficients of Skewness

Wages No. of Workers

Below 200 10

200-250 25

250-300 145

300-350 220

350-400 70

Above 400 30

Page 59: Measures of dispersion

Moments about the Mean