MSV 33: Measures of Spread .

Transcript of MSV 33: Measures of Spread .

Page 1: MSV 33: Measures of Spread .

MSV 33: Measures of Spread

www.making-statistics-vital.co.uk

Page 2: MSV 33: Measures of Spread .

The Bee Academy

Page 3: MSV 33: Measures of Spread .

‘And our topic today, my fellow bees, is spread!’

‘Mmm...’

Professor Zzub

Page 4: MSV 33: Measures of Spread .

‘No, no, no, Millie! I mean, how can we measure

how spread out a data set is!’

‘The data sets 1, 3, 5, 7, 9 and 3, 4, 5, 6, 7 have the same mean, but the first set

is clearly more spread out than the second.’

‘I’m lost. Example please...’

Page 5: MSV 33: Measures of Spread .

‘So you are asking how we could measure that – how about the

top number take away the bottom for each set? If the

spread is big, that’ll be big!’

‘Nice idea, Ding – and this measure is used! It’s called the RANGE. So the range for our first set

is 9 - 1 = 8, while the range for our second set is 7 - 3 = 4.’

1, 3, 5, 7, 9 and 3, 4, 5, 6, 7
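The arithmetic above can be checked with a minimal Python sketch. The helper name `data_range` is an illustrative choice and does not come from the slides.

```python
def data_range(values):
    """Range = top value minus bottom value."""
    return max(values) - min(values)


first_set = [1, 3, 5, 7, 9]
second_set = [3, 4, 5, 6, 7]

# Both sets have the same mean...
print(sum(first_set) / len(first_set))    # 5.0
print(sum(second_set) / len(second_set))  # 5.0

# ...but different ranges.
print(data_range(first_set))   # 8
print(data_range(second_set))  # 4
```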

Page 6: MSV 33: Measures of Spread .

‘Let me guess – there’s more to it than that.’

‘Sadly, Brenda, the range is badly affected by extreme values or

outliers. It can give a rather misleading picture of the data.’

1, 3, 5, 7, 9, 11, 13: Range = 12

3, 4, 5, 6, 7, 8, 20: Range = 17
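A short Python sketch shows how much a single extreme value moves the range. The list `without_outlier` is an assumption added for comparison; it is the second set with the 20 removed and does not appear on the slide.

```python
evenly_spread = [1, 3, 5, 7, 9, 11, 13]
with_outlier = [3, 4, 5, 6, 7, 8, 20]
without_outlier = [3, 4, 5, 6, 7, 8]  # hypothetical: same set, 20 dropped

range_even = max(evenly_spread) - min(evenly_spread)
range_with = max(with_outlier) - min(with_outlier)
range_without = max(without_outlier) - min(without_outlier)

print(range_even)     # 12
print(range_with)     # 17
print(range_without)  # 5
```

One value out of seven more than triples the range of the second set, which is why the range alone can mislead.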

Page 7: MSV 33: Measures of Spread .

‘Okay, then, don’t take all the data; chuck away the lowest quarter, and the highest

quarter, and THEN take the range. Just taking the middle 50%, you’ve got rid

of all those extreme values.’

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11

‘Great idea, Paul. So, for example, with this small data set, we can add the quartiles, Q1 = 3, Q2 = 6 (the median) and Q3 = 9...’

‘... And the Interquartile Range is Q3 – Q1 = 9 – 3 = 6, the range of the middle 50% of the data.’
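The interquartile range for this data set can be reproduced with Python's standard library. This is a sketch under one assumption: quartile conventions differ between textbooks and software, and the default `method="exclusive"` of `statistics.quantiles` happens to match the value of 6 quoted here for these eleven numbers.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1
print(q1, q2, q3)  # 3.0 6.0 9.0
print(iqr)         # 6.0
```

Other quartile methods can give slightly different answers on small data sets, so it is worth stating which convention is in use.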

Page 8: MSV 33: Measures of Spread .

‘I’ve got another idea!’

‘Go back to this data set again. We could find the mean, then find the difference of each of these numbers from the mean,

and then add the differences together. If the numbers are spread out, then this will be big!’

‘What’s that, Millie?’

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11

Page 9: MSV 33: Measures of Spread .

‘That is nearly a great idea, Millie, but watch what happens...’

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11

Differences from the mean (6): -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5

‘So the differences add to 0. Always.’

‘But that is easily fixed...’

Page 10: MSV 33: Measures of Spread .

‘Find the POSITIVE difference of each of these numbers from the mean,

and then add these differences together. It won’t be 0 now!’

‘Indeed, Virender, the sum now is 30.

But is that a fair measure of spread?’
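Both sums can be checked in a few lines of Python. This is only an illustrative sketch using the data set 1 to 11 from the slides; the variable names are arbitrary.

```python
data = list(range(1, 12))          # 1, 2, ..., 11
mean = sum(data) / len(data)       # 6.0

signed = [x - mean for x in data]  # differences keep their sign
positive = [abs(d) for d in signed]

print(sum(signed))    # 0.0  (negatives cancel positives)
print(sum(positive))  # 30.0
```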

Page 11: MSV 33: Measures of Spread .

‘Surely you have to divide by the total number of numbers you have -

to take an average!’

‘Excellent, Ding! And this takes us to what is called

‘the mean deviation from the mean’. If we write it in symbols, we have $\frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$.’
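As a hedged illustration, the mean deviation from the mean can be written as a small Python function. The name `mean_deviation` is a label chosen for this sketch, not one used in the slides.

```python
def mean_deviation(values):
    """Average of the positive differences from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


data = list(range(1, 12))  # 1, 2, ..., 11
md = mean_deviation(data)

print(md)  # 30 / 11, roughly 2.73
```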

Page 12: MSV 33: Measures of Spread .

‘There is still a problem, however: the modulus function is not always easy to handle mathematically. It is true that |ab| = |a||b|, but it is not generally true

that |a + b| = |a|+|b|.’
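One pair of numbers with opposite signs is enough to see the difference between the two rules. The values 3 and -5 are an arbitrary choice for this sketch.

```python
a, b = 3, -5

# The product rule holds.
print(abs(a * b), abs(a) * abs(b))  # 15 15

# The sum rule fails when the signs differ.
print(abs(a + b), abs(a) + abs(b))  # 2 8
```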

‘Well, there are other ways to make the differences from the mean all positive.

You could square the differences, for example!’

‘Great idea, Millie. So we can find the square of the difference of each of these numbers from the mean,

and then add these together. Then divide by the total number of numbers we have.’

Page 13: MSV 33: Measures of Spread .

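‘This is called the Mean Square Deviation (MSD), or ‘the population variance’: $\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$. If we multiply out, we get an alternative formulation, $\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2$,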

that is usually easier to calculate, especially if the mean is not a whole number.’
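The two formulations can be compared numerically. The sketch below uses the data set 1 to 11 from the slides; the function names `msd_definition` and `msd_shortcut` are hypothetical labels for the two forms.

```python
import math


def msd_definition(values):
    """Mean of the squared differences from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def msd_shortcut(values):
    """Mean of the squares minus the square of the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(x * x for x in values) / n - mean ** 2


data = list(range(1, 12))  # 1, 2, ..., 11

print(msd_definition(data))  # 10.0
print(msd_shortcut(data))    # 10.0
print(math.isclose(msd_definition(data), msd_shortcut(data)))  # True
```

With floating-point data the shortcut form can lose accuracy when the mean is large compared with the spread, so the comparison uses `math.isclose` rather than `==`.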

Page 14: MSV 33: Measures of Spread .

‘As before.’

‘So have we got it now? Is this the measure of spread

we generally use?’

‘We are very nearly there, Brenda. There is, sadly, a problem with the MSD. Most of the time we are taking a SAMPLE from a population. We would like the expectation of our variance statistic to be the variance of the population. But in order for that to happen...’

Page 15: MSV 33: Measures of Spread .

‘We have to take our MSD statistic...’

‘...and divide the sum of squared differences by n - 1 rather than n.’

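‘This statistic, $\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$, is called the ‘sample variance’ or simply the ‘variance’. The expected value of this is the population variance.

As with the population variance statistic, there is an alternative form, $\frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)$...’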

‘Which is often easier to use.’
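The claim about expected values can be checked exactly on a tiny example. This is a sketch under stated assumptions: the population is taken to be 1, 3, 5, 7, 9, samples of size 2 are drawn with replacement, and every possible sample is listed so that the averages are exact. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def msd(values):
    """Sum of squared differences from the mean, divided by n."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((x - mean) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared differences from the mean, divided by n - 1."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((x - mean) ** 2 for x in values) / (n - 1)


population = [1, 3, 5, 7, 9]
population_variance = msd(population)

# Every ordered sample of size 2, drawn with replacement (25 in all).
samples = list(product(population, repeat=2))

average_msd = sum(msd(s) for s in samples) / len(samples)
average_variance = sum(sample_variance(s) for s in samples) / len(samples)

print(population_variance)  # 8
print(average_msd)          # 4  (too small on average)
print(average_variance)     # 8  (matches the population variance)
```

Dividing by n gives a statistic that underestimates the population variance on average, while dividing by n - 1 hits it exactly in this setting.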

Page 16: MSV 33: Measures of Spread .

‘So is that all the measures of

spread we need to know?’

‘I should add, Virender, that we do use the square root of the MSD (called RMSD) and

the square root of the variance (called the Standard Deviation)

as measures of spread too. The advantage of the RMSD and the SD is that they are

measured in the same units as the random variable we are interested in.’
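Python's standard `statistics` module provides both square-root measures: `pstdev` divides by n (the RMSD) and `stdev` divides by n - 1 (the Standard Deviation). The sketch below applies them to the data set 1 to 11 used earlier.

```python
import math
import statistics

data = list(range(1, 12))  # 1, 2, ..., 11

rmsd = statistics.pstdev(data)  # square root of the MSD, which is 10
sd = statistics.stdev(data)     # square root of the variance, which is 11

print(round(rmsd, 3))  # 3.162
print(round(sd, 3))    # 3.317
```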

‘So to summarise...’

Page 17: MSV 33: Measures of Spread .

Range = top value - bottom value.

Interquartile range (IQR) = Q3 - Q1, where the quartiles Q1, Q2

and Q3 divide the data set into four groups of equal size.

Page 18: MSV 33: Measures of Spread .

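Mean Square Deviation (population variance) = $\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$.

Root Mean Square Deviation (RMSD) = $\sqrt{\text{MSD}}$.

Variance (or sample variance) = $\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$.

Standard Deviation = $\sqrt{\text{variance}}$.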

Page 19: MSV 33: Measures of Spread .

www.making-statistics-vital.co.uk

is written by Jonny Griffiths

With thanks to pixabay.com
