Statistics lecture 4 (ch3)

69
1

description

Measures of Location & Dispersion

Transcript of Statistics lecture 4 (ch3)

Page 1: Statistics lecture 4 (ch3)

1

Page 2: Statistics lecture 4 (ch3)

NEXT LECTURE

• Please bring scientific calculator and calculator manual

2

Page 3: Statistics lecture 4 (ch3)

NUBE Test• You will not be asked to draw graphs – simply interpret

them• The following topics will be included in test 1:-

– Discounts, percentages and commissions– Multiple choice questions– Graphs– Mean, median, mode and standard deviations– Dispersion– Box and whisker plot– Probabilities– Probability distributions– Sampling distributions

3

Page 4: Statistics lecture 4 (ch3)

NUBE Test• REMEMBER - YOU WILL NOT BE GIVEN ALL OF

THE FORMULAE IN THE TEST, YOU MUST REMEMBER THE ONES THAT ARE NOT IN THE PRINT OUT GIVEN TO YOU:-

• Attached please find the NUBE6112 FORMULAE AND TABLES which students will need for all tests and exams. Please can you print these back to back and have them laminated because these are to be used year on year. It has also been confirmed by the IIE that any formulas that aren’t in the sheets, students are expected to know from their lecturers, so could you please pass this info on to your lecturers. The IIE were only given permission to print what is on the sheet, which is also at the back of the textbook. 4

Page 5: Statistics lecture 4 (ch3)

5

• Properties to describe numerical data:– Central tendency– Dispersion– Shape

• Measures calculated for:– Sample data

• Statistics

– Entire population• Parameters

Page 6: Statistics lecture 4 (ch3)

6

Measures of location• Arithmetic mean• Median• Mode

Page 7: Statistics lecture 4 (ch3)

7

UNGROUPED or raw data refers to data as they were collected, that is, before they are summarised or organised in any way or form

GROUPED data refers to data summarised in a frequency table

Page 8: Statistics lecture 4 (ch3)

8

sum of sample observationsnumber of sample observationsSample mean =

1

n

ii

xx

n

ARITHMETIC MEAN - This is the most commonly used measure

and is also called the mean.

Sample size

Page 9: Statistics lecture 4 (ch3)

9

sum of observationsnumber of observationsPopulation mean =

1

N

ii

x

N

ARITHMETIC MEAN - This is the most commonly used measure

and is also called the mean.

Population size

Xi = observations of the population

∑ = “the sum of”Mean

Page 10: Statistics lecture 4 (ch3)

10

• MEDIAN – Half the values in data set is smaller than median. – Half the values in data set is larger than median.– Order the data from small to large.

• Position of median – If n is odd:

• The median is the (n+1)/2 th observation.

– If n is even: • Calculate (n+1)/2• The median is the average of the values before and

after (n+1)/2.

Page 11: Statistics lecture 4 (ch3)

11

• MODE – Is the observation in the data set that occurs the

most frequently.

– Order the data from small to large.– If no observation repeats there is no mode.– If one observation occurs more frequently:

• Unimodal

– If two or more observation occur the same number of times:

• Multimodal

– Used for nominal scaled variables.

Page 12: Statistics lecture 4 (ch3)

12

9

1

1 2 3 4 5 6 7 8 9

262,89

9

ii

xx

nx x x x x x x x x

n

The mean of the sample of nine measurements is given by:

22 55 88 55 22 66

Example – Given the following data set:

2 5 8 −3 5 2 6 5 −4

−3−3 55 −4−4

99

Page 13: Statistics lecture 4 (ch3)

13

−4 −3 2 2 5 5 5 6 8

(n+1)/2 = (9+1)/2 = 5th measurement

1 2 3 4 5 6 7 8 9

Median = 5

Odd number

The median of the sample of nine measurements is given by:

Example – Given the following data set:

2 5 8 −3 5 2 6 5 −4

Page 14: Statistics lecture 4 (ch3)

14

Determine the median of the sample of ten measurements.•:Order the measurements

Example – Given the following data set:

2 5 8 −3 5 2 6 5 −4 3

−4 −3 2 2 3 5 5 5 6 8

(n+1)/2 = (10+1)/2 = 5,5th measurement

1 2 3 4 5 6 7 8 9

Median = (3+5)/2 = 4

Even number

10

Page 15: Statistics lecture 4 (ch3)

15

Determine the mode of the sample of nine measurements.•Order the measurements

Example – Given the following data set:

2 5 8 −3 5 2 6 5 −4

−4 −3 2 2 5 5 5 6 8

Mode = 5•Unimodal

Page 16: Statistics lecture 4 (ch3)

16

Determine the mode of the sample of ten measurements.•Order the measurements

Example – Given the following data set:

2 5 8 −3 5 2 6 5 −4 2

−4 −3 2 2 2 5 5 5 6 8

Mode = 2 and 5•Multimodal

Page 17: Statistics lecture 4 (ch3)

17

Concept questions 1 - 12 p 64 – Elementary Statistics for Business & Economics

Page 18: Statistics lecture 4 (ch3)

18

thi

th

where frequency of the i class interval

= class midpoint of the i class interval

i i

i

i

f xx

f

f

x

• ARITHMETIC MEAN – Data is given in a frequency table

– Only an approximate value of the mean

Page 19: Statistics lecture 4 (ch3)

19

12

-1

where = lower boundary of the median interval = upper boundary of the median interval = cumulative frequency of interval foregoing

median interval = frequency o

ni i i

e ii

i

i

i

i

u l FM l

fluF

f

f the median interval

• MEDIAN – Data is given in a frequency table. – First cumulative frequency ≥ n/2 will indicate the

median class interval.– Median can also be determined from the ogive.

Page 20: Statistics lecture 4 (ch3)

20

• MODE– Class interval that has the largest frequency

value will contain the mode.– Mode is the class midpoint of this class.– Mode must be determined from the histogram.

Page 21: Statistics lecture 4 (ch3)

21

To calculate the mean for the sample of the 48 hours:determine the class

midpoints

Number of Number of

calls hours fi xi

[2–under 5) 3 3,5 [5–under 8) 4 6,5[8–under 11) 11 9,5[11–under 14) 13 12,5[14–under 17) 9 15,5[17–under 20) 6 18,5 [20–under 23) 2 21,5

n = 48

Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.

Page 22: Statistics lecture 4 (ch3)

22

Number of Number of

calls hours fi xi

[2–under 5) 3 3,5 [5–under 8) 4 6,5[8–under 11) 11 9,5[11–under 14) 13 12,5[14–under 17) 9 15,5[17–under 20) 6 18,5[20–under 23) 2 21,5

n = 48

597

4812,44

i i

i

f xx

f

Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.

Average number of calls per hour is 12,44.

Page 23: Statistics lecture 4 (ch3)

23

To calculate the median for the sample of the 48: hours:

determine the cumulative frequencies

Number of Number of

calls hours fi F

[2–under 5) 3 3 [5–under 8) 4 7[8–under 11) 11 18[11–under 14) 13 31[14–under 17) 9 40[17–under 20) 6 46 [20–under 23) 2 48

n = 48

Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.

n/2 = 48/2 = 24The first cumulative frequency ≥ 24

Page 24: Statistics lecture 4 (ch3)

24

Number of Number of

calls hours fi F

[2–under 5) 3 3 [5–under 8) 4 7[8–under 11) 11 18[11–under 14) 13 31[14–under 17) 9 40[17–under 20) 6 46 [20–under 23) 2 48

n = 48

Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.

12

Median

14 11 24 1811

1312,38

ni i i

ii

u l Fl

f

50% of the time less than 12,38 or 50% of the time more than 12,38 calls per hour.

Page 25: Statistics lecture 4 (ch3)

25

Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.

Number of calls at a call centre

0

8

16

24

32

40

48

2 5 8 11 14 17 20 23

Number of calls

Nu

mb

er

of

ho

urs

The median can be determined form the ogive.

n/2 = 48/2 = 24

Median = 12,4 Read at A.A

Page 26: Statistics lecture 4 (ch3)

26

To calculate the mode for the sample of the 48 hours:

draw the histogram

Number of Number of

calls hours fi xi

[2–under 5) 3 3,5 [5–under 8) 4 6,5[8–under 11) 11 9,5[11–under 14) 13 12,5[14–under 17) 9 15,5[17–under 20) 6 18,5[20–under 23) 2 21,5

n = 48

Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.

Page 27: Statistics lecture 4 (ch3)

27

Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.

Mode = 12,3 Read at A.

Number of calls at a call centre

0

2

4

6

8

10

12

14

Number of calls

Nu

mb

er

of

ho

urs

2 5 8 11 14 17 20 23 A

Page 28: Statistics lecture 4 (ch3)

28

Relationship between mean, median, and mode• If a distribution is symmetrical:

– the mean, median and mode are the same and lie at centre of distribution

• If a distribution is non-symmetrical: – skewed to the left or to the right– three measures differA positively skewed distribution

(skewed to the right)

MeanMedian

Mode

A negatively skewed distribution(skewed to the left)

MedianModeMean

ModeMedian

Mean

Page 29: Statistics lecture 4 (ch3)

29

MEAN – very affected by outliers (values that are very small or very large relative to the majority of the values in a dataset). Therefore MEAN not best measure where outliers existMEDIAN – not affected by outliers so better to use this than mean when they exist. Disadvantage is that its calculation does not include all the values in a datasetMODE – not affected by outliers. Disadvantage is that it only includes values with highest frequency in its calculation. When distribution is skewed median may provide a better description of data

Page 30: Statistics lecture 4 (ch3)

Group ClassworkGroup Classwork• Get into groups of 4

• Read p75 – 76 Module Manual – Choosing between the mean, median & mode

• Read p 67 – 69 - Elementary Statistics for Business Economics – Relationship between mean, median and mode & When to use the mean, median & mode

• Complete Izimvo Exchange 1 p 83 Module Manual

30

Page 31: Statistics lecture 4 (ch3)

31

Concept questions 13 – 25 p69 – Elementary Statistics for Business and Economics

Page 32: Statistics lecture 4 (ch3)

32

Measures of dispersion• Range• Variance• Standard deviation• Coefficient of variation

Page 33: Statistics lecture 4 (ch3)

33

• Range– The range of a set of measurements is the

difference between the largest and smallest values in the data set.

– Its major advantage is the ease with which it can be computed.

– Its major shortcoming is its failure to provide information on the dispersion of the values between the two end points.

Page 34: Statistics lecture 4 (ch3)

34

• Variance and standard deviationDetermine how far the observations are from their mean.

Where:– = sample mean– x = values of the sample– n = sample size

x

Page 35: Statistics lecture 4 (ch3)

35

• Variance and standard deviationDetermine how far the observations are from their mean.

2

2Population variancex

N

2

Population standard deviationx

N

Where:– μ = population mean– x = values of the population – N = population size

Page 36: Statistics lecture 4 (ch3)

36

• Coefficient of variation– Measures the standard deviation relative to the

mean.– It is expressed as a percentage.– Used to compare samples that are measured in

different units.

100s

CVx

Page 37: Statistics lecture 4 (ch3)

37

Example - Given the following data sets:

1st: -4 -3 2 2 5 5 5 6 8

2nd : 0 1 2 3 3 4 5 5

−5 −4 −3 −2 −1 0 1 2 3 4 5 6 7 8 9

232,9

8x

The means are the same but the dispersion of Dataset 1 much larger than the dispersion of Data set 2.

Page 38: Statistics lecture 4 (ch3)

38

Example – Given the following data sets:

1st: −4 −3 2 2 5 5 5 6 8

2nd : 0 1 2 3 3 4 5 5

The range of the measurements is given by:

Largest value – smallest value

= 8 – (−4) = 5 − 0

= 12 = 5

Page 39: Statistics lecture 4 (ch3)

39

The variance of the measurements is given by:

Example – Given the following data sets:

1st: −4 −3 2 2 5 5 5 6 8

2nd: 0 1 2 3 3 4 5 5

Page 40: Statistics lecture 4 (ch3)

40

The standard deviation of the measurements is given by:

Example – Given the following data sets:

1st: −4 −3 2 2 5 5 5 6 8

2nd : 0 1 2 3 3 4 5 5

Page 41: Statistics lecture 4 (ch3)

41

The coefficient of variation of the measurements is given by:

4,08100% 100 140,69%

2,9

sCV

x

1,81100% 100 62,41%

2,9

sCV

x

Example – Given the following data sets:

1st: −4 −3 2 2 5 5 5 6 8

2nd : 0 1 2 3 3 4 5 5

Page 42: Statistics lecture 4 (ch3)

42

P75 Elementary Statistics for Business and Economics

By applying the value of the std dev in combination with the value of the mean, we are able to define where the majority of the data values are clustered using CHEBYCHEFF’s THEORUM

•At least 75% of the values in any sample will be within k= 2 std dev of the sample mean

•At least 89% of the values in any sample will be within k=3 std dev of the mean

•At least 94% of the values in any sample will be within k=4 std dev of the sample mean

NOTE: k= the number of std dev distances to either side of the mean

Page 43: Statistics lecture 4 (ch3)

43

EXAMPLE

Assume a data set has a mean of 50 and a std dev of 5.

Then 75% of the values in the data set occur in the interval:-

Mean + 2 std dev = 50 +/- 2(5)

=50 +/- 10= from 40 to 60

Page 44: Statistics lecture 4 (ch3)

Classwork/HomeworkClasswork/Homework• Concept questions 26 -35 , p 76 –

Elementary Statistics for Business and Economics

44

Page 45: Statistics lecture 4 (ch3)

45

• Variance and standard deviation

Where:– f = frequencies of class intervals– x = class midpoints of class intervals– n = sample size

Page 46: Statistics lecture 4 (ch3)

46

• Variance and standard deviation

212

2Population variance = Nfx fx

N

212

Population standard deviation Nfx fx

N

Where:– f = frequencies of class intervals– x = class midpoints of class intervals– N = population size

Page 47: Statistics lecture 4 (ch3)

47

Number of Number of

calls hours fi xi

[2–under 5) 3 3,5 [5–under 8) 4 6,5[8–under 11) 11 9,5[11–under 14) 13 12,5[14–under 17) 9 15,5[17–under 20) 6 18,5[20–under 23) 2 21,5

n = 48

Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.

Page 48: Statistics lecture 4 (ch3)

48

Number of Number of

calls hours fi xi

[2–under 5) 3 3,5 [5–under 8) 4 6,5[8–under 11) 11 9,5[11–under 14) 13 12,5[14–under 17) 9 15,5[17–under 20) 6 18,5[20–under 23) 2 21,5

n = 48

Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.

Page 49: Statistics lecture 4 (ch3)

49

Concept Questions 36 – 41, p 80 – Elementary Statistics for Business & Economics

Page 50: Statistics lecture 4 (ch3)

50

• Quartiles• Percentiles• Interquartile range

Page 51: Statistics lecture 4 (ch3)

51

• QUARTILES– Order data in ascending order. – Divide data set into four quarters.

25% 25% 25% 25%Q1 Q2 Q3Min Max

Page 52: Statistics lecture 4 (ch3)

52

Determine Q1 for the sample of nine measurements:•Order the measurements

Example – Given the following data set:

2 5 8 −3 5 2 6 5 −4

−4 −3 2 2 5 5 5 6 81 2 3 4 5 6 7 8 9

Q1 = −3 + 0,5(2 − (−3)) = −0,5

th1 11 4 4 is the 1 9 1 2,5 valueQ n

Page 53: Statistics lecture 4 (ch3)

53

−4 −3 2 2 5 5 5 6 81 2 3 4 5 6 7 8 9

Q3 = 5 + 0,5(6 − 5) = 5,5

th3 33 4 4 is the 1 9 1 7,5 valueQ n

Determine Q3 for the sample of nine measurements:

Example – Given the following data set:

2 5 8 −3 5 2 6 5 −4

Page 54: Statistics lecture 4 (ch3)

54

Q3 = 5,5Q1 = −0,5

Interquartile range = Q3 – Q1

Interquartile range

= 5,5 – (−0,5)

= 6

Example – Given the following data set:

2 5 8 −3 5 2 6 5 −4

Page 55: Statistics lecture 4 (ch3)

55

• PERCENTILES– Order data in ascending order. – Divide data set into hundred parts.

20%80%P80Min Max

50% 50%P50 = Q2Min Max

10%

P10Min Max90%

Page 56: Statistics lecture 4 (ch3)

56

−4 −3 2 2 5 5 5 6 81 2 3 4 5 6 7 8 9

P20 = −3

nd2020 100 100

is the 1 9 1 2 valuepP n

Determine P20 for the sample of nine measurements:

Example – Given the following data set:

2 5 8 −3 5 2 6 5 −4

Page 57: Statistics lecture 4 (ch3)

57

To calculate Q1 for the sample of the 48 hours:

Number of Number of

calls hours fi F

[2–under 5) 3 3 [5–under 8) 4 7[8–under 11) 11 18[11–under 14) 13 31[14–under 17) 9 40[17–under 20) 6 46 [20–under 23) 2 48

n = 48

Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.

n/4 = 48/4 = 12The first cumulative frequency ≥ 12

Page 58: Statistics lecture 4 (ch3)

58

Number of Number of

calls hours fi F

[2–under 5) 3 3 [5–under 8) 4 7[8–under 11) 11 18[11–under 14) 13 31[14–under 17) 9 40[17–under 20) 6 46 [20–under 23) 2 48

n = 48

Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.

1 1 1

1

1

1

14

Q

11 8 12 78

119,36

nQ Q Q

QQ

u l Fl

f

25% of the time less than 9,36 or 75% of the time more than 9,36 calls per hour.

Page 59: Statistics lecture 4 (ch3)

59

Number of Number of

calls hours fi F

[2–under 5) 3 3 [5–under 8) 4 7[8–under 11) 11 18[11–under 14) 13 31[14–under 17) 9 40[17–under 20) 6 46 [20–under 23) 2 48

n = 48

Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.

= 3n/4 = 3(48)/4 = 36The first cumulative frequency ≥ 36

Q3

Page 60: Statistics lecture 4 (ch3)

60

Number of Number of

calls hours fi F

[2–under 5) 3 3 [5–under 8) 4 7[8–under 11) 11 18[11–under 14) 13 31[14–under 17) 9 40[17–under 20) 6 46 [20–under 23) 2 48

n = 48

Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.

3 3 3

3

3

3

314

Q

17 14 36 3114

915,67

nQ Q Q

QQ

u l Fl

f

75% of the time less than 15,67 or 25% of the time more than 15,67 calls per hour.

Page 61: Statistics lecture 4 (ch3)

61

Number of Number of

calls hours fi F

[2–under 5) 3 3 [5–under 8) 4 7[8–under 11) 11 18[11–under 14) 13 31[14–under 17) 9 40[17–under 20) 6 46 [20–under 23) 2 48

n = 48

Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.

Q3 = 15,67Q1 = 9,36

IRR

= 15,67 – 9,36

= 6,31

Page 62: Statistics lecture 4 (ch3)

62

Number of Number of

calls hours fi F

[2–under 5) 3 3 [5–under 8) 4 7[8–under 11) 11 18[11–under 14) 13 31[14–under 17) 9 40[17–under 20) 6 46 [20–under 23) 2 48

n = 48

Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.

= np/100 = 48(60)/100 = 28,8The first cumulative frequency ≥ 28,8

P60

Page 63: Statistics lecture 4 (ch3)

63

Number of Number of

calls hours fi F

[2–under 5) 3 3 [5–under 8) 4 7[8–under 11) 11 18[11–under 14) 13 31[14–under 17) 9 40[17–under 20) 6 46 [20–under 23) 2 48

n = 48

Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.

60

1100

P

14 11 28,8 1811

1313,49

npp p p

pp

u l Fl

f

60% of the time less than 13,49 or 40% of the time more than 13,49 calls per hour.

Page 64: Statistics lecture 4 (ch3)

Classwork/HomeworkClasswork/Homework• Concept Questions 42 – 53, p88 –

Elementary Statistics for Business & Economics

64

Page 65: Statistics lecture 4 (ch3)

65

0 2 4 6 8 10 12 14 16 18 20 22 24 26 28

Me = 12,38Q3 = 15,67Q1 = 9,36IRR = 6,31

LL = Q1 – 1,5(IQR) = 9,36 – 1,5(6,31) = –0,11

UL = Q3 + 1,5(IQR) = 15,67 – 1,5(6,31) = 25,14

BOX-AND-WISKER PLOT

1,5(IQR)1,5(IQR) IQR

• Any value smaller than −0,11 will be an outlier.

• Any value larger than 25,14 will be an outlier.

Page 66: Statistics lecture 4 (ch3)

NORMAL CURVE• Bell shaped, single peaked and symmetric• Mean is located at centre of a normal curve• Total area under a normal curve =1, half of this area on

the left side and half on the right side• Mean, median and mode are =• Two tails extend indefinitely to the left and to the right of

the mean as they approach the horizontal axis• Two tails never touch horizontal axis• Completely described by its mean and its standard

deviation. Mean specifies position of curve on horizontal axis, standard deviation specifies the shape of the curve

• Smaller the std dev the less spread out and more sharply peaked the curve 66

Page 67: Statistics lecture 4 (ch3)

NORMAL CURVE & Empirical Rule• Chebycheff’s Theorum applies to any dataset

irrespective of the underlying distribution• Empirical Rule applies specifically to data that follows a

normal curve• Empirical Rule states that for a normal curve, approx:-

– 68% of observations lie within one std dev of mean– 95% of observations lie within 2 std dev of mean– 99.7% of observations lie within 3 std dev of mean

NOTE: FOR A NORMAL CURVE ANY VALUE THAT IS NOT WITHIN 3 STD DEV OF MEAN IS A SUSPECT OUTLIER

67

Page 68: Statistics lecture 4 (ch3)

Classwork/HomeworkClasswork/Homework• Concept Questions 61-70, p95 –

Elementary Statistics for Business & Economics

68

Page 69: Statistics lecture 4 (ch3)

69

1. Activity 1 & 2 – Module Manual p85 – 862. Revision Exercises 1,2,3 p 87 -93 Module

Manual3. Supplementary Exercises questions 1 - 12 p

100 – Elementary Statistics for Business & Economics

4. Self Review Test p96 - Elementary Statistics for Business & Economics

5. Read Chapter 4 – Basic Probability, p105 – 150 - Elementary Statistics for Business & Economics

Classwork/HomeworkClasswork/Homework