Dr. Serhat Eren 1 CHAPTER 6 NUMERICAL DESCRIPTORS OF DATA.


6.1 CHAPTER OBJECTIVES

Numerical measures of center: the mean, the median, and the mode

Numerical measures of variability: the range and the standard deviation

Describing a set of data: the empirical rule and box-plots

Descriptive statistics for grouped data

Measures of relative standing: percentiles and percentile rank

Identifying outliers: z-scores and box-plots


6.2 DESCRIBING DATA NUMERICALLY

Numerical measures calculated from the data are known as either statistics or parameters.

A statistic is a numerical descriptor that is calculated from sample data and is used to describe the sample. Statistics are usually represented by Roman letters.

A parameter is a numerical descriptor that is used to describe a population. Parameters are usually represented by Greek letters.


6.3 MEASURES OF CENTRAL TENDENCY

6.3.1 The Arithmetic Mean

The mean, or average, is calculated by adding all of the data values in the sample and then dividing by the number of values. The symbol for the sample mean is $\bar{X}$ (this is read as X-bar).

The sample mean is the center of balance of a set of data, and is found by adding up all of the data values and dividing by the number of observations.

$$\bar{X} = \frac{\text{Sum of all the values in the sample}}{\text{Total number of observations}}$$


The population parameter that corresponds to the sample mean is the population mean, which is represented by the Greek letter $\mu$ (mu).

Using the $\Sigma$ notation, we can write the formula for the sample mean as:

$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} \quad \text{or} \quad \bar{X} = \frac{\sum x}{n}$$
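The sample mean formula can be sketched in a few lines of Python. The function name `sample_mean` and the data values are invented for this illustration and do not come from the slides.

```python
def sample_mean(values):
    """Return the sum of the data values divided by the number of observations."""
    n = len(values)
    if n == 0:
        raise ValueError("the sample mean is undefined for an empty sample")
    return sum(values) / n


# Hypothetical sample, made up for illustration only
sample = [4, 8, 6, 5, 7]
x_bar = sample_mean(sample)
print(x_bar)  # 6.0
```

Here the values add up to 30 and there are 5 observations, so the sample mean is 30 / 5 = 6.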


6.3.1.A What Does the Sample Mean Really Measure?

You can think of the sample mean as the balance point of the data.

The value of $\bar{X}$ balances the higher values against the lower ones.


6.3.2 The Sample Median

The sample median is a measure of the middle of the data after it is sorted from lowest to highest.

The sample median is the value of the middle observation in an ordered set of data.

Finding the sample median requires sorting the data set first. Once this is done, the sample median is the value of the observation that is in the middle of the data.


The exact location of the middle will depend on whether the number of observations in the sample is even or odd.

– Step 1: If the number of observations in the sample, n, is odd, then the median is the value of the observation in the (n+1)/2 position.

– Step 2: If n is even, then the median is the average of the values in the n/2 and n/2 + 1 positions.
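The two cases above can be sketched in Python as follows. The function name `sample_median` and the example data are assumptions made for illustration. Note that the positions in the steps are counted from 1, while Python list indices start at 0.

```python
def sample_median(values):
    """Return the middle value of the sorted data (odd n) or the
    average of the two middle values (even n)."""
    ordered = sorted(values)  # the data must be sorted first
    n = len(ordered)
    if n == 0:
        raise ValueError("the sample median is undefined for an empty sample")
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2, which is index (n + 1) // 2 - 1
        return ordered[(n + 1) // 2 - 1]
    # Even n: average of positions n/2 and n/2 + 1 (indices n//2 - 1 and n//2)
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


# Hypothetical samples, made up for illustration only
print(sample_median([7, 1, 5]))     # 5
print(sample_median([4, 1, 3, 2]))  # 2.5
```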


6.3.2.A Why Use Two Different Measures?

The median tells you that half of the observations in the sample are above that value and half of the observations are below it.

Because it is a measure of location it ignores the actual values of the observations and may not fully reflect the sample data.


The mean uses all of the data values in its calculation and measures the center of balance of the data. While it can be shifted by extreme values, it does reflect all of the data values equally.

If we start out with a symmetric, mound-shaped distribution, then the mean and the median are both located at the center of the distribution, at the bump. This is illustrated in Figure 6.2.


6.3.3 Comparing the Mean and the Median

When the data are more spread out in one direction, the mean is pulled toward those values, in the direction of the skew. This is illustrated in Figure 6.3.
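A small numerical sketch may help show how an extreme value pulls the mean while leaving the median where it was. The two data sets below are invented for illustration. The functions `statistics.mean` and `statistics.median` come from the Python standard library.

```python
import statistics

# Hypothetical data: a roughly symmetric sample ...
symmetric = [30, 32, 35, 38, 40]
# ... and the same sample with the largest value replaced by an extreme one
skewed = [30, 32, 35, 38, 140]

mean_sym = statistics.mean(symmetric)
median_sym = statistics.median(symmetric)
mean_skew = statistics.mean(skewed)
median_skew = statistics.median(skewed)

print(mean_sym, median_sym)    # 35 35
print(mean_skew, median_skew)  # 55 35
```

In the second sample the mean moves from 35 to 55, toward the extreme value, while the median stays at 35.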


6.3.4 The Sample Mode

The sample mode is the data value that has the highest frequency of occurrence in the sample.

It would appear that the mode would be a very good measure of a typical value.

But depending on the size of the sample and the number of possible data values, there may not be any repeated values in the sample. That is, for some samples, the mode may not exist.


For continuous data, we often refer to the modal class in a frequency distribution or histogram.

The modal class is the class interval in a frequency distribution or histogram that has the highest frequency.

Another problem with the mode is that there may appear to be more than one mode for a sample. This frequently happens with small samples.
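Both difficulties with the mode (no repeated values, or several values tied for the highest frequency) can be seen in a short sketch. The helper `sample_modes` is a hypothetical name, and returning an empty list when no value repeats is a convention chosen here for illustration.

```python
from collections import Counter


def sample_modes(values):
    """Return every value that shares the highest frequency, or an
    empty list when no value occurs more than once."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # No repeated values: treat the mode as not existing
        return []
    return sorted(v for v, c in counts.items() if c == highest)


# Hypothetical samples, made up for illustration only
print(sample_modes([5, 5, 1]))        # [5]     one mode
print(sample_modes([1, 2, 2, 3, 3]))  # [2, 3]  more than one mode
print(sample_modes([1, 2, 3]))        # []      no mode
```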


6.4 MEASURES OF DISPERSION OR SPREAD

6.4.1 The Sample Range

The simplest measure of dispersion, the sample range, involves looking at the two extreme values in the sample: the highest (maximum) and the lowest (minimum) values.

The sample range, R, is the difference between the maximum and minimum observations in the sample.
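As a minimal sketch, the range can be computed directly from the maximum and minimum. The function name `sample_range` and the data are assumptions made for illustration.

```python
def sample_range(values):
    """Return R, the maximum observation minus the minimum observation."""
    if len(values) == 0:
        raise ValueError("the sample range is undefined for an empty sample")
    return max(values) - min(values)


# Hypothetical sample, made up for illustration only
sample = [4, 8, 6, 5, 7]
r = sample_range(sample)
print(r)  # 4
```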


The sample range is very easy to calculate and understand.

It gives information about the distance from one end of an ordered data set to the other.

If the sample data are symmetric, then it also gives information about the spread of the data relative to the measures of central tendency.


6.4.2 The Sample Standard Deviation

The standard deviation is most often defined relative to another measure of dispersion called the sample variance.

In practice, the measure that is used is the standard deviation because its units and order of magnitude are the same as those of the actual data.


The sample variance, s², is essentially the average of the squared deviations of the data values from the sample mean; the sum of the squared deviations is divided by n − 1 rather than n.

The sample standard deviation, s, is the positive square root of the sample variance.


To calculate the sample standard deviation, you first calculate the sample variance, s², using the formula

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}$$

To obtain the sample standard deviation, s, you take the positive square root of the sample variance:

$$s = \sqrt{s^2}$$
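The two-step calculation (variance first, then its positive square root) can be sketched as below. The function names and data are invented for illustration. The divisor n − 1 follows the sample variance formula.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(values))


# Hypothetical sample, made up for illustration only
sample = [4, 8, 6, 5, 7]
s2 = sample_variance(sample)
s = sample_std(sample)
print(s2)           # 2.5
print(round(s, 2))  # 1.58
```

For this sample the mean is 6, the squared deviations are 4, 4, 0, 1 and 1, and their sum of 10 divided by n − 1 = 4 gives s² = 2.5.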


The population variance and standard deviation are represented by the Greek letter $\sigma$ (sigma), where $\sigma^2$ is the population variance and $\sigma$ is the population standard deviation.


6.4.3 Interpreting the Standard Deviation: The Empirical Rule

The empirical rule says that for a mound-shaped, symmetric distribution

– about 68% of all observations are within one standard deviation of the mean

– about 95% of all observations are within two standard deviations of the mean

– almost all (more than 99%) of the observations are within three standard deviations of the mean.


The empirical rule (Figure 6.5) is defined for large data sets and distributions that are symmetric and mound-shaped, often called bell-shaped or normal curves.
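One way to see the empirical rule at work is to simulate a large bell-shaped data set and count how many observations fall within one, two and three standard deviations of the mean. This is only a sketch: the data are randomly generated (mean 50, standard deviation 10, and seed 0 are arbitrary choices), and `fraction_within` is a hypothetical helper name.

```python
import random
import statistics


def fraction_within(values, k):
    """Fraction of observations within k sample standard deviations
    of the sample mean."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    inside = sum(1 for x in values if abs(x - x_bar) <= k * s)
    return inside / len(values)


# Simulated bell-shaped data; the parameters and seed are arbitrary assumptions
rng = random.Random(0)
data = [rng.gauss(50, 10) for _ in range(10_000)]

for k in (1, 2, 3):
    print(k, round(fraction_within(data, k), 3))
```

For data like these the three fractions should come out close to 0.68, 0.95 and 0.997. For small or skewed samples the agreement can be much poorer.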


6.4.4 z-Scores

A z-score measures the number of standard deviations that a data value is from the mean.

To calculate the z-score of a data value we first find the distance that the data value is from the mean and then divide by the standard deviation:

$$z = \frac{\text{Distance between the data value and the mean}}{\text{Standard deviation}} = \frac{x - \mu}{\sigma}$$


As in the empirical rule, for sample data we substitute $\bar{X}$ and $s$ for $\mu$ and $\sigma$:

$$z = \frac{x - \bar{X}}{s}$$

A positive z-score indicates that the data value is above the mean, while a negative z-score indicates that the data value is below the mean.
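A short sketch of the z-score calculation for sample data, using the sample mean and sample standard deviation. The function name `z_scores` and the data are assumptions made for illustration.

```python
import statistics


def z_scores(values):
    """Return (x - sample mean) / sample standard deviation for each x."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    if s == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - x_bar) / s for x in values]


# Hypothetical sample, made up for illustration only
sample = [4, 8, 6, 5, 7]
z = z_scores(sample)
for x, score in zip(sample, z):
    print(x, round(score, 2))
```

In this sample the mean is 6, so the value 6 has a z-score of 0, the value 8 has a positive z-score (about 1.26), and the value 4 has a negative one (about −1.26).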
