Dr. Serhat Eren
CHAPTER 6
NUMERICAL DESCRIPTORS OF DATA
6.1 CHAPTER OBJECTIVES
Numerical measures of center: the mean, the median, and the mode
Numerical measures of variability: the range and the standard deviation
Describing a set of data: the empirical rule and box-plots
Descriptive statistics for grouped data
Measures of relative standing: percentiles and percentile rank
Identifying outliers: z-scores and box-plots
6.2 DESCRIBING DATA NUMERICALLY
Numerical measures calculated from the data are known as either statistics or parameters.
A statistic is a numerical descriptor that is calculated from sample data and is used to describe the sample. Statistics are usually represented by Roman letters.
A parameter is a numerical descriptor that is used to describe a population. Parameters are usually represented by Greek letters.
6.3 MEASURES OF CENTRAL TENDENCY
6.3.1 The Arithmetic Mean
The mean, or average, is calculated by adding all of the data values in the sample and then dividing by the number of values. The symbol for the sample mean is $\bar{X}$ (read as X-bar).
The sample mean is the center of balance of a set of data, and is found by adding up all of the data values and dividing by the number of observations.
$$\bar{X} = \frac{\text{Sum of all the values in the sample}}{\text{Total number of observations}}$$
The population parameter that corresponds to the sample mean is the population mean, μ (mu).
The population mean is represented by the Greek letter μ (mu).
Using summation (Σ) notation, we can write the formula for the sample mean as:
$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} \quad \text{or} \quad \bar{X} = \frac{\sum x}{n}$$
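As a minimal sketch of the sample mean formula, the following Python computes the sum of the values divided by the number of observations. The function name `sample_mean` and the data values are illustrative assumptions, not taken from the chapter.

```python
def sample_mean(values):
    """Return the sample mean: the sum of the values divided by n."""
    n = len(values)
    if n == 0:
        raise ValueError("the sample must contain at least one observation")
    return sum(values) / n


# Hypothetical sample of six observations (not from the chapter)
data = [4, 8, 15, 16, 23, 42]

x_bar = sample_mean(data)  # 108 / 6
print("n =", len(data), "sum =", sum(data), "mean =", x_bar)
```

The standard library's `statistics.mean` performs the same calculation; the explicit version is shown only to mirror the formula.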
6.3.1.A What Does the Sample Mean Really Measure?
You can think of the sample mean as the balance point of the data.
The value of $\bar{X}$ balances the higher values against the lower ones.
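One way to see the balance-point idea is that the deviations of the data values from the mean always sum to zero: the total distance above the mean equals the total distance below it. The sketch below checks this on a hypothetical sample; the variable names are assumptions for illustration.

```python
# Hypothetical sample (not from the chapter)
data = [4, 8, 15, 16, 23, 42]

x_bar = sum(data) / len(data)

# Deviation of each observation from the mean
deviations = [x - x_bar for x in data]

# Total distance above the mean and total distance below it
above = sum(d for d in deviations if d > 0)
below = -sum(d for d in deviations if d < 0)

print("deviations:", deviations)
print("total above:", above, "total below:", below)
```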
6.3.2 The Sample Median
The sample median is a measure of the middle of the data after it is sorted from lowest to highest.
The sample median is the value of the middle observation in an ordered set of data.
Finding the sample median requires sorting the data set first. Once this is done, the sample median is the value of the observation that is in the middle of the data.
The exact location of the middle will depend on whether the number of observations in the sample is even or odd.
– Step 1: If the number of observations in the sample, n, is odd, then the median is the value of the observation in the (n+1)/2 position.
– Step 2: If n is even, then the median is the average of the values in the n/2 and n/2 + 1 positions.
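The two cases above can be sketched in Python as follows. The positions in the rule are 1-based, while Python lists are 0-based, so each position is shifted down by one. The function name `sample_median` and the data are illustrative assumptions.

```python
def sample_median(values):
    """Return the sample median using the odd/even position rule."""
    ordered = sorted(values)  # the data must be sorted first
    n = len(ordered)
    if n == 0:
        raise ValueError("the sample must contain at least one observation")
    if n % 2 == 1:
        # Odd n: the value in position (n + 1) / 2 (1-based)
        return ordered[(n + 1) // 2 - 1]
    # Even n: average of the values in positions n/2 and n/2 + 1 (1-based)
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


# Hypothetical samples (not from the chapter)
odd_median = sample_median([7, 1, 5])       # sorted: 1, 5, 7
even_median = sample_median([7, 1, 5, 3])   # sorted: 1, 3, 5, 7
print(odd_median, even_median)
```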
6.3.2.A Why Use Two Different Measures?
The median tells you that half of the observations in the sample are above that value and half of the observations are below it.
Because it is a measure of position, it ignores the actual values of the observations and may not fully reflect the sample data.
The mean uses all of the data values in its calculation and measures the center of balance of the data. While it can be shifted by extreme values, it does reflect all of the data values equally.
If we start out with a symmetric, mound-shaped distribution, then the mean and the median are both located at the center of the distribution, at the bump. This is illustrated in Figure 6.2.
6.3.3 Comparing the Mean and the Median
When the data are more spread out in one direction, the mean is pulled toward these values, in the direction of the skew. This is illustrated in Figure 6.3.
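A small numerical sketch can make the pull of extreme values concrete. The data below are hypothetical: six values close together plus one much larger value, giving a right-skewed sample. The mean moves toward the large value while the median stays near the bulk of the data.

```python
import statistics

# Hypothetical right-skewed sample: one extreme high value
skewed = [30, 32, 35, 36, 38, 40, 250]

# The same sample with the extreme value removed
trimmed = skewed[:-1]

mean_skewed = statistics.mean(skewed)
median_skewed = statistics.median(skewed)
mean_trimmed = statistics.mean(trimmed)
median_trimmed = statistics.median(trimmed)

print("with extreme value:    mean =", mean_skewed, "median =", median_skewed)
print("without extreme value: mean =", mean_trimmed, "median =", median_trimmed)
```

Removing the single extreme value changes the mean substantially but barely moves the median, which is the behavior the comparison describes.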
6.3.4 The Sample Mode
The sample mode is the data value that has the highest frequency of occurrence in the sample.
It would appear that the mode would be a very good measure of a typical value.
But depending on the size of the sample and the number of possible data values, there may not be any repeated values in the sample. That is, for some samples, the mode may not exist.
For continuous data, we often refer to the modal class in a frequency distribution or histogram.
The modal class is the class interval in a frequency distribution or histogram that has the highest frequency.
Another problem with the mode is that there may appear to be more than one mode for a sample. This frequently happens with small samples.
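The following sketch counts frequencies and reports every value that attains the highest count. It follows the convention described above that a sample with no repeated values has no mode, and it returns more than one value when several tie. The function name `sample_modes` and the data are assumptions for illustration.

```python
from collections import Counter


def sample_modes(values):
    """Return a sorted list of modes; empty if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # No repeated values: treat the mode as not existing
        return []
    return sorted(v for v, c in counts.items() if c == top)


# Hypothetical samples (not from the chapter)
one_mode = sample_modes([2, 3, 3, 5, 7])
no_mode = sample_modes([1, 2, 3])
two_modes = sample_modes([1, 1, 2, 2, 3])
print(one_mode, no_mode, two_modes)
```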
6.4 MEASURES OF DISPERSION OR SPREAD
6.4.1 The Sample Range
The simplest measure of dispersion, the sample range, involves looking at the two extreme values in the sample: the highest (maximum) and the lowest (minimum) values.
The sample range, R, is the difference between the maximum and minimum observations in the sample.
The sample range is very easy to calculate and understand.
It gives information about the distance from one end of an ordered data set to the other.
If the sample data are symmetric, then it also gives information about the spread of the data relative to the measures of central tendency.
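As a brief sketch, the sample range is the maximum minus the minimum. The function name `sample_range` and the data are illustrative assumptions.

```python
def sample_range(values):
    """Return R, the difference between the maximum and minimum values."""
    if len(values) == 0:
        raise ValueError("the sample must contain at least one observation")
    return max(values) - min(values)


# Hypothetical sample (not from the chapter)
data = [12, 7, 3, 18, 10]

r = sample_range(data)  # 18 - 3
print("R =", r)
```

Because only the two extreme observations enter the calculation, changing any of the values in between leaves R unchanged.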
6.4.2 The Sample Standard Deviation
The standard deviation is most often defined relative to another measure of dispersion called the sample variance.
In practice, the measure that is used is the standard deviation because its units and order of magnitude are the same as those of the actual data.
The sample variance, s², is approximately the average of the squared deviations of the data values from the sample mean: the sum of the squared deviations divided by n − 1.
The sample standard deviation, s, is the positive square root of the sample variance.
To calculate the sample standard deviation, you calculate the sample variance, s², first, using the formula
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}$$
To obtain the sample standard deviation, s, you take the positive square root of the sample variance:
$$s = \sqrt{s^2}$$
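The two steps can be sketched in Python: first the sample variance with the n − 1 divisor, then its positive square root. The function names and the data are assumptions chosen for illustration; the data were picked so that the mean is a whole number.

```python
import math


def sample_variance(values):
    """Return s^2: the sum of squared deviations divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """Return s: the positive square root of the sample variance."""
    return math.sqrt(sample_variance(values))


# Hypothetical sample (not from the chapter); mean = 40 / 8 = 5
data = [2, 4, 4, 4, 5, 5, 7, 9]

s2 = sample_variance(data)  # squared deviations sum to 32, so s^2 = 32 / 7
s = sample_std(data)
print("s^2 =", s2, "s =", s)
```

The standard library functions `statistics.variance` and `statistics.stdev` use the same n − 1 divisor and can serve as a cross-check.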
The population variance and standard deviation are represented by the Greek letter σ (sigma), where σ² is the population variance and σ is the population standard deviation.
6.4.3 Interpreting the Standard Deviation: The Empirical Rule
The empirical rule says that for a mound-shaped, symmetric distribution
– about 68% of all observations are within one standard deviation of the mean
– about 95% of all observations are within two standard deviations of the mean
– almost all (more than 99%) of the observations are within three standard deviations of the mean.
The empirical rule (Figure 6.5) is defined for large data sets and distributions that are symmetric and mound-shaped, often called bell-shaped or normal curves.
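The rule can be checked informally by simulation. The sketch below draws a large sample from a bell-shaped distribution and counts the fraction of observations within one, two, and three standard deviations of the mean. The center of 100, the spread of 15, the sample size, and the seed are arbitrary assumptions; the resulting fractions will be close to, but not exactly, the percentages in the rule.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: 10,000 draws from a bell-shaped (normal) distribution
data = [rng.gauss(100, 15) for _ in range(10_000)]

x_bar = statistics.mean(data)
s = statistics.stdev(data)


def fraction_within(values, center, spread, k):
    """Fraction of values within k standard deviations of the center."""
    inside = sum(1 for v in values if abs(v - center) <= k * spread)
    return inside / len(values)


f1 = fraction_within(data, x_bar, s, 1)
f2 = fraction_within(data, x_bar, s, 2)
f3 = fraction_within(data, x_bar, s, 3)

print(f"within 1 s: {f1:.3f}")
print(f"within 2 s: {f2:.3f}")
print(f"within 3 s: {f3:.3f}")
```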
6.4.4 z-Scores
A z-score measures the number of standard deviations that a data value is from the mean.
To calculate the z-score of a data value we first find the distance that the data value is from the mean and then divide by the standard deviation:
$$z = \frac{\text{Distance between the data value and the mean}}{\text{Standard deviation}} = \frac{x - \mu}{\sigma}$$
As in the empirical rule, for sample data we substitute $\bar{X}$ and s for μ and σ.
A positive z-score indicates that the data value is above the mean, while a negative z-score indicates that the data value is below the mean.
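A short sketch of the z-score calculation for sample data, using the sample mean and sample standard deviation in place of the population values. The function name `z_score` and the data are illustrative assumptions.

```python
import statistics


def z_score(value, center, spread):
    """Number of standard deviations that value lies from the center."""
    if spread == 0:
        raise ValueError("the standard deviation must be nonzero")
    return (value - center) / spread


# Hypothetical sample (not from the chapter); mean = 5
data = [2, 4, 4, 4, 5, 5, 7, 9]

x_bar = statistics.mean(data)
s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)

z_high = z_score(9, x_bar, s)      # above the mean, so positive
z_low = z_score(2, x_bar, s)       # below the mean, so negative
z_mean = z_score(x_bar, x_bar, s)  # at the mean, so zero

print(f"z(9) = {z_high:.3f}, z(2) = {z_low:.3f}, z(mean) = {z_mean:.3f}")
```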