Chapter 3 Descriptive Statistics: Numerical Methods Statistics for Business (Env) 1.


Chapter 3

Descriptive Statistics: Numerical Methods

Statistics for Business (Env)

1


Descriptive Statistics

3.1 Describing Central Tendency
3.2 Measures of Variation
3.3 Percentiles, Quartiles and Box-and-Whiskers Displays
3.4 Covariance, Correlation, and the Least Squares Line
3.5 Weighted Means and Grouped Data (Optional)
3.6 The Geometric Mean (Optional)


Describing Central Tendency

• In addition to describing the shape of a distribution, we want to describe the data set’s central tendency
– A measure of central tendency represents the center or middle of the data
– It is the most typical or most representative value of the entire data set


Parameters and Statistics

• A population parameter is a number calculated from all the population measurements that describes some aspect of the population

• A sample statistic is a number calculated using the sample measurements that describes some aspect of the sample


Measures of Central Tendency

Mean, µ: The average or expected value
Median, Md: The value of the middle point of the ordered measurements
Mode, Mo: The most frequent value


The Mean

Population: X1, X2, …, XN

Population Mean

$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$

Sample: x1, x2, …, xn

Sample Mean

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$


The Sample Mean

For a sample of size n, the sample mean is defined as

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$

and is a point estimate of the population mean µ

• It is the value to expect, on average and in the long run

• It is also the amount each member gets when the total is distributed equally within the sample


Mean as the balance point for a distribution

Data: 2, 2, 6, 10
mean = (2+2+6+10)/4 = 5


Data: 3, 6, 6, 9, 11
mean = (3+6+6+9+11)/5 = 7

What will happen to the mean if we add one more number to the data?
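One way to explore this question is to compute the mean before and after appending a value. The sketch below uses the slide's data; the three appended values (7, 13, 1) are hypothetical and chosen only to show the three possible outcomes. It also checks the balance-point property: the deviations from the mean sum to zero.

```python
def mean(values):
    """Arithmetic mean: the total divided by the number of measurements."""
    return sum(values) / len(values)

data = [3, 6, 6, 9, 11]
m = mean(data)                        # (3 + 6 + 6 + 9 + 11) / 5

# Balance point: deviations above and below the mean cancel out.
balance = sum(x - m for x in data)

# Hypothetical extra values (not from the slides) appended to the data.
with_equal = mean(data + [7])         # new value equal to the old mean
with_larger = mean(data + [13])       # new value above the old mean
with_smaller = mean(data + [1])       # new value below the old mean

print(m, balance, with_equal, with_larger, with_smaller)
```

A value equal to the current mean leaves it unchanged, a larger value pulls it up, and a smaller value pulls it down.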


The Median

The median Md is a value such that 50% of all measurements, after having been arranged in numerical order, lie above (or below) it

1. If the number of measurements is odd, the median is the middlemost measurement in the ordering

2. If the number of measurements is even, the median is the average of the two middlemost measurements in the ordering
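The two rules can be sketched directly in code. This is a minimal illustration, not a library implementation; the function name `median` is our own, and the two data sets are the odd and even examples used in these slides.

```python
def median(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the middlemost measurement
        return ordered[mid]
    # Even count: average of the two middlemost measurements
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_case = median([3, 5, 8, 10, 11])
even_case = median([3, 3, 4, 5, 7, 8])
print(odd_case, even_case)
```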


Data: 3, 5, 8, 10, 11
median = 8


Data: 3, 3, 4, 5, 7, 8
median = (4+5)/2 = 4.5


Data: 1, 2, 2, 3, 4, 4, 4, 4, 4, 5
median = 4??


Example for non-integer data

• Example 3.1: First five observations from Table 3.1:

30.8, 31.7, 30.1, 31.6, 32.1

• In order: 30.1, 30.8, 31.6, 31.7, 32.1

• There is an odd number of observations, so the median is the one in the middle, 31.6


Data: 2, 2, 2, 3, 3, 12
mean = 4
median = (2+3)/2 = 2.5


The Mode

The mode Mo of a population or sample of measurements is the measurement that occurs most frequently
– Modes are the values that are observed “most typically”
– Sometimes there are higher frequencies at two or more values
• If there are two modes, the data is bimodal
• If more than two modes, the data is multimodal
– When data are in classes, the class with the highest frequency is the modal class
• The tallest box in the histogram


Histogram Describing the 50 Mileages


Selecting a measure of Central Tendency

• Usually the mean is a good measure, because it uses every score in the distribution.

• There are some extreme cases in which the mean is not representative (or calculable). Then the mode and the median are used.


Mean=(10+11*4+12*3+13+100)/10=20.3

Median=(11+12)/2=11.5

Mode=11
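The three values above can be reproduced with Python's standard `statistics` module. The data list is an assumption: the figure is not part of this transcript, so the scores are inferred from the mean calculation (one 10, four 11s, three 12s, one 13, one 100).

```python
import statistics

# Scores inferred from the mean formula on the slide (assumption: the figure is missing).
scores = [10, 11, 11, 11, 11, 12, 12, 12, 13, 100]

mean_all = statistics.mean(scores)       # pulled upward by the extreme score of 100
median_all = statistics.median(scores)   # average of the 5th and 6th ordered scores
mode_all = statistics.mode(scores)       # most frequent score

# Dropping the extreme score shows how strongly it affects the mean.
mean_without_extreme = statistics.mean(scores[:-1])

print(mean_all, median_all, mode_all, round(mean_without_extreme, 2))
```

The single extreme score moves the mean well away from where most scores sit, while the median and mode barely notice it.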


Open-ended distributions

A distribution is said to be open-ended when there is no upper limit (or lower limit) for one of the categories

Mean – not computable
Median = (12+13)/2 = 12.5
Mode – not meaningful


Measures of Variation/variability

• Knowing the measures of central tendency is not enough
• Both of the distributions below have identical measures of central tendency


Measures of Variation

Range: Largest minus the smallest measurement

Variance: The average of the squared deviations of all the population measurements from the population mean

Standard Deviation: The square root of the variance

They provide quantitative measures of the degree to which data in a distribution are spread out or clustered together.


Range for discrete & continuous data

• The range is the distance between the largest score (Xmax) and the smallest score (Xmin) in the distribution for discrete data.

• For continuous data, you must also take into account the real limits of the maximum and minimum X values.

• range = URL for Xmax - LRL for Xmin, where URL is the upper real limit and LRL is the lower real limit


Population Variance and Standard Deviation

• The population variance (σ²) is the average of the squared deviations of the individual population measurements from the population mean (µ)

• The population standard deviation (σ) is the positive square root of the population variance


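Variance

• For a population of size N, the population variance σ² is:

$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i-\mu)^2}{N} = \frac{(x_1-\mu)^2 + (x_2-\mu)^2 + \cdots + (x_N-\mu)^2}{N}$

• For a sample of size n, the sample variance s² is:

$s^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1} = \frac{(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + \cdots + (x_n-\bar{x})^2}{n-1}$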


Sample variability tends to underestimate the population value


Standard Deviation

• Population standard deviation (σ):

$\sigma = \sqrt{\sigma^2}$

• Sample standard deviation (s):

$s = \sqrt{s^2}$


Example: Sample Variance and Standard Deviation

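• Data points are: 60, 41, 15, 30, 34
• Mean is 36
• Variance is:

$\frac{(60-36)^2 + (41-36)^2 + (15-36)^2 + (30-36)^2 + (34-36)^2}{5} = \frac{576 + 25 + 441 + 36 + 4}{5} = \frac{1082}{5} = 216.4$

Standard deviation is:

$\sqrt{216.4} = 14.71$

Note that this calculation divides by 5, which treats the five values as a whole population. With the sample formula (dividing by n - 1 = 4), the variance is 1082/4 = 270.5 and the standard deviation is about 16.45.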
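The arithmetic for these five data points can be checked with a short script. It is a sketch that computes the sum of squared deviations once and then divides by both N and n - 1, so the population and sample versions can be compared side by side; the variable names are our own.

```python
import math

data = [60, 41, 15, 30, 34]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean: 576 + 25 + 441 + 36 + 4
ss = sum((x - mean) ** 2 for x in data)

pop_var = ss / n                    # divide by N (population formula)
pop_sd = math.sqrt(pop_var)

sample_var = ss / (n - 1)           # divide by n - 1 (sample formula)
sample_sd = math.sqrt(sample_var)

print(mean, ss, pop_var, round(pop_sd, 2), sample_var, round(sample_sd, 2))
```

The standard library functions `statistics.pvariance` and `statistics.variance` implement the same two divisors.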


Percentiles & Quartiles

For a set of measurements arranged in increasing order, the pth percentile is a value such that p percent of the measurements fall at or below the value and (100-p) percent of the measurements fall at or above the value

• The first quartile Q1 is the 25th percentile
• The second quartile (or median) is the 50th percentile
• The third quartile Q3 is the 75th percentile
• The interquartile range IQR is Q3 - Q1


Cumulative percentages & PERCENTILES


X=2 means that the measurement was somewhere between the real limits of 1.5 and 2.5.

30% of the individuals have been accumulated by the time you reach the top of the interval for X=2.


What is the 95th percentile? (Answer: X = 4.5.)

What is the percentile rank for X = 3.5? (Answer: 70%.)

What is the 50th percentile?
What is the percentile rank for X = 4?
It is possible to find estimates of these values by a standard procedure known as interpolation.


Using the following distribution of scores, we will use interpolation to find the 50th percentile:


For the scores, the width of the interval is 5 points. For the percentages, the width is 50 points. The value of 50% is located 10 points from the top of the percentage interval. As a fraction of the whole interval, this is 10 out of 50, or 1/5 of the total interval.

Applying the same fraction to the scores, 1/5 of 5 points is 1 point, so the 50th percentile lies 1 point below the top of the score interval.

The 50th percentile is X = 8.5.
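The interpolation steps can be sketched as a small function. The interval limits used here are assumptions, since the frequency table is not in this transcript: a score interval with real limits 4.5 to 9.5 and cumulative percentages of 10% and 60% are consistent with the widths (5 and 50), the 10-point distance from the top, and the answer of 8.5 stated above. The function name `interpolate_score` is our own.

```python
def interpolate_score(pct, low_score, high_score, low_pct, high_pct):
    """Linear interpolation: find the score matching a cumulative percentage."""
    # Distance of the target from the top of the percentage interval, as a fraction
    fraction_from_top = (high_pct - pct) / (high_pct - low_pct)
    # Move the same fraction down from the top of the score interval
    return high_score - fraction_from_top * (high_score - low_score)

# Assumed real limits and cumulative percentages (not given in the transcript)
p50 = interpolate_score(50, low_score=4.5, high_score=9.5, low_pct=10, high_pct=60)
print(p50)
```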


USING INTERPOLATION TO FIND THE MEDIAN

Answer: X = 3.70 is the median

Notice that this is exactly the same answer we obtained using the graphic method of interpolation in Figure 3.7


Example: Quartiles

20 customer satisfaction ratings:

1 3 5 5 7 8 8 8 8 8 8 9 9 9 9 9 10 10 10 10

Md = (8+8)/2 = 8

Q1 = (7+8)/2 = 7.5 Q3 = (9+9)/2 = 9

IQR = Q3 - Q1 = 9 - 7.5 = 1.5

A slightly different way to find the quartiles (without using interpolation).
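This median-of-halves approach can be sketched as follows. It splits the ordered ratings into a lower and an upper half and takes the median of each. For an odd number of measurements this sketch leaves the middle value out of both halves, which is one common convention and an assumption here; other software methods may give slightly different quartiles.

```python
import statistics

ratings = [1, 3, 5, 5, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 10, 10, 10, 10]

ordered = sorted(ratings)
half = len(ordered) // 2
lower = ordered[:half]      # smallest half of the data
upper = ordered[-half:]     # largest half of the data

md = statistics.median(ordered)
q1 = statistics.median(lower)
q3 = statistics.median(upper)
iqr = q3 - q1

print(md, q1, q3, iqr)
```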


Five Number Summary in descriptive statistics

1. The smallest measurement

2. The first quartile, Q1

3. The median, Md

4. The third quartile, Q3

5. The largest measurement

• Displayed visually using a box-and-whiskers plot


Box-and-whisker plots


A box and whisker plot (sometimes called a boxplot) is a graph that presents information from a five-number summary. It does not show a distribution in as much detail as a stem and leaf plot or histogram does, but is especially useful for indicating whether a distribution is skewed and whether there are potential unusual observations (outliers) in the data set.


Outliers

• Outliers are measurements that are very different from other measurements
– They are either much larger or much smaller than most of the other measurements

• Outliers lie beyond the fences of the box-and-whiskers plot
– Measurements between the inner and outer fences are mild outliers
– Measurements beyond the outer fences are severe outliers


Box-and-Whiskers Plots

• The box plots the:
– first quartile, Q1
– median, Md
– third quartile, Q3
– inner fences
– outer fences

From: Business Statistics in Practice, 5th Edition, Bowerman O’Connell Murphree,


Box-and-Whiskers Plots Continued

• Inner fences
– Located 1.5 × IQR away from the quartiles:
• Q1 – (1.5 × IQR)
• Q3 + (1.5 × IQR)

• Outer fences
– Located 3 × IQR away from the quartiles:
• Q1 – (3 × IQR)
• Q3 + (3 × IQR)
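As an illustration, the fences can be computed for the 20 customer satisfaction ratings from the quartile example (Q1 = 7.5, Q3 = 9). The `classify` helper is hypothetical, and treating a value that falls exactly on a fence as lying inside it is an assumption of this sketch.

```python
ratings = [1, 3, 5, 5, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 10, 10, 10, 10]

q1, q3 = 7.5, 9.0          # quartiles from the customer satisfaction example
iqr = q3 - q1

inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # inner fences
outer = (q1 - 3 * iqr, q3 + 3 * iqr)       # outer fences

def classify(x):
    """Label a measurement by its position relative to the fences."""
    if x < outer[0] or x > outer[1]:
        return "severe outlier"
    if x < inner[0] or x > inner[1]:
        return "mild outlier"
    return "not an outlier"

labels = {x: classify(x) for x in sorted(set(ratings))}
print(inner, outer)
print(labels)
```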


Box-and-Whiskers Plots Continued

• The “whiskers” are dashed lines that plot the range of the data
– A dashed line drawn from the box below Q1 down to the smallest measurement between the inner fences
– Another dashed line drawn from the box above Q3 up to the largest measurement between the inner fences


Box-and-Whiskers Plots Continued


Symmetric distribution: A distribution having the same shape on either side of the center

Skewed distribution: One whose shapes on either side of the center differ; a nonsymmetrical distribution.

Can be positively or negatively skewed, or bimodal


The Relative Positions of the Mean, Median, and Mode: Symmetric Distribution

Symmetric distribution: mean = median = mode


Relationships Among Mean, Median and Mode


Relative Positions of the Mean, Median, and Mode

The left tail is longer; the mass of the distribution is concentrated on the right of the figure. The distribution is also said to be left-skewed.

Mean < Median < Mode

The right tail is longer; the mass of the distribution is concentrated on the left of the figure. The distribution is also said to be right-skewed.

Mode < Median < Mean


Skewness is the measurement of the lack of symmetry of the distribution.

The coefficient of skewness can range from -3.00 up to 3.00 when using the following formula:

$Sk = \frac{3(\bar{X} - \text{Median})}{s}$

A value of 0 indicates a symmetric distribution. A negatively skewed distribution has a negative coefficient of skewness. A positively skewed distribution has a positive coefficient of skewness.

Some software packages use a different formula which results in a wider range for the coefficient.
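A short sketch of this coefficient, Sk = 3(mean - median)/s, using the standard `statistics` module. It assumes s is the sample standard deviation (n - 1 divisor). The first data set is the earlier example 2, 2, 2, 3, 3, 12 (mean 4, median 2.5); the second is a hypothetical symmetric set added for contrast.

```python
import statistics

def pearson_skewness(values):
    """Coefficient of skewness: 3 * (mean - median) / sample standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    s = statistics.stdev(values)     # sample standard deviation (assumption)
    return 3 * (mean - median) / s

sk = pearson_skewness([2, 2, 2, 3, 3, 12])    # long right tail
sk_sym = pearson_skewness([1, 2, 3, 4, 5])    # hypothetical symmetric data

print(round(sk, 3), sk_sym)
```

The positive result for the first data set matches its long right tail, and the symmetric data gives zero.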


Normal distribution

The normal distribution (Gaussian distribution) is a symmetric distribution that often gives a good description of data that cluster around the mean. The graph of the distribution is bell-shaped, with a peak at the mean, and is known as the Gaussian function or bell curve.


Normal distribution in nature

The bean machine is a device invented by Sir Francis Galton to demonstrate how the normal distribution appears in nature. This machine consists of a vertical board with interleaved rows of pins. Small balls are dropped from the top and then bounce randomly left or right as they hit the pins. The balls are collected into bins at the bottom and settle down into a pattern resembling the Gaussian curve.


Normal distribution in nature

Distribution of the heights of 1052 women fits the normal distribution, with a goodness of fit p value of 0.75

Figure axis: Height (in.)


Histogram of daily percentage changes in the S&P 500 index


The Empirical Rule for a Normal/Gaussian distribution

• If a population has mean µ and standard deviation σ and is described by a normal distribution, then
– 68.26% of the population measurements lie within one standard deviation of the mean: [µ-σ, µ+σ]
– 95.44% of the population measurements lie within two standard deviations of the mean: [µ-2σ, µ+2σ]
– 99.73% of the population measurements lie within three standard deviations of the mean: [µ-3σ, µ+3σ]
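These percentages can be checked against the standard normal distribution with `statistics.NormalDist` from the Python standard library. The sketch computes the probability of falling within k standard deviations of the mean for k = 1, 2, 3. The results agree with the figures above to within rounding (the 68.26% and 95.44% figures appear to come from rounded table values).

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, standard deviation 1

# Probability of lying within k standard deviations of the mean
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"within {k} sd: {100 * p:.2f}%")
```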


Origin and meaning of “Six Sigma Process”

If one has six standard deviations between the process mean and the nearest specification limit, as shown in the graph, practically no items will fail to meet specifications.

If the upper and lower specification limits (USL, LSL) are at a distance of 6σ from the mean, values lying that far away from the mean are extremely unlikely. Even if the mean were to move right or left by 1.5σ at some point in the future (1.5 sigma shift), there is still a good safety cushion. This is why if the mean is at least 6σ away from the nearest specification limit, the process will be under good quality control.

Six Sigma – quality control of process outputs


Chebyshev’s Theorem

• Let µ and σ be a population’s mean and standard deviation, then for any value k > 1

• At least 100(1 - 1/k²)% of the population measurements lie in the interval [µ-kσ, µ+kσ]

• Or, at most 100/k²% of the population measurements lie outside the interval [µ-kσ, µ+kσ]


Chebyshev’s Theorem: what is it?

• Chebyshev’s Theorem gives the minimum percentage of data with values that will fall within a certain number of standard deviations of the mean.

• Chebyshev's Theorem applies to all distributions regardless of their shape and can therefore be used for non-normal distributions and distributions where the shape is unknown.


Chebyshev’s Theorem: comparison

Minimum percentage of the population lying within the interval


Chebyshev’s Theorem: Example

One study of heights in the U.S. concluded that women have a mean height of 65 inches with a standard deviation of 2.5 inches. If this is correct, what is the percentage of women that have heights between 60 and 70 inches according to Chebyshev’s Theorem? If we assume the distribution is a normal distribution, what is the percentage of women with heights between 60 and 70 inches?

Since the interval from 60 to 70 inches extends 2σ on either side of the mean, Chebyshev’s Theorem says that at least 100(1 - 1/2²)% = 75% of women have heights between 60 and 70 inches. If we also know that the distribution is normal, then 95.44% of the population measurements lie within two standard deviations of the mean.
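The heights example can be worked through in code. This sketch compares the Chebyshev lower bound with the proportion under a normal curve for the interval from 60 to 70 inches; the function name `chebyshev_min_fraction` is our own, and the normal calculation uses `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

def chebyshev_min_fraction(k):
    """Minimum fraction of any population within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem requires k > 1")
    return 1 - 1 / k ** 2

mu, sigma = 65, 2.5          # mean and standard deviation of heights (inches)
low, high = 60, 70

k = (high - mu) / sigma      # number of standard deviations to each end of the interval
cheb = chebyshev_min_fraction(k)

heights = NormalDist(mu, sigma)
normal = heights.cdf(high) - heights.cdf(low)

print(k, cheb, round(normal, 4))
```

Chebyshev's bound holds for any shape of distribution, which is why it is so much lower than the normal-distribution figure.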


z Scores

• For any x in a population or sample, the associated z score is defined as

$z = \frac{x - \text{mean}}{\text{standard deviation}}$


A z-score is the exact location of a score within a distribution

The z-score transforms each X value into a signed number so that

1. The sign tells whether the score is located above (+) or below (-) the mean, and
2. The number tells the distance between the score and the mean in terms of the number of standard deviations.


Two distributions of exam scores. For both distributions, µ = 70, but for one distribution, σ = 3, and for the other, σ = 12. The position of X = 76 is very different for these two distributions.

For σ = 3: z = (76-70)/3 = 2
For σ = 12: z = (76-70)/12 = 0.5
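The two z-scores for X = 76 can be reproduced with a one-line function. This is a minimal sketch; the function name `z_score` is our own.

```python
def z_score(x, mean, sd):
    """Signed distance from the mean, measured in standard deviations."""
    return (x - mean) / sd

z_narrow = z_score(76, mean=70, sd=3)    # tightly clustered distribution
z_wide = z_score(76, mean=70, sd=12)     # widely spread distribution

print(z_narrow, z_wide)
```

The same raw score is far above average in the narrow distribution but only slightly above average in the wide one.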


z Scores: application

What if you score a 65 on a test? On the surface it might look bad. But what if the mean is 45 and the standard deviation is 10? Then your z-score is z = (65 – 45)/10 = 2. That means you are 2σ above the mean. From Chebyshev’s Theorem, at most 25% of the class lies more than 2σ from the mean, so if the distribution is symmetric you are at worst in the top 12.5% of the class. If the distribution is normal, you are in the top 2-3% of the class.


z Scores: assignment

In order to get an “A”, you should be in the top 10% of the class.

What should be your z-Score to get an “A”, according to Chebyshev’s Theorem and assuming the distribution is symmetrical?


Example: z Score

• Population of profit margins for five American companies: 8%, 10%, 15%, 12%, 5%

• µ = 10%, σ = 3.406%


TRANSFORMING z-SCORES TO A DISTRIBUTION WITH A PREDETERMINED mean and standard deviation

An instructor gives an exam to a psychology class. For this exam, the distribution of raw scores has a mean of 57 with sd = 14. The instructor would like to simplify the distribution by transforming all scores into a new, standardized distribution with mean = 50 and sd = 10. To demonstrate this process, we will consider what happens to two specific students: Joe, who has a raw score of X = 64 in the original distribution; and Maria, whose original raw score is X = 43.

Step 1: Transform each of the original raw scores into z-scores. For Joe, X = 64, so his z-score is (64-57)/14 = 0.5. For Maria, X = 43, and her z-score is (43-57)/14 = -1.0.

Step 2: Change the z-scores to the new, standardized scores. Joe’s z-score, z = 0.5, indicates that he is above the mean by exactly 1/2 standard deviation, so his standardized score would be 50 + 0.5×10 = 55. Maria’s score is located 1 standard deviation below the mean, so her new score would be X = 50 - 10 = 40.


Example: A psychologist has developed a new intelligence test. For years, the test has been given to a large number of people; for this population, mean= 65 and sd= 10. The psychologist would like to make the scores of this test comparable to scores on other IQ tests, which have mean= 100 and sd= 15. If the test is standardized so that it is comparable (has the same mean and sd) to other tests, what would be the standardized scores for the following individuals?

For A: z = (75-65)/10 = 1, standardized score = 100 + 1×15 = 115
For B: z = (45-65)/10 = -2, standardized score = 100 - 2×15 = 70
For C: z = (67-65)/10 = 0.2, standardized score = 100 + 0.2×15 = 103
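The two-step transformation (raw score to z-score, then z-score to the new scale) can be sketched as one function. The function name `rescale` is our own. The raw scores come from the examples in these slides: 64 and 43 for the psychology exam, and 75, 45 and 67 for individuals A, B and C.

```python
def rescale(x, old_mean, old_sd, new_mean, new_sd):
    """Convert a raw score to a z-score, then to a scale with a chosen mean and sd."""
    z = (x - old_mean) / old_sd          # step 1: location in the original distribution
    return new_mean + z * new_sd         # step 2: same location in the new distribution

# Psychology exam: original mean 57, sd 14; new mean 50, sd 10
joe = rescale(64, 57, 14, 50, 10)
maria = rescale(43, 57, 14, 50, 10)

# Intelligence test: original mean 65, sd 10; new mean 100, sd 15
raw_scores = {"A": 75, "B": 45, "C": 67}
iq = {name: rescale(raw, 65, 10, 100, 15) for name, raw in raw_scores.items()}

print(joe, maria, iq)
```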


Coefficient of Variation

• Measures the size of the standard deviation relative to the size of the mean

• Coefficient of variation = (standard deviation / mean) × 100%

• Used to:
– Compare the relative variabilities of values about the mean
– Compare the relative variability of populations or samples with different means and different standard deviations
– Measure risk
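A brief sketch of the calculation. The first call uses the profit margin population from the z score example (µ = 10%, σ = 3.406%); the two funds are hypothetical figures invented here only to show a risk comparison.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return sd / mean * 100

cv_margins = coefficient_of_variation(3.406, 10)

# Hypothetical funds (not from the slides): mean return and standard deviation, in percent
cv_fund_a = coefficient_of_variation(4, 10)
cv_fund_b = coefficient_of_variation(5, 20)

print(round(cv_margins, 2), cv_fund_a, cv_fund_b)
```

Fund B has the larger standard deviation but the smaller coefficient of variation, so it carries less variability relative to its mean return.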


Covariance, Correlation, and the Least Squares Line

• When points on a scatter plot seem to fluctuate around a straight line, there is a linear relationship between x and y

• A measure of the strength of a linear relationship is the covariance sxy

$s_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1}$


Covariance

• A positive covariance indicates a positive linear relationship between x and y
– As x increases, y increases

• A negative covariance indicates a negative linear relationship between x and y
– As x increases, y decreases


Interpretation of the Sample Covariance


Interpretation of the Sample Covariance Continued

• Points in quadrant I correspond to xi and yi both greater than their averages
– (xi - x̄) and (yi - ȳ) are both positive, so their contribution to the covariance is positive

• Points in quadrant III correspond to xi and yi both less than their averages
– (xi - x̄) and (yi - ȳ) are both negative, so their contribution to the covariance is positive

• If sxy is positive, the points having the greatest influence are in quadrants I and III

• Therefore, a positive sxy indicates a positive linear relationship


Interpretation of the Sample Covariance Continued

• Points in quadrant II correspond to xi less than x̄ and yi greater than ȳ
– (xi - x̄) is negative and (yi - ȳ) is positive, so their contribution to the covariance is negative

• Points in quadrant IV correspond to xi greater than x̄ and yi less than ȳ
– (xi - x̄) is positive and (yi - ȳ) is negative, so their contribution to the covariance is negative

• If sxy is negative, the points having the greatest influence are in quadrants II and IV

• Therefore, a negative sxy indicates a negative linear relationship


Correlation Coefficient

• Magnitude of covariance does not indicate the strength of the relationship
– Magnitude depends on the unit of measurement used for the data

• Sample correlation coefficient (r) is a measure of the strength of the relationship that does not depend on the magnitude of the data

$r = \frac{s_{xy}}{s_x s_y}$


Correlation Coefficient Continued

• Sample correlation coefficient r is always between -1 and +1
– Values near -1 show strong negative correlation
– Values near 0 show no correlation
– Values near +1 show strong positive correlation

• Sample correlation coefficient is the point estimate for the population correlation coefficient ρ
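The sketch below computes the sample covariance and the correlation coefficient r = sxy / (sx sy) by hand. The five (x, y) pairs are hypothetical, since the data behind these slides is not in the transcript. Rescaling x by 100 (for example, meters to centimeters) shows why covariance depends on units while r does not.

```python
import math

def covariance(x, y):
    """Sample covariance with an n - 1 divisor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Sample correlation: covariance divided by the product of standard deviations."""
    s_x = math.sqrt(covariance(x, x))   # covariance of x with itself is its variance
    s_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (s_x * s_y)

# Hypothetical data (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

s_xy = covariance(x, y)
r = correlation(x, y)

# Same data with x expressed in different units
x_scaled = [100 * xi for xi in x]
s_xy_scaled = covariance(x_scaled, y)
r_scaled = correlation(x_scaled, y)

print(s_xy, round(r, 4), s_xy_scaled, round(r_scaled, 4))
```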


Least Squares Line

• If there is a linear relationship between x and y, we might wish to predict y on the basis of x

• This requires the equation of a line describing the linear relationship

• The line is calculated by the method of least squares
– Discussed in detail in Chapter 13


Least Squares Line Continued

• Need to calculate slope (b1) and y-intercept (b0)

$b_1 = \frac{s_{xy}}{s_x^2}$

$b_0 = \bar{y} - b_1 \bar{x}$
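As an illustration of the slope and intercept formulas, b1 = sxy / sx² and b0 = ȳ - b1·x̄, the sketch below fits a least squares line to five hypothetical (x, y) pairs. The sales volume data is not included in this transcript, so these numbers are made up for the example.

```python
# Hypothetical data (not the sales volume data from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance of x and y, and sample variance of x (both with n - 1 divisors)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
s_x2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

b1 = s_xy / s_x2             # slope
b0 = y_bar - b1 * x_bar      # y-intercept

def predict(x_new):
    """Predicted y on the least squares line."""
    return b0 + b1 * x_new

print(round(b1, 4), round(b0, 4), round(predict(6), 4))
```

The fitted line always passes through the point (x̄, ȳ), which is a quick sanity check on the calculation.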


Least Squares Line for the Sales Volume Data


Weighted Means

• Sometimes, some measurements are more important than others
– Assign numerical “weights” to the data
• Weights measure relative importance of the value

• Calculate weighted mean as

$\frac{\sum w_i x_i}{\sum w_i}$

where wi is the weight assigned to the ith measurement xi


Example: Weighted Mean

• June 2001 unemployment rates by census region
– Northeast, 26.9 million in civilian labor force, 4.1% unemployment rate
– South, 50.6 million, 4.7% unemployment
– Midwest, 34.7 million, 4.4% unemployment
– West, 32.5 million, 5.0% unemployment

• Want the mean unemployment rate for the US


Example: Weighted Mean Continued

• Want the mean unemployment rate for the U.S.

• Calculate it as a weighted mean
– So that the bigger the region, the more heavily it counts in the mean
• The data values are the regional unemployment rates
• The weights are the sizes of the regional labor forces


Example: Weighted Mean Continued

• The weighted mean unemployment rate is:

$\mu = \frac{26.9(4.1) + 50.6(4.7) + 34.7(4.4) + 32.5(5.0)}{26.9 + 50.6 + 34.7 + 32.5} = \frac{663.29}{144.7} = 4.58\%$

• Note that the unweighted mean is 4.55%, which underestimates the true rate by 0.03%
– That is, 0.0003 × 144.7 million = 43,410 workers
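The weighted mean for the four census regions can be verified with a few lines of code. The labor force sizes (in millions) are the weights and the regional unemployment rates (in percent) are the data values; the variable names are our own.

```python
# Census regions: Northeast, South, Midwest, West
labor_force = [26.9, 50.6, 34.7, 32.5]   # weights, in millions of workers
rate = [4.1, 4.7, 4.4, 5.0]              # unemployment rates, in percent

total_force = sum(labor_force)
weighted = sum(w * x for w, x in zip(labor_force, rate)) / total_force
unweighted = sum(rate) / len(rate)

print(round(total_force, 1), round(weighted, 2), round(unweighted, 2))
```

The unweighted mean treats every region as equally large, which is why it differs from the weighted figure.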


Descriptive Statistics for Grouped Data

• Data already categorized into a frequency distribution or a histogram is called grouped data

• Can calculate the mean and variance even when the raw data is not available

• Calculations are slightly different for data from a sample and data from a population


Descriptive Statistics for Grouped Data (Sample)

• Sample mean for grouped data:

$\bar{x} = \frac{\sum f_i M_i}{\sum f_i} = \frac{\sum f_i M_i}{n}$

• Sample variance for grouped data:

$s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n-1}$

fi is the frequency for class i

Mi is the midpoint of class i

n = Σfi = sample size


Descriptive Statistics for Grouped Data (Population)

• Population mean for grouped data:

$\mu = \frac{\sum f_i M_i}{\sum f_i} = \frac{\sum f_i M_i}{N}$

• Population variance for grouped data:

$\sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N}$

fi is the frequency for class i
Mi is the midpoint of class i
N = Σfi = population size
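A small sketch of the grouped-data calculations. The three classes (midpoints and frequencies) are hypothetical, invented only to keep the arithmetic easy to follow. The same sum of squares is divided by n - 1 for a sample and by N for a population.

```python
# Hypothetical frequency distribution (not from the slides)
midpoints = [10, 20, 30]     # M_i: midpoint of each class
freqs = [2, 5, 3]            # f_i: frequency of each class

n = sum(freqs)                                            # total number of measurements
mean = sum(f * m for f, m in zip(freqs, midpoints)) / n   # sum(f_i * M_i) / n

# Frequency-weighted sum of squared deviations of the midpoints from the mean
ss = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints))

sample_var = ss / (n - 1)    # grouped data treated as a sample
pop_var = ss / n             # grouped data treated as a population

print(n, mean, ss, round(sample_var, 2), pop_var)
```

Because every measurement in a class is represented by the class midpoint, these results approximate what the raw data would give.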


The Geometric Mean (Optional)

• For rates of return of an investment, use the geometric mean to give the correct wealth at the end of the investment

• Suppose the rates of return (expressed as decimal fractions) are R1, R2, …, Rn for periods 1, 2, …, n

• The mean of all these returns is then calculated as the geometric mean:

$R_g = \sqrt[n]{(1+R_1)(1+R_2)\cdots(1+R_n)} - 1$
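A sketch of the geometric mean return, using a hypothetical two-period investment that gains 100% and then loses 50%. The investment ends exactly where it started, which the geometric mean reports correctly and the arithmetic mean does not. `math.prod` requires Python 3.8 or later.

```python
import math

def geometric_mean_return(returns):
    """n-th root of the product of (1 + R_i), minus 1."""
    growth = math.prod(1 + r for r in returns)   # total growth factor over all periods
    return growth ** (1 / len(returns)) - 1

# Hypothetical returns as decimal fractions: +100% in period 1, -50% in period 2
returns = [1.00, -0.50]

rg = geometric_mean_return(returns)
arithmetic = sum(returns) / len(returns)

# Wealth after investing 1 unit and compounding at the geometric mean rate
final_wealth = (1 + rg) ** len(returns)

print(rg, arithmetic, final_wealth)
```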


Mean Deviation

The arithmetic mean of the absolute values of the deviations from the arithmetic mean.

$MD = \frac{\sum |X - \bar{X}|}{n}$

The main features:
• All values are used in the calculation.
• It is sensitive to unusual data.
• The absolute values are difficult to manipulate, so it is not used in inferential statistics.
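A minimal sketch of the mean deviation, applied to the five data points from the variance example (60, 41, 15, 30, 34, with mean 36). The function name `mean_deviation` is our own.

```python
def mean_deviation(values):
    """Average absolute distance of the values from their arithmetic mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)

# Absolute deviations from 36 are 24, 5, 21, 6 and 2
md = mean_deviation([60, 41, 15, 30, 34])
print(md)
```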


Chapter Three: Summary

ONE: Calculate the arithmetic mean, median, mode, weighted mean, and the geometric mean.

TWO: Explain the characteristics of each measure of location.

THREE: Identify the position of the arithmetic mean, median, and mode for both a symmetrical and a skewed distribution.


FOUR: Compute and interpret the range, the mean deviation, the variance, and the standard deviation of ungrouped data.

FIVE: Explain the characteristics, uses, advantages, and disadvantages of each measure of dispersion.

SIX: Understand Chebyshev’s theorem and the Empirical Rule as they relate to a set of observations.