1 ES Chapter 2 ~ Descriptive Analysis & Presentation of Single-Variable Data Mean: 60.07 inches...


Page 1: Chapter 2 ~ Descriptive Analysis & Presentation of Single-Variable Data

Mean: 60.07 inches
Median: 62.50 inches
Range: 42 inches
Variance: 117.681
Standard deviation: 10.85 inches
Minimum: 36 inches
Maximum: 78 inches
First quartile: 51.63 inches
Third quartile: 67.38 inches
Count: 58 bears
Sum: 3438.1 inches

[Figure: histogram titled "Black Bears"; horizontal axis: Length in Inches; vertical axis: Frequency]

Page 2: Chapter Goals

• Learn how to present and describe sets of data

• Learn measures of central tendency, measures of dispersion (spread), measures of position, and types of distributions

• Learn how to interpret findings so that we know what the data is telling us about the sampled population

Page 3: Measures of Central Tendency

• Numerical values used to locate the middle of a set of data, or where the data is clustered

• The term average is often associated with all measures of central tendency

Page 4: Mean

Mean: The type of average with which you are probably most familiar. The mean is the sum of all the values divided by the total number of values, n:

$\bar{x} = \frac{1}{n}\sum x_i = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$

The population mean, $\mu$ (lowercase mu, Greek alphabet), is the mean of all x values for the entire population

Notes:

• We usually cannot measure $\mu$ but would like to estimate its value

• A physical representation: the mean is the value that balances the weights on the number line

Page 5: Example

Example: The following data represents the number of accidents in each of the last 8 years at a dangerous intersection. Find the mean number of accidents: 8, 9, 3, 5, 2, 6, 4, 5

Solution: $\bar{x} = \frac{1}{8}(8 + 9 + 3 + 5 + 2 + 6 + 4 + 5) = 5.25$

In the data above, change 6 to 26:

Solution: $\bar{x} = \frac{1}{8}(8 + 9 + 3 + 5 + 2 + 26 + 4 + 5) = 7.75$

Note: The mean can be greatly influenced by outliers
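The two calculations above can be checked with a few lines of Python. This is only a sketch; the function name `mean` and the variable names are illustrative and do not come from the slides.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by n."""
    return sum(values) / len(values)

accidents = [8, 9, 3, 5, 2, 6, 4, 5]
print(mean(accidents))  # 5.25

# Replace the value 6 with the outlier 26 and recompute.
with_outlier = [26 if x == 6 else x for x in accidents]
print(mean(with_outlier))  # 7.75
```

Changing a single observation moves the mean from 5.25 to 7.75, which is the outlier sensitivity the note describes.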

Page 6: Median

Median: The value of the data that occupies the middle position when the data are ranked in order according to size

Notes:

• Denoted by “x tilde”: $\tilde{x}$

• The population median, $M$ (uppercase mu, Greek alphabet), is the data value in the middle position of the entire population

To find the median:

1. Rank the data

2. Determine the depth of the median: $d(\tilde{x}) = \frac{n + 1}{2}$

3. Determine the value of the median

Page 7: Example

Example: Find the median for the set of data: {4, 8, 3, 8, 2, 9, 2, 11, 3}

Solution:

1. Rank the data: 2, 2, 3, 3, 4, 8, 8, 9, 11

2. Find the depth: $d(\tilde{x}) = (9 + 1)/2 = 5$

3. The median is the fifth number from either end in the ranked data: $\tilde{x} = 4$

Suppose the data set is {4, 8, 3, 8, 2, 9, 2, 11, 3, 15}:

1. Rank the data: 2, 2, 3, 3, 4, 8, 8, 9, 11, 15

2. Find the depth: $d(\tilde{x}) = (10 + 1)/2 = 5.5$

3. The median is halfway between the fifth and sixth observations: $\tilde{x} = (4 + 8)/2 = 6$
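The rank, depth, value procedure can be sketched in Python as follows. The function name `median` is illustrative; positions in the slides are 1-based, so the code subtracts 1 when indexing.

```python
def median(values):
    """Median via the depth rule d = (n + 1) / 2 applied to the ranked data."""
    ranked = sorted(values)
    n = len(ranked)
    if n % 2 == 1:
        # Odd n: the depth is a whole number, so take that (1-based) position.
        return ranked[(n + 1) // 2 - 1]
    # Even n: the depth ends in .5, so average the two neighbouring positions.
    lower = ranked[n // 2 - 1]
    upper = ranked[n // 2]
    return (lower + upper) / 2

print(median([4, 8, 3, 8, 2, 9, 2, 11, 3]))      # 4
print(median([4, 8, 3, 8, 2, 9, 2, 11, 3, 15]))  # 6.0
```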

Page 8: Mode & Midrange

Mode: The mode is the value of x that occurs most frequently

Note: If two or more values in a sample are tied for the highest frequency (number of occurrences), there is no mode

Midrange: The number exactly midway between the lowest-valued data L and the highest-valued data H. It is found by averaging the low and the high values:

$\text{midrange} = \frac{L + H}{2}$

Page 9: Example

Example: Consider the data set {12.7, 27.1, 35.6, 44.2, 18.0}

$\text{midrange} = \frac{L + H}{2} = \frac{12.7 + 44.2}{2} = 28.45$

Notes:

• When rounding off an answer, a common rule of thumb is to keep one more decimal place in the answer than was present in the original data

• To avoid round-off buildup, round off only the final answer, not intermediate steps
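A small Python sketch of the mode and midrange definitions follows. The function names are illustrative, and `mode` returns `None` for a tie because the slides state that a tie for the highest frequency means there is no mode; other texts report every tied value instead.

```python
from collections import Counter

def mode(values):
    """Return the most frequent value, or None when the top frequency is tied."""
    counts = Counter(values).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie for the highest frequency: no mode
    return counts[0][0]

def midrange(values):
    """Average of the lowest value L and the highest value H."""
    return (min(values) + max(values)) / 2

data = [12.7, 27.1, 35.6, 44.2, 18.0]
print(round(midrange(data), 2))  # 28.45
print(mode(data))                # None: every value occurs once
print(mode([5, 7, 7, 1]))        # 7
```

The rounding is applied only when printing, in line with the note about rounding the final answer rather than intermediate steps.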

Page 10: Measures of Dispersion

• Measures of central tendency alone cannot completely characterize a set of data. Two very different data sets may have similar measures of central tendency.

• Measures of dispersion are used to describe the spread, or variability, of a distribution

• Common measures of dispersion: range, variance, and standard deviation

Page 11: Range

Range: The difference in value between the highest-valued (H) and the lowest-valued (L) pieces of data:

$\text{range} = H - L$

• Other measures of dispersion are based on the following quantity

Deviation from the Mean: A deviation from the mean, $x - \bar{x}$, is the difference between the value of x and the mean $\bar{x}$

Page 12: Example

Example: Consider the sample {12, 23, 17, 15, 18}. Find 1) the range and 2) each deviation from the mean.

Solutions:

1) $\text{range} = H - L = 23 - 12 = 11$

2) $\bar{x} = \frac{1}{5}(12 + 23 + 17 + 15 + 18) = 17$

Data $x$    Deviation from Mean $x - \bar{x}$
12          -5
23           6
17           0
15          -2
18           1
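The same range and deviations can be reproduced with a short Python sketch (variable names are illustrative):

```python
sample = [12, 23, 17, 15, 18]

data_range = max(sample) - min(sample)    # H - L
x_bar = sum(sample) / len(sample)         # sample mean
deviations = [x - x_bar for x in sample]  # x - x_bar for each value

print(data_range)       # 11
print(x_bar)            # 17.0
print(deviations)       # [-5.0, 6.0, 0.0, -2.0, 1.0]
print(sum(deviations))  # 0.0
```

The deviations sum to zero, which is why they cannot be averaged directly to measure spread and are squared first in the variance.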

Page 13: Sample Variance & Standard Deviation

Sample Variance: The sample variance, $s^2$, is the mean of the squared deviations, calculated using $n - 1$ as the divisor:

$s^2 = \frac{1}{n - 1}\sum (x - \bar{x})^2$, where n is the sample size

Note: The numerator for the sample variance is called the sum of squares for x, denoted SS(x):

$s^2 = \frac{\text{SS}(x)}{n - 1}$, where $\text{SS}(x) = \sum (x - \bar{x})^2 = \sum x^2 - \frac{1}{n}\left(\sum x\right)^2$

Standard Deviation: The standard deviation of a sample, s, is the positive square root of the variance:

$s = \sqrt{s^2}$

Page 14: Example

Example: Find the 1) variance and 2) standard deviation for the data {5, 7, 1, 3, 8}:

Solutions:

First: $\bar{x} = \frac{1}{5}(5 + 7 + 1 + 3 + 8) = 4.8$

$x$      $x - \bar{x}$    $(x - \bar{x})^2$
5         0.2              0.04
7         2.2              4.84
1        -3.8             14.44
3        -1.8              3.24
8         3.2             10.24
Sum: 24   0               32.80

1) $s^2 = \frac{1}{4}(32.8) = 8.2$

2) $s = \sqrt{8.2} = 2.86$
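The worked example can be checked with a direct translation of the definition into Python. The function name `sample_variance` is illustrative; Python's standard `statistics.variance` and `statistics.stdev` use the same $n - 1$ divisor.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    ss = sum((x - x_bar) ** 2 for x in values)  # SS(x)
    return ss / (n - 1)

data = [5, 7, 1, 3, 8]
s2 = sample_variance(data)
s = math.sqrt(s2)
print(round(s2, 2))  # 8.2
print(round(s, 2))   # 2.86
```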

Page 15: Notes

• The shortcut formula for the sample variance:

$s^2 = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}{n - 1}$

• The unit of measure for the standard deviation is the same as the unit of measure for the data
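As a sketch of the shortcut formula, the following Python computes the variance from the sums of x and x squared only, reusing the data {5, 7, 1, 3, 8} from the earlier example. The function name is illustrative.

```python
def variance_shortcut(values):
    """Sample variance from the sums of x and x**2, without computing deviations."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

data = [5, 7, 1, 3, 8]
print(sum(data))                          # 24
print(sum(x * x for x in data))           # 148
print(round(variance_shortcut(data), 2))  # 8.2
```

Here $(148 - 24^2/5)/4 = 32.8/4 = 8.2$, the same value the definition gives.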

Page 16: Measures of Position

• Measures of position are used to describe the relative location of an observation

• Quartiles and percentiles are two of the most popular measures of position

• An additional measure of central tendency, the midquartile, is defined using quartiles

• Quartiles are part of the 5-number summary

Page 17: Quartiles

Quartiles: Values of the variable that divide the ranked data into quarters; each set of data has three quartiles

1. The first quartile, Q1, is a number such that at most 25% of the data are smaller in value than Q1 and at most 75% are larger

2. The second quartile, Q2, is the median

3. The third quartile, Q3, is a number such that at most 75% of the data are smaller in value than Q3 and at most 25% are larger

[Diagram: ranked data, increasing order: L | 25% | Q1 | 25% | Q2 | 25% | Q3 | 25% | H]

Page 18: Percentiles

Percentiles: Values of the variable that divide a set of ranked data into 100 equal subsets; each set of data has 99 percentiles. The kth percentile, Pk, is a value such that at most k% of the data is smaller in value than Pk and at most (100 - k)% of the data is larger.

[Diagram: L | at most k% | Pk | at most (100 - k)% | H]

Notes:

• The 1st quartile and the 25th percentile are the same: Q1 = P25

• The median, the 2nd quartile, and the 50th percentile are all the same: $\tilde{x} = Q_2 = P_{50}$

Page 19: Finding Pk (and Quartiles)

• Procedure for finding Pk (and quartiles):

1. Rank the n observations, lowest to highest

2. Compute A = (nk)/100

3. If A is an integer:

– d(Pk) = A.5 (depth)

– Pk is halfway between the value of the data in the Ath position and the value of the next data

If A is a fraction:

– d(Pk) = B, the next larger integer

– Pk is the value of the data in the Bth position

Page 20: Example

Example: The following data represents the pH levels of a random sample of swimming pools in a California town. Find: 1) the first quartile, 2) the third quartile, and 3) the 37th percentile:

5.6 5.6 5.8 5.9 6.0
6.0 6.1 6.2 6.3 6.4
6.7 6.8 6.8 6.8 6.9
7.0 7.3 7.4 7.4 7.5

Solutions:

1) k = 25: (20)(25)/100 = 5, depth = 5.5, Q1 = 6

2) k = 75: (20)(75)/100 = 15, depth = 15.5, Q3 = 6.95

3) k = 37: (20)(37)/100 = 7.4, depth = 8, P37 = 6.2
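The depth procedure for Pk can be sketched in Python as below. The function name `percentile` is illustrative, and the sketch assumes k is a whole number from 1 to 99. Library routines such as NumPy's percentile functions interpolate differently by default, so their results may not match this rule.

```python
def percentile(values, k):
    """kth percentile by the depth procedure, with A = n*k/100 on the ranked data."""
    ranked = sorted(values)
    n = len(ranked)
    whole, remainder = divmod(n * k, 100)
    if remainder == 0:
        # A is an integer: depth is A.5, halfway between positions A and A + 1.
        return (ranked[whole - 1] + ranked[whole]) / 2
    # A is a fraction: depth is B, the next larger integer (1-based position).
    return ranked[whole]

ph = [5.6, 5.6, 5.8, 5.9, 6.0, 6.0, 6.1, 6.2, 6.3, 6.4,
      6.7, 6.8, 6.8, 6.8, 6.9, 7.0, 7.3, 7.4, 7.4, 7.5]

print(percentile(ph, 25))            # 6.0
print(round(percentile(ph, 75), 2))  # 6.95
print(percentile(ph, 37))            # 6.2
```

Using `divmod` on the integer product n*k avoids testing a floating-point value of A for being a whole number.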

Page 21: 5-Number Summary

5-Number Summary: The 5-number summary is composed of:

1. L, the smallest value in the data set

2. Q1, the first quartile (also P25)

3. $\tilde{x}$, the median (also P50 and 2nd quartile)

4. Q3, the third quartile (also P75)

5. H, the largest value in the data set

Notes:

• The 5-number summary indicates how much the data is spread out in each quarter

• The interquartile range is the difference between the first and third quartiles. It is the range of the middle 50% of the data

Page 22: Box-and-Whisker Display

Box-and-Whisker Display: A graphic representation of the 5-number summary:

• The five numerical values (smallest, first quartile, median, third quartile, and largest) are located on a scale, either vertical or horizontal

• The box is used to depict the middle half of the data that lies between the two quartiles

• The whiskers are line segments used to depict the other half of the data

• One line segment represents the quarter of the data that is smaller in value than the first quartile

• The second line segment represents the quarter of the data that is larger in value than the third quartile

Page 23: Example

Example: A random sample of students in a sixth grade class was selected. Their weights are given in the table below. Find the 5-number summary for this data and construct a boxplot:

63 64 76 76 81 83 85 86 88 89 90 91 92 93 93 93 94 97 99 99 99 101 108 109 112

Solution:

L = 63, Q1 = 85, $\tilde{x}$ = 92, Q3 = 99, H = 112
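The 5-number summary for the weight data can be reproduced with the percentile procedure from the earlier slide. This is a sketch with illustrative function names; it assumes whole-number k between 1 and 99.

```python
def percentile(values, k):
    """kth percentile (whole-number k, 1..99) by the depth procedure A = n*k/100."""
    ranked = sorted(values)
    whole, remainder = divmod(len(ranked) * k, 100)
    if remainder == 0:
        return (ranked[whole - 1] + ranked[whole]) / 2
    return ranked[whole]

def five_number_summary(values):
    """Return (L, Q1, median, Q3, H)."""
    return (min(values), percentile(values, 25), percentile(values, 50),
            percentile(values, 75), max(values))

weights = [63, 64, 76, 76, 81, 83, 85, 86, 88, 89, 90, 91, 92,
           93, 93, 93, 94, 97, 99, 99, 99, 101, 108, 109, 112]

low, q1, med, q3, high = five_number_summary(weights)
print(low, q1, med, q3, high)  # 63 85 92 99 112
print(q3 - q1)                 # interquartile range: 14
```

With n = 25, the depths are 7, 13, and 19, so each quartile is an actual data value and no averaging is needed.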

Page 24: Boxplot for Weight Data

[Figure: boxplot titled "Weights from Sixth Grade Class"; horizontal axis: Weight, scaled from 60 to 110; the values L, Q1, $\tilde{x}$, Q3, and H are marked]

Page 25: z-Score

z-Score: The position a particular value of x has relative to the mean, measured in standard deviations. The z-score is found by the formula:

$z = \frac{\text{value} - \text{mean}}{\text{st. dev.}} = \frac{x - \bar{x}}{s}$

Notes:

• Typically, the calculated value of z is rounded to the nearest hundredth

• The z-score measures the number of standard deviations above/below, or away from, the mean

• z-scores typically range from -3.00 to +3.00

• z-scores may be used to make comparisons of raw scores

Page 26: Example

Example: A certain data set has mean 35.6 and standard deviation 7.1. Find the z-scores for 46 and 33:

Solutions:

$z = \frac{x - \bar{x}}{s} = \frac{46 - 35.6}{7.1} = 1.46$

46 is 1.46 standard deviations above the mean

$z = \frac{x - \bar{x}}{s} = \frac{33 - 35.6}{7.1} = -0.37$

33 is 0.37 standard deviations below the mean
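A minimal Python sketch of the z-score formula, applied to the two values in the example (the function and parameter names are illustrative):

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

print(round(z_score(46, 35.6, 7.1), 2))  # 1.46
print(round(z_score(33, 35.6, 7.1), 2))  # -0.37
```

The sign carries the direction, so a negative z-score already means "below the mean".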

Page 27: Interpreting & Understanding Standard Deviation

• Standard deviation is a measure of variability, or spread

• Two rules for describing data rely on the standard deviation:

– Empirical rule: applies to a variable that is normally distributed

– Chebyshev’s theorem: applies to any distribution

Page 28: Empirical Rule

Empirical Rule: If a variable is normally distributed, then:

1. Approximately 68% of the observations lie within 1 standard deviation of the mean

2. Approximately 95% of the observations lie within 2 standard deviations of the mean

3. Approximately 99.7% of the observations lie within 3 standard deviations of the mean

Notes:

• The empirical rule is more informative than Chebyshev’s theorem since we know more about the distribution (normally distributed)

• Also applies to populations

• Can be used to determine if a distribution is normally distributed

Page 29: Illustration of the Empirical Rule

[Figure: bell-shaped curve centered at $\bar{x}$ with marks at $\bar{x} \pm s$, $\bar{x} \pm 2s$, and $\bar{x} \pm 3s$; the three nested intervals contain 68%, 95%, and 99.7% of the data]

Page 30: Example

Example: A random sample of plum tomatoes was selected from a local grocery store and their weights recorded. The mean weight was 6.5 ounces with a standard deviation of 0.4 ounces. If the weights are normally distributed:

1) What percentage of weights fall between 5.7 and 7.3?

2) What percentage of weights fall above 7.7?

Solutions:

1) $(\bar{x} - 2s, \bar{x} + 2s) = (6.5 - 2(0.4), 6.5 + 2(0.4)) = (5.7, 7.3)$

Approximately 95% of the weights fall between 5.7 and 7.3

2) $(\bar{x} - 3s, \bar{x} + 3s) = (6.5 - 3(0.4), 6.5 + 3(0.4)) = (5.3, 7.7)$

Approximately 99.7% of the weights fall between 5.3 and 7.7

Approximately 0.3% of the weights fall outside (5.3, 7.7)

Approximately (0.3/2) = 0.15% of the weights fall above 7.7
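The intervals in this example can be generated with a short Python sketch. The helper `interval` and the `EMPIRICAL` lookup table are illustrative names; the percentages are the approximate values stated by the empirical rule.

```python
def interval(mean, std_dev, k):
    """Return (mean - k*s, mean + k*s)."""
    return (mean - k * std_dev, mean + k * std_dev)

# Approximate percentage of a normal distribution within k standard deviations.
EMPIRICAL = {1: 68.0, 2: 95.0, 3: 99.7}

mean, s = 6.5, 0.4
for k, pct in EMPIRICAL.items():
    low, high = interval(mean, s, k)
    print(f"k={k}: about {pct}% between {low:.1f} and {high:.1f}")

# A normal curve is symmetric, so half of the remainder lies in each tail.
above_3s = (100 - EMPIRICAL[3]) / 2
print(f"about {above_3s:.2f}% above {interval(mean, s, 3)[1]:.1f}")
```

The last step relies on the symmetry of the normal distribution: the 0.3% outside three standard deviations is split evenly between the two tails.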

Page 31: A Note about the Empirical Rule

Note: The empirical rule may be used to determine whether or not a set of data is approximately normally distributed

1. Find the mean and standard deviation for the data

2. Compute the actual proportion of data within 1, 2, and 3 standard deviations from the mean

3. Compare these actual proportions with those given by the empirical rule

4. If the proportions found are reasonably close to those of the empirical rule, then the data is approximately normally distributed
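Steps 1 to 3 of this check can be sketched in Python. The function name `proportions_within` is illustrative, and the sixth-grade weight data from the boxplot example is reused here purely as sample input. Judging whether the proportions are "reasonably close" (step 4) is left to the reader.

```python
import math

def proportions_within(values, ks=(1, 2, 3)):
    """Proportion of the data within k sample standard deviations of the mean."""
    n = len(values)
    x_bar = sum(values) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))
    return {k: sum(abs(x - x_bar) <= k * s for x in values) / n for k in ks}

weights = [63, 64, 76, 76, 81, 83, 85, 86, 88, 89, 90, 91, 92,
           93, 93, 93, 94, 97, 99, 99, 99, 101, 108, 109, 112]

expected = {1: 0.68, 2: 0.95, 3: 0.997}  # empirical rule proportions
for k, actual in proportions_within(weights).items():
    print(f"within {k} s: actual {actual:.2f}, empirical rule {expected[k]}")
```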

Page 32: Chebyshev’s Theorem

Chebyshev’s Theorem: The proportion of any distribution that lies within k standard deviations of the mean is at least $1 - \frac{1}{k^2}$, where k is any positive number larger than 1. This theorem applies to all distributions of data.

[Figure: Illustration: at least $1 - \frac{1}{k^2}$ of the distribution lies between $\bar{x} - ks$ and $\bar{x} + ks$]

Page 33: Important Reminders!

• Chebyshev’s theorem is very conservative and holds for any distribution of data

• Chebyshev’s theorem also applies to any population

• The two most common values used to describe a distribution of data are k = 2, 3

• The table below lists some values for k and $1 - (1/k^2)$:

k                1.7    2      2.5    3
$1 - (1/k^2)$    0.65   0.75   0.84   0.89

Page 34: Example

Example: At the close of trading, a random sample of 35 technology stocks was selected. The mean selling price was 67.75 and the standard deviation was 12.3. Use Chebyshev’s theorem (with k = 2, 3) to describe the distribution.

Solutions:

Using k = 2: At least 75% of the observations lie within 2 standard deviations of the mean:

$(\bar{x} - 2s, \bar{x} + 2s) = (67.75 - 2(12.3), 67.75 + 2(12.3)) = (43.15, 92.35)$

Using k = 3: At least 89% of the observations lie within 3 standard deviations of the mean:

$(\bar{x} - 3s, \bar{x} + 3s) = (67.75 - 3(12.3), 67.75 + 3(12.3)) = (30.85, 104.65)$
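Both the table of Chebyshev bounds and the two intervals in this example can be reproduced with a short Python sketch. The function name `chebyshev_bound` is illustrative.

```python
def chebyshev_bound(k):
    """Minimum proportion of any distribution within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

# Values of 1 - (1/k^2) for the k values in the reminder table.
for k in (1.7, 2, 2.5, 3):
    print(f"k={k}: at least {chebyshev_bound(k):.2f}")

# Intervals for the technology stock sample.
mean, s = 67.75, 12.3
for k in (2, 3):
    low, high = mean - k * s, mean + k * s
    print(f"k={k}: at least {chebyshev_bound(k):.0%} within ({low:.2f}, {high:.2f})")
```

Because the bound holds for every distribution, it is much weaker than the empirical rule: at k = 2 it guarantees only 75%, where a normal distribution has about 95%.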