Chapter 6 1. Chebychev’s Theorem The portion of any data set lying within k standard deviations (k...


Chapter 6


Chebychev’s Theorem

• The portion of any data set lying within k standard deviations (k > 1) of the mean is at least:

$$1 - \frac{1}{k^2}$$

• k = 2: In any data set, at least $1 - \frac{1}{2^2} = \frac{3}{4}$, or 75%, of the data lie within 2 standard deviations of the mean.

• k = 3: In any data set, at least $1 - \frac{1}{3^2} = \frac{8}{9}$, or 88.9%, of the data lie within 3 standard deviations of the mean.
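The bound for any k > 1 can be checked numerically. The following is a minimal sketch; the function name `chebychev_lower_bound` is illustrative and does not come from the slides.

```python
def chebychev_lower_bound(k: float) -> float:
    """Minimum fraction of any data set within k standard deviations of the mean."""
    if k <= 1:
        # The theorem is stated only for k > 1.
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


for k in (2, 3):
    print(f"k = {k}: at least {chebychev_lower_bound(k):.1%} of the data")
```

For k = 2 and k = 3 this reproduces the 75% and 88.9% figures above.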


Example: Using Chebychev’s Theorem

The age distribution for Florida is shown in the histogram. Apply Chebychev’s Theorem to the data using k = 2. What can you conclude?


Solution: Using Chebychev’s Theorem

k = 2: μ – 2σ = 39.2 – 2(24.8) = -10.4 (use 0 since age can’t be negative)

μ + 2σ = 39.2 + 2(24.8) = 88.8


At least 75% of the population of Florida is between 0 and 88.8 years old.
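The interval arithmetic in this solution can be sketched in a few lines. The helper `chebychev_interval` and its `lower_limit` parameter are assumptions made for illustration; the mean 39.2 and standard deviation 24.8 are the values used in the solution.

```python
def chebychev_interval(mean, std, k, lower_limit=None):
    """Return (low, high) = (mean - k*std, mean + k*std), optionally clamped below."""
    low, high = mean - k * std, mean + k * std
    if lower_limit is not None:
        # Ages cannot be negative, so the lower end is clamped (here at 0).
        low = max(low, lower_limit)
    return low, high


low, high = chebychev_interval(39.2, 24.8, k=2, lower_limit=0)
print(f"At least 75% of ages lie between {low} and {high:.1f}")
```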


Standard Deviation for Grouped Data

Sample standard deviation for a frequency distribution

• When a frequency distribution has classes, estimate the sample mean and standard deviation by using the midpoint of each class.

$$s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}}$$

where n = Σf (the number of entries in the data set)


Example: Finding the Standard Deviation for Grouped Data


You collect a random sample of the number of children per household in a region. Find the sample mean and the sample standard deviation of the data set.

Number of Children in 50 Households

1 3 1 1 1

1 2 2 1 0

1 1 0 0 0

1 5 0 3 6

3 0 3 1 1

1 1 6 0 1

3 6 6 1 2

2 3 0 1 1

4 1 1 2 2

0 3 0 2 4


Solution: Finding the Standard Deviation for Grouped Data

• First construct a frequency distribution.

x    f     xf
0    10    0(10) = 0
1    19    1(19) = 19
2    7     2(7) = 14
3    7     3(7) = 21
4    2     4(2) = 8
5    1     5(1) = 5
6    4     6(4) = 24

Σf = 50    Σ(xf) = 91

• Find the mean of the frequency distribution.

$$\bar{x} = \frac{\sum xf}{n} = \frac{91}{50} \approx 1.8$$

The sample mean is about 1.8 children.


Solution: Finding the Standard Deviation for Grouped Data

• Determine the sum of squares.

x    f     $x - \bar{x}$       $(x - \bar{x})^2$     $(x - \bar{x})^2 f$
0    10    0 – 1.8 = –1.8     (–1.8)² = 3.24       3.24(10) = 32.40
1    19    1 – 1.8 = –0.8     (–0.8)² = 0.64       0.64(19) = 12.16
2    7     2 – 1.8 = 0.2      (0.2)² = 0.04        0.04(7) = 0.28
3    7     3 – 1.8 = 1.2      (1.2)² = 1.44        1.44(7) = 10.08
4    2     4 – 1.8 = 2.2      (2.2)² = 4.84        4.84(2) = 9.68
5    1     5 – 1.8 = 3.2      (3.2)² = 10.24       10.24(1) = 10.24
6    4     6 – 1.8 = 4.2      (4.2)² = 17.64       17.64(4) = 70.56

$$\sum (x - \bar{x})^2 f = 145.40$$


Solution: Finding the Standard Deviation for Grouped Data

• Find the sample standard deviation.

$$s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}} = \sqrt{\frac{145.40}{50 - 1}} \approx 1.7$$

The standard deviation is about 1.7 children.
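The whole grouped-data calculation can be reproduced with a short script. This is a sketch, not part of the original slides; the dictionary `freq` holds the frequency distribution from the example. Note one assumption that differs from the slides: the script uses the unrounded mean 1.82 rather than 1.8, so its sum of squares comes out near 145.38 instead of 145.40. The rounded standard deviation is the same.

```python
from math import sqrt

# Frequency distribution from the example: number of children -> households
freq = {0: 10, 1: 19, 2: 7, 3: 7, 4: 2, 5: 1, 6: 4}

n = sum(freq.values())                              # n = Σf
mean = sum(x * f for x, f in freq.items()) / n      # x̄ = Σ(xf) / n
sum_sq = sum((x - mean) ** 2 * f for x, f in freq.items())
s = sqrt(sum_sq / (n - 1))                          # sample standard deviation

print(f"n = {n}, mean = {mean:.2f}, s = {s:.2f}")
```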


Quartiles

• Fractiles are numbers that partition (divide) an ordered data set into equal parts.

• Quartiles approximately divide an ordered data set into four equal parts.

First quartile, Q1: About one quarter of the data fall on or below Q1.

Second quartile, Q2: About one half of the data fall on or below Q2 (median).

Third quartile, Q3: About three quarters of the data fall on or below Q3.


Example: Finding Quartiles

The test scores of 15 employees enrolled in a CPR training course are listed. Find the first, second, and third quartiles of the test scores.

13 9 18 15 14 21 7 10 11 20 5 18 37 16 17


Solution:

• Q2 divides the data set into two halves.

5 7 9 10 11 13 14 | 15 | 16 17 18 18 20 21 37

Lower half: 5 7 9 10 11 13 14

Q2 = 15

Upper half: 16 17 18 18 20 21 37


Solution: Finding Quartiles

• The first and third quartiles are the medians of the lower and upper halves of the data set.

Lower half: 5 7 9 10 11 13 14, so Q1 = 10

Q2 = 15

Upper half: 16 17 18 18 20 21 37, so Q3 = 18

About one fourth of the employees scored 10 or less, about one half scored 15 or less, and about three fourths scored 18 or less.
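The median-of-halves procedure used in this solution can be sketched as follows. The function name `quartiles` is illustrative. For an odd number of entries the sketch leaves the median itself out of both halves, as the solution does; other quartile conventions exist and can give slightly different values.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n, skip the middle entry so it belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


scores = [13, 9, 18, 15, 14, 21, 7, 10, 11, 20, 5, 18, 37, 16, 17]
q1, q2, q3 = quartiles(scores)
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
```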


Interquartile Range

Interquartile Range (IQR)

• The difference between the third and first quartiles.

• IQR = Q3 – Q1

Example: Finding the Interquartile Range

Find the interquartile range of the test scores.

Recall Q1 = 10, Q2 = 15, and Q3 = 18


Solution:

• IQR = Q3 – Q1 = 18 – 10 = 8

The test scores in the middle portion of the data set vary by at most 8 points.
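As a quick check of this example, the sketch below computes the IQR from the recalled quartiles and counts how many of the 15 scores fall between Q1 and Q3 inclusive. The variable names are illustrative.

```python
scores = [13, 9, 18, 15, 14, 21, 7, 10, 11, 20, 5, 18, 37, 16, 17]
q1, q3 = 10, 18  # quartiles recalled from the previous example

iqr = q3 - q1
# Scores in the middle portion of the data set (Q1 and Q3 included)
middle = [x for x in scores if q1 <= x <= q3]

print(f"IQR = {iqr}")
print(f"{len(middle)} of {len(scores)} scores lie between Q1 and Q3")
```

Because ties at the quartiles are counted, the middle portion here holds 9 of the 15 scores, which is roughly but not exactly half.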


Box-and-Whisker Plot

Box-and-whisker plot

• Exploratory data analysis tool.

• Highlights important features of a data set.

• Requires the five-number summary:

Minimum entry

First quartile Q1

Median Q2

Third quartile Q3

Maximum entry


Drawing a Box-and-Whisker Plot

1. Find the five-number summary of the data set.

2. Construct a horizontal scale that spans the range of the data.

3. Plot the five numbers above the horizontal scale.

4. Draw a box above the horizontal scale from Q1 to Q3 and draw a vertical line in the box at Q2.

5. Draw whiskers from the box to the minimum and maximum entries.

[Figure: a box-and-whisker plot with labels for the minimum entry, left whisker, Q1, box, median Q2, Q3, right whisker, and maximum entry]


Example: Drawing a Box-and-Whisker Plot

Draw a box-and-whisker plot that represents the 15 test scores.

Recall Min = 5 Q1 = 10 Q2 = 15 Q3 = 18 Max = 37

Solution:

[Figure: box-and-whisker plot of the test scores, marked at 5, 10, 15, 18, and 37]

About half the scores are between 10 and 18. By looking at the length of the right whisker, you can conclude 37 is a possible outlier.
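The five-number summary and the whisker lengths behind this conclusion can be computed directly. The sketch below uses Python's `statistics.quantiles` with its default "exclusive" method, which is a different convention from the median-of-halves method in the slides; for these 15 scores it happens to give the same quartiles, but that is not guaranteed for other data sets.

```python
from statistics import quantiles

scores = [13, 9, 18, 15, 14, 21, 7, 10, 11, 20, 5, 18, 37, 16, 17]

# Three cut points dividing the data into four parts (default method="exclusive")
q1, q2, q3 = quantiles(scores, n=4)
summary = (min(scores), q1, q2, q3, max(scores))

# Whisker lengths: distance from the box to the extreme entries
left_whisker = q1 - min(scores)
right_whisker = max(scores) - q3

print("Five-number summary:", summary)
print(f"Left whisker: {left_whisker}, right whisker: {right_whisker}")
```

The right whisker is almost four times as long as the left one, which is what makes 37 stand out as a possible outlier.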


The Standard Score

Standard Score (z-score)

• Represents the number of standard deviations a given value x falls from the mean μ.

$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} = \frac{x - \mu}{\sigma}$$


Example: Comparing z-Scores from Different Data Sets

In 2007, Forest Whitaker won the Best Actor Oscar at age 45 for his role in the movie The Last King of Scotland. Helen Mirren won the Best Actress Oscar at age 61 for her role in The Queen. The mean age of all best actor winners is 43.7, with a standard deviation of 8.8. The mean age of all best actress winners is 36, with a standard deviation of 11.5. Find the z-score that corresponds to the age for each actor or actress. Then compare your results.


Solution: Comparing z-Scores from Different Data Sets

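• Forest Whitaker: $z = \frac{x - \mu}{\sigma} = \frac{45 - 43.7}{8.8} \approx 0.15$, which is 0.15 standard deviations above the mean.

• Helen Mirren: $z = \frac{x - \mu}{\sigma} = \frac{61 - 36}{11.5} \approx 2.17$, which is 2.17 standard deviations above the mean.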


Solution: Comparing z-Scores from Different Data Sets


The z-score corresponding to the age of Helen Mirren is more than two standard deviations from the mean, so it is considered unusual. Compared to other Best Actress winners, she is relatively older, whereas the age of Forest Whitaker is only slightly higher than the average age of other Best Actor winners.

[Figure labels: z = 0.15 and z = 2.17]
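The comparison can be sketched in code. The helper `z_score` and the dictionary layout are illustrative; the ages, means, and standard deviations are those given in the example, and the "more than two standard deviations" test for an unusual value follows the solution's wording.

```python
def z_score(x, mean, std):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std


# name -> (age at win, mean age of winners, standard deviation)
winners = {
    "Forest Whitaker": (45, 43.7, 8.8),
    "Helen Mirren": (61, 36, 11.5),
}

for name, (age, mean, std) in winners.items():
    z = z_score(age, mean, std)
    label = "unusual" if abs(z) > 2 else "not unusual"
    print(f"{name}: z = {z:.2f} ({label})")
```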


Chapter 6 Summary

• Determined the quartiles of a data set

• Determined the interquartile range of a data set

• Created a box-and-whisker plot

• Interpreted other fractiles such as percentiles

• Determined and interpreted the standard score (z-score)