1 ES9 Chapter 2 ~ Descriptive Analysis & Presentation of Single-Variable Data Mean: 60.07 inches...


ES9 Chapter 2 ~ Descriptive Analysis & Presentation of Single-Variable Data

Mean: 60.07 inches

Median: 62.50 inches

Range: 42 inches

Variance: 117.681

Standard deviation: 10.85 inches

Minimum: 36 inches

Maximum: 78 inches

First quartile: 51.63 inches

Third quartile: 67.38 inches

Count: 58 bears

Sum: 3438.1 inches

[Figure: histogram titled "Black Bears"; horizontal axis: Length in Inches; vertical axis: Frequency]

Chapter Goals

• Learn how to present and describe sets of data

• Learn measures of central tendency, measures of dispersion (spread), measures of position, and types of distributions

• Learn how to interpret findings so that we know what the data is telling us about the sampled population

2.1 ~ Graphic Presentation of Data

• Use initial exploratory data-analysis techniques to produce a pictorial representation of the data

• Resulting displays reveal patterns of behavior of the variable being studied

• The method used is determined by the type of data and the idea to be presented

• No single correct answer when constructing a graphic display

Circle Graphs & Bar Graphs

Circle Graphs and Bar Graphs: Graphs that are used to summarize attribute data

• Circle graphs (pie diagrams) show the amount of data that belongs to each category as a proportional part of a circle

• Bar graphs show the amount of data that belongs to each category as proportionally sized rectangular areas

Example

Example: The table below lists the number of automobiles sold last week by day for a local dealership.

Describe the data using a circle graph and a bar graph:

Day         Number Sold
Monday      15
Tuesday     23
Wednesday   35
Thursday    11
Friday      12
Saturday    42
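The arithmetic behind both graphs can be sketched in a few lines of Python. This is only an illustration: the function name `circle_graph_slices` is made up for this sketch, and the text bars stand in for a real bar graph.

```python
# Sales figures copied from the table above.
sales = {"Monday": 15, "Tuesday": 23, "Wednesday": 35,
         "Thursday": 11, "Friday": 12, "Saturday": 42}

total = sum(sales.values())  # number of automobiles sold during the week


def circle_graph_slices(counts):
    """Return {category: (percent of total, central angle in degrees)}."""
    n = sum(counts.values())
    return {day: (100 * c / n, 360 * c / n) for day, c in counts.items()}


slices = circle_graph_slices(sales)
for day, (pct, angle) in slices.items():
    # Text bar graph: one '#' per automobile sold that day.
    print(f"{day:<10}{pct:5.1f}%  {angle:6.1f} deg  {'#' * sales[day]}")
```

Each slice of the circle graph gets the same share of the 360 degrees that its category has of the total count, and each bar has a length proportional to the count.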

Circle Graph Solution

[Figure: circle graph titled "Automobiles Sold Last Week"]

Bar Graph Solution

[Figure: bar graph titled "Automobiles Sold Last Week"; vertical axis: Frequency]

Pareto Diagram

Pareto Diagram: A bar graph with the bars arranged from the most numerous category to the least numerous category. It includes a line graph displaying the cumulative percentages and counts for the bars.

Notes:

• The Pareto diagram is often used in quality control applications

• Used to identify the number and type of defects that happen within a product or service

Example

Example: The final daily inspection defect report for a cabinet manufacturer is given in the table below:

Defect    Number
Dent       5
Stain     12
Blemish   43
Chip      25
Scratch   40
Others    10

1) Construct a Pareto diagram for this defect report.

Management has given the cabinet production line the goal of reducing their defects by 50%.

2) What two defects should they give special attention to in working toward this goal?

Solutions

1) Daily Defect Inspection Report

[Figure: Pareto diagram; bars show Count (scale 0 to 140), line shows cumulative Percent (scale 0 to 100)]

Defect:    Blemish  Scratch   Chip  Stain  Others   Dent
Count           43       40     25     12      10      5
Percent       31.9     29.6   18.5    8.9     7.4    3.7
Cum%          31.9     61.5   80.0   88.9    96.3  100.0

2) The production line should try to eliminate blemishes and scratches. This would cut defects by more than 50%.
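The Count, Percent, and Cum% rows of the Pareto diagram can be reproduced with a short Python sketch. The function name `pareto_table` is an assumption made for this illustration. It sorts every category by count, which happens to match the order on the slide.

```python
# Defect counts copied from the inspection report.
defects = {"Dent": 5, "Stain": 12, "Blemish": 43,
           "Chip": 25, "Scratch": 40, "Others": 10}


def pareto_table(counts):
    """Rows of (category, count, percent, cumulative percent), largest first."""
    total = sum(counts.values())
    rows, running = [], 0
    for name, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((name, count, 100 * count / total, 100 * running / total))
    return rows


table = pareto_table(defects)
for name, count, pct, cum in table:
    print(f"{name:<8}{count:>4}{pct:>7.1f}{cum:>7.1f}")
```

The cumulative percent after the second row is what answers question 2: the two largest categories together account for more than half of all defects.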

Key Definitions

Quantitative Data: One reason for constructing a graph of quantitative data is to examine the distribution - is the data compact, spread out, skewed, symmetric, etc.

Distribution: The pattern of variability displayed by the data of a variable. The distribution displays the frequency of each value of the variable.

Dotplot Display: Displays the data of a sample by representing each piece of data with a dot positioned along a scale. This scale can be either horizontal or vertical. The frequency of the values is represented along the other scale.

Example

Example: A random sample of the lifetime (in years) of 50 home washing machines is given below:

 2.5   8.9  12.2   4.1  18.1   1.6  12.2
16.9   2.5   3.5   0.4   2.6   2.2   4.0
 4.5   6.4   2.9   3.3   4.4   9.2   4.1
 0.9  14.5   4.0   0.9   7.2   5.2   1.8
 1.5   0.7   3.7   4.2   6.9  15.3  21.8
17.8   7.3   6.8   3.3   7.0   4.0  18.3
 8.5   1.4   7.4   4.7   0.7  10.4   3.6

The figure below is a dotplot for the 50 lifetimes:

[Figure: dotplot of the lifetimes; horizontal scale labeled 0.0, 4.0, 8.0, 12.0, 16.0, 20.0]

Note: Notice how the data is "bunched" near the lower extreme and more "spread out" near the higher extreme

Stem & Leaf Display

Stem-and-Leaf Display: Pictures the data of a sample using the actual digits that make up the data values. Each numerical value is divided into two parts: the leading digit(s) become the stem, and the trailing digit(s) become the leaf. The stems are located along the main axis, and a leaf for each piece of data is located so as to display the distribution of the data.

• Background:

– The stem-and-leaf display has become very popular for summarizing numerical data

– It is a combination of graphing and sorting

– The actual data is part of the graph

– Well-suited for computers

Example

Example: A city police officer, using radar, checked the speed of cars as they were traveling down the main street in town. Construct a stem-and-leaf plot for this data:

41 31 33 35 36 37 39 49 33 19 26 27 24 32 40 39 16 55 38 36

Solution: All the speeds are in the 10s, 20s, 30s, 40s, and 50s. Use the first digit of each speed as the stem and the second digit as the leaf. Draw a vertical line and list the stems in order to the left of the line. Place each leaf on its stem: place the trailing digit on the right side of the vertical line opposite its corresponding leading digit.

Example

20 Speeds
---------------------------------------
1 | 6 9
2 | 4 6 7
3 | 1 2 3 3 5 6 6 7 8 9 9
4 | 0 1 9
5 | 5
---------------------------------------

• The speeds are centered around the 30s

Note: The display could be constructed so that only five possible values (instead of ten) could fall in each stem. What would the stems look like? Would there be a difference in appearance?
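Because the display is a combination of graphing and sorting, it is easy to build by computer. A minimal Python sketch for the 20 speeds follows. It assumes two-digit whole numbers, so the tens digit is the stem and the units digit is the leaf. The function name `stem_and_leaf` is made up for this illustration.

```python
from collections import defaultdict

# The 20 radar speeds from the example.
speeds = [41, 31, 33, 35, 36, 37, 39, 49, 33, 19,
          26, 27, 24, 32, 40, 39, 16, 55, 38, 36]


def stem_and_leaf(values):
    """Map each stem (tens digit) to its leaves (units digits) in ranked order."""
    display = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        display[stem].append(leaf)
    return dict(display)


display = stem_and_leaf(speeds)
for stem in sorted(display):
    print(f"{stem} | {' '.join(str(leaf) for leaf in display[stem])}")
```

Sorting the values first means the leaves on every stem come out already ranked.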

Remember!

1. It is fairly typical of many variables to display a distribution that is concentrated (mounded) about a central value and then in some manner be dispersed in both directions. (Why?)

2. A display that indicates two “mounds” may really be two overlapping distributions

3. A back-to-back stem-and-leaf display makes it possible to compare two distributions graphically

4. A side-by-side dotplot is also useful for comparing two distributions

HW

• 2.1 page 49 #10, 13, 16, 19, 22

2.2 ~ Frequency Distributions & Histograms

• Stem-and-leaf plots often present adequate summaries, but they can get very big, very fast

• Need other techniques for summarizing data

• Frequency distributions and histograms are used to summarize large data sets

• HW 2.2 p. 65: #39, 40, 43, 44, 47, 50, 51 due on Friday 2/4/11

• Quiz on Friday 2/4/11 on 2.1 and 2.2

Frequency Distributions

Frequency Distribution: A listing, often expressed in chart form, that pairs each value of a variable with its frequency

1. A table that summarizes data by classes, or class intervals

2. In a typical grouped frequency distribution, there are usually 5-12 classes of equal width

3. The table may contain columns for class number, class interval, tally (if constructing by hand), frequency, relative frequency, cumulative relative frequency, and class midpoint

4. In an ungrouped frequency distribution each class consists of a single value

Ungrouped Frequency Distribution: Each value of x in the distribution stands alone

Grouped Frequency Distribution: Group the values into a set of classes

Frequency Distribution

Guidelines for constructing a frequency distribution:

1. All classes should be of the same width

2. Classes should be set up so that they do not overlap and so that each piece of data belongs to exactly one class

3. For problems in the text, 5-12 classes are most desirable. The square root of n is a reasonable guideline for the number of classes if n is less than 150.

4. Use a system that takes advantage of a number pattern, to guarantee accuracy

5. If possible, an even class width is often advantageous

Frequency Distributions

Procedure for constructing a frequency distribution:

1. Identify the high (H) and low (L) scores. Find the range: Range = H - L

2. Select a number of classes and a class width so that the product is a bit larger than the range

3. Pick a starting point a little smaller than L. Count from the starting point by the class width to obtain the class boundaries. Observations that fall on class boundaries are placed into the class interval to the right.

Example

Example: The hemoglobin test, a blood test given to diabetics during their periodic checkups, indicates the level of control of blood sugar during the past two to three months. The data in the table below was obtained for 40 different diabetics at a university clinic that treats diabetic patients:

6.5  5.0  5.6  7.6  4.8  8.0  7.5  7.9  8.0  9.2
6.4  6.0  5.6  6.0  5.7  9.2  8.1  8.0  6.5  6.6
5.0  8.0  6.5  6.1  6.4  6.6  7.2  5.9  4.0  5.7
7.9  6.0  5.6  6.0  6.2  7.7  6.7  7.7  8.2  9.0

1) Construct a grouped frequency distribution using the classes 3.7 - <4.7, 4.7 - <5.7, 5.7 - <6.7, etc.

2) Which class has the highest frequency?

Solutions

1)

Class          Frequency   Relative    Cumulative       Class
Boundaries     f           Frequency   Rel. Frequency   Midpoint, x
---------------------------------------------------------------------
3.7 - <4.7      1          0.025       0.025            4.2
4.7 - <5.7      6          0.150       0.175            5.2
5.7 - <6.7     16          0.400       0.575            6.2
6.7 - <7.7      4          0.100       0.675            7.2
7.7 - <8.7     10          0.250       0.925            8.2
8.7 - <9.7      3          0.075       1.000            9.2

2) The class 5.7 - <6.7 has the highest frequency. The frequency is 16 and the relative frequency is 0.40
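The tallying for this grouped frequency distribution can be sketched in Python. The function name `grouped_frequencies` and its parameters are assumptions made for this illustration. Values are converted to whole tenths so that a value on a class boundary (such as 5.7) lands in the class to its right without floating-point surprises.

```python
# The 40 hemoglobin test results from the example.
hemoglobin = [
    6.5, 5.0, 5.6, 7.6, 4.8, 8.0, 7.5, 7.9, 8.0, 9.2,
    6.4, 6.0, 5.6, 6.0, 5.7, 9.2, 8.1, 8.0, 6.5, 6.6,
    5.0, 8.0, 6.5, 6.1, 6.4, 6.6, 7.2, 5.9, 4.0, 5.7,
    7.9, 6.0, 5.6, 6.0, 6.2, 7.7, 6.7, 7.7, 8.2, 9.0,
]


def grouped_frequencies(data, start_tenths=37, width_tenths=10, n_classes=6):
    """Count the values in each class [lower, upper), working in tenths."""
    freqs = [0] * n_classes
    for x in data:
        index = (round(x * 10) - start_tenths) // width_tenths
        freqs[index] += 1
    return freqs


freqs = grouped_frequencies(hemoglobin)
n = len(hemoglobin)
cumulative = 0.0
for i, f in enumerate(freqs):
    lower = 3.7 + i            # lower class boundary; class width is 1.0
    cumulative += f / n        # running total of the relative frequencies
    print(f"{lower:.1f} - <{lower + 1:.1f}  {f:>3}  {f / n:.3f}  "
          f"{cumulative:.3f}  {lower + 0.5:.1f}")
```

The printed columns are class boundaries, frequency, relative frequency, cumulative relative frequency, and class midpoint.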

Histogram

Histogram: A bar graph representing a frequency distribution of a quantitative variable. A histogram is made up of the following components:

1. A title, which identifies the population of interest

2. A vertical scale, which identifies the frequencies in the various classes

3. A horizontal scale, which identifies the variable x. Values for the class boundaries or class midpoints may be labeled along the x-axis. Use whichever method of labeling the axis best presents the variable.

Notes:

The relative frequency is sometimes used on the vertical scale

It is possible to create a histogram based on class midpoints

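Example

Example: Construct a histogram for the blood test results given in the previous example

Solution:

[Figure: histogram titled "The Hemoglobin Test"; horizontal axis: Blood Test, labeled at the class midpoints 4.2, 5.2, 6.2, 7.2, 8.2, 9.2; vertical axis: Frequency, labeled 0, 5, 10, 15]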

Example

Example: A recent survey of Roman Catholic nuns summarized their ages in the table below. Construct a histogram for this age data:

Age            Frequency   Class Midpoint
------------------------------------------
20 up to 30       34            25
30 up to 40       58            35
40 up to 50       76            45
50 up to 60      187            55
60 up to 70      254            65
70 up to 80      241            75
80 up to 90      147            85

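Solution

[Figure: histogram titled "Roman Catholic Nuns"; horizontal axis: Age, labeled at the class midpoints 25, 35, 45, 55, 65, 75, 85; vertical axis: Frequency, labeled 0, 100, 200]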

Terms Used to Describe Histograms

Symmetrical: Both sides of the distribution are identical mirror images. There is a line of symmetry.

Uniform (Rectangular): Every value appears with equal frequency

Skewed: One tail is stretched out longer than the other. The direction of skewness is on the side of the longer tail. (Positively skewed vs. negatively skewed)

J-Shaped: There is no tail on the side of the class with the highest frequency

Bimodal: The two largest classes are separated by one or more classes. Often implies two populations are sampled.

Normal: A symmetrical distribution is mounded about the mean and becomes sparse at the extremes

Important Reminders

The mode is the value that occurs with greatest frequency (discussed in Section 2.3)

The modal class is the class with the greatest frequency

A bimodal distribution has two high-frequency classes separated by classes with lower frequencies

Graphical representations of data should include a descriptive, meaningful title and proper identification of the vertical and horizontal scales

Cumulative Frequency Distribution

Cumulative Frequency Distribution: A frequency distribution that pairs cumulative frequencies with values of the variable

• The cumulative frequency for any given class is the sum of the frequency for that class and the frequencies of all classes of smaller values

• The cumulative relative frequency for any given class is the sum of the relative frequency for that class and the relative frequencies of all classes of smaller values

Example

Example: A computer science aptitude test was given to 50 students. The table below summarizes the data:

Class                     Relative    Cumulative   Cumulative
Boundaries    Frequency   Frequency   Frequency    Rel. Frequency
------------------------------------------------------------------
 0 up to  4       4         0.08          4           0.08
 4 up to  8       8         0.16         12           0.24
 8 up to 12       8         0.16         20           0.40
12 up to 16      20         0.40         40           0.80
16 up to 20       6         0.12         46           0.92
20 up to 24       3         0.06         49           0.98
24 up to 28       1         0.02         50           1.00
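The cumulative columns are running totals of the frequency columns. A short Python sketch, using the standard-library `itertools.accumulate` and the frequencies from the aptitude test table (the variable names are illustrative only):

```python
from itertools import accumulate

# Class frequencies for 0 up to 4, 4 up to 8, ..., 24 up to 28.
frequency = [4, 8, 8, 20, 6, 3, 1]

n = sum(frequency)                           # 50 students
cum_freq = list(accumulate(frequency))       # running total of frequencies
rel_freq = [f / n for f in frequency]
cum_rel_freq = [c / n for c in cum_freq]     # running total as a proportion

for f, rf, cf, crf in zip(frequency, rel_freq, cum_freq, cum_rel_freq):
    print(f"{f:>3}  {rf:.2f}  {cf:>3}  {crf:.2f}")
```

The last cumulative frequency always equals n, and the last cumulative relative frequency always equals 1.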

Ogive: A line graph of a cumulative frequency or cumulative relative frequency distribution. An ogive has the following components:

1. A title, which identifies the population or sample

2. A vertical scale, which identifies either the cumulative frequencies or the cumulative relative frequencies

3. A horizontal scale, which identifies the upper class boundaries. Until the upper boundary of a class has been reached, you cannot be sure you have accumulated all the data in the class. Therefore, the horizontal scale for an ogive is always based on the upper class boundaries.

Note: Every ogive starts on the left with a relative frequency of zero at the lower class boundary of the first class and ends on the right with a relative frequency of 100% at the upper class boundary of the last class.

Ogive

Example

Example: The graph below is an ogive using cumulative relative frequencies for the computer science aptitude data:

[Figure: ogive titled "Computer Science Aptitude Test"; horizontal axis: Test Score, labeled 0, 4, 8, 12, 16, 20, 24, 28; vertical axis: Cumulative Relative Frequency, labeled 0.0 to 1.0 in steps of 0.1]

2.3 ~ Measures of Central Tendency

• Numerical values used to locate the middle of a set of data, or where the data is clustered

• The term average is often associated with all measures of central tendency

Mean

Mean: The type of average with which you are probably most familiar. The mean is the sum of all the values divided by the total number of values, n:

$\bar{x} = \frac{1}{n}\sum x_i = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$

Notes:

The population mean, $\mu$ (lowercase mu, Greek alphabet), is the mean of all x values for the entire population

We usually cannot measure $\mu$ but would like to estimate its value

A physical representation: the mean is the value that balances the weights on the number line

Example

Example: The following data represents the number of accidents in each of the last 8 years at a dangerous intersection. Find the mean number of accidents: 8, 9, 3, 5, 2, 6, 4, 5

Solution: $\bar{x} = \frac{1}{8}(8 + 9 + 3 + 5 + 2 + 6 + 4 + 5) = 5.25$

In the data above, change 6 to 26:

Solution: $\bar{x} = \frac{1}{8}(8 + 9 + 3 + 5 + 2 + 26 + 4 + 5) = 7.75$

Note: The mean can be greatly influenced by outliers
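Both means can be checked with a few lines of Python. The helper `mean` below is written out for illustration, and the second list simply replaces the 6 with 26 as the example describes.

```python
def mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


accidents = [8, 9, 3, 5, 2, 6, 4, 5]
with_outlier = [26 if x == 6 else x for x in accidents]  # change 6 to 26

print(mean(accidents))     # 5.25
print(mean(with_outlier))  # 7.75
```

Changing one of the eight values moves the mean by 2.5 accidents, which is the sensitivity to outliers that the note points out.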

Median

Median: The value of the data that occupies the middle position when the data are ranked in order according to size

Notes:

Denoted by "x tilde": $\tilde{x}$

The population median, $M$ (uppercase mu, Greek alphabet), is the data value in the middle position of the entire population

To find the median:

1. Rank the data

2. Determine the depth of the median: $d(\tilde{x}) = \frac{n + 1}{2}$

3. Determine the value of the median

Example

Example: Find the median for the set of data: {4, 8, 3, 8, 2, 9, 2, 11, 3}

Solution:

1. Rank the data: 2, 2, 3, 3, 4, 8, 8, 9, 11

2. Find the depth: $d(\tilde{x}) = (9 + 1)/2 = 5$

3. The median is the fifth number from either end in the ranked data: $\tilde{x} = 4$

Suppose the data set is {4, 8, 3, 8, 2, 9, 2, 11, 3, 15}:

1. Rank the data: 2, 2, 3, 3, 4, 8, 8, 9, 11, 15

2. Find the depth: $d(\tilde{x}) = (10 + 1)/2 = 5.5$

3. The median is halfway between the fifth and sixth observations: $\tilde{x} = (4 + 8)/2 = 6$
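The rank, depth, value procedure translates directly into code. The Python sketch below follows those three steps. The function name `median` is chosen for this illustration, and Python's own `statistics.median` would give the same results.

```python
def median(values):
    """Median found by ranking the data and locating the depth (n + 1) / 2."""
    ranked = sorted(values)                 # step 1: rank the data
    n = len(ranked)
    depth = (n + 1) / 2                     # step 2: depth of the median
    if depth.is_integer():
        return ranked[int(depth) - 1]       # step 3: value at that position
    lower = ranked[int(depth) - 1]          # value just below the depth
    upper = ranked[int(depth)]              # value just above the depth
    return (lower + upper) / 2              # halfway between the two


print(median([4, 8, 3, 8, 2, 9, 2, 11, 3]))       # 4
print(median([4, 8, 3, 8, 2, 9, 2, 11, 3, 15]))   # 6.0
```

Positions are counted from 1 in the procedure, so the code subtracts 1 when indexing the ranked list.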

Mode & Midrange

Mode: The mode is the value of x that occurs most frequently

Note: If two or more values in a sample are tied for the highest frequency (number of occurrences), there is no mode

Midrange: The number exactly midway between a lowest value data L and a highest value data H. It is found by averaging the low and the high values:

$\text{midrange} = \frac{L + H}{2}$

Example

Example: Consider the data set {12.7, 27.1, 35.6, 44.2, 18.0}

$\text{midrange} = \frac{L + H}{2} = \frac{12.7 + 44.2}{2} = 28.45$

Notes:

When rounding off an answer, a common rule-of-thumb is to keep one more decimal place in the answer than was present in the original data

To avoid round-off buildup, round off only the final answer, not intermediate steps

HW

• 2.3 pages 75-77

• #64, 65, 69, 71

2.4 ~ Measures of Dispersion

• Measures of central tendency alone cannot completely characterize a set of data. Two very different data sets may have similar measures of central tendency.

• Measures of dispersion are used to describe the spread, or variability, of a distribution

• Common measures of dispersion: range, variance, and standard deviation

Range

Range: The difference in value between the highest-valued (H) and the lowest-valued (L) pieces of data:

$\text{range} = H - L$

• Other measures of dispersion are based on the following quantity

Deviation from the Mean: A deviation from the mean, $x - \bar{x}$, is the difference between the value of x and the mean $\bar{x}$

Example

Example: Consider the sample {12, 23, 17, 15, 18}. Find 1) the range and 2) each deviation from the mean.

Solutions:

1) $\text{range} = H - L = 23 - 12 = 11$

2) $\bar{x} = \frac{1}{5}(12 + 23 + 17 + 15 + 18) = 17$

Data, x    Deviation from Mean, $x - \bar{x}$
---------------------------------------------
12         -5
23          6
17          0
15         -2
18          1

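Mean Absolute Deviation

Note: $\sum (x - \bar{x}) = 0$ (Always!)

Mean Absolute Deviation: The mean of the absolute values of the deviations from the mean:

$\text{Mean absolute deviation} = \frac{1}{n}\sum |x - \bar{x}|$

For the previous example:

$\frac{1}{n}\sum |x - \bar{x}| = \frac{1}{5}(5 + 6 + 0 + 2 + 1) = \frac{14}{5} = 2.8$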
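A quick Python check of the deviations and the mean absolute deviation for the sample {12, 23, 17, 15, 18} (the variable names are illustrative only):

```python
sample = [12, 23, 17, 15, 18]
n = len(sample)

x_bar = sum(sample) / n                          # sample mean
deviations = [x - x_bar for x in sample]         # x minus the mean
mean_abs_dev = sum(abs(d) for d in deviations) / n

print(x_bar)             # 17.0
print(deviations)        # [-5.0, 6.0, 0.0, -2.0, 1.0]
print(sum(deviations))   # 0.0
print(mean_abs_dev)      # 2.8
```

The deviations sum to zero because the negative and positive deviations cancel, which is why the absolute values (or squares) are used to measure spread.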

Sample Variance & Standard Deviation

Sample Variance: The sample variance, $s^2$, is the mean of the squared deviations, calculated using $n - 1$ as the divisor:

$s^2 = \frac{1}{n - 1}\sum (x - \bar{x})^2$, where n is the sample size

Note: The numerator for the sample variance is called the sum of squares for x, denoted SS(x):

$s^2 = \frac{\text{SS}(x)}{n - 1}$, where $\text{SS}(x) = \sum (x - \bar{x})^2 = \sum x^2 - \frac{1}{n}\left(\sum x\right)^2$

Standard Deviation: The standard deviation of a sample, s, is the positive square root of the variance:

$s = \sqrt{s^2}$

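Example

Example: Find the 1) variance and 2) standard deviation for the data {5, 7, 1, 3, 8}:

Solutions:

First: $\bar{x} = \frac{1}{5}(5 + 7 + 1 + 3 + 8) = 4.8$

x       $x - \bar{x}$    $(x - \bar{x})^2$
------------------------------------------
5        0.2              0.04
7        2.2              4.84
1       -3.8             14.44
3       -1.8              3.24
8        3.2             10.24
------------------------------------------
Sum: 24  0               32.80

1) $s^2 = \frac{1}{4}(32.8) = 8.2$

2) $s = \sqrt{8.2} = 2.86$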
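The same calculation can be sketched in Python, both from the definition and from the sum-of-squares identity, with the standard-library `statistics.variance` as a cross-check. The variable names are chosen for this illustration.

```python
import math
import statistics

data = [5, 7, 1, 3, 8]
n = len(data)

x_bar = sum(data) / n                                  # 4.8
ss_x = sum((x - x_bar) ** 2 for x in data)             # sum of squared deviations
variance = ss_x / (n - 1)                              # divisor is n - 1
std_dev = math.sqrt(variance)

# Same sum of squares from the identity: sum of x^2 minus (sum of x)^2 / n.
ss_shortcut = sum(x * x for x in data) - sum(data) ** 2 / n

print(round(variance, 2), round(std_dev, 2))
print(round(statistics.variance(data), 2))             # library cross-check
```

Rounding is applied only when printing, in line with the advice to round the final answer and not the intermediate steps.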

Notes

The shortcut formula for the sample variance:

$s^2 = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}{n - 1}$

The unit of measure for the standard deviation is the same as the unit of measure for the data

Practice Exercises

• 2.4 pages 84-86: #82, 84, 85, 87, 90, 91

2.5 ~ Mean & Standard Deviation of Frequency Distribution

• If the data is given in the form of a frequency distribution, we need to make a few changes to the formulas for the mean, variance, and standard deviation

• Complete the extension table in order to find these summary statistics

To Calculate

• In order to calculate the mean, variance, and standard deviation for data:

1. In an ungrouped frequency distribution, use the frequency of occurrence, f, of each observation

2. In a grouped frequency distribution, we use the frequency of occurrence associated with each class midpoint:

$\bar{x} = \frac{\sum xf}{\sum f}$

$s^2 = \frac{\sum x^2 f - \frac{\left(\sum xf\right)^2}{\sum f}}{\sum f - 1}$

Example

Example: A survey of students in the first grade at a local school asked for the number of brothers and/or sisters for each child. The results are summarized in the table below. Find 1) the mean, 2) the variance, and 3) the standard deviation:

Solutions:

First:

x       f      xf     $x^2 f$
------------------------------
0      15       0       0
1      17      17      17
2      23      46      92
4       5      20      80
5       2      10      50
------------------------------
Sum:   62      93     239

1) $\bar{x} = 93/62 = 1.5$

2) $s^2 = \frac{239 - \frac{(93)^2}{62}}{62 - 1} = 1.63$

3) $s = \sqrt{1.63} = 1.28$
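The extension-table arithmetic can be sketched in Python. The (x, f) pairs below are read off the extension table on this slide, and the column totals 62, 93, and 239 serve as a check on that reading. The variable names are illustrative only.

```python
import math

# (number of siblings x, frequency f) pairs from the extension table.
siblings = [(0, 15), (1, 17), (2, 23), (4, 5), (5, 2)]

sum_f = sum(f for _, f in siblings)               # total number of children
sum_xf = sum(x * f for x, f in siblings)          # column total of xf
sum_x2f = sum(x * x * f for x, f in siblings)     # column total of x^2 f

mean = sum_xf / sum_f
variance = (sum_x2f - sum_xf ** 2 / sum_f) / (sum_f - 1)
std_dev = math.sqrt(variance)

print(sum_f, sum_xf, sum_x2f)
print(mean, round(variance, 2), round(std_dev, 2))
```

Each column of the extension table becomes one sum, and the three summary statistics follow from those sums alone.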

2.6 ~ Measures of Position

• Measures of position are used to describe the relative location of an observation

• Quartiles and percentiles are two of the most popular measures of position

• An additional measure of central tendency, the midquartile, is defined using quartiles

• Quartiles are part of the 5-number summary

Quartiles

Quartiles: Values of the variable that divide the ranked data into quarters; each set of data has three quartiles

Ranked data, increasing order:

L |-- 25% --| Q1 |-- 25% --| Q2 |-- 25% --| Q3 |-- 25% --| H

1. The first quartile, Q1, is a number such that at most 25% of the data are smaller in value than Q1 and at most 75% are larger

2. The second quartile, Q2, is the median

3. The third quartile, Q3, is a number such that at most 75% of the data are smaller in value than Q3 and at most 25% are larger

Percentiles

Percentiles: Values of the variable that divide a set of ranked data into 100 equal subsets; each set of data has 99 percentiles. The kth percentile, Pk, is a value such that at most k% of the data is smaller in value than Pk and at most (100 - k)% of the data is larger.

L |-- at most k% --| Pk |-- at most (100 - k)% --| H

Notes:

The 1st quartile and the 25th percentile are the same: Q1 = P25

The median, the 2nd quartile, and the 50th percentile are all the same: $\tilde{x} = Q_2 = P_{50}$

Finding Pk (and Quartiles)

• Procedure for finding Pk (and quartiles):

1. Rank the n observations, lowest to highest

2. Compute A = (nk)/100

3. If A is an integer:

– d(Pk) = A.5 (depth)

– Pk is halfway between the value of the data in the Ath position and the value of the next data

If A is a fraction:

– d(Pk) = B, the next larger integer

– Pk is the value of the data in the Bth position

Example: The following data represents the pH levels of a random sample of swimming pools in a California town. Find: 1) the first quartile, 2) the third quartile, and 3) the 37th percentile:

5.6 5.6 5.8 5.9 6.0
6.0 6.1 6.2 6.3 6.4
6.7 6.8 6.8 6.8 6.9
7.0 7.3 7.4 7.4 7.5

Solutions:

1) k = 25: (20)(25)/100 = 5, depth = 5.5, Q1 = 6

2) k = 75: (20)(75)/100 = 15, depth = 15.5, Q3 = 6.95

3) k = 37: (20)(37)/100 = 7.4, depth = 8, P37 = 6.2
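The depth procedure for finding Pk can be sketched in Python and checked against the three pH solutions. This is a minimal sketch, not part of the original slides: the function name `percentile_by_depth` is made up for illustration, and it assumes k is a whole number from 1 to 99 so that A = nk/100 can be tested with integer arithmetic.

```python
def percentile_by_depth(data, k):
    """Return (depth, value) of the k-th percentile; k is a whole number, 1 to 99."""
    ranked = sorted(data)                              # step 1: rank lowest to highest
    whole, remainder = divmod(len(ranked) * k, 100)    # step 2: A = nk/100
    if remainder == 0:
        # A is an integer: depth is A.5, halfway between positions A and A + 1
        value = (ranked[whole - 1] + ranked[whole]) / 2
        return whole + 0.5, value
    # A is a fraction: depth is B, the next larger integer
    depth = whole + 1
    return depth, ranked[depth - 1]


# pH values from the swimming pool example
ph = [5.6, 5.6, 5.8, 5.9, 6.0, 6.0, 6.1, 6.2, 6.3, 6.4,
      6.7, 6.8, 6.8, 6.8, 6.9, 7.0, 7.3, 7.4, 7.4, 7.5]

q1 = percentile_by_depth(ph, 25)
q3 = percentile_by_depth(ph, 75)
p37 = percentile_by_depth(ph, 37)
print("Q1:", q1, "Q3:", q3, "P37:", p37)
```

Each result is a (depth, value) pair, so the depth can be compared with the hand calculation before the value is read off.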


Midquartile

Midquartile: The numerical value midway between the first and third quartile:

$\text{midquartile} = \frac{Q_1 + Q_3}{2}$

Example: Find the midquartile for the 20 pH values in the previous example:

$\text{midquartile} = \frac{Q_1 + Q_3}{2} = \frac{6 + 6.95}{2} = \frac{12.95}{2} = 6.475$

Note: The mean, median, midrange, and midquartile are all measures of central tendency. They are not necessarily equal. Can you think of an example when they would be the same value?


5-Number Summary

5-Number Summary: The 5-number summary is composed of:

1. L, the smallest value in the data set

2. Q1, the first quartile (also P25)

3. $\tilde{x}$, the median (also P50 and 2nd quartile)

4. Q3, the third quartile (also P75)

5. H, the largest value in the data set

Notes:

The 5-number summary indicates how much the data is spread out in each quarter

The interquartile range is the difference between the first and third quartiles. It is the range of the middle 50% of the data


Box-and-Whisker Display

Box-and-Whisker Display: A graphic representation of the 5-number summary:

• The five numerical values (smallest, first quartile, median, third quartile, and largest) are located on a scale, either vertical or horizontal

• The box is used to depict the middle half of the data that lies between the two quartiles

• The whiskers are line segments used to depict the other half of the data

• One line segment represents the quarter of the data that is smaller in value than the first quartile

• The second line segment represents the quarter of the data that is larger in value than the third quartile


Example: A random sample of students in a sixth grade class was selected. Their weights are given in the table below. Find the 5-number summary for this data and construct a boxplot:

63 64 76 76 81 83 85 86 88 89 90 91 92 93 93 93 94 97 99 99 99 101 108 109 112

Solution:

L = 63, Q1 = 85, $\tilde{x}$ = 92, Q3 = 99, H = 112
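The 5-number summary for the weight data can be reproduced with a short Python sketch. It is illustrative only: the helper `depth_percentile` is a hypothetical name, it applies the A = nk/100 depth rule for finding Pk, and the interquartile range is taken as Q3 - Q1.

```python
def depth_percentile(ranked, k):
    """k-th percentile of already-ranked data using the A = nk/100 depth rule."""
    whole, remainder = divmod(len(ranked) * k, 100)
    if remainder == 0:
        # A is an integer: average the values in positions A and A + 1
        return (ranked[whole - 1] + ranked[whole]) / 2
    # A is a fraction: take position B = whole + 1 (zero-based index `whole`)
    return ranked[whole]


# Weights from the sixth grade class example
weights = [63, 64, 76, 76, 81, 83, 85, 86, 88, 89, 90, 91, 92,
           93, 93, 93, 94, 97, 99, 99, 99, 101, 108, 109, 112]

ranked = sorted(weights)
summary = {
    "L": ranked[0],
    "Q1": depth_percentile(ranked, 25),
    "median": depth_percentile(ranked, 50),
    "Q3": depth_percentile(ranked, 75),
    "H": ranked[-1],
}
iqr = summary["Q3"] - summary["Q1"]   # range of the middle 50% of the data

print(summary)
print("Interquartile range:", iqr)
```

The five values in `summary` are the ones a boxplot places on its scale: the box runs from Q1 to Q3 and the whiskers extend to L and H.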


Boxplot for Weight Data

[Figure: horizontal boxplot titled "Weights from Sixth Grade Class"; the Weight axis runs from 60 to 110, with L, Q1, $\tilde{x}$, Q3, and H marked.]


z-Score (standardization)

z-Score: The position a particular value of x has relative to the mean, measured in standard deviations. The z-score is found by the formula:

$z = \frac{\text{value} - \text{mean}}{\text{st.dev.}} = \frac{x - \bar{x}}{s}$

Notes:

Typically, the calculated value of z is rounded to the nearest hundredth

The z-score measures the number of standard deviations above/below, or away from, the mean

z-scores typically range from -3.00 to +3.00

z-scores may be used to make comparisons of raw scores


Example: A certain data set has mean 35.6 and standard deviation 7.1. Find the z-scores for 46 and 33:

Solutions:

$z = \frac{x - \bar{x}}{s} = \frac{46 - 35.6}{7.1} = 1.46$

46 is 1.46 standard deviations above the mean

$z = \frac{x - \bar{x}}{s} = \frac{33 - 35.6}{7.1} = -0.37$

33 is 0.37 standard deviations below the mean
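The two z-scores can be checked with a few lines of Python. This is a small sketch, with `z_score` as an illustrative function name and the results rounded to the nearest hundredth as the z-score notes suggest.

```python
def z_score(x, mean, st_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / st_dev


mean, st_dev = 35.6, 7.1   # values given in the example

z_46 = round(z_score(46, mean, st_dev), 2)
z_33 = round(z_score(33, mean, st_dev), 2)

print("z for 46:", z_46)
print("z for 33:", z_33)
```

A positive result means the value is above the mean and a negative result means it is below.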


2.7 ~ Interpreting & Understanding Standard Deviation

• Standard deviation is a measure of variability, or spread

• Two rules for describing data rely on the standard deviation:

– Empirical rule: applies to a variable that is normally distributed

– Chebyshev’s theorem: applies to any distribution


Empirical Rule

Empirical Rule: If a variable is normally distributed, then:

1. Approximately 68% of the observations lie within 1 standard deviation of the mean

2. Approximately 95% of the observations lie within 2 standard deviations of the mean

3. Approximately 99.7% of the observations lie within 3 standard deviations of the mean

Notes:

The empirical rule is more informative than Chebyshev’s theorem since we know more about the distribution (normally distributed)

Also applies to populations

Can be used to determine if a distribution is normally distributed


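Illustration of the Empirical Rule

[Figure: normal curve with the horizontal axis marked at $\bar{x} - 3s$, $\bar{x} - 2s$, $\bar{x} - s$, $\bar{x}$, $\bar{x} + s$, $\bar{x} + 2s$, $\bar{x} + 3s$; about 68% of the area lies within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3.]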


Example: A random sample of plum tomatoes was selected from a local grocery store and their weights recorded. The mean weight was 6.5 ounces with a standard deviation of 0.4 ounces. If the weights are normally distributed:

1) What percentage of weights fall between 5.7 and 7.3?

2) What percentage of weights fall above 7.7?

Solutions:

1) $(\bar{x} - 2s, \bar{x} + 2s) = (6.5 - 2(0.4), 6.5 + 2(0.4)) = (5.7, 7.3)$

Approximately 95% of the weights fall between 5.7 and 7.3

2) $(\bar{x} - 3s, \bar{x} + 3s) = (6.5 - 3(0.4), 6.5 + 3(0.4)) = (5.3, 7.7)$

Approximately 99.7% of the weights fall between 5.3 and 7.7

Approximately 0.3% of the weights fall outside (5.3, 7.7)

Approximately (0.3/2) = 0.15% of the weights fall above 7.7
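The arithmetic in the tomato example can be sketched in Python. The helper `interval` and the `EMPIRICAL` lookup are illustrative names, not part of the slides; the percentages are the empirical rule's approximate values, and the tail calculation assumes the normal curve is symmetric about the mean.

```python
# Approximate percentages from the empirical rule, keyed by number of standard deviations
EMPIRICAL = {1: 68.0, 2: 95.0, 3: 99.7}


def interval(mean, st_dev, k):
    """Interval (mean - k*s, mean + k*s), rounded to two decimals for display."""
    return round(mean - k * st_dev, 2), round(mean + k * st_dev, 2)


mean, st_dev = 6.5, 0.4   # ounces, from the example

two_sd = interval(mean, st_dev, 2)     # interval for question 1
three_sd = interval(mean, st_dev, 3)   # interval for question 2

# By symmetry, half of the area outside 3 standard deviations lies above the upper limit
pct_above_upper = round((100 - EMPIRICAL[3]) / 2, 2)

print("Within 2 s:", two_sd, "about", EMPIRICAL[2], "%")
print("Within 3 s:", three_sd, "about", EMPIRICAL[3], "%")
print("Above", three_sd[1], ": about", pct_above_upper, "%")
```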


A Note about the Empirical Rule

Note: The empirical rule may be used to determine whether or not a set of data is approximately normally distributed

1. Find the mean and standard deviation for the data

2. Compute the actual proportion of data within 1, 2, and 3 standard deviations from the mean

3. Compare these actual proportions with those given by the empirical rule

4. If the proportions found are reasonably close to those of the empirical rule, then the data is approximately normally distributed
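The four steps for checking normality can be sketched in Python. This is an illustration under stated assumptions: it reuses the sixth grade weight data from the boxplot example, uses the sample standard deviation (`statistics.stdev`), and leaves the judgment of "reasonably close" to the reader, since the slides give no numeric cutoff. The function name `proportions_within` is made up for this sketch.

```python
import statistics

# Proportions given by the empirical rule for 1, 2, and 3 standard deviations
EMPIRICAL = {1: 0.68, 2: 0.95, 3: 0.997}


def proportions_within(data, max_k=3):
    """Proportion of the data within k standard deviations of the mean, k = 1..max_k."""
    mean = statistics.mean(data)      # step 1: mean
    s = statistics.stdev(data)        # step 1: sample standard deviation
    n = len(data)
    # step 2: count the observations inside each interval (mean - k*s, mean + k*s)
    return {k: sum(1 for x in data if abs(x - mean) <= k * s) / n
            for k in range(1, max_k + 1)}


weights = [63, 64, 76, 76, 81, 83, 85, 86, 88, 89, 90, 91, 92,
           93, 93, 93, 94, 97, 99, 99, 99, 101, 108, 109, 112]

observed = proportions_within(weights)

# step 3: show the observed and empirical-rule proportions side by side
for k in sorted(observed):
    print(f"within {k} s: observed {observed[k]:.2f}, empirical rule {EMPIRICAL[k]}")
```

Step 4 is then a judgment call: the closer the observed proportions are to 0.68, 0.95, and 0.997, the more plausible an approximately normal distribution becomes.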


Chebyshev’s Theorem

Chebyshev’s Theorem: The proportion of any distribution that lies within k standard deviations of the mean is at least $1 - (1/k^2)$, where k is any positive number larger than 1. This theorem applies to all distributions of data.

Illustration:

[Figure: a distribution with the interval from $\bar{x} - ks$ to $\bar{x} + ks$ marked; at least $1 - \frac{1}{k^2}$ of the data lies within it.]


Important Reminders!

Chebyshev’s theorem is very conservative and holds for any distribution of data

Chebyshev’s theorem also applies to any population

The two most common values used to describe a distribution of data are k = 2, 3

The table below lists some values for k and 1 - (1/k^2):

k              1.7    2      2.5    3
1 - (1/k^2)    0.65   0.75   0.84   0.89
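The table values follow directly from the bound 1 - (1/k^2). A minimal Python sketch (the function name `chebyshev_bound` is illustrative) recomputes them, rounded to two decimals:

```python
def chebyshev_bound(k):
    """Minimum proportion of any distribution within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k larger than 1")
    return 1 - 1 / k ** 2


# Recompute the table for the listed values of k
table = {k: round(chebyshev_bound(k), 2) for k in (1.7, 2, 2.5, 3)}

for k, bound in table.items():
    print(f"k = {k}: at least {bound}")
```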


Example: At the close of trading, a random sample of 35 technology stocks was selected. The mean selling price was 67.75 and the standard deviation was 12.3. Use Chebyshev’s theorem (with k = 2, 3) to describe the distribution.

Solutions:

Using k=2: At least 75% of the observations lie within 2 standard deviations of the mean:

$(\bar{x} - 2s, \bar{x} + 2s) = (67.75 - 2(12.3), 67.75 + 2(12.3)) = (43.15, 92.35)$

Using k=3: At least 89% of the observations lie within 3 standard deviations of the mean:

$(\bar{x} - 3s, \bar{x} + 3s) = (67.75 - 3(12.3), 67.75 + 3(12.3)) = (30.85, 104.65)$
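The two intervals for the stock example can be verified with a short Python sketch. `chebyshev_interval` is a hypothetical helper written for this illustration; it returns the interval limits together with the minimum proportion guaranteed by Chebyshev's theorem.

```python
def chebyshev_interval(mean, st_dev, k):
    """Return (lower, upper, minimum proportion) for k standard deviations about the mean."""
    lower = round(mean - k * st_dev, 2)
    upper = round(mean + k * st_dev, 2)
    return lower, upper, 1 - 1 / k ** 2


mean, st_dev = 67.75, 12.3   # selling prices from the example

k2 = chebyshev_interval(mean, st_dev, 2)
k3 = chebyshev_interval(mean, st_dev, 3)

for k, (lower, upper, bound) in ((2, k2), (3, k3)):
    print(f"k = {k}: at least {bound:.0%} of prices lie in ({lower}, {upper})")
```

Because the theorem makes no assumption about the shape of the distribution, these percentages are lower bounds rather than estimates.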


2.8 ~ The Art of Statistical Deception

Good Arithmetic, Bad Statistics

Misleading Graphs

Insufficient Information


Good Arithmetic, Bad Statistics

• The mean can be greatly influenced by outliers

– Example: The mean salary for all NBA players is $15.5 million
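The effect of an outlier on the mean can be seen in a small Python sketch. The salaries below are hypothetical numbers chosen for illustration, not actual NBA data: five modest values and one very large one.

```python
import statistics

# Hypothetical salaries in millions of dollars (illustrative only, not real data)
salaries = [1.0, 1.5, 2.0, 2.5, 3.0, 40.0]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)

print("Mean salary:  ", round(mean_salary, 2), "million")
print("Median salary:", median_salary, "million")
```

In this made-up sample the single large salary pulls the mean above every other value in the list, while the median stays near the typical salary.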

Misleading graphs:

1. The frequency scale should start at zero to present a complete picture. Graphs that do not start at zero are used to save space.

2. Graphs that start at zero emphasize the size of the numbers involved

3. Graphs that are chopped off emphasize variation


[Figure: "Flight Cancellations" chart; Number of Cancellations (vertical axis, 0 to 35) by Year (1996, 1998, 2000, 2002). The frequency scale starts at zero.]


[Figure: "Flight Cancellations" chart redrawn with a chopped-off scale; Number of Cancellations (vertical axis, 27 to 35) by Year (1996, 1998, 2000, 2002).]


Insufficient Information

• Example: An admissions officer from a state school explains that the average tuition at a nearby private university is $13,000 and only $4500 at his school. This makes the state school look more attractive.

– If most students pay the full tuition, then the state school appears to be a better choice

– However, if most students at the private university receive substantial financial aid, then the actual tuition cost could be much lower!