1 ES Chapter 1 & 2 ~ Descriptive Analysis & Presentation of Single-Variable Data Mean: 60.07 inches...


ES Chapter 1 & 2 ~ Descriptive Analysis & Presentation of Single-Variable Data

Mean: 60.07 inches
Median: 62.50 inches
Range: 42 inches
Variance: 117.681
Standard deviation: 10.85 inches
Minimum: 36 inches
Maximum: 78 inches
First quartile: 51.63 inches
Third quartile: 67.38 inches
Count: 58 bears
Sum: 3438.1 inches

[Figure: histogram titled "Black Bears"; horizontal axis: Length in Inches; vertical axis: Frequency]


ES Chapter 1 & 2: Goals

• Learn how to present and describe sets of data

• Learn measures of central tendency, measures of dispersion (spread), measures of position, and types of distributions

• Learn how to interpret findings so that we know what the data is telling us about the sampled population


ES Graphic Presentation of Data

• Use initial exploratory data-analysis techniques to produce a pictorial representation of the data

• Resulting displays reveal patterns of behavior of the variable being studied

• The method used is determined by the type of data and the idea to be presented

• No single correct answer when constructing a graphic display


ES Circle Graphs & Bar Graphs

Circle Graphs and Bar Graphs: Graphs that are used to summarize attribute data

• Circle graphs (pie diagrams) show the amount of data that belongs to each category as a proportional part of a circle

• Bar graphs show the amount of data that belongs to each category as proportionally sized rectangular areas


ES Example

Example: The table below lists the number of automobiles sold last week by day for a local dealership. Describe the data using a circle graph and a bar graph:

Day          Number Sold
Monday       15
Tuesday      23
Wednesday    35
Thursday     11
Friday       12
Saturday     42


ES Circle Graph Solution

[Figure: circle graph titled "Automobiles Sold Last Week"]
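A circle graph gives each category a sector proportional to its share of the total. The arithmetic behind the sectors for the automobile sales table can be sketched roughly as follows. The variable names are illustrative and not from the slides.

```python
# Daily automobile sales from the example table.
sales = {
    "Monday": 15,
    "Tuesday": 23,
    "Wednesday": 35,
    "Thursday": 11,
    "Friday": 12,
    "Saturday": 42,
}

total = sum(sales.values())  # total cars sold during the week

# A sector's central angle is the category's share of the total times 360 degrees.
sectors = {day: 360 * count / total for day, count in sales.items()}

for day, count in sales.items():
    share = count / total
    print(f"{day:<10} {count:>3}  {share:6.1%}  {sectors[day]:6.1f} degrees")
```

The angles sum to 360 degrees because the shares sum to 1.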


ES Bar Graph Solution

[Figure: bar graph titled "Automobiles Sold Last Week"; vertical axis: Frequency]
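As a plain-text stand-in for the bar graph, the sketch below draws one bar per day with a length equal to the number sold. The helper name `bar_line` and the `#` symbol are arbitrary choices for illustration.

```python
# Daily automobile sales from the example table, in weekday order.
sales = [
    ("Monday", 15),
    ("Tuesday", 23),
    ("Wednesday", 35),
    ("Thursday", 11),
    ("Friday", 12),
    ("Saturday", 42),
]


def bar_line(label, count, label_width=10):
    """Return one horizontal bar: the label, then one '#' per car sold."""
    return f"{label:<{label_width}} | {'#' * count} {count}"


lines = [bar_line(day, count) for day, count in sales]

print("Automobiles Sold Last Week")
for line in lines:
    print(line)
```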


ES Key Definitions

Quantitative Data: One reason for constructing a graph of quantitative data is to examine the distribution: is the data compact, spread out, skewed, symmetric, etc.?

Distribution: The pattern of variability displayed by the data of a variable. The distribution displays the frequency of each value of the variable.

Dotplot Display: Displays the data of a sample by representing each piece of data with a dot positioned along a scale. This scale can be either horizontal or vertical. The frequency of the values is represented along the other scale.


ES Example

Example: A random sample of the lifetime (in years) of 49 home washing machines is given below:

 2.5   8.9  12.2   4.1  18.1   1.6  12.2
16.9   2.5   3.5   0.4   2.6   2.2   4.0
 4.5   6.4   2.9   3.3   4.4   9.2   4.1
 0.9  14.5   4.0   0.9   7.2   5.2   1.8
 1.5   0.7   3.7   4.2   6.9  15.3  21.8
17.8   7.3   6.8   3.3   7.0   4.0  18.3
 8.5   1.4   7.4   4.7   0.7  10.4   3.6

The figure below is a dot plot for the 49 lifetimes:

[Figure: "Dotplot of Lifetimes"; horizontal axis: Lifetimes, with ticks at 0, 3, 6, 9, 12, 15, 18, 21]

Note: Notice how the data is “bunched” near the lower extreme and more “spread out” near the higher extreme
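A rough text version of a dotplot for the 49 lifetimes can be sketched as follows. To fit a character grid, each lifetime is binned to its whole number of years. That binning is a simplification assumed here, not something the slides specify.

```python
from collections import Counter

# The 49 washing machine lifetimes (years) from the example.
lifetimes = [
    2.5, 8.9, 12.2, 4.1, 18.1, 1.6, 12.2,
    16.9, 2.5, 3.5, 0.4, 2.6, 2.2, 4.0,
    4.5, 6.4, 2.9, 3.3, 4.4, 9.2, 4.1,
    0.9, 14.5, 4.0, 0.9, 7.2, 5.2, 1.8,
    1.5, 0.7, 3.7, 4.2, 6.9, 15.3, 21.8,
    17.8, 7.3, 6.8, 3.3, 7.0, 4.0, 18.3,
    8.5, 1.4, 7.4, 4.7, 0.7, 10.4, 3.6,
]

# Assumption: truncate each value to whole years so dots can be stacked in columns.
counts = Counter(int(x) for x in lifetimes)
tallest = max(counts.values())
last_year = max(counts)

# Build the plot from the top row down; a column gets a dot if its count reaches that level.
rows = []
for level in range(tallest, 0, -1):
    rows.append("".join(" * " if counts[year] >= level else "   "
                        for year in range(last_year + 1)))

axis = "".join(f"{year:^3}" for year in range(last_year + 1))

print("\n".join(rows))
print(axis)
```

The stacked columns make the bunching near the low end of the scale visible, with isolated dots toward the high end.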


ES Stem & Leaf Display

Stem-and-Leaf Display: Pictures the data of a sample using the actual digits that make up the data values. Each numerical data value is divided into two parts: the leading digit(s) become the stem, and the trailing digit(s) become the leaf. The stems are located along the main axis, and a leaf for each piece of data is located so as to display the distribution of the data.

• Background:

– The stem-and-leaf display has become very popular for summarizing numerical data

– It is a combination of graphing and sorting

– The actual data is part of the graph

– Well-suited for computers


ES Example

Example: A city police officer, using radar, checked the speed of cars as they were traveling down the main street in town. Construct a stem-and-leaf plot for this data:

41 31 33 35 36 37 39 49 33 19 26 27 24 32 40 39 16 55 38 36

Solution: All the speeds are in the 10s, 20s, 30s, 40s, and 50s. Use the first digit of each speed as the stem and the second digit as the leaf. Draw a vertical line and list the stems, in order, to the left of the line. Place each leaf on its stem: place the trailing digit on the right side of the vertical line opposite its corresponding leading digit.


ES Example

20 Speeds
---------------------------------------
1 | 6 9
2 | 4 6 7
3 | 1 2 3 3 5 6 6 7 8 9 9
4 | 0 1 9
5 | 5
---------------------------------------

• The speeds are centered around the 30s

Note: The display could be constructed so that only five possible values (instead of ten) could fall in each stem. What would the stems look like? Would there be a difference in appearance?
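The construction can be sketched in a few lines of code. The function below is a hypothetical helper, assuming two-digit whole-number data. With `split=True` it gives one possible answer to the note by splitting each stem into a row for leaves 0-4 and a row for leaves 5-9.

```python
from collections import defaultdict

# The 20 radar speeds from the example.
speeds = [41, 31, 33, 35, 36, 37, 39, 49, 33, 19,
          26, 27, 24, 32, 40, 39, 16, 55, 38, 36]


def stem_and_leaf(values, split=False):
    """Return the display as a list of lines, one per stem (two per stem if split)."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)         # leading digit, trailing digit
        half = leaf // 5 if split else 0   # 0 for leaves 0-4, 1 for leaves 5-9
        rows[(stem, half)].append(leaf)

    halves = (0, 1) if split else (0,)
    lines = []
    for stem in range(min(values) // 10, max(values) // 10 + 1):
        for half in halves:
            leaves = " ".join(str(leaf) for leaf in rows.get((stem, half), []))
            lines.append(f"{stem} | {leaves}".rstrip())
    return lines


display = stem_and_leaf(speeds)
split_display = stem_and_leaf(speeds, split=True)

print("\n".join(display))
print()
print("\n".join(split_display))
```

The split version has ten rows instead of five, so the mound in the 30s is spread over two rows and the display looks longer and flatter.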


ES Remember!

1. It is fairly typical of many variables to display a distribution that is concentrated (mounded) about a central value and then is dispersed in some manner in both directions. (Why?)

2. A display that indicates two “mounds” may really be two overlapping distributions

3. A back-to-back stem-and-leaf display makes it possible to compare two distributions graphically

4. A side-by-side dotplot is also useful for comparing two distributions


ES Frequency Distributions & Histograms

• Stem-and-leaf plots often present adequate summaries, but they can get very big, very fast

• Need other techniques for summarizing data

• Frequency distributions and histograms are used to summarize large data sets


ES Frequency Distributions

Frequency Distribution: A listing, often expressed in chart form, that pairs each value of a variable with its frequency

1. A table that summarizes data by classes, or class intervals

2. In a typical grouped frequency distribution, there are usually 5-12 classes of equal width

3. The table may contain columns for class number, class interval, tally (if constructing by hand), frequency, relative frequency, cumulative relative frequency, and class midpoint

4. In an ungrouped frequency distribution each class consists of a single value

Ungrouped Frequency Distribution: Each value of x in the distribution stands alone

Grouped Frequency Distribution: Group the values into a set of classes


ES Frequency Distribution

Guidelines for constructing a frequency distribution:

1. All classes should be of the same width

2. Classes should be set up so that they do not overlap and so that each piece of data belongs to exactly one class

3. For problems in the text, 5-12 classes are most desirable. The square root of n is a reasonable guideline for the number of classes if n is less than 150.

4. Use a system that takes advantage of a number pattern, to guarantee accuracy

5. If possible, an even class width is often advantageous


ES Frequency Distributions

Procedure for constructing a frequency distribution:

1. Identify the high (H) and low (L) scores. Find the range. Range = H - L

2. Select a number of classes and a class width so that the product is a bit larger than the range

3. Pick a starting point a little smaller than L. Count from the starting point by the width to obtain the class boundaries. Observations that fall on class boundaries are placed into the class interval to the right.
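The three steps can be sketched in code. The inputs are the low and high scores and sample size of the hemoglobin example (L = 4.0, H = 9.2, n = 40). The width 1.0 and starting point 3.7 match the classes used in that example. The function names are hypothetical, and `Decimal` is used only so that boundary comparisons are exact.

```python
import math
from decimal import Decimal


def class_boundaries(start, width, num_classes):
    """Step 3: count from the starting point by the class width."""
    return [start + i * width for i in range(num_classes + 1)]


def class_index(x, boundaries):
    """Return the class holding x; a value on a boundary goes to the class on its right."""
    for i in range(len(boundaries) - 1):
        if boundaries[i] <= x < boundaries[i + 1]:
            return i
    raise ValueError(f"{x} lies outside the class boundaries")


# Step 1: high and low scores, and the range.
low, high, n = Decimal("4.0"), Decimal("9.2"), 40
data_range = high - low

# Step 2: the square root of n suggests about 6 classes; a width of 1.0 gives
# 6 * 1.0 = 6.0, a bit larger than the range of 5.2.
num_classes = round(math.sqrt(n))
width = Decimal("1.0")
covers_range = num_classes * width > data_range

# Step 3: start a little below L.
start = Decimal("3.7")
boundaries = class_boundaries(start, width, num_classes)

print("Range:", data_range)
print("Boundaries:", [str(b) for b in boundaries])
print("5.7 falls in class number", class_index(Decimal("5.7"), boundaries) + 1)
```

A value such as 5.7, which sits exactly on a boundary, is counted in the class 5.7 - <6.7 rather than 4.7 - <5.7.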


ES Example

Example: The hemoglobin test, a blood test given to diabetics during their periodic checkups, indicates the level of control of blood sugar during the past two to three months. The data in the table below was obtained for 40 different diabetics at a university clinic that treats diabetic patients:

6.5  5.0  5.6  7.6  4.8  8.0  7.5  7.9  8.0  9.2
6.4  6.0  5.6  6.0  5.7  9.2  8.1  8.0  6.5  6.6
5.0  8.0  6.5  6.1  6.4  6.6  7.2  5.9  4.0  5.7
7.9  6.0  5.6  6.0  6.2  7.7  6.7  7.7  8.2  9.0

1) Construct a grouped frequency distribution using the classes 3.7 - <4.7, 4.7 - <5.7, 5.7 - <6.7, etc.

2) Which class has the highest frequency?


ES Solutions

1)

Class          Frequency   Relative    Cumulative       Class
Boundaries     f           Frequency   Rel. Frequency   Midpoint, x
---------------------------------------------------------------------
3.7 - <4.7      1          0.025       0.025            4.2
4.7 - <5.7      6          0.150       0.175            5.2
5.7 - <6.7     16          0.400       0.575            6.2
6.7 - <7.7      4          0.100       0.675            7.2
7.7 - <8.7     10          0.250       0.925            8.2
8.7 - <9.7      3          0.075       1.000            9.2

2) The class 5.7 - <6.7 has the highest frequency. The frequency is 16 and the relative frequency is 0.40
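The table can be reproduced from the 40 hemoglobin readings with a short tally. This is a sketch rather than the method used to prepare the slides, and the variable names are illustrative. `Decimal` keeps values such as 5.7 exactly on their class boundaries.

```python
from decimal import Decimal
from itertools import accumulate

# The 40 hemoglobin test results from the example.
raw = """
6.5 5.0 5.6 7.6 4.8 8.0 7.5 7.9 8.0 9.2
6.4 6.0 5.6 6.0 5.7 9.2 8.1 8.0 6.5 6.6
5.0 8.0 6.5 6.1 6.4 6.6 7.2 5.9 4.0 5.7
7.9 6.0 5.6 6.0 6.2 7.7 6.7 7.7 8.2 9.0
"""
data = [Decimal(s) for s in raw.split()]
n = len(data)

start, width, num_classes = Decimal("3.7"), Decimal("1.0"), 6
lowers = [start + i * width for i in range(num_classes)]

# A value equal to a lower boundary belongs to that class (lower <= x < upper).
freqs = [sum(1 for x in data if lo <= x < lo + width) for lo in lowers]
rel_freqs = [Decimal(f) / n for f in freqs]
cum_rel_freqs = list(accumulate(rel_freqs))
midpoints = [lo + width / 2 for lo in lowers]

for lo, f, r, c, m in zip(lowers, freqs, rel_freqs, cum_rel_freqs, midpoints):
    print(f"{lo} - <{lo + width}  {f:>2}  {r:.3f}  {c:.3f}  {m}")

# The modal class is the one with the greatest frequency.
modal_lower = lowers[freqs.index(max(freqs))]
print(f"Highest frequency: {modal_lower} - <{modal_lower + width}")
```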


ES Histogram

Histogram: A bar graph representing a frequency distribution of a quantitative variable. A histogram is made up of the following components:

1. A title, which identifies the population of interest

2. A vertical scale, which identifies the frequencies in the various classes

3. A horizontal scale, which identifies the variable x. Values for the class boundaries or class midpoints may be labeled along the x-axis. Use whichever method of labeling the axis best presents the variable.

Notes:

• The relative frequency is sometimes used on the vertical scale

• It is possible to create a histogram based on class midpoints


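ES Example

Example: Construct a histogram for the blood test results given in the previous example

Solution:

[Figure: histogram titled "The Hemoglobin Test"; horizontal axis: Blood Test, labeled at the class midpoints 4.2, 5.2, 6.2, 7.2, 8.2, 9.2; vertical axis: Frequency, with ticks at 0, 5, 10, 15]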


ES Example

Example: A recent survey of Roman Catholic nuns summarized their ages in the table below. Construct a histogram for this age data:

Age            Frequency   Class Midpoint
------------------------------------------
20 up to 30     34         25
30 up to 40     58         35
40 up to 50     76         45
50 up to 60    187         55
60 up to 70    254         65
70 up to 80    241         75
80 up to 90    147         85


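ES Solution

[Figure: histogram titled "Roman Catholic Nuns"; horizontal axis: Age, labeled at the class midpoints 25, 35, 45, 55, 65, 75, 85; vertical axis: Frequency, with ticks at 0, 100, 200]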
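A plain-text approximation of this histogram can be built directly from the age table. In the sketch below each `#` stands for roughly 10 nuns. That scale is an arbitrary display choice, not part of the original figure.

```python
# (lower bound, upper bound, frequency) for each age class in the table.
classes = [
    (20, 30, 34),
    (30, 40, 58),
    (40, 50, 76),
    (50, 60, 187),
    (60, 70, 254),
    (70, 80, 241),
    (80, 90, 147),
]

scale = 10  # assumption: one '#' per 10 nuns, rounded
total = sum(freq for _, _, freq in classes)

lines = []
for lower, upper, freq in classes:
    midpoint = (lower + upper) // 2
    bar = "#" * round(freq / scale)
    lines.append(f"{midpoint:>3} | {bar} {freq}")

modal_class = max(classes, key=lambda c: c[2])

print("Roman Catholic Nuns (age at class midpoint)")
print("\n".join(lines))
print(f"Total surveyed: {total}")
print(f"Modal class: {modal_class[0]} up to {modal_class[1]}")
```

The bars peak in the 60 up to 70 class and trail off more slowly toward the younger ages, so the longer tail lies on the low side.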


ES Terms Used to Describe Histograms

Symmetrical: Both sides of the distribution are identical mirror images. There is a line of symmetry.

Uniform (Rectangular): Every value appears with equal frequency

Skewed: One tail is stretched out longer than the other. The direction of skewness is on the side of the longer tail. (Positively skewed vs. negatively skewed)

J-Shaped: There is no tail on the side of the class with the highest frequency

Bimodal: The two largest classes are separated by one or more classes. Often implies two populations are sampled.

Normal: A symmetrical distribution that is mounded about the mean and becomes sparse at the extremes


ES Important Reminders

The mode is the value that occurs with greatest frequency (discussed in Section 2.3)

The modal class is the class with the greatest frequency

A bimodal distribution has two high-frequency classes separated by classes with lower frequencies

Graphical representations of data should include a descriptive, meaningful title and proper identification of the vertical and horizontal scales


ES Cumulative Frequency Distribution

Cumulative Frequency Distribution: A frequency distribution that pairs cumulative frequencies with values of the variable

• The cumulative frequency for any given class is the sum of the frequency for that class and the frequencies of all classes of smaller values

• The cumulative relative frequency for any given class is the sum of the relative frequency for that class and the relative frequencies of all classes of smaller values


ES Example

Example: A computer science aptitude test was given to 50 students. The table below summarizes the data:

Class                     Relative    Cumulative   Cumulative
Boundaries    Frequency   Frequency   Frequency    Rel. Frequency
-------------------------------------------------------------------
 0 up to 4     4          0.08         4           0.08
 4 up to 8     8          0.16        12           0.24
 8 up to 12    8          0.16        20           0.40
12 up to 16   20          0.40        40           0.80
16 up to 20    6          0.12        46           0.92
20 up to 24    3          0.06        49           0.98
24 up to 28    1          0.02        50           1.00
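The cumulative columns are running sums of the frequency and relative frequency columns. A minimal sketch of that calculation for the aptitude test table follows, with illustrative variable names.

```python
from itertools import accumulate

# Class labels and frequencies from the aptitude test table.
labels = ["0 up to 4", "4 up to 8", "8 up to 12", "12 up to 16",
          "16 up to 20", "20 up to 24", "24 up to 28"]
freqs = [4, 8, 8, 20, 6, 3, 1]

n = sum(freqs)  # number of students

# Cumulative frequency: this class plus all classes of smaller values.
cum_freqs = list(accumulate(freqs))

rel_freqs = [f / n for f in freqs]
# Dividing the cumulative frequency by n gives the cumulative relative frequency.
cum_rel_freqs = [c / n for c in cum_freqs]

for label, f, r, c, cr in zip(labels, freqs, rel_freqs, cum_freqs, cum_rel_freqs):
    print(f"{label:<12} {f:>3}  {r:.2f}  {c:>3}  {cr:.2f}")
```

The last cumulative frequency equals the sample size, and the last cumulative relative frequency equals 1.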