Displaying quantitative data with graphs


Transcript of Displaying quantitative data with graphs

Page 1: Displaying quantitative data with graphs

Displaying Quantitative Data with Graphs

Page 2: Displaying quantitative data with graphs

What you’ll learn

To create and interpret the following graphs:
• Dotplot
• Stem and leaf
  • Regular Stem and Leaf
  • Split Stem and Leaf
  • Back-to-Back Stem and Leaf
• Histogram
• Time Plot
• Ogive

Page 3: Displaying quantitative data with graphs

To learn how to display and describe quantitative data we will be using some baseball statistics. The following table shows the number of home runs in a single season for three well-known baseball players: Hank Aaron, Barry Bonds, and Babe Ruth.

Hank Aaron     Barry Bonds     Babe Ruth
13    32       16    40        54    46
27    44       25    37        59    41
26    39       24    34        35    34
44    29       19    49        41    22
30    44       33    73        46
39    38       25              25
40    47       34              47
34    34       46              60
45    40       37              54
44    20       33              46
24             42              49

Page 4: Displaying quantitative data with graphs

Dotplot
• Label the horizontal axis with the name of the variable and title the graph
• Scale the axis based on the values of the variable
• Mark a dot (we’ll use x’s) above the number on the axis corresponding to each data value

[Dot plot: Ruth, Number of Home Runs in a Single Season; horizontal axis from 20 to 60]
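The dotplot steps can be tried out in a few lines of Python. This is only a rough text sketch: the function name `dotplot` and the character layout are assumptions made for illustration, and the data are Babe Ruth's seasons as listed in the table earlier.

```python
from collections import Counter

# Babe Ruth's single-season home run totals, copied from the table above.
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]

def dotplot(values, title):
    """Return a text dot plot with one 'x' stacked above each data value."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    width = hi - lo + 1                      # one text column per possible value
    rows = []
    # Build the rows from the top of the tallest stack down to the axis.
    for level in range(max(counts.values()), 0, -1):
        row = "".join("x" if counts[v] >= level else " " for v in range(lo, hi + 1))
        rows.append(row.rstrip())
    rows.append("-" * width)                 # the horizontal axis
    gap = width - len(str(lo)) - len(str(hi))
    rows.append(f"{lo}{' ' * gap}{hi}")      # label only the two ends of the axis
    rows.append(title)
    return "\n".join(rows)

print(dotplot(ruth, "Ruth: Number of Home Runs in a Single Season"))
```

Repeated values, such as the three seasons with 46 home runs, show up as taller stacks of x's.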

Page 5: Displaying quantitative data with graphs

Describing a Distribution

We describe a distribution (the values the variable takes on and how often it takes these values) using the acronym SOCS.

Shape: We describe the shape of a distribution in one of two ways:
• Symmetric/Approx. Symmetric

[Dot plots: a symmetric distribution (axis from -3 to 3) and a uniform distribution (axis from -3 to 4)]

Page 6: Displaying quantitative data with graphs

• Skewed
  • Right
  • Left

Notice that the direction of the “skew” is the same direction as the “tail”.

[Dot plots: a left-skewed distribution and a right-skewed distribution, each with its “tail” labeled]

Page 7: Displaying quantitative data with graphs

Outliers: These are observations that we would consider “unusual”: pieces of data that don’t “fit” the overall pattern of the data.

Babe Ruth had two seasons that appear to be somewhat different from the rest of his career. These may be “outliers”.

(We’ll learn a numerical way to determine if observations are truly “unusual” later.)

The season in which Barry Bonds hit 73 home runs does not appear to fit the overall pattern. This piece of data may be an outlier.

[Dot plot: Bonds, Number of Home Runs in a Single Season; horizontal axis from 10 to 80; labeled “Unusual observation???”]

[Dot plot: Ruth, Number of Home Runs in a Single Season; horizontal axis from 20 to 65; labeled “Unusual observation???”]

Page 8: Displaying quantitative data with graphs

Center: A single value that describes the entire distribution. A “typical” value that gives a concise summary of the whole batch of numbers.

A typical season for Babe Ruth appears to be approximately 46 home runs.

[Dot plot: Ruth, Number of Home Runs in a Single Season; horizontal axis from 20 to 65]

*We’ll learn about three different numerical measures of center in the next section

Page 9: Displaying quantitative data with graphs

Spread: Since we know that not everyone is typical, we also need to talk about the variation of a distribution. We need to discuss whether the values of the distribution are tightly clustered around the center, making prediction easy, or vary a great deal from the center, making prediction more difficult.

[Dot plot: Ruth, Number of Home Runs in a Single Season; horizontal axis from 20 to 65]

Babe Ruth’s number of home runs in a single season varies from a low of 22 to a high of 60.

*We’ll learn about three different numerical measures of spread in the next section.

Page 10: Displaying quantitative data with graphs

Distribution Description using SOCS

The distribution of Babe Ruth’s number of home runs in a single season is approximately symmetric [1] with two possible unusual observations at 22 and 25 home runs. [2] He typically hits about 46 [3] home runs in a season. Over his career, the number of home runs has varied from a low of 22 to a high of 60. [4]

[1] Shape   [2] Outliers   [3] Center   [4] Spread
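The center and spread in this description can be checked against the table of Ruth's seasons. A minimal Python sketch follows. Using the median as the "typical" value is an assumption made here, since the slides leave numerical measures of center for the next section.

```python
import statistics

# Babe Ruth's single-season home run totals, copied from the table on page 3.
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]

center = statistics.median(ruth)   # assumption: median as the "typical" season
low, high = min(ruth), max(ruth)   # spread described as lowest to highest value

print(f"typical season: about {center} home runs")
print(f"spread: from a low of {low} to a high of {high}")
```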

Page 11: Displaying quantitative data with graphs

Stem and Leaf Plot

Creating a stem and leaf plot
• Order the data points from least to greatest
• Separate each observation into a stem (all but the rightmost digit) and a leaf (the final digit). Ex. 123 -> 12 (stem) : 3 (leaf)
• In a T-chart, write the stems vertically in increasing order on the left side of the chart.
• On the right side of the chart, write each leaf to the right of its stem, spacing the leaves equally
• Include a key and title for the graph

Hank Aaron
Number of Home Runs in a Single Season

1 | 3
2 | 0 4 6 7 9
3 | 0 2 4 4 8 9 9
4 | 0 0 4 4 4 4 5 7

Key: 4 | 6 = 46
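The same steps can be sketched in Python. The function name `stemplot` is made up for this illustration, and the sketch assumes non-negative whole numbers, so the stem is everything but the final digit.

```python
from collections import defaultdict

# Hank Aaron's single-season home run totals, copied from the table on page 3.
aaron = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24,
         32, 44, 39, 29, 44, 38, 47, 34, 40, 20]

def stemplot(values):
    """Return the lines of a regular stem and leaf plot (leaf = final digit)."""
    leaves = defaultdict(list)
    for v in sorted(values):             # order the data from least to greatest
        stem, leaf = divmod(v, 10)       # split off the final digit as the leaf
        leaves[stem].append(leaf)
    lines = []
    # Write every stem in increasing order, even one that has no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem} | {row}".rstrip())
    return lines

print("Hank Aaron: Number of Home Runs in a Single Season")
print("\n".join(stemplot(aaron)))
print("Key: 4 | 6 = 46")
```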

Page 12: Displaying quantitative data with graphs

Split Stem and Leaf Plot
• If the data in a distribution is concentrated in just a few stems, the picture may be more descriptive if we “split” the stems
• When we “split” stems we want the same number of digits to be possible in each stem. This means that each original stem can be split into 2 or 5 new stems.
• A good rule of thumb is to have a minimum of 5 stems overall
• Let’s look at how splitting stems changes the look of the distribution of Hank Aaron’s home run data.

Page 13: Displaying quantitative data with graphs

• Split each stem into 2 new stems. This means that the first stem includes the leaves 0-4 and the second stem has the leaves 5-9
• Splitting the stems helps us to “see” the shape of the distribution in this case.

Hank Aaron
Number of Home Runs in a Single Season

1 | 3
1 |
2 | 0 4
2 | 6 7 9
3 | 0 2 4 4
3 | 8 9 9
4 | 0 0 4 4 4 4
4 | 5 7

Key: 4 | 6 = 46
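Splitting each stem in two can be sketched the same way. The function name `split_stemplot` is hypothetical, and the sketch only covers the two-way split described here (leaves 0-4, then leaves 5-9).

```python
# Hank Aaron's single-season home run totals, copied from the table on page 3.
aaron = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24,
         32, 44, 39, 29, 44, 38, 47, 34, 40, 20]

def split_stemplot(values):
    """Stem and leaf plot with each stem split in two: leaves 0-4, then 5-9."""
    ordered = sorted(values)
    lines = []
    for stem in range(min(values) // 10, max(values) // 10 + 1):
        for first, last in ((0, 4), (5, 9)):
            # Keep only the leaves that belong to this half of the stem.
            row = " ".join(str(v % 10) for v in ordered
                           if v // 10 == stem and first <= v % 10 <= last)
            lines.append(f"{stem} | {row}".rstrip())
    return lines

print("\n".join(split_stemplot(aaron)))
print("Key: 4 | 6 = 46")
```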

Page 14: Displaying quantitative data with graphs

Back-to-Back Stem and Leaf

• Back-to-Back stem and leaf plots allow us to quickly compare two distributions.
• Use SOCS to make comparisons between distributions

Number of Home Runs in a Single Season

      Aaron |   | Ruth
          3 | 1 |
            | 1 |
        4 0 | 2 | 2
      9 7 6 | 2 | 5
    4 4 2 0 | 3 | 4
      9 9 8 | 3 | 5
4 4 4 4 0 0 | 4 | 1 1
        7 5 | 4 | 6 6 6 7 9
            | 5 | 4 4
            | 5 | 9
            | 6 | 0

Key: 4 | 6 = 46
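A back-to-back plot with split stems can be sketched in Python as well. The helper names `half_rows` and `back_to_back` are invented for this example. Unlike the slide, the sketch prints both halves of every stem, so an empty upper half of stem 6 appears as a final row.

```python
# Single-season home run totals, copied from the table on page 3.
aaron = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24,
         32, 44, 39, 29, 44, 38, 47, 34, 40, 20]
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]

def half_rows(values, stems):
    """Map each (stem, half) to its sorted leaves; half 0 is 0-4, half 1 is 5-9."""
    rows = {(s, h): [] for s in stems for h in (0, 1)}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[(stem, leaf // 5)].append(str(leaf))
    return rows

def back_to_back(left, right):
    """Return lines with left leaves | stem | right leaves, sharing one set of stems."""
    both = left + right
    stems = range(min(both) // 10, max(both) // 10 + 1)
    l_rows, r_rows = half_rows(left, stems), half_rows(right, stems)
    width = max(len(" ".join(r)) for r in l_rows.values())
    lines = []
    for key in sorted(l_rows):
        l = " ".join(reversed(l_rows[key]))   # left leaves grow outward from the stem
        r = " ".join(r_rows[key])
        lines.append(f"{l:>{width}} | {key[0]} | {r}".rstrip())
    return lines

print("\n".join(back_to_back(aaron, ruth)))
```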

Page 15: Displaying quantitative data with graphs

Advantages and Disadvantages of dotplots/stem and leaf plots

Advantages
• Preserves each piece of data
• Shows features of the distribution with regards to shape, such as clusters, gaps, outliers, etc.

Disadvantages
• If creating by hand, large data sets can be cumbersome
• Data that is widely varied may be difficult to graph

Page 16: Displaying quantitative data with graphs

Histograms

• A histogram is one of the most common graphs used for quantitative variables.
• Although a histogram looks like a bar chart there are some important differences
  • In a histogram, the “bars” touch each other
  • Histograms do not necessarily preserve individual data pieces
  • Changing the “scale” or “bin width” can drastically alter the picture of the distribution, so caution must be used when describing a distribution when only a histogram has been used

Page 17: Displaying quantitative data with graphs

Creating a histogram

• Divide the range of data into classes of equal width. Count the number of observations in each class. (Remember that the width is somewhat arbitrary and you might choose a different width than someone else.)
• Barry Bonds:
  • Data ranges from 16 to 73, so we choose for our classes
      15 ≤ # of HR ≤ 24
      ...
      65 ≤ # of HR ≤ 74
  • We can then determine the counts for each “bin”

Page 18: Displaying quantitative data with graphs

• The horizontal axis represents the variable values, so using the lower bound of each class to scale is appropriate.
• The vertical axis can represent
  • Frequency
  • Relative frequency
  • Cumulative frequency
  • Relative cumulative frequency
• We’ll use frequency

So the frequency distribution looks like:

Class    Frequency
15-24    3
25-34    6
35-44    4
45-54    2
55-64    0
65-74    1
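The counts in this table can be reproduced from Barry Bonds's seasons. A small Python sketch, where `frequency_table` and its parameters are names chosen for this example, not anything from the slides:

```python
# Barry Bonds's single-season home run totals, copied from the table on page 3.
bonds = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73]

def frequency_table(values, start, width, n_classes):
    """Count how many values fall in each class lower..upper (both ends included)."""
    table = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width - 1
        count = sum(lower <= v <= upper for v in values)
        table.append((f"{lower}-{upper}", count))
    return table

for label, count in frequency_table(bonds, start=15, width=10, n_classes=6):
    print(f"{label:<9}{count}")
```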

Page 19: Displaying quantitative data with graphs

• Label and scale your axes. Title your graph
• Draw a bar that represents the frequency for each class. Remember that the bars of the histograms should touch each other.
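To get a feel for how the bars look, and for how much the class width matters, here is a rough Python sketch that draws the histogram sideways in text. The function name `text_histogram` is an assumption, and each `#` stands for one season.

```python
# Barry Bonds's single-season home run totals, copied from the table on page 3.
bonds = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73]

def text_histogram(values, start, width, stop):
    """Return one line per class; the bar length is the class frequency."""
    lines = []
    for lower in range(start, stop, width):
        upper = lower + width - 1
        count = sum(lower <= v <= upper for v in values)
        lines.append(f"{lower:>2}-{upper:<2} | {'#' * count}".rstrip())
    return lines

# Classes of width 10, as in the frequency table.
print("\n".join(text_histogram(bonds, start=15, width=10, stop=75)))
print()
# The same data with classes of width 5 gives a noticeably different picture.
print("\n".join(text_histogram(bonds, start=15, width=5, stop=75)))
```

Running both versions side by side is one way to see why a histogram's description depends on the chosen class width.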

Page 20: Displaying quantitative data with graphs

Interpretation

We interpret a histogram in the same way we interpret a dotplot or stem and leaf plot.

ALWAYS use SOCS:
• Shape
• Outliers
• Center
• Spread

Page 21: Displaying quantitative data with graphs

Time Plots

• Sometimes, our data is collected at intervals over time and we are looking for changes or patterns that have occurred.
• We use a time plot for this type of data
• A time plot uses both the horizontal and vertical axes.
  • The horizontal axis represents the time intervals
  • The vertical axis represents the variable values

Page 22: Displaying quantitative data with graphs

Creating a Time Plot

• Label and scale the axes. Title your graph.
• Plot a point corresponding to the data taken at each time interval
• A line segment drawn between each point may be helpful to see patterns in the data

Year   HR      Year   HR
1986   16      1994   37
1987   25      1995   33
1988   24      1996   42
1989   19      1997   40
1990   33      1998   37
1991   25      1999   34
1992   34      2000   49
1993   46      2001   73

[Time plot: Barry Bonds, home runs (vertical axis, 10 to 80) by year (horizontal axis, 1986 to 2002), points joined by line segments]

Page 23: Displaying quantitative data with graphs

Describing Time Plots
• When describing time plots, you should look for trends in the data
• Although the number of home runs does not show a constant increase from year to year, we note that overall the number of home runs hit by Barry Bonds has increased over time, with the most notable increase being between 1999 and 2001.

[Time plot: Barry Bonds, home runs (vertical axis, 10 to 80) by year (horizontal axis, 1986 to 2002), points joined by line segments]
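The trend described here can be checked by looking at the change from each season to the next. A short Python sketch, with variable names chosen only for this example:

```python
# Barry Bonds's home runs by year, copied from the time plot table.
bonds_by_year = {
    1986: 16, 1987: 25, 1988: 24, 1989: 19, 1990: 33, 1991: 25,
    1992: 34, 1993: 46, 1994: 37, 1995: 33, 1996: 42, 1997: 40,
    1998: 37, 1999: 34, 2000: 49, 2001: 73,
}

years = sorted(bonds_by_year)
# Change in home runs from each season to the next.
changes = {(a, b): bonds_by_year[b] - bonds_by_year[a]
           for a, b in zip(years, years[1:])}

increases = sum(1 for d in changes.values() if d > 0)
biggest = sorted(changes.items(), key=lambda item: item[1], reverse=True)[:2]

print(f"{increases} of {len(changes)} year-to-year changes are increases")
for (a, b), d in biggest:
    print(f"{a} to {b}: {d:+d} home runs")
```

The two largest jumps are 1999 to 2000 and 2000 to 2001, which matches the steep rise at the right end of the plot.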

Page 24: Displaying quantitative data with graphs

Relative frequency, Cumulative frequency, Percentiles, and Ogives

• Sometimes we are interested in describing the relative position of an observation
• For example: you have undoubtedly been told at one time or another that you scored at the 80th percentile. This means that 80% of the people taking the test scored the same as or lower than you did.
• How can we model this?

Page 25: Displaying quantitative data with graphs

Ogive (Relative cumulative frequency graph)

• We first start by creating a frequency table
• We’ll look at how each column is created in the next few slides

# of home runs               Relative     Cumulative   Relative Cumulative
in a season      Frequency   Frequency    Frequency    Frequency
15-24            3           0.1875       3            0.1875
25-34            6           0.375        9            0.5625
35-44            4           0.25         13           0.8125
45-54            2           0.125        15           0.9375
55-64            0           0.0          15           0.9375
65-74            1           0.0625       16           1.0000

Page 26: Displaying quantitative data with graphs

Relative Frequency

• The # of home runs… and the frequency are the same columns as we created for the histogram.
• To find the values for the “Relative Frequency” column find the following:

  Relative Frequency = Frequency Value / Total # of observations

# of home runs               Relative
in a season      Frequency   Frequency *
15-24            3           0.1875
25-34            6           0.375
35-44            4           0.25
45-54            2           0.125
55-64            0           0.0
65-74            1           0.0625

* Within rounding, this column should sum to 1

Page 27: Displaying quantitative data with graphs

Cumulative Frequency

• Cumulative frequency simply adds the counts in the frequency column that fall in or below the current class level.
• For example: to find the “13”, add the frequencies in the oval (the first three classes): 3 + 6 + 4 = 13

# of home runs               Relative     Cumulative
in a season      Frequency   Frequency    Frequency
15-24            3           0.1875       3
25-34            6           0.375        9
35-44            4           0.25         13
45-54            2           0.125        15
55-64            0           0.0          15
65-74            1           0.0625       16

Page 28: Displaying quantitative data with graphs

Relative Cumulative Frequency

• Relative cumulative frequency divides the cumulative frequency by the total number of observations
• For example: 0.8125 = 13/16

# of home runs               Relative     Cumulative   Relative Cumulative
in a season      Frequency   Frequency    Frequency    Frequency
15-24            3           0.1875       3            0.1875
25-34            6           0.375        9            0.5625
35-44            4           0.25         13           0.8125
45-54            2           0.125        15           0.9375
55-64            0           0.0          15           0.9375
65-74            1           0.0625       16           1.0000
Sum              16          1
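All three derived columns follow from the frequency column alone. A minimal Python sketch of the calculations, with variable names chosen for this example:

```python
from itertools import accumulate

classes = ["15-24", "25-34", "35-44", "45-54", "55-64", "65-74"]
frequency = [3, 6, 4, 2, 0, 1]          # counts from the histogram table

total = sum(frequency)                                  # 16 seasons in all
relative = [f / total for f in frequency]               # frequency / total
cumulative = list(accumulate(frequency))                # running total of the counts
relative_cumulative = [c / total for c in cumulative]   # cumulative / total

for row in zip(classes, frequency, relative, cumulative, relative_cumulative):
    print("{:<7}{:>3}{:>9.4f}{:>4}{:>9.4f}".format(*row))
```

The last cumulative value equals the total number of observations, which is why the last relative cumulative frequency is 1.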

Page 29: Displaying quantitative data with graphs

Creating the Ogive
• Label and scale the axes
  • Horizontal: Variable
  • Vertical: Relative Cumulative Frequency (percentile)
• Plot a point corresponding to the relative cumulative frequency in each class interval at the left endpoint of the next class interval
• The last point you should plot should be at a height of 100%

Page 30: Displaying quantitative data with graphs

# of home runs   Relative Cumulative
in a season      Frequency *
15-24            0.1875
25-34            0.5625
35-44            0.8125
45-54            0.9375
55-64            0.9375
65-74            1.0000

A line segment from point to point can be added for analysis
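The points to plot can be listed with a few lines of Python. This is a sketch under one assumption not stated on the slides: the graph starts at a height of 0 at the left endpoint of the first class.

```python
class_lower = [15, 25, 35, 45, 55, 65]   # left endpoint of each class
class_width = 10
rel_cum = [0.1875, 0.5625, 0.8125, 0.9375, 0.9375, 1.0]

# Assumed starting point: height 0 at the left endpoint of the first class.
ogive = [(class_lower[0], 0.0)]
# Each class's relative cumulative frequency is plotted at the left endpoint
# of the NEXT class, which is the current left endpoint plus the class width.
ogive += [(lower + class_width, height)
          for lower, height in zip(class_lower, rel_cum)]

for x, height in ogive:
    print(f"{x:>3}  {height:.4f}")
```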

Page 31: Displaying quantitative data with graphs

Types of Info from Ogives
• Finding an individual observation within the distribution
• Find the relative standing of a season in which Barry Bonds hit 40 home runs

A season with 40 home runs lies at about the 60th percentile, meaning that approximately 60% of his seasons had 40 or fewer home runs.

Page 32: Displaying quantitative data with graphs

• Locating an observation corresponding to a percentile.
• How many home runs must be hit in a season to correspond to the 75th percentile?

To be better than 75% of Mr. Bonds’s seasons, approximately 42 home runs must be hit.
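Reading a percentile off the ogive amounts to following the line segments until they reach the wanted height. A rough Python sketch, assuming the points are joined by straight segments and the graph starts at (15, 0); the function name `value_at_percentile` is made up for this example.

```python
# Ogive points: (home runs, relative cumulative frequency).
# The starting point (15, 0.0) is an assumption; the rest come from the table.
ogive = [(15, 0.0), (25, 0.1875), (35, 0.5625), (45, 0.8125),
         (55, 0.9375), (65, 0.9375), (75, 1.0)]

def value_at_percentile(points, p):
    """Find the x value where the joined line segments reach height p (0 to 1)."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Skip flat segments; interpolate inside the first segment that spans p.
        if y1 > y0 and y0 <= p <= y1:
            return x0 + (p - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("p must lie between 0 and 1")

print(value_at_percentile(ogive, 0.75))
```

For the 75th percentile this gives 42.5, in line with the value of approximately 42 read from the graph.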

Page 33: Displaying quantitative data with graphs

A little History on the word Ogive (sometimes called an Ogee)
• It was first used by Sir Francis Galton, who borrowed a term from architecture to describe the cumulative normal curve (more about that next chapter).
• The ogive in architecture was a common decorative element in many of the English Churches around 1400. The picture at right shows the door to the Church of The Holy Cross at the village of Caston in Norfolk. In this image you can see the use of the ogive in the design of the door and repeated in the windows above.
• Find more about this term at Mathwords.