GraphsVersion10-11gg
-
Upload
kristin-mack -
Category
Documents
-
view
216 -
download
0
Transcript of GraphsVersion10-11gg
-
7/29/2019 GraphsVersion10-11gg
1/20
5The Graphical Description of Data:SPSS v. 10-11
(reproduced from Chapter 5 ofStatistics for Social and Health Research)
Chapter 3 looked at ways of summarizing data by producing frequency tables.
However, often the most striking way of summarizing information is with a
graph. A graph (sometimes called a chart) provides a quick visual sense of the
main features of the data. The particular graphs that can be constructed in any
given context are determined largely by the level of measurement, as indicatedin Table 5.1.
Table 5.1 Graphs by level of measurement
Level of measurement Graph
Nominal and ordinal Pie graph
Bar graph
Interval/ratio Pie graph
Bar graph (discrete variables)
Histogram (continuous variables)
Polygon
Some general principles
In this chapter, to illustrate the procedures and principles involved in
constructing graphs, we will use the hypothetical data for 20 people that we
introduced in Chapter 2 and which are contained in the SPSS file Ch05.sav on
the CD that comes with this book (Table 5.25.4).
Table 5.2 Sex of respondents
Sex Frequency
Male 12
Female 8
Total 20
Table 5.3 Health rating of respondentsHealth rating FrequencyUnhealthy 7
Healthy 5
Very healthy 8
Total 20
-
7/29/2019 GraphsVersion10-11gg
2/20
The graphical description of data: SPSS v. 10-11 93
Table 5.4 Age of respondents
Age in years Frequency
18 7
19 520 4
21 2
22 2
Total 20
The same general rules that apply to the construction of these tables also apply
to the construction of graphs. Most importantly a graph should be a self-
containedbundle of information. A reader should not have to search through
the text in order to understand a graph (if one does have to search the text this
may be a sign that the graph is concealing information rather than illuminating
it). In order for a graph to be a self-contained description of the data we need
to:
give the graph an appropriate title; clearly identify the categories or values of the variable; indicate, for interval/ratio data, the units of measurement; indicate the total number of cases; indicate the source of the data.We will now look at the construction of different types of graphs. Unlike
previous chapters where we separated the SPSS procedures from the hand
calculations, here we will only use SPSS to generate the relevant graphs. The
superiority of computer graphics has made the hand-drawing of graphs
redundant; unlike calculating the mean, where some knowledge of the
procedure for doing things by hand can be helpful, we almost invariably go
directly to a computer to generate a graph.
Pie graphs
A pie graph can be constructed for all levels of measurement.
A pie graph presents the distribution of cases in the form of a circle. The
relative size of each slice of the circle is equal to the proportion of cases within
the category represented by the slice.
To generate a pie graph on SPSS we select Graphs/Pie from the menu (Figure5.1). The Pie Charts dialog box will appear, which provides some options for
how the pie graph is to be defined. We are using the default setting, which is to
group cases according to their values for the same variable. This is the
appropriate choice for the data we are using so we click on the Define button.
This brings up the dialog box shown in Figure 5.2.
-
7/29/2019 GraphsVersion10-11gg
3/20
94 Statistics for Research
Figure 5.1 The Pie Graph command
Figure 5.2 The Define Pie dialog box
In this dialog box we select the variable we want to analyze and what the
slices of the pie will represent. The default setting is for the slices to represent
the number of cases in each category, which is what we want here. The output
shown in Figure 5.3 will be generated when we click on the OK button.
The pie chart describes the data in the same way as the frequency table above,
but in a more visually striking form. However, the basic SPSS procedure
produces a bare minimum amount of information. We do not get a title, nor the
number or proportion of cases in each slice. Some of this information could
have been generated if we had clicked on the Titles button in the Define Piedialog box.
A more complete set of options for improving the presentation of graphs in
SPSS is available if we simply double-click on the graph in the Viewer
window. This brings up the graph in a new SPSS window called the SPSS
Chart Editor (Figure 5.4).
-
7/29/2019 GraphsVersion10-11gg
4/20
The graphical description of data: SPSS v. 10-11 95
Figure 5.3 SPSS Pie Graph output
Figure 5.4 The SPSS Chart Editor
From the menu bar at the top (or the tool bar below it) we have a range of
options for improving the quality of our graph. Here we simply want to attach a
title to the pie graph, and to provide appropriate labels to the slices. I leave it to
you to explore the full range of options available under the Chart Editor. To
attach a title and labels to the graph we use the Chart/Title, Chart/Options,
and Chart/Footnotes commands.
-
7/29/2019 GraphsVersion10-11gg
5/20
96 Statistics for Research
Figure 5.5 The Chart command
The Title command allows us to type in various levels of titles. Here ourneeds are simple so we type in Sex of respondents as Title 1.
The Options command allows us to change the information that isattached to each slice of the pie. We can see in Figure 5.3 that the default
setting is for the text labels (in this example male and female)
represented by each slice to be included. If we click on the boxes next toValues and Percents we will also generate the absolute and relative
frequencies for each slice displayed, along with the text label. You may
also wish to click on the Format button and set the number of decimal
places to 0.
Using the Footnote command we can type Source: Hypothetical surveyfor Footnote 1 and n=20 for Footnote 2 to indicate respectively the
source of data and the total number of cases represented by the graph. We
can also right-justify these footnotes under Footnote Justification.
If we follow these commands Figure 5.6 will appear.
Figure 5.6 SPSS pie chart
-
7/29/2019 GraphsVersion10-11gg
6/20
The graphical description of data: SPSS v. 10-11 97
We can see that the basic chart is still the same, but now we have all the
relevant information as well. Female respondents represent 8 out of 20 students,
so the slice representing the frequency of female respondents is 8/20ths of the
circle (40%). Male respondents represent 12 out of 20 cases (60%).
Now that we have the pie chart in front of us with all the necessary
information what does it tell us about the distribution of the variable we are
investigating? Pie graphs emphasize the relative importance of a particular
category to the total. They are therefore mainly used to highlight distributions
where cases are concentrated in only one or two categories. For example,
Figure 5.7 illustrates the distribution of students by age, and clearly reflects the
large proportion of total cases that are 18 years of age.
Figure 5.7 SPSS pie chart with relative frequencies
Pie graphs begin to look a bit clumsy when there are too many categories for
the variable (about six or more categories, as a rule of thumb, is usually too
many for a pie graph). If we look at the pie graph for the age distribution of
students, we can see that if there were any more slices to the pie the chart will
begin to look a little cluttered.
A pie chart is actually a specific case of a more general type of graph. Any
shape can be used to represent the total number of cases, and the area within itdivided to show the relative number of cases in various categories. For
example, the United Nations has used a champagne glass to illustrate the
distribution of world income (Figure 5.8). Rather than using a simple circle, the
metaphor of the champagne glass, a symbol of wealth, makes this graph a
powerful illustrative device for showing the inequality of world income.
-
7/29/2019 GraphsVersion10-11gg
7/20
98 Statistics for Research
Figure 5.8 World distribution of income
Source: United Nations Development Program (1992) Human Development Report
1992, Oxford: Oxford University Press.
Bar graphs
Another type of graph that can be produced from the same set of data as that
used to generate a pie chart is a bar graph. While these two types of graph can
be generated from exactly the same set of data, they describe different aspects
of the distribution of that data. We have noted that a pie graph emphasizes the
relative contribution of the number of cases in each category to the total. On the
other hand,bar graphs emphasize the frequency of cases in each category
relative to each other.
A bar graph has two sides or axes:
Along one side, usually the horizontal base, of the graph are the values ofthe variable. This axis of the bar graph is called the abscissa.
Along the left vertical side of the graph are the frequencies, expressedeither as the raw count or as percentages of the total number of cases. This
vertical axis is known as the ordinate.
To generate a bar graph on SPSS we choose the Graphs/Bar command from
the menu bar. This brings up the Bar Charts dialog box (Figure 5.9).
-
7/29/2019 GraphsVersion10-11gg
8/20
The graphical description of data: SPSS v. 10-11 99
Figure 5.9 The Bar Charts dialog box
From this box we choose one of three types of bar charts, each of which we
will explore separately:
Simple bar graphs Clustered bar graphs Stacked bar graphs
Simple bar graphs
The default setting in the SPSS Bar Charts dialog box (Figure 5.9) is for a
simple bar chart to be generated. This is evident from the dark border around
the Simple button.
If we click on Define the Define Simple Bar: dialog box appears, which is
almost identical to the Pie Graph dialog box we came across in the previous
section. In this dialog box we select the variable we are investigating from the
source variable list, which in this instance is Sex of respondents, and click on
OK. This will generate a bar graph in the Viewer window.
As with the pie graph we generated earlier we can tidy up this basic graph by
going into the Chart Editor (double-click on the graph). I have generated a bar
chart and edited to add information about the source of data and the sample
size, and to improve its appearance (Figure 5. 10).
-
7/29/2019 GraphsVersion10-11gg
9/20
100 Statistics for Research
Figure 5.10 An SPSS bar chart
Again I emphasize that SPSS provides many options for improving the layout
and appearance of a graph and the amount of information it contains. Often
these choices (such as the wording and placement of titles) are matters of
aesthetic judgment so you need not follow my preferences exactly. Play around
with the options so that you produce something that you like, but always keep
in mind the general rules of graph construction that we discussed above.
Stacked bar graphs
An alternative way of presenting information is to stack the categories on top of
each other, rather than side by side.
A stacked bar graph (sometimes called a component bar graph) layers in a
single bar the number (or proportion) of cases in each category of a distribution.
Each bar is divided into layers, with the area of each layer proportional to the
frequency of the category it represents. It is very similar in this respect to a pie
chart, but using a rectangle rather than a circle.
One advantage of a stacked bar graph is that a number of distributions can be
combined into a single chart. Such a graph is especially helpful in comparing
cases broken down by the categories of another variable, such as distributions
over time. For example, a health assessor may want to compare the current
health rating of respondents with their ratings for the previous 2 years, the data
for which are given in Table 5.5.
-
7/29/2019 GraphsVersion10-11gg
10/20
The graphical description of data: SPSS v. 10-11 101
Table 5.5 Health ratings of respondents, 19971999
Health rating 1997 1998 1999
Very healthy 3 6 8
Healthy 9 6 5Unhealthy 15 11 7
Total 27 23 20
To generate a stacked bar graph in SPSS we click on the Stacked button from
the Bar Charts dialog box (Figure 5.9). I have entered the data from Table 5.5
in a separate SPSS file, with the year of the survey as a separate category axis
variable. This generates the stacked bar graph in Figure 5.11.
Figure 5.11 SPSS stacked bar chart output
We can immediately see that, relatively speaking, these people regard
themselves as getting healthier. Notice that we are using percentages in this
graph to standardize the distributions, since the totals for each year vary.
Stacked bar graphs have the same limitations as pie graphs, especially when
there are too many values stacked on top of each other. Figure 5.11 contains
only three values stacked on top of each other to form each bar, but when there
are more than five categories, and some of the layers are very small, the extra
detail hinders the comparison of distributions that we want to make, rather than
illuminating it.
Clustered bar graphs
An alternative to a stacked bar chart is to place the bars for each category side
by side along the horizontal. This is called a clustered bar graph.
-
7/29/2019 GraphsVersion10-11gg
11/20
102 Statistics for Research
A clustered bar graph displays, for each category of one variable, the
distribution of cases across the categories of another variable.
Figure 5.12 presents a clustered bar graph for health rating across years.
Figure 5.12 An SPSS clustered bar chart
Again I can see the relative decrease in the proportion of unhealthy
respondents and the gradual increase in the proportion of very healthy
respondents.
Histograms
Bar graphs constructed for discrete variables always have gaps between each of
the bars: there is no gradation between male and female, for example. A
persons age, on the other hand, is a continuous variable, in the sense that it
progressively increases. Even though we have chosen discrete values (whole
years) for age to organize the data, the variable itself actually increases in a
continuous way (as we discussed in Chapter 1). As a result, the bars on a
histogram, unlike a bar graph, are pushed together so they touch. The
individual values (or class mid-points if we are working with class intervals) are
displayed along the horizontal, and a rectangle is erected over each point. The
width of each rectangle extends half the distance to the values on either side.The area of each rectangle is proportional to the frequency of the class in the
overall distribution.
Using the example of the age distribution of students we can construct the
following histogram. We select Graphs/Histogram from the SPSS menu
which brings up the Histogram dialog box (Figure 5.13).
-
7/29/2019 GraphsVersion10-11gg
12/20
The graphical description of data: SPSS v. 10-11 103
Figure 5.13 The Histogram dialog box
This dialog box is one of the more straightforward ones that appear in SPSS.
We basically select the relevant variable from the source list and transfer it to
the target list. (Note for the next chapter, though, the option to Display normalcurve.) Selecting Age in years and clicking on OK will produce the histogram
in Figure 5.14.
Figure 5.14 SPSS Histogram output
We can see that the area for the rectangle over 18 years of age, as a proportionof the total area under the histogram, is equal to the proportion of all cases
having this age. The histogram indicates that we are working with a continuous
variable whereby we have clustered cases to the nearest whole year, so that a
score of 18 years of age gathers together cases that are actually between 17.5
and 18.5 years, a score of 19 years of age gathers together cases that are
-
7/29/2019 GraphsVersion10-11gg
13/20
104 Statistics for Research
actually 18.519.5 years, and so on. SPSS has also generated some other
descriptive statistics: the standard deviation, mean, and total number of cases,
which conform to our calculations in Chapter 4.
Frequency polygons
When working with continuous interval/ratio data, we can also represent the
distribution in the form of a frequency polygon.
A frequency polygon is a continuous line formed by plotting the values or
class mid-points in a distribution against the frequency for each value or class.
If we place a dot on the top and center of each bar in a histogram, and connect
the dots, we produce a frequency polygon (Figure 5.15). To illustrate this pointI have not used SPSS to generate a polygon (which SPSS calls a line graph) for
the age distribution of respondents, since we cannot superimpose a polygon
onto the equivalent histogram in SPSS.
Figure 5.15 A frequency polygon
Notice that the polygon begins and ends on a frequency of zero. To explain
why we close the polygon in this way we can think of this distribution as
having seven values for age, rather than five, by including the ages 17 and 23.
Since there are no such respondents with either of these ages the dot at these
values is on the abscissa, indicating a frequency of zero.There are certain common shapes to polygons that appear in research. For
example, Figure 5.16 illustrates the bell-shaped (symmetric) curve, the J-shaped
curve, and the U-shaped curve. The bell-shaped curve is one we will explore in
much greater length in later chapters.
-
7/29/2019 GraphsVersion10-11gg
14/20
The graphical description of data: SPSS v. 10-11 105
Figure 5.16 Three polygon shapes: (a) a bell-shaped curve; (b) a J-shaped curve; and (c)
a U-shaped curve
For polygons that have a bell shape such as that in Figure 5.16(a) we usually
describe two aspects of it. The first is the skewness of the distribution, an issue
we raised in Chapter 4. If the curve has a long tail to the left, it is negatively
skewed, whereas if it has a long tail to the right it is said to be positively
skewed. The second important aspect of a bell-shaped curve is kurtosis.Kurtosis refers to the degree of peakedness or flatness of the curve. We can
see in Figure 5.17(a) a curve with most cases clustered closed together. Such a
curve is referred to as leptokurtic, while curve (b) that has a wide distribution
of scores which fatten the tails is referred to as platykurtic. Curve (b) lies
somewhere in between, and is called mesokurtic.
Figure 5.17 A (a) leptokurtic, (b) platykurtic, and (c) mesokurtic curve
There are precise numerical techniques for measuring the skewness and
kurtosis of a distribution, but these are beyond the scope of this book.
Wherever necessary we will simply rely on a visual inspection of a polygon to
describe these aspects of the distribution.
One aspect of frequency polygons (and histograms) that will be of utmost
importance in later chapters needs to be pointed out, even though its relevance
may not be immediately obvious. A polygon is constructed in such a way that
the area under the curve between any two points on the horizontal, as aproportion of the total area, is equal to the proportion of cases in the
distribution that have that range of values. Thus the shaded area in Figure 5.18,
as a proportion of the total area under the curve, is equal to the proportion of
cases that are aged 21 or above. This is 4/20, or 0.2 of all cases.
-
7/29/2019 GraphsVersion10-11gg
15/20
106 Statistics for Research
Figure 5.18
Another way of looking at this is to say that the probability of randomly
selecting a respondent aged 21 years or more from this group is equal to the
proportion of the total area under the curve that is shaded. This probability is
0.2, or a 1-in-5 chance of selecting a respondent in this age group.
Common problems and misuses of graphs
Unfortunately graphs lend themselves to considerable misuse. Practically every
day in a newspaper we can find one (if not all) of the following tricks, which
can give a misleading impression of the data. For those wishing to look at this
issue further the starting point is Darrell Huffs cheap, readable, and
entertaining classic, How to Lie with Statistics (New York: W.W. Norton,various printings).
Relative size of axes
The same data can give different graphical pictures depending on the relative
sizes of the two axes. By stretching either the abscissa or the ordinate the graph
can be flattened or peaked depending on the impression that we might desire
to convey. Consider for example the data represented in Figure 5.19.
This provides two alternative ways of presenting the same data. The data are
for a hypothetical example of the percentage of respondents who gave a certain
answer to a survey item over a number of years. Graph (a) has the ordinate (the
vertical) compressed relative to graph (b); similarly graph (b) has the abscissa(the horizontal) relatively more compressed. The effect is to make the data
seem much smoother in graph (a) the rise and fall in responses is not so
dramatic. In graph (b) on the other hand, there is a very sharp rise and then fall,
making the change appear dramatic. Yet the two pictures describe the same
data!
-
7/29/2019 GraphsVersion10-11gg
16/20
The graphical description of data: SPSS v. 10-11 107
Figure 5.19 Two ways of presenting data
In order to avoid such distortions the convention in research is to construct
graphs, wherever possible, such that the vertical axis is around two-thirds to
three-quarters the length of the horizontal.
-
7/29/2019 GraphsVersion10-11gg
17/20
108 Statistics for Research
Truncation of the ordinate
A similar effect to the one just described can be achieved by cutting out a
section of the ordinate. Consider the data on the relative pay of women to men
in Australia between 1950 and 1975, shown in Figure 5.20.
Figure 5.20 Womens income as a percentage of mens income, 19501975 presented in
two different ways
-
7/29/2019 GraphsVersion10-11gg
18/20
The graphical description of data: SPSS v. 10-11 109
Graph (a) is complete in the sense that the vertical axis goes from 0 through to
100 percent. The vertical axis in graph (b) though has been truncated: a large
section of the scale between 0 percent and 50 percent has been removed and
replaced by a squiggly line to indicate the truncation. The effect is obvious: the
improvement in womens relative pay after 1965 seems very dramatic, as
opposed to the slight increase reflected in graph (a).
Selection of the start and end points of the abscissa
This is especially relevant with data that describe a pattern of change over time.
Varying the date at which to begin the data series and to end it can give
different impressions. For example, we may begin in a year in which the results
were unusually low, thereby giving the impression of increase over subsequent
years, and vice versa if we choose to begin with a year in which results
peaked. Consider, for example, the time series of data over a 15 year periodshown in Figure 5.21.
Figure 5.21
If we wanted to emphasize steady growth we could begin the graph where the
dashed is displayed, which would completely transform the image presented by
the graph.
Exercises
5.1 Which aspects of a distribution do pie graphs emphasize and whichaspects do bar graphs emphasize?
-
7/29/2019 GraphsVersion10-11gg
19/20
110 Statistics for Research
5.2 Explain the difference between a bar graph and a histogram.5.3 Survey your own statistics class in terms of the variables age, sex, and
health rating. Use the graphing techniques outlined in this chapter to
describe your results.
5.4 In Exercise 2.2 the following data were entered into SPSS (time, in
minutes, taken for subjects in a fitness trial to complete a certain
exercise task):
31 39 45 26 23 56 45 80
35 37 25 42 32 58 80 71
19 16 56 21 34 36 10 38
12 48 38 37 39 42 27 39
17 31 56 28 40 82 27 37
Using SPSS select an appropriate graphing technique to illustrate the
distribution. Justify your choice of technique against the other
available options.
5.5 Consider the following list of prices, in whole dollars, for 20 used cars:
8600 9200 8200 11,300 10,600
7980 11,100 12,900 10,750 9200
13,630 9400 11,800 10,200 12,240
11,670 10,000 11,250 12,750 12,990
From these data:
(a) Construct a frequency table using class intervals:
70008499, 85009999, 1000011499, 1150012999,
1300014499.
(b) Construct a histogram using these class intervals.
(c) Construct a frequency polygon using these class intervals.
5.6 Construct a pie graph to describe the following data:
Migrants in local area, place of origin
Place Number
Asia 900
Africa 1200
Europe 2100South America 1500
Other 300
Total 6000
What feature of this distribution does your pie graph mainly illustrate?
-
7/29/2019 GraphsVersion10-11gg
20/20
The graphical description of data: SPSS v. 10-11 111
5.7 From a recent newspaper or magazine find examples of the use of
graphs. Do these examples follow the rules outlined in this chapter?
5.8 Use the Employee data file to answer the following problems with the
aid of SPSS.
(a) I want to emphasize the high proportion of all cases that have
clerical positions. Which graph should I generate and why?
Generate this graph using SPSS, and add necessary titles and
notes.
(b) Use a stacked bar graph to show the number of women and the
number of men employed in each employment category. What
does this indicate about the sexual division of labor in this
company?
(c) Generate a histogram to display the distribution of scores for
current salary. How would you describe this distribution in terms
of skewness?