GraphsVersion10-11gg

7/29/2019 GraphsVersion10-11gg

1/20

5The Graphical Description of Data:SPSS v. 10-11

(reproduced from Chapter 5 ofStatistics for Social and Health Research)

Chapter 3 looked at ways of summarizing data by producing frequency tables.

However, often the most striking way of summarizing information is with a

graph. A graph (sometimes called a chart) provides a quick visual sense of the

main features of the data. The particular graphs that can be constructed in any

given context are determined largely by the level of measurement, as indicatedin Table 5.1.

Table 5.1 Graphs by level of measurement

Level of measurement Graph

Nominal and ordinal Pie graph

Bar graph

Interval/ratio Pie graph

Bar graph (discrete variables)

Histogram (continuous variables)

Polygon

Some general principles

In this chapter, to illustrate the procedures and principles involved in

constructing graphs, we will use the hypothetical data for 20 people that we

introduced in Chapter 2 and which are contained in the SPSS file Ch05.sav on

the CD that comes with this book (Table 5.25.4).

Table 5.2 Sex of respondents

Sex Frequency

Male 12

Female 8

Total 20

Table 5.3 Health rating of respondentsHealth rating FrequencyUnhealthy 7

Healthy 5

Very healthy 8

Total 20


2/20

The graphical description of data: SPSS v. 10-11 93

Table 5.4 Age of respondents

Age in years Frequency

18 7

19 520 4

21 2

22 2

Total 20

The same general rules that apply to the construction of these tables also apply

to the construction of graphs. Most importantly a graph should be a self-

containedbundle of information. A reader should not have to search through

the text in order to understand a graph (if one does have to search the text this

may be a sign that the graph is concealing information rather than illuminating

it). In order for a graph to be a self-contained description of the data we need

to:

give the graph an appropriate title; clearly identify the categories or values of the variable; indicate, for interval/ratio data, the units of measurement; indicate the total number of cases; indicate the source of the data.We will now look at the construction of different types of graphs. Unlike

previous chapters where we separated the SPSS procedures from the hand

calculations, here we will only use SPSS to generate the relevant graphs. The

superiority of computer graphics has made the hand-drawing of graphs

redundant; unlike calculating the mean, where some knowledge of the

procedure for doing things by hand can be helpful, we almost invariably go

directly to a computer to generate a graph.

Pie graphs

A pie graph can be constructed for all levels of measurement.

A pie graph presents the distribution of cases in the form of a circle. The

relative size of each slice of the circle is equal to the proportion of cases within

the category represented by the slice.

To generate a pie graph on SPSS we select Graphs/Pie from the menu (Figure5.1). The Pie Charts dialog box will appear, which provides some options for

how the pie graph is to be defined. We are using the default setting, which is to

group cases according to their values for the same variable. This is the

appropriate choice for the data we are using so we click on the Define button.

This brings up the dialog box shown in Figure 5.2.


3/20

94 Statistics for Research

Figure 5.1 The Pie Graph command

Figure 5.2 The Define Pie dialog box

In this dialog box we select the variable we want to analyze and what the

slices of the pie will represent. The default setting is for the slices to represent

the number of cases in each category, which is what we want here. The output

shown in Figure 5.3 will be generated when we click on the OK button.

The pie chart describes the data in the same way as the frequency table above,

but in a more visually striking form. However, the basic SPSS procedure

produces a bare minimum amount of information. We do not get a title, nor the

number or proportion of cases in each slice. Some of this information could

have been generated if we had clicked on the Titles button in the Define Piedialog box.

A more complete set of options for improving the presentation of graphs in

SPSS is available if we simply double-click on the graph in the Viewer

window. This brings up the graph in a new SPSS window called the SPSS

Chart Editor (Figure 5.4).


4/20


Figure 5.3 SPSS Pie Graph output

Figure 5.4 The SPSS Chart Editor

From the menu bar at the top (or the tool bar below it) we have a range of

options for improving the quality of our graph. Here we simply want to attach a

title to the pie graph, and to provide appropriate labels to the slices. I leave it to

you to explore the full range of options available under the Chart Editor. To

attach a title and labels to the graph we use the Chart/Title, Chart/Options,

and Chart/Footnotes commands.


5/20


Figure 5.5 The Chart command

The Title command allows us to type in various levels of titles. Here ourneeds are simple so we type in Sex of respondents as Title 1.

The Options command allows us to change the information that isattached to each slice of the pie. We can see in Figure 5.3 that the default

setting is for the text labels (in this example male and female)

represented by each slice to be included. If we click on the boxes next toValues and Percents we will also generate the absolute and relative

frequencies for each slice displayed, along with the text label. You may

also wish to click on the Format button and set the number of decimal

places to 0.

Using the Footnote command we can type Source: Hypothetical surveyfor Footnote 1 and n=20 for Footnote 2 to indicate respectively the

source of data and the total number of cases represented by the graph. We

can also right-justify these footnotes under Footnote Justification.

If we follow these commands Figure 5.6 will appear.

Figure 5.6 SPSS pie chart


6/20


We can see that the basic chart is still the same, but now we have all the

relevant information as well. Female respondents represent 8 out of 20 students,

so the slice representing the frequency of female respondents is 8/20ths of the

circle (40%). Male respondents represent 12 out of 20 cases (60%).

Now that we have the pie chart in front of us with all the necessary

information what does it tell us about the distribution of the variable we are

investigating? Pie graphs emphasize the relative importance of a particular

category to the total. They are therefore mainly used to highlight distributions

where cases are concentrated in only one or two categories. For example,

Figure 5.7 illustrates the distribution of students by age, and clearly reflects the

large proportion of total cases that are 18 years of age.

Figure 5.7 SPSS pie chart with relative frequencies

Pie graphs begin to look a bit clumsy when there are too many categories for

the variable (about six or more categories, as a rule of thumb, is usually too

many for a pie graph). If we look at the pie graph for the age distribution of

students, we can see that if there were any more slices to the pie the chart will

begin to look a little cluttered.

A pie chart is actually a specific case of a more general type of graph. Any

shape can be used to represent the total number of cases, and the area within itdivided to show the relative number of cases in various categories. For

example, the United Nations has used a champagne glass to illustrate the

distribution of world income (Figure 5.8). Rather than using a simple circle, the

metaphor of the champagne glass, a symbol of wealth, makes this graph a

powerful illustrative device for showing the inequality of world income.


7/20


Figure 5.8 World distribution of income

Source: United Nations Development Program (1992) Human Development Report

1992, Oxford: Oxford University Press.

Bar graphs

Another type of graph that can be produced from the same set of data as that

used to generate a pie chart is a bar graph. While these two types of graph can

be generated from exactly the same set of data, they describe different aspects

of the distribution of that data. We have noted that a pie graph emphasizes the

relative contribution of the number of cases in each category to the total. On the

other hand,bar graphs emphasize the frequency of cases in each category

relative to each other.

A bar graph has two sides or axes:

Along one side, usually the horizontal base, of the graph are the values ofthe variable. This axis of the bar graph is called the abscissa.

Along the left vertical side of the graph are the frequencies, expressedeither as the raw count or as percentages of the total number of cases. This

vertical axis is known as the ordinate.

To generate a bar graph on SPSS we choose the Graphs/Bar command from

the menu bar. This brings up the Bar Charts dialog box (Figure 5.9).


8/20


Figure 5.9 The Bar Charts dialog box

From this box we choose one of three types of bar charts, each of which we

will explore separately:

Simple bar graphs Clustered bar graphs Stacked bar graphs

Simple bar graphs

The default setting in the SPSS Bar Charts dialog box (Figure 5.9) is for a

simple bar chart to be generated. This is evident from the dark border around

the Simple button.

If we click on Define the Define Simple Bar: dialog box appears, which is

almost identical to the Pie Graph dialog box we came across in the previous

section. In this dialog box we select the variable we are investigating from the

source variable list, which in this instance is Sex of respondents, and click on

OK. This will generate a bar graph in the Viewer window.

As with the pie graph we generated earlier we can tidy up this basic graph by

going into the Chart Editor (double-click on the graph). I have generated a bar

chart and edited to add information about the source of data and the sample

size, and to improve its appearance (Figure 5. 10).


9/20


Figure 5.10 An SPSS bar chart

Again I emphasize that SPSS provides many options for improving the layout

and appearance of a graph and the amount of information it contains. Often

these choices (such as the wording and placement of titles) are matters of

aesthetic judgment so you need not follow my preferences exactly. Play around

with the options so that you produce something that you like, but always keep

in mind the general rules of graph construction that we discussed above.

Stacked bar graphs

An alternative way of presenting information is to stack the categories on top of

each other, rather than side by side.

A stacked bar graph (sometimes called a component bar graph) layers in a

single bar the number (or proportion) of cases in each category of a distribution.

Each bar is divided into layers, with the area of each layer proportional to the

frequency of the category it represents. It is very similar in this respect to a pie

chart, but using a rectangle rather than a circle.

One advantage of a stacked bar graph is that a number of distributions can be

combined into a single chart. Such a graph is especially helpful in comparing

cases broken down by the categories of another variable, such as distributions

over time. For example, a health assessor may want to compare the current

health rating of respondents with their ratings for the previous 2 years, the data

for which are given in Table 5.5.


10/20


Table 5.5 Health ratings of respondents, 19971999

Health rating 1997 1998 1999

Very healthy 3 6 8

Healthy 9 6 5Unhealthy 15 11 7

Total 27 23 20

To generate a stacked bar graph in SPSS we click on the Stacked button from

the Bar Charts dialog box (Figure 5.9). I have entered the data from Table 5.5

in a separate SPSS file, with the year of the survey as a separate category axis

variable. This generates the stacked bar graph in Figure 5.11.

Figure 5.11 SPSS stacked bar chart output

We can immediately see that, relatively speaking, these people regard

themselves as getting healthier. Notice that we are using percentages in this

graph to standardize the distributions, since the totals for each year vary.

Stacked bar graphs have the same limitations as pie graphs, especially when

there are too many values stacked on top of each other. Figure 5.11 contains

only three values stacked on top of each other to form each bar, but when there

are more than five categories, and some of the layers are very small, the extra

detail hinders the comparison of distributions that we want to make, rather than

illuminating it.

Clustered bar graphs

An alternative to a stacked bar chart is to place the bars for each category side

by side along the horizontal. This is called a clustered bar graph.


11/20


A clustered bar graph displays, for each category of one variable, the

distribution of cases across the categories of another variable.

Figure 5.12 presents a clustered bar graph for health rating across years.

Figure 5.12 An SPSS clustered bar chart

Again I can see the relative decrease in the proportion of unhealthy

respondents and the gradual increase in the proportion of very healthy

respondents.

Histograms

Bar graphs constructed for discrete variables always have gaps between each of

the bars: there is no gradation between male and female, for example. A

persons age, on the other hand, is a continuous variable, in the sense that it

progressively increases. Even though we have chosen discrete values (whole

years) for age to organize the data, the variable itself actually increases in a

continuous way (as we discussed in Chapter 1). As a result, the bars on a

histogram, unlike a bar graph, are pushed together so they touch. The

individual values (or class mid-points if we are working with class intervals) are

displayed along the horizontal, and a rectangle is erected over each point. The

width of each rectangle extends half the distance to the values on either side.The area of each rectangle is proportional to the frequency of the class in the

overall distribution.

Using the example of the age distribution of students we can construct the

following histogram. We select Graphs/Histogram from the SPSS menu

which brings up the Histogram dialog box (Figure 5.13).


12/20


Figure 5.13 The Histogram dialog box

This dialog box is one of the more straightforward ones that appear in SPSS.

We basically select the relevant variable from the source list and transfer it to

the target list. (Note for the next chapter, though, the option to Display normalcurve.) Selecting Age in years and clicking on OK will produce the histogram

in Figure 5.14.

Figure 5.14 SPSS Histogram output

We can see that the area for the rectangle over 18 years of age, as a proportionof the total area under the histogram, is equal to the proportion of all cases

having this age. The histogram indicates that we are working with a continuous

variable whereby we have clustered cases to the nearest whole year, so that a

score of 18 years of age gathers together cases that are actually between 17.5

and 18.5 years, a score of 19 years of age gathers together cases that are


13/20


actually 18.519.5 years, and so on. SPSS has also generated some other

descriptive statistics: the standard deviation, mean, and total number of cases,

which conform to our calculations in Chapter 4.

Frequency polygons

When working with continuous interval/ratio data, we can also represent the

distribution in the form of a frequency polygon.

A frequency polygon is a continuous line formed by plotting the values or

class mid-points in a distribution against the frequency for each value or class.

If we place a dot on the top and center of each bar in a histogram, and connect

the dots, we produce a frequency polygon (Figure 5.15). To illustrate this pointI have not used SPSS to generate a polygon (which SPSS calls a line graph) for

the age distribution of respondents, since we cannot superimpose a polygon

onto the equivalent histogram in SPSS.

Figure 5.15 A frequency polygon

Notice that the polygon begins and ends on a frequency of zero. To explain

why we close the polygon in this way we can think of this distribution as

having seven values for age, rather than five, by including the ages 17 and 23.

Since there are no such respondents with either of these ages the dot at these

values is on the abscissa, indicating a frequency of zero.There are certain common shapes to polygons that appear in research. For

example, Figure 5.16 illustrates the bell-shaped (symmetric) curve, the J-shaped

curve, and the U-shaped curve. The bell-shaped curve is one we will explore in

much greater length in later chapters.


14/20


Figure 5.16 Three polygon shapes: (a) a bell-shaped curve; (b) a J-shaped curve; and (c)

a U-shaped curve

For polygons that have a bell shape such as that in Figure 5.16(a) we usually

describe two aspects of it. The first is the skewness of the distribution, an issue

we raised in Chapter 4. If the curve has a long tail to the left, it is negatively

skewed, whereas if it has a long tail to the right it is said to be positively

skewed. The second important aspect of a bell-shaped curve is kurtosis.Kurtosis refers to the degree of peakedness or flatness of the curve. We can

see in Figure 5.17(a) a curve with most cases clustered closed together. Such a

curve is referred to as leptokurtic, while curve (b) that has a wide distribution

of scores which fatten the tails is referred to as platykurtic. Curve (b) lies

somewhere in between, and is called mesokurtic.

Figure 5.17 A (a) leptokurtic, (b) platykurtic, and (c) mesokurtic curve

There are precise numerical techniques for measuring the skewness and

kurtosis of a distribution, but these are beyond the scope of this book.

Wherever necessary we will simply rely on a visual inspection of a polygon to

describe these aspects of the distribution.

One aspect of frequency polygons (and histograms) that will be of utmost

importance in later chapters needs to be pointed out, even though its relevance

may not be immediately obvious. A polygon is constructed in such a way that

the area under the curve between any two points on the horizontal, as aproportion of the total area, is equal to the proportion of cases in the

distribution that have that range of values. Thus the shaded area in Figure 5.18,

as a proportion of the total area under the curve, is equal to the proportion of

cases that are aged 21 or above. This is 4/20, or 0.2 of all cases.


15/20


Figure 5.18

Another way of looking at this is to say that the probability of randomly

selecting a respondent aged 21 years or more from this group is equal to the

proportion of the total area under the curve that is shaded. This probability is

0.2, or a 1-in-5 chance of selecting a respondent in this age group.

Common problems and misuses of graphs

Unfortunately graphs lend themselves to considerable misuse. Practically every

day in a newspaper we can find one (if not all) of the following tricks, which

can give a misleading impression of the data. For those wishing to look at this

issue further the starting point is Darrell Huffs cheap, readable, and

entertaining classic, How to Lie with Statistics (New York: W.W. Norton,various printings).

Relative size of axes

The same data can give different graphical pictures depending on the relative

sizes of the two axes. By stretching either the abscissa or the ordinate the graph

can be flattened or peaked depending on the impression that we might desire

to convey. Consider for example the data represented in Figure 5.19.

This provides two alternative ways of presenting the same data. The data are

for a hypothetical example of the percentage of respondents who gave a certain

answer to a survey item over a number of years. Graph (a) has the ordinate (the

vertical) compressed relative to graph (b); similarly graph (b) has the abscissa(the horizontal) relatively more compressed. The effect is to make the data

seem much smoother in graph (a) the rise and fall in responses is not so

dramatic. In graph (b) on the other hand, there is a very sharp rise and then fall,

making the change appear dramatic. Yet the two pictures describe the same

data!


16/20


Figure 5.19 Two ways of presenting data

In order to avoid such distortions the convention in research is to construct

graphs, wherever possible, such that the vertical axis is around two-thirds to

three-quarters the length of the horizontal.


17/20


Truncation of the ordinate

A similar effect to the one just described can be achieved by cutting out a

section of the ordinate. Consider the data on the relative pay of women to men

in Australia between 1950 and 1975, shown in Figure 5.20.

Figure 5.20 Womens income as a percentage of mens income, 19501975 presented in

two different ways


18/20


Graph (a) is complete in the sense that the vertical axis goes from 0 through to

100 percent. The vertical axis in graph (b) though has been truncated: a large

section of the scale between 0 percent and 50 percent has been removed and

replaced by a squiggly line to indicate the truncation. The effect is obvious: the

improvement in womens relative pay after 1965 seems very dramatic, as

opposed to the slight increase reflected in graph (a).

Selection of the start and end points of the abscissa

This is especially relevant with data that describe a pattern of change over time.

Varying the date at which to begin the data series and to end it can give

different impressions. For example, we may begin in a year in which the results

were unusually low, thereby giving the impression of increase over subsequent

years, and vice versa if we choose to begin with a year in which results

peaked. Consider, for example, the time series of data over a 15 year periodshown in Figure 5.21.

Figure 5.21

If we wanted to emphasize steady growth we could begin the graph where the

dashed is displayed, which would completely transform the image presented by

the graph.

Exercises

5.1 Which aspects of a distribution do pie graphs emphasize and whichaspects do bar graphs emphasize?


19/20


5.2 Explain the difference between a bar graph and a histogram.5.3 Survey your own statistics class in terms of the variables age, sex, and

health rating. Use the graphing techniques outlined in this chapter to

describe your results.

5.4 In Exercise 2.2 the following data were entered into SPSS (time, in

minutes, taken for subjects in a fitness trial to complete a certain

exercise task):

31 39 45 26 23 56 45 80

35 37 25 42 32 58 80 71

19 16 56 21 34 36 10 38

12 48 38 37 39 42 27 39

17 31 56 28 40 82 27 37

Using SPSS select an appropriate graphing technique to illustrate the

distribution. Justify your choice of technique against the other

available options.

5.5 Consider the following list of prices, in whole dollars, for 20 used cars:

8600 9200 8200 11,300 10,600

7980 11,100 12,900 10,750 9200

13,630 9400 11,800 10,200 12,240

11,670 10,000 11,250 12,750 12,990

From these data:

(a) Construct a frequency table using class intervals:

70008499, 85009999, 1000011499, 1150012999,

1300014499.

(b) Construct a histogram using these class intervals.

(c) Construct a frequency polygon using these class intervals.

5.6 Construct a pie graph to describe the following data:

Migrants in local area, place of origin

Place Number

Asia 900

Africa 1200

Europe 2100South America 1500

Other 300

Total 6000

What feature of this distribution does your pie graph mainly illustrate?


20/20


5.7 From a recent newspaper or magazine find examples of the use of

graphs. Do these examples follow the rules outlined in this chapter?

5.8 Use the Employee data file to answer the following problems with the

aid of SPSS.

(a) I want to emphasize the high proportion of all cases that have

clerical positions. Which graph should I generate and why?

Generate this graph using SPSS, and add necessary titles and

notes.

(b) Use a stacked bar graph to show the number of women and the

number of men employed in each employment category. What

does this indicate about the sexual division of labor in this

company?

(c) Generate a histogram to display the distribution of scores for

current salary. How would you describe this distribution in terms

of skewness?

GraphsVersion10-11gg

Documents

Transcript of GraphsVersion10-11gg