GraphsVersion10-11gg

  • 7/29/2019 GraphsVersion10-11gg

    5 The Graphical Description of Data: SPSS v. 10-11

    (reproduced from Chapter 5 of Statistics for Social and Health Research)

    Chapter 3 looked at ways of summarizing data by producing frequency tables.

    However, often the most striking way of summarizing information is with a

    graph. A graph (sometimes called a chart) provides a quick visual sense of the

    main features of the data. The particular graphs that can be constructed in any

    given context are determined largely by the level of measurement, as indicated in Table 5.1.

    Table 5.1 Graphs by level of measurement

    Level of measurement    Graph

    Nominal and ordinal     Pie graph

                            Bar graph

    Interval/ratio          Pie graph

                            Bar graph (discrete variables)

                            Histogram (continuous variables)

                            Polygon

    Some general principles

    In this chapter, to illustrate the procedures and principles involved in

    constructing graphs, we will use the hypothetical data for 20 people that we

    introduced in Chapter 2 and which are contained in the SPSS file Ch05.sav on

    the CD that comes with this book (Tables 5.2 to 5.4).

    Table 5.2 Sex of respondents

    Sex Frequency

    Male 12

    Female 8

    Total 20

    Table 5.3 Health rating of respondents

    Health rating Frequency

    Unhealthy 7

    Healthy 5

    Very healthy 8

    Total 20

    Table 5.4 Age of respondents

    Age in years Frequency

    18 7

    19 5

    20 4

    21 2

    22 2

    Total 20

    The same general rules that apply to the construction of these tables also apply

    to the construction of graphs. Most importantly, a graph should be a self-contained

    bundle of information. A reader should not have to search through

    the text in order to understand a graph (if one does have to search the text this

    may be a sign that the graph is concealing information rather than illuminating

    it). In order for a graph to be a self-contained description of the data we need

    to:

    • give the graph an appropriate title;

    • clearly identify the categories or values of the variable;

    • indicate, for interval/ratio data, the units of measurement;

    • indicate the total number of cases;

    • indicate the source of the data.

    We will now look at the construction of different types of graphs. Unlike

    previous chapters where we separated the SPSS procedures from the hand

    calculations, here we will only use SPSS to generate the relevant graphs. The

    superiority of computer graphics has made the hand-drawing of graphs

    redundant; unlike calculating the mean, where some knowledge of the

    procedure for doing things by hand can be helpful, we almost invariably go

    directly to a computer to generate a graph.

    Pie graphs

    A pie graph can be constructed for all levels of measurement.

    A pie graph presents the distribution of cases in the form of a circle. The

    relative size of each slice of the circle is equal to the proportion of cases within

    the category represented by the slice.

    To generate a pie graph on SPSS we select Graphs/Pie from the menu (Figure 5.1). The Pie Charts dialog box will appear, which provides some options for

    how the pie graph is to be defined. We are using the default setting, which is to

    group cases according to their values for the same variable. This is the

    appropriate choice for the data we are using so we click on the Define button.

    This brings up the dialog box shown in Figure 5.2.

    Figure 5.1 The Pie Graph command

    Figure 5.2 The Define Pie dialog box

    In this dialog box we select the variable we want to analyze and what the

    slices of the pie will represent. The default setting is for the slices to represent

    the number of cases in each category, which is what we want here. The output

    shown in Figure 5.3 will be generated when we click on the OK button.

    The pie chart describes the data in the same way as the frequency table above,

    but in a more visually striking form. However, the basic SPSS procedure

    produces a bare minimum amount of information. We do not get a title, nor the

    number or proportion of cases in each slice. Some of this information could

    have been generated if we had clicked on the Titles button in the Define Pie dialog box.

    A more complete set of options for improving the presentation of graphs in

    SPSS is available if we simply double-click on the graph in the Viewer

    window. This brings up the graph in a new SPSS window called the SPSS

    Chart Editor (Figure 5.4).

    Figure 5.3 SPSS Pie Graph output

    Figure 5.4 The SPSS Chart Editor

    From the menu bar at the top (or the tool bar below it) we have a range of

    options for improving the quality of our graph. Here we simply want to attach a

    title to the pie graph, and to provide appropriate labels to the slices. I leave it to

    you to explore the full range of options available under the Chart Editor. To

    attach a title and labels to the graph we use the Chart/Title, Chart/Options,

    and Chart/Footnotes commands.

    Figure 5.5 The Chart command

    • The Title command allows us to type in various levels of titles. Here our

    needs are simple so we type in Sex of respondents as Title 1.

    • The Options command allows us to change the information that is

    attached to each slice of the pie. We can see in Figure 5.3 that the default

    setting is for the text labels (in this example male and female)

    represented by each slice to be included. If we click on the boxes next to

    Values and Percents we will also generate the absolute and relative

    frequencies for each slice displayed, along with the text label. You may

    also wish to click on the Format button and set the number of decimal

    places to 0.

    • Using the Footnote command we can type Source: Hypothetical survey

    for Footnote 1 and n=20 for Footnote 2 to indicate respectively the

    source of data and the total number of cases represented by the graph. We

    can also right-justify these footnotes under Footnote Justification.

    If we follow these commands Figure 5.6 will appear.

    Figure 5.6 SPSS pie chart

    We can see that the basic chart is still the same, but now we have all the

    relevant information as well. Female respondents represent 8 out of 20 students,

    so the slice representing the frequency of female respondents is 8/20ths of the

    circle (40%). Male respondents represent 12 out of 20 cases (60%).
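
    The arithmetic behind the slices can be sketched in a few lines of Python.

    This is only an illustration of the proportions described above, not part of

    the SPSS procedure; the function name pie_slices is an assumption introduced

    here, and the counts are those of Table 5.2.

```python
# Frequencies from Table 5.2 (sex of respondents).
sex_counts = {"Male": 12, "Female": 8}


def pie_slices(counts):
    """Return {category: (proportion of cases, slice angle in degrees)}."""
    total = sum(counts.values())
    return {
        category: (count / total, 360 * count / total)
        for category, count in counts.items()
    }


slices = pie_slices(sex_counts)
for category, (proportion, angle) in slices.items():
    # Each slice takes the same share of the circle as its share of cases.
    print(f"{category}: {proportion:.0%} of cases, {angle:.0f} degree slice")
```

    Each slice's share of the 360 degrees of the circle equals the category's

    share of the 20 cases.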

    Now that we have the pie chart in front of us with all the necessary

    information what does it tell us about the distribution of the variable we are

    investigating? Pie graphs emphasize the relative importance of a particular

    category to the total. They are therefore mainly used to highlight distributions

    where cases are concentrated in only one or two categories. For example,

    Figure 5.7 illustrates the distribution of students by age, and clearly reflects the

    large proportion of total cases that are 18 years of age.

    Figure 5.7 SPSS pie chart with relative frequencies

    Pie graphs begin to look a bit clumsy when there are too many categories for

    the variable (about six or more categories, as a rule of thumb, is usually too

    many for a pie graph). If we look at the pie graph for the age distribution of

    students, we can see that if there were any more slices to the pie the chart would

    begin to look a little cluttered.

    A pie chart is actually a specific case of a more general type of graph. Any

    shape can be used to represent the total number of cases, and the area within it divided to show the relative number of cases in various categories. For

    example, the United Nations has used a champagne glass to illustrate the

    distribution of world income (Figure 5.8). Rather than using a simple circle, the

    metaphor of the champagne glass, a symbol of wealth, makes this graph a

    powerful illustrative device for showing the inequality of world income.

    Figure 5.8 World distribution of income

    Source: United Nations Development Program (1992) Human Development Report

    1992, Oxford: Oxford University Press.

    Bar graphs

    Another type of graph that can be produced from the same set of data as that

    used to generate a pie chart is a bar graph. While these two types of graph can

    be generated from exactly the same set of data, they describe different aspects

    of the distribution of that data. We have noted that a pie graph emphasizes the

    relative contribution of the number of cases in each category to the total. On the

    other hand, bar graphs emphasize the frequency of cases in each category

    relative to each other.

    A bar graph has two sides or axes:

    • Along one side of the graph, usually the horizontal base, are the values of

    the variable. This axis of the bar graph is called the abscissa.

    • Along the left vertical side of the graph are the frequencies, expressed

    either as the raw count or as percentages of the total number of cases. This

    vertical axis is known as the ordinate.
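
    As a rough illustration of the two axes, the following Python sketch draws a

    plain-text bar graph of the health ratings in Table 5.3. It is not SPSS output;

    the function name text_bar_graph is an assumption, and for convenience in

    plain text the bars run horizontally, so the categories are listed down the

    side and the frequencies run across.

```python
# Frequencies from Table 5.3 (health rating of respondents).
health_counts = {"Unhealthy": 7, "Healthy": 5, "Very healthy": 8}


def text_bar_graph(counts, symbol="#"):
    """Return one text line per category, with a bar as long as its frequency."""
    width = max(len(label) for label in counts)
    lines = []
    for label, count in counts.items():
        # Pad the label so that every bar starts in the same column.
        lines.append(f"{label:<{width}} | {symbol * count} {count}")
    return lines


bar_lines = text_bar_graph(health_counts)
print("Health rating of respondents (n=20)")
for line in bar_lines:
    print(line)
```

    Because every bar starts from the same baseline, the eye compares the

    categories with each other rather than with the total.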

    To generate a bar graph on SPSS we choose the Graphs/Bar command from

    the menu bar. This brings up the Bar Charts dialog box (Figure 5.9).

    Figure 5.9 The Bar Charts dialog box

    From this box we choose one of three types of bar charts, each of which we

    will explore separately:

    • Simple bar graphs

    • Clustered bar graphs

    • Stacked bar graphs

    Simple bar graphs

    The default setting in the SPSS Bar Charts dialog box (Figure 5.9) is for a

    simple bar chart to be generated. This is evident from the dark border around

    the Simple button.

    If we click on Define the Define Simple Bar: dialog box appears, which is

    almost identical to the Pie Graph dialog box we came across in the previous

    section. In this dialog box we select the variable we are investigating from the

    source variable list, which in this instance is Sex of respondents, and click on

    OK. This will generate a bar graph in the Viewer window.

    As with the pie graph we generated earlier we can tidy up this basic graph by

    going into the Chart Editor (double-click on the graph). I have generated a bar

    chart and edited it to add information about the source of data and the sample

    size, and to improve its appearance (Figure 5.10).

    Figure 5.10 An SPSS bar chart

    Again I emphasize that SPSS provides many options for improving the layout

    and appearance of a graph and the amount of information it contains. Often

    these choices (such as the wording and placement of titles) are matters of

    aesthetic judgment so you need not follow my preferences exactly. Play around

    with the options so that you produce something that you like, but always keep

    in mind the general rules of graph construction that we discussed above.

    Stacked bar graphs

    An alternative way of presenting information is to stack the categories on top of

    each other, rather than side by side.

    A stacked bar graph (sometimes called a component bar graph) layers in a

    single bar the number (or proportion) of cases in each category of a distribution.

    Each bar is divided into layers, with the area of each layer proportional to the

    frequency of the category it represents. It is very similar in this respect to a pie

    chart, but using a rectangle rather than a circle.

    One advantage of a stacked bar graph is that a number of distributions can be

    combined into a single chart. Such a graph is especially helpful in comparing

    cases broken down by the categories of another variable, such as distributions

    over time. For example, a health assessor may want to compare the current

    health rating of respondents with their ratings for the previous 2 years, the data

    for which are given in Table 5.5.

    Table 5.5 Health ratings of respondents, 1997-1999

    Health rating 1997 1998 1999

    Very healthy 3 6 8

    Healthy 9 6 5

    Unhealthy 15 11 7

    Total 27 23 20

    To generate a stacked bar graph in SPSS we click on the Stacked button from

    the Bar Charts dialog box (Figure 5.9). I have entered the data from Table 5.5

    in a separate SPSS file, with the year of the survey as a separate category axis

    variable. This generates the stacked bar graph in Figure 5.11.

    Figure 5.11 SPSS stacked bar chart output

    We can immediately see that, relatively speaking, these people regard

    themselves as getting healthier. Notice that we are using percentages in this

    graph to standardize the distributions, since the totals for each year vary.
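
    The standardization mentioned here can be sketched in Python using the counts

    in Table 5.5. This is an illustrative calculation only; the names

    health_by_year and to_percentages are assumptions introduced for the example.

```python
# Counts from Table 5.5 (health ratings of respondents, 1997-1999).
health_by_year = {
    1997: {"Very healthy": 3, "Healthy": 9, "Unhealthy": 15},
    1998: {"Very healthy": 6, "Healthy": 6, "Unhealthy": 11},
    1999: {"Very healthy": 8, "Healthy": 5, "Unhealthy": 7},
}


def to_percentages(counts):
    """Convert raw counts for one year into percentages of that year's total."""
    total = sum(counts.values())
    return {rating: 100 * n / total for rating, n in counts.items()}


percent_by_year = {
    year: to_percentages(counts) for year, counts in health_by_year.items()
}

for year, percentages in percent_by_year.items():
    layers = ", ".join(f"{rating} {pct:.1f}%" for rating, pct in percentages.items())
    print(f"{year}: {layers}")
```

    Each year's layers now sum to 100 percent, so the bars have equal height and

    the years can be compared even though the totals differ (27, 23, and 20).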

    Stacked bar graphs have the same limitations as pie graphs, especially when

    there are too many values stacked on top of each other. Figure 5.11 contains

    only three values stacked on top of each other to form each bar, but when there

    are more than five categories, and some of the layers are very small, the extra

    detail hinders the comparison of distributions that we want to make, rather than

    illuminating it.

    Clustered bar graphs

    An alternative to a stacked bar chart is to place the bars for each category side

    by side along the horizontal. This is called a clustered bar graph.

    A clustered bar graph displays, for each category of one variable, the

    distribution of cases across the categories of another variable.

    Figure 5.12 presents a clustered bar graph for health rating across years.

    Figure 5.12 An SPSS clustered bar chart

    Again we can see the relative decrease in the proportion of unhealthy

    respondents and the gradual increase in the proportion of very healthy

    respondents.

    Histograms

    Bar graphs constructed for discrete variables always have gaps between each of

    the bars: there is no gradation between male and female, for example. A

    person's age, on the other hand, is a continuous variable, in the sense that it

    progressively increases. Even though we have chosen discrete values (whole

    years) for age to organize the data, the variable itself actually increases in a

    continuous way (as we discussed in Chapter 1). As a result, the bars on a

    histogram, unlike a bar graph, are pushed together so they touch. The

    individual values (or class mid-points if we are working with class intervals) are

    displayed along the horizontal, and a rectangle is erected over each point. The

    width of each rectangle extends half the distance to the values on either side. The area of each rectangle is proportional to the frequency of the class in the

    overall distribution.
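
    The way each rectangle is built can be sketched in Python for the ages in

    Table 5.4. This is a minimal illustration that assumes the values are evenly

    spaced (whole years here); the function names are assumptions, not SPSS

    terms.

```python
# Frequencies from Table 5.4 (age of respondents, in whole years).
age_counts = {18: 7, 19: 5, 20: 4, 21: 2, 22: 2}


def bar_boundaries(values):
    """Lower and upper edge of each bar: half the distance to the neighbours.

    Assumes the values are evenly spaced.
    """
    values = sorted(values)
    half_step = (values[1] - values[0]) / 2
    return {v: (v - half_step, v + half_step) for v in values}


def area_shares(counts):
    """Area of each bar (width x frequency) as a share of the total area."""
    edges = bar_boundaries(counts)
    areas = {v: (edges[v][1] - edges[v][0]) * n for v, n in counts.items()}
    total_area = sum(areas.values())
    return {v: area / total_area for v, area in areas.items()}


boundaries = bar_boundaries(age_counts)
shares = area_shares(age_counts)
for age in sorted(age_counts):
    low, high = boundaries[age]
    print(f"{age}: bar from {low} to {high}, {shares[age]:.0%} of total area")
```

    Since neighbouring bars share an edge, the rectangles touch, and each bar's

    share of the total area equals its share of the cases.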

    Using the example of the age distribution of students we can construct the

    following histogram. We select Graphs/Histogram from the SPSS menu

    which brings up the Histogram dialog box (Figure 5.13).

    Figure 5.13 The Histogram dialog box

    This dialog box is one of the more straightforward ones that appear in SPSS.

    We basically select the relevant variable from the source list and transfer it to

    the target list. (Note for the next chapter, though, the option to Display normal curve.) Selecting Age in years and clicking on OK will produce the histogram

    in Figure 5.14.

    Figure 5.14 SPSS Histogram output

    We can see that the area for the rectangle over 18 years of age, as a proportion of the total area under the histogram, is equal to the proportion of all cases

    having this age. The histogram indicates that we are working with a continuous

    variable whereby we have clustered cases to the nearest whole year, so that a

    score of 18 years of age gathers together cases that are actually between 17.5

    and 18.5 years, a score of 19 years of age gathers together cases that are

    actually 18.5 to 19.5 years, and so on. SPSS has also generated some other

    descriptive statistics: the standard deviation, mean, and total number of cases,

    which conform to our calculations in Chapter 4.

    Frequency polygons

    When working with continuous interval/ratio data, we can also represent the

    distribution in the form of a frequency polygon.

    A frequency polygon is a continuous line formed by plotting the values or

    class mid-points in a distribution against the frequency for each value or class.

    If we place a dot on the top and center of each bar in a histogram, and connect

    the dots, we produce a frequency polygon (Figure 5.15). To illustrate this point I have not used SPSS to generate a polygon (which SPSS calls a line graph) for

    the age distribution of respondents, since we cannot superimpose a polygon

    onto the equivalent histogram in SPSS.

    Figure 5.15 A frequency polygon

    Notice that the polygon begins and ends on a frequency of zero. To explain

    why we close the polygon in this way we can think of this distribution as

    having seven values for age, rather than five, by including the ages 17 and 23.

    Since there are no such respondents with either of these ages the dot at these

    values is on the abscissa, indicating a frequency of zero.

    There are certain common shapes to polygons that appear in research. For

    example, Figure 5.16 illustrates the bell-shaped (symmetric) curve, the J-shaped

    curve, and the U-shaped curve. The bell-shaped curve is one we will explore in

    much greater length in later chapters.
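
    Returning to the age polygon in Figure 5.15, the way it is closed at a

    frequency of zero can be sketched in Python. The function name

    polygon_points and its step parameter are assumptions made for this

    illustration; the counts are those of Table 5.4.

```python
# Frequencies from Table 5.4 (age of respondents, in whole years).
age_counts = {18: 7, 19: 5, 20: 4, 21: 2, 22: 2}


def polygon_points(counts, step=1):
    """Return the (value, frequency) points of a closed frequency polygon.

    One extra value is added below the lowest and above the highest observed
    value, each with a frequency of zero, so the line starts and ends on the
    abscissa. `step` is the spacing between adjacent values.
    """
    values = sorted(counts)
    points = [(values[0] - step, 0)]
    points += [(value, counts[value]) for value in values]
    points.append((values[-1] + step, 0))
    return points


points = polygon_points(age_counts)
for value, frequency in points:
    print(value, frequency)
```

    The five observed ages become seven plotted points, with ages 17 and 23

    sitting on the abscissa.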

    Figure 5.16 Three polygon shapes: (a) a bell-shaped curve; (b) a J-shaped curve; and (c)

    a U-shaped curve

    For polygons that have a bell shape such as that in Figure 5.16(a) we usually

    describe two aspects of it. The first is the skewness of the distribution, an issue

    we raised in Chapter 4. If the curve has a long tail to the left, it is negatively

    skewed, whereas if it has a long tail to the right it is said to be positively

    skewed. The second important aspect of a bell-shaped curve is kurtosis.

    Kurtosis refers to the degree of peakedness or flatness of the curve. We can

    see in Figure 5.17(a) a curve with most cases clustered close together. Such a

    curve is referred to as leptokurtic, while curve (b), which has a wide distribution

    of scores that fatten the tails, is referred to as platykurtic. Curve (c) lies

    somewhere in between, and is called mesokurtic.

    Figure 5.17 A (a) leptokurtic, (b) platykurtic, and (c) mesokurtic curve

    There are precise numerical techniques for measuring the skewness and

    kurtosis of a distribution, but these are beyond the scope of this book.

    Wherever necessary we will simply rely on a visual inspection of a polygon to

    describe these aspects of the distribution.
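
    For readers curious about what such a numerical technique looks like, the

    Python sketch below applies one common moment-based definition of skewness

    and of excess kurtosis to the ages in Table 5.4. These formulas are an

    assumption brought in for illustration; they are not developed in this book,

    and other definitions (including the sample-adjusted ones that statistical

    packages often report) give somewhat different values.

```python
# Ages from Table 5.4, expanded from the frequency table into raw scores.
age_counts = {18: 7, 19: 5, 20: 4, 21: 2, 22: 2}
ages = [age for age, count in age_counts.items() for _ in range(count)]


def central_moment(scores, order):
    """Average of (score - mean) ** order over all cases."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** order for x in scores) / len(scores)


def skewness(scores):
    """Third central moment scaled by the cube of the standard deviation."""
    return central_moment(scores, 3) / central_moment(scores, 2) ** 1.5


def excess_kurtosis(scores):
    """Fourth central moment scaled by the squared variance, minus 3."""
    return central_moment(scores, 4) / central_moment(scores, 2) ** 2 - 3


print(f"skewness: {skewness(ages):.2f}")
print(f"excess kurtosis: {excess_kurtosis(ages):.2f}")
```

    A skewness above zero corresponds to a long tail to the right, which matches

    what a visual inspection of the age distribution suggests.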

    One aspect of frequency polygons (and histograms) that will be of utmost

    importance in later chapters needs to be pointed out, even though its relevance

    may not be immediately obvious. A polygon is constructed in such a way that

    the area under the curve between any two points on the horizontal, as a proportion of the total area, is equal to the proportion of cases in the

    distribution that have that range of values. Thus the shaded area in Figure 5.18,

    as a proportion of the total area under the curve, is equal to the proportion of

    cases that are aged 21 or above. This is 4/20, or 0.2 of all cases.

    Figure 5.18

    Another way of looking at this is to say that the probability of randomly

    selecting a respondent aged 21 years or more from this group is equal to the

    proportion of the total area under the curve that is shaded. This probability is

    0.2, or a 1-in-5 chance of selecting a respondent in this age group.
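
    The proportion quoted above can be checked directly from the frequencies in

    Table 5.4. The short Python sketch below does this with exact fractions; the

    function name proportion_at_least is an assumption made for the example.

```python
from fractions import Fraction

# Frequencies from Table 5.4 (age of respondents, in whole years).
age_counts = {18: 7, 19: 5, 20: 4, 21: 2, 22: 2}


def proportion_at_least(counts, threshold):
    """Share of all cases whose value is at or above the threshold."""
    total = sum(counts.values())
    selected = sum(n for value, n in counts.items() if value >= threshold)
    return Fraction(selected, total)


share = proportion_at_least(age_counts, 21)
print(f"Aged 21 or more: {share} of cases ({float(share)})")
```

    The same fraction serves as the probability of randomly selecting a

    respondent aged 21 years or more from this group.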

    Common problems and misuses of graphs

    Unfortunately graphs lend themselves to considerable misuse. Practically every

    day in a newspaper we can find one (if not all) of the following tricks, which

    can give a misleading impression of the data. For those wishing to look at this

    issue further the starting point is Darrell Huff's cheap, readable, and

    entertaining classic, How to Lie with Statistics (New York: W.W. Norton, various printings).

    Relative size of axes

    The same data can give different graphical pictures depending on the relative

    sizes of the two axes. By stretching either the abscissa or the ordinate the graph

    can be flattened or peaked depending on the impression that we might desire

    to convey. Consider for example the data represented in Figure 5.19.

    This provides two alternative ways of presenting the same data. The data are

    for a hypothetical example of the percentage of respondents who gave a certain

    answer to a survey item over a number of years. Graph (a) has the ordinate (the

    vertical) compressed relative to graph (b); similarly graph (b) has the abscissa (the horizontal) relatively more compressed. The effect is to make the data

    seem much smoother in graph (a): the rise and fall in responses is not so

    dramatic. In graph (b) on the other hand, there is a very sharp rise and then fall,

    making the change appear dramatic. Yet the two pictures describe the same

    data!

    Figure 5.19 Two ways of presenting data

    In order to avoid such distortions the convention in research is to construct

    graphs, wherever possible, such that the vertical axis is around two-thirds to

    three-quarters the length of the horizontal.
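
    This convention is easy to check for any planned graph. The Python sketch

    below is a minimal illustration; the function name follows_convention and

    the example dimensions are assumptions, and any unit of length can be used as

    long as both axes share it.

```python
def follows_convention(height, width, low=2 / 3, high=3 / 4):
    """True if the vertical axis is between two-thirds and three-quarters
    of the length of the horizontal axis."""
    return low <= height / width <= high


# Hypothetical axis lengths (vertical, horizontal), e.g. in centimetres.
for height, width in [(7, 10), (10, 10), (4, 10)]:
    verdict = "within" if follows_convention(height, width) else "outside"
    print(f"{height} x {width}: {verdict} the conventional range")
```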

    Truncation of the ordinate

    A similar effect to the one just described can be achieved by cutting out a

    section of the ordinate. Consider the data on the relative pay of women to men

    in Australia between 1950 and 1975, shown in Figure 5.20.

    Figure 5.20 Women's income as a percentage of men's income, 1950-1975 presented in

    two different ways

    Graph (a) is complete in the sense that the vertical axis goes from 0 through to

    100 percent. The vertical axis in graph (b) though has been truncated: a large

    section of the scale between 0 percent and 50 percent has been removed and

    replaced by a squiggly line to indicate the truncation. The effect is obvious: the

    improvement in women's relative pay after 1965 seems very dramatic, as

    opposed to the slight increase reflected in graph (a).
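
    The size of this effect can be put into numbers. The Python sketch below

    compares how tall two values are drawn on a full axis and on a truncated one.

    The two percentages are hypothetical values chosen for illustration (they are

    not read from Figure 5.20), and the function names are assumptions.

```python
def drawn_height(value, axis_min, axis_max):
    """Fraction of the vertical axis that a value occupies when plotted."""
    return (value - axis_min) / (axis_max - axis_min)


def apparent_ratio(earlier, later, axis_min, axis_max):
    """How many times taller the later value is drawn than the earlier one."""
    return drawn_height(later, axis_min, axis_max) / drawn_height(
        earlier, axis_min, axis_max
    )


# Hypothetical percentages, NOT taken from Figure 5.20.
earlier, later = 60, 75

full_axis = apparent_ratio(earlier, later, 0, 100)    # ordinate from 0 to 100
truncated = apparent_ratio(earlier, later, 50, 100)   # ordinate cut off below 50

print(f"Full axis: later point drawn {full_axis:.2f} times as high")
print(f"Truncated axis: later point drawn {truncated:.2f} times as high")
```

    With the full axis the drawn heights keep the same ratio as the values

    themselves; cutting off the lower half of the scale doubles the apparent

    ratio in this example.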

    Selection of the start and end points of the abscissa

    This is especially relevant with data that describe a pattern of change over time.

    Varying the date at which to begin the data series and to end it can give

    different impressions. For example, we may begin in a year in which the results

    were unusually low, thereby giving the impression of increase over subsequent

    years, and vice versa if we choose to begin with a year in which results

    peaked. Consider, for example, the time series of data over a 15-year period shown in Figure 5.21.

    Figure 5.21

    If we wanted to emphasize steady growth we could begin the graph where the

    dashed line is displayed, which would completely transform the image presented by

    the graph.
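
    The effect of the starting point can be sketched in Python with an invented

    series. The 15 values below are hypothetical (they are not the data behind

    Figure 5.21) and the function name percent_change is an assumption; the point

    is only that the same series supports opposite stories.

```python
# Hypothetical 15-year series, invented for illustration only.
series = [50, 48, 45, 40, 36, 30, 32, 35, 37, 40, 42, 44, 46, 48, 49]


def percent_change(values, start_index):
    """Percentage change from the chosen starting value to the final value."""
    start, end = values[start_index], values[-1]
    return 100 * (end - start) / start


whole_period = percent_change(series, 0)

# Start instead at the lowest point of the series.
low_index = series.index(min(series))
from_low_point = percent_change(series, low_index)

print(f"From year 1: {whole_period:+.1f}%")
print(f"From year {low_index + 1}: {from_low_point:+.1f}%")
```

    Starting at the first year shows a slight decline over the period, whereas

    starting at the low point shows strong and steady growth.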

    Exercises

    5.1 Which aspects of a distribution do pie graphs emphasize and which aspects do bar graphs emphasize?

    5.2 Explain the difference between a bar graph and a histogram.

    5.3 Survey your own statistics class in terms of the variables age, sex, and

    health rating. Use the graphing techniques outlined in this chapter to

    describe your results.

    5.4 In Exercise 2.2 the following data were entered into SPSS (time, in

    minutes, taken for subjects in a fitness trial to complete a certain

    exercise task):

    31 39 45 26 23 56 45 80

    35 37 25 42 32 58 80 71

    19 16 56 21 34 36 10 38

    12 48 38 37 39 42 27 39

    17 31 56 28 40 82 27 37

    Using SPSS select an appropriate graphing technique to illustrate the

    distribution. Justify your choice of technique against the other

    available options.

    5.5 Consider the following list of prices, in whole dollars, for 20 used cars:

    8600 9200 8200 11,300 10,600

    7980 11,100 12,900 10,750 9200

    13,630 9400 11,800 10,200 12,240

    11,670 10,000 11,250 12,750 12,990

    From these data:

    (a) Construct a frequency table using class intervals:

    7000-8499, 8500-9999, 10000-11499, 11500-12999,

    13000-14499.

    (b) Construct a histogram using these class intervals.

    (c) Construct a frequency polygon using these class intervals.

    5.6 Construct a pie graph to describe the following data:

    Migrants in local area, place of origin

    Place Number

    Asia 900

    Africa 1200

    Europe 2100

    South America 1500

    Other 300

    Total 6000

    What feature of this distribution does your pie graph mainly illustrate?

    5.7 From a recent newspaper or magazine find examples of the use of

    graphs. Do these examples follow the rules outlined in this chapter?

    5.8 Use the Employee data file to answer the following problems with the

    aid of SPSS.

    (a) I want to emphasize the high proportion of all cases that have

    clerical positions. Which graph should I generate and why?

    Generate this graph using SPSS, and add necessary titles and

    notes.

    (b) Use a stacked bar graph to show the number of women and the

    number of men employed in each employment category. What

    does this indicate about the sexual division of labor in this

    company?

    (c) Generate a histogram to display the distribution of scores for

    current salary. How would you describe this distribution in terms

    of skewness?