SAS Lecture 6 – SAS/GRAPH Aidan McDermott, May 3, 2005.


SAS Lecture 6 – SAS/GRAPH

Aidan McDermott, May 3, 2005

SAS/GRAPH

There are a small number of graphic types commonly used in public health presentations and publications.

These basic types are either used alone or mixed together to form a composite graphic.

Here we will look at how to build some of these basic types of graph.

Golden Rule: Everybody is a graph critic.

Two types of graph maker

If you are using SAS for statistics and data management then it seems natural to use it to produce your graphs as well. Sometimes a statistical procedure will produce the graph you are looking for anyway.

Do you need a one-off graph for a presentation, or production line graphs?

To produce “quick and dirty” graphs you can use Graph-n-go.
Very easy to use; not bad for putting multiple graphs on one page; data viewer is a graph type; only a small number of graph types available; not all options available; labor intensive so not suitable for production line graphs.

Use SAS/GRAPH procedures.
Very flexible; complete control over graphic elements; less labor intensive in the long run; harder to learn; same control can be used for SAS/STAT graphics output.

Some common types of graph

• Charts
• Histograms
• Stem and leaf plots
• Boxplots
• Plots
• Contour plots / 3-dimensional plots
• Maps
• Gantt charts
• Trellis plots
• Trees / pedigrees / dendrograms
• (Mathematical) graphs / networks
• Flow charts / entity-relationship diagrams

Graph-n-go

Solutions > Reporting > Graph-n-go

The top two icons represent data models.

The rest are data viewers.

Graph-n-go

Choose and configure a data model.

Choose a dataset.

Right mouse button click on the data model and choose properties.

Set which columns to use, where clauses etc.

Graph-n-go

Choose a viewer and position it on the viewer area (e.g. a bar chart).

Drag and drop the data model onto the viewer to associate data with the viewer.

Right mouse button on the viewer and choose properties.

Configure (choose variables to plot etc).

Graph-n-go

When finished, the graph can be exported to HTML etc.

Choose File > Export and write to file.

You’ll see more in the lab.

Graphic output within SAS

• You have already seen some graphic output from within SAS.

• proc means, proc univariate, proc genmod, proc lifetest etc. all produce graphs

• Other procedures in SAS specifically produce graphs, even some procedures that are not part of SAS/GRAPH (proc boxplot is an example)

Here our aim is to produce publication/presentation-quality graphs.

Graph basics

SAS stores graphs in catalogs (an entity similar to a folder in Windows).

Graphs are stored in a SAS proprietary format.

By default graphs are stored in a catalog called Gseg in the work library.

Graphs can be translated to PostScript, GIF, JPEG, and a number of other commonly used formats for printing or including in other documents (Word, HTML, etc.).
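The default catalog and the translation to a common image format can be sketched as follows. This is a minimal sketch, not part of the original lecture: it assumes the sample data set sashelp.class is available and that the folder c:\temp exists; the fileref gifout and the file name are hypothetical.

```sas
/* Hypothetical output file; change the path for your system */
filename gifout 'c:\temp\agechart.gif';

/* Send graphics output to a GIF file through the GIF device driver */
goptions reset=all device=gif gsfname=gifout gsfmode=replace;

proc gchart data=sashelp.class;
  vbar age / discrete;
run;
quit;

/* The graph is also stored in the default catalog WORK.GSEG; list its entries */
proc catalog catalog=work.gseg;
  contents;
run;
quit;

/* Restore the default graphics options and release the fileref */
goptions reset=all;
filename gifout clear;
```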

Graphic control

There are three ways to control the look of a SAS/GRAPH graph.

1. Use options within the procedure

2. Use global commands

3. Use goptions

GOPTIONS

set the environment for a graphics program to run and send output

independent of the program

remain in effect for the entire SAS session unless changed or reset

control appearance of graphic elements by specifying default fonts, colors, text heights etc.

Useful when you want the same options in multiple procs
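As an illustration of sharing one set of options across several procedures, here is a minimal sketch. It is not from the lecture: it assumes the sample data set sashelp.class, and the particular font and text height are arbitrary choices.

```sas
/* Set text height, font, and text color once for the whole session */
goptions reset=all gunit=pct htext=4 ftext=swiss ctext=black;

/* Both procedures below pick up the same settings */
proc gchart data=sashelp.class;
  vbar sex;
run;
quit;

proc gplot data=sashelp.class;
  plot weight * height;
run;
quit;

/* Return the graphics options to their defaults when finished */
goptions reset=goptions;
```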

PROC GOPTIONS

used to review current GOPTIONS

lists alphabetically all of the current GOPTIONS in the LOG window

    proc goptions;
    run;

Can also type goptions at the command line

GOPTIONS

GOPTIONS options-list;

ROTATE= portrait or landscape (will override the setting in the print dialog box)

RESET=ALL resets all options to defaults including all global statements

RESET=GOPTIONS resets only the graphics options set by GOPTIONS statements

COLORS= device-dependent default color list for the device driver

GUNIT= unit of measurement for height in global statements, such as TITLE and FOOTNOTE
  cell - character cells
  pct - percent of graphics area
  in - inches

Data

• From the SAS samples folder.

• Three Californian pollutant monitoring stations (AZU, LIV, SFO)

• One monthly measurement (taken on the 15th of the month) for CO, O3, SO4, temperature etc. for each station. 36 observations in all

• Month is a numeric variable taking the value 1 for January, 2 for February, etc.


Californian Air pollutant Data – ca88air


Charts

• Examples

Look for graphic elements in each chart

Look for common data types

Look for similarities among the examples


Charts

• All the examples used a small number of graphic elements

• Main difference between plots is the polygon/area type

• Most involved a categorical/discrete variable and a numeric variable. A histogram uses a continuous variable to create categories. The counts of a categorical variable can be used to create the numeric variable.


Proc GCHART

produces charts based on the values of one or more chart variables.

produces vertical and horizontal bar charts, block charts, pie charts etc.

graphs based on statistics - counts, percentages, sums, or means

run-group processing

numeric and character variables

Proc GCHART example

    /* Season labels for the derived season variable */
    proc format;
      value seas 1 = 'Win'
                 2 = 'Spr'
                 3 = 'Sum'
                 4 = 'Fal';
    run;

    /* Keep station SFO only and derive season from month */
    data ca88air;
      set vol1.ca88air(where=(station="SFO"));
      if      ( month in (12,1,2)  ) then season = 1;
      else if ( month in (3,4,5)   ) then season = 2;
      else if ( month in (6,7,8)   ) then season = 3;
      else if ( month in (9,10,11) ) then season = 4;
      format season seas.;
      format month mth.;   /* mth. is assumed to be a user-defined format created elsewhere */
    run;

Proc GCHART example

    title1 h=4 'Mean seasonal carbon monoxide for station SFO';
    footnote j=l h=4 f=simplex 'Bar Chart - vertical';

    proc gchart data=ca88air;
      vbar season / sumvar=co type=mean discrete ctext=black clm=95;
    run;
    quit;

Proc GCHART syntax

PROC GCHART data=data-set-name;

One of the following:

    VBAR variables / options;
    HBAR variables / options;
    STAR variables / options;
    PIE variables / options;
    BLOCK variables / options;

run;

VBAR

separate bar chart for each chart variable

each bar represents the statistic selected for a value of the chart variable

response axis (vertical) provides a scale for statistic graphed

midpoint axis - horizontal axis

VBAR SYNTAX

VBAR chart-variable(s) / options;

chart-variable(s) specifies one or more variables that define the categories of data to chart.

options specifies appearance, statistics, axes and midpoint options


VBAR

midpoints are the values of the chart variable that identify categories of data. By default, midpoints are selected or calculated by the procedure. The way the procedure handles the midpoints depends on whether the values of the chart variable are character, discrete numeric, or continuous numeric.

character chart variables - a separate bar is drawn for each value

VBAR

numeric chart variables - each bar represents a range of values

- the DISCRETE option generates a midpoint for each unique value of the chart variable.

- without DISCRETE, the procedure generates midpoints that represent ranges of values. By default, it determines the ranges, calculates the median value of each range, and displays the median value at each midpoint on the chart. A value that falls exactly halfway between two midpoints is placed in the higher range.
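The difference between the two treatments of a numeric chart variable can be seen with a short sketch. It is not part of the lecture and assumes the sample data set sashelp.class, in which age takes a few whole-number values and height is continuous.

```sas
proc gchart data=sashelp.class;
  /* DISCRETE: one bar for each distinct value of age */
  vbar age / discrete;
run;
  /* No DISCRETE: the procedure chooses ranges of height and labels each bar with its midpoint */
  vbar height;
run;
quit;
```

The second VBAR statement is submitted in the same procedure step, which is allowed because GCHART supports run-group processing.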

VBAR OPTIONS

For character or discrete numeric values, you can use the MIDPOINTS= option to rearrange the midpoints or to exclude midpoints from the chart.

For character data, MIDPOINTS= lists values in quotes:

    MIDPOINTS='Sydney' 'Atlanta' 'Paris'

VBAR OPTIONS

For continuous numeric variables, use the MIDPOINTS= option to change the number of midpoints, to control the range of values each midpoint represents, or to change the order of the midpoints. To control the range of values each midpoint represents, use the MIDPOINTS= option to specify the median value of each range. For example, to select the ranges 20-29, 30-39, and 40-49, specify

    MIDPOINTS=25 35 45
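A minimal sketch of MIDPOINTS= for both kinds of chart variable follows. It is an illustration only, assuming the sample data set sashelp.class; the midpoint values are arbitrary choices that happen to span the heights in that data set.

```sas
proc gchart data=sashelp.class;
  /* Continuous variable: each midpoint is the center of a five-unit-wide range */
  vbar height / midpoints=52.5 57.5 62.5 67.5 72.5;
run;
  /* Character variable: list the values in quotes, in the order the bars should appear */
  vbar sex / midpoints='M' 'F';
run;
quit;
```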

VBAR OPTIONS

Other options:

DISCRETE separate bar for each value of numeric variable

TYPE=statistic specifies the chart statistic.

FREQ frequency (the default when SUMVAR= is not specified)

PCT percentage

SUM sum (the default when SUMVAR= is specified)

MEAN mean

CLM=confidence-level draws chart confidence intervals (error bars)

VBAR SYNTAX

SUMVAR=variable
specifies the variable to use for sum or mean calculations for each midpoint. The resulting statistics are represented by the length of the bars along the response axis, and they are displayed at major tick marks. REQUIRED if specifying TYPE=MEAN or TYPE=SUM.

RAXIS=AXISn response axis

MAXIS=AXISn midpoint axis
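To show how SUMVAR=, TYPE=, RAXIS= and MAXIS= fit together, here is a small sketch. It is not from the lecture; it assumes the sample data set sashelp.class, and the axis labels are illustrative.

```sas
/* AXIS1 is used for the response axis, AXIS2 for the midpoint axis */
axis1 label=('Mean height');
axis2 label=('Sex');

proc gchart data=sashelp.class;
  /* One bar per value of sex; bar length is the mean of height */
  vbar sex / sumvar=height type=mean raxis=axis1 maxis=axis2;
run;
quit;

/* Cancel the axis definitions so they do not affect later graphs */
axis1;
axis2;
```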


GLOBAL STATEMENTS

define titles, footnotes

used to control axes, symbols, patterns, and legends

can be defined anywhere inside a proc or before a proc

in effect until canceled, replaced, or the end of SAS session

cancel by repeating statement with no options or using goptions RESET=ALL;


GLOBAL STATEMENTS

TITLE defines titles

AXIS defines appearance of axes

FOOTNOTE defines footnotes

PATTERN defines patterns used in graphs (histograms)

LEGEND defines legends

SYMBOL defines symbols (plotting)

NOTE adds text to graph


TITLE STATEMENT

creates, changes or cancels a title for all subsequent graphics output in a SAS session

up to 10 titles allowed

keyword TITLE can be followed by an unlimited number of text strings and options

text strings enclosed in single or double quotes

most recently created TITLE number replaces the previous TITLE of the same number

Title syntax

TITLE<1...10> <options | 'text'> ... <options-n | 'text-n'>;

Options:

FONT=font specifies the font for the subsequent text.

HEIGHT=n<units> (alias H=) specifies the height of text characters in number of units

JUSTIFY=R|L|C (alias J=) specifies the alignment: R=right, L=left, C=center. By default, JUSTIFY=C.
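The TITLE options can be tried out without any data by drawing a text-only slide. This is a minimal sketch, not part of the lecture; the fonts, heights and text are arbitrary choices.

```sas
goptions reset=all gunit=pct;   /* heights below are percent of the graphics area */

title1 font=swissb height=6 'Main title, centered by default';
/* Options apply to the text strings that follow them, so one title can mix alignments */
title2 font=swiss height=4 justify=left 'Left text' justify=right 'Right text';
footnote1 height=3 justify=left 'Footnote text';

/* PROC GSLIDE draws only the titles and footnotes */
proc gslide;
run;
quit;

/* A TITLE1 statement with no text cancels TITLE1 and all higher-numbered titles */
title1;
footnote1;
```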

PATTERN STATEMENT

defines the characteristics of patterns used in charts

type of fill pattern - solid, empty, lined

color

An example of a global statement

PATTERN STATEMENT

PATTERN<1...99> options;

OPTIONS

COLOR= pattern color

VALUE= fill
  E empty
  S solid
  Ln left slanting lines
  Rn right slanting lines
  Xn crosshatched lines
  where n is 1-5, 1 indicating the lightest
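A short sketch of two PATTERN statements applied to a bar chart follows. It is an illustration only, assuming the sample data set sashelp.class; the PATTERNID=MIDPOINT option is used so that the pattern changes from one bar to the next.

```sas
pattern1 color=blue value=solid;   /* solid fill */
pattern2 color=red  value=x3;      /* crosshatched lines, medium density */

proc gchart data=sashelp.class;
  /* Change pattern each time the midpoint value changes */
  vbar sex / patternid=midpoint;
run;
quit;

/* Cancel the patterns so they do not carry over to later graphs */
pattern1;
pattern2;
```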

Proc GCHART example

    /* VALUE=SOLID requests a solid fill (S in the list of fill values) */
    pattern1 color=blue value=solid;
    pattern2 color=red value=solid;

    proc gchart data=ca88air;
      star month / sumvar=co type=mean discrete ctext=black noheading;
    run;
    quit;

Exporting graphs

Make sure the graphics window has focus, by clicking on it.

File > Export as Image

select type of image – gif, …

open other software program – PowerPoint

insert picture

Saving graphs

Graphs can also be saved in a SAS catalog. They are stored in a SAS proprietary format. They can be viewed with proc greplay.

    goptions replace;
    libname mylib 'c:\Temp\sasclass\myfiles';
    proc gchart data=mydat gout=mylib.mygraphs;
    …

proc greplay allows multiple plots on one page.
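Saving a graph to a permanent catalog and replaying it can be sketched as below. This is not the lecture's code: it assumes the sample data set sashelp.class and a writable folder c:\temp, and the libref, catalog name and graph name are hypothetical.

```sas
libname mylib 'c:\temp';   /* hypothetical location for the permanent catalog */

/* GOUT= names the catalog; NAME= names the graph entry inside it */
proc gchart data=sashelp.class gout=mylib.mygraphs;
  vbar age / discrete name='agebar';
run;
quit;

/* NOFS runs GREPLAY in line mode rather than in its windows */
proc greplay igout=mylib.mygraphs nofs;
  list igout;       /* write the catalog entries to the log */
  replay agebar;    /* redisplay the stored graph */
run;
quit;
```

If an entry called agebar already exists in the catalog, SAS stores the new graph under a numbered variant of the name, so the REPLAY statement may need adjusting on a second run.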

PROC GPLOT

graphs one variable against another, producing presentation quality plots

coordinates of each point correspond to the values in one or more observations of the input data set.

run-group processing
- procedure does not end with a run
- submit new statements and produce more graphs without another PROC
- ends with QUIT or PROC or DATA


Proc GPLOT

produces two-dimensional graphs that plot one variable against another within a set of coordinate axes

graphs are automatically scaled to the values of your data, although scaling can be controlled with options or with AXIS statements.

scatterplots, bubble plots, plots with interpolated lines (SYMBOL statement)

[Figure: anatomy of a plot, labeling the vertical axis (Y variable), the horizontal axis (X variable), the tick marks, and the tick mark values.]


GPLOT SYNTAX

PROC GPLOT data=data-set-name <options>;

PLOT request list </options list>;

request list is of the form:

vertical*horizontal e.g. PLOT y*x;

vertical*horizontal=variable e.g. PLOT y*x=z;
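Both forms of the plot request can be tried with a small sketch. It is an illustration only and assumes the sample data set sashelp.class rather than the lecture data.

```sas
proc gplot data=sashelp.class;
  /* vertical*horizontal: one scatter of weight against height */
  plot weight * height;
run;
  /* vertical*horizontal=variable: a separately marked set of points for each value of sex */
  plot weight * height = sex;
run;
quit;
```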

GPLOT SYNTAX

Graphics options on PLOT statement

CTEXT= color

LEGEND=LEGENDn (uses nth global LEGEND statement)

HAXIS=AXISn (uses nth global AXIS statement)

VAXIS=AXISn (uses nth global AXIS statement)

Proc GPLOT example

• Suppose we are asked to draw a plot of ozone by month for the three stations SFO, LIV, AZU. After consulting the help we might try:

    proc gplot data=ca88air;
      plot o3 * month;
    run;
    quit;

which produces:

Proc GPLOT example

• Increase the size of the text
• use a format to print out Month names
• clear the unwanted footnote

    goptions gunit=pct htext=4;
    footnote1;

    proc gplot data=ca88air;
      plot o3 * month;
      format month mth.;
      title1 '1988 Air Quality Data - Ozone';
    run;

Proc GPLOT example

• back to the help
• you can make a stratified plot by station
• x axis too crowded - use a different format

    proc gplot data=ca88air;
      plot o3 * month = station;
      format month mthc.;
      title1 '1988 Air Quality Data - Ozone';
    run;

Proc GPLOT example

• the symbols in the plot are too small
• use symbol global statements!

    symbol1 v=dot i=join c=blue h=1.3;
    symbol2 v=dot i=join c=green h=1.3;
    symbol3 v=dot i=join c=brown h=1.3;

    proc gplot data=ca88air;
      plot o3 * month = station;
      format month mthc.;
      title1 '1988 Air Quality Data - Ozone';
    run;

Proc GPLOT example

The x-axis is not right - use an axis global statement

    axis1 minor = none
          label = (f=simplex j=c 'Ozone levels at three locations')
          major = (h=1.1)
          order = (0 to 13 by 1)
          value = (f=simplex h=3.0);

    proc gplot data=ca88air;
      plot o3 * month = station / haxis=axis1;
      format month mthc.;
      title1 '1988 Air Quality Data - Ozone';
    run;

Proc GPLOT example

• The x-axis has extra characters - use a new format or use an axis global statement
• y-axis label needs to be rotated and placed in center of axis
• legend needs moving - legend global command

    axis1 minor = none
          label = (f=centb j=c 'Ozone levels at three locations')
          major = (h=1.0)
          order = (0 to 13 by 1)
          value = (f=simplex h=3.0 " " "J" "F" "M" "A" "M" "J" "J" "A" "S" "O" "N" "D" " ");

Proc GPLOT example

    axis2 label = (f=centb rotate=0 angle=90 j=c 'Ozone')
          value = (f=simplex h=3.0);

    legend1 across=3
            position=(bottom center inside)
            label=none;

    proc gplot data=ca88air;
      /* legend=legend1 applies the LEGEND1 statement defined above */
      plot o3 * month = station / haxis=axis1 vaxis=axis2 legend=legend1;
      format month mthc.;
      title1 '1988 Air Quality Data - Ozone';
    run;

proc g3d and proc gcontour produce 3-dimensional analogs of gplot

Maps

• You can use proc gmap to make simple presentation maps

• There is another product by SAS called SAS/GIS - i.e. SAS / geographical information system

Data

• taken from the CDC web page

• AIDS prevalence during 1997-1998

• rate is given for each state per 100,000 of population

• state is given by name and two letter code

• map data is provided by SAS in the library maps -- the map we will use is maps.us

• if you look in the maps library you will see data for maps for most countries and world maps

Data

• this data uses FIPS coding to match geographic boundaries, e.g. the FIPS code for Alaska is 02 and Maryland is 24

• We need to join the AIDS data and the FIPS codes in order to map the data

    proc sort data=aids; by name; run;
    proc sort data=state; by name; run;

    /* Keep only the states present in both data sets */
    data join;
      merge aids(in=inaids) state(in=instate);
      by name;
      if inaids and instate then output join;
    run;


Proc GMAP

• proc gmap is used to create a number of different types of map

• the map we will be interested in is a choropleth map -- this is a map in which the rates will be color-coded by state.

• such a map shares many of the properties of a chart, particularly a pie or star chart -- both use areas to represent information, but in the case of the choropleth map the color/shading contains the display information


Proc GMAP

• First we set up some global title and footnote statements:

    title1 color=blue font=centb "Acquired immunodeficiency syndrome (AIDS) by state";
    title2 font=cent "(per 100,000 of population)";
    title3 font=cent "12 months ending June, 1998";

    footnote1 color=green justify=left " Choropleth Map";

Proc GMAP

• the syntax of proc gmap is like other graphic procedures we have met, but it specifically requires:
  – a map dataset (maps.us in this case)
  – an id variable which is present in both the map dataset and the dataset we wish to map (in this case the variable state is in both datasets and contains the fips code)
  – the syntax is:

    proc gmap map=map data=data;
      id idvar;
      choro rate / options;
    run;

Proc GMAP

    title1 color=blue font=centb "Acquired immunodeficiency syndrome (AIDS) by state";
    title2 font=cent "(per 100,000 of population)";
    title3 font=cent "12 months ending June, 1998";

    footnote1 color=green justify=left " Choropleth Map";

    proc gmap map=maps.us data=join;
      id state;
      choro rate / coutline=black
                   midpoints=5.0 10.0 15.0 20.0 25.0 35.0;
    run;

Proc GMAP

Instead of a choropleth map, you could also make a surface map. For example:

    proc gmap map=maps.us data=join;
      id state;
      surface rate / constant=20 cbody=red nlines=100;
      footnote1 color=green justify=left " Surface Map";
    run;

Axis statement

defines appearance and location of axes and tick marks

defines text and appearance of axis label

defines order of data values on axis

up to 99 active AXIS statements in a SAS session

Syntax: AXIS<1...99> <option(s)>;

Axis statement options

ORDER=(value list)
specifies the data values in the order they are to appear on the axis. The values specified by ORDER= are the major tick mark values. These values are displayed at the major tick marks unless they are modified by the VALUE= option.

Examples:

    ORDER=(10 to 50 by 10)
    ORDER=(10,20,30,40,50)

Axis statement options

LABEL=(text-description 'text string')
By default, the text of the axis label is either the variable name or a previously assigned variable label. Enclose each string in quotation marks.

    COLOR=text-color
    ANGLE=degrees
    FONT=font | NONE
    HEIGHT=text-height <units>
    JUSTIFY=LEFT | CENTER | RIGHT

Example:

    LABEL=(font=swissb color=blue j=l a=90 'Systolic BP mmHG')

Axis statement options

VALUE=(text-description-1 'text-1' ... text-description-n 'text-n')
modifies the major tick mark values, that is, the text that labels the major tick marks on the axis. Text-description defines the appearance and 'text' is the text of a major tick mark value.

    COLOR=text-color
    ANGLE=degrees
    FONT=font | NONE
    HEIGHT=text-height <units>
    JUSTIFY=LEFT | CENTER | RIGHT
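The ORDER=, LABEL= and VALUE= options can be combined in one AXIS statement, as in the sketch below. It is an illustration only, assuming the sample data set sashelp.class (ages 11 to 16); the label text and tick mark text are arbitrary.

```sas
goptions reset=all gunit=pct;

/* Horizontal axis: fixed tick marks, custom label, and replacement tick mark text */
axis1 order=(11 to 16 by 1)
      minor=none
      label=(font=swissb color=blue justify=center 'Age group')
      value=(height=3 'Eleven' 'Twelve' 'Thirteen' 'Fourteen' 'Fifteen' 'Sixteen');

/* Vertical axis: fixed range and a label rotated to read along the axis */
axis2 order=(50 to 75 by 5)
      label=(angle=90 'Height');

proc gplot data=sashelp.class;
  plot height * age / haxis=axis1 vaxis=axis2;
run;
quit;
```

VALUE= supplies one text string per major tick mark, so the number of strings should match the number of values generated by ORDER=.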


Symbol statement

specifies symbols in GPLOT

defines appearance of symbols, plot lines, including bars, boxes, confidence limits, and area fills

interpolation methods

SYMBOL<1...99> options;

COLOR= symbol color

FONT= font

HEIGHT= n <units>

INTERPOL= R<type>
        = STEP (for KM plots)
        = BOX

VALUE= symbol

WIDTH= n
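A minimal sketch of a SYMBOL statement with a regression interpolation follows. It is not part of the lecture; it assumes the sample data set sashelp.class, and uses INTERPOL=RL, the linear case of the R<type> family.

```sas
/* Blue dots with a fitted straight line drawn through the points */
symbol1 value=dot color=blue height=1.5 interpol=rl width=2;

proc gplot data=sashelp.class;
  plot weight * height;
run;
quit;

/* Cancel the symbol definition so it does not affect later plots */
symbol1;
```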