Week 7: More graphics and SAS/BASE procedures

SAS Programming October 2, 2014 1 / 93

Paneled graphics: crime data

You can make paneled graphics using SGPANEL. This allows many subfigures in the same plot. For example, for the crime data, you can plot crime against population for each state.

Paneled graphics

Note that you can specify the dimensions of the array, for example 6x6 or 5x7. SAS does a good job of avoiding white space between figures and of sharing the same axes across all rows and columns, which saves space.
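
A minimal sketch of a paneled plot with explicit grid dimensions. The dataset crime, its variables (state, population, crimerate), and the values are made up for illustration; only the PROC SGPANEL statements and the ROWS= and COLUMNS= options are standard SAS.

```sas
/* Toy data with made-up values; dataset and variable names are assumptions */
data crime;
  input state $ population crimerate;
  datalines;
NM 2.0 6.1
NM 2.1 5.8
AZ 6.4 4.9
AZ 6.7 4.6
;
run;

proc sgpanel data=crime;
  /* one panel per state; use e.g. rows=6 columns=6 when there are many states */
  panelby state / rows=1 columns=2;
  scatter x=population y=crimerate;
run;
```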

Paneled graphics: UNISCALE option

Paneled graphics: Earthquake example

Here is a 1x3 arrangement, but it stretches the y-axis.

Making a 1x3 arrangement that is not stretched. If you have a better solution, let me know!

You can also have two BY variables for paneled graphics. SAS fills in a panel for each combination of the BY variables.

Some combinations of the BY variables can be empty. In this case there are 19 nonempty plots.

There are several options for how to present the data. You could use panelby day eventtype instead. You can use an option to skip empty panels, and you can use the layout option to control other aspects of how the panels are arranged. The layout=lattice option arranges the panels in a grid, with the levels of one variable as columns and the levels of the other as rows.
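
A small sketch of a lattice layout with two panel variables. The dataset quakes and its variables are hypothetical, and the values are invented. UNISCALE= controls which axes share a common scale; here only the column axes do.

```sas
/* Toy data with made-up values; names are assumptions */
data quakes;
  input eventtype :$12. day mag depth;
  datalines;
earthquake 1 2.1 5.0
earthquake 2 3.4 8.2
blast 1 1.2 0.5
;
run;

proc sgpanel data=quakes;
  /* lattice: the first variable defines the columns, the second the rows */
  panelby eventtype day / layout=lattice uniscale=column;
  scatter x=depth y=mag;
run;
```

In this toy data the combination blast/day 2 has no observations, so that cell of the lattice is empty.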

Scatterplot matrix: Earthquake data

(I used linux because SAS Studio had a bad connection...)

Scatterplot matrix: SGSCATTER for Earthquake data

SGSCATTER subfigures: Earthquake data

You can also create arrays of plots using PLOT statements within SGSCATTER, but these take up more space outside the plotting area than SGPANEL does.
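
The two SGSCATTER styles mentioned so far (a scatterplot matrix and an array of individual plots) could look roughly like this. The dataset and variable names are assumptions and the values are invented.

```sas
/* Toy data with made-up values; names are assumptions */
data quakes;
  input mag depth latitude longitude;
  datalines;
2.1 5.0 35.1 -106.6
3.4 8.2 35.4 -106.9
1.2 0.5 34.9 -107.1
1.5 0.7 35.0 -107.3
;
run;

/* Scatterplot matrix of all pairs of the listed variables */
proc sgscatter data=quakes;
  matrix mag depth latitude longitude;
run;

/* Array of separate y*x plots */
proc sgscatter data=quakes;
  plot mag*depth mag*latitude mag*longitude;
run;
```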

3D plots: Earthquake data

There are options for tilting and rotating.
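
As a sketch, assuming SAS/GRAPH is available, PROC G3D accepts ROTATE= and TILT= options (angles in degrees). The dataset, variable names, and values below are invented for illustration.

```sas
/* Toy data with made-up values; names are assumptions */
data quakes;
  input latitude longitude depth;
  datalines;
35.1 -106.6 5.0
35.4 -106.9 8.2
34.9 -107.1 0.5
35.0 -107.3 0.7
;
run;

proc g3d data=quakes;
  /* 3D scatter of depth over the (longitude, latitude) plane */
  scatter latitude*longitude=depth / rotate=30 tilt=60;
run;
```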

3D Plots

There’s a lot more that you can do with 3D plots that I haven’t explored. You can create grid lines and evaluate a function of two variables at all points on the grid and create a surface plot.

To do something like this with data that isn’t evenly spaced, you can use PROC KDE, which gives kernel density estimates for the value of the surface, and use output from this procedure to generate 3D plots.
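
A hedged sketch of the PROC KDE route, not the code from the slides. It assumes SAS/STAT and SAS/GRAPH are installed. The dataset and values are invented, and the output variable names (value1, value2, density) are what I believe the BIVAR OUT= data set contains; running PROC CONTENTS on it would confirm.

```sas
/* Toy data with made-up values; names are assumptions */
data quakes;
  input latitude longitude;
  datalines;
35.1 -106.6
35.4 -106.9
34.9 -107.1
35.0 -107.3
35.2 -106.8
35.3 -107.0
;
run;

/* Bivariate kernel density estimate evaluated on a regular grid */
proc kde data=quakes;
  bivar longitude latitude / out=dens;
run;

/* Surface plot of the estimated density over the grid */
proc g3d data=dens;
  plot value2*value1=density;
run;
```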

Some basic SAS procedures

We’ll look at some basic SAS procedures useful for examining and summarizing data in a little more depth. In particular, these are PROC UNIVARIATE, PROC MEANS, and PROC FREQ. We’ve used the latter two a little bit, but we’ll look in more depth at what they can do, and also take a look at PROC UNIVARIATE.

PROC UNIVARIATE

This procedure summarizes your data one variable at a time. The “summary” tends to be very extensive, so this can generate tons of output. If you’ve ever wanted to summarize four observations with a five-number summary, PROC UNIVARIATE can do it.
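
A minimal sketch of running PROC UNIVARIATE on four observations. The dataset name four, the variable x, and the values are made up.

```sas
/* Four made-up observations */
data four;
  input x @@;
  datalines;
1.2 3.4 2.2 5.1
;
run;

proc univariate data=four;
  var x;   /* moments, tests for location, quantiles, extreme observations */
run;
```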

PROC UNIVARIATE: output

Usually the output is a lot more than you are interested in. Here is a guideline:

1. N, the sample size or number of observations

2. Mean, the sample average

3. Std Deviation, the usual formula $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$

4. Skewness, a measure of how asymmetric a distribution is. A symmetric distribution has skewness 0. Negative skew means that it is skewed to the left. Positive skew means that it is skewed to the right (like an exponential distribution). Skewness is based on the third moment of a distribution, $E[X^3]$. A formula for skewness is $\frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s}\right)^3$

1. Uncorrected SS, the uncorrected sum of squares, i.e., the sum of the squared observations, $\sum_{i=1}^{n} x_i^2$

2. Coefficient of variation, the estimated standard deviation over the estimated mean, $s/\bar{x}$ (SAS reports it as a percentage, $100\,s/\bar{x}$). This is used in industrial applications, quality control, and engineering.

3. Sum of weights, normally this is the same as the sample size unless you have observations weighted by how frequently they appear in a separate variable

4. Sum of observations, $\sum_{i=1}^{n} x_i$

5. Variance, the sample variance

6. Kurtosis, a measure of how peaked a distribution is, based on the fourth moment. Theoretically, this is defined as $E[(X-\mu)^4]/\sigma^4$, where $(\sigma^2)^2 = \sigma^4$ is the square of the variance. For a normal distribution, the kurtosis is 3, but SAS subtracts 3 from the kurtosis so that if your data are normal, the reported kurtosis is close to 0.

Kurtosis for different distributions

All of these distributions have identical first, second, and third moments, but can be distinguished by their fourth moments. The distributions are Laplace (double exponential), hyperbolic secant, logistic, normal, raised cosine, Wigner semicircular, and uniform.

PROC UNIVARIATE

1. Std Error Mean. The standard error of the mean is the sample standard deviation divided by the square root of the sample size, $s/\sqrt{n}$, which is what you use for constructing confidence intervals.

2. A single-sample t-test is done automatically, as well as some nonparametric tests, testing whether the location of the data differs from 0. The Sign Test, for example, asks how likely a split of signs at least as lopsided as the observed one would be if each observation were equally likely to be positive or negative. Under that assumption there is a $2 \times (1/2)^4 = 1/8$ chance that all 4 observations have the same sign, hence the p-value of 1/8.

3. Extreme observations are also highlighted to show the highest and lowest observations. In this case, since there are only 4 observations, it lists them all, but for larger data sets, this can be useful for checking for outliers or values that are not within acceptable limits (negative heights or lengths, magnitude 67 earthquakes, etc.)

You can also use the PLOT option in PROC UNIVARIATE, which generates histograms or stem-and-leaf plots and plots for checking for normality.
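
A short sketch of the PLOT option, here combined with the NORMAL option, which adds formal tests for normality. The data values and names are made up.

```sas
/* Made-up data */
data four;
  input x @@;
  datalines;
1.2 3.4 2.2 5.1
;
run;

proc univariate data=four plot normal;
  var x;   /* PLOT adds the plots; NORMAL adds tests for normality */
run;
```

Whether the plots appear as text or as graphics depends on the output destination and on whether ODS Graphics is enabled.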

PROC UNIVARIATE plot option

Charming text graphics in the .lst file from linux SAS...

PROC UNIVARIATE BY statement

You can also run PROC UNIVARIATE BY some variable, for example sex for the temperature data (the data must first be sorted by that variable). This will generate twice as much output, so it can of course generate a ridiculous amount of output.

Typically, you might run PROC UNIVARIATE initially when exploring your data to understand its range, look for outliers, and count missing values for each variable (it doesn’t describe patterns of missingness across several variables jointly). So you might use PROC UNIVARIATE to explore your data, but then not include it in your final SAS code.

Note that PROC UNIVARIATE only describes quantitative variables, not character variables. PROC FREQ is more useful for describing character variables.
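
A sketch of BY-group processing. The dataset temps, the variables sex and temp, and the values are stand-ins for the temperature data, not the actual course data.

```sas
/* Made-up stand-in for the temperature data */
data temps;
  input sex $ temp;
  datalines;
F 98.4
M 98.1
F 98.8
M 97.9
;
run;

/* BY-group processing requires the data to be sorted by the BY variable */
proc sort data=temps;
  by sex;
run;

proc univariate data=temps;
  by sex;      /* one full set of output per sex */
  var temp;
run;
```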

PROC MEANS

We’ve encountered just a little bit of what PROC MEANS can do before. We’ll take a more thorough look now. PROC MEANS, like PROC UNIVARIATE, is useful for quantitative variables. The default behavior is to compute the number of nonmissing values, the mean, the standard deviation, and the MIN and MAX values. PROC MEANS generates less output than PROC UNIVARIATE and is also useful for catching outliers.

Some options for PROC MEANS include MAXDEC= (the number of decimal places printed), FW= (the field width used for each statistic in the output), and SUM to calculate the sum of the observations.

The CLASS statement allows you to compute means and other statistics within the levels of a class variable (e.g., separate means for men and women).
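
A sketch of PROC MEANS with a CLASS statement and a few explicitly requested statistics. The dataset temps and its values are made up.

```sas
/* Made-up stand-in for the temperature data */
data temps;
  input sex $ temp;
  datalines;
F 98.4
M 98.1
F 98.8
M 97.9
;
run;

proc means data=temps n mean std min max sum maxdec=2;
  class sex;   /* separate statistics for each sex; no sorting needed */
  var temp;
run;
```

Unlike a BY statement, CLASS does not require the data to be sorted first.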

PROC MEANS: output data set

You can use PROC MEANS to create an output data set.

You can then extract relevant information from this dataset if desired.
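
A sketch of creating and inspecting an output data set. The names temps, tempstats, avgtemp, and sdtemp are illustrative, and the data values are invented.

```sas
/* Made-up stand-in for the temperature data */
data temps;
  input sex $ temp;
  datalines;
F 98.4
M 98.1
F 98.8
M 97.9
;
run;

proc means data=temps;
  var temp;
  output out=tempstats mean=avgtemp std=sdtemp;
run;

/* tempstats also contains the automatic variables _TYPE_ and _FREQ_ */
proc print data=tempstats;
run;
```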

PROC MEANS: getting the average into a new column

Suppose we want the average temperature to be available for the temperature data, and to be in the same dataset as the other temperature data. How can we do this?

The easiest thing would be to run PROC MEANS, write down the average on a piece of paper, and then hard-code it by hand into the original data as a fixed column (every row has the same value). This might be fine for most applications, but if you needed to repeat this every month or every week, you might prefer a more automatic solution.

Another solution is to compute the mean in PROC MEANS, and then read that into a copy of the data set.
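
One common way to do this is sketched below; it is not necessarily the code from the slides. The one-row summary data set is read once, on the first iteration of the DATA step, and because variables read with SET are retained, its value is carried onto every row. All names and values are assumptions.

```sas
/* Made-up stand-in for the temperature data */
data temps;
  input sex $ temp;
  datalines;
F 98.4
M 98.1
F 98.8
M 97.9
;
run;

proc means data=temps noprint;
  var temp;
  output out=tempstats(keep=avgtemp) mean=avgtemp;
run;

data temps2;
  if _n_ = 1 then set tempstats;  /* read the single summary row once */
  set temps;                      /* avgtemp is retained on every row */
  diff = temp - avgtemp;          /* example use of the new column */
run;
```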

PROC MEANS: Options

These are from the book. Another useful option is noprint, which suppresses output. This is useful if you are mostly interested in creating an output data set.

PROC MEANS: naming variables

You can either have SAS automatically name variables in the output dataset or you can name them yourself. Consider the following:
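
A sketch of both naming styles, and of requesting different statistics for different variables. The variable hr (heart rate) and all data values are assumptions.

```sas
/* Made-up data; temp and hr are assumed variable names */
data temps;
  input sex $ temp hr;
  datalines;
F 98.4 72
M 98.1 68
F 98.8 80
M 97.9 64
;
run;

proc means data=temps noprint;
  var temp hr;
  /* automatic names such as temp_Mean, hr_Mean, temp_StdDev, hr_StdDev */
  output out=auto mean= std= / autoname;
  /* names chosen by hand, with a different statistic for each variable */
  output out=named mean(temp)=avgtemp max(hr)=maxhr;
run;

proc print data=auto;  run;
proc print data=named; run;
```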

PROC MEANS: _TYPE_

This seems a bit obscure. The output data set contains a variable _TYPE_. If you use chartype as an option, its value of 0 or 1 tells you whether the row holds a marginal mean or a cell mean. 0 indicates a marginal mean and 1 indicates a cell mean. Thus 0 means that it is taking an overall average. If you tell the procedure to compute means for the individual sexes, a 1 indicates that the row gives the mean for that particular sex. You can also tell this from _FREQ_, since 129 is the number of observations in the data set.
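
A sketch that produces both kinds of rows in one output data set. The data are made up, so _FREQ_ here is 4 for the overall row rather than 129.

```sas
/* Made-up stand-in for the temperature data */
data temps;
  input sex $ temp;
  datalines;
F 98.4
M 98.1
F 98.8
M 97.9
;
run;

proc means data=temps noprint chartype;
  class sex;
  var temp;
  output out=bysex mean=avgtemp;
run;

/* One row with _TYPE_ = 0 (overall mean), then one row per sex with _TYPE_ = 1 */
proc print data=bysex;
run;
```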

PROC MEANS: different stats for different variables

PROC FREQ

PROC FREQ is most useful for categorical data or quantitative data with few values. We’ve already seen some use of PROC FREQ; now we’ll look at some options that can be used. The most basic use of PROC FREQ is

proc freq data=mydata;
  tables myvar; /* Use tables instead of var */
run;

PROC FREQ: options

The COMPRESS option is useful when you are generating .lst files and don’t want output to be too verbose.

Example with Earthquake data

Notice that the categories are left-justified. This is true even if I classify quakesize numerically (without quotes).

Putting an asterisk between two variables in the TABLES statement makes a 2-way contingency table instead of analyzing each variable separately.
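
The difference between the two forms, sketched on a made-up version of the earthquake data (dataset, variable names, and values are assumptions):

```sas
/* Toy data with made-up values; names are assumptions */
data quakes;
  input eventtype :$12. quakesize $;
  datalines;
earthquake 2
earthquake 3
blast 1
blast 1
;
run;

proc freq data=quakes;
  tables eventtype quakesize;   /* two separate one-way tables */
  tables eventtype*quakesize;   /* one two-way table: rows*columns */
run;
```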

Note that something funny happened with the names of the magnitude categories. The 6+ category got truncated to length 1 because the first value assigned to the category variable had length 1. This can be fixed by a LENGTH statement in the original data step, placed before quakesize is first used. (Usually LENGTH statements occur before INPUT.)

Putting

length quakesize $2;

in the data step fixes the problem.
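
A sketch of where the LENGTH statement goes. The categorization rule and the magnitudes below are hypothetical; only the truncation behavior and the fix come from the discussion above.

```sas
data quakes;
  length quakesize $2;   /* declared before the first assignment to quakesize */
  input mag @@;
  /* Without the LENGTH statement, the first assignment ('0') would fix the
     length at 1 and '6+' would be stored as '6' */
  if mag < 1 then quakesize = '0';
  else if mag < 6 then quakesize = put(floor(mag), 1.);
  else quakesize = '6+';
  datalines;
0.8 2.4 4.9 6.3
;
run;

proc freq data=quakes;
  tables quakesize;
run;
```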

Example with earthquake data

The FORMCHAR option allows you to adjust the characters used for vertical lines, horizontal lines, and intersection points in the output table. Many journals accept horizontal lines but not vertical separators between columns. Unfortunately, this has no effect in SAS Studio.

If you use LaTeX, you can make your life easier...

How it looks in SAS Studio.

Example with earthquake data: NLEVELS option

The NLEVELS option is useful for debugging. In this case, no earthquakes were recorded below magnitude 1.0, which is why my category of “0” is a missing level.

For eventtype, PROC FREQ told us quickly from the output that there were three levels (earthquake, quarry blast, and out of network of interest), but if there are more categories, it might be difficult to catch this by eye.

For example, you might have data that were recorded at 30 field stations, or 3000 counties across the US, and you need to make sure that data from each one is in your analysis...
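
A sketch of using NLEVELS just to count levels. The NOPRINT option on the TABLES statement suppresses the frequency tables themselves, so only the table of level counts is shown. The dataset and values are made up.

```sas
/* Toy data with made-up values; names are assumptions */
data quakes;
  input eventtype :$12. quakesize $;
  datalines;
earthquake 2
earthquake 3
blast 1
blast 1
;
run;

proc freq data=quakes nlevels;
  tables eventtype quakesize / noprint;   /* report only the number of levels */
run;
```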

Example with earthquake data: TABLES options

You can reduce the output with some of the options for the TABLES statement.
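
A sketch of some TABLES options that trim the output; which ones the slides used is not recoverable from the text. NOROW, NOCOL, and NOPERCENT drop the row, column, and overall percentages from each cell, and LIST prints the crosstabulation as a compact list. The dataset and values are made up.

```sas
/* Toy data with made-up values; names are assumptions */
data quakes;
  input eventtype :$12. quakesize $;
  datalines;
earthquake 2
earthquake 3
blast 1
blast 1
;
run;

proc freq data=quakes;
  tables eventtype*quakesize / norow nocol nopercent;  /* counts only */
  tables eventtype*quakesize / list;                   /* list format instead of a grid */
run;
```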

A three-way table is presented as a sequence of two-way tables. This was generated from tables eventtype*quakesize*depthsize, which gives one quakesize by depthsize table for each level of eventtype.

Example: birthmonth distribution

Suppose you wanted to do a chi-square test using PROC FREQ on the following data counting the number of babies born by birth month and sex.

How can you enter the data to be read in by PROC FREQ? There are two possibilities. One is to have one row for each birth, like this:

sex month
F June
M May
M April
F April
F June
...

This will have 88273 rows, the number of observations. A second way is to have weights, or counts, for each combination of the categorical variables:

sex month count
F January 3537
F February 3407
...
F December 3371
M January 3743
...
M December 3761

Both approaches are legitimate, and which one is more convenient might depend on how you initially received the data.

Using the WEIGHT statement in PROC FREQ

ORDER=DATA preserves the order of the values encountered in the data instead of alphabetizing.
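
A sketch of the WEIGHT statement using only the four counts quoted earlier (January and December for each sex); the full analysis would list all twelve months. The dataset name births is an assumption.

```sas
/* Counts for two of the twelve months, as quoted above */
data births;
  input sex $ month :$9. count;
  datalines;
F January 3537
F December 3371
M January 3743
M December 3761
;
run;

proc freq data=births order=data;
  weight count;                /* each row stands for count births */
  tables sex*month / chisq;    /* chi-square and likelihood-ratio chi-square */
run;
```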

$\chi^2$ versus Likelihood-ratio $\chi^2$

The $\chi^2$ statistic is computed using the usual formula

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

where the sum is over all cells in the table. The expected count for a cell is its row total times its column total divided by the overall sample size.

The Likelihood-ratio $\chi^2$ is also called $G^2$, and it also has an asymptotically $\chi^2$ distribution. Its formula is

$$G^2 = 2\sum_i O_i \log(O_i/E_i)$$

using natural logs. The values are often very similar for the two test statistics.

Other tests in PROC FREQ

In addition to $\chi^2$ and $G^2$ tests, PROC FREQ has options for Fisher’s exact test, odds ratios, and the Cochran-Armitage test for trend (for ordinal data).

The test for trend is useful for 2xK contingency tables, where the association between the two-valued variable and the K-valued variable is thought to be changing over time. In the case of the birth data, this might mean that the ratio of male-to-female births (which should be constant, but not necessarily 50-50, if there is no association) could be changing linearly throughout the year. Perhaps the discrepancy is highest in the spring and smallest in the fall.
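
A sketch of requesting the Cochran-Armitage trend test with the TREND option of the TABLES statement. Only the four counts quoted earlier are used, so this 2x2 toy table shows the syntax but not a meaningful trend; the real analysis would include all twelve months in the intended order.

```sas
/* Counts for two of the twelve months, as quoted above */
data births;
  input sex $ month :$9. count;
  datalines;
F January 3537
F December 3371
M January 3743
M December 3761
;
run;

proc freq data=births order=data;
  weight count;
  tables sex*month / trend;   /* Cochran-Armitage test for trend */
run;
```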

PROC FREQ: test of trends

Note that the statistic is sensitive to the order of the categories (as it should be), and we get a different result looking for a trend from Jan-Dec compared to Apr-Mar.

ORDER= option for PROC FREQ

In addition to ORDER=DATA you can use ORDER=FREQ so that the table is presented in decreasing order of frequency. We did this in Week 2 with the Romeo and Juliet data. You can also use ORDER=FORMATTED to order values by their formatted labels rather than by their values in the original data (this wouldn’t make a difference for our example).


Alternative to the WEIGHT statement in PROC FREQ


Skinny versus wide data

Often there's a choice of how to organize your data: a skinny format with lots of observations and fewer variables versus a wide format with more variables and fewer observations. Using the weight column makes the data more compact, but sometimes data is more easily organized in the skinny representation. Different SAS procedures might prefer one representation over the other for input, so sometimes you have to convert from one to the other.
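One way to go from the compact form (one row per cell plus a count) to one row per observation is a DO loop with OUTPUT in a data step. This is only a sketch: the data set names births_counts and births_long and the counts are made up for illustration.

```sas
/* Compact form: one row per cell, with a count (made-up numbers) */
data births_counts;
  input sex $ month $ count;
  datalines;
f jan 2
m jan 3
f feb 1
;
run;

/* One row per birth: write each row COUNT times */
data births_long;
  set births_counts;
  do i = 1 to count;
    output;
  end;
  drop i count;
run;

proc freq data=births_long;
  tables sex*month;   /* no WEIGHT statement needed here */
run;
```

Both forms should give the same table; the compact form with a WEIGHT statement simply stores it in fewer rows.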


Another place this comes up is with repeated measures data. Suppose you have patients who are measured at 3 time points. Your data could look like this:

patientid time1 time2 time3
0001      147   145   142
0002      135   125   125
0003      162   155   156

versus

patientid time bp
0001      1    147
0001      2    145
0001      3    142
0002      1    135
0002      2    125
0002      3    125
0003      1    162
0003      2    155
0003      3    156
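A sketch of converting the first layout into the second with an array in a data step. The data set names bp_wide and bp_long and the array name bps are hypothetical; the values are the ones shown above.

```sas
data bp_wide;
  input patientid $ time1 time2 time3;
  datalines;
0001 147 145 142
0002 135 125 125
0003 162 155 156
;
run;

data bp_long;
  set bp_wide;
  array bps{3} time1-time3;   /* the three measurements for one patient */
  do time = 1 to 3;
    bp = bps{time};           /* pick out the measurement at this time point */
    output;                   /* write one row per patient per time point */
  end;
  keep patientid time bp;
run;

proc print data=bp_long noobs;
run;
```

PROC TRANSPOSE with a BY statement is another common route for this kind of reshaping; the data step version is shown only because it makes each step explicit.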


Controlling category labels with PROC FORMAT

Instead of having "f" and "m" appear in our tables (which is how I coded the data), I might want to have "female" and "male". Similarly, I might want "January" to appear instead of "jan". This could be achieved in a data step using

if month = "jan" then month2 = "January";

This creates extra variables, and if your data set has 88000 observations, it uses a lot of extra memory or makes your program slower. Also, you might decide that sometimes you want to display "jan", sometimes "January", and sometimes just "J" for space reasons (like cramming those words onto the x-axis of a time series). This can be handled using PROC FORMAT.
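A minimal sketch of the PROC FORMAT approach, assuming sex is coded "f"/"m" and month is coded "jan", "feb", and so on. The format names $sexfmt, $monlong, and $monshort, the data set births_demo, and the counts are all made up for illustration; only two months are listed to keep the example short.

```sas
proc format;
  value $sexfmt   "f"   = "female"
                  "m"   = "male";
  value $monlong  "jan" = "January"
                  "feb" = "February";   /* remaining months would be listed the same way */
  value $monshort "jan" = "J"
                  "feb" = "F";
run;

/* Made-up counts, for illustration only */
data births_demo;
  input sex $ month $ count;
  datalines;
f jan 48
m jan 52
f feb 47
m feb 53
;
run;

proc freq data=births_demo order=data;
  weight count;
  format sex $sexfmt. month $monlong.;   /* swap in $monshort. for a compact display */
  tables sex*month;
run;
```

The stored values stay "f", "m", "jan", and so on; only the displayed labels change, so no extra variables are created.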


PROC FORMAT


PROC FORMAT: grouping variables

Another common use for PROC FORMAT is to group the values of a variable. Suppose we want to classify births as Winter, Spring, Summer, and Fall. This could be done by creating a variable season in the data step and using IF statements. Another approach is to use a format that maps several values to the same label.
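A sketch of grouping months into seasons with a format. The assignment of months to seasons (December through February as Winter, and so on), the format name $season, the data set births_demo, and the counts are assumptions made for illustration.

```sas
proc format;
  value $season "dec", "jan", "feb" = "Winter"
                "mar", "apr", "may" = "Spring"
                "jun", "jul", "aug" = "Summer"
                "sep", "oct", "nov" = "Fall";
run;

/* Made-up counts, for illustration only */
data births_demo;
  input sex $ month $ count;
  datalines;
f jan 48
m jan 52
f feb 47
m feb 53
f apr 44
m apr 56
f jul 50
m jul 51
;
run;

proc freq data=births_demo;
  weight count;
  format month $season.;   /* PROC FREQ tabulates by the formatted value */
  tables sex*month;
run;
```

Because the table is built from formatted values, jan and feb fall into a single Winter column without any new variable being created.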


PROC FORMAT: numeric variables with ranges

You can use PROC FORMAT to group data into ranges instead of defining them in a data step. Suppose we want to define cities to be either small (<500,000), medium (500,000–1,000,000), or large (over 1,000,000).
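One way this could look, assuming populations are whole numbers so that the ranges can be written with integer endpoints. The format name citysize, the data set cities_demo, and the populations are hypothetical.

```sas
proc format;
  value citysize low - 499999     = "small"
                 500000 - 1000000 = "medium"
                 1000001 - high   = "large";
run;

/* Made-up cities, for illustration only */
data cities_demo;
  input city $ population;
  datalines;
A 250000
B 750000
C 1500000
;
run;

proc freq data=cities_demo;
  format population citysize.;   /* tabulate by size class, not by raw population */
  tables population;
run;
```

For a numeric format, LOW does not include missing values, so a missing population would not be labeled "small".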


The syntax for the previous example was appropriate for integer values. For floating point values with decimals, we need to use inequalities. The syntax is a little weird. Writing -< between the endpoints creates intervals that are half-open and closed on the left, such as [55, 60). To make half-open intervals closed on the right, such as (55, 60], use <- instead.

proc format;
  /* -< excludes the upper endpoint, so 50 -< 55 is the interval [50, 55) */
  value age low -< 50 = "less than 50"
            50 -< 55 = "50 to less than 55"
            55 -< 60 = "55 to less than 60"
            60 -< 65 = "60 to less than 65"
            65 -< 70 = "65 to less than 70"
            70 -< 80 = "70 to less than 80"
            80 - high = "80 and over"
            other = "missing";
run;
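For comparison, here is a sketch of the <- form, which excludes the lower endpoint and so gives intervals that are closed on the right. The format name agegrp and its cut points are made up for illustration.

```sas
proc format;
  value agegrp low - 50   = "50 or less"
               50 <- 55   = "over 50 up to 55"   /* the interval (50, 55] */
               55 <- high = "over 55";
run;

data _null_;
  do x = 50, 52.5, 55, 55.1;
    grp = put(x, agegrp.);   /* apply the format to a single value */
    put x= grp=;             /* written to the SAS log */
  end;
run;
```

Here the boundary value 55 is labeled "over 50 up to 55", whereas with -< it would have been assigned to the interval that starts at 55.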


Some Uses of Formats

The main use of formats is perhaps to make the output prettier, but here are some more statistically valuable uses:

1. Formats can be a way of collapsing categories in contingency tables (if cell counts are too low for χ² tests).

2. You can use them to deal with sloppy/inconsistent coding of the data. For example, if survey data has a mix of responses such as "Y", "y", "Yes", then you can format them to all be the same value. The same goes if sex is coded as "M", "m", "man", "male", etc. Another way to deal with this might be to read in just the first character and use the UPCASE function. However, if states are sometimes coded as "NM" and sometimes "New Mexico", the FORMAT approach might be handy.
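A sketch of the second use, mapping inconsistent codes to one label. The format names $yn and $stabbr, the data set survey_demo, and the responses are hypothetical. Character formats match values exactly (including case), so every variant has to be listed.

```sas
proc format;
  value $yn     "Y", "y", "Yes", "yes" = "Yes"
                "N", "n", "No", "no"   = "No";
  value $stabbr "NM", "New Mexico"     = "NM";
run;

/* Made-up survey responses, for illustration only */
data survey_demo;
  input answer $;
  datalines;
Y
yes
n
No
y
;
run;

proc freq data=survey_demo;
  format answer $yn.;   /* all variants are tabulated as Yes or No */
  tables answer;
run;
```

The same idea collapses categories in a contingency table: values that share a formatted label are counted together.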


PROC FORMAT and graphics

Formats are also a good way to make your plots more readable. You can also use a LABEL statement to change how a variable name appears in a plot.
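A small sketch combining a format and a LABEL statement in a plot. PROC SGPLOT is used here as an assumption about the plotting procedure; the data set births_demo, the format $sexfmt, the label text, and the counts are made up for illustration.

```sas
proc format;
  value $sexfmt "f" = "female"
                "m" = "male";
run;

/* Made-up counts, for illustration only */
data births_demo;
  input sex $ month $ count;
  datalines;
f jan 48
m jan 52
f feb 47
m feb 53
;
run;

proc sgplot data=births_demo;
  vbar month / response=count group=sex groupdisplay=cluster;
  format sex $sexfmt.;                 /* legend shows female/male instead of f/m */
  label month = "Month of birth"       /* axis and legend titles */
        count = "Number of births"
        sex   = "Sex of child";
run;
```

The FORMAT and LABEL statements here apply only to this procedure step; the data set itself is unchanged.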


More on LABEL statements and Formats

LABEL statements, like FORMAT statements, can be placed in individual procedures or in a data step, depending on what is most convenient. In a procedure they apply only to that step; in a data step they are stored with the data set.

A more advanced use of formats is to make permanent formats in a user-defined library. Instead of having a PROC FORMAT that you use repeatedly (such as converting states to their two-letter abbreviations), you can have SAS search your format library using

options fmtsearch=(myfmts);

so that you can reuse the same format for multiple programs. To me, this makes your program less portable and less self-contained (it might make it hard to switch between different computers), but if you are reliably at the same computer all the time, this might save a lot of time programming and make your code shorter.
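A sketch of how a permanent format library might be set up. The directory path is a placeholder that must be replaced with a folder that exists on your machine, and the format name $stabbr and its two entries are made up for illustration; the libref myfmts matches the one in the OPTIONS statement above.

```sas
libname myfmts "/path/to/my/formats";   /* hypothetical directory; must already exist */

/* Run once: store the format in the catalog MYFMTS.FORMATS */
proc format library=myfmts;
  value $stabbr "New Mexico" = "NM"
                "Colorado"   = "CO";
run;

/* In later programs: assign the libref again, then tell SAS where to look */
options fmtsearch=(myfmts);
```

After this, a statement such as format state $stabbr.; should work in any program that assigns the same libref, without repeating the PROC FORMAT step.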
