SAS PROGRAMMING ESSENTIALS FOR STATISTICAL
COMPUTING IN DAIRY RESEARCH
A.K. Sharma
NDRI, Karnal-132 001
1. Introduction
Statistical Computing is the interface between Statistics and Computer Science. It is the area of
computational science (or scientific computing) specific to the mathematical science of Statistics.
The rapid and sustained increase in computing power starting from the second half of the 20th
century has made a substantial impact on the practice of Statistical Science. Early statistical
models were almost always from the class of linear models, but powerful computers, coupled
with suitable numerical algorithms, caused an increased interest in nonlinear models (such as
connectionist models) as well as the creation of new types, such as generalised linear models and
multivariate and multilevel models. Increased computing power has also led to the growing
popularity of computationally intensive methods. The computer revolution has implications for
the future of Statistics, with new emphasis on “Experimental” and “Empirical” Statistics. A large
number of general- and special-purpose statistical software packages are now available.
SAS (pronounced “sass”, originally Statistical Analysis System) is such an integrated system of
software products provided by SAS Institute Inc., Cary, North Carolina, USA that enables
Scientists to perform various tasks, viz.,
Data Entry, Retrieval, Management and Mining
Report Writing and Graphics
Statistical Analysis
Forecasting and Decision Support
Operations Research and Project Management
Quality Improvement
Data Warehousing
Platform Independent and Remote Computing
Grid and Cloud Computing, etc.
SAS was conceived by Anthony J. Barr in 1966. As a North Carolina State University graduate
student during 1962-64, Barr had created an Analysis of Variance Modelling Language inspired
by the notation of statistician Sir Maurice G. Kendall, followed by a Multiple Regression
Programme that generated machine code for performing algebraic transformations of the raw
data. Drawing on those programmes and his experience with structured data files, he created
SAS, placing statistical procedures into a formatted file framework. From 1966 to 1968, Barr
developed the fundamental structure and language of SAS. By 1971, SAS was gaining popularity
within the academic community. A strength of the system was its ability to analyse experiments with missing
data, which was especially useful to the agricultural and pharmaceutical industries, among others.
In 1973, extensive programming for econometrics, time series, and matrix algebra was added. In
1976, SAS Institute, Inc. was incorporated by Barr, et al.
Base SAS – An Overview
Base SAS is the core foundation for a variety of data management and analytic software
components under SAS. It provides essential tools for the basic data-driven tasks that
scientists perform, i.e., data access, management, analysis and presentation (Fig.1).
Fig.1: Base SAS.
The SAS programming language, ready-to-use procedures and the windowing interface can be
used to access research data from many sources, manage, analyse and present the research data
as meaningful information in the form of a report that can be delivered to any platform, in any
format as per need. The reports thus generated can be published on the Web for wider
dissemination among various stakeholders.
Accessing data
As stated earlier, you can access data using SAS regardless of the source or platform on
which the data reside. The data may be in a file on your system, on a remote server, or in another
database system, and may be in almost any format, including raw data, SAS data sets, and files
created by other vendors’ software. In particular, you can:
Read raw data in any format from any kind of file, including variable-length records, binary files, free-formatted data, and even files with missing data
Access some other vendors’ files directly, including BMDP, SPSS, and OSIRIS files
Use the SAS/ACCESS tool to access other external data as if it were native to SAS, e.g., data stored in a Microsoft Excel spreadsheet, a Microsoft Access table, a dBase file, ORACLE or another DBMS.
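As a minimal sketch of reading raw data, the program below first writes three records (taken from Example 1 later in this chapter) to a temporary file and then reads them back with INFILE and INPUT, so that it can run without any external file. The fileref rawlact and the data set name work.lact_raw are hypothetical names chosen only for this illustration.

```sas
/* Hypothetical fileref pointing to a temporary file managed by SAS */
FILENAME rawlact TEMP;

/* Write a few raw records so that the example is self-contained */
DATA _NULL_;
   FILE rawlact;
   PUT 'Haryana 1 Hariana 18';
   PUT 'Haryana 2 Hariana 19';
   PUT 'Panjab 1 Sahiwal 26';
RUN;

/* Read the raw records into a SAS data set using list input */
DATA work.lact_raw;
   INFILE rawlact;
   INPUT state $ lact $ breed $ my;
RUN;

PROC PRINT DATA=work.lact_raw;
RUN;

/* Release the fileref */
FILENAME rawlact CLEAR;
```

In practice the FILENAME statement (or the INFILE statement itself) would point to an existing raw data file instead of a temporary one.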
Managing data
After having accessed the data, you can manipulate it using the SAS programming
language, e.g., format the data, create variables (columns), use operators to evaluate data values,
use functions to create and recode data values, subset data, perform conditional processing,
merge a wide range of data sources, and create, retrieve, and update database information.
[Fig.1 labels: Data Access and Management; Application Development; Reporting and Graphics; Analytics; Base SAS]
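Several of these data management tasks can be sketched in a single DATA step. The example below is only an illustration: the data set names work.lact_small and work.lact_managed, the new variables state_uc and yield_class, and the cut-off values 19 and 25 are all assumptions made for the sketch, and the four records are a small excerpt of the Example 1 data.

```sas
/* A small excerpt of the lactation data, entered in-stream */
DATA work.lact_small;
   INPUT state $ lact breed $ my;
   DATALINES;
Haryana 1 Hariana 18
Haryana 2 Hariana 19
Panjab 1 Hariana 25
Panjab 2 Hariana 36
;

DATA work.lact_managed;
   SET work.lact_small;
   WHERE my >= 19;                      /* subset the data (hypothetical cut-off) */
   state_uc = UPCASE(state);            /* use a function to create a new variable */
   LENGTH yield_class $ 6;
   IF my >= 25 THEN yield_class = 'High';   /* conditional processing */
   ELSE yield_class = 'Medium';
   FORMAT my 6.1;                       /* format the data */
RUN;

PROC PRINT DATA=work.lact_managed;
RUN;
```

Merging and updating data sets are not shown here; they use the MERGE and UPDATE statements in a DATA step.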
Analysing data and presenting information
Once the data is in shape, you can use SAS to analyse the data and produce reports. The resultant
SAS output can range from a simple listing of a data set to customised reports of complex
relationships. Base SAS provides powerful data analysis tools to produce tables, frequency
counts, and cross-tabulation tables; create a variety of charts and plots; compute a variety of
descriptive statistics, including the mean, sum, variance, standard deviation and more; compute
correlations and other measures of association, as well as multi-way cross-tabulations and
inferential statistics, etc.
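A brief sketch of these analysis tools follows. It uses the Sashelp.Class sample data set that ships with SAS, so it does not depend on any file created in this chapter; the choice of variables is only illustrative.

```sas
/* Descriptive statistics: count, mean, sum, variance and standard deviation */
PROC MEANS DATA=sashelp.class N MEAN SUM VAR STD;
   VAR height weight;
RUN;

/* Frequency counts and a two-way cross-tabulation */
PROC FREQ DATA=sashelp.class;
   TABLES sex sex*age;
RUN;

/* Correlation between two numeric variables */
PROC CORR DATA=sashelp.class;
   VAR height weight;
RUN;
```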
For reporting and displaying analytical results, SAS provides several visually appealing output
formats, such as an array of mark-up languages including HTML4 and XML; output that is
formatted for a high-resolution printer, PostScript, PDF, and PCL files; RTF; interactive
coloured graphs using ActiveX controls or Java applets. Finally, these reports can be delivered to
a wide variety of locations and platforms in order to suit your needs.
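These output formats are typically requested through Output Delivery System (ODS) statements. The sketch below sends the same listing to a PDF file and an RTF file; the file names class_report.pdf and class_report.rtf are hypothetical, and the files are written to the current working directory of the SAS session unless a full path is given.

```sas
/* Open two ODS destinations (hypothetical file names) */
ODS PDF FILE='class_report.pdf';
ODS RTF FILE='class_report.rtf';

PROC PRINT DATA=sashelp.class;
RUN;

/* Close the destinations so that the files are completely written */
ODS RTF CLOSE;
ODS PDF CLOSE;
```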
SAS Workspace
SAS is designed to be easy to use. It provides windows for accomplishing all the basic SAS tasks
you need to do. Once you get familiar with the starting points for your SAS tasks, you are ready
to accomplish any task that SAS can do. SAS workspace is organised in the following five main
windows:
The Explorer Window
In the Explorer Window, you can:
View and manage SAS files and create shortcuts to files that are not formatted by SAS
Create new SAS libraries and SAS files
Open any SAS file
Perform most file management tasks such as moving, copying, and deleting files.
Program Editor or Editor Window
You can use either of these windows to enter, edit, and submit SAS programs. The Editor
Window provides several useful editing features such as Colour Coding and Syntax Checking of
SAS language; Expandable and Collapsible sections; Multilevel Undo and Redo, etc. The initial
Editor Window title is Editor - Untitled. When you open a file or save the contents of the Editor
Window to a file, the window title changes to reflect that file name. When the contents of the
Editor Window are modified, an asterisk is added to the title. You can have multiple Editor
Windows open at the same time.
Log Window
The Log Window displays messages about your SAS session and any SAS programs that you
submit.
Output Window
The Output Window displays the output from SAS programs that you submit. It automatically
opens as soon as an output is created. In the MS-Windows operating environment, the Output
Window is positioned behind the Log and Editor Windows until there is output to display. You
can navigate between windows using the taskbar. Some SAS programs do not create any
output in the Output Window: some open interactive windows, while others
only produce messages in the Log Window. If you create HTML output, you can view it in the
Results Viewer Window, which is the internal browser for SAS.
Results Window
The Results Window helps you navigate and manage output from SAS programs that you
submit. You can view, save and print individual items of the output. Recall that the Results
Viewer Window is the place where you actually view HTML output. The Results Window is
empty until you submit a SAS program that creates output. Then it opens automatically. The
Results Window is positioned in front of the Explorer Window when SAS creates output. You
can move between the two windows by using the tabs at the bottom of the windows.
SAS data sets
Before you start working with data in SAS, the data must be in a special form called a SAS
data set. Understanding SAS data sets is the first step towards learning SAS
programming. Conceptually, a SAS data set (also called a table) is a file containing descriptor
information and related data values. The file is organised as a table of observations (rows, also
called cases) and variables (columns) that SAS can process. Some SAS data sets also contain an
index, which enables SAS to locate records in the data set quickly. In some special cases, such as using
SAS/ACCESS to read database management system files directly, the SAS data set may contain
only the logic for accessing the data, not the data itself. For this presentation, however, we
assume that SAS data sets contain data.
To work with SAS data sets, it is also important to understand how they are stored. All SAS files
are stored in a SAS library, which is a collection of files such as SAS data sets and catalogues.
In the MS-Windows operating environment, a SAS library is typically a group of SAS files in the
same folder. To access a library, assign it a name (also known as a libref, or library reference).
Library names are nicknames or shortcuts that you can use to identify libraries during a SAS
session. In this presentation, you will learn to assign SAS libraries and then to work with SAS
data sets in a library.
SAS automatically assigns three libraries, viz., Sashelp, Sasuser and Work, each time
you start SAS. To see them, double-click Libraries in the Explorer Window.
Sashelp is a permanent, read-only library that contains sample data and other files that control
how SAS works at your site. Sasuser is also a permanent library; it contains SAS files in the
Profile Catalogue that store your personal settings, and it is also a convenient place to store
your own files. Work is a temporary library for files that do not need to be saved from session
to session. Sashelp, Sasuser and the user-assigned libraries are permanent libraries, i.e., their
files are saved in the operating environment, whereas files in the Work library are not saved
once you end the SAS session.
Double-click the Sashelp library. Scroll the Explorer Window and notice that there are several
types of files, or members, in the library.
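The same information can also be obtained with program statements instead of the Explorer Window. The following is a minimal sketch: the first statement writes the attributes of all currently assigned libraries to the Log Window, and the PROC DATASETS step lists the members of the Sashelp library along with their member types.

```sas
/* List all currently assigned librefs and their physical locations in the log */
LIBNAME _ALL_ LIST;

/* List the members (data sets, catalogues, etc.) of the Sashelp library */
PROC DATASETS LIBRARY=sashelp;
QUIT;
```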
Assign a library
When you define a library, you indicate the location of your SAS files to SAS. Once you define a
library, you can manage SAS files within it. In this presentation, you use the New Library
Window to assign a library. On the toolbar, click the New Library tool; the New Library
Window opens. In the Name box, type Aksas (or any name you like). Library names are limited
to 8 characters, which must start with a letter or underscore. The library names can contain only
letters, numerals, or underscores. Select the Enable at Startup check box; this library will be
automatically assigned each time you start a SAS session. Click Browse; select the default
location or select another location in your operating environment. Any files that you save to the
Aksas library will be saved in the folder that you designate in the Path box; click OK. You can
delete SAS libraries. Note that when you delete a SAS library, SAS no longer has access to the
directory. However, the contents of the library still exist in your operating environment.
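A library can also be assigned with the LIBNAME statement rather than the New Library Window. The sketch below assumes that the folder d:\aksas (the location used in the programs later in this chapter) already exists. Note that a libref assigned this way lasts only for the current SAS session, unlike a library created with the Enable at Startup check box.

```sas
/* Assign the libref Aksas to an existing folder */
LIBNAME aksas 'd:\aksas';

/* Write the attributes of the library to the log */
LIBNAME aksas LIST;

/* Deassign the libref; the files in the folder are not deleted */
LIBNAME aksas CLEAR;
```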
Import different format data files as SAS data sets and rename the data set files
Example 1
Consider the following MS-Excel worksheet (lact.xlsx) containing data pertaining to milk
production (state wise, up to first four lactations) for different breeds of dairy cattle. The same
worksheet file can be saved as ASCII text file (say, lact.txt) by selecting appropriate file type at
the time of saving the worksheet.
STATE    LACT  BREED       MY
Haryana  1     Hariana     18
Haryana  2     Hariana     19
Haryana  3     Hariana     21
Haryana  4     Hariana     20
Haryana  1     Tharparkar  23
Haryana  2     Tharparkar  25
Haryana  3     Tharparkar  22
Haryana  4     Tharparkar  21
Haryana  1     Sahiwal     22
Haryana  2     Sahiwal     24
Haryana  3     Sahiwal     23
Haryana  4     Sahiwal     23
Haryana  1     Red Sindhi  27
Haryana  2     Red Sindhi  26
Haryana  3     Red Sindhi  25
Haryana  4     Red Sindhi  25
Panjab   1     Hariana     25
Panjab   2     Hariana     36
Panjab   3     Hariana     35
Panjab   4     Hariana     37
Panjab   1     Tharparkar  30
Panjab   2     Tharparkar  32
Panjab   3     Tharparkar  29
Panjab   4     Tharparkar  30
Panjab   1     Sahiwal     26
Panjab   2     Sahiwal     29
Panjab   3     Sahiwal     30
Panjab   4     Sahiwal     38
Panjab   1     Red Sindhi  29
Panjab   2     Red Sindhi  25
Panjab   3     Red Sindhi  27
Panjab   4     Red Sindhi  28
Let us see how to add and rename a SAS data set. Now that you have a library, you can add a
SAS data set to the library. With the Explorer Window active, select View ► Show Tree; the
libraries are displayed in the left pane of the window. Click the Aksas library. Import lact.xlsx
(or lact.txt) file to SAS using File ► Import Data... option. Click Aksas; notice that lact file
has been imported there. Right-click lact and select Rename. Type lactation for the new name
and click OK. SAS data set names must be 1 to 32 characters in length; begin with a letter (A-Z,
including mixed case characters) or an underscore ( _ ); and continue with any combination of
numbers, letters, or underscores.
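The import and rename steps above can also be written as a program. The following is a sketch under stated assumptions: the workbook is assumed to be stored as d:\aksas\lact.xlsx (a hypothetical location), and reading .xlsx files with DBMS=XLSX generally requires the SAS/ACCESS Interface to PC Files to be licensed at your site.

```sas
LIBNAME aksas 'd:\aksas';

/* Import the worksheet as the SAS data set Aksas.lact */
PROC IMPORT DATAFILE='d:\aksas\lact.xlsx'   /* hypothetical path to the workbook */
            OUT=aksas.lact
            DBMS=XLSX
            REPLACE;
   GETNAMES=YES;                            /* first row holds the column names */
RUN;

/* Rename the data set from lact to lactation */
PROC DATASETS LIBRARY=aksas NOLIST;
   CHANGE lact=lactation;
QUIT;
```

The CHANGE statement fails if a data set named lactation already exists in the library, so delete or rename any earlier copy first.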
Open a SAS data set
Now that you’ve created a data set, view the data that it contains. (There are many other ways to
get your data into a SAS data set. In general, you can enter data directly into a SAS data set using
the VIEWTABLE Window; read raw data into a SAS data set using the Import wizard or SAS
programming statements; read and modify existing data; convert other vendors’ data files into
SAS data sets using SAS/ACCESS; or read other vendors’ data directly using SAS/ACCESS.) In the
Explorer Window, double-click the lactation table in the Aksas library; the table opens in the
VIEWTABLE Window. Scroll the VIEWTABLE Window and notice that there are 32 rows
(also called observations or cases; the terms are used here synonymously) and 4 columns (also
called variables).
View general data properties
The descriptor portion of a SAS data set contains information about the data set, including the
name of the data set; the date and time the data set was created; the number of observations; the
number of variables; you can see this information by viewing the general properties of a data set.
In the Explorer Window, right-click the Lactation table and select Properties. In the General
tab, view the data set’s properties. (Don’t close this window yet as you’ll need it for the next step
too).
View column attributes
Besides general information about the data set, the descriptor portion also contains attribute
information for each variable in the data set including the variable’s name, type, length, format,
informat (input format), and label. In the Aksas.lactation Properties Window, click the
Columns tab. In the Column Name column, notice that all the variables for the data set are
listed along with a symbol that indicates the variable’s type.
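The descriptor portion can also be displayed with the CONTENTS procedure. This sketch assumes that the Aksas library and the lactation data set have been created as described above.

```sas
LIBNAME aksas 'd:\aksas';

/* Print the general properties and the attributes of each variable;      */
/* VARNUM lists the variables in their order of creation, not by name.    */
PROC CONTENTS DATA=aksas.lactation VARNUM;
RUN;
```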
Variables and Labels
Variable (column) names must be 1 to 32 characters in length; begin with a letter (A-Z, including
mixed case characters) or an underscore ( _ ); continue with any combination of numbers, letters,
or underscores. SAS is case insensitive for variable names; however, SAS remembers the first
occurrence of the variable and writes it that way in output. Now, look at the Label column, a
label is descriptive text up to 256 characters; labels are used instead of variable names in some
reports and for the column headings in the VIEWTABLE Window.
Types of Variables
Further, look at the Type column. There are only two types of variables in SAS, viz., character
and numeric. Character variables are listed as Text in the Type column, and numeric variables are
listed as Number. The Length attribute is related to the variable’s type.
Character variables can contain any values; use a blank to represent missing values; and can be up to
32,767 bytes long. Numeric variables can contain only numeric values (the digits 0 through 9, +, -, ., and
E for scientific notation); use a single period (.) to represent missing values; and have a default length
of 8, i.e., numeric values (no matter how many digits they contain) are stored as floating point
numbers in 8 bytes of storage, unless you specify another length.
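The two variable types and their missing values can be illustrated with a short sketch. The data set name work.types_demo and the variable note are hypothetical; the second record deliberately has a missing milk yield, entered as a period.

```sas
DATA work.types_demo;
   LENGTH breed $ 12;                 /* character variable, 12 bytes long */
   INPUT breed $ my;                  /* my is numeric (default length 8)  */
   /* note stays blank (missing character value) unless my is missing */
   IF MISSING(my) THEN note = 'MY missing';
   DATALINES;
Tharparkar 23
Sahiwal .
;

/* Show the type and length of each variable */
PROC CONTENTS DATA=work.types_demo;
RUN;

/* The missing numeric value prints as a period, the missing character value as a blank */
PROC PRINT DATA=work.types_demo;
RUN;
```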
Change a variable’s format
Formats are variable attributes that affect the way data values are written. SAS offers a variety of
formats for numeric and character data. In this session you change the format of a variable. In the
Aksas.lactation Properties Window, look at the Format column; the MY variable uses the SAS
default format, which displays the value 18.00 as 18 in a report. In general, a SAS format has a
name; a w value, which specifies the width that is used for displaying the value; and a period
following the w value. Numeric SAS formats, such as the Fw.d format, can also specify a d value,
which is the number of decimal places to be displayed. Open the lactation table if it is not
already open; right-click the MY column heading and select Column Attributes from the
shortcut menu (Fig.2); in the Column Attributes Window, click the selector next to the Format
box.
Fig.2: Viewing Column Attributes of Aksas.lactation data set.
In the Format Window, view the available formats and their descriptions. When you select a
format in this list, an example is displayed in the Format Details area. Let’s say that you want to
incorporate the decimal places for the values of the MY variable. Change the value in the
Decimal box to 12.2; click OK. In the Column Attributes Window, click Apply; then click
Close. View the lactation table and notice that the values in the MY column are now
displayed with two decimal places.
Fig.3: Change a variable’s format.
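A format can also be changed with program statements. The sketch below assumes that Aksas.lactation exists; it first stores the 12.2 format permanently in the descriptor portion of the data set, and then shows that a FORMAT statement inside a PROC step applies only to that step (the 8.1 format is an arbitrary choice for illustration).

```sas
LIBNAME aksas 'd:\aksas';

/* Permanently associate the 12.2 format with the MY variable */
PROC DATASETS LIBRARY=aksas NOLIST;
   MODIFY lactation;
   FORMAT my 12.2;
QUIT;

/* Temporarily override the stored format for this report only */
PROC PRINT DATA=aksas.lactation;
   FORMAT my 8.1;
RUN;
```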
2. SAS programming essentials
The SAS programming language is both powerful and flexible. You can program any number of
analyses and reports with it. The SAS language contains statements, expressions, functions and
CALL routines, options, formats, and informats, i.e., elements that many programming
languages share. SAS can also simplify programming for you with its library of built-in
programs known as SAS procedures.
Create and run SAS programs
SAS procedures use data values from SAS data sets to produce pre-programmed reports,
requiring minimal effort from users. Base SAS procedures enable you to manipulate data; store
and retrieve information; perform statistical analysis; create reports. This session shows you the
main characteristics of SAS programs, i.e., components of SAS programs. Let’s begin by looking
at a simple SAS program. The sample SAS program below (Fig.4) contains two steps, a DATA
step and a PROC step. These two types of steps, alone or combined, form all SAS programs:
Fig.4: A sample SAS program showing the DATA and PROC steps.
Components of a SAS program
A DATA step typically creates or modifies a SAS data set. It can also be used to produce customised
reports. For example, you can use a DATA step to:
put your data into a SAS data set
compute the values for new variables
check for and correct errors in your data
produce new SAS data sets by subsetting, merging, and updating existing data sets, etc.
Note that, in the sample program (Fig.4), the DATA step creates a new SAS data set,
lact_pp.sas7bdat in the Aksas library using the existing SAS data set lactation.sas7bdat in the
same library. Note that we do not directly use the original data set (lactation, in this case) rather
a new data file (lact_pp, in this case) is created from the original data set, which is actually
utilised by SAS for analysis of the underlying data.
PROC (procedure) steps typically analyse and process data presented in the form of a SAS data
set; and they sometimes create SAS data sets that contain the results of the procedure. PROC
steps control a library of pre-written routines (procedures) that perform tasks on SAS data sets,
such as listing, sorting, and summarising data, e.g., you can use PROC steps to print a report;
produce descriptive statistics; create a tabular report; produce plots and charts, etc. The sample
program (Fig.4) uses the PROC PRINT procedure, which prints the data in a data set. The
DATA= option tells SAS what data to use for the procedure.
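Since Fig.4 is an image, the following is only a plausible reconstruction of the sample program based on the description above (a DATA step that creates lact_pp from lactation in the Aksas library, followed by a PROC PRINT step); the actual program in the figure may contain additional statements.

```sas
LIBNAME aksas 'd:\aksas';

/* DATA step: create Aksas.lact_pp from the existing Aksas.lactation */
DATA aksas.lact_pp;
   SET aksas.lactation;
RUN;

/* PROC step: list the new data set; DATA= names the data set to print */
PROC PRINT DATA=aksas.lact_pp;
RUN;
```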
Characteristics of SAS Programs
Let’s have a glance at the individual statements in the sample program (Fig.4). SAS programs
consist of SAS statements. A SAS statement has two important characteristics, i.e., it
usually begins with a SAS keyword, and it always ends with a semicolon. A DATA step begins
with the keyword DATA, and a PROC step begins with the keyword PROC. Generally, a step
ends with a RUN statement or when a new DATA or PROC step begins. A RUN statement tells
SAS to process all the preceding lines of the step. SAS statements are free-format, i.e., they can
begin and end anywhere on a line; one statement can continue over several lines; and several
statements can be on a single line. SAS statements are not case sensitive. Blanks or special
characters separate the ‘words’ in a SAS statement.
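The free-format and case-insensitive nature of SAS statements can be seen in the following sketch, in which the two PROC steps are equivalent. The Sashelp.Class sample data set is used only so that the code runs without any other set-up.

```sas
/* Several statements on one line, in lower case */
proc print data=sashelp.class; run;

/* The same step in upper case, with one statement spread over several lines */
PROC PRINT
   DATA = sashelp.class
   ;
RUN;
```

Although both forms run identically, one statement per line with consistent indentation is usually easier to read and debug.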
Composing a SAS program into the Editor Window
Let’s compose the sample SAS programs [i.e., Code 1(a) and Code 1(b)] given below. These
programs create a new SAS data set from an existing SAS data set and then print a listing of the
new data set. The Aksas.lactation data set was created earlier in the Aksas library. If you
didn’t create it, return to the tasks that define the library and create the data set before proceeding
here. Type the following codes, one at a time, into the Editor Window and save:
Code 1(a): One-level name.
LIBNAME aksas 'd:\aksas';
DATA lact_1a;
   SET aksas.lactation;
RUN;
PROC PRINT DATA=lact_1a;
RUN;
Code 1(b): Two-level name.
LIBNAME aksas 'd:\aksas';
DATA aksas.lact_1b;
   SET aksas.lactation;
RUN;
PROC PRINT DATA=aksas.lact_1b;
RUN;
Notice that when you type this text into the Editor Window, SAS keywords are colour-coded
and the program steps are separated into expandable blocks. The DATA step in Code 1(a)
creates a new SAS data set named lact_1a. Note that lact_1a is a one-level name. Since no
library name is specified for the lact_1a data set, SAS by default uses the Work library to
store this file temporarily. The lact_1a data set is created from the lactation data set that you
stored in your Aksas library. Notice that Aksas.lactation and Aksas.lact_1b are two-level
names [as shown in Code 1(a) and Code 1(b)]. The PROC PRINT step in Code 1(a) lists all the data in the
Work.lact_1a data set. When a two-level name is used, as in Code 1(b), SAS stores the
data set in the named user-defined library (Aksas in the present case) rather than in the Work
library.
Submit the program and view output
When you submit a SAS program, SAS compiles (checks the code for its grammatical
correctness in SAS perspective) and executes (if compilation process is successful) the code and
returns expected results (provided the code is logically correct) to the Output Window.
1. With the Editor Window active, select Run ►Submit. To submit only a portion of a
program in the Editor Window, highlight the portion you want to submit, right-click the
highlighted area, and select Submit ► Selection.
2. The Output Window comes to the front (Fig.5). Scroll the window and notice that the
PRINT procedure has printed all the data from the lactation data set. In addition, the
procedure has automatically formatted the data into rows and columns.
Fig.5: Results of the Code 1(a) SAS program being displayed in the OUTPUT Window.
View the log
Each time a step is executed, SAS generates a log of the processing activities and the results of
the processing. The SAS log collects messages about the processing of SAS programs and any
errors that may occur. Click the Log Window to activate it. The contents of the Log Window
produced by SAS for the Code 1(a) program are shown in Fig.6. Notice that there are separate
messages for the DATA and PROC steps. The messages from the DATA step tell you that SAS
read 32 observations from the Aksas.lactation data set and that there are 32 observations and 4
variables in the new data set Work.lact_1a. The messages from the PROC step tell you that 32
observations were read during the PRINT procedure. There are no error messages.
Fig.6: The contents of the Log Window produced by SAS while executing the Code 1(a)
program.
It’s important that you understand how SAS processes DATA steps (one observation at a time)
and PROC steps. You should also learn methods and tips for debugging SAS programs. For
details on these topics, refer to the SAS documentation.
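One simple way to watch the DATA step process one observation at a time, and a common debugging aid, is to write your own messages to the log with the PUTLOG statement. The sketch below reads the first three observations of the Sashelp.Class sample data set and creates no output data set; the message text is arbitrary.

```sas
/* _NULL_ means that no data set is created; the step only writes to the log */
DATA _NULL_;
   SET sashelp.class (OBS=3);
   /* _N_ counts the iterations of the DATA step; name= and age= print values */
   PUTLOG 'NOTE: processing observation ' _N_ name= age=;
RUN;
```

After submitting the program, the Log Window shows one line per observation, followed by the usual notes on the number of observations read.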
Use the Results Window
Once you have created output for Code 1(a), the Results Window opens. On Windows, the
Results Window covers the Explorer Window. You can switch between the two windows using
the tabs at the bottom of the window.
1. Look at the Results Window. The Results node at the top of the window contains a folder
for the output of the PROC PRINT procedure (Fig.7). The folder is labeled with the
procedure name and the title (default in present example) of the report.
2. Click the expansion icon next to the folder.
3. Double-click Data Set Work.lact_1a. The Output Window (Fig.5) scrolls to the top of
the output.
Fig.7: The contents of the Results Window produced by SAS while executing the Code 1(a)
program.
Note that each time you run a program that creates output, the new output is appended to the last
output in the Output Window. You can double-click the output item in the Results Window to
scroll the Output Window to the top of that particular output. If you clear the Output Window,
the links in the Results Window are deleted.
Create HTML output
You’ve seen that you can create SAS listing output in the Output Window, but you can also set
preferences to create HTML output (Fig.8). Your preferences are stored from session to session.
1. Select Tools ► Options ► Preferences.
2. Click the Results tab.
3. Select the Create HTML check box and click OK.
Fig.8: Setting preferences to create HTML output.
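HTML output can also be requested from within a program, independently of the Preferences setting, by using ODS statements. In this sketch the file name lactation_demo.htm is hypothetical, and the Sashelp.Class sample data set is used so that the code runs on its own.

```sas
/* Open the HTML destination and name the output file (hypothetical name) */
ODS HTML FILE='lactation_demo.htm';

PROC PRINT DATA=sashelp.class;
RUN;

/* Close the destination so that the HTML file is completely written */
ODS HTML CLOSE;
```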
Modify the program
Now that you have seen the output and formatting for the PRINT procedure, let’s try adding
options to the program to incorporate the following changes:
Remove the date from the top of each page, which SAS prints by default
Replace the default “The SAS System” title with a user-defined title
Use the column label rather than the column name in the output
Remove the Obs column.
We can do this by using a SAS system option, a TITLE statement, and options on the PROC
PRINT statement. To remove the date from the top of each page, use the NODATE system
option: insert the statement OPTIONS NODATE; at the top of the program. To
replace the default title, use the TITLE statement: insert the line TITLE "State, Lactation and
Breed wise Milk Yield Report"; above the RUN statement for the PROC PRINT procedure.
By including the TITLE statement in the PROC step, the title is applied to the output of the
procedure. To remove the Obs column and use column labels, add the LABEL and NOOBS
options to the PROC PRINT statement, which should then look like this:
PROC PRINT DATA=aksas.lact_1b LABEL NOOBS;
The revised program should look as shown below.
OPTIONS NODATE;
LIBNAME aksas 'd:\aksas';
DATA aksas.lact_1b;
   SET aksas.lactation;
RUN;
PROC PRINT DATA=aksas.lact_1b LABEL NOOBS;
   TITLE "State, Lactation and Breed wise Milk Yield Report";
RUN;
OPTIONS DATE;   /* assumed intent: restore the default page date */
TITLE;          /* clear the user-defined title */
Select Run ► Submit to run the above program; it produces the following revised output (compare it
with that given in Fig.5):
Fig.9: Revised output without system date as well as having user-defined title replacing the
default SAS title on top of the report.
With the Results Viewer Window active, select File ► Save As. In the File name box, type
e:\aksas\Lactation.htm. Click Save. Look at the Results Window. Notice that there is a new
folder for the results of the program you just ran. The folder name uses the State, Lactation and
Breed wise Milk Yield Report title that you specified in your program. Double-click the Print:
State, Lactation and Breed wise Milk Yield Report folder, then double-click Data Set
Aksas.lact_1b. There are links to both the listing output and the HTML output (Fig.10).
Fig.10: Revised Results with links to simple output and HTML formats.
Now you’ve learned the basic steps for creating, submitting (running), and saving SAS
programs.
Creating simple data sets and manipulating the data with SAS program
Earlier, we imported data from an existing data file in formats such as an MS-Excel
worksheet or ASCII text to build a SAS data set. The following example program
illustrates how to create a SAS data set directly through programming.
Example 2: Let us directly create a SAS data set for the data given in Example 1 above.
We use the following program (a revised version of Code 1(b) above). The data set is saved as
aksas.lactation_.sas7bdat. An explanation of the code is given at its end.
OPTIONS NODATE;
LIBNAME aksas 'd:\aksas';
DATA aksas.lactation_;
   /* Set character lengths explicitly: the default of 8 would truncate "Tharparkar" */
   LENGTH state $ 8 lact $ 1 breed $ 12;
   /* & lets breed contain a single embedded blank ("Red Sindhi");  */
   /* at least two blanks must therefore separate breed from MY     */
   INPUT state $ lact $ breed $ & MY;
   CARDS;
Haryana 1 Hariana  18
Haryana 2 Hariana  19
Haryana 3 Hariana  21
Haryana 4 Hariana  20
Haryana 1 Tharparkar  23
Haryana 2 Tharparkar  25
Haryana 3 Tharparkar  22
Haryana 4 Tharparkar  21
Haryana 1 Sahiwal  22
Haryana 2 Sahiwal  24
Haryana 3 Sahiwal  23
Haryana 4 Sahiwal  23
Haryana 1 Red Sindhi  27
Haryana 2 Red Sindhi  26
Haryana 3 Red Sindhi  25
Haryana 4 Red Sindhi  25
Panjab 1 Hariana  25
Panjab 2 Hariana  36
Panjab 3 Hariana  35
Panjab 4 Hariana  37
Panjab 1 Tharparkar  30
Panjab 2 Tharparkar  32
Panjab 3 Tharparkar  29
Panjab 4 Tharparkar  30
Panjab 1 Sahiwal  26
Panjab 2 Sahiwal  29
Panjab 3 Sahiwal  30
Panjab 4 Sahiwal  38
Panjab 1 Red Sindhi  29
Panjab 2 Red Sindhi  25
Panjab 3 Red Sindhi  27
Panjab 4 Red Sindhi  28
;
PROC PRINT DATA=aksas.lactation_ LABEL NOOBS;
   TITLE "State, Lactation and Breed wise Milk Yield Report";
RUN;
OPTIONS DATE;   /* assumed intent: restore the default page date */
TITLE;          /* clear the user-defined title */
Program Explanation (Example 2):
As stated earlier, this program creates the data set lactation_.sas7bdat in the aksas library. The
INPUT statement, placed within the DATA step, creates the variables state, lact, breed and my,
denoting State Name, Lactation Number, Breed Name and Milk Yield, respectively. The ‘$’ symbol
following the variable names state, lact and breed in the INPUT statement indicates that these
variables are of ‘character’ (alphanumeric) type, i.e., no arithmetic is performed on them, though
they may be used as grouping variables for classifying the data. Note carefully that the values of
the lact variable look numeric (1-4), but they are merely serial numbers that do not take part in
computations, e.g., an average lactation number makes no sense. Such variables are better defined
as character variables (to be used for grouping) than as numeric variables; this also avoids the
lactation number being displayed as 1.00, 2.00, ..., 4.00. The variable my (milk yield) is numeric.
The CARDS statement (DATALINES is an equivalent statement) tells SAS that data lines follow; each
line represents one case or record giving the milk yield of a particular breed of dairy cow in a
specific state for a specific lactation number. Any number of cases can be included, and the end
of the data is marked by a line containing only a semi-colon (;). Note that the DATA step here is
not closed by an explicit RUN statement: the data lines end with the semi-colon, and the PROC
statement that follows marks the end of the step. Also, note that the variable names in the INPUT
statement are separated by blank spaces. Each case is written on a separate line, and the data
values on that line appear in the same order as the variables in the INPUT statement. This program
produces the same output as shown in Fig.9.
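To confirm how SAS has stored each variable, the descriptor portion of a data set can be inspected.
The following is a minimal sketch, not part of the original program: it assumes a small temporary
data set called work.type_demo (a hypothetical name) built from four of the records above, and it
uses DATALINES in place of its alias CARDS.

```sas
/* Sketch only: work.type_demo is a hypothetical temporary data set */
DATA work.type_demo;
   INPUT state $ lact $ breed $ my;   /* $ marks character variables */
   DATALINES;
Haryana 1 Hariana 18
Haryana 2 Hariana 19
Panjab 1 Sahiwal 26
Panjab 2 Sahiwal 29
;
RUN;

/* The variable list should report state, lact and breed as Char and my as Num */
PROC CONTENTS DATA = work.type_demo;
RUN;

/* lact serves as a grouping (class) variable; only my is averaged */
PROC MEANS DATA = work.type_demo MEAN;
   CLASS lact;
   VAR my;
RUN;
```

Because lact is a character variable, it can appear in the CLASS statement but not in the VAR
statement of PROC MEANS, which is in line with the reasoning given above.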
Example 3: The following table gives the gain in body weight (kg) per heifer during four
grazing treatments:
Heifer     Gain in body weight (kg) under treatment
number     T1      T2      T3      T4
1          67.3    74.2    63.1    48.7
2          36.9    42.2    32.9    49.0
3          63.2    58.6    59.2    62.0
4          26.8    36.6    42.4    38.8
5          54.8    54.6    34.0    48.2
6          64.2    81.8    65.6    -
7          81.4    -       -       -
Test whether the mean gain in body weight of heifers is the same under the four treatments.
You know that this is a typical example of one-way Analysis of Variance (ANOVA). Therefore,
the SAS program to first create the data set and then perform the one-way ANOVA on this data
set is as follows:
LIBNAME aksas 'd:\aksas';
DATA aksas.bla256;
INPUT TREAT $ GAIN;
CARDS;
T1 67.3
T1 36.9
T1 63.2
T1 26.8
T1 54.8
T1 64.2
T1 81.4
T2 74.2
T2 42.2
T2 58.6
T2 36.6
T2 54.6
T2 81.8
T3 63.1
T3 32.9
T3 59.2
T3 42.4
T3 34.0
T3 65.6
T4 48.7
T4 49.0
T4 62.0
T4 38.8
T4 48.2
;
PROC ANOVA DATA=aksas.bla256;
CLASS TREAT;
MODEL GAIN=TREAT;
TITLE 'COMPARE MEAN GAIN IN BODY WEIGHT OF HEIFERS UNDER FOUR TREATMENTS - ONE-WAY ANOVA';
RUN;
QUIT;
Program explanation (Example 3):
As in Example-2, the DATA step here creates a data set with two variables, TREAT and GAIN, denoting
the grazing treatment (grouping variable) and the gain in body weight, respectively. Note that the
data are entered one case per line, with the treatment label written alongside each gain value;
this is the layout that PROC ANOVA requires for one-way data. The cases are listed here treatment
wise (T1, T2, T3 and T4) for readability; the CLASS statement groups the cases by the values of
TREAT, so the order of the lines does not affect the analysis. (Note: PROC ANOVA is presented here
to highlight that the arrangement of data within a data set depends upon the analysis to be
performed on it.) The PROC ANOVA step performs the one-way analysis of variance on the data set.
PROC ANOVA is intended for balanced data; one-way ANOVA is one of the cases in which it can also be
used with unequal group sizes, as here. Let’s look at the code statement by statement.
PROC ANOVA DATA = aksas.bla256;
This statement specifies that the ANOVA procedure is to be applied to the data set aksas.bla256
(stored as bla256.sas7bdat in the aksas library), which was created in the DATA step above. Now,
consider the next statement in the PROC ANOVA step:
CLASS TREAT;
This defines TREAT (the independent variable) as a class variable, i.e., the variable whose values
define the groups to be compared. Next, consider the statement:
MODEL GAIN=TREAT;
It tells SAS which model we want to fit. GAIN (on the left of the ‘=’ sign) is the response or
dependent variable and TREAT (on the right of the ‘=’ sign) is the factor or independent variable.
A complete description of any SAS step and the statements it contains can be obtained from the
inbuilt Help documentation. This program produces the ANOVA table along with several other results
as follows:
COMPARE MEAN GAIN IN BODY WEIGHT OF HEIFERS UNDER FOUR
TREATMENTS - ONE-WAY ANOVA
The ANOVA Procedure
Class Level Information
Class    Levels    Values
TREAT    4         T1 T2 T3 T4
Number of Observations Read    24
Number of Observations Used    24
The ANOVA Procedure
Dependent Variable: GAIN
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               3        359.889964     119.963321       0.48    0.7022
Error              20       5035.919619     251.795981
Corrected Total    23       5395.809583
R-Square    Coeff Var    Root MSE    GAIN Mean
0.066698    29.60233     15.86808    53.60417
Source    DF    Anova SS       Mean Square    F Value    Pr > F
TREAT      3    359.8899643    119.9633214       0.48    0.7022
Note: Complete details of the analysis and interpretation of the results are discussed elsewhere
(by different authors) in this e-course.
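The point about data layout in Example 3 can be illustrated by entering the table as it is printed
(one line per heifer) and then rearranging it into the one-case-per-line form used by PROC ANOVA.
This is only a sketch under stated assumptions: the data set names work.gain_wide and
work.gain_long and the variable names heifer, t1-t4 and i are hypothetical, and the approach relies
on the empty cells of the table lying at the end of each line.

```sas
/* Sketch only: work.gain_wide and work.gain_long are hypothetical names */
DATA work.gain_wide;
   INFILE DATALINES MISSOVER;         /* short lines leave the remaining variables missing */
   INPUT heifer t1 t2 t3 t4;
   DATALINES;
1 67.3 74.2 63.1 48.7
2 36.9 42.2 32.9 49.0
3 63.2 58.6 59.2 62.0
4 26.8 36.6 42.4 38.8
5 54.8 54.6 34.0 48.2
6 64.2 81.8 65.6
7 81.4
;
RUN;

/* Write one observation per non-missing cell of the table */
DATA work.gain_long (KEEP = treat gain);
   SET work.gain_wide;
   ARRAY t{4} t1-t4;
   LENGTH treat $ 2;
   DO i = 1 TO 4;
      IF NOT MISSING(t{i}) THEN DO;
         treat = CATS('T', i);        /* T1, T2, T3 or T4 */
         gain  = t{i};
         OUTPUT;
      END;
   END;
RUN;

PROC ANOVA DATA = work.gain_long;
   CLASS treat;
   MODEL gain = treat;
RUN;
QUIT;
```

The cases in work.gain_long are ordered heifer wise rather than treatment wise; since the CLASS
statement does the grouping, the analysis should not depend on this order.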
Example 4: The following table gives the fat content for four batches of paneer :
Fat content in batch
Batch 1    Batch 2    Batch 3    Batch 4
50.3       54.1       57.5       55.3
52.2       53.7       56.3       52.8
52.5       52.5       55.8       53.5
51.7       55.2       56.9       51.7
52.6       -          55.8       -
53.2       -          -          -
Write a SAS program to create a data set for analysing the variation among batches with regard to
fat content in paneer.
This is another example of one-way ANOVA, similar to Example-3. The SAS program to create the
requisite data set for the data given in the above table and to print the observations contained
in the newly created data set is as follows:
OPTIONS NODATE;
LIBNAME aksas 'd:\aksas';
DATA aksas.ex4;
INPUT batch $ fat_cont;
cards;
B1 50.3
B1 52.2
B1 52.5
B1 51.7
B1 52.6
B1 53.2
B2 54.1
B2 53.7
B2 52.5
B2 55.2
B3 57.5
B3 56.3
B3 55.8
B3 56.9
B3 55.8
B4 55.3
B4 52.8
B4 53.5
B4 51.7
;
PROC PRINT DATA = aksas.ex4;
TITLE "Analysis of variance between batches w.r.t. fat content in Paneer";
RUN;
TITLE;
OPTIONS;
QUIT;
Program output (Example 4)
Analysis of variance between batches w.r.t. fat content in Paneer
Obs batch fat_cont
1 B1 50.3
2 B1 52.2
3 B1 52.5
4 B1 51.7
5 B1 52.6
6 B1 53.2
7 B2 54.1
8 B2 53.7
9 B2 52.5
10 B2 55.2
11 B3 57.5
12 B3 56.3
13 B3 55.8
14 B3 56.9
15 B3 55.8
16 B4 55.3
17 B4 52.8
18 B4 53.5
19 B4 51.7
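The program of Example 4 only creates and prints the data set. A possible next step, modelled on
the PROC ANOVA step of Example 3, is sketched below; it assumes that aksas.ex4 has already been
created by the program above, and the title text is illustrative only.

```sas
/* Sketch only: assumes aksas.ex4 exists (created by the Example 4 program) */
LIBNAME aksas 'd:\aksas';
PROC ANOVA DATA = aksas.ex4;
   CLASS batch;                 /* grouping variable */
   MODEL fat_cont = batch;      /* response = factor */
   MEANS batch;                 /* batch-wise means and standard deviations */
   TITLE 'One-way ANOVA of fat content in paneer by batch';
RUN;
QUIT;
TITLE;
```

The MEANS statement is optional; it is included here because the batch means are usually wanted
alongside the ANOVA table.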
Example 5: Four batches of butter were prepared by churning cream. Three fat levels in cream
and three churning speeds were taken for preparation of butter. The moisture content (mc) in
butter for the four batches is given below:
        Fat level 30%          Fat level 40%          Fat level 50%
        Churning speed (RPM)   Churning speed (RPM)   Churning speed (RPM)
Batch   1600   1700   1800     1600   1700   1800     1600   1700   1800
1       66.8   69.1   69.3     77.9   77.8   76.1     71.7   72.2   71.9
2       68.1   68.9   70.1     77.6   78.8   76.2     72.3   71.9   71.5
3       67.0   69.2   70.0     77.7   78.7   76.0     72.3   72.2   71.8
4       67.5   68.1   68.9     77.1   77.6   75.7     71.5   71.4   71.3
Write a SAS program to prepare a data set for performing two-way ANOVA with interaction.
This is an example of two-way classified data with multiple observations per cell. The program to
create the SAS data set for this problem is given below (Note: The codes used for the variables
fat level (f_level), speed (speed) and batch (batch) are 1-3, 1-3 and 1-4, respectively):
LIBNAME aksas 'd:\aksas';
DATA aksas.twoway_m;
INPUT f_level speed batch mc;
CARDS;
1 1 1 66.8
1 2 1 69.1
1 3 1 69.3
2 1 1 77.9
2 2 1 77.8
2 3 1 76.1
3 1 1 71.7
3 2 1 72.2
3 3 1 71.9
1 1 2 68.1
1 2 2 68.9
1 3 2 70.1
2 1 2 77.6
2 2 2 78.7
2 3 2 76.4
3 1 2 72.3
3 2 2 71.9
3 3 2 71.8
1 1 3 67.0
1 2 3 69.2
1 3 3 70.0
2 1 3 77.0
2 2 3 78.2
2 3 3 76.0
3 1 3 72.2
3 2 3 72.3
3 3 3 71.8
1 1 4 67.5
1 2 4 68.1
1 3 4 68.9
2 1 4 77.1
2 2 4 77.6
2 3 4 75.7
3 1 4 71.5
3 2 4 71.4
3 3 4 71.3
;
PROC GLM DATA = aksas.twoway_m;
CLASS f_level speed batch;
MODEL mc = f_level speed f_level*speed;
TITLE 'Analyse the effects of fat level and speed';
TITLE2 'Including Interaction';
RUN;
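QUIT;
/* List the data set; the titles defined above remain in effect */
PROC PRINT DATA = aksas.twoway_m;
RUN;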
Program output (Example 5):
Analyse the effects of fat level and speed Including Interaction
Obs f_level speed batch mc
1 1 1 1 66.8
2 1 2 1 69.1
3 1 3 1 69.3
4 2 1 1 77.9
5 2 2 1 77.8
6 2 3 1 76.1
7 3 1 1 71.7
8 3 2 1 72.2
9 3 3 1 71.9
10 1 1 2 68.1
11 1 2 2 68.9
12 1 3 2 70.1
13 2 1 2 77.6
14 2 2 2 78.7
15 2 3 2 76.4
16 3 1 2 72.3
17 3 2 2 71.9
18 3 3 2 71.8
19 1 1 3 67.0
20 1 2 3 69.2
21 1 3 3 70.0
22 2 1 3 77.0
23 2 2 3 78.2
24 2 3 3 76.0
25 3 1 3 72.2
26 3 2 3 72.3
27 3 3 3 71.8
28 1 1 4 67.5
29 1 2 4 68.1
30 1 3 4 68.9
31 2 1 4 77.1
32 2 2 4 77.6
33 2 3 4 75.7
34 3 1 4 71.5
35 3 2 4 71.4
36 3 3 4 71.3
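Typing the three code columns for all 36 cases is error-prone. As an alternative, the codes can be
generated with nested DO loops while only the moisture values are typed, one line per batch. The
following is a sketch, assuming a hypothetical data set name work.twoway_demo, the values as
entered in the program above, and the coding implied by the order of the table columns (1, 2, 3
for 30%, 40%, 50% fat and for 1600, 1700, 1800 RPM).

```sas
/* Sketch only: work.twoway_demo is a hypothetical data set name */
DATA work.twoway_demo;
   DO batch = 1 TO 4;              /* one data line per batch */
      DO f_level = 1 TO 3;         /* assumed coding: 1=30%, 2=40%, 3=50% fat */
         DO speed = 1 TO 3;        /* assumed coding: 1=1600, 2=1700, 3=1800 RPM */
            INPUT mc @@;           /* hold the line so the next value is read from it */
            OUTPUT;                /* write one observation per value */
         END;
      END;
   END;
   DATALINES;
66.8 69.1 69.3 77.9 77.8 76.1 71.7 72.2 71.9
68.1 68.9 70.1 77.6 78.7 76.4 72.3 71.9 71.8
67.0 69.2 70.0 77.0 78.2 76.0 72.2 72.3 71.8
67.5 68.1 68.9 77.1 77.6 75.7 71.5 71.4 71.3
;
RUN;
```

The variables are stored in the order batch, f_level, speed and mc, which differs from the order
in aksas.twoway_m; this does not matter for PROC GLM, which refers to variables by name.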
Example programs (using data set, aksas.lactation.sas7bdat given in Example-1 above)
Ex.1. Let us learn how to work with SAS data set and to access and browse SAS data libraries.
The following program demonstrates how to generate a list of all the SAS files in the data library
(aksas in this case) using _ALL_ keyword. Note that NODS option suppresses the descriptor
portions of the data sets. NODS is used in conjunction with the keyword _ALL_.
LIBNAME aksas "d:\aksas";
PROC CONTENTS DATA = aksas._ALL_ nods;
RUN;
Ex.2. The following program displays contents of the aksas.test data set file as well as prints
listing of the data set according to specified condition and order.
OPTIONS NODATE;
LIBNAME aksas "d:\aksas";
DATA aksas.test;
SET aksas.lactation;
RUN;
PROC CONTENTS DATA = aksas.test;
RUN;
PROC PRINT DATA=aksas.test;
TITLE "State, Lactation and Breed wise Milk Yield Report";
WHERE state='Haryana';
VAR my lact;
RUN;
Output produced by SAS Program (Ex.2):
State, Lactation and Breed wise Milk Yield Report
Obs MY LACT
1 18 1
2 19 2
3 21 3
4 20 4
5 23 1
6 25 2
7 22 3
8 21 4
9 22 1
10 24 2
11 23 3
12 23 4
13 27 1
14 26 2
15 25 3
16 25 4
Program explanation (Ex.2):
Two new statements, where and var, have been introduced in the PROC PRINT step and need
explanation. The other statements have been defined already. A where statement selects the cases
of a SAS data set that meet a particular condition, i.e., it subsets the input data by specifying
condition(s) that each observation must meet before it is made available for processing. The
condition in a where statement is an arithmetic or logical expression that generally consists of a
sequence of operands and operators. To compare character values, you must enclose them in single
or double quotation marks, and the values must match exactly, including capitalisation. Using the
where statement might improve the efficiency of your SAS programs because SAS is not required to
read all the observations in the input data set. Note: Only one where statement is in effect in a
DATA step or a PROC step; if several are given, the last one replaces the earlier ones.
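When a condition has to be built up in stages within one step, a second plain where statement
would replace the first. The sketch below shows one way around this using WHERE ALSO, which adds a
further condition to the existing one. It assumes that aksas.lactation exists as described in
Example-1 and that a SAS release supporting WHERE ALSO is in use; on older releases the two
conditions would have to be joined with AND in a single where statement.

```sas
/* Sketch only: assumes the data set aksas.lactation from Example-1 */
LIBNAME aksas 'd:\aksas';
PROC PRINT DATA = aksas.lactation;
   WHERE state = 'Haryana';
   WHERE ALSO my > 20;      /* adds to, rather than replaces, the first condition */
   VAR state breed lact my;
RUN;
```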
Now, let us further discuss the program given in Ex.2. The DATA step copies all the cases from the
source data set (aksas.lactation) into the destination data set (aksas.test). The where statement
in the PROC PRINT step then defines a simple or compound condition (connected through the logical
operators AND, OR or NOT; for more details on SAS operators visit:
http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000780367.htm)
that SAS tests to select the cases to be printed. Consider the following statement:
WHERE state='Haryana';
This instructs SAS to print only those cases of aksas.test for which the state variable has the
value Haryana; the data set itself is left unchanged. The ‘=’ sign in such a condition is a
comparison operator, not an assignment. Note that state is a character variable; therefore, the
value Haryana is enclosed in quotation marks. For a numeric variable, the value is written as
such, e.g., my > 18 (i.e., select all the cases for which milk yield is greater than 18).
We can replace the earlier condition with the following:
WHERE state='Haryana' and my between 20 and 30;
This statement selects all the cases having state as Haryana and milk yield lying in the closed
interval [20, 30] (the output with this compound condition is given at the end of this
explanation).
Now, consider the statement:
VAR my lact;
This tells SAS to print only the variables listed in the var statement, i.e., my and lact in this
case. The variables are printed in the order in which they are named in the var statement,
irrespective of the order in which they are stored in the data set. Hence, we can alter the order
of variables in the printed report without disturbing the data set itself.
Output of SAS Program (Ex.2) with compound condition in where statement:
State, Lactation and Breed wise Milk Yield Report
Obs MY LACT
3 21 3
4 20 4
5 23 1
6 25 2
7 22 3
8 21 4
9 22 1
10 24 2
11 23 3
12 23 4
13 27 1
14 26 2
15 25 3
16 25 4
Programs for practice (All the following programs use the data set aksas.lactation.sas7bdat given
in Example-1 above)
Now the readers can try the following programs, which are similar to the program discussed in Ex.2
above. You will come across some new statements and options; those not explained here can be
looked up in the inbuilt SAS Help.
Ex.1. This program copies the master (base) data set, lactation.sas7bdat, into the target data
set, red_sin.sas7bdat, and prints from it the cases pertaining to the ‘Red Sindhi’ breed of dairy
cattle for both the states. The logical operator OR is used to form the compound condition in the
PROC PRINT step; a case is selected if either of the two conditions is TRUE. (Note: This is unlike
the AND operator used in Ex.2, which requires both the conditions to be TRUE simultaneously.) As
stated earlier, the spelling and case of a character value such as ‘Panjab’ must be consistent
throughout. If the state is stored as Panjab and you search for Punjab or panjab, SAS will not
find it even though the cases exist in the data set!
SAS Program (Ex.1):
OPTIONS NODATE;
LIBNAME aksas 'd:\aksas';
Data aksas.red_sin;
Set aksas.lactation;
RUN;
PROC CONTENTS DATA = aksas.red_sin;
RUN;
Proc Print Data=aksas.red_sin;
Title "State, Lactation and Breed wise Milk Yield Report";
Where (state='Haryana' and breed='Red Sindhi') or (state='Panjab' and breed='Red Sindhi');
Var lact my;
Run;
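The compound condition of Ex.1 repeats the breed test in both halves. A shorter equivalent can be
written with the IN operator, which tests whether a value belongs to a list. This is only a
sketch, assuming the data set aksas.lactation from Example-1; it prints directly from the master
data set instead of making a copy first.

```sas
/* Sketch only: assumes the data set aksas.lactation from Example-1 */
LIBNAME aksas 'd:\aksas';
PROC PRINT DATA = aksas.lactation;
   /* same cases as (Haryana and Red Sindhi) or (Panjab and Red Sindhi) */
   WHERE breed = 'Red Sindhi' AND state IN ('Haryana', 'Panjab');
   VAR state lact my;
RUN;
```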
Ex.2. This program subsets and prints the Haryana state’s cases on variables breed, lact and
my.
OPTIONS NODATE;
LIBNAME aksas 'd:\aksas';
DATA aksas.lact_hry;
SET aksas.lactation;
WHERE state='Haryana';
RUN;
PROC PRINT DATA=aksas.lact_hry;
TITLE "State, Lactation and Breed wise Milk Yield Report";
VAR breed lact my;
RUN;
Ex.3. This program subsets and prints the Panjab state’s cases on variables state, breed, lact and
my.
OPTIONS NODATE;
LIBNAME aksas 'd:\aksas';
DATA aksas.lact_pb;
SET aksas.lactation;
WHERE state = 'Panjab';
RUN;
PROC PRINT DATA=aksas.lact_pb;
TITLE "State, Lactation and Breed wise Milk Yield Report";
VAR state breed lact my;
RUN;
Ex.4. This subsets and prints the Panjab state’s cases on variables state, breed, lact and my
according to the criterion ‘my between 30 and 35’. This example demonstrates a different way to
perform the operation ‘my between 30 and 35’. SAS provides different tools to perform such
operations. Referring to the inbuilt Help is a good idea to learn new tools in SAS.
OPTIONS NODATE;
LIBNAME aksas 'd:\aksas';
DATA aksas.lact_pb;
SET aksas.lactation;
WHERE state = 'Panjab' and (my>=30 and my<=35);
RUN;
PROC PRINT DATA=aksas.lact_pb;
TITLE "State, Lactation and Breed wise Milk Yield Report";
VAR state breed lact my;
RUN;
Ex.5. This is similar to the Ex.4; and subsets and prints the Panjab state’s cases on variables
state, breed, lact and my according to the criterion ‘my between 30 and 35’ using still another
option to perform the same operation ‘my between 30 and 35’.
OPTIONS NODATE;
LIBNAME aksas 'd:\aksas';
DATA aksas.lact_pb;
SET aksas.lactation;
WHERE state = 'Panjab' and (30<=my<=35);
RUN;
PROC PRINT DATA=aksas.lact_pb;
TITLE "State, Lactation and Breed wise Milk Yield Report";
VAR state breed lact my;
RUN;
Ex.6. This program is also similar to the Ex.4; and i) subsets and prints the Panjab state’s cases
on variables state, breed, lact and my according to the criterion ‘my between 30 and 35’; and ii)
subsets and prints the Haryana state’s cases on variables state, breed, lact and my according to
the criterion ‘my > 22’. This program demonstrates the use of ‘like’ operator with where
statement. The like operator selects cases by comparing the values of a character variable to a
specified pattern (called pattern matching). The like operator is case sensitive. There are two
special characters available for specifying a pattern:
a) Percent sign (%) – specifies that any number of characters can occupy that position. The
following where expressions select all cases having state with a data value that starts with
the letters ‘Pan’ and ‘H’, respectively. The data value contained in variable, state can be
of any length.
WHERE state like 'Pan%';
WHERE state like 'H%';
b) Underscore (_) – matches just one character in the value for each underscore character.
You can specify more than one consecutive underscore character in a pattern, and you
can specify a percent sign and an underscore in the same pattern.
WHERE state LIKE 'P_n%'; (Note: this matches either spelling, Panjab or Punjab)
WHERE state LIKE 'Har_a%'; (Note: this matches either spelling, Haryana or Hariana)
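Since the like operator is case sensitive, a pattern such as 'Pan%' does not match a value stored
as panjab or PANJAB. One common workaround, sketched below, is to convert the variable to upper
case with the UPCASE function before matching and to write the pattern in upper case as well. The
sketch assumes the data set aksas.lactation from Example-1.

```sas
/* Sketch only: assumes the data set aksas.lactation from Example-1 */
LIBNAME aksas 'd:\aksas';
PROC PRINT DATA = aksas.lactation;
   WHERE UPCASE(state) LIKE 'PAN%';   /* matches Panjab, panjab, PANJAB, ... */
   VAR state breed lact my;
RUN;
```

UPCASE is applied only within the condition; the values stored in the data set are not changed.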
SAS Programs (Ex.6):
i)
OPTIONS NODATE;
LIBNAME aksas "d:\aksas";
DATA aksas.lact_pb;
SET aksas.lactation;
WHERE state LIKE "Pan%" and (my between 30 and 35);
RUN;
PROC PRINT DATA=aksas.lact_pb;
TITLE "State, Lactation and Breed wise Milk Yield Report";
VAR state breed lact my;
RUN;
Program Output (Ex.6-i):
State, Lactation and Breed wise Milk Yield Report
Obs STATE BREED LACT MY
1 Panjab Hariana 3 35
2 Panjab Tharparkar 1 30
3 Panjab Tharparkar 2 32
4 Panjab Tharparkar 4 30
5 Panjab Sahiwal 3 30
ii)
OPTIONS NODATE;
LIBNAME aksas "d:\aksas";
DATA aksas.lact_hry;
SET aksas.lactation;
WHERE state LIKE "H%" and (my>22);
RUN;
PROC PRINT DATA=aksas.lact_hry;
TITLE "State, Lactation and Breed wise Milk Yield Report";
VAR state breed lact my;
RUN;
Program output (Ex.6-ii):
State, Lactation and Breed wise Milk Yield Report
Obs STATE BREED LACT MY
1 Haryana Tharparkar 1 23
2 Haryana Tharparkar 2 25
3 Haryana Sahiwal 2 24
4 Haryana Sahiwal 3 23
5 Haryana Sahiwal 4 23
6 Haryana Red Sindhi 1 27
7 Haryana Red Sindhi 2 26
8 Haryana Red Sindhi 3 25
9 Haryana Red Sindhi 4 25
Ex.7. This program removes the cases for the state of Haryana and prints the remaining (Panjab)
cases. It demonstrates subsetting within the DATA step through IF statements rather than the
where statement. The subsetting IF statement, written as IF (condition); causes the DATA step to
continue processing only those raw data records or observations from a SAS data set that meet the
condition. That is, if the expression is TRUE for the case (its value is neither 0 nor missing),
SAS continues to execute the statements in the DATA step and includes the current case in the
output data set. The resulting SAS data set contains a subset of the original external file or
SAS data set. If the expression is FALSE (its value is 0 or missing), the remaining statements in
the DATA step are not executed for that case, the case is not written to the output data set, and
SAS immediately returns to the beginning of the DATA step for the next case. The subsetting IF
statement is equivalent to this IF-THEN statement:
IF NOT (condition)
THEN DELETE;
While creating SAS data sets, use the subsetting IF statement when it is easier to specify a
condition for including cases. When it is easier to specify a condition for excluding cases, use
the DELETE statement, as in the programs below.
The subsetting IF and the where statements are not equivalent! The two statements work
differently and produce different output data sets in some cases. The most important differences
are summarised below:
The subsetting IF statement selects observations that have been read into the program
data vector. The where statement selects observations before they are brought into the
program data vector. The subsetting IF might be less efficient than the where statement
because it must read each observation from the input data set into the program data
vector.
The subsetting IF statement and where statement can produce different results in DATA
steps that interleave, merge, or update SAS data sets.
The subsetting IF statement can select cases from an existing SAS data set or from raw
data that are read with the INPUT statement. The where statement can select cases only
from existing SAS data sets.
The subsetting IF statement is executable; the where statement is not.
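Two of the differences listed above can be seen in a small example: the data are read from raw
data lines with INPUT, and the condition tests a variable that is computed within the step, so a
where statement could not be used in its place. This is a minimal sketch; the data set name
work.if_demo, the variable my_total and the four records are hypothetical and serve only as an
illustration.

```sas
/* Sketch only: work.if_demo and my_total are hypothetical names */
DATA work.if_demo;
   INPUT state $ lact $ my;
   my_total = 2 * my;        /* derived variable, created within the step */
   IF my_total >= 50;        /* subsetting IF: keep the case only if TRUE */
   DATALINES;
Haryana 1 18
Haryana 2 25
Panjab 1 30
Panjab 2 .
;
RUN;

PROC PRINT DATA = work.if_demo;
RUN;
```

The first record fails the condition and the last one has a missing milk yield, so the comparison
is FALSE for both; only the two middle records should reach work.if_demo.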
SAS Program (Ex.7):
OPTIONS NODATE;
LIBNAME aksas 'd:\aksas';
DATA aksas.lact_h;
SET aksas.lactation;
IF (state = 'Haryana')
THEN DELETE;
PROC PRINT DATA=aksas.lact_h;
RUN;
Program Output (Ex.7):
State, Lactation and Breed wise Milk Yield Report
Obs STATE BREED LACT MY
1 Panjab Hariana 1 25
2 Panjab Hariana 2 36
3 Panjab Hariana 3 35
4 Panjab Hariana 4 37
5 Panjab Tharparkar 1 30
6 Panjab Tharparkar 2 32
7 Panjab Tharparkar 3 29
8 Panjab Tharparkar 4 30
9 Panjab Sahiwal 1 26
10 Panjab Sahiwal 2 29
11 Panjab Sahiwal 3 30
12 Panjab Sahiwal 4 38
13 Panjab Red Sindhi 1 29
14 Panjab Red Sindhi 2 25
15 Panjab Red Sindhi 3 27
16 Panjab Red Sindhi 4 28
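Ex.8. This program is similar to the previous example. The like operator is available only in
where expressions, so the IF statement here uses the ‘=:’ comparison operator, which tests
whether a character value begins with the given string.
OPTIONS NODATE;
LIBNAME aksas 'd:\aksas';
DATA aksas.lact_p;
SET aksas.lactation;
IF (state =: 'H')
THEN DELETE;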
PROC PRINT DATA=aksas.lact_p;
RUN;
Program Output (Ex.8):
State, Lactation and Breed wise Milk Yield Report
Obs STATE BREED LACT MY
1 Panjab Hariana 1 25
2 Panjab Hariana 2 36
3 Panjab Hariana 3 35
4 Panjab Hariana 4 37
5 Panjab Tharparkar 1 30
6 Panjab Tharparkar 2 32
7 Panjab Tharparkar 3 29
8 Panjab Tharparkar 4 30
9 Panjab Sahiwal 1 26
10 Panjab Sahiwal 2 29
11 Panjab Sahiwal 3 30
12 Panjab Sahiwal 4 38
13 Panjab Red Sindhi 1 29
14 Panjab Red Sindhi 2 25
15 Panjab Red Sindhi 3 27
16 Panjab Red Sindhi 4 28
Ex.9. This program is also similar to previous examples.
LIBNAME aksas 'd:\aksas';
data aksas.lact_ph;
set aksas.lactation;
if not (my>=25)then
delete;
proc print data=aksas.lact_ph;
run;
Program Output (Ex.9):
State, Lactation and Breed wise Milk Yield Report
Obs STATE BREED LACT MY
1 Haryana Tharparkar 2 25
2 Haryana Red Sindhi 1 27
3 Haryana Red Sindhi 2 26
4 Haryana Red Sindhi 3 25
5 Haryana Red Sindhi 4 25
6 Panjab Hariana 1 25
7 Panjab Hariana 2 36
8 Panjab Hariana 3 35
9 Panjab Hariana 4 37
10 Panjab Tharparkar 1 30
11 Panjab Tharparkar 2 32
12 Panjab Tharparkar 3 29
13 Panjab Tharparkar 4 30
14 Panjab Sahiwal 1 26
15 Panjab Sahiwal 2 29
16 Panjab Sahiwal 3 30
17 Panjab Sahiwal 4 38
18 Panjab Red Sindhi 1 29
19 Panjab Red Sindhi 2 25
20 Panjab Red Sindhi 3 27
21 Panjab Red Sindhi 4 28
KEEP and DROP statements – dropping and keeping variables
By default, SAS writes to a data set created by a DATA step all the variables in the input data
set(s) along with all the new variables created in the DATA step. If you do not want all these
variables, you can tell SAS so in one of several ways. The KEEP and DROP statements (and the
corresponding KEEP= and DROP= data set options) control which variables are read into the step and
written to the output data sets. You may create several variables during data processing but need
to save only selected variables in the final data set. To restrict the columns of the output (or
target) data set, use the KEEP= option on the data set named in the DATA statement, as follows;
the output data set is then created with the required variables only.
DATA aksas.lacpb (keep = state my);
SET aksas.lactation;
RUN;
Alternatively, the same result is obtained with a KEEP statement inside the DATA step:
DATA aksas.lacpb;
SET aksas.lactation;
KEEP state my;
RUN;
If you are reading a big data set into SAS and wish to extract only a few variables from it, use
the KEEP= option on the input data set in the SET statement:
DATA aksas.lacpb;
SET aksas.lactation (keep = state breed my);
WHERE state = 'Panjab';
RUN;
In the first two programs, SAS reads all the variables of the data set lactation, even though only
a few are written to the output data set. In the last program, SAS reads from storage only the
three variables named in the KEEP= option. Such methods of restricting the data read into the
system save processing time and memory, which matters for large data sets kept in SAS libraries on
shared drives.
The DROP statement and the DROP= option are specified in the same way, naming the variables to be
excluded rather than retained.
Now, try the following simple programs.
Ex.10: The following programs demonstrate KEEP/DROP statements to subset data from master
(base) data set into target data set.
i)
LIBNAME aksas 'd:\aksas';
DATA aksas.lactpb;
SET aksas.lactation;
KEEP breed lact my;
WHERE state = 'Panjab';
RUN;
PROC PRINT DATA=aksas.lactpb;
Run;
ii)
LIBNAME aksas 'd:\aksas';
DATA aksas.lactpb;
SET aksas.lactation;
DROP state;
WHERE state = 'Panjab';
RUN;
PROC PRINT DATA=aksas.lactpb;
VAR breed lact my;
RUN;
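The DROP= data set option can be placed on either the output or the input data set, in the same
way as KEEP=. The two DATA steps below are a sketch of both placements, assuming the data set
aksas.lactation from Example-1; the output names work.pb_out and work.pb_in are hypothetical.

```sas
/* Sketch only: work.pb_out and work.pb_in are hypothetical data set names */
LIBNAME aksas 'd:\aksas';

/* DROP= on the output data set: state is read, so the WHERE statement */
/* can test it, but it is not written to work.pb_out                   */
DATA work.pb_out (DROP = state);
   SET aksas.lactation;
   WHERE state = 'Panjab';
RUN;

/* DROP= on the input data set: lact is not read into the DATA step */
DATA work.pb_in;
   SET aksas.lactation (DROP = lact);
   WHERE state = 'Panjab';
RUN;
```

A variable that the where statement tests should not be dropped on the input side; as far as the
behaviour of data set options goes, the variable would then be unavailable to the where
expression and the step would be expected to stop with an error.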
Suggested Readings & WWW Resources
[1] An Introduction to SAS Data Steps. Social Science Computing Cooperative, University
of Wisconsin-Madison. Accessed May 23, 2013. Available online at:
http://www.ssc.wisc.edu/sscc/pubs/4-18.htm.
[2] SAS Institute Inc., SAS/IML User’s Guide, Version 8, Cary, NC: SAS Institute Inc.,
1999. pp. 846. Accessed May 23, 2013. Available online at:
http://ciser.cornell.edu/sasdoc/saspdf/iml/chap6.pdf.
[3] Gupta S, 2005. WHERE vs. IF Statements: Knowing the Difference in How and When to
Apply. Paper: 238-31. http://www2.sas.com/proceedings/sugi31/238-31.pdf.
[4] SAS Operators.
http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000780367.htm.
[5] Statistical Computing with SAS. The Institute for Digital Research and Education (IDRE)
Research Technology Group. The University of California, Los Angeles (UCLA).
Accessed: May 24, 2013. http://www.ats.ucla.edu/stats and https://idre.ucla.edu/stats/sas.
[6] Kurian, J. S for SAS: A Beginners’ Guide. Accessed May 24, 2013. Available online at:
http://www.puzha.com/sasbook/s4sas.pdf.
[7] Reference Manual - Data Analysis using SAS. Accessed June 18, 2013. Available online
at: http://www.iasri.res.in/sscnars/contentmain.htm.