BAS 150 Lesson 4: Creating and Managing SAS Datasets & Formats and Labels
This Lesson’s Learning Objectives
• Create a permanent SAS data set
• Effectively manage multiple SAS data sets and libraries
• Modify SAS data sets
• Evaluate the difference in data formats and labels
• Create a new variable in SAS using formats
SAS Data Sets
SAS datasets can be permanent or temporary.
Previously, we’ve used the data statement to create
temporary SAS datasets.
In this lesson, we’ll learn how to create permanent SAS
datasets using the data statement.
Understanding SAS Libraries (1 of 3)
• SAS libraries are subdirectories or folders
• They store SAS datasets
• A library is assigned with a LIBNAME statement
Understanding SAS Libraries (2 of 3)
• The libref is the name you give to the subdirectory where SAS needs to look for the dataset
• You will refer to this libref later in your programs
The libref for this libname statement is:
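A minimal sketch of a LIBNAME statement follows. The libref soccer matches the library name used later in this lesson; the folder path is a placeholder assumption and must be replaced with a folder that exists on your system.

```sas
/* Assign the libref "soccer" to a folder on disk.            */
/* Assumption: the path below is a placeholder, not a real    */
/* course folder. Point it at an existing directory.          */
libname soccer "C:\SASData\Soccer";
```

After this statement runs, any dataset referenced as soccer.dataset-name is read from or written to that folder.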
Understanding SAS Libraries (3 of 3)
Libref
o Naming rules
  8 characters or fewer
  Begin with a letter or underscore
  No special characters or spaces
o If you do not specify a libref, the dataset is saved temporarily to the Work library
Creating a Permanent SAS Dataset
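As a hedged sketch, a permanent dataset is created by giving the DATA statement a two-level name (libref.dataset). The path, the dataset name soccer_scores, and the data values below are illustrative assumptions; the variable names follow those used later in this lesson.

```sas
/* Assumption: placeholder path; use a folder that exists. */
libname soccer "C:\SASData\Soccer";

/* Two-level name: the dataset is stored in the soccer library */
/* and remains available after the SAS session ends.           */
data soccer.soccer_scores;
    input player $ goals age years_playing;
    datalines;
Alex 12 9 4
Bri 7 6 1
Casey 15 11 6
Dana 10 8 3
;
run;
```

Writing `data soccer_scores;` with a one-level name would instead create a temporary dataset in the Work library.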
Format of SAS Datasets
• Two parts
o Descriptor
o Data
• PROC Statements
o PROC Print
o PROC Contents
PROC Contents (1 of 2)
PROC Contents can be used to display the metadata, or descriptor portion, of the SAS dataset.
PROC Contents (2 of 2)
Note the key parts:
• Data set name
• File name
• Number of variables
• Number of observations
• Variable list
PROC Contents is an easy way to know what’s in your dataset.
PROC Contents: VARNUM Option
By default, PROC Contents lists all variables alphabetically.
Sometimes it is useful to look at the variable list in the order the variables were created.
This can be done using the VARNUM option.
[Figure: PROC Contents variable list before and after the VARNUM option]
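The difference can be sketched with a small temporary dataset. The dataset work.goals and its values are assumptions made up for illustration.

```sas
/* Illustrative data; values are invented for this sketch. */
data work.goals;
    input player $ goals age years_playing;
    datalines;
Alex 12 9 4
Bri 7 6 1
;
run;

/* Default: variables are listed alphabetically. */
proc contents data=work.goals;
run;

/* VARNUM: variables are listed in creation order. */
proc contents data=work.goals varnum;
run;
```

With VARNUM, the variable list follows the order of the INPUT statement (player, goals, age, years_playing) rather than alphabetical order.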
PROC Print
PROC Print is useful for showing the results of your code.
However, running a default PROC Print on a large dataset will generate many pages of output and can even make SAS freeze.
It’s better to specify options when using PROC Print, which we’ll discuss next.
PROC Print Options
There are several options for PROC Print:
o VAR – a statement that specifies the variables you want to print
o NOOBS – suppresses the default observation column
o (OBS=n) – a data set option on DATA= that limits the number of observations printed
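A short sketch combining these options is shown below. The dataset work.goals and its values are assumptions for illustration. Note that the observation limit is written as the data set option (OBS=n) next to the dataset name.

```sas
/* Illustrative data; values are invented for this sketch. */
data work.goals;
    input player $ goals age years_playing;
    datalines;
Alex 12 9 4
Bri 7 6 1
Casey 15 11 6
Dana 10 8 3
;
run;

/* Print only the first 3 observations, without the Obs column, */
/* and show only the player and goals variables.                */
proc print data=work.goals (obs=3) noobs;
    var player goals;
run;
```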
Importing a SAS Data Set
• Soccer is the library.
• Soccer_Scores is the dataset within the library.
• No need for INFILE statements to retrieve data.
• The SET statement is like an INPUT statement, but instead of reading in a raw data file, it reads observations from a SAS dataset.
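Reading an existing SAS dataset can be sketched as follows. The folder path and data values are placeholder assumptions; the first DATA step exists only so the example can run on its own.

```sas
/* Assumption: placeholder path; use a folder that exists. */
libname soccer "C:\SASData\Soccer";

/* Set-up only: create the permanent dataset to read from. */
data soccer.soccer_scores;
    input player $ goals age years_playing;
    datalines;
Alex 12 9 4
Bri 7 6 1
;
run;

/* SET reads observations from the SAS dataset. */
/* No INFILE or INPUT statement is needed.      */
data work.goals;
    set soccer.soccer_scores;
run;
```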
Modifying a SAS Data Set
There are many different ways to modify a SAS data set.
The example on the left uses the WHERE statement to subset the work.goals data set and creates a new SAS data set called work.topgoals.
Work.topgoals only includes observations where the goal variable is greater than or equal to 10.
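A minimal sketch of this kind of subset is shown below. The variable name goals and the data values are assumptions; the dataset names work.goals and work.topgoals come from the slide.

```sas
/* Illustrative data; values are invented for this sketch. */
data work.goals;
    input player $ goals age years_playing;
    datalines;
Alex 12 9 4
Bri 7 6 1
Casey 15 11 6
Dana 10 8 3
;
run;

/* Keep only players with 10 or more goals. */
data work.topgoals;
    set work.goals;
    where goals >= 10;
run;

proc print data=work.topgoals noobs;
run;
```

The original work.goals dataset is left unchanged; only the new work.topgoals dataset contains the subset.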
Formats and Labels
Using Labels (1 of 3)
Label Statement
• Apply in a DATA or PROC step
o DATA step or PROC Datasets: permanent
o PROC steps: temporary
Using Labels (2 of 3)
• The first label statement is in the DATA step.
• It creates permanent descriptors for the variables player, goals, age and years_playing.
• The second label statement is in the PROC Print step.
• PROC Print requires the LABEL option when you want to display labels (instead of variable names) in the column headers, because the default is the variable name.
Using Labels (3 of 3)
• To the right is the output using our label statement
• The headings are now the labels, instead of variable names.
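The two uses of the LABEL statement can be sketched as follows. The label text and data values are assumptions for illustration; the variable names are those listed on the slide.

```sas
/* LABEL in the DATA step: stored permanently with the dataset. */
data work.goals;
    input player $ goals age years_playing;
    label player        = "Player Name"
          goals         = "Goals Scored"
          age           = "Age of Player"
          years_playing = "Years Playing";
    datalines;
Alex 12 9 4
Bri 7 6 1
;
run;

/* The LABEL option tells PROC PRINT to show labels as headers. */
/* LABEL in the PROC step: applies to this output only.         */
proc print data=work.goals label noobs;
    label goals = "Season Goals";
run;
```

Running PROC PRINT again without the LABEL option would show the variable names as column headers.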
Formats
• Similar to labels
• Define appearance of data
• Grouping
• Predefined or custom
• Both DATA and PROC steps
o DATA: Permanent
o PROC: Temporary
Format Rules
<$>format<w>.<d>
$       Indicates a character format.
format  Names the SAS format.
w       Specifies the total format width, including decimal places and special characters.
.       Is required syntax. Formats always contain a period (.) as part of the name.
d       Specifies the number of decimal places to display in numeric formats.
Commonly Used SAS Formats
Format      Definition
$w.         Writes standard character data.
w.d         Writes standard numeric data.
COMMAw.d    Writes numeric values with a comma that separates every three digits and a period that separates the decimal fraction.
DOLLARw.d   Writes numeric values with a leading dollar sign, a comma that separates every three digits, and a period that separates the decimal fraction.
COMMAXw.d   Writes numeric values with a period that separates every three digits and a comma that separates the decimal fraction.
EUROXw.d    Writes numeric values with a leading euro symbol (€), a period that separates every three digits, and a comma that separates the decimal fraction.
Examples of Formats
Pre-formatted value   Format       Formatted value
2125854               comma10.     2,125,854
52115                 dollar14.2   $52,115.00
17526                 mmddyy8.     12/26/07
17526                 weekdate.    Wednesday, December 26, 2007
M                     $Gender.     Male
12                    AgeGroup.    Under 18
C                     $PassFail.   Passing Grade
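The predefined formats in the table can be tried with a short sketch like the one below. The dataset and variable names are hypothetical. The custom formats ($Gender., AgeGroup., $PassFail.) are left out because they would first have to be defined with PROC FORMAT.

```sas
/* Hypothetical one-row dataset holding the raw values from the table. */
data work.format_demo;
    amount = 2125854;
    salary = 52115;
    day    = 17526;   /* SAS date value: days since January 1, 1960 */
    day2   = day;     /* second copy so two date formats can be shown */
run;

/* FORMAT in a PROC step changes only how values are displayed. */
proc print data=work.format_demo noobs;
    format amount comma10.
           salary dollar14.2
           day    mmddyy8.
           day2   weekdate.;
run;
```

The stored values in work.format_demo stay numeric; only the printed appearance changes.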
Format Names
Specifying Ranges of Values
Age of Player   Value
5-6             Level 1
7-9             Level 2
10-12           Level 3
We can create our own formats to simplify data presentation by creating groups.
From the Soccer data, we can create a format for age so that each value represents a “Playing Level”.
Example: Soccer players who are 5 or 6 years old are considered “Level 1”, while those who are 7, 8, or 9 are considered “Level 2”.
Defining a Numeric Format Using PROC Format
We create a new format called “level”.
This format will convert the age variable into playing levels 1 – 3.
Notice we do not reference age at all in the PROC Format step. Why? Because a format can be used for any variable.
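A minimal sketch of such a format definition, using the age ranges from the table above, might look like this:

```sas
/* Define a numeric format named "level".            */
/* Note: no period after the name in VALUE, and no   */
/* variable is mentioned anywhere in the step.       */
proc format;
    value level
        5 - 6   = "Level 1"
        7 - 9   = "Level 2"
        10 - 12 = "Level 3";
run;
```

Values outside the listed ranges are not covered by this sketch and would be displayed as ordinary numbers.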
Applying a Numeric Format
Once we’ve created the format, we can use it in PROC Print.
Notice the period at the end of the second group of statements.
This is important and an easy place to make an error!
Remember, there is no period when you create the format, but a period is required when you use it.
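Creating and then applying the format can be sketched as follows. The dataset work.goals and its values are assumptions for illustration.

```sas
/* Create the format: no period after "level". */
proc format;
    value level
        5 - 6   = "Level 1"
        7 - 9   = "Level 2"
        10 - 12 = "Level 3";
run;

/* Illustrative data; values are invented for this sketch. */
data work.goals;
    input player $ goals age years_playing;
    datalines;
Alex 12 9 4
Bri 7 6 1
Casey 15 11 6
;
run;

/* Use the format: the period after "level" is required. */
proc print data=work.goals noobs;
    format age level.;
run;
```

Because the FORMAT statement is in the PROC step, the age values in work.goals are still stored as numbers.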
Viewing the Output
Note: I changed the age variable label to ‘Playing Level’ for reporting purposes
Creating a New Variable in SAS
Instead of changing the variable label for age, we can create a new variable called “Playing_Level”.
By using the PUT function, we can combine the age variable and the level format to create “Playing_Level”.
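A hedged sketch of this step is shown below. The dataset names and data values are assumptions; the format and variable names follow the slides.

```sas
proc format;
    value level
        5 - 6   = "Level 1"
        7 - 9   = "Level 2"
        10 - 12 = "Level 3";
run;

/* Illustrative data; values are invented for this sketch. */
data work.goals;
    input player $ goals age years_playing;
    datalines;
Alex 12 9 4
Bri 7 6 1
Casey 15 11 6
;
run;

/* PUT(age, level.) returns the formatted text as a character value. */
data work.levels;
    set work.goals;
    Playing_Level = put(age, level.);
run;

proc print data=work.levels noobs;
    var player age Playing_Level;
run;
```

Unlike applying the format in PROC PRINT, this keeps the original numeric age and adds Playing_Level as a separate character variable.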
The Output.
Summary - Learning Objectives
• Create a permanent SAS data set
• Effectively manage multiple SAS data sets and libraries
• Modify SAS data sets
• Evaluate the difference in data formats and labels
• Create a new variable in SAS using formats
“This workforce solution was funded by a grant awarded by the U.S. Department of Labor’s
Employment and Training Administration. The solution was created by the grantee and does not
necessarily reflect the official position of the U.S. Department of Labor. The Department of Labor
makes no guarantees, warranties, or assurances of any kind, express or implied, with respect to such
information, including any information on linked sites and including, but not limited to, accuracy of the
information or its completeness, timeliness, usefulness, adequacy, continued availability, or
ownership.”
Except where otherwise stated, this work by Wake Technical Community College Building Capacity in
Business Analytics, a Department of Labor, TAACCCT funded project, is licensed under the Creative
Commons Attribution 4.0 International License. To view a copy of this license, visit
http://creativecommons.org/licenses/by/4.0/
Copyright Information