IOWA STATE UNIVERSITY, Department of Animal Science

Working with Your Data (Chapter 2 in the Little SAS Book)

Animal Science 500

Lecture No. 4

September 9, 2010

Working with Your Data

To this point we have identified:

1. Many forms in which data can be stored and ultimately imported into SAS
   1. Spreadsheets – Excel, Lotus, Quattro Pro, etc.
   2. Databases – Access, SQL, others
   3. Text files – from Word, WordPad, Notepad, others
   4. Other file formats
2. Many ways to import our data into SAS
   1. Import wizard
   2. INFILE statement
   3. Others

There are many options for importing the data, formatting the input data, etc.

Modifying your Data

- DATA step
  - reads and modifies data
  - creates a new dataset
  - performs actions on rows (observations)

- PROC step
  - uses an existing dataset
  - produces output/results
  - performs actions on columns (variables)
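A minimal sketch of the two kinds of steps, assuming a small made-up dataset (the names pigs, Animal, Weight, and WeightKg are hypothetical and not from the book). The DATA step builds a new dataset one row at a time; the PROC step then reads that finished dataset and summarizes whole columns.

```sas
/* DATA step: creates a new dataset, processing one row (observation) at a time */
data pigs;
   input Animal $ Weight;          /* hypothetical ID and weight in lb */
   WeightKg = Weight * 0.4536;     /* new column computed for every row */
   datalines;
P1 250
P2 265
;
run;

/* PROC step: uses the existing dataset and summarizes whole columns */
proc means data=pigs;
   var Weight WeightKg;
run;
```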

Modifying your Data

- Creating and redefining variables is straightforward in a SAS DATA step:
  - variable = expression;

- Examples
  - Newvariable = constant;
  - Newvariable = oldvariable * constant;
  - Adjusted backfat, growth rate, loin muscle area = predetermined equation;
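A short sketch of assignment statements of the forms listed above, assuming hypothetical variables (Pig, StartWt, EndWt, Breed, EndWtKg, Gain) and made-up data; the real adjustment equations for backfat and loin muscle area are not reproduced here.

```sas
/* Hypothetical growth data: pig ID, start and end weights (lb) */
data growth;
   input Pig $ StartWt EndWt;
   Breed   = 'Duroc';            /* new variable = constant */
   EndWtKg = EndWt * 0.4536;     /* new variable = old variable * constant */
   Gain    = EndWt - StartWt;    /* new variable from an expression */
   datalines;
A 60 250
B 55 240
;
run;
```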

Arithmetic Operators

Operation        Symbol  Example                                        Result
addition         +       5 + 3                                          adds two numbers together
subtraction      -       5 - 3, or two variables: ending wt - beginning wt   subtracts 3 from 5
multiplication   *       2*y (note 1)                                   multiplies 2 by the value of Y
division         /       var/5, or variables: weight gain / days on test     divides the value of VAR by 5
exponentiation   **      a**2                                           raises A to the second power

Note 1: The * must always be written; 2(y) and 2y are not valid, so write 2*y.

Note 2: Exponentiation uses **. The ^ symbol is not an exponent operator in SAS; it is one of the NOT symbols (as in ^=).
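A quick check of each arithmetic operator, assuming made-up values (a, b, y, x and the result names are hypothetical). The comment on each line gives the value SAS stores.

```sas
data arith;
   a = 5;  b = 3;  y = 4;  x = 4;
   total = a + b;    /* 8 */
   diff  = a - b;    /* 2 */
   prod  = 2*y;      /* 8; writing 2y or 2(y) would be a syntax error */
   quot  = x/5;      /* 0.8 */
   power = a**2;     /* 25 */
run;

proc print data=arith;
run;
```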

Comparison Operators

- Comparison operators set up a comparison, operation, or calculation with two variables, constants, or expressions within the dataset being used.
  - If the comparison is true, the result is 1.
  - If the comparison is false, the result is 0.

- Comparison operators can be expressed as symbols or with their mnemonic equivalents, which are shown in the following table:

Comparison Operators

Symbol      Mnemonic  Definition                  Example
=           EQ        equal to                    a=3
^= ¬= ~=    NE        not equal to                a ne 3
>           GT        greater than                num>5
<           LT        less than                   num<8
>=          GE        greater than or equal to    sales>=300
<=          LE        less than or equal to       sales<=100
(none)      IN        equal to one of a list      num in (3, 4, 5)

Which NE symbol you can type depends on your keyboard; the mnemonic NE always works.
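A sketch showing that a comparison evaluates to 1 (true) or 0 (false), assuming hypothetical variables num and sales with made-up values.

```sas
/* Hypothetical values of num and sales */
data compare;
   input num sales;
   big      = (num > 5);            /* 1 if true, 0 if false */
   lowsales = (sales le 100);       /* mnemonic form of <= */
   listed   = (num in (3, 4, 5));   /* IN: equal to one of a list */
   datalines;
4 90
7 300
;
run;
```

For the first row this gives big=0, lowsales=1, listed=1; for the second, big=1, lowsales=0, listed=0.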

Logical (Boolean) Operators and Expressions

Symbol      Mnemonic  Example
&           AND       (a>b & c>d)
| ! ¦       OR        (a>b or c>d)
¬ ^ ~       NOT       not(a>b)

Logical operators, also called Boolean operators, are usually used in expressions to link sequences of comparisons.
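A sketch of AND, OR, and NOT linking two comparisons, assuming hypothetical variables a, b, c, d with made-up values.

```sas
data logic;
   input a b c d;
   both   = (a>b & c>d);    /* AND: 1 only if both comparisons are true */
   either = (a>b or c>d);   /* OR: 1 if at least one comparison is true */
   notab  = not(a>b);       /* NOT: reverses the result of a>b */
   datalines;
5 3 1 2
2 3 1 2
;
run;
```

In the first row a>b is true and c>d is false, so both=0, either=1, notab=0; in the second row both comparisons are false, so both=0, either=0, notab=1.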

Order of calculations

- Calculations follow standard mathematical rules of precedence: exponentiation first, then multiplication and division, then addition and subtraction.

- Use parentheses to override that order.
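A small sketch of precedence and of parentheses overriding it (the variable names x, y, z are arbitrary).

```sas
data order;
   x = 2 + 3 * 4;     /* multiplication before addition: 14 */
   y = (2 + 3) * 4;   /* parentheses evaluated first: 20 */
   z = 2 * 3**2;      /* exponentiation before multiplication: 18 */
run;
```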

Modifying your Data

- The DROP or KEEP statements
  - Used to decrease the number of variables
  - Usually not a concern with the datasets normally encountered
  - Remember that DROP and KEEP control which variables are written to the SAS dataset being created; dropped variables can still be used in calculations within that DATA step

Modifying your Data

Data new2;
   set new;
   ADG = (Finalwt - Beginningwt) / DaysOnTest;   /* average daily gain */
   Drop Beginningwt DaysOnTest;   /* not written to new2, but still used above */
Run;

Proc Means data=new2;
Run;
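A self-contained version of the example, assuming made-up pig records for the dataset new. It uses KEEP instead of DROP to show that the two are interchangeable: listing what to keep here has the same effect as dropping Beginningwt and DaysOnTest.

```sas
/* Hypothetical input data: pig ID, weights (lb), days on test */
data new;
   input Pig $ Beginningwt Finalwt DaysOnTest;
   datalines;
A 60 250 100
B 55 240 110
;
run;

data new2;
   set new;
   ADG = (Finalwt - Beginningwt) / DaysOnTest;  /* uses Beginningwt even though it is not kept */
   keep Pig Finalwt ADG;                        /* same result as the DROP statement */
run;
```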

Using IF – THEN Statements

- Used when we want to apply a statement to some of the observations but not all
  - For example, adjustment factors for backfat, loin muscle area, and growth rate that differ by sex of the animal

- Called condition-action statements

- IF condition THEN action;

Using IF – THEN Statements

- Example 1
  - If job='banker' then highsal=1;

- IF condition AND condition THEN action;

- Example 2 (the same statement written with symbols and with mnemonics)
  - If job='banker' and age>65 then ret_banker=1;
  - If job eq 'banker' and age gt 65 then ret_banker=1;
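A runnable sketch of the two examples, assuming a made-up dataset with job and age. Observations that fail the condition get no assignment, so highsal and ret_banker are missing for them.

```sas
/* Hypothetical staff records */
data staff;
   input job $ age;
   if job='banker' then highsal=1;
   if job eq 'banker' and age gt 65 then ret_banker=1;
   datalines;
banker 70
banker 40
teacher 70
;
run;
```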

Using the IN Operator

- The IN operator compares a variable with a list of values; used in the condition of an IF – THEN statement, it replaces several comparisons joined by OR and gives a bit more flexibility

- IF Model IN ('Corvette', 'Camaro') THEN Make = 'Chevrolet';
  - Assumes you have a column or variable titled Model
  - Creates a new variable (column) titled Make and sets it to 'Chevrolet' for those observations
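A sketch of the IN example with made-up car models. The LENGTH statement is an added precaution so Make can hold the full value; models not in the list leave Make blank.

```sas
data cars;
   input Model $;
   length Make $ 10;
   if Model in ('Corvette', 'Camaro') then Make = 'Chevrolet';
   /* equivalent: if Model = 'Corvette' or Model = 'Camaro' then ... */
   datalines;
Corvette
Mustang
Camaro
;
run;
```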

Using the IN Operator

- Example using animal data:
  - IF SEX IN ('gilt', 'barrow') THEN adjustedBF = BF + ((actualwt - 250) * (actualbf / (actualwt - constant1)));
  - IF SEX IN ('boar') THEN adjustedBF = BF + ((actualwt - 250) * (actualbf / (actualwt - constant2)));

Using IF – THEN Statements

- A single IF – THEN statement can have only one action

- Using the keywords DO and END, it is possible to execute more than one action

- Example:

  IF condition THEN DO;
     action;
     action;
  END;

- Example:

  IF Model = 'Mustang' THEN DO;
     Make = 'Ford';
     Size = 'Compact';
  END;
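A runnable sketch of the DO group, assuming made-up model names. Both assignments run only for the Mustang row; the LENGTH statement is an added precaution for the character variables.

```sas
data cars2;
   input Model $;
   length Make $ 5 Size $ 7;
   if Model = 'Mustang' then do;   /* both statements run only when true */
      Make = 'Ford';
      Size = 'Compact';
   end;
   datalines;
Mustang
Camaro
;
run;
```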

Using IF – THEN Statements

- The AND and OR keywords can be used to specify multiple conditions
  - IF condition AND condition THEN action;

- Examples
  - IF Model = 'Mustang' AND Year < 1975 THEN Status = 'Classic'; both conditions must be met to reach 'Classic' status
  - IF Model = 'Mustang' OR Year < 1975 THEN Status = 'Classic'; at least one of the conditions must be met to reach 'Classic' status
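A sketch that runs both versions side by side on made-up cars, storing the AND and OR results in two hypothetical variables so the difference is visible.

```sas
data status;
   input Model $ Year;
   length StatusAnd StatusOr $ 7;
   if Model = 'Mustang' and Year < 1975 then StatusAnd = 'Classic';
   if Model = 'Mustang' or  Year < 1975 then StatusOr  = 'Classic';
   datalines;
Mustang 1968
Mustang 1990
Camaro 1969
;
run;
```

Only the 1968 Mustang is 'Classic' under AND; all three rows are 'Classic' under OR.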

Using IF – THEN /ELSE Statements

- The IF – THEN / ELSE statement is typically used to group observations

- Basic form of the statement:

  IF condition THEN action;
  ELSE IF condition THEN action;
  ELSE IF condition THEN action;

Advantages when compared to a series of regular IF – THEN statements:

1. Computationally more efficient: once a condition is satisfied, SAS skips the remaining ELSE IF tests for that observation, so less computer time is used.

2. The ELSE makes the groups mutually exclusive, preventing an observation from ending up in more than one group.
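A sketch of grouping with IF – THEN / ELSE, assuming hypothetical weight-class cut-offs (200 and 260 lb are made up for illustration). Each pig lands in exactly one class, and a missing weight is handled first.

```sas
data wtclass;
   input Pig $ Weight;
   length WtClass $ 6;
   if missing(Weight) then WtClass = ' ';
   else if Weight < 200 then WtClass = 'Light';
   else if Weight < 260 then WtClass = 'Medium';
   else WtClass = 'Heavy';
   datalines;
A 180
B 240
C 275
D .
;
run;
```

Because the tests are chained with ELSE, the second test does not need "Weight ge 200": any pig under 200 lb was already caught by the first test.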

The DO – END statement

The DO-END statement is useful if you want to make several changes or create new variables for a subgroup or under certain conditions.

Note that the DO group continues until you close it with END;

Example:

If sex = 'female' then do;
   AdjustedBF = equation;
   AdjustedLMA = equation;
   AdjustedDAYS = equation;
End;

Else if sex EQ 'male' then do;
   ...
   ...
end;
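A runnable sketch of IF – THEN / ELSE with a DO group in each branch. The adjustment equations are not given here, so the hypothetical variables SexCode and Group stand in for them; they only show that every statement in a DO group runs for its subgroup.

```sas
data pigs2;
   input Pig $ sex $;
   length Group $ 6;
   if sex = 'female' then do;
      SexCode = 2;          /* placeholder for the female adjustments */
      Group   = 'Female';
   end;
   else if sex eq 'male' then do;
      SexCode = 1;          /* placeholder for the male adjustments */
      Group   = 'Male';
   end;
   datalines;
A female
B male
;
run;
```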

Using IF – THEN /ELSE Statements

- Example:

  IF CowBC = . THEN AdjustedCowBC = .;
  ELSE IF CowBC ge 9 THEN AdjustedCowBC = 5;
  ELSE IF CowBC ge 7 and CowBC lt 9 THEN AdjustedCowBC = 4;
  ELSE IF CowBC ge 5 and CowBC lt 7 THEN AdjustedCowBC = 3;
  ELSE IF CowBC ge 3 and CowBC lt 5 THEN AdjustedCowBC = 2;
  ELSE IF CowBC ge 1 and CowBC lt 3 THEN AdjustedCowBC = 1;
  ELSE AdjustedCowBC = .;

Using IF – THEN /ELSE Statements

- When a condition is true, SAS assigns the stated value to AdjustedCowBC and skips the remaining ELSE IF statements for that observation. The last ELSE is a trash bin: anything not covered by the previous conditions is set to missing.

- Look at what gets assigned by the last ELSE statement, as it may identify an error in the dataset or other problems.

- Make sure the conditions cover all possibilities; otherwise you might get missing values or hidden errors.

- Examine the log: it reveals the number of missing values created, but there is no indication of observations that were not covered by your programming!
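Because the log does not flag uncovered observations, one option is to make the trash-bin ELSE report them itself. A sketch, assuming made-up cow records; the PUT message text is our own choice.

```sas
data cows;
   input Cow $ CowBC;
   if CowBC = . then AdjustedCowBC = .;
   else if CowBC ge 9 then AdjustedCowBC = 5;
   else if CowBC ge 7 then AdjustedCowBC = 4;
   else if CowBC ge 5 then AdjustedCowBC = 3;
   else if CowBC ge 3 then AdjustedCowBC = 2;
   else if CowBC ge 1 then AdjustedCowBC = 1;
   else do;                     /* trash bin: no condition above matched */
      AdjustedCowBC = .;
      put 'CHECK: uncovered CowBC value ' Cow= CowBC=;
   end;
   datalines;
C1 8
C2 5.5
C3 0
C4 .
;
run;
```

The upper bounds ("and CowBC lt 9") are dropped here because in an ELSE IF chain a score of 9 or more was already handled by the earlier test. Cow C3 (score 0) reaches the trash bin and is written to the log.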

Subsetting your data

- Sometimes researchers or programmers want to look at only a portion of the data that is collected.

- This can be accomplished using the IF statement.

- Example: only interested in gilts in a dataset that also includes data from boars and barrows.
  - IF sex = 'barrow' THEN DELETE;
  - IF sex = 'boar' THEN DELETE;

- Another way to use the IF statement

- Rather than making it a deletion statement, make it an inclusionary (subsetting) statement

- Example: only interested in gilts in a dataset that also includes data from boars and barrows.
  - IF sex = 'gilt'; this keeps only the observations where sex = 'gilt'

Subsetting your data

What is the difference between these two ways of looking at only gilts?

IF sex = 'gilt';  keeps only the observations where sex = 'gilt'

IF sex = 'barrow' THEN DELETE; IF sex = 'boar' THEN DELETE;  the dataset would still include anything coded incorrectly (any value other than 'barrow' or 'boar')

Using the inclusionary statement (IF sex = 'gilt';) requires that every line of data be examined for consistent coding, because only values coded exactly 'gilt' are kept.
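A sketch comparing the two approaches on made-up records that include two miscoded gilts ('Gilt' and 'gilr'). SAS comparisons of character values are case-sensitive, so 'Gilt' does not equal 'gilt'.

```sas
data pigs;
   input Pig $ sex $;
   datalines;
A gilt
B barrow
C boar
D Gilt
E gilr
;
run;

data gilts_keep;        /* subsetting IF: only exact 'gilt' survives */
   set pigs;
   if sex = 'gilt';
run;

data gilts_delete;      /* delete approach: anything not 'barrow' or 'boar' survives */
   set pigs;
   if sex = 'barrow' then delete;
   if sex = 'boar' then delete;
run;
```

gilts_keep ends up with one observation (A); gilts_delete keeps three (A, D, E), including both miscoded records.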

The RETAIN Statements

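- Use the RETAIN statement when you want SAS to keep the value of some or all variables from the previous iteration of the DATA step (that is, carry a value from one observation to the next) instead of resetting it to missing.
  - The RETAIN variable-list; statement can appear anywhere in the DATA step.
  - You can specify an initial value instead of missing for the variables as follows: RETAIN variable-list initial-value;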
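A sketch of RETAIN carrying a value forward, assuming made-up weights with one missing record (LastWt is a hypothetical variable). Without RETAIN, LastWt would be reset to missing at the start of each iteration, so pig B would get a missing value.

```sas
data weights;
   input Pig $ Weight;
   retain LastWt;                          /* not reset to missing each iteration */
   if not missing(Weight) then LastWt = Weight;
   datalines;
A 250
B .
C 240
;
run;
```

LastWt is 250, 250, 240 for pigs A, B, C.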

The Sum Statements

- The sum statement is used when you want a cumulative (running) total for some variable. Its form is variable + expression; the variable is automatically retained, starts at 0, and missing values of the expression are ignored.

Example used in the book:

RETAIN MaxRuns;
MaxRuns = MAX(MaxRuns, Runs);
RunsToDate + Runs;

You might use something similar when totaling milk production across the days in lactation or pigs born alive across parities.
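A sketch of the book's pattern applied to milk records, assuming made-up daily yields for one cow (MaxMilk and TotalMilk are hypothetical names). MAX ignores the missing starting value, and the sum statement starts at 0.

```sas
data milk;
   input Cow $ Day Milk;
   retain MaxMilk;                  /* keep the running maximum across rows */
   MaxMilk = max(MaxMilk, Milk);
   TotalMilk + Milk;                /* sum statement: retained, starts at 0 */
   datalines;
C1 1 20
C1 2 25
C1 3 22
;
run;
```

TotalMilk is 20, 45, 67 and MaxMilk is 20, 25, 25. With several cows in the data, the totals would keep accumulating across cows unless they are reset for each cow.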

Using Arrays in SAS

- An array is a temporary holding site for a collection of variables upon which the same operations will be performed.
  - Arrays provide convenient shortcuts in programming.
  - An array is a user-defined group of variables.
  - The array is defined in the DATA step.
  - All the variables in an array must be either character or numeric; you CANNOT mix character and numeric variables in the same array.

Using Arrays in SAS

- ARRAY name (n) $ variable-list;  (include the $ only when the variables are character)
  - The number (n) must match the number of variables in the list.
  - An array by itself does nothing.
  - You create the array to perform some function on all of the array variables (the book's example changes the missing-value code 9 to a SAS missing value, .).
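A sketch in the spirit of the book's example, assuming a made-up survey where 9 was used to code a missing answer (the names survey, quest, q1 to q3 are hypothetical). The iterative DO loop steps through the array so one IF statement handles all three variables.

```sas
data survey;
   input id q1 q2 q3;
   array quest (3) q1 q2 q3;        /* (3) matches the three variables */
   do i = 1 to 3;
      if quest(i) = 9 then quest(i) = .;   /* 9 was the missing-value code */
   end;
   drop i;                          /* loop counter not needed in the output */
   datalines;
1 4 9 2
2 9 9 5
;
run;
```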

Using Shortcuts for Lists of Variable Names

- Provides a shortcut for listing variables that have very similar names
  - Variable1, Variable2, Variable3, Variable4, and so forth
  - You could use an INPUT statement that includes all of the names:
    INPUT Variable1 Variable2 Variable3 Variable4;
  - Alternatively, you could write it as
    INPUT Variable1 - Variable4;
    and all the variables will have been read in.
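A sketch of a numbered range list, assuming hypothetical litter sizes for four parities. The same shortcut also works inside a function when preceded by OF.

```sas
data litters;
   input Sow $ Parity1-Parity4;        /* same as Parity1 Parity2 Parity3 Parity4 */
   Total = sum(of Parity1-Parity4);    /* OF passes the whole list to SUM */
   datalines;
S1 10 12 11 13
;
run;
```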