course1.winona.educourse1.winona.edu/.../dsci325/Handout5_SP18_v2.docx  · Web view... that simply...

28

Click here to load reader

Transcript of course1.winona.educourse1.winona.edu/.../dsci325/Handout5_SP18_v2.docx  · Web view... that simply...

Page 1: course1.winona.educourse1.winona.edu/.../dsci325/Handout5_SP18_v2.docx  · Web view... that simply indicate whether or not the Town starts ... make sure you understand how SAS deals

DSCI 325: Handout 5 – Conditional Statements in SAS

IF STATEMENT IN SAS

We have already worked briefly with IF-THEN statements in SAS. Here, we will discuss using IF-THEN statements in a DATA step to assign a value to a variable using one (or more) condition(s). Note that an IF-THEN statement takes on the following general form.

IF condition THEN action; [action is done only when condition is TRUE]

For example, consider once again the Grades data set. Suppose that the professor uses the following statements to calculate a final percentage grade for each student and to determine whether they pass or fail the course. Identify the condition and the action in each of the IF-THEN statements in the following code.

DATA Grades2; SET Grades;

TotalQuiz = SUM(Quiz1,Quiz2,Quiz3,Quiz4,Quiz5,Quiz6,Quiz7,Quiz8,Quiz9,Quiz10,Quiz11,Quiz12);TotalExam = SUM(Exam1,Exam2,Exam3);FinalPercent = SUM(TotalQuiz,TotalExam,EC,Final)/640;IF FinalPercent >= .70 THEN Status = 'Pass'; IF FinalPercent < .70 THEN Status = 'Fail';

RUN;

PROC PRINT DATA=Grades2 (OBS=5);VAR FirstName LastName FinalPercent Status;

RUN;

Note that for each observation in the data set, when SAS sees the keyword IF it evaluates the condition to determine whether it is true or false. If the condition is true, SAS takes the action that follows the keyword THEN. If the condition is false, SAS ignores the THEN clause and simply proceeds to the next statement in the DATA step.

A portion of the result of the PROC PRINT is shown below:

The condition in an IF-THEN statement typically involves the use of a comparison operator (such as “greater than or equal to”). SAS uses the following syntax for the standard comparison operators. Note

1

Page 2: course1.winona.educourse1.winona.edu/.../dsci325/Handout5_SP18_v2.docx  · Web view... that simply indicate whether or not the Town starts ... make sure you understand how SAS deals

that you can use either syntax when programming.

ComparisonSAS syntax using

symbolsSAS syntax using Mnemonic

EquivalentLess than < LTGreater than > GTLess than or equal to <= LEGreater than or equal to >= GEEqual to = EQNot equal to ^= or ~= NEEqual to one of a list IN( )

Change the IF-THEN statements in the above code to the following and verify that they work.

IF FinalPercent GE .70 THEN Status = 'Pass'; IF FinalPercent LT .70 THEN Status = 'Fail';

USING DO and END STATEMENTS

The IF-THEN statements discussed above only allows a single action to take place. If multiple actions need to take place if the condition is true, then you should use DO and END statements.

Typical IF Statement(single action)

IF Statement using DO and END(multiple actions)

IF condition THEN action; IF condition THEN DO; action 1; action 2; … END;

Goal: All towns in this dataset start with either an A or B. Create a new variable called Town and then create two binary variables, say ATown and BTown, that simply indicate whether or not the Town starts with an A or a B.

2

Page 3: course1.winona.educourse1.winona.edu/.../dsci325/Handout5_SP18_v2.docx  · Web view... that simply indicate whether or not the Town starts ... make sure you understand how SAS deals

Consider the following code in an attempt to accomplish the task above.

DATA Grades3; SET Grades; ParenLocation = INDEX(ZipTownAreaCode,'('); Town = SUBSTR(ZipTownAreaCode,7,ParenLocation-8);

IF SUBSTR(Town,1,1) EQ 'A' THEN DO; ATown=1; BTown=0; END;

RUN;

Discuss in detail what the IF statement should do.

Consider the print-out from the above code. Is this what you expected?

3

Page 4: course1.winona.educourse1.winona.edu/.../dsci325/Handout5_SP18_v2.docx  · Web view... that simply indicate whether or not the Town starts ... make sure you understand how SAS deals

Task: This dataset contains only towns that start with an A or B. Modify the code so that all B towns are given a value of 1 for BTown and 0 for ATown (instead of missing as was the case using the code above).

DATA Grades3; SET Grades; ParenLocation = INDEX(ZipTownAreaCode,'('); Town = SUBSTR(ZipTownAreaCode,7,ParenLocation-8);

IF SUBSTR(Town,1,1) EQ 'A' THEN DO; ATown=1; BTown=0; END;

RUN;

4

Page 5: course1.winona.educourse1.winona.edu/.../dsci325/Handout5_SP18_v2.docx  · Web view... that simply indicate whether or not the Town starts ... make sure you understand how SAS deals

IF-THEN-ELSE STATEMENTS

Next, note that the IF-THEN statement tells SAS what to do when the condition is true. To tell SAS what action to take when the condition is false, you could utilize an ELSE statement. For example, we could have also created the variable Status as follows.

IF condition THEN action1; [action1 is done only when condition is TRUE]ELSE action2; [action2 is done otherwise]

DATA Grades4; SET Grades;

TotalQuiz = SUM(Quiz1,Quiz2,Quiz3,Quiz4,Quiz5,Quiz6,Quiz7,Quiz8,Quiz9,Quiz10,Quiz11,Quiz12);TotalExam = SUM(Exam1,Exam2,Exam3);FinalPercent = SUM(TotalQuiz,TotalExam,EC,Final)/640;IF FinalPercent >= .70 THEN Status = 'Pass'; ELSE Status = 'Fail';

RUN;

PROC PRINT Data=Grades4 (OBS=5);VAR FirstName LastName FinalPercent Status;

RUN;

Task: Recall the following example from above. Use an ELSE statement to modify the code so that all B towns are given a value of 1 for BTown and 0 for ATown.

DATA Grades5; SET Grades; ParenLocation = INDEX(ZipTownAreaCode,'('); Town = SUBSTR(ZipTownAreaCode,7,ParenLocation-8);

IF SUBSTR(Town,1,1) EQ 'A' THEN DO; ATown=1; BTown=0; END;

RUN;5

Page 6: course1.winona.educourse1.winona.edu/.../dsci325/Handout5_SP18_v2.docx  · Web view... that simply indicate whether or not the Town starts ... make sure you understand how SAS deals

MISSING DATA ISSUES

Suppose that one of the students was ill and was unable to take the final exam as scheduled. In the data file GradesMissing.csv, this student’s final exam score is a missing value. Read this file into your personal library and run code similar to the following. What happens?

DATA Grades6; SET GradesMissing;

TotalQuiz = SUM(Quiz1,Quiz2,Quiz3,Quiz4,Quiz5,Quiz6,Quiz7,Quiz8,Quiz9,Quiz10,Quiz11,Quiz12);TotalExam = SUM(Exam1,Exam2,Exam3);FinalPercent = SUM(TotalQuiz,TotalExam,EC,Final)/640;IF FinalPercent >= .70 THEN Status = 'Pass'; ELSE Status = 'Fail';

RUN;

PROC PRINT Data=Grades6 (OBS=5);VAR FirstName LastName FinalPercent Status;

RUN;

Note: The SUM function returns the sum of all of the non-missing arguments, so SAS still calculates a value for FinalPercent. Alternatively, consider calculating FinalPercent as follows.

DATA Grades6; SET GradesMissing;

TotalQuiz = SUM(Quiz1,Quiz2,Quiz3,Quiz4,Quiz5,Quiz6,Quiz7,Quiz8,Quiz9,Quiz10,Quiz11,Quiz12);TotalExam = SUM(Exam1,Exam2,Exam3);FinalPercent = (TotalQuiz + TotalExam + EC + Final)/640;IF FinalPercent >= .70 THEN Status = 'Pass'; ELSE Status = 'Fail';

RUN;

Note: The ‘+’ operator returns a missing value if any of the arguments are missing.

6

Page 7: course1.winona.educourse1.winona.edu/.../dsci325/Handout5_SP18_v2.docx  · Web view... that simply indicate whether or not the Town starts ... make sure you understand how SAS deals

If missing values are present in your data, then your code should specify actions for missing values. Consider the following code.

1234567891011

DATA Grades6; SET GradesMissing; TotalQuiz = SUM(Quiz1,Quiz2,Quiz3,Quiz4,Quiz5,Quiz6,Quiz7,Quiz8,Quiz9,Quiz10,Quiz11,Quiz12); TotalExam = SUM(Exam1,Exam2,Exam3);

FinalPercent = (TotalQuiz + TotalExam + EC + Final)/640;

IF FinalPercent =. THEN Status = ' '; IF FinalPercent >= .70 THEN Status = 'Pass'; ELSE Status = 'Fail';

RUN;

Questions:

1. Run this program. Is this the result you expected?

2. Why did you get an ‘F’ for Obs=3? Also, provide a possible explanation as to why ‘Pass’ was shortened to ‘P’ and ‘Fail’ shortened to ‘F’.

3. Put the following on Line 6 in the code above and re-run your program. Note: Missingness for a string variable in SAS must be an empty string and not a period; thus, the empty quotes in Line 7.

FORMAT Status $4.;

What is the purpose of this code? Explain.

4. Next, move the code provided on Line 7 to Line 10. This provides the desired result for Obs=3, why? What the value of Status is after Line 9 in your code. What is the value of Status after Line 10 in your code? Finally, Line 6 (the FORMAT Status $4.;) is not necessary when Line 7 is moved to Line 10, why?

7

Page 8: course1.winona.educourse1.winona.edu/.../dsci325/Handout5_SP18_v2.docx  · Web view... that simply indicate whether or not the Town starts ... make sure you understand how SAS deals

LOGICAL OPERATORS IN IF-THEN STATEMENTS

In addition to comparison operators, we can also use logical operators when writing a condition for an IF-THEN statement.

OperationSAS syntax using

symbolsSAS syntax using Mnemonic

EquivalentAll conditions must be true & AND

At least one condition must be true | or ! ORReverse the logic of a condition ^ or ~ NOT

For example, we could assign grades to the students in the file Grades.csv as follows.

DATA Grades7; SET Grades; TotalQuiz = SUM(Quiz1,Quiz2,Quiz3,Quiz4,Quiz5,Quiz6,Quiz7,Quiz8,Quiz9,Quiz10,Quiz11,Quiz12);

TotalExam = SUM(Exam1,Exam2,Exam3);FinalPercent = SUM(TotalQuiz,TotalExam,EC,Final)/640;

IF (FinalPercent >= 0.90) THEN Grade='A'; IF (FinalPercent >= 0.80) AND (FinalPercent < 0.90) THEN Grade='B'; IF (FinalPercent >= 0.70) AND (FinalPercent < 0.80) THEN Grade='C'; IF (FinalPercent >= 0.60) AND (FinalPercent < 0.70) THEN Grade='D'; IF (FinalPercent < 0.60) THEN Grade='F';

RUN;

PROC PRINT Data=Grades7 (OBS=5); VAR FirstName LastName Final FinalPercent Grade;RUN;

Comment: Though it is not necessary, it is good practice to write programs that are easy to read (the earlier you develop this habit, the better off you’ll be). For example, in the above code, the conditions and actions have been aligned, and optional parentheses have been used above to offset the conditions.

Make your code readable…

8

Page 9: course1.winona.educourse1.winona.edu/.../dsci325/Handout5_SP18_v2.docx  · Web view... that simply indicate whether or not the Town starts ... make sure you understand how SAS deals

Task: Now, suppose the instructor wants to give bonus points to students who show some sign of improvement from the beginning of the course to the end of the course. Suppose that two percentage points will be added to a student's final percentage grade if either their Exam 1 grade is less than their Exam 3 and Final grade or their Exam 2 grade is less than their Exam 3 and Final grade. Call this new variable FinalPercent_Adjusted. Finish the following SAS code to do this, and verify that your program works.

DATA Grades7; SET Grades; TotalQuiz = SUM(Quiz1,Quiz2,Quiz3,Quiz4,Quiz5,Quiz6,Quiz7,Quiz8,Quiz9,Quiz10,Quiz11,Quiz12);

TotalExam = SUM(Exam1,Exam2,Exam3);FinalPercent = SUM(TotalQuiz,TotalExam,EC,Final)/640;

RUN;

9

Page 10: course1.winona.educourse1.winona.edu/.../dsci325/Handout5_SP18_v2.docx  · Web view... that simply indicate whether or not the Town starts ... make sure you understand how SAS deals

A Word of Caution Regarding IF-THEN Statements

Be careful when using multiple IF-THEN statements when creating a new variable. For example, suppose we tried to assign grades to the students in the file GradeMissing.csv as follows.

DATA Grades8; SET GradesMissing; TotalQuiz = SUM(Quiz1,Quiz2,Quiz3,Quiz4,Quiz5,Quiz6,Quiz7,Quiz8,Quiz9,Quiz10,Quiz11,Quiz12);

TotalExam = SUM(Exam1,Exam2,Exam3);FinalPercent = (TotalQuiz + TotalExam + EC + Final)/640;

FORMAT Grade $10.;

IF FinalPercent=. THEN Grade='Incomplete';IF FinalPercent >= 0.90 THEN Grade='A';

IF FinalPercent >= 0.80 AND FinalPercent < 0.90 THEN Grade='B'; IF FinalPercent >= 0.70 AND FinalPercent < 0.80 THEN Grade='C'; IF FinalPercent >= 0.60 AND FinalPercent < 0.70 THEN Grade='D'; IF FinalPercent < 0.60 THEN Grade='F';RUN;

PROC PRINT Data=Grades8 (OBS=5);VAR FirstName LastName Final FinalPercent Grade;

RUN;

The following is returned by SAS.

Questions: Is this the result you expected? What is the problem? What is SAS doing to you? Consider a re-ordering of the IF statements to fix this problem.

Comments: SAS is evaluating the missing value as being less than 0.60 and thus assigning a Grade of ‘F’. SAS evaluates lines from top down, thus, SAS evaluates each IF statement and the value for

Grade is overwritten each time a condition is met. Any change in the order of the statements may affect the results.

10

Page 11: course1.winona.educourse1.winona.edu/.../dsci325/Handout5_SP18_v2.docx  · Web view... that simply indicate whether or not the Town starts ... make sure you understand how SAS deals

USING IF-ELSE-IF STATEMENTS

A better approach to programming in the case above would be to use IF-THEN-ELSE statements. The general form is as follows:

IF condition1 THEN action1; [action1 is done only when condition1 is TRUE]ELSE IF condition2 THEN action2; [action2 is done only when condition 2 is TRUE]ELSE IF condition3 THEN action3; [action3 is done only when condition 3 is TRUE]

When using IF-ELSE-IF statements, once a condition is met, SAS stops checking the other IF-THEN statements. This not only prevents you from writing over values but is also more efficient.

Consider the following use of the IF-ELSE-IF statements.

DATA Grades9; SET GradesMissing; TotalQuiz = SUM(Quiz1,Quiz2,Quiz3,Quiz4,Quiz5,Quiz6,Quiz7,Quiz8,Quiz9,Quiz10,Quiz11,Quiz12);

TotalExam = SUM(Exam1,Exam2,Exam3);FinalPercent = (TotalQuiz + TotalExam + EC + Final)/640;

IF FinalPercent=. THEN Grade='Incomplete'; ELSE IF FinalPercent >= 0.90 THEN Grade='A';

ELSE IF FinalPercent >= 0.80 THEN Grade='B'; ELSE IF FinalPercent >= 0.70 THEN Grade='C'; ELSE IF FinalPercent >= 0.60 THEN Grade='D'; ELSE IF FinalPercent < 0.60 THEN Grade='F';RUN;

PROC PRINT Data=Grades9 (OBS=5);VAR FirstName LastName Final FinalPercent Grade;

RUN;

A portion of the results is shown below.

11

Page 12: course1.winona.educourse1.winona.edu/.../dsci325/Handout5_SP18_v2.docx  · Web view... that simply indicate whether or not the Town starts ... make sure you understand how SAS deals

In some situations, it will be more efficient to use an ELSE statement instead of an entire sequence of ELSE-IF statements for all possible conditions. That is, the last ELSE statement in a series may simply specify an action that is automatically executed for all observations that fail to satisfy any of the previous IF-THEN statements.

IF condition1 THEN action1; [action1 is done only when condition1 is TRUE]ELSE IF condition2 THEN action2; [action2 is done only when condition 2 is TRUE]ELSE action3; [action3 is done when all previous conditions are FALSE]

For example, we could have modified the above program as follows.

DATA Grades9; SET Grades_missing; TotalQuiz = SUM(Quiz1,Quiz2,Quiz3,Quiz4,Quiz5,Quiz6,Quiz7,Quiz8,Quiz9,Quiz10,Quiz11,Quiz12);

TotalExam = SUM(Exam1,Exam2,Exam3);FinalPercent = (TotalQuiz + TotalExam + EC + Final)/640;

IF FinalPercent=. THEN Grade='Incomplete'; ELSE IF FinalPercent >= 0.90 THEN Grade='A';

ELSE IF FinalPercent >= 0.80 THEN Grade='B'; ELSE IF FinalPercent >= 0.70 THEN Grade='C'; ELSE IF FinalPercent >= 0.60 THEN Grade='D'; ELSE Grade='F';RUN;

PROC PRINT Data=Grades9 (OBS=5);VAR FirstName LastName Final FinalPercent Grade;

RUN;

12

Page 13: course1.winona.educourse1.winona.edu/.../dsci325/Handout5_SP18_v2.docx  · Web view... that simply indicate whether or not the Town starts ... make sure you understand how SAS deals

Task: Consider the following SAS code.

DATA Test; INPUT ID Gender $ Score; DATALINES; 1 male 21 2 male 34 3 male 54 4 male 76 5 female 44 6 female 21 7 female 22 8 female 17;

DATA Test2; SET Test; IF Score < 30 THEN Group = 1; IF Gender = 'male' then Group = 2;RUN;

PROC PRINT DATA=Test2; RUN;

Predict the value of Group for each observation and write it in the table below. Then, run the program to verify your answer.

13

Page 14: course1.winona.educourse1.winona.edu/.../dsci325/Handout5_SP18_v2.docx  · Web view... that simply indicate whether or not the Town starts ... make sure you understand how SAS deals

Now, suppose the code is changed to the following:

DATA Test2; SET Test; IF Score < 30 THEN Group = 1; ELSE IF Gender = 'male' then Group = 2;RUN;

PROC PRINT DATA=test2; RUN;

Once again, predict the value of Group for each observation and write it in the table below. Then, run the program to verify your answer.

Finally, consider changing the code as follows and predict the output. Run the program in SAS to check your answer.

DATA Test2; SET Test; IF Gender = 'male' THEN Group = 2; ELSE IF Score < 30 then Group = 1;RUN;

PROC PRINT DATA=Test2; RUN;

To avoid errors in your programs, make sure you understand how SAS deals with a series of IF-THEN or IF-THEN-ELSE statements.

14

Page 15: course1.winona.educourse1.winona.edu/.../dsci325/Handout5_SP18_v2.docx  · Web view... that simply indicate whether or not the Town starts ... make sure you understand how SAS deals

DEALING WITH STRINGS

Consider again the Grades.csv data set. The following code is used to create a new variable named Ada. The code assigns a 1 to this variable if the town is Ada and a ‘ . ‘ if the town is not Ada.

DATA Grades2; SET Grades; ParenLocation = INDEX(ZipTownAreaCode,'('); Town = SUBSTR(ZipTownAreaCode,7,ParenLocation-8); IF Town = 'Ada' THEN Ada=1;RUN;

PROC PRINT DATA=Grades2 (OBS=3); VAR FirstName LastName ZipTownAreaCode ParenLocation Town Ada;RUN;

Comment #1: SAS is case-sensitive with regard to the string of text. For example, try to replace the above IF-THEN statement with the following: IF Town = 'ADA' THEN Ada=1;What happens?

Comment #2: This code does NOT work if you have leading or trailing spaces (which you cannot see) in your variable name. For example, in the following code, town contains a leading space, i.e. -7 instead of -8 in SUBSTR() function. Thus, in the first observation, Town is specified by _Ada, not Ada. As a result, the IF statement fails to recognize _Ada as Ada and thus does not assign Ada a value of 1.

DATA GradesBadTown; SET Grades; ParenLocation = INDEX(ZipTownAreaCode,'('); Town = SUBSTR(ZipTownAreaCode,6,ParenLocation-7); IF Town = 'Ada' THEN Ada=1;RUN;

PROC PRINT DATA=GradesBadTown (OBS=3); VAR FirstName LastName ZipTownAreaCode ParenLocation Town Ada;RUN;

15

Page 16: course1.winona.educourse1.winona.edu/.../dsci325/Handout5_SP18_v2.docx  · Web view... that simply indicate whether or not the Town starts ... make sure you understand how SAS deals

The following program verifies this.

DATA GradesBadTown2; SET Grades; ParenLocation = INDEX(ZipTownAreaCode,'('); Town = SUBSTR(ZipTownAreaCode,6,ParenLocation-7); IF Town = 'Ada' THEN Ada=1; TownLength = Length(Town); TownTest = '*'||Town; TownTestLength = Length(TownTest);RUN;

PROC PRINT DATA=GradesBadTown2 (OBS=3); VAR ZipTownAreaCode ParenLocation Town Ada TownLength TownTest TownTestLength;RUN;

You can remove any character (including a space) using the COMPRESS function in SAS. The first argument is the string with which you want to work; the second argument in the COMPRESS function is the character you’d like to remove.

DATA GradesBadTown3; SET Grades; ParenLocation = INDEX(ZipTownAreaCode,'('); Town = COMPRESS(SUBSTR(ZipTownAreaCode,6,ParenLocation-7),' '); IF Town = 'Ada' THEN Ada=1; TownLength = Length(Town); TownTest = '*'||Town; TownTestLength = Length(TownTest);RUN;

PROC PRINT DATA=GradesBadTown3 (OBS=12); VAR ZipTownAreaCode ParenLocation Town Ada TownLength TownTest TownTestLength;RUN;

16

Page 17: course1.winona.educourse1.winona.edu/.../dsci325/Handout5_SP18_v2.docx  · Web view... that simply indicate whether or not the Town starts ... make sure you understand how SAS deals

Note that the COMPRESS function removes ALL spaces, which is not desirable here because now any town with two words has its space removed (e.g., Albert_Lea is now AlbertLea). A better alternative is to use the STRIP function in SAS, which removes leading and trailing blanks from a string.

DATA GradesBadTown4; SET Grades; ParenLocation = INDEX(ZipTownAreaCode,'('); Town = STRIP(SUBSTR(ZipTownAreaCode,6,ParenLocation-7)); IF Town = 'Ada' THEN Ada=1; TownLength = Length(Town); TownTest = '*'||Town; TownTestLength = Length(TownTest);RUN;

PROC PRINT DATA=GradesBadTown4 (OBS=12); VAR ZipTownAreaCode ParenLocation Town Ada TownLength TownTest TownTestLength;RUN;

17

Page 18: course1.winona.educourse1.winona.edu/.../dsci325/Handout5_SP18_v2.docx  · Web view... that simply indicate whether or not the Town starts ... make sure you understand how SAS deals

Tasks:

1. Suppose we’d like to create a variable named NotAda which takes on the value 1 for all towns NOT EQUAL to Ada and 0 otherwise. Write a program to do this.

2. Write a program which creates a variable named ATown which takes on the value 1 if the town is either Ada, Adams, or Adolph and 0 otherwise.

3. SAS has an IN operator that can be used to accomplish the previous task. The IN operator allows you to check a condition using a list. Verify that the following code produces the same output as your code for Task 2.

DATA Grades2; SET Grades; ParenLocation = INDEX(ZipTownAreaCode,'('); Town = SUBSTR(ZipTownAreaCode,7,ParenLocation-8); IF Town IN ('Ada','Adams', 'Adolph') THEN ATown=1;RUN;

4. Modify your code from Task 2 so that all towns that start with an ‘A’ are given a value of 1 for ATown and 0 otherwise.

18