
Controlling Input and Output

Outputting Multiple Observations

Writing to Multiple SAS Data Sets

Selecting Variables and Observations


Outputting Multiple Observations


A Forecasting Application

The growth rate of six departments is stored in the Increase variable in the data set orion.growth. If each department grows at its predicted rate for the next two years, how many employees will be in each department at the end of each year?

libname orion "/folders/myfolders/sasdata/prg2";

proc print data=orion.growth;
run;


The output SAS data set, forecast, should contain 12 observations. Two observations are written for each observation read.

The End of a DATA Step Iteration

data forecast;
   set orion.growth;
   total_employees=total_employees*(1+increase);
run;

At the bottom of the step, SAS performs an implicit OUTPUT followed by an implicit RETURN.

Explicit Output

The explicit OUTPUT statement writes the contents of the program data vector (PDV) to the data set or data sets being created. The presence of an explicit OUTPUT statement overrides implicit output.

data forecast;
   set orion.growth;
   Year=1;
   Total_Employees=Total_Employees*(1+Increase);
   output;
   Year=2;
   Total_Employees=Total_Employees*(1+Increase);
   output;
run;

Because the step contains an explicit OUTPUT statement, there is no implicit OUTPUT at the end of the iteration.
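To try the step without access to the orion library, the following sketch first builds a small stand-in table. The data set names growth_demo and forecast_demo are hypothetical, and the three rows are the ones the slides display from orion.growth (the real table holds six departments), so treat this as an illustration rather than the course's own program.

```sas
/* Hypothetical stand-in for orion.growth: only the three rows shown in the slides */
data growth_demo;
   length Department $ 40;
   input Department $ Total_Employees Increase;
   datalines;
Administration 34 0.25
Engineering 9 0.30
IS 25 0.10
;
run;

data forecast_demo;
   set growth_demo;
   Year=1;
   Total_Employees=Total_Employees*(1+Increase);
   output;   /* first observation for this department */
   Year=2;
   Total_Employees=Total_Employees*(1+Increase);
   output;   /* second observation for this department */
run;

/* Three observations are read, so six are written */
proc print data=forecast_demo noobs;
run;
```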

Compilation and Execution of the DATA Step

Compilation

data forecast;
   set orion.growth;
   Year=1;
   Total_Employees=Total_Employees*(1+Increase);
   output;
   Year=2;
   Total_Employees=Total_Employees*(1+Increase);
   output;
run;

PDV (program data vector) after the SET statement is compiled:

Department   Total_Employees   Increase
$ 40         N 8               N 8

PDV after the first assignment to Year is compiled:

Department   Total_Employees   Increase   Year
$ 40         N 8               N 8        N 8

At the end of compilation, SAS writes the descriptor portion of the output data set.

work.forecast, descriptor portion

Department   Total_Employees   Increase   Year
$ 40         N 8               N 8        N 8

Execution: Explicit Output

orion.growth (partial listing)

Department       Total_Employees   Increase
Administration   34                0.25
Engineering      9                 0.30
IS               25                0.10

PDV contents during the first iteration of the DATA step:

Step                                 Department       Total_Employees   Increase   Year
Initialize PDV                       (blank)          .                 .          .
set orion.growth;                    Administration   34                0.25       .
Year=1;                              Administration   34                0.25       1
First Total_Employees assignment     Administration   42.5              0.25       1
output;                              Administration   42.5              0.25       1
Year=2;                              Administration   42.5              0.25       2
Second Total_Employees assignment    Administration   53.125            0.25       2
output;                              Administration   53.125            0.25       2
run;                                 Administration   53.125            0.25       2

The first assignment computes 34 * (1 + 0.25) = 42.5, and the second computes 42.5 * (1 + 0.25) = 53.125. Each OUTPUT statement writes the current observation to forecast. At the RUN statement there is no implicit OUTPUT.

The forecast data set contains two observations after the first iteration of the DATA step.

work.forecast

Department       Total_Employees   Increase   Year
Administration   42.500            0.25       1
Administration   53.125            0.25       2

Prior to the second iteration of the DATA step, some variables in the program data vector are reinitialized.

PDV contents at the start of the second iteration:

Step                     Department       Total_Employees   Increase   Year
End of first iteration   Administration   53.125            0.25       2
Reinitialize PDV         Administration   53.125            0.25       .
set orion.growth;        Engineering      9                 0.30       .
Year=1;                  Engineering      9                 0.30       1

Year is created by an assignment statement, so it is reset to missing. Department, Total_Employees, and Increase are read by the SET statement, so their values are held until the SET statement reads the next observation.

Processing continues in this way until SAS reaches the end of the input data set (EOF).
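The two repeated assignment-and-output blocks can also be expressed with an iterative DO loop, which scales to more than two years. This is a sketch, not part of the original slides; growth_demo and forecast_loop are hypothetical names, and the stand-in data contains only the three rows displayed above.

```sas
/* Hypothetical stand-in for orion.growth */
data growth_demo;
   length Department $ 40;
   input Department $ Total_Employees Increase;
   datalines;
Administration 34 0.25
Engineering 9 0.30
IS 25 0.10
;
run;

data forecast_loop;
   set growth_demo;
   do Year=1 to 2;   /* change the upper bound to forecast more years */
      Total_Employees=Total_Employees*(1+Increase);
      output;        /* one observation per department per year */
   end;
run;

proc print data=forecast_loop noobs;
run;
```

The explicit OUTPUT sits inside the loop, so the step still writes two observations for each observation read and performs no implicit output.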

Modify the DATA step to write only one observation per department, showing the number of employees after two years.

data forecast;
   set orion.growth;
   Year=1;
   Total_Employees=Total_Employees*(1+Increase);
   Year=2;
   Total_Employees=Total_Employees*(1+Increase);
   output;
run;


Writing to Multiple SAS Data Sets


Use the orion.employee_addresses data set as input to create three new data sets: usa, australia, and other.

The usa data set will contain observations with a Country value of US.

The australia data set will contain observations with a Country value of AU.

Observations with any other Country value will be written to the other data set.

The Input Data Set

libname orion "/folders/myfolders/sasdata/prg2";

proc print data=orion.employee_addresses;
   var employee_name city country;
run;

Creating Multiple SAS Data Sets

Multiple data sets can be created in a DATA step by listing the names of the output data sets in the DATA statement.

DATA <SAS-data-set-1 … SAS-data-set-n>;

You can direct output to a specific data set or data sets by listing the data set names in the OUTPUT statement.

OUTPUT <SAS-data-set-1 … SAS-data-set-n>;

An OUTPUT statement without arguments writes to every SAS data set listed in the DATA statement.
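The difference between a named OUTPUT and an OUTPUT without arguments can be seen in a small sketch. All data set and variable names here (nums_demo, small_demo, large_demo, Value) are hypothetical and chosen only for illustration.

```sas
/* Four observations with Value = 1, 2, 3, 4 */
data nums_demo;
   input Value;
   datalines;
1
2
3
4
;
run;

data small_demo large_demo;
   set nums_demo;
   if Value=4 then output;                  /* no arguments: written to BOTH data sets */
   else if Value<3 then output small_demo;  /* 1 and 2 go to small_demo only */
   else output large_demo;                  /* 3 goes to large_demo only */
run;
```

After the step runs, small_demo holds the values 1, 2, and 4, and large_demo holds 3 and 4.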

Creating Multiple SAS Data Sets

libname orion "/folders/myfolders/sasdata/prg2";

data usa australia other;
   set orion.employee_addresses;
   if Country='AU' then output australia;
   else if Country='US' then output usa;
   else output other;
run;

Check the SAS log.

Efficient Conditional Processing

• It is more efficient to check values in order of decreasing frequency.

data usa australia other;
   set orion.employee_addresses;
   if Country='US' then output usa;
   else if Country='AU' then output australia;
   else output other;
run;

NOTE: There were 424 observations read from the data set ORION.EMPLOYEE_ADDRESSES.
NOTE: The data set WORK.USA has 311 observations and 9 variables.
NOTE: The data set WORK.AUSTRALIA has 105 observations and 9 variables.
NOTE: The data set WORK.OTHER has 8 observations and 9 variables.
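One way to find the frequency order before writing the IF/ELSE chain is a PROC FREQ step with ORDER=FREQ, which lists the most common values first. This sketch is an addition to the slides and assumes the orion libref has been assigned as shown earlier.

```sas
/* List Country values from most frequent to least frequent */
proc freq data=orion.employee_addresses order=freq;
   tables Country / nocum;
run;
```

The first rows of the resulting table are the values worth testing first in the conditional logic.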

Displaying Multiple SAS Data Sets

The PRINT procedure can print only one data set per step. A separate PROC PRINT step is required for each data set.

title 'Employees in the United States';
proc print data=usa;
run;

title 'Employees in Australia';
proc print data=australia;
run;

title 'Non US and AU Employees';
proc print data=other;
run;

title;

Using a SELECT Group

An alternate form of conditional execution uses a SELECT group.

SELECT <(select-expression)>;
   WHEN-1 (value-1 <…,value-n>)
      statement;
   <…WHEN-n (value-1 <…,value-n>)
      statement;>
   <OTHERWISE statement;>
END;

The select-expression specifies any SAS expression that evaluates to a single value. Often a variable name is used as the select-expression.

Using a SELECT Group

The previous task could be rewritten using a SELECT group:

data usa australia other;
   set orion.employee_addresses;
   select (Country);
      when ('US') output usa;
      when ('AU') output australia;
      otherwise output other;
   end;
run;

SELECT processes the WHEN statements from top to bottom, so it is more efficient to check the values in order of decreasing frequency.

The OTHERWISE Statement

The OTHERWISE statement is optional, but omitting it will result in an error when all WHEN conditions are false.

SELECT <(select-expression)>;
   WHEN-1 (value-1 <…,value-n>)
      statement;
   <…WHEN-n (value-1 <…,value-n>)
      statement;>
   <OTHERWISE statement;>
END;

Use OTHERWISE followed by a null statement to prevent SAS from issuing an error message:

otherwise;

Without an OTHERWISE statement, the following step fails with an error as soon as it reads an observation whose Country value is neither US nor AU:

data usa australia;
   set orion.employee_addresses;
   select (Country);
      when ('US') output usa;
      when ('AU') output australia;
   end;
run;

An OTHERWISE statement is required. Adding a null OTHERWISE statement lets the step run:

data usa australia;
   set orion.employee_addresses;
   select (Country);
      when ('US') output usa;
      when ('AU') output australia;
      otherwise;
   end;
run;

Test for Multiple Values in a WHEN Statement

data usa australia other;
   set orion.employee_addresses;
   select (Country);
      when ('US','us') output usa;
      when ('AU','au') output australia;
      otherwise output other;
   end;
run;

Using Functions in a Select Expression

data usa australia other;
   set orion.employee_addresses;
   select (upcase(Country));
      when ('US') output usa;
      when ('AU') output australia;
      otherwise output other;
   end;
run;

Omitting the Select Expression

The select-expression can be omitted in a SELECT group:

SELECT;
   WHEN (expression-1) statement;
   <…WHEN (expression-n) statement;>
   <OTHERWISE statement;>
END;

Each WHEN expression evaluates to true or false. If true, the associated statement(s) are executed. If false, SAS proceeds to the next WHEN statement. If all WHEN expressions are false, then the statement(s) following the OTHERWISE statement will execute.

Omitting the Select Expression

data usa australia other;
   set orion.employee_addresses;
   select;
      when (country='US') output usa;
      when (country='AU') output australia;
      otherwise output other;
   end;
run;

Using DO-END in a SELECT Group

data usa australia other;
   set orion.employee_addresses;
   select (country);
      when ('US') do;
         Benefits=1;
         output usa;
      end;
      when ('AU') do;
         Benefits=2;
         output australia;
      end;
      otherwise do;
         Benefits=0;
         output other;
      end;
   end;
run;


Selecting Variables and Observations


Create three data sets that are subsets of orion.employee_addresses based on the value of the Country variable.

Name the data sets usa, australia, and other, and write different variables to each output data set.


Controlling Variable Output (Review)

By default, the SAS System writes all variables from the input data set to every output data set.

In the DATA step, the DROP and KEEP statements can be used to control which variables are written to output data sets.

The DROP and KEEP Statements (Review)

[Diagram: a raw data file or an input SAS data set is read into the PDV. DROP and KEEP statements take effect when the PDV is written to the output SAS data set.]

Display Information About the Variables

libname orion "/folders/myfolders/sasdata/prg2";

proc contents data=orion.employee_addresses;
run;

The DROP Statement

Use a DROP statement to drop the variable Street_ID.

data usa australia other;
   drop Street_ID;
   set orion.employee_addresses;
   if Country='US' then output usa;
   else if Country='AU' then output australia;
   else output other;
run;

Check the SAS Log

• Street_ID was dropped from every output data set. Each output data set has eight variables.


Controlling Variable Output

What if you want to drop Street_ID and Country from usa, drop Street_ID, Country, and State from australia, and keep all variables in other?


The DROP and KEEP statements affect every data set named on the DATA statement.

We need to use the DROP= or KEEP= data set option for more flexibility.

The DROP= Data Set Option

The DROP= data set option can be used to exclude variables from an output data set.

General form of the DROP= data set option:

SAS-data-set(DROP=variable-1 <variable-2 …variable-n>)

The specified variables are not written to the output data set; however, all variables are available for processing.

Using the DROP= Data Set Option

data usa(drop=Street_ID Country)
     australia(drop=Street_ID State Country)
     other;
   set orion.employee_addresses;
   if Country='US' then output usa;
   else if Country='AU' then output australia;
   else output other;
run;

The KEEP= Data Set Option

The KEEP= data set option can be used to specify which variables to write to an output data set.

SAS-data-set(KEEP=variable-1 <variable-2 …variable-n>)

Only the specified variables are written to the output data set; however, all variables are available for processing.

Using the DROP= and KEEP= Options

data usa(keep=Employee_Name City State)
     australia(drop=Street_ID State)
     other;
   set orion.employee_addresses;
   if Country='US' then output usa;
   else if Country='AU' then output australia;
   else output other;
run;

The data set orion.employee_addresses contains nine variables. How many variables will be in the usa, australia, and other data sets?

data usa(keep=Employee_Name City State Country)
     australia(drop=Street_ID State Country)
     other;
   set orion.employee_addresses;
   if Country='US' then output usa;
   else if Country='AU' then output australia;
   else output other;
run;
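Working from the nine input variables, usa keeps four, australia drops three and so retains six, and other receives all nine. A quick way to confirm the counts after running the step is to query the DICTIONARY.TABLES view; this check is a sketch added here, not part of the original slides, and it assumes the three data sets were created in the WORK library.

```sas
/* Report the number of variables (nvar) in each output data set */
proc sql;
   select memname, nvar
      from dictionary.tables
      where libname='WORK'
        and memname in ('USA','AUSTRALIA','OTHER');
quit;
```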

Controlling Variable Input

When the KEEP= data set option is associated with an input data set, only the variables specified are read into the PDV and therefore are available for processing.

data usa;
   set orion.employee_addresses
       (keep=Employee_ID Employee_Name City State Postal_Code);
   <additional SAS statements>
run;

PDV (only the specified variables are in the PDV)

Employee_ID   Employee_Name   City   State   Postal_Code

Controlling Variable Input

When the DROP= data set option is associated with an input data set, the specified variables are not read into the PDV, and therefore are not available for processing.

data usa;
   set orion.employee_addresses
       (drop=Street_ID Street_Number Street_Name Country);
   <additional SAS statements>
run;

PDV (dropped variables are not in the PDV)

Employee_ID   Employee_Name   City   State   Postal_Code

[Diagram: data flows from a raw data file or an input SAS data set into the PDV and then to the output SAS data set. DROP= and KEEP= on an input data set affect the contents of the PDV. DROP and KEEP statements, and DROP= and KEEP= on an output data set, control which variables are written to the output SAS data set.]

What happens?

data usa australia(drop=State) other;
   set orion.employee_addresses
       (drop=Country Street_ID Employee_ID);
   if Country='US' then output usa;
   else if Country='AU' then output australia;
   else output other;
run;

Country is dropped on input, and therefore it is not available for processing. Neither IF condition is ever true, so every observation is written to other.
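One possible repair, sketched here under the assumption that the intent was to keep Country out of every output data set, is to stop dropping it on input and use a DROP statement instead. The DROP statement takes effect only on output, so Country stays in the PDV for the IF tests.

```sas
data usa australia(drop=State) other;
   /* Country is no longer dropped on input, so the IF tests can see it */
   set orion.employee_addresses(drop=Street_ID Employee_ID);
   drop Country;   /* assumption: Country is unwanted in every output data set */
   if Country='US' then output usa;
   else if Country='AU' then output australia;
   else output other;
run;
```

If Country should remain in some of the output data sets, a DROP= option on the individual data sets in the DATA statement would be the alternative.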


Controlling Which Observations Are Read

By default, SAS processes every observation in a SAS data set, from the first observation to the last. The FIRSTOBS= and OBS= data set options can be used to control which observations are processed.

FIRSTOBS= and OBS= are used with input data sets. You cannot use either option with output data sets.

The OBS= Data Set Option

The OBS= data set option specifies an ending point for processing an input data set.

General form of the OBS= data set option:

SAS-data-set(OBS=n)

This option specifies the number of the last observation to process, not how many observations should be processed.

data australia;
   set orion.employee_addresses (obs=100);
   if Country='AU' then output;
run;

The FIRSTOBS= Data Set Option

The FIRSTOBS= data set option specifies a starting point for processing an input data set. This option specifies the number of the first observation to process.

SAS-data-set(FIRSTOBS=n)

FIRSTOBS= and OBS= are often used together to define a range of observations to be processed.

Using OBS= and FIRSTOBS= Data Set Options

proc contents data=orion.employee_addresses;
run;

data australia;
   set orion.employee_addresses (firstobs=50 obs=100);
   if Country='AU' then output;
run;
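With firstobs=50 and obs=100, the step above reads observations 50 through 100, which is 51 observations, not 100. The same arithmetic is easier to see on a tiny table; in this sketch seq_demo, range_demo, and ObsNum are hypothetical names introduced for illustration.

```sas
/* Ten observations numbered 1 through 10 */
data seq_demo;
   do ObsNum=1 to 10;
      output;
   end;
run;

/* Reads observations 3 through 5: three observations, not five */
data range_demo;
   set seq_demo(firstobs=3 obs=5);
run;

proc print data=range_demo noobs;
run;
```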

Using OBS= and FIRSTOBS= in a PROC Step

The FIRSTOBS= and OBS= data set options can also be used in SAS procedures. The PROC PRINT step below begins processing at observation 10 and ends after observation 15.

proc print data=orion.employee_addresses (firstobs=10 obs=15);
   var Employee_Name City State Country;
run;

Adding a WHERE Statement

When FIRSTOBS= or OBS= and WHERE are used together:

• The subsetting WHERE is applied first.
• FIRSTOBS= and OBS= are applied to the resulting observations.

The following step includes a WHERE statement and an OBS= option.

proc print data=orion.employee_addresses (obs=10);
   where Country='AU';
   var Employee_Name City Country;
run;
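The order of operations can be checked on a small table where the answer is easy to predict. In this sketch, seq_demo and ObsNum are hypothetical names; the WHERE expression keeps the even-numbered rows, and OBS=2 then limits the result to the first two of them.

```sas
/* Ten observations numbered 1 through 10 */
data seq_demo;
   do ObsNum=1 to 10;
      output;
   end;
run;

/* WHERE is applied first (2, 4, 6, 8, 10), then OBS=2 keeps 2 and 4 */
proc print data=seq_demo(obs=2);
   where mod(ObsNum,2)=0;
run;
```

If OBS= were applied before WHERE, only observation 2 would be printed, so seeing both 2 and 4 confirms the rule stated above.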

An Aside -- PROC SURVEYSELECT

proc surveyselect data=orion.employee_addresses
                  method=srs n=10
                  out=simplerandomsample
                  seed=12345;
run;

title "Simple random sample";
proc print data=simplerandomsample;
   var employee_id country;
run;

proc surveyselect data=orion.employee_addresses
                  method=srs n=10
                  out=simplerandomsample
                  seed=12345;
   where country="US";
run;

title "Simple random sample of US employees";
proc print data=simplerandomsample;
   var employee_id country;
run;

Bootstrap Sampling

data tmp;
   input x y @@;
   datalines;
1 7 2 6 3 4 4 8 5 9 6 11 7 12
;
run;

proc print data=tmp;
run;

A Single Bootstrap Sample

proc surveyselect data=tmp
                  method=urs n=7 outhits
                  out=tmp1(drop=numberhits)
                  seed=7456713;
run;

proc print data=tmp1;
run;

Multiple Bootstrap Samples

proc surveyselect data=tmp
                  method=urs n=7 outhits
                  out=tmp1(drop=numberhits)
                  seed=7456713
                  reps=3;
run;

proc print data=tmp1;
run;
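With REPS=, PROC SURVEYSELECT adds a Replicate variable to the output data set that identifies each sample. A typical next step, sketched here as an addition to the slides, is to compute a statistic for each replicate; the choice of the mean of x and y is only an example.

```sas
/* Mean of x and y within each bootstrap replicate in tmp1 */
proc means data=tmp1 mean;
   by Replicate;
   var x y;
run;
```

BY-group processing expects tmp1 to be ordered by Replicate; if SAS reports that the data set is not sorted, run PROC SORT on Replicate first.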