Lesson 6 - Topics Reading SAS datasets Subsetting SAS datasets Merging SAS datasets.

Lesson 6 - Topics

• Reading SAS datasets

• Subsetting SAS datasets

• Merging SAS datasets

Working With SAS Data Sets

• Reading SAS datasets - SET statement

• Merging SAS datasets - MERGE statement

Both are done within a DATA step.

SET Statement

• Reads a SAS data set

• Replaces the INFILE and INPUT statements used when reading in raw data

• KEEP= brings in selected variables (columns)

• WHERE brings in selected observations (rows)

DATA new;
  SET old (KEEP = varlist);
  WHERE condition;
RUN;

This creates a new data set called new that has the variables in varlist and the observations from old that satisfy condition. Any variable used in condition must be included in varlist.
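The SET template can be tried on a small made-up dataset. The sketch below is only an illustration: the dataset old, its variables (id, sex, sbp), and its values are hypothetical. It shows three common ways to select rows: the WHERE statement, the WHERE= data set option, and a subsetting IF, which is the one to use when the condition depends on a variable created in the same DATA step.

```sas
* Hypothetical input dataset for illustration;
DATA old;
  INPUT id $ sex sbp;
  DATALINES;
A1 1 120
A2 2 135
A3 1 142
;
RUN;

* WHERE statement: filters rows as they are read from old;
DATA new1;
  SET old (KEEP = id sex sbp);
  WHERE sex = 1;
RUN;

* WHERE= data set option: the same filter written on the SET statement;
DATA new2;
  SET old (KEEP = id sex sbp WHERE = (sex = 1));
RUN;

* Subsetting IF: filters after the row is read, so it can use new variables;
DATA new3;
  SET old;
  high = (sbp >= 140);
  IF high = 1;
RUN;
```

With these three rows, new1 and new2 each keep the two observations with sex = 1, and new3 keeps only the observation with sbp of 140 or more.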

PROGRAM 9 - Making SAS Datasets from Other SAS Datasets

DATA tdata;
  INFILE 'C:\SAS_Files\tomhs.data';
  INPUT @1   ptid     $10.
        @12  clinic   $1.
        @25  group    1.
        @30  sex      1.
        @123 sbp12    3.
        @14  randdate $10.;
RUN;

* Making a new dataset containing only men;
DATA men;
  SET tdata;        * reads the existing dataset;
  WHERE sex = 1;    * this does the selection;
  if group in(1,2,3,4,5) then active = 1;
  else if group in(6) then active = 2;
  KEEP ptid clinic group sbp12 randdate active;
RUN;

* Making a new dataset containing only women;
DATA women;
  SET tdata;
  WHERE sex = 2;
  if group in(1,2,3,4,5) then active = 1;
  else if group in(6) then active = 2;
  KEEP ptid clinic group sbp12 randdate active;
RUN;

We now have 3 "active" datasets: tdata, men, and women.

DATA clinic;

INFILE DATALINES;

INPUT id $ sbp ;

DATALINES;

C03615 115

B00979 107

B00644 138

D01348 142

A01088 117

B01408 121

B00025 130

B00714 144

A01166 113

… more data

;

DATA lab;

INFILE DATALINES;

INPUT id $ glucose;

DATALINES;

C03615 102

B00644 089

D01348 111

A01088 093

B01408 094

B00025 077

B00714 100

A01166 113

D00942 103

… more data

;

PROGRAM 11 - Merging SAS Datasets

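* Creating merged dataset;
PROC SORT DATA=clinic; BY id; RUN;
PROC SORT DATA=lab; BY id; RUN;

DATA study;
  MERGE clinic lab;
  BY id;
RUN;

Note: The BY statement is very important! Without it, SAS pairs observations by position (first with first, second with second) instead of matching them on id.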

Merged Dataset

Obs    id        sbp    glucose

  1    A00869    110       99
  2    A01088    117       93
  3    A01166    113      113
  4    B00025    130       77
  5    B00644    138       89
  6    B00714    144      100
  7    B00867    114       98
  8    B00979    107        .
 18    D00942      .      103
 20    D01809    129        .

What if you want only the observations that are in both datasets?

DATA study;
  MERGE clinic (IN=in1) lab (IN=in2);
  BY id;
  if in1 and in2;
RUN;

PROC PRINT DATA=study;
  TITLE 'Patients with Clinic and Lab';
RUN;

Logical Statements

* Must be in 1st dataset;
if in1;    * Same as: if in1 = 1;

* Must be in 2nd dataset;
if in2;

* Must be in both datasets;
if in1 and in2;
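The same IN= flags can also separate matches from non-matches in a single DATA step. The sketch below is an illustration under stated assumptions: the clinic and lab datasets here hold only a few of the ids shown in the slides, and the output names both, cliniconly, and labonly are made up for this example.

```sas
* Small stand-ins for the clinic and lab datasets (a few ids from the slides);
DATA clinic;
  INPUT id $ sbp;
  DATALINES;
B00979 107
A01088 117
;
RUN;

DATA lab;
  INPUT id $ glucose;
  DATALINES;
A01088 093
D00942 103
;
RUN;

PROC SORT DATA=clinic; BY id; RUN;
PROC SORT DATA=lab; BY id; RUN;

* Route each merged row to one of three datasets using the IN= flags;
DATA both cliniconly labonly;
  MERGE clinic (IN=in1) lab (IN=in2);
  BY id;
  IF in1 AND in2 THEN OUTPUT both;
  ELSE IF in1 THEN OUTPUT cliniconly;
  ELSE OUTPUT labonly;
RUN;
```

Here A01088 lands in both, B00979 in cliniconly, and D00942 in labonly. Printing the two non-match datasets is a quick way to check for ids that are missing from one source.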

Things to Remember When Merging Datasets

• Each dataset needs a variable with the same name to use as the linking (BY) variable

• When an observation has no match, the variables from the other dataset are set to missing

• When matched rows share a variable name (other than the BY variable), the value from the right-most dataset in the MERGE statement is kept

• Always remember the BY statement in the merge!
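The point about shared variable names is easy to miss, so here is a minimal sketch of it. The datasets visit1 and visit2, the variable weight, and all values are hypothetical. The first merge shows the overwrite; the second uses the RENAME= data set option to keep both values.

```sas
DATA visit1;
  INPUT id $ weight;
  DATALINES;
A1 180
A2 200
;
RUN;

DATA visit2;
  INPUT id $ weight;
  DATALINES;
A1 175
A2 190
;
RUN;

* Both datasets are already in id order, so no PROC SORT is needed here;

* Without RENAME=, weight from visit2 overwrites weight from visit1;
DATA overwritten;
  MERGE visit1 visit2;
  BY id;
RUN;

* With RENAME=, both values are kept under different names;
DATA kept;
  MERGE visit1 (RENAME=(weight=weight1))
        visit2 (RENAME=(weight=weight2));
  BY id;
  change = weight2 - weight1;
RUN;
```

In overwritten, weight is 175 for A1 and 190 for A2. In kept, both weights survive and change is -5 and -10.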

Temporary vs Permanent SAS Datasets

Temporary (or working) SAS dataset - After SAS session is over the dataset is deleted.

DATA bp;    * bp is deleted after SAS session;
(rest of program)

Permanent SAS dataset - After program is run the dataset is saved and is available for use in future programs. You need to tell SAS where to store/retrieve the dataset.

Note: For PC SAS the working dataset is available until you end the SAS session.

Reasons to Create Permanent SAS Datasets

• Read raw data and compute calculated variables only once

• All variables have assigned names and labels.

• Data is ready to be analyzed.

• Dataset can be sent to other computers or users.

Creating a Permanent Dataset

LIBNAME mylib 'C:\My SAS Datasets';
DATA mylib.sescore;

LIBNAME assigns a reference name (libref) to a directory (folder). In this example the directory 'C:\My SAS Datasets' is assigned the reference name mylib.

DATA mylib.sescore;

Tells SAS to create a dataset called sescore in the directory referenced by mylib, which is 'C:\My SAS Datasets'.

Examples of LIBNAME Statements

LIBNAME mylib 'C:\My SAS Files';

LIBNAME class 'C:\My SAS Files';

LIBNAME ph6420 'C:\My SAS Files\SASClass\' ;

LIBNAME points to a directory (folder)

DATA mylib.datasetname;

DATA class.datasetname;

DATA ph6420.datasetname;

On UNIX and PC the file will be called

datasetname.sas7bdat
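To see the difference between one-level (temporary) and two-level (permanent) names in practice, a sketch along these lines may help. It assumes the folder 'C:\My SAS Files' from the examples exists on your machine; the dataset bp and its single value are hypothetical.

```sas
* Path taken from the examples; change it to a folder that exists on your machine;
LIBNAME mylib 'C:\My SAS Files';

* One-level names go to the temporary WORK library;
DATA bp;    * same as DATA work.bp;
  sbp = 120;
RUN;

* Two-level names go to the folder the libref points to;
DATA mylib.bp;
  SET work.bp;
RUN;

* List the datasets in each library without the variable details;
PROC CONTENTS DATA=work._ALL_ NODS;
RUN;

PROC CONTENTS DATA=mylib._ALL_ NODS;
RUN;
```

After this runs, the folder should contain a file named bp.sas7bdat, while work.bp disappears when the SAS session ends.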

PROGRAM 11

LIBNAME mylib 'C:\SAS_Files';

DATA mylib.sescore;
  INFILE 'C:\SAS_Files\tomhs.data' LRECL=400;
  INPUT @1   ptid     $10.
        @12  clinic   $1.
        @14  randdate mmddyy10.
        @25  group    1.
        @49  educ     1.
        @85  wtbl     5.1
        @97  wt12     5.1
        @115 sbpbl    3.0
        @123 sbp12    3.0
        @236 (sebl_1-sebl_20) (1. +1)
        @276 (se12_1-se12_20) (1. +1);

  * Change scores (12 months minus baseline) and mean side effect scores;
  wtd12    = wt12 - wtbl;
  sbpd12   = sbp12 - sbpbl;
  sescrbl  = MEAN(OF sebl_1-sebl_20);
  sescr12  = MEAN(OF se12_1-se12_20);
  sescrd12 = sescr12 - sescrbl;

  LABEL educ     = 'Highest Education Level';
  LABEL wt12     = 'Weight (lbs) at 12 Months';
  LABEL wtbl     = 'Weight (lbs) at Baseline';
  LABEL wtd12    = 'Weight Change at Baseline';
  LABEL sbpbl    = 'Systolic BP (mmHg) at Baseline';
  LABEL sbp12    = 'Systolic BP (mmHg) at 12 Months';
  LABEL sbpd12   = 'Systolic BP Change at 12 Months';
  LABEL group    = 'Treatment Group (1-6)';
  LABEL sescrbl  = 'Side Effect at Baseline';
  LABEL sescr12  = 'Side Effect at 12 Months';
  LABEL sescrd12 = 'Side Effect Change Score';
  FORMAT randdate mmddyy10.;

  * The individual items are not needed once the scores are computed;
  DROP sebl_1-sebl_20 se12_1-se12_20;
RUN;
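The OF keyword in MEAN(OF sebl_1-sebl_20) matters: it tells SAS to treat sebl_1-sebl_20 as a list of variables and not as a subtraction. The sketch below illustrates this on a hypothetical three-item dataset (demo, x1 to x3, and the values are made up). It also uses the N function to count how many items were answered, since MEAN skips missing values.

```sas
DATA demo;
  INPUT x1 x2 x3;
  avg_all  = MEAN(OF x1-x3);    * mean of the non-missing values among x1, x2, x3;
  n_items  = N(OF x1-x3);       * number of non-missing values among x1, x2, x3;
  avg_diff = MEAN(x1 - x3);     * no OF: mean of the single value x1 minus x3;
  DATALINES;
1 2 3
1 . 3
;
RUN;

PROC PRINT DATA=demo;
RUN;
```

For both rows avg_all is 2 and avg_diff is -2; n_items is 3 for the first row and 2 for the second, which shows that a score can be based on fewer items than expected.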

60   LIBNAME mylib 'C:\SAS_Files';
NOTE: Libref MYLIB was successfully assigned as follows:
      Engine:        V9
      Physical Name: C:\SAS_Files

DATA mylib.sescore;

NOTE: The infile 'C:\SAS_Files\tomhs.data' is:
      File Name=C:\SAS_Files\tomhs.data,
      RECFM=V,LRECL=400

NOTE: 100 records were read from the infile 'C:\SAS_Files\tomhs.data'.

NOTE: The data set MYLIB.SESCORE has 100 observations and 14 variables.

PROC CONTENTS DATA=mylib.sescore VARNUM;
  TITLE 'Description of Variables in Dataset SESCORE';
RUN;

What is inside a SAS dataset?

Data

Names, labels, and formats of all variables

PROC CONTENTS reads the descriptor portion of the dataset

Description of Variables in Dataset SESCORE

The CONTENTS Procedure

Data Set Name:  MYLIB.SESCORE                       Observations:          100
Member Type:    DATA                                Variables:             14
Engine:         V9                                  Indexes:               0
Created:        10:59 Wednesday, August 11, 2004    Observation Length:    112
Last Modified:  10:59 Wednesday, August 11, 2004    Deleted Observations:  0
Protection:                                         Compressed:            NO
Data Set Type:                                      Sorted:                NO
Label:

-----Engine/Host Dependent Information-----

File Name:          C:\SAS_Files\sescore.sas7bdat
Release Created:    9.1.3
Host Created:       XP_PRO
File Size (bytes):  24576

Note: mylib is not part of the dataset's file name. The file is sescore.sas7bdat; the libref mylib only points to the folder that holds it.

 #    Variable    Type    Len    Pos    Format       Label
-----------------------------------------------------------------------------
 1    ptid        Char     10     96                 Patient ID
 2    clinic      Char      1    106                 Clinical Center
 3    randdate    Num       8      0    MMDDYY10.    Randomization Date
 4    group       Num       8      8                 Treatment Group (1-6)
 5    educ        Num       8     16                 Highest Education Level
 6    wtbl        Num       8     24                 Weight (lbs) at Baseline
 7    wt12        Num       8     32                 Weight (lbs) at 12 Months
 8    sbpbl       Num       8     40                 Systolic BP (mmHg) at Baseline
 9    sbp12       Num       8     48                 Systolic BP (mmHg) at 12 Months
10    wtd12       Num       8     56                 Weight Change at Baseline
11    sbpd12      Num       8     64                 Systolic BP Change at 12 Months
12    sescrbl     Num       8     72                 Side Effect at Baseline
13    sescr12     Num       8     80                 Side Effect at 12 Months
14    sescrd12    Num       8     88                 Side Effect Change Score

Variables listed in creation order

This becomes the documentation of the dataset

Using PROC COPY to copy a work dataset to a permanent dataset

LIBNAME mylib 'C:\SAS_Files';

DATA sescore;
  ...
RUN;

PROC COPY IN=work OUT=mylib;
  SELECT sescore;
RUN;

Make a work dataset first. Once you know it is working correctly, copy the work dataset to a permanent dataset.
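A runnable version of this build, check, then copy pattern might look like the sketch below. It assumes the folder 'C:\SAS_Files' from the slides exists on your machine; the dataset scores and its values are hypothetical stand-ins for a real work dataset.

```sas
LIBNAME mylib 'C:\SAS_Files';    * folder from the slides, must exist on your machine;

* Step 1: build the dataset in WORK (hypothetical data);
DATA scores;
  INPUT id $ score;
  DATALINES;
A1 10
A2 12
;
RUN;

* Step 2: check the first few rows before saving anything;
PROC PRINT DATA=scores (OBS=5);
RUN;

* Step 3: copy only the checked dataset to the permanent library;
PROC COPY IN=work OUT=mylib;
  SELECT scores;
RUN;
```

The SELECT statement matters here: without it, PROC COPY copies every member of the WORK library to mylib.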

Reading Permanent SAS Dataset

LIBNAME class 'C:\SAS_Files';

* Tells SAS where to find the SAS dataset;

PROC MEANS DATA=class.sescore ;

TITLE 'Means of All Numeric Variables on SAS Permanent Dataset';

RUN;

PROC CORR DATA=class.sescore;

VAR wtd12 sbpd12 sescrd12;

TITLE 'Correlation Matrix of 3 Change Variables';

RUN;

What if the dataset was moved to a different folder? You only need to change the path in the LIBNAME statement.

Means of All Numeric Variables on SAS Permanent Dataset

The MEANS Procedure

Variable    Label                                N        Mean
------------------------------------------------------------------
randdate    Randomization Date                 100    10101.29
group       Treatment Group (1-6)              100        3.62
educ        Highest Education Level             99        6.00
wtbl        Weight (lbs) at Baseline           100      191.76
wt12        Weight (lbs) at 12 Months           92      180.33
sbpbl       Systolic BP (mmHg) at Baseline     100      139.92
sbp12       Systolic BP (mmHg) at 12 Months     92      124.04
wtd12       Weight Change at Baseline           92      -11.53
sbpd12      Systolic BP Change at 12 Months     92      -15.64
sescrbl     Side Effect at Baseline            100        1.19
sescr12     Side Effect at 12 Months            95        1.16
sescrd12    Side Effect Change Score            95       -0.03
------------------------------------------------------------------

Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations

                                       wtd12     sbpd12   sescrd12

wtd12                                1.00000    0.23986    0.15341
Weight Change at Baseline                        0.0213     0.1443
                                          92         92         92

sbpd12                               0.23986    1.00000    0.05679
Systolic BP Change at 12 Months       0.0213                0.5908
                                          92         92         92

sescrd12                             0.15341    0.05679    1.00000
Side Effect Change Score              0.1443     0.5908
                                          92         92         95

*---------------------------------------------------------------*

Often you will read the permanent SAS dataset in a DATA step to

modify or add variables. Usually these will be put on a new

temporary SAS dataset. The SET statement reads a SAS dataset.

*---------------------------------------------------------------*;

LIBNAME class 'C:\SAS_Files';

DATA rxdata;

SET class.sescore;

if group in(1,2,3,4,5) then rx = 1; else rx = 2;

RUN;

PROC MEANS DATA=rxdata N MEAN MAXDEC=2 FW=7;

CLASS group;

VAR sbpd12 wtd12 sescrd12;

TITLE 'Change in SBP, Weight, and Side Effect Score by Treatment';

RUN;