SASHELP Datasets: A real life example
Barb Crowther, SAS Consultant
October 22, 2004
Copyright © 2004, SAS Institute Inc. All rights reserved.


Situation

The Canadian government requires all financial institutions to report suspicious financial transactions, including:
• Cash
• Money Orders
• Casino Chips
• Real Estate …


Situation – Data

The data is extracted from many different applications and saved to a flat file as a ‘report’.

Excluding the header and subheader, there were 9 different parts for each report.

Data formatting was very specific and varied from part to part.

The number of fields per part varied from 4 to 35.

Field lengths varied between 1 and 400.

Part lengths varied between 139 and 507 characters.


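Types of Required Information

[Diagram: the fields a report needs, with their types: Name (Char), Address (Char), Date (Char), Date (Char), Time (Num.), Amount (Num.), Suspicious transaction description (Char); callouts read “Who are you?” and “Why?”]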


The first kicker

Not all reports needed all parts.

Some parts were always required, others only some of the time.

Which parts were included depended on the data.

But if the part was required, it had to be perfect!!!


The second kicker

The users wanted to be able to edit each part before it was sent to the government – but because of the tool they used, they could not insert (or delete) a missing part.

So even if all the fields were missing from the source data, the part still had to be included.


So…

The application had to insert each required part.

The only information I would get was the report's sequence number.


Attempt #1 – Hard code the data step

For each of the nine parts…

… for each of the up to 35 fields per part…

… I could write a “length variable $ n;” statement (with the $ only for character variables).

Oh, and by the way, did I tell you that the government regularly changes part content?
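For a sense of scale, here is roughly what hard coding one empty part could look like. This is a sketch, not code from the project: the column names are borrowed from the Txns example later in the deck, while the dataset name work.txns_missing and the sequence number 101 are made up.

```sas
/* Hard-coded empty part: every column and length typed by hand.
   Dataset name, column list and seq_num value are illustrative only. */
data work.txns_missing;
   length seq_num 8
          tran_date $8 tran_post_date $8 tran_currency $3
          tran_time $4 tran_amount 8;
   seq_num = 101;                          /* hypothetical sequence number */
   call missing(tran_date, tran_post_date, tran_currency,
                tran_time, tran_amount);   /* every other field is empty */
run;  /* no SET or INPUT statement, so the step runs once and writes one row */
```

Nine parts with up to 35 fields each means potentially hundreds of such declarations, all of which have to be edited again whenever a part changes.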


Attempt #2 – Investigate system options

options obs=0;
data parta1;
   set input.parta1;
run;
options obs=max;

The problem with this code is that the output dataset has no observations, and I needed one, even if there was no data.


Attempt #3 – Look at the SASHELP datasets

The SASHELP datasets (views of the DICTIONARY tables) contain information about the current SAS session, including:
• all the members of all the libraries (SASHELP.VMEMBER)
• all the columns of each member (SASHELP.VCOLUMN)


VCOLUMN Contents

Variable Name   Variable Label
…               …
Format          Column Format
Informat        Column Informat
Label           Column Label
Length          Column Length
Libname         Library Name
Memname         Member Name
Type            Column Type
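The slide shows only part of the view. To list every column of SASHELP.VCOLUMN in your own session, either of the following should do it; both are standard SAS, but the exact set of columns reported varies by SAS release.

```sas
/* Describe the SASHELP view: its columns, types, lengths and labels */
proc contents data=sashelp.vcolumn;
run;

/* Alternative: write the definition of the underlying DICTIONARY table to the log */
proc sql;
   describe table dictionary.columns;
quit;
```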


Accessing V* Tables

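The V* tables can be accessed with PROC steps, PROC SQL, or DATA steps. For example:
• proc print data=sashelp.vtable;
     where libname='WORK';
  run;
• proc sql;
     create view work.options as
        select * from dictionary.options;
  quit;

DICTIONARY tables (such as dictionary.options) can be referenced only in PROC SQL; the SASHELP views can be used anywhere a dataset name is accepted.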
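DATA steps are mentioned above but not shown. A minimal sketch: because SASHELP.VCOLUMN is an ordinary view, a SET statement can read it. SASHELP.CLASS is used here only because it ships with SAS; the output name work.class_char_cols is made up.

```sas
/* Read column metadata with a DATA step and keep only the
   character columns of SASHELP.CLASS (a sample table supplied with SAS) */
data work.class_char_cols;
   set sashelp.vcolumn (keep=libname memname name type length);
   /* LIBNAME and MEMNAME are stored in uppercase; TYPE is 'char' or 'num' */
   where libname = 'SASHELP' and memname = 'CLASS' and type = 'char';
run;

proc print data=work.class_char_cols noobs;
run;
```

For SASHELP.CLASS this should list the two character columns, Name and Sex.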


So how does this help me?

Step 1: Get a list of all the variables (and their attributes) required for the “empty dataset”.

Step 2: Move all that information into macro variables.

Step 3: Create a dataset template.

Step 4: Create the empty dataset.


Step 1: Variables and their attributes

proc sql noprint;
   /* one row per column of &inset..&table, with its attributes */
   create table &table._vars as
      select name, type, format, length, label
         from sashelp.vcolumn
         where upcase(libname) = upcase("&inset")
           and upcase(memname) = upcase("&table");
quit;

In this example, &inset is Work and &table is Txns, so the output table is Work.Txns_vars.
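To run Steps 1 to 4 in your own session you need a Txns table, the two macro variables, and the RESULT dataset that Step 4 reads. The setup below is hypothetical: the column names come from the slides, but SEQ_NUM is an assumption (Step 4 drops it from the template, so Txns presumably has it), Teller_id's length of 15 is a guess, and the two sequence numbers in RESULT are invented.

```sas
/* Hypothetical setup for the Steps 1-4 example */
%let inset = Work;
%let table = Txns;

/* Structure-only Txns part: columns from the slides,
   SEQ_NUM assumed, teller_id length guessed */
data work.txns;
   attrib seq_num         length=8
          transaction_key length=8   format=20.
          tran_date       length=$8
          tran_post_date  length=$8
          tran_currency   length=$3  format=$3. label='Currency_code'
          tran_time       length=$4
          tran_amount     length=8
          teller_id       length=$15;
   stop;   /* write no observations */
run;

/* Reports missing the Txns part: only the sequence number is known */
data work.result;
   input seq_num;
   datalines;
101
102
;
```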


Step 1: Work.Txns_vars (output from SASHELP.VCOLUMN)

Name            Type  Format  Length  Label
Tran_date       Char          8
Tran_Post_Date  Char          8
Tran_Currency   Char  $3.     3       Currency_code
Tran_time       Char          4
Tran_Amount     Num           8
Teller_id       Char  $15     …


Step 2: Create macro variables using Txns_vars from Step 1

data _null_;
   set &table._vars end=eof;
   /* VAR1, VAR2, ...: the column names */
   call symputx(cats('var', _n_), name);
   /* FMT1, FMT2, ...: the stored format; $n. for a character column without one; empty otherwise */
   if format ne ' ' then call symputx(cats('fmt', _n_), format);
   else if upcase(type) = 'CHAR' then call symputx(cats('fmt', _n_), cats('$', length, '.'));
   else call symputx(cats('fmt', _n_), ' ');
   /* LABEL1, LABEL2, ...: the column label, possibly empty */
   call symputx(cats('label', _n_), label);
   /* VAR_CNT: the number of columns */
   if eof then call symputx('var_cnt', _n_);
run;


Step 2: Macro Output from the SASLOG

Macro Name            Macro Variable  Value
BUILD_TABLE_TEMPLATE  VAR1            TRANSACTION_KEY
BUILD_TABLE_TEMPLATE  VAR2            TRAN_CURRENCY
. . .
BUILD_TABLE_TEMPLATE  FMT1            20
BUILD_TABLE_TEMPLATE  FMT2            $3
. . .
BUILD_TABLE_TEMPLATE  LABEL1          TRANSACTION_KEY
BUILD_TABLE_TEMPLATE  LABEL2          CURRENCY_CODE
. . .
BUILD_TABLE_TEMPLATE  VAR_CNT         21


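Step 3: Create the dataset template Work.Txns_tpl using the Step 2 macro variables

/* runs inside the macro: %DO is not allowed in open code */
data &table._tpl;
   %do i = 1 %to &var_cnt;
      attrib &&var&i
         %if %length(&&fmt&i) %then format=&&fmt&i;
         label="&&label&i";
   %end;
run;

Because this DATA step has no SET or INPUT statement, it executes once and writes a single observation in which every variable is missing: exactly the one observation that Attempt #2 could not produce.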


Step 3: Create the dataset template – generated code to create Work.Txns_tpl

Value of i   Generated code
1            attrib transaction_key format=20. label="transaction_key";
2            attrib tran_currency format=$3. label="Currency Code";


Step 3: Create a dataset template – get a list of the required variables

%global &table._var_list;
proc sql noprint;
   select distinct name
      into :&table._var_list separated by ' '
      from &table._vars;
quit;

This creates a macro variable called txns_var_list with a value of TRANSACTION_KEY TRAN_CURRENCY …


So where are we?

We have a report with a known sequence number, but no data.

We know which variables are required:
• &txns_var_list

We know the variables’ attributes:
• &&var&i format=&&fmt&i label="&&label&i"


Step 4: Create the empty dataset

Inputs:
• the list of variables in the Txns dataset (&txns_var_list)
• the variable attributes (&&var&i, &&fmt&i, &&label&i)
• the sequence number of the report with the missing Txn part

Output:
• a dataset with the sequence number and all the other variables


Step 4: Code to generate the dataset

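data &table._miss_data;
   /* name every variable first so the output columns follow &table._var_list */
   retain &&&table._var_list;
   /* one observation per report that is missing this part */
   set result (keep=seq_num);
   /* read the one-row template once; its (missing) values are retained for every row */
   if _n_ = 1 then set &table._tpl (drop=seq_num);
run;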
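Putting the four steps together, here is one way the pieces might fit in a single macro. It is a sketch under assumptions, not the original code: the name BUILD_TABLE_TEMPLATE is taken from the Step 2 log listing, but the INSET= and TABLE= parameters, the ORDER BY VARNUM (to keep the source column order instead of the alphabetical order DISTINCT produces), and the small DEMO and RESULT datasets used to call it are invented. A macro wrapper is needed in any case because the %DO loop in Step 3 only works inside a macro definition.

```sas
/* Sketch: Steps 1-4 in one macro. Parameter names and the RESULT
   dataset (one row per report, column SEQ_NUM) are assumptions. */
%macro build_table_template(inset=, table=);
   %local i var_cnt;
   %let var_cnt = 0;
   %global &table._var_list;

   /* Step 1: one row per column of &inset..&table, in column order */
   proc sql noprint;
      create table &table._vars as
         select name, type, format, length, label, varnum
            from sashelp.vcolumn
            where libname = "%upcase(&inset)" and memname = "%upcase(&table)"
            order by varnum;
   quit;

   /* Step 2: VARn, FMTn, LABELn and VAR_CNT macro variables */
   data _null_;
      set &table._vars end=eof;
      call symputx(cats('var', _n_), name);
      if format ne ' ' then call symputx(cats('fmt', _n_), format);
      else if upcase(type) = 'CHAR' then call symputx(cats('fmt', _n_), cats('$', length, '.'));
      else call symputx(cats('fmt', _n_), ' ');
      call symputx(cats('label', _n_), label);
      if eof then call symputx('var_cnt', _n_);
   run;
   %put _local_;   /* writes the macro's local variables to the log */

   /* Step 3: one-row template (no SET statement, so one observation) */
   data &table._tpl;
      %do i = 1 %to &var_cnt;
         attrib &&var&i
            %if %length(&&fmt&i) %then format=&&fmt&i;
            label="&&label&i";
      %end;
   run;

   /* Step 3 (continued): the variable list, in column order */
   proc sql noprint;
      select name into :&table._var_list separated by ' '
         from &table._vars
         order by varnum;
   quit;

   /* Step 4: one row per missing part, every other column missing */
   data &table._miss_data;
      retain &&&table._var_list;
      set result (keep=seq_num);
      if _n_ = 1 then set &table._tpl (drop=seq_num);
   run;
%mend build_table_template;

/* Hypothetical demo data: a three-column part and two missing reports */
data work.demo;
   attrib seq_num       length=8
          tran_currency length=$3 format=$3. label='Currency_code'
          tran_amount   length=8;
   stop;
run;

data work.result;
   do seq_num = 101, 102;
      output;
   end;
run;

%build_table_template(inset=Work, table=Demo)
```

Running the demo should leave WORK.DEMO_MISS_DATA with one row per sequence number in WORK.RESULT, SEQ_NUM filled in and the other columns missing; %PUT _LOCAL_ writes a listing similar to the one shown under Step 2.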


Thoughts…

Writing the macros took longer than hard coding the attribute statements.

But if the parts change in the future, I won’t have to do much, if anything.

The macros can be reused in other applications…


Suggested readings

• The SASHELP Library: It Really Does Help You Manage Data, by Melinda Thielbar
  http://support.sas.com/sassamples/bitsandbytes/sashelp.html

• You Could Look It Up: An Introduction to SASHELP Dictionary Tables, by Michael Davis
  http://www2.sas.com/proceedings/sugi26/p017-26.pdf
