Use SAS and FTP


USE SAS TO MAINTAIN AND QC YOUR DATABASE

Sergey Sian, PreVision, Lincoln, MA

INTRODUCTION

Keeping, updating and maintaining a database is not a simple task. There are many steps involved (getting and QC'ing raw files from a client or vendor; setting files together and checking them; updating database (DB) tables; QC'ing the update process, etc.). Each of these steps is very important. This paper discusses how SAS might be used to implement most of them:

- Every morning, automatically get raw files from the client’s FTP site, convert them into SAS data sets and do QC
- Set daily files together and identify “extra” and “missing” stores just for “core” dates
- Check whether all (and only the necessary) transactions have been loaded into the DB
- Create *.html reports on the dollar amount spent by customers through different channels (Internet, catalog, retail store, etc.).

GET FILES FROM CLIENT’S FTP SITE

Say you keep an in-house database for a client who has a thousand stores across the USA and Canada. The client collects “customer” and “transaction” information from all stores on a daily basis and is expected to put it on an FTP site for you (the most convenient and quickest way to pass large amounts of information) before 8:00 AM. To distinguish one daily file from another, suppose each file has the “current” day in its name. Can you check automatically:

- If all necessary files have arrived by that time?
- If so, do they have the number of records they are expected to have (based on the provided “count” files), etc.?

The following SAS code might be helpful.

    /* Wake up your program */
    data _null_;
      slept = wakeup("01OCT01:8:00:00"dt);
    run;

    options mprint symbolgen noxwait;

    %macro Script_Ftp (ScriptFile_path, ScriptFile_name, IP_address,
                       UserName, Pass_Word, Ftp_Dir, T_Mode,
                       File_Name_String, LocalDrive, LocalFolder);

    /* Create a script file on the PC that automatically */
    /* gets access to the client's FTP site              */
    data _null_;
      file "&ScriptFile_path.\&ScriptFile_name";
      /* Turn the semicolon-delimited list into a blank-delimited one */
      file_name = translate("&File_Name_String", ' ', ';');
      put 'open'        /
          "&IP_address" /
          "&UserName"   /
          "&Pass_Word"  /
          "cd &Ftp_Dir" /
          "&T_Mode"     /
          'prompt'      /
          'mget ' file_name;
    run;

    /* Call and execute the created script file */
    data _null_;
      x 'c:';
      call system("&LocalDrive");
      call system("cd &LocalFolder");
      call system("ftp -s:&ScriptFile_name");
      /* Count the blank-delimited file names in the list */
      /* (DMYTECWD is kept as in the original)            */
      call symput('HowManyFiles',
                  compress(DMYTECWD("&File_Name_String", ' ')));
    run;


    /* Check first whether all files have */
    /* arrived from the client's FTP site */
    %macro scan_file;
      restart: ;
      %do i=1 %to &HowManyFiles;
        %let Q&i=%scan(&File_Name_String, &i, ';');
        %let RR&i=&LocalDrive.\&LocalFolder.\&&Q&i;
        R&i="&&RR&i";
        RCQ&i=fileexist(R&i);
        /* Write the full path of any file that is still missing */
        if RCQ&i=0 then put R&i "has not arrived";
      %end;
      if sum(of RCQ1-RCQ&HowManyFiles) < &HowManyFiles then do;
        slept=sleep(3600);   /* Θ */
        call system("ftp -s:&ScriptFile_name");
        goto restart;
      end;
      else if sum(of RCQ1-RCQ&HowManyFiles) = &HowManyFiles then do;
        put '!!! All Files have arrived. !!!';
      end;
    %mend scan_file;

    data _null_;
      %scan_file
    run;

    %mend Script_Ftp;

… Here I have deliberately skipped the code to define macro variable “&Date”, etc. …

    %let files_z=%qcmpres(cust_&date..txt;
                          custcount_&date..txt;
                          trans_&date..txt;
                          transcount_&date..txt);   /* ⊗ */

    %Script_ftp(N:\TR Daily Raw Data, script_1.txt, 172.16.1.3,
                pvm_001_adm, mypassw, /pvm_001/pvmin, binary,
                &files_z, N:, TR Daily Raw Data);

Θ If not all files have arrived, wait 3,600 seconds (1 hour) and then rerun the script, which automatically gets access to the client’s FTP site again.

⊗ The list of all daily files that are expected to arrive on the client’s FTP site.
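Once the files are on the local drive, one natural QC step is to compare the number stated in each “count” file with the number of records actually received. The following is only a minimal sketch of that check, not the author's code. It assumes that a count file holds a single number on its first line, that a data file holds one record per line with no header line, and that &date has a hypothetical value; the macro name check_count is also an assumption.

```sas
/* Sketch only: file layout, the date value and the macro name are assumptions. */
%let date = 20011001;                 /* hypothetical date stamp              */
%let dir  = N:\TR Daily Raw Data;     /* local folder from the %Script_ftp call */

%macro check_count(prefix);
  %local expected actual;
  %let actual = 0;

  /* Expected count: assumed to be a single number on the first line */
  data _null_;
    infile "&dir.\&prefix.count_&date..txt" obs=1;
    input expected;
    call symputx('expected', expected);
  run;

  /* Actual count: one record per line, no header line assumed */
  data _null_;
    infile "&dir.\&prefix._&date..txt" end=last;
    input;
    if last then call symputx('actual', _n_);
  run;

  %if &expected = &actual %then
    %put NOTE: &prefix._&date..txt has &actual records, as expected.;
  %else
    %put WARNING: &prefix._&date..txt has &actual records, expected &expected..;
%mend check_count;

%check_count(cust)
%check_count(trans)
```

Writing the result to the log with NOTE: and WARNING: prefixes keeps the check easy to find in a scheduled run; a real job would more likely stop the update or send an e-mail when the counts differ.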

After you have successfully downloaded all files from the FTP site (the “mget” command), your program can automatically convert them into SAS data sets, do QC, compare the values from the “count” files (custcount_&date..txt and transcount_&date..txt) with the actual numbers of records in the “customer” and “transaction” files, create the *.html report and automatically e-mail it to you (see “Is ‘Stup_ID’ really stupid?”, S. Sian (2000)).

DEFINE “EXTRA” AND “MISSING” STORES JUST FOR “CORE” DATES

Suppose that you have to update a database every week. You should set all daily files together (7 “customer” and 7 “transaction” files, which belong to the “core” dates for that particular week), QC them and implement your “update” logic. Of course you can do it in SAS!

The first obvious QC question is, “Do I have weekly ‘customer’ and ‘transaction’ information from all thousand stores?” (all of which should be reflected in a “store” file). If not, then you have “missing” stores. Or you might have “customer” or “transaction” information from stores that are not part of the “store” file (“extra” stores, newly opened or reopened after restoration).

Say you want to find “missing” stores by date (by running a cross-tabulation on “store*transdate”). It is very common for a “transaction” file to contain information not just for “core” dates but also for several previous dates (so-called “late pull”). If just 3 stores out of a thousand have “late pull” information, the report will show 997 records for “missing” stores for a previous (not “core”) date. The following SAS code (which is part of a macro) might help you avoid this problem.

    proc freq data=trans_file noprint;
      tables TRANDATE / missing list out=one;
    run;

    data one;
      set one(keep=count TRANDATE);
      if count >= 11500;   /* ♣ */
    run;


    data _null_;
      set one;
      %do i=1 %to 7;   /* ♠ */
        if _N_=&i then call symput("coredt&i", put(TRANDATE, 9.));
      %end;
    run;

    proc freq data=TRANSFILE;
      tables trandate*store / missing list;
      where flag='MISSING STORE #' and
            (trandate=&coredt1 or trandate=&coredt2 or
             trandate=&coredt3 or trandate=&coredt4 or
             trandate=&coredt5 or trandate=&coredt6 or
             trandate=&coredt7);
      title "Missing Stores by Transaction Date";
    run;

♣ Since the majority of “transaction” records belong to “core” dates, set a reasonable limit to cut off “late pull” records.

♠ Create (through a DO loop) 7 macro variables that hold the values of just the 7 “core” dates. Name them “coredt1”, “coredt2”, “coredt3”, etc.
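The seven separate macro variables could also be replaced by a single list. The sketch below is an alternative, not the author's method: it reuses the data set and variable names from the code above (one, TRANSFILE, trandate, store, flag) and assumes that at least one date passes the cutoff. The macro variable name coredates is an assumption.

```sas
/* Sketch only: an alternative to the seven coredt macro variables. */
proc sql noprint;
  /* Collect every date that passes the "core" cutoff into one  */
  /* comma-separated list of unformatted SAS date values        */
  select put(trandate, 9.)
    into :coredates separated by ','
    from one
    where count >= 11500;
quit;

proc freq data=TRANSFILE;
  tables trandate*store / missing list;
  where flag='MISSING STORE #' and trandate in (&coredates);
  title "Missing Stores by Transaction Date";
run;
```

Because the list is built from whatever dates pass the cutoff, this form does not depend on there being exactly seven “core” dates in the week.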

Somewhere before this code, the flag for “missing” stores has been assigned.

HAVE ALL (AND NECESSARY) TRANSACTIONS BEEN LOADED INTO THE DATABASE?

Many retail stores use a point of sale (POS) system to collect and keep “customer” and “transaction” information. This system registers different types of transactions, based on a combination of “Program ID” and “Transaction Code” (program_id=CASH and trancode=100 means “Taxable Merchandise”; program_id=CASH and trancode=160 means “Taxable Promotion Discount”; program_id=RTRN and trancode=101 means “Returned Taxable Merchandise Not-Damaged”, etc.). Based on business and marketing rules, you might want to keep (load during the update process) only certain types of transactions in your database. That is why it is important to get the distribution of “program_id*trancode” from the RAW data and compare it with the same distribution from the database after the update process is finished. Of course, they should match “apples to apples”. I have deliberately skipped the SAS code, because it depends on the particular business and marketing rules for a particular client. But an SQL statement (to extract data from the database) and PROC FREQ are definitely involved in this QC procedure. Below you can find the automatically produced results (an Excel document) of this SAS program.

prgid   trancode      COUNT

CASH    100       1,388,421
CASH    102          48,394
CASH    160         169,602
EMPL    300          14,358
EMPL    302             260
EMPL    350           5,189
EMPL    360           9,224
ERTN    301             166
ERTN    303               7
ERTN    351              71
ERTN    361              59
EXCH    100           7,835
EXCH    101           6,775
EXCH    102             318
EXCH    103             187
EXCH    160           1,408
EXCH    161             355
LCAN    600             375
LCAN    602              41
LCAN    660             187
LINT    600           1,925
LINT    602             410
LINT    660           1,114
RTRN    101          16,972
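The comparison itself can be outlined in general terms, even though the real rules are client-specific. The sketch below is not the skipped program. The data set raw_trans, the libref db, the table db.transactions and the WHERE rule are all hypothetical; only the variable names prgid, trancode and count come from the report above.

```sas
/* Sketch only: data set, libref and rule below are assumptions. */

/* Distribution in the raw data, restricted to the combinations */
/* that a (hypothetical) business rule says should be loaded    */
proc freq data=raw_trans noprint;
  where prgid in ('CASH', 'EMPL', 'RTRN');        /* hypothetical rule */
  tables prgid*trancode / missing out=raw_dist(drop=percent);
run;

/* The same distribution from the database after the update;     */
/* db is assumed to be a libref already assigned to the database */
proc sql;
  create table db_dist as
    select prgid, trancode, count(*) as count
    from db.transactions
    group by prgid, trancode
    order by prgid, trancode;
quit;

/* Keep only combinations that are missing on one side */
/* or whose counts differ between raw data and database */
data mismatches;
  merge raw_dist(rename=(count=raw_count) in=in_raw)
        db_dist (rename=(count=db_count)  in=in_db);
  by prgid trancode;
  if not (in_raw and in_db) or raw_count ne db_count;
run;
```

An empty mismatches data set would mean that the two distributions agree; any rows left in it point to the combinations that need a closer look.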


CONCLUSION

This SAS code and the approaches discussed demonstrate how SAS might be used to keep, update and QC a database. This paper touches on the common issues related to a database (how to automate getting raw data from a client, how to analyze only the “right” dates, how to QC the update process, and how to evaluate the “state of health” of your database).

ACKNOWLEDGMENTS

Song Jungdong contributed extensively to the development of this paper. His support and suggestions are greatly appreciated.

PreVision is a customer relationship management agency specializing in the development and implementation of relational database and relationship marketing strategies, including comprehensive customer loyalty, upgrade and acquisition programs.

PreVision provides strategic, analytic, creative and mail production services, along with support in the selection and efficient use of the newest database technologies.

AUTHOR CONTACT INFORMATION

Sergey Sian
PreVision, LLC
55 Old Bedford Road
Lincoln, MA 01773
Direct: (781) 259-5169
Fax: (781) 259-1548
Fax: (781) 259-1704
E-mail: [email protected]

TRADEMARK INFORMATION

SAS is a registered trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.