

PharmaSUG 2020 – Paper AD-146

MKADRGM: A Macro to Transform Drug-Level SDTM Data into Traceable, Regimen-Level ADaM Data Sets

Sara McCallum, Harvard T.H. Chan School of Public Health, Center for Biostatistics in AIDS Research (CBAR)

ABSTRACT

In our support of the NIH-funded AIDS Clinical Trials Group (ACTG) and International Maternal Pediatric Adolescent AIDS Clinical Trials (IMPAACT) networks, many of our studies enroll participants who concurrently take multiple medications for the treatment of HIV and other diseases. For analysis and presentation to investigators, our statisticians need an ADaM data set that aggregates all drugs in a regimen at a given time into a single record representing the period during which those drugs were taken. Such a data set is used to summarize regimens, identify regimen changes or treatment gaps, and help explain other events on study (e.g., occurrence of adverse events or emergence of drug resistance).

This paper presents our organization's SAS macro, MKADRGM. The macro accepts WHODrug-coded CDISC domains recorded at the drug level and outputs an ADaM data set at the regimen level with start and stop dates. Other variables in the analysis data set include the regimen duration, drug counts, and drug class information. An intermediate data set is used to facilitate traceability to the source SDTM domain(s). The macro was developed for HIV regimens but also works for other therapeutic areas in which our organization is involved, including TB, HCV, and HBV.

This paper highlights some of the inner workings of the macro and demonstrates a novel use of the SRC* triplicate (SRCVAR, SRCDOM, and SRCSEQ) to provide traceability even when multiple source SDTM records and domains are mapped to a single regimen in the analysis data set.

INTRODUCTION

Customarily, studies investigate the effect of a single drug. However, treatment regimens often include the drug of interest as well as the other drugs in the treatment plan. HIV antiretroviral treatment (ART) typically comprises multi-drug regimens. Drugs in the regimen may be given as a single fixed-dose combination (FDC) pill (i.e., a single pill containing multiple agents), as pills containing individual agents (i.e., one pill per agent), or as a mix of FDC and individual agents. This means that the WHODrug-coded components of a participant's regimen at any given time may be represented as one or more records within the SDTM, possibly across multiple domains (e.g., CM and EC). Aggregating drugs into HIV regimens is complex because components of a regimen are often added, replaced, or stopped at different times during a study. Further adding to regimen complexity is the number of ART options available: at the time of writing, there are 50 FDA-approved agents across 7 drug classes.

Note: The words "drug" and "medication" are used interchangeably in this document.

For the purposes of this paper, the focus and examples are on HIV regimens; however, this macro is also used in our organization to create TB, HCV, and HBV regimens.

DATA PREPARATION

The MKADRGM macro allows multiple input SDTM domains or ADaM data sets. However, each input data set must adhere to the following guidelines:

1. Contain observations from only one of the following SDTM domains: CM, EC, or EX
2. Contain the following variables:

   DOMAIN, STUDYID, USUBJID, --TRT, --DECOD*, --SEQ, --STDT, --ENDT

   * If ADECOD is present, it will be used. If ADECOD is not present, the macro will use --DECOD, where -- is the value in DOMAIN.

3. Contain NO pharmacokinetic (PK) data

To create regimens correctly, the macro cannot process the following types of records:

1. Records that are missing both --STDT and --ENDT
2. Duplicate records on the following variables:

   STUDYID, DOMAIN, USUBJID, --DECOD, --STDT, --ENDT
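These record-level restrictions can be screened for before MKADRGM is called. The following is a minimal SAS sketch, not code from the macro itself; it assumes a CM-like work data set named CM with numeric SAS dates in CMSTDT and CMENDT, and the output data set names CM_NODATES, CM_DATED, CM_CLEAN, and CM_DUPS are hypothetical.

```sas
/* Hypothetical CM-like input with numeric SAS dates in CMSTDT/CMENDT */
data cm ;
  input studyid $ domain $ usubjid $ cmseq cmdecod :$40.
        cmstdt :date9. cmendt :date9. ;
  format cmstdt cmendt date9. ;
  datalines ;
A0000 CM 1001 1 Dolutegravir 13APR2018 .
A0000 CM 1001 2 Dolutegravir 13APR2018 .
A0000 CM 1001 3 Emtricitabine . .
;
run ;

/* Rule 1: set aside records missing both the start and the end date */
data cm_nodates cm_dated ;
  set cm ;
  if missing(cmstdt) and missing(cmendt) then output cm_nodates ;
  else output cm_dated ;
run ;

/* Rule 2: keep one record per key; extra copies are written to CM_DUPS */
proc sort data = cm_dated out = cm_clean nodupkey dupout = cm_dups ;
  by studyid domain usubjid cmdecod cmstdt cmendt ;
run ;
```

CM_NODATES and CM_DUPS then hold the records that would need to be resolved before the data set is passed to the macro.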

REFERENCE DATA

The MKADRGM macro uses reference data, created and centrally maintained by our organization.

REFERENCE SAS DATA SET

The MKADRGM macro uses a WHO Drug Dictionary reference SAS data set (see Appendix A for a sample), which our organization derives from the WHODrug Format B3 Drug Dictionary (DD) table. This data set is updated on the same schedule as WHODrug releases (twice a year) and contains a record for each PREFNM where SEQNUM1='01' and SEQNUM2='001'.

REFERENCE EXCEL SPREADSHEETS

The MKADRGM macro uses a centrally maintained reference spreadsheet (see Appendix B for an example) to identify the drugs relevant to the therapeutic area of choice. Created by our organization, this spreadsheet also includes drug information unavailable in the WHO Drug Dictionary, as well as standard medication abbreviations and drug class information derived from the literature.

If needed, the macro also allows a statistician to replace the centrally maintained reference spreadsheet with a custom reference spreadsheet, for example when a custom sort order is preferred or when medications outside the WHO Drug Dictionary are taken on study.

Spreadsheets, which are easily read with the XLSX LIBNAME engine, were chosen over SAS data sets for storing these data because they are easy to customize: a statistician can reorder several drugs with just a few clicks, whereas achieving the same result in a SAS data set requires several ordering variables and sort statements.

MKADRGM

The call to MKADRGM below takes the data provided in INDATA and produces, in the OUTLIB location, two output data sets (an intermediate data set and an analysis data set) as well as a problems listing. For more information on how the output data sets are named, see Appendix C. Parameters marked (R) are required; those marked (O) are optional:

%let STDY = /home/fake/study/location ;

%MKADRGM(
   INDATA   = cm                  /* (R) Valid domains: CM, EC, EX, CM and EC, or CM and EX */
  ,OUTLIB   = derived             /* (R) Library reference where output is stored */
  ,RGMCAT   = HIV                 /* (R) Category of regimen, therapeutic area of interest */
  ,CLASSORD = OFF                 /* (O) Sort regimen by drug class, see below */
  ,CUSTREF  = &STDY./CSTM.xlsx    /* (O) Location of optional custom reference spreadsheet */
  ,DELIM    = space               /* (O) Delimiter to separate regimen with multiple drugs */
  ,USUBJL   = %str('99999')       /* (O) List USUBJID(s) to subset data sets for debugging */
  ,DEBUG    = yes                 /* (O) Print data sets when experiencing errors */
);

The RGMCAT parameter defines the therapeutic area for which the regimen data set is built. MKADRGM subsets the input data set(s) to the chosen regimen category by merging them with the centrally maintained WHO Drug Dictionary reference data set on the --DECOD variable. In this example, the macro merges with the list of HIV drugs (as defined by our organization) and keeps only those drugs, which are then used to build the HIV regimens.
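The subsetting step can be pictured as a join between the drug-level input and the reference data. This sketch is illustrative only: it uses a PROC SQL inner join in place of the macro's merge, a tiny hypothetical slice of the reference data (the variable names DrugName, AABRV, ACLAS, and RGMCAT are borrowed from the code in Appendix H), and the CM records from the traceability example later in this paper.

```sas
/* Hypothetical slice of the reference data; variable names follow Appendix H */
data rgmdata ;
  infile datalines dlm = '|' ;
  input DrugName :$80. aabrv :$40. aclas :$40. rgmcat :$3. ;
  datalines ;
DOLUTEGRAVIR|DTG|INSTI|HIV
EMTRICITABINE|FTC|NRTI|HIV
TENOFOVIR ALAFENAMIDE|TAF|NRTI|HIV
;
run ;

/* Drug-level CM records from the traceability example */
data cm ;
  infile datalines dlm = '|' ;
  input usubjid :$8. cmseq cmtrt :$40. cmdecod :$40. cmstdt :date9. ;
  format cmstdt date9. ;
  datalines ;
1001|1|Certirizine|Cetirizine|13APR2018
1001|2|Dolutegravir|Dolutegravir|13APR2018
1001|3|Emtricitabine|Emtricitabine|13APR2018
1001|4|Mometasone nasal|Mometasone|13APR2018
1001|5|Tenofovir alafenamide|Tenofovir alafenamide|13APR2018
;
run ;

/* Keep only CM records whose decoded name is an HIV drug in the reference */
proc sql ;
  create table cm_hiv as
    select c.*, r.aabrv, r.aclas
      from cm as c
           inner join rgmdata as r
           on upcase(c.cmdecod) = r.DrugName
      where r.rgmcat = 'HIV'
      order by c.usubjid, c.cmseq ;
quit ;
```

CM_HIV keeps CMSEQ 2, 3, and 5; Cetirizine and Mometasone drop out because they are not on the HIV list.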

MKADRGM sorts the drugs within a regimen, allowing for highly customized reporting. The macro sorts first by drug class and then, within drug class, by drug. Each therapeutic area has both a default class sort order and a default within-class drug sort order. For instance, HIV regimens that use the defaults will have any non-nucleoside reverse transcriptase inhibitors (NNRTIs) at the beginning of the regimen, and within the NNRTIs, drugs will be sorted alphabetically by active ingredient. To select a class sort order other than the default, enter an ordered list of drug classes in the CLASSORD parameter. If a different within-class drug sort order is preferred, a custom spreadsheet is needed (see below). Sorting by class can also be turned off with CLASSORD = OFF, in which case the drug order of the reference spreadsheet is used; this is usually combined with a custom spreadsheet when an entirely custom drug order is desired.
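The class-then-drug ordering can be sketched as a rank lookup followed by a sort. In the sketch below, which illustrates one possible approach rather than MKADRGM's actual code, the class order is the default HIV order assigned to DFLTCLASSORD_HIV in Appendix H, the regimen records are hypothetical, and drugs within a class are sorted alphabetically by ADECOD as a stand-in for active ingredient.

```sas
/* Default HIV class order, as assigned to DFLTCLASSORD_HIV in Appendix H */
%let classord = NNRTI FI EI INSTI PI PE NRTI ;

/* One hypothetical regimen period, one row per drug */
data rgmdrugs ;
  infile datalines dlm = '|' ;
  input usubjid :$8. rgmnum adecod :$40. aabrv :$8. aclas :$8. ;
  datalines ;
1001|1|Tenofovir alafenamide|TAF|NRTI
1001|1|Dolutegravir|DTG|INSTI
1001|1|Doravirine|DOR|NNRTI
1001|1|Emtricitabine|FTC|NRTI
;
run ;

/* Rank each drug by the word position of its class in &CLASSORD */
data ranked ;
  set rgmdrugs ;
  classrank = findw("&classord", strip(aclas), ' ', 'e') ;
run ;

/* Sort by class rank, then alphabetically by drug within class */
proc sort data = ranked ;
  by usubjid rgmnum classrank adecod ;
run ;

/* Concatenate the abbreviations into a slash-delimited regimen code */
data regimens (keep = usubjid rgmnum argmcd) ;
  set ranked ;
  by usubjid rgmnum ;
  length argmcd $200 ;
  retain argmcd ;
  if first.rgmnum then argmcd = '' ;
  argmcd = catx('/', argmcd, aabrv) ;
  if last.rgmnum then output ;
run ;
```

With this input, ARGMCD is DOR/DTG/FTC/TAF. Under CLASSORD = OFF, the class rank could simply be replaced by a (hypothetical) sort-order column read from the reference spreadsheet.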

The CUSTREF parameter allows statisticians to replace the centrally maintained reference spreadsheet with a custom reference spreadsheet to further customize regimens in several ways. First, a custom reference spreadsheet allows for a custom drug sort order, for instance sorting a background therapy to the beginning of the regimen. Second, statisticians can add medications that are not in the WHO Drug Dictionary to the custom spreadsheet, allowing study treatments that are still seeking approval to be included in treatment regimens. Lastly, the custom spreadsheet allows for cross-therapy regimens. Our organization is often interested in the comorbidities associated with HIV, and the SDTM data structure has made reporting on cross-therapy regimens possible. Statisticians simply combine the lists of medications from the two therapeutic areas (for instance, HIV and TB) in the custom spreadsheet, and the macro produces regimens containing both TB and HIV drugs.

The DELIM parameter simply allows users to choose the character to insert between drugs in multi-drug regimens:

Option      Example
slash (D)   ABC/ddI/ATV/SQV
semicolon   ABC;ddI;ATV;SQV
comma       ABC,ddI,ATV,SQV
space       ABC ddI ATV SQV

(D) = default
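A possible translation of the DELIM keyword into a separator character is sketched below. The macro variable and the SELECT logic are illustrative assumptions rather than the macro's internals; unrecognized values fall back to a slash, mirroring the default.

```sas
%let delim = semicolon ;   /* one of: slash, semicolon, comma, space */

data delimdemo ;
  length sep $1 argmcd $200 ;
  /* Translate the DELIM keyword into the separator character */
  select (upcase("&delim")) ;
    when ('SEMICOLON') sep = ';' ;
    when ('COMMA')     sep = ',' ;
    when ('SPACE')     sep = ' ' ;
    otherwise          sep = '/' ;   /* SLASH, and the fallback default */
  end ;
  /* Join the drug abbreviations with the chosen separator */
  argmcd = catx(sep, 'ABC', 'ddI', 'ATV', 'SQV') ;
  put argmcd= ;
run ;
```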

Lastly, the optional USUBJL parameter complements the DEBUG parameter: it subsets the data sets to the listed USUBJID(s) to aid in debugging.

TRACEABILITY: FROM INTERMEDIATE TO ANALYSIS DATA SET

The largest traceability challenge is identifying which drugs in a regimen come from which SDTM domains, since regimens can contain drugs from multiple domains. Without an intermediate data set, this required an indexed SRCDOM (SRCDOMxx) as well as an indexed SRCSEQ (SRCSEQxx) in the analysis data set. However, since some regimens contain 5 or 6 drugs, this quickly added an unwieldy number of variables to the analysis data set, making it difficult to interpret. Even with the variable pairs placed next to each other, users had more trouble tracing records back to the SDTM than when using an intermediate data set. Our organization therefore decided that the better solution was to output an intermediate data set.

The intermediate data set closely resembles the source SDTM domain(s) in data set structure (see Appendices D and E for a contents listing and print of an example intermediate data set). Like the SDTM domains, it is at the drug level: one record per drug per dosing interval. The main difference between the input SDTM domain(s) and the intermediate data set is the addition of several variables, including the SRC* triplicate (SRCDOM, SRCVAR, and SRCSEQ), and, most importantly, the assignment of a new ASEQ. The intermediate data set is essential because both it and the analysis data set can be composed of multiple SDTM domains, so all three SRC* variables are needed for traceability. The new ASEQ and SRCDOM variables become especially important when regimens contain drugs from multiple domains.

SRCVAR is also especially important in the intermediate data set because it records the --DECOD variable on which the merge to the WHO Drug Dictionary reference data set is performed. This can be different for each input data set. If ADECOD is present in the input data set(s), it will be used. If it is not, --DECOD will be used, where -- is the value in DOMAIN.
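A minimal sketch of how such an intermediate data set could be assembled from two domains follows. It assumes HIV-only CM and EC records taken from the example below, derives SRCDOM from DOMAIN and SRCVAR from DOMAIN plus "DECOD", and assigns ASEQ within subject by start date and drug name; that sort order is an assumption, not necessarily the one MKADRGM uses.

```sas
/* Hypothetical HIV-only CM and EC records */
data _cm ;
  infile datalines dlm = '|' ;
  input domain :$2. usubjid :$8. cmseq cmdecod :$40. cmstdt :date9. cmendt :date9. ;
  format cmstdt cmendt date9. ;
  datalines ;
CM|1001|2|Dolutegravir|13APR2018|.
CM|1001|3|Emtricitabine|13APR2018|.
CM|1001|5|Tenofovir alafenamide|13APR2018|.
;
run ;

data _ec ;
  infile datalines dlm = '|' ;
  input domain :$2. usubjid :$8. ecseq ecdecod :$40. ecstdt :date9. ecendt :date9. ;
  format ecstdt ecendt date9. ;
  datalines ;
EC|1001|1|Doravirine|19JUL2018|19JUL2018
;
run ;

/* Stack the domains and record where each drug came from */
data stacked ;
  length srcdom $8 srcvar $8 adecod $40 ;
  set _cm _ec ;
  srcdom = domain ;
  srcvar = cats(domain, 'DECOD') ;     /* CMDECOD or ECDECOD              */
  srcseq = coalesce(cmseq, ecseq) ;    /* --SEQ of the source record      */
  adecod = coalescec(cmdecod, ecdecod) ;
  astdt  = coalesce(cmstdt, ecstdt) ;
  aendt  = coalesce(cmendt, ecendt) ;
  format astdt aendt date9. ;
  keep usubjid srcdom srcvar srcseq adecod astdt aendt ;
run ;

/* Assign a new ASEQ within subject, ordered by start date and drug */
proc sort data = stacked out = admhivb ;
  by usubjid astdt adecod ;
run ;

data admhivb ;
  set admhivb ;
  by usubjid ;
  if first.usubjid then aseq = 0 ;
  aseq + 1 ;
run ;
```

With these records, ASEQ 1 to 3 point back to CMSEQ 2, 3, and 5 and ASEQ 4 points back to ECSEQ 1, matching the admhivb example that follows.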

To transform the intermediate data set into the analysis regimen data set (see Appendices F and G for a contents listing and print of an example analysis data set), MKADRGM uses a series of double SET statements to preview the next drug and find starts, stops, and single-dose medications. This allows the macro to correctly handle complicated regimen changes common in HIV while maintaining traceability from the input SDTM records to the output analysis records. Each ASEQ from the intermediate data set becomes one of the indexed SRCSEQs in the analysis data set, allowing the user to trace each drug back to its SDTM domain:

_cm data set:

DOMAIN  USUBJID  CMSEQ  CMTRT                  CMDECOD                CMSTDT     CMENDT
CM      1001     1      Certirizine            Cetirizine             13APR2018  .
                 2      Dolutegravir           Dolutegravir           13APR2018  .
                 3      Emtricitabine          Emtricitabine          13APR2018  .
                 4      Mometasone nasal       Mometasone             13APR2018  .
                 5      Tenofovir alafenamide  Tenofovir alafenamide  13APR2018  .

Note: Some variables have been removed for the purposes of this example.

_ec data set:

DOMAIN  USUBJID  ECSEQ  ECTRT       ECDECOD     ECSTDT     ECENDT
EC      1001     1      Doravirine  Doravirine  19JUL2018  19JUL2018

Note: Some variables have been removed for the purposes of this example.

admhivb data set:

USUBJID  ASEQ  ACAT  ADECODCD  ASTDT      AENDT      ACLAS  SRCDOM  SRCVAR   SRCSEQ
1001     1     HIV   DTG       13APR2018  .          INSTI  CM      CMDECOD  2
         2     HIV   FTC       13APR2018  .          NRTI   CM      CMDECOD  3
         3     HIV   TAF       13APR2018  .          NRTI   CM      CMDECOD  5
         4     HIV   DOR       19JUL2018  19JUL2018  NNRTI  EC      ECDECOD  1

Note: Some variables have been removed for the purposes of this example; see Appendices D and E for a list and print of all of the variables in the data set.

adhivb data set:

USUBJID  ASEQ  ACAT  ARGMCD           ASTDT      AENDT      SRCDOM   SRCVAR  SRCSEQ01  SRCSEQ02  SRCSEQ03  SRCSEQ04
1001     1     HIV   FTC/TAF/DTG      13APR2018  18JUL2018  ADMHIVB  ADECOD  1         2         3         .
         2     HIV   DOR/FTC/TAF/DTG  19JUL2018  19JUL2018  ADMHIVB  ADECOD  1         2         3         4
         3     HIV   FTC/TAF/DTG      20JUL2018  .          ADMHIVB  ADECOD  1         2         3         .

Note: Some variables have been removed for the purposes of this example; see Appendices F and G for a list and print of all of the variables in the data set.
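The look-ahead idea can be illustrated with a minimal sketch. This is not MKADRGM's algorithm: it swaps in a change-point approach (every start date, and every day after an end date, opens a new period) and uses a double SET statement only to read the next change point. The input reproduces the admhivb example above; data set names are hypothetical, and drugs are concatenated in ASEQ order rather than in class order.

```sas
/* Intermediate records from the admhivb example */
data adm ;
  infile datalines dlm = '|' ;
  input usubjid :$8. aseq adecodcd :$8. astdt :date9. aendt :date9. ;
  format astdt aendt date9. ;
  datalines ;
1001|1|DTG|13APR2018|.
1001|2|FTC|13APR2018|.
1001|3|TAF|13APR2018|.
1001|4|DOR|19JUL2018|19JUL2018
;
run ;

/* 1. Change points: every start date and the day after every end date */
data cp (keep = usubjid cpdt) ;
  set adm ;
  cpdt = astdt ; output ;
  if not missing(aendt) then do ; cpdt = aendt + 1 ; output ; end ;
run ;

proc sort data = cp nodupkey ;
  by usubjid cpdt ;
run ;

/* 2. Double SET look-ahead: the next change point closes the current period */
data periods (keep = usubjid pstart pend) ;
  set cp end = eof ;
  if not eof then set cp (firstobs = 2 keep = usubjid cpdt
                          rename = (usubjid = nxt_usubjid cpdt = nxt_cpdt)) ;
  else call missing(nxt_usubjid, nxt_cpdt) ;
  pstart = cpdt ;
  if nxt_usubjid = usubjid then pend = nxt_cpdt - 1 ;
  else pend = . ;                         /* last period is open-ended */
  format pstart pend date9. ;
run ;

/* 3. Drugs active on each period's start date */
proc sql ;
  create table active as
    select p.usubjid, p.pstart, p.pend, a.aseq, a.adecodcd
      from periods as p
           inner join adm as a
           on  p.usubjid = a.usubjid
           and a.astdt <= p.pstart
           and (a.aendt >= p.pstart or a.aendt is missing)
      order by p.usubjid, p.pstart, a.aseq ;
quit ;

/* 4. One record per period: regimen code plus indexed SRCSEQnn */
data adhivb (keep = usubjid astdt aendt argmcd srcseq01-srcseq04) ;
  set active ;
  by usubjid pstart ;
  length argmcd $200 ;
  array srcseq {4} srcseq01-srcseq04 ;
  retain argmcd n srcseq01-srcseq04 ;
  if first.pstart then do ;
    argmcd = '' ;
    n = 0 ;
    call missing(of srcseq{*}) ;
  end ;
  argmcd = catx('/', argmcd, adecodcd) ;
  n + 1 ;
  if n <= dim(srcseq) then srcseq{n} = aseq ;
  if last.pstart then do ;
    astdt = pstart ;
    aendt = pend ;
    output ;
  end ;
  format astdt aendt date9. ;
run ;
```

The three periods produced (13APR2018 to 18JUL2018, 19JUL2018 to 19JUL2018, and 20JUL2018 onward) carry SRCSEQ values 1 2 3, 1 2 3 4, and 1 2 3, as in the adhivb example. Periods with no active drug are dropped by the inner join, so treatment gaps would need separate handling.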

HANDLING SITUATIONAL CASES

SAME DAY START/STOP

When a participant comes in for a visit and the regimen is altered, the start date of the new regimen is recorded as its prescription date. It is assumed that the participant has not yet taken their medication that day and will take the first dose of the new regimen on the day it is prescribed. For this reason, when one regimen ends on the same day the next regimen starts, the macro subtracts one day from the end date of the earlier regimen. For example:

_cm data set:

USUBJID  CMSEQ  CMTRT       CMDECOD              CMSTDT     CMENDT
1001     1      Saquinavir  Saquinavir           01AUG1996  01DEC2010
         2      Epzicom     Abacavir;Lamivudine  01DEC2010  .

adhivc data set:

USUBJID  ASEQ  ARGM                 ARGMCD   ASTDT      AENDT
1001     1     Saquinavir           SQV      01AUG1996  30NOV2010
         2     Abacavir;Lamivudine  ABC/3TC  01DEC2010  .
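The same-day rule can be sketched with a double SET look-ahead applied to regimen-level records. This is an illustrative sketch using the Saquinavir/Epzicom example; data set names are hypothetical, and the guard on a missing AENDT is an added assumption so that ongoing regimens are never altered.

```sas
/* Regimen-level records from the Saquinavir/Epzicom example */
data rgm ;
  infile datalines dlm = '|' ;
  input usubjid :$8. aseq argmcd :$20. astdt :date9. aendt :date9. ;
  format astdt aendt date9. ;
  datalines ;
1001|1|SQV|01AUG1996|01DEC2010
1001|2|ABC/3TC|01DEC2010|.
;
run ;

data rgm_adj (drop = nxt_usubjid nxt_astdt) ;
  set rgm end = eof ;
  /* Look ahead: read the next record's subject and start date */
  if not eof then set rgm (firstobs = 2 keep = usubjid astdt
                           rename = (usubjid = nxt_usubjid astdt = nxt_astdt)) ;
  else call missing(nxt_usubjid, nxt_astdt) ;
  /* Next regimen starts on this end date: end this regimen the day before */
  if nxt_usubjid = usubjid and not missing(aendt) and aendt = nxt_astdt then
    aendt = aendt - 1 ;
run ;
```

After this step, the Saquinavir record ends on 30NOV2010, as in the adhivc example.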


TREATMENT GAPS

Blank records are inserted for treatment gaps. It is often important to understand when a participant is off treatment and how long the interruption lasted, and the treatment gap records in the analysis regimen data set make this easy to summarize. For example, instead of outputting the records below:

USUBJID  ASEQ  ARGMCD            ASTDT      AENDT      RGMDUR  TRTTOT  SRCSEQ01  SRCSEQ02
1001     1     FTC TDF           01JUL2013  01JUL2015  731     2       1         .
         2     FTC TAF DRV COBI  15SEP2015  .          .       4       2         3

the macro outputs these records instead:

USUBJID  ASEQ  ARGMCD            ASTDT      AENDT      RGMDUR  TRTTOT  SRCSEQ01  SRCSEQ02
1001     1     FTC TDF           01JUL2013  01JUL2015  731     2       1         .
         2                       02JUL2015  14SEP2015  75      0       .         .
         3     FTC TAF DRV COBI  15SEP2015  .          .       4       2         3
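Gap records can be generated by comparing each regimen's end date with the next regimen's start date. The sketch below shows one way to do this under stated assumptions, using the records from the example above; it computes RGMDUR as AENDT - ASTDT + 1, which reproduces the 731- and 75-day durations shown, and it leaves the SRCSEQ variables out for brevity. Data set names are hypothetical.

```sas
data rgm ;
  infile datalines dlm = '|' ;
  input usubjid :$8. argmcd :$30. astdt :date9. aendt :date9. trttot ;
  format astdt aendt date9. ;
  datalines ;
1001|FTC TDF|01JUL2013|01JUL2015|2
1001|FTC TAF DRV COBI|15SEP2015|.|4
;
run ;

data rgm_gaps (drop = nxt_usubjid nxt_astdt) ;
  set rgm end = eof ;
  /* Look ahead to the next regimen's subject and start date */
  if not eof then set rgm (firstobs = 2 keep = usubjid astdt
                           rename = (usubjid = nxt_usubjid astdt = nxt_astdt)) ;
  else call missing(nxt_usubjid, nxt_astdt) ;

  /* Inclusive duration in days; left missing while a regimen is ongoing */
  if not missing(aendt) then rgmdur = aendt - astdt + 1 ;
  else rgmdur = . ;
  output ;

  /* Next regimen starts more than one day after this one ends: */
  /* emit a drug-free record covering the days in between       */
  if nxt_usubjid = usubjid and not missing(aendt) and nxt_astdt > aendt + 1 then do ;
    argmcd = '' ;
    trttot = 0 ;
    astdt  = aendt + 1 ;
    aendt  = nxt_astdt - 1 ;
    rgmdur = aendt - astdt + 1 ;
    output ;
  end ;
run ;

/* Renumber ASEQ now that gap records are interleaved */
data rgm_gaps ;
  set rgm_gaps ;
  by usubjid ;
  if first.usubjid then aseq = 0 ;
  aseq + 1 ;
run ;
```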

REPEATED REGIMENS ON SEQUENTIAL RECORDS

The analysis data set merges any repeated sequential regimen records (i.e., those without a treatment gap): it selects the first start date and last end date, combines all pertinent SEQs, and adjusts all other corresponding variables accordingly. For example, instead of outputting the records below (see starred records):

  USUBJID  ASEQ  ARGMCD       ASTDT      AENDT      SRCSEQ01  SRCSEQ02  SRCSEQ03
  1001     1     ZDV          01JAN2018  01JAN2018  1         .         .
*          2     NVP/3TC/ZDV  02JAN2018  20JAN2018  2         3         4
*          3     NVP/3TC/ZDV  21JAN2018  04FEB2018  3         4         5

the macro outputs these records instead:

  USUBJID  ASEQ  ARGMCD       ASTDT      AENDT      SRCSEQ01  SRCSEQ02  SRCSEQ03  SRCSEQ04
  1001     1     ZDV          01JAN2018  01JAN2018  1         .         .         .
*          2     NVP/3TC/ZDV  02JAN2018  04FEB2018  2         3         4         5
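One way to collapse repeated sequential regimens is to flag where a new run starts and then keep the first start date, the last end date, and the union of source SEQs within each run. This sketch uses the starred example above; the GRP variable, the contiguity test (AENDT + 1 = next ASTDT), and the cap of six SRCSEQ slots are illustrative assumptions, and ASEQ would still need to be renumbered afterwards.

```sas
data rgm ;
  infile datalines dlm = '|' ;
  input usubjid :$8. aseq argmcd :$20. astdt :date9. aendt :date9.
        srcseq01 srcseq02 srcseq03 ;
  format astdt aendt date9. ;
  datalines ;
1001|1|ZDV|01JAN2018|01JAN2018|1|.|.
1001|2|NVP/3TC/ZDV|02JAN2018|20JAN2018|2|3|4
1001|3|NVP/3TC/ZDV|21JAN2018|04FEB2018|3|4|5
;
run ;

/* New group for a new subject, a different regimen, or a break in dates */
data grouped ;
  set rgm ;
  by usubjid ;
  prev_argmcd = lag(argmcd) ;
  prev_aendt  = lag(aendt) ;
  if first.usubjid or argmcd ne prev_argmcd or astdt ne prev_aendt + 1 then grp + 1 ;
run ;

/* Collapse each group: first start, last end, union of source SEQs */
data merged (keep = usubjid argmcd astdt aendt srcseq01-srcseq06) ;
  set grouped (rename = (astdt = in_astdt aendt = in_aendt
                         srcseq01 = in1 srcseq02 = in2 srcseq03 = in3)) ;
  by grp ;
  array inseq  {3} in1-in3 ;
  array outseq {6} srcseq01-srcseq06 ;
  retain astdt nseq srcseq01-srcseq06 ;
  if first.grp then do ;
    astdt = in_astdt ;
    nseq = 0 ;
    call missing(of outseq{*}) ;
  end ;
  do i = 1 to dim(inseq) ;
    /* Keep each non-missing SEQ once, in the order first seen */
    if not missing(inseq{i}) and nseq < dim(outseq) then do ;
      if whichn(inseq{i}, of outseq{*}) = 0 then do ;
        nseq + 1 ;
        outseq{nseq} = inseq{i} ;
      end ;
    end ;
  end ;
  if last.grp then do ;
    aendt = in_aendt ;
    output ;
  end ;
  format astdt aendt date9. ;
run ;
```

The second output record spans 02JAN2018 to 04FEB2018 with SRCSEQ01 to SRCSEQ04 equal to 2, 3, 4, and 5.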

ADaM OTHER DATA SETS

While the intermediate data set can be classified as ADaM OCCDS [2], the analysis data set does not parallel the structure of the intermediate data set and is classified as ADaM OTHER for the following reasons:

• The analysis data set is not used for the counting of subjects with at least one occurrence of a given term.

• The analysis data set does not contain the same number of records as its corresponding domain, and is heavily transformed from the input domain(s).

• The analysis data set has no reasonable value being analyzed (AVAL) or description of the value being analyzed (PARAM); thus, it is not a BDS.


Both output data sets were presented at the 2019 CDISC US Interchange [1] in an effort to establish standards for regimen data sets.

CONCLUSION

The analysis data set is in a format that is clinically intuitive and easy to work with when obtaining summary information, as in the following example, which summarizes baseline regimens:

Summary of Baseline ARV Regimen

                                     Treatment Group
Characteristic                       A (N=61)    B (N=62)    Total (N=123)
Initial regimen   MVC/DTG/3TC/TDF    0 (0%)      1 (2%)      1 (1%)
                  EVG/COBI/FTC/TAF   1 (2%)      0 (0%)      1 (1%)
                  DTG/FTC/TAF        17 (28%)    6 (10%)     23 (19%)
                  DTG/3TC/TDF        1 (2%)      1 (2%)      2 (2%)
                  DRV/COBI/FTC/TAF   41 (67%)    54 (87%)    95 (77%)
                  EFV/3TC/TDF        1 (2%)      0 (0%)      1 (1%)
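A summary like this one could be produced from the analysis data set in a few steps. In the sketch below, the initial regimen is assumed to be the record with ASEQ = 1, the treatment group is assumed to come from TRT01P in an ADSL data set, and the three participants are invented for illustration; the output data set BASEFREQ holds counts and column percentages that a reporting step could format.

```sas
/* Hypothetical regimen records (ADHIVC-like) and subject-level groups */
data adhivc ;
  infile datalines dlm = '|' ;
  input usubjid :$8. aseq argmcd :$30. ;
  datalines ;
1001|1|DTG/FTC/TAF
1001|2|DRV/COBI/FTC/TAF
1002|1|DRV/COBI/FTC/TAF
1003|1|DTG/FTC/TAF
;
run ;

data adsl ;
  infile datalines dlm = '|' ;
  input usubjid :$8. trt01p :$1. ;
  datalines ;
1001|A
1002|B
1003|A
;
run ;

/* Assumption: the initial regimen is the record with ASEQ = 1 */
proc sql ;
  create table baseline as
    select r.usubjid, r.argmcd, s.trt01p
      from adhivc (where = (aseq = 1)) as r
           inner join adsl as s
           on r.usubjid = s.usubjid ;
quit ;

/* Counts plus column percentages within each treatment group */
proc freq data = baseline noprint ;
  tables argmcd * trt01p / outpct out = basefreq ;
run ;
```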

Treatment gaps and suboptimal regimens, which are often critical to understanding clinical events, also become readily apparent in the data set. Note, however, that the analysis data set produced by the macro is often only a starting point for statisticians before producing TLFs, and it sometimes becomes the input for a BDS or OCCDS data set to support more complex analyses.

In short, this macro creates an invaluable regimen data set, while maintaining traceability.

REFERENCES

[1] McCallum, S., Ellingson, A., and Ritz, J. October 2019. “Traceable, Regimen-Level ADaM Dataset Derived from Drug-Level SDTM Data for HIV Studies”. Poster. 2019 CDISC US Interchange, San Diego, CA: CDISC.

[2] ADaM Structure for Occurrence Data (ADaM_OCCDS). Version 1.0. Feb 12, 2016. CDISC.

[3] Analysis Data Model Implementation Guide (ADaMIG). Version 1.2. Oct 3, 2019. CDISC.

ACKNOWLEDGEMENTS

This work was supported by the Statistical and Data Management Center (SDMC), AIDS Clinical Trials Group (ACTG), under the National Institute of Allergy and Infectious Diseases (NIAID) grant No. UM1 AI068634, Harvard fund 114660.

This work was supported by the Statistical and Data Management Center (SDMC) – IMPAACT Leadership Group under Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) and National Institute of Allergy and Infectious Diseases (NIAID) grant No. UM1 AI068616, Harvard fund 114661.

CONTACT INFORMATION

Comments and questions are encouraged. Contact the author at:

[email protected]



APPENDIX

Appendix A Sample of HIV Drugs in the WHO Drug Dictionary Reference Data Set

WHODrug Medication Name                          Common Name                       Medication Abbreviation (CBAR)  Medication Class (CBAR)  Drug Category
ABACAVIR                                         ABACAVIR                          ABC                             NRTI                     HIV
ABACAVIR HYDROCHLORIDE                           ABACAVIR                          ABC                             NRTI                     HIV
ABACAVIR HYDROCHLORIDE;LAMIVUDINE                ABACAVIR;LAMIVUDINE               ABC;3TC                                                  HIV
ABACAVIR SUCCINATE                               ABACAVIR                          ABC                             NRTI                     HIV
ABACAVIR SULFATE                                 ABACAVIR                          ABC                             NRTI                     HIV
ABACAVIR SULFATE;DOLUTEGRAVIR SODIUM;LAMIVUDINE  ABACAVIR;DOLUTEGRAVIR;LAMIVUDINE  ABC;DTG;3TC                                              HIV
ABACAVIR SULFATE;LAMIVUDINE                      ABACAVIR;LAMIVUDINE               ABC;3TC                                                  HIV
ABACAVIR SULFATE;LAMIVUDINE;ZIDOVUDINE           ABACAVIR;LAMIVUDINE;ZIDOVUDINE    ABC;3TC;ZDV                                              HIV
ABACAVIR;DOLUTEGRAVIR;LAMIVUDINE                 ABACAVIR;DOLUTEGRAVIR;LAMIVUDINE  ABC;DTG;3TC                                              HIV
ABACAVIR;LAMIVUDINE                              ABACAVIR;LAMIVUDINE               ABC;3TC                                                  HIV
ABACAVIR;LAMIVUDINE;NEVIRAPINE                   ABACAVIR;LAMIVUDINE;NEVIRAPINE    ABC;3TC;NVP                                              HIV
ABACAVIR;LAMIVUDINE;ZIDOVUDINE                   ABACAVIR;LAMIVUDINE;ZIDOVUDINE    ABC;3TC;ZDV                                              HIV
AMPRENAVIR                                       AMPRENAVIR                        APV                             PI                       HIV
ATAZANAVIR                                       ATAZANAVIR                        ATV                             PI                       HIV
ATAZANAVIR SULFATE                               ATAZANAVIR                        ATV                             PI                       HIV

Appendix B Sample of HIV Drugs in the Centrally Maintained Reference Spreadsheet

Appendix C Output Data Set Naming Convention

The intermediate data set is distinguished from the analysis data set by the letter 'M' following the ADaM prefix 'AD'; the 'M' stands for 'Map', 'Midpoint', or 'Mid'. Each possible input domain combination was assigned a one-character code:

C = CM

E = EC

X = EX

B = CM and EC or CM and EX

This code is systematically combined with the two- to three-letter abbreviation for the therapeutic area to create flexible output data set names that allow for multiple regimen data sets on a study:

ADM[&RGMCAT][C/E/X/B]

AD[&RGMCAT][C/E/X/B]

For example, the output data sets are named ADMHIVE and ADHIVE when &RGMCAT = HIV and &INDATA = EC.
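The naming rule can be expressed as a small macro. %RGMNAMES is a hypothetical helper written for illustration, not one of the macros listed in Appendix H; it assumes INDATA holds bare domain names separated by blanks (for example, cm or cm ec) and that any two-domain list maps to B.

```sas
%macro rgmnames(indata=, rgmcat=) ;
  %global rgm_int rgm_ad ;
  %local list code ;
  %let list = %upcase(&indata) ;
  /* Two domains (CM and EC, or CM and EX) map to B */
  %if %sysfunc(countw(&list, %str( ))) > 1 %then %let code = B ;
  %else %if &list = CM %then %let code = C ;
  %else %if &list = EC %then %let code = E ;
  %else %if &list = EX %then %let code = X ;
  %else %put WARNING: INDATA=&indata is not CM, EC, EX, or a two-domain list. ;
  %let rgm_int = ADM%upcase(&rgmcat)&code ;   /* intermediate data set */
  %let rgm_ad  = AD%upcase(&rgmcat)&code ;    /* analysis data set     */
  %put NOTE: intermediate data set = &rgm_int, analysis data set = &rgm_ad ;
%mend rgmnames ;

%rgmnames(indata = ec, rgmcat = HIV)
```

This call writes ADMHIVE and ADHIVE to the log, matching the example above.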

Appendix D Contents of Intermediate Data Set: ADMHIVC

#   Variable  Type  Label
1   STUDYID   Char  Study Identifier
2   USUBJID   Char  Unique Subject Identifier
3   ASEQ      Num   Analysis Sequence Number
4   ACAT      Char  Analysis Category of Regimen
5   ATRT      Char  Reported Name of Drug, Med or Therapy
6   ADECOD    Char  Analysis Medication
7   ADECODCD  Char  Analysis Medication Code
8   ASTDT     Num   Analysis Start Date
9   AENDT     Num   Analysis End Date
10  ACLAS     Char  Analysis Medication Class
11  ANNRTI    Num   Number of NNRTIs in ADECOD
12  ANRTI     Num   Number of NRTIs in ADECOD
13  AINSTI    Num   Number of INSTIs in ADECOD
14  API       Num   Number of PIs in ADECOD
15  APE       Num   Number of PEs in ADECOD
16  AFI       Num   Number of FIs in ADECOD
17  AEI       Num   Number of EIs in ADECOD
18  SRCDOM    Char  Source Data
19  SRCVAR    Char  Source Variable
20  SRCSEQ    Num   Source Sequence Number

Appendix E Print of Intermediate Data Set: ADMHIVC

USUBJID  STUDYID  ASEQ  ACAT  ATRT                                 ADECOD
1001     A0000    1     HIV   trizivir                             Abacavir sulfate;Lamivudine;Zidovudine
         A0000    2     HIV   lexiva                               Fosamprenavir calcium
         A0000    3     HIV   didanosine                           Didanosine
         A0000    4     HIV   viread                               Tenofovir disoproxil fumarate
         A0000    5     HIV   norvir                               Ritonavir
         A0000    6     HIV   reyataz                              Atazanavir sulfate
         A0000    7     HIV   prezista                             Darunavir Ethanolate
         A0000    8     HIV   truvada                              Emtricitabine;Tenofovir disoproxil fumarate
1002     A0000    1     HIV   TRUVADA                              Emtricitabine;Tenofovir disoproxil fumarate
         A0000    2     HIV   RALTEGRAVIR                          Raltegravir
         A0000    3     HIV   GENVOYA                              Cobicistat;Elvitegravir;Emtricitabine;Tenofovir alafenamide fumarate
1003     A0000    1     HIV   Lamivudine(3TC)                      Lamivudine
         A0000    2     HIV   Zidovudine (AZT)                     Zidovudine
         A0000    3     HIV   Indinavir                            Indinavir
         A0000    4     HIV   Stavudine (d4T)                      Stavudine
         A0000    5     HIV   Ritonavir                            Ritonavir
         A0000    6     HIV   Saquinavir                           Saquinavir
         A0000    7     HIV   Didanosine (DDI)                     Didanosine
         A0000    8     HIV   Abacavir (ABC)                       Abacavir
         A0000    9     HIV   Kaletra                              Lopinavir;Ritonavir
         A0000    10    HIV   Atazanavir                           Atazanavir
         A0000    11    HIV   Emtricitabine                        Emtricitabine
         A0000    12    HIV   Ritonavir                            Ritonavir
         A0000    13    HIV   Tenofovir disoproxil fumarate (TDF)  Tenofovir disoproxil fumarate
         A0000    14    HIV   Dolutegravir                         Dolutegravir
         A0000    15    HIV   Truvada                              Emtricitabine;Tenofovir disoproxil fumarate

USUBJID  ADECODCD          ASTDT      AENDT      ACLAS  ANRTI  ANNRTI  AEI  AINSTI
1001     ABC;3TC;ZDV       03SEP2002  20APR2004         3      0       0    0
         FPV               25JAN2005  09APR2007  PI     0      0       0    0
         ddI               25JAN2005  30AUG2009  NRTI   1      0       0    0
         TDF               25JAN2005  30AUG2009  NRTI   1      0       0    0
         RTV               25JAN2005  .          PI     0      0       0    0
         ATV               10APR2007  30AUG2009  PI     0      0       0    0
         DRV               31AUG2009  .          PI     0      0       0    0
         FTC;TDF           31AUG2009  .                 2      0       0    0
1002     FTC;TDF           03MAY2011  16JUL2017         2      0       0    0
         RAL               03MAY2017  16JUL2017  INSTI  0      0       0    1
         COBI;EVG;FTC;TAF  17JUL2017  .                 2      0       0    1
1003     3TC               15NOV1995  15NOV1996  NRTI   1      0       0    0
         ZDV               15NOV1995  15NOV1996  NRTI   1      0       0    0
         IDV               15APR1996  15NOV1996  PI     0      0       0    0
         d4T               15NOV1996  15FEB2002  NRTI   1      0       0    0
         RTV               15NOV1996  15OCT2005  PI     0      0       0    0
         SQV               15NOV1996  15OCT2005  PI     0      0       0    0
         ddI               15NOV1996  01APR2010  NRTI   1      0       0    0
         ABC               15FEB2002  01APR2010  NRTI   1      0       0    0
         LPV;RTV           15OCT2005  01APR2010         0      0       0    0
         ATV               01APR2010  28FEB2017  PI     0      0       0    0
         FTC               01APR2010  28FEB2017  NRTI   1      0       0    0
         RTV               01APR2010  28FEB2017  PI     0      0       0    0
         TDF               01APR2010  28FEB2017  NRTI   1      0       0    0
         DTG               28FEB2017  .          INSTI  0      0       0    1
         FTC;TDF           28FEB2017  .                 2      0       0    0

USUBJID  API  AFI  APE  SRCDOM  SRCVAR   SRCSEQ
1001     0    0    0    CM      CMDECOD  42
         1    0    0    CM      CMDECOD  9
         0    0    0    CM      CMDECOD  6
         0    0    0    CM      CMDECOD  56
         1    0    0    CM      CMDECOD  13
         1    0    0    CM      CMDECOD  40
         1    0    0    CM      CMDECOD  26
         0    0    0    CM      CMDECOD  43
1002     0    0    0    CM      CMDECOD  20
         0    0    0    CM      CMDECOD  19
         0    0    1    CM      CMDECOD  4
1003     0    0    0    CM      CMDECOD  8
         0    0    0    CM      CMDECOD  42
         1    0    0    CM      CMDECOD  6
         0    0    0    CM      CMDECOD  14
         1    0    0    CM      CMDECOD  11
         1    0    0    CM      CMDECOD  13
         0    0    0    CM      CMDECOD  3
         0    0    0    CM      CMDECOD  1
         2    0    0    CM      CMDECOD  7
         1    0    0    CM      CMDECOD  2
         0    0    0    CM      CMDECOD  5
         1    0    0    CM      CMDECOD  12
         0    0    0    CM      CMDECOD  15
         0    0    0    CM      CMDECOD  4
         0    0    0    CM      CMDECOD  28

Appendix F Contents of Analysis Data Set: ADHIVC

#   Variable  Type  Label
1   STUDYID   Char  Study Identifier
2   USUBJID   Char  Unique Subject Identifier
3   ASEQ      Num   Analysis Sequence Number
4   ACAT      Char  Analysis Category of Regimen
5   ARGM      Char  Analysis Regimen
6   ARGMCD    Char  Analysis Regimen Code
7   ASTDT     Num   Analysis Start Date
8   AENDT     Num   Analysis End Date
9   RGMDUR    Num   Regimen Duration (days)
10  TRTTOT    Num   Number of Drugs in Regimen
11  NNRTITOT  Num   Number of NNRTIs in ARGMCD
12  NRTITOT   Num   Number of NRTIs in ARGMCD
13  INSTITOT  Num   Number of INSTIs in ARGMCD
14  PITOT     Num   Number of PIs in ARGMCD
15  PETOT     Num   Number of PEs in ARGMCD
16  FITOT     Num   Number of FIs in ARGMCD
17  EITOT     Num   Number of EIs in ARGMCD
18  SRCDOM    Char  Source Data
19  SRCVAR    Char  Source Variable
20  SRCSEQ01  Num   Source Sequence Number 01

Appendix G Print of Analysis Data Set: ADHIVC

USUBJID  STUDYID  ASEQ  ACAT  ARGM                                                                        ARGMCD
1001     A0000    1     HIV   ABACAVIR SULFATE LAMIVUDINE ZIDOVUDINE                                      ABC 3TC ZDV
         A0000    2     HIV
         A0000    3     HIV   DIDANOSINE TENOFOVIR DISOPROXIL FUMARATE FOSAMPRENAVIR CALCIUM RITONAVIR    ddI TDF FPV RTV
         A0000    4     HIV   DIDANOSINE TENOFOVIR DISOPROXIL FUMARATE ATAZANAVIR SULFATE RITONAVIR       ddI TDF ATV RTV
         A0000    5     HIV   EMTRICITABINE TENOFOVIR DISOPROXIL FUMARATE DARUNAVIR ETHANOLATE RITONAVIR  FTC TDF DRV RTV
1002     A0000    1     HIV   EMTRICITABINE TENOFOVIR DISOPROXIL FUMARATE                                 FTC TDF
         A0000    2     HIV   EMTRICITABINE TENOFOVIR DISOPROXIL FUMARATE RALTEGRAVIR                     FTC TDF RAL
         A0000    3     HIV   EMTRICITABINE TENOFOVIR ALAFENAMIDE FUMARATE ELVITEGRAVIR COBICISTAT        FTC TAF EVG COBI
1003     A0000    1     HIV   LAMIVUDINE ZIDOVUDINE                                                       3TC ZDV
         A0000    2     HIV   LAMIVUDINE ZIDOVUDINE INDINAVIR                                             3TC ZDV IDV
         A0000    3     HIV   DIDANOSINE STAVUDINE RITONAVIR SAQUINAVIR                                   ddI d4T RTV SQV
         A0000    4     HIV   ABACAVIR DIDANOSINE RITONAVIR SAQUINAVIR                                    ABC ddI RTV SQV
         A0000    5     HIV   ABACAVIR DIDANOSINE LOPINAVIR RITONAVIR                                     ABC ddI LPV RTV
         A0000    6     HIV   EMTRICITABINE TENOFOVIR DISOPROXIL FUMARATE ATAZANAVIR RITONAVIR            FTC TDF ATV RTV
         A0000    7     HIV   EMTRICITABINE TENOFOVIR DISOPROXIL FUMARATE DOLUTEGRAVIR                    FTC TDF DTG

USUBJID  ASTDT      AENDT      RGMDUR  TRTTOT  NRTITOT  NNRTITOT  EITOT  INSTITOT  PITOT
1001     03SEP2002  20APR2004  596     3       3        0         0      0         0
         21APR2004  24JAN2005  279     0       0        0         0      0         0
         25JAN2005  09APR2007  805     4       2        0         0      0         2
         10APR2007  30AUG2009  874     4       2        0         0      0         2
         31AUG2009  .          .       4       2        0         0      0         2
1002     03MAY2011  02MAY2017  2192    2       2        0         0      0         0
         03MAY2017  16JUL2017  75      3       2        0         0      1         0
         17JUL2017  .          .       4       2        0         0      1         0
1003     15NOV1995  14APR1996  152     2       2        0         0      0         0
         15APR1996  14NOV1996  214     3       2        0         0      0         1
         15NOV1996  14FEB2002  1918    4       2        0         0      0         2
         15FEB2002  14OCT2005  1338    4       2        0         0      0         2
         15OCT2005  31MAR2010  1629    4       2        0         0      0         2
         01APR2010  27FEB2017  2525    4       2        0         0      0         2
         28FEB2017  .          .       3       2        0         0      1         0


USUBJID  FITOT  PETOT  SRCDOM   SRCVAR  SRCSEQ01  SRCSEQ02  SRCSEQ03  SRCSEQ04
1001     0      0      ADMHIVC  ADECOD  1         .         .         .
         0      0      ADMHIVC  ADECOD  .         .         .         .
         0      0      ADMHIVC  ADECOD  2         3         4         5

         0      0      ADMHIVC  ADECOD  5         7         8         .
1002     0      0      ADMHIVC  ADECOD  1         .         .         .
         0      0      ADMHIVC  ADECOD  1         2         .         .
         0      1      ADMHIVC  ADECOD  3         .         .         .
1003     0      0      ADMHIVC  ADECOD  1         2         .         .

         0      0      ADMHIVC  ADECOD  14        15        .         .

Appendix H MKADRGM Code /******************************************************************************/ /* PURPOSE: Create an analysis dataset with one record per participant per drug regimen. /* /* INPUT: /home/public/data/WHODICTIONARY/&DATEDIR/sas_datasets/adrgm_drugref.sas7bdat /* /home/public/data/WHODICTIONARY/&DATEDIR/RGMCAT.xlsx /* &CUSTREF (Custom Reference Spreadsheet if declared by the statistician) /* /* OUTPUT: &OUTLIB..ADM[RGMCAT][C|E|X|B] /* &OUTLIB..AD[RGMCAT][C|E|X|B] /* where C=CM, E=EC, X=EX, or B=Both CM and EC or CM and EX /* /* MACROS USED: %yesnosub /* %paramreq /* %dataexst /* %nobs /* %maxlength /* %pgmlabel /* %permset /* %vexst /* /* NOTES: 1. Ensure &VERSION is updated with each change to the program! /* 2. MKADRGM uses two reference datasets which are created as part of the /* WHODICTIONARY (semi-annual) update process: an excel spreadsheet /* (RGMCAT.xlsx) which is a lookup table of drugs of interest to our /* organization and their relevant therapeutic area, as well as other /* information such as their drug class and abbreviation; and a SAS dataset /* (adrgm_drugref.sas7bdat) that is created from merging that spreadsheet /* to the WHO Drug Dictionary dataset. /******************************************************************************/ %macro MKADRGM(INDATA=, OUTLIB=, RGMCAT=, CUSTREF=, CLASSORD=, DELIM=slash, USUBJL=, DEBUG=N) ; /* define all macro variables as local */ %local RGMDATA ALLCLASS THISCLASS TRANDLIM NDSETS MAXSEQN MAXSEQNF SEQNF SEQVARS DOM DSET INDSETS i k AALLCLASS VERSION DATEDIR TRTCATS RGMXLSX DOMCNT _CLSSOFF DFLTCLASSORD_HIV DFLTCLASSORD_TB DFLTCLASSORD_HCV DFLTCLASSORD_HBV __COREPROG MASTERCLASS STATCLASS TOTALLCLASS DOMFRAG CUM_MAXSEQN CUM_MAXSEQNF CUM_SEQNF DECOD OUTDIR PROBDSETS _STYLST _TMPEXC&i _TMPVI3&i _TMPVI4&i ; %let __COREPROG = MKADRGM ; /*** PROGRAMMER NOTE: Update &VERSION ! 
************************************/ /***************************************************************************/ %let VERSION = /home/prinf/programs/development/programs/macros/MKADRGM/20200227 ; %put The CBAR program MKADRGM was used, &VERSION. ; /***INDEX**************************************************************/ /* 1. Initial setup and error checks */ /* 2. Subset input data and add in drug classes/abbreviations */ /* 2b. Create problem listing */ /* 3. Create ADM[RGMCAT][C|E|B] */ /* 4. Create dataset with one record for each start and stop per drug */ /* 5. Add on regimen end dates */ /* 6. Build the regimen in ALLSEQ */ /* 7. Parse ALLSEQ into SRCSEQzz */ /* 8. Build the regimen string */ /* 9. Adjust start and stop dates */ /* 10. Reorder drugs within regimen */ /* 11. Insert blank records for treatment gaps */ /* 12. Merge repeated sequential regimens (i.e. dose increases) */ /* 13. Create final dataset AD[RGMCAT][C|E|B] */ /**********************************************************************/ /*** 1. Initial setup and error checks ************************************/ /**************************************************************************/

17

/* identify the most current version of the reference data */
filename whodir pipe "ls -1t /home/public/data/WHODICTIONARY | head -n1 | sed -e 's#/$##'" ;
data _null_ ;
   /* use the piped results as the input */
   infile whodir lrecl = 40 pad end = eof ;
   input dir $40. ;
   /* if the last record, create macro variable */
   if eof then do ;
      call symput('DATEDIR', strip(dir)) ;
   end ;
run ;

/* CBAR reference dataset: adrgm_drugref.sas7bdat */
libname rgmcat "/home/public/data/WHODICTIONARY/&DATEDIR/sas_datasets" access = readonly ;

/* CBAR reference spreadsheet */
%if %length(&CUSTREF) = 0 %then %do ;
   /* RGMCAT.xlsx */
   libname rgmcatxl XLSX "/home/public/data/WHODICTIONARY/&DATEDIR/RGMCAT.xlsx" access = readonly ;
%end ;

/* allow case insensitive Y/N/yes/no values */
%yesnosub(DEBUG) ;

options minoperator
   %if &DEBUG = Y %then %do ; mprint mlogic mlogicnest %end ;
; /* this is the semicolon to end the options statement */

/* check that all required parameters are supplied */
%paramreq(INDATA OUTLIB RGMCAT) ;

/* check that all datasets exist */
%dataexst(&INDATA) ;

/* upcase a few macro parameters */
%let OUTLIB = %upcase(&OUTLIB) ;
%let RGMCAT = %upcase(&RGMCAT) ;
%let CLASSORD = %upcase(&CLASSORD) ;
%let DELIM = %upcase(&DELIM) ;

/* set default delimiter to slash (/); use uppercase to match the case-sensitive comparisons used later to set TRANDLIM */
%if %length(&DELIM) = 0 %then %let DELIM = SLASH ;
/* check that DELIM is a valid option */
%else %if not (%upcase(&DELIM) in SLASH SEMICOLON SPACE COMMA) %then %do ;
   %put %upcase(note: (CBAR)) &__COREPROG. - &=DELIM is not a valid option. ;
   %put %upcase(note: (CBAR)) &__COREPROG. - Valid options for DELIM: slash, semicolon, space, comma. ;
   %put %upcase(note: (CBAR)) &__COREPROG. - Setting DELIM to slash. ;
   %let DELIM = SLASH ;
%end ;

/* subset the CBAR reference dataset to the selected treatment category */
%let TRTCATS = HIV TB HCV HBV ;
%let RGMDATA = RGMDATA ;
%if &RGMCAT in (&TRTCATS) %then %do ;
   data RGMDATA ;
      set rgmcat.adrgm_drugref ;
      where rgmcat = "&RGMCAT" ;
   run ;
%end ;
%else %do ;
   %put %upcase(error: (CBAR)) &__COREPROG. - &=RGMCAT is not a valid option. ;
   %put %upcase(error: (CBAR)) &__COREPROG. - Valid options for RGMCAT are: &TRTCATS.. ;
   %return ;
%end ;

/* define which worksheet of the reference spreadsheet will be used */
%if &RGMCAT = HIV %then %let RGMXLSX = rgmcatxl.hiv_drugs ;
%else %if &RGMCAT = TB %then %let RGMXLSX = rgmcatxl.tb_drugs ;
%else %if &RGMCAT = HCV %then %let RGMXLSX = rgmcatxl.hcv_drugs ;
%else %if &RGMCAT = HBV %then %let RGMXLSX = rgmcatxl.hbv_drugs ;

/* use the custom spreadsheet (if it exists) */
%if %length(&CUSTREF) > 0 %then %do ;
   /* check that the file exists */
   %if %sysfunc(fileexist(&CUSTREF)) %then %do ;
      libname rgmcatxl XLSX "&CUSTREF." access = readonly ;
   %end ;
   %else %do ;
      %put %upcase(error: (CBAR)) &__COREPROG. - The following custom reference file does not exist: ;
      %put %upcase(error: (CBAR)) &=CUSTREF. ;
      %return ;
   %end ;

   /* grab any drugs that are flagged in the custom spreadsheet as not in WHODICTIONARY */
   data nowho ;
      informat aabrv aclas $40. ;
      length aabrv aclas $40 DrugName $80 ;
      set &RGMXLSX ;
      where who in ('N' 'n') ;
      DrugName = upcase(adecod) ;
      drop adecod ;
      format _all_ ;
   run ;

   /* add the non-WHODICTIONARY drugs to the CBAR reference dataset */
   data RGMDATA ;
      set nowho RGMDATA ;
   run ;
%end ;

/* create macro variable &ALLCLASS with list of classes (alphabetical) */
proc sql noprint nowarn ;
   select distinct strip(ACLAS) into :ALLCLASS separated by ' '
      from &RGMXLSX
      order by aclas ;
   /* this dataset will be used to make a class format */
   create table classlookup as
      select distinct aabrv as start, aclas as label, '$aclass' as fmtname
      from &RGMXLSX ;
quit ;

/* create ACLASS format to use as lookup table to link drug abbreviations to class */
title1 "Format: $ACLASS" ;
title2 "Link drug abbreviations to drug class" ;
proc format cntlin = classlookup
   %if &DEBUG=Y %then %do ; fmtlib ; select $aclass %end ;
   ;
run ;
title ;

/* add keyword OFF to turn CLASSORD off and accept the order of the custom spreadsheet instead */
%if &CLASSORD = OFF %then %let _CLSSOFF = Y ;
%else %let _CLSSOFF = N ;

/* set the default class order */
%let DFLTCLASSORD_HIV = NNRTI FI EI INSTI PI PE NRTI ;
%let DFLTCLASSORD_TB = FLOA SLOA INJ OTH ;
%let DFLTCLASSORD_HCV = NS5B NNS5B NS5A NS34A IMM NUCA PI ;
%let DFLTCLASSORD_HBV = NRTI IMM ;

%if %length(&CLASSORD) = 0 OR &CLASSORD = OFF %then %do ;
   %if &RGMCAT = HIV %then %let CLASSORD = &DFLTCLASSORD_HIV ;
   %else %if &RGMCAT = TB %then %let CLASSORD = &DFLTCLASSORD_TB ;
   %else %if &RGMCAT = HCV %then %let CLASSORD = &DFLTCLASSORD_HCV ;
   %else %if &RGMCAT = HBV %then %let CLASSORD = &DFLTCLASSORD_HBV ;
   %put %upcase(note:) &__COREPROG. - Regimens in the output dataset will be sorted in the following class order: &CLASSORD.. ;
%end ;

/* check that the classes in CLASSORD exist in the master list and vice versa
   (each NOT IN test is TRUE only when a class is missing from the other list) */
%if &_CLSSOFF = N %then %do ;
   %do i=1 %to %sysfunc(countw(&ALLCLASS)) ;
      %let MASTERCLASS = %scan(&ALLCLASS, &i) ;
      %if not (&MASTERCLASS in &CLASSORD) %then %do ;
         %put %upcase(error: (CBAR)) &__COREPROG. - Please list all classes relevant to &RGMCAT in the CLASSORD parameter. ;
         %put %upcase(error: (CBAR)) &__COREPROG. - Please see the README for more details. ;
         %return ;
      %end ;
   %end ;
   %do i=1 %to %sysfunc(countw(&CLASSORD)) ;
      %let STATCLASS = %scan(&CLASSORD, &i) ;
      %if not (&STATCLASS in &ALLCLASS) %then %do ;
         %put %upcase(error: (CBAR)) &__COREPROG. - &STATCLASS is not a valid class in the CLASSORD parameter for the &RGMCAT therapeutic area. ;
         %put %upcase(error: (CBAR)) &__COREPROG. - Please see the README for more details. ;
         %return ;
      %end ;
   %end ;
%end ;

/* build AALLCLASS (e.g. ANNRTI AINSTI ...) and TOTALLCLASS (e.g. NNRTITOT, INSTITOT, ...)
   with regular expressions: find each drug class and add letters to the front/back of it
     s/              specifies a substitution regular expression
     ([A-Z0-9]+)     matches one or more uppercase letters or digits (not underscores) into buffer 1
     A$1             replace buffer 1 with the letter A immediately followed by the contents of buffer 1
     $1TOT%nrstr(,)  replace buffer 1 with the contents of buffer 1 immediately followed by the letters TOT
                     and a comma (no spaces in between); the comma is masked so it is not read as an argument separator
     -1              repeat the substitution until the end of the source
     &CLASSORD       the source
   e.g. with the default HIV class order, AALLCLASS = ANNRTI AFI AEI AINSTI API APE ANRTI
   and TOTALLCLASS = NNRTITOT, FITOT, EITOT, INSTITOT, PITOT, PETOT, NRTITOT,  */
%let AALLCLASS = %sysfunc(prxchange(s/([A-Z0-9]+)/A$1/, -1, &CLASSORD)) ;
%let TOTALLCLASS = %sysfunc(prxchange(s/([A-Z0-9]+)/$1TOT%nrstr(,)/, -1, &CLASSORD)) ;

/* set up other delimiters */
%if &DELIM = SLASH %then %let TRANDLIM = %str(/) ;
%else %if &DELIM = COMMA %then %let TRANDLIM = %str(,) ;
%else %if &DELIM = SEMICOLON %then %let TRANDLIM = %str(;) ;
%else %if &DELIM = SPACE %then %let TRANDLIM = %str( ) ;

/* define a helper macro to calculate/recalculate ASEQ or regimenID */
%macro RENUMBER(VAROUT) ;
   by usubjid ;
   &VAROUT + 1 ;
   if first.usubjid then &VAROUT = 1 ; /* ASEQ/regimenID starts at 1 for every usubjid */
%mend RENUMBER ;

/*** 2. Subset input data and add in drug classes/abbreviations ***********/
/**************************************************************************/
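/* Illustrative example (hypothetical values; the reference data contents shown here are assumptions):
   a CM record with CMDECOD = Efavirenz is kept when upcase(CMDECOD) = EFAVIRENZ matches DRUGNAME
   in the reference data, and it picks up that row's abbreviation (ADECODCD) and class (ACLAS),
   for instance EFV and NNRTI. A record whose --DECOD has no match (for example, a non-ART
   medication when RGMCAT = HIV) is not kept; it is written to EXCLUDED_DRUGS and reported in the
   problem listing. When the input dataset contains ADECOD, ADECOD is matched instead of --DECOD. */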
%let NDSETS = %sysfunc(countw(&INDATA, " ")) ;

/* for each dataset */
%do i=1 %to &NDSETS ;
   %let DSET = %scan(&INDATA, &i, ' ') ;

   /* error and exit if the input dataset contains 0 observations */
   %if %nobs(&DSET) = 0 %then %do ;
      %put %upcase(error: (CBAR)) &__COREPROG. - The &DSET dataset contains 0 observations. ;
      %return ;
   %end ;

   proc sql noprint ;
      select distinct strip(domain) into :DOM separated by ' '
         from &DSET ;
   quit ;

   %let DOMCNT = %sysfunc(countw(&DOM)) ;
   %if &DOMCNT > 1 %then %do ;
      %put %upcase(error: (CBAR)) &__COREPROG. - You have more than one domain in the &DSET dataset. &__COREPROG relies ;
      %put %upcase(error: (CBAR)) &__COREPROG. - on your dataset to contain only one domain per input dataset. ;
      %return ;
   %end ;

   /* prioritize ADECOD over an SDTM --DECOD */
   %if %vexst(&DSET, ADECOD) > 0 %then %let DECOD = ADECOD ;
   %else %let DECOD = &DOM.DECOD ;

   proc sql noprint ;
      /* subset input data to drugs in ADRGM_DRUGREF and link drug classes/abbreviations */
      create table adrgm&i as
         select strip(a.STUDYID) as STUDYID label = 'Study Identifier',
                strip(a.USUBJID) as USUBJID label = 'Unique Subject Identifier',
                "&RGMCAT" as ACAT label = 'Analysis Category of Regimen',
                strip(a.DOMAIN) as SRCDOM label = 'Source Data',
                strip(a.&DOM.TRT) as ATRT label = 'Reported Name of Drug, Med or Therapy',
                /* prioritize ADECOD over an SDTM --DECOD */
                %if %vexst(&DSET, ADECOD) > 0 %then %do ;
                   strip(a.ADECOD) as ADECOD label = 'Analysis Medication',
                   "ADECOD" as SRCVAR label = 'Source Variable' length = 7,
                %end ;
                %else %do ;
                   strip(a.&DOM.DECOD) as ADECOD label = 'Analysis Medication',
                   "&DOM.DECOD" as SRCVAR label = 'Source Variable' length = 7,
                %end ;
                a.&DOM.SEQ as SRCSEQ label = 'Source Sequence Number',
                a.&DOM.STDT as ASTDT label = 'Analysis Start Date' format=date9.,
                a.&DOM.ENDT as AENDT label = 'Analysis End Date' format=date9.,
                b.DRUGNAME as DRUGNAME label = 'WHODrug Medication Name',
                b.AABRV as ADECODCD label = 'Analysis Medication Code',
                b.ACLAS as ACLAS label = 'Analysis Medication Class'
         from &DSET a inner join &RGMDATA b
            on upcase(a.&DECOD) = b.DRUGNAME
         /* debug */
         %if %length(&USUBJL) > 0 %then %do ;
            where usubjid in (&USUBJL) ;
         %end ;
         %else %do ;
            ;
         %end ;

      create table excluded_drugs&i as
         select USUBJID, &DECOD, &DOM.TRT
         from &DSET
         where upcase(&DECOD) not in (select Drugname from &RGMDATA) ;
   quit ;

   /* error and exit if the adrgm&i dataset has 0 observations (no drugs of the &RGMCAT type, e.g. HIV or TB, to select).
      note: while DEBUG=Y, the output datasets are always subset to the USUBJIDs listed in the USUBJL parameter,
      so this error can also be triggered if none of those USUBJIDs exist in the dataset */
   %if %nobs(adrgm&i) = 0 %then %do ;
      %put %upcase(error: (CBAR)) &__COREPROG. - No drugs in the &=RGMCAT category were found in the &DSET dataset. ;
      %return ;
   %end ;

   /*** delete records that violate MKADRGM key assumptions: ***********/
   /* delete records with both START and END dates missing */
   data adrgm&i violate3_&i ;
      set adrgm&i ;
      if astdt = . AND aendt = . then output violate3_&i ;
      else output adrgm&i ;
   run ;

   /* delete duplicates on the following variables: studyid srcdom usubjid adecod astdt aendt */
   proc sort data = adrgm&i out = _mkadrgm&i. nodupkey dupout = violate4_&i ;
      by studyid srcdom usubjid adecod astdt aendt ;
   run ;

   /******************************************************************/
   /*** 2b. Create problem listing ***********************************/
   /******************************************************************/

   /* initialize macro variables to missing so that CAT receives an empty argument when a file is not created */
   %let _TMPEXC&i = ; /* excluded drugs file */
   %let _TMPVI3&i = ; /* violate3 file */
   %let _TMPVI4&i = ; /* violate4 file */

   /* frequency listing of excluded drugs */
   %if %nobs(excluded_drugs&i) > 0 %then %do ;
      %let _TMPEXC&i = /tmp/&__COREPROG._&SYSJOBID._tempx_&i..md ;
      proc printto file = "&&_TMPEXC&i" new ;
      title1 "## EXCLUDED DRUGS: &DSET" ;
      title2 "```" ;
      title3 "Input dataset: &DSET" ;
      title4 "Drugs* Excluded From &RGMCAT Regimens:" ;
      title5 "Contact cbar.statprog if you find drugs you think should be included." ;
      title6 "*Only the first 100 characters are printed." ;
      footnote "```";
      proc freq data=excluded_drugs&i ;
         where &DECOD ~= "" ;
         tables &DECOD / list nocum nopct ;
         format &DECOD $100. ; /* displaying only the first 100 characters! */
      run ;

      title1 "## MISSING --DECOD: &DSET" ;
      title2 "```" ;
      title3 "Input dataset: &DSET" ;
      title4 "Assumption 2: Records with a missing &DECOD will be excluded and may result in records with incomplete regimens!" ;
      title5 "Drugs* Excluded From &RGMCAT Regimens due to missing &DECOD:" ;
      title6 "Contact your PDM/SDTM specialist if you find drugs you think should include a &DECOD., however, as a temporary" ;
      title7 " measure you can code the drugs yourself in ADECOD." ;
      title8 "*Only the first 100 characters are printed." ;
      footnote "```";
      proc freq data=excluded_drugs&i ;
         where &DECOD = "" ;
         tables &DOM.TRT * &DECOD / list missing nocum nopct ;
         format &DOM.TRT $100. ; /* displaying only the first 100 characters! */
      run ;
      proc printto ; run ;
   %end ;

   /* print a note to the log and listing */
   %if %nobs(violate3_&i) > 0 %then %do ;
      %let _TMPVI3&i = /tmp/&__COREPROG._&SYSJOBID._temp3_&i..md ;
      proc printto file = "&&_TMPVI3&i" new ;
      %put %upcase(note: (CBAR)) &__COREPROG. - Records were deleted, see problems listing for more details. ;
      title1 "## 3. DELETED RECORD(S): &DSET" ;
      title2 "```" ;
      title3 "Input dataset: &DSET" ;
      title4 "Violation of assumption 3: &DOM.STDT=. AND &DOM.ENDT=." ;
      title5 "The following record(s) were deleted:" ;
      footnote "```";
      proc print data=violate3_&i label ;
         by usubjid ;
         id usubjid ;
         var studyid srcdom adecod astdt aendt ;
      run ;
      proc printto ; run ;
   %end ;

   /* print a note to the log and listing */
   %if %nobs(violate4_&i) > 0 %then %do ;
      %let _TMPVI4&i = /tmp/&__COREPROG._&SYSJOBID._temp4_&i..md ;
      proc printto file = "&&_TMPVI4&i" new ;
      %put %upcase(note: (CBAR)) &__COREPROG. - Records were deleted, see problems listing for more details. ;
      title1 "## 4. DELETED RECORD(S): &DSET" ;
      title2 "```" ;
      title3 "Input dataset: &DSET" ;
      title4 "Violation of assumption 4: Duplicates on: STUDYID DOMAIN USUBJID &DECOD &DOM.STDT &DOM.ENDT" ;
      title5 "The following record(s) were deleted:" ;
      footnote "```";
      proc print data=violate4_&i label ;
         by usubjid ;
         id usubjid ;
         var studyid srcdom adecod astdt aendt ;
      run ;
      proc printto ; run ;
   %end ;
   footnote ;
   /******************************************************************/
%end ;

/* get list of datasets that were created above
   (PROBDSETS is initialized first because INTO: does not create or reset the macro variable when no rows are selected) */
%let PROBDSETS = ;
proc sql noprint ;
   select memname into :INDSETS separated by ' '
      from sashelp.vtable
      where libname = "WORK" and memname like '_MKADRGM%' ;
   select memname into :PROBDSETS separated by ' '
      from sashelp.vtable
      where libname = "WORK"
        and (memname like 'VIOLATE%' or memname like 'EXCLUDED_DRUGS%')
        and nobs > 0 ;
quit ;

/* combine input data */
%if &NDSETS > 1 %then %do ;
   /* maxlength needs at least 2 datasets */
   /* create the one character fragment B for BOTH domains entered in &INDATA */
   %let DOMFRAG = B ;
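   /* Illustrative example (hypothetical inputs, assuming RGMCAT = HIV): DOMFRAG is the last
      character of the intermediate dataset, final dataset and problem listing names.
        INDATA = a CM dataset only            -> DOMFRAG = C -> ADMHIVC, ADHIVC, probs_ADHIVC.md
        INDATA = an EC dataset only           -> DOMFRAG = E -> ADMHIVE, ADHIVE, probs_ADHIVE.md
        INDATA = an EX dataset only           -> DOMFRAG = X (X rather than E, which denotes EC)
        INDATA = two or more datasets, e.g. CM and EC -> DOMFRAG = B -> ADMHIVB, ADHIVB, probs_ADHIVB.md */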
   %maxlength(DSETS = &INDSETS) ;
   data combined_indata ;
      &MAXLENGTH ;
      set &INDSETS ;
      /* missing values sort before all dates, so the sort key _AENDT is set to TODAY() for ongoing drugs to make them sort last */
      if missing(aendt) then _aendt = today() ;
      else _aendt = aendt ;
   run ;
%end ;
%else %do ;
   /* create the one character fragment from the domain in &INDATA */
   %if &DOM = EX %then %let DOMFRAG = X ;
   %else %let DOMFRAG = %sysfunc(substr(&DOM, 1, 1)) ;
   data combined_indata ;
      set &INDSETS ;
      /* missing values sort before all dates, so the sort key _AENDT is set to TODAY() for ongoing drugs to make them sort last */
      if missing(aendt) then _aendt = today() ;
      else _aendt = aendt ;
   run ;
%end ;

/*** create problem listing ***/
/* find OUTDIR for problem listing */
%if &OUTLIB = WORK %then %let OUTDIR = %dirname ;
%else %let OUTDIR = %sysfunc(pathname(&OUTLIB)) ;

/* remove the previous problem listing even if there are no issues */
x "/bin/rm -f &OUTDIR./probs_AD&RGMCAT.&DOMFRAG..md" ;

/* create the problem listing */
%if %length(&PROBDSETS) > 0 %then %do ;
   %put %upcase(note: (CBAR)) &__COREPROG. - A problems listing has been created. Please check the ;
   %put %upcase(note: (CBAR)) &__COREPROG. - following location for a list of potential issues: ;
   %put %upcase(note: (CBAR)) &__COREPROG. - &OUTDIR./probs_AD&RGMCAT.&DOMFRAG..md ;
   /* define stylesheet */
   %let _STYLST = <link rel='stylesheet' type='text/css' href='/Style%20Library/readme_css.css'/> ;
   /* create header file with stylesheet reference to add to the top of the problem listing */
   x echo "&_STYLST" > &OUTDIR./probs_AD&RGMCAT.&DOMFRAG..md ;
   x echo "" >> &OUTDIR./probs_AD&RGMCAT.&DOMFRAG..md ;
   x echo "This file was created by &__COREPROG. (OUTLIB=&OUTLIB) on &SYSDATE" >> &OUTDIR./probs_AD&RGMCAT.&DOMFRAG..md ;
   x echo "" >> &OUTDIR./probs_AD&RGMCAT.&DOMFRAG..md ;
   /* for each dataset */
   %do i=1 %to &NDSETS ;
      %let DSET = %scan(&INDATA, &i, ' ') ;
      %if %length(&&_TMPEXC&i &&_TMPVI3&i &&_TMPVI4&i) %then %do ;
         /* concatenate the temporary listings, drop lines that contain only a single character,
            and append the result to the problem listing */
         x "cat &&_TMPEXC&i &&_TMPVI3&i &&_TMPVI4&i | sed '/^.$/d' >> &OUTDIR./probs_AD&RGMCAT.&DOMFRAG..md" ;
      %end ;
   %end ;
   /* remove the temporary files */
   x "/bin/rm -f /tmp/&__COREPROG._&SYSJOBID.*" ;
%end ;

/* --SEQ will not always be unique when more than one input dataset is used, so create a new unique sequence number for each record */
proc sort data = combined_indata ;
   by usubjid srcdom astdt _aendt srcseq ;
run ;
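/* Illustrative example (hypothetical records for one participant): after the sort above, the
   RENUMBER macro in the next step numbers the records 1, 2, 3, ... within USUBJID.
     SRCDOM  SRCSEQ  ASTDT       ASEQ
     CM      1       01MAR2016   1
     CM      2       11JAN2017   2
     EC      1       01MAR2016   3
   CM and EC both contain SRCSEQ = 1, so SRCSEQ alone cannot identify a record once the domains
   are combined; ASEQ can, and later steps carry ASEQ forward as the sequence number that links
   each regimen back to the intermediate dataset. */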
data combined_indata (label='DEBUG: combined_indata') ;
   set combined_indata ;
   %renumber(ASEQ) ;
   label aseq = "Analysis Sequence Number" ;
run;
%if &DEBUG = Y %then %do ;
   title1 "Dataset: combined_indata" ;
   title2 "INDATA subset to &RGMCAT drugs" ;
   proc print data = combined_indata ;
      by usubjid ;
      id usubjid ;
      var studyid acat aseq adecodcd aclas adecod astdt aendt srcdom srcvar srcseq ;
   run ;
%end ;

/*** 3. Create ADM[RGMCAT][C|E|B] *****************************************/
/**************************************************************************/
data &OUTLIB..ADM&RGMCAT.&DOMFRAG (label = "&RGMCAT Regimen Intermediate Dataset") ;
   /* a FORMAT statement that lists variables without formats, placed before the SET statement, controls the order of variables */
   format STUDYID USUBJID ASEQ ACAT ATRT ADECOD ADECODCD ASTDT AENDT ACLAS &AALLCLASS SRCDOM SRCVAR SRCSEQ ;
   set combined_indata ;
   /* initialize class counts to zero */
   %do i=1 %to %sysfunc(countw(&ALLCLASS)) ;
      %let THISCLASS = %scan(&ALLCLASS, &i) ;
      A&THISCLASS = 0 ;
      label A&THISCLASS = "Number of &THISCLASS.s in ADECOD" ;
   %end ;
   /* for each component of ADECODCD (fixed-dose combinations list several, separated by semicolons), count the drugs in each class */
   do j=1 to countw(adecodcd,';') ;
      component = scan(adecodcd, j,';') ;
      compclass = put(component,$aclass.) ;
      %do k=1 %to %sysfunc(countw(&ALLCLASS)) ;
         %let THISCLASS = %scan(&ALLCLASS, &k) ;
         if compclass = "&THISCLASS" then A&THISCLASS + 1 ;
      %end ;
   end ;
   drop _aendt component compclass j drugname ;
run ;
proc sort data = &OUTLIB..ADM&RGMCAT.&DOMFRAG ;
   by USUBJID ASEQ ;
run ;
%if &OUTLIB ~= WORK %then %do ;
   %permset(INDATA = &OUTLIB..ADM&RGMCAT.&DOMFRAG) ;
%end ;
%pgmlabel(&OUTLIB..ADM&RGMCAT.&DOMFRAG, label = &RGMCAT Regimen Intermediate Dataset, file = YES, numobs = 10) ;
%if &DEBUG = Y %then %do ;
   title "ADM&RGMCAT.&DOMFRAG" ;
   proc contents data = &OUTLIB..ADM&RGMCAT.&DOMFRAG varnum ;
   run ;
   title1 "Output Dataset: &OUTLIB..ADM&RGMCAT.&DOMFRAG" ;
   title2 "One record for each input data record with abbreviations and drug classes" ;
   proc print data = &OUTLIB..ADM&RGMCAT.&DOMFRAG ;
      by usubjid ;
      id usubjid ;
   run ;
%end ;

/*** 4. Create dataset with one record for each start and stop per drug ***/
/**************************************************************************/
data _arv_dates (keep = studyid usubjid drgdate srcseq srcvar srcdom atrt adecodcd status acat adecod singleflg
                 label='DEBUG: _arv_dates') ;
   set &OUTLIB..ADM&RGMCAT.&DOMFRAG (drop = srcseq rename = (aseq = SRCSEQ)) ;
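   /* Illustrative example (hypothetical records): from here on SRCSEQ holds ASEQ from the
      intermediate dataset (the original --SEQ is dropped on the SET statement), and each input
      record is split into a start record and an end record.
        ASEQ 1, ASTDT = 01MAR2016, AENDT = 25MAY2017
          -> SRCSEQ 1  start  DRGDATE = 01MAR2016
             SRCSEQ 1  end    DRGDATE = 25MAY2017
        ASEQ 2, ASTDT = AENDT = 05APR2016 (single dose)
          -> SRCSEQ 2  start  DRGDATE = 05APR2016  SINGLEFLG = Y
             SRCSEQ 2  end    DRGDATE = 05APR2016  SINGLEFLG blank
        ASEQ 3, ASTDT = 11JAN2017, AENDT missing (ongoing)
          -> the end record uses the run date, TODAY(), as DRGDATE */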
   if missing(aendt) then _aendt = today() ;
   else _aendt = aendt ;
   if astdt = aendt then singleflg = 'Y' ;
   drgdate = astdt ;
   status = 'start' ;
   output ;
   singleflg = ' ' ;
   drgdate = _aendt ;
   status = 'end' ;
   output ;
   format drgdate date9. ;
run ;

/* the dataset cannot be sorted using only USUBJID and SRCSEQ because drugs are not always entered in chronological order on the CM log */
proc sort data = _arv_dates ;
   by usubjid drgdate descending status singleflg srcdom srcseq ;
run ;
%if &DEBUG = Y %then %do ;
   title1 "Dataset: _arv_dates" ;
   title2 "One record for each start and stop per drug" ;
   proc print data=_arv_dates ;
      by usubjid ;
      id usubjid ;
      var srcdom srcvar srcseq adecod adecodcd singleflg status drgdate ;
   run ;
%end ;

/*** 5. Add on regimen end dates ******************************************/
/**************************************************************************/
/* use nested SET statements to obtain the next date and status. On the second SET statement,
   using firstobs=2 with a keep and rename creates two new variables, which hold the drgdate
   and status of the next observation:

   USUBJID  SRCSEQ  DRGDATE   STATUS  ADECODCD
   01       1       03/01/16  start   ABC
   01       1       05/25/17  end     ABC

   USUBJID  SRCSEQ  DRGDATE   STATUS  ADECODCD  NEXTDATE  NEXTSTATUS
   01       1       03/01/16  start   ABC       05/25/17  end
   01       1       05/25/17  end     ABC       .
*/
data arv_dates (drop = next: label='DEBUG: arv_dates') ;
   set _arv_dates ;
   by usubjid drgdate ;
   if eof = 0 then set _arv_dates (firstobs = 2 keep = drgdate status
                                   rename = (drgdate=nextdate status=nextstatus)) end = eof ;
   if last.usubjid then call missing(nextdate, nextstatus) ;
   if status = 'start' and nextstatus = 'end' then drgend = nextdate ;
   /* flag to indicate consecutive start/stop dates */
   if singleflg ~= 'Y' AND drgdate = drgend then consecflg = 'Y' ;
   format drgend date9. ;
run;
proc sort data = arv_dates ;
   by usubjid drgdate descending status singleflg srcdom srcseq ;
run ;
%if &DEBUG = Y %then %do ;
   title1 "Dataset: arv_dates" ;
   title2 "One record for each start and stop per drug with select drug end dates" ;
   proc print data=arv_dates ;
      by usubjid ;
      id usubjid ;
      var srcdom srcvar srcseq adecod adecodcd singleflg status drgdate drgend consecflg ;
   run ;
%end ;

/*** 6. Build the ALLSEQ string *******************************************/
/**************************************************************************/
/* create a lookup table to link each participant's SRCSEQ to ADECODCD */
data usrcseq_drug ;
   set arv_dates ;
   keep usubjid srcseq adecodcd adecod ;
run ;
proc sort data = usrcseq_drug out = srcseq_drug nodupkey ;
   by usubjid srcseq adecodcd adecod ;
run ;

/* loop through dates and create regimen (SRCSEQs only) */
data _regimen_allseq ;
   length allseq $50 _consecflg $1 ;
   retain allseq _consecflg ;
   /* loop through records with the same drgdate */
   do until(last.drgdate) ;
      set arv_dates ;
      by usubjid drgdate ;
      /* retain CONSECFLG */
      if first.drgdate then _consecflg = ' ' ;
      if consecflg ~= ' ' then _consecflg = consecflg ;
      else if consecflg = ' ' then consecflg = _consecflg ;
      if first.usubjid then call missing(allseq) ;
      /* if a drug is starting, add the drug's SRCSEQ to the ALLSEQ string */
      if status='start' then do ;
         allseq = catx(",", allseq, srcseq) ;
         /* must output a record for a single dose */
         if singleflg = 'Y' AND drgend ~= . then do ;
            output ;
         end ;
      end ;
      /* else, remove the drug's SRCSEQ from the ALLSEQ string */
      else do ;
         /* build a regular expression that matches this SRCSEQ as a whole number (the word
            boundaries keep e.g. 2 from matching inside 12) plus an optional trailing comma,
            and replace it with nothing; 1 means do this replacement 1 time */
         allseq = prxchange(cats('s/\b', srcseq, '\b,?//'), 1, allseq) ;
      end ;
   end ;
   /* remove a trailing comma */
   _allseq = prxchange('s/\,$//',1,strip(allseq)) ;
   allseq = _allseq ;
   output ; /* <-- */
   drop adecodcd adecod atrt _allseq ;
run ;

/* add ordering variable regimenID, and count the number of SRCSEQs in ALLSEQ */
data regimen_allseq (label='DEBUG: regimen_allseq') ;
   set _regimen_allseq ;
   %renumber(regimenID) ;
   nseqs = countw(allseq,',','t') ; /* the 't' modifier trims blanks */
run;
%if &DEBUG = Y %then %do ;
   title "Dataset: regimen_allseq" ;
   title2 "Compile ALLSEQ (comma separated list of SRCSEQs contributing to regimen)" ;
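   /* Illustrative example (hypothetical participant) of what REGIMEN_ALLSEQ contains, given
      these drug records (SRCSEQ = ASEQ):
        SRCSEQ 1: ABC  01MAR2016 to 25MAY2017
        SRCSEQ 2: EFV  01MAR2016 to 10JAN2017
        SRCSEQ 3: DTG  11JAN2017 to 25MAY2017
      One record is output per DRGDATE, carrying the values of the last record on that date:
        REGIMENID  DRGDATE    STATUS  ALLSEQ   DRGEND     NSEQS
        1          01MAR2016  start   1,2      10JAN2017  2
        2          10JAN2017  end     1        .          1
        3          11JAN2017  start   1,3      25MAY2017  2
        4          25MAY2017  end     (blank)  .          0
      Step 9 later deletes record 2 (a one day transition record) and record 4 (no drugs),
      leaving ABC+EFV from 01MAR2016 to 10JAN2017 and ABC+DTG from 11JAN2017 to 25MAY2017. */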
   proc print data=regimen_allseq ;
      by usubjid ;
      id usubjid ;
      var allseq nseqs singleflg status drgdate drgend consecflg _consecflg ;
   run ;
%end ;

/*** 7. Parse ALLSEQ into _SRCSEQzz ***************************************/
/* this step is repeated in step 12 *****************************/
/**************************************************************************/
/* find max of NSEQS so we know how many _SRCSEQzz variables to create */
proc sql noprint ;
   select max(nseqs) into :MAXSEQN trimmed
      from regimen_allseq ;
quit ;
/* pad with zero */
%let MAXSEQNF = %sysfunc(putn(&MAXSEQN,z2.)) ;

/* ALLSEQ is broken down into its numeric components, and a variable _SRCSEQzz is created for each SRCSEQ in ALLSEQ */
data regimen_seqs (label='DEBUG: regimen_seqs') ;
   set regimen_allseq ;
   /* loop through ALLSEQ and create _SRCSEQ01-_SRCSEQzz */
   array parsed_vars(*) _srcseq01-_srcseq&MAXSEQNF ;
   i = 1 ;
   do while(scan(allseq, i, ",") ne "") ;
      parsed_vars(i) = scan(allseq, i, ",") ;
      i + 1 ;
   end ;
   /* label _SRCSEQ01-_SRCSEQzz */
   %do i=1 %to &MAXSEQN ;
      %let SEQNF = %sysfunc(putn(&i,z2.)) ;
      label _srcseq&SEQNF = "Source Sequence Number &SEQNF" ;
   %end ;
   drop i ;
run ;
%if &DEBUG = Y %then %do ;
   title "Dataset: regimen_seqs" ;
   title2 "Parse ALLSEQ into _SRCSEQ01-_SRCSEQ&MAXSEQNF" ;
   proc print data=regimen_seqs ;
      by usubjid ;
      id usubjid ;
      var allseq nseqs singleflg _srcseq01-_srcseq&MAXSEQNF status drgdate drgend ;
   run ;
%end ;

/*** 8. Build the regimen string ******************************************/
/**************************************************************************/
/* must pad i with zero to match SRCSEQ */
%let i = %sysfunc(putn(1,z2.)) ;
/* create datasets for each SRCSEQzz to the max SRCSEQ */
%do %until (&i = %eval(&MAXSEQNF + 1)) ; /* since i is padded, must increment in this manner */
   /* join each SRCSEQzz to the lookup table and assign the corresponding ABBRV/ADECOD (single drug/combo) as its value */
   proc sql ;
      create table druglist&i as
         select a.usubjid, a.regimenID, a.allseq, a.singleflg, a.drgdate, a.status, a.drgend, a._srcseq&i,
                b.adecodcd as ADECODCD&i label = "Analysis Medication Code &i",
                b.adecod as ADECOD&i label = "Analysis Medication &i"
         from regimen_seqs as a
         left join srcseq_drug as b
            on a.usubjid = b.usubjid and a._srcseq&i = b.srcseq
         order by usubjid, regimenID, allseq, singleflg, drgdate, status, drgend ;
   quit;
   %let i = %sysfunc(putn(%eval(&i+1),z2.)) ;
%end ;

/* merge the DRUGLIST01-DRUGLISTzz datasets back into the original */
data regimen_drugs (label='DEBUG: regimen_drugs') ;
   merge regimen_seqs druglist01-druglist&MAXSEQNF ;
   by usubjid regimenID allseq singleflg drgdate status drgend ;
   drop nseqs srcseq ;
   length regimen $100 longreg $200 ;
   /* build the regimen and longregimen strings */
   regimen = catx(',', of adecodcd01-adecodcd&MAXSEQNF) ;
   longreg = catx(',', of adecod01-adecod&MAXSEQNF) ;
run ;
%if &DEBUG = Y %then %do ;
   title "Dataset: regimen_drugs" ;
   title2 "Map _SRCSEQ01-_SRCSEQ&MAXSEQNF to ADECODCD01-ADECODCD&MAXSEQNF and build the REGIMEN string" ;
   proc print data=regimen_drugs ;
      by usubjid ;
      id usubjid ;
      var regimenID allseq _srcseq: adecodcd: regimen status drgdate drgend ;
   run ;
%end ;

/*** 9. Adjust start and stop dates ***************************************/
/**************************************************************************/
data regimen ;
   set regimen_drugs end=done ;
   if not done then set regimen_drugs (keep = drgdate status regimen consecflg
                                       rename = (drgdate=nextstart status=nextstatus regimen=nextreg consecflg=nextconsecflg)
                                       firstobs = 2) ;
run;

data regimen ;
   set regimen ;
   by usubjid ;
   lagstop = lag(drgend) ;
   if first.usubjid then call missing(lagstop, regstart) ;
   if last.usubjid then call missing(nextstart, nextstatus, nextreg, nextconsecflg) ;

   /* REGIMEN START DATE */
   if drgdate = lagstop and status = 'end' then regstart = drgdate + 1 ; /* add one to any dropped drug(s) end date to get the regimen start date */
   else regstart = drgdate ; /* otherwise use the current record's date */
   /* NOTE: REGSTART still needs another condition applied to it - see data step below */

   /* REGIMEN END DATE */
   if drgend ~= . then _regend = drgend ; /* if a regimen end date is provided, use it */
   /* if the regimen end date is still missing use the next regimen's start date */
   if missing(_regend) then do ;
      /* if the regimen has a consecutive same day start and stop, subtract one */
      if nextconsecflg = 'Y' then _regend = nextstart - 1 ;
      else do ;
         if status = 'start' then _regend = nextstart - 1 ; /* if a drug is starting, subtract one */
         else if status = 'end' then do ;
            if nextstatus = 'start' then _regend = nextstart - 1 ; /* if the next regimen is starting a drug, subtract one */
            else if nextstatus = 'end' then _regend = nextstart ; /* if the next regimen is dropping a drug, use that date */
         end ;
      end ;
   end ;

   /* delete empty records */
   if not anyalpha(regimen) then delete ;

   /* delete the transition record IF:
        it is a "next day" data entry type (_regend + 1 = nextstart), AND
        it is a one day regimen (drgdate = _regend), AND
        there is no gap (drgdate = lagstop) */
   if _regend + 1 = nextstart AND drgdate = _regend AND drgdate = lagstop then DELETE ;
   format regstart _regend lagstop date9. ;
run;

data regimen ;
   set regimen ;
   by usubjid ;
   /* LAG must execute on every record (not only in an ELSE branch) so that it returns the previous record's value */
   lagend = lag(_regend) ;
   if first.usubjid then lagend = . ;
   /* if the regimen start date is the same as the previous record's regimen end date, then add one to the regimen start date */
   if regstart = lagend then regstart + 1 ;
   /* NOTE: this could not be done in the previous data step because _REGEND must first be calculated for every record before it is lagged */
   format lagend date9. ;
   drop drgend lagstop status next: regimenID ;
run ;

/* recalculate regimenID */
data regimen (label='DEBUG: regimen') ;
   set regimen ;
   %renumber(regimenID) ;
run ;
proc sort data = regimen ;
   by usubjid regimenID ;
run ;
%if &DEBUG = Y %then %do ;
   title1 "Dataset: regimen" ;
   title2 "Regimens with adjusted start and stop dates" ;
   proc print data=regimen ;
      by usubjid ;
      id usubjid ;
      var regimenID allseq regstart _regend regimen ;
   run ;
%end ;

/*** 10. Reorder drugs within regimen *************************************/
/**************************************************************************/
/* build a dataset from the CLASSORD parameter to create a format to order the drug classes
   (START is the left side of the = in the format, LABEL is the right side) */
%if &_CLSSOFF = N %then %do ;
   data clssordlookup ;
      length start $ 5 ;
      do label = 1 to countw("&CLASSORD") ;
         start = scan("&CLASSORD", label) ;
         fmtname = "$clssord" ;
         output ;
      end ;
   run ;
   /* links order to drug classes */
   proc format cntlin = clssordlookup ;
   run ;
%end ;

/* build a dataset from the spreadsheet, maintaining the spreadsheet order, to create a format to order the drugs within the drug classes */
data drgordlookup ;
   set &RGMXLSX (keep = aabrv rename = (aabrv = start)) ;
   label = _n_ ;
   fmtname = '$drgord' ;
run;
proc sort data = drgordlookup nodupkey ;
   by start ;
run ;
/* links order to drugs within class */
proc format cntlin = drgordlookup ;
run ;

/* break down the regimen into its components to be ordered */
proc sort data = regimen ;
   by usubjid regimenID ;
run ;
data order_regimen ;
   length _drug $100 ;
   set regimen ;
   by usubjid regimenID ;
   if first.regimenID then i=1 ;
   do while(scan(regimen, i, ",;") ne "") ;
      _abrev = scan(regimen, i, ",;") ;
      _drug = upcase(strip(scan(longreg, i, ",;"))) ;
      _class = put(_abrev, $aclass.) ;
      %if &_CLSSOFF = N %then %do ;
         _classord = input(put(_class, $clssord.),3.) ;
      %end ;
      _drugord = input(put(_abrev, $drgord.),3.) ;
      output ;
      i+1 ;
   end ;
run ;

/* concatenate the ordered regimen components */
proc sort data = order_regimen ;
   by usubjid regimenID %if &_CLSSOFF = N %then %do ; _classord %end ; _drugord ;
run ;
data regimens_inorder ;
   length new_regimen $100 new_longreg $200 ;
   retain new_regimen new_longreg ;
   do until(last.regimenID) ;
      set order_regimen ;
      by usubjid regimenID ;
      if first.regimenID then call missing(new_regimen, new_longreg) ;
      new_regimen = catx("&TRANDLIM", new_regimen, _abrev) ;
      new_longreg = catx("&TRANDLIM", new_longreg, _drug) ;
   end;
   drop i _abrev _drug _drugord drgdate longreg singleflg adecodcd: adecod: _srcseq: srcvar
        %if &_CLSSOFF = N %then %do ; _class _classord %end ;
   ;
run;

/*** 11. Insert blank records for treatment gaps *************************/
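/* Illustrative example (hypothetical dates): if a regimen ends on 10MAR2016 and the next
   regimen starts on 15MAR2016, DAYSOFF = 15MAR2016 - 10MAR2016 = 5, which is greater than 1,
   so a blank record with REGSTART = 11MAR2016 and _REGEND = 14MAR2016 is inserted for the gap.
   If the next regimen started on 11MAR2016 instead, DAYSOFF = 1 and no record is inserted. */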
/**************************************************************************/
data regimen_inorder ;
   set regimens_inorder end=done ;
   by usubjid ;
   if not done then set regimens_inorder (keep = regstart rename = (regstart=nextstart) firstobs = 2) ;
   if last.usubjid then call missing(nextstart) ;
   daysOff = sum(nextstart - _regend) ;
   output ; /* <-- */
   if daysOff > 1 then do ;
      call missing(regimenID, allseq, regimen, new_regimen, new_longreg, daysOff) ;
      regstart = _regend + 1 ;
      _regend = nextstart - 1 ;
      output ;
   end;
   drop regimenID ;
run;

/* recalculate regimenID */
data regimen_inorder (label='DEBUG: regimen_inorder') ;
   set regimen_inorder ;
   %renumber(regimenID) ;
run ;
proc sort data = regimen_inorder ;
   by usubjid regimenID ;
run ;
%if &DEBUG = Y %then %do ;
   title1 "Dataset: regimen_inorder" ;
   title2 "Regimen with correct class/drug order, and blank records inserted for treatment gaps" ;
   title3 "regimen -> new_regimen" ;
   proc print data = regimen_inorder ;
      by usubjid ;
      id usubjid ;
      var regimenID allseq regstart _regend regimen new_regimen ;
   run ;
%end ;

/*** 12. Merge repeated sequential regimens (e.g. dose increases) *********/
/**************************************************************************/
proc sort data = regimen_inorder ;
   by usubjid new_regimen regstart ;
run ;
data regimens_touching ;
   set regimen_inorder ;
   by usubjid new_regimen regstart ;
   length cum_allseq $50 ;
   retain cum_allseq '' first_start prev_start prev_end compressord 0 ;
   if first.new_regimen then do ;
      compressord = compressord + 1 ;
      first_start = regstart ;
      prev_start = regstart ;
      prev_end = _regend ;
      cum_allseq = allseq ;
   end ;
   else do ;
      daysBtwn = regstart - prev_end ;
      if daysBtwn in (0 1) then do ; /* if there is no gap in dates, then we merge the information */
         prev_start = regstart ;
         prev_end = _regend;
         cum_allseq = cats(cum_allseq, ",", allseq) ;
      end ;
      else do ; /* if there is a gap in dates, then we do not merge the information */
         compressord = compressord + 1 ;
         first_start = regstart ;
         prev_start = regstart ;
      prev_end = _regend ;
      cum_allseq = allseq ;
    end;
  end;
  /* use a regular expression to remove duplicate SEQs created when merging records
     \b   matches a word boundary
     \d+  matches one or more digits (0 to 9), as many as possible
     \,   matches exactly a comma character
     .*?  matches any characters, as few as possible
     \1+  matches the first capturing group one or more times, as many as possible */
  rex1 = prxparse('s/(\b\d+\,\b)(.*?)(\b\1+\b)/\2\3/i') ;
  rex2 = prxparse('/(\b\d+\,\b)(.*?)(\b\1+\b)/i') ;
  do i=1 to 100 ;
    cum_allseq = prxchange(rex1, -1, compbl(cum_allseq)) ;
    if not prxmatch(rex2, compbl(cum_allseq)) then leave ;
  end ;
  format first_start prev_start prev_end date9. ;
run;

proc sort data = regimens_touching ;
  by compressord ;
run ;

data regimen_dosechg ;
  set regimens_touching ;
  by compressord ;
  regstart = first_start ;
  /* the last record contains the correct start/end date, as well as ALL of the merged SRCSEQs (in cum_allseq) */
  if last.compressord then output ;
  drop i regimenID regimen rex: allseq ;
run;

proc sort data = regimen_dosechg ;
  by usubjid regstart _regend ;
run ;

/* recalculate regimenID */
data regimen_dosechg ;
  set regimen_dosechg ;
  %renumber(regimenID) ;
  cum_nseqs = countw(cum_allseq,',','t') ; /* the 't' modifier trims blanks */
run ;

proc sort data = regimen_dosechg ;
  by usubjid regimenID ;
run ;

/* parse CUM_ALLSEQ into SRCSEQzz; this mirrors the parsing of ALLSEQ in step 7 */
/* find max of CUM_NSEQS so we know how many SRCSEQzz variables to create */
proc sql noprint ;
  select max(cum_nseqs) into :CUM_MAXSEQN trimmed
  from regimen_dosechg ;
quit ;

/* pad with zero */
%let CUM_MAXSEQNF = %sysfunc(putn(&CUM_MAXSEQN,z2.)) ;

/* CUM_ALLSEQ is broken down into its numeric components, and a variable SRCSEQzz is created for each SRCSEQ in CUM_ALLSEQ */
data regimen_merged (label='DEBUG: regimen_merged') ;
  set regimen_dosechg ;
  /* loop through CUM_ALLSEQ and create SRCSEQ01-SRCSEQzz */
  array parsed_vars(*) SRCSEQ01-SRCSEQ&CUM_MAXSEQNF ;
  i = 1 ;
  do while(scan(cum_allseq, i, ",") ne "") ;
    /* SRCSEQzz are numeric, so convert each token explicitly */
    parsed_vars(i) = input(scan(cum_allseq, i, ","), best32.) ;
    i + 1 ;
  end ;


  /* label SRCSEQ01-SRCSEQzz */
  %do i=1 %to &CUM_MAXSEQN ;
    %let CUM_SEQNF = %sysfunc(putn(&i,z2.)) ;
    label SRCSEQ&CUM_SEQNF = "Source Sequence Number &CUM_SEQNF" ;
  %end ;
  drop i ;
run ;

%if &DEBUG = Y %then %do ;
  title1 "Dataset: regimen_merged" ;
  title2 "Repeated sequential regimens merged" ;
  proc print data = regimen_merged ;
    by usubjid ;
    id usubjid ;
    var regimenID regstart _regend new_regimen cum_allseq srcseq: ;
  run ;
%end ;

/*** 13. Create final dataset AD[RGMCAT][C|E|B] ***************************/
/**************************************************************************/
data _final_regimen ;
  length SRCDOM $10 SRCVAR $7 ;
  set regimen_merged ;
  SRCDOM="ADM&RGMCAT.&DOMFRAG" ;
  SRCVAR="ADECOD" ;
  ARGMCD = new_regimen ;
  ARGM = new_longreg ;
  /* total number of drugs in regimen */
  TRTTOT = countw(ARGMCD, '/;,', 's') ;
  ASTDT = regstart ;
  if _regend = today() then AENDT = . ;
  else AENDT = _regend ;
  /* calculate duration of regimen (days) */
  RGMDUR = (AENDT - ASTDT) + 1 ;
  /* initialize class counts to zero and label */
  %do i=1 %to %sysfunc(countw(&ALLCLASS)) ;
    %let THISCLASS = %scan(&ALLCLASS, &i) ;
    &THISCLASS.TOT = 0 ;
    label &THISCLASS.TOT = "Number of &THISCLASS.s in ARGMCD" ;
  %end ;
  /* create class totals for regimen */
  do j=1 to countw(new_regimen,"&TRANDLIM") ;
    component = scan(new_regimen, j,"&TRANDLIM") ;
    compclass = put(component,$aclass.) ;
    %do k=1 %to %sysfunc(countw(&ALLCLASS)) ;
      %let THISCLASS = %scan(&ALLCLASS, &k) ;
      if compclass = "&THISCLASS" then &THISCLASS.tot + 1 ;
    %end ;
  end ;
  label ASTDT = "Analysis Start Date"
        AENDT = "Analysis End Date"
        ARGM = "Analysis Regimen"
        ARGMCD = "Analysis Regimen Code"
        TRTTOT = "Number of Drugs in Regimen"
        RGMDUR = "Regimen Duration (days)"
        SRCVAR = "Source Variable" ;


  format ASTDT AENDT date9. ;
  drop j daysOff regstart _regend component compclass regimenID new_longreg new_regimen _: ;
run ;

/* add ASEQ */
data final_regimen ;
  set _final_regimen ;
  %renumber(ASEQ) ;
  label ASEQ = "Analysis Sequence Number" ;
run ;

/* create list of SRCSEQ vars and [class]TOT vars to be used to reorder variables */
proc sql noprint ;
  select name into :SEQVARS separated by ','
  from dictionary.columns
  where libname = 'WORK' and memname = 'FINAL_REGIMEN'
    and name like 'SRCSEQ%' and name not = 'SRCSEQ' ;
quit ;

/* use sql to reorder variables and output the final dataset */
proc sql ;
  create table &OUTLIB..AD&RGMCAT.&DOMFRAG (label = "&RGMCAT Regimen Analysis Dataset") as
  select STUDYID, USUBJID, ASEQ, ACAT, ARGM, ARGMCD, ASTDT, AENDT, RGMDUR, TRTTOT,
         &TOTALLCLASS /*, intentionally missing */
         SRCDOM, SRCVAR, &SEQVARS
  from final_regimen
  order by USUBJID, ASEQ ;
quit ;

/* sort output dataset */
proc sort data = &OUTLIB..AD&RGMCAT.&DOMFRAG ;
  by USUBJID ASEQ ;
run ;

%if &OUTLIB ~= WORK %then %do ;
  %permset(INDATA = &OUTLIB..AD&RGMCAT.&DOMFRAG) ;
%end ;

%pgmlabel(&OUTLIB..AD&RGMCAT.&DOMFRAG, label = &RGMCAT Regimen Analysis Dataset, file = YES, numobs = 10);

%if &DEBUG = Y %then %do ;
  title1 "Output Dataset: &OUTLIB..AD&RGMCAT.&DOMFRAG" ;
  title2 "Final &RGMCAT regimens" ;
  proc print data=&OUTLIB..AD&RGMCAT.&DOMFRAG label ;
    by usubjid ;
    id usubjid ;
    var ASEQ ARGMCD ASTDT AENDT RGMDUR SRCSEQ: ;
  run ;
%end ;

/* delete _MKADRGM&i and PROBDSETS datasets */
proc delete lib = work data = &INDSETS ;
run ;

%if %length(&PROBDSETS) > 0 %then %do ;
  proc delete lib = work data = &PROBDSETS ;
  run ;
%end ;

%if &DEBUG = Y %then %do ;
  %put _LOCAL_ ;
%end ;

title ;
%mend MKADRGM ;
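The heart of step 12 is a simple rule: consecutive records of the same regimen are merged when the next start date falls on, or one day after, the previous end date, and every source sequence number behind the merged records is carried forward. The following is a minimal sketch of that rule in Python rather than SAS, so the logic can be checked on a few toy records outside a SAS session. The record layout (`Rec`) and the function names are hypothetical, and duplicate SEQs are dropped with an order-preserving dictionary instead of the macro's regular expression, so the order of the resulting SRCSEQs may differ from what MKADRGM produces.

```python
from datetime import date
from typing import List, NamedTuple, Tuple


class Rec(NamedTuple):
    """One regimen-period record (hypothetical layout for illustration)."""
    usubjid: str
    regimen: str           # regimen code, e.g. "3TC/ABC/DTG"
    start: date
    end: date
    seqs: Tuple[int, ...]  # source SEQs that support this record


def merge_touching(recs: List[Rec]) -> List[dict]:
    """Collapse consecutive same-regimen records whose dates touch (gap of 0 or 1 day)."""
    out: List[dict] = []
    # Same ordering as the BY statement: subject, regimen, start date.
    for r in sorted(recs, key=lambda x: (x.usubjid, x.regimen, x.start)):
        last = out[-1] if out else None
        if (last is not None
                and last["usubjid"] == r.usubjid
                and last["regimen"] == r.regimen
                and (r.start - last["end"]).days in (0, 1)):
            # No gap: extend the open period and accumulate the source SEQs.
            last["end"] = r.end
            last["seqs"].extend(r.seqs)
        else:
            # First record of a regimen, or a real gap (or an overlap): start a new period.
            out.append({"usubjid": r.usubjid, "regimen": r.regimen,
                        "start": r.start, "end": r.end, "seqs": list(r.seqs)})
    for period in out:
        # Drop duplicate SEQs, keeping the first occurrence of each.
        period["seqs"] = list(dict.fromkeys(period["seqs"]))
    # Re-sort chronologically within subject before any renumbering.
    out.sort(key=lambda p: (p["usubjid"], p["start"], p["end"]))
    return out


def srcseq_columns(seqs: List[int]) -> dict:
    """Spread a list of SEQs into SRCSEQ01, SRCSEQ02, ... style columns."""
    return {f"SRCSEQ{i:02d}": s for i, s in enumerate(seqs, start=1)}


recs = [
    Rec("S1", "3TC/ABC/DTG", date(2020, 1, 1), date(2020, 1, 31), (1, 2)),
    Rec("S1", "3TC/ABC/DTG", date(2020, 2, 1), date(2020, 3, 1), (2, 3)),
    Rec("S1", "3TC/ABC/DTG", date(2020, 4, 1), date(2020, 5, 1), (4,)),
    Rec("S1", "EFV/FTC/TDF", date(2020, 3, 2), date(2020, 3, 31), (5,)),
]
for p in merge_touching(recs):
    print(p["regimen"], p["start"], p["end"], srcseq_columns(p["seqs"]))
```

With these toy records, the first two 3TC/ABC/DTG records merge into one period from 1 Jan to 1 Mar 2020 with SRCSEQ01 to SRCSEQ03 equal to 1, 2 and 3, while the April record starts a new period because of the 31-day gap.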
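Step 13 then derives the per-regimen analysis variables: the number of drugs (TRTTOT), a missing AENDT for ongoing regimens, the inclusive duration (RGMDUR), and one count per drug class. A minimal Python sketch of those derivations, under stated assumptions: `ALL_CLASSES` and `DRUG_CLASS` are hypothetical stand-ins for the macro's `&ALLCLASS` list and `$aclass.` format, components are split on '/', ';', ',' and blanks (the delimiters used for TRTTOT), and an end date equal to `today` marks an ongoing regimen, mirroring the macro's `_regend = today()` check.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical stand-ins for the macro's &ALLCLASS list and $aclass. format.
ALL_CLASSES = ("NRTI", "NNRTI", "PI", "INSTI")
DRUG_CLASS = {"ABC": "NRTI", "3TC": "NRTI", "FTC": "NRTI", "TDF": "NRTI",
              "EFV": "NNRTI", "DTG": "INSTI"}


def final_record(argmcd: str, start: date, end: date, today: date) -> dict:
    """Derive TRTTOT, AENDT, RGMDUR and [class]TOT for one regimen period."""
    # Split the regimen code into its components ('/', ';', ',' or blanks).
    drugs = [d for d in re.split(r"[/;,\s]+", argmcd) if d]
    # An end date equal to today marks an ongoing regimen: leave AENDT missing.
    aendt: Optional[date] = None if end == today else end
    # Inclusive duration in days; missing when the regimen is ongoing.
    rgmdur = (aendt - start).days + 1 if aendt is not None else None
    # Every class starts at zero, then each component adds one to its class.
    counts = {f"{c}TOT": 0 for c in ALL_CLASSES}
    for d in drugs:
        cls = DRUG_CLASS.get(d)
        if cls in ALL_CLASSES:
            counts[f"{cls}TOT"] += 1
    return {"ARGMCD": argmcd, "ASTDT": start, "AENDT": aendt,
            "RGMDUR": rgmdur, "TRTTOT": len(drugs), **counts}


print(final_record("3TC/ABC/DTG", date(2020, 1, 1), date(2020, 1, 31), today=date(2020, 6, 1)))
```

Components with no class in the lookup are still counted in TRTTOT but add to no class total, which matches the macro's behavior when `put(component,$aclass.)` returns a value outside `&ALLCLASS`.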
