Post-lock Data Flow: From CRF to FDA

Ben Vaughn, MS, RAC, Principal Statistical Scientist

Transcript of Post-lock Data Flow: From CRF to FDA

Page 1: Post-lock Data Flow: From CRF to FDA

Post-Lock Data Flow: From CRF to FDA

Ben Vaughn, MS, RAC
Principal Statistical Scientist

Page 2: Post-lock Data Flow: From CRF to FDA

Broad Strokes of What Statisticians Do with Clinical Trial Data

Get data from data management

Do assorted “Stuff” with that data

Make Tables, Listings and Figures

Transfer Tables, Listings and Figures to Medical Writing

Page 3: Post-lock Data Flow: From CRF to FDA

A little more detail and some terms - Step 1: EDC System Data

• Data exported from an EDC system is frequently called “raw” data.
  – Can contain superfluous variables (edit checks, audit trails, etc.)
  – Not very conducive to generating tables (form follows function: the datasets typically parallel the CRF pages)
  – Documentation isn’t designed with an FDA reviewer as the audience
  – Datasets vary widely from vendor to vendor and across EDC systems
  – Generally speaking, there is a 1:1 relationship between collected and raw data; each data point appears once and only once in the raw data
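To make the raw-data points concrete, here is a minimal Python sketch, assuming a raw export represented as a list of dicts. The system-column prefixes (`audit_`, `editcheck_`, `sys_`) and the key fields (`subject`, `visit`, `form`) are hypothetical; every EDC vendor lays its exports out differently. It drops superfluous edit-check/audit columns and checks the 1:1 property by reporting any form instance that appears more than once.

```python
# Hypothetical column prefixes for EDC system/audit fields; real exports vary by vendor.
SYSTEM_PREFIXES = ("audit_", "editcheck_", "sys_")


def drop_system_columns(rows):
    """Return copies of raw rows without EDC system, edit-check or audit columns."""
    return [
        {col: val for col, val in row.items() if not col.startswith(SYSTEM_PREFIXES)}
        for row in rows
    ]


def duplicate_keys(rows, key_fields=("subject", "visit", "form")):
    """List keys that occur more than once; empty if each form instance appears once."""
    seen, dups = set(), []
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if key in seen:
            dups.append(key)
        seen.add(key)
    return dups


raw = [
    {"subject": "001", "visit": 1, "form": "VITALS", "SYSBP": 128, "DIABP": 82,
     "audit_user": "site01", "editcheck_status": "closed"},
    {"subject": "001", "visit": 2, "form": "VITALS", "SYSBP": 124, "DIABP": 80,
     "audit_user": "site01", "editcheck_status": "open"},
]

clean = drop_system_columns(raw)
print(clean[0])  # {'subject': '001', 'visit': 1, 'form': 'VITALS', 'SYSBP': 128, 'DIABP': 82}
print(duplicate_keys(clean))  # []
```

A non-empty result from `duplicate_keys` is a question for data management, not something to resolve silently in the analysis programs.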

Page 4: Post-lock Data Flow: From CRF to FDA

A little more detail and some terms - Step 2: Analysis Datasets

• Datasets generated from the raw data to facilitate the production of Tables, Listings and Figures are called “Analysis Datasets” (ADS)
  – Any variables that must be derived are added: scoring of instruments, determination of whether AEs are treatment emergent, determination of baseline values and calculation of change from baseline, etc.
  – Key variables are merged onto all records: treatment codes, covariates, study start and stop dates, age, gender, race, etc.
  – Datasets might be split or combined into more logical groups; for example, many different patient-reported outcomes might arrive in a single dataset from data management (DM) but have different analysis rules, and therefore be split into multiple analysis datasets.
  – The goal is to create datasets where most key points of information in the tables can be generated with one procedure
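The derivations listed above can be sketched in Python (the deck's programs would typically be SAS; Python is used here only for illustration). The sketch assumes ADaM-style variable names (`TRT01A`, `TRTSDT`, `ADT`, `AVAL`, `ABLFL`, `BASE`, `CHG`, `ASTDT`, `TRTEMFL`) and two simplified, hypothetical rules that a real study would take from its SAP: baseline is the last non-missing value on or before the treatment start date, and an AE is treatment emergent if it starts on or after that date.

```python
from datetime import date

# Hypothetical subject-level (ADSL-like) data: one record per subject.
ADSL = {
    "001": {"TRT01A": "Drug A", "TRTSDT": date(2015, 3, 2), "AGE": 54, "SEX": "F"},
}


def derive_vitals(records, adsl):
    """Merge subject-level keys onto each record, then derive ABLFL, BASE and CHG.

    Assumed rules (in practice these come from the SAP): baseline is the last
    non-missing value on or before TRTSDT; CHG is derived only for records
    after TRTSDT.
    """
    baseline = {}  # (USUBJID, PARAMCD) -> index of the baseline record
    for i, rec in sorted(enumerate(records), key=lambda pair: pair[1]["ADT"]):
        if rec["AVAL"] is not None and rec["ADT"] <= adsl[rec["USUBJID"]]["TRTSDT"]:
            baseline[(rec["USUBJID"], rec["PARAMCD"])] = i
    out = []
    for i, rec in enumerate(records):
        row = {**rec, **adsl[rec["USUBJID"]]}  # key variables merged onto every record
        b = baseline.get((rec["USUBJID"], rec["PARAMCD"]))
        row["ABLFL"] = "Y" if b == i else None
        row["BASE"] = records[b]["AVAL"] if b is not None else None
        post = rec["ADT"] > row["TRTSDT"]
        has_both = rec["AVAL"] is not None and row["BASE"] is not None
        row["CHG"] = rec["AVAL"] - row["BASE"] if post and has_both else None
        out.append(row)
    return out


def flag_teae(aes, adsl):
    """Simplified rule: TRTEMFL = 'Y' if the AE starts on or after TRTSDT."""
    return [
        {**ae, "TRTEMFL": "Y" if ae["ASTDT"] >= adsl[ae["USUBJID"]]["TRTSDT"] else None}
        for ae in aes
    ]


vitals = [
    {"USUBJID": "001", "PARAMCD": "SYSBP", "ADT": date(2015, 2, 20), "AVAL": 130},
    {"USUBJID": "001", "PARAMCD": "SYSBP", "ADT": date(2015, 3, 2), "AVAL": 128},
    {"USUBJID": "001", "PARAMCD": "SYSBP", "ADT": date(2015, 4, 1), "AVAL": 120},
]
advs = derive_vitals(vitals, ADSL)
print([(r["AVAL"], r["ABLFL"], r["BASE"], r["CHG"]) for r in advs])
# [(130, None, 128, None), (128, 'Y', 128, None), (120, None, 128, -8)]

aes = [
    {"USUBJID": "001", "AETERM": "Headache", "ASTDT": date(2015, 2, 25)},
    {"USUBJID": "001", "AETERM": "Nausea", "ASTDT": date(2015, 3, 10)},
]
print([ae["TRTEMFL"] for ae in flag_teae(aes, ADSL)])  # [None, 'Y']
```

Because the derivations are functions of the input data, the same program can be rerun unchanged on a new data cut.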

Page 5: Post-lock Data Flow: From CRF to FDA

The Assorted “Stuff”

• Documentation is written for the raw EDC data to allow an FDA reviewer to understand the source of each data point

• Analysis datasets (ADS) are generated by programs (typically SAS) that transform the raw data; these programs can be rerun on new cuts of data

• Documentation is written for the ADS to allow an FDA reviewer to understand the source of each variable/row and how it maps from the raw data

• Did I mention documentation? FDA loves documentation.

Page 6: Post-lock Data Flow: From CRF to FDA

But wait! Data standards

Figure: Electronic Standardized Study Data Timeline (Fitzmartin, PhUSE 2014)

Page 7: Post-lock Data Flow: From CRF to FDA

Data Standards, Cont.

• ALL data submitted to FDA for studies starting next year MUST conform to data standards (but sponsors should already be doing it)

• These guidances are BINDING; refusal to file is possible if they are not followed

• A draft guidance defines what the standards are: Study Data Tabulation Model (SDTM) for the “raw” data and Analysis Data Model (ADaM) for the analysis datasets; this guidance is actively reviewed and updated

• A sponsor may apply for a waiver, but FDA seems unlikely to grant them

Page 8: Post-lock Data Flow: From CRF to FDA

Data Standards: SDTM

• Extremely rigid format
• Anything can be mapped into this format, and there is a standard for expanding it for things that don’t map well to an existing pre-specified dataset
• Does not necessarily reflect the flow of a clinic visit or the CRF design, which can make it difficult to implement directly in an EDC system
• For some types of data there is no excuse not to get SDTM from the start (e.g., a central lab vendor should be able to provide SDTM data)
• Standardized documentation (define.xml)
• Submitted to FDA in place of raw data

Page 9: Post-lock Data Flow: From CRF to FDA

Data Standards: ADaM

• Typically uses SDTM as its source
• Somewhat less rigid than SDTM
• Fewer specified data structures (but expanding):
  – ADSL (Subject-Level Analysis Dataset; standard variables for treatments, dates, sites, age, sex, race, populations)
  – ADAE (Adverse Events)
  – ADTTE (Time to Event)
  – OCCDS (Occurrence Data Structure, a generalization of ADAE for things like Medical History and Concomitant Medications)
  – BDS (Basic Data Structure: everything else)
• Standardized documentation (Define.xml or Define.pdf)

Page 10: Post-lock Data Flow: From CRF to FDA

Data Standards: ADaM, cont.

Legacy data is frequently in a “Wide” format…

Subject, Visit
DIABP, SYSBP, PULSE, RESP, WEIGHT, HEIGHT, BMI
DIABPBL, SYSBPBL, PULSEBL, RESPBL, WEIGHTBL, HEIGHTBL, BMIBL
DIABPCBL, SYSBPCBL, PULSECBL, RESPCBL, WEIGHTCBL, HEIGHTCBL, BMICBL

Page 11: Post-lock Data Flow: From CRF to FDA

Data Standards: ADaM, cont.

Crammed onto one row:

Subject | Visit | DIABP | SYSBP | PULSE | RESP | WEIGHT | HEIGHT | BMI | DIABPBL | SYSBPBL | PULSEBL | RESPBL | WEIGHTBL | HEIGHTBL | BMIBL | DIABPCBL | SYSBPCBL | PULSECBL | RESPCBL | WEIGHTCBL | HEIGHTCBL | BMICBL

Page 12: Post-lock Data Flow: From CRF to FDA

Data Standards: ADaM, cont.

SDTM and ADaM are “Tall, Skinny” formats

SUBJID  AVISITN  PARAMCD  AVAL  BASE  CHG
001     1        DIABP
001     1        SYSBP
001     1        PULSE
001     1        RESP
001     1        WEIGHT
001     1        HEIGHT
001     1        BMI
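The move from the wide legacy layout to the tall layout is a transpose. A minimal Python sketch, assuming the legacy suffix convention suggested by the column names on the previous slides (`<PARAM>` for the value, `<PARAM>BL` for baseline, `<PARAM>CBL` for change from baseline; this reading of the suffixes is an assumption) and input rows keyed by `Subject` and `Visit`:

```python
PARAMS = ["DIABP", "SYSBP", "PULSE", "RESP", "WEIGHT", "HEIGHT", "BMI"]


def wide_to_tall(wide_rows, params=PARAMS):
    """Transpose one-row-per-visit legacy data into one row per parameter.

    Assumed legacy naming: <PARAM> = value, <PARAM>BL = baseline,
    <PARAM>CBL = change from baseline. Missing columns become None.
    """
    tall = []
    for w in wide_rows:
        for p in params:
            tall.append({
                "SUBJID": w["Subject"],
                "AVISITN": w["Visit"],
                "PARAMCD": p,
                "AVAL": w.get(p),
                "BASE": w.get(p + "BL"),
                "CHG": w.get(p + "CBL"),
            })
    return tall


legacy = [{"Subject": "001", "Visit": 1,
           "DIABP": 80, "DIABPBL": 84, "DIABPCBL": -4,
           "SYSBP": 122, "SYSBPBL": 130, "SYSBPCBL": -8}]
tall = wide_to_tall(legacy)
print(len(tall))  # 7: one row per parameter
print(tall[1])
# {'SUBJID': '001', 'AVISITN': 1, 'PARAMCD': 'SYSBP', 'AVAL': 122, 'BASE': 130, 'CHG': -8}
```

In the tall layout a new measurement adds rows rather than three new columns, which is what makes the same table code reusable across parameters.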

Page 13: Post-lock Data Flow: From CRF to FDA

Data Standards: ADaM Advantages

• Huge efficiencies for table programming:
  – You almost never need to look up variable names
  – Programming code for one table can be altered to make a similar table by just changing the dataset and parameters
• Standard documentation allows reviewers to easily understand what is in each dataset, how it was derived and which flags should be used to produce a particular display
• Data from multiple studies can be “Stacked” as long as things like the parameter codes are uniform
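The "change the dataset and parameters" efficiency can be sketched as a single summary function in Python. The function name `summarize` and its arguments are hypothetical; it assumes BDS-style records carrying `PARAMCD`, `AVISITN`, a treatment variable (`TRT01A` here) and the analysis variable (`CHG` by default).

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize(dataset, paramcd, var="CHG", by="TRT01A", visit=None):
    """n, mean and SD of `var` for one PARAMCD, by treatment group.

    The same function serves any BDS-style dataset: a 'similar table' is just
    a different dataset or PARAMCD argument.
    """
    groups = defaultdict(list)
    for rec in dataset:
        if rec["PARAMCD"] != paramcd:
            continue
        if visit is not None and rec["AVISITN"] != visit:
            continue
        if rec.get(var) is not None:
            groups[rec[by]].append(rec[var])
    return {
        grp: {"n": len(vals), "mean": mean(vals),
              "sd": stdev(vals) if len(vals) > 1 else None}
        for grp, vals in sorted(groups.items())
    }


advs = [
    {"USUBJID": "001", "TRT01A": "Drug A", "PARAMCD": "SYSBP", "AVISITN": 2, "CHG": -8},
    {"USUBJID": "002", "TRT01A": "Drug A", "PARAMCD": "SYSBP", "AVISITN": 2, "CHG": -4},
    {"USUBJID": "003", "TRT01A": "Placebo", "PARAMCD": "SYSBP", "AVISITN": 2, "CHG": 1},
    {"USUBJID": "001", "TRT01A": "Drug A", "PARAMCD": "PULSE", "AVISITN": 2, "CHG": 3},
]
print(summarize(advs, "SYSBP", visit=2))
print(summarize(advs, "PULSE", visit=2))  # same code, different parameter
```

Stacking studies works the same way: if every study uses the same `PARAMCD` values, the pooled records can be passed straight to the same function.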

Page 14: Post-lock Data Flow: From CRF to FDA

Data Standards: ADaM Disadvantages

• Datasets are a bigger up-front investment
• Completely fails where you need multiple outcomes on a single row
• “Drill down” questions are problematic; they can be created as additional rows/outcomes, but clinical reviewers are typically interested in how the answers relate to the questions that triggered the drill down
• Clinical reviewers almost always want “Wide” listings: everything collected at the same time point on a single row (the data must be transposed)
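The transpose for a "Wide" listing is the reverse of the tall layout. A minimal Python sketch, assuming tall records keyed by `SUBJID` and `AVISITN` with one value per `PARAMCD`; the function name `tall_to_wide` is hypothetical. It raises an error on duplicates instead of silently overwriting, since a duplicate usually signals a derivation problem.

```python
def tall_to_wide(tall, value="AVAL", keys=("SUBJID", "AVISITN")):
    """Transpose tall rows into one listing row per subject and visit.

    Assumes at most one record per key and PARAMCD; a duplicate raises
    an error rather than silently overwriting a value.
    """
    wide = {}
    for rec in tall:
        key = tuple(rec[k] for k in keys)
        row = wide.setdefault(key, dict(zip(keys, key)))
        if rec["PARAMCD"] in row:
            raise ValueError(f"duplicate {rec['PARAMCD']} for {key}")
        row[rec["PARAMCD"]] = rec[value]
    return [wide[key] for key in sorted(wide)]


tall = [
    {"SUBJID": "001", "AVISITN": 1, "PARAMCD": "SYSBP", "AVAL": 122},
    {"SUBJID": "001", "AVISITN": 1, "PARAMCD": "DIABP", "AVAL": 80},
    {"SUBJID": "001", "AVISITN": 2, "PARAMCD": "SYSBP", "AVAL": 118},
]
for row in tall_to_wide(tall):
    print(row)
# {'SUBJID': '001', 'AVISITN': 1, 'SYSBP': 122, 'DIABP': 80}
# {'SUBJID': '001', 'AVISITN': 2, 'SYSBP': 118}
```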

Page 15: Post-lock Data Flow: From CRF to FDA

CDASH: Related, but not required

• Clinical Data Acquisition Standards Harmonization (CDASH) is a suite of standardized CRFs and variable names for the data points collected in those forms

• Maps cleanly and uniformly into SDTM
• Saves time and money!
• Your study is no longer a unique snowflake
• There will likely always be some non-standard data collected, so manual mapping will still be required
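A minimal Python sketch of why CDASH-style collection maps cleanly, and where manual mapping remains. The field list is an illustrative subset, not an authoritative CDASH-to-SDTM mapping. `CUFF_SIZE` stands in for a hypothetical sponsor-specific field, the `USUBJID = STUDYID-SUBJID` construction is an assumed sponsor convention, and dates are assumed to arrive already as `YYYY-MM-DD`.

```python
# Illustrative subset only: CDASH-style collected fields with a direct SDTM VS target.
DIRECT = {"STUDYID": "STUDYID", "VSTESTCD": "VSTESTCD",
          "VSORRES": "VSORRES", "VSORRESU": "VSORRESU"}
HANDLED = {"SUBJID", "VSDAT", "VSTIM"}  # derived into USUBJID / VSDTC below


def cdash_to_sdtm_vs(record):
    """Map one CDASH-style vital-signs record to SDTM-style VS variables.

    Returns (mapped, unmapped): fields in `unmapped` need a manual mapping
    decision. Assumes VSDAT is already YYYY-MM-DD and VSTIM is HH:MM.
    """
    mapped = {"STUDYID": record["STUDYID"], "DOMAIN": "VS",
              # Assumed sponsor convention: USUBJID = STUDYID-SUBJID.
              "USUBJID": f"{record['STUDYID']}-{record['SUBJID']}"}
    unmapped = []
    for field, value in record.items():
        if field in DIRECT:
            mapped[DIRECT[field]] = value
        elif field not in HANDLED:
            unmapped.append(field)
    # SDTM --DTC variables use ISO 8601; join date and (optional) time.
    dtc = record.get("VSDAT", "")
    if record.get("VSTIM"):
        dtc += "T" + record["VSTIM"]
    mapped["VSDTC"] = dtc
    return mapped, unmapped


collected = {"STUDYID": "ABC-101", "SUBJID": "001", "VSTESTCD": "SYSBP",
             "VSORRES": "128", "VSORRESU": "mmHg", "VSDAT": "2015-03-02",
             "VSTIM": "09:30", "CUFF_SIZE": "Large"}
vs, todo = cdash_to_sdtm_vs(collected)
print(vs["USUBJID"], vs["VSDTC"])  # ABC-101-001 2015-03-02T09:30
print(todo)  # ['CUFF_SIZE']
```

Standard fields pass through a lookup table; only the leftover list needs a person to decide where it belongs (for example, a supplemental qualifier or a custom domain).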

Page 16: Post-lock Data Flow: From CRF to FDA

Broad Strokes of What Statisticians Do

Get data from data management

Map “raw” data to SDTM and generate documentation

Map SDTM data to ADaM and generate documentation

Make Tables, Listings and Figures

Transfer Tables, Listings and Figures to Medical Writing

Page 17: Post-lock Data Flow: From CRF to FDA

NDA/BLA Submission

• An integrated summary of safety (ISS) and of effectiveness (ISE) will be needed for nearly all NDAs and BLAs

• Many individual studies must be combined into an ISS/ISE database

• Integrated data must be summarized in ISS/ISE post-text tables

• This is distilled into the ISS/ISE text and sections 2.7.3 and 2.7.4 of the eCTD

Page 18: Post-lock Data Flow: From CRF to FDA

Ideal Dataflow Process

CDASH CRF data → SDTM → ADaM → TLFs → CSR

Study 2 ADS, Study 3 ADS, Study …n ADS → ISS/ISE ADS → ISS/ISE TLFs → ISS/ISE → NDA

All SDTM is created consistently; study analysis datasets are created with uniform structures; all information can be cleanly and sequentially linked back to the CRF data.

Page 19: Post-lock Data Flow: From CRF to FDA

More Typical State of Data

(Some) Phase III studies: CRF data → SDTM → Study ADS (ADaM) → TLFs → CSR

Phase I/II (III) studies: CRF data in legacy format → Study ADS → TLFs → CSR

Assorted judgment calls with documentation of varying quality

Page 20: Post-lock Data Flow: From CRF to FDA

Considerations for Legacy Conversions

• FDA places an extremely high value on traceability and reproducibility; this trumps any data standard
• SDTM conversion of legacy data is NOT required
• When converting legacy data to SDTM for submission (where the CSRs were generated from legacy data), FDA suggests additionally submitting the legacy data
• FDA has not clearly indicated that it uses SDTM data in any way for non-pivotal trials where the CSR relies on legacy data.

Page 21: Post-lock Data Flow: From CRF to FDA

Suggested Integration and Submission Approach

SDTM Study #1 → Study ADS (ADaM) → TLFs → CSR
Legacy Study #1 → Study ADS → TLFs → CSR
SDTM Study #2..n → Study ADS (ADaM) → TLFs → CSR
Legacy Study #2..n → Study ADS → TLFs → CSR

Map study ADS into uniform ADaM → ISS/ISE TLFs → ISS/ISE → NDA
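The "Map study ADS into uniform ADaM" step can be sketched in Python as a per-study rename/recode followed by stacking. The study IDs, the legacy variable names (`PARAM`, `VALUE`) and the parameter recodes (`SBP` → `SYSBP`, etc.) are all hypothetical. The check that every code lands in one agreed set is the uniformity condition that makes stacking safe.

```python
# Hypothetical per-study recodes: legacy variable names and parameter codes
# are mapped onto one uniform ADaM-style convention before stacking.
UNIFORM_PARAMS = {"SYSBP", "DIABP", "PULSE"}
STUDY_MAPS = {
    "LEG-101": {"rename": {"PARAM": "PARAMCD", "VALUE": "AVAL"},
                "recode": {"SBP": "SYSBP", "DBP": "DIABP", "HR": "PULSE"}},
    "STD-201": {"rename": {}, "recode": {}},  # already uniform ADaM
}


def harmonize(study_id, rows):
    """Rename variables and recode PARAMCD for one study; reject unknown codes."""
    spec = STUDY_MAPS[study_id]
    out = []
    for rec in rows:
        new = {spec["rename"].get(k, k): v for k, v in rec.items()}
        new["PARAMCD"] = spec["recode"].get(new["PARAMCD"], new["PARAMCD"])
        if new["PARAMCD"] not in UNIFORM_PARAMS:
            raise ValueError(f"{study_id}: unmapped parameter {new['PARAMCD']!r}")
        new["STUDYID"] = study_id
        out.append(new)
    return out


def stack(studies):
    """Pool harmonized records from all studies into one integrated dataset."""
    pooled = []
    for study_id, rows in studies.items():
        pooled.extend(harmonize(study_id, rows))
    return pooled


studies = {
    "LEG-101": [{"USUBJID": "101-001", "PARAM": "SBP", "VALUE": 131}],
    "STD-201": [{"USUBJID": "201-005", "PARAMCD": "SYSBP", "AVAL": 126}],
}
iss = stack(studies)
print(sorted({rec["PARAMCD"] for rec in iss}))  # ['SYSBP']
```

Keeping the recode tables as data rather than burying them in code also gives a reviewer a single place to trace each integrated value back to its study dataset.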