Use of Genomics in Clinical Trial Design and How to Critically Evaluate Claims for Prognostic &...

Use of Genomics in Clinical Trial Design and

How to Critically Evaluate Claims for Prognostic & Predictive Biomarkers

Richard Simon, D.Sc.
Chief, Biometric Research Branch

National Cancer Institute
http://brb.nci.nih.gov

BRB Website: brb.nci.nih.gov

• PowerPoint presentations
• Reprints
• BRB-ArrayTools software
• Data archive
• Q/A message board
• Web-based sample size planning

Clinical Trials

• Optimal 2-stage phase II designs
• Phase III designs using predictive biomarkers
• Phase II/III designs
• Development of gene expression based predictive classifiers

Different Kinds of Biomarkers

Endpoint
• Measured before, during and after treatment to monitor treatment effect
• Surrogate of clinical endpoint
• Pharmacodynamic

Predictive biomarkers
• Measured before treatment to identify who will benefit from a particular treatment

Prognostic biomarkers
• Measured before treatment to indicate long-term outcome for patients untreated or receiving standard treatment

Types of Validation for Prognostic and Predictive Biomarkers

Analytical validation
• Accuracy, reproducibility, robustness

Clinical validation
• Does the biomarker predict a clinical endpoint or phenotype?

Clinical utility
• Does use of the biomarker result in patient benefit by informing treatment decisions?
• Is it actionable?

Prognostic and Predictive Biomarkers in Oncology

• Single gene or protein measurement
• Scalar index or classifier that summarizes expression levels of multiple genes

Prognostic Factors in Oncology

• Many prognostic factors are not used because they are not actionable
• Most prognostic factor studies are not conducted with an intended use
  • They use a convenience sample of heterogeneous patients for whom tissue is available
• Retrospective studies of prognostic markers should be planned and analyzed with specific focus on the intended use of the marker
• Design of prospective studies depends on the context of use of the biomarker
  • Treatment options and practice guidelines
  • Other prognostic factors

Potential Uses of a Prognostic Biomarker

• Identify patients who have very good prognosis on standard treatment and do not require more intensive regimens
• Identify patients who have poor prognosis on standard chemotherapy who are good candidates for experimental regimens

Prospective Marker Strategy Design

Patients are randomized to either:
• have the marker measured and treatment determined based on the marker result and clinical features, or
• not have the marker measured and receive standard of care treatment based on clinical features alone

[Diagram: Randomize patients to Test or No Test. Test arm: Rx determined by test. No Test arm: Rx determined by SOC.]

Marker Strategy Design

• Inefficient
  • Many patients get the same treatment regardless of which arm they are randomized to
• Uninformative
  • Since patients in the standard of care arm do not have the marker measured, it is not possible to compare outcome for patients whose treatment is changed based on the marker result with outcome for the corresponding patients in the standard of care arm

[Diagram: Apply test to all eligible patients. Patients whose test-determined Rx is the same as SOC go off study. Patients whose test-determined Rx differs from SOC go to one of two arms: Use test-determined Rx, or Use SOC.]

Prospective Evaluation of OncotypeDx (TAILORx)

• For patients with predicted low risk of recurrence
  • Withhold chemotherapy and observe long-term recurrence rate
  • If recurrence rate is very low, potential chemotherapy benefit must be very small
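The reasoning behind the last point is simple arithmetic: chemotherapy can at most prevent the recurrences that would have occurred without it, so the absolute benefit is bounded by the recurrence rate observed when chemotherapy is withheld. A minimal sketch; the helper name and the numbers (3% recurrence without chemotherapy, 50% relative reduction) are hypothetical, chosen only for illustration.

```python
def absolute_benefit(recurrence_without_chemo: float, relative_reduction: float) -> float:
    """Absolute reduction in recurrence rate if chemotherapy prevents the given
    fraction of recurrences. Because relative_reduction is at most 1, the result
    can never exceed recurrence_without_chemo."""
    if not (0.0 <= recurrence_without_chemo <= 1.0 and 0.0 <= relative_reduction <= 1.0):
        raise ValueError("rates must lie in [0, 1]")
    return recurrence_without_chemo * relative_reduction


if __name__ == "__main__":
    # hypothetical low-risk group: 3% recurrence when chemotherapy is withheld
    print(absolute_benefit(0.03, 0.5))  # 0.015, i.e. 1.5 percentage points
    print(absolute_benefit(0.03, 1.0))  # 0.03: even a perfect regimen gains only 3 points
```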

Predictive Biomarkers

Prospective Co-Development of Drugs and Companion Diagnostics

1. Develop a completely specified genomic classifier of the patients likely to benefit from a new drug
2. Establish analytical validity of the classifier
3. Use the completely specified classifier in the primary analysis plan of a phase III trial of the new drug

Guiding Principle

• The data used to develop the classifier should be distinct from the data used to test hypotheses about treatment effect in subsets determined by the classifier
  • Developmental studies can be exploratory
  • Studies on which treatment effectiveness claims are to be based should not be exploratory

[Diagram: Using phase II data, develop predictor of response to new drug. Patients predicted responsive are randomized to New Drug vs. Control; patients predicted non-responsive go off study.]

Evaluating the Efficiency of Enrichment Design

Simon R and Maitournam A. Evaluating the efficiency of targeted designs for randomized clinical trials. Clinical Cancer Research 10:6759-63, 2004; Correction and supplement 12:3229, 2006

Maitournam A and Simon R. On the efficiency of targeted clinical trials. Statistics in Medicine 24:329-339, 2005.

Reprints and interactive sample size calculations at http://linus.nci.nih.gov

Relative efficiency of targeted design depends on
• proportion of patients test positive
• effectiveness of new drug (compared to control) for test negative patients

When less than half of patients are test positive and the drug has little or no benefit for test negative patients, the targeted design requires dramatically fewer randomized patients
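A rough way to see this dependence is sketched below. It is a simplification, not the exact formulas of the papers cited above: it assumes a continuous endpoint with the same variance in test-positive and test-negative patients, so that the number of randomized patients needed scales as 1/δ². With γ the proportion test positive and δ₊, δ₋ the treatment effects in test-positive and test-negative patients, an untargeted trial sees the diluted effect γδ₊ + (1 - γ)δ₋. The function names are hypothetical.

```python
def randomized_ratio(gamma: float, delta_pos: float, delta_neg: float) -> float:
    """Randomized patients needed by an untargeted design divided by those needed
    by a targeted design. ASSUMPTION: equal outcome variance in both marker
    subsets, so required sample size is proportional to 1 / effect**2."""
    diluted = gamma * delta_pos + (1 - gamma) * delta_neg  # average effect in all comers
    return (delta_pos / diluted) ** 2


def screened_vs_untargeted(gamma: float, delta_pos: float, delta_neg: float) -> float:
    """Patients the targeted design must screen (its randomized count / gamma),
    relative to the number an untargeted design randomizes."""
    return (1 / gamma) / randomized_ratio(gamma, delta_pos, delta_neg)


if __name__ == "__main__":
    # drug has no benefit for test-negative patients (delta_neg = 0)
    for gamma in (0.25, 0.5, 0.75):
        print(gamma, randomized_ratio(gamma, 1.0, 0.0))
```

With no benefit in test-negative patients the ratio reduces to 1/γ², so the loop prints 16.0 for γ = 0.25 and 4.0 for γ = 0.5; when the drug is equally effective in both subsets the ratio is 1 and targeting gains nothing.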

Stratification Design

[Diagram: Develop predictor of response to new Rx. Patients predicted responsive to new Rx and patients predicted non-responsive to new Rx are each randomized to New Rx vs. Control.]

Stratification Design

• Use the test to structure a prospectively specified primary analysis plan
• Having a prospective analysis plan is essential
• "Stratifying" (balancing) the randomization is useful to ensure that all randomized patients have tissue available but is not a substitute for a prospective analysis plan
• The purpose of the study is to evaluate the new treatment overall and for the pre-defined subsets, not to modify or refine the classifier
• The purpose is not to demonstrate that repeating the classifier development process on independent data results in the same classifier
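One way to make "prospectively specified" concrete is to fix, before unblinding, which comparisons will be tested and at what significance levels. The sketch below does this for an overall comparison and a classifier-positive subset comparison. The 0.04/0.01 split of a 0.05 significance level, the class name and the claim wording are hypothetical choices for illustration. By the Bonferroni inequality, such a split keeps the chance of any false claim at or below 0.05.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisPlan:
    """Primary analysis plan written down before the trial is unblinded.
    The alpha allocation below is a HYPOTHETICAL example, not a recommendation."""
    alpha_overall: float = 0.04   # new Rx vs control in all randomized patients
    alpha_positive: float = 0.01  # new Rx vs control in the pre-defined classifier-positive subset

    def __post_init__(self) -> None:
        # Bonferroni: the allocated significance levels must sum to at most 0.05
        if self.alpha_overall + self.alpha_positive > 0.05 + 1e-12:
            raise ValueError("allocated alpha exceeds 0.05")

    def claims(self, p_overall: float, p_positive: float) -> list[str]:
        """Claims of effectiveness that the plan permits for the observed p-values."""
        result = []
        if p_overall <= self.alpha_overall:
            result.append("effective overall")
        if p_positive <= self.alpha_positive:
            result.append("effective in classifier-positive patients")
        return result


if __name__ == "__main__":
    plan = AnalysisPlan()
    print(plan.claims(p_overall=0.12, p_positive=0.004))
    # ['effective in classifier-positive patients']
```

The plan object is frozen so that its significance levels cannot be changed after the data are seen. That mirrors the point that the study evaluates pre-defined subsets rather than refining the classifier.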

R Simon. Using genomics in clinical trial design, Clinical Cancer Research 14:5984-93, 2008

R Simon. Designs and adaptive analysis plans for pivotal clinical trials of therapeutics and companion diagnostics, Expert Opinion on Medical Diagnostics 2:721-29, 2008

Use of Archived Specimens in Evaluation of Prognostic and Predictive Biomarkers

Richard M. Simon, Soonmyung Paik and Daniel F. Hayes

Claims of medical utility for prognostic and predictive biomarkers based on analysis of archived tissues can be considered to have either a high or low level of evidence, depending on several key factors.

Studies using archived tissues, when conducted under ideal conditions and independently confirmed, can provide the highest level of evidence.

Traditional analyses of prognostic or predictive factors, which apply assays that have not been analytically validated to a convenience sample of tissues and are conducted in an exploratory and unfocused manner, provide a very low level of evidence for clinical utility.


For Level I Evidence:

(i) Archived tissue adequate for a successful assay must be available on a sufficiently large number of patients from a phase III trial, so that the appropriate analyses have adequate statistical power and the patients included in the evaluation are clearly representative of the patients in the trial.

(ii) The test should be analytically and pre-analytically validated for use with archived tissue.

(iii) The analysis plan for the biomarker evaluation should be completely specified in writing prior to the performance of the biomarker assays on archived tissue and should be focused on evaluation of a single completely defined classifier.

(iv) The results from archived specimens should be validated using specimens from a similar, but separate, study.

Publications Reviewed

• Original study on human cancer patients relating gene expression to clinical outcome
  • Survival or disease-free survival
  • Response to treatment
• Published in English before December 31, 2004
• Analyzed gene expression of more than 1000 probes
• 90 publications identified that met criteria
  • Abstracted information for all 90
• Performed detailed review of statistical analysis for the 42 papers published in 2004

Major Flaws Found in 40 Studies Published in 2004

• Inadequate control of multiple comparisons in gene finding
  • 9/23 studies had unclear or inadequate methods to deal with false positives
  • 10,000 genes x 0.05 significance level = 500 false positives expected if no gene is truly differentially expressed
• Misleading report of prediction accuracy
  • 12/28 reports based on incomplete cross-validation
• Misleading use of cluster analysis
  • 13/28 studies invalidly claimed that expression clusters based on differentially expressed genes could help distinguish clinical outcomes
• 50% of studies contained one or more major flaws

Control for Multiple Testing

• If each gene is tested for significance at level α and there are n genes, then the expected number of false discoveries is n·α when no gene is truly differentially expressed (and at most n·α otherwise)
  • e.g. if n = 10,000 and α = 0.001, then 10 false "discoveries"
• Control the FDR (false discovery rate)
  • g = number of genes reported as having expression significantly correlated with a phenotype
  • FDR = expected proportion of false positives among the g reported genes (number of false positives / g)
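The arithmetic above is easy to check by simulation. For FDR control, the sketch below uses the Benjamini-Hochberg step-up procedure, one standard FDR-controlling method. The slide does not say which procedure is intended. The all-null p-values, the q = 0.10 level and the function name are assumptions for illustration.

```python
import numpy as np


def benjamini_hochberg(pvals, q=0.10):
    """Boolean mask of genes declared significant, controlling the false
    discovery rate at level q (Benjamini-Hochberg step-up procedure)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # the k-th smallest p-value is compared with q * k / n
    below = p[order] <= q * np.arange(1, n + 1) / n
    significant = np.zeros(n, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()       # largest k meeting the bound
        significant[order[: k + 1]] = True   # reject all hypotheses up to that rank
    return significant


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    null_p = rng.uniform(size=10_000)  # no gene truly differentially expressed
    print("p < 0.05: ", int(np.sum(null_p < 0.05)))   # expected n * 0.05 = 500
    print("p < 0.001:", int(np.sum(null_p < 0.001)))  # expected n * 0.001 = 10
    print("BH discoveries:", int(benjamini_hochberg(null_p).sum()))
```

When every gene is null, any BH discovery is false. Under that global null the procedure reports one or more genes only with probability q, so the last count is usually 0, whereas per-gene testing at 0.05 yields hundreds of false positives.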





Evaluating a Classifier

• Fit of a model to the same data used to develop it is no evidence of prediction accuracy for independent data
• Goodness of fit vs. prediction accuracy

Simulation   Training     Validation
1            p=7.0e-05    p=0.70
2            p=4.2e-07    p=0.54
3            p=2.4e-13    p=0.60
4            p=1.3e-10    p=0.89
5            p=1.8e-13    p=0.36
6            p=5.5e-11    p=0.81
7            p=3.2e-09    p=0.46
8            p=1.8e-07    p=0.61
9            p=1.1e-07    p=0.49
10           p=4.3e-09    p=0.09

Split-Sample Evaluation

• Training-set
  • Used to select features, select model type, determine parameters and cut-off thresholds
• Test-set
  • Withheld until a single model is fully specified using the training-set.
  • Fully specified model is applied to the expression profiles in the test-set to predict class labels.
  • Number of errors is counted
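Here is a minimal sketch of split-sample evaluation. It assumes expression data in a NumPy array (samples x genes) and binary class labels 0/1. The classifier (nearest class centroid on the k genes with the largest absolute t statistics), the 2/3 training fraction and all function names are hypothetical choices for illustration. Everything data-driven happens inside `develop_model`, which sees only the training set. On pure-noise data, the fit to the training set looks good while the test-set error rate stays near 50%, which is the goodness-of-fit vs. prediction accuracy contrast.

```python
import numpy as np


def t_stats(x, y):
    """Pooled-variance two-sample t statistic for each gene (column of x)."""
    a, b = x[y == 0], x[y == 1]
    na, nb = len(a), len(b)
    pooled = ((na - 1) * a.var(axis=0, ddof=1) + (nb - 1) * b.var(axis=0, ddof=1)) / (na + nb - 2)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(pooled * (1 / na + 1 / nb))


def develop_model(x, y, k=10):
    """All data-driven choices: gene selection and class centroids."""
    genes = np.argsort(-np.abs(t_stats(x, y)))[:k]
    centroids = np.stack([x[y == c][:, genes].mean(axis=0) for c in (0, 1)])
    return genes, centroids


def predict(model, x):
    """Assign each profile to the nearest class centroid on the selected genes."""
    genes, centroids = model
    dist = ((x[:, genes][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def split_sample(x, y, train_frac=2 / 3, seed=0):
    """Return (training-set errors, test-set errors, test-set size)."""
    rng = np.random.default_rng(seed)
    # stratified split so both classes are represented in the training set
    train = np.concatenate([
        rng.permutation(np.flatnonzero(y == c))[: int(round(train_frac * np.sum(y == c)))]
        for c in (0, 1)
    ])
    test = np.setdiff1d(np.arange(len(y)), train)
    model = develop_model(x[train], y[train])  # test set is not touched until here
    train_err = int(np.sum(predict(model, x[train]) != y[train]))
    test_err = int(np.sum(predict(model, x[test]) != y[test]))
    return train_err, test_err, len(test)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x = rng.standard_normal((60, 2000))  # pure noise: no real class signal
    y = np.repeat([0, 1], 30)
    # on noise, expect few training errors but a test error rate near 50%
    print(split_sample(x, y))
```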

Leave-one-out Cross Validation

• Leave-one-out cross-validation simulates the process of separately developing a model on one set of data and predicting for a test set of data not used in developing the model

• Omit sample 1
  • Develop multivariate classifier from scratch on training set with sample 1 omitted
  • Predict class for sample 1 and record whether prediction is correct
• Repeat analysis for training sets with each single sample omitted one at a time
• e = number of misclassifications determined by cross-validation
• Subdivide e for estimation of sensitivity and specificity

• Cross-validation is only valid if the test set is not used in any way in the development of the model. Using the complete set of samples to select genes violates this assumption and invalidates cross-validation.
• With proper cross-validation, the model must be developed from scratch for each leave-one-out training set. This means that feature selection must be repeated for each leave-one-out training set.
• The cross-validated estimate of misclassification error is an estimate of the prediction error for the model fit using the specified algorithm to the full dataset
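The steps above can be sketched as follows. The model-building rule is deliberately simple and hypothetical: pick the single gene with the largest absolute difference in class means and threshold halfway between the means. The point is that `develop` is called afresh on each leave-one-out training set, so gene selection never sees the omitted sample. Treating class 1 as "positive" when subdividing e into sensitivity and specificity is an assumption of this sketch.

```python
import numpy as np


def develop(x, y):
    """Hypothetical single-gene rule, rebuilt from scratch on each training set."""
    m0, m1 = x[y == 0].mean(axis=0), x[y == 1].mean(axis=0)
    g = int(np.argmax(np.abs(m1 - m0)))               # feature selection
    return g, (m0[g] + m1[g]) / 2, bool(m1[g] > m0[g])


def predict(model, x):
    g, threshold, class1_higher = model
    return ((x[:, g] > threshold) == class1_higher).astype(int)


def loocv(x, y):
    """Leave-one-out CV: returns (e, sensitivity, specificity), class 1 = positive."""
    n = len(y)
    pred = np.empty(n, dtype=int)
    for i in range(n):
        keep = np.arange(n) != i
        model = develop(x[keep], y[keep])  # gene selection repeated without sample i
        pred[i] = predict(model, x[i:i + 1])[0]
    e = int(np.sum(pred != y))
    sensitivity = float(np.mean(pred[y == 1] == 1))  # e split by true class
    specificity = float(np.mean(pred[y == 0] == 0))
    return e, sensitivity, specificity


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    y = np.repeat([0, 1], 10)
    x = rng.standard_normal((20, 500))
    x[y == 1, 0] += 3.0  # one genuinely informative gene (simulated)
    print(loocv(x, y))
```

Selecting the gene once from all 20 samples and then looping only over the threshold would be the invalid, incomplete cross-validation described above.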

Prediction on Simulated Null Data

Generation of Gene Expression Profiles

• 14 specimens ($P_i$ is the expression profile for specimen $i$)

• Log-ratio measurements on 6000 genes

• $P_i \sim \mathrm{MVN}(0, I_{6000})$

• Can we distinguish between the first 7 specimens (Class 1) and the last 7 (Class 2)?

Prediction Method

• Compound covariate prediction

• Compound covariate built from the log-ratios of the 10 most differentially expressed genes.

[Figure: Proportion of simulated data sets (0.00 to 1.00) vs. number of misclassifications (0 to 20), for three methods: Cross-validation: none (resubstitution method); Cross-validation: after gene selection; Cross-validation: prior to gene selection.]
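The null-data simulation can be sketched in NumPy. This is a reconstruction under stated assumptions, not the original simulation code. Classes are coded 0 and 1 rather than Class 1 and Class 2. The compound covariate is assumed to weight each selected log-ratio by its two-sample t statistic, with the cut-off halfway between the two class means of the covariate. Only 20 null data sets are generated, to keep the run fast. The three method names correspond to the three methods compared on this slide.

```python
import numpy as np

METHODS = ("resubstitution", "cv_after_selection", "cv_prior_to_selection")


def t_stats(x, y):
    """Pooled-variance two-sample t statistic (class 0 minus class 1) per gene."""
    a, b = x[y == 0], x[y == 1]
    na, nb = len(a), len(b)
    pooled = ((na - 1) * a.var(axis=0, ddof=1) + (nb - 1) * b.var(axis=0, ddof=1)) / (na + nb - 2)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(pooled * (1 / na + 1 / nb))


def top_genes(x, y, k=10):
    """Indices of the k most differentially expressed genes (largest |t|)."""
    return np.argsort(-np.abs(t_stats(x, y)))[:k]


def fit_ccp(x, y, genes):
    """Compound covariate: t-weighted sum of the selected log-ratios,
    with the cut-off at the midpoint of the two class means."""
    w = t_stats(x[:, genes], y)
    c = x[:, genes] @ w
    return w, (c[y == 0].mean() + c[y == 1].mean()) / 2


def predict_ccp(x, genes, w, thr):
    # weights follow (mean0 - mean1), so class 0 has the larger compound covariate
    return np.where(x[:, genes] @ w > thr, 0, 1)


def count_errors(x, y, method, k=10):
    """Misclassifications under one of the three evaluation methods."""
    if method not in METHODS:
        raise ValueError(method)
    if method == "resubstitution":  # no cross-validation at all
        genes = top_genes(x, y, k)
        w, thr = fit_ccp(x, y, genes)
        return int(np.sum(predict_ccp(x, genes, w, thr) != y))
    # incomplete CV: genes chosen once from ALL samples; only weights are re-fit
    fixed_genes = top_genes(x, y, k) if method == "cv_after_selection" else None
    errors = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        # proper CV: gene selection repeated without sample i
        genes = fixed_genes if fixed_genes is not None else top_genes(x[keep], y[keep], k)
        w, thr = fit_ccp(x[keep], y[keep], genes)
        errors += int(predict_ccp(x[i:i + 1], genes, w, thr)[0] != y[i])
    return errors


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.repeat([0, 1], 7)  # first 7 specimens vs last 7
    results = {m: [] for m in METHODS}
    for _ in range(20):  # 20 null data sets, kept small for speed
        x = rng.standard_normal((14, 6000))  # P_i ~ MVN(0, I_6000)
        for m in METHODS:
            results[m].append(count_errors(x, y, m))
    for m in METHODS:
        print(m, np.mean(results[m]))
```

On these null data the first two methods typically report few or no errors, while the properly cross-validated estimate is close to the chance level of about 7 of 14.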





Cluster Analysis is Subjective

• Cluster algorithms always produce clusters
• Different distance metrics and clustering algorithms may find different structure using the same data.
• Supervised clustering is misleading
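The first point is easy to demonstrate. The sketch below runs k-means on pure noise. k-means is one common clustering algorithm, chosen here only for illustration, and the `kmeans` helper is a hypothetical plain implementation, not a library call. The algorithm hands back a partition into however many groups are requested (barring the rare empty cluster), although the data contain no structure at all.

```python
import numpy as np


def kmeans(x, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns a cluster label for each row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]  # start at k random rows
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)  # assign each array to its nearest center
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # converged
            break
        centers = new_centers
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    noise = rng.standard_normal((50, 500))  # 50 "arrays", 500 genes, no structure
    for k in (2, 3, 4):
        # sizes of the k "clusters" found in pure noise
        print(k, np.bincount(kmeans(noise, k), minlength=k))
```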

Good Microarray Studies Have Clear Objectives

• Class Comparison (Gene Finding)
  • Find genes whose expression differs among predetermined classes, e.g. tissue or experimental condition
• Class Prediction
  • Prediction of predetermined class (e.g. treatment outcome) using information from gene expression profile
• Class Discovery
  • Discover clusters of specimens having similar expression profiles

Class Comparison and Class Prediction

• Not clustering problems
  • Global similarity measures generally used for clustering arrays may not distinguish classes
  • Clustering does not control multiplicity or distinguish data used for classifier development from data used for classifier evaluation
• Supervised methods

Acknowledgements

• NCI Biometric Research Branch: Alain Dupuy, Boris Freidlin, Wenyu Jiang, Aboubakar Maitournam, Yingdong Zhao
• Soonmyung Paik, NSABP
• Daniel Hayes, U. Michigan
