Use of Genomics in Clinical Trial Design and How to Critically Evaluate Claims for Prognostic & Predictive Biomarkers
Richard Simon, D.Sc.
Chief, Biometric Research Branch
National Cancer Institute
http://brb.nci.nih.gov
BRB Website: brb.nci.nih.gov
• PowerPoint presentations
• Reprints
• BRB-ArrayTools software
• Data archive
• Q/A message board
• Web-based sample size planning
  • Clinical trials
    • Optimal 2-stage phase II designs
    • Phase III designs using predictive biomarkers
    • Phase II/III designs
  • Development of gene expression based predictive classifiers
Different Kinds of Biomarkers
• Endpoint
  • Measured before, during and after treatment to monitor treatment effect
  • Surrogate of clinical endpoint
  • Pharmacodynamic
• Predictive biomarkers
  • Measured before treatment to identify who will benefit from a particular treatment
• Prognostic biomarkers
  • Measured before treatment to indicate long-term outcome for patients untreated or receiving standard treatment
Types of Validation for Prognostic and Predictive Biomarkers
• Analytical validation
  • Accuracy, reproducibility, robustness
• Clinical validation
  • Does the biomarker predict a clinical endpoint or phenotype?
• Clinical utility
  • Does use of the biomarker result in patient benefit by informing treatment decisions?
  • Is it actionable?
Prognostic and Predictive Biomarkers in Oncology
• Single gene or protein measurement
• Scalar index or classifier that summarizes expression levels of multiple genes
Prognostic Factors in Oncology
• Many prognostic factors are not used because they are not actionable
• Most prognostic factor studies are not conducted with an intended use
  • They use a convenience sample of heterogeneous patients for whom tissue is available
• Retrospective studies of prognostic markers should be planned and analyzed with specific focus on the intended use of the marker
• Design of prospective studies depends on the context of use of the biomarker
  • Treatment options and practice guidelines
  • Other prognostic factors
Potential Uses of a Prognostic Biomarker
• Identify patients who have very good prognosis on standard treatment and do not require more intensive regimens
• Identify patients who have poor prognosis on standard chemotherapy who are good candidates for experimental regimens
Prospective Marker Strategy Design
• Patients are randomized to either
  • have the marker measured and treatment determined based on the marker result and clinical features, or
  • not have the marker measured and receive standard of care treatment based on clinical features alone
[Diagram: Randomize patients to Test or No Test. Test arm: Rx determined by test. No Test arm: Rx determined by SOC.]
Marker Strategy Design
• Inefficient
  • Many patients get the same treatment regardless of which arm they are randomized to
• Uninformative
  • Since patients in the standard of care arm do not have the marker measured, it is not possible to compare outcomes between arms for the patients whose treatment is changed based on the marker result
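A rough calculation shows how large the inefficiency can be. It rests on simplifying assumptions that are not on the slide: a continuous outcome compared with a two-sided z-test, equal variance in both arms, and identical expected outcomes in both arms for patients whose treatment is the same either way. If the marker changes treatment for only a fraction f of patients, with effect delta in that group, the expected between-arm difference shrinks to f × delta. Because required sample size scales with 1/(effect)², the design needs about 1/f² times as many patients as a trial that randomizes only the patients whose marker-determined treatment differs from standard of care. The effect size, alpha and power below are hypothetical.

```python
from statistics import NormalDist


def n_per_arm(effect: float, sd: float = 1.0, alpha: float = 0.05, power: float = 0.9) -> float:
    """Approximate patients per arm for a two-sided z-test of a difference in means."""
    z = NormalDist().inv_cdf
    return 2 * ((z(1 - alpha / 2) + z(power)) * sd / effect) ** 2


def marker_strategy_inflation(frac_changed: float, delta: float = 0.5) -> float:
    """How many times more patients the marker strategy design needs than a trial that
    randomizes only patients whose marker-determined treatment differs from SOC.
    Assumes patients with unchanged treatment contribute no expected difference."""
    diluted_effect = frac_changed * delta  # expected difference between the two arms
    return n_per_arm(diluted_effect) / n_per_arm(delta)


for f in (0.5, 0.25, 0.1):
    print(f"treatment changed for {f:.0%} of patients: "
          f"{marker_strategy_inflation(f):.0f} times as many patients")
```

Under this approximation the ratio does not depend on delta, the standard deviation, alpha or power; it reduces to 1/f².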
[Diagram: Apply test to all eligible patients. Test-determined Rx same as SOC: off study. Test-determined Rx different from SOC: use test-determined Rx vs. use SOC.]
Prospective Evaluation of OncotypeDx (TAILORx)
• For patients with predicted low risk of recurrence
  • Withhold chemotherapy and observe the long-term recurrence rate
  • If the recurrence rate is very low, the potential chemotherapy benefit must be very small
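The reasoning in the last bullet can be made explicit with simple arithmetic. Assume, as a simplification not stated on the slide, that chemotherapy reduces the risk of recurrence by a constant relative amount (the relative risk reduction, RRR). In a group whose recurrence rate without chemotherapy is r, the absolute benefit is then r × RRR, which can never exceed r. The 5% recurrence rate and the RRR values below are hypothetical, not TAILORx results.

```python
def absolute_chemo_benefit(recurrence_without_chemo: float, relative_risk_reduction: float) -> float:
    """Absolute reduction in recurrence rate, assuming a constant relative effect of chemotherapy."""
    if not (0.0 <= recurrence_without_chemo <= 1.0 and 0.0 <= relative_risk_reduction <= 1.0):
        raise ValueError("rates must lie in [0, 1]")
    return recurrence_without_chemo * relative_risk_reduction


# Hypothetical low-risk group: 5% recurrence without chemotherapy.
for rrr in (0.2, 0.4, 1.0):
    print(f"RRR = {rrr:.0%}: absolute benefit = {absolute_chemo_benefit(0.05, rrr):.1%}")
```

Even a treatment that prevented every recurrence (RRR = 100%) could lower the rate by at most r, which is why a very low observed recurrence rate bounds the possible benefit.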
Predictive BiomarkersPredictive Biomarkers
Prospective Co-Development of Drugs and Companion Diagnostics
1. Develop a completely specified genomic classifier of the patients likely to benefit from a new drug
2. Establish analytical validity of the classifier
3. Use the completely specified classifier in the primary analysis plan of a phase III trial of the new drug
Guiding Principle
• The data used to develop the classifier should be distinct from the data used to test hypotheses about treatment effect in subsets determined by the classifier
  • Developmental studies can be exploratory
  • Studies on which treatment effectiveness claims are to be based should not be exploratory
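One way to make "completely specified" concrete is to freeze every element of the classifier (genes, weights and cut-off) before any phase III data are examined, then apply it unchanged to every trial patient. The sketch below is hypothetical: the gene names, the linear-score form, the weights and the threshold are illustrative assumptions, not a classifier from the talk.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class LockedClassifier:
    """Completely specified classifier: nothing can be changed after it is created.
    Gene names, weights and threshold are hypothetical."""
    genes: tuple[str, ...]
    weights: tuple[float, ...]
    threshold: float

    def score(self, expression: Mapping[str, float]) -> float:
        # Linear score over the pre-selected genes (an assumed form).
        return sum(w * expression[g] for g, w in zip(self.genes, self.weights))

    def predicts_benefit(self, expression: Mapping[str, float]) -> bool:
        return self.score(expression) > self.threshold


def classify_trial(classifier: LockedClassifier,
                   patients: Sequence[Mapping[str, float]]) -> list[bool]:
    """Apply the locked classifier, unchanged, to every phase III patient."""
    return [classifier.predicts_benefit(p) for p in patients]


# Specified using developmental (e.g. phase II) data only; values are hypothetical.
locked = LockedClassifier(genes=("GENE_A", "GENE_B"), weights=(1.2, -0.8), threshold=0.5)
print(classify_trial(locked, [{"GENE_A": 1.0, "GENE_B": 0.2},
                              {"GENE_A": 0.1, "GENE_B": 0.9}]))  # → [True, False]
```

Making the dataclass frozen means any attempt to adjust the threshold after seeing trial data raises an error, which mirrors the principle that the phase III trial tests the classifier rather than refines it.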
[Diagram: Using phase II data, develop predictor of response to new drug. Patient predicted responsive: New Drug vs. Control. Patient predicted non-responsive: off study.]
Evaluating the Efficiency of Enrichment Design
• Simon R and Maitournam A. Evaluating the efficiency of targeted designs for randomized clinical trials. Clinical Cancer Research 10:6759-63, 2004; correction and supplement 12:3229, 2006.
• Maitournam A and Simon R. On the efficiency of targeted clinical trials. Statistics in Medicine 24:329-339, 2005.
• Reprints and interactive sample size calculations at http://linus.nci.nih.gov
• Relative efficiency of targeted design depends on
  • proportion of patients test positive
  • effectiveness of new drug (compared to control) for test negative patients
• When less than half of patients are test positive and the drug has little or no benefit for test negative patients, the targeted design requires dramatically fewer randomized patients
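A simplified version of this comparison can be written down directly. It assumes equal outcome variance in test-positive and test-negative patients and required sample size proportional to 1/(treatment effect)²; it is a sketch of the idea, not the exact formulas of the cited papers. With a proportion γ test positive and treatment effects δ+ and δ− in test-positive and test-negative patients, an untargeted trial must detect the average effect γδ+ + (1 − γ)δ−, while a targeted trial detects δ+. The effect sizes below are hypothetical.

```python
def randomized_patients_ratio(prop_positive: float, effect_pos: float, effect_neg: float) -> float:
    """Randomized patients needed by an untargeted design divided by the number needed by a
    targeted design. Simplified: equal outcome variance in both subsets and required sample
    size proportional to 1 / (treatment effect)**2."""
    average_effect = prop_positive * effect_pos + (1 - prop_positive) * effect_neg
    if average_effect <= 0:
        raise ValueError("untargeted design has no average treatment effect to detect")
    return (effect_pos / average_effect) ** 2


for gamma in (0.25, 0.5, 0.75):
    no_benefit = randomized_patients_ratio(gamma, effect_pos=1.0, effect_neg=0.0)
    half_benefit = randomized_patients_ratio(gamma, effect_pos=1.0, effect_neg=0.5)
    print(f"{gamma:.0%} test positive: {no_benefit:.1f}x if no benefit in test negatives, "
          f"{half_benefit:.1f}x if half the benefit")
```

With 25% of patients test positive and no benefit in test-negative patients, the ratio is (1/0.25)² = 16. The ratio counts randomized patients only, as on the slide; the targeted design still has to screen about 1/γ patients for each patient it randomizes.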
Stratification Design
[Diagram: Develop predictor of response to new Rx. Predicted responsive to new Rx: New Rx vs. Control. Predicted non-responsive to new Rx: New Rx vs. Control.]
• Use the test to structure a prospectively specified primary analysis plan
  • Having a prospective analysis plan is essential
  • "Stratifying" (balancing) the randomization is useful to ensure that all randomized patients have tissue available, but it is not a substitute for a prospective analysis plan
• The purpose of the study is to evaluate the new treatment overall and for the pre-defined subsets, not to modify or refine the classifier
• The purpose is not to demonstrate that repeating the classifier development process on independent data results in the same classifier
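To illustrate what a prospective analysis plan for the stratification design might look like, the sketch below fixes, before unblinding, which comparisons will be made and at what significance level. The split of a 0.05 significance level into 0.04 for the overall comparison and 0.01 for the classifier-positive subset is a hypothetical choice; by the Bonferroni inequality it keeps the chance of any false claim of effectiveness at or below 0.05 when the new treatment has no effect in either subset.

```python
def prespecified_analysis(p_overall: float, p_positive: float,
                          alpha_overall: float = 0.04, alpha_positive: float = 0.01) -> str:
    """Hypothetical analysis plan written before unblinding.

    p_overall:  p-value comparing new treatment with control in all randomized patients
    p_positive: p-value for the same comparison in classifier-positive patients
    """
    if p_overall <= alpha_overall:  # step 1: overall effect
        return "effective overall"
    if p_positive <= alpha_positive:  # step 2: only if step 1 fails
        return "effective in classifier-positive patients"
    return "no claim of effectiveness"


print(prespecified_analysis(p_overall=0.12, p_positive=0.004))  # → effective in classifier-positive patients
```

What matters is not the particular numbers but that the subsets, the order of the tests and the significance levels are all written down before the trial data are analyzed.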
• Simon R. Using genomics in clinical trial design. Clinical Cancer Research 14:5984-93, 2008.
• Simon R. Designs and adaptive analysis plans for pivotal clinical trials of therapeutics and companion diagnostics. Expert Opinion on Medical Diagnostics 2:721-29, 2008.
Use of Archived Specimens in Evaluation of Prognostic and Predictive Biomarkers
Richard M. Simon, Soonmyung Paik and Daniel F. Hayes
Claims of medical utility for prognostic and predictive biomarkers based on analysis of archived tissues can be considered to have either a high or low level of evidence depending on several key factors.
Studies using archived tissues, when conducted under ideal conditions and independently confirmed, can provide the highest level of evidence.
Traditional analyses of prognostic or predictive factors, using non-analytically validated assays on a convenience sample of tissues and conducted in an exploratory and unfocused manner, provide a very low level of evidence for clinical utility.
For Level I Evidence:
(i) Archived tissue adequate for a successful assay must be available on a sufficiently large number of patients from a phase III trial that the appropriate analyses have adequate statistical power and that the patients included in the evaluation are clearly representative of the patients in the trial.
(ii) The test should be analytically and pre-analytically validated for use with archived tissue.
(iii) The analysis plan for the biomarker evaluation should be completely specified in writing prior to the performance of the biomarker assays on archived tissue and should be focused on evaluation of a single completely defined classifier.
(iv) The results from archived specimens should be validated using specimens from a similar, but separate, study.
Publications Reviewed
• Original study on human cancer patients relating gene expression to clinical outcome
  • Survival or disease-free survival
  • Response to treatment
• Published in English before December 31, 2004
• Analyzed gene expression of more than 1000 probes
• 90 publications identified that met criteria
  • Abstracted information for all 90
• Performed detailed review of statistical analysis for the 42 papers published in 2004
Major Flaws Found in 40 Studies Published in 2004
• Inadequate control of multiple comparisons in gene finding
  • 9/23 studies had unclear or inadequate methods to deal with false positives
  • 10,000 genes × 0.05 significance level = 500 false positives
• Misleading report of prediction accuracy
  • 12/28 reports based on incomplete cross-validation
• Misleading use of cluster analysis
  • 13/28 studies invalidly claimed that expression clusters based on differentially expressed genes could help distinguish clinical outcomes
• 50% of studies contained one or more major flaws
Control for Multiple Testing
• If each gene is tested for significance at level $\alpha$ and there are $n$ genes, then the expected number of false discoveries is up to $n\alpha$ (exactly $n\alpha$ if no gene is truly related to the phenotype)
  • e.g. if $n = 10{,}000$ and $\alpha = 0.001$, then 10 false "discoveries"
• Control the FDR (false discovery rate)
  • $g$ = number of genes reported as having expression significantly correlated with a phenotype
  • FDR = number of false positives / $g$ (formally, the expected value of this proportion)
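The arithmetic on this slide, and one standard way to control the FDR, can be sketched as follows. The slide does not name a particular FDR procedure; the Benjamini-Hochberg step-up procedure used here is a common choice swapped in for illustration, and its guarantee assumes independent (or positively dependent) tests. The p-values are made up.

```python
def expected_false_discoveries(n_genes: int, alpha: float) -> float:
    """Expected number of false positives when all n genes are truly unrelated to the phenotype."""
    return n_genes * alpha


def benjamini_hochberg(p_values: list[float], fdr: float = 0.05) -> list[int]:
    """Indices of genes declared significant by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # genes ranked by p-value
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * fdr / m:  # keep the LARGEST rank that passes its threshold
            cutoff = rank
    return sorted(order[:cutoff])


print(expected_false_discoveries(10_000, 0.05))       # → 500.0
print(expected_false_discoveries(10_000, 0.001))      # → 10.0
print(benjamini_hochberg([0.001, 0.03, 0.029, 0.9]))  # → [0, 1, 2]
```

The step-up rule explains why gene 2 (p = 0.029) is retained even though it fails its own threshold of 2 × 0.05 / 4 = 0.025: the p-value ranked third (0.03) passes its threshold of 0.0375, so every gene ranked at or above it is reported.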
Evaluating a Classifier
• Fit of a model to the same data used to develop it is no evidence of prediction accuracy for independent data
• Goodness of fit vs prediction accuracy
Simulation   Training     Validation
1            p=7.0e-05    p=0.70
2            p=4.2e-07    p=0.54
3            p=2.4e-13    p=0.60
4            p=1.3e-10    p=0.89
5            p=1.8e-13    p=0.36
6            p=5.5e-11    p=0.81
7            p=3.2e-09    p=0.46
8            p=1.8e-07    p=0.61
9            p=1.1e-07    p=0.49
10           p=4.3e-09    p=0.09
Split-Sample Evaluation
• Training-set
  • Used to select features, select model type, determine parameters and cut-off thresholds
• Test-set
  • Withheld until a single model is fully specified using the training-set
  • Fully specified model is applied to the expression profiles in the test-set to predict class labels
  • Number of errors is counted
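A minimal sketch of split-sample evaluation, assuming a two-class problem and a simple hypothetical classifier: keep the 10 genes with the largest absolute two-sample t-statistic, then assign each profile to the nearest class centroid. The synthetic data, gene counts, seed and 40/20 split are illustrative assumptions. The point is structural: every choice is made from the training set, and the test set is touched only once, to count errors.

```python
import numpy as np


def top_genes_by_t(X: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k genes with largest |two-sample t| between classes 0 and 1."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.argsort(-np.abs((a.mean(axis=0) - b.mean(axis=0)) / se))[:k]


def develop_classifier(X_train: np.ndarray, y_train: np.ndarray, k: int = 10):
    """Fully specify the model (selected genes + class centroids) from training data only."""
    genes = top_genes_by_t(X_train, y_train, k)
    centroids = np.stack([X_train[y_train == c][:, genes].mean(axis=0) for c in (0, 1)])
    return genes, centroids


def predict(model, X: np.ndarray) -> np.ndarray:
    """Assign each profile to the nearest class centroid on the selected genes."""
    genes, centroids = model
    dist = ((X[:, genes][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
X = rng.standard_normal((60, 2000))
X[y == 1, :20] += 1.0  # hypothetical signal in 20 genes
perm = rng.permutation(60)
train, test = perm[:40], perm[40:]
model = develop_classifier(X[train], y[train])  # the test set is not used here
errors = int(np.sum(predict(model, X[test]) != y[test]))
print(f"{errors} errors among {len(test)} test-set samples")
```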
Leave-one-out Cross Validation
Leave-one-out cross-validation simulates the process of separately developing a model on one set of data and predicting for a test set of data not used in developing the model.
• Omit sample 1
  • Develop multivariate classifier from scratch on training set with sample 1 omitted
  • Predict class for sample 1 and record whether prediction is correct
• Repeat analysis for training sets with each single sample omitted one at a time
• e = number of misclassifications determined by cross-validation
• Subdivide e for estimation of sensitivity and specificity
Cross-validation is only valid if the test set is not used in any way in the development of the model. Using the complete set of samples to select genes violates this assumption and invalidates cross-validation.
With proper cross-validation, the model must be developed from scratch for each leave-one-out training set. This means that feature selection must be repeated for each leave-one-out training set.
The cross-validated estimate of misclassification error is an estimate of the prediction error for the model fit using the specified algorithm to the full dataset.
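The leave-one-out procedure described above, with the whole classifier (gene selection included) rebuilt from scratch for every omitted sample, might be sketched like this. The nearest-centroid classifier, k = 10 genes, treating class 1 as "positive" for sensitivity and specificity, and the synthetic data are all assumptions for illustration.

```python
import numpy as np


def top_genes_by_t(X: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k genes with largest |two-sample t| between classes 0 and 1."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.argsort(-np.abs((a.mean(axis=0) - b.mean(axis=0)) / se))[:k]


def develop_and_predict(X_train, y_train, x_new, k=10) -> int:
    """Build the whole classifier from the training samples only, then classify x_new."""
    genes = top_genes_by_t(X_train, y_train, k)  # feature selection INSIDE the loop
    centroids = [X_train[y_train == c][:, genes].mean(axis=0) for c in (0, 1)]
    return int(np.argmin([np.sum((x_new[genes] - m) ** 2) for m in centroids]))


def loocv(X: np.ndarray, y: np.ndarray, k: int = 10):
    """Return (e, sensitivity, specificity), treating class 1 as 'positive'."""
    n = len(y)
    pred = np.array([develop_and_predict(X[np.arange(n) != i], y[np.arange(n) != i], X[i], k)
                     for i in range(n)])  # sample i never influences its own prediction
    e = int(np.sum(pred != y))
    sensitivity = float(np.mean(pred[y == 1] == 1))
    specificity = float(np.mean(pred[y == 0] == 0))
    return e, sensitivity, specificity


rng = np.random.default_rng(0)
y = np.repeat([0, 1], 15)
X = rng.standard_normal((30, 1000))
X[y == 1, :10] += 1.5  # hypothetical signal in 10 genes
e, sens, spec = loocv(X, y)
print(f"e = {e}, sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Sensitivity and specificity are simply e split by true class: the errors among class 1 samples and among class 0 samples.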
Prediction on Simulated Null Data
Generation of Gene Expression Profiles
• 14 specimens ($P_i$ is the expression profile for specimen $i$)
• Log-ratio measurements on 6000 genes
• $P_i \sim \mathrm{MVN}(0, I_{6000})$
• Can we distinguish between the first 7 specimens (Class 1) and the last 7 (Class 2)?
Prediction Method
• Compound covariate prediction
• Compound covariate built from the log-ratios of the 10 most differentially expressed genes.
[Figure: Proportion of simulated data sets (vertical axis) vs. number of misclassifications (horizontal axis) for three methods: no cross-validation (resubstitution method); cross-validation after gene selection; cross-validation prior to gene selection.]
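The simulation on these slides can be re-created in outline. The compound covariate is taken here to be the sum of the selected genes' log-ratios weighted by their two-sample t-statistics, with the cut-off at the midpoint of the two class means of that score; this reading of "compound covariate", the seed and the number of simulated data sets (100) are assumptions. "After gene selection" means the 10 genes are chosen once from all 14 specimens and only the weights and cut-off are re-estimated inside the leave-one-out loop; "prior to gene selection" means gene selection is repeated inside every loop. Because no gene is related to class, only the last method should give error counts consistent with no predictive ability.

```python
import numpy as np


def t_stats(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Two-sample t-statistic (class 0 minus class 1) for every gene."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (a.mean(axis=0) - b.mean(axis=0)) / se


def top_genes(X, y, k=10):
    return np.argsort(-np.abs(t_stats(X, y)))[:k]


def fit_compound_covariate(X, y, genes):
    """Weights are the t-statistics, so class 0 scores higher on average by construction."""
    w = t_stats(X[:, genes], y)
    score = X[:, genes] @ w
    cut = (score[y == 0].mean() + score[y == 1].mean()) / 2
    return genes, w, cut


def predict(model, X):
    genes, w, cut = model
    return np.where(X[:, genes] @ w > cut, 0, 1)


def misclassifications(X, y, method, k=10):
    n = len(y)
    if method == "resubstitution":  # no cross-validation at all
        return int(np.sum(predict(fit_compound_covariate(X, y, top_genes(X, y, k)), X) != y))
    fixed = top_genes(X, y, k) if method == "after selection" else None  # uses ALL specimens
    errors = 0
    for i in range(n):
        keep = np.arange(n) != i
        genes = fixed if fixed is not None else top_genes(X[keep], y[keep], k)
        model = fit_compound_covariate(X[keep], y[keep], genes)
        errors += int(predict(model, X[i:i + 1])[0] != y[i])
    return errors


rng = np.random.default_rng(1)
y = np.repeat([0, 1], 7)  # first 7 specimens vs last 7 (coded 0/1)
methods = ("resubstitution", "after selection", "prior to selection")
results = {m: [] for m in methods}
for _ in range(100):
    X = rng.standard_normal((14, 6000))  # P_i ~ MVN(0, I_6000): pure noise
    for m in methods:
        results[m].append(misclassifications(X, y, m))
for m in methods:
    print(f"{m:>20}: mean {np.mean(results[m]):.1f} misclassifications out of 14")
```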
Cluster Analysis is Subjective
• Cluster algorithms always produce clusters
• Different distance metrics and clustering algorithms may find different structure using the same data
• Supervised clustering is misleading
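The problem with supervised clustering can be demonstrated on data that contain no signal at all. In the sketch below (the sizes, seed, choice of 50 genes and a plain two-cluster k-means are all assumptions for illustration), the outcome labels are assigned without reference to the expression data, yet clustering the specimens on genes selected for differential expression between outcome groups tends to produce clusters that track the outcome. Any agreement well above 50% is created by the gene selection itself, which is why such clusters cannot be taken as evidence that expression distinguishes clinical outcomes.

```python
import numpy as np


def two_means(X: np.ndarray, iters: int = 50) -> np.ndarray:
    """Plain Lloyd's k-means with k = 2; starts from sample 0 and the sample farthest from it."""
    far = int(np.argmax(((X - X[0]) ** 2).sum(axis=1)))
    centers = np.stack([X[0], X[far]]).astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for j in (0, 1):
            if np.any(labels == j):  # leave a centre in place if its cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels


rng = np.random.default_rng(2)
y = np.repeat([0, 1], 10)            # outcome groups, assigned independently of X
X = rng.standard_normal((20, 5000))  # null expression data: no real signal
a, b = X[y == 0], X[y == 1]
t = (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(a.var(axis=0, ddof=1) / 10 + b.var(axis=0, ddof=1) / 10)
selected = np.argsort(-np.abs(t))[:50]  # the "differentially expressed" genes
labels = two_means(X[:, selected])      # unsupervised step on supervised genes
agreement = max(np.mean(labels == y), np.mean(labels != y))  # cluster numbering is arbitrary
print(f"clusters match the outcome groups for {agreement:.0%} of specimens")
```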
Good Microarray Studies Have Clear Objectives
• Class Comparison (Gene Finding)
  • Find genes whose expression differs among predetermined classes, e.g. tissue or experimental condition
• Class Prediction
  • Prediction of predetermined class (e.g. treatment outcome) using information from gene expression profile
• Class Discovery
  • Discover clusters of specimens having similar expression profiles
Class Comparison and Class Prediction
• Not clustering problems
  • Global similarity measures generally used for clustering arrays may not distinguish classes
  • Clustering does not control multiplicity or distinguish data used for classifier development from data used for classifier evaluation
• Supervised methods
Acknowledgements
• NCI Biometric Research Branch: Alain Dupuy, Boris Freidlin, Wenyu Jiang, Aboubakar Maitournam, Yingdong Zhao
• Soonmyung Paik, NSABP
• Daniel Hayes, U. Michigan