Innovative Paths to Better Medicines Design Considerations in Molecular Biomarker Discovery Studies...

Innovative Paths to Better Medicines Design Considerations in Molecular Biomarker Discovery Studies Doris Damian and Robert McBurney June 6, 2007


Innovative Paths to Better Medicines

Design Considerations in Molecular Biomarker Discovery Studies

Doris Damian and Robert McBurney

June 6, 2007


Confidential Information – Do Not Reproduce or Distribute – page 2

Outline of Presentation

• Introduction:

– Mass Spectrometry Data

– Studies objectives and questions

• Statistical Processing of MS Data

– Sample normalization

– Removal of peak-specific batch and other temporal trends

– Filtering of noisy peaks

• Design Considerations

– Power calculations – for univariate biomarkers

– Power calculations for multivariate biomarkers (regression)


Mass Spectrometry Data

• Measurements: chemical compounds of different classes (proteins, lipids, polar and non-polar metabolites, amino acids, etc.)

• The variables constituting the data sets are peak intensities (peaks) identified by m/z and retention time. The peak intensities are proportional to the amount of analyte detected by the mass spectrometer. Note that p >> n: the number of peaks (p) far exceeds the number of samples (n).

[Figure: MS of individual peaks, total ion chromatogram, and selected ion chromatogram. Figure modified from: http://www.asms.org/whatisms/p13.html]

[Figure: peak intensity versus sample (0 to 130) for biological samples and QC samples]


Structure of a Molecular Biomarker Discovery Study

[Diagram of study stages: Objectives, Questions, Design, Experiment, Statistical Processing, Data Analysis]


Studies Objectives and Questions

Biomarker: A characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic response(s) to a therapeutic intervention.

• Objective: Diagnosis

– Question: What is a minimal set of biomarkers?

• Objective: Elucidation of Mechanisms of Action (MoA)

– Question: What are all the biomarkers?

– Question: What are the molecular pathways?




Statistical Processing

• Sample normalization

– correction of baseline differences between samples

• Removal of peak-specific batch and other temporal trends

– due to instrument and processing limitations, samples are acquired sequentially in batches, and peaks exhibit batch-to-batch variation;

– instrument performance may become unstable over time, and samples may undergo degradation.

These are the main causes of the temporal variation observed in peak intensities.

• Filtering of noisy peaks

– for each biological sample, replicate measurements are obtained;

– the estimated correlation between these replicates is used as a filter for noisy data.

Presented at IBC’s Biomarkers and Molecular Diagnostic conferences, September 2006


Sample Normalization

• Correction of baseline differences between samples.

• Based on Internal Standards (IS).

• Internal Standards are known exogenous compounds, added to the biological samples in fixed amounts at the beginning of the sample preparation stage (same for all samples).

• Used to account for sample variability (e.g., pipetting errors) during sample preparation and acquisition.


Typical Sample Profiles of IS Peaks – before Normalization

[Figure: Before Normalization: Sample Profiles of 6 Internal Standard Peaks. x-axis: IS peak (1 to 6); y-axis: log(intensity), 14.0 to 17.5]


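Sample Normalization

• Normalization – the statistical procedure of multivariate scaling of samples based on (a subset of) IS peaks.

• Y = log(intensity); i = 1,…,I IS peak; j = 1,…,J sample.

• ANOVA model: $Y_{ij} = \alpha_i + \gamma_j + \varepsilon_{ij}$, where $\alpha_i$ is the effect of IS peak i, $\gamma_j$ is the sample-specific factor, and $\varepsilon_{ij}$ is the residual error.

• The sample-specific factors, $\gamma_j$, are estimated in this ANOVA model and removed from all peaks.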
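The normalization step can be illustrated with a minimal sketch. It assumes an additive two-way model on the log scale (IS peak effect plus sample effect plus error), a complete IS-peak-by-sample matrix, and a sum-to-zero constraint on the sample factors, in which case the least-squares estimate of each sample factor is that sample's mean over the IS peaks minus the grand mean. The function names, the toy numbers, and the choice of NumPy are illustrative assumptions, not taken from the presentation.

```python
import numpy as np

def estimate_sample_factors(is_log_intensity):
    """Estimate sample-specific factors from internal-standard (IS) peaks.

    is_log_intensity: array of shape (I, J), log intensities of I IS peaks
    in J samples. For a complete two-way additive layout with a sum-to-zero
    constraint, the least-squares sample factor is the sample mean over the
    IS peaks minus the grand mean.
    """
    y = np.asarray(is_log_intensity, dtype=float)
    return y.mean(axis=0) - y.mean()

def normalize(all_log_intensity, is_rows):
    """Subtract the estimated sample factors from every peak (rows = peaks)."""
    y = np.asarray(all_log_intensity, dtype=float)
    factors = estimate_sample_factors(y[is_rows, :])
    return y - factors[np.newaxis, :], factors

# Toy data (hypothetical): 3 peaks x 4 samples; rows 0 and 1 are IS peaks.
peak_level = np.array([[15.0], [16.0], [12.0]])
sample_shift = np.array([0.3, -0.1, 0.0, -0.2])
log_intensity = peak_level + sample_shift  # noise-free, for clarity
normalized, factors = normalize(log_intensity, is_rows=[0, 1])
```

In this noise-free toy case the estimated factors equal the simulated sample shifts, so every peak returns to its own constant level after normalization.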


Typical Sample Profiles of IS Peaks – after Normalization

[Figure: After Normalization: Sample Profiles of 6 Internal Standard Peaks. x-axis: IS peak (1 to 6); y-axis: log(intensity), 14.0 to 17.5]

Through normalization, temporal trends common to all peaks are removed.


Typical Temporal Profiles of IS Peaks – before Normalization

[Figure: Before Normalization: Temporal Profiles of 6 Internal Standard Peaks. x-axis: sample order (0 to 450); y-axis: log(intensity), 14.0 to 17.5]


Typical Temporal Profiles of IS Peaks – after Normalization

[Figure: After Normalization: Temporal Profiles of 6 Internal Standard Peaks. x-axis: sample order (0 to 450); y-axis: log(intensity), 14.0 to 17.5]




Peak-Specific Temporal Trends – after Normalization

[Figure, two panels: Before Normalization: Temporal Profile of Peak 41; After Normalization: Temporal Profile of Peak 41. x-axis: sample order (0 to 130); y-axis: log(intensity), 14.8 to 15.6. QC: black; biological samples: red]


The Need for Batch Corrections

• The within- and between-batch patterns cause visible batch separations:

[Figure: PCA Plot: Data set after Normalization, Colored by Batch. PCA of the Iris Plasma GC/MS data set after normalization; samples colored by batch and numbered sequentially. x-axis: first principal component, t[1]; y-axis: second principal component, t[2]. Ellipse: Hotelling T2 (0.95)]

• If one does not account for these intrinsic experimental trends, important biological effects may be obscured.
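A batch effect of this kind can be checked with a few lines of code. The sketch below computes principal component scores through a singular value decomposition of the column-centered data and compares the mean first-component score of each batch. The simulated data, batch sizes, and effect sizes are hypothetical and serve only to show the mechanics; they are not the data set from the slide.

```python
import numpy as np

def pca_scores(data, n_components=2):
    """Return PCA scores (samples x components) via SVD of column-centered data."""
    x = np.asarray(data, dtype=float)
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Hypothetical data: 3 batches of 20 samples, 50 peaks, one offset vector per batch.
rng = np.random.default_rng(0)
n_per_batch, n_peaks = 20, 50
batch = np.repeat([0, 1, 2], n_per_batch)
batch_offset = rng.normal(0.0, 1.0, size=(3, n_peaks))
data = batch_offset[batch] + rng.normal(0.0, 0.3, size=(batch.size, n_peaks))

scores = pca_scores(data)
# Mean first-component score per batch; widely separated means point to a batch effect.
batch_means = np.array([scores[batch == b, 0].mean() for b in range(3)])
```

Plotting `scores[:, 0]` against `scores[:, 1]` and coloring the points by `batch` would give a plot analogous to the one described on the slide.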


Removal of Peak-Specific Temporal Trends

• Based on QC samples (ideally)

– QC samples: a pool of material from the biological samples in a study, aliquoted into a set of identical samples that are acquired at specific intervals in each batch of samples.

[Figure, two panels: Before Normalization: Temporal Profile of Peak 41; After Normalization: Temporal Profile of Peak 41. x-axis: sample order (0 to 130); y-axis: log(intensity), 14.8 to 15.6. QC: black; biological samples: red]


Removal of Peak-Specific Temporal Trends

[Figure, two panels: After Normalization, Before Batch Correction: Temporal Profile of Peak 41; After Normalization, After Batch Correction: Temporal Profile of Peak 41. x-axis: sample order (0 to 130); y-axis: log(intensity), 14.8 to 15.6. QC: black; biological samples: red]

• Temporal trend within batch b (b = 1,…,B batches): $f_b(t) = \beta_{0,b} + \beta_{1,b}\,t + \beta_{2,b}\,t^2$, estimated based on QC samples within batch b.
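As a rough sketch of this step, the code below fits a quadratic trend in run order to the QC samples of each batch by least squares and subtracts the fitted trend from every sample in that batch. Adding the overall QC mean back afterwards, so that values stay on the original scale, is an assumption of this sketch, as are the function name and the toy numbers. At least three QC samples per batch are needed for the quadratic fit.

```python
import numpy as np

def remove_batch_trends(log_intensity, run_order, batch, is_qc):
    """Subtract a per-batch quadratic trend, fitted on QC samples only, from one peak.

    The overall QC mean is added back afterwards (a choice made in this sketch)
    so that corrected values remain on the original log-intensity scale.
    """
    y = np.asarray(log_intensity, dtype=float)
    t = np.asarray(run_order, dtype=float)
    batch = np.asarray(batch)
    is_qc = np.asarray(is_qc, dtype=bool)
    corrected = y.copy()
    for b in np.unique(batch):
        in_batch = batch == b
        qc = in_batch & is_qc
        # Least-squares fit of a degree-2 polynomial in t to this batch's QC samples.
        coefs = np.polyfit(t[qc], y[qc], deg=2)
        corrected[in_batch] -= np.polyval(coefs, t[in_batch])
    return corrected + y[is_qc].mean()

# Toy example (hypothetical numbers): 2 batches of 10 runs, a QC sample every third run.
run_order = np.arange(20)
batch = np.repeat([0, 1], 10)
is_qc = run_order % 3 == 0
trend = np.where(batch == 0,
                 15.0 + 0.02 * run_order - 0.001 * run_order**2,
                 15.4 - 0.01 * run_order)
biology = np.where(is_qc, 0.0, np.linspace(-0.5, 0.5, 20))  # QC carries no biological signal
observed = trend + biology
corrected = remove_batch_trends(observed, run_order, batch, is_qc)
```

In this noise-free example the corrected QC samples collapse onto a single level, and the biological samples keep only their simulated biological deviations around that level.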




Correlations between Biological Replicates

• When the same sample is measured several times, we require the measurements to correlate well.

• The correlation between replicates can be expressed as a tradeoff between the biological variance ($\sigma^2_{Bio}$) and the measurement error variance ($\sigma^2_{\varepsilon}$):

$\rho = \mathrm{Corr}(Y_1, Y_2) = \dfrac{\sigma^2_{Bio}}{\sigma^2_{Bio} + \sigma^2_{\varepsilon}}$

• Ideal case: no measurement error ($\sigma^2_{\varepsilon} = 0$), giving $\rho = 1$. When $\sigma^2_{\varepsilon} = \sigma^2_{Bio}$, $\rho = 0.5$.

• The estimated correlation, $\hat{\rho}$, can be used to filter noisy peaks.
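A minimal sketch of such a filter follows. It computes the Pearson correlation between two replicate measurements separately for each peak and keeps the peaks whose estimated correlation reaches a cutoff. The cutoff of 0.5, the simulated variances, and all names are assumptions chosen for illustration; the presentation does not state which threshold was used.

```python
import numpy as np

def replicate_correlations(rep1, rep2):
    """Pearson correlation between two replicates, computed separately per peak.

    rep1, rep2: arrays of shape (n_samples, n_peaks) holding log intensities.
    """
    a = np.asarray(rep1, dtype=float)
    b = np.asarray(rep2, dtype=float)
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    return (a * b).sum(axis=0) / np.sqrt((a**2).sum(axis=0) * (b**2).sum(axis=0))

# Hypothetical simulation: biological variance 1 for both peaks;
# measurement error SD 0.1 for peak 0 (clean) and 2.0 for peak 1 (noisy).
rng = np.random.default_rng(1)
n_samples = 200
bio = rng.normal(0.0, 1.0, size=(n_samples, 2))
noise_sd = np.array([0.1, 2.0])
rep1 = bio + rng.normal(0.0, 1.0, size=bio.shape) * noise_sd
rep2 = bio + rng.normal(0.0, 1.0, size=bio.shape) * noise_sd

rho_hat = replicate_correlations(rep1, rep2)
keep = rho_hat >= 0.5  # hypothetical cutoff
```

With these simulated variances the theoretical correlations are about 0.99 for the clean peak and 0.2 for the noisy one, so only the first peak passes the filter.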


Examples of Correlations (two extremes)

[Figure, two scatter plots of replicate 2 versus replicate 1 (both axes 10.4 to 12.6): Peak 25: Estimated Correlation = 0.37; Peak 101: Estimated Correlation = 0.98]

[Figure: Distribution of Correlations between Replicates. Histogram; x-axis: correlation (0.0 to 1.0); y-axis: frequency (0 to 80)]




Power Calculations

• Statistical power = probability of detecting biomarkers

• The power in biomarker discovery studies is a function of:

– The sample size

– The separation between the groups (e.g., MFC)

– The proportion of biomarkers in the data set

– The false discovery rate (FDR) allowed

– The platform variability

– The within-group variability

– Other factors (e.g., other covariates in the model)?
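How these factors interact can be explored by simulation. The sketch below is one possible setup, not the authors' method: it simulates two groups with unit within-group standard deviation, tests each peak with a two-sample z-test (the standard deviation is treated as known, which keeps the example free of external statistics libraries; a real analysis would more likely use a t-test), applies the Benjamini-Hochberg procedure at the allowed FDR, and reports the average proportion of true biomarkers discovered. The parameter `effect` is a hypothetical group separation in standard deviation units on the log scale, and all names and default values are illustrative assumptions.

```python
import math
import numpy as np

def benjamini_hochberg(p_values, q):
    """Return a boolean mask of discoveries under the Benjamini-Hochberg procedure."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, m + 1) / m
    discoveries = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()  # largest rank meeting the BH bound
        discoveries[order[: k + 1]] = True
    return discoveries

def simulate_power(n_per_group, effect, prop_biomarkers, q,
                   n_peaks=200, n_sim=50, seed=0):
    """Monte Carlo estimate of (power, realized FDR) for univariate biomarkers."""
    rng = np.random.default_rng(seed)
    n_true = int(round(prop_biomarkers * n_peaks))
    is_marker = np.arange(n_peaks) < n_true
    shift = np.where(is_marker, effect, 0.0)
    power, fdr = [], []
    for _ in range(n_sim):
        healthy = rng.normal(0.0, 1.0, size=(n_per_group, n_peaks))
        diseased = rng.normal(shift, 1.0, size=(n_per_group, n_peaks))
        # z-test with known unit SD in both groups (a simplifying assumption).
        z = (diseased.mean(axis=0) - healthy.mean(axis=0)) / math.sqrt(2.0 / n_per_group)
        p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])  # two-sided p-values
        found = benjamini_hochberg(p, q)
        power.append(found[is_marker].mean())
        fdr.append((found & ~is_marker).sum() / max(found.sum(), 1))
    return float(np.mean(power)), float(np.mean(fdr))

power_small, fdr_small = simulate_power(6, effect=1.0, prop_biomarkers=0.1, q=0.1)
power_large, fdr_large = simulate_power(30, effect=1.0, prop_biomarkers=0.1, q=0.1)
```

Repeating the call over a grid of sample sizes, effect sizes, biomarker proportions, and FDR levels would trace out power curves of the kind shown on the following slides.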




Illustration I: Power Curves

[Figure: power versus sample size (per group), 6 to 30; Proportion of Biomarkers = 90%. Curves for MFC = 1.7, MFC = 2.0, and MFC = 3.0; solid: FDR 0.1; dashed: FDR 0.2. Inset schematics: density of x for healthy and diseased groups; expected value of y over time (days, 0 to 7) for healthy and diseased groups]


Illustration I: Power Curves

[Figure, two panels: power versus sample size (per group), 6 to 30; Proportion of Biomarkers = 90% and Proportion of Biomarkers = 10%. Curves for MFC = 1.7, MFC = 2.0, and MFC = 3.0; solid: FDR 0.1; dashed: FDR 0.2. Inset schematics: density of x for healthy and diseased groups; expected value of y over time (days, 0 to 7) for healthy and diseased groups]


Power Curves Not Accounting for the FDR

• There is no loss in power (proportion of biomarkers discovered), BUT the FDR may be undesirable.

[Figure: power and estimated FDR versus sample size (per group), 6 to 30; Proportion of Biomarkers = 10%. Curves for MFC = 1.7, MFC = 2.0, and MFC = 3.0; dotted: estimated FDR. Inset schematics: density of x for healthy and diseased groups; expected value of y over time (days, 0 to 7) for healthy and diseased groups]


Power Calculation for Multivariate Biomarkers (Regression)

Classical Setting

• n > p

• Linear regression model

• Parametric (F) test of model significance

• Computationally inexpensive

Biomarker Discovery Setting

• n << p

• Regression with constraints on parameters (elastic net)

• Dimensionality reduction needed (through cross-validation)

• Non-parametric (label permutations) test of model significance

• Computationally very expensive


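Illustration: Power for Regression

• Model: $Y = \beta_1 X_1 + \dots + \beta_p X_p + \varepsilon$

• Multivariate biomarker: $f(X) = X^T \beta$

• Parameter of interest: $\rho = \mathrm{Corr}(Y, f(X)) = \dfrac{1}{\sqrt{1 + \sigma^2 / \sum_{i=1}^{p} \beta_i^2}}$, where $\sigma^2$ is the variance of $\varepsilon$ (the closed form holds for uncorrelated, unit-variance predictors)

• Test: $\rho = 0$

• Power = proportion of times that this hypothesis is rejected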
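The definition of power as a rejection proportion translates directly into a simulation. The sketch below is a simplified stand-in for the procedure described in the presentation: it uses ridge regression in closed form in place of the elastic net, treats the biomarker components as known in advance (only p = 10 predictors, no analytes to screen out), measures the association by the correlation between Y and cross-validated predictions, and assesses significance with a label permutation test. It assumes uncorrelated unit-variance predictors and unit error variance, so that a target correlation rho corresponds to a sum of squared coefficients of rho^2 / (1 - rho^2). All function names, the penalty, the fold count, and the numbers of permutations and simulations are hypothetical choices kept small so the example runs quickly.

```python
import numpy as np

def cv_correlation(x, y, folds, penalty=1.0):
    """Correlation between y and its out-of-fold ridge regression predictions."""
    pred = np.empty_like(y)
    for test_idx in folds:
        train = np.ones(y.size, dtype=bool)
        train[test_idx] = False
        xt, yt = x[train], y[train]
        # Closed-form ridge solution (used here instead of the elastic net).
        coef = np.linalg.solve(xt.T @ xt + penalty * np.eye(x.shape[1]), xt.T @ yt)
        pred[test_idx] = x[test_idx] @ coef
    return np.corrcoef(y, pred)[0, 1]

def permutation_p_value(x, y, rng, n_perm=49, n_folds=5):
    """Label permutation test of the null hypothesis that rho = 0."""
    folds = np.array_split(rng.permutation(y.size), n_folds)
    observed = cv_correlation(x, y, folds)
    null = [cv_correlation(x, rng.permutation(y), folds) for _ in range(n_perm)]
    return (1 + sum(s >= observed for s in null)) / (n_perm + 1)

def simulate_power(n, rho, p=10, n_sim=20, alpha=0.05, seed=0):
    """Proportion of simulated data sets in which rho = 0 is rejected."""
    rng = np.random.default_rng(seed)
    # Equal coefficients whose squares sum to rho^2 / (1 - rho^2); error variance is 1.
    beta = np.full(p, np.sqrt(rho**2 / (1.0 - rho**2) / p))
    rejections = 0
    for _ in range(n_sim):
        x = rng.normal(size=(n, p))
        y = x @ beta + rng.normal(size=n)
        rejections += permutation_p_value(x, y, rng) <= alpha
    return rejections / n_sim

power_high = simulate_power(40, rho=0.92)
power_null = simulate_power(40, rho=0.0, seed=1)
```

Each power estimate requires (permutations + 1) x folds model fits per simulated data set, which gives a sense of why the biomarker discovery setting, where model selection must be repeated inside every permutation, is described as computationally very expensive.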


Power Calculation – Regression

[Figure: power versus number of samples (15 to 60), with one curve each for rho = 0.58, rho = 0.75, and rho = 0.92]

rho     Number of Samples   Power
0.92    30                  0.50
0.92    38                  0.79
0.92    45                  0.96
0.75    30                  0.31
0.75    38                  0.46
0.75    45                  0.50
0.75    60                  0.70
0.00    30                  0.02

Biomarker with 10 Components (known in advance): …10 minutes to calculate

Biomarker with 10 Components (buried among 90 other analytes): …days to calculate


Thank you!