Unit 3; Session 1 Principles of Biomarker Discovery and ... 2… · Big Data Training for...
Big Data Training for Translational Omics Research
Principles of Biomarker Discovery and Development in Translational Medicine
Liu
6/7/2017
Class 1
10:15am
Unit 3; Session 1
Breakdown
Learning objectives
Biomarker and Precision Medicine
Biomarker in preclinical and clinical studies
Principles of Biomarker Discovery: Overview
Principles of Biomarker discovery: data collection
Principles of Biomarker discovery: data analysis
Principles of Biomarker Discovery: validation
Philosophy of Translational Research
• As a biomedical researcher, how can I make something to benefit patients?
• I am working on cell lines and mice; how can the omics approach help me understand the mechanism, especially causality?
• Can the key molecule(s) I identified in cells and animals be used in humans?
Lab researchers, grant writers, physicians…
Key Words
• Biomarker: A characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention.
NIH Biomarkers Definitions Working Group; Atkinson, et al., Clin Pharm Ther, 2001.
• Translational: Translational research aims to aid in the transformation of biological knowledge into solutions that can be applied in a clinical setting.
Azuaje F. Bioinformatics and Biomarker Discovery, 2010
Why Biomarker?
A Core Question in Modern Medicine
How to Address Patient Heterogeneity?
Patient Heterogeneity
Biomarker → Personalized Medicine
Response rate (RR) by patient population and drug:
• CML patients, Gleevec: 90% RR
• All breast cancer patients, Herceptin: 10–15% RR
• HER2+ breast cancer patients, Herceptin: 35–45% RR
• All NSCLC patients, Iressa: 10–15% RR
• EGFR MT+ NSCLC patients, Iressa: 60–70% RR
Slamon et al. NEJM 2001; Kantarjian et al. NEJM 2002; Vogel et al. JCO 2002. 20:3; Douillard et al. JCO 2010.
Biomarkers are especially important in diseases with low response rates in
the overall population
Biomarker → Personalized Medicine
[Figure: from discovery through drug development to implementation. Cancer: genes EGFR, KRAS, ALK, HER2, BRAF; precision molecules Gefitinib, ARS-853?, Crizotinib, Herceptin, Vemurafenib. Other common diseases: Gene A, Gene B, Gene C, Gene D, Gene E.]
Precision Medicine
To deliver the right treatment to the right patient with the right dose
and at the right time
Clinical Application of Biomarker
• Deal with the patient heterogeneity
– Early risk assessment
– Disease prevention
– Assist diagnosis
– Optimize treatment: high effectiveness, low risk
– Match the patient to therapeutic strategy
– Monitor therapy success/disease recurrence
– Long-term management
[Figure: Risk, Diagnosis, Treatment, Monitoring]
Biomarker in Preclinical Studies
• To characterize the phenotype
• To monitor the response
• To identify potential translational biomarkers for humans
Omics Approach in Basic Research
• Explore molecular mechanism
• Hypothesis generating
• Identify therapeutic targets and strategies
• Establish intermediate phenotypes
Type of Biomarkers
• Prognostic marker (a): before treatment
• Predictive marker (b): before treatment
• Pharmacodynamic marker (c): after treatment
• Surrogate marker (d): during treatment
Gosho, et al. Sensors 2012, 12, 8966-8986
Prognostic Marker
• Signature separates a population with respect to the outcome (risk)
• Regardless of the types of therapies or treatments
– Markers associated with overall survival regardless of treatment
• Distinguish outcome (poor or good) following the test and standard treatments
• Cannot guide the choice of a particular treatment
• Can determine the aggressiveness of treatment
Ballman KL, JCO. 2015.63.3651
Predictive Biomarker
Ballman KL, JCO. 2015.63.3651
• Predicts the differential outcome of a particular therapy or treatment
• Prospectively identifies patients who are likely to have a favorable clinical outcome from a specific treatment
• Therefore, a predictive biomarker can guide the choice of treatment
Prognostic and Predictive Markers
Ballman KL, JCO. 2015.63.3651
• Some biomarkers are both prognostic and predictive: they indicate disease susceptibility or progression as well as the outcome of certain treatments
• ER status and breast cancer: prognostic
• ER status and antiestrogen therapy: predictive
Pharmacodynamic Markers
• PD biomarkers provide information about the pharmacologic effects of a drug on its target
• Measured after treatment
• A clinical endpoint to be measured
• Application:
– Proof of mechanism: i.e., does the drug hit its intended target?
– Proof of concept: i.e., does hitting the drug target alter the biology of the tumor?
– Selection of optimal biologic dosing
– Understanding response/resistance mechanisms
• Examples:
– Protein phosphorylation markers, e.g. p-EGFR, p-ERK, to evaluate changes in target protein phosphorylation or the activation status of downstream signaling/adapter molecules
– Apoptosis (TUNEL assay) to assess pharmacologic effect on cell death
Surrogate Biomarker
• Substitute for a clinical endpoint
• Expected to predict clinical benefit (or lack of benefit, or harm) based on epidemiologic, therapeutic, pathophysiologic, or other scientific evidence
• During or after treatment
• Examples:
– Glucose level for monitoring diabetes treatment
– Imaging-based measurement for anti-cancer therapy
Questions
What kind of biomarker is HOXB13:IL17BR in the first case paper?
What kind of biomarker is the blood concentration of R-/S-methadone in the second case paper?
Examples of FDA Approved Biomarkers
Gosho, et al. Sensors 2012, 12, 8966-8986
Biomarker Discovery and Development in the Omics Era
[Figure: timeline spanning the 1970s, 1980s, 1990s, and >2005]
Biomarker Discovery and Development in the Omics Era
Genomics
Transcriptomics
miRNomics
lncRNomics
Epigenomics
Proteomics
Metabolomics
Lipidomics
Exposomics
Prognostic-diagnostic Markers
• Genes for ~50% of rare diseases identified
Nature Reviews Genetics 14, 681–691 (2013)
Prognostic-Diagnostic Markers
• 11,907 SNPs strongly associated with common diseases
Pharmacogenomic Markers
• 166 FDA approved PGx markers for drug treatment
Transcriptomic Biomarkers
• MammaPrint test
– Agendia
– 70-gene signature for breast cancer prognosis
• Oncotype Dx test
– Genomic Health
– 21 gene-expression biomarkers for predicting recurrence in breast cancer patients and response to both chemotherapy and radiation therapy
• H/I test
– AviaraDx
– 2-gene signature used to estimate the risk of recurrence and response to therapy in breast cancer patients
Biomarker Development Pipeline
[Figure: pipeline diagram; source: https://is.muni.cz]
• Stages: Discovery, Confirmation, Assay development, Validation/Refinement, Clinical Validation, Clinical Adoption
• Technical development: integrated technologies and platforms, multi-analyte assays, robust validated assays, clinical grade assays
• Platforms: Genomics, Transcriptomics, Proteomics, Metabolomics, Lipidomics, Epigenomics, Exposomics, Imaging
• Assay requirements: accurate, specific, reproducible, reliable
• Drug development track: Target selection, Lead identification, Preclinical, Clinical trials, Marketing/clinical use
• Other labels: Instruments, Number of analytes, Number of samples, Retrospective
Institute of Medicine Roadmap for omics-based tumor biomarker test development
Hayes BMC Medicine 2013, 11:221
Data Acquisition Strategies
• Retrospective:
– Clinical samples collected before the design of the biomarker study, and before comparison with control samples
– Looks back at past, recorded data to find evidence of marker-disease relationships
– Inexpensive, rapid
– Potentially biased, noisy
– Weak evidence
• Prospective:
– The biomarker-based prediction or classification model is applied to patients at the time of patient enrolment
– Clinical outcomes or disease occurrence are unknown at the time of enrolment
– Less biased
– Strong evidence
– Expensive, time-consuming
• Pro-retrospective
FDA approval!!
Study Design Consideration
• Biomarker discovery studies require careful planning and design
• Study style: retrospective, prospective, pro-retrospective
• Sample collection
• Phenotype
• Sample size and power estimation
• Other covariates
• Data collection
• Platform
• Replication, validation and application
• Data analysis plan
Sample Collection, Assay Design, Data Analysis Plan
• Establish methods
– Specimen collection
– Processing
– Storage
• Establish criteria
– Quantity and quality
– Minimum amount
• Feasibility
– Obtaining specimens
• Assay design
– Communication with core/service provider
• Data analysis
– Communication with biostatistician and bioinformatician
Sample and Materials
• Biospecimen
– Tissue
– Blood
– Oral swab
– Hair
– Tear
– Urine
– Feces
– Saliva
– …
• Test materials
– DNA
– RNA
– Protein
– Small molecules
– Lipids
• Principles:
– Non-invasive
– Reproducible
– Reliable
– Specific
– Accurate
– Inexpensive
– Point-of-care
[Figure label: invasiveness]
Ethical, Legal, and Regulatory Issues
• Establish communication with regulatory agencies, e.g. IRB, FDA
• Regulatory approvals
• Documents:
– Informed consent
– Study protocol
• Intellectual property issues
• CLIA-lab based test for clinical trials involving patient selection
Sample Size and Power Estimation
• Power setting: 0.8
• Statistical significance:
– Discovery: multiple hypotheses (p corrected according to the number of tests)
– Validation: usually one hypothesis (p<0.05)
• Input parameters: previous publication or pilot study
• Online tools:
– piface.jar by Lenth (2006): http://homepage.stat.uiowa.edu/~rlenth/Power/
– Microarray power/sample size estimation: http://sph.umd.edu/department/epib/sample-size-and-power-calculations-microarray-studies
– RNA-seq data:
  • Scotty: http://bioinformatics.bc.edu/marthlab/scotty/scotty.php
  • RnaSeqSampleSize: https://cqs.mc.vanderbilt.edu/shiny/RnaSeqSampleSize/
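To make the effect of the significance setting concrete, here is a minimal sketch of a sample size estimate for a two-group comparison. It is not one of the tools listed above: it uses the textbook normal approximation n = 2(z_{1-α/2} + z_{power})² / d² per group, and the effect size `d = 1.0` and the count of 1000 tests are assumed values for illustration only. A real study would take the effect size from a previous publication or pilot study.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate samples per group for a two-sided, two-group comparison.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2,
    where d is the standardized effect size (mean difference / SD).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


n_tests = 1000  # assumed number of markers screened in discovery
# Validation: one hypothesis, p < 0.05
n_validation = n_per_group(effect_size=1.0, alpha=0.05, power=0.8)
# Discovery: Bonferroni-style corrected cutoff, 0.05 / number of tests
n_discovery = n_per_group(effect_size=1.0, alpha=0.05 / n_tests, power=0.8)

print("validation, per group:", n_validation)
print("discovery, per group:", n_discovery)
```

With the same power and effect size, the corrected discovery cutoff requires noticeably more samples per group than the single-hypothesis validation setting.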
Key Principles: Big Data in Biomarker
[Figure: Phenotype ("digits") × Molecular profiles ("digits"), linked by statistics, bioinformatics, network, …]
Always Start Your Design and Analysis From Data Evaluation!
• What kind of phenotypic and marker data do I have/should I use/collect?
• Are my data normally distributed?
• What kind of models should I choose?
• What factors may possibly confound my analyses?
• How may covariate data be correlated with my phenotype?
Phenotype to Digits
• Nominal data: no order
– Yes or no (binary): disease vs normal, response vs no response
– Cancer type: breast, lung, colon…
• Ordinal data: some order
– Pathologic: tumor stage: I, II, III
– Disease progression: no, mild, severe, death
• Continuous data:
– Glucose level, LDL, drug concentration, gene expression
• Survival data: time to event
– Death, occurrence of disease, onset of toxicity, in hr, day, wk, month, yr, etc.
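The four data types above can be turned into "digits" along the following lines. This is only a sketch: the patient records, field names, and coding schemes are hypothetical and chosen for illustration.

```python
# Hypothetical patient records; all field names and values are illustrative.
patients = [
    {"id": "P1", "disease": "yes", "cancer": "breast", "stage": "II",
     "glucose": 5.4, "months": 12, "died": True},
    {"id": "P2", "disease": "no", "cancer": "lung", "stage": "I",
     "glucose": 6.1, "months": 30, "died": False},
]

BINARY = {"no": 0, "yes": 1}             # nominal, binary
CANCER_TYPES = ["breast", "lung", "colon"]  # nominal, no order: one-hot
STAGE = {"I": 1, "II": 2, "III": 3}      # ordinal: the order carries meaning


def encode(p):
    """Map one record to numeric values, one entry per data type."""
    return {
        "disease": BINARY[p["disease"]],
        "cancer": [int(p["cancer"] == c) for c in CANCER_TYPES],
        "stage": STAGE[p["stage"]],
        "glucose": float(p["glucose"]),                 # continuous
        "survival": (p["months"], int(p["died"])),      # (time, event flag)
    }


encoded = [encode(p) for p in patients]
for row in encoded:
    print(row)
```

Nominal categories with more than two levels are one-hot encoded here so that no artificial order is implied, whereas tumor stage keeps its order as an integer.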
Molecular Data Collection
• Platform: Genomics, Transcriptomics, miRNomics, lncRNomics, Epigenomics, Proteomics, Metabolomics, Lipidomics
• Raw data
• "Digits":
– Ordinal data: 0, 1, 2
– Continuous variables: -1.2, -1.1, 0.58, 1.09, 2.34, …
Basic Statistical Methods
[Figure: Phenotype (numerical data) × Molecular profiles (numerical data)]
• Phenotype data types: nominal, ordinal, continuous, survival
• Molecular profile data types: nominal, ordinal, continuous
• Statistics: chi-square test, t-test, ANOVA, correlation, log-rank
• Models
• Descriptive and exploratory association
Basic Statistical Methods
• Continuous data
– Normally distributed: parametric method
– Non-normal distribution/ordinal data: non-parametric method
• Winsorization
• Log transformation: log2
Parametric method and its non-parametric counterpart:
• t-test: Mann-Whitney rank-sum test
• Paired t-test: Wilcoxon signed-rank test
• ANOVA: Kruskal-Wallis test
• Pearson correlation: Spearman correlation
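Winsorization and log2 transformation are both ways of taming skewed data or outliers before a parametric method is applied. The sketch below shows one simple version of each. The quantile rule (nearest lower rank), the 10%/90% limits, and the pseudocount of 1 are assumptions made for this example, not settings prescribed by the slides.

```python
import math


def winsorize(values, lower=0.05, upper=0.95):
    """Clamp values outside the given quantiles to the quantile values.

    Quantiles are taken by a simple lower-rank rule on the sorted data.
    """
    s = sorted(values)
    n = len(s)
    lo = s[int(lower * (n - 1))]
    hi = s[int(upper * (n - 1))]
    return [min(max(v, lo), hi) for v in values]


def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]


raw = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]  # one extreme outlier
clipped = winsorize(raw, lower=0.1, upper=0.9)
logged = log2_transform(raw)

print(clipped)
print([round(x, 2) for x in logged])
```

Winsorization pulls the outlier back to the chosen upper quantile, while the log2 transform compresses the whole scale and keeps the ordering of the values.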
Statistic Models
• Univariate models
– Logistic regression: binary/categorical phenotype
– Linear regression: continuous phenotype
– Kaplan-Meier (KM) method: survival phenotype
• Multivariate models
– Multivariate regressions: linear or logistic
– Cox regression: survival phenotype
• Other sophisticated models
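As an illustration of the Kaplan-Meier method for a survival phenotype, the following is a minimal sketch of the product-limit estimator. The follow-up times and event flags are made-up values; in practice a dedicated survival analysis package would normally be used.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times: follow-up time per subject.
    events: 1 if the event (e.g. death) occurred, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # events and censored subjects both leave the risk set
    return curve


# Made-up example: months of follow-up and event indicator (1 = event, 0 = censored)
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(times, events)
for t, s in km_curve:
    print(t, round(s, 3))
```

Censored subjects do not lower the curve themselves, but they shrink the number at risk for later event times.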
Multiple Testing Issue
• Example
– P value cutoff = 0.05
– 1000 genes: 50 genes expected by chance (error) at this significance level
– If 60 genes have p<0.05, many might be due to noise (false positives)
• Common correction method: Bonferroni correction
– Adjusted p value: p × n, e.g. p = 0.0005, n = 1000 genes, adjusted p = 0.0005 × 1000 = 0.5
– Corrected p value cutoff = 0.05/N
– Explanation: among all genes selected, the probability of at least one false positive is ≤ 0.05
• False discovery rate (FDR)
– FDR = 0.1 means that among all genes selected (e.g. 100), we would expect 10 to be false positives
– FDR as high as 0.5 may be acceptable to biologists
– Several different approaches to estimation (Benjamini & Hochberg, B&H, is the most popular)
• Data filtering in the processing step can also reduce the number of genes
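The two corrections on this slide can be sketched in a few lines. The functions below implement the standard Bonferroni adjustment and the Benjamini & Hochberg step-up adjustment; the list of five p values is a made-up example, and the function names are chosen for this sketch only.

```python
def bonferroni(pvals, n_tests=None):
    """Bonferroni-adjusted p values: p * n, capped at 1."""
    n = len(pvals) if n_tests is None else n_tests
    return [min(1.0, p * n) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini & Hochberg adjusted p values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Slide example: 1000 genes tested at p < 0.05
expected_by_chance = 1000 * 0.05
slide_example = bonferroni([0.0005], n_tests=1000)[0]

# Made-up p values for five markers
pvals = [0.0005, 0.01, 0.03, 0.04, 0.2]
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)

print("expected false positives:", expected_by_chance)
print("slide example, adjusted p:", slide_example)
print("Bonferroni:", [round(p, 4) for p in bonf])
print("B&H:", [round(p, 4) for p in bh])
```

On the same input, the B&H adjusted values are never larger than the Bonferroni ones, which is why FDR control keeps more candidate markers in a discovery screen.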
Azuaje F. Bioinformatics and Biomarker Discovery, 2010
Basic Biomarker Discovery Pipeline
Data Processing
• Data pre-processing
– Data filtering and QC
  • Remove samples with failed experiment
  • Exclude markers with very low variance
  • Exclude markers with very low expression levels, e.g. RNA-seq
– Data normalization
  • To transform the data into a format that is compatible or comparable between different samples or assays
  • To level potential differences caused by experimental factors, such as labelling and hybridization
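A minimal sketch of the filtering and normalization steps is given below. The expression values, the variance and mean thresholds, and the choice of per-sample median centering as the normalization are all assumptions made for illustration; the slides do not name a specific normalization method.

```python
from statistics import median, pvariance

# Hypothetical expression matrix: marker -> values across four samples.
expr = {
    "geneA": [5.1, 5.0, 5.1, 5.0],  # nearly constant: very low variance
    "geneB": [0.1, 0.0, 0.2, 0.1],  # very low expression
    "geneC": [2.0, 6.5, 3.1, 8.4],
    "geneD": [7.2, 3.3, 9.0, 4.1],
}


def filter_markers(expr, min_var=0.05, min_mean=1.0):
    """Keep markers whose variance and mean expression reach the thresholds."""
    kept = {}
    for gene, values in expr.items():
        mean_expr = sum(values) / len(values)
        if pvariance(values) >= min_var and mean_expr >= min_mean:
            kept[gene] = values
    return kept


def median_center(expr):
    """Subtract each sample's median so that samples become comparable."""
    genes = list(expr)
    n_samples = len(expr[genes[0]])
    meds = [median(expr[g][j] for g in genes) for j in range(n_samples)]
    return {g: [expr[g][j] - meds[j] for j in range(n_samples)] for g in genes}


kept = filter_markers(expr)
normalized = median_center(kept)
print(sorted(kept))
print({g: [round(x, 2) for x in v] for g, v in normalized.items()})
```

Filtering first also reduces the number of tests that later need multiple-testing correction.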
Why Remove Genes with Low Variance?
[Figure: gene expression (axis 0 to 4) in case vs. control, two panels; p=0.004 and p=0.008]
Data Reduction
• Focus on smaller sets of potentially novel and interesting data patterns (e.g. groups of samples or gene sets).
• Confirm initial hypothesis about the relevance of the features available and to guide future experimental and computational analysis
• Exploratory univariate analyses
– T-test
– Chi-square test
– Correlation
– Univariate regression
Data Matrix
• Data matrix
• Color-coded representations of absolute or relative expression levels
[Figure: heatmap; axes: Expression, Samples]
Data Visualization
[Figure: dendrogram]
• Statistical plotting: Graphpad
• Dendrogram and heatmap: R, GENE-E, Gitools
![Page 52: Unit 3; Session 1 Principles of Biomarker Discovery and ... 2… · Big Data Training for Translational Omics Research Principles of Biomarker Discovery and Development In Translational](https://reader036.fdocuments.us/reader036/viewer/2022081405/5f0b06df7e708231d42e7d57/html5/thumbnails/52.jpg)
Exploratory Analysis
• Univariate analysis
• Single marker vs. phenotype
• Multiple-hypotheses testing corrections
– DEG
– Fold change
– Statistical model: t-test, correlation, univariate regression
– P values and other cut-offs
• Unsupervised classification (clustering) and visualization
• Filtering: to remove uninformative, highly noisy or redundant markers for subsequent analyses
• Supervised classification
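The multiple-hypotheses testing correction mentioned on this slide can be illustrated with the Benjamini-Hochberg procedure. The slides do not name a specific correction, so this choice is an assumption, and the p-values below are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values from five univariate tests.
p_values = [0.004, 0.008, 0.02, 0.20, 0.65]
q_values = benjamini_hochberg(p_values)

# Call a marker significant at a 5% false discovery rate.
significant = [q <= 0.05 for q in q_values]
print(q_values, significant)
```

With thousands of genes tested at once, a raw cut-off of p < 0.05 would pass many false positives by chance alone, which is why an adjusted cut-off is used.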
![Page 53: Unit 3; Session 1 Principles of Biomarker Discovery and ... 2… · Big Data Training for Translational Omics Research Principles of Biomarker Discovery and Development In Translational](https://reader036.fdocuments.us/reader036/viewer/2022081405/5f0b06df7e708231d42e7d57/html5/thumbnails/53.jpg)
Data Integration
• Further reduction
• Which markers to choose for predictive model construction
• To estimate the potential relevance of the identified markers and relationships
• To discover other significant genes and relationships (e.g. gene-gene or gene-disease) not found in previous data-driven analysis steps
• Tools:
– human gene annotation databases (e.g. GO),
– metabolic pathway databases (e.g. KEGG),
– gene-disease association extractors from public databases (e.g. Endeavour),
– other functional catalogues
• Resulting data- and knowledge-driven findings, patterns or predictions provide a selected catalogue of genes, pathways and (gene-gene and gene-disease) relationships relevant to the phenotype classes investigated
IPA
![Page 54: Unit 3; Session 1 Principles of Biomarker Discovery and ... 2… · Big Data Training for Translational Omics Research Principles of Biomarker Discovery and Development In Translational](https://reader036.fdocuments.us/reader036/viewer/2022081405/5f0b06df7e708231d42e7d57/html5/thumbnails/54.jpg)
Don’t Forget Covariates!
• Don’t forget these:
– Demographic
• age, gender, race (often a PCA component), smoking, drinking, lifestyle, etc.
– Physiological
• BMI, weight, height, etc.
– Clinical
• blood tests, urine tests, other analytes
• Integrate information
– Molecular data
– Knowledge-driven data
– Covariates
• Multivariate regression
– Model training
– Model validation
– Model assessment
• ROC
![Page 55: Unit 3; Session 1 Principles of Biomarker Discovery and ... 2… · Big Data Training for Translational Omics Research Principles of Biomarker Discovery and Development In Translational](https://reader036.fdocuments.us/reader036/viewer/2022081405/5f0b06df7e708231d42e7d57/html5/thumbnails/55.jpg)
Data Integration is Critical
• Provide more reliable information
• Increase the predictive value
• Insight into the mechanism
• Reliable hypothesis generation
• But can be biased as well
[Figure: DNA → RNA → Protein → Metabolites (Transcription, Translation, Catalysis); Genome → Transcriptome → Proteome → Metabolome/Lipidome → Clinical endpoint; labels: Genetic effect, Environmental effect, dysregulation]
![Page 56: Unit 3; Session 1 Principles of Biomarker Discovery and ... 2… · Big Data Training for Translational Omics Research Principles of Biomarker Discovery and Development In Translational](https://reader036.fdocuments.us/reader036/viewer/2022081405/5f0b06df7e708231d42e7d57/html5/thumbnails/56.jpg)
Examples of Cardiovascular Biomarkers with Integrated Data
Vasan, 2006; Gerszten and Wang, 2008
![Page 57: Unit 3; Session 1 Principles of Biomarker Discovery and ... 2… · Big Data Training for Translational Omics Research Principles of Biomarker Discovery and Development In Translational](https://reader036.fdocuments.us/reader036/viewer/2022081405/5f0b06df7e708231d42e7d57/html5/thumbnails/57.jpg)
Building Predictive Models
If …Then…
Build up a model based on selected markers
Discovery set
validation set
Pro-retrospective set
Prospective set
$Y = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 + \dots + \hat{\beta}_i X_i$
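To make the linear form on this slide concrete, the sketch below estimates the coefficients for two selected markers by plain gradient descent on squared error. Everything here is illustrative: the function name `fit_linear_model`, the learning rate, and the noise-free toy data are assumptions. In practice a statistics package would fit the model and report standard errors.

```python
def fit_linear_model(X, y, learning_rate=0.05, epochs=5000):
    """Estimate [beta_0, beta_1, ..., beta_p] by batch gradient descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)  # beta[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for row, target in zip(X, y):
            error = beta[0] + sum(b * x for b, x in zip(beta[1:], row)) - target
            grad[0] += error
            for j, x in enumerate(row, start=1):
                grad[j] += error * x
        beta = [b - learning_rate * g / n for b, g in zip(beta, grad)]
    return beta

def predict(beta, row):
    """Apply the fitted model: beta_0 + beta_1 * x_1 + ... + beta_p * x_p."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))

# Hypothetical discovery set: two markers per subject, and an outcome that
# was generated as 1 + 2 * marker_1 - 1 * marker_2 (no noise).
X_discovery = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
y_discovery = [1 + 2 * a - b for a, b in X_discovery]

beta_hat = fit_linear_model(X_discovery, y_discovery)

# Apply the fitted model to a subject that was not used for fitting.
prediction = predict(beta_hat, [3, 2])
print(beta_hat, prediction)
```

The coefficients are estimated on the discovery set only. The validation and prospective sets are then used to check how well the fixed model predicts.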
![Page 58: Unit 3; Session 1 Principles of Biomarker Discovery and ... 2… · Big Data Training for Translational Omics Research Principles of Biomarker Discovery and Development In Translational](https://reader036.fdocuments.us/reader036/viewer/2022081405/5f0b06df7e708231d42e7d57/html5/thumbnails/58.jpg)
Predictive Models
• Multivariable models
– Linear regression
• Continuous data
– Logistic regression
• Presence/absence of disease
– Cox regression
• Survival data
• Algorithmic models (machine learning)
– Support vector machines (SVM)
– Artificial neural networks (ANN)
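For reference, the three multivariable models listed above share the same linear predictor and differ in how it is linked to the outcome. The standard textbook forms are given below. The symbols $p$ (probability of disease), $h(t)$ (hazard at time $t$), $h_0(t)$ (baseline hazard), and $\varepsilon$ (error term) are notation introduced here, not taken from the slides.

```latex
\begin{align*}
\text{Linear regression (continuous outcome):} \quad
  & Y = \beta_0 + \sum_{j=1}^{k} \beta_j X_j + \varepsilon \\
\text{Logistic regression (presence/absence of disease):} \quad
  & \log\frac{p}{1-p} = \beta_0 + \sum_{j=1}^{k} \beta_j X_j \\
\text{Cox regression (survival data):} \quad
  & h(t \mid X) = h_0(t)\,\exp\!\Big(\sum_{j=1}^{k} \beta_j X_j\Big)
\end{align*}
```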
![Page 59: Unit 3; Session 1 Principles of Biomarker Discovery and ... 2… · Big Data Training for Translational Omics Research Principles of Biomarker Discovery and Development In Translational](https://reader036.fdocuments.us/reader036/viewer/2022081405/5f0b06df7e708231d42e7d57/html5/thumbnails/59.jpg)
Validation Strategies
• Internal validation
– Cross-validation
– Random or non-random split of samples into training and test sets
• External validation
– Independent sample and dataset
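The cross-validation form of internal validation can be sketched as an index split. This is a minimal illustration: the helper name `k_fold_indices`, the fold count, and the fixed seed are assumptions chosen for the example.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random assignment of samples to folds
    folds = [indices[i::k] for i in range(k)]
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test_fold)

# Hypothetical study with 10 samples and 5 folds.
splits = list(k_fold_indices(n_samples=10, k=5))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Each sample is held out exactly once, so every prediction used for assessment comes from a model that never saw that sample. Any marker selection step should be repeated inside each training fold so the held-out samples do not influence it. External validation still requires an independent dataset.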
![Page 60: Unit 3; Session 1 Principles of Biomarker Discovery and ... 2… · Big Data Training for Translational Omics Research Principles of Biomarker Discovery and Development In Translational](https://reader036.fdocuments.us/reader036/viewer/2022081405/5f0b06df7e708231d42e7d57/html5/thumbnails/60.jpg)
Assessment of Performance
• Basic parameters
– Sensitivity: the proportion of the true positive outcomes (e.g. truly diseased subjects) that are predicted to be positive
– Specificity: the proportion of the true negative outcomes (e.g. truly disease-free subjects) that are predicted to be negative
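The two definitions above translate directly into counts of true and false predictions. The sketch below uses hypothetical labels, with 1 for diseased and 0 for disease-free.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 1/0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # diseased, predicted positive
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # diseased, predicted negative
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # disease-free, predicted negative
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # disease-free, predicted positive
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical test set: 4 diseased and 6 disease-free subjects.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

sensitivity, specificity = sensitivity_specificity(y_true, y_pred)
print(sensitivity, specificity)
```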
![Page 61: Unit 3; Session 1 Principles of Biomarker Discovery and ... 2… · Big Data Training for Translational Omics Research Principles of Biomarker Discovery and Development In Translational](https://reader036.fdocuments.us/reader036/viewer/2022081405/5f0b06df7e708231d42e7d57/html5/thumbnails/61.jpg)
Assessment of Performance
• Receiver Operating Characteristic (ROC) curve
• Area under the curve (AUC)
– AUC=0.5: no association
– AUC=1: perfect association
– AUC<0.6: no medical value
– AUC>0.75: reasonable
• The AUC of a ROC curve is also written “AUROC”
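The AUC can be computed without drawing the ROC curve, because it equals the probability that a randomly chosen diseased subject receives a higher score than a randomly chosen disease-free subject (ties count as one half). The sketch below uses that pairwise definition with hypothetical labels and scores. It is quadratic in sample size, so it suits small illustrations rather than large cohorts.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0   # correctly ordered pair
            elif pos == neg:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))

# Hypothetical model scores for 3 diseased (1) and 3 disease-free (0) subjects.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]

auc = roc_auc(y_true, scores)
print(auc)
```

A model whose scores are unrelated to disease status orders about half of the pairs correctly, which is why AUC=0.5 corresponds to no association.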