
Big Data Training for Translational Omics Research

Principles of Biomarker Discovery and Development in Translational Medicine

Liu, 6/7/2017

Class 1

10:15am

Unit 3; Session 1

Breakdown

Learning objectives

Biomarker and Precision Medicine

Biomarker in preclinical and clinical studies

Principles of Biomarker Discovery: Overview

Principles of Biomarker discovery: data collection

Principles of Biomarker discovery: data analysis

Principles of Biomarker Discovery: validation

Philosophy of Translational Research

• As a biomedical researcher, how can I make something that benefits patients?

• I am working on cell lines and mice; how can the omics approach help me understand the mechanism, especially causality?

• Can the key molecule(s) I identified in cells and animals be used in humans?

Lab researchers, grant writers, physicians…

Key Words

• Biomarker: A characteristic that is objectively measured and evaluated as an indicator of normal biologic processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention.

NIH Biomarkers Definition Working Group

• Translational: Translational research aims to aid in the transformation of biological knowledge into solutions that can be applied in a clinical setting.

Atkinson, et al., Clin Pharm Ther, 2001.

Azuaje F. Bioinformatics and Biomarker Discovery, 2010

Why Biomarker?

A Core Question in Modern Medicine

How to Address Patient Heterogeneity?

Patient Heterogeneity

Biomarker, Personalized Medicine

Patient population | Drug | Response rate (RR)
CML patients | Gleevec | 90%
All breast cancer patients | Herceptin | 10–15%
HER2+ breast cancer patients | Herceptin | 35–45%
All NSCLC patients | Iressa | 10–15%
EGFR MT+ NSCLC patients | Iressa | 60–70%

Slamon et al. NEJM 2001; Kantarjian et al. NEJM 2002; Vogel et al. JCO 2002. 20:3; Douillard et al. JCO 2010.

Biomarkers are especially important in diseases with low response rates in

the overall population

Cancer

Other common diseases

[Figure: three stages (Discovery, Drug development, Implementation). Labels: Gene A, Gene B, Gene C, Gene D, Gene E; precision molecules: Gefitinib, ARS-853?, Crizotinib, Herceptin, Vemurafenib; Biomarker, Personalized Medicine.]

Precision Medicine

To deliver the right treatment to the right patient with the right dose and at the right time

Clinical Application of Biomarker

• Deal with the patient heterogeneity

– Early risk assessment

– Disease prevention

– Assist diagnosis

– Optimize treatment: high effectiveness, low risk

– Match the patient to therapeutic strategy

– Monitor therapy success/disease recurrence

– Long-term management

Diagnosis

Treatment

Monitoring

Biomarker in Preclinical Studies

• To characterize the phenotype

• To monitor the response

• To identify potential translational biomarkers for humans

Omics Approach in Basic Research

• Explore molecular mechanism

• Hypothesis generating

• Identify therapeutic targets and strategies

• Establish intermediate phenotypes

Type of Biomarkers

• Prognostic marker (a): before treatment

• Predictive marker (b): before treatment

• Pharmacodynamic marker (c): after treatment

• Surrogate marker (d): during treatment

Gosho, et al. Sensors 2012, 12, 8966-8986

Prognostic Marker

• Signature separates a population with respect to the outcome (risk)

• Regardless of the types of therapies or treatments

– Markers associated with overall survival regardless of treatment

• Distinguish outcome (poor or good) following the test and standard treatments

• Cannot guide the choice of a particular treatment

• Can determine the aggressiveness of treatment

Ballman KL, JCO. 2015.63.3651

Predictive Biomarker

• Predicts the differential outcome of a particular therapy or treatment

• Prospectively identifies patients who are likely to have a favorable clinical outcome from a specific treatment; therefore, a predictive biomarker can guide the choice of treatment

Prognostic and Predictive Markers

• Some biomarkers are both prognostic (of disease susceptibility or progression) and predictive (of certain treatment outcomes)

• ER status and breast cancer: prognostic

• ER status and antiestrogen therapy: predictive

Pharmacodynamic Markers

• PD biomarkers provide information about the pharmacologic effects of a drug on its target

• Measured after treatment

• A clinical endpoint to be measured

• Application:

– Proof of mechanism: i.e., Does the drug hit its intended target?

– Proof of concept: i.e., Does hitting the drug target alter the biology of the tumor?

– Selection of optimal biologic dosing

– Understanding response/resistance mechanisms

• Examples:

– Protein phosphorylation markers, e.g. p-EGFR, p-ERK, to evaluate changes in target protein phosphorylation or the activation status of downstream signaling/adapter molecules.

– Apoptosis (TUNEL assay) to assess pharmacologic effect on cell death

Surrogate Biomarker

• Substitute for a clinical endpoint

• Expected to predict clinical benefit (lack of benefit or harm) based on epidemiologic, therapeutic, pathophysiologic, or other scientific evidence

• During or after treatment

• Examples:

• Glucose level monitoring the treatment for diabetes

• Imaging-based measurement for anti-cancer therapy

Questions

What kind of biomarker is HOXB13:IL17BR in the first case paper?

What kind of biomarker is the blood concentration of R-/S-methadone in the second case paper?

Examples of FDA Approved Biomarkers

Biomarker Discovery and Development in the Omics Era

1970s 1980s 1990s

Biomarker Discovery and Development in the Omics Era

Genomics

Transcriptomics

miRNomics

lncRNomics

Epigenomics

Proteomics

Metabolomics

Lipidomics

Exposomics

Prognostic-diagnostic Markers

• Genes for ~50% of rare diseases identified

Nature Reviews Genetics 14, 681–691 (2013)

Prognostic-Diagnostic Markers

• 11,907 SNPs strongly associated with common diseases

Pharmacogenomic Markers

• 166 FDA approved PGx markers for drug treatment

Transcriptomic Biomarkers

• MammaPrint test

– Agendia

– 70-gene signature for breast cancer prognosis

• Oncotype Dx test

– Genomic Health

– 21 gene-expression biomarkers for predicting recurrence in breast cancer patients, and predicting response to both chemotherapy and radiation therapy

• H/I test

– AviaraDx

– 2-gene signature used to estimate the risk of recurrence and response to therapy in breast cancer patients.

Biomarker Development Pipeline

[Figure: pipeline stages: Discovery, Confirmation, Assay development, Validation/Refinement, Clinical Validation, Clinical Adoption. Discovery inputs: Genomics, Transcriptomics, Proteomics, Metabolomics, Lipidomics, Epigenomics, Exposomics, Imaging. Stage annotations: Target selection; Integrated technologies and platforms; Multi-analyte assays; Robust validated assays; Clinical grade assays (accurate, specific, reproducible, reliable); Instruments. Axes: Number of analytes, Number of samples. Other labels: Technical development; identification; Preclinical; Retrospective; Clinical trials; Marketing; clinical use. Source: https://is.muni.cz]

Institute of Medicine Roadmap for omics-based tumor biomarker test development

Hayes BMC Medicine 2013, 11:221

Data Acquisition Strategies

• Retrospective:

– Clinical samples collected before the design of the biomarker study, and before comparison with control samples.

– Looks back at past, recorded data to find evidence of marker-disease relationships

– Inexpensive, rapid

– Potentially biased, noisy

– Weak evidence

• Prospective:

– The biomarker-based prediction or classification model is applied on patients at the time of patient enrolment

– Clinical outcomes or disease occurrence are unknown at the time of enrolment

– Less biased

– Strong evidence

– Expensive, time-consuming

• Pro-retrospective

FDA approval!!

Study Design Consideration

• Biomarker discovery studies require careful planning and design

• Study style: retrospective, prospective, pro-retrospective

• Sample collection

• Phenotype

• Sample size and power estimation

• Other covariates

• Data collection

• Platform

• Replication, validation and application

• Data analysis plan

Sample Collection, Assay Design, Data Analysis Plan

• Establish methods: specimen collection, processing, storage

• Establish criteria: quantity and quality, minimum amount

• Feasibility: obtaining specimens

• Assay design: communication with core/service provider

• Data analysis: communication with biostatistician and bioinformatician

Sample and Materials

• Biospecimen: tissue, blood, oral swab, hair, tear, urine, feces, saliva, …

• Test materials: DNA, RNA, protein, small molecules, lipids

• Principles: non-invasive, reproducible, reliable, specific, accurate, inexpensive, point-of-care

Ethical, Legal, and Regulatory Issues

• Establish communication with regulatory agencies, e.g. IRB, FDA

• Regulatory approvals

• Documents:

– Informed consent

– Study protocol

• Intellectual property issues

• CLIA-lab based test for clinical trials involving patient selection

Sample Size and Power Estimation

• Power setting: 0.8

• Statistical significance:

– Discovery: multiple hypotheses (p corrected according to # of tests)

– Validation: usually one hypothesis (p<0.05)

• Input parameters: previous publication or pilot study

• Online tools:

– piface.jar by Lenth (2006): http://homepage.stat.uiowa.edu/~rlenth/Power/

– Microarray power/sample size estimation: http://sph.umd.edu/department/epib/sample-size-and-power-calculations-microarray-studies

• RNA-seq data:

– Scotty: http://bioinformatics.bc.edu/marthlab/scotty/scotty.php

– RnaSeqSampleSize: https://cqs.mc.vanderbilt.edu/shiny/RnaSeqSampleSize/
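As a rough illustration of how the power and significance settings above drive sample size, the sketch below uses the standard normal approximation for a two-sided comparison of two group means. The function name, the effect size of 0.5, and the 1000-test discovery scenario are assumptions chosen for illustration; the tools listed above should be preferred for real study planning.

```python
import math
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    where d is the standardized effect size (difference in means / SD).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Validation: one hypothesis tested at p < 0.05 (assumed effect size 0.5).
n_validation = samples_per_group(0.5, alpha=0.05)

# Discovery: 1000 hypotheses, Bonferroni-corrected cutoff 0.05 / 1000 (assumed).
n_discovery = samples_per_group(0.5, alpha=0.05 / 1000)

print(n_validation, n_discovery)
```

The stricter cutoff needed in the discovery phase requires noticeably more samples per group than a single-hypothesis validation at the same power.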

Key Principles: Big Data in Biomarker

[Diagram: Phenotype (“Digits”) × Molecular Profiles (“Digits”), connected through Statistics, Bioinformatics, Network]

Always Start Your Design and Analysis From Data Evaluation!

• What kind of phenotypic and marker data do I have/should I use/collect?

• Are my data normally distributed?

• What kind of models should I choose?

• What factors may possibly confound my analyses?

• How may covariate data be correlated with my phenotype?

Phenotype to Digits

• Nominal data: no order

– Yes or no (binary): disease vs normal, response vs no response

– Cancer type: breast, lung, colon…

• Ordinal data: some order

– Pathologic: tumor stage: I, II, III

– Disease progression: no, mild, severe, death

• Continuous data:

– Glucose level, LDL, drug concentration, gene expression

• Survival data: time to event

– Death, occurrence of disease, onset of toxicity, in hr, day, wk, month, yr, etc.
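A minimal sketch of turning the four phenotype data types above into "digits". The field names, the example patient, and the choice of one-hot encoding for nominal categories are assumptions made for illustration only.

```python
# Ordinal data: the order is meaningful, so map stages to increasing integers.
STAGE_ORDER = {"I": 1, "II": 2, "III": 3}

# Nominal data with more than two categories: no order, so use one indicator per category.
CANCER_TYPES = ["breast", "lung", "colon"]


def encode_patient(p):
    """Encode one hypothetical patient record as numbers."""
    return {
        # Binary nominal: disease = 1, normal = 0
        "disease": 1 if p["status"] == "disease" else 0,
        # Multi-category nominal: one-hot vector
        "cancer_type": [1 if p["cancer"] == c else 0 for c in CANCER_TYPES],
        # Ordinal: integer rank
        "stage": STAGE_ORDER[p["stage"]],
        # Continuous: kept as a float
        "glucose": float(p["glucose"]),
        # Survival: (time to event or last follow-up, event indicator)
        "survival": (p["months"], 1 if p["died"] else 0),
    }


patient = {"status": "disease", "cancer": "lung", "stage": "II",
           "glucose": 5.6, "months": 14, "died": False}
encoded = encode_patient(patient)
print(encoded)
```

Keeping the event indicator next to the follow-up time matters: a patient who is alive at 14 months is censored, not an event at 14 months.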

Molecular Data Collection

[Figure: omics platforms (Genomics, Transcriptomics, miRNomics, lncRNomics, Epigenomics, Proteomics, Metabolomics, Lipidomics) produce raw data that are converted to “Digits”: ordinal data (0, 1, 2) or continuous variables (-1.2, …)]

Basic Statistical Methods

[Figure: Phenotype (numerical data: nominal, ordinal, continuous, survival) × Molecular Profiles (numerical data: nominal, ordinal, continuous). Methods shown: Chi-square test, t-test, Correlation, Log rank, Statistic Models. Caption: Descriptive and exploratory association]

Basic Statistical Methods

• Continuous data

– Normally distributed: parametric method

– Non-normal distribution/ordinal data: non-parametric method

• Winsorization

• Log transformation: log2

Parametric | Non-parametric
t-test | Mann-Whitney rank-sum test
Paired t-test | Wilcoxon signed-rank test
ANOVA | Kruskal-Wallis test
Pearson correlation | Spearman correlation
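The two transformations named above can be sketched as follows. This is one simple variant of each, not the only definition: the count-based winsorization (clip the k most extreme values at each end) and the pseudocount of 1 before taking log2 are assumptions for illustration.

```python
import math


def winsorize(values, k=1):
    """Clip the k smallest and k largest values to their nearest retained neighbors."""
    if 2 * k >= len(values):
        raise ValueError("k is too large for this sample")
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    return [min(max(v, low), high) for v in values]


def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount avoids log2(0) for zero counts."""
    return [math.log2(v + pseudocount) for v in values]


raw = [1, 2, 3, 4, 100]          # 100 is an outlier
clipped = winsorize(raw, k=1)    # extremes pulled in to 2 and 4
logged = log2_transform([0, 1, 3, 7])
print(clipped, logged)
```

Both steps reduce the influence of extreme values, which can make parametric methods more reasonable for skewed data such as expression counts.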

Statistical Models

• Univariate models

– Logistic regression: binary/categorical phenotype

– Linear regression: continuous phenotype

– Kaplan-Meier (KM) method: survival phenotype

• Multivariate models

– Multivariate regressions: linear or logistic

– Cox regression: survival phenotype

• Other sophisticated models
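To make the Kaplan-Meier method listed above concrete, here is a minimal sketch of the product-limit estimate S(t) = Π (1 − d_i / n_i), where d_i is the number of events at time t_i and n_i the number still at risk. The function name and the toy follow-up data are assumptions for illustration; real analyses would normally use an established survival package.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with at least one event.

    times:  follow-up time per subject
    events: 1 if the event (e.g. death) occurred, 0 if the subject was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after time t.
        n_at_risk -= leaving
    return curve


# Toy data: five subjects, two of them censored (event = 0).
km_curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
print(km_curve)
```

Censored subjects never lower the curve directly, but they shrink the risk set, so later events produce larger drops.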

Multiple Testing Issue

• Example

– P value cutoff = 0.05

– 1000 genes: 50 genes expected by chance (error) at this significance level

– If 60 genes have p<0.05, many might be due to noise (false positives)

• Common correction method: Bonferroni correction

– Adjusted p value: p × n, e.g. p = 0.0005, n = 1000 genes, adjusted p = 0.0005 × 1000 = 0.5

– Equivalently, corrected p value cutoff = 0.05/n

– Explanation: among all genes selected, the probability of at least one false positive is <= 0.05

• False discovery rate (FDR)

– FDR = 0.1 means that among all genes selected (e.g. 100), we would expect 10 to be false positives

– FDR as high as 0.5 may be acceptable to biologists

– Several different approaches to estimate (Benjamini & Hochberg, B&H, most popular)

• Data filtering in the processing step can also reduce the number of genes

Azuaje F. Bioinformatics and Biomarker Discovery, 2010
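The two corrections discussed on this slide can be sketched in a few lines. The function names and the small p-value list are assumptions for illustration; the Benjamini-Hochberg code follows the usual step-up adjustment (p × n / rank, made monotone from the largest p value downward).

```python
def bonferroni(pvalues, n_tests=None):
    """Bonferroni-adjusted p values: p * n, capped at 1."""
    n = n_tests if n_tests is not None else len(pvalues)
    return [min(1.0, p * n) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values (FDR), returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value to the smallest, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Slide example: p = 0.0005 among 1000 genes gives an adjusted p of 0.5.
slide_example = bonferroni([0.0005], n_tests=1000)[0]

# Expected number of false positives at p < 0.05 among 1000 null genes.
expected_false_positives = 1000 * 0.05

pvals = [0.01, 0.04, 0.03, 0.005]
print(slide_example, expected_false_positives)
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

On the same p values, the FDR adjustment is less severe than Bonferroni, which is why it is often preferred for discovery-stage screens with many markers.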

Basic Biomarker Discovery Pipeline

Data Processing

• Data pre-processing

– Data filtering and QC

• Remove samples with failed experiment

• Exclude markers with very low variance

• Exclude markers with very low expression levels, e.g. RNA-seq

– Data normalization

• To transform the data into a format that is compatible or comparable between different samples or assays

• To level potential differences caused by experimental factors, such as labelling and hybridization
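The filtering step above might look like the sketch below. The gene names, expression values, and both thresholds are hypothetical; suitable cutoffs depend on the platform and on how the data were scaled.

```python
import statistics


def filter_markers(matrix, min_mean=1.0, min_variance=0.1):
    """Keep markers whose mean expression and variance across samples pass the cutoffs.

    matrix: dict mapping marker name -> list of values (one per sample)
    """
    kept = {}
    for marker, values in matrix.items():
        if statistics.mean(values) < min_mean:
            continue  # very low expression: mostly noise
        if statistics.pvariance(values) < min_variance:
            continue  # nearly constant: cannot separate phenotype groups
        kept[marker] = values
    return kept


# Hypothetical log-scale expression values for three genes in four samples.
expression = {
    "GENE_A": [5.0, 9.0, 4.0, 10.0],  # expressed and variable: kept
    "GENE_B": [0.1, 0.0, 0.2, 0.1],   # very low expression: removed
    "GENE_C": [7.0, 7.0, 7.1, 7.0],   # very low variance: removed
}
kept = filter_markers(expression)
print(list(kept))
```

Filtering before testing also shrinks the number of hypotheses, which softens the multiple-testing correction applied afterwards.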

Why Remove Genes with Low Variance?

[Figure: two example panels labeled p=0.004 and p=0.008]

Data Reduction

• Focus on smaller sets of potentially novel and interesting data patterns (e.g. groups of samples or gene sets).

• Confirm initial hypotheses about the relevance of the available features and guide future experimental and computational analysis

• Exploratory univariate analyses

– T-test

– Chi-square test

– Correlation

– Univariate regression

Data Matrix

• Data matrix

• Color-coded representations of absolute or relative expression levels

Samples

Data Visualization

dendrogram

• Statistical plotting: GraphPad

• Dendrogram and heatmap: R, GENE-E, Gitools

Exploratory Analysis

• Univariate analysis

• Single marker vs phenotype

• Multiple-hypothesis testing corrections

– DEG (differentially expressed genes)

– Fold change

– Statistical model: t-test, correlation, univariate regression

– P values and other cut-off

• Unsupervised classification (clustering) and visualization

• Filtering: to remove uninformative, highly noisy or redundant markers for subsequent analyses

• Supervised classification

Data Integration

• Further reduction

• Which marker to be chosen for the predictive model construction

• To estimate the potential relevance of the identified markers and relationships;

• To discover other significant genes and relationships (e.g. gene-gene or gene-disease) not found in previous data-driven analysis steps

• Tools:

– human gene annotation databases (e.g. GO),

– metabolic pathways databases (e.g. KEGG),

– gene-disease association extractors from public databases (e.g. Endeavour),

– Other functional catalogues

• Resulting data- and knowledge-driven findings, patterns or predictions provide a selected catalogue of genes, pathways and (gene-gene and gene-disease) relationships relevant to the phenotype classes investigated

Don’t Forget Covariates!

• Don’t forget these:

– Demographic: age, gender, race (often a PCA component), smoking, drinking, lifestyle, etc.

– Physiological: BMI, weight, height, etc.

– Clinical: blood tests, urine tests, other analytes.

• Integrate information

– Molecular data

– Knowledge-driven data

– Covariates

• Multivariate regression

– Model training

– Model validation

– Model assessment: ROC

Data Integration is Critical

• Provide more reliable information

• Increase the prediction value

• Insight into the mechanism

• Reliable hypothesis generating

• But can be biased as well

[Figure: DNA → RNA → Protein → Metabolites (Transcription, Translation, Catalysis); corresponding layers: Genome, Transcriptome, Proteome, Metabolome/Lipidome, Clinical endpoint. Labels: dysregulation, Genetic effect, Environmental effect.]

Examples of Cardiovascular Biomarkers with Integrated

Vasan, 2006; Gerszten and Wang, 2008

Building Predictive Models

If …Then…

Build up a model based on selected markers

Discovery set

validation set

Pro-retrospective set

Prospective set

$Y = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 + \dots + \hat{\beta}_i X_i$

Predictive Models

• Multivariable models

– Linear regression

• Continuous data

– logistic regression

• Presence/absence of disease

– Cox regression

• Survival data

• Algorithmic models—Machine learning

– Support vector machines (SVM)

– Artificial neural networks (ANN)

Validation Strategies

• Internal validation

– Cross-validation

– Random/non-random split of samples into training and test sets

• External validation

– Independent sample and dataset
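A minimal sketch of the internal validation idea above: k-fold cross-validation splits the samples so that each one is used for testing exactly once. The function name, the fixed seed, and the choice of k = 5 are assumptions for illustration; the model fitting inside each fold is omitted.

```python
import random


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # random split, reproducible via seed
    folds = [indices[i::k] for i in range(k)]  # k roughly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_splits(10, k=5))
for train, test in splits:
    # In a real analysis: fit the model on `train`, evaluate on `test`,
    # then average the performance over the k folds.
    print(sorted(test))
```

Marker selection should be repeated inside each training fold; selecting markers on all samples first leaks information from the test folds and inflates performance.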

Assessment of Performance

• Basic parameters

– Sensitivity: the proportion of the true positive outcomes (e.g. truly diseased subjects) that are predicted to be positive

– Specificity: the proportion of the true negative outcomes (e.g. truly disease-free subjects) that are predicted to be negative
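The two definitions above translate directly into counts of true/false positives and negatives. The function name and the ten example subjects are made up for illustration.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 1 = diseased, 0 = disease-free."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # diseased, predicted positive
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # diseased, predicted negative
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # disease-free, predicted negative
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # disease-free, predicted positive
    return tp / (tp + fn), tn / (tn + fp)


# Four truly diseased and six truly disease-free subjects (hypothetical).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)
```

Here 3 of 4 diseased subjects are detected (sensitivity 0.75) and 4 of 6 disease-free subjects are correctly called negative (specificity about 0.67).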

Assessment of Performance

• Receiver Operating Characteristic (ROC) curve

• Area under the curve (AUC)

– AUC=0.5: no association

– AUC=1: perfect association

– AUC<0.6: No medical value

– AUC>0.75: reasonable

“AUROC”
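A minimal sketch of computing the AUC without plotting the curve. It relies on the fact that the AUC equals the probability that a randomly chosen positive subject receives a higher score than a randomly chosen negative one (ties count one half). The function name and the example scores are assumptions for illustration.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie
    return wins / (len(pos) * len(neg))


auc_partial = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])  # one mis-ranked pair out of four
auc_perfect = roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])  # every positive outranks every negative
auc_uninformative = roc_auc([1, 0], [0.5, 0.5])            # identical scores
print(auc_partial, auc_perfect, auc_uninformative)
```

The pairwise loop is quadratic in the number of samples, which is fine for small cohorts; rank-based formulas give the same value more efficiently on large data sets.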
