Data Integration for Cancer Genomics


Page 1: Data Integration for Cancer  Genomics

Data Integration for Cancer Genomics

Page 2: Data Integration for Cancer  Genomics

Personalized Medicine Tumor Board

Question: given all we know about a patient, what is the “optimal” treatment?

Page 3: Data Integration for Cancer  Genomics
Page 4: Data Integration for Cancer  Genomics
Page 5: Data Integration for Cancer  Genomics

The Cancer Genome Atlas Project (TCGA)

• SNP

• Structural variations

• DNA methylation

• Gene expression

• microRNA expression

Paired samples/unpaired samples

Page 6: Data Integration for Cancer  Genomics

Data Processing Challenges

• Contamination

• Subclones

Page 7: Data Integration for Cancer  Genomics
Page 8: Data Integration for Cancer  Genomics

Biological questions

• Changes in genes between cancer and normals

• Disease heterogeneity, subtypes

• Joint modeling, mechanisms

Page 9: Data Integration for Cancer  Genomics
Page 10: Data Integration for Cancer  Genomics

Integrative approach

Meta-analytical approach

Page 11: Data Integration for Cancer  Genomics

PARADIGM: PAthway Recognition Algorithm using Data Integration on Genomic Models

Page 12: Data Integration for Cancer  Genomics
Page 13: Data Integration for Cancer  Genomics
Page 14: Data Integration for Cancer  Genomics

$X_{p \times n} = W_{p \times (k-1)} Z_{(k-1) \times n} + e_{p \times n}$

$\mathrm{cov}(e) = \mathrm{diag}(\psi_1, \psi_2, \ldots, \psi_p)$
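The latent-variable model above can be illustrated by simulating data from it. This is a minimal sketch, not the method from the slides: the dimensions, the standard normal draws for W and Z, and the range of the noise variances are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed): p features, n samples, k clusters.
p, n, k = 50, 2000, 3

W = rng.normal(size=(p, k - 1))       # loading matrix, p x (k-1)
Z = rng.normal(size=(k - 1, n))       # latent factors, (k-1) x n
psi = rng.uniform(0.5, 1.5, size=p)   # per-feature noise variances psi_1..psi_p

# Independent noise per feature, so cov(e) = diag(psi_1, ..., psi_p).
e = rng.normal(size=(p, n)) * np.sqrt(psi)[:, None]

# Observed data matrix: X = W Z + e.
X = W @ Z + e

# If Z has identity covariance, the model implies cov(X) = W W^T + diag(psi).
implied_cov = W @ W.T + np.diag(psi)
```

The diagonal noise covariance means that any correlation between features has to be explained by the shared low-dimensional factors Z.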

Page 15: Data Integration for Cancer  Genomics
Page 16: Data Integration for Cancer  Genomics
Page 17: Data Integration for Cancer  Genomics
Page 18: Data Integration for Cancer  Genomics

Non-negative matrix factorization

$X_{M \times N} = W_{M \times K} \times H_{K \times N}$

All matrix entries are nonnegative

Minimize
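The objective on this slide did not survive extraction. The sketch below assumes the common squared Frobenius loss, ||X - WH||^2, and uses the standard multiplicative update rules of Lee and Seung; the function name `nmf` and its parameters are hypothetical, and the slides may have used a different loss.

```python
import numpy as np

def nmf(X, K, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative M x N matrix X into W (M x K) and H (K x N).

    Assumes the squared Frobenius loss ||X - W H||^2 (an assumption,
    since the original objective is missing from the transcript).
    """
    rng = np.random.default_rng(seed)
    M, N = X.shape
    W = rng.random((M, K)) + eps
    H = rng.random((K, N)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy example: a nonnegative matrix of rank 3.
rng = np.random.default_rng(1)
X = rng.random((20, 3)) @ rng.random((3, 30))
W, H = nmf(X, K=3)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

Because the updates only multiply by nonnegative ratios, W and H stay nonnegative without any explicit projection step.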

Page 19: Data Integration for Cancer  Genomics

$X_1$: an $M \times N_1$ matrix
$X_2$: an $M \times N_2$ matrix
$X_3$: an $M \times N_3$ matrix

$X_1 = W \times H_1$
$X_2 = W \times H_2$
$X_3 = W \times H_3$
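Sharing one W across the three data sets can be sketched by stacking them column-wise and factoring the stacked matrix, since [X1 X2 X3] = W [H1 H2 H3]. The sketch assumes an equally weighted squared Frobenius loss over the data sets, which the slides do not state; `joint_nmf` is a hypothetical name.

```python
import numpy as np

def joint_nmf(Xs, K, n_iter=200, eps=1e-9, seed=0):
    """Joint NMF with a shared W: each X_i (M x N_i) is modeled as W @ H_i.

    Assumes the data sets are weighted equally under a squared
    Frobenius loss (an assumption made for this illustration).
    """
    rng = np.random.default_rng(seed)
    sizes = [X.shape[1] for X in Xs]
    X = np.hstack(Xs)                      # M x (N1 + N2 + ...)
    M, N = X.shape
    W = rng.random((M, K)) + eps
    H = rng.random((K, N)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    # Split the stacked H back into H_1, H_2, ... by column blocks.
    Hs = np.split(H, np.cumsum(sizes)[:-1], axis=1)
    return W, Hs

# Toy example with three data sets that share the same M = 15 rows.
rng = np.random.default_rng(2)
W_true = rng.random((15, 2))
Xs = [W_true @ rng.random((2, n_i)) for n_i in (10, 12, 8)]
W, Hs = joint_nmf(Xs, K=2)
```

Rows of W are tied to the M shared features, while each H_i carries the coefficients for the samples of one data set.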

Page 20: Data Integration for Cancer  Genomics
Page 21: Data Integration for Cancer  Genomics

TCGA, GWAS, and ENCODE

Page 22: Data Integration for Cancer  Genomics

Cancer Treatment

Page 23: Data Integration for Cancer  Genomics

Examples

http://discover.nci.nih.gov/cellminer/

Gene expression data: HG-U133A chip, mapped to 12980 genes across 59 cell lines (expression data for the cell line “LC:NCI_H23” were unavailable). Genes were restricted to those included in two lists: (1) 766 cancer-related genes (Chen, et al., 2008); (2) 8919 genes from the Integrated Druggable Genome Database (IDGD) Project (Hopkins and Groom, 2002; Russ and Lampel, 2005). After this filtering, 6958 genes were retained.

Drug response data: 101 drugs annotated in the CancerResource database (Ahmed, et al., 2011). Response is reported as –log(GI50).
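The –log(GI50) transformation can be sketched as follows. The base-10 logarithm and the molar concentration unit are assumptions; the slides state neither. The helper name `neg_log_gi50` is hypothetical.

```python
import math

def neg_log_gi50(gi50_molar: float) -> float:
    """Return -log10(GI50), assuming GI50 is a concentration in mol/L.

    Base 10 and molar units are assumptions made for this sketch.
    """
    if gi50_molar <= 0:
        raise ValueError("GI50 must be a positive concentration")
    return -math.log10(gi50_molar)

# A drug with GI50 of 1 micromolar versus one with GI50 of 10 nanomolar.
weak = neg_log_gi50(1e-6)
potent = neg_log_gi50(1e-8)
```

On this scale a larger value means a lower GI50, that is, a cell line that is more sensitive to the drug.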

Pathway association information: Retrieved from the KEGG MEDICUS database (Kanehisa, et al., 2010). 58 pathways that are either known to be related to cancer or have drug targets. Among the 6958 genes retained after the expression-data filtering, 1863 genes are covered by these 58 pathways and constitute the final list of genes in our real data analysis.
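The gene-selection steps described above can be sketched with set operations. The slides do not say whether the two gene lists were combined by union or intersection; the sketch assumes a union. The function name and the toy gene and pathway sets are hypothetical and are not the actual lists.

```python
def select_genes(chip_genes, cancer_genes, druggable_genes, pathways):
    """Filter chip genes by two gene lists, then by pathway membership.

    Assumption: the two lists are combined by union before filtering.
    `pathways` maps a pathway name to the set of genes it contains.
    """
    candidates = set(cancer_genes) | set(druggable_genes)
    retained = set(chip_genes) & candidates          # after list filtering
    pathway_genes = set().union(*pathways.values())  # genes in any pathway
    final = retained & pathway_genes                 # final analysis list
    return retained, final

# Toy inputs for illustration only.
chip = {"TP53", "EGFR", "BRCA1", "ACTB", "GAPDH"}
cancer = {"TP53", "BRCA1", "MYC"}
druggable = {"EGFR", "KRAS"}
pathways = {
    "p53 signaling": {"TP53", "MDM2"},
    "ErbB signaling": {"EGFR", "KRAS"},
}
retained, final = select_genes(chip, cancer, druggable, pathways)
```

In the real analysis the corresponding counts are 12980 genes on the chip, 6958 retained after list filtering, and 1863 covered by the 58 pathways.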

Page 24: Data Integration for Cancer  Genomics

GI50 values

Page 25: Data Integration for Cancer  Genomics

Cancer Types

Cancer type            Number of cell lines
Leukemia               6
Non-Small Cell Lung    8
Colon                  7
CNS                    6
Melanoma               9
Ovarian                7
Renal                  8
Prostate (excluded)    2
Breast                 6

Page 26: Data Integration for Cancer  Genomics

Connectivity Map Data

• CMap Build 02 (http://www.broadinstitute.org/cmap/) provides public download of genome-wide transcriptional profiles of five human cancer cell lines (MCF7: human breast cancer; HL60: human promyelocytic leukemia; ssMCF7: MCF7 grown in a different vehicle; PC3: human epithelial prostate cancer; SKMEL5: human skin melanoma), both before and after treatment with 1309 distinct bioactive small molecules.

• Used the data from the HT_HG-U133A array platform, which consists of 4466 expression response profiles, representing 1084 different compounds.

Page 27: Data Integration for Cancer  Genomics

• Integration within the same cancer type

• Integration across different cancer types

Page 28: Data Integration for Cancer  Genomics

One individual with 188-fold coverage

Page 29: Data Integration for Cancer  Genomics
Page 30: Data Integration for Cancer  Genomics

Ideal Pipeline

• Patient diagnosis and sample collection

• Various types of genomics profiling

• Driver mutations, disease subtypes

• Targeted treatments, monitoring, and additional treatments

Page 31: Data Integration for Cancer  Genomics

Topics of Interest

• Data processing

• Relationships among different data types

• Tumor heterogeneity

• Single cell analysis

• Modeling

• Targeted treatment

• Integration over different tumor types

• TCGA, ENCODE, GWAS, 1000 Genomes, and others