Emergent Biology Through Integration and Mining of Microarray Datasets
Lance D. Miller
GIS Microarray & Expression Genomics
Mining of expression data to understand the molecular composition of human
cancers and to define components of the tumor molecular profile
with mechanistic and clinical importance.
FOCUS:
70-gene prognosis classifier for predicting risk of distant metastasis within 5 years
van’t Veer et al.
Though each tumor is molecularly unique, there exist common transcriptional cassettes that underlie biological and clinical properties of tumors and that may be of diagnostic, prognostic and therapeutic significance.
GOAL:
Mining of expression data to understand the molecular composition of human
cancers and to define components of the tumor molecular profile
with mechanistic and clinical importance.
Meta-Analysis of Breast Cancer Datasets:

dataset                     source            sample size                   array format
1. Miller-Liu               unpublished       61 tumors: 39 ER+, 22 ER-     19K spotted oligo
2. Sotiriou-Liu             submitted: PNAS   99 tumors: 34 ER+, 65 ER-     7.6K spotted cDNA
3. Gruvberger-Meltzer       Cancer Research   47 tumors: 23 ER+, 24 ER-     6.7K spotted cDNA
4. Sorlie-Borresen-Dale     PNAS              74 tumors: 56 ER+, 18 ER-     8.1K spotted cDNA
5. van’t Veer-Friend        Nature            98 tumors: 59 ER+, 39 ER-     25K spotted oligo
6. West-Nevins              PNAS              49 tumors: 25 ER+, 24 ER-     7.1K Affymetrix

total: 428 tumors, ~73,500 probes
(Adaikalavan Ramasamy et al.)
META MADB: The Construct
Building the Matrix
1. Extract and Format the Data
2. Link sample/probe info via unique keys
3. Log Transform and Normalize
4. Filter Genes and Arrays
5. Apply Statistical Tests

Creating a Universe
1. Apply UniGene ID as Unifying Key
2. Remove Gene Redundancy
3. Extract p values, d values, z-scores
4. Set p value threshold
5. Merge Datasets
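
The steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline actually used for META MADB. The function names (log2_median_center, collapse_to_unigene, merge_datasets), the record layout, the use of median centering as the normalization, and the rule of keeping the lowest-p probe per UniGene ID are all hypothetical, since the slides do not specify them. The UniGene IDs and numbers are toy values.

```python
import math
from statistics import median


def log2_median_center(intensities):
    """Log2-transform one array's positive intensities, then subtract the array median.

    Median centering is an assumed normalization; the slides do not name one.
    """
    logged = [math.log2(x) for x in intensities]
    center = median(logged)
    return [v - center for v in logged]


def collapse_to_unigene(probe_stats, p_threshold=0.001):
    """Key probes by UniGene ID, remove redundancy, and apply a p-value threshold.

    probe_stats: list of dicts with keys 'unigene', 'p' and 'd'.
    If several probes share a UniGene ID, the probe with the smallest p is kept
    (an assumed rule). Probes without a UniGene ID are dropped.
    """
    best = {}
    for rec in probe_stats:
        ug = rec.get("unigene")
        if not ug:
            continue
        if ug not in best or rec["p"] < best[ug]["p"]:
            best[ug] = rec
    return {ug: rec for ug, rec in best.items() if rec["p"] < p_threshold}


def merge_datasets(datasets):
    """Merge per-dataset results into one table keyed by UniGene ID.

    datasets: dict of dataset name -> output of collapse_to_unigene.
    Returns a dict of UniGene ID -> {dataset name: d value}.
    """
    merged = {}
    for name, genes in datasets.items():
        for ug, rec in genes.items():
            merged.setdefault(ug, {})[name] = rec["d"]
    return merged


# Toy input: two probes for Hs.1 in study A, one of which passes the threshold.
study_a = [
    {"unigene": "Hs.1", "p": 0.0002, "d": 1.8},
    {"unigene": "Hs.1", "p": 0.01, "d": 1.1},
    {"unigene": "Hs.2", "p": 0.2, "d": 0.1},
]
study_b = [{"unigene": "Hs.1", "p": 0.0005, "d": 1.5}]

merged = merge_datasets({
    "A": collapse_to_unigene(study_a),
    "B": collapse_to_unigene(study_b),
})
centered = log2_median_center([1, 2, 4])
```

Keying every dataset on one shared identifier before merging is what lets results from different array formats be compared gene by gene.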
d values (difference of average expression)

                 ER+                        ER-
         T1 T2 T3 T4 T5 … Tn        T1 T2 T3 T4 T5 … Tn
gene1 :  e1 e2 e3 e4 e5 … en        e1 e2 e3 e4 e5 … en

d = average e [ER+] / average e [ER-]
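
As a worked example of the d value, the sketch below computes it for a single gene. It assumes the expression values are already log2-transformed, so that the difference of the group averages corresponds to a log2 fold change between the groups. That assumption, the function name d_value, and the toy numbers are illustrative and not taken from the slides.

```python
from statistics import mean


def d_value(expression, er_positive):
    """Difference of average expression between ER+ and ER- tumors for one gene.

    expression: log2 expression values e1..en, one per tumor (assumed scale).
    er_positive: booleans of the same length, True for ER+ tumors.
    """
    if len(expression) != len(er_positive):
        raise ValueError("expression and er_positive must have the same length")
    pos = [e for e, is_pos in zip(expression, er_positive) if is_pos]
    neg = [e for e, is_pos in zip(expression, er_positive) if not is_pos]
    if not pos or not neg:
        raise ValueError("need at least one ER+ and one ER- tumor")
    return mean(pos) - mean(neg)


# Toy gene: two ER+ tumors (average 4.0) and two ER- tumors (average 2.0).
gene1 = [3.0, 5.0, 1.0, 3.0]
status = [True, True, False, False]

d = d_value(gene1, status)  # 4.0 - 2.0
fold = 2 ** d               # back-transformed to a fold change
```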
Identifying Grade-Specific Genes in Hepatocellular Carcinoma
• Sample: 10 cases of each class
• Sample collection: HBV(+)
• Array: Human 19K Oligonucleotide array
• Analysis: 50 arrays
HCC Progression
Pre-neoplastic lesions: adenomatous hyperplasia, ordinary (OAH) and atypical (AAH)
HCC: Grade 1 (G1), Grade 2 (G2), Grade 3 (G3)
gene       phase    function
ORC6L      -        DNA replication
TROAP      M/G1     cell adhesion
BUB1       G2/M     mitotic spindle checkpoint; oncogenesis
CKS2       G2/M     cytokinesis
MELK       G2       tyr/ser/thr kinase activity
CDC20      G2/M     regulation of cell cycle
HN1        G2/M     unknown
MCM6       G1/S     DNA replication initiation
CDC2       G2       mitotic initiation
UBE2C      G2       cyclin catabolism
TOP2A      G2       DNA metabolism
CDKN3      M/G1     regulation of CDK activity
PTTG1      M/G1     mitotic regulation; oncogenesis
E2-EPF     M/G1     ubiquitin cycle
FLJ23462   -        electron transport
GATA3      -        embryogenesis
Breast Cancer Grade-Associated Genes as Predictors of HCC Grade?
HCC
Estrogen Responsive Genes in vitro (Chin-Yo Lin)

UG Description | Fold Change 2 | T47D  MCF-7  ZR75-1  SAGE  ERE (-2)
Interleukin 6 signal transducer (gp130, oncostatin M receptor) | 2.5 | ++ +
Insulin-like growth factor binding protein 4 | 2.1 | + + + +
Seven in absentia homolog 2 (Drosophila) | 1.7 | + +
Matrix metalloproteinase 7 (matrilysin, uterine) | -1.7 | ++ +
Stanniocalcin 2 | 5.0 | ++ + + ++
Nuclear receptor interacting protein 1/RIP140 | 1.6 | + + +
GREB1 protein | 3.1 | +
Serum-inducible kinase | -2.0 | + +
Amphiregulin | 3.9 | ++ +
CD7 antigen (p41) | -2.5 | + +
Duodenal cytochrome | -2.1 | + +
Thrombospondin 1 | 2.4 | + +
Putative transmembrane protein | -3.8 | + + +++
Stromal cell-derived factor 1 | 3.8 | ++ ++
Retinoblastoma binding protein 8 | 2.2 | ++ + + ++
Janus kinase 1 (a protein tyrosine kinase) | 4.9 | ++ ++
Protein kinase H11 | 1.5 |
Olfactomedin 1 | 3.0 | ++
DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 10 (RNA helicase) | 2.3 | + +
Hypothetical protein similar to mouse Dnajl1 | 2.5 | + +++
Putative protein kinase | 1.7 |
[description missing] | 2.5 | +
UDP-Gal:betaGlcNAc beta 1,4-galactosyltransferase, polypeptide 1 | 3.7 | + + ++
Hypothetical protein FLJ14299/Similar to nocA zinc-finger protein | 2.5 | ++
Immunoglobulin superfamily, member 4 | 2.2 | + ++
Cyclin G2 | -2.6 | ++ +
Sialyltransferase 1 (beta-galactoside alpha-2,6-sialyltransferase) | -2.0 | +
Chitobiase, di-N-acetyl- | -1.9 | ++
Arachidonate 12-lipoxygenase, 12R type | -4.0 | ++ +
Purinergic receptor (family A group 5) | -2.3 | +
G protein-coupled receptor kinase 7/Binds ERbeta | -1.8 | + +
Estrogen-Responsive in vitro and ER Status-Associated in vivo (p<0.001)
[Figure: panels E2, E2 + ICI, E2 + CHX; columns 1 to 6]
Identifying Cancer-Linked Genes in Epithelial Adenocarcinomas
Datasets: 3 gastric, 3 prostate, 2 liver, 1 lung
Selection at p<0.001
242 Genes that Distinguish Tumor from Normal at p<0.001 in at least 3 of the 4 Tumor Types
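
The selection rule above (significant at p<0.001 in at least 3 of the 4 tumor types) can be sketched as a simple count. This is an illustration only. It assumes one tumor-versus-normal p value per gene per tumor type, and the slides do not say how the several datasets within a tumor type were combined. The function name genes_recurrent_across_types and the gene names and p values are made up.

```python
def genes_recurrent_across_types(p_by_type, p_threshold=0.001, min_types=3):
    """Return genes with p < p_threshold in at least min_types tumor types.

    p_by_type: dict of tumor type -> dict of gene -> p value (tumor vs normal).
    """
    counts = {}
    for gene_p in p_by_type.values():
        for gene, p in gene_p.items():
            if p < p_threshold:
                counts[gene] = counts.get(gene, 0) + 1
    return sorted(gene for gene, n in counts.items() if n >= min_types)


# Toy p values for three hypothetical genes across the four tumor types.
toy = {
    "gastric": {"geneA": 0.0001, "geneB": 0.0004, "geneC": 0.5},
    "prostate": {"geneA": 0.0002, "geneB": 0.02},
    "liver": {"geneA": 0.0003, "geneB": 0.0001},
    "lung": {"geneA": 0.2, "geneB": 0.0005, "geneC": 0.0001},
}

selected = genes_recurrent_across_types(toy)
```

Here geneA and geneB each pass in three tumor types and are kept, while geneC passes only in lung and is dropped.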
database components:
internal and external datasets derived from:
- tumor studies (clinical samples)
- in vitro pathway studies (e.g., time course)
- SAGE data
- mouse studies (in vitro/in vivo)
An Integrated Database for Pan-Cancer Meta-Analysis of Gene Expression Data
Summary
Derive expression signatures for all major factors known or suspected to have prognostic value
Determine the reliability of expression signatures in outcome prediction
Expand integrated database for pan-cancer meta-analysis
Integrate expression profiling into clinical decision making
Future Directions