Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.
Date posted: 20-Dec-2015
![Page 1: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/1.jpg)
Microarrays and Cancer
Segal et al.
CS 466
Saurabh Sinha
![Page 2: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/2.jpg)
Genomics and pathology
• Genomics provides high-throughput measurements of molecular mechanisms
– Microarrays, ChIP-on-chip, etc.
• Genomics may provide the molecular underpinnings of pathology, in a highly comprehensive manner
– May revolutionize the diagnosis and management of diseases, including cancer
![Page 3: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/3.jpg)
Prior applications to cancer
• Gene expression measurements have been applied to cancer diagnosis
• Measure each gene’s expression in several normal tissue samples, and several pathological (diseased) samples
• Find subset of genes differentially expressed in the two sample groups
• If such “gene signatures” of particular cancer types are found, they can become the basis of tests for malignancy
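As a rough illustration of the gene-by-gene approach described on this slide, the sketch below scores each gene with Welch's t-statistic between normal and diseased samples and keeps the genes that pass a cutoff. The slides do not say which statistic or threshold such studies used, so the choice of Welch's t, the function names `welch_t` and `gene_signature`, the cutoff `t_cutoff`, and the toy data are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(normal, diseased):
    """Welch's t-statistic for one gene: diseased samples vs. normal samples."""
    se = sqrt(variance(normal) / len(normal) + variance(diseased) / len(diseased))
    return (mean(diseased) - mean(normal)) / se

def gene_signature(expr_normal, expr_diseased, t_cutoff=3.0):
    """Genes whose |t| reaches t_cutoff (a hypothetical threshold)."""
    return sorted(
        gene for gene in expr_normal
        if abs(welch_t(expr_normal[gene], expr_diseased[gene])) >= t_cutoff
    )

# Toy data (made up): gene -> expression values, one per tissue sample.
expr_normal = {"geneA": [1.0, 1.1, 0.9, 1.0], "geneB": [2.0, 2.2, 1.9, 2.1]}
expr_diseased = {"geneA": [3.0, 3.2, 2.9, 3.1], "geneB": [2.1, 1.9, 2.0, 2.2]}
signature = gene_signature(expr_normal, expr_diseased)
print(signature)
```

A hard cutoff like `t_cutoff` is exactly what can miss genes whose expression changes modestly, and a gene with zero variance in both groups would make `welch_t` divide by zero, so real data would need more care than this sketch takes.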
![Page 4: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/4.jpg)
We want better …
• Genes may be differentially expressed, but not enough to cross certain thresholds used in the analysis
• Analyzing the data on a gene-by-gene basis is error-prone, because microarray data has inherent noise
• Finding the genes involved in one type of cancer is only the first step; it does not reveal the underlying processes
![Page 5: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/5.jpg)
Part 1: Cancer modules
![Page 6: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/6.jpg)
A “module” level view
• Many methods use “gene modules” (sets of genes) as basic blocks for analysis
• Instead of trying to find changes in individual gene expression profiles, look out for entire sets of genes with changing expression profiles
![Page 7: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/7.jpg)
The study of Mootha et al.
• Showed that expression of “oxidative phosphorylation” genes (a particular set of genes) is reduced in diabetic muscle
• Signal not very strong when looking at individual genes, but highly significant when looking at the “gene module”
![Page 8: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/8.jpg)
[Figure: gene expression plot with axes “Disease tissue (Diabetes mellitus type 2)” and “Normal tissue (Normal tolerance to glucose)”. Grey: all genes. Red: oxidative phosphorylation genes.]
Source: Nature Genetics 37, S38 - S45 (2005)
![Page 9: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/9.jpg)
Segal et al.: Methodology
• Compile a large collection of cancer-related microarrays
– microarrays measuring gene expression in cancer tissues or normal tissue
• Compile a large collection of gene sets (modules) from earlier studies
• Identify gene sets (modules) induced or repressed in a microarray
• Identify modules induced in several arrays, or repressed in several arrays
• Check if these arrays are enriched in some clinical annotation
![Page 10: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/10.jpg)
Identify gene sets (modules) induced or repressed in a microarray
• Given the expression value E_{g,m} of each gene g in microarray experiment m
• Compute the average expression E_g of gene g over all microarrays
• If E_{g,m} is at least 2-fold greater than E_g, call gene g induced in array m
• Categorize each gene as induced or not induced in the array
Source: Nature Genetics 36, 1090 - 1098 (2004)
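The per-gene call on this slide can be sketched in a few lines. The function name `induced_genes`, the nested-dictionary layout, and the toy values are assumptions for illustration; the sketch also assumes expression values on a positive linear scale, so that "2-fold greater" means a ratio of at least 2 to the gene's average.

```python
def induced_genes(expr, array, fold=2.0):
    """Genes whose expression in `array` is at least `fold` times their
    average over all arrays.

    expr: dict mapping gene -> dict mapping array name -> expression value
          (assumed positive, linear scale).
    """
    induced = set()
    for gene, by_array in expr.items():
        average = sum(by_array.values()) / len(by_array)
        if by_array[array] >= fold * average:
            induced.add(gene)
    return induced

# Toy data (made up): three genes measured on four arrays.
expr = {
    "g1": {"a1": 8.0, "a2": 1.0, "a3": 1.0, "a4": 2.0},  # average 3.0
    "g2": {"a1": 2.0, "a2": 2.0, "a3": 2.0, "a4": 2.0},  # average 2.0
    "g3": {"a1": 1.0, "a2": 5.0, "a3": 1.0, "a4": 1.0},  # average 2.0
}
print(induced_genes(expr, "a1"))
print(induced_genes(expr, "a2"))
```

If the data were stored as log ratios instead, the same rule would become a difference test (for log2 values, a gap of at least 1) rather than a ratio test.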
![Page 11: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/11.jpg)
Identify gene sets (modules) induced or repressed in a microarray
• |All genes| = N
• |Module| = n
• |Induced| = m (here m is the number of induced genes, not the array index)
• |Intersection| = k
• Hypergeometric test(N,n,m,k):
• If a set of m genes was chosen at random (sampling without replacement), what is the probability that the intersection would be larger than or equal to k?
[Figure: Venn diagram of the sets All genes, Module, and Induced, with the Intersection of Module and Induced marked]
![Page 12: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/12.jpg)
Identify gene sets (modules) induced or repressed in a microarray
• |All genes| = N
• |Module| = n
• |Induced| = m
• |Intersection| = k
• Hypergeometric test(N,n,m,k):
• Sum over i ≥ k: if a set of m genes was chosen at random (sampling without replacement), what is the probability that the intersection would be equal to i?
![Page 13: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/13.jpg)
Identify gene sets (modules) induced or repressed in a microarray
• |All genes| = N
• |Module| = n
• |Induced| = m
• |Intersection| = k
• Hypergeometric test(N,n,m,k):
$$\Pr(\text{Intersection} \ge k) \;=\; \sum_{i \ge k} \frac{\binom{n}{i}\,\binom{N-n}{m-i}}{\binom{N}{m}}$$
“p-value” of the Hypergeometric test
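The tail sum on this slide can be computed directly with exact integer binomial coefficients. This is a minimal sketch: the function name `hypergeom_pvalue` is hypothetical, the arguments follow the slide's N, n, m, k, and the numbers in the example are made up.

```python
from math import comb

def hypergeom_pvalue(N, n, m, k):
    """P(intersection >= k) when m genes are drawn without replacement
    from N genes, of which n belong to the module."""
    upper = min(n, m)  # the intersection cannot exceed either set size
    tail = sum(comb(n, i) * comb(N - n, m - i) for i in range(k, upper + 1))
    return tail / comb(N, m)

# Made-up example: 10 genes in total, a module of 4 genes, 3 induced genes,
# 2 of which fall inside the module.
p = hypergeom_pvalue(N=10, n=4, m=3, k=2)
print(p)
```

For this small example the tail is (C(4,2)·C(6,1) + C(4,3)·C(6,0)) / C(10,3) = 40/120, so `p` is one third. In practice the p-value would be compared with a significance threshold, and that threshold would typically be corrected for the number of modules and arrays tested.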
![Page 14: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/14.jpg)
Identify gene sets (modules) induced or repressed in a microarray
• |All genes| = N
• |Module| = n
• |Induced| = m
• |Intersection| = k
• Hypergeometric test(N,n,m,k)
• If the “p-value” is very small, then we infer that the intersection is “statistically significant”, i.e., the module is induced in the microarray
• A module repressed in a microarray is defined similarly
![Page 15: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/15.jpg)
![Page 16: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/16.jpg)
Source: Nature Genetics 36, 1090 - 1098 (2004)
![Page 17: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/17.jpg)
![Page 18: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/18.jpg)
Identify modules induced in several arrays, or repressed in several arrays
Source: Nature Genetics 36, 1090 - 1098 (2004)
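One simple way to picture this step is to tally, for every module, the number of arrays in which it was called induced or repressed, and keep the modules that recur. The slides do not define "several", so the `min_arrays` cutoff, the function name `recurrent_modules`, and the toy calls below are assumptions for illustration.

```python
from collections import Counter

def recurrent_modules(calls, min_arrays=2):
    """Return (induced, repressed): modules called induced, or repressed,
    in at least `min_arrays` arrays.

    calls: dict mapping array -> dict mapping module -> one of
           "induced", "repressed", "unchanged".
    """
    up, down = Counter(), Counter()
    for per_module in calls.values():
        for module, call in per_module.items():
            if call == "induced":
                up[module] += 1
            elif call == "repressed":
                down[module] += 1
    induced = {mod for mod, count in up.items() if count >= min_arrays}
    repressed = {mod for mod, count in down.items() if count >= min_arrays}
    return induced, repressed

# Toy per-array module calls (made up).
calls = {
    "a1": {"M1": "induced", "M2": "repressed"},
    "a2": {"M1": "induced", "M2": "unchanged"},
    "a3": {"M1": "unchanged", "M2": "repressed"},
}
induced, repressed = recurrent_modules(calls)
print(induced, repressed)
```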
![Page 19: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/19.jpg)
![Page 20: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/20.jpg)
Check if these arrays are enriched in some clinical annotation
Source: Nature Genetics 36, 1090 - 1098 (2004)
![Page 21: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/21.jpg)
Segal et al: Cancer “module maps”
Source: Nature Genetics 37, S38 - S45 (2005)
Red(m,c): Microarrays in which module m was overexpressed (induced) are enriched in condition c
Green: Microarrays in which module m was underexpressed (repressed) are enriched in condition c
Rows and columns are not in an arbitrary order. They have been “clustered” to display similar rows (or columns) together
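A single cell of such a map could be coloured as sketched below. The slides do not name the enrichment test used for clinical annotations, so this sketch assumes a hypergeometric tail test applied to arrays instead of genes. The function names `enrichment_p` and `map_cell`, the threshold `alpha`, and the toy array sets are hypothetical.

```python
from math import comb

def enrichment_p(total, annotated, selected, overlap):
    """Hypergeometric tail: chance that at least `overlap` of `selected`
    arrays carry the annotation, given `annotated` of `total` arrays do."""
    return sum(
        comb(annotated, i) * comb(total - annotated, selected - i)
        for i in range(overlap, min(annotated, selected) + 1)
    ) / comb(total, selected)

def map_cell(all_arrays, arrays_with_c, induced_in, repressed_in, alpha=0.05):
    """Colour of one (module, condition) cell: "red", "green" or "blank"."""
    total, annotated = len(all_arrays), len(arrays_with_c)
    p_red = enrichment_p(total, annotated, len(induced_in),
                         len(induced_in & arrays_with_c))
    p_green = enrichment_p(total, annotated, len(repressed_in),
                           len(repressed_in & arrays_with_c))
    if p_red < alpha and p_red <= p_green:
        return "red"
    if p_green < alpha:
        return "green"
    return "blank"

# Toy setup (made up): 20 arrays, 5 of them annotated with condition c.
all_arrays = {f"a{i}" for i in range(20)}
arrays_with_c = {"a0", "a1", "a2", "a3", "a4"}
cell = map_cell(all_arrays, arrays_with_c,
                induced_in={"a0", "a1", "a2", "a3"},
                repressed_in={"a10", "a11"})
print(cell)
```

Filling the whole map would repeat this for every module and condition, and would call for a multiple-testing correction that the sketch leaves out.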
![Page 22: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/22.jpg)
Insights from cancer module map
• Some modules are activated or repressed across many tumor types. Such modules could be related to general tumorigenic processes
• Some modules are specifically activated or repressed in certain tumor types or stages of tumor progression
![Page 23: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/23.jpg)
From modules to regulation
• A module map shows the transcriptional changes underlying cancer
• Transcriptional changes are a result of transcription factors and their binding sites
• A deeper understanding of cancer would come from finding out which transcription factors and binding sites led to the transcriptional changes
![Page 24: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/24.jpg)
Part 2: Cis-regulatory elements
![Page 25: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/25.jpg)
Genomics and gene regulation
• Such knowledge comes from genomics data
• ChIP-chip studies identify which transcription factors bind which DNA sequences
• Analysis of DNA sequence, using known binding site motifs, gives us putative binding sites
• Cross-species conservation also tells us something about possible locations of binding sites
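To make "putative binding sites from known motifs" concrete, here is a minimal sketch of scanning a sequence with a position weight matrix (PWM). The slides do not specify a scanning method, so the log-odds scoring against a uniform background, the pseudocount, the function names `log_odds_pwm` and `scan_sequence`, the score threshold, and the toy motif are all assumptions.

```python
from math import log2

def log_odds_pwm(counts, pseudo=1.0, background=0.25):
    """Turn per-position base counts into log2-odds scores against a
    uniform background (an assumption of this sketch)."""
    pwm = []
    for column in counts:
        total = sum(column.get(base, 0) for base in "ACGT") + 4 * pseudo
        pwm.append({
            base: log2(((column.get(base, 0) + pseudo) / total) / background)
            for base in "ACGT"
        })
    return pwm

def scan_sequence(seq, pwm, threshold):
    """Return (position, score) for every window scoring >= threshold.
    Forward strand only; seq must contain only A, C, G, T."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        score = sum(pwm[j][seq[start + j]] for j in range(width))
        if score >= threshold:
            hits.append((start, score))
    return hits

# Toy motif (made up): eight aligned sites that all read "TGA".
counts = [{"T": 8}, {"G": 8}, {"A": 8}]
pwm = log_odds_pwm(counts)
hits = scan_sequence("ACTGACGTTGAT", pwm, threshold=4.0)
print([position for position, _ in hits])
```

A fuller scan would also score the reverse complement strand, and could down-weight hits that are not conserved across species.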
![Page 26: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/26.jpg)
Cis-regulatory analysis
• Identify a set of genes whose promoters contain the same binding sites
– Such a set of genes is likely to be regulated by the same TF
– Often called a “regulatory module”
• Earlier studies mined microarrays for “co-expressed” genes, then used motif finding algorithms to discover their shared binding sites
![Page 27: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/27.jpg)
Cis-regulatory analysis
• Another approach (Segal et al. 2003) tried to solve the problem in an integrated manner
• Find a set of genes such that
– their expression profiles are similar (microarrays)
– they share the same binding sites (sequence)
• Joint learning of “regulatory module” from two very different types of data: microarray and sequence
– An important theme in current bioinformatics
![Page 28: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/28.jpg)
Cis-regulatory analysis
• Connection between gene expression and cis-regulatory elements (binding sites) also explored in Beer & Tavazoie.
• Found rules on combinations and locations of binding sites that would cause the gene to be over- or under-expressed
![Page 29: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/29.jpg)
• The binding sites “RRPE” and “PAC” must occur within 240 bp and 140 bp of gene start, respectively
• Genes containing both motifs, following certain rules on location, are tightly co-regulated
• Genes containing any one motif, or both in incorrect positional configuration, have close to random expression
Source: Nature Genetics 37, S38 - S45 (2005)
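The distance part of this rule can be written as a small predicate. This sketch encodes only the two distance limits stated on the slide; the actual rules may involve further constraints (such as motif order or orientation) that are not given here. The function name `follows_position_rule`, the data layout, and the toy genes are hypothetical.

```python
def follows_position_rule(sites, rrpe_max=240, pac_max=140):
    """True if the gene has an RRPE site within rrpe_max bp and a PAC site
    within pac_max bp of the gene start.

    sites: dict mapping motif name -> list of distances (bp) from gene start.
    """
    has_rrpe = any(0 <= d <= rrpe_max for d in sites.get("RRPE", []))
    has_pac = any(0 <= d <= pac_max for d in sites.get("PAC", []))
    return has_rrpe and has_pac

# Toy genes (made up): motif -> distances of its sites from the gene start.
genes = {
    "geneA": {"RRPE": [200], "PAC": [100]},   # both within range
    "geneB": {"RRPE": [200], "PAC": [300]},   # PAC too far away
    "geneC": {"PAC": [50]},                   # RRPE missing
}
co_regulated = sorted(g for g, s in genes.items() if follows_position_rule(s))
print(co_regulated)
```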
![Page 30: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/30.jpg)
Eukaryotes
• These studies have mostly focused on yeast (which is a eukaryote, but has a small, compact genome)
• Not much work of this type in the longer, more complex genomes of metazoans (e.g., humans, rodents, fruitflies)
• Metazoan genomes are not compact, so it may not suffice to look at the sequence right next to a gene. Intergenic regions are long, and cis-regulatory signals may not be close to the gene
![Page 31: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/31.jpg)
One study in humans
• HeLa cells are an “immortal” cell line derived from cervical cancer cells of a person who died in 1951
– Used extensively in studying cancer
• Method of Segal et al. (joint learning of regulatory modules from gene expression and sequence data) applied to these cells
![Page 32: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/32.jpg)
One study in humans
• Gene expression data used: microarrays measuring genes during cell cycle in HeLa cells
• Sequence: 1000 bp promoters (upstream) of human genes
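Extracting a fixed-length upstream promoter has to respect the gene's strand. The sketch below assumes a 0-based coordinate for the first transcribed base and treats "upstream" as relative to the transcription start site; the slide does not say which reference point was used. The function names `reverse_complement` and `promoter` and the toy sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def promoter(chrom_seq, tss, strand, length=1000):
    """Up to `length` bp immediately upstream of the start site.

    tss: 0-based index of the first transcribed base on the + strand
         coordinates of chrom_seq.
    strand: "+" or "-". For "-" genes, upstream lies to the right, and the
            result is reverse-complemented so it reads 5' to 3'.
    The window is truncated at chromosome ends.
    """
    if strand == "+":
        return chrom_seq[max(0, tss - length):tss]
    end = min(len(chrom_seq), tss + 1 + length)
    return reverse_complement(chrom_seq[tss + 1:end])

# Toy chromosome (made up) with a 3 bp "promoter" for readability.
chrom = "ACGTACGTAAGG"
plus_promoter = promoter(chrom, tss=4, strand="+", length=3)
minus_promoter = promoter(chrom, tss=7, strand="-", length=3)
print(plus_promoter, minus_promoter)
```

With `length=1000` the same function would give the 1000 bp window mentioned on the slide, which, as the Eukaryotes slide notes, may still miss distant regulatory signals.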
![Page 33: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/33.jpg)
Result of analysis: Two motifs were found to be shared by this set of genes. The genes have similar expression profiles. One of the identified motifs (NFAT) is known to be involved in the cell cycle
Source: Nature Genetics 37, S38 - S45 (2005)
![Page 34: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.](https://reader030.fdocuments.us/reader030/viewer/2022032800/56649d4a5503460f94a2796b/html5/thumbnails/34.jpg)
Summary
• The common theme is to analyze sets of genes, and relate their common expression patterns to cancer types or to presence of cis-regulatory motifs
• Search algorithms may be required to identify some of these features