Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.
Genomics of Erythroid Regulation: G1E and G1E-ER4
January 20, 2010
Investigators on global predictions and tests
• Penn State
– Hardison
– Francesca Chiaromonte
– Yu Zhang
– Webb Miller
– Stephan Schuster
– Frank Pugh, collaborator
– Kateryna Makova, collaborator
– Anton Nekrutenko, collaborator
• Children's Hospital of Philadelphia
– Mitch Weiss
– Gerd Blobel
• Emory Univ.
– James Taylor
• Duke Univ.
– Greg Crawford, collaborator
• Univ. Queensland
– Andrew Perkins, collaborator
• NHGRI
– Laura Elnitski, collaborator
Aims of Global tests and predictions of erythroid regulation
Hematopoiesis
Factor        Class         Mode of discovery
GATA1         Zn finger     binds globin locus
NF-E2         bZIP          binds globin locus
KLF1/EKLF     Zn finger     subtractive hybridization
SCL/TAL1      bHLH          rearranged in leukemias
GFI1b         Zn finger     oncoviral integration site
ZBTB7a/LRF    POZ-Kruppel   proto-oncogene lymphomas
Others
Major erythroid transcription factors
Globin genes-GATA-
GATA-1
Transcription factor GATA-1
• Founding member of a small family of proteins (GATA-1 through GATA-6)
• Binds functionally important cis WGATAR motifs in regulatory regions of many hematopoietic genes
• Essential for erythroid and megakaryocyte development
– gene knockout studies in mice
– analysis of human patients
Lineage-restricted factors
cofactors
histone modifying enzymes
Transcription
Gene activation by alterations in chromatin
• “Regulatory signals entering the nucleus encounter chromatin, not DNA, and the rate-limiting biochemical response that leads to activation of gene expression in most cases involves alterations in chromatin structure. How are such alterations achieved?”
– Gary Felsenfeld & Mark Groudine (2003) Controlling the double helix. Nature 421: 448-453
• "It is now generally argued that reorganization of these chromatin structures is a process that is mechanistically linked to many gene activation or repression events, and is initiated by the action of site-specific transcription factors, acting either through ATP-driven nucleosome remodeling machines, or via the action of enzymes that covalently modify various components of the chromatin structure.”
– Sam John, … John A. Stamatoyannopoulos, and Gordon L. Hager (2008) Interaction of the Glucocorticoid Receptor with the Chromatin Landscape. Molecular Cell 29, 611–624.
• “These results imply that GATA-1 is sufficient to direct chromatin structure reorganization within the beta-globin LCR and an erythroid pattern of gene expression in the absence of other hematopoietic transcription factors.”
– Layon ME, Ackley CJ, West RJ, Lowrey CH. (2007) Expression of GATA-1 in a non-hematopoietic cell line induces beta-globin locus control region chromatin structure remodeling and an erythroid pattern of gene expression. J Mol Biol. 366:737-744.
Chromatin transitions before and after TF binding
• "We propose four specific kinds of interaction. The classical mode for GR binding to chromatin involves receptor-dependent recruitment of the Swi/Snf complex (1), resulting in a hormone-dependent hypersensitive transition. It is now clear, however, that some hormone-dependent events must involve other remodeling species (2). Furthermore, many GR binding events are associated with pre-existing transitions, and these constitutive events fall again into two classes, Brg1 dependent (3) and Brg1 independent (4). ”
– Sam John, … John A. Stamatoyannopoulos, and Gordon L. Hager (2008) Interaction of the Glucocorticoid Receptor with the Chromatin Landscape. Molecular Cell 29, 611–624.
Order of events in activation can vary
• Figure 5. Models Depicting Different Orders of Action by Regulators and Chromatin- Remodeling Complexes
• Regulators, HAT complexes, and ATP-dependent remodeling complexes can act in different orders (pathway A, B, or C) and still give the same end result: a template competent for transcription. Although not shown, it is also possible that binding by the general transcription factors precedes the action and recruitment of HAT complexes and ATP-dependent remodelers.
– Geeta J. Narlikar, Hua-Ying Fan and Robert E. Kingston (2002) Cooperation between Complexes that Regulate Chromatin Structure and Transcription. Cell 108: 475-487
What biochemical events precede and follow GATA1 binding?
• Where should we look in the genome?
– Segments around the transcription start site (TSS)
  • all genes
  • expressed vs nonexpressed genes
  • all GATA1-responsive genes
    – induced vs repressed genes
– All TF-occupied segments (OSs)
– Distal TF OSs (i.e. outside the TSS zone)
– All mappable regions
  • Much larger computation
• Treat levels of biochemical marks as
– continuous variables
– discrete segments (bound or not, histones modified or not)
• Categorize genes and OSs by the order of events
– Category 1. Co-occupancy by other TFs, histone modifications, and DHS formation occur after the TF of interest (TF1) binds or is activated
– Category 2. Co-occupancy by other TFs, histone modifications, and DHS formation occur before the TF of interest (TF1) binds or is activated
Cell-based models to study GATA1 function
[Figure: Gata1– ES cells are taken through in vitro hematopoietic differentiation (erythropoietin + stem cell factor, or thrombopoietin) to give the immature hematopoietic cell lines G1E and G1ME. Adding back GATA-1 gives erythroid maturation in G1E and erythroid + megakaryocyte maturation in G1ME. G1E-ER4 cells stably express estrogen-activated GATA-1 (GATA-1-ER) and become erythroid on adding estradiol.]
Global analysis of GATA1-regulated erythroid gene expression in G1E cells (1999-2009)
• Readouts: hemoglobin, morphology
• Time course: 0, 3, 7, 14, 21, 30 hrs in estradiol
• Affymetrix gene chips (Blood 2004, Genome Res 2009)
– U74 array: 12,500 probesets, 9,266 genes
– 430 2.0 array: 45,000 probesets, 19,000 genes
Transcriptome analysis
GATA1-induced (>2-fold): 1,048 genes
• known targets
• new gene discovery
GATA1-repressed (>2-fold): 1,568 genes
• stem cell/progenitor markers
• proto-oncogenes (Kit/Myc/Myb)
• function unknown
Affy 430 2.0
Kinetics of GATA1-regulated Gene Expression
• 60 megabase region of chromosome 7
• identify new GATA1-regulated genes
• define combinatorial TF interactions
• correlate histone marks with TF occupancy and gene expression
Factor occupancy and GATA1 responses
Direct and indirect effects in repression and activation
Datasets available and in progress
Feature                      G1E            G1E-ER4 + E2
DNase hypersensitive sites   In progress    Done
GATA1 occupancy              No             Done
TAL1 occupancy               Done           Done, will repeat
GATA2 occupancy              Done           Done
CTCF occupancy               Done           Done
H3K4me1                      Done           Done
H3K4me3                      Done           Done
H3K27me3                     Done           Done
RNA polymerase II            Repeating it   Repeating it
RNA seq                      In progress    In progress
Western blots show specificity of antibodies and presence of proteins in cells
[Figure: Western blots probed with α-GATA1, α-GATA2, α-CTCF and αTAL1. Lanes: G1E, G1E-ER4, MEL, CH12. Size markers: 125, 101, 56.2, 35.8. Labeled bands: GATA1-ER, GATA1, GATA2, CTCF, TAL1.]
CH12 are B-lymphoid cells; others are erythroid. Cheryl Keller Capone
Major observations under investigation
• GATA1 binds to a majority of the DNA segments occupied by TAL1 in G1E-ER4 cells (+E2). However, over half of these segments are occupied by TAL1 prior to restoration of GATA1.
– Only a minority are at GATA2 occupied segments (OSs)
• TAL1 seems to be redistributed around some target loci
– Gradient in TAL1 changes from HS6>HS1 to HS1>HS6 in the Hbb LCR
• Large changes in histone modifications are not observed after restoring and activating GATA1
– But some “small” changes are observed
• Level of GATA1 occupancy is similar in mouse (G1E-ER4+E2 cells) and human (K562 cells), but only a small minority of occupied segments are shared
– 15,000 GATA1 OSs in each species
– 1,000 GATA1 OSs are shared
Hbb locus and surrounding OR genes
TAL1 redistributes when GATA1 is restored
ChIPseq fits with previous data
PolII and TAL1 are recruited to Hbb genes when GATA1 is restored
Zfpm1
Induced immediately after GATA1-ER is activated
TAL1 occupancy corresponds to GATA2 OS in G1E
c-Kit
Repressed after GATA1-ER is activated
TAL1 occupancy at GATA1 OSs, may correspond to GATA2 OS in G1E
Loss of TAL1 occupancy correlates with repression
Changes in peaks of occupancy, co-occupancy
• Start with peak calls from MACS for all the TF OS and from Fseq for DNase hypersensitive sites
• Define overlapping segments as those sharing at least one nucleotide
• Use set operations tools in Galaxy to find overlapping segments
• Compare OS for each TF +/- GATA1 and find overlaps between TFs
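The "sharing at least one nucleotide" rule above can be sketched in a few lines. This is only an illustration of the overlap test, not the Galaxy set-operation tools themselves; the function name `overlapping_segments`, the half-open (chrom, start, end) convention, and the example coordinates are all assumptions made for the sketch.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed representation: (chromosome, start, end) with half-open coordinates.
Interval = Tuple[str, int, int]


def overlapping_segments(query: List[Interval], target: List[Interval]) -> List[Interval]:
    """Return the query intervals that share at least one nucleotide with any target interval."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in target:
        by_chrom[chrom].append((start, end))

    hits = []
    for chrom, start, end in query:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < t_end and t_start < end for t_start, t_end in by_chrom.get(chrom, [])):
            hits.append((chrom, start, end))
    return hits


# Hypothetical peak calls, for illustration only.
gata1_os = [("chr7", 100, 200), ("chr7", 500, 600), ("chr1", 10, 20)]
tal1_os = [("chr7", 199, 300), ("chr7", 600, 700), ("chr2", 10, 20)]
print(overlapping_segments(gata1_os, tal1_os))
```

The inner scan is quadratic per chromosome, which is fine for a sketch; for tens of thousands of peaks a sorted sweep or an interval tree would be the usual choice.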
Feature     PkG1E     PkER4
DHS         -         512,815
GATA1_OS    -         15,361
GATA2_OS    2,077     10,759
TAL1_OS     6,930     7,449
CTCF_OS     15,757    27,909
Chris Morrissey
[Venn diagram]
TAL1_G1E: 6,930
TAL1_ER4: 7,449
Overlap (TAL1_G1E ∩ TAL1_ER4): 2,777
GATA1: 15,361
GATA1 ∩ TAL1_G1E: 4,269
GATA1 ∩ TAL1_ER4: 4,443
GATA1 ∩ Overlap: 2,544
TAL1 has the most overlap with GATA1
Chris Morrissey
[Venn diagram]
CTCF_G1E: 15,757
CTCF_ER4: 27,909
Overlap (CTCF_G1E ∩ CTCF_ER4): 14,982
GATA1: 15,361
GATA1 ∩ CTCF_G1E: 555
GATA1 ∩ CTCF_ER4: 932
GATA1 ∩ Overlap: 528
Chris Morrissey
CTCF expands, but doesn't move
[Venn diagram]
GATA2_G1E: 2,077
GATA2_ER4: 10,759
Overlap (GATA2_G1E ∩ GATA2_ER4): 356
GATA1: 15,361
GATA1 ∩ GATA2_G1E: 465
GATA1 ∩ GATA2_ER4: 178
GATA1 ∩ Overlap: 32
GATA2 moves a lot (?)
Chris Morrissey
But is this just an artifact of noisy GATA2 data? Seems like this would be an ideal application for Yu Zhang and Kuan-Bei’s improvement in peak calling by using ChIP data on other proteins. - RH
Compute ratios of signals in G1E and ER4, adjust by M vs A plot
• 1. For each 10 bp bin, we have
– tag counts for both G1E and ER4: tagcnts_g1e and tagcnts_er4
– the total number of mapped reads in G1E and ER4: reads_g1e and reads_er4 (both in millions)
• 2. Calculate rpm_g1e = (tagcnts_g1e + 1) / reads_g1e and rpm_er4 = (tagcnts_er4 + 1) / reads_er4 (the +1 removes zeros, so the log ratio is always defined)
• 3. Calculate M = log2(rpm_er4 / rpm_g1e) and A = 0.5 * log2(rpm_er4 * rpm_g1e)
• 4. Draw the MA plot (M versus A) and fit a lowess line through the points
• 5. From the lowess regression, predict a value P for each A
• 6. Calculate M' = M - P; M' is the adjusted difference between ER4 and G1E
Weisheng Wu and F. Chiaromonte
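The six steps above can be sketched in plain Python. The smoother here is a simplified stand-in for lowess (tricube-weighted local linear regression, without the robustness iterations of a full lowess), and the function names, the `frac` parameter, and the example counts are assumptions for illustration, not the code used for the slides.

```python
import math
from typing import List, Tuple


def ma_values(tagcnts_g1e: List[int], tagcnts_er4: List[int],
              reads_g1e: float, reads_er4: float) -> Tuple[List[float], List[float]]:
    """Steps 1-3: per-bin M and A values from tag counts and total reads (in millions)."""
    m_vals, a_vals = [], []
    for g, e in zip(tagcnts_g1e, tagcnts_er4):
        rpm_g1e = (g + 1) / reads_g1e  # +1 pseudocount removes zeros
        rpm_er4 = (e + 1) / reads_er4
        m_vals.append(math.log2(rpm_er4 / rpm_g1e))
        a_vals.append(0.5 * math.log2(rpm_er4 * rpm_g1e))
    return m_vals, a_vals


def lowess_fit(x: List[float], y: List[float], frac: float = 0.5) -> List[float]:
    """Steps 4-5: predict y at every x by tricube-weighted local linear regression."""
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # neighbours used for each local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        w = [max(0.0, 1.0 - (abs(xi - x0) / h) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
        mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(mean_y + slope * (x0 - mean_x))
    return fitted


def ma_adjust(m_vals: List[float], a_vals: List[float], frac: float = 0.5) -> List[float]:
    """Step 6: M' = M - P, where P is the smoothed value of M at each A."""
    predicted = lowess_fit(a_vals, m_vals, frac)
    return [m - p for m, p in zip(m_vals, predicted)]


# Hypothetical tag counts for six 10 bp bins, for illustration only.
m, a = ma_values([0, 4, 9, 20, 35, 60], [3, 5, 12, 18, 50, 55], reads_g1e=12.0, reads_er4=15.0)
print(ma_adjust(m, a, frac=0.6))
```

This version refits at every point and is quadratic in the number of bins, so genome-wide 10 bp bins would need an optimized lowess implementation or subsampling before fitting.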
Effect of adjusting ratios by lowess of an M vs A plot
Weisheng Wu
Correlations among chromatin features and expression in TSS segments
• Examine 4kb DNA segments centered on transcription start sites (TSSs) for all genes
• Determine mean signal for TF occupancy and histone modifications in each
• Compute log2 of ratios, with MA lowess adjustment
• Determine expression levels of genes and the change between G1E and ER4
• Draw scatterplots and determine correlations for all pairwise comparisons
Weisheng Wu
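The "correlations for all pairwise comparisons" step can be sketched as follows, assuming the mean signal per 4kb TSS segment has already been computed for each feature. Pearson correlation is used here as an assumption, since the slides do not say which coefficient was computed; the feature values are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlations(features: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Correlate every pair of features; each list holds one value per TSS segment."""
    return {(f1, f2): pearson(features[f1], features[f2])
            for f1, f2 in combinations(features, 2)}


# Hypothetical mean signals over five TSS segments, for illustration only.
tss_signals = {
    "GATA1": [0.2, 1.5, 3.1, 0.4, 2.2],
    "TAL1": [0.1, 1.2, 2.8, 0.5, 2.0],
    "H3K27me3": [2.5, 0.9, 0.2, 2.1, 0.6],
}
for pair, r in pairwise_correlations(tss_signals).items():
    print(pair, round(r, 3))
```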
Correlations in G1E-ER4 cells +E2
Weisheng Wu Same sets of graphs for G1E and ratio of signals have been done
Notable pairwise correlations in TSSs
Weisheng Wu
• Co-occupancy by GATA1 and TAL1
• Positive correlation with Trx marks (H3K4me)
• Negative correlation with Pc marks (H3K27me3)
Limited explanatory power for CHANGE in expression
Changes (if any) in biochemical features at TSS show little or no difference between induced and repressed genes
black: TSSs of all genes, red: up, blue: down, green: non-responsive Weisheng Wu
Major results from pairwise correlations in GATA1os
Distribution of the changes of the TFs/HMs at GATA1os
Changes in HMs do not differ much between induced and repressed genes at GATA1os
Principal components in genomic features at TSSs
• PCA Dataset: Raw counts of all factors, in G1E and ER4 models, in 4kb window around TSS
Swathi A. Kumar
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10
Std-dev 3.1743 0.8682 0.4928 0.2537 0.2253 0.1069 0.0982 0.0739 0.062 0.0507
Proportion of Var 0.8978 0.06716 0.0216 0.00573 0.00452 0.0010 0.0008 0.0004 0.0003 0.0002
Cumulative Var 0.8978 0.9649 0.9866 0.9923 0.9968 0.9979 0.9987 0.9992 0.9996 0.9998
Principal components in genomic features at TSSs
• PCA Dataset: Raw counts of all factors, in G1E and ER4 models, in 4kb window around TSS
Swathi A. Kumar
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10
Std-dev 1.958 1.632 1.317 1.106 0.964 0.846 0.734 0.675 0.595 0.500
Proportion of Var 0.295 0.205 0.134 0.094 0.072 0.055 0.041 0.035 0.027 0.019
Cumulative Var 0.295 0.500 0.634 0.728 0.799 0.854 0.895 0.931 0.958 0.977
PCA results
Swathi A. Kumar
H3K4me3 is major contributor to variance
Swathi A. Kumar
Linear Discriminant Analysis
• Same dataset as used in pairwise correlations of features in TSS segments
• Initial run
– Binary response
  • Responsive vs Non-responsive
  • Induced vs Repressed
– Predictor variables are raw counts of GATA1 and other associated transcription/histone factors, before and after induction.
– Leave-one-out cross validation
Swathi A. Kumar
Major results from LDA
Swathi A. Kumar
• Responsive vs Non-responsive
Misclassification rate = 0.007%

                  Real Response
Allocated         Responsive   Non-responsive
Responsive        3168         807
Non-responsive    877          5953

• Induced vs Repressed
Misclassification rate = 36.7%

                  Real Response
Allocated         Induced      Repressed
Induced           3262         1958
Repressed         526          1014
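A misclassification rate can be recomputed from an allocation table as the off-diagonal count divided by the total. The sketch below assumes rows are the allocated class and columns the real class, as the tables are laid out; the function name is hypothetical. On the induced vs repressed counts it reproduces the 36.7% shown. The same arithmetic on the responsive vs non-responsive counts as printed gives about 15.6%, which does not match the 0.007% shown, so that figure may come from a different run or table.

```python
from typing import List


def misclassification_rate(table: List[List[int]]) -> float:
    """Fraction of cases off the diagonal; table[i][j] = allocated class i, real class j."""
    total = sum(sum(row) for row in table)
    correct = sum(table[i][i] for i in range(len(table)))
    return (total - correct) / total


# Counts copied from the two allocation tables on the slide.
induced_vs_repressed = [[3262, 1958], [526, 1014]]
responsive_vs_nonresponsive = [[3168, 807], [877, 5953]]

print(f"{misclassification_rate(induced_vs_repressed):.1%}")         # 36.7%
print(f"{misclassification_rate(responsive_vs_nonresponsive):.1%}")  # 15.6%
```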
Shared vs lineage-specific GATA1 OSs
Yong Cheng, Kuan-Bei Chen
• ChIP-seq reads in G1E-ER4 + E2 for mouse GATA1 OSs
• ChIP-seq reads in K562 for human GATA1 OSs
• Map each to their respective genomes (ELAND)
• MACS peak calls
– 15,000 in each
• LiftOver each set of peaks to the other species
– About 10,000 liftOver in each
• Run intersection of the liftedOver peaks
• 1,000 are shared in both species
Shared GATA1 OSs show higher occupancy level
ChIPseq signal for GATA1 in each OS
Yong Cheng
Genes close to shared GATA1 OSs are enriched for well-known erythroid functions
Shared GATA1 OSs are enriched in induced genes
                                            Up     Down
Genes with shared GATA1 OSs within 10kb     205    87
Genes without shared GATA1 OS within 10kb   637    704
Yong Cheng
Preserved GATA1 motifs are enriched in the shared GATA1 OSs
                        Shared   Not shared
Preserved WGATAR        704      2090
w/o preserved WGATAR    252      6391
Yong Cheng
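One way to put a number on the enrichment in the two 2x2 tables (shared GATA1 OSs vs induced genes, and preserved WGATAR vs shared OSs) is an odds ratio with a chi-square statistic. This is a sketch of that arithmetic only; the slides do not say which test was used, and the function names are hypothetical.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: with / without a shared GATA1 OS within 10kb; columns: Up / Down.
print(round(odds_ratio(205, 87, 637, 704), 2))    # 2.6
# Rows: with / without preserved WGATAR; columns: Shared / Not shared.
print(round(odds_ratio(704, 2090, 252, 6391), 2))  # 8.54
print(chi_square_2x2(205, 87, 637, 704))
print(chi_square_2x2(704, 2090, 252, 6391))
```

An odds ratio above 1 in the first table means induced genes are over-represented among genes near a shared OS; in the second, that shared OSs are over-represented among OSs with a preserved WGATAR motif.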
Genome-wide turnover analysis in mouse GATA1 occupied intervals
mm8.gata1
A: occupied intervals 15,360
B: occupied intervals with binding motifs in reference sequence 12,202
B/A 79.44%
C: occupied intervals with rodent-specific binding motifs 6,565
C/B 53.80%
D: occupied intervals with primate-rodent compensatory pattern 1,383
D/B 11.33%
E: occupied intervals with shared motif between human and mouse 3,018
E/B 24.73%
• B is under-estimated since we are using alignments instead of the mouse sequence itself
• D: the compensatory motifs have a minimum distance of 10 bp
Kuan-Bei Chen
Genome-wide turnover analysis in human
hg18.gata1 hg18.myc hg18.ctcf hg18.gabp
A: occupied intervals 15,252 11,583 26,976 6,442
B: occupied intervals with binding motifs in reference sequence 12,331 9,315 24,913 2,958
B/A 80.85% 80.42% 92.35% 45.92%
C: occupied intervals with primate-specific binding motifs 6,348 4,452 11,297 1,290
C/B 51.48% 47.79% 45.35% 43.61%
D: occupied intervals with primate-rodent compensatory pattern 656 1,200 2,255 348
D/B 5.32% 12.88% 9.05% 11.76%
E: occupied intervals with shared motif between human and mouse 2,446 3,397 12,649 829
E/B 19.84% 36.47% 50.77% 28.03%
Kuan-Bei Chen
Patterns of GATA1 binding sites in shared human/mouse OSs
• Out of 956 shared OSs:
- 200 OSs have rodent-specific GATA1 motifs that are not present in human, chimp, rhesus, dog, or cow.
- 626 OSs have GATA1 motifs shared by human and mouse.
- 76 OSs have primate-rodent compensatory patterns.
Kuan-Bei Chen
Investigators on “PSU” mouse ENCODE
• Penn State
– Hardison
– Stephan Schuster
– Frank Pugh
– Robert Paulson
– Francesca Chiaromonte, OSC
– Yu Zhang, OSC
– Webb Miller, OSC
– Anton Nekrutenko, OSC
• Children's Hospital of Philadelphia
– Mitch Weiss
– Gerd Blobel
• Emory Univ.
– James Taylor
• Univ. Massachusetts
– Job Dekker
• Duke Univ.
– Greg Crawford, consultant
– Terry Furey, consultant
• Cal Tech
– Barbara Wold
Aims of “PSU” mouse ENCODE
Conservation of function, conservation of sequence (or not)