Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.


Page 1: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Genomics of Erythroid Regulation: G1E and G1E-ER4

January 20, 2010

Page 2: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Investigators on global predictions and tests

• Penn State
– Hardison
– Francesca Chiaromonte
– Yu Zhang
– Webb Miller
– Stephan Schuster
– Frank Pugh, collaborator
– Kateryna Makova, collaborator
– Anton Nekrutenko, collaborator

• Children’s Hospital of Philadelphia
– Mitch Weiss
– Gerd Blobel

• Emory Univ.
– James Taylor

• Duke Univ.
– Greg Crawford, collaborator

• Univ. Queensland
– Andrew Perkins, collaborator

• NHGRI
– Laura Elnitski, collaborator

Page 3: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Aims of Global tests and predictions of erythroid regulation

Page 4: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Hematopoiesis

Page 5: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Major erythroid transcription factors

Factor        Class         Mode of discovery
GATA1         Zn finger     binds globin locus
NF-E2         bZIP          binds globin locus
KLF1/EKLF     Zn finger     subtractive hybridization
SCL/TAL1      bHLH          rearranged in leukemias
GFI1b         Zn finger     oncoviral integration site
ZBTB7a/LRF    POZ-Kruppel   proto-oncogene lymphomas
Others

Page 6: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Transcription factor GATA-1

[Figure labels: globin genes, GATA motif, GATA-1]

• Founding member of a small family of proteins
– GATA-2 through GATA-6

• Binds functionally important cis WGATAR motifs in regulatory regions of many hematopoietic genes

• Essential for erythroid and megakaryocyte development
– gene knockout studies in mice
– analysis of human patients
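
The WGATAR motif named above uses IUPAC codes (W = A or T, R = A or G). As a minimal sketch, assuming a plain DNA string as input, the motif can be scanned on both strands with regular expressions. The function name `find_wgatar` and the 0-based coordinates are illustrative choices, not part of the talk.

```python
import re

# IUPAC codes: W = A or T, R = A or G.
# Reverse complement of WGATAR is YTATCW (Y = C or T).
# Lookahead patterns so that overlapping matches are all reported.
FORWARD = re.compile(r"(?=([AT]GATA[AG]))")
REVERSE = re.compile(r"(?=([CT]TATC[AT]))")

def find_wgatar(seq):
    """Return sorted (start, strand, match) tuples with 0-based starts."""
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in FORWARD.finditer(seq)]
    hits += [(m.start(), "-", m.group(1)) for m in REVERSE.finditer(seq)]
    return sorted(hits)

if __name__ == "__main__":
    # Made-up sequence with one motif on each strand.
    for hit in find_wgatar("CCAGATAAGGCTTATCTGG"):
        print(hit)
```

A motif match alone does not imply occupancy; the sketch only locates candidate sites.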

Page 7: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Lineage-restricted factors

cofactors

histone modifying enzymes

Transcription

Page 8: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Gene activation by alterations in chromatin

• “Regulatory signals entering the nucleus encounter chromatin, not DNA, and the rate-limiting biochemical response that leads to activation of gene expression in most cases involves alterations in chromatin structure. How are such alterations achieved?”

– Gary Felsenfeld & Mark Groudine (2003) Controlling the double helix. Nature 421: 448-453

• "It is now generally argued that reorganization of these chromatin structures is a process that is mechanistically linked to many gene activation or repression events, and is initiated by the action of site-specific transcription factors, acting either through ATP-driven nucleosome remodeling machines, or via the action of enzymes that covalently modify various components of the chromatin structure.”

– Sam John, … John A. Stamatoyannopoulos, and Gordon L. Hager (2008) Interaction of the Glucocorticoid Receptor with the Chromatin Landscape. Molecular Cell 29, 611–624.

• “These results imply that GATA-1 is sufficient to direct chromatin structure reorganization within the beta-globin LCR and an erythroid pattern of gene expression in the absence of other hematopoietic transcription factors.”

– Layon ME, Ackley CJ, West RJ, Lowrey CH. (2007) Expression of GATA-1 in a non-hematopoietic cell line induces beta-globin locus control region chromatin structure remodeling and an erythroid pattern of gene expression. J Mol Biol. 366:737-744.

Page 9: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Chromatin transitions before and after TF binding

• "We propose four specific kinds of interaction. The classical mode for GR binding to chromatin involves receptor-dependent recruitment of the Swi/Snf complex (1), resulting in a hormone-dependent hypersensitive transition. It is now clear, however, that some hormone-dependent events must involve other remodeling species (2). Furthermore, many GR binding events are associated with pre-existing transitions, and these constitutive events fall again into two classes, Brg1 dependent (3) and Brg1 independent (4). ”

– Sam John, … John A. Stamatoyannopoulos, and Gordon L. Hager (2008) Interaction of the Glucocorticoid Receptor with the Chromatin Landscape. Molecular Cell 29, 611–624.

Page 10: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Order of events in activation can vary

• Figure 5. Models Depicting Different Orders of Action by Regulators and Chromatin- Remodeling Complexes

• Regulators, HAT complexes, and ATP-dependent remodeling complexes can act in different orders (pathway A, B, or C) and still give the same end result: a template competent for transcription. Although not shown, it is also possible that binding by the general transcription factors precedes the action and recruitment of HAT complexes and ATP-dependent remodelers.

– Geeta J. Narlikar, Hua-Ying Fan and Robert E. Kingston (2002) Cooperation between Complexes that Regulate Chromatin Structure and Transcription. Cell 108: 475-487

Page 11: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

What biochemical events precede and follow GATA1 binding?

• Where should we look in the genome?
– Segments around the transcription start site (TSS)
   • all genes
   • expressed vs nonexpressed genes
   • all GATA1-responsive genes
      – induced vs repressed genes
– All TF-occupied segments (OSs)
– Distal TF OSs (i.e. outside the TSS zone)
– All mappable regions
   • Much larger computation

• Treat levels of biochemical marks as
– continuous variables
– discrete segments (bound or not, histones modified or not)

• Categorize genes and OSs by the order of events
– Category 1. Co-occupancy by other TFs, histone modifications, and DHS formation occur after the TF1 of interest binds or is activated
– Category 2. Co-occupancy by other TFs, histone modifications, and DHS formation occur before the TF1 of interest binds or is activated

Page 12: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Cell-based models to study GATA1 function

[Figure: Gata1– ES cells undergo in vitro hematopoietic differentiation (erythropoietin, stem cell factor, thrombopoietin) to give the immature hematopoietic cell lines G1E and G1ME. Adding back GATA-1 gives erythroid and erythroid + megakaryocyte outcomes; G1E-ER4 + estradiol gives erythroid cells.]

Page 13: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

+ estradiol

Stably expressing estrogen-activated GATA-1 (GATA-1-ER)

Global analysis of GATA1-regulated erythroid gene expression in G1E cells (1999-2009)

Page 14: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Transcriptome analysis

[Figure: time course in estradiol (0, 3, 7, 14, 21, 30 hrs), with hemoglobin and morphology panels]

Affymetrix gene chip
• U74 array: 12,500 probesets, 9,266 genes
• 430 2.0 array: 45,000 probesets, 19,000 genes
• Blood 2004, Genome Res 2009

Page 15: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Kinetics of GATA1-regulated Gene Expression (Affy 430 2.0)

GATA1-induced (>2-fold): 1048 genes
• known targets
• new gene discovery

GATA1-repressed (>2-fold): 1568 genes
• stem cell/progenitor markers
• proto-oncogenes (Kit/Myc/Myb)
• function unknown

Page 16: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

• 60 megabase region of chromosome 7

• identify new GATA1-regulated genes
• define combinatorial TF interactions
• correlate histone marks w/ TF occupancy and gene expression

Factor occupancy and GATA1 responses

Page 17: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Direct and indirect effects in repression and activation

Page 18: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Datasets available and in progress

Feature                       G1E            G1E-ER4 + E2
DNase hypersensitive sites    In progress    Done
GATA1 occupancy               No             Done
TAL1 occupancy                Done           Done, will repeat
GATA2 occupancy               Done           Done
CTCF occupancy                Done           Done
H3K4me1                       Done           Done
H3K4me3                       Done           Done
H3K27me3                      Done           Done
RNA polymerase II             Repeating it   Repeating it
RNA seq                       In progress    In progress

Page 19: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Western blots show specificity of antibodies and presence of proteins in cells

[Figure: Western blots probed with α-GATA1, α-GATA2, α-CTCF and α-TAL1 on G1E, G1E-ER4, MEL and CH12 cells. Size markers at 125, 101, 56.2 and 35.8. Labeled bands: GATA1-ER, GATA1, GATA2, CTCF, TAL1.]

CH12 are B-lymphoid cells; others are erythroid.

Cheryl Keller Capone

Page 20: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Major observations under investigation

• GATA1 binds to a majority of the DNA segments occupied by TAL1 in G1E-ER4 cells (+E2). However, over half of these segments are occupied by TAL1 prior to restoration of GATA1.
– Only a minority are at GATA2 occupied segments (OSs)

• TAL1 seems to be redistributed around some target loci
– Change gradient in TAL1 from HS6>HS1 to HS1>HS6 in Hbb LCR

• Large changes in histone modifications are not observed after restoring and activating GATA1
– But some “small” changes are observed

• Level of GATA1 occupancy is similar in mouse (G1E-ER4+E2 cells) and human (K562 cells), but only a small minority of occupied segments are shared
– 15,000 GATA1 OSs in each species
– 1,000 GATA1 OSs are shared

Page 21: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Hbb locus and surrounding OR genes

TAL1 redistributes when GATA1 is restored

ChIPseq fits with previous data

PolII and TAL1 are recruited to Hbb genes when GATA1 is restored

Page 22: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Zfpm1

Induced immediately after GATA1-ER is activated

TAL1 occupancy corresponds to GATA2 OS in G1E

Page 23: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

c-Kit

Repressed after GATA1-ER is activated

TAL1 occupancy at GATA1 OSs, may correspond to GATA2 OS in G1E

Loss of TAL1 occupancy correlates with repression

Page 24: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Changes in peaks of occupancy, co-occupancy

• Start with peak calls from MACS for all the TF OS and from Fseq for DNase hypersensitive sites

• Define overlapping segments as those sharing at least one nucleotide

• Use set operations tools in Galaxy to find overlapping segments

• Compare OS for each TF +/- GATA1 and find overlaps between TFs

Feature     PkG1E    PkER4
DHS         -        512,815
GATA1_OS    -        15,361
GATA2_OS    2,077    10,759
TAL1_OS     6,930    7,449
CTCF_OS     15,757   27,909

Chris Morrissey
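
The overlap rule stated above (segments sharing at least one nucleotide) can be sketched in a few lines of Python. This is not the Galaxy set-operations tool itself; the function name `overlapping` and the BED-style 0-based, half-open coordinates are assumptions made for illustration.

```python
from collections import defaultdict

def overlapping(a_segs, b_segs):
    """Return the segments of a_segs that share at least one nucleotide
    with any segment of b_segs.

    Segments are (chrom, start, end) tuples, 0-based and half-open,
    so (100, 200) and (200, 300) are adjacent but do not overlap.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in b_segs:
        by_chrom[chrom].append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()

    hits = []
    for chrom, start, end in a_segs:
        for b_start, b_end in by_chrom.get(chrom, []):
            if b_start >= end:
                break  # sorted by start: later B segments cannot overlap
            if b_end > start:
                hits.append((chrom, start, end))
                break
    return hits

if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    gata1 = [("chr7", 100, 200), ("chr7", 500, 600), ("chr1", 50, 60)]
    tal1 = [("chr7", 199, 250), ("chr7", 600, 700), ("chr2", 50, 60)]
    print(overlapping(gata1, tal1))
```

The inner scan is linear in the number of segments per chromosome, which is fine for a sketch but slow for tens of thousands of peaks; an interval index would be the usual choice at that scale.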

Page 25: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

TAL1 has the most overlap with GATA1

[Venn diagrams; ∩ denotes intersection]
TAL1_G1E: 6,930
TAL1_ER4: 7,449
Overlap (TAL1_G1E ∩ TAL1_ER4): 2,777
GATA1: 15,361
GATA1 ∩ TAL1_G1E: 4,269
GATA1 ∩ TAL1_ER4: 4,443
GATA1 ∩ Overlap: 2,544

Chris Morrissey

Page 26: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

CTCF expands, but doesn't move

[Venn diagrams; ∩ denotes intersection]
CTCF_G1E: 15,757
CTCF_ER4: 27,909
Overlap (CTCF_G1E ∩ CTCF_ER4): 14,982
GATA1: 15,361
GATA1 ∩ CTCF_G1E: 555
GATA1 ∩ CTCF_ER4: 932
GATA1 ∩ Overlap: 528

Chris Morrissey

Page 27: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

GATA2 moves a lot (?)

[Venn diagrams; ∩ denotes intersection]
GATA2_G1E: 2,077
GATA2_ER4: 10,759
Overlap (GATA2_G1E ∩ GATA2_ER4): 356
GATA1: 15,361
GATA1 ∩ GATA2_G1E: 465
GATA1 ∩ GATA2_ER4: 178
GATA1 ∩ Overlap: 32

Chris Morrissey

But is this just an artifact of noisy GATA2 data? Seems like this would be an ideal application for Yu Zhang and Kuan-Bei’s improvement in peak calling by using ChIP data on other proteins. - RH

Page 28: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Compute ratios of signals in G1E and ER4, adjust by M vs A plot

• 1. For each 10 bp bin, we have
– tag counts for both G1E and ER4: tagcnts_g1e and tagcnts_er4
– the total number of mapped reads in G1E and ER4: reads_g1e and reads_er4 (both in millions)

• 2. Calculate rpm_g1e = (tagcnts_g1e + 1) / reads_g1e and rpm_er4 = (tagcnts_er4 + 1) / reads_er4. (The +1 removes zeros.)

• 3. Calculate M = log2(rpm_er4 / rpm_g1e) and A = 0.5 * log2(rpm_er4 * rpm_g1e).

• 4. Draw the MA plot (M versus A) and fit a lowess line through the points.

• 5. From the lowess regression, predict a value P for each A.

• 6. Calculate M' = M - P; M' represents the difference between ER4 and G1E.

Weisheng Wu and F. Chiaromonte
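
The six steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the code used for the talk: the fit in `local_linear_fit` is a tricube-weighted local linear regression, i.e. lowess without its robustness iterations, and the function names and the `frac` span parameter are hypothetical.

```python
import math

def ma_values(tag_g1e, tag_er4, reads_g1e, reads_er4):
    """Steps 1-3: per-bin M and A from tag counts and total reads (millions)."""
    m_vals, a_vals = [], []
    for c_g1e, c_er4 in zip(tag_g1e, tag_er4):
        rpm_g1e = (c_g1e + 1) / reads_g1e   # +1 removes zeros
        rpm_er4 = (c_er4 + 1) / reads_er4
        m_vals.append(math.log2(rpm_er4 / rpm_g1e))
        a_vals.append(0.5 * math.log2(rpm_er4 * rpm_g1e))
    return m_vals, a_vals

def local_linear_fit(a_vals, m_vals, frac=0.5):
    """Steps 4-5: predict P for each A by tricube-weighted local regression."""
    n = len(a_vals)
    k = min(n, max(2, math.ceil(frac * n)))  # neighbours per local fit
    fitted = []
    for a0 in a_vals:
        dists = sorted(abs(a - a0) for a in a_vals)
        h = dists[k - 1] or 1e-12            # bandwidth; guard against 0
        w = [(1 - min(abs(a - a0) / h, 1.0) ** 3) ** 3 for a in a_vals]
        sw = sum(w)                          # > 0: the point itself has weight 1
        mean_a = sum(wi * a for wi, a in zip(w, a_vals)) / sw
        mean_m = sum(wi * m for wi, m in zip(w, m_vals)) / sw
        sxx = sum(wi * (a - mean_a) ** 2 for wi, a in zip(w, a_vals))
        sxy = sum(wi * (a - mean_a) * (m - mean_m)
                  for wi, a, m in zip(w, a_vals, m_vals))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(mean_m + slope * (a0 - mean_a))
    return fitted

def adjusted_m(tag_g1e, tag_er4, reads_g1e, reads_er4, frac=0.5):
    """Step 6: M' = M - P for every bin."""
    m_vals, a_vals = ma_values(tag_g1e, tag_er4, reads_g1e, reads_er4)
    p_vals = local_linear_fit(a_vals, m_vals, frac)
    return [m - p for m, p in zip(m_vals, p_vals)]

if __name__ == "__main__":
    # Made-up counts: ER4 is uniformly about 2x G1E, so M' should be near 0.
    print(adjusted_m([0, 1, 3, 7, 15, 31], [1, 3, 7, 15, 31, 63], 10.0, 10.0))
```

The fit here costs roughly quadratic time in the number of bins, so it only suits small inputs; a genome-wide run over 10 bp bins would need a library lowess implementation or subsampling before fitting.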

Page 29: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Effect of adjusting ratios by lowess of an M vs A plot

Weisheng Wu

Page 30: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Correlations among chromatin features and expression in TSS segments

• Examine 4kb DNA segments centered on transcription start sites (TSSs) for all genes

• Determine mean signal for TF occupancy, histone modifications in each

• Compute Log2 of ratios, MA lowess adjustment

• Determine expression levels of genes and change between G1E and ER4

• Draw scatterplots and determine correlations for all pairwise comparisons

Weisheng Wu
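
As a rough sketch of the first, second and last steps above, the code below averages a binned signal over a 4 kb window centered on a TSS and computes a Pearson correlation between two per-TSS feature vectors. The 10 bp bin size, the function names and the choice of Pearson correlation are assumptions; the slides do not say which correlation coefficient was used.

```python
import math

def window_mean(bin_signal, tss, half_width=2000, bin_size=10):
    """Mean signal over bins overlapping [tss - half_width, tss + half_width).

    bin_signal[i] is the signal for positions [i*bin_size, (i+1)*bin_size).
    """
    first = max(0, (tss - half_width) // bin_size)
    last = min(len(bin_signal), (tss + half_width + bin_size - 1) // bin_size)
    window = bin_signal[first:last]
    return sum(window) / len(window) if window else 0.0

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

if __name__ == "__main__":
    # Made-up 10 kb track with signal only between positions 3000 and 7000.
    track = [0.0] * 1000
    for i in range(300, 700):
        track[i] = 1.0
    print(window_mean(track, tss=5000))  # window lies fully inside the signal
    print(window_mean(track, tss=2000))  # window covers a quarter of it
```

Running `pearson` over every pair of features (each feature being one value per TSS) gives the matrix behind the pairwise scatterplots.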

Page 31: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Correlations in G1E-ER4 cells +E2

Same sets of graphs for G1E and ratio of signals have been done

Weisheng Wu

Page 32: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Notable pairwise correlations in TSSs

Weisheng Wu

• Co-occupancy by GATA1 and TAL1
• Positive correlation with Trx marks (H3K4me)
• Negative correlation with Pc marks (H3K27me3)

Page 33: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Limited explanatory power for CHANGE in expression

Page 34: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Changes (if any) in biochemical features at TSS show little or no difference between induced and repressed genes

black: TSSs of all genes, red: up, blue: down, green: non-responsive

Weisheng Wu

Page 35: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Major results from pairwise correlations in GATA1os

Page 36: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Distribution of the changes of the TFs/HMs at GATA1os

Page 37: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Changes in HMs do not differ much between induced and repressed genes at GATA1os

Page 38: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Principal components in genomic features at TSSs

• PCA Dataset: Raw counts of all factors, in G1E and ER4 models, in 4kb window around TSS

Swathi A. Kumar

                   C1      C2       C3      C4       C5       C6      C7      C8      C9      C10
Std-dev            3.1743  0.8682   0.4928  0.2537   0.2253   0.1069  0.0982  0.0739  0.062   0.0507
Proportion of Var  0.8978  0.06716  0.0216  0.00573  0.00452  0.0010  0.0008  0.0004  0.0003  0.0002
Cumulative Var     0.8978  0.9649   0.9866  0.9923   0.9968   0.9979  0.9987  0.9992  0.9996  0.9998

Page 39: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Principal components in genomic features at TSSs

• PCA Dataset: Raw counts of all factors, in G1E and ER4 models, in 4kb window around TSS

Swathi A. Kumar

                   C1     C2     C3     C4     C5     C6     C7     C8     C9     C10
Std-dev            1.958  1.632  1.317  1.106  0.964  0.846  0.734  0.675  0.595  0.500
Proportion of Var  0.295  0.205  0.134  0.094  0.072  0.055  0.041  0.035  0.027  0.019
Cumulative Var     0.295  0.500  0.634  0.728  0.799  0.854  0.895  0.931  0.958  0.977

Page 40: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

PCA results

Swathi A. Kumar

Page 41: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

H3K4me3 is major contributor to variance

Swathi A. Kumar

Page 42: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Linear Discriminant Analysis

• Same dataset as used in pairwise correlations of features in TSS segments

• Initial run
– Binary response of induced or repressed expression
   • Responsive vs Non-responsive
   • Induced vs Repressed
– Predictor variables are raw counts of GATA1 and other associated transcription factors and histone modifications, before and after induction.
– Leave-one-out cross validation

Swathi A. Kumar

Page 43: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Major results from LDA

Swathi A. Kumar

• Responsive vs Non-responsive

Misclassification rate = 0.007%

                   Real Response
Allocated          Responsive   Non-responsive
Responsive         3168         807
Non-responsive     877          5953

• Induced vs repressed

Misclassification rate = 36.7%

                   Real Response
Allocated          Induced      Repressed
Induced            3262         1958
Repressed          526          1014
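
For reference, a misclassification rate can be read off a confusion table as the off-diagonal count divided by the total. The sketch below applies that definition to the two tables above; the function name is illustrative, and the definition is an assumption about how the rates on this slide were obtained.

```python
def misclassification_rate(table):
    """table[i][j] = number of genes allocated to class i whose real class is j."""
    total = sum(sum(row) for row in table)
    correct = sum(table[i][i] for i in range(len(table)))
    return (total - correct) / total

# Counts copied from the two tables on this slide.
responsive_table = [[3168, 807], [877, 5953]]
induced_table = [[3262, 1958], [526, 1014]]

if __name__ == "__main__":
    print(f"responsive vs non-responsive: {misclassification_rate(responsive_table):.2%}")
    print(f"induced vs repressed:         {misclassification_rate(induced_table):.2%}")
```

Under this definition the induced vs repressed table gives (1958 + 526) / 6760, about 36.7%, matching the slide. The responsive vs non-responsive table gives (807 + 877) / 10805, about 15.6%, which does not match the stated 0.007%, so that figure may have been computed on a different basis.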

Page 44: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Shared vs lineage-specific GATA1 OSs

Yong Cheng, Kuan-Bei Chen

• ChIP-seq reads in G1E-ER4 + E2 for mouse GATA1 OSs
• ChIP-seq reads in K562 for human GATA1 OSs
• Map each to their respective genomes (ELAND)
• MACS peak calls
– 15,000 in each

• LiftOver each set of peaks to the other species
– About 10,000 liftOver in each

• Run intersection of the liftedOver peaks
• 1000 are shared in both species

Page 45: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Shared GATA1 OSs show higher occupancy level

ChIP-seq signal for GATA1 in each OS

Yong Cheng

Page 46: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Genes close to shared GATA1 OSs are enriched for well-known erythroid functions

Page 47: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Shared GATA1 OSs are enriched in induced genes

                                            Up    Down
Genes with shared GATA1 OSs within 10kb     205   87
Genes without shared GATA1 OS within 10kb   637   704

Yong Cheng

Page 48: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Preserved GATA1 motifs are enriched in the shared GATA1 OSs

                         Shared   Not shared
Preserved WGATAR         704      2090
w/o preserved WGATAR     252      6391

Yong Cheng
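
The enrichment claims on this slide and the previous one each rest on a 2x2 table. One simple way to quantify them, offered here as an assumption rather than the analysis actually run, is the odds ratio together with the fraction in the first category for each group. The function names are illustrative and no significance test is included.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table laid out as [[a, b], [c, d]]."""
    return (a * d) / (b * c)

def first_fraction(first, second):
    """Fraction of a group that falls in its first category."""
    return first / (first + second)

if __name__ == "__main__":
    # Rows: genes with / without a shared GATA1 OS within 10kb. Columns: up, down.
    print("induced-gene table")
    print("  fraction up with shared OS:    %.3f" % first_fraction(205, 87))
    print("  fraction up without shared OS: %.3f" % first_fraction(637, 704))
    print("  odds ratio: %.2f" % odds_ratio(205, 87, 637, 704))

    # Rows: OSs with / without preserved WGATAR. Columns: shared, not shared.
    print("preserved-motif table")
    print("  shared OSs with preserved motif:     %.3f" % first_fraction(704, 252))
    print("  not-shared OSs with preserved motif: %.3f" % first_fraction(2090, 6391))
    print("  odds ratio: %.2f" % odds_ratio(704, 2090, 252, 6391))
```

An odds ratio above 1 in each table is consistent with the slide titles: about 70% of genes near a shared OS are induced versus about 48% of the rest, and about 74% of shared OSs carry a preserved WGATAR versus about 25% of those not shared.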

Page 49: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Genome-wide turnover analysis in mouse GATA1 occupied intervals

mm8.gata1

A: occupied intervals 15,360

B: occupied intervals with binding motifs in reference sequence 12,202

B/A 79.44%

C: occupied intervals with rodent-specific binding motifs 6,565

C/B 53.80%

D: occupied intervals with primate-rodent compensatory pattern 1,383

D/B 11.33%

E: occupied intervals with shared motif between human and mouse 3,018

E/B 24.73%

• B is under-estimated since we are using alignments instead of the mouse sequence itself

• D: the compensatory motifs have a minimum distance of 10 bp

Kuan-Bei Chen

Page 50: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Genome-wide turnover analysis in human

                                                                   hg18.gata1   hg18.myc   hg18.ctcf   hg18.gabp
A: occupied intervals                                              15,252       11,583     26,976      6,442
B: occupied intervals with binding motifs in reference sequence    12,331       9,315      24,913      2,958
B/A                                                                80.85%       80.42%     92.35%      45.92%
C: occupied intervals with primate-specific binding motifs         6,348        4,452      11,297      1,290
C/B                                                                51.48%       47.79%     45.35%      43.61%
D: occupied intervals with primate-rodent compensatory pattern     656          1,200      2,255       348
D/B                                                                5.32%        12.88%     9.05%       11.76%
E: occupied intervals with shared motif between human and mouse    2,446        3,397      12,649      829
E/B                                                                19.84%       36.47%     50.77%      28.03%

Kuan-Bei Chen

Page 51: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Patterns of GATA1 binding sites in shared human/mouse OSs

• Out of 956 shared OSs:

- 200 OSs have rodent-specific GATA1 motifs that are not present in human, chimp, rhesus, dog and cow.

- 626 OSs have GATA1 motifs shared by human and mouse.

- 76 OSs have primate-rodent compensatory patterns.

Kuan-Bei Chen

Page 52: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Investigators on “PSU” mouse ENCODE

• Penn State
– Hardison
– Stephan Schuster
– Frank Pugh
– Robert Paulson
– Francesca Chiaromonte, OSC
– Yu Zhang, OSC
– Webb Miller, OSC
– Anton Nekrutenko, OSC

• Children’s Hospital of Philadelphia
– Mitch Weiss
– Gerd Blobel

• Emory Univ.
– James Taylor

• Univ. Massachusetts
– Job Dekker

• Duke Univ.
– Greg Crawford, consultant
– Terry Furey, consultant

• Cal Tech
– Barbara Wold

Page 53: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Aims of “PSU” mouse ENCODE

Page 54: Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010.

Conservation of function, conservation of sequence (or not)