ENCODE: Understanding the Genome · 2012. 12. 11. · ENCODE: Understanding the Genome Michael...

ENCODE: Understanding the Genome Michael Snyder November 6, 2012 Conflicts: Personalis, Genapsys, Illumina Slides From Ewan Birney, Marc Schaub, Alan Boyle

Page 1

ENCODE: Understanding the Genome

Michael Snyder

November 6, 2012

Conflicts: Personalis, Genapsys, Illumina

Slides From Ewan Birney, Marc Schaub, Alan Boyle

Page 2

Encyclopedia of DNA Elements (ENCODE)

• NHGRI-funded consortium

• Goal: delineate all functional elements in the human genome

• Wide array of experimental assays

• Three phases: 1) Pilot, 2) Scale Up 1.0, 3) Scale Up 2.0

The ENCODE Project Consortium. An Integrated Encyclopedia of DNA Elements in the Human Genome. Nature 2012.

Project website: http://encodeproject.org

Page 3

The ENCODE Consortium

Brad Bernstein (Eric Lander, Manolis Kellis, Tony Kouzarides)

Ewan Birney (Jim Kent, Mark Gerstein, Bill Noble, Peter Bickel, Ross Hardison, Zhiping Weng)

Greg Crawford (Ewan Birney, Jason Lieb, Terry Furey, Vishy Iyer)

Jim Kent (David Haussler, Kate Rosenbloom)

John Stamatoyannopoulos (Evan Eichler, George Stamatoyannopoulos, Job Dekker, Maynard Olson, Michael Dorschner, Patrick Navas, Phil Green)

Mike Snyder (Kevin Struhl, Mark Gerstein, Peggy Farnham, Sherman Weissman)

Rick Myers (Barbara Wold)

Scott Tenenbaum (Luiz Penalva)

Tim Hubbard (Alexandre Reymond, Alfonso Valencia, David Haussler, Ewan Birney, Jim Kent, Manolis Kellis, Mark Gerstein, Michael Brent, Roderic Guigo)

Tom Gingeras (Alexandre Reymond, David Spector, Greg Hannon, Michael Brent, Roderic Guigo, Stylianos Antonarakis, Yijun Ruan, Yoshihide Hayashizaki)

Zhiping Weng (Nathan Trinklein, Rick Myers)

Additional ENCODE Participants: Elliott Margulies, Eric Green, Job Dekker, Laura Elnitski, Len Pennacchio, Jochen Wittbrodt

.. and many senior scientists, postdocs, students, technicians, computer scientists, statisticians and administrators in these groups

NHGRI: Elise Feingold, Mike Pazin, Peter Good

Page 4

Experimental Assays

ChIP-seq (165 TFs + histone marks)

RNA-seq (292)

DNase-seq (~200)

Page 5

RNA-Sequencing

Wang et al. 2009, Nat. Rev. Genet.

Page 6

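Functional data: ChIP-seq

[Figure: ChIP-seq workflow. An antibody against a transcription factor is used for immunoprecipitation; the recovered DNA is sequenced and aligned, giving a peak of 300-500 bp around a motif of 8-12 bp. Also labeled: ChIP-exo, Histone Marks.]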

Page 7

Functional data: DNase-seq

[Figure: DNaseI acts on a region of open chromatin between histones where a transcription factor is bound; after sequencing and alignment the region appears as a DNaseI hypersensitivity peak.]

Page 8

Functional data: DNase footprints

[Figure: DNaseI acts on a region of open chromatin between histones; after sequencing and alignment, the site occupied by the transcription factor appears as a DNaseI footprint.]

Page 9

Examples of Signal Tracks

[Figure, panels a-e. Recoverable labels:
Fraction of SNPs that overlap feature (DNaseI peaks, TF; axis 0 to 0.3): phenotype-associated SNPs versus random sampling of matched SNPs, for genotyped SNPs, 1000 Genomes, and 24 personal genomes.
log2 (fold enrichment or depletion), axis -0.5 to 1.5; GM12878.
GO:0006955 immune response; GWAS enrichment; -log10 p-value; genes above threshold.
Browser view, Human Feb. 2009 (GRCh37/hg19), chr5:39,274,501-40,819,500 (1,545,000 bp): GWAS Catalog; track groups DNase I and TFs; genes C9, DAB2, BC026261, PTGER4, TTC33, OSRF, PRKAA1; phenotypes Crohn's disease, ulcerative colitis, multiple sclerosis.
Zoomed view, chr5:40,390,001-40,440,000 (50,000 bp): SNPs rs4613763, rs17234657, rs11742570, rs6451493, rs1992660, rs6896969, rs1373692, rs9292777; tracks HUVEC GATA2, HUVEC cFOS, HUVEC Input, HUVEC, Jurkat, Th1, Th2.]

Phenotype SNP-Pheno_associations overlap_any_TF_occupancy Gm12878Ebf Gm12878Pol24h8 Gm12878Pol2 Helas3CebpbIggrab K562Ctcf Gm12878Pu1 Gm12878Mef2a K562Pol2V0416101 Gm12878Pax5c20 Gm12878NfkbIggrab Hepg2Ctcf HuvecGata2Ucd Gm12878Elf1sc631V0416101 Gm12878Egr1V0416101 Hepg2Mafkab50322Iggrab HuvecCfosUcd Gm12878Bcl11a Gm12878Irf4 Gm12878Batf Gm12878Tcf12 Hepg2Fosl2 K562Tal1sc12984Iggmus K562MaxV0416102 Hepg2Tcf4Ucd Hepg2Foxa2 Hepg2P300V0416101 Hepg2Foxa1c20 HuvecCtcf Helas3Ctcf Hepg2Jund CACO2.DS8235 HUVEC.all HepG2.all Jurkat.DS12659 hTH1.all hTH2.DS7842 CD34.DS16814

TOTAL 4860 600 78 57 69 69 72 47 47 71 54 35 54 29 44 28 48 50 38 35 45 37 37 44 62 33 57 46 62 40 55 47 70 85 118 62 192 57 81

Height 204 34 7 3 3 7 6 1 3 2 3 2 6 0 4 6 3 2 3 5 5 2 0 2 3 1 2 0 2 5 4 3 3 6 5 4 9 3 7

Systemic_lupus_erythematosus 62 10 4 6 6 2 1 1 4 0 1 4 1 1 4 2 0 1 2 3 4 2 1 0 1 0 0 0 0 1 1 1 1 2 0 0 4 2 1

Crohn's_disease 105 20 2 2 2 2 1 2 2 0 2 1 2 5 1 1 1 3 2 1 1 0 2 1 1 2 1 2 3 2 3 1 3 6 5 3 9 5 5

Ulcerative_colitis 85 11 2 3 3 0 1 2 3 1 3 3 1 2 3 2 1 1 2 1 2 2 0 2 2 1 0 2 2 0 1 1 3 2 5 3 7 2 3

Multiple_sclerosis 71 15 4 3 3 1 0 3 4 2 4 2 0 2 2 1 0 2 4 3 2 3 0 3 1 0 0 0 0 0 0 0 0 1 1 3 5 4 3

Rheumatoid_arthritis 57 11 4 2 2 1 0 4 3 0 4 4 0 0 1 1 0 0 1 0 2 2 0 1 0 0 0 0 0 0 0 0 0 2 2 1 11 3 1

LDL_cholesterol 45 8 0 0 0 2 2 1 0 4 1 0 1 0 1 0 1 0 0 0 0 0 3 0 3 2 3 2 2 1 1 3 1 0 2 1 0 1 0

Bone_mineral_density 65 9 1 1 1 1 2 2 2 1 2 1 1 0 2 2 2 0 1 2 1 1 0 0 1 0 2 2 3 1 1 1 2 2 4 3 3 2 3

Coronary_heart_disease 107 17 2 0 0 2 4 0 0 4 1 2 0 2 0 0 1 1 1 0 0 1 1 1 3 1 2 2 2 1 1 1 3 2 3 0 6 0 1

Chronic_lymphocytic_leukemia 17 8 1 4 5 0 0 3 1 0 2 1 0 0 2 0 1 0 2 1 1 2 0 1 0 1 0 0 0 0 0 0 1 0 0 0 2 0 1

Prostate_cancer 56 8 0 0 0 0 0 0 0 1 0 0 2 1 0 0 3 2 0 0 0 0 2 1 1 4 3 3 3 0 0 2 2 3 1 1 2 0 1

Triglycerides 48 10 0 0 0 1 2 0 0 2 1 0 2 1 1 0 2 2 0 0 0 0 3 1 2 1 2 2 1 2 2 3 0 2 1 0 2 1 0

Celiac_disease 54 11 4 3 3 0 2 2 1 1 1 2 0 0 1 0 0 0 0 1 1 1 1 0 1 1 1 1 2 0 1 2 0 0 0 2 2 1 2

Colorectal_cancer 18 5 0 0 0 1 0 0 0 1 0 0 2 0 0 0 0 0 0 0 0 0 2 0 0 2 3 3 3 0 0 2 2 0 1 0 2 0 1

Hematological_parameters 85 12 3 0 0 3 1 0 3 0 1 1 2 0 0 0 2 0 1 1 2 2 1 1 0 0 1 1 1 0 0 0 1 1 3 3 6 3 5

HIV-1_control 55 10 0 2 4 1 2 0 1 2 2 0 0 0 1 0 0 0 1 1 1 1 1 1 2 1 1 2 1 0 0 1 0 0 0 0 2 1 0

Protein_quantitative_trait_loci 48 7 2 2 2 0 0 2 1 1 1 0 0 1 2 2 1 0 2 1 2 1 0 0 1 0 0 0 0 0 0 1 1 1 2 1 2 1 1

Alzheimer's_disease 42 5 0 0 0 1 2 0 0 1 0 0 2 0 0 0 0 1 0 0 0 0 1 1 1 0 1 1 0 2 2 1 1 2 0 0 2 0 1

HDL_cholesterol 55 8 1 0 0 1 1 0 0 2 1 0 1 0 0 0 1 2 0 0 0 1 2 1 1 1 2 2 2 1 1 2 0 1 2 0 3 1 0

Cholesterol 16 6 1 0 0 0 2 0 0 2 2 0 1 0 1 0 0 1 0 0 0 1 1 0 1 0 0 0 0 2 1 1 0 0 2 0 1 1 0

Longevity 30 5 0 2 3 1 1 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 2 1 2 0 2 1 0 0 1

Attention_deficit_hyperactivity_disorder 102 9 0 0 0 1 2 0 0 1 0 1 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 2 0 1 0 0

Cognitive_performance 111 8 0 0 2 1 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 1 3 0 0 0 0

Type_2_diabetes 97 13 0 0 0 1 1 2 1 1 1 0 1 1 0 0 2 1 1 1 1 0 0 3 1 0 0 0 0 2 1 0 1 0 2 0 4 0 0

Conduct_disorder 38 5 0 1 1 1 3 0 1 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 2 0 0 1 2 2 1 0 1

Type_1_diabetes 67 7 2 1 1 0 0 2 1 0 1 1 0 0 1 0 0 0 1 0 1 0 0 1 1 1 1 2 2 0 1 0 1 0 2 1 5 1 1

Dialysis-related_mortality 26 6 1 0 0 1 1 1 0 0 0 0 2 1 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 1 2 1 0 0 1

Bipolar_disorder 110 6 1 0 0 2 1 0 0 1 0 0 0 1 0 0 0 3 0 0 0 0 1 0 1 0 0 0 0 0 0 1 2 3 1 0 2 0 1

Body_mass 98 5 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 3 0 1 0 0

C-reactive_protein 34 7 0 0 0 0 0 0 0 1 0 0 0 0 0 0 2 0 0 0 0 0 0 0 1 0 1 0 0 1 1 0 1 0 2 1 0 0 1

Menarche_and_menopause 62 6 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 2 0 0 0 0 0 0 0 1 0 0 2 1 1 0

Breast_cancer 43 6 1 0 0 2 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 1 2 2 0 0

Mean_platelet_volume 15 5 1 0 0 0 0 1 0 0 1 0 0 0 0 0 2 0 1 1 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 2 0 1

Soluble_levels_of_adhesion_molecules 5 5 1 0 1 1 0 1 0 2 1 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1

Psoriasis 38 6 1 1 2 0 0 2 1 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 4 0 0

Parkinson's_disease 46 5 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 1 0 0

Obesity 36 5 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Fasting_glucose-related_traits 17 5 1 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 1 1 1 1 1 0 0

Ross Hardison, Belinda Giardine
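The first two numeric columns of the table give, for each phenotype, the number of SNP-phenotype associations and how many of them overlap any TF occupancy. A minimal Python sketch of the "fraction of SNPs that overlap feature" calculation follows, using a few rows copied from the table. The function names are hypothetical, and using the TOTAL row as the baseline for the log2 ratio is an assumption made only for illustration: the figure itself compares against a random sampling of matched SNPs, which is not available here.

```python
import math

# (SNP-phenotype associations, associations overlapping any TF occupancy),
# copied from the first two numeric columns of the table.
rows = {
    "TOTAL": (4860, 600),
    "Height": (204, 34),
    "Crohn's_disease": (105, 20),
    "Ulcerative_colitis": (85, 11),
    "Multiple_sclerosis": (71, 15),
}


def overlap_fraction(associations, overlapping):
    """Fraction of associations whose SNP overlaps the feature."""
    return overlapping / associations


def log2_fold(fraction, baseline_fraction):
    """log2 ratio of a fraction to a baseline (positive = enrichment)."""
    return math.log2(fraction / baseline_fraction)


# Assumption: the TOTAL row stands in for the baseline; the slide uses matched random SNPs.
baseline = overlap_fraction(*rows["TOTAL"])
for name, (n_assoc, n_overlap) in rows.items():
    frac = overlap_fraction(n_assoc, n_overlap)
    print(f"{name}: fraction={frac:.3f}, log2 vs TOTAL={log2_fold(frac, baseline):+.2f}")
```

With these rows, about 12% of all associations overlap TF occupancy, and the Crohn's disease and multiple sclerosis rows sit above that overall fraction.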


Page 10

ENCODE Dimensions

Methods/Factors

Cells

200 Cell Lines/Tissues

200 Assays (~165 ChIP-Seq of different TFs)

3,010 Experiments

~10 TeraBases

~3000x of the Human Genome

Mouse:

126 TF ChIP

70 RNA-Seq
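The "~10 TeraBases" and "~3000x of the Human Genome" figures on this slide are consistent with each other, as a quick calculation shows. The genome size of about 3.1 billion bases is an assumption not stated on the slide, and the function name is hypothetical.

```python
TOTAL_SEQUENCED_BASES = 10e12   # ~10 terabases, from the slide
HUMAN_GENOME_BASES = 3.1e9      # assumed haploid human genome size


def fold_coverage(total_bases, genome_bases):
    """How many times the sequenced bases would cover the genome."""
    return total_bases / genome_bases


print(f"~{fold_coverage(TOTAL_SEQUENCED_BASES, HUMAN_GENOME_BASES):.0f}x")
```

The result is a little over 3,000-fold, in line with the rounded figure on the slide.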

Page 11

ENCODE Uniform Analysis Pipeline

Mapped reads from production (BAM)

Uniform Peak Calling Pipeline (SPP, PeakSeq)

IDR Processing, QC and Blacklist Filtering

Motif Discovery Stats, GSC enrichments, etc.

Signal Aggregation over peaks

Signal Generation (read extension and mappability correction)

Segmentation

[Figure: Rep1 vs. Rep2 panels labeled "Poor reproducibility" and "Good reproducibility"]

Self Organising Maps

ChromHMM/Segway

Anshul Kundaje, Qunhua Li, Michael Hoffman, Jason Ernst, Joel Rozowsky, Pouya Kheradpour
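As an illustration of the blacklist-filtering step named in the pipeline, the sketch below drops any peak that overlaps a blacklisted region. It is only a minimal example under stated assumptions: the interval representation, the function names, and the coordinates are all hypothetical, and the slide does not say how the consortium implemented this step.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates; a hypothetical representation.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals are on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def filter_blacklisted(peaks: List[Interval], blacklist: List[Interval]) -> List[Interval]:
    """Keep only the peaks that overlap no blacklisted region."""
    return [p for p in peaks if not any(overlaps(p, b) for b in blacklist)]


# Made-up coordinates for illustration only.
peaks = [("chr1", 100, 400), ("chr1", 5000, 5300), ("chr2", 100, 400)]
blacklist = [("chr1", 350, 1000)]
kept = filter_blacklisted(peaks, blacklist)
print(kept)
```

The nested scan is quadratic in the number of intervals; for genome-scale peak sets a sorted sweep or an interval tree would be the usual choice.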

Page 12

Raw genome coverage of elements

Element Type                 Coverage   Cumulative Coverage
Exons                        3%         3%
ChIP-seq bound motifs        4.5%       5%
DNaseI Footprints            5.7%       9%
ChIP-seq bound regions       8.1%       12%
DNaseI HS regions            15.2%      19.4%
Histone Modifications (*)    44%        49%
RNA                          62%        80%

(* excluding broad marks)

[Figure labels: Region; Bound Motif/Footprint]

(Union over all experiments and cell types)
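The cumulative column in the coverage table grows more slowly than the sum of the individual coverages because element types overlap: each cumulative value is the coverage of the union of all element types up to that row. The sketch below shows that calculation on a toy genome of 1,000 bases. The intervals, track names, and function names are made up for illustration and are not ENCODE data.

```python
def merge(intervals):
    """Merge overlapping or touching (start, end) intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_bases(intervals):
    """Number of bases covered by the union of the intervals."""
    return sum(end - start for start, end in merge(intervals))


def coverage_table(tracks, genome_length):
    """For each track, return (name, own coverage, cumulative coverage of the union so far)."""
    rows, seen = [], []
    for name, intervals in tracks:
        seen.extend(intervals)
        rows.append((name,
                     covered_bases(intervals) / genome_length,
                     covered_bases(seen) / genome_length))
    return rows


# Toy intervals on a 1,000-base genome (hypothetical values).
GENOME_LENGTH = 1000
tracks = [
    ("exons", [(0, 30)]),
    ("bound_motifs", [(20, 50), (100, 115)]),
    ("footprints", [(40, 60), (200, 237)]),
]
for name, own, cumulative in coverage_table(tracks, GENOME_LENGTH):
    print(f"{name}: {own:.1%} own, {cumulative:.1%} cumulative")
```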

Page 13

ENCODE Integrative Segmentations

~7 Major genome segments

25 “elaborations”

1,000s of details

Well Known: TSS, Gene Start, Gene Bodies

New Info: “Enhancers” (2 states), Insulators

Unexpected: Specific Gene End

Page 14

Experimental Confirmation of New Enhancers

Mann-Whitney p-values: 0.003 (HMM vs. background); 1e-7 (HMM vs. naïve or biologist picks)

Myers Lab

53% hit rate in Mouse Assay

Pennacchio Lab

Jason Gertz, Barbara Wold, Rick Myers, Len Pennacchio
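For readers unfamiliar with the Mann-Whitney comparison quoted on this slide, here is a small pure-Python sketch of the test statistic and a two-sided p-value from the normal approximation. It omits the tie and continuity corrections that statistical packages usually apply, the function names are hypothetical, and the activity values are invented; it does not reproduce the slide's data or its p-values.

```python
import math


def mann_whitney_u(x, y):
    """U statistic for sample x: count of (x_i, y_j) pairs with x_i > y_j; ties count 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def two_sided_p_normal(u, n1, n2):
    """Two-sided p-value from the normal approximation (no tie or continuity correction)."""
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))


# Invented reporter activities for predicted enhancers vs. background regions.
predicted = [2.9, 3.4, 4.1, 5.0, 6.2, 7.7]
background = [0.8, 1.1, 1.5, 2.0, 2.4, 3.0]
u = mann_whitney_u(predicted, background)
p = two_sided_p_normal(u, len(predicted), len(background))
print(f"U = {u}, approximate two-sided p = {p:.4f}")
```

For samples this small an exact test would be preferable; the normal approximation is used here only to keep the sketch short.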

Page 15

Many other stories…

TF co-association and regulatory code (Mike Snyder + Mark Gerstein)

Splicing/histone interaction (Roderic Guigo)

RNA landscape (Tom Gingeras)

DNaseI footprints (John Stamatoyannopoulos)

DNA methylation (Rick Myers)

Page 16


Page 17

Page 18

Saturation

[Figure: number of elements (y-axis, 0 to 1,200,000) against number of cell lines (x-axis, 0 to 60).]

Most aggressive fit for saturation suggests that at most 50% of elements have been discovered. The true fraction is likely to be lower because of inaccessible cell types, etc.

Steve Wilder
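The slide does not say which model was fitted to the saturation curve. As one possible illustration, the sketch below assumes a hyperbolic saturation model, elements(n) = total * n / (k + n), and fits it by ordinary least squares on the reciprocals (a Lineweaver-Burk style linearization). The model choice, the function names, and the synthetic counts are all assumptions made for this example and are not the analysis behind the slide.

```python
def fit_saturation(cell_line_counts, element_counts):
    """Fit elements(n) = total * n / (k + n) by linear regression of 1/elements on 1/n."""
    xs = [1.0 / n for n in cell_line_counts]
    ys = [1.0 / e for e in element_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    total = 1.0 / intercept      # estimated plateau: all discoverable elements
    k = slope * total            # number of cell lines at which half the plateau is reached
    return total, k


def discovered_fraction(n_cell_lines, k):
    """Fraction of the estimated plateau reached with n cell lines under the model."""
    return n_cell_lines / (k + n_cell_lines)


# Synthetic counts generated from the model itself (total=1000, k=30); not ENCODE data.
cell_lines = [5, 10, 20, 40, 60]
elements = [1000.0 * n / (30.0 + n) for n in cell_lines]
total, k = fit_saturation(cell_lines, elements)
print(f"estimated total = {total:.1f}, k = {k:.1f}, "
      f"fraction discovered at 60 cell lines = {discovered_fraction(60, k):.2f}")
```

Fitting on reciprocals gives extra weight to the points with few cell lines, so with noisy real data a direct nonlinear fit would normally be preferred.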

Page 19

Discovering functional genome segments

Well understood: TSS, Gene Start, Gene Bodies

Reassuringly interesting: “Enhancers” (2 states), Insulators

Definitely there, unexpected: Specific Gene End, Sub-classification of Repeats

~7 major segments of the genome

25 “elaborations”

1,000s of details

Michael Hoffman, Jason Ernst, Bill Noble, Manolis Kellis

Page 20

Irreproducible Discovery Rate (IDR)

If one re-ran the experiment, what is the probability that one would observe the same element at this rank or better?

Uses ranked element lists from two replicates and assumes that there is noise at the bottom of the ranking.

[Figure panels: ChIP-seq, DNase-seq, RNA-seq]

Ben Brown, Qunhua Li, Peter Bickel
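The intuition behind comparing ranked lists from two replicates can be shown with a much simpler diagnostic than IDR itself: the fraction of the top k elements of one replicate that also appear in the top k of the other. This sketch is not the IDR procedure, whose statistical model the slide does not describe. The function name and the element identifiers are hypothetical.

```python
def top_k_overlap(ranked_rep1, ranked_rep2, k):
    """Fraction of the top-k elements of replicate 1 that are also in the top k of replicate 2."""
    return len(set(ranked_rep1[:k]) & set(ranked_rep2[:k])) / k


# Hypothetical ranked element IDs: the strong elements (p*) agree across replicates,
# while the low-ranked ones (n*, m*) are replicate-specific noise.
rep1 = ["p1", "p2", "p3", "p4", "n1", "n2", "n3", "n4"]
rep2 = ["p2", "p1", "p4", "p3", "m1", "m2", "m3", "m4"]

for k in (2, 4, 6, 8):
    print(f"top {k}: overlap = {top_k_overlap(rep1, rep2, k):.2f}")
```

In this toy case the overlap stays at 1.0 through the top four elements and falls once the noisy tail is included, which is the pattern the slide's assumption of noise at the bottom of the ranking describes.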