The Encyclopedia of DNA Elements (ENCODE) Project Elise A. Feingold, Ph.D. National Human Genome Research Institute National Institutes of Health AgENCODE Workshop January 10, 2014

How can we “read” the human genome sequence?

• Genetic code, but no genomic code
• Evolutionary conservation helps to identify functionally important regions
– ~5% conserved / ~1.5% protein coding
– What is the function of non-coding conserved sequences?
– What is the function of non-conserved sequences?
• Moderately good at identifying protein-coding regions, but fine structures are difficult to predict from sequence
• Regulatory regions can be very far away from genes
• Need unbiased experimental investigation

ENCODE: Encyclopedia of DNA Elements

Compile a comprehensive encyclopedia of all sequence features in the human genome and in the genomes of selected model organisms

Approach: Apply lessons learned from the success of the Human Genome Project
• Start with well-defined pilot project
• Develop and test high-throughput technologies

Community Resources
• Use by research community to enhance understanding of:
– regulation of gene expression on a spatial, temporal and quantitative level
– genetic basis of disease
• Rapid pre-publication data release
• Consortia publications
• Analysis requires development of:
– Common data reporting formats
– Data standards
– Analytical tools

ENCODE Timeline

ENCODE Products

“Marker” Papers

PLoS Biol (2011) 9:e1001046

modENCODE Publications

19 companion papers in Nature, Genome Research, Genome Biology and Database

ENCODE 2 Publications

September 2012

modENCODE and ENCODE 2 Final Efforts

modENCODE

• Cross-species Analyses
– Transcription
– Chromatin
– Regulation

• Transfer of data and analyses to ENCODE 3 DCC

ENCODE “2”

• Mouse ENCODE Cross-species Analyses

• Transfer of data and analyses to ENCODE 3 DCC

ENCODE Data

Modified from PLoS Biol (2011) 9:e1001046

ENCODE 2 Data

Human Data: >2,800 Datasets
• >200 Cell types
• >250 RNA-seq
• 150 DNase
• 1,100 Transcription factor binding
• >200 Histone modification
• 90 DNAme
• GENCODE mRNA
• Functional Characterization

Mouse Data: >600 Datasets
• 100 Cell types
• 100 RNA-seq
• 50 DNase
• 170 Transcription factor binding
• 170 Histone modification

ENCODE Dimensions

• Cells: 182 cell lines/tissues
• Methods/Factors: 164 assays (114 different ChIP)
• 3,010 experiments
• 5 terabases
• 1,716x of the human genome

Ewan Birney

More than 30 papers in
• Nature
• Genome Research
• Genome Biology
• Science
• Cell

Publishing innovations
• Threads of themes
• Virtual machines
• iPad app

ENCODE increased our understanding of non-coding DNA and human disease

ENCODE 2 Publications

From www.nature.com/encode

High-Level Findings
• Very large fraction of the genome is biochemically active
– 80% of the genome has an ENCODE annotation in at least one cell type
– Fraction that is functional TBD
• GWAS SNPs are enriched within non-coding functional elements
– >50% of non-coding GWAS SNPs are near ENCODE-defined regions
– In many cases, disease phenotypes can be associated with a specific cell type or transcription factor
• Segmenting the genome into 7 chromatin states predicts ~400,000 enhancers and ~70,000 promoters as well as 1000s of quiescent states
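The statement that a share of GWAS SNPs lie "near" ENCODE-defined regions is, computationally, an interval-overlap question. The following is a minimal sketch of that kind of check, not the Consortium's actual analysis: the function names, the `window` parameter that stands in for "near", and all coordinates are hypothetical and chosen only for illustration. Regions are treated as half-open intervals `[start, end)`.

```python
from bisect import bisect_right


def build_index(regions):
    """Group (chrom, start, end) regions by chromosome and merge overlaps.

    Returns {chrom: (starts, ends)} with sorted, disjoint half-open intervals.
    """
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = merged[-1]
            if start <= last_end:
                # Overlapping or touching: extend the previous interval.
                merged[-1] = (last_start, max(last_end, end))
            else:
                merged.append((start, end))
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def is_near(index, chrom, pos, window=0):
    """True if pos is inside, or within `window` bases of, any region."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last merged interval whose start is at or before pos + window.
    i = bisect_right(starts, pos + window) - 1
    return i >= 0 and ends[i] > pos - window


def fraction_near(index, snps, window=0):
    """Fraction of (chrom, pos) SNPs that are near at least one region."""
    if not snps:
        return 0.0
    hits = sum(is_near(index, chrom, pos, window) for chrom, pos in snps)
    return hits / len(snps)


# Hypothetical toy data, not real ENCODE regions or GWAS SNPs.
regions = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 500, 600)]
snps = [("chr1", 250), ("chr1", 350), ("chr2", 450), ("chr3", 10)]

index = build_index(regions)
print(fraction_near(index, snps, window=0))    # 0.25
print(fraction_near(index, snps, window=100))  # 0.75
```

Merging the regions first keeps the interval ends sorted, so a single binary search per SNP is enough. How "near" is defined (the `window` value here) changes the resulting fraction substantially, as the two calls on the toy data show.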

Non-coding DNA Is Important For Disease And Evolution

• Non-coding DNA variants are known to cause human diseases

• Non-coding variants are known to cause changes in drug metabolism

• About 90% of GWAS findings lie outside of protein-coding regions

• More than 80% of recent adaptation signatures in three recent studies are not associated with protein-coding mutations

Stamatoyannopoulos, Science 337-1190, 2012; Kingsley, Nature 484-55, 2012; Sabeti, Cell 152-703, 2013; Fraser, Genome Research, doi:10.1101/gr.152710.112, 2013

Data Access

www.encodeproject.org

UCSC Genome Browser

Ensembl

www.modENCODE.org

NCBI

FlyBase

WormBase

ENCODE Portal http://encodeproject.org

Displaying ENCODE data from ENCODE portal

http://encodeproject.org

ENCODE Experiment Matrix

http://encodeproject.org

ENCODE Data Standards

http://encodeproject.org

ENCODE Software Tools

http://encodeproject.org

Publications

http://encodeproject.org

[Figure: Cumulative ENCODE Publications Over Time. Y-axis: Number of Publications (0 to 600). Series: Papers from Non-ENCODE Authors; Papers from ENCODE 2 Production Groups.]

[Figure: Cumulative Publications Using ENCODE Data by Non-ENCODE Authors. Y-axis: Number of Publications (0 to 160). Categories: Basic Biology; Methods Development; Human Disease.]

Use of ENCODE Data in Linking Genotype to Phenotype

• ENCODE data can be used in hypothesis generation and refinement
– What is the causal variant?
– What is the target gene?
– What is the target cell type?
– How does the variant alter the phenotype?

Social Media

Facebook: ENCODE (ENCyclopedia Of DNA Elements)

Twitter: @ENCODE_NIH

ENCODE Tutorial Pages

http://www.genome.gov/27553900

ENCODE 3

Catalog is incomplete

• Only a small fraction of transcription factors studied

• Deeper analysis across many additional cell types (more primary cells) needed

• Additional data types need to be studied, e.g., RNA-binding proteins, lncRNAs

ENCODE 3 Solicitation

• Comprehensive catalogs of functional elements

• Existing capacity for high-throughput, efficient production

• Centralized production, management & coordination

• 7 high priority scientific areas

• More integrated data coordination and analysis

• Primary focus on human, secondary focus on mouse

• Fly/worm allowed if demonstrate need for:
– Highly centralized effort for specific data type
– Work to be undertaken as part of highly interactive consortium

Priority areas
• Maps of all classes of functional RNA molecules
• Fine structural genome annotation (of the human and mouse genomes only) by improving gene models
• Maps of sites of open chromatin
• Maps of selected histone marks and other relevant chromatin proteins
• Maps of sites of DNA methylation
• Maps of all functional sequence elements within RNA molecules
• Maps of the binding sites for more transcription factors, using a minimum of two cell types for each previously unstudied factor, and additional, well justified cell types as resources permit
– For transcription factors for which binding site maps already exist, development of maps in additional cell types will be considered, but will be of lower priority and expansion of this data set must be strongly justified

ENCODE 3 Structure

[Diagram. Labels: Gene Models; RNA; TF Binding; Data Coordination Center; Data Analysis Center; Analysis Working Group; Element ID; Chromatin States; Histone Mods; DNase; DNAme; RBP Binding; Computational Analysis Groups; Technology Development Groups; Data Production Groups]

Project Management

• Monthly teleconference calls

• Working groups to address specific issues

• Data Analysis Working Groups

• Annual meetings

• Project oversight by external advisors

Individual Project Management

• Yearly quantitative milestones
• Quarterly progress reports
– Track status of experiments and data submission to identify bottlenecks
– Track costs
– Additional narrative section to track non-quantitative milestones, e.g., technology development, and to discuss bottlenecks

Participants
• Groups funded by ENCODE solicitations
• Open to additional data production or data analysis groups agreeing to criteria for participation
– Genome-wide analysis
– Full participation in Consortium activities
– Abide by data release policy
– Demonstrated funding source
• Encourage inter-consortia collaborations
• Encourage other collaborations/coordination

ENCODE Consortium Activities

[Diagram. Labels: Peak Calling; ChIP/CLIP/RIP-seq; Human Subjects; Operational; Human Resources; Policies/Logistics; Mouse Resources; Data Release and Publications; Outreach; Functional Characterization and Validation; Data Coordination, Analysis, and Interpretation; Analysis Working Group; Datatype Specific Coordination; DNase; RNA; Binding; DCC; DAC; EDCAC; Consortium; Production PI]

ENCODE Wiki

Nature 489-49,2012

Lessons Learned
• Plan data collection
– Develop focused project goals and target end users in advance
– Employ high-throughput, robust methods
    • Keep production and technology development pipelines separate
– Centralize data collection to the extent possible to maximize economies of scale and consistent data quality
– Generate data on common samples to the extent possible
    • Consider centralized sample collection/distribution
    • Very powerful to have multiple data types on same samples
– Develop metadata useful for people outside of project
– Develop experimental standards, data quality metrics and uniform data processing
    • Especially needed if multiple groups are generating data using same experimental assays
    • Ensure high (known) data quality
    • Perform data quality evaluation on ongoing basis

Lessons Learned

• Devote sufficient resources to bioinformatics (data storage, processing and analysis)

– Don't assume that organism-specific community will come together on its own for analysis without dedicated support

• Be realistic about data analysis and publication timeline
– Overestimate by at least 2X

• Create centralized mode of sharing information
– e.g., wiki sites, Google Docs

Lessons Learned
• Need for significant, centralized management
– Explicit, written guidelines, standards and rules
    • e.g., policies for data release, publications
• Balance needs of individual investigators with those of Consortium
– Retain ability to publish independently
– Focus on global data production and analysis
– Beware of focus on individual research agendas and “interesting biology”
• Foster collegial interactions
– Encourage diversity of opinions
– Keep consortium open and bring in needed expertise
– Avoid “group think”
– Have explicit process for decision making

Summary
• Set clear goals, articulate to community
• Maximize utility of data to the community
– Rapid pre-publication data release
– High (knowable) data quality
– Data standards
– Interoperability with other projects, especially metadata
• Take advantage of high-throughput production capabilities to maximize economies of scale
• Open consortium
• Set and monitor production milestones
• Facilitate communication between data production groups and computational analysis
• Devote sufficient resources (data production, analysis and infrastructure)

AgENCODE Considerations

• Focused goals
• Number of species
• Quality of genome sequence
• Number of individuals per species
• Number of phenotypes
• Number of tissues/cell types

ENCODE Production Centers

Bradley Bernstein (John Rinn, Manolis Kellis)

Thomas Gingeras (Carrie Davis, Roderic Guigo)

Brenton Graveley (Christopher Burge, Xiang-Dong Fu, Eugene Yeo)

Richard Myers (Devin Absher, Gregory Cooper, Shawn Levy, Florencia Pauli Behn, Ross Hardison, Ali Mortazavi, Timothy Reddy, Barbara Wold)

Bing Ren (Joseph Ecker, Len Pennacchio, Axel Visel, Wei Wang)

Michael Snyder (Kevin White, Sherman Weissman, Peggy Farnham)

John Stamatoyannopoulos (Ralph Hansen, Rajinder Kaul, Patrick Navas, George Stamatoyannopoulos, Piper Treuting, Michael Bender, Job Dekker, Mark Groudine)

ENCODE Data Coordination Center

Mike Cherry (Jim Kent)

ENCODE Data Analysis Center

Zhiping Weng (Mark Gerstein, Manolis Kellis, Roderic Guigo, Rafael Irizarry, Xiaole Shirley Liu, William Stafford Noble)

Additional ENCODE Participants

Timothy Hubbard (Mark Gerstein, Roderic Guigo, Jen Harrow, Rachel Harte, David Haussler, Manolis Kellis, Alexandre Reymond, Stephen Searle, Alfonso Valencia)

David Gilbert (Tamer Kahveci)

ENCODE 3 ENCODE Computational Analysis Groups

Peter Bickel (Haiyan Huang, Leonard Lipovich, Bin Yu)

David Gifford (Tommi Jaakkola)

Sunduz Keles (Emery Bresnick, Colin Dewey)

Robert Klein (Christina Leslie, Souma Raychaudhuri, Ross Levine, Kenneth Offit)

Jonathan Pritchard (Yoav Gilad)

Xinshu Xiao

ENCODE Technology Development Groups

Christopher Burge (Wendy Gilbert, Brenton Graveley, Robert Horvitz)

Barak Cohen and Joseph Corbo

Peggy Farnham (Victor Jin, David Jay Segal)

R. David Hawkins

Christina Leslie (Christopher Mason)

Jason Lieb (Karen Mohlke, Eran Segal)

Mats Ljungman (Thomas Wilson)

Tarjei Mikkelsen

Jay Shendure and Nadav Ahituv (Michael McManus)

Alexey Wolfson

Guo-Cheng Yuan (Stuart Orkin)

… and many senior scientists, postdocs, students, technicians, computer scientists, statisticians and administrators in these groups

Current ENCODE participants: http://www.genome.gov/26525220

The ENCODE 3 Consortium

NHGRI Staff

Program Directors: Elise Feingold, Peter Good, Michael Pazin

Deputy Director: Mark Guyer

Division Director: Jeff Schloss

Program Analysts: Sherry Zhou, Preetha Nandi