ENCODE Portal and Uniform Processing Pipelines : Open Web ...

ENCODE Portal and Uniform Processing Pipelines : Open Web and Programmatic Access to ENCODE Data, Metadata, and Software Pipelines ENCODE Data Coordination Center Stanford University, Department of Genetics Asia Pacific Bioinformatics Conference January 10, 2016


Page 1

ENCODE Portal and Uniform Processing Pipelines: Open Web and Programmatic Access to ENCODE Data, Metadata, and Software Pipelines

ENCODE Data Coordination Center, Stanford University, Department of Genetics

Asia Pacific Bioinformatics Conference

January 10, 2016

Page 2

Schedule for Workshop
•  Welcome & who are you?
•  Introduction to ENCODE
•  DCC, its role in ENCODE
•  ENCODE Portal (~45 min)
•  Data Access & Availability
•  Data Processing via Uniform Pipelines (~45 min)

ENCODE DCC

Page 3

What would you like to learn?

How many of you:

1. …work in a lab that uses omics methodology?

2. …work in a computational lab that analyses omics data?

3. …have downloaded ENCODE data and intersected it with other data?

4. …know where to go for a comprehensive catalog of all assays done by ENCODE?

5. …could repeat an ENCODE analysis (from FASTQ files) to generate IDR-thresholded sets of peaks?

6. …want to repeat one of the ENCODE analysis pipelines on your data?

7. …need to access ENCODE data but have found it difficult or don't know where to begin?

Page 4

What is ENCODE?
•  13-year-old NIH project, three phases
•  Standards
   –  Experimental methods and quality metrics
   –  Antibody standards
   –  Metadata for experiment, biosample & pipeline
   –  Transparent access
   –  Full data sharing

•  Data
•  Tools & Pipelines
•  Results & Publications


Page 5

ENCODE mapping features of the genome

Image courtesy of Mike Pazin; modified from PLoS Biol 9:e1001046, 2011

Science 306:636, 2004

Page 6

Page 7

NIH ENCODE and Roadmap Epigenomics projects have produced > 150 TB of data

ENCODE: 5,403 experiments (+1,316); 3,659 biosamples (+3,230)

Roadmap Epigenomics: 3,137 experiments; 985 biosamples

REMC Data Coordinating Center: Aleks Milosavljevic, Baylor College of Medicine

Page 8

Nature feature: Challenges in irreproducible research

http://www.nature.com/news/reproducibility-1.17552

•  QC & measures of confidence
•  Antibodies
•  Standard & unique IDs
•  Built-in replication
•  Data sharing

Page 9

Metadata integration using ontologies

Page 10

ENCODE Portal

•  Central source for ENCODE data: experimental and analysis data
•  Hub for project information: data standards & publications
•  High-quality metadata: data provenance & transparency

Page 11

The ENCODE Portal at a glance: www.encodeproject.org

Sloan et al., 2016, NAR
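Besides the web interface, the portal is meant to be queried programmatically. A minimal sketch, assuming the portal returns JSON when `format=json` is added to a search URL and that search hits are listed under an `@graph` key; the filter names (`type`, `assay_term_name`, `limit`) and the helper names are illustrative assumptions, not a documented client.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

PORTAL = "https://www.encodeproject.org"


def build_search_url(**filters):
    """Build a portal search URL that asks for JSON instead of HTML (assumed format=json convention)."""
    params = dict(filters)       # keep the caller's filter order
    params["format"] = "json"    # request machine-readable output
    return f"{PORTAL}/search/?{urlencode(params)}"


def accessions_from_search(payload):
    """List accessions from a search response, assuming hits sit under '@graph'."""
    return [hit["accession"] for hit in payload.get("@graph", []) if "accession" in hit]


def fetch_json(url):
    """Download and decode a JSON document from the portal (requires network access)."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request) as response:
        return json.load(response)
```

Under these assumptions, `accessions_from_search(fetch_json(build_search_url(type="Experiment", assay_term_name="ChIP-seq", limit=5)))` would list the accessions of a handful of ChIP-seq experiments. Keeping URL building and parsing separate from the network call makes both easy to check offline.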

Page 12

Rich experimental metadata is collected and presented for clarity and context. For example:

Platform
•  Instrument
•  Read length
•  Single or paired end
•  Lane number
•  Sequencing depth

Treatment & genetic modifications
•  Agent (chemical, biological)
•  Concentration
•  Duration
•  Construct type
•  Tag
•  Tag location
•  Insert sequence
•  Target
•  Transfection type
•  Protocol

Donor & biosample
•  Species
•  Age
•  Sex
•  Health status
•  Ethnicity
•  Strain
•  Type (e.g. tissue, cell line)
•  Source
•  Product ID
•  Lot ID
•  Dates (e.g. growth, harvest, procurement)
•  Passage number
•  Starting amount
•  Lab-assigned IDs

Library preparation
•  Lysis method
•  Sonication method
•  Extraction method
•  Nucleic acid type
•  Nucleic acid size range
•  Library preparation protocol
•  Strand specificity
•  Size selection method
•  Validation document

Page 13

Metadata integration using ontologies

•  EFO (for cell lines): http://www.ebi.ac.uk/efo/
•  UBERON (for tissues): http://uberon.org/
•  CL (for primary cells): http://cellontology.org/
•  OBI (for assays): http://obi-ontology.org

[Figure: DCC / ENCODE portal / other projects]
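Ontology terms are commonly written as compact identifiers (CURIEs) of the form `PREFIX:local-id`, so the prefix alone tells you which of the ontologies above a term comes from. A minimal sketch, assuming biosample and assay terms are stored in that form; `classify_term`, `ONTOLOGY_ROLES`, and the example ID are illustrative and not taken from the portal.

```python
# Ontology sources listed on the slide, keyed by CURIE prefix.
ONTOLOGY_ROLES = {
    "EFO": "cell line",
    "UBERON": "tissue",
    "CL": "primary cell",
    "OBI": "assay",
}


def classify_term(term_id):
    """Split a CURIE such as 'UBERON:0002107' and report which slide category its prefix maps to."""
    prefix, sep, local_id = term_id.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {term_id!r}")
    # Prefixes outside the four listed ontologies are reported as "unknown".
    return prefix, local_id, ONTOLOGY_ROLES.get(prefix, "unknown")
```

Because every project that uses the same ontology shares the same identifiers, metadata annotated this way can be matched across the DCC and other projects without string-matching free-text names.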

Page 14

Main components of an experimental analysis are uniquely accessioned

•  Experiments: ENCSR###XXX
•  Biosamples: ENCBS###XXX
•  Donors/strains: ENCDO###XXX
•  Libraries: ENCLB###XXX
•  Antibody lots: ENCAB###XXX
•  Files: ENCFF###XXX
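Because the object type is encoded in the accession itself, a script can tell what an identifier refers to before contacting the portal. A minimal sketch, assuming `###` stands for three digits and `XXX` for three capital letters; the helper name is hypothetical, and the portal may use prefixes beyond the six listed here.

```python
import re

# Two-letter codes after "ENC", mapped to the object types listed on the slide.
ACCESSION_TYPES = {
    "SR": "Experiment",
    "BS": "Biosample",
    "DO": "Donor/strain",
    "LB": "Library",
    "AB": "Antibody lot",
    "FF": "File",
}

# Assumed shape: "ENC" + type code + three digits (###) + three capital letters (XXX).
ACCESSION_RE = re.compile(r"ENC(SR|BS|DO|LB|AB|FF)\d{3}[A-Z]{3}")


def accession_type(accession):
    """Return the object type for an accession, or None if it does not fit the pattern."""
    match = ACCESSION_RE.fullmatch(accession.strip())
    return ACCESSION_TYPES[match.group(1)] if match else None
```

Matching strictly (uppercase only, whole string) rejects typos early instead of letting a malformed identifier reach a search.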

Page 15

Each antibody lot is characterized & accessioned

Page 16

Replication and transparency of methods

ENCODE experiments are designed to have at least two replicates.
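The replication rule above can be checked mechanically on experiment metadata. A minimal sketch, assuming a simplified experiment record with a `replicates` list whose entries carry a `biological_replicate_number`; these field names are assumptions, and since the slide does not say whether the two replicates must be biological rather than technical, the sketch counts distinct biological replicate numbers as one possible reading.

```python
def biological_replicates(experiment):
    """Return the sorted, distinct biological replicate numbers in an experiment record.

    Assumes (hypothetically) that experiment["replicates"] is a list of dicts,
    each carrying a "biological_replicate_number" key.
    """
    numbers = {
        rep["biological_replicate_number"]
        for rep in experiment.get("replicates", [])
        if "biological_replicate_number" in rep
    }
    return sorted(numbers)


def meets_replicate_design(experiment, minimum=2):
    """True if the experiment has at least `minimum` distinct biological replicates."""
    return len(biological_replicates(experiment)) >= minimum
```

Two technical replicates of the same biological sample share one biological replicate number, so under this reading they would not satisfy the check on their own.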

Page 17

Data provenance & process transparency

[Figure: DNase I; RNA-seq, DNase-seq, ChIP-seq, Bisulfite-seq → "?" pipeline → processed data for further analyses or visualization]

Avoid the pipeline black box
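Avoiding the black box means every processed file can be traced back to the raw data it came from. A minimal sketch of that idea, assuming each file record lists its parents under a `derived_from` key; the record shape and the identifiers in the example are hypothetical simplifications, not the portal's exact JSON.

```python
def trace_provenance(file_id, files):
    """Follow "derived_from" links from one file back to the files that have no parents.

    `files` maps a file identifier to a simplified record; each record may list
    parent identifiers under "derived_from" (a hypothetical shape for this sketch).
    """
    roots, seen, stack = set(), set(), [file_id]
    while stack:
        current = stack.pop()
        if current in seen:  # skip shared parents and break cycles
            continue
        seen.add(current)
        parents = files.get(current, {}).get("derived_from", [])
        if parents:
            stack.extend(parents)
        else:
            roots.add(current)  # nothing upstream: treat as a raw input
    return sorted(roots)
```

For instance, with a peak file derived from two alignments that were in turn derived from read files, `trace_provenance` walks both branches and returns the read files, which is the question a user asks when deciding whether to trust or reproduce a processed result.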

Page 18

The ENCODE DCC

Mike Cherry (PI), Ben Hitz, Cricket Sloan

Tim Dreszer, Marissa Melen, Laurence Rowe, Forrest Tanaka, Stuart Miyasato, Matt Simison, Zhenhua Wang

Esther Chan, Jean Davidson, Idan Gabdank, Seth Strattan, Marcus Ho, Aditi Narayanan, Jason Hilton, Kathrina Onate

@encodedcc · https://github.com/ENCODE-DCC/