DAS for ENCODE data coordination. Felix Kokocinski, WTSI.


Project Overview

Goal:

Annotate all evidence-based gene features at a high accuracy across the human genome

– protein-coding loci with isoforms

– non-coding loci with transcript evidence

– pseudogenes

Partners:

– HAVANA & EnsEMBL, Sanger Institute, UK

– University of Lausanne, CH

– Centre for Genomic Regulation, ES

– Spanish National Cancer Research Centre, ES

– University of California Santa Cruz, USA

– Washington University in St. Louis, USA

– Broad Institute of MIT and Harvard, USA

– Yale University, USA

Manual Genome Annotation

• ~20 annotators working according to HAVANA guidelines

• computational pipeline for alignments

• Otterlace software

• input from partner groups; import of data sources via DAS

• verification with RT-PCR, RACE & sequencing
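
As a rough illustration of the DAS import mentioned above, the sketch below builds a DAS `features` request URL for one genomic region. The `/das/<source>/features?segment=<id>:<start>,<stop>` layout follows the general DAS request convention; the server address and source name are placeholders invented for this example, not real ENCODE endpoints.

```python
from urllib.parse import quote


def das_features_url(server, source, segment, start, end):
    """Build a DAS 'features' request URL for one segment range."""
    # DAS addresses a region as segment=<id>:<start>,<stop>
    region = f"{segment}:{start},{end}"
    base = server.rstrip("/")
    # Keep ':' and ',' unescaped so the region stays readable
    return f"{base}/das/{quote(source)}/features?segment={quote(region, safe=':,')}"


# Hypothetical server and source name, used for illustration only
url = das_features_url("http://das.example.org", "example_source", "1", 1000000, 1100000)
print(url)
```

A client such as an annotation tool would fetch this URL and parse the returned XML features.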

Data Exchange using DAS

[Diagram; labels: Distributed Annotation Sources; WWW interface; GenTrack (tracking system); Otterlace (annotation software); high-priority issues; experimental verification issues; Perl API; Source Adaptors; Update Scripts]

GenTrack Annotation Tracking

• extension of the open-source RoR ticketing system Redmine (www.redmine.org)

• data import via DAS

• modules for analyzing and flagging data

• www.sanger.ac.uk/gentrack

GenTrack: Workflow

• Entry points:

– List of all genes & transcripts in region

– High-priority loci

– Loci with specific tags

• Identify problem, compare in Otterlace

• Resolve by

– Changing annotation or

– Disbelieving other source

– Note decision
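
The workflow can be pictured as a small filter-and-resolve loop. The sketch below is only a toy model of it: the `Locus` class, its fields, and both function names are hypothetical and do not reflect GenTrack's actual data model (GenTrack itself is a Redmine extension).

```python
from dataclasses import dataclass, field


@dataclass
class Locus:
    """Hypothetical stand-in for a tracked locus."""
    name: str
    high_priority: bool = False
    tags: set = field(default_factory=set)
    decision: str = ""


def entry_point(loci, high_priority_only=False, tag=None):
    """Select loci the way the three entry points do: all, high-priority, or tagged."""
    selected = []
    for locus in loci:
        if high_priority_only and not locus.high_priority:
            continue
        if tag is not None and tag not in locus.tags:
            continue
        selected.append(locus)
    return selected


def resolve(locus, change_annotation, note):
    """Record the outcome: either the annotation changed or the other source was not believed."""
    action = "annotation changed" if change_annotation else "other source rejected"
    locus.decision = f"{action}: {note}"
    return locus


loci = [
    Locus("locus_A", high_priority=True, tags={"conflict"}),
    Locus("locus_B", tags={"conflict"}),
    Locus("locus_C"),
]
tagged = entry_point(loci, tag="conflict")
resolve(tagged[1], change_annotation=False, note="compared in Otterlace")
```

Keeping the decision as free text mirrors the "note decision" step; a real tracker would more likely store it as a ticket comment or status change.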

DAS Specifics

Format: specialized DAS 1.53E

<type-id>: term from the Sequence Ontology (e.g. exon: SO:0000147)

<method>: e.g. havana_manual_annotation

<type-category>: evidence code describing the type of method (e.g. inferred from RT-PCR experiment, ECO:0000109)

<note>: key=value pairs

– parent, lastmod [required] (e.g. LASTMOD=2006-04-07T15:15:58+0100)

– transcripttype, etc. [optional]
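
To make these conventions concrete, here is a minimal sketch that parses one DAS-style feature and checks the required note keys. The sample XML is invented for illustration: the element and attribute names follow the general DAS `FEATURE` layout, but the feature and transcript identifiers, the coordinates, and the choice of one key=value pair per `NOTE` element are assumptions, not taken from the ENCODE servers.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Invented sample feature; identifiers and coordinates are placeholders
SAMPLE = """<FEATURE id="example_exon_1">
  <TYPE id="SO:0000147" category="ECO:0000109">exon</TYPE>
  <METHOD id="havana_manual_annotation"/>
  <START>1000</START>
  <END>1200</END>
  <NOTE>PARENT=example_transcript_1</NOTE>
  <NOTE>LASTMOD=2006-04-07T15:15:58+0100</NOTE>
  <NOTE>TRANSCRIPTTYPE=protein_coding</NOTE>
</FEATURE>"""


def parse_feature(xml_text):
    """Extract type, method, coordinates and key=value notes from one feature."""
    feature = ET.fromstring(xml_text)
    type_el = feature.find("TYPE")

    # Collect key=value notes; keys are lower-cased so PARENT and parent match
    notes = {}
    for note in feature.findall("NOTE"):
        key, sep, value = (note.text or "").partition("=")
        if sep:
            notes[key.strip().lower()] = value.strip()

    # parent and lastmod are the required keys
    missing = [k for k in ("parent", "lastmod") if k not in notes]
    if missing:
        raise ValueError(f"missing required note keys: {missing}")

    return {
        "type_id": type_el.get("id"),
        "category": type_el.get("category"),
        "method": feature.find("METHOD").get("id"),
        "start": int(feature.findtext("START")),
        "end": int(feature.findtext("END")),
        "parent": notes["parent"],
        # %z accepts a numeric offset such as +0100
        "lastmod": datetime.strptime(notes["lastmod"], "%Y-%m-%dT%H:%M:%S%z"),
        "notes": notes,
    }


parsed = parse_feature(SAMPLE)
```

Parsing `lastmod` into a timezone-aware `datetime` lets a tracking system compare timestamps from different sources directly when deciding which features changed since the last import.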

Thanks

Tim Hubbard

ENCODE partners

Andy Jenkinson

Jonathan Warren

Paul Bevan

Jody Clements

Steve Trevanion

James Gilbert

Anacode

Adam Frankish

Toby Hunt

Bronwen Aken

Steve Searle

Jennifer Harrow

Redmine.org