DAS for ENCODE data coordination. Felix Kokocinski, WTSI.

Posted on 17-Jan-2016



Project Overview

Goal:

Annotate all evidence-based gene features at a high accuracy across the human genome
– protein-coding loci with isoforms
– nc loci with transcript evidence
– pseudogenes

Partners:

– HAVANA & EnsEMBL, Sanger Institute, UK
– University of Lausanne, CH
– Centre for Genomic Regulation, ES
– Spanish Nat. Cancer Res. Centre, ES
– University of California Santa Cruz, USA
– Washington University St. Louis, USA
– Broad Inst. of MIT and Harvard, USA
– Yale University, USA

Manual Genome Annotation

• ~20 annotators working according to HAVANA guidelines

• computational pipeline for alignments

• Otterlace software

• input from partner groups, import of data sources via DAS

• verification with RT-PCR, RACE & sequencing

Data Exchange using DAS

[Diagram; labels: Distributed Annotation Sources; WWW interface; GenTrack (tracking system); Otterlace (annotation software); high-priority issues; experimental verification issues; Perl API; Source Adaptors; Update Scripts]

GenTrack Annotation Tracking

• extension of the open-source Ruby on Rails (RoR) ticketing system Redmine (www.redmine.org)
• data import via DAS
• modules for analyzing and flagging data
• www.sanger.ac.uk/gentrack
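As a rough illustration of what "data import via DAS" involves on the wire: a DAS client asks a source for the features in a region with a plain HTTP request of the form `.../das/<source>/features?segment=<ref>:<start>,<stop>`. The sketch below only builds such a URL. The server name, source name, and coordinates are placeholders and are not taken from the slides; how GenTrack itself issues its requests is not shown in the talk.

```python
from urllib.parse import quote


def das_features_url(server, source, segment, start, end):
    """Build a DAS 1.5-style 'features' request URL for one region.

    server and source are hypothetical placeholders supplied by the caller.
    """
    if start > end:
        raise ValueError("start must not be greater than end")
    region = f"{segment}:{start},{end}"
    # Keep ':' and ',' readable in the query string; escape anything else.
    return (
        f"{server.rstrip('/')}/das/{quote(source)}"
        f"/features?segment={quote(region, safe=':,')}"
    )


# Placeholder server, source and region (assumptions for illustration only).
url = das_features_url("http://das.example.org", "example_source", "1", 1000000, 1100000)
print(url)
```

The response to such a request is an XML document of features, which a tracking system can then load and compare against its own records.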


GenTrack: Workflow

• Entry points:
– List of all genes & transcripts in region
– High-priority loci
– Loci with specific tags

• Identify problem, compare in Otterlace

• Resolve by
– Changing annotation or
– Disbelieving other source
– Note decision

DAS Specifics

Format: Specialized 1.53E

<type-id>: from Sequence Ontology (exon: SO:0000147)

<method>: (havana_manual_annotation)

<type-category>: evidence code describing the type of method (inferred from RT-PCR experiment (ECO:0000109))

<note>

- key=value pairs

- parent, lastmod [req] (LASTMOD=2006-04-07T15:15:58+0100)

- transcripttype, etc. [opt]
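To make the conventions above concrete, here is a minimal sketch of reading one such feature with Python's standard library. It assumes that `<type-id>` and `<type-category>` correspond to the `id` and `category` attributes of the DAS `TYPE` element, that `<method>` is the `id` of `METHOD`, and that each key=value pair sits in its own `NOTE` element. The feature id, coordinates, parent name, and transcript type in the example are invented for illustration; only the SO, ECO, method, and LASTMOD values come from the slide.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical feature: ids, coordinates, parent and transcripttype are invented.
EXAMPLE_FEATURE = """
<FEATURE id="example_exon_1" label="example exon">
  <TYPE id="SO:0000147" category="inferred from RT-PCR experiment (ECO:0000109)">exon</TYPE>
  <METHOD id="havana_manual_annotation">havana_manual_annotation</METHOD>
  <START>1000</START>
  <END>1200</END>
  <NOTE>PARENT=example_transcript_1</NOTE>
  <NOTE>LASTMOD=2006-04-07T15:15:58+0100</NOTE>
  <NOTE>TRANSCRIPTTYPE=protein_coding</NOTE>
</FEATURE>
"""

# The slide marks these two note keys as required.
REQUIRED_NOTE_KEYS = {"parent", "lastmod"}


def parse_feature(xml_text):
    """Extract type, method and key=value notes from one FEATURE element."""
    feature = ET.fromstring(xml_text.strip())
    type_elem = feature.find("TYPE")
    method_elem = feature.find("METHOD")

    # Collect NOTE elements of the form KEY=value; keys are lower-cased.
    notes = {}
    for note in feature.findall("NOTE"):
        key, sep, value = (note.text or "").partition("=")
        if sep:
            notes[key.strip().lower()] = value.strip()

    missing = REQUIRED_NOTE_KEYS - notes.keys()
    if missing:
        raise ValueError(f"missing required note keys: {sorted(missing)}")

    return {
        "type_id": type_elem.get("id"),
        "type_category": type_elem.get("category"),
        "method": method_elem.get("id"),
        "parent": notes["parent"],
        # Timestamp format as shown on the slide, including the UTC offset.
        "lastmod": datetime.strptime(notes["lastmod"], "%Y-%m-%dT%H:%M:%S%z"),
        "optional": {k: v for k, v in notes.items() if k not in REQUIRED_NOTE_KEYS},
    }


feature = parse_feature(EXAMPLE_FEATURE)
print(feature["type_id"], feature["method"], feature["lastmod"].isoformat())
```

Rejecting features that lack `parent` or `lastmod` mirrors the [req] marking above; optional keys such as `transcripttype` are passed through untouched.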


Thanks

Tim Hubbard

ENCODE partners

Andy Jenkinson

Jonathan Warren

Paul Bevan

Jody Clements

Steve Trevanion

James Gilbert

Anacode

Adam Frankish

Toby Hunt

Bronwen Aken

Steve Searle

Jennifer Harrow

Redmine.org