Crowd Sourcing Methods to Annotate Biological Processes Andra Waagmeester Micelio.

Post on 18-Jan-2016


Crowd Sourcing Methods to Annotate Biological Processes

Andra Waagmeester

Micelio

Brothers Grimm: Stone soup

James Taylor http://km.aifb.kit.edu/ws/ckc2007/StoneSoup-www2007.pdf

“We try to analyze a 3D cell on a 2D level.” - Mike Washburn

Subsequently, we represent the multi-dimensional data space of this 2D view of the cell, again in a 2D space

Relational databases

Gene name   ID      Identifier
ZNF635m     18801   23126
…           …       …

Gene name   ID      Identifier
ZNF280E     POGZ    ENSG00000143442
…           …       …


HGNC ID   HGNC Symbol   Name
18801     POGZ          Pogo transposable element with ZNF domain
…         …             …
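The three tables only connect through the HGNC row: the first table carries the HGNC ID, the second the HGNC symbol. A minimal Python sketch of that join follows. The column names (hgnc_id, entrez_id, hgnc_symbol, ensembl_id) and the function join_on_hgnc are assumptions made for illustration; the slides label the columns only as "ID" and "Identifier".

```python
# Rows copied from the example tables. The column names are assumptions:
# the slides call these columns just "ID" and "Identifier".
entrez_table = [
    {"gene_name": "ZNF635m", "hgnc_id": "18801", "entrez_id": "23126"},
]
ensembl_table = [
    {"gene_name": "ZNF280E", "hgnc_symbol": "POGZ", "ensembl_id": "ENSG00000143442"},
]
hgnc_table = [
    {"hgnc_id": "18801", "symbol": "POGZ",
     "name": "Pogo transposable element with ZNF domain"},
]


def join_on_hgnc(entrez_rows, ensembl_rows, hgnc_rows):
    """Link rows of the first two tables through the HGNC table.

    The first table is matched on HGNC ID, the second on HGNC symbol.
    """
    joined = []
    for hgnc in hgnc_rows:
        for a in entrez_rows:
            if a["hgnc_id"] != hgnc["hgnc_id"]:
                continue
            for b in ensembl_rows:
                if b["hgnc_symbol"] != hgnc["symbol"]:
                    continue
                joined.append({
                    "symbol": hgnc["symbol"],
                    "name": hgnc["name"],
                    "aliases": [a["gene_name"], b["gene_name"]],
                    "entrez_id": a["entrez_id"],
                    "ensembl_id": b["ensembl_id"],
                })
    return joined


records = join_on_hgnc(entrez_table, ensembl_table, hgnc_table)
for record in records:
    print(record)
```

Without the HGNC table the two gene names share no key, which is the integration problem the slides go on to address.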

Graph databases

• ZNF635m is_a gene
• ZNF635m has_Entrez_ID “23126”
• ZNF635m ID “18801”
• “18801” has_symbol “POGZ”
• ZNF280E has_Ensembl_ID “ENSG00000143442”
• ZNF280E HGNC_symbol “POGZ”
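The statements above can be held as plain (subject, predicate, object) tuples and queried by pattern. The sketch below is only an illustration in Python, not a real graph database; the helper names match and symbols_for are hypothetical.

```python
# The six statements from the slide as (subject, predicate, object) tuples.
triples = [
    ("ZNF635m", "is_a", "gene"),
    ("ZNF635m", "has_Entrez_ID", "23126"),
    ("ZNF635m", "ID", "18801"),
    ("18801", "has_symbol", "POGZ"),
    ("ZNF280E", "has_Ensembl_ID", "ENSG00000143442"),
    ("ZNF280E", "HGNC_symbol", "POGZ"),
]


def match(store, subject=None, predicate=None, obj=None):
    """Return every triple that agrees with the given fields (None = wildcard)."""
    return [
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


def symbols_for(name):
    """Collect symbols reachable from a gene name.

    Follows either the direct HGNC_symbol edge or the two-hop
    path ID -> has_symbol.
    """
    found = {o for _, _, o in match(triples, subject=name, predicate="HGNC_symbol")}
    for _, _, hgnc_id in match(triples, subject=name, predicate="ID"):
        found |= {o for _, _, o in match(triples, subject=hgnc_id, predicate="has_symbol")}
    return found


print(symbols_for("ZNF635m"))
print(symbols_for("ZNF280E"))
```

Both names resolve to the same symbol, but only because the query knows two different predicate paths. The data is in a graph, yet the vocabulary is still ad hoc.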

Something more profound is needed than relabeling old wine in new bottles

Uniform Resource Identifiers (URIs)

• HGNCID:18801
• ENSEMBL:ENSG00000143442
• ENTREZ:23126
• PMID:20196795

• ENTREZ:23126 rdf:type dbpedia:Gene
• ENTREZ:23126 rdfs:label “ZNF635m”
• ENTREZ:23126 rdfs:seeAlso HGNCID:18801
• HGNCID:18801 rdfs:label “POGZ”
• ENSEMBL:ENSG00000143442 rdf:type dbpedia:Gene
• ENSEMBL:ENSG00000143442 rdfs:label “ZNF280E”
• ENSEMBL:ENSG00000143442 rdfs:seeAlso “HGNCID:POGZ”
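The prefixed identifiers in these statements are shorthand for full URIs. A minimal Python sketch of the expansion follows. The namespace URIs in PREFIXES are assumptions modelled on the identifiers.org pattern and are not given on the slides; the function name expand is likewise hypothetical.

```python
# Prefix -> namespace map. These namespaces are ASSUMPTIONS that follow the
# identifiers.org URL pattern; the slides do not state them.
PREFIXES = {
    "ENTREZ": "http://identifiers.org/ncbigene/",
    "HGNCID": "http://identifiers.org/hgnc/",
    "ENSEMBL": "http://identifiers.org/ensembl/",
    "PMID": "http://identifiers.org/pubmed/",
}


def expand(curie):
    """Turn a 'PREFIX:local_id' string into a full URI."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError("unknown or missing prefix in %r" % curie)
    return PREFIXES[prefix] + local_id


for curie in ["HGNCID:18801", "ENSEMBL:ENSG00000143442",
              "ENTREZ:23126", "PMID:20196795"]:
    print(curie, "->", expand(curie))
```

Once every resource has one globally resolvable identifier, two datasets that mention the same gene refer to the same node without a lookup table in between.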

Gerhard Michal 1974

Pathway external references

http://www.wikipathways.org/index.php/Pathway:WP430

Allows visualization of differences in expression

http://www.wikipathways.org/index.php/Pathway:WP430

Human and machine readable

@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix cas: <http://identifiers.org/cas/> .
@prefix wprdf: <http://rdf.wikipathways.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
...
<http://www.ncbi.nlm.nih.gov/gene/1394>
    a gpml:DataNode , skos:Concept , wp:GeneProduct ;
    rdfs:isDefinedBy gpml:DataNode ;
    rdfs:label "CRHR1"@en ;
    dc:identifier <http://identifiers.org/ncbigene/1394> , "1394"^^xsd:string ;
    dc:source "Entrez Gene"^^xsd:string ;
    dcterms:isPartOf <http://rdf.wikipathways.org/WP4_r39380.ttl> ;
    gpml:centerx "340.0"^^xsd:float ;
    ...

311,696 articles (1.5% of PubMed) have been cited by GO annotations

Wikipedia is reasonably accurate


Wikipedia has breadth and depth

[Chart: articles and words (millions), Wikipedia vs. Britannica Online]

Source: http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons, July 2008

Centralizing key data storage


Source: http://commons.wikimedia.org/wiki/File:Wikidata_slides_Magnus_Manske,_Cambridge,_2014-02-27.pdf


Wikidata

“Provide a database of the world’s knowledge that anyone can edit” - Denny Vrandečić


287 language editions of Wikipedia

Biocurators/Bioinformatics community

Wikidata for biology

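[Diagram: statements about the Wikidata item Reelin, each label shown with its Wikidata identifier]

Properties: is a (Property:P31), regulates (Property:P128), interacts with (Property:P129)

Items: Protein (Q8054), Glycoprotein (Q187126), Neural development (Q1345738), VLDL receptor (Q1979313), Amyloid precursor protein (Q423510), Reelin (Q414043)

http://www.wikidata.org/wiki/Q414043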

Wikidata for biology

[Diagram: statements shown by identifier only]

Properties: Property:P31, Property:P128, Property:P129

Items: Q8054, Q187126, Q1345738, Q1979313, Q423510, Q414043

http://wikidata.org/w/api.php?action=wbgetentities&ids=Q414043&languages=en
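The API request above can be assembled and its result read with the Python standard library. The sketch below makes several assumptions: the format=json parameter is added (it is not in the URL on the slide), the helper names entity_url and item_values are hypothetical, and sample_response is a hand-written fragment in the shape of a wbgetentities reply, not real API output. No network call is made.

```python
from urllib.parse import urlencode

API = "http://wikidata.org/w/api.php"


def entity_url(entity_id, language="en"):
    """Build a wbgetentities request URL (format=json is an added assumption)."""
    params = {
        "action": "wbgetentities",
        "ids": entity_id,
        "languages": language,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def item_values(entity, property_id):
    """List the Q-identifiers an entity points to through one property."""
    values = []
    for claim in entity.get("claims", {}).get(property_id, []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # skip "novalue" / "somevalue" statements
        value = snak["datavalue"]["value"]
        if isinstance(value, dict) and "numeric-id" in value:
            values.append("Q" + str(value["numeric-id"]))
    return values


# Hand-written sample, NOT a real response: one P31 statement on Q414043.
sample_response = {
    "entities": {
        "Q414043": {
            "claims": {
                "P31": [{
                    "mainsnak": {
                        "snaktype": "value",
                        "property": "P31",
                        "datavalue": {
                            "value": {"entity-type": "item", "numeric-id": 8054},
                            "type": "wikibase-entityid",
                        },
                    },
                }],
            },
        },
    },
}

print(entity_url("Q414043"))
print(item_values(sample_response["entities"]["Q414043"], "P31"))
```

To run this against the live service, the URL from entity_url would be fetched (for example with urllib.request.urlopen) and the decoded JSON passed to item_values in place of the sample.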

Current progress

● All human and mouse genes and proteins loaded

● All diseases (Human Disease Ontology) loaded

● Dataset of all drugs in preparation

● Model for interlinking relations ready and proposed

Our current workflow

Stone soup of data

James Taylor http://km.aifb.kit.edu/ws/ckc2007/StoneSoup-www2007.pdf

Andrew Su, Scripps

Benjamin Good, Scripps

Sebastian Burgstaller, Scripps

Lynn Schriml, U Maryland

Elvira Mitraka, U Maryland

Gang Fu, NCBI

Evan Bolton, NCBI

Paul Pavlidis, U British Columbia

Peter Robinson, Charite

Many Wikipedia and Wikidata editors

Contact:

asu@scripps.edu

andra@micelio.be

emitraka@som.umaryland.edu

Crowdsourcing in action