What is EDAM? EMBRACE Data and Methods Ontology for bioinformatics tools and data A set of defined...

  • date post

    21-Dec-2015

What is EDAM?

EMBRACE Data and Methods

Ontology for bioinformatics tools and data

A set of defined terms, relationships between terms and rules that govern the terms and relations

Glorified glossary – with terms organised by is_a relations (class/subclass) into a hierarchy

Controlled vocabulary for describing:

• Web services, e.g. WSDL files
• Standalone tools
• Web servers
• Databases
• Data, e.g. XSD data schema associated with a WSDL file
• Data syntax and file formats

Aims to describe (at a coarse level) all major bioinformatics databases, data and tools in use.

The "beta" release covers tools (and associated data) in the EMBRACE Registry: http://www.embraceregistry.net/


Scope

EDAM includes 7 sub-ontologies (branches of terms in their own namespace) in the domain of “bioinformatics tool and data description”:

• biological entity – “Any biological thing (or part of a thing) with a physical existence, a physical part, region or feature that can be mapped to such a thing, a collection of such things or an observable phenomenon or occurrence”

• topic – “A general field of bioinformatics study, data, processing and analysis or technology.”

• operation – “A specific, singular function or process performed by a tool, for example a WS operation. What is done, but not (typically) how or in what context.”

• data resource – “A category of content of a data source including databases and ontologies.”

• data – “A semantic description of a data entity (datum) commonly used in bioinformatics.”

• format – “A reference (typically a URL) of a data format specification.”

Required terms not specific to this domain might (eventually) be removed – including the entity branch (which provides biological context for other branches).


Conceptual model

Bold text within a box indicates a namespace (top-level term)

Non-bold text within a box indicates a minor branch

Text next to lines indicates a relation between two terms


Design Principles

It wasn’t just thrown together (honestly) …

• Clearly defined scope
• A purpose-independent design, not tied to a particular use case
• Relevant to annotation of current:
  • WSDL files
  • XSD schema
  • Standalone databases, servers and tools
• Comprehensive, with enough terms to be useful
• Comprehensible, with terms and relations that are simple and intuitive
• Uncluttered, including only commonly used terms and with as few relation types as possible
• Navigable, with a simple class (is_a) hierarchy
• General, including terms of general use and excluding fine-grained specialised concepts
• Complementary to (not duplicating) other established ontologies
• Compatible (e.g. cross-referenced) with existing resources
• Integrity, compatible (so far as possible) with "upper level" ontologies
• Extensible, with clear guidelines for developers
• Convenient, with clear guidelines for annotators
• Ideally, support automated logical inference (reasoning software)
• Validatable

There is a compromise between “ontological correctness” and usability – a pragmatic approach is essential!


Limitations

EDAM is/does not:

• Describe syntax or file formats in detail (syntax namespace will provide references)

• Define data structures. Although has_part / is_part_of relations are defined they are not currently used.

• Include terms for every conceptual part of things. Typically a datatype is only listed if it is known to be in common use

• A catalogue of individual data structures, databases etc. Terms correspond to classes; specific instances are not included.

• A full-strength ontology. Many relations and other domain features that could be expressed, e.g. in OWL format, are not modelled.

• A way (in itself) to identify or unify all services and data (but it might help).

• Complete (and arguably never can be).


Sources (current version)

Software collections and registries:
• EMBRACE Web Services
• EBI Web Services
• EBI databases and retrievable fields known to the EB-eye web services
• EMBOSS including EMBASSY packages (>200 applications)
• WHAT-IF data and services (see also WHAT-IF help)
• Lists of tools from the Web

Domain ontologies:
• myGrid ontology
• NAR Databases
• NAR web servers
• Sequence (sequence-related terms)
• Sequence service (sequence service terms)

Database-related terms:
• dbxref.txt (databases cross-referenced in UniProtKB/Swiss-Prot)
• List of databases collated by the ELIXIR project
• Lists of databases from the web

Other (not used as source of terms):
• MI (molecular interactions)
• MIRIAM Resources
• bio2rdf


Sources (to consider)

1. BioMoby:
• BioMoby Object Ontology (datatypes)
• BioMoby Namespace Ontology (namespaces)
• BioMoby service types (analysis types)
• BioMoby web service registry (Moby-compliant services)

2. Tool collections and registries:
• PSICQUIC services
• Web services lists and registries
• Services supported by the bio* projects

3. Domain ontologies:
• PDBML Schema (Protein Data Bank Markup Language)
• Sequence Ontology (sequence annotation and annotation exchange)
• BioPAX ontology (biological pathway data)
• Ondex ontology
• DAS (sequence annotation)
• Map (biological map-related terms from Gramene database)

4. XML formats:
• BSML
• MACSIM
• HSAML
• BEAST
• MSAML
• PHYLIP
• JalView 2 Project
• AlignmentML
• EBI Application XML
• UniProtKB RDF

5. Other:
• MSD/PDBe API
• OMG LSR documents


Download

“Beta” version in OBO (Open Biomedical Ontologies) format: http://sourceforge.net/projects/edamontology/files/


Status

“Beta” version intended primarily for testing and feedback

Starting point for service nomenclature

Coverage is quite broad in general and quite deep for sequence analysis:

•~2000 terms with definitions

•8 basic types of relation (plus inverse relations)

• Relations are defined but not used in many term definitions. Relations will be added in the future depending on requirements.

Maturing nicely through iterative cycles of development

• Term names, definitions and hierarchy (is_a relations) in all branches are reasonably stable

• Future versions will not be a fundamental departure

EDAM is being actively developed:

• OBO uses IDs to uniquely identify terms. EDAM IDs will persist between versions: a given ID is guaranteed to identify the same concept. This does *not* imply term names, definitions and other fields will remain constant, but they will remain true to the concept.

• Obsolete terms will also persist (they will not be removed and will maintain their ID).
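
The ID rules above can be illustrated with a minimal sketch of reading term stanzas from an OBO-format file and following is_a links. The three stanzas below (IDs, names and definitions) are invented for illustration and are not taken from an EDAM release; `parse_obo_terms` and `ancestors` are hypothetical helper names, and only a small subset of the OBO syntax is handled.

```python
# Minimal sketch: parse [Term] stanzas from OBO-format text and follow is_a links.
# The sample stanzas are hypothetical, not real EDAM terms.

SAMPLE_OBO = """\
[Term]
id: EDAM:0000001
name: Example parent operation
namespace: operation
def: "A made-up parent term." []

[Term]
id: EDAM:0000002
name: Example child operation
namespace: operation
def: "A made-up child term." []
is_a: EDAM:0000001 ! Example parent operation

[Term]
id: EDAM:0000003
name: Example retired term
namespace: operation
is_obsolete: true
"""


def parse_obo_terms(text):
    """Return {term_id: {tag: [values]}} for every [Term] stanza."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            continue
        if current is None or not line or ":" not in line:
            continue
        tag, value = line.split(":", 1)
        value = value.split(" ! ")[0].strip()  # drop a trailing "! comment"
        current.setdefault(tag, []).append(value)
        if tag == "id":
            terms[value] = current
    return terms


def ancestors(terms, term_id):
    """Collect all is_a ancestors of term_id, transitively."""
    seen = []
    stack = list(terms[term_id].get("is_a", []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(terms.get(parent, {}).get("is_a", []))
    return seen


terms = parse_obo_terms(SAMPLE_OBO)
print(ancestors(terms, "EDAM:0000002"))
print(terms["EDAM:0000003"].get("is_obsolete"))
```

Because the lookup is keyed on the ID rather than the name, an annotation that stores only the ID keeps working if a later version renames the term, and an obsolete term can still be found and flagged.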

Suggestions, requirements and collaborations welcome!


License

EDAM is made available to all without any constraint or license on its use or redistribution other than:

• EDAM is clearly acknowledged as the source of the product.
• EDAM files displayed publicly include the publication date and/or version number.
• EDAM files are not altered and subsequently redistributed under their original name or with the same term identifiers.


Documentation

Documentation at:

http://edamontology.sourceforge.net/

Including clear statement of:

• Branches of terms (namespaces / sub-ontologies)

• Relations

• Rules (governing terms and relations)

• Guidelines for Developers

• Guidelines for Annotators (basic)

• And more …


Viewing

EDAM may be viewed in:

• Any text editor

• Ontology editor: OBO Ontology Editor (OBOEdit) Version 2 http://oboedit.org

• Web-based browsers: NCBO Ontology Browser http://bioportal.bioontology.org/visualize/42800

EBI Ontology Look-up Service (coming soon) http://www.ebi.ac.uk/ontology-lookup/

• SRS: EBI SRS server http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-page+LibInfo+-lib+EDAM


Viewing in Text Editor

• Any text editor


Viewing in Ontology Editor

• Ontology editor: OBO Ontology Editor (OBOEdit) Version 2 http://oboedit.org


Viewing in Web-based Browser

• Web-based browsers: NCBO Ontology Browser http://bioportal.bioontology.org/visualize/42800

EBI Ontology Look-up Service (coming soon) http://www.ebi.ac.uk/ontology-lookup/


Viewing in SRS

EDAM is in EBI SRS server: http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-page+LibInfo+-lib+EDAM

And from the EBI dbfetch: http://wwwdev.ebi.ac.uk/Tools/dbfetch/

This allows individual terms to be addressed: http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000352 (plain text view) or http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000352?style=html (HTML view)

These views are the term “end-points”


Guidelines for Annotators

Which EDAM branch to use?
• “topic” for coarse-grained annotation of tools, databases, servers and so on
• “operation” for fine-grained annotation of tool functions
• “data resource” for annotating data resources such as databases and servers into broad categories based on content type
• “data” and “format” for annotating data in semantic and syntactic terms respectively

Picking terms
• Familiarise yourself with EDAM (use a text editor or OBOEdit)
• Identify the correct branch/namespace (“operation”, “data” etc.; see above)
• Search EDAM using keywords to find candidate terms. Use synonyms, alternative spellings etc.
• Pick the most specific term(s) available (some concepts are necessarily overlapping or general!)
• Only pick a correct term (if it doesn't exist it can be added)

Use other ontologies
Use EDAM alongside other ontologies where possible and desirable. For example, an operation that predicts specific features of a molecular sequence could be annotated with GO terms for the features.
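
The "picking terms" steps (choose a branch, then search by keyword and synonym) could be sketched roughly as follows. The `Term` class, the `find_candidates` helper and the three sample terms are hypothetical and exist only for illustration; real candidates would come from the EDAM OBO file.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    term_id: str
    name: str
    namespace: str
    synonyms: list = field(default_factory=list)


# Hypothetical terms for illustration only; not copied from EDAM.
TERMS = [
    Term("0000001", "Sequence alignment", "operation", ["Sequence aligning"]),
    Term("0000002", "Sequence alignment", "data", ["Alignment"]),
    Term("0000003", "Phylogenetic tree construction", "operation", ["Tree building"]),
]


def find_candidates(terms, namespace, keywords):
    """Return terms in one branch whose name or synonyms contain any keyword."""
    wanted = [k.lower() for k in keywords]
    hits = []
    for term in terms:
        if term.namespace != namespace:
            continue  # restrict the search to the chosen branch
        labels = [term.name.lower()] + [s.lower() for s in term.synonyms]
        if any(k in label for k in wanted for label in labels):
            hits.append(term)
    return hits


for hit in find_candidates(TERMS, "operation", ["alignment", "aligning"]):
    print(hit.term_id, hit.name)
```

Filtering by namespace first matters because the same name can legitimately appear in more than one branch, as with the two "Sequence alignment" entries in the sample data. Choosing the most specific of the candidates is still left to the annotator.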


Annotation of Web Services: Model of a Web Service

A WS is considered as an arbitrary (but usually related) set of one or more operations, reducing the problem of WS interoperation to one of compatibility between operations.

Operation
• Discrete unit of functionality performing (typically) one or more definite functions
• Reads an input
• Writes an output
• Uses zero or more data resources

Input
• Payload of SOAP message passed in operation call
• Name and (ideally) description is given in WSDL file
• Input has one or more XML elements which must be set (input values)

Output
• Payload of SOAP message returned from operation call
• Name and (ideally) description is given in WSDL file
• Output has one or more XML elements which are written (output values)

XML elements
• Simple or complex XSD types given in XSD schema associated with a WSDL file
• Correspond to values that are input or output by a service
• Name and (ideally) description of element is given in schema
• Element values are instances of a particular datatype with a semantic type and a specific syntax
• Most element values have a syntax fully specified by the schema
• Some element values correspond to text in a specific file format which is not specified by the schema. Such reports may be a composite of different semantic types.

Data resources
• Databases or ontologies used in the background
• Not passed in a WS call
• Might be specified indirectly via a parameter. For example an operation reads a database, the name of which is specified
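
The model above can be written down as a few plain data classes. This is only a sketch: the class and field names are hypothetical, and `can_chain` uses a deliberately simple notion of compatibility (an output element and an input element share the same semantic type and syntax), which is an assumption made for illustration rather than a rule stated in EDAM.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class XmlElement:
    name: str
    semantic_type: str  # what the value means
    syntax: str         # how the value is written
    description: str = ""


@dataclass
class Message:
    """Payload of a SOAP message: one or more XML elements."""
    name: str
    elements: List[XmlElement]


@dataclass
class Operation:
    name: str
    input: Message
    output: Message
    data_resources: List[str] = field(default_factory=list)  # zero or more


@dataclass
class WebService:
    name: str
    operations: List[Operation]  # one or more


def can_chain(producer: Operation, consumer: Operation) -> bool:
    """True if some output of producer matches some input of consumer."""
    written = {(e.semantic_type, e.syntax) for e in producer.output.elements}
    read = {(e.semantic_type, e.syntax) for e in consumer.input.elements}
    return bool(written & read)


# Hypothetical operations, element names, types and syntaxes.
align = Operation(
    name="align",
    input=Message("alignRequest", [XmlElement("sequences", "sequence set", "fasta")]),
    output=Message("alignResponse", [XmlElement("alignment", "sequence alignment", "clustal")]),
)
consensus = Operation(
    name="consensus",
    input=Message("consensusRequest", [XmlElement("aln", "sequence alignment", "clustal")]),
    output=Message("consensusResponse", [XmlElement("consensus", "sequence", "fasta")]),
)
service = WebService("example", [align, consensus])

print(can_chain(align, consensus))
print(can_chain(consensus, align))
```

Treating interoperation as a question about pairs of operations, rather than whole services, is what makes the element-level semantic type and syntax the natural things to annotate.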


Annotation of Web Services: Levels of annotation

Annotation of a WSDL file or associated XSD schema is possible at several levels.

Assuming SAWSDL annotation, the XML elements that may be annotated are:

1. Service (<wsdl:portType>)
• Ideally one “Topic” term for the service as a whole

2. Operation (<wsdl:operation>)
• Ideally one “Operation” term for each WSDL operation (more than one in exceptional circumstances)

3. Input (parameter) values (<xs:element>, <xs:complexType>, <xs:simpleType>, <xs:attribute>)
• One “Data” term
• One “Format” term

4. Output values (<xs:element>, <xs:complexType>, <xs:simpleType>, <xs:attribute>)
• One “Data” term
• One “Format” term

The expectation is for annotation of operation inputs and outputs to go into the XSD schema, although the WSDL file (<input> and <output> elements) might also be used.

The following annotations might be useful but are not supported by SAWSDL:

1. Web service (<wsdl:service>)
• One or more “Topic” terms to describe the general area(s) the service operates in
• One or more “Data resource” terms to describe the data resources used by the service

2. Operation input (<input>)
• One or more “Data” terms for the input(s) of each operation (if needed)

3. Operation output (<output>)
• One or more “Data” terms for the output(s) of each operation (if needed)


Annotation of EMBOSS

EMBOSS (European Molecular Biology Open Software Suite)

>200 applications for (mostly) molecular sequence analysis

Application descriptions are kept in an ACD (Application Command Definition) file

An ACD file includes:
• 1 “Application definition”
• 1 or more “Data definitions”

ACD files are annotated with EDAM terms

Application definition:
• >=1 “topic” term
• >=1 “operation” term

Data definition:
• >=1 “data” term
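
The minimum term counts listed above can be checked mechanically. The sketch below does not read real ACD files; it assumes the annotations have already been extracted into plain dictionaries, and `check_acd_annotation` and the sample definition names are hypothetical. The term IDs are placeholders.

```python
def check_acd_annotation(application_terms, data_definitions):
    """Return a list of problems; an empty list means the minimum counts are met.

    application_terms: dict mapping a branch ("topic", "operation") to term IDs.
    data_definitions: dict mapping each data definition name to its "data" term IDs.
    """
    problems = []
    for branch in ("topic", "operation"):
        if not application_terms.get(branch):
            problems.append(f"application definition needs at least one {branch} term")
    if not data_definitions:
        problems.append("ACD file needs at least one data definition")
    for name, data_terms in data_definitions.items():
        if not data_terms:
            problems.append(f"data definition '{name}' needs at least one data term")
    return problems


# Placeholder annotation for a hypothetical application.
app = {"topic": ["0000182"], "operation": ["0000292"]}
data = {"sequences": ["0000863"], "outfile": []}

for problem in check_acd_annotation(app, data):
    print(problem)
```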


EMBOSS Service Annotation

Annotated WSDL files (and associated XSD data schema) are available from:

http://wwwdev.ebi.ac.uk/soaplab/typed/services/list

You will see a list of service end-points with WSDL URLs. For example:

http://wwwdev.ebi.ac.uk/soaplab/typed/services/alignment_consensus.cons.sa?wsdl

To see the data schema associated with a WSDL, you must replace

"?wsdl" with "?xsd=1", "?xsd=2" or "?xsd=3"

For example:

http://wwwdev.ebi.ac.uk/soaplab/typed/services/alignment_consensus.cons.sa?xsd=1
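
The "?wsdl" to "?xsd=N" substitution can be scripted. A minimal sketch, where `schema_urls` is a hypothetical helper and the default of three schema documents simply mirrors the "?xsd=1" to "?xsd=3" suffixes mentioned above:

```python
def schema_urls(wsdl_url, count=3):
    """Derive the XSD schema URLs for a WSDL URL that ends in '?wsdl'."""
    suffix = "?wsdl"
    if not wsdl_url.endswith(suffix):
        raise ValueError("expected a URL ending in '?wsdl'")
    base = wsdl_url[: -len(suffix)]
    return [f"{base}?xsd={n}" for n in range(1, count + 1)]


wsdl = "http://wwwdev.ebi.ac.uk/soaplab/typed/services/alignment_consensus.cons.sa?wsdl"
for url in schema_urls(wsdl):
    print(url)
```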


SAWSDL annotation

The proposed format of SAWSDL annotation includes the term namespace, unique identifier and URL pointing to the term definition:

<element name="elementName" sawsdl:modelReference="http://purl.org/edam/namespace/id">

Where ...

* element is the XML element being annotated
* elementName is the name of the XML element
* namespace is the namespace of the EDAM term, e.g. "operation"
* id is the unique identifier of the term, e.g. "0000295"

The term name, if required, could be given as an XML comment after the annotated element:

<element name="elementName" sawsdl:modelReference="http://purl.org/edam/namespace/id"> <!-- term_name -->

This is not recommended, however, as term names are not guaranteed to remain constant.

The value of the sawsdl:modelReference attribute is a URL pointing to the term definition. The proposal is to use PURLs (Persistent Uniform Resource Locators), which include the term namespace.
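
As a rough sketch of the proposed annotation format, the following builds an XSD element carrying a sawsdl:modelReference attribute using Python's standard xml.etree.ElementTree module. The helper names `edam_purl` and `annotated_element` are hypothetical; the namespace "operation" and ID "0000295" are the examples given above, and the PURL pattern is the proposed one, not a guaranteed live address.

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"
SAWSDL_NS = "http://www.w3.org/ns/sawsdl"

# Use readable prefixes when the element is serialised.
ET.register_namespace("xs", XS_NS)
ET.register_namespace("sawsdl", SAWSDL_NS)


def edam_purl(namespace, term_id):
    """Build a PURL of the proposed form http://purl.org/edam/<namespace>/<id>."""
    return f"http://purl.org/edam/{namespace}/{term_id}"


def annotated_element(element_name, namespace, term_id):
    """Create an xs:element annotated with an EDAM term via sawsdl:modelReference."""
    element = ET.Element(f"{{{XS_NS}}}element", {"name": element_name})
    element.set(f"{{{SAWSDL_NS}}}modelReference", edam_purl(namespace, term_id))
    return element


elem = annotated_element("elementName", "operation", "0000295")
print(ET.tostring(elem, encoding="unicode"))
```

Only the ID and namespace are written into the attribute, in line with the advice not to rely on term names.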


EDAM term end-points

When pasted into a browser, the PURLs:

http://purl.org/edam/topic/0000182
http://purl.org/edam/operation/0000292
http://purl.org/edam/data/0000863

... will (eventually) resolve to:

http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182
http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000292
http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000863

These are complete OBO term statements in plain text (OBO format). PURLs support text extensions allowing a format specifier to be added. For example these PURLs:

http://purl.org/edam/topic/0000182?style=html
http://purl.org/edam/operation/0000292?style=html
http://purl.org/edam/data/0000863?style=html

... will resolve to OBO term statements in HTML such that terms referred to in the statements (via relations) will be clickable to allow navigation:

http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182?style=html
http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000292?style=html
http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000863?style=html
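
The mapping from PURL to dbfetch end-point shown in these examples can be expressed as a small function. This is a sketch of the pattern visible in the URLs above (the namespace segment is dropped, the ID and any query string are carried over), not of how the PURL server itself is configured; `resolve_purl` is a hypothetical name.

```python
from urllib.parse import urlsplit

DBFETCH_BASE = "http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam"


def resolve_purl(purl):
    """Map http://purl.org/edam/<namespace>/<id>[?query] to its dbfetch URL."""
    parts = urlsplit(purl)
    segments = parts.path.strip("/").split("/")
    if parts.netloc != "purl.org" or len(segments) != 3 or segments[0] != "edam":
        raise ValueError(f"not an EDAM PURL: {purl}")
    term_id = segments[2]  # segments[1] is the namespace, unused in the target
    target = f"{DBFETCH_BASE}/{term_id}"
    return f"{target}?{parts.query}" if parts.query else target


print(resolve_purl("http://purl.org/edam/topic/0000182"))
print(resolve_purl("http://purl.org/edam/data/0000863?style=html"))
```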


EDAM term end-points

The eventual final list of end-points will provide other formats/views:

• Plain text in OBO format (default)
• HTML
• XML
• JSON
• The term in a web browser, e.g. NCBO Ontology Browser.

http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182?style=html
http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182%format=xml
http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182%format=txt
http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182%format=json
http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182%format=browser (default)

For now, you can see this in action for this term:

http://purl.org/edam/entity/0000002 http://purl.org/edam/entity/0000002?style=html


Parallel Developments (and other applications)

These include:
• BioXSD
• EMBRACE Registry / BioCatalogue
• Taverna
• BioNEMUS
• Ondex
• ELIXIR


BioNemus


Thanks

• Peter Rice (boss)
• Alan Bleasby (PURL handling)
• Mahmut Uludag (EMBOSS WS)
• Hamish McWilliam (SRS, discussions)
• Matus Kalas (BioXSD, discussions)
• James Malone (SWO + discussions)
• Steve Pettifer (publications + discussions)
• The Forgotten … (sorry)

All enquiries to Jon Ison ([email protected])