Naming Conventions for CVs and Ontologies Draft v7 Metabolomics Standards Initiative
13.10.2006 Ontology Working Group
Naming Conventions for Controlled Vocabularies (CVs) and Ontologies
http://msi-ontology.sourceforge.net/Naming_Conventions_DRAFTv7.doc
1 RATIONALE FOR THIS DOCUMENT
2 (META-) REFERENCE TERMINOLOGY
3 GENERAL PRINCIPLES FOR CREATING REPRESENTATIONAL ARTIFACTS
3.1 Univocity
3.2 Positivity
3.3 Objectivity
3.4 Try to avoid multiple parenthood and multiple inheritance
4 NAMING CLASSES
4.1 Class name precision
4.2 Synonyms
4.2.1 Different sorts of Synonyms?
4.2.2 Property synonyms
4.3 Lexical Properties of class names
4.3.1 Capitalisation
4.3.2 Character set
4.3.2.1 Character set formatting
4.3.3 Word separators
4.3.3.1 Hyphens, dash and slash
4.3.4 Singular nouns
4.3.5 Use present tense for representational units
4.3.6 Plurals and sets
4.3.7 Avoid linguistic ellipses
4.3.8 Acronyms and Abbreviations
4.3.9 Registered Product- and Company-names
4.3.10 Word compositions and length
4.3.10.1 Compound vs. atomic names for representational units
4.3.10.2 Splitting and merging classes
4.3.11 Affixes (prefix, suffix, infix and circumfix)
4.3.12 Logical connectives
4.3.13 "Taboo" words and Characters
4.3.14 Specific language requirements
5 DEPICTING REPRESENTATIONAL UNITS WITHIN TEXT
6 CLASS DEFINITIONS
6.1 General rules for creating sound normalized definitions
7 UNIQUE IDENTIFIERS
7.1 Capturing the class name and ID using the autoID plugin in Protégé-owl
7.2 Life science Identifier, (LSID: http://lsid.sourceforge.net/)
8 NAMESPACE
9 ONTOLOGY IMPORTS IN PROTÉGÉ-OWL
9.1 The “lang” attribute issue
9.1.1 Import
9.1.1.1 Importing from repositories (extracted from the Protégé wiki)
9.1.1.2 Changing the imported ontology to be the newest updated version
10 PROPERTIES (ATTRIBUTES AND RELATIONS)
10.1 Assigning "key-properties" to top level classes
11 NAMING OF ONTOLOGY FILES AND ONTOLOGY VERSIONS
12 References
Working draft by: Daniel Schober, Susanna-A Sansone, Barry Smith
1 Rationale for this document
This document defines naming conventions for controlled vocabularies (CVs) and
ontologies. Metadata annotation elements are not covered here; these are addressed in
the <<Metadata Annotations for Representational Units and Representational
Artefacts>> document [1].
These recommendations have been developed to guide the activities of the
Metabolomics Standards Initiative (MSI) [2] Ontology Working Group (OWG) [3].
The MSI OWG seeks to facilitate the consistent description of metabolomics experiment
components by reaching a consensus on a core set of CVs and then developing an
ontology. The CVs are developed in close collaboration with the HUPO Proteomics
Standards Initiative (PSI) [4] and structured as taxonomies in OWL and OBO format. The
ontology is developed as part of the Ontology for Biomedical Investigation (OBI,
previously ‘FuGO’) [5], a larger, multi-domain collaborative effort.
These naming conventions are also used in the context of the OBI, developed in OWL.
2 (Meta-) Reference Terminology
Knowledge representations (KR, also called representational models) are referred to
with the term ‘representational artefact’ (RA). A representational artefact is made of
related ‘representational units’ (RU, also known as KR-idioms), in most cases
classes and properties. We recommend using the term ‘class’ to refer to the
representational unit that models a ‘universal’ in an ontological representational
artefact. Each class has a ‘class name’, a term (string) that designates the class. An
‘instance’ is the representation of a ‘particular’ in reality. A particular instantiates a
universal and an instance instantiates a class. Properties of universals are represented
through representational units called ‘properties’. Properties whose fillers are
simple datatypes (e.g. integer, string, boolean) are called ‘attributes’ or ‘datatype
properties’. Properties whose fillers (also called the ‘range’) are classes or instances
are called ‘relations’ or ‘object properties’. Confusingly, other formats use the
word "property" for restrictions. The word ‘domain’ can mean the group of classes that a
property is asserted for (in OWL), but it also describes the area of interest of a
representational artefact.
For detailed recommendations, see the full paper:
http://ontology.buffalo.edu/bfo/Terminology_for_Ontologies.pdf
The following key words “MUST,” “MUST NOT,” “REQUIRED,” “SHALL,” “SHALL NOT,”
“SHOULD,” “SHOULD NOT,” “RECOMMENDED,” “MAY,” and “OPTIONAL” are to be
interpreted as described in RFC-2119, see S. Bradner, Key words for use in RFCs to
Indicate Requirement Levels, Internet Engineering Task Force, RFC 2119,
http://www.ietf.org/rfc/rfc2119.txt, March 1997.
Sections in Brackets [...] are comments for the editor. Please ignore these.
3 General principles for creating representational artifacts
Become acquainted with the capabilities and limitations of both the representation
formalism and its implementation (an ontology engineering tool) of your choice.
Don’t get into 'analysis paralysis'! You will not get it right the first time! Sometimes
one has to throw things away and start again. Do not fall into ‘naïve euphoria’ either. Not
every fancy, just-built piece of representation is an ontology worth bothering
others with.
Save often! Always save to a new version number including the date. Protégé-
OWL is not yet completely stable. Undo is difficult and bugs occasionally corrupt
ontologies beyond retrieval.
General Ontology Engineering Axioms:
* Every class has at least one instance
* Distinct classes on the same level and leaf classes never share instances
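As a rough illustration, the two axioms can be checked on a small, hand-written mapping from sibling classes to their instances. The function name `check_axioms` and the sample data are assumptions made for this sketch; they do not come from any ontology tool.

```python
from itertools import combinations


def check_axioms(instances_of):
    """Check sibling classes against the two axioms.

    `instances_of` maps each class name on one level to the set of its
    instance identifiers (a hypothetical, simplified data model).
    Returns (classes without instances, pairs of classes sharing instances).
    """
    empty = sorted(cls for cls, members in instances_of.items() if not members)
    shared = [
        (a, b)
        for a, b in combinations(sorted(instances_of), 2)
        if instances_of[a] & instances_of[b]
    ]
    return empty, shared


# Illustrative data only.
siblings = {
    "animal_cell": {"cell_1", "cell_2"},
    "plant_cell": {"cell_2", "cell_3"},  # shares cell_2 with animal_cell
    "fungal_cell": set(),                # has no instance at all
}
empty, shared = check_axioms(siblings)
```

Here `empty` lists the classes that violate the first axiom and `shared` lists the sibling pairs that violate the second.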
3.1 Univocity
Names of RUs (including the ones for relations) should have the same meaning on every occasion of use and refer to the same universals and kinds of entities in reality. Each name should refer to exactly one RU, and each RU should represent
exactly one entity in reality (a universal in the case of a class). In effect, each name should
unambiguously refer to the same entity in reality. Note that this principle of univocity
excludes homonyms, terms that are used as names of more than one RU. For example,
if you use the term ‘cell’ as a name of the class representing (the type of) cells as found
in all organisms, the same term should not be used as a name for a more specialized
class representing (the type of) cells as found only in plants. Likewise, the term ‘part of’
should not be used to name more than one relation, e.g., partonomy, set membership,
etc.
Furthermore:
* Don’t confuse universals with ways of getting to know types
* Don’t confuse universals with ways of talking about types
* Don’t confuse universals with data about types
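The exclusion of homonyms can be checked mechanically. The following is a minimal sketch, assuming that each RU is available as an (identifier, name) pair; the identifiers and the helper `find_homonyms` are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def find_homonyms(units):
    """Return the names that label more than one representational unit.

    `units` is an iterable of (identifier, name) pairs; both fields are
    hypothetical stand-ins for whatever the ontology format provides.
    """
    ids_by_name = defaultdict(set)
    for identifier, name in units:
        ids_by_name[name].add(identifier)
    return {
        name: sorted(ids) for name, ids in ids_by_name.items() if len(ids) > 1
    }


# Illustrative data only.
units = [
    ("RU_0001", "cell"),     # cells as found in all organisms
    ("RU_0002", "cell"),     # cells as found only in plants: a homonym
    ("RU_0003", "part_of"),
]
homonyms = find_homonyms(units)
```

Any name reported in `homonyms` is used for more than one RU and should be renamed until every name refers to exactly one RU.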
3.2 Positivity
Complements of classes, such as ‘non-mammal’ or ‘non-membrane’, are not necessarily
themselves classes and do not designate genuine universals. Similarly,
do not represent the absence of a wing as the presence of the non-existence of a wing,
e.g. 'wing' has_status "absent". The positivity recommendation may need to be
weakened; sometimes it can make sense to have, e.g., an "ex-vivo" role or a
“non-living_organism”.
3.3 Objectivity
No distinction without a difference. A child class must differ from its parent class in a
distinctive way. A child class must share all the properties of its parent classes
(inheritance principle) and have additional ones that the parents do not have. Each class
must be defined by a formula which states the necessary and sufficient conditions for
being an instance of the corresponding universal. The sibling classes of a given parent
class should have differentiae which are really distinct. This means that the universals of
these classes at least have distinct (ideally non-overlapping = single inheritance)
extensions. The distinction between each pair of siblings must be explicitly represented
(opposition principle).
Which universals exist is not a function of our biological knowledge. Be aware that
terms such as ‘unknown’, ‘untypified’ or ‘unlocalized’ do not designate genuine
universals. To characterize classes, formulate intrinsic properties (properties
that are inherent to the universal represented by the RU) rather than extrinsic ones
(properties that are asserted from outside, e.g. accession numbers). ‘Intrinsic’ describes
a characteristic or property of some thing or action which is essential and specific to that
thing or action, and which is wholly independent of any other object, action or
consequence. A characteristic which is not essential or inherent is extrinsic (from
http://en.wikipedia.org/wiki/Intrinsic).
3.4 Try to avoid multiple parenthood and multiple inheritance
No class in the hierarchy should have more than one superclass. Multiple
inheritance can generate subtle but systematic ambiguity in the meaning of formal
relations like is_a and part_of within the ontology. One should not press "is_a" into
service to mean a variety of different things (see univocity principle). Domain experts
should build single-parenthood taxonomies of their views of reality. Other domain
experts build the same for theirs, and only later will all these taxonomies be
‘multidimensionally’ aligned within OBO; secure common nodes will then result, which
make consistent (!) multiple inheritance possible.
There are, however, many opinions on this issue, and we might discuss this matter
further when we feel there is a real need for multiple parenthood.
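A single-parenthood check is easy to sketch. The example below assumes that the asserted is_a links are available as a mapping from each class name to a list of its direct superclasses; this data model and the class names are illustrative assumptions, not taken from any particular ontology.

```python
def classes_with_multiple_parents(parents_of):
    """Return every class that has more than one asserted direct superclass.

    `parents_of` maps a class name to the list of its direct superclasses
    (a hypothetical representation of the is_a hierarchy).
    """
    return {
        cls: sorted(set(parents))
        for cls, parents in parents_of.items()
        if len(set(parents)) > 1
    }


# Illustrative data only.
taxonomy = {
    "instrument": ["physical_part"],
    "mass_spectrometer": ["instrument"],
    "enzyme": ["protein", "catalyst"],  # two parents: flagged
}
offenders = classes_with_multiple_parents(taxonomy)
```

Classes listed in `offenders` are candidates for review; whether a second parent is justified remains a modelling decision.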
4 Naming Classes
Each class representing a universal in a representational artefact is labelled with a human readable class name. Class names should be short, easy to remember and as self-explanatory as the pragmatic compromise allows. This class name
should be used as the default browser key when navigating through the class hierarchy and
should therefore be as intuitive as possible to the ontology engineer building the
ontological structure. However, this class name will not necessarily be used as the main
search attribute by end users when they are searching for classes. For this, a short
and intuitive class name should be captured as the preferred synonym, which would be the
term of highest usage frequency found in the literature of that domain, i.e. the term with
the highest user acceptance. Use a name that is most widely accepted in the user
domain. The class should represent and be named after the intrinsic, underlying nature
of the universal to be represented, not according to extrinsic properties or roles a class
can play in a particular context. Embodying the whole meaning of the class, with all its
relationships to other classes, in its name is in most cases neither possible nor
recommended. Keep semantics in the definitions and formalize them explicitly as properties
and axioms. For example, a class “distinct_identifiable_physical_part” should just be
called “physical_part”. For the preferred synonym, readability should have higher priority
than constraining interpretation through the class name. For the class name that is
used for ontology engineering (OE), it is the other way round.
Epistemological statements don't belong in class names, so avoid calling the
class “instrument” “instrument_class” or the relation “has_part” “has_part_relation”.
4.1 Class name precision
Class names should be precise, concise and linguistically correct (i.e. they should conform to the rules of the language in question). Often terms for RUs are
not precise, i.e. they do not capture the intended meaning. Imprecise terms are
especially problematic in the absence of good definitions. For example, the term
“anatomic_structure, system or substance” does not give us any clue as to whether the
scope of the adjective “anatomic” is restricted to structure or also extends to
system and substance. This ambiguity can lead to problems like the following: if
“anatomic” is restricted to “structure” only, then “drug” and “chemical” would be
classified under this class, since these are clearly substances. If it is not restricted,
“drug” and “chemical” could not be classified under this class.
4.2 Synonyms
A strict definition of synonymy, as proposed e.g. by ISO 1087-1:2000, is: “… relation
between or among terms in a given language representing the same concept, with a
note to the effect that terms which are interchangeable in all contexts are called
synonyms; if they are interchangeable only in some contexts, they are called
quasi-synonyms.”
The number of synonyms for a class is not limited, and the same text string can be used
as a synonym for more than one class. Add synonyms if you edit or delete a class
name, but the old name is still a valid synonym, e.g. if you change "respiration" to
"cellular_respiration", keep "respiration" as a synonym. This helps other users to find
familiar classes. Add synonyms if the class name has (or contains) a commonly used
abbreviation. Acronyms are synonymous with the full name as long as the acronym
is not used in any other sense elsewhere. 'Jargon' type phrases are synonymous with
the full name as long as the phrase is not used in any other sense elsewhere.
To capture synonyms in OWL, one can use the rdfs:comment field and add a
comma-separated list of synonyms after a “synonym: ” marker. Another way would be to create
a new metaclass with a new string datatype property “has_synonyme” and derive all
new classes from this new metaclass (see also
http://protege.cim3.net/cgi-bin/wiki.pl?CreatingSynonyms). This has the disadvantage that the whole ontology becomes OWL
Full. Capturing synonyms in further rdfs:label fields has the disadvantage that, when
several labels are present, it is not possible to know which one is the preferred class
name (the human readable class name to display as the browser key) and which is
another kind of synonym. Usually the alphabetically first rdfs:label would be displayed.
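Reading synonyms back from a comment field that follows the “synonym: ” marker convention could look roughly as follows. This is a sketch under the assumption that the list ends at the end of the line containing the marker; the function name and the sample comment are made up for illustration.

```python
def parse_synonyms(comment, marker="synonym:"):
    """Extract the comma-separated synonyms that follow `marker`.

    Assumes (for this sketch) that the synonym list ends at the end of the
    line on which the marker appears.
    """
    _, found, tail = comment.partition(marker)
    if not found:
        return []
    line = tail.split("\n", 1)[0]
    # Drop surrounding whitespace and ignore empty entries.
    return [item.strip() for item in line.split(",") if item.strip()]


# Illustrative comment text only.
comment = "Renamed from an earlier draft. synonym: respiration, cell respiration"
synonyms = parse_synonyms(comment)
```

A comment without the marker yields an empty list, so the helper can be applied to every class without special cases.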
4.2.1 Different sorts of Synonyms?
As we saw above, synonyms are not always 'synonymous' in the strictest sense of the word, as they do not always mean exactly the same as the class they are attached to. A synonym may be broader or narrower in meaning than the class
name; it may be a related phrase or an alternative wording or spelling, or it may use a different
system of nomenclature. Having a single, broad relationship between a class and its
synonyms is adequate for most search purposes, but for applications such as semantic
matching, the inclusion of a more formal relationship set is valuable. For this reason,
one could record a relationship type for each synonym, as GO does. Such
relationships can be stored in the OBO format flat file.
Synonym types:
Some synonym relationship types are:
* the term is an exact synonym of the class name: “ornithine_cycle” is an exact
synonym of “urea_cycle”
* the term is related to the class name: “cytochrome_bc1_complex” is a related
synonym of “ubiquinol-cytochrome-c_reductase_activity”
* the synonym is broader than the class name: “cell division” is a broad synonym of
“cytokinesis”
* the synonym is narrower or more precise than the class name:
“pyrimidine-dimer_repair_by_photolyase” is a narrow synonym of “photoreactive_repair”
* the synonym is related to the class name, but is not exact, broader or narrower:
“virulence” has a synonym type of 'other related' to the class name “pathogenesis”
However, we do not recommend capturing such ‘synonym types’ as the GO style guide suggests. Capture only exact synonyms.
For the OWL format one could use the W3C standard for thesauri, the ‘Simple Knowledge Organisation System’ (SKOS, http://www.w3.org/2004/02/skos/), to encode synonym types through relations like “narrower than” and “broader than”. It also provides a “preferred label” and a "related to" element for terminological mapping.
The SKOS Core Vocabulary includes the following properties for asserting semantic
relationships between concepts: skos:semanticRelation, skos:broader, skos:narrower
and skos:related. In the property hierarchy, skos:semanticRelation is the top semantic
relationship and the others are its child relationships. To assert that one concept is broader
in meaning (i.e. more general) than another, where the scope (meaning) of one falls
completely within the scope of the other, use the skos:broader property. To assert the
inverse, that one concept is narrower in meaning (i.e. more specific) than another, use
the skos:narrower property. The object of skos:broader is the more general concept; in the
following fragment, "hydrographic structures" is broader than "canals":
<skos:Concept rdf:about="http://www.my.com/#canals">
<skos:broader rdf:resource="http://www.my.com/#hydrographic%20structures"/>
</skos:Concept>
For example, a complete document asserting the relationship in both directions:
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:skos="http://www.w3.org/2004/02/skos/core#">
<skos:Concept rdf:about="http://www.example.com/concepts#mammals">
<skos:prefLabel>mammals</skos:prefLabel>
<skos:broader rdf:resource="http://www.example.com/concepts#animals"/>
</skos:Concept>
<skos:Concept rdf:about="http://www.example.com/concepts#animals">
<skos:prefLabel>animals</skos:prefLabel>
<skos:narrower rdf:resource="http://www.example.com/concepts#mammals"/>
</skos:Concept>
</rdf:RDF>
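Since skos:broader and skos:narrower are inverses, a document that asserts both directions can be checked for consistency. The sketch below uses only the Python standard library and assumes the simple RDF/XML layout shown above (skos:Concept elements with rdf:about and nested rdf:resource links); the function names are hypothetical, and a general RDF parser would be needed for other serialisations.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

DOC = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://www.example.com/concepts#mammals">
    <skos:broader rdf:resource="http://www.example.com/concepts#animals"/>
  </skos:Concept>
  <skos:Concept rdf:about="http://www.example.com/concepts#animals">
    <skos:narrower rdf:resource="http://www.example.com/concepts#mammals"/>
  </skos:Concept>
</rdf:RDF>"""


def skos_links(rdf_xml, relation):
    """Return the set of (subject, object) pairs asserted with skos:<relation>."""
    root = ET.fromstring(rdf_xml)
    pairs = set()
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        subject = concept.get(f"{{{RDF}}}about")
        for link in concept.findall(f"{{{SKOS}}}{relation}"):
            pairs.add((subject, link.get(f"{{{RDF}}}resource")))
    return pairs


def missing_inverses(rdf_xml):
    """Return skos:broader links that lack the matching skos:narrower link."""
    broader = skos_links(rdf_xml, "broader")
    narrower = skos_links(rdf_xml, "narrower")
    return {(s, o) for s, o in broader if (o, s) not in narrower}


unmatched = missing_inverses(DOC)
```

For the document above every broader link has its inverse, so `unmatched` is empty.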
When you add a synonym in OBO-format using OBO-Edit, choose a type from the pull-
down selector (see the DAG-Edit user guide for more information). DAG-Edit will
incorporate the synonym type into the OBO format flat file when you save. The default
synonym type is the broadest, 'synonym' (equivalent to 'related' above).
4.2.2 Property synonyms
One can also create object property synonyms (see section 4.1 of
http://www.w3.org/TR/owl-guide), e.g.:
<owl:ObjectProperty rdf:ID="has_child">
<owl:equivalentProperty>
<owl:ObjectProperty rdf:ID="has_kid"/>
</owl:equivalentProperty>
</owl:ObjectProperty>
4.3 Lexical Properties of class names
4.3.1 Capitalisation
Names should be in lower case letters throughout, except for acronyms, which are capitalised (if their use in class names can't be avoided), and proprietary names, which are written as such. Proper names / brand names can break the convention
rules unless rdf-field restrictions prevent this. E.g. there can be a "CBS_station"
(starting with a capital letter) and there can be a CamelCase brand name. This is the
recommendation of the OBO Consortium. Other KR communities (semantic web / OWL,
Protégé group) use capitals for beginning class names, while proprietary names and
properties start with lower case letters.
Internal capitalization is, however, enforced by some computer systems and mandated
by the coding standards of many programming languages, e.g. Java coding style
dictates that UpperCamelCase be used for classes, and lowerCamelCase be used for
instances and members. So unless you plan to use auto-generated Java classes or any
MDA approach to convert the ontology into software code, avoid CamelCase.
4.3.2 Character set
Terms designating RUs should consist mainly of alphabetic characters, numerals and underscores. Whether you will be allowed to use the space as word delimiter depends on the way the implementation handles the strings for the representational unit in question. Avoid special characters where possible. Avoid accents, sub- or superscripts, and characters and character combinations that may have a special meaning in regular expressions, programming languages or XML. These recommendations are largely dependent on what the parsers for the
implementation format for the specific RU can handle, e.g. OWL identifiers (values of
the rdf:ID / :NAME property) must begin with a letter or underscore and contain only
letters, numerals, and the underscore character (‘_’). Spaces are not allowed here.
For the full, less restrictive specification see http://www.w3.org/TR/REC-xml-names/#NT-NCName:
NCNameStartChar ::= Letter | '_'
NCNameChar ::= NameChar - ':'
( NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender )
If you keep the class name in another element, e.g. rdfs:label, you are in principle not
restricted in your character usage.
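The restrictive identifier rule stated above (begin with a letter or underscore, then only letters, numerals and underscores) can be expressed as a regular expression. The sketch below deliberately limits "letter" to ASCII, which is an assumption stricter than the NCName production; the function name is hypothetical.

```python
import re

# A letter or underscore first, then letters, numerals or underscores.
# Restricting letters to ASCII is an assumption of this sketch.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_identifier(name):
    """Return True if `name` satisfies the restrictive identifier rule."""
    return _IDENTIFIER.fullmatch(name) is not None


checks = {
    name: is_valid_identifier(name)
    for name in ["physical_part", "my class", "3d_structure", "copper-based_compound"]
}
```

Names that fail this check can still be kept in a label element, where the character set is not restricted in the same way.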
4.3.2.1 Character set formatting
No subscripts or superscripts are allowed (e.g. cm3 replaces cm³ and CO2 replaces
CO₂). The names of chemical elements from the periodic table should be written in full
and should not be abbreviated to their symbols (use hydrogen, copper and
zinc rather than H, Cu and Zn). Greek symbols should be spelled out, e.g. alpha, beta,
gamma. Temperature designations like 37 °C can be represented as 37C.
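Some of these formatting rules can be applied automatically. The following sketch relies on Unicode NFKC normalisation to fold superscript and subscript digits into plain digits; the Greek mapping covers only the three letters named above, and the function name and the temperature pattern are assumptions made for illustration.

```python
import re
import unicodedata

# Only the three letters mentioned in the text; extend as needed.
GREEK_NAMES = {"α": "alpha", "β": "beta", "γ": "gamma"}


def normalise_formatting(text):
    """Fold sub-/superscripts, spell out Greek letters, compact temperatures."""
    # NFKC maps compatibility characters such as "³" and "₂" to "3" and "2".
    text = unicodedata.normalize("NFKC", text)
    for symbol, name in GREEK_NAMES.items():
        text = text.replace(symbol, name)
    # "37° C" or "37 °C" becomes "37C".
    return re.sub(r"(\d)\s*°\s*C\b", r"\1C", text)


examples = [normalise_formatting(s) for s in ["cm³", "CO₂", "α-helix", "37° C"]]
```

Chemical element symbols are not expanded here, because a symbol such as "H" cannot be told apart from an ordinary letter without domain knowledge.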
[Add Punctuation]
4.3.3 Word separators
Various kinds of punctuation connect name parts, including separators such as spaces,
hyphens, and grouping symbols such as parentheses. These may have:
a) No semantic meaning. A naming rule may state that separators will consist of one
blank space or exactly one special character (for example a hyphen or underscore)
regardless of semantic relationships of parts. Such a rule simplifies name formation.
b) Semantic meaning. Separators can convey semantic meaning by, for example,
assigning a different separator between words in the qualifier term from the separator
that separates words in the other part terms. In this way, the separator identifies the
qualifier term clearly as different from the rest of the name. For example, in the data
element name “Cost_Budget-Period_Total_Amount” the separator between words in the
qualifier term is a hyphen; other name parts are separated by underscores.
Asian languages often form words using two characters which, separately, have
different meanings, but when joined together have a third meaning unrelated to its parts.
This may pose a problem in the interpretation of a name because ambiguity may be
created by the juxtaposition of characters. A possible solution is to use one separator to
distinguish when two characters form a single word, and another when they are
individual words.
Class name terms should be delimited by the "_" (underscore) separator. The underscore substitutes for the space character. Whether you will be allowed to use the space as word delimiter depends on the way the implementation handles the strings for the representational unit in question. Under the OBO umbrella one can
find the "MyClass", "My Class", "My-Class", "My_Class", “My_class" and "my class"
conventions, sometimes even within one ontology. One convention is not necessarily
better or worse than another as long as it is used consistently within the ontology. Java
programmers, for example, use the "MyClass" (CamelCase) convention, because that
is the standard for naming Java classes, whereas text miners use the "My class"
convention, because it is easier for natural language processing tools to tokenize. The
CamelCase convention has problems capturing class names like “Sample_pH”, which
would then read “SamplePH”. XML-based languages don't like the space as a
separator, so check how your parser copes with it in the (meta-) RU which captures the
name for the RU. The current Jena XML parser does not cope with spaces in class
names when using the Protégé-2000 OWL plugin. When the class name is captured
within the rdf:ID element, it also becomes part of a namespace and URI in OWL, and
these, as explained above, should not contain spaces or special characters. This is not
an issue when using the rdfs:label element to capture the class name. The easiest thing
is, however, to avoid the space altogether.
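Converting a space-delimited label into the underscore convention is a one-line substitution. The sketch below is only an illustration with a hypothetical function name; it leaves capitalisation and hyphens untouched.

```python
import re


def to_class_name(label):
    """Replace each run of whitespace with a single underscore."""
    return re.sub(r"\s+", "_", label.strip())


converted = [to_class_name(s) for s in ["Sample pH", "  my   class ", "physical_part"]]
```

Because the underscore keeps the word boundary explicit, “Sample_pH” stays readable, which the CamelCase form does not guarantee.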
4.3.3.1 Hyphens, dash and slash
The hyphen should be avoided as a word separator; it should be used as in normal written English, as long as the representation formalism allows it. Java
will interpret the hyphen as a minus. Using the hyphen as a separator would also cause
ambiguity with hyphens that are required by English, e.g. “copper-based_compound”,
and with hyphens used to restrict or refine the meaning of a name, e.g.
"bow-boat_part" and "bow-the_weapon", as is still done in some ontologies. The hyphen
has many meanings which we take for granted, but which have to be assigned more
explicitly to be processed by computers. When using the hyphen one should be aware
that its meanings can conflict: it can generally mark an undefined "somehow-related-to"
relationship, it can mark a closer semantic binding as in “copper-based_compound”, and
it can encode substantiation as in "abdomen-sonography", but it can also mark a
divergence in meaning between the two words, as in "black-white". In “bio- and
genetechnology” it encodes an ellipsis, standing for the morpheme “technology”.
Sometimes the hyphen encodes different logical connectors like "and" or "or", and it can
be used to separate syllables when breaking a word in two at the end of a line. In
sentences it can of course also set off additional thoughts
squeezed into a sentence, as in “I am always there in time – except Sundays – to listen
to you.” The hyphen also marks numerical, spatial or temporal ranges as in “1–4
telephone calls”, “Bremen–Hamburg” and “25.09.–28.12”, or is used as a minus or to
indicate an omission as in “the PC is worth 300,–”. Last but not least, it can be confused
with a minus.
So we need to differentiate between the hyphen and the dash. There are two kinds of
dashes: the n-dash and the m-dash. The n-dash is so called because it is the same
width as the letter "n". The m-dash is longer, the width of the letter "m". We use the
n-dash for numerical ranges, as in "6–10 years". When we need a dash as a form of
parenthetical punctuation in a sentence we use the m-dash.
The slash "/" means OR or AND in most cases and should be avoided in class names
as should logical connectives in general.
4.3.4 Singular nouns
Names for RUs should be in the singular form throughout. Class names are always singular nouns, e.g. "randomisation" instead of "randomise". This prevents
redundancy and misclassifications, for example creating a class "experiments" (plural)
and then "experiment" as its subclass deeper in the hierarchy. If you want to import
legacy XML or generate XML feeds from the ontology, you have to use the singular form
anyway, since this is the expected convention for XML tags.
4.3.5 Use present tense for representational units
Class and property names should be uniformly captured in present tense. Sometimes a time perspective is indicated within class or property names, e.g.
“to_be_measured”, “measuring”, “measurement_taken”. Class names should be
normalized consistently into the present tense form, e.g. “measurement”.
4.3.6 Plurals and sets
If you have to capture plurals, you have three possibilities, e.g. “protocols”, “set_of_protocols” or “protocol_set”. The last form is recommended, because it is
easier to spot (also for text mining). It is preferred over “set_of_x” because it is placed
alphabetically directly beneath its singular form within the hierarchy. Use plurals
sparingly. Creating a plural container of the form “x_set” for each singular x creates a lot
of classes which we might not use at all. An instance of 'protocol' is a protocol and an
instance of 'protocol_set' is a set of protocols. Be aware of the difference: Each class 'A'
in an ontology has the implicit meaning 'the class A'.
[Refine this (Chebi comment)]
Discriminate carefully between Class and Set: Both classes and sets are marked by
granularity, but sets are timeless. A class endures through time and survives the
turnover in its instances. A set is determined by its members. A class is not determined
by its instances (as a state is not determined by its citizens and as an organism is not
determined by its molecules). A set is an abstract structure, existing outside time and
space. The set of human beings existing at t is (timelessly) a different entity from the set
of human beings existing at t' because of births and deaths.
4.3.7 Avoid linguistic ellipses
Be explicit and try to avoid ellipses, because what you leave out or regard as implicitly
clear is not necessarily known to others, and in any case not to computers. An ellipsis is a rhetorical figure of speech, the omission of a word or words required by strict
grammatical rules but not by sense. The missing words are implied by the context in
human language. Ellipsis usage often points to slang words, which should be avoided or
captured as synonyms, e.g. "chemo" for "chemotherapy". The aposiopesis is a special form of
rhetorical ellipsis (wiki). Typical examples of ellipsis are: "Pat embraces Meredith, and
Meredith, Pat", in which the second instance of the word "embraces" is implied rather than
explicit; and "And so to bed", which appears on several occasions in the diary of Samuel
Pepys, meaning "and so I went to bed".
The Plant Ontology used to use 'cell' to mean 'plant cell' in this way, which led to
problems when they had to extend the ontology to deal with bacteria in plants. They
have now changed the definition and name of their former 'cell' to ‘plant cell’ and
created a broader ‘cell’ class. The general rule is, for every expression 'E': 'E' means:
E. The term ‘E’ means what the word ‘E’ means, but the word ‘E’ may mean different
things...
Sometimes hyphen usage is a hint of ellipsis usage. This should be avoided, e.g. "bio-
and genetechnology" would become "biotechnology and genetechnology" and then probably
be modelled as two separate classes, "biotechnology" and "genetechnology".
Confusion is also spawned by the fact that we use the very same general terms to refer
both to universals and to collections of particulars. Consider:
· HIV is an infectious retrovirus
· HIV is spreading very rapidly through Asia
This, however, could also be regarded as ellipsis usage: the first "HIV" stands
for "HIV virus", the second stands for "HIV disease".
4.3.8 Acronyms and Abbreviations
Ideally, abbreviations in names should be avoided and acronyms resolved. Names for RUs should be explicit, e.g. "number_of_residues" should be used instead
of a totally unintuitive "n_res". Acronyms should be included in the synonym list and
resolved if used as the preferred class name. When an acronym, however, is commonly
used with very high frequency in everyday language in place of its full name, for
example “laser”, it should be used as the class name, with its resolved name listed in the
synonym list. Domain-specific acronyms should be resolved. Only the main-focus
acronyms that are found frequently in the ontology can stay as they are. Resolving e.g.
“NMR” as “nuclear_magnetic_resonance_spectroscopy” in each RU within an NMR
ontology makes too many terms unnecessarily long and hard to read.
Top level classes should never have abbreviations or acronyms in their names;
however, there are bottom level classes in which an acronym or abbreviation could be
used. In these cases of compound terms on the bottom level the acronym should be
unambiguous and be resolved in at least one of the synonyms. Do not allow
abbreviations which coincide with expressions that have other meanings ('chronic olfactory lung
disorder' should never be abbreviated as 'cold'). If acronyms cannot be avoided, capitalize
them. There is no clear policy on when to spell out abbreviations, so use your common sense.
4.3.9 Registered Product- and Company-names
Proprietary names should be captured as they are, as long as this is not prohibited
by the parser. In our case we are not restricted here, but should discuss whether we
allow spaces or substitute them with the underscore.
[add and refine]
4.3.10 Word compositions and length
Names for RUs should be at least four characters long and as short as possible, so that they are easily readable and understandable. Avoid creating human readable or preferred names that look like full sentences. Ideally, short and maximally intuitive names are to be preferred. Names are useful only if they are in fact used [see Jacob Koehler paper "intelligibility of GO terms" + DILS paper].
Word compositions longer than five words / morphemes should be avoided. When class
names are made out of several words, try to use words that are already defined in higher
hierarchy levels of the ontology. ‘Recycle’ words whenever possible. Build compound names out of simpler ones from the ontology in a consistent LEGO-like approach. Consistent means that the binding operators (words used to connect the
other parts of the class name) are used in the same sound manner throughout the
ontology.
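The length and composition rules above could be checked automatically. The following is only a minimal sketch: it assumes that words in a name are separated by underscores (as in the examples in this document), and the function names and the `known_words` argument are illustrative, not part of any existing tool.

```python
def check_class_name(name, max_words=5, min_chars=4):
    """Return a list of warnings for a candidate class name.

    Assumption: words are separated by underscores.
    """
    warnings = []
    words = [w for w in name.split("_") if w]
    if len(name) < min_chars:
        warnings.append("shorter than %d characters" % min_chars)
    if len(words) > max_words:
        warnings.append("more than %d words" % max_words)
    return warnings


def unknown_words(name, known_words):
    """List the words of a compound name that are not yet defined in the ontology.

    `known_words` is a set of lower-case words already used by existing classes.
    """
    return [w for w in name.split("_") if w and w.lower() not in known_words]


print(check_class_name("sample_temperature_in_autosampler"))  # no warnings
print(unknown_words("sample_temperature", {"sample"}))         # words still to be defined
```

A non-empty result from `unknown_words` is a hint that the compound name is not yet built from 'recycled' words; whether to rename or to add the missing classes remains a curator decision.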
4.3.10.1 Compound vs. atomic names for representational units
Sometimes one encounters rather long names for RUs, which encode a lot of semantics
within the name. These complex names are compositions of many words and therefore
are called compound terms. They often consist of a noun phrase, like
"sample_temperature_in_autosampler" embedding a prepositional term (localizational
property like "in_autosampler"). [Compositionality – see Chris Mungall's OBOL , see
Okren]
When the representational formalism allows properties to be formalized and the atomic
components are already present, these classes can be refactored / dissected /
decomposed into more primitive existing classes (atoms) and attributes or relations
between them. This is encouraged, for example, for OWL ontologies. When only an is_a
hierarchy (without properties) is provided, compound names should be kept in the long form to capture what the user really wants to express, and one has to keep the semantics within the class name. As long as one is working with CVs, one should aim to
be reasonably descriptive, even at the risk of some verbal redundancy or longer names.
That is why one often finds rather long class names in taxonomic CVs (e.g. GO).
When word combinations with genitive, dative or accusative case occur, variants are
possible, e.g. combination into one single word ("Breaking_off_the_experiment" becomes
"experiment_breakoff") or connection with a hyphen ("NMR_of_Hydrogen" becomes
"Hydrogen-NMR").
According to DIN 12/1993, when new terms are created out of existing, already defined
class names, the following types of multi-word terms can be distinguished (B. Schaeder,
Fachlexicographie: Fachwissen und seine Repraesentation in Woerterbuechern, 1994,
Tübingen):
Determinative term (concept) linkage:
A second term occurs additionally, as a feature in the content of the original term,
whereby the latter is restricted. The resulting multi-word term is a subterm. E.g.
randomised study.
Disjunctive term linkage:
The new multi-word term encompasses the scope of both constituent terms. E.g.
Consensus Study.
Integrating term linkage:
Objects associated to terms are combined into the next higher whole. E.g. Sponsor-investigator.
Conjunctive term integration:
The new term merges the contents of both constituent terms, and is their next common
subterm. E.g. Investigator study.
4.3.10.2 Splitting and merging classes
Simple (sometimes hyphen separated) and bimorphemic compound terms like
"histology-result" should only be atomised into "histology" and "result" when the occurring
morphemes represent single important classes themselves which are of use in other
multi-word creations. E.g. for a clinical trial the atomic morphemes "ethics" and
"commission" are not important, so a multi-word term like "ethics_commission" can stay
as it is and needs to be defined only once.
The standard procedure for refactoring / splitting a class is to obsolete the original class
and add a suitable comment directing annotators to the new classes (see Metadata
section). Classes are merged in cases where two classes have exactly the same
meaning. Usually this situation arises when one class exists, and another wording of the
same concept is added as a new class instead of as a synonym, either because a
curator didn't find the old class or didn't know it meant the same thing.
For OWL: When two classes are merged, e.g. class A and class B are merged into class
A, the class name and the ID of class B are made synonyms of class A.
For OBO: When two classes are merged, e.g. class A and class B are merged into class
A, the ID of class B is made a secondary ID, and its class name is made a synonym.
Usually, the ID that has existed longer is used as the primary ID, but exceptions can be
made; e.g. the name of the class with the newer ID may be more correct or the
definition may be better. Secondary IDs are stored in the OBO flat file with the 'alt_id'
tag.
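The OBO merge procedure described above can be sketched roughly as follows. This is an illustration under assumptions: terms are held as plain dictionaries, the identifiers `EX:0000001` and `EX:0000002` are hypothetical, and the `synonym: "..." EXACT []` line follows the OBO 1.2 flat file style, which should be checked against the format version actually in use.

```python
def merge_terms(primary, secondary):
    """Merge `secondary` into `primary` and return the merged term.

    Each term is a dict with the keys 'id', 'name', 'synonyms' and 'alt_ids'.
    The ID of the secondary term becomes a secondary ID (alt_id) and its
    name becomes a synonym, as described in the text.
    """
    merged = {
        "id": primary["id"],
        "name": primary["name"],
        "synonyms": list(primary["synonyms"]),
        "alt_ids": list(primary["alt_ids"]),
    }
    merged["alt_ids"].append(secondary["id"])
    merged["alt_ids"].extend(secondary["alt_ids"])
    for text in [secondary["name"]] + list(secondary["synonyms"]):
        if text != merged["name"] and text not in merged["synonyms"]:
            merged["synonyms"].append(text)
    return merged


def to_obo_stanza(term):
    """Render a term as an OBO-style [Term] stanza (assumed OBO 1.2 layout)."""
    lines = ["[Term]", "id: " + term["id"], "name: " + term["name"]]
    lines += ["alt_id: " + alt for alt in term["alt_ids"]]
    lines += ['synonym: "%s" EXACT []' % s for s in term["synonyms"]]
    return "\n".join(lines)


older = {"id": "EX:0000001", "name": "chemotherapy", "synonyms": [], "alt_ids": []}
newer = {"id": "EX:0000002", "name": "chemo", "synonyms": [], "alt_ids": []}
print(to_obo_stanza(merge_terms(older, newer)))
```

Here the older ID is kept as primary, following the usual case described above; swapping the two arguments covers the exception where the newer class is the better one.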
4.3.11 Affixes (prefix, suffix, infix and circumfix)
The word-stem should be used and affixes to names should be avoided where possible or at least be used consistently. Since each class 'A' implicitly means 'the
class A', prefixes or suffixes involving “_class” must be avoided. The same applies
to suffixes like "_entity" and "_type". When an ontology has many terms starting with the
same prefix, for example “sample_number”, “sample_origin”, …, it suggests the need
for transforming the postfixes into properties of a [prefix]-class when building the
ontology. If subclasses are named using the class name and a further descriptive
morpheme, this should be done in a consistent way throughout the subclasses. For
example, a class "receptor" can have two subclasses named
“catecholamine_receptor” and “peptide_receptor” (naming them just “catecholamine”
and “peptide” would be bad practice, since ellipses have to be avoided and “peptide”
designates a completely different class anyway). So there should not be the names
“catecholamine_receptor” and “peptide” side by side. If one prefixes a "receptor" subclass name in
the form xy_receptor, e.g. "adrenaline_receptor" (having the ligand as the prefix xy), one
cannot integrate receptors that are named according to their downstream signal
transduction module (and not the ligand), e.g. "G-protein_coupled_receptor", in a
consistent way. Infixes, circumfixes, articles, conjunctions and possessive forms of words should be used consistently, but be avoided when possible.
4.3.12 Logical connectives
Logical connectives such as "and", "or" and "not" should not be used within names for RUs, because they will be formalised as constraints and axioms later (and
hence will allow for reasoning). 'rabbit or whale' does not designate a special universal
of mammal.
4.3.13 "Taboo" words and Characters
Where possible, words from the metalevel (the representation formalism / KR language) should not be used within names for RUs. The use of database or
ontology language keywords, for example "Model", "Class", "KIF", "Clips" and "OWL",
and of XML-style tags or characters designating tags or regular expressions should be
avoided when possible, because you never know whether all parsers you might need to
use will handle these. This also reduces the risk of parser problems when the ontology
has to be translated into other formats.
Other words and morphemes to be avoided are highly ambiguous ones, e.g. the affixes
“set” and “setting” are among the most ambiguous words in English. "Set" alone has over
20 different meanings (e.g. "set" can refer to the process of setting parameters or to a
collection of parameters).
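A simple screening of names for logical connectives and metalevel words, as discussed in the two sections above, might look like the sketch below. The word lists contain only the examples given in the text and would need to be extended for a real project; the function name is illustrative, and words are assumed to be separated by underscores or hyphens.

```python
LOGICAL_CONNECTIVES = {"and", "or", "not"}
# Example keywords taken from the text; extend for the formalism in use.
METALEVEL_WORDS = {"model", "class", "kif", "clips", "owl"}


def find_name_problems(name):
    """Return a list of findings for one candidate name (empty if none)."""
    problems = []
    words = [w.lower() for w in name.replace("-", "_").split("_") if w]
    for word in words:
        if word in LOGICAL_CONNECTIVES:
            problems.append("logical connective: " + word)
        elif word in METALEVEL_WORDS:
            problems.append("metalevel word: " + word)
    if "/" in name:
        problems.append("slash used as connective")
    if "<" in name or ">" in name:
        problems.append("tag character")
    return problems


for candidate in ["rabbit_or_whale", "OWL_model", "peptide_receptor"]:
    print(candidate, find_name_problems(candidate))
```

Such a check only flags candidates for review; a word like "model" may be legitimate in some domains, which is why the text says "where possible".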
4.3.14 Specific language requirements
Consistency is required if encountering this special case.
Where there are differences in the accepted spelling between British and US usage,
use the US form, e.g. polymerizing, signaling rather than polymerising, signalling.
A common source of misspelled tags is the translation from other alphabets or
characters. For example, the Umlaut, commonly used in German, is available in
the Latin-1 character set. Since this character set is often unavailable, Germans
frequently represent an Umlaut character by means of a longhand encoding, such as
"ue" for "ü". Consistency is required in these special cases to avoid a mixture of "ü"s and
"ue"s.
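One way to detect such mixtures is to normalise every name to the longhand form and look for names that collide. The sketch below assumes the usual German longhand pairs (ae, oe, ue); the function names are illustrative.

```python
UMLAUT_LONGHAND = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}


def to_longhand(text):
    """Replace each Umlaut character by its longhand encoding."""
    return "".join(UMLAUT_LONGHAND.get(ch, ch) for ch in text)


def mixed_spellings(names):
    """Group names that differ only in how the Umlaut is written."""
    groups = {}
    for name in names:
        groups.setdefault(to_longhand(name), set()).add(name)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


print(mixed_spellings(["Tübingen", "Tuebingen", "Bremen"]))
```

The normalisation deliberately goes from "ü" to "ue" only. The reverse direction is not safe, since many words contain "ue" without any Umlaut being meant (e.g. "value").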
5 Depicting representational units within text
Be consistent in your notation. We use bold type to depict relations involving
particulars; italics for universals and for relations between universals; and Roman for
particulars.
[to be added: Formatting convention when using ontological repr units in literature –
see OBO RO
Italics
Bold
“ “
‘ ‘
UPPERCASE throughout
lowercase throughout
underlined
One Recommendation: If you use boldface to emphasize that you speak of the term and
not of its denotation, then do not use boldface for other purposes. Use single quotes to
explicitly refer to the term 'class'. Since classes are not terms, but rather have terms as
names one should say: "the class called 'human'" (where 'human' is the term used to
name the class in question), or "the class human" (where italics are used to emphasize
that human represents a class). One might, though, want to reserve italics for universals,
e.g. "the class representing (the universal) human", and then one should say "the class
human", or "the class 'human'" (the last is a shortcut, and this kind of shortcut should be
introduced explicitly).]
6 Class definitions
Class definitions should provide the context and meaning of the class in a way that eases its interpretation. The definition should contain important keywords that describe the class's inherent attributes and relations to other classes in natural language. In reality, however, proper definitions cannot be created for all universals,
especially at the root level of the ontology (e.g. it is hard to define “thing”). A class
should be given a humanly intelligible definition only when the necessary and sufficient
conditions for being an instance of the corresponding universal are really understood.
Before that, do not make up pseudo-definitions (e.g. circular definitions), but
provisionally collect the necessary conditions in the comment field. Proofread your
definitions carefully to eliminate typos and double spaces. As with class names, avoid
using abbreviations that may be ambiguous. Definitions should be as brief as possible, but as complex as necessary. They should begin with an upper-case letter,
can consist of more than one sentence if necessary and should always end with a period (full
stop). Definitions should start in the following way: “A [class described] is a [superclass],
which/that [most relevant intrinsic properties (attributes and relations to other classes)].
It…. [Enter]”. When using the word “it”, make sure you always refer to the described
class only.
In practice one would first capture non-formal definitions as they come from the domain
experts or glossaries, or as gathered by a google:define search. These are captured with their
provenance (meta-) data, after a “tempdef” marker. Then one creates a second
definition which is more formal and standardized according to the principles
mentioned below (put after the def marker, see metadata section). Currently all
definitions are captured together with metadata in the rdfs:comment field, which is not
the cleanest solution, since the comment field can hold anything from editorial notes,
scope notes and provenance notes to definitions. The xml:lang attributes do not have to
be set, because they can be set once for all classes in the metadata ontology
description tab and these lang-attributes - at least for the rdfs:label field - tend to cause
problems when importing these ontologies.
6.1 General rules for creating sound normalized definitions
1. Each definition refers to only one class.
2. Definitions should be as clear and concise as possible in order to convey the essence, "Das
Wesen" (Silesius) of the universal to the user of the ontology.
3. Definitions should define classes and their referred universals and not the words used to refer to
classes (class names), so in definitions avoid terms like ‘class’, 'descriptor', 'name', etc. that
refer to RUs and not to the universals in reality. E.g. the definition of 'eye' is 'organ of sight', not
'is name of organ of sight', nor ‘class or concept describing an organ of sight’. Avoid using
acronyms within definitions.
4. The definitions should explain which characteristics (or properties) distinguish members
of this class from the others (the upper class and siblings).
5. Definitions should use simple, easy to understand words that are meaningful to most of the
users. In the best case all terms in the definition can be found as classes in higher levels of the
ontology and are thus defined.
6. It should be positive and not negative. Definitions like ‘all animals that are not a mammal’ or ‘all
non-membrane proteins’, which do not designate natural kinds, are not helpful, since
complements of universals are not necessarily themselves universals.
7. The formal rules for definitions laid down by Aristotle should be applied. When A is_a B, the
definition of ‘A’ takes the form: An A is a B which C, e.g. “A human being is a mammal which
is rational”. Essence = Genus + Differentiae. If a class has more than one parent, i.e. multiple
parenthood cannot be avoided, mention all parent classes in the definition.
8. The definition should be free from words sharing the same root as the thing being defined (to be
represented) and should not contain the class name itself. Avoid circularity in definitions like
these:
An A is an A which is B (person = person with identity documents)
An A is the B of an A (heptolysis = the causes of heptolysis)
9. Each definition should reflect the position in the hierarchy to which a defined RU belongs. The
position of a RU within the hierarchy enriches its own definition by incorporating automatically
the definitions of all RUs above it. The entire information content of the hierarchy can then be
translated cleanly into a computer representation.
10. The definition must be correct in most of the possible contexts in which the class is used, so that the
class is intersubstitutable with its definition in such a way that the result is both grammatically
correct and truth preserving.
11. Include some examples of well known prototypical instances or subclasses of the class.
Additionally have a look at the following paper by Jacob Koehler:
http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1482721&blobtype=pdf
[Do we need definitions for particulars that we currently represent as classes, e.g. do the brand names of
nmr-instrument vendors need definitions???]
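Two of the rules above lend themselves to simple tooling: the genus-differentia form (rule 7) and the ban on circularity (rule 8). The following sketch is illustrative only. The article choice is a crude vowel heuristic, and the circularity test merely looks for the literal class name in the defining part, so it will not detect words that only share a root.

```python
import re


def _normalise(text):
    """Lower-case the text and reduce it to space-separated words."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def article(noun):
    """Pick 'a' or 'an' by first letter (heuristic, not always correct)."""
    return "an" if noun[:1].lower() in "aeiou" else "a"


def genus_differentia(class_name, parent_name, differentia):
    """Build a definition of the form 'A <class> is a <parent> which <differentia>.'"""
    subject = class_name.replace("_", " ")
    genus = parent_name.replace("_", " ")
    sentence = "%s %s is %s %s which %s." % (
        article(subject), subject, article(genus), genus, differentia)
    return sentence[0].upper() + sentence[1:]


def is_circular(class_name, definiens):
    """True if the defining part repeats the class name itself."""
    name = _normalise(class_name)
    body = _normalise(definiens)
    return re.search(r"\b" + re.escape(name) + r"\b", body) is not None


print(genus_differentia("human_being", "mammal", "is rational"))
print(is_circular("person", "person with identity documents"))
```

Note that `is_circular` is applied to the defining part only (what follows "is a ..."), since the template itself opens with the name of the class being defined.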
7 Unique identifiers
Following the decentralized web paradigm, every single RU (class or relation) should be versioned independently rather than versioning the ontology as a whole. Therefore it is necessary to consider conventions for unique identifiers for RUs. If one tries to edit a set of modular ontologies held together by just the string class names,
every time somebody wants to change a name, fix a spelling error, etc., there is a global
change that is intrinsically unreliable or, if the ontologies are distributed, requires a
major organisational effort. When the identifiers are formal ID numbers and human readable class names are kept as labels, you can change the label without disturbing the linkages. Hence versioning becomes easier when using unique formal
identifiers for RUs in representational artefacts. Some ontology editors, like Protégé-2000,
construct identifiers out of the ontology name and numbers automatically.
A unique identifier MUST NOT be deleted once used. IDs should be conserved at all
times so that, even if a term is defunct or has a new ID, someone searching using the
old ID can find it.
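The separation of stable identifiers from changeable labels can be illustrated with a small in-memory structure. This is a sketch only: the class `Vocabulary`, its methods and the identifiers `EX_1` and `EX_2` are hypothetical and do not correspond to any existing tool.

```python
class Vocabulary:
    """Terms keyed by a permanent ID; labels may change, IDs are never deleted."""

    def __init__(self):
        self._terms = {}

    def add(self, term_id, label):
        if term_id in self._terms:
            raise ValueError("identifier already used: " + term_id)
        self._terms[term_id] = {"label": label, "obsolete": False, "replaced_by": None}

    def rename(self, term_id, new_label):
        """Change the human readable label; links that use the ID are unaffected."""
        self._terms[term_id]["label"] = new_label

    def retire(self, term_id, replaced_by=None):
        """Mark a term as defunct instead of deleting it."""
        term = self._terms[term_id]
        term["obsolete"] = True
        term["replaced_by"] = replaced_by

    def lookup(self, term_id):
        return dict(self._terms[term_id])


vocabulary = Vocabulary()
vocabulary.add("EX_1", "plant ceell")
vocabulary.add("EX_2", "cell")
links = [("EX_1", "is_a", "EX_2")]        # relations refer to IDs, not labels
vocabulary.rename("EX_1", "plant cell")   # spelling fix, no link needs to change
print(vocabulary.lookup("EX_1"), links)
```

There is intentionally no delete operation: a retired term stays retrievable under its old ID and can point to its replacement.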
OBO encourages numeric local IDs. Anything that is a valid XML ID can be used. As a
rule of thumb, while user friendly names for RUs should not cause problems for human processing, their IDs should not cause problems for machine processing.
Always remember that an ID is associated with a definition and a universal rather than with the preferred class name. The numeric identifier resides in the rdf:ID field
and the human readable name of the class is in the rdfs:label field. These correspond to
the X and Y fields in the OBO-Format.
OBO IDs consist of an (all capitalised) prefix + underscore or “:” (not in OWL) + local ID.
The prefix can be the more commonly used short form (e.g. ‘OBI’ or ‘msi-nmr’) or a long
form (e.g. the full URI prefix). Only the long form + local ID is used in proper OWL files
(although the short form can be used as a qname). Currently the long form is left implicit
for most OBO ontologies; OBO will come up with a default mapping (which can be
overridden by the ontology maintainer), e.g. ONTOLOGYSHORTNAME_21 maps to
urn:lsid:ontologyshortname.sourceforge.net:ONTOLOGYSHORTNAME_21, and there
will be widgets in Protégé for substituting the short with the long form throughout an
ontology. OBO has to decide whether to go with URNs or more standard URIs as the
default short-to-long mapping.
[The RECOMMENDED system of identifiers for the PSI CVs consists of two parts. Part one should be the
official ‘namespace abbreviation’ PSI:XXX. The second part corresponds to a numeric accession
number having the pattern “000000”. Therefore, the local identifier is XXX:000000 and the complete PSI
CV unique identifier is of the format “PSI:XXX:000000”.]
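The identifier pattern in the note above can be expressed as a regular expression. The sketch assumes that the namespace abbreviation (XXX) consists of upper-case letters only and that the accession has exactly six digits, as the pattern “000000” suggests; both are readings of the note, not confirmed rules. The namespace `ABC` is a made-up example.

```python
import re

# Assumed shape: "PSI:" + upper-case namespace + ":" + six digits.
PSI_ID = re.compile(r"PSI:([A-Z]+):([0-9]{6})")


def parse_psi_id(identifier):
    """Split a full identifier into (namespace, accession number)."""
    match = PSI_ID.fullmatch(identifier)
    if match is None:
        raise ValueError("not of the form PSI:XXX:000000: %r" % identifier)
    return match.group(1), int(match.group(2))


def format_psi_id(namespace, accession):
    """Build a full identifier and validate it against the pattern."""
    identifier = "PSI:%s:%06d" % (namespace, accession)
    parse_psi_id(identifier)
    return identifier


print(format_psi_id("ABC", 42))
print(parse_psi_id("PSI:ABC:000042"))
```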
Within OBO an "OBO_REL_"-prefix is used to name relations within the rdf:ID field,
e.g. rdf:ID="OBO_REL_part_of". The OBO prefix / idspace equates to an XML/RDF
namespace: A mapping between a "local" ID space and a "global" ID space. The value
for this tag should be a local idspace, a space, a URI, optionally followed by a quote-
enclosed description, like this: idspace: GO urn:lsid:bioontology.org:GO: "gene ontology
terms".
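An idspace line of the form shown above can be parsed and then used to expand a short ID into its global form. This is a minimal sketch: it handles only the ":" separator between prefix and local ID, it assumes that the global ID is obtained by simply appending the local ID to the mapped URI, and the local ID `0000001` is used purely for illustration.

```python
import re

IDSPACE = re.compile(r'idspace:\s+(\S+)\s+(\S+)(?:\s+"([^"]*)")?')


def parse_idspace(line):
    """Parse 'idspace: <local> <URI> "<description>"' (description optional)."""
    match = IDSPACE.fullmatch(line.strip())
    if match is None:
        raise ValueError("not an idspace line: %r" % line)
    return {"local": match.group(1), "uri": match.group(2), "description": match.group(3)}


def expand_id(short_id, idspaces):
    """Expand e.g. 'GO:0000001' using a mapping from local idspace to URI."""
    local, separator, rest = short_id.partition(":")
    if not separator or local not in idspaces:
        raise KeyError(short_id)
    return idspaces[local] + rest


entry = parse_idspace('idspace: GO urn:lsid:bioontology.org:GO: "gene ontology terms"')
mapping = {entry["local"]: entry["uri"]}
print(expand_id("GO:0000001", mapping))
```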
7.1 Capturing the class name and ID using the autoID plugin in Protégé-owl
Within our current ontologies the unique class ID goes in the rdf:ID field. The value of the rdf:ID field is restricted and can only contain special characters at special positions:
at the beginning: £ $ _ and : are allowed, but not @[{./=-+<~#!"%^&*(`
within: none of @[{./=-+<~#!"%^&*(` are allowed
at the end: ":" is prohibited.
The IDs consist of a short prefix designating the ontology (e.g. msi-nmr, OBI or FuGO) and a number within a range that can be specified. We use the CO-ODE autoID plugin for that.
The preferred class name goes into the rdfs:label field, which is not restricted and can contain any Unicode character sequence (for the owl:DatatypeProperty resp.
owl:AnnotationProperty rdfs:label, Protégé gives an xsd:string instance of the class
rdfs:Datatype as the value for its range; http://www.w3.org/2001/XMLSchema#string).
In the Protégé owl plugin the rdf:ID class name is used by default as "browser- or display key" within the term hierarchy (this field is mapped to the :STANDARD-SLOT :NAME
with the Protégé value type string in the Protégé meta-architecture). This
default setting reflects the fact that the alternative field to capture term names, the
rdfs:label field, can be ambiguous, because more than one rdfs:label filler can be
asserted, e.g. synonyms (this, however, can result in the ontology becoming an OWL Full
ontology, which is undesirable when reasoning is to be applied). To set the OWL
hierarchy display keys from :NAME to rdfs:label, proceed as follows (also have a look at
https://www.cbil.upenn.edu/fugowiki/index.php/ProtegeGoodies and
http://protege.cim3.net/cgi-bin/wiki.pl?HidingIdentifiersWithLabelsInOWLPlugin):
Owl/Preferences/Visibility/Metaclasses: mark rdfs:Class and owl:Class to show up. Then
select (the now visible) owl:Class in the forms tab and change the selection for the "Display
slot" from ":NAME" to "rdfs:label".
From now on in the class hierarchy of the OWL Classes tab:
1. classes with no rdfs:label display their identifier (':NAME');
2. classes with rdfs:label for which all the labels explicitly have a language display
their identifier;
3. classes with at least one rdfs:label for which the language is not explicitly stated
display the value of this rdfs:label.
If more than one rdfs:label field with specified lang attribute exists (e.g. when capturing
synonyms within this field), the display slot will be the alphabetically first one, which is
not necessarily the preferred class name (caution!).
7.2 Life science Identifier, (LSID: http://lsid.sourceforge.net/)
The LSID concept introduces a straightforward approach to naming and identifying data
resources stored in multiple, distributed data stores in a manner that overcomes the
limitations of naming schemes in use today. Almost every public, internal, or
department-level data store today has its own way of naming individual data resources,
making integration between different data sources a tedious, never-ending chore for
informatics developers and researchers. By defining a simple, common way to identify
and access biologically significant data, whether that data is stored in files, relational
databases, in applications, or in internal or public data sources, LSID provides a naming
standard underpinning for wide-area science and interoperability. An LSID conforms to
the URN standards defined by the IETF. Every LSID consists of up to five parts: the
Network Identifier (NID); the root DNS name of the issuing authority; the namespace
chosen by the issuing authority; the object id unique in that namespace; and finally an
optional revision id for storing versioning information. Each part is separated by a colon
to make LSIDs easy to parse. Here are a few examples:
urn:lsid:pdb.org:1AFT:1 This is the first version of the 1AFT protein in the Protein
Data Bank.
urn:lsid:ncbi.nlm.nih.gov:pubmed:12571434 References a PubMed article.
urn:lsid:ncbi.nlm.nih.gov:GenBank:T48601:2 Refers to the second version of an entry
in GenBank.
LSIDs name and refer to one unchanging data object each. Unlike the familiar URLs of
the World-Wide-Web, LSIDs are location independent. This means that a program or a
user can be certain that what they are dealing with is exactly the same data if the LSID
of any object is the same as the LSID of another copy of the object obtained elsewhere.
The problem with URLs is that they always point to a particular web server (which may
not always be in service) and worse, that the contents referred to by a URL often
change.
A universal naming scheme simplifies the processing of data from a variety of sources,
because the application does not need to have specific, hard-coded support for each
naming scheme. This allows cross-referencing between data sources to be done
implicitly using URIs. One such effort currently underway is the Life Sciences Identifier
(LSID) project. An example looks like this: urn:lsid:uniprot.org:uniprot:P49841. This
LSID names a protein record in UniProt that is referred to as P49841. It consists of parts
separated by colons: a prefix “urn:lsid:”; the authority name; the authority-specific data
namespace; and the namespace-specific object identifier (here “P49841”).
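Because the parts of an LSID are separated by colons, splitting one into its components is straightforward. The sketch below follows the structure described in this section (prefix "urn:lsid", authority, namespace, object id, optional revision); the tuple and function names are illustrative. A string with only five colon-separated fields is read as having no revision.

```python
from collections import namedtuple

LSID = namedtuple("LSID", "authority namespace object_id revision")


def parse_lsid(text):
    """Split an LSID into authority, namespace, object id and optional revision."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected number of parts: %r" % text)
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError("missing 'urn:lsid:' prefix: %r" % text)
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError("empty part in: %r" % text)
    revision = parts[5] if len(parts) == 6 else None
    return LSID(authority, namespace, object_id, revision)


print(parse_lsid("urn:lsid:uniprot.org:uniprot:P49841"))
print(parse_lsid("urn:lsid:ncbi.nlm.nih.gov:GenBank:T48601:2"))
```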
8 Namespace
Every ontology has its own characteristic namespace, a string of characters that
prefixes the individual identifiers for RUs in an ontology. By maintaining different
namespaces for different ontologies it is possible for one ontology to reference classes,
properties and individuals in another ontology in an unambiguous manner and without
causing name clashes. For example, all OWL classes reference the class owl:Thing.
This class resides in the OWL vocabulary ontology that has the namespace
http://www.w3.org/2002/07/owl#. The FuGO ontology refers to and makes use of the
Dublin Core ontology for annotating its RUs. It refers to these dc classes through their
namespace (http://purl.org/dc/elements/1.1/, as set in the Protégé metadata tab). A
namespace is also called a context, as the valid meaning of a name can change
depending on what namespace applies. In order to ensure that namespaces are unique
they manifest themselves as Uniform Resource Identifiers. As in the OWL language the
class names are also part of a URI, they may not contain spaces or special characters.
In practice the namespace URI is a URL where the ontology can be found on the internet, e.g.:
For the msi-ontology: http://msi-workgroups.sourceforge.net/ontologies/msi/msi.owl
For the FuGO-ontology: http://fugo.sourceforge.net/ontology/FuGO.owl
To get the corresponding namespaces, just append “#” to these URIs.
For better readability however one can internally also substitute the full namespace with
a short intuitive prefix, which should be the same as for the class ID, e.g. “FuGO” or
“msi”.
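As a minimal Python sketch of the relationship between ontology URI, namespace and short prefix: the two URIs are the ones listed above, while the helper names namespace_of and expand are hypothetical and used only for illustration.

```python
# Ontology URIs as given in the text, keyed by their short prefix.
ONTOLOGY_URIS = {
    "msi": "http://msi-workgroups.sourceforge.net/ontologies/msi/msi.owl",
    "FuGO": "http://fugo.sourceforge.net/ontology/FuGO.owl",
}


def namespace_of(ontology_uri: str) -> str:
    """The namespace is the ontology URI with '#' appended."""
    return ontology_uri + "#"


def expand(prefixed_name: str) -> str:
    """Replace a short prefix such as 'FuGO:' with the full namespace."""
    prefix, local_name = prefixed_name.split(":", 1)
    return namespace_of(ONTOLOGY_URIS[prefix]) + local_name


print(expand("FuGO:FUGO_47"))
# prints http://fugo.sourceforge.net/ontology/FuGO.owl#FUGO_47
```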
Serial namespace dependencies: If FuGO imports an ontology, e.g. DC, and the msi-ontology imports FuGO, all references FuGO points to will automatically also be passed on to the msi-ontology. So you don’t need to import these again when they are already imported by an imported ontology.
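The effect of such serial imports can be sketched in Python as a simple walk over an import graph. The dictionary IMPORTS mirrors the example in the text (msi imports FuGO, FuGO imports DC); the function name import_closure is hypothetical, and this is an illustration of the idea, not of how Protégé implements it.

```python
# Direct imports only, as in the example: msi -> FuGO -> DC.
IMPORTS = {
    "msi": ["FuGO"],
    "FuGO": ["DC"],
    "DC": [],
}


def import_closure(ontology: str, imports: dict) -> list:
    """Return every ontology reachable through direct or indirect imports."""
    seen = []
    stack = list(imports.get(ontology, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(imports.get(current, []))
    return seen


print(import_closure("msi", IMPORTS))  # prints ['FuGO', 'DC']
```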
9 Ontology Imports in Protégé-OWL
The use of standardized ontology representation languages like OWL, in conjunction
with ubiquitous URI-based access to Semantic Web resources and the emergence
of ontology management methods and tools, constitutes a solid foundation for building
more reusable ontologies. To be able to reference another web-based ontology, e.g.
OBI ontology classes or properties, the full OBI ontology has to be imported into the
active one (i.e. the msi-nmr ontology). Then we can experiment with the “binning” of
classes from our domain-dependent / community-specific ontology into more general
OBI ones (OBI does the same with BFO.owl).
To import an ontology to be referenced (e.g. OBI.owl), proceed as follows (see also http://protege.cim3.net/cgi-bin/wiki.pl?OWL_Imports_Repositories ):
9.1 The “lang” attribute issue
The Protege wiki page HidingIdentifiersWithLabelsInOWLPlugin
( http://protege.cim3.net/cgi-bin/wiki.pl?HidingIdentifiersWithLabelsInOWLPlugin )
explicitly states (point 4.3.) that by default
“- classes with rdfs:label for which all the labels have explicitly a language display their identifier;
- classes with at least one rdfs:label for which the language is not explicitly stated display the value of this rdfs:label
… even if the display slot for owl:classes has been set to rdfs:label.”
For this reason, remove every occurrence of xml:lang="en" (together with its leading space) from the OBI.owl file and save it.
This is necessary because, with the lang attribute set, Protégé falls back to the
unintuitive, non-descriptive rdf:ID as the browser key for the imported classes.
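The replacement can be done in any text editor; as a minimal Python sketch it amounts to a plain string substitution. The function name strip_english_lang_tag is hypothetical, and the sample label line is made up for illustration. When applying this to a real file, working on a copy of OBI.owl is advisable.

```python
def strip_english_lang_tag(owl_text: str) -> str:
    """Remove the attribute xml:lang="en", including its leading space."""
    return owl_text.replace(' xml:lang="en"', "")


sample = '<rdfs:label xml:lang="en">autosampler</rdfs:label>'
print(strip_english_lang_tag(sample))
# prints <rdfs:label>autosampler</rdfs:label>
```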
Alternatively, work around this as described on the Protégé wiki. Import the Protégé
Metadata ontology and then …
“In the 'Metadata' tab, make sure the primary ontology (the top one) is active and
selected
1. create a new annotation property (select 'protege:defaultLanguage')
2. in the 'value' field, set the default language;
3. do not fill the 'lang' field for the annotation property;
4. in the class hierarchy of the OWLClasses tab:
   1. classes with no rdfs:label display their identifier (':NAME');
   2. classes with rdfs:label for which the language matches the value defined at step 5.1 display this label;
   3. classes with rdfs:label, none of which matches the value defined at step 4.1 display their identifier (':NAME').”
[Here it is not clear whether in point 3 the lang attribute for the ontological classes themselves or the lang
attribute for the protege:defaultLanguage property is meant (I assume the latter). Unfortunately it is not
stated how classes with an rdfs:label with no lang-attribute set will be displayed.
Following all recommendations from this website, I still see classes of imported ontologies displayed by
their IDs. This holds true for the ones that are direct parent classes of the ones in the primary ontology. I
assume this bug has something to do with the way Protégé stores the referenced classes when an
imported ontology is deleted and, after saving, the updated new ontology is imported. This issue still has
to be resolved.]
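The three display rules quoted from the wiki can be written down as a small Python sketch. This only restates the quoted rules and is not taken from the Protégé source code; the function name displayed_name and the (text, language) tuple representation are hypothetical. Since the wiki does not say how a label without a lang attribute is treated once a default language is set, the sketch assumes it counts as non-matching.

```python
from typing import Optional, Sequence, Tuple

Label = Tuple[str, Optional[str]]  # (label text, language tag or None)


def displayed_name(identifier: str, labels: Sequence[Label],
                   default_language: str) -> str:
    """Apply the three quoted rules for a class with the given labels."""
    for text, language in labels:
        # Rule 2: a label in the default language is displayed.
        if language == default_language:
            return text
    # Rule 1 (no labels) and rule 3 (no matching label): show the identifier.
    # ASSUMPTION: labels without a language tag are treated as non-matching.
    return identifier


print(displayed_name("MSI_400002", [("autosampler", "en")], "en"))
print(displayed_name("MSI_400002", [("Probenwechsler", "de")], "en"))
```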
9.1.1 Import
Open your ontology in Protege 3.2 beta, go to the Metadata tab and click the Import
icon within the ontology browser frame. Now select where you want to import your
ontology from. Here you specify the URI pointing to the OWL file to be imported, i.e.
whether you want to import from the web (a URL), from a local file or from a repository
file.
Then in the namespaces field of the Metadata Tab add the URI/namespace of the
ontology to be imported and a short prefix for it.
After this save the ontology and re-load it. You should see the top-level nodes of the
imported / referenced ontology. In any case you can only import the whole ontology, not
just certain modules (this is currently being worked on).
You are now able to use these referred classes and other RUs from the imported
ontology. You can make FuGO classes superclasses of your community / domain-specific
ones. An example with multiple parenthood is the NMR "autosampler", which is a subclass of both
msi:optional_part_of_NMR_instrument (rdf:ID="MSI_400001") and FuGO:Instrument
(rdf:about="#FUGO_47"). In OWL such a reference looks like this:
<owl:Class rdf:ID="MSI_400002">
  <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
  >autosampler</rdfs:label>
  <rdfs:subClassOf rdf:resource="http://fugo.sourceforge.net/ontology/FuGO.owl#FUGO_47"/>
  <rdfs:subClassOf>
    <owl:Class rdf:ID="MSI_400001"/>
  </rdfs:subClassOf>
</owl:Class>
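To make the two kinds of superclass reference explicit, the following Python sketch reads the class definition above with the standard library module xml.etree.ElementTree and lists both parents as full URIs. The enclosing rdf:RDF element is added here only to declare the namespace prefixes, the function name superclasses is hypothetical, and the base URI is an assumption (the msi-ontology URI given in section 8).

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
# ASSUMPTION: the snippet lives in the msi-ontology file named in section 8.
BASE = "http://msi-workgroups.sourceforge.net/ontologies/msi/msi.owl"

SNIPPET = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:Class rdf:ID="MSI_400002">
    <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
    >autosampler</rdfs:label>
    <rdfs:subClassOf rdf:resource="http://fugo.sourceforge.net/ontology/FuGO.owl#FUGO_47"/>
    <rdfs:subClassOf>
      <owl:Class rdf:ID="MSI_400001"/>
    </rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>"""


def superclasses(class_element, base):
    """Collect the URIs of all superclasses of one owl:Class element."""
    found = []
    for sub in class_element.findall(f"{{{RDFS}}}subClassOf"):
        resource = sub.get(f"{{{RDF}}}resource")
        if resource is not None:
            # Parent in an imported ontology: already a full URI.
            found.append(resource)
            continue
        nested = sub.find(f"{{{OWL}}}Class")
        if nested is not None and nested.get(f"{{{RDF}}}ID"):
            # Parent in the primary ontology: base URI + "#" + rdf:ID value.
            local_id = nested.get(f"{{{RDF}}}ID")
            found.append(f"{base}#{local_id}")
    return found


root = ET.fromstring(SNIPPET)
autosampler = root.find(f"{{{OWL}}}Class")
for parent in superclasses(autosampler, BASE):
    print(parent)
```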
The reference to the FuGO superclass is kept, even if the path to the imported ontology
has changed and the ontology is no longer accessible. Also, if you remove the imported ontology, the
classes used as parent classes by the primary ontology classes stay in the ontology.
The reference is made through the rdf:resource value (which consists of the referenced ontology's namespace URI + "#" + rdf:ID value).
Conventions for import repositories
There is no formal convention for determining the location of an ontology given its URI,
but it is generally recommended that ontologies are made available on the web at a
location that corresponds to their URI, e.g. the FuGO ontology should be
found under http://fugo.sourceforge.net/ontology/
Since SourceForge (SF) is sometimes very slow, a faster, more accessible website would be better.
Note that even if an ontology URI appears to be a URL, it is not necessarily resolvable.
In other words, it may or may not be the case that if an ontology URI is typed into a web
browser, a document containing the ontology will be displayed.
To deal with this, i.e. to allow Protégé to load an ontology when its URL is not
resolvable, Protege-OWL uses a mechanism based on the notion of "ontology
repositories" to determine where an ontology should be loaded from. Given an ontology
URI, a repository can be checked to see whether or not it contains the ontology that is
identified by the URI. If the repository does contain the required ontology, then it acts as
a gateway for loading and possibly saving the ontology. Protege-OWL maintains a list of
such repositories and searches them when attempting to import an ontology.
9.1.1.1 Importing from repositories (extracted from the Protégé wiki)
To manage repositories look at the "Ontology repositories..." item on the OWL menu.
Here for each ontology, its URI is shown, with a description of the location of the
ontology. One repository can provide access to several ontologies and for each of these
several alternative locations.
Protege-OWL supports the notion of global and project repositories. Global repositories
are by default available to every new project and are typically useful for managing
commonly used ontologies, such as upper ontologies, that will be imported into most
projects (so would BFO.owl, which should be used by all OBO ontologies, belong
here?). Project repositories are associated with a specific Protege-OWL project,
and must be created/specified for that project.
Protege-OWL supports the creation of four different types of repositories:
- HTTP repositories (URI refers to a URL)
- Local folder repositories (URI refers to an absolute location for a repository folder that is searched for the owl files)
- Relative folder repositories (URI refers to a relative location for a repository folder that is searched for the owl files)
- Local file repositories (URI refers to the absolute position of a special .repository file)
It is possible to specify multiple ontology repositories. When there are multiple
repositories, they are searched for the ontology to be imported in the following order:
1. Search any project repositories from top to bottom. If the ontology is found, load it from the repository that contains it.
2. Search any global repositories from top to bottom. If the ontology is found, load it from the repository that contains it.
3. Attempt to resolve the ontology URI and import the ontology from the location pointed to by the resolved URI.
4. If loading the ontology from the resolved URI fails, a dialog pops up asking the user to specify a repository from which the imported ontology may be loaded. (Note that a dialog is not shown automatically when using the Protege-OWL API).
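The search order above can be summarised in a short Python sketch. This is an illustration of the described order only, not the Protege-OWL API: a repository is modelled here as a plain mapping from ontology URI to location, and the names locate_ontology, resolve_uri and ask_user are hypothetical.

```python
from typing import Callable, Mapping, Optional, Sequence

# ASSUMPTION: a repository is modelled as "ontology URI -> location".
Repository = Mapping[str, str]


def locate_ontology(
    uri: str,
    project_repositories: Sequence[Repository],
    global_repositories: Sequence[Repository],
    resolve_uri: Callable[[str], Optional[str]],
    ask_user: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Return the location to load the ontology from, or None."""
    # Steps 1 and 2: project repositories first, then global ones,
    # each list searched from top to bottom.
    for repository in project_repositories:
        if uri in repository:
            return repository[uri]
    for repository in global_repositories:
        if uri in repository:
            return repository[uri]
    # Step 3: try to resolve the ontology URI itself.
    location = resolve_uri(uri)
    if location is not None:
        return location
    # Step 4: ask the user, if a dialog is available at all.
    return ask_user(uri) if ask_user is not None else None


fugo = "http://fugo.sourceforge.net/ontology/FuGO.owl"
project = [{fugo: "ontologies/FuGO.owl"}]
print(locate_ontology(fugo, project, [], resolve_uri=lambda u: None))
# prints ontologies/FuGO.owl
```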
Protege-OWL stores repository information in plain text files, with one line per
repository. The global repository is saved in a file called global.repository which resides
in the Protege-OWL plugin folder. Project repositories are saved in a file that has the
same location and name as the project OWL file but with an extension of .repository.
9.1.1.2 Changing the imported ontology to be the newest updated version
To switch to a new version of the imported ontology you have to remove the old
outdated version through the Metadata tab, then save and reload the primary ontology. After
this step you will still find, on the root level, all parent classes from the old imported
ontology (e.g. FuGO ones) which are used by classes of the primary ontology (e.g. the
msi-nmr ontology). Note: these parent classes are displayed by their former
namespace prefix (p1) and their identifier (:NAME, rdf:ID, rdf:about); no other data, e.g.
rdfs:label values or definitions, is stored in the primary ontology.
Now you have to import the updated version of that p1 ontology to embed these classes into
their original context again (proceed as described above).
You may encounter the case where the updated ontology was altered in such a way that
parent classes of your primary ontology classes were deleted. In this case these ‘orphan’
classes are again found on the root level.
[Other useful sources include:
http://www.w3.org/TR/owl-ref/#imports-def
http://www.w3.org/TR/2004/REC-owl-semantics-20040210/rdfs.html#owl_imports_rdf
http://www.w3.org/TR/2004/REC-owl-semantics-20040210/direct.html#owl_imports_semantics
http://www.w3.org/TR/2004/REC-owl-semantics-20040210/syntax.html#owl_imports_syntax
http://protege.stanford.edu/doc/owl/owl-imports.html
http://www.co-ode.org/resources/tutorials/ProtegeOWLTutorial.pdf#search=%22co-ode%22 ]
10 Properties (Attributes and Relations)
10.1 Assigning "key-properties" to top level classes
The explicit allocation of class key-properties (the ones that define the essence of
the class A and discriminate it within its superclass B) fosters consistent taxonomisation
of lower-level classes, because the inheritance of these properties means that all
subclasses at all sublevels can be immediately cross-checked for consistency with all
superclasses at any higher level (this is a feature of the Protégé frames visualisation in
the ‘properties view’, not the ‘logic view’). It is not enough to capture these properties in
the definitions only, because the GUI tools don't pass them on to the leaf classes as they
do for formally assigned properties. Explicitly formalised properties help to constrain the
interpretation of their domain classes and all subclasses, which is exactly what is needed
to provide the context for classification. These key properties help to keep track of the
intended (otherwise implicit) context, all the way downstream to the leaf nodes.
Whether a classification is valid can be decided more easily, e.g. for the following case:
time_independent_study is_a ,...., is_a unfolding_through_time. If we had
assigned a key-property has_timeline to the top level class “unfolding_through_time” (or
process), we would immediately see this (inherited) property at the leaf node
“time_independent_study” in the ‘properties view’ of the tab. Having this information
visually accessible makes it easier to decide whether the classification is valid: seeing
the has_timeline property associated with “time_independent_study” feels
counterintuitive at first, which prompts a closer look at the classification or the
definition. However, since a “time_independent_study” is
not the same as a “study_without_timeline”, the classification is correct in this case.
Possible key-properties for a “process”-class could be starts_at, has_object_participant,
induced_through. Key-properties for the “object” top level class could be has_position,
has_mass, ….
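The inheritance of key-properties down to the leaf nodes can be sketched in a few lines of Python. The class “study” is a hypothetical stand-in for the intermediate classes that the example above leaves out, and the dictionaries and the function name inherited_key_properties are made up for illustration.

```python
# is_a links from each class to its parent.
# ASSUMPTION: "study" is a hypothetical intermediate class.
PARENT = {
    "time_independent_study": "study",
    "study": "unfolding_through_time",
    "unfolding_through_time": None,
}

# Key-properties are assigned only at the top level class.
KEY_PROPERTIES = {
    "unfolding_through_time": ["has_timeline"],
}


def inherited_key_properties(cls: str) -> list:
    """Collect the key-properties of a class and of all its superclasses."""
    properties = []
    while cls is not None:
        properties.extend(KEY_PROPERTIES.get(cls, []))
        cls = PARENT[cls]
    return properties


print(inherited_key_properties("time_independent_study"))
# prints ['has_timeline']
```

Seeing has_timeline listed for the leaf class is exactly the visual cue the text describes: it prompts a second look at the classification without requiring a walk up the hierarchy by hand.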
11 Naming of Ontology files and Ontology Versions
A file-naming convention helps to capture basic metadata in filenames and
provides a simple versioning mechanism for files which our community members may
upload into the file repositories. Any recommendations tackling this issue are of course
dependent on the way files are stored and versioned, so the following can be disregarded
when an updating and versioning mechanism, e.g. CVS or SVN, is used. OWL can capture
much of this information in its metadata sections.
When no automatic update and versioning system is used, ontology files and directories should be named according to the following syntax:
ShortDescriptiveFilename_Author_Version_Date.ext
where "ShortDescriptiveFilename" is a short descriptive filename that may contain
upper and lower case text, numerals, "-" (dash) and "_" (underscore). Use upper camel
case and underscore as separators. Spaces and other symbols are not allowed. "Author"
comprises the name of the author and/or the organization where the file is authored.
Separate author and organization with a dash if both are featured. Again, spaces and
other symbols are not allowed here. "Version_Date" comprises the version number
and/or the date the file is released. Start the version number with a "v"; use "-" instead
of "." in the version numbering (like "v2-15" instead of "v2.15"). Separate version and
date with an underscore, if both are featured. For the date reference, the more
significant parts should come first, as this eases alphabetical sorting according to the
date: use "yyyymmdd". Add an "a", "b", "c", ... suffix, if multiple versions may occur with
the same date reference. Again, spaces and other symbols are not allowed here. After
this follows the "." (a dot; there should be only one dot in the entire filename and it
should be right before the file extension). "ext" is the standard file extension by which
this file can be associated with an appropriate application that will handle it. This is
generally 2 to 4 lower case alphanumeric characters. E.g.:
NMR-Ontology_MSI-DS_v1-9_20060420.owl
A similar convention is being practiced at W3C for their published work (e.g. note their
page header information http://www.w3.org/TR/2004/REC-webont-req-20040210/ ).
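As a minimal Python sketch of this syntax, the following helper assembles a filename from its parts and checks the character rules. The function name build_filename and its parameters are hypothetical; the sketch assumes that both version and date are given.

```python
import re
from datetime import date


def build_filename(description: str, author: str, version: str,
                   released: date, ext: str, suffix: str = "") -> str:
    """Assemble ShortDescriptiveFilename_Author_Version_Date.ext."""
    version_part = "v" + version.replace(".", "-")    # "1.9" -> "v1-9"
    date_part = released.strftime("%Y%m%d") + suffix  # yyyymmdd plus optional a, b, c
    name = "_".join([description, author, version_part, date_part])
    # Only letters, numerals, dash and underscore are allowed before the dot.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", name):
        raise ValueError(f"illegal character in {name!r}")
    # The extension is 2 to 4 lower case alphanumeric characters.
    if not re.fullmatch(r"[a-z0-9]{2,4}", ext):
        raise ValueError(f"illegal extension {ext!r}")
    return f"{name}.{ext}"


print(build_filename("NMR-Ontology", "MSI-DS", "1.9", date(2006, 4, 20), "owl"))
# prints NMR-Ontology_MSI-DS_v1-9_20060420.owl
```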
Also use a convention when constructing URIs for ontology versions, and apply it
consistently. In the following example the date on which the ontology was frozen is used
to construct the URIs for the ontology versions, but the version could also be used:
Ontology URI: http://www.example.com/nmr-ontology
Ontology version URIs: http://www.example.com/nmr-ontology_20051004
http://www.example.com/nmr-ontology_20051126
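The date-based construction of version URIs shown above can be expressed as a one-line Python helper. The function name version_uri is hypothetical; the underscore separator and the yyyymmdd format follow the example.

```python
from datetime import date


def version_uri(ontology_uri: str, frozen_on: date) -> str:
    """Append the freeze date (yyyymmdd) to the ontology URI."""
    return f"{ontology_uri}_{frozen_on:%Y%m%d}"


print(version_uri("http://www.example.com/nmr-ontology", date(2005, 10, 4)))
# prints http://www.example.com/nmr-ontology_20051004
```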
Filenames may only contain alphanumeric characters, the period ("."), dash ("-") and the
underscore character ("_"). Spaces, parentheses, or other commonly used characters,
such as "~", "&", or "#" will cause the file to be rejected.
12 References
[1] <<Metadata Annotations for Representational Units and Representational Artefacts>>:
http://msi-ontology.sourceforge.net/Naming_Conventions_DRAFTv7.doc
[2] Establishing reporting standards for metabolomic and metabonomic studies: A call
for participation. Fiehn O, Kristal B, van Ommen B, Sumner LW, Sansone SA, Taylor C,
Hardy N, Kaddurah-Daouk R. OMICS 2006 Summer;10(2):158-63.
[3] http://msi-ontology.sourceforge.net/
[4] http://psidev.sourceforge.net/
[5] http://fugo.sourceforge.net
***** NOTE: This draft document is a work in progress *****
It will be expanded to contain also recommendations referring to RUs
within representational artefacts formalized in richer semantics.
Comments and ideas are welcomed and should be sent to: