
Naming Conventions for CVs and Ontologies Draft v7 Metabolomics Standards Initiative

13.10.2006 Ontology Working Group

Naming Conventions for Controlled Vocabularies (CVs) and Ontologies

http://msi-ontology.sourceforge.net/Naming_Conventions_DRAFTv7.doc

1 RATIONALE FOR THIS DOCUMENT
2 (META-) REFERENCE TERMINOLOGY
3 GENERAL PRINCIPLES FOR CREATING REPRESENTATIONAL ARTIFACTS
3.1 Univocity
3.2 Positivity
3.3 Objectivity
3.4 Try to avoid multiple parenthood and multiple inheritance
4 NAMING CLASSES
4.1 Class name precision
4.2 Synonyms
4.2.1 Different sorts of Synonyms?
4.2.2 Property synonyms
4.3 Lexical Properties of class names
4.3.1 Capitalisation
4.3.2 Character set
4.3.2.1 Character set formatting
4.3.3 Word separators
4.3.3.1 Hyphens, dash and slash
4.3.4 Singular nouns
4.3.5 Use present tense for representational units
4.3.6 Plurals and sets
4.3.7 Avoid linguistic ellipses
4.3.8 Acronyms and Abbreviations
4.3.9 Registered Product- and Company-names
4.3.10 Word compositions and length
4.3.10.1 Compound vs. atomic names for representational units
4.3.10.2 Splitting and merging classes
4.3.11 Affixes (prefix, suffix, infix and circumfix)
4.3.12 Logical connectives
4.3.13 "Taboo" words and Characters
4.3.14 Specific language requirements
5 DEPICTING REPRESENTATIONAL UNITS WITHIN TEXT
6 CLASS DEFINITIONS
6.1 General rules for creating sound normalized definitions
7 UNIQUE IDENTIFIERS
7.1 Capturing the class name and ID using the autoID plugin in Protégé-owl
7.2 Life science Identifier (LSID: http://lsid.sourceforge.net/)
8 NAMESPACE
9 ONTOLOGY IMPORTS IN PROTÉGÉ-OWL
9.1 The “lang” attribute issue
9.1.1 Import
9.1.1.1 Importing from repositories (extracted from the Protégé wiki)
9.1.1.2 Changing the imported ontology to be the newest updated version
10 PROPERTIES (ATTRIBUTES AND RELATIONS)
10.1 Assigning "key-properties" to top level classes
11 NAMING OF ONTOLOGY FILES AND ONTOLOGY VERSIONS
12 References

Working draft by: Daniel Schober, Susanna-A Sansone, Barry Smith




1 Rationale for this document

This document defines naming conventions for controlled vocabularies (CVs) and ontologies. Metadata annotation elements are not covered here; these are addressed in the <<Metadata Annotations for Representational Units and Representational Artefacts>> document [1].

These recommendations have been developed to guide the activities of the Metabolomics Standards Initiative (MSI) [2] Ontology Working Group (OWG) [3]. The MSI OWG seeks to facilitate the consistent description of metabolomics experiment components by reaching a consensus on a core set of CVs and then developing an ontology. The CVs are developed in close collaboration with the HUPO Proteomics Standards Initiative (PSI) [4] and structured as taxonomies in OWL and OBO format. The ontology is developed as part of the Ontology for Biomedical Investigations (OBI, previously ‘FuGO’) [5], a larger, multi-domain collaborative effort.

These naming conventions are also used in the context of OBI, which is developed in OWL.




2 (Meta-) Reference Terminology

Knowledge representations (KR, also called representational models) are referred to with the term ‘representational artefact’ (RA). A representational artefact is made of related ‘representational units’ (RU, also known as KR-idioms), in most cases classes and properties. We recommend using the term ‘class’ to refer to the representational unit that models a ‘universal’ in an ontological representational artefact. Each class has a ‘class name’, a term (string) that designates the class. An ‘instance’ is the representation of a ‘particular’ in reality. A particular instantiates a universal and an instance instantiates a class. Properties of universals are represented through representational units called ‘properties’. Properties which have fillers of simple datatypes (e.g. integer, string, boolean) are called ‘attributes’ or ‘datatype properties’. Properties which have classes or instances as their fillers (also called the ‘range’) are called ‘relations’ or ‘object properties’. Confusingly, other formats use the word "property" for restrictions. The word ‘domain’ can mean the group of classes to which a property is asserted (in OWL), but it also describes the area of interest of a representational artefact.

For a detailed recommendation have a look at the full paper:

http://ontology.buffalo.edu/bfo/Terminology_for_Ontologies.pdf

The following key words “MUST,” “MUST NOT,” “REQUIRED,” “SHALL,” “SHALL NOT,”

“SHOULD,” “SHOULD NOT,” “RECOMMENDED,” “MAY,” and “OPTIONAL” are to be

interpreted as described in RFC-2119, see S. Bradner, Key words for use in RFCs to

Indicate Requirement Levels, Internet Engineering Task Force, RFC 2119,

http://www.ietf.org/rfc/rfc2119.txt, March 1997.

Sections in Brackets [...] are comments for the editor. Please ignore these.





3 General principles for creating representational artifacts

Become acquainted with the capabilities and limitations of both the representation formalism and its implementation (the ontology engineering tool) of your choice.

Don’t get into 'analysis paralysis'! You will not get it right the first time. Sometimes one has to throw things away and start again. Do not fall into ‘naïve euphoria’ either: not every freshly built piece of representation is an ontology worth bothering others with.

Save often! Always save to a new version number including the date. Protégé-OWL is not yet completely stable. Undo is difficult and bugs occasionally corrupt ontologies beyond retrieval.

General ontology engineering axioms:

* Every class has at least one instance.
* Distinct classes on the same level, and leaf classes, never share instances.

3.1 Univocity

Names of RUs (including those for relations) should have the same meaning on every occasion of use and refer to the same universals and kinds of entities in reality. Each name should refer to exactly one RU, and each RU should represent exactly one entity in reality (a universal in the case of a class). In effect, each name should unambiguously refer to the same entity in reality. Note that this principle of univocity excludes homonyms, i.e. terms that are used as names of more than one RU. For example, if you use the term ‘cell’ as the name of the class representing (the type of) cells as found in all organisms, the same term should not be used as the name of a more specialized class representing (the type of) cells as found only in plants. Likewise, the term ‘part of’ should not be used to name more than one relation, e.g. partonomy, set membership, etc.

Furthermore:

* Don’t confuse universals with ways of getting to know types.
* Don’t confuse universals with ways of talking about types.
* Don’t confuse universals with data about types.
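The homonym rule stated in this section can be checked mechanically. The following is a minimal sketch in Python, assuming the class names have already been exported as a mapping from RU identifier to name; the function name `find_homonyms` and the `EX:` identifiers are hypothetical and only serve the illustration.

```python
from collections import defaultdict
from typing import Dict, List


def find_homonyms(names_by_id: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each name used by more than one RU to the sorted IDs sharing it."""
    ids_by_name = defaultdict(list)
    for ru_id, name in names_by_id.items():
        ids_by_name[name].append(ru_id)
    return {name: sorted(ids) for name, ids in ids_by_name.items() if len(ids) > 1}


# Hypothetical identifiers and names, for illustration only.
example = {
    "EX:0001": "cell",     # cells as found in all organisms
    "EX:0002": "cell",     # cells as found only in plants (a homonym)
    "EX:0003": "part_of",
}
print(find_homonyms(example))  # {'cell': ['EX:0001', 'EX:0002']}
```

A check like this only detects identical strings. Whether two differently spelled names denote the same universal still has to be decided by a curator.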




3.2 Positivity

Complements of classes such as ‘non-mammal’ or ‘non-membrane’ are not necessarily themselves classes and don’t designate genuine universals. Similarly, do not represent the absence of a wing as the presence of the non-existence of a wing, e.g. 'wing' has_status "absent". The positivity recommendation may need to be weakened; sometimes it can make sense to have e.g. an "ex-vivo" role or a “non-living_organism”.
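Candidate complement classes can be listed for manual review with a purely lexical test. The sketch below assumes that a complement is recognisable by a leading "non" followed by a hyphen, underscore or space. That is a simplification introduced here, not a rule of this document, and the function name is hypothetical.

```python
import re

# Assumption: a complement class name starts with "non" plus a separator.
_COMPLEMENT = re.compile(r"^non[-_ ]", re.IGNORECASE)


def looks_like_complement(name: str) -> bool:
    """Return True if the name starts with a 'non' prefix and deserves review."""
    return _COMPLEMENT.match(name) is not None


for candidate in ["non-mammal", "non_membrane", "nonane", "mammal"]:
    print(candidate, looks_like_complement(candidate))
```

Names such as “non-living_organism” are also flagged. Since the recommendation may be weakened in such cases, a hit is a prompt for review rather than an error.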

3.3 Objectivity

No distinction without a difference. A child class must differ from its parent class in a distinctive way. A child class must share all the properties of its parent classes (inheritance principle) and have additional ones that the parents do not have. Each class must be defined by a formula which states the necessary and sufficient conditions for being an instance of the corresponding universal. The sibling classes of a given parent class should have differentiae which are really distinct. This means that the universals of these classes have at least distinct (ideally non-overlapping = single inheritance) extensions. The distinction between each pair of siblings must be explicitly represented (opposition principle).

Which universals exist is not a function of our biological knowledge. Be aware that terms such as ‘unknown’, ‘untypified’ or ‘unlocalized’ do not designate genuine universals. To characterize classes, formulate intrinsic properties (properties that are inherent to the universal represented by the RU) rather than extrinsic ones (properties that are asserted from outside, e.g. accession numbers). ‘Intrinsic’ describes a characteristic or property of some thing or action which is essential and specific to that thing or action, and which is wholly independent of any other object, action or consequence. A characteristic which is not essential or inherent is extrinsic (from http://en.wikipedia.org/wiki/Intrinsic).

3.4 Try to avoid multiple parenthood and multiple inheritance

No class in the hierarchy should have more than one superclass. Multiple inheritance can generate subtle but systematic ambiguity in the meaning of formal relations like is_a and part_of within the ontology. One should not press "is_a" into service to mean a variety of different things (see the univocity principle). Domain experts should build single-parenthood taxonomies of their views of reality. Other domain experts build the same for theirs, and only later will all these taxonomies be ‘multidimensionally’ aligned within OBO; secure common nodes will result which make consistent (!) multiple inheritance possible.

There are however many opinions on this issue, and we may discuss this matter further when we feel there is a real need for multiple parenthood.
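Single parenthood is easy to audit once the asserted is_a links are available as data. The sketch below assumes a plain mapping from class name to the list of its asserted parents. The function name and the example classes are hypothetical.

```python
from typing import Dict, List


def multi_parent_classes(parents: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return the classes that have more than one distinct asserted parent."""
    return {
        cls: sorted(set(ps))
        for cls, ps in parents.items()
        if len(set(ps)) > 1
    }


# Hypothetical is_a assertions, for illustration only.
example_parents = {
    "macromolecule": [],
    "protein": ["macromolecule"],
    "enzyme": ["protein", "catalyst"],  # two parents: reported below
}
print(multi_parent_classes(example_parents))  # {'enzyme': ['catalyst', 'protein']}
```

The check covers asserted parents only. Parents inferred by a reasoner are outside the scope of this sketch.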




4 Naming Classes

Each class representing a universal in a representational artefact is labelled with a human readable class name. Class names should be short, easy to remember and as self-explanatory as the pragmatic compromise allows. This class name should be used as the default browser key when navigating through the class hierarchy and should therefore be as intuitive as possible to the ontology engineer building the ontological structure. However, this class name will not necessarily be used as the main search attribute by end-users when they are searching for classes. For this purpose a short and intuitive name should be captured as the preferred synonym; this would be the term of highest usage frequency found in the literature of that domain, i.e. the term that is most widely accepted in the user domain. The class should represent and be named after the intrinsic, underlying nature of the universal to be represented, not according to extrinsic properties or roles a class can play in a particular context. Embodying the whole meaning of the class, with all its relationships to other classes, in its name is in most cases neither possible nor recommended. Keep semantics in the definitions and formalize it explicitly as properties and axioms. For example, a class “distinct_identifiable_physical_part” should just be called “physical_part”. For the preferred synonym, readability should have higher priority than constraining interpretation through the name. For the class name that is used for ontology engineering (OE), it is the other way round.

Epistemological statements don't belong in class names, so avoid calling the class “instrument” “instrument_class” or the relation “has_part” “has_part_relation”.

4.1 Class name precision

Class names should be precise, concise and linguistically correct (i.e. they should conform to the rules of the language in question). Often terms for RUs are not precise, i.e. they do not capture the intended meaning. Imprecise terms are especially problematic in the absence of good definitions. For example, the term “anatomic_structure, system or substance” gives no clue as to whether the scope of the adjective “anatomic” is restricted to “structure” or extends also to “system” and “substance”. This ambiguity can lead to problems like the following: if “anatomic” is restricted to “structure” only, then “drug” and “chemical” would be classified under this class, since these are clearly substances. If it is not restricted, “drug” and “chemical” could not be classified under this class, since they are not anatomic substances.

4.2 Synonyms

A strict definition of synonymy, as proposed e.g. by ISO 1087-1:2000, is: “… relation between or among terms in a given language representing the same concept, with a note to the effect that terms which are interchangeable in all contexts are called synonyms; if they are interchangeable only in some contexts, they are called quasi-synonyms.”

The number of synonyms for a class is not limited, and the same text string can be used as a synonym for more than one class. Add a synonym if you edit or delete a class name but the old name is still a valid synonym; e.g. if you change "respiration" to "cellular_respiration", keep "respiration" as a synonym. This helps other users to find familiar classes. Add synonyms if the class name has (or contains) a commonly used abbreviation. Acronyms are synonymous with the full name as long as the acronym is not used in any other sense elsewhere. 'Jargon' type phrases are synonymous with the full name as long as the phrase is not used in any other sense elsewhere.

To capture synonyms in OWL, one can use the rdfs:comment field and add a comma separated list of synonyms after a “synonym: ” marker. Another way would be to create a new metaclass with a new string datatype property “has_synonyme” and derive all new classes from this new metaclass (see also http://protege.cim3.net/cgi-bin/wiki.pl?CreatingSynonyms). This has the disadvantage that the whole ontology becomes OWL Full. Capturing synonyms in further rdfs:label fields has the disadvantage that, when several labels are present, it is not possible to know which one is the preferred class name (the human readable class name to display as the browser key) and which is another kind of synonym. Usually the alphabetically first rdfs:label would be displayed.
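The comment-field convention described above is simple enough to parse with a few lines of code. The sketch below assumes that the “synonym: ” marker appears at most once per comment and that the synonym list ends at the end of that line. Both are assumptions of this example, and the function name is hypothetical.

```python
from typing import List


def parse_synonyms(comment: str, marker: str = "synonym: ") -> List[str]:
    """Return the comma separated synonyms that follow the marker, if any."""
    _, found, tail = comment.partition(marker)
    if not found:
        return []
    lines = tail.splitlines()
    first_line = lines[0] if lines else ""
    return [part.strip() for part in first_line.split(",") if part.strip()]


comment = "Oxidation of organic compounds in the cell. synonym: respiration, cell respiration"
print(parse_synonyms(comment))  # ['respiration', 'cell respiration']
```

A synonym that itself contains a comma cannot be represented in this scheme, which is one practical limit of keeping structured data in a free-text field.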

4.2.1 Different sorts of Synonyms?

As we saw above, synonyms are not always 'synonymous' in the strictest sense of the word, as they do not always mean exactly the same as the class they are attached to. A synonym may be broader or narrower in meaning than the class name; it may be a related phrase or an alternative wording or spelling, or it may use a different system of nomenclature. Having a single, broad relationship between a class and its synonyms is adequate for most search purposes, but for applications such as semantic matching, the inclusion of a more formal relationship set is valuable. For this reason, one could record a relationship type for each synonym, as e.g. GO does. Such relationships can be stored in the OBO format flat file.

Synonym types:

Some synonym relationship types are:

* the term is an exact synonym of the class name: “ornithine_cycle” is an exact synonym of “urea_cycle”
* the term is related to the class name: “cytochrome_bc1_complex” is a related synonym of “ubiquinol-cytochrome-c_reductase_activity”
* the synonym is broader than the class name: “cell division” is a broad synonym of “cytokinesis”
* the synonym is narrower or more precise than the class name: “pyrimidine-dimer_repair_by_photolyase” is a narrow synonym of “photoreactive_repair”
* the synonym is related to the class name, but is not exact, broader or narrower: “virulence” has a synonym type of "other related" to the class name “pathogenesis”

However, we do not recommend capturing such ‘synonym types’ as the GO style guide suggests. Capture only exact synonyms.

For the OWL format one could use the W3C standard for thesauri, the ‘Simple Knowledge Organisation System’ (SKOS, http://www.w3.org/2004/02/skos/), to encode synonym types through relations like “narrower than” and “broader than”. It also provides a “preferred label” and a "related to" element for terminological mapping.

The SKOS Core Vocabulary includes the following properties for asserting semantic relationships between concepts: skos:semanticRelation, skos:broader, skos:narrower and skos:related. In the property hierarchy, skos:semanticRelation is the top semantic relationship and the others are its child relationships. To assert that one concept is broader in meaning (i.e. more general) than another, where the scope (meaning) of one falls completely within the scope of the other, use the skos:broader property; the resource named by skos:broader is the more general concept. To assert the inverse, that one concept is narrower in meaning (i.e. more specific) than another, use the skos:narrower property.

<skos:Concept rdf:about="http://www.my.com/#canals">
  <skos:broader rdf:resource="http://www.my.com/#hydrographic%20structures"/>
</skos:Concept>

A complete document that states both directions:

<rdf:RDF

xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"

xmlns:skos="http://www.w3.org/2004/02/skos/core#">

<skos:Concept rdf:about="http://www.example.com/concepts#mammals">

<skos:prefLabel>mammals</skos:prefLabel>

<skos:broader rdf:resource="http://www.example.com/concepts#animals"/>

</skos:Concept>

<skos:Concept rdf:about="http://www.example.com/concepts#animals">

<skos:prefLabel>animals</skos:prefLabel>

<skos:narrower rdf:resource="http://www.example.com/concepts#mammals"/>

</skos:Concept>

</rdf:RDF>

When you add a synonym in OBO format using OBO-Edit, choose a type from the pull-down selector (see the DAG-Edit user guide for more information). DAG-Edit will incorporate the synonym type into the OBO format flat file when you save. The default synonym type is the broadest, 'synonym' (equivalent to 'related' above).

4.2.2 Property synonyms

One can also create object property synonyms (see section 4.1 of http://www.w3.org/TR/owl-guide), e.g.:



<owl:ObjectProperty rdf:ID="has_child">

<owl:equivalentProperty>

<owl:ObjectProperty rdf:ID="has_kid"/>

</owl:equivalentProperty>

</owl:ObjectProperty>

4.3 Lexical Properties of class names

4.3.1 Capitalisation

Names should be in lower case letters throughout, except for acronyms, which are capitalised (if their use in class names can't be avoided), and proprietary names, which are written as such. Proper names and brand names can break the convention rules unless rdf-field restrictions prevent this. E.g. there can be a "CBS_station" (starting with a capital letter) and there can be a CamelCase brand name. This is the recommendation of the OBO Consortium. Other KR communities (semantic web / OWL, the Protégé group) begin class names with a capital letter, while proprietary names and properties start with lower case letters.

Internal capitalisation is however enforced by some computer systems and mandated by the coding standards of many programming languages; e.g. Java coding style dictates that UpperCamelCase be used for classes and lowerCamelCase for instances and members. So unless you plan to use auto-generated Java classes or any MDA approach to convert the ontology into software code, avoid CamelCase.
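The lower-case rule with its exceptions for acronyms and proper names can be approximated by a token-wise check. In this sketch an acronym is taken to be any token written entirely in upper case, and proper names have to be supplied by the caller as an allow-list. Both choices are assumptions of this example rather than part of the convention.

```python
from typing import FrozenSet, List


def capitalisation_issues(name: str, proper_names: FrozenSet[str] = frozenset()) -> List[str]:
    """Return the underscore-delimited tokens that are neither lower case,
    an all-capitals acronym, nor an allowed proper name."""
    issues = []
    for token in name.split("_"):
        if token in proper_names:
            continue
        if not any(ch.isalpha() for ch in token):
            continue  # digits or empty tokens carry no case information
        if token.islower() or token.isupper():
            continue
        issues.append(token)
    return issues


print(capitalisation_issues("cellular_respiration"))  # []
print(capitalisation_issues("Cellular_Respiration"))  # ['Cellular', 'Respiration']
print(capitalisation_issues("CBS_station"))           # []
```

Mixed-case tokens that are correct as written, such as "pH", are reported unless they are passed in through `proper_names`.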

4.3.2 Character set

Terms designating RUs should consist mainly of alphabetic characters, numerals and underscores. Whether you will be allowed to use the space as a word delimiter depends on the way the implementation handles the strings for the representational unit in question. Avoid special characters where possible. Avoid accents, sub- or superscripts, and characters and character combinations that may have a special meaning in regular expressions, programming languages or XML. These recommendations depend largely on what the parsers for the implementation format of the specific RU can handle, e.g. OWL identifiers (values of the rdf:ID / :NAME property) must begin with a letter or underscore and contain only letters, numerals and the underscore character (‘_’). Spaces are not allowed here.

For the full, less restrictive specification see http://www.w3.org/TR/REC-xml-names/#NT-NCName:

NCNameStartChar ::= Letter | '_'
NCNameChar ::= NameChar - ':'
( NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender )

If you keep the class name in another element, e.g. rdfs:label, you are in principle not restricted in your character usage.
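The restrictive identifier rule stated above (start with a letter or underscore, then only letters, numerals and underscores) translates directly into a regular expression. The sketch below restricts "letter" to ASCII, which is an assumption of this example; the XML Letter production also admits many non-ASCII characters.

```python
import re

# Assumption: "letter" means ASCII A-Z / a-z only.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_identifier(name: str) -> bool:
    """Check a name against the restrictive identifier rule."""
    return _IDENTIFIER.fullmatch(name) is not None


for candidate in ["physical_part", "_tmp1", "1st_sample", "my class", "copper-based_compound"]:
    print(candidate, is_valid_identifier(candidate))
```

Names that fail this test may still be usable as labels, since the label field is not subject to the identifier restrictions.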

4.3.2.1 Character set formatting

No subscripts or superscripts are allowed (e.g. cm3 replaces cm³ and CO2 replaces CO₂). The names of chemical elements from the periodic table should be written in full and should not be abbreviated to their symbols (use hydrogen, copper and zinc rather than H, Cu and Zn). Greek symbols should be spelled out, e.g. alpha, beta, gamma. Temperature designations like 37 °C can be represented as 37C.

[Add Punctuation]
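Two of the formatting rules above, flattening sub- and superscripts and spelling out Greek symbols, can be sketched with the Python standard library. Unicode compatibility normalisation (NFKC) maps superscript and subscript digits to plain digits. The Greek table below is deliberately limited to the three letters named in the text and would have to be extended for real use; the function name is hypothetical.

```python
import unicodedata

# Only the three letters mentioned in the text; extend as needed.
GREEK_NAMES = {"α": "alpha", "β": "beta", "γ": "gamma"}


def flatten_characters(text: str) -> str:
    """Replace sub-/superscript digits by plain digits and spell out Greek letters."""
    normalised = unicodedata.normalize("NFKC", text)
    return "".join(GREEK_NAMES.get(ch, ch) for ch in normalised)


print(flatten_characters("cm³"))        # cm3
print(flatten_characters("CO₂"))        # CO2
print(flatten_characters("β-alanine"))  # beta-alanine
```

NFKC also changes other compatibility characters (for example ligatures), so the result should be reviewed before it replaces a curated name.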

4.3.3 Word separators

Various kinds of punctuation connect name parts, including separators such as spaces and hyphens, and grouping symbols such as parentheses. These may have:

a) No semantic meaning. A naming rule may state that separators will consist of one blank space or exactly one special character (for example a hyphen or underscore) regardless of the semantic relationships of the parts. Such a rule simplifies name formation.

b) Semantic meaning. Separators can convey semantic meaning by, for example, assigning a different separator between words in the qualifier term from the separator that separates words in the other part terms. In this way, the separator identifies the qualifier term clearly as different from the rest of the name. For example, in the data element name “Cost_Budget-Period_Total_Amount” the separator between words in the qualifier term is a hyphen; other name parts are separated by underscores.

Asian languages often form words using two characters which, separately, have different meanings, but when joined together have a third meaning unrelated to its parts. This may pose a problem in the interpretation of a name, because ambiguity may be created by the juxtaposition of characters. A possible solution is to use one separator to distinguish when two characters form a single word, and another when they are individual words.





Class name terms should be delimited by the "_" (underscore) separator. The underscore substitutes for the space character. Whether you will be allowed to use the space as a word delimiter depends on the way the implementation handles the strings for the representational unit in question. Under the OBO umbrella one can find the "MyClass", "My Class", "My-Class", "My_Class", "My_class" and "my class" conventions, sometimes even within one ontology. One convention is not necessarily better or worse than another as long as it is used consistently within the ontology. Java programmers, for example, use the "MyClass" (CamelCase) convention, because that is the standard for naming Java classes, whereas text miners use the "My class" convention, because it is easier for natural language processing tools to tokenize. The CamelCase convention has problems capturing class names like “Sample_pH”, which would then read “SamplePH”. XML based languages don't like the space as a separator, so check how your parser copes with it in the (meta-) RU which captures the name for the RU. The current Jena XML parser does not cope with spaces in class names when using the Protégé-2000 OWL plugin. When the class name is captured within the rdf:ID element, it also becomes part of a namespace and URI in OWL, and these, as explained above, should not contain spaces or special characters. This is not an issue when using the rdfs:label element to capture the class name. The easiest thing is however to avoid the space altogether.
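Converting a space-delimited label into the underscore convention is a one-line operation. The sketch below collapses any run of whitespace into a single underscore and leaves capitalisation and hyphens untouched; the function name is hypothetical.

```python
def to_underscore_name(label: str) -> str:
    """Replace each run of whitespace between words with a single underscore."""
    return "_".join(label.split())


print(to_underscore_name("my class"))           # my_class
print(to_underscore_name("  sample   pH "))     # sample_pH
print(to_underscore_name("copper-based compound"))  # copper-based_compound
```

The conversion is lossless in the direction shown, which is one practical advantage over CamelCase: "sample_pH" can be turned back into "sample pH", whereas "SamplePH" cannot be split reliably.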

4.3.3.1 Hyphens, dash and slash

The hyphen should be avoided as a word separator; it should be used as in normal written English, as long as the representation formalism allows it. Java will interpret the hyphen as a minus. Using the hyphen as a separator would also cause ambiguity between hyphens required by English, e.g. “copper-based_compound”, and hyphens used to restrict or refine the meaning of a name, e.g. "bow-boat_part" and "bow-the_weapon", as is still done in some ontologies. The hyphen has many meanings which we take for granted, but which have to be assigned more explicitly to be processed by computers. When using the hyphen one should be aware that its meanings can conflict. It can mark an undefined "somehow-related-to" relationship; it can mark a closer semantic binding, as in “copper-based_compound”; it can encode substantiation, as in "abdomen-sonography"; but it can also mark a divergence in meaning between the two words, as in "black-white". In “bio- and genetechnology” it encodes an ellipsis, standing for the morpheme “technology”. Sometimes the hyphen encodes logical connectives like "and" or "or", and it can be used to separate syllables when breaking a word in two at the end of a line. In sentences it can also set off additional thoughts squeezed into a sentence, as in “I am always there in time – except Sundays – to listen to you.” The hyphen also marks numerical, spatial or temporal ranges, as in “1–4 telephone calls”, “Bremen–Hamburg” and “25.09.–28.12”, and indicates an omission, as in “the PC is worth 300,–”. Last but not least, it can be confused with a minus sign.

So we need to differentiate between the hyphen and a dash. There are two kinds of dashes: the n-dash and the m-dash. The n-dash is so called because it is the same width as the letter "n". The m-dash is longer, the width of the letter "m". We use the n-dash for numerical ranges, as in "6–10 years". When we need a dash as a form of parenthetical punctuation in a sentence, we use the m-dash.

The slash "/" means OR or AND in most cases and should be avoided in class names, as should logical connectives in general.
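Slashes and dashes in existing class names can be located with a simple character scan. The sketch below reports the slash and the two dash characters. It does not report the plain hyphen, because hyphens required by normal English usage are permitted. The table and function name are hypothetical.

```python
from typing import List

# Characters to report, with a short reason; the plain hyphen is not listed.
SUSPECT_CHARACTERS = {
    "/": "slash (reads as AND/OR)",
    "\u2013": "n-dash",
    "\u2014": "m-dash",
}


def separator_warnings(name: str) -> List[str]:
    """Return a reason for each suspect character that occurs in the name."""
    return [reason for ch, reason in SUSPECT_CHARACTERS.items() if ch in name]


print(separator_warnings("copper-based_compound"))  # []
print(separator_warnings("input/output_device"))    # ['slash (reads as AND/OR)']
```

Deciding whether a remaining hyphen is grammatical or is being misused as a separator still requires a human reader.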

4.3.4 Singular nouns

Names for RUs should be in the singular form throughout. Class names are always singular nouns, e.g. "randomisation" instead of "randomise". This prevents redundancy and misclassification, for example creating a class "experiments" (plural) and then "experiment" as its subclass deeper in the hierarchy. If you want to import legacy XML or generate XML feeds from the ontology, you have to use the singular form anyway, since this is the expected convention for XML tags.

4.3.5 Use present tense for representational units

Class and property names should be uniformly captured in the present tense. Sometimes a time perspective is indicated within class or property names, e.g. “to_be_measured”, “measuring”, “measurement_taken”. Class names should be normalized consistently into the present tense form, e.g. “measurement”.

4.3.6 Plurals and sets

If you have to capture plurals, you have three possibilities, e.g. “protocols”, “set_of_protocols” or “protocol_set”. The last form is recommended, because it is easier to spot (also for text mining). It is preferred over “set_of_x” because it is placed alphabetically directly beneath its singular form within the hierarchy. Use plurals sparingly. Creating for each singular x a plural container of the form “x_set” creates a lot of classes which we might not use at all. An instance of 'protocol' is a protocol and an instance of 'protocol_set' is a set of protocols. Be aware of the difference: each class 'A' in an ontology has the implicit meaning 'the class A'.
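The alphabetical argument for the “x_set” form can be verified with a plain string sort. The sketch below uses Python's default code-point ordering, which may differ from the ordering a particular ontology browser applies; the helper name and the sample names are hypothetical.

```python
def set_class_name(singular: str) -> str:
    """Build the recommended name for a set of instances of the given class."""
    return f"{singular}_set"


names = ["set_of_protocols", set_class_name("protocol"), "protein", "protocol"]
print(sorted(names))
# ['protein', 'protocol', 'protocol_set', 'set_of_protocols']
```

In the sorted list "protocol_set" directly follows "protocol", while "set_of_protocols" ends up far from the class it relates to.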

[Refine this (Chebi comment)]

Discriminate carefully between Class and Set: Both classes and sets are marked by

granularity, but sets are timeless. A class endures through time and survives the

turnover in its instances. A set is determined by its members. A class is not determined

by its instances (as a state is not determined by its citizens and as an organism is not

determined by its molecules). A set is an abstract structure, existing outside time and

space. The set of human beings existing at t is (timelessly) a different entity from the set

of human beings existing at t' because of births and deaths.

4.3.7 Avoid linguistic ellipses

Be explicit and try to avoid ellipses, because what you leave out or regard as implicitly clear is not necessarily known to others, and certainly not to computers. An ellipsis is a rhetorical figure of speech: the omission of a word or words required by strict grammatical rules but not by sense. The missing words are implied by the context in human language. Ellipsis usage often points to slang words, which should be avoided or put as synonyms, e.g. "chemo" for "chemotherapy". The aposiopesis is a special form of rhetorical ellipsis (wiki). Typical examples of ellipsis are: "Pat embraces Meredith, and Meredith, Pat", in which the second instance of the word "embraces" is implied rather than explicit; and "And so to bed", which appears on several occasions in the diary of Samuel Pepys, meaning "and so I went to bed".

The Plant Ontology used to use 'cell' to mean 'plant cell' in this way, which led to

problems when they had to extend the ontology to deal with bacteria in plants. They

have now changed the definition and name of their former 'cell' to ‘plant cell’ and

created a broader ‘cell’ class. The general rule is, for every expression 'E': 'E' means:

E. The term ‘E’ means what the word ‘E’ means, but the word ‘E’ may mean different

things...

Sometimes hyphen usage is a hint of ellipsis usage. This should be avoided, e.g. "bio- and genetechnology" would be "biotechnology and genetechnology" and then probably modelled as two separate classes, "biotechnology" and "genetechnology".


http://en.wikipedia.org/wiki/Samuel_Pepys

http://en.wikipedia.org/wiki/Aposiopesis

http://en.wikipedia.org/wiki/Figure_of_speech

http://en.wikipedia.org/wiki/Rhetoric



Confusion is also spawned by the fact that we use the very same general terms to refer

both to universals and to collections of particulars. Consider:

· HIV is an infectious retrovirus

· HIV is spreading very rapidly through Asia

This, however, could also be regarded as ellipsis usage: the first "HIV" stands for "HIV-Virus", the second stands for "HIV-Disease".

4.3.8 Acronyms and Abbreviations

Ideally, abbreviations in names should be avoided and acronyms resolved. Names for RUs should be explicit, e.g. "number_of_residues" should be used instead of a totally unintuitive "n_res". Acronyms should be included in the synonyms list and resolved if used as preferred class name. However, when an acronym is used with very high frequency in everyday language in place of its full name, for example “laser”, it should be used as the class name, with its resolved name listed in the synonym list. Domain-specific acronyms should be resolved. Only the main-focus acronyms that are found frequently in the ontology can stay as they are. Resolving e.g. “NMR” as “nuclear_magnetic_resonance_spectroscopy” in each RU within an NMR ontology makes too many terms unnecessarily long and hard to read.

Top level classes should never have abbreviations or acronyms in their names; however, there are bottom level classes in which an acronym or abbreviation could be used. In these cases of compound terms on the bottom level the acronym should be unambiguous and be resolved in at least one of the synonyms. Do not allow abbreviations that coincide with expressions having other meanings ('chronic olfactory lung disorder' should never be abbreviated as 'cold'). If acronyms cannot be avoided, capitalize them. There is no clear policy on when to spell out abbreviations, so use your common sense.

4.3.9 Registered Product- and Company-names

Proprietary names should be captured as they are, as long as this is not prohibited by the parser. In our case we are not restricted here, but should discuss whether we allow spaces or substitute them with the underscore.

[add and refine]

4.3.10 Word compositions and length




Names for RUs should be at least four characters long and as short as possible, so that they are easy to read and understand. Avoid creating human readable or preferred names that look like full sentences. Ideally, short and maximally intuitive names are to be preferred. Names are useful only if they are in fact used [see Jacob Koehler paper "intelligibility of GO terms" + DILS paper].

Word compositions longer than five words / morphemes should be avoided. When class names are made out of several words, try to use words that are already defined in higher hierarchy levels of the ontology. ‘Recycle’ words whenever possible. Build compound names out of simpler ones from the ontology in a consistent LEGO-like approach. Consistent means that the binding operators (words used to connect the other parts of the class name) are used in the same way throughout the ontology.
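The two length rules above (at least four characters, no more than five words) lend themselves to an automatic check. The following is only a minimal sketch: the function name and the assumption that words are separated by underscores, spaces or hyphens are ours, not part of any existing tool.

```python
import re

MIN_CHARS = 4  # "at least four characters long"
MAX_WORDS = 5  # compositions longer than five words should be avoided


def check_name_length(name: str) -> list[str]:
    """Return warnings for a candidate RU name (an empty list means no issue found)."""
    warnings = []
    # Assumption: words are separated by underscores, whitespace or hyphens.
    words = [w for w in re.split(r"[_\s-]+", name) if w]
    if len(name) < MIN_CHARS:
        warnings.append(f"shorter than {MIN_CHARS} characters")
    if len(words) > MAX_WORDS:
        warnings.append(f"more than {MAX_WORDS} words")
    return warnings


if __name__ == "__main__":
    for candidate in ["sample_temperature_in_autosampler", "nmr"]:
        print(candidate, check_name_length(candidate))
```

Such a check only counts words; whether a word is already defined higher up in the hierarchy still has to be verified against the ontology itself.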

4.3.10.1 Compound vs. atomic names for representational units

Sometimes one encounters rather long names for RUs, which encode a lot of semantics

within the name. These complex names are compositions of many words and therefore

are called compound terms. They often consist of a noun phrase, like

"sample_temperature_in_autosampler" embedding a prepositional term (localizational

property like "in_autosampler"). [Compositionality – see Chris Mungall's OBOL , see

Okren]

When the representational formalism allows properties to be formalized and the atomic components are already present, these classes can be refactored / dissected / decomposed into more primitive existing classes (atoms) and attributes or relations between them. This is encouraged, for instance, for OWL ontologies. When only an is_a hierarchy (without properties) is provided, compound names should be kept in the long form to capture what the user really wants to express, and one has to keep the semantics within the class name. When working with CVs one should aim to be reasonably descriptive, even at the risk of some verbal redundancy or longer names. That is why one often finds rather long class names in taxonomic CVs (e.g. GO).

When word combinations with genitive, dative or accusative case occur, variants are possible, e.g. combination into one single word (Breaking_off_the_experiment → experiment_breakoff) or connection with a hyphen (NMR_of_Hydrogen → Hydrogen-NMR).




According to DIN 12/1993, when new terms are created out of existing, already defined class names, the following types of multi-word terms can be distinguished (B. Schaeder, Fachlexicographie: Fachwissen und seine Repraesentation in Woerterbuechern, 1994, Tübingen):

Determinative term (concept) linkage: A second term occurs additionally, as a feature in the content of the original term, whereby the latter is restricted. The resulting multi-word term is a subterm. E.g. randomised study.

Disjunctive term linkage: The new multi-word term encompasses the scope of both constituent terms. E.g. Consensus Study.

Integrating term linkage: Objects associated to terms are combined into the next higher whole. E.g. Sponsor-investigator.

Conjunctive term integration: The new term merges the contents of both constituent terms, and is their next common subterm. E.g. Investigator study.

4.3.10.2 Splitting and merging classes

Simple (sometimes hyphen separated) and bimorphemic compound terms like

"histology-result" should only be atomised into histology and result when the occurring

morphemes represent single important classes themselves which are of use in other

multi-word creations. E.g. for a clinical trial the atomic morphemes "ethics" and

"commission" are not important, so a multi-word term like "ethics_commission" can stay

like this and needs only be defined once as is.

The standard procedure for refactoring / splitting a class is to obsolete the original class

and add a suitable comment directing annotators to the new classes (see Metadata

section). Classes are merged in cases where two classes have exactly the same

meaning. Usually this situation arises when one class exists, and another wording of the

same concept is added as a new class instead of as a synonym, either because a

curator didn't find the old class or didn't know it meant the same thing.

For OWL: When two classes are merged, e.g. class A and class B are merged into class A, the class name and the ID of class B are made synonyms of class A.




For obo: When two classes are merged, e.g. class A and class B are merged into class

A, the ID of class B is made a secondary ID, and the class name is made a synonym.

Usually, the ID that has existed longer is used as the primary ID, but exceptions can be

made; e.g. the name of the class with the newer ID may be more correct or the

definition may be better. Secondary IDs are stored in the OBO flat file with the 'alt_id'

tag.
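The OBO merge rule can be sketched in a few lines. This is an illustrative sketch only: terms are modelled as plain dictionaries, and the keys "id", "name", "alt_id" and "synonym" as well as the example IDs are assumptions chosen to mirror the OBO tags, not the API of any OBO library.

```python
def merge_terms(primary: dict, secondary: dict) -> dict:
    """Merge `secondary` into `primary` so that the old ID and name stay findable."""
    merged = dict(primary)
    # The ID of the merged-away class becomes a secondary ID ('alt_id' in OBO).
    merged["alt_id"] = list(primary.get("alt_id", [])) + [secondary["id"]]
    # The name of the merged-away class becomes a synonym of the surviving class.
    synonyms = list(primary.get("synonym", []))
    if secondary["name"] != primary["name"] and secondary["name"] not in synonyms:
        synonyms.append(secondary["name"])
    merged["synonym"] = synonyms
    return merged


if __name__ == "__main__":
    # Hypothetical terms; the IDs are invented for illustration.
    class_a = {"id": "EX:0000001", "name": "chemotherapy"}
    class_b = {"id": "EX:0000007", "name": "chemo"}
    print(merge_terms(class_a, class_b))
```

The sketch always keeps the ID of the first argument as primary; choosing which of the two IDs survives (usually the older one) remains a curatorial decision.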

4.3.11 Affixes (prefix, suffix, infix and circumfix)

The word-stem should be used and affixes to names should be avoided where possible or at least be used consistently. Since each class 'A' implicitly means 'the class A', prefixes or suffixes involving “_class” must be avoided. The same applies to suffixes like "_entity" and "_type". When an ontology has many terms starting with the same prefix, for example “sample_number”, “sample_origin”, …, it suggests the need for transforming the postfixes into properties of a [prefix]-class when building the ontology. If subclasses are named using the class name and a further descriptive morpheme, this should be done in a consistent way throughout the subclasses. For example, a class "receptor" can have two subclasses named “catecholamine_receptor” and “peptide_receptor” (naming them just “catecholamine” and “peptide” would be bad practice, since ellipses have to be avoided and “peptide” designates a completely different class anyway). So there should not be the names “catecholamine_receptor” and “peptide” side by side. If one prefixes a "receptor" subclass name in the form xy_receptor, e.g. "adrenaline_receptor" (having the ligand as the prefix xy), one can't integrate receptors that are named according to their downstream signal transduction module (and not the ligand), e.g. "G-protein_coupled_receptor", in a consistent way. Infixes, circumfixes, articles, conjunctions and possessive forms of words should be used consistently, but be avoided when possible.
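Names that share a prefix, such as “sample_number” and “sample_origin”, can be spotted mechanically before deciding whether to turn the postfixes into properties. The sketch below assumes underscore-separated names and treats the first word as the prefix; the function name and the threshold parameter are illustrative assumptions.

```python
from collections import defaultdict


def shared_prefixes(names, min_count=2):
    """Group names by their first underscore-separated word.

    Returns {prefix: [postfixes]} for every prefix used by at least
    `min_count` names; such groups are candidates for a [prefix]-class
    with the postfixes modelled as properties.
    """
    groups = defaultdict(list)
    for name in names:
        head, sep, tail = name.partition("_")
        if sep:  # ignore single-word names
            groups[head].append(tail)
    return {p: sorted(t) for p, t in groups.items() if len(t) >= min_count}


if __name__ == "__main__":
    names = ["sample_number", "sample_origin", "receptor", "peptide_receptor"]
    print(shared_prefixes(names))
```

The output is only a list of candidates; whether a group really calls for a class with properties depends on the representation formalism in use.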

4.3.12 Logical connectives

Logical connectives such as "and", "or" and "not" should not be used within names for RUs, because they will be formalised as constraints and axioms later (and hence will allow for reasoning). 'rabbit or whale' does not designate a special universal of mammal.
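A simple check can flag names that contain logical connectives or the slash. This is a minimal sketch under the assumption that words in a name are separated by underscores or spaces; the function name is hypothetical.

```python
import re

CONNECTIVES = {"and", "or", "not"}


def find_connectives(name: str) -> list[str]:
    """Return the logical connectives (and a slash, if present) found in a name."""
    # Compare whole words only, so that e.g. "notation" is not flagged for "not".
    tokens = re.split(r"[_\s]+", name.lower())
    hits = [t for t in tokens if t in CONNECTIVES]
    if "/" in name:
        hits.append("/")
    return hits


if __name__ == "__main__":
    for candidate in ["rabbit_or_whale", "sponsor/investigator", "notation"]:
        print(candidate, find_connectives(candidate))
```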

4.3.13 "Taboo" words and Characters




Where possible, words from the metalevel (the representation formalism / KR language) should not be used within names for RUs. The use of database or

ontology language keywords, for example "Model", "Class", "KIF", "Clips" and "OWL"

and xml style tags or characters designating tags or regular expressions should be

avoided when possible, because you never know whether all parsers you might need to

use will handle these. Also when translations into other formats have to be made you

can be sure not to run into parser problems in these other formats.

Other words and morphemes to be avoided are highly ambiguous ones, e.g. the affixes “set” and “setting”, which belong to the most ambiguous words in English. "Set" alone has over 20 different meanings (e.g. it may refer to the process of setting parameters or to a collection of parameters).

4.3.14 Specific language requirements

Consistency is required when encountering these special cases. Where there are differences in the accepted spelling between British and US English usage, use the US form, e.g. polymerizing, signaling rather than polymerising, signalling.

A common source of misspelled tags is the translation from other alphabets or characters. For example, the Umlaut, commonly used in German, is usually represented in the Latin-1 character set. Since this character set is often unavailable, Germans frequently represent an Umlaut character by means of a longhand encoding, such as "ue" for "ü". Consistency is required in these special cases to avoid a mixture of "ü"s and "ue"s.




5 Depicting representational units within text

Be consistent in your notation. We use bold type to depict relations involving particulars; italics for universals and for relations between universals; and Roman for particulars.

[to be added: Formatting convention when using ontological repr units in literature; see OBO RO. Candidate notations:

Italics

Bold

“ ” (double quotes)

‘ ’ (single quotes)

UPPERCASE throughout

lowercase throughout

underlined

One Recommendation: If you use boldface to emphasize that you speak of the term and

not of its denotation, then do not use boldface for other purposes. Use single quotes to

explicitly refer to the term 'class'. Since classes are not terms, but rather have terms as

names one should say: "the class called 'human'" (where 'human' is the term used to

name the class in question), or "the class human" (where italics are used to emphasize

that human represents a class). One might though want to reserve italics for universals,

eg., "the class representing (the universal) human", and then one should say "the class

human", or "the class 'human'" (the last is a shortcut, and this kind of shortcut should be

introduced explicitly).]




6 Class definitions

Class definitions should provide the context and meaning of the class in a way that eases its interpretation. The definition should contain important keywords that describe the class's inherent attributes and relations to other classes in natural language. In reality, however, proper definitions cannot be created for all universals, especially at the root level of the ontology (e.g. it is hard to define “thing”). A class should be given a humanly intelligible definition only when the necessary and sufficient conditions for being an instance of the corresponding universal are really understood. Before that, do not make up pseudo-definitions (e.g. circular definitions), but provisionally collect the necessary conditions in the comment field. Proofread your definitions carefully to eliminate typos and double spaces. As with class names, avoid using abbreviations that may be ambiguous. Definitions should be as brief as possible, but as complex as necessary. They should begin with an upper-case letter, can consist of more than one sentence if necessary, and always end with a period (full stop). Definitions should start in the following way: “A [class described] is a [superclass], which/that [most relevant intrinsic properties (attributes and relations to other classes)]. It…. [Enter]”. When using the word “it”, make sure you always refer to the described class only.

In practice one would first capture non-formal definitions as they come from the domain experts or glossaries, or as gathered by a google:define search. These are captured with their provenance (meta-)data, after a “tempdef” marker. Then one creates a second definition which is more formal and standardized according to the principles given below (put after the def marker, see metadata section). Currently all definitions are captured together with metadata in the rdfs:comment field, which is not the cleanest solution, since the comment field can hold anything from editorial notes, scope notes and provenance notes to definitions. The xml:lang attributes do not have to be set, because they can be set once for all classes in the metadata ontology description tab, and these lang attributes (at least for the rdfs:label field) tend to cause problems when importing these ontologies.

6.1 General rules for creating sound normalized definitions

1. Each definition refers to only one class.




2. Definitions should be as clear and concise as possible in order to convey the essence, "Das

Wesen" (Silesius) of the universal to the user of the ontology.

3. Definitions should define classes and their referred universals and not the words used to refer to

classes (class names), so in definitions avoid terms like ‘class’, 'descriptor', 'name', etc. that

refer to RUs and not to the universals in reality. E.g. the definition of 'eye' is 'organ of sight', not

'is name of organ of sight', nor ‘class or concept describing an organ of sight’. Avoid using

acronyms within definitions.

4. The definitions should explain which characteristics (or properties) distinguish members of this class from the others (the upper class and siblings).

5. Definitions should use simple, easy to understand words that are meaningful to most of the users. In the best case all terms in the definition can be found as classes in higher levels of the ontology and are thus defined.

6. It should be positive and not negative. Definitions like ‘all animals that are not a mammal’ or ‘ all

non-membrane proteins’, which do not designate natural kinds are not helpful, since

complements of universals are not necessarily themselves universals.

7. The formal rules for definitions laid down by Aristotle should be applied. When A is_a B, the definition of ‘A’ takes the form: An A is a B which C... e.g.: “A human being is a mammal which is rational”. Essence = Genus + Differentiae. If a class has more than one parent, i.e. multiple parenthood cannot be avoided, mention all parent classes in the definition.

8. The definition should be free from words sharing the same root as the thing being defined (to be

represented) and should not contain the class name itself. Avoid circularity in definitions like

these:

An A is an A which is B (person = person with identity documents)

An A is the B of an A (heptolysis = the causes of heptolysis)

9. Each definition should reflect the position in the hierarchy to which a defined RU belongs. The

position of a RU within the hierarchy enriches its own definition by incorporating automatically

the definitions of all RUs above it. The entire information content of the hierarchy can then be

translated cleanly into a computer representation.

10. The definition must be correct in most of the possible contexts in which the class is used, so that the class is intersubstitutable with its definition in such a way that the result is both grammatically correct and truth preserving.

11. Include some examples of well known prototypical instances or subclasses of the class.
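Some of the formal rules for definitions (upper-case start, final period, no double spaces, no repetition of the class name) can be checked automatically. The following is a rough sketch, not a complete validator: the function name is hypothetical, and the circularity test is a simple heuristic that looks for the class name after the first " is " of the definition.

```python
import re


def check_definition(class_name: str, definition: str) -> list[str]:
    """Return a list of formal problems found in a natural-language definition."""
    problems = []
    text = definition.strip()
    if not text or not text[0].isupper():
        problems.append("does not begin with an upper-case letter")
    if not text.endswith("."):
        problems.append("does not end with a period")
    if "  " in text:
        problems.append("contains double spaces")
    # Heuristic: skip the opening "A <class> is" and look for the class name
    # in the remainder, which would indicate a circular definition.
    readable = class_name.replace("_", " ").lower()
    body = text.lower().split(" is ", 1)[-1]
    if re.search(rf"\b{re.escape(readable)}\b", body):
        problems.append("repeats the class name (possible circularity)")
    return problems


if __name__ == "__main__":
    print(check_definition("human_being", "A human being is a mammal which is rational."))
    print(check_definition("person", "a person is a person with identity documents"))
```

Rules that concern meaning, such as positivity or the choice of genus and differentiae, cannot be verified this way and still need a human reviewer.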

Additionally have a look at the following paper by Jacob Koehler: http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1482721&blobtype=pdf

[Do we need definitions for particulars that we currently represent as classes, e.g. do the brand names of nmr-instrument vendors need definitions?]



7 Unique identifiers

Following the decentralized web paradigm, every single RU (class or relation) should be versioned independently rather than versioning the ontology as a whole. Therefore it is necessary to consider conventions for unique identifiers for RUs. If one tries to edit a set of modular ontologies held together by just the string class names, every time somebody wants to change a name, fix a spelling error, etc., there is a global change that is intrinsically unreliable or, if the ontologies are distributed, requires a major organisational effort. When the identifiers are formal ID numbers and human readable class names are kept as labels, you can change the label without disturbing the linkages. Hence versioning becomes easier when using unique formal identifiers for RUs in representational artefacts. Some ontology editors, like Protégé-2000, construct identifiers out of the ontology name and numbers automatically.

A unique identifier MUST NOT be deleted once used. IDs should be conserved at all

times so that, even if a term is defunct or has a new ID, someone searching using the

old ID can find it.

OBO encourages numeric local IDs. Anything that is a valid XML ID can be used. As a rule of thumb, while user friendly names for RUs should not cause problems for human processing, their IDs should not cause problems for machine processing.

Always remember that an ID is associated with a definition and a universal rather than with the preferred class name. The numeric identifier resides in the rdf:ID field and the human readable name of the class is in the rdfs:label field. These correspond to the X and Y fields in the OBO-Format.

OBO IDs consist of an (all capitalised) prefix + underscore or ":" (not in OWL) + local ID. The prefix can be the more commonly used short form (e.g. ‘OBI’ or ‘msi-nmr’) or a long form (e.g. the full URI prefix). Only the long form + local ID is used in proper OWL files (although the short form can be used as a qname). Currently the long form is left implicit for most OBO ontologies; OBO will come up with a default mapping (which can be overridden by the ontology maintainer), e.g. ONTOLOGYSHORTNAME_21 → urn:lsid:ontologyshortname.sourceforge.net:ONTOLOGYSHORTNAME_21, and there will be widgets in Protégé for substituting the short with a long form throughout an ontology. OBO has to decide whether to go with URNs or more standard URIs as the default short->long mapping.




[The RECOMMENDED system of identifiers for the PSI CVs consists of two parts. Part one should be the official ‘namespace abbreviation’ PSI:XXX. The second part corresponds to a numeric accession number having the pattern “000000”. Therefore, the local identifier is XXX:000000 and the complete PSI CV unique identifier is of the format “PSI:XXX:000000”.]

Within OBO an "OBO_REL_" prefix is used to name relations within the rdf:ID field, e.g. rdf:ID="OBO_REL_part_of". The OBO prefix / idspace equates to an XML/RDF namespace: a mapping between a "local" ID space and a "global" ID space. The value for this tag should be a local idspace, a space, a URI, optionally followed by a quote-enclosed description, like this: idspace: GO urn:lsid:bioontology.org:GO: "gene ontology terms".
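The idspace line shown above can be split into its three parts and then used to expand a short ID into its global form. This is a sketch under the assumption that the line follows exactly the layout described (local idspace, space, URI, optional quoted description); the function names are hypothetical, and no OBO parser library is involved.

```python
import re

# idspace: <local> <uri> ["description"]
IDSPACE_RE = re.compile(r'^idspace:\s+(\S+)\s+(\S+)(?:\s+"([^"]*)")?\s*$')


def parse_idspace(line: str) -> dict:
    """Split an 'idspace:' header line into local prefix, URI and description."""
    match = IDSPACE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a valid idspace line: {line!r}")
    local, uri, description = match.groups()
    return {"local": local, "uri": uri, "description": description}


def expand_id(short_id: str, idspaces: dict) -> str:
    """Map a short ID such as 'GO:0008150' to its global form."""
    prefix, _, local_id = short_id.partition(":")
    return idspaces[prefix] + local_id


if __name__ == "__main__":
    parsed = parse_idspace('idspace: GO urn:lsid:bioontology.org:GO: "gene ontology terms"')
    print(parsed)
    print(expand_id("GO:0008150", {parsed["local"]: parsed["uri"]}))
```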

7.1 Capturing the class name and ID using the autoID plugin in Protégé-owl

Within our current ontologies the unique class IDs go in the rdf:ID field. The value of the rdf:ID field is restricted and can only contain special characters at special positions:

at the beginning: £ $ _ and : are allowed, but not: @[{./=-+<~#!"%^&*(`

within: none of @[{./=-+<~#!"%^&*(` are allowed

at the end: ":" is prohibited.

The IDs consist of a short prefix designating the ontology (e.g. msi-nmr, OBI or FuGO) and a number within a range that can be specified. We use the CO-ODE autoID plugin for that.

The preferred class name goes into the rdfs:label field, which is not restricted and can contain any Unicode character sequence (for the owl:DatatypeProperty or owl:AnnotationProperty rdfs:label, Protégé gives an xsd:string instance of the class rdfs:Datatype as the value for its range; http://www.w3.org/2001/XMLSchema#string).

In the Protégé OWL plugin the rdf:ID class name is used by default as the "browser key" or "display key" within the term hierarchy (this field is mapped to the :STANDARD-SLOT :NAME with the Protégé value type string; see the Protégé meta-architecture). This default setting reflects the fact that the alternative field to capture term names, the rdfs:label field, can be ambiguous, because more than one rdfs:label filler can be asserted, e.g. synonyms (this, however, can cause the ontology to become an OWL Full ontology, which is not wanted when reasoning is to be applied). To set the OWL hierarchy display keys from :NAME to rdfs:label, proceed as follows (also have a look at https://www.cbil.upenn.edu/fugowiki/index.php/ProtegeGoodies and http://protege.cim3.net/cgi-bin/wiki.pl?HidingIdentifiersWithLabelsInOWLPlugin):

Owl/Preferences/Visibility/Metaclasses: mark rdfs:class and owl:class to show up. Then select (the now visible) owl class in the forms tab and change the selection for the "Display slot" from ":NAME" to "rdfs:label".

From now on in the class hierarchy of the OWL Classes tab:

1. classes with no rdfs:label display their identifier (':NAME');

2. classes with rdfs:label for which all the labels have explicitly a language display their identifier;

3. classes with at least one rdfs:label for which the language is not explicitly stated display the value of this rdfs:label.

Caution: if more than one rdfs:label field with a specified lang attribute exists (e.g. when capturing synonyms within this field), the display slot will be the alphabetically first one, which is not necessarily the preferred class name.
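The three numbered display rules can be restated as a small decision function. This is merely our reading of the rules, not Protégé code: labels are modelled as (text, language) pairs with None for a missing language, and returning the first untagged label when there are several is an assumption, since the rules do not say which one is chosen.

```python
def display_text(identifier: str, labels: list) -> str:
    """Return what the class hierarchy would show for one class.

    `labels` is a list of (text, lang) pairs; lang is None when the
    rdfs:label carries no xml:lang attribute.
    """
    if not labels:
        return identifier  # rule 1: no rdfs:label at all
    untagged = [text for text, lang in labels if lang is None]
    if untagged:
        return untagged[0]  # rule 3: at least one label without a language
    return identifier  # rule 2: every label has an explicit language


if __name__ == "__main__":
    # Hypothetical identifier and labels, for illustration only.
    print(display_text("msi_400001", []))
    print(display_text("msi_400001", [("magnet", "en")]))
    print(display_text("msi_400001", [("magnet", "en"), ("magnet", None)]))
```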

7.2 Life science Identifier, (LSID: http://lsid.sourceforge.net/)

The LSID concept introduces a straightforward approach to naming and identifying data

resources stored in multiple, distributed data stores in a manner that overcomes the

limitations of naming schemes in use today. Almost every public, internal, or

department-level data store today has its own way of naming individual data resources,

making integration between different data sources a tedious, never-ending chore for

informatics developers and researchers. By defining a simple, common way to identify

and access biologically significant data, whether that data is stored in files, relational

databases, in applications, or in internal or public data sources, LSID provides a naming

standard underpinning for wide-area science and interoperability. An LSID conforms to

the URN standards defined by the IETF. Every LSID consists of up to five parts: the

Network Identifier (NID); the root DNS name of the issuing authority; the namespace

chosen by the issuing authority; the object id unique in that namespace; and finally an

optional revision id for storing versioning information. Each part is separated by a colon

to make LSIDs easy to parse. Here are a few examples:

urn:lsid:pdb.org:1AFT:1 (the first version of the 1AFT protein in the Protein Data Bank)

urn:lsid:ncbi.nlm.nih.gov:pubmed:12571434 (references a PubMed article)

urn:lsid:ncbi.nlm.nih.gov:GenBank:T48601:2 (refers to the second version of an entry in GenBank)

LSIDs name and refer to one unchanging data object each. Unlike the familiar URLs of

the World-Wide-Web, LSIDs are location independent. This means that a program or a

user can be certain that what they are dealing with is exactly the same data if the LSID

of any object is the same as the LSID of another copy of the object obtained elsewhere.

The problem with URLs is that they always point to a particular web server (which may

not always be in service) and worse, that the contents referred to by a URL often

change.

A universal naming scheme simplifies the processing of data from a variety of sources,

because the application does not need to have specific, hard-coded support for each

naming scheme. This allows cross-referencing between data sources to be done

implicitly using URIs. One such effort currently underway is the Life Sciences Identifier

(LSID) project. An example looks like this: urn:lsid:uniprot.org:uniprot:P49841. This

LSID names a protein record in Uniprot that is referred to as P49841. It consists of parts

separated by colons: A prefix “urn:lsid:”, the authority name; the authority-specific data

namespace; and the namespace-specific object identifier (here “P49841”).
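Because the parts of an LSID are separated by colons, splitting one into its components is straightforward. The sketch below assumes the layout described in this section (prefix "urn:lsid:", authority, namespace, object identifier, optional revision); the type and function names are our own and do not come from an LSID library.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> LSID:
    """Split an LSID into authority, namespace, object id and optional revision."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not a well-formed LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    revision = parts[5] if len(parts) == 6 else None
    return LSID(authority, namespace, object_id, revision)


if __name__ == "__main__":
    print(parse_lsid("urn:lsid:uniprot.org:uniprot:P49841"))
    print(parse_lsid("urn:lsid:ncbi.nlm.nih.gov:GenBank:T48601:2"))
```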




8 Namespace

Every ontology has its own characteristic namespace, a string of characters that prefixes the individual identifiers for RUs in an ontology. By maintaining different namespaces for different ontologies it is possible for one ontology to reference classes, properties and individuals in another ontology in an unambiguous manner and without causing name clashes. For example, all OWL classes reference the class owl:Thing. This class resides in the OWL vocabulary ontology that has the namespace http://www.w3.org/2002/07/owl#. The FuGO ontology refers to and makes use of the Dublin Core ontology for annotating its RUs. It refers to these dc classes through their namespace (http://purl.org/dc/elements/1.1/, as set in the Protégé metadata tab). A namespace is also called a context, as the valid meaning of a name can change depending on what namespace applies. In order to ensure that namespaces are unique they manifest themselves as Uniform Resource Identifiers (URIs). Since in the OWL language the class names are also part of a URI, they may not contain spaces or special characters.

In practice the namespace URI is a URL at which the ontology can be found on the internet, e.g.:

For the msi-ontology: http://msi-workgroups.sourceforge.net/ontologies/msi/msi.owl

For the FuGO-ontology: http://fugo.sourceforge.net/ontology/FuGO.owl

To get the corresponding namespaces just add the “#” to these URIs.

For better readability however one can internally also substitute the full namespace with

a short intuitive prefix, which should be the same as for the class ID, e.g. “FuGO” or

“msi”.
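The relation between a short prefix and the full namespace can be illustrated with a small lookup table. This sketch uses the namespace URIs given in this section (ontology URL plus "#"); the function name is hypothetical and the class name "experiment" in the usage example is invented for illustration.

```python
NAMESPACES = {
    "msi": "http://msi-workgroups.sourceforge.net/ontologies/msi/msi.owl#",
    "FuGO": "http://fugo.sourceforge.net/ontology/FuGO.owl#",
    "owl": "http://www.w3.org/2002/07/owl#",
}


def expand_qname(qname: str, namespaces: dict = NAMESPACES) -> str:
    """Expand 'prefix:LocalName' into the full URI of the RU."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in namespaces:
        raise ValueError(f"unknown or missing prefix in {qname!r}")
    # Names that become part of a URI may not contain spaces.
    if any(ch.isspace() for ch in local):
        raise ValueError(f"local name must not contain spaces: {local!r}")
    return namespaces[prefix] + local


if __name__ == "__main__":
    print(expand_qname("owl:Thing"))
    print(expand_qname("msi:experiment"))  # hypothetical class name
```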

Serial namespace dependencies: If FuGO imports an ontology, e.g. DC, and the msi-ontology imports FuGO, all references FuGO points to will automatically also be passed on to the msi-ontology. So you don’t need to import these again when they are already imported by an imported ontology.





9 Ontology Imports in Protégé-owl

The usage of standardized ontology representation languages like OWL, in conjunction with the ubiquitous URI-based access to Semantic Web resources and the emergence of ontology management methods and tools, constitutes a solid basis for building more reusable ontologies. To be able to refer to another web-based ontology, e.g. to OBI ontology classes or properties, the full OBI ontology has to be imported into the active one (e.g. the msi-nmr ontology). Then we can experiment with the “binning” of classes from our domain-dependent / community-specific ontology into more general OBI ones (OBI does the same with BFO.owl). To import an ontology to be referenced (e.g. OBI.owl), proceed as follows (see also http://protege.cim3.net/cgi-bin/wiki.pl?OWL_Imports_Repositories ):

9.1 The “lang” attribute issue

The Protege wiki page HidingIdentifiersWithLabelsInOWLPlugin ( http://protege.cim3.net/cgi-bin/wiki.pl?HidingIdentifiersWithLabelsInOWLPlugin ) explicitly states (point 4.3.) that by default

“classes with rdfs:label for which all the labels have explicitly a language display their identifier; (435)

classes with at least one rdfs:label for which the language is not explicitly stated display the value of this rdfs:label. (436)”

… even if the display slot for owl:classes has been set to rdfs:label.

For this reason replace all occurrences of ' xml:lang="en"' within the OBI.owl file with an empty string and save it. This has to be done because, with the lang attribute set, the Protégé browser key for the imported classes will again be the unintuitive, non-descriptive rdf:ID.
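The replacement described above can also be scripted instead of being done by hand in an editor. This is a minimal sketch that performs a plain text substitution; the function names are hypothetical, the file path is whatever local copy of the OWL file you work on, and it is advisable to keep a backup, since the file is overwritten.

```python
from pathlib import Path


def strip_lang_attributes(owl_text: str, lang: str = "en") -> str:
    """Remove every ' xml:lang="<lang>"' attribute from the serialized ontology."""
    return owl_text.replace(f' xml:lang="{lang}"', "")


def strip_lang_in_file(path: str, lang: str = "en") -> None:
    """Rewrite the file in place with the lang attributes removed."""
    owl_file = Path(path)
    original = owl_file.read_text(encoding="utf-8")
    owl_file.write_text(strip_lang_attributes(original, lang), encoding="utf-8")


if __name__ == "__main__":
    sample = '<rdfs:label xml:lang="en">magnet</rdfs:label>'
    print(strip_lang_attributes(sample))
```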

Alternatively work around this as described on the Protégé wiki. Import the Protégé

Metadata ontology and then …

“In the 'Metadata' tab, make sure the primary ontology (the top one) is active and

selected (67H)

1. create a new annotation property (select 'protege:defaultLanguage')

2. in the 'value' field, set the default language;

3. do not fill the 'lang' field for the annotation property;

4. in the class hierarchy of the OWLClasses tab:





   1. classes with no rdfs:label display their identifier (':NAME');

   2. classes with rdfs:label for which the language matches the value defined at step 5.1 display this label;

   3. classes with rdfs:label, none of which matches the value defined at step 4.1 display their identifier (':NAME').”
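The three display rules quoted above can be written down as a small decision function. This is a sketch of the rules as quoted, not of Protégé's actual implementation. The function name and the data layout are hypothetical, and treating a label without a lang attribute as "not matching" is an assumption, since the wiki does not cover that case.

```python
from typing import List, Optional, Tuple

# A label is modelled as (text, language); language is None when the
# rdfs:label carries no xml:lang attribute.
Label = Tuple[str, Optional[str]]


def display_name(identifier: str, labels: List[Label], default_language: str) -> str:
    """Apply the three quoted rules to decide what the class tree shows."""
    if not labels:
        return identifier  # rule 1: no rdfs:label at all
    for text, language in labels:
        if language == default_language:
            return text  # rule 2: a label matches the default language
    # rule 3: no label matches. ASSUMPTION: labels without a language
    # also end up here, because the source leaves this case open.
    return identifier
```

For example, display_name("MSI_400002", [("autosampler", "en")], "en") yields the label, while the same class with only a label in another language yields the identifier.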

[Here it is not clear whether in point 3 the lang attribute for the ontological classes themselves or the lang

attribute for the protege:defaultLanguage property is meant (I assume the latter). Unfortunately it is not

stated how classes with an rdfs:label with no lang-attribute set will be displayed.

Following all recommendations from this website I still see classes of imported ontologies displayed by

their ids. This holds true for the ones that are direct parent classes of the ones in the primary ontology. I

assume this bug has something to do with the way protégé stores the referenced classes when an

imported ontology is deleted and after saving the updated new ontology is imported…. This issue still has

to be cleared.]

9.1.1 Import

Open your ontology in Protege 3.2 beta, go to the metadata tab and click the Import icon within the ontology browser frame. Now select where you want to import your ontology from, i.e. whether you want to import from the web (a URL), from a local file or from a repository file, and specify the URI pointing to the owl file to be imported.

Then in the namespaces field of the Metadata Tab add the URI/namespace of the

ontology to be imported and a short prefix for it.

After this save the ontology and re-load it. You should see the top-level nodes of the

imported / referenced ontology. In any case you can only import the whole ontology, not

just certain modules (this is currently being worked on).

You are now able to use these referenced classes and other RUs from the imported ontology. You can make FuGO classes superclasses of your community / domain-specific ones. An example of multiple parenthood is the NMR "autosampler" as a subclass of both msi:optional_part_of_NMR_instrument (rdf:ID="MSI_400001") and FuGO:Instrument (rdf:about="#FUGO_47"). In OWL such a reference looks like this:

<owl:Class rdf:ID="MSI_400002">
  <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">autosampler</rdfs:label>
  <rdfs:subClassOf rdf:resource="http://fugo.sourceforge.net/ontology/FuGO.owl#FUGO_47"/>
  <rdfs:subClassOf>
    <owl:Class rdf:ID="MSI_400001"/>
  </rdfs:subClassOf>
</owl:Class>
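To check which superclasses a class definition like the one above refers to, the RDF/XML can be read with Python's standard library. The following is a minimal sketch: the function name is hypothetical, and the snippet is wrapped in an rdf:RDF element with the usual namespace declarations so that it parses on its own.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

SNIPPET = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:Class rdf:ID="MSI_400002">
    <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">autosampler</rdfs:label>
    <rdfs:subClassOf rdf:resource="http://fugo.sourceforge.net/ontology/FuGO.owl#FUGO_47"/>
    <rdfs:subClassOf>
      <owl:Class rdf:ID="MSI_400001"/>
    </rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>"""


def superclass_refs(owl_xml: str, class_id: str) -> list[str]:
    """List the superclass references of the class with the given rdf:ID."""
    root = ET.fromstring(owl_xml)
    refs = []
    for cls in root.findall(f"{{{OWL}}}Class"):
        if cls.get(f"{{{RDF}}}ID") != class_id:
            continue
        for sub in cls.findall(f"{{{RDFS}}}subClassOf"):
            resource = sub.get(f"{{{RDF}}}resource")
            if resource is not None:
                # External parent: namespace URI + "#" + rdf:ID value.
                refs.append(resource)
                continue
            nested = sub.find(f"{{{OWL}}}Class")
            if nested is not None:
                # Parent from the same ontology, given as a nested class.
                local_id = nested.get(f"{{{RDF}}}ID")
                refs.append(f"#{local_id}" if local_id else nested.get(f"{{{RDF}}}about", ""))
    return refs


print(superclass_refs(SNIPPET, "MSI_400002"))
```

The first entry is the full reference into the imported FuGO ontology, the second the local parent class.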






The reference to the FuGO superclass is kept, even if the path to the imported ontology has changed and it is not accessible. Also, if you remove the imported ontology, the classes used as parent classes by the primary ontology classes stay in the ontology. The reference is made through the rdf:resource value (which consists of the referenced ontology's namespace URI + "#" + rdf:ID value).

Conventions for import repositories

There is no formal convention for determining the location of an ontology given its URI, but it is generally recommended that ontologies are made available on the web at a location that corresponds to their URI, e.g. the FuGO ontology should be found under http://fugo.sourceforge.net/ontology/

Since SourceForge (SF) is sometimes very slow, a website with faster access would be better.

Note that even if an ontology URI appears to be a URL, it is not necessarily resolvable.

In other words, it may or may not be the case that if an ontology URI is typed into a web

browser, a document containing the ontology will be displayed.

To deal with this, i.e. to allow Protégé to load an ontology when its URL is not resolvable, Protege-OWL uses a mechanism based on the notion of "ontology repositories" to determine where an ontology should be loaded from. Given an ontology URI, a repository can be checked to see whether or not it contains the ontology that is identified by the URI. If the repository does contain the required ontology, then it acts as a gateway for loading and possibly saving the ontology. Protege-OWL maintains a list of such repositories and searches them when attempting to import an ontology.

9.1.1.1 Importing from repositories (extracted from the Protégé wiki)

To manage repositories look at the "Ontology repositories..." item on the OWL menu.

Here for each ontology, its URI is shown, with a description of the location of the

ontology. One repository can provide access to several ontologies and for each of these

several alternative locations.

Protege-OWL supports the notion of global and project repositories. Global repositories are by default available to every new project and are typically useful for managing commonly used ontologies, such as upper ontologies, that will be imported into most projects (so would BFO.owl, which should be used by all OBO ontologies, belong here?). Project repositories are associated with a specific Protege-OWL project, and must be created/specified for that project.

Protege-OWL supports the creation of four different types of repositories:





HTTP repositories (URI refers to a URL)

Local folder repositories (URI refers to an absolute location for a repository folder that is searched for the owl files)

Relative folder repositories (URI refers to a relative location for a repository folder that is searched for the owl files)

Local file repositories (URI refers to the absolute position of a special .repository file)

It is possible to specify multiple ontology repositories. When there are multiple repositories, they are searched for the ontology to be imported in the following order:

1. Search any project repositories from top to bottom. If the ontology is found, load it from the repository that contains it.

2. Search any global repositories from top to bottom. If the ontology is found, load it from the repository that contains it.

3. Attempt to resolve the ontology URI and import the ontology from the location pointed to by the resolved URI.

If loading the ontology from the resolved URI fails, a dialog pops up asking the user to specify a repository from which the imported ontology may be loaded. (Note that a dialog is not shown automatically when using the Protege-OWL API.)
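The search order described above can be illustrated with a short sketch. It does not use the Protege-OWL API. Each repository is modelled as a plain mapping from ontology URI to location, and the function name, the mapping layout and the resolve_uri callback are assumptions made for illustration.

```python
from typing import Callable, Mapping, Optional, Sequence

# Hypothetical model: one repository maps ontology URIs to locations.
Repository = Mapping[str, str]


def locate_ontology(
    uri: str,
    project_repositories: Sequence[Repository],
    global_repositories: Sequence[Repository],
    resolve_uri: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the location to load the ontology from, or None if not found."""
    # 1. Project repositories, from top to bottom.
    for repository in project_repositories:
        if uri in repository:
            return repository[uri]
    # 2. Global repositories, from top to bottom.
    for repository in global_repositories:
        if uri in repository:
            return repository[uri]
    # 3. Fall back to resolving the URI itself. A None result corresponds to
    #    the point where the GUI would ask the user for a repository.
    return resolve_uri(uri)
```

Because project repositories are consulted first, a project can override the location of an ontology that is also listed in a global repository.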

Protege-OWL stores repository information in plain text files, with one line per

repository. The global repository is saved in a file called global.repository which resides

in the Protege-OWL plugin folder. Project repositories are saved in a file that has the

same location and name as the project OWL file but with an extension of .repository.
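As a small illustration of this naming rule, the path of a project repository file can be derived from the path of the project OWL file. The function names are hypothetical. The reader function only assumes the "one line per repository" layout stated above and does not interpret the content of the lines.

```python
from pathlib import Path


def project_repository_file(project_owl: Path) -> Path:
    """Same location and name as the project OWL file, extension .repository."""
    return project_owl.with_suffix(".repository")


def read_repository_lines(repository_file: Path) -> list[str]:
    """Return the non-empty lines of a repository file, one per repository."""
    text = repository_file.read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```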

9.1.1.2 Changing the imported ontology to be the newest updated version

To switch to a new version of the imported ontology, remove the outdated version through the metadata tab, then save and reload the primary ontology. After this step you will still find, on the root level, all parent classes from the old imported ontology (e.g. FuGO ones) that are used by classes of the primary ontology (e.g. the msi-nmr ontology). Remember: these parent classes are displayed by their former namespace prefix (p1) and their identifier (:NAME, rdf:ID, rdf:about); no other data, e.g. rdfs:labels or definitions, are stored in the primary ontology.

Now import the updated version of that p1 ontology to embed these classes in their original context again (proceed as described above).

You may encounter the case where the updated ontology was altered in such a way that parent classes of your primary ontology were deleted. In this case these ‘orphan’ classes are again found on the root level.

[Other useful sources include:

http://www.w3.org/TR/owl-ref/#imports-def

http://www.w3.org/TR/2004/REC-owl-semantics-20040210/rdfs.html#owl_imports_rdf

http://www.w3.org/TR/2004/REC-owl-semantics-20040210/direct.html#owl_imports_semantics

http://www.w3.org/TR/2004/REC-owl-semantics-20040210/syntax.html#owl_imports_syntax

http://protege.stanford.edu/doc/owl/owl-imports.html

http://www.co-ode.org/resources/tutorials/ProtegeOWLTutorial.pdf#search=%22co-ode%22 ]
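The ‘orphan’ case from section 9.1.1.2 can be detected before re-importing by comparing the parent references in the primary ontology with the classes defined in the updated imported file. The sketch below rests on assumptions: both files are RDF/XML, imported classes are declared with rdf:ID or rdf:about relative to the given namespace URI, and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"


def referenced_parents(primary_xml: str, imported_ns: str) -> set[str]:
    """Collect rdf:resource values of subClassOf that point into imported_ns."""
    root = ET.fromstring(primary_xml)
    refs = set()
    for sub in root.iter(f"{{{RDFS}}}subClassOf"):
        resource = sub.get(f"{{{RDF}}}resource")
        if resource and resource.startswith(imported_ns + "#"):
            refs.add(resource)
    return refs


def defined_classes(imported_xml: str, imported_ns: str) -> set[str]:
    """Collect the full URIs of all classes declared in the imported file."""
    root = ET.fromstring(imported_xml)
    defined = set()
    for cls in root.iter(f"{{{OWL}}}Class"):
        local_id = cls.get(f"{{{RDF}}}ID")
        about = cls.get(f"{{{RDF}}}about")
        if local_id:
            defined.add(f"{imported_ns}#{local_id}")
        elif about:
            # ASSUMPTION: "#X" is relative to the imported namespace URI.
            defined.add(imported_ns + about if about.startswith("#") else about)
    return defined


def orphaned_parents(primary_xml: str, imported_xml: str, imported_ns: str) -> set[str]:
    """Parents used by the primary ontology but missing from the import."""
    return referenced_parents(primary_xml, imported_ns) - defined_classes(imported_xml, imported_ns)
```

An empty result means every parent class used by the primary ontology still exists in the updated version. Any URI returned would show up as an orphan on the root level after the re-import.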





10 Properties (Attributes and Relations)

10.1 Assigning "key-properties" to top level classes

The explicit allocation of class key-properties (the ones that define the essence of a class A and discriminate it within its superclass B) fosters consistent taxonomisation of lower level classes. Because these properties are inherited, all subclasses at all sublevels can be immediately cross-checked for consistency with all superclasses at any higher level (this is a feature of the Protégé frames visualisation in the ‘properties view’, not the ‘logic view’). It is not enough to capture these properties in the definitions only, because the GUI tools don't pass them on to the leaf classes as they do for formally assigned properties. Explicitly formalised properties help to constrain the interpretation of their domain classes and all subclasses, which is exactly what is needed to provide the context for classification. These key properties help to keep track of the intended (otherwise implicit) context, all the way downstream to the leaf nodes.

Whether a classification is true or false can be decided e.g. for the following case: time_independent_study is_a ,...., is_a unfolding_through_time. If we had assigned a key-property has_timeline to the top level class “unfolding_through_time” (or process), we would immediately see this (inherited) property at the leaf node “time_independent_study” in the ‘properties view’ of the tab. Having this information visually accessible makes it easier to decide whether the classification is valid: seeing the has_timeline property associated with “time_independent_study” feels counterintuitive at first, and we might take a closer look at the classification or the definition. However, since a “time_independent_study” is not the same as a “study_without_timeline”, the classification is correct in this case.

Possible key-properties for a “process” class could be starts_at, has_object_participant, induced_through. Key-properties for the “object” top level class could be has_position, has_mass, ….
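The inheritance of key-properties down an is_a chain can be sketched as follows. This only illustrates the idea and does not show how Protégé stores frames. The data structures and the function name are hypothetical, and the intermediate classes elided in the text are left out, so the leaf is linked directly to the top level class.

```python
from typing import Dict, List, Optional

# is_a links: class -> direct superclass (None for a top level class).
# The intermediate levels elided in the text are not reproduced here.
IS_A: Dict[str, Optional[str]] = {
    "time_independent_study": "unfolding_through_time",
    "unfolding_through_time": None,
}

# Key-properties assigned explicitly, here only at the top level class.
KEY_PROPERTIES: Dict[str, List[str]] = {
    "unfolding_through_time": ["has_timeline"],
}


def inherited_key_properties(class_name: str) -> List[str]:
    """Collect key-properties from the class and all of its superclasses."""
    properties: List[str] = []
    current: Optional[str] = class_name
    while current is not None:
        # Properties of higher levels are listed first.
        properties = KEY_PROPERTIES.get(current, []) + properties
        current = IS_A[current]
    return properties


print(inherited_key_properties("time_independent_study"))
```

The leaf class shows has_timeline although the property was only assigned at the top level, which is the visual cue the ‘properties view’ provides for checking the classification.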




11 Naming of Ontology files and Ontology Versions

A file-naming convention helps to capture basic metadata in filenames and provides a simple versioning mechanism for files which our community members may upload into the file repositories. Any recommendations on this issue of course depend on the way files are stored and versioned, so the following can be disregarded when an updating and versioning mechanism, e.g. cvs or svn, is used. OWL can capture much of this data in its metadata sections.

When no automatic update and versioning system is used, ontology files and directories should be named according to the following syntax:

ShortDescriptiveFilename_Author_Version_Date.ext

where:

"ShortDescriptiveFilename" is a short descriptive filename that may contain upper and lower case text, numerals, "-" (dash) and "_" (underscore). Use upper camel case and underscore as separators. Space and other symbols are not allowed.

"Author" comprises the name of the author and/or the organization where the file is authored. Separate author and organization with a dash if both are featured. Again, space and other symbols are not allowed here.

"Version_Date" comprises the version number and/or the date the file is released. Start the version number with a "v"; use "-" instead of "." in the version numbering (like "v2-15" instead of "v2.15"). Separate version and date with an underscore, if both are featured. For the date reference, the more significant parts should come first, as this eases alphabetical sorting according to the date: use "yyyymmdd". Add an "a", "b", "c", ... suffix, if multiple versions may occur with the same date reference. Again, space and other symbols are not allowed here.

"." (a dot) follows; there should be only one dot in the entire filename, and it should be right before the file extension.

"ext" is the standard file extension by which this file can be associated with an appropriate application that will handle it. This is generally 2 to 4 lower case alphanumeric characters.

E.g.: NMR-Ontology_MSI-DS_v1-9_20060420.owl

A similar convention is practiced at W3C for their published work (e.g. note their page header information http://www.w3.org/TR/2004/REC-webont-req-20040210/ ).
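A filename following the syntax above can be assembled and checked programmatically. The sketch below is one possible reading of the convention. The function name, the parameter names and the exact regular expressions are assumptions. In particular, the version pattern accepts "v" followed by dash-separated numbers.

```python
import re
from datetime import date
from typing import Optional

DESCRIPTION = re.compile(r"[A-Za-z0-9_-]+")  # letters, numerals, "-" and "_"
AUTHOR = re.compile(r"[A-Za-z0-9-]+")        # author and/or organization, dash-separated
VERSION = re.compile(r"v\d+(-\d+)*")         # e.g. "v2-15", never "v2.15"
SUFFIX = re.compile(r"[a-z]?")               # optional "a", "b", "c", ... per day
EXTENSION = re.compile(r"[a-z0-9]{2,4}")     # 2 to 4 lower case alphanumerics


def build_filename(
    description: str,
    author: str,
    version: Optional[str] = None,
    released: Optional[date] = None,
    same_day_suffix: str = "",
    ext: str = "owl",
) -> str:
    """Assemble ShortDescriptiveFilename_Author_Version_Date.ext."""
    if version is None and released is None:
        raise ValueError("a version number, a release date, or both are required")
    checks = [(DESCRIPTION, description), (AUTHOR, author),
              (SUFFIX, same_day_suffix), (EXTENSION, ext)]
    if version is not None:
        checks.append((VERSION, version))
    for pattern, value in checks:
        if not pattern.fullmatch(value):
            raise ValueError(f"part does not follow the convention: {value!r}")
    parts = [description, author]
    if version is not None:
        parts.append(version)
    if released is not None:
        # Most significant part first (yyyymmdd) so names sort by date.
        parts.append(released.strftime("%Y%m%d") + same_day_suffix)
    return "_".join(parts) + "." + ext


print(build_filename("NMR-Ontology", "MSI-DS", "v1-9", date(2006, 4, 20)))
```

With the values from the example in the text, the call reproduces NMR-Ontology_MSI-DS_v1-9_20060420.owl. A version written as "v1.9" is rejected.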

Also use a convention when constructing URIs for ontology versions, and apply it

consistently. In the following example the date on which the ontology was frozen is used

to construct the URIs for the ontology versions, but the version could also be used:





Ontology URI: http://www.example.com/nmr-ontology

Ontology version URIs: http://www.example.com/nmr-ontology_20051004

http://www.example.com/nmr-ontology_20051126
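The date-based version URIs in this example can be derived from the ontology URI with a one-line helper. The function name is hypothetical, and joining with an underscore simply mirrors the example above.

```python
from datetime import date


def version_uri(ontology_uri: str, frozen: date) -> str:
    """Append the freeze date as yyyymmdd to the ontology URI."""
    return f"{ontology_uri}_{frozen:%Y%m%d}"


print(version_uri("http://www.example.com/nmr-ontology", date(2005, 10, 4)))
```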

Filenames may only contain alphanumeric characters, the period ("."), the dash ("-") and the underscore character ("_"). Spaces, parentheses, or other commonly used characters, such as "~", "&", or "#", will cause the file to be rejected.




12 References

[1] <<Metadata Annotations for Representational Units and Representational Artefacts>>: http://msi-ontology.sourceforge.net/Naming_Conventions_DRAFTv7.doc

[2] Establishing reporting standards for metabolomic and metabonomic studies: A call for participation. Fiehn O, Kristal B, van Ommen B, Sumner LW, Sansone SA, Taylor C, Hardy N, Kaddurah-Daouk R. OMICS 2006 Summer;10(2):158-63.

[3] http://msi-ontology.sourceforge.net/

[4] http://psidev.sourceforge.net/

[5] http://fugo.sourceforge.net

***** NOTE: This draft document is a work in progress *****

It will be expanded to also contain recommendations referring to RUs within representational artefacts formalized in richer semantics.

Comments and ideas are welcome and should be sent to:

[email protected]

