A Survey and Comparative Review -...

22
In this article, we develop a framework for compar- ing ontologies and place a number of the more prominent ontologies into it. We have selected 10 specific projects for this study, including general ontologies, domain-specific ones, and one knowl- edge representation system. The comparison framework includes general characteristics, such as the purpose of an ontology, its coverage (general or domain specific), its size, and the formalism used. It also includes the design process used in creating an ontology and the methods used to evaluate it. Characteristics that describe the content of an ontology include taxonomic organization, types of concept covered, top-level divisions, internal structure of concepts, representation of part-whole relations, and the presence and nature of addition- al axioms. Finally, we consider what experiments or applications have used the ontologies. Knowl- edge sharing and reuse will require a common framework to support interoperability of indepen- dently created ontologies. Our study shows there is great diversity in the way ontologies are designed and the way they represent the world. By identify- ing the similarities and differences among existing ontologies, we clarify the range of alternatives in creating a standard framework for ontology design. T he major goal of this article is to develop a framework for comparing various pro- jects in ontology design and put a num- ber of prominent ontologies in this framework. According to Webster’s dictionary (Woolf 1981), ontology is a particular theory about the nature of being or the kinds of existent. The task of intelligent systems in computer science is to formally represent these existents. A body of formally represented knowledge is based on conceptualization. Conceptualization consists of a set of objects, concepts, and other entities about which knowledge is being expressed and of relationships that hold among them. Every knowledge model is committed to some con- ceptualization, implicitly or explicitly. An explicit specification of this conceptualization is called an ontology (Gruber 1993). Formally, an ontology consists of terms, their definitions, and axioms relating them (Gruber 1993); terms are normally organized in a taxonomy. Most of the researchers in the area of ontol- ogy design agree that the current important goals of ontology research are to (1) make ontologies sharable by developing common formalisms and tools; (2) develop the content of ontologies (ontology design); and (3) com- pare, gather, translate, and compose different ontologies. Recent work in ontology design has produced a range of different projects, from ontologies that represent general world knowl- edge to domain-specific ontologies to knowl- edge representation systems that embody ontological frameworks. There is an agreement in the ontology engineering community that it would be beneficial to be able to integrate ontologies so that they can share and reuse each other’s knowledge. If one ontology, for example, has a well-developed theory of time, another ontology (say, the one representing biology experiments) could then use this theo- ry without having to reinvent it. There is also an understanding that achieving the interoper- ability of ontologies is a challenging task. For smooth integration to be (at least partially) possible, the first thing to do is to look at the ontology projects that already exist and are fairly well developed and consider the differ- ences and similarities in the way they treat some basic knowledge representation aspects. With this understanding, we can see where there is some common base and what the obstacles are to the integration of different ontologies. Identifying a framework for comparing ontologies and placing a number of the more prominent existing projects in this framework Articles FALL 1997 53 The State of the Art in Ontology Design A Survey and Comparative Review Natalya Fridman Noy and Carole D. Hafner Copyright © 1997, American Association for Artificial Intelligence. All rights reserved. 0738-4602-1997 / $2.00

Transcript of A Survey and Comparative Review -...

■ In this article, we develop a framework for compar-ing ontologies and place a number of the moreprominent ontologies into it. We have selected 10specific projects for this study, including generalontologies, domain-specific ones, and one knowl-edge representation system. The comparisonframework includes general characteristics, such asthe purpose of an ontology, its coverage (general ordomain specific), its size, and the formalism used.It also includes the design process used in creatingan ontology and the methods used to evaluate it.Characteristics that describe the content of anontology include taxonomic organization, types ofconcept covered, top-level divisions, internalstructure of concepts, representation of part-wholerelations, and the presence and nature of addition-al axioms. Finally, we consider what experimentsor applications have used the ontologies. Knowl-edge sharing and reuse will require a commonframework to support interoperability of indepen-dently created ontologies. Our study shows there isgreat diversity in the way ontologies are designedand the way they represent the world. By identify-ing the similarities and differences among existingontologies, we clarify the range of alternatives increating a standard framework for ontologydesign.

The major goal of this article is to developa framework for comparing various pro-jects in ontology design and put a num-

ber of prominent ontologies in this framework.According to Webster’s dictionary (Woolf

1981), ontology is a particular theory about thenature of being or the kinds of existent. Thetask of intelligent systems in computer scienceis to formally represent these existents. A bodyof formally represented knowledge is based onconceptualization. Conceptualization consists ofa set of objects, concepts, and other entitiesabout which knowledge is being expressed andof relationships that hold among them. Everyknowledge model is committed to some con-ceptualization, implicitly or explicitly. An

explicit specification of this conceptualizationis called an ontology (Gruber 1993). Formally,an ontology consists of terms, their definitions,and axioms relating them (Gruber 1993); termsare normally organized in a taxonomy.

Most of the researchers in the area of ontol-ogy design agree that the current importantgoals of ontology research are to (1) makeontologies sharable by developing commonformalisms and tools; (2) develop the contentof ontologies (ontology design); and (3) com-pare, gather, translate, and compose differentontologies. Recent work in ontology design hasproduced a range of different projects, fromontologies that represent general world knowl-edge to domain-specific ontologies to knowl-edge representation systems that embodyontological frameworks. There is an agreementin the ontology engineering community that itwould be beneficial to be able to integrateontologies so that they can share and reuseeach other’s knowledge. If one ontology, forexample, has a well-developed theory of time,another ontology (say, the one representingbiology experiments) could then use this theo-ry without having to reinvent it. There is alsoan understanding that achieving the interoper-ability of ontologies is a challenging task. Forsmooth integration to be (at least partially)possible, the first thing to do is to look at theontology projects that already exist and arefairly well developed and consider the differ-ences and similarities in the way they treatsome basic knowledge representation aspects.With this understanding, we can see wherethere is some common base and what theobstacles are to the integration of differentontologies.

Identifying a framework for comparingontologies and placing a number of the moreprominent existing projects in this framework

Articles

FALL 1997 53

The State of the Art in Ontology Design

A Survey and Comparative Review

Natalya Fridman Noy and Carole D. Hafner

Copyright © 1997, American Association for Artificial Intelligence. All rights reserved. 0738-4602-1997 / $2.00

study are formality (from highly informal torigorously formal), purpose (what the ontologyis used for), and subject matter (the nature ofthe domain that the ontology is characteriz-ing). We add a significant number of otherdimensions that consider the content ofontologies in more detail and assess 10 specificprojects based on these dimensions. A briefdescription of each project (with references) ispresented in Project Overviews.

Our criteria for selecting ontologies for thestudy were to (1) get a representative set of pro-jects, second, (2) use ontologies that are signif-icant in size and relatively well developed, and(3) use fairly well-documented ontologies (atleast documented well enough to be able toanswer most of the questions we are asking;not all the data were available for all the pro-jects in the study, however). The list of projectswe used is in table 1 (a more detailed descrip-tion of each project is in Project Overviews).

We compared these projects according to anumber of dimensions, which are summarizedin table 2. We begin by comparing generalattributes of the projects in General Character-istics: what an ontology was created for,whether it is a general or domain-specific one,how can it be integrated in a more generalontology, or how a more specific ontology canbe linked to it. In the comparison, we also

is the objective of the study presented here.After giving a brief introduction to a numberof ontology projects, we compare them withrespect to what they were created for; what thedesign process was; and how they treat certainfundamental issues in representing knowl-edge, such as taxonomies, properties, and rela-tions. We identify common themes and con-sider different approaches to these issues. Wetry to single out some major approaches ineach dimension, group the projects accordingto this categorization, and point out the onesthat do not fit in this categorization.

Another motivation for creating a frameworkfor comparing ontologies is the growing workin producing libraries of ontologies that can bereused. For a researcher to orient himself/herselfin an ontology library and determine whichontologies are good candidates for reuse, itwould be extremely useful to get a descriptionof ontologies in a library in a more or less stan-dard form. A framework for comparing ontolo-gies, such as the one represented here, is a firststep in defining a set of questions and possibleanswers that would allow an ontology to “intro-duce” itself to a potential user.

One of the few prior studies that comparesontologies is discussed in Uschold (1996) andUschold and Gruninger (1996). The ontologycomparison dimensions identified in this

Project Name Project DescriptionCYC A general ontology for commonsense knowledge to facilitate

reasoning

K. Dahlgren’s ontology A linguistically motivated ontology of commonsense knowledge

GENERALIZED UPPER MODEL A general task and domain-independent ontology that is designedto support sophisticated natural language processing in different languages

GENSIM A genetic simulation system that represents and models enzymaticallycatalyzed biomedical reactions

Knowledge interchange format (KIF) A language for defining ontologies that has declarative semantics and is based on first-order predicate calculus

PLINIUS Project An ontology for representing mechanical properties of ceramic materials

J. Sowa’s ontology An attempt to synthesize philosophical insights to create a general ontology

Toronto virtual enterprise (TOVE) Project An ontology for enterprise modeling that will be able to deduce answers to queries about the information in the model

Unified medical language system (UMLS) An ontology of medical concepts

WORDNET A manually constructed online reference system that is one of themost comprehensive lexical ontologies

Articles

54 AI MAGAZINE

Table 1. List of Projects Used in This Study.

include such technical data as ontology size,formalism that was used, any ontology imple-mentation, and its public accessibility. Wethen consider the process of designing andevaluating ontologies. There is active discus-sion in the ontology community about differ-ent approaches to these processes.

In comparing the content of ontologies, wediscuss three different levels: (1) an is-a taxon-omy of concepts, (2) the internal conceptstructure and relations between concepts, and(3) the presence or absence of explicit axioms.Taxonomy is the center part of most ontolo-gies. Taxonomy organization can vary greatly:All concepts can be in one large taxonomy, or

there can be a number of smaller hierarchies,or there can be no explicit taxonomy at all.Although all general-purpose ontologies try tocategorize the same world, they are very differ-ent at their top level. They also differ in theirtreatment of basic parts of an ontology: things,processes, and relations. In Taxonomy, wecompare top-level categories of the ontologies.The next level of comparison is internal con-cept structure (see Internal Concept Structureand Relations between Concepts). Internalstructure can be realized by properties androles. Concepts in some ontologies are atomicand might not have any properties or roles orany other internal structure associated with

Articles

FALL 1997 55

General The purpose the ontology was created forGeneral or domain specificDomain (if domain specific)Easy integration possible into a more general ontologySize: Number of concepts, rules, links, and so onFormalism usedImplementation platform and language, if donePublication, if done

Design process How was the ontology built?Was there a formal evaluation?

Taxonomy What is the general taxonomy organization?Are there several taxonomies, or is everything in the same one?What is in the ontology: things, processes, relations, properties?What is the treatment of time?What is the top-level division?How tangled or dense is the taxonomy?

Internal concept structure and Do concepts have internal structure?relations between concepts Are there properties and roles?

Are there other kinds of relation between concepts?How are part-whole relations represented?

Axioms Are there explicit axioms?How are the axioms expressed?

Inference mechanism How is reasoning done (if any)?What are some instances of going beyond first-order logic?

Applications Retrieval mechanismUser interfaceApplication in which the ontology was used

Contributions Major strengths and contributionsWeaknesses

Table 2. Summary of Ontology Comparison Characteristics Used in This Study.

Project OverviewsIn this section, we present a brief overview ofeach of the projects in the study. We describe thegeneral information about the project, such asits goal and scope, and some of the more impor-tant aspects of its ontology. We start with moregeneral ontologies, such as CYC and WORDNET,and then go on to domain-specific ones (such asUMLS and TOVE) and the knowledge representa-tion system (KIF). To avoid excessive repetition,we discuss some of the details of the projects lat-er in this article during the comparison.

CYC ProjectTwelve years ago, a comprehensive effort was

them. We look at how the studied projectstreat all these issues. We specifically study thetreatment of part-whole relations in theseontologies. The third level in the comparisonis the presence or absence of explicit axiomsand the associated inference mechanisms (ifany). We consider what the ontologies’ use offormal axioms is and whether they go beyondfirst-order logic.

An important test for any ontology is thepractical applications it was used for. These canbe applications in natural language processing,information retrieval, simulation and model-ing, and so on, that use knowledge representedin the ontology. Some applications of the pro-jects are discussed in Applications.

Articles

56 AI MAGAZINE

CompositeTangible&IntangibleObject TangibleStuff

TangibleObjectIntelligence

SomethingExisting

InternalMachineThing

Process

IntangibleStuff

IntangibleObjectStuff

AttributeValue

Slot

Attribute

Relationship

CollectionEvent

Intangible RepresentedThingIndividualObject

Thing

Figure 1. CYC: Top-Level Categories (adapted from Lenat and Guha [1990]).

begun to create a general ontology for com-monsense knowledge: the CYC Project (Lenat1995; Guha and Lenat 1994; Lenat 1990; Lenatand Guha 1990; also see www.cyc.com/cyc-2-1/cover.html). CYC contains more than 10,000concept types used in the rules and facts en-coded in the knowledge base. The upper levelof the CYC hierarchy is presented in figure 1. Atthe top of the hierarchy is the Thing concept,which does not have any properties of its own.The hierarchy under Thing is quite tangled.Not all the subcategories are exclusive. In gen-eral, Thing is partitioned in three ways:

First is RepresentedThing versus InternalMa-chineThing. Every CYC category must be aninstance of one and only one of these sets.InternalMachineThing is anything that is localto the platform CYC is running on (strings,numbers, and so on). RepresentedThing iseverything else. Sowa (1995b) criticizes thisproposition saying that CYC should not beexcluded from representing things in its ownmachine.

Second is IndividualObject versus Collec-tion, which is another total partition ofThings. Collections include all the categoriesmentioned in CYC. Hence, Collection doesn’thave mass and is imperceptible. Sowa arguesthat it is unclear where something such as aflock of birds would be (which is a collectionthat is clearly perceptible).

Third is Intangible versus TangibleObjectversus CompositeTangible&IntangibleObject.Every unit in CYC is an instance of exactly oneof these three categories. Intangible is any-thing that has no mass (set of all people, num-ber42, and so on), whereas TangibleObject isanything that does have mass and energy (arock, a person’s body). CompositeTangible&IntangibleObject is something that has both aphysical extent and an intangible extent. Forexample, a particular person has a body (phys-ical extent) and a mind (intangible extent).

It is interesting to note that Event and itssubclass, Process, are subclasses of Individu-alObject.

WORDNET

One of the most well-developed lexical ontolo-gies is WORDNET (Miller 1990; also see ftp://clar-ity.princeton.edu/pub/wordnet/). WORDNET is amanually constructed online lexical referencesystem. Lexical objects in WORDNET are orga-nized semantically (with the basic distinctionbetween nouns, verbs, adjectives, andadverbs). The central object in WORDNET is asynset, a set of synonyms. If a word has morethan one sense, it will appear in more than onesynset. There are 70,000 synsets. Synsets areorganized in a hierarchy by superclass-subclassrelationship (referred to as hypernymy-hyponymy). Part of the WORDNET hierarchy of

Articles

FALL 1997 57

{thing, entity}

{non-living thing, object}{living thing, organism}

{person, human being}

{natural object} {substance}

{food}{artifact}

{animal, fauna}{plant, flora}

Figure 2. WORDNET: Representation of Subclass Relation among Synsets Denoting Different Kinds of Tangible Things (Miller 1990).Braces enclose concepts in the same synset.

according to their meaning, with entailmentbeing the primary relationship between theverbs in a cluster. Most of these clusters corre-spond to semantic domains: verbs of bodilycare and functions, change, cognition, com-munication, competition, and so on. Theverbs, such as suffice, belong, and resemble thatdo not belong to any of the semantic domainsand refer to states, form a separate file.

GENERALIZED UPPER MODEL

The GENERALIZED UPPER MODEL (Bateman, Magni-ni, and Rinaldi 1994; also see www.darm-s t a d t . g m d . d e / p u b l i s h / k o m e t / g e n -um/newUM.html) is a general task anddomain-independent linguistically motivatedontology that supports sophisticated naturallanguage processing in English, German, andItalian. Its level of abstraction is in betweenlexical knowledge and conceptual knowledge.It claims to simplify the interface betweendomain-specific knowledge and general lin-guistic resources. The model proposes a

tangible things is presented in figure 2. Foreach concept (synset), there is a pointer tonouns representing its parts. For example,parts for the bird concept are beak and wings.There are allowances in the WORDNET imple-mentation for other types of pointer (forexample, from noun to verb to represent func-tions or to adjective to represent properties),but these types of pointer have not beenimplemented yet. WORDNET is a taxonomy; itdoes not have structured concepts or axioms.

Although WORDNET uses a simple hierarchyfor noun synsets, it employs a different organi-zation of synsets for verbs and adjectives.Descriptive adjectives are organized in bipolarclusters based on antonymy. For example, abipolar cluster is generated by dry and wetwith synonyms of each of the adjectives at thecorresponding side of the cluster. Relationaladjectives, such as fraternal in fraternal twins,are organized only in synsets with pointers tothe corresponding nouns.

Verbs in WORDNET are divided into 15 clusters

Articles

58 AI MAGAZINE

Projecting-Sequence

Expanding-Sequence

Sequence

Element

Process

Circumstance

Participant

Simple-Thing

Simple-Quality

Saying&Sensing

Being&HavingDoing&Happening

Configuration

Um-thing

Figure 3. Top-Level Hierarchy from the GENERALIZED UPPER MODEL.

domain organization and consists only of ataxonomy (with the assumption that naturallanguage–processing tools that will use it willencode the axiomatic information in the nat-ural language–processing code itself). There isan extensive hierarchy of concepts (about 250of them) as well as a separate hierarchy for rela-tionships (um-relation).

At the top of the concept hierarchy (partial-ly represented in figure 3) is the concept um-thing that represents the most general phe-nomena or situation. It is subdivided intothree major subtypes: (1) configuration, “a con-figuration of elements all participating in someactivity or state of affairs”; (2) element, a singleconceptual term; and (3) sequence, “a complexsituation where various activities or configura-tions are connected by some relation to form asequence” (see www.darmstadt.gmd.de/pub-lish/komet/gen-um/node9.html). Relations arethen used to link elements into configurationsand sequences. Most of the relations arebetween a process and its participants, man-ner, and so on. Thus, the category um-relationis subdivided into process in configuration, cir-cumstance in configuration, and participant inconfiguration. Causal relation, for example,would then be one of circumstances in config-uration relations, and attribute would be oneof participants in configuration relations.

Sowa’s OntologyJohn Sowa (1997, 1995a) states his fundamen-

tal principles for ontology design as “distinc-tions, combinations, and constraints” (1995a,p. 175). He uses philosophical motivation asthe basis for his categorization. There are threetop-level distinctions:

First is Physical versus Information, or Con-crete versus Abstract. This is a disjoint parti-tion of all the categories in the ontology.

Second is Firstness versus Secondness versusThirdness, or Form versus Role versus Media-tion. These categories are not mutually exclu-sive. For example, Woman is considered to bea form (Firstness) because it can be definedwithout considering anything outside a per-son. As a mother, a teacher, or an employee,the same individual would be an example of arole (Secondness). These roles represent anindividual in relation to another type (a chile,a student, an employer). Marriage is a media-tion (Thirdness) category because it relates sev-eral (in this case, two) types together.

Third, Continuant versus Occurrent, orObject versus Process. Continuants are objectsthat retain their identity over some period oftime; occurrents are processes “whose form is inthe state of flux” (Sowa 1995a, p. 179). Forexample, Avalanche is a process, and Glacier isan object. Note that this distinction dependson the time scale. On a grand time scale of cen-turies, Glacier is also a process.

These distinctions are combined to generatenew categories (figure 4). At a lower level, forexample, Script (for example, a computer pro-

Articles

FALL 1997 59

Abstract

ProcessObject

Concrete

InformationObject

InformationProcess

PhysicalProcess

PhysicalObject

Figure 4. Sowa’s Ontology.Part of the ontology based on the two distinctions: Physical versus Information and Continuant versus Occurrent.

Collective. These divisions are assumed to beessentially parallel. Thus, something is Indi-vidual and Real (Cow), and something is Col-lective and Real (Herd). Figure 5 shows parallelportions of the ontology from Dahlgren(1988). Note that not all the links in the ontol-ogy are of the same type (which makes it some-what confusing): Although most of the linksare is-a ones, some of the links used in the col-lective part of the hierarchy are not. For exam-ple, Herd is placed under Animal in the hierar-chy (clearly, a consists-of relation), but Animalis under Concrete (an is-a relation).

At the bottom of the hierarchy are so-calledterminal nodes that “do not have theoreticalimportance” (Dahlgren 1988, p. 52) and thatare not cross-classified. For example, Animal,Role, and Person are all terminal nodes. Then,specific nouns are attached to terminal nodes.Maybe because all the nouns are specifiedbased on their use in sentences, the classifica-tion of terminal nodes sometimes contradictsthe semantics of a word. For example, a termi-nal node Culture (with E.coli as one of thenouns attached to it) is classified as Individualand Nonselfmoving (along with Social andLiving). E.coli bacteria can move by them-selves if treated as individual bacteria. Howev-er, Culture is more than a single bacteria and,hence, is not Individual.

gram, a baking recipe) is a form that representssequences and is thus defined as Abstract,Form, Process. Also, History (an execution of acomputer program), which is a propositionthat describes a sequence of processes, is thenAbstract, Form, Object.

At the top level, there is a category for everypossible combination of distinctions. To avoidtoo many combinations, however, constraintsare used at the lower levels to rule out cate-gories that cannot exist. Logical constraints,for example, would rule out triangles with foursides, and empirical constraints would rule outtalking birds. Constraints are represented asaxioms and are inherited through the hierar-chy to lower levels.

Dahlgren’s OntologyKathleen Dahlgren (1988) presents a linguisti-cally motivated ontology. One of the mainpoints made by Dahlgren is that ontologyshould be linked closely to language, and herontology is based on extensive linguistic andpsycholinguistic research. The primary con-struction principle in the ontology is the cross-classification hierarchy, which means that ateach node of the taxonomy, branching mightoccur in several dimensions. For example, atthe top level, Entity cross-classifies as eitherAbstract or Real and as either Individual or

Articles

60 AI MAGAZINE

COW

Animal

Concrete

RealAbstract

Individual Entity

HERD

Animal

Concrete

RealAbstract

Collective Entity

Figure 5. Parallel Portions of the Dahlgren (1988) Ontology.

Unified Medical Language System(UMLS)The unified medical language system (UMLS) ofthe National Library of Medicine (Humphreysand Lindberg 1993; also see wwwkss.nlm.nih.gov/Docs/umls.fact.html) is an example of adomain-specific system. This system wasdesigned to facilitate retrieval and integrationfrom multiple machine-readable biomedicalinformation resources. The project’s goal wasto facilitate the link between different sourcesand users that use different terminology.

UMLS has a concept hierarchy of 135 medicalconcepts and a semantic network that repre-sents additional (nontaxonomic) relationshipsbetween categories. Its concept hierarchyincludes both Entities (Physical and Conceptu-al) and Events, such as Activity, Process, andInjury or Poisoning. Figure 6 shows the top-level hierarchy of Entities.

The relations represented in the systeminclude such semantic relations as physically_related_to, spatially_related_to, functionally_related_to, temporally_related_to, and concep-tually_related_to. There are 51 different rela-tions in all, and the semantic network of con-

cepts and all the possible relations that holdbetween them are extremely large. One of themajor gaps in this ontology, however, is that itdoes not consider the dynamics of processesand the dynamic versus static relationsbetween them.

Toronto Virtual Enterprise (TOVE)The Toronto Virtual Enterprise (TOVE) Project(Gruninger and Fox 1995; TOVE 1995; also seewww.ie.utoronto.ca/EIL/tove/ontoTOC.html)is an example of a domain-specific ontologytailored for a specific task: enterprise modeling.The goal of the project is to create enterprisemodels that can answer questions pertaining tothe information explicitly in the model andcan also deduce answers to queries. TOVE uses aformal approach to the ontology-engineeringprocess itself and the ontology evaluation.First, through interaction with their industrialpartners, the ontology engineers determine theproblems that arise in the actual enterprises. Inthis way, the questions that the ontologyshould be able to answer are formed; they arecalled competency questions. These questionsjustify the choice of concepts and relations inthe ontology. Then, the competency questions

Articles

FALL 1997 61

Idea or Concept

Finding

Organism Attribute

Intellectual Product

Organization

Language

Anatomical Structure

Manufactured Object

Substance

Organism

Conceptual EntityPhysical Object

Entity

Group

Group Attribute

Occupation or Discipline

Figure 6. UMLS: Top-Level Hierarchy of Entities.

such as Protein, represents not a single mole-cule but a homogeneous population of suchmolecules.

The process knowledge base describes thepotential behavior of objects in the system. Theauthor uses a qualitative process theory (Forbus1984) approach to represent processes, andeach process has a detailed frame that includesnot only such parameters as input and outputof reactions but also preconditions for a reac-tion to occur and effects of a reaction. Thisinformation is then used to simulate reactionsbased on the specified starting conditions.

PLINIUS

The PLINIUS Project (van der Vet, Speel, andMars 1994; van der Vet et al. 1994; van der Vetand Mars 1993) is aimed at semiautomaticknowledge extraction from text in the domainof material science, ceramic materials in partic-ular. The conceptualization of the chemicalcomposition of materials is at the center of thePLINIUS ontology. The ontology for processesand properties has not been published yet, sowe only discuss the ontology of materials. Theauthors of PLINIUS use an approach to ontologyorganization that is different from a traditionalhierarchical-axiomatic approach. They call itthe conceptual construction kit: The ontologystarts with sets of atomic concepts, such aschemical elements, real numbers, and aggrega-tion states (gaseous, liquid, and so on), servingas primitives. Elements of these sets are thencombined to define all other concepts. This

and all concepts and axioms are formalized infirst-order logic, and the completeness theo-rems for the competency questions are provedbased on this terminology.

As for the ontology itself, there is no singleontology as such but a set of several ontologiesfor various logical parts in enterprise modeling(that are then interlinked by relations). Theseontologies are for activities and states (includ-ing a representation of time), products, organi-zation, and activity-based cost management.Within each of these ontologies, a number ofsmall hierarchies (two or three levels deep) rep-resent small clusters of knowledge. Figure 7gives an example of such a cluster. Axioms andrelations are used to link knowledge from var-ious clusters.

GENSIM

GENSIM (Karp 1993) is a genetic simulation sys-tem that represents and models enzymaticallycatalyzed biochemical reactions whose sub-strates include macromolecules with complexinternal structures, such as DNA and RNA.There are two subontologies in GENSIM (calledknowledge bases): (1) the class knowledge baseand (2) the process knowledge base. The top-level hierarchy of the class knowledge base isrepresented in figure 8. It includes biochemicalobjects, such as genes, proteins, and biochem-ical binding sites. General classes of biochemi-cal objects are represented with frames. Framesare also used to represent instances of theseobjects in simulations. In GENSIM, each object,

Articles

62 AI MAGAZINE

Board of Directors Division

DepartmentContractor

Employee

Organization-GroupOrganization-Individual

Organization-Entity

Figure 7. TOVE: An Organization-Entity Hierarchy in the Organization Ontology. An Organization entity can be an Individual or a Group.

combination is defined by construction rulesthat govern the ways in which the combina-tions can occur. Construction rules define howcomplex substances and structures are con-structed from the atomic concepts. There arerules for groups, chemical substances, andphases (built of chemical substances in relativeproportions).

A taxonomy is then defined implicitly bysubsumption. The lattice in figure 9 illustratesthis principle. Suppose there are three primi-tive sets: (1) A, (2) B, and (3) C. Capital lettersrepresent an arbitrary element of the set. Smallindexed letters represent specific elements ofthe corresponding set (we used only one ele-ment of each set to simplify the figure): a1 is anelement of the set A, b1 is an element of B, andc1 is an element of C. Suppose also that there isa construction rule that combines elements ofthese three sets to form a complex object.Then, an object where all the components arespecific elements of the sets is at the bottom of

the hierarchy: It is the most specific one. If weallow one of the three components to be sub-stituted with an arbitrary element of the corre-sponding set, we get a more general category,of which our original category is a subclass. Ifwe allow two of the three components to bearbitrary elements of the corresponding sets,we get an even more general concept. Themost general category is, of course, the onewhere none of the components is fixed.

It is unclear how this approach can beextended beyond chemistry or how processesand properties would be represented.

Knowledge Interchange Format (KIF)The knowledge interchange format (KIF) is alanguage for defining ontologies (Geneserethand Fikes 1992). It provides for the definitionof objects, functions, and relations. KIF hasdeclarative semantics, and it is based on first-order predicate calculus. It provides for therepresentation of metaknowledge and allows

Articles

FALL 1997 63

Genes

DNA segmentsRNA segments

RNA Genes

Protein

Nucleic AcidSegments

E.coli

Bacteria

Experiments Active Sites

Media

Thing

Figure 8. GENSIM: Top-Level Ontology.In fact, the author does not use a general top-level concept Thing, and the five next-level concepts are

unrelated. We introduced the top-level concept Thing for representation uniformity with other projects.

General CharacteristicsIn this section, we discuss the general charac-teristics of an ontology. These characteristicsinclude what the project goals are and whetherthe ontology is general or domain specific aswell as some implementation details.

Project GoalsMany features of an ontology depend on thepurpose it was created for. Hence, one of thefirst comparison criteria was the purpose of aparticular project. The major goals of the pro-jects included natural language understand-ing, information retrieval, theoretical investi-gation, knowledge sharing and reuse,simulation, and modeling. Many ontologiesare built for various natural language appli-cations, ranging from knowledge acquisitionfrom text to semantic information retrieval.CYC, GENERALIZED UPPER MODEL, WORDNET,Dahlgren’s ontology, UMLS, and PLINIUS fall intothis category. Although CYC does not have anyparticular natural language application it isdesigned for, natural language processing was

for the representation of nonmonotonic rea-soning rules. We consider KIF an ontologybecause a certain view of the world is incorpo-rated in it. Although its representation is not,by far, at the level of detail of other ontologiesdescribed here, microtheories of numbers, sets,and lists, for example, are axiomatized in KIF.Some axioms of the microtheory of sets arepresented in figure 10.

As an interchange format, KIF is tedious touse for specifications of ontologies as such, butthere are systems built on top of it that allowan ontology to be specified in the familiarterms of classes, relations, and so on. An ex-ample of such a system is ONTOLINGUA (Gruber1992), which is a set of tools for analyzing andtranslating ontologies. The World Wide Website for the Knowledge Systems Laboratory atStanford University (www-ksl.svc.stanford.edu: 5915/), where this system was developed,contains a number of ontologies designed withONTOLINGUA and a user-friendly ontology editorfor creating your own ontologies, which alsoallows you to incorporate existing ontologiesfrom the database into your own ontology.

Articles

64 AI MAGAZINE

a1b1c1

a1Bc1Ab1c1 a1b1C

Ab1CABc1 a1BC

ABC

Figure 9. PLINIUS: An Illustration of Taxonomy Organization by Subsumption.Capital letters correspond to arbitrary elements of the sets; small indexed letters correspond to specific

elements of the corresponding sets (thus, a1 is an element of the set A, and so on).

one of the major motivations in the project torepresent commonsense knowledge. WORDNET

and UMLS are large reference systems that arebuilt to be used in other natural language–pro-cessing systems, not in any particular applica-tion. TOVE has intelligent information retrievalas its main purpose. This retrieval, however, isnot text retrieval. TOVE creates enterprise mod-els and then answers queries to these models.Ideally, it will not only answer queries withwhat is explicitly represented but also be ableto deduce answers.

Another class of ontologies is theoreticalinvestigations that do not directly pursue thegoal of building a working system. Sowa’sontology is an example of such a system. Sowaexplores philosophical foundations for build-ing knowledge models, examines the historyof ontologies starting from Aristotle, and sug-gests his own model based on these early stud-ies. We also considered KIF, which is a knowl-edge representation system because it alsoembodies an ontological framework: It pro-vides ways (and limitations) for representingknowledge. A system called ONTOLINGUA (Gru-ber 1992) is built on top of KIF and provides acommon ontology-definition language. Thissystem, of course, is a first step in knowledgesharing and reuse: If different ontologies are tobe shared, they should at least be translatableinto the same formalism. Ontologies that facil-itate knowledge sharing and reuse are at thecurrent center of research in the field. Amongthe projects in this study (besides KIF) GENERAL-

IZED UPPER MODEL also can claim knowledgesharing and reuse as one of its purposes as itattempts to share knowledge across differentlanguages (the ontology is multilingual). GEN-ERALIZED UPPER MODEL is also the only one of thenatural language process–supporting ontolo-gies that spans several different languages inaddition to English.

Simulation and modeling are another pur-pose for ontology-development projects. GEN-SIM, for example, develops a model of qualita-tive scientific knowledge about objects andprocesses in molecular biology and biochem-istry, such that this knowledge can be used ina qualitative simulation to predict experimen-tal outcomes.

General or Domain Specific?One of the basic characteristics of an ontologyis whether it attempts to cover general worldor commonsense knowledge or a specificdomain. In both cases, we should ask how itcan be integrated with other ontologies. For ageneral ontology, the issue is whether domain-specific ontologies can easily be attached to it.Conversely, it should be considered whether adomain-specific ontology can easily be inte-grated into a more general one and if it can useknowledge that is defined elsewhere, say, insome parts of a more general ontology.

Of the systems studied, CYC, Dahlgren’sontology, Sowa’s ontology, GENERALIZED UPPER

MODEL, WORDNET, and KIF are not domain specif-ic. They are targeted at creating a general world

Articles

FALL 1997 65

1. All entities in KIF are either individuals or sets. This distinction isexhaustive and mutually disjoint:(or (set ?x) (individual ?x))(or (not (set ?x)) (not (individual ?x)))

2. An object can be a member of another object if the latter is a set:(=> (member ?x ?s) (set ?x))

3. Extensionality property: two sets are identical if and only if theyhave the same members(=> (and (set ?s1) (set ?s2)) (<=> (forall (?x)

(<=> (member ?x ?s1) (member ?x ?s2))) (= ?s1 ?s2)))

Figure 10. KIF: A Partial Axiomatization of a Microtheory of Sets from Genesereth and Fikes (1992).

how it can go beyond chemical substances andbe integrated with a more general ontology.UMLS is the only one of the domain-specificontologies that might deal with these issues ofintegration (although not explicitly): UMLS

starts its categorization from general notionsof Entity and Event, so one can envision a gen-eral ontology where this ontology would fit in.

Implementation DetailsFor the purpose of reference, we summarizedsome other characteristics in table 3.1 Thesecharacteristics include what the project size is(both in terms of concepts and axioms), whatformalism was used (frames, first-order logic,conceptual graphs, semantic networks, and soon), and whether it was implemented or not(projects started with the purpose of theoreticalinvestigation might not be implemented; forothers, some might be implemented complete-

model. Their views on what this model is,however, differ, as we discuss in later sections.Of the remaining systems, TOVE is in thedomain of enterprise modeling, UMLS is anontology for medical concepts, PLINIUS dealswith material science (specifically, ceramicmaterials), and GENSIM deals with molecularbiology and biochemistry.

In terms of knowledge sharing and integra-tion with other ontologies, some generalontologies (for example, Dahlgren’s ontology)claim to simplify inclusion of new domains asan integral part of the original ontology orfacilitate the interface between a domainontology and the general ontology (GENERAL-IZED UPPER MODEL). Most domain ontologies(GENSIM, TOVE) do not touch the subject of inte-gration, and it is unclear how integration canbe done. PLINIUS can be extended to an ontol-ogy of chemical substances, but it is unclear

Articles

66 AI MAGAZINE

Table 3. Technical Data on the Ontologies in This Study.

Size Formalism Implemented? Published or Not?CYC 105 concept

types; 106

axioms

CYCL—CYC'srepresentation

language

Yes Partially online: 3000 concepttypes at the top level(www.cyc.com/cyc-2-1/cover.html )

Dahlgren's ontology 1500 nouns; 600verbs

Prolog predicates Yes Partially in print (Dahlgren1995, 1988)

Sowa's ontology 90 concepts andconcept types;40 conceptual

relations

Conceptualgraphs

No Partially in print (Sowa 1997)

GENERALIZED UPPERMODEL

250 concepts LOOM Yes Published online(www.darmstadt.gmd.de/publish/komet/genum/newUM.html)

WORDNET 95,600 wordforms in 70,100

synsets

Semanticnetworks

Yes Published online(ftp://clarity.princeton.edu/pub/wordnet/)

TOVE Frameknowledge base

Yes

UMLS 135 semantictypes; 51semanticrelations;252,982concepts

Semanticnetworks

Yes Published online(wwwkss.nlm.nih.gov/Docs/umls.fact.html)

GENSIM Frameknowledge base

Yes

PLINIUS About 150atomic concepts

and 6construction

rules

Frameknowledge base

Yes Published report but notontology

KIF N/A Is itself aformalism

Yes Yes

ly, or some might just have a proof-of-conceptprototype). The last but not least of these gen-eral questions is if the ontology is publishedand freely available: Some are easily accessiblein their entirety (with appropriate licensingagreements) and can be studied and reused.Some are proprietary information and are notpublished, and one can only study the papersthat describe it. In summary, UMLS, CYC, andWORDNET are the largest systems in terms of thenumber of concepts (on the order of 105); CYC

also has the largest number of axioms (106).

Ontology Design and Evaluation Process

There is an ongoing discussion in the ontologycommunity about the best process for buildingan ontology. Should it be built bottom up,starting from the most specific concepts andthen grouping them in categories? Should it bebuilt top down by identifying the most generalconcepts and creating categories at the mostgeneral level first? Should some middle layer ofconcepts serve as a starting point and then thedevelopment go in both directions fromthere—a middle-out approach (see Uscholdand Gruninger [1996]) for the argument forthe third alternative)?

More ontologies in the study used a bottom-up approach in constructing a hierarchy thantop down. Although almost no one statesexplicitly what approach they used, here iswhat we gathered from what was in the papers:The top-down approach is used in Sowa’sontology; the bottom-up approach is used in aWORDNET and PLINIUS; the middle-out approachis used by TOVE.

There is some research toward automaticallyacquiring ontological knowledge from naturallanguage texts, reducing manual effort. How-ever, none of the ontologies we studied usedany sort of automatic ontological knowledgeacquisition. All were constructed manually,with varying amounts of human effortinvolved depending on the size of the project:from the single-author ontologies of Sowa andKarp (GENSIM) to the multiperson 10-year-longCYC effort.

For GENERALIZED UPPER MODEL, which extend-ed an already existing fairly large ontology(PANGLOSS [Hovy and Knight 1993]) to work fordifferent languages, the design process was asfollows: First, there was a PENMAN UPPER MODEL

for English; then German was added, and theMERGED UPPER MODEL was created. The GENERAL-IZED UPPER MODEL is the extension of the MERGED

UPPER MODEL to cover the three languages: (1)English, (2) German, and (3) Italian. For each

subhierarchy of the MERGED UPPER MODEL, the setof relevant Italian linguistic behavior was iden-tified. The behavior was then compared toEnglish. If Italian and English-German behav-ior were compatible, no modification wasneeded; otherwise, the modification was pro-posed, and the English-German model wasreevaluated.

WORDNET, PLINIUS, and Dahlgren’s ontologyused a text corpus, or dictionary, as the basisfor their development process. The PLINIUS

approach, for example, was to use the corpusas an operational specification of the domain.The ontology was required to cover every rele-vant concept from the texts and make everyrelevant distinction. Similarly, Dahlgren’sschema was originally developed to handlepredicates, both nouns and verbs, found in4100 words of text drawn from geographytextbooks. Dahlgren also based her schema oncognitive psychology research. Psycholinguis-tic experiments with people were conducted(in particular, to determine properties andfunctions of categories). WORDNET based its cre-ation on lexicographer-created files of wordforms and word meanings, which were thenautomatically parsed into a database.

TOVE used the following approach to ontol-ogy design: First, motivating scenarios, storyproblems or examples that are not adequatelyexpressed in existing ontologies, were created.Any proposal for a new ontology or extensionto an ontology must describe a motivating sce-nario and a set of intended solutions to theproblem presented in the scenario. Second,informal competency questions, a set of queries(in an informal form), were formulated. Ideal-ly, for each new object, relation, and so on,there should be a competency question requir-ing it. These competency questions are used toevaluate the expressiveness of the ontology.

One of the more interesting research issuesin ontology design is the formal evaluation ofthe created ontology (or any evaluation, forthat matter) (Góméz Perez, Juristo, and Pazos1995).We considered whether there was anevaluation of the conceptual coverage or prac-tical usefulness of the ontology. A proof-of-con-cept prototype can serve as such an evaluation(at least, for practical usefulness), or there canbe a predetermined corpus and an assessmentlater if all the information in the corpus can becovered with the created ontology.

TOVE was the only project that did a formalevaluation of its ontology. This process con-sisted of representing the competency ques-tions formally and then proving completenesstheorems with respect to those queries basedon the first- and second-order logic representa-

Articles

FALL 1997 67

creates a subcategory for each possible combi-nation of these values (which can lead tocombinatorial explosion if more than one dis-tinction is used at more than just a few top lev-els). This requirement also makes the top levelof the hierarchy tangled. Dahlgren’s ontologyand GENERALIZED UPPER MODEL, however, havemore than one distinction at some lower levelsof the ontology, but they do not require a cat-egory to be created for every possible combina-tion of distinctions.

The third major approach to taxonomyorganization is having a large number of smalllocal taxonomies that might be linked togeth-er by relations or axioms. TOVE and KIF repre-sent this type of approach. TOVE, for example,divides its domain (enterprise modeling) into anumber of different subontologies (for exam-ple, ontologies for activity, product, time, andorganization and inside those for part, con-straint, requirement, feature). Even withinthese smaller ontologies in TOVE, no overalltaxonomies exist. Its taxonomies seem to belocal, each going very few levels deep.

Although WORDNET uses a simple hierarchyfor noun synsets, it employs a different organi-zation of synsets for verbs and adjectives.Descriptive adjectives, for example, are orga-nized in bipolar clusters (for example, dry-wet). Verbs are divided into 15 clusters accord-ing to their meaning, with entailment beingthe primary relationship between the verbs ina cluster.

A completely different way of defining andorganizing categories is used in the PLINIUS

ontology. Technically, there is no taxonomy assuch. The principle used to construct theontology is the conceptual construction kit. Inshort, an ontology consists of several sets ofatomic concepts, such as chemical elements,real numbers, and aggregation states (gaseous,liquid, and so on), serving as primitives andconstruction rules that define all otherconcepts. There are rules for groups, chemicalsubstances, phases (built of chemical sub-stances in relative proportions), and so on.Then, a taxonomy is defined implicitly by sub-sumption. Each atomic concept set X has apseudomember called arbitrary (X) that standsfor any member of the set. Now, for a conceptthat contains this term, any concept where theterm is replaced by a particular member of setX is a subconcept.

To summarize, CYC, Dahlgren’s ontology,Sowa’s ontology, GENERALIZED UPPER MODEL, UMLS,and GENSIM have all the concepts in one taxon-omy (a simple one or with several dimensionsat some levels). TOVE and KIF have a number ofsmall local taxonomies. PLINIUS does not have

tion of concepts, attributes, and relations. ForGENSIM, which was designed for simulations,the evaluation consisted of having the pro-gram predict outcomes of already-known reac-tions. According to the author, the predictionswere flawless. Most projects, however, envisionvarious applications that would use the ontol-ogy and, thus, prove its conceptual coverageand practical usefulness (see the discussion ofthe various applications that ontologies wereused for).

TaxonomyFormally, an ontology consists of terms, theirdefinitions, and axioms relating them (Gruber1993); terms are typically organized in a taxon-omy. This is where some disagreement amongontology researchers arises. Some say thataxioms are central to ontology design, and acomplete, or high-level, taxonomy does noteven have to exist (maybe only for visualiza-tion). Others say that one should first concen-trate on defining a taxonomy of fundamentalconcepts (although they agree that thereshould be axioms or knowledge in some otherform associated with the concepts in the tax-onomy). However, most of the ontologies westudied do have some sort of taxonomy (orseveral taxonomies), which is our first topic incomparing the content of the ontologies.

Based on the previous paragraph, the firstquestion to ask is whether there is an explicittaxonomy of concepts? Then, how are con-cepts organized? Is it just a simple concepthierarchy or a more complex taxonomy withseveral dimensions at each level? Is it a num-ber of small local taxonomies or somethingcompletely different?

We found three major approaches to con-cept organization (all of them have some sortof taxonomy; we discuss the PLINIUS taxonomy-less approach later). UMLS, GENSIM, and WORDNET

(for noun synsets only) adopt the approach ofhaving everything in a single tree-like concepthierarchy with multiple inheritance. The linksin the hierarchy are is-a links, and the divisionof a concept into subconcepts is disjoint.

CYC, GENERALIZED UPPER MODEL, and Dahlgren’sand Sowa’s ontologies use what Sowa calls thedistinctions approach. Thus, there are severalparallel dimensions along which one or moretop-level categories are subcategorized, forexample, Real versus Abstract and Individualversus Collective. In this case, categories arespecified by various combinations of valuesalong these dimensions. For example, Herdcan be categorized as Real and Collective,whereas Idea is Abstract and Individual. Sowa

Articles

68 AI MAGAZINE

any explicit taxonomy at all. WORDNET has asingle taxonomy for its nouns but a differentorganization for verbs and adjectives.

Treatment of Specific CategoriesAlthough the projects that we studied werecreated for different purposes, there are a num-ber of general classes of concept that are repre-sented in almost all ontologies: things,processes and events, relations, and properties.This subsection discusses which of these class-es are represented (or underrepresented) ineach ontology.

Things (real or abstract) are representedeverywhere. PLINIUS, GENSIM, TOVE, and UMLS donot attempt to represent all the Things in theuniverse, but those relevant to the domain are,of course, present in the ontology.

Processes and events are almost as ubiqui-tous as Things. CYC, for example, definesProcess as a subclass of Event and Stuff, whichare both subclasses of IndividualObject. AnIndividualObject that has a temporal extent(starting time, duration, ending time) is calleda Process. As mentioned above, WORDNET treatsverbs (which basically correspond to processesand events) separately from nouns and adjec-tives. GENERALIZED UPPER MODEL has a separatetaxonomy for what they call Configurations,which is the ontology of processes. GENSIM hasa limited number of experimental processesand reactions (which are also processes)defined.

Sowa (1977) points out that the distinctionbetween something called an Object andsomething called a Process in fact depends ontime scale. In his ontology, anything that doesnot change over time (on a particular timescale) is called an Object (Continuant), andanything that is in a state of flux is called aProcess (Occurrent). Consider a tree, for exam-ple. A tree is a permanent Object on a scale ofminutes. On a scale of years, however, a tree isa Process.

Much less universal than Things andProcesses is the presence of some sort of taxon-omy of relations or properties (the presence ofwhich creates a need for higher-order logic; wediscuss this issue in Internal Concept Structureand Relations between Concepts). GENERALIZED

UPPER MODEL has probably the most extensivetaxonomy of relations. Many properties inGENERALIZED UPPER MODEL (for example, Color-Property-Ascription) are defined as concepts inthe taxonomy with two (or more) roles for theconcepts that are related by it. UMLS has a two-level deep taxonomy for relations, andDahlgren’s ontology has a list (not a taxono-my) of all relations as part of the ontology.

Other general categories that are present inonly one or a few of the studied projects andworthy of mention are things internal to themachine (CYC), classification of spatiotemporalrelations in GENERALIZED UPPER MODEL, axiomati-zation of sets and lists (KIF), and locations (GEN-SIM) (active sites on DNA are a separate catego-ry in the taxonomy).

We considered the presence and treatmentof one specific microtheory—the ontology oftime. Although for almost any kind of reason-ing one needs some representation of time, wenoticed that not all ontologies model temporalconcepts (and, hence, do not support any tem-poral reasoning). GENERALIZED UPPER MODEL andTOVE have simple ones that axiomatize timepoints and time periods. In CYC, Time is a phys-ical quantity possessed by TemporalObjects(such as Events). TimeInterval, which is a first-class object, is a TemporalObject that can becharacterized fully just by specifying its tempo-ral attributes. TimeInterval has dates, years,and so on, as its subcategories. Other ontolo-gies do not touch this issue at all. GENSIM, forexample, justifies not using any ontology oftime by making an assumption that everyexperiment happens in a short period of time.In the case of a smaller ontology being inte-grated into a larger ontology, there is a possi-bility of reusing an ontology of time present ina larger model. This approach, however,requires a smooth integration.

Top-Level DivisionAmong the most interesting questions pertain-ing to ontology organization are, What are themajor top-level categories in the ontology?How does the ontology divide the world at thetop level?

The most ubiquitous top-level division ofconcepts is Abstract versus Real division. It ispresent in Dahlgren’s and Sowa’s ontologies asthe top-level distinction (termed Physical ver-sus Informational in the latter). The Tangibleversus Intangible division in CYC also reflectsthis distinction. CYC, however, has a third cat-egory at the same level—CompositeTangible&IntangibleObject—to denote something thathas both a physical extent and an intangibleextent. Person category can be such an exam-ple: A person’s body constitutes the physicalextent, and a person’s mind is the intangibleextent. In UMLS, Entities are divided into Phys-ical Objects and Conceptual Entities, that is,also along the Physical versus Abstract lines.

Another frequently found top-level catego-rization is Individual versus Collection. Bothof these are top-level distinctions in Dahlgren’sand Sowa’s ontologies. This distinction is also

There are anumber ofgeneral classes of concept that are represented in almost all ontologies:things,processes and events,relations, and properties.

Articles

FALL 1997 69

hierarchies such as CYC to much more densebut less tangled hierarchies such as WORDNET.

Internal Concept Structure andRelations between Concepts

Almost any ontology has something morethan just a taxonomy of concepts. The least itcan have is a set of properties and componentsthat are meaningful for each category. This lev-el is the internal concept structure. Other rela-tions among concepts (spatial, functional, andso on) can also be represented. CYC, Dahlgren’sontology, Sowa’s ontology, GENSIM, GENERALIZED

UPPER MODEL, TOVE, and KIF have properties androles associated with concepts (often in theform of slots in a frame) and relations that linkconcepts to each other. Objects in PLINIUS arestructured, too, but differently from the frame-based systems. The structure of its objects isdefined by a set of construction rules that spec-ify the internal composition of concepts.

WORDNET and UMLS do not have any proper-ties or roles associated with objects in their tax-onomy: All the concepts are atomic and do nothave any internal structure. They can be relatedto other concepts though, and the predefinedrelations themselves do have a limited struc-ture (unlike concepts). The fact that they are all

pronounced at all levels in CYC, and Lenat andGuha (1990) devote a lot of discussion to thisissue.

The lack of correspondence between ontolo-gies in their top-level division poses an obsta-cle to the integration of different ontologies.Campbell and Shapiro (1995) discuss the ideaof a mediation interface, which will translatestatements made in one ontology to anotherontology. The authors compare top levels of anumber of ontologies to determine how simi-lar or different they are and, hence, how feasi-ble it would be to integrate them. Two of thecriteria they use is how tangled and how sparseor dense the top-level hierarchy is. For exam-ple, a simple treelike structure with little or nomultiple inheritance would not be consideredtangled, whereas hierarchies that employ thedistinction approach would have a highly tan-gled structure. Also, the more subcategoriesthat exist at the top level of categorization, themore dense this top level is. It is, of course, eas-ier to integrate ontologies that are more similarin the way they organize their top-level hierar-chies (then, there is, of course, the issue of thetop-level categories themselves being alike).Figure 11 illustrates the tangledness and densi-ty scales of the projects that we studied. Theyvary from relatively sparse but very tangled

Articles

70 AI MAGAZINE

Cyc, Plinius,Dahlgren's ontology

Sowa's ontologyGeneralized-UM

Wordnet, TOVEUMLS, GENSIM

less tangled fairly tangled very tangled

WordNetPlinius

Generalized-UMUMLS

sparse more dense very dense

Cyc, Dahlgren's ontologySowa's ontology, TOVE,

GENSIM

Figure 11. Comparison of How Tangled and Dense the Ontologies in the Study Are.

binary already provides some internal structureto them. In UMLS, relations (which are first-classobjects and have their own taxonomy) alsohave their possible domain categories specified.

When studying relations between categoriesin the ontologies, we found one relation repre-sented very differently in the ontologies andoften was not adequately dealt with. This is theissue of part-whole relations. There are severaltypes of part-whole relation that can requiredifferent reasoning. For example, Winston,Chaffin, and Hermann (1987) differentiate thefollowing types of part-whole relation: compo-nent-object (branch-tree), member-collection(tree-forest), portion-mass (slice-cake), stuff-object (aluminum-airplane), feature-activity(paying-shopping), place-area (Princeton–NJ),and phase-process (adolescence–growing up).We were looking for different categorizations ofpart-whole relations and treatment for them.

Most of the ontologies do not directlyaddress the issue of part-whole relation and thedistinction between subset-of, part-of, mem-ber-of, and so on. Part-whole relations are han-dled just like other roles or relations. However,GENERALIZED UPPER MODEL, Sowa’s ontology, andTOVE provide some analysis of part-whole rela-tions. Here is how each of the systems does it.

GENERALIZED UPPER MODEL: A part-whole rela-tion is a concept in the taxonomy of relations.It is a relation with two roles: whole (thedomain) and part (the range). It is a specializa-tion of a generalized-possession relation. Thereare three possible subtypes that are describedfor part-whole, although they are not currentlydistinguished within the grammar:

First, consists-of is expressed as

<whole> consist of <parts> or <parts>make up <whole> .

The filler of the part role of the consists-of rela-tion must be all the parts of the whole. Forexample, a protein consists of amino acids, buta car does not consist of an engine.

Second, constituency is a specialization of thepart-whole relation in which the whole is val-ue restricted to be a decomposable object. Forexample, an engine is a constituent of a car.

Third, ingrediency is the relation between awhole and its parts when the whole is a mass-object. For example, gravel is an ingredient ofconcrete.

Sowa’s ontology: There is a category in thetaxonomy (InternalRole) for things that play arole with respect to something in which theyare contained (as opposed to ExternalRole forthe things that play a role with respect tosomething outside themselves). InternalRole issubdivided into several categories: When aContinuant versus Occurrent distinction is

applied to InternalRole, it produces ObjectPartand ProcessPart. Object parts that can existindependently of the object are called pieces(for example, engine of a car); those that can-not are called attributes (for example, size,weight, color of a car). For ProcessPart, thesame distinction leads to participant (for exam-ple, a book and a reader in a reading process)and manner (for example, speed of the wind,style of a dance).

TOVE: A part is defined as a component of anartifact being designed or a software compo-nent. The artifact itself is also considered apart. Parts are classified into Primitive Part (apart that cannot be further subdivided intocomponents), Composite Conjunct Part (com-posed of two or more primitive or compositeparts), and Composite Disjunct Part (representalternatives of parts; that is, at any point intime, the part has only one of its componentsas a valid component).

Of these three categorizations, Sowa’s is themost general. All the types of part-whole-rela-tion in GENERALIZED UPPER MODEL are what Sowacalls pieces (which GENERALIZED UPPER MODEL goeson to classify). Attribute and process partici-pants and manner are not considered parts inTOVE and GENERALIZED UPPER MODEL (note thatalthough a part-of relation in GENSIM is gener-alized to include processes, part in this case isa subprocess of a process). TOVE’s approach topart-of relation is different because it reflectsthe way an artifact is composed from its parts.

WORDNET and PLINIUS also single out the part-of relation. In fact, this relation between con-cepts (beyond taxonomy) is the primary one inthe two systems: This relation is the only onecurrently implemented between noun synsetsin WORDNET, and it is the only relation betweenclasses in PLINIUS, along with its counterpartCOMPOSES. In UMLS, the following relations areincluded in the relation taxonomy within thecategory physical-relation: part-of, contains,and consists-of.

Axioms and First-Order LogicBesides the taxonomy and structure of con-cepts, axioms are a way of representing moreinformation about categories and their rela-tions to each other as well as constraints onproperty and role values for each category.Sometimes, axioms are explicitly specified, andsometimes ontology consists only of categoriesand corresponding frames, and everything elseis hidden in application code. It is importantto note here that there is a fine line betweeninternal concept structure and axioms. Onecan represent a category using a frame formal-

… axioms are a way of representingmore informationabout categories and their relations toeach other as well asconstraints on propertyand role values foreach category.

Articles

FALL 1997 71

superclass), which leads to nonmonotonicitytoo. Sowa uses conceptual graphs, which them-selves use higher-order logic.

Here are some cases where CYC goes beyondfirst-order logic (Lenat 1995):

Certainty: Each assertion is assumed true bydefault, but one can make statements such as“assertion A is less likely than assertion B.”

Reification: A predicate or function isturned into an object in the language. It allowsassertions about categories: “Property P is anopposite of property Q.”

Modals: “John wants assertion A to be true.”Contexts: An assertion can be true only in a

particular context. Contexts are first-classobjects in CYC; for example, “You cannot seesomeone’s heart” (true only as a default butnot true during heart surgery).

ApplicationsAn important way of evaluating the capabili-ties and practical usefulness of an ontology isconsidering what practical problems it wasapplied to. In this section, we summarize someof the applications that the ontologies in thisstudy were used for.

The major classes of applications thatontologies are used for are natural languageprocessing, information retrieval, and simula-tion and modeling. CYC’s ontology, for exam-ple, is used in a CYC natural language system(CNL) whose purpose is to translate natural lan-guage texts into CYCL. GENERALIZED UPPER MODEL

is used in a multilingual text-generation sys-tem that uses stock phrases for each concept togenerate text. Dahlgren’s ontology was a basisfor a text-understanding system that readsnewspaper articles to produce a cognitive mod-el of the text content (Dahlgren 1990). PLINIUS

also falls into the class of natural language sys-tems: It is designed for extracting knowledgefrom titles and abstracts of articles in its

ism, having roles and properties representedby slots of a frame. One can also express thesame facts using axioms. A taxonomy, too, canbe represented using axiomatic notations. Forexample, the axiom where a is an instance of acategory, and A and B are categories states thatA is a subcategory of B. Here, we are only look-ing at explicit axioms that go beyond the hier-archy representation or internal concept struc-ture. We consider how axioms are expressedand if they are, for example, part of a conceptdefinition or can exist by themselves.

The following projects explicitly use axiomsthat go beyond taxonomic and property-roleinformation: CYC, TOVE, Sowa’s ontology, GEN-SIM, and KIF. GENERLAIZED UPPER MODEL has all theaxiomatic information incorporated in thenatural language–processing code. Dahlgren’sontology and GENSIM incorporate axioms inconcept definitions. WORDNET and UMLS do nothave any axioms.

When discussing axioms and formalism,one of the questions that we pay particularattention to is what the instances are of goingbeyond first-order logic. Some systems usedefaults, or ways of expressing modals anduncertain facts. Some do not do and stay with-in the boundaries of first-order logic.

One of the most common instances of goingbeyond first-order logic is having some sort ofhierarchy-of relations, that is, treating relationsas first-class objects: GENERALIZED UPPER MODEL

and TOVE are examples of this approach. UMLS

also has hierarchy-of relations, but because itdoes not have axioms, the first-order–logicissue is irrelevant for it. Another commonexample of going beyond first-order logic is theuse of defaults, for example, CYC and KIF. For KIF,this is the only instance of going beyond firstorder because it is based primarily on first-orderlogic. GENSIM uses override inheritance in itsprocess hierarchy (property values in a subclasscan override the corresponding values in a

Strengths and Contributions OntologiesContent creation CYC, UMLS, GENERALIZED UPPER MODEL, WORDNET

Well-defined formalism creation KIF

Approach based on linguistic and psycholinguistic data Dahlgren’s ontologyThoroughly motivated top level Sowa’s ontologyMultilingual GENERALIZED UPPER MODEL

Extended hierarchy of relations GENERALIZED UPPER MODEL

Novel approach to ontology design PLINIUS (conceptual construction kit)Comparison of different knowledge representation formalisms PLINIUS

Methodology for formal evaluation TOVE

Use of ontology for simulation GENSIM

Articles

72 AI MAGAZINE

Table 4. Summary of Major Strengths and Contributions of the Ontologies in the Study.

domain and creates an interim knowledge basewhere the knowledge is associated with a par-ticular abstract. CYC was used in informationretrieval for a system called CYCCESS, which is asemantic information-retrieval system used forconsistency checking and information retriev-al from structured information such as data-bases and spreadsheets.

CYC, GENSIM, and TOVE were used in simula-tion and modeling applications. There is a per-son-modeling prototype application that usesCYC’s ontology to put together a model of a per-son based on pieces of information it mighthave about a person’s interests, family, job, andso on. This information is then used to sort, forexample, advertisements that should or shouldnot be sent to a person based on the model.GENSIM was primarily created and used for sim-ulation of metabolic pathways, DNA transcrip-tion, and so on. The primary goal of TOVE is tomodel a virtual company and provide a test bedfor research into enterprise integration.

From other classes of applications, UMLS wasused to implement the internet grateful medinterface to MEDLAB databases (developed bythe National Library of Medicine). KIF served asthe basis for ONTOLINGUA, which translates def-initions written in standard form into special-ized representations, including frame-basedsystems as well as relational languages.

ConclusionTo conclude, we considered major strengthsand contributions of each project, such as con-tent creation, well-defined formalism creation,and some novel approach to ontology design.We also outlined weaknesses of particular pro-jects. Major strengths and contributions of theprojects in the study are summarized in table 4.

Many researchers in the area agree that oneof the major challenges in the area of ontologydesign is creating the content of ontologies,that is, creating large, well-developed, usableontologies (either general or domain specific).CYC, WORDNET, and UMLS are major steps in thisdirection. Few projects span more than one lan-guage and can be applied to natural languagetexts in various languages. GENERALIZED UPPER

MODEL is one of the multilingual projects (it isbased on English, German, and Italian). One ofthe interesting things that was done as part ofthe PLINIUS Project was implementing the ontol-ogy in several different knowledge representa-tion formalisms (Speel 1995). This was a sub-stantial step in showing experimentally theindependence of the ontology itself from theformalism that is used. The TOVE Project made asignificant step in an underdeveloped area of

ontology research: formal evaluation. As for the integration of various ontologies,

this study showed that at this point, there isgreat diversity in the way ontologies aredesigned and in the way they represent theworld. Before real knowledge sharing and reusewill be practical, some standards shouldemerge in what an ontology should consist of,what the basic classes of object are that shouldbe represented (for example, things, processes,relations), and how they are represented (notin terms of formalism but in terms of knowl-edge that should accompany the concepts).

We believe that a study such as the one pre-sented here is a useful step in the process ofdeveloping these standards because before wetry to standardize, we first need to understandthe alternatives.

AcknowledgmentsThis research is supported in part by theNational Science Foundation under grants IRI-9117030 and IRI-9633661.

Note1. Not all the data on these characteristics are available.

ReferencesBateman, J. A.; Magnini, B.; and Rinaldi, F. 1994.The Generalized Italian, German, English UpperModel. Paper presented at the Eleventh EuropeanConference on Artificial Intelligence (ECAI’94)Workshop on Comparison of Implemented Ontolo-gies, 8–12 August, Amsterdam, The Netherlands.

Campbell, A., and Shapiro, S. 1995. Ontologic Medi-ation: An Overview. In Proceedings of the Workshopon Basic Ontological Issues in Knowledge Sharing,16–25. Ottawa: University of Ottawa.

Dahlgren, K. 1995. A Linguistic Ontology. Interna-tional Journal on Human-Computer Studies 43(5–6):809–818.

Dahlgren, K. 1990. Naive Semantics and Robust Nat-ural Language Processing. In Proceedings of the Sym-posium on Text-Based Intelligent Systems. Presentedat the AAAI Spring Symposium Series, 27–29 March,Stanford University.

Dahlgren, K. 1988. Naive Semantics for Natural Lan-guage Understanding. Boston, Mass.: Kluwer Academic.

Forbus, K. D. 1984. Qualitative Process Theory. Arti-ficial Intelligence 24:85–168.

Genesereth, M. R., and Fikes, R. E. 1992. KnowledgeInterchange Format, Version 0.3, Reference Manual,Knowledge Systems Laboratory, Stanford University.

Gómez-Pérez, A.; Juristo, N.; and Pazos, J. 1995. Eval-uation and Assessment of the Knowledge-SharingTechnology. In Toward Very Large Knowledge Bases,ed. N. J. I. Mars, 289–296. Amsterdam, The Nether-lands.

Before realknowledgesharing and reuse will be practical,some standardsshould emerge inwhat anontologyshould consist of,what thebasic classesof object are thatshould be represented…, and how they are represented

Articles

FALL 1997 73

Uschold, M., and Gruninger, M. 1996. Ontologies:Principles, Methods, and Applications. KnowledgeEngineering Review 11(2): 93–155.

van der Vet, P. E., and Mars, N. J. I. 1993. StructuredSystem of Concepts for Storing, Retrieving, andManipulating Chemical Information. Journal ofChemical Information and Computer Sciences33:564–568.

van der Vet, P. E.; Speel, P.-H.; and Mars, N. J. I. 1994.The PLINIUS Ontology of Ceramic Materials. Paperpresented at the Eleventh European Conference onArtificial Intelligence (ECAI’94) Workshop on Com-parison of Implemented Ontologies, 8–12 August,Amsterdam, The Netherlands.

van der Vet, P. E.; Jong, H. D.; Mars, N. J. I.; Speel, P.-H.; and Stal, W. G. T. 1994. PLINIUS IntermediateReport, University of Twente.

Winston, M. E.; Chaffin, R.; and Hermann, D. J.1987. A Taxonomy of Part-Whole Relations. Cogni-tive Science 11:417–444.

Woolf, H. B., ed. 1981. Webster’s New Collegiate Dic-tionary. Springfield, Mass.: G. & C. Merriam.

Natalya Fridman Noy is a Ph.D.student in the College of Comput-er Science at Northeastern Univer-sity. She received her B.S. inapplied mathematics fromMoscow State University, Russia.She received her M.A. in computerscience from Boston University.Her doctoral dissertation is on

developing an ontology for intelligent informationretrieval in experimental sciences. Her researchinterests include ontology design, semantic informa-tion retrieval, and natural language processing. Here-mail address is [email protected].

Carole D. Hafner is an associateprofessor of computer science atNortheastern University. Shereceived her Ph.D. in computerand communication sciences fromthe University of Michigan andwas a senior research scientist atthe General Motors Research Lab-

oratories before joining the Northeastern faculty.Her research interests include knowledge representa-tion, information retrieval, and natural languageprocessing. She is coeditor of the journal ArtificialIntelligence and Law and secretary-treasurer of theInternational Association for Artificial Intelligenceand Law. Her e-mail address is [email protected].

Gruber, T. R. 1993. Toward Principles for the Designof Ontologies Used for Knowledge Sharing, KSL-93-04, Knowledge Systems Laboratory, Stanford Univer-sity.

Gruber, T. R. 1992. ONTOLINGUA: A Mechanism to Sup-port Portable Ontologies, KSL-91-66, Knowledge Sys-tems Laboratory, Stanford University.

Gruninger, M., and Fox, M. S. 1995. Methodologyfor the Design and Evaluation of Ontologies. Paperpresented at the Workshop on Basic OntologicalIssues in Knowledge Sharing, 19–20 August, Montre-al, Quebec, Canada.

Guha, R. V., and Lenat, D. B. 1994. Enabling Agentsto Work Together. Communications of the ACM 37(7):127–142.

Hovy, E., and Knight, K. 1993. Motivation-SharedKnowledge Resources: An Example from the PAN-GLOSS Collaboration. Paper presented at the Work-shop on Knowledge Sharing and InformationInterchange, 28 August–3 September, Chamberry,France.

Humphreys, B. L., and Lindberg, D. A. B. 1993. TheUMLS Project: Making the Conceptual Connectionbetween Users and the Information They Need. Bul-letin of the Medical Library Association 81(2): 170.

Karp, P. D. 1993. A Qualitative Biochemistry and ItsApplication to the Regulation of the TrytophanOperon. In Artificial Intelligence and Molecular Biology,ed. L. Hunter, 289–325. Menlo Park, Calif.: AAAIPress.

Lenat, D. B. 1995. CYC: A Large-Scale Investment inKnowledge Infrastructure. Communications of theACM 38(11): 33–38.

Lenat, D. B. 1990. CYC: Toward Programs with Com-mon Sense. Communications of the ACM 33(8): 30–49.

Lenat, D. B., and Guha, R. V. 1990. Building LargeKnowledge-Based Systems: Representation and Infer-ence in the CYC Project. Reading, Mass.: Addison-Wesley.

Miller, G. A. 1990. WORDNET: An On-Line LexicalDatabase. International Journal of Lexicography 3–4:235–312.

Sowa, J. F. 1997. Knowledge Representation: Logical,Philosophical, and Computational Foundations.Boston, Mass.: PWS Publishing. Forthcoming.

Sowa, J. F. 1995a. Distinctions, Combinations, andConstraints. In Proceedings of the Workshop onBasic Ontological Issues in Knowledge Sharing,19–20 August, Montreal, Canada.

Sowa, J. F. 1995b. Top-Level Ontological Categories.International Journal on Human-Computer Studies43(5–6): 669–685.

Speel, P.-H. 1995. Selecting Knowledge Representa-tion Systems. Ph.D. diss., University of Twente,Enschede, The Netherlands.

TOVE. 1995. TOVE Manual: Department of IndustrialEngineering, University of Toronto. Available atwww.ie.utoronto.ca/EIL/tove/ontoTOC.html.

Uschold, M., ed. 1996. EUROKNOWLEDGE Glossary:ESPRIT Project 9806. Available fromftp://ftp.aiai.ed.ac.uk/pub/projects/euroknow/delvr-bls/glossary.ps.

Articles

74 AI MAGAZINE