Terminology Metadata

Terminology Metadata Extension of the Service Meta Model SWG Proposal January 2008


Page 1: Terminology Metadata

Terminology Metadata

Extension of the Service Meta Model

SWG Proposal

January 2008

Page 2: Terminology Metadata

Agenda

• Background (5 min)
• Review Proposed Model (15 min)
• Discussion (5 min)
• Vote
• Next Steps (5 min)

Page 3: Terminology Metadata

Team Members

• Tom Johnson (Mayo)
• Frank Hartel (NCI)
• George Komatsoulis (NCI)
• Sal Mungal (Duke)
• Hua Min (Fox Chase)
• Scott Oster (OSU)
• Mike Riben (MD Anderson)
• Brian Davis (3rd Millennium)

Page 4: Terminology Metadata

Background - Goals

• Goals:
  • Identify metadata queryable at the index service level
  • Narrow focus for first revision …
• Initial model defined to satisfy discovery use cases
  • Support development of enhanced grid discovery client
  • Resolve runtime services for terminologies of interest
  • Additional metadata available through runtime services
• Allow/anticipate future expansion

Page 5: Terminology Metadata

Background – Use Cases

• Use Case Collection & Classification of Attributes
  • Identification
  • Internationalization
  • Intended/Allowed Usage
  • Provenance
  • Administration

Page 6: Terminology Metadata

Background - Use Cases

• Samples
  • Browse Existing Ontologies
  • Viewing Differences
  • Detecting Recently Added Ontologies
  • Web of Trust

Page 7: Terminology Metadata

Background - Use Cases

(1) Browse Existing Ontologies
• An ontology developer is interested in creating an ontology for a domain (e.g., radiographic anatomy).
  • Determines whether similar ontologies already exist in that domain.
  • Evaluates assigned categories for registered ontologies.
  • Discovers match for “anatomy”
  • Views available titles and descriptions
  • Finds listings for “human” and “mouse” anatomy, but not “radiology”
  • Looks at the human anatomy ontology to see if it fits the need

Attributes: category, title, description
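The category lookup in use case (1) might be sketched as follows. This is a minimal illustration, not the index service API: `TerminologyEntry`, `find_by_category`, and the registry contents are hypothetical names and sample data introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class TerminologyEntry:
    """Hypothetical record holding only the attributes this use case needs."""
    title: str
    description: str = ""
    category: list[str] = field(default_factory=list)


def find_by_category(entries, wanted):
    """Return entries whose category list contains `wanted` (case-insensitive)."""
    wanted = wanted.lower()
    return [e for e in entries if any(c.lower() == wanted for c in e.category)]


# Illustrative registry contents; not real index service data.
registry = [
    TerminologyEntry("Human Anatomy", "Anatomy of the adult human", ["anatomy"]),
    TerminologyEntry("Mouse Anatomy", "Anatomy of the adult mouse", ["anatomy"]),
    TerminologyEntry("Protein Ontology", "Protein entities", ["proteomic"]),
]

matches = find_by_category(registry, "anatomy")
titles = [e.title for e in matches]
```

Here `titles` holds the human and mouse anatomy listings, while a search for "radiology" returns nothing, which mirrors the outcome described in the use case.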

Page 8: Terminology Metadata

Background - Use Cases

(2) Viewing Differences
• An ontology developer wants to view what has changed between two versions of an ontology.
  • Retrieves listing of registered terminology services
  • Sorts by URI, then version
  • Selects and resolves grid services for differing versions
  • Invokes runtime services to resolve and compare content

Attributes: uri, version
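The "sort by URI, then version" step in use case (2) could look roughly like this. It is a sketch under assumptions: `RegisteredService`, `group_versions`, the `service_url` field, and the example.org addresses are hypothetical, and versions are compared as plain strings, which only orders correctly for simple identifiers such as years.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisteredService:
    uri: str
    version: str
    service_url: str  # hypothetical pointer to the runtime grid service


def group_versions(services):
    """Sort by (uri, version) and keep only URIs registered in more than one version."""
    ordered = sorted(services, key=lambda s: (s.uri, s.version))
    grouped = defaultdict(list)
    for service in ordered:
        grouped[service.uri].append(service)
    return {uri: items for uri, items in grouped.items() if len(items) > 1}


# Illustrative listing; the service URLs are placeholders.
listing = [
    RegisteredService("urn:oid:2.16.840.1.113883.6.2", "2007", "https://example.org/svc/b"),
    RegisteredService("urn:example:other", "1", "https://example.org/svc/c"),
    RegisteredService("urn:oid:2.16.840.1.113883.6.2", "2006", "https://example.org/svc/a"),
]

candidates = group_versions(listing)
versions = [s.version for s in candidates["urn:oid:2.16.840.1.113883.6.2"]]
```

The resulting groups identify which services to resolve; comparing the actual content is left to the runtime services, as the use case states.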

Page 9: Terminology Metadata

Background - Use Cases

(3) Detecting Recently Added Ontologies
• A user wants to contact the providers for new ontologies registered within the last quarter.
  • Queries registered ontologies by registration date
  • Pulls point of contact information (source, curator, registration authority) from listed items

Attributes: registration date, registration authority, source, curator
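A registration-date query for use case (3) might be sketched as below. The names `Registration`, `registered_since`, and `contacts` are hypothetical, "last quarter" is approximated as the previous 90 days (an assumption), and the sample records reuse the per-attribute examples from the slides for illustration only.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Registration:
    title: str
    registration_date: date
    registration_authority: str
    source: str = ""
    curator: str = ""


def registered_since(entries, today, days=90):
    """Entries registered within the last `days` days, counted back from `today`."""
    cutoff = today - timedelta(days=days)
    return [e for e in entries if cutoff <= e.registration_date <= today]


def contacts(entry):
    """Distinct, non-empty points of contact for one entry."""
    return {p for p in (entry.source, entry.curator, entry.registration_authority) if p}


# Illustrative records only.
entries = [
    Registration("ICD-9-CM", date(2007, 9, 30), "National Cancer Institute",
                 source="National Center for Health Statistics (NCHS)",
                 curator="National Library of Medicine"),
    Registration("Older terminology", date(2007, 1, 15), "National Cancer Institute"),
]

recent = registered_since(entries, today=date(2007, 12, 1))
recent_titles = [e.title for e in recent]
recent_contacts = contacts(recent[0])
```

Passing `today` explicitly keeps the query reproducible; a real client would presumably supply the current date.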

Page 10: Terminology Metadata

Background - Use Cases

(4) Web of Trust
• Quality of ontologies:
  • User is aware that there are several anatomy ontologies, and is unclear which to use.
  • Trusts certain ontology sources (anatomists) more than others
  • Views ontology source to determine content origin
  • Views intended and example use to consider alignment with application
  • Considers caBIG certification level

Attributes: source, intended use, example use, certification level
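One way a client could support the web-of-trust selection in use case (4) is to filter by trusted source and then order by certification level. This is only a sketch: `rank_candidates`, the dictionary keys, the ordering bronze < silver < gold, and the sample sources are assumptions made for illustration, not part of the proposal.

```python
# Assumed ordering of caBIG certification levels, lowest to highest.
CERTIFICATION_RANK = {"bronze": 1, "silver": 2, "gold": 3}


def rank_candidates(entries, trusted_sources):
    """Keep entries from trusted sources, highest certification level first."""
    trusted = [e for e in entries if e["source"] in trusted_sources]
    return sorted(
        trusted,
        key=lambda e: CERTIFICATION_RANK.get(e.get("certification"), 0),
        reverse=True,
    )


# Illustrative entries; titles and sources are made up for the example.
anatomy_ontologies = [
    {"title": "Anatomy A", "source": "Anatomy Society", "certification": "silver"},
    {"title": "Anatomy B", "source": "Unknown Vendor", "certification": "gold"},
    {"title": "Anatomy C", "source": "Anatomy Society", "certification": "gold"},
    {"title": "Anatomy D", "source": "Anatomy Society"},  # no certification recorded
]

ranked = rank_candidates(anatomy_ontologies, trusted_sources={"Anatomy Society"})
ranked_titles = [e["title"] for e in ranked]
```

The user would still read the intended and example use of the top candidates; the ranking only narrows the list.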

Page 11: Terminology Metadata

Background – Model

• Focus of work on …
  • Model alignment
    • External … Incorporate feedback from review and alignment with relevant specifications and standards.
    • Internal … Take better advantage of previously registered models and classes.
  • Incorporating specific feedback on model classes and attributes.

Page 12: Terminology Metadata

Background - Alignment

• Specifications/standards considered …
  • Dublin Core
  • ISO 11179-2/3/6: classification, registries, admin
  • LexGrid/LexBIG model
  • National Center for Biomedical Ontology (NCBO) BioPortal
  • Public Health Information Network (CDC/PHIN)
  • Simple Knowledge Organization System (SKOS core)
  • UMLS Rich Release Format (RRF)
  • CTS/CTS2

Page 13: Terminology Metadata

Background – Model Alignment

Page 14: Terminology Metadata

Background – Model Alignment

Page 15: Terminology Metadata

Background – Model Alignment

• Findings …
  • No silver bullet
  • General alignment for defined items
    • All SWG items and definitions represented conceptually in one or more specifications
    • Adequate, but not perfect, alignment of semantics
    • Some name changes
  • Some new attributes identified
    • Supplement existing use case
    • Generally not found to be required unless we add use cases

Page 16: Terminology Metadata

Model - Overview

class Domain Objects

The Domain class model captures essential information about objects in the domain.

TerminologyMetaData
+ category: String [0..n]
+ defaultLanguage: String = eng
+ description: String [0..1]
+ keyword: String [0..n]
+ localName: String [1..n]
- structure: StructureType
+ supportedContentType: String [1..n]
+ supportedLanguage: String [1..n]
+ title: String
+ type: typeEnum [0..1]
+ uri: String

TerminologyUsage
+ exampleUse: String [0..n]
+ intendedUse: String [0..n]
+ isRestricted: isRestrictedType
+ rights: String [0..n]
+ rightsHolder

TerminologyProvenance
+ curator [0..1]
+ releaseDate: Date
+ releaseFormat: String
+ releaseLocation: String
+ releasePackage: String
+ releaseVersion: String
+ source [0..1]

TerminologyAdmin
+ certification: certificationType [0..1]
+ registrationAuthority
+ registrationDate: Date
+ registrationStatus: registrationStatusType
+ registrationTag: String [0..n]

Associations
• hasStatus: terminologyMetaData (1) to terminologyAdmin (1)
• hasProvenance: terminologyMetaData (1) to terminologyProvenance (1)
• hasUsage: terminologyMetaData (1) to terminologyUsage (1)
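The four classes and their one-to-one associations could be mirrored in code roughly as follows. This is a sketch, not the registered model: attribute names are converted to snake_case, enumerated types are simplified to strings, attributes shown without a type on the diagram (rightsHolder, curator, source, registrationAuthority) are assumed to be strings, and the sample values combine per-attribute examples from the slides for illustration only.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class TerminologyUsage:
    is_restricted: bool
    intended_use: list[str] = field(default_factory=list)
    example_use: list[str] = field(default_factory=list)
    rights: list[str] = field(default_factory=list)
    rights_holder: Optional[str] = None      # type assumed


@dataclass
class TerminologyProvenance:
    source: Optional[str] = None             # type assumed
    curator: Optional[str] = None            # type assumed
    release_date: Optional[date] = None
    release_format: Optional[str] = None
    release_location: Optional[str] = None
    release_package: Optional[str] = None
    release_version: Optional[str] = None


@dataclass
class TerminologyAdmin:
    registration_authority: str              # type assumed
    registration_date: date
    registration_status: str                 # a registrationStatusType value
    registration_tag: list[str] = field(default_factory=list)
    certification: Optional[str] = None


@dataclass
class TerminologyMetaData:
    uri: str
    title: str
    local_name: list[str]                    # 1..n
    structure: str                           # a StructureType value
    supported_language: list[str]            # 1..n
    supported_content_type: list[str]        # 1..n
    usage: TerminologyUsage                  # hasUsage
    provenance: TerminologyProvenance        # hasProvenance
    admin: TerminologyAdmin                  # hasStatus
    default_language: str = "eng"
    description: Optional[str] = None
    type_: Optional[str] = None              # "type" in the model
    category: list[str] = field(default_factory=list)
    keyword: list[str] = field(default_factory=list)


# Illustrative instance; values marked "assumed" are not from the slides.
icd9 = TerminologyMetaData(
    uri="urn:oid:2.16.840.1.113883.6.2",
    title="International Classification of Disease, 9th…",  # truncated on the slide
    local_name=["ICD-9-CM", "ICD-9"],
    structure="simple",                                      # assumed
    supported_language=["eng"],
    supported_content_type=["text/plain"],
    usage=TerminologyUsage(is_restricted=False),             # assumed
    provenance=TerminologyProvenance(
        source="National Center for Health Statistics (NCHS)",
        curator="National Library of Medicine",
        release_date=date(2007, 8, 30),
    ),
    admin=TerminologyAdmin(
        registration_authority="National Cancer Institute",
        registration_date=date(2007, 9, 30),
        registration_status="standard",                      # assumed
    ),
)
```

Holding the usage, provenance, and admin objects as required fields reflects the multiplicity of 1 on each association end.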

Page 17: Terminology Metadata

Model – Core Identification & Description

• uri (1)
  • Unique persistent identifier.
  • urn:oid:2.16.840.1.113883.6.2
• title (1)
  • Formal or published name for display.
  • International Classification of Disease, 9th…
• localName (1..n)
  • Name used to refer to the terminology within a localized context; often a mnemonic.
  • ICD-9-CM, ICD-9
• description (0..1)
  • Human-readable explanation or narrative.
  • The International Classification of …
• category (0..n)
  • Applicable domains or scientific fields.
  • e.g. anatomy, genomic, proteomic, phenotype…

class Logical Model: TerminologyMetaData (category, defaultLanguage, description, keyword, localName, structure, supportedContentType, supportedLanguage, title, type, uri)

Page 18: Terminology Metadata

Model – Core Identification & Description

• type (0..1)
  • Nature of content relative to the category.
  • application – describes domain in an application dependent manner
  • core – describes domain in an application independent manner
  • domain – describes the most important concepts in a domain
  • task – describes generic types of tasks or activities (e.g. selling, selecting)
  • upperLevel – describes general, domain independent concepts (e.g. space, time)
• structure (1)
  • Indicates complexity of maintained relationships
  • flat – no hierarchy
  • simple – supports a single inheritance mono-hierarchical structure.
  • complex – supports multiple relationships and/or relationship types
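The permitted values for type and structure listed on this slide map naturally onto enumerations. The sketch below uses hypothetical class names (`TerminologyType`, `StructureType`) and a hypothetical helper `has_hierarchy`; only the value strings come from the slide.

```python
from enum import Enum


class TerminologyType(Enum):
    """Nature of content relative to the category (type, 0..1)."""
    APPLICATION = "application"
    CORE = "core"
    DOMAIN = "domain"
    TASK = "task"
    UPPER_LEVEL = "upperLevel"


class StructureType(Enum):
    """Complexity of maintained relationships (structure, 1)."""
    FLAT = "flat"
    SIMPLE = "simple"
    COMPLEX = "complex"


def has_hierarchy(structure: StructureType) -> bool:
    """True for every structure except flat, which is defined as having no hierarchy."""
    return structure is not StructureType.FLAT


parsed = StructureType("simple")       # look up a member by its model value
upper = TerminologyType("upperLevel")
```

Looking a member up by value raises `ValueError` for anything outside the list, which gives a registration tool a simple way to reject unknown values.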

Page 19: Terminology Metadata

Model – Core Identification & Description

• defaultLanguage (1)
  • Language for text unless otherwise specified
  • eng
• supportedLanguage (1..n)
  • Languages supported for text-based content
  • eng, spa, …
• supportedContentType (1..n)
  • Supported type of text or embedded multimedia
  • e.g. mime type (text/plain, image)
• keyword (0..n)
  • Words or phrases of special significance.
  • patient record, nursing protocol, …
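The multiplicities stated for these attributes can be checked mechanically before a terminology is registered. The following is a sketch only: `MIN_OCCURS`, `cardinality_errors`, `with_defaults`, and the snake_case keys are hypothetical, and the minimum counts are copied from the slides.

```python
# Minimum occurrences for multi-valued attributes, as stated on the slides.
MIN_OCCURS = {
    "local_name": 1,
    "supported_language": 1,
    "supported_content_type": 1,
    "category": 0,
    "keyword": 0,
}


def cardinality_errors(record):
    """Return one message per attribute that has fewer values than required."""
    errors = []
    for name, minimum in MIN_OCCURS.items():
        count = len(record.get(name, []))
        if count < minimum:
            errors.append(f"{name}: expected at least {minimum}, found {count}")
    return errors


def with_defaults(record):
    """Return a copy with default_language set to 'eng' when it is absent."""
    completed = dict(record)
    completed.setdefault("default_language", "eng")
    return completed


# Illustrative record that is missing its required local name.
draft = {
    "supported_language": ["eng", "spa"],
    "supported_content_type": ["text/plain"],
    "keyword": ["patient record"],
}

problems = cardinality_errors(draft)
completed = with_defaults(draft)
```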

Page 20: Terminology Metadata

Model - Usage

• intendedUse (0..n)
  • Human-readable description of intended use.
  • data integration
• exampleUse (0..n)
  • Human-readable example of use.
  • Integration of protein data.
• isRestricted (1)
  • Indication of intellectual property boundaries.
  • true
• rights (0..n)
  • Human-readable description of IP rights.
  • NCI Thesaurus terms of use …
• rightsHolder (point of contact) (0..1)
  • Contact point for intellectual property rights.
  • National Cancer Institute

class Logical Model: TerminologyUsage (exampleUse, intendedUse, isRestricted, rights, rightsHolder)

Page 21: Terminology Metadata

Model - Provenance

• source (0..1)
  • Origin or provider of content
  • National Center for Health Statistics (NCHS)
• curator (0..1)
  • Maintains the content in the release format (e.g. OWL, OBO, RRF)
  • National Library of Medicine
• releaseDate (0..1)
  • Date of availability in released format.
  • 2007-08-30
• releaseFormat (0..1)
  • Format as released by the curator.
  • e.g. OWL, OBO, RRF
• releaseLocation (0..1)
  • Location of resource in the releaseFormat.
  • ftp://ftp1.nci.nih.gov/pub/cacore/EVS/NCI_Thesaurus/Thesaurus_07.12a.OWL.zip

class Logical Model: TerminologyProvenance (curator, releaseDate, releaseFormat, releaseLocation, releasePackage, releaseVersion, source)

Page 22: Terminology Metadata

Model - Provenance

• releasePackage (0..1)
  • Name of the composite ontology or meta distribution containing the terminology as released.
  • e.g. UMLS, NCI_MetaThesaurus, BiomedGT
• releaseVersion (0..1)
  • Represented version identifier.
  • 2007

Page 23: Terminology Metadata

Model - Administration

• registrationAuthority (1)
  • Responsible for maintaining content on the grid
  • National Cancer Institute
• registrationDate (1)
  • Date of grid availability or last change of registration status.
  • 2007-09-30
• registrationStatus (1)
  • Designation of terminology status in life cycle.
  • Possible values from 11179-3 registration life cycle status category.
• registrationTag (0..1)
  • Supports lookup by version-agnostic designation
  • development, test, production
• certification (0..1)
  • caBIG level of compliance.
  • bronze, silver, gold

class Logical Model: TerminologyAdmin (certification, registrationAuthority, registrationDate, registrationStatus, registrationTag)

«enumeration» registrationStatusType: candidate, incomplete, preferredStandard, qualified, recorded, retired, standard, superceded
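The registrationStatusType values and the version-agnostic tag lookup could be combined as sketched below. The value strings come from the slide; the class name `RegistrationStatus`, the helpers `is_active` and `find_by_tag`, the dictionary keys, and the choice to treat retired and superceded entries as inactive are assumptions made for illustration.

```python
from enum import Enum


class RegistrationStatus(Enum):
    CANDIDATE = "candidate"
    INCOMPLETE = "incomplete"
    PREFERRED_STANDARD = "preferredStandard"
    QUALIFIED = "qualified"
    RECORDED = "recorded"
    RETIRED = "retired"
    STANDARD = "standard"
    SUPERCEDED = "superceded"  # spelled as on the slide


# Assumption: a discovery client might skip these statuses by default.
INACTIVE = {RegistrationStatus.RETIRED, RegistrationStatus.SUPERCEDED}


def is_active(status_text):
    """Parse a status string and report whether it is outside the inactive set."""
    return RegistrationStatus(status_text) not in INACTIVE


def find_by_tag(entries, tag):
    """Active entries carrying the given registrationTag, e.g. 'production'."""
    return [
        e for e in entries
        if tag in e["registration_tag"] and is_active(e["registration_status"])
    ]


# Illustrative registrations of one terminology in two versions.
registrations = [
    {"release_version": "2006", "registration_status": "superceded",
     "registration_tag": ["production"]},
    {"release_version": "2007", "registration_status": "standard",
     "registration_tag": ["production", "test"]},
]

current = find_by_tag(registrations, "production")
current_versions = [e["release_version"] for e in current]
```

A caller asking for "production" gets the currently active version without needing to know its version identifier.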

Page 24: Terminology Metadata

Model – Anticipated Alignment against available classes

class Domain Objects

The Domain class model captures essential information about objects in the domain.

TerminologyMetaData
+ abbreviation: java.lang.String [1..n]
+ category: java.lang.String [0..n]
+ defaultLanguage: java.lang.String = eng
+ description: java.lang.String [0..1]
+ keyword: java.lang.String [0..n]
- structure: StructureType
+ supportedContentType: java.lang.String [1..n]
+ supportedLanguage: java.lang.String [1..n]
+ title: java.lang.String
+ type: typeEnum [0..1]
+ uri: java.lang.String

TerminologyUsage
+ exampleUse: java.lang.String [0..n]
+ intendedUse: intendedUseType
+ isRestricted: isRestrictedType
+ rights: java.lang.String [0..n]
+ rightsHolder: java.lang.String

TerminologyProvenance
+ curator: java.lang.String [0..1]
+ releaseDate: Date
+ releaseFormat: java.lang.String
+ releaseLocation: java.lang.String
+ releaseVersion: java.lang.String
+ source: java.lang.String [0..1]

TerminologyAdmin
+ certification: java.lang.String [0..1]
+ registrationAuthority
+ registrationDate: Date
+ registrationStatus: registrationStatusType
+ registrationTag: java.lang.String [0..n]

Associations
• hasStatus: terminologyMetaData (1) to terminologyAdmin (1)
• hasProvenance: terminologyMetaData (1) to terminologyProvenance (1)
• hasUsage: terminologyMetaData (1) to terminologyUsage (1)

Superclasses Based on 11179

Page 25: Terminology Metadata

Vote

• Vote will be for …
  • Approval of the identified criteria
  • Acknowledgement that model will be aligned with existing (e.g. 11179-based) superclasses, with model and attribute details to be addressed as required.

Page 26: Terminology Metadata

Questions/Discussion before Vote

Page 27: Terminology Metadata

Next Steps

• Model harmonization w/ recommended superclasses
• Change caGRID tooling to capture additional metadata when registering terminology
• Create custom discovery client for terminology services, to take advantage of additional metadata in support of identified use cases