© 2005 The MITRE Corporation. All rights reserved. Approved for Public Release; Distribution Unlimited

Modular Vocabulary Design in Support of Semantic Interoperability

Ken Laskey

OASIS Symposium

25 April 2005

The author's affiliation with The MITRE Corporation is provided for identification purposes only, and is not intended to convey or imply MITRE's concurrence with, or support for, the positions, opinions or viewpoints expressed by the author.


Motivation

In SOA, distributed resources (data and processing) developed by independent entities: likely different vocabularies

Metadata as the magic bullet for discovery but requester must understand

– organizing schema by which resources of interest have defined metadata (the XML tags)

– values that correspond to schema and define the metadata instances (the values between the tags)

US Department of Defense (DoD) Net-Centric Data Strategy assumes Communities of Interest (COIs) will develop community vocabularies, provide and find services

– Institutional (long-standing) and expedient COIs

– But how will these (especially expedient COIs) interoperate?


Further considerations (1)

DoD Directives have moved from single vocabulary approach

– 8320.1 (1991)

    – To determine what data elements should be standardized, how grouped

    – Results stored in DoD Information Resource Dictionary System (DoD IRDS)

    – Directive cancelled in 2004

– 8320.2 (2004) Net-Centric approach

    – Data shall be made visible, accessible, and understandable to any potential user

    – Accessible by being published in shared spaces

    – Understandable by publishing associated metadata

    – Semantic and structural agreements for data sharing shall be promoted through COIs

A single standard vocabulary across a diverse domain is not feasible but how do you share across vocabularies?


Further considerations (2)

Avoiding n²: the compromise vocabulary

– Some success when need limited number of people to agree on limited number of terms

– Helps if pressing need, can avoid tough issues

– Success tends not to scale as numbers increase, flexibility of what can avoid decreases

– Still some drawbacks

    – Minimum vocabularies squeeze out subtleties to collapse differences: lose information

    – Verbose vocabularies preserve subtleties but miss higher level similarities: everything different

    – Compromises that work for one application may not be suitable for another

The world isn’t n² but it isn’t n either - deal with it!
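
The scaling concern can be made concrete with a little arithmetic. The sketch below assumes (the slide does not spell this out) that "n²" refers to authoring a mapping between every pair of n vocabularies and "n" to mapping each vocabulary once to a single compromise vocabulary. The function names are illustrative only.

```python
def pairwise_mappings(n: int) -> int:
    """Bidirectional mappings needed if every vocabulary maps to every other."""
    return n * (n - 1) // 2


def hub_mappings(n: int) -> int:
    """Bidirectional mappings needed if every vocabulary maps to one shared vocabulary."""
    return n


# Compare how the two approaches grow with the number of vocabularies.
for n in (3, 10, 50):
    print(n, pairwise_mappings(n), hub_mappings(n))
```

Pairwise mapping grows quadratically while the hub grows linearly, but the hub only works if everyone accepts the compromise vocabulary, which is the drawback listed above.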


Then what?

What do we need vocabulary to do?

– Convey common meaning

– Provide richness of description consistent with the domain of discourse

– (red, yellow, green) is fine for traffic signals but not graphic artists

How do we get this from multiple, overlapping vocabularies?

– Note: Net-Centric Data Strategy emphasizes use of metadata; Net-Centric Core Enterprise Services (NCES) to provide common services needed across COIs

– Evaluation approach: examine use cases generated for NCES Analysis of Alternatives (AoA) to understand

    – What types of metadata implied

    – What functions implied for metadata creation, maintenance, use

    – What implied about supporting infrastructure

– What do use cases imply about needed vocabulary?


Examining AoA Use Cases

Use cases generated for services collected in nine bundles

– Bundles included Discovery, Mediation, Messaging, ESM (Enterprise Service Management)

– Multiple services in each bundle

– Service groupings since revisited but use cases still valid

Process

1. Consider each step for each use case and identify likely metadata to support each step

    Ex: Discovery Service finds appropriate source: metadata indicating pedigree of source

2. Look across use cases and collect individually identified metadata into common metadata sets

    Ex: Collect instances indicating some type of pedigree metadata

3. Look for common content and common structure

    Ex: Recurring need for contact information of responsible party
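
The three-step process can be sketched as a simple grouping exercise. The annotations below are hypothetical stand-ins for the per-step metadata identified in step 1; the actual AoA use cases are not reproduced here.

```python
from collections import defaultdict

# Step 1 (hypothetical results): (use case, step, metadata need identified).
annotations = [
    ("Discovery", "find appropriate source", "pedigree"),
    ("Discovery", "contact source owner", "contact information"),
    ("Mediation", "select translation service", "pedigree"),
    ("Messaging", "report delivery failure", "contact information"),
    ("Messaging", "route message", "address"),
]

# Step 2: collect individually identified metadata into common sets.
use_cases_by_need = defaultdict(set)
for use_case, _step, need in annotations:
    use_cases_by_need[need].add(use_case)

# Step 3: needs that recur across use cases are candidates for reuse.
recurring = sorted(
    need for need, cases in use_cases_by_need.items() if len(cases) > 1
)
print(recurring)
```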


What is metadata for an SOA?

Subset of the data related to an entity that provides some critical descriptive information which is useful in some context for identifying, using, or otherwise interacting with the entity

– Entity may be anything for which description is needed

– Metadata enables user (human or machine) to discriminate one entity from another

– Metadata enables user to access entity and its contents in either a read or write mode or execute if the entity comprises processing instructions

– Any subset of data is “appropriate” metadata if it satisfies the needs for some context: may be multiple metadata sets for multiple contexts

Related to entity but with external representation

– Needs to be accessed, understood before/without accessing the entity

– Entity-specific tools may create/extract metadata, but general tools to use metadata

Data about data


Results of AoA Use Case Evaluation

Many commonalities found; also indications of granularity at which reuse most likely to occur

Refactoring based on notional component elements, pattern of use, granularity of concepts represented indicates three categories

– Concept metadata: set of information elements to convey single elementary concept

– Function metadata: combines concept metadata and simpler function metadata to capture information needed to describe reusable functions

– Resource metadata: combines concept and function metadata to describe assets (data and processing) that can be utilized to respond to user needs

Metadata definition has up to now been tailored to the entity; alternative is to build metadata using modular constituent vocabularies in analogous fashion to building modular software


Vocabulary for Concept Metadata

A set of information elements that convey a single elementary concept which is reused frequently as part of other metadata: focused and concise

Typical examples of concepts that can be defined and reused

– Name - textual label by which entity identified

– Person_name

    – Special case of general name

    – Composed of name fragments for formatting names in different cultures

– Address - composed of address fragments for formatting consistent with location

– Datetime - date and time stamp

– Keywords - textual terms providing descriptive associations

– Identifier - unique means to identify an entity

Every term should be linked to a vocabulary where the term is defined or, as with identifier, the generating pattern is defined (e.g. through XML namespace, UDDI tModel, OWL ontology)
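
One possible shape for such a concept is sketched below. The `Term` and `PersonName` classes and the vocabulary URI are hypothetical illustrations, not structures defined in the presentation; they only show a value travelling with a link to its defining vocabulary, and name fragments being ordered differently per cultural convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A value plus a link to the vocabulary in which the term is defined."""
    vocabulary: str  # hypothetical URI, e.g. an XML namespace
    name: str
    value: str


@dataclass(frozen=True)
class PersonName:
    """Person_name concept: name fragments formatted per convention."""
    given: Term
    family: Term

    def formatted(self, family_first: bool = False) -> str:
        # Order the fragments according to the requested convention.
        if family_first:
            parts = [self.family.value, self.given.value]
        else:
            parts = [self.given.value, self.family.value]
        return " ".join(parts)


VOCAB = "http://example.org/vocab/person-name#"  # placeholder, not a real vocabulary
name = PersonName(Term(VOCAB, "given", "Ada"), Term(VOCAB, "family", "Lovelace"))
print(name.formatted())
print(name.formatted(family_first=True))
```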


Benefits of Modular Concept Metadata

Identifies basic concepts with high degree of reuse

Simplifies definition of more complex metadata

Provides immediate semantic understanding of that portion of any metadata set in which same concept metadata used

Easier to define mappings between small, concise metadata sets than large, complex schemas

Easier mapping has significant implications for

– Versioning

– Discrete variations between independently defined vocabularies

Any agreement in basic vocabulary, whether on specific terms or relationship between terms, enables an immediate level of interoperability for any containing structures


Vocabulary for Function Metadata

Combines concept metadata and simpler function metadata to support processes, convey constraints and constraint compliance

Use cases show

– strong dependence on processes

– recurring need to identify mechanisms and constraints that enable use of an entity consistent with needs and requirements of both users and resources

Many functions can be defined at this level

Typical (simpler) examples

– Person/Organization [P/O] - identity and contact info for person or organization; uses concept metadata such as name, address

– Title/Position [T/P] - identity based on specific role; may reuse / redirect to instance of Person/Organization

– Creation/Modification [C/M] - latest change to resource; incorporates P/O or T/P to identify who made change, Datetime (concept) for when change made
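
The layering of these simpler functions can be sketched with plain data classes. The class and field names below are hypothetical, and the concept metadata (name, address) is reduced to strings to keep the sketch short.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Union


@dataclass
class PersonOrganization:  # P/O
    name: str     # Name concept, simplified to a string
    address: str  # Address concept, simplified to a string


@dataclass
class TitlePosition:  # T/P
    title: str
    held_by: Optional[PersonOrganization] = None  # redirect to a P/O instance


@dataclass
class CreationModification:  # C/M
    changed_by: Union[PersonOrganization, TitlePosition]
    changed_at: datetime  # Datetime concept


def who_changed(cm: CreationModification) -> Optional[PersonOrganization]:
    """Resolve the change author to a P/O, following a T/P redirect if present."""
    by = cm.changed_by
    return by.held_by if isinstance(by, TitlePosition) else by


editor = PersonOrganization("Example Org", "1 Example Street")
cm = CreationModification(TitlePosition("Data Steward", editor), datetime(2005, 4, 25))
print(who_changed(cm).name)
```

Because C/M only references P/O or T/P, any metadata set that embeds C/M inherits the same contact structure without redefining it.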


Commonly Recurring Functions - Access/Invocation [A/I]

Describes information related to access of an information resource, to invoking of a processing resource

Notional elements: more than just WSDL

– Textual description (concept)

– Type of access (e.g. read, write, delete), link to access term definition

– Responsible party (P/O, T/P)

– Creation/Modification (C/M)

– Constraints, assumptions, pedigrees (discussed further below)

– Service level agreements for access and invocation

– Who prequalified to use (see later discussion)

– Who has certified for use (see later discussion)

– Link to WSDL


Commonly Recurring Functions - Constraints

Describes array of assumptions, restrictions, conditions related to resource but without specifying constraint language

Covers both requester and requested resource

– Should requester be permitted access?

– Does resource satisfy requester needs and requirements?

– Ex: requester looking for service with specific signal processing capability, service looking for requester with paid license

Notional elements

– Name & description of constraint set

– Version number, link to version number definition

– A/I for service‡ to read, provide details of constraint set

– A/I for preferred processing service‡

– Link to entities evaluated against this constraint set

‡ Services (per A/Is) will process constraint language without requester necessarily knowing which constraint language used
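
A minimal sketch of this arrangement follows, assuming each constraint set exposes an evaluation callable that stands in for the A/I of its processing service. The rules, field names, and candidate descriptions are hypothetical; the point is that the caller invokes the evaluator without knowing how the constraint is expressed internally.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# An evaluator takes facts about a candidate and returns pass/fail.
Evaluator = Callable[[Dict[str, object]], bool]


@dataclass
class ConstraintSet:
    name: str
    description: str
    version: str
    evaluate: Evaluator  # stands in for the A/I of the processing service


def license_paid(candidate: Dict[str, object]) -> bool:
    # Hypothetical rule; the language behind it is hidden from the caller.
    return candidate.get("license") == "paid"


def supports_fft(candidate: Dict[str, object]) -> bool:
    # Hypothetical rule checking a signal processing capability.
    return "fft" in candidate.get("capabilities", ())


# The service constrains the requester; the requester constrains the service.
service_requires = ConstraintSet(
    "licensed-users", "Requester holds a paid license", "1.0", license_paid
)
requester_requires = ConstraintSet(
    "signal-processing", "Service offers FFT", "1.0", supports_fft
)

requester = {"license": "paid"}
service = {"capabilities": ("fft", "filter")}

mutually_acceptable = (
    service_requires.evaluate(requester) and requester_requires.evaluate(service)
)
print(mutually_acceptable)
```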


Commonly Recurring Functions - Pedigree

Constraint definition is precursor to establishing pedigree

Pedigree metadata most often thought of as information useful to evaluate pedigree of entity

– Such information interspersed throughout other metadata

– Required information set likely to expand and evolve over time

Pedigree is less input used to evaluate entity and more how entity evaluated (i.e. constraints) and results of evaluation

Notional elements

– Link to constraint set against which entity evaluated

– Link to processing service which evaluated the entity against the constraint set

– Results of evaluating and link to description of result

– When the pedigree was established; if applicable, when pedigree expires (Datetime)

Any entity can have multiple pedigrees corresponding to different constraint sets
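
The notional elements map naturally onto a small record type. The sketch below is illustrative only: the field names and the `urn:example:` identifiers are assumptions, and the expiry check is one plausible reading of "when pedigree expires".

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Pedigree:
    constraint_set: str     # link to the constraint set used
    evaluated_by: str       # link to the processing service that evaluated
    result: str             # outcome of the evaluation
    result_definition: str  # link to description of the result
    established: datetime
    expires: Optional[datetime] = None

    def is_current(self, now: datetime) -> bool:
        """True if the pedigree has been established and has not expired."""
        return self.established <= now and (
            self.expires is None or now < self.expires
        )


def current_pedigrees(pedigrees: List[Pedigree], now: datetime) -> List[Pedigree]:
    return [p for p in pedigrees if p.is_current(now)]


# One entity with two pedigrees against different (hypothetical) constraint sets.
pedigrees = [
    Pedigree("urn:example:constraints:accuracy", "urn:example:svc:eval",
             "pass", "urn:example:results#pass",
             datetime(2005, 1, 1), datetime(2005, 3, 1)),
    Pedigree("urn:example:constraints:timeliness", "urn:example:svc:eval",
             "pass", "urn:example:results#pass",
             datetime(2005, 2, 1)),
]
still_valid = current_pedigrees(pedigrees, datetime(2005, 4, 25))
print([p.constraint_set for p in still_valid])
```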


Vocabulary for Resource Metadata

Combines concept and function metadata to describe assets that can be utilized to respond to user needs

Required vocabulary is combination and specialization of building blocks (concept and function) such as ones already defined

Notional elements (the first three hold values and links to value definitions)

– Name and textual description (concept)

– Identifiers & keywords (concept)

– Version, status (concept)

– Responsible parties (P/O, T/P) for development, maintenance, …

– Access/Invocation (A/I) for read, write, execute functions

– Constraint & Pedigree for resource (same as applied to A/I)

– Security (structure not explicitly evaluated)


Notes on Resource Vocabulary

Vocabulary must support

– discovery by user (human, machine) looking to satisfy some task

– mutual evaluation by resource and prospective user to establish authorization by resource and value of access to user

– access after user and resource mutually satisfy conditions for use

Unlike functions, limited number of resource types (data & processing) but described by same basic vocabulary

No significant difference in defining NCES (core) or COI (community) resources

Multiple A/I may expose different aspects of resource

Specific description of native structure & storage likely not needed because this is hidden by service invocation


Prequalifying & Certifying Functions & Resources

Access to / invoking of A/I and associated resource depends on

– Requester satisfying access constraints/requirements

– Resource assumptions & conditions consistent with requester needs

Constraint set can be evaluated with each attempted access; more efficient if result can be stored and reused

– Prequalified user: list maintained by resource, A/I functions, etc. of users who satisfy access requirements

– Certified resource / service interface: list maintained by COI or other user of resources, A/I functions, etc. for which relevant COI/user requirements satisfied

Lists could be maintained for both users and A/I, resources

Notional service to maintain consistency of lists & inverses connected to users and A/I, resources, etc.: service (not just initial creator of metadata) will use and modify metadata
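
A minimal sketch of storing and reusing evaluation results is shown below. It models only the prequalified-user relation and its inverse; the class, method names, and identifiers are hypothetical, and a real service would also need to handle expiry and authorization of whoever updates the lists.

```python
from collections import defaultdict


class QualificationRegistry:
    """Stores evaluation results and keeps a list and its inverse consistent."""

    def __init__(self):
        self._users_by_resource = defaultdict(set)  # prequalified users per resource
        self._resources_by_user = defaultdict(set)  # inverse view

    def record(self, user, resource):
        # Update both views together so they never disagree.
        self._users_by_resource[resource].add(user)
        self._resources_by_user[user].add(resource)

    def revoke(self, user, resource):
        self._users_by_resource[resource].discard(user)
        self._resources_by_user[user].discard(resource)

    def is_prequalified(self, user, resource):
        return user in self._users_by_resource.get(resource, ())

    def resources_for(self, user):
        return set(self._resources_by_user.get(user, ()))


def check_access(registry, user, resource, evaluate):
    """Evaluate the constraint set only when no stored result exists."""
    if registry.is_prequalified(user, resource):
        return True
    if evaluate(user, resource):
        registry.record(user, resource)
        return True
    return False


calls = []


def evaluate(user, resource):
    # Stand-in for a full constraint evaluation; records each invocation.
    calls.append((user, resource))
    return True


registry = QualificationRegistry()
first = check_access(registry, "analyst-1", "weather-feed", evaluate)
second = check_access(registry, "analyst-1", "weather-feed", evaluate)
print(first, second, len(calls))
```

The second access is answered from the stored list, so the constraint set is evaluated only once.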


Effects on Extensibility and Interoperability

Referencing same vocabulary module explicitly establishes semantic equivalence

– Large, distinct schemas for special purposes are semantically isolated

– Copying pieces of one schema and pasting in another does not indicate explicit semantic equivalence, can create configuration management problems

– Reuse of global elements within schema (weakly) assists local semantics but may get lost in large schema

Mapping between concise vocabulary structures is simpler than between (portions of) large schemas

Reasoning engines can find transitive relations between multiple mappings, which indicates more semantic relations and lessens the n² burden

Current extensibility based on optional schema elements; with modular definition, extend by adding additional, predefined modules as needed
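
The transitive-relation point can be illustrated with a simple graph search. This sketch assumes every mapping is an exact, symmetric equivalence between two terms; real mappings (narrower, broader, approximate) may not compose this safely, and a reasoning engine would do considerably more. The term names are hypothetical.

```python
from collections import defaultdict, deque


def reachable_terms(mappings, start):
    """Return every term linked to `start` by a chain of equivalence mappings."""
    graph = defaultdict(set)
    for a, b in mappings:
        graph[a].add(b)
        graph[b].add(a)  # equivalence is treated as symmetric
    seen, queue = {start}, deque([start])
    while queue:
        term = queue.popleft()
        for neighbor in graph[term]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen - {start}


# Hypothetical terms from three vocabularies; only two mappings are authored.
mappings = [
    ("coiA:surname", "coiB:familyName"),
    ("coiB:familyName", "coiC:lastName"),
]
print(sorted(reachable_terms(mappings, "coiA:surname")))
```

Two authored mappings yield a third relation (coiA to coiC) without anyone writing it, which is how transitivity reduces the number of mappings that must be maintained by hand.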


Further challenges

Mapping between vocabulary structures

– What information is needed to sufficiently capture a mapping?

– What kind of structures are needed to do the capture?

– What support is needed to extract information from the mapping corpus?

Collection of use statistics to facilitate reuse

– What types of statistics are needed?

    – How many times used?

    – Who used?

    – What structures are derived as modifications of this one?

– How can tools (e.g. authoring tools, registries) facilitate statistic gathering and use?

Examination done without considering existing registry structures, e.g. UDDI tModels, OWL-S ontologies, XML namespaces: need to look at implementation details


Summary and Conclusions

Role of metadata for SOA is expanded over traditional use: need for vocabulary expressive enough to support role

Use case analysis indicates many recurring structures; highlights granularity at which reuse most likely to occur

Explicitly defining then reusing recurring structures promotes interoperability among any domains using common structures

Easier mapping among modular structures facilitates explicit capture of differences in versions, other vocabulary variations

Metadata catalogs must support

– gathering statistics to facilitate reuse

– authorized parties other than entity creators to modify certain elements of metadata

Need to understand semantics of both tags and values between tags

– Critical to provide link to term definitions

– Vocabulary must include explicit structures for specifying links


Backup


What is metadata for an SOA?

A more formal definition for your consideration

Metadata is that set of descriptive properties which serves one or more of the following functions:

– uniquely characterizes an entity and for which values associated with the descriptive properties allow a user (human or machine) to discriminate between one entity and another,

– describes how the entity and its contents can be accessed (both procedurally and the terms of access) in either a read or write mode or executed if the entity comprises processing instructions,

– contains pointers to other information not explicitly part of a given metadata set.

Metadata often includes what the entity is, where it is located, and how to make use of it. It may describe entity properties such as format, structure, context, business rules, or any other chosen elements of its integral or associated data or capabilities. It may include the calling argument to methods, invocation of services, or similar executable commands that act on the content of an instance of the entity, including accessing it from its native storage format.


Metadata Example (1)

Consider a user looking for meteorological data. Metadata associated with a data resource that could support this includes

– general document metadata with the name of the data resource and the geographic locations from where it can be accessed; metadata specific to the function of the data resource, such as the date, time, and location where the data was collected,

– access control restrictions which must be satisfied (or possibly licensing terms if it is a commercial source) and a pointer to the service interface (e.g. WSDL) to retrieve the data,

– a pointer to pedigree information describing the quality of the data as evaluated based on how the data was collected and processed and the accuracy of the measurements.

The request for the meteorological data may generate a log file detailing the services invoked and resources used to satisfy the request, and the log file could be archived using a network storage service. Associated with the stored log would be metadata containing a log ID, the date of the request, and the identity of the requester. Note, in this example, the log file itself is not considered metadata but information describing the log file is. A pointer to the log metadata would be returned with the requested data so the requester would both know how the request was fulfilled and be able to point to the log as a repeatable means to satisfy a similar request in the future.


Metadata Example (2)

Consider the ways in which metadata for a book may be defined and used for different contexts.

For a librarian, the Library of Congress classification number is likely an important metadata element. 

Conversely, for a bookseller, the classification number is not likely to be as important but the current sales price would be (while this price may not be of interest to the librarian). 

The text in the book is unlikely to be identified as metadata, but specific quotes from the book may be metadata for someone advertising the book.
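
The book example can be expressed as context-dependent views over one set of data. The sketch below is illustrative: the field names, values, and context definitions are made up, and it simply selects the subset of an entity's data that a given context treats as metadata.

```python
book = {
    "title": "Example Title",
    "loc_classification": "QA76.9",  # made-up value for illustration
    "sales_price": 29.95,
    "quote": "A memorable line.",
}

# Hypothetical contexts and the elements each treats as metadata.
CONTEXTS = {
    "librarian": ("title", "loc_classification"),
    "bookseller": ("title", "sales_price"),
    "advertiser": ("title", "quote"),
}


def metadata_for(entity: dict, context: str) -> dict:
    """Select the subset of an entity's data that serves a given context."""
    return {key: entity[key] for key in CONTEXTS[context] if key in entity}


print(metadata_for(book, "librarian"))
print(metadata_for(book, "bookseller"))
```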


Types of Resources

AoA evaluation indicates two types of resources

– Data resource - source of content

    – Accepts request and returns a value or set of values in response

    – Response can be numerical or textual value(s), structured entity (e.g. schema), or attribute of entity (e.g. when schema last changed)

    – Values can be from static data store or dynamically generated

– Processing resource - source of instructions/tasking

    – Accepts a request and returns a status indicating extent to which task completed and how state of entities changed as result

    – One or more processing resources may be invoked as part of submitting a request (e.g. a query) and being returned a response

Other types commonly identified (e.g. metadata registries) but appear to fall under one of the two categories