IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas Baker GMD Carl Lagoze Cornell Univ.


Page 1: IFLA/DELOS/NSF Workshop: Standards and Metadata

EVA 2000 Moscow, November 2, 2000

Thomas Baker (GMD), Carl Lagoze (Cornell Univ.)

Page 2: Introductions

• Thomas Baker
– GMD Library, Bonn, Germany
– Dublin Core Executive Committee
– EU DELOS Network of Excellence

• Carl Lagoze
– Digital Library Research Group, Faculty of Computing and Information, Cornell University, Ithaca, NY, USA
– Dublin Core Advisory Committee
– NSF Digital Library Initiative

Page 3: Workshop Roadmap

• Introduction to Metadata (30 min.)
• Dublin Core Metadata Initiative (60 min.)

Break

• Simplicity and Complexity (45 min.)
• Metadata Infrastructure (45 min.)

Lunch

• Deploying and Using Metadata (90 min.)
• Metadata Landscape (30 min.)

Page 4: Introduction to Metadata

Page 5: Haven’t we done metadata already?

Page 6: What’s wrong with this model?

• Expensive
– Complex (even for its original goal?)
– Professional intervention (assumes single community of expertise)

• Monolithic
– One-size-fits-all approach
– Reflects its centralized system origins

• Bias towards physical artifacts
– Fixed resources
– Incomplete handling of resource evolution and other resource relationships

Page 7: Internet Commons includes Multiple Communities

[Diagram labels: Internet Commons; Scientific Data; Home Pages; Geo; Library; Museums; Commerce; Whatever...]

Page 8: Web Challenge to Traditional Cataloging

• Scale

• Permanence

• Authenticity

• Organizational Context

• Variety

Page 9: State of the Web as an Information System

• Search systems are motivated by advertising
• Index coverage is unpredictable and limited (1/3)
• Too much recall, too little precision
• Index spam abounds
• Resources (and their names) are volatile
• What about versions, editions, back issues?
• Archiving is presently unsolved
• Authority and quality of service are spotty
• Managing Intellectual Property Rights is hard

Page 10: Metadata: Part of a Solution

• Structured data about data
– helps to impose order on chaos
– enables automated discovery/manipulation

• Variety across various dimensions:
– specialization
– decentralization
– democratization

Page 11: Metadata Takes Many Forms

• resource discovery
• document administration
• rights management
• content rating
• security and authentication
• archival status
• products and services
• database schemas
• process control or description

Page 12: Metadata Challenges

• Accommodate multiple varieties of metadata
• Tension: functionality and simplicity
• Tension: extensibility and interoperability
• Human and machine creation and use
• Community-specific functionality, creation, administration, access

Page 13: Warwick Framework: Containing Chaos

• Conceptual architecture for metadata from the Warwick Metadata Workshop (DC-2)
• Conceptual architecture to support the specification, collection, encoding, and exchange of modular metadata
• Provide context for metadata efforts (including Dublin Core)
– avoids the “black hole” of comprehensive element sets
– focuses interoperability issues at package level

Page 14: Modularization Allows Distributed Management

• Communities of expertise (not software vendors) are responsible for:
– Semantics
– Registration
– Administration
– Access management
– Authority of data
– Sharing and distribution

Page 15: Interoperability requires conventions about:

• Semantics
– The meaning of the elements

• Structure
– human-readable
– machine-parseable

• Syntax
– grammars to convey semantics and structure

Page 16: Dublin Core Metadata Initiative

Page 17: History of the Dublin Core

• 1994: "Do we have a simple set of tags for ordinary people to describe their Web pages?"
• 1995: The Dublin Core: 13 elements, later 15
• 1996: The Dublin Core is but one of many vocabularies needed ("Warwick Framework")
• 1997: "WF needs formal expression in a Resource Description Framework (RDF)"
• 2000: Dublin Core Metadata Initiative recommends qualifiers, broadens its organizational scope beyond the Core

Page 18: A pidgin for digital tourists

• Metadata is language.
• Dublin Core is a small and simple language -- a pidgin -- for finding resources across domains.
• Speakers of different languages naturally "pidginize" to communicate
– E.g., tourists using simple phrases to order beer ("zwei Bier bitte" "dva pivo" "biru o san bai"...)

• We are all "tourists" on the global Internet.

Page 19: A grammar of Dublin Core

• http://www.dlib.org/dlib/october00/baker/10baker.html

• By design not as subtle as mother tongues, but easy to learn and extremely useful in practice

• Pidgins: small vocabularies (Dublin Core: fifteen special nouns and lots of optional adjectives)

• Simple grammars: sentences (statements) follow a simple fixed pattern...

Page 20: Example Dublin Core statements

• Resource has Title 'Grammar of Dublin Core'.
• Resource has Creator 'Tom Baker'.
• Resource has Subject 'Metadata'.
• Resource has Relation http://foo.org/file.htm.

Page 21: Resource has property X

[Diagram: the sentence pattern "Resource has property X", annotated as follows]
• Resource: implied subject
• has: implied verb
• property: one of 15 properties (DC:Creator, DC:Title, DC:Subject, DC:Date, ...)
• X: property value (an appropriate literal)
• [optional qualifier] [optional qualifier]: qualifiers (adjectives)

Page 22: The fifteen special nouns (properties)

Creator Title Subject

Contributor Date Description

Publisher Type Format

Coverage Rights Relation

Source Language Identifier

Page 23:

• Resource has Date "2000-06-13" [qualifiers: Revised, ISO8601]
• Resource has Subject "Languages -- Grammar" [qualifier: LCSH]

Page 24: Dumb-Down Principle for qualifiers

• The fifteen elements should be usable and understandable with or without the qualifiers

• Like saying that nouns can stand on their own without adjectives

• If your software encounters an unfamiliar qualifier, look it up -- or just ignore it!

Page 25:

• Resource has Date "2000-06-13" [qualifiers: Revised, ISO8601]
• Resource has Subject "Languages -- Grammar" [qualifier: LCSH]

To test whether qualifiers are "good", cover them with your hand and ask:
-- Does the statement still make sense?
-- Is it still correct?

Page 26: Element Refinements

• Make the meaning of an element narrower or more specific.
– a Date Created versus a Date Modified
– an IsReplacedBy Relation versus a Replaces Relation
• If your software does not understand the qualifier, you can safely ignore it.
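The dumb-down principle can be illustrated with a minimal Python sketch. The `Statement` class, its field names, and the `dumb_down` function are hypothetical names chosen for this illustration, not part of any Dublin Core specification; the sketch only assumes that a statement carries an element, a value, and up to two optional qualifiers.

```python
from dataclasses import dataclass
from typing import Optional

# The fifteen Dublin Core elements listed on the slides.
DC_ELEMENTS = {
    "Creator", "Title", "Subject", "Contributor", "Date", "Description",
    "Publisher", "Type", "Format", "Coverage", "Rights", "Relation",
    "Source", "Language", "Identifier",
}


@dataclass(frozen=True)
class Statement:
    """One 'Resource has <element> <value>' statement (hypothetical structure)."""
    element: str
    value: str
    refinement: Optional[str] = None  # element refinement, e.g. "Revised"
    scheme: Optional[str] = None      # value encoding scheme, e.g. "ISO8601"


def dumb_down(stmt: Statement) -> Statement:
    """Drop both qualifiers, keeping only the element and its value."""
    if stmt.element not in DC_ELEMENTS:
        raise ValueError(f"not a Dublin Core element: {stmt.element}")
    return Statement(stmt.element, stmt.value)


# The qualified Date statement from the slides.
revised = Statement("Date", "2000-06-13", refinement="Revised", scheme="ISO8601")
simple = dumb_down(revised)
print(simple.element, simple.value)
```

Software that does not recognize "Revised" or "ISO8601" can work with `simple` alone, which still reads as a correct statement: Resource has Date "2000-06-13".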

Page 27: Value Encoding Schemes

• Says that the value is
– a term from a controlled vocabulary (e.g., Library of Congress Subject Headings)
– a string formatted in a standard way (e.g., "2000-05-03" means May 3, not March 5)

• Even if a scheme is not known by software, the value should be "appropriate" and usable for resource discovery.
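As a small illustration of why an encoding scheme matters, the sketch below reads the date string from the slide with Python's standard library, which follows the ISO 8601 year-month-day order. The function name `read_iso8601_date` is a hypothetical helper for this example.

```python
from datetime import date


def read_iso8601_date(value: str) -> date:
    """Parse a YYYY-MM-DD string; raises ValueError if the string is malformed."""
    return date.fromisoformat(value)


d = read_iso8601_date("2000-05-03")
print(d.month, d.day)  # 5 3, i.e. May 3, not March 5
```

Software that does not know the scheme can still treat "2000-05-03" as a plain string for searching, which is the point of the last bullet above.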

Page 28: Peer review of proposals for new terms

• DCMI Usage Committee reviews proposals for new qualifiers (and perhaps elements)

• Evaluates proposals in light of grammatical principles (are the qualifiers ignorable?)

• Tiered model of approval status (tentative): proposed, conforming, recommended, obsolete

• First qualifiers "recommended" in July 2000
• http://purl.org/DC/documents/rec/dcmes-qualifiers-20000711.htm

Page 29: Open questions in Dublin Core

• What are "appropriate values" for the fifteen properties? How can they be used for cross-domain searching?

• How can DCMI control the evolution of Dublin Core as it is adapted in practice?

• How can an application use DC as a pidgin while describing resources with more complex metadata?

• Can we keep the Core simple?

Page 30: Search buckets versus description

• Think of DC elements as fuzzy search buckets
– Different types of data appropriate for different buckets: URLs, date strings, word strings, names
– Separate books about Sigmund Freud versus books by Sigmund Freud into different buckets

• Search bucket: for discovering resources
• But general, fuzzy categories may not be sufficient for describing resources
– After searching, display more detailed descriptions on screen
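The search-bucket idea can be sketched as a per-element index. Everything here is an assumption made for illustration: the record identifiers, the placeholder titles, and the `build_buckets` and `search` helpers are hypothetical and do not describe any particular search system.

```python
from collections import defaultdict

# Hypothetical records: each is a list of (element, value) pairs.
records = {
    "book-1": [("Creator", "Sigmund Freud"), ("Title", "A book by Freud")],
    "book-2": [("Subject", "Sigmund Freud"), ("Title", "A book about Freud")],
}


def build_buckets(records):
    """Index record ids by element, then by lower-cased value."""
    buckets = defaultdict(lambda: defaultdict(set))
    for record_id, statements in records.items():
        for element, value in statements:
            buckets[element][value.lower()].add(record_id)
    return buckets


def search(buckets, element, term):
    """Return the ids of records whose bucket for this element holds the term."""
    return sorted(buckets[element].get(term.lower(), set()))


buckets = build_buckets(records)
print(search(buckets, "Creator", "Sigmund Freud"))  # books by Freud
print(search(buckets, "Subject", "Sigmund Freud"))  # books about Freud
```

The same name lands in different buckets depending on the element, which is enough for discovery; a fuller description of each hit would be fetched and displayed after the search.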

Page 31: DCMI broadens its mission (Oct 2000)

• The mission of the DCMI is to make it easier to find resources using the Internet through the following activities:
– Developing metadata standards for discovery across domains (example: the Dublin Core)
– Defining frameworks for the interoperation of metadata sets
– Facilitating the development of community or disciplinary specific metadata sets that are consistent with items 1 and 2

Page 32: A context for the Core

• If "the Dublin Core" is the core of DCMI, what is the surrounding context?

• If "the Dublin Core" is the simple pidgin, what is the broader landscape of metadata language?

• How do pidgins relate to more complex models or "application profiles"?

• Do we need pidgins for describing other things, such as "people" and "events"?

Page 33: Using DC with other vocabularies

• Specialized application profiles [government information, education, mathematics] may need to:
– Use general-purpose Dublin Core elements
– Use elements from another, more domain-specific standard
– Narrow standard definitions of DC elements for specific local uses
– Invent local elements outside the scope of existing standards

Page 34: Example: adapting DC:Title to local uses

• As defined in the official Dublin Core "namespace":
– "Title: A name given to the resource"

• As defined in a UK "application profile":
– "Title: A name given to the collection"

• Definition is narrower

Page 35: Namespaces in translation

• Dublin Core has been translated into 26 languages
– machine-readable tokens are shared by all
– human-readable labels are defined in different languages
– translations are distributed, maintained in many countries

Page 36: One token - labels in many languages

[Diagram: the single token dc:creator, linked by rdfs:label to a label held on each of three servers]
• “Creator” [DCMI Server]
• “Verfasser” [Server in Germany]
• “Pencipta” [Server in Jakarta]
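A minimal sketch of the "one token, many labels" idea follows. The three labels come from the slide; the two-letter language codes, the `LABELS` table, and the `label_for` helper are assumptions made for this illustration, and a real deployment would fetch the labels from distributed schemas instead of one in-memory table.

```python
# One machine-readable token, several human-readable labels.
# Language codes (en, de, id) are assumed for illustration.
LABELS = {
    "dc:creator": {"en": "Creator", "de": "Verfasser", "id": "Pencipta"},
}


def label_for(token: str, lang: str, fallback: str = "en") -> str:
    """Return the label for a token in the given language, else the fallback label."""
    labels = LABELS[token]
    return labels.get(lang, labels[fallback])


print(label_for("dc:creator", "de"))
print(label_for("dc:creator", "fr"))  # no French label here, falls back to English
```

Because software matches on the token `dc:creator` and not on the label, records created with a German or Indonesian interface remain interchangeable.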

Page 37: RDF -- a more powerful sentence pattern

• Dublin Core statements:
– Resource has Creator "Tom Baker".
– Resource has Identifier http://foo.org/bar.html.

• Resource Description Framework "triples" - a more powerful way to say the same thing:
– http://foo.org/bar.html has Creator "Tom Baker".
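The step from Dublin Core statements to triples can be sketched in a few lines of Python. The `Triple` type and the `to_triples` function are hypothetical names for this illustration, and the `dc:` predicate spelling is an assumed convention; no RDF library is used.

```python
from typing import List, NamedTuple, Tuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def to_triples(statements: List[Tuple[str, str]]) -> List[Triple]:
    """Replace the implied subject 'Resource' with the value of the Identifier statement."""
    identifier = next(value for element, value in statements if element == "Identifier")
    return [
        Triple(identifier, f"dc:{element.lower()}", value)
        for element, value in statements
        if element != "Identifier"
    ]


statements = [
    ("Creator", "Tom Baker"),
    ("Identifier", "http://foo.org/bar.html"),
]
triples = to_triples(statements)
print(triples[0])
```

Making the subject explicit is what allows statements about several different resources to be mixed in one collection without ambiguity.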

Page 38: DCMI Re-organization

• Expanded mission
– Core metadata elements for Agents (or Events)?
– Frameworks for integrating multiple standards

• Re-organization model
– Membership organization like W3C or Unicode Consortium?
– Retain open consensus model
– International perspective
– Better training, documentation, outreach

Page 39: DCMI Open Metadata Registry

• Managing vocabularies defined by the DCMI
– Languages
– Versioning
– Controlled vocabularies

• Foundation for modular, incremental integration and evolution

• Collaboration with European SCHEMAS Project and ULIS in Tsukuba, Japan

• http://wip.dublincore.org/registry/

Page 40: Official recognition of the Dublin Core

• CEN Workshop Agreement
– endorse Dublin Core elements as CWA13874
– provide usage guidelines for European industry

• NISO Z39.85
– National Information Standards Organization, an ANSI affiliate
– Balloting concluded in August 2000

Page 41: DCMI Activities

• Standards development and maintenance
• Metadata registry
• Technical working groups and periodic workshops
• Tutorial materials and user guides
• Education and training
• Access to software
• Liaisons with other standards or user communities

Page 42: DC-9 Workshop in Tokyo, 2001

• DC-8 Workshop was at the National Library of Canada (Ottawa)
– emphasis on application profiles, longer-term organizational mission, and domain-specific adaptations of Dublin Core

• DC-9 in Tokyo: well-defined tracks
– implementation reports and research papers
– ongoing technical working group meetings
– general introduction and tutorials for non-experts

Page 43: Simplicity and Complexity

Page 44: Warwick Framework

• Container/Package approach to metadata
• Rejection of universal ontology
• Recognition of individual community needs
• Provide scope for metadata efforts

Page 45: Warwick Framework Design

Containers for aggregating Packages of typed metadata sets

[Diagram labels: Container; Package: MARC Metadata; Package: Indirect Reference; Package: Terms and Conditions; URI; Package: Dublin Core]
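The container/package arrangement can be sketched as a small data structure. The class names, fields, example values, and the `of_kind` method are all hypothetical choices for this illustration; the sketch assumes only what the slide states, namely that a container aggregates typed packages and that a package may be an indirect reference.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class MetadataPackage:
    """A typed metadata set carried inside the container."""
    kind: str   # e.g. "Dublin Core", "MARC", "Terms and Conditions"
    data: dict


@dataclass
class IndirectPackage:
    """A package whose metadata is stored elsewhere and referenced by URI."""
    uri: str


@dataclass
class Container:
    packages: List[Union[MetadataPackage, IndirectPackage]] = field(default_factory=list)

    def of_kind(self, kind: str) -> List[MetadataPackage]:
        """Return the packages of one metadata type, ignoring all the others."""
        return [
            p for p in self.packages
            if isinstance(p, MetadataPackage) and p.kind == kind
        ]


container = Container([
    MetadataPackage("Dublin Core", {"Title": "Hamlet"}),
    MetadataPackage("Terms and Conditions", {"access": "public"}),
    IndirectPackage("http://example.org/marc-record"),  # placeholder URI
])
print(container.of_kind("Dublin Core")[0].data)
```

A consumer that understands only Dublin Core can pull out that one package and skip the rest, which is how interoperability is handled at package level instead of inside one comprehensive element set.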

Page 46: Warwick Framework Implementation and Research

• Packaging, linking, storing, and transmitting component/package framework
• Semantic interactions and interoperability among multiple metadata packages/vocabularies

Page 47: Interoperability among Metadata Vocabularies

[Diagram labels: ABC core classes; Dublin Core; MARC; INDECS; IMS]

Page 48: Harmony Project

• Project Investigators
– Dan Brickley - ILRT, Bristol (U.K.)
– Jane Hunter - DSTC, Brisbane (Australia)
– Carl Lagoze - Computer Science, Cornell (U.S.)

• More Information
– http://www.ilrt.bris.ac.uk/discovery/harmony/

Page 49: Attribute/Value approaches to metadata…

The playwright of Hamlet was Shakespeare

• Sentence pattern: Hamlet (subject) has a (implied verb) [playwright (metadata adjective)] creator (metadata noun) Shakespeare (literal)
• Graph: R1 -- dc:title --> “Hamlet”; R1 -- dc:creator.playwright --> “Shakespeare”

Page 50: …run into problems for richer descriptions…

The playwright of Hamlet was Shakespeare, who was born in Stratford

• Hamlet has a creator Shakespeare
• Hamlet has a [birthplace] creator Stratford
• Graph: R1 -- dc:creator.playwright --> “Shakespeare”; R1 -- dc:creator.birthplace --> “Stratford”

Page 51: …because of their failure to model entity distinctions

[Graph: R1 -- title --> “Hamlet”; R1 -- creator --> R2; R2 -- name --> “Shakespeare”; R2 -- birthplace --> “Stratford”]
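The contrast between the flat attribute/value description and the entity-aware one can be sketched as follows. The dictionary, the list of tuples, and the two helper functions are hypothetical illustrations built from the R1/R2 labels on the slides; they are not the ABC model itself.

```python
# Flat attribute/value description: every value hangs off the resource R1,
# so the birthplace reads as if it described Hamlet and not its creator.
flat = {
    "dc:title": "Hamlet",
    "dc:creator.playwright": "Shakespeare",
    "dc:creator.birthplace": "Stratford",
}

# Entity-aware description: the creator is its own node, R2.
graph = [
    ("R1", "title", "Hamlet"),
    ("R1", "creator", "R2"),
    ("R2", "name", "Shakespeare"),
    ("R2", "birthplace", "Stratford"),
]


def objects(graph, subject, prop):
    """Return every object linked to the subject by the given property."""
    return [o for s, p, o in graph if s == subject and p == prop]


def birthplace_of_creator(graph, resource):
    """Follow resource -> creator -> birthplace through the graph."""
    places = []
    for creator in objects(graph, resource, "creator"):
        places.extend(objects(graph, creator, "birthplace"))
    return places


print(birthplace_of_creator(graph, "R1"))
print(objects(graph, "R1", "birthplace"))  # empty: Hamlet itself has no birthplace
```

In the graph form the birthplace is attached to the person, so the question "where was the creator born?" has an unambiguous path, while the flat form can only hint at it through the qualifier name.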

Page 52: Applying a Model-Centric Approach

• Formally define common entities and relationships underlying multiple metadata vocabularies

• Describe them (and their inter-relationships) in a simple logical model

• Provide the framework for extending these common semantics to domain and application-specific metadata vocabularies.

Page 53: Applications of the ABC Model

• Guidance for communities developing vocabularies

• Foundation for understanding existing vocabularies

• Basis for mappings among vocabularies using formalisms such as RDF

Page 54: Harmony/ABC Workshop

• January 27-28 2000 CNI Washington
• Representatives from
– Dublin Core, INDECS, MPEG-7, IFLA
– Archives, Museums, Libraries, Audiovisual

• Result: Importance of processes, events, and states in understanding and describing resources

Page 55: Conceptual Basis: Evolution of Content over Time

IFLA Entity Model

From Bearman et al., D-Lib Magazine, January 1999.

Page 56: Events help metadata relationships?

• Recognizing inherent lifecycle aspects of digital content - transformation of “input” resources to “output” resources and of their descriptions. (e.g., IFLA model)

• Modeling implied events as first-class objects provides attachment points for common entities – e.g., agents, contexts (times & places), roles.

• Clarifying attachment points facilitates mapping across common entities in different vocabularies.

Page 57: Content, Events, & Descriptions

[Diagram labels: resources R1, R2, R3, R4; events E1, E2, E3, E4; descriptions desc1, desc2]

Page 58: ABC Event Model

Page 59: A Simple Example: Live At Lincoln Performance

• Performance at The Lincoln Center for the Performing Arts
• On April 7, 1998 at 8pm Eastern time
• Orchestra is New York Philharmonic
• Musical score – “Concerto for Violin”
• 130 minute MP3 audio recording
• Rights held by Lincoln Center

Page 60: Example in ABC Model

Page 61: Derivation of Multiple Views

[Diagram labels: ABC Description in XML; CIDOC CRM Model; ID3 tags embedded in MP3; MPEG-7 description in DDL; Dublin Core in XML/RDF]

Page 62: Step 1 – Structural Mapping

[Diagram labels: Event-aware model; Resource-centric model]

Page 63: Structural Mapping Rules

Event attributes transferred to output:
• Context/Date, /Time, /Place -> Date.Performance, Time.Performance, Place.Performance
• Act/Role -> Agent.Role e.g. Orchestra
• Event Type -> Relation between input & output e.g. Performance -> Relation.isPerformanceOf
• Output Description generated from event Type and input Title e.g. “Performance of Concerto for Violin”
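The four structural mapping rules can be sketched as one Python function that flattens an event-aware description into a resource-centric one. The dictionary layout of `event` and `score`, the identifier `score-1`, the date and time formatting, and the function name `flatten_event` are assumptions made for this illustration; the slides do not specify a concrete data format.

```python
def flatten_event(event: dict, input_resource: dict) -> dict:
    """Transfer event attributes onto a description of the event's output resource."""
    event_type = event["type"]
    output = {}
    # Rule 1: Context/Date, /Time, /Place -> Date.<Type>, Time.<Type>, Place.<Type>
    for key in ("Date", "Time", "Place"):
        if key in event["context"]:
            output[f"{key}.{event_type}"] = event["context"][key]
    # Rule 2: Act/Role -> Agent.<Role>
    for act in event["acts"]:
        output[f"Agent.{act['role']}"] = act["agent"]
    # Rule 3: Event Type -> Relation between input and output
    output[f"Relation.is{event_type}Of"] = input_resource["id"]
    # Rule 4: Description generated from event Type and input Title
    output["Description"] = f"{event_type} of {input_resource['Title']}"
    return output


score = {"id": "score-1", "Title": "Concerto for Violin"}  # hypothetical identifier
performance = {
    "type": "Performance",
    "context": {
        "Date": "1998-04-07",
        "Time": "20:00",
        "Place": "Lincoln Center for the Performing Arts",
    },
    "acts": [{"role": "Orchestra", "agent": "New York Philharmonic"}],
}

recording = flatten_event(performance, score)
print(recording["Description"])
```

The flattening is lossy by design: the event disappears as a separate object, and its context and participants survive only as qualified properties of the output resource.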

Page 64: Step 2 – Semantic Mapping

Page 65: XSLT for Transformations

• Works well for structural and syntactic mapping between metadata descriptions
• Semantic mappings need to be hardcoded
• Unsuitable for loosely constrained or variable input

Page 66: A More General Solution

• Flexible semantic mappings require additional knowledge:
– Metadata Term Ontology – MetaNet

• Methods for using that context knowledge for mapping
– Some combination of procedural language (Java) and XSLT
– Investigating more general mapping rule language (analogies to compiler technology)

Page 67: Planned Experimental Context

• CIMI Experiments
– Dublin Core for basic resource descriptions
– Richer descriptions derived from ABC model
– Mapping among descriptions
– Understanding relationship between ABC and CIDOC CRM

• Connecting with Recordkeeping Metadata Issue - SPIRT Project

Page 68: Metadata Infrastructure

Page 69: Metadata is language

• Metadata schemas are languages for making statements about resources:
– Book has Title "Gone with the Wind".
– Web page has Publisher "Springer Verlag".

• Vocabulary terms (elements) are defined in standards like Dublin Core

• Metadata grammars constrain the statements and data models one can form

Page 70: But languages evolve with use

• Inevitably, languages resist stability
• People stretch official definitions
• Implementers misunderstand the intended meaning or use of elements
• Implementers coin local terms and extensions
• If the application does not fit the standard, the standard is often "customized" to fit the application

Page 71: Metadata languages are "multilingual"

• Metadata is not a spoken language
• The words of metadata -- "elements" -- are symbols that stand for concepts expressible in multiple natural languages

• Standards may have dozens of translations

• Are concepts like "title", "author", or "subject" used the same way in English, Finnish, and Korean?

Page 72: What metadata languages lack

• Comprehensive dictionaries
– Where can one get an overview of vocabulary terms used in metadata languages?

• A publication context for implementers
– Where can you see how they are using metadata?

• Standard grammars
– How do we understand the principles of metadata?

Page 73: Can we manage this evolution?

• How can we (scalably) monitor the usage of a language that is:
– Never spoken?
– Rarely published in a way that can be harvested?

• How can dictionary editors help a metadata language evolve and grow in response to usage?

• How can this evolution occur across (human) languages?

Page 74: RDF Schemas (RDFS) -- W3C standard

• A dictionary format for metadata terms:
– Simple XML format for terms and definitions

• Example: "Title" (Dublin Core)
– Human-readable label and definition:
   • Title: A name given to the resource.
– Unique, machine-readable identifiers
   • dc:title

• Support for cross-references
– between terms in related standards
– between local adaptations and related standards
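The dictionary-with-cross-references idea can be sketched in Python without any RDF tooling. The `TERMS` table, the `refines` field, the `ex:` prefix for a local adaptation, and the `broadest` helper are hypothetical names for this illustration; the two definitions are the ones quoted in the workshop for Title and for its narrower use in an application profile.

```python
# A tiny term dictionary: identifier -> label, definition, cross-reference.
TERMS = {
    "dc:title": {
        "label": "Title",
        "definition": "A name given to the resource.",
        "refines": None,
    },
    # Hypothetical local adaptation that narrows the standard term.
    "ex:title": {
        "label": "Title",
        "definition": "A name given to the collection.",
        "refines": "dc:title",
    },
}


def broadest(term_id: str) -> str:
    """Follow 'refines' cross-references up to the most general term."""
    while TERMS[term_id]["refines"] is not None:
        term_id = TERMS[term_id]["refines"]
    return term_id


print(broadest("ex:title"))
```

A harvester that meets the unfamiliar local term can follow the cross-reference and treat its values as ordinary `dc:title` values, which is the machine-readable counterpart of the dumb-down principle.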

Page 75: Print world versus the Web

• Traditional print world
– Standards are currently defined and published as paper documents or Web pages in HTML
– Metadata implementers rarely publish their local extensions and adaptations

• RDF Schemas (RDFS)
– Web-based publication format
– Explicit cross-references between implementation schemas and the standards on which they are based

Page 76: EOR -- an RDF Schema Browser

• Harvests RDF Schemas
– Schemas distributed on multiple Web servers
– Creates huge database of schemas for searching
– Web interface functions as a "metadata browser"
– Click on cross-references between linked terms

• Downloadable as open source software
– http://eor.dublincore.org/index.html
– Authors: Eric Miller (OCLC, RDF Working Group, DCMI) and Tod Matola

Page 77: Hyperlink Metadata Terms over the Web

• Index of metadata terms searchable as one huge database
• Click on cross-references to follow term-to-term links between vocabularies
• Point-to-point, like the Web itself
– In 1992, Gopher located the right file within directory trees (but not points within the file)
– HTML enabled point-to-point links between documents

Page 78: "Editor" -- a MARC relator -- refines "Contributor"

Page 79: Follow the link to MARC Relator Terms

Page 80: ...the source of which looks like this:

Page 81: ...or to Contributor [here, in English, French, German]

Page 82: Or view the schema of MyRDF itself...

Page 83: ...itself an RDF schema like the others

Page 84: Registries can function as dictionaries

• Historically, dictionaries of English, French, etc. recorded variants, prescribed forms, and helped standardize (national) languages
• Metadata dictionaries can help metadata vocabularies evolve more like other human languages
– Not just top-down, like traditional standards
– Also bottom-up, in response to usage

Page 85: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

Dictionaries prescribe and describe

• Prescribe definitions and recommend usage

• Describe how terms are actually used
  – Monitor usage through collecting examples

• Editors and usage boards must strike a balance between prescription and description.

Page 86: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

SCHEMAS Project -- a Thin Registry

• http://www.schemas-forum.org, an EU Project

• Pointers to resources elsewhere (a "thin" registry or portal)

• Short descriptions of metadata standards activities

• Critical commentaries by domain experts

• Promote the publication of schemas (in RDF)

• Goal: help implementors discover how others (e.g. EU Projects) are using standards in order to harmonize usage

Page 87: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

DCMI -- a Thick Registry

• A thick registry: stores official metadata element definitions in a central database or repository

• Managing a namespace (as a standards agency): publish qualifiers as available, with version control
  – Managing translations of the standard in multiple languages

• Eventually:
  – User guide interface
  – Support for standardisation processes (peer review)
  – Downloadable input to software tools for generating, editing, validating DC metadata

Page 88: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

Dictionaries as a tool for harmonization

• Knowledge of how other projects are using standards will avoid "reinventing the wheel"

• To help information providers harmonize their schemas for improved access within domains:
  – Between countries (Nordic Metadata Project)
  – Preprint repositories (Open Archives Initiative)
  – Subject gateways (Renardus)
  – Theses and dissertations (NDLTD)
  – Mathematics and physics (MathNet, PhysNet)

Page 89: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

A global registry infrastructure?

• Analogously to HTML for text, RDF Schema format suggests a scalable ecology of metadata vocabularies on the Web

• Sharing machine-readable elements translated into many languages suggests a global (multilingual) metadata language for digital libraries

• Can a well-managed registry infrastructure allow this language to evolve -- with flexible innovation in usage alongside more stable standards?

Page 90: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 The scope of registries

• Anything "semantic" (terms and definitions) is potentially an RDF schema:
  – controlled vocabularies
  – namespaces, application profiles, annotations
  – the "schema" of the registry itself

• Application constraints can be modelled in XML Schemas
  – "title is mandatory"; "date must be after 1980"

• Will XML and RDF Schemas merge?
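
The two sample constraints on this slide ("title is mandatory"; "date must be after 1980") can be illustrated with a small check. This is only a sketch in Python, not an XML Schema; the record layout (a plain dictionary), the YYYY-MM-DD date format, and the function name `check_record` are assumptions made for the illustration.

```python
from datetime import date

def check_record(record):
    """Return a list of constraint violations for one metadata record (a dict)."""
    problems = []
    # Constraint 1: "title is mandatory"
    if not (record.get("title") or "").strip():
        problems.append("title is mandatory")
    # Constraint 2: "date must be after 1980" (assumes YYYY-MM-DD dates)
    raw = record.get("date")
    if raw is not None:
        try:
            when = date.fromisoformat(raw)
        except ValueError:
            problems.append("date is not a valid YYYY-MM-DD value")
        else:
            if when.year <= 1980:
                problems.append("date must be after 1980")
    return problems
```

A record that satisfies both constraints yields an empty list, so a caller can treat "no problems" as valid.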

Page 91: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

Deploying and Using Metadata

EVA 2000 Moscow

Page 92: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

Syntax Alternatives: HTML

• Advantages:
  – Simple mechanism: META tags embedded in content
  – Widely deployed tools and knowledge

• Disadvantages:
  – Limited structural richness (won’t support hierarchical, tree-structured data or entity distinctions)
  – Limited formalisms (parsing and schema definition)

Page 93: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 Dublin Core in HTML

<link rel="schema.DC" href="http://purl.org/dc">
<meta name="DC.Title" content="Business Unusual">
<meta name="DC.Creator" content="Carl Lagoze">
<meta name="DC.Subject" content="bibliographic control web cataloging">
<meta name="DC.Date" scheme="W3CDTF" content="2000-10-23">
<meta name="DC.Format" content="text/html">
<meta name="DC.Identifier" content="http://lcweb.loc.gov/lagoze_paper.html">
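
To show how such embedded tags might be read back out, here is a minimal sketch using Python's standard `html.parser` module. The class name `DublinCoreMetaParser` and the shortened sample page are assumptions for illustration; the sketch simply keeps every META tag whose name starts with "DC.".

```python
from html.parser import HTMLParser

class DublinCoreMetaParser(HTMLParser):
    """Collect (element, content) pairs from <meta name="DC.*"> tags."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.startswith("DC."):
            # Strip the "DC." prefix and keep the element name and its value
            self.pairs.append((name[3:], attributes.get("content", "")))

# Shortened sample page (hypothetical), modelled on the slide
SAMPLE = """
<meta name="DC.Title" content="Business Unusual">
<meta name="DC.Creator" content="Carl Lagoze">
<meta name="DC.Date" scheme="W3CDTF" content="2000-10-23">
<meta name="description" content="not Dublin Core">
"""

parser = DublinCoreMetaParser()
parser.feed(SAMPLE)
dc_pairs = parser.pairs
```

The flat list of pairs also illustrates the limitation named earlier: META tags carry names and values but no nested structure.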

Page 94: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

Syntax Alternatives: XML

• The standard for networked text and data

• Widespread tool support
  – Parsers (DOM and SAX)
  – Extensibility (namespaces)
  – Type definition (XML Schema)
  – Transformation and rendering (XSLT)
  – Rich linking semantics (XLink)

Page 95: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 XML Schema

• Rich XML-based language for expressing type semantics

• Replaces arcane and limited DTD (origin in SGML)

• Facilities
  – Data typing (both complex and primitive)
  – Constraints
  – Defaults

Page 96: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 Dublin Core in XML

<metadata xmlns:dc="http://www.openarchives.org/OAI/dc.xsd">
  <dc:creator>Carl Lagoze</dc:creator>
  <dc:title>Accommodating Simplicity and Complexity in Metadata</dc:title>
  <dc:date>2000-07-01</dc:date>
  <dc:publisher>Cornell University, Computer Science</dc:publisher>
</metadata>
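
As a rough illustration of the namespace-aware tool support mentioned for XML, the sketch below reads the record with Python's standard `xml.etree.ElementTree`. The helper name `dc_fields` is an assumption; the namespace URI is copied from the slide.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://www.openarchives.org/OAI/dc.xsd"  # namespace URI as given on the slide

SAMPLE = """<metadata xmlns:dc="http://www.openarchives.org/OAI/dc.xsd">
  <dc:creator>Carl Lagoze</dc:creator>
  <dc:title>Accommodating Simplicity and Complexity in Metadata</dc:title>
  <dc:date>2000-07-01</dc:date>
  <dc:publisher>Cornell University, Computer Science</dc:publisher>
</metadata>"""

def dc_fields(xml_text):
    """Map each element in the dc namespace to the list of its values."""
    root = ET.fromstring(xml_text)
    prefix = "{" + DC_NS + "}"  # ElementTree writes qualified names as {uri}local
    fields = {}
    for child in root:
        if child.tag.startswith(prefix):
            name = child.tag[len(prefix):]
            fields.setdefault(name, []).append((child.text or "").strip())
    return fields

fields = dc_fields(SAMPLE)
```

Values are kept in lists because Dublin Core elements are repeatable, so a record may carry several creators or dates.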

Page 97: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

Syntax Alternatives: RDF

• RDF (Resource Description Framework)

• The instantiation of the Warwick Framework on the Web

• Provides enabling technology for richly structured metadata

• Rich data model supporting notions of distinct entities and properties

• Syntax expressed in XML

Page 98: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 RDF Components

• Formal data model

• Syntax for interchange of data

• Schema Type system (schema model)

Page 99: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 RDF Data Model

• Directed labeled graphs

• Model elements
  – Resource
  – Property
  – Value
  – Statement
  – Containers

Page 100: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 RDF Model Primitives

[Diagram: a Resource, a Property, and a Value (which may itself be a Resource), together forming a Statement]

Page 101: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 RDF Syntax Example

[Diagram: URI:R with property dc:Title, value “CIMI Presentation”, and property dc:Creator, value “Eric Miller”]

<RDF xmlns="http://www.w3.org/TR/WD-rdf-syntax#"
     xmlns:dc="http://purl.org/dc/elements/1.0/">
  <Description about="URI:R">
    <dc:Title>CIMI Presentation</dc:Title>
    <dc:Creator>Eric Miller</dc:Creator>
  </Description>
</RDF>
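
The same two statements can be written down as resource/property/value triples. The sketch below is one possible in-memory representation in Python, not an RDF library; the names `Statement` and `values_of` are assumptions for illustration.

```python
from typing import NamedTuple

class Statement(NamedTuple):
    resource: str  # the thing being described
    prop: str      # the property (a named attribute)
    value: str     # a literal here; it could also name another resource

DC = "http://purl.org/dc/elements/1.0/"  # namespace as given on the slide

graph = [
    Statement("URI:R", DC + "Title", "CIMI Presentation"),
    Statement("URI:R", DC + "Creator", "Eric Miller"),
]

def values_of(statements, resource, prop):
    """All values that `resource` has for `prop`."""
    return [s.value for s in statements
            if s.resource == resource and s.prop == prop]
```

Because every statement has the same three-part shape, statements from different sources can be pooled into one list and queried together.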

Page 102: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

RDF Model Example #2

[Diagram: URI:R has dc:Title “CIMI Presentation” and oa:Creator URI:ERIC; URI:ERIC has bib:Name “Eric Miller”, bib:Email “[email protected]”, and bib:Aff URI:OCLC (“OCLC”)]

Page 103: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

<RDF xmlns="http://www.w3.org/TR/WD-rdf-syntax#"
     xmlns:dc="http://purl.org/dc/elements/1.0/"
     xmlns:bib="http://www.bib.org/persons#">
  <Description about="URI:R">
    <dc:Title>CIMI Presentation</dc:Title>
    <oa:Creator>
      <Description>
        <bib:Name>Eric Miller</bib:Name>
        <bib:Email>[email protected]</bib:Email>
        <bib:Aff resource="http://www.oclc.org" />
      </Description>
    </oa:Creator>
  </Description>
</RDF>

RDF Syntax Example #2
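
In this second example the value of the creator property is itself a resource with properties of its own. A minimal sketch of walking such a chain in Python follows; the tuple layout and the function name `follow` are assumptions, and the identifier URI:ERIC is borrowed from the model diagram (the XML itself leaves the inner description unnamed).

```python
# Each statement is (resource, property, value); a value may name another resource.
statements = [
    ("URI:R", "dc:Title", "CIMI Presentation"),
    ("URI:R", "oa:Creator", "URI:ERIC"),
    ("URI:ERIC", "bib:Name", "Eric Miller"),
    ("URI:ERIC", "bib:Aff", "http://www.oclc.org"),
]

def follow(statements, start, path):
    """Walk a chain of properties from `start`, returning every value reached."""
    current = [start]
    for prop in path:
        current = [value for (resource, p, value) in statements
                   if resource in current and p == prop]
    return current
```

For instance, `follow(statements, "URI:R", ["oa:Creator", "bib:Name"])` steps from the presentation to its creator and then to the creator's name.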

Page 104: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 RDF Containers

• Permit the aggregation of several values for a property

• Express multiple aggregation semantics
  – unordered
  – sequential or priority order
  – alternative
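
The three aggregation semantics correspond to RDF's Bag, Seq, and Alt containers. The Python sketch below only illustrates how the semantics differ; the function names are assumptions, and the rule of falling back to the first listed alternative is an assumption of this sketch.

```python
from collections import Counter

def bag_equal(a, b):
    """Bag: unordered, duplicates allowed, so compare as multisets."""
    return Counter(a) == Counter(b)

def seq_equal(a, b):
    """Seq: order is significant, so compare position by position."""
    return list(a) == list(b)

def alt_choose(alternatives, acceptable):
    """Alt: pick the first alternative the consumer accepts.

    ASSUMPTION: if none is acceptable, fall back to the first listed value.
    """
    for option in alternatives:
        if option in acceptable:
            return option
    return alternatives[0]
```

Two author lists that differ only in order are therefore equal as a Bag but different as a Seq.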

Page 105: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 RDF Schemas

• Declaration of vocabularies
  – properties defined by a particular community
  – characteristics of properties and/or constraints on corresponding values

• Schema Type System - Basic Types
  – Property, Class, SubClassOf, Domain, Range
  – Minimal (but extensible) at this time
  – minimize significant clashes with typing system designed for XML Schema WG

• Expressible in the RDF model and syntax

Page 106: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

Relationships among vocabularies

dc:Creator

ms:director

marc:100

bib:Author
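
One way such relationships could be put to work is to map specialised properties onto a broader shared one before merging records. The sketch below assumes, purely for illustration, that ms:director, marc:100, and bib:Author are each declared as refinements of dc:Creator; the slide does not state the exact relationships, and the names `REFINES` and `to_common_property` are hypothetical.

```python
# ASSUMPTION: each specialised property is treated as a refinement of dc:Creator.
REFINES = {
    "ms:director": "dc:Creator",
    "marc:100": "dc:Creator",
    "bib:Author": "dc:Creator",
}

def to_common_property(record):
    """Rewrite (property, value) pairs onto the broader property where one is declared."""
    return [(REFINES.get(prop, prop), value) for prop, value in record]
```

Properties without a declared relationship pass through unchanged, so nothing is lost for vocabularies the mapping does not cover.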

Page 107: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 Bringing it together

• RDF Metadata transmission
  – Embedded (e.g. <META>), Transmitted with resource (HTTP), Trusted 3rd Party (HTTP GET)

• RDF Data Model
  – Support consistent encoding, exchange and processing of metadata… critical when aggregating data from multiple sources

• RDF Schema
  – Declare, define, reuse vocabularies

Page 108: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

Open Archives Initiative

http://www.openarchives.org

Page 109: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 What is Interoperability?

• Naming?
  – Handles
  – PURLs

• Metadata?
  – MARC
  – Dublin Core

• Document models?
  – WebDAV

• Federated searching?
  – Z39.50?
  – DASL?

• Services and Protocols?
  – Dienst

Page 110: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 Partitioning Interoperability

Document Models

Metadata Harvesting

Mediator Services: Linking, Searching, Summarizing

Page 111: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

The World According to OAI

[Diagram: Service Providers (Searching, Current Awareness, Summarization) harvesting from Data Providers]

Page 112: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 UPS Meeting Results

• Establishment of Open Archives Initiative
  – Loose coalition to experiment with interoperability solutions

• Santa Fe Convention
  – Organizational and technical framework to support metadata harvesting for ePrint archives

Page 113: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

Metadata Harvesting is not New

• Harvest Project (1992-1995)
  – DARPA-funded
  – Mike Schwartz (U. Colorado), Mic Bowman (Penn State), Udi Manber (U. Arizona)

Page 114: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 “Open” Archives

• Political Agenda?
  – Author self-archiving of E-Prints
  – “Mission” to reformulate scholarly publishing framework

• Technical?
  – Infrastructure to facilitate interoperability across multiple domains

Page 115: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

Other communities of interest

• “Cambridge” digital library federation meetings
  – research library community has many materials for which they’d like to ‘expose’ metadata

• San Antonio OAI workshop
  – librarians, publishers (some), others

Page 116: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

Technical Umbrella for Practical Interoperability…

Reference Libraries Publishers

E-Print Archives

…that can be exploited by different communities

Page 117: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 Acting mission statement

Supply and promote an application independent technical framework – a supportive infrastructure that empowers different scholarly communities to pursue their own interests in interoperability in the technical, legal, business, and organizational contexts that are appropriate to them.

Dan Greenstein, Director DLF

Page 118: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

What does this REALLY Mean?

• Keep the bar low enough to make widespread adoption possible

• Provide enough back-doors to make true “disruption” possible (e.g., ePrint community):
  – refine record notion to mandate full-content connection
  – refine metadata to mandate linkage to full-content

Page 119: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 Organizational Stability

• Institutional backing of CNI (Coalition for Networked Information) and DLF (Digital Library Federation)

• Formation of steering committee
  – first steps towards international involvement

Page 120: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

Framework for Partitioning Tasks

• Steering Committee
  – policy guidance

• Technical Committee
  – technical specifications

• Workshops
  – public dissemination, feedback, community-building

Page 121: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 Ithaca Technical Meeting

• Input
  – experiences gained with implementing & discussing the current Santa Fe Convention (SFc) specs
  – emerging interest for the application of SFc-concepts as a general interoperability framework in a scholarly environment

Page 122: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 Ithaca technical meeting

• Output
  – guidelines for an in-depth revised technical spec to be issued early 2001
  – stable for experimentation; not definitive
  – minimize risk for early adopters
  – maximize chances for future interoperability across communities

Page 123: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

underlying concepts

abstract principles

concrete implementation of principles

Components of OAI Model

Page 124: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

service providers

records in an archive

open interface to archives

managed archives (data providers)

OAI Underlying Concepts

Page 125: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

Building on Underlying Concepts

Abstract principles and the implementation of each principle:

• metadata harvesting: OAI harvesting protocol

• identifiers: URIs (community schemes)

• metadata set formats: DC & XML container (parallel sets)

• acceptable use: Flow Control (usage restrictions)

• registration: (community specific)

Page 126: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 What is a record?

A record in an archive is a metadata record. The metadata record describes full content and can contain an entry point to it.

Page 127: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

We recognize that archives will use specific metadata sets and formats that suit the needs of their communities and the types of data they handle. However, interoperability depends on a shared format for exchanging metadata and therefore archives should implement the basic Open Archives Metadata Set.

Metadata: Interoperability & Extensibility

Page 128: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

• Adoption of unqualified Dublin Core Element Set as required metadata.

• Support for parallel metadata sets maintained
  – EPMS (e-print community)
  – Others
    • Research library community
    • Museum community

Metadata Solutions

Page 129: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 Metadata XML Container

<record>
  <header>
    <identifier>oai:arXiv:hep/001001</identifier>
    <datestamp>1999-12-25</datestamp>
  </header>
  <metadata xmlns:dc="http:…">
    <dc:creator>Ernest Rutherford</dc:creator>
    <dc:title>Investigations of Radioactivity</dc:title>
    <dc:identifier>doi:1234/5432</dc:identifier>
  </metadata>
</record>
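
A harvester would typically separate the header (identifier and datestamp) from the metadata payload. The Python sketch below shows one way to do that with the standard library. Since the slide elides the namespace URI ("http:…"), the URI used here is a clearly labeled placeholder, and the function name `split_record` is an assumption.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: the slide elides the namespace URI ("http:…"); this is a placeholder only.
PLACEHOLDER_DC_NS = "http://example.org/placeholder-dc"

RECORD = f"""<record>
  <header>
    <identifier>oai:arXiv:hep/001001</identifier>
    <datestamp>1999-12-25</datestamp>
  </header>
  <metadata xmlns:dc="{PLACEHOLDER_DC_NS}">
    <dc:creator>Ernest Rutherford</dc:creator>
    <dc:title>Investigations of Radioactivity</dc:title>
    <dc:identifier>doi:1234/5432</dc:identifier>
  </metadata>
</record>"""

def split_record(xml_text):
    """Return (identifier, datestamp, metadata_dict) for one harvested record."""
    root = ET.fromstring(xml_text)
    identifier = root.findtext("header/identifier")
    datestamp = root.findtext("header/datestamp")
    metadata = {}
    for child in root.find("metadata"):
        local_name = child.tag.split("}", 1)[-1]  # drop the "{namespace}" part
        metadata[local_name] = (child.text or "").strip()
    return identifier, datestamp, metadata

identifier, datestamp, metadata = split_record(RECORD)
```

Note that the header identifier (the key within the archive) and the dc:identifier inside the metadata (here a DOI) are distinct values.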

Page 130: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 Identifier Issues

• Basic identifier constraints based on URI specifications
  – A key for requesting a record from a repository
  – Key and metadata format ID uniquely identify a record

• Individual communities may develop URN registration schemes

Page 131: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 Identifier Solutions

full-identifier = oai:archive-identifier:record-identifier

• oai: registered URI scheme

• archive-identifier: archive identifier, registered within OAI

• record-identifier: unique ID within archive (syntax is archive-specific)

example = oai:ncstrl:ncstrl.cornellcs/TR94-1418
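
The three-part structure can be taken apart mechanically. The following Python sketch splits a full identifier on the first two colons only, since the record part is archive-specific and may contain further punctuation; the function name `parse_oai_identifier` is an assumption.

```python
def parse_oai_identifier(full_identifier):
    """Split 'oai:archive-identifier:record-identifier' into its two variable parts."""
    # Split at most twice so the archive-specific record part stays intact.
    # Too few parts makes the unpacking itself raise ValueError.
    scheme, archive, record = full_identifier.split(":", 2)
    if scheme != "oai" or not archive or not record:
        raise ValueError(f"not an oai identifier: {full_identifier!r}")
    return archive, record
```

Applied to the example on the slide, this yields the archive identifier `ncstrl` and the record identifier `ncstrl.cornellcs/TR94-1418`.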

Page 132: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

Repositories, Identifiers, and Records

Identifier

Datestamp

MF1 MF2 MF3 MF4

<record> <header> … </header> <metadata> …. </metadata></record>
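
The idea that an identifier plus a metadata format addresses exactly one record can be sketched as a lookup table keyed by that pair. The toy repository below is hypothetical: the stored strings are stand-ins, the format labels MF1 and MF2 follow the diagram, and the function names are assumptions.

```python
# A toy repository: each (identifier, metadata format) pair addresses one record.
repository = {
    ("oai:ncstrl:ncstrl.cornellcs/TR94-1418", "MF1"): "<metadata>…MF1 view…</metadata>",
    ("oai:ncstrl:ncstrl.cornellcs/TR94-1418", "MF2"): "<metadata>…MF2 view…</metadata>",
}

def get_record(identifier, metadata_format):
    """Look up one record; None if the item is not offered in that format."""
    return repository.get((identifier, metadata_format))

def formats_for(identifier):
    """List the metadata formats available for one identifier."""
    return sorted(fmt for (ident, fmt) in repository if ident == identifier)
```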

Page 133: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 Selective harvesting

• Recognized need for light-weight facility for selective harvesting
  – By Date

• Sets
  – A low-cost means of selective harvesting
  – NOT a general tool for defining global categories
  – Attribution of meanings to sets can be done within communities and in bilateral fashion
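
Selective harvesting by date and by set amounts to two simple filters. The Python sketch below runs them over a hypothetical list of record headers; the sample identifiers, the set names, and the function name `select` are all assumptions for illustration.

```python
from datetime import date

# Hypothetical harvested headers: (identifier, datestamp, sets the record belongs to).
HEADERS = [
    ("oai:demo:1", date(1999, 12, 25), {"S1"}),
    ("oai:demo:2", date(2000, 6, 1), {"S1", "S2"}),
    ("oai:demo:3", date(2000, 10, 15), {"S2"}),
]

def select(headers, since=None, set_name=None):
    """Keep identifiers stamped on or after `since` and, if given, in `set_name`."""
    chosen = []
    for identifier, stamp, sets in headers:
        if since is not None and stamp < since:
            continue
        if set_name is not None and set_name not in sets:
            continue
        chosen.append(identifier)
    return chosen
```

The set names carry no meaning in the code itself, which matches the point that their meaning is agreed within communities rather than defined globally.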

Page 134: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 Protocol Solutions

• Normalized and Enhanced Verb Set
  – GetRecord
  – Identity
  – ListIdentifiers
  – ListMetadataFormats
  – ListRecords
  – ListSets

Page 135: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 Protocol Solutions

• CGI-script friendly syntax
  – baseurl?verb=verbname&argname=argval...
  – verbname is the name of the verb
  – argname is the name of the attribute
  – argval is the value of the attribute

• Example: http://foo/blaz?verb=ListRecords&set=S1
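
The request syntax can be composed and taken apart with Python's standard `urllib.parse`. This is a sketch only: the verb names are copied from the verb list on the slides, and the function names `build_request` and `parse_request` are assumptions.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl

# Verb names exactly as listed on the slides.
VERBS = {"GetRecord", "Identity", "ListIdentifiers",
         "ListMetadataFormats", "ListRecords", "ListSets"}

def build_request(base_url, verb, **arguments):
    """Compose baseurl?verb=verbname&argname=argval..."""
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb}")
    # The verb comes first, followed by the arguments in a stable order
    query = urlencode([("verb", verb)] + sorted(arguments.items()))
    return f"{base_url}?{query}"

def parse_request(url):
    """Recover (verb, arguments) from a request URL."""
    pairs = dict(parse_qsl(urlsplit(url).query))
    verb = pairs.pop("verb")
    return verb, pairs
```

Calling `build_request("http://foo/blaz", "ListRecords", set="S1")` reproduces the example URL on the slide.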

Page 136: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 Registration Solutions

• Automation through:
  – On-line registration of:
    • Archive identifier (uniqueness enforcement)
    • base-url of archive’s OAI protocol implementation
  – Identity verb that exposes archive characteristics
  – Use of protocol for registration of metadata formats and validity checking

• Registration of service providers is still an open issue
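
The two registered items (a unique archive identifier and a base URL) suggest a very small data structure. The Python sketch below is a toy, in-memory registry that enforces uniqueness; the class name `Registry`, its methods, and the sample URL are hypothetical and do not describe any actual OAI registration service.

```python
class Registry:
    """Toy registry mapping each archive identifier to its base URL."""

    def __init__(self):
        self._base_urls = {}

    def register(self, archive_identifier, base_url):
        # Uniqueness enforcement: an identifier can be claimed only once
        if archive_identifier in self._base_urls:
            raise ValueError(f"archive identifier already registered: {archive_identifier}")
        self._base_urls[archive_identifier] = base_url

    def base_url(self, archive_identifier):
        """Return the base URL at which the archive's protocol implementation lives."""
        return self._base_urls[archive_identifier]

registry = Registry()
registry.register("ncstrl", "http://example.org/ncstrl/oai")  # placeholder URL
```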

Page 137: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 Release Schedule

• October 15 – normalized meeting notes distributed to meeting group

• November 1 – beta specification to steering committee and limited distribution

• Early January – stabilization of specification and public meeting

Page 138: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

Metadata Landscape

EVA 2000 Moscow

Page 139: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 Conferences

• ACM Digital Libraries 2001, San Antonio, June 2001, http://www.dl00.org/

• European Conference on Digital Libraries, Darmstadt, Sep 2001 http://www.ecdl2001.org

• Asian Digital Library Conference, Seoul, December 2000, http://ADL2000.kaist.ac.kr

• Tenth International WWW Conference, Hong Kong, May 2001, http://www10.org

Page 140: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

NSF Digital Library Initiative

• Phase I (1994-1998): six large-scale testbeds involving research universities, industrial partners, and next-generation technologies

• Phase II (1999+): expanded scope, smaller projects as well as large testbeds, emphasis on making accessible new types of content

Page 141: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

Distributed National Electronic Resource (UK)

• A managed environment for Internet access to scholarly journals and other materials relevant to higher education in the UK

• Uses international standards (e.g., Dublin Core)

• National purchase and licensing agreements for best value to UK education community

• eLib research funding since mid-1990s emphasized incremental improvement of standards and services

Page 142: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 Global Info (Germany)

• "The German Digital Library Project"

• Since 1996, integrating access to scientific information among libraries, publishers, learned societies, and individual scientists

• Emphasis on open standards (e.g., Dublin Core) and open-standard formats (e.g., XML, RDF, MPEG)

Page 143: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 European Union

• Fifth Framework Programme, 1998-2002
  – several dozen projects with several countries each
  – Digital Heritage, Cultural Content
  – Interactive Electronic Publishing
  – Multimedia Content and Tools

• DELOS Network of Excellence
  – http://www.ercim.org/delos/
  – Communication within European digital library research community and international networking

Page 144: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 MathNet

• German Mathematical Societies index math pre-prints and home pages of mathematicians
  – Encourages use of Dublin-Core-based metadata by distributing free metadata editor; displays hits "with metadata" separately from hits "without metadata"

• International Mathematical Union (IMU) planning international Web service based on German MathNet model

• Seeking international agreement on simple metadata profiles for types of math materials

Page 145: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

IMS Global Learning Consortium, Inc.

• Teachers seeking appropriate classroom materials on Web may want to know:
  – for which age-group?
  – has it already been used successfully in classrooms?
  – will it work on my equipment?

• IMS: Rich descriptions of learning resources in a standard record format

Page 146: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

Federal Geographic Data Committee

• (US) FGDC Content Standard for Digital Geospatial Metadata: integrate access to resources about a particular area found in diverse repositories

• Government, education, and business needs
  – Emergency management
  – Integrated databases and comprehensive maps
  – City planning
  – Environmental control

Page 147: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

Visual Resources Association

• VRA Core Categories in a two-level model for describing objects such as paintings and buildings

• "Works" described separately from "images" of those works (One-to-One Principle)

• Conceptual clarity of One-to-One Principle implies more complex work-flow and processing for catalogers and software

Page 148: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 Nordic Metadata Project

• Cooperation between Scandinavian countries (since circa 1996)

• Pioneered idea of metadata-based distributed index across national boundaries

• NetLab (Lund University) maintains SAFARI, which harvests Dublin-Core-based metadata embedded in documents on Web servers

Page 149: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 Renardus Project (EU)

• http://www.konbib.nl/coop/reynard
  – National libraries (Netherlands coordinates)
  – NDR: National Digital Resource in UK
  – Die Deutsche Bibliothek

• Goal: integrated access to subject gateways in Europe

• High-level agreement on simple, Dublin-Core-based schema as common denominator

Page 150: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

Networked Digital Library of Theses and Dissertations (NDLTD)

• http://www.ndltd.org

• International consortium of projects putting dissertations online

• Difficult to agree on single unified metadata schema -- national, legal, and disciplinary requirements differ significantly

• NDLTD agreement on a small Dublin-Core-based set of metadata elements?

Page 151: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 CIDOC

• International Council of Museums: object-oriented model (CIDOC) designed for describing multiple entities that may be
  – physical (e.g., museum objects)
  – conceptual (e.g., works)
  – temporal (e.g., historical periods)
  – spatial (e.g., places)

• Implies an integrated information space of "encyclopedic" scope

Page 152: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 Rich Site Summary (RSS)

• Metadata for content syndication (news feeds)

• Used in developing media content portals

• Built on established vocabularies (DC), uses RDF syntax

• Layers of application-specific semantics: syndication vocabularies, annotation vocabularies, etc.

Page 153: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

Moving Picture Experts Group (MPEG)

• MPEG 4: encoding and interacting with audio-visual objects

• MPEG 7: multimedia content description interface for such objects

• MPEG 21: ambitious "umbrella" framework describing the infrastructure for delivering and consuming multimedia content

Page 154: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 More...

• INDECS - Uses an event-based model to describe intellectual property rights for commercial transactions

• DOI - Uses the INDECS framework with a Digital Object Identifier for content description and management of references between scientific, technical, and medical journals

• BSR - Basic Semantic Registry as a universal interlingua of concepts

• GILS - Government Information Locator Service

Page 155: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 ...and more...

• PDS - Planetary Data System

• IEEE Learning Object Metadata - an elaborate, hierarchical scheme for describing multiple facets of educational material

• MARC 21 - Machine Readable Cataloging format and related vocabularies for libraries

• EPICS Data Dictionary, a subset of which -- ONIX -- describes books in a specific XML format (pushed by Amazon.com)

Page 156: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000 For further information....

• "Metadata Watch Reports" of SCHEMAS Project, http://www.schemas-forum.org
  – Critical overview (with expert commentary) on the metadata landscape as it evolves
  – Related database of individual activity reports

• D-Lib Magazine, http://www.dlib.org/dlib/

• Ariadne, http://www.ariadne.ac.uk

Page 157: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

Why the Web won

• Tim Berners-Lee's original model was very simple, and it was easy to implement

• Real-world experience with simple HTML led iteratively to better understanding of priorities
  – As with bicycles and airplanes, there was no "theory" for design -- design was perfected iteratively, starting simple

• Complex standards impose significant costs, especially if legacy data must be converted

Page 158: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

Learning from experience

• People are only human: the most perfect language is always subject to interpretation

• By design, metadata languages must allow for innovation and evolution

• Physics and art history, Chinese and Finnish -- different languages will continue in real life

• Likewise, a diversity of metadata languages is inevitable

• Interoperability over "everything" can only be via a simple and general pidgin

Page 159: IFLA/DELOS/NSF Workshop Standards and Metadata EVA 2000 Moscow November 2, 2000 Thomas BakerGMD Carl LagozeCornell Univ.

EVA 2000

[email protected]