IFLA/DELOS/NSF Workshop: Standards and Metadata
EVA 2000 Moscow, November 2, 2000
Thomas Baker (GMD), Carl Lagoze (Cornell Univ.)

Introductions
• Thomas Baker
  – GMD Library, Bonn, Germany
  – Dublin Core Executive Committee
  – EU DELOS Network of Excellence
• Carl Lagoze
  – Digital Library Research Group, Faculty of Computing and Information, Cornell University, Ithaca, NY, USA
  – Dublin Core Advisory Committee
  – NSF Digital Library Initiative

Workshop Roadmap
• Introduction to Metadata (30 min.)
• Dublin Core Metadata Initiative (60 min.)
Break
• Simplicity and Complexity (45 min.)
• Metadata Infrastructure (45 min.)
Lunch
• Deploying and Using Metadata (90 min.)
• Metadata Landscape (30 min.)
Introduction to Metadata
EVA 2000 Moscow
EVA 2000
Haven’t we done metadata already?

What’s wrong with this model?
• Expensive
  – Complex (even for its original goal?)
  – Professional intervention (assumes single community of expertise)
• Monolithic
  – One size fits all approach
  – Reflects its centralized system origins
• Bias towards physical artifacts
  – Fixed resources
  – Incomplete handling of resource evolution and other resource relationships

Internet Commons includes Multiple Communities
[Diagram: the Internet Commons, shared by communities including Scientific Data, Home Pages, Geo, Library, Museums, Commerce, Whatever...]
EVA 2000
Web Challenge to Traditional Cataloging
• Scale
• Permanence
• Authenticity
• Organizational Context
• Variety

State of the Web as an Information System
• Search systems are motivated by advertising
• Index coverage is unpredictable and limited (1/3)
• Too much recall, too little precision
• Index spam abounds
• Resources (and their names) are volatile
• What about versions, editions, back issues?
• Archiving is presently unsolved
• Authority and quality of service are spotty
• Managing Intellectual Property Rights is hard

Metadata: Part of a Solution
• Structured data about data
  – helps to impose order on chaos
  – enables automated discovery/manipulation
• Variety across various dimensions:
  – specialization
  – decentralization
  – democratization

Metadata Takes Many Forms
• resource discovery
• document administration
• rights management
• content rating
• security and authentication
• archival status
• products and services
• database schemas
• process control or description

Metadata Challenges
• Accommodate multiple varieties of metadata
• Tension: functionality and simplicity
• Tension: extensibility and interoperability
• Human and machine creation and use
• Community-specific functionality, creation, administration, access

Warwick Framework: Containing Chaos
• Conceptual Architecture for metadata from the Warwick Metadata Workshop (DC-2)
• Conceptual architecture to support the specification, collection, encoding, and exchange of modular metadata
• Provide context for metadata efforts (including Dublin Core)
  – avoids the “black-hole” of comprehensive element sets
  – focuses interoperability issues at package level

Modularization Allows Distributed Management
• Communities of expertise (not software vendors) are responsible for:
  – Semantics
  – Registration
  – Administration
  – Access management
  – Authority of data
  – Sharing and Distribution

Interoperability requires conventions about:
• Semantics
  – The meaning of the elements
• Structure
  – human-readable
  – machine-parseable
• Syntax
  – grammars to convey semantics and structure
Dublin Core Metadata Initiative
EVA 2000 Moscow

History of the Dublin Core
• 1994: "Do we have a simple set of tags for ordinary people to describe their Web pages?"
• 1995: The Dublin Core: 13 elements, later 15
• 1996: The Dublin Core is but one of many vocabularies needed ("Warwick Framework")
• 1997: "WF needs formal expression in a Resource Description Framework (RDF)"
• 2000: Dublin Core Metadata Initiative recommends qualifiers, broadens its organizational scope beyond the Core

A pidgin for digital tourists
• Metadata is language.
• Dublin Core is a small and simple language -- a pidgin -- for finding resources across domains.
• Speakers of different languages naturally "pidginize" to communicate
  – E.g., tourists using simple phrases to order beer ("zwei Bier bitte" "dva pivo" "biru o san bai"...)
• We are all "tourists" on the global Internet.

A grammar of Dublin Core
• http://www.dlib.org/dlib/october00/baker/10baker.html
• By design not as subtle as mother tongues, but easy to learn and extremely useful in practice
• Pidgins: small vocabularies (Dublin Core: fifteen special nouns and lots of optional adjectives)
• Simple grammars: sentences (statements) follow a simple fixed pattern...

Example Dublin Core statements
• Resource has Title 'Grammar of Dublin Core'.
• Resource has Creator 'Tom Baker'.
• Resource has Subject 'Metadata'.
• Resource has Relation http://foo.org/file.htm.

Resource has property X
[Diagram labels: "Resource" is the implied subject; "has" is the implied verb; "property" is one of 15 properties (DC:Creator, DC:Title, DC:Subject, DC:Date, ...); "X" is the property value (an appropriate literal); two optional qualifiers (adjectives) may be attached.]
EVA 2000
The fifteen special nouns (properties)
Creator Title Subject
Contributor Date Description
Publisher Type Format
Coverage Rights Relation
Source Language Identifier

Resource has Date "2000-06-13" [qualifiers: Revised, ISO8601]
Resource has Subject "Languages -- Grammar" [qualifier: LCSH]
EVA 2000
Dumb-Down Principle for qualifiers
• The fifteen elements should be usable and understandable with or without the qualifiers
• Like saying that nouns can stand on their own without adjectives
• If your software encounters an unfamiliar qualifier, look it up -- or just ignore it!

Resource has Date "2000-06-13" [qualifiers: Revised, ISO8601]
Resource has Subject "Languages -- Grammar" [qualifier: LCSH]
To test whether qualifiers are "good", cover them with your hand and ask:
– Does the statement still make sense?
– Is it still correct?
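
The dumb-down principle can be illustrated with a small Python sketch. The `Statement` class and the `dumb_down` function are hypothetical names invented for this illustration, not part of any Dublin Core software; the sketch only shows that a statement survives when its qualifiers are "covered up".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    """A 'Resource has <element> <value>' statement (hypothetical model)."""
    element: str                      # one of the fifteen elements, e.g. "Date"
    value: str                        # the property value
    refinement: Optional[str] = None  # element refinement, e.g. "Revised"
    scheme: Optional[str] = None      # encoding scheme, e.g. "ISO8601"


def dumb_down(stmt: Statement) -> Statement:
    """Drop both qualifiers, keeping only the element and its value."""
    return Statement(element=stmt.element, value=stmt.value)


qualified = [
    Statement("Date", "2000-06-13", refinement="Revised", scheme="ISO8601"),
    Statement("Subject", "Languages -- Grammar", scheme="LCSH"),
]
simple = [dumb_down(s) for s in qualified]
for s in simple:
    print(f"Resource has {s.element} {s.value!r}.")
```

Software that does not recognize a qualifier could apply the same idea: ignore the unknown field and keep working with the element and value.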

Element Refinements
• Make the meaning of an element narrower or more specific.
  – a Date Created versus a Date Modified
  – an IsReplacedBy Relation versus a Replaces Relation
• If your software does not understand the qualifier, you can safely ignore it.

Value Encoding Schemes
• Says that the value is
  – a term from a controlled vocabulary (e.g., Library of Congress Subject Headings)
  – a string formatted in a standard way (e.g., "2000-05-03" means May 3, not March 5)
• Even if a scheme is not known by software, the value should be "appropriate" and usable for resource discovery.
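
As a rough sketch of how software might treat an encoding scheme, the Python below parses a value as a date only when it recognizes the scheme, and otherwise passes the string through unchanged. The function name `interpret` and the scheme label `"ISO8601"` as a lookup key are assumptions made for this example.

```python
from datetime import date
from typing import Optional, Union


def interpret(value: str, scheme: Optional[str] = None) -> Union[date, str]:
    """Use the scheme when it is known; otherwise keep the plain string."""
    if scheme == "ISO8601":
        # ISO 8601 calendar dates are year-month-day
        return date.fromisoformat(value)
    # Unknown or missing scheme: the value is still usable for discovery
    return value


d = interpret("2000-05-03", scheme="ISO8601")
print(d.month, d.day)
print(interpret("Languages -- Grammar", scheme="LCSH"))
```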

Peer review of proposals for new terms
• DCMI Usage Committee reviews proposals for new qualifiers (and perhaps elements)
• Evaluates proposals in light of grammatical principles (are the qualifiers ignorable?)
• Tiered model of approval status (tentative): proposed, conforming, recommended, obsolete
• First qualifiers "recommended" in July 2000
• http://purl.org/DC/documents/rec/dcmes-qualifiers-20000711.htm
EVA 2000
Open questions in Dublin Core
• What are "appropriate values" for the fifteen properties? How can they be used for cross-domain searching?
• How can DCMI control the evolution of Dublin Core as it is adapted in practice?
• How can an application use DC as a pidgin while describing resources with more complex metadata?
• Can we keep the Core simple?

Search buckets versus description
• Think of DC elements as fuzzy search buckets
  – Different types of data appropriate for different buckets: URLs, date strings, word strings, names
  – Separate books about Sigmund Freud versus books by Sigmund Freud into different buckets
• Search bucket: for discovering resources
• But general, fuzzy categories may not be sufficient for describing resources
  – After searching, display more detailed descriptions on screen

DCMI broadens its mission (Oct 2000)
• The mission of the DCMI is to make it easier to find resources using the Internet through the following activities:
  – Developing metadata standards for discovery across domains (example: the Dublin Core)
  – Defining frameworks for the interoperation of metadata sets
  – Facilitating the development of community or disciplinary specific metadata sets that are consistent with items 1 and 2
EVA 2000
A context for the Core
• If "the Dublin Core" is the core of DCMI, what is the surrounding context?
• If "the Dublin Core" is the simple pidgin, what is the broader landscape of metadata language?
• How do pidgins relate to more complex models or "application profiles"?
• Do we need pidgins for describing other things, such as "people" and "events"?

Using DC with other vocabularies
• Specialized application profiles [government information, education, mathematics] may need to:
  – Use general-purpose Dublin Core elements
  – Use elements from another, more domain-specific standard
  – Narrow standard definitions of DC elements for specific local uses
  – Invent local elements outside the scope of existing standards

Example: adapting DC:Title to local uses
• As defined in the official Dublin Core "namespace":
  – "Title: A name given to the resource"
• As defined in a UK "application profile":
  – "Title: A name given to the collection"
• Definition is narrower

Namespaces in translation
• Dublin Core has been translated into 26 languages
  – machine-readable tokens are shared by all
  – human-readable labels are defined in different languages
  – translations are distributed, maintained in many countries

One token - labels in many languages
[Diagram: the token dc:creator carries three rdfs:label values: “Creator” (DCMI server), “Verfasser” (server in Germany), and “Pencipta” (server in Jakarta).]

RDF -- a more powerful sentence pattern
• Dublin Core statements:
  – Resource has Creator "Tom Baker".
  – Resource has Identifier http://foo.org/bar.html.
• Resource Description Framework "triples" - a more powerful way to say the same thing:
  – http://foo.org/bar.html has Creator "Tom Baker".
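
The move from implied-subject statements to triples can be sketched in a few lines of Python. This is only an illustration using plain tuples; `to_triples` is a hypothetical helper, and it assumes the resource's Identifier value is used as the explicit subject.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, property, value)


def to_triples(statements: List[Tuple[str, str]]) -> List[Triple]:
    """Make the implied subject explicit by using the Identifier value."""
    subject = dict(statements)["Identifier"]
    return [
        (subject, prop, value)
        for prop, value in statements
        if prop != "Identifier"
    ]


dc_statements = [
    ("Creator", "Tom Baker"),
    ("Identifier", "http://foo.org/bar.html"),
]
triples = to_triples(dc_statements)
for s, p, o in triples:
    print(f'{s} has {p} "{o}".')
```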

DCMI Re-organization
• Expanded mission
  – Core metadata elements for Agents (or Events)?
  – Frameworks for integrating multiple standards
• Re-organization model
  – Membership organization like W3C or Unicode Consortium?
  – Retain open consensus model
  – International perspective
  – Better training, documentation, outreach

DCMI Open Metadata Registry
• Managing vocabularies defined by the DCMI
  – Languages
  – Versioning
  – Controlled vocabularies
• Foundation for modular, incremental integration and evolution
• Collaboration with European SCHEMAS Project and ULIS in Tsukuba, Japan
• http://wip.dublincore.org/registry/

Official recognition of the Dublin Core
• CEN Workshop Agreement
  – endorse Dublin Core elements as CWA13874
  – provide usage guidelines for European industry
• NISO Z39.85
  – National Information Standards Organization, an ANSI affiliate
  – Balloting concluded in August 2000

DCMI Activities
• Standards development and maintenance
• Metadata registry
• Technical working groups and periodic workshops
• Tutorial materials and user guides
• Education and training
• Access to software
• Liaisons with other standards or user communities

DC-9 Workshop in Tokyo, 2001
• DC-8 Workshop was at the National Library of Canada (Ottawa)
  – emphasis on application profiles, longer-term organizational mission, and domain-specific adaptations of Dublin Core
• DC-9 in Tokyo: well-defined tracks
  – implementation reports and research papers
  – ongoing technical working group meetings
  – general introduction and tutorials for non-experts
Simplicity and Complexity
EVA 2000 Moscow

Warwick Framework
• Container/Package approach to metadata
• Rejection of universal ontology
• Recognition of individual community needs
• Provide scope for metadata efforts

Warwick Framework Design
Containers for aggregating Packages of typed metadata sets
[Diagram: a Container holding four Packages: MARC Metadata, Dublin Core, Terms and Conditions, and an Indirect Reference (a URI).]

Warwick Framework Implementation and Research
• Packaging, linking, storing, and transmitting component/package framework
• Semantic interactions and interoperability among multiple metadata packages/vocabularies

Interoperability among Metadata Vocabularies
[Diagram: ABC core classes linking Dublin Core, MARC, INDECS, and IMS.]

Harmony Project
• Project Investigators
  – Dan Brickley - ILRT, Bristol (U.K.)
  – Jane Hunter - DSTC, Brisbane (Australia)
  – Carl Lagoze - Computer Science, Cornell (U.S.)
• More Information
  – http://www.ilrt.bris.ac.uk/discovery/harmony/

Attribute/Value approaches to metadata…
The playwright of Hamlet was Shakespeare
Hamlet has a creator [playwright] Shakespeare
(subject, implied verb, metadata noun, metadata adjective, literal)
[Diagram: resource R1 with dc:title “Hamlet” and dc:creator.playwright “Shakespeare”.]

…run into problems for richer descriptions…
The playwright of Hamlet was Shakespeare, who was born in Stratford
Hamlet has a creator Shakespeare
Hamlet has a creator [birthplace] Stratford
[Diagram: resource R1 with dc:creator.playwright “Shakespeare” and dc:creator.birthplace “Stratford”.]
…because of their failure to model entity distinctions
R1
“Stratford”
creatorR2
name “Shakespeare”
birthplacetitle
“Hamlet”
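
The difference between the flat attribute/value record and the entity-based model can be sketched in Python. The data structures below are assumptions made for illustration (a dict for the flat record, a list of tuples for the graph); only the names R1, R2 and the property labels come from the slides.

```python
# Flat attribute/value record: the birthplace hangs off the play itself,
# so nothing says that "Stratford" describes the creator rather than Hamlet.
flat = {
    "dc:title": "Hamlet",
    "dc:creator.playwright": "Shakespeare",
    "dc:creator.birthplace": "Stratford",
}

# Entity-based model: the creator is its own node (R2) with its own properties.
graph = [
    ("R1", "title", "Hamlet"),
    ("R1", "creator", "R2"),
    ("R2", "name", "Shakespeare"),
    ("R2", "birthplace", "Stratford"),
]


def values(graph, subject, prop):
    """All values of a property for one subject."""
    return [o for s, p, o in graph if s == subject and p == prop]


def creator_birthplaces(graph, resource):
    """Follow resource -> creator -> birthplace through the graph."""
    return [
        place
        for creator in values(graph, resource, "creator")
        for place in values(graph, creator, "birthplace")
    ]


print(creator_birthplaces(graph, "R1"))
```

In the graph form, the birthplace is reachable only through the creator node, which is the entity distinction the flat record cannot express.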
EVA 2000
Applying a Model-Centric Approach
• Formally define common entities and relationships underlying multiple metadata vocabularies
• Describe them (and their inter-relationships) in a simple logical model
• Provide the framework for extending these common semantics to domain and application-specific metadata vocabularies.
EVA 2000
Applications of the ABC Model
• Guidance for communities developing vocabularies
• Foundation for understanding existing vocabularies
• Basis for mappings among vocabularies using formalisms such as RDF

Harmony/ABC Workshop
• January 27-28 2000 CNI Washington
• Representatives from
  – Dublin Core, INDECS, MPEG-7, IFLA
  – Archives, Museums, Libraries, Audiovisual
• Result: Importance of processes, events, and states in understanding and describing resources

Conceptual Basis: Evolution of Content over Time
IFLA Entity Model
From Bearman et al., D-Lib Magazine, January 1999.
EVA 2000
Events help metadata relationships?
• Recognizing inherent lifecycle aspects of digital content - transformation of “input” resources to “output” resources and of their descriptions. (e.g., IFLA model)
• Modeling implied events as first-class objects provides attachment points for common entities – e.g., agents, contexts (times & places), roles.
• Clarifying attachment points facilitates mapping across common entities in different vocabularies.

Content, Events, & Descriptions
[Diagram: resources R1 to R4 linked by events E1 to E4, with descriptions desc1 and desc2.]
EVA 2000 ABC Event Model

A Simple Example: Live At Lincoln Performance
• Performance at The Lincoln Center for the Performing Arts
• On April 7, 1998 at 8pm Eastern time
• Orchestra is New York Philharmonic
• Musical score – “Concerto for Violin”
• 130 minute MP3 audio recording
• Rights held by Lincoln Center
EVA 2000 Example in ABC Model

Derivation of Multiple Views
[Diagram: views derived from the ABC description in XML: CIDOC CRM model, ID3 tags embedded in MP3, MPEG-7 description in DDL, Dublin Core in XML/RDF.]
EVA 2000
Step 1 – Structural Mapping
Event-aware model
Resource-centric model

Structural Mapping Rules
Event attributes transferred to output:
• Context/Date, /Time, /Place -> Date.Performance, Time.Performance, Place.Performance
• Act/Role -> Agent.Role, e.g. Orchestra
• Event Type -> Relation between input & output, e.g. Performance -> Relation.isPerformanceOf
• Output Description generated from event Type and input Title, e.g. “Performance of Concerto for Violin”
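
The mapping rules above can be sketched as a small Python function. This is a hedged illustration, not the Harmony project's implementation: the shape of the `event` dictionary and the name `flatten_event` are assumptions, and the sample values are taken from the Live At Lincoln example.

```python
def flatten_event(event: dict) -> dict:
    """Transfer event attributes onto a resource-centric output record."""
    etype = event["type"]
    out = {}
    # Context/Date, /Time, /Place -> Date.<Type>, Time.<Type>, Place.<Type>
    for key in ("date", "time", "place"):
        if key in event["context"]:
            out[f"{key.capitalize()}.{etype}"] = event["context"][key]
    # Act/Role -> Agent.<Role>
    for act in event["acts"]:
        out[f"Agent.{act['role']}"] = act["agent"]
    # Event type -> relation between input and output
    title = event["input"]["title"]
    out[f"Relation.is{etype}Of"] = title
    # Output description generated from event type and input title
    out["Description"] = f"{etype} of {title}"
    return out


event = {
    "type": "Performance",
    "context": {
        "date": "1998-04-07",
        "time": "20:00",
        "place": "Lincoln Center for the Performing Arts",
    },
    "acts": [{"agent": "New York Philharmonic", "role": "Orchestra"}],
    "input": {"title": "Concerto for Violin"},
}
record = flatten_event(event)
for key, value in record.items():
    print(f"{key}: {value}")
```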
EVA 2000
Step 2 – Semantic Mapping

XSLT for Transformations
• Works well for structural and syntactic mapping between metadata descriptions
• Semantic mappings need to be hardcoded
• Unsuitable for loosely constrained or variable input

A More General Solution
• Flexible semantic mappings require additional knowledge:
  – Metadata Term Ontology – MetaNet
• Methods for using that context knowledge for mapping
  – Some combination of procedural language (Java) and XSLT
  – Investigating more general mapping rule language (analogies to compiler technology)

Planned Experimental Context
• CIMI Experiments
  – Dublin Core for basic resource descriptions
  – Richer descriptions derived from ABC model
  – Mapping among descriptions
  – Understanding relationship between ABC and CIDOC CRM
• Connecting with Recordkeeping Metadata Issue - SPIRT Project
Metadata Infrastructure
EVA 2000 Moscow

Metadata is language
• Metadata schemas are languages for making statements about resources:
  – Book has Title "Gone with the Wind".
  – Web page has Publisher "Springer Verlag".
• Vocabulary terms (elements) are defined in standards like Dublin Core
• Metadata grammars constrain the statements and data models one can form

But languages evolve with use
• Inevitably, languages resist stability
• People stretch official definitions
• Implementers misunderstand the intended meaning or use of elements
• Implementers coin local terms and extensions
• If the application does not fit the standard, the standard is often "customized" to fit the application

Metadata languages are "multilingual"
• Metadata is not a spoken language
• The words of metadata -- "elements" -- are symbols that stand for concepts expressible in multiple natural languages
• Standards may have dozens of translations
• Are concepts like "title", "author", or "subject" used the same way in English, Finnish, and Korean?

What metadata languages lack
• Comprehensive dictionaries
  – Where can one get an overview of vocabulary terms used in metadata languages?
• A publication context for implementers
  – Where can you see how they are using metadata?
• Standard grammars
  – How do we understand the principles of metadata?

Can we manage this evolution?
• How can we (scalably) monitor the usage of a language that is:
  – Never spoken?
  – Rarely published in a way that can be harvested?
• How can dictionary editors help a metadata language evolve and grow in response to usage?
• How can this evolution occur across (human) languages?

RDF Schemas (RDFS) -- W3C standard
• A dictionary format for metadata terms:
  – Simple XML format for terms and definitions
• Example: "Title" (Dublin Core)
  – Human-readable label and definition:
    • Title: A name given to the resource.
  – Unique, machine-readable identifiers
    • dc:title
• Support for cross-references
  – between terms in related standards
  – between local adaptations and related standards

Print world versus the Web
• Traditional print world
  – Standards are currently defined and published as paper documents or Web pages in HTML
  – Metadata implementers rarely publish their local extensions and adaptations
• RDF Schemas (RDFS)
  – Web-based publication format
  – Explicit cross references from implementation schemas to the standards on which they are based

EOR -- an RDF Schema Browser
• Harvests RDF Schemas
  – Schemas distributed on multiple Web servers
  – Creates huge database of schemas for searching
  – Web interface functions as a "metadata browser"
  – Click on cross-references between linked terms
• Downloadable as open source software
  – http://eor.dublincore.org/index.html
  – Authors: Eric Miller (OCLC, RDF Working Group, DCMI) and Tod Matola

Hyperlink Metadata Terms over the Web
• Index of metadata terms searchable as one huge database
• Click on cross-references to follow term-to-term links between vocabularies
• Point-to-point, like the Web itself
  – In 1992, Gopher located the right file within directory trees (but not points within the file)
  – HTML enabled point-to-point links between documents
EVA 2000
"Editor" -- a MARC relator -- refines "Contributor"
EVA 2000
Follow the link to MARC Relator Terms
EVA 2000
...the source of which looks like this:
EVA 2000
...or to Contributor [here, in English, French, German]
EVA 2000
Or view the schema of MyRDF itself...
EVA 2000
...itself an RDF schema like the others

Registries can function as dictionaries
• Historically, dictionaries of English, French, etc: recorded variants, prescribed forms, and helped standardize (national) languages
• Metadata dictionaries can help metadata vocabularies evolve more like other human languages
  – Not just top-down, like traditional standards
  – Also bottom-up, in response to usage

Dictionaries prescribe and describe
• Prescribe definitions and recommend usage
• Describe how terms are actually used
  – Monitor usage through collecting examples
• Editors and usage boards must strike a balance between prescription and description.

SCHEMAS Project -- a Thin Registry
• http://www.schemas-forum.org, an EU Project
• Pointers to resources elsewhere (a "thin" registry or portal)
• Short descriptions of metadata standards activities
• Critical commentaries by domain experts
• Promote the publication of schemas (in RDF)
• Goal: help implementers discover how others (e.g. EU Projects) are using standards in order to harmonize usage

DCMI -- a Thick Registry
• A thick registry: stores official metadata element definitions in a central database or repository
• Managing a namespace (as a standards agency): publish qualifiers as available, with version control
  – Managing translations of the standard in multiple languages
• Eventually:
  – User guide interface
  – Support for standardisation processes (peer review)
  – Downloadable input to software tools for generating, editing, validating DC metadata

Dictionaries as a tool for harmonization
• Knowledge of how other projects are using standards will avoid "reinventing the wheel"
• To help information providers harmonize their schemas for improved access within domains:
  – Between countries (Nordic Metadata Project)
  – Preprint repositories (Open Archives Initiative)
  – Subject gateways (Renardus)
  – Theses and dissertations (NDLTD)
  – Mathematics and physics (MathNet, PhysNet)
EVA 2000
A global registry infrastructure?
• Analogously to HTML for text, RDF Schema format suggests a scalable ecology of metadata vocabularies on the Web
• Sharing machine-readable elements translated into many languages suggests a global (multilingual) metadata language for digital libraries
• Can a well-managed registry infrastructure allow this language to evolve -- with flexible innovation in usage alongside more stable standards?

The scope of registries
• Anything "semantic" (terms and definitions) is potentially an RDF schema:
  – controlled vocabularies
  – namespaces, application profiles, annotations
  – the "schema" of the registry itself
• Application constraints can be modelled in XML Schemas
  – "title is mandatory"; "date must be after 1980"
• Will XML and RDF Schemas merge?
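
To make the idea of application constraints concrete, here is a minimal Python sketch of the two example rules. It is not XML Schema; the `validate` function and the dictionary record are hypothetical, and "after 1980" is read here as "year later than 1980", which is an assumption.

```python
from datetime import date


def validate(record: dict) -> list:
    """Return the list of violated constraints for one metadata record."""
    errors = []
    # Constraint 1: "title is mandatory"
    if not record.get("title"):
        errors.append("title is mandatory")
    # Constraint 2: "date must be after 1980" (checked only when present)
    value = record.get("date")
    if value is not None and date.fromisoformat(value).year <= 1980:
        errors.append("date must be after 1980")
    return errors


print(validate({"title": "Hamlet", "date": "2000-07-01"}))
print(validate({"date": "1975-01-01"}))
```

The vocabulary ("title", "date") stays in the shared schema, while rules like these belong to one application and can differ between applications that use the same terms.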
Deploying and Using Metadata
EVA 2000 Moscow

Syntax Alternatives: HTML
• Advantages:
  – Simple Mechanism – META tags embedded in content
  – Widely deployed tools and knowledge
• Disadvantages
  – Limited structural richness (won’t support hierarchical, tree-structured data or entity distinctions).
  – Limited formalisms (parsing and schema definition)

Dublin Core in HTML
<link rel="schema.DC" href="http://purl.org/dc">
<meta name="DC.Title" content="Business Unusual">
<meta name="DC.Creator" content="Carl Lagoze">
<meta name="DC.Subject" content="bibliographic control web cataloging">
<meta name="DC.Date" scheme="W3CDTF" content="2000-10-23">
<meta name="DC.Format" content="text/html">
<meta name="DC.Identifier" content="http://lcweb.loc.gov/lagoze_paper.html">
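
One way a harvester might read such META tags is sketched below with Python's standard `html.parser` module. The class name `DCMetaParser` and the sample page are assumptions for this illustration; the tag names and values are taken from the slide.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect (element, value, scheme) from <meta name="DC.*"> tags."""

    def __init__(self):
        super().__init__()
        self.statements = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.startswith("DC."):
            self.statements.append(
                (name[3:], attributes.get("content"), attributes.get("scheme"))
            )


page = """<html><head>
<link rel="schema.DC" href="http://purl.org/dc">
<meta name="DC.Title" content="Business Unusual">
<meta name="DC.Creator" content="Carl Lagoze">
<meta name="DC.Date" scheme="W3CDTF" content="2000-10-23">
</head><body></body></html>"""

parser = DCMetaParser()
parser.feed(page)
for element, value, scheme in parser.statements:
    print(element, value, scheme)
```

The flat list of name/content pairs also shows the limitation noted on the previous slide: there is no place in this syntax for nested or entity-level structure.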

Syntax Alternatives: XML
• The standard for networked text and data
• Wide-spread tool support
  – Parsers (DOM and SAX)
  – Extensibility (namespaces)
  – Type definition (XML Schema)
  – Transformation and Rendering (XSLT)
  – Rich linking semantics (XLINK)

XML Schema
• Rich XML-based language for expressing type semantics
• Replaces arcane and limited DTD (origin in SGML)
• Facilities
  – Data typing (both complex and primitive)
  – Constraints
  – Defaults

Dublin Core in XML
<metadata xmlns:dc="http://www.openarchives.org/OAI/dc.xsd">
  <dc:creator>Carl Lagoze</dc:creator>
  <dc:title>Accommodating Simplicity and Complexity in Metadata</dc:title>
  <dc:date>2000-07-01</dc:date>
  <dc:publisher>Cornell University, Computer Science</dc:publisher>
</metadata>
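
A namespace-aware parser can read this record directly. The sketch below uses Python's standard `xml.etree.ElementTree`; the namespace URI is copied from the slide, and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://www.openarchives.org/OAI/dc.xsd"  # namespace URI as on the slide

doc = """<metadata xmlns:dc="http://www.openarchives.org/OAI/dc.xsd">
  <dc:creator>Carl Lagoze</dc:creator>
  <dc:title>Accommodating Simplicity and Complexity in Metadata</dc:title>
  <dc:date>2000-07-01</dc:date>
  <dc:publisher>Cornell University, Computer Science</dc:publisher>
</metadata>"""

root = ET.fromstring(doc)
fields = {}
for child in root:
    # ElementTree expands prefixed names to "{namespace-uri}localname"
    namespace, _, local = child.tag[1:].partition("}")
    if namespace == DC_NS:
        fields.setdefault(local, []).append(child.text.strip())

for name, values in fields.items():
    print(name, values)
```

Values are kept in lists because Dublin Core elements are repeatable.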

Syntax Alternatives: RDF
• RDF (Resource Description Framework)
• The instantiation of the Warwick Framework on the Web
• Provides enabling technology for richly-structured metadata
• Rich data model supporting notions of distinct entities and properties
• Syntax expressed in XML

RDF Components
• Formal data model
• Syntax for interchange of data
• Schema Type system (schema model)

RDF Data Model
• Directed labeled graphs
• Model elements
  – Resource
  – Property
  – Value
  – Statement
  – Containers

RDF Model Primitives
[Diagram: a Statement links a Resource, through a Property, to a Value (which may itself be a Resource).]

RDF Syntax Example
[Diagram: resource URI:R has dc:Title “CIMI Presentation” and dc:Creator “Eric Miller”.]
<RDF xmlns="http://www.w3.org/TR/WD-rdf-syntax#"
     xmlns:dc="http://purl.org/dc/elements/1.0/">
  <Description about="URI:R">
    <dc:Title>CIMI Presentation</dc:Title>
    <dc:Creator>Eric Miller</dc:Creator>
  </Description>
</RDF>

RDF Model Example #2
[Diagram: resource URI:R has dc:Title “CIMI Presentation” and oa:Creator URI:ERIC; URI:ERIC has bib:Name “Eric Miller”, bib:Email “[email protected]”, and bib:Aff URI:OCLC (“OCLC”).]

RDF Syntax Example #2
<RDF xmlns="http://www.w3.org/TR/WD-rdf-syntax#"
     xmlns:dc="http://purl.org/dc/elements/1.0/"
     xmlns:bib="http://www.bib.org/persons#">
  <!-- the oa: prefix is used below but is not declared on the slide -->
  <Description about="URI:R">
    <dc:Title>CIMI Presentation</dc:Title>
    <oa:Creator>
      <Description>
        <bib:Name>Eric Miller</bib:Name>
        <bib:Email>[email protected]</bib:Email>
        <bib:Aff resource="http://www.oclc.org"/>
      </Description>
    </oa:Creator>
  </Description>
</RDF>

RDF Containers
• Permit the aggregation of several values for a property
• Express multiple aggregation semantics
  – unordered
  – sequential or priority order
  – alternative

RDF Schemas
• Declaration of vocabularies
  – properties defined by a particular community
  – characteristics of properties and/or constraints on corresponding values
• Schema Type System - Basic Types
  – Property, Class, SubClassOf, Domain, Range
  – Minimal (but extensible) at this time
  – minimize significant clashes with typing system designed for XML Schema WG
• Expressible in the RDF model and syntax
EVA 2000
Relationships among vocabularies
dc:Creator
ms:director
marc:100
bib:Author

Bringing it together
• RDF Metadata transmission
  – Embedded (e.g. <META>), Transmitted with resource (HTTP), Trusted 3rd Party (HTTP GET)
• RDF Data Model
  – Support consistent encoding, exchange and processing of metadata… critical when aggregating data from multiple sources
• RDF Schema
  – Declare, define, reuse vocabularies

Open Archives Initiative
http://www.openarchives.org

What is Interoperability?
• Naming?
  – Handles
  – Purls
• Metadata?
  – MARC
  – Dublin Core
• Document models?
  – WebDAV
• Federated searching?
  – Z39.50?
  – DASL?
• Services and Protocols?
  – Dienst

Partitioning Interoperability
• Document Models
• Metadata Harvesting
• Mediator Services: Linking, Searching, Summarizing

The World According to OAI
[Diagram: Service Providers (Searching, Current Awareness, Summarization) harvesting from Data Providers.]

UPS Meeting Results
• Establishment of Open Archives Initiative
  – Loose coalition to experiment with interoperability solutions
• Santa Fe Convention
  – Organizational and technical framework to support metadata harvesting for ePrint archives

Metadata Harvesting is not New
• Harvest Project (1992-1995)
  – DARPA-funded
  – Mike Schwartz (U. Colorado), Mic Bowman (Penn State), Udi Manber (U. Arizona)

“Open” Archives
• Political Agenda?
  – Author self-archiving of E-Prints
  – “Mission” to reformulate scholarly publishing framework
• Technical?
  – Infrastructure to facilitate interoperability across multiple domains

Other communities of interest
• “Cambridge” digital library federation meetings
  – research library community has many materials for which they’d like to ‘expose’ metadata
• San Antonio OAI workshop
  – librarians, publishers (some), others
EVA 2000
Technical Umbrella for Practical Interoperability…
ReferenceLibrariesPublishers
E-PrintArchives
…that can be exploited by different communities
EVA 2000 Acting mission statement
Supply and promote an application independent technical framework – a supportive infrastructure that empowers different scholarly communities to pursue their own interests in interoperability in the technical, legal, business, and organizational contexts that are appropriate to them.
Dan Greenstein, Director DLF

What does this REALLY Mean?
• Keep the bar low enough to make widespread adoption possible
• Provide enough back-doors to make true “disruption” possible, e.g., ePrint community:
  – refine record notion to mandate full-content connection
  – refine metadata to mandate linkage to full-content

Organizational Stability
• Institutional backing of CNI (Coalition for Networked Information) and DLF (Digital Library Federation)
• Formation of steering committee
  – first steps towards international involvement

Framework for Partitioning Tasks
• Steering Committee
  – policy guidance
• Technical Committee
  – technical specifications
• Workshops
  – public dissemination, feedback, community-building

Ithaca Technical Meeting
• Input
  – experiences gained with implementing & discussing the current SFc specs
  – emerging interest for the application of SFc-concepts as a general interoperability framework in a scholarly environment

Ithaca technical meeting
• Output
  – guidelines for an in-depth revised technical spec to be issued early 2001
  – stable for experimentation; not definitive
  – minimize risk for early adopters
  – maximize chances for future interoperability across communities

Components of OAI Model
• underlying concepts
• abstract principles
• concrete implementation of principles

OAI Underlying Concepts
• service providers
• records in an archive
• open interface to archives
• managed archives (data providers)

Building on Underlying Concepts
Abstract principle | Implementation of principle
metadata harvesting | OAI harvesting protocol
identifiers | URIs (community schemes)
metadata set formats | DC & XML container (parallel sets)
acceptable use | Flow Control (usage restrictions)
registration | (community specific)

What is a record?
A record in an archive is a metadata record. The metadata record describes, and can contain an entry point to, full content.

Metadata: Interoperability & Extensibility
We recognize that archives will use specific metadata sets and formats that suit the needs of their communities and the types of data they handle. However, interoperability depends on a shared format for exchanging metadata and therefore archives should implement the basic Open Archives Metadata Set.

Metadata Solutions
• Adoption of unqualified Dublin Core Element Set as required metadata.
• Support for parallel metadata sets maintained
  – EPMS (e-print community)
  – Others
    • Research library community
    • Museum community

Metadata XML Container
<record>
  <header>
    <identifier>oai:arXiv:hep/001001</identifier>
    <datestamp>1999-12-25</datestamp>
  </header>
  <metadata xmlns:dc="http:…">
    <dc:creator>Ernest Rutherford</dc:creator>
    <dc:title>Investigations of Radioactivity</dc:title>
    <dc:identifier>doi:1234/5432</dc:identifier>
  </metadata>
</record>
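
A harvester would typically split such a record into its header and its metadata. The Python sketch below shows one way to do that with the standard library. Because the namespace URI is elided on the slide, the example substitutes a placeholder (`http://example.org/dc`), which is purely an assumption so that the XML parses; `read_record` is likewise a hypothetical helper.

```python
import xml.etree.ElementTree as ET

DC = "http://example.org/dc"  # placeholder: the real URI is elided on the slide

record_xml = f"""<record>
  <header>
    <identifier>oai:arXiv:hep/001001</identifier>
    <datestamp>1999-12-25</datestamp>
  </header>
  <metadata xmlns:dc="{DC}">
    <dc:creator>Ernest Rutherford</dc:creator>
    <dc:title>Investigations of Radioactivity</dc:title>
    <dc:identifier>doi:1234/5432</dc:identifier>
  </metadata>
</record>"""


def read_record(xml_text: str):
    """Return (header, metadata) dictionaries for one record."""
    root = ET.fromstring(xml_text)
    header = {child.tag: child.text.strip() for child in root.find("header")}
    # Namespaced tags look like "{uri}local"; keep only the local name
    metadata = {
        child.tag.split("}")[1]: child.text.strip()
        for child in root.find("metadata")
    }
    return header, metadata


header, metadata = read_record(record_xml)
print(header)
print(metadata)
```

Note that the header identifier (the key for requesting the record) and the `dc:identifier` inside the metadata are different things and are kept apart here.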

Identifier Issues
• Basic identifier constraints based on URI specifications
  – A key for requesting a record from a repository
  – Key and metadata format ID uniquely identify a record
• Individual communities may develop URN registration schemes
EVA 2000 Identifier Solutions
full-identifier = oai:archive-identifier:record-identifier
oai = registered URI scheme
archive-identifier = archive identifier, registered within OAI
record-identifier = unique ID within archive (syntax is archive-specific)
example = oai:ncstrl:ncstrl.cornellcs/TR94-1418
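The three-part identifier layout can be illustrated with a short sketch. The names `OaiIdentifier` and `parse_full_identifier` are hypothetical, and the validation rules are assumptions that go no further than what the slide states.

```python
from typing import NamedTuple


class OaiIdentifier(NamedTuple):
    scheme: str      # the registered URI scheme, "oai"
    archive_id: str  # archive identifier, registered within OAI
    record_id: str   # unique ID within the archive (archive-specific syntax)


def parse_full_identifier(full_identifier: str) -> OaiIdentifier:
    """Split 'oai:archive-identifier:record-identifier' into its three parts."""
    # Split on the first two colons only: the record identifier's syntax is
    # archive-specific, so it may itself contain colons.
    parts = full_identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError(f"not a full OAI identifier: {full_identifier!r}")
    return OaiIdentifier(*parts)


parsed = parse_full_identifier("oai:ncstrl:ncstrl.cornellcs/TR94-1418")
print(parsed.archive_id, parsed.record_id)
```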
EVA 2000
Repositories, Identifiers, and Records
Identifier
Datestamp
MF1 MF2 MF3 MF4
<record> <header> … </header> <metadata> … </metadata> </record>
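One way to picture the relationship on this slide: each item in a repository has one identifier and one datestamp, and may be available in several metadata formats (MF1 to MF4 in the diagram). A record is then selected by the identifier together with a metadata format ID. The sketch below is an in-memory illustration only; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    identifier: str
    datestamp: str
    # Maps a metadata format ID (e.g. "MF1") to that format's metadata.
    metadata: dict = field(default_factory=dict)


class Repository:
    def __init__(self):
        self._items = {}

    def add(self, item):
        self._items[item.identifier] = item

    def get_record(self, identifier, metadata_format):
        """Return the record for (identifier, format), or None if absent."""
        item = self._items.get(identifier)
        if item is None or metadata_format not in item.metadata:
            return None
        return {
            "header": {"identifier": item.identifier, "datestamp": item.datestamp},
            "metadata": item.metadata[metadata_format],
        }


repo = Repository()
repo.add(Item(
    identifier="oai:ncstrl:ncstrl.cornellcs/TR94-1418",
    datestamp="1999-12-25",
    metadata={"MF1": {"title": "Sample title"}, "MF2": {"245a": "Sample title"}},
))
record = repo.get_record("oai:ncstrl:ncstrl.cornellcs/TR94-1418", "MF1")
```

The same identifier with a different format ID yields a different record, which is why the pair, not the identifier alone, identifies a record.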
EVA 2000 Selective harvesting
• Recognized need for light-weight facility for selective harvesting
– By Date
• Sets
– A low-cost means of selective harvesting
– NOT a general tool for defining global categories
– Attribution of meanings to sets can be done within communities and in bilateral fashion
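The two selection criteria, date and set, can be sketched as a simple filter over record headers. This is an in-memory illustration rather than the protocol itself: the function name, the parameter names `since`, `until` and `set_name`, and the `sets` field on each record are all assumptions.

```python
from datetime import date


def select_records(records, since=None, until=None, set_name=None):
    """Keep records whose datestamp and set membership match the criteria."""
    selected = []
    for record in records:
        stamp = date.fromisoformat(record["datestamp"])
        if since is not None and stamp < since:
            continue
        if until is not None and stamp > until:
            continue
        # Sets carry no global meaning here; they are just labels that the
        # archive and the harvester have agreed on.
        if set_name is not None and set_name not in record.get("sets", ()):
            continue
        selected.append(record)
    return selected


sample = [
    {"identifier": "oai:x:1", "datestamp": "1999-12-25", "sets": ["S1"]},
    {"identifier": "oai:x:2", "datestamp": "2000-06-01", "sets": ["S2"]},
    {"identifier": "oai:x:3", "datestamp": "2000-10-15", "sets": ["S1", "S2"]},
]
recent_s1 = select_records(sample, since=date(2000, 1, 1), set_name="S1")
```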
EVA 2000 Protocol Solutions
• Normalized and Enhanced Verb Set
– GetRecord
– Identify
– ListIdentifiers
– ListMetadataFormats
– ListRecords
– ListSets
EVA 2000 Protocol Solutions
• CGI-script friendly syntax
– baseurl?verb=verbname&argname=argval...
– verbname is the name of the verb
– argname is the name of the attribute
– argval is the value of the attribute
• Example: http://foo/blaz?verb=ListRecords&set=S1
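A request in this syntax can be assembled with Python's standard `urllib.parse.urlencode`. The helper name `build_request_url` is hypothetical; the base URL, verb and argument are taken from the slide's example.

```python
from urllib.parse import urlencode


def build_request_url(base_url, verb, **arguments):
    """Build baseurl?verb=verbname&argname=argval... for one request."""
    # The verb comes first, followed by the arguments in the order given.
    # urlencode also percent-escapes any reserved characters in the values.
    query = urlencode([("verb", verb), *arguments.items()])
    return f"{base_url}?{query}"


url = build_request_url("http://foo/blaz", "ListRecords", set="S1")
print(url)
```

Because every request is a plain query string, a repository can be exercised from a browser or a simple CGI script without any special client software.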
EVA 2000 Registration Solutions
• Automation through:
– On-line registration of:
   • Archive identifier (uniqueness enforcement)
   • base-url of archive's OAI protocol implementation
– Identify verb that exposes archive characteristics
– Use of protocol for registration of metadata formats and validity checking
• Registration of service providers is still an open issue
EVA 2000 Release Schedule
• October 15 – normalized meeting notes distributed to meeting group
• November 1 – beta specification to steering committee and limited distribution
• Early January – stabilization of specification and public meeting
Metadata Landscape
EVA 2000 Moscow
EVA 2000 Conferences
• ACM Digital Libraries 2001, San Antonio, June 2001, http://www.dl00.org/
• European Conference on Digital Libraries, Darmstadt, Sep 2001 http://www.ecdl2001.org
• Asian Digital Library Conference, Seoul, December 2000, http://ADL2000.kaist.ac.kr
• Tenth International WWW Conference, Hong Kong, May 2001, http://www10.org
EVA 2000
NSF Digital Library Initiative
• Phase I (1994-1998): six large-scale testbeds involving research universities, industrial partners, and next-generation technologies
• Phase II (1999+): expanded scope, smaller projects as well as large testbeds, emphasis on making accessible new types of content
EVA 2000
Distributed National Electronic Resource (UK)
• A managed environment for Internet access to scholarly journals and other materials relevant to higher education in the UK
• Uses international standards (e.g., Dublin Core)
• National purchase and licensing agreements for best value to UK education community
• eLib research funding since mid-1990s emphasized incremental improvement of standards and services
EVA 2000 Global Info (Germany)
• "The German Digital Library Project"
• Since 1996, integrating access to scientific information among libraries, publishers, learned societies, and individual scientists
• Emphasis on open standards (e.g., Dublin Core) and open-standard formats (e.g., XML, RDF, MPEG)
EVA 2000 European Union
• Fifth Framework Programme, 1998-2002 – several dozen projects with several countries each
– Digital Heritage, Cultural Content
– Interactive Electronic Publishing
– Multimedia Content and Tools
• DELOS Network of Excellence
– http://www.ercim.org/delos/
– Communication within European digital library research community and international networking
EVA 2000 MathNet
• German Mathematical Societies index math pre-prints and home pages of mathematicians
– Encourages use of Dublin-Core-based metadata by distributing free metadata editor; displays hits "with metadata" separately from hits "without metadata"
• International Mathematical Union (IMU) planning international Web service based on German MathNet model
• Seeking international agreement on simple metadata profiles for types of math materials
EVA 2000
IMS Global Learning Consortium, Inc.
• Teachers seeking appropriate classroom materials on Web may want to know:
– for which age-group?
– has it already been used successfully in classrooms?
– will it work on my equipment?
• IMS: Rich descriptions of learning resources in a standard record format
EVA 2000
Federal Geographic Data Committee
• (US) FGDC Content Standard for Digital Geospatial Metadata: integrate access to resources about a particular area found in diverse repositories
• Government, education, and business needs
– Emergency management
– Integrated databases and comprehensive maps
– City planning
– Environmental control
EVA 2000
Visual Resources Association
• VRA Core Categories in a two-level model for describing objects such as paintings and buildings
• "Works" described separately from "images" of those works (One-to-One Principle)
• Conceptual clarity of One-to-One Principle implies more complex work-flow and processing for catalogers and software
EVA 2000 Nordic Metadata Project
• Cooperation between Scandinavian countries (since circa 1996)
• Pioneered idea of metadata-based distributed index across national boundaries
• NetLab (Lund University) maintains SAFARI, which harvests Dublin-Core-based metadata embedded in documents on Web servers
EVA 2000 Renardus Project (EU)
• http://www.konbib.nl/coop/reynard
– National libraries (Netherlands coordinates)
– NDR: National Digital Resource in UK
– Die Deutsche Bibliothek
• Goal: integrated access to subject gateways in Europe
• High-level agreement on simple, Dublin-Core-based schema as common denominator
EVA 2000
Networked Digital Library of Theses and Dissertations (NDLTD)
• http://www.ndltd.org
• International consortium of projects putting dissertations online
• Difficult to agree on single unified metadata schema -- national, legal, and disciplinary requirements differ significantly
• NDLTD agreement on a small Dublin-Core-based set of metadata elements?
EVA 2000 CIDOC
• International Council of Museums: object-oriented model (CIDOC) designed for describing multiple entities that may be
– physical (e.g., museum objects)
– conceptual (e.g., works)
– temporal (e.g., historical periods)
– spatial (e.g., places)
• Implies an integrated information space of "encyclopedic" scope
EVA 2000 Rich Site Summary (RSS)
• Metadata for content syndication (news feeds)
• Used in developing media content portals
• Built on established vocabularies (DC), uses RDF syntax
• Layers of application-specific semantics: syndication vocabularies, annotation vocabularies, etc.
EVA 2000
Moving Picture Experts Group (MPEG)
• MPEG 4: encoding and interacting with audio-visual objects
• MPEG 7: multimedia content description interface for such objects
• MPEG 21: ambitious "umbrella" framework describing the infrastructure for delivering and consuming multimedia content
EVA 2000 More...
• INDECS - Uses an event-based model to describe intellectual property rights for commercial transactions
• DOI - Uses the INDECS framework with a Digital Object Identifier for content description and management of references between scientific, technical, and medical journals
• BSR - Basic Semantic Registry as a universal interlingua of concepts
• GILS - Government Information Locator Service
EVA 2000 ...and more...
• PDS - Planetary Data System
• IEEE Learning Object Metadata - an elaborate, hierarchical scheme for describing multiple facets of educational material
• MARC 21 - Machine Readable Cataloging format and related vocabularies for libraries
• EPICS Data Dictionary, a subset of which -- ONIX -- describes books in a specific XML format (pushed by Amazon.com)
EVA 2000 For further information....
• "Metadata Watch Reports" of SCHEMAS Project, http://www.schemas-forum.org
– Critical overview (with expert commentary) on the metadata landscape as it evolves
– Related database of individual activity reports
• D-Lib Magazine, http://www.dlib.org/dlib/
• Ariadne, http://www.ariadne.ac.uk
EVA 2000
Why the Web won
• Tim Berners-Lee's original model was very simple, and it was easy to implement
• Real-world experience with simple HTML led iteratively to better understanding of priorities
– As with bicycles and airplanes, there was no "theory" for design -- design was perfected iteratively, starting simple
• Complex standards impose significant costs, especially if legacy data must be converted
EVA 2000
Learning from experience
• People are only human: the most perfect language is always subject to interpretation
• By design, metadata languages must allow for innovation and evolution
• Physics and art history, Chinese and Finnish -- different languages will continue in real life
• Likewise, a diversity of metadata languages is inevitable
• Interoperability over "everything" can only be via a simple and general pidgin
EVA 2000