OAI Overview
DLESE OAI Workshop, April 29-30, 2002
John Weatherley (jweather@ucar.edu)
DLESE OAI April 29-30, 2002 2
Workshop Schedule
Day 1
  Morning
    Overview of OAI
    Look at OAI tools and resources
  Afternoon
    DLESE OAI software installation, configuration and setup
Day 2
  Morning
    Overview of NSDL and DLESE interoperability architecture
    NSDL metadata overview
    Metadata and OAI
Resources
Workshop presentation slides, links to tools and other OAI resources are located at: http://oai.dlese.org
What is DLESE and NSDL?
DLESE: Digital Library for Earth System Education: provides access to digitally accessible
resources for learning about the Earth system
NSDL: National Science (STEM) Digital Library: network of scholarly and educational digital
libraries related to science (DLESE will be part of this network)
1. What is the OAI?
What is the Open Archives Initiative (OAI)?
  Organization dedicated to solving problems of digital library interoperability by defining simple protocols and standards
  Grew out of the e-prints (arXiv) community at Los Alamos
What is the OAI Protocol for Metadata Harvesting (OAI-PMH)?
  Protocol to transfer metadata from a source archive to a destination archive
How is the OAI-PMH Being Used by the NSDL and DLESE?
  The OAI-PMH has been adopted as a primary means of gathering and sharing metadata among contributors
  Also used to facilitate internal management of metadata stores
What is Metadata?
Data refers to digital objects, e.g. the resources themselves
Metadata is data about data, e.g. a description of a resource, not the resource itself
OAI is used to transmit metadata
2. Definitions / Concepts
Basic Principles
  Harvesting vs. Federation
  Data Providers vs. Service Providers
Underlying Technology
  HTTP and XML
  XML Namespaces and Schema
Protocol Policies and Conventions
  Basic Policies
  Sets
Harvesting vs. Federation
Competing approaches to interoperability
  Federation is when services such as searching are run remotely
  Harvesting is when metadata is transferred from remote sources to the destination where the services are located
Federation requires more effort at the remote site but is easier for the local system
Harvesting requires less effort at the remote site; services are provided by the local system
OAI uses the harvesting model
Data Providers vs. Service Providers
Data Providers refer to entities who possess metadata and are willing to share this with others (e.g. collection builders)
Service Providers are entities who harvest data from Data Providers in order to provide higher-level services to users (e.g. searching, browsing, recommender systems, etc.). The NSDL and DLESE are examples.
Features of the OAI Approach
Lightweight:
  Low overhead for Data Providers
  Protocol is relatively simple to implement
  Many plug-and-play tools publicly available
Transports any metadata framework that can be made available in XML form (details to come)
Details of searching, browsing, annotation and other advanced services are handled by the Service Provider
Metadata Harvesting Framework
[Figure: Data Providers (collection builders) each hold XML records. A Service Provider (DLESE, NSDL) harvests them via the OAI protocol (over HTTP), stores the Harvested Records, and serves the Library User.]
1. Service Provider polls periodically for new records
2. New records downloaded and cached by the Service Provider
3. Provide searching, browsing, and other services over the data
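The three steps of the harvesting framework can be sketched as a small in-memory simulation. This is only an illustration: the class names, record ids and titles are hypothetical, and no network traffic or real OAI software is involved.

```python
from dataclasses import dataclass, field


@dataclass
class DataProvider:
    """Stand-in for a collection builder exposing metadata records."""
    records: dict = field(default_factory=dict)  # record id -> title

    def list_records(self):
        return dict(self.records)


@dataclass
class ServiceProvider:
    """Stand-in for a harvester such as DLESE or NSDL."""
    cache: dict = field(default_factory=dict)

    def harvest(self, provider):
        # Steps 1 and 2: poll the provider and cache anything not seen yet.
        new = {k: v for k, v in provider.list_records().items()
               if k not in self.cache}
        self.cache.update(new)
        return len(new)

    def search(self, term):
        # Step 3: offer a service (here, a naive title search) over the cache.
        term = term.lower()
        return sorted(k for k, v in self.cache.items() if term in v.lower())


provider = DataProvider({"rec-1": "Plate tectonics lesson",
                         "rec-2": "Ocean currents map"})
service = ServiceProvider()
first_poll = service.harvest(provider)      # both records are new
provider.records["rec-3"] = "Tectonic hazards activity"
second_poll = service.harvest(provider)     # only rec-3 is new
hits = service.search("tectonic")
```

The search runs entirely against the Service Provider's cache, which is the point of the harvesting model: the Data Provider only has to hand over records.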
HTTP and XML
The OAI-PMH is an almost stateless request/response protocol
Requests and responses are sent via the HTTP protocol
Requests are encoded as GET/POST operations
Responses are well-formed XML documents
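A minimal sketch of the request/response pattern, using only the Python standard library. The helper name `build_request` is hypothetical, and the response text is a hand-written, simplified stand-in rather than real server output.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_request(base_url, verb, **params):
    """Encode an OAI-PMH request as an HTTP GET URL."""
    query = urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"


# Hand-written, simplified stand-in for a server response (an assumption).
SAMPLE_RESPONSE = """<Identify>
  <repositoryName>Example Archive</repositoryName>
  <baseURL>http://oai.dlese.org/provider</baseURL>
</Identify>"""

url = build_request("http://oai.dlese.org/provider", "Identify")

# The response is an XML document, so any XML parser can read it.
root = ET.fromstring(SAMPLE_RESPONSE)
repository_name = root.findtext("repositoryName")
```

Each request carries everything the server needs in its query string, which is what makes the protocol close to stateless.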
Well-formed and Valid XML
Correct
  <car>
    <make>Dodge</make>
    <model>Spirit</model>
    <year>1994</year>
    <owner>
      <name>you</name>
      <plate>CO</plate>
    </owner>
  </car>
Incorrect
  <car>
    <make>Dodge</make>
    <model>Spirit</model>
    <year>1994
    <owner>
      <plate>CO</plate>
      <name>you</name>
  </car>
  </owner>
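The two car documents can be checked for well-formedness with a standard XML parser. The sketch below uses Python's built-in `xml.etree.ElementTree`; the helper name `is_well_formed` is hypothetical. It tests well-formedness only, since this parser does not validate against a DTD or schema.

```python
import xml.etree.ElementTree as ET

CORRECT = """<car>
  <make>Dodge</make><model>Spirit</model><year>1994</year>
  <owner>
    <name>you</name>
    <plate>CO</plate>
  </owner>
</car>"""

# <year> is never closed and </car> appears before </owner>.
INCORRECT = """<car>
  <make>Dodge</make><model>Spirit</model><year>1994
  <owner>
    <plate>CO</plate>
    <name>you</name>
</car>
</owner>"""


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


correct_ok = is_well_formed(CORRECT)
incorrect_ok = is_well_formed(INCORRECT)
```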
DTD, Schemas & Namespace
DTDs: Document Type Definitions
  Describe the elements of XML instance documents
  Not well-formed XML
  Some data-typing
  Namespaces harder to deal with
Schemas
  Describe the elements of XML instance documents
  Well-formed XML
  Strong data-typing
  Namespaces are easier to deal with
Namespace: Collection of related element names identified by a name label (e.g. dc)
XML Namespaces and Schema
Consistency and data quality are ensured by using XML Schema descriptions for each possible response
XML Namespaces are used where necessary to clearly define which parts of the responses are actual metadata and which support the OAI-PMH. Example:
http://www.cstc.org/cgi-bin/OAI/CSTC.pl?verb=GetRecord&identifier=oai%3ACSTC%3A103&metadataPrefix=oai_dc
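A rough sketch of how namespaces separate protocol elements from metadata elements in a GetRecord response. The response below is hand-written and simplified, not actual output from the CSTC server. Its namespace URIs follow OAI-PMH 2.0 and Dublin Core 1.1, which is an assumption and may differ from the protocol version in use at the time of this workshop; the datestamp and title are invented.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; namespace URIs, datestamp and title are assumptions.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header>
        <identifier>oai:CSTC:103</identifier>
        <datestamp>2002-04-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc
            xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
            xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""

# Prefixes chosen here are local labels; only the URIs have to match.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

root = ET.fromstring(SAMPLE)
# Protocol part: lives in the OAI namespace.
identifier = root.findtext(".//oai:header/oai:identifier", namespaces=NS)
# Metadata part: lives in the Dublin Core namespace.
title = root.findtext(".//dc:title", namespaces=NS)
```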
Basic OAI Policies and Conventions
Each metadata record from a given Data Provider must have a unique ID (OAI ID is not necessarily the same as the record ID)
Each metadata record must be persistent so that Service Providers can always refer back to the source
Each record must have a date stamp indicating creation / modification date
Dates provide a mechanism for incremental and continuous transfer of metadata by only requesting records that have changed since the previous harvest
Flow control: resumption tokens can be used to return partial results; the client is issued a token which may be presented to the server to receive more results
Multiple formats of metadata are allowed
  Examples: Dublin Core, DLESE IMS
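Date stamps and resumption tokens can be illustrated together with a simulated server and a client loop. Everything here is hypothetical: the identifiers, the dates, the page size and especially the token format. A real server issues opaque tokens that the client should pass back unchanged without interpreting them.

```python
# Hypothetical repository contents: (identifier, datestamp) pairs.
RECORDS = [
    ("oai:example:1", "2002-01-10"),
    ("oai:example:2", "2002-02-15"),
    ("oai:example:3", "2002-03-20"),
    ("oai:example:4", "2002-04-05"),
    ("oai:example:5", "2002-04-25"),
]
PAGE_SIZE = 2


def list_identifiers(from_date=None, resumption_token=None):
    """Simulated server: return one page of ids plus a token (None at the end)."""
    if resumption_token is not None:
        # The token alone carries the state the server needs to continue.
        from_date, offset = resumption_token.split("|")
        offset = int(offset)
    else:
        from_date, offset = from_date or "", 0
    # ISO-style dates compare correctly as plain strings.
    matching = [rid for rid, stamp in RECORDS if stamp >= from_date]
    page = matching[offset:offset + PAGE_SIZE]
    next_offset = offset + PAGE_SIZE
    token = f"{from_date}|{next_offset}" if next_offset < len(matching) else None
    return page, token


def harvest(from_date=None):
    """Client loop: keep presenting the token until none is issued."""
    ids, token = list_identifiers(from_date=from_date)
    while token is not None:
        page, token = list_identifiers(resumption_token=token)
        ids.extend(page)
    return ids


full = harvest()                               # first harvest: everything
incremental = harvest(from_date="2002-04-01")  # later harvest: changes only
```

A Service Provider would remember the date of its last successful harvest and pass it as the `from` date next time, so only changed records travel over the network.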
Sets
OAI-PMH mechanism to allow for harvesting of sub-collections
Semantics for sets are defined outside of the protocol
Sets are defined by conventions established between data and service providers
  Example sets within DLESE might be: DWEL, COMET, LDEO, etc.
  Example sets within the NSDL might be: DLESE, DLESE:DWEL, DLESE:COMET, DLESE:LDEO, etc.
Sets can be established that enable querying (e.g. by topic, author name, subject area, etc.)
  Example: The Open Digital Library (Suleman, 2001)
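One way to read the colon-separated set names is as a hierarchy in which DLESE:DWEL is a sub-set of DLESE. The sketch below assumes that reading, and assumes that harvesting a parent set also returns records from its sub-sets. The record ids and the set name OTHER are hypothetical.

```python
def in_set(record_set, requested_set):
    """True if record_set equals requested_set or is one of its sub-sets."""
    return (record_set == requested_set
            or record_set.startswith(requested_set + ":"))


# Hypothetical record-to-set assignments using the example set names above.
RECORD_SETS = {
    "rec-1": "DLESE:DWEL",
    "rec-2": "DLESE:COMET",
    "rec-3": "DLESE:LDEO",
    "rec-4": "OTHER",
}


def harvest_set(requested_set):
    """Return ids of all records that fall within the requested set."""
    return sorted(r for r, s in RECORD_SETS.items() if in_set(s, requested_set))


whole_library = harvest_set("DLESE")
one_collection = harvest_set("DLESE:DWEL")
```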
3. Requirements to be a Data Provider
Source of metadata
  Human or automated resource catalogers
Metadata mappings
  Crosswalks from native formats to DC or other formats
Server technology
  Handled by the OAI software
Datestamps
Deletions
Unique identifiers
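A crosswalk can be as simple as a table that renames native fields to Dublin Core elements. In this sketch the native field names and the sample record are hypothetical; the target names (title, description, creator, identifier) are standard Dublin Core elements.

```python
# Hypothetical native field names mapped to Dublin Core element names.
CROSSWALK = {
    "resource_title": "title",
    "summary": "description",
    "author_name": "creator",
    "web_address": "identifier",
}


def to_dublin_core(native_record):
    """Rename native fields to DC elements, dropping fields with no mapping."""
    return {CROSSWALK[k]: v for k, v in native_record.items() if k in CROSSWALK}


native = {
    "resource_title": "Ocean currents map",
    "author_name": "A. Cataloger",
    "internal_notes": "not for export",  # no DC equivalent, so it is dropped
}
dc_record = to_dublin_core(native)
```

Real crosswalks usually also have to merge, split or reformat values, which a plain rename table does not cover.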
4. The OAI-PMH
Service Requests
  Identify
  ListMetadataFormats
  ListSets
  GetRecord
  ListIdentifiers
  ListRecords
Date Ranges
Resumption Tokens
Identify
Purpose
  Return general information about the archive and its policies
Parameters
  None
Sample URL
  http://oai.dlese.org/provider?verb=Identify
ListMetadataFormats
Purpose
  List metadata formats supported by the archive as well as their schema locations and namespaces
Parameters
  identifier – for a specific record (O)
Sample URL
  http://oai.dlese.org/provider?verb=ListMetadataFormats
ListSets
Purpose
  Provide a hierarchical listing of sets in which records may be organized
Parameters
  None
Sample URL
  http://oai.dlese.org/provider?verb=ListSets
GetRecord
Purpose
  Returns the metadata for a single identifier in the form of an OAI record
Parameters
  identifier – id for the record (R)
  metadataPrefix – metadata format (R)
Sample URL
  http://oai.dlese.org/provider?verb=GetRecord&identifier=dlese%3ADLESE-000-000-000-002&metadataPrefix=dlese_ims
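The `%3A` in the sample URL is a percent-encoded colon. A short sketch with Python's standard `urllib.parse` shows the GetRecord URL taken apart into its parameters and the identifier encoded again. The URL is assumed to be the sample above joined onto one line.

```python
from urllib.parse import urlsplit, parse_qs, quote

SAMPLE_URL = ("http://oai.dlese.org/provider?verb=GetRecord"
              "&identifier=dlese%3ADLESE-000-000-000-002"
              "&metadataPrefix=dlese_ims")

# parse_qs percent-decodes each value; every key maps to a list of values.
params = {k: v[0] for k, v in parse_qs(urlsplit(SAMPLE_URL).query).items()}

# Going the other way: quote() escapes the colon when safe="" is passed.
encoded_id = quote(params["identifier"], safe="")
```

After decoding, the identifier reads `dlese:DLESE-000-000-000-002`, which is the form a harvester would store.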
ListIdentifiers
Purpose
  List all unique identifiers corresponding to the records in the repository
Parameters
  from – start date (O)
  until – end date (O)
  resumptionToken – flow control mechanism (X)
Sample URL
  http://oai.dlese.org/provider?verb=ListIdentifiers
ListRecords
Purpose
  Retrieves metadata for multiple records
Parameters
  from – start date (O)
  until – end date (O)
  resumptionToken – flow control mechanism (X)
  set – set to harvest from (O)
  metadataPrefix – metadata format (R)
Sample URL
  http://oai.dlese.org/provider?verb=ListRecords&metadataPrefix=dlese_ims
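The parameter lists for the six requests can be collected into one table and checked before a URL is built. The sketch reads (R) as required, (O) as optional and (X) as exclusive, meaning the parameter must appear on its own; that reading is an assumption, as are the helper name `build_checked_request` and the date format in the usage example.

```python
from urllib.parse import urlencode

# (required, optional) parameters per verb, as listed for each request above.
VERBS = {
    "Identify": (set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}),
    "ListSets": (set(), set()),
    "GetRecord": ({"identifier", "metadataPrefix"}, set()),
    "ListIdentifiers": (set(), {"from", "until"}),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}),
}
# Verbs that list resumptionToken as an exclusive (X) parameter.
TOKEN_VERBS = {"ListIdentifiers", "ListRecords"}


def build_checked_request(base_url, verb, params=None):
    """Validate params against the table above and return a GET URL."""
    params = dict(params or {})
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb}")
    if "resumptionToken" in params:
        # Exclusive: the token must be the only parameter besides the verb.
        if verb not in TOKEN_VERBS or len(params) > 1:
            raise ValueError("resumptionToken must be used on its own")
    else:
        required, optional = VERBS[verb]
        missing = required - set(params)
        unknown = set(params) - required - optional
        if missing or unknown:
            raise ValueError(
                f"missing={sorted(missing)} unknown={sorted(unknown)}")
    return f"{base_url}?{urlencode({'verb': verb, **params})}"


url = build_checked_request(
    "http://oai.dlese.org/provider",
    "ListRecords",
    {"metadataPrefix": "dlese_ims", "from": "2002-04-01"},
)
```

A GetRecord request without `metadataPrefix`, or a ListRecords request that combines `resumptionToken` with a date, is rejected with a `ValueError` before anything is sent.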
DLESE Architecture
[Figure: components shown are Collections, Direct Entry, Metadata Repository, Search & Discovery, Services (e.g. What’s New), DLESE Portal, Resources, NSDL and Library Users, with OAI links between components.]
References
1. “Building Interoperable Digital Libraries: A Practical Guide to creating Open Archives,” Hussein Suleman (hussein@vt.edu), JCDL 2001 Tutorial.
2. “A Framework for Building Open Digital Libraries,” Hussein Suleman and Edward A. Fox, in D-Lib Magazine, December 2001. http://www.dlib.org/dlib/december01/suleman/12suleman.html
3. The Open Archives Initiative. http://www.openarchives.org