Open Archives Initiative for Metadata Harvesting: A Framework for Building Open Digital Libraries
Term Paper-1
Submitted by
NIKESHN
International School of Information Management
University of Mysore
2010
1.0 Introduction
A digital library (DL) may be defined as a system that supports the collection, organization, storage,
retrieval, and dissemination of digital documents. It may be viewed as the intersection of library
science, computer science, and networked information systems. Open movements are gaining acceptance
in the scholarly information arena, and many universities and research centers have started to
provide public access to their repositories. With the growing number of digital repositories on the
Web, it has become difficult for users to visit each one individually in search of information, and
many organizational repositories are not indexed by search engines. A mechanism is therefore
required by which repositories can share resources and work in coordination to give users a broader
view. The ability of information systems to work in coordination is termed interoperability. The
Open Archives Initiative is one of the landmark efforts to make the metadata of digital resources
held in many repositories available at the user's end.
The essence of the open archives approach is to enable access to Web-accessible material through
interoperable repositories for metadata sharing, publishing, and archiving.
Such interoperability requirements necessitated the development of standards such as the Dublin
Core Metadata Element Set and the Open Archives Initiative Protocol for Metadata Harvesting
(OAI-PMH). These standards have achieved a degree of success in the DL community, largely
because of their generality and simplicity.
2.0 Need for a Harvesting Protocol
There is a growing need to make resources themselves, not only descriptive metadata, harvestable
in an interoperable manner. Two major use cases motivate this need:
o Preservation: the need to periodically transfer digital content from a data repository to one or
  more trusted digital repositories charged with storing and preserving safety copies of the
  content. The trusted digital repositories need a mechanism to synchronize automatically with
  the originating data repository.
o Discovery: the need to use the content itself in the creation of services. Examples include search
  engines that make full text from multiple data repositories searchable, and citation indexing
  systems that extract references from the full-text content. Another scenario is the provision of
  thumbnail versions of high-quality images from cultural heritage collections to external
  services that build browsing interfaces around the thumbnails.
3.0 OAI Protocol for Metadata Harvesting (OAI-PMH)
In October 1999 the Open Archives Initiative (OAI) was launched in an attempt to address
interoperability issues among the many existing, independent DLs. The focus was on high-level
communication among systems and on simplicity of protocols. The OAI has since received much
attention in the DL community and, primarily because of the simplicity of its standards, has
attracted many early adopters. The OAI-PMH defines a mechanism for harvesting records
containing metadata from repositories.
3.1 Definitions of Key Terms
Open Archives Initiative (OAI)
The OAI is an initiative to develop and promote interoperability standards that aim to facilitate
the efficient dissemination of content.
Archive
"The term 'archive' in the name Open Archives Initiative reflects the origins of the OAI in
the e-prints community, where the term archive is generally accepted as a synonym for
repository of scholarly papers. Members of the archiving profession have justifiably noted
the strict definition of an 'archive' within their domain, with connotations of preservation of
long-term value, statutory authorization, and institutional policy. The OAI uses the term
'archive' in a broader sense: as a repository for stored information. Language and terms are
never unambiguous and uncontroversial, and the OAI respectfully requests the indulgence of
the professional archiving community with this broader use of 'archive'."
(OAI definition, quoted from the FAQ on the OAI Web site)
OAI Protocol for Metadata Harvesting (OAI-PMH)
OAI-PMH is a lightweight harvesting protocol for sharing metadata between services.
Protocol
A protocol is a set of rules defining communication between systems. FTP (File Transfer
Protocol) and HTTP (Hypertext Transfer Protocol) are examples of other protocols used for
communication between systems across the Internet.
Harvesting
In the OAI context, harvesting refers specifically to the gathering together of metadata from a
number of distributed repositories into a combined data store.
3.2 Prerequisites for Developing a Metadata Harvesting Protocol
To facilitate metadata harvesting, there needs to be agreement on:
o Transport protocol: HTTP, FTP, or another such protocol
o Metadata format: Dublin Core, MARC, or another such format
o Metadata quality assurance: mandatory element set, naming and subject conventions, etc.
o Intellectual property and usage rights: who can do what with what
3.3 OAI Key Players
There are two groups of participants: Data Providers and Service Providers.
Data Providers
Data Providers (open archives, repositories) provide free access to metadata and may, but do not
necessarily, offer free access to full texts or other resources. OAI-PMH provides an
easy-to-implement, low-barrier solution for Data Providers.
Service Providers
Service Providers use the OAI interfaces of the Data Providers to harvest and store metadata. Note
that this means there are no live search requests to the Data Providers; rather, services are based
on the data harvested via OAI-PMH. Service Providers may select certain subsets from Data Providers
(e.g., by set hierarchy or datestamp). Service Providers offer value-added services on the basis
of the harvested metadata, and they may enrich that metadata in order to do so.
3.4 How It Works
The OAI-PMH gives data providers a simple technical option for making their metadata
available to services, based on the open standards HTTP (Hypertext Transfer Protocol) and
XML (Extensible Markup Language). The metadata that is harvested may be in any format that
is agreed by a community (or by any discrete set of data and service providers), although
unqualified Dublin Core is specified to provide a basic level of interoperability. Thus metadata
from many sources can be gathered together in one database, and services can be provided based
on this centrally harvested, or "aggregated", data. The link between this metadata and the related
content is not defined by the OAI protocol. It is important to realize that OAI-PMH does not
provide a search across this data; it simply makes it possible to bring the data together in one
place. In order to provide services, the harvesting approach must be combined with other
mechanisms.
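The division of labour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Aggregator class, the dictionary layout of a record, and the repository names repoA and repoB are all hypothetical, and an in-memory dictionary stands in for the service provider's central database. The point it shows is that searching happens locally, on harvested metadata, and never at the data providers.

```python
from dataclasses import dataclass, field


@dataclass
class Aggregator:
    """Hypothetical in-memory stand-in for a service provider's central store."""
    store: dict = field(default_factory=dict)

    def ingest(self, source, records):
        # Key by (source, identifier) so a later re-harvest overwrites stale copies.
        for rec in records:
            self.store[(source, rec["identifier"])] = rec

    def search(self, term):
        # The search runs on the aggregated copy; OAI-PMH itself offers no search.
        term = term.lower()
        return sorted(
            key for key, rec in self.store.items()
            if term in rec.get("title", "").lower()
        )


agg = Aggregator()
agg.ingest("repoA", [{"identifier": "oai:a:1", "title": "Digital Libraries"}])
agg.ingest("repoB", [
    {"identifier": "oai:b:7", "title": "Metadata Harvesting"},
    {"identifier": "oai:b:8", "title": "Open digital archives"},
])
hits = agg.search("digital")
print(hits)
```

A real service would replace the dictionary with a database and an index, which is the "other mechanism" the harvesting approach has to be combined with.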
3.5 Protocol Details
Records
A record is the metadata of a resource in a specific format. A record has three parts: a header and
metadata, both of which are mandatory, and an optional "about" statement. Each of these is made
up of various components, as set out below.
o header (mandatory)
    - identifier (mandatory, exactly 1)
    - datestamp (mandatory, exactly 1)
    - setSpec elements (optional, 0, 1 or more)
    - status attribute for a deleted item
o metadata (mandatory)
    - XML-encoded metadata with root tag and namespace
    - repositories must support Dublin Core and may support other formats
o about (optional)
    - rights statements
    - provenance statements
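The record layout above can be made concrete with a short parsing sketch. It uses only Python's standard xml.etree.ElementTree module; the sample record, its identifier oai:example.org:123, and the parse_record helper are invented for illustration, while the two namespace URIs are those published for OAI-PMH 2.0 and Dublin Core.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical record, invented for this example.
SAMPLE_RECORD = """\
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:123</identifier>
    <datestamp>2010-03-15</datestamp>
    <setSpec>theses</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>A sample thesis</dc:title>
      <dc:creator>Doe, J.</dc:creator>
    </oai_dc:dc>
  </metadata>
</record>
"""


def parse_record(xml_text):
    """Split a record into its header fields, metadata titles and about flag."""
    rec = ET.fromstring(xml_text)
    header = rec.find(OAI + "header")
    metadata = rec.find(OAI + "metadata")
    titles = []
    if metadata is not None:  # the metadata part may be missing, e.g. for deleted items
        titles = [t.text for t in metadata.iter(DC + "title")]
    return {
        "identifier": header.findtext(OAI + "identifier"),
        "datestamp": header.findtext(OAI + "datestamp"),
        "sets": [s.text for s in header.findall(OAI + "setSpec")],
        "deleted": header.get("status") == "deleted",
        "titles": titles,
        "has_about": rec.find(OAI + "about") is not None,
    }


parsed = parse_record(SAMPLE_RECORD)
print(parsed)
```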
Datestamps
A datestamp is the date of last modification of a metadata record. A datestamp is mandatory for
every item. It has two possible levels of granularity:
YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ
The function of the datestamp is to enable selective harvesting using the from and until
arguments; its main application is in incremental update mechanisms. It gives the date of
creation, last modification, or deletion. Support for deleted records is declared at one of three
levels: no, persistent, or transient.
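As a minimal sketch of how a harvester might use datestamps for incremental updates, the Python below recognizes the two granularities and builds the from and until arguments for the next harvest. The helper names parse_datestamp and incremental_args are hypothetical, and the sketch assumes the harvester remembers the time of its previous run.

```python
from datetime import datetime, timezone

DAY = "%Y-%m-%d"
SECONDS = "%Y-%m-%dT%H:%M:%SZ"


def parse_datestamp(value):
    """Return (UTC datetime, granularity) for either of the two allowed forms."""
    for fmt, granularity in ((SECONDS, "seconds"), (DAY, "day")):
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        return parsed.replace(tzinfo=timezone.utc), granularity
    raise ValueError(f"not an OAI-PMH datestamp: {value!r}")


def incremental_args(last_harvest, now, granularity):
    """Build from/until arguments covering the period since the last harvest."""
    fmt = SECONDS if granularity == "seconds" else DAY
    return {"from": last_harvest.strftime(fmt), "until": now.strftime(fmt)}


last_run = datetime(2010, 3, 1, tzinfo=timezone.utc)
this_run = datetime(2010, 3, 15, tzinfo=timezone.utc)
window = incremental_args(last_run, this_run, "day")
print(window)
```

The harvester should format the arguments in the granularity the repository supports, which is why the granularity is passed in rather than guessed.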
Metadata schema
OAI-PMH supports the dissemination of multiple metadata formats from a repository. The
properties of a metadata format are:
- an ID string to specify the format (metadataPrefix)
- a metadata schema URL (XML schema to test validity)
- an XML namespace URI (global identifier for the metadata format)
Repositories must be able to disseminate unqualified Dublin Core. Further arbitrary metadata
formats can be defined and transported via the OAI-PMH. Any returned metadata must comply
with an XML namespace specification. The Dublin Core Metadata Element Set contains 15
elements. All elements are optional, and all elements may be repeated.
3.6 The Dublin Core Metadata Element Set
Title          Contributor    Source
Creator        Date           Language
Subject        Type           Relation
Description    Format         Coverage
Publisher      Identifier     Rights
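Because every one of the 15 elements is optional and repeatable, an unqualified Dublin Core description maps naturally onto a dictionary of lists. The Python sketch below serializes such a dictionary into an oai_dc container using the standard library; the to_oai_dc helper and the sample field values are assumptions made for illustration, not part of any particular repository software.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the unqualified Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def to_oai_dc(fields):
    """Serialize {element name: [values]} into an oai_dc:dc element."""
    unknown = sorted(set(fields) - set(DC_ELEMENTS))
    if unknown:
        raise ValueError(f"not unqualified Dublin Core: {unknown}")
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name in DC_ELEMENTS:
        # Missing elements are simply skipped; repeated values yield repeated tags.
        for value in fields.get(name, []):
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


root = to_oai_dc({"title": ["A sample thesis"], "creator": ["Doe, J.", "Roe, R."]})
print(ET.tostring(root, encoding="unicode"))
```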
Sets
Sets enable a logical partitioning of repositories. They are optional: archives do not have to
define sets. There are no recommendations for the implementation of sets. Sets are not
necessarily exhaustive of the content of a repository, and they are not necessarily strictly
hierarchical. It is therefore important for communities to negotiate agreements that define
the sets useful to them.
Function: selective harvesting (set parameter)
Applications: subject gateways, dissertation search engines, and others
Examples:
o publication types (thesis, article, ...)
o document types (text, audio, image, ...)
o content sets according to DNB (medicine, biology, ...)
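Selective harvesting by set can be sketched as a simple membership test. In OAI-PMH 2.0 a setSpec is a colon-separated path through the set hierarchy, and a request for a set also covers its descendants. The in_set helper and the setSpec values below are hypothetical examples.

```python
def in_set(record_specs, requested):
    """True if any of a record's setSpecs is the requested set or a descendant."""
    prefix = requested + ":"
    return any(spec == requested or spec.startswith(prefix) for spec in record_specs)


# Hypothetical records: (identifier, list of setSpecs). A record may be in several sets.
records = [
    ("oai:example.org:1", ["thesis:phd"]),
    ("oai:example.org:2", ["article", "medicine"]),
    ("oai:example.org:3", []),
]
selected = [identifier for identifier, specs in records if in_set(specs, "thesis")]
print(selected)
```

Comparing against the prefix with its trailing colon keeps a set such as "thesisdrafts" from being mistaken for a child of "thesis".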
3.7 Request Format
Requests must be submitted using the GET or POST methods of HTTP, and repositories must
support both methods. At least one key=value pair, verb=RequestType (where RequestType is
a type of request such as ListRecords), must be provided. Additional key=value pairs depend
on the request type.
Example of a GET request:
http://archive.org/oai?verb=ListRecords&metadataPrefix=oai_dc
The encoding of special characters must be supported; for example, ":" (the host-port separator)
becomes "%3A".
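The request format can be illustrated with Python's standard urllib.parse module, which also takes care of encoding special characters. The build_request helper is a hypothetical name; the base URL is the one used in the example above.

```python
from urllib.parse import urlencode


def build_request(base_url, verb, args=None):
    """Return a GET URL carrying the verb plus any additional key=value pairs."""
    params = {"verb": verb}
    params.update(args or {})
    # urlencode percent-encodes reserved characters, so ":" becomes "%3A".
    return base_url + "?" + urlencode(params)


simple = build_request("http://archive.org/oai", "ListRecords",
                       {"metadataPrefix": "oai_dc"})
dated = build_request("http://archive.org/oai", "ListRecords",
                      {"metadataPrefix": "oai_dc", "from": "2010-03-15T10:20:30Z"})
print(simple)
print(dated)
```

The arguments are passed as a dictionary rather than as keyword arguments because "from" is a reserved word in Python.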
3.8 Response
Responses are formatted as HTTP responses. The content type must be text/xml. HTTP status
codes, as distinguished from OAI-PMH errors, such as 302 (redirect) and 503 (service not
available), may be returned. Compression encodings are optional in OAI-PMH; only the identity
encoding is mandatory. The response must be well-formed XML with markup as follows:
1. XML declaration
   (<?xml version="1.0" encoding="UTF-8"?>)
2. Root element named OAI-PMH with three attributes
   (xmlns, xmlns:xsi, xsi:schemaLocation)
3. Three child elements:
   1. responseDate (UTC datetime)
   2. request (the request that generated this response)
   3. either
      a) error (in case of an error or exception condition), or
      b) an element with the name of the OAI-PMH request
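A harvester can check this envelope before looking at the payload. The sketch below, using only the standard library, reads the three children of the root element and reports either the error codes or the name of the payload element. The sample response is invented for illustration; badVerb is one of the error codes defined by OAI-PMH 2.0, and inspect_response is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical error response, invented for this example.
ERROR_RESPONSE = """\
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
                             http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2010-03-15T10:20:30Z</responseDate>
  <request>http://archive.org/oai</request>
  <error code="badVerb">Illegal OAI verb</error>
</OAI-PMH>
"""


def inspect_response(xml_text):
    """Summarize the envelope: response date, errors, and payload element name."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    if root.tag != OAI + "OAI-PMH":
        raise ValueError("root element is not OAI-PMH")
    errors = [(e.get("code"), e.text) for e in root.findall(OAI + "error")]
    names = [child.tag.split("}")[-1] for child in root]
    payload = [n for n in names if n not in ("responseDate", "request", "error")]
    return {
        "responseDate": root.findtext(OAI + "responseDate"),
        "errors": errors,
        "payload": payload[0] if payload else None,
    }


info = inspect_response(ERROR_RESPONSE)
print(info)
```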
3.9 OAI-PMH Verbs
Here "verb" means the request type that the service provider (harvester) sends to obtain responses
from data providers. There is a standard set of six verbs:
o Identify
o ListMetadataFormats
o ListSets
o GetRecord
o ListIdentifiers
o ListRecords
Verb                   Function
Identify               Description of the repository
ListMetadataFormats    Metadata formats supported by the repository
ListSets               Sets defined by the repository
ListIdentifiers        Retrieves the unique identifiers of items
ListRecords            Used to harvest records from the repository
GetRecord              Retrieves an individual metadata record from the repository
A harvester is not required to use all request types. However, a repository must implement all of
them. Each request type has its own required and optional arguments.
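The dependence of arguments on the request type can be sketched as a lookup table plus a small validator. The argument lists below follow the OAI-PMH 2.0 specification as the editor understands it, but they are simplified: flow control through resumption tokens, which the specification also defines, is deliberately left out. The check_request helper is a hypothetical name; badVerb and badArgument are error codes defined by the protocol.

```python
# verb -> (required arguments, optional arguments); resumption tokens omitted.
VERB_ARGS = {
    "Identify": (set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}),
    "ListSets": (set(), set()),
    "GetRecord": ({"identifier", "metadataPrefix"}, set()),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}),
}


def check_request(verb, args):
    """Return the list of OAI-PMH error codes a repository might answer with."""
    if verb not in VERB_ARGS:
        return ["badVerb"]
    required, optional = VERB_ARGS[verb]
    supplied = set(args)
    missing = required - supplied
    illegal = supplied - required - optional
    return ["badArgument"] if missing or illegal else []


print(check_request("ListRecords", {"metadataPrefix": "oai_dc"}))
print(check_request("GetRecords", {}))
```

The second call shows why the exact spelling matters: the verb is GetRecord, and a request for "GetRecords" is an illegal verb.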
4.0 DSpace: OAI-Compatible Digital Library Software
DSpace is open source software for building and managing digital repositories. Developed jointly
by MIT Libraries and Hewlett-Packard (HP), it is freely available to research institutions as an
open source system that can be customized and extended. DSpace is a digital institutional
repository that captures, stores, indexes, preserves, and redistributes content in digital formats.
An institutional repository is a set of services that a research institution, organization, or
university offers to the members of its community for the management and dissemination of digital
materials created by the institution and its community members. Typically, DSpace has been
deployed for institutional repositories of publications, theses, and dissertations. Several groups
are working on extending its capabilities, such as implementing ontologies in the search interface
and the submission module, customizing it for the management of electronic theses and
dissertations, and localizing and internationalizing the package for the world's languages.
DSpace is compliant with OAI-PMH version 2.0, so metadata in DSpace digital libraries can be
harvested.
4.1 DSpace Search System
The end user can browse, search, and access the collections using the hierarchies and the
alphabetic bar menu. For searching the collection, DSpace uses the Lucene search engine, which is
part of the Apache Jakarta Project (1). Additionally, research projects such as the ...(Portugal)...
provide ontologies that enable context-based querying. These work like subject-based directory
structures.
The Lucene search engine has powerful search features that cover many of the end user's search
approaches. It provides basic "exact term" or keyword search. In addition, it allows fielded search,
akin to the field-level search of library databases; in DSpace, Dublin Core elements are used as the
field names. Lucene also supports Boolean search, range searches, term boosting, and proximity
searches. A particularly interesting facility is fuzzy search, based on the Levenshtein algorithm
(5), which can match terms by similarity. This feature is especially useful when we have heard a
term and must guess its spelling, and more so in the case of personal names.
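The idea behind similarity matching can be shown with a textbook implementation of Levenshtein edit distance in Python. This is not Lucene's implementation; the fuzzy_match helper, its max_distance threshold, and the sample names are assumptions chosen only to illustrate how a misspelt personal name can still be matched.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if the characters match
            ))
        prev = curr
    return prev[-1]


def fuzzy_match(term, candidates, max_distance=2):
    """Return the candidates within max_distance edits of term, ignoring case."""
    return [c for c in candidates
            if levenshtein(term.lower(), c.lower()) <= max_distance]


matches = fuzzy_match("Ranganathen", ["Ranganathan", "Raghavan"])
print(matches)
```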
4.2 Metadata in DSpace
DSpace users deal with or come across metadata in the following modules:
o Administration modules: Dublin Core registry; administrative metadata (default values, mail
  alerts to subscribers)
o Submission modules: descriptive metadata
o Harvesting: OAI-PMH using the (unqualified) DC elements
o Search result display: brief and full metadata
4.3 Metadata Harvesting in DSpace
DSpace is compliant with the OAI-PMH for exposing metadata. OAI-PMH allows repositories to
expose a hierarchy of sets in which records may be placed, and DSpace exposes collections as sets.
Each collection has a corresponding OAI set, and harvesters use the ListSets verb (OAI command) to
discover the sets. Only the 15 basic Dublin Core elements are exposed at present.
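Set discovery can be sketched as follows: the harvester issues a ListSets request and reads the setSpec and setName of each set from the response. The response below is invented for illustration; the repository URL, the setSpec values collection_1 and collection_2, and the collection names are hypothetical and do not reflect the identifiers a real DSpace installation generates.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical ListSets response with two collections exposed as sets.
LIST_SETS_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2010-03-15T10:20:30Z</responseDate>
  <request verb="ListSets">http://repository.example.org/oai</request>
  <ListSets>
    <set>
      <setSpec>collection_1</setSpec>
      <setName>Theses and Dissertations</setName>
    </set>
    <set>
      <setSpec>collection_2</setSpec>
      <setName>Faculty Publications</setName>
    </set>
  </ListSets>
</OAI-PMH>
"""


def list_sets(xml_text):
    """Map each setSpec in a ListSets response to its human-readable setName."""
    root = ET.fromstring(xml_text)
    return {
        s.findtext(OAI + "setSpec"): s.findtext(OAI + "setName")
        for s in root.iter(OAI + "set")
    }


sets = list_sets(LIST_SETS_RESPONSE)
print(sets)
```

A harvester could then pass one of the discovered setSpec values as the set argument of ListRecords to harvest a single collection.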
5.0 OAI Harvester Software
o Arc (http://arc.cs.odu.edu)
o Citebase (http://citebase.eprints.org/cgi-bin/search)
o CYCLADES (http://www.ercim.org/cyclades)
o DP9 (http://arc.cs.odu.edu:8080/dp9/index.jsp)
o MeIND (http://www.meind.de)
o METALIS (http://metalis.cilea.it)
o myOAI (http://www.myoai.com)
o NCSTRL (http://www.ncstrl.org)
o Perseus (http://www.perseus.tufts.edu/cgi-bin/vor)
o Public Knowledge Project - Open Archives Harvester (http://pkp.ubc.ca/harvester)
o OAICat (http://www.oclc.org/research/software/oai/cat.htm)
o OAI Repository Explorer (http://re.cs.uct.ac.za)
o OAIster (http://oaister.umdl.umich.edu/o/oaister)
o OASIC (Open Archives en SIC) (http://oasic.ccsd.cnrs.fr)
o OAIHarvester (http://www.oclc.org/research/software/oai/harvester.htm)
o DLESE OAI Software (http://dlese.org/oai/index.jsp)
6.0 Future Prospects
More work has to be done to make OAI-PMH a complete, globally accepted metadata harvesting
protocol:
o Tools and software have to be developed by which repositories that are not OAI-PMH compliant
  can be made compliant, so that they can act as data providers.
o Later versions of the protocol should be backward compatible with earlier ones.
At the metadata creation level, some standardization is required, because a particular resource is
described inconsistently at different repositories. Vocabulary control measures should also be put
in place. Only once these improvements are made can we offer our end users a comprehensive view
of the resources available on a particular subject.
7.0 Conclusion
Much promise is seen for the use of the protocol within an open archives approach. Support for a
new pattern of scholarly communication is the most publicized potential benefit. Perhaps most
readily achievable are the goals of surfacing hidden resources and low-cost interoperability.
Although the OAI-PMH is technically very simple, building coherent services that meet user
requirements remains complex. The OAI-PMH could become part of the infrastructure of the Web,
as taken for granted as HTTP now is, if the combination of its relative simplicity and its proven
success with early implementers in a service context leads to widespread uptake by research
organizations, publishers, and archives.
REFERENCES
1. http://www.openarchives.org
2. Breeding, M. (2002, April). The Emergence of the Open Archives Initiative: This Protocol could become a key part of the digital library infrastructure. Information Today. From http://www.findarticles.com/cf_0/m3336/4_19/85251474/p1/article.jhtml
3. Breeding, M. (2002). Understanding the Protocol for Metadata Harvesting of the Open Archives Initiative. Computers in Libraries, 22(8).
4. Lagoze, C., & Sompel, H. V. d. (2001, January). The Open Archives Initiative Protocol for Metadata Harvesting. From http://www.openarchives.org/OAI/openarchivesprotocol.htm
5. Lynch, C. A. (2001, August). Metadata Harvesting and the Open Archives Initiative. ARL Bimonthly Report, 217. From http://www.arl.org/newsltr/217/mhp.html
6. Shearer, K. (2002, March). The Open Archives Initiative: Developing an Interoperability Framework for Scholarly Publishing. CARL/ABRC Background Series, No. 5. From http://www.carl-abrc.ca/projects/scholarly/open_archives.PDF
7. Suleman, H., & Fox, E. A. (2001, December). A Framework for Building Open Digital Libraries. D-Lib Magazine, 7(12). From http://www.dlib.org/dlib/december01/suleman/12suleman.html
8. Sompel, H. V. d., & Lagoze, C. (2000, February). The Santa Fe Convention of the Open Archives Initiative. D-Lib Magazine, 6(2). From http://www.dlib.org/dlib/february00/vandesompel-oai/02vandesompel-oai.html
9. Warner, S. (2001, June). Exposing and Harvesting Metadata Using the OAI Metadata Harvesting Protocol: A Tutorial. HEP Libraries Webzine, Issue 4. From http://library.cern.ch/HEPLW/4/papers/3
11. http://www.ukoln.ac.uk/repositories/digirep/index/FAQs
12. Shepherd, M. (2003). Interoperability for Digital Libraries. DRTC Workshop on Semantic Web, 8th-10th December 2003, DRTC, Bangalore.
13. http://www.openarchives.org/Register/BrowseSites
14. http://www.openarchives.org/service/listproviders.html
Open Archives Initiatives for Metadata HarvestingA Framework for Building Open Digital Libraries
10 Introduction
Digital Library may be defined as system that supports collection organization storage retrieval
and dissemination of Digital Documents It may be viewed as the intersection of Library Science
Computer Science and networked information systems Open movements are gaining acceptance
in the scholarly information arena and many of the Universities and research centers have started
to provide public access to their repositories With the growing number of repositories of digital
repositories in the Web it became difficult for the users to visit individual places in search of
information Many organizational repositories have not been indexed by the search engines Such
mechanism is therefore required by which the repositories can share the resources and work in
coordination to provide a broader purview to the users The mechanism which provides the ability to
the information systems to work in coordination has been termed as Interoperability Open Archives
Initiative is one of the landmark efforts to ensure the availability of the metadata of digital resources
of many repositories at the usersrsquo end
The essence of the open archives approach is to enable access to Web-accessible material through
interoperable repositories for metadata sharing publishing and archiving
Such interoperability requirements necessitated the development of standards such as the Dublin
Core Metadata Element Set and the Open Archives Initiatives Protocol for Metadata Harvesting
(OAI-PMH) These standards have achieved a degree of success in the DL community largely
because of their generality and simplicity
20 Need for a Harvester protocol
There is a growing need to make resources not only descriptive metadata harvestable in an
interoperable manner There are two major use cases that motivate this need
Preservation The need to periodically transfer digital content from a data repository to one or
more trusted digital repositories charged with storing and preserving safety copies of the
content The trusted digital repositories need a mechanism to automatically synchronize with
the originating data repository
Discovery The need to use content itself in the creation of services Examples include search
engines that make full-text from multiple data repositories searchable and citation indexing
systems that extract references from the full-text content Another scenario is the provision of
thumbnail versions of high-quality images from cultural heritage collections to external
services that build browsing interfaces that include the thumbnails
30 OAI Protocol for Metadata Harvesting (OAI-PMH)
In October of 1999 the Open Archives Initiative (OAI) was launched in an attempt to address
interoperability issues among the many existing and independent DLs The focus was on high-
level communication among systems and simplicity of protocols The OAI has since received
much media attention in the DL community and primarily because of the simplicity of its
standards has attracted many early adopters It defines a mechanism for harvesting records
containing metadata from repositories
31 Definitions of Key terms
Open archives Initiatives (OAI)
OAI is an initiative to develop and promote interoperability standards that aim to facilitate the
efficient dissemination of content
Archive
The term archive in the name Open Archives Initiative reflects the origins of the OAI in
the e-prints community where the term archive is generally accepted as a synonym for
repository of scholarly papers Members of the archiving profession have justifiably noted
the strict definition of an archive within their domain with connotations of preservation of
long-term value statutory authorization and institutional policy The OAI uses the term
archive in a broader sense as a repository for stored information Language and terms are
never unambiguous and uncontroversial and the OAI respectfully requests the indulgence of
the professional archiving community with this broader use of archive
(OAI definition quoted from FAQ on OAI Web site)
OAI Protocol for Metadata Harvesting (OAI-PMH)
OAI-PMH is a lightweight harvesting protocol for sharing metadata between services
Protocol
A protocol is a set of rules defining communication between systems FTP (File Transfer
Protocol) and HTTP (Hypertext Transport Protocol) are examples of other protocols used for
communication between systems across the Internet
Harvesting
In the OAI context harvesting refers specifically to the gathering together of metadata from a
number of distributed repositories into a combined data store
32 Prerequisites to develop metadata harvesting protocol
To facilitate metadata harvesting there needs to be agreement on
o Transport protocol - HTTP or FTP or other such protocol
o Metadata format - Dublin Core or MARC or other such format
o Metadata Quality Assurance - mandatory element set naming and subject conventions etc
o Intellectual Property and Usage Rights - who can do what with what
33 OAI Key players
There are two groups of participants Data Providers and Service Providers
Data Providers
(open archives repositories) provide free access to metadata and may but do not necessarily
offer free access to full texts or other resources OAI-PMH provides an easy to implement low
barrier solution for Data Providers
Service Providers
use the OAI interfaces of the Data Providers to harvest and store metadata Note that this means
that there are no live search requests to the Data Providers rather services are based on the
harvested data via OAI-PMH Service Providers may select certain subsets from Data Providers
(eg by set hierarchy or date stamp) Service Providers offer (value-added) services on the basis
of the metadata harvested and they may enrich the harvested metadata in order to do so
34 How it works
Prerequisites to develop metadata harvesting protocol
To facilitate metadata harvesting there needs to be agreement on
o Transport protocol - HTTP or FTP or other such protocol
o Metadata format - Dublin Core or MARC or other such format
o Metadata Quality Assurance - mandatory element set naming and subject conventions etc
o Intellectual Property and Usage Rights - who can do what with what
The OAI-PMH gives a simple technical option for data providers to make their metadata
available to services based on the open standards HTTP (Hypertext Transport Protocol) and
XML (Extensible Markup Language) The metadata that is harvested may be in any format that
is agreed by a community (or by any discrete set of data and service providers) although
unqualified Dublin Core is specified to provide a basic level of interoperability Thus metadata
from many sources can be gathered together in one database and services can be provided based
on this centrally harvested or aggregated data The link between this metadata and the related
content is not defined by the OAI protocol It is important to realize that OAI-PMH does not
provide a search across this data it simply makes it possible to bring the data together in one
place In order to provide services the harvesting approach must be combined with other
mechanisms
35 Protocol details
Records
A record is the metadata of a resource in a specific format A record has three parts a header and
metadata both of which are mandatory and an optional about statement Each of these is made
up of various components as set out below
header (mandatory)
identifier (mandatory 1 only)
datestamp (mandatory 1 only)
setSpec elements (optional 0 1 or more)
status attribute for deleted item
metadata (mandatory)
XML encoded metadata with root tag namespace
repositories must support Dublin Core may support other formats
about (optional)
rights statements
provenance statements
Datestamps
A datestamp is the date of last modification of a metadata record Datestamp is a mandatory
characteristic of every item It has two possible levels of granularity
YYYY-MM-DD or YYYY-MM-DDThhmmssZ
The function of the datestamp is to provide information on metadata that enables selective
harvesting using from and until arguments Its applications are in incremental update
mechanisms It gives either the date of creation last modification or deletion Deletion is
covered with three support levels no persistent transient
Metadata schema
OAI-PMH supports dissemination of multiple metadata formats from a repository The
properties of metadata formats are
ndash id string to specify the format (metadataPrefix)
ndash metadata schema URL (XML schema to test validity)
ndash XML namespace URI (global identifier for metadata format)
Repositories must be able to disseminate unqualified Dublin Core Further arbitrary metadata
formats can be defined and transported via the OAI-PMH Any returned metadata must comply
with an XML namespace specification The Dublin Core Metadata Element Set contains 15
elements All elements are optional and all elements may be repeated
36 The Dublin Core Metadata Element Set
Title Contributor Source
Creator Date Language
Subject Type Relation
Description Format Coverage
Publisher Identifier Rights
Sets
Sets enable a logical partitioning of repositories They are optional archives do not have to
define Sets There are no recommendations for the implementation of Sets Sets are not
necessarily exhaustive of the content of a repository They are not necessarily strictly
hierarchical It is important and necessary to have negotiated agreements within communities
defining useful sets for the communities
function selective harvesting (set parameter)
applications subject gateways dissertation search engine and others
examples
o publication types (thesis article )
o document types (text audio image )
o content sets according to DNB (medicine biology )
37 Request format
Requests must be submitted using the GET or POST methods of HTTP and repositories must
support both methods At least one key=value pair verb=RequestType (where RequestType is
some type of request such as ListRecords) must be provided Additional key=value pairs depend
on the request type
example for GET request httparchiveorgoai
verb=ListRecordsampmetadataPrefix=oai_dc
The encoding of special characters must be supported for example (host port separator)
becomes 3A
38 Response
Responses are formatted as HTTP responses The content type must be textxml HTTP-based
status codes as distinguished from OAI-PMH errors such as 302 (redirect) and 503 (service not
available) may be returned Compression codes are optional in OAI-PMH only identity
encoding is mandatory The response format must be well-formed XML with markup as follows
1 XML declaration
(ltxml version=10 encoding=UTF-8 gt)
2 root element named OAI-PMH with three attributes
(xmlns xmlnsxsi xsischemaLocation)
3 three child elements
1 responseDate (UTC datetime)
2 request (the request that generated this response)
3 a) error (in case of an error or exception condition)
b) element with the name of the OAI-PMH request
39 OAI-
PMH
Verbs
Here lsquoverbrsquo
means
request type which the service providerharvester sends to get responses from data providers There is
a standard set of 6 verbs
o Identify
o ListMetadataFormats
o ListSets
o GetRecord
o ListIdentifiers
o ListRecords
Function
Identify Description of repository
ListMetadataFormats Metadata format supported by the repository
ListSets Sets defined by repository
ListIdentifiers Retrieves unique identifiers of the item
ListRecords Used to harvest records from the repository
GetRecords Retrieves individual metadata record from the
repository
A harvester is not required to use all types However a repository must implement all types
There are required and optional arguments depending on request types
40 Dspace OAI compatible Digital Library Software
DSpace is open source software for building and managing Digital repositories Developed jointly by
MIT Libraries and Hewlett-Packard (HP) is freely available to research institutions as an open
source system that can be customized and extended DSpace is a digital institutional repository that
captures stores indexes preserves and redistributes content in digital formats Institutional
Repository is a set of services that a research institution organization university offers to the
members of its community for the management and dissemination of digital
materials created by the institution and its community members Typically DSpace has been
deployed for Institutional Repositories of publications thesis and dissertations There are several
groups working on extending its capabilities such implementation of ontologies in search interface
and for submission module customization for management of electronic theses and dissertations and
for localization and international of the package for the world languages
Dspace is compliant with OAI-PMH ver 20 and metadata in Dspace digital libraries can be
harvested
4.1 DSpace Search System
The end user can browse, search and access the collections using the hierarchies and also the
alphabetic bar menu. For searching the collection, DSpace uses the Lucene search engine, which is
part of the Apache Jakarta Project (1). Additionally, research projects such as the …(Portugal)…
provide ontologies that enable context-based querying. These work like subject-based directory
structures.
Lucene has powerful search features that cover many of the search approaches of
the end user. It provides the basic 'exact term' or keyword search. In addition, it allows fielded search,
akin to the field-level search of library databases. In DSpace, Dublin Core elements are used for the field
names. Lucene also supports Boolean search, range searches, term boosting and proximity searches.
An interesting facility is Lucene's fuzzy search, which is based on the Levenshtein algorithm
(5) and can match terms by similarity. This feature is especially useful where
we have heard a term and must guess its spelling, and more so in the case of personal names.
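The idea behind similarity matching can be illustrated with a short sketch of the Levenshtein edit distance. This is not Lucene's own code; the function names, the threshold of two edits and the sample list of names are assumptions chosen only for illustration.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def fuzzy_match(term, vocabulary, max_distance=2):
    """Return vocabulary entries within max_distance edits of term, closest first."""
    scored = [(levenshtein(term.lower(), word.lower()), word) for word in vocabulary]
    return [word for distance, word in sorted(scored) if distance <= max_distance]


if __name__ == "__main__":
    # A misspelt personal name still finds the intended entry.
    print(fuzzy_match("Sulemann", ["Suleman", "Lagoze", "Warner"]))
```

A smaller threshold gives fewer but closer matches; a larger one tolerates more spelling variation at the cost of more false hits.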
4.2 Metadata in DSpace
DSpace users deal with (come across) metadata in the following modules:
o Administration modules: Dublin Core registry, administrative metadata - default values, mail
  alert to subscribers
o Submission modules: descriptive metadata
o Harvesting: OAI-PMH using the DC elements (unqualified)
o Search result display: brief and full metadata
4.3 Metadata harvesting in DSpace
DSpace is compliant with the OAI-PMH for exposing metadata. OAI-PMH allows repositories to
expose a hierarchy of sets in which records may be placed. DSpace exposes collections as sets.
Each collection has a corresponding OAI set, and harvesters use the verb (OAI command) ListSets to
discover the sets. Only the 15 basic Dublin Core elements are exposed at present.
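How a harvester might read the answer to a ListSets request can be sketched as follows. The response body below is invented for illustration (the base URL, setSpec and setName values are assumptions, not taken from a real DSpace installation); only the element names and the namespace come from OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical response body; the setSpec and setName values are made up.
SAMPLE_LIST_SETS = """<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2010-01-15T10:00:00Z</responseDate>
  <request verb="ListSets">http://repository.example.org/oai/request</request>
  <ListSets>
    <set><setSpec>hdl_123456789_1</setSpec><setName>Theses</setName></set>
    <set><setSpec>hdl_123456789_2</setSpec><setName>Articles</setName></set>
  </ListSets>
</OAI-PMH>"""


def list_sets(response_xml):
    """Return (setSpec, setName) pairs found in a ListSets response."""
    root = ET.fromstring(response_xml.encode("utf-8"))
    return [
        (item.findtext("oai:setSpec", namespaces=OAI_NS),
         item.findtext("oai:setName", namespaces=OAI_NS))
        for item in root.findall("oai:ListSets/oai:set", OAI_NS)
    ]


if __name__ == "__main__":
    for spec, name in list_sets(SAMPLE_LIST_SETS):
        print(spec, "-", name)
```

The setSpec value is what the harvester would later pass as the set argument of ListRecords or ListIdentifiers to harvest a single collection.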
5.0 OAI Harvester Software
o Arc (http://arc.cs.odu.edu)
o Citebase (http://citebase.eprints.org/cgi-bin/search)
o CYCLADES (http://www.ercim.org/cyclades)
o DP9 (http://arc.cs.odu.edu:8080/dp9/index.jsp)
o MeIND (http://www.meind.de)
o METALIS (http://metalis.cilea.it)
o myOAI (http://www.myoai.com)
o NCSTRL (http://www.ncstrl.org)
o Perseus (http://www.perseus.tufts.edu/cgi-bin/vor)
o Public Knowledge Project – Open Archives Harvester (http://pkp.ubc.ca/harvester)
o OAICat (http://www.oclc.org/research/software/oai/cat.htm)
o OAI Repository Explorer (http://re.cs.uct.ac.za)
o OAIster (http://oaister.umdl.umich.edu/o/oaister)
o OASIC (Open Archives en SIC) (http://oasic.ccsd.cnrs.fr)
o OAIHarvester (http://www.oclc.org/research/software/oai/harvester.htm)
o DLESE OAI Software (http://dlese.org/oai/index.jsp)
6.0 Future Prospects
Some more work has to be done to make OAI-PMH a complete, globally accepted
metadata harvesting protocol:
o Tools and software have to be developed by which non-OAI-PMH-compliant repositories
  can be made OAI-PMH compliant, so that such a repository can act as a data provider.
o Higher versions of the protocol should be made compatible with the lower ones.
At the metadata creation level some standardization is required, as a particular resource is described
inconsistently at different repositories. Vocabulary control measures should also be taken care of.
Some more improvements are still awaited in the OAI-PMH protocol; only then can we ensure
a comprehensive view of the resources available on a particular subject for our end users.
7.0 Conclusion
Much promise is seen for the use of the protocol within an open archives approach. Support for a
new pattern of scholarly communication is the most publicized potential benefit. Perhaps most
readily achievable are the goals of surfacing hidden resources and low-cost interoperability.
Although the OAI-PMH is technically very simple, building coherent services that meet user
requirements remains complex. The OAI-PMH could become part of the infrastructure
of the Web, as taken for granted as HTTP now is, if a combination of its relative
simplicity and proven success by early implementers in a service context leads to widespread
uptake by research organizations, publishers and archives.
REFERENCES
1 http://www.openarchives.org
2 Breeding, M. (2002, April). The Emergence of the Open Archives Initiative: This Protocol could become a key part of the digital library infrastructure. Information Today, from http://www.findarticles.com/cf_0/m3336/4_19/85251474/p1/article.jhtml
3 Breeding, M. (2002). Understanding the Protocol for Metadata Harvesting of the Open Archives Initiative. Computers in Libraries, 22(8).
4 Lagoze, C., & Sompel, H. V. d. (2001, January). The Open Archives Initiative Protocol for Metadata Harvesting, from http://www.openarchives.org/OAI/openarchivesprotocol.htm
5 Lynch, C. A. (2001, August). Metadata Harvesting and the Open Archives Initiative. ARL Bimonthly Report, 217, from http://www.arl.org/newsltr/217/mhp.html
6 Shearer, K. (2002, March). The Open Archives Initiative: Developing an Interoperability Framework for Scholarly Publishing. CARL/ABRC Background Series, No. 5, from http://www.carl-abrc.ca/projects/scholarly/open_archives.PDF
7 Suleman, H., & Fox, E. A. (2001, December). A Framework for Building Open Digital Libraries. D-Lib Magazine, 7(12), from http://www.dlib.org/dlib/december01/suleman/12suleman.html
8 Sompel, H. V. d., & Lagoze, C. (2000, February). The Santa Fe Convention of the Open Archives Initiative. D-Lib Magazine, 6(2), from http://www.dlib.org/dlib/february00/vandesompel-oai/02vandesompel-oai.html
9 Warner, S. (2001, June). Exposing and Harvesting Metadata Using the OAI Metadata Harvesting Protocol: A Tutorial. HEP Libraries Webzine, Issue 4, from http://library.cern.ch/HEPLW/4/papers/3
11 http://www.ukoln.ac.uk/repositories/digirep/index/FAQs
12 Michael Shepherd (2003). Interoperability for Digital Libraries. DRTC Workshop on Semantic Web, 8th – 10th December 2003, DRTC, Bangalore.
13 http://www.openarchives.org/Register/BrowseSites
14 http://www.openarchives.org/service/listproviders.html
content. The trusted digital repositories need a mechanism to automatically synchronize with
the originating data repository.
Discovery: The need to use content itself in the creation of services. Examples include search
engines that make full text from multiple data repositories searchable, and citation indexing
systems that extract references from the full-text content. Another scenario is the provision of
thumbnail versions of high-quality images from cultural heritage collections to external
services that build browsing interfaces that include the thumbnails.
3.0 OAI Protocol for Metadata Harvesting (OAI-PMH)
In October 1999 the Open Archives Initiative (OAI) was launched in an attempt to address
interoperability issues among the many existing and independent DLs. The focus was on
high-level communication among systems and simplicity of protocols. The OAI has since received
much media attention in the DL community and, primarily because of the simplicity of its
standards, has attracted many early adopters. It defines a mechanism for harvesting records
containing metadata from repositories.
3.1 Definitions of Key terms
Open Archives Initiative (OAI)
OAI is an initiative to develop and promote interoperability standards that aim to facilitate the
efficient dissemination of content.
Archive
The term "archive" in the name Open Archives Initiative reflects the origins of the OAI in
the e-prints community, where the term archive is generally accepted as a synonym for
repository of scholarly papers. Members of the archiving profession have justifiably noted
the strict definition of an "archive" within their domain, with connotations of preservation of
long-term value, statutory authorization and institutional policy. The OAI uses the term
"archive" in a broader sense: as a repository for stored information. Language and terms are
never unambiguous and uncontroversial, and the OAI respectfully requests the indulgence of
the professional archiving community with this broader use of "archive".
(OAI definition quoted from FAQ on OAI Web site)
OAI Protocol for Metadata Harvesting (OAI-PMH)
OAI-PMH is a lightweight harvesting protocol for sharing metadata between services.
Protocol
A protocol is a set of rules defining communication between systems. FTP (File Transfer
Protocol) and HTTP (Hypertext Transfer Protocol) are examples of other protocols used for
communication between systems across the Internet.
Harvesting
In the OAI context, harvesting refers specifically to the gathering together of metadata from a
number of distributed repositories into a combined data store.
3.2 Prerequisites to develop a metadata harvesting protocol
To facilitate metadata harvesting there needs to be agreement on:
o Transport protocol - HTTP or FTP or other such protocol
o Metadata format - Dublin Core or MARC or other such format
o Metadata Quality Assurance - mandatory element set, naming and subject conventions, etc.
o Intellectual Property and Usage Rights - who can do what with what
3.3 OAI Key players
There are two groups of participants: Data Providers and Service Providers.
Data Providers
(open archives, repositories) provide free access to metadata, and may, but do not necessarily,
offer free access to full texts or other resources. OAI-PMH provides an easy-to-implement,
low-barrier solution for Data Providers.
Service Providers
use the OAI interfaces of the Data Providers to harvest and store metadata. Note that this means
that there are no live search requests to the Data Providers; rather, services are based on the
data harvested via OAI-PMH. Service Providers may select certain subsets from Data Providers
(e.g. by set hierarchy or datestamp). Service Providers offer (value-added) services on the basis
of the metadata harvested, and they may enrich the harvested metadata in order to do so.
3.4 How it works
The OAI-PMH gives a simple technical option for data providers to make their metadata
available to services, based on the open standards HTTP (Hypertext Transfer Protocol) and
XML (Extensible Markup Language). The metadata that is harvested may be in any format that
is agreed by a community (or by any discrete set of data and service providers), although
unqualified Dublin Core is specified to provide a basic level of interoperability. Thus, metadata
from many sources can be gathered together in one database, and services can be provided based
on this centrally harvested, or aggregated, data. The link between this metadata and the related
content is not defined by the OAI protocol. It is important to realize that OAI-PMH does not
provide a search across this data; it simply makes it possible to bring the data together in one
place. In order to provide services, the harvesting approach must be combined with other
mechanisms.
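The division of labour described above, harvesting first and searching locally afterwards, can be sketched in a few lines. Everything in this example is assumed for illustration: the provider names, identifiers and titles are invented, and the title search stands in for whatever search mechanism a real service provider would combine with harvesting.

```python
def aggregate(harvests):
    """Merge records harvested from several data providers into one store.

    harvests maps a provider name to the list of records harvested from it.
    """
    store = {}
    for provider, records in harvests.items():
        for record in records:
            store[record["identifier"]] = {
                "provider": provider,
                "title": record["title"],
            }
    return store


def search_titles(store, word):
    """A deliberately naive local search over the aggregated titles."""
    word = word.lower()
    return sorted(
        identifier
        for identifier, record in store.items()
        if word in record["title"].lower()
    )


# Made-up harvest results from two hypothetical data providers.
HARVESTS = {
    "Repository A": [
        {"identifier": "oai:a.example.org:1", "title": "Metadata Harvesting Basics"},
        {"identifier": "oai:a.example.org:2", "title": "Digital Preservation"},
    ],
    "Repository B": [
        {"identifier": "oai:b.example.org:7", "title": "Dublin Core metadata in practice"},
    ],
}

if __name__ == "__main__":
    print(search_titles(aggregate(HARVESTS), "metadata"))
```

The search runs entirely against the aggregated store; no request reaches the data providers at query time.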
3.5 Protocol details
Records
A record is the metadata of a resource in a specific format. A record has three parts: a header and
metadata, both of which are mandatory, and an optional about statement. Each of these is made
up of various components, as set out below:
header (mandatory)
    identifier (mandatory, 1 only)
    datestamp (mandatory, 1 only)
    setSpec elements (optional, 0, 1 or more)
    status attribute for deleted item
metadata (mandatory)
    XML-encoded metadata with root tag, namespace
    repositories must support Dublin Core, may support other formats
about (optional)
    rights statements
    provenance statements
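The record structure set out above can be made concrete with a small parsing sketch. The sample record is invented (identifier, datestamp, set and Dublin Core values are assumptions); the element names and namespaces are those of OAI-PMH 2.0 and the oai_dc format, and the optional about part is left out.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical record; all values are made up for illustration.
SAMPLE_RECORD = """
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:repository.example.org:123</identifier>
    <datestamp>2009-11-02</datestamp>
    <setSpec>theses</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>A sample thesis</dc:title>
      <dc:creator>Author, A.</dc:creator>
      <dc:creator>Author, B.</dc:creator>
    </oai_dc:dc>
  </metadata>
</record>"""


def parse_record(record_xml):
    """Split a record into its header fields and its Dublin Core metadata."""
    record = ET.fromstring(record_xml)
    header = record.find("oai:header", NS)
    parsed = {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        "sets": [item.text for item in header.findall("oai:setSpec", NS)],
        "deleted": header.get("status") == "deleted",
        "metadata": {},
    }
    dc_root = record.find("oai:metadata/oai_dc:dc", NS)
    if dc_root is not None:  # deleted records carry no metadata part
        for element in dc_root:
            name = element.tag.split("}", 1)[1]  # strip the namespace
            parsed["metadata"].setdefault(name, []).append(element.text)
    return parsed


if __name__ == "__main__":
    print(parse_record(SAMPLE_RECORD))
```

Dublin Core values are collected into lists because every element may be repeated, as the two creator elements in the sample show.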
Datestamps
A datestamp is the date of last modification of a metadata record. A datestamp is a mandatory
characteristic of every item. It has two possible levels of granularity:
YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ
The function of the datestamp is to provide information on metadata that enables selective
harvesting using the from and until arguments. Its applications are in incremental update
mechanisms. It gives either the date of creation, last modification or deletion. Deletion is
covered with three support levels: no, persistent, transient.
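Selective harvesting by datestamp can be sketched as a simple range filter. The example assumes that from and until are inclusive bounds and that all datestamps use the same granularity; the function names and the sample headers are invented for illustration.

```python
from datetime import datetime, timezone

DAY_FORMAT = "%Y-%m-%d"
SECONDS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_datestamp(text):
    """Parse either granularity into a timezone-aware UTC datetime."""
    pattern = DAY_FORMAT if len(text) == 10 else SECONDS_FORMAT
    return datetime.strptime(text, pattern).replace(tzinfo=timezone.utc)


def select(headers, from_=None, until=None):
    """Return identifiers whose datestamp lies within [from_, until], both inclusive.

    headers is a list of (identifier, datestamp) pairs.
    """
    lower = parse_datestamp(from_) if from_ else None
    upper = parse_datestamp(until) if until else None
    selected = []
    for identifier, datestamp in headers:
        stamp = parse_datestamp(datestamp)
        if lower is not None and stamp < lower:
            continue
        if upper is not None and stamp > upper:
            continue
        selected.append(identifier)
    return selected


# Made-up headers with day granularity.
HEADERS = [
    ("oai:a.example.org:1", "2010-01-05"),
    ("oai:a.example.org:2", "2010-02-10"),
    ("oai:a.example.org:3", "2010-03-01"),
]

if __name__ == "__main__":
    # Incremental update: ask only for what changed since the last harvest.
    print(select(HEADERS, from_="2010-02-01"))
```

For an incremental update, a harvester would remember the date of its last successful harvest and use it as the from argument of the next request.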
Metadata schema
OAI-PMH supports dissemination of multiple metadata formats from a repository. The
properties of metadata formats are:
- id string to specify the format (metadataPrefix)
- metadata schema URL (XML schema to test validity)
- XML namespace URI (global identifier for metadata format)
Repositories must be able to disseminate unqualified Dublin Core. Further arbitrary metadata
formats can be defined and transported via the OAI-PMH. Any returned metadata must comply
with an XML namespace specification. The Dublin Core Metadata Element Set contains 15
elements. All elements are optional, and all elements may be repeated.
3.6 The Dublin Core Metadata Element Set
Title         Contributor   Source
Creator       Date          Language
Subject       Type          Relation
Description   Format        Coverage
Publisher     Identifier    Rights
Sets
Sets enable a logical partitioning of repositories. They are optional: archives do not have to
define Sets. There are no recommendations for the implementation of Sets. Sets are not
necessarily exhaustive of the content of a repository. They are not necessarily strictly
hierarchical. It is important and necessary to have negotiated agreements within communities
defining useful sets for the communities.
Function: selective harvesting (set parameter)
Applications: subject gateways, dissertation search engine and others
Examples:
o publication types (thesis, article, ...)
o document types (text, audio, image, ...)
o content sets according to DNB (medicine, biology, ...)
3.7 Request format
Requests must be submitted using the GET or POST methods of HTTP, and repositories must
support both methods. At least one key=value pair, verb=RequestType (where RequestType is
some type of request such as ListRecords), must be provided. Additional key=value pairs depend
on the request type.
Example of a GET request:
http://archive.org/oai?verb=ListRecords&metadataPrefix=oai_dc
The encoding of special characters must be supported; for example, ":" (host port separator)
becomes "%3A".
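The two request methods and the encoding of special characters can be sketched with Python's standard library. The base URL and the identifier are hypothetical, and the helper names get_url and post_request are assumptions; urlencode and Request are the standard urllib functions.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://archive.example.org/oai"  # hypothetical repository base URL


def get_url(base_url, verb, **arguments):
    """Build the URL of a GET request; urlencode escapes special characters."""
    return base_url + "?" + urlencode({"verb": verb, **arguments})


def post_request(base_url, verb, **arguments):
    """Build (but do not send) the equivalent POST request."""
    body = urlencode({"verb": verb, **arguments}).encode("ascii")
    return Request(
        base_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


if __name__ == "__main__":
    # The ":" characters of the identifier are sent as %3A.
    print(get_url(BASE_URL, "GetRecord",
                  identifier="oai:archive.example.org:42",
                  metadataPrefix="oai_dc"))
    print(post_request(BASE_URL, "Identify").get_method())
```

Passing the request to urllib.request.urlopen would actually send it; that step is left out here because the base URL is only a placeholder.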
3.8 Response
Responses are formatted as HTTP responses. The content type must be text/xml. HTTP-based
status codes, as distinguished from OAI-PMH errors, such as 302 (redirect) and 503 (service not
available), may be returned. Compression encodings are optional in OAI-PMH; only identity
encoding is mandatory. The response format must be well-formed XML with markup as follows:
1 XML declaration
  (<?xml version="1.0" encoding="UTF-8" ?>)
2 root element named OAI-PMH with three attributes
  (xmlns, xmlns:xsi, xsi:schemaLocation)
3 three child elements
  1 responseDate (UTC datetime)
  2 request (the request that generated this response)
  3 a) error (in case of an error or exception condition), or
    b) element with the name of the OAI-PMH request
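The response layout listed above can be checked with a short sketch that separates the envelope (responseDate, request) from either the error elements or the element named after the request. The sample response and its URL are invented; badVerb is one of the error codes defined in OAI-PMH 2.0, and the function name read_response is an assumption.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical error response, as a repository might answer an unknown verb.
SAMPLE_ERROR = """<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
                             http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2010-01-15T10:00:00Z</responseDate>
  <request>http://repository.example.org/oai</request>
  <error code="badVerb">Illegal OAI verb</error>
</OAI-PMH>"""


def read_response(response_xml):
    """Return the envelope fields, any errors, and the name of the payload element."""
    root = ET.fromstring(response_xml.encode("utf-8"))
    if root.tag != OAI + "OAI-PMH":
        raise ValueError("root element is not OAI-PMH")
    envelope = {OAI + "responseDate", OAI + "request", OAI + "error"}
    payload = [child.tag[len(OAI):] for child in root if child.tag not in envelope]
    return {
        "responseDate": root.findtext(OAI + "responseDate"),
        "request": root.findtext(OAI + "request"),
        "errors": [(item.get("code"), item.text) for item in root.findall(OAI + "error")],
        "payload": payload[0] if payload else None,
    }


if __name__ == "__main__":
    print(read_response(SAMPLE_ERROR))
```

A harvester would inspect the errors first and only then process the payload element, whose name matches the verb of the request.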
(OAI definition quoted from FAQ on OAI Web site)
OAI Protocol for Metadata Harvesting (OAI-PMH)
OAI-PMH is a lightweight harvesting protocol for sharing metadata between services
Protocol
A protocol is a set of rules defining communication between systems FTP (File Transfer
Protocol) and HTTP (Hypertext Transport Protocol) are examples of other protocols used for
communication between systems across the Internet
Harvesting
In the OAI context harvesting refers specifically to the gathering together of metadata from a
number of distributed repositories into a combined data store
32 Prerequisites to develop metadata harvesting protocol
To facilitate metadata harvesting there needs to be agreement on
o Transport protocol - HTTP or FTP or other such protocol
o Metadata format - Dublin Core or MARC or other such format
o Metadata Quality Assurance - mandatory element set naming and subject conventions etc
o Intellectual Property and Usage Rights - who can do what with what
33 OAI Key players
There are two groups of participants Data Providers and Service Providers
Data Providers
(open archives repositories) provide free access to metadata and may but do not necessarily
offer free access to full texts or other resources OAI-PMH provides an easy to implement low
barrier solution for Data Providers
Service Providers
use the OAI interfaces of the Data Providers to harvest and store metadata Note that this means
that there are no live search requests to the Data Providers rather services are based on the
harvested data via OAI-PMH Service Providers may select certain subsets from Data Providers
(eg by set hierarchy or date stamp) Service Providers offer (value-added) services on the basis
of the metadata harvested and they may enrich the harvested metadata in order to do so
34 How it works
Prerequisites to develop metadata harvesting protocol
To facilitate metadata harvesting there needs to be agreement on
o Transport protocol - HTTP or FTP or other such protocol
o Metadata format - Dublin Core or MARC or other such format
o Metadata Quality Assurance - mandatory element set naming and subject conventions etc
o Intellectual Property and Usage Rights - who can do what with what
The OAI-PMH gives a simple technical option for data providers to make their metadata
available to services based on the open standards HTTP (Hypertext Transport Protocol) and
XML (Extensible Markup Language) The metadata that is harvested may be in any format that
is agreed by a community (or by any discrete set of data and service providers) although
unqualified Dublin Core is specified to provide a basic level of interoperability Thus metadata
from many sources can be gathered together in one database and services can be provided based
on this centrally harvested or aggregated data The link between this metadata and the related
content is not defined by the OAI protocol It is important to realize that OAI-PMH does not
provide a search across this data it simply makes it possible to bring the data together in one
place In order to provide services the harvesting approach must be combined with other
mechanisms
35 Protocol details
Records
A record is the metadata of a resource in a specific format A record has three parts a header and
metadata both of which are mandatory and an optional about statement Each of these is made
up of various components as set out below
header (mandatory)
identifier (mandatory 1 only)
datestamp (mandatory 1 only)
setSpec elements (optional 0 1 or more)
status attribute for deleted item
metadata (mandatory)
XML encoded metadata with root tag namespace
repositories must support Dublin Core may support other formats
about (optional)
rights statements
provenance statements
Datestamps
A datestamp is the date of last modification of a metadata record Datestamp is a mandatory
characteristic of every item It has two possible levels of granularity
YYYY-MM-DD or YYYY-MM-DDThhmmssZ
The function of the datestamp is to provide information on metadata that enables selective
harvesting using from and until arguments Its applications are in incremental update
mechanisms It gives either the date of creation last modification or deletion Deletion is
covered with three support levels no persistent transient
Metadata schema
OAI-PMH supports dissemination of multiple metadata formats from a repository The
properties of metadata formats are
ndash id string to specify the format (metadataPrefix)
ndash metadata schema URL (XML schema to test validity)
ndash XML namespace URI (global identifier for metadata format)
Repositories must be able to disseminate unqualified Dublin Core Further arbitrary metadata
formats can be defined and transported via the OAI-PMH Any returned metadata must comply
with an XML namespace specification The Dublin Core Metadata Element Set contains 15
elements All elements are optional and all elements may be repeated
36 The Dublin Core Metadata Element Set
Title Contributor Source
Creator Date Language
Subject Type Relation
Description Format Coverage
Publisher Identifier Rights
Sets
Sets enable a logical partitioning of repositories They are optional archives do not have to
define Sets There are no recommendations for the implementation of Sets Sets are not
necessarily exhaustive of the content of a repository They are not necessarily strictly
hierarchical It is important and necessary to have negotiated agreements within communities
defining useful sets for the communities
function selective harvesting (set parameter)
applications subject gateways dissertation search engine and others
examples
o publication types (thesis article )
o document types (text audio image )
o content sets according to DNB (medicine biology )
37 Request format
Requests must be submitted using the GET or POST methods of HTTP and repositories must
support both methods At least one key=value pair verb=RequestType (where RequestType is
some type of request such as ListRecords) must be provided Additional key=value pairs depend
on the request type
example for GET request httparchiveorgoai
verb=ListRecordsampmetadataPrefix=oai_dc
The encoding of special characters must be supported for example (host port separator)
becomes 3A
38 Response
Responses are formatted as HTTP responses The content type must be textxml HTTP-based
status codes as distinguished from OAI-PMH errors such as 302 (redirect) and 503 (service not
available) may be returned Compression codes are optional in OAI-PMH only identity
encoding is mandatory The response format must be well-formed XML with markup as follows
1 XML declaration
(ltxml version=10 encoding=UTF-8 gt)
2 root element named OAI-PMH with three attributes
(xmlns xmlnsxsi xsischemaLocation)
3 three child elements
1 responseDate (UTC datetime)
2 request (the request that generated this response)
3 a) error (in case of an error or exception condition)
b) element with the name of the OAI-PMH request
39 OAI-
PMH
Verbs
Here lsquoverbrsquo
means
request type which the service providerharvester sends to get responses from data providers There is
a standard set of 6 verbs
o Identify
o ListMetadataFormats
o ListSets
o GetRecord
o ListIdentifiers
o ListRecords
Function
Identify Description of repository
ListMetadataFormats Metadata format supported by the repository
ListSets Sets defined by repository
ListIdentifiers Retrieves unique identifiers of the item
ListRecords Used to harvest records from the repository
GetRecords Retrieves individual metadata record from the
repository
A harvester is not required to use all types However a repository must implement all types
There are required and optional arguments depending on request types
40 Dspace OAI compatible Digital Library Software
DSpace is open source software for building and managing Digital repositories Developed jointly by
MIT Libraries and Hewlett-Packard (HP) is freely available to research institutions as an open
source system that can be customized and extended DSpace is a digital institutional repository that
captures stores indexes preserves and redistributes content in digital formats Institutional
Repository is a set of services that a research institution organization university offers to the
members of its community for the management and dissemination of digital
materials created by the institution and its community members Typically DSpace has been
deployed for Institutional Repositories of publications thesis and dissertations There are several
groups working on extending its capabilities such implementation of ontologies in search interface
and for submission module customization for management of electronic theses and dissertations and
for localization and international of the package for the world languages
Dspace is compliant with OAI-PMH ver 20 and metadata in Dspace digital libraries can be
harvested
41 DSpace Search System
The end user can browse search and access the collections using the hierarchies and also the
alphabetic bar menu For searching the collection Dspace uses Lucene Search Engine which is a
part of Apache Jakarta Project (1) Additionally research projects such as the hellip(Portugal)hellip
provides Ontologies that enables context based querying This work like subject based directory
structures
The Lucene search engine has powerful search features that cover many of the search approaches of
the end user. It provides the basic 'exact term' or keyword search. In addition, it allows fielded
search, akin to the field-level search of library databases. In DSpace, Dublin Core elements are used
for the field names. Lucene also supports Boolean searches, range searches, term boosting and
proximity searches. An interesting search facility is Lucene's fuzzy search, which is based on the
Levenshtein algorithm (5) and can match terms by similarity. This feature is especially useful when
we hear a term and guess its spelling, and more so in the case of personal names.
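As an illustration of the similarity matching described above, the following is a minimal sketch of
the Levenshtein edit distance in Python. It is not Lucene's own implementation; the function names
and the sample personal names are assumptions chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    # previous[j] is the distance between the prefix of a processed so far
    # (before the current character) and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning i characters of a into an empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def closest(term, candidates):
    """Return the candidate with the smallest edit distance to term."""
    return min(candidates, key=lambda candidate: levenshtein(term, candidate))
```

A fuzzy search of this kind would, for example, let a user who typed a guessed spelling of a
personal name still reach the indexed form, provided the two spellings are only a few edits apart.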
4.2 Metadata in DSpace
DSpace users come across metadata in the following modules:
o Administration modules: Dublin Core registry, administrative metadata (default values), mail
alerts to subscribers
o Submission modules: descriptive metadata
o Harvesting: OAI-PMH using the DC elements (unqualified)
o Search result display: brief and full metadata
4.3 Metadata harvesting in DSpace
DSpace is compliant with the OAI-PMH for exposing metadata. OAI-PMH allows repositories to
expose a hierarchy of sets in which records may be placed. DSpace exposes collections as sets.
Each collection has a corresponding OAI set, and harvesters use the verb (OAI command) ListSets to
discover the sets. Only the 15 basic Dublin Core elements are exposed at present.
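The discovery of sets through ListSets can be sketched as follows. This is a minimal example that
parses a hand-written ListSets response with Python's standard library; the repository URL, the
setSpec values and the set names in the sample are hypothetical and do not come from any real
DSpace installation.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical response; a real harvester would fetch this over HTTP.
SAMPLE_LIST_SETS = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2010-01-15T10:00:00Z</responseDate>
  <request verb="ListSets">http://repository.example.org/oai/request</request>
  <ListSets>
    <set><setSpec>hdl_123456789_1</setSpec><setName>Theses</setName></set>
    <set><setSpec>hdl_123456789_2</setSpec><setName>Articles</setName></set>
  </ListSets>
</OAI-PMH>"""


def parse_list_sets(xml_bytes):
    """Return a dict mapping each setSpec to its setName."""
    root = ET.fromstring(xml_bytes)
    sets = {}
    for node in root.findall("oai:ListSets/oai:set", OAI_NS):
        spec = node.findtext("oai:setSpec", namespaces=OAI_NS)
        name = node.findtext("oai:setName", namespaces=OAI_NS)
        sets[spec] = name
    return sets
```

A harvester could then pass one of the discovered setSpec values as the set argument of a later
ListRecords request to harvest a single collection.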
5.0 OAI Harvester Software
o Arc (http://arc.cs.odu.edu)
o Citebase (http://citebase.eprints.org/cgi-bin/search)
o CYCLADES (http://www.ercim.org/cyclades)
o DP9 (http://arc.cs.odu.edu:8080/dp9/index.jsp)
o MeIND (http://www.meind.de)
o METALIS (http://metalis.cilea.it)
o myOAI (http://www.myoai.com)
o NCSTRL (http://www.ncstrl.org)
o Perseus (http://www.perseus.tufts.edu/cgi-bin/vor)
o Public Knowledge Project - Open Archives Harvester (http://pkp.ubc.ca/harvester)
o OAICat (http://www.oclc.org/research/software/oai/cat.htm)
o OAI Repository Explorer (http://re.cs.uct.ac.za)
o OAIster (http://oaister.umdl.umich.edu/o/oaister)
o OASIC (Open Archives en SIC) (http://oasic.ccsd.cnrs.fr)
o OAIHarvester (http://www.oclc.org/research/software/oai/harvester.htm)
o DLESE OAI Software (http://dlese.org/oai/index.jsp)
6.0 Future Prospects
Some more work has to be done to make OAI-PMH a complete, globally accepted
metadata harvesting protocol:
o Tools and software have to be developed by which repositories that are not OAI-PMH compliant
can be made compliant, so that such a repository can act as a data provider.
o Higher versions of the protocol should be made compatible with the lower ones.
At the metadata creation level, some standardization is required, as a particular resource is
described inconsistently at different repositories. Vocabulary control measures should also be taken.
Some more improvements are awaited in the OAI-PMH protocol; only then can we ensure
a comprehensive view of the resources available on a particular subject to our end users.
7.0 Conclusion
Much promise is seen for the use of the protocol within an open archives approach. Support for a
new pattern of scholarly communication is the most publicized potential benefit. Perhaps most
readily achievable are the goals of surfacing hidden resources and low-cost interoperability.
Although the OAI-PMH is technically very simple, building coherent services that meet user
requirements remains complex. The OAI-PMH protocol could become part of the infrastructure
of the Web, as taken for granted as the HTTP protocol now is, if a combination of its relative
simplicity and proven success by early implementers in a service context leads to widespread
uptake by research organizations, publishers and archives.
REFERENCES
1. http://www.openarchives.org
2. Breeding, M. (2002, April). The Emergence of the Open Archives Initiative: This Protocol could become a key part of the digital library infrastructure. Information Today, from http://www.findarticles.com/cf_0/m3336/4_19/85251474/p1/article.jhtml
3. Breeding, M. (2002). Understanding the Protocol for Metadata Harvesting of the Open Archives Initiative. Computers in Libraries, 22(8).
4. Lagoze, C., & Sompel, H. V. d. (2001, January). The Open Archives Initiative Protocol for Metadata Harvesting, from http://www.openarchives.org/OAI/openarchivesprotocol.htm
5. Lynch, C. A. (2001, August). Metadata Harvesting and the Open Archives Initiative. ARL Bimonthly Report 217, from http://www.arl.org/newsltr/217/mhp.html
6. Shearer, K. (2002, March). The Open Archives Initiative: Developing an Interoperability Framework for Scholarly Publishing. CARL/ABRC Background Series No. 5, from http://www.carl-abrc.ca/projects/scholarly/open_archives.PDF
7. Suleman, H., & Fox, E. A. (2001, December). A Framework for Building Open Digital Libraries. D-Lib Magazine, 7(12), from http://www.dlib.org/dlib/december01/suleman/12suleman.html
8. Sompel, H. V. d., & Lagoze, C. (2000, February). The Santa Fe Convention of the Open Archives Initiative. D-Lib Magazine, 6(2), from http://www.dlib.org/dlib/february00/vandesompel-oai/02vandesompel-oai.html
9. Warner, S. (2001, June). Exposing and Harvesting Metadata Using the OAI Metadata Harvesting Protocol: A Tutorial. HEP Libraries Webzine, Issue 4, from http://library.cern.ch/HEPLW/4/papers/3
11. http://www.ukoln.ac.uk/repositories/digirep/index/FAQs
12. Michael Shepherd (2003). Interoperability for Digital Libraries. DRTC Workshop on Semantic Web, 8th - 10th December 2003, DRTC, Bangalore.
13. http://www.openarchives.org/Register/BrowseSites
14. http://www.openarchives.org/service/listproviders.html
Data Providers
(open archives repositories) provide free access to metadata and may, but do not necessarily,
offer free access to full texts or other resources. OAI-PMH provides an easy-to-implement,
low-barrier solution for Data Providers.
Service Providers
use the OAI interfaces of the Data Providers to harvest and store metadata. Note that this means
that there are no live search requests to the Data Providers; rather, services are based on the
data harvested via OAI-PMH. Service Providers may select certain subsets from Data Providers
(e.g. by set hierarchy or date stamp). Service Providers offer (value-added) services on the basis
of the metadata harvested, and they may enrich the harvested metadata in order to do so.
3.4 How it works
Prerequisites for developing a metadata harvesting protocol
To facilitate metadata harvesting, there needs to be agreement on:
o Transport protocol - HTTP or FTP or other such protocol
o Metadata format - Dublin Core or MARC or other such format
o Metadata Quality Assurance - mandatory element set, naming and subject conventions, etc.
o Intellectual Property and Usage Rights - who can do what with what
The OAI-PMH gives a simple technical option for data providers to make their metadata
available to services, based on the open standards HTTP (Hypertext Transfer Protocol) and
XML (Extensible Markup Language). The metadata that is harvested may be in any format that
is agreed by a community (or by any discrete set of data and service providers), although
unqualified Dublin Core is specified to provide a basic level of interoperability. Thus, metadata
from many sources can be gathered together in one database, and services can be provided based
on this centrally harvested, or aggregated, data. The link between this metadata and the related
content is not defined by the OAI protocol. It is important to realize that OAI-PMH does not
provide a search across this data; it simply makes it possible to bring the data together in one
place. In order to provide services, the harvesting approach must be combined with other
mechanisms.
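The idea of gathering metadata from many sources into one database, and then layering a search
service on top of it, can be sketched as follows. This is a minimal illustration using Python's
built-in sqlite3 module; the table layout, the function names and the sample records are
assumptions, not something defined by OAI-PMH.

```python
import sqlite3


def build_index(harvested):
    """Store harvested records in one in-memory database.

    harvested maps a data provider name to a list of (identifier, title) pairs.
    """
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE records ("
        "provider TEXT, identifier TEXT PRIMARY KEY, title TEXT)"
    )
    for provider, records in harvested.items():
        db.executemany(
            "INSERT OR REPLACE INTO records VALUES (?, ?, ?)",
            [(provider, identifier, title) for identifier, title in records],
        )
    db.commit()
    return db


def search_titles(db, word):
    """The search mechanism lives in the service, not in the protocol."""
    rows = db.execute(
        "SELECT provider, identifier FROM records "
        "WHERE title LIKE ? ORDER BY identifier",
        (f"%{word}%",),
    )
    return rows.fetchall()
```

The search here runs only against the local copy of the metadata, which mirrors the point made
above: no live search requests are sent to the data providers.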
3.5 Protocol details
Records
A record is the metadata of a resource in a specific format. A record has three parts: a header and
metadata, both of which are mandatory, and an optional about statement. Each of these is made
up of various components, as set out below:
header (mandatory)
    identifier (mandatory, 1 only)
    datestamp (mandatory, 1 only)
    setSpec elements (optional, 0, 1 or more)
    status attribute for deleted item
metadata (mandatory)
    XML-encoded metadata with root tag, namespace
    repositories must support Dublin Core, may support other formats
about (optional)
    rights statements
    provenance statements
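The three-part structure set out above can be sketched as a small parser. This is a minimal
example, assuming a hand-written record in the OAI-PMH 2.0 namespace; the identifier, set name
and Dublin Core values in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical record with a header and metadata part, and no about part.
SAMPLE_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:repository.example.org:123</identifier>
    <datestamp>2010-01-15</datestamp>
    <setSpec>theses</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>A sample thesis</dc:title>
      <dc:creator>Author, A.</dc:creator>
    </oai_dc:dc>
  </metadata>
</record>"""


def parse_record(xml_text):
    """Summarize the header, metadata and about parts of one record."""
    record = ET.fromstring(xml_text)
    header = record.find("oai:header", NS)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        # setSpec may occur zero, one or more times.
        "setSpecs": [s.text for s in header.findall("oai:setSpec", NS)],
        # The status attribute on the header marks a deleted item.
        "deleted": header.get("status") == "deleted",
        "has_metadata": record.find("oai:metadata", NS) is not None,
        "has_about": record.find("oai:about", NS) is not None,
    }
```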
Datestamps
A datestamp is the date of last modification of a metadata record. The datestamp is a mandatory
characteristic of every item. It has two possible levels of granularity:
YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ
The function of the datestamp is to provide information on metadata that enables selective
harvesting using the from and until arguments. Its applications are in incremental update
mechanisms. It gives either the date of creation, last modification or deletion. Deletion is
covered with three support levels: no, persistent, transient.
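The two granularities and their use in incremental updates can be sketched as follows. This is a
minimal example; the function names and the idea of storing the time of the last harvest are
assumptions about how a harvester might be organized.

```python
from datetime import datetime, timezone

DAY = "%Y-%m-%d"                # YYYY-MM-DD
SECONDS = "%Y-%m-%dT%H:%M:%SZ"  # YYYY-MM-DDThh:mm:ssZ


def parse_datestamp(value):
    """Parse a datestamp in either granularity into a UTC datetime."""
    for fmt in (DAY, SECONDS):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"not an OAI-PMH datestamp: {value!r}")


def incremental_arguments(last_harvest, now, fine_grained=False):
    """Build the from and until arguments for a selective harvest.

    fine_grained should only be used if the repository supports
    the seconds granularity.
    """
    fmt = SECONDS if fine_grained else DAY
    return {"from": last_harvest.strftime(fmt), "until": now.strftime(fmt)}
```

A harvester that remembers when it last ran can use these two arguments to request only the
records created, modified or deleted since then, instead of harvesting the whole repository again.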
Metadata schema
OAI-PMH supports dissemination of multiple metadata formats from a repository. The
properties of metadata formats are:
- id string to specify the format (metadataPrefix)
- metadata schema URL (XML schema to test validity)
- XML namespace URI (global identifier for metadata format)
Repositories must be able to disseminate unqualified Dublin Core. Further arbitrary metadata
formats can be defined and transported via the OAI-PMH. Any returned metadata must comply
with an XML namespace specification. The Dublin Core Metadata Element Set contains 15
elements. All elements are optional, and all elements may be repeated.
3.6 The Dublin Core Metadata Element Set
Title        Contributor  Source
Creator      Date         Language
Subject      Type         Relation
Description  Format       Coverage
Publisher    Identifier   Rights
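Because every one of the 15 elements is optional and repeatable, a Dublin Core description maps
naturally onto a structure that holds a list of values per element. The following is a minimal
sketch; the class name and its methods are assumptions made for illustration.

```python
from collections import defaultdict

# The 15 elements of the Dublin Core Metadata Element Set, in lower case.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


class DublinCoreRecord:
    """Holds zero or more values for each Dublin Core element."""

    def __init__(self):
        self._values = defaultdict(list)

    def add(self, element, value):
        """Add one value; calling this again repeats the element."""
        if element not in DC_ELEMENTS:
            raise KeyError(f"not a Dublin Core element: {element}")
        self._values[element].append(value)

    def get(self, element):
        """Return all values of an element; empty if it was never set."""
        return list(self._values.get(element, []))
```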
Sets
Sets enable a logical partitioning of repositories. They are optional: archives do not have to
define Sets. There are no recommendations for the implementation of Sets. Sets are not
necessarily exhaustive of the content of a repository. They are not necessarily strictly
hierarchical. It is important and necessary to have negotiated agreements within communities
defining useful sets for the communities.
function: selective harvesting (set parameter)
applications: subject gateways, dissertation search engine and others
examples:
o publication types (thesis, article, ...)
o document types (text, audio, image, ...)
o content sets according to DNB (medicine, biology, ...)
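Selective harvesting by set can be sketched as a simple membership test. In OAI-PMH 2.0 a setSpec
may be a colon-separated path through a set hierarchy, and an item in a sub-set also counts as a
member of the parent set. The sample identifiers and set names below are hypothetical.

```python
def in_set(record_set_specs, requested):
    """True if any setSpec equals the requested set or lies below it."""
    return any(
        spec == requested or spec.startswith(requested + ":")
        for spec in record_set_specs
    )


def select(records, requested):
    """Return identifiers of records that belong to the requested set.

    records is a list of (identifier, list_of_setSpecs) pairs.
    """
    return [
        identifier
        for identifier, specs in records
        if in_set(specs, requested)
    ]
```

A record may carry several setSpec values, which is one way in which sets need not form a strict
hierarchy.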
3.7 Request format
Requests must be submitted using the GET or POST methods of HTTP, and repositories must
support both methods. At least one key=value pair, verb=RequestType (where RequestType is
some type of request such as ListRecords), must be provided. Additional key=value pairs depend
on the request type.
Example for a GET request:
http://archive.org/oai?verb=ListRecords&metadataPrefix=oai_dc
The encoding of special characters must be supported; for example, ":" (host port separator)
becomes "%3A".
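Building such a request, including the encoding of special characters, can be sketched with
Python's standard library. This is a minimal example; the function names are assumptions, and the
record identifier in the usage note is only an illustration.

```python
from urllib.parse import urlencode


def build_get_url(base_url, verb, arguments=None):
    """Return the URL of a GET request for the given verb and arguments."""
    # urlencode percent-encodes special characters, e.g. ":" becomes "%3A".
    query = urlencode({"verb": verb, **(arguments or {})})
    return f"{base_url}?{query}"


def build_post_body(verb, arguments=None):
    """Return the body of the equivalent POST request, to be sent with
    the content type application/x-www-form-urlencoded."""
    return urlencode({"verb": verb, **(arguments or {})}).encode("ascii")
```

For example, build_get_url("http://archive.org/oai", "ListRecords", {"metadataPrefix": "oai_dc"})
reproduces the GET request shown above, and an identifier such as oai:example.org:1 would be sent
as oai%3Aexample.org%3A1.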
3.8 Response
Responses are formatted as HTTP responses. The content type must be text/xml. HTTP-based
status codes, as distinguished from OAI-PMH errors, such as 302 (redirect) and 503 (service not
available), may be returned. Compression encodings are optional in OAI-PMH; only identity
encoding is mandatory. The response format must be well-formed XML with markup as follows:
1. XML declaration
(<?xml version="1.0" encoding="UTF-8" ?>)
2. root element named OAI-PMH with three attributes
(xmlns, xmlns:xsi, xsi:schemaLocation)
3. three child elements
    1. responseDate (UTC datetime)
    2. request (the request that generated this response)
    3. a) error (in case of an error or exception condition)
       b) element with the name of the OAI-PMH request
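Reading a response with this layout can be sketched as follows. This is a minimal example that
checks the root element, reads responseDate, and then either raises on error elements or returns
the element named after the request; the exception class and function name are assumptions.

```python
import xml.etree.ElementTree as ET

# ElementTree writes namespaced tags as "{namespace}localname".
OAI = "{http://www.openarchives.org/OAI/2.0/}"


class OAIError(Exception):
    """Raised when the response carries one or more error elements."""


def read_response(xml_bytes):
    """Return (responseDate, payload element) or raise OAIError."""
    root = ET.fromstring(xml_bytes)
    if root.tag != OAI + "OAI-PMH":
        raise ValueError("root element is not OAI-PMH")
    response_date = root.findtext(OAI + "responseDate")
    errors = root.findall(OAI + "error")
    if errors:
        raise OAIError("; ".join(f"{e.get('code')}: {e.text}" for e in errors))
    # Without errors, the third child is named after the requested verb.
    request = root.find(OAI + "request")
    verb = request.get("verb") if request is not None else None
    payload = root.find(OAI + verb) if verb else None
    return response_date, payload
```

HTTP-level failures such as 503 would be handled before this function is called, since they are
distinct from the OAI-PMH errors carried inside the XML.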
3.9 OAI-PMH Verbs
Here, 'verb' means the request type which the service provider/harvester sends to get responses
from data providers. There is a standard set of 6 verbs:
o Identify
o ListMetadataFormats
o ListSets
o GetRecord
o ListIdentifiers
o ListRecords
Verb                 Function
Identify             Description of the repository
ListMetadataFormats  Metadata formats supported by the repository
ListSets             Sets defined by the repository
ListIdentifiers      Retrieves the unique identifiers of the items
ListRecords          Used to harvest records from the repository
GetRecord            Retrieves an individual metadata record from the repository
A harvester is not required to use all types. However, a repository must implement all types.
There are required and optional arguments, depending on the request type.
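The required and optional arguments per verb can be sketched as a lookup table with a small
checker. The table below follows OAI-PMH 2.0 but leaves out the resumptionToken argument, which
the protocol defines for continuing long lists, for brevity; the function name and the wording of
the messages are assumptions.

```python
# verb -> (required arguments, optional arguments)
VERB_ARGUMENTS = {
    "Identify": (set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}),
    "ListSets": (set(), set()),
    "GetRecord": ({"identifier", "metadataPrefix"}, set()),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}),
}


def check_request(verb, arguments):
    """Return a list of problems; an empty list means the request looks valid."""
    if verb not in VERB_ARGUMENTS:
        return [f"unknown verb: {verb}"]
    required, optional = VERB_ARGUMENTS[verb]
    given = set(arguments)
    problems = [f"missing argument: {name}" for name in sorted(required - given)]
    problems += [
        f"unexpected argument: {name}"
        for name in sorted(given - required - optional)
    ]
    return problems
```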
metadata (mandatory)
XML encoded metadata with root tag namespace
repositories must support Dublin Core may support other formats
about (optional)
rights statements
provenance statements
Datestamps
A datestamp is the date of last modification of a metadata record Datestamp is a mandatory
characteristic of every item It has two possible levels of granularity
YYYY-MM-DD or YYYY-MM-DDThhmmssZ
The function of the datestamp is to provide information on metadata that enables selective
harvesting using from and until arguments Its applications are in incremental update
mechanisms It gives either the date of creation last modification or deletion Deletion is
covered with three support levels no persistent transient
Metadata schema
OAI-PMH supports dissemination of multiple metadata formats from a repository. The
properties of metadata formats are:
– id string to specify the format (metadataPrefix)
– metadata schema URL (XML schema to test validity)
– XML namespace URI (global identifier for metadata format)
Repositories must be able to disseminate unqualified Dublin Core. Further arbitrary metadata
formats can be defined and transported via the OAI-PMH. Any returned metadata must comply
with an XML namespace specification. The Dublin Core Metadata Element Set contains 15
elements. All elements are optional, and all elements may be repeated.
3.6 The Dublin Core Metadata Element Set
Title         Contributor   Source
Creator       Date          Language
Subject       Type          Relation
Description   Format        Coverage
Publisher     Identifier    Rights
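Because every Dublin Core element is optional and repeatable, a harvester cannot assume that a given element is present or that it occurs only once. The sketch below, using only the Python standard library, reads an unqualified Dublin Core (oai_dc) fragment into a dictionary of lists. The sample record and the function name dc_fields are invented for illustration; the two namespace URIs are the ones used by OAI-PMH 2.0 and the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

# Hypothetical record: one title, two creators, every other element absent.
SAMPLE = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>A hypothetical thesis</dc:title>
  <dc:creator>Author, First</dc:creator>
  <dc:creator>Author, Second</dc:creator>
</oai_dc:dc>"""


def dc_fields(xml_text):
    """Map each Dublin Core element that occurs to the list of its values."""
    root = ET.fromstring(xml_text)
    fields = {}
    for name in DC_ELEMENTS:
        values = [el.text for el in root.findall("{%s}%s" % (DC_NS, name))]
        if values:  # optional elements are simply left out
            fields[name] = values
    return fields


if __name__ == "__main__":
    print(dc_fields(SAMPLE))
```

Keeping every value in a list, even for elements that usually occur once, avoids special cases when a repository repeats an element.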
Sets
Sets enable a logical partitioning of repositories. They are optional: archives do not have to
define Sets. There are no recommendations for the implementation of Sets. Sets are not
necessarily exhaustive of the content of a repository. They are not necessarily strictly
hierarchical. It is important and necessary to have negotiated agreements within communities
defining useful sets for the communities.
function: selective harvesting (set parameter)
applications: subject gateways, dissertation search engine, and others
examples:
o publication types (thesis, article, ...)
o document types (text, audio, image, ...)
o content sets according to DNB (medicine, biology, ...)
3.7 Request format
Requests must be submitted using the GET or POST methods of HTTP, and repositories must
support both methods. At least one key=value pair, verb=RequestType (where RequestType is
some type of request such as ListRecords), must be provided. Additional key=value pairs depend
on the request type.
example for GET request:
http://archive.org/oai?verb=ListRecords&metadataPrefix=oai_dc
The encoding of special characters must be supported; for example, ":" (host port separator)
becomes "%3A".
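The request format described above can be sketched in a few lines of Python. The helper build_request is a hypothetical name, and the record identifier in the usage example is only an illustrative value; urlencode from the standard library takes care of the special-character encoding, turning ":" into "%3A" and "/" into "%2F".

```python
from urllib.parse import urlencode


def build_request(base_url, verb, args=None):
    """Return the GET URL for an OAI-PMH request (hypothetical helper).

    The verb pair always comes first; the remaining key=value pairs
    depend on the request type and are passed in as a dictionary.
    """
    params = {"verb": verb}
    params.update(args or {})
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    base = "http://archive.org/oai"
    print(build_request(base, "ListRecords", {"metadataPrefix": "oai_dc"}))
    print(build_request(base, "GetRecord", {
        "identifier": "oai:arXiv.org:hep-th/9901001",  # illustrative identifier
        "metadataPrefix": "oai_dc",
    }))
```

For a POST request the same encoded key=value string would be sent in the request body instead of being appended to the base URL.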
3.8 Response
Responses are formatted as HTTP responses. The content type must be text/xml. HTTP-based
status codes, as distinguished from OAI-PMH errors, such as 302 (redirect) and 503 (service not
available), may be returned. Compression encodings are optional in OAI-PMH; only identity
encoding is mandatory. The response format must be well-formed XML with markup as follows:
1. XML declaration
(<?xml version="1.0" encoding="UTF-8" ?>)
2. root element named OAI-PMH with three attributes
(xmlns, xmlns:xsi, xsi:schemaLocation)
3. three child elements
   1. responseDate (UTC datetime)
   2. request (the request that generated this response)
   3. a) error (in case of an error or exception condition)
      b) element with the name of the OAI-PMH request
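The response layout listed above can be checked with a short Python sketch. The sample response below is hand-written for illustration (the repository URL and dates are assumptions), and summarize is a hypothetical helper; the namespace URI and the noSetHierarchy error code come from OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"

# Hand-written example of an error response (values are illustrative).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
         http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2010-03-01T12:00:00Z</responseDate>
  <request verb="ListSets">http://archive.org/oai</request>
  <error code="noSetHierarchy">This repository does not support sets</error>
</OAI-PMH>"""


def summarize(xml_text):
    """Split a response into its date, its errors, and its payload element."""
    root = ET.fromstring(xml_text)
    if root.tag != "{%s}OAI-PMH" % OAI_NS:
        raise ValueError("root element is not OAI-PMH")
    # Strip the namespace from each child tag: '{ns}name' -> 'name'.
    names = [child.tag.split("}", 1)[1] for child in root]
    errors = [(e.get("code"), e.text)
              for e in root.findall("{%s}error" % OAI_NS)]
    payload = [n for n in names if n not in ("responseDate", "request", "error")]
    return {
        "responseDate": root.findtext("{%s}responseDate" % OAI_NS),
        "errors": errors,
        "payload": payload[0] if payload else None,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

A successful response would carry no error element, and the payload entry would hold the name of the request, for example ListSets.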
3.9 OAI-PMH Verbs
Here 'verb' means the request type which the service provider/harvester sends to get responses
from data providers. There is a standard set of 6 verbs:
o Identify
o ListMetadataFormats
o ListSets
o GetRecord
o ListIdentifiers
o ListRecords
Verb                  Function
Identify              Description of the repository
ListMetadataFormats   Metadata formats supported by the repository
ListSets              Sets defined by the repository
ListIdentifiers       Retrieves the unique identifiers of the items
ListRecords           Used to harvest records from the repository
GetRecord             Retrieves an individual metadata record from the repository
A harvester is not required to use all types. However, a repository must implement all types.
There are required and optional arguments depending on request types.
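The statement that each request type has required and optional arguments can be made concrete with a small lookup table. The table below follows the argument lists of OAI-PMH 2.0 rather than anything stated in this paper, and check_request is a hypothetical helper that returns the protocol error code a repository would be expected to report.

```python
# verb -> (required arguments, optional arguments), per OAI-PMH 2.0.
VERB_ARGS = {
    "Identify": (set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}),
    "ListSets": (set(), set()),
    "GetRecord": ({"identifier", "metadataPrefix"}, set()),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}),
}

# List requests may instead carry a resumptionToken as their only argument.
LIST_VERBS = {"ListSets", "ListIdentifiers", "ListRecords"}


def check_request(verb, args):
    """Return None if the request is acceptable, else an OAI-PMH error code."""
    if verb not in VERB_ARGS:
        return "badVerb"
    names = set(args)
    if verb in LIST_VERBS and names == {"resumptionToken"}:
        return None
    required, optional = VERB_ARGS[verb]
    missing = required - names
    unknown = names - required - optional
    if missing or unknown:
        return "badArgument"
    return None


if __name__ == "__main__":
    print(check_request("GetRecord", {"identifier": "x"}))
    print(check_request("ListRecords", {"metadataPrefix": "oai_dc"}))
```

A harvester can run the same check before sending a request, which avoids a round trip that would only return a badArgument error.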
4.0 DSpace: OAI-Compatible Digital Library Software
DSpace is open source software for building and managing digital repositories. Developed jointly
by MIT Libraries and Hewlett-Packard (HP), it is freely available to research institutions as an
open source system that can be customized and extended. DSpace is a digital institutional
repository that captures, stores, indexes, preserves, and redistributes content in digital formats.
An Institutional Repository is a set of services that a research institution, organization, or
university offers to the members of its community for the management and dissemination of digital
materials created by the institution and its community members. Typically, DSpace has been
deployed for Institutional Repositories of publications, theses, and dissertations. There are
several groups working on extending its capabilities, such as the implementation of ontologies in
the search interface and submission module, customization for the management of electronic theses
and dissertations, and localization and internationalization of the package for the world's
languages.
DSpace is compliant with OAI-PMH ver. 2.0, and metadata in DSpace digital libraries can be
harvested.
4.1 DSpace Search System
The end user can browse, search, and access the collections using the hierarchies and also the
alphabetic bar menu. For searching the collection, DSpace uses the Lucene search engine, which is
a part of the Apache Jakarta Project (1). Additionally, research projects such as the …(Portugal)…
provide ontologies that enable context-based querying. These work like subject-based directory
structures.
The Lucene search engine has very powerful search features that encompass many search approaches
of the end user. It provides the basic 'exact term' or keyword search. In addition, it allows
fielded search, akin to the field-level search of library databases. In DSpace, Dublin Core
elements are used for the field names. Lucene also facilitates Boolean search, range searches,
term boosting, and proximity searches. An interesting search facility is Lucene's fuzzy search,
which is based on the Levenshtein algorithm (5) and can match terms by similarity. This feature is
especially useful in instances where we hear a term and guess its spelling, and more so in the
case of personal names.
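To illustrate the idea behind similarity matching, the following Python sketch computes the Levenshtein (edit) distance between two strings and uses it to find names close to a misspelled query. It is a textbook dynamic-programming version written for this illustration, not Lucene's own implementation, and the helper similar_names and its threshold of two edits are assumptions.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, or substitutions
    needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def similar_names(query, names, max_distance=2):
    """Return the names within max_distance edits of the query (case-insensitive)."""
    q = query.lower()
    return [n for n in names if levenshtein(q, n.lower()) <= max_distance]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(similar_names("Sulaman", ["Suleman", "Lagoze", "Warner"]))
```

A query typed as "Sulaman" is one substitution away from "Suleman", so the intended personal name is still found even though the spelling was guessed.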
4.2 Metadata in DSpace
DSpace users deal with or come across metadata in the following modules:
o Administration modules: Dublin Core registry, administrative metadata - default values, mail
alert to subscribers
o Submission modules: descriptive metadata
o Harvesting: OAI-PMH using the DC elements (unqualified)
o Search result display: brief and full metadata
4.3 Metadata harvesting in DSpace
DSpace is compliant with the OAI-PMH for exposing metadata. OAI-PMH allows repositories to
expose a hierarchy of sets in which records may be placed. DSpace exposes collections as sets.
Each collection has a corresponding OAI set, and harvesters use the verb (OAI command) ListSets
to discover the sets. Only the 15 basic Dublin Core elements are exposed at present.
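Set discovery as described above can be sketched by parsing a ListSets response with the Python standard library. The response below is hand-written; the setSpec and setName values are hypothetical and only stand in for whatever a real DSpace installation would report for its collections.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Hand-written ListSets response; set names and specs are hypothetical.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2010-03-01T12:00:00Z</responseDate>
  <request verb="ListSets">http://repository.example.org/oai/request</request>
  <ListSets>
    <set><setSpec>collection_1</setSpec><setName>Theses</setName></set>
    <set><setSpec>collection_2</setSpec><setName>Articles</setName></set>
  </ListSets>
</OAI-PMH>"""


def list_sets(xml_text):
    """Return (setSpec, setName) pairs, one per set exposed by the repository."""
    root = ET.fromstring(xml_text)
    return [
        (s.findtext(OAI + "setSpec"), s.findtext(OAI + "setName"))
        for s in root.iter(OAI + "set")
    ]


if __name__ == "__main__":
    for spec, name in list_sets(SAMPLE):
        print(spec, "->", name)
```

A harvester can then pass one of the discovered setSpec values as the set argument of ListRecords to harvest a single collection.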
5.0 OAI Harvester Software
o Arc (http://arc.cs.odu.edu)
o Citebase (http://citebase.eprints.org/cgi-bin/search)
o CYCLADES (http://www.ercim.org/cyclades)
o DP9 (http://arc.cs.odu.edu:8080/dp9/index.jsp)
o MeIND (http://www.meind.de)
o METALIS (http://metalis.cilea.it)
o myOAI (http://www.myoai.com)
o NCSTRL (http://www.ncstrl.org)
o Perseus (http://www.perseus.tufts.edu/cgi-bin/vor)
o Public Knowledge Project – Open Archives Harvester (http://pkp.ubc.ca/harvester)
o OAICAT (http://www.oclc.org/research/software/oai/cat.htm)
o OAI Repository Explorer (http://re.cs.uct.ac.za)
o OAIster (http://oaister.umdl.umich.edu/o/oaister)
o OASIC (Open Archives en SIC) (http://oasic.ccsd.cnrs.fr)
o OAIHarvester (http://www.oclc.org/research/software/oai/harvester.htm)
o DLESE OAI Software (http://dlese.org/oai/index.jsp)
6.0 Future Prospects
Some more work has to be done in order to make OAI-PMH a complete, globally accepted
metadata harvesting protocol:
o Tools and software have to be developed by which non-OAI-PMH-compliant repositories
can be made OAI-PMH compliant, so that each repository can act as a data provider.
o Higher versions of the protocol should be made compatible with lower ones.
At the metadata creation level, some standardization is required, as a particular resource is
described inconsistently at different repositories. Vocabulary control measures should also be
taken care of. Some more improvements are still awaited in the OAI-PMH protocol; only then can
we ensure a comprehensive view of the resources available on a particular subject to our end users.
7.0 Conclusion
Much promise is seen for the use of the protocol within an open archives approach. Support for a
new pattern for scholarly communication is the most publicized potential benefit. Perhaps most
readily achievable are the goals of surfacing hidden resources and low-cost interoperability.
Although the OAI-PMH is technically very simple, building coherent services that meet user
requirements remains complex. The OAI-PMH protocol could become part of the infrastructure
of the Web, as taken for granted as the HTTP protocol now is, if a combination of its relative
simplicity and proven success by early implementers in a service context leads to widespread
uptake by research organizations, publishers, and archives.
REFERENCES
1. http://www.openarchives.org
2. Breeding, M. (2002, April). The Emergence of the Open Archives Initiative: This Protocol could become a key part of the digital library infrastructure. Information Today, from http://www.findarticles.com/cf_0/m3336/4_19/85251474/p1/article.jhtml
3. Breeding, M. (2002). Understanding the Protocol for Metadata Harvesting of the Open Archives Initiative. Computers in Libraries, 22(8).
4. Lagoze, C., & Sompel, H. V. d. (2001, January). The Open Archives Initiative Protocol for Metadata Harvesting, from http://www.openarchives.org/OAI/openarchivesprotocol.htm
5. Lynch, C. A. (2001, August). Metadata Harvesting and the Open Archives Initiative. ARL Bimonthly Report 217, from http://www.arl.org/newsltr/217/mhp.html
6. Shearer, K. (2002, March). The Open Archives Initiative: Developing an Interoperability Framework for Scholarly Publishing. CARL/ABRC Background Series No. 5, from http://www.carl-abrc.ca/projects/scholarly/open_archives.PDF
7. Suleman, H., & Fox, E. A. (2001, December). A Framework for Building Open Digital Libraries. D-Lib Magazine, 7(12), from http://www.dlib.org/dlib/december01/suleman/12suleman.html
8. Sompel, H. V. d., & Lagoze, C. (2000, February). The Santa Fe Convention of the Open Archives Initiative. D-Lib Magazine, 6(2), from http://www.dlib.org/dlib/february00/vandesompel-oai/02vandesompel-oai.html
9. Warner, S. (2001, June). Exposing and Harvesting Metadata Using the OAI Metadata Harvesting Protocol: A Tutorial. HEP Libraries Webzine, Issue 4, from http://library.cern.ch/HEPLW/4/papers/3
11. http://www.ukoln.ac.uk/repositories/digirep/index/FAQs
12. Michael Shepherd (2003). Interoperability for Digital Libraries. DRTC Workshop on Semantic Web, 8th – 10th December 2003, DRTC, Bangalore.
13. http://www.openarchives.org/Register/BrowseSites
14. http://www.openarchives.org/service/listproviders.html
39 OAI-
PMH
Verbs
Here lsquoverbrsquo
means
request type which the service providerharvester sends to get responses from data providers There is
a standard set of 6 verbs
o Identify
o ListMetadataFormats
o ListSets
o GetRecord
o ListIdentifiers
o ListRecords
Function
Identify Description of repository
ListMetadataFormats Metadata format supported by the repository
ListSets Sets defined by repository
ListIdentifiers Retrieves unique identifiers of the item
ListRecords Used to harvest records from the repository
GetRecords Retrieves individual metadata record from the
repository
A harvester is not required to use all types However a repository must implement all types
There are required and optional arguments depending on request types
40 Dspace OAI compatible Digital Library Software
DSpace is open source software for building and managing Digital repositories Developed jointly by
MIT Libraries and Hewlett-Packard (HP) is freely available to research institutions as an open
source system that can be customized and extended DSpace is a digital institutional repository that
captures stores indexes preserves and redistributes content in digital formats Institutional
Repository is a set of services that a research institution organization university offers to the
members of its community for the management and dissemination of digital
materials created by the institution and its community members Typically DSpace has been
deployed for Institutional Repositories of publications thesis and dissertations There are several
groups working on extending its capabilities such implementation of ontologies in search interface
and for submission module customization for management of electronic theses and dissertations and
for localization and international of the package for the world languages
Dspace is compliant with OAI-PMH ver 20 and metadata in Dspace digital libraries can be
harvested
41 DSpace Search System
The end user can browse search and access the collections using the hierarchies and also the
alphabetic bar menu For searching the collection Dspace uses Lucene Search Engine which is a
part of Apache Jakarta Project (1) Additionally research projects such as the hellip(Portugal)hellip
provides Ontologies that enables context based querying This work like subject based directory
structures
Lucene search engine has very powerful search features that encompass many search approaches of
the end-user It provides the basic lsquoexact termrsquo or keyword search In addition it allows fielded search
akin the field level search of library databases In Dspace Dublin Core elements are used for the field
names Lucene also facilitates Boolean search range searches term boosting and proximity searches
The interesting search facility lucene uses fuzzy logic that is based on the Levenstienrsquos alogorithm
(5) that can replace and match terms by similarity This feature is especially useful in instances where
we hear a term and guess it spellings and more so in the case of personal names
42 Metadata in Dspace
DSpace users deal withcome across metadata in the following modules
1048707 Administration modules Dublin core registry administrative metadata- default values mail
alert to subscribers
1048707 Submission modules descriptive metadata
1048707 Harvesting ndash OAI-PMH using the DC elements (unqualified)
1048707 Search result display brief and full metadata
4.3 Metadata Harvesting in DSpace
DSpace is compliant with OAI-PMH for exposing metadata. OAI-PMH allows repositories to
expose a hierarchy of sets in which records may be placed. DSpace exposes collections as sets:
each collection has a corresponding OAI set, and harvesters use the ListSets verb (an OAI command)
to discover the sets. Only the 15 basic Dublin Core elements are exposed at present.
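The set discovery step can be sketched as follows. This is a minimal illustration, not a complete harvester: the base URL `http://repository.example.org/oai/request`, the helper names, and the hand-written sample response (including its setSpec values) are hypothetical, and no network request is made. The verb names, the `metadataPrefix` and `set` arguments, and the response namespace come from OAI-PMH 2.0.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical base URL of a repository's OAI-PMH interface.
BASE_URL = "http://repository.example.org/oai/request"


def build_request(base_url, verb, **arguments):
    """Build an OAI-PMH GET request URL for the given verb and arguments."""
    return base_url + "?" + urlencode({"verb": verb, **arguments})


def parse_list_sets(xml_text):
    """Extract (setSpec, setName) pairs from a ListSets response."""
    root = ET.fromstring(xml_text)
    return [
        (s.findtext("oai:setSpec", namespaces=OAI_NS),
         s.findtext("oai:setName", namespaces=OAI_NS))
        for s in root.iterfind(".//oai:set", OAI_NS)
    ]


# Hand-written sample response; a real one also carries responseDate
# and request elements, and possibly a resumptionToken.
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set>
      <setSpec>hdl_123456789_2</setSpec>
      <setName>Theses and Dissertations</setName>
    </set>
    <set>
      <setSpec>hdl_123456789_3</setSpec>
      <setName>Faculty Publications</setName>
    </set>
  </ListSets>
</OAI-PMH>
"""

if __name__ == "__main__":
    print(build_request(BASE_URL, "ListSets"))
    for spec, name in parse_list_sets(SAMPLE_RESPONSE):
        # Each discovered set can then be harvested selectively.
        print(name, "->", build_request(
            BASE_URL, "ListRecords", metadataPrefix="oai_dc", set=spec))
```

Once the sets are known, a harvester can restrict a ListRecords request to a single collection by passing that collection's setSpec as the `set` argument, as the loop above does.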
5.0 OAI Harvester Software
o Arc (http://arc.cs.odu.edu)
o Citebase (http://citebase.eprints.org/cgi-bin/search)
o CYCLADES (http://www.ercim.org/cyclades)
o DP9 (http://arc.cs.odu.edu:8080/dp9/index.jsp)
o MeIND (http://www.meind.de)
o METALIS (http://metalis.cilea.it)
o myOAI (http://www.myoai.com)
o NCSTRL (http://www.ncstrl.org)
o Perseus (http://www.perseus.tufts.edu/cgi-bin/vor)
o Public Knowledge Project – Open Archives Harvester (http://pkp.ubc.ca/harvester)
o OAICAT (http://www.oclc.org/research/software/oai/cat.htm)
o OAI Repository Explorer (http://re.cs.uct.ac.za)
o OAIster (http://oaister.umdl.umich.edu/o/oaister)
o OASIC (Open Archives en SIC) (http://oasic.ccsd.cnrs.fr)
o OAIHarvester (http://www.oclc.org/research/software/oai/harvester.htm)
o DLESE OAI Software (http://dlese.org/oai/index.jsp)
6.0 Future Prospects
Some more work has to be done to make OAI-PMH a complete, globally accepted
metadata harvesting protocol:
o Tools and software have to be developed by which non-OAI-PMH-compliant repositories
can be made OAI-PMH compliant, so that each such repository can act as a data provider.
o Higher versions of the protocol should be made compatible with the lower ones.
At the metadata creation level, some standardization is required, as a particular resource is
described inconsistently at different repositories. Vocabulary control measures should also be taken.
Only when these improvements have been made to OAI-PMH can we ensure
a comprehensive view of the resources available on a particular subject for our end users.
7.0 Conclusion
Much promise is seen for the use of the protocol within an open archives approach. Support for a
new pattern of scholarly communication is the most publicized potential benefit. Perhaps most
readily achievable are the goals of surfacing hidden resources and low-cost interoperability.
Although OAI-PMH is technically very simple, building coherent services that meet user
requirements remains complex. OAI-PMH could become part of the infrastructure
of the Web, as taken for granted as HTTP now is, if a combination of its relative
simplicity and proven success by early implementers in a service context leads to widespread
uptake by research organizations, publishers, and archives.
REFERENCES
1. http://www.openarchives.org
2. Breeding, M. (2002, April). The Emergence of the Open Archives Initiative: This Protocol could become a key part of the digital library infrastructure. Information Today. From http://www.findarticles.com/cf_0/m3336/4_19/85251474/p1/article.jhtml
3. Breeding, M. (2002). Understanding the Protocol for Metadata Harvesting of the Open Archives Initiative. Computers in Libraries, 22(8).
4. Lagoze, C., & Sompel, H. V. d. (2001, January). The Open Archives Initiative Protocol for Metadata Harvesting. From http://www.openarchives.org/OAI/openarchivesprotocol.htm
5. Lynch, C. A. (2001, August). Metadata Harvesting and the Open Archives Initiative. ARL Bimonthly Report, 217. From http://www.arl.org/newsltr/217/mhp.html
6. Shearer, K. (2002, March). The Open Archives Initiative: Developing an Interoperability Framework for Scholarly Publishing. CARL/ABRC Background Series, No. 5. From http://www.carl-abrc.ca/projects/scholarly/open_archives.PDF
7. Suleman, H., & Fox, E. A. (2001, December). A Framework for Building Open Digital Libraries. D-Lib Magazine, 7(12). From http://www.dlib.org/dlib/december01/suleman/12suleman.html
8. Sompel, H. V. d., & Lagoze, C. (2000, February). The Santa Fe Convention of the Open Archives Initiative. D-Lib Magazine, 6(2). From http://www.dlib.org/dlib/february00/vandesompel-oai/02vandesompel-oai.html
9. Warner, S. (2001, June). Exposing and Harvesting Metadata Using the OAI Metadata Harvesting Protocol: A Tutorial. HEP Libraries Webzine, Issue 4. From http://library.cern.ch/HEPLW/4/papers/3
11. http://www.ukoln.ac.uk/repositories/digirep/index/FAQs
12. Michael Shepherd (2003). Interoperability for Digital Libraries. DRTC Workshop on Semantic Web, 8th – 10th December 2003, DRTC, Bangalore.
13. http://www.openarchives.org/Register/BrowseSites
14. http://www.openarchives.org/service/listproviders.html