
Open Archives Initiatives for Metadata Harvesting: A Framework for Building Open Digital Libraries

Term Paper-1

Submitted by

NIKESH.N

International School of Information Management

University of Mysore, 2010


1.0 Introduction

A digital library may be defined as a system that supports the collection, organization, storage, retrieval and dissemination of digital documents. It may be viewed as the intersection of library science, computer science and networked information systems. Open movements are gaining acceptance in the scholarly information arena, and many universities and research centers have started to provide public access to their repositories. With the growing number of digital repositories on the Web, it has become difficult for users to visit each one individually in search of information, and many organizational repositories have not been indexed by the search engines. A mechanism is therefore required by which repositories can share resources and work in coordination to give users a broader view. The ability of information systems to work in coordination is termed interoperability. The Open Archives Initiative is one of the landmark efforts to make the metadata of digital resources held in many repositories available at the users' end.

The essence of the open archives approach is to enable access to Web-accessible material through interoperable repositories for metadata sharing, publishing and archiving.

Such interoperability requirements necessitated the development of standards such as the Dublin Core Metadata Element Set and the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). These standards have achieved a degree of success in the digital library (DL) community, largely because of their generality and simplicity.

2.0 Need for a Harvester Protocol

There is a growing need to make resources, not only descriptive metadata, harvestable in an interoperable manner. There are two major use cases that motivate this need:

Preservation: The need to periodically transfer digital content from a data repository to one or more trusted digital repositories charged with storing and preserving safety copies of the content. The trusted digital repositories need a mechanism to automatically synchronize with the originating data repository.

Discovery: The need to use the content itself in the creation of services. Examples include search engines that make full text from multiple data repositories searchable, and citation indexing systems that extract references from the full-text content. Another scenario is the provision of thumbnail versions of high-quality images from cultural heritage collections to external services that build browsing interfaces that include the thumbnails.

3.0 OAI Protocol for Metadata Harvesting (OAI-PMH)

In October 1999 the Open Archives Initiative (OAI) was launched in an attempt to address interoperability issues among the many existing and independent DLs. The focus was on high-level communication among systems and simplicity of protocols. The OAI has since received much attention in the DL community and, primarily because of the simplicity of its standards, has attracted many early adopters. The OAI-PMH defines a mechanism for harvesting records containing metadata from repositories.

3.1 Definitions of Key Terms

Open Archives Initiative (OAI)

OAI is an initiative to develop and promote interoperability standards that aim to facilitate the efficient dissemination of content.

Archive

"The term 'archive' in the name Open Archives Initiative reflects the origins of the OAI in the e-prints community, where the term archive is generally accepted as a synonym for repository of scholarly papers. Members of the archiving profession have justifiably noted the strict definition of an 'archive' within their domain, with connotations of preservation of long-term value, statutory authorization and institutional policy. The OAI uses the term 'archive' in a broader sense: as a repository for stored information. Language and terms are never unambiguous and uncontroversial, and the OAI respectfully requests the indulgence of the professional archiving community with this broader use of 'archive'."

(OAI definition quoted from the FAQ on the OAI Web site)

OAI Protocol for Metadata Harvesting (OAI-PMH)

OAI-PMH is a lightweight harvesting protocol for sharing metadata between services.

Protocol

A protocol is a set of rules defining communication between systems. FTP (File Transfer Protocol) and HTTP (Hypertext Transfer Protocol) are examples of other protocols used for communication between systems across the Internet.

Harvesting

In the OAI context, harvesting refers specifically to the gathering together of metadata from a number of distributed repositories into a combined data store.

3.2 Prerequisites to develop a metadata harvesting protocol

To facilitate metadata harvesting, there needs to be agreement on:

o Transport protocol - HTTP or FTP or other such protocol
o Metadata format - Dublin Core or MARC or other such format
o Metadata Quality Assurance - mandatory element set, naming and subject conventions, etc.
o Intellectual Property and Usage Rights - who can do what with what

3.3 OAI Key Players

There are two groups of participants: Data Providers and Service Providers.

Data Providers

Data Providers (open archives, repositories) provide free access to metadata and may, but do not necessarily, offer free access to full texts or other resources. OAI-PMH provides an easy-to-implement, low-barrier solution for Data Providers.

Service Providers

Service Providers use the OAI interfaces of the Data Providers to harvest and store metadata. Note that this means that there are no live search requests to the Data Providers; rather, services are based on the data harvested via OAI-PMH. Service Providers may select certain subsets from Data Providers (e.g. by set hierarchy or datestamp). Service Providers offer (value-added) services on the basis of the metadata harvested, and they may enrich the harvested metadata in order to do so.

3.4 How it works

The OAI-PMH gives a simple technical option for data providers to make their metadata available to services, based on the open standards HTTP (Hypertext Transfer Protocol) and XML (Extensible Markup Language). The metadata that is harvested may be in any format that is agreed by a community (or by any discrete set of data and service providers), although unqualified Dublin Core is specified to provide a basic level of interoperability. Thus, metadata from many sources can be gathered together in one database, and services can be provided based on this centrally harvested, or aggregated, data. The link between this metadata and the related content is not defined by the OAI protocol. It is important to realize that OAI-PMH does not provide a search across this data; it simply makes it possible to bring the data together in one place. In order to provide services, the harvesting approach must be combined with other mechanisms.

3.5 Protocol details

Records

A record is the metadata of a resource in a specific format. A record has three parts: a header and metadata, both of which are mandatory, and an optional about statement. Each of these is made up of various components, as set out below.

header (mandatory)
    identifier (mandatory, 1 only)
    datestamp (mandatory, 1 only)
    setSpec elements (optional, 0, 1 or more)
    status attribute for deleted item
metadata (mandatory)
    XML-encoded metadata with root tag, namespace
    repositories must support Dublin Core, may support other formats
about (optional)
    rights statements
    provenance statements
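The record layout above can be illustrated with a small parsing sketch. The sample record, its identifier oai:example.org:123 and the function name parse_record are made up for illustration; the namespace URIs are the ones published for OAI-PMH 2.0, oai_dc and Dublin Core.

```python
import xml.etree.ElementTree as ET

# Namespace URIs published for OAI-PMH 2.0, oai_dc and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical record, invented for this example.
SAMPLE_RECORD = """
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:123</identifier>
    <datestamp>2010-03-15</datestamp>
    <setSpec>theses</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>A Sample Thesis</dc:title>
      <dc:creator>Doe, Jane</dc:creator>
    </oai_dc:dc>
  </metadata>
</record>
"""


def parse_record(xml_text):
    """Split one record into its header fields, titles and 'about' flag."""
    record = ET.fromstring(xml_text.strip())
    header = record.find("oai:header", NS)
    metadata = record.find("oai:metadata", NS)  # absent for deleted records
    titles = []
    if metadata is not None:
        titles = [e.text for e in metadata.findall(".//dc:title", NS)]
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        "sets": [e.text for e in header.findall("oai:setSpec", NS)],
        "deleted": header.get("status") == "deleted",
        "titles": titles,
        "has_about": record.find("oai:about", NS) is not None,
    }


print(parse_record(SAMPLE_RECORD))
```

The header is read separately from the metadata because a harvester can act on the identifier, datestamp and set membership without understanding the metadata format.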

Datestamps

A datestamp is the date of last modification of a metadata record. A datestamp is a mandatory characteristic of every item. It has two possible levels of granularity: YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ.

The function of the datestamp is to provide information on metadata that enables selective harvesting using the from and until arguments. Its applications are in incremental update mechanisms. It gives either the date of creation, last modification or deletion. Deletion is covered with three support levels: no, persistent, transient.
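As a rough sketch of how a harvester might handle the two granularities and build the from argument for an incremental update, consider the following. The function names parse_datestamp and incremental_args are hypothetical and not part of the protocol.

```python
from datetime import datetime, timezone

# The two granularities allowed for OAI-PMH datestamps.
DAY = "%Y-%m-%d"
SECONDS = "%Y-%m-%dT%H:%M:%SZ"


def parse_datestamp(value):
    """Return (UTC datetime, granularity name) for a datestamp string."""
    for fmt, name in ((SECONDS, "seconds"), (DAY, "day")):
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        return parsed.replace(tzinfo=timezone.utc), name
    raise ValueError(f"not an OAI-PMH datestamp: {value!r}")


def incremental_args(last_harvest, granularity="day"):
    """Build the 'from' argument for the next selective harvest.

    last_harvest is assumed to be a timezone-aware datetime.
    """
    fmt = SECONDS if granularity == "seconds" else DAY
    return {"from": last_harvest.astimezone(timezone.utc).strftime(fmt)}


last, granularity = parse_datestamp("2010-03-15T08:30:00Z")
print(granularity, incremental_args(last, granularity))
```

A harvester would normally store the time of its last successful harvest and pass it as from on the next run, so that only records created, modified or deleted since then are returned.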

Metadata schema

OAI-PMH supports dissemination of multiple metadata formats from a repository. The properties of metadata formats are:

o id string to specify the format (metadataPrefix)
o metadata schema URL (XML schema to test validity)
o XML namespace URI (global identifier for the metadata format)

Repositories must be able to disseminate unqualified Dublin Core. Further arbitrary metadata formats can be defined and transported via the OAI-PMH. Any returned metadata must comply with an XML namespace specification. The Dublin Core Metadata Element Set contains 15 elements. All elements are optional, and all elements may be repeated.

3.6 The Dublin Core Metadata Element Set

Title        Contributor  Source
Creator      Date         Language
Subject      Type         Relation
Description  Format       Coverage
Publisher    Identifier   Rights
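The rule that every one of the 15 elements is optional and repeatable can be shown with a short sketch that builds an unqualified Dublin Core (oai_dc) description. The helper name build_oai_dc and the sample values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"


def build_oai_dc(fields):
    """Serialize {element name: [values]} as an oai_dc XML fragment.

    Any element may be left out, and any element may carry several values.
    """
    ET.register_namespace("oai_dc", OAI_DC)
    ET.register_namespace("dc", DC)
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for value in values:
            ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(build_oai_dc({
    "title": ["A Sample Thesis"],
    "creator": ["Doe, Jane", "Roe, Richard"],  # repeated element
}))
```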

Sets

Sets enable a logical partitioning of repositories. They are optional: archives do not have to define Sets. There are no recommendations for the implementation of Sets. Sets are not necessarily exhaustive of the content of a repository, and they are not necessarily strictly hierarchical. It is important and necessary to have negotiated agreements within communities defining useful sets for the communities.

Function: selective harvesting (set parameter)

Applications: subject gateways, dissertation search engines and others

Examples:

o publication types (thesis, article, ...)
o document types (text, audio, image, ...)
o content sets according to DNB (medicine, biology, ...)
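Selective harvesting by set can be pictured with the sketch below. In practice the repository does this filtering when a harvester passes the set argument; the code only mimics that behaviour on made-up data. The identifiers and set names are hypothetical, and the colon-separated hierarchy follows the setSpec syntax of the OAI-PMH 2.0 specification.

```python
# Hypothetical harvested headers: identifier -> setSpec values.
HEADERS = {
    "oai:example.org:1": ["pubtype:thesis", "subject:medicine"],
    "oai:example.org:2": ["pubtype:article"],
    "oai:example.org:3": ["subject:biology"],
    "oai:example.org:4": [],  # sets need not cover every record
}


def in_set(set_specs, requested):
    """True if a record is in the requested set or one of its descendants."""
    prefix = requested + ":"
    return any(s == requested or s.startswith(prefix) for s in set_specs)


def select(headers, requested):
    """Return the identifiers of all records that fall in the requested set."""
    return sorted(i for i, specs in headers.items() if in_set(specs, requested))


print(select(HEADERS, "pubtype"))
print(select(HEADERS, "subject:medicine"))
```

Because a record may carry several setSpec values, the same record can appear in more than one set, which is why sets are not a strict partition.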

3.7 Request format

Requests must be submitted using the GET or POST methods of HTTP, and repositories must support both methods. At least one key=value pair, verb=RequestType (where RequestType is some type of request such as ListRecords), must be provided. Additional key=value pairs depend on the request type.

Example of a GET request: http://archive.org/oai?verb=ListRecords&metadataPrefix=oai_dc

The encoding of special characters must be supported; for example, ":" (host port separator) becomes "%3A".
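A minimal sketch of assembling such a GET request with Python's standard library follows. The function name build_request_url is hypothetical, the record identifier is invented, and the base URL is only the example address used in the text, not an endpoint assumed to be live.

```python
from urllib.parse import urlencode


def build_request_url(base_url, verb, **arguments):
    """Return a GET URL made of verb=... plus any further key=value pairs."""
    params = {"verb": verb}
    for key, value in arguments.items():
        # 'from' is a Python keyword, so callers pass from_ and the
        # trailing underscore is dropped here.
        params[key.rstrip("_")] = value
    # urlencode percent-encodes special characters, e.g. ':' becomes %3A.
    return base_url + "?" + urlencode(params)


print(build_request_url("http://archive.org/oai", "ListRecords",
                        metadataPrefix="oai_dc"))
print(build_request_url("http://archive.org/oai", "GetRecord",
                        identifier="oai:example.org:123",
                        metadataPrefix="oai_dc"))
```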

3.8 Response

Responses are formatted as HTTP responses. The content type must be text/xml. HTTP-based status codes, as distinguished from OAI-PMH errors, such as 302 (redirect) and 503 (service not available), may be returned. Compression encodings are optional in OAI-PMH; only identity encoding is mandatory. The response format must be well-formed XML with markup as follows:

1. XML declaration
   (<?xml version="1.0" encoding="UTF-8" ?>)
2. root element named OAI-PMH with three attributes
   (xmlns, xmlns:xsi, xsi:schemaLocation)
3. three child elements
   1. responseDate (UTC datetime)
   2. request (the request that generated this response)
   3. a) error (in case of an error or exception condition)
      b) element with the name of the OAI-PMH request
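The response envelope described above can be checked with a short sketch. The sample responses are hand-written for illustration, and the names OAIError, parse_envelope and _envelope are assumptions; badVerb is one of the error codes defined by OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


class OAIError(Exception):
    """Raised when the response carries an OAI-PMH error element."""


def _envelope(request_attrs, body):
    # Hand-written sample envelope, for illustration only.
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" '
        'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
        'xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ '
        'http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">'
        '<responseDate>2010-03-15T08:30:00Z</responseDate>'
        '<request' + request_attrs + '>http://archive.org/oai</request>'
        + body +
        '</OAI-PMH>'
    )


OK_RESPONSE = _envelope(
    ' verb="Identify"',
    '<Identify><repositoryName>Example</repositoryName></Identify>')
ERROR_RESPONSE = _envelope('', '<error code="badVerb">Illegal verb</error>')


def parse_envelope(xml_text):
    """Return responseDate, verb and payload element name, or raise OAIError."""
    root = ET.fromstring(xml_text)
    if root.tag != OAI_NS + "OAI-PMH":
        raise ValueError("root element is not OAI-PMH")
    errors = root.findall(OAI_NS + "error")
    if errors:
        raise OAIError(errors[0].get("code"), (errors[0].text or "").strip())
    skip = (OAI_NS + "responseDate", OAI_NS + "request")
    payload = next(child for child in root if child.tag not in skip)
    return {
        "responseDate": root.findtext(OAI_NS + "responseDate"),
        "verb": root.find(OAI_NS + "request").get("verb"),
        "payload": payload.tag[len(OAI_NS):],
    }


print(parse_envelope(OK_RESPONSE))
```

Checking for error elements before reading the payload keeps protocol-level errors separate from HTTP status codes, which a harvester would handle at the transport layer.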

3.9 OAI-PMH Verbs

Here 'verb' means the request type which the service provider (harvester) sends to get responses from data providers. There is a standard set of 6 verbs:

o Identify
o ListMetadataFormats
o ListSets
o GetRecord
o ListIdentifiers
o ListRecords

Verb                 Function
Identify             Description of the repository
ListMetadataFormats  Metadata formats supported by the repository
ListSets             Sets defined by the repository
ListIdentifiers      Retrieves the headers (unique identifiers and datestamps) of records, without their metadata
ListRecords          Used to harvest records from the repository
GetRecord            Retrieves an individual metadata record from the repository

A harvester is not required to use all request types. However, a repository must implement all of them. There are required and optional arguments, depending on the request type.
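The required and optional arguments per verb can be written down as a small table and checked before a request is sent. The argument lists and the resumptionToken rule below are taken from the OAI-PMH 2.0 specification rather than from this paper, and the function name check_request is hypothetical.

```python
# verb -> (required arguments, optional arguments), per OAI-PMH 2.0.
VERB_ARGUMENTS = {
    "Identify": (set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}),
    "ListSets": (set(), set()),
    "GetRecord": ({"identifier", "metadataPrefix"}, set()),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}),
}

# List verbs that may be continued with an exclusive resumptionToken.
RESUMABLE = {"ListSets", "ListIdentifiers", "ListRecords"}


def check_request(verb, arguments):
    """Return an OAI-PMH error code for a bad request, or None if it is valid."""
    if verb not in VERB_ARGUMENTS:
        return "badVerb"
    names = set(arguments)
    if names == {"resumptionToken"} and verb in RESUMABLE:
        return None
    required, optional = VERB_ARGUMENTS[verb]
    if not (required <= names) or not (names <= (required | optional)):
        return "badArgument"
    return None


print(check_request("ListRecords", {"metadataPrefix": "oai_dc"}))  # None
print(check_request("GetRecord", {"metadataPrefix": "oai_dc"}))    # badArgument
```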

4.0 DSpace: OAI-compatible Digital Library Software

DSpace is open source software for building and managing digital repositories. Developed jointly by MIT Libraries and Hewlett-Packard (HP), it is freely available to research institutions as an open source system that can be customized and extended. DSpace is a digital institutional repository that captures, stores, indexes, preserves and redistributes content in digital formats. An institutional repository is a set of services that a research institution, organization or university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. Typically, DSpace has been deployed for institutional repositories of publications, theses and dissertations. Several groups are working on extending its capabilities, such as the implementation of ontologies in the search interface and the submission module, customization for the management of electronic theses and dissertations, and localization and internationalization of the package for the world's languages.

DSpace is compliant with OAI-PMH version 2.0, and metadata in DSpace digital libraries can be harvested.

4.1 DSpace Search System

The end user can browse, search and access the collections using the hierarchies and also the alphabetic bar menu. For searching the collection, DSpace uses the Lucene search engine, which is a part of the Apache Jakarta Project (1). Additionally, research projects such as the ...(Portugal)... provide ontologies that enable context-based querying. These work like subject-based directory structures.

Lucene has very powerful search features that encompass many of the end user's search approaches. It provides the basic 'exact term' or keyword search. In addition, it allows fielded search, akin to the field-level search of library databases. In DSpace, Dublin Core elements are used for the field names. Lucene also facilitates Boolean searches, range searches, term boosting and proximity searches. An interesting facility is Lucene's fuzzy search, which is based on the Levenshtein algorithm (5) and can match terms by similarity. This feature is especially useful in instances where we hear a term and guess its spelling, and more so in the case of personal names.
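To show what matching by Levenshtein similarity means, the sketch below computes the edit distance between two strings and uses it to accept near-miss spellings of a personal name. It illustrates the distance measure only and is not Lucene's own implementation; the function names, the sample names and the threshold of one edit are assumptions.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(term, candidates, max_edits=1):
    """Return the candidates within max_edits of term, ignoring case."""
    return [c for c in candidates
            if levenshtein(term.lower(), c.lower()) <= max_edits]


print(levenshtein("kitten", "sitting"))
print(fuzzy_matches("Jonson", ["Johnson", "Jonsson", "Jackson"]))
```

A small edit threshold keeps the match list short; a larger one tolerates worse guesses at the cost of more false matches.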

4.2 Metadata in DSpace

DSpace users deal with or come across metadata in the following modules:

o Administration modules: Dublin Core registry, administrative metadata - default values, mail alerts to subscribers
o Submission modules: descriptive metadata
o Harvesting: OAI-PMH using the DC elements (unqualified)
o Search result display: brief and full metadata

4.3 Metadata harvesting in DSpace

DSpace is compliant with the OAI-PMH for exposing metadata. OAI-PMH allows repositories to expose a hierarchy of sets in which records may be placed. DSpace exposes collections as sets: each collection has a corresponding OAI set, and harvesters use the ListSets verb (an OAI command) to discover the sets. Only the 15 basic Dublin Core elements are exposed at present.
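How a harvester might read the sets out of a ListSets response can be sketched as follows. The sample response, the repository address and the setSpec values collection_1 and collection_2 are invented for illustration; they are not taken from a real DSpace installation, whose actual setSpec format is not described here.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical ListSets response with one set per collection.
SAMPLE_LIST_SETS = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    '<responseDate>2010-03-15T08:30:00Z</responseDate>'
    '<request verb="ListSets">http://repository.example.org/oai</request>'
    '<ListSets>'
    '<set><setSpec>collection_1</setSpec>'
    '<setName>Theses and Dissertations</setName></set>'
    '<set><setSpec>collection_2</setSpec>'
    '<setName>Faculty Publications</setName></set>'
    '</ListSets>'
    '</OAI-PMH>'
)


def discover_sets(xml_text):
    """Map each setSpec in a ListSets response to its setName."""
    root = ET.fromstring(xml_text)
    return {
        s.findtext("oai:setSpec", namespaces=NS):
            s.findtext("oai:setName", namespaces=NS)
        for s in root.iterfind("oai:ListSets/oai:set", NS)
    }


print(discover_sets(SAMPLE_LIST_SETS))
```

Each setSpec found this way could then be passed as the set argument of a ListRecords request to harvest one collection at a time.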

5.0 OAI Harvester Software

o Arc (http://arc.cs.odu.edu)
o Citebase (http://citebase.eprints.org/cgi-bin/search)
o CYCLADES (http://www.ercim.org/cyclades)
o DP9 (http://arc.cs.odu.edu:8080/dp9/index.jsp)
o MeIND (http://www.meind.de)
o METALIS (http://metalis.cilea.it)
o myOAI (http://www.myoai.com)
o NCSTRL (http://www.ncstrl.org)
o Perseus (http://www.perseus.tufts.edu/cgi-bin/vor)
o Public Knowledge Project - Open Archives Harvester (http://pkp.ubc.ca/harvester)
o OAICAT (http://www.oclc.org/research/software/oai/cat.htm)
o OAI Repository Explorer (http://re.cs.uct.ac.za)
o OAIster (http://oaister.umdl.umich.edu/o/oaister)
o OASIC (Open Archives en SIC) (http://oasic.ccsd.cnrs.fr)
o OAIHarvester (http://www.oclc.org/research/software/oai/harvester.htm)
o DLESE OAI Software (http://dlese.org/oai/index.jsp)

6.0 Future Prospects

Some more work has to be done in order to make OAI-PMH a complete, globally accepted metadata harvesting protocol:

o Tools and software have to be developed by which repositories that are not OAI-PMH compliant can be made compliant, so that they can act as data providers.
o Higher versions of the protocol should be made compatible with the lower ones.

At the metadata creation level some standardization is required, as a particular resource is described inconsistently at different repositories. Vocabulary control measures should also be taken care of. Further improvements are still awaited in the OAI-PMH; only then can we ensure a comprehensive view of the resources available on a particular subject to our end users.

7.0 Conclusion

Much promise is seen for the use of the protocol within an open archives approach. Support for a new pattern for scholarly communication is the most publicized potential benefit. Perhaps most readily achievable are the goals of surfacing hidden resources and low-cost interoperability. Although the OAI-PMH is technically very simple, building coherent services that meet user requirements remains complex. The OAI-PMH could become part of the infrastructure of the Web, as taken for granted as HTTP now is, if a combination of its relative simplicity and proven success by early implementers in a service context leads to widespread uptake by research organizations, publishers and archives.

REFERENCES

1. http://www.openarchives.org

2. Breeding, M. (2002, April). The Emergence of the Open Archives Initiative: This Protocol could become a key part of the digital library infrastructure. Information Today, from http://www.findarticles.com/cf_0/m3336/4_19/85251474/p1/article.jhtml

3. Breeding, M. (2002). Understanding the Protocol for Metadata Harvesting of the Open Archives Initiative. Computers in Libraries, 22(8).

4. Lagoze, C., & Sompel, H. V. d. (2001, January). The Open Archives Initiative Protocol for Metadata Harvesting, from http://www.openarchives.org/OAI/openarchivesprotocol.htm

5. Lynch, C. A. (2001, August). Metadata Harvesting and the Open Archives Initiative. ARL Bimonthly Report 217, from http://www.arl.org/newsltr/217/mhp.html

6. Shearer, K. (2002, March). The Open Archives Initiative: Developing an Interoperability Framework for Scholarly Publishing. CARL/ABRC Background Series No. 5, from http://www.carl-abrc.ca/projects/scholarly/open_archives.PDF

7. Suleman, H., & Fox, E. A. (2001, December). A Framework for Building Open Digital Libraries. D-Lib Magazine, 7(12), from http://www.dlib.org/dlib/december01/suleman/12suleman.html

8. Sompel, H. V. d., & Lagoze, C. (2000, February). The Santa Fe Convention of the Open Archives Initiative. D-Lib Magazine, 6(2), from http://www.dlib.org/dlib/february00/vandesompel-oai/02vandesompel-oai.html

9. Warner, S. (2001, June). Exposing and Harvesting Metadata Using the OAI Metadata Harvesting Protocol: A Tutorial. HEP Libraries Webzine, Issue 4, from http://library.cern.ch/HEPLW/4/papers/3

11. http://www.ukoln.ac.uk/repositories/digirep/index/FAQs

12. Michael Shepherd (2003). Interoperability for Digital Libraries. DRTC Workshop on Semantic Web, 8th - 10th December 2003, DRTC, Bangalore.

13. http://www.openarchives.org/Register/BrowseSites

14. http://www.openarchives.org/service/listproviders.html

Page 2: Open Archives Initiatives For Metadata   Harvesting

Open Archives Initiatives for Metadata HarvestingA Framework for Building Open Digital Libraries

10 Introduction

Digital Library may be defined as system that supports collection organization storage retrieval

and dissemination of Digital Documents It may be viewed as the intersection of Library Science

Computer Science and networked information systems Open movements are gaining acceptance

in the scholarly information arena and many of the Universities and research centers have started

to provide public access to their repositories With the growing number of repositories of digital

repositories in the Web it became difficult for the users to visit individual places in search of

information Many organizational repositories have not been indexed by the search engines Such

mechanism is therefore required by which the repositories can share the resources and work in

coordination to provide a broader purview to the users The mechanism which provides the ability to

the information systems to work in coordination has been termed as Interoperability Open Archives

Initiative is one of the landmark efforts to ensure the availability of the metadata of digital resources

of many repositories at the usersrsquo end

The essence of the open archives approach is to enable access to Web-accessible material through

interoperable repositories for metadata sharing publishing and archiving

Such interoperability requirements necessitated the development of standards such as the Dublin

Core Metadata Element Set and the Open Archives Initiatives Protocol for Metadata Harvesting

(OAI-PMH) These standards have achieved a degree of success in the DL community largely

because of their generality and simplicity

20 Need for a Harvester protocol

There is a growing need to make resources not only descriptive metadata harvestable in an

interoperable manner There are two major use cases that motivate this need

Preservation The need to periodically transfer digital content from a data repository to one or

more trusted digital repositories charged with storing and preserving safety copies of the

content The trusted digital repositories need a mechanism to automatically synchronize with

the originating data repository

Discovery The need to use content itself in the creation of services Examples include search

engines that make full-text from multiple data repositories searchable and citation indexing

systems that extract references from the full-text content Another scenario is the provision of

thumbnail versions of high-quality images from cultural heritage collections to external

services that build browsing interfaces that include the thumbnails

30 OAI Protocol for Metadata Harvesting (OAI-PMH)

In October of 1999 the Open Archives Initiative (OAI) was launched in an attempt to address

interoperability issues among the many existing and independent DLs The focus was on high-

level communication among systems and simplicity of protocols The OAI has since received

much media attention in the DL community and primarily because of the simplicity of its

standards has attracted many early adopters It defines a mechanism for harvesting records

containing metadata from repositories

31 Definitions of Key terms

Open archives Initiatives (OAI)

OAI is an initiative to develop and promote interoperability standards that aim to facilitate the

efficient dissemination of content

Archive

The term archive in the name Open Archives Initiative reflects the origins of the OAI in

the e-prints community where the term archive is generally accepted as a synonym for

repository of scholarly papers Members of the archiving profession have justifiably noted

the strict definition of an archive within their domain with connotations of preservation of

long-term value statutory authorization and institutional policy The OAI uses the term

archive in a broader sense as a repository for stored information Language and terms are

never unambiguous and uncontroversial and the OAI respectfully requests the indulgence of

the professional archiving community with this broader use of archive

(OAI definition quoted from FAQ on OAI Web site)

OAI Protocol for Metadata Harvesting (OAI-PMH)

OAI-PMH is a lightweight harvesting protocol for sharing metadata between services

Protocol

A protocol is a set of rules defining communication between systems FTP (File Transfer

Protocol) and HTTP (Hypertext Transport Protocol) are examples of other protocols used for

communication between systems across the Internet

Harvesting

In the OAI context harvesting refers specifically to the gathering together of metadata from a

number of distributed repositories into a combined data store

32 Prerequisites to develop metadata harvesting protocol

To facilitate metadata harvesting there needs to be agreement on

o Transport protocol - HTTP or FTP or other such protocol

o Metadata format - Dublin Core or MARC or other such format

o Metadata Quality Assurance - mandatory element set naming and subject conventions etc

o Intellectual Property and Usage Rights - who can do what with what

33 OAI Key players

There are two groups of participants Data Providers and Service Providers

Data Providers

(open archives repositories) provide free access to metadata and may but do not necessarily

offer free access to full texts or other resources OAI-PMH provides an easy to implement low

barrier solution for Data Providers

Service Providers

use the OAI interfaces of the Data Providers to harvest and store metadata Note that this means

that there are no live search requests to the Data Providers rather services are based on the

harvested data via OAI-PMH Service Providers may select certain subsets from Data Providers

(eg by set hierarchy or date stamp) Service Providers offer (value-added) services on the basis

of the metadata harvested and they may enrich the harvested metadata in order to do so

34 How it works

Prerequisites to develop metadata harvesting protocol

To facilitate metadata harvesting there needs to be agreement on

o Transport protocol - HTTP or FTP or other such protocol

o Metadata format - Dublin Core or MARC or other such format

o Metadata Quality Assurance - mandatory element set naming and subject conventions etc

o Intellectual Property and Usage Rights - who can do what with what

The OAI-PMH gives a simple technical option for data providers to make their metadata

available to services based on the open standards HTTP (Hypertext Transport Protocol) and

XML (Extensible Markup Language) The metadata that is harvested may be in any format that

is agreed by a community (or by any discrete set of data and service providers) although

unqualified Dublin Core is specified to provide a basic level of interoperability Thus metadata

from many sources can be gathered together in one database and services can be provided based

on this centrally harvested or aggregated data The link between this metadata and the related

content is not defined by the OAI protocol It is important to realize that OAI-PMH does not

provide a search across this data it simply makes it possible to bring the data together in one

place In order to provide services the harvesting approach must be combined with other

mechanisms

35 Protocol details

Records

A record is the metadata of a resource in a specific format A record has three parts a header and

metadata both of which are mandatory and an optional about statement Each of these is made

up of various components as set out below

header (mandatory)

identifier (mandatory 1 only)

datestamp (mandatory 1 only)

setSpec elements (optional 0 1 or more)

status attribute for deleted item

metadata (mandatory)

XML encoded metadata with root tag namespace

repositories must support Dublin Core may support other formats

about (optional)

rights statements

provenance statements

Datestamps

A datestamp is the date of last modification of a metadata record Datestamp is a mandatory

characteristic of every item It has two possible levels of granularity

YYYY-MM-DD or YYYY-MM-DDThhmmssZ

The function of the datestamp is to provide information on metadata that enables selective

harvesting using from and until arguments Its applications are in incremental update

mechanisms It gives either the date of creation last modification or deletion Deletion is

covered with three support levels no persistent transient

Metadata schema

OAI-PMH supports dissemination of multiple metadata formats from a repository The

properties of metadata formats are

ndash id string to specify the format (metadataPrefix)

ndash metadata schema URL (XML schema to test validity)

ndash XML namespace URI (global identifier for metadata format)

Repositories must be able to disseminate unqualified Dublin Core Further arbitrary metadata

formats can be defined and transported via the OAI-PMH Any returned metadata must comply

with an XML namespace specification The Dublin Core Metadata Element Set contains 15

elements All elements are optional and all elements may be repeated

36 The Dublin Core Metadata Element Set

Title Contributor Source

Creator Date Language

Subject Type Relation

Description Format Coverage

Publisher Identifier Rights

Sets

Sets enable a logical partitioning of repositories They are optional archives do not have to

define Sets There are no recommendations for the implementation of Sets Sets are not

necessarily exhaustive of the content of a repository They are not necessarily strictly

hierarchical It is important and necessary to have negotiated agreements within communities

defining useful sets for the communities

function selective harvesting (set parameter)

applications subject gateways dissertation search engine and others

examples

o publication types (thesis article )

o document types (text audio image )

o content sets according to DNB (medicine biology )

37 Request format

Requests must be submitted using the GET or POST methods of HTTP and repositories must

support both methods At least one key=value pair verb=RequestType (where RequestType is

some type of request such as ListRecords) must be provided Additional key=value pairs depend

on the request type

example for GET request httparchiveorgoai

verb=ListRecordsampmetadataPrefix=oai_dc

The encoding of special characters must be supported for example (host port separator)

becomes 3A

38 Response

Responses are formatted as HTTP responses The content type must be textxml HTTP-based

status codes as distinguished from OAI-PMH errors such as 302 (redirect) and 503 (service not

available) may be returned Compression codes are optional in OAI-PMH only identity

encoding is mandatory The response format must be well-formed XML with markup as follows

1 XML declaration

(ltxml version=10 encoding=UTF-8 gt)

2 root element named OAI-PMH with three attributes

(xmlns xmlnsxsi xsischemaLocation)

3 three child elements

1 responseDate (UTC datetime)

2 request (the request that generated this response)

3 a) error (in case of an error or exception condition)

b) element with the name of the OAI-PMH request

39 OAI-

PMH

Verbs

Here lsquoverbrsquo

means

request type which the service providerharvester sends to get responses from data providers There is

a standard set of 6 verbs

o Identify

o ListMetadataFormats

o ListSets

o GetRecord

o ListIdentifiers

o ListRecords

Function

Identify Description of repository

ListMetadataFormats Metadata format supported by the repository

ListSets Sets defined by repository

ListIdentifiers Retrieves unique identifiers of the item

ListRecords Used to harvest records from the repository

GetRecords Retrieves individual metadata record from the

repository

A harvester is not required to use all types However a repository must implement all types

There are required and optional arguments depending on request types

4.0 DSpace: OAI-compatible Digital Library Software

DSpace is open source software for building and managing digital repositories. Developed jointly by MIT Libraries and Hewlett-Packard (HP), it is freely available to research institutions as an open source system that can be customized and extended. DSpace is a digital institutional repository that captures, stores, indexes, preserves and redistributes content in digital formats. An institutional repository is a set of services that a research institution, organization or university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. Typically, DSpace has been deployed for institutional repositories of publications, theses and dissertations. Several groups are working on extending its capabilities, such as the implementation of ontologies in the search interface and the submission module, customization for the management of electronic theses and dissertations, and localization and internationalization of the package for the world's languages.

DSpace is compliant with OAI-PMH version 2.0, and metadata in DSpace digital libraries can be harvested.

4.1 DSpace Search System

The end user can browse, search and access the collections using the hierarchies and also the alphabetic bar menu. For searching the collection, DSpace uses the Lucene search engine, which is part of the Apache Jakarta Project (1). Additionally, research projects such as the …(Portugal)… provide ontologies that enable context-based querying. These work like subject-based directory structures.

The Lucene search engine has powerful search features that cover many of the search approaches of the end user. It provides the basic 'exact term' or keyword search. In addition, it allows fielded search, akin to the field-level search of library databases. In DSpace, Dublin Core elements are used for the field names. Lucene also supports Boolean search, range searches, term boosting and proximity searches. An interesting facility is Lucene's fuzzy search, based on the Levenshtein (edit distance) algorithm (5), which matches terms by similarity. This feature is especially useful when we hear a term and guess its spelling, and more so in the case of personal names.
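To make the idea of matching by similarity concrete, here is a minimal sketch of the Levenshtein edit distance in Python. It illustrates the algorithm only and is not Lucene's implementation; the function names and the distance threshold of 2 are assumptions chosen for the example.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fuzzy_matches(query, terms, max_distance=2):
    """Return the terms within max_distance edits of the query (case-insensitive)."""
    query = query.lower()
    return [term for term in terms
            if levenshtein(query, term.lower()) <= max_distance]


# A personal name typed as it was heard rather than as it is spelt.
print(fuzzy_matches("Smyth", ["Smith", "Schmidt", "Jones"]))
```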

4.2 Metadata in DSpace

DSpace users deal with or come across metadata in the following modules:

o Administration modules: Dublin Core registry, administrative metadata (default values), mail alerts to subscribers
o Submission modules: descriptive metadata
o Harvesting: OAI-PMH using the (unqualified) DC elements
o Search result display: brief and full metadata

4.3 Metadata harvesting in DSpace

DSpace is compliant with the OAI-PMH for exposing metadata. OAI-PMH allows repositories to expose a hierarchy of sets in which records may be placed. DSpace exposes collections as sets. Each collection has a corresponding OAI set, and harvesters use the ListSets verb (OAI command) to discover the sets. Only the 15 basic Dublin Core elements are exposed at present.
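Discovering sets amounts to reading the setSpec and setName of each set element in a ListSets response. The sketch below parses an invented response; the base URL, set identifiers and collection names are placeholders for illustration, not values taken from a real DSpace installation.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Invented ListSets response with two collections exposed as sets.
SAMPLE_LIST_SETS = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2010-01-15T10:30:00Z</responseDate>
  <request verb="ListSets">http://repository.example.org/oai/request</request>
  <ListSets>
    <set><setSpec>hdl_123456789_2</setSpec><setName>Theses and Dissertations</setName></set>
    <set><setSpec>hdl_123456789_3</setSpec><setName>Publications</setName></set>
  </ListSets>
</OAI-PMH>"""


def list_sets(xml_text):
    """Return (setSpec, setName) pairs from a ListSets response."""
    root = ET.fromstring(xml_text)
    return [(item.findtext(OAI + "setSpec"), item.findtext(OAI + "setName"))
            for item in root.iter(OAI + "set")]


sets = list_sets(SAMPLE_LIST_SETS)
for spec, name in sets:
    print(spec, "-", name)
```

A harvester would then pass one of the returned setSpec values as the set argument of ListRecords to harvest a single collection.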

5.0 OAI Harvester Software

o Arc (http://arc.cs.odu.edu)
o Citebase (http://citebase.eprints.org/cgi-bin/search)
o CYCLADES (http://www.ercim.org/cyclades)
o DP9 (http://arc.cs.odu.edu:8080/dp9/index.jsp)
o MeIND (http://www.meind.de)
o METALIS (http://metalis.cilea.it)
o myOAI (http://www.myoai.com)
o NCSTRL (http://www.ncstrl.org)
o Perseus (http://www.perseus.tufts.edu/cgi-bin/vor)
o Public Knowledge Project – Open Archives Harvester (http://pkp.ubc.ca/harvester)
o OAICAT (http://www.oclc.org/research/software/oai/cat.htm)
o OAI Repository Explorer (http://re.cs.uct.ac.za)
o OAIster (http://oaister.umdl.umich.edu/o/oaister)
o OASIC (Open Archives en SIC) (http://oasic.ccsd.cnrs.fr)
o OAIHarvester (http://www.oclc.org/research/software/oai/harvester.htm)
o DLESE OAI Software (http://dlese.org/oai/index.jsp)

6.0 Future Prospects

Some more work has to be done to make OAI-PMH a complete, globally accepted metadata harvesting protocol:

o Tools and software have to be developed by which repositories that are not OAI-PMH compliant can be made compliant, so that they can act as data providers.
o Higher versions of the protocol should be made compatible with the lower ones.

At the metadata creation level some standardization is required, as a particular resource is described inconsistently at different repositories. Vocabulary control measures should also be taken care of. Further improvements are awaited in the OAI-PMH protocol; only then can we ensure a comprehensive view of the resources available on a particular subject for our end users.

7.0 Conclusion

Much promise is seen for the use of the protocol within an open archives approach. Support for a new pattern of scholarly communication is the most publicized potential benefit. Perhaps most readily achievable are the goals of surfacing hidden resources and low-cost interoperability. Although the OAI-PMH is technically very simple, building coherent services that meet user requirements remains complex. The OAI-PMH could become part of the infrastructure of the Web, as taken for granted as HTTP now is, if a combination of its relative simplicity and proven success by early implementers in a service context leads to widespread uptake by research organizations, publishers and archives.

REFERENCES

1. http://www.openarchives.org
2. Breeding, M. (2002, April). The Emergence of the Open Archives Initiative: This Protocol could become a key part of the digital library infrastructure. Information Today, from http://www.findarticles.com/cf_0/m3336/4_19/85251474/p1/article.jhtml
3. Breeding, M. (2002). Understanding the Protocol for Metadata Harvesting of the Open Archives Initiative. Computers in Libraries, 22(8).
4. Lagoze, C., & Sompel, H. V. d. (2001, January). The Open Archives Initiative Protocol for Metadata Harvesting, from http://www.openarchives.org/OAI/openarchivesprotocol.htm
5. Lynch, C. A. (2001, August). Metadata Harvesting and the Open Archives Initiative. ARL Bimonthly Report, 217, from http://www.arl.org/newsltr/217/mhp.html
6. Shearer, K. (2002, March). The Open Archives Initiative: Developing an Interoperability Framework for Scholarly Publishing. CARL/ABRC Background Series, No. 5, from http://www.carl-abrc.ca/projects/scholarly/open_archives.PDF
7. Suleman, H., & Fox, E. A. (2001, December). A Framework for Building Open Digital Libraries. D-Lib Magazine, 7(12), from http://www.dlib.org/dlib/december01/suleman/12suleman.html
8. Sompel, H. V. d., & Lagoze, C. (2000, February). The Santa Fe Convention of the Open Archives Initiative. D-Lib Magazine, 6(2), from http://www.dlib.org/dlib/february00/vandesompel-oai/02vandesompel-oai.html
9. Warner, S. (2001, June). Exposing and Harvesting Metadata Using the OAI Metadata Harvesting Protocol: A Tutorial. HEP Libraries Webzine, Issue 4, from http://library.cern.ch/HEPLW/4/papers/3
11. http://www.ukoln.ac.uk/repositories/digirep/index/FAQs
12. Michael Shepherd (2003). Interoperability for Digital Libraries. DRTC Workshop on Semantic Web, 8th – 10th December 2003, DRTC, Bangalore.
13. http://www.openarchives.org/Register/BrowseSites
14. http://www.openarchives.org/service/listproviders.html


content. The trusted digital repositories need a mechanism to automatically synchronize with the originating data repository.

Discovery: The need to use the content itself in the creation of services. Examples include search engines that make full text from multiple data repositories searchable, and citation indexing systems that extract references from the full-text content. Another scenario is the provision of thumbnail versions of high-quality images from cultural heritage collections to external services that build browsing interfaces that include the thumbnails.

3.0 OAI Protocol for Metadata Harvesting (OAI-PMH)

In October 1999 the Open Archives Initiative (OAI) was launched in an attempt to address interoperability issues among the many existing and independent digital libraries (DLs). The focus was on high-level communication among systems and simplicity of protocols. The OAI has since received much media attention in the DL community and, primarily because of the simplicity of its standards, has attracted many early adopters. It defines a mechanism for harvesting records containing metadata from repositories.

3.1 Definitions of Key Terms

Open Archives Initiative (OAI)
OAI is an initiative to develop and promote interoperability standards that aim to facilitate the efficient dissemination of content.

Archive
The term "archive" in the name Open Archives Initiative reflects the origins of the OAI in the e-prints community, where the term archive is generally accepted as a synonym for a repository of scholarly papers. Members of the archiving profession have justifiably noted the strict definition of an "archive" within their domain, with connotations of preservation of long-term value, statutory authorization and institutional policy. The OAI uses the term "archive" in a broader sense: as a repository for stored information. Language and terms are never unambiguous and uncontroversial, and the OAI respectfully requests the indulgence of the professional archiving community with this broader use of "archive".
(OAI definition quoted from the FAQ on the OAI Web site)

OAI Protocol for Metadata Harvesting (OAI-PMH)
OAI-PMH is a lightweight harvesting protocol for sharing metadata between services.

Protocol
A protocol is a set of rules defining communication between systems. FTP (File Transfer Protocol) and HTTP (Hypertext Transfer Protocol) are examples of other protocols used for communication between systems across the Internet.

Harvesting
In the OAI context, harvesting refers specifically to the gathering together of metadata from a number of distributed repositories into a combined data store.

3.2 Prerequisites to develop a metadata harvesting protocol

To facilitate metadata harvesting there needs to be agreement on:

o Transport protocol - HTTP or FTP or other such protocol
o Metadata format - Dublin Core or MARC or other such format
o Metadata quality assurance - mandatory element set, naming and subject conventions, etc.
o Intellectual property and usage rights - who can do what with what

3.3 OAI Key Players

There are two groups of participants: Data Providers and Service Providers.

Data Providers
Data Providers (open archives, repositories) provide free access to metadata and may, but do not necessarily, offer free access to full texts or other resources. OAI-PMH provides an easy-to-implement, low-barrier solution for Data Providers.

Service Providers
Service Providers use the OAI interfaces of the Data Providers to harvest and store metadata. Note that this means that there are no live search requests to the Data Providers; rather, services are based on the data harvested via OAI-PMH. Service Providers may select certain subsets from Data Providers (e.g. by set hierarchy or datestamp). Service Providers offer (value-added) services on the basis of the metadata harvested, and they may enrich the harvested metadata in order to do so.

3.4 How it works

The OAI-PMH gives a simple technical option for data providers to make their metadata available to services, based on the open standards HTTP (Hypertext Transfer Protocol) and XML (Extensible Markup Language). The metadata that is harvested may be in any format that is agreed by a community (or by any discrete set of data and service providers), although unqualified Dublin Core is specified to provide a basic level of interoperability. Thus, metadata from many sources can be gathered together in one database, and services can be provided based on this centrally harvested, or "aggregated", data. The link between this metadata and the related content is not defined by the OAI protocol. It is important to realize that OAI-PMH does not provide a search across this data; it simply makes it possible to bring the data together in one place. In order to provide services, the harvesting approach must be combined with other mechanisms.
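The division of labour described here, harvesting first and searching only the local copy, can be illustrated with a toy aggregator. Everything in the sketch is hypothetical (class name, repository names, records); it merely shows that the search runs against the combined store and never contacts the data providers.

```python
class Aggregator:
    """Toy service provider: keeps harvested records and searches them locally."""

    def __init__(self):
        # (repository, identifier) -> dict of Dublin Core element -> list of values
        self.records = {}

    def ingest(self, repository, harvested):
        """Store records obtained from one data provider."""
        for identifier, fields in harvested:
            self.records[(repository, identifier)] = fields

    def search(self, word):
        """Search the combined store; no request goes to any data provider."""
        word = word.lower()
        hits = []
        for key, fields in self.records.items():
            values = [value for group in fields.values() for value in group]
            if any(word in value.lower() for value in values):
                hits.append(key)
        return sorted(hits)


aggregator = Aggregator()
aggregator.ingest("repo-a", [("oai:a:1", {"title": ["Metadata Harvesting"]})])
aggregator.ingest("repo-b", [("oai:b:7", {"title": ["Digital Libraries"],
                                          "subject": ["metadata"]})])
print(aggregator.search("metadata"))
```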

3.5 Protocol details

Records

A record is the metadata of a resource in a specific format. A record has three parts: a header and metadata, both of which are mandatory, and an optional about statement. Each of these is made up of various components, as set out below:

header (mandatory)
    identifier (mandatory: 1 only)
    datestamp (mandatory: 1 only)
    setSpec elements (optional: 0, 1 or more)
    status attribute for deleted item
metadata (mandatory)
    XML-encoded metadata with root tag and namespace
    repositories must support Dublin Core, may support other formats
about (optional)
    rights statements
    provenance statements
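The record structure set out above maps directly onto XML. The following sketch reads the header of a single record with Python's standard library; the sample record and its identifier are invented, and the example shows a deleted record, which in OAI-PMH 2.0 carries a status attribute on the header and no metadata part.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Invented record: a deleted item keeps its header but has no metadata.
DELETED_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header status="deleted">
    <identifier>oai:repository.example.org:123</identifier>
    <datestamp>2010-01-15</datestamp>
    <setSpec>type:thesis</setSpec>
  </header>
</record>"""


def parse_record(record_xml):
    """Return the header fields of a record and which optional parts it has."""
    record = ET.fromstring(record_xml)
    header = record.find(OAI + "header")
    return {
        "identifier": header.findtext(OAI + "identifier"),
        "datestamp": header.findtext(OAI + "datestamp"),
        "setSpecs": [item.text for item in header.findall(OAI + "setSpec")],
        "deleted": header.get("status") == "deleted",
        "has_metadata": record.find(OAI + "metadata") is not None,
        "has_about": record.find(OAI + "about") is not None,
    }


parsed = parse_record(DELETED_RECORD)
print(parsed)
```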

Datestamps

A datestamp is the date of last modification of a metadata record. A datestamp is a mandatory characteristic of every item. It has two possible levels of granularity:

YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ

The function of the datestamp is to provide information on metadata that enables selective harvesting using the from and until arguments. Its applications are in incremental update mechanisms. It gives the date of creation, last modification or deletion. Deletion is covered with three support levels: no, persistent, transient.
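An incremental update can be sketched as follows: remember the datestamp of the last harvest and send it as the from argument next time. The function names are hypothetical, and comparing datestamps of mixed granularity by treating a day as midnight UTC is a simplification made for this example.

```python
from datetime import datetime, timezone

# The two granularities: day, and seconds in UTC.
FORMATS = ("%Y-%m-%d", "%Y-%m-%dT%H:%M:%SZ")


def parse_datestamp(value):
    """Parse an OAI-PMH datestamp (always UTC) or raise ValueError."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError("not a valid OAI-PMH datestamp: %r" % value)


def incremental_request(last_harvest, metadata_prefix="oai_dc"):
    """Return the arguments of a ListRecords request for changes since last_harvest."""
    parse_datestamp(last_harvest)  # reject malformed datestamps early
    return {"verb": "ListRecords", "metadataPrefix": metadata_prefix,
            "from": last_harvest}


def changed_since(datestamps, last_harvest):
    """Select the datestamps at or after last_harvest (from is inclusive)."""
    cutoff = parse_datestamp(last_harvest)
    return [stamp for stamp in datestamps if parse_datestamp(stamp) >= cutoff]


print(incremental_request("2010-01-15"))
print(changed_since(["2009-12-31", "2010-01-15", "2010-02-01T00:00:00Z"],
                    "2010-01-15"))
```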

Metadata schema

OAI-PMH supports dissemination of multiple metadata formats from a repository. The properties of metadata formats are:

– id string to specify the format (metadataPrefix)
– metadata schema URL (XML schema to test validity)
– XML namespace URI (global identifier for the metadata format)

Repositories must be able to disseminate unqualified Dublin Core. Further arbitrary metadata formats can be defined and transported via the OAI-PMH. Any returned metadata must comply with an XML namespace specification. The Dublin Core Metadata Element Set contains 15 elements. All elements are optional, and all elements may be repeated.

3.6 The Dublin Core Metadata Element Set

Title          Contributor    Source
Creator        Date           Language
Subject        Type           Relation
Description    Format         Coverage
Publisher      Identifier     Rights
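The three properties of a metadata format, and the fact that every Dublin Core element is optional and repeatable, can be seen in a short sketch. The prefix, schema URL and namespace shown for oai_dc are those of OAI-PMH version 2.0; the helper name is hypothetical, and the sample metadata describes reference 7 of this paper.

```python
import xml.etree.ElementTree as ET

# Properties of the mandatory unqualified Dublin Core format (OAI-PMH 2.0).
OAI_DC = {
    "metadataPrefix": "oai_dc",
    "schema": "http://www.openarchives.org/OAI/2.0/oai_dc.xsd",
    "metadataNamespace": "http://www.openarchives.org/OAI/2.0/oai_dc/",
}
DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = ("title", "creator", "subject", "description", "publisher",
               "contributor", "date", "type", "format", "identifier",
               "source", "language", "relation", "coverage", "rights")

SAMPLE_METADATA = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>A Framework for Building Open Digital Libraries</dc:title>
  <dc:creator>Suleman, H.</dc:creator>
  <dc:creator>Fox, E. A.</dc:creator>
  <dc:date>2001-12</dc:date>
</oai_dc:dc>"""


def dublin_core_fields(metadata_xml):
    """Return a dict mapping each Dublin Core element present to its values."""
    fields = {}
    for child in ET.fromstring(metadata_xml):
        # ElementTree writes qualified names as "{namespace}name".
        namespace, _, name = child.tag[1:].partition("}")
        if namespace == DC_NS and name in DC_ELEMENTS:
            fields.setdefault(name, []).append((child.text or "").strip())
    return fields


fields = dublin_core_fields(SAMPLE_METADATA)
print(fields)
```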

Sets

Sets enable a logical partitioning of repositories. They are optional: archives do not have to define sets. There are no recommendations for the implementation of sets. Sets are not necessarily exhaustive of the content of a repository, and they are not necessarily strictly hierarchical. It is important and necessary to have negotiated agreements within communities defining useful sets for the communities.

function: selective harvesting (set parameter)
applications: subject gateways, dissertation search engines and others
examples:
o publication types (thesis, article, ...)
o document types (text, audio, image, ...)
o content sets according to DNB (medicine, biology, ...)
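Selective harvesting by set can be sketched as a simple membership test. In OAI-PMH 2.0 a setSpec is a colon-separated path, and harvesting a set also returns the records of its sub-sets; the function names and the sample records below are hypothetical.

```python
def in_set(record_set_specs, requested_set):
    """True if any setSpec of the record is the requested set or one of its sub-sets."""
    prefix = requested_set + ":"
    return any(spec == requested_set or spec.startswith(prefix)
               for spec in record_set_specs)


def select(records, requested_set):
    """Return the identifiers of the records that belong to the requested set."""
    return [identifier for identifier, specs in records
            if in_set(specs, requested_set)]


# Invented records: identifier and the setSpec values from its header.
records = [
    ("oai:x:1", ["type:thesis"]),
    ("oai:x:2", ["type:article", "subject:medicine"]),
    ("oai:x:3", []),  # sets need not cover the whole repository
]
print(select(records, "type"))
print(select(records, "subject:medicine"))
```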

3.7 Request format

Requests must be submitted using the GET or POST methods of HTTP, and repositories must support both methods. At least one key=value pair, verb=RequestType (where RequestType is some type of request such as ListRecords), must be provided. Additional key=value pairs depend on the request type.

Example of a GET request: http://archive.org/oai?verb=ListRecords&metadataPrefix=oai_dc

The encoding of special characters must be supported; for example, ":" (host port separator) becomes "%3A".
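Since repositories must accept POST as well as GET, the same key=value pairs can travel in the request body. The sketch below only constructs the request object and does not send it; the helper name is hypothetical, the base URL is the example used in the text, and the form-encoded content type is the one OAI-PMH version 2.0 prescribes for POST.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post_request(base_url, verb, **arguments):
    """Return an unsent HTTP POST request carrying the OAI-PMH arguments."""
    # The key=value pairs go into the body instead of the query string.
    body = urlencode(dict(verb=verb, **arguments)).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(base_url, data=body, headers=headers)


request = build_post_request("http://archive.org/oai", "ListRecords",
                             metadataPrefix="oai_dc")
print(request.get_method(), request.full_url)
print(request.data)
```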

Page 4: Open Archives Initiatives For Metadata   Harvesting

(OAI definition quoted from FAQ on OAI Web site)

OAI Protocol for Metadata Harvesting (OAI-PMH)

OAI-PMH is a lightweight harvesting protocol for sharing metadata between services

Protocol

A protocol is a set of rules defining communication between systems FTP (File Transfer

Protocol) and HTTP (Hypertext Transport Protocol) are examples of other protocols used for

communication between systems across the Internet

Harvesting

In the OAI context harvesting refers specifically to the gathering together of metadata from a

number of distributed repositories into a combined data store

32 Prerequisites to develop metadata harvesting protocol

To facilitate metadata harvesting there needs to be agreement on

o Transport protocol - HTTP or FTP or other such protocol

o Metadata format - Dublin Core or MARC or other such format

o Metadata Quality Assurance - mandatory element set naming and subject conventions etc

o Intellectual Property and Usage Rights - who can do what with what

33 OAI Key players

There are two groups of participants Data Providers and Service Providers

Data Providers

(open archives repositories) provide free access to metadata and may but do not necessarily

offer free access to full texts or other resources OAI-PMH provides an easy to implement low

barrier solution for Data Providers

Service Providers

use the OAI interfaces of the Data Providers to harvest and store metadata Note that this means

that there are no live search requests to the Data Providers rather services are based on the

harvested data via OAI-PMH Service Providers may select certain subsets from Data Providers

(eg by set hierarchy or date stamp) Service Providers offer (value-added) services on the basis

of the metadata harvested and they may enrich the harvested metadata in order to do so

34 How it works

Prerequisites to develop metadata harvesting protocol

To facilitate metadata harvesting there needs to be agreement on

o Transport protocol - HTTP or FTP or other such protocol

o Metadata format - Dublin Core or MARC or other such format

o Metadata Quality Assurance - mandatory element set naming and subject conventions etc

o Intellectual Property and Usage Rights - who can do what with what

The OAI-PMH gives a simple technical option for data providers to make their metadata

available to services based on the open standards HTTP (Hypertext Transport Protocol) and

XML (Extensible Markup Language) The metadata that is harvested may be in any format that

is agreed by a community (or by any discrete set of data and service providers) although

unqualified Dublin Core is specified to provide a basic level of interoperability Thus metadata

from many sources can be gathered together in one database and services can be provided based

on this centrally harvested or aggregated data The link between this metadata and the related

content is not defined by the OAI protocol It is important to realize that OAI-PMH does not

provide a search across this data it simply makes it possible to bring the data together in one

place In order to provide services the harvesting approach must be combined with other

mechanisms

35 Protocol details

Records

A record is the metadata of a resource in a specific format A record has three parts a header and

metadata both of which are mandatory and an optional about statement Each of these is made

up of various components as set out below

header (mandatory)

identifier (mandatory 1 only)

datestamp (mandatory 1 only)

setSpec elements (optional 0 1 or more)

status attribute for deleted item

metadata (mandatory)

XML encoded metadata with root tag namespace

repositories must support Dublin Core may support other formats

about (optional)

rights statements

provenance statements

Datestamps

A datestamp is the date of last modification of a metadata record Datestamp is a mandatory

characteristic of every item It has two possible levels of granularity

YYYY-MM-DD or YYYY-MM-DDThhmmssZ

The function of the datestamp is to provide information on metadata that enables selective

harvesting using from and until arguments Its applications are in incremental update

mechanisms It gives either the date of creation last modification or deletion Deletion is

covered with three support levels no persistent transient

Metadata schema

OAI-PMH supports dissemination of multiple metadata formats from a repository The

properties of metadata formats are

ndash id string to specify the format (metadataPrefix)

ndash metadata schema URL (XML schema to test validity)

ndash XML namespace URI (global identifier for metadata format)

Repositories must be able to disseminate unqualified Dublin Core Further arbitrary metadata

formats can be defined and transported via the OAI-PMH Any returned metadata must comply

with an XML namespace specification The Dublin Core Metadata Element Set contains 15

elements All elements are optional and all elements may be repeated

36 The Dublin Core Metadata Element Set

Title Contributor Source

Creator Date Language

Subject Type Relation

Description Format Coverage

Publisher Identifier Rights

Sets

Sets enable a logical partitioning of repositories They are optional archives do not have to

define Sets There are no recommendations for the implementation of Sets Sets are not

necessarily exhaustive of the content of a repository They are not necessarily strictly

hierarchical It is important and necessary to have negotiated agreements within communities

defining useful sets for the communities

function selective harvesting (set parameter)

applications subject gateways dissertation search engine and others

examples

o publication types (thesis article )

o document types (text audio image )

o content sets according to DNB (medicine biology )

37 Request format

Requests must be submitted using the GET or POST methods of HTTP and repositories must

support both methods At least one key=value pair verb=RequestType (where RequestType is

some type of request such as ListRecords) must be provided Additional key=value pairs depend

on the request type

example for GET request httparchiveorgoai

verb=ListRecordsampmetadataPrefix=oai_dc

The encoding of special characters must be supported for example (host port separator)

becomes 3A

38 Response

Responses are formatted as HTTP responses The content type must be textxml HTTP-based

status codes as distinguished from OAI-PMH errors such as 302 (redirect) and 503 (service not

available) may be returned Compression codes are optional in OAI-PMH only identity

encoding is mandatory The response format must be well-formed XML with markup as follows

1 XML declaration

(ltxml version=10 encoding=UTF-8 gt)

2 root element named OAI-PMH with three attributes

(xmlns xmlnsxsi xsischemaLocation)

3 three child elements

1 responseDate (UTC datetime)

2 request (the request that generated this response)

3 a) error (in case of an error or exception condition)

b) element with the name of the OAI-PMH request

3.9 OAI-PMH Verbs

Here 'verb' means the request type that the service provider (harvester) sends to get responses from data providers. There is a standard set of six verbs:

o Identify
o ListMetadataFormats
o ListSets
o GetRecord
o ListIdentifiers
o ListRecords

Verb                  Function
Identify              Description of the repository
ListMetadataFormats   Metadata formats supported by the repository
ListSets              Sets defined by the repository
ListIdentifiers       Retrieves the unique identifiers of the items
ListRecords           Used to harvest records from the repository
GetRecord             Retrieves an individual metadata record from the repository

A harvester is not required to use all request types. However, a repository must implement all of them. Each request type has its own required and optional arguments.
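
The required and optional arguments can be written down as a small lookup table. The sketch below is a simplified reading of OAI-PMH 2.0: it leaves out the flow-control argument resumptionToken, and the names VERB_ARGUMENTS and check_request are hypothetical.

```python
# verb -> (required arguments, optional arguments); resumptionToken is omitted.
VERB_ARGUMENTS = {
    "Identify": (set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}),
    "ListSets": (set(), set()),
    "GetRecord": ({"identifier", "metadataPrefix"}, set()),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}),
}


def check_request(verb, arguments):
    """Return a list of problems found in a request (empty if none)."""
    if verb not in VERB_ARGUMENTS:
        return ["unknown verb: " + verb]
    required, optional = VERB_ARGUMENTS[verb]
    given = set(arguments)
    problems = ["missing argument: " + name
                for name in sorted(required - given)]
    problems += ["unexpected argument: " + name
                 for name in sorted(given - required - optional)]
    return problems


print(check_request("ListRecords", {"metadataPrefix": "oai_dc",
                                    "from": "2010-01-01"}))
print(check_request("GetRecord", {"identifier": "oai:example.org:1"}))
print(check_request("GetRecords", {}))
```

A repository could run such a check before processing a request and report any problem as an OAI-PMH error rather than an HTTP error.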

4.0 DSpace: OAI compatible Digital Library Software

DSpace is open source software for building and managing digital repositories. Developed jointly by MIT Libraries and Hewlett-Packard (HP), it is freely available to research institutions as an open source system that can be customized and extended. DSpace is a digital institutional repository that captures, stores, indexes, preserves and redistributes content in digital formats. An institutional repository is a set of services that a research institution, organization or university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. Typically, DSpace has been deployed for institutional repositories of publications, theses and dissertations. Several groups are working on extending its capabilities, such as the implementation of ontologies in the search interface and the submission module, customization for the management of electronic theses and dissertations, and localization and internationalization of the package for the world's languages.

DSpace is compliant with OAI-PMH version 2.0, and metadata in DSpace digital libraries can be harvested.

4.1 DSpace Search System

The end user can browse, search and access the collections using the hierarchies and also the alphabetic bar menu. For searching the collection, DSpace uses the Lucene search engine, which is part of the Apache Jakarta Project (1). Additionally, research projects such as the ...(Portugal)... provide ontologies that enable context-based querying. These work like subject-based directory structures.

The Lucene search engine has very powerful search features that cover many of the end user's search approaches. It provides the basic 'exact term' or keyword search. In addition, it allows fielded search, akin to the field-level search of library databases. In DSpace, Dublin Core elements are used for the field names. Lucene also facilitates Boolean search, range searches, term boosting and proximity searches. An interesting search facility is Lucene's fuzzy search, which is based on the Levenshtein algorithm (5) and can match terms by similarity. This feature is especially useful when we hear a term and guess its spelling, and more so in the case of personal names.
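
The idea behind similarity matching can be illustrated with a plain Levenshtein (edit) distance. The sketch below is a textbook dynamic-programming version, not Lucene's actual implementation; the function names and the max_distance threshold are assumptions made for the example.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fuzzy_match(term, candidates, max_distance=2):
    """Return the candidates within max_distance edits of term."""
    return [c for c in candidates
            if levenshtein(term.lower(), c.lower()) <= max_distance]


print(levenshtein("kitten", "sitting"))  # 3
print(fuzzy_match("Smyth", ["Smith", "Schmidt", "Jones"], max_distance=1))
```

A misspelt personal name such as "Smyth" still finds "Smith", because the two differ by a single substitution.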

4.2 Metadata in DSpace

DSpace users deal with or come across metadata in the following modules:

o Administration modules: Dublin Core registry, administrative metadata, default values, mail alert to subscribers
o Submission modules: descriptive metadata
o Harvesting: OAI-PMH using the DC elements (unqualified)
o Search result display: brief and full metadata

4.3 Metadata harvesting in DSpace

DSpace is compliant with the OAI-PMH for exposing metadata. OAI-PMH allows repositories to expose a hierarchy of sets in which records may be placed. DSpace exposes collections as sets: each collection has a corresponding OAI set, and harvesters use the verb (OAI command) ListSets to discover the sets. Only the 15 basic Dublin Core elements are exposed at present.
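
Set discovery can be sketched by parsing a ListSets response. The sample XML below is hand-written: the repository URL, the setSpec values and the collection names are hypothetical and only meant to resemble what a repository might return.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written ListSets response (all values are illustrative assumptions).
SAMPLE_LIST_SETS = """<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2010-01-15T10:30:00Z</responseDate>
  <request verb="ListSets">http://repository.example.org/oai/request</request>
  <ListSets>
    <set>
      <setSpec>hdl_123456789_2</setSpec>
      <setName>Theses and Dissertations</setName>
    </set>
    <set>
      <setSpec>hdl_123456789_5</setSpec>
      <setName>Journal Articles</setName>
    </set>
  </ListSets>
</OAI-PMH>"""


def list_sets(xml_bytes):
    """Return (setSpec, setName) pairs from a ListSets response."""
    root = ET.fromstring(xml_bytes)
    sets = []
    for node in root.findall("oai:ListSets/oai:set", NS):
        spec = node.findtext("oai:setSpec", namespaces=NS)
        name = node.findtext("oai:setName", namespaces=NS)
        sets.append((spec, name))
    return sets


for spec, name in list_sets(SAMPLE_LIST_SETS.encode("utf-8")):
    print(spec, "-", name)
```

A harvester could then pass one of the setSpec values as the set argument of ListRecords to harvest a single collection.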

5.0 OAI Harvester Software

o Arc (http://arc.cs.odu.edu)
o Citebase (http://citebase.eprints.org/cgi-bin/search)
o CYCLADES (http://www.ercim.org/cyclades)
o DP9 (http://arc.cs.odu.edu:8080/dp9/index.jsp)
o MeIND (http://www.meind.de)
o METALIS (http://metalis.cilea.it)
o myOAI (http://www.myoai.com)
o NCSTRL (http://www.ncstrl.org)
o Perseus (http://www.perseus.tufts.edu/cgi-bin/vor)
o Public Knowledge Project: Open Archives Harvester (http://pkp.ubc.ca/harvester)
o OAICat (http://www.oclc.org/research/software/oai/cat.htm)
o OAI Repository Explorer (http://re.cs.uct.ac.za)
o OAIster (http://oaister.umdl.umich.edu/o/oaister)
o OASIC (Open Archives en SIC) (http://oasic.ccsd.cnrs.fr)
o OAIHarvester (http://www.oclc.org/research/software/oai/harvester.htm)
o DLESE OAI Software (http://dlese.org/oai/index.jsp)

6.0 Future Prospects

Some more work has to be done to make OAI-PMH a complete, globally accepted metadata harvesting protocol:

o Tools and software have to be developed by which repositories that are not OAI-PMH compliant can be made compliant, so that they can act as data providers.
o Higher versions of the protocol should be made compatible with the lower ones.

At the metadata creation level, some standardization is required, as a particular resource is described inconsistently at different repositories. Vocabulary control measures should also be taken care of.

Some more improvements are still awaited in the OAI-PMH protocol; only then can we ensure a comprehensive view of the resources available on a particular subject for our end users.

7.0 Conclusion

Much promise is seen for the use of the protocol within an open archives approach. Support for a new pattern of scholarly communication is the most publicized potential benefit. Perhaps most readily achievable are the goals of surfacing hidden resources and low-cost interoperability. Although the OAI-PMH is technically very simple, building coherent services that meet user requirements remains complex. The OAI-PMH could become part of the infrastructure of the Web, as taken for granted as HTTP now is, if a combination of its relative simplicity and proven success by early implementers in a service context leads to widespread uptake by research organizations, publishers and archives.

REFERENCES

1. http://www.openarchives.org

2. Breeding, M. (2002, April). The Emergence of the Open Archives Initiative: This Protocol could become a key part of the digital library infrastructure. Information Today, from http://www.findarticles.com/cf_0/m3336/4_19/85251474/p1/article.jhtml

3. Breeding, M. (2002). Understanding the Protocol for Metadata Harvesting of the Open Archives Initiative. Computers in Libraries, 22(8).

4. Lagoze, C. & Sompel, H. V. d. (2001, January). The Open Archives Initiative Protocol for Metadata Harvesting, from http://www.openarchives.org/OAI/openarchivesprotocol.htm

5. Lynch, C. A. (2001, August). Metadata Harvesting and the Open Archives Initiative. ARL Bimonthly Report 217, from http://www.arl.org/newsltr/217/mhp.html

6. Shearer, K. (2002, March). The Open Archives Initiative: Developing an Interoperability Framework for Scholarly Publishing. CARL/ABRC Background Series No. 5, from http://www.carl-abrc.ca/projects/scholarly/open_archives.PDF

7. Suleman, H. & Fox, E. A. (2001, December). A Framework for Building Open Digital Libraries. D-Lib Magazine, 7(12), from http://www.dlib.org/dlib/december01/suleman/12suleman.html

8. Sompel, H. V. d. & Lagoze, C. (2000, February). The Santa Fe Convention of the Open Archives Initiative. D-Lib Magazine, 6(2), from http://www.dlib.org/dlib/february00/vandesompel-oai/02vandesompel-oai.html

9. Warner, S. (2001, June). Exposing and Harvesting Metadata Using the OAI Metadata Harvesting Protocol: A Tutorial. HEP Libraries Webzine, Issue 4, from http://library.cern.ch/HEPLW/4/papers/3

11. http://www.ukoln.ac.uk/repositories/digirep/index/FAQs

12. Michael Shepherd (2003). Interoperability for Digital Libraries. DRTC Workshop on Semantic Web, 8th to 10th December 2003, DRTC, Bangalore.

13. http://www.openarchives.org/Register/BrowseSites

14. http://www.openarchives.org/service/listproviders.html

37 Request format

Requests must be submitted using the GET or POST methods of HTTP and repositories must

support both methods At least one key=value pair verb=RequestType (where RequestType is

some type of request such as ListRecords) must be provided Additional key=value pairs depend

on the request type

example for GET request httparchiveorgoai

verb=ListRecordsampmetadataPrefix=oai_dc

The encoding of special characters must be supported for example (host port separator)

becomes 3A

38 Response

Responses are formatted as HTTP responses The content type must be textxml HTTP-based

status codes as distinguished from OAI-PMH errors such as 302 (redirect) and 503 (service not

available) may be returned Compression codes are optional in OAI-PMH only identity

encoding is mandatory The response format must be well-formed XML with markup as follows

1 XML declaration

(ltxml version=10 encoding=UTF-8 gt)

2 root element named OAI-PMH with three attributes

(xmlns xmlnsxsi xsischemaLocation)

3 three child elements

1 responseDate (UTC datetime)

2 request (the request that generated this response)

3 a) error (in case of an error or exception condition)

b) element with the name of the OAI-PMH request

39 OAI-

PMH

Verbs

Here lsquoverbrsquo

means

request type which the service providerharvester sends to get responses from data providers There is

a standard set of 6 verbs

o Identify

o ListMetadataFormats

o ListSets

o GetRecord

o ListIdentifiers

o ListRecords

Function

Identify Description of repository

ListMetadataFormats Metadata format supported by the repository

ListSets Sets defined by repository

ListIdentifiers Retrieves unique identifiers of the item

ListRecords Used to harvest records from the repository

GetRecords Retrieves individual metadata record from the

repository

A harvester is not required to use all types However a repository must implement all types

There are required and optional arguments depending on request types

40 Dspace OAI compatible Digital Library Software

DSpace is open source software for building and managing Digital repositories Developed jointly by

MIT Libraries and Hewlett-Packard (HP) is freely available to research institutions as an open

source system that can be customized and extended DSpace is a digital institutional repository that

captures stores indexes preserves and redistributes content in digital formats Institutional

Repository is a set of services that a research institution organization university offers to the

members of its community for the management and dissemination of digital

materials created by the institution and its community members Typically DSpace has been

deployed for Institutional Repositories of publications thesis and dissertations There are several

groups working on extending its capabilities such implementation of ontologies in search interface

and for submission module customization for management of electronic theses and dissertations and

for localization and international of the package for the world languages

Dspace is compliant with OAI-PMH ver 20 and metadata in Dspace digital libraries can be

harvested

41 DSpace Search System

The end user can browse search and access the collections using the hierarchies and also the

alphabetic bar menu For searching the collection Dspace uses Lucene Search Engine which is a

part of Apache Jakarta Project (1) Additionally research projects such as the hellip(Portugal)hellip

provides Ontologies that enables context based querying This work like subject based directory

structures

Lucene search engine has very powerful search features that encompass many search approaches of

the end-user It provides the basic lsquoexact termrsquo or keyword search In addition it allows fielded search

akin the field level search of library databases In Dspace Dublin Core elements are used for the field

names Lucene also facilitates Boolean search range searches term boosting and proximity searches

The interesting search facility lucene uses fuzzy logic that is based on the Levenstienrsquos alogorithm

(5) that can replace and match terms by similarity This feature is especially useful in instances where

we hear a term and guess it spellings and more so in the case of personal names

42 Metadata in Dspace

DSpace users deal withcome across metadata in the following modules

1048707 Administration modules Dublin core registry administrative metadata- default values mail

alert to subscribers

1048707 Submission modules descriptive metadata

1048707 Harvesting ndash OAI-PMH using the DC elements (unqualified)

1048707 Search result display brief and full metadata

43 Metadata harvesting in Dspace

Dspace is compliant with the OAI-PMH for exposing metadata OAI-PMH allows repositories to

expose an hierarchy of sets in which records may be placed DSpace exposes collections as sets

Each collection has a corresponding OAI set and harvestors use a verb (OAI- command) ListSets to

discover the sets Only the 15 basic Dublin Core elements is exposed at present

50 OAI Harvester Software

o Arc (httparccsoduedu)

o Citebase (httpcitebaseeprintsorgcgi-binsearch)

o CYCLADES (httpwwwercimorgcyclades)

o DP9 (httparccsoduedu8080dp9indexjsp)

o MeIND (httpwwwmeindde)

o METALIS (httpmetaliscileait)

o myOAI (httpwwwmyoaicom)

o NCSTRL (httpwwwncstrlorg)

o Purseus (httpwwwperseustuftseducgi-binvor)

o Public Knowledge Project ndash Open Archives Harvester (httppkpubccaharvester)

o OAICAT (httpwwwoclcorgresearchsoftwareoaicathtm)

o OAI Repository Explorer (httprecsuctacza)

o OAIster (httpoaisterumdlumicheduooaister)

o OASIC (Open Archvies en SIC) (httpoasicccsdcnrsfr)

o OAIHarvester (httpwwwoclcorgresearchsoftwareoaiharvesterhtm)

o DLESE OAI Software (httpdleseorgoaiindexjsp)

60 Future Prospects

Some more work has to be done in order to make OAI-PMH as a complete globally accepted

metadata harvesting protocol

o Tools and software has to be developed by which the non-OAI-PMH compliant repositories

can be converted into OAI-PMH compliant so that the repository can be made data provider

o The higher versions of the protocol should be made compatible of the lower ones

At metadata creation level some standardization is required as a particular resource is described

inconsistently at different repositories Vocabulary control measures should be also taken care of

Still some more improvements are awaited in OAI-PMH protocol and then only we can ensure

a comprehensive view of the resources available on a particular subject to our end-users

7.0 Conclusion

Much promise is seen for the use of the protocol within an open archives approach. Support for a new pattern of scholarly communication is the most publicized potential benefit. Perhaps most readily achievable are the goals of surfacing hidden resources and low-cost interoperability. Although OAI-PMH is technically very simple, building coherent services that meet user requirements remains complex. OAI-PMH could become part of the infrastructure of the Web, as taken for granted as HTTP now is, if a combination of its relative simplicity and proven success by early implementers in a service context leads to widespread uptake by research organizations, publishers and archives.

REFERENCES

1. http://www.openarchives.org
2. Breeding, M. (2002, April). The Emergence of the Open Archives Initiative: This Protocol could become a key part of the digital library infrastructure. Information Today, from http://www.findarticles.com/cf_0/m3336/4_19/85251474/p1/article.jhtml
3. Breeding, M. (2002). Understanding the Protocol for Metadata Harvesting of the Open Archives Initiative. Computers in Libraries, 22(8).
4. Lagoze, C., & Sompel, H. V. d. (2001, January). The Open Archives Initiative Protocol for Metadata Harvesting, from http://www.openarchives.org/OAI/openarchivesprotocol.htm
5. Lynch, C. A. (2001, August). Metadata Harvesting and the Open Archives Initiative. ARL Bimonthly Report, 217, from http://www.arl.org/newsltr/217/mhp.html
6. Shearer, K. (2002, March). The Open Archives Initiative: Developing an Interoperability Framework for Scholarly Publishing. CARL/ABRC Background Series, No. 5, from http://www.carl-abrc.ca/projects/scholarly/open_archives.PDF
7. Suleman, H., & Fox, E. A. (2001, December). A Framework for Building Open Digital Libraries. D-Lib Magazine, 7(12), from http://www.dlib.org/dlib/december01/suleman/12suleman.html
8. Sompel, H. V. d., & Lagoze, C. (2000, February). The Santa Fe Convention of the Open Archives Initiative. D-Lib Magazine, 6(2), from http://www.dlib.org/dlib/february00/vandesompel-oai/02vandesompel-oai.html
9. Warner, S. (2001, June). Exposing and Harvesting Metadata Using the OAI Metadata Harvesting Protocol: A Tutorial. HEP Libraries Webzine, Issue 4, from http://library.cern.ch/HEPLW/4/papers/3
11. http://www.ukoln.ac.uk/repositories/digirep/index/FAQs
12. Michael Shepherd (2003). Interoperability for Digital Libraries. DRTC Workshop on Semantic Web, 8th - 10th December 2003, DRTC, Bangalore.
13. http://www.openarchives.org/Register/BrowseSites
14. http://www.openarchives.org/service/listproviders.html


metadata (mandatory)
o XML-encoded metadata with root tag and namespace
o repositories must support Dublin Core and may support other formats

about (optional)
o rights statements
o provenance statements

Datestamps

A datestamp is the date of last modification of a metadata record. It is a mandatory characteristic of every item and has two possible levels of granularity: YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ.

The function of the datestamp is to provide information on metadata that enables selective harvesting using the from and until arguments. Its main application is in incremental update mechanisms. It gives the date of creation, last modification or deletion. Deletion is covered with three support levels: no, persistent and transient.
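The incremental update idea can be sketched as follows: a harvester remembers when it last harvested and turns that moment into from and until arguments in one of the two granularities. The function names are illustrative only, and the sketch assumes timezone-aware datetime objects.

```python
from datetime import datetime, timezone

# The two datestamp granularities, written as strftime patterns.
DAY = "%Y-%m-%d"
SECONDS = "%Y-%m-%dT%H:%M:%SZ"


def format_datestamp(moment, granularity=DAY):
    """Render a timezone-aware datetime as a UTC datestamp."""
    return moment.astimezone(timezone.utc).strftime(granularity)


def selective_args(last_harvest, now, granularity=DAY):
    """Build the from/until arguments for an incremental harvest."""
    return {
        "from": format_datestamp(last_harvest, granularity),
        "until": format_datestamp(now, granularity),
    }
```

Both arguments should use the same granularity, and a harvester would normally pick a granularity that the repository reports as supported.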

Metadata schema

OAI-PMH supports dissemination of multiple metadata formats from a repository. The properties of a metadata format are:

- an id string to specify the format (metadataPrefix)
- a metadata schema URL (XML schema to test validity)
- an XML namespace URI (global identifier for the metadata format)

Repositories must be able to disseminate unqualified Dublin Core. Further arbitrary metadata formats can be defined and transported via OAI-PMH. Any returned metadata must comply with an XML namespace specification. The Dublin Core Metadata Element Set contains 15 elements. All elements are optional, and all elements may be repeated.
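The three properties of a metadata format map naturally onto a small record type. The sketch below is one possible representation, not part of the protocol; the class and function names are hypothetical, while the oai_dc prefix, schema URL and namespace URI are the values used for unqualified Dublin Core in OAI-PMH 2.0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataFormat:
    metadata_prefix: str  # id string used in requests
    schema_url: str       # XML schema used to test validity
    namespace_uri: str    # global identifier of the format


# Unqualified Dublin Core as disseminated through OAI-PMH 2.0.
OAI_DC = MetadataFormat(
    metadata_prefix="oai_dc",
    schema_url="http://www.openarchives.org/OAI/2.0/oai_dc.xsd",
    namespace_uri="http://www.openarchives.org/OAI/2.0/oai_dc/",
)


def offers_dublin_core(formats):
    """True if the list of formats includes the mandatory oai_dc format."""
    return any(f.metadata_prefix == "oai_dc" for f in formats)
```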

3.6 The Dublin Core Metadata Element Set

Title        Contributor  Source
Creator      Date         Language
Subject      Type         Relation
Description  Format       Coverage
Publisher    Identifier   Rights
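Because every element is optional and repeatable, a record can be modelled as a mapping from element name to a list of values. The following sketch, with hypothetical names, flags any field that is not one of the 15 elements listed above.

```python
# The 15 elements of the Dublin Core Metadata Element Set, in lower case.
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def unknown_elements(record):
    """Return the sorted field names in a record that are not Dublin Core elements.

    `record` maps an element name to a list of values, since elements may repeat.
    """
    return sorted(name for name in record if name.lower() not in DC_ELEMENTS)
```

For example, a record with two creator values and a stray pages field would report only pages as unknown.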

Sets

Sets enable a logical partitioning of repositories. They are optional: archives do not have to define sets, and there are no recommendations for their implementation. Sets are not necessarily exhaustive of the content of a repository, nor are they necessarily strictly hierarchical. It is important for communities to negotiate agreements that define the sets useful to them.

Function: selective harvesting (set parameter)

Applications: subject gateways, dissertation search engines and others

Examples:
o publication types (thesis, article, ...)
o document types (text, audio, image, ...)
o content sets according to DNB (medicine, biology, ...)
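Selective harvesting by set can be sketched as a membership test. In OAI-PMH 2.0 a setSpec may be a colon-separated path, so a record in pubtype:thesis also belongs to pubtype. The set names below are invented for the example.

```python
def in_set(record_set_specs, requested):
    """True if any setSpec of the record equals the requested set or lies below it."""
    prefix = requested + ":"
    return any(
        spec == requested or spec.startswith(prefix)
        for spec in record_set_specs
    )


def select_by_set(records, requested):
    """Keep the identifiers of records that belong to the requested set.

    `records` maps a record identifier to the list of its setSpecs.
    """
    return [ident for ident, specs in records.items() if in_set(specs, requested)]
```

Because a record may carry several setSpecs, the same record can be selected through more than one set, which matches the remark that sets need not be strictly hierarchical.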

3.7 Request format

Requests must be submitted using the GET or POST method of HTTP, and repositories must support both methods. At least one key=value pair, verb=RequestType (where RequestType is some type of request such as ListRecords), must be provided. Additional key=value pairs depend on the request type.

Example of a GET request:
http://archive.org/oai?verb=ListRecords&metadataPrefix=oai_dc

The encoding of special characters must be supported; for example, ":" (the host port separator) becomes "%3A".
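A minimal sketch of how a harvester might assemble such a GET request with Python's standard library is given below. The helper name build_request_url is hypothetical; urlencode takes care of the special-character encoding described above.

```python
from urllib.parse import quote, urlencode


def build_request_url(base_url, verb, **arguments):
    """Return a GET URL consisting of the base URL, the verb and any extra arguments."""
    params = {"verb": verb}
    params.update(arguments)
    # urlencode percent-encodes reserved characters such as ":" and "/".
    return base_url + "?" + urlencode(params)


# The example request from the text.
example = build_request_url("http://archive.org/oai", "ListRecords", metadataPrefix="oai_dc")

# A single special character encoded on its own.
encoded_colon = quote(":", safe="")
```

Arguments whose names clash with Python keywords, such as from, can be passed by unpacking a dictionary, for instance build_request_url(base, "ListRecords", **{"from": "2010-01-01", "metadataPrefix": "oai_dc"}).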

3.8 Response

Responses are formatted as HTTP responses. The content type must be text/xml. HTTP-based status codes, as distinguished from OAI-PMH errors, such as 302 (redirect) and 503 (service not available), may be returned. Compression encodings are optional in OAI-PMH; only identity encoding is mandatory. The response must be well-formed XML with markup as follows:

1. XML declaration
   (<?xml version="1.0" encoding="UTF-8" ?>)
2. root element named OAI-PMH with three attributes
   (xmlns, xmlns:xsi, xsi:schemaLocation)
3. three child elements:
   1. responseDate (UTC datetime)
   2. request (the request that generated this response)
   3. a) error (in case of an error or exception condition), or
      b) an element with the name of the OAI-PMH request
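The response layout above can be checked with a few lines of parsing code. The sketch below reads the envelope and reports either the error codes or the name of the payload element. The two sample responses are invented and trimmed (schema attributes and the XML declaration are left out for brevity); badVerb is one of the error codes defined by OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET

# ElementTree prefixes tags with the namespace in braces.
OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical, shortened responses used only for illustration.
OK_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2010-01-01T00:00:00Z</responseDate>
  <request verb="Identify">http://repository.example.org/oai</request>
  <Identify><repositoryName>Example Repository</repositoryName></Identify>
</OAI-PMH>"""

ERROR_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2010-01-01T00:00:00Z</responseDate>
  <request>http://repository.example.org/oai</request>
  <error code="badVerb">Illegal verb</error>
</OAI-PMH>"""


def read_envelope(xml_text):
    """Return responseDate, request and either the error codes or the payload name."""
    root = ET.fromstring(xml_text)
    if root.tag != OAI + "OAI-PMH":
        raise ValueError("root element is not OAI-PMH")
    result = {
        "responseDate": root.findtext(OAI + "responseDate"),
        "request": root.findtext(OAI + "request"),
    }
    errors = root.findall(OAI + "error")
    if errors:
        result["errors"] = [e.get("code") for e in errors]
    else:
        fixed = (OAI + "responseDate", OAI + "request")
        payload = [child for child in root if child.tag not in fixed]
        result["payload"] = payload[0].tag[len(OAI):]
    return result
```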

3.9 OAI-PMH Verbs

Here 'verb' means the request type that the service provider (harvester) sends to get responses from data providers. There is a standard set of 6 verbs:

o Identify
o ListMetadataFormats
o ListSets
o GetRecord
o ListIdentifiers
o ListRecords

Verb                 Function
Identify             Description of the repository
ListMetadataFormats  Metadata formats supported by the repository
ListSets             Sets defined by the repository
ListIdentifiers      Retrieves the unique identifiers of items
ListRecords          Harvests records from the repository
GetRecord            Retrieves an individual metadata record from the repository

A harvester is not required to use all request types. However, a repository must implement all of them. There are required and optional arguments, depending on the request type.
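The dependence of arguments on the request type can be made concrete with a small lookup table. The sketch below follows the argument lists of OAI-PMH 2.0 but leaves out the resumptionToken argument for brevity; the function name and the wording of the messages are illustrative only.

```python
# verb -> (required arguments, optional arguments); resumptionToken omitted.
VERB_ARGUMENTS = {
    "Identify": (set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}),
    "ListSets": (set(), set()),
    "GetRecord": ({"identifier", "metadataPrefix"}, set()),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}),
}


def check_request(verb, arguments):
    """Return a list of problems; an empty list means the request is well formed."""
    if verb not in VERB_ARGUMENTS:
        return ["badVerb: " + verb]
    required, optional = VERB_ARGUMENTS[verb]
    given = set(arguments)
    problems = []
    for name in sorted(required - given):
        problems.append("missing argument: " + name)
    for name in sorted(given - required - optional):
        problems.append("unexpected argument: " + name)
    return problems
```

A repository would apply a check of this kind before answering, while a harvester could use it to avoid sending requests that are bound to fail.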

4.0 DSpace: OAI-compatible Digital Library Software

DSpace is open source software for building and managing digital repositories. Developed jointly by MIT Libraries and Hewlett-Packard (HP), it is freely available to research institutions as an open source system that can be customized and extended. DSpace is a digital institutional repository that captures, stores, indexes, preserves and redistributes content in digital formats. An institutional repository is a set of services that a research institution, organization or university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. Typically, DSpace has been deployed for institutional repositories of publications, theses and dissertations. Several groups are working on extending its capabilities, such as implementing ontologies in the search interface and the submission module, customizing it for the management of electronic theses and dissertations, and localizing and internationalizing the package for the world's languages.

DSpace is compliant with OAI-PMH version 2.0, and metadata in DSpace digital libraries can be harvested.

4.1 DSpace Search System

The end user can browse, search and access the collections using the hierarchies and also the alphabetic bar menu. For searching the collection, DSpace uses the Lucene search engine, which is part of the Apache Jakarta Project (1). Additionally, research projects such as the ...(Portugal)... provide ontologies that enable context-based querying. These work like subject-based directory structures.

Lucene has very powerful search features that cover many of the end user's search approaches. It provides the basic 'exact term' or keyword search. In addition, it allows fielded search, akin to the field-level search of library databases. In DSpace, Dublin Core elements are used for the field names. Lucene also facilitates Boolean searches, range searches, term boosting and proximity searches. An interesting facility is fuzzy matching based on the Levenshtein (edit distance) algorithm (5), which can match terms by similarity. This feature is especially useful when we hear a term and guess its spelling, and more so in the case of personal names.
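To show what matching by similarity means, here is a plain dynamic-programming version of the Levenshtein distance together with a small fuzzy filter. It is a teaching sketch only and does not reproduce Lucene's own implementation; the function names and the default threshold are assumptions.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(term, candidates, max_distance=2):
    """Return the candidates within max_distance edits of the term, ignoring case."""
    return [
        c for c in candidates
        if levenshtein(term.lower(), c.lower()) <= max_distance
    ]
```

With a threshold of one edit, a guessed spelling such as Jonson would still find the personal name Johnson.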

4.2 Metadata in DSpace

DSpace users deal with or come across metadata in the following modules:

o Administration modules: Dublin Core registry, administrative metadata (default values), mail alerts to subscribers
o Submission modules: descriptive metadata
o Harvesting: OAI-PMH using the unqualified DC elements
o Search result display: brief and full metadata

Page 9: Open Archives Initiatives For Metadata   Harvesting

example for GET request httparchiveorgoai

verb=ListRecordsampmetadataPrefix=oai_dc

The encoding of special characters must be supported for example (host port separator)

becomes 3A

38 Response

Responses are formatted as HTTP responses The content type must be textxml HTTP-based

status codes as distinguished from OAI-PMH errors such as 302 (redirect) and 503 (service not

available) may be returned Compression codes are optional in OAI-PMH only identity

encoding is mandatory The response format must be well-formed XML with markup as follows

1 XML declaration

(ltxml version=10 encoding=UTF-8 gt)

2 root element named OAI-PMH with three attributes

(xmlns xmlnsxsi xsischemaLocation)

3 three child elements

1 responseDate (UTC datetime)

2 request (the request that generated this response)

3 a) error (in case of an error or exception condition)

b) element with the name of the OAI-PMH request

39 OAI-

PMH

Verbs

Here lsquoverbrsquo

means

request type which the service providerharvester sends to get responses from data providers There is

a standard set of 6 verbs

o Identify

o ListMetadataFormats

o ListSets

o GetRecord

o ListIdentifiers

o ListRecords

Function

Identify Description of repository

ListMetadataFormats Metadata format supported by the repository

ListSets Sets defined by repository

ListIdentifiers Retrieves unique identifiers of the item

ListRecords Used to harvest records from the repository

GetRecords Retrieves individual metadata record from the

repository

A harvester is not required to use all types However a repository must implement all types

There are required and optional arguments depending on request types

40 Dspace OAI compatible Digital Library Software

DSpace is open source software for building and managing Digital repositories Developed jointly by

MIT Libraries and Hewlett-Packard (HP) is freely available to research institutions as an open

source system that can be customized and extended DSpace is a digital institutional repository that

captures stores indexes preserves and redistributes content in digital formats Institutional

Repository is a set of services that a research institution organization university offers to the

members of its community for the management and dissemination of digital

materials created by the institution and its community members Typically DSpace has been

deployed for Institutional Repositories of publications thesis and dissertations There are several

groups working on extending its capabilities such implementation of ontologies in search interface

and for submission module customization for management of electronic theses and dissertations and

for localization and international of the package for the world languages

Dspace is compliant with OAI-PMH ver 20 and metadata in Dspace digital libraries can be

harvested

41 DSpace Search System

The end user can browse search and access the collections using the hierarchies and also the

alphabetic bar menu For searching the collection Dspace uses Lucene Search Engine which is a

part of Apache Jakarta Project (1) Additionally research projects such as the hellip(Portugal)hellip

provides Ontologies that enables context based querying This work like subject based directory

structures

Lucene search engine has very powerful search features that encompass many search approaches of

the end-user It provides the basic lsquoexact termrsquo or keyword search In addition it allows fielded search

akin the field level search of library databases In Dspace Dublin Core elements are used for the field

names Lucene also facilitates Boolean search range searches term boosting and proximity searches

The interesting search facility lucene uses fuzzy logic that is based on the Levenstienrsquos alogorithm

(5) that can replace and match terms by similarity This feature is especially useful in instances where

we hear a term and guess it spellings and more so in the case of personal names

42 Metadata in Dspace

DSpace users deal withcome across metadata in the following modules

1048707 Administration modules Dublin core registry administrative metadata- default values mail

alert to subscribers

1048707 Submission modules descriptive metadata

1048707 Harvesting ndash OAI-PMH using the DC elements (unqualified)

1048707 Search result display brief and full metadata

43 Metadata harvesting in Dspace

Dspace is compliant with the OAI-PMH for exposing metadata OAI-PMH allows repositories to

expose an hierarchy of sets in which records may be placed DSpace exposes collections as sets

Each collection has a corresponding OAI set and harvestors use a verb (OAI- command) ListSets to

discover the sets Only the 15 basic Dublin Core elements is exposed at present

50 OAI Harvester Software

o Arc (httparccsoduedu)

o Citebase (httpcitebaseeprintsorgcgi-binsearch)

o CYCLADES (httpwwwercimorgcyclades)

o DP9 (httparccsoduedu8080dp9indexjsp)

o MeIND (httpwwwmeindde)

o METALIS (httpmetaliscileait)

o myOAI (httpwwwmyoaicom)

o NCSTRL (httpwwwncstrlorg)

o Purseus (httpwwwperseustuftseducgi-binvor)

o Public Knowledge Project ndash Open Archives Harvester (httppkpubccaharvester)

o OAICAT (httpwwwoclcorgresearchsoftwareoaicathtm)

o OAI Repository Explorer (httprecsuctacza)

o OAIster (httpoaisterumdlumicheduooaister)

o OASIC (Open Archvies en SIC) (httpoasicccsdcnrsfr)

o OAIHarvester (httpwwwoclcorgresearchsoftwareoaiharvesterhtm)

o DLESE OAI Software (httpdleseorgoaiindexjsp)

60 Future Prospects

Some more work has to be done in order to make OAI-PMH as a complete globally accepted

metadata harvesting protocol



3.9 OAI-PMH Verbs

Here a 'verb' means the request type that the service provider (harvester) sends to obtain a response from a data provider. There is a standard set of six verbs:

o Identify
o ListMetadataFormats
o ListSets
o GetRecord
o ListIdentifiers
o ListRecords

Verb                  Function
Identify              Description of the repository
ListMetadataFormats   Metadata formats supported by the repository
ListSets              Sets defined by the repository
ListIdentifiers       Retrieves the unique identifiers (record headers) of the items
ListRecords           Used to harvest records from the repository
GetRecord             Retrieves an individual metadata record from the repository

A harvester is not required to use all request types. However, a repository must implement all of them. Each request type has its own required and optional arguments.
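The relationship between verbs and their arguments can be sketched in a few lines of Python. This is only an illustrative sketch, not part of any harvester mentioned in this paper: the function name build_request and the base URL are hypothetical, while the argument names (identifier, metadataPrefix, from, until, set, resumptionToken) follow the OAI-PMH 2.0 specification.

```python
from urllib.parse import urlencode

# verb -> (required arguments, optional arguments), following OAI-PMH 2.0
VERB_ARGS = {
    "Identify": (set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}),
    "ListSets": (set(), set()),
    "GetRecord": ({"identifier", "metadataPrefix"}, set()),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}),
}

# List verbs may return partial responses that are continued with a
# resumptionToken, which must then be the only argument besides the verb.
RESUMABLE = {"ListSets", "ListIdentifiers", "ListRecords"}


def build_request(base_url, verb, args=None):
    """Return the GET URL for an OAI-PMH request, validating its arguments."""
    args = dict(args or {})
    if verb not in VERB_ARGS:
        raise ValueError(f"unknown verb: {verb}")
    if "resumptionToken" in args:
        if verb not in RESUMABLE or len(args) != 1:
            raise ValueError("resumptionToken must be the only argument of a list verb")
    else:
        required, optional = VERB_ARGS[verb]
        missing = required - set(args)
        unexpected = set(args) - required - optional
        if missing:
            raise ValueError(f"{verb} requires: {sorted(missing)}")
        if unexpected:
            raise ValueError(f"{verb} does not accept: {sorted(unexpected)}")
    return base_url + "?" + urlencode({"verb": verb, **args})


if __name__ == "__main__":
    base = "http://repository.example.org/oai/request"  # hypothetical base URL
    print(build_request(base, "Identify"))
    print(build_request(base, "ListRecords", {"metadataPrefix": "oai_dc"}))
```

A harvester built this way would reject, for example, a GetRecord request that lacks metadataPrefix before anything is sent to the data provider.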

4.0 DSpace: OAI-compatible Digital Library Software

DSpace is open source software for building and managing digital repositories. Developed jointly by MIT Libraries and Hewlett-Packard (HP), it is freely available to research institutions as an open source system that can be customized and extended. DSpace is a digital institutional repository that captures, stores, indexes, preserves and redistributes content in digital formats. An institutional repository is a set of services that a research institution, organization or university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. Typically, DSpace has been deployed for institutional repositories of publications, theses and dissertations. Several groups are working on extending its capabilities, such as implementing ontologies in the search interface and the submission module, customizing it for the management of electronic theses and dissertations, and localizing and internationalizing the package for the world's languages.

DSpace is compliant with OAI-PMH version 2.0, so the metadata in DSpace digital libraries can be harvested.

4.1 DSpace Search System

The end user can browse, search and access the collections using the hierarchies and also the alphabetic bar menu. For searching the collection, DSpace uses the Lucene search engine, which is a part of the Apache Jakarta Project (1). Additionally, research projects such as the …(Portugal)… provide ontologies that enable context-based querying. These work like subject-based directory structures.

The Lucene search engine has very powerful search features that cover many of the search approaches of the end user. It provides the basic 'exact term' or keyword search. In addition, it allows fielded search, akin to the field-level search of library databases. In DSpace, Dublin Core elements are used for the field names. Lucene also facilitates Boolean searches, range searches, term boosting and proximity searches. An interesting facility is Lucene's fuzzy search, which is based on the Levenshtein (edit distance) algorithm (5) and matches terms by similarity. This feature is especially useful when we have heard a term and must guess its spelling, and more so in the case of personal names.
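To make the idea of matching by similarity concrete, the following is a textbook dynamic-programming version of the Levenshtein distance, with a small helper that filters a list of names by that distance. It is a sketch for illustration only and is not Lucene's own implementation; the function names and the threshold of two edits are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(term, vocabulary, max_distance=2):
    """Return the vocabulary entries within max_distance edits of term, closest first."""
    scored = [(levenshtein(term.lower(), word.lower()), word) for word in vocabulary]
    return [word for distance, word in sorted(scored) if distance <= max_distance]


if __name__ == "__main__":
    # A misspelt personal name still finds the intended author.
    print(fuzzy_matches("Brieding", ["Breeding", "Lynch", "Warner"]))
```

A misspelling such as "Brieding" is one substitution away from "Breeding", so it is matched, while unrelated names fall outside the threshold.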

4.2 Metadata in DSpace

DSpace users deal with, or come across, metadata in the following modules:

o Administration modules: Dublin Core registry; administrative metadata (default values); mail alerts to subscribers
o Submission modules: descriptive metadata
o Harvesting: OAI-PMH, using the (unqualified) DC elements
o Search result display: brief and full metadata

4.3 Metadata Harvesting in DSpace

DSpace is compliant with the OAI-PMH for exposing metadata. OAI-PMH allows repositories to expose a hierarchy of sets in which records may be placed. DSpace exposes collections as sets: each collection has a corresponding OAI set, and harvesters use the ListSets verb (an OAI command) to discover the sets. Only the 15 basic Dublin Core elements are exposed at present.
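As an illustration of set discovery, the sketch below parses a ListSets response with Python's standard library. The sample response, its base URL and its setSpec and setName values are made up for this example and do not come from a real DSpace installation; only the OAI-PMH 2.0 namespace and element names are taken from the protocol.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical ListSets response; a real one would be fetched over HTTP.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2010-01-01T00:00:00Z</responseDate>
  <request verb="ListSets">http://repository.example.org/oai/request</request>
  <ListSets>
    <set><setSpec>collection_1</setSpec><setName>Theses and Dissertations</setName></set>
    <set><setSpec>collection_2</setSpec><setName>Research Publications</setName></set>
  </ListSets>
</OAI-PMH>
"""


def parse_sets(xml_text):
    """Return (setSpec, setName) pairs from a ListSets response."""
    root = ET.fromstring(xml_text)
    sets = []
    for node in root.findall("oai:ListSets/oai:set", OAI_NS):
        spec = node.findtext("oai:setSpec", namespaces=OAI_NS)
        name = node.findtext("oai:setName", namespaces=OAI_NS)
        sets.append((spec, name))
    return sets


if __name__ == "__main__":
    for spec, name in parse_sets(SAMPLE_RESPONSE):
        print(spec, "->", name)
```

A harvester could then pass a chosen setSpec as the set argument of ListRecords to harvest one collection selectively.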

5.0 OAI Harvester Software

o Arc (http://arc.cs.odu.edu)
o Citebase (http://citebase.eprints.org/cgi-bin/search)
o CYCLADES (http://www.ercim.org/cyclades)
o DP9 (http://arc.cs.odu.edu:8080/dp9/index.jsp)
o MeIND (http://www.meind.de)
o METALIS (http://metalis.cilea.it)
o myOAI (http://www.myoai.com)
o NCSTRL (http://www.ncstrl.org)
o Perseus (http://www.perseus.tufts.edu/cgi-bin/vor)
o Public Knowledge Project - Open Archives Harvester (http://pkp.ubc.ca/harvester)
o OAICat (http://www.oclc.org/research/software/oai/cat.htm)
o OAI Repository Explorer (http://re.cs.uct.ac.za)
o OAIster (http://oaister.umdl.umich.edu/o/oaister)
o OASIC (Open Archives en SIC) (http://oasic.ccsd.cnrs.fr)
o OAIHarvester (http://www.oclc.org/research/software/oai/harvester.htm)
o DLESE OAI Software (http://dlese.org/oai/index.jsp)

6.0 Future Prospects

Some more work has to be done to make OAI-PMH a complete, globally accepted metadata harvesting protocol:

o Tools and software have to be developed by which repositories that are not OAI-PMH compliant can be made compliant, so that they can act as data providers.
o Higher versions of the protocol should be made compatible with the lower ones.

At the metadata creation level, some standardization is required, as a particular resource is described inconsistently at different repositories. Vocabulary control measures should also be taken care of. Some further improvements to the OAI-PMH protocol are still awaited; only then can we ensure a comprehensive view of the resources available on a particular subject for our end users.
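One small piece of the conversion tooling called for in the first point above is a mapping from a repository's local records to unqualified Dublin Core (oai_dc), the format every OAI-PMH data provider must support. The sketch below shows one way this might look; the local field names in FIELD_MAP and the sample record are hypothetical, and a real tool would also have to handle identifiers, datestamps and the six verbs.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

# The 15 elements of unqualified Dublin Core
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

# Hypothetical local field names mapped to Dublin Core elements
FIELD_MAP = {"author": "creator", "keywords": "subject",
             "abstract": "description", "year": "date"}


def to_oai_dc(local_record):
    """Serialize a local record (a dict) as an oai_dc XML string."""
    ET.register_namespace("oai_dc", OAI_DC)
    ET.register_namespace("dc", DC)
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for field, value in local_record.items():
        element = FIELD_MAP.get(field, field)
        if element not in DC_ELEMENTS:
            continue  # cannot be expressed in unqualified DC, so it is skipped
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:  # repeatable elements: one XML element per value
            ET.SubElement(root, f"{{{DC}}}{element}").text = str(item)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    record = {"title": "A Sample Thesis", "author": ["A. Author", "B. Author"],
              "year": 2010, "shelf_mark": "X-42"}  # hypothetical record
    print(to_oai_dc(record))
```

Applying one shared mapping of this kind at every repository is also a simple step toward the consistency in description that the paragraph above asks for.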

7.0 Conclusion

Much promise is seen for the use of the protocol within an open archives approach. Support for a new pattern of scholarly communication is the most publicized potential benefit. Perhaps most readily achievable are the goals of surfacing hidden resources and low-cost interoperability. Although the OAI-PMH is technically very simple, building coherent services that meet user requirements remains complex. The OAI-PMH could become part of the infrastructure of the Web, as taken for granted as HTTP now is, if its relative simplicity and its proven success with early implementers in a service context lead to widespread uptake by research organizations, publishers and archives.

REFERENCES

1. http://www.openarchives.org
2. Breeding, M. (2002, April). The Emergence of the Open Archives Initiative: This Protocol could become a key part of the digital library infrastructure. Information Today. From http://www.findarticles.com/cf_0/m3336/4_19/85251474/p1/article.jhtml
3. Breeding, M. (2002). Understanding the Protocol for Metadata Harvesting of the Open Archives Initiative. Computers in Libraries, 22(8).
4. Lagoze, C., & Sompel, H. V. d. (2001, January). The Open Archives Initiative Protocol for Metadata Harvesting. From http://www.openarchives.org/OAI/openarchivesprotocol.htm
5. Lynch, C. A. (2001, August). Metadata Harvesting and the Open Archives Initiative. ARL Bimonthly Report, 217. From http://www.arl.org/newsltr/217/mhp.html
6. Shearer, K. (2002, March). The Open Archives Initiative: Developing an Interoperability Framework for Scholarly Publishing. CARL/ABRC Background Series, No. 5. From http://www.carl-abrc.ca/projects/scholarly/open_archives.PDF
7. Suleman, H., & Fox, E. A. (2001, December). A Framework for Building Open Digital Libraries. D-Lib Magazine, 7(12). From http://www.dlib.org/dlib/december01/suleman/12suleman.html
8. Sompel, H. V. d., & Lagoze, C. (2000, February). The Santa Fe Convention of the Open Archives Initiative. D-Lib Magazine, 6(2). From http://www.dlib.org/dlib/february00/vandesompel-oai/02vandesompel-oai.html
9. Warner, S. (2001, June). Exposing and Harvesting Metadata Using the OAI Metadata Harvesting Protocol: A Tutorial. HEP Libraries Webzine, Issue 4. From http://library.cern.ch/HEPLW/4/papers/3/
11. http://www.ukoln.ac.uk/repositories/digirep/index/FAQs
12. Michael Shepherd (2003). Interoperability for Digital Libraries. DRTC Workshop on Semantic Web, 8th-10th December 2003, DRTC, Bangalore.
13. http://www.openarchives.org/Register/BrowseSites
14. http://www.openarchives.org/service/listproviders.html
