The Open Archives Initiative Protocol for Metadata Harvesting and ePrints UK
UKOLN is supported by:
The Open Archives Initiative Protocol for Metadata Harvesting and ePrints UK
AULIC Institutional Repositories Meeting
University of Bristol – 23 May 2005
Andy Powell, UKOLN, University of Bath
www.bath.ac.uk
A centre of expertise in digital information management
www.ukoln.ac.uk
2
Contents
• a brief history of OAI
• 10 technical things you should know about the OAI-PMH
• potential impact…
– institutional context
– the role of the library?
– the researcher
• ePrints UK project
3
OAI roots
• the roots of OAI lie in the development of eprint archives…
– arXiv, CogPrints, NACA (NASA), RePEc, NDLTD, NCSTRL
• each offered a Web interface for the deposit of articles and for end-user searches
• it was difficult for end-users to work across archives without having to learn multiple different interfaces
• recognised need for a single search interface to all archives
– Universal Pre-print Service (UPS)
4
Searching vs. harvesting
• two possible approaches to building a single search interface to multiple eprint archives…
– cross-searching multiple archives based on a protocol like Z39.50
– harvesting metadata into one or more ‘central’ services, i.e. bulk-moving the data to the user interface
• digital library experience in this area indicated that cross-searching was not the preferred approach
– distributed searching of N nodes is viable, but only for small values of N
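To make the contrast concrete, here is a minimal Python sketch of the harvesting model: records from several archives are copied once into a central index, and every search then runs locally instead of fanning out to N remote nodes. The archive names, record identifiers and titles are hypothetical stand-ins for real eprint archives.

```python
from typing import Dict, List

# Hypothetical archives: each maps a record identifier to a title.
ARCHIVES: Dict[str, Dict[str, str]] = {
    "archive-a": {"oai:a:1": "Metadata harvesting in practice"},
    "archive-b": {"oai:b:7": "Cross-searching with Z39.50",
                  "oai:b:9": "Harvesting eprints at scale"},
}


def harvest_all(archives: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Bulk-copy every archive's records into one central index."""
    index: Dict[str, str] = {}
    for records in archives.values():
        index.update(records)
    return index


def search(index: Dict[str, str], term: str) -> List[str]:
    """Case-insensitive title search over the local index only."""
    needle = term.lower()
    return sorted(rid for rid, title in index.items() if needle in title.lower())


index = harvest_all(ARCHIVES)
print(search(index, "harvesting"))  # ['oai:a:1', 'oai:b:9']
```

The cost of this design is freshness: the central index is only as current as the last harvest, which is why being able to ask for just the records that changed matters.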
5
Harvesting requirements
• for the harvesting approach to work, there need to be agreements about…
– transport protocols: HTTP vs. FTP vs. …
– metadata formats: DC vs. MARC vs. …
– quality assurance: mandatory elements, mechanisms for naming people, subjects, etc., handling of duplicate records, best practice
– intellectual property and usage rights: who can do what with the records
• work in this area resulted in the “Santa Fe Convention”
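As an illustration of the quality-assurance point, the sketch below flags harvested records that lack required elements and groups likely duplicates by a normalised title. Which elements count as mandatory, the title-normalisation rule and the sample records are all assumptions made for the example, not part of any agreement described on this slide.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical choice of elements treated as mandatory for this example.
MANDATORY = ("title", "creator", "identifier")

Record = Dict[str, List[str]]


def missing_elements(record: Record) -> List[str]:
    """Return the mandatory elements that are absent or empty in a record."""
    return [name for name in MANDATORY if not record.get(name)]


def normalise_title(title: str) -> str:
    """Lower-case and collapse punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[\W_]+", " ", title.lower()).strip()


def duplicate_groups(records: Dict[str, Record]) -> List[List[str]]:
    """Group record ids whose first title normalises to the same string."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for rid, record in records.items():
        titles = record.get("title")
        if not titles:
            continue  # untitled records are reported by missing_elements instead
        groups[normalise_title(titles[0])].append(rid)
    return sorted(sorted(ids) for ids in groups.values() if len(ids) > 1)


records: Dict[str, Record] = {
    "oai:a:1": {"title": ["Open Archives: an overview"], "creator": ["Smith, J."],
                "identifier": ["http://a.example.org/1"]},
    "oai:b:4": {"title": ["Open archives - an overview."], "creator": ["Smith, John"],
                "identifier": ["http://b.example.org/4"]},
    "oai:c:2": {"title": ["Metadata quality"], "creator": []},
}
print(missing_elements(records["oai:c:2"]))  # ['creator', 'identifier']
print(duplicate_groups(records))             # [['oai:a:1', 'oai:b:4']]
```

The two duplicates name their creator differently ("Smith, J." vs. "Smith, John"), which is exactly why agreed mechanisms for naming people are on the list.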
6
Development of OAI-PMH
• a two-year metamorphosis through various names
– Santa Fe Convention, OAI-PMH versions 1.0, 1.1…
– OAI Protocol for Metadata Harvesting 2.0
• development steered by an international technical committee
• simplicity and inter-version stability helped developer confidence
• move from a focus on eprints to a more generic protocol
– move from an OAI-specific metadata schema to mandatory support for Dublin Core
7
Bluffer’s guide to OAI
1. OAI-PMH is short for Open Archives Initiative Protocol for Metadata Harvesting
2. a low-cost mechanism for harvesting metadata records
– from ‘data providers’ to ‘service providers’
3. allows a ‘service provider’ to say ‘give me some or all of your metadata records’
– where ‘some’ is based on datestamps, sets, metadata formats
4. eprint heritage, but widely deployed
– images, museum artefacts, learning objects, …
http://www.openarchives.org/
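Item 3 maps directly onto request parameters. The Python sketch below builds selective-harvesting URLs for the OAI-PMH `ListRecords` verb using its `metadataPrefix`, `from`, `until` and `set` arguments; the base URL, set name and the `list_records_url` helper are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None,
                     until_date: Optional[str] = None,
                     set_spec: Optional[str] = None) -> str:
    """Build a ListRecords request selecting by metadata format, datestamp and set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date      # lower datestamp bound, e.g. "2005-01-01"
    if until_date:
        params["until"] = until_date    # upper datestamp bound
    if set_spec:
        params["set"] = set_spec        # restrict to one set in the repository
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint and set name.
print(list_records_url("http://eprints.example.ac.uk/oai2",
                       from_date="2005-01-01", set_spec="physics"))
# http://eprints.example.ac.uk/oai2?verb=ListRecords&metadataPrefix=oai_dc&from=2005-01-01&set=physics
```

Leaving out `from`, `until` and `set` asks for ‘all’; adding them narrows the request to ‘some’, which is how a service provider refreshes its copy incrementally.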
8
Bluffer’s guide to OAI
5. based on HTTP and XML
– simple, Web-friendly, fast deployment
6. OAI-PMH is not a search protocol
– but it can underpin search-based services built on Z39.50, SRW, SOAP or…
7. OAI-PMH typically carries metadata
– the content (e.g. full text or image) is made available separately, typically at a URL given in the metadata
8. mandates simple DC as the record format
– but extensible to any XML format: IEEE LOM, ONIX, MARC, METS, MPEG-21, etc.
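Because the protocol is just HTTP GET plus XML, a harvester is little more than a loop. The sketch below parses `ListRecords` responses with Python's standard library and follows `resumptionToken` elements until the list is exhausted. To stay self-contained it takes a `fetch` callable instead of making network requests; the `harvest` and `fake_fetch` functions and the trimmed-down canned responses are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def harvest(fetch: Callable[[Dict[str, str]], str],
            metadata_prefix: str = "oai_dc") -> List[str]:
    """Collect record identifiers from ListRecords, following resumptionTokens."""
    identifiers: List[str] = []
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(params))
        for header in root.iter(OAI_NS + "header"):
            identifiers.append(header.findtext(OAI_NS + "identifier", default=""))
        token = root.findtext(".//" + OAI_NS + "resumptionToken")
        if not token:  # a missing or empty token marks the last page
            return identifiers
        # resumptionToken is an exclusive argument: send it with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": token}


# Hypothetical responses, keyed by the resumptionToken that requests them.
PAGES = {
    None: ('<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
           '<record><header><identifier>oai:example:1</identifier></header></record>'
           '<resumptionToken>page2</resumptionToken></ListRecords></OAI-PMH>'),
    "page2": ('<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
              '<record><header><identifier>oai:example:2</identifier></header></record>'
              '<resumptionToken/></ListRecords></OAI-PMH>'),
}


def fake_fetch(params: Dict[str, str]) -> str:
    """Stand-in for an HTTP GET to the repository's base URL."""
    return PAGES[params.get("resumptionToken")]


print(harvest(fake_fetch))  # ['oai:example:1', 'oai:example:2']
```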
9
Bluffer’s guide to OAI
9. metadata and ‘content’ are often made freely available, but this is not a requirement
– OAI-PMH can be used between closed groups
– or a repository can make metadata available but restrict access to content in some way
10. the underlying HTTP protocol provides
– access control, e.g. HTTP Basic authentication
– compression mechanisms (to improve harvester performance)
– in theory, encryption too, if required
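Item 10 in practice: the sketch below prepares the HTTP headers a harvester might send to a repository protected by HTTP Basic authentication that also offers gzip compression, and decodes a compressed response body. The credentials and helper names are hypothetical; a real harvester would hand these headers to its HTTP client (for example `urllib.request.Request`) and check the response's `Content-Encoding`. OAI-PMH repositories can advertise supported encodings through `compression` elements in their `Identify` response.

```python
import base64
import gzip
from typing import Dict, Optional


def harvester_headers(user: str, password: str) -> Dict[str, str]:
    """HTTP Basic credentials plus a request for gzip-compressed responses."""
    credentials = f"{user}:{password}".encode("utf-8")
    return {
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Accept-Encoding": "gzip",
    }


def decode_body(body: bytes, content_encoding: Optional[str]) -> str:
    """Undo a gzip Content-Encoding (if the server applied one) and decode UTF-8."""
    if content_encoding == "gzip":
        body = gzip.decompress(body)
    return body.decode("utf-8")


headers = harvester_headers("harvester", "secret")  # hypothetical credentials
print(headers["Authorization"])

compressed = gzip.compress(b"<OAI-PMH/>")
print(decode_body(compressed, "gzip"))  # <OAI-PMH/>
```

Note that Basic authentication sends credentials only base64-encoded, not encrypted, so it protects access rather than confidentiality.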
10
Dublin Core
• OAI-PMH mandates the use of simple DC as the lowest common denominator
• agreed XML schema: ‘oai_dc’
– simple DC: 15 metadata properties
– all DC properties are optional and repeatable
Title Contributor Source
Creator Date Language
Subject Type Relation
Description Format Coverage
Publisher Identifier Rights
http://dublincore.org/
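Because every repository must expose `oai_dc`, a harvester can rely on one parser for the lowest common denominator. The sketch below turns an `oai_dc` metadata block into a dictionary of lists (since every element is optional and repeatable), then picks the first HTTP identifier as a likely link to the content, in line with item 7 of the bluffer's guide. The sample record is invented, and treating the first HTTP identifier as the full-text link is a heuristic, not a rule.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

DC_NS = "http://purl.org/dc/elements/1.1/"

# Invented sample record in the oai_dc format.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example eprint</dc:title>
  <dc:creator>Bloggs, Joe</dc:creator>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:identifier>Bloggs, J. (2005) An example eprint.</dc:identifier>
  <dc:identifier>http://eprints.example.ac.uk/42/01/paper.pdf</dc:identifier>
</oai_dc:dc>"""


def parse_oai_dc(xml_text: str) -> Dict[str, List[str]]:
    """Collect each Dublin Core element's values; repeated elements become lists."""
    record: Dict[str, List[str]] = {}
    for element in ET.fromstring(xml_text):
        if element.tag.startswith("{" + DC_NS + "}"):
            name = element.tag[len(DC_NS) + 2:]  # strip "{namespace}"
            record.setdefault(name, []).append((element.text or "").strip())
    return record


def content_url(record: Dict[str, List[str]]) -> Optional[str]:
    """Heuristic: the first identifier that looks like a Web address."""
    for value in record.get("identifier", []):
        if value.startswith(("http://", "https://")):
            return value
    return None


record = parse_oai_dc(SAMPLE)
print(record["creator"])     # ['Bloggs, Joe', 'Doe, Jane']
print(content_url(record))   # http://eprints.example.ac.uk/42/01/paper.pdf
```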
11
Impact on institutions…
• OAI-PMH technology provides an open, relatively stable technical framework
– allows an institution to reconsider the management of its intellectual output
– greater confidence in the availability of external services (e.g. discovery, access, analysis)
• the technical bit is easy
– eprints.org software (Southampton), DSpace (MIT/HP), Fedora
• but technical solutions are always easy!
– the real problem is the cultural change required to get academics to deposit
12
Impact on libraries…
• the library is the natural choice as ‘managing agent’ for the institutional repository
– quality control
– metadata enhancement
– preservation
• but the technical strengths of libraries are quite variable, so technical collaboration within the institution may be required
• beginning to see some evidence of externally ‘hosted’ repository services being offered
13
Impact on researchers…
• OAI-PMH technology provides a ‘disruptive’ technical framework that supports
– new ways for individual researchers to disclose their research output
– the development of new kinds of ‘research’ discovery services
• researchers can use a ‘personal’ OAI repository
• but there is a need to
– clarify the roles of institutional, discipline and personal repositories
– overcome FUD: IPR, peer review, ability to ‘publish’, quality control, inertia
14
ePrints UK
• an RDN project funded by JISC under the FAIR programme
• now finished, but the ‘service’ is still running
• UK ‘service provider’
• harvesting metadata from all UK eprint archives
• single point of discovery for UK eprints
• working with OCLC and the University of Southampton to automatically enhance harvested metadata
http://eprints-uk.rdn.ac.uk/search/
15
ePrints UK
[Diagram: eprint archive(s) are harvested by ePrints UK via OAI-PMH; ePrints UK draws on name authority, subject classification and citation analysis services and serves the end-user.]
16
What did we learn?
• impact of eprint archives still quite low
• national coverage is potentially interesting to funders but not to end-users
• automatically enhancing metadata is difficult, particularly with respect to
– subject classification
– name authority
• approaches to metadata creation varied
– no clear cataloguing guidelines
– linkage to full text from the metadata record was inconsistent
21
OAI and Google
[Diagram: eprint archive(s) → OAI-PMH → OAI gateway → HTTP → Google]
The OAI gateway makes harvested metadata available to Google…
Examples…
– DSpace and Google
– OAIster and Yahoo
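The gateway idea amounts to republishing harvested records as ordinary, crawlable Web pages. A minimal sketch, assuming records have already been parsed into dictionaries of Dublin Core values: each record becomes a small static HTML page with escaped text and a link to the full text. The page layout and the `render_record_page` function are hypothetical; real gateways may work quite differently.

```python
from html import escape
from typing import Dict, List


def render_record_page(record: Dict[str, List[str]], content_url: str) -> str:
    """Render one harvested record as a static, crawler-friendly HTML page."""
    title = escape((record.get("title") or ["Untitled"])[0])
    creators = ", ".join(escape(name) for name in record.get("creator", []))
    description = escape(" ".join(record.get("description", [])))
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{title}</title></head><body>\n"
        f"<h1>{title}</h1>\n"
        f"<p>{creators}</p>\n"
        f"<p>{description}</p>\n"
        f'<p><a href="{escape(content_url, quote=True)}">Full text</a></p>\n'
        "</body></html>\n"
    )


# Hypothetical harvested record and full-text location.
page = render_record_page({"title": ["Harvesting & searching"], "creator": ["Bloggs, Joe"]},
                          "http://eprints.example.ac.uk/42/")
print(page)
```

Escaping every harvested value matters here, because the gateway is republishing third-party metadata it does not control.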
25
Questions…