OAI Protocol for Metadata Harvesting
Tim Brody
Intelligence, Agents, Multimedia Group
University of Southampton
OpCit – http://opcit.eprints.org/
www.ecs.soton.ac.uk
BCS Metadata Meeting, London 29th May 2002
(Many slides borrowed from Michael L. Nelson)
OAI 2.0
• Public, stable version not released yet … (but very close)
– Beta released mid-May
– Public release scheduled: 1st June
• 2.0 implementations in the pipeline
– British Library, Cornell Univ, Ex Libris, my.OAI, Humboldt Univ, InQuirion Pty Ltd, Library of Congress, NASA, OCLC, Old Dominion Univ, U. of Illinois, U. of Southampton, UCLA, Johns Hopkins U., Indiana U., NYU, UKOLN, Virginia Tech
Open Archives Initiative
The protocol is openly documented, and metadata is “exposed” to at least some peer group (note: rights management can still apply!)
Archive defined as a “collection of stuff”, not the archivist’s definition of “archive”. “Repository” used in most OAI documents.
OAI is happening at break-neck speed...
Metadata Harvesting
• Move away from distributed searching
• Extract metadata from various sources
• Build services on local copies of metadata
– Resources remain at remote repositories
[Diagram: a user searches for “cfd applications” against a local copy of metadata that has been harvested offline from each remote node. All searching, browsing, etc. is performed on the metadata held locally. Each node is independently maintained, and individual nodes can still support direct user interaction.]
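The harvesting model above can be sketched in a few lines of Python: metadata is copied from each node into a local store ahead of time, and user searches touch only that local copy. This is an illustrative sketch, not part of the original slides; the node names, the sample records and the `harvest` and `search` helpers are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List

# A harvested metadata record: a title plus the URL of the remote resource.
Record = Dict[str, str]


def harvest(sources: Dict[str, Callable[[], Iterable[Record]]]) -> List[Record]:
    """Copy metadata from every source into one local list (done offline)."""
    local_copy: List[Record] = []
    for name, fetch in sources.items():
        for record in fetch():
            # Note where the record came from; the resource itself stays remote.
            local_copy.append({**record, "source": name})
    return local_copy


def search(local_copy: List[Record], query: str) -> List[Record]:
    """Case-insensitive substring search over the local metadata only."""
    needle = query.lower()
    return [r for r in local_copy if needle in r["title"].lower()]


# Hypothetical nodes standing in for independently maintained repositories.
sources = {
    "node-a": lambda: [{"title": "CFD applications in aerodynamics",
                        "url": "http://a.example/1"}],
    "node-b": lambda: [{"title": "Metadata for museums",
                        "url": "http://b.example/7"}],
}

local_copy = harvest(sources)
hits = search(local_copy, "cfd applications")
```

Here `hits` holds only metadata; following `url` would take the user back to the remote repository that owns the resource.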
Metadata Harvesting
• Repositories (archives etc.) = low implementation cost
• Services = higher implementation cost
• Similar to web search model
– DP9 gateway makes it exactly the same
| | Santa Fe convention | OAI-PMH v.1.0/1.1 | OAI-PMH v.2.0 |
|---|---|---|---|
| about | eprints | document-like objects | resources |
| metadata | OAMS | unqualified Dublin Core | unqualified Dublin Core |
| transport | HTTP | HTTP | HTTP |
| responses | XML | XML | XML |
| requests | HTTP GET/POST | HTTP GET/POST | HTTP GET/POST |
| verbs | Dienst | OAI-PMH | OAI-PMH |
| nature | experimental | experimental | stable |
| model | metadata harvesting | metadata harvesting | metadata harvesting |
OAI-PMH v.2.0 [06/2002]
• Goal: recurrent exchange of metadata about resources between systems
• Input:
– OAI-PMH v.1.0 [01/01 – 09/02]
– feedback on OAI-implementers
– deliberations by OAI-tech [09/01 -]
– alpha test group of OAI-PMH v.2.0 [03/02 -]
OAI-PMH v.2.0 [06/2002]
• low-barrier interoperability specification
• metadata harvesting model: data provider / service provider
• metadata about resources
• autonomous protocol
• distinction between protocol and periphery
• community-specific extensions
• HTTP based
• XML responses
• unqualified Dublin Core
• stable (1.0 characterized as experimental)
OAI Data Model:
Resources / Items / Records
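[Diagram: a resource (David) is represented in the repository by an item, which holds all available metadata about David. The item is disseminated as records in several formats: Dublin Core metadata, MARC metadata and SPECTRUM metadata.]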
item = identifier
record = identifier + metadata format + datestamp
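The two definitions above map naturally onto a pair of small data types. The following is a minimal sketch, assuming Python dataclasses; the class layout, the `add` helper, the identifier and the `marc` prefix are illustrative assumptions rather than anything prescribed by the protocol.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class Record:
    """record = identifier + metadata format + datestamp (plus the payload)."""
    identifier: str
    metadata_prefix: str
    datestamp: str
    metadata: str = ""

    def key(self) -> Tuple[str, str, str]:
        # The triple that distinguishes one record from another.
        return (self.identifier, self.metadata_prefix, self.datestamp)


@dataclass
class Item:
    """item = identifier; one item can yield records in several formats."""
    identifier: str
    records: Dict[str, Record] = field(default_factory=dict)

    def add(self, metadata_prefix: str, datestamp: str, metadata: str = "") -> Record:
        record = Record(self.identifier, metadata_prefix, datestamp, metadata)
        self.records[metadata_prefix] = record
        return record


# Hypothetical identifier and format prefixes, for illustration only.
item = Item("oai:example.org:david")
item.add("oai_dc", "2002-05-29")
item.add("marc", "2002-05-29")
```

Both records share the item's identifier and differ only in metadata format, which is why a harvester has to name a format when it asks for a record.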
Overview of OAI Verbs
| Verb | Function | Group |
|---|---|---|
| Identify | description of archive | archival metadata |
| ListMetadataFormats | metadata formats supported by archive | archival metadata |
| ListSets | sets defined by archive | archival metadata |
| ListIdentifiers | OAI unique ids contained in archive | harvesting verbs |
| ListRecords | listing of N records | harvesting verbs |
| GetRecord | listing of a single record | harvesting verbs |

most verbs take arguments: dates, sets, ids, metadata formats and resumption token (for flow control)
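Since requests are plain HTTP GET/POST, a request is just the verb and its arguments encoded as query parameters on the repository's base URL. The sketch below uses Python's standard `urllib.parse.urlencode`; the `build_request` helper is hypothetical, and `oai_dc` (the prefix for unqualified Dublin Core) is assumed as the metadata format.

```python
from urllib.parse import urlencode

# The six verbs from the overview above.
VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def build_request(base_url: str, verb: str, **arguments: str) -> str:
    """Return an HTTP GET URL for an OAI-PMH request."""
    if verb not in VERBS:
        raise ValueError(f"{verb} is not a valid OAI-PMH verb")
    # The verb travels as an ordinary query parameter next to the arguments.
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


url = build_request(
    "http://arXiv.org/oai2",
    "GetRecord",
    identifier="oai:arXiv:cs/0112017",
    metadataPrefix="oai_dc",
)
```

Note that `urlencode` percent-encodes the `:` and `/` characters inside the identifier, so the identifier survives the trip as a single parameter value.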
Identify
1.1
• Arguments
– none
• Errors
– none
2.0
• Arguments
– none
• Errors
– badArgument
ListMetadataFormats
1.1
• Arguments
– identifier (OPTIONAL)
• Errors
– id does not exist
2.0
• Arguments
– identifier (OPTIONAL)
• Errors
– badArgument
– noMetadataFormats
– idDoesNotExist
ListSets
1.1
• Arguments
– resumptionToken (EXCLUSIVE)
• Errors
– no set hierarchy
2.0
• Arguments
– resumptionToken (EXCLUSIVE)
• Errors
– badArgument
– badResumptionToken
– noSetHierarchy
ListIdentifiers
1.1
• Arguments
– from (OPTIONAL)
– until (OPTIONAL)
– set (OPTIONAL)
– resumptionToken (EXCLUSIVE)
• Errors
– no records match
2.0
• Arguments
– from (OPTIONAL)
– until (OPTIONAL)
– set (OPTIONAL)
– resumptionToken (EXCLUSIVE)
– metadataPrefix (REQUIRED)
• Errors
– badArgument
– cannotDisseminateFormat
– badResumptionToken
– noSetHierarchy
– noRecordsMatch
ListRecords
1.1
• Arguments
– from (OPTIONAL)
– until (OPTIONAL)
– set (OPTIONAL)
– resumptionToken (EXCLUSIVE)
– metadataPrefix (REQUIRED)
• Errors
– no records match
– metadata format cannot be disseminated
2.0
• Arguments
– from (OPTIONAL)
– until (OPTIONAL)
– set (OPTIONAL)
– resumptionToken (EXCLUSIVE)
– metadataPrefix (REQUIRED)
• Errors
– noRecordsMatch
– cannotDisseminateFormat
– badResumptionToken
– noSetHierarchy
– badArgument
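The OPTIONAL / REQUIRED / EXCLUSIVE labels for ListRecords in 2.0 can be read as a small set of checks a repository might run before answering. This is a hedged sketch only: the `check_list_records` function is hypothetical, and it covers just the `badArgument` case, not the other error codes listed.

```python
from typing import Dict, Optional

OPTIONAL = {"from", "until", "set"}


def check_list_records(args: Dict[str, str]) -> Optional[str]:
    """Return an error code for bad ListRecords arguments, or None if acceptable."""
    if "resumptionToken" in args:
        # EXCLUSIVE: the token must be the only argument.
        return None if len(args) == 1 else "badArgument"
    if "metadataPrefix" not in args:
        # REQUIRED whenever no resumptionToken is supplied.
        return "badArgument"
    # Anything beyond the required and optional arguments is rejected.
    unknown = set(args) - OPTIONAL - {"metadataPrefix"}
    return "badArgument" if unknown else None


ok = check_list_records({"metadataPrefix": "oai_dc", "from": "2002-01-01"})
mixed = check_list_records({"resumptionToken": "abc", "set": "cs"})
```

In this sketch `ok` is `None`, while `mixed` is `"badArgument"` because the token was combined with another argument.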
GetRecord
1.1
• Arguments
– identifier (REQUIRED)
– metadataPrefix (REQUIRED)
• Errors
– id does not exist
– metadata format cannot be disseminated
2.0
• Arguments
– identifier (REQUIRED)
– metadataPrefix (REQUIRED)
• Errors
– badArgument
– cannotDisseminateFormat
– idDoesNotExist
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="GetRecord"… …>http://arXiv.org/oai2</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv:cs/0112017</identifier>
        <datestamp>2001-12-14</datestamp>
        <setSpec>cs</setSpec>
        <setSpec>math</setSpec>
      </header>
      <metadata> ….. </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
response with no errors
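A harvester would typically pull the header fields out of such a response. The sketch below uses Python's standard `xml.etree.ElementTree` on a simplified copy of the slide's response: the elided request attributes and metadata are left out, and the XML namespace that real responses declare is omitted to match the slide. The `parse_header` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified response modelled on the slide; <metadata/> stands in for the
# elided metadata payload.
RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="GetRecord">http://arXiv.org/oai2</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv:cs/0112017</identifier>
        <datestamp>2001-12-14</datestamp>
        <setSpec>cs</setSpec>
        <setSpec>math</setSpec>
      </header>
      <metadata/>
    </record>
  </GetRecord>
</OAI-PMH>""".encode("utf-8")


def parse_header(xml_bytes: bytes) -> dict:
    """Extract identifier, datestamp and set membership from a GetRecord response."""
    root = ET.fromstring(xml_bytes)
    header = root.find("GetRecord/record/header")
    if header is None:
        raise ValueError("no record header in response")
    return {
        "identifier": header.findtext("identifier"),
        "datestamp": header.findtext("datestamp"),
        "sets": [s.text for s in header.findall("setSpec")],
    }


header = parse_header(RESPONSE)
```

With a namespaced response, the element paths would need the namespace prefix as well; that detail is deliberately left out here.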
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request>http://arXiv.org/oai2</request>
  <error code="badVerb">ShowMe is not a valid OAI-PMH verb</error>
</OAI-PMH>
response with error
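Because errors arrive inside an ordinary XML response, a client has to look for `error` elements itself. A minimal sketch of that check follows, again with the standard `xml.etree.ElementTree` and without the namespace a real response would carry; the `OAIError` class and `raise_for_errors` helper are hypothetical names.

```python
import xml.etree.ElementTree as ET

ERROR_RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request>http://arXiv.org/oai2</request>
  <error code="badVerb">ShowMe is not a valid OAI-PMH verb</error>
</OAI-PMH>""".encode("utf-8")


class OAIError(Exception):
    """Raised when a response carries an <error> element."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def raise_for_errors(xml_bytes: bytes) -> ET.Element:
    """Return the parsed response, or raise OAIError for the first error found."""
    root = ET.fromstring(xml_bytes)
    error = root.find("error")
    if error is not None:
        raise OAIError(error.get("code", ""), (error.text or "").strip())
    return root


try:
    raise_for_errors(ERROR_RESPONSE)
    caught = None
except OAIError as exc:
    caught = exc
```

Keeping the machine-readable `code` separate from the human-readable message lets calling code branch on values such as `badVerb` or `noRecordsMatch`.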
resumptionToken Flow-Control
• Idempotency of resumptionToken: return same incomplete list when rT is re-issued
– while no changes occur in the repo: strict
– while changes occur in the repo: all items with unchanged datestamp
• new attributes for the resumptionToken:
– expirationDate
– completeListSize
– cursor
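From the harvester's side, flow control is a loop: issue the first request, then keep re-issuing with only the returned token until no token comes back. The sketch below keeps the network out of the picture by taking a `fetch` callable; `harvest_all`, `fake_fetch` and the in-memory repository are hypothetical, and using a list offset as the token is an assumption made purely for illustration.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# One page of results: (identifiers, resumptionToken). No token ends the list.
Page = Tuple[List[str], Optional[str]]


def harvest_all(fetch: Callable[[Dict[str, str]], Page],
                metadata_prefix: str) -> Iterator[str]:
    """Yield every identifier, following resumptionTokens until the list ends."""
    args = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        identifiers, token = fetch(args)
        yield from identifiers
        if not token:
            break
        # resumptionToken is EXCLUSIVE: it replaces all other arguments.
        args = {"verb": "ListIdentifiers", "resumptionToken": token}


# Hypothetical in-memory repository that serves two identifiers per page.
IDS = [f"oai:example.org:{n}" for n in range(5)]


def fake_fetch(args: Dict[str, str]) -> Page:
    start = int(args.get("resumptionToken", 0))
    end = start + 2
    return IDS[start:end], (str(end) if end < len(IDS) else None)


harvested = list(harvest_all(fake_fetch, "oai_dc"))
```

Because `fake_fetch` derives each page purely from the token, re-issuing the same token returns the same incomplete list, which is the strict idempotency case described above.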
Adoption
• evolution
– from talking about OAI-PMH
– to talking about projects that use OAI-PMH
– to talking about projects and failing to mention they use OAI-PMH
• => OAI-PMH becomes part of the infrastructure
Data Providers (a.k.a. repositories)
• 49 registered repositories [11/2001]
• 65 registered repositories [03/2002]
• 77 registered repositories [05/2002]
• 5+ million records
• many unregistered repositories
• private implementations (e.g. RDN)
Service Providers
• Arc: cross-searching of registered repositories [ http://arc.cs.odu.edu ]
• CiteBase: research literature search + citation ranking [ http://citebase.eprints.org ]
• OLAC: cross-searching of Language Archive Community repositories [ http://www.language-archives.org/index.html ]
Service Providers
• Scirus scientific search engine [Elsevier] [ http://www.scirus.com ]
• my.OAI: user-tailorable cross-searching of registered repositories [FS Consulting, Inc.] [ http://www.myoai.com ]
• Growing interest from web search engines
OAI-PMH tools
• Repository Explorer: interactive exploration of repositories [Virginia Tech] [ http://www.purl.org/NET/oai_explorer ]
• eprints.org: generic OAI-PMH compliant repository software [U of Southampton] [ http://www.eprints.org ]
• ALCME repository and harvester software [OCLC] [ http://alcme.oclc.org/index.html ]
• APIs, other tools @ www.openarchives.org