OAI from the needle box Humboldt Universität Berlin, March 20, 2002 Thomas Krichel Palmer School of...
OAI from the needle box
Humboldt Universität Berlin, March 20, 2002
Thomas Krichel
Palmer School of Library and Information Science
Long Island University
With apologies to Carl Lagoze
Where I come from...
• Trained economist
• Early (1991) visionary of free online scholarship
• Creator of NetEc in 1993
• Principal founder of RePEc in 1997
– Largest distributed academic DL in the world
– Collection that is open for contribution and usage
– Grown to over 200 archives, over 10 partly interoperable user services
Metadata collection process
• Metadata is expensive to collect.
• Free online scholarship requires academic self-documentation.
• Building a free metadata collection is difficult:
– no established business model
– no established funding channels
• Only a collaborative effort will succeed.
The example of eprint servers
• attractive building block for the transformation of scholarly communication
• but isolated efforts do not make for a scholarly communication system
• need to federate archives
• need to interoperate with other scholarly communication components
Example: e-print accessibility
[Diagram: several separate e-print archives. Metadata harvesting collects the metadata of each e-print (Author, Title, Abstract, Identifier) from the archives.]
other examples
• within the area of scholarly communication
• already implemented in RePEc
• Sharing of log data between service providers
• Provision of non-document data for document data providers
• personal data
• institutional data
core concepts in OAI 1.1
• low-barrier interoperability
• data-provider / service-provider model
• metadata harvesting model
• OAI 1.1 protocol: HTTP based
• reply: XML Schema, self contained
• shared metadata format: Dublin Core
• parallel metadata formats: community specific
harvester / repository
[Diagram: a harvester and a repository connected by the OAI protocol. Labels: support data, harvesting data, items.]
OAI protocol requests
Supporting protocol requests:
• Identify
• ListMetadataFormats
• ListSets
Harvesting protocol requests:
• ListRecords
• ListIdentifiers
• GetRecord
[Diagram: harvester (service provider) and repository (data provider)]
HTTP encoding - requests
BASE-URL -----------> an.oa.org/OAI-script
keyword arguments --> verb=ListIdentifiers&set=S1
GET
http://an.oa.org/OAI-script?verb=ListIdentifiers&set=S1
POST
POST http://an.oa.org/OAI-script HTTP/1.0
Content-Length: 27
Content-Type: application/x-www-form-urlencoded

verb=ListIdentifiers&set=S1
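The two encodings above can be sketched in Python. This is only an illustration using the standard library: the helper names `get_url` and `post_request` are made up for this sketch, and the base URL is the example one from the slide, not a live service.

```python
from urllib.parse import urlencode

BASE_URL = "http://an.oa.org/OAI-script"  # example base URL from the slide


def get_url(base_url, **arguments):
    """GET form: base URL, '?', then the URL-encoded keyword arguments."""
    return base_url + "?" + urlencode(arguments)


def post_request(**arguments):
    """POST form: the same keyword arguments travel in the request body."""
    body = urlencode(arguments).encode("ascii")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Content-Length": str(len(body)),  # computed from the body, never typed by hand
    }
    return headers, body


url = get_url(BASE_URL, verb="ListIdentifiers", set="S1")
headers, body = post_request(verb="ListIdentifiers", set="S1")
print(url)
print(headers["Content-Length"], body.decode("ascii"))
```

Computing Content-Length from the encoded body avoids a mismatch between the header and the bytes actually sent.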
HTTP encoding - responses
<?xml version="1.0" encoding="UTF-8" ?>
<GetRecord
  xmlns="http://oai.namespace.uri"
  xmlns:xsi="http://w3.namespace.uri"
  xsi:schemaLocation="http://oai.namespace.uri
                      http://oai.schemaURL">
  <responseDate>2000-01-19T19:30:30-04:00</responseDate>
  <requestURL>http://an.oa.org/OAI-script?verb=GetRecord&amp;identifier=oai%3AarXiv%3A0001&amp;metadataPrefix=oai_dc</requestURL>
  <record>
    record contents
  </record>
  additional records
</GetRecord>
[Slide callouts: xml namespaces; response header (responseDate, requestURL); response data (record)]
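A harvester might read the response header roughly as follows. This is a sketch under assumptions: the namespace URI is the placeholder used on the slide, the sample document is a trimmed copy of the slide's example, and `read_header` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, parse_qs

OAI_NS = "http://oai.namespace.uri"  # placeholder namespace from the slide

RESPONSE = b"""<?xml version="1.0" encoding="UTF-8"?>
<GetRecord xmlns="http://oai.namespace.uri">
  <responseDate>2000-01-19T19:30:30-04:00</responseDate>
  <requestURL>http://an.oa.org/OAI-script?verb=GetRecord&amp;identifier=oai%3AarXiv%3A0001&amp;metadataPrefix=oai_dc</requestURL>
  <record>record contents</record>
</GetRecord>"""


def read_header(xml_bytes):
    """Return (verb, responseDate, decoded request arguments) of a response."""
    root = ET.fromstring(xml_bytes)
    verb = root.tag.split("}", 1)[-1]  # strip the "{namespace}" prefix
    date = root.findtext(f"{{{OAI_NS}}}responseDate")
    request_url = root.findtext(f"{{{OAI_NS}}}requestURL")
    query = parse_qs(urlsplit(request_url).query)  # also undoes %3A escaping
    arguments = {key: values[0] for key, values in query.items()}
    return verb, date, arguments


verb, date, arguments = read_header(RESPONSE)
print(verb, date, arguments)
```

Because the reply echoes the request URL, the response is self contained: the harvester can tell which request produced it without keeping state.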
record
<record>
  <header>
    <identifier>oai:eg:001</identifier>
    <datestamp>1999-01-01</datestamp>
  </header>
  <metadata>
    <dc xmlns="http://purl.org/dc">
      <title>My Example</title>
    </dc>
  </metadata>
  <about>
    <ea xmlns="http://www.arXiv.org/ea">
      <usage>No restrictions</usage>
    </ea>
  </about>
</record>
[Slide callouts: header = protocol support; metadata = format-specific metadata; about = community-specific record data]
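The three parts of a record can be pulled apart with a few lines of Python. This is a minimal sketch: the sample is the record from the slide with its quoting repaired, the namespace URIs are the ones shown there, and `parse_record` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used only inside this sketch.
NS = {"dc": "http://purl.org/dc", "ea": "http://www.arXiv.org/ea"}

RECORD = """<record>
  <header>
    <identifier>oai:eg:001</identifier>
    <datestamp>1999-01-01</datestamp>
  </header>
  <metadata>
    <dc xmlns="http://purl.org/dc"><title>My Example</title></dc>
  </metadata>
  <about>
    <ea xmlns="http://www.arXiv.org/ea"><usage>No restrictions</usage></ea>
  </about>
</record>"""


def parse_record(xml_text):
    """Split a record into header fields, one metadata field, and one 'about' field."""
    record = ET.fromstring(xml_text)
    return {
        # header: what the protocol itself needs
        "identifier": record.findtext("header/identifier"),
        "datestamp": record.findtext("header/datestamp"),
        # metadata: format-specific (here the dc container)
        "title": record.findtext("metadata/dc:dc/dc:title", namespaces=NS),
        # about: community-specific data about the record
        "usage": record.findtext("about/ea:ea/ea:usage", namespaces=NS),
    }


parsed = parse_record(RECORD)
print(parsed)
```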
selective harvesting - datestamps
[Diagram: the harvester harvests from the repository only the records whose datestamps fall within a date range.]
selective harvesting - sets
[Diagram: the repository's records are grouped into sets S1 and S2. The harvester harvests only the records within set S1.]
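What a repository does when it answers a selective harvest can be sketched as a simple filter. Everything here is an assumption made for illustration: the `Record` class, the `select` function, inclusive date bounds, flat (non-hierarchical) set names, and the sample data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    identifier: str
    datestamp: str                 # ISO date such as "1999-01-01"
    sets: frozenset = frozenset()  # names of the sets the record belongs to


def select(records, from_=None, until=None, set_spec=None):
    """Keep records inside the optional date range and the optional set."""
    chosen = []
    for record in records:
        # ISO dates of equal length compare correctly as plain strings.
        if from_ is not None and record.datestamp < from_:
            continue
        if until is not None and record.datestamp > until:
            continue
        if set_spec is not None and set_spec not in record.sets:
            continue
        chosen.append(record)
    return chosen


repository = [
    Record("oai:eg:001", "1999-01-01", frozenset({"S1"})),
    Record("oai:eg:002", "2000-06-15", frozenset({"S1", "S2"})),
    Record("oai:eg:003", "2001-03-20", frozenset({"S2"})),
]

in_s1 = select(repository, set_spec="S1")
recent_s2 = select(repository, from_="2000-01-01", set_spec="S2")
print([r.identifier for r in in_s1], [r.identifier for r in recent_s2])
```

The two criteria combine, so a harvester can ask only for what changed in one set since its last visit.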
Communication re OAI
• lists: subscribe via http://www.openarchives.org
• oai-general list
• oai-implementers list
• web: http://www.openarchives.org
• FAQ: http://www.openarchives.org/faq.htm
• mail: [email protected]
• Version 1.1 specifications frozen for 12-18 months:
– stable for experimentation; not definitive
– minimize risk for early adopters
– maximize chances for future interoperability across communities
revision of specifications
The technical committee is working on the “definitive” specifications. They will come out on 2002-05-01.
The technical committee
- Herbert Van de Sompel (LANL)
- Carl Lagoze (Cornell U)
- Thomas Krichel (Long Island U & RePEc)
- Jeff Young (OCLC)
- Tim Cole (U of Illinois at Urbana Champaign)
- Hussein Suleman (Virginia Tech)
- Simeon Warner (Cornell U & arXiv)
- Michael Nelson (NASA & NACA)
- Caroline Arms (Library of Congress)
- Muhammad Zubair (Old Dominion U & ARC)
- Steven Bird (U Penn & Open Language Archive Community)
- Robert Tansley (MIT & DSpace)
- Andy Powell (UKOLN, UK)
- Mogens Sandfær (DTV, Denmark)
- Thomas Severiens (Oldenburg U & Physnet)
- Thomas Baron (CERN)
- Les Carr (U of Southampton)
- Thomas Place (Tilburg U)
Issues in front of the committee
• Error handling
• SOAP
• Harvesting granularity
• Mandatory DC
• Set semantics and collection description
• XML Schema
• Result set filtering
• Flow control, result set cardinality, response level container
• Awareness mechanisms
• Multiple metadata return and "best" metadata selection
• Machine readable rights management
• From GetRecord to GetRecords
• Deduplication issues
• Idempotency of base URLs
• XML format for mini-archives
• Response compression
Thank you for your attention!
Thomas Krichel
Palmer School of Library and Information Science
720 Northern Boulevard
Brookville NY 11548-1300
USA
http://openlib.org/home/[email protected]
Error handling
• badArgument
• badGranularity
• badResumptionToken
• badVerb
• cannotDisseminateFormat
• idDoesNotExist
• noRecordsMatch
• noSetHierarchy
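How a repository might map a faulty request onto these codes can be sketched as below. The slides do not give the rules, so the table of required arguments and the `check_request` function are assumptions for illustration only. Just the verb names and the error codes come from the slides.

```python
# Required arguments per verb: an illustrative assumption, not the specification.
REQUIRED = {
    "Identify": set(),
    "ListMetadataFormats": set(),
    "ListSets": set(),
    "ListRecords": {"metadataPrefix"},
    "ListIdentifiers": set(),
    "GetRecord": {"identifier", "metadataPrefix"},
}


def check_request(arguments, known_identifiers):
    """Return an error code from the list above, or None if the request passes."""
    verb = arguments.get("verb")
    if verb not in REQUIRED:
        return "badVerb"
    if REQUIRED[verb] - set(arguments):
        return "badArgument"  # a required argument is missing
    if verb == "GetRecord" and arguments["identifier"] not in known_identifiers:
        return "idDoesNotExist"
    return None


known = {"oai:eg:001"}
misspelled = check_request({"verb": "ListIdentifers"}, known)
incomplete = check_request({"verb": "GetRecord", "identifier": "oai:eg:001"}, known)
unknown_id = check_request(
    {"verb": "GetRecord", "identifier": "oai:eg:999", "metadataPrefix": "oai_dc"}, known
)
valid = check_request({"verb": "Identify"}, known)
print(misspelled, incomplete, unknown_id, valid)
```

Named error codes let a harvester react by program, for example by correcting a verb, instead of parsing free-text messages.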
SOAP
• SOAP is a mechanism to transmit service requests over the Internet.
• As yet it is not a fully matured protocol.
• A SOAP compatible version of the protocol may be written later.
Harvesting granularity
• From and Until arguments may allow finer timestamps, down to one second.
• Level supported is chosen by the data provider and set in the response to the Identify verb.
• All times expressed in UTC.
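A harvester could produce From and Until values at either granularity like this. This is a sketch: the two string layouts (YYYY-MM-DD and YYYY-MM-DDThh:mm:ssZ) are assumed here rather than taken from the slides, and `format_datestamp` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def format_datestamp(moment, granularity="day"):
    """Convert an aware datetime to UTC and format it at the chosen granularity."""
    utc = moment.astimezone(timezone.utc)
    if granularity == "day":
        return utc.strftime("%Y-%m-%d")
    if granularity == "second":
        return utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    raise ValueError(f"unsupported granularity: {granularity!r}")


# A local time four hours behind UTC.
local = datetime(2000, 1, 19, 19, 30, 30, tzinfo=timezone(timedelta(hours=-4)))
by_day = format_datestamp(local, "day")
by_second = format_datestamp(local, "second")
print(by_day, by_second)
```

The harvester would pick the granularity that the data provider announces in its Identify response, since a repository that only supports days may reject a finer value.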