Open Archives Initiative – Protocol for Metadata Harvesting
Iztok Kavkler, University of Ljubljana
Some slides by
Stefaan Ternier, KUL
Bram Vandenputte, KUL
Joris Klerkx, KUL
What is OAI?
Harvesting standard, documented at http://www.openarchives.org/OAI/openarchivesprotocol.html
Six service verbs
– Identify
– ListMetadataFormats
– GetRecord
– ListRecords
– ListIdentifiers
– ListSets
Allows multiple metadata formats
– DC (Dublin Core) format is mandatory
How OAI works
OAI “verbs”
– Identify
– ListMetadataFormats
– GetRecord
– ListIdentifiers
– ListRecords
– ListSets
[Diagram: the harvester (service provider) sends an HTTP request carrying an OAI verb to the repository (metadata provider); the repository answers with an HTTP response containing valid XML.]
Try it
Install Apache Tomcat or any other Java servlet container
Download the WAR file from http://fire.eun.org/Iztok/OAILREApp.war
Deploy the WAR
Demo HTML page: http://localhost:8080/OAILREApp/
Or type a service verb, e.g. http://localhost:8080/OAILREApp/oaiHandler?verb=Identify
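The same request can be assembled in code. The following is a minimal sketch of how a harvester might build such a URL; the class OaiRequestUrl and its build method are hypothetical names chosen for illustration and are not part of OAICat.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

// Hypothetical helper (not part of OAICat): builds an OAI-PMH request URL.
class OaiRequestUrl {

    // baseUrl is the address of the oaiHandler servlet; args holds optional
    // arguments such as metadataPrefix, from, until or set, in iteration order.
    static String build(String baseUrl, String verb, Map<String, String> args) {
        StringBuilder url = new StringBuilder(baseUrl)
                .append("?verb=").append(encode(verb));
        for (Map.Entry<String, String> arg : args.entrySet()) {
            url.append('&').append(encode(arg.getKey()))
               .append('=').append(encode(arg.getValue()));
        }
        return url.toString();
    }

    // Percent-encodes a single name or value for use in a query string.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

Calling build with the demo handler address, the verb ListRecords and a single metadataPrefix=oai_lom argument yields a URL that can be pasted into the browser just like the Identify example.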
The raw XML
By default, the resulting XML has a stylesheet attached for pretty rendering
To remove the stylesheet, comment out the line
    OAIHandler.styleSheet=testoai/oaicat.xsl
in the file oaicat.properties (in the WAR file or the web-app directory)
OAI XML example
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" ...>
  <responseDate>2007-06-11T06:48:58Z</responseDate>
  <request metadataPrefix="oai_lom" verb="ListRecords">http://localhost:8080/OAILREApp/oaiHandler</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:oai.xyz-repository.com:exercises/112553</identifier>
        <datestamp>2007-06-09T22:38:28Z</datestamp>
        <setSpec>exercises</setSpec>
      </header>
      <metadata>
        <lom xmlns=...> ... </lom>
      </metadata>
    </record>
    ....
    <resumptionToken expirationDate="2007-06-11T07:48:58Z" completeListSize="42" cursor="10">1181544538265</resumptionToken>
  </ListRecords>
</OAI-PMH>
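On the harvester side, the interesting part of such a response is often the resumptionToken element. The sketch below shows one way to extract it with the standard Java XML parser; the class ResumptionTokenReader is a hypothetical name, and the code assumes the response is a complete, well-formed document (unlike the abbreviated example above).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical harvester-side helper (not part of OAICat).
class ResumptionTokenReader {

    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";

    // Returns the token text, or null when the response carries no token
    // or an empty one (an empty token marks the last batch of a list).
    static String read(String responseXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // elements live in the OAI namespace
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(responseXml.getBytes(StandardCharsets.UTF_8)));
            NodeList tokens = doc.getElementsByTagNameNS(OAI_NS, "resumptionToken");
            if (tokens.getLength() == 0) {
                return null;
            }
            String token = ((Element) tokens.item(0)).getTextContent().trim();
            return token.isEmpty() ? null : token;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parseable OAI-PMH response", e);
        }
    }
}
```

A harvester would keep requesting the next batch while read returns a non-null value.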
OAICat - a Java implementation
OAICat home at http://www.oclc.org/research/software/oai/cat.htm
Takes care of
– web service details
– OAI XML specification
The implementer has to provide three classes
– RepositoryOAICatalog
– RepositoryRecordFactory
– Repository2oai_dc (lom, ...), usually more than one
A sample implementation
(Source code and libs in http://fire.eun.org/Iztok/OAILREApp.zip)
Create a new web module
Add servlet oaiHandler to web.xml
    <servlet>
        <servlet-name>LreOAIHandler</servlet-name>
        <servlet-class>ORG.oclc.oai.server.OAIHandler</servlet-class>
        <load-on-startup>5</load-on-startup>
    </servlet>
    <servlet-mapping>
        <servlet-name>LreOAIHandler</servlet-name>
        <url-pattern>/oaiHandler</url-pattern>
    </servlet-mapping>
(cont)
Define properties file location
    <context-param>
        <param-name>properties</param-name>
        <param-value>oaicat.properties</param-value>
    </context-param>
Welcome file for testing
    <welcome-file-list>
        <welcome-file>testoai/index.html</welcome-file>
    </welcome-file-list>
Sample record
A record with basic fields: id, url, title, descr and date
SampleOAICatalog contains an array with 3 sample records
SampleOAICatalog.listIdentifiers
Parameters
– from – date to harvest from (String in ISO 8601 format)
    date or datetime, depending on granularity
– to – date to harvest to (the until argument of the OAI request)
– set – a set name; list only records from this set (if null, list all records)
    set names classify objects in natural groups
    every record may belong to multiple sets (or none)
– metadataPrefix – list only records that support this format (sample formats: oai_dc, oai_lom, ...)
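The effect of the date and set parameters can be illustrated with a small, self-contained sketch. The class SelectiveHarvest and its nested Rec type are hypothetical stand-ins for SampleOAICatalog and SampleRecord; the upper date bound is named until here, after the OAI-PMH argument. The sketch assumes that all datestamps and bounds are UTC strings of the same granularity, so plain string comparison orders them correctly, and it leaves out the metadataPrefix check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the catalog of the slides; only the fields
// needed for selective harvesting are modelled.
class SelectiveHarvest {

    static final class Rec {
        final String id;
        final String datestamp;   // ISO 8601, UTC, e.g. 2007-06-09T22:38:28Z
        final List<String> sets;  // a record may belong to several sets or none

        Rec(String id, String datestamp, String... sets) {
            this.id = id;
            this.datestamp = datestamp;
            this.sets = Arrays.asList(sets);
        }
    }

    // from, until and set may each be null, meaning "no restriction".
    // Both date bounds are inclusive, as in OAI-PMH.
    static List<String> select(List<Rec> records, String from, String until, String set) {
        List<String> ids = new ArrayList<String>();
        for (Rec rec : records) {
            boolean afterFrom = from == null || rec.datestamp.compareTo(from) >= 0;
            boolean beforeUntil = until == null || rec.datestamp.compareTo(until) <= 0;
            boolean inSet = set == null || rec.sets.contains(set);
            if (afterFrom && beforeUntil && inSet) {
                ids.add(rec.id);
            }
        }
        return ids;
    }
}
```

A real catalog would usually push these conditions into a database query instead of scanning an in-memory list.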
SampleOAICatalog.listIdentifiers
Must return a map with two fields
– headers – a String iterator of OAI headers
– identifiers – a String iterator of OAI identifiers
Both are created by the call (rec is a SampleRecord)
    String[] header = getRecordFactory().createHeader(rec);
    headers.add(header[0]);
    identifiers.add(header[1]);
Create the result
    Map<String, Object> listIdMap = new HashMap<String, Object>();
    listIdMap.put("headers", headers.iterator());
    listIdMap.put("identifiers", identifiers.iterator());
    return listIdMap;
getRecordFactory().createHeader(rec)
Creates the header by calling the methods in SampleRecordFactory
String getOAIIdentifier(Object rec)
– returns the full OAI identifier, e.g. "oai:oay.rep.com:id001"
String getDatestamp(Object rec)
– returns the date in ISO 8601 format
Iterator<String> getSetSpecs(Object rec)
    ArrayList<String> list = new ArrayList<String>();
    list.add(...);
    return list.iterator();
Iterator<String> getAbouts(Object rec)
String fromOAIIdentifier(String id)
– helper method: converts the OAI identifier to a local id
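The two identifier methods are inverses of each other. A minimal sketch of that round trip follows; the class OaiIdentifiers is hypothetical, the prefix is simply copied from the example identifier on the slide, and returning null for a foreign identifier is an assumption of this sketch rather than documented OAICat behaviour.

```java
// Hypothetical sketch of the identifier helpers; the prefix is taken from the
// example identifier on the slide and would differ per repository.
class OaiIdentifiers {

    static final String PREFIX = "oai:oay.rep.com:";

    // Local id -> global OAI identifier.
    static String toOaiIdentifier(String localId) {
        return PREFIX + localId;
    }

    // Global OAI identifier -> local id, or null if the identifier does not
    // belong to this repository or has an empty local part.
    static String fromOaiIdentifier(String oaiId) {
        if (oaiId == null || !oaiId.startsWith(PREFIX)
                || oaiId.length() == PREFIX.length()) {
            return null;
        }
        return oaiId.substring(PREFIX.length());
    }
}
```

A catalog could treat a null result the same way as a record that is not found.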
SampleOAICatalog.listSets
takes no parameters, returns the list of all sets in this repository
– each ListIdentifiers or ListRecords query may contain a set name, limiting the results to just one set
SampleOAICatalog.getSchemaLocations
like GetRecord, but returns the Vector of all metadata schema locations the record supports
– to obtain them, just call
    getRecordFactory().getSchemaLocations(rec);
SampleOAICatalog.getRecord
String getRecord(String id, String metadataPrefix)
– find the record and convert it to an XML string (<record> element)
– id is in global format; to get the local value call
    getRecordFactory().fromOAIIdentifier(id)
– throw IdDoesNotExistException if the record is not found
– to generate the XML use constructRecord
    constructRecord(rec, metadataPrefix)
SampleOAICatalog.listRecords
just like listIdentifiers, but generates a list of XML <record> elements
returns a map with one element
    Map<String, Object> listRecMap = new HashMap<String, Object>();
    listRecMap.put("records", records.iterator());
    return listRecMap;
Crosswalks
Conversions of the native record type to XML, like Sample2oai_lom or Sample2oai_dc
Only two methods per implementation
– boolean isAvailableFor(Object rec)
– String createMetadata(Object rec)
    SampleRecord record = (SampleRecord) rec;
    return LOMFormat.writeStringWithSchema(record.toLOM());
Throw CannotDisseminateFormatException if the metadata is not available in this format
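For the Dublin Core case, the crosswalk idea can be sketched without any library. The class SampleDcWriter below is a hypothetical stand-in for Sample2oai_dc: it does not extend the OAICat Crosswalk class, it takes the record fields as plain strings, and the mapping of url, descr and date to dc:identifier, dc:description and dc:date is an assumption made for illustration. An IllegalArgumentException stands in for CannotDisseminateFormatException.

```java
// Hypothetical stand-in for a Sample2oai_dc crosswalk (no OAICat dependency).
class SampleDcWriter {

    // Escapes the characters that must not appear raw in XML text content.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // In this sketch a record without a title is not available in oai_dc.
    static boolean isAvailableFor(String title) {
        return title != null && !title.isEmpty();
    }

    static String createMetadata(String url, String title, String descr, String date) {
        if (!isAvailableFor(title)) {
            throw new IllegalArgumentException("record cannot be disseminated as oai_dc");
        }
        StringBuilder xml = new StringBuilder();
        xml.append("<oai_dc:dc")
           .append(" xmlns:oai_dc=\"http://www.openarchives.org/OAI/2.0/oai_dc/\"")
           .append(" xmlns:dc=\"http://purl.org/dc/elements/1.1/\"")
           .append(" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"")
           .append(" xsi:schemaLocation=\"http://www.openarchives.org/OAI/2.0/oai_dc/")
           .append(" http://www.openarchives.org/OAI/2.0/oai_dc.xsd\">");
        element(xml, "dc:title", title);
        element(xml, "dc:description", descr);
        element(xml, "dc:identifier", url);
        element(xml, "dc:date", date);
        return xml.append("</oai_dc:dc>").toString();
    }

    // Appends <name>value</name>, skipping fields the record does not have.
    private static void element(StringBuilder xml, String name, String value) {
        if (value != null) {
            xml.append('<').append(name).append('>')
               .append(escape(value))
               .append("</").append(name).append('>');
        }
    }
}
```

Building the XML by hand keeps the sketch short; a library such as LOM-j, as used on the slides, handles serialization and schema details automatically.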
SampleRecord.toLOM
uses the LOM-j lib to quickly hack together LOM: http://sourceforge.net/projects/lom-j/
– automatic serialization/deserialization of LOM and DC XML formats
Example
    lom.newGeneral().newIdentifier(0).newCatalog().setString("lre");
    lom.newGeneral().newIdentifier(0).newEntry().setString("sample:" + id);
    lom.newTechnical().newLocation(-1).setString(url);
    lom.newGeneral().newTitle().newString(0).newLanguage().setValue("en");
    lom.newGeneral().newTitle().newString(0).setString(title);
Resumption
A repository usually has a fixed limit on the number of records returned in one call
– if more are available, it returns a resumption token that lets the harvester request the next batch
– implemented by the functions
    listIdentifiers(String resumptionToken), listRecords(String resumptionToken)
– see XYZOAICatalog for details
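The mechanism can be sketched independently of OAICat. The class ResumptionPager below is hypothetical and is not taken from XYZOAICatalog, which may work differently: it remembers, for every token it hands out, the position where the next batch starts. Token expiry is left out, and a token stays usable so that a harvester can re-issue a failed request.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical paging helper; XYZOAICatalog from the slides may work differently.
class ResumptionPager {

    // One batch of items plus the token for the next batch (null on the last batch).
    static final class Page {
        final List<String> items;
        final String resumptionToken;

        Page(List<String> items, String resumptionToken) {
            this.items = items;
            this.resumptionToken = resumptionToken;
        }
    }

    private final List<String> all;
    private final int pageSize;
    private final Map<String, Integer> cursors = new HashMap<String, Integer>();
    private int nextToken = 1;

    ResumptionPager(List<String> all, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.all = all;
        this.pageSize = pageSize;
    }

    // First call of a list request: start at the beginning.
    Page first() {
        return pageFrom(0);
    }

    // Follow-up call: continue where the batch that issued the token ended.
    Page resume(String token) {
        Integer cursor = cursors.get(token);
        if (cursor == null) {
            throw new IllegalArgumentException("bad resumption token: " + token);
        }
        return pageFrom(cursor);
    }

    private Page pageFrom(int start) {
        int end = Math.min(start + pageSize, all.size());
        List<String> items = new ArrayList<String>(all.subList(start, end));
        String token = null;
        if (end < all.size()) {          // more items left: hand out a token
            token = String.valueOf(nextToken++);
            cursors.put(token, end);
        }
        return new Page(items, token);
    }
}
```

With five identifiers and a page size of two, first() returns two items and a token, the next resume call returns two more, and the final call returns the last item with no token.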
References
http://www.openarchives.org/OAI/openarchivesprotocol.html
http://www.fmf.uni-lj.si/~kavkler/
http://www.oclc.org/research/software/oai/cat.htm
http://www.cs.kuleuven.ac.be/~hmdb/SqiOaiMelt
http://sourceforge.net/projects/lom-j/
SIO/Trubar OAI URL: http://sio.edus.si/LreTomcat/