Harvesting and Resolution Methods for Building OAI-based Services
OCLC Online Computer Library Center
Jeffrey A. Young
CERN OAI3 Workshop #4
Geneva, Switzerland
14 February 2004
Introductions
Name
Affiliation
Plans
Needs
Technical experience
Review OAI-PMH Protocol
Identify
ListSets
ListMetadataFormats
ListRecords
ListIdentifiers
GetRecord
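Each of these verbs is an HTTP request to a repository's base URL, with the verb and its arguments passed as query parameters. The minimal Python sketch below builds such GET URLs; the base URL is a hypothetical placeholder, while the argument names (metadataPrefix, identifier, set, from, until, resumptionToken) are the ones OAI-PMH 2.0 defines. It checks only for illegal arguments, not for missing required ones.

```python
from urllib.parse import urlencode

# The six OAI-PMH 2.0 verbs and the optional/required arguments each accepts.
VERB_ARGS = {
    "Identify": set(),
    "ListSets": {"resumptionToken"},
    "ListMetadataFormats": {"identifier"},
    "ListRecords": {"metadataPrefix", "set", "from", "until", "resumptionToken"},
    "ListIdentifiers": {"metadataPrefix", "set", "from", "until", "resumptionToken"},
    "GetRecord": {"identifier", "metadataPrefix"},
}

def oai_request_url(base_url: str, verb: str, **args: str) -> str:
    """Build an OAI-PMH GET URL, rejecting unknown verbs and illegal arguments.

    Required-argument checks (e.g. metadataPrefix for ListRecords) are omitted.
    """
    if verb not in VERB_ARGS:
        raise ValueError(f"unknown verb: {verb}")
    # 'from' is a Python keyword, so callers pass it as from_.
    if "from_" in args:
        args["from"] = args.pop("from_")
    illegal = set(args) - VERB_ARGS[verb]
    if illegal:
        raise ValueError(f"illegal argument(s) for {verb}: {sorted(illegal)}")
    return base_url + "?" + urlencode({"verb": verb, **args})
```

For example, oai_request_url("http://example.org/oai", "GetRecord", identifier="oai:example.org:1", metadataPrefix="oai_dc") percent-encodes the colons in the identifier, as the protocol requires for reserved characters.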
Find Repositories to Harvest
http://www.openarchives.org/Register/BrowseSites.pl
http://oai.dlib.vt.edu/cgi-bin/Explorer/oai2.0/testoai
http://oai.grainger.uiuc.edu/registry/
Friends lists (repositories named in another repository's “friends” description)
Communities (e.g. www.ndltd.org)
Exercise: Getting Started
What are your data sources?
How will you add value?
Who will design the system?
Who will create/operate the software?
Who will create/maintain the data?
Who will advocate for it politically?
Who will benefit?
Who will pay?
Metadata
Metadata is data about data
Metadata formats: two extremes
– Dublin Core
– MARC
Metadata can be relative
– Who created this document?
– Who created the metadata about this document?
Keep in mind, though, that OAI-PMH works just as well for sharing any XML content, not just metadata
XML/DTD/XSD/XSL
XML - eXtensible Markup Language
DTD - Document Type Definition
XSD - XML Schema Definition
XSL - eXtensible Stylesheet Language
eXtensible Markup Language
Meta-markup language
HTML – HyperText Markup Language
XHTML – eXtensible HyperText Markup Language
XML Overview
Well-formed XML
XML Namespaces
Valid XML
– DTDs
– XML Schemas
OAI Items vs. Records
– Item identifiers
– Multiple metadata record representations
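Well-formedness is purely syntactic, so any XML parser can check it; validity additionally needs a DTD or XML Schema plus a validating tool (such as XSV), which Python's standard library does not include. A minimal sketch of the well-formedness check, assuming only xml.etree.ElementTree. Because that parser is namespace-aware, an undeclared prefix also counts as an error.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """True if text parses as XML. This says nothing about validity,
    which requires checking against a DTD or XML Schema."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False
```

A harvester can run this check on every response before trying to extract records, so that one malformed page is reported instead of silently aborting the harvest.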
XML Namespaces
Ambiguous XML elements
– <wind>NNE</wind>
– <wind>Clockwise</wind>
Prefixes help identify and differentiate elements
– <weather:wind>SE</weather:wind>
– <toy:wind>Widdershins</toy:wind>
But prefixes are arbitrary and potentially ambiguous, so what we really need is a URI (i.e., a prefix is only a local shorthand for its namespace URI)
– <weather:wind xmlns:weather="someURI">NW</weather:wind>
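A namespace-aware parser makes the point concrete: it discards the prefix and identifies each element by its namespace URI plus local name. In this Python sketch the URIs are hypothetical stand-ins for the slide's "someURI"; Python's ElementTree reports the expanded name in {URI}localname form.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs standing in for "someURI".
WEATHER_NS = "http://example.org/weather"
TOY_NS = "http://example.org/toys"

# Same URI, different arbitrary prefixes.
doc_a = f'<weather:wind xmlns:weather="{WEATHER_NS}">NW</weather:wind>'
doc_b = f'<w:wind xmlns:w="{WEATHER_NS}">SE</w:wind>'
# Same local name, different URI.
doc_c = f'<toy:wind xmlns:toy="{TOY_NS}">Widdershins</toy:wind>'

def expanded_name(xml_text: str) -> str:
    """Return the root element's name as {namespace-URI}localname."""
    return ET.fromstring(xml_text).tag

print(expanded_name(doc_a))  # {http://example.org/weather}wind
print(expanded_name(doc_b))  # {http://example.org/weather}wind
print(expanded_name(doc_c))  # {http://example.org/toys}wind
```

Harvesting code should therefore match elements by URI, never by prefix, because two repositories may legitimately choose different prefixes for the same format.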
XML Schema Definition
Defines what an XML document contains
– XHTML
– oai_dc
– MARC21 XML
What is our “item”?
Work – a distinct intellectual or artistic creation
– J.S. Bach’s The art of the fugue
Expression – the intellectual or artistic realization of a work
– The composer’s score for organ
– An arrangement for chamber orchestra by Anthony Lewis
Manifestation – the physical embodiment of an expression of a work
– CD, printed score, multimedia kit, etc.
Item – A single exemplar of a manifestation
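These four levels (the Group 1 entities of the FRBR model) form a one-to-many hierarchy. Which level an OAI item stands for determines what a single OAI identifier, and each of its metadata records, actually describes. The Python sketch below models the hierarchy as dataclasses and fills it with the slide's Bach example. The class and field names, and the manifestation and item details, are illustrative assumptions, not part of OAI-PMH or FRBR.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Item:
    """A single exemplar of a manifestation (one particular copy)."""
    label: str

@dataclass
class Manifestation:
    """A physical embodiment of an expression: CD, printed score, etc."""
    carrier: str
    items: List[Item] = field(default_factory=list)

@dataclass
class Expression:
    """An intellectual or artistic realization of a work."""
    description: str
    manifestations: List[Manifestation] = field(default_factory=list)

@dataclass
class Work:
    """A distinct intellectual or artistic creation."""
    title: str
    expressions: List[Expression] = field(default_factory=list)

# The slide's example; manifestation and item details are hypothetical.
fugue = Work("J.S. Bach: The art of the fugue", [
    Expression("The composer's score for organ",
               [Manifestation("printed score", [Item("library copy 1")])]),
    Expression("Arrangement for chamber orchestra by Anthony Lewis",
               [Manifestation("CD", [Item("copy A"), Item("copy B")])]),
])

def count_items(work: Work) -> int:
    """Count every exemplar reachable from a work."""
    return sum(len(m.items) for e in work.expressions for m in e.manifestations)
```

If an OAI item is defined at the Work level, a single record must summarize all of these branches. If it is defined at the Item level, the same work reappears under many identifiers.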
Exercise: Data Definition
Design a metadata format for items in your project
– List the elements you need
– Consider the encoding rules
– Consider using controlled vocabularies
Assign an XML namespace
Map a crosswalk to Dublin Core
Create a sample item with both formats
– Consider assigning OAI sets
Report issues, problems, and concerns
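A crosswalk is essentially a field-to-element mapping applied when a record is written out in the second format. The Python sketch below maps a hypothetical local format (the field names in CROSSWALK are invented for illustration) to an oai_dc record, using the oai_dc and Dublin Core element namespaces that OAI-PMH 2.0 specifies for the oai_dc format. Fields without a mapping are dropped, which shows why Dublin Core is called the lowest common denominator.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)
ET.register_namespace("xsi", XSI_NS)

# Hypothetical local field -> Dublin Core element.
CROSSWALK = {
    "work_title": "title",
    "composer": "creator",
    "arranger": "contributor",
    "genre": "subject",
    "issued": "date",
}

def to_oai_dc(record: dict) -> str:
    """Map a local record (field -> list of values) to an oai_dc XML string.

    Unmapped local fields are silently dropped (the crosswalk is lossy).
    """
    root = ET.Element(f"{{{OAI_DC_NS}}}dc", {
        f"{{{XSI_NS}}}schemaLocation":
            f"{OAI_DC_NS} http://www.openarchives.org/OAI/2.0/oai_dc.xsd"})
    for local_field, dc_element in CROSSWALK.items():
        for value in record.get(local_field, []):
            ET.SubElement(root, f"{{{DC_NS}}}{dc_element}").text = value
    return ET.tostring(root, encoding="unicode")
```

Writing the mapping down as data, rather than burying it in code, makes it easy to review during the exercise and to report which local elements have no Dublin Core equivalent.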
Exercise: A Simple Harvester
XOAIHarvester – a simple harvester written in XSLT
http://errol.oclc.org/oai:xmlregistry.oclc.org:xoai/xoaiharvester.xsl
The accompanying Perl script manages incremental harvesting, i.e., requesting only the records added or changed since the previous harvest
Caveat! OAI is merely the first step. Once data is harvested, OAI provides absolutely no guidance for doing something useful with it.
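The bookkeeping behind incremental harvesting can be sketched in Python. The harvester passes the datestamp of the last successful harvest as the OAI-PMH from argument and keeps following resumptionTokens until the repository returns an empty one. This is a minimal sketch under those assumptions, not a port of XOAIHarvester or its Perl script. The fetch parameter is a hypothetical hook that lets the network call be replaced (for example, in tests), and storing the last-harvest date is left to the caller.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def http_fetch(url: str) -> bytes:
    with urlopen(url, timeout=60) as resp:
        return resp.read()

def harvest(base_url: str, metadata_prefix: str = "oai_dc",
            from_date: Optional[str] = None,
            fetch: Callable[[str], bytes] = http_fetch) -> Iterator[ET.Element]:
    """Yield <record> elements from ListRecords, following resumptionTokens.

    from_date (e.g. the datestamp of the previous harvest) limits the
    response to records whose datestamp is on or after that date.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        error = root.find(OAI_NS + "error")
        if error is not None:
            if error.get("code") == "noRecordsMatch":
                return  # nothing new since from_date; not a failure
            raise RuntimeError(f"{error.get('code')}: {error.text}")
        list_records = root.find(OAI_NS + "ListRecords")
        if list_records is None:
            raise RuntimeError("response has neither ListRecords nor error")
        yield from list_records.findall(OAI_NS + "record")
        token = list_records.find(OAI_NS + "resumptionToken")
        if token is None or not (token.text or "").strip():
            return  # an absent or empty token marks the last batch
        # A resumed request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Treating noRecordsMatch as an empty result matters for incremental harvesting: it is the normal answer when nothing has changed since the last run.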
Concerns
Data quality
Duplicates
Intellectual Property Rights (IPR)
The appropriate copy problem
Persistence
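Duplicates often arise when the same resource is exposed by several repositories, or by one repository under several identifiers, so comparing OAI identifiers alone will not catch them. One crude heuristic is to group records by a normalized key built from title and creator. The Python sketch below is offered as an assumption for illustration, not as a recommended method. The record dictionaries and their keys are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def _norm(s: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    s = unicodedata.normalize("NFKD", s)
    s = "".join(c for c in s if not unicodedata.combining(c))
    s = re.sub(r"[^\w\s]", " ", s.lower())
    return " ".join(s.split())

def match_key(title: str, creator: str) -> Tuple[str, str]:
    return _norm(title), _norm(creator)

def find_duplicates(records: Iterable[dict]) -> List[List[str]]:
    """Return groups of identifiers whose records share a match key."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for rec in records:
        groups[match_key(rec["title"], rec["creator"])].append(rec["identifier"])
    return [ids for ids in groups.values() if len(ids) > 1]
```

A key this coarse will also merge genuinely different items, such as two editions of the same work, which is exactly the Work/Expression/Manifestation ambiguity discussed earlier.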
Repository Variables
metadataPrefix
– oai_dc – the lowest common denominator
Set
– Hierarchical
– Allows selective harvesting
– Works best with community agreement
– Client warrant
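In OAI-PMH a set is named by a setSpec, and the hierarchy is written with colons (for example, music:scores sits below music). The Python sketch below assumes that membership in a child set implies membership in each ancestor set, so a selective harvest of music would also cover records in music:scores. The set names are hypothetical examples.

```python
from typing import Iterable, List

def in_requested_set(record_setspecs: Iterable[str], requested: str) -> bool:
    """True if any of a record's setSpecs equals, or is nested under, the requested set."""
    return any(spec == requested or spec.startswith(requested + ":")
               for spec in record_setspecs)

def ancestors(setspec: str) -> List[str]:
    """All enclosing sets of a setSpec, outermost first: 'a:b:c' -> ['a', 'a:b']."""
    parts = setspec.split(":")
    return [":".join(parts[:i]) for i in range(1, len(parts))]
```

Comparing whole colon-separated segments, not raw string prefixes, keeps a set such as musicology from being mistaken for part of music.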
Exercise: Select/Create Tools
http://www.oaforum.org/oaf_db/list_db/list_software.php
http://www.openarchives.org/tools/tools.html
http://www.cs.cornell.edu/people/simeon/software/utf8conditioner/
http://harvest.physik.uni-oldenburg.de/dc/index.html
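Harvested responses are not always clean UTF-8, and XML 1.0 also forbids most C0 control characters even when they are correctly encoded. A single bad byte can make a whole response unparseable. The utf8conditioner tool listed above addresses this problem. The Python sketch below is an independent, simplified illustration of the same idea and is not a port of that tool.

```python
import re

# Characters never allowed in an XML 1.0 document: C0 controls other than
# tab, LF and CR, plus U+FFFE and U+FFFF. (Lone surrogates cannot appear here,
# because decoding with errors="replace" never produces them.)
_ILLEGAL_XML_CHARS = re.compile("[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")

def condition_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, replacing invalid sequences with U+FFFD,
    then delete characters that XML 1.0 forbids."""
    text = raw.decode("utf-8", errors="replace")
    return _ILLEGAL_XML_CHARS.sub("", text)
```

Replacing undecodable bytes, rather than dropping them, leaves a visible U+FFFD marker, so data-quality problems can still be traced back to the source repository.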
An Alternative Service Model
ERRoLs (Extensible Repository Resource Locators) are URLs to content and services related to repositories in the OAI Registry at UIUC
http://errol.oclc.org/
Discussion
Issues, Problems, Concerns?
Music Services
Organizational issues
Cultural issues
Collection policies
Best practices
Consensus-building
Controlled vocabularies
– http://alcme.oclc.org/gsafd/
Do items represent digital and/or physical entities?
Authority control
Repository Descriptors
Repository-level “description” elements
– oai-identifier description – identifier layout
– eprints description – content & policies
– friends description – discover repositories
– branding description – branding information
– olac-archive description – archive info
Record-level “about” elements
– Rights statements
– Provenance statements
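Repository-level descriptions appear as children of description elements in the Identify response, each in its own namespace. The Python sketch below shows how a harvester could use the friends description to discover more repositories. It assumes the friends schema's namespace (http://www.openarchives.org/OAI/2.0/friends/) and its baseURL children. The sample response is hypothetical and trimmed: a real Identify response also carries required elements such as baseURL and protocolVersion.

```python
import xml.etree.ElementTree as ET
from typing import List

OAI = "{http://www.openarchives.org/OAI/2.0/}"
FRIENDS = "{http://www.openarchives.org/OAI/2.0/friends/}"

def friend_base_urls(identify_xml: str) -> List[str]:
    """Collect <baseURL> values from every friends description in an Identify response."""
    root = ET.fromstring(identify_xml)
    urls: List[str] = []
    for desc in root.iter(OAI + "description"):
        for friends in desc.findall(FRIENDS + "friends"):
            urls.extend(u.text.strip() for u in friends.findall(FRIENDS + "baseURL") if u.text)
    return urls

# Hypothetical, trimmed Identify response.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <description>
      <friends xmlns="http://www.openarchives.org/OAI/2.0/friends/">
        <baseURL>http://oai.example.org/a</baseURL>
        <baseURL>http://oai.example.org/b</baseURL>
      </friends>
    </description>
  </Identify>
</OAI-PMH>"""
```

Repeatedly issuing Identify against each newly found base URL and collecting their friends in turn produces a simple crawl of the repository network, complementing the registries listed earlier.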
XSLT Overview
[Diagram: an XSLT Processor applies an XSLT Stylesheet to an input XML Document and produces an output XML Document]
Validate Repositories
http://www.openarchives.org/data/registerasprovider.html
http://oai.dlib.vt.edu/cgi-bin/Explorer/oai2.0/testoai
http://www.w3.org/2001/03/webdata/xsv
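Alongside full schema validation with tools like those above, a harvester can cheaply reject responses that are not OAI-PMH 2.0 at all. This minimal Python sketch checks well-formedness, the OAI-PMH root element, and the required responseDate and request children, and it reports any protocol error codes. It is not a substitute for validating against the OAI-PMH XML Schema.

```python
import xml.etree.ElementTree as ET
from typing import List

OAI_NS = "http://www.openarchives.org/OAI/2.0/"

def check_oai_response(xml_text: str) -> List[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    if root.tag != f"{{{OAI_NS}}}OAI-PMH":
        return [f"unexpected root element: {root.tag}"]
    problems = []
    for required in ("responseDate", "request"):
        if root.find(f"{{{OAI_NS}}}{required}") is None:
            problems.append(f"missing <{required}>")
    for err in root.findall(f"{{{OAI_NS}}}error"):
        problems.append(f"OAI error {err.get('code')}: {(err.text or '').strip()}")
    return problems
```

Returning a list of problems, rather than raising on the first one, lets a registry-style checker report everything wrong with a repository in a single pass.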
Example Service Providers
ARC - A Cross Archive Search Service (experimental research service)
http://arc.cs.odu.edu/
Dokumenten- und Publikationsserver der Humboldt-Universität zu Berlin (Document and Publication Server of the Humboldt University of Berlin; search service, German-language user interface)
http://edoc.hu-berlin.de/oaisearch/
iCite (citation index)
http://icite.sissa.it/
NCSTRL - Networked Computer Science Technical Reference Library (search engine)
http://www.ncstrl.org/
my.OAI (value-added search interface to a selected list of metadata databases)
http://www.myoai.com/
Physnet (simple search interface to an experimental OAI harvester)
http://physnet.uni-oldenburg.de/oai/query.php
ProPrint (print-on-demand service with German and English user interfaces)
http://www.proprint-service.de/
Resources
http://www.openarchives.org/
http://www.oaforum.org/
Everything you need to know
http://www.oaforum.org/otherfiles/oaf_d23_technical2.pdf