Using OAI-PMH to Aggregate Metadata Describing Cultural Heritage Resources
Timothy W. Cole ([email protected])
University of Illinois at Urbana-Champaign
http://dli.grainger.uiuc.edu/Publications/TWCole/ALA2003OAI/
ALA/CLA Annual Meeting, 22 June 2003
Toronto, CA
22 June 2003 ALA 2003 / OAI-PMHTim Cole (t-
Order of Presentation
Perspectives on OAI-PMH
Illinois OAI metadata harvesting project
  Goals & objectives
  Findings regarding metadata
  Findings regarding search & discovery
New OAI projects at Illinois
  IMLS digital collections & content
  CIC OAI metadata harvesting project
OAI Protocol for Metadata Harvesting
Harvesting approach to interoperability at the metadata level
Divides the world into Metadata Providers & Service Providers
Builds on HTTP, XML, & Dublin Core
http://www.openarchives.org/
OAI Antecedents
Call to other E-Print archives (July 1999)
Paul Ginsparg, Rick Luce, & Herbert Van de Sompel: “…mobilize core group to work towards achieving a universal service for author self-archived scholarly literature.”
Santa Fe Mtgs. (Oct. 1999 & June 2000)
OAI-PMH version history:
  First Alpha Release, Sept. 2000
  1.0 (Beta) Release, January 2001
  1.1 (Beta 2) Release, July 2001
  2.0 (Production) Release, June 2002
Original OAI Organization
OAI Executive: Carl Lagoze & Herbert Van de Sompel
OAI Steering Committee: Co-Chairs: Dan Greenstein, Cliff Lynch
OAI Technical Committee
Funded by NSF, DLF & CNI
Seeks to be user community driven
OAI-PMH as a tool
All about moving metadata around
Designed to be a building block, useable by many different communities
Can facilitate (in some cases enable) services & functions
Assumes widely distributed content, but centralized indexing(!) & services
Build once, use for many applications
Focus of OAI is interoperability
Harvesting vs. Broadcast
Competing approaches to interoperability
Distributed/Broadcast searching: search and discovery over remote services and data
Harvesting is when data/metadata is transferred from the remote source to the destination where search & discovery services are located (e.g. Union catalogs)
OAI-PMH is a harvesting protocol
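The harvesting model can be sketched as follows: records are first copied from each source into one local store, and search then runs only against that store. This is a minimal illustration, not the project's software. The `PROVIDERS` data, the record fields, and the function names are all assumptions, and plain dictionaries stand in for remote repositories so the sketch runs offline.

```python
from collections import defaultdict

# Hypothetical stand-ins for remote repositories: each maps a record
# identifier to a small metadata record. Real providers would be reached
# over HTTP; plain dicts keep this sketch offline and runnable.
PROVIDERS = {
    "museum": {"m1": {"title": "American woven coverlet"}},
    "library": {
        "l1": {"title": "Cotton coverlet with butterfly design"},
        "l2": {"title": "Sheet music collection"},
    },
}


def harvest(providers):
    """Copy every record from every provider into one local store."""
    local = {}
    for name, records in providers.items():
        for rec_id, record in records.items():
            local[f"{name}:{rec_id}"] = record
    return local


def build_index(store):
    """Build a keyword -> record-id index over the harvested titles."""
    index = defaultdict(set)
    for rec_id, record in store.items():
        for word in record["title"].lower().split():
            index[word].add(rec_id)
    return index


def search(index, keyword):
    """Answer a query from the central index; no provider is contacted."""
    return sorted(index.get(keyword.lower(), set()))


store = harvest(PROVIDERS)
index = build_index(store)
print(search(index, "coverlet"))  # ['library:l1', 'museum:m1']
```

Because the search never touches the providers, results are only as fresh as the last harvest, which is the trade-off against broadcast searching.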
As Compared to Z39.50
                        Z39.50            OAI
Content (Objects)       Distributed       Distributed
World View              Bibliographic     Bibliographic
Object Presentation     Data provider     Data provider
Searching is            Distributed       Centralized
Search done by          Data provider     Service provider
Metadata searched is    Up to date        Stale
Semantic Mapping        When searching    Metadata delivery
Metadata vs. Resources
Resource refers to information objects or digital representations of information objects
Metadata item is a collection of properties about a resource (e.g. title, author, etc.)
Metadata record is a metadata item expressed in a specific syntax according to an XSD
OAI focuses on metadata, with the implicit understanding that metadata contains useful links to the source information object(s)
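To make the three terms concrete, the sketch below parses one metadata record (unqualified Dublin Core in the `oai_dc` XML format) into a metadata item, then pulls out the identifier values that link back to the resource. The record content, the `example.org` URL, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set used inside oai_dc records.
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical metadata record; the values are invented for this sketch.
RECORD = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>American woven coverlet</dc:title>
  <dc:type>physical object</dc:type>
  <dc:identifier>http://example.org/items/42</dc:identifier>
</oai_dc:dc>
"""


def parse_item(xml_text):
    """Turn a metadata record (XML) into an item: {element: [values]}."""
    root = ET.fromstring(xml_text)
    item = {}
    for child in root:
        if child.tag.startswith("{" + DC + "}"):
            name = child.tag.split("}", 1)[1]
            item.setdefault(name, []).append((child.text or "").strip())
    return item


def resource_links(item):
    """Pick out identifier values that look like URLs to the resource."""
    return [
        value
        for value in item.get("identifier", [])
        if value.startswith(("http://", "https://"))
    ]


item = parse_item(RECORD)
print(item["title"])         # ['American woven coverlet']
print(resource_links(item))  # ['http://example.org/items/42']
```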
When to use OAI-PMH
Metadata is sufficient for services desired
Normalization, deduping, metadata augmentation desired
Content is widely distributed across small, non-Z39.50 enabled repositories
OAI-PMH is more lightweight than Z39.50
Portals can use BOTH Z39.50 & OAI-PMH
What OAI-PMH Is Not
Not search & discovery on its own
Not a database management system
Not a single metadata schema
Not OAIS
How OAI Works
OAI “VERBS”
Identify
ListMetadataFormats
ListSets
ListIdentifiers
ListRecords
GetRecord
HARVESTER (OAI Service Provider) -> HTTP Request (OAI Verb) -> REPOSITORY (OAI Metadata Provider)
REPOSITORY (OAI Metadata Provider) -> HTTP Response (Valid XML) -> HARVESTER (OAI Service Provider)
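On the harvester side, each request is an ordinary HTTP URL carrying a `verb` parameter plus any verb-specific arguments. A minimal sketch of building such URLs is below. The base URL and the record identifier are hypothetical, and `build_request` is an illustrative helper, not part of any library. The sketch only checks the verb name and does not validate which arguments each verb accepts.

```python
from urllib.parse import urlencode

# The six OAI-PMH verbs listed on the slide.
VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}

# Hypothetical repository base URL, used only for illustration.
BASE_URL = "http://example.org/oai"


def build_request(base_url, verb, **arguments):
    """Return the request URL for one OAI-PMH verb and its arguments."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


print(build_request(BASE_URL, "Identify"))
print(build_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
print(build_request(BASE_URL, "GetRecord",
                    identifier="oai:example.org:42", metadataPrefix="oai_dc"))
```

Fetching any of these URLs from a real repository should return an XML document, which the harvester then parses.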
OAI Provider Architectures
Descriptive Metadata sources: DBMS, XML, HTML <meta>
OAI Administrative Metadata, e.g., IDs, datestamps, sets, formats
OAI Application (CGI, ASP, PHP, etc.) behind a Webserver (HTTP)
OAI Harvesters
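The "OAI Application" layer can be pictured as a function that takes the request parameters handed over by the web server and returns an XML response. The sketch below is deliberately incomplete and assumes a hypothetical repository name and base URL. It answers only `Identify`, leaves out several elements a conformant Identify response requires (for example adminEmail, earliestDatestamp, deletedRecord, granularity), and reports every other verb as an error.

```python
from datetime import datetime, timezone
from xml.sax.saxutils import escape

# Hypothetical repository settings; none of these come from the slides.
REPOSITORY_NAME = "Example Cultural Heritage Repository"
BASE_URL = "http://example.org/oai"


def envelope(body):
    """Wrap a response body in the OAI-PMH root element."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">\n'
        f"  <responseDate>{now}</responseDate>\n"
        f"{body}"
        "</OAI-PMH>\n"
    )


def handle(params):
    """Dispatch on the verb parameter and return the XML response text."""
    if params.get("verb") == "Identify":
        body = (
            f'  <request verb="Identify">{escape(BASE_URL)}</request>\n'
            "  <Identify>\n"
            f"    <repositoryName>{escape(REPOSITORY_NAME)}</repositoryName>\n"
            f"    <baseURL>{escape(BASE_URL)}</baseURL>\n"
            "    <protocolVersion>2.0</protocolVersion>\n"
            "  </Identify>\n"
        )
    else:
        # This sketch implements no other verb, so everything else is
        # reported with the protocol's badVerb error code.
        body = (
            f"  <request>{escape(BASE_URL)}</request>\n"
            '  <error code="badVerb">Illegal or missing verb</error>\n'
        )
    return envelope(body)


print(handle({"verb": "Identify"}))
```

In a real provider, the descriptive metadata would come from the DBMS, XML files, or HTML `<meta>` tags, with the administrative metadata (IDs, datestamps, sets, formats) layered on top.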
A few projects using OAI-PMH
Basic building block of the National Science Digital Library
Large-scale implementations in E-Prints, OLAC, NDLTD, …
Built into ENCompass, ContentDM, Michigan’s DLXS, D-Space, and other products
Open Archives Forum in Europe; will be part of federation activities in the UK and EU
Univ. of Illinois OAI Metadata Harvesting Project
Funded by Andrew W. Mellon Foundation (July 2001 – May 2003)
Primary objectives:
  Develop & make available OAI harvesting tools
  Build search services for aggregated metadata in the domain of cultural heritage
  Examine metadata aggregation issues, including use of EAD in OAI context
  Investigate utility of aggregated metadata, including preliminary testing with end-users
Type of resources
39 data providers: academic libraries, museums / cultural orgs, digital libraries, public library
1.1 million original DC records + 1.5 million derived from EAD
Images               25%
Text & Sheet Music   50%
Artifact             20%
Other                 5%
Variations in DC element usage
Records containing subject & description element
                                                            SUBJECT   DESCRIPTION
Digital libraries (10 total, 122,719 records)               78%       36%
Museums, hist. societies, etc. (6 total, 255,800 records)   93%       93%
Academic libraries (7 total, 235,294 records)               15%       13%
Many different controlled and local vocabularies in use
Granularity: a record may describe a collection of coins, or one coin
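Usage figures like these can be computed by counting, per element, how many harvested records carry a non-empty value. The sketch below shows one way to do that. The sample records and the `element_usage` name are invented, and this is not the project's actual analysis code.

```python
def element_usage(records, elements=("subject", "description")):
    """Return the percentage of records that carry each element."""
    total = len(records)
    usage = {}
    for element in elements:
        # A record counts only if the element is present and non-empty.
        present = sum(1 for record in records if record.get(element))
        usage[element] = round(100 * present / total) if total else 0
    return usage


# Hypothetical harvested records, reduced to plain dictionaries.
SAMPLE = [
    {"title": "Coin hoard", "subject": ["Coins"], "description": "A collection of coins"},
    {"title": "Single coin", "subject": ["Coins"]},
    {"title": "Coverlet", "subject": ["Textiles"], "description": "Cotton coverlet"},
    {"title": "Untitled"},
]

print(element_usage(SAMPLE))  # {'subject': 75, 'description': 50}
```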
Excerpt of a metadata record describing a cotton coverlet
Description: Digital image of a single-sized cotton coverlet for a bed with embroidered butterfly design. Handmade by Anna F. Ginsberg Hayutin.
Source: Materials: cotton and embroidery floss. Dimensions: 71 in. x 86 in. Markings: top right hand corner has 1 1/2 in. x 1/2 in. label cut outs at upper left and right hand side for head board; fabric is woven in a variation of a rib weave; color each of yellow and gray; hand-embroidered cotton butterflies and flowers from two shades of each color of embroidery floss - blue, pink, green and purple and single top 20 in. bordered with blue and black cotton embroidery thread; stitches used for embroidery: running stitch, chain stitch, French knot and back stitches; selvage edges left unfinished; lower edges turned under and finished with large gray running stitches made with embroidery floss.
Format: Epson Expression 836 XL Scanner with Adobe Photoshop version 5.5; 300 dpi; 21-53K bytes. Available via the World Wide Web.
Coverage: —
Date Created: 2001-09-19 09:45:18; Updated: 20011107162451; Created: 2001-04-05; Created: 1912-1920?
Type: Image
Excerpt of a metadata record describing "American woven coverlet"
Description: Materials: Textile--Multi, Pigment—Dye; Manufacturing Process: Weaving--Hand, Spinning, Dyeing, Hand-loomed blue wool and white linen coverlet, worked in overshot weave in plain geometric variant of a checkerboard pattern.Coverlet is constructed from finely spun, indigo-dyed wool and undyed linen, woven with considerable skill. Although the pattern is simpler, the overall craftsmanship is higher than 1934.01.0094A. - D. Schrishuhn, 11/19/99 This coverlet is an example of early "overshot" weaving construction, probably dating to the 1820's and is not attributable to any particular weaver. -- Georgette Meredith, 10/9/1973
Source: —
Format: 228 x 169 x 1.2 cm (1,629 g)
Coverage: Euro-American; America, North; United States; Indiana? Illinois?
Date: Early 19th c. CE
Type: cultural; physical object; original
Implications
Service providers
  Automatically normalize metadata encoding where possible (e.g., dates)
  Normalize for and co-locate by type / format where possible
Metadata providers
  Create metadata for interoperability
  Consider more expressive schema, e.g., Qualified DC, MARC
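As an illustration of date normalization on the service-provider side, the sketch below handles the date encodings that appear in the two coverlet excerpts and returns `None` for anything it cannot interpret. The function name and the choice of output forms (ISO 8601 dates, and `start/end` for year ranges) are assumptions. It does not check that months and days are in range, and it drops the uncertainty marker `?`.

```python
import re


def normalize_date(value):
    """Return YYYY-MM-DD, a YYYY/YYYY year range, or None if unrecognized."""
    text = value.strip()

    # ISO-like date, optionally followed by a time: 2001-09-19 09:45:18
    match = re.match(r"^(\d{4})-(\d{2})-(\d{2})(?:[ T].*)?$", text)
    if match:
        return "-".join(match.groups())

    # Compact 14-digit timestamp: 20011107162451
    match = re.match(r"^(\d{4})(\d{2})(\d{2})\d{6}$", text)
    if match:
        return "-".join(match.groups())

    # Year range, possibly marked uncertain: 1912-1920?
    match = re.match(r"^(\d{4})\s*-\s*(\d{4})\??$", text)
    if match:
        return f"{match.group(1)}/{match.group(2)}"

    # Free-text dates such as "Early 19th c. CE" are left for manual review.
    return None


for raw in ["2001-09-19 09:45:18", "20011107162451", "2001-04-05",
            "1912-1920?", "Early 19th c. CE"]:
    print(raw, "->", normalize_date(raw))
```

Values that come back as `None` are the ones where provider-side conventions matter most, since no automatic rule recovers them reliably.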
Original interface
Portal had two search pages: simple (keyword) and advanced.
Pilot study with student teachers
23 users in honors-level C&I class
Assignment: Use the site in preparing a lesson plan (high school social studies)
Introduced to “aggregated metadata” concept
Focus group interviews conducted
Students’ papers examined
Transaction logs analyzed
Results of initial user testing
1. Users expected all links to point to digital objects
   Some records pointed to finding aids
   Some records pointed to collection’s web site
   Some records described analog objects
2. Users unable to make use of search results
   Simple searches produced 1000s of unranked results
   Advanced search (with limits) rarely used
3. Distinction between portal and data providers unimportant to users
What does “online access” mean?
To librarian & curator
To student teacher
Response to test results
EAD-derived records segregated
Analog only collections excluded
Categories of resource types reduced to 3:
  Images and Video
  Text, Sheet Music, and Websites
  Museums and Archival Collections
Revised interface
Simple keyword & advanced search put on one page
Clarify “online access”
Natural language in Boolean operators
Revised search results
Link goes to finding aid or collection page? “Learn more.”
Link displays object? “View item.”
Subj/Desc expanded
IMLS Digital Collections & Content
Build a registry of all National Leadership Grant collections with digital content.
Assist and guide NLG projects in making item-level metadata sharable using OAI.
Build a repository and search & discovery tools for integrated access to the content of NLG collections (unique metadata schema?).
Research best practices for sharing metadata about diverse digital content and for supporting the interests of diverse user communities.
http://imlsdcc.grainger.uiuc.edu/
CIC OAI metadata harvesting
Univ. of Illinois at UC will host an OAI-PMH metadata harvesting service for 10 CIC libraries
Project Goals (3 year experimentation phase)
  Improve access to selected resources at CIC libraries
  Advertise these resources (internally & externally)
  Prepare member institutions for future grant-mandated OAI-based resource sharing
  Serve as a useful testbed for experimentation with OAI-PMH, development of metadata best practices, usability and user needs testing, etc.