Augmenting interoperability across scholarly repositories

Augmenting Interoperability across Scholarly Repositories. Herbert Van de Sompel, Research Library, Los Alamos National Laboratory, USA. JISC CNI Conference, York, UK, July 6th 2006. This work was supported by NSF award number IIS-0430906 (Pathways)

Transcript of Augmenting interoperability across scholarly repositories

Page 1: Augmenting interoperability across scholarly repositories

Augmenting Interoperability across Scholarly Repositories

Herbert Van de Sompel
Research Library, Los Alamos National Laboratory, USA

JISC CNI Conference, York, UK, July 6th 2006

[Figure: Repository interfaces labeled Obtain, Harvest, Put]

This work was supported by NSF award number IIS-0430906 (Pathways)

Page 2: Augmenting interoperability across scholarly repositories

Pathways Project

• NSF grant number IIS-0430906
• http://www.infosci.cornell.edu/pathways/
• PIs: Carl Lagoze, Sandy Payette, Herbert Van de Sompel, Simeon Warner
• Research Participants: Lyudmila Balakireva, Jeroen Bekaert, Xiaoming Liu, Chris Wilper, Zhiwu Xie

Page 3: Augmenting interoperability across scholarly repositories

Meeting in NYC, April 20-21 2006

• Supported by Microsoft, Mellon Foundation, Coalition for Networked Information, Digital Library Federation, JISC

• Representatives from institutional Repository projects, scholarly content Repositories, Registry projects, various projects that touch on interoperability

• See http://msc.mellon.org/Meetings/Interop/ for Agenda, Participants, Topics & Goals, Terminology, Presentations, Prototype demonstration.

• Report available July 2006

Page 4: Augmenting interoperability across scholarly repositories

And more discussions with the community

• Panel at JCDL 2006, Chapel Hill, NC
• IATUL 2006, Porto, Portugal
• ElPub 2006, Bansko, Bulgaria
• Meeting at the University of Southampton, UK

Page 5: Augmenting interoperability across scholarly repositories

Context: the Repository model

Repository

An environment consisting of Digital Object Repositories with a Long Life Expectation:

o Scholarly repositories
  - Institutional repositories
  - Discipline-oriented repositories
  - Publisher's repositories
  - Dataset repositories
  - …
o Cultural heritage repositories
o Preservation archives
o Educational repositories

Page 6: Augmenting interoperability across scholarly repositories

Context: compound digital objects

[Figure: a Digital Object labeled with its identifier (id)]

Objects of the scholarly communication system are increasingly compound in nature, simultaneously consisting of:

• Multiple media types
• Multiple content types
  o Papers
  o Datasets
  o Simulations
  o Software
  o Dynamic knowledge representations
  o Machine-readable chemical structures

Page 7: Augmenting interoperability across scholarly repositories

Context: the Repository model

• We must leverage the value of the materials that become available in those distributed Repositories.

• Think about these Repositories as active nodes in a global environment, not as passive local nodes
  o These Repositories are about facilitating the use and re-use of materials in many contexts
  o These Repositories are the starting point of value chains

• In order to enable value chains, we need to augment interoperability across repositories

Page 8: Augmenting interoperability across scholarly repositories

Motivation 1: Richer cross-Repository services

Distributed Repositories provide source materials for cross-Repository overlay services such as discovery services

Need: digital object representation, harvesting interface, datastream semantics

[Figure: a service performing selective collecting from distributed Repositories]

Page 9: Augmenting interoperability across scholarly repositories

Motivation 2: Scholarly communication workflow

Distributed Repositories at the basis of a digital scholarly communication system. Scholarly communication as a global workflow across those Repositories

Need: digital object representation, obtain interface, put interface

[Figure: Digital Objects (id) from several Repositories; recombine & add value]

Page 10: Augmenting interoperability across scholarly repositories

Augmenting interoperability across Repositories

[Figure: Repositories DSpace, Fedora, aDORe, ePrints, arXiv, Nature; layers labeled "Individual Data Models and Services" and "Shared Data Model and Services"]

Page 11: Augmenting interoperability across scholarly repositories

Considerations re interoperable framework

• Scholarly communication is a long-term endeavor:
  • Need abstract definitions of Repository interfaces that can be instantiated on the basis of various technologies as time goes by
  • Repository interfaces need to work with whichever type of identifier (current and future) because Repositories will use whichever type of identifier

• Value chains do not require transfer of all digital object content
  • The content that needs to be transferred depends on the nature of the value chain

• Recording a chain of evidence of a value chain requires fine granularity of identification
  • Not only identifier of the digital object but also of the repository

Page 12: Augmenting interoperability across scholarly repositories

Augmenting interoperability across Repositories

[Figure: Repositories DSpace, Fedora, aDORe, ePrints, arXiv, Nature with "Individual Data Models and Services"; shared layer labeled with a data model (m) and the Obtain, Harvest and Put interfaces]

Page 13: Augmenting interoperability across scholarly repositories

Augmenting interoperability across Repositories

Pathways Core Data Model for Cross-Repository services

Bekaert, Jeroen, Xiaoming Liu, Herbert Van de Sompel, Sandy Payette, Carl Lagoze, and Simeon Warner. Pathways Core: A Data Model for Cross-Repository Services. 2006. Poster for JCDL 2006. http://public.lanl.gov/herbertv/papers/pathways_core_poster_submit.pdf

Page 14: Augmenting interoperability across scholarly repositories

Augmenting interoperability across Repositories

• A Surrogate is available for every Digital Object
• A Surrogate is a representation of the Digital Object according to the Pathways Core data model
• The representation is uniform across repositories; not tied to identifier type, content type, application domain.
• The Surrogate is what is used in the value chains; the Surrogate is used at Obtain, Harvest and Put interfaces.
  o Expresses properties and access points for the Digital Object (see later)
  o The Surrogate for a specific Digital Object can change over time

Pathways Core Surrogates (currently XML/RDF)

Page 15: Augmenting interoperability across scholarly repositories

Augmenting interoperability across Repositories

• The Surrogates provide By-Reference access to constituent datastreams of Digital Objects
  • Full asset transfer is only required for certain applications
  • Static asset transform may be undesirable for dynamic objects => Live references

• Avoid IP issues at the level of the interoperability framework
  • The idea is that the Surrogate itself is not encumbered by IP issues; attach, by definition, a liberal Creative Commons license to Surrogates
  • Allow Surrogates to flow freely independent of business models of the underlying content

Pathways Core Surrogates (currently XML/RDF)

Page 16: Augmenting interoperability across scholarly repositories

Augmenting interoperability across Repositories

• A Surrogate expresses access points and properties of a Digital Object, e.g.:
  • Location of content streams
  • providerInfo: the keys necessary to Obtain a fresh Surrogate at some later point in time:
    • (Repository identifier, preferredIdentifier, versionKey)
  • Lineage: A Surrogate expresses its predecessor(s)
    • == providerInfo in previous life
  • semantic: A Surrogate expresses the type of content.

Pathways Core Surrogates (currently XML/RDF)
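
The properties listed on this slide can be pictured as a small record type. The following is only a sketch in Python, not the Pathways Core XML/RDF serialization: the class and field names, the example identifiers and URLs, and the choice to attach the semantic (content type) to each content stream are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ProviderInfo:
    """Keys needed to Obtain a fresh Surrogate at a later point in time."""
    repository_id: str         # Repository identifier
    preferred_identifier: str  # identifier of the Digital Object, any scheme
    version_key: str           # distinguishes versions of the Surrogate


@dataclass(frozen=True)
class Datastream:
    """By-reference access point to one constituent content stream."""
    location: str  # where the content stream can be retrieved
    semantic: str  # type of content (hypothetical vocabulary)


@dataclass(frozen=True)
class Surrogate:
    provider_info: ProviderInfo
    datastreams: Tuple[Datastream, ...] = ()
    # Lineage: providerInfo of the predecessor(s), empty for a first Surrogate
    lineage: Tuple[ProviderInfo, ...] = ()


# Invented example: a compound object with a paper and a dataset.
paper = Surrogate(
    provider_info=ProviderInfo("repo:example-1", "info:example/paper-42", "2006-07-06"),
    datastreams=(
        Datastream("http://repo1.example.org/paper-42.pdf", "paper"),
        Datastream("http://repo1.example.org/paper-42.csv", "dataset"),
    ),
)
```

Because the record holds only locations and keys, it can be passed around without carrying the content itself, which matches the By-Reference idea of the previous slide.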

Page 17: Augmenting interoperability across scholarly repositories

Augmenting interoperability across Repositories

Obtain interface: a Repository interface that supports the request of services pertaining to individual Digital Objects (including their component Datastreams). The core service is the request of a Surrogate for a Digital Object.

Harvest interface: a Repository interface that exposes Surrogates for incremental collecting/harvesting.

Put interface: a Repository interface that supports submission of one or more Surrogates into the Repository, thereby facilitating the addition of Digital Objects to the collection of the Repository.
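
The three interfaces are defined abstractly, independent of any technology. One way to picture that is an abstract class with a toy in-memory implementation, sketched below in Python. The method names, the dictionary-shaped Surrogates and the logical timestamp used for incremental harvesting are assumptions for illustration; the slides do not prescribe them.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List


class Repository(ABC):
    """Abstract Repository interfaces; concrete technologies would implement them."""

    @abstractmethod
    def obtain(self, identifier: str) -> dict:
        """Return the Surrogate for one Digital Object."""

    @abstractmethod
    def harvest(self, since: int) -> List[dict]:
        """Return Surrogates added or changed after the given stamp."""

    @abstractmethod
    def put(self, surrogates: Iterable[dict]) -> None:
        """Submit one or more Surrogates into the Repository."""


class InMemoryRepository(Repository):
    def __init__(self) -> None:
        self._store: Dict[str, dict] = {}
        self._clock = 0  # hypothetical logical timestamp, incremented per put

    def obtain(self, identifier: str) -> dict:
        return self._store[identifier]

    def harvest(self, since: int) -> List[dict]:
        return [s for s in self._store.values() if s["stamp"] > since]

    def put(self, surrogates: Iterable[dict]) -> None:
        for surrogate in surrogates:
            self._clock += 1
            self._store[surrogate["id"]] = {**surrogate, "stamp": self._clock}


repo = InMemoryRepository()
repo.put([{"id": "a"}, {"id": "b"}])
first_batch = repo.harvest(since=0)   # everything so far
repo.put([{"id": "c"}])
second_batch = repo.harvest(since=2)  # only what arrived after stamp 2
```

A harvester that remembers the highest stamp it has seen can call `harvest` repeatedly and receive only new Surrogates each time.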

Page 18: Augmenting interoperability across scholarly repositories

Surrogate is at the core of the value chain

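[Figure: value chain across Digital Objects (id). Surrogates are retrieved through Obtain, recombined with added value, and submitted through Put; Lineage links point to the providerInfo of earlier Surrogates]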
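
The value chain on this slide can be sketched as a single recombination step: Surrogates obtained from two Repositories are combined into a new Surrogate whose Lineage records the providerInfo of its sources. This is a sketch only; the `recombine` helper, the tuple layout and all identifiers and URLs are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple

# (repository identifier, preferredIdentifier, versionKey)
ProviderInfo = Tuple[str, str, str]


@dataclass(frozen=True)
class Surrogate:
    provider_info: ProviderInfo
    datastreams: Tuple[str, ...]           # by-reference locations
    lineage: Tuple[ProviderInfo, ...] = ()


def recombine(sources: Sequence[Surrogate],
              new_provider_info: ProviderInfo,
              extra_datastreams: Iterable[str] = ()) -> Surrogate:
    """Build a new Surrogate whose lineage is the providerInfo of its sources."""
    streams = tuple(loc for s in sources for loc in s.datastreams)
    return Surrogate(
        provider_info=new_provider_info,
        datastreams=streams + tuple(extra_datastreams),
        lineage=tuple(s.provider_info for s in sources),
    )


# Surrogates as they might come back from two Obtain requests (values invented).
paper = Surrogate(("repo1", "id:paper", "v1"), ("http://repo1.example.org/paper.pdf",))
data = Surrogate(("repo2", "id:data", "v3"), ("http://repo2.example.org/data.csv",))

# The result would then be submitted through the Put interface of a third Repository.
overlay = recombine(
    [paper, data],
    ("repo3", "id:overlay", "v1"),
    extra_datastreams=["http://repo3.example.org/review.html"],
)
```

Following `overlay.lineage` back to each source's providerInfo gives the chain of evidence, identified at the level of both the Digital Object and the Repository.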

Page 19: Augmenting interoperability across scholarly repositories

Basis for a Network of Linked Digital Objects

Page 20: Augmenting interoperability across scholarly repositories

[Figure: Repo1 exposing the interfaces Put1, Harvest1 and Obtain1, and Repo2 exposing Put2, Harvest2 and Obtain2, with a service interacting with both]

Page 21: Augmenting interoperability across scholarly repositories

[Figure: Repo1 and Repo2 with their Obtain, Harvest and Put interfaces, connected to a Service Registry]

Service Registry

provider   Obtain    Harvest    Put
Repo1      Obtain1   Harvest1   Put1
Repo2      Obtain2   Harvest2   Put2

providerInfo: (provider, preferredIdentifier, versionKey)
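
The role of the Service Registry can be sketched as a lookup table: given the provider in a Surrogate's providerInfo, return the location of that Repository's Obtain, Harvest or Put interface. The sketch below is in Python; the endpoint URLs, the `resolve` function and the example identifiers are assumptions for illustration, not part of the slides.

```python
from typing import Dict, NamedTuple


class ProviderInfo(NamedTuple):
    provider: str
    preferred_identifier: str
    version_key: str


# Registry rows mirror the slide: provider -> interface endpoints.
# The URLs are placeholders, not real services.
SERVICE_REGISTRY: Dict[str, Dict[str, str]] = {
    "Repo1": {
        "obtain": "http://repo1.example.org/obtain",
        "harvest": "http://repo1.example.org/harvest",
        "put": "http://repo1.example.org/put",
    },
    "Repo2": {
        "obtain": "http://repo2.example.org/obtain",
        "harvest": "http://repo2.example.org/harvest",
        "put": "http://repo2.example.org/put",
    },
}


def resolve(info: ProviderInfo, interface: str = "obtain") -> str:
    """Look up the endpoint of the given interface for the Surrogate's provider."""
    try:
        return SERVICE_REGISTRY[info.provider][interface]
    except KeyError:
        raise LookupError(
            f"no {interface!r} interface registered for {info.provider!r}"
        ) from None


info = ProviderInfo("Repo2", "info:example/object-7", "v1")
obtain_endpoint = resolve(info)
```

With such a lookup, a Surrogate only needs to carry the provider key; where the interfaces actually live can change over time without invalidating Surrogates already in circulation.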