Persistent Identifiers and the Web: The Need for an Unambiguous Mapping

Persistent Identifiers for Scholarly Assets and the Web: The Need for an Unambiguous Mapping
Herbert Van de Sompel (@hvdsomp), Robert Sanderson (@azaroth42), Harihar Shankar (@hariharshankar), Martin Klein (@mart1nkle1n)
Los Alamos National Laboratory
IDCC 2014, San Francisco, CA, February 26 2014

description

Presentation given at the International Digital Curation Conference in San Francisco, February 26 2014. Highlights the lack of machine-actionability of persistent identifiers assigned to scholarly communication assets. Proposes an approach to address the issue that meets requirements reflecting the changing nature of web-based research communication. A draft paper provides more details: http://public.lanl.gov/herbertv/papers/Papers/2014/IDCC2014_vandesompel.pdf

Transcript of Persistent Identifiers and the Web: The Need for an Unambiguous Mapping

Page 1: Persistent Identifiers and the Web: The Need for an Unambiguous Mapping

Persistent Identifiers for Scholarly Assets and the Web: The Need for an Unambiguous Mapping

Los Alamos National Laboratory

Herbert Van de Sompel, Robert Sanderson, Harihar Shankar, Martin Klein

@hvdsomp @azaroth42 @hariharshankar @mart1nkle1n

Page 2: Persistent Identifiers and the Web: The Need for an Unambiguous Mapping

Van de Sompel, Sanderson, Shankar, Klein. IDCC 2014, San Francisco, CA, February 26 2014

Acknowledgments

• Sean Bechhofer – University of Manchester
• Geoff Bilder – CrossRef
• Maarten Hoogerwerf – DANS
• Pete Johnston – Cambridge University
• Carl Lagoze – University of Michigan
• Michael L. Nelson – Old Dominion University
• Andrew Treloar – ANDS
• Simeon Warner – Cornell University

Page 3: Persistent Identifiers and the Web: The Need for an Unambiguous Mapping

Motivation

• Persistent/Persist-able Identifiers (PIDs) play a crucial role in the identification of scholarly assets

• Motivated by concerns of long-term persistence, PIDs are minted outside of the dominant web information access protocol, HTTP

• Value-added services targeted at humans and machines assume/require resources identified by means of HTTP URIs

• Hence, an unambiguous bridge is required between:
  • PID-oriented paradigm of research communication
  • HTTP-oriented web, semantic web, linked data environment

• Preferably, such a bridge should work across PID systems
  • Interoperability between PID systems

Page 4: Persistent Identifiers and the Web: The Need for an Unambiguous Mapping

Status Quo of the PID/HTTP Bridge

Page 5: Persistent Identifiers and the Web: The Need for an Unambiguous Mapping
Page 6: Persistent Identifiers and the Web: The Need for an Unambiguous Mapping
Page 7: Persistent Identifiers and the Web: The Need for an Unambiguous Mapping

HTTP HEAD != HTTP GET

• The expectation is that an HTTP HEAD on HTTP-URI-PID will yield the same response (without body) as an HTTP GET

• Martin Fenner finds this is not always the case

• Not a CrossRef resolver problem, a publisher problem
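
One way to check this expectation is to issue both requests against the same URI and compare the status code and a few headers. The sketch below is only an illustration, not the procedure Martin Fenner used; the function names and the choice of compared headers are assumptions.

```python
import urllib.error
import urllib.request

# Headers compared between HEAD and GET; this selection is an assumption.
COMPARED_HEADERS = ("content-type", "location", "link")


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the first response is seen."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch_summary(uri, method):
    """Return (status, {header: value}) for one request, without redirects."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(uri, method=method)
    try:
        with opener.open(request, timeout=10) as response:
            status, headers = response.status, response.headers
    except urllib.error.HTTPError as error:
        # With redirects disabled, 3xx/4xx/5xx responses surface as HTTPError.
        status, headers = error.code, error.headers
    return status, {name.lower(): value for name, value in headers.items()}


def differences(head, get, names=COMPARED_HEADERS):
    """List mismatches between two (status, headers) summaries."""
    problems = []
    if head[0] != get[0]:
        problems.append(f"status: HEAD={head[0]} GET={get[0]}")
    for name in names:
        if head[1].get(name) != get[1].get(name):
            problems.append(
                f"{name}: HEAD={head[1].get(name)!r} GET={get[1].get(name)!r}"
            )
    return problems
```

For a given HTTP-URI-PID, `differences(fetch_summary(uri, "HEAD"), fetch_summary(uri, "GET"))` returns an empty list when the two responses agree on the compared fields.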

Page 8: Persistent Identifiers and the Web: The Need for an Unambiguous Mapping

Notation

Asset Identifier PID

Resolving URI HTTP-URI-PID

Redirect URI (landing page) HTTP-URI-LAND

Location URI (content) HTTP-URI-LOC
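
For readers who prefer code to notation, the four identifier kinds can be pictured as fields of one record. This is a sketch for orientation only: the class name, the field names, the resolver prefix, and the example DOI are all assumptions, not part of the talk.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import quote


@dataclass
class ScholarlyAsset:
    """One asset and the URIs associated with it, following the notation above."""

    pid: str                               # PID: asset identifier
    http_uri_pid: str                      # HTTP-URI-PID: resolving URI
    http_uri_land: Optional[str] = None    # HTTP-URI-LAND: landing page
    http_uri_loc: List[str] = field(default_factory=list)  # HTTP-URI-LOC: content


def resolving_uri(pid, resolver="http://dx.doi.org/"):
    """Build an HTTP-URI-PID from a PID; the resolver prefix is an assumption."""
    return resolver + quote(pid, safe="/")


# Hypothetical DOI, used only to show the shape of the record.
asset = ScholarlyAsset(
    pid="10.9999/example",
    http_uri_pid=resolving_uri("10.9999/example"),
)
```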

Page 9: Persistent Identifiers and the Web: The Need for an Unambiguous Mapping

Examples of Issues with the PID/HTTP Bridge

• Given an HTTP-URI-PID, how can a machine navigate towards the actual content (i.e. not the landing page)?

• Given an HTTP-URI-LOC (of, say, an image), what is the PID of the asset it belongs to?

• What is the URI of the Target of an Open Annotation that pertains to a PID-identified asset (i.e. not to the landing page, not to the PDF, the HTML, …)?
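
The first issue can be made concrete by tracing what a machine sees when it follows redirects from an HTTP-URI-PID. The sketch below injects the fetch function so the logic can be tried offline; every URI in it is hypothetical, and the two-entry table is a stand-in for real servers.

```python
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


def trace_redirects(start_uri, fetch, max_hops=10):
    """Follow Location headers from start_uri; return a list of (uri, status).

    `fetch` is any callable mapping a URI to (status, headers); it is injected
    so the logic can be exercised without network access.
    """
    chain = []
    uri = start_uri
    for _ in range(max_hops):
        status, headers = fetch(uri)
        chain.append((uri, status))
        location = headers.get("location")
        if status not in REDIRECT_CODES or not location:
            return chain
        uri = urljoin(uri, location)  # Location may be a relative reference
    raise RuntimeError("too many redirects")


# Hypothetical responses: the resolver redirects to a landing page and stops.
FAKE_WEB = {
    "http://dx.doi.org/10.9999/example": (
        302, {"location": "http://publisher.example/abs/1"}),
    "http://publisher.example/abs/1": (200, {"content-type": "text/html"}),
}

chain = trace_redirects("http://dx.doi.org/10.9999/example", FAKE_WEB.__getitem__)
```

In this toy setup the chain ends at the HTML landing page; nothing in the responses tells a machine where the content itself (HTTP-URI-LOC) lives.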

Page 10: Persistent Identifiers and the Web: The Need for an Unambiguous Mapping

Requirements for the PID/HTTP Bridge

• Targeted at machines so richer applications (for humans and machines) can emerge
  • Follow your nose; typed links; RDF

• Support for bundling resources and describing those resources to reflect that assets increasingly consist of multiple, not just a single, resource
  • Multiple HTTP-URI-LOCs belong to one PID

• Support for resource versioning, discovery of versions, access to versions to reflect that resources used or created during the research process are increasingly dynamic
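
The bundling requirement implies a lookup in both directions: from a PID to its constituent resources, and from a constituent resource back to its PID. A toy in-memory index illustrates the idea. The class and method names are hypothetical, and treating each HTTP-URI-LOC as belonging to exactly one PID is a simplifying assumption of this sketch.

```python
from collections import defaultdict


class AssetIndex:
    """Toy two-way index between PIDs and the HTTP-URI-LOCs they bundle."""

    def __init__(self):
        self._locs_by_pid = defaultdict(list)
        self._pid_by_loc = {}

    def add(self, pid, http_uri_loc):
        """Record that http_uri_loc is a constituent resource of pid."""
        if self._pid_by_loc.setdefault(http_uri_loc, pid) != pid:
            raise ValueError(f"{http_uri_loc} already belongs to another PID")
        if http_uri_loc not in self._locs_by_pid[pid]:
            self._locs_by_pid[pid].append(http_uri_loc)

    def resources(self, pid):
        """PID -> constituent HTTP-URI-LOCs, in insertion order."""
        return list(self._locs_by_pid.get(pid, []))

    def pid_of(self, http_uri_loc):
        """HTTP-URI-LOC -> PID, or None if the resource is unknown."""
        return self._pid_by_loc.get(http_uri_loc)
```

An in-memory index only works inside one system; the point of the talk is that the same two lookups should be possible on the web, across systems.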

Page 11: Persistent Identifiers and the Web: The Need for an Unambiguous Mapping

Evidence for these Requirements: Data Citation Principles

(4) Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used in the community.

(5) Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials, as are necessary for both humans and machines to make informed use of the referenced data.

(7) Specificity and Verifiability: … Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version and/or granular portion of data retrieved subsequently is the same as was originally cited.
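
Principle (7) mentions fixity information. As a small illustration, a digest recorded at citation time can be compared with a digest of the bytes retrieved later. The principles do not prescribe an algorithm; SHA-256 and the function names here are assumptions.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_citation(retrieved: bytes, cited_digest: str) -> bool:
    """True if the retrieved bytes have the digest recorded in the citation."""
    return sha256_hex(retrieved) == cited_digest.strip().lower()
```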

Page 12: Persistent Identifiers and the Web: The Need for an Unambiguous Mapping

A Proposed PID/HTTP Bridge

• A bridge goes in two directions:

• Uniform path from the PID of an asset to the asset’s constituent resources, each identified by a distinct HTTP-URI-LOC

• Uniform path from the HTTP-URI-LOC of a constituent resource of a scholarly asset to the PID of that asset

• In order to build the bridge, a rather basic question needs an answer …

Page 13: Persistent Identifiers and the Web: The Need for an Unambiguous Mapping

What is the Nature of the Resource Identified by HTTP-URI-PID?

• HTTP-URI-PID identifies the landing page HTTP-URI-LAND
  • Interpretation supported by typical “302 Found” redirection

• HTTP-URI-PID identifies the asset identified by PID for the purpose of web interactions
  • Interpretation supported by:

• CrossRef display guideline that recommends using HTTP-URI-PID in the online environment, replacing prior practice to use PID

• CrossRef provides descriptive RDF metadata using “303 See Other” style content negotiation with HTTP-URI-PID

• The resource is conceptual, a so-called non-information resource
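
The two interpretations can be told apart only by what the server returns. The sketch below builds, without sending it, a request that asks for RDF rather than HTML, and gives a rough reading of a status code in the spirit of Cool URIs for the Semantic Web. The function names, the wording of the readings, and the default media type are assumptions of this sketch.

```python
import urllib.request


def metadata_request(http_uri_pid, media_type="application/rdf+xml"):
    """Build (but do not send) a GET that asks for RDF instead of HTML."""
    return urllib.request.Request(
        http_uri_pid, headers={"Accept": media_type}, method="GET"
    )


def reading_of(status):
    """Rough interpretation of a response status for the requested URI."""
    if 200 <= status < 300:
        return "information resource: a representation was returned"
    if status == 303:
        return "possibly a non-information resource: description at Location"
    if status in (301, 302, 307, 308):
        return "redirect: nature of the resource left open"
    return "no conclusion"
```

Sending the request is a matter of passing it to `urllib.request.urlopen`; that step is omitted so the example stays offline.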

Page 14: Persistent Identifiers and the Web: The Need for an Unambiguous Mapping

A Proposed PID/HTTP Bridge

• A bridge goes in two directions:

• Uniform path from the PID of an asset to the asset’s constituent resources, each identified by a distinct HTTP-URI-LOC

• Uniform path from the HTTP-URI-LOC of a constituent resource of a scholarly asset to the PID of that asset

• HTTP-URI-PID identifies the asset identified by PID for the purpose of web interactions

• The proposed bridge builds on: HTTP, Cool URIs for the Semantic Web, HTTP Links and Link Relation Types, OAI-ORE, Memento
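
Typed links in the HTTP `Link` header are what let a machine follow its nose from one URI to another. The sketch below parses a `Link` header value and selects targets by relation type. It is a simplified parser (it does not handle escaped quotes inside parameter values), and the relation types and URIs in the usage note are placeholders: the slides leave open which relation types the bridge should use.

```python
import re

# <target> followed by zero or more ;name=value parameters.
_LINK = re.compile(r'<([^>]*)>((?:\s*;\s*[\w*-]+\s*=\s*(?:"[^"]*"|[^\s;,"]+))*)')
_PARAM = re.compile(r';\s*([\w*-]+)\s*=\s*(?:"([^"]*)"|([^\s;,"]+))')


def parse_link_header(value):
    """Return a list of (target_uri, {param: value}) from a Link header value."""
    links = []
    for target, raw_params in _LINK.findall(value):
        params = {}
        for name, quoted, token in _PARAM.findall(raw_params):
            # Keep the first occurrence of a repeated parameter.
            params.setdefault(name.lower(), quoted or token)
        links.append((target, params))
    return links


def targets_with_rel(value, rel):
    """URIs whose rel parameter contains the given relation type."""
    wanted = rel.lower()
    return [
        uri
        for uri, params in parse_link_header(value)
        if wanted in params.get("rel", "").lower().split()
    ]
```

For a hypothetical response header such as `<http://dx.doi.org/10.9999/example>; rel="collection"`, calling `targets_with_rel(header, "collection")` yields the URI between the angle brackets.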

Page 15: Persistent Identifiers and the Web: The Need for an Unambiguous Mapping
Page 16: Persistent Identifiers and the Web: The Need for an Unambiguous Mapping
Page 17: Persistent Identifiers and the Web: The Need for an Unambiguous Mapping

Requirements for the PID/HTTP Bridge

• Targeted at machines

• Support for bundling resources

• Support for resource versioning

Page 18: Persistent Identifiers and the Web: The Need for an Unambiguous Mapping

Common Resource Versioning Pattern

[Figure: one generic URI (always the most recent version) and two version-specific URIs]
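
The pattern can be written down as a small data structure: one generic URI that always resolves to the newest version, plus a list of version-specific URIs. The class name, method names, and URIs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VersionedResource:
    """A generic URI and its version-specific URIs, oldest first."""

    generic_uri: str
    version_uris: List[str] = field(default_factory=list)

    def publish(self, version_uri):
        """Register a new version; it becomes the most recent one."""
        self.version_uris.append(version_uri)

    def resolve(self, uri):
        """Map the generic URI to the most recent version; leave others as-is."""
        if uri == self.generic_uri:
            if not self.version_uris:
                raise LookupError("no versions published yet")
            return self.version_uris[-1]
        if uri in self.version_uris:
            return uri
        raise LookupError(uri)
```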

Page 19: Persistent Identifiers and the Web: The Need for an Unambiguous Mapping

Resource Versioning

• This common resource versioning pattern can be used for Aggregations (HTTP-URI-PID), Resource Maps (HTTP-URI-MACH), Aggregated Resources (HTTP-URI-LOC, HTTP-URI-LAND)

• The pattern aligns perfectly with Memento, which offers modular functionality for discovering and accessing resource versions using HTTP headers (see Resource Versioning and Memento):
  • Express datetime of a resource version
  • Interlink resource versions
  • Interlink resource version and the associated generic resource
  • Access an overview of all resource versions
  • Access a resource version that was current at a given datetime
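
Two of these functions are easy to sketch: expressing a datetime in the format used by Memento's `Accept-Datetime` and `Memento-Datetime` headers (RFC 7089), and picking the version that was current at a given datetime. The helper names and the list-of-pairs representation are assumptions; a real client would obtain the versions from a server rather than from a local list.

```python
import bisect
from datetime import timezone
from email.utils import format_datetime, parsedate_to_datetime


def accept_datetime(moment):
    """Format an aware datetime as an Accept-Datetime value (RFC 1123, GMT)."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


def memento_datetime(value):
    """Parse a Memento-Datetime header value into an aware datetime."""
    return parsedate_to_datetime(value)


def memento_at(mementos, moment):
    """Pick the version that was current at `moment`.

    `mementos` is a list of (datetime, uri) pairs sorted by datetime.
    Returns None when `moment` predates the first version.
    """
    index = bisect.bisect_right([when for when, _ in mementos], moment)
    return mementos[index - 1][1] if index else None
```

A client would send the formatted value in an `Accept-Datetime` request header; the selection logic in `memento_at` mirrors what a server does when it answers such a request.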

Page 20: Persistent Identifiers and the Web: The Need for an Unambiguous Mapping

Requirements for the PID/HTTP Bridge

• Targeted at machines

• Support for bundling resources

• Support for resource versioning

Page 21: Persistent Identifiers and the Web: The Need for an Unambiguous Mapping

Open Issues

Which ontologies for metadata, types, relationships? Cf. SURF info-eu-repo, State of the LOD Cloud

• No URI schemes for PIDs

• PID/HTTP-URI-PID for each version; typically none that always yields the current version

Page 22: Persistent Identifiers and the Web: The Need for an Unambiguous Mapping

Open Issues

Should it be owl:sameAs?

Should it be rel=“collection”?

Page 23: Persistent Identifiers and the Web: The Need for an Unambiguous Mapping

References

• Martin Fenner. Challenges in automated DOI resolution. http://blog.martinfenner.org/2013/10/13/broken-dois/
• FORCE11 Data Citation Principles. http://force11.org/datacitation
• Cool URIs for the Semantic Web. http://www.w3.org/TR/cooluris/
• Web Linking. http://tools.ietf.org/search/rfc5988
• IANA Link Relation Types. http://www.iana.org/assignments/link-relations/link-relations.xhtml
• OAI-ORE. http://www.openarchives.org/ore/1.0/
• Memento, RFC 7089. http://tools.ietf.org/html/rfc7089
• Resource Versioning and Memento. http://www.mementoweb.org/guide/howto/
• SURF info-eu-repo. http://purl.org/REP/standards/info-eu-repo
• State of the LOD Cloud. http://lod-cloud.net/state/