Abstract - Jisc · Web viewSuch protocols included harvesting (e.g. OAI-PMH) and pull protocols...


Project Document Cover Sheet

Project Information

Project Acronym WebTracks

Project Title Infrastructure for Integration in Structural Sciences

Start Date 1st Aug 2010 End Date 30th Nov 2011

Lead Institution University of Southampton

Project Director Simon Coles

Project Manager & contact details Brian Matthews [email protected]

Partner Institutions STFC

Project Web URL http://webtracks.jiscinvolve.org/wp/

Programme Name (and number) Managing Research Data (Citing, Linking and Integrating Research Data)

Programme Manager Simon Hodson

Document Name

Document Title Intercom: A protocol for link notification

Deliverable 1.1

Author(s) Shirley Crompton, Brian Matthews, John Casson, Arif Shaon, Mark Borkum

Date 13/04/2012 Filename

URL if document is posted on project web site

Access X Project and JISC internal General dissemination


Document History

Version Date Comments

0.1 29/09/2010 Initial version – John Casson

0.2 20/07/2011 Heavily revised – Shirley Crompton

0.3 12/08/2011 Revision and expansion, after discussion between Crompton, Matthews and Shaon

0.4 15/08/2011 Added introduction – Brian Matthews

0.5 06/09/2011 Minor editorial changes, added Section 2.1.1 and updated examples in Section 3 – Shirley Crompton

0.6 14/02/2012 Revisions, history and use cases

0.7 13/04/2012 Final revisions, history and use cases


Table of Contents

1 Introduction
1.1 History and related work
1.2 Conventions in this document
2 Use cases
2.1 Use Case 1: Citation Tracking
2.2 Use Case 2: Data provenance tracking
2.3 Use Case 3: Linking via Annotation
3 Link Notification
3.1 General Principles
3.2 Architecture for the Link Notification Service
4 Terminology Applicable to this Document
5 Protocol Description
5.1 Technical Details
5.1.1 Intercom Namespace
5.1.2 InteRCom Ping URL
5.1.3 Metadata Format
5.1.4 GET Metadata Requests
5.1.5 POST Requests
5.1.6 Logging Requirements
5.1.7 REST Interface
5.2 Conformance Requirements
6 Example
6.1 Linking Resources in Managed Archives
6.2 Linking Resources in Managed Archives – Initiated by a Third Party
7 Security Considerations
Acknowledgements
References


InteRCom Specification 1.0 (Draft)

Editors:

John Casson <[email protected]>

Shirley Crompton <[email protected]>

Brian Matthews <[email protected]>

Arif Shaon <[email protected]>

Mark Borkum <[email protected]>

Date: 13/04/2012

Abstract

InteRCom is a method for managed systems to establish semantically annotated links between digital artefacts published on the web. A typical use case would be that, in the course of scientific research, a researcher writes articles on results obtained from analysing primary data from experiments and refers to other prior work as well as creating derived data. The holding entities would need to be notified to provide a “link-back” corresponding to the citation.

Aggregating links between digital research resources provides an RDF graph of citation and provenance that captures the research process in context. These graphs can be traversed on the web and interrogated to support value added services like impact evaluation. Reverse linking is supported as link assertions are intended to be stored on both the Source and Target Resources.


1 Introduction

The Inter-Repository Communication protocol (InteRCom) is a general purpose application layer protocol for linking digital data resources of any type across the web. It provides an HTTP REST-based mechanism for managed resource archives or data management tools to create link requests and to exchange metadata on web-based representations of heterogeneous research objects. InteRCom is a peer-to-peer protocol with no requirement for centralised services.

1.1 History and related work

The origin of this work dates back to the CLADDIER project [Claddier 2007], which discussed the problem of linking citations between published resources. In this project, a use case of relating publications to associated raw data was developed, and the problem of tracing “forward” and “backward” citations, and of tracking these between a number of different participating repositories, was identified. This project produced a discussion of the problem [Matthews et. al. 2007] and considered a number of protocols available to provide notifications of such citations. Such protocols included harvesting (e.g. OAI-PMH) and pull protocols (e.g. SWORD or those based on RSS or ATOM).

A “push” protocol, where the notification of the citation is actively directed at the participating repository, was suggested as being suitable. Claddier then considered a number of “linkback” protocols which have been proposed for this purpose (“A linkback is a method for Web authors to obtain notifications when other authors link to one of their documents”, http://en.wikipedia.org/wiki/Linkback [retrieved 15th August 2011]), and proposed to use the well-known TrackBack [Trackback 2008] protocol as the basis for a notification protocol which uses the REST web service model [Jacobs 2001]. This is a simple and established protocol, based on HTTP, and thus a straightforward extension to existing practice. An initial prototype of this was produced within the STFC ePubs and the BADC repositories [Matthews et. al. 2007b] [Matthews et. al. 2008].

The work was subsequently extended in the StoreLink project [Matthews et. al. 2009], which added whitelists to the protocol and provided an ePrints implementation in the National Crystallography Service. The StoreLink approach has advantages over harvesting methods. It is peer-to-peer, which increases the chance of identification of the source and target node, supplies the context of the link (link semantics), is simple and does not rely on an aggregator service. There are also advantages over “pull” approaches (e.g. Atom), as a link is propagated directly and therefore there is no reliance on discovery by subscriber services.

Two observations on the protocol that arose in StoreLink were that:

a) the step of “discovery” of the location of the notification receiver could be separated from the transmission of the link, and

b) the protocol should be made ‘general purpose’ in order to propagate links in context between any digital object.

A similar approach is being taken by the Semantic Pingback project which uses a Remote Procedure Call [SPB 2010]. While this project already has recognised the value of a general purpose notification protocol, it uses RPC, and thus requires a different communications protocol rather than building on widely used HTTP and REST based services.

Another approach is taken by the Salmon project [Salmon 2011]. This does use a HTTP protocol, but does not use general purpose RDF based ontologies as the basis for representing the information.

Thus we propose the InteRCom protocol as a two-stage inter-repository communication protocol. It is more flexible than StoreLink as it does not specify a fixed format for the metadata ontology and it allows the metadata properties to be defined per link. StoreLink, in contrast, specified a static list of fields to be sent. In InteRCom, a link is represented as an RDF triple. The source and target resources form the Subject and Object URIs of the assertion, and the link type is the Predicate (Figure 1). Using this approach, InteRCom can represent a wide range of links between different types of data resources.

Figure 1: An Example Link Assertion
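The link-as-triple idea can be sketched in a few lines of Python. This is only an illustration: the class name `LinkAssertion`, its field names and the example URIs are assumptions made here, and N-Triples is used simply because it is the most compact way to print a triple (the protocol itself exchanges RDF/XML).

```python
from typing import NamedTuple


class LinkAssertion(NamedTuple):
    """One InteRCom link, modelled as an RDF triple (illustrative only)."""
    subject: str    # Source Resource URI
    predicate: str  # link type, taken from any suitable ontology
    obj: str        # Target Resource URI

    def to_ntriples(self) -> str:
        """Serialise the assertion as a single N-Triples line."""
        return f"<{self.subject}> <{self.predicate}> <{self.obj}> ."


# Hypothetical URIs, for illustration only.
link = LinkAssertion(
    subject="http://example.org/publications/42",
    predicate="http://purl.org/spar/cito/usesDataFrom",
    obj="http://example.org/datasets/7",
)
print(link.to_ntriples())
```

Because the predicate is just another URI, the same structure can carry citation, provenance or annotation links without any change.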

1.2 Conventions in this document

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in [RFC2119].


2 Use cases

We give a number of use cases where the InteRCom protocol would be appropriate.

2.1 Use Case 1: Citation Tracking

In this scenario we wish to trace the citation graph between research papers.

Traditional publishing uses one-directional citation entered by the author: an author annotates the current publication with prior publications in order to reference work previously carried out and published by other authors. Thus credit for ideas and work can be properly attributed. However, this method is intrinsically one-directional. In a traditional publication system it is difficult to track citations forward, so that readers can discover further work which builds upon the current publication.

Such forward citation tracking has become more important because research assessment now requires citation and impact metrics: not only the number of papers generated by research, but also an evaluation of their impact, which can be estimated from the number of citations of the original work. Traditional citation indexes are generated via aggregation services such as Web of Knowledge, which harvest citation information from a pool of recognised journals and generate a citation index. However, a linked web of repositories (held in institutions or by publishers) could perform the same function. Repositories could harvest citation information and record the cross-citation information between the papers recorded within their databases. However, each repository would have access only to the papers entered by its own user community, so while the repositories taken as a whole would hold the information, no individual repository would. A mechanism is therefore required for repositories to propagate citation information in a targeted manner, so that the right paper in the right repository can be identified, given a URI of the paper identifying where it is held.

In this case, an effective method of propagating citation information is a peer-to-peer link notification method as illustrated in Figure 2. When a paper is ingested into a repository:

1. Its citations are identified.

2. The location of each cited paper within a repository (held within an institution or by a publisher) is identified. This could be via the URL of the paper or via its DOI.

3. The citation of the paper is transmitted to a citation notification service, which records the citation.

4. The citation is recorded as ‘Cited By’ the citing paper.


Figure 2: Generating the Citation Graph

In this case, the CiTO ontology is used to form the appropriate links [Shotton & Peroni 2011].
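The citation-tracking steps can be sketched as follows. The function names and the tuple layout are assumptions made for this illustration; `cito:cites` and `cito:isCitedBy` are properties of the CiTO ontology, but the specification does not prescribe which CiTO property a repository must use.

```python
CITO = "http://purl.org/spar/cito/"


def citation_notifications(citing_uri, cited_uris):
    """Return one (recipient, triple) pair per citation found at ingest.

    Each triple is the forward assertion 'citing cito:cites cited'; the
    recipient is the cited paper's URI, whose holder can then record the
    reverse 'cited by' link.
    """
    return [
        (cited, (citing_uri, CITO + "cites", cited))
        for cited in cited_uris
    ]


def reverse_link(triple):
    """Derive the 'cited by' view of a cito:cites assertion."""
    subject, predicate, obj = triple
    if predicate != CITO + "cites":
        raise ValueError("not a cito:cites assertion")
    return (obj, CITO + "isCitedBy", subject)
```

A repository ingesting a paper would call `citation_notifications` once, send each triple to the holder of the cited paper, and the holder could store `reverse_link(triple)` alongside its own record.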

2.2 Use Case 2: Data provenance tracking

In this scenario, we wish to link a research object with another research object. For example, we wish to associate a dataset which was used to derive the result reported in the paper. Further, we may wish to associate other information to the object, such as the raw data collected, the software packages used to undertake analysis, the research project used to fund the research, and the people and organisations involved in the scientific process. Thus we would create a graph of Provenance to trace the derivation of the research results so that the quality of the research process can be made transparent in future assessment, and earlier components of the process can be reused. Such provenance trails are supported by notations such as the Open Provenance Model [Moreau et. al. 2010] and the emerging W3C Provenance Data Model and Ontology [W3C 2012].

Typically citations will only reference publications. Data archives wish to track who has been using data resources and thus want to keep track of forward links (“cited-by” links) – they may be informed of a citation from a communication, or from a usage report for example. Once a data archive has recorded a paper as arising from a particular dataset, then the citation from the paper to the data set can be added, using the data citation form discussed above; this is not necessarily added by the author, but rather by the repository managers.

We assume that the publication P is held in a library’s publication repository A, and the data set D is held in a research department’s data repository B, and that the creation of the link is initiated within repository A. Thus:

- Repository A can add the link P uses data from D to its knowledge base.

- Repository A can notify B of the link P uses data from D.

- Repository B can add the link P uses data from D to its knowledge base.
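The three steps above can be sketched with two in-memory repositories. The `Repository` class and its method names are hypothetical; a real notification would travel over HTTP rather than as a direct method call, and the receiving repository is free to decide what it does with a notification.

```python
class Repository:
    """Toy repository holding link triples in an in-memory set."""

    def __init__(self, name):
        self.name = name
        self.links = set()

    def add_link(self, triple):
        self.links.add(triple)

    def receive_notification(self, triple):
        # The receiver decides how to handle a notification;
        # this toy version simply records the link.
        self.add_link(triple)

    def notify(self, other, triple):
        other.receive_notification(triple)


repo_a = Repository("A")   # holds publication P
repo_b = Repository("B")   # holds data set D
link = ("http://repo-a.example/P",
        "http://purl.org/spar/cito/usesDataFrom",
        "http://repo-b.example/D")

repo_a.add_link(link)          # step 1: A records the link
repo_a.notify(repo_b, link)    # steps 2 and 3: A notifies B, B records it
```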

This process can be taken further to relate other entities within the provenance graph. Thus the data set can propagate the relationship that it is derived from raw data R generated and held at Facility C, and has used a software package S held at software repository D. In this way, a provenance graph can be generated and propagated around the interested parties. The relationships are illustrated in Figure 3; note that in this diagram a blank node representing the activity of using the software package to generate the derived data is included.

Figure 3: Provenance Graph showing derivation of published data

2.3 Use Case 3: Linking via Annotation

Third parties which do not hold the entities which are being related can also create links between entities. For example, to continue the theme of annotating research artefacts with their provenance, an electronic laboratory notebook may add the annotation that a data set has been derived by an analysis process on a raw data set. In this case, the link has been recorded by the notebook, and both the repositories holding the raw and derived data would need to be notified that such a link exists in order to have a complete record of the provenance.

Assume that an electronic lab notebook is used to create the assertion that data set A is derived from data set B, where A is held in repository X and B in repository Y. In this case, the link is transmitted to both repositories, so that each can add it to its triple store. This is illustrated in Figure 4.


Figure 4: Notification of a link via a third party

3 Link Notification

In order to complete the desired link graph we need to populate the repositories with links. In particular, we need to inform repositories that their entries have been linked so that they can add the annotated link entries to their triple stores. We propose that this would be undertaken by a Link Notification Service.

3.1 General Principles

A number of principles are adopted in the design of the link notification service.

1. The notification should be on a peer-to-peer basis.

2. The service should be generic across different types of repositories and repository software.

3. The service should be generic across different digital object types and metadata formats (i.e. RDF Vocabularies).

4. The notification should exchange appropriate metadata on the link from link holders to other parties with an interest in recording the link.

5. The notification system should not determine what the target repository does with the notification of the link.

6. The mechanism should fail gracefully in the event that a target does not exist or does not recognise the notification.

7. The mechanism should identify the sender of the notification and defend against bogus notifications of links.


Further, it was seen as desirable if existing off-the-shelf tools and mechanisms could be adapted to build on existing established practice and save on development effort.

3.2 Architecture for the Link Notification Service

We base the link notification service on Linkback, a peer-to-peer push protocol. The Linkback model establishes a direct notification between repositories, as in Figure 5, and operates as follows:

1. Link holding repositories identify the resources involved in a link and the likely holder repository of those resources to identify appropriate target repositories.

2. Link holding repositories notify the target repository directly of the link.

3. Target repositories accept the notification of the link.

This architecture is similar to a Linkback Protocol, such as Trackback [Trackback 2008] or Pingback [Langridge & Hickson 2002]. A Linkback protocol is a protocol which has been developed largely within the Blogging community to allow notification of cross-references between Weblogs so that authors can keep track of who is linking to, or referring to their articles.

Figure 5: Linkback Model for Notification Service

[Figure content: Repository A holding Resource A1, Repository B holding Resource B1; arrows labelled “references” and “1. notify”]

This architecture needs the following components and functionality of those components.

1. Publishers, which:

i. Identify likely target holders of linked resources.

ii. Send link data in an appropriate format to target holders within the appropriate Linkback protocol.

2. Subscribers, which:

i. Receive link data in an appropriate format from source holders within the appropriate Linkback protocol.

ii. Digest link data appropriately.

Note that in this model there is no “registration of interests” with a broker; a source repository decides not “who” to notify, but merely “where” to notify – an appropriate end-point for the notification based on its URL.

Advantages

1. No centralized broker service.

2. No negotiation of registration with broker.

3. No definition of interests or harvesting of catalogue required.

4. If notification is not acknowledged, then there is no need to continue.

Disadvantages

1. Identification of target repositories dependent on URL, which may be missing.

2. Less flexibility in who can receive what (a repository can only get linkbacks for those resources it hosts)

3. Linkback protocols are well-known for being vulnerable to “spamming” by bogus notifications. As a consequence of this, there may be additions to the protocol, such as registration of trusted repositories, or signatures. While we recommend such safeguards, we regard them as out of scope of this protocol.

A number of Linkback specifications exist, including Trackback, Pingback and Refback (for a summary see http://en.wikipedia.org/wiki/Linkback).

Refback (http://en.wikipedia.org/wiki/Refback) uses the information sent when a user clicks on a link to register the back link to the HTTP Referer (i.e. the page on which the link was made), which can then be harvested. Refback is thus dependent on users clicking on a link, which is not guaranteed, and the back link could be made to any reference to the digital object, not necessarily a citation.

Pingback uses an XML-RPC call rather than a plain HTTP POST. This reduces spam, and potentially richer metadata can be sent across this protocol. However, the protocol is not widely supported.

Trackback is a simple “framework for peer-to-peer communication”. Essentially, TrackBack involves sending a “ping” as an HTTP POST request, saying “resource A has a link to (cites) resource B”. TrackBack is supported by blogging software such as Movable Type (http://www.movabletype.org/). Its metadata transmission is relatively simple in its basic form, but it has a straightforward mechanism for extending the metadata as it uses the POST mechanism. Problems with spamming are well known, and mechanisms can be added to mitigate them.

Consequently, Trackback was chosen as the basis of the Link Notification Service.

4 Terminology Applicable to this Document

We give some basic definitions as used in this protocol specification.

InteRCom-enabled Resource – A digital research object accessible on the web by a HTTP-based URI and which also supports the InteRCom GET and POST methods.

InteRCom User Agent – An entity that enacts the InteRCom protocol for a given link assertion.

Ping – An HTTP POST request sent from an InteRCom agent to a server for the purpose of establishing an explicit relationship between Web resources.

Receiving/Target Resource – A Web resource to which a Ping is directed for the purpose of establishing a link between it and a Source Resource

Sender/Source Resource – A Web resource containing a link to the Target Resource.

Security Guard – A generic entity that handles authentication and/or authorisation for the Receiving Resource.

TrackBack Ping URL – The HTTP URI to which TrackBack Ping requests are posted.

URI - A HTTP-based Uniform Resource Identifier that can be de-referenced to a digital representation of a Resource.

URL – A HTTP-based Uniform Resource Locator that points to a digital representation of a Resource.

5 Protocol Description

5.1 Technical Details

The InteRCom mechanism uses REST GET and POST requests to exchange metadata and establish a link between web-based resources. For simplicity, the protocol is designed to be fired and forgotten by the invoking application. Should it fail at any point in the interactions, it fails silently without interrupting the processing of the invoking application. It is strongly recommended that an error message is logged by the InteRCom User Agent to facilitate error resolution (see Section 5.1.6).
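One way to read the fire-and-forget behaviour with logging is the small wrapper below. It is a sketch under stated assumptions: `fire_and_forget`, the `send` callable and the logger name are all hypothetical, and the specification does not say how an agent should structure this.

```python
import logging

logger = logging.getLogger("intercom.agent")  # hypothetical logger name


def fire_and_forget(send, ping_url, payload):
    """Call send(ping_url, payload); never let a failure reach the caller.

    Returns True if send completed, False if it raised. Failures are
    logged so that they can be investigated later.
    """
    try:
        send(ping_url, payload)
        return True
    except Exception:
        # Log with traceback, then swallow the error: the invoking
        # application carries on regardless.
        logger.exception("InteRCom ping to %s failed", ping_url)
        return False
```

The invoking application can ignore the return value entirely; it is provided only so that callers which do care (for example a retry queue) can react.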

5.1.1 Intercom Namespace

InteRCom defines a namespace, with conventional prefix as follows:

xmlns:intercom="http://intercom.stfc.ac.uk/2011/"

5.1.2 InteRCom Ping URL

An InteRCom-enabled resource has an HTTP-based URI or a URN (e.g. DOI) that can be resolved to the resource home location (the resource located at the resource owner). The resource should have the capability to serve RDF: either providing a complete RDF representation or RDF embedded in an HTML representation of the resource. The RDF should include the InteRCom ping URL for the resource. This ping URL should be used to manipulate (e.g. GET/POST/PUT/DELETE) links associated with the resource. The ping URL should follow this format:

URI format:

http://authority/context/resourcepath/links

Concrete example:

http://example.com:8080/webtracks/resources/id/1/links

The value of the ping URI MUST expose the resource’s links resource in line with the SOA approach. (For this reason, the path segment ‘/links’ is appended to the resource path.)
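The ping URL convention amounts to appending one path segment, which can be sketched as below. The helper names are hypothetical, and in practice an agent should prefer the ping URL advertised in the resource’s RDF over constructing one itself.

```python
def ping_url_for(resource_uri: str) -> str:
    """Append the '/links' path segment to a resource URI."""
    return resource_uri.rstrip("/") + "/links"


def resource_uri_for(ping_url: str) -> str:
    """Recover the resource URI from a ping URL of the form above."""
    if not ping_url.endswith("/links"):
        raise ValueError("not an InteRCom-style ping URL")
    return ping_url[: -len("/links")]
```

Applied to the resource in the concrete example, `ping_url_for("http://example.com:8080/webtracks/resources/id/1")` yields the ping URL shown there.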

Note that how the invoking application is informed of the Source/Target Resource URIs is out of scope of this specification. A typical managed resource such as STFC e-Publications (http://epubs.stfc.ac.uk) may provide a form for users to input the relevant link creation information. Alternatively the relevant information may be “scraped” from electronic citation records using a suitable tool (e.g. see [Bergmark 2000]).

In addition, resource owners SHOULD expose the links aggregated by their resources via a SPARQL endpoint. This access method will enable third party value added services to execute specific queries directly on the aggregated link RDF statements, in accordance with Linked Data Principles [Berners-Lee 2006].

5.1.3 Metadata Format

The protocol only mandates three properties to be communicated when creating a link:

The Subject: the Source Resource URI

The Object: the Target Resource URI

The Predicate : the type of linkage

For consistency and persistence, cool URIs SHOULD be used to identify the resources. InteRCom by design aims to be flexible and purposely does not constrain the metadata being exchanged, except that a representation in XML formatted RDF should be available (see Section 1.1.2). Both the number and type of metadata properties can be specified on a link-by-link basis. Resource owners are free to propagate all relevant properties, which may range from metadata describing the Resource to Rights information relating to its usage.

Note that publication of the metadata does not enforce any client behaviour. It is up to the client to interpret the information in order to adopt the most respectful behaviour towards the resources offered. To facilitate interpretation and the correct usage of the resource, it is strongly recommended that common and well understood ontologies, e.g. SPAR [Shotton 2010], are used.

5.1.4 GET Metadata Requests

Each digital resource will be associated with an HTTP URI that can be de-referenced to an XML formatted RDF representation of the resource’s metadata. It is strongly recommended that the URI also provides a representation of the metadata suitable for viewing through a web browser.

An example reply to a GET metadata request is provided below:


Figure 6 : An Example Reply to a GET metadata Request.

The digital resource described is specified in a rdf:about attribute of a rdf:Description tag. The child tags contain arbitrary metadata on the resource. The above example uses the Dublin Core [DCMI 2010] standard to describe the metadata. Note that it includes the InteRCom ping URL associated with the resource to support the InteRCom server discovery process.
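The discovery step, reading the ping URL out of a GET metadata reply, might look like the sketch below. The sample reply is invented for illustration and is not the content of Figure 6; in particular the property name `intercom:pingURL` is an assumption, since the text names the namespace but not the property. Only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
INTERCOM_NS = "http://intercom.stfc.ac.uk/2011/"

# Illustrative reply; the property name intercom:pingURL is an assumption.
SAMPLE_REPLY = """<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:intercom="http://intercom.stfc.ac.uk/2011/">
  <rdf:Description rdf:about="http://example.com:8080/webtracks/resources/id/1">
    <dc:title>An example publication</dc:title>
    <intercom:pingURL rdf:resource="http://example.com:8080/webtracks/resources/id/1/links"/>
  </rdf:Description>
</rdf:RDF>
"""


def discover_ping_url(rdf_xml: str, resource_uri: str):
    """Return the ping URL advertised for resource_uri, or None."""
    root = ET.fromstring(rdf_xml.encode("utf-8"))
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        if desc.get(f"{{{RDF_NS}}}about") != resource_uri:
            continue
        ping = desc.find(f"{{{INTERCOM_NS}}}pingURL")
        if ping is not None:
            return ping.get(f"{{{RDF_NS}}}resource")
    return None
```

Returning `None` when no ping URL is advertised lets the caller treat the resource as not InteRCom-enabled and stop, in line with the fail-gracefully principle.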

5.1.5 POST Requests

The link creation POST request should be POSTed to the InteRCom ping URL advertised in the resource’s RDF (see Section 5.1.2). The receiver could be either the Subject or Target of the link assertion. The Content-Type header of the HTTP POST request MUST be ‘application/rdf+xml’ with a character encoding of UTF-8. The User-Agent field should be set to an appropriate version of the InteRCom User Agent.

The entity payload of the POST request is an XML formatted RDF containing the link assertion. The format of the RDF/XML is similar to the metadata format specified in Section 5.1.4. All references to an rdf namespace refer to the W3C’s RDF ontology [RDF]. The subject is specified in an rdf:about attribute of a rdf:Description tag. The object should be specified as an rdf:resource attribute to a tag specifying the predicate. The predicate can be from any appropriate ontology (but see Section 5.1.3).

Figure 7 gives an example of a minimal InteRCom POST request. It links a publication (the Source Resource) in the STFC ePublications archive (epubs.stfc.ac.uk) to a beamline Investigation (experiment) managed by the ISIS ICAT data catalogue (data.isis.stfc.ac.uk) using a predicate of cito:usesDataFrom from the CiTO ontology [CIT].


Figure 7 : Example minimal Post Request for a link creation.
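A minimal link-creation request of the kind described in this section could be assembled as sketched below. This is not a reproduction of Figure 7: the function names, the example URIs and the User-Agent string are assumptions. The request is only prepared, not sent.

```python
import urllib.request
from xml.sax.saxutils import quoteattr

CITO_NS = "http://purl.org/spar/cito/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def build_link_payload(source_uri: str, target_uri: str) -> bytes:
    """Return a minimal RDF/XML body: source cito:usesDataFrom target."""
    xml = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:cito="{CITO_NS}">\n'
        # quoteattr escapes the URI and wraps it in quotes
        f'  <rdf:Description rdf:about={quoteattr(source_uri)}>\n'
        f'    <cito:usesDataFrom rdf:resource={quoteattr(target_uri)}/>\n'
        '  </rdf:Description>\n'
        '</rdf:RDF>\n'
    )
    return xml.encode("utf-8")


def build_ping_request(ping_url: str, payload: bytes) -> urllib.request.Request:
    """Prepare (but do not send) the link-creation POST request."""
    return urllib.request.Request(
        ping_url,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/rdf+xml; charset=UTF-8",
            "User-Agent": "InteRCom-Example-Agent/0.1",  # hypothetical value
        },
    )
```

Sending would then be a single `urllib.request.urlopen(request)` call, ideally wrapped so that a failure is logged rather than raised to the invoking application.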

If the Resource being linked to is managed by another authority, the Sender SHOULD also include metadata on the Source Resource in the request RDF. The Target Resource may use the information to complete or update its own sparse description of the Source Resource. Note that it is not necessary to send metadata about a Resource already stored in the receiving repository.

An example InteRCom POST request with metadata on the Source Resource is given below.

Figure 8 : Example Post Request for a Link Creation
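On the receiving side, a request that carries both the link assertion and metadata on the Source Resource has to be separated into the two kinds of statement. The sketch below shows one way this might be done; the sample payload, its URIs, and the function name `split_payload` are assumptions made for illustration, not content taken from the figure.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical payload: one link assertion plus Dublin Core metadata.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:cito="http://purl.org/spar/cito/">
  <rdf:Description rdf:about="http://epubs.example.org/work/1">
    <cito:usesDataFrom rdf:resource="http://data.example.org/investigation/2"/>
    <dc:title>An illustrative publication</dc:title>
    <dc:creator>A. Author</dc:creator>
  </rdf:Description>
</rdf:RDF>
"""


def split_payload(rdf_xml: str):
    """Return (links, metadata) as lists of (subject, predicate, value)."""
    links, metadata = [], []
    root = ET.fromstring(rdf_xml)
    for description in root.findall(f"{{{RDF}}}Description"):
        subject = description.get(f"{{{RDF}}}about")
        for child in description:
            target = child.get(f"{{{RDF}}}resource")
            if target is not None:
                # A child with rdf:resource is a link to another resource.
                links.append((subject, child.tag, target))
            else:
                # Anything else is treated as literal metadata.
                metadata.append((subject, child.tag, (child.text or "").strip()))
    return links, metadata


links, metadata = split_payload(SAMPLE)
```

Predicates are returned in ElementTree's `{namespace}local-name` form, so the receiver can match them against the ontologies it accepts before applying its local policy.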

5.1.5.1 POST Request Responses

Upon successful submission of a POST request, the InteRCom User Agent should set the status code of the HTTP response to 202.

5.1.6 Logging Requirements

While the protocol does not specify how logging should be managed or formatted, compliant InteRCom services SHOULD log messages sent and received, so that messages can be audited and spurious, misdirected, or maliciously sent messages can be traced.
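Since the log format is left open, the following is only one possible shape for an audit entry, sketched with Python's standard `logging` module. The logger name, the field layout, and the choice to record a SHA-256 digest of the payload rather than the payload itself are all assumptions.

```python
import hashlib
import logging

# Hypothetical logger name for InteRCom audit records.
logger = logging.getLogger("intercom.audit")


def audit_entry(direction: str, peer: str, payload: bytes) -> str:
    """Log one sent or received message and return the logged line."""
    digest = hashlib.sha256(payload).hexdigest()
    entry = f"{direction} peer={peer} bytes={len(payload)} sha256={digest}"
    logger.info(entry)
    return entry
```

Recording the peer address and a digest is enough to trace where a message came from and to confirm later which payload it carried, without storing every message body in the log.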

5.1.7 REST Interface

InteRCom has a RESTful interface with these HTTP methods:

GET – Get metadata on the requested Resource URI.

Produces – application/rdf+xml

POST – Request the creation of a link with the Target (Receiving) Resource URI.

Consumes – application/rdf+xml

PUT – Update a specific link on the requested Link Resource URI.

Consumes – application/rdf+xml

DELETE – Remove a specific link on the requested Link Resource URI.

Consumes – application/rdf+xml

GET – Get a specific link on the requested Link Resource URI.

Produces – application/rdf+xml
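The method list above can be captured as a small lookup table, which a service might consult before handing a request to its User Agent. The sketch below is illustrative: the labels "resource" and "link" for the two kinds of URI, and the function name `is_supported`, are assumptions rather than protocol terms.

```python
RDF_XML = "application/rdf+xml"

# (HTTP method, kind of URI addressed) -> what the operation does.
# "resource" and "link" are illustrative labels, not protocol terms.
INTERFACE = {
    ("GET", "resource"): {"does": "get metadata", "produces": RDF_XML},
    ("POST", "resource"): {"does": "request link creation", "consumes": RDF_XML},
    ("PUT", "link"): {"does": "update link", "consumes": RDF_XML},
    ("DELETE", "link"): {"does": "remove link", "consumes": RDF_XML},
    ("GET", "link"): {"does": "get link", "produces": RDF_XML},
}


def is_supported(method: str, uri_kind: str, content_type: str = "") -> bool:
    """Return True if the call matches one of the listed operations."""
    operation = INTERFACE.get((method.upper(), uri_kind))
    if operation is None:
        return False
    expected = operation.get("consumes")
    if expected is None:
        return True  # nothing is consumed, so there is no body type to check
    # Ignore parameters such as "; charset=UTF-8" when comparing media types.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == expected
```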

5.2 Conformance Requirements

To claim conformance to this specification, an InteRCom-enabled repository MUST provide resource-specific InteRCom Ping URLs and publish these as part of the resources’ metadata. It is strongly recommended that the repository also publish a SPARQL endpoint to support queries over the aggregated link RDF resource.

To claim conformance to this specification, an InteRCom-enabled Resource MUST support the GET Resource metadata and POST link creation methods as described in Sections 2.1.3 and 2.1.4.

To claim conformance to this specification, an InteRCom User Agent must be able to compose and parse XML-formatted RDF representations of InteRCom messages.

6 Example

6.1 Linking Resources in Managed Archives

Figure 9 : Interactions between Managed Archives

This example describes the full InteRCom protocol being used to create a link between two Resources held in separate managed archives. This satisfies Use Case 1: Citation Tracking, described above.

1. Repository A publishes/updates a resource which has an RDF link assertion to a third party resource.

2. The Source Resource passes the link assertion to its InteRCom User Agent.

3. The InteRCom User Agent (Source) extracts the Target (Receiving) Resource URI from the link assertion. (The target Resource could be either the Subject or Object of the RDF statement as the link could be a backward or forward one). It sends a GET metadata request to the Target Resource URI.

4. The Target Resource GET method validates the request and passes the request to its InteRCom User Agent (Target).

5. The User Agent (Target) obtains metadata on the requested resource, composes the Target’s InteRCom Ping URL and returns these in XML-formatted RDF (see Figure 2).

6. The User Agent (Source) parses the response, extracts the Ping URL and processes the metadata according to local policy.

7. The User Agent (Source) prepares the link creation POST request. It requests metadata on the Source Resource from the local repository.

8. The User Agent (Source) composes the link creation POST request, which contains the InteRCom Ping URL and metadata on the Source Resource (see Figure 8).

9. The User Agent (Source) POSTs the request to the Target Resource’s Ping URL.

10. The Target Resource POST method validates the request and passes the request to its InteRCom User Agent (Target).

11. The User Agent (Target) processes the link creation request according to its local policy, which may include specific authentication and authorisation, before adding the link to its knowledge base.

12. The User Agent sets an HTTP status code of 200 OK to send back to the requester.

Note that should the interaction fail at any point, it fails silently. In this event, an error message SHOULD be logged to facilitate error resolution.
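Seen from the User Agent (Source), steps 3, 6 and 9 together with the silent-failure rule can be sketched as a single function. The sketch makes several assumptions: the HTTP calls are passed in as plain functions so that no particular client library is implied, `PING_TAG` is a hypothetical element name standing in for whatever property carries the ping URL in the Target's metadata, and `notify_target` is an illustrative name.

```python
import logging
import xml.etree.ElementTree as ET
from typing import Callable, Optional

logger = logging.getLogger("intercom.useragent")

# Hypothetical element holding the ping URL in the Target's metadata;
# the real property name is defined by the metadata format, not here.
PING_TAG = "{http://example.org/intercom#}pingUrl"


def notify_target(
    target_uri: str,
    link_rdf_xml: str,
    http_get: Callable[[str], str],
    http_post: Callable[[str, str], int],
) -> Optional[int]:
    """Run steps 3, 6 and 9; return the HTTP status, or None on failure."""
    try:
        metadata_xml = http_get(target_uri)        # step 3: GET metadata
        root = ET.fromstring(metadata_xml)         # step 6: parse the response
        ping = root.find(f".//{PING_TAG}")
        if ping is None or not (ping.text or "").strip():
            raise ValueError("no ping URL advertised")
        return http_post(ping.text.strip(), link_rdf_xml)  # step 9: POST
    except Exception as exc:
        # The interaction fails silently for the caller, but is logged.
        logger.error("link notification to %s failed: %s", target_uri, exc)
        return None
```

Passing `http_get` and `http_post` as arguments keeps the control flow testable without a network and leaves authentication and transport choices to local policy.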

6.2 Linking Resources in Managed Archives – Initiated by a Third Party

Figure 10 : Protocol Initiated by a Third Party Data Management Tool.

This example is similar to the one described in Section 6.1 except that the request comes from a Third Party Data Management Tool. This satisfies Use Case 3: Linking via Annotation, described above.

1. A Third Party Data Management Tool creates/updates a resource that contains an RDF link between two web resources. This is the result of a local triggering event, outside of the scope of the protocol.

2. This triggers a call to the User Agent to initiate link creation processing.

3. For each URI in the link, the User Agent sends a GET request for metadata which includes the InteRCom ping URL.

4. The Receiving resource processes the GET request as per steps 4 and 5 detailed in the previous example.

5. The User Agent (Resource X) parses the response to extract the Ping URL.

6. The User Agent (Resource X) composes the link creation POST request.

7. The remaining processes follow steps 9-12 detailed in the previous example.

7 Security Considerations

Because InteRCom is based on HTTP, it is subject to the same security considerations as that specification [RFC2616]. InteRCom does not specify any normative measures for security or for preventing malicious spamming, as these are policy issues best addressed by the resource owners. Various security frameworks and libraries, such as Public Key Infrastructure [PKI 2000] and OAuth [OAuth 2012], can be applied to protect InteRCom-enabled Resources. Similarly, there are standard defensive practices for preventing spamming. For instance, Storelink only accepts TrackBack calls from white-listed IP addresses. Some TrackBack servers scan Originating Resources for legitimate links to the Target Resource (see [Trackback 2008]). In the case of managed public archives, active moderation, with an administrator screening the incoming TrackBack Pings, has also proven effective.
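As an illustration of the white-listing practice mentioned above, a receiving service might check the sender's address before processing a request. The sketch below uses Python's standard `ipaddress` module; the allowed networks are reserved documentation ranges chosen for the example, and the function name is hypothetical. It is not a description of how Storelink implements its check.

```python
import ipaddress

# Example allow-list built from reserved documentation ranges (assumption).
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_whitelisted(remote_addr: str) -> bool:
    """Return True if the sender's address falls inside an allowed network."""
    try:
        address = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # not a valid IP address, so never allowed
    return any(address in network for network in ALLOWED_NETWORKS)
```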

Acknowledgements

The InteRCom protocol was developed as part of the WebTracks Project (http://www.jisc.ac.uk/whatwedo/programmes/mrd/clip/webtracks.aspx), which is funded by the JISCMRD Programme (http://www.jisc.ac.uk/whatwedo/programmes/mrd.aspx).

References

[Bergmark 2000] Donna Bergmark, Automatic extraction of Reference Linking Information from Online Documents. Cornell Digital Library Research Group CSTR 2000-1821

[Berners-Lee 2006] Berners-Lee, T. (2006). Linked Data - Design Issues. Retrieved 10 April 2012, http://www.w3.org/DesignIssues/LinkedData.html

[Claddier 2007] CITATION, LOCATION, And DEPOSITION IN DISCIPLINE & INSTITUTIONAL REPOSITORIES (CLADDIER) JISC Project 2005-7. http://www.jisc.ac.uk/whatwedo/programmes/digitalrepositories2005/claddier

[DCMI 2010] Dublin Core Metadata Initiative specifications http://dublincore.org/specifications/

[Langridge & Hickson 2002] Stuart Langridge and Ian Hickson. Pingback 1.0 2002. http://www.hixie.ch/specs/pingback/pingback

[Jacobs 2001] Ian Jacobs, Architecture of the World Wide Web, Volume One. http://www.w3.org/2001/tag/webarch/.

[Matthews et. al. 2007a] Brian Matthews, Katherine Portwin, Catherine Jones, and Bryan Lawrence. Recommendations for Data/Publication Linkage, CLADDIER Project Report III. Nov 2007 http://epubs.stfc.ac.uk/work-details?w=42017

[Matthews et. al. 2007b] Brian Matthews, Katherine Bouton, Jessie Hey, Catherine Jones, Sue Latham, Bryan Lawrence, Alistair Miles, Sam Pepler, Katherine Portwin. Cross-linking and referencing data and publications in Claddier. Proc. UK e-Science 2007 All Hands Meeting, 10-13 Sep 2007

[Matthews et. al. 2008] Brian Matthews, Katherine Portwin, Catherine Jones, Bryan Lawrence. Using Trackback to Support Citation Notification Services. XTech 2008: The Web on the Move (XTech2008), Dublin, Ireland, 06-09 May 2008

[Matthews et. al. 2009] Brian Matthews, Alastair Duncan, Catherine Jones, Cameron Neylon, Mark Borkum, Simon Coles, Philip Hunter, A Protocol for Exchanging Scientific Citations, e-science, pp.171-177, 2009 Fifth IEEE International Conference on e-Science, 2009 http://www.computer.org/portal/web/csdl/doi/10.1109/e-Science.2009.32.

[Moreau et. al 2010] Luc Moreau, Ben Clifford, Juliana Freire, Joe Futrelle, Yolanda Gil, Paul Groth, Natalia Kwasnikowska, Simon Miles, Paolo Missier, Jim Myers, Beth Plale, Yogesh Simmhan, Eric Stephan, Jan Van den Bussche. The open provenance model core specification (v1.1). Future Generation Computer Systems, July 2010.

[OAuth 2012] OAuth Work 2.0 http://oauth.net/

[PKI 2000] Public Key Infrastructure http://www.opengroup.org/security/pki/

[RFC2119] S. Bradner. Key words for use in RFCs to Indicate Requirement Levels, IETF, March 1997. http://www.normos.org/ietf/rfc/rfc2119.txt

[RFC2616] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach and T. Berners-Lee. Hypertext Transfer Protocol – HTTP/1.1, RFC 2616, June 1999 http://www.ietf.org/rfc/rfc2616.txt

[Salmon 2011] Salmon Protocol http://www.salmon-protocol.org/

[Shotton 2010] David Shotton. Introducing the Semantic Publishing and Referencing (SPAR) Ontologies. http://opencitations.wordpress.com/2010/10/14/introducing-the-semantic-publishing-and-referencing-spar-ontologies/

[Shotton & Peroni 2011] David Shotton, Silvio Peroni, Citation Typing Ontology (CiTO). http://speroni.web.cs.unibo.it/cgi-bin/lode/req.py?req=http://purl.org/spar/cito.

[SPB 2010] Semantic Pingback http://aksw.org/Projects/SemanticPingBack

[Trackback 2008] TrackBack Technical Specification http://www.movabletype.org/documentation/trackback/specification.html

[W3C 2012] W3C Provenance Working Group http://www.w3.org/2011/prov/wiki/Main_Page