ResourceSync Tutorial

ResourceSync Tutorial DANS, January 21 2014, Den Haag, Netherlands ResourceSync: A Web-Based Resource Synchronization Framework ResourceSync is funded by The Sloan Foundation & JISC #resourcesync 1

description

These slides are a tutorial for the OAI ResourceSync framework.

Transcript of ResourceSync Tutorial

Page 1: ResourceSync Tutorial

ResourceSync: A Web-Based Resource Synchronization Framework

ResourceSync is funded by The Sloan Foundation & JISC

#resourcesync

Page 2: ResourceSync Tutorial


These slides were presented at the LITA Forum, Louisville, Kentucky, November 10 2013

The most recent version of the slides is available at

http://www.slideshare.net/OpenArchivesInitiative/resourcesync-tutorial

Page 3: ResourceSync Tutorial

ResourceSync Tutorial History

• First outing: OAI8, June 2013
• Second run: Open Repositories, July 2013
• Third run: JCDL, July 2013
• Fourth run: TPDL 2013, September 2013
• Fifth run: LITA Forum, November 2013
• Sixth run: SWIB 2013, November 2013

Presenter

Herbert Van de Sompel, Los Alamos National Laboratory, @hvdsomp

Page 4: ResourceSync Tutorial

ResourceSync Tutorial Contributors

• Martin Klein, Los Alamos National Laboratory, @mart1nkle1n
• Simeon Warner, Cornell University, @zimeon
• Herbert Van de Sompel, Los Alamos National Laboratory, @hvdsomp
• Robert Sanderson, Los Alamos National Laboratory, @azaroth24
• Richard Jones, Cottage Labs, @cottagelabs
• Michael L. Nelson, Old Dominion University, @phonedude_mln

Page 5: ResourceSync Tutorial

OAI

Herbert Van de Sompel, Martin Klein, Robert Sanderson (Los Alamos National Laboratory)

Simeon Warner (Cornell University)

Bernhard Haslhofer (University of Vienna)

Michael L. Nelson (Old Dominion University)

Carl Lagoze (University of Michigan)

NISO

Todd Carpenter, Nettie Lagace

University of Oxford

Graham Klyne

Lyrasis

Peter Murray

Page 6: ResourceSync Tutorial

ResourceSync Technical Group

JISC

Richard Jones, Graham Klyne

Stuart Lewis

OCLC

Jeff Young

LOCKSS

David Rosenthal

RedHat

Christian Sadilek

Ex Libris Inc.

Shlomo Sanders

Library of Congress

Kevin Ford

Paul Walk

Page 7: ResourceSync Tutorial

Timeline, Status of Specification(s)

• August 2013
  o Release of ResourceSync framework Core specification, Version 0.9.1
  o Public draft of ResourceSync Archives specification released
• September 2013
  o Core specification on its way to becoming an ANSI standard
• November 2013
  o Internal draft of ResourceSync Notification specification
• January 2014
  o Public draft of ResourceSync Notification specification
• Mid 2014
  o Core specification becomes ANSI/NISO standard

Page 8: ResourceSync Tutorial

Pointers

• Specification
  o http://www.openarchives.org/rs/
  o http://www.openarchives.org/rs/resourcesync
  o http://www.openarchives.org/rs/notification
  o http://www.openarchives.org/rs/archives
• List for public comment
  o https://groups.google.com/d/forum/resourcesync
• Client and simulator code
  o http://github.org/resync/resync
  o http://github.org/resync/simulator

Page 9: ResourceSync Tutorial

Papers

• Klein, M., and Van de Sompel, H. (2013) Extending Sitemaps for ResourceSync. ACM/IEEE JCDL 2013. http://arxiv.org/abs/1305.4890

• Haslhofer, B., Warner, S., Lagoze, C., Klein, M., Sanderson, R., Nelson, M.L., and Van de Sompel, H. (2013) ResourceSync: Leveraging Sitemaps for Resource Synchronization. WWW 2013 Developer Track. http://arxiv.org/abs/1305.1476

• Klein, M., Sanderson, R., Van de Sompel, H., Warner, S., Haslhofer, B., Lagoze, C., and Nelson, M.L. (2013) A Technical Framework for Resource Synchronization. D-Lib Magazine. http://dx.doi.org/10.1045/january2013-klein

• Van de Sompel, H., Sanderson, R., Klein, M., Nelson, M.L., Haslhofer, B., Warner, S., and Lagoze, C. (2012) A Perspective on Resource Synchronization. D-Lib Magazine. http://dx.doi.org/10.1045/september2012-vandesompel

Page 10: ResourceSync Tutorial


ResourceSync - Agenda

1. ResourceSync: Problem Perspective & Conceptual Approach

2. Motivation & Use Cases

3. Framework Walkthrough

4. Framework (Technical) Details

5. Implementation

6. Q&A



Page 12: ResourceSync Tutorial

Synchronize What?

• Web resources
  o things with a URI that can be dereferenced
• Focus on needs of research communication and cultural heritage organizations
  o but aim for generality

Page 13: ResourceSync Tutorial

Synchronize What?

• Small websites/repositories (a few resources) to large repositories/datasets/linked data collections (many millions of resources)

Page 14: ResourceSync Tutorial

Synchronize What?

• Low change frequency (weeks/months) to high change frequency (seconds)

Page 15: ResourceSync Tutorial

Synchronize What?

• Synchronization latency and accuracy needs may vary

Page 16: ResourceSync Tutorial

Why?

… because lots of projects and services are doing synchronization but have to resort to ad-hoc, case-by-case approaches!

• Project team involved with projects that need this
• Experience with OAI-PMH: widely used in repos but
  o XML metadata only
  o Web technology has moved on since 1999
• Devise a shared solution for data, metadata, linked data?

Page 17: ResourceSync Tutorial

ResourceSync Problem

• Consideration:
  o Source (server) A has resources that change over time: they get created, modified, deleted
  o Destination (servers) X, Y, and Z leverage (some) resources of Source A.
• Problem:
  o Destinations want to keep in step with the resource changes at Source A: resource synchronization.
• Goal:
  o Design an approach for resource synchronization aligned with the Web Architecture that has a fair chance of adoption by different communities.
  o The approach must scale better than recurrent HTTP HEAD/GET on resources.

Page 18: ResourceSync Tutorial

Source: Core Synchronization Capabilities (PULL)

1. Describing content – publish a list of resources available for synchronization to enable Destinations to perform an initial load or catch-up with a Source

2. Packaging content – bundle resources to enable bulk download by Destinations

3. Describing changes – publish a list of resource changes to enable Destinations to stay synchronized and decrease latency

4. Packaging changes – bundle resource changes for bulk download by Destinations

Page 19: ResourceSync Tutorial

Source: Notification Capabilities (PUSH)

To reduce synchronization latency and to optimize the synchronization process, the Source can support:

1. Change Notification
  o Notifies about changes to particular resources
  o e.g., resource A has been updated | created | deleted

2. Framework Notification
  o Notifies about changes to capabilities, i.e., their documents
  o e.g., a Change List has been updated | created | deleted

Page 20: ResourceSync Tutorial

Source: Archival Capabilities (ARCHIVES)

The Source may hold on to historical data, for example, to allow Destinations to catch up with events they missed or revisit prior resource states. To this end, the Source can publish archives, i.e. documents that enumerate historical capability documents:

1. Resource List Archive

2. Resource Dump Archive

3. Change List Archive

4. Change Dump Archive

Page 21: ResourceSync Tutorial

Source: Synchronization Features

1. Discovery of capabilities – support Destinations in discovering all offered capabilities
  o Applies to PULL, PUSH, ARCHIVES capabilities

2. Linking to related resources – provide links from resources subject to synchronization to related resources
  o Applies to PULL, PUSH capabilities

Page 22: ResourceSync Tutorial

Destination: Synchronization Needs

1. Baseline synchronization – A destination must be able to perform an initial load or catch-up with a source
  - avoid out-of-band setup

2. Incremental synchronization – A destination must have some way to keep up-to-date with changes at a source
  - subject to some latency; minimal: create/update/delete
  - allow to catch-up after destination has been offline

3. Audit – A destination should be able to determine whether it is synchronized with a source
  - regarding coverage and accuracy
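The audit need can be illustrated with a minimal Python sketch. It assumes the Destination holds its copies in a dict of URI to bytes and has already extracted a dict of URI to MD5 hex digest from a Source description; the function name `audit` and both data structures are hypothetical, not part of the framework.

```python
import hashlib


def audit(local, source_md5):
    """Compare a local copy against digests advertised by a Source.

    local:      dict mapping URI -> bytes held by the Destination (assumed layout)
    source_md5: dict mapping URI -> MD5 hex digest published by the Source
    """
    # Coverage: resources the Source lists that the Destination lacks, and vice versa.
    missing = sorted(uri for uri in source_md5 if uri not in local)
    extra = sorted(uri for uri in local if uri not in source_md5)
    # Accuracy: resources present on both sides whose content differs.
    stale = sorted(
        uri
        for uri, digest in source_md5.items()
        if uri in local and hashlib.md5(local[uri]).hexdigest() != digest.lower()
    )
    return {"missing": missing, "extra": extra, "stale": stale}


def in_sync(report):
    """True when the audit found no coverage or accuracy problems."""
    return not (report["missing"] or report["extra"] or report["stale"])
```

Coverage problems show up as `missing` or `extra` URIs, accuracy problems as `stale` URIs.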


Page 23: ResourceSync Tutorial


ResourceSync - Agenda

1. ResourceSync: Problem Perspective & Conceptual Approach

2. Motivation & Use Cases

3. Framework Walkthrough

4. Framework (Technical) Details

5. Implementation

6. Q&A


Page 24: ResourceSync Tutorial

Use Cases – The Basics

[Figure: use case diagrams, panels a) and b)]

Page 25: ResourceSync Tutorial

Use Cases – The Basics

[Figure: use case diagrams, panels c) and d)]

Page 26: ResourceSync Tutorial

Use Cases – The not-so-Basics

[Figure: use case diagrams, panels e) and f)]

Page 27: ResourceSync Tutorial

Use Case 1: arXiv Mirroring and Data Sharing

• Repository of scholarly articles in physics, mathematics, computer science, etc.
• > 850k articles
• approx. 1.5 revisions per article on average
• approx. 75k new articles per year
• Each article has full-text and separate metadata record
• approx. 3.8M resources

Page 28: ResourceSync Tutorial

Use Case 1: arXiv Mirroring and Data Sharing

• 2,700 updates daily
  o at 8pm EST
  o Currently using homebrew mirroring solution (running with minor modifications since 1994!)
  o occasional rsync (file system-specific, auth issues)

Page 29: ResourceSync Tutorial

Use Case 1: arXiv Mirroring

• GOAL: Keep mirror sites synchronized with daily changes
• WANT:
  o high consistency
  o moderate latency
  o robustness to global network outages (low admin effort)
  o ability to verify sync status in case of questions

Page 30: ResourceSync Tutorial

Use Case 1: arXiv Data Sharing

• GOAL: Make resources and update information publicly available so that any other service may synchronize at the frequency it needs, e.g.
  o Math Front at UC Davis
  o EprintWeb from IOP in UK
  o Data for bibliometric and scientometric analysis
• WANT:
  o low admin effort (i.e. standard approach, standard tools)
  o reasonable consistency, latency, efficiency

Page 31: ResourceSync Tutorial

Use Case 2: DBpedia Live Duplication

• Average of 2 updates per second
• Low latency desirable => need for a push technology

Page 32: ResourceSync Tutorial

Use Case 2: DBpedia Live Duplication

• Daily traffic:
  o 99% updates
  o 0.6% deletions
  o 0.03% creations

Page 33: ResourceSync Tutorial

Use Case 2: DBpedia Live Duplication

• # of content transfer events in two 8 hour intervals

• Max. queue size of remote duplication process

Page 34: ResourceSync Tutorial


ResourceSync - Agenda

1. ResourceSync: Problem Perspective & Conceptual Approach

2. Motivation & Use Cases

3. Framework Walkthrough

4. Framework (Technical) Details

5. Implementation

6. Q&A


Page 35: ResourceSync Tutorial

Source Capability 1: Describing Content

In order to advertise the resources that a source wants destinations to know about, it may describe them:

o Publish a Resource List, a list of resource URIs and possibly associated metadata
  - Destination GETs the Resource List
  - Destination GETs listed resources by their URI

o A Resource List describes the state of a set of resources at one point in time (snapshot)

Page 36: ResourceSync Tutorial

Page 37: ResourceSync Tutorial

Page 38: ResourceSync Tutorial

Source Capability 2: Packaging Content

By default, content is transferred in response to a GET issued by a destination against a URI of a source’s resource. But a source may support additional mechanisms:

o Publish a Resource Dump, a document that points to packages of resource representations and necessary metadata
  - Destination GETs the package
  - Destination unpacks the package
  - ZIP format supported

o A Resource Dump and the packages it points to reflect the state of a set of resources at one point in time (snapshot)

Page 39: ResourceSync Tutorial

Page 40: ResourceSync Tutorial

Page 41: ResourceSync Tutorial

Source: Modular Capabilities

Page 42: ResourceSync Tutorial

Source Capability 3: Describing Changes

In order to achieve lower latency and/or greater efficiency, a source may communicate about changes to its resources:

o Publish a Change List, a list of recent change events (created, updated, deleted resource)
  - Destination acts upon change events, e.g. GETs created/updated resources, removes deleted resources.

o A Change List pertains to resources that changed in a temporal interval with a start- and an end-date
  - If a resource changed more than once, it will be listed more than once

Page 43: ResourceSync Tutorial

Page 44: ResourceSync Tutorial

Page 45: ResourceSync Tutorial

Page 46: ResourceSync Tutorial

Page 47: ResourceSync Tutorial

Source Capability 4: Packaging Changes

In order to reduce the number of requests to obtain resource changes, a source may provide packaged bitstreams for changed resources:

o Publish a Change Dump, a document that points to packages containing bitstreams of recently changed resources and necessary metadata
  - Destination GETs the package
  - Destination unpacks the package
  - ZIP format supported

o A Change Dump and its packages pertain to resources that changed in a temporal interval with a start- and an end-date
  - If a resource changed more than once, it will be included more than once

Page 48: ResourceSync Tutorial

Page 49: ResourceSync Tutorial

Page 50: ResourceSync Tutorial

Source: Modular Capabilities

Page 51: ResourceSync Tutorial

Destination: Key Processes

Page 52: ResourceSync Tutorial


ResourceSync - Agenda

1. ResourceSync: Problem Perspective & Conceptual Approach

2. Motivation & Use Cases

3. Framework Walkthrough

4. Framework (Technical) Details

5. Implementation

6. Q&A


Page 53: ResourceSync Tutorial


ResourceSync - Agenda

4. Framework (Technical) Details

1. Sitemaps

2. Core synchronization capabilities (PULL)

3. Discovery

4. Linking to related resources

5. Notification Capabilities (PUSH)

6. Archival capabilities (ARCHIVES)



Page 55: ResourceSync Tutorial

So Many Choices

[Figure: candidate technologies arranged between Push and Pull: XMPP, AtomPub, SDShare, RSS, Atom, PubSubHubbub, Sitemap, rsync, OAI-PMH, WebDAV Col. Syn., OAI-ORE, DSNotify, RDFsync, Crawl, SWORD, SPARQLpush]


Page 57: ResourceSync Tutorial


Page 58: ResourceSync Tutorial

A Framework Based on Sitemaps

• Modular framework allowing selective deployment
• Sitemap is the core format throughout the framework
  o Introduce extension elements and attributes:
    - In ResourceSync namespace (rs:) to accommodate synchronization needs
  o Reuse Sitemap format for all capability documents: Resource List, Resource Dump, Change List, Change Dump, as well as for manifest in Dumps
  o Utilize Sitemap index format where needed/allowed

Page 59: ResourceSync Tutorial

Sitemap Format

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T14:00:00Z</lastmod>
  </url>
  …
</urlset>
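As a rough illustration of how a Destination might read such a document, here is a minimal Python sketch using only the standard library. The function name `read_sitemap` and the `SAMPLE` string are illustrative assumptions; the sample is the slide's example with the elided entries left out.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# The slide's example, without the elided ("…") entries.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T14:00:00Z</lastmod>
  </url>
</urlset>"""


def read_sitemap(xml_text):
    """Return a list of (loc, lastmod) pairs from a Sitemap <urlset>."""
    ns = {"sm": SITEMAP_NS}
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", ns):
        loc = url.findtext("sm:loc", namespaces=ns)
        lastmod = url.findtext("sm:lastmod", namespaces=ns)  # None when absent
        entries.append((loc, lastmod))
    return entries
```

Because every element lives in the Sitemap namespace, the lookups need the namespace mapping; a plain `find("url")` would match nothing.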


Page 60: ResourceSync Tutorial

Sitemap Index Format

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://example.com/sitemap1.xml</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://example.com/sitemap2.xml</loc>
    <lastmod>2013-01-02T14:00:00Z</lastmod>
  </sitemap>
  …
</sitemapindex>

Page 61: ResourceSync Tutorial

ResourceSync Sitemap Extensions

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln …/>
  <rs:md …/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:ln …/>
    <rs:md …/>
  </url>
  <url>
    …
  </url>
</urlset>

Page 62: ResourceSync Tutorial

ResourceSync Sitemap Extensions

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln …/>
  <rs:md …/>
  <sitemap>
    <loc>http://example.com/sitemap1.xml</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:ln …/>
    <rs:md …/>
  </sitemap>
  …
</sitemapindex>

Page 63: ResourceSync Tutorial

Resource Metadata Summary

Element/Attribute | Description | Defined by
<loc> | Resource URI (identity) | sitemaps
<lastmod> | Timestamp of last change | sitemaps
<changefreq> | Expected update frequency | sitemaps
<rs:md> | | ResourceSync
  change | Change type (Change List & Change Dump Manifest only) | ResourceSync
  encoding | HTTP Content-Encoding header value | RFC2616
  hash | One or more content digests (md5, sha-1, sha-256) | Atom Link Ext.
  length | HTTP Content-Length header value | RFC4287
  path | Path in ZIP package (Dump Manifests only) | ResourceSync
  type | HTTP Content-Type header value | RFC4287

Page 64: ResourceSync Tutorial

Related Resource Metadata Summary

• Attributes of the <rs:ln> element; c.f. resource metadata + pri

Element/Attribute | Description | Defined by
<rs:ln> | | ResourceSync
  encoding | HTTP Content-Encoding header value | RFC2616
  hash | One or more content digests (md5, sha-1, sha-256) | Atom Link Ext.
  href | Related resource URI (identity) | RFC4287
  length | HTTP Content-Length header value | RFC4287
  modified | Timestamp of last change (c.f. <lastmod>) | Atom Link Ext.
  path | Path in ZIP package (Dump Manifests only) | ResourceSync
  pri | Priority of link | RFC6249
  rel | Relation - IANA registered or URI | RFC4287
  type | HTTP Content-Type header value | RFC4287

Page 65: ResourceSync Tutorial

Link Relation Summary

Relation | Use in ResourceSync | Defined in
rel="alternate" | Link from generic to specific URI | HTML 5
rel="canonical" | Link from specific to generic URI | RFC6596
rel="collection" | Resource is member of collection | RFC6573
rel="contents" | Link from dump to manifest | HTML4
rel="describedby" | Has metadata | Protocol for Web Description Resources (POWDER): Description Resources
rel="describes" | Is metadata for | The 'describes' Link Relation Type
rel="duplicate" | Mirror or alternative copy | RFC6249
rel=".../rs/terms/patch" | A patch -- efficient change information | This specification
rel="memento" | Link to time-specific URI | Memento Internet Draft
rel="timegate" | Link to timegate | Memento Internet Draft
rel="via" | Provenance chain, came from | RFC4287

Page 66: ResourceSync Tutorial

ResourceSync Sitemap Validation

• All ResourceSync capability documents are valid according to the Sitemap XML Schema
  o http://www.sitemaps.org/schemas/sitemap/0.9
• For a more thorough validation use the ResourceSync XML Schema
  o http://www.openarchives.org/rs/0.9.1/resourcesync.xsd

Page 67: ResourceSync Tutorial

ResourceSync - Agenda

http://www.openarchives.org/rs/resourcesync

4. Framework (Technical) Details

1. Sitemaps

2. Core synchronization capabilities (PULL)

3. Discovery

4. Linking to related resources

5. Notification Capabilities (PUSH)

6. Archival capabilities (ARCHIVES)

Page 68: ResourceSync Tutorial

Describing Content: Resource List

http://www.openarchives.org/rs/resourcesync#DescResources

Page 69: ResourceSync Tutorial

Resource List

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist"
         at="2013-01-03T09:00:00Z"
         completed="2013-01-03T09:01:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"
           length="8876"
           type="text/html"/>
  </url>
  <url>
    …
  </url>
</urlset>

Page 70: ResourceSync Tutorial

Resource List

• Describe Source’s resources that are subject to synchronization
  o At one point in time (snapshot)
  o Creation can take some time – duration can be conveyed
• Typical Destination use: Baseline Synchronization, Audit
• Each URI typically listed only once
• Might be expensive to generate
• Destinations use @at to determine freshness
  o [@at, @completed] – interval of uncertainty
• Destination issues GETs against URIs to obtain resources
• Very similar to current Sitemaps
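The points above can be sketched in Python for a Destination that reads a Resource List and checks a fetched representation against the advertised digest. This is a sketch under assumptions: the function names and `SAMPLE_LIST` are illustrative, fetching is left out, and the hash attribute is assumed to hold whitespace-separated `algorithm:hexdigest` entries.

```python
import hashlib
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

SAMPLE_LIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" at="2013-01-03T09:00:00Z"
         completed="2013-01-03T09:01:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6" length="8876"
           type="text/html"/>
  </url>
</urlset>"""


def read_resource_list(xml_text):
    """Return (document-level rs:md attributes, list of per-resource entries)."""
    root = ET.fromstring(xml_text)
    doc_md = root.find("rs:md", NS)  # direct child only, i.e. the list-level metadata
    info = dict(doc_md.attrib) if doc_md is not None else {}
    resources = []
    for url in root.findall("sm:url", NS):
        md = url.find("rs:md", NS)
        resources.append({
            "uri": url.findtext("sm:loc", namespaces=NS),
            "lastmod": url.findtext("sm:lastmod", namespaces=NS),
            "md": dict(md.attrib) if md is not None else {},
        })
    return info, resources


def matches_md5(content, hash_attr):
    """True/False if an md5 entry is present in hash_attr, None otherwise."""
    for item in hash_attr.split():
        algo, _, digest = item.partition(":")
        if algo == "md5":
            return hashlib.md5(content).hexdigest() == digest.lower()
    return None
```

The `at` and `completed` values in `info` give the interval of uncertainty; a Destination would issue its GETs against each `uri` and can then call `matches_md5` on the bytes it received.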


Page 71: ResourceSync Tutorial

What if I have a million resources?

• Current sitemap limit is 50k resources (or maximum document size of 50MB)
• Break complete list of resources into 50k-resource chunks, each in its own Resource List document
• Create a Resource List Index document to group them:
  o Based on <sitemapindex>
  o May have up to 50k component Resource Lists
  o Extends capacity to 2,500,000,000 resources (50k × 50k) within current community practices
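The arithmetic behind these limits can be checked with a short Python sketch. The helper names are illustrative, and only the 50k entry limit is modelled, not the 50MB document size limit.

```python
MAX_ENTRIES = 50_000  # Sitemap limit on entries per document


def lists_needed(n_resources, size=MAX_ENTRIES):
    """Number of Resource List documents needed for n_resources (ceiling division)."""
    return -(-n_resources // size)


def chunk_uris(uris, size=MAX_ENTRIES):
    """Split a list of URIs into consecutive chunks of at most `size` entries."""
    return [uris[i:i + size] for i in range(0, len(uris), size)]


# One index of up to 50k Resource Lists, each with up to 50k resources.
MAX_WITH_INDEX = MAX_ENTRIES * MAX_ENTRIES
```

For one million resources, `lists_needed(1_000_000)` gives 20 Resource List documents under a single Resource List Index.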

Page 72: ResourceSync Tutorial

Resource List Index <resourcelist_index.xml>

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" at="2013-01-02T09:00:02Z"/>
  <sitemap>
    <loc>http://example.com/resourcelist1.xml</loc>
    <rs:md type="application/xml"/>
  </sitemap>
  <sitemap>
    <loc>http://example.com/resourcelist2.xml</loc>
    <rs:md type="application/xml"/>
  </sitemap>
</sitemapindex>

Page 73: ResourceSync Tutorial

Resource List <resourcelist1.xml>

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln rel="index" href="http://example.com/resourcelist_index.xml"/>
  <rs:md capability="resourcelist" at="2013-01-02T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T08:07:06Z</lastmod>
    <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"
           length="8876"
           type="text/html"/>
  </url>
  ...
</urlset>

Page 74: ResourceSync Tutorial

Resource List Index

Page 75: ResourceSync Tutorial

Packaging Content: Resource Dump

http://www.openarchives.org/rs/resourcesync#ResourceDump

Page 76: ResourceSync Tutorial

Resource Dump

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcedump" at="2013-01-02T09:00:00Z"/>
  <url>
    <loc>http://example.com/resourcedump_part1.zip</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md length="97553" type="application/zip"/>
    <rs:ln rel="contents"
           href="http://example.com/resourcedump_manifest-part1.xml"
           type="application/xml"/>
  </url>
  <url>
    <loc>http://example.com/resourcedump_part2.zip</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
  </url>
</urlset>

Page 77: ResourceSync Tutorial

Resource Dump Manifest

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcedump-manifest" at="2013-01-02T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md type="text/html" path="/resources/res1"/>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md type="application/pdf" path="/resources/res2"/>
  </url>
</urlset>
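To illustrate how the manifest ties bitstreams in a ZIP package back to resource URIs, here is a minimal Python sketch. It assumes the manifest text is already in hand and that the leading slash of @path is dropped to obtain the ZIP member name; both are assumptions of this sketch, as are the names `unpack_dump`, `build_demo_zip` and `SAMPLE_MANIFEST`.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

SAMPLE_MANIFEST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcedump-manifest" at="2013-01-02T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <rs:md type="text/html" path="/resources/res1"/>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <rs:md type="application/pdf" path="/resources/res2"/>
  </url>
</urlset>"""


def unpack_dump(zip_bytes, manifest_xml):
    """Return {resource URI: bitstream} for every manifest entry with a @path."""
    root = ET.fromstring(manifest_xml)
    result = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as package:
        for url in root.findall("sm:url", NS):
            md = url.find("rs:md", NS)
            if md is None or "path" not in md.attrib:
                continue  # entry does not point into the package
            member = md.attrib["path"].lstrip("/")  # assumption: member names are relative
            result[url.findtext("sm:loc", namespaces=NS)] = package.read(member)
    return result


def build_demo_zip():
    """Build a tiny in-memory package matching SAMPLE_MANIFEST (demo data only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("resources/res1", b"<html>one</html>")
        package.writestr("resources/res2", b"%PDF-demo")
    return buffer.getvalue()
```

A Destination would obtain `zip_bytes` by GETting the package URI listed in the Resource Dump; `unpack_dump(build_demo_zip(), SAMPLE_MANIFEST)` exercises the same path offline.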


Page 78: ResourceSync Tutorial

Resource Dump

• A Resource Dump points to packages (ZIP files) that contain representations of the Source’s resources
  o At one point in time (snapshot)
• Resource Dump is mandatory, even if there is only one ZIP file
• ZIP package contains manifest, listing contained bitstreams
• Typical Destination use: Baseline Synchronization, bulk download
• Each URI typically listed only once
• Might be expensive to generate
• Destinations use @at to determine freshness
  o [@at, @completed] – interval of uncertainty
• GETs against individual URIs from Resource List achieve the same result (ignoring varying freshness)

Page 79: ResourceSync Tutorial

Describing Changes: Change List

http://www.openarchives.org/rs/resourcesync#DesChanges

Page 80: ResourceSync Tutorial

Change List

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist"
         from="2013-01-02T09:00:00Z"
         until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"
           hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"
           length="8876"
           type="text/html"/>
  </url>
  <url>
    …
  </url>
</urlset>

Page 81: ResourceSync Tutorial

Open Change List

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-02T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"
           hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"
           length="8876"
           type="text/html"/>
  </url>
</urlset>

Page 82: ResourceSync Tutorial

Change List

• A Change List pertains to a Source’s resources that changed
  o Changes that occurred during a temporal interval with start- and end-date
• Typical Destination use: Incremental Synchronization, Audit
• Changes are listed in chronological order
• Multiple changes to one resource result in the resource being listed multiple times, once per change
• Source determines duration of temporal interval
• Destinations use @from and @until to determine freshness
• Destinations issue GETs against URIs to obtain changed resources
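A minimal Python sketch of how a Destination might act on a Change List follows. The local copy is modelled as a dict of URI to bytes, and `fetch` is a hypothetical callable standing in for an HTTP GET; the function names and `SAMPLE_CHANGES` are illustrative, not part of the framework.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

SAMPLE_CHANGES = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-02T09:00:00Z"
         until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated" length="8876" type="text/html"/>
  </url>
</urlset>"""


def read_changes(xml_text):
    """Return [(uri, change)] in document order, i.e. chronological order."""
    root = ET.fromstring(xml_text)
    changes = []
    for url in root.findall("sm:url", NS):
        md = url.find("rs:md", NS)
        change = md.get("change") if md is not None else None
        changes.append((url.findtext("sm:loc", namespaces=NS), change))
    return changes


def apply_changes(local, changes, fetch):
    """Replay change events against a local copy (dict of URI -> bytes)."""
    for uri, change in changes:
        if change in ("created", "updated"):
            local[uri] = fetch(uri)  # GET the current representation
        elif change == "deleted":
            local.pop(uri, None)     # remove if present
    return local
```

Since a resource is listed once per change, replaying the events in order leaves the local copy in the state described by the last event for each URI.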


Page 83: ResourceSync Tutorial

Change List Index

<changelist_index.xml>

<changelist1.xml>

Page 84: ResourceSync Tutorial

Change List Index <changelist_index.xml>

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist"
         from="2013-01-02T09:00:00Z"
         until="2013-01-03T09:00:00Z"/>
  <sitemap>
    <loc>http://example.com/changelist1.xml</loc>
    <lastmod>2013-01-02T11:00:00Z</lastmod>
    <rs:md type="application/xml"/>
  </sitemap>
  <sitemap>
    <loc>http://example.com/changelist2.xml</loc>
    <lastmod>2013-01-02T23:00:00Z</lastmod>
    <rs:md type="application/xml"/>
  </sitemap>
</sitemapindex>

Page 85: ResourceSync Tutorial

Change List <changelist1.xml>

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln rel="index" href="http://example.com/changelist_index.xml"/>
  <rs:md capability="changelist"
         from="2013-01-02T09:00:00Z"
         until="2013-01-02T21:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"
           hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"
           length="8876"
           type="text/html"/>
  </url>
</urlset>

Page 86: ResourceSync Tutorial

Open Change List Index

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-02T09:00:00Z"/>
  <sitemap>
    <loc>http://example.com/changelist1.xml</loc>
    <lastmod>2013-01-02T11:00:00Z</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://example.com/changelist2.xml</loc>
    <lastmod>2013-01-02T23:00:00Z</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://example.com/changelist_open.xml</loc>
  </sitemap>
</sitemapindex>
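A Destination can use a Change List Index to decide which Change Lists it still needs. The sketch below assumes that an entry without lastmod is the open Change List and should always be fetched, and that all timestamps use the same UTC "Z" form so they compare correctly as strings. The function name and the last-sync value are illustrative.

```python
import xml.etree.ElementTree as ET

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def change_lists_to_fetch(index_xml: str, last_sync: str) -> list:
    """Return Change List URIs modified after last_sync, plus any open ones."""
    uris = []
    for sitemap in ET.fromstring(index_xml).iter(SM + "sitemap"):
        loc = sitemap.findtext(SM + "loc")
        lastmod = sitemap.findtext(SM + "lastmod")
        # No lastmod: treated here as the still-open Change List (assumption).
        if lastmod is None or lastmod > last_sync:
            uris.append(loc)
    return uris


INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>http://example.com/changelist1.xml</loc>
    <lastmod>2013-01-02T11:00:00Z</lastmod></sitemap>
  <sitemap><loc>http://example.com/changelist2.xml</loc>
    <lastmod>2013-01-02T23:00:00Z</lastmod></sitemap>
  <sitemap><loc>http://example.com/changelist_open.xml</loc></sitemap>
</sitemapindex>"""

print(change_lists_to_fetch(INDEX, "2013-01-02T12:00:00Z"))
```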

Page 87: ResourceSync Tutorial

Change List Index

Page 88: ResourceSync Tutorial


http://www.openarchives.org/rs/resourcesync#PackChanges

Packaging Changes: Change Dump

Page 89: ResourceSync Tutorial

Capability 4: Change Dump

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changedump" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/change_dump_part1.zip</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md length="887" type="application/zip"/>
  </url>
  <url>
    <loc>http://example.com/change_dump_part2.zip</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md length="9767" type="application/zip"/>
  </url>
</urlset>

Page 90: ResourceSync Tutorial

Change Dump Manifest

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changedump-manifest" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated" length="2887" type="text/html" path="/changes/res1"/>
  </url>
  <url> … </url>
</urlset>

Page 91: ResourceSync Tutorial

Change Dump

• A Change Dump points at packages (ZIP files) that contain bitstreams of the Source’s resources that changed
• Changes that occurred during a temporal interval with start- and end-date
• Change Dump is mandatory, even if there is only one ZIP file
• ZIP package contains manifest, listing contained bitstreams
• Typical Destination use: Incremental Synchronization, bulk download of changes

• Changes in Change Dump Manifest listed in chronological order
• Same URI can be listed multiple times
• Might be expensive to generate
• Destinations use @from and @until to determine freshness
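Because the manifest lists changes chronologically and the same URI can appear more than once, a Destination can replay the entries in order and let later entries win. A minimal sketch, assuming the manifest has already been parsed into (URI, change, path) tuples; the tuple layout and function name are illustrative only.

```python
def replay_changes(local: dict, entries: list) -> dict:
    """Apply manifest entries in listed (chronological) order.

    local maps a resource URI to the ZIP path of its latest bitstream.
    """
    for uri, change, path in entries:
        if change == "deleted":
            local.pop(uri, None)
        else:  # "created" or "updated"
            local[uri] = path
    return local


entries = [
    ("http://example.com/res1", "updated", "/changes/res1"),
    ("http://example.com/res2", "created", "/changes/res2"),
    ("http://example.com/res1", "updated", "/changes/res1-v2"),
    ("http://example.com/res2", "deleted", None),
]
print(replay_changes({}, entries))
```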

Page 92: ResourceSync Tutorial

ResourceSync - Agenda

http://www.openarchives.org/rs/resourcesync#Discovery

4. Framework (Technical) Details

1. Sitemaps

2. Core synchronization capabilities (PULL)

3. Discovery

4. Linking to related resources

5. Notification Capabilities (PUSH)

6. Archival capabilities (ARCHIVES)

Page 93: ResourceSync Tutorial

Discovery of Capabilities

Requirements:
• Need to discover capabilities, i.e. Resource List, Resource Dump, Change List, Change Dump, Archives, Notification channels
• Need to know the type of capability each document represents.

Approach:
• The Source publishes a Capability List that enumerates the capabilities it supports.
• It does so by pointing at the Resource List, Change List, Resource Dump, etc. using appropriate relation types, e.g. “resourcelist”, “changelist”, “resourcedump” etc.

http://www.openarchives.org/rs/resourcesync#CapabilityList

Page 94: ResourceSync Tutorial


Discovery of Capabilities

Page 95: ResourceSync Tutorial

Capability List

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="capabilitylist"/>
  <url>
    <loc>http://example.com/dataset1/resourcelist.xml</loc>
    <rs:md capability="resourcelist"/>
  </url>
  <url>
    <loc>http://example.com/dataset1/changelist.xml</loc>
    <rs:md capability="changelist"/>
  </url>
  <url>
    <loc>http://example.com/dataset1/resourcedump.xml</loc>
    <rs:md capability="resourcedump"/>
  </url>
</urlset>
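A Destination reading a Capability List mostly needs a lookup from capability name to document URI. The sketch below shows one way to build that lookup with the Python standard library; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
RS = "{http://www.openarchives.org/rs/terms/}"


def capabilities(capability_list_xml: str) -> dict:
    """Map each advertised capability (e.g. "changelist") to its document URI."""
    found = {}
    for url in ET.fromstring(capability_list_xml).iter(SM + "url"):
        md = url.find(RS + "md")  # direct child only, so the root rs:md is ignored
        if md is not None and md.get("capability"):
            found[md.get("capability")] = url.findtext(SM + "loc")
    return found


CAPABILITY_LIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="capabilitylist"/>
  <url><loc>http://example.com/dataset1/resourcelist.xml</loc>
    <rs:md capability="resourcelist"/></url>
  <url><loc>http://example.com/dataset1/changelist.xml</loc>
    <rs:md capability="changelist"/></url>
</urlset>"""

print(capabilities(CAPABILITY_LIST))
```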

Page 96: ResourceSync Tutorial

Discovery of Capability Lists

Requirements:
• Need to discover a Capability List

Approaches:
• Introduce a link in the HTTP Link header of a resource that is subject to synchronization, pointing at the Capability List with the relation type “resourcesync”
• Introduce a link from an HTML document that is subject to synchronization (<head> section), pointing at the Capability List with the relation type “resourcesync”
• Link from a Resource List, etc. to the Capability List with the relation type “up”

Link header on example.com/res1.pdf:

Link: <http://example.com/dataset1/capabilitylist.xml>; rel="resourcesync"
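The Link header approach can be sketched as follows. This is a deliberately simple parser, assuming link targets and parameters contain no commas; a production client would more likely rely on an HTTP library's Link header parsing. The function name is hypothetical.

```python
import re


def capability_list_from_link(header_value: str):
    """Return the target of the first link whose rel includes "resourcesync"."""
    # Each match is (<target>, parameters up to the next comma).
    for target, params in re.findall(r"<([^>]*)>([^,]*)", header_value):
        rel = re.search(r'rel\s*=\s*"?([^";]+)"?', params)
        if rel and "resourcesync" in rel.group(1).split():
            return target
    return None


header = ('<http://example.com/res1.html>; rel="alternate", '
          '<http://example.com/dataset1/capabilitylist.xml>; rel="resourcesync"')
print(capability_list_from_link(header))
```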

Page 97: ResourceSync Tutorial


Discovery of Capabilities

Page 98: ResourceSync Tutorial

Discovery: Source Description

Requirements:
• Support for multiple Capability Lists, one per “set of resources”
• Need to discover these Capability Lists
• Need descriptive information about each set of resources that a Capability List pertains to
• Useful to have descriptive information about the Source itself

Approach:
• The Source Description document meets these requirements.
• It should be at a particular location to avoid having registries: http://(hostname)/.well-known/resourcesync
• It can be linked to from the Capability Lists as well.

http://www.openarchives.org/rs/resourcesync#SourceDesc
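Since the Source Description sits at a fixed path, a Destination can derive its URI from the URI of any resource on the same host. A minimal sketch using only the standard library; the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def source_description_uri(resource_uri: str) -> str:
    """Build the well-known Source Description URI for the host of resource_uri."""
    parts = urlsplit(resource_uri)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/.well-known/resourcesync", "", ""))


print(source_description_uri("http://example.com/dataset1/res1.pdf?download=1"))
```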

Page 99: ResourceSync Tutorial


Discovery of Capabilities

Page 100: ResourceSync Tutorial


Discovery of Capabilities

Page 101: ResourceSync Tutorial

Source Description

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="description"/>
  <rs:ln rel="describedby" href="http://example.com/info_about_source.xml"/>
  <url>
    <loc>http://example.com/dataset1/capabilitylist.xml</loc>
    <rs:md capability="capabilitylist"/>
    <rs:ln rel="describedby" href="http://example.com/dataset1/info_about_dataset1.xml"/>
  </url>
  <url>
    <loc>http://example.com/dataset2/capabilitylist.xml</loc>
    <rs:md capability="capabilitylist"/>
    <rs:ln rel="describedby" href="http://example.com/dataset2/info_about_dataset2.xml"/>
  </url>
</urlset>

Page 102: ResourceSync Tutorial

Discovery via robots.txt

• Resource Lists are (enhanced) Sitemaps
• Sitemaps can be discovered via robots.txt
• Ergo, Resource Lists should be discoverable via robots.txt

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Sitemap: http://example.com/dataset1/resourcelist.xml
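Extracting candidate Resource List URIs from robots.txt only requires reading the Sitemap lines. A small sketch, assuming the robots.txt body has already been fetched as text; the function name is hypothetical.

```python
def sitemaps_from_robots(robots_txt: str) -> list:
    """Return the URIs given on Sitemap: lines, in file order."""
    uris = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URI's own colons are preserved.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            uris.append(value.strip())
    return uris


ROBOTS = """User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Sitemap: http://example.com/dataset1/resourcelist.xml
"""
print(sitemaps_from_robots(ROBOTS))
```

Whether a listed Sitemap is actually a Resource List would still need to be confirmed from its rs:md capability attribute.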

Page 103: ResourceSync Tutorial

Discovery of Capabilities

http://www.openarchives.org/rs/resourcesync#Discovery

Page 104: ResourceSync Tutorial

Framework Navigation

http://www.openarchives.org/rs/resourcesync#Navigation

Page 105: ResourceSync Tutorial

e.g., Capability List

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="capabilitylist"/>
  <rs:ln rel="up" href="http://example.com/.well-known/resourcesync"/>
  <url>
    <loc>http://example.com/dataset1/resourcelist.xml</loc>
    <rs:md capability="resourcelist"/>
  </url>
  <url>
    <loc>http://example.com/dataset1/changelist.xml</loc>
    <rs:md capability="changelist"/>
  </url>
  <url>
    <loc>http://example.com/dataset1/resourcedump.xml</loc>
    <rs:md capability="resourcedump"/>
  </url>
</urlset>

Page 106: ResourceSync Tutorial

Framework Structure

http://www.openarchives.org/rs/resourcesync#Structure

Page 107: ResourceSync Tutorial


Framework Structure

Page 108: ResourceSync Tutorial


ResourceSync - Agenda

4. Framework (Technical) Details

1. Sitemaps

2. Core capabilities (pull)

3. Discovery

4. Linking to related resources

5. Archives

6. Notifications (push)


http://www.openarchives.org/rs/resourcesync#LinkRelRes

Page 109: ResourceSync Tutorial


Supported Linking Use Cases

Provide links to related resources to address specific resource synchronization needs.

1. Mirrored content with multiple download locations

2. Alternate representations of the same content

3. Patching content rather than replacing it

4. Resources and metadata about resources

5. Prior versions of resources

6. Collection membership of resources

7. Republishing synchronized resources

All cases are handled with a <rs:ln> element referring to the linked resource


Page 110: ResourceSync Tutorial

Notes about Linked Resources

Some important things to keep in mind about linked resources:

• They may also be subject to synchronization
• They may be updated on a very different schedule than the resources that link to them
• Therefore, it is recommended to convey metadata about the linked resource too
• Links can be bi-directional – the linked resource can link back to the linking resource

Page 111: ResourceSync Tutorial

Linking #1 - Mirror

1. Content with multiple download locations

This may be of interest for:
• Content distribution networks
• Mirror sites
• Backup locations
• Load balancing

http://www.openarchives.org/rs/0.9.1/resourcesync#MirCon

Page 112: ResourceSync Tutorial

Linking #1 - Mirror

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"/>
    <rs:ln rel="duplicate" pri="1" href="http://mirror1.example.com/res1"/>
    <rs:ln rel="duplicate" pri="2" href="http://mirror2.example.com/res1"/>
  </url>
</urlset>
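A Destination could turn the duplicate links into an ordered list of download locations. The sketch below rests on two assumptions that the slide does not state: that a lower pri value means a higher priority, and that the original loc is tried last as a fallback. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
RS = "{http://www.openarchives.org/rs/terms/}"


def download_candidates(url_element) -> list:
    """Return mirror URIs ordered by pri, followed by the resource's own loc."""
    mirrors = [ln for ln in url_element.findall(RS + "ln")
               if ln.get("rel") == "duplicate"]
    # Assumption: smaller pri = preferred; links without pri sort last.
    mirrors.sort(key=lambda ln: int(ln.get("pri", "999999")))
    return [ln.get("href") for ln in mirrors] + [url_element.findtext(SM + "loc")]


ENTRY = ET.fromstring("""<url xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
     xmlns:rs="http://www.openarchives.org/rs/terms/">
  <loc>http://example.com/res1</loc>
  <rs:ln rel="duplicate" pri="2" href="http://mirror2.example.com/res1"/>
  <rs:ln rel="duplicate" pri="1" href="http://mirror1.example.com/res1"/>
</url>""")
print(download_candidates(ENTRY))
```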

Page 113: ResourceSync Tutorial

Linking #2 – Alternate Representations

2. Alternate representations of the same content

This may be of interest for:
• Resources subject to HTTP content negotiation
• Format migration for preservation reasons
• Different clients wanting different formats
• Multiple languages of the content

http://www.openarchives.org/rs/resourcesync#AltRep

Page 114: ResourceSync Tutorial

Linking #2 – Alternate Representations

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"/>
    <rs:ln rel="alternate" type="text/html" href="http://example.com/res1.html"/>
    <rs:ln rel="alternate" type="application/pdf" href="http://example.com/res1.pdf"/>
  </url>
</urlset>

Page 115: ResourceSync Tutorial

Linking #2 – Alternate Representations

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1.html</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"/>
    <rs:ln rel="canonical" href="http://example.com/res1"/>
  </url>
</urlset>

Page 116: ResourceSync Tutorial

Linking #3 – Patching Content

3. Patching content rather than replacing it

This may be of interest when:
• Resources are very large and server wishes to conserve bandwidth where possible
• Changes are frequent and small
• Changes are managed in a CMS that tracks differences

Need:
• Machine processable format to describe a change in a manner that allows patching a representation
• Existing or newly defined by communities

http://www.openarchives.org/rs/resourcesync#PatchCon

Page 117: ResourceSync Tutorial

Linking #3 – Patching Content

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1.json</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated" length="398723"/>
    <rs:ln rel="http://www.openarchives.org/rs/terms/patch"
           type="application/json-patch"
           modified="2013-01-02T17:00:00Z"
           length="58"
           href="http://example.com/res1-patch.json"/>
  </url>
</urlset>

Page 118: ResourceSync Tutorial

Linking #4 – Metadata about Resources

4. Resources and metadata about resources

This may be of interest when:
• Resources have associated descriptive metadata records, which are useful for understanding the resource
• Such as cultural heritage images, audio, video
• Resources that have associated technical, administrative, rights metadata

http://www.openarchives.org/rs/resourcesync#ResMDLinking

Page 119: ResourceSync Tutorial

Linking #4 – Metadata about Resources

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"/>
    <rs:ln rel="describedby" type="application/xml" href="http://example.com/metadata/res1.xml"/>
  </url>
</urlset>

Page 120: ResourceSync Tutorial

Linking #4 – Metadata about Resources

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/metadata/res1.xml</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"/>
    <rs:ln rel="describes" type="text/html" href="http://example.com/res1"/>
  </url>
</urlset>

Page 121: ResourceSync Tutorial

Linking #5 – Prior Versions of Resources

This may be of interest when:
• A Destination needs to have a copy of all versions of a resource

http://www.openarchives.org/rs/resourcesync#ResVers

Page 122: ResourceSync Tutorial


Memento Intermezzo

http://www.mementoweb.org/

Page 123: ResourceSync Tutorial

URI for Original, URI for Version

URI-M - http://web.archive.org/web/20010911203610/http://www.cnn.com/

Web Archive

URI-R - http://www.cnn.com/

Page 124: ResourceSync Tutorial

URI for Original, URI for Version

URI-M - http://en.wikipedia.org/w/index.php?title=September_11_attacks&oldid=282333

CMS

URI-R - http://en.wikipedia.org/wiki/September_11_attacks

Page 125: ResourceSync Tutorial
Page 126: ResourceSync Tutorial
Page 127: ResourceSync Tutorial
Page 128: ResourceSync Tutorial
Page 129: ResourceSync Tutorial
Page 130: ResourceSync Tutorial
Page 131: ResourceSync Tutorial


Memento Time Travel extension for Chrome

Download extension at http://bit.ly/memento-for-chrome

Page 132: ResourceSync Tutorial

Linking #5 – Prior Versions of Resources

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"/>
    <rs:ln rel="memento" href="http://example.com/past/20130102130000/res1"/>
    <rs:ln rel="timegate" href="http://example.com/timegate/res1"/>
    <rs:ln rel="timemap" href="http://example.com/timemap/res1" type="application/link-format"/>
  </url>
</urlset>

Page 133: ResourceSync Tutorial

Linking #6 – Collection Membership

6. Collection membership of resources

This may be of interest when:
• Resources are part of OAI-ORE aggregations
• Resources are part of OAI-PMH sets
• To indicate any other type of collections of resources

Collections are named with URIs and can then be linked to with rel=“collection”

• Nice if the collection URI resolves to a useful description

http://www.openarchives.org/rs/resourcesync#ColMem

Page 134: ResourceSync Tutorial

Linking #6 – Collection Membership

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-02T09:00:00Z" until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"/>
    <rs:ln rel="collection" href="http://example.com/aggregation/allres"/>
  </url>
</urlset>

Page 135: ResourceSync Tutorial

Linking #7 – Republishing Resources

7. Republishing synchronized resources

This may be of interest when:
• Aggregator systems harvest resources from Sources and then republish them at new URIs

Examples include Blog republishing, content distribution networks, mirrored or combined collections

Hypothetical scenario: Lots of little museums with small collections, and a large European/American aggregating digital library system that wants to provide fast, combined access to the content (with permission)

http://www.openarchives.org/rs/resourcesync#RePub

Page 136: ResourceSync Tutorial

Linking #7 – Republishing Resources #1

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-03T00:00:00Z"/>
  <url>
    <loc>http://original.example.com/res1</loc>
    <lastmod>2013-01-03T07:00:00Z</lastmod>
    <rs:md change="updated"/>
  </url>
</urlset>

• Original Source publishes information about a changed resource via a Change List

Page 137: ResourceSync Tutorial

Linking #7 – Republishing Resources #2

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-03T11:00:00Z"/>
  <url>
    <loc>http://aggregator1.example.com/res1</loc>
    <lastmod>2013-01-03T20:00:00Z</lastmod>
    <rs:md change="updated"/>
    <rs:ln rel="via" modified="2013-01-03T07:00:00Z" href="http://original.example.com/res1"/>
  </url>
</urlset>

• Aggregator 1 republishes information about the changed resource with reference to the original Source

Page 138: ResourceSync Tutorial

Linking #7 – Republishing Resources #3

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-03T12:00:00Z"/>
  <url>
    <loc>http://aggregator2.example.com/res1</loc>
    <lastmod>2013-01-04T09:00:00Z</lastmod>
    <rs:md change="updated"/>
    <rs:ln rel="via" modified="2013-01-03T07:00:00Z" href="http://original.example.com/res1"/>
  </url>
</urlset>

• Aggregator 2 does the same
• Take care when republishing links: make sure they are still appropriate from an aggregator’s perspective

Page 139: ResourceSync Tutorial

ResourceSync - Agenda

4. Framework (Technical) Details

1. Sitemaps

2. Core synchronization capabilities (PULL)

3. Discovery

4. Linking to related resources

5. Notification Capabilities (PUSH)

6. Archival capabilities (ARCHIVES)

http://www.openarchives.org/rs/notification

Page 140: ResourceSync Tutorial

Motivation for Notifications

• Reduce synchronization latency by having the Source push out resource change information
• To avoid continuous pull of Change Lists by Destinations

• Share information about changes to the Source’s ResourceSync implementation, e.g. announcement of new Resource List, new Capability List, etc.
• To avoid continuous polling of e.g. Resource Lists, ResourceSync Description

Page 141: ResourceSync Tutorial

Source: Notification Capabilities

PUSH

1. Change Notification
• Notifies about changes to particular resources
• e.g., resource A has been updated | created | deleted

2. Framework Notification
• Notifies about changes to capabilities, i.e. their documents
• e.g., a Change List has been updated | created | deleted
• Also for Capability Lists and Source Description

Page 142: ResourceSync Tutorial

Notification Channels

• Notifications are sent via channels
• Change Notification: one channel per set of resources
• Framework Notification: one channel per set of resources

• Sent on level of capability document, not on index-level
• Notifications about changes to Source Description sent on all Framework Notification channels

• Payload for notifications: <urlset> documents

• Transport protocol for notifications:
• PubSubHubbub - https://pubsubhubbub.googlecode.com/git/pubsubhubbub-core-0.4.html - current choice
• WebSockets - http://tools.ietf.org/html/rfc6455 - may be added later

Page 143: ResourceSync Tutorial


Page 144: ResourceSync Tutorial

Framework Notification Structure

Page 145: ResourceSync Tutorial

Framework Notification Structure

Page 146: ResourceSync Tutorial

Change Notification Payload

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T09:07:00Z</lastmod>
    <rs:md change="updated" hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6" length="8876" type="text/html"/>
  </url>
  <url> … </url>
</urlset>
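On receiving a Change Notification, a Destination needs to decide which resources to fetch again and which to remove. A minimal sketch, assuming the payload arrives as an XML string over whatever transport is in use; the function name and the sample payload are illustrative.

```python
import xml.etree.ElementTree as ET

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
RS = "{http://www.openarchives.org/rs/terms/}"


def plan_from_notification(payload_xml: str):
    """Split a notification payload into URIs to (re)fetch and URIs to delete."""
    to_fetch, to_delete = [], []
    for url in ET.fromstring(payload_xml).iter(SM + "url"):
        loc = url.findtext(SM + "loc")
        md = url.find(RS + "md")
        change = md.get("change") if md is not None else None
        if change == "deleted":
            to_delete.append(loc)
        elif change in ("created", "updated"):
            to_fetch.append(loc)
    return to_fetch, to_delete


PAYLOAD = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <url><loc>http://example.com/res1</loc><rs:md change="updated"/></url>
  <url><loc>http://example.com/res2</loc><rs:md change="deleted"/></url>
</urlset>"""
print(plan_from_notification(PAYLOAD))
```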

Page 147: ResourceSync Tutorial

Framework Notification Payload

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <url>
    <loc>http://example.com/resourceset1/resourcelist.xml</loc>
    <rs:md change="created" capability="resourcelist"/>
  </url>
  <url>
    <loc>http://example.com/resourceset1/resourcedump.xml</loc>
    <rs:md change="created" capability="resourcedump"/>
  </url>
</urlset>

Page 148: ResourceSync Tutorial

Framework Notification Payload (w/ index)

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <url>
    <loc>http://example.com/resourceset1/resourcelist.xml</loc>
    <rs:md change="created" capability="resourcelist"/>
    <rs:ln rel="index" href="http://example.com/dataset1/resourcelist-index.xml"/>
  </url>
</urlset>

Page 149: ResourceSync Tutorial

Framework Notification Discovery

Page 150: ResourceSync Tutorial

ResourceSync - Agenda

4. Framework (Technical) Details

1. Sitemaps

2. Core synchronization capabilities (PULL)

3. Discovery

4. Linking to related resources

5. Notification Capabilities (PUSH)

6. Archival capabilities (ARCHIVES)

http://www.openarchives.org/rs/archives

Page 151: ResourceSync Tutorial

Source: Archival Capabilities

The Source may hold on to historical data, for example, to allow Destinations to catch up with events they missed or revisit prior resource states. To this end, the Source can publish archives, i.e. documents that enumerate historical capability documents:

1. Resource List Archive

2. Resource Dump Archive

3. Change List Archive

4. Change Dump Archive


ARCHIVES

Page 152: ResourceSync Tutorial


http://www.openarchives.org/rs/archives#ResourceListArch

Resource List Archive

Page 153: ResourceSync Tutorial

Resource List Archive

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist-archive" at="2013-01-09T13:00:00Z"/>
  <url>
    <loc>http://example.com/resourcelist1.xml</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/resourcelist2.xml</loc>
    <lastmod>2013-01-09T13:00:00Z</lastmod>
  </url>
  <url> … </url>
</urlset>

Page 154: ResourceSync Tutorial

Resource Dump Archive

http://www.openarchives.org/rs/archives#ResourceDumpArch

Page 155: ResourceSync Tutorial

Resource Dump Archive

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcedump-archive" at="2013-02-10T03:00:00Z"/>
  <url>
    <loc>http://example.com/resourcedump1.xml</loc>
    <lastmod>2013-01-10T03:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/resourcedump2.xml</loc>
    <lastmod>2013-02-10T03:00:00Z</lastmod>
  </url>
  <url> … </url>
</urlset>

Page 156: ResourceSync Tutorial


http://www.openarchives.org/rs/archives#ChangeListArch

Change List Archive

Page 157: ResourceSync Tutorial

Change List Archive

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist-archive" from="2013-02-01T23:00:00Z" until="2013-02-03T23:00:00Z"/>
  <url>
    <loc>http://example.com/changelist1.xml</loc>
    <lastmod>2013-02-01T23:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/changelist2.xml</loc>
    <lastmod>2013-02-02T23:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/changelist3.xml</loc>
    <lastmod>2013-02-03T23:00:00Z</lastmod>
  </url>
</urlset>

Page 158: ResourceSync Tutorial

Change Dump Archive

http://www.openarchives.org/rs/archives#ChangeDumpArch

Page 159: ResourceSync Tutorial

Change Dump Archive

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changedump-archive" from="2013-02-10T03:00:00Z" until="2013-02-17T03:00:00Z"/>
  <url>
    <loc>http://example.com/changedump1.xml</loc>
    <lastmod>2013-02-10T03:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/changedump2.xml</loc>
    <lastmod>2013-02-17T03:00:00Z</lastmod>
  </url>
  <url> … </url>
</urlset>

Page 160: ResourceSync Tutorial

Capability List for Archives

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="capabilitylist"/>
  <url>
    <loc>http://example.com/dataset1/resourcelist.xml</loc>
    <rs:md capability="resourcelist"/>
  </url>
  …
  <url>
    <loc>http://example.com/dataset1/resourcelist-archive.xml</loc>
    <rs:md capability="resourcelist-archive"/>
  </url>
  <url>
    <loc>http://example.com/dataset1/changelist-archive.xml</loc>
    <rs:md capability="changelist-archive"/>
  </url>
</urlset>

Page 161: ResourceSync Tutorial

ResourceSync Framework with Archives

Page 162: ResourceSync Tutorial


ResourceSync - Agenda

1. ResourceSync: Problem Perspective & Conceptual Approach

2. Motivation & Use Cases

3. Framework Walkthrough

4. Framework (Technical) Details

5. Implementation

6. Q&A


Page 163: ResourceSync Tutorial

Implementation #1: The Metadata Harvesting Use Case

Page 164: ResourceSync Tutorial


The Metadata Harvesting Use Case

1. Identification of metadata records within a service

2. Use of standards in metadata formats

3. Incremental updates

4. Create, Update, Delete

5. Sets


Page 165: ResourceSync Tutorial

The Metadata Harvesting Use Case

1. Identification of metadata records within a service

ResourceSync does not specifically care about metadata records, only resources. It is up to the server to identify which of those resources are metadata.

2. Use of standards in metadata formats

We are free to annotate a resource's entry with appropriate metadata to indicate the format.

Page 166: ResourceSync Tutorial


The Metadata Harvesting Use Case

3. Incremental updates

4. Create, Update, Delete

5. Sets


All resources that can be obtained from a change list will be annotated with the kind of change that happened to them.

ResourceSync allows the server to publish lists of resources and changes and indexes of those lists all annotated with metadata.

ResourceSync publishes changes as static documents. The client is then free to walk up and down the change lists provided by the server.

Page 167: ResourceSync Tutorial

(Required) Documents for metadata harvesting use case

Page 168: ResourceSync Tutorial

Describing Metadata Resources

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" from="2013-05-05T13:00:00Z"/>
  <url>
    <loc>http://mydspace.edu/dspace-rs/resource/123456789/7/qdc</loc>
    <lastmod>2013-05-01T19:09:35Z</lastmod>
    <changefreq>never</changefreq>
    <rs:md type="application/xml"/>
    <rs:ln href="http://mydspace.edu/bitstream/123456789/7/1/bitstream.pdf" rel="describes"/>
    <rs:ln href="http://mydspace.edu/bitstream/123456789/7/2/image.jpg" rel="describes"/>
    <rs:ln href="http://mydspace.edu/123456789/3" rel="collection"/>
  </url>
</urlset>

Page 169: ResourceSync Tutorial

ResourceSync TutorialDANS, January 21 2014, Den Haag, Netherlands

Describing Bitstream Resources


<urlset …
  <url>
    <loc>http://mydspace.edu/bitstream/123456789/7/1/bitstream.pdf</loc>
    <lastmod>2013-05-01T19:09:35Z</lastmod>
    <changefreq>never</changefreq>
    <rs:md hash="md5:75d0ea94097a05fce9aca5b079e2f209"
           length="419805"
           type="application/pdf"/>
    <rs:ln href="http://mydspace.edu/dspace-rs/resource/123456789/7/qdc" rel="describedby"/>
    <rs:ln href="http://mydspace.edu/dspace-rs/resource/123456789/7/mets" rel="describedby"/>
    <rs:ln href="http://mydspace.edu/dspace-rs/resource/123456789/12/qdc" rel="describedby"/>
    <rs:ln href="http://mydspace.edu/123456789/2" rel="collection"/>
  </url>
</urlset>
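The hash and length attributes let a destination check that what it downloaded is what the source described. A sketch of that check in Python 3 follows; the function name matches_metadata and the sample payload are hypothetical, and the sketch deliberately handles only the md5 form shown in the entry above.

```python
import hashlib


def matches_metadata(content, declared_hash, declared_length):
    """Return True if content agrees with the rs:md hash and length attributes."""
    # The length check is cheap, so do it before hashing.
    if len(content) != int(declared_length):
        return False
    algorithm, _, expected = declared_hash.partition(":")
    if algorithm != "md5":
        raise ValueError("this sketch only handles md5 digests")
    return hashlib.md5(content).hexdigest() == expected.lower()


# Hypothetical payload standing in for a downloaded bitstream.
payload = b"example bitstream bytes"
declared = "md5:" + hashlib.md5(payload).hexdigest()

ok = matches_metadata(payload, declared, len(payload))
tampered = matches_metadata(payload + b"!", declared, len(payload))
```

A real client would pass the bytes fetched from the <loc> URI together with the attribute values from the entry's <rs:md> element.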


Serving Metadata Resources


http://mydspace.edu/dspace-rs/resource/123456789/7/qdc

URL parts:
• dspace-rs: ResourceSync webapp
• 123456789/7: Item handle
• qdc: Metadata format

metadata.formats = \
    qdc = http://purl.org/dc/terms/, \
    mets = http://www.loc.gov/METS/

metadata.types = \
    qdc = application/xml, \
    mets = application/xml

<loc>http://mydspace.edu/dspace-rs/resource/123456789/7/qdc</loc>
<rs:md type="application/xml"/>
<rs:ln href="http://purl.org/dc/terms/" rel="describedby"/>

<loc>http://mydspace.edu/dspace-rs/resource/123456789/7/mets</loc>
<rs:md type="application/xml"/>
<rs:ln href="http://www.loc.gov/METS/" rel="describedby"/>
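The relationship between the two configuration settings and the generated entries can be sketched as follows. This is an illustration in Python 3, not the DSpace module's actual (Java) code: the dictionaries simply mirror the metadata.formats and metadata.types values above, and metadata_entry and BASE_URL are names invented for the example.

```python
from xml.sax.saxutils import escape, quoteattr

# Mirrors the metadata.formats and metadata.types settings shown above.
METADATA_FORMATS = {
    "qdc": "http://purl.org/dc/terms/",
    "mets": "http://www.loc.gov/METS/",
}
METADATA_TYPES = {
    "qdc": "application/xml",
    "mets": "application/xml",
}

# Assumed location of the ResourceSync webapp, taken from the example URL.
BASE_URL = "http://mydspace.edu/dspace-rs/resource"


def metadata_entry(handle, prefix):
    """Build the <loc>, <rs:md> and <rs:ln> lines for one metadata resource."""
    loc = f"{BASE_URL}/{handle}/{prefix}"
    return "\n".join([
        f"<loc>{escape(loc)}</loc>",
        f"<rs:md type={quoteattr(METADATA_TYPES[prefix])}/>",
        f'<rs:ln href={quoteattr(METADATA_FORMATS[prefix])} rel="describedby"/>',
    ])


mets_entry = metadata_entry("123456789/7", "mets")
```

The format prefix (qdc, mets) therefore does double duty: it is the last path segment of the resource URL and the key that selects the media type and the describedby link.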


Generating Documents

1. Initialise

Creates initial Capability List and Resource List documents

[dspace]/bin/dspace dsrun org.dspace.resourcesync.ResourceSyncGenerator -i

2. Update

Creates a new Change List which covers the period since the last Change List was created

[dspace]/bin/dspace dsrun org.dspace.resourcesync.ResourceSyncGenerator -u

3. Rebase

A combination of both Initialise and Update.

[dspace]/bin/dspace dsrun org.dspace.resourcesync.ResourceSyncGenerator -r
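For regular generation, the three commands above could be driven from a small wrapper that a scheduler calls. The sketch below, in Python 3, only assembles and runs the command lines shown on this slide; the function names, the mode names used as dictionary keys, and the /dspace installation directory are assumptions for the illustration.

```python
import subprocess

GENERATOR_CLASS = "org.dspace.resourcesync.ResourceSyncGenerator"

# Flags exactly as listed on the slide: initialise, update, rebase.
MODE_FLAGS = {"initialise": "-i", "update": "-u", "rebase": "-r"}


def generator_command(dspace_dir, mode):
    """Build the argument list for one ResourceSyncGenerator run."""
    return [f"{dspace_dir}/bin/dspace", "dsrun", GENERATOR_CLASS, MODE_FLAGS[mode]]


def run_generator(dspace_dir, mode):
    """Run the generator and raise CalledProcessError if it exits non-zero."""
    subprocess.run(generator_command(dspace_dir, mode), check=True)


# "/dspace" is a hypothetical installation directory standing in for [dspace].
command = generator_command("/dspace", "update")
```

A typical arrangement might call the update mode frequently and the rebase mode occasionally, but the right intervals depend on how often the repository changes.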


Usage of Resources by clients


Impact on DSpace


URLs
• Stable identifiers for archived items
• Stable identifiers for unarchived items
• Stable identifiers for metadata resources (in their various formats)
• Stable identifiers for previous versions

Provenance
• History of changes to an item/bitstream
• Item/bitstream deletions (vs withdraw)
• Bitstream create/update dates
• Item create/update dates


Versioning
• Access to previous versions of both metadata and bitstreams
• Stable identifiers for previous versions of both metadata and bitstreams

Metadata Resources
• Metadata in a variety of formats
• Metadata as file/bitstream


Admin Files
• ResourceSync documents (Resource Lists, Change Lists, etc.)
• ResourceSync exports: Resource Dumps, Change Dumps
• Metadata exports in a number of formats

Scheduled Tasks
• Regular generation of RS documents

Complex Objects
• Item/bitstream relationships
• Collections of content


Get the software!

DSpace module: https://github.com/CottageLabs/DSpaceResourceSync
depends on the common Java library: https://github.com/CottageLabs/ResourceSyncJava

PHP client: https://github.com/stuartlewis/resync-php
depends on the SWORDv2 client library: https://github.com/swordapp/swordappv2-php-library/


Implementation #2: ResourceSync at arXiv.org


ResourceSync @ arXiv

• Use ResourceSync for both mirroring and public data access
  o efficient updates
  o ability to do periodic audits
  o public synchronization capability
  o reduce admin burden

• Likely start with metadata + source for mirroring use case (doing experiments now)

• Open access use case also requires processed PDF
• Some concerns about likely use/load…


Alternate download location

• Likely want to separate machine accesses from human accesses to preserve response time on main server

=> Use Mirrored Content part of spec
  o <loc> specifies canonical URI, e.g. http://arxiv.org/pdf/1306.1073v1.pdf
  o <rs:ln rel="duplicate"> specifies preferred download location, e.g. http://export.arxiv.org/pdf/1306.1073v1.pdf


Alternate download location

<url>
  <loc>http://arxiv.org/pdf/1306.1073v1.pdf</loc>
  <lastmod>2013-06-06T00:57:12Z</lastmod>
  <rs:md hash="md5:e08e0c4e4d7b0895120014f0aa09e7c4"
         length="287714"
         type="application/pdf"/>
  <rs:ln rel="duplicate"
         pri="1"
         href="http://export.arxiv.org/pdf/1306.1073v1.pdf"
         modified="2013-06-06T02:00:59Z"/>
</url>
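A client honouring the mirrored-content link might pick its download URI roughly as sketched here in Python 3. The function name download_location is hypothetical, the namespace declarations were added to the <url> element only so the fragment parses on its own, and treating a lower pri value as higher priority reflects our reading of the specification rather than anything stated on the slide.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

# The entry from the slide, with namespaces declared so it stands alone.
ENTRY = """<url xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
     xmlns:rs="http://www.openarchives.org/rs/terms/">
  <loc>http://arxiv.org/pdf/1306.1073v1.pdf</loc>
  <lastmod>2013-06-06T00:57:12Z</lastmod>
  <rs:md hash="md5:e08e0c4e4d7b0895120014f0aa09e7c4" length="287714" type="application/pdf"/>
  <rs:ln rel="duplicate" pri="1" href="http://export.arxiv.org/pdf/1306.1073v1.pdf"
         modified="2013-06-06T02:00:59Z"/>
</url>"""


def download_location(url_element):
    """Return (canonical URI, URI to download from) for one <url> entry."""
    canonical = url_element.findtext("sm:loc", namespaces=NS)
    duplicates = [
        ln for ln in url_element.findall("rs:ln", NS)
        if ln.get("rel") == "duplicate"
    ]
    if not duplicates:
        # No mirror advertised: download from the canonical URI itself.
        return canonical, canonical
    # Assumption: lower pri means more preferred; links without pri sort last.
    best = min(duplicates, key=lambda ln: int(ln.get("pri", "999999")))
    return canonical, best.get("href")


canonical, download = download_location(ET.fromstring(ENTRY))
```

The destination still records the resource under its canonical URI; only the bytes are fetched from the export host.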


Getting a copy of arXiv

It might be as easy as:


(Of course, you will probably have to wait a while, but because ResourceSync is stateless, an interrupted sync can be restarted efficiently.)


Python Library and Client

• Aim to provide library code implementing all ResourceSync facilities for use in both source and destination implementations
  o Designed for Python 2.6 (RHEL6) and 2.7
  o Will not work with Python <= 2.5

• Client (resync) supports many destination operations, inspired by the common Unix rsync program

• Client also supports some operations that might be useful in a source, such as generation of static Resource Lists, or periodic Change Lists (used in arXiv experiments)

• Explorer (resync-explorer) intended to allow easy inspection of a source’s resource sets and capabilities

• Developed since ResourceSync v0.5, updated for v0.9

http://github.com/resync/resync


ResourceSync Source Simulator

• Python code using Tornado server
• Provides random set of resources of different sizes updated at a particular rate
• Very useful for testing Destination code

http://github.com/resync/simulator


ResourceSync - Agenda

1. ResourceSync: Problem Perspective & Conceptual Approach

2. Motivation & Use Cases

3. Framework Walkthrough

4. Framework (Technical) Details

5. Implementation

6. Q&A


ResourceSync: A Web-Based Resource Synchronization Framework

ResourceSync is funded by The Sloan Foundation & JISC

#resourcesync
