Persistent Identifiers

Tobias Weigel, Deutsches Klimarechenzentrum (DKRZ)

Solving a number of problems through a simplistic mechanism

Transcript of Persistent Identifiers

Page 1: Persistent Identifiers

Tobias Weigel (DKRZ)

Tobias Weigel
Deutsches Klimarechenzentrum (DKRZ)

Persistent Identifiers

Solving a number of problems through a simplistic mechanism

Page 2: Persistent Identifiers

Agenda

What are Persistent Identifiers (PIDs)?
Extended PIDs
How to use them

Page 3: Persistent Identifiers

What are PIDs?

Page 4: Persistent Identifiers

Persistent identifiers come in various formats

10876/abc123
10.1594/WDCC/CMIP5.NCCNMpc
ark:/13030/tf5p30086k
http://purl.org/dc/elements/1.1/
urn:lsid:ubio.org:namebank:11815
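
The example identifiers on this slide differ mainly in syntax, so the scheme can often be guessed from the string alone. A minimal Python sketch, assuming a hypothetical helper `guess_scheme` with deliberately simple rules; it is an illustration, not a validator for any of these schemes:

```python
import re


def guess_scheme(identifier: str) -> str:
    """Guess the PID scheme from the identifier's syntax (illustrative rules only)."""
    if identifier.startswith("ark:/"):
        return "ARK"
    if identifier.startswith("urn:lsid:"):
        return "LSID"
    if identifier.startswith("urn:"):
        return "URN"
    if identifier.startswith("http://purl.org/"):
        return "PURL"
    # DOIs are Handles whose prefix starts with "10."
    if re.fullmatch(r"10\.\d+/\S+", identifier):
        return "DOI (Handle)"
    # Generic Handle: prefix of digits and dots, a slash, then a suffix
    if re.fullmatch(r"[\d.]+/\S+", identifier):
        return "Handle"
    return "unknown"


examples = [
    "10876/abc123",
    "10.1594/WDCC/CMIP5.NCCNMpc",
    "ark:/13030/tf5p30086k",
    "http://purl.org/dc/elements/1.1/",
    "urn:lsid:ubio.org:namebank:11815",
]
for pid in examples:
    print(pid, "->", guess_scheme(pid))
```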

Page 5: Persistent Identifiers

Why do we need PIDs?

Tracking ID
CIM ID
DRS Syntax
...

Page 6: Persistent Identifiers

PIDs point to resources

[Figure: the PID 10876/abc123 is passed to a resolution service, which returns the URL http://example.com/xyz567 of the resource.]
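
Resolution can be pictured as a plain lookup from PID to current location. A minimal sketch, assuming an in-memory table as a stand-in for the resolution service; `RESOLVER_TABLE` and `resolve` are hypothetical names, and a real service would be queried over the network:

```python
# Hypothetical in-memory stand-in for a resolution service.
RESOLVER_TABLE = {
    "10876/abc123": "http://example.com/xyz567",
}


def resolve(pid: str) -> str:
    """Return the URL currently registered for the given PID."""
    try:
        return RESOLVER_TABLE[pid]
    except KeyError:
        raise LookupError(f"unknown PID: {pid}") from None


print(resolve("10876/abc123"))
```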

Page 7: Persistent Identifiers

The resource is a black box

Data? Metadata? Software code? Document? ...

Page 8: Persistent Identifiers

PIDs are globally unique

[Figure: two resource icons, each labelled 10876/abc123.]

Page 9: Persistent Identifiers

URLs are not persistent over time ("link rot")

[Figure: timeline with Today, 2015 and 2020. The URL http://example.com appears at all three points; at the last one it returns "404 Not found".]

Page 10: Persistent Identifiers

PIDs are persistent over time

[Figure: timeline with Today, 2015 and 2020. The PID 10876/abc123 identifies the resource at all three points.]

Page 11: Persistent Identifiers

PIDs establish a redirection layer

[Figure: the PID 10876/abc123 (stable) in front of a series of changing URLs http://... (unstable).]

Page 12: Persistent Identifiers

Operations on a PID

Create PID
Update the URL the PID points to
(Delete PID)
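
The three operations can be sketched as a toy registry in which the PID stays fixed while the URL behind it is replaced. `PidRegistry` and the second URL are hypothetical and serve only as illustration; real PID systems add authentication and persistence guarantees that are left out here:

```python
class PidRegistry:
    """Toy PID registry: maps a stable PID to its current (unstable) URL."""

    def __init__(self):
        self._records = {}

    def create(self, pid, url):
        """Register a new PID; refuse to overwrite an existing one."""
        if pid in self._records:
            raise ValueError(f"PID already exists: {pid}")
        self._records[pid] = url

    def update(self, pid, new_url):
        """Point an existing PID at a new URL."""
        if pid not in self._records:
            raise KeyError(pid)
        self._records[pid] = new_url

    def delete(self, pid):
        """Remove a PID (shown in parentheses on the slide)."""
        del self._records[pid]

    def resolve(self, pid):
        """Return the URL the PID currently points to."""
        return self._records[pid]


registry = PidRegistry()
registry.create("10876/abc123", "http://example.com/xyz567")
# The resource moves; only the registry entry changes, not the PID.
registry.update("10876/abc123", "http://example.org/moved/xyz567")
print(registry.resolve("10876/abc123"))
```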

Page 13: Persistent Identifiers

There are many PID systems / infrastructures

Handle System
Archival Resource Key (ARK)
Life Science Identifier (LSID)
Persistent URL (PURL)
Uniform Resource Name (URN)
...

Page 14: Persistent Identifiers

How do PID infrastructures differ?

Identifier name schema

Resolver service

Persistency mechanism

Additional services

PID Infrastructure

Page 15: Persistent Identifiers

We need to go beyond the simple redirection view

Extended PIDs

Page 16: Persistent Identifiers

Some information must be stored persistently

[Figure: today, the PID 10876/A identifies the data 101010101010101 and stores Checksum: 7D01E436. In 2015, the data behind 10876/A reads 101010101110101; the stored Checksum: 7D01E436 is used to verify it.]
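
The verification step can be sketched as follows. The slide does not name the checksum algorithm; CRC32 is assumed here only because it also yields eight hexadecimal digits, and the helper names `crc32_hex` and `verify` are hypothetical. The two byte strings reuse the bit patterns shown on the slide:

```python
import zlib


def crc32_hex(data: bytes) -> str:
    """Return the CRC32 of the data as eight uppercase hex digits."""
    return f"{zlib.crc32(data):08X}"


def verify(data: bytes, stored_checksum: str) -> bool:
    """Compare the checksum of the current data with the one stored in the PID record."""
    return crc32_hex(data) == stored_checksum


original = b"101010101010101"
stored = crc32_hex(original)  # stored with the PID when it is created

later = b"101010101110101"  # one character differs from the original

print(verify(original, stored))  # unchanged data passes
print(verify(later, stored))     # altered data fails
```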

Page 17: Persistent Identifiers

Build more complex information structures

[1, 5, 13, 9, 12]
A: 1, B: 4, C: 3, D: 7

Page 18: Persistent Identifiers

Collections of PIDs are required for our use cases

[Figure: the collection 10876/collection1 contains 10876/A and several PIDs labelled 10876/B.]

Page 19: Persistent Identifiers

Graphs or trees of PIDs are required as well

[Figure: a graph of the PIDs 10876/A, 10876/B and 10876/C.]

Page 20: Persistent Identifiers

The graph nodes and edges may be typed

[Figure: the graph of 10876/A, 10876/B and 10876/C with edge types "has metadata" and "older version" and node types "Data object" and "Metadata object".]
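
A typed PID graph can be sketched with one record per PID that carries a node type and a list of typed outgoing links. Which PID has which type, and which way the edges point, is not recoverable from the slide; the assignment below is an assumption made for illustration, and `PidNode`, `add_link` and `targets` are hypothetical names:

```python
from dataclasses import dataclass, field


@dataclass
class PidNode:
    pid: str
    node_type: str
    # Outgoing typed edges as (relation, target PID) pairs.
    links: list = field(default_factory=list)


# Assumed assignment of node types (not taken from the slide).
graph = {
    "10876/A": PidNode("10876/A", "Data object"),
    "10876/B": PidNode("10876/B", "Data object"),
    "10876/C": PidNode("10876/C", "Metadata object"),
}


def add_link(graph, source, relation, target):
    """Add a typed edge; both ends must already be registered PIDs."""
    if source not in graph or target not in graph:
        raise KeyError("both PIDs must exist in the graph")
    graph[source].links.append((relation, target))


def targets(graph, source, relation):
    """Return the PIDs reachable from source via edges of the given type."""
    return [t for r, t in graph[source].links if r == relation]


# Assumed edge directions (not taken from the slide).
add_link(graph, "10876/A", "older version", "10876/B")
add_link(graph, "10876/A", "has metadata", "10876/C")

print(targets(graph, "10876/A", "has metadata"))
```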

Page 21: Persistent Identifiers

The graph structure must be stored persistently

[Figure: 10876/B and 10876/C, a data object and a metadata object linked by "has metadata", form one combined entity.]

Page 22: Persistent Identifiers

Collections can be realized through graphs

[Figure: 10876/collection1 linked to 10876/A and 10876/B.]
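
One way to realize a collection through a graph is to treat the collection PID as an ordinary node whose members hang off typed edges. A minimal sketch using (source, relation, target) triples; the relation name "has member" and both helper functions are assumptions, not names taken from the talk:

```python
# Edges stored as (source PID, relation, target PID) triples.
links = [
    ("10876/collection1", "has member", "10876/A"),
    ("10876/collection1", "has member", "10876/B"),
]


def members(links, collection):
    """Return the PIDs that the collection points to via "has member"."""
    return [t for s, r, t in links if s == collection and r == "has member"]


def collections_of(links, pid):
    """Return the collections that list the given PID as a member."""
    return [s for s, r, t in links if t == pid and r == "has member"]


print(members(links, "10876/collection1"))
print(collections_of(links, "10876/A"))
```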

Page 23: Persistent Identifiers

What must be stored persistently?

Minimal metadata (key-metadata)
• Checksum
• PID creation time stamp
• Graph structure (links)
• Collection membership

static

dynamic
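
The key-metadata listed on this slide could be held in a small record per PID. The sketch below assumes that checksum and creation time stamp are the static part and that links and collection membership are the dynamic part; the slide's layout suggests this split but the transcript does not confirm it. `KeyMetadata`, the creation date and the example link are hypothetical:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class KeyMetadata:
    # Assumed static: fixed when the PID is created.
    checksum: str
    created: datetime
    # Assumed dynamic: may change as the PID is linked or grouped.
    links: list = field(default_factory=list)        # (relation, target PID) pairs
    collections: list = field(default_factory=list)  # PIDs of containing collections


record = KeyMetadata(
    checksum="7D01E436",
    created=datetime(2012, 1, 1, tzinfo=timezone.utc),  # hypothetical date
)
record.links.append(("has metadata", "10876/C"))
record.collections.append("10876/collection1")
print(record)
```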

Page 24: Persistent Identifiers

Levels of preservation

101010101010101

Primary level of preservation

Secondary level of preservation

10876/abc123

Minimal metadata

Page 25: Persistent Identifiers

PIDs are a topic for international collaboration

Relation types must be standardized.

Research Data Alliance
• WG ‘PID Information Types’
• WG ‘Type Registry’
• collections?

[Figure: the PID graph (10876/A, 10876/B, 10876/C) and the collection 10876/collection1 from the earlier slides.]

Page 26: Persistent Identifiers

Usage scenario: Provenance as a DAG

cdot

Data object PID

Link "was derived from"
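
With "was derived from" links stored per PID, the provenance of a data object is the set of all PIDs reachable along those links. A minimal sketch over a small diamond-shaped DAG; the PIDs, the example data and the helper `ancestors` are hypothetical:

```python
# Each PID maps to the PIDs it "was derived from" (hypothetical example data).
derived_from = {
    "10876/D": ["10876/B", "10876/C"],
    "10876/B": ["10876/A"],
    "10876/C": ["10876/A"],
    "10876/A": [],
}


def ancestors(pid, derived_from):
    """Return every PID reachable from pid via "was derived from" links."""
    seen = set()
    stack = list(derived_from.get(pid, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(derived_from.get(current, []))
    return seen


print(sorted(ancestors("10876/D", derived_from)))
```

The `seen` set keeps shared ancestors such as 10876/A from being visited twice, which matters once the provenance is a DAG rather than a simple chain.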

Page 27: Persistent Identifiers

How do we actually use PIDs?

Software

Page 28: Persistent Identifiers

I am biased towards the Handle System

For technical reasons
• key-metadata is a unique feature that is required for PID graphs
For practical reasons, for example:
• ARKs and URNs lack wide adoption and support
• PURL maintenance is not clear
• LSIDs in older literature are not persistent
Handle System has an operational perspective

Page 29: Persistent Identifiers

What is the Handle System?

Developed by CNRI
• Corporation for National Research Initiatives
Registered trademark
Fee for registering new prefixes (e.g. 10876)
Customers e.g.
• US military
• International DOI Foundation

Page 30: Persistent Identifiers

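How does the Handle System work?

[Figure: a central resolution service looks up the prefix of the handle 10876/100 in a prefix database (entries 1001, 10876, 1234); the suffix 100 then leads to a record with the values URL: www.dkrz.de and Checksum: ...]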
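
The two-step lookup can be sketched as: split the handle into prefix and suffix, find the service responsible for the prefix, then fetch the record stored under the suffix. Everything below is an in-memory simplification; the service name `dkrz-handle-service`, the table layout and both functions are hypothetical, and the real Handle System protocol is not modelled:

```python
# Hypothetical prefix database: prefix -> name of the responsible service.
PREFIX_DB = {
    "10876": "dkrz-handle-service",
}

# Hypothetical per-service storage: suffix -> record of typed values.
LOCAL_SERVICES = {
    "dkrz-handle-service": {
        "100": {"URL": "www.dkrz.de"},
    },
}


def split_handle(handle):
    """Split "prefix/suffix" at the first slash; the suffix may contain slashes."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a handle: {handle!r}")
    return prefix, suffix


def resolve_handle(handle):
    """Look up the prefix centrally, then the suffix at the responsible service."""
    prefix, suffix = split_handle(handle)
    service = PREFIX_DB[prefix]
    return LOCAL_SERVICES[service][suffix]


print(resolve_handle("10876/100"))
```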

Page 31: Persistent Identifiers

What are Digital Objects?

10876/abc123
http://example.com/xyz789

Page 32: Persistent Identifiers

Build a stack of lightweight components

LAPIS API for Persistent Identifier Services (on GitHub)

Page 33: Persistent Identifiers

Further reading

Weigel et al.: “A framework for extended persistent identification of scientific assets” (submitted to the Data Science Journal)
Duerr et al., doi:10.1007/s12145-011-0083-6

Page 34: Persistent Identifiers

All slides available here: redmine.dkrz.de/seminar

Thank you.

Page 35: Persistent Identifiers

The greater plan

LTA application: Q4 2012, Q1 2013
EUDAT integration: 2014
CMIP6+: 2014

[Figure: Gantt chart from 2012 Q2 to 2015 Q1 with the following tasks.]

Development of base software
First implementation in C3
Definition of the basic concepts
Implementation of LTA
Identify user groups
Consolidation of the development status in EUDAT
Review outstanding questions and answer them where applicable
Outlook: CMIP6+
Provenance: literature and terminology
Define initial requirements
First framework draft
Survey related work
Extend requirements
Provenance tracing from source to target
Provenance for Earth system data objects
Refinement of the framework at the periphery
Comparison with related work
Identify future work
Assess the relationship to Linked Data