Integration - the heart of researcher centric research data management systems - Steve Mackey,...

1

Integration – the heart of researcher centric research data management systems

Steve Mackey

15 January 2015

2

Agenda

• Who we are, what we do

• How it works

• RDM systems, where it fits

• Workflows

• Integrations

21 October 2014

3

Archive storage with a difference

Flagship Arkivum100 service with 100% data integrity guarantee

World-wide professional indemnity insurance – Arkivum100

Long term contracts for enterprise data archiving

Fully automated and managed solution

Audited and certified to ISO27001

Data escrow, exit plan, no lock-in

Keeping Data Alive for 25+ Years

Adding media – effectively continual process

Monthly checks and maintenance updates

Annual data retrieval and integrity checks

Hardware refresh

Software migration

Hardware migration

Tape format migration – LTO n to LTO n+2

Support and admin staff migration

Change of supplier of products and services

3-5 year obsolescence of servers, operating systems and software

5

Arkivum Appliance

• CIFS/NFS presentation (integrates easily with local file systems)

• Simple administration of user access permissions and storage allocations

• Robust REST API for application integration

• GUI for file ingest status, recovery pre-staging, security

• Ingest triggered by: timeout, checksum exchange, manifest (bulk)

• Checksum/fixity chain of custody from ingest through replication

• Immutable (WORM)

• Regular (6 monthly) data copy read verify

• Offline Escrow data copy (open source, self describing)

• Data encryption throughout; keys held only by the customer
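The checksum/fixity chain of custody listed above can be illustrated with a minimal sketch. It assumes SHA-256 as the checksum algorithm and a simple in-memory manifest; the slides do not say which algorithm or manifest format the appliance uses, and the function names `build_manifest` and `verify_manifest` are hypothetical.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum.

    Assumption: a manifest is just {relative_path: sha256}; a real
    ingest manifest may carry more fields.
    """
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            manifest[os.path.relpath(full_path, root)] = sha256_of(full_path)
    return manifest


def verify_manifest(root, manifest):
    """Return the sorted relative paths that are missing or have changed."""
    failures = []
    for rel_path, expected in manifest.items():
        full_path = os.path.join(root, rel_path)
        if not os.path.isfile(full_path) or sha256_of(full_path) != expected:
            failures.append(rel_path)
    return sorted(failures)
```

Building the manifest before ingest and re-running `verify_manifest` against each later copy gives a comparable fixity check at every hand-over.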

Diagram, built up over successive slides.

Components: Original Datasets & Files; Arkivum Gateway on Appliance; Arkivum Service (labelled Arkivum/100 in later frames)

Labels added frame by frame, in order:

• Copy for ingest

• Encrypted Archive

• Decrypted object, Validated Archive

• Archive Copy 1

• Archive Copy 2

• Escrow Copy

• Cached Copy (in the final frames, where Copy for ingest and then Original Datasets & Files no longer appear)

http://datablog.is.ed.ac.uk/2013/12/06/the-four-quadrants-of-research-data-curation-systems/

PURE, Elements, Converis

ePrints, DSpace, Hydra

Figshare, Re3data.org, Landing pages, CKAN

Institutional storage

17

Workflows

• RDM Workflow - The sequence of repeatable processes (steps) through which Research Data passes during its lifecycle, including the steps involved in its creation, curation, preservation, access and eventual disposal.
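As a rough illustration of this definition, the lifecycle can be modelled as an ordered sequence of stages that a dataset passes through. This is only a sketch: the stage names come from the definition above, while the `Stage` enum and the `next_stage` and `run_workflow` helpers are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    """Lifecycle stages named in the workflow definition, in order."""
    CREATION = 1
    CURATION = 2
    PRESERVATION = 3
    ACCESS = 4
    DISPOSAL = 5


def next_stage(stage):
    """Return the stage that follows `stage`, or None after disposal."""
    stages = list(Stage)
    position = stages.index(stage)
    return stages[position + 1] if position + 1 < len(stages) else None


def run_workflow():
    """Walk one dataset through every stage and return the ordered history."""
    history = []
    stage = Stage.CREATION
    while stage is not None:
        history.append(stage.name.lower())
        stage = next_stage(stage)
    return history
```

A real workflow would attach a repeatable process to each stage; here the point is only that the order is explicit and the same for every dataset.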

18

RDM Workflows Report

• JISC Research Data Spring

• A Consortial Approach to Building an Integrated RDM System – “Small and Specialist”

• http://dx.doi.org/10.6084/m9.figshare.1476832

19

Researcher Centric Workflow

Diagram components: Researcher, Web browser, Local Research Data, HR system, CRIS (Elements), Figshare (Amazon), DataCite (BL), Journal, Repository (DSpace), Archive (Arkivum)

Numbered flows between components:

1. Researcher details

2. Data files

3. Data Description

4. Mint DOI

5. Data DOI

6. Data DOI

7. Article

8. Data DOI

9. Article and Article DOI

10. Article and Article DOI

11. Article DOI

12. Dataset Description and Data DOI

13. Dataset Description and Data DOI

14. Data files

15. Data is safe

16. Data is safe

Article DOI (unnumbered)

21

Why integrate?

• Simpler and easier RDM processes from a Researcher perspective, which both encourages adoption and lowers the cost of institutional support to the research base.

• Clear and repeatable RDM processes that help ensure higher levels of quality and consistency in RDM across the research base.

• Ability to deploy RDM as community-driven shared service(s) so that smaller institutions can ‘join forces’ to benefit from having access to a common RDM infrastructure.

• Scaling RDM up across a large research base using automation and ‘factory’ type approaches to achieve ‘economies of scale’ and move away from RDM being a manual and labour intensive endeavour.

• Specifically for Archive layer storage this may include:
– Confirmation of integrity of received files via checksums/fixity
– File archive status reporting
– Trigger for original file deletion
– File location, data pool management
– File recovery staging
– Encryption key management
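Three of the archive-layer items above (integrity confirmation, status reporting and the trigger for original file deletion) fit together as a simple gate: the original is deleted only once the archive has reported a matching checksum and a completed status. The sketch below assumes SHA-256 and a hypothetical `ArchiveStatus` record; it does not reflect the actual Arkivum API, which the slides do not describe.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ArchiveStatus:
    """Hypothetical status record an archive layer might report for one file."""
    checksum: str      # checksum the archive computed on receipt
    replicated: bool   # True once the archive reports all copies are in place


def local_checksum(data: bytes) -> str:
    """SHA-256 of the original file contents (the algorithm is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def safe_to_delete_original(data: bytes, status: ArchiveStatus) -> bool:
    """Allow deletion only if the archive copy is verified and replicated."""
    return status.replicated and status.checksum == local_checksum(data)
```

Keeping the decision in one function means the integrating system (repository, CRIS or storage tier) can apply the same rule however the status is obtained.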

22

Data Archiving - Integrations

23

Questions?