NEUROIMAGING RESEARCH DATA IFE CYCLE MANAGEMENT · The data repository • a iRODS-based ICT system...

20
N EUROIMAGING R ESEARCH D ATA L IFE - CYCLE M ANAGEMENT Hurng-Chun (Hong) Lee, Robert Oostenveld, Erik van den Boogert, Eric Maris

Transcript of NEUROIMAGING RESEARCH DATA IFE CYCLE MANAGEMENT · The data repository • a iRODS-based ICT system...

Page 1: NEUROIMAGING RESEARCH DATA IFE CYCLE MANAGEMENT · The data repository • a iRODS-based ICT system implementing the workflow defined by the protocol • a single data-management

NEUROIMAGING RESEARCH DATA LIFE-CYCLE MANAGEMENT

Hurng-Chun (Hong) Lee, Robert Oostenveld, Erik van den Boogert, Eric Maris

Page 2: NEUROIMAGING RESEARCH DATA IFE CYCLE MANAGEMENT · The data repository • a iRODS-based ICT system implementing the workflow defined by the protocol • a single data-management

Outlines• Lifecycle RDM: objectives and challenges

• The method

• the RDM protocol

• Donders Research Data Repository (DRDR) - usage of iRODS

• Strength and weakness of DRDR

• Future focuses

2

Page 3: NEUROIMAGING RESEARCH DATA IFE CYCLE MANAGEMENT · The data repository • a iRODS-based ICT system implementing the workflow defined by the protocol • a single data-management

Lifecycle RDM

• RDM spans the entire research lifecycle

• Objectives:

• long-term data preservation

• scientific-process documentation

• data publication

conception of researchdata acquisition

data analysispublication

3

Page 4: NEUROIMAGING RESEARCH DATA IFE CYCLE MANAGEMENT · The data repository • a iRODS-based ICT system implementing the workflow defined by the protocol • a single data-management

Challenges• large institute with heterogeneous scientific-administrative workflows

• 600 researchers in PI groups ➞ more than 150 projects per year

• 3 centres with 4 administrative domains

• data complexity:

• text, audio/video, imaging or signal data, etc.

• sensitive data

• size ranges from a few large (>2GB) files to a huge amount of small files (<1MB)

• user expectation

4

Page 5: NEUROIMAGING RESEARCH DATA IFE CYCLE MANAGEMENT · The data repository • a iRODS-based ICT system implementing the workflow defined by the protocol • a single data-management

The RDM protocoldata acquisition

data analysis

data publication

conception of research

5 * Box icons are made by Roundicons and Pixel Buddha from www.flaticon.com are licensed by CC 3.0 BY.

Page 6: NEUROIMAGING RESEARCH DATA IFE CYCLE MANAGEMENT · The data repository • a iRODS-based ICT system implementing the workflow defined by the protocol • a single data-management

The RDM protocoldata acquisition

data analysis

data publication

conception of research

research documentation collection

data sharing collection

collection: a container of ✓ data (files/folders) ✓ metadata has a “state” attribute

data acquisition collection

5 * Box icons are made by Roundicons and Pixel Buddha from www.flaticon.com are licensed by CC 3.0 BY.

Page 7: NEUROIMAGING RESEARCH DATA IFE CYCLE MANAGEMENT · The data repository • a iRODS-based ICT system implementing the workflow defined by the protocol • a single data-management

The RDM protocoldata acquisition

data analysis

data publication

conception of research

research documentation collection

data sharing collection

collection: a container of ✓ data (files/folders) ✓ metadata has a “state” attribute

data acquisition collection

PID

5 * Box icons are made by Roundicons and Pixel Buddha from www.flaticon.com are licensed by CC 3.0 BY.

Page 8: NEUROIMAGING RESEARCH DATA IFE CYCLE MANAGEMENT · The data repository • a iRODS-based ICT system implementing the workflow defined by the protocol • a single data-management

The RDM protocoldata acquisition

data analysis

data publication

conception of research

research documentation collection

data sharing collection

research administrator

(RA)

manager

contributor

viewer

collection: a container of ✓ data (files/folders) ✓ metadata has a “state” attribute

data acquisition collection

PID

5 * Box icons are made by Roundicons and Pixel Buddha from www.flaticon.com are licensed by CC 3.0 BY.

Page 9: NEUROIMAGING RESEARCH DATA IFE CYCLE MANAGEMENT · The data repository • a iRODS-based ICT system implementing the workflow defined by the protocol • a single data-management

The RDM protocoldata acquisition

data analysis

data publication

conception of research

research documentation collection

data sharing collection

research administrator

(RA)

manager

contributor

viewer

collection: a container of ✓ data (files/folders) ✓ metadata has a “state” attribute

workflow

responsibility

eligibility

data acquisition collection

PID

5 * Box icons are made by Roundicons and Pixel Buddha from www.flaticon.com are licensed by CC 3.0 BY.

Page 10: NEUROIMAGING RESEARCH DATA IFE CYCLE MANAGEMENT · The data repository • a iRODS-based ICT system implementing the workflow defined by the protocol • a single data-management

The RDM protocolOrganisational Unit (OU)

data acquisition

data analysis

data publication

conception of research

research documentation collection

data sharing collection

research administrator

(RA)

manager

contributor

viewer

collection: a container of ✓ data (files/folders) ✓ metadata has a “state” attribute

workflow

responsibility

eligibility

data acquisition collection

PID

5 * Box icons are made by Roundicons and Pixel Buddha from www.flaticon.com are licensed by CC 3.0 BY.

Page 11: NEUROIMAGING RESEARCH DATA IFE CYCLE MANAGEMENT · The data repository • a iRODS-based ICT system implementing the workflow defined by the protocol • a single data-management

The data repository• a iRODS-based ICT system implementing the workflow defined by the protocol

• a single data-management system enabling internal collaboration, and external data sharing

storage system

data management middleware

interfaces

WebDAV Stagermanagement portal

iRODS management rules ELK stack

✓file-based system ✓single and uniform namespace ✓access-controlled metadata management ✓authentication via trusted identity providers ✓role-based authorisation ✓data replication for disaster recovery ✓workflow automation and policy

enforcement

6

Page 12: NEUROIMAGING RESEARCH DATA IFE CYCLE MANAGEMENT · The data repository • a iRODS-based ICT system implementing the workflow defined by the protocol • a single data-management

Storage resources

vault_ou1_1 vault_ou1_2 vault_ou2_1 vault_ou2_2

resc_ou1 resc_ou2

vault_nl_1

resc_nl

NFS export NFS export

Location A (first replica) Location B (second replica)

stor

age

data of collections of OU1data of collections of OU2

asynchronousdata replication

iRO

DS re

sour

ces

quot

a

quot

a

quot

a

quot

a

files

yste

m

dataflow controle

load-balancing/OU-level quota

two identical copies of data (disaster recovery)

7

Page 13: NEUROIMAGING RESEARCH DATA IFE CYCLE MANAGEMENT · The data repository • a iRODS-based ICT system implementing the workflow defined by the protocol • a single data-management

Collection namespace

/rdm/DI/DCCN/DAC_3010000.01_123/

iRODS zone

organisation

organisational unit

DRDR collection

admin viewer

manager contributor

viewer

• namespace reflects administrative hierarchy

• metadata in KVU triplets

• role-based authorisation with iRODS groups

8

Page 14: NEUROIMAGING RESEARCH DATA IFE CYCLE MANAGEMENT · The data repository • a iRODS-based ICT system implementing the workflow defined by the protocol • a single data-management

Management rules• a RPC-like interface for collection management

rdmUpdateCollectionMetadata

Filter out attributes the client doesn’t have right to set

Verify attribute value

Update collection attributes

return up-to-date collection attributes

Server (core.re)Clientinputs: *collName, *kvp

output: *errorcode, *collectionAttrs

9

Page 15: NEUROIMAGING RESEARCH DATA IFE CYCLE MANAGEMENT · The data repository • a iRODS-based ICT system implementing the workflow defined by the protocol • a single data-management

User provisioning and authentication

• authentication via a national federated IdP

• user is provisioned upon sign-up to the management portal

• IdP attributes are stored as user KVU-triplets in iRODS

• setup PAM authentication on OTP (one-time password) for data access

10

Page 16: NEUROIMAGING RESEARCH DATA IFE CYCLE MANAGEMENT · The data repository • a iRODS-based ICT system implementing the workflow defined by the protocol • a single data-management

Event logging

iCATPEP

filebeatrodsLog

reporting

auditing

• essential user actions are logged as events

• non-blocking way streaming events to Elastic stack via “filebeat”

11

Page 17: NEUROIMAGING RESEARCH DATA IFE CYCLE MANAGEMENT · The data repository • a iRODS-based ICT system implementing the workflow defined by the protocol • a single data-management

User interfaces• separating data-access from collection management

• web portal for collection management

• WebDAV (Davrods) for “easy” data transfer

• file stager: a service “intelligently” managing bulk file transfer between a local storage and the repository

actual transfer

file stager

transfer job manager/queue

local storage DRDRirsync

transfer agents

web interface

12

Page 18: NEUROIMAGING RESEARCH DATA IFE CYCLE MANAGEMENT · The data repository • a iRODS-based ICT system implementing the workflow defined by the protocol • a single data-management

Strength and weakness👍 It fits to a combined scientific-administrative workflow of a

large and heterogeneous institute

👍 It provides sufficient functionality for

1. sharing data for publication

2. implementing Data Management Plan (DMP)

👎 It has weak integration with data analysis facility

👎 It doesn’t implement standard way of organising collection content

13

Page 19: NEUROIMAGING RESEARCH DATA IFE CYCLE MANAGEMENT · The data repository • a iRODS-based ICT system implementing the workflow defined by the protocol • a single data-management

Future focuses

• seamless integration with computing (data-analysis) facility

• FAIR-ness of published data collections

• adoption beyond neuroimaging

14

Page 20: NEUROIMAGING RESEARCH DATA IFE CYCLE MANAGEMENT · The data repository • a iRODS-based ICT system implementing the workflow defined by the protocol • a single data-management

Summary• We structured a RDM workflow

• which covers the entire research lifecycle

• in which both researcher and administrator take part of responsibility

• which is specified by protocol; implemented by a iRODS-based digital repository

https://data.donders.ru.nl

15