Digital Curation or Digital Data? The impact of Services and Federation
Phil Lord
Newcastle University
Take Home Messages
• Curation is important for the CARMEN project and neuroinformatics
• To enable repeatability and rerunability, curation of services is as important as curation of data
• To enable federation and autonomy, data release, license and other policies need to be expressed so that they can be computed over.
Research Challenge
Understanding the brain may be the greatest informatics challenge of the 21st century
Worldwide, >100,000 neuroscientists (~5,000 in the UK) are generating vast amounts of data
Principal experimental data formats:
molecular (genomic/proteomic)
neurophysiological (time-series electrical measures of activity)
anatomical (spatial)
behavioural
Neuroinformatics concerns how these data are handled and integrated, including the application of computational modelling
Need for Cooperation
The OECD Neuroinformatics Working Group identified the need to work cooperatively in order to achieve major advances
Cooperation will permit:
development of common processes
best value from data, including long term curation
‘mega-analysis’ of large data sets
integration of data sets across different scales and different approaches
interdisciplinary research
CARMEN – Focus on Neural Activity
resolving the ‘neural code’ from the timing of action potential activity
[Figure labels: neurone 1, neurone 2, neurone 3]
raw voltage signal data collected by patch-clamp and by single- and multi-electrode array recording
novel optical recording, particularly of the activity dynamics of large networks
• CARMEN is a new UK research council funded e-Science Pilot Project in neuroinformatics.
• To create a grid-enabled, real-time ‘virtual laboratory’ environment for neurophysiological data
• To develop an extensible ‘toolkit’ for data extraction, analysis and modelling
• To provide a repository for archiving, sharing, integration and discovery of data
• To achieve wide community and commercial engagement in developing and using CARMEN
– CARMEN is a 4-year project: if it is to last longer, it must become financially self-sufficient.
• See http://www.carmen.org.uk
CARMEN Active Information Repository Node (CAIRN)
[Figure: CAIRN architecture. Components: Data, Metadata, Core Services, Security, Registry, Workflow Enactment Engine, Client Dynamically Deployed Services (Service 1 … Service n), External Clients]
Dynamic Service Deployment - Dynasoar
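[Figure: Dynasoar inside a CAIRN. Labels: Client (C), WSP, Web Server, request (req), response (res), Service Repository (SR), Compute Machines (node 1: s2, s5; node 2; node n: s2). Numbered steps 1 to 3, with step 2 labelled "service fetch & deploy"]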
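The "service fetch & deploy" step in the figure can be illustrated with a minimal Python sketch. This is not Dynasoar's actual API: the class names `ComputeNode` and `ServiceProvider`, the "least loaded node" placement rule, and the service `s7` are all hypothetical, chosen only to show the idea of deploying a service the first time it is requested and reusing it afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeNode:
    """A compute machine that can host deployed services (hypothetical model)."""
    name: str
    deployed: set = field(default_factory=set)


class ServiceProvider:
    """Routes requests and deploys services on demand (illustrative only)."""

    def __init__(self, repository, nodes):
        # repository maps a service name to a callable; it stands in for
        # the Service Repository in the figure.
        self.repository = repository
        self.nodes = nodes
        self.log = []

    def handle(self, service, payload):
        # Prefer a node that already hosts the requested service.
        node = next((n for n in self.nodes if service in n.deployed), None)
        if node is None:
            if service not in self.repository:
                raise KeyError(f"unknown service: {service}")
            # Fetch and deploy. Assumed placement rule: pick the node
            # hosting the fewest services.
            node = min(self.nodes, key=lambda n: len(n.deployed))
            node.deployed.add(service)
            self.log.append(f"deployed {service} on {node.name}")
        # Run the service and return its response to the client.
        self.log.append(f"ran {service} on {node.name}")
        return self.repository[service](payload)


# Node contents mirror the figure labels; s7 is an invented extra service.
nodes = [
    ComputeNode("node 1", {"s2", "s5"}),
    ComputeNode("node 2"),
    ComputeNode("node n", {"s2"}),
]
provider = ServiceProvider({"s2": len, "s5": sum, "s7": max}, nodes)
provider.handle("s7", [1, 5, 3])   # first request triggers a deployment
provider.handle("s7", [2])         # second request reuses the deployment
```

In this sketch the client never learns which node ran the service, which is the property that matters for the curation questions raised later: the record of "what ran where" lives with the provider, not the client.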
Distribution and Federation
Initially, we plan to have two CAIRNs
[Figure: two CAIRNs, each with the full node architecture: Data, Metadata, Core Services, Security, Registry, Workflow Enactment Engine, Client Dynamically Deployed Services, External Clients]
Distribution and Federation
[Figure: a federation of CAIRNs; the same node diagram is repeated 20 times]
CARMEN’s perspective
• We wish to store data, its provenance and its usage.
• We need release policies and retention policies, and we need to understand ownership
What do we get from this?
• Replicability: one scientist should be able to repeat another’s experiment, under equivalent conditions, at a different time.
• Rerunability: a scientist should be able to apply an equivalent technique under new circumstances.
• Adding services into this mix complicates the issue.
[Figure: Old Data and New Data, labelled with Replicability and Rerunability; then Old/New Data against Old/New Services, again labelled with Replicability and Rerunability]
Questions annotated on the figure:
• Is the specification of what happened actually right?
• Has the state of the world advanced since previously?
• Has the world changed in a comparable way?
• Has the service changed in a comparable way?
Roles annotated on the figure:
• Error-prone neuroscientist
• Eager neuroscientist
• Neuroscientist comparing to existing work
• Tool builder
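One way to make the replicability/rerunability distinction concrete is to record which data and which service a run used, then compare two records. The Python sketch below is an interpretation, not CARMEN's design: `RunRecord`, `classify`, and the use of plain version strings to stand in for "equivalent conditions" are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class RunKind(Enum):
    REPLICATION = "replicability"
    RERUN = "rerunability"


@dataclass(frozen=True)
class RunRecord:
    """Minimal provenance for one analysis run (hypothetical fields)."""
    data_version: str
    service_version: str


def classify(original, new):
    """Return the kind of run and the list of things that changed.

    Assumption: identical data and service versions count as
    'equivalent conditions'; any difference counts as 'new circumstances'.
    """
    changed = []
    if original.data_version != new.data_version:
        changed.append("data")
    if original.service_version != new.service_version:
        changed.append("service")
    kind = RunKind.RERUN if changed else RunKind.REPLICATION
    return kind, changed


first = RunRecord(data_version="recording-v1", service_version="sorter-1.0")
same_again = RunRecord(data_version="recording-v1", service_version="sorter-1.0")
new_tool = RunRecord(data_version="recording-v1", service_version="sorter-2.0")

print(classify(first, same_again))
print(classify(first, new_tool))
```

The sketch shows why services complicate matters: without the `service_version` field, the second comparison would look like a replication even though the tool changed underneath it.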
So, what is the problem?
• I would like to rerun this experiment and release the results. Can I?
• Is the new data available?
• Is the new data public?
• Does the license allow derived results?
• Who owns the derived results?
– data license
– software license
So, what's the problem?
• Can I compare how new data would have changed the results?
– Is that data available? (New and Old)
– Is that data public? (New and Old) etc…
• Is it embargoed – will it become public later?
– Do the licenses allow derived results?
– Who owns the derived results?
• The licenses may conflict
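The questions on these two slides can be read as a checklist that software could evaluate before a result is released. The following Python sketch assumes a hypothetical `Resource` record with one field per question; ownership of derived results is deliberately left out, since the slides pose it as an open question rather than something with an obvious encoding.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Resource:
    """A data set or piece of software used in a run (hypothetical fields)."""
    name: str
    available: bool
    public: bool
    allows_derived_results: bool
    embargo_until: Optional[date] = None


def release_blockers(resources, today):
    """List every reason the derived result cannot be released today."""
    blockers = []
    for r in resources:
        if not r.available:
            blockers.append(f"{r.name}: not available")
        embargoed = r.embargo_until is not None and today < r.embargo_until
        if embargoed:
            # Embargoed material becomes public later, so report the date.
            blockers.append(f"{r.name}: embargoed until {r.embargo_until.isoformat()}")
        elif not r.public:
            blockers.append(f"{r.name}: not public")
        if not r.allows_derived_results:
            blockers.append(f"{r.name}: license does not allow derived results")
    return blockers


inputs = [
    Resource("old data", available=True, public=True, allows_derived_results=True),
    Resource("new data", available=True, public=False, allows_derived_results=True,
             embargo_until=date(2030, 1, 1)),
    Resource("analysis software", available=True, public=True,
             allows_derived_results=False),
]
for reason in release_blockers(inputs, today=date(2025, 6, 1)):
    print(reason)
```

Returning reasons rather than a bare yes/no keeps the answer explainable to the scientist who asked.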
Whose release policy?
[Figure: the node diagram repeated 20 times, one per CAIRN]
Policy Issues
• One of the main purposes of the CAIRN is to hide the distribution.
• What if the CAIRNs have different release policies? What if they have different licenses?
• We cannot inflict these differences on the user.
• Therefore, we must be able to compute over policies
• We must be able to represent justifications back to the users
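A minimal Python sketch of "computing over policies" across several CAIRNs follows. It assumes, purely for illustration, that each node's release policy reduces to a single release date and that a result touching several nodes may be released only when every node allows it; `ReleasePolicy`, `Decision`, `may_release` and the node names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ReleasePolicy:
    """One CAIRN's release policy, reduced to a date (an assumption)."""
    cairn: str
    release_after: date


@dataclass(frozen=True)
class Decision:
    allowed: bool
    justifications: tuple


def may_release(policies, today):
    """Combine per-node policies into one decision plus its justification.

    Assumed rule: the most restrictive node wins.
    """
    allowed = True
    reasons = []
    for p in policies:
        if today >= p.release_after:
            reasons.append(f"{p.cairn} permits release since {p.release_after.isoformat()}")
        else:
            allowed = False
            reasons.append(f"{p.cairn} withholds release until {p.release_after.isoformat()}")
    return Decision(allowed, tuple(reasons))


policies = [
    ReleasePolicy("CAIRN A", date(2024, 1, 1)),
    ReleasePolicy("CAIRN B", date(2026, 1, 1)),
]
decision = may_release(policies, today=date(2025, 6, 1))
print(decision.allowed)
for line in decision.justifications:
    print(line)
```

The user sees one answer and a list of reasons, while the differences between nodes stay inside the computation.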
An Example: Licensing
• Computationally amenable licenses are available
• Take, for example, Creative Commons
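Creative Commons licenses are built from a small set of elements (BY, NC, ND, SA), which is what makes them amenable to computation. The Python sketch below models those elements as flags and works out which license, if any, a result derived from two sources could carry. It is a simplified model: it ignores license versions, jurisdictions and CC0, and the function name `derived_license` is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CCLicense:
    """A Creative Commons license as a set of elements; BY is always present."""
    non_commercial: bool = False
    no_derivatives: bool = False
    share_alike: bool = False

    @property
    def name(self):
        parts = ["BY"]
        if self.non_commercial:
            parts.append("NC")
        if self.no_derivatives:
            parts.append("ND")
        if self.share_alike:
            parts.append("SA")
        return "CC " + "-".join(parts)


def derived_license(a: CCLicense, b: CCLicense) -> Optional[CCLicense]:
    """License for a result derived from both sources, or None on conflict."""
    # ND forbids sharing adaptations at all.
    if a.no_derivatives or b.no_derivatives:
        return None
    combined = CCLicense(
        non_commercial=a.non_commercial or b.non_commercial,
        share_alike=a.share_alike or b.share_alike,
    )
    # SA requires the adaptation to carry the same license as the source,
    # so an SA source conflicts with any change to its NC element.
    for source in (a, b):
        if source.share_alike and source.non_commercial != combined.non_commercial:
            return None
    return combined


BY = CCLicense()
BY_SA = CCLicense(share_alike=True)
BY_NC = CCLicense(non_commercial=True)
BY_NC_SA = CCLicense(non_commercial=True, share_alike=True)
BY_ND = CCLicense(no_derivatives=True)

for pair in [(BY, BY_NC), (BY_SA, BY_NC), (BY_NC_SA, BY_NC), (BY, BY_ND)]:
    result = derived_license(*pair)
    print(pair[0].name, "+", pair[1].name, "->", result.name if result else "conflict")
```

The BY-SA with BY-NC case is an instance of the "licenses may conflict" problem raised earlier: each license is permissive on its own, but no single license satisfies both.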
Take Home Messages
• Curation is important for the CARMEN project and neuroinformatics
• To enable repeatability and rerunability, curation of services is as important as curation of data
• To enable federation and autonomy, data release, license and other policies need to be expressed so that they can be computed over.
Acknowledgements
Professor Colin Ingram, Professor Jim Austin, Professor Leslie Smith, Professor Paul Watson, Dr. Stuart Baker, Professor Roman Borisyuk, Dr. Stephen Eglen, Professor Jianfeng Feng, Dr. Kevin Gurney, Dr. Tom Jackson, Dr. Marcus Kaiser, Dr. Phillip Lord, Dr. Paul Overton, Dr. Stefano Panzeri, Dr. Rodrigo Quian Quiroga, Dr. Simon Schultz, Dr. Evelyne Sernagor, Dr. V. Anne Smith, Dr. Tom Smulders, Professor Miles Whittington, Christoph Echtermeyer, Martyn Fletcher, Frank Gibson, Mark Jessop, Dr. Bojian Liang, Juan Martinez-Gomez, Dr. Chris Mountford, Agah Ogungboye, Georgios Pitsilis, Dr. Daniel Swan
University of St Andrews
The University of Sheffield