I2S2 - Infrastructure for Integration in Structural Sciences
Information Model Development Workshop, RAL, 11th February 2010
http://www.ukoln.ac.uk/projects/I2S2/
I2S2: Project Overview
• Identify requirements for a data-driven research infrastructure in the Structural Sciences
• Understand localised data management practices
• Understand large centralised facilities data management infrastructure
[Figure: project scope along three axes]
Organisational Scale: Lone Researcher, Research Group, Department, Mid-range Service, Central Facility
Discipline: Chemistry, Physics, Materials, Earth Sciences, Life Sciences, Medical, Engineering, Technology
Data Lifecycle: Conceive, Research, Propose, Analyse, Publish, Curate, Reuse
Increased efficiency through effective cross-institutional data management
I2S2: Institutional Context
• University of Cambridge
– “lone” researcher scenario
– data sharing with colleagues via email
– little or no infrastructure
– data received from ISIS is currently stored on a laptop or WebDAV server
– management of intermediate and derived data (intra & inter institution) is a major issue
• EPSRC National Crystallography Service
– service provision function
– operates across institutions
– moderate infrastructure
– raw data generated in-house is stored at ATLAS
– local / institutional repository for intermediate and derived data
• Diamond & ISIS Central Facilities
– operate on behalf of multiple institutions (community)
– established, ‘formulaic’ & bespoke processes for experiments
– large infrastructure engineered to manage raw data
– derived data taken off site on laptops / removable drives
– results data independently worked up and published
Bridging the gap between raw and derived data
Methodology: Mapping across organisational infrastructures
Proposals
Once awarded beamtime at ISIS, an entry will be created in ICAT that describes your proposed experiment.
Experiment
Data collected from your experiment will be indexed by ICAT (with additional experimental conditions) and made available to your experimental team
Analysed Data
You will have the capability to upload any desired analysed data and associate it with your experiments.
Publication
Using ICAT you will also be able to associate publications to your experiment and even reference data from your publications.
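The four stages above can be pictured as a single catalogue entry that accumulates links over time. The following is a minimal sketch under stated assumptions: the class `ExperimentEntry`, its fields and its methods are hypothetical names chosen for illustration, and none of them is the ICAT API or schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExperimentEntry:
    """Hypothetical catalogue entry; not the real ICAT schema or API."""
    proposal_title: str  # entry is created once beamtime is awarded
    raw_data: List[str] = field(default_factory=list)       # indexed during the experiment
    analysed_data: List[str] = field(default_factory=list)  # uploaded later by the team
    publications: List[str] = field(default_factory=list)   # associated after publishing

    def index_raw(self, filename: str) -> None:
        """Record a raw data file collected during the experiment."""
        self.raw_data.append(filename)

    def upload_analysed(self, filename: str) -> None:
        """Associate an analysed data file with this experiment."""
        self.analysed_data.append(filename)

    def link_publication(self, reference: str) -> None:
        """Associate a publication with this experiment."""
        self.publications.append(reference)

    def stage(self) -> str:
        """Return the furthest lifecycle stage reached so far."""
        if self.publications:
            return "Publication"
        if self.analysed_data:
            return "Analysed Data"
        if self.raw_data:
            return "Experiment"
        return "Proposal"
```

An entry starts at the "Proposal" stage and moves forward as raw files, analysed files and publications are attached, so the link from a publication back to its raw data is kept in one place.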
[Figure: Example ISIS Proposal, spanning Home Institution and Central Facility]
• β-lactoglobulin protein interfacial structure
• GEM – High intensity, high resolution neutron diffractometer
• H2-(zeolite) vibrational frequencies vs polarising potential of cations
Methodology: An established starting platform
http://code.google.com/p/icatproject/
[Figure: core metadata model entities]
Investigation, Publication, Keyword, Topic, Sample, Sample Parameter, Dataset, Dataset Parameter, Datafile, Datafile Parameter, Investigator, Related Datafile, Related Datafile Parameter, Authorisation
The Core Metadata model forms the information model for ICAT.
It is designed to describe facility-based experiments in Structural Science.
It forms the basis for extension:
- To laboratory-based science
- To secondary analysis data
- To preservation information
- To publication data
Together these cover the scientist’s research lifecycle as well as the facilities.
Basis of I2S2 integrated information model
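To make the shape of such a model concrete, here is a minimal sketch of a containment hierarchy (Investigation, Dataset, Datafile, with parameters attached) plus one possible extension towards secondary analysis data. The entity names are taken from the model listed above; every field name, the `derived_from` link and the `provenance_chain` helper are assumptions made for illustration, not the ICAT schema or the I2S2 model itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Datafile:
    name: str
    parameters: Dict[str, str] = field(default_factory=dict)  # Datafile Parameter


@dataclass
class Dataset:
    name: str
    datafiles: List[Datafile] = field(default_factory=list)
    parameters: Dict[str, str] = field(default_factory=dict)  # Dataset Parameter
    # Assumed extension: name of the dataset this one was derived from.
    derived_from: Optional[str] = None


@dataclass
class Investigation:
    title: str
    investigators: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    datasets: List[Dataset] = field(default_factory=list)
    publications: List[str] = field(default_factory=list)


def provenance_chain(inv: Investigation, dataset_name: str) -> List[str]:
    """Follow derived_from links from a dataset back to its raw source.

    Hypothetical helper: returns dataset names, starting with the one asked
    for, and stops at an unknown name or if a cycle is detected.
    """
    by_name = {d.name: d for d in inv.datasets}
    chain: List[str] = []
    current: Optional[str] = dataset_name
    while current is not None and current in by_name and current not in chain:
        chain.append(current)
        current = by_name[current].derived_from
    return chain
```

Keeping the derivation link on the dataset itself is one simple way to connect derived data worked up at a home institution to the raw data held at a facility; a fuller model would likely record the processing step as well.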
I2S2: Research Challenges
• Research teams capture, manage, discuss and disseminate their data in relative isolation with highly fragmented data infrastructures and poorly integrated software applications
• Conventional systems of publication lead to insufficient information on provenance of results and irreproducible experiments
• The processes for recognition lead to a lack of inclination and incentive to share or make all the supporting information for a study publicly available
• A low awareness of data curation and preservation issues leads to data loss and reduced productivity
• Scale and complexity: from small lab equipment through institutional installations to large scale facilities such as the DLS and ISIS at STFC
• Inter-disciplinary: research across domain boundaries
• Data lifecycle: time-factored data flows and data transformations
I2S2: Objectives & Deliverables
Objectives:
Broadly: development of pilot data management infrastructure solutions which bridge discipline, laboratory and institutional boundaries
Specifically, development of data practices across the research data lifecycle:
– A framework for data management, deployable across Structural Sciences
– Explore a range of data acquisition techniques at different scales (complexity, volume, definition)
– Advocate recognition in the community for sharing data to encourage reuse
– Facilitate access to data underpinning publications with higher levels of verification, resulting in higher quality research
– Support long-term preservation assuring future discovery of results
Deliverables (in addition to project reporting):
D1.1 Requirements Report
D1.2 Two Use Cases
D2.1 Extended Cost Model based on KRDS2
D2.2 Cost Analysis Phase 1
D3.1 Integrated Information Model
D3.2 Pilot Implementation Plan
D3.3a Pilot One: Scale and Complexity (Chemistry)
D3.3b Pilot Two: Inter-disciplinary (Earth Science)
D4.1 Cost Analysis Phase 2
D4.2 Benefits report and business model
D5.1 Advocacy and Training materials
D5.2 Two Workshops (disciplinary; RDMF)
D6.3 Evaluation report
D6.4 Sustainability recommendations
I2S2: Measuring success & Future
• Implementation of key aspects of the project to bridge the chasm between large scale facilities and the lab bench
– Integrated Information Model based on the Core Scientific Metadata Schema and the DCC Digital Curation Lifecycle Model
– Development of use cases and inter-disciplinary Pilots
– Before and after cost-benefit analysis
– Advocacy and training materials
In 18 months’ time …
• Realised the potential to streamline the working processes of structural scientists across domains and organisations