Posted on 28-Mar-2015

Towards an information model for I2S2

Brian Matthews, Leader, Scientific Applications Group, E-Science Centre, STFC Rutherford Appleton Laboratory

brian.matthews@stfc.ac.uk

Facilities Process

• Proposal: Scientist submits application for beamtime
• Approval: Facility committee approves application
• Scheduling: Facility registers, trains, and schedules scientist’s visit
• Experiment: Scientist visits, facility runs experiment
• Data storage: Raw data filtered, cleansed and stored
• Data analysis: Tools for processing made available
• Record Publication: Subsequent publication registered with facility
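As a rough illustration, the stages above can be modelled as an ordered sequence that each proposal moves through. This is a minimal sketch, assuming a strictly linear process with Data analysis placed between Data storage and Record Publication; that ordering and the names `Stage` and `next_stage` are hypothetical, not taken from any facility system.

```python
from enum import Enum
from typing import List, Optional


class Stage(Enum):
    """Stages of the facilities process; the linear ordering is an assumption."""
    PROPOSAL = 1
    APPROVAL = 2
    SCHEDULING = 3
    EXPERIMENT = 4
    DATA_STORAGE = 5
    DATA_ANALYSIS = 6
    RECORD_PUBLICATION = 7


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage after `stage`, or None once the publication is recorded."""
    stages = list(Stage)  # Enum iteration follows definition order
    position = stages.index(stage)
    return stages[position + 1] if position + 1 < len(stages) else None


# Walk one (hypothetical) proposal through every stage.
history: List[str] = []
current: Optional[Stage] = Stage.PROPOSAL
while current is not None:
    history.append(current.name)
    current = next_stage(current)
```

Real facility workflows branch (rejected proposals, repeat visits), so a state machine with explicit transitions would be the natural next step beyond this linear sketch.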

Characteristics:
- formal application
- set processes
- central infrastructure
- standard tools
- hierarchical control
- dedicated staff

• User office
• Instrument scientists
• Library and IT support

Requirements

• Secure access to user’s data
• Flexible data searching
• Scalable architecture
• Extensible architecture
• Integration with analysis tools
• Access to high-performance resources
• Linking to other scientific outputs
• Data policy aware
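Secure, policy-aware access can be pictured as a check against per-element grants. The sketch below assumes grants shaped like the Authorisation element of the CSMD domain model (User Id, Role, Element Type, Element Id); the `ROLE_ACTIONS` table mapping roles to permitted actions is hypothetical, since the actual mapping is a matter of facility data policy.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Authorisation:
    """One grant: a user holds a role on a single catalogue element."""
    user_id: str
    role: str          # e.g. "Reader", "Downloader", "Admin"
    element_type: str  # e.g. "Investigation", "Dataset", "Datafile"
    element_id: str


# Hypothetical role-to-action table; the real mapping is facility policy.
ROLE_ACTIONS = {
    "Reader": {"read"},
    "Downloader": {"read", "download"},
    "Updater": {"read", "update"},
    "Creator": {"read", "create"},
    "Deleter": {"read", "delete"},
    "Admin": {"read", "download", "update", "create", "delete"},
}


def is_allowed(grants: Iterable[Authorisation], user_id: str, action: str,
               element_type: str, element_id: str) -> bool:
    """True if any grant for this user and element has a role permitting `action`."""
    return any(
        g.user_id == user_id
        and g.element_type == element_type
        and g.element_id == element_id
        and action in ROLE_ACTIONS.get(g.role, set())
        for g in grants
    )


# Example: a user who may read, but not download, one dataset.
GRANTS = [Authorisation("user-1", "Reader", "Dataset", "ds-42")]
```

Keeping grants per element, rather than per user, is one simple way to let access rules follow the data as it is shared.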

Principles

• Online Proposal System
• User Office System:
  – User Database
  – Scheduling
  – Health and Safety
  – Proposal Management
• Metadata Catalogue
• Data Acquisition System
• Storage Management System
• Data Access Portal
• Single Sign On Account Creation and Management

ICAT Software Suite, providing the crucial integration of key functions.

The ICAT software suite

• Catalogues all experiment-related information
• Metadata gathered via integration with existing IT systems
  – proposal systems
  – data acquisition
• Provides a well-defined API for easy embedding into any application

• Access data anywhere via the web
• Annotate and search for data
• Share data with colleagues
• Access data via user’s own programs
• Utilise integrated e-Science resources
• Link to data from your publications

Component architecture

• RDBMS
• Web Services API
• ICAT API
• Command Line Tools
• Glassfish / JBoss
• Java, C++, Fortran
• Data Storage / Delivery System
• Single Sign On
• User Database System
• Proposal System
• Publication System
• e-Science Services
• Software Repository

ICAT Deployment

Data Portal

TopCat

Towards an Information Model

Methodology

The Singapore Framework for Dublin Core Application Profiles. Mikael Nilsson, Tom Baker, Pete Johnston. http://dublincore.org/documents/singapore-framework/

Functional requirements

A Metadata Model for Facilities Science

A common, general format or standard for metadata on scientific studies and data holdings did not exist.

The proposed model aims to provide:
– a specification for the types of metadata needed to capture scientific studies
– cataloguing of data holdings, providing access for the data owner
– easier citation, sharing, collaboration and integration
– easy federation of distributed, heterogeneous metadata systems into a homogeneous (virtual) platform

Therefore the Core Scientific Metadata Model (CSMD) was developed.

A Domain Model

Modelling Scientific Activity

• Investigation: Reference / Proposal Id, Previous Reference, Facility, Instrument, Title, Abstract, etc.
• Investigator: User Id, Role
• Publication: Full Reference, URL, Repository
• Keyword: Name
• Topic: Name, Parent Id, Topic Level
• Sample: Name, Chemical Formula, Safety Information
• Dataset: Name, Sample Id, Description
• Datafile: Name, Description, Version, Location, Format, Format Version, Create Time, Modify Time, Size, Checksum
• Sample Parameter, Dataset Parameter, Datafile Parameter: Name, Units, String Value, Numeric Value, Range Top, Range Bottom, Error
• Parameter: Name/Units/Value etc., Searchable, Is Sample Parameter, Is Dataset Parameter, Is Datafile Parameter, Verified
• Related Datafile: Source Datafile Id, Destination Datafile Id, Relation, S/W Application, S/W Version
• Authorisation: User Id, Role (e.g. Admin, Deleter, Updater, Reader, Creator, Downloader, etc.), Element Type, Element Id

Damian Flannery, Core Scientific Metadata Model
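The core Investigation → Dataset → Datafile hierarchy, with parameters attached, can be sketched as plain data classes. This is a simplified sketch, not the ICAT schema: only a subset of the attributes listed above is included, the Python field names are hypothetical renderings of them, and the example record is invented. The search function illustrates the kind of parameter-based query the Searchable flag suggests.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Parameter:
    """Sample/Dataset/Datafile Parameter: Name, Units, String or Numeric Value, Error."""
    name: str
    units: str
    numeric_value: Optional[float] = None
    string_value: Optional[str] = None
    error: Optional[float] = None


@dataclass
class Datafile:
    name: str
    location: str
    file_format: str = ""
    size: int = 0
    checksum: str = ""
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    description: str = ""
    datafiles: List[Datafile] = field(default_factory=list)


@dataclass
class Investigation:
    reference: str  # "Reference / Proposal Id" in the model
    title: str
    facility: str
    instrument: str
    keywords: List[str] = field(default_factory=list)
    datasets: List[Dataset] = field(default_factory=list)


def find_datafiles(inv: Investigation, name: str, low: float, high: float) -> List[Datafile]:
    """Datafiles with a numeric parameter called `name` whose value lies in [low, high]."""
    hits = []
    for dataset in inv.datasets:
        for datafile in dataset.datafiles:
            if any(p.name == name and p.numeric_value is not None
                   and low <= p.numeric_value <= high
                   for p in datafile.parameters):
                hits.append(datafile)
    return hits


# Hypothetical example record with two datafiles measured at different temperatures.
inv = Investigation(
    reference="RB-0001", title="Example study", facility="ISIS",
    instrument="example-instrument", keywords=["crystallography"],
    datasets=[Dataset(name="run-1", datafiles=[
        Datafile(name="a.nxs", location="/archive/a.nxs",
                 parameters=[Parameter(name="temperature", units="K", numeric_value=20.0)]),
        Datafile(name="b.nxs", location="/archive/b.nxs",
                 parameters=[Parameter(name="temperature", units="K", numeric_value=300.0)]),
    ])],
)
cold = [df.name for df in find_datafiles(inv, "temperature", 0.0, 50.0)]
```

In a catalogue backed by an RDBMS the same query would be a join over the parameter table; the nested loops here only show the shape of the model.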

Description set profile

Metadata Granule

• Topic: Keywords providing an index on what the study is about.
• Study Description: Provenance about what the study is, who did it and when.
• Access Conditions: Conditions of use, providing information on who can access the data and how.
• Data Location: Locations providing a navigational aid to where the data on the study can be found.
• Data Description: Detailed description of the organisation of the data into datasets and files.
• Related Material: References into the literature and community providing context about the study.
• Legal Note: Copyright, patents and conditions of use, etc., relating to the study and the data in the study.
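A description set profile can be used to check whether a study record is complete. The sketch below takes the granule names from the list above and assumes, purely for illustration, that every granule is required; the slide does not say which granules are mandatory, and `missing_granules` is a hypothetical helper.

```python
from typing import Dict, List

# Granule names from the CSMD description set profile.
GRANULES = [
    "Topic",
    "Study Description",
    "Access Conditions",
    "Data Location",
    "Data Description",
    "Related Material",
    "Legal Note",
]


def missing_granules(record: Dict[str, object]) -> List[str]:
    """Return granules that are absent or empty in `record`, in profile order."""
    return [g for g in GRANULES if not record.get(g)]


# Hypothetical partial record: only two granules filled in so far.
draft = {"Topic": ["crystallography"], "Study Description": "Powder diffraction study"}
```

A real profile would distinguish mandatory from optional granules and constrain their values; this only shows the completeness check.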

ICAT 3.3 Schema – Study (2)

Syntax and metadata formats

ICAT API and XML format

ICAT 3.3 Database Schema
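For exchange, a study record can be serialised to XML. The sketch below uses Python's standard `xml.etree.ElementTree`; the element and attribute names (`study`, `topic`, `keyword`, `dataDescription`, `dataset`, `datafile`) are hypothetical and are not the actual ICAT XML format, which is not reproduced here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def study_to_xml(reference: str, title: str, keywords: List[str],
                 datasets: Dict[str, List[str]]) -> str:
    """Serialise a minimal study; `datasets` maps dataset name -> list of file names."""
    study = ET.Element("study", attrib={"reference": reference})
    ET.SubElement(study, "title").text = title
    topic = ET.SubElement(study, "topic")
    for kw in keywords:
        ET.SubElement(topic, "keyword").text = kw
    description = ET.SubElement(study, "dataDescription")
    for name, files in datasets.items():
        dataset = ET.SubElement(description, "dataset", attrib={"name": name})
        for f in files:
            ET.SubElement(dataset, "datafile", attrib={"name": f})
    return ET.tostring(study, encoding="unicode")


# Hypothetical example study.
xml_text = study_to_xml("RB-0001", "Example study", ["crystallography"],
                        {"run-1": ["a.nxs", "b.nxs"]})
root = ET.fromstring(xml_text)  # round-trip to confirm the output is well formed
```

Mapping each metadata granule to its own top-level element keeps the XML aligned with the description set profile.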

CSMD History

• Model first pilot developed in 2001!
• Now in ICAT 3.3
• Serving data from STFC Facilities (ISIS, DLS)
• Model proven robust: simple yet expressive
  – http://code.google.com/p/icatproject/

I2S2 - Infrastructure for Integration in Structural Sciences

Bridging the gap between raw and derived data

“Lone” researcher scenario
• Data sharing with colleagues via email
• Little or no infrastructure
• Little management of raw or derived data

EPSRC National Crystallography Service
• Service provision function
• Operates across institutions
• Moderate infrastructure

Diamond & ISIS
• Operates on behalf of multiple institutions
• Processes for experiments
• Large infrastructure engineered to manage raw data
• Derived data taken off site on laptops / removable drives

Interactions between research processes

• Grant Proposal
• Facilities Proposal
• Facilities Experiment
• Data cleansing
• Record Publication
• Data analysis
• Local experiments
• Simulation
• Sample Preparation
• Literature Review
• Publication

• Proposal
• Approval
• Scheduling
• Facilities Experiment
• Data storage
• Record Publication
• Analysis Tools

CSMD

Cover the scientist’s research lifecycle as well as the facilities.

Extend to:
• laboratory-based science
• secondary analysis data
• preservation information
• publication data
• domain-specific vocabularies

By being:
- standardised
- modular
- extensible

Methodology

The Singapore Framework for Dublin Core Application Profiles. Mikael Nilsson, Tom Baker, Pete Johnston. http://dublincore.org/documents/singapore-framework/

Issues

• Metadata model
• Framework for developing metadata model
• Modularisation mechanisms and extensions
• Formats

• Model supporting laboratory tools
  – How does the model fit?
  – Flexibility to handle local processes
    • ad hoc, partial, unordered
  – What needs changing in the model?
  – What needs changing in tools?

• Data input and maintenance?
• Simple ways of inputting the data
• Lab books?

Extension areas:

• Secondary analysis data
• Preservation data
• Publication data
• Topic data
  – chemistry
• Controlled lists (ontologies) for
  – Instruments
  – Facilities
  – Methods
• Access control
• Safety data
• Blogs and notebooks

ISIS - ICAT

Part of ISIS study

Gudrun

Control file, Correction data, Sample data, Calibration data

Scattering function data

User inputs

Derived Data

Generalised model

Managing the links between data

Inputs of data sets

Associated with a software item with a set of parameters
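The generalised model, in which derived data is linked to its input data sets through a software item and a set of parameters, can be sketched as a small provenance graph, much like the Related Datafile element (Source, Destination, Relation, S/W Application, S/W Version). The Gudrun step below follows the ISIS example (control file, correction, sample and calibration data in; scattering function data out); the version string, parameters and the second "fitting-tool" step are hypothetical. The sketch also assumes each output is produced by at most one derivation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Derivation:
    """Links an output data set to its inputs via a software item and its parameters."""
    output: str
    inputs: List[str]
    software: str
    software_version: str
    parameters: Dict[str, str] = field(default_factory=dict)


def upstream(target: str, derivations: List[Derivation]) -> Set[str]:
    """Return every data set that `target` was directly or indirectly derived from."""
    by_output = {d.output: d for d in derivations}  # assumes one derivation per output
    seen: Set[str] = set()
    stack = [target]
    while stack:
        d = by_output.get(stack.pop())
        if d is None:
            continue  # raw data: no recorded derivation
        for src in d.inputs:
            if src not in seen:  # also guards against cycles
                seen.add(src)
                stack.append(src)
    return seen


DERIVATIONS = [
    Derivation(
        output="scattering_function_data",
        inputs=["control_file", "correction_data", "sample_data", "calibration_data"],
        software="Gudrun",
        software_version="unknown",  # hypothetical: not given on the slide
        parameters={"user_inputs": "recorded in control file"},  # hypothetical
    ),
    Derivation(
        output="fitted_model",  # hypothetical further analysis step
        inputs=["scattering_function_data"],
        software="fitting-tool",  # hypothetical software item
        software_version="1.0",
    ),
]
```

Walking the graph backwards from a derived result recovers the raw data it depends on, which is the link that is lost when derived data leaves the facility on a laptop.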

Managing this?
- lab-books?
- simple tools?
- VRE?