![Page 1: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/1.jpg)
Publication of facility investigations
Brian Matthews
Scientific Information Group
Scientific Computing Department
STFC Rutherford Appleton Laboratory
![Page 2: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/2.jpg)
Scientific Computing develops and operates computing infrastructure: HPC, PB datastore, software, data management…
Funds and operates large-scale science for the UK research base: physics, astronomy, chemistry, materials
ESO: Alma Array
STFC
![Page 3: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/3.jpg)
Major Science Facilities
Big Science
• Particle Physics: exploring the very small
• Space Science: exploring the very large
Small Science
• Understanding the world around us at a molecular level
• Lasers, Neutron & Light Source: ISIS & Diamond
![Page 4: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/4.jpg)
Facilities Support
Big Facilities for Small Science
Diamond
ISIS
CLF
![Page 5: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/5.jpg)
Science at STFC Facilities
[Diagram labels: beam, sample, detector, imaging; data; computing, analysis, modelling; knowledge]
Neutrons and photons provide complementary views of matter:
Photons “see” electric charge – high atomic number nuclei
Neutrons “see” nucleons – especially hydrogen atoms
![Page 6: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/6.jpg)
The science we do - Structure of materials
Fitting experimental data to model
Bioactive glass for bone growth
Structure of cholesterol in crude oil
Hydrogen storage for zero emission vehicles
Magnetic moments in electronic storage
• ~30,000 user visitors each year in Europe:
  – physics, chemistry, biology, medicine
  – energy, environmental, materials, culture
  – pharmaceuticals, petrochemicals, microelectronics
Longitudinal strain in aircraft wing
Diffraction pattern from sample
Visit facility on research campus
Place sample in beam
• Billions of € of investment
  – c. £400M for DLS
  – + running costs
• Over 5,000 high-impact publications per year in Europe
  – But so far no integrated data repositories
  – Lacking sustainability & traceability
![Page 7: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/7.jpg)
• Similar architecture used for DLS
• Scaling is a constant concern
• Data rates keep increasing
• 70TB per month and rising
• Tailored ICAT
• Reengineered StorageD
Diamond e-Infrastructure
[Architecture diagram. Components shown: DUO Desk applications (DLS Proposal Entry, http://duo.diamond.ac.uk/propman), duodesk, ICAT, IDMAN, CDR, SSO-MyProxy, vintela, GDA (valid user check), XML Ingest, StorageD, SRB scripts (b1-storage1), SRB containers (transfer data to tape), Atlas Data Store, Data Portal/ICAT API, Active Directory, KDC, CA, DArc, lustre (local beamline lustre client), EDNA MX/DNA drop file (MX: strategy for data collection), ISPyB (picture location), and iKitten databases for beamlines I12, I03, I02, B22, B18, B16, I22, I20, I19, I18, I16, I15, I11, I07, I06, I04, I24, B23.]
[Authentication and access labels: Federal ID, FedID/Password, Check FedID/Password, Kerberos Authentication, Kerberos Token, Certificate; users reach data through SQL and Scommands. A UNIX group is created for each visit and its users, with a file sent to the Linux administrator.]
Scheduled jobs shown in the diagram:
• JOB: icat_dls_propagation ON: orisa.icatdls; FREQUENCY: 1 hour; DB LINK: duodesk.dl.ac.uk; ACTION: Pull data from DuoDesk to ICAT
• JOB: icat_dls_propogation ON: orisa.icatdls; FREQUENCY: 30 mins; ACTION: Load lookup data into ICAT (external lookup data: /home/oracle/external_tables/dls33)
• JOB: SSO - SYNCRONISATION PROD ON: orisa.sso; FREQUENCY: daily at 08:45; DB LINK: cdr.esc.rl.ac.uk; ACTION: Pull data from CDR to IDMAN
• JOB: cron script ON: sso-myproxy; FREQUENCY: daily at 09:18; ACTION: Pull data from IDMAN to gridmap file (mapping FedID to DN)
• JOB: icatdls33_propagation ON: orisa.icatdls33; FREQUENCY: 30 mins; ACTION: Push data to iKittens
![Page 8: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/8.jpg)
Proposals
Once awarded beamtime at ISIS, an entry will be created in ICAT that describes your proposed experiment.
Experiment
Data collected from your experiment will be indexed by ICAT (with additional experimental conditions) and made available to your experimental team.
Analysed Data
You will have the capability to upload any desired analysed data and associate it with your experiments.
Publication
Using ICAT you will also be able to associate publications with your experiment and even reference data from your publications.
Example ISIS Proposal: B-lactoglobulin protein interfacial structure
GEM: High intensity, high resolution neutron diffractometer
H2-(zeolite) vibrational frequencies vs polarising potential of cations
Central Facility
• Secure access to user’s data
• Flexible data searching
• Scalable and extensible architecture
• Integration with analysis tools
• Access to high-performance resources
• Linking to other scientific outputs
• Data policy aware
http://code.google.com/p/icatproject/
![Page 9: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/9.jpg)
[Diagram of CSMD entities: Investigation, Publication, Keyword, Topic, Sample, Sample Parameter, Dataset, Dataset Parameter, Datafile, Datafile Parameter, Investigator, Related Datafile, Parameter, Authorisation]
Core Scientific Metadata Model (CSMD)
The Core Metadata model forms the information model for ICAT.
Designed to describe facilities-based experiments in Structural Science.
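As a rough illustration of the hierarchy named on this slide, the following Python sketch models a few CSMD entities as dataclasses. The field names (`name`, `value`, `units`, `location`, `title` and so on) and the example values are assumptions made for illustration; they are not taken from the CSMD specification or the ICAT schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Parameter:
    """A named measurement or setting (hypothetical fields)."""
    name: str
    value: str
    units: str = ""


@dataclass
class Datafile:
    """A single data file, optionally described by parameters."""
    name: str
    location: str
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Dataset:
    """A collection of data files that is part of an investigation."""
    name: str
    datafiles: List[Datafile] = field(default_factory=list)
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Sample:
    """The sample placed in the beam."""
    name: str
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Investigation:
    """Top-level entity that ties the other entities together."""
    title: str
    investigators: List[str] = field(default_factory=list)
    samples: List[Sample] = field(default_factory=list)
    datasets: List[Dataset] = field(default_factory=list)
    publications: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)

    def datafile_count(self) -> int:
        """Count the data files across all datasets."""
        return sum(len(ds.datafiles) for ds in self.datasets)


def example_investigation() -> Investigation:
    """Build a small, entirely made-up investigation."""
    run1 = Datafile("run1.nxs", "/data/run1.nxs",
                    [Parameter("temperature", "290", "K")])
    run2 = Datafile("run2.nxs", "/data/run2.nxs")
    return Investigation(
        title="Example investigation",
        investigators=["A. Researcher"],
        samples=[Sample("sample-1")],
        datasets=[Dataset("raw", datafiles=[run1, run2])],
    )
```

Keeping parameters as separate objects attached to samples, datasets and data files mirrors the way the slide lists a parameter entity for each of them.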
![Page 10: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/10.jpg)
TopCat
![Page 11: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/11.jpg)
DOIs for Data Publication
![Page 12: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/12.jpg)
Is this enough?
• What we have so far is good for:
  – us to manage data
  – users to access their own data
  – citation of raw data
• But
  – Traceability and Validation?
  – Reuse of the data?
• Need to make context more explicit
  – The dataset on its own is the wrong subject of discourse
![Page 13: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/13.jpg)
Support the wider Facilities Lifecycle
• Proposal: scientist submits application for beamtime
• Approval: facility committee approves application
• Scheduling: facility registers, trains, and schedules scientist’s visit
• Experiment: scientist visits, facility runs experiment
• Data storage: raw data filtered and stored
• Data analysis: tools for processing made available
• Record Publication: subsequent publication registered with facility
As in PanData-ODI – D6.1 (which has much more detail)
![Page 14: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/14.jpg)
Publishing Investigations
• So what we want is a record of EXPERIMENTS, not data.
• Thus we want the record of the context
  – The experimental intention and actors
  – The instruments and configurations used
  – The sample
  – The environmental parameters and context
  – The Raw Data
• Thus we want to publish a record of the whole INVESTIGATION
  – Can get most of this from what we have
• The Investigation becomes a “first class” research object
  – Published
  – Identified and treated as a single entity
  – Cited and credited
  – Record of the output of the facility
• Analogous to a Journal Article
  – Investigation as the unit of discourse for scientific facilities.
• But also as an access point for validation and reuse
  – Because we have a record of what actually happened.
![Page 15: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/15.jpg)
Our DataCite entries are in fact Investigations (red is for “data” notion, and green is for “investigation”)
![Page 16: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/16.jpg)
“DataCite abuse”
As we have seen, we use DataCite for Investigations, with Datasets only referred from them.
Other data curators sometimes use DataCite for Publications (“documents”) that contain data: http://data.datacite.org/10.7480/OA
So “data” DOIs tend to resolve either into Investigations or Publications
• Extend the Resource Type
• Also may not want to have a landing page for all DOIs
![Page 17: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/17.jpg)
Research Objects
• Represent the “investigation” as a Research Object
  – Research Objects (ROs) are semantically rich aggregations of resources that bring together data, methods and people in scientific investigations. Their goal is to create a class of artifacts that can encapsulate our digital knowledge and provide a mechanism for sharing and discovering assets of reusable research and scientific knowledge
• www.researchobject.org and elsewhere (WorkFlow4Ever)
• Represent Investigation as a Research Object
  – Build a graph structure for the links in the research object.
  – Using an RDF representation, URIs
  – Publish as a linked data object
Bechhofer et al. Why Linked Data is Not Enough for Scientists. Proceedings of the 10th IEEE e-Science Conference, Brisbane, Australia (2010). http://eprints.ecs.soton.ac.uk/21587/5/research-objects-final.pdf
Arif Shaon, Sarah Callaghan, Bryan Lawrence, Brian Matthews. Opening up Climate Research: a linked data approach to publishing data provenance. 7th Int Digital Curation Conference (2011).
![Page 18: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/18.jpg)
RDF representation of CSMD model

    <!-- csmd:Investigation -->
    <owl:Class rdf:about="csmd:Investigation">
        <rdfs:label>Investigation</rdfs:label>
        <rdfs:comment>An investigation or experiment</rdfs:comment>
    </owl:Class>

    <!-- csmd:Facility -->
    <owl:Class rdf:about="csmd:Facility">
        <rdfs:label>Facility</rdfs:label>
        <rdfs:comment>An experimental facility</rdfs:comment>
    </owl:Class>

    <!-- csmd:Dataset -->
    <owl:Class rdf:about="csmd:Dataset">
        <rdfs:label>Dataset</rdfs:label>
        <rdfs:comment>A collection of data files and part of an investigation</rdfs:comment>
    </owl:Class>

    <!-- csmd:Datafile -->
    <owl:Class rdf:about="csmd:Datafile">
        <rdfs:label>Datafile</rdfs:label>
        <rdfs:comment>A data file</rdfs:comment>
    </owl:Class>
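To show how such class declarations can be read programmatically, here is a minimal sketch using only the Python standard library. It wraps two of the declarations in an `rdf:RDF` root with the standard RDF, RDFS and OWL namespace URIs (the wrapper is an assumption, since the slide shows only the fragment) and lists each class with its label and comment.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"
OWL_NS = "http://www.w3.org/2002/07/owl#"

# Two class declarations from the slide, wrapped in an assumed rdf:RDF root.
OWL_FRAGMENT = f"""<rdf:RDF xmlns:rdf="{RDF_NS}"
         xmlns:rdfs="{RDFS_NS}"
         xmlns:owl="{OWL_NS}">
  <owl:Class rdf:about="csmd:Investigation">
    <rdfs:label>Investigation</rdfs:label>
    <rdfs:comment>An investigation or experiment</rdfs:comment>
  </owl:Class>
  <owl:Class rdf:about="csmd:Dataset">
    <rdfs:label>Dataset</rdfs:label>
    <rdfs:comment>A collection of data files and part of an investigation</rdfs:comment>
  </owl:Class>
</rdf:RDF>"""


def list_classes(rdf_xml: str) -> Dict[str, Tuple[str, str]]:
    """Map each owl:Class identifier to its (label, comment) pair."""
    root = ET.fromstring(rdf_xml)
    classes = {}
    for cls in root.findall(f"{{{OWL_NS}}}Class"):
        about = cls.get(f"{{{RDF_NS}}}about")
        label = cls.findtext(f"{{{RDFS_NS}}}label")
        comment = cls.findtext(f"{{{RDFS_NS}}}comment")
        classes[about] = (label, comment)
    return classes


if __name__ == "__main__":
    for about, (label, comment) in list_classes(OWL_FRAGMENT).items():
        print(f"{about}: {label} ({comment})")
```

A dedicated RDF library would handle the full RDF/XML syntax; the element-tree approach here only covers the simple pattern shown on the slide.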
![Page 19: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/19.jpg)
After proposal: Initialise the Research Object
[Diagram: Investigation #n (DOI:STFC.xxx.n) with links to :instrument and :investigator]

    :n a csmd:Investigation ;
        csmd:investigation_doi doi:stfc.xxx.n ;
        csmd:investigation_investigationUser :iu1 ;
        csmd:investigation_instrument :inst1 .

    :iu1 a csmd:investigationUser ;
        csmd:investigationUser_user :u1 .

    :u1 a csmd:User .

    :inst1 a csmd:Instrument .
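The initialisation step can be sketched as a small function that emits the same statements for any investigation. This is a minimal sketch in plain Python with triples held as string tuples; the function names and the one-statement-per-line output are assumptions for illustration, and the `doi:stfc.xxx.` pattern is copied from the slide with its placeholder left as-is.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def init_research_object(inv: str, inv_user: str, user: str,
                         instrument: str) -> List[Triple]:
    """Return the triples that seed a Research Object after a proposal.

    All arguments are local names, e.g. "n", "iu1", "u1", "inst1".
    """
    return [
        (f":{inv}", "a", "csmd:Investigation"),
        (f":{inv}", "csmd:investigation_doi", f"doi:stfc.xxx.{inv}"),
        (f":{inv}", "csmd:investigation_investigationUser", f":{inv_user}"),
        (f":{inv}", "csmd:investigation_instrument", f":{instrument}"),
        (f":{inv_user}", "a", "csmd:investigationUser"),
        (f":{inv_user}", "csmd:investigationUser_user", f":{user}"),
        (f":{user}", "a", "csmd:User"),
        (f":{instrument}", "a", "csmd:Instrument"),
    ]


def to_turtle(triples: List[Triple]) -> str:
    """Serialise triples one statement per line (prefixes not declared)."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_turtle(init_research_object("n", "iu1", "u1", "inst1")))
```

Later lifecycle stages (datasets, publications, software) can then append further triples to the same list, so the Investigation acts as the seed of the graph.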
![Page 20: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/20.jpg)
After the experiment: Experimental Data Metadata
[Diagram: Investigation #n (DOI:STFC.xxx.n) with links to :instrument, :investigator, :sample and :dataset; the dataset is held in Data Storage]
• Own metadata format (CSMD)
• More or less what ICAT currently supports
• Adds extra details on parameters, datasets, formats etc.
![Page 21: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/21.jpg)
Linking Publication into Investigation
[Diagram: Investigation #n (DOI:STFC.xxx.n) with links to :instrument, :sample, :investigator, :dataset (Raw Data Repository) and two :publication nodes (Publication Repository, Publication Store), with cito:cites links on the publications]
![Page 22: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/22.jpg)
Linking the derived data into the Investigation
[Diagram: Investigation #n (DOI:STFC.xxx.n) with links to :instrument, :sample, :investigator, :dataset (Raw Data Repository), :relatedDataset (Derived Data Repository) and two :publication nodes (Publication Repository)]
• Note that derived data could be on a different site
![Page 23: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/23.jpg)
Linking the software into the Investigation
[Diagram: Investigation #n (DOI:STFC.xxx.n) with links to :instrument, :sample, :investigator, :dataset, :relatedDataset and two :publication nodes (cito:cites), plus an :application node for Software Package 1 in a Software Repository, connected through :inputDataset and :outputDataset]
• W3C Prov ontology
• Assume that the software is in a repository
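One possible mapping of these links onto the W3C PROV ontology and CiTO is sketched below in plain Python. The choice to model the software as a `prov:SoftwareAgent` and each analysis run as a `prov:Activity` is an assumption; the slide names the ontology but not the exact terms used. All resource names are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def link_analysis(run: str, software: str, input_dataset: str,
                  output_dataset: str, publication: str) -> List[Triple]:
    """Describe one analysis run that turns raw data into derived data."""
    return [
        (run, "a", "prov:Activity"),
        (software, "a", "prov:SoftwareAgent"),
        (run, "prov:wasAssociatedWith", software),
        (run, "prov:used", input_dataset),
        (output_dataset, "prov:wasGeneratedBy", run),
        (output_dataset, "prov:wasDerivedFrom", input_dataset),
        (publication, "cito:cites", output_dataset),
    ]


def inputs_of(triples: List[Triple], output_dataset: str) -> List[str]:
    """Trace an output back to the datasets its generating runs used."""
    runs = [o for s, p, o in triples
            if s == output_dataset and p == "prov:wasGeneratedBy"]
    return [o for s, p, o in triples
            if s in runs and p == "prov:used"]


# Hypothetical local names for illustration only.
graph = link_analysis(":run1", ":softwarePackage1", ":dataset",
                      ":relatedDataset", ":publication1")
```

Recording the run as its own node is what lets a reader walk from a cited derived dataset back to the raw data and the software that produced it.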
![Page 24: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/24.jpg)
Generate Landing page from RO
![Page 25: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/25.jpg)
Setting the Boundary: It depends on your Point of View
Investigations
Extended Publication
E-Portfolio
![Page 26: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/26.jpg)
Setting a boundary : OAI-ORE
![Page 27: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/27.jpg)
Preserving Investigations
• Now becomes preserving the research object.
  – Preserving a linked data graph
  – Persistence of identifiers
  – Managing integrity of external artefacts.
  – Link checking
  – Copying and mirroring: checking consistency
• Representation Information to give more context on the objects
  – And on the aggregate as a whole
• PDI (Provenance, Integrity etc.) on the whole aggregate object
  – As well as components
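The consistency check for copied or mirrored artefacts can be illustrated with a checksum manifest. The sketch below uses in-memory byte strings in place of real files, and the URIs are placeholders; a real deployment would fetch each artefact and would also need a policy for artefacts that legitimately change.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(artefacts: Dict[str, bytes]) -> Dict[str, str]:
    """Record a checksum for every artefact the research object links to."""
    return {uri: sha256_hex(content) for uri, content in artefacts.items()}


def check_mirror(manifest: Dict[str, str],
                 mirror: Dict[str, bytes]) -> List[str]:
    """Compare a mirror against the manifest and list any problems."""
    problems = []
    for uri, expected in manifest.items():
        if uri not in mirror:
            problems.append(f"missing: {uri}")       # broken link
        elif sha256_hex(mirror[uri]) != expected:
            problems.append(f"changed: {uri}")       # integrity failure
    return problems


# Placeholder URIs and contents for illustration only.
original = {
    "https://example.org/raw/run1.nxs": b"raw data",
    "https://example.org/pubs/paper1.pdf": b"paper",
}
manifest = build_manifest(original)
mirror = {"https://example.org/raw/run1.nxs": b"raw data (corrupted)"}
report = check_mirror(manifest, mirror)
```

Storing the manifest as part of the aggregate gives the Integrity element of the PDI for the whole object as well as for each component.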
![Page 28: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/28.jpg)
Adding Preservation Information – Rep Info for various items
[Diagram: Investigation #n (DOI:STFC.xxx.n) with links to :instrument, :sample, :investigator, :dataset, :relatedDataset, :application and two :publication nodes, annotated with Representation Information: instrument description (website); raw data format description (e.g. NeXus); parameter description (e.g. NXDL, Con Vocab); software classification; software description; sample description; analysed data format description; publication format description]
• Would probably be more
• Work into a RepInfo Repository
• Would also have a RepInfo Network
![Page 29: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/29.jpg)
Adding Preservation Information – Rep Info for the whole aggregate
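[Diagram: Investigation #n (DOI:STFC.xxx.n) with links to :instrument, :sample, :investigator, :dataset, :relatedDataset, :application and two :publication nodes; Representation Information attached to the aggregate as a whole: software classification, CSMD Vocabulary description]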
![Page 30: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/30.jpg)
Summary
• Investigation appropriate unit of discourse for facilities science
  – Publishable, Citable, Reportable
  – Can be used as a vehicle for validation and reuse
• Basic principles of building research objects for facilities science
  – Follow research lifecycle
  – Consider Investigation a RO “seed”
  – Apply Linked Data principles
  – Re-use existing vocabularies and ontologies
  – Share ROs via recognizable data formats and APIs
• Applicable beyond Facilities
  – Other analogous objects: “experiments”, “observations”, “studies”
• The subject of preservation
  – How do we maintain the integrity of Investigation objects?
![Page 31: Publication of facility investigations Brian Matthews Scientific Information Group Scientific Computing Department STFC Rutherford Appleton Laboratory.](https://reader036.fdocuments.us/reader036/viewer/2022081501/56649e005503460f94ae99fa/html5/thumbnails/31.jpg)
Thank You
Questions?
www.e-science.stfc.ac.uk