EBankII Workshop 1 Making Scientific Data Openly Available Simon Coles School of Chemistry,...
eBankII Workshop
Making Scientific Data Openly Available
Simon Coles
School of Chemistry,
University of Southampton
Scientific Data Overload!
[Figure: chemical structure diagrams (atom labels Cl, O, N)]
CombeChem: eScience testbed
[Diagram components: X-Ray e-Lab; Properties e-Lab; Analysis; Properties; Simulation; Video; Diffractometer; Structures Database; Grid Middleware]
Chemistry Publications
Ideas and interpretations
Hooks into the literature
Results & derived data
Raw data!
Establishing common ground
• Understand the data creation process
• Terminology and definitions
  – Data
  – Metadata
  – Datafile
  – Dataset
  – Data holding
• Different views
  – Digital library researchers, computer scientists, chemists
  – Generic vs specific
  – Modeller vs practitioner
• Aim for a common ontology
• Modelling the domain
• Creating a metadata schema
Crystallography workflow
RAW DATA → DERIVED DATA → RESULTS DATA
Crystallography datasets
Within a data holding are the following datasets:
• Initialisation: mount new sample on diffractometer & set up data collection
• Collection: collect data
• Processing: process and correct images
• Solution: solve structure
• Refinement: refine structure
• CIF: produce CIF (Crystallographic Information File format)
• Report: generate Crystal Structure Report
• Validation: generate report from structure checks
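As an illustration only, the relationship between a data holding and its datasets could be modelled as below. This is a minimal sketch, not the eBank data model: the class names, the `identifier` field, and the file names are hypothetical, and only the eight stage names come from the slide.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Stage names as listed on the slide, in workflow order.
STAGES = [
    "Initialisation", "Collection", "Processing", "Solution",
    "Refinement", "CIF", "Report", "Validation",
]


@dataclass
class Dataset:
    """The files produced by one stage of the workflow (hypothetical model)."""
    stage: str
    files: list[str] = field(default_factory=list)


@dataclass
class DataHolding:
    """One experiment's collection of datasets (hypothetical model)."""
    identifier: str  # hypothetical local identifier
    datasets: dict[str, Dataset] = field(default_factory=dict)

    def add(self, stage: str, *files: str) -> None:
        """Attach files to the dataset for a known stage."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.datasets.setdefault(stage, Dataset(stage)).files.extend(files)

    def missing_stages(self) -> list[str]:
        """Return the stages that have no dataset yet, in workflow order."""
        return [s for s in STAGES if s not in self.datasets]


holding = DataHolding("example-001")
holding.add("Collection", "frames_001.img")   # hypothetical file name
holding.add("CIF", "structure.cif")           # hypothetical file name
print(holding.missing_stages())
```

Keeping the stage list in one place lets a repository report which datasets an entry contains, which is the kind of information the metadata is later said to record.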
Publishing, Informatics & Schemas
• Current schema is for publishing / advertising only
• eCrystals publishing requires lightweight schema only
• eBank harvesting requires lightweight schema only
• Aggregation and Linking requires a comprehensive schema
• Data management, Information delivery and Searching services require a very rich schema
Deposition into the archive
An Archive entry
ecrystals.chem.soton.ac.uk
Access to the underlying data
Some metadata issues
• Using simple and qualified Dublin Core
• Additional chemical information in schema for harvesting e.g. empirical formula
• Schema contains International Chemical Identifier (InChI)
• Specifies which ‘datasets’ are present in an entry
• Links to ePrints (and other published literature) derived from the data
• Using vocabularies specific to crystallography
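To make these points concrete, here is a rough sketch of a record that mixes Dublin Core elements with chemistry-specific fields. It is not the eBank schema: the `ex` namespace and its element names (`record`, `empiricalFormula`, `dataset`) are hypothetical stand-ins, and the example values are invented. Only the Dublin Core namespace URI is the real one.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set
EX_NS = "http://example.org/ebank-sketch/"   # hypothetical, NOT the eBank schema

ET.register_namespace("dc", DC_NS)
ET.register_namespace("ex", EX_NS)


def build_record(title, creator, formula, inchi, datasets, related):
    """Build a small XML record combining DC and chemistry-specific fields."""
    rec = ET.Element(f"{{{EX_NS}}}record")
    ET.SubElement(rec, f"{{{DC_NS}}}title").text = title
    ET.SubElement(rec, f"{{{DC_NS}}}creator").text = creator
    # The InChI is carried here as a DC identifier (an assumption of this sketch).
    ET.SubElement(rec, f"{{{DC_NS}}}identifier").text = inchi
    # Additional chemical information, e.g. empirical formula.
    ET.SubElement(rec, f"{{{EX_NS}}}empiricalFormula").text = formula
    # Which datasets are present in the entry.
    for name in datasets:
        ET.SubElement(rec, f"{{{EX_NS}}}dataset").text = name
    # Links to publications derived from the data.
    for url in related:
        ET.SubElement(rec, f"{{{DC_NS}}}relation").text = url
    return rec


record = build_record(
    title="Example structure",                 # invented example values
    creator="A. Chemist",
    formula="H2O",
    inchi="InChI=1S/H2O/h1H2",
    datasets=["Collection", "Refinement", "CIF"],
    related=["https://example.org/eprint/1"],
)
print(ET.tostring(record, encoding="unicode"))
```

A harvester that only understands simple Dublin Core could read the `dc:` elements and ignore the rest, which is one reason to layer subject-specific fields on top of a generic core.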
Harvesting: OAIster
Linking and aggregating
Embedded in a science portal
Current situation
• Version 2.0 eBank metadata schema
• Pilot institutional e-data repository for harvesting (raw, derived, results data) using EPrints software
• Exports records as ebank_dc and oai_dc
• Validation of schema & discussion with International Union of Crystallography for developments (and wider deployment)
• Pilot eBank UK aggregator service
• Developing search interface Version 1.0
• Testing with PSIgate physical sciences portal – embedding eBank UK
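Assuming the repository exposes its ebank_dc and oai_dc records through an OAI-PMH endpoint (the slides do not say so explicitly), a harvester would request them with a `ListRecords` call. The sketch below only builds the request URL; the base URL is a hypothetical placeholder, not the real repository address.

```python
from urllib.parse import urlencode

BASE_URL = "https://repository.example.org/oai"   # hypothetical endpoint
FORMATS = ("oai_dc", "ebank_dc")                  # export formats named on the slide


def list_records_url(metadata_prefix, base_url=BASE_URL, from_date=None):
    """Build an OAI-PMH ListRecords request URL for one metadata format."""
    if metadata_prefix not in FORMATS:
        raise ValueError(f"unsupported metadata format: {metadata_prefix}")
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Optional selective harvesting: only records changed since this date.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


print(list_records_url("ebank_dc"))
print(list_records_url("oai_dc", from_date="2005-01-01"))
```

A generic harvester would ask for `oai_dc`, while a subject-aware aggregator would ask for the richer `ebank_dc` format from the same endpoint.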
What’s next?
• Generic metadata schema vs Subject specific schema
• Validation against other schema (CCLRC Model)
• (Eprints.org software: allow for more generic scientific data and schemas?)
• Metadata enhancement: keywords based on knowledge of keywords in related publications?
• Investigate identifiers: International Chemical Identifier
• Explore context sensitive linking
• Embedding into chemical and crystallographic research and publishing
• e-Learning embedding and pedagogic evaluation
• Feasibility study in related domains
Crystallography Schema Breakout
• Describing non dc: terms
  – METS
  – SET container
• Rights
  – IPR
  – Copyright
  – Publisher
  – Funder
• Linking
  – DOI
  – Keyword ontology
  – Identifiers
• Data validation
  – Add validation dataset
  – Other forms of validation: Mogul
• Chemical representation
  – Naming conventions
  – Empirical formula representation
• Relationship between repositories and harvesters
  – Registration / subscription
• Syndication
  – FRIENDS container
  – RSS feeds