Antony Williams, Valery Tkachenko and Richard Kidd ACS Dallas March 2014
UK National Chemical Database Service: An integration of commercial and public chemistry services to support chemists in the United Kingdom
Antony Williams, Valery Tkachenko and Richard Kidd
ACS Dallas
March 2014
UK Chemical Database Service
• The National Chemical Database Service is for UK academics – see later for Rest of World
Vision for the Service PART 1
• Provide access to databases and services of interest to the academic community to serve their needs. Access to services to include:
  • Crystallography data – organic and inorganic materials
  • Thermophysical data
  • Reactions data including retrosynthetic analysis
  • Prediction technologies – name generation, physicochemical parameters, NMR prediction
Service Rollout
• Many services are hosted in the cloud
• Access through login/password, IP authentication or Shibboleth authentication
• Lots of hard work in a very short time – so much thanks to all of the service providers
• More providers stepped up to help – ChemAxon
• Crystallography concern (understatement!)
Feedback from Community
• Converted initial public negativity spike on Twitter pre-release to very positive feedback post-release
• Training required – onsite training sessions organized
• Available Chemicals Directory is a big plus!
• Concerns with Retrosynthetic Analysis tool
Usage
• Majority of usage is for crystallography data – previous provider had same bias
• Usage is increasing month-by-month
• Still heavily underused, and in many cases awareness is low
Vision for the Service PART 2
• Response to the call for proposals included our vision for a 21st Century data repository
• At a time of Open Access, Open Data and funding agency requirement to make data public – build a data repository
• Funding is split for licensing content and services (VAST MAJORITY) and some funding for research and development
An Initial “Vague” Vision Set
• Manage “all” of the chemistry data associated with chemical substances
• Data to be downloadable, reusable, interactive
• Build a platform that enables the scientist
• Data storage, validation, standardization and curation
• Collaborative data sharing
• Provide data platform that can enable and enhance publishing of scientific papers
Data Repository
• Registration of chemical compounds
• Deposition of chemical syntheses
• Addition of analytical data
• Integration to electronic notebooks
• Rewards and recognition for data sharing
• Document processing
• Hosting of data as private, embargoed or public
What we will deliver for all data
• Simple interfaces for uploading of data
• Embeddable widgets and programming interfaces to utilize in in-house systems, ELNs
• Automated harvesting approaches – sweeping directories for data
• Data validation where possible
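As a rough illustration of "sweeping directories for data", the sketch below walks a folder and groups files by data type. The extension table and the `sweep` function are assumptions made for this example, not part of the service described in the slides, and the only validation shown is skipping empty files.

```python
from pathlib import Path

# Hypothetical mapping from file extension to data type. The spectral
# formats (JCAMP, mzML, SPC) are named in the slides; the structure and
# reaction extensions are assumptions for illustration.
KNOWN_TYPES = {
    ".jdx": "spectrum",
    ".dx": "spectrum",
    ".mzml": "spectrum",
    ".spc": "spectrum",
    ".mol": "compound",
    ".sdf": "compound",
    ".rxn": "reaction",
}


def sweep(root):
    """Walk `root` recursively and group depositable files by data type."""
    found = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        kind = KNOWN_TYPES.get(path.suffix.lower())
        if kind is None:
            continue  # unknown extension: leave for manual deposition
        if path.stat().st_size == 0:
            continue  # trivial validation step: skip empty files
        found.setdefault(kind, []).append(path)
    return found
```

A scheduled job could call `sweep` on an instrument's export folder and hand each group to the matching deposition step.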
Input data pipeline
[Diagram: input data pipeline]
• Input sources: Web UI for unified depositions; DropBox, Google Drive, SkyDrive, etc; LabTrove and other templated data; Documents; API, FTP, etc
• Deposition Gateway
• Processing modules: Compounds Module, Spectra Module, Reactions Module, Materials Module, Text-mining Module
• Data stages: Raw data, Staging databases, Validated data
• Databases: Compounds, Reactions, Spectra, Materials, Articles / CSSP
• All databases are sliced by data sources/data collections and have a simple security model where each data slice/source is private, public or embargoed
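The private/public/embargoed security model could be sketched as below. The class names, the `owner` field and the rule that an embargoed slice becomes readable once its embargo date has passed are all assumptions for illustration; the slides state only the three visibility levels.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Visibility(Enum):
    PRIVATE = "private"
    PUBLIC = "public"
    EMBARGOED = "embargoed"


@dataclass
class DataSlice:
    """One data source/collection with a single visibility setting."""
    name: str
    owner: str
    visibility: Visibility
    embargo_until: Optional[date] = None  # only meaningful when EMBARGOED


def can_read(data_slice, user, today):
    """Return True if `user` may read `data_slice` on the date `today`."""
    if user == data_slice.owner:
        return True  # owners always see their own data
    if data_slice.visibility is Visibility.PUBLIC:
        return True
    if data_slice.visibility is Visibility.EMBARGOED:
        # Assumed rule: readable by everyone once the embargo has expired.
        return (data_slice.embargo_until is not None
                and today >= data_slice.embargo_until)
    return False  # PRIVATE and not the owner
```

Keeping the check at the level of a whole slice, rather than individual records, matches the "simple security model" wording on the slide.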
Compounds upload
• Draw chemicals in the interface (JavaScript editors – PC, Mac, Tablets, Phones)
• Drag and drop of compounds
• Automated generation of properties – formulae, molecular weight (Mw), monoisotopic mass (Mi), physchem properties
• Metadata input forms
• Bulk upload
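To make the automated property generation concrete, here is a minimal sketch that derives average molecular weight and monoisotopic mass from a simple molecular formula. It assumes formulae without brackets, charges or isotope labels and covers only four elements; the function names are illustrative, and a production system would more likely rely on a cheminformatics toolkit.

```python
import re

# (average atomic mass, monoisotopic mass) for a few common elements.
MASSES = {
    "H": (1.008, 1.00782503),
    "C": (12.011, 12.0),
    "N": (14.007, 14.0030740),
    "O": (15.999, 15.9949146),
}


def parse_formula(formula):
    """Parse a simple formula such as 'C8H10N4O2' into element counts."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element not in MASSES:
            raise ValueError(f"no mass data for element {element!r}")
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


def masses(formula):
    """Return (average molecular weight, monoisotopic mass)."""
    counts = parse_formula(formula)
    average = sum(MASSES[el][0] * n for el, n in counts.items())
    mono = sum(MASSES[el][1] * n for el, n in counts.items())
    return average, mono


if __name__ == "__main__":
    mw, mi = masses("C8H10N4O2")  # caffeine
    print(f"Mw = {mw:.3f}, Mi = {mi:.4f}")
```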
Depositions Gateway User Interface
Chemical Validation and Standardization
Reactions
• Hosting of reaction data in standard “document formats” – full flexibility, but limiting because data must be extracted from embedded objects
• Encourage template formats – using ELNs for example, community agreed templates
Electronic Notebook Data
• Development work integrating chemistry into the Southampton LabTrove notebook
  • Stoichiometry table development
  • Analytical data integration
• “ChemTrove” rolled out to a small test group in January
Micropublishing Syntheses
ChemSpider SyntheticPages
Requirements
• Community agreement on acceptable templates for CSSP/Reactions deposition
• Data Model deposition based on mappings between template and CSSP model
• Adoption of LabTrove interface for deposition
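The idea of depositing through a mapping between a template and the CSSP model might look like the following sketch. Every field name on both sides is hypothetical, since the slides do not list the actual template or CSSP model fields; the point is only to show a mapping that reports unmapped and missing fields rather than silently dropping them.

```python
# Hypothetical field names on both sides; the real template and CSSP
# model fields are not given in the slides.
TEMPLATE_TO_CSSP = {
    "Title": "title",
    "Reactants": "reactants",
    "Procedure": "procedure",
    "Yield (%)": "yield_percent",
}

# Assumed minimum set of model fields for an acceptable deposition.
REQUIRED = {"title", "procedure"}


def map_template(entry):
    """Map a filled-in template onto model fields.

    Returns (record, unmapped, missing): the mapped record, template
    fields with no known target, and required model fields not supplied.
    """
    record, unmapped = {}, []
    for field, value in entry.items():
        target = TEMPLATE_TO_CSSP.get(field)
        if target is None:
            unmapped.append(field)
        else:
            record[target] = value
    missing = sorted(REQUIRED - record.keys())
    return record, unmapped, missing
```

Reporting `unmapped` fields back to the community is one way a template agreement could evolve over time.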
What we will deliver
• Micropublishing platform for submission of
  • Protocols and Procedures
  • Reactions
  • Safety and Hazard data (LATER)
• Template-based submissions of procedures
  • Matched to ELN submissions
  • Full details for user submission versus mapped submission into database
Reaction Deposition/Validation
Spectral Data
• Support for “structure identification” is a must – “greatest value” for reference and lookup
• Support for data standards primarily – JCAMP, mzML, SPC
• Want to support ASSIGNED data formats
• Hold binary files but prefer standards – WHY?
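One reason to prefer open standards is that a text format such as JCAMP-DX can be read with a few lines of code and no vendor software. The sketch below is a deliberately minimal reader, assuming uncompressed, whitespace-separated `(X++(Y..Y))` data with `FIRSTX`, `LASTX` and `NPOINTS` present; it does not handle the compressed encodings, peak tables or multi-block files that real JCAMP-DX files may contain, and `parse_jcamp` is an illustrative name.

```python
def parse_jcamp(text):
    """Return (headers, points) from simple uncompressed JCAMP-DX text."""
    headers, raw_lines, in_data = {}, [], False
    for line in text.splitlines():
        line = line.split("$$")[0].strip()  # '$$' starts a comment
        if not line:
            continue
        if line.startswith("##"):
            # Labelled data record: ##LABEL=value
            label, _, value = line[2:].partition("=")
            label = label.strip().upper()
            if label == "END":
                break
            headers[label] = value.strip()
            in_data = label == "XYDATA"
        elif in_data:
            # Each data line: first X value followed by consecutive Y values
            raw_lines.append([float(tok) for tok in line.split()])

    xfactor = float(headers.get("XFACTOR", 1))
    yfactor = float(headers.get("YFACTOR", 1))
    npoints = int(headers["NPOINTS"])
    span = float(headers["LASTX"]) - float(headers["FIRSTX"])
    delta = span / (npoints - 1) if npoints > 1 else 0.0

    points = []
    for x0, *ys in raw_lines:
        for k, y in enumerate(ys):
            points.append((x0 * xfactor + k * delta, y * yfactor))
    return headers, points


SAMPLE = """\
##TITLE=example
##JCAMP-DX=4.24
##DATA TYPE=INFRARED SPECTRUM
##XFACTOR=1
##YFACTOR=0.5
##FIRSTX=400
##LASTX=405
##NPOINTS=6
##XYDATA=(X++(Y..Y))
400 2 4 6
403 8 10 12
##END=
"""

if __name__ == "__main__":
    hdr, pts = parse_jcamp(SAMPLE)
    print(hdr["TITLE"], len(pts))
```

The sample values are made up for the example and do not represent a measured spectrum.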
Raw Spectral Data
10 years from now…
• Binary file formats generally need original data processing software to deal with them – from Bruker, Agilent, Jeol, Thermo, Waters, blah, blah, blah, blah,…
• While we can store the original raw data files for posterity, should we? This has been one focus for data repositories
This is way more useful
Processed data…
Spectral searching is made possible
Spectral matching is possible
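Once spectra are held as processed peak data, matching reduces to comparing vectors. The sketch below uses a binned cosine similarity, which is one common choice and not necessarily the method used by the service; the function names, the bin width and the peak lists in the usage note are all made up for illustration.

```python
import math


def bin_peaks(peaks, width=1.0):
    """Sum peak intensities into bins of `width` along the x axis."""
    binned = {}
    for x, intensity in peaks:
        key = round(x / width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_score(a, b, width=1.0):
    """Cosine similarity between two peak lists (0 = no overlap, 1 = same shape)."""
    va, vb = bin_peaks(a, width), bin_peaks(b, width)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(query, library):
    """Return the name of the library spectrum most similar to `query`."""
    return max(library, key=lambda name: cosine_score(query, library[name]))
```

For example, with a hypothetical library `{"cmpd_a": [(43.0, 100.0), (58.0, 25.0)], "cmpd_b": [(31.0, 100.0), (45.0, 50.0)]}`, the query `[(43.0, 100.0), (58.0, 30.0)]` is matched to `cmpd_a`. None of this is possible against opaque binary files without the vendor's software.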
This is what we really want…
Addition of Analytical Data
• Spectral Container is in development using componentized widgets for display
• NIST spectra converted into standardized JCAMP format for deposition - 296,103 spectra deposited
• 10% of remaining NIST spectra need to be curated as there are obvious structure issues
Javascript viewer NMR, MS, IR
Depositions Gateway User Interface
Document processing
Depositions Gateway User Interface
User Interface Approach
[Diagram: tiered architecture]
• Data tier: Compounds, Reactions, Spectra, Materials, Documents
• Data access tier: Compounds API, Reactions API, Spectra API, Materials API, Documents API
• User interface components tier: Compounds Widgets, Reactions Widgets, Spectra Widgets, Materials Widgets, Documents Widgets
• User interface tier (examples): Analytical Laboratory application, Electronic Laboratory Notebook, Chemical Inventory application, Paid 3rd party integrations (various platforms – SharePoint, Google, etc)
[Diagram: use cases]
• Analytical Chemist: Characterize, Measure, Search, Store (linked by <<include>> relationships)
• Synthetic Chemist: Search (synthetic procedure), Document (publish synthetic procedure), Retrosynthetic analysis
[Diagram: use cases, continued]
• Medicinal Chemist: Search (against database of properties), Source (find vendor), Analyse (cluster, dock, screen)
• Computational Chemist: Search or Develop algorithm, Store results, Run calculations
• Also shown: Synthesize, Measure activity
Present activities for ACS Fall
• Deposition process development of compounds, reactions and spectral data by end of Spring
  • FTP, DropBox, Web-upload, ELN integration
• Compounds, Reactions, Spectral data search, display, download
• Data sharing – private, public, collaborative
• Metadata, metadata, metadata standards!
• Open Sourcing Chemical Registry System including CVSP
UK Chemical Database Service
• The National Chemical Database Service is for UK academics
• What would be necessary to make this available for “Rest of World”, a single institution, an organization?
• It’s not really technology…that’s scale out and can be handled
• It’s negotiation with database providers, pricing, login/authentication, localization?
Acknowledgments
• Jeremy Frey and Simon Coles, University of Southampton
• Will Dichtel and Leah McEwan, Cornell University
• Stuart Chalk, University of North Florida
• Bob Hanson and Bob Lancashire, Jmol and JSpecView
Thank you
Email: [email protected]
ORCID: 0000-0002-2668-4821
Twitter: @ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams