Ontology work at the Royal Society of Chemistry
![Page 1: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/1.jpg)
Ontology work at the Royal Society of Chemistry
Antony J. Williams, Colin Batchelor, Peter Corbett, Jon Steele and Valery Tkachenko
ACS Dallas
March 16th 2014
![Page 2: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/2.jpg)
Royal Society of Chemistry
• You know us as a publisher and a society, but:
• We are a host of chemistry databases
• We are a charity and provide community support
• We are a provider of grant-based services
• We are an innovator in cheminformatics
![Page 3: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/3.jpg)
We have data to manage…
• Compounds
• Reactions
• Spectra
• Crystals
• Materials
• Assays
• Algorithms
• …
![Page 4: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/4.jpg)
![Page 5: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/5.jpg)
![Page 6: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/6.jpg)
Properties - experimental
![Page 7: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/7.jpg)
Physicochemical properties
LONG LIST: log P, log D (at pH 5.5, at pH 7.4), bioconcentration factor, KOC (at pH 5.5, at pH 7.4), index of refraction, polar surface area, molar refractivity, molar volume, polarizability, surface tension, density at STP, flash point at 1 atm, boiling point at 1 atm, enthalpy of vaporization at STP, vapour pressure at STP…
![Page 8: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/8.jpg)
All are amenable to ontologies and should blend standards
• Compounds and properties are handled (InChIs are important)
• Reactions are covered (and RInChIs help)
• Spectra (JCAMP, AnIML, NetCDF, mzML)
• Crystals (CIFs)
• Materials (MatML)
• Assays (MIAME)
• Algorithms
• …
![Page 9: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/9.jpg)
ChemSpider Reactions
![Page 10: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/10.jpg)
ChemSpider Spectra
![Page 11: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/11.jpg)
ChemSpider is 7 years old
• When ChemSpider was developed, ontologies were not directly implemented
• Ontologies and their supporting technologies have matured and become more widely accepted over those seven years
• Some efforts have been made to include ontologies – layer on MeSH. We support a lot of standards – InChI, RInChI, JCAMP, CIF
• The ChemSpider architecture is being rebuilt, and new standards and ontologies are being considered
![Page 12: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/12.jpg)
Some available ontologies…
• RSC has built and opened in-house ontologies:
  • Chemical methods (CHMO)
  • Name reactions (RXNO)
  • Molecular processes (MOP), largely auto-generated from the corresponding ChEBI classes
• We have contributed to external ontologies:
  • Small molecules (ChEBI)
  • Cheminformatics (CHEMINF)
![Page 13: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/13.jpg)
Chemistry ontologies 1
ChEBI (molecules, families of molecules, parts of molecules, 32128 fully annotated classes) (http://www.ebi.ac.uk/chebi/)
Examples: perylene (CHEBI:29861); a perylene (CHEBI:60201); perylene skeleton (CHEBI:60200)
![Page 14: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/14.jpg)
ChEBI Ontology
![Page 15: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/15.jpg)
RSC Ontologies
![Page 16: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/16.jpg)
Chemistry ontologies 2
Chemical Methods Ontology (http://rsc-cmo.googlecode.com)
2745 classes. Describes methods used to:
• collect data in chemical experiments, such as MS and NMR
• prepare and separate material for further analysis, such as sample ionisation, chromatography, and electrophoresis
• synthesise materials, such as continuous vapour deposition
• Also describes the instruments used in these experiments, such as mass spectrometers and chromatography columns, and their outputs
• Should be of value to chemical hazards and safety data
![Page 17: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/17.jpg)
Chemistry ontologies 3
RSC Name Reaction Ontology
(http://rxno.googlecode.com/)
421 classes
Examples:
Diels–Alder cyclization
![Page 18: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/18.jpg)
Chemistry ontologies 4
CHEMINF (http://code.google.com/p/semanticchemistry/)
638 classes. Describes cheminformatics methods. Not presently used in text mining (see Open PHACTS usage later).
doi:10.1371/journal.pone.0025513
![Page 19: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/19.jpg)
Limits of ontologies
Chemical space is very big:
‘The “small molecule universe” (SMU), the set of all synthetically feasible organic molecules of 500 Daltons molecular weight or less, is estimated to contain over 10^60 structures, making exhaustive searches for structures of interest impractical.’
Virshup et al., J. Am. Chem. Soc., doi:10.1021/ja401184g
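A back-of-envelope calculation makes the word "impractical" concrete. The sketch below takes the quoted lower bound of 10^60 structures and an assumed screening rate of one billion structures per second; that rate is a hypothetical figure chosen for illustration and does not come from the talk or the cited paper.

```python
# Back-of-envelope estimate; the screening rate is a made-up assumption.
SMU_SIZE = 10**60                   # structures: lower bound quoted from Virshup et al.
RATE_PER_SECOND = 10**9             # hypothetical: one billion structures per second
SECONDS_PER_YEAR = 365 * 24 * 3600  # ignoring leap years


def years_to_enumerate(n_structures: int, rate_per_second: int) -> float:
    """Years needed to visit every structure exactly once at a fixed rate."""
    return n_structures / rate_per_second / SECONDS_PER_YEAR


years = years_to_enumerate(SMU_SIZE, RATE_PER_SECOND)
print(f"{years:.1e} years")
```

Even at that generous rate the enumeration would take on the order of 10^43 years, which is why a classification by enumeration alone cannot cover chemical space.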
![Page 20: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/20.jpg)
Why a named reaction ontology?
• Despite attempts to introduce systematic
nomenclature for organic reactions, lots of
chemists still prefer to attach human
names.
![Page 21: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/21.jpg)
A big challenge
• Classification is based on what the experimenter intends
• Build the ontology around the intended product molecules rather than what might be by-products
• (Carbon dioxide, water, hydrolysed protecting groups, protons, etc.)
![Page 22: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/22.jpg)
![Page 23: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/23.jpg)
Defining the skeleton
![Page 24: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/24.jpg)
Limits of reaction classification
• Much of RXNO is still classified by hand
• Example: we can’t just define a cyclization as
a reaction where a cyclic compound is formed.
The Friedel–Crafts acylation produces a cyclic
compound but is not a cyclization!
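The pitfall can be shown with a toy sketch. It compares the naive rule from this slide with a slightly stricter rule that requires a new ring to appear. The stricter rule, the `Reaction` class and the hand-entered ring counts are illustrative assumptions; this is not how RXNO itself classifies reactions.

```python
from dataclasses import dataclass


@dataclass
class Reaction:
    name: str
    reactant_rings: int  # total ring count across the reactants (entered by hand)
    product_rings: int   # ring count in the intended product (entered by hand)


def naive_is_cyclization(rxn: Reaction) -> bool:
    """Naive rule from the slide: the product is a cyclic compound."""
    return rxn.product_rings > 0


def forms_new_ring(rxn: Reaction) -> bool:
    """Illustrative alternative (an assumption): a ring must be newly formed."""
    return rxn.product_rings > rxn.reactant_rings


# Benzene (one ring) + acetyl chloride -> acetophenone (still one ring).
friedel_crafts = Reaction("Friedel–Crafts acylation of benzene", 1, 1)
# Buta-1,3-diene + ethene (no rings) -> cyclohexene (one ring).
diels_alder = Reaction("Diels–Alder cyclization", 0, 1)

for rxn in (friedel_crafts, diels_alder):
    print(rxn.name, naive_is_cyclization(rxn), forms_new_ring(rxn))
```

The naive rule labels both reactions as cyclizations, while the ring-count comparison separates them. Ring counting is still far too crude for real classification, which is one reason much of the work remains manual.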
![Page 25: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/25.jpg)
RXNO in the wild
510 classes in the RXNO namespace
… and RXNO is built in to NextMove
Software’s reaction identification tool.
![Page 26: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/26.jpg)
RXNO: next steps
• More reactions!
• More cross-references!
• More example reactions!
• Links to graphical versions! (All drawn, just
awaiting uploading.)
• More SMIRKS strings!
![Page 27: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/27.jpg)
Using ontologies in text mining
• To provide a controlled vocabulary of terms found in text and a common identifier.
• Ideally this identifier is a resolvable HTTP URI, for example for chemical compounds (http://purl.obolibrary.org/obo/CHEBI_36063), and likewise for methods terminology
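As a minimal sketch, a compact identifier such as CHEBI:36063 can be turned into the URI form shown above by following the OBO PURL pattern visible in that example. The helper name `curie_to_purl` and the validation pattern are hypothetical, and the sketch assumes the ontology in question publishes its terms under purl.obolibrary.org.

```python
import re

OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"
# Assumed identifier shape: alphabetic prefix, colon, numeric local ID.
CURIE_PATTERN = re.compile(r"^([A-Za-z]+):(\d+)$")


def curie_to_purl(curie: str) -> str:
    """Convert e.g. 'CHEBI:36063' into an OBO-style HTTP URI."""
    match = CURIE_PATTERN.match(curie)
    if match is None:
        raise ValueError(f"not a PREFIX:digits identifier: {curie!r}")
    prefix, local_id = match.groups()
    return f"{OBO_PURL_BASE}{prefix.upper()}_{local_id}"


print(curie_to_purl("CHEBI:36063"))
```

The same helper would apply to methods terms such as CHMO identifiers, provided they follow the same prefix and numbering scheme.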
![Page 28: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/28.jpg)
![Page 29: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/29.jpg)
Ontologies as synonym sets for text-mining
• We have text-mined the whole 21st century
RSC archive with a myriad of ontologies.
Results are on the publishing platform
• We have looked for correlations between
molecules and ontology terms.
• Two examples follow…
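The idea of using ontologies as synonym sets and then counting co-occurrences can be sketched as follows. This is a toy illustration rather than the RSC pipeline: the dictionary holds only four entries (identifiers taken from the examples in this talk), matching is plain lowercase substring search, and the two "documents" are invented sentences.

```python
from collections import Counter
from itertools import combinations

# Hypothetical mini synonym set: surface form -> ontology identifier.
SYNONYMS = {
    "ascorbic acid": "CHEBI:22652",
    "vitamin c": "CHEBI:21241",
    "glucose": "CHEBI:17234",
    "oxidation": "MOP:0000568",
}


def annotate(text: str) -> set:
    """Return the identifiers whose synonyms occur in the text (naive substring match)."""
    lowered = text.lower()
    return {term_id for synonym, term_id in SYNONYMS.items() if synonym in lowered}


def cooccurrences(documents: list) -> Counter:
    """Count how many documents mention each unordered pair of identifiers."""
    counts = Counter()
    for doc in documents:
        for pair in combinations(sorted(annotate(doc)), 2):
            counts[pair] += 1
    return counts


docs = [  # invented example sentences
    "Oxidation of ascorbic acid at a modified electrode.",
    "Glucose and ascorbic acid were detected after oxidation.",
]
pair_counts = cooccurrences(docs)
print(pair_counts.most_common(3))
```

A production system would need tokenisation, disambiguation and a significance measure instead of raw counts; the sketch only shows how shared identifiers make counting across synonyms possible.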
![Page 30: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/30.jpg)
Co-occurrences with ?
alcohols (CHEBI:30879); solvents (CHEBI:46787); coproporphyrins (CHEBI:23388); 3D DOSY-TOCSY (CHMO:0001950); lipase activity (GO:0016298); solvolysis (MOP:0000620); wood (ENVO:00002040); aliphatic alcohol (CHEBI:2571); Raman circular dichroism spectroscopy (CHMO:0001160); propoxy group (CHEBI:46881); steam reforming (CHMO:0001450); hydrogenation (MOP:0000589); aqueous-phase reforming (CHMO:0001444); sonication (CHMO:0001707)
![Page 31: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/31.jpg)
Co-occurrences with ?
reducing agent (CHEBI:63247); ascorbic acid (CHEBI:22652); antioxidant (CHEBI:22586); reduction (MOP:0000569); electrode (CHMO:0002344); ascorbate (CHEBI:22651); modified residue (SO:0001089); phosphate buffer (CHMO:0001734); oxidation (MOP:0000568); nafion polymer (CHEBI:61428); vitamin C (CHEBI:21241); antioxidant activity (GO:0016209); atom-transfer radical polymerisation (MOP:0000684); detection of glucose (GO:0051594); reducing agent (CHEBI:63247); glucose (CHEBI:17234); graphene (CHEBI:36973)
![Page 32: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/32.jpg)
Projects and Ontologies
• 3-year Innovative Medicines Initiative project
• Integrating chemistry and biology data using semantic web technologies
• Open source code, open data and open standards
• Academics, Pharmas, Publishers…
• To put medicines in the pipeline…
![Page 33: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/33.jpg)
The Open PHACTS community ecosystem
![Page 34: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/34.jpg)
![Page 35: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/35.jpg)
Our RDF schema
Two dozen calculated properties, >10^6 molecules
• CHEMINF ontology for cheminformatics
• QUDT for units and numeric values
• ChemSpider IDs for molecules
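[Diagram: a Calculation node is linked by has_input to a connection table and by has_output to a calculated log P; an is_about link points to benzene; the calculated log P carries has_unit dimensionless, has_value 2.177 and has standard uncertainty 0.234.]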
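The schema in this diagram can be sketched as a handful of subject–predicate–object triples. The version below uses plain Python tuples instead of an RDF library, the node names are illustrative labels and not real URIs, and attaching is_about to the connection table is an assumption, since the flattened slide does not show which node that edge starts from.

```python
# Each triple is (subject, predicate, object); node names are illustrative labels.
TRIPLES = [
    ("calculation", "has_input", "connection_table"),
    ("connection_table", "is_about", "benzene"),  # assumption: subject of is_about
    ("calculation", "has_output", "calculated_logP"),
    ("calculated_logP", "has_unit", "dimensionless"),
    ("calculated_logP", "has_value", 2.177),
    ("calculated_logP", "has_standard_uncertainty", 0.234),
]


def objects(subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def describe_output(calculation):
    """Follow has_output from a calculation and collect value, uncertainty and unit."""
    (result,) = objects(calculation, "has_output")
    return {
        "value": objects(result, "has_value")[0],
        "uncertainty": objects(result, "has_standard_uncertainty")[0],
        "unit": objects(result, "has_unit")[0],
    }


print(describe_output("calculation"))
```

Keeping the value, its uncertainty and its unit as separate statements about the result node is what lets vocabularies such as QUDT and CHEMINF each describe their own part of the record.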
![Page 36: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/36.jpg)
RSC data in Open PHACTS
1. Molecule synonyms and identifiers
2. Linksets between ChEBI, ChEMBL, DrugBank
and OPS identifiers
3. Molecule–molecule relations (“parent–child”) of
interest for drug discovery
4. Calculated physicochemical properties for
compounds (both molecular and macroscopic)
![Page 37: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/37.jpg)
Synonyms and identifiers
Newly added to the CHEMINF ontology:
• Validated ChemSpider synonyms
• Unvalidated ChemSpider synonyms
• Validated database identifiers
• Unvalidated database identifiers
• InChI, InChIKey, SMILES
• Preferred ChemSpider name
![Page 38: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/38.jpg)
Physicochemical properties
log P, log D (at pH 5.5, at pH 7.4), bioconcentration factor, KOC (at pH 5.5, at pH 7.4), index of refraction, polar surface area, molar refractivity, molar volume, polarizability, surface tension, density at STP, flash point at 1 atm, boiling point at 1 atm, enthalpy of vaporization at STP, vapour pressure at STP
![Page 39: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/39.jpg)
It is actually more complicated..
[Diagram: the same calculation expressed with terms from several ontologies.
Nodes: benzene’s connection table (rdf:type CHEMINF connection table); OPS benzene; calculation process (rdf:type CHEMINF execution of ACD/Labs PhysChem software library version 12.01); calculation result (rdf:type CHEMINF calculated log P); QUDT dimensionless; quantity; “2.17”^^xsd:float; “0.234”^^xsd:float.
Edges: IAO is about; OBI has specified input; OBI has specified output; QUDT has value; QUDT has standard uncertainty; QUDT has unit.]
![Page 40: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/40.jpg)
What’s built on top of this?
![Page 41: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/41.jpg)
Chemistry Data to manage…
• Compounds
• Reactions
• Spectra
• Crystals (in development)
• Materials
• Assays
• Algorithms
• …
![Page 42: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/42.jpg)
Future Work
• Extending use of ontologies across all of our work on databases and as an underpinning to the Chemical Data Repository
• Adding ontologies to other grant-based projects such as PharmaSea
• Continued collaborations with University of Southampton on Labtrove for Chemistry
• RSC collaboration with Dr Stuart Chalk (UNF) on data standards and ontologies
• Working with CHAS on hazard/safety data
![Page 43: Ontology work at the Royal Society of Chemistry](https://reader034.fdocuments.us/reader034/viewer/2022051620/56813ffe550346895dab2e95/html5/thumbnails/43.jpg)
Thank you
• Email: [email protected]
• ORCID: 0000-0002-2668-4821
• Twitter: @ChemConnector
• Personal Blog: www.chemconnector.com
• SLIDES: www.slideshare.net/AntonyWilliams