Pipeline for automated structure-based classification in the ChEBI ontology
![Page 1: Pipeline for automated structure-based classification in the ChEBI ontology](https://reader036.fdocuments.us/reader036/viewer/2022062418/5550138db4c905af648b4a57/html5/thumbnails/1.jpg)
Pipeline for automated structure-based classification in the ChEBI ontology
Janna Hastings
Coordinator, Cheminformatics and Metabolism
www.ebi.ac.uk/chebi
ACS Symposium on Chemical Ontologies, Taxonomies and Schemas. Dallas, 16 March 2014
![Page 2: Pipeline for automated structure-based classification in the ChEBI ontology](https://reader036.fdocuments.us/reader036/viewer/2022062418/5550138db4c905af648b4a57/html5/thumbnails/2.jpg)
Chemical Entities of Biological Interest
• Freely available online, available for download in full
• Low molecular weight, i.e. no proteins
• Definitions, relationships, hierarchy
• E.g. metabolites, drugs, pesticides
• 38,215 entries in the last release
![Page 3: Pipeline for automated structure-based classification in the ChEBI ontology](https://reader036.fdocuments.us/reader036/viewer/2022062418/5550138db4c905af648b4a57/html5/thumbnails/3.jpg)
What does ChEBI provide?
• Chemical structures and visualisations
• Names and synonyms: caffeine; 1,3,7-trimethylxanthine; methyltheobromine
• Chemical data: Formula: C8H10N4O2; Charge: 0; Mass: 194.19
• Ontology – classifications: metabolite; CNS stimulant; trimethylxanthines
• Links to more information: MSDchem: CFF; KEGG DRUG: D00528; PubMed citations
• Chemical informatics: InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3; SMILES: CN1C(=O)N(C)c2ncn(C)c2C1=O
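As a quick check on the chemical data shown for caffeine, the mass can be recomputed from the formula. The sketch below is illustrative only: the parser handles simple formulae without brackets or charges, the atomic masses are rounded standard values, and the function names are made up for this example rather than taken from any ChEBI tool.

```python
import re

# Rounded standard average atomic masses for the elements in caffeine.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def parse_formula(formula: str) -> dict:
    """Parse a simple formula such as 'C8H10N4O2' into element counts.

    Assumption: no brackets, hydrates or charges appear in the formula.
    """
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def average_mass(formula: str) -> float:
    """Sum the average atomic masses weighted by the element counts."""
    return sum(ATOMIC_MASS[el] * n for el, n in parse_formula(formula).items())


caffeine_counts = parse_formula("C8H10N4O2")
caffeine_mass = average_mass("C8H10N4O2")
print(caffeine_counts, round(caffeine_mass, 2))
```

Rounded to two decimals, the result agrees with the mass of 194.19 listed for the entry.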
![Page 4: Pipeline for automated structure-based classification in the ChEBI ontology](https://reader036.fdocuments.us/reader036/viewer/2022062418/5550138db4c905af648b4a57/html5/thumbnails/4.jpg)
Example ChEBI entry page
![Page 5: Pipeline for automated structure-based classification in the ChEBI ontology](https://reader036.fdocuments.us/reader036/viewer/2022062418/5550138db4c905af648b4a57/html5/thumbnails/5.jpg)
Example entry page (continued)
![Page 6: Pipeline for automated structure-based classification in the ChEBI ontology](https://reader036.fdocuments.us/reader036/viewer/2022062418/5550138db4c905af648b4a57/html5/thumbnails/6.jpg)
Example entry page (continued)
![Page 7: Pipeline for automated structure-based classification in the ChEBI ontology](https://reader036.fdocuments.us/reader036/viewer/2022062418/5550138db4c905af648b4a57/html5/thumbnails/7.jpg)
Structure-based classification in ChEBI
![Page 8: Pipeline for automated structure-based classification in the ChEBI ontology](https://reader036.fdocuments.us/reader036/viewer/2022062418/5550138db4c905af648b4a57/html5/thumbnails/8.jpg)
Challenges with manual classification
• May be incomplete
• May be inconsistent
• Difficult to maintain (even with extensive use of computationally expensive automatic validations)
• Blocks automatic loading of otherwise high-quality externally annotated chemical data into ChEBI (as no classification available)
![Page 9: Pipeline for automated structure-based classification in the ChEBI ontology](https://reader036.fdocuments.us/reader036/viewer/2022062418/5550138db4c905af648b4a57/html5/thumbnails/9.jpg)
SOCO (SMARTS, OWL)
Leonid Chepelev, Michel Dumontier, collaborators
• Given a training set of classified molecules, examine structures for consensus features across all (using fragmentation and feature detection)
• Capture features hierarchically
• Use OWL to classify
Chepelev et al. BMC Bioinformatics 2012 13:3 doi:10.1186/1471-2105-13-3
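The idea of consensus features can be sketched with plain set intersection. This is not SOCO's actual algorithm: the fragment labels and the training set below are invented, and a real system would derive fragments from the structures rather than take them as given.

```python
from functools import reduce

# Hypothetical training set for one class: each molecule is reduced to a
# set of fragment labels (assumed to come from a fragmentation step).
training_set = {
    "acetic acid": {"carboxy group", "methyl group"},
    "propionic acid": {"carboxy group", "methyl group", "methylene group"},
    "benzoic acid": {"carboxy group", "benzene ring"},
}


def consensus_features(feature_sets):
    """Return the features shared by every molecule in the training set."""
    sets = list(feature_sets)
    if not sets:
        return set()
    return reduce(set.intersection, sets)


consensus = consensus_features(training_set.values())
print(consensus)
```

Here the only feature common to all three molecules is the carboxy group, which is the kind of shared feature a class definition would be built from.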
![Page 10: Pipeline for automated structure-based classification in the ChEBI ontology](https://reader036.fdocuments.us/reader036/viewer/2022062418/5550138db4c905af648b4a57/html5/thumbnails/10.jpg)
Limitations of SOCO
• No support for negation
• Only “min” (at least) counting is supported, not max or exact. Thus dicarboxylic acid is_a monocarboxylic acid (every two-legged human is also a one-legged human, in the sense that they have at least one leg)
• SMARTS is powerful but not very human-readable, and ChEBI is meant to be read by human biologists and chemists. E.g. SMARTS for the class of aliphatic amines: [$([NH2][CX4]),$([NH]([CX4])[CX4]),$([NX3]([CX4])([CX4])[CX4])]
Can we do better at making definitions accessible?
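The counting limitation can be illustrated with a toy comparison of "at least" and "exactly" semantics. The functions below are hypothetical and take a precomputed count of carboxy groups; they only show why min-only counting makes every dicarboxylic acid a monocarboxylic acid as well.

```python
def classify_min(carboxy_count: int) -> list:
    """'At least n' semantics: the only kind of counting available in SOCO."""
    classes = []
    if carboxy_count >= 1:
        classes.append("monocarboxylic acid")
    if carboxy_count >= 2:
        classes.append("dicarboxylic acid")
    return classes


def classify_exact(carboxy_count: int) -> list:
    """'Exactly n' semantics: the two classes become mutually exclusive."""
    names = {1: "monocarboxylic acid", 2: "dicarboxylic acid"}
    return [names[carboxy_count]] if carboxy_count in names else []


# Oxalic acid has two carboxy groups.
min_result = classify_min(2)
exact_result = classify_exact(2)
print(min_result)
print(exact_result)
```

With min-only counting a molecule with two carboxy groups lands in both classes; with exact counting it lands only in dicarboxylic acid.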
![Page 11: Pipeline for automated structure-based classification in the ChEBI ontology](https://reader036.fdocuments.us/reader036/viewer/2022062418/5550138db4c905af648b4a57/html5/thumbnails/11.jpg)
A new pipeline for automated structure-based ontology classification in ChEBI
• Inputs: Definitions (OWL); ChEBI structures
• OWL Parser => logical cheminformatics definitions
• Novel structure => Matching => Candidate classes => Ranking => Best classes: save is_a relations
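The matching and ranking stages of the pipeline might look roughly like the sketch below. Everything here is an assumption made for illustration: the molecule is a dict of precomputed features, the definitions are plain Python predicates rather than parsed OWL, and ranking by depth in the class hierarchy (most specific first) is one plausible criterion, not necessarily the one used in ChEBI.

```python
# Hypothetical class definitions: name -> (parent class, matching predicate).
DEFINITIONS = {
    "molecular entity": (None, lambda m: True),
    "organic molecular entity": ("molecular entity", lambda m: m["has_carbon"]),
    "organic ion": (
        "organic molecular entity",
        lambda m: m["has_carbon"] and m["charge"] != 0,
    ),
}


def depth(name):
    """Number of is_a steps from a class up to the root of the hierarchy."""
    steps = 0
    while DEFINITIONS[name][0] is not None:
        name = DEFINITIONS[name][0]
        steps += 1
    return steps


def match(molecule):
    """Matching: every class whose definition the molecule satisfies."""
    return [name for name, (_, pred) in DEFINITIONS.items() if pred(molecule)]


def rank(candidates):
    """Ranking (assumed criterion): most specific classes first."""
    return sorted(candidates, key=depth, reverse=True)


def best_classes(molecule):
    """Keep only the top-ranked candidates as proposed is_a parents."""
    ranked = rank(match(molecule))
    if not ranked:
        return []
    top = depth(ranked[0])
    return [name for name in ranked if depth(name) == top]


acetate = {"has_carbon": True, "charge": -1}
water = {"has_carbon": False, "charge": 0}
print(best_classes(acetate))
print(best_classes(water))
```

Saving only the most specific matches avoids asserting is_a relations to ancestors that the ontology already implies.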
![Page 12: Pipeline for automated structure-based classification in the ChEBI ontology](https://reader036.fdocuments.us/reader036/viewer/2022062418/5550138db4c905af648b4a57/html5/thumbnails/12.jpg)
Human-readable definitions, mapped to structures in the ChEBI knowledgebase
thiadiazoles: molecular_entity and has_part some ( 1,2,3-thiadiazole or 1,2,4-thiadiazole or 1,2,5-thiadiazole or 1,3,4-thiadiazole )
diterpenoid: organic_molecular_entity and has_part exactly 2 terpenoid
organic ion: organic_molecular_entity and ( has_charge some int[>0] or has_charge some int[<0] )
monocyclic compound: molecular_entity and has_cycles value "1"^^int
• Logical operators
• Counts (min, max and exact)
• Properties
• Parts
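One way to picture how such definitions could be evaluated against a structure is as small composable checks. The sketch below is hypothetical throughout: the Molecule class stands in for features that a cheminformatics toolkit would compute, and the helper names (has_part, any_of, all_of) are invented for this example rather than part of the ChEBI pipeline.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Molecule:
    """Hypothetical precomputed description of a structure."""
    parts: Counter = field(default_factory=Counter)  # part name -> occurrences
    charge: int = 0
    cycles: int = 0
    organic: bool = True


def has_part(part, at_least=1, at_most=None):
    """Count restriction on parts: 'some' is at_least=1; 'exactly n' sets both bounds."""
    def check(m):
        n = m.parts[part]  # a Counter returns 0 for parts that are absent
        return n >= at_least and (at_most is None or n <= at_most)
    return check


def any_of(*checks):
    """Logical 'or' over checks."""
    return lambda m: any(c(m) for c in checks)


def all_of(*checks):
    """Logical 'and' over checks."""
    return lambda m: all(c(m) for c in checks)


def is_organic(m):
    return m.organic


THIADIAZOLE_ISOMERS = (
    "1,2,3-thiadiazole", "1,2,4-thiadiazole",
    "1,2,5-thiadiazole", "1,3,4-thiadiazole",
)

DEFINITIONS = {
    "thiadiazoles": any_of(*(has_part(p) for p in THIADIAZOLE_ISOMERS)),
    "diterpenoid": all_of(is_organic, has_part("terpenoid", at_least=2, at_most=2)),
    "organic ion": all_of(is_organic, lambda m: m.charge > 0 or m.charge < 0),
    "monocyclic compound": lambda m: m.cycles == 1,
}


def matching_classes(m):
    return [name for name, check in DEFINITIONS.items() if check(m)]


example = Molecule(parts=Counter({"1,2,4-thiadiazole": 1}), charge=1, cycles=1)
print(matching_classes(example))
```

Because max and exact counts are expressed directly, a structure with three terpenoid parts would not match the diterpenoid definition, which min-only counting could not guarantee.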
![Page 13: Pipeline for automated structure-based classification in the ChEBI ontology](https://reader036.fdocuments.us/reader036/viewer/2022062418/5550138db4c905af648b4a57/html5/thumbnails/13.jpg)
Planned integration into ChEBI tools
• ChEBI internal data loader and bulk submissions
• ChEBI online submission tool
Pre-population of matched classes
![Page 14: Pipeline for automated structure-based classification in the ChEBI ontology](https://reader036.fdocuments.us/reader036/viewer/2022062418/5550138db4c905af648b4a57/html5/thumbnails/14.jpg)
Acknowledgements – Thanks!
ChEBI team:
Christoph Steinbeck, Gareth Owen, Adriano Dekker, Namrata Kale, Steve Turner, Venkatesh Muthukrishnan
Collaborators:
Colin Batchelor, RSC; Lian Duan, ETH; Leonid Chepelev, Ottawa; Michel Dumontier, Stanford; Despoina Magka, Oxford; Ilinca Tudose and John May, EBI
Funding:
BBSRC “Continued development of ChEBI towards better usability for the systems biology and metabolic modelling communities” BB/K019783/1
![Page 15: Pipeline for automated structure-based classification in the ChEBI ontology](https://reader036.fdocuments.us/reader036/viewer/2022062418/5550138db4c905af648b4a57/html5/thumbnails/15.jpg)
Questions?
Thank you for listening!