OpenPHACTS - Chemistry Platform Update and Learnings


Transcript of OpenPHACTS - Chemistry Platform Update and Learnings

Page 1: OpenPHACTS - Chemistry Platform Update and Learnings

Open PHACTS - Chemistry Platform Update and learnings

Antony Williams and Valery Tkachenko

ORCID ID:0000-0002-2668-4821

Page 2: OpenPHACTS - Chemistry Platform Update and Learnings

@gray_alasdair Big Data Integration 2

OpenPHACTS and CRS Diagram

Page 3: OpenPHACTS - Chemistry Platform Update and Learnings

The Chemical Registration Service

Chemistry processing
• Validation
• Standardization
• Properties generation
• Properties retrieval

Export
• RDF
• SDF

API
• Domain-specific searches
• Chemical visualization
• Properties
• Conversions

Page 4: OpenPHACTS - Chemistry Platform Update and Learnings
Page 5: OpenPHACTS - Chemistry Platform Update and Learnings

Subsystems

• “CVSP” (frontend, backend, database)
• Compounds (frontend, database)
• OpenPHACTS API (frontend, database)
• Datasources registry (frontend, database)
• Processing farm (optional)

Page 6: OpenPHACTS - Chemistry Platform Update and Learnings

Structure-Based Database linking

• Open PHACTS, and many other projects requiring the linking of structure databases, depend on mappings

• Different databases use different processes for standardization prior to deposition

• Examples: PubChem, EBI databases, ChemSpider, etc.

Page 7: OpenPHACTS - Chemistry Platform Update and Learnings

DrugBank

• ~60 records can’t be dearomatized unambiguously

• ~40 records where InChIs did not match structure

• 2 records where SMILES, InChI and name did not match the structure

• 7 records with 2 stereo bonds at chiral atoms

DB04283 DB04462

Page 8: OpenPHACTS - Chemistry Platform Update and Learnings

Standardizers

• EBI Standardizer: https://wwwdev.ebi.ac.uk/chembl/extra/francis/standardiser/

• PubChem Standardizer: https://pubchem.ncbi.nlm.nih.gov/standardize/standardize.cgi

• NCGC Standardizer: https://tripod.nih.gov/?p=61

• The CVSP Standardizer work in Open PHACTS http://cvsp.chemspider.com/

Page 9: OpenPHACTS - Chemistry Platform Update and Learnings
Page 10: OpenPHACTS - Chemistry Platform Update and Learnings

Standardization Rules

• Available from: http://tinyurl.com/hwapem3

• Use the SRS as guidance for standardization

• Adjust as necessary to our needs

Page 11: OpenPHACTS - Chemistry Platform Update and Learnings

Nitro groups

Page 12: OpenPHACTS - Chemistry Platform Update and Learnings

Salt and Ionic Bonds

Page 13: OpenPHACTS - Chemistry Platform Update and Learnings

The CVSP System

http://cvsp.chemspider.com

Page 14: OpenPHACTS - Chemistry Platform Update and Learnings

Supports various file formats

Page 15: OpenPHACTS - Chemistry Platform Update and Learnings

CompTox Chemistry Dashboard

Prior to deposition, check a deposition…

Page 16: OpenPHACTS - Chemistry Platform Update and Learnings

>3450 compounds in one SDF

Page 17: OpenPHACTS - Chemistry Platform Update and Learnings

98 Errors, 1571 Warnings
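A tally like the one on this slide comes from running every record in the SDF through a rule set and counting the issues by severity. The following is a minimal sketch of that idea, not CVSP's implementation: the two rules, their severities, and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def split_sdf(text):
    """Split SDF text into records; each record ends with a '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if any(l.strip() for l in current):  # trailing record without terminator
        records.append("\n".join(current))
    return records


# Hypothetical rules: each returns (severity, message) or None.
def rule_counts_line(record):
    lines = record.splitlines()
    if len(lines) < 4 or "V2000" not in lines[3]:
        return ("error", "missing or malformed V2000 counts line")
    return None


def rule_name_line(record):
    lines = record.splitlines()
    if not lines or not lines[0].strip():
        return ("warning", "empty molecule name line")
    return None


RULES = [rule_counts_line, rule_name_line]


def tally(sdf_text):
    """Count issues by severity over all records in the SDF text."""
    counts = Counter()
    for record in split_sdf(sdf_text):
        for rule in RULES:
            issue = rule(record)
            if issue:
                counts[issue[0]] += 1
    return counts
```

Calling `tally(open(path).read())` on a file would return a counter such as `Counter({'warning': ..., 'error': ...})`; real rule sets check chemistry, not just file layout.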

Page 18: OpenPHACTS - Chemistry Platform Update and Learnings

Review Errors

Page 19: OpenPHACTS - Chemistry Platform Update and Learnings

Validation Rule Set

Page 20: OpenPHACTS - Chemistry Platform Update and Learnings

Various Rule Sets Available

Page 21: OpenPHACTS - Chemistry Platform Update and Learnings

CVSP – My own custom rules

Page 22: OpenPHACTS - Chemistry Platform Update and Learnings

ChEMBL Validation Review (of 1.3 million records)

• 11,020 records with 4 bonds and zero charge, e.g. CHEMBL501101 or CHEMBL501973

• 271 records with hypervalent oxygen (e.g. CHEMBL2219679), carbon (e.g. 1005895), boron, chlorine, iodine or phosphine

• 6,177 records where direction of bond makes no sense, e.g. CHEMBL12760 and CHEMBL34704
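Findings such as "4 bonds and zero charge" or "hypervalent oxygen" can be detected by summing bond orders per atom and comparing the result with an allowed valence. The sketch below assumes a simple in-memory representation (atoms as `(element, charge)` tuples, bonds as `(i, j, order)` tuples) and an illustrative valence table; neither is taken from CVSP, and implicit hydrogens and charged atoms are ignored.

```python
# Illustrative maximum valences for neutral atoms (an assumption, not CVSP's table).
MAX_NEUTRAL_VALENCE = {"H": 1, "B": 3, "C": 4, "N": 3, "O": 2, "F": 1}


def find_valence_issues(atoms, bonds):
    """Flag neutral atoms whose explicit bond-order sum exceeds the table.

    atoms: list of (element, formal_charge)
    bonds: list of (i, j, order) with 0-based atom indices
    Returns a list of (atom_index, message).
    """
    bond_order_sum = [0] * len(atoms)
    for i, j, order in bonds:
        bond_order_sum[i] += order
        bond_order_sum[j] += order

    issues = []
    for idx, (element, charge) in enumerate(atoms):
        limit = MAX_NEUTRAL_VALENCE.get(element)
        if limit is None or charge != 0:
            continue  # this sketch only checks neutral atoms listed in the table
        if bond_order_sum[idx] > limit:
            issues.append(
                (idx, f"{element} has bond order sum {bond_order_sum[idx]} "
                      f"with zero charge (max {limit})")
            )
    return issues


# A nitrogen with four single bonds and no charge is flagged;
# the same nitrogen with charge +1 is skipped by this sketch.
atoms = [("N", 0), ("C", 0), ("C", 0), ("C", 0), ("C", 0)]
bonds = [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1)]
flagged = find_valence_issues(atoms, bonds)
```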

Page 23: OpenPHACTS - Chemistry Platform Update and Learnings

Chemical Validation first… Standardization Second

• Chemical Validation detects errors; Standardization FIXES them according to rules

• SMIRKS transformations are based on both InChI Normalization and FDA SRS rules

Page 24: OpenPHACTS - Chemistry Platform Update and Learnings

Standardization SMIRKS

Examples of InChI normalization
[*;H+:1]>>[*;H:1]
[O,S,Se,Te:1]=[O+,S+,Se+,Te+:2][C-;v3:3]>>[O,S,Se,Te:1]=[O,S,Se,Te:2]=[C:3]
[N-,P-,As-,Sb-:1]=[C+;v3:2]>>[N,P,As,Sb:1]#[C:2]

Examples of FDA SRS rules
[n:1]=[O:2]>>[n+:1][O-:2]
[*:1]=[N:2]#[N:3]>>[*:1]=[N+:2]=[N-:3]
[N+0;H3:1].[C:3](=[O:4])[O:5][H:6]>>[N+1;H4:1].[C:3](=[O:4])[O-:5]
Thiopurine: [H:1][S:2][c:3]1[n:8][c:7]([H,*:13])[n:6][c:5]2[c:4]1[n:11][c:10]([H,*:12])[n:9]2>>[H:1][N:8]1[C:7]([H,*:13])=[N:6][C:5]2=[C:4]([N:11]=[C:10]([H,*:12])[N:9]2)[C:3]1=[S:2]
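Each SMIRKS rule pairs a reactant pattern with a product pattern (separated by `>>`), and the atom-map numbers (`:1`, `:2`, …) tie atoms on one side to atoms on the other. One cheap sanity check for a rule set is to compare the map numbers on both sides: a number that appears only on the left marks an atom the rule deletes. The sketch below uses only string handling; it does not apply the transformation, and the function names are hypothetical.

```python
import re

# Matches an atom-map number at the end of a bracket atom, e.g. ":3]" in "[C-;v3:3]".
ATOM_MAP = re.compile(r":(\d+)\]")


def atom_maps(pattern):
    """Return the set of atom-map numbers used in one side of a SMIRKS."""
    return {int(m) for m in ATOM_MAP.findall(pattern)}


def check_smirks(smirks):
    """Report atom-map numbers that appear on only one side of '>>'."""
    reactant, sep, product = smirks.partition(">>")
    if not sep:
        raise ValueError("expected a SMIRKS of the form reactant>>product")
    left, right = atom_maps(reactant), atom_maps(product)
    return {"deleted": sorted(left - right), "created": sorted(right - left)}


# The N-oxide rule keeps every mapped atom.
n_oxide = check_smirks("[n:1]=[O:2]>>[n+:1][O-:2]")

# The amine/carboxylic-acid rule drops mapped atom 6 (the acid hydrogen).
salt = check_smirks(
    "[N+0;H3:1].[C:3](=[O:4])[O:5][H:6]>>[N+1;H4:1].[C:3](=[O:4])[O-:5]"
)
```

To actually apply such rules to structures, a cheminformatics toolkit with SMIRKS support is needed; this check only helps catch transcription mistakes in the rules themselves.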

Page 25: OpenPHACTS - Chemistry Platform Update and Learnings

Examples of Standardization

Double bond with adjacent wiggly single bond

Collapse hydrogen atoms with no stereo bonds

Page 26: OpenPHACTS - Chemistry Platform Update and Learnings

Examples of Standardization

Remove symmetric stereocenters

Turn off chiral flag if no up or down bonds

Chiral flag is set

Page 27: OpenPHACTS - Chemistry Platform Update and Learnings

Defining a Community Rule Set

• There are multiple standardizers, each with their own rules set

• Can we decide on a default community rules set, like Standard InChI, that could be used by ALL Standardizers?

• A joint meeting between the Research Data Alliance (RDA), IUPAC and ACS Division of Chemical Information discussed the value and possibilities of this approach (July 2016)

Page 28: OpenPHACTS - Chemistry Platform Update and Learnings

EPA is investigating CVSP

• EPA is investigating CVSP as a validation and standardization platform

• Considering the API aspects of CVSP to integrate to our registration system

• CVSP is a reference implementation and “starting point” for a community rules set

Page 29: OpenPHACTS - Chemistry Platform Update and Learnings

CVSP code is now Open Source

• Open Source CVSP code now released

• Code is hosted on Open PHACTS Github: https://github.com/openphacts/ops-crs

• Valery Tkachenko will offer future support

• Hoping for additional community engagement and support

• Some details of availability….

Page 30: OpenPHACTS - Chemistry Platform Update and Learnings

Virtual Machines

• OPS_FRONT (all websites and API)
• OPS_BACK (all heavy-lifting)
• OPS_DB (databases)

• VMs are VMware images
• Can be converted to other hypervisors

Page 31: OpenPHACTS - Chemistry Platform Update and Learnings

Thank you

Emails: [email protected] and [email protected]

SLIDES: www.slideshare.net/AntonyWilliams