ACS Meeting New Orleans 2013 (CINF)
![Page 1: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/1.jpg)
NCI/CADD Chemical Structure Web Services
Markus Sitzmann
Computer-Aided Drug Design Group, Chemical Biology Laboratory, Frederick National Laboratory for Cancer Research, NIH, DHHS
![Page 2: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/2.jpg)
http://cactus.nci.nih.gov
![Page 3: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/3.jpg)
Chemical Structure Web API
[Diagram: external web services and other software packages reach the Chemical Structure Web API via HTTP. Components shown behind the API: NCI/CADD web services (including the Chemical Identifier Resolver), CACTVS, OPSIN, and the NCI/CADD Chemical Structure DataBase (CSDB).]
![Page 4: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/4.jpg)
Chemical Structures
[Diagram: a chemical structure and the identifiers and representations associated with it]
• NCI/CADD Identifiers
• InChI/InChIKey
• ChemSpider ID
• PubChem SID/CID
• chemical names
• CAS Registry Number
• NSC number
• FDA UNII
• ChemNavigator SID
• SMILES
• SD File
• Chemical Formula
• ChEBI ID
• PDB Ligand ID
• MRV
• CML
• SYBYL Line Notation
• GIF image
![Page 5: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/5.jpg)
Chemical Identifier Resolver (CIR)
http://cactus.nci.nih.gov/chemical/structure
CIR works as a resolver for different chemical structure identifiers or representations. It allows one to convert a given structure identifier into another representation or structure identifier.
![Page 6: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/6.jpg)
Chemical Identifier Resolver (CIR)
http://cactus.nci.nih.gov/chemical/structure
• officially released in June 2009
• since then four beta versions (for testing, learning, gaining experience)
• one larger database update in March 2010
• since early 2012: major internal rewrite (which will allow us to add new services and API functionality without breaking the existing API)
• major database update and services planned for 2013
![Page 7: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/7.jpg)
CIR Usage Statistics
[Chart: requests per month since June 2009; y-axis from 0 to 12,000,000 requests]
Typical number of unique IP addresses per month: 4,000 – 8,000
![Page 8: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/8.jpg)
Top Users (US)
• Academic/Hospitals: St. Olaf College, Carnegie Mellon, Drexel University, Princeton, Mayo
• Pharma/Chemical Industry: Eli Lilly, Dow Chemical, Intermune, Procter & Gamble, Vertex
• U.S. Government: EPA, NIH (NIEHS, NCI, NLM...), Lawrence Livermore Natl. Lab., CDC, DoD
• Other: Google, Amazon, HP, Agilent, Symyx
![Page 9: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/9.jpg)
External web services and applications
• CIR node for KNIME, by Talete s.r.l.
• Lab Helper app for Windows Phone
• Avogadro molecule editor
• Jmol/JSmol open-source viewer for chemical structures in 3D
• GChem for Google Spreadsheet
• Bioclipse (CIR plugin)
• Macs in Chemistry
• Accelrys Draw
...and educational tools/sites such as:
• Jmol/JSmol Virtual Molecular Model Kit
• ISU CheMagic
• Caltech Library
![Page 10: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/10.jpg)
Examples using CIR
![Page 11: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/11.jpg)
Chemical Identifier Resolver (CIR)
SD file:
C7H6O2
APtclcactv03051222202D 0   0.00000     0.00000

 15 15  0  0  0  0  0  0  0  0999 V2000
    2.8660   -2.0600    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321   -1.5600    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321   -0.5600    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8660   -0.0600    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.0000   -0.5600    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.0000   -1.5600    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8660    0.9400    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321    1.4400    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.0000    1.4400    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.8660   -2.6800    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.2690   -1.8700    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.2690   -0.2500    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.4631   -0.2500    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.4631   -1.8700    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321    2.0600    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  3  4  2  0  0  0  0
  4  5  1  0  0  0  0
  5  6  2  0  0  0  0
  1  6  1  0  0  0  0
  4  7  1  0  0  0  0
  7  8  1  0  0  0  0
  7  9  2  0  0  0  0
  1 10  1  0  0  0  0
  2 11  1  0  0  0  0
  3 12  1  0  0  0  0
  5 13  1  0  0  0  0
  6 14  1  0  0  0  0
  8 15  1  0  0  0  0
M  END
$$$$
ChemWriter Editor
![Page 12: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/12.jpg)
Chemical Identifier Resolver (CIR)
names:
benzoic acid
65-85-0
WLN: QVR
Unisept BZA
AIDS018010
Salvo liquid
Benzoic acid-ring-UL-14C
ST5213864
Benzoesaeure
CHEBI:30746
NSC 149
benzenecarboxylic acid
phenylformic acid
Benzoic acid (JP15/USP)
Benzoic acid (TN)
18102_RIEDEL
Aromatic hydroxy acid
Benzoic acid (7CI,8CI,9CI)
Benzoic acid [USAN:JAN]
W213128_ALDRICH
47849_SUPELCO
Acide benzoique [French]
Acido benzoico [Italian]
Benzoate (VAN)
Benzoesaeure [German]
Benzoic acid (natural)
Acide benzoique
Benzeneformic acid
Benzenemethanoic acid
Benzoesaeure GK
Benzoesaeure GV
Benzoic acid, tech.
Carboxybenzene
Kyselina benzoova
Phenylcarboxylic acid
ChemWriter Editor
![Page 13: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/13.jpg)
Chemical Identifier Resolver (CIR)
InChIKey: InChIKey=WPYMKLBDIGXBTP-UHFFFAOYSA-N
InChI: InChI=1S/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)
SMILES: C1=CC=C(C=C1)C(O)=O
ChemWriter Editor
![Page 14: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/14.jpg)
Chemical Identifier Resolver (CIR)
programmatic URL API:
http://cactus.nci.nih.gov/chemical/structure/"identifier"/"representation"
if a request is not successful: HTTP 404 status message
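The URL pattern and the 404 behavior above can be wrapped in two small helpers. This is a minimal sketch, not part of the original slides: the function names `cir_url` and `resolve` are hypothetical, and percent-encoding the identifier is an assumption about how names with spaces should be passed.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"


def cir_url(identifier, representation):
    """Build a CIR request URL of the form base/identifier/representation."""
    # Assumption: percent-encode the identifier so that names containing
    # spaces or slashes stay inside a single path segment.
    return f"{CIR_BASE}/{quote(identifier, safe='')}/{representation}"


def resolve(identifier, representation, timeout=10):
    """Return the response text, or None if CIR answers with HTTP 404."""
    try:
        with urlopen(cir_url(identifier, representation), timeout=timeout) as response:
            return response.read().decode("utf-8").strip()
    except HTTPError as error:
        if error.code == 404:
            return None  # the identifier could not be resolved
        raise  # any other HTTP error is passed on to the caller
```

Returning `None` for a 404 keeps "not found" separate from network failures, which still raise.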
![Page 15: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/15.jpg)
Chemical Identifier Resolver (CIR)
• access from Unix shell level (e.g., via wget):
    shell > wget -qO - http://cactus.nci.nih.gov/chemical/structure/tamiflu/cas
    204255-11-8
• access by programming libraries/languages (e.g., Python):
    from urllib.request import urlopen
    from urllib.error import HTTPError

    url = "http://cactus.nci.nih.gov/chemical/structure/tamiflu/cas"
    try:
        # urlopen raises HTTPError (e.g. 404) when the identifier cannot be resolved
        response = urlopen(url).read().decode("utf-8")
    except HTTPError:
        raise  # replace with your own error handling
    print(response)
    204255-11-8
![Page 16: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/16.jpg)
Chemical Identifier Resolver (CIR)
examples:
http://cactus.nci.nih.gov/chemical/structure/PGZUMBJQJWIWGJ-ONAKXNSWSA-N/cas
204255-11-8
MIME type: text/plain
http://cactus.nci.nih.gov/chemical/structure/tamiflu/image
MIME type: image/gif
![Page 17: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/17.jpg)
Chemical Identifier Resolver (CIR)
http://cactus.nci.nih.gov/chemical/structure/“identifier”/“representation”
“identifier” (input to CIR):
• chemical names
• IUPAC names (OPSIN)
• CAS numbers
• SMILES strings
• IUPAC InChI/InChIKeys
• NCI/CADD Identifiers
• CACTVS HASHISY
• NSC number
• PubChem SID
• ZINC Code
• ChemSpider ID
• ChemNavigator SID
• eMolecule VID
• UNII
“representation” (output of CIR):
• /smiles
• /names, /iupac_name
• /cas
• /inchi, /stdinchi
• /inchikey, /stdinchikey
• /ficts, /ficus, /uuuuu
• /image
• /file, /sdf
• /mw, /monoisotopic_mass
• /formula
• /twirl
• /urls
• /chemspider_id
• /pubchem_sid
• /chemnavigator_sid
![Page 18: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/18.jpg)
(Partial) InChIKey Lookup
• resolve Standard InChIKey into full structure representation: Ethanol
http://cactus.nci.nih.gov/chemical/structure/LFQSCWFLJHTTHZ-UHFFFAOYSA-N/smiles
CCO
http://cactus.nci.nih.gov/chemical/structure/LFQSCWFLJHTTHZ-UHFFFAOYSA/smiles
CCO
CC[OH2+]
http://cactus.nci.nih.gov/chemical/structure/LFQSCWFLJHTTHZ/smiles
C(C(O)([2H])[2H])[2H]
CC(O)([2H])[2H]
C(CO)([2H])([2H])[2H]
CC[17OH]
C(CO)[2H]
[14CH3]CO
CCO
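The three lookups on this slide differ only in how many blocks of the InChIKey are supplied. The sketch below classifies a key accordingly; the function name and the level labels are hypothetical. It relies on the published InChIKey layout: a 14-letter first block for the connectivity, a 10-letter second block, and a final protonation character.

```python
import re

# Optional "InChIKey=" prefix, then one, two, or all three blocks.
_INCHIKEY = re.compile(r"^(?:InChIKey=)?([A-Z]{14})(?:-([A-Z]{10})(?:-([A-Z]))?)?$")


def inchikey_level(key):
    """Classify a (partial) InChIKey by the blocks it contains.

    Returns "full", "no_protonation", "connectivity", or None if the
    string does not look like an InChIKey at all.
    """
    match = _INCHIKEY.match(key.strip())
    if match is None:
        return None
    _first, second, protonation = match.groups()
    if protonation:
        return "full"            # e.g. LFQSCWFLJHTTHZ-UHFFFAOYSA-N
    if second:
        return "no_protonation"  # e.g. LFQSCWFLJHTTHZ-UHFFFAOYSA
    return "connectivity"        # e.g. LFQSCWFLJHTTHZ
```

The shorter the key, the more structures can share it, which is consistent with the growing result lists in the slide's examples.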
![Page 19: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/19.jpg)
Chemical File Representation
• available file format representations:
• alc: Alchemy format
• cdxml: CambridgeSoft ChemDraw XML format
• cerius: MSI Cerius II format
• charmm: Chemistry at HARvard Macromolecular Mechanics file format
• cif: Crystallographic Information File
• cml: Chemical Markup Language
• gjf: Gaussian input data file
• gromacs: GROMACS file format
• hyperchem: HyperChem file format
• jme: Java Molecule Editor format
• maestro: Schroedinger MacroModel structure file format
• mol: Symyx molecule file
• sybyl2/mol2: Tripos Sybyl MOL2 format
• mrv: ChemAxon MRV format
• pdb: Protein Data Bank
• sdf: Symyx Structure Data Format
• sdf3000: Symyx Structure Data Format 3000
• sln: SYBYL Line Notation
• smiles: SMILES
• xyz: xyz file format
http://cactus.nci.nih.gov/chemical/structure/Aspirin/file?format=sdf
![Page 20: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/20.jpg)
Chemical Structure Images (GIF, PNG)
http://cactus.nci.nih.gov/chemical/structure/XMWRBQBLMFGWIX-UHFFFAOYSA-N/image?height=300&width=300&bgcolor=black&bondcolor=white
http://cactus.nci.nih.gov/chemical/structure/Aspirin/image?height=200&width=200&symbolfontsize=7&footer="Aspirin"
Buckyball
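The file and image URLs above add options as ordinary query parameters. A small sketch of assembling such URLs follows; the helper name `cir_request_url` is hypothetical, and only parameters that appear on the slides are used in the examples.

```python
from urllib.parse import quote, urlencode

CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"


def cir_request_url(identifier, representation, **params):
    """Build a CIR URL and append optional query parameters."""
    url = f"{CIR_BASE}/{quote(identifier, safe='')}/{representation}"
    if params:
        # urlencode keeps the keyword order and escapes the values.
        url += "?" + urlencode(params)
    return url


# File download in SD format, as on the file-format slide.
sdf_url = cir_request_url("Aspirin", "file", format="sdf")

# White-on-black 300x300 image, as on the image slide.
image_url = cir_request_url(
    "XMWRBQBLMFGWIX-UHFFFAOYSA-N", "image",
    height=300, width=300, bgcolor="black", bondcolor="white",
)
```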
![Page 21: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/21.jpg)
Chemical Properties
• request molecular weight:
http://cactus.nci.nih.gov/chemical/structure/BSYNRYMUTXBXSQ-UHFFFAOYSA-N/weight
180.1598
• /mw: molecular weight
• /formula: formula
• /monoisotopic_mass: monoisotopic mass
• /h_bond_donor_count: H bond donor count
• /h_bond_acceptor_count: H bond acceptor count
• /h_bond_center_count: H bond center count
• /rotor_count: number of rotatable bonds
• /effective_rotor_count: number of effectively rotatable bonds
• /rule_of_5_violation_count: number of Rule-of-5 violations
• /xlogp2: octanol−water partition coefficient XLOGP2
• /aromatic: compound is aromatic
• /macrocyclic: compound is macrocyclic
• /heteroatom_count: heteroatom count
• /hydrogen_atom_count: H atom count
• /heavy_atom_count: heavy atom count
• /deprotonable_group_count: number of deprotonable groups
• /protonable_group_count: number of protonable groups
• /ring_count: number of rings
• /ringsys_count: number of ring systems
MIME type: text/plain
Aspirin
![Page 22: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/22.jpg)
• request (alternative) names:
<?xml version="1.0" encoding="UTF-8" ?>
<request string="Aspirin" representation="names">
  <data id="1" resolver="name" string_class="Name">
    <item id="1" classification="pubchem_iupac_name">2-acetyloxybenzoic acid</item>
    <item id="2" classification="pubchem_iupac_openeye_name">2-Acetoxybenzoic acid</item>
    <item id="3" classification="pubchem_generic_registry_name">50-78-2</item>
    <item id="4" classification="pubchem_generic_registry_name">11126-35-5</item>
    <item id="5" classification="pubchem_generic_registry_name">11126-37-7</item>
    <item id="6" classification="pubchem_generic_registry_name">2349-94-2</item>
    <item id="7" classification="pubchem_generic_registry_name">26914-13-6</item>
    <item id="8" classification="pubchem_substance_synonym">NCGC00090977-04</item>
    <item id="9" classification="pubchem_substance_synonym">KBioSS_002272</item>
    <item id="10" classification="pubchem_substance_synonym">SBB015069</item>
    <item id="11" classification="pubchem_substance_synonym">Aspirin</item>
    <item id="12" classification="pubchem_substance_synonym">D00109</item>
[…]
http://cactus.nci.nih.gov/chemical/structure/Aspirin/names/xml
Chemical Name Lookup
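A names response like the one above can be grouped by its classification attribute with the standard library's XML parser. This is a sketch under assumptions: `names_by_classification` is a hypothetical helper, and `SAMPLE` is a shortened, properly closed copy of the slide's output rather than a live response.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Shortened copy of the slide's example response, closed so that it parses.
SAMPLE = """<?xml version="1.0" encoding="UTF-8" ?>
<request string="Aspirin" representation="names">
  <data id="1" resolver="name" string_class="Name">
    <item id="1" classification="pubchem_iupac_name">2-acetyloxybenzoic acid</item>
    <item id="3" classification="pubchem_generic_registry_name">50-78-2</item>
    <item id="4" classification="pubchem_generic_registry_name">11126-35-5</item>
    <item id="11" classification="pubchem_substance_synonym">Aspirin</item>
  </data>
</request>"""


def names_by_classification(xml_text):
    """Map each classification value to the list of names carrying it."""
    if isinstance(xml_text, str):
        xml_text = xml_text.encode("utf-8")  # parse bytes, honoring the declaration
    root = ET.fromstring(xml_text)
    grouped = defaultdict(list)
    for item in root.iter("item"):
        grouped[item.get("classification")].append(item.text)
    return dict(grouped)


names = names_by_classification(SAMPLE)
```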
![Page 23: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/23.jpg)
Chemical Name Pattern Search
• Google-like searches on CIR’s name index (approx. 70 million names)
• based on the open-source full-text search server Sphinx (http://sphinxsearch.com)
example: all chemical names that contain the words “morphine” and “methyl” (name pattern: ‘+morphine +methyl’):
http://cactus.nci.nih.gov/chemical/structure/+morphine +methyl/stdinchikey/xml?resolver=name_pattern
![Page 24: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/24.jpg)
<request string="+morphine +methyl" representation="stdinchikey">
  <data id="1" resolver="name_pattern" notation="Morphine 3-methyl ether">
    <item id="1">InChIKey=OROGSEYTTFOCAN-DNJOTXNNSA-N</item>
  </data>
  <data id="2" resolver="name_pattern" notation="6-Methyl-delta(sup 6)-deoxy-morphine">
    <item id="1">InChIKey=CUFWYVOFDYVCPM-GGNLRSJOSA-N</item>
  </data>
  <data id="3" resolver="name_pattern" notation="Morphine, dihydro-6-methyl-">
    <item id="1">InChIKey=NBKVWIJQJMEQLE-NGTWOADLSA-N</item>
  </data>
  <data id="4" resolver="name_pattern" notation="6-METHYL-MORPHINE ETHER">
    <item id="1">InChIKey=FNAHUZTWOVOCTL-UHFFFAOYSA-N</item>
  </data>
  <data id="5" resolver="name_pattern" notation="Morphine alcoholic methyl ether">
    <item id="1">InChIKey=FNAHUZTWOVOCTL-XSSYPUMDSA-N</item>
  </data>
  <data id="6" resolver="name_pattern" notation="N-Methyl morphine chloride">
    <item id="1">InChIKey=MJNCZWBHCFTYFU-SCLAZZCHSA-N</item>
  </data>
  <data id="7" resolver="name_pattern" notation="Morphine, 7-hydroxy-6,6-dimethoxy-3-O-methyl-">
    <item id="1">InChIKey=URFKRBIESURBKC-UHFFFAOYSA-N</item>
  </data>
</request>
Search name pattern ‘+morphine +methyl’: 7 matching names
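A name-pattern request and its result list could be handled as sketched below. The helper names are hypothetical, the sample XML is a two-entry excerpt of the slide's output, and percent-encoding the pattern ("+" as %2B, space as %20) is an assumption about how the characters are best transmitted, since the slides show the raw pattern in the URL.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"


def name_pattern_url(pattern, representation="stdinchikey"):
    """Build the XML request URL for a name-pattern search."""
    # Assumption: encode "+" and spaces so they survive URL transport unchanged.
    encoded = quote(pattern, safe="")
    return f"{CIR_BASE}/{encoded}/{representation}/xml?resolver=name_pattern"


def pattern_matches(xml_text):
    """Return (matched name, [values]) pairs from a name_pattern response."""
    root = ET.fromstring(xml_text)
    return [
        (data.get("notation"), [item.text for item in data.findall("item")])
        for data in root.findall("data")
    ]


# Two-entry excerpt of the response shown on the slide.
EXCERPT = """<request string="+morphine +methyl" representation="stdinchikey">
  <data id="1" resolver="name_pattern" notation="Morphine 3-methyl ether">
    <item id="1">InChIKey=OROGSEYTTFOCAN-DNJOTXNNSA-N</item>
  </data>
  <data id="6" resolver="name_pattern" notation="N-Methyl morphine chloride">
    <item id="1">InChIKey=MJNCZWBHCFTYFU-SCLAZZCHSA-N</item>
  </data>
</request>"""

matches = pattern_matches(EXCERPT)
```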
![Page 25: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/25.jpg)
Chemical Name Pattern Search
example: chemical names that contain the words “morphine” and “methyl” but not “hydroxyl” (name pattern: ‘+morphine +methyl -hydroxyl’):
http://cactus.nci.nih.gov/chemical/structure/+morphine +methyl -hydroxyl/stdinchikey/xml?resolver=name_pattern
example: chemical names that contain the substring “morphine” somewhere in the name (name pattern: ‘*morphine*’):
http://cactus.nci.nih.gov/chemical/structure/*morphine*/stdinchikey/xml?resolver=name_pattern
example: chemical names that contain a single character “m” and the word “benzene” within a maximum distance of 3 words (finds meta-substituted aromatic compounds, name pattern: ‘“m benzene”~3’):
http://cactus.nci.nih.gov/chemical/structure/(m benzene)~3/stdinchikey/xml?resolver=name_pattern
6 matching names
45 matching names
22 matching names
![Page 26: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/26.jpg)
NCI/CADD Chemical Structure DataBase CSDB 2010
![Page 27: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/27.jpg)
Chemical Structure Normalization/Identifier
• stepwise process:
[Diagram: original structure record (Molfile, SDF, SMILES, ChemDraw cdx, PDB) → structure normalization → parent structure → hashcode calculation (E_HASHISY) → NCI/CADD Identifier; results go to the database (SDF, SMILES)]
original structure records, parent structures and identifiers are stored in the database
![Page 28: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/28.jpg)
Chemical Structure Normalization/Identifier
• calculation of a set of parent structures with different sensitivity to chemical features:
[Diagram: original structure record → structure normalization → parent structures (FICTS, FICuS, uuuuu) → hashcode calculation (E_HASHISY) → NCI/CADD Identifiers (FICTS, FICuS, uuuuu)]
all steps are performed using CACTVS
![Page 29: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/29.jpg)
NCI/CADD Identifiers (FICTS, FICuS, uuuuu)
• based on CACTVS hashcodes (HASHISY)
• 16-digit hexadecimal number (64-bit unsigned)
structure normalization - histidine:
[Figure: several drawn forms of histidine (labels on the slide: salt, tautomer 1, tautomer 2, SR) with the identifiers calculated for each; the drawings and the assignment of identifiers to individual forms are not recoverable from the transcript]
FICTS identifiers: 6C16DE2351F9FF50-FICTS, 9850FD9F9E2B4E25-FICTS, E92E4BA2869F3611-FICTS, 8A7AD1EB498CC76A-FICTS, E5F83F10C5DB080A-FICTS
FICuS identifiers: E92E4BA2869F3611-FICuS, 8A7AD1EB498CC76A-FICuS, E5F83F10C5DB080A-FICuS, 9850FD9F9E2B4E25-FICuS (shown twice)
uuuuu identifier: 9850FD9F9E2B4E25-uuuuu (shown for all forms)
hashcode: 9850FD9F9E2B4E25
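The identifier strings on this slide combine a 64-bit hashcode, written as 16 hexadecimal digits, with the identifier type. The sketch below only formats and parses such strings; the function names are hypothetical, and the hashcode itself comes from CACTVS, which is not reproduced here.

```python
import re

IDENTIFIER_TYPES = ("FICTS", "FICuS", "uuuuu")
_IDENTIFIER = re.compile(r"^([0-9A-F]{16})-(FICTS|FICuS|uuuuu)$")


def format_identifier(hashcode, kind):
    """Render a 64-bit unsigned hashcode as '<16 hex digits>-<type>'."""
    if not 0 <= hashcode < 2 ** 64:
        raise ValueError("hashcode must fit into 64 bits (unsigned)")
    if kind not in IDENTIFIER_TYPES:
        raise ValueError(f"unknown identifier type: {kind}")
    return f"{hashcode:016X}-{kind}"


def parse_identifier(text):
    """Split an identifier string into (hashcode as int, type)."""
    match = _IDENTIFIER.match(text)
    if match is None:
        raise ValueError(f"not an NCI/CADD identifier: {text!r}")
    return int(match.group(1), 16), match.group(2)
```

Comparing only the integer part shows when two identifier types agree for a structure, as with 9850FD9F9E2B4E25 on the slide.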
![Page 30: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/30.jpg)
Chemical Structure Normalization/Identifier
• calculation of Standard InChIKey from the union set of parent structures
[Diagram: original structure record → structure normalization → parent structures (FICTS, FICuS, uuuuu) → hashcode calculation (E_HASHISY) → NCI/CADD Identifiers; union set of parent structures → Standard InChIKey 1.03]
![Page 31: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/31.jpg)
Chemical Structure Database (CSDB)
current status (released March 2010):
• 140 chemical structure databases
• 120 million structure records
• 84.6 million unique structures by FICuS
• 110 million Standard InChIKeys for lookup
sources:
• ChemNavigator iResearch Library (~56%): compilation of commercially available screening compounds from ~300 international chemistry suppliers
• PubChem Substance Database (~38%): including Open NCI database, EPA DSSTox databases, NIAID HIV database, NIST Webbook, NLM ChemIDplus, ChemSpider, …
• Commercial sources / others (~6%): Asinex, Comgenex, eMolecules, …
![Page 32: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/32.jpg)
NCI/CADD Chemical Structure DataBase CSDB 2013
![Page 33: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/33.jpg)
Chemical Structure Database 2013
• >270 small-molecule databases
• >600 database releases (full, incremental, “historic versions”)
• 385 million original database records
unique structure count:
• FICTS: ~125.0 million
• FICuS: ~121.4 million
• uuuuu: ~109.0 million
union set: 141.7 million unique structures
![Page 34: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/34.jpg)
Chemical Structure Database 2013
InChI/InChIKey (Version 1.04) calculated with four InChI flag sets: Standard, Set 1, Set 2, Set 3
[Table: InChI options per flag set, drawn from DONOTADDH, W0, FIXEDH, RECMET, NEWPS, SPXYZ, SAsXYZ, Fb, Fnud, KET, 15T; the per-set selection is not recoverable from the transcript. The Standard set yields the Standard InChIKey.]
Standard Set, Set 1 & Set 2: addition of hydrogen atoms by CACTVS
Set 3: addition of hydrogen atoms by the InChI library
![Page 35: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/35.jpg)
Chemical Structure Database 2013
• calculation of Standard InChIKey
[Diagram: original structure record → structure normalization → parent structures (FICTS, FICuS, uuuuu) → hashcode calculation (E_HASHISY) → NCI/CADD Identifiers; union set of parent structures → Standard InChIKey 1.04, calculated with the flag sets Standard, Set 1, Set 2, Set 3]
![Page 36: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/36.jpg)
Chemical Structure Database 2013
• database schema is entirely implemented in python/
• supports many different database engines: Oracle, PostgreSQL, MySQL
• SQLAlchemy provides:
  • the communication layer with the database engine
  • an object-oriented data model representation of the database on the Python side
• table relationships:
  • either defined by foreign key relationships in the database or specified on the Python level
  • SQLAlchemy creates table joins on the SQL level
![Page 37: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/37.jpg)
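Chemical Structure Database 2013
• SQLAlchemy table definition
    # classical mapping API (SQLAlchemy < 2.0); metadata, schema, Name, name_table,
    # TableRepr and TableInit are defined elsewhere
    from sqlalchemy import CHAR, Column, Integer, Table, Text
    from sqlalchemy.orm import backref, mapper, relationship

    structure_table = Table('structure', metadata,
        Column('id', Integer, primary_key=True, autoincrement=True),
        Column('hash', CHAR(16), unique=True),
        Column('smiles', Text()),
        schema=schema
    )

    class Structure(TableRepr, TableInit):
        __table__ = structure_table

    mapper(Structure, structure_table, properties={
        'name': relationship(Name, backref=backref('structure'),
            primaryjoin=structure_table.c.id == name_table.c.structure_id)
    })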
![Page 38: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/38.jpg)
Chemical Structure Database 2013
• Query the database
    > s = db.session.query(Structure).filter(Structure.id == 1234).one()
    <object “Structure”>
    > s.smiles
    CCO
• if the object-oriented data model representation creates too much overhead, SQLAlchemy supports writing “almost bare” SQL that still follows the Python paradigms
    > q = select([structure_table.c.smiles]).where(structure_table.c.id == 1234)
    > s = q.execute().fetchone()
    (CCO,)
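Both query styles above end up as plain SQL on the database side. The sketch below shows roughly what that SQL looks like, using Python's built-in sqlite3 module instead of SQLAlchemy so that it runs without extra packages. The `name` column of the name table, the sample rows, and the all-zero hash are assumptions for illustration; only `structure.id`, `structure.hash`, `structure.smiles`, and `name.structure_id` appear on the slides.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.executescript("""
    CREATE TABLE structure (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        hash CHAR(16) UNIQUE,
        smiles TEXT
    );
    CREATE TABLE name (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT,                                    -- assumed column
        structure_id INTEGER REFERENCES structure(id)
    );
""")
# Illustrative rows only; the hash is a placeholder, not a real HASHISY value.
connection.execute(
    "INSERT INTO structure (id, hash, smiles) VALUES (?, ?, ?)",
    (1234, "0000000000000000", "CCO"),
)
connection.executemany(
    "INSERT INTO name (name, structure_id) VALUES (?, ?)",
    [("ethanol", 1234), ("ethyl alcohol", 1234)],
)


def smiles_for(structure_id):
    """Single-table lookup, the counterpart of the select() on the slide."""
    row = connection.execute(
        "SELECT smiles FROM structure WHERE id = ?", (structure_id,)
    ).fetchone()
    return row[0] if row else None


def names_for(structure_id):
    """Join along the structure/name relationship defined in the mapper."""
    rows = connection.execute(
        "SELECT name.name FROM structure "
        "JOIN name ON structure.id = name.structure_id "
        "WHERE structure.id = ? ORDER BY name.name",
        (structure_id,),
    ).fetchall()
    return [row[0] for row in rows]
```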
![Page 39: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/39.jpg)
Chemical Structure Database
Goals:
• index any chemical structure that can be referenced in some way or has a known source
• may also include virtual chemistry or generic structure collections
• collect public datasets/databases/structure collections
• normalize them to our standards
• make them available in our public web interfaces and APIs (if we are allowed to)
• no refusal/deletion of structures – curation is performed by “keep the bad and tag it as bad”
track chemical space
![Page 40: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/40.jpg)
NCI/CADD Chemical Web Apps
![Page 41: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/41.jpg)
![Page 42: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/42.jpg)
NCI/CADD Chemical Web Apps
• implemented with jQuery Mobile (1.3.0)
• HTML5
• supports web browsers on major mobile platforms: iOS, Android, BlackBerry, Windows Phone, Windows 8, Palm, Symbian
• supports major desktop web browsers: Google Chrome, Firefox, IE9/10
• WAI-ARIA compliant (W3C specification draft describing accessibility standards of dynamic Web content for people with disabilities)
• services will be optimized for use on tablet-sized touch-screen devices, but not (yet) for smartphone-sized devices (current development is done on an iPad 3)
• all services work on a common platform
![Page 43: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/43.jpg)
![Page 44: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/44.jpg)
![Page 45: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/45.jpg)
![Page 46: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/46.jpg)
![Page 47: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/47.jpg)
![Page 48: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/48.jpg)
![Page 49: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/49.jpg)
chemical structure
prediction of physicochemical properties and activities
Chemical Activity Predictor - GUSAR
![Page 50: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/50.jpg)
Chemical Activity Predictor - GUSAR
GUSAR Software
characteristics:
• chemical structures are represented by QNA descriptors and MNA descriptors
• mathematical algorithm: a unique self-consistent regression algorithm selects the best set of descriptors for a robust and reliable QSAR model
• main developer: Alexey Zakharov
![Page 51: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/51.jpg)
Chemical Activity Predictor - GUSAR
GUSAR Software
comparison was performed on the following data sets:
• ligand–enzyme interactions
• ligand–receptor interactions
• acute toxicity
• interaction with drug-metabolism enzymes
[Bar chart: accuracy (R2 test, axis from 0.00 to 1.00) for CoMFA, CoMSIA, HQSAR, EVA, 2D Cerius2, 3D Cerius2, GOLPE and GUSAR; the bar values are not recoverable from the transcript]
![Page 52: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/52.jpg)
Chemical Activity Predictor - GUSAR
• QSAR-based models created by GUSAR can be used separately from the application
• broad spectrum of chemical/biological activity and property prediction models for small molecules in development:
  • physicochemical properties
  • assessment of toxicity, metabolism and antineoplastic activities
  • HIV-1-related models
• will be available as Web App and programmatic URL API:
http://cactus.nci.nih.gov/chemical/activity/CCOCC/boiling_point
{in_applicability_domain: True, datatype: ‘float’, value: 42.660}
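A client for the planned activity API might look like the sketch below. Everything beyond the URL pattern and the three result fields shown above is an assumption: the names `activity_url`, `Prediction`, and `usable_value` are hypothetical, and since the slide announces the API as upcoming, no response parsing is attempted.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote

ACTIVITY_BASE = "http://cactus.nci.nih.gov/chemical/activity"


def activity_url(structure, endpoint):
    """Build a URL of the form base/structure/endpoint, as on the slide."""
    return f"{ACTIVITY_BASE}/{quote(structure, safe='')}/{endpoint}"


@dataclass
class Prediction:
    """The three fields shown in the slide's example result."""
    value: float
    datatype: str
    in_applicability_domain: bool


def usable_value(prediction: Prediction) -> Optional[float]:
    """Return the value only if the structure lies inside the model's domain."""
    return prediction.value if prediction.in_applicability_domain else None


# The example result from the slide, entered by hand.
example = Prediction(value=42.660, datatype="float", in_applicability_domain=True)
```

Checking the applicability-domain flag before using a value guards against predictions for structures the model was not built for.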
![Page 53: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/53.jpg)
Chemical Activities
Categories | Models | Endpoints
Physicochemical Properties | Physicochemical Models | Boiling point, Density, Flash point, Melting point, Surface tension, Thermal conductivity, Vapor pressure, Viscosity, Water solubility
Biological Activities | HIV-Models | HIV-1 Integrase (Strand Transfer) Inhibitor, HIV-1 Reverse Transcriptase Inhibitor
![Page 54: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/54.jpg)
![Page 55: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/55.jpg)
![Page 56: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/56.jpg)
![Page 57: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/57.jpg)
![Page 58: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/58.jpg)
Activity Endpoints
![Page 59: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/59.jpg)
Activity Endpoints
![Page 60: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/60.jpg)
Activity Endpoints
![Page 61: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/61.jpg)
Activity Endpoints
![Page 62: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/62.jpg)
Prediction Results
GUSAR
• value
• unit
• in applicability domain
• quantitative and qualitative models
![Page 63: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/63.jpg)
Chemical Activity Predictor – GUSAR beta
http://cactus.nci.nih.gov/chemical/apps
![Page 64: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/64.jpg)
Chemical Activity Predictor – GUSAR beta
http://cactus.nci.nih.gov/chemical/apps
![Page 65: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/65.jpg)
![Page 66: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/66.jpg)
Chemical Structure Lookup Service (CSLS)
• first version was released in 2006, development stalled in 2008
• new version will be based on CSDB
• new release planned for 2013
• allows easy lookup of chemical structures within the constituent databases in CSDB
![Page 67: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/67.jpg)
![Page 68: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/68.jpg)
InChI/InChIKey Resolver
![Page 69: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/69.jpg)
InChI/InChIKey Resolver
• “loose coupling” of InChI resolvers provided by different organizations
• central list of resolvers
• each resolver must provide a specific protocol
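The idea of a central list of loosely coupled resolvers can be illustrated as follows. This is a purely hypothetical design, not the protocol discussed by the group: here the "protocol" is simply a callable that takes an InChIKey and returns a structure string or `None`, and the two resolver functions are stand-ins rather than real services.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical protocol: a resolver maps an InChIKey to a result or None.
Resolver = Callable[[str], Optional[str]]


class ResolverRegistry:
    """Central list of independently operated resolvers."""

    def __init__(self) -> None:
        self._resolvers: List[Tuple[str, Resolver]] = []

    def register(self, organization: str, resolver: Resolver) -> None:
        self._resolvers.append((organization, resolver))

    def resolve(self, inchikey: str) -> Dict[str, str]:
        """Ask every registered resolver and collect the answers."""
        answers: Dict[str, str] = {}
        for organization, resolver in self._resolvers:
            try:
                result = resolver(inchikey)
            except Exception:
                continue  # loose coupling: one failing resolver is skipped
            if result is not None:
                answers[organization] = result
        return answers


def _resolver_a(inchikey: str) -> Optional[str]:
    # Stand-in that knows a single key (ethanol, from the lookup slide).
    return "CCO" if inchikey == "LFQSCWFLJHTTHZ-UHFFFAOYSA-N" else None


def _resolver_b(inchikey: str) -> Optional[str]:
    # Stand-in for a resolver that is currently unreachable.
    raise ConnectionError("resolver offline")


registry = ResolverRegistry()
registry.register("organization A", _resolver_a)
registry.register("organization B", _resolver_b)
```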
![Page 70: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/70.jpg)
InChI/InChIKey Resolver
• Evan Bolton (NCBI, NLM, NIH)
• Valery Tkachenko (RSC/ChemSpider)
• Marc Nicklaus (CADD Group, NCI, NIH)
• Steven Bachrach (Trinity University)
• Antony Williams (RSC/ChemSpider)
• Markus Sitzmann (CADD Group, NCI, NIH)
![Page 71: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/71.jpg)
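Chemical Structure Web API
[Diagram, repeated from the overview slide: external web services and other software packages reach the Chemical Structure Web API via HTTP. Components shown behind the API: NCI/CADD web services (including the Chemical Identifier Resolver), CACTVS, OPSIN, and the NCI/CADD Chemical Structure DataBase (CSDB).]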
![Page 72: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/72.jpg)
Chemical Structure Web API
[Diagram, as before with GUSAR added: external web services and other software packages reach the Chemical Structure Web API via HTTP. Components shown behind the API: NCI/CADD web services (including the Chemical Identifier Resolver), CACTVS, OPSIN, GUSAR, and the NCI/CADD Chemical Structure DataBase (CSDB).]
![Page 73: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/73.jpg)
http://cactus.nci.nih.gov/blog
![Page 74: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/74.jpg)
Acknowledgements
NCI/CADD Team
• Alexey Zakharov
• Laura Guasch Pàmies
• Megan Peach
• Marc Nicklaus
Xemistry GmbH, Germany
• Wolf-Dietrich Ihlenfeldt
ChemNavigator
• Scott Hutton
• Tad Hurst
InChI Team
PubChem
All other database providers
![Page 75: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/75.jpg)
Acknowledgments - Software
CACTVS
Python Web Framework
ChemWriter
Python SQL Library
JavaScript library
Peter Ertl (Novartis)
Fulltext Search Engine
![Page 76: ACS Meeting New Orleans 2013 (CINF)](https://reader038.fdocuments.us/reader038/viewer/2022110121/55873c2cd8b42a7b098b4587/html5/thumbnails/76.jpg)
http://cactus.nci.nih.gov