Understanding Smiles
Uploaded by abhik-seal
SMILES Review
SMILES RULES
There are six generic SMILES encoding rules, corresponding to the specification of atoms, bonds, branches, ring closures, disconnections, and isomerism.
Basics of SMILES
• SMILES specifically represents a valence model of a molecule, not a computer data structure, a mathematical abstraction, or an “actual substance”.
• The function of SMILES is to clearly represent a particular valence model, not to dictate which one should be used. For example, one chemist might represent nitromethane as C[N+](=O)[O-], with a nitrogen of valence 4 in a charge-separated structure, whereas another might represent it as CN(=O)=O, with a neutral five-valent nitrogen. Both are correct.
QUIZ: Is CN([O])[O] a correct SMILES for nitromethane?
• SMILES represents a chemist’s model of molecules, not a computer scientist’s model of a chemical data structure.
• SMILES grammar is such that it may be canonicalized, i.e., among all possible valid SMILES for a given molecule or reaction, a single, canonical (unique) SMILES can be chosen.
1. Atom Specifications
• The SMILES atom specification sublanguage represents the atomic properties: element identity, isotope, formal charge, and implicit hydrogen count.
• Elements in the "organic subset" B, C, N, O, P, S, F, Cl, Br, and I may be written without brackets if the number of attached hydrogens conforms to the lowest normal valence consistent with explicit bonds. "Lowest normal valences" are B (3), C (4), N (3,5), O (2), P (3,5), S (2,4,6), and 1 for the halogens.
• Atoms in aromatic rings are specified by lower case letters, e.g., aliphatic carbon is represented by the capital letter C, aromatic carbon by lower case c.
• The symbol ‘*’ (“star” or “asterisk”) is treated by SMILES as a valid atomic symbol meaning “unspecified atomic number” and is represented as an atom of atomic number zero.
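
The "lowest normal valence consistent with explicit bonds" rule can be sketched in a few lines of Python. This is only an illustration: the table is copied from the bullet above, the function name `implicit_hydrogens` is made up for this example, and aromatic (lower case) atoms are not handled.

```python
# Normal valences of the organic subset, as listed in the slide text.
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,),
    "P": (3, 5), "S": (2, 4, 6),
    "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(element, bond_order_sum):
    """Hydrogens implied for an unbracketed aliphatic atom.

    bond_order_sum is the sum of the orders of the bonds written explicitly
    in the SMILES (single = 1, double = 2, triple = 3).
    """
    for valence in NORMAL_VALENCES[element]:
        if valence >= bond_order_sum:
            # Lowest normal valence that accommodates the explicit bonds.
            return valence - bond_order_sum
    return 0  # explicit bonds already exceed every normal valence


print(implicit_hydrogens("P", 0))  # P alone is phosphine, PH3
print(implicit_hydrogens("S", 0))  # S alone is hydrogen sulfide, H2S
print(implicit_hydrogens("N", 5))  # the nitrogen in CN(=O)=O carries no hydrogen
```

Under this rule a nitrogen with four explicit bond orders falls back to valence 5 and receives one hydrogen, which is why charged or unusual valence states are normally written in brackets instead.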
Examples of Atom Specifications

| Structure | SMILES | Name | Remarks |
| --- | --- | --- | --- |
| S | [S] | Elemental sulfur | Defaults inside brackets: mass unspecified, charge 0, hcount 0. |
| Au | [Au] | Elemental gold | The second character of two-character symbols is lower case. |
| PH3 | P | Phosphine | Lowest normal valence of phosphorus is 3. |
| OH- | [OH-] or [OH-1] | Hydroxide anion | If the charge value is missing, 1 is assumed, i.e., ‘+’ is equivalent to ‘+1’ and ‘-’ is equivalent to ‘-1’. |
| Fe2+ | [Fe+2] or [Fe++] | Iron(II) cation | The charge sign may be repeated or have a signed value, e.g., ‘++’ is equivalent to ‘+2’. |
| 235U | [235U] | Uranium-235 | A leading integer represents a specified atomic mass. |
| H2S | S | Hydrogen sulfide | Lowest normal valence of sulfur is 2. |
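
The bracket-atom conventions in the table (optional isotope, element symbol, hydrogen count, charge) can be checked with a small regular expression. The sketch below is a simplification written for this review, not a full SMILES grammar: `parse_bracket_atom` is a hypothetical helper, and chirality marks, atom classes, and two-letter aromatic symbols are deliberately left out.

```python
import re

# [isotope] symbol [H count] [charge], all inside square brackets.
BRACKET_ATOM = re.compile(
    r"\[(?P<isotope>\d+)?"
    r"(?P<symbol>[A-Z][a-z]?|[a-z]|\*)"
    r"(?P<hcount>H\d*)?"
    r"(?P<charge>[+-]\d+|\++|-+)?"
    r"\]"
)


def parse_bracket_atom(text):
    """Split a bracket atom such as '[OH-]' into its atomic properties."""
    match = BRACKET_ATOM.fullmatch(text)
    if match is None:
        raise ValueError(f"not a supported bracket atom: {text!r}")

    isotope = int(match["isotope"]) if match["isotope"] else None

    hcount = 0  # default inside brackets: no hydrogens
    if match["hcount"]:
        hcount = int(match["hcount"][1:] or 1)  # a bare 'H' means one hydrogen

    charge = 0  # default inside brackets: neutral
    if match["charge"]:
        text_charge = match["charge"]
        sign = 1 if text_charge[0] == "+" else -1
        digits = text_charge.lstrip("+-")
        # '+2' carries a number; '++' is counted by its repeated signs.
        charge = sign * (int(digits) if digits else len(text_charge))

    return {"symbol": match["symbol"], "isotope": isotope,
            "hcount": hcount, "charge": charge}


for example in ("[S]", "[OH-]", "[OH-1]", "[Fe+2]", "[Fe++]", "[235U]"):
    print(example, parse_bracket_atom(example))
```

Both spellings of hydroxide and both spellings of the iron(II) cation parse to identical property dictionaries, which is the equivalence the Remarks column describes.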
2. Bond Specifications
• Single, double, triple, and aromatic bonds are represented in SMILES by the symbols -, =, #, and :, respectively. Adjacent atoms without an intervening symbol are connected by a valence-dictated bond (typically a single or aromatic bond). “-” (single) and “:” (aromatic) bond symbols may always be omitted on input.
• There is no “preferred” or “correct” ordering in SMILES, e.g., CCO and OCC are equally valid SMILES for ethanol.

| SMILES | Name | Formula |
| --- | --- | --- |
| CC | Ethane | CH3CH3 |
| C=C | Ethene | CH2=CH2 |
| C#N | Hydrogen cyanide | HCN |
| CCO | Ethanol | CH3CH2OH |
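
As a rough illustration of the bond symbols and of the "no preferred ordering" point, the sketch below reads an unbranched, acyclic, aliphatic SMILES and lists its bonds. The helper `chain_bonds` is invented for this example and only covers organic-subset atoms with the -, =, and # symbols. Comparing the sorted bond lists is enough for these small cases but is not a general test of molecular identity.

```python
BOND_ORDERS = {"-": 1, "=": 2, "#": 3}
ORGANIC_SUBSET = {"B", "C", "N", "O", "P", "S", "F", "Cl", "Br", "I"}


def chain_bonds(smiles):
    """Bonds of a linear SMILES as sorted (element, element, order) triples."""
    bonds = []
    previous = None  # element symbol of the last atom read
    order = 1        # an omitted bond symbol is taken as a single bond here
    i = 0
    while i < len(smiles):
        if smiles[i] in BOND_ORDERS:
            order = BOND_ORDERS[smiles[i]]
            i += 1
            continue
        # Try the two-letter organic-subset symbols before single letters.
        symbol = smiles[i:i + 2] if smiles[i:i + 2] in ("Cl", "Br") else smiles[i]
        if symbol not in ORGANIC_SUBSET:
            raise ValueError(f"unsupported symbol {symbol!r} in {smiles!r}")
        if previous is not None:
            first, second = sorted((previous, symbol))
            bonds.append((first, second, order))
        previous, order = symbol, 1
        i += len(symbol)
    return sorted(bonds)


print(chain_bonds("CCO"))  # same bonds as ...
print(chain_bonds("OCC"))  # ... the reversed spelling of ethanol
print(chain_bonds("C#N"))  # one triple bond between C and N
```

Writing the single bond explicitly, as in C-C, gives the same result as CC.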
3. Branching
Branches are specified in SMILES by enclosing them in parentheses, which may be nested or stacked. The first three rules (atoms, bonds, branching) allow the specification of any acyclic molecule.

| SMILES | Name |
| --- | --- |
| CC(C)C(=O)O | Isobutyric acid |
| FC(F)F or C(F)(F)F | Fluoroform |
| ?? | Perchlorate anion |
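
Parentheses map naturally onto a stack: an opening parenthesis remembers the atom the branch hangs from, and a closing parenthesis returns to it. The following sketch assumes single-letter organic-subset atoms only (so no Cl, Br, brackets, or rings). The names `parse_branched` and `bond_summary` are hypothetical helpers for this illustration.

```python
def parse_branched(smiles):
    """Return (atoms, bonds) for an acyclic SMILES with parentheses.

    atoms is a list of element symbols; bonds is a list of
    (atom_index, atom_index, order) triples.
    """
    orders = {"-": 1, "=": 2, "#": 3}
    atoms, bonds = [], []
    branch_points = []  # stack of atoms to return to when a branch closes
    previous = None     # index of the atom the next atom will bond to
    order = 1
    for ch in smiles:
        if ch == "(":
            branch_points.append(previous)
        elif ch == ")":
            previous = branch_points.pop()
        elif ch in orders:
            order = orders[ch]
        elif ch in "BCNOPSFI":
            atoms.append(ch)
            index = len(atoms) - 1
            if previous is not None:
                bonds.append((previous, index, order))
            previous, order = index, 1
        else:
            raise ValueError(f"unsupported character {ch!r}")
    return atoms, bonds


def bond_summary(smiles):
    """Sorted (element, element, order) triples, independent of atom order."""
    atoms, bonds = parse_branched(smiles)
    return sorted(tuple(sorted((atoms[a], atoms[b]))) + (order,)
                  for a, b, order in bonds)


print(parse_branched("CC(C)C(=O)O"))
print(bond_summary("FC(F)F") == bond_summary("C(F)(F)F"))
```

For isobutyric acid the second carbon ends up bonded to three other carbons, and the carboxyl carbon carries one double-bonded and one single-bonded oxygen.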
4. Ring Specifications
Taken from the Handbook of Chemoinformatics (J. Gasteiger):
A useful way of thinking about SMILES ring specification is as follows. There is a graph theorem that says, “There is always a way of breaking one bond per ring in a connected molecule which leaves you with a still-connected but acyclic molecule.” (Actually, graph theoreticians talk about “graphs” instead of “molecules” and “edges” instead of “bonds”, but if they thought about chemistry, that is how they might say it.) Pick one bond in each ring in this way, numbering them in any order. Break the numbered bonds, appending the bond number to the atoms on the ends of the bonds so broken. The remaining acyclic structure can then be written with the first three rules, and the matching numbers mark where each ring closes.
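
One way to pick the bonds to break is to build a spanning tree by graph traversal: every bond the traversal does not use is a ring-closure bond. The sketch below assumes the molecule is already given as a list of atom-index pairs with no duplicate bonds. `ring_closure_bonds` is a name chosen for this illustration, and the particular bonds it selects depend on traversal order, just as the slide says the choice is free.

```python
def ring_closure_bonds(n_atoms, bonds):
    """Return the bonds left out of a spanning forest of the molecular graph.

    bonds is a list of (atom_index, atom_index) pairs. Breaking the returned
    bonds leaves every fragment connected but acyclic.
    """
    neighbors = {atom: [] for atom in range(n_atoms)}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    visited = set()
    tree_bonds = set()
    for start in range(n_atoms):  # one traversal per disconnected fragment
        if start in visited:
            continue
        visited.add(start)
        pending = [start]
        while pending:
            atom = pending.pop()
            for nxt in neighbors[atom]:
                if nxt not in visited:
                    visited.add(nxt)
                    tree_bonds.add(frozenset((atom, nxt)))
                    pending.append(nxt)

    return [bond for bond in bonds if frozenset(bond) not in tree_bonds]


# Cyclohexane: six atoms in one ring.
cyclohexane = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
# Naphthalene skeleton: ten atoms, two fused rings sharing the 2-7 bond.
naphthalene = [(i, i + 1) for i in range(9)] + [(9, 0), (2, 7)]

print(len(ring_closure_bonds(6, cyclohexane)))   # one ring-closure digit needed
print(len(ring_closure_bonds(10, naphthalene)))  # two ring-closure digits needed
```

The count always equals bonds minus atoms plus the number of fragments, so an acyclic chain needs no ring-closure digits at all.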
Examples

| SMILES | Name |
| --- | --- |
| C1CCCCC1 | Cyclohexane |
| C1CCC=CC1 or C1=CCCCC1 or C1CCCC=C1 | Cyclohexene |
| c1cc2ccccc2cc1 or c12c(cccc1)cccc2 or C1=CC2=C(C=C1)C=CC=C2 | Naphthalene |
| ?? | Biphenyl |
5. Disconnections
• The ‘.’ (“period” or “dot”) is used in SMILES to represent disconnections. In terms of the valence model being represented, the dot literally represents a bond of formal order zero: the atoms on either side of the dot are explicitly not bonded to each other.
• It is often a surprise to SMILES-parser implementers that c1cc([O-].[Na+])ccc1 is a valid synonym for [Na+].[O-]c1ccccc1. Because bonds can be specified with “ring closures”, not all SMILES which contain dots are disconnected, nor are all SMILES which contain ring closures cyclic. Although somewhat perverse, C1.O2.C12 is a valid SMILES for ethanol.
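
The two surprising cases above can be checked with a small connectivity-only parser: a dot simply means "do not bond the next atom to the previous one", while ring-closure digits add bonds regardless of dots. This is a minimal sketch under several assumptions: bracket atoms are kept as opaque tokens, only single-digit ring closures are supported, and bond orders are ignored. `parse_connectivity` and `count_components` are hypothetical names.

```python
def parse_connectivity(smiles):
    """Return (atoms, bonds) with bonds as pairs of atom indices."""
    atoms, bonds = [], []
    branch_points = []  # atoms to return to when a branch closes
    open_rings = {}     # ring-closure digit -> atom that opened it
    previous = None
    i = 0
    while i < len(smiles):
        if smiles[i] == "[":
            token = smiles[i:smiles.index("]", i) + 1]
        elif smiles[i:i + 2] in ("Cl", "Br"):
            token = smiles[i:i + 2]
        else:
            token = smiles[i]
        i += len(token)

        if token == "(":
            branch_points.append(previous)
        elif token == ")":
            previous = branch_points.pop()
        elif token == ".":
            previous = None  # the next atom is not bonded to the last one
        elif token in "-=#:":
            continue         # bond order does not affect connectivity
        elif token.isdigit():
            if token in open_rings:
                bonds.append((open_rings.pop(token), previous))
            else:
                open_rings[token] = previous
        else:
            atoms.append(token)
            if previous is not None:
                bonds.append((previous, len(atoms) - 1))
            previous = len(atoms) - 1
    return atoms, bonds


def count_components(smiles):
    """Number of disconnected fragments, found with a small union-find."""
    atoms, bonds = parse_connectivity(smiles)
    parent = list(range(len(atoms)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in bonds:
        parent[find(a)] = find(b)
    return len({find(x) for x in range(len(atoms))})


print(count_components("c1cc([O-].[Na+])ccc1"))  # phenoxide plus sodium
print(count_components("[Na+].[O-]c1ccccc1"))    # the conventional spelling
print(count_components("C1.O2.C12"))             # dots, yet one fragment
```

Both sodium phenoxide spellings give two fragments, and C1.O2.C12 gives a single three-atom fragment with one C-C and one C-O bond, which is the heavy-atom skeleton of ethanol.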
6. Isomerism
• SMILES provides for four types of specification which are so important to the molecular model that they are included even though they are outside the valence model. They are: isotopism, orientation about double bonds, stereo specification, and (for reactions) reactant-product atom mapping. These are collectively known as “isomeric SMILES”.
For details, see http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html