Sheridan Most Common Chemicalreplacements 2002 Ci0100806

Journal of Chemical Information and Computer Sciences is published by the American Chemical Society, 1155 Sixteenth Street N.W., Washington, DC 20036.

The Most Common Chemical Replacements in Drug-Like Compounds
Robert P. Sheridan
J. Chem. Inf. Comput. Sci., 2002, 42 (1), 103-108. DOI: 10.1021/ci0100806. Publication Date (Web): 03 January 2002.

Department of Molecular Systems, RY50SW-100, Merck Research Laboratories, Rahway, New Jersey 07065
Received August 10, 2001

We have written a method that extracts one-to-one replacements of chemical groups in pairs of drug-like molecules with the same biological activity and counts the frequency of the replacements in a large collection of such molecules. There are two variations on the method that differ in their treatment of replacements in rings. This method is one possible approach to systematically identify candidate bioisosteres. Here we look at the MDDR database because it has a large diversity of drug-like compounds in a large number of therapeutic areas. The most frequent replacements in MDDR seem generally consistent with medicinal chemistry intuition about what chemical groups are equivalent or with groups that are easily converted by synthetic or metabolic pathways.
This method can be applied to any set of molecules wherein the molecules can be paired by similar biological activity.

INTRODUCTION
Bioisosterism is the concept that a chemical group in a biologically active compound can be replaced with another such that the new molecule retains the biological activity. The presumption is that the groups to be substituted are similar in some important physical property. For example, a phenyl ring and a thiophene ring are about the same size and both are hydrophobic; carboxylate and tetrazole are both anions at physiological pH, etc. The reader is referred to reviews in this area, e.g. ref 1. Many papers have been written wherein group X2 is substituted for group X1 in molecule M1 to make molecule M2. If M1 and M2 have similar biological activities, the claim is often made that X1 and X2 are bioisosteres. (References 2-4 are recent examples.) This may not be generally true for the following reasons:
1. The substituent X1/X2 may be in an unimportant part of the molecule, i.e., a part that does not make a critical interaction with the receptor.
2. If X1 and X2 are relatively small, M1 and M2 are very similar molecules, and it is not surprising that similar molecules will have similar biological activities.
3. All that can be inferred is that X1 might be equivalent to X2 at that one position and only for that bioactivity.

A strict definition of bioisosterism might require that X1 and X2 be equivalent in a number of properties (hydrophobicity, size, charge, etc.). However, a more liberal definition may be groups that can be substituted for each other in a variety of chemical classes for a variety of bioactivities. Our aim here is to gather statistics on how often groups are substituted for others in drug-like molecules. Replacements that occur often may be worth considering in lead development projects.

METHODS
The overall scheme for counting replacements is shown in Chart 1. Details are provided below.

Clustering. Examining all pairs of molecules with a given biological activity is impractical because the extraction of replaced groups is computationally expensive.
We are interested in pairs of molecules that differ only in one place. Therefore we clustered compounds using a method described previously (ref 5), which uses topological descriptors to calculate the overall similarity of two molecules. Only pairs of compounds within a cluster are examined. For this work we used the regular atom pair descriptor and a cosine similarity cutoff of 0.9.

Extraction of Replaced Parts. This has several steps, which are illustrated with an example in Figure 1.
1. Identify the corresponding atoms of molecules M1 and M2. These are the match atoms.
2. Label the corresponding atoms with corresponding names.
3. Remove the bonds between match atoms.
4. Delete atoms with no bonds. What is left is called a fragment-pair.
5. Filter the fragment-pairs.
By this algorithm, the fragments in each fragment-pair have at least one match atom that is the attachment point to the common parts of the molecules.

Corresponding author phone: (732)594-3859; fax: (732)594-4224; e-mail: [email protected].

For step 1 we use the maximum common substructure detection method in Sheridan and Miller (ref 6). This method, based on clique detection, can generate substructures that are discontinuous. A clique-defined substructure is a set of pairs of atoms, one from M1 and one from M2, such that the paired atoms are of the same atom type and the topological distances (in bonds) between the atoms in M1 are the same as the corresponding distances between the atoms in M2. The score of a common substructure equals the number of atoms in the common substructure minus a discontinuity penalty (p = 1) that penalizes having discontinuous fragments in the substructure. We keep only the highest scoring common substructure (HSCS) for molecules M1 and M2. In the original work we typed atoms based on element, hybridization, and physicochemical type. Here we use only element and hybridization.
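Steps 3 and 4 of the extraction procedure listed above can be sketched for a single molecule as follows. This is a minimal illustration and not the MIX implementation: the function name `extract_fragments`, the atom labels, and the representation of a molecule as a list of bonded atom-name pairs are all assumptions made for the example, and the set of match atoms is taken as given rather than computed by the clique-based method.

```python
from collections import defaultdict


def extract_fragments(bonds, match_atoms):
    """Apply steps 3 and 4 to one molecule.

    bonds: iterable of (atom_a, atom_b) name pairs (hydrogens omitted).
    match_atoms: set of atom names that belong to the common substructure.
    Returns a list of fragments; each is a dict holding the fragment's
    atoms and the subset of them that are match (attachment) atoms.
    """
    # Step 3: remove every bond whose two ends are both match atoms.
    kept = [(a, b) for a, b in bonds
            if not (a in match_atoms and b in match_atoms)]

    # Step 4: atoms with no remaining bonds never enter the adjacency
    # map, which is equivalent to deleting them.
    adjacency = defaultdict(set)
    for a, b in kept:
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Collect the surviving atoms into connected fragments.
    fragments, seen = [], set()
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, atoms = [start], set()
        while stack:
            atom = stack.pop()
            if atom in atoms:
                continue
            atoms.add(atom)
            stack.extend(adjacency[atom] - atoms)
        seen |= atoms
        fragments.append({"atoms": atoms, "attachment": atoms & match_atoms})
    return fragments


# Hypothetical example: an anisole-like molecule in which every atom
# except the oxygen O7 is assumed to be matched to a partner molecule.
ring = ["c1", "c2", "c3", "c4", "c5", "c6"]
bonds_m1 = [(ring[i], ring[(i + 1) % 6]) for i in range(6)]
bonds_m1 += [("c1", "O7"), ("O7", "C8")]
match_m1 = set(ring) | {"C8"}
fragments_m1 = extract_fragments(bonds_m1, match_m1)
```

In this example the only surviving fragment is the oxygen with its two matched neighbors, which corresponds to a group attached to the conserved part of the molecule on two sides. Running the same function on the partner molecule would give the other half of the fragment-pair.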
In the original work, we filtered out HSCSs that were not significantly larger than expected for two randomly selected molecules of the same size. However, for the current application, where size is irrelevant, we keep all HSCSs.

For step 5, we make the following requirements to ensure that there is a one-to-one replacement of one group with another:
a. The labels of the matched atoms have to be in one-to-one correspondence.
b. There is exactly one fragment extracted from M1 and one from M2.
Figure 2 illustrates what kinds of pairs of molecules would produce fragment-pairs that would pass the filter.

We will call the algorithm described above algorithm A. One optional modification is for certain atom matches to be erased after step 2. If there is at least one match atom in a 3-, 4-, 5-, 6-, 7-, or 8-membered ring, all atoms in the ring are declared unmatched. Also, if there is an unmatched atom adjacent to a NO, CO, SO2, or PO2 in which the atoms are matched, the atoms in the NO, CO, SO2, or PO2 are declared unmatched as well. Steps 3-5 are followed as before. We will call this alternative algorithm B. The difference in results is also illustrated in Figure 1. Algorithm B will be justified later on the basis of keeping more of the context of the replaced groups.

Counting the Occurrence of Fragment-Pairs. A hash string is calculated from the connection table of each fragment-pair using a method similar to that of Burden (ref 7). A modified adjacency matrix Q is constructed such that the diagonal elements for atom i are made of the sum

$$Q_{ii} = \text{atomic number} + 0.1 \times (\text{number of non-hydrogen neighbors}) + 0.01 \times (\text{number of electrons}) + 0.001 \times (\text{match state})$$

where the match state is 1 if the atom is a match atom and 0 otherwise. The off-diagonal elements are

$$Q_{ij} = \frac{0.4}{\text{topological distance between } i \text{ and } j}$$

If i and j are from different fragments, the distance is set to an arbitrary high number.

The hash string is a concatenation of the highest and lowest eigenvalues of Q expressed to six decimal places. (It is usually necessary to calculate the eigenvalues in double precision.) The hash string depends only on the atoms and the bonds between them; the order of the atoms and the order of the fragments in the pair is irrelevant. The inclusion of match information in the hash string helps us distinguish fragments that have a match atom (*) at one vs two sides (e.g. C*-O-C* in ether vs C*-O-C in methoxy).

Counting unique fragment-pairs becomes a matter of counting the frequency of unique hash strings over all biological activities. A counting method where the fragment-pairs are weighted as the inverse of the number of fragment-pairs in that activity did not produce significantly different results, at least for the most frequent fragment-pairs.

Figure 1. A schematic diagram of how replaced groups are parsed as a fragment-pair from a pair of molecules. There are two alternative paths (marked A and B) depending on whether match atoms in rings or other groups are declared unmatched after Step 2 (see text). Steps 3-5 are the same thereafter. Bonds connecting match atoms are shown as bold. At the bottom of the figure, the match atoms remaining after processing are indicated by *.

Figure 2. Examples of pairs of molecules where the extracted fragment-pair would pass (yes) or fail (no) the filters for algorithm A. The filters are meant to detect one-to-one replacements of a single chemical group.

Source of Molecules. One source of drug-like compounds is the MDL Drug Data Report (MDDR) (ref 8), a licensed database compiled from the patent literature. A small percentage of molecules in the MDDR are very large (e.g. peptides) and some are very small. Because we want to consider drug-sized molecules, we kept only those molecules within the range of 7-50 non-hydrogen atoms. (This is expedient as well because molecules with >50 atoms tend to slow calculations of maximum common substructure.) Molecules in the MDDR are assigned a therapeutic category by the vendor.
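The hash-string construction described under Counting the Occurrence of Fragment-Pairs can be illustrated with a short sketch. It is not the original MIX code. The names `build_q`, `symmetric_eigenvalues`, and `hash_string` are hypothetical, the value 1000 standing in for the "arbitrary high number" is an assumption, as is the underscore used to join the two eigenvalues. The per-atom "number of electrons" term is passed in by the caller (0 is used as a placeholder below). The eigenvalues are obtained with a plain cyclic Jacobi iteration, chosen here only to keep the example free of external libraries.

```python
import math
from collections import Counter, deque

FAR = 1000.0  # assumed stand-in for the "arbitrary high number"


def build_q(atoms, bonds):
    """atoms: list of (atomic_number, n_electrons, is_match) tuples;
    bonds: list of (i, j) index pairs over the whole fragment-pair."""
    n = len(atoms)
    neighbors = [[] for _ in range(n)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    q = [[0.0] * n for _ in range(n)]
    for i, (z, n_electrons, is_match) in enumerate(atoms):
        # Breadth-first search gives topological distances from atom i.
        dist = {i: 0}
        queue = deque([i])
        while queue:
            cur = queue.popleft()
            for nxt in neighbors[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        q[i][i] = (z + 0.1 * len(neighbors[i]) + 0.01 * n_electrons
                   + 0.001 * int(is_match))
        for j in range(n):
            if j != i:
                # Atoms of the other fragment are unreachable: use FAR.
                q[i][j] = 0.4 / dist.get(j, FAR)
    return q


def symmetric_eigenvalues(matrix, sweeps=100, tol=1e-24):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for r in range(p + 1, n):
                if a[p][r] == 0.0:
                    continue
                diff = a[r][r] - a[p][p]
                theta = (math.pi / 4 if diff == 0.0
                         else 0.5 * math.atan(2.0 * a[p][r] / diff))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and r
                    akp, akr = a[k][p], a[k][r]
                    a[k][p], a[k][r] = c * akp - s * akr, s * akp + c * akr
                for k in range(n):  # rotate rows p and r
                    apk, ark = a[p][k], a[r][k]
                    a[p][k], a[r][k] = c * apk - s * ark, s * apk + c * ark
    return sorted(a[i][i] for i in range(n))


def hash_string(atoms, bonds):
    """Highest and lowest eigenvalue of Q, each to six decimal places."""
    eig = symmetric_eigenvalues(build_q(atoms, bonds))
    return f"{eig[-1]:.6f}_{eig[0]:.6f}"


# Hypothetical fragment-pair C*-O-C* / C*-S-C*, listed in both orders.
bonds = [(0, 1), (1, 2), (3, 4), (4, 5)]
ether_first = [(6, 0, True), (8, 0, False), (6, 0, True),
               (6, 0, True), (16, 0, False), (6, 0, True)]
thioether_first = ether_first[3:] + ether_first[:3]
counts = Counter(hash_string(atoms, bonds)
                 for atoms in (ether_first, thioether_first))
```

Because the eigenvalues of Q do not change when atoms are renumbered, both orderings of the pair should produce the same key, so `counts` tallies them as one unique fragment-pair. Changing the match state of a single carbon (the ether vs methoxy case in the text) shifts the diagonal by 0.001 and therefore changes the key.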
For the purposes of executing Chart 1, we will assume that molecules in the same therapeutic category have the same biological activity. Some therapeutic categories (e.g. antihypertensive) contain molecules that work by different mechanisms, but this is not a problem here because we are looking only at pairs of very similar molecules, and these almost certainly work by the same mechanism.

There are 647 therapeutic categories. A molecule may be in more than one therapeutic category, and some therapeutic categories are nearly synonymous, but we did not make any special compensations for this.

RESULTS
In the MDDR there were 98 445 unique molecules in the size range and 556 therapeutic categories that had at least one pair of similar molecules in that range. A total of 527 985 pairs of molecules were compared. Algorithm A kept 90 095 fragment-pairs, of which 16 536 were unique. Figure 3 shows the distribution of the count of each unique fragment-pair as a function of its rank. This is a log-log relationship, i.e., the counts fall very quickly with rank. Figure 4 shows the size of the replaced groups, measured in the number of non-hydrogen atoms in each unique fragment-pair, as a function of the count. The fragment-pairs with the highest counts are all small, the smallest possible being 4 (one replaced atom + one attachment point times two molecules). On the other hand, the rare fragment-pairs can be small or large.

The top 10 fragment-pairs, plus some other interesting ones, are shown in Figure 5. By drilling down to molecule pairs that contribute to the fragment-pair count, one can get an interpretation of the context of the replacement. Many of these seem to be classical replacements in medicinal chemistry. The most common replacement (labeled A1 in Figure 5) is the replacement of C with N in an aromatic ring. This can occur in phenyl ↔ pyridine, pyridine ↔ pyrazine, pyridine ↔ pyrimidine, etc. The next most common (A2) is -O- ↔ -S-. This can occur in aliphatic chains, aliphatic rings, and aromatic rings.
Similarly, replacement of -N- for -O- can happen not only in chains, aliphatic rings, and aromatic rings, but also in amides ↔ esters. A4 and A5 occur only in aromatic rings. A change from a six- to a five-membered aliphatic ring is A7. The replacement phenyl ↔ thiophene is A8, carbonyl ↔ thiocarbonyl (in ureas and amides) is A26, amide ↔ sulfonamide is A33, phenyl ↔ furan is A40, and phenyl ↔ pyrrole is A76. Some replacements (e.g. A30, A90, A99) appear to be moving heteroatoms around a ring. A95 is the reversal of an amide bond. We were surprised to see that replacements such as A11, A15, and A18 occur with a high frequency, but in retrospect they may not be so surprising. For instance, the reduction of a ketone to an alcohol (A15) is a common metabolic transformation.

We need to go fairly far down the list to see replacement of charged groups; for instance, A115 and A132 represent anionic replacements. We expected to see carboxylate ↔ tetrazole as an anionic substitution, but a one-to-one replacement is surprisingly rare; it occurs only 10 times in the MDDR, giving it a rank of A881. Replacements of cations also occur far down the list. For instance, -CH2-guanidine ↔ -CH2-NH3 (as in the amino acids Arg and Lys) occurs 21 times, making it A429. Guanidine ↔ amidine is A873.

Figure 3. The count of a unique fragment-pair vs the rank in decreasing count.

Figure 4. The number of non-hydrogen atoms in a unique fragment-pair vs its count.

The fragment-pairs in Figure 5 are generally small and do not contain much information about the context of the replacement, especially with regard to rings. That is why we used an optional step that involved unmatching ring atoms and esters, thioesters, phosphonates, etc. Using the option in algorithm B, there were 116 060 total fragment-pairs, of which 18 275 were unique.
Since more of the context is preserved, including substituents on the rings, we expected that there would be more unique fragment-pairs, but there are more total fragment-pairs as well. This is due to the fact that in algorithm A changes in two places in a single ring, e.g. cyclohexane ↔ morpholine, are rejected by the filter, whereas algorithm B would treat cyclohexane ↔ morpholine as a single change. The statistics for algorithm B look qualitatively very much like those of algorithm A shown in Figures 3 and 4, except that the counts are somewhat smaller, and the number of atoms in an average fragment-pair is somewhat larger, as expected.

The most frequent fragment-pairs plus others are shown in Figure 6. Many replacements are the same as with algorithm A, but the ranks have changed. We can now distinguish between acyclic and ring replacements. The most common are now small acyclic replacements. Phenyl ↔ benzyl (B9) is the most common replacement involving a ring. The next most common ring replacement is phenyl ↔ thiophene (B11). This time, since the context is retained, we see that the more common replacement is 2-thiophene; 3-thiophene does not show up until B75. Similarly, we see the most common phenyl ↔ pyridine replacement is 3-pyridine (B20). We can now see the explicit replacement of amide ↔ ester (B19) and urea ↔ carbamate (B62) instead of the more generic -N- ↔ -O-.

Figure 5. Selected unique fragment-pairs from the MDDR, algorithm A. For each, the rank and its count are given; e.g. A1 is the most frequent fragment-pair, with a count of 2855. A * indicates a match atom (the connection point to the conserved parts of the molecules). The order of the two fragments in each pair is arbitrary since the replacements are treated as symmetrical. The symbol ooo indicates one or more pairs were skipped.

Figure 6. Selected unique fragment-pairs from the MDDR, algorithm B.

Besides looking at MDDR as a whole, one can look at a subset of activities, or a single activity, with the caveat that the counts are much smaller and some of the replacements are more likely to be associated with a particular position around a core. For instance, Figure 7 shows the algorithm B results for the activity Angiotensin II Blockers. There are 1304 compounds with 7-50 non-hydrogen atoms, 3468 replacements, of which 3132 are unique. Many of the more frequent replacements (B1-B5) are generic in the sense that they are also frequent for the MDDR as a whole. Further down the list (e.g. B9 and B10) we see activity-specific replacements. This is true for many of the individual activities we examined.

DISCUSSION
We have presented a method of identifying frequently replaced groups in drug-like molecules. Of the two variations, algorithm B appears to be more appealing because it retains more of the contextual information of the replacement. This method can be applied to any set of molecules wherein the molecules can be paired by similar biological activities.

There are certain limits to our MCS algorithm in determining equivalent groups, and many further refinements are possible. First, clique-based methods such as ours depend on having matched topological distances. Thus, molecules of the form R1-X1-R2 and R1-X2-Y2-R2 could not be matched at R1 and R2 because they are different distances apart in the two molecules, so the -X1- ↔ -X2-Y2- replacement would not be detected. Second, the hash method of identifying unique fragment-pairs is very efficient, but it is rather stringent. A single atom change is enough to distinguish fragment-pairs (e.g. phenyl ↔ 2-pyridine is different from phenyl ↔ 2-pyrimidine), but it is not clear whether that level of discrimination is always useful. For example, many chemists would not consider phenyl ↔ 2-pyridine conceptually different from phenyl ↔ 3-methyl-2-pyrimidine. A clustering based on similarity rather than strict identity could consolidate many low-frequency fragment-pairs.
Finally, more fragment-pairs could be generated if we removed the constraint that a single group be replaced by another. Allowing double substitutions (e.g. the pair in the lower right of Figure 2) might provide additional insight.

It should be emphasized that having a list of replacements and their frequencies is not the same as having a list of definitive bioisosteres. That groups are often substituted does not necessarily mean the groups are physiologically equivalent. For instance, the ketone ↔ alcohol replacement may be so common because of the ease of transformation. Also, since our approach is retrospective, it can highlight only those replacements that have already been made many times in some collection of molecules. That is, chemical groups that might be equivalent to a receptor, but do not appear in a one-to-one substitution in the database, cannot be detected.

That said, we note that we have probably captured at least some flavor of bioisosterism from the MDDR. The most frequently made replacements in MDDR correspond to widely known replacements in medicinal chemistry (ref 1), and generally the replacements make physical sense in terms of size and bond angle, although the replacements are less often equivalent in hydrogen bonding character. Conversely, at least some groups that are known to be equivalent in some property, e.g. carboxylate and tetrazole, are in the list, although they are perhaps not as frequently substituted in the MDDR as might be expected. It could be argued that, since the MDDR is derived from patent literature, we may be seeing the reflection of already established medicinal chemistry intuition about what groups are equivalent. Thus it is not at all surprising that our results would be consistent with such intuition.
Even if this is true, however, it is useful to be able to systematize such intuition by an automatic method.

One goal here is to systematically detect and organize replacements as a resource to be mined for synthetic ideas. The most frequent replacements in MDDR are not necessarily the most interesting, because they are mostly small and already well-established. More insight might be derived from replacements that are relatively infrequent but show up enough times (say 10 or more) and in enough different therapeutic areas that one can have confidence that they are real. For instance, the carboxylate ↔ tetrazole replacement is infrequent, but it does occur in unrelated molecules in at least three different therapeutic areas. Some examples are shown in Figure 8.

Figure 7. Selected unique fragment-pairs from only the molecules with Angiotensin II Blocker activity, algorithm B.

Figure 8. Examples from the MDDR where a carboxylate ↔ tetrazole replacement is made in different therapeutic categories.

Although the MDDR is a very valuable database in that it contains many diverse molecules in many different therapeutic areas, making it nearly ideal for the work presented here, it has the limitation that the activity data may not always be reliable. For instance, compounds may be claimed in a patent to have a specific activity, but the activity may not be quantitative, and thus not always comparable to compounds with a similar claimed activity from another laboratory. Also, whether the activity is in vivo or in vitro is not always consistent. A very useful approach might be to use our method to extract frequent replacements from smaller databases with more consistent measures of activity. For instance, given sets of IC50 data, one could pair only those molecules from the same binding assay that have IC50s within a factor of 10. The frequent replacements from those pairs would more closely reflect bioisosterism for those specific activities.

ACKNOWLEDGMENT
The author thanks Dr. Scott Berk, Dr.
James Doherty, and Dr. Arthur Patchett for useful comments. The tools for this work were written in MIX, Merck's in-house modeling system, and the author thanks the other members of the MIX team.

REFERENCES AND NOTES
(1) Wermuth, C. G. Molecular variations based on isosteric replacements. In The Practice of Medicinal Chemistry; Wermuth, C. G., Ed.; 1996; pp 202-237.
(2) van Vliet, L. A.; Rodenhuis, N.; Dijkstra, D.; Wikstrom, H.; Pugsley, T. A.; Serpa, K. A.; Meltzer, L. T.; Heffner, T. G.; Wise, L. D.; Lajiness, M. E.; Huff, R. M.; Svensson, K.; Sundell, S.; Lundmark, M. Synthesis and pharmacological evaluation of thiopyran analogues of the dopamine D-3 receptor selective agonist. J. Med. Chem. 2000, 43, 2871-2882.
(3) Balsamo, A.; Macchia, M.; Martinelli, A.; Rossello, A. The [(methyloxy)imino]methyl moiety (MOIMM) in the design of a new type of beta-adrenergic blocking agent. Eur. J. Med. Chem. 1999, 34, 283-291.
(4) Mederski, W. W. K. R.; Osswald, M.; Dorsh, D.; Anzali, S.; Christadler, M.; Schmiteges, C.-J.; Wilm, C. Endothelin antagonists: evaluation of 2,1,3-benzothiadiazole as a methylendioxyphenyl bioisostere. Bioorg. Med. Chem. Lett. 1998, 8, 17-22.
(5) Sheridan, R. P. The centroid approximation for mixtures: calculating similarity and deriving structure-activity relationships. J. Chem. Inf. Comput. Sci. 2000, 40, 1456-1469.
(6) Sheridan, R. P.; Miller, M. D. A method for visualizing recurrent topological substructures in sets of active molecules. J. Chem. Inf. Comput. Sci. 1998, 38, 915-924.
(7) Burden, F. R. A chemically intuitive molecular index based on the eigenvalues of a modified adjacency matrix. QSAR 1997, 16, 309-314.
(8) Molecular Design Drug Data Report, version 99.1, distributed by Molecular Design Ltd.: San Leandro, CA.
