
Talanta 67 (2005) 182–186

Simple and highly predictive QSAR method: application to a series of (S)-N-[(1-ethyl-2-pyrrolidinyl)methyl]-6-methoxybenzamides

Matheus P. Freitas, Jose A. Martins*

EMS Sigma Pharma R&D, Rodovia SP 101 Km 08, Hortolândia SP 13186-401, Brazil

Received 25 November 2004; received in revised form 2 February 2005; accepted 14 February 2005
Available online 23 March 2005

Abstract

A simple quantitative structure–activity relationship (QSAR) method of analysis used to predict biological activity for congeneric series of compounds is reported. This method is based on the application of bilinear or multilinear partial least squares regression to a data set, which is a binary matrix representing the substituents of a framework. It is appraised here for a series of (S)-N-[(1-ethyl-2-pyrrolidinyl)methyl]-6-methoxybenzamides, compounds with affinity towards the dopamine D2 receptor subtype, and showed high predictive ability, even when compared to a refined three-dimensional (3D) approach.
© 2005 Elsevier B.V. All rights reserved.

Keywords: QSAR analysis; PLS; (S)-N-[(1-ethyl-2-pyrrolidinyl)methyl]-6-methoxybenzamides

* Corresponding author. Tel.: +55 19 3887 9353; fax: +55 19 3887 9889.
E-mail address: [email protected] (J.A. Martins).

0039-9140/$ – see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.talanta.2005.02.016

1. Introduction

Quantitative structure–activity relationship (QSAR) analyses are mostly performed by applying methodologies based on two-dimensional (2D) and three-dimensional (3D) computing approaches to predict biological activities. These have provided outstanding results, especially when using the known CoMFA [1], CoMSIA [2] and GRID [3] models to generate descriptors, instead of relying on experimental testing and, sometimes, on intuition to predict which compound could have a particular activity, thus allowing new drugs to be developed in a faster and less costly manner [4]. However, such computational methods require an exhaustive conformational and alignment treatment. Accordingly, alternative methods, which provide rapid analysis and results as reliable as the most sophisticated methodologies available today, but which are inexpensive and easy to handle, are of immediate and worldwide interest.

Promising attempts to simplify and offer advantages over 2D and 3D methods have been emerging [5], as they were developed in the past [6], but obtaining comparable results is still an unsolved problem. A QSAR method based on multivariate image analysis developed by us has presented encouraging results as a 2D technique [7], and it was an effort to set precedents in the course of developing new, simple QSAR methods. The present paper is focused on the development of a simple method for QSAR analysis, capable of predicting biological activity, or any other parameter, with high correlation coefficients and good statistics. Two partial least squares regression methods, namely PLS [8] and N-PLS [9], were used here to assess the calibration models. Using the PLS regression method, an unfolded X-matrix, where each row contains the variables (the binary descriptors) describing each molecule, is decomposed into a score vector (s1) and a weight vector (w1), and s1 is determined to have the property of maximum covariance with the dependent variable y. The score vectors then replace the original variables as regressors. Using the multilinear PLS (N-PLS), the unfolding step is not necessary and the decomposition is performed directly in the three-way matrix. It was shown that N-PLS is more stable than bilinear PLS, i.e. traditional PLS, since it is supposed to increase the predictive ability and improve the interpretation of the results in 3D QSAR. A detailed account of the N-PLS regression method may be found elsewhere [9].
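As a rough illustration of the first PLS step outlined in the Introduction, the sketch below computes one weight vector w1 and one score vector s1 for a small, made-up binary X-matrix. The data, the function names and the use of plain Python lists are assumptions for illustration only; this is not the authors' implementation, and it covers only the first component of a bilinear PLS model.

```python
import math

def center_columns(X):
    """Subtract each column mean from a matrix given as a list of rows."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in X]

def first_pls_component(Xc, yc):
    """Return (w1, s1) for column-centered Xc and mean-centered yc.

    w1 is the unit vector proportional to Xc^T yc; among all unit weight
    vectors it yields the scores s1 = Xc w1 with maximum covariance with yc.
    """
    n_vars = len(Xc[0])
    xty = [sum(row[j] * yi for row, yi in zip(Xc, yc)) for j in range(n_vars)]
    norm = math.sqrt(sum(v * v for v in xty))
    w1 = [v / norm for v in xty]
    s1 = [sum(x * w for x, w in zip(row, w1)) for row in Xc]
    return w1, s1

def covariance(a, b):
    """Sample covariance of two already centered sequences."""
    return sum(x * y for x, y in zip(a, b)) / (len(a) - 1)

# Hypothetical toy data: 4 molecules, 3 binary descriptors, made-up activities.
X = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
y = [7.5, 8.1, 6.9, 8.8]

Xc = center_columns(X)
y_mean = sum(y) / len(y)
yc = [v - y_mean for v in y]

w1, s1 = first_pls_component(Xc, yc)
print("w1 =", [round(v, 3) for v in w1])
print("cov(s1, y) =", round(covariance(s1, yc), 3))
```

In a full PLS model the X-matrix would be deflated after each component and the step repeated; that part is left out here to keep the sketch short.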

M.P. Freitas, J.A. Martins / Talanta 67 (2005) 182–186

Our proposed method is appraised in detail here for a series of (S)-N-[(1-ethyl-2-pyrrolidinyl)methyl]-6-methoxybenzamides (Fig. 1), compounds that possess high affinity and selectivity towards dopamine D2 receptors, and which may be used in the treatment of psychiatric disorders, such as schizophrenia. The results from our modeling are compared with those from a reported study [10], where a multilinear PLS algorithm was utilized as the regression method in 3D QSAR and the GRID program [3] was used for generation of descriptors. That work rendered notable correlation results, and thus is a suitable reference for comparison with our purpose.

Fig. 1. Structure of (S)-N-[(1-ethyl-2-pyrrolidinyl)methyl]-6-methoxybenzamides (1–58). R2 = H and OH; R3 = H, Me, Et, n-Pr, n-Bu, OMe, SMe, NO2, F, Cl, Br and I; R5 = H, Me, Et, n-Pr, OMe, NO2, F, Cl and Br.

2. Methods

The key to this analysis is to build the matrices. Descriptors for each substituent of (S)-N-[(1-ethyl-2-pyrrolidinyl)methyl]-6-methoxybenzamides were built and inserted in the corresponding position of a template table of 9 rows by 38 columns. This table reserves the first 10 columns for the 2-position substituent, the next 14 columns for the 3-position substituent and the remaining 14 columns for the 5-position substituent, and it may be built by using a word processor; the data may then be treated using any appropriate software (for example, Matlab [11]). The tables corresponding to the 58 compounds are grouped to form a calibration set of 40 × 9 × 38 and a test set of 18 × 9 × 38 (N-PLS model), which are subsequently unfolded to form a new calibration set of 40 × 342 and a test set of 18 × 342 (PLS model). In the present analysis, the only data preprocessing applied to our data was column mean-centering. For the three-way array, the matrix was unfolded, column mean-centered and then back-folded before regression analysis.

The quality of the calibrations was quantified with R2 and the external predictions with Q2, the squared correlation coefficients of the linear regression of experimental versus predicted pIC50 plots for the calibration and test sets, respectively. The F-statistic and t-test values were also obtained to evaluate the model quality. The leave-one-out cross-validations were performed with the NIPALS algorithm [12] and the minimum cumpress (cumulative predictive residual error sum of squares) values were obtained for 5 PLS and 7 N-PLS latent variables (LV).

3. Results and discussion

A priority in performing computer-assisted QSAR analysis is to calculate descriptors suitable to correlate structures with the corresponding activities. Three-dimensional descriptors, which describe non-covalent interactions, are the ones most usually applied for such purpose. However, their generation requires previous molecular treatment, such as conformational screening and ligand alignment. On the other hand, our proposed methodology avoids this time-consuming requirement and may be easily performed. Here, the descriptors form a suggestive binary table, in which the ones draw the element symbol or group representation, while the remaining blanks of the table are filled with zeros. Each molecule of a congeneric series has a table of m rows by n columns, and the columns are divided according to the number of substituents. For instance, for a three-substituted aromatic ring, e.g. 1-chloro-2-nitro-4-methylthio-benzene, a 9 × 42 table may be built, whose columns are divided into 3 sets of 14 columns, as illustrated in Fig. 2.

Fig. 2. Table of descriptors for 1-chloro-2-nitro-4-methylthio-benzene. Note that ones designate, as a drawing, element symbols and groups, while zeros fill out the blanks.

A similar procedure is described here for a series of (S)-N-[(1-ethyl-2-pyrrolidinyl)methyl]-6-methoxybenzamides, and the provided data are then matricized to two-way and three-way arrays, in order to run bilinear (PLS) and multilinear (N-PLS) partial least squares regression, respectively, and consequently appraise the predictive ability of the model by correlating the fitted and predicted activities (pIC50) with the corresponding experimental values. Comparison with predictions from literature [10] is also provided to better evaluate the quality of our model.

A 9 × 38 table (10 columns for the 2-position substituent and 14 columns for both the 3- and 5-position substituents) is built analogously to Fig. 2 for 58 (S)-N-[(1-ethyl-2-pyrrolidinyl)methyl]-6-methoxybenzamides, which were divided into a calibration set of 40 compounds and a test set of 18 compounds, identically as previously reported [10]. Two models are considered here: Model 1, consisting of three-way matrices to run N-PLS, i.e. a 40 × 9 × 38 matrix for the calibration set and an 18 × 9 × 38 matrix for the test set, and Model 2, consisting of unfolded matrices to run PLS, i.e. a 40 × 342 matrix for the calibration set and an 18 × 342 matrix for the test set.
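The matrix handling described in the Methods (a 9 × 38 template with column blocks of 10, 14 and 14, unfolding to 342 variables, column mean-centering and back-folding) can be sketched as follows. This is only a sketch under stated assumptions: the drawn element symbols of Fig. 2 are replaced by random placeholder patterns, the row-major unfolding order is an assumption, and all names are hypothetical rather than taken from the authors' Matlab work.

```python
import random

N_ROWS, N_COLS = 9, 38
# (first column, width) of the block reserved for each substituent position
BLOCKS = {"R2": (0, 10), "R3": (10, 14), "R5": (24, 14)}

def placeholder_glyph(rng, width):
    """Stand-in for a drawn substituent symbol: a random 9 x width binary pattern."""
    return [[rng.randint(0, 1) for _ in range(width)] for _ in range(N_ROWS)]

def build_table(rng):
    """Fill a 9 x 38 table of zeros with one glyph per substituent block."""
    table = [[0] * N_COLS for _ in range(N_ROWS)]
    for start, width in BLOCKS.values():
        glyph = placeholder_glyph(rng, width)
        for r in range(N_ROWS):
            table[r][start:start + width] = glyph[r]
    return table

def unfold(tables):
    """Turn each 9 x 38 table into one row of 342 values (row-major, assumed)."""
    return [[v for row in table for v in row] for table in tables]

def center_columns(X):
    """Column mean-centering of a two-way matrix."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in X]

def fold_back(X):
    """Restore each row of 342 values to a 9 x 38 table."""
    return [[row[i * N_COLS:(i + 1) * N_COLS] for i in range(N_ROWS)] for row in X]

rng = random.Random(0)
calibration = [build_table(rng) for _ in range(40)]  # 40 x 9 x 38
unfolded = unfold(calibration)                       # 40 x 342, input for PLS
centered = center_columns(unfolded)
three_way = fold_back(centered)                      # 40 x 9 x 38, input for N-PLS

print(len(unfolded), len(unfolded[0]))                            # prints: 40 342
print(len(three_way), len(three_way[0]), len(three_way[0][0]))   # prints: 40 9 38
```

The same functions would apply to the 18-compound test set; only the number of tables changes.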


Table 1
Aromatic substitution patterns and activities (pIC50) of compounds 1–58

Compound  R2  R3    R5    Exp.(a)  Fitted or predicted (PLS/N-PLS)
1         OH  Cl    Cl    7.49     7.77/7.76
2         OH  OMe   Cl    7.15     7.13/7.15
3         H   Br    Br    8.10     7.80/7.87
4         H   Et    Br    7.96     8.34/8.48
5         H   I     OMe   9.17     9.26/9.28
6         OH  n-Pr  Me    8.30     8.50/8.34
7         OH  Cl    n-Pr  6.96     6.80/7.13
8         OH  H     Et    6.91     6.97/7.15
9         OH  I     OMe   9.54     9.45/9.49
10        H   SMe   OMe   8.96     8.50/8.40
11        OH  Et    OMe   8.89     9.56/9.43
12        H   n-Bu  OMe   8.57     8.60/8.80
13        H   n-Pr  H     7.17     7.65/7.84
14        H   Cl    H     6.59     7.25/7.06
15        H   Cl    Cl    7.70     7.58/7.55
16        H   Cl    Br    8.25     7.61/7.60
17        H   Br    H     7.34     7.44/7.33
18        H   Br    OMe   8.92     8.84/8.62
19        H   Et    Cl    8.38     8.31/8.43
20        OH  H     Cl    7.19     6.93/6.96
21        OH  H     OMe   8.06     8.00/7.76
22        OH  F     H     6.44     7.13/7.00
23        OH  Cl    H     7.41     7.44/7.27
24        OH  Cl    Br    7.24     7.80/7.80
25        OH  Cl    Et    7.92     7.81/7.94
26        OH  Br    H     8.08     7.63/7.54
27        OH  Br    Cl    7.77     7.96/8.03
28        OH  Br    Br    7.59     7.99/8.07
29        OH  Br    OMe   8.85     9.03/8.82
30        OH  Br    NO2   6.73     6.67/6.78
31        OH  I     H     8.52     8.05/8.21
32        OH  Me    Cl    7.59     7.74/7.73
33        OH  Me    Me    8.11     8.07/7.53
34        OH  Et    H     8.54     8.16/8.15
35        OH  Et    F     8.82     8.52/8.53
36        OH  Et    Cl    9.04     8.42/8.64
37        OH  Et    Br    8.64     8.52/8.68
38        OH  n-Pr  H     8.30     7.84/8.04
39        OH  OMe   H     6.69     6.80/6.66
40        OH  OMe   Br    7.17     7.16/7.20
41        H   Br    OH    8.00     8.24/8.47
42        OH  n-Pr  Cl    8.49     8.19/8.62
43        OH  Me    Br    8.26     7.78/7.85
44        H   Me    OMe   8.28     8.63/8.40
45        OH  Br    Et    7.77     8.01/8.29
46        OH  Et    Et    8.75     8.54/8.90
47        H   Et    OMe   8.89     9.38/9.31
48        H   H     OMe   7.28     7.81/7.63
49        H   Et    H     7.40     7.99/8.02
50        OH  H     H     6.50     6.61/6.56
51        OH  H     Br    7.25     6.97/7.09
52        OH  Cl    Me    7.96     8.11/7.65
53        OH  Cl    OMe   8.77     8.85/8.64
54        OH  Br    F     8.15     8.00/8.01
55        OH  Br    Me    7.96     8.30/7.92
56        OH  Me    H     7.72     7.42/7.32
57        OH  Me    n-Pr  6.85     6.78/7.18
58        OH  NO2   H     5.52     7.03/6.83

(a) Experimental values are obtained from Ref. [10].

Both models were built in order to compare the traditional PLS regression method with the N-PLS approach, since N-PLS is supposed to demonstrate several advantages as compared to PLS [10].

The results for calibration and validation are presented in Table 1 and illustrated in Fig. 3, where good performances in prediction are shown for both PLS and N-PLS models. The predicted pIC50 values of Fig. 3 refer to 5 PLS and 7 N-PLS latent variables, since these presented the minimum cumpress values in the leave-one-out cross-validation experiments. In order to show that the relationships did not result from happenstance and to assure that the calibration was not a fortuitous correlation, we scrambled the Y-block (the activities block) and no predictive relationship was found from the modeling (R2 of 0.42 for 5 PLS LVs and 0.45 for 7 N-PLS LVs), as expected for a set of compounds with no modeling capability. The largest deviation in prediction, as can easily be seen in Fig. 3 (the point corresponding to the smallest experimental pIC50), refers to the compound containing a NO2 group bonded to the 3-position of the aromatic ring. No compounds with this connectivity were used in the calibration set, explaining the observed outlier result, but it should be borne in mind that similar deviations usually occur when any other QSAR model is applied to predict properties in non-trained situations. Also, since this point corresponds to an extreme value of experimental pIC50, it is not advisable for it to belong to an external validation set, but it was kept there for the sake of comparison with the literature results [10] (when including this point in the calibration set, the predicted values are improved to a Q2 of 0.76 for 5 PLS LVs and 0.78 for 7 N-PLS LVs). Tables 2 and 3 show the statistical parameters obtained from calibration and validation, as well as the correlation results from literature [10]. Our proposed method presented improvements in many aspects in comparison with the results from the 3D approach of literature [10], such as better predictive power using a smaller number of data. The N-PLS regression method has slightly improved the predictive ability as compared to the bilinear PLS for the set of compounds studied here, while the latter was a bit more parsimonious.

Loadings analysis was performed, by using the chemometric Pirouette 3.11 Software [13], in order to identify important variables in the PLS model and also to reach some interpretation of the binary descriptors.
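The Y-scrambling check mentioned in the results can be sketched in a few lines. To keep the example short and free of dependencies, a single-predictor least-squares fit (whose R2 equals the squared Pearson correlation) stands in for the PLS and N-PLS models; the data, the seed and the function names are hypothetical and are not taken from the paper.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation, i.e. R2 of a one-predictor linear fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def scrambled_r_squared(x, y, n_rounds, seed=0):
    """R2 values obtained after randomly permuting the activities (Y-block)."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_rounds):
        y_perm = y[:]          # copy, so the true activities stay untouched
        rng.shuffle(y_perm)
        results.append(r_squared(x, y_perm))
    return results

# Hypothetical data: one descriptor score per compound and made-up pIC50 values.
x = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5, 1.9, 2.2, 2.6, 3.0, 3.3, 3.8]
y = [6.4, 6.9, 6.8, 7.3, 7.4, 7.9, 8.0, 8.4, 8.5, 9.0, 9.1, 9.5]

real = r_squared(x, y)
scrambled = scrambled_r_squared(x, y, n_rounds=200)
print("R2 with true activities:", round(real, 3))
print("mean R2 after scrambling:", round(sum(scrambled) / len(scrambled), 3))
```

A model that still fits well after the activities are shuffled would point to a chance correlation; a clear drop in R2, as reported in the text for the scrambled Y-block, is the expected outcome for a meaningful model.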


Fig. 3. Experimental vs. predicted pIC50 values for (a) PLS model and (b) N-PLS model.

Fig. 4. Loadings plot, using factor number one for the PLS model.

Exclusion of variables with low loadings from the model, according to factor number one (the factor that retains the most significant variance in the data; see loadings plot using factor one in Fig. 4), did not improve the calibration quality. However, some interpretability from the weights illustrated in Fig. 4 may be given for the calibration set. The last rows of the last columns, considering the binary tables for each molecule, corresponding to the last variables in the X-matrix, have a positive effect on the biological activity when populated by ones, e.g. OMe and SMe substituents at the R5 position. On the other hand, the first rows, according to factor number one, have small significance on the biological activities.

Other data sets have been analyzed in our laboratory using this approach, including a series of potential anxiolytic agents, which are some 5-HT2C receptor antagonists. For this series, our cross-validated predictions gave a Q2CV of 0.60 for 5 PLS LVs, in agreement with the results reported from a CoMFA analysis [14], where a Q2CV of 0.656 for 4 PLS LVs was obtained.

Table 2
Statistical parameters of calibration and validation for the QSAR model (using PLS) and literature data

#LV  R2(a)  Q2(b)  S.D.e(c)  S.D.p(d)  S.E.e(e)  S.E.p(f)  Fe(g)   Fp(h)  te(i)   tp(j)  R2(k)  Q2CV(l)  R2(l)  Q2(l)
1    0.46   0.19   0.59      0.82      0.34      0.62      5.7     7.4    85.1    40.9   0.47   0.29     0.44   0.36
2    0.56   0.32   0.53      0.72      0.28      0.48      11.4    7.4    94.6    46.4   0.59   0.44     0.64   0.45
3    0.68   0.43   0.46      0.67      0.20      0.41      42.6    0.2    110.0   50.4   0.69   0.52     0.73   0.64
4    0.77   0.62   0.38      0.54      0.14      0.27      91.1    13.4   130.3   62.2   0.75   0.56     0.79   0.56
5    0.82   0.71   0.34      0.47      0.11      0.20      137.5   24.4   147.2   71.3   0.79   0.56     0.81   0.62
6    0.83   0.70   0.33      0.48      0.10      0.21      154.5   24.7   152.9   70.6   0.81   0.57     0.87   0.62
7    0.84   0.65   0.32      0.52      0.10      0.25      171.7   17.4   158.5   64.8   0.85   0.56     0.88   0.57
8    0.85   0.67   0.31      0.51      0.09      0.24      190.2   19.0   164.2   66.4   0.87   0.62     0.90   0.63
9    0.86   0.69   0.30      0.49      0.09      0.22      206.2   15.7   169.1   68.2   0.91   0.62
10   0.87   0.69   0.29      0.50      0.08      0.22      221.1   11.7   173.5   67.8   0.92   0.59

(a) Squared correlation coefficient of estimate.
(b) Squared correlation coefficient of prediction.
(c) Standard deviation of estimate.
(d) Standard deviation of prediction.
(e) Standard error of estimate.
(f) Standard error of prediction.
(g) F-test value of estimate.
(h) F-test value of prediction.
(i) t-Test value of estimate.
(j) t-Test value of prediction.
(k) R2 and Q2CV from literature (Ref. [10]) for model with 26,400 variables.
(l) R2 and Q2 from literature (Ref. [10]) for model with 2940 variables.
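The leave-one-out procedure behind the PRESS values used to choose the number of latent variables can be sketched as below. As a simplification, a one-predictor least-squares line replaces the PLS model, so the sketch returns a single PRESS value rather than the cumulative PRESS over successive latent variables; the data and function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope

def loo_press(x, y):
    """Leave-one-out PRESS: each compound is predicted by a model fitted without it."""
    press = 0.0
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        intercept, slope = fit_line(x_train, y_train)
        residual = y[i] - (intercept + slope * x[i])
        press += residual ** 2
    return press

# Hypothetical data: one descriptor score per compound and made-up pIC50 values.
x = [0.2, 0.7, 1.1, 1.6, 2.0, 2.4, 2.9, 3.5]
y = [6.5, 7.1, 7.2, 7.9, 8.0, 8.6, 8.8, 9.4]

print("leave-one-out PRESS =", round(loo_press(x, y), 3))
```

With a PLS model, the same loop would be repeated for each candidate number of latent variables and the number giving the lowest PRESS would be retained.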


Table 3
Statistical parameters of calibration and validation for the QSAR model (using N-PLS) and literature data

#LV  R2(a)  Q2(b)  S.D.e(c)  S.D.p(d)  S.E.e(e)  S.E.p(f)  Fe(g)   Fp(h)  te(i)   tp(j)  R2(k)  Q2CV(l)  R2(l)  Q2(l)
1    0.44   0.20   0.60      0.81      0.35      0.62      8.7     7.4    83.6    41.0   0.47   0.29     0.44   0.36
2    0.50   0.28   0.56      0.75      0.31      0.51      1.0     7.8    88.9    45.0   0.59   0.44     0.64   0.45
3    0.60   0.33   0.50      0.74      0.25      0.50      20.7    3.6    99.5    45.6   0.69   0.52     0.73   0.64
4    0.70   0.50   0.44      0.62      0.19      0.36      54.0    1.0    115.1   53.8   0.75   0.56     0.79   0.56
5    0.79   0.68   0.37      0.50      0.13      0.22      106.0   15.3   136.0   67.8   0.79   0.56     0.81   0.62
6    0.81   0.76   0.35      0.44      0.12      0.17      132.1   32.6   145.3   77.5   0.81   0.57     0.87   0.62
7    0.84   0.75   0.32      0.43      0.10      0.17      168.5   35.9   157.5   77.1   0.85   0.56     0.88   0.57
8    0.85   0.74   0.31      0.44      0.10      0.18      179.5   28.0   160.9   75.6   0.87   0.62     0.90   0.63
9    0.85   0.75   0.31      0.44      0.09      0.18      186.0   28.3   163.0   76.4   0.91   0.62
10   0.86   0.76   0.30      0.43      0.09      0.17      203.2   32.0   168.2   78.1   0.92   0.59

(a) Squared correlation coefficient of estimate.
(b) Squared correlation coefficient of prediction.
(c) Standard deviation of estimate.
(d) Standard deviation of prediction.
(e) Standard error of estimate.
(f) Standard error of prediction.
(g) F-test value of estimate.
(h) F-test value of prediction.
(i) t-Test value of estimate.
(j) t-Test value of prediction.
(k) R2 and Q2CV from literature (Ref. [10]) for model with 26,400 variables.
(l) R2 and Q2 from literature (Ref. [10]) for model with 2940 variables.

With this novel approach, a simple and very accessible method for QSAR analysis, one can test molecules with different substituents and evaluate which type of substituent, and in which position, can influence the dependent variables. Models can be built rapidly, depending only on the availability of biological data. There are, of course, no alignment problems, since a single binary table is used instead of 3D descriptors, yet the results are as good as those provided by 3D approaches.

4. Conclusions

The proposed QSAR methodology for congeneric series of compounds allows the construction of much smaller matrices than the usually applied 3D approaches, thus requiring low computational cost, and provided high predictive ability for a series of (S)-N-[(1-ethyl-2-pyrrolidinyl)methyl]-6-methoxybenzamides, when using both PLS and N-PLS regression methods. Actually, the presented descriptors act as codes, though they do not have a direct physicochemical meaning, and were generated here to didactically represent the substituents, but any plausible and systematic representation for the substituents should be possible to guarantee similar modeling capability.

References

[1] R.D. Cramer III, D.E. Patterson, J.D. Bunce, J. Am. Chem. Soc. 110 (1988) 5959–5967.
[2] G. Klebe, U. Abraham, T. Mietzner, J. Med. Chem. 37 (1994) 4130–4146.
[3] P.J. Goodford, GRID, University of Oxford, Oxford, UK, 1995.
[4] P. Rees, Sci. Comput. World July/August (2003) 16–18.
[5] S.L. Dixon, K.M. Merz Jr., J. Med. Chem. 44 (2001) 3795–3809.
[6] S.M. Free, J.W. Wilson, J. Med. Chem. 7 (1964) 395–399.
[7] M.P. Freitas, S.D. Brown, J.A. Martins, J. Mol. Struct. 738 (2005) 149–154.
[8] P. Geladi, B.R. Kowalski, Anal. Chim. Acta 185 (1986) 1–17.
[9] R. Bro, J. Chemometrics 10 (1996) 47–61.
[10] J. Nilsson, E.J. Homan, A.K. Smilde, C.J. Grol, H. Wikstrom, J. Comput. Aided Mol. Des. 12 (1998) 81–93.
[11] Matlab Version 6.5.1, MathWorks Inc., Natick, MA, 1993.
[12] H. Wold, in: K.R. Krishnaiah (Ed.), Multivariate Analysis, Academic Press, New York, 1966, pp. 391–420.
[13] Pirouette Version 3.11, Infometrix Inc., Woodinville, WA, 1990–2003.
[14] S.M. Bromidge, S. Dabbs, D.T. Davies, D.M. Duckworth, I.T. Forbes, P. Ham, G.E. Jones, F.D. King, D.V. Saunders, S. Starr, K.M. Thewlis, P.A. Wyman, F.E. Blaney, C.B. Naylor, F. Bailey, T.P. Blackburn, V. Holland, G.A. Kennett, G.J. Riley, M.D. Wood, J. Med. Chem. 41 (1998) 1598–1612.

