Neural Network Based QSPR Study for Predicting pKa of Phenols in Different Solvents

Jesus Jover, Ramon Bosque and Joaquim Sales*

Departament de Química Inorgànica, Universitat de Barcelona, Martí i Franquès, 1, 08028-Barcelona, Spain, E-mail: [email protected]

Keywords: Different solvents, Multicomponent systems, Neural networks, Phenols, pKa, QSPR

Received: July 11, 2006; Accepted: August 2, 2006

DOI: 10.1002/qsar.200610088

Abstract

Computational Neural Network (CNN) based QSPR methodology is applied to a multicomponent system: the prediction of the pKa of phenols in different solvents. The system is composed of 94 phenols, 10 solvents, and 276 experimental pKa values. The phenols are characterized by the usual molecular descriptors, while the solvents are described by a number of physical properties and by several parameters of the most widely used multiparametric polarity solvent scales. The proposed model, non-linearly derived, contains seven descriptors; five of them belong to the solutes and the other two to the solvents. Good results are obtained, with Root-Mean-Square Errors (RMSE) and correlation coefficients (R²) of 0.71 (0.982), 0.83 (0.977), and 0.95 (0.975) for the training, prediction, and cross validation sets, respectively. The robustness of the model is also supported by the statistical results obtained for particular subsets: phenols with and without ortho-substituents, pKa values determined in protic or in aprotic solvents, and the values in each individual solvent. The descriptors of the model encode information that reflects characteristics of the solute and solvent molecules clearly related to the interactions acting in the dissociation process.

1 Introduction

Acid-base processes are among the most important types of reactions in chemistry and biochemistry. The acid character of a substance is expressed by its acid dissociation constant, Ka, which describes the extent to which a compound dissociates in solution. The pKa is a key property that governs the general reactivity of a substance with other chemical compounds, and it is also very important in chromatographic, partitioning, and phase-transfer processes, among others. Although most experimental values have been determined in water, the importance of non-aqueous solutions is increasing, and dissociation constants of different types of organic and inorganic compounds are now known in several solvents. Moreover, it is well known that the pharmacokinetic properties of a compound, such as its bioavailability and its capacity to diffuse across membranes and other physical barriers, can be strongly affected by its pKa. In fact, the pKa, along with integrity, lipophilicity, solubility, and permeability, has been considered one of the five key physicochemical profiling screens that provide an early understanding of the properties affecting ADME characteristics [1].

According to the Brønsted-Lowry definition, an acid is a proton donor. Since the isolated proton cannot exist in solution, a proton transfer reaction will take place only in the presence of a base. As most solvents have acid or base properties, the strengths of acids and bases depend on the medium in which they are dissolved. The dissociation process is a complex reaction that can be represented as an equilibrium containing the remaining non-dissociated solvated acid molecules and the solvated anion and proton formed:

$$\mathrm{HA} + \mathrm{SH} \overset{K_a}{\rightleftharpoons} \mathrm{A^-} + \mathrm{SH_2^+}$$

The solvation process is composed of a multitude of physically independent paths that reflect the complex intermolecular solute/solvent interactions.

Experimentally determined pKa values are not always available from literature sources, and estimated values are often employed instead. Therefore, it is of interest to develop methods for estimating the pKa of ionizable molecules. The dissociation constants of small molecules in the gas phase can currently be calculated very accurately through DFT approaches [2]. However, the situation is less satisfactory in solution, mostly due to the difficulty of quantitatively calculating solvation energies with adequate accuracy. Using dielectric continuum methods [3], it is possible to predict pKa's in aqueous solution with a precision of about 0.5–2.2 pKa units. Despite the success in the prediction of acidities in water, relatively little work has been done on the prediction of pKa's in organic solutions. Recently, a few reports have appeared on the calculation of pKa in Dimethylsulfoxide (DMSO) and Acetonitrile (AN) [4, 5]. Besides the well-known Hammett-Taft model, other predictive methods based on semi-empirical quantum-chemically derived descriptors have been developed for the pKa's of several families of organic compounds in aqueous solution [6–8]. For non-aqueous solutions, a comparative study of the Hammett-Taft and Drago models in the prediction of the acidity constants of carboxylic acids, benzoic acids, phenols, and protonated amines in methanol has been published [9].

QSAR Comb. Sci. 26, 2007, No. 3, 385–397 © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

Full Papers

The Quantitative Structure-Property Relationship (QSPR) approach has become a very useful tool in the prediction and interpretation of several physical and chemical properties. The basis of such relationships is the assumption that the variation in the behavior of the compounds, as expressed by any measured physical or chemical property, can be correlated with changes in molecular features of the compounds, termed descriptors. Descriptors are numerical values used to describe different characteristics of a certain structure in order to yield information about the property being studied. QSPR methods are based on statistically determined linear or non-linear functional forms that relate the property of interest to the descriptors. Their development involves the selection of descriptors that satisfactorily characterize different sets of compounds and the application of algorithms, such as multiple linear regression or Computational Neural Networks (CNN), to build the QSPR model. The advantage of CNNs is their inherent ability to incorporate non-linear relationships in the derivation of the QSPR models. Linear and non-linear QSPR approaches have been successfully applied to the correlation of many diverse physicochemical properties of chemical compounds. The properties studied are mainly those of single molecular compounds, including more complex species such as polymers and liquid crystals [10]. In addition to these studies, some correlations of properties involving interactions between different molecular species, such as solubility [11], solvent effects in chemical reactivity [12], and chromatographic retention parameters [13], among others, have been reported.

Although these properties depend on the characteristics of two components, namely solute and solvent, in these papers only the descriptors corresponding to the former have been used in the derivation of the correlation, because the property analyzed has been obtained under conditions where it depends only on the chemical nature of these compounds. In contrast, published studies dealing with more complex systems, in which the analyzed property is determined while varying the medium or the physical conditions, are extremely scarce. In these cases, descriptors of the different elements of the system must be used simultaneously and, because the property usually depends on all of these factors non-linearly, it is convenient to use CNNs in the derivation of the QSPR model. In nearly all the published works in this area, the only additional descriptor that has been considered is the temperature; some of the properties studied are the vapor pressure, viscosity, and density of organic compounds [14–16]. In addition, a QSPR study on the kinetics of the acid hydrolysis of carboxylic acid esters at various temperatures and solvent compositions has been reported [17].

In the present paper, we apply the CNN based QSPR methodology to a more complex multicomponent system: the prediction of the pKa of a wide set of phenols in different solvents. Phenols are very appropriate compounds for this kind of study because the presence of substituents with very different electronic and steric effects in the ortho-, meta-, and para-positions of the phenyl ring produces significant changes in the value of the property studied. The choice of solvents was constrained by the availability of experimental data but, fortunately, the literature reports experimental pKa values in several solvents of very different physicochemical characteristics. Thus, we have found enough data for protic (water, methanol, isopropanol, and tert-butanol) and aprotic (DMSO, N,N-Dimethylformamide (DMF), AN, Nitromethane (NM), acetone, and N,N-Dimethylacetamide (DMA)) solvents. The set of descriptors used is composed of the usual molecular descriptors of the solutes (phenols) and several selected solvent descriptors. Some of these solvent descriptors account for different physical properties, such as refractive index, dielectric constant or relative permittivity, dipole moment, density, etc. In addition, there is a wide collection of the so-called multiparametric polarity solvent scales, which have been proposed by different authors to interpret and predict properties of diverse solutes dissolved in them. In general, each of these scales contains one to four parameters that are derived from several physicochemical phenomena, including solvatochromic effects, reaction rates, and reaction enthalpies, among others. We have considered these parameters as solvent descriptors, and they have been used in the derivation of the QSPR model.

2 Data and Computational Methods

2.1 Dataset

The dataset comprises 276 pKa values. These experimental data were determined for 94 phenols in ten solvents. The solvents can be classified as aprotic (DMSO, DMF, AN, NM, acetone, and DMA) and protic (water, methanol, isopropanol, and tert-butanol). The pKa values in the aprotic solvents were taken mainly from the Izutsu compilation [18] and from Vianello and coworkers [19]. Values in water and in the alcohols were obtained from Bosch and coworkers [20–24]. The 94 phenols contain different substituents in the aromatic ring. Besides phenol itself, there are 39 mono-, 30 di-, 18 tri-, 4 tetra-, and 2 penta-substituted compounds. They contain 17 kinds of substituents with very different electronic and steric effects: Me, Et, tert-Bu, Ph, OMe, F, Cl, Br, I, OH, NO2, CF3, CN, COCH3, NH2, SO2CH3, and SO2CF3; there are some compounds with two hydroxyl groups, for which the studied pKa corresponds to the first dissociation constant of the molecule. The substituents are in the ortho-, meta-, and para-positions of the aromatic ring; 51 of the phenols studied have at least one substituent in the ortho-position. Some of them, such as tert-Bu, are very bulky and can produce important interactions with the OH moiety when they appear in the ortho-position. The number of solutes in each solvent differs considerably: water (83), methanol (39), isopropanol (16), tert-butanol (16), DMSO (44), DMF (29), AN (25), NM (10), acetone (10), and DMA (4). Of this set of 276 pKa values, 154 correspond to protic solvents and 122 to aprotic ones. In addition, 141 values correspond to ortho-substituted phenols and 135 to phenols with no substituent in this position. The pKa values range from −0.3 for 2,4,6-trinitrophenol (picric acid) in DMSO to 27.5 for 2-methylphenol in AN, with a mean value of 12.4.

Table 1. Experimental and calculated pKa for the training, prediction, and cross validation sets.

No.  Compound  Set  Solvent  Experimental  Calculated
1  2,3,4,6-Tetrachlorophenol  a  DMSO  7.55  8.88
2  2,3,4,6-Tetrachlorophenol  a  H2O  5.63  5.63
3  2,3,4,6-Tetrachlorophenol  c  DMF  9.50  8.96
4  2,3,4,6-Tetranitrophenol  c  DMF  1.11  1.40
5  2,3-Dihydroxyphenol  a  H2O  9.01  8.54
6  2,3-Dimethylphenol  a  MeOH  15.08  14.93
7  2,3-Dimethylphenol  a  H2O  10.54  10.41
8  2,3-Dinitrophenol  a  MeOH  9.43  9.34
9  2,3-Dinitrophenol  a  H2O  5.24  5.37
10  2,4,5-Trichlorophenol  a  DMF  12.46  12.31
11  2,4,5-Trichlorophenol  c  DMSO  10.97  12.01
12  2,4,6-Tribromophenol  a  Acetone  21.10  22.89
13  2,4,6-Tribromophenol  c  MeOH  10.10  10.94
14  2,4,6-Tribromophenol  c  H2O  6.10  6.63
15  2,4,6-Trichlorophenol  a  Acetone  22.50  22.37
16  2,4,6-Trichlorophenol  b  DMF  12.05  10.64
17  2,4,6-Trichlorophenol  a  DMSO  10.19  10.64
18  2,4,6-Trichlorophenol  a  H2O  6.42  6.29
19  2,4,6-Trichlorophenol  a  tert-BuOH  14.82  15.57
20  2,4,6-Trichlorophenol  c  Isopropanol  12.55  12.05
21  2,4,6-Trimethyl-3-nitrophenol  a  H2O  8.98  9.51
22  2,4,6-Trimethylphenol  a  MeOH  15.53  14.78
23  2,4,6-Trimethylphenol  a  H2O  10.86  10.59
24  2,4,6-Trinitrophenol  a  Acetone  9.20  8.34
25  2,4,6-Trinitrophenol  a  AN  11.00  10.80
26  2,4,6-Trinitrophenol  a  DMF  3.65  2.13
27  2,4,6-Trinitrophenol  a  DMSO  −0.30  2.04
28  2,4,6-Trinitrophenol  b  MeOH  3.90  3.47
29  2,4,6-Trinitrophenol  a  NM  8.02  9.33
30  2,4,6-Trinitrophenol  a  H2O  0.43  1.30
31  2,4,6-Trinitrophenol  a  Isopropanol  3.70  3.81
32  2,4,6-Trinitrophenol  a  tert-BuOH  4.70  4.97
33  2,4-Dichlorophenol  a  DMSO  13.25  13.55
34  2,4-Dichlorophenol  a  H2O  7.65  7.79
35  2,4-Dichlorophenol  a  Isopropanol  14.48  13.70
36  2,4-Dichlorophenol  a  tert-BuOH  17.25  17.40
37  2,4-Dichlorophenol  c  DMF  13.56  13.78
38  2,4-Dimethylphenol  a  MeOH  15.04  14.84
39  2,4-Dimethylphenol  b  H2O  10.60  10.42
40  2,4-Dinitrophenol  a  Acetone  15.70  16.03
41  2,4-Dinitrophenol  a  AN  18.40  16.57
42  2,4-Dinitrophenol  a  DMF  6.36  6.06
43  2,4-Dinitrophenol  a  DMSO  5.32  5.67
44  2,4-Dinitrophenol  c  MeOH  7.82  7.43
45  2,4-Dinitrophenol  c  NM  15.90  15.45
46  2,4-Dinitrophenol  c  H2O  4.10  3.75
47  2,4-Di-tert-Butylphenol  a  MeOH  16.77  15.63
48  2,4-Di-tert-Butylphenol  c  H2O  11.57  11.52
49  2,5-Dichlorophenol  a  DMF  13.85  13.64
50  2,5-Dimethylphenol  a  MeOH  14.91  15.02
51  2,5-Dimethylphenol  c  H2O  10.41  10.51
52  2,5-Dinitrophenol  a  Acetone  19.80  19.82
53  2,5-Dinitrophenol  a  DMF  8.78  8.68
54  2,5-Dinitrophenol  b  DMSO  7.32  8.23
55  2,5-Dinitrophenol  a  NM  17.90  18.86
56  2,5-Dinitrophenol  a  H2O  5.22  5.19
57  2,5-Dinitrophenol  c  MeOH  8.93  9.41
58  2,6-Dibromo-4-nitrophenol  a  DMF  5.71  6.08
59  2,6-Dibromo-4-nitrophenol  a  DMSO  5.17  5.56
60  2,6-Dibromo-4-nitrophenol  a  MeOH  7.31  7.50
61  2,6-Dibromo-4-nitrophenol  a  H2O  3.38  3.65
62  2,6-Dichloro-4-hydroxyphenol  c  H2O  7.38  7.09
63  2,6-Dichloro-4-nitrophenol  a  DMF  5.72  6.01
64  2,6-Dichloro-4-nitrophenol  a  MeOH  7.40  7.24
65  2,6-Dichloro-4-nitrophenol  c  H2O  3.55  3.42
66  2,6-Dichlorophenol  a  Acetone  23.90  23.44
67  2,6-Dichlorophenol  a  DMF  12.55  12.13
68  2,6-Dichlorophenol  a  DMSO  11.54  12.11
69  2,6-Dichlorophenol  a  H2O  6.79  6.95
70  2,6-Dichlorophenol  a  Isopropanol  13.58  13.00
71  2,6-Dichlorophenol  b  tert-BuOH  16.38  16.73
72  2,6-Dimethyl-4-cyanophenol  c  H2O  8.27  9.35
73  2,6-Dimethyl-4-nitrophenol  a  DMSO  9.00  9.12
74  2,6-Dimethyl-4-nitrophenol  a  H2O  7.19  7.31
75  2,6-Dimethylphenol  a  H2O  10.59  10.32
76  2,6-Dimethylphenol  c  MeOH  15.26  14.62
77  2,6-Dinitro-4-hydroxyphenol  a  H2O  4.42  4.92
78  2,6-Dinitrophenol  a  Acetone  13.78  14.97
79  2,6-Dinitrophenol  a  DMF  6.07  5.29
80  2,6-Dinitrophenol  a  MeOH  7.70  7.00
81  2,6-Dinitrophenol  a  NM  16.00  15.21
82  2,6-Dinitrophenol  a  H2O  3.74  3.18
83  2,6-Dinitrophenol  c  AN  16.45  16.48
84  2,6-Dinitrophenol  c  DMSO  4.82  5.03
85  2,6-Di-tert-Butyl-4-nitrophenol  a  AN  19.10  18.74
86  2,6-Di-tert-Butyl-4-nitrophenol  a  DMF  8.27  8.79
87  2,6-Di-tert-Butyl-4-nitrophenol  b  DMSO  7.60  7.68
88  2,6-Di-tert-Butyl-4-nitrophenol  a  MeOH  10.89  10.73
89  2,6-Di-tert-Butyl-4-nitrophenol  c  H2O  6.62  7.68
90  2,6-Di-tert-Butylphenol  c  DMSO  16.85  15.69
91  2-Aminophenol  a  H2O  9.44  10.19
92  2-Bromophenol  a  AN  23.92  25.12
93  2-Bromophenol  a  H2O  8.39  8.60
94  2-Bromophenol  c  DMF  13.85  15.20
95  2-Chloro-4-bromophenol  a  MeOH  12.70  12.59
96  2-Chloro-4-bromophenol  c  H2O  7.64  7.73
97  2-Chloro-4-phenylphenol  a  AN  24.90  24.80
98  2-Chloro-4-phenylphenol  a  DMF  15.70  15.13
99  2-Chloro-4-phenylphenol  a  DMSO  14.90  14.45
100  2-Chloro-4-phenylphenol  a  Isopropanol  13.70  14.95
101  2-Chloro-4-phenylphenol  a  tert-BuOH  18.60  18.37
102  2-Chloro-4-phenylphenol  c  H2O  8.07  10.23
103  2-Chlorophenol  b  DMA  13.91  15.06
104  2-Chlorophenol  a  Isopropanol  15.83  14.48
105  2-Chlorophenol  a  tert-BuOH  18.54  18.31
106  2-Chlorophenol  c  MeOH  12.83  13.48
107  2-Chlorophenol  c  H2O  8.51  8.45
108  2-Ethylphenol  a  H2O  10.20  10.29
109  2-Fluorophenol  a  MeOH  12.14  12.48
110  2-Fluorophenol  a  H2O  8.73  7.92
111  2-Hydroxy-3-nitrophenol  a  H2O  6.68  7.03
112  2-Hydroxyphenol  a  H2O  9.12  9.32
113  2-Iodophenol  a  H2O  8.46  9.01
114  2-Methoxyphenol  a  MeOH  14.48  13.79
115  2-Methoxyphenol  a  H2O  9.90  9.62
116  2-Methyl-4,6-dinitrophenol  a  DMSO  4.59  5.53
117  2-Methylphenol  a  AN  27.50  25.76
118  2-Methylphenol  a  H2O  10.31  9.99
119  2-Methylphenol  c  MeOH  14.90  14.36
120  2-Nitrophenol  b  AN  22.00  21.67
121  2-Nitrophenol  a  DMSO  11.00  11.37
122  2-Nitrophenol  a  MeOH  11.52  12.09
123  2-Nitrophenol  a  NM  21.40  20.96
124  2-Nitrophenol  a  H2O  7.23  6.65
125  2-Nitrophenol  a  Isopropanol  13.30  12.50
126  2-Nitrophenol  a  tert-BuOH  15.88  15.81
127  2-Nitrophenol  b  DMF  12.14  12.05
128  2-Nitrophenol  c  Acetone  22.30  22.32
129  2-tert-Butylphenol  a  MeOH  16.50  15.07
130  2-tert-Butylphenol  a  H2O  11.34  10.73
131  3-(Methylsulfonyl)phenol  a  DMSO  13.56  11.70
132  3-(Trifluoromethylsulfonyl)phenol  b  NM  19.30  17.95
133  3,4,5-Trichlorophenol  a  DMSO  12.58  13.21
134  3,4,5-Trichlorophenol  a  H2O  7.68  7.81
135  3,4-Dichlorophenol  a  AN  24.06  24.86
136  3,4-Dichlorophenol  a  H2O  8.51  8.60
137  3,4-Dichlorophenol  c  DMF  13.22  14.96
138  3,4-Dichlorophenol  c  DMSO  14.22  14.65
139  3,4-Dimethylphenol  a  MeOH  14.63  14.80
140  3,4-Dimethylphenol  a  H2O  10.36  10.41
141  3,4-Dinitrophenol  a  AN  17.90  19.10
142  3,4-Dinitrophenol  a  DMSO  7.97  7.62
143  3,4-Dinitrophenol  a  MeOH  9.46  8.87
144  3,4-Dinitrophenol  c  H2O  5.42  5.44
145  3,5-Dichlorophenol  b  AN  23.31  24.66
146  3,5-Dichlorophenol  a  DMSO  13.09  14.44
147  3,5-Dichlorophenol  a  MeOH  12.94  13.29
148  3,5-Dichlorophenol  a  H2O  8.18  8.42
149  3,5-Dichlorophenol  a  Isopropanol  14.05  14.22
150  3,5-Dichlorophenol  a  tert-BuOH  17.04  17.90
151  3,5-Dihydroxyphenol  a  H2O  8.45  8.71
152  3,5-Dimethyl-4-cyanophenol  b  H2O  8.21  9.23
153  3,5-Dimethyl-4-nitrophenol  a  H2O  8.25  7.79
154  3,5-Dimethylphenol  a  MeOH  14.62  14.96
155  3,5-Dimethylphenol  a  H2O  10.20  10.44
156  3,5-Dinitrophenol  a  Acetone  22.70  20.57
157  3,5-Dinitrophenol  a  AN  20.50  20.45
158  3,5-Dinitrophenol  b  H2O  6.66  5.72
159  3,5-Dinitrophenol  a  Isopropanol  10.84  10.16
160  3,5-Dinitrophenol  a  tert-BuOH  13.40  12.72
161  3,5-Dinitrophenol  c  DMF  11.30  9.07
162  3,5-Dinitrophenol  c  DMSO  10.60  8.65
163  3,5-Dinitrophenol  c  MeOH  10.20  8.92
164  3,5-Dinitrophenol  c  NM  19.40  19.42
165  3,5-Di-tert-Butylphenol  a  MeOH  14.89  15.92
166  3,5-Di-tert-Butylphenol  c  H2O  10.29  11.76
167  3-Acetylphenol  a  DMSO  15.14  15.52
168  3-Acetylphenol  c  H2O  9.19  9.98
169  3-Aminophenol  a  H2O  9.99  9.91
170  3-Bromophenol  a  MeOH  13.30  13.92
171  3-Bromophenol  a  H2O  9.01  9.33
172  3-Bromophenol  a  Isopropanol  14.83  14.87
173  3-Bromophenol  c  tert-BuOH  18.52  18.61
174  3-Chloro-2,4,6-trinitrophenol  a  DMSO  1.16  1.43
175  3-Chloro-4-nitrophenol  a  AN  19.95  20.39
176  3-Chloro-4-nitrophenol  a  DMSO  9.80  9.73
177  3-Chloro-4-nitrophenol  b  H2O  6.49  6.49
178  3-Chlorophenol  a  AN  25.04  25.34
179  3-Chlorophenol  a  DMA  16.29  15.96
180  3-Chlorophenol  a  DMSO  15.83  15.50
181  3-Chlorophenol  a  MeOH  13.10  13.88
182  3-Chlorophenol  c  H2O  9.02  9.13
183  3-Cyanophenol  c  DMSO  14.76  15.09
184  3-Ethylphenol  c  H2O  9.90  10.38
185  3-Fluorophenol  a  DMSO  15.88  15.16
186  3-Fluorophenol  a  H2O  9.28  8.53
187  3-Hydroxyphenol  a  DMSO  15.30  15.74
188  3-Hydroxyphenol  a  H2O  9.15  9.27
189  3-Iodophenol  a  H2O  8.88  9.55
190  3-Methoxyphenol  a  H2O  9.65  9.72
191  3-Methoxyphenol  c  DMSO  15.72  15.79
192  3-Methyl-5-ethylphenol  b  H2O  10.10  10.71
193  3-Methylphenol  a  DMSO  16.86  16.18
194  3-Methylphenol  b  MeOH  14.48  14.74
195  3-Methylphenol  a  H2O  10.10  10.16
196  3-Nitrophenol  a  DMF  13.85  13.79
197  3-Nitrophenol  a  DMSO  13.75  13.03
198  3-Nitrophenol  a  MeOH  12.40  12.74
199  3-Nitrophenol  a  NM  22.20  22.33
200  3-Nitrophenol  a  H2O  8.36  8.10
201  3-Nitrophenol  a  Isopropanol  13.92  13.42
202  3-Nitrophenol  a  tert-BuOH  16.99  16.67
203  3-Nitrophenol  c  AN  23.85  22.91
204  3-Trifluoromethyl-4-nitrophenol  a  AN  19.30  20.74
205  3-Trifluoromethyl-4-nitrophenol  b  DMF  10.40  10.00
206  3-Trifluoromethyl-4-nitrophenol  a  DMSO  9.30  9.42
207  3-Trifluoromethyl-4-nitrophenol  a  H2O  6.41  6.18
208  3-Trifluoromethyl-4-nitrophenol  a  Isopropanol  9.90  11.02
209  3-Trifluoromethyl-4-nitrophenol  a  tert-BuOH  12.77  13.82
210  3-Trifluoromethylphenol  a  DMF  15.70  14.39
211  3-Trifluoromethylphenol  a  DMSO  14.30  14.56
212  3-Trifluoromethylphenol  a  MeOH  12.10  12.09
213  3-Trifluoromethylphenol  a  H2O  9.04  8.40
214  3-Trifluoromethylphenol  a  Isopropanol  12.50  13.61
215  3-Trifluoromethylphenol  c  AN  24.90  25.29
216  3-Trifluoromethylphenol  c  tert-BuOH  17.10  17.32
217  4-Acetylphenol  b  H2O  8.05  9.90
218  4-Acetylphenol  c  DMSO  13.68  15.44
219  4-Aminophenol  a  H2O  10.43  10.39
220  4-Bromophenol  a  AN  25.53  25.19
221  4-Bromophenol  a  MeOH  13.63  13.70
222  4-Bromophenol  a  Isopropanol  14.30  14.66
223  4-Bromophenol  a  tert-BuOH  19.10  18.36
224  4-Bromophenol  a  DMF  14.34  15.56
225  4-Bromophenol  c  DMSO  15.50  15.25
226  4-Bromophenol  c  H2O  9.36  9.16
227  4-Chloro-2,6-dinitrophenol  a  AN  15.30  14.67
228  4-Chloro-2,6-dinitrophenol  a  DMF  4.68  4.15
229  4-Chloro-2,6-dinitrophenol  a  DMSO  3.51  3.93
230  4-Chloro-2,6-dinitrophenol  a  H2O  2.97  2.63
231  4-Chlorophenol  a  AN  25.44  25.28
232  4-Chlorophenol  b  DMF  14.50  15.72
233  4-Chlorophenol  a  DMSO  16.10  15.43
234  4-Chlorophenol  a  MeOH  13.59  13.74
235  4-Chlorophenol  a  H2O  9.38  9.17
236  4-Chlorophenol  a  Isopropanol  15.31  14.71
237  4-Chlorophenol  c  tert-BuOH  18.96  18.45
238  4-Chlorophenol  c  DMA  16.78  15.95
239  4-Cyanophenol  a  AN  22.77  24.70
240  4-Cyanophenol  a  DMSO  13.01  14.39
241  4-Cyanophenol  a  H2O  7.80  8.83
242  4-Ethylphenol  a  H2O  10.00  10.41
243  4-Fluorophenol  a  H2O  9.95  8.96
244  4-Hydroxy-2,3,5,6-tetramethylphenol  a  H2O  11.51  11.02

The full dataset of 276 entries (compound-solvent pairs) was divided randomly into three subsets: the training set, with 199 entries (72% of the total); the prediction set, with another 55 entries (20%); and the cross validation set, which contains the remaining 22 entries (8%). The random selection was constrained so that the prediction and the cross validation sets contain values in all the solvents studied. The training set is used exclusively to derive the model. The prediction set, formed by entries that were not included in the model development, is used to test the predictive ability of the model. The third set, the cross validation set, is needed to determine when to stop training the neural network, in order to prevent overtraining and to ensure that the network has good, general predictive ability. Table 1 contains all the experimental and calculated pKa values.
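The split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout `(compound, solvent, pKa)`, the function name `split_dataset`, the seed, and the reshuffle-until-covered loop are all hypothetical, and the text does not say how the solvent-coverage condition was actually enforced.

```python
import random


def split_dataset(records, n_train=199, n_pred=55, n_cv=22, seed=0, max_tries=10000):
    """Randomly split (compound, solvent, pKa) records into three subsets.

    Assumed strategy: reshuffle until the prediction and cross validation
    sets both contain every solvent present in the data.
    """
    if n_train + n_pred + n_cv != len(records):
        raise ValueError("subset sizes must add up to the number of records")
    solvents = {solvent for _, solvent, _ in records}
    rng = random.Random(seed)
    for _ in range(max_tries):
        shuffled = list(records)
        rng.shuffle(shuffled)
        train = shuffled[:n_train]
        pred = shuffled[n_train:n_train + n_pred]
        cv = shuffled[n_train + n_pred:]
        # Accept the split only if both held-out sets cover all solvents.
        if all({s for _, s, _ in part} == solvents for part in (pred, cv)):
            return train, pred, cv
    raise RuntimeError("no split covering every solvent was found")
```

With the solvent counts given in Section 2.1 (for example only four values in DMA), a purely random split often misses a solvent in the 22-entry cross validation set, which is why the sketch retries.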

Table 1. (cont.)

No.  Compound  Set  Solvent  Experimental  Calculated
245  4-Hydroxy-2-methylphenol  a  H2O  10.20  10.22
246  4-Hydroxy-2-nitrophenol  a  H2O  7.63  7.92
247  4-Hydroxyphenol  a  H2O  9.14  9.84
248  4-Iodophenol  a  DMF  14.31  15.47
249  4-Iodophenol  a  H2O  9.20  9.28
250  4-Methoxyphenol  b  DMSO  17.58  16.19
251  4-Methoxyphenol  a  H2O  10.27  10.33
252  4-Methylphenol  a  AN  27.45  25.66
253  4-Methylphenol  a  DMSO  16.96  16.11
254  4-Methylphenol  a  H2O  10.28  10.10
255  4-Methylphenol  c  MeOH  14.52  14.58
256  4-Nitrophenol  a  DMA  10.85  11.42
257  4-Nitrophenol  a  DMF  11.84  11.01
258  4-Nitrophenol  a  DMSO  11.00  10.20
259  4-Nitrophenol  a  MeOH  11.24  11.34
260  4-Nitrophenol  a  NM  20.10  19.95
261  4-Nitrophenol  b  Isopropanol  12.45  11.85
262  4-Nitrophenol  a  tert-BuOH  14.60  14.71
263  4-Nitrophenol  c  AN  20.70  20.55
264  4-Nitrophenol  c  H2O  7.18  6.91
265  4-tert-Butylphenol  a  AN  27.48  25.64
266  4-tert-Butylphenol  a  MeOH  14.52  15.06
267  4-tert-Butylphenol  a  H2O  10.31  10.86
268  Pentachlorophenol  a  DMF  7.97  6.66
269  Pentachlorophenol  a  DMSO  7.05  6.45
270  Pentachlorophenol  c  Acetone  18.30  18.57
271  Phenol  a  DMF  15.40  16.41
272  Phenol  a  DMSO  16.47  16.15
273  Phenol  a  MeOH  14.32  14.45
274  Phenol  b  NM  25.70  25.20
275  Phenol  a  H2O  9.99  9.78
276  Phenol  c  AN  26.60  25.67

a: Training set. b: Cross validation set. c: Prediction set.


2.2 Solute Descriptors

The structural descriptors were calculated with the CODESSA program [25]. The structures of the compounds were drawn with HyperChem Lite (Hypercube, Inc.), and the geometries were fully optimized, without symmetry restrictions, using the semi-empirical AM1 method [26] implemented in the MOPAC 6.0 program [27]. In all cases, frequency calculations were performed to ensure that all the calculated geometries correspond to true minima. The MOPAC output files were used by CODESSA to calculate several hundred molecular descriptors, which can be classified into five classes: constitutional (number of various types of atoms and bonds, number of rings, molecular weight, etc.); topological (Wiener index, Randić indices, Kier-Hall shape indices, etc.); geometrical (moments of inertia, molecular volume, molecular surface area, etc.); electrostatic (minimum and maximum partial charges, polarity parameter, etc.); and quantum-chemical (reactivity indices, dipole moment, HOMO and LUMO energies, etc.). In the calculation of the electrostatic descriptors, the program uses partial charges derived from the empirical approach proposed by Zefirov et al. [28], based on the Sanderson electronegativity. Many of these electrostatic descriptors are also calculated using the charges derived from the quantum-chemical methods.

2.3 Solvent Descriptors

To characterize the solvents we have used the following 11 physicochemical properties: refractive index (n), dielectric constant or relative permittivity (εr), dipole moment (μ), molecular weight (M), density at 25 °C (d), molar volume at 25 °C (Vm), polarizability (ALFA), refractivity (c), standard molar vaporization enthalpy at 25 °C (ΔvapH°), standard internal energy of vaporization at 25 °C (ΔvapU°), and Hildebrand's solubility parameter (δ). The other quantities used as solvent descriptors have been taken from the empirical model-dependent solvent scales. Table 2 shows the solvent parameters used as descriptors. They are extensively used, and values are available for the ten solvents studied here [29–49].

The heuristic multilinear regression procedures available in the framework of the CODESSA program were used to make the first reduction of the pool of solute descriptors. The initial number of descriptors was 677. The heuristic procedures provide collinearity control (i.e., any two descriptors intercorrelated above 0.8 are never involved in the same model) and implement heuristic algorithms for the rapid selection of the best correlation, without testing all possible combinations of the available descriptors. After the heuristic reduction, the pool of solute descriptors was reduced to 215.
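The collinearity control mentioned above can be illustrated with a simple greedy filter. This is a simplified sketch and not CODESSA's heuristic procedure: the function names are hypothetical, the threshold is applied to the absolute Pearson correlation (an assumption, since the text does not say whether 0.8 refers to r or r²), and every descriptor column is assumed to be non-constant.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_collinear(descriptors, threshold=0.8):
    """Keep descriptors in insertion order, skipping any descriptor whose
    |r| with an already kept descriptor exceeds the threshold."""
    kept = {}
    for name, values in descriptors.items():
        if all(abs(pearson(values, other)) <= threshold for other in kept.values()):
            kept[name] = values
    return kept
```

The result depends on the order in which descriptors are visited; a real selection procedure would normally rank descriptors by their correlation with the property first.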


Table 2. Multiparametric polarity solvent scales used in this study.

Descriptor  Definition (authors' scale)  Reference
α  Hydrogen-bond donation ability (Kamlet and Taft)  31
β  Electron-pair donation ability (Kamlet and Taft)  32
π*  Polarity/polarizability parameter (Kamlet and Taft)  33
ET(30)  Polarity index (Dimroth and Reichardt)  34
Y  Polarity (expression of dielectric constant) (Koppel and Palm)  35
P  Polarizability (expression of refractive index) (Koppel and Palm)  35
E  Acidity (Koppel and Palm)  35
B  Basicity (Koppel and Palm)  35
B'  Acid-base H-bond formation induced shifts of phenol OH group stretching frequency (Koppel and Paju)  36
DN  Donor number (Gutmann)  37
AN  Acceptor number (Mayer, Gutmann, and Gerger)  38
Z  Hydrogen bonding ability (Kosower)  39
SPP  Solvent polarity (Catalán)  40
SA  Solvent acidity scale (Catalán)  41
SB  Solvent basicity scale (Catalán)  42
Bj  Cation-solvating tendency (Swain)  43
Aj  Anion-solvating tendency (Swain)  43
M  Expression of refractive index (McRae)  44
J  Expression of dielectric constant (Kirkwood and David)  44
N  Combination of M and P (McRae)  44
EB  Lewis acidity of solvents (Janowski)  45
Py  Pyrene fluorescence spectra (Dong and Winnik)  46
S  Derived from Kosower Z values (Brownstein)  47
S'  Non-specific solvent polarity scale (Drago)  48
Eb  Electrostatic basic enthalpy parameter (Drago)  49
Cb  Covalent basic enthalpy parameter (Drago)  49


2.4 CNN Methods

The computations were performed with the ADAPT (Automated Data Analysis and Pattern Recognition Toolkit) program [50, 51], including feature selection routines (genetic algorithm [52] and simulated annealing [53]) and CNN procedures [54]. The CNNs used for this analysis are three-layer, fully connected, feed-forward networks, and they have been described in detail by Jurs and coworkers [54, 55]. The number of neurons in the input layer corresponds to the number of descriptors in the model. The number of neurons in the hidden layer controls the flexibility of the network and was adjusted until the optimal network architecture was achieved. The output layer contains one neuron representing the predicted pKa value.
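To make the network structure concrete, the following sketch evaluates a three-layer, fully connected, feed-forward network with seven inputs, five hidden neurons, and one output. It is not the ADAPT implementation: the logistic activation in the hidden layer, the linear output neuron, and the random placeholder weights are assumptions, since the text does not specify them, and the helper names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic activation (assumed; the text does not name the activation)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: inputs -> sigmoid hidden layer -> single linear output."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def random_network(n_in=7, n_hidden=5, seed=0):
    """Placeholder weights for an n_in - n_hidden - 1 network (untrained)."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b_out = rng.uniform(-1, 1)
    return w_hidden, b_hidden, w_out, b_out
```

Calling `forward(x, *random_network())` with a list of seven scaled descriptor values returns one number, which after training would play the role of the predicted pKa.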

The 215 solute descriptors calculated by CODESSA and the 37 solvent descriptors were imported into the ADAPT program. These descriptors were subjected to the objective feature selection routines of ADAPT, and a reduced pool of 114 descriptors was obtained. Fully CNN-based models were developed using a genetic algorithm descriptor selection routine with a CNN for evaluating the fitness of each subset of descriptors selected. The fitness of descriptor subsets was calculated as $\mathrm{COST} = \mathrm{TSET} + 0.4\,|\mathrm{TSET} - \mathrm{CVSET}|$, where TSET and CVSET denote rms errors for the training and cross validation sets, respectively. Models chosen with this quality factor performed better than models chosen with just the training set rms error as the quality factor; that is, CNNs that produce training and cross validation set errors that are low and similar in magnitude tend to perform well in predicting properties of interest for compounds not used in the training process. A quasi-Newton method, BFGS (Broyden–Fletcher–Goldfarb–Shanno [55]), was used to train the network. It should be noted that the ratio of training set observations to Adjustable Parameters (AP) should be kept above 2.0 to avoid overtraining [56]. The number of AP is computed as $\mathrm{AP} = [(\mathrm{IL}+1) \times \mathrm{HL}] + [(\mathrm{HL}+1) \times \mathrm{OL}]$, where IL, HL, and OL denote the number of neurons in the input layer, hidden layer, and output layer, respectively. The number of hidden neurons was adjusted with a building-up procedure: start with a low number and increase it by one unit until the results achieved with that architecture are no better than those obtained with the previous one. In this work, we built architectures with four, five, and six hidden neurons, stopping at this point because the results obtained with the last one did not improve on those derived from the 7–5–1 architecture. The architecture selected for the subsequent calculations was therefore 7–5–1. With this architecture, the network contains 46 AP, corresponding to a ratio of 4.3 for training observations (199) to AP, well above the minimum acceptable ratio of 2.0. After testing several models, the best one was evaluated with the external prediction set compounds.
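The two formulas in the preceding paragraph can be checked with a few lines of code. The sketch below is only an illustration: the function names are hypothetical, and the rms errors passed to the cost function are made-up values, not results from this work. The AP count and the observation-to-AP ratio use the numbers given in the text (7–5–1 architecture, 199 training observations).

```python
def cost(tset_rmse, cvset_rmse, penalty=0.4):
    """COST = TSET + 0.4 * |TSET - CVSET| (fitness of a descriptor subset)."""
    return tset_rmse + penalty * abs(tset_rmse - cvset_rmse)


def adjustable_parameters(n_input, n_hidden, n_output=1):
    """AP = [(IL + 1) * HL] + [(HL + 1) * OL]; the +1 terms are the bias weights."""
    return (n_input + 1) * n_hidden + (n_hidden + 1) * n_output


# Hypothetical rms errors: similar training and cross validation errors are penalised less.
print(cost(1.0, 1.1))   # small gap between the two sets
print(cost(1.0, 2.0))   # large gap between the two sets

# Architectures tried in the text: four, five, and six hidden neurons.
n_training = 199
for hidden in (4, 5, 6):
    ap = adjustable_parameters(7, hidden, 1)
    print(f"7-{hidden}-1: AP = {ap}, ratio = {n_training / ap:.1f}")
```

For the selected 7–5–1 network this reproduces the 46 adjustable parameters and the ratio of about 4.3 quoted above.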

3 Results and Discussion

The seven descriptors forming the best model are shown in Table 3. This model contains five descriptors of the solute and two of the solvents. Except for the maximum partial charge for a hydrogen atom, which is an electrostatic descriptor, the other four solute descriptors are of the quantum-chemical type. The first descriptor of the model is the polarizability of the solutes, ALFA, whose values are calculated by the AM1 semi-empirical method with the MOPAC program and used directly. The second descriptor is the maximum partial charge for a hydrogen atom, which is the hydrogen atom of the phenolic O–H bond. In this case, the partial charge has been derived from the method proposed by Zefirov et al. [28]. The other three solute descriptors, LUMO+1 energy, maximum e–n attraction for a C–O bond, and relative positive charged surface area, RPCS, are quantum-chemical descriptors and they are also calculated with the MOPAC program. The α solvent descriptor belongs to the Kamlet–Taft scale [31], which is one of the most widely used scales for the hydrogen-bonding ability of solvents. This scale is based on solvatochromic parameters averaged over several probes; it measures the ability of the solvent molecules to donate hydrogen bonds to the solute, and it was designed to be devoid of contributions from polarity and electron-pair donicity. The last descriptor of the model is the dipole moment of the solvents.

The statistical results obtained are very good: the correlation coefficients (R²) for the training, prediction, and cross validation sets are 0.982, 0.977, and 0.975, respectively, and the Root-Mean-Square Errors (RMSE) for the same three subsets are 0.71, 0.83, and 0.85, respectively, showing the good predictive ability of the model. Figure 1 shows a plot of calculated versus observed pKa values for the training, prediction, and cross validation sets.
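For readers who wish to reproduce this kind of summary on their own data, the sketch below computes the two statistics from paired observed and calculated values. It assumes that R² is the squared Pearson correlation between observed and calculated pKa, which the text does not state explicitly; the function names and the toy numbers are hypothetical and are not data from this work.

```python
import math


def rmse(observed, calculated):
    """Root-mean-square error between observed and calculated values."""
    n = len(observed)
    return math.sqrt(sum((c - o) ** 2 for o, c in zip(observed, calculated)) / n)


def r_squared(observed, calculated):
    """Squared Pearson correlation coefficient (assumed definition of R2)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_c = sum(calculated) / n
    s_oc = sum((o - mean_o) * (c - mean_c) for o, c in zip(observed, calculated))
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    s_cc = sum((c - mean_c) ** 2 for c in calculated)
    return s_oc ** 2 / (s_oo * s_cc)


# Toy pKa-like values, for illustration only.
observed = [7.2, 9.9, 10.3, 18.4, 27.5]
calculated = [7.6, 9.5, 10.9, 17.8, 26.9]
print(rmse(observed, calculated), r_squared(observed, calculated))
```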

Although one of the drawbacks of neural networks is their limited interpretability, the derived model includes descriptors that allow the studied process to be analyzed. As is well known, intermolecular solute/solvent interactions are very complex, and they have been conventionally divided into two groups. The first is defined as related to non-specific or macroscopic solvent effects. These intermolecular interactions are related to the bulk

QSAR Comb. Sci. 26, 2007, No. 3, 385–397, www.qcs.wiley-vch.de, © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

Table 3. Seven descriptors forming the model

Descriptor                                                Component
ALFA polarizability (DIP)                                 Solute
Max partial charge for a H atom                           Solute
LUMO+1 energy                                             Solute
Max e–n attraction for a C–O bond                         Solute
Relative positive charged surface area, RPCS              Solute
Hydrogen-bond donation ability (Kamlet and Taft), α       Solvent
Dipole moment, μ                                          Solvent


of the solvent rather than to the directional short-distance intermolecular interactions between the solute and the individual solvent molecules. They include the solvent dielectric polarization in the field of the solute molecule, the isotropic dispersion interactions, and the solute cavity formation in the bulk of the solvent. The second type of solvent effects, the specific or microscopic effects, involves the formation of chemical bonds and other anisotropic interactions between the solute and solvent molecules in the solution. In most cases, these specific effects are related to donor–acceptor interactions (Lewis acid–base characteristics) or to hydrogen bonding between the solute and the solvent molecules [57]. Analyzing the descriptors contained in the model, we can say that the maximum partial charge for a hydrogen atom reflects the polarity of the O–H bond that is broken in the dissociation process, and consequently the ease with which this bond yields protons and phenolate anions. The highest values of this descriptor correspond to the compounds with more electron-withdrawing substituents on the phenyl ring, such as polychloro- and polynitrophenols, and the lowest values to phenol itself and alkyl-substituted phenols, in line with the basic ideas relating bond polarity to the electronic effects of substituents on aromatic rings. Analogously, the maximum e–n attraction for a C–O bond can also be related to the energy of the O–H bonds, since in all the studied phenols the oxygen atom of this C–O bond corresponds to the phenolic group. The RPCS solute descriptor belongs to the Charged Partial Surface Area (CPSA) descriptors proposed by Stanton and Jurs [58], which combine shape and electronic information to characterize molecules, and therefore encode features responsible for polar interactions between molecules. These three descriptors are clearly related to the non-specific interactions. The dipole moment of the solvent and the polarizability of the solute, being measures of the polarity of the species involved, can also be related to non-specific interactions, probably those of electrostatic nature. The specific interactions can be described in terms of localized donor–acceptor interactions involving specific orbitals, according to the Lewis model, and also by means of acid–base hydrogen-bonding interactions. The electron donor–acceptor interactions can be associated with the solute descriptor LUMO+1 energy, since it can be related to the Lewis acid character of the phenols. It is a quantum-chemical descriptor that denotes the energy of the second lowest unoccupied molecular orbital, and corresponds to the second electron affinity of the molecule. Among the descriptors contained in the model, the solvent acidity, α, is clearly related to hydrogen-bonding interactions, as indicated before. This descriptor stands for the hydrogen-bond donor properties of the solvent molecules and clearly reflects this type of specific interaction in the dissociation process.

Recently, Guha and Jurs have proposed a simple method to measure the relative importance of the descriptors in CNN-derived models [59]. The first input descriptor is randomly scrambled, and then the neural network model is used to predict the property. Because the values of the descriptor have been scrambled, the correlation between descriptor values and property values is obscured. As a result, the RMSE for these new predictions should be larger than the RMSE of the model, the so-called base RMSE. The difference between this RMSE value and the base RMSE indicates the importance of the descriptor to the predictive ability of the model; that is, if a descriptor plays a major role in the model's predictive ability, scrambling that descriptor will lead to a greater loss in predictive ability (as measured by the RMSE value) than for a descriptor that does not play such an important role in the model. This procedure is then repeated for all the descriptors present in the model, and the descriptors are ranked in order of importance. Table 4 reports the increases in RMSE due to the scrambling of the corresponding descriptor over the base RMSE. These values show the relative importance of


Figure 1. Plot of calculated vs. experimental pKa values for the training, prediction, and cross validation sets.

Table 4. Increase in RMSE due to scrambling of individual descriptors.

No.  Scrambled descriptor                                   Increment in RMSE
1    ALFA polarizability                                    0.34
2    Max partial charge for a H atom                        0.97
3    LUMO+1 energy                                          2.79
4    Max e–n attraction for a C–O bond                      0.61
5    Relative positive charged surface area, RPCS           0.53
6    Hydrogen-bond donation ability (Kamlet and Taft), α    5.61
7    Dipole moment, μ                                       4.10


the seven descriptors; the figures range from 5.61 to 0.34. The two most important descriptors are the solvent descriptors: the hydrogen-bond acidity, α, and the dipole moment, μ.
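The scrambling procedure of Guha and Jurs can be sketched in a few lines. The code below is a generic illustration, not the routine used in this work: the toy model, the toy data, and all names are hypothetical, and any callable that maps a descriptor vector to a predicted value could stand in for the trained CNN. The last lines simply rank the increments reported in Table 4.

```python
import math
import random


def rmse(observed, calculated):
    n = len(observed)
    return math.sqrt(sum((c - o) ** 2 for o, c in zip(observed, calculated)) / n)


def scrambling_importance(model, rows, targets, seed=0):
    """Return the base RMSE and, per descriptor, the RMSE increase after scrambling it."""
    rng = random.Random(seed)
    base = rmse(targets, [model(r) for r in rows])
    increases = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # scramble descriptor j only; the others stay untouched
        scrambled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        increases.append(rmse(targets, [model(r) for r in scrambled]) - base)
    return base, increases


def toy_model(row):
    """Hypothetical stand-in for the trained network: uses descriptor 0, ignores descriptor 1."""
    return 3.0 * row[0]


rows = [[float(i), float(i % 4)] for i in range(20)]
targets = [toy_model(r) for r in rows]
base_rmse, increases = scrambling_importance(toy_model, rows, targets)
print(base_rmse, increases)

# Ranking of the increments reported in Table 4 (values copied from the table).
table4 = {
    "ALFA polarizability": 0.34,
    "Max partial charge for a H atom": 0.97,
    "LUMO+1 energy": 2.79,
    "Max e-n attraction for a C-O bond": 0.61,
    "RPCS": 0.53,
    "Kamlet-Taft alpha (solvent)": 5.61,
    "Dipole moment (solvent)": 4.10,
}
ranking = sorted(table4, key=table4.get, reverse=True)
print(ranking)
```

In the toy case the ignored descriptor shows no increase in RMSE, whereas scrambling the descriptor the model depends on degrades the predictions, which is the behaviour the method relies on.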

Table 1 shows the experimental and calculated pKa values; the RMSE of all the 276 entries is 0.76. The 2,4,6-trinitrophenol (picric acid) in DMSO (entry 27) is the only case in which the residual is larger than three times the RMSE, and it can be considered an outlier. There are 14 values, due to ten different phenols, with residuals larger than twice the RMSE (Table 5). Except for 2-chloro-4-phenylphenol and 4-acetylphenol in water (entries 102 and 219, respectively), the remaining 12 values correspond to aprotic solvents, where, in general, the quality of the experimental data is slightly lower than in protic solvents. Table 5 contains phenols with and without ortho-substituents, and the residuals are positive or negative for both types of derivatives. In any case, the ortho-substituted phenols do not give poorer correlations than other subsets.
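The two thresholds used above (twice and three times the overall RMSE of 0.76) can be applied mechanically. The sketch below does so for the entries listed in Table 5; the experimental and calculated values are copied from that table, the residual is taken as calculated minus experimental (which matches the tabulated residuals), and the function and label names are hypothetical.

```python
RMSE_ALL = 0.76  # RMSE over all 276 entries, as given in the text

# (phenol, solvent, entry, pKa exp., pKa calc.) copied from Table 5
table5 = [
    ("2,4,6-Tribromophenol", "Acetone", 12, 21.10, 22.89),
    ("2,4-Dinitrophenol", "AN", 41, 18.40, 16.57),
    ("2-Chloro-4-phenylphenol", "Water", 102, 8.07, 10.23),
    ("2-Methylphenol", "AN", 117, 27.50, 25.76),
    ("3-(Methylsulfonyl)phenol", "DMSO", 131, 13.56, 11.70),
    ("3,4-Dichlorophenol", "DMF", 137, 13.22, 14.96),
    ("3,5-Dinitrophenol", "Acetone", 156, 22.70, 20.57),
    ("3,5-Dinitrophenol", "DMF", 161, 11.30, 9.07),
    ("3,5-Dinitrophenol", "DMSO", 162, 10.60, 8.65),
    ("4-Acetylphenol", "DMSO", 218, 13.68, 15.44),
    ("4-Acetylphenol", "Water", 219, 8.05, 9.90),
    ("4-Cyanophenol", "AN", 239, 22.77, 24.70),
    ("4-Methylphenol", "AN", 252, 27.45, 25.66),
    ("4-tert-Butylphenol", "AN", 265, 27.48, 25.64),
    ("2,4,6-Trinitrophenol", "DMSO", 27, -0.30, 2.04),
]


def classify(exp, calc, rmse=RMSE_ALL):
    """Label a prediction by the size of its residual relative to the RMSE."""
    ratio = abs(calc - exp) / rmse
    if ratio > 3.0:
        return "outlier"        # residual larger than three times the RMSE
    if ratio > 2.0:
        return "large residual"  # residual larger than twice the RMSE
    return "ok"


labels = {entry: classify(exp, calc) for _, _, entry, exp, calc in table5}
outliers = [entry for entry, label in labels.items() if label == "outlier"]
large = [entry for entry, label in labels.items() if label == "large residual"]
print(outliers, len(large))
```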

Table 6 shows the statistical results obtained in the prediction of some subsets of the phenols. The set of the 276 values studied has been divided into two subsets according to whether the phenol has one or two substituents in the ortho-position or is substituted only in the meta- and/or para-positions. In addition, two subsets with the pKa values in protic and aprotic solvents, respectively, have also been established. For the four subsets, the statistical results are very similar, showing the robustness and the good prediction capacity of the model. The similar results obtained for the ortho and non-ortho derivatives suggest that the so-called ortho-effect is not as important in phenols as in other organic compounds, such as benzoic acid derivatives. Analogous conclusions have been obtained in a QSPR study on the bond dissociation energy (BDE) of the O–H bond in phenols, where the proposed model estimates the BDE with similar accuracy for the ortho- and non-ortho-substituted phenols [60]. Similar conclusions were also obtained in the study on the prediction of the acidity constants of phenols and benzoic acid derivatives in methanol, using the well-known Hammett–Taft and Drago models [9]. The goodness of the fit is similar for the set of all the phenols and for the subsets of ortho- and non-ortho-substituted derivatives, while with the benzoic acids it is necessary to fit the ortho-substituted derivatives separately from the meta- and/or para-substituted ones. Table 6 also shows the good statistical results obtained for the subsets corresponding to each of the ten solvents used. Again, the predictions are better for protic than for aprotic solvents.
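The text does not define the F column of Table 6. One possible reading, offered here only as an assumption, is that it is the F statistic of a one-variable linear regression between calculated and experimental pKa, which can be obtained from R² and the number of points n. The sketch below evaluates that expression for a few subsets; the function name is hypothetical, and because the tabulated R² values are rounded to three decimals the recomputed figures can only be expected to approximate the reported ones.

```python
def f_statistic(r2, n, n_predictors=1):
    """F = (R2 / k) / ((1 - R2) / (n - k - 1)) for a regression with k predictors."""
    return (r2 / n_predictors) / ((1.0 - r2) / (n - n_predictors - 1))


# subset: (n, R2, reported F), values copied from Table 6
table6 = {
    "ortho": (141, 0.983, 8261),
    "non-ortho": (135, 0.974, 4941),
    "Protic": (154, 0.975, 6020),
    "Aprotic": (122, 0.978, 5414),
    "DMA": (4, 0.918, 23),
}

for subset, (n, r2, reported) in table6.items():
    print(f"{subset:10s} recomputed F = {f_statistic(r2, n):8.1f}   reported F = {reported}")
```

The recomputed values fall within a few percent of the tabulated ones, which is consistent with this reading but does not confirm it.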

Table 5. Predictions with residuals larger than twice the RMSE.

Phenol                      Solvent   Entry   pKa exp.   pKa calc.   Residual
2,4,6-Tribromophenol        Acetone   12      21.10      22.89        1.79
2,4-Dinitrophenol           AN        41      18.40      16.57       −1.83
2-Chloro-4-phenylphenol     Water     102      8.07      10.23        2.16
2-Methylphenol              AN        117     27.50      25.76       −1.74
3-(Methylsulfonyl)phenol    DMSO      131     13.56      11.70       −1.86
3,4-Dichlorophenol          DMF       137     13.22      14.96        1.74
3,5-Dinitrophenol           Acetone   156     22.70      20.57       −2.13
3,5-Dinitrophenol           DMF       161     11.30       9.07       −2.23
3,5-Dinitrophenol           DMSO      162     10.60       8.65       −1.95
4-Acetylphenol              DMSO      218     13.68      15.44        1.76
4-Acetylphenol              Water     219      8.05       9.90        1.85
4-Cyanophenol               AN        239     22.77      24.70        1.93
4-Methylphenol              AN        252     27.45      25.66       −1.79
4-tert-Butylphenol          AN        265     27.48      25.64       −1.84
2,4,6-Trinitrophenol        DMSO      27      −0.30       2.04        2.34

Table 6. Statistics of pKa estimations for different subsets.

Subset         n     R²      F      sd
ortho          141   0.983   8261   0.69
non-ortho      135   0.974   4941   0.81
Protic         154   0.975   6020   0.58
Aprotic        122   0.978   5414   0.93
Water          83    0.936   1194   0.56
Methanol       39    0.963   963    0.58
Isopropanol    16    0.928   180    0.77
tert-Butanol   16    0.979   663    0.51
DMSO           44    0.965   1166   0.81
DMF            29    0.955   573    0.94
AN             25    0.937   340    1.05
Acetone        10    0.949   148    1.13
NM             10    0.976   320    0.73
DMA            4     0.918   23     0.75

4 Conclusions

The QSPR methodology has been applied to a multicomponent system using simultaneously descriptors of both elements: solutes and solvents. The studied system involves a wide set of pKa values of 94 phenols in ten solvents. These solvents have diverse physicochemical properties: four are protic and six aprotic, and they have different degrees of autoprotolysis, with pKap ranging from 14.0 for water to values higher than 30 for AN, acetone, and NM; that is to say, the interval of possible pKa values of the solutes is very wide and different in each solvent. For the solutes, the usual molecular descriptors have been used, and for the solvents, several physical properties and some parameters of the most popular multiparametric polarity scales have been employed. Neural network based methods have been used in the selection of the descriptors and in the derivation of the seven-descriptor model. Of these seven descriptors, five belong to the solutes and the other two to the solvents.

As stated before, the interactions between the molecules of the solutes and the solvents in a dissociation process are very complex, and it is difficult to interpret them with a clear and simple model. Nevertheless, the descriptors of the model contain information that reflects characteristics of the molecules of the solutes and the solvents clearly related to these interactions. Thus, the dipole moment of the solvent and the solute descriptors polarizability, maximum partial charge for a hydrogen atom, maximum e–n attraction for a C–O bond, and relative positive charged surface area contain information related to the non-specific solute/solvent interactions. On the other hand, the LUMO+1 energy solute descriptor and the α solvent descriptor encode information related to the specific interactions.

The statistical results are practically the same for the different subsets of the compounds analyzed: those containing values obtained in protic and aprotic solvents, those containing phenols with and without substituents in the ortho-positions of the aromatic ring, and those corresponding to each solvent. This supports the robustness of the model. Thus, the model derived is very good in the two basic aspects of the QSPR methodology, its predictive ability and its capacity to interpret the property studied, showing that this approach can be used successfully in the analysis of multicomponent systems.

Acknowledgements

The authors thank Professor Peter C. Jurs (Pennsylvania State University) for giving us access to the ADAPT program. Financial support from the Catalan Government (Grant 2005 SGR 00184) is gratefully acknowledged.

References

[1] H. Wang, J. Ulander, Expert Opin. Drug Metab. Toxicol. 2006, 2, 139–155.
[2] M. D. Liptak, K. C. Gross, P. G. Seybold, S. Feldgus, G. C. Shields, J. Am. Chem. Soc. 2002, 124, 6421–6427.
[3] G. I. Almerindo, D. W. Tondo, J. R. Pliego, J. Phys. Chem. A 2004, 108, 166–171.
[4] Y. Fu, L. Liu, R.-Q. Li, R. Liu, Q.-X. Guo, J. Am. Chem. Soc. 2004, 126, 814–822.
[5] D. M. Chipman, J. Phys. Chem. A 2002, 106, 7413–7422.
[6] B. G. Tehan, E. J. Lloyd, M. G. Wong, W. R. Pitt, J. G. Montana, D. T. Manallack, E. Gancia, Quant. Struct.-Act. Relat. 2002, 21, 457–472.
[7] B. G. Tehan, E. J. Lloyd, M. G. Wong, W. R. Pitt, E. Gancia, D. T. Manallack, Quant. Struct.-Act. Relat. 2002, 21, 473–485.
[8] M. J. Citra, Chemosphere 1999, 38, 191–206.
[9] E. Bosch, F. Rived, M. Roses, J. Sales, J. Chem. Soc., Perkin Trans. 2 1999, 1953–1958.
[10] A. R. Katritzky, U. Maran, V. S. Lobanov, M. Karelson, J. Chem. Inf. Comput. Sci. 2000, 40, 1–18.
[11] A. R. Katritzky, A. A. Oliferenko, P. V. Oliferenko, R. Petrukhin, D. B. Tatham, U. Maran, A. Lomaka, W. E. Acree, J. Chem. Inf. Comput. Sci. 2003, 43, 1794–1805.
[12] A. R. Katritzky, S. Perumal, R. A. Petrukhin, J. Org. Chem. 2001, 66, 4036–4040.
[13] R. Bosque, J. Sales, E. Bosch, M. Roses, M. C. García-Alvarez-Coque, J. R. Torres-Lapasio, J. Chem. Inf. Comput. Sci. 2003, 43, 1240–1247.
[14] A. J. Chalk, B. Beck, T. A. Clark, J. Chem. Inf. Comput. Sci. 2001, 41, 1053–1059.
[15] T. Suzuki, R.-U. Ebert, G. Schüürmann, J. Chem. Inf. Comput. Sci. 2001, 41, 776–790.
[16] N. M. Halberstam, I. I. Baskin, V. A. Palyulin, N. S. Zefirov, Dokl. Chem. 2002, 384, 140–143.
[17] N. M. Halberstam, I. I. Baskin, V. A. Palyulin, N. S. Zefirov, Mendeleev Commun. 2002, 185–186.
[18] K. Izutsu, Acid-Base Dissociation Constants in Dipolar Aprotic Solvents, Blackwell Scientific Publications, Oxford 1990.
[19] F. Maran, D. Celadon, M. G. Severin, E. Vianello, J. Am. Chem. Soc. 1991, 113, 9320–9329.
[20] E. Bosch, R. Roses, Talanta 1989, 36, 627–632.
[21] E. Bosch, C. Rafols, M. Roses, Talanta 1989, 36, 1227–1231.
[22] E. Bosch, C. Rafols, M. Roses, Anal. Chim. Acta 1995, 302, 109–119.
[23] F. Rived, M. Roses, E. Bosch, Anal. Chim. Acta 1998, 374, 309–324.
[24] M. Roses, F. Rived, E. Bosch, J. Chromatograph. A 2000, 867, 45–56.
[25] A. R. Katritzky, V. S. Lobanov, M. Karelson, CODESSA, Reference Manual V 2.13, Semichem and the University of Florida, 1997.
[26] M. J. S. Dewar, E. G. Zoebisch, E. F. Healy, J. P. P. Stewart, J. Am. Chem. Soc. 1985, 107, 3902–3909.
[27] J. P. P. Stewart, QCPE, No 455, Indiana University, Bloomington, IN 1989.
[28] N. S. Zefirov, M. A. Kirpichenok, F. F. Izmailov, M. I. Trofimov, Dokl. Akad. Nauk SSSR 1987, 296, 883–887.
[29] J.-L. M. Abboud, R. Notario, Pure Appl. Chem. 1999, 71, 645–718.
[30] A. R. Katritzky, T. Tamm, Y. Wang, S. Sild, M. Karelson, J. Chem. Inf. Comput. Sci. 1999, 39, 684–691.
[31] R. W. Taft, M. J. Kamlet, J. Am. Chem. Soc. 1976, 98, 2886–2894.
[32] M. J. Kamlet, R. W. Taft, J. Am. Chem. Soc. 1976, 98, 377–383.
[33] M. J. Kamlet, J.-L. Abboud, R. W. Taft, J. Am. Chem. Soc. 1977, 99, 6027–6038.
[34] K. Dimroth, C. Reichardt, T. Siepmann, F. Bohlmann, Liebigs Ann. Chem. 1963, 661, 1–37.
[35] I. A. Koppel, V. A. Palm, in: N. B. Chapman, J. Shorter (Eds.), The Influence of Solvent on Organic Reactivity in Advances in Linear Free Energy Relationships, Plenum Press, London 1972, Chapter 5.
[36] I. A. Koppel, A. I. Paju, Org. Reactivity 1974, 1, 121–136.
[37] V. Gutmann, E. Wychera, Inorg. Nucl. Chem. Lett. 1966, 2, 257–260.
[38] U. Mayer, V. Gutmann, W. Gerger, Monatsh. Chem. 1975, 1235–1257.
[39] E. M. Kosower, J. Am. Chem. Soc. 1958, 80, 3253–3260.
[40] J. Catalan, V. Lopez, P. Perez, R. Martín-Villamil, J. G. Rodríguez, Liebigs Ann. 1995, 241–252.
[41] J. Catalan, C. A. Díaz, Liebigs Ann./Recueil 1997, 1941–1949.
[42] J. Catalan, C. Díaz, V. Lopez, P. Perez, J. L. G. Paz, J. G. Rodríguez, Liebigs Ann. 1996, 1785–1794.
[43] C. G. Swain, M. S. Swain, A. L. Powell, S. Alunni, J. Am. Chem. Soc. 1983, 105, 502–513.
[44] F. W. Fowler, A. R. Katritzky, R. J. D. Rutherford, J. Chem. Soc. B 1971, 460–489.
[45] A. Janowski, I. Turowska-Tyrk, P. K. Wrona, J. Chem. Soc., Perkin Trans. 2 1985, 821–825.
[46] D. C. Dong, M. A. Winnik, Photochem. Photobiol. 1982, 35, 17–21.
[47] S. Brownstein, Can. J. Chem. 1960, 38, 1590–1596.
[48] R. S. Drago, J. Chem. Soc., Perkin Trans. 2 1992, 1827–1838.
[49] R. S. Drago, Applications of Electrostatic-Covalent Models in Chemistry, Surfside, Gainesville 1994.
[50] P. C. Jurs, J. T. Chow, M. Yuan, in: E. C. Olson, R. E. Christoffersen (Eds.), Computer-Assisted Drug Design, The American Chemical Society, Washington, DC 1979, pp 103–129.
[51] A. J. Stuper, W. E. Brugger, P. C. Jurs, Computer-Assisted Studies of Chemical Structure and Biological Function, Wiley, New York 1979.
[52] B. T. Luke, J. Chem. Inf. Comput. Sci. 1994, 34, 1279–1287.
[53] J. M. Sutter, S. L. Dixon, P. C. Jurs, J. Chem. Inf. Comput. Sci. 1995, 35, 77–84.
[54] L. Xu, J. W. Ball, S. L. Dixon, P. C. Jurs, Environ. Toxicol. Chem. 1994, 13, 841–851.
[55] M. D. Wessel, P. C. Jurs, Anal. Chem. 1994, 66, 2480–2487.
[56] D. J. Livingstone, D. T. Manallack, J. Med. Chem. 1993, 36, 1295–1297.
[57] C. Reichardt, Solvents and Solvent Effects in Organic Chemistry, 3rd Edn., VCH, Weinheim 2003.
[58] D. T. Stanton, P. C. Jurs, Anal. Chem. 1990, 62, 2323–2329.
[59] R. Guha, P. C. Jurs, J. Chem. Inf. Comput. Model. 2005, 45, 800–806.
[60] R. Bosque, J. Sales, J. Chem. Inf. Comput. Sci. 2003, 43, 637–642.
