Estimate Water Solubility
![Page 1: Estimate Water Solubility](https://reader033.fdocuments.us/reader033/viewer/2022061521/554ea788b4c905977e8b49ee/html5/thumbnails/1.jpg)
ESTIMATE WATER SOLUBILITY
Christos Kannas
University of Cyprus
Department of Computer Science
![Page 2: Estimate Water Solubility](https://reader033.fdocuments.us/reader033/viewer/2022061521/554ea788b4c905977e8b49ee/html5/thumbnails/2.jpg)
2nd RDKit UGM 2
Outline
• Introduction
• Related Work
  • ESOL
• RDKit based Implementation
• Results
  • Correlation Table & Chart
• Conclusion
3rd October, 2013
![Page 3: Estimate Water Solubility](https://reader033.fdocuments.us/reader033/viewer/2022061521/554ea788b4c905977e8b49ee/html5/thumbnails/3.jpg)
Introduction
• Need to estimate the solubility of molecules in:
  • DMSO (CS(=O)C), and
  • Water.
• Predictive Models for DMSO and Water Solubility.
![Page 4: Estimate Water Solubility](https://reader033.fdocuments.us/reader033/viewer/2022061521/554ea788b4c905977e8b49ee/html5/thumbnails/4.jpg)
Related Work
![Page 5: Estimate Water Solubility](https://reader033.fdocuments.us/reader033/viewer/2022061521/554ea788b4c905977e8b49ee/html5/thumbnails/5.jpg)
Related Work
• J. S. Delaney, “ESOL: Estimating Aqueous Solubility Directly from Molecular Structure,” Journal of Chemical Information and Modeling, vol. 44, no. 3, pp. 1000–1005, May 2004.
![Page 6: Estimate Water Solubility](https://reader033.fdocuments.us/reader033/viewer/2022061521/554ea788b4c905977e8b49ee/html5/thumbnails/6.jpg)
Related Work: ESOL
• ESOL – Estimated SOLubility
  • Linear Regression Model
  • 8 Molecular Properties (Initially)
• Preeminent Method: General Solubility Equation (GSE), which uses logP and melting point (Tm)
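For context, the General Solubility Equation is commonly quoted as log Sw = 0.5 - 0.01 (Tm - 25) - logP, with Tm in °C and no melting-point penalty for compounds that are liquid at 25 °C. The sketch below encodes that commonly quoted form. The function name and the example values are illustrative and do not come from the slides.

```python
def gse_log_sw(log_p, melting_point_c=25.0):
    """General Solubility Equation in its commonly quoted form.

    log_p           -- octanol/water partition coefficient (log units)
    melting_point_c -- melting point in degrees Celsius
    Returns log10 of the aqueous solubility in mol/L.
    """
    # Compounds that are liquid at 25 C get no melting-point penalty.
    mp_term = max(melting_point_c - 25.0, 0.0)
    return 0.5 - 0.01 * mp_term - log_p


if __name__ == "__main__":
    # Hypothetical solid with logP = 2.0 and a melting point of 125 C.
    print(gse_log_sw(2.0, 125.0))
```

The need for a measured melting point is what separates the GSE from ESOL, which works from the structure alone.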
![Page 7: Estimate Water Solubility](https://reader033.fdocuments.us/reader033/viewer/2022061521/554ea788b4c905977e8b49ee/html5/thumbnails/7.jpg)
ESOL: Molecular Properties (Initial) 1/3
• clogP – Daylight CLOGP v4.72
• MolWeight
• RotBonds – Rotatable Bonds, as defined by Daylight SMARTS patterns
![Page 8: Estimate Water Solubility](https://reader033.fdocuments.us/reader033/viewer/2022061521/554ea788b4c905977e8b49ee/html5/thumbnails/8.jpg)
ESOL: Molecular Properties (Initial) 2/3
• Aromatic Proportion (AromProp) – The proportion of heavy atoms in the molecule that are in an aromatic ring. Aromatic atoms are matched with the Daylight SMARTS [a].
• Non-Carbon Proportion – The proportion of heavy atoms in the molecule that are not carbon. Non-carbon atoms are matched with the Daylight SMARTS [!#6].
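The two proportions above can be sketched with RDKit substructure matching. This is a minimal illustration, not the original Daylight-based calculation: the function names are hypothetical, RDKit is imported lazily inside the function that needs it, and the non-carbon pattern is tightened to `[!#6;!#1]` as a safeguard, because `[!#6]` alone would also match hydrogens if they were explicit atoms in the molecule graph.

```python
def atom_proportion(matching_atoms, heavy_atoms):
    """Fraction of heavy atoms that satisfy some criterion."""
    if heavy_atoms <= 0:
        raise ValueError("molecule has no heavy atoms")
    return matching_atoms / heavy_atoms


def esol_proportions(smiles):
    """Return (aromatic proportion, non-carbon proportion) for a SMILES string."""
    # Imported here so atom_proportion() stays usable without RDKit installed.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")

    heavy = mol.GetNumHeavyAtoms()
    # Each match of a single-atom SMARTS pattern corresponds to one atom.
    aromatic = len(mol.GetSubstructMatches(Chem.MolFromSmarts("[a]")))
    # Assumption: hydrogens are excluded explicitly (see the note above).
    non_carbon = len(mol.GetSubstructMatches(Chem.MolFromSmarts("[!#6;!#1]")))
    return atom_proportion(aromatic, heavy), atom_proportion(non_carbon, heavy)
```

For phenol (`c1ccccc1O`), with seven heavy atoms of which six are aromatic and one is oxygen, `esol_proportions` should return 6/7 and 1/7.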
![Page 9: Estimate Water Solubility](https://reader033.fdocuments.us/reader033/viewer/2022061521/554ea788b4c905977e8b49ee/html5/thumbnails/9.jpg)
ESOL: Molecular Properties (Initial) 3/3
• H-bond Donors
• H-bond Acceptors
• Polar Surface Area – Peter Ertl’s Polar Surface Area
![Page 10: Estimate Water Solubility](https://reader033.fdocuments.us/reader033/viewer/2022061521/554ea788b4c905977e8b49ee/html5/thumbnails/10.jpg)
ESOL: Methodology
• Multiple Linear Regression
• Significance of each parameter is judged by its absolute t-statistic.
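As a reminder of what the t-statistic measures, the standard ordinary least squares expressions are given below. The notation is an assumption for illustration: X is the n × p descriptor matrix including an intercept column, y holds the measured log solubilities, n is the number of molecules, and p is the number of fitted coefficients.

```latex
\hat{\beta} = (X^{\top} X)^{-1} X^{\top} y, \qquad
\hat{\sigma}^{2} = \frac{\lVert y - X\hat{\beta} \rVert^{2}}{n - p}, \qquad
t_j = \frac{\hat{\beta}_j}{\sqrt{\hat{\sigma}^{2}\,\bigl[(X^{\top} X)^{-1}\bigr]_{jj}}}
```

A large |t_j| means the coefficient is large relative to its standard error, so the corresponding descriptor contributes meaningfully to the fit.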
![Page 11: Estimate Water Solubility](https://reader033.fdocuments.us/reader033/viewer/2022061521/554ea788b4c905977e8b49ee/html5/thumbnails/11.jpg)
ESOL: Train Dataset
• Training Set: 2874 molecules
  • Small – Low MolWeight organic compounds
  • Medium – Pesticide products, MolWeight 200-300
  • Large – Syngenta compounds, MolWeight 300-400
![Page 12: Estimate Water Solubility](https://reader033.fdocuments.us/reader033/viewer/2022061521/554ea788b4c905977e8b49ee/html5/thumbnails/12.jpg)
ESOL: Results
• 4 parameters with t-statistic > 2
  • clogP
  • MolWeight
  • RotBonds
  • AromProp
Log(Sw) = 0.16
- 0.63 x clogP
- 0.0062 x MolWeight
+ 0.066 x RotBonds
- 0.74 x AromProp
J. S. Delaney, “ESOL: Estimating Aqueous Solubility Directly from Molecular Structure,” Journal of Chemical Information and Modeling, vol. 44, no. 3, pp. 1000–1005, May 2004.
![Page 13: Estimate Water Solubility](https://reader033.fdocuments.us/reader033/viewer/2022061521/554ea788b4c905977e8b49ee/html5/thumbnails/13.jpg)
RDKit Implementation
![Page 14: Estimate Water Solubility](https://reader033.fdocuments.us/reader033/viewer/2022061521/554ea788b4c905977e8b49ee/html5/thumbnails/14.jpg)
RDKit Based Implementation 1/2
• Use Regression Equation:
Log(Sw) = 0.16
- 0.63 x clogP
- 0.0062 x MolWeight
+ 0.066 x RotBonds
- 0.74 x AromProp
• Calculate properties using RDKit.
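The code from the implementation slides did not survive in this transcript, so the following is only a sketch of how the regression equation could be evaluated with RDKit descriptors. It assumes Crippen logP (`Crippen.MolLogP`) as a stand-in for Daylight CLOGP and RDKit's own rotatable-bond definition (`Lipinski.NumRotatableBonds`). Both are substitutions, and the coefficients were fitted against the Daylight values, so predictions will differ somewhat from the published ESOL numbers. The function names are hypothetical.

```python
def esol_log_sw(clogp, mol_weight, rot_bonds, arom_prop):
    """ESOL regression equation with the coefficients quoted on the slide."""
    return (0.16
            - 0.63 * clogp
            - 0.0062 * mol_weight
            + 0.066 * rot_bonds
            - 0.74 * arom_prop)


def rdkit_clog_sw(smiles):
    """Estimate log(Sw) for a SMILES string using RDKit descriptors."""
    # Imported here so esol_log_sw() stays usable without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem import Crippen, Descriptors, Lipinski

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    heavy = mol.GetNumHeavyAtoms()
    if heavy == 0:
        raise ValueError("molecule has no heavy atoms")

    clogp = Crippen.MolLogP(mol)                  # assumption: Crippen, not Daylight CLOGP
    mol_weight = Descriptors.MolWt(mol)
    rot_bonds = Lipinski.NumRotatableBonds(mol)   # assumption: RDKit's definition
    aromatic = sum(1 for atom in mol.GetAtoms() if atom.GetIsAromatic())
    return esol_log_sw(clogp, mol_weight, rot_bonds, aromatic / heavy)
```

Calling `rdkit_clog_sw("CS(=O)C")` would give an estimate for DMSO. Keeping the equation separate from the descriptor calculation makes it straightforward to swap in a different logP source later.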
![Page 15: Estimate Water Solubility](https://reader033.fdocuments.us/reader033/viewer/2022061521/554ea788b4c905977e8b49ee/html5/thumbnails/15.jpg)
RDKit Based Implementation 2/2
![Page 16: Estimate Water Solubility](https://reader033.fdocuments.us/reader033/viewer/2022061521/554ea788b4c905977e8b49ee/html5/thumbnails/16.jpg)
RDKit Based Implementation 2/2
![Page 17: Estimate Water Solubility](https://reader033.fdocuments.us/reader033/viewer/2022061521/554ea788b4c905977e8b49ee/html5/thumbnails/17.jpg)
RDKit Based Implementation 2/2
![Page 18: Estimate Water Solubility](https://reader033.fdocuments.us/reader033/viewer/2022061521/554ea788b4c905977e8b49ee/html5/thumbnails/18.jpg)
Results
![Page 19: Estimate Water Solubility](https://reader033.fdocuments.us/reader033/viewer/2022061521/554ea788b4c905977e8b49ee/html5/thumbnails/19.jpg)
Testing…
• Supplementary Dataset:
  • 1143 molecules with:
    • Measured Water Solubility (logSw)
    • ESOL
• Correlation Charts:
  • Measured vs ESOL
  • Measured vs RDKit_clogSw
  • ESOL vs RDKit_clogSw
  • Measured vs ESOL vs RDKit_clogSw
![Page 20: Estimate Water Solubility](https://reader033.fdocuments.us/reader033/viewer/2022061521/554ea788b4c905977e8b49ee/html5/thumbnails/20.jpg)
Correlation Table & Chart
| | IMPORTED_measured log(solubility:mol/L) | IMPORTED_ESOL predicted log(solubility:mol/L) | clogSw |
|---|---|---|---|
| IMPORTED_measured log(solubility:mol/L) | 1 | | |
| IMPORTED_ESOL predicted log(solubility:mol/L) | 0.90794375 | 1 | |
| clogSw | 0.864718601 | 0.964683313 | 1 |
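A table like this can be reproduced from three columns of values. The sketch below assumes the entries are Pearson correlation coefficients, which the slides do not state explicitly. The function names are illustrative, and the toy numbers in the usage block are invented for demonstration rather than taken from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlation_table(columns):
    """Map every (name_a, name_b) pair of columns to its Pearson r."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


if __name__ == "__main__":
    # Toy values only, NOT from the 1143-molecule dataset.
    toy = {
        "measured": [-1.0, -2.5, -4.0, -0.5],
        "ESOL": [-1.2, -2.2, -3.7, -0.9],
        "clogSw": [-1.5, -2.0, -3.5, -1.1],
    }
    for pair, r in correlation_table(toy).items():
        print(pair, round(r, 3))
```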
[Chart: Predicted vs Measured. x-axis: Measured log(solubility:mol/L), from -12 to 2; y-axis: Predicted log(solubility:mol/L), from -10 to 4; series: IMPORTED_ESOL predicted log(solubility:mol/L) and clogSw, each with a linear trend line.]
![Page 21: Estimate Water Solubility](https://reader033.fdocuments.us/reader033/viewer/2022061521/554ea788b4c905977e8b49ee/html5/thumbnails/21.jpg)
Conclusion
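• Comparable results: the RDKit-based clogSw has a correlation of 0.86 with measured solubility, against 0.91 for the imported ESOL values.
• Easy, fast and relatively accurate.
• What is the importance of adding hydrogens prior to the Aromatic Proportion calculation?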
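One way the hydrogen question can matter is through the denominator. The sketch below assumes the proportion is computed from an atom count taken after `Chem.AddHs`. In that case `GetNumAtoms()` includes the added hydrogens while `GetNumHeavyAtoms()` does not. Whether the original implementation did this is not stated in the transcript, and the function names are hypothetical.

```python
def aromatic_proportion(aromatic_atoms, atom_count):
    """Aromatic atoms divided by whichever atom count is used as denominator."""
    if atom_count <= 0:
        raise ValueError("atom count must be positive")
    return aromatic_atoms / atom_count


def compare_denominators(smiles):
    """Aromatic proportion over heavy atoms vs. over all atoms after AddHs."""
    # Imported here so aromatic_proportion() stays usable without RDKit installed.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    mol_h = Chem.AddHs(mol)  # hydrogens become explicit atoms in the graph
    aromatic = sum(1 for atom in mol_h.GetAtoms() if atom.GetIsAromatic())
    return {
        "heavy_atoms": aromatic_proportion(aromatic, mol_h.GetNumHeavyAtoms()),
        "all_atoms": aromatic_proportion(aromatic, mol_h.GetNumAtoms()),
    }
```

For benzene (`c1ccccc1`), the heavy-atom denominator gives 6/6 while the all-atom denominator gives 6/12, so the AromProp term of the regression would be halved. ESOL defines the proportion over heavy atoms.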
![Page 22: Estimate Water Solubility](https://reader033.fdocuments.us/reader033/viewer/2022061521/554ea788b4c905977e8b49ee/html5/thumbnails/22.jpg)