The National Technical University of Athens QSAR Group – Overview of Research Activities
ATHENS, August 2008
NTUA QSAR Group – Structure
The NTUA group emerged out of the collaboration between two research laboratories located in the School of Chemical Engineering at NTUA: the Laboratory of Process Control and Informatics and the Laboratory of Organic Chemistry.
It is headed by Haralambos Sarimveis, Asst. Professor in Process Control and Informatics, and involves one additional faculty member, one postdoctoral associate, one research associate at Ph.D. level, one software developer and several postgraduate and undergraduate students.
The collaboration between the two laboratories started in 2002, in recognition of the fact that progress in the design of new molecules with improved properties can be accelerated by the application of existing quantitative methodologies and the development of new methods based on information sciences, computer technologies and computational intelligence.
NTUA QSAR Group – Activities and Objectives
Although the group was formed quite recently, it has already published numerous papers in top scientific journals, established collaborations with other research groups (Fleming Research Institute, University of Athens, University of Cyprus, Universita degli Studi di Firenze, University of North Carolina, NovaMechanics Ltd) and participated in several research programs.
The group has worked in many application areas (fuels, polymers, food properties), but it has focused on the very challenging and important pharmaceutical industry, developing QSAR models that predict the activity and toxicity of existing and potential new pharmaceutical compounds.
Supported by its parallel research activities on simulation of biological and toxicological systems, development of ADMET and physiologically based pharmacokinetic (PBPK) models and automation of drug delivery systems, the objective of the NTUA research work is to support the different phases of the drug discovery process, from hit finding through lead optimization. The vision of the group is to contribute to the development of a highly automated system that will optimize the therapy strategy for each individual patient.
NTUA QSAR Group – Strategy for designing novel compounds using QSAR models
QSAR DEVELOPMENT
• Database: Compounds – Activity/Property/Toxicity
• Descriptor calculation
• Variable selection – Modeling
• Model validation: 1. Test set (R², RMS), cross-validation, Y-randomization 2. Domain of applicability
NOVEL STRUCTURE DESIGN
• Design of novel compounds: virtual screening, data mining, inverse QSAR
EXPERIMENT
• Experimental synthesis
• Experimental evaluation of activity/property/toxicity
NTUA QSAR Group – QSAR model development 1. Database design
Selection of compounds
Lead compounds and derivatives
Representative of the structures under study
Wide range of structural characteristics
Experimental data (activities, toxicity)
Protocol
Experimental data
Literature
Calculation of descriptors: topological indices (Randić, Kier & Hall), stereochemical indices (molecular volume V), electronic/quantum descriptors (EHOMO, ELUMO), physicochemical descriptors (logP)
Commercial software
In house software
Experimental data
Literature
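As an illustration of the topological indices named above, the Randić connectivity index of a hydrogen-suppressed molecular graph is the sum of 1/sqrt(d_i · d_j) over all bonds, where d_i and d_j are the degrees of the two bonded atoms. The sketch below is a minimal, hypothetical implementation (the function name and the edge-list input format are assumptions, not the group's in-house software):

```python
import math

def randic_index(edges):
    """Randic connectivity index of a hydrogen-suppressed molecular graph.

    edges: list of (i, j) pairs of bonded heavy-atom indices (assumed format).
    """
    # Count how many bonds each heavy atom takes part in (its vertex degree).
    degree = {}
    for i, j in edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    # Sum 1/sqrt(d_i * d_j) over all bonds.
    return sum(1.0 / math.sqrt(degree[i] * degree[j]) for i, j in edges)

# n-butane carbon skeleton: C0-C1-C2-C3
butane = [(0, 1), (1, 2), (2, 3)]
# isobutane carbon skeleton: central C0 bonded to C1, C2 and C3
isobutane = [(0, 1), (0, 2), (0, 3)]

print(randic_index(butane), randic_index(isobutane))
```

The branched isomer gets the lower value, which is the kind of structural information such an index is meant to encode.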
NTUA QSAR Group – QSAR model development 2. Model generation
Variable selection
Elimination stepwise regression (ES-SWR)
Genetic algorithm developed in-house (GASA-RBF)
Modeling methodologies
Linear – Multiple linear regression (MLR), Partial least squares (PLS)
Neural networks – Radial basis function (RBF) networks trained using the fuzzy means algorithm or the subtractive clustering algorithm, both developed in-house
Support Vector Machines (SVM) using the LIBSVM software
NTUA QSAR Group – QSAR model development 3. Model validation
• Standard statistical indices (R², RMS, F)
• Predictive ability tested on external data sets
• Cross-validation
• Y-randomization test
• Domain of applicability
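Two of the validation checks listed above, cross-validation and the Y-randomization test, can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses a single-descriptor least-squares line instead of the group's models, synthetic data, and hypothetical function names. The idea carries over: a sound model should keep a high leave-one-out Q² on the real activities and lose it when the activities are shuffled.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(ys, preds):
    """Coefficient of determination of predictions against observed values."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def q2_loo(xs, ys):
    """Leave-one-out cross-validated R2: each point is predicted by a
    model fitted on all the other points."""
    preds = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(slope * xs[i] + intercept)
    return r_squared(ys, preds)

def y_randomization(xs, ys, rounds=20, seed=0):
    """Q2 values obtained after shuffling the responses; they should be low."""
    rng = random.Random(seed)
    scores = []
    for _ in range(rounds):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        scores.append(q2_loo(xs, shuffled))
    return scores

# Synthetic data set (an assumption for the demo): activity depends linearly
# on one descriptor, plus a small amount of noise.
data_rng = random.Random(1)
xs = [data_rng.uniform(-2.0, 2.0) for _ in range(30)]
ys = [2.0 * x + 1.0 + data_rng.gauss(0.0, 0.2) for x in xs]

q2_true = q2_loo(xs, ys)
q2_scrambled = y_randomization(xs, ys)
print(q2_true, max(q2_scrambled))
```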
NTUA QSAR Group- Design of novel compounds
Virtual Screening
Structural modifications by insertion, deletion, replacement, etc. of substituents or substructures, followed by prediction of activity/toxicity from the QSAR model
Data mining
Search for chemical similarity between active compounds and other compounds.
Inverse optimization method
Formulation and solution of mathematical optimization problems with constraints (e.g. connectivity, valence) for the identification of the lead compound with optimal characteristics
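For the data mining step above, the slides do not say which similarity measure is used. A common choice, assumed here purely for illustration, is the Tanimoto coefficient on binary structural fingerprints. In this sketch fingerprints are plain Python sets of "on" bit indices, and all names and data are hypothetical:

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)

def rank_by_similarity(query_fp, library):
    """Return (name, similarity) pairs, most similar to the query first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy fingerprints (made-up bit indices, not real molecules).
active = {1, 4, 7, 9, 12}
library = {
    "cand_A": {1, 4, 7, 9, 13},
    "cand_B": {2, 3, 5},
    "cand_C": {1, 4, 7, 9, 12},
}
ranking = rank_by_similarity(active, library)
print(ranking)
```

Candidates at the top of such a ranking would then be passed to the QSAR model for activity prediction.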
NTUA QSAR Group – Case studies, Solving QSPR problems
"Prediction of High Weight Polymers Glass Transition Temperature Using RBF Neural Networks", Journal of Molecular Structure: THEOCHEM 2005, 716, 193-198
"Prediction of Intrinsic Viscosity in Polymer-Solvent Combinations using a QSPR model", Polymer 2006, 47, 3240-3248
"A novel QSPR model to predict θ (lower critical solution temperature) in polymer solutions using molecular descriptors", Journal of Molecular Modeling 2007, 13, 55-64
"Development and Evaluation of a QSPR Model for the Prediction of Diamagnetic Susceptibility", QSAR & Combinatorial Science 2008, 27 (4), 432-436
NTUA QSAR Group – Case studies, Solving QSAR - QSTR problems
QSAR Problems
"QSAR study on para-substituted aromatic sulfonamides as carbonic anhydrase II inhibitors using topological information indices", Bioorganic and Medicinal Chemistry 2006, 14 (4), 1108-1114
"A Novel QSAR Model for Evaluating and Predicting the Inhibition Activity of Dipeptidyl Aspartyl Fluoromethylketones", QSAR & Combinatorial Science 2006, 25, 928-935
"A Novel QSAR Model for Predicting Induction of Apoptosis by 4-Aryl-4H-chromenes", Bioorganic and Medicinal Chemistry 2006, 14, 6686-6694
"A novel QSAR model for predicting the inhibition of CXCR3 receptor by 4-N-aryl-[1,4]diazepane ureas", European Journal of Medicinal Chemistry (accepted, 2008)
QSTR Problems
"A novel RBF neural network training methodology to predict toxicity to Vibrio fischeri", Molecular Diversity 2006, 10, 213-221
"Prediction of toxicity using a novel RBF neural network training methodology", Journal of Molecular Modeling 2006, 12, 297-305
NTUA QSAR Group – Case studies, Virtual Screening – In Silico Lead Optimization
"Identification of a series of novel derivatives as potent HCV inhibitors by a ligand-based virtual screening optimized procedure", Bioorganic & Medicinal Chemistry 2007, 15, 7237-7247
"Optimization of Biaryl Piperidine and 4-Amino-2-biarylurea MCH1 Receptor Antagonists using QSAR Modeling, Classification Techniques and Virtual Screening", Journal of Computer-Aided Molecular Design 2007, 21, 251-267
"Investigation of Substituent Effect of 1-(3,3-Diphenylpropyl)-Piperidinyl Phenylacetamides Amides on CCR5 Binding Affinity using QSAR and Virtual Screening Techniques", Journal of Computer-Aided Molecular Design 2006, 20, 83-95
"A Novel Simple QSAR Model for the Prediction of anti-HIV Activity Using Multiple Linear Regression Analysis", Molecular Diversity 2006, 10, 405-414
NTUA QSAR Group – QSAR Software under development-1
The user can load existing mol files or create new mol files
NTUA QSAR Group – The RBF neural network architecture
A special neural network architecture with important advantages:
• Simple network topology
• Fast training algorithms (usually split into two phases)
• Linear relationship between the hidden layer and the output layer
• Accurate predictions (in many test cases RBF networks have been shown to give better results than other neural network types)
NTUA QSAR Group – The RBF neural network topology
[Figure: RBF network with an input layer x = [x1 x2 x3], a hidden layer of four radial basis function nodes with centers c1, c2, c3, c4, and a summation (Σ) output layer with weights w1, w2, w3, w4.]
Network output:
$$y = \sum_{j=1}^{4} w_j \, f\left(\lVert x - c_j \rVert\right)$$
Distance of the input vector from hidden node center c_j:
$$\lVert x - c_j \rVert = \sqrt{\sum_{i=1}^{3} \left(x_i - c_j(i)\right)^2}$$
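The topology above (three inputs, four hidden nodes, a linear output layer) can be written out directly. The slides do not state which radial basis function f is used, so the sketch below assumes a Gaussian with a hypothetical `width` parameter; the centers and weights are made-up numbers for illustration only.

```python
import math

def rbf_output(x, centers, weights, width=1.0):
    """Output of an RBF network: y = sum_j w_j * f(||x - c_j||).

    f is assumed to be a Gaussian, f(d) = exp(-(d / width)^2); the slides
    do not specify the basis function.
    """
    total = 0.0
    for c, w in zip(centers, weights):
        # Euclidean distance between the input vector and the node center.
        dist = math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))
        total += w * math.exp(-(dist / width) ** 2)
    return total

# Four hidden nodes in a three-dimensional input space (illustrative values).
centers = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights = [1.0, 0.5, -0.5, 2.0]

y_at_origin = rbf_output([0.0, 0.0, 0.0], centers, weights)
print(y_at_origin)
```

Because the output is linear in the weights once the centers are fixed, the weights can be obtained by linear least squares, which is what makes the two-phase training mentioned earlier fast.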
NTUA QSAR Group – The fuzzy means algorithm
(Sarimveis et al., 2002, Industrial and Engineering Chemistry Research)
An RBF network training algorithm that:
• Is very fast, since it requires only one pass of the training examples
• Determines the hidden layer structure automatically
• Locates the hidden node centers so that they are not close to each other
• Provides a solution that does not depend on an initial random selection
The fuzzy means algorithm determines the proper number of hidden nodes
and calculates the hidden node center locations. The rest of the network
parameters are determined using conventional techniques.
The key concept behind the algorithm is the idea of the fuzzy partition of the
input space into a number of fuzzy subsets.
[Figure: two-dimensional example of the fuzzy partition. The x1 axis carries the labels a1,1 to a1,5 and the x2 axis the labels a2,1 to a2,5; the pair a1,2, a2,3 is marked separately.]
NTUA QSAR Group – Fuzzy partition of the input space
Assuming a system with N input variables, the domain of each input variable is evenly partitioned into a number of triangular fuzzy subsets.
Fuzzy partitioning is then extended to the entire input space, so that a number of fuzzy subspaces are created, where each fuzzy subspace is defined as a combination of N particular fuzzy sets, one per input variable.
The multidimensional membership function of an input vector x(k) in a fuzzy subspace A^l is defined through the relative distance
$$rd_l(x(k)) = \frac{\left[\sum_{i=1}^{N} \left(a_i^l - x_i(k)\right)^2\right]^{1/2}}{\left[\sum_{i=1}^{N} \left(\delta a_i\right)^2\right]^{1/2}}$$
where a_i^l is the center of the fuzzy set that A^l uses for input i and δa_i is the half-width of the triangular fuzzy sets of input i.
NTUA QSAR Group – Flowchart of the fuzzy means algorithm
1. First data point [x(1), y(1)]: determination of the first fuzzy subspace (hidden neuron center); L = 1.
2. New data point [x(k), y(k)]: compute $rd_{l_0}(x(k)) = \min_{1 \le l \le L} rd_l(x(k))$.
3. Test $0 \le rd_{l_0}(x(k)) \le 1$. YES: no subspace is added; continue with the next data point (step 2). NO: determination of the next fuzzy subspace (hidden neuron center), L = L + 1; continue with the next data point (step 2).
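The one-pass center selection described in the flowchart can be sketched as follows. This is a simplified reading, not the published implementation: it assumes that the fuzzy set centers of each input are evenly spaced between a lower and an upper bound, that the half-width δa_i equals the spacing between adjacent centers, and that a new subspace is built from the fuzzy set nearest to the data point in each dimension. All function and parameter names are hypothetical.

```python
import math

def fuzzy_means_centers(data, lower, upper, n_sets):
    """Select hidden node centers in one pass over the data (sketch).

    data: list of input vectors; lower/upper: per-input bounds;
    n_sets: number of fuzzy sets per input (at least 2).
    """
    n_inputs = len(lower)
    # Assumed half-width: distance between adjacent fuzzy set centers.
    widths = [(upper[i] - lower[i]) / (n_sets - 1) for i in range(n_inputs)]
    norm = math.sqrt(sum(w ** 2 for w in widths))

    def nearest_subspace(x):
        # In each dimension pick the fuzzy set whose center is closest to x.
        center = []
        for i in range(n_inputs):
            idx = round((x[i] - lower[i]) / widths[i])
            idx = min(max(idx, 0), n_sets - 1)
            center.append(lower[i] + idx * widths[i])
        return tuple(center)

    def rd(center, x):
        # Relative distance of x from a subspace center.
        dist = math.sqrt(sum((a - xi) ** 2 for a, xi in zip(center, x)))
        return dist / norm

    centers = []
    for x in data:
        # Add a subspace only if no existing one covers x (min rd > 1).
        if not centers or min(rd(c, x) for c in centers) > 1.0:
            centers.append(nearest_subspace(x))
    return centers

points = [(0.1, 0.1), (0.12, 0.2), (0.9, 0.8), (0.5, 0.5)]
selected = fuzzy_means_centers(points, lower=[0.0, 0.0], upper=[1.0, 1.0], n_sets=5)
print(selected)
```

The second point is already covered by the first subspace, so only three centers are created from four data points. No random initialization is involved, so the result depends only on the data, their order, and the number of fuzzy sets.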
NTUA QSAR Group – 1st stage of GASA-RBF
Hybrid coding of candidate solutions (chromosomes)
• Binary coding for each descriptor (first N genes)
• Integer coding for the number of fuzzy sets used in the fuzzy means algorithm
Example chromosome: 0 1 1 0 1 0 1 | 8 (seven binary genes, one per descriptor x1 to x7, followed by the number of fuzzy sets; each descriptor xi is a data column xi(1), xi(2), ..., xi(k))
Creation of initial population
• Descriptors: probability equal to 50% for every digit to receive value 1
• Fuzzy sets: random selection from a normal distribution between LB and UB
Objective function
• Leave-one-out cross-validation
$$\mathrm{RMSECV}_{j}^{GA} = \sqrt{\frac{\sum_{i=1}^{K} \left(\hat{y}_{(i),j} - y_i\right)^2}{K}}$$
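The hybrid coding, the creation of the initial population and the RMSECV objective of this stage can be illustrated as below. The slide only says that the number of fuzzy sets is drawn from a normal distribution between LB and UB; centering it at the midpoint, using a standard deviation of (UB - LB)/6 and clipping to the bounds are assumptions of this sketch, as are all function names.

```python
import math
import random

def random_chromosome(n_descriptors, lb, ub, rng):
    """One candidate: N binary descriptor genes plus one integer gene."""
    # Each descriptor is switched on with probability 50%.
    bits = [1 if rng.random() < 0.5 else 0 for _ in range(n_descriptors)]
    # Assumed normal draw centered between LB and UB, clipped to the bounds.
    mean, sd = (lb + ub) / 2.0, (ub - lb) / 6.0
    fz = int(round(rng.gauss(mean, sd)))
    fz = min(max(fz, lb), ub)
    return bits + [fz]

def decode(chromosome):
    """Split a chromosome into selected descriptor indices and fuzzy-set count."""
    bits, fz = chromosome[:-1], chromosome[-1]
    selected = [i for i, b in enumerate(bits) if b == 1]
    return selected, fz

def rmsecv(y_true, y_loo_pred):
    """Root mean squared error of leave-one-out predictions over K examples."""
    k = len(y_true)
    return math.sqrt(sum((p - y) ** 2 for y, p in zip(y_true, y_loo_pred)) / k)

# The example chromosome from the slide: 0 1 1 0 1 0 1 | 8
example = [0, 1, 1, 0, 1, 0, 1, 8]
selected_descriptors, n_fuzzy_sets = decode(example)
print(selected_descriptors, n_fuzzy_sets)  # zero-based indices of x2, x3, x5, x7

rng = random.Random(0)
population = [random_chromosome(7, 4, 12, rng) for _ in range(10)]
```

In a full run, each chromosome would be scored by training an RBF network on the selected descriptors with the encoded number of fuzzy sets and passing its leave-one-out predictions to `rmsecv`.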
NTUA QSAR Group – 1st stage of GASA-RBF (continued)
Reproduction
• Roulette wheel selection: each chromosome is allocated a slot on the roulette, with size proportional to its fitness
Cross-over
• Strings of genes are exchanged between pairs of chromosomes
b1 b2 … bpos | bpos+1 … bn fzb
c1 c2 … cpos | cpos+1 … cn fzc
Mutation
• Binary genes: flip bit mutation (the values in a small percentage of genes for each population are inverted)
• Integer genes: non-uniform mutation
$$fz_{new} = \begin{cases} fz_{old} + \left(UB - fz_{old}\right)\left(1 - r^{(1 - t/T)^{b}}\right), & \text{if random digit is 0} \\ fz_{old} - \left(fz_{old} - LB\right)\left(1 - r^{(1 - t/T)^{b}}\right), & \text{if random digit is 1} \end{cases}$$
In the standard notation for non-uniform mutation, t is the current generation, T the maximum number of generations, r a random number in [0, 1] and b a user-set shape parameter.
Exploitation operators: intensified search in spaces of high quality solutions
Exploration operators: new solution spaces are explored
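The genetic operators of this stage can be sketched as below, under a few stated assumptions: fitness values are non-negative and larger is better (for an RMSECV objective some transform such as its reciprocal would be needed, which the slides do not specify), the cross-over is a single-point exchange of the tail including the fuzzy-set gene, and the mutated integer gene is rounded to the nearest integer. Function names are hypothetical.

```python
import random

def roulette_select(population, fitness, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    total = sum(fitness)
    pick = rng.uniform(0.0, total)
    running = 0.0
    for chrom, fit in zip(population, fitness):
        running += fit
        if pick < running:
            return chrom
    return population[-1]

def one_point_crossover(parent_b, parent_c, pos):
    """Exchange the gene strings after position pos between two parents."""
    child1 = parent_b[:pos] + parent_c[pos:]
    child2 = parent_c[:pos] + parent_b[pos:]
    return child1, child2

def flip_bit(chromosome, rate, rng):
    """Invert each binary gene with probability rate; keep the integer gene."""
    bits = [1 - b if rng.random() < rate else b for b in chromosome[:-1]]
    return bits + [chromosome[-1]]

def non_uniform_mutation(fz_old, lb, ub, t, t_max, b, rng):
    """Non-uniform mutation of the integer gene; steps shrink as t nears t_max."""
    r = rng.random()
    shrink = 1.0 - r ** ((1.0 - t / t_max) ** b)
    if rng.random() < 0.5:  # plays the role of "random digit is 0"
        fz_new = fz_old + (ub - fz_old) * shrink
    else:                   # "random digit is 1"
        fz_new = fz_old - (fz_old - lb) * shrink
    return int(round(fz_new))

rng = random.Random(3)
parent_b = [0, 1, 1, 0, 1, 0, 1, 8]
parent_c = [1, 1, 0, 0, 0, 1, 0, 5]
child1, child2 = one_point_crossover(parent_b, parent_c, pos=3)
print(child1, child2)
```

Early in the run (small t) the mutated value can land anywhere between LB and UB, while at t = T the gene is left unchanged, which shifts the search from exploration to fine tuning.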
NTUA QSAR Group – 2nd stage of GASA-RBF
SIMULATED ANNEALING
Probability of accepting a worse solution:
$$P = e^{-\frac{f(s_{new}) - f(s_{cur})}{T_k}}$$
• Initially, almost all solutions are accepted: random search
• As T approaches zero, only improving solutions are accepted: local search
The following design parameters must be specified: 1. Initial value of T 2. Strategy for reducing T 3. Final value of T
Cooling schedule:
$$T_{k+1} = r \, T_k, \quad k = 0, 1, 2, \ldots, \qquad 0.80 \le r \le 0.99$$
GENERALIZED SIMULATED ANNEALING
Probability of accepting a worse solution:
$$P = e^{-\beta \, \frac{f(s_{new}) - f(s_{cur})}{f(s_{bsf}) - f_{cons}}}$$
• No need to determine a cooling schedule
• Only β must be determined by the user
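The two acceptance rules of this stage can be compared with a short sketch. It assumes a minimization problem, that improving moves are always accepted, that β multiplies the exponent of the generalized rule, and that f_cons lies below the best-so-far value so that the denominator is positive. These points are assumptions where the slides leave the details open, and the function names are hypothetical.

```python
import math

def sa_accept_probability(f_new, f_cur, temperature):
    """Classical simulated annealing acceptance probability (minimization)."""
    if f_new <= f_cur:
        return 1.0
    return math.exp(-(f_new - f_cur) / temperature)

def geometric_cooling(t0, r, steps):
    """Temperatures T_0, T_1, ..., T_steps with T_{k+1} = r * T_k."""
    temps = [t0]
    for _ in range(steps):
        temps.append(r * temps[-1])
    return temps

def gsa_accept_probability(f_new, f_cur, f_bsf, f_cons, beta):
    """Generalized simulated annealing acceptance probability.

    The scale is set by the gap between the best-so-far value f_bsf and the
    constant f_cons instead of a temperature, so no cooling schedule is needed.
    """
    if f_new <= f_cur:
        return 1.0
    return math.exp(-beta * (f_new - f_cur) / (f_bsf - f_cons))

temps = geometric_cooling(t0=1.0, r=0.9, steps=50)
p_hot = sa_accept_probability(1.2, 1.0, temps[0])    # early: worse moves often accepted
p_cold = sa_accept_probability(1.2, 1.0, temps[-1])  # late: worse moves rarely accepted
p_gsa = gsa_accept_probability(1.2, 1.0, f_bsf=1.0, f_cons=0.0, beta=1.0)
print(p_hot, p_cold, p_gsa)
```

With the classical rule the user must choose the initial temperature, the reduction factor and the stopping temperature; with the generalized rule only `beta` is left to tune.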
NTUA QSAR Group- References
Tsekouras, G, H. Sarimveis and G. Bafas, “A method for fuzzy system identification based on clustering analysis”, (Systems Analysis Modeling Simulation, 39,543-558, 2001).
Tsekouras, G, H. Sarimveis, C. Raptis and G. Bafas, “A fuzzy logic approach for system qualitative characteristics”, (Computers & Chemical Engineering, 26, 429-438, 2002).
Sarimveis, H., A. Alexandridis, G. Tsekouras and G. Bafas, “A fast and efficient algorithm for training radial basis function neural networks based on a fuzzy partition of the input space”, (Industrial & Engineering Chemistry Research, 41, 751-759, 2002).
Tsekouras, G., H. Sarimveis, G. Bafas, “A simple algorithm for training fuzzy systems using input-output data” (Advances in Engineering Software, 34(5) 247-259, 2003).
Sarimveis, H, A. Alexandridis, G. Bafas, “A fast training algorithm for RBF networks based on subtractive clustering” (Neurocomputing, 51 501-505, 2003).
Sarimveis H. A. Alexandridis, S. Mazarakis, G. Bafas, “A new algorithm for developing dynamic radial basis function neural network models based on genetic algorithms”, (Computers and Chemical Engineering, 28(1-2), 209-217, 2004).
Tsekouras G., H. Sarimveis, “A new approach for measuring the validity of the fuzzy c-means algorithm”, (Advances in Engineering Software, 35(8-9), 567-575, 2004).
Tsekouras G., H. Sarimveis, E. Kavakli, G. Bafas “A hierarchical fuzzy-clustering approach to fuzzy modeling”, (Fuzzy Sets and Systems, 150(2), 245-266, 2005).
Alexandridis A., P. Patrinos, H. Sarimveis, G. Tsekouras, “A two-stage evolutionary algorithm for variable selection in the development of RBF neural network models”, (Chemometrics and Intelligent Laboratory Systems, 75(2), 149-162, 2005).
Afantitis A., G. Melagraki, K. Makridima, A. Alexandridis, H. Sarimveis, O. Iglessi-Markopoulou, “Prediction of High Weight Polymers Glass Transition Temperature Using RBF Neural Networks” (THEOCHEM: Journal of Molecular Structure, 716(1-3), 193-198, 2005).
G. Melagraki, Afantitis A., H. Sarimveis, O. Iglessi-Markopoulou, C. T. Supuran, “QSAR study on para-substituted aromatic sulfonamides as carbonic anhydrase II inhibitors using topological information indices”, (Bioorganic & Medicinal Chemistry, 14(4), 1108-1114, 2006).
G. Melagraki, Afantitis A., K. Makridima, H. Sarimveis, O. Iglessi-Markopoulou “Prediction of toxicity using a novel RBF neural network training methodology”, (Journal of Molecular Modeling, 12(3), 297-305, 2006).
A. Afantitis, Melagraki G., H. Sarimveis, P. A. Koutentis, J Markopoulos, O. Iglessi-Markopoulou, "Prediction of the Intrinsic Viscosity of Polymer – Solvent Combinations using a QSPR model", (Polymer, 47(9), 3240-3248, 2006).
A. Afantitis, Melagraki G., H. Sarimveis, P. A. Koutentis, J Markopoulos, O. Iglessi-Markopoulou, "Investigation of Substituent Effect of 1-(3,3-Diphenylpropyl)-Piperidinyl Phenylacetamides Amides on CCR5 Binding Affinity using QSAR and Virtual Screening Techniques", (Journal of Computer-Aided Molecular Design, 20, 83-95, 2006).
G. Melagraki, Afantitis A., H. Sarimveis, O. Iglessi-Markopoulou, A. Alexandridis “A novel RBF neural network training methodology to predict toxicity to Vibrio fischeri”, (Molecular Diversity, 10(2), 213-221, 2006).
A. Afantitis, Melagraki G., H. Sarimveis, P. A. Koutentis, J Markopoulos, O. Iglessi-Markopoulou, " A Novel QSAR Model for Predicting Induction of Apoptosis by 4-Aryl-4H-chromenes", (Bioorganic and Medicinal Chemistry, 14, 6686-6694, 2006).
A. Afantitis, Melagraki G., H. Sarimveis, P. A. Koutentis, J Markopoulos, O. Iglessi-Markopoulou, “A Novel Simple QSAR Model for the Prediction of anti-HIV Activity Using Multiple Linear Regression Analysis”, (Molecular Diversity , 10, 405-414, 2006).
A. Afantitis, Melagraki G., H. Sarimveis, P. A. Koutentis, J Markopoulos, O. Iglessi-Markopoulou, "A Novel QSAR Model for Evaluating and Predicting the Inhibition Activity of Dipeptidyl Aspartyl Fluoromethylketones", (QSAR & Combinatorial Science, 25(10), 928-935, 2006).
Melagraki G., A. Afantitis, H. Sarimveis, P. A. Koutentis, J Markopoulos, O. Iglessi-Markopoulou, " A novel QSPR model to predict θ(lower critical solution temperature) in polymer solutions using molecular descriptors", (Journal of Molecular Modeling, 13(1), 55-64, 2007).
Melagraki G., A. Afantitis, H. Sarimveis, P. A. Koutentis, J Markopoulos, O. Iglessi-Markopoulou, "Optimization of Biaryl Piperidine and 4-Amino-2-biarylurea MCH1 Receptor Antagonists using QSAR Modeling, Classification Techniques and Virtual Screening", (Journal of Computer-Aided Molecular Design, 21(5), 251-267, 2007).
Melagraki G., A. Afantitis, H. Sarimveis, P. A. Koutentis, J Markopoulos, O. Iglessi-Markopoulou, " Identification of a series of novel derivatives as potent HCV inhibitors by a ligand – based virtual screening optimized procedure", (Bioorganic and Medicinal Chemistry, 15, 7237-7247, 2007).
A. Afantitis, Melagraki G., H. Sarimveis, P. A. Koutentis, J Markopoulos, O. Iglessi-Markopoulou, "Development and Evaluation of a QSPR Model for the Prediction of Diamagnetic Susceptibility”, (QSAR & Combinatorial Science, 27(4), 432-436, 2008).
A. Afantitis, Melagraki G., H. Sarimveis, O. Iglessi-Markopoulou, G. Kollias, "A novel QSAR model for predicting the inhibition of CXCR3 receptor by 4-N-aryl-[1,4] diazepane ureas”, accepted, European Journal of Medicinal Chemistry, 2008.