An Introduction to Chemoinformatics for the postgraduate students of Agriculture
Chemoinformatics and Applications in Agrochemical Discovery
C. Devakumar and Rajesh Kumar
Division of Agricultural Chemicals, IARI, New Delhi-110012
Chemical Space
            Stars       Small molecules
Existing    10^22       10^7
Virtual     0           10^60 (?)
Mode        Real        Virtual
Access      Difficult   “Easy”
Chemical Space: Small Molecules in Organic Chemistry
Understanding chemical space
Small molecules:
• chemical synthesis
• drug design
• chemical genomics
• systems biology
• nanotechnology
• and others
Cost to develop and time to market of various products
Registration of safer chemicals
Proportion of pesticide active ingredients that are considered to be safer (biological chemicals and reduced-risk conventional chemicals) has steadily increased over the last several years.
Source: EPA, 1999.
Plant biotechnology opens new markets / solutions
The development of in vivo screening for agrochemicals
Overall Outline
1. Introduction
2. Molecular Representations
3. Chemical Data and Databases
4. Molecular Similarity
5. Chemical Reactions
6. Machine Learning and Other Predictive Methods
7. Molecular Docking and Drug Discovery
What is Chemoinformatics?
• It encompasses the design, creation, organisation, management, retrieval, analysis, dissemination, visualization and use of chemical information
• It is the mixing of information resources to transform data into information and information into knowledge, for the intended purpose of making better decisions faster in the arena of drug lead identification and optimization
• “the set of computer algorithms and tools to store and analyse chemical data in the context of drug discovery and design projects”
• Chemoinformatics is the application of informatics methods to solve chemical problems
Resources
Books:
J. Gasteiger and T. Engel (Editors) (2003). Chemoinformatics: A Textbook. Wiley.
A.R. Leach and V. J. Gillet (2005). An Introduction to Chemoinformatics. Springer.
Journal:
Journal of Chemical Information and Modeling
Web:
http://cdb.ics.uci.edu
and many more
History of Chemoinformatics
The first, and still the core, journal for the subject, the Journal of Chemical Documentation, started in 1961 (its name changed to the Journal of Chemical Information and Computer Sciences in 1975).
The first book appeared in 1971 (Lynch, Harrison, Town and Ash, Computer Handling of Chemical Structure Information)
The first international conference on the subject was held in 1973 at Noordwijkerhout, and it has been held every three years since 1987.
Chemoinformatics
Chemoinformatics encompasses the design, creation, organisation, management, retrieval, analysis, dissemination, visualization and use of chemical information
Cheminformatics combines the scientific working fields of chemistry and computer science, for example in chemical graph theory and the mining of chemical space. Chemical space is expected to contain at least 10^62 molecules.
Is it Cheminformatics or Chemoinformatics?
Year   Cheminformatics   Chemoinformatics   Ratio (cheminformatics/chemoinformatics)
2000   39                684                0.05
2001   8,010             2,910              2.75
2002   34,000            16,000             2.12
2003   58,143            32,872             1.77
2004   85,435            60,439             1.41
2005   658,298           272,096            2.41
2006   317,000+          163,000+           1.94
• Synonyms: cheminformatics, molecular informatics, chemical informatics, or even chemobioinformatics
Why Do We Need Chemoinformatics?
1) The enormous amount of data, and the need to maintain it
2) Can we gain enough knowledge from the known data to make predictions for those cases where the required information is not available?
3) Relationships between the structure of a compound and its biological activity, or the influence of reaction conditions on chemical reactivity.
Advances in theoretical and computational chemistry now allow chemists to model chemical compounds “in silico” with ever-increasing accuracy.
Molecular properties now becoming accessible through computation include molecular shape, electronic structure, physical properties, chemical reactivity, protein folding, structures of materials and surfaces, catalytic activity, and biochemical activities.
Chemoinformatics integrates a comprehensive knowledge of chemistry with an extensive understanding of information technology.
The intersection of chemistry and information technology embraces an expanding territory:
computational modeling of individual molecules, thermodynamic methods of estimating chemical properties, methods of predicting biological activity of hypothetical compounds, and organization and classification of chemical information.
Chemoinformatics
Schematic representation of a crowded cell. An array of different molecules can function independently under extremely crowded conditions, partly because of judicious distributions of oppositely charged polar groups on the molecular surfaces. However, such systems are in some ways extremely fragile. For example, a mutation that alters just one amino acid in the haemoglobin molecule can trigger massive aggregation and give rise to a fatal genetic disease, sickle-cell anaemia. More generally, many disorders of old age, most famously Alzheimer’s disease, result from the increasingly facile conversion of normally soluble proteins into intractable deposits that occur particularly as we get older. Many of these aggregation processes involve the reversion of the unique, biologically active forms of polypeptide chains into a generic, non-functional ‘chemical’ form.
Additional computational challenges lie in indexing and classifying the infinite population of chemical compounds that could be synthesized or are already known.
Specific indexing and search problems include:
• how to find a compound that might block a specific biological target;
• how to predict the most efficient synthetic strategy for a desired compound from available precursors;
• how to employ results of bioactivity tests on a family of molecules to design improved versions.
Currently combinatorial chemists are developing new methods of synthesizing libraries of related compounds on an unprecedented scale.
Such libraries can be used to produce huge arrays of materials for investigation of biochemical, catalytic, or material properties.
Systems are required to design, catalog, and search these libraries, assess test results in a meaningful way, and integrate new information with existing chemical databases.
Investigations into information storage at the molecular level are underway, bringing to full circle the link between chemistry and information technology.
The Scope of Chemoinformatics
Representations and Structure Searching
Substructure Searching
Similarity Searching, Clustering, and Diversity Analysis
Searching Databases
Computer-aided Structure Elucidation
3D Substructure Searching
QSAR and Docking
Structure and applications of chemoinformatics
• Database design and programming
• Representation and searching of chemical structures
• Structure, substructure & similarity searching in 2D & 3D
• Markush and reaction searching
• Representation and searching of biological databases
• Chemoinformatics software
• Data analysis techniques: clustering; evolutionary algorithms; graph theory; neural networks
• Chemical information sources
• Cheminformatics applications
• Techniques used to design bioactive compounds
• Molecular simulation and design
• Drug discovery process; QSAR; combi-chem; SBDD
• Spectroscopy and crystallography in cheminformatics
Kinds of chemistry databases
• Small-molecule databases
– Databases of commercially-available compounds (e.g. ACD, http://www.mdl.com/products/experiment/available_chem_dir/index.jsp)
– Proprietary chemical structure databases
– Literature databases
– Patent databases
– Small project-specific databases
• Protein databases
– Public, online databases (e.g. PDB, http://www.pdb.org)
– Proprietary and project-specific databases
Software Companies
• Accelrys - large chemoinformatics company
• ACD/Labs - analytical informatics & predictions
• BCI - 2D fingerprinting, clustering toolkits & software
• Bioreason - HTS data analysis software
• CambridgeSoft - 2D drawing tools & e-notebooks
• CAS - produces SciFinder Scholar searching software
• ChemAxon - Java-based toolkits and software
• Daylight - 2D representation & searching software
• Leadscope - 2D structure and property tools
• Lion Bioscience - produces LeadNavigator
• MDL - large chemoinformatics company
• OpenEye - fast 3D docking, structure generation, toolkits
• Quantum Pharmaceuticals - prediction, docking, screening
• Sage Informatics - ChemTK 2D analysis software
• Tripos - large chemoinformatics company
Journals & Magazines
• Journal of Chemical Information and Computer Sciences
• Journal of Computer-Aided Molecular Design
• Journal of Molecular Graphics and Modelling
• Journal of Medicinal Chemistry
• NetSci (online journal)
• Scientific Computing World
• Bio-IT World
• Drug Discovery Today
Newsletters, Mailing Lists & Other Hubs
• Chemical Informatics Letters - monthly newsletter
• CHMINF-L (Indiana) - email discussion list
• Chemoinf Yahoo Group - email discussion list
• Chemistry Software Yahoo Group
• Cheminformatics.org - lots of links and QSAR datasets
• Reactive Reports - chemistry web magazine
SMILES (Simplified Molecular Input Line Entry Specification)
[Figure: 2D structure of acetaminophen with atoms numbered 1 to 11]
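Acetaminophen
SMILES representation: c1c(O)ccc(NC(=O)C)c1
SMILES rules:
• Aliphatic atoms: capital letters (C, N, O)
• Aromatic atoms: small letters (c, n, o)
• Rings: the two atoms that close a ring carry the same ring-closure number
• Double bonds: “=” sign
• Parentheses: branching in the molecule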
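The rules above are enough to turn a SMILES string into a molecular graph. The following is a minimal Python sketch, assuming only the organic subset (B, C, N, O, P, S, F, Cl, Br, I and aromatic b, c, n, o, p, s) with no bracket atoms, charges, isotopes or stereochemistry; the function names are illustrative and not part of any toolkit, and aromatic bonds are given order 1.5 purely as a convention. Real work would use the full SMILES parser of a chemoinformatics toolkit.

```python
import re
from collections import Counter

# Organic-subset atoms, bond symbols, branch parentheses and ring-closure
# digits. "Cl" and "Br" come first so they are not split into "C" + "l".
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]|[-=#:]|[()]|%\d\d|\d")
BOND_ORDER = {"-": 1, "=": 2, "#": 3, ":": 1.5}


def bond_order(symbol, a, b):
    """Explicit bond symbol wins; otherwise aromatic-aromatic is 1.5, else single."""
    if symbol is not None:
        return BOND_ORDER[symbol]
    return 1.5 if a[1] and b[1] else 1


def parse_smiles(smiles):
    """Return (atoms, bonds): atoms are (element, is_aromatic), bonds are (i, j, order)."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported SMILES (organic subset only): {smiles}")
    atoms, bonds = [], []
    branches = []   # atom index saved at each "(" so ")" can return to it
    prev = None     # atom that the next atom will be bonded to
    pending = None  # explicit bond symbol waiting for the next atom or ring closure
    rings = {}      # open ring-closure label -> (atom index, bond symbol)
    for tok in tokens:
        if tok == "(":
            branches.append(prev)
        elif tok == ")":
            prev = branches.pop()
        elif tok in BOND_ORDER:
            pending = tok
        elif tok.isdigit() or tok.startswith("%"):
            if tok in rings:  # second occurrence of a label closes the ring
                j, symbol = rings.pop(tok)
                bonds.append((j, prev, bond_order(pending or symbol, atoms[j], atoms[prev])))
            else:             # first occurrence opens it
                rings[tok] = (prev, pending)
            pending = None
        else:                 # an atom: lowercase means aromatic
            atoms.append((tok.capitalize(), tok.islower()))
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx, bond_order(pending, atoms[prev], atoms[idx])))
            prev, pending = idx, None
    if rings or branches:
        raise ValueError(f"unclosed ring or branch: {smiles}")
    return atoms, bonds


def heavy_atom_counts(smiles):
    """Count heavy atoms by element (hydrogens are implicit in SMILES)."""
    atoms, _ = parse_smiles(smiles)
    return Counter(element for element, _ in atoms)


atoms, bonds = parse_smiles("c1c(O)ccc(NC(=O)C)c1")  # acetaminophen
print(heavy_atom_counts("c1c(O)ccc(NC(=O)C)c1"))    # Counter({'C': 8, 'O': 2, 'N': 1})
print(len(bonds))                                    # 11
```

The ring-closure digit "1" in the acetaminophen string produces the bond that closes the benzene ring, and the "(=O)" branch produces the single C=O double bond of the amide.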
Sources of 3D structure information
• X-ray crystallography
• NMR spectroscopy
DRAWING AND DEPICTING 2D STRUCTURES
Web-based drawing tools
JME (http://www.molinspiration.com/cgi-bin/properties) is a clean, simple Java drawing tool. Draw your structure and click on the smiley face to show the SMILES.
Marvin Sketch is a Java applet that allows you to draw structures and export them as SMILES, MDL MOL files or other formats.
Web-based depiction tools
Daylight Depiction Tool (http://www.daylight.com/daycgi/depict) is a very simple to use tool that allows you to enter a SMILES string and will then produce a 2D structure diagram from it.
CACTVS GIF generator has a more complex interface, but allows many more options for producing GIF picture files from SMILES or other structure formats. The quality of the images is superior to that of the Daylight tool.
MDL Chime (http://www.mdlchime.com) is a browser-based plugin that can display both 2D and interactive 3D structures in web pages.
2D searching with Oracle chemistry cartridges
• Daylight DayCart – http://www.daylight.com/products/daycart.html
• Tripos Auspyx – http://www.tripos.com/sciTech/inSilicoDisc/chemInfo/auspyx.html
• Accelrys Accord for Oracle – http://www.accelrys.com/accord/oracle.html
• MDL Relational Chemistry Server – http://www.mdl.com/products/isisdirect.html
• IDBS ActivityBase – http://www.id-bs.com/products/abase/
• Chemaxon JChem Cartridge – http://www.jchem.com
3D Structure generation and minimization
Concord from Tripos, Inc. is one of the first 3D structure generation programs and is still being refined and developed. It generates single, minimal-energy structures from input 2D structures, and it can read and write a variety of file formats. http://www.tripos.com/sciTech/inSilicoDisc/chemInfo/concord.html
Corina from the Gasteiger group is similar to Concord. http://www2.chemie.uni-erlangen.de/software/corina/free_struct.html
Omega from OpenEye is the most recent of these programs. It offers very fast generation of multiple low-energy conformers. http://www.eyesopen.com/products/applications/omega.html
Depiction Tools for 3D Structures
MDL Chime is a web browser plug-in that allows 2D and 3D structures to be viewed in web pages. It can be used to visualize both proteins and small molecules, and includes some limited ability to create molecular surfaces. It is excellent for communicating structures via the web and for use in writing web-based chemoinformatics software. http://www.mdlchime.com
ArgusLab is a free molecular modeling program that has a fairly extensive set of options for 3D visualization, calculation of surfaces and properties, minimization, and molecular docking. http://www.arguslab.com.
Data Analysis Methods
• Unsupervised: artificial neural networks, genetic algorithms
• Supervised: inductive learning methods, statistics, pattern recognition methods
Methods for Calculating Physical and Chemical Data
• Quantum mechanical calculations
• Additive schemes
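Additive schemes estimate a property by summing contributions from atoms or fragments. Molecular weight is the simplest, exactly additive case, so the Python sketch below uses it to show the pattern; it assumes a plain molecular formula such as C8H9NO2 (acetaminophen) and includes standard atomic weights for only a handful of elements. Schemes for properties such as logP or molar refractivity follow the same pattern with fitted fragment contributions, which are not reproduced here.

```python
import re

# Standard atomic weights (g/mol) for a few common elements; extend as needed.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
                 "F": 18.998, "P": 30.974, "S": 32.06, "Cl": 35.45, "Br": 79.904}


def molecular_weight(formula):
    """Sum atomic contributions: each element's weight times its count in the formula."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHT[symbol] * (int(count) if count else 1)
    return total


print(round(molecular_weight("C8H9NO2"), 2))  # acetaminophen
```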
Chemistry-Based Data Mining and Exploration
[Figure: scheme with boxes for chemical(s) of concern, chemical-specific data, structural analogue, property analogue, biological or mechanistic analogue, databases, data mining, structure searchable, and structure-activity relationships]
Chemometrics
• Quantitative analysis of chemical data long relied almost exclusively on multilinear regression analysis.
• Artificial neural networks
An artificial neural network (ANN), or simply neural network (NN), is an interconnected group of artificial neurons that uses a mathematical or computational model for information processing based on a connectionist approach to computation.
[Figure: neural network with an input layer, two hidden layers and an output layer]
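The input-hidden-output structure of a neural network can be made concrete with a forward pass. The Python sketch below assumes a fully connected feed-forward network with sigmoid activations, three input descriptors, two hidden layers of two neurons each and one output neuron; the weights are made-up placeholders rather than a trained model, and training (for example by back-propagation) is not shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases):
    """One layer: each neuron takes a weighted sum of all inputs plus a bias."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, network):
    """Propagate input values through each (weights, biases) layer in turn."""
    values = inputs
    for weights, biases in network:
        values = dense_layer(values, weights, biases)
    return values


# Hypothetical 3-2-2-1 network; the numbers are placeholders, not fitted values.
network = [
    ([[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, 0.1]),  # input -> hidden layer 1
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -0.2]),             # hidden 1 -> hidden layer 2
    ([[1.2, -0.7]], [0.05]),                               # hidden 2 -> output
]

descriptors = [0.9, 0.1, 0.4]  # e.g. three scaled molecular descriptors
print(forward(descriptors, network))
```

Because every neuron uses a sigmoid, the output always lies between 0 and 1, which suits, for example, a predicted probability of activity.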
Computer-Assisted Structure Elucidation (CASE)
• A field of exercise for artificial intelligence techniques.
• The DENDRAL project, initiated in 1964 at Stanford University.
Computer-Assisted Synthesis Design (CASD)
• In 1969, Corey and Wipke worked on the development of a synthesis design system.
1. Substructure searching
2. Similarity searching
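Substructure and similarity searching are commonly run on binary fingerprints. The Python sketch below stores a fingerprint as an integer bit vector (an assumption made for brevity) and shows the two basic operations. The first is the Tanimoto coefficient, |A∩B| / |A∪B|, used in similarity searching. The second is the bit-subset screen that discards molecules before an exact atom-by-atom substructure match. The 8-bit fingerprints are hypothetical; real fingerprints have hundreds to thousands of bits.

```python
def popcount(x: int) -> int:
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Bits set in both fingerprints divided by bits set in either."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 1.0


def passes_screen(query: int, target: int) -> bool:
    """A target can contain the query substructure only if it sets every query bit.
    Passing is necessary, not sufficient: an exact graph match must follow."""
    return (query & target) == query


# Hypothetical 8-bit fingerprints, for illustration only.
mol_a = 0b10110110
mol_b = 0b10010111
query = 0b10000110

print(round(tanimoto(mol_a, mol_b), 3))  # 0.667 (4 shared bits / 6 bits set in either)
print(passes_screen(query, mol_a), passes_screen(query, 0b00010111))  # True False
```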
Applications of Chemoinformatics
1. Chemical Information
Storage and retrieval of chemical structures and associated data to manage the flood of data
Dissemination of data on the internet
Cross-linking of data to information
2. All fields of chemistry
• Prediction of the physical, chemical, or biological properties of compounds
3. Bioactive molecules
• Identification of new lead structures
• Optimization of lead structures
• Establishment of quantitative structure-activity relationships
• Comparison of chemical libraries
• Definition and analysis of structural diversity
• Planning of chemical libraries
• Analysis of high-throughput data
• Docking of a ligand into a receptor
• Prediction of the metabolism of xenobiotics
• Analysis of biochemical pathways
4. Organic Chemistry
• Prediction of the course and products of organic reactions
• Design of organic synthesis
5. Analytical Chemistry
• Analysis of data from analytical chemistry to make predictions on the quality, origin, and age of the investigated objects
• Elucidation of the structure of a compound based on spectroscopic data
Teaching Chemoinformatics
Chemists have to become more efficient in planning their experiments and have to extract more knowledge from their data.
Toxicity Prediction for Chemical Q
[Figure: toxicity prediction scheme for chemical Q, with boxes for data collection, analogue search, chemical class assignment, class-based SAR model, global toxicity model, toxicity prediction, supporting information, hypothesis generation and weight-of-evidence presentation of toxicity]
Institutes Offering Courses on Chemoinformatics
• University of Barcelona, Spain
• University of Erlangen-Nürnberg, Germany
• Bioinformatics Institute of India, Chandigarh
• Georgia Institute of Technology
• University of Sheffield (Willett) - MSc/PhD programs
• University of Erlangen (Gasteiger)
• UCSF (Kuntz)
• University of Texas (Pearlman)
• Yale (Jorgensen)
• University of Michigan (Crippen)
• Indiana University (Wiggins) - MSc program
• Cambridge Unilever (Glen, Goodman, Murray-Rust)
• Scripps - Molecular Graphics Lab
SAR Application
Drug design: maximum activity, minimum toxicity
• Single therapeutic target
• Drug-like chemicals
• Some toxicity anticipated
Environmental protection: prediction of toxicity
• Multiple unknown targets
• Diverse structures
• Humans and ecosystems
QSAR STUDIES
DESIGN OF INSECTICIDE SYNERGISTS
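[Structure: general structure of the furapiole analogues, with ring oxygens, two OCH3 groups and a variable OR group]
FURAPIOLE ANALOGUES
$\log SF = 0.319\,R_M + 0.445\,\sigma_R + 0.248\,B_1 + 0.034\,B_4 - 0.966$
n = 14, s = 0.057, r = 0.950, F = 21.04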
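Once fitted, an equation like the furapiole one is used predictively by plugging in descriptor values for a new analogue. The Python sketch below simply evaluates that equation. The descriptor values are hypothetical placeholders, not data from the study, and log SF is converted back to SF assuming a base-10 logarithm.

```python
# Coefficients of the furapiole-analogue equation (R_M, sigma_R, B1, B4).
COEFFS = {"R_M": 0.319, "sigma_R": 0.445, "B1": 0.248, "B4": 0.034}
INTERCEPT = -0.966


def predict_log_sf(descriptors):
    """log SF = intercept + sum of coefficient * descriptor (KeyError if one is missing)."""
    return INTERCEPT + sum(coef * descriptors[name] for name, coef in COEFFS.items())


# Hypothetical descriptor values for an untested analogue.
analogue = {"R_M": 0.5, "sigma_R": -0.2, "B1": 1.5, "B4": 3.0}
log_sf = predict_log_sf(analogue)
print(round(log_sf, 4), round(10 ** log_sf, 2))
```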
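SESAMOL ETHERS
[Structures: general structures of the sesamol ethers, with ring oxygens, OCH3 groups and a variable OR group]
$\log SF = 0.153\,D_2 + 0.240\,D_1 - 1.711\,\sigma_I - 0.429\,R_M + 0.070\,L - 0.384$
n = 29, s = 0.087, r = 0.938, F = 33.72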
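DILLAPIOLE SIDE CHAIN ANALOGUES
[Structures: general structures of the dillapiole side-chain analogues, with ring oxygens, two OCH3 groups and variable OR1 / CH3OR side chains]
$\log SF = 0.467 - 0.105\,D - 1.537\,R_M^2 - 0.980\,\sigma_R$  (n = 17, s = 0.046, r = 0.948, F = 38.84)
$\log SF = 0.305 - 0.110\,D - 1.114\,R_M^2 - 1.626\,\sigma_R + 0.012\,B_4$  (n = 17, s = 0.045, r = 0.955, F = 31.37)
$\log SF = 0.071 - 0.120\,D - 0.619\,R_M^2 - 2.066\,\sigma_R + 0.080\,B_4 - 0.003\,L^2$  (n = 17, s = 0.045, r = 0.958, F = 24.86)
$\log SF = 0.053 - 0.134\,D - 0.216\,R_M^2 - 1.290\,\sigma_R + 0.135\,B_4 + 0.006\,L^2 - 0.67\,\sigma_I$  (n = 17, s = 0.046, r = 0.961, F = 20.30)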
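The statistics reported with each QSAR equation are linked. For n compounds and k descriptors, F = (r²/k) / ((1 - r²)/(n - k - 1)), so a quick check of this relation is a useful sanity test when reading QSAR tables. The Python sketch below applies it to three equations in this section. For the sesamol ethers (n = 29, k = 5, r = 0.938) it gives F ≈ 33.7, close to the reported 33.72. The small gaps arise because r is quoted to only three decimals.

```python
def f_from_r(r: float, n: int, k: int) -> float:
    """Overall F statistic of a multiple linear regression from its correlation
    coefficient r, number of observations n and number of descriptors k."""
    r2 = r * r
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# (label, r, n, k, reported F) for equations quoted in this section
checks = [
    ("furapiole analogues", 0.950, 14, 4, 21.04),
    ("sesamol ethers", 0.938, 29, 5, 33.72),
    ("dillapiole analogues, 3 descriptors", 0.948, 17, 3, 38.84),
]
for label, r, n, k, reported in checks:
    print(f"{label}: F = {f_from_r(r, n, k):.2f} (reported {reported})")
```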
DESIGN OF CHEMICAL HYBRIDISING AGENTS
Hybrid technology: pollination control system (male sterility)
• Three-line: cytoplasmic genetic male sterility
• Two-line: chemical hybridising agents
QSAR Equations for Ethyl Oxanilates (n = 27; p values in %)
1. Ms = 49.99 Fp - 2.39 ΣMR + 64.73 (r = 0.70, s = 14.10, F = 11.75, p = 0.00)
2. Ms = 39.73 Fp - 3.24 ΣMR + 0.32 MW - 0.91 (r = 0.76, s = 13.20, F = 10.43, p = 0.02)
3. Ms = 43.74 Fp - 3.04 ΣMR + 0.36 MW - 5.63 D - 0.71 (r = 0.81, s = 12.10, F = 10.41, p = 0.01)
4. Ms = 44.61 Fp - 2.93 ΣMR + 0.65 MW - 5.78 D + 8.02 ΣEs - 56.94 (r = 0.86, s = 10.80, F = 12.05, p = 0.00)
5. Ms = 35.56 Fp - 2.96 ΣMR + 0.85 MW - 4.94 D + 10.36 ΣEs - 10.00 Σπ - 96.48 (r = 0.90, s = 9.37, F = 14.49, p = 0.00)
6. Ms = 39.59 Fp - 2.86 ΣMR + 0.67 MW - 5.11 D - 2.57 ΣR - 16.91 Σπ - 64.28 (r = 0.91, s = 3.73, F = 16.99)
J. Agric. Food Chem. 2003, 51, 992-998
[Structures: general ethyl oxanilate structure with ring substituent X (F, Br, CF3, CN), and the agrophore group]
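QSAR equations such as those for the ethyl oxanilates are obtained by multiple linear regression (least squares). The Python sketch below fits y = b0 + b1·x1 + ... + bk·xk by solving the normal equations with Gaussian elimination, then reports r, s and F in the usual way. The descriptor and activity values are synthetic, not taken from the cited study. A real analysis would use a numerical library and check for collinear descriptors.

```python
import math


def solve(a, b):
    """Solve the square system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for c in range(col, n + 1):
                m[i][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def mlr(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk; return (coefficients, r, s, F)."""
    rows = [[1.0] + list(x) for x in X]  # prepend a column of 1s for the intercept
    p = len(rows[0])                     # k descriptors + 1
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in rows]
    n, k = len(y), p - 1
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r = math.sqrt(1.0 - ss_res / ss_tot)  # multiple correlation coefficient
    s = math.sqrt(ss_res / (n - k - 1))   # standard error of estimate
    F = ((ss_tot - ss_res) / k) / (ss_res / (n - k - 1)) if ss_res > 0 else math.inf
    return coef, r, s, F


# Synthetic example: two descriptors, six compounds, small added noise.
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5), (6, 2)]
y = [2.1, 4.4, 5.05, 7.5, 8.45, 12.02]
coef, r, s, F = mlr(X, y)
print([round(c, 3) for c in coef], round(r, 3), round(s, 3), round(F, 2))
```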
QSAR Equations for 2-Pyridone Analogues (p values in %)
1. Ms = -2.34 MR + 64.45 Fp - 3.70 Rp - 4.69 D + 71.49 (n = 26, s = 9.34, r = 0.85, F = 14.13, p = 0.00)
2. Ms = -3.21 MR + 57.18 Fp - 3.77 Rp - 5.38 D + 93.98 ln MW - 459.06 (n = 26, s = 7.77, r = 0.91, F = 18.38, p = 0.00)
3. Ms = -3.32 MR + 47.47 Fp - 1.74 Rp - 5.10 D + 173.39 ln MW + 7.05 Es - 904.54 (n = 26)
4. Ms = -3.43 MR + 38.60 Fp - 4.79 D + 210.64 ln MW + 10.42 Es - 1113.06
5. Ms = -3.00 MR + 49.50 Fp - 7.87 D + 211.67 ln MW + 12.19 Es - 6.87 Es(m) - 1117.35 (s = 6.37, r = 0.94, F = 22.94)
Statistics for equations 3 and 4 (not assignable from the slide): s = 7.01 and 7.24; r = 0.93 and 0.9226; F = 19.74 and 21.82 (p = 0.00 for both)
J. Agric. Food Chem. 2005, 53, 3468-3475
[Structures: general structures of the analogues, with variable substituent X]
QSAR Equations for N-Acylanilines
1. Ms = 62.76 FP - 1.66 R - 6.39 D + 43.38 (n = 29, r = 0.81, r² = 0.65, s = 10.79, F = 15.65, p = 0.0000)
2. Ms = 67.54 FP - 1.67 R - 6.59 D + 0.13 P + 15.37 (n = 29, r = 0.86, r² = 0.74, s = 9.55, F = 16.97, p = 0.0000)
J. Agric. Food Chem. 2005, 53, 5959-5968
Strategy of identifying new targets
Model organisms for target identification
Test systems for targets in UHTBS
UHTVS - Automated evaluation of activity of compounds
The virtual discovery cycle
Unique research platform – Network of complementary technologies to meet the challenges in compound discovery
Discovery of the target proteins of novel fungicides
De novo target discovery by functional genomics and the steps aiming to develop and perform high-throughput biochemical tests
Gene expression profiling, a revolutionary tool in herbicide discovery
Gene Expression Profiling (GEP) with DNA microarrays (chips) is a new technology used to measure changes in the entire transcriptome, i.e. the full complement of active genes, of an organism in a single experiment.
A catalogue of genetic fingerprints of the plant Arabidopsis thaliana is created; each fingerprint is characteristic of a single herbicidal mode of action (MoA) and is then used to rapidly classify herbicidal compounds from UHTVS according to their MoA.
GEP helps to identify the affected metabolic pathway and the MoA of pro-drugs, which cannot be elucidated by conventional biochemical methods.
GEP provides insight into the interactions of any herbicidal compound with the entire plant metabolism with unprecedented accuracy and completeness.
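One simple way to use such a fingerprint catalogue is nearest-reference matching. The expression profile of a new compound is compared with each reference fingerprint, and the compound is assigned the MoA whose fingerprint correlates best. The Python sketch below does this with Pearson correlation over the same ordered set of genes. The gene count, profile values and MoA labels are hypothetical; real GEP analyses work with thousands of genes and more careful statistics.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def classify_moa(profile, catalogue):
    """Return (best MoA, correlation) for the most similar reference fingerprint."""
    scores = {moa: pearson(profile, ref) for moa, ref in catalogue.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical log2 expression changes for six genes under reference herbicides.
catalogue = {
    "MoA A": [2.1, -0.5, 0.0, 1.8, -1.2, 0.3],
    "MoA B": [-0.4, 1.9, 1.5, -0.2, 0.1, -1.0],
    "MoA C": [0.2, 0.1, -1.7, 0.4, 1.6, 1.1],
}
new_compound = [1.9, -0.3, 0.2, 1.5, -1.0, 0.1]
print(classify_moa(new_compound, catalogue))
```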
The principle of Gene Expression Profiling.