An Introduction to Chemoinformatics for the postgraduate students of Agriculture

Chemoinformatics and Applications in Agrochemical Discovery

C. Devakumar and Rajesh Kumar

Division of Agricultural Chemicals, IARI, New Delhi-110012

[email protected]

Chemical Space

            Stars        Small Molecules
Existing    10^22        10^7
Virtual     0            10^60 (?)
Mode        Real         Virtual
Access      Difficult    "Easy"

Chemical Space: Small Molecules in Organic Chemistry

Understanding chemical space

Small molecules:

• chemical synthesis
• drug design
• chemical genomics
• systems biology
• nanotechnology
• and others

Cost to develop and time to market of various products

Registration of safer chemicals

The proportion of pesticide active ingredients considered to be safer (biological chemicals and reduced-risk conventional chemicals) has increased steadily over the last several years.

Source: EPA, 1999.

Plant biotechnology opens new markets / solutions

The development of the agrochemical in vivo screening

Overall Outline

1. Introduction
2. Molecular Representations
3. Chemical Data and Databases
4. Molecular Similarity
5. Chemical Reactions
6. Machine Learning and Other Predictive Methods
7. Molecular Docking and Drug Discovery

What is Chemoinformatics?

• It encompasses the design, creation, organisation, management, retrieval, analysis, dissemination, visualization and use of chemical information

• It is the mixing of information resources to transform data into information and information into knowledge, for the intended purpose of making better decisions faster in the arena of drug lead identification and optimization

What is Chemoinformatics?

• “the set of computer algorithms and tools to store and analyse chemical data in the context of drug discovery and design projects”

• Chemoinformatics is the application of informatics methods to solve chemical problems

Resources

Books:

J. Gasteiger and T. Engel (Editors) (2003). Chemoinformatics: A Textbook. Wiley.

A.R. Leach and V. J. Gillet (2005). An Introduction to Chemoinformatics. Springer.

Journal:

Journal of Chemical Information and Modeling

Web:

http://cdb.ics.uci.edu

and many more………

History of Chemoinformatics

The first, and still the core, journal for the subject, the Journal of Chemical Documentation, started in 1961 (the name changed to the Journal of Chemical Information and Computer Sciences in 1975).

The first book appeared in 1971 (Lynch, Harrison, Town and Ash, Computer Handling of Chemical Structure Information).

The first international conference on the subject was held in 1973 at Noordwijkerhout, and it has been held every three years since 1987.

Chemoinformatics...

"Cheminformatics combines the scientific working fields of chemistry and computer science, for example in the area of chemical graph theory and mining the chemical space. It is to be expected that the chemical space contains at least 10^62 molecules."

Is it Cheminformatics or Chemoinformatics?

Year    Cheminformatics    Chemoinformatics    Ratio
2000    39                 684                 0.05
2001    8,010              2,910               2.75
2002    34,000             16,000              2.12
2003    58,143             32,872              1.77
2004    85,435             60,439              1.41
2005    6,58,298           2,72,096            2.41
2006    3,17,000+          1,63,000+           1.94

• Other names in use: cheminformatics, molecular informatics, chemical informatics, or even chemobioinformatics

Why We Need Chemoinformatics?

1) An enormous amount of data, and the need to maintain it

2) Can we gain enough knowledge from the known data to make predictions for those cases where the required information is not available?

3) Relationships between the structure of a compound and its biological activity, or the influence of reaction conditions on chemical reactivity

Advances in theoretical and computational chemistry now allow chemists to model chemical compounds “in silico” with ever-increasing accuracy.

Molecular properties now becoming accessible through computation include molecular shape, electronic structure, physical properties, chemical reactivity, protein folding, structures of materials and surfaces, catalytic activity, and biochemical activities.

Chemoinformatics integrates a comprehensive knowledge of chemistry with an extensive understanding of information technology.

The intersection of chemistry and information technology embraces an expanding territory: computational modeling of individual molecules, thermodynamic methods of estimating chemical properties, methods of predicting biological activity of hypothetical compounds, and organization and classification of chemical information.

Schematic representation of a crowded cell. An array of different molecules can function independently under extremely crowded conditions, partly because of judicious distributions of oppositely charged polar groups on the molecular surfaces. However, such systems are in some ways extremely fragile. For example, a mutation that alters just one amino acid in the haemoglobin molecule can stimulate massive aggregation and give rise to a fatal genetic disease, sickle-cell anaemia. More generally, many disorders of old age, most famously Alzheimer's disease, result from the increasingly facile conversion of normally soluble proteins into intractable deposits as we get older. Many of these aggregation processes involve the reversion of the unique biologically active forms of polypeptide chains into a generic and non-functional 'chemical' form.

Additional computational challenges lie in indexing and classifying the infinite population of chemical compounds that could be synthesized or are already known.

Specific indexing and search problems include:

• how to find a compound that might block a specific biological target;
• how to predict the most efficient synthetic strategy for a desired compound from available precursors;
• how to employ results of bioactivity tests on a family of molecules to design improved versions.

Currently combinatorial chemists are developing new methods of synthesizing libraries of related compounds on an unprecedented scale.

Such libraries can be used to produce huge arrays of materials for investigation of biochemical, catalytic, or material properties.

Systems are required to design, catalog, and search these libraries, assess test results in a meaningful way, and integrate new information with existing chemical databases.

Investigations into information storage at the molecular level are underway, bringing to full circle the link between chemistry and information technology.


The Scope of Chemoinformatics

Representations and Structure Searching

Substructure Searching

Similarity Searching, Clustering, and Diversity Analysis

Searching Databases

Computer-aided Structure Elucidation

3D Substructure Searching

QSAR and Docking

Structure and applications of chemoinformatics

• Database design and programming
• Representation and searching of chemical structures
• Structure, substructure & similarity searching in 2D & 3D
• Markush and reaction searching
• Representation and searching of biological databases
• Chemoinformatics software
• Data analysis techniques: clustering; evolutionary algorithms; graph theory; neural networks
• Chemical information sources
• Cheminformatics applications
• Techniques used to design bioactive compounds
• Molecular simulation and design
• Drug discovery process; QSAR; Combi-chem; SBDD
• Spectroscopy and crystallography in cheminformatics

Kinds of chemistry databases

• Small-molecule databases

– Databases of commercially-available compounds (e.g. ACD, http://www.mdl.com/products/experiment/available_chem_dir/index.jsp)

– Proprietary chemical structure databases

– Literature databases

– Patent databases

– Small project-specific databases

• Protein databases

– Public, online databases (e.g. PDB, http://www.pdb.org)

– Proprietary and project-specific databases

Software Companies

• Accelrys - Large chemoinformatics company
• ACD/Labs - analytical informatics & predictions
• BCI - 2D fingerprinting, clustering toolkits & software
• Bioreason - HTS data analysis software
• Cambridgesoft - 2D drawing tools & E-notebooks
• CAS - produce Scifinder Scholar searching software
• ChemAxon - Java based toolkits and software
• Daylight - 2D representation & searching software
• Leadscope - 2D structure and property tools
• Lion Bioscience - produce LeadNavigator
• MDL - Large chemoinformatics company
• Openeye - Fast 3D docking, structure generation, toolkits
• Quantum Pharmaceuticals - prediction, docking, screening
• Sage Informatics - ChemTK 2D analysis software
• Tripos - Large chemoinformatics company

Journals & Magazines

• Journal of Chemical Information and Computer Sciences
• Journal of Computer-Aided Molecular Design
• Journal of Molecular Graphics and Modelling
• Journal of Medicinal Chemistry
• NetSci (online journal)
• Scientific Computing World
• Bio-IT World
• Drug Discovery Today

Newsletters, Mailing Lists & Other Hubs

• Chemical Informatics Letters - Monthly newsletter
• CHMINF-L (Indiana) - Email discussion list
• Chemoinf Yahoo Group - Email discussion list
• Chemistry Software Yahoo Group
• Cheminformatics.org - Lots of links and QSAR datasets
• Reactive Reports - Chemistry Web Magazine

SMILES (Simplified Molecular Input Line Entry Specification)

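[Figure: structure of acetaminophen with its atoms numbered 1 to 11]

Acetaminophen

SMILES representation: c1c(O)ccc(NC(=O)C)c1

• Aliphatic atoms: capital letters
• Aromatic atoms: small (lowercase) letters
• Rings: closed by giving matching numbers
• Double bonds: "=" sign
• Parentheses: branching in the molecule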
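The SMILES conventions listed above (capital letters for aliphatic atoms, small letters for aromatic atoms, numbers for rings, "=" for double bonds, parentheses for branches) can be illustrated with a small tokenizer. This is only a minimal sketch, assuming a simplified subset of SMILES (no brackets, charges or stereochemistry); the function name `summarize_smiles` is hypothetical and is not part of any toolkit named in these slides.

```python
import re
from collections import Counter

# Simplified token pattern (an assumption, not the full SMILES grammar).
# Two-letter symbols (Cl, Br) are listed first so they are not split.
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[cnops]|[=#]|[()]|\d")


def summarize_smiles(smiles):
    """Count the basic SMILES features in a string of the simplified subset."""
    counts = Counter()
    for tok in TOKEN.findall(smiles):
        if tok in ("Cl", "Br") or tok.isupper():
            counts["aliphatic_atoms"] += 1      # capital letter
        elif tok.islower():
            counts["aromatic_atoms"] += 1       # small letter
        elif tok.isdigit():
            counts["ring_closure_digits"] += 1  # two digits per ring
        elif tok == "=":
            counts["double_bonds"] += 1
        elif tok == "(":
            counts["branches"] += 1
    return dict(counts)


# Acetaminophen, using the SMILES string from the slide.
result = summarize_smiles("c1c(O)ccc(NC(=O)C)c1")
print(result)
```

For acetaminophen the sketch finds six aromatic and five aliphatic atoms, which matches the 11 numbered heavy atoms in the figure. Characters outside the simplified pattern are silently ignored, so real work should rely on a proper chemistry toolkit.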

Sources of 3D structure information

• X-ray crystallography
• NMR spectroscopy

DRAWING AND DEPICTING 2D STRUCTURES

Web-based drawing tools

JME (http://www.molinspiration.com/cgi-bin/properties) is a clean, simple Java drawing tool. Draw your structure and click on the smiley face to show the SMILES.

Marvin Sketch is a Java applet that allows you to draw structures, and export them as SMILES, MDL MOL files or others.

Web-based depiction tools

Daylight Depiction Tool (http://www.daylight.com/daycgi/depict) is a very simple tool: you enter a SMILES string and it produces a 2D structure diagram from it.

CACTVS GIF generator has a more complex interface, but offers many more options for producing GIF picture files from SMILES or other structure formats. The quality of the images is superior to that of the Daylight tool.

MDL Chime (http://www.mdlchime.com) is a browser-based plugin that can display both 2D and interactive 3D structures in web pages.

2D searching with Oracle chemistry cartridges

• Daylight DayCart – http://www.daylight.com/products/daycart.html

• Tripos Auspyx – http://www.tripos.com/sciTech/inSilicoDisc/chemInfo/auspyx.html

• Accelrys Accord for Oracle – http://www.accelrys.com/accord/oracle.html

• MDL Relational Chemistry Server – http://www.mdl.com/products/isisdirect.html

• IDBS ActivityBase – http://www.id-bs.com/products/abase/

• Chemaxon JChem Cartridge – http://www.jchem.com

3D Structure generation and minimization

Concord from Tripos, Inc. is one of the first 3D structure generation programs, and is still being refined and developed. It generates single, minimal-energy structures from input 2D structures. The program can input and output a variety of file formats. http://www.tripos.com/sciTech/inSilicoDisc/chemInfo/concord.html

Corina from the Gasteiger group. It is similar to Concord. http://www2.chemie.uni-erlangen.de/software/corina/free_struct.html

Omega from OpenEye is the latest release. It offers very fast generation of multiple low-energy conformers. http://www.eyesopen.com/products/applications/omega.html

Depiction Tools for 3D structures

MDL Chime is a web browser plug-in that allows 2D and 3D structures to be viewed in web pages. It can be used to visualize both proteins and small molecules, and includes some limited ability to create molecular surfaces. It is excellent for communicating structures via the web and for use in writing web-based chemoinformatics software. http://www.mdlchime.com

ArgusLab is a free molecular modeling program that has a fairly extensive set of options for 3D visualization, calculation of surfaces and properties, minimization, and molecular docking. http://www.arguslab.com.

Data Analysis Methods

Unsupervised: artificial neural networks, genetic algorithms

Supervised: inductive learning methods, statistics, pattern recognition methods

Methods for Calculating Physical and Chemical Data

• Quantum mechanical calculations

• Additive schemes

Chemistry Based Data Mining And Exploration

Chemical(s) of concern

Chemical-specific data

Structural analogue

Property analogue

Biological or mechanistic analogue

Databases

Data mining

Structure searchable

Structure activity relationships

Chemometrics

• Quantitative analysis of chemical data initially relied exclusively on multilinear regression analysis.

• Artificial neural networks

An artificial neural network (ANN) or commonly just neural network (NN) is an interconnected group of artificial neurons that uses a mathematical model or computational model for information processing based on a connectionist approach to computation.

[Figure: neural network with an input layer, two hidden layers and an output layer]
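The layered network described above (input, hidden, hidden, output) can be sketched as a plain forward pass. This is an illustrative sketch only: the weights, biases, layer sizes and descriptor values are hypothetical, and the sigmoid activation is an assumption rather than something stated on the slide.

```python
import math


def sigmoid(x):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One fully connected layer: each neuron sums all weighted inputs."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, network):
    """Pass the inputs through every layer of the network in turn."""
    activations = inputs
    for weights, biases in network:
        activations = layer(activations, weights, biases)
    return activations


# Hypothetical network: 3 inputs -> 2 hidden -> 2 hidden -> 1 output.
network = [
    ([[0.2, -0.4, 0.1], [0.5, 0.3, -0.2]], [0.0, 0.1]),
    ([[0.7, -0.3], [-0.6, 0.8]], [0.05, -0.05]),
    ([[1.0, -1.0]], [0.0]),
]

descriptors = [0.5, 1.2, -0.7]  # hypothetical molecular descriptors
prediction = forward(descriptors, network)
print(prediction)
```

In practice the weights are not chosen by hand but adjusted during training so that the output reproduces known property or activity values.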


Computer-Assisted Structure Elucidation (CASE)

• A field of exercise for artificial intelligence techniques.

• The DENDRAL project was initiated in 1964 at Stanford University.

Computer-Assisted Synthesis Design (CASD)

• In 1969 Corey and Wipke worked on the development of a synthesis design system.

1. Substructure searching

2. Similarity searching
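Similarity searching can be illustrated with a short ranking sketch. The slides do not say which similarity measure is meant; the Tanimoto coefficient on fingerprint bits is assumed here because it is a common choice, and the fingerprints and compound names below are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared 'on' bits divided by all 'on' bits."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_by_similarity(query_fp, library):
    """Return (name, score) pairs, most similar compound first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints: each set holds the positions of the 'on' bits.
library = {
    "compound_A": {1, 4, 7, 9},
    "compound_B": {1, 4, 8},
    "compound_C": {2, 3, 5},
}
query = {1, 4, 7}

ranking = rank_by_similarity(query, library)
for name, score in ranking:
    print(name, round(score, 2))
```

Substructure searching differs in kind: it asks whether a query fragment is contained in a molecule (a graph-matching problem), whereas the ranking above only compares overall feature overlap.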


Applications of Chemoinformatics

1. Chemical Information

Storage and retrieval of chemical structures and associated data to manage the flood of data

Dissemination of data on the internet

Cross-linking of data to information

2. All fields of chemistry

• Prediction of the physical, chemical, or biological properties of compounds

3. Bioactive molecules

• Identification of new lead structures
• Optimization of lead structures
• Establishment of quantitative structure-activity relationships
• Comparison of chemical libraries
• Definition and analysis of structural diversity
• Planning of chemical libraries
• Analysis of high-throughput data
• Docking of a ligand into a receptor
• Prediction of the metabolism of xenobiotics
• Analysis of biochemical pathways

4. Organic Chemistry

• Prediction of the course and products of organic reactions
• Design of organic synthesis

5. Analytical Chemistry

• Analysis of data from analytical chemistry to make predictions on the quality, origin, and age of the investigated objects
• Elucidation of the structure of a compound based on spectroscopic data

Teaching Chemoinformatics

Chemists have to become more efficient in planning their experiments and have to extract more knowledge from their data.

Toxicity Prediction for chemical Q

Global toxicity model

Supporting information

Toxicity prediction

Hypothesis generation

Analogue search

Chemical class assignment

Class-based SAR model

Weight of evidence of toxicity presentation

Data collection

Institutes Offering Courses on Chemoinformatics

• University of Barcelona, Spain
• University of Erlangen-Nürnberg, Germany
• Bioinformatics Institute of India, Chandigarh
• Georgia Institute of Technology
• University of Sheffield (Willett) - MSc/PhD programs
• University of Erlangen (Gasteiger)
• UCSF (Kuntz)
• University of Texas (Pearlman)
• Yale (Jorgensen)
• University of Michigan (Crippen)
• Indiana University (Wiggins) - MSc program
• Cambridge Unilever (Glen, Goodman, Murray-Rust)
• Scripps - Molecular Graphics lab

SAR Application

DRUG DESIGN                     ENVIRONMENTAL PROTECTION
Maximum activity                Prediction of toxicity
Minimize toxicity
• Single therapeutic target     • Multiple unknown targets
• Drug-like chemical            • Diverse structures
• Some toxicity anticipated     • Human and ecosystems

QSAR STUDIES

DESIGN OF INSECTICIDE SYNERGISTS

FURAPIOLE ANALOGUES

[Figure: general structure of the furapiole analogues, with OCH3 and OR substituents]

log SF = 0.319 RM + 0.445 σR + 0.248 B1 + 0.034 B4 - 0.966

n = 14, s = 0.057, r = 0.950, F = 21.04
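A QSAR equation such as the furapiole one above is used by substituting the descriptor values of a candidate compound. The sketch below only illustrates that arithmetic: the coefficients are taken from the slide, while the descriptor values for the candidate are hypothetical and the function name `predict_log_sf` is an assumption made for illustration.

```python
# Coefficients of the furapiole analogue equation, as given on the slide.
FURAPIOLE_COEFFICIENTS = {"RM": 0.319, "sigma_R": 0.445, "B1": 0.248, "B4": 0.034}
FURAPIOLE_INTERCEPT = -0.966


def predict_log_sf(descriptors,
                   coefficients=FURAPIOLE_COEFFICIENTS,
                   intercept=FURAPIOLE_INTERCEPT):
    """Evaluate a linear QSAR equation: intercept + sum(coefficient * descriptor)."""
    return intercept + sum(
        coefficients[name] * descriptors[name] for name in coefficients
    )


# Hypothetical descriptor values for an untested substituent R.
candidate = {"RM": 1.0, "sigma_R": 0.0, "B1": 2.0, "B4": 3.0}

log_sf = predict_log_sf(candidate)
print(round(log_sf, 3))
```

Predictions of this kind are only meaningful for substituents whose descriptor values lie within the range covered by the 14 compounds used to fit the equation.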

SESAMOL ETHERS

[Figure: general structures of the sesamol ether series, with OR and OCH3 substituents]

log SF = 0.153 D2 + 0.240 D1 - 1.711 σI - 0.429 RM + 0.070 L - 0.384

n = 29, s = 0.087, r = 0.938, F = 33.72
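The statistics quoted with each equation (n, r and F) are linked, so they can be cross-checked. The sketch below assumes that F is the overall regression F-ratio of a least-squares model with k descriptors plus an intercept, F = (r²/k) / ((1 - r²)/(n - k - 1)); the slides do not state this definition, so treat it as an assumption.

```python
def f_statistic(r, n, k):
    """Overall F-ratio of a regression with k descriptors fitted to n compounds."""
    r2 = r * r
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Furapiole analogues: 4 descriptors, n = 14, r = 0.950 (reported F = 21.04).
f_furapiole = f_statistic(0.950, 14, 4)

# Sesamol ethers: 5 descriptors, n = 29, r = 0.938 (reported F = 33.72).
f_sesamol = f_statistic(0.938, 29, 5)

print(round(f_furapiole, 2), round(f_sesamol, 2))
```

Both values come out close to the reported ones (about 20.8 and 33.7); the small differences are consistent with r having been rounded to three decimals.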

DILLAPIOLE SIDE CHAIN ANALOGUES

[Figure: general structures of the dillapiole side chain analogues, with OCH3, OR and OR1 substituents]

log SF =                                                                      n    s      r      F
0.467 - 0.105 D - 1.537 RM^2 - 0.980 σR                                       17   0.046  0.948  38.84
0.305 - 0.110 D - 1.114 RM^2 - 1.626 σR + 0.012 B4                            17   0.045  0.955  31.37
0.071 - 0.120 D - 0.619 RM^2 - 2.066 σR + 0.080 B4 - 0.003 L^2                17   0.045  0.958  24.86
0.053 - 0.134 D - 0.216 RM^2 - 1.290 σR + 0.135 B4 + 0.006 L^2 - 0.67 σI      17   0.046  0.961  20.30

DESIGN OF CHEMICAL HYBRIDISING AGENTS

Hybrid Technology

Pollination Control system

Male sterility

• Three-line: Cytoplasmic Genetic Male Sterility
• Two-line: Chemical Hybridising Agents

QSAR Equations for Ethyl Oxanilates

Sl.No  Equation (Ms = )                                                          r     s      F (p)*
1      49.99 Fp - 2.39 ΣMR + 64.73                                               0.7   14.10  11.75 (0.00)
2      39.73 Fp - 3.24 ΣMR + 0.32 MW - 0.91                                      0.76  13.20  10.43 (0.02)
3      43.74 Fp - 3.04 ΣMR + 0.36 MW - 5.63 D - 0.71                             0.81  12.10  10.41 (0.01)
4      44.61 Fp - 2.93 ΣMR + 0.65 MW - 5.78 D + 8.02 ΣEs - 56.94                 0.86  10.80  12.05 (0.00)
5      35.56 Fp - 2.96 ΣMR + 0.85 MW - 4.94 D + 10.36 ΣEs - 10.00 Σπ - 96.48     0.90  9.37   14.49 (0.00)
       39.59 Fp - 2.86 ΣMR + 0.67 MW - 5.11 D - 2.57 ΣR - 16.91 Σπ - 64.28       0.91  3.73   16.99

* p values (%)

n = 27

J. Agri. Food Chem. 2003, 51, 992-998

[Figure: general structures of the ethyl oxanilates with ring substituent X (F, Br, CF3, CN), and the agrophore group]

QSAR equations for 2-pyridones analogues

Equations (Ms =)                                                                  n    s     r     F (p %)
-2.34 MR + 64.45 Fp - 3.70 Rp - 4.69 D + 71.49                                    26   9.34  0.85  14.13 (0.00)
-3.21 MR + 57.18 Fp - 3.77 Rp - 5.38 D + 93.98 lnMw - 459.06                      26   7.77  0.91  18.38 (0.00)
-3.32 MR + 47.47 Fp - 1.74 Rp - 5.10 D + 173.39 lnMw + 7.05 Es - 904.54           26   7.01  0.93  19.74 (0.00)
-3.43 MR + 38.60 Fp - 4.79 D + 210.64 lnMw + 10.42 Es - 1113.06                   26   7.24  0.92  21.82 (0.00)
-3.00 MR + 49.50 Fp - 7.87 D + 211.67 lnMw + 12.19 Es - 6.87 Es(m) - 1117.35           6.37  0.94  22.94

J. Agri. Food Chem. 2005, 53, 3468-3475

[Figure: general structures of the 2-pyridone analogues with ring substituent X]

QSAR equations for N-acylanilines

Equations (Ms =)                               n    r     r2    s      F (Probability)
62.76 FP - 1.66 R - 6.39 D + 43.38             29   0.81  0.65  10.79  15.65 (0.0000)
67.54 FP - 1.67 R - 6.59 D + 0.13 P + 15.37    29   0.86  0.74  9.55   16.97 (0.0000)
67.54 FP - 1.67 R - 6.59 D + 0.13 P + 15.37         0.86        9.56   16.97

J. Agri. Food Chem. 2005, 53, 5959-5968

Strategy of identifying new targets

Model organisms for target identification

Test systems for targets in UHTBS

UHTVS - Automated evaluation of activity of compounds

The virtual discovery cycle

Unique research platform – Network of complementary technologies to meet the challenges in compound discovery

Discovery of the target proteins of novel fungicides

De novo target discovery by functional genomics and the steps aiming to develop and perform high-throughput biochemical tests

Gene expression profiling, a revolutionary tool in herbicide discovery

Gene Expression Profiling (GEP) with DNA microarrays (chips) is a new technology used to measure changes in the entire transcriptome, i.e. full complement of active genes, of an organism in a single experiment.

A catalogue of genetic fingerprints of the plant Arabidopsis thaliana is created, each fingerprint being characteristic of a single herbicidal mode of action (MoA). The catalogue is then used to rapidly classify herbicidal compounds from UHTVS according to their MoA.

GEP helps to identify the affected metabolic pathway and the MoA of pro-drugs, which cannot be elucidated by conventional biochemical methods.

GEP provides insight into the interactions of any herbicidal compound with the entire plant metabolism with unprecedented accuracy and completeness.
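The fingerprint-based classification described above can be sketched as a nearest-match search: the expression profile of a new compound is compared with every catalogued fingerprint and assigned the MoA it correlates with best. The slides do not say how the comparison is actually done; Pearson correlation, the MoA labels and all expression values below are assumptions made purely for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def classify_moa(profile, catalogue):
    """Return the catalogued MoA whose fingerprint correlates best."""
    return max(catalogue, key=lambda moa: pearson(profile, catalogue[moa]))


# Hypothetical fingerprints: expression changes of four genes per MoA.
catalogue = {
    "MoA_1": [2.0, -1.0, 0.0, 1.5],
    "MoA_2": [-1.5, 2.0, 1.0, -0.5],
}

unknown_compound = [1.8, -0.7, 0.2, 1.1]  # hypothetical measured profile
assigned = classify_moa(unknown_compound, catalogue)
print(assigned)
```

A real catalogue would cover thousands of genes per fingerprint, and a profile that correlates poorly with every entry could point to a new MoA.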


The principle of Gene Expression Profiling.
