
Introduction to Chemoinformatics

Alexandre Varnek

Cheminformatics (also known as chemoinformatics and chemical informatics) is the use of computer and informational techniques, applied to a range of problems in the field of chemistry.

These in silico techniques are used in pharmaceutical companies in the process of drug discovery.

In the U.S., recent NIH emphasis has been placed on developing public domain Cheminformatics research by creating six Exploratory Centers for Cheminformatics Research (ECCRs) as part of the NIH Molecular Libraries Initiative.

Definition (Wikipedia)

Major applications of Chemoinformatics

[Diagram: CHEMOINFORMATICS in drug discovery. Labels: Target Protein; Large libraries of molecules; experimental route: High Throughput Screening; computational route: Virtual Screening (Filtering, QSAR, Docking); Small Library of selected hits; Hit]

• In silico design
• Storage and Search of chemical information
• Structure-Property Modeling

Chemoinformatics ‐ Why?

• amount of information
many millions of compounds and reactions
many millions of publications

Chemical Databases

Storage, organization and search of experimental data

Problem: Flood of Information

• > 47 million compounds

• 5-7 million new compounds / year

• 800,000 publications / year

[Chart: number of structures (axis from 0 to 30 000 000) versus year (1965 to 2000)]

=> can anyone read 4,000 publications / day?

Problem: Not Enough Information

• > 47,000,000 chemical compounds

• ~ 500,000 3D structures in the Cambridge Crystallographic File

we have 3D structures for only about 1 % of all compounds
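
The ratios behind the two "Problem" slides can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted on the slides; the figure of 200 reading days per year is an assumption introduced here for comparison, not a number from the lecture.

```python
# Counts quoted on the slides.
COMPOUNDS = 47_000_000
STRUCTURES_3D = 500_000
PUBLICATIONS_PER_YEAR = 800_000


def per_day(per_year: float, days_per_year: int) -> float:
    """Average daily rate for a given number of days per year."""
    return per_year / days_per_year


def percent(part: float, whole: float) -> float:
    """Share of `whole` represented by `part`, in percent."""
    return 100.0 * part / whole


if __name__ == "__main__":
    # 365 calendar days versus an assumed 200 reading days per year.
    print(f"{per_day(PUBLICATIONS_PER_YEAR, 365):.0f} publications per calendar day")
    print(f"{per_day(PUBLICATIONS_PER_YEAR, 200):.0f} publications per reading day")
    print(f"{percent(STRUCTURES_3D, COMPOUNDS):.2f} % of compounds with a 3D structure")
```

Spread over 365 days the publication rate is roughly 2,200 per day; the 4,000 per day quoted on the slide corresponds to about 200 days per year.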

Chemoinformatics ‐ Why?

• complex relationships
structure - biological activity
chemical reactivity

In silico design of new compounds

Prediction of physical, chemical and biological properties

The most fundamental and lasting objective of synthesis is not production of new compounds but production of properties

George S. Hammond, Norris Award Lecture, 1968

Chemoinformatics ‐ How?

Prediction of physical, chemical and biological properties

Storage, organization and search of experimental data

Encoding molecular structures by descriptors

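Example 1: Hansch Analysis

• Hansch’s descriptors can be broadly classified into three general types:

• Electronic (σ)
• Steric (Es)
• Hydrophobic (logP)

Biological Activity = f (Descriptors) + constant

$\log(1/C) = a(\log P)^2 + b\log P + \rho\sigma + \delta E_s + c$

Here C is the concentration that produces the biological effect; a, b, ρ and δ are fitted coefficients and c is the additive constant.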
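
As a minimal sketch of how the Hansch equation is evaluated, the function below computes log(1/C) from the three descriptors. The coefficient values are hypothetical, chosen only for illustration and not fitted to any data set; the names `hansch_activity`, `optimal_log_p` and `const` (the additive constant) are introduced here.

```python
def hansch_activity(log_p, sigma, e_s, a, b, rho, delta, const):
    """Evaluate log(1/C) = a*(logP)^2 + b*logP + rho*sigma + delta*Es + const."""
    return a * log_p ** 2 + b * log_p + rho * sigma + delta * e_s + const


def optimal_log_p(a, b):
    """logP at the maximum of the parabolic term; exists only when a < 0."""
    if a >= 0:
        raise ValueError("the parabola has a maximum only when a < 0")
    return -b / (2.0 * a)


# Hypothetical coefficients, for illustration only (not fitted to data).
COEFFS = dict(a=-0.5, b=2.0, rho=1.0, delta=0.5, const=3.0)

if __name__ == "__main__":
    # Hypothetical descriptor values for a single compound.
    print(hansch_activity(log_p=2.0, sigma=0.2, e_s=-1.0, **COEFFS))
    print(optimal_log_p(COEFFS["a"], COEFFS["b"]))
```

With a negative coefficient on the squared term, the model predicts an optimal hydrophobicity at logP = -b / (2a), which is why the quadratic term is included.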

Example 2: Lipinski rule of five

Poor absorption or permeation is more likely when:

• There are more than 5 H‐bond donors.

• The molecular weight is over 500.

• The LogP is over 5.

• There are more than 10 H‐bond acceptors.

Molecule is represented by 4 parameters:
- the number of H-bond donor groups;
- the number of H-bond acceptor groups;
- molecular weight;
- logP
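
The four criteria translate directly into a filter over the four parameters. The following is a sketch only: the `Molecule` class and `lipinski_violations` function are names introduced here, and the parameter values are hypothetical rather than measured.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Molecule:
    """The four parameters that represent a molecule in the rule of five."""
    h_bond_donors: int
    h_bond_acceptors: int
    molecular_weight: float
    log_p: float


def lipinski_violations(mol: Molecule) -> List[str]:
    """Return the criteria of the rule of five that the molecule violates."""
    violations = []
    if mol.h_bond_donors > 5:
        violations.append("more than 5 H-bond donors")
    if mol.molecular_weight > 500:
        violations.append("molecular weight over 500")
    if mol.log_p > 5:
        violations.append("logP over 5")
    if mol.h_bond_acceptors > 10:
        violations.append("more than 10 H-bond acceptors")
    return violations


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    small = Molecule(h_bond_donors=1, h_bond_acceptors=4,
                     molecular_weight=180.2, log_p=1.2)
    large = Molecule(h_bond_donors=6, h_bond_acceptors=12,
                     molecular_weight=650.0, log_p=5.6)
    print(lipinski_violations(small))
    print(lipinski_violations(large))
```

Returning the list of violated criteria, rather than a single pass/fail flag, leaves the choice of how many violations to tolerate to the caller.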

Chemoinformatics ‐ definition

Chemoinformatics is a field dealing with molecular objects (graphs, vectors) in multidimensional chemical space

Theoretical chemistry

Quantum Chemistry

Force Field Molecular Modelling

Chemoinformatics


- Molecular model
- Basic concepts
- Major applications
- Learning approaches

Molecular Model

Quantum Chemistry: electrons and nuclei

Force Field Molecular Modelling: atoms and bonds

Chemoinformatics: objects in chemical space (graphs, vectors)
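
To make "graphs" and "vectors" concrete, the sketch below stores two molecules as heavy-atom graphs, maps each graph to a small descriptor vector, and measures the distance between the two points in that descriptor space. The choice of descriptors (element counts plus bond count) and the helper names are assumptions made for illustration; real descriptor sets are far richer.

```python
import math
from collections import Counter

# Heavy-atom graphs: atom labels plus bonds given as pairs of atom indices.
ETHANOL = {"atoms": ["C", "C", "O"], "bonds": [(0, 1), (1, 2)]}
PROPANE = {"atoms": ["C", "C", "C"], "bonds": [(0, 1), (1, 2)]}


def to_vector(mol, elements=("C", "N", "O")):
    """Map a molecular graph to a vector: one count per element, then bond count."""
    counts = Counter(mol["atoms"])
    return [counts[e] for e in elements] + [len(mol["bonds"])]


def distance(u, v):
    """Euclidean distance between two descriptor vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


if __name__ == "__main__":
    v_ethanol = to_vector(ETHANOL)
    v_propane = to_vector(PROPANE)
    print(v_ethanol, v_propane, distance(v_ethanol, v_propane))
```

Each molecule becomes a point in a space with one axis per descriptor, so similarity between molecules can be expressed as a distance between points.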

Learning approach

Quantum Chemistry: deductive >> inductive

Force Field Molecular Modelling: deductive ≅ inductive

Chemoinformatics: deductive << inductive

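Chemoinformatics: From Data to Knowledge

[Diagram: data (from measurement or calculation) → information (data placed in context) → knowledge (obtained by generalization); the two learning directions are labelled "deductive learning" and "inductive learning"]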
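
Inductive learning, in the sense used here, means generalizing from measured data to a model that can predict new cases. A minimal sketch of that idea is an ordinary least-squares line relating one descriptor to one property. The function names and the data points are hypothetical and serve only as illustration; practical structure-property models use many descriptors and more elaborate methods.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new descriptor value."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical training data: descriptor values and measured property values.
    descriptor = [1.0, 2.0, 3.0, 4.0]
    measured = [2.1, 3.9, 6.2, 7.8]
    model = fit_line(descriptor, measured)   # data -> generalization
    print(model, predict(model, 5.0))        # prediction for an unseen case
```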

Which approach is more useful for a theoretical design of compounds possessing desired properties?

Quantum Chemistry

Force Field Modeling

Chemoinformatics

They are complementary

… but Chemoinformatics is the most suitable one for quantitative predictions of properties

Chemoinformatics ‐ definition

• Chemoinformatics is a generic term that encompasses the design, creation, organization, management, retrieval, analysis, dissemination, visualization, and use of chemical information

G. Paris, 1998

• Chemoinformatics is the application of informatics methods to solve chemical problems

J. Gasteiger, 2004

• Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and optimization

F.K. Brown, 1998

• Chemoinformatics is a field dealing with molecular objects (graphs, vectors) in multidimensional chemical space

A. Varnek, 2007

Recommended reading

Chemoinformatics - A Textbook, Johann Gasteiger and Thomas Engel, Wiley-VCH 2003.

Handbook of Chemoinformatics, Johann Gasteiger, Wiley-VCH 2003.

An Introduction to Chemoinformatics, Andrew R. Leach, Valerie J. Gillet, Springer 2007.

Short courses in chemoinformatics, 1 – 5 June 2009

Day 1
Morning: Computer representation of chemical structures (A. Varnek)
Afternoon: Creation and management of chemical databases (G. Marcou, A. Varnek). Tutorials with the ChemAxon software

Day 2
Morning: Molecular Descriptors (A. Varnek)
Afternoon: Force Field approach. Conformational sampling (D. Horvath, A. Varnek). Tutorials with MOE, CODESSA Pro

Day 3
Morning: Pharmacophores (T. Langer, D. Horvath)
Afternoon: Chemical space, similarity/diversity and chemical library design (J. Bajorath). Tutorials with MOE

Day 4
Morning:
Afternoon: Structure-Property modeling (G. Marcou, A. Varnek). Tutorials with ISIDA, CODESSA Pro and WEKA

Day 5
Morning: Docking (E. Kellenberger)
Afternoon: Virtual screening (G. Marcou, D. Horvath, A. Varnek). Tutorials with MOE