Chemical Data and Computer-Aided Drug Discovery Mike Gilson School of Pharmacy [email protected]...
Outline
Overview of drug discovery
Structure-based computational methods: when we know the structure of the targeted protein
Ligand-based computational methods: when we don’t know the protein’s structure
What is a drug?
Small Molecule Drugs
Aspirin
Sildenafil (Viagra)
Glipizide (Glucotrol)
Taxol
Digoxin
Darunavir
Nanoparticles (e.g., packaged small-molecule drugs)
Doxil (liposome package, extended circulation time, milder toxicity)
Abraxane (albumin-packaged taxol)
http://www.doxil.com/about_doxil.html http://www.abraxane.com/professional/nab-technology.aspx
Biopharmaceuticals
Erythropoietin (EPO): stabilized variant of a natural protein hormone
Etanercept (Enbrel): protein with TNF receptor + Ab Fc domain; scavenges TNF, diminishes inflammation
http://www.ganfyd.org/index.php?title=Erythropoietin_beta http://en.wikipedia.org/wiki/File:Enbrel.jpg
How are drugs discovered?
Natural Products
Digoxin (foxglove)
Aspirin (willow)
Taxol (Pacific yew)
How Aspirin Works
inflammation
platelet activation
Aspirin
platelet inactivation
Biomolecular Pathways and Target Selection
E.g., signaling pathways
http://www.isys.uni-stuttgart.de/forschung/sysbio/insulin/index.html
Target protein
Empirical Path to Ligand Discovery
Compound library (commercial, in-house, synthetic, natural)
High-throughput screening (HTS)
Hit confirmation
Lead compounds (e.g., µM Kd)
Lead optimization (medicinal chemistry)
Potent drug candidates (nM Kd)
Animal and clinical evaluation
Compound Libraries
Commercial (also in-house pharma)
Government (NIH)
Academia
Computer-Aided Ligand Design
Aims to reduce number of compounds synthesized and assayed
Lower costs
Less chemical waste
Faster progress
HIV Protease/KNI-272 complex
Scenario 1
Structure of Targeted Protein Known: Structure-Based Drug Discovery
Protein-Ligand Docking
Structure-Based Ligand Design
Potential function: energy as a function of structure
Energy terms: VDW, dihedral, screened Coulombic
Docking software: search for the structure of lowest energy
Energy Determines Probability (Stability)
Boltzmann distribution: $p(x) \propto e^{-E(x)/RT}$
[Figure: energy and probability plotted against structure x]
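A minimal numerical sketch of the Boltzmann relation, assuming energies in kcal/mol and room temperature. The pose energies are hypothetical values chosen for illustration, not data from the slides.

```python
import math

R_KCAL = 0.0019872  # gas constant, kcal/(mol*K)

def boltzmann_probabilities(energies, temperature=298.15):
    """Normalized Boltzmann probabilities for energies given in kcal/mol."""
    rt = R_KCAL * temperature
    e_min = min(energies)  # subtracting the minimum avoids overflow in exp()
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical energies of three docked poses (kcal/mol).
pose_energies = [-9.0, -8.0, -5.0]
for energy, prob in zip(pose_energies, boltzmann_probabilities(pose_energies)):
    print(f"E = {energy:5.1f} kcal/mol -> p = {prob:.3f}")
```

Because RT is only about 0.6 kcal/mol at room temperature, a gap of 1 kcal/mol already shifts most of the probability to the lower-energy structure.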
Structure-Based Virtual Screening
Compound database
3D structure of target (crystallography, NMR, modeling)
Virtual screening (e.g., computational docking)
Candidate ligands
Experimental assay
Ligands
Ligand optimization: med chem, crystallography, modeling
Drug candidates
Fragmental Structure-Based Screening
“Fragment” library
3D structure of target (crystallography, NMR, modeling)
Fragment docking
Compound design
http://www.beilstein-institut.de/bozen2002/proceedings/Jhoti/jhoti.html
Experimental assay and ligand optimization: med chem, crystallography, modeling
Drug candidates
Potential Functions for Structure-Based Design
Energy as a function of structure
Physics-based
Knowledge-based
Physics-Based Potentials
Energy terms from physical theory
Van der Waals interactions (shape fitting)
Bonded interactions (shape and flexibility)
Coulombic interactions (charge-charge complementarity)
Hydrogen bonding
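As a rough illustration of two of these terms, the sketch below evaluates a 12-6 Lennard-Jones energy and a Coulomb energy screened by a distance-dependent dielectric. Both functional forms and all parameter values (well depth, `r_min`, the dielectric slope of 4) are common textbook choices assumed here; the slides do not specify them.

```python
COULOMB_CONSTANT = 332.06  # kcal*Angstrom/(mol*e^2)

def lennard_jones(r, well_depth, r_min):
    """12-6 van der Waals energy; its minimum is -well_depth at r = r_min."""
    ratio = r_min / r
    return well_depth * (ratio ** 12 - 2.0 * ratio ** 6)

def screened_coulomb(r, q1, q2, dielectric_slope=4.0):
    """Coulomb energy with an assumed dielectric eps(r) = dielectric_slope * r."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric_slope * r * r)

def pair_energy(r, well_depth, r_min, q1, q2):
    """Nonbonded energy of one atom pair (kcal/mol), distances in Angstroms."""
    return lennard_jones(r, well_depth, r_min) + screened_coulomb(r, q1, q2)

# Hypothetical ammonium N / carboxylate O contact at 3.0 Angstroms.
print(round(pair_energy(3.0, well_depth=0.2, r_min=3.5, q1=1.0, q2=-1.0), 2))
```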
Common Simplifications Used in Physics-Based Docking
Quantum effects approximated classically
Protein often held rigid
Configurational entropy neglected
Influence of water treated crudely
Proteins and Ligands Are Flexible
Protein + Ligand ⇌ Complex, with standard binding free energy $\Delta G^\circ$
Binding Energy and Entropy
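Unbound states (6 in this example), each at energy $E_{Free}$
Bound states (2 in this example), each at energy $E_{Bound}$
$K = \dfrac{2\,e^{-E_{Bound}/RT}}{6\,e^{-E_{Free}/RT}}$
$\Delta G^\circ = -RT \ln K = (E_{Bound} - E_{Free}) + RT \ln 3$
Energy part: $E_{Bound} - E_{Free}$
Entropy part: $RT \ln 3$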
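The split into an energy part and an entropy part can be checked numerically. This sketch reads the slide's example as two bound states and six unbound states; the energies passed in are hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant, kcal/(mol*K)

def binding_thermodynamics(e_bound, e_free, n_bound=2, n_free=6,
                           temperature=298.15):
    """Return (K, delta_G, energy_part, entropy_part) for a state-counting model."""
    rt = R_KCAL * temperature
    k = (n_bound * math.exp(-e_bound / rt)) / (n_free * math.exp(-e_free / rt))
    delta_g = -rt * math.log(k)
    energy_part = e_bound - e_free                  # favors binding if negative
    entropy_part = rt * math.log(n_free / n_bound)  # penalty for losing states
    return k, delta_g, energy_part, entropy_part

# Hypothetical energies in kcal/mol.
k, delta_g, energy_part, entropy_part = binding_thermodynamics(-5.0, 0.0)
print(f"dG = {delta_g:.2f} = {energy_part:.2f} + {entropy_part:.2f} kcal/mol")
```

Fewer accessible states in the complex makes the entropy part positive, so it opposes binding even when the energy part is favorable.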
Structure-Based Discovery
Physics-oriented approaches
Weaknesses:
Fully physical detail becomes computationally intractable
Approximations are unavoidable
Parameterization still required
Strengths:
Interpretable, provides guides to design
Broadly applicable, in principle at least
Clear pathways to improving accuracy
Status:
Useful, far from perfect
Multiple groups working on fewer, better approximations (force fields, quantum; flexibility, entropy; water effects)
Moore’s law: hardware improving
Knowledge-Based Docking Potentials
[Figure labels: histidine, ligand carboxylate, aromatic stacking]
Probability and energy
Boltzmann: $p(r) \propto e^{-E(r)/RT}$
Inverse Boltzmann: $E(r) = -RT \ln p(r)$ (up to an additive constant)
Example: ligand carboxylate O to protein histidine N
1. Find all protein-ligand structures in the PDB with a ligand carboxylate O
2. For each structure, histogram the distances from O to every histidine N
3. Sum the histograms over all structures to obtain $p(r_{O-N})$
4. Compute $E(r_{O-N})$ from $p(r_{O-N})$
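The four steps above can be sketched as follows, assuming the O-to-N distances have already been collected from the structures. The bin settings and the sample distances are hypothetical. The sketch omits any reference-state normalization, which practical potentials typically include.

```python
import math

RT = 0.593  # kcal/mol near 298 K

def inverse_boltzmann(distances, r_max=10.0, n_bins=10):
    """Histogram pair distances and convert bin probabilities to energies."""
    width = r_max / n_bins
    counts = [0] * n_bins
    for r in distances:
        if 0.0 <= r < r_max:
            index = min(int(r / width), n_bins - 1)
            counts[index] += 1
    total = sum(counts)
    energies = []
    for count in counts:
        if count == 0:
            energies.append(math.inf)  # never observed: no finite estimate
        else:
            energies.append(-RT * math.log(count / total))
    return energies

# Hypothetical O-N distances (Angstroms) pooled over many structures.
sample = [2.8, 2.9, 3.0, 3.1, 6.5]
for i, energy in enumerate(inverse_boltzmann(sample)):
    print(f"{i}-{i + 1} A: {energy:.2f}")
```

Empty bins come out as infinite energies, which is one concrete form of the statistical limitation: a distance that was never observed cannot be scored.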
Knowledge-Based Docking Potentials
“PMF”, Muegge & Martin, J. Med. Chem. 42:791, 1999
$E_{prot\text{-}lig} = E_{vdw} + \sum_{pairs\,(ij)} E_{type(ij)}(r_{ij})$
[Figure: pair potentials versus atom-atom distance (Angstroms) for a few types of atom pairs, out of several hundred total: Nitrogen+/Oxygen-, aromatic carbons, aliphatic carbons]
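A sketch of how such a pairwise sum might be evaluated for one docked pose. The atom-type names, the tabulated energies, and the 1 Å bin width are all hypothetical placeholders, and this is not the PMF implementation itself.

```python
import math

def score_complex(protein_atoms, ligand_atoms, pair_tables,
                  e_vdw=0.0, bin_width=1.0):
    """Sum tabulated pair energies over all protein-ligand atom pairs.

    Atoms are (type_name, (x, y, z)) tuples. pair_tables maps a
    (protein_type, ligand_type) key to a list of energies, one per distance bin.
    """
    total = e_vdw
    for p_type, p_xyz in protein_atoms:
        for l_type, l_xyz in ligand_atoms:
            table = pair_tables.get((p_type, l_type))
            if table is None:
                continue  # no statistics for this pair type
            index = int(math.dist(p_xyz, l_xyz) / bin_width)
            if index < len(table):  # pairs beyond the cutoff contribute 0
                total += table[index]
    return total

# Hypothetical table for histidine N / carboxylate O, bins of 1 Angstrom.
tables = {("N_his", "O_carboxylate"): [5.0, 1.0, -1.2, -0.3]}
protein = [("N_his", (0.0, 0.0, 0.0))]
ligand = [("O_carboxylate", (2.8, 0.0, 0.0)), ("C_aliphatic", (20.0, 0.0, 0.0))]
print(score_complex(protein, ligand, tables))
```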
Structure-Based Discovery
Knowledge-based potentials
Weaknesses:
Accuracy limited by availability of data
Accuracy may also be limited by overall approach
Strengths:
Relatively easy to implement
Computationally fast
Status:
Useful, far from perfect
May be at point of diminishing returns
Limitations of Knowledge-Based Potentials
1. Statistical limitations (e.g., to pairwise potentials)
2. Even if we had infinite statistics, would the results be accurate? (Is inverse Boltzmann quite right? Where is entropy?)
10 bins for a histogram of O-N distances ($r_1, r_2, \ldots, r_{10}$ along $r_{O-N}$)
100 bins for a histogram of O-N and O-C distances ($r_{O-N}$ by $r_{O-C}$)
Scenario 2
Structure of Targeted Protein Unknown: Ligand-Based Drug Discovery
Using knowledge of existing inhibitors to discover more
E.g., MAP kinase inhibitors
Why Look for Another Ligand if You Already Have Some?
Experimental screening generated some ligands, but they don’t bind tightly
A company wants to work around another company’s chemical patents
A high-affinity ligand is toxic, is not well absorbed, etc.
Ligand-Based Virtual Screening
Compound library
Known ligands
Molecular similarity
Machine learning
Etc.
Candidate ligands
Assay
Actives
Optimization: med chem, crystallography, modeling
Potent drug candidates
Sources of Data on Known Ligands
Journals, e.g., J. Med. Chem.
Some Binding and Chemical Activity Databases
PubChem (NIH): pubchem.ncbi.nlm.nih.gov
ChEMBL (EMBL): www.ebi.ac.uk/chembl
BindingDB (UCSD): www.bindingdb.org
BindingDB (www.bindingdb.org)
Finding Protein-Ligand Data in BindingDB
e.g., by Name of Protein “Target”
e.g., by Ligand Draw Search
Sample Query Results
BindingDB to PDB
PDB to BindingDB
Download data in machine-readable format
Sample Query Results
Machine-Readable Chemical Format
Structure-Data File (SDF)
PDB Format Lacks Chemical Bonding
SDF Format Defines Chemical Bonds
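To make the point about bonds concrete, the sketch below reads the atom and bond blocks of a small V2000 mol block, the structure record inside an SDF entry. The example molecule is hand-written for illustration, and the parser handles only the fixed-width fields shown. A real workflow would use an established cheminformatics toolkit instead.

```python
# Hand-written V2000 mol block: a C=O fragment with heavy atoms only.
MOL_BLOCK = "\n".join([
    "carbonyl fragment (illustrative)",
    "  hand-written example",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.2000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  2  0",
    "M  END",
])

def parse_mol_block(text):
    """Return (atoms, bonds) from the fixed-width fields of a V2000 mol block."""
    lines = text.splitlines()
    counts = lines[3]  # the fourth line holds the atom and bond counts
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        xyz = (float(line[0:10]), float(line[10:20]), float(line[20:30]))
        atoms.append((line[31:34].strip(), xyz))
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        # first atom index, second atom index, bond order (indices are 1-based)
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds

atoms, bonds = parse_mol_block(MOL_BLOCK)
print(atoms)
print(bonds)
```

The bond block states explicitly that atoms 1 and 2 share a double bond, which is the information a plain PDB coordinate record does not carry.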
There Are Many Other Chemical File Formats
Interconvert with Babel
Chemical Similarity
Ligand-Based Drug Discovery
Compounds (available/synthesizable)
Compare with known ligands
Similar: test experimentally
Different: don’t bother
Chemical Fingerprints
Binary Structure Keys
[Figure: Molecule 1 and Molecule 2 scored against structure keys: phenyl, methyl, ketone, carboxylate, amide, aldehyde, chlorine, fluorine, ethyl, naphthyl, S-S bond, alcohol, …]
Chemical Similarity from Fingerprints
Tanimoto Similarity or Jaccard Index, T
$T = \dfrac{N_I}{N_U} = 0.25$
$N_I = 2$ (intersection)
$N_U = 8$ (union)
[Figure: fingerprints of Molecule 1 and Molecule 2]
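A short sketch of the Tanimoto calculation on structure keys. The key assignments for the two molecules are hypothetical, chosen only so that the intersection and union sizes match the slide's 2 and 8.

```python
def tanimoto(keys_a, keys_b):
    """Tanimoto (Jaccard) similarity between two sets of structure keys."""
    a, b = set(keys_a), set(keys_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)

# Hypothetical key sets: 2 keys in common, 8 keys in the union.
molecule_1 = {"phenyl", "methyl", "ketone", "amide", "alcohol"}
molecule_2 = {"phenyl", "methyl", "carboxylate", "chlorine", "fluorine"}
print(tanimoto(molecule_1, molecule_2))
```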
Hashed Chemical Fingerprints
Based upon paths in the chemical graph
1-atom paths: C F N H S O
2-atom paths: F-C C-C C-N C-S S-O C-H
3-atom paths: F-C-C C-C-N C-N-H C-S-O
C S-O etc.
Each path sets a pseudo-random bit-pattern in a very long molecular fingerprint
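One way to sketch the idea: enumerate linear paths in a small molecular graph, then hash each path string to a few bit positions. The use of MD5, the 1024-bit length, and two bits per path are arbitrary choices for illustration, not the scheme of any particular toolkit.

```python
import hashlib

def enumerate_paths(atoms, bonds, max_atoms=3):
    """Return the set of linear atom paths (as strings) up to max_atoms long."""
    neighbors = {i: set() for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)
    paths = set()

    def extend(path):
        labels = [atoms[i] for i in path]
        forward, backward = "-".join(labels), "-".join(reversed(labels))
        paths.add(min(forward, backward))  # one canonical direction per path
        if len(path) < max_atoms:
            for nxt in neighbors[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in range(len(atoms)):
        extend([start])
    return paths

def hashed_fingerprint(atoms, bonds, n_bits=1024, bits_per_path=2, max_atoms=3):
    """Map each path to bits_per_path pseudo-random positions in n_bits."""
    on_bits = set()
    for path in enumerate_paths(atoms, bonds, max_atoms):
        digest = hashlib.md5(path.encode("ascii")).digest()
        for k in range(bits_per_path):
            chunk = digest[4 * k:4 * k + 4]
            on_bits.add(int.from_bytes(chunk, "big") % n_bits)
    return on_bits

# Hypothetical F-C-C-N chain with hydrogens omitted.
chain_atoms = ["F", "C", "C", "N"]
chain_bonds = [(0, 1), (1, 2), (2, 3)]
print(sorted(enumerate_paths(chain_atoms, chain_bonds)))
print(sorted(hashed_fingerprint(chain_atoms, chain_bonds)))
```

Different paths can collide on the same bit, so a set bit does not guarantee that a given path is present. That is the price of a fixed-length fingerprint without a predefined key list.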
Maximum Common Substructure
Ncommon=34
Potential Drawbacks of Plain Chemical Similarity
May miss good ligands by being overly conservative
Too much weight on irrelevant details
Scaffold Hopping
Zhao, Drug Discovery Today 12:149, 2007
Identification of synthetic statins by scaffold hopping
Abstraction and Identification of Relevant Compound Features
Ligand shape
Pharmacophore models
Chemical descriptors
Statistics and machine learning
Pharmacophore Models
Φάρμακο (drug) + Φορά (carry)
A 3-point pharmacophore
[Figure: three features (+1 charge, bulky hydrophobe, aromatic) with inter-feature distances 5.0 ± 0.3 Å, 3.2 ± 0.4 Å, and 2.8 ± 0.3 Å]
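A 3-point pharmacophore can be treated as a set of distance constraints between features. In the sketch below the three distances and tolerances are those shown on the slide, but which feature pair each distance belongs to is an assumption, and the coordinates are hypothetical.

```python
import math

# (target distance, tolerance) in Angstroms for each feature pair.
# The pairing of distances to feature pairs is assumed, not taken from the slide.
CONSTRAINTS = {
    ("positive", "aromatic"): (5.0, 0.3),
    ("positive", "hydrophobe"): (3.2, 0.4),
    ("hydrophobe", "aromatic"): (2.8, 0.3),
}

def matches_pharmacophore(features, constraints=CONSTRAINTS):
    """features maps a feature name to its (x, y, z) position in one conformer."""
    for (a, b), (target, tolerance) in constraints.items():
        if a not in features or b not in features:
            return False
        if abs(math.dist(features[a], features[b]) - target) > tolerance:
            return False
    return True

# Hypothetical feature positions for one conformer of a candidate ligand.
conformer = {
    "positive": (0.0, 0.0, 0.0),
    "aromatic": (5.0, 0.0, 0.0),
    "hydrophobe": (2.74, 1.653, 0.0),
}
print(matches_pharmacophore(conformer))
```

In practice the check would be repeated over many conformers of each candidate, since a flexible ligand may satisfy the constraints in only some of them.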
Molecular Descriptors
More abstract than chemical fingerprints
Physical descriptors: molecular weight, charge, dipole moment, number of H-bond donors/acceptors, number of rotatable bonds, hydrophobicity (log P and clogP)
Topological: branching index, measures of linearity vs. interconnectedness
Etc.
Rotatable bonds
A High-Dimensional “Chemical Space”
Each compound is at a point in an n-dimensional space
Compounds with similar properties are near each other
[Figure: axes Descriptor 1, Descriptor 2, Descriptor 3]
Point representing a compound in descriptor space
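A small sketch of nearness in descriptor space: each descriptor is standardized so that no single unit dominates, and the nearest neighbor is found by Euclidean distance. The compound names and descriptor values (molecular weight, log P, rotatable bonds) are hypothetical.

```python
import math
import statistics

def standardize(table):
    """Z-score each descriptor column across all compounds in the table."""
    names = list(table)
    columns = list(zip(*(table[name] for name in names)))
    means = [statistics.fmean(col) for col in columns]
    spreads = [statistics.pstdev(col) or 1.0 for col in columns]  # avoid /0
    return {
        name: tuple((v - m) / s for v, m, s in zip(table[name], means, spreads))
        for name in names
    }

def nearest_neighbor(query, table):
    """Name of the compound closest to `query` in standardized descriptor space."""
    scaled = standardize(table)
    others = [name for name in scaled if name != query]
    return min(others, key=lambda name: math.dist(scaled[query], scaled[name]))

# Hypothetical descriptors: (molecular weight, log P, rotatable bonds).
descriptors = {
    "compound_A": (180.0, 1.2, 3.0),
    "compound_B": (194.0, 1.4, 4.0),
    "compound_C": (450.0, 4.8, 9.0),
}
print(nearest_neighbor("compound_A", descriptors))
```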
Statistics and Machine Learning
Some examples
Partial least squares
Support vector machines
Genetic algorithms for descriptor-selection
Summary
Overview of drug discovery
Computer-aided methods: structure-based, ligand-based
Interaction potentials: physics-based, knowledge-based (data driven)
Ligand-protein databases, machine-readable chemical formats
Ligand similarity and beyond
Mike Gilson, School of Pharmacy, [email protected], 2-0622
Activities and Discussion Topics
BindingDB: Advil; machine-readable format, binding activities
PDB/BindingDB: 2ONY at PDB; BindingDB substructure search; related data
Similarity search
Combined computational approaches
(physics + knowledge)-based docking potentials
(ligand + structure)-based computational discovery
Other data-driven methods where it may be hard to get enough statistics
Validation of computational methods
Protein-ligand databases: getting data and assessing data quality
Drug Discovery Pipeline (One Model)
Target identification
Target validation
Assay development
Lead compound (ligand) discovery
Lead optimization
Animal pharmacokinetics, toxicity
Phase I clinical (safety, metabolism, PK)
Phase II clinical (efficacy)
Phase III clinical (comparison with existing therapy)
Updated Knowledge-Based PMF Potential
Muegge, J. Med. Chem. 49:5895, 2006