BUDE: GPU-Accelerated Molecular Docking for Drug Discovery
on-demand.gputechconf.com
BUDE
A General Purpose Molecular Docking Program Using OpenCL
Richard B Sessions
1
The molecular docking problem
2
Proteins: typically O(1000) atoms
Ligands: typically O(100) atoms
[Figure: ligand + receptor -> predicted complex]
1. Sampling (6 degrees of freedom): EMC
2. Binding affinity prediction: EFE-FF
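The "6 degrees of freedom" in the sampling step are the three rotations and three translations of a rigid ligand. A minimal sketch of applying one such pose to ligand coordinates is given here; the function name `apply_pose`, the rotation order (X, then Y, then Z) and the use of radians are illustrative assumptions, not BUDE's actual conventions.

```python
import math

def apply_pose(coords, pose):
    """Rigidly move ligand atoms by a 6-degree-of-freedom pose.

    pose = (rx, ry, rz, tx, ty, tz): rotations in radians about X, Y and Z
    (applied in that order, an assumed convention), then a translation.
    """
    rx, ry, rz, tx, ty, tz = pose
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    moved = []
    for x, y, z in coords:
        y, z = y * cx - z * sx, y * sx + z * cx    # rotate about X
        x, z = x * cy + z * sy, -x * sy + z * cy   # rotate about Y
        x, y = x * cz - y * sz, x * sz + y * cz    # rotate about Z
        moved.append((x + tx, y + ty, z + tz))     # then translate
    return moved

# Hypothetical usage: quarter turn about Z, then shift by (1, 2, 3).
ligand = [(1.0, 0.0, 0.0)]
print(apply_pose(ligand, (0.0, 0.0, math.pi / 2, 1.0, 2.0, 3.0)))
```

A sampler then only has to search over these six numbers per pose, and the scoring function evaluates the moved coordinates against the receptor.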
3
An atom-atom based forcefield
parameterised according to atom type,
analogous to standard molecular mechanics
McIntosh-Smith, S., et al., Benchmarking Energy Efficiency,
Power Costs and Carbon Emissions on Heterogeneous Systems.
Computer Journal, 2012. 55(2): p. 192-205.
Empirical Free Energy Forcefield
soft core
Re-docking a ligand into the Xray Structure (good prediction == low RMSD)
5
1CIL (Human carbonic anhydrase II): RMSD ~ 0.2 Å
Another example
6
1EZQ (Human Factor XA): RMSD ~ 1.2 Å
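The re-docking examples are judged by RMSD between the predicted ligand pose and the X-ray pose. A minimal sketch of that measure follows, assuming both poses list the same atoms in the same order and are already in the receptor's coordinate frame (so no superposition is applied); the function name `rmsd` and the sample coordinates are illustrative.

```python
import math

def rmsd(predicted, reference):
    """Root-mean-square deviation between two matched sets of 3D points."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("poses must be non-empty and have matching atom counts")
    squared = sum(
        (p - r) ** 2
        for atom_p, atom_r in zip(predicted, reference)
        for p, r in zip(atom_p, atom_r)
    )
    return math.sqrt(squared / len(predicted))

# Hypothetical two-atom ligand displaced by 0.2 Å along Z.
xray = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
docked = [(0.0, 0.0, 0.2), (1.5, 0.0, 0.2)]
print(round(rmsd(docked, xray), 3))
```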
7
Accuracy of Pose Prediction (re-docking the BindingDB validation set, 84 complexes)
www.bindingdb.org
8
Binding Energy Prediction: is BUDE any better?
Mike Hann’s 2006 test of docking software
Yes – better but not perfect!
9
BUDE Simplified Flow Diagram (C++/OpenCL)
[Figure: flow diagram. Node labels recoverable from the slide: Start BUDE; Enter Initial Data; Act on Option; Info; Docking; Print Help; Write Control File; Data Reading; Error(s)?; Prepare Data for Docking; Docking Type; Surface Docking; Site Docking; Generate Surface Pairs; Do Docking; Parallel Code?; Host Job; Accelerated Job; Do Generation; Calculate Energies; Rank Energies; EMC; Last Generation; Score Results; Print Results; End BUDE. Decision branches are labelled Yes / No; the labels Large / Small also appear.]
BUDE’s heterogeneous approach
1. Discover all OpenCL platforms/devices, including both CPUs and GPUs
2. Run a micro benchmark on each device, ideally a short piece of real work
3. Load balance using micro benchmark results
4. Re-run micro benchmark at regular intervals in case load changes
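Step 3 of the list above can be sketched as a proportional split: each device receives a share of the poses in proportion to the throughput its micro benchmark measured. The function `split_work`, the device names and the throughput figures are hypothetical; the slides do not say how BUDE actually computes the split.

```python
def split_work(n_items, throughputs):
    """Divide n_items across devices in proportion to benchmarked throughput.

    throughputs maps a device name to its measured rate (any consistent
    unit, e.g. poses per second). Shares always sum to exactly n_items.
    """
    total = sum(throughputs.values())
    if total <= 0:
        raise ValueError("at least one device must report positive throughput")
    shares = {}
    assigned = 0
    cumulative = 0.0
    for name, rate in throughputs.items():
        cumulative += rate
        # Round the cumulative boundary so rounding errors never accumulate.
        upto = round(n_items * cumulative / total)
        shares[name] = upto - assigned
        assigned = upto
    return shares

# Hypothetical micro benchmark results: the GPU is three times faster.
print(split_work(100, {"cpu": 1.0, "gpu": 3.0}))
```

Re-running the micro benchmark at intervals (step 4) then amounts to calling `split_work` again with fresh throughput numbers before the next batch is handed out.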
10
11
BUDE’s Three Docking Modes
• Virtual Screening by Docking
• Binding Site Prediction
• Protein-Protein Docking in real space
12
Virtual Screening by Docking
Virtual screening by docking against NDM-1 (New Delhi metallo-β-lactamase-1)
• 8 million ZINC8 candidate drug molecules, 20 conformers each: 160M dockings
• EMERALD (STFC funded machine in Oxford)
• 372 GPUs
• 2.4 × 10^17 atom-atom energies calculated
• ~60 hours actual wall-time
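The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the per-GPU rate is an average that assumes all 372 GPUs were busy for the full 60 hours, which the slide does not state.

```python
molecules = 8_000_000
conformers_per_molecule = 20
dockings = molecules * conformers_per_molecule        # 160,000,000 dockings

atom_atom_energies = 2.4e17
gpus = 372
wall_seconds = 60 * 3600

# Average number of atom-atom energy evaluations behind each docking.
energies_per_docking = atom_atom_energies / dockings

# Average sustained rate per GPU (assumes every GPU ran the whole time).
energies_per_gpu_second = atom_atom_energies / (gpus * wall_seconds)

print(f"dockings: {dockings:,}")
print(f"atom-atom energies per docking: {energies_per_docking:.2e}")
print(f"atom-atom energies per GPU per second: {energies_per_gpu_second:.2e}")
```

On these assumptions each docking accounts for about 1.5 billion atom-atom energies, and each GPU sustains roughly 3 billion per second.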
13
BUDE’s EMC in Action
14
15
Virtual Screening for Ligands to Stabilise a Protein
Screened 160 million conformations of the 8 million compounds in the ZINC database against 5 different conformations of the protein on EMERALD.
Selected and tested 58 compounds with two types of experimental assay and found 18 compounds binding between 10 and 100 µM.
31% hit rate (18 of 58 compounds)
16
A New Virtual Screen against a key protein from the Malaria Parasite
BlueCrystal P3: 76 Nvidia K20s
EMERALD: 372 Nvidia M2090s
Binding Site Identification
Full rotation and limited translation of the ligand at each receptor surface vector
18
Location of the Binding Site of PI3P to a Protein (homology model) Involved in Insulin Signalling
Thomas & Tavare
19
Protein-Protein Docking (in real space)
Each point on the ligand is offered to each point on the receptor with a local mini-dock: complete rotation in Z, rock in X & Y, small translations in X, Y & Z
20
Best energy -> RMSD = 0.2 Å
Protein-Protein Docking Example: the leucine zipper coiled coil
In a “real” case with Pete Cullen’s group we have mapped a protein-protein interface using BUDE and confirmed it experimentally. This took only 20 site-directed mutations, instead of the hundreds required by full alanine-scanning mutagenesis.
21
Performance across devices
High performance in silico virtual drug screening on many-core processors.
Simon McIntosh-Smith, James Price, Richard B. Sessions & Amaurys A. Ibarra
International Journal of High Performance Computing Applications (accepted for publication)
16 cores @ 3.1 GHz
22
Main Optimisations
Conditional accumulation vs. predicated accumulation
Instruction mix in the innermost loop of the energy calculation
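The slide names the two accumulation styles without showing code. A minimal sketch of the difference, as the terms are commonly understood, is given below: the conditional form branches on each atom pair, while the predicated form always computes the term and multiplies it by a 0/1 mask, so the innermost loop has no branch. The toy energy term `cutoff - d` and both function names are placeholders, not BUDE's forcefield, and Python is used only to show the control flow, not the performance effect.

```python
def energy_conditional(distances, cutoff):
    """Accumulate a toy energy term only when the pair is within the cutoff."""
    total = 0.0
    for d in distances:
        if d < cutoff:              # data-dependent branch in the inner loop
            total += cutoff - d
    return total

def energy_predicated(distances, cutoff):
    """Same result, but every iteration executes the same instructions."""
    total = 0.0
    for d in distances:
        term = cutoff - d           # always computed
        total += term * (d < cutoff)  # comparison yields 0 or 1 as a mask
    return total

# Hypothetical pair distances in Å.
pairs = [1.0, 2.5, 4.0, 7.5]
print(energy_conditional(pairs, 5.0), energy_predicated(pairs, 5.0))
```

On wide SIMD hardware the predicated form can avoid divergence between lanes, at the price of doing arithmetic for pairs that contribute nothing.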
23
Optimisations
24
Summary
• GPUs and machines like EMERALD are enabling new science
• BUDE is promising a step-change in Molecular Docking
• But plenty more developments and improvements are possible!
25
Acknowledgements
Amaurys Avila Ibarra
Simon N McIntosh-Smith
James Price
Debbie K Shoemark
EMERALD and the eInfraStructure South Consortium UK
BlueCrystal and the Advanced Computing Research Centre (Bristol)
On the shoulders of giants ...
Emil Fischer (1852-1919)
‘Lock and Key’
Willard Gibbs (1839-1903)
Gibbs Free Energy
G = H – TS
26
Supplementary Slides
Structure and Binding Energy Prediction: speed vs accuracy tradeoff
                          Typical docking      Empirical Free Energy   Free Energy calculations
                          scoring functions    Forcefield (BUDE)       (MM [1,2], QM/MM [3])
Entropy: solvation        No                   Yes                     Yes
Entropy: configurational  Approx               Approx                  Yes
Electrostatics            ?                    Approx                  Yes
All atom                  No                   Yes                     Yes
Explicit solvent          No                   No                      Yes
[Figure axes: Accuracy, Speed]
1. MD Tyka, AR Clarke, RB Sessions, J. Phys. Chem. B 110, 17212-20 (2006)
2. MD Tyka, RB Sessions, AR Clarke, J. Phys. Chem. B 111, 9571-80 (2007)
3. CJ Woods, FR Manby, AJ Mulholland, J. Chem. Phys. 128, 014109 (2008)
27
EMC Genetic Algorithm minimiser
28
30
Receptor and Ligand Flexibility
Full flexibility: would be Molecular Dynamics
Limited flexibility: is appropriate for Molecular Docking:
Protein: Backbone – dock to selected Xray or MD structures
Protein: Sidechains – sample side chain rotamers during docking
Small molecule: generate and dock many different conformations,
e.g. ZINC database of 8 M drug-like compounds, 160 M conformers
EMC Genetic Algorithm
Seed  Parents  Selected By  Flag  Generation Size  Output Descriptors  Output Coordinates  Mutation Method  Parameter  Parameter  Parameter
N     M        R*           True  X                Y                   Z                   U                K%         R*         R*
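The slides describe EMC as a genetic-algorithm minimiser that works in generations, with parents chosen by energy rank. A generic sketch of that pattern follows: score a generation, keep the lowest-energy members as parents, and fill the next generation with mutated copies. Everything specific here (the name `emc_minimise`, Gaussian mutation, the default sizes, the starting range and the toy quadratic energy) is an assumption for illustration and is not taken from BUDE.

```python
import random

def emc_minimise(energy, n_dims, generation_size=50, n_parents=5,
                 n_generations=20, step=0.5, seed=0):
    """Generic generation-based minimiser: rank, select parents, mutate."""
    rng = random.Random(seed)
    population = [[rng.uniform(-5.0, 5.0) for _ in range(n_dims)]
                  for _ in range(generation_size)]
    history = []                                 # best energy per generation
    for _ in range(n_generations):
        ranked = sorted(population, key=energy)  # calculate and rank energies
        parents = ranked[:n_parents]
        history.append(energy(parents[0]))
        # Children are Gaussian-perturbed copies of randomly chosen parents.
        children = [[x + rng.gauss(0.0, step) for x in rng.choice(parents)]
                    for _ in range(generation_size - n_parents)]
        population = parents + children          # parents survive unchanged
    return min(population, key=energy), history

# Hypothetical 6-dimensional pose with a toy energy whose minimum is at 0.
def toy_energy(pose):
    return sum(x * x for x in pose)

best, history = emc_minimise(toy_energy, n_dims=6)
print(history[0], history[-1])
```

Because the parents are carried over unchanged, the best energy can never get worse from one generation to the next; a real docking run would replace `toy_energy` with the forcefield score of the transformed ligand.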
32
BUDE Algorithm
33