High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud
![Page 1: High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud](https://reader035.fdocuments.us/reader035/viewer/2022062812/568163ee550346895dd56173/html5/thumbnails/1.jpg)
High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud
Sally R. Ellingson
Graduate Research Assistant
Center for Molecular Biophysics, UT/ORNL
Department of Genome Science and Technology, UT
Scalable Computing and Leading Edge Innovative Technologies (IGERT)
Dr. Jerome Baudry
PhD Advisor
Center for Molecular Biophysics, UT/ORNL
Department of BCMB, UT
The Second International Emerging Computational Methods for the Life Sciences Workshop
ACM International Symposium on High Performance Distributed Computing
June 8, 2011, San Jose, CA
![Page 2: High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud](https://reader035.fdocuments.us/reader035/viewer/2022062812/568163ee550346895dd56173/html5/thumbnails/2.jpg)
Ultimate Goal:
Reduce the time and cost of discovering novel drugs
![Page 3: High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud](https://reader035.fdocuments.us/reader035/viewer/2022062812/568163ee550346895dd56173/html5/thumbnails/3.jpg)
1. Virtual Molecular Docking
a) Novel Drug Discovery
b) Virtual high-throughput screenings (VHTS)
2. Cloud Computing
a) Advantages for VHTS
b) Kandinsky
c) Hadoop (MapReduce)
3. AutoDockCloud
a) Current Implementation
b) Future Implementations
![Page 4: High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud](https://reader035.fdocuments.us/reader035/viewer/2022062812/568163ee550346895dd56173/html5/thumbnails/4.jpg)
Virtual Molecular Docking
Given a receptor (protein) and ligand (small molecule), predict
1. Bound conformations
• Search algorithm to explore conformational space
2. Binding affinity
• Force field to evaluate energetics
![Page 5: High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud](https://reader035.fdocuments.us/reader035/viewer/2022062812/568163ee550346895dd56173/html5/thumbnails/5.jpg)
AutoDock4
Virtual Docking Engine
http://autodock.scripps.edu/wiki/AutoDock4
![Page 6: High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud](https://reader035.fdocuments.us/reader035/viewer/2022062812/568163ee550346895dd56173/html5/thumbnails/6.jpg)
Novel Drug Discovery
Human HDAC4
HA3 crystal structure
ZINC03962325
![Page 7: High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud](https://reader035.fdocuments.us/reader035/viewer/2022062812/568163ee550346895dd56173/html5/thumbnails/7.jpg)
Virtual High-Throughput Screening (VHTS)
![Page 8: High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud](https://reader035.fdocuments.us/reader035/viewer/2022062812/568163ee550346895dd56173/html5/thumbnails/8.jpg)
VHTS with AutoDock4
![Page 9: High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud](https://reader035.fdocuments.us/reader035/viewer/2022062812/568163ee550346895dd56173/html5/thumbnails/9.jpg)
Potential advantages of Cloud Computing for VHTS
• Affordable access to compute resources (especially for small labs and classrooms).
• Easy-to-use web interface for users who are not computing experts; the software itself is maintained by experts.
• Scalable resources for size of screening.
![Page 10: High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud](https://reader035.fdocuments.us/reader035/viewer/2022062812/568163ee550346895dd56173/html5/thumbnails/10.jpg)
Kandinsky
Private Cloud Platform at ORNL
Kandinsky, the Systems Biology Knowledgebase Computer, sponsored by the Office of Biological and Environmental Research in the DOE Office of Science
68 nodes × 16 cores/node = 1088 cores
20 Gbps InfiniBand interconnect
Designed to support Hadoop applications and gain an understanding of the MapReduce paradigm.
• 57 nodes for MapReduce tasks
• 1 tasktracker per node
• 10 map and 6 reduce tasks per node (16 tasks per node)
• 570 map tasks and 342 reduce tasks can run simultaneously on Kandinsky
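The slot totals follow directly from the per-node configuration. A minimal sketch of that arithmetic, using the figures from the slide (the function and constant names are illustrative, not part of any Hadoop API):

```python
# Figures taken from the slide; the names below are illustrative only.
MAPREDUCE_NODES = 57
MAP_SLOTS_PER_NODE = 10
REDUCE_SLOTS_PER_NODE = 6


def cluster_slots(nodes, map_per_node, reduce_per_node):
    """Return (concurrent map tasks, concurrent reduce tasks) for the cluster."""
    return nodes * map_per_node, nodes * reduce_per_node


map_slots, reduce_slots = cluster_slots(
    MAPREDUCE_NODES, MAP_SLOTS_PER_NODE, REDUCE_SLOTS_PER_NODE
)
print(map_slots, reduce_slots)  # 570 342
```

With 16 cores per node, 10 map plus 6 reduce slots gives one task slot per core.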
![Page 11: High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud](https://reader035.fdocuments.us/reader035/viewer/2022062812/568163ee550346895dd56173/html5/thumbnails/11.jpg)
Hadoop
• Scalable
• Economical
• Efficient
• Reliable
http://hadoop.apache.org/common/docs/current/api/overview-summary.html
![Page 12: High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud](https://reader035.fdocuments.us/reader035/viewer/2022062812/568163ee550346895dd56173/html5/thumbnails/12.jpg)
MapReduce programming paradigm used by Hadoop
people.apache.org
![Page 13: High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud](https://reader035.fdocuments.us/reader035/viewer/2022062812/568163ee550346895dd56173/html5/thumbnails/13.jpg)
Current AutoDockCloud Implementation
input = file names needed for each docking
map(input) {
    copy input to local working directory;
    run AutoDock4 locally;
    copy result file to HDFS;
}
* Pre-docking set-up and post-docking analysis are currently done manually
* No reduce function is currently used
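The three steps of the map function can be sketched in Python as follows. This is an illustration under stated assumptions, not the AutoDockCloud source: a plain shared directory stands in for HDFS, the names `map_docking` and `run_autodock4` are hypothetical, and the runner is passed in as a parameter so the staging logic can be exercised without AutoDock4 installed. The default runner assumes an `autodock4` executable on the PATH, called with its `-p` (docking parameter file) and `-l` (log file) options.

```python
import shutil
import subprocess
from pathlib import Path


def run_autodock4(dpf: Path, dlg: Path) -> None:
    """Run AutoDock4 on one parameter file (assumes `autodock4` is on PATH)."""
    subprocess.run(
        ["autodock4", "-p", dpf.name, "-l", dlg.name],
        cwd=dpf.parent,
        check=True,
    )


def map_docking(file_names, shared_dir, work_dir, run=run_autodock4):
    """Stage inputs locally, dock, and copy the result log back."""
    shared_dir, work_dir = Path(shared_dir), Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)

    # 1. Copy every input file for this docking to the local working directory.
    for name in file_names:
        shutil.copy(shared_dir / name, work_dir / name)

    # 2. Run the docking engine locally on the docking parameter file (.dpf).
    dpf = next(work_dir / n for n in file_names if n.endswith(".dpf"))
    dlg = dpf.with_suffix(".dlg")
    run(dpf, dlg)

    # 3. Copy the result file back to shared storage (HDFS in the original).
    results = shared_dir / "results"
    results.mkdir(exist_ok=True)
    return Path(shutil.copy(dlg, results / dlg.name))
```

Because each call touches only its own files, the dockings are independent of one another, which is why a map-only job with no reduce step is sufficient here.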
![Page 14: High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud](https://reader035.fdocuments.us/reader035/viewer/2022062812/568163ee550346895dd56173/html5/thumbnails/14.jpg)
Current AutoDockCloud Implementation
ER agonist screening from DUD used as the benchmark
450× speed-up with 570 available map slots on Kandinsky, the private cloud at ORNL
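One way to read the benchmark figure is as a parallel efficiency. The sketch below assumes the ideal speed-up equals the number of map slots, which the slide does not state; the function name is illustrative.

```python
def parallel_efficiency(speedup, slots):
    """Fraction of the ideal (one-per-slot) speed-up that was achieved."""
    return speedup / slots


# Figures from the slide: 450x speed-up on 570 map slots.
print(f"{parallel_efficiency(450, 570):.0%}")  # 79%
```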
![Page 15: High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud](https://reader035.fdocuments.us/reader035/viewer/2022062812/568163ee550346895dd56173/html5/thumbnails/15.jpg)
Current AutoDockCloud Implementation
Docking enrichment plot for ER agonist using AutoDockCloud and DUD.
[Plot: x-axis, percent of ranked database; y-axis, percent of known ligands found]
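The points of an enrichment plot like this one can be computed from a ranked compound list and the set of known ligands. A minimal sketch, assuming the list is already sorted best score first and that at least one known ligand exists; the compound IDs are made up for illustration and are not data from the talk.

```python
def enrichment_curve(ranked_ids, known_ligands):
    """Return (percent of database screened, percent of known ligands found) points."""
    known = set(known_ligands)
    total = len(ranked_ids)
    found = 0
    points = [(0.0, 0.0)]
    for rank, compound in enumerate(ranked_ids, start=1):
        if compound in known:
            found += 1
        points.append((100.0 * rank / total, 100.0 * found / len(known)))
    return points


# Made-up ranking: two known ligands ("act") among three decoys ("dec").
ranked = ["act1", "dec1", "act2", "dec2", "dec3"]
curve = enrichment_curve(ranked, {"act1", "act2"})
for x, y in curve:
    print(f"{x:5.1f}% of database -> {y:5.1f}% of known ligands")
```

A curve that rises well above the diagonal means known ligands are concentrated near the top of the ranking, which is the behavior a useful screen should show.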
![Page 16: High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud](https://reader035.fdocuments.us/reader035/viewer/2022062812/568163ee550346895dd56173/html5/thumbnails/16.jpg)
Future AutoDockCloud Implementation
input = ligand file from chemical compound database
map(input) {
    create pdbqt (AutoDock input file) from input;
    run AutoDock4 locally;
    find best scoring ligand structure;
    save structure to HDFS;
    return <score, ligand>;
}
reduce(<score, ligand>) {
    sort;
    return ranked_database;
}
* Pre-docking and post-docking will be automated and distributed
* Lower total I/O requirements
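The planned map and reduce steps can be sketched in plain Python. This is an illustration only: the pdbqt preparation and the HDFS write are omitted, `dock` is a hypothetical callable returning the binding energies of all docked poses for a ligand, and the sketch assumes a lower (more negative) energy means a better score.

```python
def map_ligand(ligand_name, dock):
    """Dock one ligand and emit (best score, ligand)."""
    scores = dock(ligand_name)       # hypothetical: energies of all docked poses
    return min(scores), ligand_name  # lower energy is treated as the better score


def reduce_ranked(pairs):
    """Sort all (score, ligand) pairs, best score first."""
    return sorted(pairs)


# Made-up energies standing in for real docking output.
fake_scores = {"ligA": [-7.2, -6.5], "ligB": [-9.1, -8.0], "ligC": [-5.0]}
ranked = reduce_ranked(map_ligand(name, fake_scores.__getitem__) for name in fake_scores)
print(ranked)  # [(-9.1, 'ligB'), (-7.2, 'ligA'), (-5.0, 'ligC')]
```

Emitting only one small pair per ligand, rather than a full result file, is consistent with the lower I/O requirement noted on the slide.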
![Page 17: High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud](https://reader035.fdocuments.us/reader035/viewer/2022062812/568163ee550346895dd56173/html5/thumbnails/17.jpg)
Future Plans
• Incorporate additional docking engines
  – AutoDock Vina
    • Less I/O
    • More efficient and accurate algorithm
    • No charge information needed
• Deploy on Commercial Cloud (EC2)
• Develop web interface
![Page 18: High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud](https://reader035.fdocuments.us/reader035/viewer/2022062812/568163ee550346895dd56173/html5/thumbnails/18.jpg)
1. Virtual Molecular Docking
a) Novel Drug Discovery
b) Virtual high-throughput screenings (VHTS)
2. Cloud Computing
a) Advantages for VHTS
b) Kandinsky
c) Hadoop (MapReduce)
3. AutoDockCloud
a) Current Implementation
b) Future Implementations
![Page 19: High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud](https://reader035.fdocuments.us/reader035/viewer/2022062812/568163ee550346895dd56173/html5/thumbnails/19.jpg)
Questions/Comments
Acknowledgements
• Dr. Jerome Baudry (advisor)
• Center for Molecular Biophysics, UT/ORNL
• Genome Science and Technology, UT
• Scalable Computing and Leading Edge Innovative Technologies (IGERT)
• Avinash Kewalramani, ORNL
• ECMLS and HPDC organizers and participants