ENABLING NEW SCIENCE: TESLA™ BIO WORKBENCH
The NVIDIA® Tesla™ Bio Workbench enables biophysicists and computational chemists to push the boundaries of life sciences research. It turns a standard PC into a “computational laboratory” capable of running complex bioscience codes, in fields such as drug discovery and DNA sequencing, 10 to 20 times faster or more through the use of NVIDIA Tesla GPUs.
It consists of bioscience applications; a community site for downloading, discussing, and viewing the results of these applications; and GPU-based platforms that enable these applications to run at 1/10th the cost of CPU-only computers.
Complex molecular simulations that previously were possible only with supercomputing resources can now be run on an individual workstation, optimizing the scientific workflow and accelerating the pace of research. These simulations can also be scaled up to GPU-based clusters of servers to simulate large molecules and systems that would otherwise have required a supercomputer.
Applications that are accelerated on GPUs include:
• Molecular Dynamics & Quantum Chemistry
- AMBER, GROMACS, HOOMD, LAMMPS, NAMD, TeraChem (Quantum Chemistry), VMD
• Bioinformatics
- CUDA-BLASTP, CUDA-EC, CUDA-MEME, CUDASW++ (Smith-Waterman), GPU-HMMER, MUMmerGPU
For more information, visit www.nvidia.com/bio_workbench
GPU SOLUTIONS
The Tesla Bio Workbench applications can be deployed on GPU-based desktop personal supercomputers or in data center solutions. Built on the revolutionary, massively parallel CUDA™ GPU computing architecture, these solutions are designed to accelerate the pace of computational science and are available today. Visit www.nvidia.com/tesla to find out more about:
WORKSTATION SOLUTIONS:
TESLA PERSONAL SUPERCOMPUTER: For personal supercomputing at your desk
DATA CENTER SOLUTIONS:
TESLA GPU COMPUTING CLUSTERS: For computing with large-scale installations
AMBER
The explicit solvent and implicit solvent simulations in AMBER have been accelerated using CUDA-enabled GPUs. Paired with a Tesla GPU computing solution built on the CUDA architecture, AMBER achieves more than a 10x speedup compared to a single quad-core CPU.
Benchmark data are shown for explicit solvent PME and implicit solvent GB simulations. Details can be found on the AMBER 11 NVIDIA benchmark page, courtesy of the San Diego Supercomputer Center.
GROMACS
GROMACS is a molecular dynamics package designed primarily for simulation of biochemical molecules like proteins, lipids, and nucleic acids that have many complicated bonded interactions. The CUDA port of GROMACS, which enables GPU acceleration, is now available in beta and supports Particle-Mesh-Ewald (PME), arbitrary forms of non-bonded interactions, and implicit solvent Generalized Born methods.
The CUDA version of GROMACS currently supports a single GPU and produces the following results:
HOOMD
HOOMD-blue is a general-purpose particle dynamics package written from the ground up to take advantage of the revolutionary CUDA architecture in NVIDIA GPUs. It contains a number of different force fields and integrators, and its object-oriented design makes it easy to add more.
Detailed benchmarks are available on the HOOMD-blue page. One Tesla GPU running HOOMD-blue can outperform 32 CPU cores.
LAMMPS
LAMMPS is a classical molecular dynamics package written to run well on parallel machines. The CUDA version of LAMMPS is accelerated by moving the force calculations to the GPU.
LAMMPS on the GPU scales very well, as seen in the results below. Two Tesla GPUs running GPU-LAMMPS outperform 24 CPUs.
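To illustrate why force calculations are a natural target for GPU offload, here is a minimal CPU-only sketch of a pairwise Lennard-Jones force loop in Python. It is not taken from LAMMPS or GPU-LAMMPS; the function name, the reduced units, and the default cutoff are assumptions chosen for illustration. The point is that each pair interaction depends only on the two particle positions, so the work can be divided among many parallel threads.

```python
def lennard_jones_forces(positions, epsilon=1.0, sigma=1.0, cutoff=2.5):
    """Return per-particle Lennard-Jones forces for a list of 3-D positions.

    Illustrative sketch only (hypothetical helper, reduced units, no
    periodic boundaries or neighbor lists); not the LAMMPS implementation.
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    cutoff_sq = cutoff * cutoff
    for i in range(n):
        for j in range(i + 1, n):
            # Separation vector pointing from particle j to particle i.
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r_sq = sum(c * c for c in d)
            if r_sq == 0.0 or r_sq > cutoff_sq:
                continue  # skip overlapping or out-of-range pairs
            inv_r6 = (sigma * sigma / r_sq) ** 3
            # Force on i: 24*eps*(2*(sigma/r)^12 - (sigma/r)^6) / r^2 * d
            scale = 24.0 * epsilon * (2.0 * inv_r6 * inv_r6 - inv_r6) / r_sq
            for k in range(3):
                forces[i][k] += scale * d[k]
                forces[j][k] -= scale * d[k]  # Newton's third law
    return forces
```

Two particles placed one sigma apart repel each other, and the force vanishes at a separation of 2^(1/6) sigma, the minimum of the potential. A GPU version would typically assign particles or pairs to separate threads rather than looping serially as this sketch does.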
[Figure: AMBER benchmarks in ns/day for DHFR NVE (23,558 atoms) and Nucleosome (25,095 atoms), comparing 8 CPU cores, Tesla C1060, and Tesla C2050. Data labels in source order: 5.94, 11.93, 20.70; 0.06, 0.52, 1.04. CPU = 2x quad-core Intel Xeon E5462, ifort 10.1, MKL 10.1, MPICH2. GPU = Tesla C1060 / Tesla C2050, gfortran 4.1, nvcc v3.0.]
[Figure: HOOMD on GPUs vs LAMMPS on 32 CPU cores. Time steps/sec for Polymer (64K particles) and Lennard-Jones Liquid (64K particles). Series: LAMMPS on 32 x86 CPU cores, HOOMD-blue on 1 Tesla 10-series GPU, HOOMD-blue on 2 Tesla 10-series GPUs. Data courtesy of University of Michigan.]
[Figure: LAMMPS Results on CPU vs GPU Clusters. Femtoseconds simulated/sec vs number of quad-core CPUs or number of Tesla GPUs. Annotation: 2 GPUs = 24 CPUs. Data courtesy of Scott Hampton & Pratul K. Agarwal, Oak Ridge National Laboratory.]
[Figure: GROMACS results in steps/sec, 2x Intel Quad Core E5462 vs Tesla C1060 GPU, for NaCl (894 atoms), Solvated Spectrin (37.8K atoms), and Spectrin Dimer (75.6K atoms), using Reaction-Field Cutoffs and Particle-Mesh-Ewald (PME). Annotation: 22x faster. Data courtesy of Stockholm Center for Biomembrane Research.]
GPU-BASED MOLECULAR DYNAMICS & QUANTUM CHEMISTRY APPLICATIONS
NAMD
The team at the University of Illinois at Urbana-Champaign (UIUC) has been enabling CUDA acceleration in NAMD since 2007. They have performed scaling experiments on the NCSA Tesla-based Lincoln cluster and demonstrated that 4 Tesla GPUs can outperform a cluster with 16 quad-core CPUs.
NAMD scales very well on a Tesla GPU cluster as demonstrated by these results.
TERACHEM
TeraChem is the first general-purpose quantum chemistry software package designed from the ground up to take advantage of the massively parallel CUDA architecture of NVIDIA GPUs. TeraChem currently supports density functional theory (DFT) and Hartree-Fock (HF) methods.
TeraChem supports multiple GPUs using pthreads and will soon be MPI-enabled. A workstation with 4 Tesla GPUs running TeraChem outperforms 256 quad-core CPUs running GAMESS.
VMD
VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting. Several key kernels and applications in VMD now take advantage of the massively parallel CUDA architecture of NVIDIA’s GPUs. These applications run 20x to 100x faster on an NVIDIA CUDA GPU than on a CPU alone.
[Figure: NAMD Results on CPU vs GPU Clusters. STMV steps/s vs number of quad-core CPUs or number of Tesla GPUs (1, 2, 4, 8, 16). Annotation: 4 GPUs = 16 CPUs. Data courtesy of Theoretical and Computational Biophysics Group, UIUC.]
[Figure: Molecular Orbital Computation in VMD. 10^3 lattice points/sec on Intel Core 2 Q6600 (Quad) 2.4 GHz vs Tesla C1060 GPU, for Thr-a (threonine-17), Thr-b (threonine-17), C60-a (carbon-60), and C60-b (carbon-60). Data courtesy of Theoretical and Computational Biophysics Group, UIUC.]
[Figure: Speedup of TeraChem on GPU vs GAMESS on CPU, 2x Intel Quad Core 5520 CPU vs 4x Tesla C1060 GPU, for Valinomycin, Taxol, Buckyball, and Olestra. Speedup labels in source order: 50x, 11x, 7x, 8x. Data courtesy of Petachem.]
CUDA-BLASTP
CUDA-BLASTP is designed to accelerate NCBI BLAST for scanning protein sequence databases by taking advantage of the massively parallel CUDA architecture of NVIDIA Tesla GPUs. CUDA-BLASTP also includes a utility that converts a FASTA-format database into files readable by CUDA-BLASTP.
CUDA-BLASTP running on a workstation with two Tesla C1060 GPUs is 10x faster than NCBI BLAST (2.2.22) running on an Intel i7-920 CPU. This cuts compute time from minutes on CPUs to seconds using GPUs.
CUDA-EC
CUDA-EC is a fast parallel sequence error correction tool for short reads. It corrects sequencing errors in high-throughput short-read (HTSR) data and accelerates HTSR by taking advantage of the massively parallel CUDA architecture of NVIDIA Tesla GPUs. Error correction is a preprocessing step for many DNA fragment assembly tools and is very useful for the new high-throughput sequencing machines.
CUDA-EC running on one Tesla C1060 GPU is up to 20x faster than Euler-SR running on an x86 CPU. This cuts compute time from minutes on CPUs to seconds using GPUs. The chart below covers error rates of 1%, 2%, and 3%, denoted A1, A2, A3 and so on, for 5 different reference genomes.
CUDA-MEME
CUDA-MEME is motif discovery software based on MEME (version 3.5.4). It accelerates MEME by taking advantage of the massively parallel CUDA architecture of NVIDIA Tesla GPUs. It supports the OOPS and ZOOPS models in MEME.
CUDA-MEME running on one Tesla C1060 GPU is up to 23x faster than MEME running on an x86 CPU. This cuts compute time from hours on CPUs to minutes using GPUs. The data in the chart below are for the OOPS (one occurrence per sequence) model on 4 datasets.
CUDASW++ (SMITH-WATERMAN)
CUDASW++ is bioinformatics software for Smith-Waterman protein database searches. It takes advantage of the massively parallel CUDA architecture of NVIDIA Tesla GPUs to perform sequence searches 10x to 50x faster than NCBI BLAST. CUDASW++ supports query lengths up to 59K.
CUDASW++ can take advantage of multiple Tesla GPUs, reaching up to 30 GCUPS (giga cell updates per second) on query lengths of over 5000. Whereas BLAST is a heuristic algorithm, CUDASW++ performs optimal local alignment using the Smith-Waterman algorithm.
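As a rough illustration of what optimal local alignment and the GCUPS metric mean, here is a minimal Python sketch of Smith-Waterman scoring with a linear gap penalty. It is not the CUDASW++ implementation, which uses substitution matrices such as BLOSUM and runs on the GPU; the function names and the match, mismatch, and gap values are assumptions chosen for illustration.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between two sequences.

    Illustrative sketch with a simple match/mismatch score and a linear
    gap penalty (assumed values); it keeps only one previous row of the
    dynamic-programming matrix.
    """
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            # A cell never drops below zero, which makes the alignment local.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def gcups(query_len, database_residues, seconds):
    """Billions of dynamic-programming cell updates per second."""
    return query_len * database_residues / seconds / 1e9
```

Every cell of the query-by-database matrix is computed, which is why throughput is reported in cell updates per second: under this definition, a 1,000-residue query searched against 1,000,000 database residues in one second corresponds to 1 GCUPS.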
[Figure: CUDA-BLASTP vs NCBI BLASTP Speedups. Speedup vs query sequence length (127, 254, 517, 1054, 2026). Series: NCBI BLASTP with 1 thread on Intel i7-920 CPU, NCBI BLASTP with 4 threads on Intel i7-920 CPU, CUDA-BLASTP on 1 Tesla C1060 GPU, CUDA-BLASTP on 2 Tesla C1060 GPUs. Time labels in source order: 3.6 sec, 3.6 sec, 8.5 sec, 22 sec, 21 sec. Data courtesy of Nanyang Technological University, Singapore.]
[Figure: CUDA-MEME vs MEME Speedups. Speedup on Tesla C1060 GPU vs AMD Opteron 2.2 GHz for the datasets Mini-drosoph (4 sequences), HS_100 (100 sequences), HS_200 (200 sequences), and HS_400. Time labels in source order: 22 mins, 15 mins, 57 mins, 4 hours. Data courtesy of Nanyang Technological University, Singapore.]
[Figure: CUDASW++ vs NCBI BLAST. GCUPS vs query sequence length (144 to 5478). Series: BLAST on CPU with BLOSUM50 (10-3k), BLAST on CPU with BLOSUM62 (10-2k), CUDASW++ 2.0 on 1 Tesla C1060 GPU with BLOSUM45, CUDASW++ 2.0 on 2 Tesla C1060 GPUs with BLOSUM45. Data courtesy of Nanyang Technological University, Singapore.]
[Figure: CUDA-EC vs Euler-SR Speedups. Speedup of CUDA-EC on Tesla C1060 over Euler-SR on AMD Opteron 2.2 GHz; read length = 35; error rates of 1%, 2%, 3%. Datasets A1-A3, B1-B3, C1-C3, D1-D2, E for the reference genomes S. cer 5 (0.58M), S. cer 7 (1.1M), H. inf (1.9M), E. col (4.7M), S. aures (4.7M). Data courtesy of Nanyang Technological University, Singapore.]
GPU-BASED BIO-INFORMATICS APPLICATIONS
To learn more about the NVIDIA Tesla Bio Workbench, go to www.nvidia.com/bio_workbench.
© 2010 NVIDIA Corporation. All rights reserved. NVIDIA, the NVIDIA logo, NVIDIA Tesla, and CUDA, are trademarks and/or registered trademarks of NVIDIA Corporation. All company and product names are trademarks or registered trademarks of the respective owners with which they are associated.
GPU-HMMER
GPU-HMMER is a bioinformatics software that does protein sequence alignment using profile HMMs by taking advantage of the massively parallel CUDA architecture of NVIDIA Tesla GPUs. GPU-HMMER is 60-100x faster than HMMER (2.0).
GPU-HMMER accelerates the hmmsearch tool using GPUs and gets speed-ups ranging from 60-100x. GPU-HMMER can take advantage of multiple Tesla GPUs in a workstation to reduce the search from hours on a CPU to minutes using a GPU.
MUMMERGPU
MUMmerGPU is a bio-informatics software for high-throughput sequence alignment using GPUs. It accelerates the alignment of multiple query sequences against a single reference sequence by taking advantage of the massively parallel CUDA architecture of NVIDIA Tesla GPUs.
MUMmerGPU (version 2.0) provides 3x-4x speedup compared to running MUMmer on CPUs. The data shown in the chart below is based on MUMmerGPU version 1.0 and will soon be updated.
[Figure: GPU-HMMER speedup over HMMER (60-100x faster) vs number of states (HMM size: 789, 1431, 2295). Series: CPU (Opteron 2376 2.3 GHz), 1 Tesla C1060 GPU, 2 Tesla C1060 GPUs, 3 Tesla C1060 GPUs. Time labels in source order: 27, 37, 69, 28, 8, 11, 11, 15, 21, 40, 25, 21 mins. Data courtesy of Scalable Informatics and University of Buffalo.]
[Figure: MUMmerGPU vs MUMmer. Time (minutes) for S. suis, L. monocytogenes, and C. briggsae on 1 Tesla GPU (8-Series) vs Intel Xeon 5120 (3.0 GHz). Time labels in source order: 3, 11, 6, 22, 10, 38 mins. Data courtesy of University of Maryland.]