ENABLING NEW SCIENCE TESLA™ BIO WOrkBENChSecure Site ...Thr-a (threonine-17) Thr-b (threonine-17)...

5
The NVIDIA ® Tesla™ Bio Workbench enables biophysicists and computational chemists to push the boundaries of life sciences research. It turns a standard PC into a “computational laboratory” capable of running complex bioscience codes, in fields such as drug discovery and DNA sequencing, more than 10-20 times faster through the use of NVIDIA Tesla GPUs. It consists of bioscience applications; a community site for downloading, discussing, and viewing the results of these applications; and GPU-based platforms that enable these applications to run at 1/10 th the cost of CPU-only computers. Complex molecular simulations that had been only possible using supercomputing resources can now be run on an individual workstation, optimizing the scientific workflow and accelerating the pace of research. These simulations can also be scaled up to GPU-based clusters of servers to simulate large molecules and systems that would have otherwise required a supercomputer. Applications that are accelerated on GPUs include: • Molecular Dynamics & Quantum Chemistry - AMBER, GROMACS, HOOMD, LAMMPS, NAMD, TeraChem (Quantum Chemistry), VMD • Bio Informatics - CUDA-BLASTP, CUDA-EC, CUDA-MEME, CUDASW++ (Smith-Waterman), GPU-HMMER, MUMmerGPU For more information, visit www.nvidia.com/bio_workbench GPU SOLUTIONS The Tesla Bio Workbench applications can be deployed on GPU-based desktop personal supercomputers or in data center solutions. Built on the revolutionary, massively parallel CUDA™ GPU computing architecture, these solutions are designed to accelerate the pace of computational science and are available today. Visit www.nvidia.com/tesla to find out more about: WORKSTATION SOLUTIONS: TESLA PERSONAL SUPERCOMPUTER For personal supercomputing at your desk DATA CENTER SOLUTIONS: TESLA GPU COMPUTING CLUSTERS For computing with large-scale installations ENABLING NEW SCIENCE TESLA™ BIO WOrkBENCh

Transcript of ENABLING NEW SCIENCE TESLA™ BIO WOrkBENChSecure Site ...Thr-a (threonine-17) Thr-b (threonine-17)...

Page 1: ENABLING NEW SCIENCE TESLA™ BIO WOrkBENChSecure Site ...Thr-a (threonine-17) Thr-b (threonine-17) C60-a (carbon-60) C60-b (carbon-60) 10 3 Lattice points/sec Data courtesy of Petachem

The NVIDIA® Tesla™ Bio Workbench enables biophysicists and computational chemists to push the boundaries of life sciences research. It turns a standard PC into a “computational laboratory” capable of running complex bioscience codes, in fields such as drug discovery and DNA sequencing, more than 10-20 times faster through the use of NVIDIA Tesla GPUs.

It consists of bioscience applications; a community site for downloading, discussing, and viewing the results of these applications; and GPU-based platforms that enable these applications to run at 1/10th the cost of CPU-only computers.

Complex molecular simulations that had been only possible using supercomputing resources can now be run on an individual workstation, optimizing the scientific workflow and accelerating the pace of research. These simulations can also be scaled up to GPU-based clusters of servers to simulate large molecules and systems that would have otherwise required a supercomputer.

Applications that are accelerated on GPUs include:

• Molecular Dynamics & Quantum Chemistry

- AMBER, GROMACS, HOOMD, LAMMPS, NAMD, TeraChem (Quantum Chemistry), VMD

• Bio Informatics

- CUDA-BLASTP, CUDA-EC, CUDA-MEME, CUDASW++ (Smith-Waterman), GPU-HMMER, MUMmerGPU

For more information, visit www.nvidia.com/bio_workbench

GPU SOLUTIONS

The Tesla Bio Workbench applications can be deployed on GPU-based desktop personal supercomputers or in data center solutions. Built on the revolutionary, massively parallel CUDA™ GPU computing architecture, these solutions are designed to accelerate the pace of computational science and are available today. Visit www.nvidia.com/tesla to find out more about:

WORKSTATION SOLUTIONS:

TESLA PERSONAL SUPERCOMPUTER For personal supercomputing at your desk

DATA CENTER SOLUTIONS:

TESLA GPU COMPUTING CLUSTERS For computing with large-scale installations

ENABLING NEW SCIENCETESLA™ BIO WOrkBENCh

Page 2: ENABLING NEW SCIENCE TESLA™ BIO WOrkBENChSecure Site ...Thr-a (threonine-17) Thr-b (threonine-17) C60-a (carbon-60) C60-b (carbon-60) 10 3 Lattice points/sec Data courtesy of Petachem

AMBEr

The explicit solvent and implicit solvent simulations in AMBER have been accelerated using CUDA-enabled GPUs. Paired with a Tesla GPU computing solution built on the CUDA architecture, it enables more than 10x speedup compared to a single quad-core CPU.

Benchmark data for explicit solvent PME and implicit solvent GB Benchmarks. Details can be found on the AMBER 11 NVIDIA benchmark page courtesy of the San Diego Supercomputing Center.

GrOMACS

GROMACS is a molecular dynamics package designed primarily for simulation of biochemical molecules like proteins, lipids, and nucleic acids that have a lot complicated bonded interactions. The CUDA port of GROMACS enabling GPU acceleration is now available in beta and supports Particle-Mesh-Ewald (PME), arbitrary forms of non-bonded interactions, and implicit solvent Generalized Born methods.

The CUDA version of GROMACS currently supports a single GPU and produces the following results:

hOOMD

HOOMD-blue is a general purpose particle dynamics package written from ground-up to take advantage of the revolutionary CUDA architecture in NVIDIA GPUs. It contains a number of different force fields and integrators and its object oriented design enables adding additional ones easily.

Detailed benchmarks are available on the HOOMD-blue page. One Tesla GPU running HOOMD-blue can outperform 32 CPU cores.

LAMMPS

LAMMPS is a classical molecular dynamics package written to run well on parallel machines. The CUDA version of LAMMPS is accelerated by moving the force calculations to the GPU.

LAMMPS on the GPU scales very well as seen by the results below. Two Tesla GPUs running GPU-LAMMPS outperforms 24 CPUs.

0

5.94

11.93

ns/d

ay

8 CPU coresTesla C1060Tesla C2050

25

DHFR NVE23,558 atoms

Nucleosome25,095 atoms 1GB-1

20.70

0

1.2

0.06

0.52

ns/d

ay

1.04

20

15

10

5

1

0.8

0.6

0.4

0.2

CPU = 2x Quad-core Intel Xeon E 5462, ifort 10.1, MKL 10.1, MPICHZGPU = Tesla C1060 / Tesla C2050, Gfortran 4.1, nvcc v3.0

LAMMS on 32 x 86 CPU coresHOOMD-BLUE on 1 Tesla 10-series GPUHOOMD-BLUE on 2 Tesla 10-series GPUs

Tim

e st

eps/

sec

Polymer(64K particles)

Lennard-Jones Liquid(64K particles)

Data courtesy of University of Michigan

HOOMD on GPUs vs LAMMPS on 32 CPU cores

300

250

200

150

100

50

450

400

350

0

2 GPUs

1 GPU

32 CPUcores

2 GPUs

1 GPU

32 CPUcores

Fem

tose

c si

mul

ated

/sec

Number of Quad-Core CPUsor Number of Tesla GPUs

2 GPUs = 24 CPUs

Data courtesy of Scott Hampton & Pratul K. Agarwal, Oak Ridge National Laboratory

LAMMPS Results on CPU vs GPU Clusters

24 CPUs

2 GPUs

01

140

160

180

120

100

80

60

20

8 24

40

2x Intel Quad Core E5462Tesla C1060 GPU

12.0

10.0

8.0

6.0

4.0

2.0

0.0

step

s/se

c 22xFaster

0

7000

step

s/se

c

6000

5000

4000

3000

2000

1000

NaCl (894 atoms)Solvated Spectrin(37.8K atoms)

Spectrin Dimer(75.6K atoms)

Data courtesy of Stockholm Center for Biomembrane Research

Reaction-Field CutoffsParticle-Mesh-Ewald (PME)

GPU-BASED MOLECULAR DYNAMICS & QUANTUM CHEMISTRY APPLICATIONS

Page 3: ENABLING NEW SCIENCE TESLA™ BIO WOrkBENChSecure Site ...Thr-a (threonine-17) Thr-b (threonine-17) C60-a (carbon-60) C60-b (carbon-60) 10 3 Lattice points/sec Data courtesy of Petachem

NAMD

The Team at University of Illinois at Urbana-Champaign (UIUC) has been enabling CUDA-acceleration on NAMD since 2007. They have performed scaling experiments on the NCSA Tesla-based Lincoln cluster and demonstrated that 4 Tesla GPUs can outperform a cluster with 16 quad-core CPUs.

NAMD scales very well on a Tesla GPU cluster as demonstrated by these results.

TErAChEM

TeraChem is the first general purpose quantum chemistry software package designed ground up specifically to take advantage of the massively parallel CUDA architecture of NVIDIA GPUs. TeraChem currently supports density functional theory (DFT) and Hartree-Fock (HF) methods.

TeraChem supports multiple GPUs using p-threads and will soon be MPI enabled. A workstation with 4 Tesla GPUs running TeraChem outperforms 256 quad-core CPUs running GAMESS.

VMD

VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting. Several key kernels and applications in VMD now take advantage of the massively parallel CUDA architecture of NVIDIA’s GPUs. These applications run 20x to 100x faster when using a NVIDIA CUDA GPU compared to running them on a CPU only.

Number of Quad-Core CPUsor Number of Tesla GPUs

Data courtesy of Theoretical and Computation Bio-physics Group, UIUC

NAMD Results on CPU vs GPU Clusters

0

7

8

9

6

5

4

3

1

2

4 GPUs = 16 CPUs

STM

V St

ep/s

1 2 4 8 16Data courtesy of Theoretical and Computational Bio-physics Group, UIUC

Molecular Orbital Computate in VMD

Intel Core 2 Q6600 (Quad) 2.4 GHzTesla C1060 GPU

50000

30000

40000

10000

0

20000

70000

60000

Thr-a(threonine-17)

Thr-b(threonine-17)

C60-a(carbon-60)

C60-b(carbon-60)

103 L

attic

e po

ints

/sec

Data courtesy of Petachem

Speedup of TeraChem on GPU vs GAMESS on CPU

Vallnomyclin

Taxol

Buckyball

Olestra50x

11x

7x

8x

0 10 20 30 40 50 60 70

2x Intel Quad Core 5520 CPU4x Tesla C1060 GPU

GPU-BASED MOLECULAR DYNAMICS & QUANTUM CHEMISTRY APPLICATIONS

Page 4: ENABLING NEW SCIENCE TESLA™ BIO WOrkBENChSecure Site ...Thr-a (threonine-17) Thr-b (threonine-17) C60-a (carbon-60) C60-b (carbon-60) 10 3 Lattice points/sec Data courtesy of Petachem

CUDA-BLASTP

CUDA-BLASTP is designed to accelerate NCBI BLAST for scanning protein sequence databases by taking advantage of the massively parallel CUDA architecture of NVIDIA Tesla GPUs. CUDA-BLASTP also has a utility to convert FASTA format database into files readable by CUDA-BLASTP.

CUDA-BLASTP running on a workstation with two Tesla C1060 GPUs is 10x faster than NCBI BLAST (2.2.22) running on an Intel i7-920 CPU. This cuts compute time from minutes on CPUs to seconds using GPUs.

CUDA-EC

CUDA-EC is a fast parallel sequence error correction tool for short reads. It corrects sequencing errors in high-throughput short-read (HTSR) data and accelerates HTSR by taking advantage of the massively parallel CUDA architecture of NVIDIA Tesla GPUs. Error correction is a preprocessing step for many DNA fragment assembly tools and is very useful for the new high-throughput sequencing machines.

CUDA-EC running on one Tesla C1060 GPU is up to 20x faster than Euler-SR running on a x86 CPU. This cuts compute time from minutes on CPUs to seconds using GPUs. The data in the chart below are error rates of 1%, 2%, and 3% denoted by A1, A2, A3 and so on for 5 different reference genomes.

CUDA-MEME

CUDA-MEME is a motif discovery software based on MEME (version 3.5.4). It accelerates MEME by taking advantage of the massively parallel CUDA architecture of NVIDIA Tesla GPUs. It supports the OOPS and ZOOPS models in MEME.

CUDA-MEME running on one Tesla C1060 GPU is up to 23x faster than MEME running on a x86 CPU. This cuts compute time from hours on CPUs to minutes using GPUs. The data in the chart below are for the OOPS (one occurrence per sequence) model for 4 datasets.

CUDASW++ (SMITh-WATErMAN)

CUDASW++ is a bio-informatics software for Smith-Waterman protein database searches that takes advantage of the massively parallel CUDA architecture of NVIDIA Tesla GPUs to perform sequence searches 10x-50x faster than NCBI BLAST. CUDASW++ supports query lengths up to 59K.

CUDASW++ can take advantage of multiple Tesla GPUs and runs between 10x-50x faster than NCBI BLAST, getting up to 30 GCUPs on query lengths of over 5000. Whereas BLAST is a heuristic algorithm, CUDASW++ does optimal local alignment using the Smith-Waterman algorithm.

Data courtesy of Nanyang Technological University, Singapore

NCBI BLASTP: 1 Thread Intel i7-920 CPU NCBI BLASTP: 4 Threads Intel i7-920 CPU CUDA BLASTP: 1 Tesla C1060 GPUCUDA BLASTP: 2 Tesla C1060 GPUs

Query Sequence Length

Spee

dup

12

10

8

6

4

2

127 254 517 1054 20260

CUDA-BLASTP vs NCBI BLASTP Speedups

3.6 sec 3.6 sec 8.5 sec

22 sec

21 sec

Data courtesy of Nanyang Technological University, Singapore

AMD Opteron 2.2 GHzTesla C1060 GPU

Spee

dup

25

20

15

10

Mini-drosoph4 sequences

HS_100100 sequences

HS_200200 sequences

HS_4004 sequences

5

0

CUDA-MEME vs MEME Speedups

22 mins

15 mins 57 mins 4 hours

0

35

30

25

20

15

10

GC

UP

S

CUDASW++ vs NCBI BLAST

Data courtesy of Nanyang Technological University, Singapore

Query Sequence Length

2 GPUs:BLOSUM45

CUDASW++ 2.0

BLAST on CPU:BLOSUM62

BLAST on CPU:BLOSUM50

1 GPU:BLOSUM45

5

BLAST BLOSUM50 10-3kBLAST BLOSUM62 10-2k1 Tesla C1060 GPU (CUDASW++)2 Tesla C1060 GPUs (CUDASW++)

144

189

222

375

464

567

657

729

850

1000

1500

2005

2504

3005

3564

4061

4548

4743

5147

5478

Data courtesy of Nanyang Technological University, Singapore

Euler-SR: AMD Opteron 2.2 GHzCUDA-EC: Tesla C1060

Read Length =35 Error rates of 1%, 2%, 3%

Spee

dup

25

20

15

10

A1

A2

A3

B1

B2

B3

C1

C2

C3

D1

D2 E

S. cer 50.58M

S. cer 71.1M

H. inf1.9M

E. col4.7M

S. aures4.7M

0

CUDA-EC vs Euler-SR Speedups

5

GPU-BASED BIO-INFORMATICS APPLICATIONS

Page 5: ENABLING NEW SCIENCE TESLA™ BIO WOrkBENChSecure Site ...Thr-a (threonine-17) Thr-b (threonine-17) C60-a (carbon-60) C60-b (carbon-60) 10 3 Lattice points/sec Data courtesy of Petachem

To learn more about the NVIDIA Tesla Bio Workbench, go to www.nvidia.com/bio_workbench.

© 2010 NVIDIA Corporation. All rights reserved. NVIDIA, the NVIDIA logo, NVIDIA Tesla, and CUDA, are trademarks and/or registered trademarks of NVIDIA Corporation. All company and product names are trademarks or registered trademarks of the respective owners with which they are associated.

Partner Logo

GPU-hMMEr

GPU-HMMER is a bioinformatics software that does protein sequence alignment using profile HMMs by taking advantage of the massively parallel CUDA architecture of NVIDIA Tesla GPUs. GPU-HMMER is 60-100x faster than HMMER (2.0).

GPU-HMMER accelerates the hmmsearch tool using GPUs and gets speed-ups ranging from 60-100x. GPU-HMMER can take advantage of multiple Tesla GPUs in a workstation to reduce the search from hours on a CPU to minutes using a GPU.

MUMMErGPU

MUMmerGPU is a bio-informatics software for high-throughput sequence alignment using GPUs. It accelerates the alignment of multiple query sequences against a single reference sequence by taking advantage of the massively parallel CUDA architecture of NVIDIA Tesla GPUs.

MUMmerGPU (version 2.0) provides 3x-4x speedup compared to running MUMmer on CPUs. The data shown in the chart below is based on MUMmerGPU version 1.0 and will soon be updated.

HMMER: 60-100x faster

Opteron 2376 2.3 GHz

1 Tesla C1060 GPU

2 Tesla C1060 GPUs

3 Tesla C1060 GPUs0

789

120

100

80

60

40

Data courtesy of Scalable Informatics and University of Buffalo

# of States (HMM Size)

Spee

dup

20

1431 2295

27 mins 3 GPUs

2 GPUs

1 GPUs

CPU

37 mins

69 mins

28 mins

8 mins

11 mins

11 mins

15 mins

21 mins

40 mins

25 mins

21 mins

Data courtesy of University of Maryland

Time (minutes)

S. suis

L. monocytogenes

C. briggsae

MUMmerGPU vs MUMmer

1 Tesla GPU (8-Series)Intel Xeon 5120 (3.0 GHz)

3 mins

11 mins

6 mins

22 mins

10 mins

38 mins

10 20 300 40