Page 1: Achieving Strong Scaling On Blue Gene/L: Case Study with NAMD

Sameer Kumar, Gheorghe Almasi, Blue Gene System Software, IBM T J Watson Research Center, Yorktown Heights, NY, {sameerk,gheorghe}@us.ibm.com

L. V. Kale, Chao Huang, Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, {kale,chuang10}@uiuc.edu

Page 2: Outline

Background and motivation

NAMD and Charm++

Blue Gene optimizations

Performance results

Summary

Page 3: Blue Gene/L

Slow embedded core at a clock speed of 700 MHz

– 32 KB L1 cache

– L2 is a small prefetch buffer

– 4MB Embedded DRAM L3 cache

3D Torus interconnect

– Each node is connected to six torus links, each with a throughput of 175 MB/s

System optimized for massive scaling and power efficiency

Page 4: Blue Gene/L

Chip: 2 processors, 2.8/5.6 GF/s, 4 MB
Compute Card: 2 chips (1x2x1), 5.6/11.2 GF/s, 1.0 GB
Node Card: 16 compute cards, 0-2 IO cards (32 chips, 4x4x2), 90/180 GF/s, 16 GB
Rack: 32 node cards, 2.8/5.6 TF/s, 512 GB
System: 64 racks (64x32x32), 180/360 TF/s, 32 TB

Has this slide been presented 65536 times?

Page 5: Can we scale on Blue Gene/L?

Several applications have demonstrated weak scaling

NAMD was one of the first applications to achieve strong scaling on Blue Gene/L

Page 6: NAMD and Charm++

Page 7: NAMD: A Production MD program

Fully featured program from University of Illinois

NIH-funded development

Distributed free of charge (thousands of downloads so far)

Binaries and source code

Installed at NSF centers

User training and support

Large published simulations (e.g., aquaporin simulation featured in keynote)

Page 8: NAMD Benchmarks

BPTI: 3K atoms

Estrogen Receptor: 36K atoms (1996)

ATP Synthase: 327K atoms (2001)

A recent NSF peta-scale proposal presents a 100-million-atom system

Page 9: Molecular Dynamics in NAMD

Collection of [charged] atoms, with bonds

– Newtonian mechanics

– Thousands to even a million atoms

At each time-step

– Calculate forces on each atom

• Bonds
• Non-bonded: electrostatic and van der Waals
  – Short-distance: every timestep
  – Long-distance: using PME (3D FFT)
  – Multiple time stepping: PME every 4 timesteps

– Calculate velocities and advance positions

Challenge: femtosecond time-steps, millions of them needed!
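To make the per-timestep work concrete, here is a minimal C++ sketch of the loop described above; the force-routine names (computeBondedForces, computeShortRange, computePME) are illustrative placeholders, not NAMD's actual functions.

struct Atom { double x[3], v[3], f[3], mass; };

// Assumed helper routines (placeholders, not NAMD's real code paths)
void computeBondedForces(Atom* a, int n);
void computeShortRange(Atom* a, int n);    // cutoff electrostatics + van der Waals
void computePME(Atom* a, int n);           // long-range, via 3D FFT

void timestep(Atom* atoms, int n, double dt, int step) {
  computeBondedForces(atoms, n);             // every timestep
  computeShortRange(atoms, n);               // every timestep
  if (step % 4 == 0) computePME(atoms, n);   // multiple time stepping: every 4 steps
  for (int i = 0; i < n; ++i)                // advance velocities, then positions
    for (int d = 0; d < 3; ++d) {
      atoms[i].v[d] += dt * atoms[i].f[d] / atoms[i].mass;
      atoms[i].x[d] += dt * atoms[i].v[d];
    }
}

With a ~1 fs timestep, a microsecond of simulated time requires on the order of a billion such iterations, which is why per-step time must be driven into the millisecond range.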

Page 10: Spatial Decomposition

• Atoms distributed to cubes based on their location

• Size of each cube: just a bit larger than the cut-off radius (see the sketch below)

• Computation performed by movable computes

• Communication-to-computation ratio: O(1)

• However: load imbalance

• Easily scales to about 8 times the number of patches

(Figure: cells, cubes or "patches", with movable computes; typically 13 computes per patch)
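A minimal sketch of the cube assignment: with the cube side chosen at least as large as the cutoff, every atom pair within the cutoff falls in the same patch or in adjacent patches. Names here are illustrative, not NAMD's.

struct PatchIndex { int ix, iy, iz; };

// (ox,oy,oz) is the origin of the simulation box; side >= cutoff radius
PatchIndex patchOf(double x, double y, double z,
                   double ox, double oy, double oz, double side) {
  PatchIndex p;
  p.ix = (int)((x - ox) / side);
  p.iy = (int)((y - oy) / side);
  p.iz = (int)((z - oz) / side);
  return p;
}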

Page 11: NAMD Computation

Application data divided into data objects called patches

– Sub-grids determined by cutoff

Computation performed by migratable computes

– 13 computes per patch pair, and hence much more parallelism

– Computes can be further split to increase parallelism

Page 12: Charm++ and Converse

Charm++: Application mapped to Virtual Processors (VPs)

– Runtime maps VPs to physical processors

Converse: communication layer for Charm++

– Send, recv, progress, at the node level

(Figure: user view of many communicating objects vs. the system implementation, where objects interact through send and receive message queues, a scheduler, and the network interface)
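As a flavor of the programming model, here is a minimal Charm++ sketch (a toy example, not NAMD code): a chare array whose elements the runtime is free to place and migrate, with asynchronous entry-method invocation.

// toy.ci (interface file):
//   mainmodule toy {
//     mainchare Main { entry Main(CkArgMsg*); };
//     array [1D] Patch { entry Patch(); entry void work(int step); };
//   };

#include "toy.decl.h"

class Main : public CBase_Main {
public:
  Main(CkArgMsg* m) {
    delete m;
    CProxy_Patch patches = CProxy_Patch::ckNew(108);  // 108 virtual processors
    patches.work(0);                                  // asynchronous broadcast
  }
};

class Patch : public CBase_Patch {
public:
  Patch() {}
  Patch(CkMigrateMessage*) {}                // required to support migration
  void work(int step) {
    CkPrintf("VP %d running on PE %d\n", thisIndex, CkMyPe());
    if (thisIndex == 0) CkExit();
  }
};

#include "toy.def.h"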

Page 13: NAMD Parallelization using Charm++

(Figure: patch and compute objects, labeled 108 VPs, 847 VPs, and 100,000 VPs)

These 100,000+ Virtual Processors (VPs) are mapped to real processors by the Charm++ runtime system

Page 14: Optimizing NAMD on Blue Gene/L

Page 15: The Apolipoprotein A1

92,000 atoms

Benchmark for testing NAMD performance on various architectures

Page 16: F1 ATP Synthase

327K atoms

Can we run it on Blue Gene/L in virtual node mode?

Page 17: Lysozyme in 8M Urea Solution (~40,000 atoms total)

Solvated in a 72.8 Å x 72.8 Å x 72.8 Å box

Lysozyme: 129 residues, 1934 atoms

Urea: 1811 molecules

Water: 7799 molecules

Water/Urea ratio: 4.31

Red: protein, Blue: urea; CPK: water

Ruhong Zhou, Maria Eleftheriou, Ajay Royyuru, Bruce Berne

Page 18: H5N1 Virus Hemagglutinin Binding

Page 19: HA Binding Simulation Setup

Homotrimer, each monomer with 2 subunits (HA1 & HA2)

Protein: 1491 residues, and 23400 atoms

3 Sialic acids, 6 NAGs (N-acetyl-D-Glucosamine)

Solvated in a 91 Å x 94 Å x 156 Å water box, with 35,863 water molecules in total

30 Na+ ions to neutralize the system

Total ~131,000 atoms

PME for long-range electrostatic interactions

NPT simulation at 300K and 1atm

Page 20: APoA1 step time with PME in Co-Processor Mode

(Plot: step time in ms, log scale from 1 to 1000, vs. processor count from 32 to 8192, comparing IA64-Myrinet and BGL; NAMD 2.5 as of May 2005)

Initial serial time: 17.6 s

Page 21: Parallel MD: Easy or Hard?

Easy

– Tiny working data

– Spatial locality

– Uniform atom density

– Persistent repetition

Hard

– Sequential timesteps

– Very short iteration time

– Full electrostatics

– Fixed problem size

– Dynamic variations

Page 22: NAMD on BGL

Advantages

– Both application and hardware are 3D grids

– Large 4MB L3 cache

– Higher bandwidth for short messages

– Six outgoing links from each node

– Static TLB

– No OS Daemons

Disadvantages

– Slow embedded CPU

– Small memory per node

– Low bisection bandwidth

– Hard to scale full electrostatics

– Hard to overlap communication with computation

Page 23: Single Processor Performance

Inner loops

– Better software pipelining

– Aliasing issues resolved through the use of #pragma disjoint (*ptr1, *ptr2) (see the sketch after this list)

– Cache optimizations

– 440d used to access more registers

– Serial time down from 17.6s (May 2005) to 7s

– Inner-loop iteration time down from 80 cycles to 32 cycles

– Full 440d optimization would require converting some data structures from 24 to 32 bytes
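A sketch of the aliasing fix mentioned above: #pragma disjoint is an IBM XL C/C++ assertion that two pointers never refer to the same storage, which lets the compiler keep values in registers and software-pipeline the loop. The arrays here are illustrative.

static double *forces, *coords;        // illustrative arrays
#pragma disjoint(*forces, *coords)     // promise: no aliasing between them

void scaleAdd(int n, double s) {
  for (int i = 0; i < n; ++i)
    forces[i] += s * coords[i];        // coords[i] need not be reloaded
                                       // after the store to forces[i]
}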

Page 24: Memory Performance

Memory overhead was high due to many short memory allocations

– Group short memory allocations into larger buffers (see the sketch after this list)

– We can now run the ATPase system in virtual node mode

Other sources of memory pressure

– Parts of atom structure duplicated on all processors

– Other duplication to support external clients like TCL and VMD

– These issues still need to be addressed
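A minimal sketch of grouping many short allocations into larger buffers (a simple arena; this assumes individual objects are never freed separately and that each request fits in one block):

#include <cstddef>
#include <vector>

class Arena {
  std::vector<char*> blocks_;
  std::size_t used_ = 0, cap_ = 0;
  static const std::size_t kBlock = 64 * 1024;    // one large allocation
public:
  void* alloc(std::size_t n) {                    // assumes n <= kBlock
    n = (n + 7) & ~std::size_t(7);                // keep 8-byte alignment
    if (used_ + n > cap_) {                       // current block exhausted
      blocks_.push_back(new char[kBlock]);
      used_ = 0; cap_ = kBlock;
    }
    void* p = blocks_.back() + used_;
    used_ += n;
    return p;
  }
  ~Arena() { for (char* b : blocks_) delete[] b; }  // free everything at once
};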

Page 25: BGL Parallelization

Topology driven problem mapping

– Blue Gene has a 3D torus network

– Near neighbor communication has better performance

Load-balancing schemes

– Choice of correct grain size

Communication optimizations

– Overlap of computation and communication

– Messaging performance

Page 26: Problem Mapping

(Figure: application data space (X, Y, Z) alongside the processor grid (X, Y, Z))

Page 27: Problem Mapping

(Figure: the application data space is partitioned and mapped onto the processor grid)

Page 28: Problem Mapping

(Figure: application data space (X, Y, Z) mapped onto the processor grid with axes reordered as (Y, X, Z))

Page 29: Problem Mapping

(Figure: data objects and cutoff-driven compute objects placed on the processor grid)
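A sketch of the idea behind topology-driven mapping (illustrative, not NAMD's actual load balancer): scale a patch's grid index onto the torus dimensions so that neighboring patches land on nearby nodes, keeping multicast traffic local.

struct Coord { int x, y, z; };

Coord mapPatchToTorus(Coord patch, Coord patchGrid, Coord torus) {
  Coord p;
  p.x = patch.x * torus.x / patchGrid.x;   // proportional placement per axis
  p.y = patch.y * torus.y / patchGrid.y;
  p.z = patch.z * torus.z / patchGrid.z;
  return p;  // e.g. rank = p.x + torus.x * (p.y + torus.y * p.z)
}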

Page 30: Improving Grain Size: Two Away Computation

Patches based on cutoff are too coarse on BGL

Each patch can be split along a dimension

– Patches now interact with neighbors of neighbors

– Makes application more fine grained

• Improves load balancing

– Messages of smaller size sent to more processors

• Improves torus bandwidth

Page 31: Two Away X

Page 32: Load Balancing Steps

Regular Timesteps

Instrumented Timesteps

Detailed, aggressive Load Balancing

Refinement Load Balancing

Page 33: Load-balancing Metrics

Balancing load

Minimizing communication hop-bytes

– Place computes close to patches (a hop-bytes sketch follows this list)

Minimizing number of proxies

– Affects the connectivity of each patch object
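Hop-bytes can be computed directly: each message's size is weighted by the minimal torus distance it travels. A small sketch with illustrative names:

int torusHops(int a, int b, int dim) {       // shortest distance with wraparound
  int d = a > b ? a - b : b - a;
  return d < dim - d ? d : dim - d;
}

long hopBytes(int sx, int sy, int sz,        // source node coordinates
              int dx, int dy, int dz,        // destination node coordinates
              int X, int Y, int Z,           // torus dimensions
              long bytes) {                  // message size
  int hops = torusHops(sx, dx, X) + torusHops(sy, dy, Y) + torusHops(sz, dz, Z);
  return (long)hops * bytes;
}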

Page 34: Communication in NAMD

Three major communication phases

– Coordinate multicast

• Heavy communication

– Force reduction

• Messages trickle in

– PME

• Long-range calculations, which require FFTs and all-to-all exchanges

Page 35: Optimizing communication

Overlap of communication with computation

New messaging protocols

– Adaptive eager

– Active put

FIFO mapping schemes

Page 36: Overlap of Computation and Communication

Each FIFO has 4 packet buffers

The progress engine should be called about every 4000 cycles

Progress overhead is about 200 cycles

– About a 5% increase in computation (200 of every 4000 cycles)

Remaining time can be used for computation

Page 37: Network Progress Calls

NAMD makes progress engine calls from the compute loops

– Typical frequency is 10,000 cycles, dynamically tunable

for (i = 0; i < (i_upper SELF(-1)); ++i) {
  CmiNetworkProgress();
  const CompAtom &p_i = p_0[i];
  // ... compute pairlists ...
  for (k = 0; k < npairi; ++k) {
    // compute forces
  }
}

void CmiNetworkProgress() {
  new_time = rts_get_timebase();
  if (new_time < lastProgress + PERIOD) {
    return;  // too soon: skip, but keep lastProgress unchanged so that
             // progress still fires once PERIOD cycles have elapsed
  }
  lastProgress = new_time;
  AdvanceCommunication();  // pump the network FIFOs
}

Page 38: Charm++ Runtime Scalability

Charm++ MPI driver

– Iprobe-based implementation

– Higher progress overhead of MPI_Test

– Statically pinned FIFOs for point-to-point communication

BGX message layer (developed in collaboration with George Almasi)

– Lower progress overhead makes overlap feasible

– Active messages: easy to design complex communication protocols

– Charm++ BGX driver was developed by Chao Huang last summer

– Dynamic FIFO mapping

Page 39: Better Message Performance: Adaptive Eager

Messages are sent without a rendezvous but with adaptive routing

Impressive performance results for messages in the 1KB-32KB range

Good performance for small non-blocking all-to-all operations like PME

Can achieve about 4 links of throughput

Page 40: Active Put

A put that fires a handler at the destination on completion

Persistent communication

Adaptive routing

Lower per message overheads

Better cache performance

Can optimize the NAMD coordinate multicast (a hypothetical sketch follows)
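The shape of such an interface might look as follows; this is a hypothetical sketch of an active put, not the actual BGX API:

// Hypothetical active-put descriptor (names are illustrative only)
typedef void (*ArrivalHandler)(void* data, int bytes, void* arg);

struct ActivePut {
  int            destRank;    // destination processor
  void*          remoteBuf;   // persistent, pre-registered destination buffer
  ArrivalHandler onArrival;   // handler fired at the destination on completion
  void*          handlerArg;  // user context passed to the handler
};

// Send side (declaration only): packets are adaptively routed and deposited
// directly into remoteBuf; the final packet triggers onArrival at the target.
void activePut(const ActivePut& d, const void* src, int bytes);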

Page 41: FIFO Mapping

pinFifo Algorithms

– Decide which of the 6 injection FIFOs to use when sending a message to {x,y,z,t} (see the sketch after this list)

– Cones, Chessboard

Dynamic FIFO mapping

– A special send queue from which a message can go out on whichever FIFO is not full
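A hypothetical flavor of a pinFifo policy (the real cones and chessboard schemes are more involved): pick the injection FIFO from the dominant direction of travel toward the destination.

// dx, dy, dz: signed hop counts from source to destination on the torus
int pinFifo(int dx, int dy, int dz) {
  int ax = dx < 0 ? -dx : dx;
  int ay = dy < 0 ? -dy : dy;
  int az = dz < 0 ? -dz : dz;
  if (ax >= ay && ax >= az) return dx >= 0 ? 0 : 1;  // X+ or X- FIFO
  if (ay >= az)             return dy >= 0 ? 2 : 3;  // Y+ or Y- FIFO
  return                           dz >= 0 ? 4 : 5;  // Z+ or Z- FIFO
}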

Page 42: Performance Results

Page 43: BGX Message layer vs MPI

NAMD 2.6b1 co-processor mode performance (ms/step), APoA1 with PME, Oct 2005:

# Nodes   Native Layer   MPI
32        347            371
128       97.2           -
512       23.7           27.8
1024      13.8           17.3
2048      8.6            10.2
4096      6.2            7.3
8192      5.2            -

Fully non-blocking version performed below par on MPI

– Polling overhead high for a list of posted receives

BGX native comm. layer works well with asynchronous communication

Page 44: NAMD Performance

(Plot: APoA1 step time in ms with PME in co-processor mode, log scale from 1 to 1000, vs. processor count from 32 to 16384, for the May-05, Oct-05, and Mar-06 versions on BGL and for IA64-Myrinet (May-05); annotations: Scaling = 2.5, Scaling = 4.5, time-step = 4 ms)

Page 45: Virtual Node Mode

(Plot: APoA1 step time in ms with PME, 0 to 25 ms, vs. processor count from 512 to 8192, comparing CP (Mar 06) and VN (Mar 06); the plot compares VN mode with CO mode on twice as many chips)

Page 46: Impact of Optimizations

Optimization                       Performance (ms)
NAMD v2.5                          40
NAMD v2.6 (Oct-05), blocking       25.2
Fine grained                       24.3
Congestion control                 20.5
Topology load balancer             14
Dynamic FIFO mapping               13.5
Non-blocking                       11.9

NAMD cutoff step time on the APoA1 system on 1024 processors

Page 47: Blocking Communication

(Projections timeline of a 1024-node run without aggressive network progress)

Network progress is not aggressive enough: communication gaps result in a low utilization of 65%

Page 48: Effect of Network Progress

(Projections timeline of a 1024-node run with aggressive network progress)

More frequent advance closes gaps: higher network utilization of about 75%

Page 49: Summary

Page 50: Impact on Science

Dr. Zhou ran the lysozyme system for 6.7 billion time steps over about two months on 8 racks of Blue Gene/L

Page 51: Lysozyme Misfolding & Amyloids

Mechanism behind protein misfolding and amyloid formation – Alzheimer’s disease

Amyloids can be formed not only from traditional β-amyloid peptides, but also from almost any protein, such as lysozyme.

A single mutation on lysozyme (TRP62GLY) can cause the protein to be less stable and also misfold to form possible amyloids.

More mysteriously, the single mutation site TRP62 is on the surface, not in the hydrophobic core.

To study lysozyme misfolding and amyloids formation

10 μs of aggregate MD simulation

C. Dobson and coworkers, Science 295, 1719 (2002); C. Dobson and coworkers, Nature 424, 783 (2003)


Page 53: Summary

The machine is capable of massive performance

– We were able to scale ApoA1 on NAMD to 8k processors

– The bigger ATPase system also scales to 8k processors

Applications benefit from native messaging APIs

Topology optimizations are a big winner

Overlap of computation and communication is possible

Lack of operating system daemons leads to massive scaling

Page 54: Future Plans

Improve Application Scaling

– We still have some Amdahl bottlenecks

• Splitting bonded work
• 2D or 3D decompositions for PME

– Reducing grain size overhead

– Improve load-balancing

Page 55: NAMD Parallelization using Charm++

(Figure: patch and compute objects, labeled 108 VPs, 847 VPs, and 100,000 VPs)

These 100,000+ Virtual Processors (VPs) are mapped to real processors by the Charm++ runtime system

Page 56: Towards Peta Scale Computing

Sequential performance has to improve from 0.7 flops/cycle to 1-1.5 flops per cycle

– Explore new algorithms for the inner loop to reduce register and cache pressure

– Effectively using the double hummer (the dual FPU; see the sketch after this list)

Reduce memory pressure to run very large problems

Fully distributed load balancer
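For the double-hummer item above, the XL compilers expose the dual FPU through SIMD-style intrinsics on the double _Complex type; the exact intrinsic names and operand order below are assumptions, so treat this as a sketch only and verify against the XL documentation.

// Sketch: two multiply-adds per iteration through the dual FPU.
// Intrinsic names and operand order are assumptions, not verified.
double dot(double* x, double* y, int n) {  // n assumed even
  double _Complex sum = __cmplx(0.0, 0.0);
  for (int i = 0; i < n; i += 2) {
    double _Complex a = __lfpd(&x[i]);   // parallel load of two doubles
    double _Complex b = __lfpd(&y[i]);
    sum = __fpmadd(sum, a, b);           // parallel fused multiply-add
  }
  return __creal(sum) + __cimag(sum);
}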

Page 57: Acknowledgements

Funding Agencies

– NIH, NSF, DOE (ASCI center)

Students, Staff and Faculty

– Parallel Programming Laboratory: Chao Huang, Gengbin Zheng, David Kunzman, Chee Wai Lee, Prof. Kale

– Theoretical Biophysics: Klaus Schulten, Jim Phillips

– IBM Watson: Gheorghe Almasi, Hao Yu

– IBM Toronto: Murray Malleschuk, Mark Mendell