Transcript of GPU-Enabled Studies of Molecular Systems on Keeneland | GTC 2013

Page 1

GPU-enabled Studies of Molecular Systems on Keeneland
- On pursuing high resource utilization and coordinated simulations' progression

Michela Taufer, Sandeep Patel, Samuel Schlachter, and Stephen Herbein
University of Delaware

Jeremy Logan, ORNL

Page 2

Taxonomy of simulations

•  Simulations applying fully atomistically resolved molecular models and force fields: GPUs enable longer time and space scales

•  Variable job lengths (ns/day): as a trajectory evolves, and across trajectories with, e.g., different concentrations

•  Fully or partially coordinated simulation progression: fully coordinated is needed for, e.g., replica-exchange molecular dynamics (REMD); partially coordinated suffices for, e.g., SDS and nanotube systems

Page 3

Constraints on high-end computer systems

•  Resource constraints on high-end clusters: limited wall-time per job (e.g., 24 hours); mandatory use of resource managers; no direct submission and monitoring of GPU jobs

•  A logical GPU job does not map to a physical GPU job; workflow managers are still in their infancy

•  System and application failures on GPUs go undetected; resource managers have no notion of job terminations on GPUs

Page 4

Moving beyond virtualization

•  When clusters do include virtualization (e.g., Shadowfax), we can schedule isolated CPU/GPU pairs, which allows us to associate failures with a specific GPU

•  Virtualization imposes overheads: power, performance, noise or jitter, portability and maintainability ... and may not be available

Our goal: pursuing BOTH high accelerator utilization and (fully or partially) coordinated simulation progression on GPUs in effective and cross-platform ways

Page 5

Our approach

•  Two software modules that plug into existing resource managers and workflow managers; no virtualization, to embrace diverse clusters and programming languages

•  A companion module: runs on the head node of the cluster; accepts jobs from the workflow manager; instantiates "children" wrapper modules; dynamically splits jobs and distributes job segments to the wrapper modules

•  A wrapper module: launches on a compute node as a resource manager job; receives and runs job segments from the companion module; reports the status of job segments to the companion module (a minimal sketch of this interaction follows)
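The slides do not show code, so the following is only a minimal Python sketch of the companion module's division of labor under stated assumptions: the class and method names, the 6-hour segment length, the 3-subjob bundle size, and the wrapper_module.sh script submitted via TORQUE's qsub are all illustrative, not the authors' implementation.

```python
# Minimal sketch of the companion module's role (illustrative only).
import subprocess

SEGMENT_HOURS = 6   # assumed segment length (see "Modules in action" below)
BUNDLE_SIZE = 3     # assumed number of subjobs handed out per wrapper request

class CompanionModule:
    """Runs on the head node: splits 24-hour jobs and feeds wrapper modules."""

    def __init__(self, jobs):
        # jobs: (job_id, total_hours) pairs received from the workflow manager
        self.segments = [(job_id, SEGMENT_HOURS)
                         for job_id, total_hours in jobs
                         for _ in range(int(total_hours // SEGMENT_HOURS))]

    def submit_wrapper_instances(self, n_nodes):
        # One wrapper-module instance per back-end node, submitted as an
        # ordinary resource manager job (qsub shown as a TORQUE example).
        for _ in range(n_nodes):
            subprocess.run(["qsub", "wrapper_module.sh"], check=True)

    def next_bundle(self):
        # Hand a wrapper up to BUNDLE_SIZE segments per request.
        bundle = self.segments[:BUNDLE_SIZE]
        self.segments = self.segments[BUNDLE_SIZE:]
        return bundle
```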


Page 6

Modules in action

[Diagram: user node; front-end node hosting the Workflow Manager, Companion Module, Resource Manager, and job queue; back-end node.]

Page 7

Modules in action

Workflow Manager:
•  generate set of 24-hour jobs

[Diagram: as before, with the 24-hour jobs at the Workflow Manager on the front-end node.]

Page 8

Modules in action

Workflow Manager:
•  send set of 24-hour jobs to companion module

Companion Module:
•  receive 24-hour jobs
•  generate a Wrapper Module (WM) instance per back-end node

[Diagram: 24-hour jobs at the Companion Module; a WM instance created on the front-end node.]

Page 9

Modules in action

Companion Module:
•  submit WM instance as a job to resource manager

[Diagram: the WM instance moving into the Resource Manager's job queue.]


Page 11

Modules in action

Resource Manager:
•  launch WM instance as a job on back-end node

[Diagram: the WM job running on the back-end node.]

Page 12

Modules in action

Wrapper Module:
•  ask companion module for job segments, as many as the available GPUs

[Diagram: the WM job on the back-end node requesting segments from the Companion Module.]

Page 13

Modules in action

Companion Module:
•  fragment jobs into 6-hour subjobs

[Diagram: a 24-hour job split into 6-hour subjobs at the Companion Module.]

Page 14

Modules in action

Companion Module:
•  fragment jobs into 6-hour subjobs
•  send bundle of 3 subjobs to WM job

[Diagram: three 6-hour subjobs sent to the WM job on the back-end node.]


Page 16

Modules in action

Wrapper Module:
•  instantiate subjobs on GPUs
•  monitor system and application failures as well as time constraints (a sketch of this loop follows)

[Diagram: subjobs running on the GPUs of the back-end node.]
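Again as an illustrative sketch rather than the authors' code: one way a wrapper could pin a subjob to a GPU and classify how it ended, using only the standard library and the CUDA_VISIBLE_DEVICES environment variable (the function name and return values are assumptions).

```python
# Sketch of the wrapper module's launch-and-monitor step (illustrative).
import os
import subprocess

def run_subjob_on_gpu(cmd, gpu_id, wall_seconds):
    """Run one subjob pinned to a GPU and report how it ended."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    proc = subprocess.Popen(cmd, env=env)
    try:
        returncode = proc.wait(timeout=wall_seconds)   # time-constraint check
    except subprocess.TimeoutExpired:
        proc.kill()
        return "timeout"                               # wall-time limit reached
    return "ok" if returncode == 0 else "failed"       # system/application failure
```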


Page 18

Modules in action

Wrapper Module:
•  if a subjob terminates prematurely because of, e.g., system or application failures, it requests a new subjob

[Diagram: a failed subjob on the back-end node triggering a new request to the Companion Module.]

Page 19

Modules in action

Companion Module:
•  adjust length of the new subjob based on heuristics, e.g., to complete the initial 6-hour period (a sketch follows)
•  send subjob to wrapper module for execution

[Diagram: the resized replacement subjob sent back to the WM job.]
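As a sketch of the length-adjustment idea under our own assumptions (the slide does not spell out the heuristic): size the replacement so that the work still completes within the original 6-hour window, net of restart overhead.

```python
# Illustrative length adjustment for a replacement subjob (times in hours).
def replacement_length(window_hours, elapsed_hours, restart_overhead_hours):
    remaining = window_hours - elapsed_hours - restart_overhead_hours
    return max(remaining, 0.0)

# A subjob that failed 2.5 hours into its 6-hour window, with an assumed
# 0.1-hour restart cost, gets a 3.4-hour replacement:
print(replacement_length(6.0, 2.5, 0.1))  # 3.4
```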


Page 21

MD simulations

•  MD simulations:
   Case study 1: study of sodium dodecyl sulfate (SDS) molecules in aqueous solutions and electrolyte solutions
   Case study 2: study of nanotubes in aqueous solutions and electrolyte solutions

•  GPU code FEN ZI (Yun Dong de FEN ZI = moving MOLECULES):
   MD simulations in NVT and NVE ensembles and energy minimization in explicit solvent
   Constraints on interatomic distances, e.g., shake and rattle, atomic restraints, and freezing of fast degrees of motion
   Electrostatic interactions, i.e., Ewald summation, performed on the GPU

•  Metric of interest:
   Utilization of GPUs - i.e., the fraction of time accountable for the simulation's progression

Page 22

The Keeneland system

•  GPU description: 3 M2090 GPUs per node

•  Software: TORQUE Resource Manager; Globus allows for the use of the Pegasus Workflow Manager; shared Lustre file system

•  Constraints: 24-hour time limit; 1 job per node (cannot have multiple jobs on one node); GPUs can be set to shared/exclusive mode, but there is no complete isolation (e.g., the user who gets access first can claim all the GPUs); vendor-specific, with a specific version of the NVIDIA driver (>260)

Page 23

Modeling max utilization

•  With our approach, using n segments in a 24-hour period:

$$\text{utilization} = \frac{\displaystyle\sum_{\text{days}}\sum_{\text{GPUs}} \left[\, t_{\max} - \sum_{i=1}^{n-1} \Big( \big( t_{\text{arrival}}(i) - t_{\text{lastchk}}(i) \big) + t_{\text{restart}} \Big) - \big( t_{\max} - t_{\text{arrival}}(n) \big) \right]}{\displaystyle\sum_{\text{days}}\sum_{\text{GPUs}} t_{\max}}$$

•  Without our approach:

$$\text{utilization} = \frac{\displaystyle\sum_{\text{days}}\sum_{\text{GPUs}} \left[\, t_{\max} - \big( t_{\text{arrival}}(1) - t_{\text{lastchk}}(1) \big) - \big( t_{\max} - t_{\text{arrival}}(1) \big) \right]}{\displaystyle\sum_{\text{days}}\sum_{\text{GPUs}} t_{\max}}$$

where:

$$t_{\text{arrival}}(i) = \begin{cases} t_{\text{lastchk}}(i) & \text{when } t_{\text{arrival}}(i) > t_{\max} \\ t_{\text{arrival}}(i) & \text{otherwise} \end{cases} \qquad t_{\text{lastchk}}(n) = f(\text{molecular system})$$
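To make the model concrete, here is a direct Python transcription of the "with our approach" numerator for a single GPU-day (the helper name, the default restart overhead, and the example numbers are ours, not from the slides; times are in hours).

```python
# Utilization of one GPU-day under the segmented model (illustrative).
def gpu_day_utilization(segments, t_max=24.0, t_restart=0.1):
    """segments: (t_arrival, t_lastchk) per segment; last entry = segment n.
    Callers are assumed to have applied the rule t_arrival(i) = t_lastchk(i)
    whenever a segment would arrive after t_max."""
    *completed, (t_arrival_n, _) = segments
    lost = sum((t_arr - t_chk) + t_restart for t_arr, t_chk in completed)
    lost += t_max - t_arrival_n          # unused tail after the last segment
    return (t_max - lost) / t_max

# Four clean 6-hour segments, each checkpointed at its arrival time, lose
# only three restart overheads and no tail:
print(gpu_day_utilization([(6.0, 6.0), (12.0, 12.0), (18.0, 18.0), (24.0, 18.0)]))  # 0.9875
```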

Page 24

Case study 1: Sodium Dodecyl Sulfate (SDS)

[Figure: four systems at molar concentrations 0.10, 0.25, 0.50, and 1.00. Initial structures: surfactant molecules randomly distributed.]

Page 25

Case study 1: variable simulation times

[Plot: performance (ns/day) versus simulation time (ns) for molar concentrations 0.1, 0.25, 0.5, and 1; performance spans roughly 0 to 1.8 ns/day over 0-12 ns of simulation time.]

Page 26

Case study 1: testbeds

•  Taxonomy of our simulations: 4 concentrations and 3 200-ns trajectories per concentration at 298 K

•  Test 1: jobs with the same concentration assigned to the same node

•  Test 2: jobs with different concentrations assigned to the same node

[Diagram: 24-hour job slots for Test 1 and Test 2.]

Page 27

Case study 1: modeling max utilization

Specializing the model from Page 23 (no failures, so each of the first n-1 segments loses only its restart overhead):

•  With our approach, using n segments in a 24-hour period:

$$\text{utilization} = \frac{\displaystyle\sum_{\text{days}}\sum_{\text{GPUs}} \left[\, t_{\max} - \sum_{i=1}^{n-1} t_{\text{restart}} - \big( t_{\max} - t_{\text{arrival}}(n) \big) \right]}{\displaystyle\sum_{\text{days}}\sum_{\text{GPUs}} t_{\max}}$$

•  Without our approach:

$$\text{utilization} = \frac{\displaystyle\sum_{\text{days}}\sum_{\text{GPUs}} \left[\, t_{\max} - \big( t_{\max} - t_{\text{arrival}}(1) \big) \right]}{\displaystyle\sum_{\text{days}}\sum_{\text{GPUs}} t_{\max}}$$

where:

$$t_{\text{arrival}}(i) = \begin{cases} t_{\text{lastchk}}(i) & \text{when } t_{\text{arrival}}(i) > t_{\max} \\ t_{\text{arrival}}(i) & \text{otherwise} \end{cases} \qquad t_{\max} = 24~\text{hours}$$

Page 28

Case study 1: modeling arrival time

We model $t_{\text{arrival}}(i)$ in two ways:

•  Scientists: run a short simulation, compute ns/day, and fix the job's speed at a constant rate sized to fit into the 24-hour period

•  Our approach: segment the 24-hour job into segments and adjust each segment's length based on a heuristic that takes into account the change in ns/day
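A sketch of the segment-sizing idea under our own assumptions (the slide does not spell out the heuristic): pick each segment's simulated-ns target from the currently observed throughput instead of a constant rate measured once up front. The function name and the safety margin are hypothetical.

```python
# Illustrative segment sizing from the observed throughput.
def next_segment_ns(observed_ns_per_day, segment_hours=6.0, safety=0.95):
    # Scale the daily throughput to the segment window, with an assumed
    # safety margin so the segment finishes within its wall time.
    return observed_ns_per_day * (segment_hours / 24.0) * safety

# At an observed 1.6 ns/day, a 6-hour segment targets ~0.38 ns:
print(next_segment_ns(1.6))  # ~0.38
```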

Page 29

Case study 1: our heuristic

[Plot: observed performance and projected performance over time, with our heuristic's adjustments.]

Page 30

Case study 1: results

•  Run 12 10-day trajectories with 4 concentrations and 3 different seeds on Keeneland, three trajectories per node

Maximum utilization by checkpoint interval:

tchkpnt (hours)   With our approach         W/o our approach
                  test 1      test 2        test 1      test 2
0.5               99.54%      98.82%        98.78%      97.08%
1                 99.18%      98.44%        98.26%      96.53%
3                 97.83%      96.98%        96.21%      94.49%
6                 95.83%      94.85%        93.50%      91.72%

Page 31

Case study 1: snapshots of ongoing simulations

[Figure: initial structures (time 0 ns, surfactant molecules randomly distributed) at molar concentrations 0.10, 0.25, 0.50, and 1.00, alongside snapshots at 22 ns (0.10 M), 20 ns (0.25 M), 20 ns (0.50 M), and 15 ns (1.00 M).]

Page 32

Case study 2: Carbon Nanotubes

•  Study nanotubes in aqueous solutions and electrolyte solutions: different temperatures; different separations

•  Scientific metrics: potential of mean force; effect of electrolytes, i.e., sodium chloride and iodide; ion spatial distributions

[Figure: nanotube system with labeled dimensions of 24 Å and 13.6 Å.]

Page 33

Case study 2: testbeds

•  Taxonomy of the simulations: 10 temperatures ranging from 280 K to 360 K, along with 20 tube separations; 200 ns per trajectory at 5.8 ns/day +/- 3% on 64 nodes

•  Test 1: hardware errors, i.e., ECC errors, and system failures

•  Test 2: hardware and application errors

[Diagram: 24-hour job slots.]

Page 34

Modeling max utilization

•  With our approach:

$$\text{utilization} = \frac{\displaystyle\sum_{\text{days}}\sum_{\text{GPUs}} \left[\, t_{\max} - \sum_{i=1}^{n-1} \Big( \big( t_{\text{arrival}}(i) - t_{\text{lastchk}}(i) \big) + t_{\text{restart}} \Big) - \big( t_{\max} - t_{\text{arrival}}(n) \big) \right]}{\displaystyle\sum_{\text{days}}\sum_{\text{GPUs}} t_{\max}}$$

•  Without our approach:

$$\text{utilization} = \frac{\displaystyle\sum_{\text{days}}\sum_{\text{GPUs}} \left[\, t_{\max} - \big( t_{\text{arrival}}(1) - t_{\text{lastchk}}(1) \big) - \big( t_{\max} - t_{\text{arrival}}(1) \big) \right]}{\displaystyle\sum_{\text{days}}\sum_{\text{GPUs}} t_{\max}}$$

where:

$$t_{\text{arrival}}(i)\,\big|_{\,i<n} \sim \text{Weibull}(\text{scale}, \text{shape}) \qquad t_{\text{arrival}}(n) = 0.03 \times t_{\max} \qquad t_{\max} = 24~\text{hours}$$

Page 35

Case study 2: modeling system failures

•  Weibull distribution: scale = 203.8 and shape = 0.525

[Plot: failure occurrences and the fitted probability density function (pdf) versus hours.]

P(system failure) = 0.057
P(two more jobs fail because of the system, given that one has already failed) = 0.333

Page 36

Case study 2: modeling application failures

•  Weibull distribution: scale = 56.56, shape = 0.3361

[Plot: failure occurrences and the fitted probability density function (pdf) versus hours.]

P(application failure) = 0.038
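As a hedged illustration of how these fitted distributions feed the utilization model, the following samples failure times with NumPy; the scale and shape values are from the slides, while the sampling code, seed, and sample size are ours.

```python
# Sample failure times (hours) from the fitted Weibull distributions.
import numpy as np

rng = np.random.default_rng(0)

def failure_times(scale, shape, size):
    # NumPy's weibull() draws from the standard (scale = 1) distribution,
    # so multiply by the fitted scale to get hours.
    return scale * rng.weibull(shape, size)

system_failures = failure_times(203.8, 0.525, 10_000)    # system-failure fit
app_failures = failure_times(56.56, 0.3361, 10_000)      # application-failure fit
print(system_failures.mean(), app_failures.mean())
```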

Page 37

Case study 2: results

•  Run 200 ns for each nanotube system - equivalent to ~35 days on 64 nodes of Keeneland, each with 3 GPUs

Maximum utilization by checkpoint interval:

tchkpnt (hours)   With our approach            W/o our approach
                  sysfail     sysfail+appfail  sysfail     sysfail+appfail
0.5               99.69%      99.54%           94.07%      90.32%
1                 99.64%      99.47%           94.02%      90.24%
3                 99.47%      99.23%           93.79%      89.98%
6                 99.28%      98.98%           93.61%      89.73%

Page 38

Case study 2: scientific results

Page 39

Case study 2: scientific results

Page 40

Case study 2: scientific results

Page 41

Conclusions

•  GPUs are still second-class citizens on high-end clusters: virtualization is too costly; lightweight, user-level OSs are a work in progress

•  Rather than rewriting existing workflow and resource managers, we propose to complement them with: a Companion Module complementing the workflow manager; a Wrapper Module supporting the resource managers

•  We model the maximum utilization for: SDS systems with dynamically variable runtimes; carbon nanotube systems with hardware and application failures

•  Utilization increases significantly in both cases

•  The science is a work in progress: stay tuned for our next publications

Page 42

Acknowledgments

Related work: Taufer et al., CiSE 2012; Ganesan et al., JCC 2011; Bauer et al., JCC 2011; Davis et al., BICoB 2009

Patel's group and Taufer's group

Sponsors:

Contact: [email protected], [email protected]