Memcached(Design(on(High(Performance(...

28
Memcached Design on High Performance RDMA Capable Interconnects Jithin Jose, Hari Subramoni, Miao Luo, Minjia Zhang, Jian Huang, Md. WasiurRahman, Nusrat S. Islam, Xiangyong Ouyang, Hao Wang, Sayantan Sur & D. K. Panda NetworkBased Compu2ng Laboratory Department of Computer Science and Engineering The Ohio State University, USA

Transcript of Memcached(Design(on(High(Performance(...

Page 1: Memcached(Design(on(High(Performance( …mvapich.cse.ohio-state.edu/static/media/publications/...Memcached(Design(on(High(Performance(RDMA(Capable(Interconnects(JithinJose, Hari(Subramoni,(Miao(Luo,

Memcached  Design  on  High  Performance  RDMA  Capable  Interconnects  

Jithin  Jose,  Hari  Subramoni,  Miao  Luo,  Minjia  Zhang,    Jian  Huang,  Md.  Wasi-­‐ur-­‐Rahman,  Nusrat  S.  Islam,    

Xiangyong  Ouyang,  Hao  Wang,  Sayantan  Sur  &  D.  K.  Panda    

Network-­‐Based  Compu2ng  Laboratory  Department  of  Computer  Science  and  Engineering  

The  Ohio  State  University,  USA    

Page 2: Memcached(Design(on(High(Performance( …mvapich.cse.ohio-state.edu/static/media/publications/...Memcached(Design(on(High(Performance(RDMA(Capable(Interconnects(JithinJose, Hari(Subramoni,(Miao(Luo,

ICPP - 2011

Outline    

•  IntroducLon    •  Overview  of  Memcached  

•  Modern  High  Performance  Interconnects  

•  Unified  CommunicaLon  RunLme  (UCR)  

•  Memcached  Design  using  UCR  

•  Performance  EvaluaLon  

•  Conclusion  &  Future  Work  

2

Page 3: Memcached(Design(on(High(Performance( …mvapich.cse.ohio-state.edu/static/media/publications/...Memcached(Design(on(High(Performance(RDMA(Capable(Interconnects(JithinJose, Hari(Subramoni,(Miao(Luo,

ICPP - 2011

IntroducLon  •  Tremendous increase in interest in interactive web-sites

(social networking, e-commerce etc.)

•  Dynamic data is stored in databases for future retrieval and analysis

•  Database lookups are expensive •  Memcached – a distributed memory caching layer,

implemented using traditional BSD sockets

•  Socket interface provides portability, but entails additional processing and multiple message copies

•  High-Performance Computing (HPC) has adopted advanced interconnects (e.g. InfiniBand, 10 Gigabit Ethernet/iWARP, RoCE)

–  Low latency, High Bandwidth, Low CPU overhead

•  Many machines in Top500 list (http://www.top500.org) 3

Page 4: Memcached(Design(on(High(Performance( …mvapich.cse.ohio-state.edu/static/media/publications/...Memcached(Design(on(High(Performance(RDMA(Capable(Interconnects(JithinJose, Hari(Subramoni,(Miao(Luo,

ICPP - 2011

Outline    

•  IntroducLon  •  Overview  of  Memcached  

•  Modern  High  Performance  Interconnects  

•  Unified  CommunicaLon  RunLme  (UCR)  

•  Memcached  Design  using  UCR  

•  Performance  EvaluaLon  

•  Conclusion  &  Future  Work  

4

Page 5: Memcached(Design(on(High(Performance( …mvapich.cse.ohio-state.edu/static/media/publications/...Memcached(Design(on(High(Performance(RDMA(Capable(Interconnects(JithinJose, Hari(Subramoni,(Miao(Luo,

ICPP - 2011

Memcached  Overview  

•  Memcached  provides  a  scalable  distributed  caching    •  Spare  memory  in  data-­‐center  servers  can  be  aggregated  to  speedup  

lookups    •  Basically  a  key-­‐value  distributed  memory  store  •  Keys  can  be  any  character  strings,  typically  MD5  sums  or  hashes  •  Typically  used  to  cache  database  queries,  results  of  API  calls  or  webpage  

rendering  elements  •  Scalable  model,  but  typical  usage  very  network  intensive  -­‐Performance  

directly  related  to  that  of  underlying  networking  technology   5

Internet  

Proxy  Servers    (Memcached  Clients)  

Memcached  Servers  

Database  Servers  

System  Area  

Network  

System  Area  

Network  

Page 6: Memcached(Design(on(High(Performance( …mvapich.cse.ohio-state.edu/static/media/publications/...Memcached(Design(on(High(Performance(RDMA(Capable(Interconnects(JithinJose, Hari(Subramoni,(Miao(Luo,

ICPP - 2011

Outline    

•  IntroducLon  •  Overview  of  Memcached  

•  Modern  High  Performance  Interconnects  

•  Unified  CommunicaLon  RunLme  (UCR)  

•  Memcached  Design  using  UCR  

•  Performance  EvaluaLon  

•  Conclusion  &  Future  Work  

6

Page 7: Memcached(Design(on(High(Performance( …mvapich.cse.ohio-state.edu/static/media/publications/...Memcached(Design(on(High(Performance(RDMA(Capable(Interconnects(JithinJose, Hari(Subramoni,(Miao(Luo,

ICPP - 2011

Modern  High  Performance  Interconnects  

7

ApplicaAon  

IB  Verbs  Sockets  ApplicaLon  Interface  

TCP/IP  

Hardware  Offload  

TCP/IP  

Ethernet  Driver  

Kernel Space

Protocol  ImplementaLon  

1/10  GigE  Adapter  

Ethernet  Switch  

Network  Adapter  

Network  Switch  

1/10  GigE  

InfiniBand  Adapter  

InfiniBand  Switch  

IPoIB  

IPoIB  

SDP

RDMA  User space

IB  Verbs  

InfiniBand  Adapter  

InfiniBand  Switch  

SDP  

InfiniBand  Adapter  

InfiniBand  Switch  

RDMA  

10  GigE  Adapter  

10  GigE  Switch  

10  GigE-­‐TOE  

Page 8: Memcached(Design(on(High(Performance( …mvapich.cse.ohio-state.edu/static/media/publications/...Memcached(Design(on(High(Performance(RDMA(Capable(Interconnects(JithinJose, Hari(Subramoni,(Miao(Luo,

ICPP - 2011

Problem  Statement  

•  High-performance RDMA capable interconnects have emerged in the scientific computation domain

•  Applications using Memcached are still relying on sockets

•  Performance of Memcached is critical to most of its deployments

•  Can Memcached be re-designed from the ground up to utilize RDMA capable networks?

8

Page 9: Memcached(Design(on(High(Performance( …mvapich.cse.ohio-state.edu/static/media/publications/...Memcached(Design(on(High(Performance(RDMA(Capable(Interconnects(JithinJose, Hari(Subramoni,(Miao(Luo,

ICPP - 2011

A  New  Approach  using    Unified  CommunicaLon  RunLme  (UCR)    

9

Current  Approach  

ApplicaAon  

Sockets  

1/10  GigE  Network  

•  Sockets  not  designed  for  high-­‐performance  –  Stream  semanLcs  o\en  mismatch  for  upper  layers  (Memcached,  Hadoop)  –  MulLple  copies  can  be  involved  

Our  Approach  

ApplicaAon  

IB  Verbs  

RDMA  Capable  N/ws  (IB,  10GE,  iWARP,  RoCE  ...)  

UCR  

Page 10: Memcached(Design(on(High(Performance( …mvapich.cse.ohio-state.edu/static/media/publications/...Memcached(Design(on(High(Performance(RDMA(Capable(Interconnects(JithinJose, Hari(Subramoni,(Miao(Luo,

ICPP - 2011

Outline    

•  IntroducLon  &  MoLvaLon  

•  Overview  of  Memcached  

•  Modern  High  Performance  Interconnects  

•  Unified  CommunicaLon  RunLme  (UCR)  

•  Memcached  Design  using  UCR  

•  Performance  EvaluaLon  

•  Conclusion  &  Future  Work  

10

Page 11: Memcached(Design(on(High(Performance( …mvapich.cse.ohio-state.edu/static/media/publications/...Memcached(Design(on(High(Performance(RDMA(Capable(Interconnects(JithinJose, Hari(Subramoni,(Miao(Luo,

ICPP - 2011

Unified  CommunicaLon  RunLme  (UCR)

•  Initially proposed to unify communication runtimes of different parallel programming models –  J. Jose, M. Luo, S. Sur and D. K. Panda, Unifying UPC and MPI

Runtimes: Experience with MVAPICH, (PGAS’10)

•  Design of UCR evolved from MVAPICH/MVAPICH2 software stacks (h`p://mvapich.cse.ohio-­‐state.edu/)

•  UCR provides interfaces for Active Messages as well as one-sided put/get operations

•  Enhanced APIs to support Cloud computing applications •  Several enhancements in UCR

–  end-point based design, revamped active-message API, fault tolerance and synchronization with timeouts.

•  Communications based on endpoint, analogous to sockets 11

Page 12: Memcached(Design(on(High(Performance( …mvapich.cse.ohio-state.edu/static/media/publications/...Memcached(Design(on(High(Performance(RDMA(Capable(Interconnects(JithinJose, Hari(Subramoni,(Miao(Luo,

ICPP - 2011

AcLve  Messaging  in  UCR  

12

•  Active messages are proven to be very powerful in many environments •  GASNet Project (UC Berkeley), MPI design using LAPI (IBM), etc.

•  We introduce Active messages into the data-center domain •  An Active Message consists of two parts – header and data •  When the message arrives at the target, header handler is run •  Header handler identifies the destination buffer for the data •  Data is put into the destination buffer •  Completion handler is run afterwards (optional) •  Special flags to indicate local & remote completions (optional)

Page 13: Memcached(Design(on(High(Performance( …mvapich.cse.ohio-state.edu/static/media/publications/...Memcached(Design(on(High(Performance(RDMA(Capable(Interconnects(JithinJose, Hari(Subramoni,(Miao(Luo,

ICPP - 2011

AcLve  Messaging  in  UCR  (contd.)  

13

(General Active Message Functionality) (Optimized Short Active Message Functionality)

Origin Target Header

HeaderHandler

CompletionHandler

Set RComplFlag

Set ComplFlag

RDMA Data

Set LComplFlag

Origin Target Header + Data

HeaderHandler

CompletionHandler

Set ComplFlag

Set LComplFlag

Copy Data

Set RComplFlag

Page 14: Memcached(Design(on(High(Performance( …mvapich.cse.ohio-state.edu/static/media/publications/...Memcached(Design(on(High(Performance(RDMA(Capable(Interconnects(JithinJose, Hari(Subramoni,(Miao(Luo,

ICPP - 2011

Outline    

•  IntroducLon  &  MoLvaLon  

•  Overview  of  Memcached  

•  Modern  High  Performance  Interconnects  

•  Unified  CommunicaLon  RunLme  (UCR)  

•  Memcached  Design  using  UCR  

•  Performance  EvaluaLon  

•  Conclusion  &  Future  Work  

14

Page 15: Memcached(Design(on(High(Performance( …mvapich.cse.ohio-state.edu/static/media/publications/...Memcached(Design(on(High(Performance(RDMA(Capable(Interconnects(JithinJose, Hari(Subramoni,(Miao(Luo,

ICPP - 2011

Memcached  Design  using  UCR  

15

•  Server  and  client  perform  a  negoLaLon  protocol  –  Master  thread  assigns  clients  to  appropriate  worker  thread  

•  Once  a  client  is  assigned  a  verbs  worker  thread,  it  can  communicate  directly  and  is  “bound”  to  that  thread  

•  All  other  Memcached  data  structures  are  shared  among  RDMA  and  Sockets  worker  threads  

Sockets  Client  

RDMA  Client  

Master  Thread  

Sockets  Worker  Thread  

Verbs  Worker  Thread  

Sockets  Worker  Thread  

Verbs  Worker  Thread  

 Shared  Data  

 Memory  Slabs  Items  …  

Page 16: Memcached(Design(on(High(Performance( …mvapich.cse.ohio-state.edu/static/media/publications/...Memcached(Design(on(High(Performance(RDMA(Capable(Interconnects(JithinJose, Hari(Subramoni,(Miao(Luo,

ICPP - 2011

Outline    

•  IntroducLon  &  MoLvaLon  

•  Overview  of  Memcached  

•  Modern  High  Performance  Interconnects  

•  Unified  CommunicaLon  RunLme  (UCR)  

•  Memcached  Design  using  UCR  

•  Performance  EvaluaLon  

•  Conclusion  &  Future  Work  

16

Page 17: Memcached(Design(on(High(Performance( …mvapich.cse.ohio-state.edu/static/media/publications/...Memcached(Design(on(High(Performance(RDMA(Capable(Interconnects(JithinJose, Hari(Subramoni,(Miao(Luo,

ICPP - 2011

Experimental  Setup  •  Used  Two  Clusters  

–  Intel  Clovertown  •  Each  node  has  8  processor  cores  on  2  Intel  Xeon  2.33  GHz  Quad-­‐core  CPUs,    

6  GB  main  memory,  250  GB  hard  disk  •  Network:  1GigE,    IPoIB,  10GigE  TOE  and  IB  (DDR)  

–  Intel  Westmere  •  Each  node  has  8  processor  cores  on  2  Intel  Xeon  2.67  GHz  Quad-­‐core  CPUs,    

12  GB  main  memory,  160  GB  hard  disk  •  Network:  1GigE,    IPoIB,  and  IB  (QDR)  

•  Memcached  So\ware  –  Memcached  Server:  1.4.5    –  Memcached  Client:  (libmemcached)  0.45  

 17

Page 18: Memcached(Design(on(High(Performance( …mvapich.cse.ohio-state.edu/static/media/publications/...Memcached(Design(on(High(Performance(RDMA(Capable(Interconnects(JithinJose, Hari(Subramoni,(Miao(Luo,

ICPP - 2011

Memcached  Get  Latency  (Small  Message)    

18

•  Memcached  Get  latency  –  4  bytes  –  DDR:  6  us;  QDR:  5  us  –  4K  bytes  -­‐-­‐  DDR:  20  us;  QDR:12  us  

•  Almost  factor  of  four  improvement  over  10GE  (TOE)  for  4KB  on  the  DDR  cluster  •  Almost  factor  of  seven  improvement  over  IPoIB  for  4KB  on  the  QDR  cluster  

 

Intel Clovertown Cluster (IB: DDR) Intel Westmere Cluster (IB: QDR)

0

50

100

150

200

250

1 2 4 8 16 32 64 128 256 512 1K 2K 4K

Tim

e (u

s)

Message Size

0

50

100

150

200

250

300

350

1 2 4 8 16 32 64 128 256 512 1K 2K 4K

Tim

e (u

s)

Message Size

SDP

IPoIB

OSU Design

1G

10G-TOE

Page 19: Memcached(Design(on(High(Performance( …mvapich.cse.ohio-state.edu/static/media/publications/...Memcached(Design(on(High(Performance(RDMA(Capable(Interconnects(JithinJose, Hari(Subramoni,(Miao(Luo,

ICPP - 2011

Memcached  Get  Latency  (Large  Message)    

19

•  Memcached  Get  latency  –  8K  bytes  –  DDR:  17  us;  QDR:  13  us  –  512K  bytes  -­‐-­‐  DDR:  362  us;  QDR:  94  us  

•  Almost  factor  of  three  improvement  over  10GE(TOE)  for  512KB  on  the  DDR  cluster  •  Almost  factor  of  four  improvement  over  IPoIB  for  512K  bytes  on  the  QDR  cluster  

   

Intel Clovertown Cluster (IB: DDR) Intel Westmere Cluster (IB: QDR)

0

500

1000

1500

2000

2500

3000

3500

4000

4500

5000

4K 8K 16K 32K 64K 128K 256K 512K

Tim

e (u

s)

Message Size

0

100

200

300

400

500

600

700

4K 8K 16K 32K 64K 128K 512K Message Size

SDP

IPoIB

OSU Design

1G

10G-TOE

Page 20: Memcached(Design(on(High(Performance( …mvapich.cse.ohio-state.edu/static/media/publications/...Memcached(Design(on(High(Performance(RDMA(Capable(Interconnects(JithinJose, Hari(Subramoni,(Miao(Luo,

ICPP - 2011

Memcached  Set  Latency  (Small  Message)    

20

•  Memcached  Set  latency  –  4  bytes  –  DDR:  7  us;  QDR:  5  us  –  4K  bytes  -­‐-­‐  DDR:  15  us;  QDR:13  us  

•  Almost  factor  of  four  improvement  over  10GE  (TOE)  for  4KB  on  the  DDR  Cluster  •  Almost  factor  of  six  improvement  over  IPoIB  for  4KB  on  the  QDR  Cluster    

 

Intel Clovertown Cluster (IB: DDR) Intel Westmere Cluster (IB: QDR)

0

50

100

150

200

250

1 2 4 8 16 32 64 128 256 512 1K 2K 4K

Tim

e (u

s)

Message Size

0

10

20

30

40

50

60

70

80

90

100

1 2 4 8 16 32 64 128 256 512 1K 2K 4K Message Size

SDP

IPoIB

OSU Design

1G

10G-TOE

Page 21: Memcached(Design(on(High(Performance( …mvapich.cse.ohio-state.edu/static/media/publications/...Memcached(Design(on(High(Performance(RDMA(Capable(Interconnects(JithinJose, Hari(Subramoni,(Miao(Luo,

ICPP - 2011

Memcached  Set  Latency  (Large  Message)  

21

•  Memcached  Get  latency  –  8K  bytes  –  DDR:  18  us;  QDR:  15  us  –  512K  bytes  -­‐-­‐  DDR:  375  us;  QDR:185  us  

•  Almost  factor  of  two  improvement  over  10GE  (TOE)  for  512KB  on  the  DDR  cluster  •  Almost  factor  of  three  improvement  over  IPoIB  for  512KB  on  the  QDR  cluster  

 

Intel Clovertown Cluster (IB: DDR) Intel Westmere Cluster (IB: QDR)

0

500

1000

1500

2000

2500

3000

3500

4000

4500

4K 8K 16K 32K 64K 128K 256K 512K

Tim

e (u

s)

Message Size

0

100

200

300

400

500

600

700

800

4K 8K 16K 32K 64K 128K 256K 512K Message Size

SDP

IPoIB

OSU Design

1G

10G-TOE

Page 22: Memcached(Design(on(High(Performance( …mvapich.cse.ohio-state.edu/static/media/publications/...Memcached(Design(on(High(Performance(RDMA(Capable(Interconnects(JithinJose, Hari(Subramoni,(Miao(Luo,

ICPP - 2011

Memcached  Latency    (10%  Set,  90%  Get)  

22

•  Memcached  Get  latency  –  4  bytes  –  DDR:  7  us;  QDR:  5  us  –  4K  bytes  -­‐-­‐  DDR:  15  us;  QDR:12  us  

•  Almost  factor  of  four  improvement  over  10GE  (TOE)  for  4K  bytes  on  the  DDR  cluster    

Intel Clovertown Cluster (IB: DDR) Intel Westmere Cluster (IB: QDR)

0

50

100

150

200

250

1 2 4 8 16 32 64 128 256 512 1K 2K 4K

Tim

e (u

s)

Message Size

0

50

100

150

200

250

300

350

1 2 4 8 16 32 64 128 256 512 1K 2K 4K Message Size

SDP

IPoIB

OSU Design

1G

10G

Page 23: Memcached(Design(on(High(Performance( …mvapich.cse.ohio-state.edu/static/media/publications/...Memcached(Design(on(High(Performance(RDMA(Capable(Interconnects(JithinJose, Hari(Subramoni,(Miao(Luo,

ICPP - 2011

Memcached  Latency    (50%  Set,  50%  Get)  

23

•  Memcached  Get  latency  –  4  bytes  –  DDR:  7  us;  QDR:  5  us  –  4K  bytes  -­‐-­‐  DDR:  15  us;  QDR:12  us  

•  Almost  factor  of  four  improvement  over  10GE  (TOE)  for  4K  bytes  on  the  DDR  cluster    

Intel Clovertown Cluster (IB: DDR) Intel Westmere Cluster (IB: QDR)

0 10 20 30 40 50 60 70 80 90

100

1 2 4 8 16 32 64 128 256 512 1K 2K 4K Message Size

SDP

IPoIB

OSU Design

1G

10G-TOE

0

50

100

150

200

250

1 2 4 8 16 32 64 128 256 512 1K 2K 4K

Tim

e (u

s)

Message Size

Page 24: Memcached(Design(on(High(Performance( …mvapich.cse.ohio-state.edu/static/media/publications/...Memcached(Design(on(High(Performance(RDMA(Capable(Interconnects(JithinJose, Hari(Subramoni,(Miao(Luo,

ICPP - 2011

Memcached  Get  TPS  (4byte)  

24

•  Memcached  Get  transacLons  per  second  for  4  bytes  –  On  IB  DDR  about  600K/s  for  16  clients    –  On  IB  QDR  1.9M/s  for  16  clients  

•  Almost  factor  of  six  improvement  over  10GE  (TOE)  •  Significant  improvement  with  naLve  IB  QDR  compared  to  SDP  and  

IPoIB    

 

Intel Clovertown Cluster (IB: DDR) Intel Westmere Cluster (IB: QDR)

0

200

400

600

800

1000

1200

1400

1600

1800

2000

8 Clients 16 Clients

SDP

IPoIB

OSU Design

0

100

200

300

400

500

600

700

8 Clients 16 Clients

Thou

sand

s of

Tra

nsac

tions

per

se

cond

(TPS

)

SDP

10G - TOE

OSU Design

Page 25: Memcached(Design(on(High(Performance( …mvapich.cse.ohio-state.edu/static/media/publications/...Memcached(Design(on(High(Performance(RDMA(Capable(Interconnects(JithinJose, Hari(Subramoni,(Miao(Luo,

ICPP - 2011

Memcached  Get  TPS  (4KB)  

25

•  Memcached  Get  transacLons  per  second  for  4K  bytes  –  On  IB  DDR  about  330K/s  for  16  clients    –  On  IB  QDR  842K/s  for  16  clients  

•  Almost  factor  of  four  improvement  over  10GE  (TOE)  •  Significant  improvement  with  naLve  IB  QDR  

 

Intel Clovertown Cluster (IB: DDR) Intel Westmere Cluster (IB: QDR)

0

100000

200000

300000

400000

500000

600000

700000

800000

900000

8 Clients 16 Clients

SDP

IPoIB

OSU Design

0

50000

100000

150000

200000

250000

300000

350000

8 Clients 16 Clients

Tra

nsac

tions

per

sec

ond(

TPS)

SDP

10G - TOE

OSU Design

Page 26: Memcached(Design(on(High(Performance( …mvapich.cse.ohio-state.edu/static/media/publications/...Memcached(Design(on(High(Performance(RDMA(Capable(Interconnects(JithinJose, Hari(Subramoni,(Miao(Luo,

ICPP - 2011

Outline    

•  IntroducLon  &  MoLvaLon  

•  Overview  of  Memcached  

•  Modern  High  Performance  Interconnects  

•  Unified  CommunicaLon  RunLme  (UCR)  

•  Memcached  Design  using  UCR  

•  Performance  EvaluaLon  

•  Conclusion  &  Future  Work  

26

Page 27: Memcached(Design(on(High(Performance( …mvapich.cse.ohio-state.edu/static/media/publications/...Memcached(Design(on(High(Performance(RDMA(Capable(Interconnects(JithinJose, Hari(Subramoni,(Miao(Luo,

ICPP - 2011

Conclusion  &  Future  Work  

27  

•  Described  a  novel  design  of  Memcached  for  RDMA  capable  networks  

•  Provided  a  detailed  performance  comparison  of  our  design  compared  to  unmodified  Memcached  using  sockets  over  RDMA  and  10GE  networks  

•  Observed  significant  performance  improvement  with  the  proposed  design  

•  Factor  of  four  improvement  in  Memcached  get  latency  (4K  bytes)  

•  Factor  of  six  improvement  in  Memcached  get  transacLons/s  (4  bytes)  

•  We  plan  to  improve  UCR  by  taking  into  account  the  many  features  in  OpenFabrics  API  ,  Unreliable  Datagram  transport  and  designing  iWARP  and  RoCE  versions  of  UCR,  and  thereby  scaling  Memcached  

•  We  are  working  on  enhancing  the  Hadoop/HBase  designs  for  RDMA  capable  networks  

Page 28: Memcached(Design(on(High(Performance( …mvapich.cse.ohio-state.edu/static/media/publications/...Memcached(Design(on(High(Performance(RDMA(Capable(Interconnects(JithinJose, Hari(Subramoni,(Miao(Luo,

ICPP - 2011

 Thank  You!  {jose,  subramon,  luom,  zhanjmin,  huangjia,  rahmanmd,  islamn,  ouyangx,  wangh,  surs,panda}@cse.ohio-­‐state.edu  

Network-­‐Based  CompuLng  Laboratory  h`p://nowlab.cse.ohio-­‐state.edu/

28

MVAPICH  Web  Page  h`p://mvapich.cse.ohio-­‐state.edu/