Openfabrics Interfaces Working Group Co-Chair...Sockets) TCP,UDP) Verbs Cisco usNIC) Intel) OPA)PSM...

Page 1:
Page 2:

Sean Hefty, OpenFabrics Interfaces Working Group Co-Chair, Intel, November 2016

OPENFABRICS INTERFACES: PAST, PRESENT, AND FUTURE

Page 3:

OFIWG: develop … interfaces aligned with … application needs

libfabric

Application-Centric
Software interfaces aligned with application requirements
• Careful analysis of requirements

Open Source
Expand open source community
• Inclusive development effort
• App and HW developers

Implementation Agnostic
Good impedance match with multiple fabric hardware
• InfiniBand*, iWarp, RoCE, Ethernet, UDP offload, Intel®, Cray*, IBM*, others

Scalable
Optimized SW path to HW
• Minimize cache/memory footprint
• Reduce instruction count
• Minimize memory accesses

* Other names and brands may be claimed as the property of others

Page 4:

OFI APPLICATION REQUIREMENTS

Give us a high-level interface!

Give us a low-level interface!

MPI developers

OFI strives to meet both requirements

Page 5:

OFI SOFTWARE DEVELOPMENT STRATEGIES
One Size Does Not Fit All

[Diagram: three Application / OFI / Provider stacks over Fabric Services, labeled:]
• Provider optimizes for OFI features
• App uses OFI features
• Common optimization for all apps/providers
• App optimizes based on supported features
• Provider supports low-level features only

Page 6:

OFI DEVELOPMENT STATUS

[Diagram: Application / libfabric / Provider stack over Fabric Services, labeled:]
• Provider optimizes for OFI features
• App uses OFI features
• Common optimization for all apps/providers
• App optimizes based on supported features
• Provider supports low-level features only
• Many apps / Few apps
• Provider's choice
• OFI-provider gap

Page 7:

OFI LIBFABRIC COMMUNITY

libfabric Enabled Middleware: Intel® MPI Library, MPICH Netmod/CH4, Open MPI MTL/BTL, Open MPI SHMEM, Sandia SHMEM, GASNet, Clang UPC, rsocket, ES-API

libfabric
• Control Services: Discovery, fi_info
• Communication Services: Connection Management, Address Vectors
• Completion Services: Event Queues, Event Counters
• Data Transfer Services: Message Queue, Tag Matching, RMA, Atomics

Providers: Sockets (TCP, UDP), Verbs, Cisco usNIC, Intel OPA PSM, Cray GNI, Mellanox MXM, IBM Blue Gene, A3Cube RONNIE

Legend: each provider is marked as either supported or experimental

Because of the OFI-provider gap, not all apps work with all providers
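
To make the gap concrete, here is a minimal sketch of capability matching, written in Python for brevity. It is not libfabric code: `Cap`, `PROVIDERS`, and `usable_providers` are hypothetical names, and the two provider entries are invented for illustration. Real applications state their requirements through `fi_getinfo` hints, which this sketch does not model.

```python
from enum import Flag, auto


class Cap(Flag):
    """Hypothetical capability bits, loosely named after the data transfer services."""
    MSG = auto()
    TAGGED = auto()
    RMA = auto()
    ATOMIC = auto()


# Hypothetical provider table (assumption): one provider implements every
# feature, the other exposes only low-level messaging.
PROVIDERS = {
    "full_featured": Cap.MSG | Cap.TAGGED | Cap.RMA | Cap.ATOMIC,
    "low_level_only": Cap.MSG,
}


def usable_providers(required, providers=PROVIDERS):
    """Return the names of providers that offer every required capability."""
    return sorted(
        name for name, caps in providers.items()
        if (caps & required) == required
    )
```

An application that needs only `Cap.MSG` can run on either entry, while one that needs `Cap.MSG | Cap.RMA` is limited to the full-featured provider. That mismatch is the gap the slide describes.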

Page 8:

LIBFABRIC SCALABILITY

By Courtesy Argonne* National Laboratory, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=24653857

Developed to evaluate the Aurora software stack at scale and assist applications in the transition from Mira to Aurora

Native provider implementation that directly uses the Blue Gene/Q hardware and network interfaces for communication

Blue Gene / Q

Page 9:

LIBFABRIC SCALABILITY
Blue Gene / Q

Completely subjective software stack comparison
32 nodes on ALCF Vesta machine

§ IBM* MPICH / PAMI
• IBM XL C compiler for BG, v12.1
• Optimized for single-threaded latency
• …/comm/xl.legacy.ndebug/bin/mpicc
• v1r2m2

vs

§ MPICH / CH4 / libfabric
• gcc 4.4.7
• global locks, inline, direct, etc.
• Provider not optimized for performance

[Diagram: MPICH over PAMID over PAMI over hardware, vs MPICH over CH4 OFI over libfabric over BG/Q Provider over hardware]

PAMI and libfabric performance

Page 10:

LIBFABRIC SCALABILITY
Blue Gene / Q

OSU* MPI Performance Tests v5.0

[Figure: three plots comparing IBM and OFI against message size in bytes: latency (us), bandwidth (MB/s), and message rate (msgs/s)]

MPI scale out testing:
- cpi – 1M ranks
- ISx benchmark – 0.5M ranks

Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance.

Page 11:

LIBFABRIC SCALABILITY

Evaluate libfabric SHMEM performance on high-performance interconnect

Provider implementation that uses the Cray* uGNI hardware and network interface for communication

Computing Sciences Lawrence Berkeley National Laboratory

SHMEM CRAY XC40

Page 12:

LIBFABRIC SCALABILITY
SHMEM CRAY XC40

1630 nodes on Cray* XC40 (Cori)

§ Cray* SHMEM
• Cray* Aries, Dragonfly* topology
• CLE (Cray* Linux*), SLURM*
• DMAPP
  • Designed for PGAS
  • Optimized for small messages

vs

§ Sandia* OpenSHMEM / libfabric
• uGNI
  • Designed for MPI and PGAS
  • Optimized for large messages

§ https://www.nersc.gov/users/computational-systems/cori/configuration

[Diagram: Cray SHMEM over DMAPP vs Open SHMEM over OFI libfabric over uGNI, both on the Aries Interconnect]

Page 13:

LIBFABRIC SCALABILITY
SHMEM CRAY XC40

Blocking Get/Put B/W

Put – up to 61% improvement

Get – within 2%

Page 14:

LIBFABRIC SCALABILITY
SHMEM CRAY XC40

[Figure: two scaling plots, "GUPS Scaling" and "NAS ISx (Integer Sort) weak scaling". Annotations: "XPMEM", "Improved scalability", "slight improvement (lower is better)"]

Page 15:

ADDRESSING THE OFI-PROVIDER GAP

Libfabric Framework

libfabric API

Components: templates, lists, rbtree, hash table, free pool, ring buffer, stack, …

Base Class Implementations: fabric, domain, EQ, wait sets, AV, CQ, …

SHM primitives

Provider Services
• Logging
• Environment variables

Utility Provider

Core Provider

Interface 'extensions' – for consistency

Assist in provider development

Enhance core provider
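
One way to picture a utility provider enhancing a core provider is a thin layer that adds a feature the core provider lacks. The Python sketch below layers tag matching over an endpoint that supports only plain FIFO messages. Every class and method name here is hypothetical and none is a libfabric API. The 8-byte tag header and the in-memory queue are assumptions made for illustration.

```python
from collections import deque


class CoreMsgEndpoint:
    """Hypothetical core provider endpoint: plain FIFO messages only."""

    def __init__(self):
        self._queue = deque()

    def send(self, payload: bytes) -> None:
        self._queue.append(payload)

    def recv(self):
        # Return the oldest message, or None when nothing is pending.
        return self._queue.popleft() if self._queue else None


class TaggedUtilEndpoint:
    """Hypothetical utility layer: emulates tag matching over plain messages."""

    def __init__(self, core: CoreMsgEndpoint):
        self._core = core
        self._unexpected = []  # (tag, payload) pairs received but not yet matched

    def tsend(self, tag: int, payload: bytes) -> None:
        # Prepend the tag as an 8-byte big-endian header (assumed wire format).
        self._core.send(tag.to_bytes(8, "big") + payload)

    def trecv(self, tag: int):
        # First check messages that arrived earlier with a matching tag.
        for i, (t, p) in enumerate(self._unexpected):
            if t == tag:
                del self._unexpected[i]
                return p
        # Then drain the core endpoint, parking messages with other tags.
        while (raw := self._core.recv()) is not None:
            t, p = int.from_bytes(raw[:8], "big"), raw[8:]
            if t == tag:
                return p
            self._unexpected.append((t, p))
        return None
```

The application sees a tagged interface regardless of what the core endpoint supports, so the core provider only has to implement its low-level feature set.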

Page 16:

UTILITY PROVIDER

Performance is a primary objective

Page 17:

MOVING FORWARD

Beyond HPC
Enterprise, Cloud, Storage (NVM)

Stronger engagement with these communities

Beyond Linux*
Sockets – TCP/UDP
NetworkDirect

Analyze requests to expand OFI

community

Page 18:

TARGET SCHEDULE

§ Driven by implementation feedback
§ Improve error handling, flow control
§ Better support for non-traditional fabrics
§ Optimize completion handling
§ Address deferred features

[Timeline axis: 2016, Q2, Q3, Q4, 2017, Q2, Q3, Q4]

• RDM over DGRAM Util
• RDM over MSG Util
• Shared Memory
• New Core Providers
• ABI 1.1

Utility provider is ongoing

Traditional and non-traditional RDMA providers

Page 19:

SUMMARY

§ OFIWG development model working well

§ Interest in OFI and libfabric is high

§ Growing community

§ Significant effort being made to simplify the lives of developers
• Applications and providers

OFI is so good

Page 20:

LEGAL DISCLAIMER & OPTIMIZATION NOTICE

§  No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade. This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request.

§  Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

§  Copyright © 2016, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.

§  *Other names and brands may be claimed as the property of others

Optimization Notice

Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

Page 21:

Thank you for your time! Sean Hefty

[email protected]

www.intel.com/hpcdevcon