InfiniCortex - HPC Advisory Council…

InfiniCortex Marek T. Michalewicz Senior Director A*CRC

Transcript of InfiniCortex - HPC Advisory Council…

Page 1

InfiniCortex

Marek T. Michalewicz, Senior Director

A*CRC

Page 2

NOT GRID!

Page 3

NOT CLOUD!

Page 4

InfiniCortex

Concurrent supercomputing across the globe utilising trans-continental InfiniBand and Galaxy of Supercomputers

Page 5

The InfiniCortex Components

1. ACA 100: Asia Connects America 100 Gbps, by November 2014. Challenge issued by Yves Poppe at APAN 37 in Bandung, Indonesia, February 2014.

2. InfiniBand over trans-Pacific distance: made possible with Obsidian Strategics Longbow range extenders.

3. Galaxy of Supercomputers: supercomputer interconnect topology work by Y. Deng, M. Michalewicz and L. Orlowski; Obsidian Strategics Crossbow InfiniBand router.

4. Application layer: from simplest file transfer (dsync+) to complex workflows (ADIOS, multi-scale models).

Page 6

Initial Motivation

National Supercomputer Centre

Joint A*STAR, NUS, NTU, SUTD and NRF Proposal

➠ National Supercomputing Centre (NSCC)
• New 1-2+ PetaFLOP supercomputer
• Recurrent investment every 3-5 years
• Pooling of upper-mid to high tier compute resources at A*STAR & IHLs
• Co-investment from primary stakeholders

➠ Science, Technology and Research Network (STAR-N)
• A high-bandwidth network to connect the distributed compute resources
• Provides high-speed access to users (both public and private) anywhere
• Supports transfer of large data sets (both locally and internationally)
• Deepens local and international network connectivity

Page 7

Level 17 at Fusionopolis

A*CRC Datacenter 1

Page 8

Matrix Building at Biopolis

Page 9

Mellanox Metro-X testing since early 2013. Goal: to connect HPC resources at Fusionopolis with storage and genomics pipeline at Biopolis (Matrix building).

Metro-X A*CRC team: Stephen Wong, Tay Teck Wee, Steven Chew

Page 10

The Setups

[Diagram: four test configurations, each linking servers with IB HCAs over copper.
• Point to Point: two servers connected directly
• One Switch: servers connected through one Mellanox Metro-X switch
• Two Switches: two Mellanox Metro-X switches joined by a 10m fibre patch cord
• Long Range (2km): two Mellanox Metro-X switches joined by 2km dark fibre]

Page 11

File Transfer Speed (Ave. Time to Transfer a 20GB file)

[Chart: bandwidth (MB/s, axis 0 to 800) and average time (s, axis 0 to 33) by configuration. Configuration labels: 10G Direct; IPoIB 1 Switch; IPoIB 2 Switches; 2km. Plotted pairs: 620.6 MB/s (33 s), 787.7 MB/s (26 s), 731.4 MB/s (28 s), 706.2 MB/s (29 s), 660.6 MB/s (31 s).]
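The plotted times agree with simply dividing the file size by the measured bandwidth, provided "20GB" is read as 20 × 1024 MB. That reading is an assumption; the slide does not state which unit convention was used. A minimal sketch of the check:

```python
# Assumption: "20GB" means 20 * 1024 MB; the slide does not say which convention applies.
FILE_SIZE_MB = 20 * 1024

# (bandwidth in MB/s, average time in s) pairs as plotted on the slide.
MEASURED = [(620.6, 33), (787.7, 26), (731.4, 28), (706.2, 29), (660.6, 31)]


def transfer_time_s(bandwidth_mb_s, size_mb=FILE_SIZE_MB):
    """Seconds needed to move size_mb megabytes at a sustained bandwidth."""
    return size_mb / bandwidth_mb_s


for bandwidth, plotted in MEASURED:
    estimate = transfer_time_s(bandwidth)
    print(f"{bandwidth:6.1f} MB/s -> {estimate:5.1f} s (slide: {plotted} s)")
```

Rounded to whole seconds, each estimate matches the time shown next to its bandwidth on the slide.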

Page 12

MPI SendRecv Bandwidth

Page 13

The InfiniCortex Components

Page 14

© 2010 Tata Communications Ltd., All Rights Reserved

Raising Asian Research and Education Networking to a higher dimension

The ACA-100 challenge

Asia connects America at 100 Gbps in November 2014 at SC14 in New Orleans

Credits: Yves Poppe

Page 15

Will APAN rise to the ACA-100 challenge?

• Project name: ACA-100, Asia connects America at 100 Gbps
• Event: SC14 (Supercomputing 2014) in November in New Orleans
• Objective: Illustrate the pioneering role of the R&E community, and APAN in particular, in R&E networking
• Demonstration: feasibility of a 'real' transpacific 100 Gbps on a single wave between Asia and SC14 in New Orleans, in cooperation with Internet2 and other counterparts. Leave it on for one year, as is the case with ANA-100.
• The extreme challenge for 2014: extend ACA-100 all the way to Europe via Internet2 and ANA-100 or its successor

Credits: Yves Poppe

Page 16

as TERENA and Internet2 did last year at ANA-100?

• Project name: ANA-100, transatlantic connection at 100 Gbps
• Event: TERENA conference in June 2013 in Maastricht, Netherlands
• Objective: Illustrate the pioneering role of the R&E community, in North America and Europe in particular, in R&E networking
• Demonstration: feasibility of a 'real' transatlantic 100 Gbps on a single wave between Internet2 and NetherLight, extended to Maastricht and CERN in Geneva, in cooperation with Internet2 and other counterparts. The circuit remains operational for one year after the June demo.
• Timeframes and outcome: an agreement to proceed was reached in April 2013; extremely tight schedule. Successful demo at TERENA 2013.

Credits: Yves Poppe

Page 17

Page 18

Infinera Confidential & Proprietary

Terabit Scale Super-Channels
DANTE/GEANT Field Trial
August/September 2014

Page 19

The Trial Route: Budapest-Bratislava

Budapest

Tatabánya

Győr

Bratislava

High loss span

Line card in Budapest

510km: Loopback and monitoring in Bratislava

1.2 Tb/s link

Page 20

DTN-X: Terabit Scale Capacity Upgrade

Infinera DTN-X: World's first, and most successful super-channel transport platform

• 500 Gb/s universal slot, 5 Tb/s switching per chassis
• 1.2 Tb/s universal slot, 12 Tb/s switching per chassis

No service interruption!

Page 21

Supercomputing 2014, New Orleans, USA, 16-20 November. Event: Emerging Technologies

Galaxy14 Network - Supercomputing 2014

Revision A - 24/07/2014

Link                                   Data Rate     Distance / Actual    Ping / Actual
Singapore - Taipei, Taiwan             1.0 Gbps      3,232 km / - km      15 ms / - ms
Singapore - Tokyo, Japan               2.5 Gbps      5,312 km / - km      25 ms / - ms
Singapore - Canberra, Australia        - Gbps        6,216 km / - km      29 ms / - ms
Singapore - New Orleans, USA           10/100 Gbps   16,367 km / - km     77 ms / - ms
New Orleans, USA - Knoxville TN, USA   10 Gbps       876 km / - km        4 ms / - ms
New Orleans, USA - Austin TX, USA      10 Gbps       823 km / - km        3 ms / - ms

[Map labels: Taipei, Tokyo, Singapore, Canberra, New Orleans, Atlanta]
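The Ping column reads like a propagation-delay estimate derived from distance. A minimal sketch of such an estimate follows, assuming light travels through fibre at c/n with n ≈ 1.468. The refractive index is an assumed typical value, not a figure from the slides, and real routes are longer than the quoted distances and add equipment delay.

```python
# Assumption: propagation speed in fibre is c / n, with n ~ 1.468 (typical silica fibre).
SPEED_OF_LIGHT_KM_S = 299_792.458
FIBRE_REFRACTIVE_INDEX = 1.468  # assumed value, not taken from the slides

# Distances (km) as listed in the Galaxy14 link table.
LINK_DISTANCES_KM = {
    "Singapore - Taipei": 3232,
    "Singapore - Tokyo": 5312,
    "Singapore - Canberra": 6216,
    "Singapore - New Orleans": 16367,
    "New Orleans - Knoxville": 876,
    "New Orleans - Austin": 823,
}


def one_way_latency_ms(distance_km, refractive_index=FIBRE_REFRACTIVE_INDEX):
    """One-way propagation delay in milliseconds over distance_km of fibre."""
    speed_km_s = SPEED_OF_LIGHT_KM_S / refractive_index
    return distance_km / speed_km_s * 1000.0


for link, km in LINK_DISTANCES_KM.items():
    print(f"{link:26s} {km:6d} km  ~{one_way_latency_ms(km):5.1f} ms one way")
```

Under these assumptions the one-way figures land within a few milliseconds of the table's Ping column; a round-trip ping would be roughly double.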

Page 22

The InfiniCortex Components

Page 23

2. Extending InfiniBand over trans-continental distances

Partners: Obsidian Strategics and A*CRC

Page 24

Mission: Scalable, stable, secure and high performance RDMA interconnects

The Obsidian team firmly believes in the technical superiority of the InfiniBand protocol, and develops proprietary technologies to move InfiniBand beyond today's typical HPC model into fabrics worthy of both Exascale HPC and Enterprise deployments, which have similar requirements.

Obsidian's Longbow and Crossbow platforms provide four main capability sets:

Range Extension: Lossless InfiniBand is incapable of significant link lengths due to buffer credit starvation. Longbows provide transparent buffer credit extension and interface between LAN and optical WANs.

Multi-Subnet Routing: Crossbow and certain Longbow devices provide hardware InfiniBand routing, which is critical for inter-organizational communication, fault isolation, overcoming LID space limits and supporting compound topologies.

Encryption and Authentication: Longbow E100 provides AES protection on the WAN segments, protecting data and also infrastructure against intrusions, while X100 interoperates with military Type-1 encryptors. Critical for certain industries, including finance, health care and military/intelligence.

Enterprise-Grade Subnet Management: Obsidian's BGFC provides n-way active SA clusters, mathematically proven deadlock-free forwarding tables, direct support for InfiniBand routers and many features intended to facilitate Exascale deployment concerning performance scaling, fault tolerance and deterministic routing engines.
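Buffer credit starvation can be illustrated with the bandwidth-delay product: a credit-based sender may only have as many bytes in flight as the receiver has advertised buffer for, so the buffer needed to keep a link busy grows with round-trip time. The sketch below is generic arithmetic, not a description of how Longbow works internally; the 8 Gbps rate, 150 ms round trip and 128 KiB buffer are hypothetical illustration values.

```python
def bytes_in_flight(rate_gbps, rtt_ms):
    """Bandwidth-delay product: bytes that must be outstanding to keep the link full."""
    return rate_gbps * 1e9 / 8 * (rtt_ms / 1e3)


def credit_limited_rate_gbps(buffer_bytes, rtt_ms):
    """Best sustained rate if the sender stalls after buffer_bytes until credits return."""
    return buffer_bytes * 8 / (rtt_ms / 1e3) / 1e9


# Hypothetical long-haul link: 8 Gbps usable rate, 150 ms round trip.
needed = bytes_in_flight(8, 150)
print(f"Buffer needed to fill the link: {needed / 1e6:.0f} MB")

# Hypothetical small on-chip buffer of 128 KiB over the same round trip.
starved = credit_limited_rate_gbps(128 * 1024, 150)
print(f"Rate with a 128 KiB buffer: {starved * 1000:.1f} Mbps")
```

With a buffer sized for a machine room, the same link over an intercontinental round trip would spend most of its time waiting for credits, which is the gap a range extender's additional buffering is meant to close.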

Page 25

High Performance System Area Networking

Page 26

Key Enabler - InfiniBand Over the WAN

Patent granted: US (#7,843,962) also UK, Australia, Mexico, Japan, Russia, Canada, China, Korea and Israel (pending worldwide).

Lossless transport:
● Cluster interconnect now a global transport protocol
● Deterministic data flows
● Distance-insensitive throughput
● Very low latency & jitter
● Seamless to applications
● RDMA support – scales with memory bandwidth, not CPU speed
● A Unified Fabric able to carry storage protocols

Page 27

Existing Hardware Portfolio

Longbow C100 – 10G IB range extender. Dark Fiber/xWDM, Metro Area Networks (80 km)

Longbow X100 (Military) – 10G IB range extender/router. 10GbE/OC-192, Wide Area Networks (unlimited distance)

Longbow E100 – 10G IB range extender/router/crypto. Dark Fiber/10GbE, Wide Area Networks (unlimited distance)

A-CWDM81 – 10G Optical Mux/Demux. Nine CWDM channels, Metro Area Networks (80 km)

Full-production models:

Longbow C400 – 40G IB range extender. Dark Fiber/xWDM, Regional Area Networks (1~1,600 km)

(All devices support secure standards-based out-of-band management over Ethernet and serial ports, including SNMP, SSH-CLI (Cisco syntax) and HTTPS interactive GUI for configuration, performance/state monitoring, logging and firmware/hardware updates)

Page 28

Crossbow Device

Crossbow R400-6 – 40G six-port native IB router (enables multi-subnet LAN fabrics)

Longbow Devices (Future)

Longbow E1000 – 100G IB range extender. Dark Fiber/100GbE, Wide Area Networks (Global km)

Page 29

Capability: Bulk, Secure Data Migration at the File System Level

• NASA helped Obsidian define market requirements for enhancing Wide Area InfiniBand connections with suitable encryption and authentication.

• Obsidian created a large-scale file system synchronization tool, dsync+, to help NASA and others simplify the transfer of huge scientific data sets across distance using Longbows, supporting non-InfiniBand storage arrays provided they are fast enough to keep up.

Unprotected ftp transfers – 30 Mbytes/second file-level copies.
Secure dsync+/Longbow E100 transfers – 940 Mbytes/second file-level copies.

Page 30

The InfiniCortex Components

Page 31

Galaxy of Supercomputers

Study of topologies

Yuefan Deng, A*CRC & Stony Brook University
Lukasz Orlowski, A*CRC & Stony Brook University
Marek Michalewicz, A*CRC & Stony Brook University

Page 32

Galaxy of Supercomputers

• Supercomputers located at different geolocations are connected as nodes of a Super-Network (Super-Graph)

• Supercomputers may have arbitrary interconnect topologies

• Galaxy is based on a topology with small diameter and the lowest possible number of links

• In terms of graph representation, it is an embedding of the graphs representing the Supercomputers' topologies into a graph representing the Galaxy topology

Page 33

Embedding of a 5-connected graph on 32 nodes into itself proves to be comparable to Tofu or 5D torus with equal or similar number of nodes.

[Figure: 32k5⊗32k5, both axes 0 to 1024]

Name of topology        Number of nodes   Number of links   Diameter   Mean path length
32k5⊗32k5               1024              2640              9          6.31
Tofu (6x5x3)            1080              5400              9          5.04
5D torus (4x4x4x4x4)    1024              5120              10         5
Tofu (4x4x8)            1536              7680              11         5.67

Poster at ISC'14, Leipzig, June 2014
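The table's columns (nodes, links, diameter, mean path length) can be computed for any candidate topology with a breadth-first search. The sketch below does so for the 5D torus (4x4x4x4x4) row only; it is an illustrative check, not the authors' tooling, and it does not attempt the 32k5⊗32k5 or Tofu graphs. Because a torus looks the same from every node, one BFS from a single source is sufficient. The mean is taken over all nodes including the source, which is an assumption about the table's convention.

```python
from collections import deque
from itertools import product


def torus_neighbors(node, dims):
    """Yield the +1/-1 wrap-around neighbours of node along every axis."""
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coords = list(node)
            coords[axis] = (coords[axis] + step) % size
            yield tuple(coords)


def bfs_distances(source, dims):
    """Hop counts from source to every reachable node of the torus."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in torus_neighbors(current, dims):
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist


def torus_metrics(dims):
    """Return (nodes, links, diameter, mean path length) for a torus."""
    nodes = list(product(*(range(size) for size in dims)))
    # Each undirected link is seen from both ends, hence the division by two.
    links = sum(len(set(torus_neighbors(n, dims)) - {n}) for n in nodes) // 2
    dist = bfs_distances(nodes[0], dims)  # valid because a torus is vertex-transitive
    diameter = max(dist.values())
    mean_path = sum(dist.values()) / len(nodes)  # averaged over all nodes, source included
    return len(nodes), links, diameter, mean_path


print(torus_metrics((4, 4, 4, 4, 4)))
```

For the 4x4x4x4x4 torus this reproduces the row in the table: 1024 nodes, 5120 links, diameter 10 and mean path length 5.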

Page 34

System Software: BGFC - Next Generation InfiniBand Subnet Manager

A 1 Gbyte/second, 40 km routed and encrypted InfiniBand link was demonstrated at SC12, using BGFC to orchestrate the fabric comprising two subnets with very different internal topologies ("compound topology").

Galaxy of supercomputers: proof of concept

Page 35

Capability: Low Latency Server Aggregation

• NASA Ames purchased 16 Longbow C100 units, expanding their flagship Itanium-based Columbia supercomputer to share jobs across one mile of dark fiber to a second building.

• Expansion of supercomputers and data centers must contend with power and cooling constraints – these problems can often be resolved by Longbows.

• A similar model works for the linking of containerized data center pods in the field or within modular data centers.

Galaxy of supercomputers: proof of concept

Page 36

Galaxy14 Network - Topology - Supercomputing 2014

Revision C - 06/10/2014

[Diagram: IB subnets Singapore "Core"; Asia 1 "Titech"; New Orleans - SC14 "Core"; USA 1 "Georgia Tech"; Australia 1 "iVEC, Perth"; Australia 2 "NCI, Canberra"; show-floor booths B "iVEC, Perth", C "Obsidian" and D "NCI, Canberra". Device labels: Longbow E100 (including two groups marked x10), Longbow C400, Crossbow R400, IB Switch, Local Resources. Link types: 4X InfiniBand LAN, 10G WAN.]

Page 37

The InfiniCortex Components

Page 38

Task decomposition of LAMMPS molecular dynamics code on Galaxy of Supercomputers

Participants: ORNL, Georgia Tech, A*CRC

Page 39

The Fusion experiment, for which large ECEI fast camera data must be transferred from Singapore to USA

Participants: ORNL, U Tennessee, Rutgers U, A*CRC

Page 40

Performance of InfiniBand over Long Distances

Dr Jonathan LOW, Seng LIM, Geok Lian TAN, Dr Sing-Wu LIOU, Dr Dominic CHIEN, Stephen WONG

Page 41

Tests Conducted

• Table-top Test at A*CRC
  – E100 models
    • Conducted latency simulations on E100 model
    • Establish theoretical performance limits
• Between A*CRC and SingAREN in Singapore
  – 2 x 10G links to SingAREN
    • Compare with theoretical values
• Between A*CRC and NTU in Singapore
  – Fibre loop-back
• Between GIS and A*CRC
  – E400 models, 40 Gbps (32 Gbps sustained throughput)
• A*CRC to ANU in Australia
  – Layer 3
• A*CRC to Titech in Japan
  – Layer 2

Page 42

Table-Top Test at A*CRC (Results)

Initial Setup
• Latency estimated based on speed of light calculation
• Lab results = 7.87 Gbps

Observation
• Dsync+ transfer rate optimized by tuning buffer and IO-block size
• Better performance with fewer but larger chunks of data
• IB SDR data rate maxes out at 8 Gbps due to overheads from the encoding scheme
• 984 MBps (7.87 Gbps) = ~98.4% efficiency
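The 8 Gbps ceiling and the 98.4% figure can be reproduced with a few lines of arithmetic. The sketch assumes the "encoding scheme" is the 8b/10b line code used by InfiniBand SDR on a 4x link (4 lanes at 2.5 Gbps signalling); the slide itself names neither the code nor the link width, so both are labelled assumptions here.

```python
# Assumptions: 4x SDR link (4 lanes x 2.5 Gbps signalling) with 8b/10b line coding.
SDR_4X_SIGNAL_GBPS = 4 * 2.5
ENCODING_EFFICIENCY = 8 / 10  # 8 data bits carried per 10 line bits


def usable_rate_gbps(signal_gbps=SDR_4X_SIGNAL_GBPS, efficiency=ENCODING_EFFICIENCY):
    """Data rate left after line-coding overhead."""
    return signal_gbps * efficiency


def link_efficiency(measured_mb_s, usable_gbps):
    """Fraction of the usable data rate achieved by a measured transfer."""
    measured_gbps = measured_mb_s * 8 / 1000  # MB/s -> Gbps (decimal units)
    return measured_gbps / usable_gbps


ceiling = usable_rate_gbps()
print(f"Usable data rate: {ceiling:.1f} Gbps")
print(f"dsync+ at 984 MB/s: {link_efficiency(984, ceiling):.1%} of the ceiling")
```

Under these assumptions, 984 MB/s corresponds to 7.872 Gbps, which is 98.4% of the 8 Gbps data rate.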

Page 43

NTU Fibre Loopback (Results)

Setup
• Testing at A*STAR across ~50 km using dark fibres looping back at NTU

Observation
• Achieved above 90% efficiency

984 MB/s / 7.87 Gbps ≈ 98.4%

225 µs

Page 44

SingAREN Setup

• Conduct performance testing on SLIX infrastructure, across 50 km
• With Layer 2 access and trunk links

[Diagram labels: A*STAR Mux, SingAREN Mux, SLIX Infrastructure, Router, 10GE, 10GE, 2 x 10G Links, 24KM, 24KM]

Page 45

Testing with SingAREN (Results)

• Latency increased when traffic passed through network equipment (router and switches)
• Performance on the HPL test is inversely related to latency. It was observed that CPUs were not running at full capacity as latency increased
• No difference was observed in performance across access or trunk links

[Chart: Performance of HPL (axis 0 to 500). Category labels: Tabletop (1m); SingAREN 50km (layer 2 across SLIX); SingAREN 24km (Layer 2 trunk across SLIX). Values in plotted order: MPI PingPong (usec): 5.1, 225.2, 437.07, 225.2, 229.85; HPL (problem size 86016) [GFLOPS]: 467.3, 448.954, 379.699, 406.112, 405.389.]

Page 46

Testing with SingAREN (Results)

[Chart: Performance of Dsync (axis 0 to 1000). Category labels: Tabletop (1m); SingAREN 50km (layer 2 across SLIX); SingAREN 24km (Layer 2 trunk across SLIX). Values in plotted order: MPI PingPong (usec): 5.1, 225.2, 437.07, 225.2, 229.85; Dsync (MB/s): 984, 984, 952, 953, 957.]

• Performance of Dsync is not affected by the latency increase across 50 km and network equipment

Page 47

Testing with Australia-NCI (Setup)

• Conduct test with NCI over layer 3 network with capacity capped at 1 Gbps
• Traffic routed from Canberra/Sydney/SG, distance approx. 14,500 km

[Diagram: NCI/A*CRC network topology for Longbow experiments, © National Computational Infrastructure, July 2014, Version 1.1.
• Canberra, ACT (Leonard Huxley DC, NCI DC): NCI Raijin Computation Fabric, NCI Storage IB Fabric, NCI server, Longbow E100, ncihuxhub 6509, AARNet 7604 CPE
• AARNet Australian Backbone (AARNet3), following the Sydney path out to Singapore as of July 29th, 2014; Australian International Links, 2.5 Gbit (STM-16)
• Singapore: SingAREN, A*CRC routing, A*CRC switching (Nexus 5596), Longbow E100, A*CRC @ A*STAR
• Link legend: InfiniBand links, Ethernet links; link labels 1 GbE, 1 (?) GbE, 10GbE, 20G
• Addressing: VLAN 192 (192.43.239.0/24), gateway 192.43.239.1; 192.43.239.0/25 routed out of Singapore to ensure On-Net and using AARNet's separate 1GbE connection into SingAREN; addresses 192.43.239.43, 192.43.239.44, 192.43.239.50 (management interface), 202.130.56.133]

Page 48

Testing with Australia - NCI (Results)

• Poor performance at the beginning:
  – RTT ping 147 ms
  – 50 Mbps using iperf benchmark
  – dsync consistent at 6 MB/s (48 Mbps), failed with tweaked settings
  – MPI PingPong failed to send 4 MB messages
• Required some investigation:
  – Used iperf to test different segments of the A*CRC – NCI network link
  – Found that jumbo frames could not be sent end to end. Worked around by reducing traffic MTU
• Best results obtained at the end:
  – 800 Mbps using iperf
  – dsync 20 MB/s (did not try optimal settings due to time constraints)
  – 86 MB/s (688 Mbps) using IB RDMA Write benchmark (172 MB/s bidirectional)
  – 120 ms RTT PingPong after identifying and fixing an MTU mismatch
• Open issues still remain after the experiment.

[Chart: Performance of IB (%) (axis 0 to 100). Category labels: Tabletop (1m); SingAREN 24km (layer 2 across SLIX). Values in plotted order: 98.4, 98.4, 95.2, 95.3, 95.7, 68.8.]

Page 49: InfiniCortex - HPC Advisory Council€¦ · ( All devices support secure standards-based out-of-band management over Ethernet and serial ports, including SNMP, SSH-CLI (Cisco syntax)

Testing  with  Japan  –  Titech  (Setup)

TITANET

SINET4  (40Gbps)

kote-­‐dc-­‐L2S1.s4.sinet.ad.jp

sin-­‐dc-­‐L2S1.tein-­‐jp.net

SingAREN

10  Gbps  VLAN:3622,3623,3624

20  Gbps  VLAN:2001,2002,2003

20  Gbps  VLAN:2001,2002,2003

20  Gbps  VLAN:2001,2002,2003

10  Gbps  VLAN:2001,2002,2003

10  Gpbs  VLAN:2001,2002,2003

10  Gbps  VLAN:3622,3623,3624

10  Gbps  VLAN:3622,3623,3624

10  Gbps  VLAN:3622,3623,3624

10  Gbps  VLAN:3622,3623,3624

10  Gbps  VLAN:3622,3623,3624

Kfc.r.gsic.titech.ac.jp

netperf-­‐t2.g.gsic.titech.ac.jpsingapore-­‐mx80-­‐1.jgn-­‐x.jp kote-­‐dc-­‐gm1.tein-­‐jp.net

Kfc042.Kfc.r.gsic.titech.ac.jp

Archer3

20  Gbps  VLAN:2001,2002,2003

10  Gbps  VLAN:  2001

TITECH-­‐E100

ASTAR-­‐E100

10  Gbps  VLAN:  3622

10  Gbps  VLAN:  3624

(Traffic  Shaping:  8Gbps)

Page 50

Testing with Japan – Titech (Results)

Setup
• 10G circuit, capped at 8 Gbps
• Layer 2 network with MTU at 9000
• Approx. 6000 km
• Pre-testing of circuit with iperf, able to achieve 8 Gbps

Results
• Setup of circuit was very smooth
• Able to achieve 5.5 Gbps throughput during the day
• Able to achieve 7 Gbps in the late evening
• Titech network switches are also shared by other servers doing large file transfers.

Observation
• Traffic intermittently dropped to 1.8 Gbps when the rate limit was set above 5.5 Gbps, and out-of-sequence packets were observed. This error occurs when the link is experiencing packet loss.
• Workaround: lower the rate limit to avoid packet loss due to congestion and achieve good performance.
• Measured performance is significantly better than the theoretical value.
• Further investigation is needed on packet loss when approaching the higher bandwidths

Page 51

Performance of InfiniBand over Long Distances

Link                                   IB (%)   MPI PingPong (msec)
SingAREN 24km (layer 2 across SLIX)    95.3     0.2252
Japan 5000km (8Gbps Layer 2)           87.5     35.0519
Australia 14500km (1Gbps Layer 3)      68.8     73.5061

Page 52: InfiniCortex - HPC Advisory Council€¦ · ( All devices support secure standards-based out-of-band management over Ethernet and serial ports, including SNMP, SSH-CLI (Cisco syntax)

Summary

• Best performance for InfiniBand over long distance
– Low latency
– Choose the shortest possible path with reduced overheads from network devices
– Support jumbo frames
– Direct lightpath preferred
– Dedicated circuit preferred over shared bandwidth. If shared bandwidth is unavoidable, enable QoS and encryption
– Prevent packet loss resulting from link congestion

• Tuning of server and network configuration is required to achieve good performance

• National Research and Education Networks (NRENs) play an important role in making this a success


The  Singaporean  Team

SingAREN
A/Prof Francis Lee
Prof Lawrence Wong

NTU
Stanley Goh

A*CRC
Dr Marek Michalewicz (PI)
A/Prof Tan Tin Wee (PI)
Prof Yuefan Deng (PI, Galaxy of Supercomputers)
Yves Poppe (International Carriers)
Tan Geok Lian (Networking)
Lim Seng (Networking)
Dr Jonathan Low (H/W, S/W, Applications)
Dr Dominic Chien (S/W, Applications)
Dr Liou Sing-Wu (S/W, Applications)
Paul Hiew (H/W)
Stephen Wong (iVEC connection)


The  American  Team

Oak Ridge National Lab
Dr Scott Klasky
Dr Jong Choi

Rutgers University
Prof Manish Parashar

Georgia Tech
Prof Matthew Wolf
Prof Greg Eisenhauer

Stony Brook University
Prof Deng Yuefan
Prof Tahsin Kurc

University of Tennessee
Glenn Brook


The  Australian  Team

NCI Directorate
Prof. Lindsay Botten
Allan Williams

NCI HPC Systems & Cloud Services
Dr. Muhammad Atif
Andrew Wellington
Dongyang Li
Jakub Chrzeszczyk

NCI Storage & Infrastructure
Daniel Rodwell

Others at NCI (Network)
Jason Andrade
Darren Coleman

AARNet
Bruce Morgan

iVEC
Dr George Beckett
Jenni Harrison
Chris Schlipalius

Canberra:  National  Computing  Infrastructure

Perth:  iVEC


The  Japanese  Team

TiTech: Tsubame-KFC: Prof Satoshi Matsuoka
TEIN-JP NOC Team, KDDI
SINET Team, NII
NICT


Obsidian Strategics
Dr. David Southwell
Jason Gunthorpe


Commercial Sponsors and Partners

OpticAccess BIG

BruHaas Tata Communications

Ciena    Infinera    ESnet    SCinet    

Internet  2  AARnet    SingAREN  

DDN  Intel  SGI  

Mellanox