Satinder Nijjar, 2018 High speed networking Configuration and testing



InfiniBand Setup and Verification


Info: Checking InfiniBand

Using standard Linux commands

Check that the InfiniBand cards are visible to the OS and that the drivers are loaded

lspci, lsmod

Check which version of the software is installed

modinfo mlx5_core

Check the firmware version on the HCA

cat /sys/class/infiniband/mlx5*/fw_ver


Verify ConnectX Card is Working

Run “lspci” to ensure all four IB cards are recognized by the system. The output should show all four controllers.

Run “lsmod” and verify that the InfiniBand drivers are present. The output should consist of a list of ib_ and mlx_ driver components.

$ lspci | grep -i mellanox

05:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]

0c:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]

84:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]

8b:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]

$ lsmod | grep -e ib_ -e mlx_

ib_ucm 20480 0

ib_ipoib 131072 0

ib_cm 45056 3 rdma_cm,ib_ucm,ib_ipoib

ib_uverbs 73728 2 ib_ucm,rdma_ucm

ib_umad 24576 0

mlx5_ib 192512 0

mlx4_ib 192512 0

ib_sa 36864 5 rdma_cm,ib_cm,mlx4_ib,rdma_ucm,ib_ipoib

ib_mad 57344 4 ib_cm,ib_sa,mlx4_ib,ib_umad

ib_core 143360 13 rdma_cm,ib_cm,ib_sa,iw_cm,nv_peer_mem,mlx4_ib,mlx5_ib,ib_mad,ib_ucm,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib

ib_addr 20480 3 rdma_cm,ib_core,rdma_ucm

ib_netlink 16384 3 rdma_cm,iw_cm,ib_addr

mlx4_core 344064 2 mlx4_en,mlx4_ib

mlx5_core 524288 1 mlx5_ib

mlx_compat 16384 18 rdma_cm,ib_cm,ib_sa,iw_cm,mlx4_en,mlx4_ib,mlx5_ib,ib_mad,ib_ucm,ib_netlink,ib_addr,ib_core,ib_umad,ib_uverbs,mlx4_core,mlx5_core,rdma_ucm,ib_ipoib
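The card check above can be scripted. This is a minimal sketch that assumes the lspci text format shown on this slide; the function names are hypothetical and not part of any Mellanox tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count Mellanox controllers in lspci-style text on stdin.
count_mellanox_controllers() {
    # grep -c prints the number of matching lines; -i ignores case.
    # "|| true" keeps a zero-match result (grep exit status 1) from being fatal.
    grep -ci 'mellanox' || true
}

# Hypothetical helper: succeed only if the expected number of cards is visible.
check_controller_count() {
    local expected="$1" found
    found="$(count_mellanox_controllers)"
    if [ "$found" -eq "$expected" ]; then
        echo "OK: $found Mellanox controllers found"
    else
        echo "FAIL: expected $expected Mellanox controllers, found $found"
        return 1
    fi
}
```

On the system shown here it would be run as `lspci | check_controller_count 4`.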


Verify Software Release

Verify that the OFED software is correctly installed. The expected OFED version depends on the DGX-1 OS release:

DGX-1 OS release 1.0 - OFED software 3.2

DGX-1 OS release 2.0 - OFED software 3.4

DGX-1 OS release 3.0 - OFED software 4.0

DGX-1 OS release 4.0 - OFED software 4.4

$ modinfo mlx5_core | grep -i version | head -1

Version : 4.4-2.0.7

Restart the InfiniBand service

$ sudo service openibd restart

Restart the Subnet Manager service

$ sudo service opensmd restart

Verify that both services have started

$ service openibd status

$ service opensmd status

For further reference, check the User Guide chapter on InfiniBand card replacement:

http://docs.nvidia.com/dgx/dgx1-user-guide/maintenance.html#task_setting-up-infiniband
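The release table on this slide can be encoded as a lookup. This is a sketch only: the function names are hypothetical, and the match is done on the major.minor prefix of the driver version string.

```shell
#!/usr/bin/env bash
# Expected OFED release per DGX-1 OS release, taken from the list on this slide.
expected_ofed_for_dgx_os() {
    case "$1" in
        1.0) echo "3.2" ;;
        2.0) echo "3.4" ;;
        3.0) echo "4.0" ;;
        4.0) echo "4.4" ;;
        *)   return 1 ;;
    esac
}

# Hypothetical helper: compare an installed driver version such as "4.4-2.0.7"
# with the OFED release expected for the given DGX-1 OS release.
# Returns 0 on match, 1 on mismatch, 2 if the OS release is not in the table.
ofed_matches_dgx_os() {
    local os_release="$1" installed="$2" expected
    expected="$(expected_ofed_for_dgx_os "$os_release")" || return 2
    # Accept the bare "major.minor" or that prefix followed by "-" or ".".
    case "$installed" in
        "$expected"|"$expected"-*|"$expected".*) return 0 ;;
        *) return 1 ;;
    esac
}
```

The installed version could be supplied from `modinfo -F version mlx5_core`, for example `ofed_matches_dgx_os 4.0 "$(modinfo -F version mlx5_core)"`.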


Verify Firmware Release

Verify the firmware version. The expected firmware version for each DGX-1 OS release is:

DGX-1 OS release 1.0: 12.16.1020

DGX-1 OS release 2.0: 12.17.1010

DGX-1 OS release 3.0: 12.18.1000

DGX-1 OS release 4.0: 12.24.1000

If the firmware version does not match, run the mlnx_fw_updater.pl script to update it. After the reboot, check fw_ver again to confirm the version.

$ cat /sys/class/infiniband/mlx5*/fw_ver

12.24.1000

12.24.1000

12.24.1000

12.24.1000

$ sudo /opt/mellanox/mlnx-fw-updater/mlnx_fw_updater.pl

Attempting to perform Firmware update...

Querying Mellanox devices firmware ...

Device #1:

----------

Device Type: ConnectX4

Part Number: MCX455A-ECA_Ax

Description: ConnectX-4 VPI adapter card; EDR IB (100Gb/s) and 100GbE; single-port QSFP28; PCIe3.0 x16; ROHS R6

PSID: MT_2180110032

PCI Device Name: 05:00.0

Base GUID: 248a0703004a5368

Base MAC: 0000248a074a5368

Versions: Current Available

FW 12.16.1020 12.24.1000

PXE 3.4.0812 3.5.0603

UEFI 14.16.0017 14.17.0011

Status: Update required

…snipped…

Status: Up to date

Log File: /tmp/mlnx_fw_update.log

Please reboot your system for the changes to take effect.
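The firmware comparison can be automated. This sketch reads one version per line, as the fw_ver files produce, and flags any card that differs; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one firmware version per line on stdin and report
# any line that differs from the expected version given as the first argument.
check_fw_versions() {
    local expected="$1" line n=0 bad=0
    while IFS= read -r line; do
        n=$((n + 1))
        if [ "$line" != "$expected" ]; then
            echo "card $n: firmware $line, expected $expected"
            bad=$((bad + 1))
        fi
    done
    echo "$n cards checked, $bad mismatched"
    # Exit status 0 only when every card matched.
    [ "$bad" -eq 0 ]
}
```

Example use: `cat /sys/class/infiniband/mlx5*/fw_ver | check_fw_versions 12.24.1000`.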


Challenge: IB Setup & Verification

How many InfiniBand Cards are installed?

What are the PCI bus addresses of all the IB cards on your system?

What version of OFED software is present?

What is the firmware version on each card?

Hint: “ibstat -l” will list all Mellanox devices


Solution: IB Setup & Verification

How many InfiniBand Cards are installed?

What are the PCI bus addresses of all the IB cards on your system?

Use the “lspci” command to list all PCI buses and devices in the system. Four cards are installed:

$ lspci | grep -i mellanox

05:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]

0c:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]

84:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]

8b:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]

(Figure: card locations, labelled ib0 to ib3, and the retention clip)

ib0: PCI 05:00.0

ib1: PCI 0c:00.0

ib2: PCI 84:00.0

ib3: PCI 8b:00.0


Solution

What version of OFED software is present?

$ modinfo mlx5_core | grep -i version | head -1

Version : 4.4-2.0.7

What is the firmware version on each card?

$ cat /sys/class/infiniband/mlx5*/fw_ver

12.17.1010

12.17.1010

12.17.1010

12.17.1010


IB/Ethernet Mode


Info: InfiniBand or Ethernet

Why IB mode?

IB is the default mode for DGX-1 clustering

For multi node training

Why Ethernet mode?

NCCL with RDMA can also be used for multi node training (RoCE)

Customer can leverage existing NAS systems

How do you change modes?

Using the Mellanox Software Tools, set the link type of each device at “/dev/mst/mt4115_pciconf#” to 1 for InfiniBand or 2 for Ethernet


UPDATING LINK PROTOCOL

a) Run “lspci” to identify the current link protocol. To reset the link type, download and install the Mellanox Firmware Tools (MFT) from http://www.mellanox.com/page/management_tools

b) Start the mst driver by typing “sudo mst start”. Query the host for the Mellanox device ID MT4115. This system has four adapters, 0-3.

c) Set the link type: InfiniBand = 1, Ethernet = 2. E.g. set device 0 to link type 1.

Repeat to set the link type for each adapter. Reboot. Repeat step a) to verify the new settings.

$ lspci | grep -i mellanox

05:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]

0c:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]

84:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]

8b:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]

OR

05:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]

0c:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]

84:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]

8b:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]

# ls -ls /dev/mst

0 crw------- 1 root root 238, 0 Mar 13 15:44 mt4115_pciconf0

0 crw------- 1 root root 238, 0 Mar 13 15:44 mt4115_pciconf1

0 crw------- 1 root root 238, 0 Mar 13 15:44 mt4115_pciconf2

0 crw------- 1 root root 238, 0 Mar 13 15:44 mt4115_pciconf3

# mlxconfig -d /dev/mst/mt4115_pciconf0 set LINK_TYPE_P1=1

Device #1:

----------

Device type: ConnectX4

Name: N/A

Description: N/A

Device: /dev/mst/mt4115_pciconf0

Configurations: Next Boot New

LINK_TYPE_P1 IB(1) IB(1)

Apply new Configuration? (y/n) [n] : _
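Setting the link type on every adapter is repetitive, so a dry-run generator can help. This sketch only prints the commands and does not execute them. The function names are hypothetical, the device paths follow the mt4115_pciconf# pattern on this slide, and the -y flag (answer yes to the prompt) is the one used later in this deck.

```shell
#!/usr/bin/env bash
# Map a link-type name to the LINK_TYPE_P1 value used on this slide.
link_type_value() {
    case "$1" in
        ib|IB|infiniband|InfiniBand) echo 1 ;;
        eth|ETH|ethernet|Ethernet)   echo 2 ;;
        *) echo "unknown link type: $1" >&2; return 1 ;;
    esac
}

# Hypothetical dry-run helper: print (do not execute) one mlxconfig command
# per adapter. The adapter count defaults to four, as on the system shown here.
print_link_type_commands() {
    local value count="${2:-4}" i=0
    value="$(link_type_value "$1")" || return 1
    while [ "$i" -lt "$count" ]; do
        echo "sudo mlxconfig -y -d /dev/mst/mt4115_pciconf$i set LINK_TYPE_P1=$value"
        i=$((i + 1))
    done
}
```

Calling `print_link_type_commands eth` lists the four commands for review before anything is changed; a reboot is still needed afterwards.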


Determine the Port Configuration

Run “ibv_devinfo | grep -e "hca_id\|state\|link_layer"” to determine the current link configuration and state. “ibstat” could also have been used to collect this information.

Cards configured for InfiniBand

$ ibv_devinfo | grep -e "hca_id\|state\|link_layer"

hca_id: mlx5_3
        state: PORT_ACTIVE (4)
        link_layer: InfiniBand

hca_id: mlx5_2
        state: PORT_ACTIVE (4)
        link_layer: InfiniBand

hca_id: mlx5_1
        state: PORT_ACTIVE (4)
        link_layer: InfiniBand

hca_id: mlx5_0
        state: PORT_ACTIVE (4)
        link_layer: InfiniBand

Cards configured for Ethernet

$ ibv_devinfo | grep -e "hca_id\|state\|link_layer"

hca_id: mlx5_3
        state: PORT_ACTIVE (4)
        link_layer: Ethernet

hca_id: mlx5_2
        state: PORT_ACTIVE (4)
        link_layer: Ethernet

hca_id: mlx5_1
        state: PORT_ACTIVE (4)
        link_layer: Ethernet
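For a quick overview, the ibv_devinfo output can be condensed to one line per card. This is a sketch that assumes the field labels shown on this slide (hca_id, state, link_layer); the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: condense ibv_devinfo output (stdin) to one line per HCA:
#   <hca_id> <port state> <link layer>
summarize_ibv_devinfo() {
    awk '
        $1 == "hca_id:"     { hca = $2 }            # remember the current card
        $1 == "state:"      { state = $2 }          # e.g. PORT_ACTIVE
        $1 == "link_layer:" { print hca, state, $2 } # emit once per port
    '
}
```

Example use: `ibv_devinfo | summarize_ibv_devinfo`.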


Verify Status using ibstat

$ ibstat

CA 'mlx5_0'
        CA type: MT4115
        Number of ports: 1
        Firmware version: 12.24.1000
        Hardware version: 0
        Node GUID: 0x248a0703000de288
        System image GUID: 0x248a0703000de288
        Port 1:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 65535
                LMC: 0
                SM lid: 0
                Capability mask: 0x2651e848
                Port GUID: 0x248a0703000de288
                Link layer: InfiniBand

CA 'mlx5_1'
        CA type: MT4115
        Number of ports: 1
        Firmware version: 12.24.1000
        Hardware version: 0
        Node GUID: 0x248a0703000de26c
        System image GUID: 0x248a0703000de26c
        Port 1:
                State: Initializing
                Physical state: LinkUp
                Rate: 100
                Base lid: 65535
                LMC: 0
                SM lid: 0
                Capability mask: 0x2651e848
                Port GUID: 0x248a0703000de26c
                Link layer: InfiniBand

CA 'mlx5_2'
        CA type: MT4115
        Number of ports: 1
        Firmware version: 12.24.1000
        Hardware version: 0
        Node GUID: 0x248a0703001effde
        System image GUID: 0x248a0703001effde
        Port 1:
                State: Initializing
                Physical state: LinkUp
                Rate: 100
                Base lid: 65535
                LMC: 0
                SM lid: 0
                Capability mask: 0x2651e848
                Port GUID: 0x248a0703001effde
                Link layer: InfiniBand

CA 'mlx5_3'
        CA type: MT4115
        Number of ports: 1
        Firmware version: 12.24.1000
        Hardware version: 0
        Node GUID: 0x7cfe900300118f22
        System image GUID: 0x7cfe900300118f22
        Port 1:
                State: Initializing
                Physical state: LinkUp
                Rate: 100
                Base lid: 65535
                LMC: 0
                SM lid: 0
                Capability mask: 0x2651e848
                Port GUID: 0x7cfe900300118f22
                Link layer: InfiniBand
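The full ibstat listing is long, so the fields that matter for a status check can be pulled out with awk. This sketch assumes the field labels shown on this slide and one port per card; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: condense ibstat output (stdin) to one line per CA:
#   <CA name> <State> <Physical state> <Rate>
summarize_ibstat() {
    awk '
        # "CA <name>" starts a new card; skip the "CA type:" line.
        $1 == "CA" && $2 != "type:" { ca = $2; gsub("\047", "", ca) }
        $1 == "State:"                     { state = $2 }
        $1 == "Physical" && $2 == "state:" { phys = $3 }
        # Rate follows both state lines, so print when it is seen.
        $1 == "Rate:"                      { print ca, state, phys, $2 }
    '
}
```

Example use: `ibstat | summarize_ibstat`.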


Mellanox Software Tools (mst)

Start Mellanox Software Tools (mst) and verify that the module is loaded

$ sudo mst status

MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module is not loaded

PCI Devices:
------------
05:00.0
84:00.0
0c:00.0
8b:00.0

$ sudo mst start

$ sudo mst status

MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module loaded

MST devices:
------------
/dev/mst/mt4115_pciconf0 - PCI configuration cycles access.
                           domain:bus:dev.fn=0000:05:00.0 addr.reg=88 data.reg=92
                           Chip revision is: 00

/dev/mst/mt4115_pciconf1 - PCI configuration cycles access.
                           domain:bus:dev.fn=0000:0c:00.0 addr.reg=88 data.reg=92
                           Chip revision is: 00

/dev/mst/mt4115_pciconf2 - PCI configuration cycles access.
                           domain:bus:dev.fn=0000:84:00.0 addr.reg=88 data.reg=92
                           Chip revision is: 00

/dev/mst/mt4115_pciconf3 - PCI configuration cycles access.
                           domain:bus:dev.fn=0000:8b:00.0 addr.reg=88 data.reg=92
                           Chip revision is: 00


Update the port configurations to Ethernet

Change the configuration on all four ports

$ sudo mlxconfig -y -d /dev/mst/mt4115_pciconf0 set LINK_TYPE_P1=2

$ sudo mlxconfig -y -d /dev/mst/mt4115_pciconf1 set LINK_TYPE_P1=2

$ sudo mlxconfig -y -d /dev/mst/mt4115_pciconf2 set LINK_TYPE_P1=2

$ sudo mlxconfig -y -d /dev/mst/mt4115_pciconf3 set LINK_TYPE_P1=2

Verify the configuration changes were applied

$ sudo mlxconfig query | grep -e LINK_TYPE -e "Device.*mst"

PCI device: /dev/mst/mt4115_pciconf3 LINK_TYPE_P1 ETH(2)

PCI device: /dev/mst/mt4115_pciconf2 LINK_TYPE_P1 ETH(2)

PCI device: /dev/mst/mt4115_pciconf1 LINK_TYPE_P1 ETH(2)

PCI device: /dev/mst/mt4115_pciconf0 LINK_TYPE_P1 ETH(2)

Reboot the system

$ sudo reboot

Verify that the desired configuration is now running

$ ibv_devinfo | grep -e "hca_id\|link_layer"

hca_id: mlx5_3 link_layer: Ethernet

hca_id: mlx5_2 link_layer: Ethernet

hca_id: mlx5_1 link_layer: Ethernet

hca_id: mlx5_0 link_layer: Ethernet
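The query step can be turned into a pass/fail check. This sketch assumes each LINK_TYPE_P1 line ends with the configured value, as in the output on this slide; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "mlxconfig query" output on stdin and succeed only
# if every LINK_TYPE_P1 line ends with the expected value, e.g. "ETH(2)".
all_ports_have_link_type() {
    local expected="$1"
    awk -v want="$expected" '
        /LINK_TYPE_P1/ { total++; if ($NF == want) ok++ }
        END {
            printf "%d of %d ports set to %s\n", ok, total, want
            # Exit 0 only if at least one port was seen and all of them match.
            exit !(total > 0 && ok == total)
        }
    '
}
```

Example use: `sudo mlxconfig query | all_ports_have_link_type 'ETH(2)'`.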


Update the port configurations to InfiniBand

Change the configuration on all four ports

$ sudo mlxconfig -y -d /dev/mst/mt4115_pciconf0 set LINK_TYPE_P1=1

$ sudo mlxconfig -y -d /dev/mst/mt4115_pciconf1 set LINK_TYPE_P1=1

$ sudo mlxconfig -y -d /dev/mst/mt4115_pciconf2 set LINK_TYPE_P1=1

$ sudo mlxconfig -y -d /dev/mst/mt4115_pciconf3 set LINK_TYPE_P1=1

Verify the configuration changes were applied

$ sudo mlxconfig query | grep -e LINK_TYPE -e "Device.*mst"

PCI device: /dev/mst/mt4115_pciconf3 LINK_TYPE_P1 IB(1)

PCI device: /dev/mst/mt4115_pciconf2 LINK_TYPE_P1 IB(1)

PCI device: /dev/mst/mt4115_pciconf1 LINK_TYPE_P1 IB(1)

PCI device: /dev/mst/mt4115_pciconf0 LINK_TYPE_P1 IB(1)

Reboot the system

$ sudo reboot

Verify that the desired configuration is now running

$ ibv_devinfo | grep -e "hca_id\|link_layer"

hca_id: mlx5_3 link_layer: InfiniBand

hca_id: mlx5_2 link_layer: InfiniBand

hca_id: mlx5_1 link_layer: InfiniBand

hca_id: mlx5_0 link_layer: InfiniBand


Challenge: IB/Ethernet mode switching

Determine the current port configuration

Start Mellanox Software Tools (mst)

Set one of the cards (mt4115_pciconf#) to Ethernet mode


Solution: IB/Ethernet mode switching

Determine the current port configuration of all 4 cards

$ ibv_devinfo | grep -e "hca_id\|link_layer"

hca_id: mlx5_3
        link_layer: InfiniBand

hca_id: mlx5_2
        link_layer: InfiniBand

hca_id: mlx5_1
        link_layer: InfiniBand

hca_id: mlx5_0
        link_layer: InfiniBand


Solution: IB/Ethernet mode switching

MST tools

$ sudo mst status

MST modules:

------------

MST PCI module is not loaded

MST PCI configuration module is not loaded

PCI Devices:

------------

05:00.0

84:00.0

0c:00.0

8b:00.0

$ sudo mst start

$ sudo mst status

MST modules:

------------

MST PCI module is not loaded

MST PCI configuration module loaded

MST devices:

------------

/dev/mst/mt4115_pciconf0 - PCI configuration cycles access.

domain:bus:dev.fn=0000:05:00.0 addr.reg=88

data.reg=92

Chip revision is: 00

/dev/mst/mt4115_pciconf1 - PCI configuration cycles access.

domain:bus:dev.fn=0000:0c:00.0 addr.reg=88

data.reg=92

Chip revision is: 00

/dev/mst/mt4115_pciconf2 - PCI configuration cycles access.

domain:bus:dev.fn=0000:84:00.0 addr.reg=88

data.reg=92

Chip revision is: 00

/dev/mst/mt4115_pciconf3 - PCI configuration cycles access.

domain:bus:dev.fn=0000:8b:00.0 addr.reg=88

data.reg=92

Chip revision is: 00


Solution: IB/Ethernet mode switching

Set the 4th card to Ethernet mode

$ sudo mlxconfig -y -d /dev/mst/mt4115_pciconf0 set LINK_TYPE_P1=2

Device #4:
----------

Device type: ConnectX4
PCI device: /dev/mst/mt4115_pciconf0

Configurations: Next Boot New
LINK_TYPE_P1 IB(1) ETH(2)

Apply new Configuration? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.

$ sudo reboot

(have a coffee)

$ ibv_devinfo | grep -e "hca_id\|link_layer"

hca_id: mlx5_3
        link_layer: Ethernet


Solution: IB/Ethernet mode switching

Set the 4th card to InfiniBand mode

$ sudo mst start

$ sudo mlxconfig -y -d /dev/mst/mt4115_pciconf0 set LINK_TYPE_P1=1

Device #1:
----------

Device type: ConnectX4
PCI device: /dev/mst/mt4115_pciconf0

Configurations: Next Boot New
LINK_TYPE_P1 ETH(2) IB(1)

Apply new Configuration? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.

$ sudo reboot

(have a coffee)

$ ibv_devinfo | grep -e "hca_id\|link_layer"

hca_id: mlx5_3
        link_layer: InfiniBand


Bandwidth & Latency between nodes


Info: Bandwidth and Latency

ib_read_bw

This command is part of the Mellanox perftest package (https://community.mellanox.com/docs/DOC-2086) and is installed with MLNX_OFED. It measures the bandwidth of RDMA read operations between a pair of DGX-1s.

Example:

(server)

ib_read_bw -d mlx5_2

(client)

ib_read_bw -d mlx5_2 --report_gbits <server IP address>

ib_read_lat

This command measures the latency of RDMA read operations for a given message size between a pair of DGX-1s.

Example:

(server)

ib_read_lat -d mlx5_2

(client)

ib_read_lat -d mlx5_2 <server IP address>


Team Challenge: Bandwidth and Latency

● What is the bandwidth between two DGX-1s?

● Is the bandwidth the same in both directions?

● What is the latency between two DGX-1s?

● Is the latency the same in both directions?

● How does latency compare to ICMP (“ping”)?


Solution: Bandwidth and Latency

● What is the bandwidth between two DGX-1s?

(server, to see the ip) $ ip addr

(server) $ ib_read_bw -d mlx5_0

(client) $ ib_read_bw --report_gbits <ip addr>

---------------------------------------------------------------------------------------
                    RDMA_Read BW Test
 Dual-port       : OFF          Device         : mlx5_3
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Outstand reads  : 16
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x23 QPN 0x2fe6 PSN 0xe11032 OUT 0x10 RKey 0x1bb7a3 VAddr 0x002aaaaab30000
 remote address: LID 0x1b QPN 0x493d PSN 0x6afd86 OUT 0x10 RKey 0x09f4bc VAddr 0x002aaaaab30000
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
Conflicting CPU frequency values detected: 2862.750000 != 1453.460000. CPU Frequency is not max.
 65536      1000             95.35              95.31              0.181792
---------------------------------------------------------------------------------------
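To compare runs, the average bandwidth can be extracted from the result row and expressed against the nominal link rate. This is a sketch with hypothetical function names; it assumes the five-column result row shown above, and the 100 Gb/s figure is the EDR rate reported elsewhere in this deck.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the "BW average[Gb/sec]" column from ib_read_bw
# output on stdin. Result rows are recognised as lines of five fields that
# start with a numeric message size.
extract_bw_average() {
    awk 'NF == 5 && $1 ~ /^[0-9]+$/ && $4 ~ /^[0-9.]+$/ { print $4 }'
}

# Print a measured average (Gb/s) as a percentage of a nominal link rate (Gb/s).
bw_percent_of_line_rate() {
    awk -v avg="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", 100 * avg / rate }'
}
```

For the run above, `bw_percent_of_line_rate 95.31 100` reports that the link reached about 95% of its nominal rate.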


Solution: Bandwidth and Latency (contd.)

● What is the latency between two DGX-1s?

(server, to see the ip) $ ip addr

(server) $ ib_read_lat -d mlx5_0

(client) $ ib_read_lat <ip addr>

---------------------------------------------------------------------------------------
                    RDMA_Read Latency Test
 Dual-port       : OFF          Device         : mlx5_3
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 TX depth        : 1
 Mtu             : 4096[B]
 Link type       : IB
 Outstand reads  : 16
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x23 QPN 0x2fe7 PSN 0xd8e1de OUT 0x10 RKey 0x1bae95 VAddr 0x002aaaaaad9000
 remote address: LID 0x1b QPN 0x493e PSN 0x56549e OUT 0x10 RKey 0x0a60df VAddr 0x002aaaaaadb000
---------------------------------------------------------------------------------------
 #bytes #iterations t_min[usec] t_max[usec] t_typical[usec] t_avg[usec] t_stdev[usec] 99% percentile[usec] 99.9% percentile[usec]
Conflicting CPU frequency values detected: 2011.625000 != 2624.273000. CPU Frequency is not max.
Conflicting CPU frequency values detected: 2671.710000 != 3597.257000. CPU Frequency is not max.
 2      1000        2.35        17.08       2.40            2.43        0.55          2.48                 17.08
---------------------------------------------------------------------------------------


Solution: Bandwidth and Latency (contd.)

● How does latency compare to ICMP (“ping”)?

(server, to see the ip) $ ip addr

(client) $ ping <ip addr>

PING 10.31.229.56 (10.31.229.56) 56(84) bytes of data.

64 bytes from 10.31.229.56: icmp_seq=1 ttl=64 time=0.306 ms

64 bytes from 10.31.229.56: icmp_seq=2 ttl=64 time=0.185 ms

64 bytes from 10.31.229.56: icmp_seq=3 ttl=64 time=0.285 ms

64 bytes from 10.31.229.56: icmp_seq=4 ttl=64 time=0.269 ms

64 bytes from 10.31.229.56: icmp_seq=5 ttl=64 time=0.241 ms

--- 10.31.229.56 ping statistics ---

5 packets transmitted, 5 received, 0% packet loss, time 3998ms

rtt min/avg/max/mdev = 0.185/0.257/0.306/0.043 ms

257 µs over Ethernet vs 2.3 µs over IB (IB is ~100x faster)
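The ratio quoted above can be reproduced from the two measurements. This is a rough sketch with hypothetical function names. Note that ping reports a round trip through the IP stack while ib_read_lat reports RDMA read latency, so the comparison is indicative only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the average RTT (in ms) out of ping's summary line,
# e.g. "rtt min/avg/max/mdev = 0.185/0.257/0.306/0.043 ms".
ping_avg_ms() {
    awk -F'=' '/^rtt / { split($2, a, "/"); print a[2] }'
}

# Ratio of a latency given in milliseconds to a latency given in microseconds.
latency_ratio() {
    awk -v ms="$1" -v us="$2" 'BEGIN { printf "%.0f\n", (ms * 1000) / us }'
}
```

With the ping average of 0.257 ms and the ib_read_lat t_avg of 2.43 µs from the previous slide, `latency_ratio 0.257 2.43` gives roughly 106, which matches the ~100x figure.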


Troubleshooting IB State


Using ibstat to troubleshoot connection states

CA 'mlx5_1'

CA type: MT4115

Number of ports: 1

Firmware version: 12.17.1010

Hardware version: 0

Node GUID: 0x248a0703000de26c

System image GUID: 0x248a0703000de26c

Port 1:

State: Active

Physical state: LinkUp

Rate: 100

Base lid: 65535

LMC: 0

SM lid: 0

Capability mask: 0x2651e848

Port GUID: 0x248a0703000de26c

Link layer: InfiniBand

CA 'mlx5_2'

CA type: MT4115

Number of ports: 1

Firmware version: 12.17.1010

Hardware version: 0

Node GUID: 0x248a0703001effde

System image GUID: 0x248a0703001effde

Port 1:

State: Active

Physical state: LinkUp

Rate: 56

Base lid: 65535

LMC: 0

SM lid: 0

Capability mask: 0x2651e848

Port GUID: 0x248a0703001effde

Link layer: InfiniBand

CA 'mlx5_3'

CA type: MT4115

Number of ports: 1

Firmware version: 12.17.1010

Hardware version: 0

Node GUID: 0x7cfe900300118f22

System image GUID: 0x7cfe900300118f22

Port 1:

State: Inactive

Physical state: LinkUp

Rate: 100

Base lid: 65535

LMC: 0

SM lid: 0

Capability mask: 0x2651e848

Port GUID: 0x7cfe900300118f22

Link layer: InfiniBand


Description of the ibstat output:

Physical State

The physical state field indicates the state of the cable. This is very similar to the link state on Ethernet.

Polling: There is no connection from this card to another card or switch. Check to make sure the cable is installed and the device on the other end of the cable is on and working properly.

LinkUp: There is a link and connection between this node and the device at the other end of the cable. This doesn’t mean it’s configured and ready to send data, just that the physical connection is up.

State

The state shows if the HCA port is up, and if it’s been discovered by the subnet manager.

Down: There is no physical connection between the HCA card in this node and the device at the other end of the cable. This is almost always seen when ‘Physical State’ shows the value ‘Polling’.

Initializing: A physical connection has been made between the HCA in this node and the device at the other end of the cable, but the port hasn’t been discovered by the subnet manager. Make sure you have a managed switch, or more likely that the ‘opensm’ process is running on a node in your cluster.

Active: The physical connection is up and working, and the port has been discovered by the subnet manager. The port is in a normal operational state.

Rate

The rate is the speed at which the port is operating. This should match the speed of the slowest device between the node's HCA and the device at the other end of the cable.
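The state descriptions on this slide can be turned into a small lookup for troubleshooting. This is a sketch that covers only the combinations described here; the function name and message wording are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn the "State" and "Physical state" values reported by
# ibstat into the troubleshooting hint described on this slide.
# Usage: diagnose_port <State> <Physical state>
diagnose_port() {
    local state="$1" phys="$2"
    case "$phys/$state" in
        Polling/*)           echo "no link: check the cable and the device at the other end" ;;
        LinkUp/Initializing) echo "link up, no subnet manager: check the managed switch or opensm" ;;
        LinkUp/Active)       echo "port is operational" ;;
        */Down)              echo "port down: check the physical connection" ;;
        *)                   echo "state not covered here: $state / $phys" ;;
    esac
}
```

For example, `diagnose_port Initializing LinkUp` points at the subnet manager, which is the usual cause when ports stay in the Initializing state.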
