Post on 18-Jul-2015
Open Fabrics workshop, March 2015 State of libfabric in Open MPI 1
The State of libfabric in Open MPI
Jeffrey M. Squyres jsquyres@cisco.com
16 March 2015
What is the Message Passing Interface (MPI)?
A standards document
Using MPI
Hardware and software implement the interface in the MPI standard (book)
MPI implementations
There are many implementations of the MPI standard
Some are closed source
Others are open source
Open MPI
Open MPI is a free, open source implementation of the MPI standard
www.open-mpi.org
Server
Server
MPI abstracts away the underlying network
MPI_Send(…) MPI_Recv(…)
Network
Network MAGIC
Server
Server
Open MPI multiplexes to the underlying network stack
MPI_Send(…) MPI_Recv(…)
TCP • Shared memory • Verbs • MXM • Portals • SCIF • Loopback • uGNI • PSM • libfabric
Two major types of transports
Byte Transport Layer (BTL) plugins
Matching Transport Layer (MTL) plugins
MPI_Send(…)
Byte Transport Layer (BTL) plugins
BTL
• Inherently multi-device
  § Round-robin for small messages
  § Striping for large messages
• Major protocol decisions and MPI message matching driven by an Open MPI engine
Matching Transport Layer (MTL) plugins
MTL
• Most details hidden by network API
  § MXM
  § Portals
  § PSM
• As a side effect, must handle:
  § Process loopback
  § Server loopback (usually via shared memory)
BTL and MTL plugins
Byte Transport Layer (BTL) plugins
• IB / iWarp (verbs) • Portals • SCIF • Shared memory • TCP • uGNI • usNIC (verbs)
Matching Transport Layer (MTL) plugins
• MXM • Portals • PSM
Now featuring 200% more libfabric
Byte Transport Layer (BTL) plugins
• IB / iWarp (verbs) • Portals • SCIF • Shared memory • TCP • uGNI • usNIC
Matching Transport Layer (MTL) plugins
• MXM • Portals • PSM • ofi
libfabric
Linux linker: fun fact
MPI process
libmpi.so
ofi MTL
libfabric.so
Linker auto-loads dependency
usnic BTL and ofi MTL
libfabric.so
Linker does not re-load libfabric.so for the second plugin
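The "does not re-load" behavior can be observed from a script. The sketch below is an analogy under stated assumptions: it opens the C library twice through `ctypes` (standing in for two plugins that both depend on libfabric.so) and checks that one symbol resolves to the same address through both handles. The `symbol_address` helper is a name made up for this example.

```python
import ctypes
import ctypes.util


def symbol_address(library, symbol):
    """Return the address at which `symbol` resolves through `library`."""
    return ctypes.cast(getattr(library, symbol), ctypes.c_void_p).value


# libc stands in for libfabric.so (an assumption, for portability).
# find_library may return None; CDLL(None) then opens the running process
# image, which already has libc mapped.
name = ctypes.util.find_library("c")

first = ctypes.CDLL(name)   # like the first plugin pulling in the dependency
second = ctypes.CDLL(name)  # like the second plugin needing the same library

same_copy = symbol_address(first, "strlen") == symbol_address(second, "strlen")
print("single shared copy:", same_copy)
```

Both handles refer to one mapped copy of the library, which is why two plugins can depend on the same libfabric.so without it being loaded twice.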
Libfabric-based plugins
libfabric
usnic BTL
• Cisco developed • usNIC-specific • OFI point-to-point / UD • Tested with usNIC
ofi MTL
• Intel developed • Provider neutral • OFI tag matching • Tested with PSM
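Tag matching, as used by the ofi MTL, can be illustrated with a small model. libfabric's tagged receive takes a tag plus an ignore mask; the sketch below mimics that comparison. The bit layout (`CONTEXT_SHIFT`, `SOURCE_SHIFT`, `TAG_MASK`) and the `pack_tag` helper are hypothetical and are not the layout the ofi MTL actually uses.

```python
CONTEXT_SHIFT = 48      # hypothetical: top 16 bits hold a communicator id
SOURCE_SHIFT = 32       # hypothetical: next 16 bits hold the sender's rank
SOURCE_MASK = 0xFFFF << SOURCE_SHIFT
TAG_MASK = 0xFFFFFFFF   # hypothetical: low 32 bits hold the MPI tag


def pack_tag(context_id, source, mpi_tag):
    """Pack the MPI matching fields into one 64-bit match value."""
    return (context_id << CONTEXT_SHIFT) | (source << SOURCE_SHIFT) | (mpi_tag & TAG_MASK)


def tag_matches(incoming, posted, ignore=0):
    """True if the two values agree on every bit not set in `ignore`."""
    return ((incoming ^ posted) & ~ignore) == 0


message = pack_tag(context_id=1, source=3, mpi_tag=7)

# A receive from "any source" sets the source bits in the ignore mask.
any_source_recv = pack_tag(context_id=1, source=0, mpi_tag=7)
print(tag_matches(message, any_source_recv, ignore=SOURCE_MASK))
```

Wildcards become ignore bits, so the provider can do the matching without knowing anything about MPI semantics.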
First experiment: usnic BTL, verbs → libfabric
Can loosely classify the usnic BTL into two parts:
• verbs bootstrapping
• verbs message passing
sideband bootstrapping
1. Find the corresponding ethX device
2. Obtain MTU
3. Open usNIC-specific configuration options
verbs bootstrapping → libfabric bootstrapping
verbs message passing → libfabric message passing
Comparison results
Pretty much a 1:1 swap of verbs → libfabric calls
Bootstrapping sequence totally different
libfabric requires no sideband bootstrapping
Second experiment: two different libfabric usage models
usnic BTL
• For a specific provider
  § Ask fi_getinfo() for prov_name=“usnic”
• Use usNIC extensions
  § Netmask, link speed, IP device name, etc.
• usNIC-specific error messages
ofi MTL
• For any tag-matching provider
• No extension use
  § 100% portable
• Generic error messages
libfabric performance vs. Linux verbs
[Figure: Open MPI with usNIC, IMB PingPong latency. Time (microseconds, axis from 1.9 to 2.4) vs. buffer size (log scale). Series: imb-pingpong-ompi-1.8-verbs.out, imb-pingpong-ompi-1.8-libfabric.out]
[Figure: Open MPI with usNIC, IMB SendRecv bandwidth. Bandwidth (megabits/second, axis from 61000 to 69000) vs. buffer size. Series: imb-sendrecv-ompi-1.8-verbs.out, imb-sendrecv-ompi-1.8-libfabric.out]
Version roadmap
Git master Main development
v1.8 / Stable release series
v1.9 Feature series
[Timeline diagram: Past, Present, Future; three libfabric markers; Dec 2014]
Currently embedding libfabric
• openmpi-master
  § opal
    • mca
      § common
        • libfabric
          • include
          • prov
          • src
          • …
Because there is no public libfabric release (yet)
Will be removed before Open MPI v1.9 release
Periodic refresh from libfabric GitHub
Moar new libfabric goodness!
Can also build against external libfabric
libfabric
(e.g., installed under $HOME, or in /usr, or …)
External libfabric will be the only model in v1.9
• openmpi-master
libfabric
(e.g., installed under $HOME, or in /usr, or …)
Feedback loop = good
• Using libfabric in its (first) intended environment was quite useful
  § Resulted in libfabric pull requests, minor changes, etc.
• Biggest thing missing is the mmunotify functionality
  § …will file a PR/RFC about this soon