Realizing the Performance Potential of the Virtual Interface Architecture
Evan Speight, Hazim Abdel-Shafi, and John K. Bennett
Rice University, Department of Electrical and Computer Engineering
Presented by Constantin Serban, R.U.
VIA Goals
• Communication infrastructure for System Area Networks (SANs)
• Targets mainly high-speed cluster applications
• Efficiently harnesses the communication performance of the underlying networks
Trends
• Peak bandwidth increased by two orders of magnitude over the past decade, while user-level latency decreased only modestly.
• The latency added by the protocol software is typically several times the latency of the underlying transport.
• The problem is especially acute for small messages.
Targets
The VI Architecture addresses the following issues:
• Decrease latency, especially for small messages (which are used for synchronization)
• Increase the aggregate bandwidth (currently only a fraction of the peak bandwidth is utilized)
• Reduce the CPU time spent on per-message overhead
Overhead
Overhead mainly comes from two sources:
• Every network access requires one or two traps into the kernel; the user/kernel mode switch is time-consuming
• Usually two data copies occur:
  – From the user buffer to the message-passing API
  – From the message layer to the kernel buffer
VIA approach
• Remove the kernel from the critical path
  – Move communication code out of the kernel into user space
• Provide a zero-copy protocol
  – Data is sent from and received into the user buffer directly; no message copy is performed
VIA emerged as a standardization effort by Compaq, Intel, and Microsoft.
It was built on several academic ideas:
• The overall architecture is most similar to U-Net
• Essential features are derived from VMMC
Among current implementations:
  – GigaNet cLAN: VIA implemented in hardware
  – Tandem ServerNet: VIA emulated in a software driver
  – Myricom Myrinet: VIA emulated in software in the NIC firmware
VIA architecture
![Slide 8: VIA architecture](https://reader034.fdocuments.us/reader034/viewer/2022051622/56815000550346895dbdca43/html5/thumbnails/8.jpg)
VIA operations
• Set-Up/Tear-Down:
  – VIA is a point-to-point, connection-oriented protocol
  – VI endpoint: the core concept in VIA
• Register/De-Register Memory
• Connect/Disconnect
• Transmit
• Receive
• RDMA
VIA operations
Set-Up/Tear-Down:
• VIA is a point-to-point, connection-oriented protocol
• VI endpoint: the core concept in VIA
• The VipCreateVi function creates a VI endpoint in user space.
• The user-level library passes the call to the kernel agent, which passes the creation information to the NIC.
• The OS thus controls application access to the NIC.
VIA operations - cont’d
Register/De-Register Memory:
• All data buffers and descriptors reside in registered memory
• The NIC performs DMA I/O operations in this registered memory
• Registration pins the pages in physical memory and provides a handle used to manipulate the pages and pass their addresses to the NIC
• It is performed once, usually at the beginning of the communication session
VIA operations - cont’d
Connect/Disconnect:
• Before communication, each endpoint must be connected to a remote endpoint
• The connection request is passed to the kernel agent and down to the NIC
• VIA does not define an addressing scheme; implementations can use existing schemes
VIA operations - cont’d
Transmit/receive:
• The sender builds a descriptor for the message to be sent. The descriptor points to the actual data buffer. Both the descriptor and the data buffer reside in a registered memory area.
• The application then posts a doorbell to signal that the descriptor is available. The doorbell contains the address of the descriptor.
• The doorbells are maintained in an internal queue inside the NIC
VIA operations - cont’d
Transmit/receive (cont’d):
• Meanwhile, the receiver creates a descriptor that points to an empty data buffer and posts a doorbell in the receiver NIC's queue
• When the sender's doorbell reaches the head of the queue, the NIC follows a double indirection (doorbell → descriptor → data buffer) and sends the data into the network.
• At the receiver, the first doorbell/descriptor is taken from the receive queue and its buffer is filled with the data
VIA operations - cont’d
RDMA:
• As a mechanism derived from VMMC, VIA allows Remote DMA (RDMA) operations: RDMA Read and RDMA Write
• Each node allocates a receive buffer and registers it with the NIC. Additional structures containing read and write pointers to the receive buffers are exchanged during connection setup
• Each node can then read and write the remote node's memory directly.
• These operations pose potential implementation problems.
Evaluation Benchmarks
• Two VI implementations:
  – GigaNet cLAN: bandwidth 125 MB/s, latency 480 ns
  – Tandem ServerNet: bandwidth 50 MB/s, latency 300 ns
• Performance measured:
  – Bandwidth and latency
  – Polling vs. blocking
  – CPU utilization
Bandwidth
![Slide 17: Bandwidth](https://reader034.fdocuments.us/reader034/viewer/2022051622/56815000550346895dbdca43/html5/thumbnails/17.jpg)
Latency
![Slide 18: Latency](https://reader034.fdocuments.us/reader034/viewer/2022051622/56815000550346895dbdca43/html5/thumbnails/18.jpg)
Latency: Polling vs. Blocking
![Slide 19: Latency, polling vs. blocking](https://reader034.fdocuments.us/reader034/viewer/2022051622/56815000550346895dbdca43/html5/thumbnails/19.jpg)
CPU utilization
![Slide 20: CPU utilization](https://reader034.fdocuments.us/reader034/viewer/2022051622/56815000550346895dbdca43/html5/thumbnails/20.jpg)
MPI performance using VIA
• The challenge is to deliver this performance to distributed applications
• Software layers such as MPI are typically placed between VIA and the application: they increase usability but add overhead
• How can this layer be optimized so that it uses VIA efficiently?
MPI-VIA performance
![Slide 22: MPI-VIA performance](https://reader034.fdocuments.us/reader034/viewer/2022051622/56815000550346895dbdca43/html5/thumbnails/22.jpg)
MPI observations
• The difference between MPI-UDP and MPI-VIA-baseline is remarkable
• MPI-VIA-baseline is still dramatically far from VIA-Native
• Several improvements are proposed to bring MPI-VIA closer to VIA-Native by reducing MPI overhead
MPI Improvements
• Eliminating unnecessary copies: MPI-UDP and MPI-VIA use a single set of receive buffers, so data must be copied to the application. Fix: allow the user to register any buffer
• Choosing a synchronization primitive: all synchronization formerly used OS constructs/events. A better implementation uses atomic swap processor instructions
• No acknowledgment: remove message acknowledgments by switching to a reliable VIA mode
VIA - Disadvantages
• Polling vs. blocking synchronization: a tradeoff between CPU consumption (polling) and synchronization overhead (blocking)
• Memory registration: locking large amounts of memory makes virtual memory mechanisms inefficient, and registering/deregistering on the fly is slow
• Point-to-point vs. multicast: VIA lacks multicast primitives, and implementing multicast over point-to-point connections is inefficient
Conclusion
• Low latency for small messages, which have a strong impact on application behavior
• Significant improvement over UDP communication (does this still hold given recent hardware TCP/UDP implementations?)
• At the expense of an uncomfortable API