Enabling High Performance Bulk Data Transfers With SSH
Chris Rapier
Benjamin Bennett
Pittsburgh Supercomputing Center
Mardi Gras 2008
Moving Data
• Still crazy after all these years
  – Multiple solutions exist
• Protocols
  – UDT, SABUL, etc…
• Implementations
  – GridFTP, kFTP, bbFTP, hand rolled and more…
• Not to mention
  – Advanced congestion control, autotuning, jumbograms, etc…
Many Solutions No Answers
• All developed as solutions to the same problem
  – Moving lots of data very fast and reliably can be very difficult
• Unfortunately, no single solution meets all needs.
  – Fast, easy to use, inexpensive to maintain, flexible, secure, ubiquitous
Why not [insert app here]?
• Many will work just fine if the right environment exists.
  – Some solutions have significantly higher costs of entry than others.
    • Not a problem for some but can be a serious barrier to smaller organizations.
  – Sometimes just getting the app installed on both ends is a problem.
    • Can make it difficult to ‘test drive’
What About SSH?
• Easy to use.
  – Well known interface
• Cheap to maintain.
  – Often a ‘fire and forget’ installation
• Installed everywhere.
  – Included in most OS distributions
• Flexible.
  – Multiple authentication methods and functional modes
• Strong cryptography.
Why not SSH?
• It can be slow.
  – Really.
  – Really.
  – Slow.
How slow?
[Bar chart: throughput in Mb/s, axis 0 to 800]
• Iperf: 703
• OpenSSH 4.6: 4.6
A little better
[Bar chart: throughput in Mb/s, axis 0 to 800]
• Iperf: 703
• OpenSSH 4.7: 128
What changed?
• Why the improvement in OpenSSH 4.7?
  – SSH is a multiplexed application
    • Each channel requires its own flow control, which is implemented as a receive window
  – In 4.7 the maximum window size was increased to ~1MiB, up from 64KiB
Windows
• Receive and congestion windows advertise the amount of data a system or application is willing to accept per round trip time.
• Effective window size is the minimum of all windows, both protocol and application.
• Each window must be tuned and in sync to maximize throughput.
  – If any one is out of tune the entire connection will suffer.
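The window arithmetic above can be sketched in a few lines. If at most one window of data is in flight per round trip, throughput is bounded by window size divided by round trip time, and the smallest window on the path sets the limit. The 50 ms round trip and the 4MiB TCP window below are hypothetical values chosen for illustration; only the 64KiB and ~1MiB SSH window sizes come from the talk.

```python
def max_throughput_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput when one window is in flight per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6


def effective_window(*windows_bytes: int) -> int:
    """The connection is limited by the smallest window on the path."""
    return min(windows_bytes)


if __name__ == "__main__":
    rtt = 0.050                   # hypothetical 50 ms round trip
    tcp_window = 4 * 1024 * 1024  # hypothetical tuned TCP receive window
    for label, ssh_window in (("64KiB", 64 * 1024), ("1MiB", 1024 * 1024)):
        window = effective_window(tcp_window, ssh_window)
        bound = max_throughput_mbps(window, rtt)
        print(f"SSH window {label}: at most {bound:.1f} Mb/s")
```

With these assumed numbers the SSH window, not the TCP window, is the limiting factor in both cases.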
[Figure sequence; only the labels survive: TCP (four slides), then SSH and TCP (two slides)]
Windows in HPN-SSH
• Dynamically defined receive window size grows to match the TCP window.
  – Set to TCP RWIN on start.
  – Grows with RWIN on an autotuning system.
  – Dynamic sizing reduces over-buffering problems.
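A minimal sketch of the idea, not the HPN-SSH implementation: read the kernel's receive buffer size for the socket and let the application-level window follow it. The function names are hypothetical, and the "never shrink" rule is an assumption of this sketch. Whether `SO_RCVBUF` reflects kernel autotuning depends on the platform.

```python
import socket


def tcp_receive_buffer(sock: socket.socket) -> int:
    """Ask the kernel for the socket's current receive buffer size in bytes."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


def next_channel_window(current_window: int, tcp_rwin: int) -> int:
    """Grow the application window to match the TCP window (assumed: never shrink)."""
    return max(current_window, tcp_rwin)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # "Set to TCP RWIN on start."
        window = tcp_receive_buffer(sock)
        print("initial channel window:", window)
        # A later poll repeats the query; if the reported buffer has grown,
        # the channel window follows it.
        window = next_channel_window(window, tcp_receive_buffer(sock))
        print("channel window after poll:", window)
```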
[Figure sequence; only the labels survive: HPN-SSH and TCP (four slides)]
SFTP is Special
• SFTP adds *another* layer of flow control.
  – All SFTP packets are treated as requests
  – By default no more than 16 outstanding requests.
  – Results in a 512KiB window
  – Increase using -R on command line
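The SFTP window follows from simple multiplication: 16 outstanding requests giving 512KiB implies 32KiB per request. The sketch below uses that to estimate how many outstanding requests would be needed to cover a given bandwidth-delay product. The 1Gbps and 50 ms figures are hypothetical, and the function names are made up for illustration.

```python
import math

REQUEST_SIZE = 32 * 1024   # bytes per request, implied by 16 requests = 512KiB
DEFAULT_OUTSTANDING = 16   # default limit quoted on the slide


def sftp_window(outstanding_requests: int, request_size: int = REQUEST_SIZE) -> int:
    """Bytes that can be in flight at the SFTP layer."""
    return outstanding_requests * request_size


def requests_needed(bandwidth_bps: float, rtt_seconds: float,
                    request_size: int = REQUEST_SIZE) -> int:
    """Outstanding requests required to cover the bandwidth-delay product."""
    bdp_bytes = bandwidth_bps * rtt_seconds / 8
    return math.ceil(bdp_bytes / request_size)


if __name__ == "__main__":
    print("default window (KiB):", sftp_window(DEFAULT_OUTSTANDING) // 1024)
    # Hypothetical path: 1Gbps with a 50 ms round trip.
    print("requests needed:", requests_needed(1e9, 0.050))
```

The second result is the kind of value one would pass to -R, assuming the request size stays at 32KiB.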
[Figure; only the labels survive: HPN-SSH, TCP, SFTP]
A lot better
[Bar chart: throughput in Mb/s, axis 0 to 800]
• Iperf: 703
• HPN-SSH: 317
But…
• As the throughput increases crypto demands more of the processor.
  – The transfer is now processor bound
We Need More Power?
• Two solutions to processor bound transfers
  – Throw more processing power at the problem
  – Do the work more efficiently
    • Define ‘work’
The None Switch
• Many people only need secure authentication. The data can pass in the clear.
  – HPN-SSH allows users to switch to a ‘None’ cipher after authentication.
Done!
[Bar chart: throughput in Mb/s, axis 0 to 800]
• Iperf: 703
• None: 694
As far as we can go?
• Windows are already optimized.
  – No more real improvements available there
• NONE cipher is limited to a subset of transfers.
  – Sometimes you absolutely need full encryption.
• So what now?
More Power
• Common assumption that current hardware is incapable of meeting crypto demand
  – Is it true?
What does SSH need to do?
• Tx: read(disk) → Packetize → Compute MAC → Encrypt → write(net)
• Rx: read(net) → Decrypt → Compute MAC → Depacketize → write(disk)
Today's Hardware
• Laptop
  – Two 64bit general purpose cores
  – 1GiB to 4GiB RAM
  – 1Gbps ethernet
• Desktop/Workstation
  – Two to eight 64bit general purpose cores
  – 1GiB to 8GiB RAM
  – 1Gbps ethernet
OpenSSL Benchmarks
• Dual Intel Xeon 5345 Workstation
  – 4 cores per socket, 8 cores total @ 2.33GHz
  – Fedora 7 stock OpenSSL build
Performance of MAC & Cipher Algorithms on 8KiB Data Blocks (Mbps)
Algorithm     Single Core   Eight Cores
aes256-cbc    744           5976
aes192-cbc    840           6736
aes128-cbc    960           7704
hmac-md5      3232          26032
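For readers who want a feel for this kind of measurement, here is a rough single-threaded micro-benchmark of hmac-md5 over 8KiB blocks using only the Python standard library. It is a sketch, not the method behind the chart: the slide does not say how its numbers were collected, Python adds per-call overhead, and the function name and key are made up. It assumes MD5 is available in the local Python build.

```python
import hashlib
import hmac
import time

BLOCK_SIZE = 8 * 1024  # 8KiB data blocks, matching the chart title


def hmac_md5_mbps(duration_seconds: float = 0.2) -> float:
    """Measure hmac-md5 throughput on one core, in megabits per second."""
    key = b"k" * 16               # arbitrary illustrative key
    block = b"\0" * BLOCK_SIZE
    processed = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration_seconds:
        hmac.new(key, block, hashlib.md5).digest()
        processed += BLOCK_SIZE
    elapsed = time.perf_counter() - start
    return processed * 8 / elapsed / 1e6


if __name__ == "__main__":
    print(f"hmac-md5, single thread: {hmac_md5_mbps():.0f} Mbps")
```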
We have the CPU power
• hmac-md5 @ 1Gbps, ~0.3 cores
• aes256-cbc @ 1Gbps, ~1.34 cores
• Crypto total @ 1Gbps, ~1.64 cores
• We have 8!
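The core counts follow from dividing the target rate by the single-core benchmark rate. A short sketch of that arithmetic, using the single-core figures from the OpenSSL benchmark slide (3232 Mbps for hmac-md5, 744 Mbps for aes256-cbc); the names are illustrative. The slide's ~1.64 total is the sum of the two rounded figures, so the unrounded sum comes out slightly higher.

```python
SINGLE_CORE_MBPS = {"hmac-md5": 3232, "aes256-cbc": 744}  # from the benchmark chart
TARGET_MBPS = 1000  # 1Gbps


def cores_needed(algorithm: str, target_mbps: float = TARGET_MBPS) -> float:
    """Fraction of cores required to sustain the target rate."""
    return target_mbps / SINGLE_CORE_MBPS[algorithm]


if __name__ == "__main__":
    total = 0.0
    for name in SINGLE_CORE_MBPS:
        need = cores_needed(name)
        total += need
        print(f"{name}: {need:.2f} cores")
    print(f"crypto total: {total:.2f} cores")
```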
So what's the problem?
• MAC requires fraction of one core
• Cipher requires more than one core
• MAC, cipher, and more all within a single execution thread
[Figure: per-core utilization (util %) on eight cores: one runs ssh, one handles kernel I/O, six are idle]
How can we fix it?
• Multi-threading on functional boundaries
  – Perform MAC and cipher on a packet concurrently
    • Possible on sender, not on receiver
  – Process multiple packets concurrently (pipeline)
  – Cipher still needs more than one core
• Multi-threading within cipher
  – Can it be parallelized?
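A sketch of why the MAC and cipher can overlap on the sender but not on the receiver. In the SSH transport protocol the MAC is computed over the sequence number and the unencrypted packet, so the sender has everything both jobs need, while the receiver must decrypt before it can check the MAC. The cipher here is a placeholder XOR stream built from SHA-256, not AES and not secure, and all function names are hypothetical. The threads show the structure only and are not a performance claim.

```python
import hashlib
import hmac
import struct
from concurrent.futures import ThreadPoolExecutor


def stand_in_encrypt(key: bytes, seq: int, packet: bytes) -> bytes:
    """Placeholder cipher (NOT AES): XOR with a SHA-256 derived stream."""
    stream = b""
    counter = 0
    while len(stream) < len(packet):
        stream += hashlib.sha256(key + struct.pack(">IQ", seq, counter)).digest()
        counter += 1
    return bytes(p ^ s for p, s in zip(packet, stream))


def compute_mac(key: bytes, seq: int, packet: bytes) -> bytes:
    """hmac-md5 over the sequence number plus the unencrypted packet."""
    return hmac.new(key, struct.pack(">I", seq) + packet, hashlib.md5).digest()


def send_packet(pool, enc_key, mac_key, seq, packet):
    # Both jobs only need the plaintext, so they can be submitted together.
    mac_job = pool.submit(compute_mac, mac_key, seq, packet)
    enc_job = pool.submit(stand_in_encrypt, enc_key, seq, packet)
    return enc_job.result() + mac_job.result()


def receive_packet(enc_key, mac_key, seq, wire):
    ciphertext, mac = wire[:-16], wire[-16:]  # MD5 digests are 16 bytes
    # The MAC covers the plaintext, so decryption has to finish first.
    packet = stand_in_encrypt(enc_key, seq, ciphertext)  # XOR stream: same operation
    if not hmac.compare_digest(mac, compute_mac(mac_key, seq, packet)):
        raise ValueError("MAC mismatch")
    return packet


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        wire = send_packet(pool, b"e" * 16, b"m" * 16, 0, b"bulk data")
    print(receive_packet(b"e" * 16, b"m" * 16, 0, wire))
```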
SSH Cipher Modes
• CBC
  – Most common
  – RFC 4253 “The Secure Shell (SSH) Transport Layer Protocol” specifies only CBC mode ciphers, arcfour, and none.
• CTR
  – Specified in RFC 4344 “SSH Transport Layer Encryption Modes”
  – More desirable security properties than CBC
Hello, my name is CBC
• Cipher Block Chaining Mode Encryption
  – C0 = Encrypt(Key, IV XOR P0)
  – C1 = Encrypt(Key, C0 XOR P1)
  – ...
Hello, my name is CBC (cont)
• Cipher Block Chaining Mode Decryption
  – P0 = Decrypt(Key, C0) XOR IV
  – P1 = Decrypt(Key, C1) XOR C0
  – ...
CBC Summary
• Encrypt must be serial
• Decrypt may be parallel
• That doesn't help so much :-(
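The asymmetry can be seen in a small sketch. Encrypting block i needs ciphertext block i-1, so the loop is inherently serial, while decrypting block i needs only ciphertext blocks that are already on hand, so blocks can be handled in any order. The block cipher below is a toy four-round Feistel construction over SHA-256, used purely as a stand-in for AES; it is not secure, and every name in the sketch is hypothetical.

```python
import hashlib

BLOCK = 16  # bytes per block


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, r: int, half: bytes) -> bytes:
    """Feistel round function for the toy cipher."""
    return hashlib.sha256(key + bytes([r]) + half).digest()[: BLOCK // 2]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for r in range(4):
        left, right = right, xor(left, _round(key, r, right))
    return left + right


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for r in reversed(range(4)):
        left, right = xor(right, _round(key, r, left)), left
    return left + right


def cbc_encrypt(key, iv, blocks):
    out, prev = [], iv
    for p in blocks:  # serial: block i cannot start until block i-1 is done
        prev = toy_encrypt_block(key, xor(p, prev))
        out.append(prev)
    return out


def cbc_decrypt_block(key, iv, cipher_blocks, i):
    """Decrypt block i alone; it depends only on ciphertext i and i-1."""
    prev = iv if i == 0 else cipher_blocks[i - 1]
    return xor(toy_decrypt_block(key, cipher_blocks[i]), prev)


if __name__ == "__main__":
    key, iv = b"toy key", b"\x01" * BLOCK
    plain = [bytes([i]) * BLOCK for i in range(4)]
    cipher = cbc_encrypt(key, iv, plain)
    # Decrypt in reverse order to show the blocks are independent.
    recovered = {i: cbc_decrypt_block(key, iv, cipher, i) for i in (3, 2, 1, 0)}
    print(all(recovered[i] == plain[i] for i in range(4)))
```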
Hello, my name is CTR
• Counter Mode Encryption
  – C0 = P0 XOR Encrypt(Key, CTR)
  – C1 = P1 XOR Encrypt(Key, CTR + 1)
  – ...
Hello, my name is CTR (cont)
• Counter Mode Decryption
  – P0 = C0 XOR Encrypt(Key, CTR)
  – P1 = C1 XOR Encrypt(Key, CTR + 1)
  – ...
CTR Summary
• Encrypt may be parallel
• Decrypt may be parallel
• Keystream can be pregenerated
• Let’s get to work…
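In counter mode each keystream block depends only on the key and its own counter value, so blocks can be produced by independent workers and in any order, and encryption and decryption are the same XOR. A minimal sketch of that property follows. The `keystream_block` function stands in for AES_Encrypt(counter) with a SHA-256 construction; it is not AES and not secure, and the names are hypothetical.

```python
import hashlib
import struct
from concurrent.futures import ThreadPoolExecutor

BLOCK = 16  # bytes per keystream block


def keystream_block(key: bytes, counter: int) -> bytes:
    """Stand-in for AES_Encrypt(counter); NOT AES and not secure."""
    return hashlib.sha256(key + struct.pack(">Q", counter)).digest()[:BLOCK]


def ctr_xor(key: bytes, start_counter: int, data: bytes, workers: int = 4) -> bytes:
    """Encrypt or decrypt: generate keystream blocks in parallel, then XOR."""
    n_blocks = (len(data) + BLOCK - 1) // BLOCK
    counters = range(start_counter, start_counter + n_blocks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in counter order even if workers finish out of order.
        stream = b"".join(pool.map(lambda c: keystream_block(key, c), counters))
    return bytes(d ^ s for d, s in zip(data, stream))


if __name__ == "__main__":
    message = b"moving lots of data very fast and reliably"
    ciphertext = ctr_xor(b"toy key", 0, message)
    print(ctr_xor(b"toy key", 0, ciphertext) == message)
```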
Multi-threaded AES-CTR
• Uses arbitrary number of cipher threads (and cores) to generate a single keystream.
• Cipher threads pre-generate keystream, starting once a cipher context key and IV are known.
• Leaves only keystream dequeue & XOR for encrypt/decrypt operations in main SSH thread.
Single Cipher Thread
• Cipher Thread
  – AES_Encrypt(ctr)
  – Inc(ctr)
• Main Thread
  – read(disk)
  – Packetize
  – Compute MAC
  – XOR
  – write(net)
[Diagram: the cipher thread feeds the Keystream Q, which the main thread drains at the XOR step]
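The single cipher thread arrangement is a producer and consumer around one bounded queue. The sketch below is one way to express it with the Python standard library, not the HPN-SSH code: the cipher thread repeatedly generates a keystream block and increments the counter, and the main thread only dequeues and XORs. `keystream_block` stands in for AES_Encrypt(ctr) and is not secure; the names and the queue depth are assumptions.

```python
import hashlib
import queue
import struct
import threading

BLOCK = 16  # bytes per keystream block


def keystream_block(key: bytes, counter: int) -> bytes:
    """Stand-in for AES_Encrypt(ctr); NOT AES and not secure."""
    return hashlib.sha256(key + struct.pack(">Q", counter)).digest()[:BLOCK]


def cipher_thread(key, start_counter, n_blocks, keystream_q):
    ctr = start_counter
    for _ in range(n_blocks):
        keystream_q.put(keystream_block(key, ctr))  # blocks while the queue is full
        ctr += 1                                    # Inc(ctr)


def encrypt_stream(key, start_counter, chunks, queue_depth=64):
    """Main thread: dequeue keystream and XOR. Each chunk is at most BLOCK bytes."""
    chunks = list(chunks)
    if any(len(c) > BLOCK for c in chunks):
        raise ValueError("chunks must not exceed the block size")
    keystream_q = queue.Queue(maxsize=queue_depth)
    worker = threading.Thread(
        target=cipher_thread,
        args=(key, start_counter, len(chunks), keystream_q),
        daemon=True,
    )
    worker.start()
    out = []
    for chunk in chunks:
        ks = keystream_q.get()
        out.append(bytes(c ^ k for c, k in zip(chunk, ks)))
    worker.join()
    return out


if __name__ == "__main__":
    data = [b"0123456789abcdef", b"fedcba9876543210", b"tail"]
    encrypted = encrypt_stream(b"toy key", 0, data)
    print(encrypt_stream(b"toy key", 0, encrypted) == data)
```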
Multiple Cipher Threads
• Ring of bounded queues
  – Each queue holds a portion of keystream
  – Each queue exclusively accessed
• Queue counters offset initially and each fill
[Diagram: ring of four queues in states DRAINING, FILLING, FILLING, EMPTY; the Main Thread drains one queue while Cipher Thread 1 and Cipher Thread 2 each fill another]
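One possible reading of the ring design, sketched with the Python standard library; this is not the HPN-SSH implementation. Queue i starts at counter offset i × (blocks per queue), and after each fill its offset advances by (number of queues) × (blocks per queue), so draining the queues in ring order yields one continuous keystream. A queue is touched by exactly one thread while FILLING or DRAINING. The class, its parameters, and the FULL state name are assumptions of this sketch, and `keystream_block` is a non-AES stand-in.

```python
import hashlib
import struct
import threading

BLOCK = 16
EMPTY, FILLING, FULL, DRAINING = "EMPTY", "FILLING", "FULL", "DRAINING"


def keystream_block(key: bytes, counter: int) -> bytes:
    """Stand-in for AES_Encrypt(ctr); NOT AES and not secure."""
    return hashlib.sha256(key + struct.pack(">Q", counter)).digest()[:BLOCK]


class KeystreamRing:
    def __init__(self, key, n_queues=4, blocks_per_queue=8, n_threads=2):
        self.key, self.n_queues, self.per_queue = key, n_queues, blocks_per_queue
        self.state = [EMPTY] * n_queues
        self.base = [i * blocks_per_queue for i in range(n_queues)]  # initial offsets
        self.data = [[] for _ in range(n_queues)]
        self.cond = threading.Condition()
        self.stopped = False
        self.drain_index = 0  # queue the main thread reads next
        self.drain_pos = 0    # next block within that queue
        self.threads = [threading.Thread(target=self._fill_loop, daemon=True)
                        for _ in range(n_threads)]
        for t in self.threads:
            t.start()

    def _fill_loop(self):
        while True:
            with self.cond:
                while not self.stopped and EMPTY not in self.state:
                    self.cond.wait()
                if self.stopped:
                    return
                i = self.state.index(EMPTY)
                self.state[i] = FILLING  # claimed: exclusive access from here
                base = self.base[i]
            blocks = [keystream_block(self.key, base + j) for j in range(self.per_queue)]
            with self.cond:
                self.data[i] = blocks
                self.base[i] = base + self.n_queues * self.per_queue  # offset for next fill
                self.state[i] = FULL
                self.cond.notify_all()

    def next_block(self) -> bytes:
        """Called by the main thread only: dequeue the next keystream block."""
        i = self.drain_index
        with self.cond:
            while self.state[i] not in (FULL, DRAINING):
                self.cond.wait()
            self.state[i] = DRAINING
        block = self.data[i][self.drain_pos]
        self.drain_pos += 1
        if self.drain_pos == self.per_queue:  # queue exhausted: hand it back
            with self.cond:
                self.state[i] = EMPTY
                self.cond.notify_all()
            self.drain_index, self.drain_pos = (i + 1) % self.n_queues, 0
        return block

    def stop(self):
        with self.cond:
            self.stopped = True
            self.cond.notify_all()
        for t in self.threads:
            t.join()


if __name__ == "__main__":
    ring = KeystreamRing(b"toy key")
    stream = [ring.next_block() for _ in range(100)]
    ring.stop()
    print(stream == [keystream_block(b"toy key", c) for c in range(100)])
```

The main thread would XOR each returned block with packet data, as in the single-thread case.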
M-T AES-CTR Results
8-core Nodes on 1Gbps LAN
[Bar chart: throughput in Mbps, axis 0 to 1000; series: Original, HPN-SSH]
Cipher        Original   HPN-SSH
aes256-ctr    417        938
aes192-ctr    456        938
aes128-ctr    506        938
None          -          938
Iperf: 944
Conclusion
• SSH designed for security
  – HPN-SSH is a set of performance enhancements to the most common SSH implementation, OpenSSH
• High throughput with high latency
  – Kernel auto-tuning adjusts TCP flow control
  – HPN-SSH RecvBufferPolling adjusts SSH flow control
• High throughput with any latency
  – HPN-SSH None cipher for non-private data
  – HPN-SSH Multi-threaded AES-CTR cipher
Future Work
• Approaching 10Gbps
• Continued multi-threading
  – Concurrent packet processing/pipelining
• Efficiency
• Striped data transfers
• Exotic architectures
Where to get it
http://www.psc.edu/networking/projects/hpn-ssh
Email: [email protected]