Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
Programming TCP for responsiveness
DeNA Co., Ltd.Kazuho Oku
1
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
explains TCP latency optimization implemented in H2O HTTP/2 server 2.1
2Programming TCP for responsivesess
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
Background
3Programming TCP for responsivesess
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
TCP slow start
n Initial Congestion Window (IW)=10⁃ only 10 packets can be sent in first RTT⁃ used to be IW=3
n window increase: 1.5x/RTT
4Programming TCP for responsivesess
0
100,000
200,000
300,000
400,000
500,000
600,000
700,000
800,000
1 2 3 4 5 6 7 8
bytestransmi,ed
RTT
TCPslowstart(IW10,MSS1460)
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
Why 1.5x?
During slow start, a TCP increments cwnd by at most SMSS bytes for each ACK received that cumulatively acknowledges new data.(snip)The delayed ACK algorithm specified in [RFC1122] SHOULD be used by a TCP receiver. When using delayed ACKs, a TCP receiver MUST NOT excessively delay acknowledgments. Specifically, an ACK SHOULD be generated for at least every second full-sized segment, and MUST be generated within 500 ms of the arrival of the first unacknowledged packet.
TCP Congestion Control (RFC 5681)
5Programming TCP for responsivesess
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
Flow of the ideal HTTP
n fastest within the limits of TCP/IPn receive a request 0-RTT, and:
⁃ first send CSS/JS*⁃ then send the HTML⁃ then send the images*
*: but only the ones not cached by the browser
6Programming TCP for responsivesess
client server
1RT
T
request
response
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
The reality in HTTP/2
n TCP establishment: +1 RTT*n TLS handshake: +2 RTT**n HTML fetch: +1 RTTn JS,CSS fetch: +2 RTT***
n Total: 6 RTT
*: 0 RTT on reconnection**: 1 RTT on reconnection***: servers often cannot switch to sending JS,CSS instantly, due to the output buffered in TCP send buffer
7Programming TCP for responsivesess
client server
1RT
T
TCPSYN
TCPSYNACK
TLSHandshake
TLSHandshake
TLSHandshake
TLSHandshake
GET/
HTML
GETcss,js
css,js〜〜
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
Ongoing optimizations
n TCP Fast Open⁃ initial establishment in 1 RTT⁃ re-establishment in 0 RTT
n TLS 1.3⁃ initial handshake complete in 1 RTT⁃ resumption in 0 RTT
n what can be done in the HTTP/2 layer?
8Programming TCP for responsivesess
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
Programming TCP for responsiveness
9Programming TCP for responsivesess
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
Programming TCP for responsiveness
Answer: TCP Urgent Indications (i.e. MSG_OOB)
10Programming TCP for responsivesess
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
Programming TCP for responsiveness
Answer: TCP Urgent Indications (i.e. MSG_OOB)
11Programming TCP for responsivesess
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
TCP Urgent Indications
n out-of-band messaging for TCP⁃ used by telnet!
n can only send 1 octet⁃ conflicting specs on how to handle multi-octet
messagesn cannot be used for HTTP/2n RFC 6093 “recommends against the use of urgent
mechanism” (RFC 7414)
12Programming TCP for responsivesess
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
Typical sequence of HTTP/2
13Programming TCP for responsivesess
HTTP/2 200 OK
<!DOCTYPE HTML>…<SCRIPT SRC=”jquery.js”>…
client server
GET /
GET /jquery.js
needtoswitchsendingfromHTMLtoJSatthisverymoment(meansthatamountofdatasentin*mustbesmallerthanIW)
1RTT
*
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
Buffering in TCP and TLS layer
14Programming TCP for responsivesess
TCPsendbuffer
CWNDunacked pollthreshold
BIObuf.
// ordinary code (non-blocking)while (SSL_write(…) != SSL_ERR_WANT_WRITE) ;
TLSRecords
sentimmediately notimmediatelysent
HTTP/2frames
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
Why do we have buffers?
15Programming TCP for responsivesess
n TCP send buffer:⁃ reduce ping-pong bet. kernel and application
n BIO buffer:⁃ for data that couldnʼt be stored in TCP send buffer
TCPsendbuffer
CWNDunacked pollthreshold
BIObuf.
TLSRecords
sentimmediately notimmediatelysent
HTTP/2frames
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
Improvement: poll-then-write
16Programming TCP for responsivesess
TCPsendbuffer
CWNDunacked pollthreshold
// only call SSL_write when polls notifies the app.while (poll_for_write(fd) == SOCKET_IS_READY) SSL_write(…);
TLSRecords
sentimmediately notimmediatelysent
HTTP/2frames
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
Adjust poll threshold
17Programming TCP for responsivesess
TCPsendbuffer
CWNDunacked pollthreshold
n set poll threshold to the end of CWND?⁃ setsockopt(TCP_NOTSENT_LOWAT)⁃ in linux, the minimum is CWND + 1 octet• becomes unstable when set to CWND + 0
TLSRecords
sentimmediately notimmediatelysent
HTTP/2frames
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
Adjust poll threshold
18Programming TCP for responsivesess
CWNDunacked pollthreshold
// only call SSL_write when polls notifies the app.while (poll_for_write(fd) == SOCKET_IS_READY) SSL_write(…);
TLSRecords
sentimmediately notimmediatelysent
HTTP/2frames
TCPsendbuffer
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
Further improvement: read TCP states
19Programming TCP for responsivesess
CWNDunacked pollthreshold
// calc size of data to send by calling getsockopt(TCP_INFO)if (poll_for_write(fd) == SOCKET_IS_READY) { capacity = CWND - unacked + TWO_MSS - TLS_overhead; SSL_write(prepare_http2_frames(capacity));}
TLSRecords
sentimmediately notimmediatelysent
HTTP/2frames
TCPsendbuffer
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
Negative impact of additional delay
n increased delay bet. ACK recv. → data send, since:⁃ traditional approach: completes within kernel⁃ this approach: application needs to be notified to
generate new datan outcome:
⁃ increase of CWND becomes slower⁃ leads to slower peak speed?• depends on how CWND at peak is calculated
⁃ does kernel use TCP timestamp for the matter?
20Programming TCP for responsivesess
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
Countermeasures
n optimize for responsiveness only when necessary⁃ i.e. when RTT is big and CWND is small⁃ impact of optimization is proportional to
unsent_bytes / CWNDn disable optimization if additional delay is significant
⁃ when epoll returns immediately, estimated additional delay is equal to the time spent by the loop
21Programming TCP for responsivesess
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
Configuration Directives
n http2-latopt-min-rtt⁃ minimum TCP RTT to enable the optimization⁃ default: UINT_MAX (disabled)
n http2-latopt-max-cwnd⁃ maximum CWND to enable (in octets)⁃ default: 65535
n http2-max-additional-delay⁃ max. additional delay (as the ratio to TCP RTT)⁃ latopt disabled if the delay is greater⁃ default: 0.1
22Programming TCP for responsivesess
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
Pseudo-codesize_t get_suggested_write_size() { getsockopt(fd, IPPROTO_TCP, TCP_INFO, &tcp_info, sizeof(tcp_info)); if (tcp_info.tcpi_rtt < min_rtt || tcp_info.tcpi_snd_cwnd > max_cwnd) return UNKNOWN;
switch (SSL_get_current_cipher(ssl)->id) { case TLS1_CK_RSA_WITH_AES_128_GCM_SHA256: case …: tls_overhead = 5 + 8 + 16; break; default: return UNKNOWN; }
packets_sendable = tcp_info.tcpi_snd_cwnd > tcp_info.tcpi_unacked ? tcp_info.tcpi_snd_cwnd - tcp_info.tcpi_unacked : 0; return (packets_sendable + 2) * (tcp_info.tcpi_snd_mss - tls_overhead);}
23Programming TCP for responsivesess
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
Benchmark (1)
24Programming TCP for responsivesess
n conditions:⁃ server in Ireland, client in Tokyo (RTT 250ms)⁃ load tiny js at the top of a large HTML
n result: delay decreased from 511ms to 250ms⁃ i.e. JS fetch latency was 2RTT, became 1 RTT• similar results in other environments
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
Benchmark (2)
n using same data as previousn server: Sakura VPS (Ishikari DC)
25Programming TCP for responsivesess
0
50
100
150
200
250
300
HTML JS
millisecon
ds�
downloadingHTML(andJSwithin)RTT~25ms�
master latopt
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
Conclusion
n near-optimal result can be achieved⁃ by adjusting poll threshold and reading TCP
states⁃ 1-packet overhead due to restriction in Linux
kerneln 1-RTT improvement in H2O
⁃ estimated 1-RTT improvement per the depth of the load graph
26Programming TCP for responsivesess
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
Under the hood
27Programming TCP for responsivesess
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
TCP_NOTSENT_LOWAT
n supported by Linux, OS Xn on Linux:
⁃ sysctl:• set to -1: use kernel default• set to 0: sshd hangs• set to positive int: override kernel default
⁃ setsockopt:• set to 0: use default (sysctl or kernel)• set to int: override default
28Programming TCP for responsivesess
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
Unit of CWND
n Linux: # of packets⁃ if INITCWND is 10, you can send at most 10
packets at once, regardless of their sizen BSD (incl. OS X): octets
⁃ you can send CWND*MSS octets, regardless of the number of packets• if CWND=10 and MSS=1460, it is possible to send
14,600 packets containing 1-octet payload
29Programming TCP for responsivesess
Copyright(C)2016DeNACo.,Ltd.AllRightsReserved.
Determining amount of data that can be sent immediately
OS MSS CWND inflight sendbuffer(inflight+unsent)
Linux tcpi_snd_mss tcpi_snd_cwnd* tcpi_snd_unacked* ioctl(SIOCOUTQ)
OSX** tcpi_maxseg tcpi_snd_cwnd - tcpi_snd_sbbytes
FreeBSD tcpi_snd_mss tcpi_snd_cwnd - ioctl(FIONWRITE)
NetBSD tcpi_snd_mss tcpi_snd_cwnd* - ioctl(FIONWRITE)
30Programming TCP for responsivesess
n calculate either of:⁃ CWND - inflight⁃ min(CWND - (inflight + unsent), 0)
n units used in the calculation must be the same⁃ NetBSD: fail
*:unitsofvaluesmarkedarepackets,unmarkedareoctets**:somefmesthevaluesoftcpi_*arereturnedaszeros
Top Related