Network latency - measurement and improvement
matt-willsher
What is latency?
• Latency impacts the user experience
• Lower latency = more responsive = better
experience
• A fast download over a high-latency link can take
longer than a slow download over a low-latency
link
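The last point can be made concrete with a rough Python model. It is a sketch, not a TCP simulation: total time is the number of round trips multiplied by the RTT, plus the time needed to push the bytes through the link. The 100 KB page size and the 10 round trips (handshakes plus slow-start rounds) are illustrative assumptions, not measurements.

```python
def transfer_time(size_bytes: int, bandwidth_bps: float, rtt_s: float, round_trips: int) -> float:
    """Rough transfer time: waiting on round trips plus serialisation time.

    `round_trips` is an assumed count (handshakes plus slow-start rounds);
    real TCP behaviour is more complex than this model.
    """
    return round_trips * rtt_s + (size_bytes * 8) / bandwidth_bps


size = 100_000  # hypothetical 100 KB page
fast_far = transfer_time(size, bandwidth_bps=100e6, rtt_s=0.300, round_trips=10)
slow_near = transfer_time(size, bandwidth_bps=10e6, rtt_s=0.010, round_trips=10)
print(f"100 Mbps, 300 ms RTT: {fast_far:.3f} s")
print(f" 10 Mbps,  10 ms RTT: {slow_near:.3f} s")
```

Under these assumptions the 100 Mbps link with a 300 ms RTT needs about 3.0 s, and the 10 Mbps link with a 10 ms RTT needs about 0.18 s. For small transfers, round trips dominate.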
Why measure latency?
• Efficiency:
• Improved resource usage
• Improved user experience
• Spotting and diagnosing defects
Where is Latency?
• Between:
• A CPU and its cache
• Client and server over a network
• Application and disk
• Anywhere a system does work
Where is latency?
• L1 cache reference 0.5 ns
• Branch mispredict 5 ns
• L2 cache reference 7 ns
• Mutex lock/unlock 100 ns
• Main memory reference 100 ns
• Compress 1K bytes with Zippy 10,000 ns
• Send 2K bytes over 1 Gbps network 20,000 ns
• Read 1 MB sequentially from memory 250,000 ns
• Round trip within same datacenter 500,000 ns
• Disk seek 10,000,000 ns
• Read 1 MB sequentially from network 10,000,000 ns
• Read 1 MB sequentially from disk 30,000,000 ns
• Send packet CA->Netherlands->CA 150,000,000 ns
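The figures above span more than eight orders of magnitude, and they are easier to compare in familiar units. The small Python sketch below re-expresses a few of them. The values are copied from the list, and the helper name `humanise` is hypothetical.

```python
# Figures copied from the list above, in nanoseconds.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Main memory reference": 100,
    "Round trip within same datacenter": 500_000,
    "Disk seek": 10_000_000,
    "Send packet CA->Netherlands->CA": 150_000_000,
}


def humanise(ns: float) -> str:
    """Render a nanosecond figure in the largest unit that keeps it >= 1."""
    for unit, scale in (("s", 1e9), ("ms", 1e6), ("us", 1e3)):
        if ns >= scale:
            return f"{ns / scale:g} {unit}"
    return f"{ns:g} ns"


for name, ns in LATENCY_NS.items():
    print(f"{name:<36} {humanise(ns)}")

# How many in-datacenter round trips fit into one transatlantic round trip.
ratio = LATENCY_NS["Send packet CA->Netherlands->CA"] / LATENCY_NS["Round trip within same datacenter"]
print(f"One CA->Netherlands->CA round trip = {ratio:g} datacenter round trips")
```

By these figures, one CA->Netherlands->CA round trip costs as much as 300 round trips within a single datacenter.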
Causes of network latency
• Physical limitations - speed of light, wire speeds
• Congestion at switches, routers and servers
• Packet loss due to noise, congestion, faults
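The speed of light sets a floor on RTT that no amount of tuning can remove. Below is a minimal Python sketch. It assumes that light in optical fibre travels at roughly two-thirds of its speed in a vacuum and that the path is a straight line. Real routes are longer.

```python
SPEED_OF_LIGHT_M_S = 299_792_458
FIBRE_FACTOR = 2 / 3  # assumption: light in glass fibre travels at roughly 2/3 c


def min_rtt_ms(distance_km: float, velocity_factor: float = FIBRE_FACTOR) -> float:
    """Lower bound on round-trip time over a straight fibre path of the given length.

    Ignores routing detours, queuing and serialisation, so real RTTs are higher.
    """
    one_way_s = (distance_km * 1000) / (SPEED_OF_LIGHT_M_S * velocity_factor)
    return 2 * one_way_s * 1000


print(f"{min_rtt_ms(1000):.1f} ms")  # hypothetical 1,000 km path
```

For a hypothetical 1,000 km path the floor is about 10 ms round trip. Measured RTTs will be higher because of routing detours, queuing and serialisation.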
Round Trip Times
• aka RTT
• Time to go there and back again
• The return route may be different from the outbound route
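Without ICMP privileges, a TCP connect makes a workable RTT probe: `connect()` returns once the SYN/ACK arrives, roughly one round trip after the SYN is sent. The Python sketch below uses only the standard library. `tcp_connect_rtt` is a hypothetical helper name, and the figure it returns includes a little local overhead.

```python
import socket
import time


def tcp_connect_rtt(host: str, port: int, timeout: float = 2.0) -> float:
    """Approximate one RTT, in seconds, by timing a TCP three-way handshake.

    Pass an IP literal as `host` to keep DNS resolution out of the timing.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.perf_counter() - start
    return elapsed


# Example (needs network access): tcp_connect_rtt("192.0.2.1", 443) * 1000 gives ms.
```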
Network Latency Tools
• Ping. Time between sending an ICMP Echo Request and
receiving the ICMP Echo Reply
• Traceroute. Sends packets with incrementing TTL values and
times the ICMP Time Exceeded message returned by each hop
• tcptraceroute. Traceroute using TCP packets to
configurable ports
• mtr. Combines ping and traceroute; supports ICMP, UDP and TCP probes
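When collecting ping results in scripts, the per-reply `time=` field is the value to extract. The Python sketch below assumes Linux (iputils) or macOS style output. The sample is made up and uses a documentation address, and `parse_ping_times` is a hypothetical helper.

```python
import re
import statistics

# Per-reply RTT field, e.g. "time=12.4 ms" (Linux/macOS) or "time<1ms" (Windows).
# For "time<1ms" the parsed value is an upper bound, not an exact RTT.
TIME_RE = re.compile(r"time[=<]([\d.]+)\s*ms")


def parse_ping_times(output: str) -> list[float]:
    """Extract per-reply round-trip times, in milliseconds, from ping output."""
    return [float(value) for value in TIME_RE.findall(output)]


# Made-up sample of Linux ping output; in practice it could come from e.g.
# subprocess.run(["ping", "-c", "3", host], capture_output=True, text=True).stdout
SAMPLE = """\
PING 192.0.2.1 (192.0.2.1) 56(84) bytes of data.
64 bytes from 192.0.2.1: icmp_seq=1 ttl=57 time=11.8 ms
64 bytes from 192.0.2.1: icmp_seq=2 ttl=57 time=12.4 ms
64 bytes from 192.0.2.1: icmp_seq=3 ttl=57 time=13.0 ms
"""

times = parse_ping_times(SAMPLE)
print(f"min {min(times)} / median {statistics.median(times)} / max {max(times)} ms")
```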
Transmission Control Protocol
• Stateful, connection-oriented protocol for reliable
data transmission
• Guarantees data delivery and ordering
• Servers maintain state tables of connections
• HTTP, SMTP, SSL/TLS, IRC, SSH…
TCP Latency Improvements
• By reducing the number of round trips:
• Compress content into fewer packets. A 1500-byte MTU
leaves a 1460-byte payload after the 20-byte IP and 20-byte TCP headers
• TCP timestamps take an extra 12 bytes, leaving a 1448-byte
payload. Timestamps can be disabled.
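Here is the payload arithmetic above as a small Python sketch. It assumes IPv4 without IP options; IPv6 headers are 40 bytes, which shrinks the payload further. The 12-byte timestamp cost is the 10-byte option plus 2 bytes of padding.

```python
import math

IPV4_HEADER = 20           # bytes, without IP options
TCP_HEADER = 20            # bytes, without TCP options
TCP_TIMESTAMP_OPTION = 12  # 10-byte timestamp option padded to 12 bytes


def tcp_payload(mtu: int = 1500, timestamps: bool = False) -> int:
    """Usable TCP payload per full-sized IPv4 packet for a given MTU."""
    payload = mtu - IPV4_HEADER - TCP_HEADER
    if timestamps:
        payload -= TCP_TIMESTAMP_OPTION
    return payload


def packets_needed(size_bytes: int, mtu: int = 1500, timestamps: bool = False) -> int:
    """Number of full-sized packets needed to carry `size_bytes` of content."""
    return math.ceil(size_bytes / tcp_payload(mtu, timestamps))


print(tcp_payload(), tcp_payload(timestamps=True))                       # 1460 1448
print(packets_needed(14_600), packets_needed(14_600, timestamps=True))   # 10 11
```

In this example, a hypothetical 14,600-byte response fits in 10 full packets without timestamps but needs 11 with them.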
TCP Improvements
• Move your content closer to your users:
• Make good use of local caches (e.g. browser)
• Content Delivery Networks (Cloudflare,
CloudFront, Akamai)
• Host geographically close to users
• Host at locations with low latency links
HTTP Latency
• Use HTTP/1.1, HTTP/2 (née SPDY)
• Ensure pipelining is enabled
• Tune TCP keepalive
• Try TCP corking (buffer the stream and send it in full
packets) and TCP_NODELAY (disable Nagle's algorithm, which
buffers small payloads)
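In Python's standard `socket` module, these options map onto `setsockopt` calls. The sketch below is not a recommended configuration. The 60 s keepalive idle time is an arbitrary illustrative value. `TCP_KEEPIDLE` and `TCP_CORK` are only exposed on some platforms (`TCP_CORK` on Linux), hence the `hasattr` guards. `tune_http_socket` and `set_cork` are hypothetical helper names.

```python
import socket


def tune_http_socket(sock: socket.socket, nodelay: bool = True) -> None:
    """Apply latency-related TCP options to a TCP socket."""
    # TCP_NODELAY disables Nagle's algorithm, so small writes go out immediately.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1 if nodelay else 0)
    # Send keepalive probes on otherwise idle connections.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Idle seconds before the first probe (illustrative value, platform-dependent option).
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)


def set_cork(sock: socket.socket, corked: bool) -> None:
    """Linux-only TCP_CORK: hold partial frames while corked; uncorking flushes them."""
    if hasattr(socket, "TCP_CORK"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1 if corked else 0)
```

Corking and `TCP_NODELAY` pull in opposite directions: corking deliberately holds data back to fill packets, while `TCP_NODELAY` sends small writes immediately. A typical corked write is `set_cork(sock, True)`, then the headers and body, then `set_cork(sock, False)` to flush.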
HTTP Latency
• Take care over caching and provide well formed
headers
• Use tools like PageSpeed Insights to analyse
performance
• Use the PageSpeed module to modify content on the
server
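Well-formed caching headers let browsers and CDNs reuse content. They also let clients revalidate cheaply with a 304 Not Modified instead of downloading the content again. The helpers in this Python sketch are hypothetical, and the one-hour default `max-age` and the truncated SHA-256 ETag are illustrative choices.

```python
import hashlib


def cache_headers(body: bytes, max_age_s: int = 3600, public: bool = True) -> dict[str, str]:
    """Build Cache-Control and ETag headers for a response body."""
    scope = "public" if public else "private"
    etag = hashlib.sha256(body).hexdigest()[:16]  # content hash as a strong validator
    return {
        "Cache-Control": f"{scope}, max-age={max_age_s}",
        "ETag": f'"{etag}"',
    }


def is_not_modified(request_headers: dict[str, str], response_headers: dict[str, str]) -> bool:
    """True if the client's cached copy is current, so a 304 can be sent.

    Simplified: a real If-None-Match header may list several ETags or "*".
    """
    return request_headers.get("If-None-Match") == response_headers["ETag"]
```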
SSL/TLS
• Use AES and compatible libraries on processors
with AES-NI for hardware acceleration
• Elliptic Curve (ECDSA) for smaller certificates and keys,
and better performance.
• Terminate SSL at the edge and consider using
lightweight or no encryption inside the local
network.
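On the client side, Python's `ssl` module can restrict TLS 1.2 negotiation to ECDHE key exchange with AES-GCM, which AES-NI accelerates on supporting CPUs. This is only a sketch. The cipher string is an assumption to adapt to your own policy, and using ECDSA certificates is a server-side choice that is not shown here.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Client TLS context preferring ECDHE key exchange with AES-GCM for TLS 1.2.

    TLS 1.3 suites are configured separately by OpenSSL and are not affected
    by set_ciphers(); they already use AEAD ciphers such as AES-GCM.
    """
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # OpenSSL cipher string: ECDHE key exchange combined with AES-GCM.
    ctx.set_ciphers("ECDHE+AESGCM")
    return ctx
```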
User Datagram Protocol
• ‘Fire and forget’ - no inbuilt reliability, connectionless
• No handshake
• Ordering and retransmission handled at the application
level
• Stateless, so no connection state to manage
• DNS, VOIP, SNMP, RIP, VPNs, Games, Mosh
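UDP has no handshake and no ordering, so an application that cares about order must carry its own sequence numbers. Below is a minimal Python sketch over loopback. The 4-byte sequence header is a hypothetical framing, not any particular protocol's format.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # hypothetical framing: 4-byte big-endian sequence number


def send_datagrams(sock: socket.socket, addr: tuple[str, int], payloads: list[bytes]) -> None:
    """Fire and forget: no handshake, each datagram tagged with a sequence number."""
    for seq, payload in enumerate(payloads):
        sock.sendto(HEADER.pack(seq) + payload, addr)


def receive_in_order(sock: socket.socket, count: int, timeout: float = 1.0) -> list[bytes]:
    """Collect up to `count` datagrams and reorder them by sequence number.

    Lost datagrams are simply missing; retransmission would also be the application's job.
    """
    sock.settimeout(timeout)
    received: dict[int, bytes] = {}
    try:
        while len(received) < count:
            data, _ = sock.recvfrom(65535)
            (seq,) = HEADER.unpack_from(data)
            received[seq] = data[HEADER.size:]
    except socket.timeout:
        pass  # treat anything not yet received as lost
    return [received[seq] for seq in sorted(received)]
```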
Domain Name System
• DNS lookups can hamper user experience
significantly
• Synchronous lookup before each resource
access
• Uses UDP (usually) for client/server lookups
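To see what a lookup costs a client, time the blocking resolver call that precedes every connection to a new name. The Python sketch below uses `socket.getaddrinfo`, and the `timed_lookup` name is hypothetical. Because of caches in the OS and in nearby resolvers, a second call is usually much faster than the first.

```python
import socket
import time


def timed_lookup(hostname: str, port: int = 443) -> tuple[float, list[str]]:
    """Time a blocking resolver lookup; returns (seconds, sorted unique addresses)."""
    start = time.perf_counter()
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    elapsed = time.perf_counter() - start
    addresses = sorted({info[4][0] for info in infos})
    return elapsed, addresses
```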
DNS
• Caches are distributed nearer to the user (DNS
resolvers/forwarders)
• Great for popular sites
• Lower-traffic sites may still require an
authoritative lookup
DNS CNAMEs
• DNS CNAMEs - name -> name -> IP
• Can mean two DNS lookups and two round trips.
• Never use a CNAME at a zone apex: a CNAME cannot coexist
with other records at the same name, and the apex always
has SOA and NS records.
DNS Time to Live
• Time a DNS record is cached in non-authoritative
servers.
• Need to strike a balance between keeping the
record cached near the user and the ability to
update the record
• 1 day is a good starting point. Decrease it before
record switchovers.
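The TTL trade-off is easiest to see in a cache that honours expiry times. A changed record keeps being served from a resolver cache until the old TTL runs out, which is why the TTL is lowered ahead of a switchover. Here is a minimal Python sketch of such a cache, using a hypothetical `TTLCache` class rather than a real resolver.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Minimal record cache honouring per-record TTLs, like a caching resolver.

    `clock` is injectable so expiry can be tested without waiting.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: dict[str, tuple[str, float]] = {}

    def put(self, name: str, value: str, ttl_s: float) -> None:
        self._entries[name] = (value, self._clock() + ttl_s)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: the caller must query upstream again
            return None
        return value
```

With a 1-day TTL (86,400 s), an entry stored at time 0 is still served an hour later and expires at the one-day mark. Any change made upstream in between stays invisible to clients of this cache.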
DNS clients
• Avoid synchronous DNS lookups where possible:
async libraries, or batch process results later
• Consider local hosts files, use config
management to distribute
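One way to avoid serial, blocking lookups is to resolve names concurrently. The Python sketch below uses `asyncio`, whose `loop.getaddrinfo` runs the system resolver in a thread pool by default. `resolve_all` is a hypothetical helper.

```python
import asyncio
import socket


async def resolve_all(hostnames: list[str], port: int = 443) -> dict[str, list[str]]:
    """Resolve several names concurrently instead of one after another."""
    loop = asyncio.get_running_loop()

    async def one(name: str) -> list[str]:
        try:
            infos = await loop.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
        except socket.gaierror:
            return []  # unresolvable name: record as empty rather than fail the batch
        return sorted({info[4][0] for info in infos})

    results = await asyncio.gather(*(one(n) for n in hostnames))
    return dict(zip(hostnames, results))
```

For example, `asyncio.run(resolve_all(["example.com", "example.org"]))` issues both lookups at once, so the total wait is close to the slowest single lookup rather than the sum.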
DNS
• Keep DNS geographically close to users
• Use providers with anycast DNS servers
• Globally distribute records if the audience is
global
• Can make initial load significantly faster
QUIC
• Experimental protocol from Google for encrypted,
multiplexed streams over UDP
• Aims to reduce the number of round trips
• May make it into the next TLS standard
• Supported by Chrome; a prototype server is available
Client and server hosts
• Watch for queuing - something in a queue means
there are not enough resources to service the request
• Disk I/O has historically been a problem. Throughput
is measured in IOPS. SSDs are reducing this latency.
• Be familiar with the standard system monitoring
tools
• Be wary of multi-threaded processes and locks
Cloud
• Get familiar with cloud providers' tools. They give useful views
from outside the hosts.
• Load test for 5+ cycles of monitoring
• Can provide protocol level information
• Test apps from the point of view of the users -
Nagios, Pingdom, hitting representative endpoints
• Don’t take their word for performance - measure it
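Measuring it yourself means looking at the latency distribution and not just the average, because a few slow requests can hide behind a good mean. The Python sketch below times repeated calls and reports nearest-rank percentiles. `measure` and `percentile` are hypothetical helpers, and the function being timed should be whatever request is representative for your users.

```python
import math
import time
from typing import Callable


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile, for p in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def measure(fn: Callable[[], object], runs: int = 100) -> dict[str, float]:
    """Call fn repeatedly and report p50/p95/p99 latency in milliseconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000)
    return {f"p{p}": percentile(timings, p) for p in (50, 95, 99)}
```

For instance, `measure(lambda: urllib.request.urlopen("https://example.com/").read(), runs=50)` samples a hypothetical endpoint. Run it from where your users are, not only from inside the hosting provider.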