Next Steps in Internet Content Delivery Peter B Danzig [email protected].
Posted: 18-Dec-2015
Understanding WAN traffic
HOW MUCH WEB TRAFFIC CROSSES THE INTERNET?
How much WAN HTTP traffic?
Assumptions:
250 million internet users
Average 10 kbits/s per user when online
Average 10% online
Yields:
Bandwidth = 250M * 10kb/s * 0.10 = 250 Gbits/s
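The estimate above is a single multiplication. A minimal Python sketch, using only the three assumptions listed on the slide (the constant and function names are illustrative, not from the talk):

```python
# Back-of-envelope WAN HTTP bandwidth estimate from the slide's assumptions.
USERS = 250_000_000        # internet users
RATE_BITS_PER_S = 10_000   # average 10 kbit/s per user while online
FRACTION_ONLINE = 0.10     # 10% of users online at any moment


def wan_bandwidth_gbits(users, rate_bits_per_s, fraction_online):
    """Aggregate bandwidth in Gbit/s for the given population."""
    return users * rate_bits_per_s * fraction_online / 1e9


estimate = wan_bandwidth_gbits(USERS, RATE_BITS_PER_S, FRACTION_ONLINE)
print(f"{estimate:.0f} Gbit/s")  # prints "250 Gbit/s"
```

Changing any one input scales the result linearly, which makes it easy to see how sensitive the 250 Gbit/s figure is to each assumption.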
How much WAN HTTP traffic?
Observations:
Doubleclick.Com is 1% of web by byte
Geocities.Com is 1% of web by byte
Download.Microsoft.Com is 1% of web by byte
Learn their aggregate bandwidth purchases, and estimate internet bandwidth:
250 Gbits/s
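The second estimate scales one site's purchased bandwidth up by its share of web bytes. The sketch below shows that extrapolation in Python. The slides do not state what any of these sites actually purchases, so the 2.5 Gbit/s input is a hypothetical value, chosen only because it is what a 1% share would have to be to match the 250 Gbit/s result.

```python
def internet_bandwidth_gbits(site_gbits, site_share):
    """Scale one site's bandwidth up by its share of total web bytes."""
    if not 0 < site_share <= 1:
        raise ValueError("site_share must be in (0, 1]")
    return site_gbits / site_share


# HYPOTHETICAL input: not given in the slides; see the note above.
HYPOTHETICAL_SITE_GBITS = 2.5
SITE_SHARE = 0.01  # "1% of web by byte", from the slide

total = internet_bandwidth_gbits(HYPOTHETICAL_SITE_GBITS, SITE_SHARE)
print(f"{total:.0f} Gbit/s")
```

Repeating the calculation for several sites with known shares gives independent estimates that can be compared against the user-based figure.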
Internet Web Sites
[Figure: percent of Internet traffic (y-axis, 0 to 45) versus number of Web sites (x-axis: 20, 100, 300, 1000)]
More than 6,000 ISPs
[Figure: percent of Internet traffic (y-axis, 0 to 100) versus number of ISPs (x-axis: 1, 10, 100, 1000, 10000)]
Internet CDN Market Sizing
• Market Size = $2M/Gb * 250 Gb * 3yr / Sqrt(2)yr
• Revenue Potential = Market Size * Mkt Fraction
• Market size grows by 4.5x every two years
• Today: Mkt = (250 Gb) (Multiplex Factor) + Streaming Mkt
• Streaming Mkt = Yahoo Broadcast + IBeam + Akamai + Real Broadcast Network < 10Gb/s
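The sizing bullets above can be evaluated directly. The Python sketch below takes the slide's market-size expression literally, with units exactly as printed (the slide does not explain the 3yr / Sqrt(2)yr factor, and this sketch does not try to), and converts "4.5x every two years" into a compound annual factor. All names are illustrative.

```python
import math

# The slide's expression, evaluated literally.
PRICE_PER_GBIT = 2_000_000  # "$2M/Gb"
BANDWIDTH_GBITS = 250       # "250 Gb"
market_size = PRICE_PER_GBIT * BANDWIDTH_GBITS * 3 / math.sqrt(2)

# "Grows by 4.5x every two years" as a per-year multiplier.
annual_growth = math.sqrt(4.5)


def projected(size_now, years):
    """Market size after `years` of compound growth."""
    return size_now * annual_growth ** years


print(f"market size: ${market_size / 1e9:.2f}B")
print(f"annual growth factor: {annual_growth:.2f}x")
```

Revenue potential then follows from the second bullet by multiplying `market_size` by an assumed market fraction.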
Internet CDN Strategy
• Except for the top 1000 sites, the world’s 199,000+ web sites serve minimal bandwidth
• This determines the business strategy
• Account provisioning needs to be cheap & easy
• Need indirect sales
• Need bigger, more expensive product bundle
• Customer care needs to be inexpensive
• Make money from streaming media?
Internet CDN Strategy, cont.
• Live Streaming Media
• Lights, camera, action
• Event connectivity: ISDN or satellite truck roll
• Production and encoding
• Yucky, dirty, icky, labor intensive, non-cerebral, labor-of-love, crafty, stuff
• Work more reminiscent of WebVan than Cisco
Akamai ‘GIF’ Delivery
[Figure: the same page shown two ways. Without Akamai: entire Web page delivered by CNN. “Akamaized”: HTML delivered by CNN, embedded objects “Akamaized”.]
KeyNote System Measurements
[Figure: Keynote measurements, with Akamai versus without Akamai]
KeyNote Systems & its Wannabes
• Deploys “footprint” of monitoring agents, provisioning interface, global log collection, reports
• Agents: Emulate URL and page download. Emulate broadband and dialup access rates
• Wannabe Competitors: Mercury Interactive, Service Metrics, StreamCheck, etc.
KeyNote: Operational Issues
• Where’s the bottleneck: the agent or the agent’s network connection?
• Where’s the agent’s DNS resolver?
• How to excise mistaken points from database?
• How can a CDN beat a Keynote benchmark?
• How does Keynote’s TCP stack affect its results?
End-to-End CDN Measurements?
• Contrast methodology between Johnson et al and Keynote Systems
• Server log analysis, e.g. Web Trends
  • Server logs don’t record page arrival times, as the bytes stay queued in TCP or OS buffers.
• Client-side reporting (e.g. WebSideStory)
  • Place JavaScript on web page that reports client experience to aggregator
HTML Delivery
• Consider Web traffic breakdown:
  • GIFs and JPEGs: 55%
  • HTML: 25%
  • J. Random Gunk: 20%
• HTML is half of the delivery market, but HTML is 1/3 static and 2/3 dynamic HTML?
HTML Delivery
• Delivering static HTML from caches is fast
• How can we make dynamic HTML faster?
  • Compress it or delta-encode it
  • Black magic: Transfer it over a TCP tunnel or L2TP
    • Little’s Law almost always surprises laymen.
  • Construct or “assemble” it within the CDN via proprietary language extensions
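To make the "compress it or delta-encode it" option concrete, here is a minimal Python sketch comparing plain compression of a dynamic page with compression against an earlier version of the same page. The slides do not say which delta scheme is meant. This sketch uses zlib's preset-dictionary feature as a stand-in, and the page template, user names, and timestamps are invented for illustration.

```python
import zlib


def compress_plain(page: bytes) -> bytes:
    """Ordinary compression with no knowledge of earlier versions."""
    return zlib.compress(page, 9)


def delta_encode(base: bytes, page: bytes) -> bytes:
    """Compress `page` using `base` (a version the client already holds)
    as a preset dictionary, so shared content costs almost nothing."""
    compressor = zlib.compressobj(level=9, zdict=base)
    return compressor.compress(page) + compressor.flush()


def delta_decode(base: bytes, delta: bytes) -> bytes:
    """Rebuild the page from the same base version and the delta."""
    decompressor = zlib.decompressobj(zdict=base)
    return decompressor.decompress(delta) + decompressor.flush()


# HYPOTHETICAL dynamic page: only the user name and timestamp change.
TEMPLATE = (
    "<html><head><title>Headlines</title></head><body>"
    "<h1>Top stories for {user}</h1>"
    "<ul><li>Markets close higher on strong earnings</li>"
    "<li>Storm expected to reach the coast by Friday</li>"
    "<li>Local team wins championship in overtime</li></ul>"
    "<p>Generated at {time}</p></body></html>"
)
base = TEMPLATE.format(user="alice", time="10:00:00").encode()
page = TEMPLATE.format(user="bob", time="10:00:05").encode()

plain = compress_plain(page)
delta = delta_encode(base, page)
print(len(page), len(plain), len(delta))
```

The delta is useful only when both ends hold the same base version, so a real deployment would also need a way to name and agree on that version.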
Future of HTML Delivery
• Profiling: detect client location and link speed
• Interpret XML style sheets at edge (see Oracle/Akamai)
• Insert ads
• Compress at source/decompress in browser or edge network
Components of a CDN
• Distributed server load balancing, e.g. “Internet Mapping”
• DNS redirection, hashing, and fault tolerance
• Distributed system monitoring
• Distributed software configuration management
Components of a CDN (cont)
• Live stream distribution and entry points
• Log collection, reporting, and performance monitoring
• Client provisioning mechanism
• Content management and replication
Network Mapping
• Network mapping chooses reasonable data centers to satisfy a client request.
• We could devote an entire day to mapping.
• Briefly, what factors help predict good mapping?
  • Contracted data center bandwidth
  • Path characteristics: RTT, Bottleneck Bandwidth, “Experience”, Autonomous Systems Crossed, Hop Count, Observed loss rates, etc.
• How do you measure these factors?
• Mapping is an art.
Black Art of Network Mapping
• Cisco Boomerang
  • Synchronized DNS servers
• Radware’s DSLB box
  • Linear combination of hop count and RTT
• F5’s 3DNS
  • ICMP ping
• Alteon, Foundry, Resonate, and others….
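As an illustration of the "linear combination of hop count and RTT" idea listed above, the Python sketch below scores candidate data centers and picks the cheapest. It is not any vendor's actual algorithm. The weights, data center names, and measurements are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PathEstimate:
    data_center: str
    hop_count: int
    rtt_ms: float


# HYPOTHETICAL weights: one extra hop is treated as costing 5 ms of RTT.
HOP_WEIGHT = 5.0
RTT_WEIGHT = 1.0


def cost(path: PathEstimate) -> float:
    """Linear combination of hop count and round-trip time; lower is better."""
    return HOP_WEIGHT * path.hop_count + RTT_WEIGHT * path.rtt_ms


def choose_data_center(paths):
    """Return the name of the data center with the lowest cost."""
    if not paths:
        raise ValueError("no candidate data centers")
    return min(paths, key=cost).data_center


# HYPOTHETICAL measurements from one client's resolver.
candidates = [
    PathEstimate("east", hop_count=12, rtt_ms=40.0),     # cost 100
    PathEstimate("west", hop_count=8, rtt_ms=85.0),      # cost 125
    PathEstimate("europe", hop_count=15, rtt_ms=110.0),  # cost 185
]
print(choose_data_center(candidates))
```

The other factors named earlier (loss rate, contracted bandwidth, autonomous systems crossed) could enter as further weighted terms, though choosing the weights is where the "art" comes in.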
Live Stream Distribution
• Ubiquitous IP Multicast hasn’t emerged
• Alternative: IP Multicast plus FEC
• Yahoo Broadcast’s Approach:
  • Private network link to principal ISPs
  • Support multicast where available
  • Otherwise, just blast it by unicast and hope
Live Stream Distribution
• Some CDNs attempt to route independent live streams via multiple paths
• Encode with simple error correction codes; a better code would increase delay
• Makes client provisioning more challenging: need to get encoded signal to multiple entry points
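The slides do not name the error correction code in use. As one example of a "simple" code, the Python sketch below appends a single XOR parity packet to each group of equal-length packets, which lets a receiver rebuild any one lost packet in the group. The function names and packet contents are illustrative.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(packets):
    """Return the data packets followed by one XOR parity packet."""
    if not packets:
        raise ValueError("empty group")
    size = len(packets[0])
    if any(len(p) != size for p in packets):
        raise ValueError("packets must have equal length")
    parity = bytes(size)  # all zeros
    for p in packets:
        parity = xor_bytes(parity, p)
    return list(packets) + [parity]


def recover(received):
    """Take a group (data + parity) where a lost packet is None and
    return the data packets, rebuilding at most one loss."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one loss")
    received = list(received)
    if missing:
        size = len(next(p for p in received if p is not None))
        rebuilt = bytes(size)
        for p in received:
            if p is not None:
                rebuilt = xor_bytes(rebuilt, p)
        received[missing[0]] = rebuilt
    return received[:-1]  # drop the parity packet


group = [b"AAAA", b"BBBB", b"CCCC"]
sent = add_parity(group)
damaged = [sent[0], None, sent[2], sent[3]]  # second packet lost in transit
print(recover(damaged) == group)
```

The delay trade-off shows up here: the receiver cannot repair a loss until the rest of the group has arrived, so larger groups or stronger codes mean more buffering before playback.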
Live Stream Distribution
• Splitter-combiner network burns bandwidth
• Subscription and teardown expensive, given low median subscriber count
  • According to Yahoo Broadcast
  • Mean subscribers? Average subscribers?
• Splitter/combiner masks failures too successfully, until hell breaks loose
DNS Redirection, Hashing, and Fault Tolerance
• Top-level DNS: Uses IP Anycast to a dozen DNS servers (or more)
• Second-level DNS servers: Redirect client to a reasonable region
• Low-level DNS servers: Implement something akin to consistent hashing
• Hot spare address takeover to mask machine failures
Distributed system monitoring
• Problem: export monitoring information across thousands of machines running in hundreds of regions
• Design principles:
  • Aggregation
  • Scalable
  • Extensible data types
  • Fault-tolerant
  • Timely delivery
  • Expressible queries
Distributed software configuration management
• Manage software and OS on thousands of remote machines
• Stage system software pushes
• Detect incompatibilities before hell breaks loose
Log collection, reporting, and performance monitoring
• Collect and create database of 10-100 billion log lines per day
• Allow customer to see their logs and performance
• How would you do this in real time?
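To get a feel for the real-time question, it helps to convert the daily volume into a sustained per-second rate. A minimal Python sketch, assuming the load is spread evenly over the day (a simplification, since real traffic peaks would be higher):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def lines_per_second(lines_per_day):
    """Average sustained ingest rate for a given daily log volume."""
    return lines_per_day / SECONDS_PER_DAY


low = lines_per_second(10e9)    # 10 billion lines per day
high = lines_per_second(100e9)  # 100 billion lines per day
print(f"{low:,.0f} to {high:,.0f} log lines per second")
```

That works out to roughly a hundred thousand to over a million lines per second on average, before allowing for peaks.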
Content management and replication
• Reliably update replicated hosting
• Mask storage volume boundaries
• Enable billing and reclaiming lost space
Consistent Hashing
• Cute algorithm for splitting load across multiple servers
• Create permutation on hash bucket
• Add servers and subtract servers for given bucket (e.g. permutation) in same order
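One possible reading of the bullets above is sketched below in Python: each bucket gets its own fixed ordering (permutation) of all servers, and the bucket is served by the first server in that ordering that is currently live. This is an illustration of the idea, not Akamai's implementation. Ranking servers by a hash of (bucket, server) is the same trick used in rendezvous hashing, and the server and bucket names are hypothetical.

```python
import hashlib


def permutation(bucket, servers):
    """Fixed, bucket-specific ordering of all servers."""
    def rank(server):
        return hashlib.sha256(f"{bucket}|{server}".encode()).digest()
    return sorted(servers, key=rank)


def pick(bucket, servers, live):
    """Walk the bucket's permutation and return the first live server."""
    for server in permutation(bucket, servers):
        if server in live:
            return server
    raise LookupError("no live server for bucket")


# HYPOTHETICAL cluster and buckets.
servers = ["s1", "s2", "s3", "s4", "s5"]
buckets = [f"bucket{i}" for i in range(200)]

before = {b: pick(b, servers, set(servers)) for b in buckets}
after = {b: pick(b, servers, set(servers) - {"s3"}) for b in buckets}

moved = [b for b in buckets if before[b] != after[b]]
print(f"{len(moved)} of {len(buckets)} buckets moved after losing s3")
```

Because every bucket always walks its servers in the same order, removing a server moves only the buckets it was serving, and bringing it back returns exactly those buckets to it.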
Consistent Hashing
• http://a32.g.akamaitech.net/
• Would a less elegant algorithm suffice?
  • Yes, hit rates are 98-99% anyway, so any hash algorithm suffices.
• The 2nd level of Akamai DNS servers slightly degrades performance, since DNS TTLs are short
What are the next steps?
• Got to address HTTP and compression/delta encoding
• What about peer-to-peer for GIFs and video?
• How about PVR (e.g. TiVo) and peer-to-peer?
• What about live stream distribution?