Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in...
-
Upload
tracy-stanley-curtis -
Category
Documents
-
view
218 -
download
0
Transcript of Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in...
![Page 1: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/1.jpg)
Data Center Networks
Jennifer RexfordCOS 461: Computer Networks
Lectures: MW 10-10:50am in Architecture N101
http://www.cs.princeton.edu/courses/archive/spr12/cos461/
![Page 2: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/2.jpg)
Networking Case Studies
2
Data Center
BackboneEnterprise
Cellular
Wireless
![Page 3: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/3.jpg)
Cloud Computing
3
![Page 4: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/4.jpg)
Cloud Computing• Elastic resources– Expand and contract resources– Pay-per-use– Infrastructure on demand
• Multi-tenancy– Multiple independent users– Security and resource isolation– Amortize the cost of the (shared) infrastructure
• Flexible service management
4
![Page 5: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/5.jpg)
Cloud Service Models• Software as a Service– Provider licenses applications to users as a service– E.g., customer relationship management, e-mail, ..– Avoid costs of installation, maintenance, patches, …
• Platform as a Service– Provider offers platform for building applications– E.g., Google’s App-Engine– Avoid worrying about scalability of platform
5
![Page 6: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/6.jpg)
Cloud Service Models• Infrastructure as a Service– Provider offers raw computing, storage, and
network– E.g., Amazon’s Elastic Computing Cloud (EC2)– Avoid buying servers and estimating resource needs
6
![Page 7: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/7.jpg)
Enabling Technology: Virtualization
• Multiple virtual machines on one physical machine• Applications run unmodified as on real machine• VM can migrate from one computer to another
7
![Page 8: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/8.jpg)
Multi-Tier Applications• Applications consist of tasks–Many separate components–Running on different machines
• Commodity computers–Many general-purpose computers–Not one big mainframe– Easier scaling
![Page 9: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/9.jpg)
Multi-Tier Applications
9
Front end Server
Aggregator
Aggregator Aggregator… …
Aggregator
Worker
…
Worker Worker
…
Worker Worker
![Page 10: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/10.jpg)
Data Center Network
10
![Page 11: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/11.jpg)
Virtual Switch in Server
11
![Page 12: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/12.jpg)
Top-of-Rack Architecture• Rack of servers– Commodity servers– And top-of-rack switch
• Modular design– Preconfigured racks– Power, network, and
storage cabling
12
![Page 13: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/13.jpg)
Aggregate to the Next Level
13
![Page 14: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/14.jpg)
Modularity, Modularity, Modularity
• Containers
• Many containers
14
![Page 15: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/15.jpg)
Data Center Network Topology
15
CR CR
AR AR AR AR. . .
SS
Internet
SS
A AA …
SS
A AA …
. . .
Key•CR = Core Router•AR = Access Router•S = Ethernet Switch•A = Rack of app. servers
~ 1,000 servers/pod
![Page 16: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/16.jpg)
Capacity Mismatch
16
CR CR
AR AR AR AR
SS
SS
A AA …
SS
A AA …
. . .
SS
SS
A AA …
SS
A AA …
~ 5:1~ 5:1
~ 40:1~ 40:1
~ 200:1~ 200:1
![Page 17: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/17.jpg)
Data-Center Routing
17
CR CR
AR AR AR AR. . .
SS
DC-Layer 3
Internet
SS
A AA …
SS
A AA …
. . .
DC-Layer 2
Key•CR = Core Router (L3)•AR = Access Router (L3)•S = Ethernet Switch (L2)•A = Rack of app. servers
~ 1,000 servers/pod == IP subnet
S S S S
SS
![Page 18: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/18.jpg)
Reminder: Layer 2 vs. Layer 3• Ethernet switching (layer 2)– Cheaper switch equipment– Fixed addresses and auto-configuration– Seamless mobility, migration, and failover
• IP routing (layer 3)– Scalability through hierarchical addressing– Efficiency through shortest-path routing– Multipath routing through equal-cost multipath
• So, like in enterprises…– Connect layer-2 islands by IP routers
18
![Page 19: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/19.jpg)
Case Study: Performance Diagnosis in Data Centers
http://www.eecs.berkeley.edu/~minlanyu/writeup/nsdi11.pdf
19
![Page 20: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/20.jpg)
Applications Inside Data Centers
20
Front end Server
Front end Server
AggregatorAggregator WorkersWorkers
….
…. …. ….
![Page 21: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/21.jpg)
Challenges of Datacenter Diagnosis• Multi-tier applications– Hundreds of application components– Tens of thousands of servers
• Evolving applications– Add new features, fix bugs– Change components while app is still in operation
• Human factors– Developers may not understand network well – Nagle’s algorithm, delayed ACK, etc.
21
![Page 22: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/22.jpg)
Diagnosing in Today’s Data Center
22
HostHost
App
OS Packet sniffer
App logs:#Reqs/sec
Response time1% req. >200ms
delay
Switch logs:#bytes/pkts per minute
Packet trace:Filter out trace for
long delay req.
SNAP:Diagnose net-app interactions
SNAP:Diagnose net-app interactions
![Page 23: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/23.jpg)
Problems of Different Logs
23
HostHost
App
OS Packet sniffer
App logs:Application-specific
Switch logs:Too coarse-grained
Packet trace:Too expensive
SNAP:Generic, fine-grained, and
lightweight
SNAP:Generic, fine-grained, and
lightweight
Runs everywhere, all the timeRuns everywhere, all the time
![Page 24: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/24.jpg)
TCP Statistics• Instantaneous snapshots– #Bytes in the send buffer– Congestion window size, receiver window size– Snapshots based on random sampling
• Cumulative counters– #FastRetrans, #Timeout– RTT estimation: #SampleRTT, #SumRTT– RwinLimitTime– Calculate difference between two polls
24
![Page 25: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/25.jpg)
Identifying Performance Problems
25
– Not any other problems
– Send buffer is almost full
– #Fast retransmission– #Timeout
– RwinLimitTime– Delayed ACKdiff(SumRTT)/diff(SampleRTT) > MaxDelay
Sender App
Send Buffer
Receiver
NetworkDirect measure
Sampling
Inference
![Page 26: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/26.jpg)
SNAP Architecture
26
At each host for every connection At each host for every connection
Collect data
Direct access to OS-Polling per-connection statistics: • Snapshots (#bytes in send buffer)• Cumulative counters (#FastRestrans)
-Adaptive tuning of polling rate
![Page 27: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/27.jpg)
SNAP Architecture
27
At each host for every connection At each host for every connection
Collect data
Performance Classifier
Classifying based on the life of data transfer-Algorithms for detecting performance problems-Based on direct measurement in the OS
![Page 28: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/28.jpg)
SNAP Architecture
28
At each host for every connection At each host for every connection
Collect data
Performance Classifier
Cross-connection correlation
Direct access to data center configurations-Input• Topology, routing information• Mapping from connections to processes/apps
-Correlate problems across connections• Sharing the same switch/link, app code
![Page 29: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/29.jpg)
SNAP Deployment• Production data center– 8K machines, 700 applications– Ran SNAP for a week, collected petabytes of data
• Identified 15 major performance problems– Operators: Characterize key problems in data center– Developers: Quickly pinpoint problems in app
software, network stack, and their interactions
29
![Page 30: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/30.jpg)
Characterizing Perf. Limitations
– Bottlenecked by CPU, disk, etc.– Slow due to app design (small writes)
30
Sender App
Send Buffer
Receiver
Network
#Apps that are limited for > 50% of the time
551 Apps
1 App
6 Apps
8 Apps
144 Apps
– Send buffer not large enough
– Fast retransmission – Timeout
– Not reading fast enough (CPU, disk, etc.)– Not ACKing fast enough (Delayed ACK)
![Page 31: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/31.jpg)
Delayed ACK • Delayed ACK caused significant problems– Delayed ACK was used to reduce bandwidth usage
and server interruption
31
Data
Data+ACK
Data
A B
ACK
B has data to send
B doesn’t have data to send
200 ms
….
Delayed ACK should be
disabled in data centers
Delayed ACK should be
disabled in data centers
![Page 32: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/32.jpg)
Diagnosing Delayed ACK with SNAP• Monitor at the right place– Scalable, low overhead data collection at all hosts
• Algorithms to identify performance problems– Identify delayed ACK with OS information
• Correlate problems across connections– Identify the apps with significant delayed ACK issues
• Fix the problem with operators and developers– Disable delayed ACK in data centers
32
![Page 33: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/33.jpg)
Conclusion• Cloud computing– Major trend in IT industry– Today’s equivalent of factories
• Data center networking– Regular topologies interconnecting VMs– Mix of Ethernet and IP networking
• Modular, multi-tier applications– New ways of building applications– New performance challenges
33
![Page 34: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/34.jpg)
Load Balancing
34
![Page 35: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/35.jpg)
Load Balancers• Spread load over server replicas– Present a single public address (VIP) for a service– Direct each request to a server replica
35
Virtual IP (VIP)192.121.10.1
10.10.10.1
10.10.10.2
10.10.10.3
![Page 36: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/36.jpg)
Wide-Area Network
36
Router
Router
DNS Server
DNS-basedsite
selection
Servers Servers
Internet
Clients
Data Centers
![Page 37: Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101](https://reader035.fdocuments.us/reader035/viewer/2022081515/56649d985503460f94a8280c/html5/thumbnails/37.jpg)
Wide-Area Network: Ingress Proxies
37
Router Router
Data CentersServers Servers
Clients
ProxyProxy