6.888: Lecture 2
Data Center Network Architectures
Mohammad Alizadeh
Spring 2016
Slides adapted from presentations by Albert Greenberg and Changhoon Kim (Microsoft)

Data Center Costs

| Amortized Cost* | Component | Sub-Components |
|---|---|---|
| ~45% | Servers | CPU, memory, disk |
| ~25% | Power infrastructure | UPS, cooling, power distribution |
| ~15% | Power draw | Electrical utility costs |
| ~15% | Network | Switches, links, transit |

*3-year amortization for servers, 15-year for infrastructure; 5% cost of money
The Cost of a Cloud: Research Problems in Data Center Networks. SIGCOMM CCR 2009. Greenberg, Hamilton, Maltz, Patel.
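
The amortization in the footnote can be made concrete with the standard fixed-payment (annuity) formula. This is only an illustrative sketch: the capital costs below are hypothetical placeholders rather than figures from the paper, and `monthly_payment` is a name introduced here.

```python
def monthly_payment(capital_cost, annual_rate, years):
    """Level monthly payment that repays capital_cost over `years`
    at `annual_rate` cost of money (standard annuity formula)."""
    months = years * 12
    r = annual_rate / 12.0
    if r == 0:
        return capital_cost / months
    return capital_cost * r / (1.0 - (1.0 + r) ** -months)

# Hypothetical capital costs, for illustration only.
server_cost = 3000.0            # one server, amortized over 3 years
infra_cost_per_server = 3000.0  # per-server share of infrastructure, 15 years

server_monthly = monthly_payment(server_cost, 0.05, 3)
infra_monthly = monthly_payment(infra_cost_per_server, 0.05, 15)
print(f"server: ${server_monthly:.2f}/month")
print(f"infrastructure: ${infra_monthly:.2f}/month")
```

With equal capital outlay, the 15-year item costs far less per month than the 3-year item, which is one reason servers dominate the amortized total.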

Server Costs
Ugly secret: 30% utilization considered “good” in data centers
Uneven application fit
– Each server has CPU, memory, disk: most applications exhaust one resource, stranding the others
Long provisioning timescales
– New servers purchased quarterly at best
Uncertainty in demand
– Demand for a new service can spike quickly
Risk management
– Not having spare servers to meet demand brings failure just when success is at hand
Session state and storage constraints
– If the world were stateless servers, life would be good

Goal: Agility – Any Service, Any Server
Turn the servers into a single large fungible pool
– Dynamically expand and contract service footprint as needed
Benefits
– Increase service developer productivity
– Lower cost
– Achieve high performance and reliability
The 3 motivators of most infrastructure projects

Achieving Agility
Workload management
– Means for rapidly installing a service’s code on a server
– Virtual machines, disk images, containers
Storage management
– Means for a server to access persistent data
– Distributed filesystems (e.g., HDFS, blob stores)
Network
– Means for communicating with other servers, regardless of where they are in the data center

Conventional DC Network
Reference: “Data Center: Load Balancing Data Center Services”, Cisco 2004
[Figure: Internet, core routers (CR) and access routers (AR) in DC-Layer 3; Ethernet switches (S) and racks of app. servers (A) in DC-Layer 2]
Key
• CR = Core Router (L3)
• AR = Access Router (L3)
• S = Ethernet Switch (L2)
• A = Rack of app. servers
~1,000 servers/pod == IP subnet

Layer 2 vs. Layer 3
Ethernet switching (layer 2)
✓ Fixed IP addresses and auto-configuration (plug & play)
✓ Seamless mobility, migration, and failover
✗ Broadcast limits scale (ARP)
✗ Spanning Tree Protocol
IP routing (layer 3)
✓ Scalability through hierarchical addressing
✓ Multipath routing through equal-cost multipath
✗ More complex configuration
✗ Can’t migrate w/o changing IP address
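
The migration drawback of layer 3 can be illustrated with Python's standard `ipaddress` module. This is a minimal sketch: the pod names and the 10.0.x.0/22 subnets are made up for illustration (a /22 holds 1,024 addresses, roughly one pod of ~1,000 servers).

```python
import ipaddress

# Hypothetical pods, each owning one IP subnet.
POD_SUBNETS = {
    "pod1": ipaddress.ip_network("10.0.0.0/22"),
    "pod2": ipaddress.ip_network("10.0.4.0/22"),
}

def can_keep_address(server_ip, target_pod):
    """With hierarchical (layer 3) addressing, a server may keep its IP
    only if the address belongs to the target pod's subnet."""
    return ipaddress.ip_address(server_ip) in POD_SUBNETS[target_pod]

server = "10.0.1.17"  # currently lives in pod1
print(can_keep_address(server, "pod1"))  # staying inside its own pod
print(can_keep_address(server, "pod2"))  # migrating to another pod
```

Because routing is driven by the subnet prefix, moving the server to pod2 forces a new address, which is exactly what breaks live migration and long-lived connections.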

Conventional DC Network Problems
[Figure: the conventional CR / AR / S / A topology, annotated with oversubscription ratios of ~5:1, ~40:1, and ~200:1]
Dependence on high-cost proprietary routers
Extremely limited server-to-server capacity
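
To see what these ratios mean for a single server, recall that an oversubscription of N:1 means the servers below a layer can offer N times more traffic than that layer's uplinks can carry. The sketch below assumes a 1 Gbps server NIC (an assumption, not stated on the slide) and that all servers send across the layer at once; the function name is introduced here.

```python
def worst_case_per_server_mbps(nic_mbps, oversubscription):
    """Bandwidth each server gets across a layer when every server
    below it transmits at once through an N:1 oversubscribed layer."""
    return nic_mbps / oversubscription

NIC_MBPS = 1000  # assumed 1 Gbps server NIC
for ratio in (5, 40, 200):
    share = worst_case_per_server_mbps(NIC_MBPS, ratio)
    print(f"~{ratio}:1 -> {share:g} Mbps per server")
```

At ~200:1 a server is left with about 5 Mbps, which is why services end up being placed close together rather than anywhere in the data center.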

And More Problems…
[Figure: the conventional topology split into IP subnet (VLAN) #1 and IP subnet (VLAN) #2, annotated with ~200:1]
• Resource fragmentation, significantly lowering cloud utilization (and cost-efficiency)
• Complicated manual L2/L3 re-configuration
Measurements

DC Traffic Characteristics
Instrumented a large cluster used for data mining and identified distinctive traffic patterns
Traffic patterns are highly volatile
– A large number of distinctive patterns even in a day
Traffic patterns are unpredictable
– Correlation between patterns very weak
Traffic-aware optimization needs to be done frequently and rapidly

DC Opportunities
DC controller knows everything about hosts
Host OS’s are easily customizable
Probabilistic flow distribution would work well enough, because…
– Flows are numerous and not huge – no elephants
– Commodity switch-to-switch links are substantially thicker (~10x) than the maximum thickness of a flow
DC network can be made simple
??

Intuition
Higher speed links improve flow-level load balancing (ECMP)
11×10 Gbps flows (55% load)
20×10 Gbps uplinks: Prob of 100% throughput = 3.27%
2×100 Gbps uplinks: Prob of 100% throughput = 99.95%
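
The two probabilities can be checked with a short exact calculation. The sketch below assumes each of the 11 flows is hashed independently and uniformly onto one uplink, and that 100% throughput means no uplink receives more 10 Gbps flows than it can carry (1 per 10 Gbps link, 10 per 100 Gbps link); the function name is introduced here for illustration. Under these assumptions the 20-uplink case reproduces 3.27%, while the 2-uplink case evaluates to about 99.90%, slightly below the quoted 99.95% (which equals 1 − 2^−11), so the slide may be counting the all-on-one-link cases differently.

```python
from math import comb

def prob_full_throughput(num_flows, num_links, flows_per_link):
    """Probability that uniform random hashing of num_flows flows onto
    num_links links leaves no link with more than flows_per_link flows."""
    # ways[j] = number of ways to place j of the (labeled) flows
    # on the links processed so far without overloading any of them.
    ways = [1] + [0] * num_flows
    for _ in range(num_links):
        nxt = [0] * (num_flows + 1)
        for placed, count in enumerate(ways):
            if count == 0:
                continue
            remaining = num_flows - placed
            for k in range(min(flows_per_link, remaining) + 1):
                nxt[placed + k] += count * comb(remaining, k)
        ways = nxt
    return ways[num_flows] / num_links ** num_flows

p_many_thin = prob_full_throughput(11, 20, 1)   # 20 x 10 Gbps uplinks
p_few_thick = prob_full_throughput(11, 2, 10)   # 2 x 100 Gbps uplinks
print(f"20 x 10G:  {p_many_thin:.2%}")
print(f"2 x 100G: {p_few_thick:.2%}")
```

The total uplink capacity is the same in both cases; only the granularity of the links relative to a flow changes, and that alone moves the outcome from rarely to almost always collision-free.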

What You Said
“In 3.2, the paper states that randomizing large flows won't cause much perpetual congestion if misplaced since large flows are only 100 MB and thus take 1 second to transmit on a 1 Gbps link. Isn't 1 second sufficiently high to harm the isolation that VL2 tries to provide?”

Virtual Layer 2 Switch

VL2 Goals
[Figure: racks of app. servers (A)]
1. L2 semantics
2. Uniform high capacity
3. Performance isolation

VL2 Design Principles
Randomizing to Cope with Volatility
– Tremendous variability in traffic matrices
Separating Names from Locations
– Any server, any service
Embracing End Systems
– Leverage the programmability & resources of servers
– Avoid changes to switches
Building on Proven Networking Technology
– Build with parts shipping today
– Leverage low cost, powerful merchant silicon ASICs, though do not rely on any one vendor

Single-Chip “Merchant Silicon” Switches
[Figure: Wedge and 6-pack switches, and a switch ASIC. Image courtesy of Facebook]

Specific Objectives and Solutions

| Objective | Approach | Solution |
|---|---|---|
| 1. Layer-2 semantics | Employ flat addressing | Name-location separation & resolution service |
| 2. Uniform high capacity between servers | Guarantee bandwidth for hose-model traffic | Flow-based random traffic indirection (Valiant LB) |
| 3. Performance isolation | Enforce hose model using existing mechanisms only | TCP |
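
The hose model referred to here admits any traffic matrix in which no server sends or receives more than its own line rate. A minimal sketch of that check follows; the function name, the three-server matrices, and the 1.0 Gbps line rate are illustrative assumptions.

```python
def satisfies_hose_model(traffic_matrix, line_rate):
    """traffic_matrix[i][j] is the rate server i sends to server j.
    The hose model only bounds each server's total egress and ingress."""
    n = len(traffic_matrix)
    for i in range(n):
        egress = sum(traffic_matrix[i])
        ingress = sum(traffic_matrix[j][i] for j in range(n))
        if egress > line_rate or ingress > line_rate:
            return False
    return True

# Rates in Gbps between three hypothetical servers with 1 Gbps NICs.
tm_ok = [
    [0.0, 0.5, 0.5],
    [0.3, 0.0, 0.2],
    [0.2, 0.3, 0.0],
]
tm_bad = [
    [0.0, 0.0, 0.6],
    [0.0, 0.0, 0.6],  # server 2 would receive 1.2 Gbps in total
    [0.0, 0.0, 0.0],
]
print(satisfies_hose_model(tm_ok, 1.0))
print(satisfies_hose_model(tm_bad, 1.0))
```

The second matrix violates the model at the receiver, which is the case that end-host congestion control (TCP in the table above) is expected to rein in.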
Discussion

What You Said
“It is interesting that this paper is from 2009. It seems that a large number of the suggestions in this paper are used in practice today.”

What You Said
“For address resolution, why not have applications use hostnames and use DNS to resolve hostnames to IP addresses (the mapping from hostname to IP could be updated when a service moved)? Is the directory system basically just DNS but with IPs instead of hostnames?”
“it was unclear why the hash of the 5-tuple is required.”
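
Addressing and Routing: Name-Location Separation
Servers use flat names
Switches run link-state routing and maintain only switch-level topology
Cope with host churns with very little overhead
[Figure: servers x, y, z attached to ToR1 to ToR4. Packets carry the destination name plus the destination's ToR: payload for y is sent to ToR3, payload for z to ToR4 and, in a later step, to ToR3. Directory Service entries (Lookup & Response): x → ToR2, y → ToR3, z → ToR4; then x → ToR2, y → ToR3, z → ToR3]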
• Allows use of low-cost switches
• Protects network and hosts from host-state churn
• Obviates host and switch reconfiguration
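
The lookup-then-encapsulate step can be sketched in a few lines. This is a simplified illustration, not the actual VL2 agent: `DirectoryService`, `Packet`, and `encapsulate` are hypothetical names, and the mappings are the x, y, z entries shown on the slide.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    dst_name: str       # flat name of the destination server
    payload: bytes
    dst_tor: str = ""   # locator filled in by encapsulation

class DirectoryService:
    """Maps flat server names to the ToR switch they sit behind."""
    def __init__(self):
        self._name_to_tor = {}

    def update(self, name, tor):
        self._name_to_tor[name] = tor

    def lookup(self, name):
        return self._name_to_tor[name]

def encapsulate(packet, directory):
    """Sender side: resolve the name and address the packet to the ToR."""
    return Packet(packet.dst_name, packet.payload,
                  directory.lookup(packet.dst_name))

directory = DirectoryService()
for name, tor in (("x", "ToR2"), ("y", "ToR3"), ("z", "ToR4")):
    directory.update(name, tor)

before = encapsulate(Packet("z", b"payload"), directory)
directory.update("z", "ToR3")  # z moves; only the directory entry changes
after = encapsulate(Packet("z", b"payload"), directory)
print(before.dst_tor, after.dst_tor)
```

The server keeps the name z throughout, and the switches never learn about individual hosts, which is why host churn costs the network so little.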

Example Topology: Clos Network
[Figure: Clos network with intermediate (Int) switches, K aggregation (Aggr) switches with D ports, and ToR switches with 20 servers each]
20*(DK/4) servers
Offer huge aggregate capacity and multipaths at modest cost

| D (# of 10G ports) | Max DC size (# of servers) |
|---|---|
| 48 | 11,520 |
| 96 | 46,080 |
| 144 | 103,680 |
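
The table values follow from the 20*(DK/4) formula when the number of aggregation switches K equals the port count D. The sketch below makes that assumption explicit (K = D is inferred from the numbers, not stated on the slide); `max_servers` is a name introduced here.

```python
def max_servers(d_ports, k_aggr=None, servers_per_tor=20):
    """Servers supported by the Clos topology: 20 * (D * K / 4)."""
    if k_aggr is None:
        k_aggr = d_ports  # assumption: K = D, which reproduces the table
    num_tors = d_ports * k_aggr // 4
    return servers_per_tor * num_tors

for d in (48, 96, 144):
    print(d, max_servers(d))
```

Doubling the switch port count quadruples the number of servers, since both D and K grow.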
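
Traffic Forwarding: Random Indirection
[Figure: ToR switches T1 to T6 below intermediate switches that share the anycast address I_ANY. Server x sends one payload to y (addressed to T3) and one to z (addressed to T5), each first addressed to I_ANY. Legend: links used for up paths, links used for down paths]
Cope with arbitrary TMs with very little overhead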
[ECMP + IP Anycast]
• Harness huge bisection bandwidth
• Obviate esoteric traffic engineering or optimization
• Ensure robustness to failures
• Work with switch mechanisms available today
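
Flow-level random indirection is commonly realized by hashing a flow's 5-tuple to choose among equal-cost paths: every packet of one flow takes the same path (so packets are not reordered), while different flows spread across the intermediate switches. The sketch below is illustrative only; real switches use their own hardware hash functions, SHA-256 is a stand-in, and the switch names and flows are hypothetical.

```python
import hashlib

def pick_intermediate(five_tuple, intermediates):
    """Choose an intermediate switch from a flow's 5-tuple:
    (src IP, dst IP, protocol, src port, dst port)."""
    key = "|".join(str(field) for field in five_tuple).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(intermediates)
    return intermediates[index]

intermediates = ["I1", "I2", "I3", "I4"]  # hypothetical switch names
flow = ("10.0.0.1", "10.0.1.2", 6, 40000, 80)

# The same flow always maps to the same intermediate switch.
print(pick_intermediate(flow, intermediates))
print(pick_intermediate(flow, intermediates))

# Many flows between the same pair of hosts still spread out,
# because the source port differs per flow.
chosen = {pick_intermediate(("10.0.0.1", "10.0.1.2", 6, port, 80), intermediates)
          for port in range(40000, 41000)}
print(sorted(chosen))
```

Hashing only on addresses would pin all traffic between two hosts to one path; including the ports is what lets many flows between the same hosts use different paths.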

What You Said
“…the heterogeneity of racks and the incremental deployment of new racks may introduce asymmetry to the topology. In this case, more delicate topology design and routing algorithms are needed.”

Some other DC network designs…
Fat-tree [SIGCOMM’08]
Jellyfish (random) [NSDI’12]
BCube [SIGCOMM’09]

Next time: Congestion Control