Virtual Layer 2: A Scalable and Flexible Data‐Center...
9/19/09
Virtual Layer 2: A Scalable and Flexible Data-Center Network
Changhoon Kim, Microsoft Research
Work with Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta
Tenets of Cloud-Service Data Center
• Agility: Assign any servers to any services – Boosts cloud utilization
• Scaling out: Use large pools of commodities – Achieves reliability, performance, low cost
Statistical Multiplexing Gain
Economies of Scale
What is VL2?
The first DC network that enables agility in a scaled-out fashion
• Why is agility important? – Today's DC network inhibits the deployment of other technical advances toward agility
• With VL2, cloud DCs can enjoy agility in full
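Status Quo: Conventional DC Network
[Figure: conventional DC network. The Internet connects to core routers (CR) and access routers (AR) in the DC Layer 3 domain; beneath them, Ethernet switches (S) in the DC Layer 2 domain connect racks of application servers (A).]
Key
• CR = Core Router (L3)
• AR = Access Router (L3)
• S = Ethernet Switch (L2)
• A = Rack of app. servers
~1,000 servers/pod == IP subnet
Reference – "Data Center: Load balancing Data Center Services", Cisco 2004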
Conventional DC Network Problems
[Figure: the same CR / AR / S / A tree, annotated with oversubscription ratios of ~5:1, ~40:1, and ~200:1.]
• Dependence on high-cost proprietary routers
• Extremely limited server-to-server capacity
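To make the oversubscription ratios concrete, the rough calculation below shows how much bandwidth each server would get if every server sent across an oversubscribed tier at once. The 1 Gbps NIC rate and the function name are assumptions for illustration; the slide gives only the ratios.

```python
def worst_case_share_mbps(nic_mbps: float, oversubscription: float) -> float:
    """Bandwidth per server when all servers send across the tier simultaneously."""
    return nic_mbps / oversubscription

NIC_MBPS = 1000.0  # assumed 1 Gbps server NIC (not stated on the slide)

for ratio in (5, 40, 200):  # ratios shown on the slide
    share = worst_case_share_mbps(NIC_MBPS, ratio)
    print(f"~{ratio}:1 -> {share:.0f} Mbps per server")
```

Under that assumed NIC rate, a ~200:1 ratio leaves about 5 Mbps per server in the worst case.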
And More Problems…
[Figure: the same tree topology with the server racks split between IP subnet (VLAN) #1 and IP subnet (VLAN) #2, annotated with ~200:1 oversubscription.]
• Resource fragmentation, significantly lowering cloud utilization (and cost-efficiency)
Know Your Cloud DC: Challenges
• Instrumented a large cluster used for data mining and identified distinctive traffic patterns
• Traffic patterns are highly volatile – A large number of distinctive patterns even in a day
• Traffic patterns are unpredictable – Correlation between patterns very weak
Optimization should be done frequently and rapidly
Know Your Cloud DC: Opportunities
• DC controller knows everything about hosts
• Host OS's are easily customizable
• Probabilistic flow distribution would work well enough, because…
  – Flows are numerous and not huge – no elephants!
  – Commodity switch-to-switch links are substantially thicker (~10x) than the maximum thickness of a flow
DC network can be made simple
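The claim that probabilistic flow distribution works well enough can be illustrated with a small simulation: many small flows are placed on uniformly random links, and the heaviest link is compared with the average. The flow count, link count, and flow size below are hypothetical values, not measurements from the talk.

```python
import random

def max_link_load(num_flows: int, num_links: int, flow_size: float, seed: int = 0) -> float:
    """Place each flow on a uniformly random link; return the heaviest link's load."""
    rng = random.Random(seed)
    load = [0.0] * num_links
    for _ in range(num_flows):
        load[rng.randrange(num_links)] += flow_size
    return max(load)

# Hypothetical workload: 10,000 flows, each one tenth of a link's capacity, over 100 links.
heaviest = max_link_load(num_flows=10_000, num_links=100, flow_size=0.1)
average = 10_000 * 0.1 / 100
print(f"heaviest / average link load = {heaviest / average:.2f}")
```

When flows are numerous and each is small relative to a link, the heaviest link stays close to the average; a few very large flows would break that property.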
All We Need is Just a Huge L2 Switch, or an Abstraction of One
[Figure: the CR / AR / S hierarchy is replaced by a single huge switch to which all server racks (A) attach directly.]
1. L2 semantics
2. Uniform high capacity
3. Performance isolation
Specific Objectives and Solutions
1. Layer-2 semantics – Approach: Employ flat addressing – Solution: Name-location separation & resolution service
2. Uniform high capacity between servers – Approach: Guarantee bandwidth for hose-model traffic – Solution: Flow-based random traffic indirection (Valiant LB)
3. Performance isolation – Approach: Enforce hose model using existing mechanisms only – Solution: TCP
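The hose model mentioned above is usually described as admitting any traffic matrix in which no server sends or receives more than its interface rate. A minimal check under that reading is sketched below; the function name, the matrix layout (`tm[i][j]` is the rate from server i to server j), and the example values are assumptions for illustration.

```python
from typing import Sequence

def conforms_to_hose_model(tm: Sequence[Sequence[float]], line_rate: float) -> bool:
    """True if no server's total egress or total ingress exceeds its line rate."""
    n = len(tm)
    for i in range(n):
        egress = sum(tm[i][j] for j in range(n) if j != i)
        ingress = sum(tm[j][i] for j in range(n) if j != i)
        if egress > line_rate or ingress > line_rate:
            return False
    return True

# Rates are fractions of a line rate of 1.0 (hypothetical matrices).
ok = [[0, 0.5, 0.5], [0.3, 0, 0.2], [0.1, 0.4, 0]]
bad = [[0, 0.9, 0.0], [0, 0, 0], [0, 0.9, 0]]  # server 1 would receive 1.8

print(conforms_to_hose_model(ok, 1.0), conforms_to_hose_model(bad, 1.0))
```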
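Addressing and Routing: Name-Location Separation
[Figure: servers x, y, and z sit behind ToR switches ToR1 to ToR4. A sender performs a Lookup & Response exchange with the Directory Service, then sends packets labeled with the destination's ToR: (ToR3, y, payload) and (ToR4, z, payload). Directory entries: x → ToR2, y → ToR3, z → ToR4. A second version of the table shows z → ToR3, and the packet to z is then labeled (ToR3, z, payload).]
Servers use flat names
Switches run link-state routing and maintain only switch-level topology
Cope with host churns with very little overhead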
• Allows to use low-cost switches
• Protects network and hosts from host-state churn
• Obviates host and switch reconfiguration
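The lookup-then-encapsulate flow can be sketched as a toy program. This is a minimal illustration only: `DirectoryService`, `Packet`, and `send` are hypothetical names, and the real system's interfaces and packet formats are not given on the slide. The entries mirror the figure (x → ToR2, y → ToR3, z → ToR4, later z → ToR3).

```python
from dataclasses import dataclass

@dataclass
class Packet:
    tor: str      # locator: the ToR switch the packet is sent toward
    dst: str      # flat server name, unchanged when the server moves
    payload: str

class DirectoryService:
    """Toy name-to-location map (hypothetical interface)."""

    def __init__(self) -> None:
        self._location = {}

    def update(self, server: str, tor: str) -> None:
        self._location[server] = tor

    def lookup(self, server: str) -> str:
        return self._location[server]

def send(directory: DirectoryService, dst: str, payload: str) -> Packet:
    """Resolve the destination's current ToR and label the packet with it."""
    return Packet(tor=directory.lookup(dst), dst=dst, payload=payload)

directory = DirectoryService()
for name, tor in (("x", "ToR2"), ("y", "ToR3"), ("z", "ToR4")):
    directory.update(name, tor)

before = send(directory, "z", "payload")  # labeled with ToR4
directory.update("z", "ToR3")             # only the directory entry changes
after = send(directory, "z", "payload")   # labeled with ToR3, same name z
print(before, after)
```

In this sketch the switches never learn about individual hosts; a host move touches only the directory, which is one way to read the "host-state churn" bullet.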
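Example Topology: Clos Network
[Figure: Clos network with intermediate (Int) switches at the top, K aggr switches with D ports in the middle, and TOR switches with 20 servers each at the bottom, for 20*(DK/4) servers in total.]
Offer huge aggr capacity and multipaths at modest cost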
D (# of 10G ports)    Max DC size (# of servers)
48                    11,520
96                    46,080
144                   103,680
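The table values follow from the slide's formula 20*(DK/4) if K is taken to equal D. That equality is an inference from the numbers rather than something the slide states, and the function name below is hypothetical.

```python
def max_servers(d_ports: int, k_aggr: int, servers_per_tor: int = 20) -> int:
    """Servers supported according to the slide's formula 20 * (D * K / 4)."""
    return servers_per_tor * d_ports * k_aggr // 4

for d in (48, 96, 144):
    # Assumption: K = D, which reproduces the table's figures.
    print(d, max_servers(d_ports=d, k_aggr=d))
# prints:
# 48 11520
# 96 46080
# 144 103680
```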
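Traffic Forwarding: Random Indirection
[Figure: ToR switches T1 to T6 sit below intermediate switches that are each labeled IANY. Server x sends to y with a packet labeled (T3, y, payload) and to z with a packet labeled (T5, z, payload). The legend distinguishes links used for up paths from links used for down paths.]
Cope with arbitrary traffic matrices (TMs) with very little overhead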
[ECMP + IP Anycast]
• Harness huge bisection bandwidth
• Obviate esoteric traffic engineering or optimization
• Ensure robustness to failures
• Work with switch mechanisms available today
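Flow-based random indirection can be sketched as a hash over the flow identifier that picks one intermediate switch per flow. This is only an illustration: in the design described here the choice is made by ECMP toward an anycast address inside the switches, whereas the code below stands in for that with an explicit hash. The switch names, the 5-tuple layout, and the function names are hypothetical.

```python
import hashlib

INTERMEDIATES = ["Int1", "Int2", "Int3"]  # hypothetical switch names

def pick_intermediate(flow: tuple, intermediates=INTERMEDIATES) -> str:
    """Hash the flow identifier so every packet of one flow takes the same path."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(intermediates)
    return intermediates[index]

def forward(src: str, dst: str, dst_tor: str, flow: tuple) -> list:
    """Hops for one packet: up to the chosen intermediate, then down to the ToR."""
    return [src, pick_intermediate(flow), dst_tor, dst]

# Hypothetical flow identifier: (source, destination, source port, destination port, protocol).
flow_xy = ("x", "y", 5000, 80, "tcp")
path = forward("x", "y", "T3", flow_xy)
print(path)
```

Hashing per flow rather than per packet keeps a flow's packets on one path, which avoids reordering while still spreading distinct flows across the intermediates.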
Does VL2 Ensure Uniform High Capacity?
• How "high" and "uniform" can it get? – Performed all-to-all data shuffle tests, then measured aggregate and per-flow goodput
  Goodput efficiency: 94%
  Fairness§ between flows: 0.995
• The cost for flow-based random spreading
[Figure: Fairness of Aggr-to-Int links' utilization. Fairness index§ (0.80 to 1.00) versus Time (s) (0 to 500).]
§ Jain's fairness index, defined as $(\sum_i x_i)^2 / (n \cdot \sum_i x_i^2)$
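Jain's fairness index as defined in the footnote can be computed directly. The sketch below uses made-up sample values, not data from the experiments, to show the two extremes: perfectly equal shares give 1, and a single flow taking everything gives 1/n.

```python
def jain_fairness(x):
    """Jain's fairness index: (sum of x_i)^2 / (n * sum of x_i^2)."""
    n = len(x)
    total = sum(x)
    squares = sum(v * v for v in x)
    return total * total / (n * squares)

print(jain_fairness([10, 10, 10, 10]))  # 1.0, all flows get the same goodput
print(jain_fairness([40, 0, 0, 0]))     # 0.25, one of four flows gets everything
```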