Hashing in Networked Systems
COS 461: Computer Networks, Spring 2011
Mike Freedman
http://www.cs.princeton.edu/courses/archive/spring11/cos461/


Hashing

•  Hash function
  –  Function that maps a large, possibly variable-sized datum into a small datum, often a single integer that serves to index an associative array
  –  In short: maps an n-bit datum into k buckets (k << 2^n)
  –  Provides a time- and space-saving data structure for lookup
•  Main goals:
  –  Low cost
  –  Deterministic
  –  Uniformity (load balanced)
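The definition above can be sketched in a few lines of Python. This is only an illustration: the helper names `bucket_for` and `bucket_counts`, the choice of SHA-256, and the test items are assumptions, not part of the lecture.

```python
import hashlib

def bucket_for(datum: bytes, k: int) -> int:
    """Map an arbitrary-length datum to one of k buckets (0..k-1)."""
    digest = hashlib.sha256(datum).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce mod k.
    return int.from_bytes(digest[:8], "big") % k

def bucket_counts(items, k: int):
    """Count how many items land in each bucket (a rough uniformity check)."""
    counts = [0] * k
    for item in items:
        counts[bucket_for(item, k)] += 1
    return counts

# Hypothetical input: 10,000 distinct byte strings spread over 8 buckets.
counts = bucket_counts((f"item-{i}".encode() for i in range(10_000)), k=8)
```

The same datum always yields the same bucket (deterministic), and with a well-mixing hash the counts should come out roughly equal (uniformity).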


Today's outline

•  Uses of hashing
  –  Equal-cost multipath routing in switches
  –  Network load balancing in server clusters
  –  Per-flow statistics in switches (QoS, IDS)
  –  Caching in cooperative CDNs and P2P file sharing
  –  Data partitioning in distributed storage services
•  Various hashing strategies
  –  Modulo hashing
  –  Consistent hashing
  –  Bloom Filters


Uses of Hashing


Equal-cost multipath routing (ECMP)

•  ECMP
  –  Multipath routing strategy that splits traffic over multiple paths for load balancing
•  Why not just round-robin packets?
  –  Reordering (lead to triple duplicate ACK in TCP?)
  –  Different RTT per path (for TCP RTO) …
  –  Different MTUs per path


Equal-cost multipath routing (ECMP)

•  Path-selection via hashing
  –  # buckets = # outgoing links
  –  Hash network information (source/dest IP addrs) to select outgoing link: preserves flow affinity
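A minimal sketch of hash-based path selection, assuming CRC32 as the hash and a hypothetical `ecmp_link` helper (real switches use their own hash functions and often include more header fields):

```python
import zlib

def ecmp_link(src_ip: str, dst_ip: str, num_links: int) -> int:
    """Pick an outgoing link index (0..num_links-1) from source/dest addresses."""
    key = f"{src_ip}->{dst_ip}".encode()
    # One bucket per outgoing link.
    return zlib.crc32(key) % num_links

# Every packet of the same src/dst pair maps to the same link,
# so packets within a flow are not reordered across paths.
link_a = ecmp_link("10.0.0.1", "192.168.1.7", num_links=4)
link_b = ecmp_link("10.0.0.1", "192.168.1.7", num_links=4)
```

Because the choice depends only on the addresses, no per-flow state is needed to keep a flow on one path.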


Now: ECMP in datacenters

•  Datacenter networks are multi-rooted trees
  –  Goal: Support for 100,000s of servers
  –  Recall Ethernet spanning tree problems: No loops
  –  L3 routing and ECMP: Take advantage of multiple paths


Network load balancing

•  Goal: Split requests evenly over k servers
  –  Map new flows to any server
  –  Packets of existing flows continue to use same server
•  3 approaches
  –  Load balancer terminates TCP, opens own connection to server
  –  Virtual IP / Dedicated IP (VIP/DIP) approaches
    •  One global-facing virtual IP represents all servers in cluster
    •  Hash client's network information (source IP:port)
    •  NAT approach: Replace virtual IP with server's actual IP
    •  Direct Server Return (DSR)
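The NAT variant of the VIP/DIP approach can be sketched as follows. The addresses, the dict-based packet representation, and the helper names `pick_dip` and `nat_rewrite` are all hypothetical, chosen only for illustration.

```python
import hashlib

VIP = "203.0.113.10"                          # hypothetical virtual IP
DIPS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]   # hypothetical dedicated IPs

def pick_dip(client_ip: str, client_port: int, dips=DIPS) -> str:
    """Hash the client's source IP:port to choose one server."""
    key = f"{client_ip}:{client_port}".encode()
    h = int.from_bytes(hashlib.sha1(key).digest()[:4], "big")
    return dips[h % len(dips)]

def nat_rewrite(packet: dict) -> dict:
    """Return a copy of the packet with the VIP replaced by the chosen DIP."""
    if packet["dst_ip"] != VIP:
        return packet
    rewritten = dict(packet)
    rewritten["dst_ip"] = pick_dip(packet["src_ip"], packet["src_port"])
    return rewritten

pkt = {"src_ip": "198.51.100.4", "src_port": 51515, "dst_ip": VIP}
out = nat_rewrite(pkt)
```

Hashing keeps packets of an existing flow on the same server without a flow table, but only as long as the list of servers does not change.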


Load balancing with DSR

•  Servers bind to both virtual and dedicated IP
•  Load balancer just replaces dest MAC addr
•  Server sees client IP, responds directly
  –  Packets in the reverse direction do not pass through the load balancer
  –  Greater scalability, particularly for traffic with asymmetric bandwidth (e.g., HTTP GETs)

[Figure: load balancer (LB), switches, and server cluster]

Per-flow state in switches

•  Switches often need to maintain connection records or per-flow state
  –  Quality-of-service for flows
  –  Flow-based measurement and monitoring
  –  Payload analysis in Intrusion Detection Systems (IDSs)
•  On packet receipt:
  –  Hash flow information (packet 5-tuple)
  –  Perform lookup if packet belongs to known flow
  –  Otherwise, possibly create new flow entry
  –  Probabilistic match (false positives) may be okay
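The packet-receipt steps above might look like the following sketch. It assumes a fixed-size table that stores only a packet counter per slot; `FiveTuple`, `FlowTable`, and the slot count are illustrative names and values, not taken from the lecture.

```python
import zlib
from collections import namedtuple

FiveTuple = namedtuple("FiveTuple", "src_ip dst_ip src_port dst_port proto")

class FlowTable:
    """Fixed-size per-flow packet counters indexed by a hash of the 5-tuple."""

    def __init__(self, num_slots: int = 1024):
        self.packet_counts = [0] * num_slots

    def _slot(self, flow: FiveTuple) -> int:
        key = "|".join(map(str, flow)).encode()
        return zlib.crc32(key) % len(self.packet_counts)

    def on_packet(self, flow: FiveTuple) -> bool:
        """Update state for one packet; return True if the slot was already in use."""
        slot = self._slot(flow)
        is_known = self.packet_counts[slot] > 0
        self.packet_counts[slot] += 1  # creates the entry on first packet
        return is_known

table = FlowTable()
flow = FiveTuple("10.0.0.1", "10.0.0.2", 1234, 80, "tcp")
first = table.on_packet(flow)
second = table.on_packet(flow)
```

Since the table does not store the 5-tuple itself, two flows that hash to the same slot are treated as one flow. That is the probabilistic match (false positive) the slide refers to.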


Cooperative Web CDNs

•  Tree-like topology of cooperative web caches
  –  Check local
  –  If miss, check siblings / parent
•  One approach
  –  Internet Cache Protocol (ICP)
  –  UDP-based lookup, short timeout
•  Alternative approach
  –  A priori guess of whether siblings/children have the content
  –  Nodes share hash table of cached content with parent / siblings
  –  Probabilistic check (false positives) okay, as actual ICP lookup to neighbor could just return false

[Figure: public Internet, parent web cache]

Hash tables in P2P file-sharing

•  Two-layer network (e.g., Gnutella, Kazaa)
  –  Ultrapeers are more stable, not NATted, higher bandwidth
  –  Leaf nodes connect with 1 or more ultrapeers
•  Ultrapeers handle content searches
  –  Leaf nodes send hash table of content to ultrapeers
  –  Search requests flooded through ultrapeer network
  –  When ultrapeer gets request, checks hash tables of its children for match


Data partitioning

•  Network load balancing: All machines are equal
•  Data partitioning: Machines store different content
•  Non-hash-based solution
  –  "Directory" server maintains mapping from O(entries) to machines (e.g., Network file system, Google File System)
  –  Named data can be placed on any machine
•  Hash-based solution
  –  Nodes maintain mappings from O(buckets) to machines
  –  Data placed on the machine that owns the name's bucket


Examples of data partitioning

•  Akamai
  –  1000 clusters around Internet, each >= 1 servers
  –  Hash (URL's domain) to map to one server
  –  Akamai DNS aware of hash function, returns machine that
    1.  is in geographically-nearby cluster
    2.  manages particular customer domain
•  Memcached (Facebook, Twitter, …)
  –  Employ k machines for in-memory key-value caching
  –  On read:
    •  Check memcache
    •  If miss, read data from DB, write to memcache
  –  On write: invalidate cache, write data to DB
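The memcached read and write paths above can be sketched with plain dictionaries standing in for the k cache machines and the database. The class name `CacheAside` and the use of CRC32 modulo k to pick a machine are assumptions for illustration; the slide does not say how keys are assigned to machines.

```python
import zlib

class CacheAside:
    def __init__(self, k: int, database: dict):
        self.caches = [dict() for _ in range(k)]  # stand-ins for k memcached machines
        self.db = database                         # stand-in for the backing DB

    def _cache_for(self, key: str) -> dict:
        # Assumption: partition keys over the k machines by hash(key) mod k.
        return self.caches[zlib.crc32(key.encode()) % len(self.caches)]

    def read(self, key: str):
        cache = self._cache_for(key)
        if key in cache:              # check memcache
            return cache[key]
        value = self.db[key]          # miss: read data from DB
        cache[key] = value            # ... and write to memcache
        return value

    def write(self, key: str, value) -> None:
        self._cache_for(key).pop(key, None)  # invalidate cache
        self.db[key] = value                 # write data to DB

store = CacheAside(k=4, database={"user:1": "alice"})
v1 = store.read("user:1")
store.write("user:1", "bob")
v2 = store.read("user:1")
```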


How Akamai Works – Already Cached

[Figure: end-user, cnn.com (content provider), DNS root server, Akamai high-level DNS server, Akamai low-level DNS server, and a nearby hash-chosen Akamai server in a cluster. Steps 1, 2, 7, 8, 9, 10 are labeled; requests shown: GET index.html and GET /cnn.com/foo.jpg]


Hashing Techniques


Basic Hash Techniques

•  Simple approach for uniform data
  –  If data distributed uniformly over N, for N >> n
  –  Hash fn = <data> mod n
  –  Fails goal of uniformity if data not uniform
•  Non-uniform data, variable-length strings
  –  Typically split strings into blocks
  –  Perform rolling computation over blocks
    •  CRC32 checksum
    •  Cryptographic hash functions (SHA-1 has 64 byte blocks)
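To see why plain `<data> mod n` fails on non-uniform data, the sketch below buckets a deliberately skewed input (every value a multiple of 8) three ways. The input set and the helper `sha1_bucket` are made up for illustration.

```python
import hashlib
import zlib
from collections import Counter

n = 8                                   # number of buckets
data = [8 * i for i in range(1000)]     # non-uniform input: all multiples of 8

def sha1_bucket(x: int) -> int:
    """Bucket via a cryptographic hash of the value's decimal string."""
    digest = hashlib.sha1(str(x).encode()).digest()
    return int.from_bytes(digest[:8], "big") % n

naive = Counter(x % n for x in data)                            # <data> mod n
crc = Counter(zlib.crc32(str(x).encode()) % n for x in data)    # CRC32, then mod n
sha = Counter(sha1_bucket(x) for x in data)                     # SHA-1, then mod n
```

Here `naive` puts all 1000 values into bucket 0, while mixing the input through a checksum or cryptographic hash first should spread the values over the buckets.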


Applying Basic Hashing

•  Consider problem of data partition:
  –  Given document X, choose one of k servers to use
•  Suppose we use modulo hashing
  –  Number servers 1..k
  –  Place X on server i = (X mod k)
    •  Problem? Data may not be uniformly distributed
  –  Place X on server i = hash(X) mod k
    •  Problem?
      –  What happens if a server fails or joins (k → k±1)?
      –  What if different clients have different estimates of k?
      –  Answer: Nearly all entries get remapped to new nodes!
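The remapping problem can be checked empirically. In this sketch (hypothetical key names, SHA-1 as the hash, servers numbered 0..k-1 rather than 1..k), a key keeps its server only when hash(X) mod 10 equals hash(X) mod 11, which holds for about 1 in 11 keys, so roughly 10/11 of the keys move when one server joins.

```python
import hashlib

def server_for(key: str, k: int) -> int:
    """Modulo hashing: server index = hash(key) mod k."""
    h = int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")
    return h % k

def fraction_remapped(keys, k_old: int, k_new: int) -> float:
    """Fraction of keys whose server changes when k_old becomes k_new."""
    moved = sum(1 for key in keys
                if server_for(key, k_old) != server_for(key, k_new))
    return moved / len(keys)

keys = [f"doc-{i}" for i in range(10_000)]
moved = fraction_remapped(keys, 10, 11)   # one server joins: k = 10 -> 11
```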


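Consistent Hashing

•  Consistent hashing partitions key-space among nodes
•  Contact appropriate node to lookup/store key
  –  Blue node determines red node is responsible for key1
  –  Blue node sends lookup or insert to red node

[Figure: key-space with key1, key2, key3 divided among nodes; the blue node sends insert(key1, value) and lookup(key1) to the red node, which holds key1=value]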


Consistent Hashing

•  Partitioning key-space among nodes
  –  Nodes choose random identifiers: e.g., hash(IP)
  –  Keys randomly distributed in ID-space: e.g., hash(URL)
  –  Keys assigned to node "nearest" in ID-space
  –  Spreads ownership of keys evenly across nodes

[Figure: ID-space with node identifiers 0000, 0010, 0110, 1010, 1100, 1110, 1111 and keys URL1, URL2, URL3 at 0001, 0100, 1011]


Consistent Hashing

[Figure: mod 2^n circle with positions 0, 4, 8, 12, 14 marked and a bucket labeled]

•  Construction
  –  Assign C hash buckets to random points on mod 2^n circle; hash key size = n
  –  Map object to random position on circle
  –  Hash of object = closest clockwise bucket
•  Desired features
  –  Balanced: No bucket has disproportionate number of objects
  –  Smoothness: Addition/removal of bucket does not cause movement among existing buckets (only immediate buckets)
  –  Spread and load: Small set of buckets that lie near object
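The construction above can be sketched as a sorted list of bucket positions searched with bisection. This is a minimal illustration under stated assumptions: `HashRing` and `ring_position` are made-up names, SHA-1 supplies the "random" positions, n = 64 bits, each bucket gets a single point, and position collisions between buckets are ignored.

```python
import bisect
import hashlib

def ring_position(name: str, n_bits: int = 64) -> int:
    """Place a bucket or object name on the mod 2^n circle."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << n_bits)

class HashRing:
    def __init__(self, buckets=()):
        self._points = []   # sorted bucket positions on the circle
        self._owner = {}    # position -> bucket name
        for bucket in buckets:
            self.add(bucket)

    def add(self, bucket: str) -> None:
        p = ring_position(bucket)
        bisect.insort(self._points, p)
        self._owner[p] = bucket

    def remove(self, bucket: str) -> None:
        p = ring_position(bucket)
        self._points.remove(p)
        del self._owner[p]

    def lookup(self, key: str) -> str:
        """Return the closest clockwise bucket for the key."""
        p = ring_position(key)
        i = bisect.bisect_left(self._points, p)
        # Wrap around past the highest point back to the first bucket.
        return self._owner[self._points[i % len(self._points)]]

ring = HashRing(f"10.0.0.{i}" for i in range(1, 11))
keys = [f"http://example.com/{i}" for i in range(10_000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("10.0.0.11")                       # a new bucket joins
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` now belongs to the new bucket, and no key moves between the existing buckets, which is the smoothness property. With one point per bucket the balance can be uneven; placing several points per bucket is a common way to even it out.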


Bloom Filters

•  Data structure for probabilistic membership testing
  –  Small amount of space, constant time operations
  –  False positives possible, no false negatives
  –  Useful in per-flow network statistics, sharing information between cooperative caches, etc.
•  Basic idea using hash fn's and bit array
  –  Use k independent hash functions to map item to array
  –  If all array elements are 1, it's present. Otherwise, not


Bloom Filters

Start with an m bit array, filled with 0s.

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

To insert, hash each item k times. If Hi(x) = a, set Array[a] = 1.

0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0

To check if y is in set, check array at Hi(y). All k values must be 1.

0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0

Possible to have a false positive: all k values are 1, but y is not in set.

0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0
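The insert and check steps can be sketched as below. The class `BloomFilter`, the method names, and the way the k hash functions are derived (one SHA-256 call salted with the index i) are assumptions for illustration; any k independent hash functions would serve.

```python
import hashlib

class BloomFilter:
    def __init__(self, m: int, k: int):
        self.m = m               # number of bits in the array
        self.k = k               # number of hash functions
        self.bits = [0] * m      # start with all 0s

    def _indexes(self, item: str):
        # Derive H_1..H_k by salting a single hash with the index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, item: str) -> None:
        for a in self._indexes(item):
            self.bits[a] = 1

    def might_contain(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[a] for a in self._indexes(item))

bf = BloomFilter(m=1024, k=3)
members = [f"flow-{i}" for i in range(50)]
for item in members:
    bf.insert(item)
```

An inserted item is always reported as present (no false negatives). An item that was never inserted is reported as present only if all k of its positions happen to have been set by other items.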


Today's outline

•  Uses of hashing
  –  Equal-cost multipath routing in switches
  –  Network load balancing in server clusters
  –  Per-flow statistics in switches (QoS, IDS)
  –  Caching in cooperative CDNs and P2P file sharing
  –  Data partitioning in distributed storage services
•  Various hashing strategies
  –  Modulo hashing
  –  Consistent hashing
  –  Bloom Filters
