Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to...

69
Peer‐to‐peer systems and Distributed Hash Tables (DHTs) COS 461: Computer Networks Spring 2010 (MW 3:00‐4:20 in COS 105) Mike Freedman hKp://www.cs.princeton.edu/courses/archive/spring10/cos461/

Transcript of Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to...

Page 1: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

Peer‐to‐peersystemsandDistributedHashTables(DHTs)

COS461:ComputerNetworksSpring2010(MW3:00‐4:20inCOS105)

MikeFreedmanhKp://www.cs.princeton.edu/courses/archive/spring10/cos461/

Page 2: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

OverlayNetworks•  P2PapplicaPonsneedto:

– TrackidenPPes&(IP)addressesofpeers•  Maybemanyandmayhavesignificantchurn•  Bestnottohaven2IDreferences

– Thus,nodes’“views”<<viewinconsistenthashing– Routemessagesamongpeers

•  Ifyoudon’tkeeptrackofallpeers,thisis“mulP‐hop”

•  Overlaynetwork– PeersdoingbothnamingandrouPng–  IPbecomes“just”thelow‐leveltransport

•  AlltheIProuPngisopaque•  AssumpPonthatnetworkisfully‐connected(true?)

(ManyslidesborrowedfromJoeHellerstein’sVLDB’04keynote)

Page 3: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

ManyNewChallenges

•  RelaPvetootherparallel/distributedsystems– ParPalfailure– Churn– Fewguaranteesontransport,storage,etc.– HugeopPmizaPonspace– NetworkboKlenecks&otherresourceconstraints– NoadministraPveorganizaPons– Trustissues:security,privacy,incenPves

•  RelaPvetoIPnetworking– MuchhigherfuncPon,moreflexible– Muchlesscontrollable/predictable

Page 4: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

EarlyP2P

Page 5: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

EarlyP2PI:Client‐Server

•  Napster

xyz.mp3?

xyz.mp3

Page 6: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

EarlyP2PI:Client‐Server

•  Napster– Client‐searchsearch

xyz.mp3

Page 7: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

EarlyP2PI:Client‐Server

•  Napster– Client‐searchsearch

xyz.mp3?

xyz.mp3

Page 8: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

EarlyP2PI:Client‐Server

•  Napster– Client‐searchsearch– “P2P”filexfer

xyz.mp3?

xyz.mp3

Page 9: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

EarlyP2PI:Client‐Server

•  Napster– Client‐searchsearch– “P2P”filexfer

xyz.mp3?

xyz.mp3

Page 10: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

EarlyP2PII:FloodingonOverlays

xyz.mp3?

xyz.mp3

An“unstructured”overlaynetwork

Page 11: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

EarlyP2PII:FloodingonOverlays

xyz.mp3?

xyz.mp3

Flooding

Page 12: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

EarlyP2PII:FloodingonOverlays

xyz.mp3?

xyz.mp3

Flooding

Page 13: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

EarlyP2PII:FloodingonOverlays

xyz.mp3

Page 14: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

EarlyP2PII.v:“Ultra/superpeers”•  Ultra‐peerscanbeinstalled(KaZaA)orself‐promoted(Gnutella)–  AlsousefulforNATcircumvenPon,e.g.,inSkype

Page 15: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

HierarchicalNetworks(&Queries)

•  IP–  Hierarchicalnamespace–  HierarchicalrouPng:AS’scorr.withnamespace(notperfectly)

•  DNS–  Hierarchicalnamespace(“clients”+hierarchyofservers)–  HierarchicalrouPngw/aggressivecaching

•  TradiPonalpros/consofhierarchicalmgmt– Workswellforthingsalignedwiththehierarchy

•  E.g.,physicaloradministraPvelocality

–  Inflexible•  Nodataindependence!

Page 16: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

LessonsandLimitaPons

•  Client‐Serverperformswell–  Butnotalwaysfeasible:Performancenotonenkeyissue!

•  Thingsthatflood‐basedsystemsdowell–  Organicscaling–  DecentralizaPonofvisibilityandliability–  Findingpopularstuff–  Fancylocalqueries

•  Thingsthatflood‐basedsystemsdopoorly–  Findingunpopularstuff–  Fancydistributedqueries–  VulnerabiliPes:datapoisoning,tracking,etc.–  Guaranteesaboutanything(answerquality,privacy,etc.)

Page 17: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

StructuredOverlays:DistributedHashTables

Page 18: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

DHTOutline

•  High‐leveloverview•  Fundamentalsofstructurednetworktopologies

–  Andexamples

•  OneconcreteDHT–  Chord

•  Somesystemsissues–  Heterogeneity–  Storagemodels&sonstate

–  Locality–  Churnmanagement

–  Underlaynetworkissues

Page 19: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

High‐LevelIdea:IndirecPon

•  IndirecPoninspace–  Logical(content‐based)IDs,rouPngtothoseIDs

•  “Content‐addressable”network–  Tolerantofchurn

•  nodesjoiningandleavingthenetworktoh

y

zh=y

Page 20: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

High‐LevelIdea:IndirecPon

•  IndirecPoninspace–  Logical(content‐based)IDs,rouPngtothoseIDs

•  “Content‐addressable”network–  Tolerantofchurn

•  nodesjoiningandleavingthenetwork

•  IndirecPoninPme–  Temporallydecouplesendandreceive–  Persistencerequired.Hence,typicalsol’n:sonstate

•  Comboofpersistenceviastorageandviaretry–  “Publisher”requestsTTLonstorage–  Republishesasneeded

•  Metaphor:DistributedHashTable

tohz

h=z

Page 21: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

WhatisaDHT?

•  HashTable– Datastructurethatmaps“keys”to“values”– EssenPalbuildingblockinsonwaresystems

•  DistributedHashTable(DHT)– Similar,butspreadacrosstheInternet

•  Interface–  insert(key,value)orput(key,value)–  lookup(key)orget(key)

Page 22: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

How?

EveryDHTnodesupportsasingleoperaPon:

– Givenkeyasinput;routemessagestowardnodeholdingkey

Page 23: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

K V

K V

K V

K V

K V

K V

K V

K V

K V

K V

K V

DHTinacPon

Page 24: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

K V

K V

K V

K V

K V

K V

K V

K V

K V

K V

K V

DHTinacPon

Page 25: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

K V

K V

K V

K V

K V

K V

K V

K V

K V

K V

K V

DHTinacPon

OperaPon:takekeyasinput;routemsgstonodeholdingkey

Page 26: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

K V

K V

K V

K V

K V

K V

K V

K V

K V

K V

K V

DHTinacPon:put()

put(K1,V1)

OperaPon:takekeyasinput;routemsgstonodeholdingkey

Page 27: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

K V

K V

K V

K V

K V

K V

K V

K V

K V

K V

K V

DHTinacPon:put()

put(K1,V1)

OperaPon:takekeyasinput;routemsgstonodeholdingkey

Page 28: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

(K1,V1)

K V

K V K V

K V

K V

K V

K V

K V

K V

K V

K V

DHTinacPon:put()

OperaPon:takekeyasinput;routemsgstonodeholdingkey

Page 29: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

get(K1)

K V

K V K V

K V

K V

K V

K V

K V

K V

K V

K V

DHTinacPon:get()

OperaPon:takekeyasinput;routemsgstonodeholdingkey

Page 30: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

get(K1)

K V

K V K V

K V

K V

K V

K V

K V

K V

K V

K V

IteraPvevs.RecursiveRouPngPreviouslyshowedrecursive.AnotheropPon:itera8ve

OperaPon:takekeyasinput;routemsgstonodeholdingkey

Page 31: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

DHTDesignGoals

•  An“overlay”networkwith:–  Flexiblemappingofkeystophysicalnodes–  Smallnetworkdiameter

–  Smalldegree(fanout)–  LocalrouPngdecisions–  Robustnesstochurn–  RouPngflexibility–  Decentlocality(low“stretch”)

•  Different“storage”mechanismsconsidered:–  Persistencew/addiPonalmechanismsforfaultrecovery–  Besteffortcachingandmaintenanceviasonstate

Page 32: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

DHTOutline

•  High‐leveloverview•  Fundamentalsofstructurednetworktopologies

–  Andexamples

•  OneconcreteDHT–  Chord

•  Somesystemsissues–  Heterogeneity–  Storagemodels&sonstate

–  Locality–  Churnmanagement

–  Underlaynetworkissues

Page 33: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

AnExampleDHT:Chord

•  Assumen=2mnodesforamoment– A“complete”Chordring

– We’llgeneralizeshortly

Page 34: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

AnExampleDHT:Chord

• EachnodehasparPcularviewofnetwork– Setofknownneighbors

Page 35: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

AnExampleDHT:Chord

• EachnodehasparPcularviewofnetwork– Setofknownneighbors

Page 36: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

AnExampleDHT:Chord

• EachnodehasparPcularviewofnetwork– Setofknownneighbors

Page 37: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

CayleyGraphs

•  TheCayleyGraph(S,E)ofagroup:–  VerPcescorrespondingtotheunderlyingsetS–  Edgescorrespondingtotheac8onsofthegenerators

•  (Complete)ChordisaCayleygraphfor(Zn,+)–  S=Zmodn(n=2k).

–  Generators{1,2,4,…,2k‐1}–  That’swhatthepolygonsareallabout!

•  Fact:Most(complete)DHTsareCayleygraphs–  Andtheydidn’tevenknowit!–  FollowsfromparallelInterConnectNetworks(ICNs)

Page 38: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

HowHairymetCayley•  Whatdoyouwantinastructurednetwork?

–  UniformityofrouPnglogic–  Efficiency/load‐balanceofrouPngandmaintenance–  Generalityatdifferentscales

•  Theorem:AllCayleygraphsarevertexsymmetric.–  I.e.isomorphicunderswapsofnodes–  SorouPngfromytoxlooksjustlikerouPngfrom(y‐x)to0

•  TherouPngcodeateachnodeisthesame•  Moreover,underarandomworkloadtherouPngresponsibiliPes(congesPon)ateachnodearethesame!

•  Cayleygraphstendtohavegooddegree/diametertradeoffs–  EfficientrouPngwithfewneighborstomaintain

•  ManyCayleygraphsarehierarchical–  MadeofsmallerCayleygraphsconnectedbyanewgenerator

•  E.g.aChordgraphon2m+1nodeslookslike2interleaved(half‐notchrotated)Chordgraphsof2mnodeswithhalf‐notchedges

Page 39: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

Pastry/Bamboo

•  BasedonPlaxtonMesh•  Namesarefixedbitstrings•  Topology:PrefixHypercube

–  Foreachbitfromlentoright,pickneighborIDwithcommonflippedbitandcommonprefix

–  logndegree&diameter

•  Plusaring–  Forreliability(withkpred/succ)

•  SuffixRouPngfromAtoB–  “Fix”bitsfromlentoright–  E.g.1010to0001:

1010

0101

1100 1000

1011

1010→0101→0010→0000→0001

Page 40: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

CAN:ContentAddressableNetwork

  ExploitmulPpledimensions

  Eachnodeisassignedazone

(0,0) (1,0)

(0,1)

(0,0.5,0.5,1)

(0.5,0.5,1,1)

(0,0,0.5,0.5)

• •

••

(0.5,0.25,0.75,0.5)

(0.75,0,1,0.5)•

  NodesID’dbyzoneboundaries  Join:choserandompoint,

splititszones

Page 41: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

RouPngin2‐dimensions

•  RouPngisnavigaPngad‐dimensionalIDspace–  RoutetoclosestneighborindirecPonofdesPnaPon–  RouPngtablecontainsO(d)neighbors

•  NumberofhopsisO(dN1/d)

(0,0) (1,0)

(0,1)

(0,0.5,0.5,1)

(0.5,0.5,1,1)

(0,0,0.5,0.5) (0.75,0,1,0.5)

••

(0.5,0.25,0.75,0.5)

Page 42: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

Koorde

•  DeBruijngraphs–  Linkfromnodextonodes2xand2x+1

–  Degree2,diameterlogn•  OpPmal!

•  KoordeisChord‐based–  BasicallyChord,butwithDeBruijnfingers

Page 43: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

TopologiesofOtherOn‐citedDHTs•  Tapestry

–  VerysimilartoPastry/Bambootopology–  Noring

•  Kademlia–  AlsosimilartoPastry/Bamboo

–  Butthe“ring”isorderedbytheXORmetric:“bidirecPonal”–  UsedbytheeMule/BitTorrent/Azureus(Vuze)systems

•  Viceroy–  AnemulatedBuKerflynetwork

•  Symphony–  Arandomized“small‐world”network

Page 44: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

IncompleteGraphs:EmulaPon•  ForChord,weassumedexactly2mnodes.Whatifnot?–  Needto“emulate”acompletegraphevenwhenincomplete.

•  DHT‐specificschemesused–  InChord,nodexisresponsiblefortherange(pred(x),x]

–  The“holes”ontheringshouldberandomlydistributedduetohashing

Page 45: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

Handlenodeheterogeneity•  Sourcesofunbalancedload

–  UnequalporPonofkeyspace–  Unequalloadperkey

•  Balancingkeyspace–  Consistenthashing:RegionownedbysinglenodeisO(1/n(1+logn))

– Whataboutnodehetergeneity?•  Nodescreate“virtualnodes”of#proporPonaltocapacity

•  Loadperkey–  Assumesmanykeysperregion

Page 46: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

ChordinFlux

•  EssenPallynevera“complete”graph– Maintaina“ring”ofsuccessornodes

–  Forredundancy,pointtoksuccessors

–  PointtonodesresponsibleforIDsatpowersof2

•  Called“fingers”inChord•  1stfingeristhesuccessor

Page 47: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

JoiningtheChordRing

•  NeedIPofsomenode•  PickarandomID

–  e.g.SHA‐1(IP)•  SendmsgtocurrentownerofthatID–  That’syoursuccessorinChordrouPng

Page 48: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

JoiningtheChordRing

•  Updatepred/succlinks–  Onceringisinplace,allwell!

•  InformapplicaPontomovedataappropriately

•  Searchtofind“fingers”ofvaryingpowersof2–  Orjustcopyfrompred/succandcheck!

•  Inboundfingersfixedlazily

Theorem:Ifconsistencyisreachedbeforenetworkdoubles,lookupsremainlogn

Page 49: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

Fingersmustbeconstrained?

•  No:ProximityNeighborSelecPon(PNS)

Page 50: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

HandlingChurn

•  Churn– SessionPme?LifePme?

•  Forsystemresilience,sessionPmeiswhatmaKers

•  Threemainissues– DeterminingPmeouts

•  Significantcomponentoflookuplatencyunderchurn

– Recoveringfromalostneighborin“leafset”•  Periodic,notreacPve!•  ReacPvecausesfeedbackcycles

–  Esp.whenaneighborisstressedandPminginandout

– NeighborselecPonagain

Page 51: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

Timeouts•  RecallIteraPvevs.RecursiveRouPng

–  IteraPve:OriginatorrequestsIPaddressofeachhop–  Recursive:Messagetransferredhop‐by‐hop

•  EffectonPmeoutmechanism–  NeedtotracklatencyofcommunicaPonchannels–  IteraPveresultsindirectn×ncommunicaPon

•  Can’tkeepPmeoutstatsatthatscale•  SoluPon:virtualcoordinateschemes[Vivaldi,etc.]

– WithrecursivecandoTCP‐liketrackingoflatency•  ExponenPallyweightedmeanandvariance

•  Upshot:BothworkOKuptoapoint–  TCP‐styledoessomewhatbeKerthanvirtualcoordsatmodestchurnrates(23min.ormoremeansessionPme)

–  Virtualcoordsbeginstofailathigherchurnrates

Page 52: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

Recursivevs.IteraPve

Len:SimulaPonof20,000lkpsforrandomkeysRecursivelookuptakes0.6PmesaslongasiteraPve

RightFigure:1,000lookupsintest‐bed;confirmssimulaPon

Page 53: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

Recursivevs.IteraPve

•  Recursive–  FasterundermanycondiPons

•  Fewerround‐trip‐Pmes•  BeKerproximityneighborselecPon•  CanPmeoutindividualRPCsmorePghtly

–  BeKertolerancetonetworkfailure•  Pathbetweenneighborsalreadyknown

•  IteraPve–  TightercontroloverenPrelookup

•  EasilysupportwindowedRPCsforparallelism•  EasiertoPmeoutenPrelookupasfailed

–  Fastertoreturndatadirectlythanuserecursivepath

Page 54: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

StorageModelsforDHTs

•  UptonowwefocusedonrouPng– DHTsas“content‐addressablenetwork”

•  Implicitin“DHT”nameissomekindofstorage– OrperhapsabeKerwordis“memory”– EnablesindirecPoninPme– Butalsocanbeviewedasaplacetostorethings

Page 55: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

Storagemodels

•  Storeonlyonkey’simmediatesuccessor– Churn,rouPngissues,packetlossmakelookupfailuremorelikely

•  Storeonksuccessors– Whennodesdetectsucc/predfail,re‐replicate

•  Cachealongreverselookuppath– Provideddataisimmutable– …andperformingrecursiveresponses

Page 56: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

Storageonsuccessors?•  Erasure‐coding

– Datablocksplitintolfragments– mdiff.fragmentsnecessarytoreconstructtheblock–  Redundantstorageofdata

•  ReplicaPon– NodestoresenPreblock–  Specialcase:m =1andlisnumberofreplicas–  RedundantinformaPonspreadoverfewernodes

•  Comparisonofbothmethods–  r=l/mamountofredundancy

•  Prob.blockavailable:

pavail =li

p0

i i − p0( )l− ii=m

l

Page 57: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

Latency:Erasure‐codingvs.replicaPon

  ReplicaPon:slightlylowerlatency  Erasure‐coding:higheravailability

  DHash++useserasure‐codingwithm=7andl=14

Page 58: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

Whataboutmutabledata?

•  Ugh!

•  Differentviews–  Ivy:Createversiontrees[Muthitacharoen,OSDI‘02]

•  Think“distributedversioncontrol”system

•  Globalagreement?– Reachconsensusamongallnodesbelongingtoasuccessorgroups:“distributedagreement”•  Difficult,especiallyatscale

Page 59: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

AnonoverlookedassumpPon:Theunderlayisn’tperfect!

•  AllhaveimplicitassumpPon:fullconnecPvity

•  Non‐transi8veconnec8vity(NTC)notuncommon

B↔C,C↔A,A↔B

•  AthinksCisitssuccessor!

kA B C

X

Page 60: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

Doesnon‐transiPvityexist?•  Gerding/StriblingPlanetLabstudy

– 9%ofallnodetriplesexhibitNTC– AKributedhighextenttoInternet‐2

•  YetNTCisalsotransient– One3hourPlanetLaball‐pair‐pingstrace– 2.9%havepersistentNTC– 2.3%haveintermiKentNTC– 1.3%failonlyforasingle15‐minutesnapshot

•  Level3↔Cogent,butLevel3↔X↔Cogent

•  NTCmoPvatesRONandotheroverlayrouPng!

Page 61: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

NTCproblemfundamental?

S → R A

A → R B

B → R R

RS CBA

TradiPonalrouPng

Page 62: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

NTCproblemfundamental?

•  DHTsimplementgreedyrouPngforscalability

•  Sendermightnotusepath,eventhoughexists:findslocalminimawhenid‐distancerouPng

S → R A

A → R B

B → R R

RS CBA

TradiPonalrouPng

S → R A

A → R C

C → R X

GreedyrouPng

Page 63: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

PotenPalproblems?

•  Invisiblenodes

•  RouPngloops

•  Brokenreturnpaths

•  Inconsistentroots

Page 64: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

IteraPverouPng:Invisiblenodes

RA

S

C

X

B

  Invisiblenodescauselookuptohalt

k

Page 65: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

IteraPverouPng:Invisiblenodes

RA

S

C

X

B

  Invisiblenodescauselookuptohalt

  EnablelookuptoconPnue  TighterPmeoutsvianetworkcoordinates  LookupRPCsinparallel  Unreachablenodecache

X Dk

Page 66: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

Inconsistentroots

R

S

•  Nodesdonotagreewherekeyisassigned:inconsistentviewsofroot

–  Canbecausedbymembershipchanges

–  Alsoduetonon‐transiPveconnecPvity:maypersist!

R’?

S’

X

k

Page 67: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

Inconsistentroots

R

•  Rootreplicates(key,value)amongleafset–  Leafsperiodicallysynchronize–  GetgathersresultsfrommulPpleleafs

•  Notapplicablewhenrequirefastupdate

R’MN

XS

k

Page 68: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

LongertermsoluPon?

•  Routearoundlocalminimawhenpossible

•  Havenodesmaintainlink‐stateofneighbors–  Performone‐hopforwardingifnecessary

S → R A

A → R B

B → R R

RS CBA

TradiPonalrouPng

S → R A

A → R C

C → R X

GreedyrouPng

Page 69: Peer‐to‐peer systems and Distributed Hash Tables (DHTs) · Many New Challenges • Relave to other parallel/distributed systems – Paral failure – Churn – Few guarantees

Summary•  Peer‐to‐peersystems

–  Unstructuredsystems•  Findinghay,performingkeywordsearch

–  Structuredsystems(DHTs)•  Findingneedles,exactmatch

•  Distributedhashtables–  BasedaroundconsistenthashingwithviewsofO(logn)–  Chord,Pastry,CAN,Koorde,Kademlia,Tapestry,Viceroy,…

•  Lotsofsystemsissues–  Heterogeneity,storagemodels,locality,churnmanagement,underlayissues,…

–  DHTs(Kademlia)deployedinwild:Vuzeis1M+acPveusers