Synchronizing Clocks in a Datacenter to 10s of Nanoseconds Talks/retreat-2017/Yilong...

Post on 07-Jul-2020



Yilong Geng, Shiyu Liu, Zi Yin, Balaji Prabhakar, Mendel Rosenblum
Stanford University

in collaboration with Ashish Naik, Amin Vahdat
Google Inc.

Synchronizing Clocks in a Datacenter to 10s of Nanoseconds

Background

Clock synchronization is useful
• Boosts performance of databases by maintaining causality and external consistency
• Enables scheduling tasks and resources with precise timing
• Gets rid of the "clockless" assumption in building distributed systems, enables brand-new systems and applications

Clock synchronization is very challenging
• PTP and PPS require compatible hardware and are expensive
• Many PTP-compatible switches still perform poorly under load

Syncing clocks with probes

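[Figure: two probes exchanged between A and B along a common time axis $t$, with timestamps $TX_A$, $RX_B$ for the probe from A to B and $TX_B$, $RX_A$ for the probe from B to A]

$t_A = t + \Delta t_A$, $t_B = t + \Delta t_B$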

Probe from A to B:
• Receive time = transmit time + delay
• $RX_B - \Delta t_B = TX_A - \Delta t_A + \text{propagation and queueing delay}$
• $\Delta t_B - \Delta t_A = RX_B - TX_A - \text{propagation and queueing delay}$
• $\Delta t_B - \Delta t_A < RX_B - TX_A$

Probe from B to A:
• $\Delta t_B - \Delta t_A > TX_B - RX_A$

Each probe is a bound on the clock
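
The two inequalities above can be turned into a small calculation. The following is a minimal sketch, not the authors' implementation: the `Probe` type and `offset_bounds` function are hypothetical names, and it assumes the offset between the two clocks stays constant over the window of probes (no drift).

```python
from dataclasses import dataclass


@dataclass
class Probe:
    tx: float  # transmit timestamp, read from the sender's clock (us)
    rx: float  # receive timestamp, read from the receiver's clock (us)


def offset_bounds(a_to_b, b_to_a):
    """Return (lower, upper) bounds on dt_B - dt_A.

    Assumes a constant offset over the window (no clock drift).
    """
    # Each A->B probe gives dt_B - dt_A < RX_B - TX_A; keep the tightest.
    upper = min(p.rx - p.tx for p in a_to_b)
    # Each B->A probe gives dt_B - dt_A > TX_B - RX_A; keep the tightest.
    lower = max(p.tx - p.rx for p in b_to_a)
    return lower, upper


# Made-up timestamps generated with a true offset dt_B - dt_A of 5 us.
a_to_b = [Probe(tx=100.0, rx=107.0), Probe(tx=200.0, rx=206.2)]
b_to_a = [Probe(tx=305.0, rx=301.5), Probe(tx=405.0, rx=401.0)]
lower, upper = offset_bounds(a_to_b, b_to_a)
print(lower, upper)
```

Probes that see less delay give tighter bounds, which is why the minimum and maximum are taken over the window.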

Clock bounds over time

[Figure: bounds on $\Delta t_B - \Delta t_A$ ($\mu s$) plotted against $t_A$ (sec)]

Syncing clocks with SVMs

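Coded probes

[Figure: a pair of probe packets sent 10 us apart crosses the network; the spacing at the receiver is compared with 10 us]
• Spacing >> 10 us: second packet delayed more
• Spacing << 10 us: first packet delayed more
• Spacing ~ 10 us: likely no queueing delay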
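
The coded-probe check can be sketched as a filter on the receiver side. This is an illustrative sketch only: the function name `keep_coded_pairs` and the `tolerance_us` threshold are assumptions, not values from the talk; only the 10 us transmit spacing comes from the slide.

```python
def keep_coded_pairs(pairs, sent_gap_us=10.0, tolerance_us=0.5):
    """Keep probe pairs whose receive spacing matches the transmit spacing.

    pairs: list of (rx_first_us, rx_second_us) receiver timestamps.
    tolerance_us is a hypothetical threshold chosen for illustration.
    """
    kept = []
    for rx_first, rx_second in pairs:
        received_gap = rx_second - rx_first
        # A gap far from sent_gap_us means one of the two packets was
        # queued longer than the other, so the pair is discarded.
        if abs(received_gap - sent_gap_us) <= tolerance_us:
            kept.append((rx_first, rx_second))
    return kept


pairs = [
    (100.0, 110.1),  # ~10 us apart: likely no queueing delay
    (200.0, 225.0),  # >> 10 us: second packet delayed more
    (300.0, 303.0),  # << 10 us: first packet delayed more
]
clean = keep_coded_pairs(pairs)
print(clean)
```

Only the pairs that pass the filter would then contribute bounds on the clock offset.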

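The network effect: exposing the error

[Figure: three nodes A, B, C in a loop]
• A: If my clock is at 10, B's clock must be at 10:15
• B: If my clock is at 10:15, C's clock must be at 10:05
• C: If my clock is at 10:05, A's clock must be at 9:50
• Guys, we are off by 10 minutes!

[Figure: candidate per-edge corrections around the loop: 2? 2? 6? and -10? 5? 15?, then 3.3, 3.3, 3.3]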
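
The loop argument can be checked numerically. The sketch below is an assumption-laden illustration rather than the authors' algorithm: `correct_loop` is a hypothetical helper that spreads the loop surplus evenly over the edges, which matches the 3.3 values on the slide and is the correction with the smallest sum of squares.

```python
def correct_loop(offsets):
    """Spread the loop surplus evenly so the offsets sum to zero.

    offsets: pairwise clock offset estimates around a closed loop,
    e.g. A->B, B->C, C->A. With perfect estimates they sum to zero.
    """
    surplus = sum(offsets)          # the error exposed by the loop
    share = surplus / len(offsets)  # even split across the edges
    return [o - share for o in offsets]


# The slide's example, in minutes: A->B is +15, B->C is -10, C->A is -15.
loop = [15.0, -10.0, -15.0]
surplus = sum(loop)
corrected = correct_loop(loop)
print(surplus, corrected)
```

Here the loop sums to -10 minutes instead of zero, so each edge is adjusted by about 3.3 minutes.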

NetFPGA verification

• Single NetFPGA acts as 4 independent NICs sharing the same clock
• Different NetFPGAs synced with I/O pins

                                     Single NetFPGA                           Different NetFPGAs
                                     w/o network effect   w/ network effect   w/o network effect   w/ network effect
Mean of abs. error (ns)              40.0                 11.0                38.6                 13.6
99th percentile of abs. error (ns)   94.3                 22.7                89.6                 29.4

10-minute experiment at 40% network load with K=10

Robust to high network load

[Figure: mean and 99th percentile error (ns, axis 0 to 40) versus network load (%, axis 0 to 80)]

Synchronization error stays under 40 ns at 80% load

Summary

Probe-based clock synchronization
• Works with simple switches
• Only needs widely used timestamping-capable NICs

3 key ideas to achieve nanosecond precision
• Support vector machines
• Coded probes
• Network effect

Distributed implementation
• Lightweight: ~5 Mb/s bandwidth and very small CPU overhead
• Runs in real time