Post on 08-Aug-2020
He Xiao, Zhenhua Li, Ennan Zhai, Tianyin Xu
Practical Web-based Delta Sync for Cloud Storage Services
xiaoh16@gmail.com, July 10, 2017, HotStorage '17
Network Traffic is Overwhelming in Cloud Storage
Cloud traffic has a 30% CAGR (compound annual growth rate).
[Figure labels: File Sync; Server; Client; Network Traffic; Users; Vendors]
Delta Sync Improves Network Efficiency
Delta sync is crucial for reducing cloud storage network traffic.
[Figure labels: Delta Sync; 10 MB; 1 B; Delta Data; New File; Old File]
No Web-based Delta Sync
[Table: Delta sync support in nine state-of-the-art cloud storage services]
[Figure labels: Full Sync; 10 MB; Full File; New File; Old File]
Why is web-based delta sync not supported by today's cloud storage services?
Web apps with local storage or log files need web-based delta sync.
The web is the most pervasive and OS-independent cloud storage access method.
Web-based delta sync is essential for cloud storage web clients and web apps.
Contribution
• We quantitatively study why web-based delta sync is not offered by today's cloud storage services.
• We build a practical web-based delta sync solution for cloud storage services.
  • By reversing the traditional delta sync process, we make the overhead affordable at the web client side.
  • By exploiting the locality of users' edits and trading off hash algorithms, we make the computation overhead affordable at the server side.
WebRsync: Implement Delta Sync on Web
• Implement rsync on real cloud storage with native web tech: JavaScript + HTML5 + WebSocket
• rsync is the de facto solution of delta sync in cloud storage
[Figure: architecture. Web Browser: JavaScript implementation of rsync, HTML5 File API, local file system. WebSocket link. Web Server: C implementation of rsync. High-speed internal network. Storage Backend: Aliyun OSS / OpenStack Swift]
WebRsync vs. rsync
[Figure: Sync time of WebRsync vs. rsync]
[Figure: Average client CPU utilization]
Stagnation due to JavaScript's Single-threaded Event Loop Model
StagMeter
    // Print a timestamp every 100 ms.
    setInterval(() => console.log(Date.now()), 100);
    // Print the timestamp of every key point (the start or end of a task).
    function on_start(task) { console.log(task.id, Date.now()); }
    function on_finish(task) { console.log(task.id, Date.now()); }
StagMeter on WebRsync
[Figure: StagMeter output over the sync process (seconds), in three phases]
1. Send metadata; wait for the server
2. Checksum search and comparison
3. Send tokens and literal bytes; wait for the server
High CPU utilization when computing.
Timestamp printing is suspended: the web page is in a stagnation state.
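The StagMeter idea can be sketched as a small analysis over the recorded timestamps: if a timer is scheduled every 100 ms, a gap much larger than 100 ms between consecutive ticks suggests the event loop was blocked by a long-running task. The function name `findStagnations`, the `slackMs` tolerance, and the sample log are illustrative assumptions, not part of the original tool.

```javascript
// Hypothetical helper: given tick timestamps (in ms) recorded by a periodic
// timer, report the intervals in which the timer was starved.
function findStagnations(timestamps, periodMs = 100, slackMs = 50) {
  const stalls = [];
  for (let i = 1; i < timestamps.length; i++) {
    const gap = timestamps[i] - timestamps[i - 1];
    // A gap well beyond the timer period means ticks were suspended.
    if (gap > periodMs + slackMs) {
      stalls.push({ from: timestamps[i - 1], to: timestamps[i], gapMs: gap });
    }
  }
  return stalls;
}

// Made-up log: ticks arrive every 100 ms, except for one 900 ms hole.
const sampleLog = [0, 100, 200, 1100, 1200];
const sampleStalls = findStagnations(sampleLog);
console.log(sampleStalls);
```

The slack value is a judgment call: browser timers are not exact, so a small tolerance avoids flagging ordinary jitter as stagnation.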
WebR2sync: Client-side Optimization, Reverse Computation Process
[Figure: WebRsync message flow between Client and Server]
1. Request for syncing file f'
2. Segmentation and fingerprinting
3. Checksum list of f
4. Searching and comparing
5. Generate tokens and literal bytes
6. Construct new file f
7. ACK
WebR2sync: Client-side Optimization, Reverse Computation Process
• Web Reverse Rsync: reverse the process so that the complicated computation moves from the client to the server.
[Figure: WebR2sync message flow between Client and Server. Labels: Request for syncing file f'; Segmentation and fingerprinting; Checksum list of f; Searching and comparing; Generate tokens and literal bytes; Construct new file f; ACK]
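A minimal sketch of the reversed process, under stated assumptions: the client only segments and fingerprints its new file, while the server performs the expensive search over its old copy. The function names, the tiny block size, the toy weak checksum, the FNV-1a stand-in for a strong hash, and the exact message contents (in particular, that the client sends literal bytes only for unmatched blocks) are all assumptions made for illustration; the slides do not specify them.

```javascript
const BLOCK = 4; // tiny block size so the example is easy to trace

// Toy weak checksum: byte sum modulo 2^16 (recomputed per window here;
// a real implementation would use a rolling checksum).
function weakSum(s) {
  let sum = 0;
  for (let i = 0; i < s.length; i++) sum = (sum + s.charCodeAt(i)) & 0xffff;
  return sum;
}

// FNV-1a (32-bit) as a stand-in for a strong hash such as MD5 or SipHash.
function strongSum(s) {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Client: segment the new file and fingerprint each full block (cheap).
function clientFingerprint(newFile) {
  const list = [];
  for (let off = 0; off + BLOCK <= newFile.length; off += BLOCK) {
    const block = newFile.slice(off, off + BLOCK);
    list.push({ index: off / BLOCK, weak: weakSum(block), strong: strongSum(block) });
  }
  return list;
}

// Server: slide over the old file byte by byte and look for each block.
// Returns, per new-file block index, the matching old-file offset or -1.
function serverSearch(oldFile, checksumList) {
  const table = new Map(); // weak checksum -> candidate entries
  for (const e of checksumList) {
    if (!table.has(e.weak)) table.set(e.weak, []);
    table.get(e.weak).push(e);
  }
  const matchOffsets = new Array(checksumList.length).fill(-1);
  for (let off = 0; off + BLOCK <= oldFile.length; off++) {
    const win = oldFile.slice(off, off + BLOCK);
    const candidates = table.get(weakSum(win)) || [];
    for (const e of candidates) {
      if (matchOffsets[e.index] === -1 && e.strong === strongSum(win)) {
        matchOffsets[e.index] = off;
      }
    }
  }
  return matchOffsets;
}

// Client: send literal bytes only for blocks the server could not match,
// plus any trailing partial block.
function clientLiterals(newFile, matchOffsets) {
  const literals = {};
  matchOffsets.forEach((off, i) => {
    if (off === -1) literals[i] = newFile.slice(i * BLOCK, (i + 1) * BLOCK);
  });
  return { literals, tail: newFile.slice(matchOffsets.length * BLOCK) };
}

// Server: rebuild the new file from its old copy plus the literals.
function serverConstruct(oldFile, matchOffsets, { literals, tail }) {
  let out = "";
  matchOffsets.forEach((off, i) => {
    out += off === -1 ? literals[i] : oldFile.slice(off, off + BLOCK);
  });
  return out + tail;
}

const oldFile = "the quick brown fox jumps";
const newFile = "the quick red fox jumps over";
const checksumList = clientFingerprint(newFile);
const matchOffsets = serverSearch(oldFile, checksumList);
const rebuilt = serverConstruct(oldFile, matchOffsets, clientLiterals(newFile, matchOffsets));
console.log(rebuilt === newFile);
```

In this arrangement the byte-by-byte search runs on the server, which is consistent with the later observation that the server takes on heavy overhead.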
Performance of WebR2sync
[Figure: two plots of sync time (seconds) vs. edit size (bytes)]
Issue: the server takes on severely heavy overhead.
Server-side Overhead Profiling
[Figure labels: MD5 Computing; Checksum Search]
Checksum searching and block comparison occupy 80% of the computing time.
• Use faster hash functions to replace MD5
• Reduce checksum searching overhead
Replacing MD5 with SipHash in Chunk Comparison
Table: A comparison of pseudorandom hash functions
Hash Function   Collision Probability   Cycles per Byte
MD5             Low                     5.58
Murmur3         High                    0.33
Spooky          High                    0.14
SipHash         Low                     1.13
SipHash keeps a low collision probability at a much faster speed than MD5 (1.13 vs. 5.58 cycles per byte, roughly 4.9 times fewer).
Solve Possible Hash Collision
• Replacing MD5 with SipHash may cause collisions (with probability p), but so may MD5.
• Our solution: use Spooky (the fastest method, with collision probability p').
  • The probability of collisions is p * p'.
• Alternative: use MD5 or another strong hash function as a global verification.
  • Computing MD5 over the whole file is expensive.
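One way to read the p * p' figure, assuming the Spooky check is applied in addition to SipHash and that collisions of the two functions are independent events (both are assumptions; the slides state only the product):

```latex
\[
P(\text{undetected collision})
  = P(\text{SipHash collides}) \cdot P(\text{Spooky collides})
  = p \cdot p'
\]
```

For example, with purely illustrative values p = 10^{-9} and p' = 10^{-6}, the combined probability would be 10^{-15}, far smaller than either one alone.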
Reduce Chunk Searching by Exploiting Locality of File Edits
95% of synchronized files have fewer than 10 edits.
[Figure: hash table of per-block checksums (Adler32-1 to Adler32-4, MD5-1 to MD5-4) for Block 1 to Block 4, with checksum search and compare steps]
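One plausible reading of the locality optimization, sketched under assumptions: because most files carry only a few edits, a chunk that follows a matched chunk will usually match the next block in sequence, so that block can be compared directly before falling back to a hash-table lookup. The function name `matchWithLocality`, its inputs (lists of already computed checksums), and the lookup counter are hypothetical and serve only to illustrate the saving.

```javascript
// chunkSums: checksums met while scanning one file, in order.
// blockSums: the checksum list of the other file's blocks, in order.
function matchWithLocality(chunkSums, blockSums) {
  const table = new Map(); // checksum -> first block index
  blockSums.forEach((sum, idx) => {
    if (!table.has(sum)) table.set(sum, idx);
  });

  const matches = [];
  let expected = 0;      // block we expect to see next, given the last match
  let tableLookups = 0;  // how often the hash table had to be consulted

  for (const sum of chunkSums) {
    let idx;
    if (expected < blockSums.length && blockSums[expected] === sum) {
      idx = expected; // locality hit: no hash-table search needed
    } else {
      tableLookups++;
      idx = table.has(sum) ? table.get(sum) : -1;
    }
    matches.push(idx);
    if (idx !== -1) expected = idx + 1;
  }
  return { matches, tableLookups };
}

// One edited chunk ("X") in an otherwise unchanged four-block file.
const localityDemo = matchWithLocality(["A", "B", "X", "D"], ["A", "B", "C", "D"]);
console.log(localityDemo);
```

With few edits, most chunks take the direct-comparison path, so the hash table is consulted only around the edited regions.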
Evaluation Setup
[Figure: Basic experiment setup visualized in a map of China]
Sync Time
[Figure: sync time (seconds, log scale from 10^-1 to 10^1) vs. edit size (1 B to 100 KB) for WebRsync, WebR2sync, WebR2sync+, and rsync]
WebR2sync+ is 2-3 times faster than WebR2sync and 15-20 times faster than WebRsync.
Throughput
[Figure: NoWebRsync, WebRsync, WebR2sync, WebR2sync+, and rsync compared; x-axis: number of concurrent users (0 to 8000)]
The throughput of WebR2sync+ is 4 times that of WebR2sync/rsync and 9 times that of NoWebRsync.
Future Work
• Evaluate our approach under different edit modes
  • delete, insert, append
• Evaluate traffic efficiency
  • all the methods should have similar traffic efficiency
• Understand the effects of the three optimizations
  • evaluate them separately
Discussion
• Probability of collisions of file checksums
• Characteristics of file operations in real-world scenarios from the perspective of sync
• Locality measure for deciding whether to apply the locality-based optimization
Conclusion
• WebR2sync+ is a practical solution for web-based delta sync
  • lightweight computation at the client side
  • optimized overhead at the server side
  • the server-side optimizations can be adopted in the traditional cloud storage architecture
Thanks! Discussion
WebRsync Detailed Description
[Figure: flowchart between Client and Server. Each block (Block 1, Block 2, Block 3, ...) has an Adler32 checksum and an MD5 checksum. Weak checksum search is followed by strong checksum compare; YES advances by 1 block offset, NO advances by 1 byte offset. Outputs: matched tokens and literal bytes, used to construct the new file]
Rolling Adler32 is O(1): Adler(i) => Adler(i+1)
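The O(1) rolling step can be sketched as follows: the checksum of the window starting at byte i+1 is derived from the checksum at byte i by removing the outgoing byte and adding the incoming one, without rescanning the window. This is a minimal sketch using the standard Adler-32 definition (modulus 65521); the function names are illustrative, and the deck's actual implementation may differ in details such as the modulus.

```javascript
const MOD = 65521; // largest prime below 2^16, as in Adler-32

// Direct computation over data[start, start + len): O(len).
function adler32(data, start, len) {
  let a = 1;
  let b = 0;
  for (let i = start; i < start + len; i++) {
    a = (a + data[i]) % MOD;
    b = (b + a) % MOD;
  }
  return { a, b };
}

// Rolling update: slide the window one byte to the right in O(1).
// outByte leaves the window on the left, inByte enters on the right.
function roll(state, len, outByte, inByte) {
  const a = (state.a - outByte + inByte + MOD) % MOD;
  // b loses len copies of outByte plus the constant 1, and gains the new a.
  const b = (((state.b - len * outByte - 1 + a) % MOD) + MOD) % MOD;
  return { a, b };
}

// Pack the two 16-bit halves into one 32-bit value.
function digest(state) {
  return state.b * 65536 + state.a;
}

// Slide a 4-byte window across a short buffer.
const buf = [10, 20, 30, 40, 50, 60];
let state = adler32(buf, 0, 4);
for (let i = 0; i + 4 < buf.length; i++) {
  state = roll(state, 4, buf[i], buf[i + 4]);
  console.log(digest(state) === digest(adler32(buf, i + 1, 4)));
}
```

The rolling form is what makes a byte-by-byte search affordable: each 1-byte shift costs a few arithmetic operations instead of a full pass over the block.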
WebR2sync: Flowchart and Data Structure
[Figure: flowchart between Client and Server. Labels: Construct new files; Weak checksum search; Strong checksum compare; YES; NO; 1 byte offset; No further operation; Block 1 to Block 4 (shown twice)]
When a match is found, record the associated index.
Sync Time Decomposed
[Figure: sync time (seconds, 0 to 0.2) vs. edit size (1 B to 100 KB), broken down into server, network, and client time]
The WebR2sync+ client takes a stable and shorter time. Because of the server-side optimization, computing time is much shorter on both the client and the server.