A study of delta sync and other optimisations in HTTP/WebDAV synchronisation protocols
Do we need changes in the ownCloud protocol?
Wojciech Jarosz
AGH University of Science and Technology / CERN
CS3 Zurich, January 2016
Introduction
• ownCloud protocol, CERNBox service
• Enhancing the current protocol
• Investigation of the following enhancements:
  o Bundling
  o Delta-syncing
  o Compression
  o Chunk size adjustment
• Context: scientific environment at CERN
Introduction
• Data from CERNBox FS and network logs
Analysis → Simulation → Proposed implementation → Decision
CERNBox
• Distinguishing features:
  o Integrated with 80PB of physics data
  o Future: easy and effective sharing of experiment results
  o Future: focus on scientific usage
  o Currently: a mix of scientific and personal use
CERNBox as of Oct 15
• ~31 TB of data
• ~3700 users
• ~24 million files in ~3 million directories
• Average file size: ~1.3 MB; median file size: < 100 kB
• 200k file uploads / downloads per day
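The ~1.3 MB average can be cross-checked against the totals above. A small sketch, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and treating the rounded figures as exact; the function names are only illustrative:

```python
# Back-of-the-envelope check of the CERNBox snapshot figures.
# Assumption: decimal units and the rounded slide values taken as exact.
TOTAL_BYTES = 31 * 10**12      # ~31 TB of data
FILE_COUNT = 24 * 10**6        # ~24 million files
TRANSFERS_PER_DAY = 200_000    # ~200k uploads / downloads per day


def mean_file_size_mb(total_bytes: int, file_count: int) -> float:
    """Arithmetic mean file size in decimal megabytes."""
    return total_bytes / file_count / 10**6


def transfers_per_second(per_day: int) -> float:
    """Average request rate if transfers were spread evenly over 24 hours."""
    return per_day / (24 * 60 * 60)


if __name__ == "__main__":
    print(f"mean file size ~ {mean_file_size_mb(TOTAL_BYTES, FILE_COUNT):.2f} MB")
    print(f"average rate   ~ {transfers_per_second(TRANSFERS_PER_DAY):.1f} transfers/s")
```

A mean of ~1.29 MB against a median below 100 kB means the distribution is strongly right-skewed: a minority of large files carries most of the volume, while most requests move small files.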
File sizes
[Figure: Files by size. Number of files per size range: 0 B, 1-9 B, 10-99 B, 100-1000 B, 1-10 kB, 10-100 kB, 100 kB-1 MB, 1-10 MB, 10-100 MB, 100 MB-1 GB, over 1 GB; y-axis from 0 to 8,000,000 files]
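Reproducing a histogram like the one above from a file-system listing requires mapping each file size to one of the figure's ranges. A minimal sketch, assuming decimal units and ranges that include their lower bound and exclude their upper bound; `size_range` and `size_histogram` are hypothetical helpers:

```python
from collections import Counter
from collections.abc import Iterable

# Range labels from the "Files by size" figure, indexed by digit count - 1
# (1-9 B -> 0, 10-99 B -> 1, ...). Assumption: 1 kB = 1000 B, half-open ranges.
RANGE_LABELS = [
    "1-9 B", "10-99 B", "100-1000 B",
    "1-10 kB", "10-100 kB", "100 kB-1 MB",
    "1-10 MB", "10-100 MB", "100 MB-1 GB",
]


def size_range(size: int) -> str:
    """Return the figure's size-range label for a file of `size` bytes."""
    if size < 0:
        raise ValueError("file size cannot be negative")
    if size == 0:
        return "0 B"
    decade = len(str(size)) - 1   # integer digit count avoids float log10 rounding
    if decade >= len(RANGE_LABELS):
        return "over 1 GB"
    return RANGE_LABELS[decade]


def size_histogram(sizes: Iterable[int]) -> Counter:
    """Count files per size range."""
    return Counter(size_range(s) for s in sizes)
```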
File count and size
[Figure: File count and total size (GB) per extension: null (no extension), png, pdf, dat, jpg, svn, root, txt, eps, h, npy, c, html, log, gif, xml, olk14, message, tex, o, f; y-axis from 0 to 7,000,000]
Where are the transfers coming from?
[Figure: Transfers by origin: CERN, Universities / Institutions, Others]
Downloads vs Uploads
[Figure: GETs vs PUTs, split 44% / 56% (legend: PUT, GET)]
Protocol - chunking
• Could be used for:
  o partial upload
  o delta-sync
  o deduplication
• Is the chunk size chosen correctly?
  o Most of the files are small
  o Modern protocols should use network-aware chunking
• Currently only ~0.15% of all PUTs are chunked
• Is dynamic chunking a viable option?
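One way to make chunking network-aware is to size each chunk from the throughput measured on the previous one. A sketch under hypothetical bounds (1 MiB to 100 MiB) and a hypothetical target of about 10 seconds per chunk; none of these values come from the ownCloud client, and the function names are illustrative:

```python
MiB = 1024 * 1024

# Hypothetical bounds and target; the slides only ask whether dynamic chunking is viable.
MIN_CHUNK = 1 * MiB
MAX_CHUNK = 100 * MiB
TARGET_SECONDS = 10.0


def next_chunk_size(last_bytes: int, last_elapsed_s: float,
                    target_s: float = TARGET_SECONDS,
                    lo: int = MIN_CHUNK, hi: int = MAX_CHUNK) -> int:
    """Size the next chunk so it should take about target_s at the last measured rate."""
    if last_elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    throughput = last_bytes / last_elapsed_s          # bytes per second
    return max(lo, min(hi, int(throughput * target_s)))


def plan_chunks(file_size: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split a file into (offset, length) pairs of at most chunk_size bytes."""
    return [(off, min(chunk_size, file_size - off))
            for off in range(0, file_size, chunk_size)]


if __name__ == "__main__":
    # A 4 MiB chunk that took 8 s (0.5 MiB/s) suggests 5 MiB for the next one.
    print(next_chunk_size(4 * MiB, 8.0) // MiB)
    print(plan_chunks(25, 10))
```

Re-evaluating after every chunk lets a slow link keep chunks short, so a failed chunk costs little to retry, while a fast link inside CERN converges towards large chunks.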
Enhancements to the current ownCloud protocol
Focus on bundling, delta-sync and compression
Bundling
• Typically users are active only a few days a month
[Figure: Sample user transfers count over time (2/15/2015 to 10/23/2015); y-axis from 0 to 200,000]
Bundling
• Even power users work in cycles
[Figure: Power user file transfers over time (3/1/2015 to 9/17/2015); y-axis from 0 to 25,000]
Bundling
• Typically users are active only a few days a month
• Often over 2000 requests in 10 minutes
• Small file size
Implementation?
• Simple bundling: tarball?
• Choose the right bundle size
• Send chunks in parallel
• Error reporting
[Diagram: tar → untar]
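The "simple bundling" idea can be sketched with an uncompressed tar archive built in memory: small files are grouped greedily up to a size cap, packed on the client and unpacked on the server. This sketch assumes Python's standard `tarfile` module; the 4 MiB cap and the function names are hypothetical, and parallel sending and error reporting are left out:

```python
import io
import tarfile

# Hypothetical payload cap per bundle; "the right bundle size" is still open.
MAX_BUNDLE_BYTES = 4 * 1024 * 1024


def group_into_bundles(files: dict[str, bytes],
                       max_bytes: int = MAX_BUNDLE_BYTES) -> list[list[str]]:
    """Greedily group file names so each bundle's payload stays within max_bytes.
    A single file larger than max_bytes ends up in a bundle of its own."""
    bundles: list[list[str]] = []
    current: list[str] = []
    current_size = 0
    for name, data in files.items():
        if current and current_size + len(data) > max_bytes:
            bundles.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += len(data)
    if current:
        bundles.append(current)
    return bundles


def make_tarball(files: dict[str, bytes], names: list[str]) -> bytes:
    """Client side: pack the selected files into an uncompressed in-memory tar."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def untar(blob: bytes) -> dict[str, bytes]:
    """Server side: unpack a bundle back into name -> content.
    A real server must also reject unsafe member names (absolute paths, '..')."""
    out: dict[str, bytes] = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        for member in tar.getmembers():
            if member.isfile():
                out[member.name] = tar.extractfile(member).read()
    return out
```

Each bundle replaces one PUT per file with a single request, so a burst of 2000 small uploads in 10 minutes turns into far fewer, larger transfers, which could then be sent in parallel.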
Bundling
• Reduce TCP slow-start effect

DROPBOX [1]          Before bundling    After bundling
Median flow size     16.2 kB            42.4 kB
Throughput PUT       358 kbit/s         552.92 kbit/s
Throughput GET       783 kbit/s         1294 kbit/s

CERNBOX*             Before bundling    After bundling
Throughput PUT       ~3600 kbit/s       Up to 400 Mbit/s ?
Throughput GET       ~7653 kbit/s       Up to 500 Mbit/s ?

[1] I. Drago, M. Mellia, M. M. Munafò, A. Sperotto, R. Sadre, and A. Pras. Inside Dropbox: Understanding Personal Cloud Storage Services. In Proceedings of the 12th ACM Internet Measurement Conference (IMC '12), pages 481-494, 2012.
* Based on users inside CERN and affiliated institutions
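Short flows finish while TCP is still in slow start, so they never reach the available bandwidth, which is one reason bundling helps. The following is a simplified model under stated assumptions: no packet loss, a 1460-byte MSS, an initial window of 10 segments (RFC 6928), and a window that doubles every round trip, with handshakes ignored. Its numbers are illustrative, not measurements:

```python
import math

# Idealised slow-start model (assumptions, not from the slides): no loss,
# window starts at INIT_CWND segments and doubles every RTT, handshakes ignored.
MSS = 1460       # bytes per segment, typical for Ethernet
INIT_CWND = 10   # initial congestion window in segments (RFC 6928)


def slow_start_rounds(flow_bytes: int, mss: int = MSS, init_cwnd: int = INIT_CWND) -> int:
    """Round trips needed to deliver flow_bytes while still in slow start."""
    segments = math.ceil(flow_bytes / mss)
    rounds, sent, cwnd = 0, 0, init_cwnd
    while sent < segments:
        sent += cwnd      # one full window per round trip
        cwnd *= 2         # slow start doubles the window each RTT
        rounds += 1
    return rounds


def effective_kbit_s(flow_bytes: int, rtt_s: float) -> float:
    """Goodput in kbit/s for a flow that never leaves slow start."""
    rounds = slow_start_rounds(flow_bytes)
    if rounds == 0:
        return 0.0
    return flow_bytes * 8 / (rounds * rtt_s) / 1000


if __name__ == "__main__":
    for size in (16_200, 42_400):   # Dropbox median flow sizes before / after bundling
        print(size, slow_start_rounds(size), round(effective_kbit_s(size, 0.1)))
```

Under these assumptions both median flow sizes need two round trips. The bundled flow therefore moves about 2.6 times more data in the same time: on a 100 ms round trip, the model gives about 648 vs 1696 kbit/s.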
Extensions and file sizes
[Figure: File count and total size (GB) for the top 20 extensions by size: root, null, jpg, mp4, pdf, mov, enc, avi, zip, mp3, gz, img, epio, pptx, 1, wav, txt, png, iso, nef; y-axis from 0 to 7,000,000; annotated "?"]
Delta-sync
• About 7.8% of the files are versions
• Typically files are modified on the same day
• Usually small files
[Figure: Count and size per extension: root, mov, pptx, pdf, mp4, zip, h5, key, bz2, jpg, tc, null, vdi, gz, epio, pxp, hep, tgz, f4v; y-axis from 0 to 6,000,000,000]
ROOT files
• Scientific software framework
• Complex file structure
• Already compressed
• Small changes scattered throughout the file
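Scattered changes make chunk-based delta-sync hard for ROOT files, because a chunk must be re-sent if any byte in it changed. The sketch below counts dirty fixed-size chunks for a given set of changed offsets. The 100 MB file and the 50 random one-byte edits in the demo are hypothetical:

```python
import random


def dirty_chunks(changed_offsets: list[int], chunk_size: int) -> set[int]:
    """Indices of fixed-size chunks containing at least one changed byte."""
    return {off // chunk_size for off in changed_offsets}


def dirty_fraction(file_size: int, changed_offsets: list[int], chunk_size: int) -> float:
    """Fraction of a file's chunks that a fixed-chunk delta-sync would re-send."""
    n_chunks = -(-file_size // chunk_size)   # ceiling division
    return len(dirty_chunks(changed_offsets, chunk_size)) / n_chunks


if __name__ == "__main__":
    rng = random.Random(0)
    size = 100 * 10**6                                 # hypothetical 100 MB ROOT file
    edits = [rng.randrange(size) for _ in range(50)]   # 50 scattered one-byte changes
    for chunk in (10 * 10**6, 1 * 10**6, 64 * 1024):
        print(f"chunk {chunk:>10} B -> {dirty_fraction(size, edits, chunk):.1%} re-sent")
```

With 10 MB chunks, a given chunk escapes all 50 random edits with probability 0.9^50 (about 0.5%), so virtually the whole file is re-sent. Only much smaller chunks recover a real saving, and they cost more per-chunk metadata.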
Delta-sync
• Possible implementations:
  o Chunk-based
  o Byte-range request
• More data and simulation needed
• It might not be worth implementing
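The two options can be combined. The client compares per-chunk digests of the old and new version, then turns the changed chunks into byte ranges. The sketch below builds an HTTP `Range` value (RFC 7233 syntax, inclusive byte positions), which covers the download direction. Uploading only the changed ranges would need server-side support in the sync protocol. SHA-256 digests, the 1 MiB granularity and the function names are assumptions:

```python
import hashlib

CHUNK_SIZE = 1024 * 1024   # hypothetical 1 MiB comparison granularity


def chunk_digests(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """SHA-256 digest of every fixed-size chunk (the last chunk may be shorter)."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def changed_chunks(old: list[str], new: list[str]) -> list[int]:
    """Indices of new-version chunks whose digest differs from, or is missing in, the old one."""
    return [i for i, d in enumerate(new) if i >= len(old) or old[i] != d]


def range_header(indices: list[int], file_size: int, chunk_size: int = CHUNK_SIZE) -> str:
    """Coalesce adjacent chunks into one Range value; empty string if nothing changed."""
    spans: list[list[int]] = []
    for i in sorted(indices):
        start = i * chunk_size
        end = min(start + chunk_size, file_size) - 1   # inclusive last byte
        if spans and spans[-1][1] + 1 == start:
            spans[-1][1] = end                         # extend the previous span
        else:
            spans.append([start, end])
    if not spans:
        return ""
    return "bytes=" + ",".join(f"{s}-{e}" for s, e in spans)
```

Coalescing adjacent chunks keeps the number of ranges, and so the per-range overhead, small.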
Compression
• Of the top 20 extensions (by size), only .txt is likely to compress well
• Compression can be slow, but almost all requests are executed from desktop clients
[Figure: File count and total size (GB) for the top 20 extensions: root, null, jpg, mp4, pdf, mov, enc, avi, zip, mp3, gz, img, epio, pptx, 1, wav, txt, png, iso, nef; y-axis from 0 to 7,000,000]
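Because most of the volume is in formats that are already compressed, a client could compress a small prefix first and skip files that barely shrink, rather than compressing every upload. A sketch using Python's `zlib`. The 64 KiB sample, the 0.9 ratio threshold and the function names are hypothetical, and random bytes stand in for already-compressed payloads:

```python
import random
import zlib

# Hypothetical tuning values, not taken from any ownCloud client.
SAMPLE_BYTES = 64 * 1024   # probe only the first 64 KiB of a file
MAX_RATIO = 0.9            # compress only if the sample shrinks by more than 10%


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Compressed size divided by original size; values near 1 mean no gain."""
    if not data:
        return 1.0
    return len(zlib.compress(data, level)) / len(data)


def should_compress(data: bytes, sample: int = SAMPLE_BYTES,
                    max_ratio: float = MAX_RATIO) -> bool:
    """Cheap client-side check: compress a prefix and decide from its ratio."""
    return compression_ratio(data[:sample]) < max_ratio


if __name__ == "__main__":
    text = b"run,event,energy_gev\n1,42,13.6\n" * 5000   # text-like, repetitive
    noise = random.Random(0).randbytes(200_000)        # stands in for jpg/mp4/zip payloads
    print("text  ->", should_compress(text))
    print("noise ->", should_compress(noise))
```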
Future
Future - service
• CERNBox fully exposed to a very large scientific repository (ATLAS, LHCb, CMS…)
• FUSE mount of the underlying CERNBox storage available everywhere at CERN
• Will users use CERNBox in new ways?
Conclusion
• The ownCloud protocol is simple, but is it enough?
• Understand before implementation
• Work in progress!
• MSc at AGH
Analysis → Simulation → Proposed implementation → Decision
Conclusion
• Bundling looks like the most viable enhancement
• Further research is needed for delta-sync and dynamic chunking
• Compression is less likely to enhance the current protocol
Contact details
Wojciech Jarosz
[email protected] +41 22 76 75970
Opinions / questions most welcome!
• How does the usage compare to your system?
• How should the new features be implemented?
• Feedback, ideas, comments…