Volley: Automated Data Placement for Geo-Distributed Cloud Services
![Page 1: Volley: Automated Data Placement for Geo-Distributed Cloud Services](https://reader036.fdocuments.us/reader036/viewer/2022081604/568165c4550346895dd8cfc6/html5/thumbnails/1.jpg)
Volley: Automated Data Placement
for Geo-Distributed Cloud Services
![Page 2: Volley: Automated Data Placement for Geo-Distributed Cloud Services](https://reader036.fdocuments.us/reader036/viewer/2022081604/568165c4550346895dd8cfc6/html5/thumbnails/2.jpg)
![Page 3: Volley: Automated Data Placement for Geo-Distributed Cloud Services](https://reader036.fdocuments.us/reader036/viewer/2022081604/568165c4550346895dd8cfc6/html5/thumbnails/3.jpg)
Why is data placement important?
Minimize latency
Eliminate redundant cost
Optimize data center utilization
• Users want lower latency
• Cloud service operators want to limit cost
• Data must be partitioned across DCs
![Page 4: Volley: Automated Data Placement for Geo-Distributed Cloud Services](https://reader036.fdocuments.us/reader036/viewer/2022081604/568165c4550346895dd8cfc6/html5/thumbnails/4.jpg)
Live Messenger and Live Mesh
• Traces cover all users and devices that accessed these services over an entire month
• Clients are identified by application-level unique identifiers
Commercial cloud service trace analysis
![Page 5: Volley: Automated Data Placement for Geo-Distributed Cloud Services](https://reader036.fdocuments.us/reader036/viewer/2022081604/568165c4550346895dd8cfc6/html5/thumbnails/5.jpg)
Challenges of data placement
Geographic Diversity
Data Sharing
Data Inter-Dependency
Data Center Capacity
Client Mobility
![Page 6: Volley: Automated Data Placement for Geo-Distributed Cloud Services](https://reader036.fdocuments.us/reader036/viewer/2022081604/568165c4550346895dd8cfc6/html5/thumbnails/6.jpg)
Challenge: Geographic Diversity
![Page 7: Volley: Automated Data Placement for Geo-Distributed Cloud Services](https://reader036.fdocuments.us/reader036/viewer/2022081604/568165c4550346895dd8cfc6/html5/thumbnails/7.jpg)
Challenge: Data Sharing
![Page 8: Volley: Automated Data Placement for Geo-Distributed Cloud Services](https://reader036.fdocuments.us/reader036/viewer/2022081604/568165c4550346895dd8cfc6/html5/thumbnails/8.jpg)
Data inter-dependency in Live Mesh
Challenge: Data Inter-Dependency
![Page 9: Volley: Automated Data Placement for Geo-Distributed Cloud Services](https://reader036.fdocuments.us/reader036/viewer/2022081604/568165c4550346895dd8cfc6/html5/thumbnails/9.jpg)
The rush in industry to build additional datacenters is motivated in part by individual datacenters reaching their capacity limits as new users are added. This in turn requires automatic mechanisms that rapidly migrate application data to new datacenters to take advantage of their capacity.
Challenge: Datacenter Capacity
![Page 10: Volley: Automated Data Placement for Geo-Distributed Cloud Services](https://reader036.fdocuments.us/reader036/viewer/2022081604/568165c4550346895dd8cfc6/html5/thumbnails/10.jpg)
Challenge: User Mobility
![Page 11: Volley: Automated Data Placement for Geo-Distributed Cloud Services](https://reader036.fdocuments.us/reader036/viewer/2022081604/568165c4550346895dd8cfc6/html5/thumbnails/11.jpg)
Proven algorithms do not apply to this problem
![Page 12: Volley: Automated Data Placement for Geo-Distributed Cloud Services](https://reader036.fdocuments.us/reader036/viewer/2022081604/568165c4550346895dd8cfc6/html5/thumbnails/12.jpg)
Volley
![Page 13: Volley: Automated Data Placement for Geo-Distributed Cloud Services](https://reader036.fdocuments.us/reader036/viewer/2022081604/568165c4550346895dd8cfc6/html5/thumbnails/13.jpg)
Three phases
Volley Algorithm
Compute Initial Placement
Iteratively Move Data to Reduce Latency
Iteratively Collapse Data to Datacenters
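The third phase, collapsing data onto real datacenters, can be illustrated with a small sketch. This is a simplified single-pass greedy variant, not the algorithm from the paper: it assumes each item already has a refined location, that capacity is counted in items per datacenter, and that the names `collapse_to_datacenters`, `items`, `datacenters`, and `capacity` are hypothetical.

```python
import math


def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def collapse_to_datacenters(items, datacenters, capacity):
    """Assign each item to the closest datacenter that still has room.

    items:       {item_id: ((lat, lon), access_count)}  (assumed input shape)
    datacenters: {dc_name: (lat, lon)}
    capacity:    {dc_name: maximum number of items}     (assumed unit)
    """
    placement = {dc: [] for dc in datacenters}
    # Most-accessed items choose first, so less-accessed items are the
    # ones pushed to a farther datacenter when the closest one is full.
    by_popularity = sorted(items.items(), key=lambda kv: -kv[1][1])
    for item_id, (location, _count) in by_popularity:
        ranked = sorted(datacenters,
                        key=lambda dc: great_circle_km(location, datacenters[dc]))
        for dc in ranked:
            if len(placement[dc]) < capacity[dc]:
                placement[dc].append(item_id)
                break
        else:
            raise ValueError("total datacenter capacity is too small")
    return placement
```

Letting popular items pick first is one possible tie-breaking choice; it keeps the items that generate the most requests closest to their preferred location.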
![Page 14: Volley: Automated Data Placement for Geo-Distributed Cloud Services](https://reader036.fdocuments.us/reader036/viewer/2022081604/568165c4550346895dd8cfc6/html5/thumbnails/14.jpg)
• Common IP: put data close to the IP address that accesses it most frequently
• oneDC: put all data in one data center
• Hash: randomly allocate data
• Volley
Data placement heuristics
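The three baseline heuristics are simple enough to sketch in a few lines. The following is an illustration only: the function names, the `ip_to_dc` lookup table (standing in for "the datacenter nearest to this IP address"), and the use of SHA-256 for the hash heuristic are all assumptions, not details from the slides.

```python
import hashlib
from collections import Counter


def one_dc(item_id, datacenters):
    """oneDC: every item goes to the same (first) datacenter."""
    return datacenters[0]


def hash_placement(item_id, datacenters):
    """Hash: spread items pseudo-randomly by hashing the item identifier."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    return datacenters[int.from_bytes(digest[:8], "big") % len(datacenters)]


def common_ip(access_ips, ip_to_dc):
    """Common IP: place the item near the IP that accesses it most often.

    access_ips: list of client IPs, one entry per access to the item
    ip_to_dc:   hypothetical lookup from IP to its nearest datacenter
    """
    most_common_ip, _count = Counter(access_ips).most_common(1)[0]
    return ip_to_dc[most_common_ip]
```

For example, `common_ip(["192.0.2.1", "198.51.100.7", "192.0.2.1"], {"192.0.2.1": "west", "198.51.100.7": "east"})` picks `"west"`, because that address accounts for two of the three accesses.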
![Page 15: Volley: Automated Data Placement for Geo-Distributed Cloud Services](https://reader036.fdocuments.us/reader036/viewer/2022081604/568165c4550346895dd8cfc6/html5/thumbnails/15.jpg)
Capacity Skew
Inter-Datacenter Traffic
Latency
Evaluation
Metrics
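Two of these metrics can be sketched directly from a placement map. The definitions below are assumptions made for illustration: capacity skew is taken as the ratio of the most-loaded to the least-loaded datacenter (by item count), and inter-datacenter traffic as the fraction of item-to-item messages that cross datacenters. The slides do not spell out either formula.

```python
from collections import Counter


def capacity_skew(placement, datacenters):
    """Ratio of the most-loaded to the least-loaded datacenter.

    placement:   {item_id: dc_name}
    datacenters: list of all datacenter names, including empty ones
    """
    counts = Counter(placement.values())
    loads = [counts.get(dc, 0) for dc in datacenters]
    if min(loads) == 0:
        return float("inf")  # e.g. oneDC leaves the other datacenters empty
    return max(loads) / min(loads)


def inter_dc_fraction(placement, messages):
    """Fraction of (item_a, item_b) messages whose endpoints sit in different datacenters."""
    if not messages:
        return 0.0
    crossing = sum(1 for a, b in messages if placement[a] != placement[b])
    return crossing / len(messages)
```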
![Page 16: Volley: Automated Data Placement for Geo-Distributed Cloud Services](https://reader036.fdocuments.us/reader036/viewer/2022081604/568165c4550346895dd8cfc6/html5/thumbnails/16.jpg)
Hash > Volley > Common IP > oneDC
Capacity Skew
![Page 17: Volley: Automated Data Placement for Geo-Distributed Cloud Services](https://reader036.fdocuments.us/reader036/viewer/2022081604/568165c4550346895dd8cfc6/html5/thumbnails/17.jpg)
oneDC > Volley > Common IP > Hash
Inter-datacenter Traffic
![Page 18: Volley: Automated Data Placement for Geo-Distributed Cloud Services](https://reader036.fdocuments.us/reader036/viewer/2022081604/568165c4550346895dd8cfc6/html5/thumbnails/18.jpg)
Volley > Common IP > oneDC > Hash
Latency
![Page 19: Volley: Automated Data Placement for Geo-Distributed Cloud Services](https://reader036.fdocuments.us/reader036/viewer/2022081604/568165c4550346895dd8cfc6/html5/thumbnails/19.jpg)
Capacity skew (best to worst): Hash > Volley > Common IP > oneDC
Inter-DC traffic (best to worst): oneDC > Volley > Common IP > Hash
Latency (best to worst): Volley > Common IP > oneDC > Hash
Evaluation
![Page 20: Volley: Automated Data Placement for Geo-Distributed Cloud Services](https://reader036.fdocuments.us/reader036/viewer/2022081604/568165c4550346895dd8cfc6/html5/thumbnails/20.jpg)
Iteration Count
• In phase 2, additional iterations do not yield significant improvement
• 5 iterations are enough
• Phase 3 determines the capacity skew
Re-computation
• Does make sense
• Reason: data migration
Improvement of Volley
![Page 21: Volley: Automated Data Placement for Geo-Distributed Cloud Services](https://reader036.fdocuments.us/reader036/viewer/2022081604/568165c4550346895dd8cfc6/html5/thumbnails/21.jpg)
Data placement is vital in cloud services
Volley has a comprehensive advantage: it simultaneously reduces user latency and operator cost
• reduces datacenter capacity skew by over 2X
• reduces inter-DC traffic by over 1.8X
• reduces user latency by 30% at the 75th percentile
• runs in under 16 clock-hours for 400 machine-hours of computation across 1 week of traces
Re-computation of the Volley algorithm is necessary
Conclusion
![Page 22: Volley: Automated Data Placement for Geo-Distributed Cloud Services](https://reader036.fdocuments.us/reader036/viewer/2022081604/568165c4550346895dd8cfc6/html5/thumbnails/22.jpg)
Limitations of the evaluation conducted in the paper
• No good contrast
• Can geo-distance stand for latency?
• Client mobility?
• Large space for development
Let’s go on…
![Page 23: Volley: Automated Data Placement for Geo-Distributed Cloud Services](https://reader036.fdocuments.us/reader036/viewer/2022081604/568165c4550346895dd8cfc6/html5/thumbnails/23.jpg)
Thank You!
![Page 24: Volley: Automated Data Placement for Geo-Distributed Cloud Services](https://reader036.fdocuments.us/reader036/viewer/2022081604/568165c4550346895dd8cfc6/html5/thumbnails/24.jpg)
Phase 1: Calculate the geographic centroid for each data item
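One common way to approximate a geographic centroid is to average unit vectors on the sphere and project the result back to latitude and longitude. The sketch below takes that approach; it is not necessarily the exact procedure used by Volley, and the name `weighted_centroid` and the use of access counts as weights are assumptions.

```python
import math


def to_unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a 3D unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def weighted_centroid(points):
    """Weighted centroid of client locations on a sphere.

    points: list of ((lat, lon), weight), where weight is assumed to be
            the number of accesses from that location.
    Returns (lat, lon) in degrees.
    """
    x = y = z = 0.0
    for (lat, lon), weight in points:
        vx, vy, vz = to_unit_vector(lat, lon)
        x += weight * vx
        y += weight * vy
        z += weight * vz
    norm = math.sqrt(x * x + y * y + z * z)
    if norm < 1e-12:
        raise ValueError("centroid is undefined (locations cancel out)")
    return (math.degrees(math.asin(z / norm)),
            math.degrees(math.atan2(y, x)))
```

Averaging in 3D avoids the wrap-around problem of averaging raw longitudes, where points at 179 and -179 degrees would wrongly average to 0.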
![Page 25: Volley: Automated Data Placement for Geo-Distributed Cloud Services](https://reader036.fdocuments.us/reader036/viewer/2022081604/568165c4550346895dd8cfc6/html5/thumbnails/25.jpg)
Phase 2: Refine the centroid for each data item iteratively
• considering client locations and data inter-dependencies
• using a weighted spring model that attracts data items, but on a spherical coordinate system
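The spring idea can be sketched as repeatedly pulling an item's position a small way along the great circle toward each client or related item. The update rule here is an assumption chosen for simplicity (a fixed `step` scaled by each neighbor's share of the total weight), not the formula from the paper; `interpolate` and `spring_step` are hypothetical names.

```python
import math


def _vec(lat_deg, lon_deg):
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def _latlon(v):
    z = max(-1.0, min(1.0, v[2]))
    return (math.degrees(math.asin(z)), math.degrees(math.atan2(v[1], v[0])))


def interpolate(a, b, fraction):
    """Move from a toward b along the great circle by fraction in [0, 1]."""
    va, vb = _vec(*a), _vec(*b)
    dot = max(-1.0, min(1.0, sum(p * q for p, q in zip(va, vb))))
    omega = math.acos(dot)
    s = math.sin(omega)
    if s < 1e-12:
        return a  # identical or antipodal points: no well-defined path
    ka = math.sin((1 - fraction) * omega) / s
    kb = math.sin(fraction * omega) / s
    return _latlon(tuple(ka * p + kb * q for p, q in zip(va, vb)))


def spring_step(position, neighbors, step=0.1):
    """One refinement pass: pull position toward each neighbor in turn.

    neighbors: list of ((lat, lon), weight); weight is assumed to be the
               amount of communication with that client or data item.
    step:      assumed overall pull strength per pass, in [0, 1].
    """
    total = sum(weight for _, weight in neighbors)
    for location, weight in neighbors:
        position = interpolate(position, location, step * weight / total)
    return position
```

Calling `spring_step` a handful of times per item corresponds to the iterative refinement described above; heavier neighbors pull the item farther on each pass.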
![Page 26: Volley: Automated Data Placement for Geo-Distributed Cloud Services](https://reader036.fdocuments.us/reader036/viewer/2022081604/568165c4550346895dd8cfc6/html5/thumbnails/26.jpg)
![Page 27: Volley: Automated Data Placement for Geo-Distributed Cloud Services](https://reader036.fdocuments.us/reader036/viewer/2022081604/568165c4550346895dd8cfc6/html5/thumbnails/27.jpg)