Towards a Unified View of Cloud Elasticity
![Page 1: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/1.jpg)
Towards a Unified View of Elasticity
Srikumar Venugopal & Team
School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
![Page 2: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/2.jpg)
Acknowledgements
• Basem Suleiman
• Han Li
• Reza Nouri
• Freddie Sunarso
• Richard Gow
![Page 3: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/3.jpg)
Agenda
• Introduction to elasticity and its challenges
• Performance Modeling of Elasticity Rules
• Autonomic Decentralised Elasticity Management of Cloud Applications
• Efficient Bootstrapping for Decentralised Shared-nothing Key-value Stores
![Page 4: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/4.jpg)
Simple Service Deployment on Cloud
![Page 5: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/5.jpg)
Elasticity
The ability of a system to change its capacity in direct response to the workload demand
![Page 6: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/6.jpg)
Different Views of Elasticity
• Performance View
  – When to scale and how much?
• Application View
  – Does the architecture accommodate scaling?
  – How is state managed?
• Configuration View
  – Are there changes in configuration due to scaling?
![Page 7: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/7.jpg)
Elastic Deployment Architecture
![Page 8: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/8.jpg)
Elasticizing Application Layer
![Page 9: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/9.jpg)
Trigger – Controller – Action
• Trigger: Threshold Breach
• Controller: Intelligence/Logic
• Action: Add or Remove Capacity
![Page 10: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/10.jpg)
State-of-the-art in Auto-scaling
| Product/Project | Trigger | Controller | Actions |
|---|---|---|---|
| Amazon Autoscaling | CloudWatch metrics/Threshold | Rule-based/Schedule-based | Add/Remove Capacity |
| WASABi | Azure Diagnostics/Threshold | Rule-based | Add/Remove Capacity, Custom |
| RightScale/Scalr | Load monitoring | Rule-based/Schedule-based | Add/Remove Capacity, Custom |
| Google Compute Engine | CPU Load, etc. | Rule-based | Add/Remove Capacity |
| Academic | | | |
| CloudScale | Demand Prediction | Control theory | Voltage-scaling |
| Cataclysm | Threshold-based | Queueing-model | Admission Control |
| IBM Unity | Application Utility | Utility functions/RL | Add/Remove Capacity |
![Page 11: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/11.jpg)
Summary
• Currently, the most popular mechanisms for auto-scaling are rule-based mechanisms
• The effectiveness of rule-based autoscaling is determined by the trigger conditions
• So, how do we know how to set up the right triggers ?
![Page 12: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/12.jpg)
Performance Modeling of Elasticity Rules
Basem Suleiman
![Page 13: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/13.jpg)
Elasticity (Auto-Scaling) Rules
Examples:
• If CPU Utilization ≥ 85% for 7 min., add 1 server (Scale Out)
• If RespTimeSLA ≥ 95% for 10 min., remove 1 server (Scale In)
B. Suleiman, S. Venugopal, Modeling Performance of Elasticity Rules for Cloud-based Applications, EDOC 2013.
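A rule of this form can be read as "fire when every sample in the last N minutes breaches the threshold". The sketch below is one minimal way to express that in Python. The `Rule` class, the `fires` helper and the CPU readings are hypothetical names and data introduced for illustration, and one sample per minute is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Threshold rule: fire when the metric breaches for `minutes` samples in a row."""
    threshold: float   # e.g. 85.0 for "CPU Utilization >= 85%"
    minutes: int       # e.g. 7 for "for 7 min." (assumes one sample per minute)
    delta: int         # +1 to add a server, -1 to remove one


def fires(rule: Rule, samples: list[float]) -> bool:
    """Return True if the last `rule.minutes` samples all meet the threshold."""
    if len(samples) < rule.minutes:
        return False
    return all(s >= rule.threshold for s in samples[-rule.minutes:])


# Hypothetical per-minute CPU readings (percent), newest last.
scale_out = Rule(threshold=85.0, minutes=7, delta=+1)
cpu = [70, 86, 88, 90, 91, 87, 89, 92]

servers = 2
if fires(scale_out, cpu):
    servers += scale_out.delta
print(servers)
```

A scale-in rule is the same structure with `delta=-1` and its own threshold and duration.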
![Page 14: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/14.jpg)
Performance of Different Elasticity Rules
• How well do elasticity rules perform in terms of SLA satisfaction, CPU utilization, costs and % of served requests?

| Rule | Scale out | Scale in |
|---|---|---|
| CPU75 | If CPU Util. > 75% for 5 min, add 1 server | If CPU Util. ≤ 30% for 5 min, remove 1 server |
| CPU80 | If CPU Util. > 80% for 5 min, add 1 server | If CPU Util. ≤ 30% for 5 min, remove 1 server |
| CPU85 | If CPU Util. > 85% for 5 min, add 1 server | If CPU Util. ≤ 30% for 5 min, remove 1 server |
| SLA90 | If SLA < 90% for 5 min, add 1 server | If SLA ≥ 90% for 5 min, remove 1 server |
| SLA95 | If SLA < 95% for 5 min, add 1 server | If SLA ≥ 95% for 5 min, remove 1 server |

B. Suleiman, S. Sakr, S. Venugopal, W. Sadiq, Trade-off Analysis of Elasticity Approaches for Cloud-Based Business Applications, Proc. WISE 2012
![Page 15: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/15.jpg)
Cloud Testbed for Collecting Metrics
[Figure: TPC-W application servers on EC2 behind an Elastic Load Balancer, backed by a TPC-W database on EC2. Metrics collected: % SLA satisfaction, average CPU utilization, server costs, % served requests and response time.]
B. Suleiman, S. Sakr, S. Venugopal, W. Sadiq, Trade-off Analysis of Elasticity Approaches for Cloud-Based Business Applications, Proc. WISE 2012
![Page 16: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/16.jpg)
Performance Evaluation - Different Elasticity Rules
[Figure: box plots (min, Q1, median, mean, Q3, max) of Costs ($0.00 to $2.50) and CPU Utilization (0% to 90%) for the rules CPU75, CPU80, CPU85, SLA90 and SLA95.]
B. Suleiman, S. Sakr, S. Venugopal, W. Sadiq, Trade-off Analysis of Elasticity Approaches for Cloud-Based Business Applications, Proc. WISE 2012
![Page 17: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/17.jpg)
The Challenges of Thresholds
You must be at least this tall to scale up!
• Threshold values determine performance and cost
• E.g. a low CPU utilization threshold => higher cost, better performance
• Thresholds vary from one application to another
• Empirically determining thresholds is expensive.
B. Suleiman, S. Venugopal, Modeling Performance of Elasticity Rules for Cloud-based Applications, EDOC 2013.
![Page 18: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/18.jpg)
Can we construct a model that allows us to establish the right thresholds ?
![Page 19: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/19.jpg)
Queue Model of 3-tier
B. Suleiman, S. Venugopal, Modeling Performance of Elasticity Rules for Cloud-based Applications, EDOC 2013 (Accepted)
![Page 20: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/20.jpg)
Establishing Rule Thresholds
• Developed a model based on the M/M/m queuing model
  – Simultaneous session initiations on 1 server
  – Provisioning Lag Time of the provider
  – Cool-down interval after elasticity action
  – Algorithms to model scale-in and scale-out
  – Request Mix
• Compared model fidelity with actual cloud execution of TPC-W workload.
B. Suleiman, S. Venugopal, Modeling Performance of Elasticity Rules for Cloud-based Applications, EDOC 2013.
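For orientation, the textbook M/M/m queue on its own already gives a mean response time for a given number of servers. The sketch below computes it with the standard Erlang C formula. It is only the plain M/M/m baseline, not the authors' extended model: provisioning lag, cool-down and request mix are not represented, and the function names and rates are hypothetical.

```python
import math


def erlang_c(m: int, offered_load: float) -> float:
    """Probability that an arriving request has to queue in an M/M/m system.

    offered_load = arrival_rate / service_rate; must be below m for stability.
    """
    if offered_load >= m:
        raise ValueError("unstable: offered load must be below the number of servers")
    rho = offered_load / m
    below_m = sum(offered_load ** k / math.factorial(k) for k in range(m))
    tail = offered_load ** m / (math.factorial(m) * (1 - rho))
    return tail / (below_m + tail)


def mean_response_time(m: int, arrival_rate: float, service_rate: float) -> float:
    """Mean time in system: mean queueing delay plus mean service time."""
    offered_load = arrival_rate / service_rate
    wait = erlang_c(m, offered_load) / (m * service_rate - arrival_rate)
    return wait + 1 / service_rate


# Hypothetical rates: 20 req/s arriving, each server handles 10 req/s.
for servers in (3, 4, 5):
    print(servers, round(mean_response_time(servers, 20.0, 10.0), 4))
```

Evaluating this for successive server counts shows how much response time one extra server buys at a given load, which is the question a scale-out threshold implicitly answers.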
![Page 21: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/21.jpg)
Experiments: Methodology
• Run the TPC-W workload on Amazon cloud resources using thresholds
• Simulate the model using MATLAB with the same thresholds
• Compare the simulation results to the results from the actual execution
  – If both are equivalent, then we are good :)
B. Suleiman, S. Venugopal, Modeling Performance of Elasticity Rules for Cloud-based Applications, EDOC 2013 (Accepted)
![Page 22: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/22.jpg)
Experiments: Testbed
[Figure: TPC-W user emulation (EC2, Linux, extra-large) drives an Elastic Load Balancer in front of the TPC-W application tier (EC2 small/medium servers, Linux, JBoss/JSDK), backed by the TPC-W database (EC2 extra-large server, Linux, MySQL).]
![Page 23: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/23.jpg)
Experiments: Input Workload
[Figure: Workload. Request arrival rate (req/min, 0 to 2400) against time (minutes, 0 to 570).]
• Used TPC-W Browsing profile (95% read)
• Stress on application tier
• Number of concurrent users – Zipf
• Inter-arrival times – Poisson
![Page 24: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/24.jpg)
Experiments: Elasticity Rules
| Rule | Scale out | Scale in |
|---|---|---|
| CPU75 | If CPU Util. > 75% for 5 min, add 1 server | If CPU Util. < 30% for 5 min, remove 1 server |
| CPU80 | If CPU Util. > 80% for 5 min, add 1 server | If CPU Util. < 30% for 5 min, remove 1 server |

Common parameters:
• Waiting time – 10 mins., Measuring interval – 1 min.
Metrics Captured:
• Average CPU Utilization across all the servers
• Average Response Time in a time interval
• Number of servers in operation at any point of time
![Page 25: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/25.jpg)
Results
[Figure: CPU Utilization. Average CPU utilization (0% to 100%) for elasticity rules CPU75 and CPU80, model (M) versus empirical (E).]
[Figure: Average Response Time. Average response time (sec, 0.0 to 0.5) for elasticity rules CPU75 and CPU80, model (M) versus empirical (E).]
![Page 26: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/26.jpg)
CPU Utilization over Time
[Figure: average CPU utilization (%, 0% to 100%) against time (minutes, 0 to 560) for CPU80M and CPU80E.]
![Page 27: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/27.jpg)
Number of Servers Initialized
[Figure: number of application-tier servers (0 to 6) against time (minutes, 0 to 560) for CPU75M, CPU75E, CPU80M and CPU80E.]
![Page 28: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/28.jpg)
Summary
• Developed a queueing model that can be used to reason about elasticity
• Model captures effects of thresholds and can be used for testing different rules
• Evaluations show that the model approximates real-world conditions closely
• Future work: handling initial bursts in workload
![Page 29: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/29.jpg)
Autonomic Decentralised Elasticity Management of Cloud Applications
Reza Nouri and Han Li
![Page 30: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/30.jpg)
Cons of Rule-based Autoscaling
• Commercial products are rule-based
  – Gives “illusion of control” to users
  – Leads to the problem of defining the “right” thresholds
• Centralised controllers
  – Communication overhead increases with size
  – Processing overhead also increases (Big Data!)
• One application/VM at a time
![Page 31: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/31.jpg)
Challenges of large-scale elasticity
• Large numbers of instances and apps
  – Deriving solutions takes time
• Dynamic conditions
  – Apps are going into critical all the time
• Shifting bottlenecks
  – Greedy solutions may create bottlenecks in other places
• Network partitions, fault tolerance…
H. Li, S. Venugopal, Using Reinforcement Learning for Controlling an Elastic Web Application Hosting Platform, Proceedings of 8th ICAC '11.
![Page 32: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/32.jpg)
Initial Conditions
Instance1 App Server1
app1 app2
Instance2 App Server2
app3 app4
IaaS Provider
![Page 33: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/33.jpg)
A Critical Event
Instance1 App Server1
app1 app2
IaaS Provider
Instance2 App Server2
app3 app4
![Page 34: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/34.jpg)
Placement 1
Instance1 App Server1
app1
IaaS Provider
Instance2 App Server2
app3 app4 app2
![Page 35: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/35.jpg)
Placement 2
Instance1 App Server1
app2
IaaS Provider
Instance2 App Server2
app3 app4
Instance3 App Server3
app1
$$
![Page 36: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/36.jpg)
Placements 3 & 4
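[Figure: two placements side by side, each hosted on the IaaS provider. The first uses Instance1 (App Server1) and Instance2 (App Server2); the second adds Instance3 (App Server3). In both, app2 is on Instance1 and app3 and app4 are on Instance2, and each placement shows two copies of app1.]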
![Page 37: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/37.jpg)
Problems for Automatic Placement
• Provisioning
  – Smallest number of servers required to satisfy resource requirements of all the applications
• Dynamic Placement
  – Distribute applications so as to maximise utilisation yet meet each app’s response time and availability requirements
H. Li, S. Venugopal, Using Reinforcement Learning for Controlling an Elastic Web Application Hosting Platform, Proceedings of 8th ICAC '11.
![Page 38: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/38.jpg)
Co-ordinated Control of Elasticity
• Instances control their own utilisation
  – Monitoring, management and feedback
• Local controllers are learning agents
  – Reinforcement Learning
• Controllers learn from each other
  – Share their knowledge and update their own
• Servers are linked by a DHT
  – Agility, Flexibility, Co-ordination
H. Li, S. Venugopal, “Using Reinforcement Learning for Controlling an Elastic Web Application Hosting Platform”, Proceedings of 8th ICAC '11.
![Page 39: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/39.jpg)
Abstract View of the Control Scheme
H. Li, S. Venugopal, “Using Reinforcement Learning for Controlling an Elastic Web Application Hosting Platform”, Proceedings of 8th ICAC '11.
![Page 40: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/40.jpg)
Fuzzy Thresholds
H. Li, S. Venugopal, Using Reinforcement Learning for Controlling an Elastic Web Application Hosting Platform, Proceedings of 8th ICAC '11.
![Page 41: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/41.jpg)
Basic Actions
Instance: create (-3.5), terminate (3.5), find (3.5)
Application: move (0.5), duplicate (0.5), merge (0.5)
![Page 42: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/42.jpg)
Co-ordination using find
• Server looks up other servers with the least load
  – DHT lookup
• Sends a move message to the selected server
• Replies with accept or reject
  – accept has a +ve reward
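The exchange on this slide can be sketched as a small Python example. Everything named here is hypothetical: the `Server` class, the capacity check used to decide accept or reject, and the numeric rewards are illustrative assumptions, and a plain `min` over a peer list stands in for the DHT lookup.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    capacity: float                                   # hypothetical load units
    apps: dict[str, float] = field(default_factory=dict)  # app name -> load

    @property
    def load(self) -> float:
        return sum(self.apps.values())

    def handle_move(self, app: str, app_load: float) -> bool:
        """Accept the app only if it fits in the remaining capacity (assumed policy)."""
        if self.load + app_load > self.capacity:
            return False          # reject
        self.apps[app] = app_load
        return True               # accept


def find_and_move(source: Server, app: str, peers: list[Server]) -> float:
    """Pick the least-loaded peer, send it a move request, return a reward."""
    target = min(peers, key=lambda s: s.load)   # stands in for the DHT lookup
    if target.handle_move(app, source.apps[app]):
        del source.apps[app]
        return 1.0   # placeholder positive reward for accept
    return 0.0       # placeholder: the slides do not state the reward for reject


s1 = Server("s1", 10, {"app1": 6, "app2": 5})   # overloaded: 11 > 10
s2 = Server("s2", 10, {"app3": 2})
s3 = Server("s3", 10, {"app4": 7})
reward = find_and_move(s1, "app2", [s2, s3])
print(reward, s1.load, s2.load)
```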
![Page 43: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/43.jpg)
Shrinking
• The controller is always reward maximising
  – Highest Reward is for merge+terminate
• A controller initiates its own shutdown
  – Low load on its applications
• Gets exclusive lock on termination
  – Only one instance can terminate at a time
• Transfers state before shutdown
![Page 44: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/44.jpg)
Experiments
• Six web applications
  – Test Application: Hotel Management
  – Search → Book → Confirm
• Five were subjected to a background load
  – Uniform Random
• One was subjected to the test load
• Application threshold: 200 and 500 ms
• Metrics
  – Average Response Time, Drop Rate, Servers
H. Li, S. Venugopal, “Using Reinforcement Learning for Controlling an Elastic Web Application Hosting Platform”, Proceedings of 8th ICAC '11.
![Page 45: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/45.jpg)
Experimental Results (EC2)
![Page 46: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/46.jpg)
Elasticising Persistence Layer
![Page 47: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/47.jpg)
Efficient Bootstrapping for Decentralised Shared-nothing Key-value Stores
Han Li
![Page 48: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/48.jpg)
Key-value Stores
• The standard component for cloud data management
• Increasing workload → Node bootstrapping
  – Incorporate a new, empty node as a member of the KVS
• Decreasing workload → Node decommissioning
  – Remove an existing member, whose data is redundant, from the KVS
H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
![Page 49: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/49.jpg)
Research Questions
• As the system scales, how to efficiently incorporate or remove data nodes?
  – Load balancing, migration overheads, etc.
• How to partition and place the data replicas when the system is elastic?
  – Data consistency, durability, availability, etc.
H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
![Page 50: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/50.jpg)
Elasticity in Key-Value Stores
• Minimise the overhead of data movement
  – How to partition/store data?
• Balance the load at node bootstrapping
  – Both data volume and workload
  – How to place/allocate data?
• Maintain data consistency and availability
  – How to execute data movement?
H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
![Page 51: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/51.jpg)
Split-Move Approach
[Figure: a key space with partitions A to I, held as master and slave replicas on Node 1 to Node 4. Three numbered steps show a new node joining: ① partition B is split into B1 and B2, ② replicas are moved to the new node, ③ the superseded replicas are marked "To be deleted".]
Partition at node bootstrapping
H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
![Page 52: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/52.jpg)
Virtual-Node Approach
[Figure: a key space with partitions A to I, fixed at system startup and replicated across Node 1 to Node 4. When a new node joins, whole partitions (B, A, E, F and H in the figure) are placed on it.]
Partition at system startup
Data skew: e.g., the majority of data is stored in a minority of partitions. Moving around giant partitions is not a good idea.
H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
![Page 53: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/53.jpg)
Our Solution
• Virtual-node based movement
  – Each partition of data is stored in separate files
  – Reduced overhead of data movement
  – Many existing nodes can participate in bootstrapping
• Automatic sharding
  – Split and merge partitions at runtime
  – Each partition stores a bounded volume of data
  – Easy to reallocate data
  – Easy to balance the load
H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
![Page 54: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/54.jpg)
The timing for data partitioning
• Shard partitions at writes (insert and delete)
  – Split keeps Size(Pi) ≤ Θmax
  – Merge keeps Size(Pi) + Size(Pi+1) ≥ Θmin
• Θmax ≥ 2Θmin: avoid oscillation!
[Figure: sequences of inserts and deletes over partitions A to E, showing B split into B1 and B2 on insert and adjacent partitions merged into M on delete.]
H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
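One way to read the size bounds on this slide is as invariants restored after each write: no partition larger than Θmax, and no two neighbours whose combined size is below Θmin. The Python sketch below follows that reading. It is an illustration under assumptions, not the ElasCass implementation: partitions are modelled as in-memory lists of keys, size is counted in keys, and `THETA_MIN`, `THETA_MAX` and `shard` are hypothetical names.

```python
THETA_MIN = 4   # hypothetical lower bound, in keys
THETA_MAX = 8   # hypothetical upper bound; chosen so THETA_MAX >= 2 * THETA_MIN


def shard(partitions: list[list[int]]) -> list[list[int]]:
    """Restore the size bounds over an ordered list of key-sorted partitions.

    Assumes writes arrive one at a time, so a partition exceeds THETA_MAX
    by at most one key and a single split is enough.
    """
    if not partitions:
        return []
    # Split: any partition larger than THETA_MAX becomes two halves.
    split = []
    for p in partitions:
        if len(p) > THETA_MAX:
            mid = len(p) // 2
            split.extend([p[:mid], p[mid:]])
        else:
            split.append(p)
    # Merge: neighbours whose combined size is below THETA_MIN are joined.
    merged = [split[0]]
    for p in split[1:]:
        if len(merged[-1]) + len(p) < THETA_MIN:
            merged[-1] = merged[-1] + p
        else:
            merged.append(p)
    return merged


before = [list(range(1, 10)), [10], [11], list(range(12, 17))]
after = shard(before)
print([len(p) for p in after])
```

With Θmax ≥ 2Θmin, the two halves of a split together hold more than Θmin keys, so a split is never undone by an immediate merge.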
![Page 55: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/55.jpg)
Sharding coordination
• Solution: Election-based coordination
  – Step 1: Election (SortedList: C, E, ..., A, ..., B)
  – Step 2: Enforce Split/Merge
  – Step 3: Finish Split/Merge
  – Step 4: Announce to all nodes
[Figure: Node-A, Node-B, Node-C and Node-E with a coordinator at each step; the data/node mapping is shown in Steps 2 and 3, with ordinal labels 1st to 4th.]
H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
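The slide does not spell out the election rule. A common choice, assumed here, is that every node holds the same sorted candidate list and the first candidate believed alive becomes coordinator. The Python sketch below shows only that assumed rule; `elect_coordinator` and the liveness sets are hypothetical.

```python
from typing import Optional


def elect_coordinator(sorted_candidates: list[str], alive: set[str]) -> Optional[str]:
    """Return the first live node in the shared sorted list, or None if none is alive.

    Assumption: all nodes see the same list order, so they agree on the
    result without exchanging votes.
    """
    for node in sorted_candidates:
        if node in alive:
            return node
    return None


# Order loosely follows the slide's "SortedList: C, E, ..., A, ..., B".
candidates = ["Node-C", "Node-E", "Node-A", "Node-B"]

print(elect_coordinator(candidates, {"Node-A", "Node-B", "Node-C", "Node-E"}))
print(elect_coordinator(candidates, {"Node-A", "Node-B", "Node-E"}))
```

Under this assumption, removing a failed node from the candidate list is what moves the coordinator role to the next entry.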
![Page 56: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/56.jpg)
Node failover during sharding
[Figure: four timelines of the coordinator and non-coordinators across Step 1 (Election), Step 2 (Notification: Shard Pi), Step 3 (Replace Replicas) and Step 4 (Announce: Successful), marked before, during and after execution. The first is failure-free; the other three add failure handling, with the labels: Failed, Gossip, Dead, Resurrect, Removed from candidate list, Append to candidate list, Continue without coordinator, Elect new coordinator, Timeout, and Invalidate Pi in this node.]
H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
![Page 57: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/57.jpg)
Evaluation Setup
• ElasCass: an implementation of auto-sharding built on Apache Cassandra (version 1.0.5); Cassandra itself uses the Split-Move approach.
• Key-value stores: ElasCass vs. Cassandra (v1.0.5)
• Test bed: Amazon EC2, m1.large type, 2 CPU cores, 8 GB RAM
• Benchmark: YCSB
H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
![Page 58: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/58.jpg)
Evaluation – Bootstrap Time
• Start from 1 node, with 100GB of data, R=2. Scale up to 10 nodes.
• In Split-Move, data volume transferred reduces by half from 3 nodes onwards.
• In ElasCass, data volume transferred remains below 10GB from 2 nodes.
• Bootstrap time is determined by data volume transferred. ElasCass exhibits a consistent performance at all scales.
H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
![Page 59: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/59.jpg)
Conclusions
• We have designed and implemented a decentralised auto-sharding scheme that
  – consolidates each partition replica into a single transferable unit to provide efficient data movement;
  – automatically shards the partitions into bounded ranges to address data skew;
  – reduces the time to bootstrap nodes, and achieves better load balance and better query-processing performance.
![Page 60: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/60.jpg)
A Unified View of Elasticity (?)
![Page 61: Towards a Unified View of Cloud Elasticity](https://reader033.fdocuments.us/reader033/viewer/2022051514/54c17c4f4a79590b338b4602/html5/thumbnails/61.jpg)
Final Thoughts
• Elasticising Application Logic is done
  – How do we eliminate thresholds?
  – Should it be more autonomic?
• Application View of Elasticity
  – Managing state is the big challenge
  – Decoupling of different components (service-oriented model)
  – How would you scale interconnected components?