Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores
![Page 1: Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores](https://reader035.fdocuments.us/reader035/viewer/2022062704/555c3b0cd8b42a0b038b49fa/html5/thumbnails/1.jpg)
Efficient Bootstrapping for Decentralised Shared-nothing Key-value Stores
Han Li, Srikumar Venugopal
![Page 2: Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores](https://reader035.fdocuments.us/reader035/viewer/2022062704/555c3b0cd8b42a0b038b49fa/html5/thumbnails/2.jpg)
School of Computer Science and Engineering
Agenda
• Motivations for Node Bootstrapping
• Research Gap
• Challenges and Solutions
• Evaluations
• Conclusion
![Page 3: Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores](https://reader035.fdocuments.us/reader035/viewer/2022062704/555c3b0cd8b42a0b038b49fa/html5/thumbnails/3.jpg)
On-demand Provisioning
The Capacity versus Utilisation Curve
![Page 4: Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores](https://reader035.fdocuments.us/reader035/viewer/2022062704/555c3b0cd8b42a0b038b49fa/html5/thumbnails/4.jpg)
Key-value Stores
• The standard component for cloud data management
• Increasing workload → Node bootstrapping
  – Incorporate a new, empty node as a member of the KVS
• Decreasing workload → Node decommissioning
  – Remove an existing member that holds redundant data from the KVS
![Page 5: Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores](https://reader035.fdocuments.us/reader035/viewer/2022062704/555c3b0cd8b42a0b038b49fa/html5/thumbnails/5.jpg)
Goals for Efficient Node Bootstrapping
• Minimise the overhead of data movement
  – How to partition/store data?
• Balance the load at node bootstrapping
  – Both data volume and workload
  – How to place/allocate data?
• Maintain data consistency and availability
  – How to execute data movement?
![Page 6: Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores](https://reader035.fdocuments.us/reader035/viewer/2022062704/555c3b0cd8b42a0b038b49fa/html5/thumbnails/6.jpg)
Background: Storage model
• Shared Storage
  – Access the same storage
    • Distributed file systems
    • Network-attached storage
  – E.g. GFS, HDFS
  – Simply exchange metadata
    • Albatross, by S. Das, UCSB
• Shared Nothing
  – Use individual local storage
  – Decentralised, peer-to-peer
  – E.g. Dynamo, Cassandra, Voldemort, etc.
  – Require data movement
    • Lightweight solutions?
![Page 7: Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores](https://reader035.fdocuments.us/reader035/viewer/2022062704/555c3b0cd8b42a0b038b49fa/html5/thumbnails/7.jpg)
Background: Split-Move Approach
Partition at node bootstrapping
![Page 8: Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores](https://reader035.fdocuments.us/reader035/viewer/2022062704/555c3b0cd8b42a0b038b49fa/html5/thumbnails/8.jpg)
Background: Virtual-Node Approach
Partition at system startup
Data skew: e.g., the majority of data is stored in a minority of partitions.
Moving around giant partitions is not a good idea.
![Page 9: Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores](https://reader035.fdocuments.us/reader035/viewer/2022062704/555c3b0cd8b42a0b038b49fa/html5/thumbnails/9.jpg)
Research Gap
• Shared Storage vs. Shared Nothing
  – Require data movement
• Centralised vs. Decentralised
  – Require coordination
• Split-Move vs. Virtual-node Based
  – Partition at node bootstrapping is heavyweight
  – Partition at system startup causes data skew
• The Gap: A scheme of data partitioning and placement that improves the efficiency of bootstrapping in shared-nothing KVS
![Page 10: Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores](https://reader035.fdocuments.us/reader035/viewer/2022062704/555c3b0cd8b42a0b038b49fa/html5/thumbnails/10.jpg)
Our Solution
• Virtual-node based movement
  – Each partition of data is stored in separate files
  – Reduced overhead of data movement
  – Many existing nodes can participate in bootstrapping
• Automatic sharding
  – Split and merge partitions at runtime
  – Each partition stores a bounded volume of data
    • Easy to reallocate data
    • Easy to balance the load
![Page 11: Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores](https://reader035.fdocuments.us/reader035/viewer/2022062704/555c3b0cd8b42a0b038b49fa/html5/thumbnails/11.jpg)
The timing for data partitioning
• Shard partitions at writes (insert and delete)
  – Split so that Size(Pi) ≤ Θmax
  – Merge so that Size(Pi) + Size(Pi+1) ≥ Θmin
• Require Θmax ≥ 2Θmin to avoid oscillation between splitting and merging
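The two size bounds can be illustrated with a minimal sketch. It assumes partitions are modelled only by their sizes, listed in key order, and that a split cuts a partition into two equal halves; the function name `shard` and the threshold values in the usage note are hypothetical, not taken from ElasCass.

```python
def shard(sizes, theta_min, theta_max):
    """Return partition sizes (in key order) after splitting and merging.

    Assumption: a split produces two equal halves. Real systems would
    split at a key boundary, so the halves would only be roughly equal.
    """
    if theta_max < 2 * theta_min:
        raise ValueError("theta_max must be at least 2 * theta_min")

    # Split pass: keep halving any partition larger than theta_max.
    split = []
    stack = list(reversed(sizes))
    while stack:
        size = stack.pop()
        if size > theta_max:
            stack.extend([size / 2, size / 2])
        else:
            split.append(size)

    # Merge pass: fold a partition into its left neighbour while the
    # two together are still smaller than theta_min.
    merged = []
    for size in split:
        if merged and merged[-1] + size < theta_min:
            merged[-1] += size
        else:
            merged.append(size)
    return merged
```

With hypothetical bounds of 64 and 128, `shard([300, 10, 20, 100], 64, 128)` splits the 300 into four pieces of 75 and merges the 10 and 20 into one piece of 30. Because Θmax ≥ 2Θmin, each half of a split is at least Θmin and each merged partition is below Θmax, so running `shard` again on its own output changes nothing.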
![Page 12: Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores](https://reader035.fdocuments.us/reader035/viewer/2022062704/555c3b0cd8b42a0b038b49fa/html5/thumbnails/12.jpg)
Challenge 1: Sharding coordination
• Issues
  – Totally decentralised
  – Each partition has multiple replicas
  – Each replica is split or merged locally
• Question
  – How to guarantee that all the replicas of a given partition are sharded simultaneously?
![Page 13: Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores](https://reader035.fdocuments.us/reader035/viewer/2022062704/555c3b0cd8b42a0b038b49fa/html5/thumbnails/13.jpg)
Challenge 1: Sharding coordination
• Solution: Election-based coordination
![Page 14: Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores](https://reader035.fdocuments.us/reader035/viewer/2022062704/555c3b0cd8b42a0b038b49fa/html5/thumbnails/14.jpg)
Challenge 2: Node failover during sharding
![Page 15: Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores](https://reader035.fdocuments.us/reader035/viewer/2022062704/555c3b0cd8b42a0b038b49fa/html5/thumbnails/15.jpg)
Challenge 3: Data consistency during sharding
• Use two sets of replicas at sharding
  – Original partition and future partition
  – Data from different partitions is stored in separate files
• Approach 1
  – Write to future partition, roll back at failure
  – Read from both partitions
• Approach 2
  – Write to both partitions, abandon future partition at failure
  – Read from original partition
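Approach 2 can be sketched as follows for the case of a split. This is a simplified, single-process illustration under stated assumptions: the class `SplittingPartition`, its method names and the in-memory dictionaries are all hypothetical, and the slides do not say how ElasCass implements the step.

```python
class SplittingPartition:
    """Hypothetical model of one partition while it is being split."""

    def __init__(self, data, split_key):
        self.split_key = split_key
        self.original = dict(data)
        # Future partitions: keys below split_key go left, the rest right.
        self.future = (
            {k: v for k, v in data.items() if k < split_key},
            {k: v for k, v in data.items() if k >= split_key},
        )

    def write(self, key, value):
        # Write to both the original and the future partition.
        self.original[key] = value
        if self.future is not None:
            target = self.future[0] if key < self.split_key else self.future[1]
            target[key] = value

    def read(self, key):
        # Reads are served by the original partition only.
        return self.original.get(key)

    def abandon(self):
        # On failure, drop the future partitions; the original is intact.
        self.future = None
```

Since every write also reaches the original partition, abandoning the future partitions loses no data and needs no rollback, which is the trade-off against Approach 1: writes cost twice as much while sharding is in progress, but reads touch a single partition.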
![Page 16: Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores](https://reader035.fdocuments.us/reader035/viewer/2022062704/555c3b0cd8b42a0b038b49fa/html5/thumbnails/16.jpg)
Challenge 3: Data consistency during movement
• Use a pair of tokens for each partition
  – A Boolean token to approve and disapprove read/write
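One way to picture the token pair is a replica that checks a Boolean flag before serving each kind of request. The sketch below assumes that one token gates reads and the other gates writes; the names `PartitionTokens` and `Replica`, and the order in which the tokens are switched on in the usage note, are assumptions for illustration rather than details from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionTokens:
    """Hypothetical token pair: one Boolean per operation type."""
    readable: bool = False
    writable: bool = False


@dataclass
class Replica:
    tokens: PartitionTokens = field(default_factory=PartitionTokens)
    data: dict = field(default_factory=dict)

    def write(self, key, value):
        if not self.tokens.writable:
            raise PermissionError("writes are not approved for this replica")
        self.data[key] = value

    def read(self, key):
        if not self.tokens.readable:
            raise PermissionError("reads are not approved for this replica")
        return self.data[key]
```

A receiving node might, for instance, set `writable` when the transfer starts so that new writes are not lost, and set `readable` only once the partition has fully arrived, so that clients never read a partially transferred replica.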
![Page 17: Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores](https://reader035.fdocuments.us/reader035/viewer/2022062704/555c3b0cd8b42a0b038b49fa/html5/thumbnails/17.jpg)
Replica Placement at Node Bootstrap
• Partition re-allocation and sharding are mutually exclusive
• Maintain data availability
  – Each partition has at least R replicas
• Balance the load (e.g., number of requests)
  – Heavily loaded nodes have higher priority to “move out” data
• Balance the data
  – Balance the number of partitions across nodes
    • Each partition, via sharding, is of similar size
• Two-phase bootstrap
  – Phase 1: guarantee R replicas, shift load from heavily loaded nodes
  – Phase 2: achieve load and data balancing in low-priority threads
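The priority rule and the balance target can be illustrated with a simple greedy selection. This is a hypothetical single-pass sketch, not the two-phase ElasCass procedure: the function `plan_moves`, its inputs, and the choice of an even partition count as the target are all assumptions.

```python
def plan_moves(nodes, loads):
    """Pick partition replicas for a new, empty node to take over.

    nodes: dict mapping node name -> set of partition ids it holds
    loads: dict mapping node name -> request count (higher = busier)
    Returns a list of (partition_id, donor_node) pairs.
    """
    total_replicas = sum(len(parts) for parts in nodes.values())
    target = total_replicas // (len(nodes) + 1)  # even share per node
    held = {name: set(parts) for name, parts in nodes.items()}
    taken = set()
    moves = []

    # Heavily loaded nodes get the first chance to move data out.
    for donor in sorted(nodes, key=lambda name: loads[name], reverse=True):
        for pid in sorted(held[donor]):
            if len(taken) >= target or len(held[donor]) <= target:
                break
            if pid in taken:
                # The new node must not hold two replicas of one partition.
                continue
            held[donor].remove(pid)
            taken.add(pid)
            moves.append((pid, donor))
    return moves
```

Each replica is moved rather than copied, so the number of replicas per partition stays at R. Counting partitions is a reasonable proxy for data volume only because sharding keeps every partition at a similar size.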
![Page 18: Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores](https://reader035.fdocuments.us/reader035/viewer/2022062704/555c3b0cd8b42a0b038b49fa/html5/thumbnails/18.jpg)
Evaluation Setup
• ElasCass: An implementation of auto-sharding, built on Apache Cassandra (version 1.0.5), which uses the Split-Move approach.
• Key-value stores: ElasCass vs. Cassandra (v1.0.5)
• Test bed: Amazon EC2, m1.large type, 2 CPU cores, 8 GB RAM
• Benchmark: YCSB
• System scale: Start from 1 node, with 100 GB of data, R=2. Scale up to 10 nodes.
![Page 19: Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores](https://reader035.fdocuments.us/reader035/viewer/2022062704/555c3b0cd8b42a0b038b49fa/html5/thumbnails/19.jpg)
Evaluation – Bootstrap Time
• In Split-Move, the volume of data transferred reduces by half from 3 nodes onwards.
• In ElasCass, the volume of data transferred remains below 10 GB from 2 nodes onwards.
• Bootstrap time is determined by the volume of data transferred. ElasCass performs consistently at all scales.
![Page 20: Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores](https://reader035.fdocuments.us/reader035/viewer/2022062704/555c3b0cd8b42a0b038b49fa/html5/thumbnails/20.jpg)
Evaluation – Data Volume
• ElasCass uses a two-phase bootstrap. More data is pulled in at phase 2.
• Imbalance Index = standard deviation / average. Data is well balanced in ElasCass.
• ElasCass occupies less storage space than the Split-Move approach.
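The Imbalance Index defined above is the coefficient of variation of a per-node metric. A minimal sketch follows, assuming the population standard deviation is meant (the slides do not say whether the population or sample form was used); the function name `imbalance_index` is hypothetical.

```python
from statistics import mean, pstdev


def imbalance_index(per_node_values):
    """Standard deviation divided by average of a per-node metric.

    Assumption: population standard deviation (pstdev), since the
    values cover every node in the cluster rather than a sample.
    """
    average = mean(per_node_values)
    if average == 0:
        raise ValueError("imbalance index is undefined when the average is 0")
    return pstdev(per_node_values) / average
```

An index of 0 means every node holds the same amount; for example, two nodes holding 5 GB and 15 GB have an average of 10 GB and a standard deviation of 5 GB, giving an index of 0.5.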
![Page 21: Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores](https://reader035.fdocuments.us/reader035/viewer/2022062704/555c3b0cd8b42a0b038b49fa/html5/thumbnails/21.jpg)
Evaluation – Query Processing
• ElasCass is scalable, while Split-Move is not.
• Write throughput is higher than read throughput.
• ElasCass has better resource utilisation.
• ElasCass achieves balanced load.
![Page 22: Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores](https://reader035.fdocuments.us/reader035/viewer/2022062704/555c3b0cd8b42a0b038b49fa/html5/thumbnails/22.jpg)
Key Takeaways
• Using virtual nodes introduces less overhead in data movement, and reduces the bootstrap time to below 10 minutes.
  – Apache Cassandra v.1.1 uses virtual nodes
• Consolidating the partitions into bounded ranges simplifies replica placement and facilitates load balancing.
  – MySQL and MongoDB have started to auto-shard partitions
• A balanced load leads to 80% resource utilisation and throughput that scales with the number of nodes.
![Page 23: Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores](https://reader035.fdocuments.us/reader035/viewer/2022062704/555c3b0cd8b42a0b038b49fa/html5/thumbnails/23.jpg)
Contributions and Acknowledgments
• We have designed and implemented a decentralised auto-sharding scheme that
  – consolidates each partition replica into a single transferable unit to provide efficient data movement;
  – automatically shards the partitions into bounded ranges to address data skew;
  – reduces the time to bootstrap nodes, and achieves a more balanced load and better query-processing performance.
• The authors would like to thank Smart Services CRC Pty Ltd for the Services Aggregation project grant that made this work possible.
![Page 24: Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores](https://reader035.fdocuments.us/reader035/viewer/2022062704/555c3b0cd8b42a0b038b49fa/html5/thumbnails/24.jpg)
Thank You!