A Practical Scalable Distributed B-Treekmsalem/courses/CS848W10/presentations/... · A Practical...
-
Upload
truongphuc -
Category
Documents
-
view
216 -
download
0
Transcript of A Practical Scalable Distributed B-Treekmsalem/courses/CS848W10/presentations/... · A Practical...
A Practical Scalable Distributed B-TreeCS 848 Paper Presentation
Marcos K. Aguilera, Wojciech Golab,Mehul A. Shah
PVLDB ’08
March 8, 2010
Presenter: Evguenia (Elmi) Eflov
Presentation Outline
1 BackgroundProblemDistributed B+treeSinfonia
2 Distributed B-tree ImplementationAssumptionsDesign of the B-treeB-tree OperationsTransactionsExtensions
3 Experimental ResultsWorkloadResults
4 DiscussionQuestions
1 BackgroundProblemDistributed B+treeSinfonia
2 Distributed B-tree ImplementationAssumptionsDesign of the B-treeB-tree OperationsTransactionsExtensions
3 Experimental ResultsWorkloadResults
4 DiscussionQuestions
Distributed (key,value) Storage
The paper presents three motivating examples:
The back-end of a multiplayer game. Multiplayergames need to store and manage data for thousands ofplayers while providing low latency access and very highdata consistency
Metadata storage for a cluster file system.Metadata access is often the bottleneck in such systems.Metadata changes, for example, file renaming orrelocation, in cluster file systems need to be atomic
Secondary indexes. A lot of application require morethan one index on a set of data to guarantee fast accessbased on different conditions. As in the two previousexamples, data changes need to be atomic
1 BackgroundProblemDistributed B+treeSinfonia
2 Distributed B-tree ImplementationAssumptionsDesign of the B-treeB-tree OperationsTransactionsExtensions
3 Experimental ResultsWorkloadResults
4 DiscussionQuestions
B+tree
B-tree is a tree data structure that stores values sorted bykey and allows updates and lookups in amortizedlogarithmic time
B+tree is a form of B-tree where inner nodes of the treestore keys and pointers, and leaf nodes store key-valuepairs
1 BackgroundProblemDistributed B+treeSinfonia
2 Distributed B-tree ImplementationAssumptionsDesign of the B-treeB-tree OperationsTransactionsExtensions
3 Experimental ResultsWorkloadResults
4 DiscussionQuestions
Sinfonia
Sinfonia is a distributed data storage service that providesACID properties for the application data
Sinfonia provides a data manipulation primitive, aminitransaction
Minitransaction ...
Consists of 3 (possibly empty) sets of operationsOperations are comparisons, reads, and writesReads and writes are performed only if all of thecomparisons are successfulIs performed as part of a two-phase commitSome varieties can be performed in a single phase
1 BackgroundProblemDistributed B+treeSinfonia
2 Distributed B-tree ImplementationAssumptionsDesign of the B-treeB-tree OperationsTransactionsExtensions
3 Experimental ResultsWorkloadResults
4 DiscussionQuestions
Assumptions
The B-tree operates in a data center environment. Thisguarantees high bandwidth, low latency connectionsbetween client and server machines
Individual machines can fail without causing the systemto stall, but network partitions will stall the system
B-tree is not going to grow or shrink rapidly
1 BackgroundProblemDistributed B+treeSinfonia
2 Distributed B-tree ImplementationAssumptionsDesign of the B-treeB-tree OperationsTransactionsExtensions
3 Experimental ResultsWorkloadResults
4 DiscussionQuestions
Design of the B-tree
Each server in the system stores some number of innerand leaf nodes of the B-tree
Each server in the system stores the version table of allthe inner nodes of the B-tree
Each client caches all inner nodes of the B-tree, and usesthis cache while executing a transaction
During a transaction, the client composes a set of readsand writes required
At commit time, Sinfonia’s minitransaction is used toperform the B-tree operations required on the server data
Comparisons are added by the B-tree client library toguarantee data consistency
B-tree Operations Efficiency
To make the B-tree efficient, the following three techniquesare used:
Clients use optimistic concurrency control, which workswell unless the B-tree is rapidly shrinking or growing
Since version numbers of the inner nodes are stored ateach server, inner node versions can be checked at anyserver in the system, for example, at the server where aleaf node being accessed is stored
Inner B-tree nodes are lazily replicated by clients - nodesthat a particular client does not access may be stale ornot present on the client
1 BackgroundProblemDistributed B+treeSinfonia
2 Distributed B-tree ImplementationAssumptionsDesign of the B-treeB-tree OperationsTransactionsExtensions
3 Experimental ResultsWorkloadResults
4 DiscussionQuestions
Migration Operations
The distributed B-tree supports the following additionaloperations:
Migrate(x , s) - migrates node B-tree node x to server s
The following operations for multi-node migration:
1 BackgroundProblemDistributed B+treeSinfonia
2 Distributed B-tree ImplementationAssumptionsDesign of the B-treeB-tree OperationsTransactionsExtensions
3 Experimental ResultsWorkloadResults
4 DiscussionQuestions
Why are transactions required?
In order to guarantee data consistency, each datamanipulation on the B-tree has to be performedatomically, for example, renaming of a file in the clusterfile system or transferring an item and payment for theitem between characters in the computer game
While a minitransaction provided by Sinfonia is sufficientto perform the necessary B-tree node manipulations, it istedious of the user of the B-tree to code in terms of theminitransaction
The B-tree provides transaction interface as a way for theuser to define all the necessary Read and Writeoperations within a transaction, while adding necessarycomparisons to guarantee data consistency
1 BackgroundProblemDistributed B+treeSinfonia
2 Distributed B-tree ImplementationAssumptionsDesign of the B-treeB-tree OperationsTransactionsExtensions
3 Experimental ResultsWorkloadResults
4 DiscussionQuestions
Extensions
The following extensions are suggested to enhance the existingimplementation
Enhanced migration tasks - migration tasks that helpthe system adapt to seasonal variations or balance theload aggressively can be implemented
Dealing with hot-spots - migration task to migratepopular keys to different servers can be implemented,including migration of the keys that are currently storedin the same node
(continued)
Extensions
Varying the replication factor of inner nodes -replicating version numbers of the inner nodes on lowerlevels of the tree less aggressively could decrease the costof modifying those nodes
Finer-grained concurrency control to avoid falsesharing - if concurrency control operated on keys (orsmall groups of keys) rather than nodes, the number ofconflicts could be decreased
1 BackgroundProblemDistributed B+treeSinfonia
2 Distributed B-tree ImplementationAssumptionsDesign of the B-treeB-tree OperationsTransactionsExtensions
3 Experimental ResultsWorkloadResults
4 DiscussionQuestions
Experimental Setup
10-byte keys, 8-byte values, 12-byte pointers - 4 bytesspecify server, 8 bytes - offset within the server
4 KB nodes, with leaf nodes storing 220 key-value pairsand inner nodes storing 180 key-pointer pairs
Same number of servers and clients
Each client has 4 parallel threads, each thread issues anew request as soon as the current request is completed
Key space consists of 109 elements, with keys chosenuniformly, at random for each operation
Workloads
The following workloads are used in both scalability andmigration experiments
Insert
Lookup
Update - values for existing keys are updated
Mixed - 60% lookups and 40% updates
“Before the insert workload, the B-tree was pre-populatedwith 40,000 elements rather than starting with an emptyB-tree.”
Were the experiments performed in the order they arepresented in?
1 BackgroundProblemDistributed B+treeSinfonia
2 Distributed B-tree ImplementationAssumptionsDesign of the B-treeB-tree OperationsTransactionsExtensions
3 Experimental ResultsWorkloadResults
4 DiscussionQuestions
Results of the Migration Experiments
For migration experiments, Move task was performed by amigration client while the rest of the setup for thecorresponding experiment was executed
Migration rate was 55.3 ± 2.7 nodes/s (around 10000key-value pairs/s) on an idle system, and less than 5 nodes/swhen executed with other tasks.
1 BackgroundProblemDistributed B+treeSinfonia
2 Distributed B-tree ImplementationAssumptionsDesign of the B-treeB-tree OperationsTransactionsExtensions
3 Experimental ResultsWorkloadResults
4 DiscussionQuestions