Distributed Storage Allocation Problems
Derek Leong, Alexandros G. Dimakis, Tracey Ho
California Institute of Technology
NetCod 2009, 2009-06-16

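Motivation

Allocation A:
$$
\begin{aligned}
\text{Success probability}
 ={}& 0.9^{0}\,0.1^{5}\cdot 0 && \text{(successful 0-subsets)}\\
 &+ 0.9^{1}\,0.1^{4}\cdot 2 && \text{(successful 1-subsets)}\\
 &+ 0.9^{2}\,0.1^{3}\cdot 7 && \text{(successful 2-subsets)}\\
 &+ 0.9^{3}\,0.1^{2}\cdot 9 && \text{(successful 3-subsets)}\\
 &+ 0.9^{4}\,0.1^{1}\cdot 5 && \text{(successful 4-subsets)}\\
 &+ 0.9^{5}\,0.1^{0}\cdot 1 && \text{(successful 5-subsets)}\\
 ={}& 0.99
\end{aligned}
$$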

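Allocation B:
$$
\begin{aligned}
\text{Success probability}
 ={}& 0.9^{0}\,0.1^{5}\cdot 0 && \text{(successful 0-subsets)}\\
 &+ 0.9^{1}\,0.1^{4}\cdot 0 && \text{(successful 1-subsets)}\\
 &+ 0.9^{2}\,0.1^{3}\cdot 0 && \text{(successful 2-subsets)}\\
 &+ 0.9^{3}\,0.1^{2}\cdot 10 && \text{(successful 3-subsets)}\\
 &+ 0.9^{4}\,0.1^{1}\cdot 5 && \text{(successful 4-subsets)}\\
 &+ 0.9^{5}\,0.1^{0}\cdot 1 && \text{(successful 5-subsets)}\\
 ={}& 0.99144
\end{aligned}
$$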

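Allocation C:
$$
\begin{aligned}
\text{Success probability}
 ={}& 0.9^{0}\,0.1^{5}\cdot 0 && \text{(successful 0-subsets)}\\
 &+ 0.9^{1}\,0.1^{4}\cdot 0 && \text{(successful 1-subsets)}\\
 &+ 0.9^{2}\,0.1^{3}\cdot 6 && \text{(successful 2-subsets)}\\
 &+ 0.9^{3}\,0.1^{2}\cdot 10 && \text{(successful 3-subsets)}\\
 &+ 0.9^{4}\,0.1^{1}\cdot 5 && \text{(successful 4-subsets)}\\
 &+ 0.9^{5}\,0.1^{0}\cdot 1 && \text{(successful 5-subsets)}\\
 ={}& 0.9963
\end{aligned}
$$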

Motivation: comparison of success probabilities

Allocation   Success probability
A            0.99
B            0.99144
C            0.9963
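The three totals above can be rechecked from the subset counts alone. The following is a minimal sketch, assuming (as the 0.9 and 0.1 factors suggest) that each of the 5 nodes is accessed independently with probability 0.9; the function name `success_from_counts` and the `counts` table are illustrative and not taken from the slides.

```python
from fractions import Fraction

def success_from_counts(counts, p):
    """Sum p^k (1-p)^(n-k) * counts[k], where counts[k] is the number of
    k-subsets of the n nodes that allow successful recovery."""
    n = len(counts) - 1
    return sum(c * p**k * (1 - p)**(n - k) for k, c in enumerate(counts))

p = Fraction(9, 10)  # assumed per-node access probability

# Numbers of successful k-subsets for k = 0..5, copied from the slides.
counts = {
    "A": [0, 2, 7, 9, 5, 1],
    "B": [0, 0, 0, 10, 5, 1],
    "C": [0, 0, 6, 10, 5, 1],
}

results = {name: success_from_counts(c, p) for name, c in counts.items()}
for name, value in results.items():
    print(name, float(value))
```

Exact rational arithmetic is used so that the comparison with the slide values does not depend on floating-point rounding.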

[Figure: access model, labeled with the value 0.1]

Problem Description
How do we use storage nodes to store a data object reliably, subject to an aggregate storage budget?
Storage Allocation
Access by the Data Collector
Objective

Storage Allocation: Source s has a data object of unit size. It can use n storage nodes to store x_1, x_2, ..., x_n amounts of data, but faces an aggregate storage budget T, i.e.
$$\sum_{i=1}^{n} x_i \le T.$$

Access by the Data Collector: Data collector t attempts to recover the data object by accessing a subset r of storage nodes. It succeeds when the total amount of data accessed is at least the size of the data object, i.e.
$$\sum_{i \in \mathbf{r}} x_i \ge 1.$$

Objective: We seek the optimal allocation that maximizes the probability of successful recovery.

Difficulty: The problem is nonconvex, and there is a large space of possible symmetric and nonsymmetric allocations (an allocation is symmetric if all its nonzero elements are equal, and nonsymmetric otherwise).

[1] Deterministic Allocation with Probabilistic Access
The data collector accesses each storage node independently with constant probability p.

Symmetric allocations can be suboptimal. Given n = 5 storage nodes, budget T = 12/5, and p = 0.9, a nonsymmetric allocation performs better than the optimal symmetric allocation.
Finding the optimal symmetric allocation is also nontrivial.
Originally from a discussion among R. Karp, R. Kleinberg, C. Papadimitriou, E. Friedman, and others at UC Berkeley.
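To see why even the symmetric case is nontrivial, one can enumerate all access patterns for each symmetric allocation. The sketch below assumes the model stated on the slide (each node accessed independently with probability p, recovery succeeds when the accessed amounts sum to at least 1) and assumes that a symmetric allocation spends the whole budget, storing T/m on each of m nodes. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def success_probability(x, p):
    """Brute-force success probability of allocation x when each node is
    accessed independently with probability p."""
    n = len(x)
    total = Fraction(0)
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            # The accessed subset succeeds if it holds at least one unit of data.
            if sum(x[i] for i in subset) >= 1:
                total += p**k * (1 - p)**(n - k)
    return total

def symmetric_allocation(n, T, m):
    """Spread budget T evenly over m of the n nodes (assumed construction)."""
    return [T / m] * m + [Fraction(0)] * (n - m)

n, T, p = 5, Fraction(12, 5), Fraction(9, 10)
by_m = {m: success_probability(symmetric_allocation(n, T, m), p)
        for m in range(1, n + 1)}
best_m = max(by_m, key=by_m.get)

for m, prob in by_m.items():
    print(m, float(prob))
```

Under these assumptions the success probability is not monotone in m, so neither full concentration nor full spreading can be assumed best without checking. The enumeration is exponential in n and is only meant for small examples like this one.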

[2] Deterministic Allocation with Fixed Access
The data collector accesses an r-subset of storage nodes, selected uniformly at random from the collection of all possible r-subsets, where r < n is a constant.

Equivalently, we can seek the allocation that minimizes the budget T, among all allocations that achieve a given probability of successful recovery.

Example: (n, r) = (6, 2)
Question: For any budget T, is there always a symmetric allocation that produces the maximum success probability?
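Under fixed access, the success probability of a given allocation is simply the fraction of r-subsets whose stored amounts sum to at least 1. The sketch below illustrates this for (n, r) = (6, 2); the three allocations are hypothetical examples chosen for illustration, not the ones pictured on the slide, and the code does not settle the question posed above.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def fixed_access_success(x, r):
    """Fraction of r-subsets of nodes that hold at least one unit of data."""
    good = sum(1 for subset in combinations(x, r) if sum(subset) >= 1)
    return Fraction(good, comb(len(x), r))

half = Fraction(1, 2)
spread = [half] * 6                # budget 3, every pair succeeds
one_node = [1, 0, 0, 0, 0, 0]      # budget 1, only pairs containing node 1 succeed
two_nodes = [1, 1, 0, 0, 0, 0]     # budget 2

p_spread = fixed_access_success(spread, 2)
p_one = fixed_access_success(one_node, 2)
p_two = fixed_access_success(two_nodes, 2)
print(p_spread, p_one, p_two)
```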

Question: What is the optimal symmetric allocation?
For most choices of (n, r, T), the optimal allocation either concentrates the budget over a minimal number of nodes, or spreads it out maximally.
An example of an exception is (n, r, T) = (15, 3, 4.6), for which the optimal number of nodes to use, 9, is neither of the extremes.
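The exception quoted above can be checked numerically. This is a sketch under the assumption that a symmetric allocation stores T/m on each of m nodes, so that recovery needs at least ceil(m/T) nonempty nodes among the r accessed; the count of nonempty nodes in a uniformly random r-subset is then hypergeometric. The function name is hypothetical.

```python
from fractions import Fraction
from math import ceil, comb

def symmetric_fixed_access(n, r, T, m):
    """Success probability when m of n nodes each store T/m and a uniformly
    random r-subset of nodes is accessed."""
    need = ceil(Fraction(m) / T)  # nonempty nodes required for recovery
    good = sum(comb(m, k) * comb(n - m, r - k)
               for k in range(need, min(m, r) + 1))
    return Fraction(good, comb(n, r))

n, r, T = 15, 3, Fraction(23, 5)  # T = 4.6
probs = {m: symmetric_fixed_access(n, r, T, m) for m in range(1, n + 1)}
best_m = max(probs, key=probs.get)

for m, prob in probs.items():
    print(m, prob)
```

Reading "concentrate" as m = 4 (the most nodes that can each hold the whole object) and "spread maximally" as m = 13 (the most nodes for which 3 accessed nodes still suffice) is an interpretation; under it, both extremes fall below m = 9.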

For probability-1 recovery, the problem reduces to a simple LP.
Result 1: If we require all possible r-subsets to allow successful recovery, then we need a minimum budget of T = n/r, which corresponds to the allocation x_i = 1/r for all i, i.e. it is optimal to spread the budget maximally.
We can also bound the success probability above which this allocation is optimal.
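One way to see the minimum budget in Result 1 is a counting argument; the following is a sketch of that reasoning, not necessarily the derivation used by the authors. Each node belongs to exactly $\binom{n-1}{r-1}$ of the $\binom{n}{r}$ possible r-subsets, so summing the recovery constraint over all r-subsets gives:

```latex
\begin{align*}
\sum_{\mathbf{r}:\,|\mathbf{r}| = r} \; \sum_{i \in \mathbf{r}} x_i
  &= \binom{n-1}{r-1} \sum_{i=1}^{n} x_i
  \;\ge\; \binom{n}{r} \\
\Longrightarrow \quad
\sum_{i=1}^{n} x_i
  &\ge \frac{\binom{n}{r}}{\binom{n-1}{r-1}} = \frac{n}{r}.
\end{align*}
```

The allocation that stores $1/r$ on every node meets this bound with equality and satisfies every constraint, since any r accessed nodes then hold exactly one unit of data.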

[3] Symmetric Probabilistic Allocation with Fixed Access
Each storage node is used independently with constant probability s/n to store the same amount of data 1/ℓ, and the total storage used must be at most budget T in expectation.

The probability of successful recovery can be written as
$$\Pr\left[\mathrm{Bin}(r,\, s/n) \ge \ell\right],$$
where Bin(n, p) denotes the binomial random variable with n trials and success probability p.
Reparameterizing in terms of budget T (the expected storage is s/ℓ, so s = ℓT when the budget is fully used) gives the success probability
$$P(r, T, \ell) = \Pr\left[\mathrm{Bin}(r,\, \ell T / n) \ge \ell\right].$$
Each nonempty node stores 1/ℓ amount of data.

Result 2: For any r ≥ 2, and at any budget T large enough to support a success probability P(r, T, ℓ) > 0.9 for some ℓ, the choice of ℓ = r is optimal, i.e. it is best to spread the budget maximally.
Each nonempty node stores 1/ℓ amount of data.

As we increase the budget T, we observe a sharp change in the optimal allocation.
For small budgets and therefore low success probabilities, it is optimal to store the data object in its entirety (ℓ = 1) and hope the data collector accesses at least one of the nonempty nodes.
For large budgets and therefore high success probabilities, it is optimal to store only 1/r amount of data in each node used (ℓ = r) and hope the data collector accesses r of them.
[Figure: plot for r = 5]
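The change in the best ℓ can be illustrated with a small computation. This is a sketch under a stated reading of the model: each accessed node is nonempty with probability ℓT/n (capped at 1), so the number of nonempty nodes among the r accessed is binomial, and recovery needs at least ℓ of them. The value n = 20 is a hypothetical choice for illustration; the slides do not give the n behind the r = 5 plot.

```python
from fractions import Fraction
from math import comb

def success(n, r, T, ell):
    """Pr[Bin(r, q) >= ell] with q = min(ell * T / n, 1) (assumed model)."""
    q = min(Fraction(ell) * T / n, Fraction(1))
    return sum(comb(r, k) * q**k * (1 - q)**(r - k)
               for k in range(ell, r + 1))

def best_ell(n, r, T):
    """Value of ell in 1..r that maximizes the success probability."""
    return max(range(1, r + 1), key=lambda ell: success(n, r, T, ell))

n, r = 20, 5  # n is hypothetical; r = 5 matches the slide
for T in (Fraction(1), Fraction(4)):
    ell = best_ell(n, r, T)
    print(f"T = {T}: best ell = {ell}, P = {float(success(n, r, T, ell)):.5f}")
```

With these illustrative numbers the small budget favors ℓ = 1 and the large budget favors ℓ = r, consistent with the behavior described above; it is a spot check at two budgets, not evidence for the general claim.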

We conjecture that for any r and T, the optimal choice of ℓ that maximizes success probability P(r, T, ℓ) is either ℓ = 1 or ℓ = r.
[Figure: plot for r = 5; each nonempty node stores 1/ℓ amount of data; annotations "store less" and "store more" per node, "increasing budget"]

Summary & Future Work
[1] Deterministic Allocation with Probabilistic Access
    Suboptimality of symmetric allocations
[2] Deterministic Allocation with Fixed Access
    Optimal allocation for high probability recovery
    Extreme point solutions not necessarily optimal for symmetric allocations
    Is there always a symmetric optimal allocation?
[3] Symmetric Probabilistic Allocation with Fixed Access
    Optimal allocation in high-probability regime
    Is there a phase transition in optimal allocation with increasing budget?
