Distributed Storage Allocation Problems Derek Leong, Alexandros G. Dimakis, Tracey Ho California...
- Slide 1
- Distributed Storage Allocation Problems Derek Leong, Alexandros G. Dimakis, Tracey Ho California Institute of Technology NetCod 2009 2009-06-16
- Slide 2
- Motivation
- Slide 3
- [Figure not preserved; surviving label: 0.1]
- Slide 4
- [Figure not preserved; surviving labels: A, B, C]
- Slide 5
- Success probability of allocation A, where each term $0.9^k \cdot 0.1^{5-k}$ is multiplied by the number of successful $k$-subsets:
  $$0.9^0 \cdot 0.1^5 \cdot 0 + 0.9^1 \cdot 0.1^4 \cdot 2 + 0.9^2 \cdot 0.1^3 \cdot 7 + 0.9^3 \cdot 0.1^2 \cdot 9 + 0.9^4 \cdot 0.1^1 \cdot 5 + 0.9^5 \cdot 0.1^0 \cdot 1 = 0.99$$
- Slide 6
- Motivation: success probability of allocation B, where each term $0.9^k \cdot 0.1^{5-k}$ is multiplied by the number of successful $k$-subsets:
  $$0.9^0 \cdot 0.1^5 \cdot 0 + 0.9^1 \cdot 0.1^4 \cdot 0 + 0.9^2 \cdot 0.1^3 \cdot 0 + 0.9^3 \cdot 0.1^2 \cdot 10 + 0.9^4 \cdot 0.1^1 \cdot 5 + 0.9^5 \cdot 0.1^0 \cdot 1 = 0.99144$$
- Slide 7
- Motivation: success probability of allocation C, where each term $0.9^k \cdot 0.1^{5-k}$ is multiplied by the number of successful $k$-subsets:
  $$0.9^0 \cdot 0.1^5 \cdot 0 + 0.9^1 \cdot 0.1^4 \cdot 0 + 0.9^2 \cdot 0.1^3 \cdot 6 + 0.9^3 \cdot 0.1^2 \cdot 10 + 0.9^4 \cdot 0.1^1 \cdot 5 + 0.9^5 \cdot 0.1^0 \cdot 1 = 0.9963$$
- Slide 8
- Motivation: success probabilities of the three allocations: A = 0.99, B = 0.99144, C = 0.9963
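The three sums on the Motivation slides can be checked mechanically. The sketch below takes only the subset counts quoted on the slides and the access probability 0.9; the function name `success_from_counts` and the dictionary layout are illustrative choices, not something from the talk.

```python
from fractions import Fraction

def success_from_counts(counts, p):
    """counts[k] = number of k-subsets of nodes that allow recovery.

    Each k-subset is the exact set of accessed nodes with probability
    p**k * (1 - p)**(n - k), so the success probability is a weighted sum.
    """
    n = len(counts) - 1
    return sum(c * p**k * (1 - p)**(n - k) for k, c in enumerate(counts))

P = Fraction(9, 10)  # each node is accessed with probability 0.9

# Counts of successful 0-, 1-, ..., 5-subsets, copied from the slides.
COUNTS = {
    "A": [0, 2, 7, 9, 5, 1],
    "B": [0, 0, 0, 10, 5, 1],
    "C": [0, 0, 6, 10, 5, 1],
}

RESULTS = {name: success_from_counts(c, P) for name, c in COUNTS.items()}
for name, value in RESULTS.items():
    print(name, float(value))
```

Exact rational arithmetic is used so that the comparison with the quoted decimals is not affected by floating-point rounding.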
- Slide 9
- [Figure not preserved; surviving labels: 0.1, access model]
- Slide 10
- Problem Description: How do we use storage nodes to store a data object reliably, subject to an aggregate storage budget? Outline: Storage Allocation; Access by the Data Collector; Objective.
- Slide 11
- Problem Description, Storage Allocation: Source s has a data object of unit size. It can use n storage nodes to store $x_1, x_2, \ldots, x_n$ amounts of data, but faces an aggregate storage budget T, i.e. $\sum_{i=1}^{n} x_i \le T$.
- Slide 12
- Problem Description, Access by the Data Collector: Data collector t attempts to recover the data object by accessing a subset $\mathbf{r}$ of storage nodes. It succeeds when the total amount of data accessed is at least the size of the data object, i.e. $\sum_{i \in \mathbf{r}} x_i \ge 1$.
- Slide 13
- Problem Description, Objective: We seek the optimal allocation that maximizes the probability of successful recovery.
- Slide 14
- Problem Description, Difficulty: The problem is nonconvex, and there is a large space of possible symmetric and nonsymmetric allocations (an allocation is symmetric if all its nonzero elements are equal, and nonsymmetric otherwise).
- Slide 15
- [1] Deterministic Allocation with Probabilistic Access: The data collector accesses each storage node independently with constant probability p.
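For small n, the success probability under this access model can be computed by brute force over all subsets of nodes. The sketch below is one way to do that; the function name and the three example allocations are hypothetical (the transcript does not state which allocations A, B, and C are), chosen as symmetric allocations with n = 5 and total budget 12/5. They happen to reproduce the values 0.99, 0.99144, and 0.9963 quoted on the Motivation slides.

```python
from fractions import Fraction
from itertools import combinations

def success_probability(allocation, p):
    """Probability that the accessed nodes hold at least one unit of data,
    when each node is accessed independently with probability p."""
    n = len(allocation)
    total = Fraction(0)
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            if sum(allocation[i] for i in subset) >= 1:
                # This exact subset is the accessed set with this probability.
                total += p**k * (1 - p)**(n - k)
    return total

P = Fraction(9, 10)

# Hypothetical symmetric allocations, each using a total budget of 12/5.
TWO_NODES = [Fraction(6, 5)] * 2 + [0] * 3   # budget concentrated on 2 nodes
FIVE_NODES = [Fraction(12, 25)] * 5          # budget spread over all 5 nodes
FOUR_NODES = [Fraction(3, 5)] * 4 + [0]      # budget spread over 4 nodes

for name, alloc in [("2 nodes", TWO_NODES), ("5 nodes", FIVE_NODES), ("4 nodes", FOUR_NODES)]:
    print(name, float(success_probability(alloc, P)))
```

The enumeration costs 2^n subset checks, so it is only practical for small n; it is meant as a reference implementation of the model, not as an optimization method.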
- Slide 16
- [1] Deterministic Allocation with Probabilistic Access: Symmetric allocations can be suboptimal. Given n = 5 storage nodes, budget T = 12/5, and p = 0.9, a nonsymmetric allocation performs better than the optimal symmetric allocation. Finding the optimal symmetric allocation is also nontrivial. The problem originates from a discussion among R. Karp, R. Kleinberg, C. Papadimitriou, E. Friedman, and others at UC Berkeley.
- Slide 17
- [2] Deterministic Allocation with Fixed Access: The data collector accesses an r-subset of storage nodes, selected uniformly at random from the collection of all possible r-subsets, where r < n is a constant.
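Under this model the success probability is simply the fraction of r-subsets whose stored amounts add up to at least 1. A minimal sketch follows; the function name and the two example allocations for (n, r) = (6, 2) are illustrative assumptions, not allocations taken from the slides.

```python
from fractions import Fraction
from itertools import combinations

def fixed_access_success(allocation, r):
    """Fraction of r-subsets of nodes that together store at least 1 unit."""
    subsets = list(combinations(allocation, r))
    good = sum(1 for amounts in subsets if sum(amounts) >= 1)
    return Fraction(good, len(subsets))

# Hypothetical allocations for n = 6, r = 2, both with total budget 2.
CONCENTRATED = [1, 1, 0, 0, 0, 0]                 # two full copies
SPREAD = [Fraction(1, 2)] * 4 + [0, 0]            # four half-size pieces

print(fixed_access_success(CONCENTRATED, 2))
print(fixed_access_success(SPREAD, 2))
```

With the same budget, the concentrated allocation succeeds on 9 of the 15 pairs and the spread one on 6 of 15, which illustrates why the choice between concentrating and spreading is not obvious in general.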
- Slide 18
- [2] Deterministic Allocation with Fixed Access: Equivalently, we can seek the allocation that minimizes the budget T among all allocations that achieve a given probability of successful recovery.
- Slide 19
- [2] Deterministic Allocation with Fixed Access: Example: (n, r) = (6, 2). Question: For any budget T, is there always a symmetric allocation that produces the maximum success probability?
- Slide 20
- [2] Deterministic Allocation with Fixed Access: Question: What is the optimal symmetric allocation? For most choices of (n, r, T), the optimal symmetric allocation either concentrates the budget over a minimal number of nodes or spreads it out maximally. An exception is (n, r, T) = (15, 3, 4.6), for which the optimal number of nodes to use, 9, is neither of the extremes.
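The exception quoted above can be checked numerically. The sketch assumes that a symmetric allocation over m nodes stores T/m in each of them, so recovery needs at least ceil(m/T) of the accessed nodes to be nonempty, and counts the successful r-subsets with binomial coefficients. The function names are hypothetical.

```python
from fractions import Fraction
from math import ceil, comb

def symmetric_success(n, r, T, m):
    """Success probability when m of the n nodes each store T/m and a
    uniformly random r-subset of nodes is accessed."""
    need = ceil(Fraction(m) / T)  # nonempty nodes that must be accessed
    # Count r-subsets containing k >= need of the m nonempty nodes.
    hits = sum(comb(m, k) * comb(n - m, r - k)
               for k in range(need, min(m, r) + 1))
    return Fraction(hits, comb(n, r))

def best_symmetric(n, r, T):
    """Number of nonempty nodes m that maximizes the success probability."""
    return max(range(1, n + 1), key=lambda m: symmetric_success(n, r, T, m))

T = Fraction(23, 5)  # 4.6, kept exact to avoid rounding in the ceiling
M_BEST = best_symmetric(15, 3, T)
print(M_BEST, symmetric_success(15, 3, T, M_BEST))
```

For (15, 3, 4.6) the search returns m = 9 with success probability 60/91 (about 0.659), ahead of both m = 4 (58/91) and m = 13 (286/455), consistent with the slide.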
- Slide 21
- [2] Deterministic Allocation with Fixed Access: For probability-1 recovery, the problem reduces to a simple LP. Result 1: If we require all possible r-subsets to allow successful recovery, then we need a minimum budget of $T = n/r$, which corresponds to the allocation $x_1 = \cdots = x_n = 1/r$, i.e. it is optimal to spread the budget maximally. We can also bound the success probability above which this allocation is optimal.
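One way to see where a minimum budget of n/r comes from is a counting argument over the LP constraints. The derivation below is a sketch under the model described earlier (amounts $x_i$, budget T, success when an accessed r-subset holds at least 1 unit); it is a standard averaging bound and is not reproduced from the slides.

```latex
\begin{align*}
&\text{Requirement: } \sum_{i \in \mathbf{r}} x_i \ge 1
  \quad \text{for every $r$-subset } \mathbf{r} \subseteq \{1, \ldots, n\}. \\
&\text{Summing over all } \tbinom{n}{r} \text{ subsets, each node appears in }
  \tbinom{n-1}{r-1} \text{ of them:} \\
&\qquad \binom{n-1}{r-1} \sum_{i=1}^{n} x_i \;\ge\; \binom{n}{r}
  \quad\Longrightarrow\quad
  T \;\ge\; \sum_{i=1}^{n} x_i \;\ge\; \frac{\binom{n}{r}}{\binom{n-1}{r-1}} = \frac{n}{r}. \\
&\text{The bound is met with equality by } x_1 = \cdots = x_n = \tfrac{1}{r}.
\end{align*}
```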
- Slide 22
- [3] Symmetric Probabilistic Allocation with Fixed Access: Each storage node is used independently with constant probability s/n to store the same amount of data 1/ℓ, and the total storage used must be at most the budget T in expectation.
- Slide 23
- [3] Symmetric Probabilistic Allocation with Fixed Access: The probability of successful recovery can be written as a binomial tail probability, where Bin(n, p) denotes the binomial random variable with n trials and success probability p. Reparameterizing in terms of the budget T gives the success probability P(r, T, ℓ), where each nonempty node stores 1/ℓ amount of data.
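The explicit expressions are not legible in this transcript. The following is a sketch of how such an expression can be derived from the model as stated (each node nonempty with probability s/n, each nonempty node storing 1/ℓ, r nodes accessed); the explicit dependence on n and the cap at 1 are assumptions of this sketch rather than the authors' exact notation.

```latex
\begin{align*}
\Pr[\text{success}]
  &= \Pr\!\left[\mathrm{Bin}\!\left(r, \tfrac{s}{n}\right) \ge \ell\right]
  && \text{(at least $\ell$ of the $r$ accessed nodes are nonempty)} \\
\mathbb{E}[\text{total storage}]
  &= n \cdot \frac{s}{n} \cdot \frac{1}{\ell} = \frac{s}{\ell} \le T
  && \text{(so $s = \ell T$ when the full budget is used)} \\
P(r, T, \ell)
  &= \Pr\!\left[\mathrm{Bin}\!\left(r, \min\!\left(\tfrac{\ell T}{n}, 1\right)\right) \ge \ell\right]
\end{align*}
```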
- Slide 24
- [3] Symmetric Probabilistic Allocation with Fixed Access: Result 2: For any r ≥ 2, and at any budget T large enough to support a success probability P(r, T, ℓ) > 0.9 for some ℓ, the choice ℓ = r is optimal, i.e. it is best to spread the budget maximally. (Each nonempty node stores 1/ℓ amount of data.)
- Slide 25
- [3] Symmetric Probabilistic Allocation with Fixed Access: As we increase the budget T, we observe a sharp change in the optimal allocation. For small budgets, and therefore low success probabilities, it is optimal to store the data object in its entirety (ℓ = 1) and hope the data collector accesses at least one of the nonempty nodes. For large budgets, and therefore high success probabilities, it is optimal to store only 1/r amount of data in each node used (ℓ = r) and hope the data collector accesses r of them. (Plot shown for r = 5.)
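The change in the optimal ℓ can be observed numerically. The sketch below assumes the success probability takes the form Pr[Bin(r, min(ℓT/n, 1)) ≥ ℓ], which follows from the model description but is not quoted from the slides; n = 20 and the budget values are hypothetical choices, and the function names are illustrative.

```python
from math import comb

def success_probability(r, T, ell, n):
    """Pr[Bin(r, q) >= ell] with q = min(ell * T / n, 1).

    Assumed form: each node is nonempty with probability q and stores
    1/ell, so recovery needs at least ell nonempty nodes among r accessed.
    """
    q = min(ell * T / n, 1.0)
    return sum(comb(r, k) * q**k * (1 - q)**(r - k) for k in range(ell, r + 1))

def best_ell(r, T, n):
    """Value of ell in 1..r that maximizes the assumed success probability."""
    return max(range(1, r + 1), key=lambda ell: success_probability(r, T, ell, n))

N, R = 20, 5  # hypothetical n; r = 5 as in the plot
for T in (1.0, 3.0, 3.9):
    ell = best_ell(R, T, N)
    print(T, ell, round(success_probability(R, T, ell, N), 4))
```

With these parameters the best choice is ℓ = 1 at the smaller budgets and ℓ = r = 5 near T = 3.9, in line with the behavior described on the slide.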
- Slide 26
- [3] Symmetric Probabilistic Allocation with Fixed Access: We conjecture that for any r and T, the choice of ℓ that maximizes the success probability P(r, T, ℓ) is either ℓ = 1 or ℓ = r. (Plot shown for r = 5; each nonempty node stores 1/ℓ amount of data.)
- Slide 27
- [3] Symmetric Probabilistic Allocation with Fixed Access: [Annotated version of the plot for r = 5 illustrating the conjecture that the optimal ℓ is either 1 or r; surviving annotations: "store less", "store more", "increasing budget", "per node"]
- Slide 28
- Summary & Future Work
  [1] Deterministic Allocation with Probabilistic Access: suboptimality of symmetric allocations.
  [2] Deterministic Allocation with Fixed Access: optimal allocation for high-probability recovery; extreme point solutions are not necessarily optimal for symmetric allocations. Open question: is there always a symmetric optimal allocation?
  [3] Symmetric Probabilistic Allocation with Fixed Access: optimal allocation in the high-probability regime. Open question: is there a phase transition in the optimal allocation with increasing budget?