Distributed Storage Allocation Problems. Derek Leong, Alexandros G. Dimakis, Tracey Ho. California Institute of Technology. NetCod 2009, 2009-06-16.

  • Slide 1
  • Distributed Storage Allocation Problems. Derek Leong, Alexandros G. Dimakis, Tracey Ho. California Institute of Technology. NetCod 2009, 2009-06-16.
  • Slide 2
  • Motivation
  • Slide 3
  • [Figure: motivating example with failure probability 0.1]
  • Slide 4
  • [Figure: three candidate allocations, A, B, and C]
  • Slide 5
  • Motivation, allocation A: Success probability $= 0.9^0 \cdot 0.1^5 \cdot 0 + 0.9^1 \cdot 0.1^4 \cdot 2 + 0.9^2 \cdot 0.1^3 \cdot 7 + 0.9^3 \cdot 0.1^2 \cdot 9 + 0.9^4 \cdot 0.1^1 \cdot 5 + 0.9^5 \cdot 0.1^0 \cdot 1 = 0.99$, where the last factor of the $k$-th term is the number of successful $k$-subsets (0, 2, 7, 9, 5, 1 for $k = 0, \ldots, 5$).
  • Slide 6
  • Motivation, allocation B: Success probability $= 0.9^0 \cdot 0.1^5 \cdot 0 + 0.9^1 \cdot 0.1^4 \cdot 0 + 0.9^2 \cdot 0.1^3 \cdot 0 + 0.9^3 \cdot 0.1^2 \cdot 10 + 0.9^4 \cdot 0.1^1 \cdot 5 + 0.9^5 \cdot 0.1^0 \cdot 1 = 0.99144$, where the last factor of the $k$-th term is the number of successful $k$-subsets (0, 0, 0, 10, 5, 1 for $k = 0, \ldots, 5$).
  • Slide 7
  • Motivation, allocation C: Success probability $= 0.9^0 \cdot 0.1^5 \cdot 0 + 0.9^1 \cdot 0.1^4 \cdot 0 + 0.9^2 \cdot 0.1^3 \cdot 6 + 0.9^3 \cdot 0.1^2 \cdot 10 + 0.9^4 \cdot 0.1^1 \cdot 5 + 0.9^5 \cdot 0.1^0 \cdot 1 = 0.9963$, where the last factor of the $k$-th term is the number of successful $k$-subsets (0, 0, 6, 10, 5, 1 for $k = 0, \ldots, 5$).
  • Slide 8
  • Motivation: success probabilities of the three allocations are A = 0.99, B = 0.99144, C = 0.9963.
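The subset counts and success probabilities on the motivation slides can be reproduced by brute-force enumeration. The sketch below is illustrative only: the transcript does not preserve the allocation vectors behind A, B, and C, so `alloc_a`, `alloc_b`, and `alloc_c` are assumed allocations (each using a total of 12/5) chosen because they yield the same subset counts as the slides. Exact fractions are used to avoid floating-point comparisons at the threshold.

```python
from fractions import Fraction as F
from itertools import combinations


def successful_subset_counts(allocation):
    """counts[k] = number of k-subsets of nodes whose stored data sums to >= 1."""
    n = len(allocation)
    counts = [0] * (n + 1)
    for k in range(n + 1):
        for subset in combinations(allocation, k):
            if sum(subset) >= 1:
                counts[k] += 1
    return counts


def success_probability(allocation, p):
    """Each node is accessed independently with probability p."""
    n = len(allocation)
    counts = successful_subset_counts(allocation)
    return sum(c * p**k * (1 - p) ** (n - k) for k, c in enumerate(counts))


p = F(9, 10)
# Assumed allocations (not given in the transcript); each sums to 12/5.
alloc_a = [F(1), F(1), F(2, 5), F(0), F(0)]
alloc_b = [F(12, 25)] * 5
alloc_c = [F(3, 5)] * 4 + [F(0)]

for name, alloc in (("A", alloc_a), ("B", alloc_b), ("C", alloc_c)):
    print(name, successful_subset_counts(alloc), float(success_probability(alloc, p)))
```

Enumeration costs on the order of 2^n subset sums, so this approach is only practical for small n.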
  • Slide 9
  • [Figure: access model, with failure probability 0.1]
  • Slide 10
  • Problem Description: How do we use storage nodes to store a data object reliably, subject to an aggregate storage budget? Outline: Storage Allocation; Access by the Data Collector; Objective.
  • Slide 11
  • Problem Description: How do we use storage nodes to store a data object reliably, subject to an aggregate storage budget? Storage Allocation: Source $s$ has a data object of unit size. It can use $n$ storage nodes to store $x_1, x_2, \ldots, x_n$ amounts of data, but faces an aggregate storage budget $T$, i.e. $\sum_{i=1}^{n} x_i \le T$.
  • Slide 12
  • Problem Description: How do we use storage nodes to store a data object reliably, subject to an aggregate storage budget? Access by the Data Collector: Data collector $t$ attempts to recover the data object by accessing a subset $\mathbf{r}$ of storage nodes. It succeeds when the total amount of data accessed is at least the size of the data object, i.e. $\sum_{i \in \mathbf{r}} x_i \ge 1$.
  • Slide 13
  • Problem Description: How do we use storage nodes to store a data object reliably, subject to an aggregate storage budget? Objective: We seek the optimal allocation that maximizes the probability of successful recovery.
  • Slide 14
  • Problem Description: How do we use storage nodes to store a data object reliably, subject to an aggregate storage budget? Difficulty: The problem is nonconvex, and there is a large space of possible symmetric and nonsymmetric allocations (an allocation is symmetric if all its nonzero elements are equal, and nonsymmetric otherwise).
  • Slide 15
  • [1] Deterministic Allocation with Probabilistic Access: The data collector accesses each storage node independently with constant probability $p$.
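Under this access model, the allocation problem can be written out explicitly. The formulation below is a sketch assembled from the slide text rather than a formula preserved in the transcript; the nonnegativity constraint and the subset symbol $\mathbf{r}$ ranging over all subsets of nodes are assumptions.

```latex
\begin{aligned}
\max_{x_1, \ldots, x_n} \quad
  & \sum_{\mathbf{r} \subseteq \{1, \ldots, n\}}
    p^{|\mathbf{r}|} (1 - p)^{\,n - |\mathbf{r}|} \;
    \mathbf{1}\!\left[ \sum_{i \in \mathbf{r}} x_i \ge 1 \right] \\
\text{subject to} \quad
  & \sum_{i=1}^{n} x_i \le T, \qquad x_i \ge 0 \quad (i = 1, \ldots, n).
\end{aligned}
```

The indicator function makes the objective piecewise constant in the $x_i$, which is one way to see why the problem is nonconvex.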
  • Slide 16
  • [1] Deterministic Allocation with Probabilistic Access: Symmetric allocations can be suboptimal. Given $n = 5$ storage nodes, budget $T = 12/5$, and $p = 0.9$, the nonsymmetric allocation performs better than the optimal symmetric allocation. Finding the optimal symmetric allocation is also nontrivial. (Originally from a discussion among R. Karp, R. Kleinberg, C. Papadimitriou, E. Friedman, and others at UC Berkeley.)
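For symmetric allocations the search is one-dimensional: if $m$ nodes each store $T/m$, recovery succeeds when at least $\lceil m/T \rceil$ of those $m$ nodes are accessed. The sketch below ranks the symmetric allocations for the slide's parameters; the function names are hypothetical, and it assumes the full budget is spread evenly over the $m$ nonempty nodes. It does not search nonsymmetric allocations.

```python
from fractions import Fraction
from math import ceil, comb


def symmetric_success(m, T, p):
    """Success probability when m nodes each store T/m and the rest store nothing."""
    need = ceil(m / T)  # smallest k with k * (T / m) >= 1
    return sum(comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(need, m + 1))


def best_symmetric(n, T, p):
    """Number of nonempty nodes m in 1..n that maximizes the success probability."""
    return max(range(1, n + 1), key=lambda m: symmetric_success(m, T, p))


T = Fraction(12, 5)
p = Fraction(9, 10)
for m in range(1, 6):
    print(m, float(symmetric_success(m, T, p)))
print("best m:", best_symmetric(5, T, p))
```

The success probability is not monotone in $m$ here (it rises, falls, and rises again), which illustrates why even the symmetric case is nontrivial.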
  • Slide 17
  • [2] Deterministic Allocation with Fixed Access: The data collector accesses an $r$-subset of storage nodes, selected uniformly at random from the collection of all possible $r$-subsets, where $r < n$ is a constant.
  • Slide 18
  • [2] Deterministic Allocation with Fixed Access: Equivalently, we can seek the allocation that minimizes the budget $T$ among all allocations that achieve a given probability of successful recovery.
  • Slide 19
  • [2] Deterministic Allocation with Fixed Access: Example: $(n, r) = (6, 2)$. Question: For any budget $T$, is there always a symmetric allocation that produces the maximum success probability?
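For small cases such as $(n, r) = (6, 2)$, the fixed-access success probability can be computed directly as the fraction of $r$-subsets whose stored data sums to at least 1. The following sketch is an illustration under assumptions: the budget $T = 2$ and the three symmetric allocations compared are hypothetical choices, not values taken from the slides.

```python
from fractions import Fraction as F
from itertools import combinations
from math import comb


def fixed_access_success(allocation, r):
    """Fraction of r-subsets of nodes that hold at least one unit of data in total."""
    n = len(allocation)
    good = sum(1 for subset in combinations(allocation, r) if sum(subset) >= 1)
    return F(good, comb(n, r))


r = 2
# Hypothetical symmetric allocations for n = 6, each using budget T = 2.
concentrated = [F(1), F(1)] + [F(0)] * 4   # 2 nonempty nodes
middle = [F(1, 2)] * 4 + [F(0)] * 2        # 4 nonempty nodes
spread = [F(1, 3)] * 6                     # 6 nonempty nodes

for name, alloc in (("concentrated", concentrated), ("middle", middle), ("spread", spread)):
    print(name, fixed_access_success(alloc, r))
```

At this assumed budget, concentrating the data on two nodes does best among the three, because no pair of nodes in the maximally spread allocation holds a full unit.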
  • Slide 20
  • [2] Deterministic Allocation with Fixed Access: Question: What is the optimal symmetric allocation? For most choices of $(n, r, T)$, the optimal allocation either concentrates the budget over a minimal number of nodes or spreads it out maximally. An example of an exception is $(n, r, T) = (15, 3, 4.6)$, for which the optimal number of nodes to use, 9, is neither of the extremes.
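The exception at $(n, r, T) = (15, 3, 4.6)$ can be checked numerically. With $m$ nonempty nodes each storing $T/m$, the number of nonempty nodes in a uniformly random $r$-subset is hypergeometric, and recovery needs at least $\lceil m/T \rceil$ of them. This is a sketch with hypothetical function names, assuming the whole budget is split evenly over the $m$ nodes.

```python
from fractions import Fraction
from math import ceil, comb


def symmetric_fixed_success(n, r, T, m):
    """Success probability when m of n nodes each store T/m and r nodes are accessed."""
    need = ceil(m / T)  # nonempty nodes required among the r accessed
    good = sum(
        comb(m, k) * comb(n - m, r - k)  # r-subsets containing exactly k nonempty nodes
        for k in range(need, min(m, r) + 1)
    )
    return Fraction(good, comb(n, r))


def best_num_nodes(n, r, T):
    """Number of nonempty nodes m in 1..n that maximizes the success probability."""
    return max(range(1, n + 1), key=lambda m: symmetric_fixed_success(n, r, T, m))


n, r, T = 15, 3, Fraction(23, 5)  # T = 4.6 as an exact fraction
for m in range(1, n + 1):
    print(m, symmetric_fixed_success(n, r, T, m))
print("best m:", best_num_nodes(n, r, T))
```

Within each range of $m$ that shares the same threshold $\lceil m/T \rceil$, using more nodes can only help, so the candidates worth comparing are the largest $m$ in each range (here 4, 9, and 13).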
  • Slide 21
  • [2] Deterministic Allocation with Fixed Access: For probability-1 recovery, the problem reduces to a simple LP. Result 1: If we require all possible $r$-subsets to allow successful recovery, then we need a minimum budget of $T = n/r$, which corresponds to the allocation $x_i = 1/r$ for all $i$, i.e. it is optimal to spread the budget maximally. We can also bound the success probability above which this allocation is optimal.
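One way to see the minimum budget for probability-1 recovery is to add up the recovery constraints over all $r$-subsets. The derivation below is a sketch in assumed notation ($S$ denotes an $r$-subset of node indices); it is not copied from the slides.

```latex
\begin{aligned}
& \sum_{i \in S} x_i \ge 1
  \quad \text{for every } S \subseteq \{1, \ldots, n\},\ |S| = r
  && \text{(every $r$-subset must succeed)} \\
\Rightarrow\ & \binom{n-1}{r-1} \sum_{i=1}^{n} x_i \ \ge\ \binom{n}{r}
  && \text{(sum over all $S$; each $x_i$ appears in $\tbinom{n-1}{r-1}$ subsets)} \\
\Rightarrow\ & \sum_{i=1}^{n} x_i \ \ge\ \frac{\binom{n}{r}}{\binom{n-1}{r-1}} = \frac{n}{r}.
\end{aligned}
```

The allocation $x_i = 1/r$ for all $i$ satisfies every constraint with equality and uses exactly $n/r$, so the bound is tight.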
  • Slide 22
  • [3] Symmetric Probabilistic Allocation with Fixed Access: Each storage node is used independently with constant probability $s/n$ to store the same amount of data $1/\ell$, and the total storage used must be at most budget $T$ in expectation.
  • Slide 23
  • [3] Symmetric Probabilistic Allocation with Fixed Access: The probability of successful recovery can be written as $\Pr[\mathrm{Bin}(r, s/n) \ge \ell]$, where $\mathrm{Bin}(n, p)$ denotes the binomial random variable with $n$ trials and success probability $p$. Reparameterizing in terms of budget $T$ gives the success probability $P(r, T, \ell)$ (each nonempty node stores $1/\ell$ amount of data).
  • Slide 24
  • [3] Symmetric Probabilistic Allocation with Fixed Access: Result 2: For any $r \ge 2$, and at any budget $T$ large enough to support a success probability $P(r, T, \ell) > 0.9$ for some $\ell$, the choice $\ell = r$ is optimal, i.e. it is best to spread the budget maximally (each nonempty node stores $1/\ell$ amount of data).
  • Slide 25
  • [3] Symmetric Probabilistic Allocation with Fixed Access: As we increase the budget $T$, we observe a sharp change in the optimal allocation. For small budgets, and therefore low success probabilities, it is optimal to store the data object in its entirety ($\ell = 1$) and hope the data collector accesses at least one of the nonempty nodes. For large budgets, and therefore high success probabilities, it is optimal to store only $1/r$ amount of data in each node used ($\ell = r$) and hope the data collector accesses $r$ of them. [Figure: $r = 5$]
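The switch between $\ell = 1$ and $\ell = r$ can be explored numerically. The sketch below rests on an assumption that the transcript does not state explicitly: the expected-storage constraint is met with equality, so each node is nonempty with probability $\min(\ell T / n, 1)$, and the success probability is $\Pr[\mathrm{Bin}(r, \min(\ell T / n, 1)) \ge \ell]$. The explicit parameter `n`, the value `n = 100`, and the function names are illustrative choices.

```python
from fractions import Fraction
from math import comb


def success_prob(r, T, ell, n):
    """Pr[Bin(r, q) >= ell], with q = min(ell * T / n, 1) (assumed reparameterization)."""
    q = min(Fraction(ell) * T / n, Fraction(1))
    return sum(comb(r, k) * q**k * (1 - q) ** (r - k) for k in range(ell, r + 1))


def best_ell(r, T, n):
    """Value of ell in 1..r that maximizes the success probability."""
    return max(range(1, r + 1), key=lambda ell: success_prob(r, T, ell, n))


r, n = 5, 100  # r = 5 as in the slide's figure; n = 100 is an arbitrary choice
for T in (1, 5, 10, 15, 20):
    ell = best_ell(r, T, n)
    print(T, ell, float(success_prob(r, T, ell, n)))
```

Sweeping `T` on a finer grid and recording `best_ell` is a direct way to look for the budget at which the optimal choice changes.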
  • Slide 26
  • [3] Symmetric Probabilistic Allocation with Fixed Access: We conjecture that for any $r$ and $T$, the optimal choice of $\ell$ that maximizes the success probability $P(r, T, \ell)$ is either $\ell = 1$ or $\ell = r$ (each nonempty node stores $1/\ell$ amount of data). [Figure: $r = 5$]
  • Slide 27
  • [3] Symmetric Probabilistic Allocation with Fixed Access: We conjecture that for any $r$ and $T$, the optimal choice of $\ell$ that maximizes the success probability $P(r, T, \ell)$ is either $\ell = 1$ or $\ell = r$ (each nonempty node stores $1/\ell$ amount of data). [Figure: $r = 5$, annotated "store less", "store more", "increasing budget", "per node"]
  • Slide 28
  • Summary & Future Work. [1] Deterministic Allocation with Probabilistic Access: suboptimality of symmetric allocations. [2] Deterministic Allocation with Fixed Access: optimal allocation for high-probability recovery; extreme-point solutions are not necessarily optimal for symmetric allocations; is there always a symmetric optimal allocation? [3] Symmetric Probabilistic Allocation with Fixed Access: optimal allocation in the high-probability regime; is there a phase transition in the optimal allocation with increasing budget?