
Page 1: An Information-Theoretic Perspective of Consistent ...dimacs.rutgers.edu/Workshops/Next15/Slides/Cadambe.pdf · • Modern key-value stores - Amazon Dynamo DB, Couch DB, Apache Cassandra

An Information-Theoretic Perspective of Consistent Distributed Storage

Viveck R. Cadambe

Pennsylvania State University

Joint with Prof. Zhiying Wang (UCI), Prof. Nancy Lynch (MIT), Prof. Muriel Medard (MIT) and Dr. Peter Musial (EMC Corporation)

Distributed Storage Systems

•  Failure tolerance, low storage costs, fast reads and writes
•  This talk: Consistency
•  High-level principle: read the “latest” value stored in the system
•  Modern key-value stores: Amazon DynamoDB, CouchDB, Apache Cassandra, Google Spanner, Voldemort, ...
•  Used for transactions, reservation systems, multi-player gaming, social networks, news feeds, distributed computing tasks, etc.

High-level Distributed Storage Model

[Figure: write clients, servers, and read clients (decoders)]

•  Asynchrony: packets don’t arrive at all the servers simultaneously
•  Distributed nature: nodes do not know which packets have been received by other nodes, or whether those nodes have failed
•  Consistency: the reader/decoder needs the latest “possible” version

Analytical understanding of storage costs and latency is very limited.
Replication is used in every commercial solution to provide fault tolerance.

Multi-version Coding
Toy model for distributed storage
Standard model in distributed systems theory


The multi-version coding (MVC) problem [Wang-C, ISIT, Allerton 2014, arxiv 2015]

As the data gets updated

•  Asynchrony: all servers may not simultaneously get the new version of the data

•  Distributed nature: each node is unaware of the versions received by the other nodes

•  Consistency: A decoder must get the latest possible version of the data

The multi-version coding problem [Wang-C, ISIT, Allerton 2014, arxiv 2015]

[Figure: versions Ver. 1, Ver. 2, Ver. 3, Ver. 4, ... arrive over time; each server holds a subset of them (in the example, Ver. 1 only or Ver. 1 and Ver. 2); a decoder connected to some of the servers outputs the latest common version or something later (Ver. 1 or Ver. 2 in the example)]

In general, a client connects to c servers and demands the latest common version among v versions.

•  n servers
•  v versions
•  c connectivity
•  Goal: decode the latest common version among the c servers (or something later)
•  Minimize the storage cost
   –  Worst case, across all “states”
   –  Across all servers
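The decoding requirement can be made concrete with a small sketch. The function names and the representation of a server state as a set of version numbers are illustrative assumptions, not notation from the talk; the sketch also assumes that nothing is required of the decoder when the connected servers share no version.

```python
def latest_common_version(states):
    """Largest version held by every connected server, or None if there is none.

    `states` is a list with one set of version numbers per connected server
    (a hypothetical encoding of the system "state").
    """
    common = set.intersection(*(set(s) for s in states))
    return max(common) if common else None


def is_valid_output(states, decoded):
    """True if `decoded` is the latest common version or something later."""
    target = latest_common_version(states)
    if target is None:
        # Assumption: no requirement when the servers share no version.
        return True
    return decoded >= target


# Two servers that both hold Ver. 1 and Ver. 2: the decoder must return Ver. 2.
both_have_v2 = [{1, 2}, {1, 2}]
# One server only has Ver. 1: returning Ver. 1 or Ver. 2 is acceptable.
one_lags = [{1}, {1, 2}]
```

In the second example the latest common version is Ver. 1, so either Ver. 1 or Ver. 2 satisfies the requirement, matching the "Ver. 1 or Ver. 2" label in the figure.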

Solution 1: Replication

[Figure: Version 1 and Version 2 are written to the servers; each server stores a full copy of one version (Ver 1 or Ver 2)]

Storage size = size-of-one-version

Solution 2: MDS code

[Figure: with c=2, each server stores a coded symbol of size 1/2 for each of Version 1 and Version 2 that it has received]

Separate coding across versions. Each server stores all the versions received.

Storage size = (number of versions / c) * size-of-one-version = (v/c) * size-of-one-version

Storage cost, normalized by size-of-one-version

Scheme              Storage cost
Replication         $1$
Naïve MDS codes     $v/c$
Constructions       $\frac{1}{\lceil c/v \rceil}$
Lower bound         $\frac{v}{c+v-1} - o(\text{size-of-one-version})$

v = number of versions, c = connectivity
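To compare the four entries numerically, a minimal sketch follows. The function name and dictionary keys are illustrative assumptions, and the lower-order o(size-of-one-version) term of the lower bound is ignored.

```python
from math import ceil


def normalized_storage_costs(v, c):
    """Per-server storage cost in units of size-of-one-version.

    v is the number of versions and c the connectivity, as on the slide.
    """
    return {
        "replication": 1.0,
        "naive_mds": v / c,
        "construction": 1 / ceil(c / v),
        # The o(size-of-one-version) correction is dropped in this sketch.
        "lower_bound": v / (c + v - 1),
    }


costs = normalized_storage_costs(v=2, c=4)
```

For v = 2 and c = 4 this gives 1 for replication, 0.5 for naïve MDS codes, 0.5 for the construction, and 0.4 for the lower bound. When v exceeds c, the naïve MDS cost v/c rises above 1, while the construction never costs more than replication.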

Achievability

[Figure: the c servers grouped into Partition 1, Partition 2, ..., Partition v]

Partition i: version i is the latest version

There is at least one partition with at least $\lceil c/v \rceil$ servers

Simple achievable scheme:

Server in partition i stores $\frac{1}{\lceil c/v \rceil}$ of version i
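The pigeonhole step behind this scheme can be checked by brute force for small parameters. The sketch below is an illustration under stated assumptions: each server is summarized only by the latest version it holds, and a version is treated as decodable once at least ⌈c/v⌉ of the c connected servers store a piece of it (as an MDS code within that version would provide). The function names are hypothetical.

```python
from itertools import product
from math import ceil


def decodable_version(latest, v):
    """Largest version that is the latest at >= ceil(c/v) of the c servers.

    `latest` is a tuple with the latest version held by each connected server.
    Returns None if no version reaches the threshold.
    """
    c = len(latest)
    threshold = ceil(c / v)
    for version in range(v, 0, -1):
        if latest.count(version) >= threshold:
            return version
    return None


def scheme_always_decodes(c, v):
    """Check every assignment of latest versions to c servers."""
    for latest in product(range(1, v + 1), repeat=c):
        version = decodable_version(latest, v)
        # The latest common version is at most min(latest), so decoding any
        # version >= min(latest) meets "latest common or something later".
        if version is None or version < min(latest):
            return False
    return True
```

Calling `scheme_always_decodes(4, 2)` or `scheme_always_decodes(5, 3)` enumerates all partitions of the servers and confirms that some version is always recoverable.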

Converse: v = 2, storage cost $\gtrsim \frac{2}{c+1}$

[Figure: c servers, each holding Ver 1; Ver 2 is propagated to some of them; a virtual server holds Ver 1 and Ver 2]

Start with c servers, each holding Ver 1

Propagate version 2 to a minimal set of servers such that it is decodable

Versions 1 and 2 are then decodable from c+1 symbols (the c servers plus one virtual server). Since c+1 stored symbols must carry two full versions, $(c+1) \times \text{Storage} \ge 2$ in units of size-of-one-version, up to lower-order terms.

$\Rightarrow$ Storage $\ge \frac{2}{c+1} - o(\text{size-of-one-version})$

Converse: v > 2

•  Intuition: find c+v-1 virtual servers from which all v versions can be decoded
•  A more intricate puzzle than v=2
•  The multi-version coding problem is related to index coding / multiple-unicast / non-multicast network coding
   –  More precisely, it is related to pliable index coding [Brahma-Fragouli 12]

Converse: v=3

[Figure: c servers in a row, each holding a subset of Ver 1, Ver 2, Ver 3; servers a1 and a2 are marked]

a1 is the smallest number such that there is a version x for which version x is decodable, given the symbols of the first a1 servers with all 3 versions and the messages of versions $\{1, 2, 3\} - \{x\}$.

a2 is the smallest number such that there is a version $y \in \{1, 2, 3\} - \{x\}$ for which version y is decodable, given the symbols of the first $a_1 - 1$ servers with all 3 versions, the remaining $a_2 - (a_1 - 1)$ servers with versions $\{1, 2, 3\} - \{x\}$, and the message of version $\{1, 2, 3\} - \{x, y\}$.

Summary

Storage cost, normalized by size-of-one-version

Scheme              Storage cost
Replication         $1$
Naïve MDS codes     $v/c$
Constructions *     $\frac{1}{\lceil c/v \rceil}$
Lower bound *       $\frac{v}{c+v-1} - o(\text{size-of-one-version})$

v = number of versions, c = connectivity

* These bounds can be improved.

See “Multi-version Coding – An Information Theoretic Perspective of Distributed Storage”, Wang-Cadambe, arxiv, 2015

Multi-version codes – Main Insights

•  Redundancy is required to ensure consistency in an asynchronous environment
   –  Redundancy increases with the number of parallel versions in the system
•  Simple codes are (approximately) optimal
   –  Separate coding across versions
   –  Random linear codes within versions
•  More insights may be obtained by going beyond worst-case measures
   –  Correlated versions
   –  Allow a small fraction of “erroneous” states

See “Multi-version Coding – An Information Theoretic Perspective of Distributed Storage”, Wang-Cadambe, arxiv, 2015

Multi-version Coding
Toy model for distributed storage
Standard model in distributed systems theory

Toy Model for packet arrivals, links

[Figure: write clients, servers, and read clients (decoders)]

•  Arrival at client: one packet in every time slot, sent immediately to the servers.
•  Channel from the write client to the server: delay is an integer in [0, T-1].
•  Channel from server to read client: instantaneous (no delay).
•  Goal: a decoder invoked at time t gets the latest common version among c servers.
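A small simulation illustrates how the delay bound T limits the number of versions "in flight". This is a sketch under stated assumptions: versions are numbered from 0 by the slot in which they are written, delays are drawn uniformly at random from [0, T-1] (the toy model only bounds them), and the function names are hypothetical.

```python
import random


def simulate_arrivals(n_servers, T, horizon, seed=0):
    """arrival[s][j] is the slot at which version s reaches server j."""
    rng = random.Random(seed)
    return [
        [s + rng.randrange(T) for _ in range(n_servers)]
        for s in range(horizon)
    ]


def received(arrival, server, t):
    """Set of versions that have reached `server` by the end of slot t."""
    return {s for s, row in enumerate(arrival) if row[server] <= t}


def latest_common(arrival, servers, t):
    """Latest version held by all of `servers` at time t, or None."""
    common = set.intersection(*(received(arrival, j, t) for j in servers))
    return max(common) if common else None


arrival = simulate_arrivals(n_servers=5, T=3, horizon=20, seed=1)
```

Because every version written at slot s has reached all servers by slot s+T-1, the latest common version at time t lies among the T most recent versions, t-T+1 through t. In that sense T plays the role of the number of versions v in the multi-version coding problem.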

Insights from multi-version codes over toy model

Number of versions ν depends on the degree of asynchrony T

Achievability “Theorem”:

There exists an achievable storage strategy that achieves a storage cost of $\frac{1}{\lceil c/T \rceil} \times$ size-of-one-version

Converse “Theorem”:

There exists no achievable storage strategy that achieves a storage cost smaller than $\frac{T}{T+c-1} \times$ size-of-one-version $- \, o(\text{size-of-one-version})$

Multi-version Coding
Toy model for distributed storage
Standard model in distributed systems literature

Model studied in distributed systems – Key features

[Figure: write clients, servers, and read clients (decoders)]

•  Arrival at clients: arbitrary
•  Channel from clients to servers: arbitrary delay, reliable
•  Clients and servers are modeled as I/O automata, so their protocols can be designed.

The multi-version coding converse for v=2 can be lifted to this setting.

Future Work – Many open questions

•  Less conservative modeling assumptions
   -  Exploiting correlation between versions
   -  Allowing for a “small” number of erroneous states
   -  Less distributed: knowledge of the state of other nodes
•  Finer network and node models (beyond toy models)
   -  Can lead to finer insights into communication and storage costs
   -  Allow for the design of protocols for, say, the read client (or the write client)
•  Study of errors/Byzantine adversaries instead of erasures: a useful assumption for ensuring security


Thanks