An Information-Theoretic Perspective of Consistent...

An Information-Theoretic Perspective of Consistent Distributed Storage

Viveck R. Cadambe

Pennsylvania State University

Joint with Prof. Zhiying Wang (UCI) Prof. Nancy Lynch (MIT), Prof. Muriel Medard (MIT) and Dr. Peter Musial (EMC Corporation)

Distributed Storage Systems

•  Failure tolerance, Low storage costs, Fast reads and writes

•  Failure tolerance, Low storage costs, Fast reads and writes •  This talk: Consistency

•  High-level principle: read the “latest” value stored in the system

•  Failure tolerance, Low storage costs, Fast reads and writes •  This talk: Consistency

•  High-level principle: read the “latest” value stored in the system •  Modern key-value stores - Amazon Dynamo DB, Couch DB, Apache Cassandra DB, Google Spanner, Voldermort DB ….. •  Used for transactions, reservation systems, multi-player gaming, social

networks, news feeds, distributed computing tasks etc.

Servers

Write Clients Read Clients (Decoders)

High level Distributed Storage Model

•  Asynchrony – packets don’t arrive at all the servers simultaneously

•  Distributed nature - nodes do not know which packets have been received by other nodes, or if they have failed.

•  Consistency – the reader/decoder needs the latest “possible” version.

Servers

•  Asynchrony, Distributed Nature, Consistency

Analytical understanding of storage costs, latency, is very limited Replication is used in every commercial solution to provide fault tolerance

Standard model in distributed systems

theory

Multi-version Coding

theory

Toy model for distributed storage

theory

The multi-version coding (MVC) problem [Wang-C, ISIT, Allerton 2014, arxiv 2015]

As the data gets updated

•  Asynchrony: all servers may not simultaneously get the new version of the data

•  Distributed nature: each node is unaware of the versions received by the other nodes

•  Consistency: A decoder must get the latest possible version of the data

Ver. 1

Ver. 1 Ver. 1 Ver. 1

The multi-version coding problem

Ver. 1

[Wang-C, ISIT, Allerton 2014, arxiv 2015]

Ver. 1 Ver. 2

Ver. 1 Ver. 1 Ver. 2 Ver. 1 Ver. 2

Ver. 1

Ver. 1 Ver. 2

Ver. 3 Ver. 4 ……

Ver. 1

Ver. 1 Ver. 2

Ver. 3 Ver. 4 ……

Ver. 1

Latest common or something later: Ver. 2

Ver. 1 Ver. 2

Latest common or something later: Ver.

Ver. 3 Ver. 4 ……

Ver. 1 Ver. 1 Ver. 2 Ver. 1 Ver. 2 Ver. 1

Ver. 1 Ver. 2

1 or Ver. 2

Ver. 3 Ver. 4 ……

In general, client connects to c servers, demands the latest common version among v versions

Ver. 1 Ver. 2

1 or Ver. 2

Ver. 3 Ver. 4 ……

The multi-version coding problem •  n servers •  v versions •  c connectivity

Ver. 1

Ver. 2

Latest common or something later: Ver. 1 or Ver. 2

Ver. 3

Ver. 4

……

Ver. 1

Ver. 2

Ver. 1

Ver. 2

Ver. 1

The multi-version coding problem •  n servers •  v versions •  c connectivity •  Goal: decode the latest common version among the c servers •  Minimize the storage cost – Worst case, across all “states” – across all servers

Ver. 1

Ver. 2

Latest common or something later: Ver. 1 or Ver. 2

Ver. 3

Ver. 4

……

Ver. 1

Ver. 2

Ver. 1

Ver. 2

Ver. 1

Solution 1: Replication

Ver 1 Ver 1 Ver 1 Ver 1 Version 1

Version 2

Storage size = size-of-one-version

Ver 1 Ver 1

Ver 2 Ver 2

Version 1

Version 2

Storage size = size-of-one-version

Solution 1: Replication

1/2 1/2 1/2 1/2

1/2 1/2

Version 1

Version 2

Solution 2: MDS code

Separate coding across versions. Each server stores all the versions received.

Storage size = (Number of versions / c)*size-of-one-version = v/c*size-of-one-version

Storage Cost Normalized by size-of-

value Replication 1

Naïve MDS codes

Constructions

Lower bound vc+v�1

v = Number of Versions

c = Connectivity

�o(size-of-one-version)

Storage Cost Normalized by size-of-

value Replication 1

Naïve MDS codes

Constructions

c = Connectivity

1dc/ve

�o(size-of-value)

Achievability

. . . . . . . . .

z}|{ z}|{ z}|{

Partition 1 Partition 2Partition v

Partition i: Version i is the latest version

Achievability

. . . . . . . . .

z}|{ z}|{ z}|{

There is at least one partition with dc/ve servers

Achievability

. . . . . . . . .

z}|{ z}|{ z}|{

There is at least one partition with dc/ve servers

Simple achievable scheme:

Server in partition i stores 1dc/ve of version i

Converse: v = 2, storage cost & 2c+1

Start with c servers

Ver 1 Ver 1 Ver 1 Ver 1 . . .

Ver 1 Ver 1 Ver 1 Ver 1

Ver 2 Ver 2

Propagate version 2 to a minimal set of servers such that it is decodable

Virtual server

Versions 1 and 2 decodable from c+1 symbols

Ver 2 Ver 2

Virtual server

=) Storage � 2c+1

Ver 2 Ver 2

Virtual server

=) Storage � 2c+1

Ver 2 Ver 2

•  Intuition: Find c+v-1 virtual servers, where all v versions can be decoded

•  A more intricate puzzle as compared to v=2. •  Multi-version coding problem related to index-coding/

multiple-unicast/non-multicast network coding –  More precisely, it is related to pliable index coding

Converse: v > 2

[Brahma-Fragouli 12]

Ver 2 Ver 2

Converse: v=3

Ver 1 Ver 1

Ver 3 Ver 3 Ver 3 . . .

Ver 2 Ver 2

Converse: v=3

Ver 1 Ver 1

Ver 3 Ver 3 Ver 3 . . .

Ver 2 Ver 1 Ver 3

Server a1

Converse: v=3

a1 is the smallest number such that, there is a version x, such that

Version x is decodable, given the symbols of the first a1 servers with all 3 versions

and the messages of versions {1, 2, 3}� {x}

Ver 2 Ver 2

Converse: v=3

Ver 1 Ver 1 Ver 1

Ver 3 Ver 3 Ver 3 . . .

. . . Ver 3

Ver 1 Ver 3

Server a2Server a1

Converse: v=3

a1 is the smallest number such that, there is a version x, such that

Version x is decodable, given the symbols of the first a1 servers with all 3 versions

and the messages of versions {1, 2, 3}� {x}

a2 is the smallest number such that, there is a version y 2 {1, 2, 3}� {x}, suchthat

Version y is decodable, given the symbols of the first a1 � 1 servers with all 3

versions and the remaining a2 � (a1 � 1) servers with versions {1, 2, 3}� {x}

and the message of version {1, 2, 3}� {x, y}

Ver 2 Ver 2

Converse: v=3

Ver 1 Ver 1 Ver 1

Ver 3 Ver 3 Ver 3 . . .

. . . Ver 3

Ver 1 Ver 3

Server a2Server a1

Ver 2 Ver 2

Converse: v=3

Ver 1 Ver 1 Ver 1

Ver 3 Ver 3 Ver 3 . . .

. . . Ver 3

Ver 1 Ver 3

Ver 3 Ver 3

Ver 1 Ver 3

Server a2Server a1

Storage Cost Normalized by size-of-one-

version Replication 1

Naïve MDS codes

Constructions

c = Connectivity

1dc/ve

Summary

These bounds can be improved.

See “Multi-version Coding – An Information Theoretic Perspective of Distributed Storage ”, Wang-Cadambe, arxiv, 2015

Multi-version codes – Main Insights

•  Redundancy required to ensure consistency in an asynchronous environment –  Redundancy increases with the number of parallel versions in the

system

•  Simple codes are (approximately) optimal –  Separate coding across versions –  Random linear codes within versions

•  More insights may be obtained by going beyond worst-case measures –  Correlated versions –  Allow a small fraction of “erroneous” statess

Multi-version codes – Main Insights

•  Redundancy required to ensure consistency in an asynchronous environment –  Redundancy increases with the number of parallel versions in the

system

•  Simple codes are (approximately) optimal –  Separate coding across versions –  Random linear codes within versions

•  More insights may be obtained by going beyond worst-case measures –  Correlated versions –  Allow a small fraction of “erroneous” statess

theory

Servers

Toy Model for packet arrivals, links

Servers

•  Arrival at client: One packet in every time slot. Sent immediately to the servers. •  Channel from the write client to the server: Delay is an integer in [0,T-1].

•  Channel from server to read client: instantaneous (no delay).

•  Goal: decoder invoked at time t, gets the latest common version among c servers

Servers

•  Arrival at client: One packet in every time slot. Sent immediately to the servers. •  Channel from the write client to the server: Delay is an integer in [0,T-1].

•  Channel from server to read client: instantaneous (no delay).

•  Goal: decoder invoked at time t, gets the latest common version among c servers

Insights from multi-version codes over toy model

Achievability “Theorem”:

Converse “Theorem”:

There exists an achievable storage strategy that achieves a storage cost of

⇥ size-of-one-version

There exists no achievable storage strategy that achieves a storage cost smaller

T + c� 1

⇥ size-of-one-version� o(size-of-one-version)

Insights from multi-version codes over toy model

Achievability “Theorem”:

Converse “Theorem”:

Number of versions ⌫, depends on degree of asynchrony T

There exists an achievable storage strategy that achieves a storage cost of

⇥ size-of-one-version

There exists no achievable storage strategy that achieves a storage cost smaller

T + c� 1

⇥ size-of-one-version� o(size-of-one-version)

literature

Servers

Model studied in distributed systems – Key features

•  Arrival at clients: arbitrary

•  Channel from clients to servers: arbitrary delay, reliable

•  Clients and servers are modeled as I/O automata, so their protocols can be designed.

Servers

Model studied in distributed systems – Key features

•  Arrival at clients: arbitrary

•  Channel from clients to servers: arbitrary delay, reliable

•  Clients and servers are modeled as I/O automata, so their protocols can be designed.

Multi-version coding converse for v=2 can be lifted to this setting.

Future Work – Many open questions •  Less conservative modeling assumptions,

-  Exploiting correlation between versions -  Allow for a “small” number of erroneous states -  Less distributed, knowledge of the state of other nodes.

•  Finer network and node models (beyond toy models). -  Can lead to finer insights in to communication and storage costs -  Allow for the design of protocols, for say, the read client (or the write client)

•  Study of errors/Byzantine adversaries instead of erasures -

useful assumption for ensuring security.s

useful assumption for ensuring security.

Thanks

An Information-Theoretic Perspective of Consistent...

Documents

Transcript of An Information-Theoretic Perspective of Consistent...

Machine Learning Algorithms for Classiﬁcation Princeton ...dimacs.rutgers.edu/Workshops/DataMiningTutorial/slides/schapire.pdf · Machine Learning Algorithms for Classiﬁcation

db-tech MICROWAVE COMMUNICATIONS DB-08

ã á s ã...(DB: Darden Restaurants Olive Garden Red Lobster (DB: Earls Joey Restaurant (DB: Keg (DB' McDonald's Moxie's Old Spaghetti Factory (DB: Red Robin (DB: Sammy Js …

db/db Aspergillus awamori

Node 1Node 2Node 3Node 4Node 5 DB 1Copy 1 DB 2Copy 1 DB 3Copy 1 DB 4Copy 1 DB 5Copy 1 DB 6Copy 1 DB 7Copy 1 DB 8Copy 1 DB 9Copy 1.

Patrick Huesler wooga GmbH › dl › goto-aar-2012 › slides › ...load balancer app app app app app app app app db db db db db db db db db db db db db db db db app app app app

· 1-1 OdB/ 15dB/ 20dB/ 25dB/ specified FC, SC, ST, LC, MU PC 50 dB FC, SC APC 60 dB 1 5 dB: deviation 0.5 dB 10 dB: deviation 1.0 dB 15 dB: deviation 1.5 dB 20 dB: deviation 2.0

Noise Thermometer - Honeywell Industrial Safety · 2019. 5. 17. · M1Rifle 161 dB Fireworks 162 dB Handgun 166 dB Apollo Lift-Off 188 dB Shotgun 170 dB Artillery Fire 162 dB Firecracker

DB-3 & DB-3A Manual

N450 DB/N600 DB User Manual

ETD-db 2.0: Rewriting ETD-db

ON-LINE MANUAL 24D143*DB / 24W143*DB / 24D343*DB LED ...cdn.cnetcontent.com/aa/e1/aae1d6b6-340c-474c-b896... · ON-LINE MANUAL 24D143*DB / 24W143*DB / 24D343*DB LED Backlight LCD

The Impact of Network Coding on Mathematicsdimacs.rutgers.edu/Workshops/Next15/Slides/Byrne.pdf · q-Fano plane I Braun, Kiermaier, Naki c, \On the Automorphism Group of a Binary

LCT 340 - Full Compass Systems · 139 dB, 0 dB pre-attenuation 145 dB, 6 dB pre-attenuation 151 dB, 12 dB pre-attenuation 157 dB, 18 dB pre-attenuation 6 dB, 12 dB, 18 dB, switchable

DB-2D, DB-3D, DB-4D and DB-2TC Manual

DIMACS Security & Cryptography Crash Course – …dimacs.rutgers.edu/Workshops/ComputerSecurity/slides/d4_tls_ssl.pdfDIMACS Security & Cryptography Crash Course – day 4 ... " Few

B SERIES - EAR Customized Hearing Protection Datasheet 2017-1.pdfBTN 27 dB, Class 5 24 dB 24.3 dB 30 dB HM 26 dB, Class 5 23 dB 19.1 dB 31 dB Power Battery type Rechargeable - lithium

190SL CATALOG - Antique and Vintage Mercedes Parts Catalog.pdf · db-40 black db-50 white db-158 white gray db-166 blue gray db-180 silver metalic db-190 graphite gray db-334 light

inContact DB Connector Reference Manual DB Connector Manual.pdfLaunch the DB Connector.exe ... MySQL. Since DB Connector utilizes Microsoft’s standardized OLE DB ... DB Connector

Legato Li - Rexton...HFA-OSPL 90 102 dB SPL – 114 dB SPL – Gain Full on gain (FOG) at 1.6 kHz – 43 dB – 55 dB Full on gain (peak) 45 dB 56 dB 60 dB 70 dB HFA-FOG 37 dB –

ON-LINE MANUAL 24D143DB / 24W143DB / 24D343DB LED ...cdn.cnetcontent.com/aa/e1/aae1d6b6-340c-474c-b896... · ON-LINE MANUAL 24D143DB / 24W143DB / 24D343DB LED Backlight LCD