The CAP Theorem

The CAP Theorem: Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services

Aleksandar Bradic, Vast.com

nosqlsummer | Belgrade, 28 August 2010

CAP Theorem

Conjecture by Eric Brewer at PODC 2000:

• It is impossible for a web service to provide all three of the following guarantees: Consistency, Availability, Partition-tolerance

CAP Theorem

• Consistency - all nodes should see the same data at the same time

• Availability - node failures do not prevent survivors from continuing to operate

• Partition-tolerance - the system continues to operate despite arbitrary message loss

CAP Theorem

CAP Theorem:

• It is impossible for a web service to provide all three of the following guarantees: Consistency, Availability, Partition-tolerance

• A distributed system can satisfy any two of these guarantees at the same time, but not all three

CAP Theorem

• Conjecture since 2000

• Established as theorem in 2002: Lynch, Nancy, and Seth Gilbert. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. ACM SIGACT News, v. 33 issue 2, 2002, p. 51-59.

CAP Theorem

A distributed system can satisfy any two of the CAP guarantees at the same time, but not all three:

• Consistency + Availability

• Consistency + Partition Tolerance

• Availability + Partition Tolerance

Consistency + Availability

Examples:

• Single-site databases

• Cluster databases

• LDAP

• xFS file system

Traits:

• 2-phase commit

• cache validation protocols

Consistency + Partition Tolerance

Examples:

• Distributed databases

• Distributed locking

• Majority protocols

Traits:

• Pessimistic locking

• Make minority partitions unavailable
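The "make minority partitions unavailable" trait can be sketched as a majority check. This is a minimal illustration, not any particular product's protocol; the names `has_majority` and `handle_request` are hypothetical.

```python
def has_majority(reachable_nodes: int, cluster_size: int) -> bool:
    """True if this side of a partition holds a strict majority of the cluster."""
    return reachable_nodes > cluster_size // 2


def handle_request(reachable_nodes: int, cluster_size: int) -> str:
    # A consistent, partition-tolerant system refuses service on the
    # minority side instead of risking divergent state.
    if has_majority(reachable_nodes, cluster_size):
        return "served"
    return "unavailable"
```

With five nodes split 3/2, only the side that can reach three nodes keeps serving. An even split (2 of 4) leaves neither side with a majority, so both refuse.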

Availability + Partition Tolerance

Examples:

• Coda

• Web caching

• DNS

Traits:

• expiration/leases

• conflict resolution

• optimistic

Enterprise System CAP Classification

• RDBMS : CA (Master/Slave replication, Sharding)

• Amazon Dynamo : AP (Read-repair, application hooks)

• Terracotta : CA (Quorum vote, majority partition survival)

• Apache Cassandra : AP (Partitioning, Read-repair)

• Apache ZooKeeper : CP (Consensus protocol)

• Google BigTable : CA

• Apache CouchDB : AP

Enterprise System CAP Classification

Some Techniques:

• Consistent Hashing

• Vector Clocks

• Sloppy Quorum

• Merkle trees

• Gossip-based protocols

• ...

Proof of the CAP Theorem

Lynch, Nancy, and Seth Gilbert. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. ACM SIGACT News, v. 33 issue 2, 2002, p. 51-59.

Formal Model

Formalization of the notions of Consistency, Availability and Partition Tolerance:

• Atomic Data Object

• Available Data Object

• Partition Tolerance

Atomic Data Objects

Atomic/Linearizable Consistency:

• There must exist a total order on all operations such that each operation looks as if it were completed at a single instant

• This is equivalent to requiring requests on the distributed shared memory to act as if they were executing on a single node, responding to operations one at a time
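The definition above can be made concrete with a brute-force check for a single read/write object. This is a sketch for tiny histories only (it tries every permutation); the `Op` record and the function names are hypothetical, and timestamps are assumed to come from a global observer.

```python
from itertools import permutations
from typing import NamedTuple, Sequence


class Op(NamedTuple):
    kind: str     # "write" or "read"
    value: int    # value written, or value the read returned
    start: float  # time the request was issued
    end: float    # time the response was returned


def respects_real_time(order: Sequence[Op]) -> bool:
    # If b finished before a started, b may not be ordered after a.
    for i, a in enumerate(order):
        for b in order[i + 1:]:
            if b.end < a.start:
                return False
    return True


def reads_are_legal(order: Sequence[Op], initial: int) -> bool:
    # Replay the order on a single register: each read must see the latest write.
    current = initial
    for op in order:
        if op.kind == "write":
            current = op.value
        elif op.value != current:
            return False
    return True


def is_atomic(history: Sequence[Op], initial: int = 0) -> bool:
    """True if some total order makes the history look like a single node."""
    return any(
        respects_real_time(order) and reads_are_legal(order, initial)
        for order in permutations(history)
    )
```

A read that starts after a write has finished but still returns the initial value has no valid total order, so `is_atomic` rejects it. If the two operations overlap in time, the read may be ordered first and the history is accepted.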

Available Data Objects

• For a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response

• That is, any algorithm used by the service must eventually terminate

• (In some ways, this is a weak definition of availability: it puts no bound on how long the algorithm may run before terminating, and therefore allows unbounded computation)

• (On the other hand, when qualified by the need for partition tolerance, this can be seen as a strong definition of availability: even when severe network failures occur, every request must terminate)

Partition Tolerance

• In order to model partition tolerance, the network is allowed to lose arbitrarily many messages sent from one node to another

• When a network is partitioned, all messages sent from nodes in one component of the partition to nodes in another component are lost

• The atomicity requirement implies that every response will be atomic, even though arbitrary messages sent as part of the algorithm might not be delivered

• The availability requirement implies that every node receiving a request from a client must respond, even though arbitrary messages that are sent may be lost

• Partition Tolerance : No set of failures less than total network failure is allowed to cause the system to respond incorrectly

Formal Framework

• Asynchronous Network Model

• Partially Synchronous Network Model

Asynchronous Networks

• There is no clock

• Nodes must make decisions based only on messages received and local computation

Asynchronous Networks : Impossibility Result

Theorem 1 : It is impossible in the asynchronous network model to implement a read/write data object that guarantees the following properties:

• Availability

• Atomic consistency

in all fair executions (including those in which messages are lost)


Proof (by contradiction):

• Assume an algorithm A exists that meets the three criteria: atomicity, availability and partition tolerance

• We construct an execution of A in which there exists a request that returns an inconsistent response

• Assume that the network consists of at least two nodes. Thus it can be divided into two disjoint, non-empty sets G1, G2

• Assume all messages between G1 and G2 are lost

• If a write occurs in G1 and a read then occurs in G2, the read operation cannot return the result of the earlier write operation


Formal proof:

• Let v0 be the initial value of the atomic object

• Let α1 be the prefix of an execution of A in which a single write of a value not equal to v0 occurs in G1, ending with the termination of the write operation

• Assume that no other client requests occur in either G1 or G2

• Assume that no messages from G1 are received in G2 and no messages from G2 are received in G1

• We know that the write operation will complete (by the availability requirement)


• Let α2 be the prefix of an execution in which a single read occurs in G2 and no other client requests occur, ending with the termination of the read operation

• During α2 no messages from G2 are received in G1 and no messages from G1 are received in G2

• We know that the read must return a value (by the availability requirement)

• The value returned by this execution must be v0, as no write operation has occurred in α2


• Let α be an execution beginning with α1 and continuing with α2. To the nodes in G2, α is indistinguishable from α2, as all the messages from G1 to G2 are lost (in both α1 and α2, which together make up α), and α1 does not include any client requests to nodes in G2

• Therefore, in the execution α, the read request (from α2) must still return v0

• However, the read request does not begin until after the write request (from α1) has completed

• This contradicts the atomicity property, proving that no such algorithm exists
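The execution α can be replayed on a toy system. The sketch below assumes one node per component and a naive "write locally, then replicate" algorithm; the theorem covers every algorithm, so this only illustrates the construction and proves nothing. All names (`Node`, `Network`, `run`) are hypothetical.

```python
V0 = 0  # initial value of the object


class Node:
    def __init__(self, name: str):
        self.name = name
        self.value = V0


class Network:
    """Delivers a value to a peer unless the link is partitioned."""

    def __init__(self, partitioned: bool):
        self.partitioned = partitioned

    def send(self, target: Node, value: int) -> bool:
        if self.partitioned:
            return False  # message lost
        target.value = value
        return True


def write(node: Node, peer: Node, net: Network, value: int) -> str:
    node.value = value
    net.send(peer, value)  # best-effort replication
    return "ack"           # availability: the write must terminate


def read(node: Node) -> int:
    return node.value      # availability: the read must terminate


def run(partitioned: bool) -> int:
    g1, g2 = Node("g1"), Node("g2")
    net = Network(partitioned)
    write(g1, g2, net, 42)  # alpha1: write completes in G1
    return read(g2)         # alpha2: read starts afterwards in G2
```

`run(True)` returns `V0` even though the write of 42 completed first, which is the non-atomic response from the proof. `run(False)` returns 42.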


Corollary 1.1: It is impossible in the asynchronous network model to implement a read/write data object that guarantees the following properties:

• Availability - in all fair executions

• Atomic consistency - in fair executions in which no messages are lost


Proof:

• The main idea is that in the asynchronous model, an algorithm has no way of determining whether a message has been lost or has been arbitrarily delayed in the transmission channel

• Therefore, if there existed an algorithm that guaranteed atomic consistency in executions in which no messages were lost, there would exist an algorithm that guaranteed atomic consistency in all executions

• This would violate Theorem 1


• Assume that there exists an algorithm A that always terminates, and guarantees atomic consistency in fair executions in which all messages are delivered

• Theorem 1 implies that A does not guarantee atomic consistency in all fair executions, so there exists some fair execution α of A in which some response is not atomic


• At some finite point in execution α, the algorithm A returns a response that is not atomic

• Let α′ be the prefix of α ending with the invalid response

• Next, extend α′ to a fair execution α′′ in which all messages are delivered

• The execution α′′ is now a fair execution in which all messages are delivered

• However, this execution is not atomic, since its prefix α′ already contains the invalid response

• Therefore no such algorithm A exists

Solutions in the Asynchronous Model

While it is impossible to provide all three properties (atomicity, availability and partition tolerance), any two of these properties can be achieved:

• Atomic, Partition Tolerant

• Atomic, Available

• Available, Partition Tolerant

Atomic, Partition Tolerant

• If availability is not required, it is easy to achieve atomic data and partition tolerance

• The trivial system that ignores all requests meets these requirements

• Stronger liveness criterion : if all the messages in an execution are delivered, the system is available and all operations terminate

• A simple centralized algorithm meets these requirements : a single designated node maintains the value of an object

• A node receiving a request forwards it to the designated node, which sends a response. When the acknowledgement is received, the node sends a response to the client

• Many distributed databases provide this guarantee, especially algorithms based on distributed locking or quorums : if certain failure patterns occur, the liveness condition is weakened and the service no longer returns a response. If there are no failures, then liveness is guaranteed
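The centralized algorithm described above might look like the following sketch. The class names are hypothetical, and raising `Unavailable` is an assumption standing in for a node that, in the asynchronous model, would simply never respond.

```python
class Unavailable(Exception):
    """Raised when the designated node cannot be reached."""


class DesignatedNode:
    """The single node that holds the authoritative value."""

    def __init__(self, initial: int = 0):
        self.value = initial


class ForwardingNode:
    """Any other node: forwards every client request to the designated node."""

    def __init__(self, designated: DesignatedNode, connected: bool = True):
        self.designated = designated
        self.connected = connected  # False models a partition

    def _require_link(self) -> None:
        # Stand-in for "no response is ever sent" during a partition.
        if not self.connected:
            raise Unavailable("designated node unreachable")

    def read(self) -> int:
        self._require_link()
        return self.designated.value

    def write(self, value: int) -> str:
        self._require_link()
        self.designated.value = value
        return "ack"
```

Every response that is returned reflects the single authoritative copy, so atomicity holds. The price is that a partitioned node returns nothing at all.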

Atomic, Available

• If there are no partitions, it is possible to provide atomic, available data

• A centralized algorithm with a single designated node maintaining the value of an object meets these requirements

Available, Partition Tolerant

• It is possible to provide high availability and partition tolerance if atomic consistency is not required

• If there are no consistency requirements, the service can trivially return v0, the initial value, in response to every request

• It is possible to provide weakened consistency in an available, partition-tolerant setting

• Web caches are one example of a weakly consistent network
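A web cache with expiration gives a feel for weakened consistency. The sketch below is an assumption about how such a cache could behave, not a description of any real cache: it serves a possibly stale value when the origin is unreachable, and the names `TTLCache`, `fetch` and `origin_reachable` are hypothetical.

```python
from typing import Callable, Optional, Tuple


class TTLCache:
    """Caches one value and refreshes it from the origin after `ttl` seconds."""

    def __init__(self, fetch: Callable[[], int], ttl: float):
        self.fetch = fetch
        self.ttl = ttl
        self.entry: Optional[Tuple[int, float]] = None  # (value, fetched_at)

    def get(self, now: float, origin_reachable: bool = True) -> Optional[int]:
        fresh = self.entry is not None and now - self.entry[1] < self.ttl
        if not fresh and origin_reachable:
            self.entry = (self.fetch(), now)
        # Always answer: an expired entry is better than no response.
        return None if self.entry is None else self.entry[0]
```

A read inside the expiration window can miss a newer value at the origin, and a read during a partition returns whatever was cached last. Both are consistency violations that the cache accepts in order to stay available.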

Partially Synchronous Model

• In the real world, most networks are not purely asynchronous

• If we allow each node in the network to have a clock, it is possible to build a more powerful service

• In the partially synchronous model, every node has a clock and all clocks increase at the same rate

• However, the clocks themselves are not synchronized, in that they might display different values at the same real time

• In effect, the clocks act as timers : local state variables that the process can observe to measure how much time has passed

• A local timer can be used to schedule an action to occur a certain interval of time after some other event

• Furthermore, assume that every message is either delivered within a given, known time t_msg or it is lost

• Also, every node processes a received message within a given, known local processing time t_local

Partially Synchronous Networks : Impossibility Result

Theorem 2 : It is impossible in the partially synchronous network model to implement a read/write data object that guarantees the following properties:

• Availability

• Atomic consistency

in all executions (even those in which messages are lost)


Proof:

• The same methodology as in the case of Theorem 1 is used

• We divide the network into two components {G1, G2} and construct an admissible execution in which a write happens in one component, followed by a read operation in the other component

• This read operation can be shown to return inconsistent data


• We construct execution α1 : a single write request and acknowledgement occur in G1, and all messages between the two components {G1, G2} are lost

• Let α′2 be an execution that begins with a long interval of time during which no client requests occur

• This interval must be at least as long as the entire duration of α1

• Then append to α′2 the events of α2 in the following manner : a single read request and response in G2, assuming all messages between the two components are lost

• Finally, we construct α by superimposing the two executions α1 and α′2

• The long interval of time in α′2 ensures that the write request completes before the read request begins

• However, the read request returns the initial value, rather than the new value written by the write request, violating atomic consistency

Solutions in the Partially Synchronous Model

Corollary 1.1 (Asynchronous network model): It is impossible in the asynchronous network model to implement a read/write data object that guarantees the following properties:

• Availability - in all fair executions

• Atomic consistency - in fair executions in which no messages are lost

In the partially synchronous model, the analogue of Corollary 1.1 does not hold

• The proof of this corollary depends on nodes being unaware of when a message is lost

• There are partially synchronous algorithms that will return atomic data when all messages in an execution are delivered (i.e. there are no partitions), and will only return inconsistent data when messages are lost


• An example of such an algorithm is the centralized protocol with a single object state store, modified to time out lost messages :

• On a read (or write) request, a message is sent to the central node

• If a response from the central node is received, then the node delivers the requested data (or an acknowledgement)

• If no response is received within 2 * t_msg + t_local, then the node concludes that the message was lost

• The client is then sent a response : either the best known value of the local node (for a read operation) or an acknowledgement (for a write operation). In this case, atomic consistency may be violated
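The time-out rule above can be sketched as follows. Message delivery is simulated by passing in the round-trip time directly (or `None` for a lost message); the constants and class names are hypothetical, chosen only to make the example runnable.

```python
from typing import Optional

T_MSG = 0.05    # assumed bound on message delivery, in seconds
T_LOCAL = 0.01  # assumed bound on local processing, in seconds
TIMEOUT = 2 * T_MSG + T_LOCAL


class CentralNode:
    def __init__(self, initial: int = 0):
        self.value = initial


class LocalNode:
    def __init__(self, central: CentralNode, initial: int = 0):
        self.central = central
        self.best_known = initial

    @staticmethod
    def _delivered(round_trip: Optional[float]) -> bool:
        # None means the message was lost; a late reply counts as lost too.
        return round_trip is not None and round_trip <= TIMEOUT

    def read(self, round_trip: Optional[float]) -> int:
        if self._delivered(round_trip):
            self.best_known = self.central.value
        # On time-out this is the best known local value, possibly stale.
        return self.best_known

    def write(self, value: int, round_trip: Optional[float]) -> str:
        if self._delivered(round_trip):
            self.central.value = value
        # The client is acknowledged either way, so the node stays available.
        return "ack"
```

When every message arrives in time, the node behaves exactly like the centralized algorithm and responses are atomic. Once a message is lost, the node still answers, and a read may return an outdated value.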

Weaker Consistency Conditions

• While it is useful to guarantee that atomic data will be returned in executions in which all messages are delivered, it is equally important to specify what happens in executions in which some of the messages are lost

• We discuss a possible weaker consistency condition that allows stale data to be returned when there are partitions, yet places formal requirements on the quality of the stale data returned

• This consistency guarantee requires availability and atomic consistency in executions in which no messages are lost, and is therefore impossible to guarantee in the asynchronous model as a result of Corollary 1.1

• In the partially synchronous model it often makes sense to base guarantees on how long an algorithm has had to rectify a situation

• This consistency model ensures that if messages are delivered, then eventually some notion of atomicity is restored


• In an atomic execution, we define a partial order of the read and write operations and then require that if one operation begins after another one ends, the former does not precede the latter in the partial order

• We define a weaker guarantee, t-Connected Consistency, which defines a partial order in a similar manner, but only requires that one operation not precede another if there is an interval between the operations in which all messages are delivered


A timed execution α of a read/write object is t-Connected Consistent if two criteria hold. First, in executions in which no messages are lost, the execution is atomic. Second, in executions in which messages are lost, there exists a partial order P on the operations in α such that:

1. P orders all write operations, and orders all read operations with respect to the write operations

2. The value returned by every read operation is exactly the one written by the previous write operation in P, or the initial value if there is no such previous write in P

3. The order in P is consistent with the order of read and write requests submitted at each node

4. Assume that there exists an interval of time longer than t in which no messages are lost. Further, assume an operation θ completes before the interval begins, and another operation φ begins after the interval ends. Then φ does not precede θ in the partial order P


t-Connected Consistency

• This guarantee allows for some stale data when messages are lost, but provides a time limit on how long it takes for consistency to return once the partition heals

• This definition can be generalized to provide consistency guarantees when only some of the nodes are connected and when connections are available only some of the time


A variant of the "centralized algorithm" is t-Connected Consistent. Assume node C is the centralized node. The algorithm behaves as follows:

• read at node A : A sends a request to C for the most recent value. If A receives a response from C within time 2 * t_msg + t_local, it saves the value and returns it to the client. Otherwise, A concludes that a message was lost and returns the value with the highest sequence number that has ever been received from C, or the initial value if no value has yet been received from C. (When a client read request occurs at C, it acts like any other node, sending messages to itself)


• write at A : A sends a message to C with the new value. A waits 2 * t_msg + t_local, or until it receives an acknowledgement from C, and then sends an acknowledgement to the client. At this point, either C has learned of the new value, or a message was lost, or both events occurred. If A concludes that a message was lost, it periodically retransmits the value to C (along with all values lost during earlier write operations) until it receives an acknowledgement from C. (As in the case of read operations, when a client write request occurs at C, it acts like any other node, sending messages to itself)

• New value is received at C : C serializes the write requests that it hears about by assigning them consecutive integer tags. Periodically C broadcasts the latest value and sequence number to all other nodes
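The three rules above can be sketched in a few classes. Timers and real message passing are left out: a `link_up` flag stands in for "the reply arrived within 2 * t_msg + t_local", and `flush` and `on_broadcast` are called by hand where the real algorithm would run them periodically. All names are hypothetical.

```python
from typing import List, Tuple


class Central:
    """Node C: serializes writes with consecutive integer tags."""

    def __init__(self, initial: int = 0):
        self.seq = 0
        self.value = initial

    def receive_write(self, value: int) -> int:
        self.seq += 1
        self.value = value
        return self.seq

    def snapshot(self) -> Tuple[int, int]:
        return (self.seq, self.value)


class Node:
    def __init__(self, central: Central, initial: int = 0):
        self.central = central
        self.latest: Tuple[int, int] = (0, initial)  # highest (seq, value) seen
        self.pending: List[int] = []                 # writes not yet acknowledged

    def _learn(self, seq: int, value: int) -> None:
        if seq > self.latest[0]:
            self.latest = (seq, value)

    def read(self, link_up: bool) -> int:
        if link_up:
            self._learn(*self.central.snapshot())
        return self.latest[1]  # on time-out: highest sequence number seen so far

    def write(self, value: int, link_up: bool) -> str:
        self.pending.append(value)
        if link_up:
            self.flush()
        return "ack"  # the client is acknowledged after the time-out either way

    def flush(self) -> None:
        # Retransmit every unacknowledged value, oldest first.
        for value in self.pending:
            self.central.receive_write(value)
        self.pending.clear()

    def on_broadcast(self) -> None:
        self._learn(*self.central.snapshot())
```

A node cut off from C keeps answering reads from `latest` and queues its writes. Once the link returns, a retransmission and one broadcast bring every node back to C's order.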


Theorem 4 : The modified centralized algorithm is t-Connected Consistent

• Proof : In executions in which no messages are lost, the operations are atomic. An execution is atomic if every operation acts as if it is executed at a single instant. In this case, that single instant occurs when C processes the operation. C serializes the operations, ensuring atomic consistency in executions in which all messages are delivered

• We then examine executions in which messages are lost. The partial order P is constructed as follows. Write operations are ordered by the sequence number assigned by the central node. Each read operation is sequenced after the write operation whose value it returns. It is clear by construction that the partial order P satisfies criteria 1 and 2 of the definition of t-Connected Consistency. As the algorithm handles requests in the order received, criterion 3 is also clearly true


• In showing that the partial order respects criterion 4, there are four cases : write followed by read, write followed by write, read followed by read, and read followed by write. Let time t be long enough for a write operation to complete (and for C to assign a sequence number to the new value), and for one of the periodic broadcasts from C to occur


1. write followed by read :

• Assume a write occurs at Aw, after which an interval of time longer than t passes in which all messages are delivered. After this, a read is requested at some node. By the end of the interval, two things have happened. First, Aw has notified the central node of the new value, and the write operation has been assigned a sequence number. Second, the central node has rebroadcast that value (or a later value in the partial order) to all other nodes during one of the periodic broadcasts. As a result, the read operation does not return an earlier value, and therefore it must come after the write in the partial order P


2. write followed by write :

• Assume a write occurs at Aw, after which an interval of time longer than t passes in which all messages are delivered. After this, a write is requested at some node. As in the previous case, by the end of the interval in which messages are delivered, the central node has assigned a sequence number to the write operation at Aw. As a result, the later write operation is sequenced by the central node after the first write operation. Therefore the second write comes after the first write in the partial order P


3. read followed by read :

• Assume a read operation occurs at Br, after which an interval of time longer than t passes in which all messages are delivered. After this, a read is requested at some node. Let φ be the write operation whose value the first read operation at Br returns. By the end of the interval in which messages are delivered, the central node has assigned a sequence number to φ and has broadcast the value of φ (or a later value in the partial order) to all other nodes. As a result, the second read operation does not return a value earlier in the partial order than φ. Therefore the second read operation does not precede the first in the partial order P


4. read followed by write :

• Assume a read operation occurs at Br, after which an interval of time longer than t passes in which all messages are delivered. After this, a write is requested at some node. Let φ be the write operation whose value the first read operation at Br returns. By the end of the interval in which messages are delivered, the central node has assigned a sequence number to φ, and as a result all write operations beginning after the interval are serialized after φ. Therefore the write operation does not precede the read operation in the partial order P


• Therefore P satisfies criterion 4 of the definition, and this algorithm is t-Connected Consistent

Conclusion

• We have shown that it is impossible to reliably provide atomic, consistent data when there are partitions in the network

• It is feasible, however, to achieve any two of the three properties : consistency, availability and partition tolerance

• In an asynchronous model, when no clocks are available, the impossibility result is fairly strong : it is impossible to provide consistent data, even allowing stale data to be returned when messages are lost

• However, in partially synchronous models it is possible to achieve a practical compromise between consistency and availability
