Reasoning about data and consistency in systems

Daniel Norman
CTO, güdTECH
unba.se contributor
Twitter: @DreamingInCode

Caveat emptor!

There is no silver bullet.

TL;DR
● Systems model the physical world
● Don’t annoy the humans
● Many of our systems are global
● We have to be available 24x7x365
● Mind-bending conceptual models of systems
● A view of a plausibly modern system
● No refunds

What are we trying to achieve?

● Create a model of reality.
● Solve problems by performing computations thereon.
● Profit!

Humans have certain expectations

Humans don’t like it when weird stuff happens. They get irritated.

● Well, do I have a new message or not?
● Why didn’t that save? Oh, it did save? Arghhh!
● Why is this so slow?
● What kind of lousy product is this?
● This should always be available, time is money!

Consistency model:

“A set of all histories of operations allowable under a system” [1]

In other words:

A contract between the programmer (or agent) and the system, which provides a set of invariants to which the system will conform.

1. https://aphyr.com/posts/313-strong-consistency-models
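To make "a set of allowable histories" concrete, here is a minimal sketch of a checker for one of the simpler models, monotonic reads. The `Read` record, the integer version numbers, and the function name are assumptions made for illustration; they do not come from the talk or from any particular system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    client: str   # which client issued the read (hypothetical field)
    version: int  # version of the value the read observed (hypothetical field)


def allows_monotonic_reads(history):
    """Return True if no client observes an older version after a newer one."""
    latest_seen = {}
    for op in history:
        if op.version < latest_seen.get(op.client, 0):
            return False  # this client went "back in time"
        latest_seen[op.client] = op.version
    return True


# Client "a" only ever moves forward: this history is in the allowed set.
ok_history = [Read("a", 1), Read("b", 2), Read("a", 2)]
# Client "a" sees version 2, then version 1: this history is not allowed.
bad_history = [Read("a", 2), Read("b", 1), Read("a", 1)]
```

The model is the predicate: every history it accepts is one the system may produce, and every history it rejects is one the system promises never to produce.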

Some common consistency models

● Linearizable
● Serializable
● Sequential
● Causal
● Eventual
● PRAM
● Read Your Writes
● Repeatable Read
● Monotonic Write
● Monotonic Read
● Write Follows Reads...

Linearizable Consistency

Eventual Consistency

Causal Consistency

Reality is Causal

Hey, why not use wallclock?

I can use my system clock (AKA wallclock) to order my operations, right?

No.

Definitely NOT.

● NTP is notoriously unreliable
● Bad news: simultaneity isn’t actually a thing
● Time is actually weird and lumpy (ask a physicist)
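One common alternative to wallclock ordering is a logical clock. The sketch below is a minimal Lamport clock, offered as an illustration rather than as the approach the talk prescribes; the class and method names are hypothetical.

```python
class LamportClock:
    """A logical clock: a counter that respects message causality."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event or message send: advance and return the timestamp."""
        self.time += 1
        return self.time

    def receive(self, remote_time):
        """Merge the timestamp carried by an incoming message."""
        self.time = max(self.time, remote_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()
a.tick()
sent = a.tick()             # a sends a message stamped 3
received = b.receive(sent)  # b jumps past the sender's timestamp
reply = b.tick()            # b replies
acked = a.receive(reply)    # a jumps past the reply's timestamp
```

No system clock is consulted anywhere: a message's receipt is always stamped later than its send, regardless of how far apart the two machines' wallclocks have drifted.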

Advantages and Disadvantages

● Linearizable: Strongly consistent, single POV, may entail patience.
● Serializable: Almost a single POV, allows modest concurrency, patience still required.
● Sequential: Concurrent writers go nuts, ordering is arbitrary though, patience required for reads.
● Eventual: Concurrent writers and no patience required! It’ll get applied, no promises when.
● Causal: No patience required! No waiting for readers or writers, but no single POV either.
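The "no single POV" trade-off of causal consistency can be illustrated with vector clocks, one well-known way of tracking causality. This is a minimal sketch under assumptions: clocks are plain dicts of node name to counter, and all function names are hypothetical.

```python
def _pairs(a, b):
    """Counter pairs for every node named in either clock (missing = 0)."""
    return [(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()]


def merge(a, b):
    """Element-wise maximum of two vector clocks."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def happened_before(a, b):
    """True if a causally precedes b."""
    pairs = _pairs(a, b)
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def concurrent(a, b):
    """True if neither clock precedes the other: there is no agreed order."""
    pairs = _pairs(a, b)
    return any(x < y for x, y in pairs) and any(x > y for x, y in pairs)


w1 = {"alice": 1}             # alice writes without talking to bob
w2 = {"bob": 1}               # bob writes without talking to alice
w3 = merge(w1, w2)
w3["alice"] += 1              # alice writes again after seeing both
```

Here `w1` and `w2` are concurrent, so different observers may legitimately see them in different orders, while `w3` is ordered after both for everyone.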

What consistency models do we really use?

“Linearizability / Serializability, obviously. End of presentation.”

But actually...

But Wait!

AWS MAGIC CASTLE TECHNOLOGY TO THE RESCUE!

Not so fast – AWS is pretty good, but we must still reason about their consistency models:

● S3: Read after Write
● DynamoDB: Eventual
● SQS: Sequential or Linearizable
● Aurora / RDS: Serializable

Why do we like TCP?

● FIFO makes my life easier
● It works around packet loss
● We’re accustomed to its foibles

Why do we like Serializable RDBMS?

● Single POV makes my life easier
● A central gatekeeper helps us ignore our other consistency models
● It works well in the small scale

Why isn’t eventual consistency your final answer?

It’s nice to avoid coordination, but:

● Incompatible with the user’s worldview
● Requires ad-hoc consistency models as an overlay
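As an illustration of what an ad-hoc overlay can look like, the sketch below layers a read-your-writes guarantee on top of toy replicas that lag behind a primary. Everything here is hypothetical: the `Replica` and `Session` classes, the per-key version numbers, and the assumption of a single writer per key.

```python
class Replica:
    """A toy replica holding one versioned value per key."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def apply(self, key, version, value):
        # Keep only the newest version of each key.
        if version > self.get(key)[0]:
            self.data[key] = (version, value)

    def get(self, key):
        return self.data.get(key, (0, None))


class Session:
    """Client-side overlay: never read a version older than our own write."""

    def __init__(self, primary, read_order):
        self.primary = primary
        self.read_order = read_order
        self.last_written = {}  # key -> version this session wrote

    def write(self, key, value):
        version = self.primary.get(key)[0] + 1  # assumes a single writer
        self.primary.apply(key, version, value)
        self.last_written[key] = version

    def read(self, key):
        floor = self.last_written.get(key, 0)
        for replica in self.read_order:
            version, value = replica.get(key)
            if version >= floor:
                return value
        raise RuntimeError("no replica has caught up yet")


primary, lagging = Replica(), Replica()
session = Session(primary, read_order=[lagging, primary])
session.write("draft", "saved")

naive = lagging.get("draft")[1]   # the "Why didn't that save?" experience
guarded = session.read("draft")   # skips the stale replica
```

The overlay works, but the application now owns bookkeeping that a stronger model would have provided for free, which is the cost the slide is pointing at.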

A few scenarios:

● LieFi: TCP linearizability gone wrong.
● Errant Promotion: Asymmetries are problematic.
● Race conditions: When two systems race head to head, you lose.
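The race-condition scenario can be sketched as a lost update: two clients read the same value, then both write back. The example below replays that interleaving deterministically and contrasts it with a version-checked write. The `Register` class and its compare-and-set method are hypothetical, not the API of any specific store.

```python
class Register:
    """A single versioned value."""

    def __init__(self, value=0):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def write(self, value):
        """Blind write: overwrite whatever is there."""
        self.value = value
        self.version += 1

    def compare_and_set(self, expected_version, value):
        """Write only if nobody else has written since we read."""
        if self.version != expected_version:
            return False
        self.write(value)
        return True


# Two clients increment with blind writes: one update is lost.
blind = Register()
a_seen, _ = blind.read()
b_seen, _ = blind.read()
blind.write(a_seen + 1)
blind.write(b_seen + 1)

# Same interleaving with version checks: the loser notices and retries.
guarded = Register()
a_seen, a_ver = guarded.read()
b_seen, b_ver = guarded.read()
a_ok = guarded.compare_and_set(a_ver, a_seen + 1)
b_ok = guarded.compare_and_set(b_ver, b_seen + 1)
b_lost_race = not b_ok
if b_lost_race:
    b_seen, b_ver = guarded.read()
    b_ok = guarded.compare_and_set(b_ver, b_seen + 1)
```

After two increments the blind register holds 1 and the guarded register holds 2; the price of the correct answer is one extra round trip for the client that lost the race.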

Concurrency is either something you’re dealing with, or something you’re putting off. No exceptions.

Your system is distributed

Our limited comprehension of this complexity eventually leads to:

● Mysterious System Behaviors
● Inefficient Business Processes
● Lapses in Service
● Sadness

Parting words:

Causality is great when you can use it.

Wallclock baad!

Eventual Consistency is seductive, but problematic.

Consistency models are everywhere.

Be mindful of Linearizability / Serializability limitations.

Thank you!
