VoltDB and the Jepsen Test
VoltDB and the Jepsen test: What we learned about data
accuracy and consistency
John Hugg September 29th, 2016
@johnhugg / [email protected]
chat.voltdb.com@johnhugg
This Talk
• Intro to VoltDB (will make this quick 😀)
  Not going to explain how VoltDB works, or how to build an app
• What’s the value of consistency?
• What are the tradeoffs VoltDB made/makes to be consistent?
• Jepsen Testing Background & Results
  Not going to give a Jepsen talk the way Kyle Kingsbury does, but I will link to one
• A bit more on consistency and wrap up
What is VoltDB?
• Scale-out, clustered, SQL relational database
• Blazing, in-memory architecture with disk persistence
  Millions of ACID multi-statement transactions per second
• Strong serializable transactions by default, even at high scale
• An excellent processing engine with rich import/export functionality
• Simple operational model:
  All nodes the same. Send ops to any node. Unpack tarball to install.
  Public clouds / Private clouds / Bare metal / VMs / Containers
Use Cases
• Anything where decisions based on logic are made in real time for incoming events:
  • Policy Enforcement
  • Fraud Detection
  • Real-time personalization for Ad-Tech, Gaming, Loyalty Programs, Retail
  • Payments (Micro or otherwise)
• Anything where math / calculations are done:
  • Billing and reporting on live data
  • State tracking
Example: Telco
Mobile phone is dialed. Request sent to VoltDB to decide if it should be let through.
Single transaction looks at state and decides if this call:
• is fraudulent?
• is permitted under plan?
• has prepaid balance to cover?
State: Blacklists, Fraud Rules, Billing Info, Recent Activity for both Numbers
Export to OLAP
99.999% of txns respond in 50ms
Example: Micro Personalization
User clicks link on a website. This generates a request to VoltDB.
VoltDB transaction scans a table of rules and checks which apply to this event.
Eventually the transaction decides what to show the user next.
That decision is exported to HDFS.
Spark ML is used to look at historical data in HDFS and generate new rules.
These rules are loaded into VoltDB every few hours.
User sees personalized content.
More?
VoltDB.com
Visit our booth next to O’Reilly
Me
Email Me:
Slack
ACID vs CAP: Fight
• ACID refers to transactional consistency.
• How do multi-statement, multi-value operations read and modify data?
• CAP refers to agreement on data values between multiple replicas.
• Consistency, Availability, Partition-Tolerance
• Do all copies of this data have the same values?
ACID: 1 Transaction = 1 Event
• Atomic: Either 100% done or 0% done. No in-between.
• (Consistent)
• Isolated: Two concurrent operations can’t interfere with each other
• Durable: If it says it’s done, then it is done.
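To make "100% done or 0% done" concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than VoltDB. The accounts table, the transfer function, and the overdraft rule are all made up for illustration.

```python
import sqlite3

def make_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
    )
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Both updates above are undone together by the rollback.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A failed transfer leaves both rows exactly as they were, so the application never has to reason about a half-applied state.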
CAP Tradeoffs
In the face of unreliable networks (partitions): There are some cases where a system has to choose between inconsistent data processing and not responding at all.
• CP Systems: If the system responds, the answer is the same at all replicas.
• AP Systems: The system can respond even if it isn’t sure.
AP doesn’t imply 100% uptime or even more uptime necessarily. AP often offers knobs to pick between safety and latency / availability.
• It’s possible (and common) to be neither CP nor AP.
• CA is not a thing.
Links for Those at Home
• Disambiguating ACID and CAP (blog post) https://www.voltdb.com/blog/disambiguating-acid-and-cap
• “All In With Determinism for Performance and Testing in Distributed Systems” (talk) https://www.youtube.com/watch?v=gJRj3vJL4wE
VoltDB offers the Strongest Consistency Guarantees of Any System Anywhere
• Serializable ACID, Linearizable Operations, CP in CAP
• A conscious choice from day one at VoltDB to turn the consistency dial to eleven.
• Verified by Kyle Kingsbury at jepsen.io*
*More accurately: Jepsen failed to show it was inconsistent in version 6.4.
“Right Answers”?
• The simplest argument for consistency is that you get better answers, but that’s maybe too simple…
• There are lots of ways to take a less consistent system and get better answers.
• But most of them are more work for you, the developer. Many of them are not super efficient either.
Fewer Things Can Go Wrong
• A transaction can never partially fail => fewer bad states to worry about
• Secondary indexes and materializations are always in perfect sync
• Fewer awkward workarounds (like secondary indexes for example)
Exactly Once
• Everyone wants things to be processed and recorded exactly once.
• But distributed systems don’t care about what we want.
• In this world, bad things happen to good people.
• But there is some hope if you have strong consistency.
ACID
CP
Idempotence
Idempotence is the property of certain operations in mathematics and computer science that can be applied multiple times without changing the result beyond the initial application.
Idempotent:
• set x = 5; is the same as set x = 5; set x = 5;
• if (x % 2 == 0) x++; is the same as if (x % 2 == 0) x++; if (x % 2 == 0) x++;
• spill coffee on brown pants
Not Idempotent:
• x++; is not the same as x++; x++;
• if (x % 2 == 0) x *= 2; is not the same as if (x % 2 == 0) x *= 2; if (x % 2 == 0) x *= 2;
• eat whole plate of spaghetti
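The four code examples on this slide can be checked mechanically. The sketch below rewrites each one as a pure Python function and compares one application against two; the function names and the sample range are chosen here for illustration, and passing on a sample is evidence rather than proof.

```python
def set_to_five(x):
    return 5                            # set x = 5;

def increment(x):
    return x + 1                        # x++;

def increment_if_even(x):
    return x + 1 if x % 2 == 0 else x   # if (x % 2 == 0) x++;

def double_if_even(x):
    return x * 2 if x % 2 == 0 else x   # if (x % 2 == 0) x *= 2;

def is_idempotent(op, inputs):
    """True if applying `op` twice equals applying it once for every input."""
    return all(op(op(x)) == op(x) for x in inputs)

SAMPLE = range(-10, 11)  # arbitrary sample of starting values
```

The conditional increment is idempotent because the first application always leaves an odd number, which the second application skips. The conditional doubling is not, because doubling an even number leaves it even.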
Operation: How to Make it Idempotent
• Insert: Make it an upsert (PK required)
• Many Inserts: Transactional upserts
• Complex Conditional Logic, possibly with many writes, some to non-unique tables:
  • If it adds a unique row somewhere, check if that row exists first
  • Keep a separate table with a log of work, and always read the log first
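The "log of work" approach might look roughly like the following. This is a sketch on Python's built-in sqlite3 module, not VoltDB code; the table names, the request ID scheme, and apply_credit are hypothetical. It only works if the log check and the write happen in one transaction that is isolated from concurrent ones.

```python
import sqlite3

def make_db():
    """Create a hypothetical balances table plus a log of applied requests."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER)")
    conn.execute("CREATE TABLE work_log (request_id TEXT PRIMARY KEY)")
    conn.execute("INSERT INTO balances VALUES ('acct-1', 0)")
    conn.commit()
    return conn

def apply_credit(conn, request_id, account, amount):
    """Credit `account` at most once per `request_id`; replays are no-ops."""
    with conn:  # the log check and the writes commit (or roll back) together
        seen = conn.execute(
            "SELECT 1 FROM work_log WHERE request_id = ?", (request_id,)
        ).fetchone()
        if seen:
            return False  # already applied, so a retry changes nothing
        conn.execute(
            "UPDATE balances SET amount = amount + ? WHERE account = ?",
            (amount, account),
        )
        conn.execute("INSERT INTO work_log VALUES (?)", (request_id,))
        return True

def balance(conn, account):
    """Return the current amount for one account."""
    return conn.execute(
        "SELECT amount FROM balances WHERE account = ?", (account,)
    ).fetchone()[0]
```

A client that times out can resend the same request ID without risking a double credit, which turns a non-idempotent increment into an idempotent operation.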
Isolation
• Many “ACID” systems don’t offer (or don’t default to using) strong isolation.
• Weak isolation, like “read committed”, makes idempotency more challenging.
http://www.bailis.org/blog/when-is-acid-acid-rarely/
Low Latency Can Affect the Decision
[Chart: response latency with 500ms marked; one region is labeled “Want to be here”, the other “You lose money here”]
Many options for building consistency on top of eventually-consistent systems introduce extra latency, or at least much more variability in latency.
Get Into the “Fast Path”
• Policy Enforcement in Telco
• Instant Fraud Detection
• Change what a user sees in response to action:
• Change the next webpage content based on recent website actions.
• Pick what’s behind the magic door based on how the game is going.
Timeouts
CAP Theorem here…
• We can’t confirm the transaction succeeded to the client until all nodes confirm.
• If a node doesn’t confirm, we wait up to a specified timeout. Then the fault resolution algorithm kicks in and ejects the node (with consensus among surviving nodes).
• This means killing a node can block transactions up to the timeout value, typically seconds, not milliseconds.
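As a rough illustration of why a dead node costs seconds rather than milliseconds, here is a toy model of the wait described above. It is not VoltDB's fault resolution algorithm; the function, the node names, and the 10-second timeout are assumptions, and the time the survivors need to reach consensus is ignored.

```python
def commit_latency(ack_delays_ms, timeout_ms):
    """Return (latency_ms, ejected_nodes) for one transaction.

    ack_delays_ms maps node name -> confirmation delay in ms, or None if
    the node never answers. Toy model only.
    """
    ejected = [
        node
        for node, delay in ack_delays_ms.items()
        if delay is None or delay > timeout_ms
    ]
    if ejected:
        # The client cannot be answered until the timeout expires and the
        # silent nodes are ejected by the survivors.
        return timeout_ms, sorted(ejected)
    # Healthy case: the slowest confirmation decides the latency.
    return max(ack_delays_ms.values()), []
```

With all nodes healthy the transaction is as slow as the slowest confirmation. With one node silent it is as slow as the timeout.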
Two-Node Clusters Not Ideal
Is the other machine down?
Is the network partitioned?
3-Node clusters are better at reaching consensus.
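The two-node problem is easiest to see with a generic strict-majority rule. The sketch below is an assumption used for illustration; the slides do not spell out VoltDB's exact rule, and the node names are made up.

```python
def sides_with_majority(partition, cluster_size):
    """Return the sides of a network partition that hold a strict majority.

    `partition` is a list of sides, each side a list of node names.
    """
    return [side for side in partition if 2 * len(side) > cluster_size]
```

Splitting two nodes leaves one node on each side, and neither can tell a dead peer from a broken network. Splitting three nodes always leaves one side with two.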
Slower Cross Partition Write Ops
• If you need to verify all replicas of all involved partitions have the correct data, sometimes you are going to need to block until you get confirmations.
• There are tricks to make this better, but it will never be perfect.
High Bar for Testing / Slower Development
• Any feature we add must be vetted against our very strong consistency guarantees to users.
• Since it’s nearly impossible to prove a system like VoltDB is correct*, we are stuck trying to exhaustively find a counterexample.
• The amount of automated evil, self-verifying, randomized workloads we run nightly is getting pretty crazy. Jepsen is just a part of that.
*Tools like TLA+ are useful, but can’t verify all features and implementations practically, only subsets
What is Jepsen?
John-Speak:
Kyle Kingsbury built a tool he called Jepsen.
He uses this tool, usually customized, to break databases.
We hired him to break VoltDB.
jepsen.io
Key Jepsen Testing Thing
• We paid Kingsbury to try to break VoltDB.
• We gave him complete editorial control over the subsequent post about his findings.
• If he found issues, he was going to write about them.
• This is atypical and speaks to Kingsbury’s integrity and value as a third party validator.
How Does it Work?
Step 1: Run a Workload and LOG EVERYTHING
Step 2: Inject lots of network failures
Step 3: Run a superpowered solver on the logs, checking for any states
that contradict DB promises
Hand-drawn images were made by Kingsbury
Example Problem
Time Op Result
T0 Write(5) Success
T1 Read 5
T2 Write(6) Success
T3 Read 5
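The history in this table can be checked by a very small program. The sketch below is a toy, not Jepsen's solver: it assumes a single register, operations that never overlap in time, and writes that all succeeded. A real checker also has to handle concurrent and indeterminate operations.

```python
def find_stale_reads(history):
    """Return (time, value_read, value_expected) for every read that does
    not match the most recent successful write.

    `history` is a list of (time, op, value) tuples in time order, with op
    being "write" or "read".
    """
    latest = None
    violations = []
    for time, op, value in history:
        if op == "write":
            latest = value
        elif op == "read" and value != latest:
            violations.append((time, value, latest))
    return violations

# The example history from the slide.
HISTORY = [
    ("T0", "write", 5),
    ("T1", "read", 5),
    ("T2", "write", 6),
    ("T3", "read", 5),
]
```

Running find_stale_reads(HISTORY) flags the read at T3, which returned 5 after the write of 6 had already been confirmed.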
Fun Reading
• http://jepsen.io/analyses.html
• Most systems fail.
• How the projects respond can be interesting.
Why Jepsen for VoltDB?
• We are always hungry for tests!
• Could build a VoltDB-Jepsen harness ourselves, but… it wouldn’t be as good and wouldn’t have Kingsbury’s credibility.
• Customers have asked about it.
• Kingsbury has a built-in audience (marketing)
VoltDB Thoughts
• Our policy: Consistency or data loss bugs are blocking bugs to be prioritized above all else.
• So if Jepsen finds bugs, we need to fix them ASAP.
• The risk is that Jepsen finds bugs that we have to fix, which might impact our schedule.
• But that’s dumb. If our product has bugs, not knowing about them doesn’t make them not there.
What was found?
• Issue: Under network partitions, VoltDB allows stale and/or dirty reads in read-only transactions.
  Reproducible: Any redundant VoltDB cluster
  Fixed: 6.4
• Issue: Under network partitions, VoltDB can lose confirmed writes.
  Reproducible: Only when redundancy level > node count / 2
  Fixed: 6.4
• Issue: After a network partition, a total cluster failure, and a recovery, VoltDB can lose confirmed writes.
  Reproducible: Only when redundancy level > node count / 2
  Fixed: 6.4
What was found? (same table, annotated)
Annotation on both “Only when redundancy level > node count / 2” entries: One production deployment vulnerable
VoltDB Takeaway
Engineering Team:
• Good move for our never-ending quest to build better software.
Marketing & Perception:
• Passing Jepsen is good. People talking about VoltDB is good. Showing we care about this stuff is good.
• Having bugs is bad, but discussing and fixing issues openly and seriously can be positive.
Reproducible!
• 100% reproducible test:
  • Set up Jepsen from GitHub
  • Clone the Jepsen VoltDB driver from GitHub
  • Run!
• Can’t do this with systems you don’t control.
https://github.com/jepsen-io/voltdb
Jepsen is just one test
• Jepsen is a Key-Value test, albeit one that was extended to multiple-keys-per-transaction as part of the VoltDB work
• Doesn’t test configurations other than 5 node clusters with 5X redundancy
• Doesn’t test SQL, which can be much more complex, with many unpredictable writes per test
• Doesn’t test materialized views, Kafka importers, ElasticSearch exporters, cross-datacenter, windowing functions, complex stored procedures, etc…
Links for Those at Home
• VoltDB 6.4 Passes Jepsen Testing https://www.voltdb.com/jepsen
• How We Test at VoltDB (blog post pre-Jepsen) https://www.voltdb.com/blog/how-we-test-voltdb
• Testing VoltDB Against PostgreSQL https://www.voltdb.com/blog/testing-voltdb-against-postgresql
• Testing at VoltDB: SQLCoverage https://www.voltdb.com/blog/testing-voltdb-sqlcoverage
• “All In With Determinism for Performance and Testing in Distributed Systems” (talk) https://www.youtube.com/watch?v=gJRj3vJL4wE
#speedoflightfail
• VoltDB-style CAP+ACID consistency across the globe would mean latencies of 100ms or more.
• For some apps, this is ok, but for many it’s very challenging.
• VoltDB offers Eventual-Consistency-style tools for dealing with multiple datacenter deployments.*
☹
“speed of light” not “speedo flight”
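A back-of-the-envelope calculation shows where a figure like 100ms comes from. The distance, the speed of light in fiber, and the number of round trips per commit are all assumptions made for this sketch; real cable routes are longer than the great-circle distance.

```python
NYC_LONDON_KM = 5_570      # approximate great-circle distance (assumption)
FIBER_KM_PER_S = 200_000   # light in optical fiber, roughly 2/3 of c (assumption)

def round_trip_ms(distance_km, speed_km_per_s=FIBER_KM_PER_S):
    """Best-case network round trip in milliseconds."""
    return 2 * distance_km / speed_km_per_s * 1000

def commit_floor_ms(distance_km, round_trips):
    """Lower bound on commit latency if a commit needs `round_trips`
    sequential cross-site round trips (hypothetical protocol parameter)."""
    return round_trips * round_trip_ms(distance_km)
```

Under these assumptions one New York to London round trip is already more than 55ms, so any protocol that needs two sequential round trips per commit cannot get under 100ms, no matter how fast the software is.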
Example: Telco (Revisited)
Mobile phone is dialed. Request sent to VoltDB to decide if it should be let through.
Single transaction looks at state and decides if this call:
• is fraudulent?
• is permitted under plan?
• has prepaid balance to cover?
State: Blacklists, Fraud Rules, Billing Info, Recent Activity for both Numbers
Export to OLAP
99.999% of txns respond in 50ms
Islands of Consistency
• New York VoltDB: Strong-Serializable ACID + CP
• London VoltDB: Strong-Serializable ACID + CP
• Between clusters: 100ms Async Replication
• Boston User: 20ms Latency, NYC Home
• Glasgow User: 20ms Latency, London Home
Islands of Consistency
• New York VoltDB: Strong-Serializable ACID + CP
• London VoltDB: Strong-Serializable ACID + CP
• Between clusters: 100ms Async Replication
• Boston User: 20ms Latency, NYC Home
• Client Migrates Home: Takes > 100ms
• Conflicts Extremely Rare
Local Consistency > None
• Still get full functionality locally.
• Sends only committed transactions and applies them atomically on the peer clusters.
• Putting some smarts in the client makes conflicts extremely rare.
• This requires more planning and engineering work than a single datacenter solution.
Spoiler: Any complex distributed application is going to require lots of
planning and engineering work
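One way to read "smarts in the client" is home-cluster routing: every user has one home cluster, and all of that user's writes go there. The sketch below is an illustration under that assumption, not VoltDB's cross-datacenter implementation; the user names, the route helper, and the conflict rule are hypothetical, and the 100ms lag is taken from the replication figure on the earlier slide.

```python
REPLICATION_LAG_MS = 100  # async replication delay between clusters

def route(user, home):
    """Send every request for `user` to that user's home cluster."""
    return home[user]

def may_conflict(w1, w2, lag_ms=REPLICATION_LAG_MS):
    """Writes are (key, cluster, time_ms) tuples.

    In this toy model a conflict is possible only when the same key is
    written at two different clusters before replication has caught up.
    """
    (k1, c1, t1), (k2, c2, t2) = w1, w2
    return k1 == k2 and c1 != c2 and abs(t1 - t2) < lag_ms
```

If all writes for a key are routed to one home, the second condition never holds. It can only hold during a home migration, and only if the migration completes faster than the replication lag.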
No VoltDB?
Dynamo-Based Eventual Consistency
NYC London
(1)
Dynamo-Based Eventual Consistency
Dynamo-Based Eventual Consistency
Dynamo-Based Eventual Consistency (2)
Consistent System generating packaged events
Consistent System consuming packaged events
Kafka or Similar (3)