Post on 25-May-2015
The Power of Determinism in Database Systems
Daniel J. Abadi
Yale University
(Joint work with Jose Faleiro, Kun Ren, and Alex Thomson)
Database Systems Are Great
• Protects a dataset from corruption or deletion in the face of media, system, or program crashes
• Allows programs to change state of data in arbitrary ways
• Allows 1000s of such programs to run concurrently
  – Guarantees atomicity and isolation of such programs
• Has served as blueprint for many concurrent, highly complex systems
But …
• Design is incredibly complex
  – Takes $17 million to build a new one
• Components are horribly monolithic
• Corner case bugs nearly impossible to reproduce
• Does not scale horizontally
• Does not scale horizontally (seriously)
Should the DBMS architecture really be a blueprint for concurrent system design?
Nondeterminism is the problem
• Building on top of:
  – OSes that enable threads to be scheduled arbitrarily
  – Networks that deliver messages with arbitrary delays (and sometimes in arbitrary orders)
  – Hardware that can fail arbitrarily
• Only natural to allow the state of the database to be dependent on these nondeterministic events
Nondeterminism is the problem
• OS non-deterministic thread scheduling leads to:
  – Arbitrary transaction interleaving
  – Deadlocks
  – Difficult-to-reproduce bugs
  – Tight interactions between lock manager, recovery manager, access manager, and transaction manager
• Hardware failures and message delivery delays result in transaction aborts
  – Need complicated recovery manager to handle half-completed transactions
  – Need commit protocol for distributed transactions
How to eliminate nondeterminism?
• There exist proposals for:
  – Deterministic operating systems
  – (Somewhat) deterministic networking layers
  – Highly redundant and reliable hardware
• Maybe one day those proposals will come with fewer disadvantages
• In the meantime, we have to create determinism from nondeterministic components
  – Pick and choose what we make deterministic
Possible determinism levels
• Given an input and initial state of the database system, to get to one and only one possible final state:
  – Level 1: System always runs the same sequence of instructions
  – Level 2: System always proceeds through the same sequence of states of the database
  – Level 3: Database is allowed to proceed through states in any order as long as the final state of all external and internal data structures is determined by the input
  – Level 4: Database is allowed to proceed through states in any order as long as the final state of all external structures is determined by the input
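As a tiny illustration of the weaker levels, the sketch below (hypothetical code, not from the talk) shows two executions that pass through different intermediate states yet reach the same final state, which is all that levels 3 and 4 require:

```python
def execute(schedule):
    """Apply a sequence of single-key writes in the given order."""
    state = {}
    for key, val in schedule:
        state[key] = val  # intermediate states differ across schedules
    return state

# Same transactions, different interleavings: level 2 (identical state
# sequence) is violated, but the final state is still determined by the
# input, satisfying levels 3 and 4.
s1 = execute([("a", 1), ("b", 2)])
s2 = execute([("b", 2), ("a", 1)])
assert s1 == s2 == {"a": 1, "b": 2}
```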
Database Systems Problems
• Design is incredibly complex
  – Takes $17 million to build a new one
• Components are horribly monolithic
• Corner case bugs nearly impossible to reproduce
• Does not scale horizontally
• Does not scale horizontally
LEVEL 4 DETERMINISM HELPS WITH ALL OF THESE
Recovery
• Brain-dead version:
  – Log all input to the system
  – Upon a failure, trash the entire database, replay input log from the beginning
• Less brain-dead version:
  – Create checkpoints of database state as of some point in the input log
  – Upon a failure, trash the entire database, load checkpoint, replay input log from point where checkpoint was taken
• Note that logging can happen entirely externally to the DBMS
• Same is true for checkpointing, although may want to perform it inside the DBMS for performance
  – Even in this case, it needs very little knowledge about other components
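The two replay schemes above can be sketched in a few lines; `apply_txn` and the dict-based state are hypothetical stand-ins for the real storage engine:

```python
def recover(input_log, apply_txn, checkpoint=None):
    """Deterministic recovery by log replay.

    checkpoint: optional (state, log_position) pair; None means the
    'brain-dead' variant that trashes everything and replays from
    the beginning of the input log.
    apply_txn: a deterministic function (state, txn) -> state, so
    replaying the same log always yields the same final state.
    """
    if checkpoint is None:
        state, start = {}, 0           # replay the whole log
    else:
        state, start = checkpoint      # load checkpoint, resume mid-log
    for txn in input_log[start:]:      # replay in original log order
        state = apply_txn(state, txn)
    return state
```

As the slides note, capturing the input log can happen entirely outside the DBMS; only `apply_txn` needs to touch database internals.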
Replication
• Send the same input log to replica DBMS
  – User-visible state in replicas will not diverge
  – Can happen entirely externally to the DBMS
Horizontal Scalability
• Active distributed xacts not aborted upon node failure
  – Greatly reduces (or eliminates) cost of distributed commit
    • Don't have to worry about nodes failing during commit protocol
    • Don't have to worry about effects of the transaction making it to disk before promising to commit the transaction
• Just need one message from any node that can potentially deterministically abort the xact
  – This message can be sent in the middle of the xact, as soon as the node knows it will commit
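A minimal sketch of that commit rule (hypothetical names; the real protocol is more involved): each node whose logic could deterministically abort the transaction sends exactly one vote, as soon as it knows, and the transaction commits iff nobody aborts:

```python
def commit_decision(txn, abort_checks):
    """Single-round commit check, in contrast to two-phase commit.

    abort_checks: node_id -> deterministic predicate over the txn
    input (e.g. a constraint check). Because each predicate depends
    only on the txn input and deterministic state, a node failure
    cannot change its vote: after recovery, replaying the same input
    reproduces the same decision, so no prepare/ack round is needed.
    """
    votes = {node: check(txn) for node, check in abort_checks.items()}
    return all(votes.values())

# Example: a transfer commits only if the debited account stays >= 0.
checks = {
    "node1": lambda t: t["src_balance"] + t["delta"] >= 0,
    "node2": lambda t: True,  # node2 has no abort logic for this txn
}
```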
One Way to Implement Determinism
• Use a preprocessor to handle client communications, and create a log of submitted xacts
• Send log in batches to DBMS
• Every xact immediately requests all locks it will need (in order of log)
• If it doesn't know what it will need:
  – Run enough of the xact to find out, but do not change the database state
  – Reissue xact to the preprocessor with lock requirements included as a parameter
  – Run enough of the new xact to find out if it locked the correct items (database state might have changed in the meantime)
    • If so, then xact can proceed as normal
    • If not, reissue again to the preprocessor and repeat as necessary
• Trivial to prove this is deterministic and deadlock-free
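The in-order lock acquisition above is easy to sketch. This `OrderedLockManager` is a hypothetical simplification (single lock mode, no reissue logic), but it shows why deadlock is impossible: every waits-for edge points from a later log position to an earlier one, so no cycle can form:

```python
from collections import defaultdict, deque

class OrderedLockManager:
    """Grant each transaction's full lock set in log order."""

    def __init__(self):
        # key -> txn ids waiting for that key, in log order
        self.queues = defaultdict(deque)

    def request_all(self, txn_id, keys):
        # Called strictly in log order, before any later txn requests
        # anything, so every queue's order matches log order.
        for k in keys:
            self.queues[k].append(txn_id)

    def can_run(self, txn_id, keys):
        # A txn may execute once it heads every queue it needs.
        return all(self.queues[k][0] == txn_id for k in keys)

    def release_all(self, txn_id, keys):
        for k in keys:
            assert self.queues[k][0] == txn_id
            self.queues[k].popleft()
```

With transactions T1 (locks a, b) and T2 (locks b, c) logged in that order, T2 simply waits on b until T1 releases; it can never hold something T1 needs, so the schedule is deadlock-free.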
What’s the Downside?
• Increased latency to log input transactions and send to the DBMS in batches
• No flexibility for the system to abort transactions on a whim
• Can’t reorder transaction execution if one xact stalls mid-transaction
• Need to determine what will be locked in advance
Additional Upside
• Our implementation eliminates deadlocks
  – Distributed deadlock is a major problem for distributed DBMSs
• Lock manager totally separate from the rest of DBMS
  – Increases modularity of the system
Experimental Evaluation
• Experiments conducted on Amazon EC2 using m3.2xlarge (Double Extra Large) instances
• Cluster of 8 nodes
• TPC-C
• Microbenchmark:
  – 10 read-modify-write (RMW) actions
  – 10 RMW actions + CPU computation
TPC-C
Microbenchmark Experiments (Long xacts)
[Figure: transactions per second per node vs. % distributed transactions. Series: Deterministic and Nondeterministic (with and without 2PC), each at high and low contention.]
Microbenchmark Experiments (Short xacts)
[Figure: transactions per second per node vs. % distributed transactions. Series: Deterministic, Nondeterministic, and Deterministic w/ VLL, each at high and low contention.]
Resource Constraints Experiments
Dependent Transactions Experiments
(a) 0% distributed transactions; (b) 100% distributed transactions
Latency CDF
More information
• The Case for Determinism in Database Systems. Alexander Thomson and Daniel J. Abadi. In PVLDB, 3(1), 2010.
• Calvin: Fast Distributed Transactions for Partitioned Database Systems. Alexander Thomson, Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao, and Daniel J. Abadi. In Proceedings of SIGMOD, 2012.
• An Evaluation of the Advantages and Disadvantages of Deterministic Database Systems. Kun Ren, Alexander Thomson, and Daniel J. Abadi. In PVLDB, 7(10), 2014.
• Modularity and Scalability in Calvin. Alexander Thomson and Daniel J. Abadi. In IEEE Data Eng. Bull., 36(2): 48-55, 2013.
• Lightweight Locking for Main Memory Database Systems. Kun Ren, Alexander Thomson, and Daniel J. Abadi. In PVLDB, 6(2): 145-156, 2012.
Conclusions
• Determinism not a good fit for latency-sensitive applications
• Fewer options to deal with node overload (true only for lock-based implementation)
• Much improved throughput for distributed transactions
• Much simpler design: recovery manager and lock manager totally separate from rest of DBMS
• Replication is trivial