Lampson and Lomet’s Paper: A New Presumed Commit Optimization for Two Phase Commit
Doug Cha
COEN 317, SCU Spring 05
About the authors
Butler Lampson
  Currently at MSFT
  Formerly at Xerox PARC, DEC research, and a professor at MIT and Berkeley
  ACM Turing Award in 1992
David Lomet
  Also at MSFT, formerly at DEC research
  Key work on database systems
  One of the inventors of the transaction concept
  ACM Fellow
Outline
Review of 2PC
More on 2PC / Optimizations
  Presumed Nothing
  Presumed Abort
  Presumed Commit
Recovery requirements
The new PrC protocol
Summary
Review of 2PC
Distributed Atomic Commit problem (DC9 p2)
  How to get all members of a group to commit/abort together?
Two Phase Commit, Gray 1978 (DC9 p3):
  First phase is the voting phase
    Coordinator sends all participants (cohorts) a vote request (PREPARE)
    All participants (cohorts) respond COMMIT-VOTE or ABORT-VOTE
  Second phase, coordinator decides commit or abort: if any participant voted ABORT, then the decision must be abort. Otherwise, commit.
    Coordinator sends all participants the decision (COMMIT or ABORT)
    Participants (who have been waiting for the decision) commit or abort as instructed and ACK.
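The second-phase decision rule can be sketched in a few lines of Python. This is only an illustration: the function name `decide` and the dictionary of votes are assumptions made for the example, and timeouts or missing votes are left out.

```python
COMMIT_VOTE = "COMMIT-VOTE"
ABORT_VOTE = "ABORT-VOTE"


def decide(votes):
    """Return the coordinator's decision for one transaction.

    `votes` maps a cohort name (hypothetical) to the vote it sent back.
    Any ABORT-VOTE forces an abort; otherwise the decision is commit.
    """
    if any(vote == ABORT_VOTE for vote in votes.values()):
        return "ABORT"
    return "COMMIT"


if __name__ == "__main__":
    print(decide({"c1": COMMIT_VOTE, "c2": COMMIT_VOTE}))
    print(decide({"c1": COMMIT_VOTE, "c2": ABORT_VOTE}))
```

A single ABORT-VOTE is enough to abort the whole group, which is what makes the outcome atomic across cohorts.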
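2 Phase Commit
[Diagram: message sequence between Coordinator and Cohort]
Coordinator -> Cohort: PREPARE
Cohort: make vote
Cohort -> Coordinator: COMMIT-VOTE
Coordinator: <<collect all votes>>
Coordinator -> Cohort: COMMIT
Cohort: execute commit
Cohort -> Coordinator: ACK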
Additional Detail – A protocol database at the coordinator stores transaction states and cohort votes. This is used for error recovery.
2PC Variations
Presumed Nothing (PrN)
Presumed Abort (PrA)
Presumed Commit (PrC)
The variations differ in how recovery is handled and in how recovery data is logged.
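Presumed Nothing (PrN)
[Diagram: message sequence between Coordinator and Cohort]
Coordinator -> Cohort: PREPARE
Cohort: make vote
Cohort -> Coordinator: COMMIT-VOTE or ABORT-VOTE
Coordinator: <<collect all votes>>
Coordinator: forced record
Coordinator -> Cohort: COMMIT or ABORT
Cohort: execute commit
Cohort -> Coordinator: ACK
Coordinator: <<collect all acks>>, record ACK
Coordinator: remove record
Coordinator cost: 1 forced write, 1 lazy write, 2 messages to cohort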
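PrN Failure Recovery
[Diagram: message sequence between Coordinator and Cohort]
Coordinator -> Cohort: PREPARE
Cohort: make vote
Cohort -> Coordinator: COMMIT-VOTE
Coordinator: crash
Cohort: timeout
Cohort -> Coordinator: STATUS?
Coordinator: no record
Coordinator -> Cohort: ABORT
In PrN nothing is recorded until a COMMIT is sent, so a coordinator crash results in ABORT.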
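PrA Optimization
[Diagram: message sequence between Coordinator and Cohort]
Coordinator -> Cohort: PREPARE
Cohort: make vote
Cohort -> Coordinator: ABORT-VOTE
Coordinator -> Cohort: ABORT (no record)
crash
Cohort (on recovery) -> Coordinator: STATUS?
Coordinator: no record
Coordinator -> Cohort: ABORT
On an ABORT, there are no log records and no ACK. This works because we “presume an abort” if no record exists!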
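Presumed Commit (PrC) - COMMIT
[Diagram: message sequence between Coordinator and Cohort]
Coordinator: forced record
Coordinator -> Cohort: PREPARE
Cohort: make vote
Cohort -> Coordinator: COMMIT-VOTE
Coordinator: <<collect all votes>>
Coordinator: forced remove record
Coordinator -> Cohort: COMMIT
Cohort doesn’t need to send ACK
crash
Cohort (on recovery) -> Coordinator: STATUS?
Coordinator: no record
Coordinator -> Cohort: COMMIT
Coordinator cost: 2 forced writes, 2 messages to cohort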
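Presumed Commit (PrC) - ABORT
[Diagram: message sequence between Coordinator and Cohort]
Coordinator: forced record
Coordinator -> Cohort: PREPARE
Cohort: make vote
Cohort -> Coordinator: ABORT-VOTE
Coordinator -> Cohort: ABORT
Cohort: execute abort
Cohort -> Coordinator: ACK
Coordinator: <<collect all acks>>, remove record
ACK only needed on ABORTs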
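The three variants differ mainly in what the coordinator answers when it has no record of a transaction. A minimal sketch of that rule follows; the names `PRESUMPTION` and `answer_status`, and the idea of a `log` dictionary holding the outcome to report for each recorded transaction, are assumptions for illustration only.

```python
# Outcome reported when the coordinator holds no record of the transaction.
PRESUMPTION = {
    "PrN": "ABORT",   # per the slides: no record at the coordinator means abort
    "PrA": "ABORT",   # "presume an abort" if no record exists
    "PrC": "COMMIT",  # a missing record is read as committed
}


def answer_status(variant, log, tid):
    """Answer a cohort's STATUS? inquiry for transaction `tid`.

    `log` (hypothetical) maps tid -> outcome to report for transactions the
    coordinator still has a record of. Anything else gets the presumption.
    """
    if tid in log:
        return log[tid]
    return PRESUMPTION[variant]


if __name__ == "__main__":
    # PrC: transaction 7 has a PREPARE record but no end, so it is aborted.
    prc_log = {7: "ABORT"}
    print(answer_status("PrC", prc_log, 7))
    print(answer_status("PrC", prc_log, 8))
    print(answer_status("PrA", {}, 8))
```

The presumption decides which outcome can skip logging and ACKs: aborts under PrA, commits under PrC.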
Comparison For Now
2PC Variant   Coordinator                                       Cohort
PrN           2 log records, 1 forced, 2 messages to Cohort     2 log records, 2 forced, 2 messages to Coordinator
PrA           2 log records, 1 forced, 2 messages to Cohort     2 log records, 2 forced, 2 messages to Coordinator
PrC           2 log records, 2 forced, 2 messages to Cohort     2 log records, 1 forced, 1 message to Coordinator
Improving PrC
Messaging is low already, try to reduce forced log writes.
In PrC a forced write happens at PREPARE
  Any transactions with a PREPARE, but no transaction end, are aborted
  Non-existence of a transaction record assumes commit
To remove the forced PREPARE write, we need to:
  Find another way to identify transactions that may have started before the crash but did not finish
  Keep these transaction records around so we know to abort them (since we are still presuming commits)
Improving PrC
Instead of recording transaction initiation, record timestamps:
  tid_l: lowest possible time of an undocumented transaction
  tid_h: most recent undocumented transaction
  tid_sta: most recent record of a transaction
So we have:
  REC = { tid | tid_l < tid < tid_h } = recent transactions
  COM = committed and stable transactions
  IN = REC - COM = transactions that may have been active during the crash
On recovery:
  Cohorts asking the status of a transaction assume commit unless the record exists in the IN set
  The IN set must be stored forever! (But data size is small)
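The set definitions translate directly into code. The sketch below assumes integer transaction ids so that the window between tid_l and tid_h can be enumerated; the function names are made up for the example.

```python
def in_set(tid_l, tid_h, com):
    """IN = REC - COM, with REC = {tid | tid_l < tid < tid_h}.

    Assumes integer tids (an assumption of this sketch).
    """
    rec = set(range(tid_l + 1, tid_h))
    return rec - set(com)


def status_after_recovery(tid, tid_l, tid_h, com):
    """Presume commit unless the transaction is in the IN set."""
    return "ABORT" if tid in in_set(tid_l, tid_h, com) else "COMMIT"


if __name__ == "__main__":
    tid_l, tid_h, com = 10, 15, {11, 13}
    print(sorted(in_set(tid_l, tid_h, com)))             # [12, 14]
    print(status_after_recovery(12, tid_l, tid_h, com))  # ABORT
    print(status_after_recovery(11, tid_l, tid_h, com))  # COMMIT
    print(status_after_recovery(9, tid_l, tid_h, com))   # COMMIT
```

Transactions 12 and 14 fall inside the window without a commit record, so they are the ones treated as possibly active at the crash and answered with ABORT.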
[Figure: the transaction log along a time axis, marked with tid_l, tid_h, and tid_sta. Labeled regions: committed or aborted transactions; window of active/undocumented transactions (REC); unused space.]
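The New PrC Protocol ABORT
[Diagram: message sequence between Coordinator and Cohort]
Coordinator: IN range of tids contains this transaction (tid_l < tid < tid_h)
Coordinator -> Cohort: PREPARE
Cohort: make vote
Cohort -> Coordinator: ABORT-VOTE
Coordinator -> Cohort: ABORT
Cohort: abort
Cohort -> Coordinator: ACK
Coordinator: <<collect all acks>>
Coordinator: increase tid_l past this transaction, so the IN set no longer includes it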
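The New PrC Protocol COMMIT
[Diagram: message sequence between Coordinator and Cohort]
Coordinator: IN range of tids contains this transaction (tid_l < tid < tid_h)
Coordinator -> Cohort: PREPARE
Cohort: make vote
Cohort -> Coordinator: COMMIT-VOTE
Coordinator -> Cohort: COMMIT
Coordinator: move tid_l past this transaction
crash / recovery
Cohort -> Coordinator: STATUS?
Coordinator: no transaction record in IN, so commit
Coordinator -> Cohort: COMMIT
Other labels on the slide whose position is unclear in the transcript: ACK, abort, <<Collect all acks>>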
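The New PrC Protocol ABORT/CRASH
[Diagram: message sequence between Coordinator and Cohort]
Coordinator: IN range of tids contains this transaction
Coordinator -> Cohort: PREPARE
Cohort: make vote
Cohort -> Coordinator: ABORT-VOTE
Coordinator -> Cohort: ABORT
Cohort: abort
crash
Cohort (on recovery) -> Coordinator: STATUS?
Coordinator: transaction is still in the IN set, so we send abort
Coordinator -> Cohort: ABORT
Cohort -> Coordinator: ACK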
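The way tid_l moves in these three flows can be illustrated with a toy coordinator. Everything here is an assumption made for the sketch: integer tids, the class and method names, and the choice to advance tid_l only over a contiguous run of resolved transactions. It is not the paper's algorithm, only a way to see why an abort needs ACKs before tid_l may pass it.

```python
class NewPrCCoordinator:
    """Toy coordinator state for the tid_l / tid_h window (hypothetical names)."""

    def __init__(self, tid_l, tid_h):
        self.tid_l = tid_l     # every tid <= tid_l is resolved
        self.tid_h = tid_h     # every tid handed out so far is < tid_h
        self.com = set()       # committed tids above tid_l
        self.resolved = set()  # committed, or aborted with all ACKs collected

    def _advance(self):
        # Slide tid_l upward over a contiguous run of resolved transactions.
        while (self.tid_l + 1) in self.resolved:
            self.tid_l += 1
            self.resolved.discard(self.tid_l)
            self.com.discard(self.tid_l)

    def commit(self, tid):
        self.com.add(tid)
        self.resolved.add(tid)
        self._advance()

    def abort_acked(self, tid):
        # Call only once every cohort has ACKed the ABORT: after tid_l
        # passes this tid, an inquiry about it would be answered COMMIT.
        self.resolved.add(tid)
        self._advance()

    def status(self, tid):
        in_window = self.tid_l < tid < self.tid_h
        return "ABORT" if in_window and tid not in self.com else "COMMIT"


if __name__ == "__main__":
    c = NewPrCCoordinator(tid_l=10, tid_h=15)
    print(c.status(12))   # still in the IN set
    c.abort_acked(12)     # 11 is unresolved, so tid_l cannot move yet
    c.commit(11)          # now tid_l can pass both 11 and 12
    print(c.tid_l, c.status(11), c.status(13))
```

In the ABORT/CRASH flow the ACK never arrives before the crash, so `abort_acked` is not called, the transaction stays in the window, and `status` keeps answering ABORT.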
Analysis of New PrC Protocol
We reduce the number of forced writes but require permanent storage of IN records
2PC Variant   Coordinator                                       Cohort
PrC           2 log records, 2 forced, 2 messages to Cohort     2 log records, 1 forced, 1 message to Coordinator
New PrC       1 log record, 1 forced, 2 messages to Cohort      2 log records, 1 forced, 1 message to Coordinator
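To compare the variants at a glance, the per-role figures from the two tables in these slides can be added up. The sketch below simply copies those numbers; the `COSTS` layout and the `totals` helper are assumptions for illustration, and the sums cover one coordinator and one cohort as the tables list them.

```python
# (log records, forced writes, messages sent) per role, copied from the tables.
COSTS = {
    "PrN":     {"coordinator": (2, 1, 2), "cohort": (2, 2, 2)},
    "PrA":     {"coordinator": (2, 1, 2), "cohort": (2, 2, 2)},
    "PrC":     {"coordinator": (2, 2, 2), "cohort": (2, 1, 1)},
    "New PrC": {"coordinator": (1, 1, 2), "cohort": (2, 1, 1)},
}


def totals(variant):
    """Sum coordinator and cohort costs for one variant."""
    coord = COSTS[variant]["coordinator"]
    cohort = COSTS[variant]["cohort"]
    return tuple(a + b for a, b in zip(coord, cohort))


if __name__ == "__main__":
    for name in COSTS:
        records, forced, messages = totals(name)
        print(f"{name:8} records={records} forced={forced} messages={messages}")
```

On these numbers New PrC totals 2 forced writes against 3 for each of the other variants, while keeping PrC's lower message count.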
Summary
Two-Phase Commit
  Presumed Nothing
  Presumed Abort
  Presumed Commit
Requirements for logging/recovery
New Presumed Commit
References
A New Presumed Commit Optimization for Two Phase Commit. Lampson and Lomet, 1993.
Distributed Systems: Concepts and Design. Coulouris, Dollimore, Kindberg.
Santa Clara Univ, COEN 317 class notes. Holliday.