CS 347: Distributed Databases and Transaction...
Transcript of CS 347: Distributed Databases and Transaction...
![Page 1: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/1.jpg)
CS 347 Notes07 1
CS 347: Distributed Databases and
Transaction ProcessingNotes07: Reliable
DistributedDatabase Management
Hector Garcia-Molina
![Page 2: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/2.jpg)
CS 347 Notes07 2
Reliable distributed database management
• Reliability• Failure models• Scenarios
![Page 3: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/3.jpg)
CS 347 Notes07 3
Reliability
• Correctness– Serializability– Atomicity– Persistence
• Availability
![Page 4: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/4.jpg)
CS 347 Notes07 4
Types of failures
• Processor failures– Halt, delay, restart, bezerk, ...
• Storage failures– Volatile, non-volatile, atomic write,
transient errors, spontaneous failures
• Network failures– Lost message, out-of-order messages,
partitions, bounded delay
![Page 5: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/5.jpg)
CS 347 Notes07 5
• Malevolent failures• Multiple failures• Detectable failures
More Types of failures
![Page 6: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/6.jpg)
CS 347 Notes07 6
Failure models
• Cannot protect against everything• Unlikely failures (e.g., flooding in the Sahara)
• Expensive to protect failures (e.g., earthquake)
• Failures we know how to protect against (e.g., message sequence numbers; stable storage)
![Page 7: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/7.jpg)
CS 347 Notes07 7
Failure model:
DesiredEvents
ExpectedUndesired
Unexpected
![Page 8: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/8.jpg)
CS 347 Notes07 8
Node models
(1) Fail-stop nodestime
perfect halted recoveryperfect
Volatilememory lost
Stablestorage ok
![Page 9: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/9.jpg)
CS 347 Notes07 9
(2) Byzantine nodesA Perfect
Perfect Arbitrary failure RecoveryB
C
At any given time, at most some fraction f of nodesfailed (typically f < 1/2 or f < 1/3)
Node models
![Page 10: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/10.jpg)
CS 347 Notes07 10
Network models
(1) Reliable network- in order messages- no spontaneous messages- timeout TD
I.e., no lost messages, except for node failures
If no ack in TD sec.Destination down
(not paused)
![Page 11: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/11.jpg)
CS 347 Notes07 11
Variation of reliable net:
• Persistent messages– If destination down, net will
eventually deliver message– Simplifies node recovery, but leads to
inefficiencies (hides too much)– Not considered here
![Page 12: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/12.jpg)
CS 347 Notes07 12
Network models
(2) Partitionable network- In order messages- No spontaneous messages
- no timeout; nodes can have different view of failures
![Page 13: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/13.jpg)
CS 347 Notes07 13
Scenarios
• Reliable network– Fail-stop nodes
– No data replication (1)– Data replication (2)
• Partitionable network– Fail-stop nodes (3)
![Page 14: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/14.jpg)
CS 347 Notes07 14
No Data Replication
• Reliable network, fail-stop nodes
• Basic idea: node Pα controls X
Pα Item Xnet
![Page 15: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/15.jpg)
CS 347 Notes07 15
No Data Replication
• Reliable network, fail-stop nodes
• Basic idea: node Pα controls X
Pα Item Xnet
- Single control point simplifies concurrency control, recovery
- Not an availability hit: if Pα down, X unavailable too!
![Page 16: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/16.jpg)
CS 347 Notes07 16
“Pα controls X” means- Pα does concurrency control for X- Pα does recovery for X
![Page 17: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/17.jpg)
CS 347 Notes07 17
Say transaction T wants to access X:
PT is process that represents T at this node
PT
req
Local DMBS
Lockmgr
LOG X
![Page 18: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/18.jpg)
CS 347 Notes07 18
Process models
(A) Cohorts Spawn process
CommunicationData Access
USER
T1
LocalDMBS
T2
LocalDMBS
T3
LocalDMBS
![Page 19: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/19.jpg)
CS 347 Notes07 19
Process models
(B) Transaction servers (manager)USER
LocalDMBS
TransMGR
LocalDMBS
TransMGR
LocalDMBS
TransMGR
![Page 20: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/20.jpg)
CS 347 Notes07 20
• Cohorts: application code responsible for remote access
• Transaction manager: “system” handles distribution, remote access
![Page 21: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/21.jpg)
CS 347 Notes07 21
.
Distributed commit problem
Action:a1,a2
Action:a3
Action:a4,a5
Transaction T
![Page 22: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/22.jpg)
CS 347 Notes07 22
Centralized two-phase commit
Coordinator Participant
I
W
C
A
I
W
C
A
go exec*
nok abort*
ok* commit *
commit -
exec ok
exec nok
abort -
![Page 23: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/23.jpg)
CS 347 Notes07 23
• Notation: Incoming message Outgoing message( * = everyone)
• When participant enters “W” state:– it must have acquired all resources
– it can only abort or commit if so instructedby a coordinator
• Coordinator only enters “C” state if all participants are in “W”, i.e., it is certain that all will eventually commit
![Page 24: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/24.jpg)
CS 347 Notes07 24
Handling node failures
• Coordinator and participant logs are used to reconstruct state before failure
![Page 25: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/25.jpg)
CS 347 Notes07 25
-> Example: after participant fails: Log:
T1X
undo/redoinfo
T1Y
info... ...
T1“W”state
![Page 26: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/26.jpg)
CS 347 Notes07 26
At recovery:
• T1 is in “W” state• Obtain X,Y write locks (no read locks!)
• Wait for message from coordinator(or ask coordinator for outcome)
![Page 27: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/27.jpg)
CS 347 Notes07 27
Other examples:
• No “W” record on log ⇒ abort T1
• See “C” record on log ⇒ finish T1
![Page 28: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/28.jpg)
CS 347 Notes07 28
• Add timeouts to cope with messages lost during crashes
• Add finish (“F”) state for coordinator – all done, can forget outcome
Next
![Page 29: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/29.jpg)
CS 347 Notes07 29
Coordinator
I
W
C
A
F
_go_exec*
c-ok*-
nok*-
nokabort*
ok*commit*
t=timeout
cping=coord. ping
![Page 30: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/30.jpg)
CS 347 Notes07 30
Coordinator
I
W
C
A
F
_go_exec*
c-ok*-
nok*-
ping- _t_
cping
pingabort
pingcommit
_t_cping
_t_abort*
nokabort*
ok*commit*
t=timeout
cping=coord. ping
![Page 31: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/31.jpg)
CS 347 Notes07 31
Participant
I
W
C
A
execok
execnok
commitc-ok
abortnok
![Page 32: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/32.jpg)
CS 347 Notes07 32
Participant
I
W
C
A
execok
execnok
commitc-ok
abortnok
cping , _t - ping
cpingdone
cpingdone
“done” message counts as eitherc-ok or n-ok for coordinator
![Page 33: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/33.jpg)
CS 347 Notes07 33
Participant
I
W
C
A
execok
equivalent to finish state
execnok
commitc-ok
abortnok
cping , _t - ping
cpingdone
cpingdone
“done” message counts as eitherc-ok or n-ok for coordinator
![Page 34: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/34.jpg)
CS 347 Notes07 34
Presumed abort protocol
• “F” and “A” states combined in coordinator
• Saves persistent space (forget quicker)
• Presumed commit is analogous
![Page 35: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/35.jpg)
CS 347 Notes07 35
Presumed abort-coordinator (participant unchanged)
I
W
C
A/F
_go_exec*
ping-
c-ok*-
pingabort
pingcommit
_t_cping
nok, tabort*
ok*commit*
![Page 36: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/36.jpg)
CS 347 Notes07 36
Remember: all state transitions must be loggedExample: tracking who has sent “OK” msgsLog at coord:
• After failure, we know still waiting for OK from node b
• Alternative: do not log receipts of “OK”s abort T1
T1start
part={a,b}
T1OK
from a RCV
... ......
![Page 37: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/37.jpg)
CS 347 Notes07 37
Example: logging receipt of C-OK messages
• If logged, can recover state• If not logged:
– resend commit *– participants reply “done” if duplicate
![Page 38: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/38.jpg)
CS 347 Notes07 38
2PC is blocking
Sample scenario:Coord P2
W P1 P3
WP4
W
![Page 39: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/39.jpg)
CS 347 Notes07 39
Case I:P1 → “W”; coordinator sent commits P1 → “C”
Case II:P1 → NOK; P1 → A
⇒ P2, P3, P4 (surviving participants)
cannot safely abort or commit transaction
coord
P1
P2
P3
P4w
w
w
![Page 40: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/40.jpg)
CS 347 Notes07 40
Variants of 2PC
• Linear Coord
• Hierarchical
ok ok ok
commit commit commit
![Page 41: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/41.jpg)
CS 347 Notes07 41
• Distributed
– Nodes broadcast all messages– Every node knows when to commit
Variants of 2PC
![Page 42: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/42.jpg)
CS 347 Notes07 42
3PC = non-blocking commit
• Assume: failed node is down forever
• Key idea: before committing, coordinator tells participants everyone is ok
![Page 43: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/43.jpg)
CS 347 Notes07 43
Coordinator Participant
I
W
P
A
_go_exec*
ack*commit*
nokabort*
ok*pre *
3PC
C
I
W
P
A
_exec_ok
commit-
execnok
preack
C
abort-
![Page 44: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/44.jpg)
CS 347 Notes07 44
3PC recovery rules: termination protocol
• Survivors try to complete transaction, based on their current states
• Goal:– If dead nodes committed or aborted, then
survivors should not contradict!
– Else, survivors can do as they please...
surv
ivors
![Page 45: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/45.jpg)
CS 347 Notes07 45
• Let {S1,S2,…Sn} be survivor sites
• If one or more Si = COMMIT ⇒ COMMIT T
• If one or more Si = ABORT ⇒ ABORT T
• If one or more Si = PREPARE ⇒T could not have aborted ⇒ COMMIT T
• If no Si = PREPARE (or COMMIT) ⇒T could not have committed ⇒ ABORT T
surv
ivors
![Page 46: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/46.jpg)
CS 347 Notes07 46
Example:
P
W
W
?
?
![Page 47: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/47.jpg)
CS 347 Notes07 47
Example:
I
W
W
?
?
![Page 48: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/48.jpg)
CS 347 Notes07 48
Example:
P
P
C
?
?
![Page 49: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/49.jpg)
CS 347 Notes07 49
Example:
P
W
A
?
?
![Page 50: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/50.jpg)
CS 347 Notes07 50
➳ Once survivors make decision, they must select new coordinator to continue 3PC
P P C C
W P C C W P P C
Decide tocommit
Time1
Time2
Time3
Time4
![Page 51: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/51.jpg)
CS 347 Notes07 51
Note: when survivors continue 3PC, failed nodes do not count
E.g., “OK*” ≡ when OK’s received from non-failed nodesW
P
ok*pre*
![Page 52: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/52.jpg)
CS 347 Notes07 52
Note: 3PC unsafe with partitions!
W
W
W
P
P
abort commit
![Page 53: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/53.jpg)
CS 347 Notes07 53
Node recovery:
• After node N recovers from failure:– do not participate in termination protocol
(why?)
W
W
W
?
P
→ A
![Page 54: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/54.jpg)
CS 347 Notes07 54
Node recovery:
• After node N recovers from failure:– do not participate in termination protocol
(why?)
W
W
W
?
P
→ A
✔
later on...
![Page 55: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/55.jpg)
CS 347 Notes07 55
Node recovery:
• After node N recovers from failure:– do not participate in termination protocol
(why?)– wait until it hears commit or abort decision
from operational node
?ping
ping
ping“C”(or “A”)
![Page 56: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/56.jpg)
CS 347 Notes07 56
• Waiting for commit/abort decision from other node is ok, unless all fail:
? ? ? ? ?
![Page 57: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/57.jpg)
CS 347 Notes07 57
Two options for all-failed problem:(A) Wait for all to recover(B) Majority commit
![Page 58: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/58.jpg)
CS 347 Notes07 58
Option A
• Recovering node waits for either:(1) commit/abort outcome for T from other node(2) all nodes that participated in T are up and recovering:
⇒ then 3PC can continue(no danger that a failed node could haveaborted or committed)
![Page 59: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/59.jpg)
CS 347 Notes07 59
Option B
• Nodes are assigned votes, total is VMajority is V+1 e.g., V=5 2 Maj=3 V=6 Maj=4
• To make state transitions, coordinator requires messages from nodes with a majority of votes
![Page 60: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/60.jpg)
CS 347 Notes07 60
Example(1): Coord P2 → W P1 P3 → Wp4 → W
• Nodes P2, P3, P4 enter “W” state and fail• When they recover, coord. and P1 are down• Each node has 1 vote, V=5, Maj=3
?
?
?
![Page 61: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/61.jpg)
CS 347 Notes07 61
Example(1): Coord P2 → W P1 P3 → Wp4 → W
• Nodes P2, P3, P4 enter “W” state and fail• When they recover, coord. and P1 are down• Each node has 1 vote, V=5, Maj=3
?
?
?
• Since P2, P3, P4 have majority, they know coord. could not have gone to “P” without at least one of their votes
• Therefore, T can be aborted!
![Page 62: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/62.jpg)
CS 347 Notes07 62
Example(2): Coord P3 → ”P” P1 P4 → ”W” P2
• Each node has 1 vote; V=5, Maj=3• Nodes fail after entering states shown;
P3, P4 recover
?
?
![Page 63: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/63.jpg)
CS 347 Notes07 63
Example(2): Coord P3 → ”P” P1 P4 → ”W” P2
• Each node has 1 vote; V=5, Maj=3• Nodes fail after entering states shown;
P3, P4 recover
?
?
• Termination rule says we can try to commit, but P3, P4 do not have enough votes, so they do nothing!
• P3, P4 doing nothing is good because later on, coord. P1, P2 could abort T
![Page 64: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/64.jpg)
CS 347 Notes07 64
1
Summary: Majority rule ensures that any
decision (e.g., Preparing, committing) will be
knownto any future group making a decision
1 1
22
decision # 2
decision # 1
![Page 65: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/65.jpg)
CS 347 Notes07 65
Important Detail for Majority 3PC
• Example:
W
W
W
?
P
➜ A
![Page 66: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/66.jpg)
CS 347 Notes07 66
Important Detail for Majority 3PC
• Example:
W
W
W
?
P
➜ A
✔
➜ P ➜ C
![Page 67: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/67.jpg)
CS 347 Notes07 67
Need “Prepare To Abort” State
I
W
PC PA
_go_exec*
ackC*commit*
nokpreA*ok*
preC *
C
I
W
PC
A
_exec_ok
commit-
execnok
preCackC
C
preAackA
coordinator participant
A
ackA*abort*
PA
abort-
![Page 68: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/68.jpg)
CS 347 Notes07 68
Example Revisited
W
W
W
?
PC
➜ PA
![Page 69: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/69.jpg)
CS 347 Notes07 69
Example Revisited
W
W
W
?
PC
➜ PA
✔
➜ PC ➜ C
OK to commit sincetransaction could not have aborted
![Page 70: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/70.jpg)
CS 347 Notes07 70
Example Revisited -II
W
W
W
?
PC
➜ PA
➜ PA
![Page 71: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/71.jpg)
CS 347 Notes07 71
Example Revisited -II
W
W
W
?
PC
➜ PA
✔
➜ PA
No decision:Transaction could have aborted orcould have committed... Block!
➜ PA
![Page 72: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/72.jpg)
CS 347 Notes07 72
3PC with Majority Voting
• If survivors have majority andall states W ⇒ try to abort
• If survivors have majority andstates in {W, PC, C} ⇒ try to commit
• If survivors have majority andstates in {W, PA, A} ⇒ try to abort
• Otherwise block
![Page 73: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/73.jpg)
CS 347 Notes07 73
Comparison
Option A: only nodes that have not failed participate in 3PC
• Any size group can terminate(even one node)
• If all nodes fail, must wait for all to recover
![Page 74: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/74.jpg)
CS 347 Notes07 74
Option B: Majority voting• A group of failed+recovering nodes
can terminate transaction (with majority of votes)
• Need majority for every commit➽ blocking protocol!
Comparison
![Page 75: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/75.jpg)
CS 347 Notes07 75
Reminder
• When node recovers, it uses its log in a normal fashion to determine status of transactions:– if commit found in log ⇒ redo if
necessary– if abort found (or no “W” record) ⇒
rollback if necessary
![Page 76: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/76.jpg)
CS 347 Notes07 76
– if in “W” state (or “P” state): • reclaim locks held by T before crash• try to terminate T (with other nodes)
– after locks claimed for “in doubt” transactions, start normal processing
Reminder - Continued
![Page 77: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/77.jpg)
CS 347 Notes07 77
Final note
• If nodes use 2P locking, global deadlocks possible
Local WFG: Local WFG: no cycles no cycles!
T1
T2
T1
T2
![Page 78: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/78.jpg)
CS 347 Notes07 78
• Need to “combine” WFGs to discover global deadlock
T1
T2
T1
T2
T1
T2e.g., central
detection node
![Page 79: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/79.jpg)
CS 347 Notes07 79
Problem: False deadlocks
T1
T2
T1
T2
T1
T2
T1
T2
Time1
Time2
Time3
infosent
infosent
T1
T2
at centralsite:
![Page 80: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management](https://reader035.fdocuments.us/reader035/viewer/2022063003/5f755d54f3050f7cc3071c7d/html5/thumbnails/80.jpg)
CS 347 Notes07 80
• Many deadlock solutions– Distributed vs. centralized– Detection vs. prevention
• timeouts• wait-die• wound-wait
• Covered in CS245