Chapter 20: Recovery
2421B: Database Systems - Recovery
Failure Types
Transaction Failures: local recovery System Failure: Global recovery
Main memory is lost Disk survives
Media Failure
3421B: Database Systems - Recovery
Motivation Atomicity: All-or-nothing
Transactions may abort (“Rollback”).
Durability: Changes survive server crash
crash!
Desired Behavior after system restarts:– T1, T2 & T3 should be durable.
– T4 & T5 should be aborted (effects not seen).
T1T2T3T4T5
c c c
a
4421B: Database Systems - Recovery
Update Approaches
In-place updating Change value of object and write it back to the same place on stable storage
Used in most DB Multiversion System
Old object version is kept, new version is created
E.g., in PostgreSQL Vacuum procedure that from time to time deletes old versions that are no more needed
5421B: Database Systems - Recovery
Handling the Buffer Pool Force/NoForce
If transaction T has written object X on page P: Force page P to disk (flush) before T commits
All changes of T are in the database on stable storage before T commits
NoForce: Flushing pages to disk is only determined by NoForce: Flushing pages to disk is only determined by replacement policy of buffer manager; replacement policy of buffer manager;
some of the changes of T might not be in the stable database at commit
Steal/NoSteal NoSteal: If transaction T has updated object X on page P: do NOT flush P before T commits;
No change of an active, uncommitted transaction is on stable storage
Steal: Replacements strategy is allowed to replace and flush a page even if the page contains update of uncommitted transaction
Changes of uncommitted transactions might be in the stable database
6421B: Database Systems - Recovery
Combinations
Force
No Force
Steal
Atomic flush at Commit Time
flush any timebefore commit
flush any timeafter commit
Flush anytime
NoSteal
7421B: Database Systems - Recovery
More on Steal STEAL (why enforcing Atomicity is hard)
To steal frame F: Current page in F (say P) is written to disk; some transaction T holds lock on object A on P.
What if the T with the lock on A aborts? What if the system fails directly after the flush and before T commits?
Must remember the old value of A at steal time (to support UNDOing the write to page P).
Crash case: we have to do something with transaction T4 ACTIVE at the time of the crash
No Steal (no uncommitted changes are in the stable database) At restart after failure we are sure that none of the changes of T4 are in DB
nothing to do for ACTIVE transactions
8421B: Database Systems - Recovery
More on Force Force (write changes before transaction
commits) At restart after failure we are sure that changes of T1, T2, and T3 are in DB and changes of T5 are NOT in the DB; nothing to do for TERMINATED transactions
NO FORCE (why enforcing Durability is hard) Assume a transaction T has modified tuple on page P and T committed but update is not yet in the stable database? Now system crashes before modified page is written to disk?
Write as little as possible, in a convenient place, at commit time,to support REDOing modifications.
9421B: Database Systems - Recovery
Combinations Ideal: FORCE/NO-Steal:
nothing has to be done at recovery Problem: basically not possible with update-in-place
In reality: mostly NOFORCE/STEAL
10421B: Database Systems - Recovery
Basic Idea: Logging A log is a read/append data structure maintained on
stable storage (survives failures) UNDO information: when transaction T updates an object
it stores the old version of object (before-image); when transaction aborts, copy before-image to current object-location.
REDO information: when transaction T updates an object it stores the new version of object (after-image); can be used to redo updates of transactions that committed.
In total: whenever a transaction T updates an object, both before- and after-image are written as one log-record and appended to the log.
Additionally: when transaction starts, a BEGIN record is appended to the log; when transaction commits/aborts, a commit/abort record is appended to the log.
11421B: Database Systems - Recovery
Architecture II
Upper Layer
Cache/Buffer Manager
Buffer Pool (random access)
SecondaryStorage(stable)
Log (append/read)
Access cost:~15 ms
Log DiskAccess cost:
~1 ms
12421B: Database Systems - Recovery
DB pages and Log pages
Page iRid = (i,N)
Rid = (i,2)Rid = (i,1)
N . . . 2 1
20 16 24 N
# slotsT255: w(x)
…T3: before(y), after (y) T255: begin T3: commit
…T255: before(z), after (z) …
T255: before(x), after (x)
Db page:
Log page:
Log tail
13421B: Database Systems - Recovery
When to flush a log page The Write-Ahead Logging Protocol:
Must force the log entry for an update before the corresponding data page gets to disk.
Must write all log entries for a Xact before commit.
Note: flushing log page is much cheaper than flushing DB page!
#1 guarantees Atomicity Assume active T has changed X; page with X get flushed to disk (steal); now system crashes before T commits => must undo T changes; need before image of X!
#2 guarantees Durability Assume T has changed X and committed; page with X does not get flushed to disk (no-force); now system crashes => must redo T changes; need after image of X!
14421B: Database Systems - Recovery
Types of Recovery Local UNDO during normal processing
whenever a transactions aborts, undo updates of aborted Xact by installing before-images.
Log-records are probably still in main memory; scan backwards starting from log-tail;
Global UNDO: at restart after system crash Xacts that aborted before the crash (we find abort record in log)
Xacts that were active at the time of the crash (we find neither abort nor commit record in log)
Whenever pages on the disk have updates of such Xacts (we say the update is reflected in the database), undo these updates by installing before-images
Pages contain additional information to detect this!
15421B: Database Systems - Recovery
Types of Recovery Partial REDO: at restart after system crash
Xacts that committed before the crash (we find a commit record in the log)
Whenever pages on the disk do not have the updates of such Xacts (we say the update is not reflected in the database), redo the updates by installing after-images
Page contains additional information so that we can detect this.
Global REDO: after disk failure Make snapshot of database (once a day /once a week)
Duplicate log and keep on two disks Keep log on a second storage After disk failure
Start with snapshot and then apply log
16421B: Database Systems - Recovery
Recovery after Crash Simple procedure:
Backward pass: Scan log from tail to head; For each record
If commit of T, include T in list of committed transactions C
If abort of T, include T in list of aborted transactions A
If update record of T and T is neither in A or C, include T in list of aborted transactions
If update record of T on object X and T in ARead in page P with object XIf update on X performed, install before-image
Forward pass: Scan log from head to tail: for each record
If update record of T on object X and T in CRead in page P with object XIf update on X not yet performed, install after-image
17421B: Database Systems - Recovery
Example of Recovery
update: T1 writes A on P5update T2 writes B on P3T1 Abortupdate: T3 writes C on P1update: T3 write D on P3update: T2 writes Z P5T3 commitCRASH, RESTART
LOG Backward pass: 7: Put T3 in C 6: Put T2 in A 6: Read P5; nothing has to be done
4,5: nothing 3: put T1 in A 2: read p3; install before-image of B
1: read p5: install before image (the write on A was flushed to disk but not the undo during normal processing)
1234567
P5 is flushed
BM
P3 is flushed
Forward pass: Step 4: read P1 install after-image of P1
Step 5: read P3; nothing has to be done
18421B: Database Systems - Recovery
Checkpointing Log becomes longer and longer => recovery has to read
entire log! Periodically, the DBMS creates a checkpoint, in order to
minimize the time taken to recover in the event of a system crash.
Simple checkpoint: Goal: only log that was written after the checkpoint has to be analyzed
Algorithm: Prevent new transactions from starting Wait until all transactions have terminated Flush all dirty pages to stable storage Write a checkpoint log entry Start new transactions
Upon recovery: backward pass only goes to last checkpoint entry
In real life more complicated; transaction processing is not interrupted; no big flush in one step
19421B: Database Systems - Recovery
Example
T1T2T3T4T5
c a c
start FlushBuffer
WriteChkpt record
20421B: Database Systems - Recovery
Further Issues
Crash during Recovery Logical logging: instead of physical before/after image redo operation / inverse operation (e.g. increment by one, decrement by one)
Hard disk failures: mirror disk or Archive copy (consistent copy of database on tape, created e.g. once every night when no transaction processing) + archive log (similar to log shown here)
Real Life: much more complicated: see textbook with ARIES
Top Related