
Chapter 20: Recovery

421B: Database Systems - Recovery

Failure Types

– Transaction failures: local recovery
– System failure: global recovery (main memory is lost, disk survives)
– Media failure


Motivation

Atomicity: all-or-nothing. Transactions may abort ("rollback").

Durability: changes survive a server crash.

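Desired behavior after the system restarts:
– T1, T2 and T3 should be durable.
– T4 and T5 should be aborted (effects not seen).

[Figure: timeline of transactions T1 to T5 ending at the crash, with three commit markers (c) and one abort marker (a).]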


Update Approaches

In-place updating
– Change the value of an object and write it back to the same place on stable storage.
– Used in most database systems.

Multiversion system
– The old object version is kept and a new version is created.
– E.g., PostgreSQL, where a vacuum procedure periodically deletes old versions that are no longer needed.


Handling the Buffer Pool

Force/NoForce
– Force: if transaction T has written object X on page P, force page P to disk (flush) before T commits. All changes of T are in the database on stable storage before T commits.
– NoForce: flushing pages to disk is determined only by the replacement policy of the buffer manager; some of the changes of T might not be in the stable database at commit.

Steal/NoSteal
– NoSteal: if transaction T has updated object X on page P, do NOT flush P before T commits. No change of an active, uncommitted transaction is on stable storage.
– Steal: the replacement strategy is allowed to replace and flush a page even if the page contains updates of uncommitted transactions. Changes of uncommitted transactions might be in the stable database.
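To make the two policy pairs concrete, here is a minimal Python sketch of a toy buffer manager. The class `BufferPool`, its `force` and `steal` flags, and the dict-based "disk" are all assumptions made for illustration; they are not taken from the lecture or from any real DBMS.

```python
class BufferPool:
    """Toy buffer pool (hypothetical model): a page is a dict of objects."""

    def __init__(self, force, steal):
        self.force = force    # Force: flush a transaction's pages at commit
        self.steal = steal    # Steal: pages with uncommitted changes may be flushed
        self.disk = {}        # stable database: page id -> {object: value}
        self.buffer = {}      # cached pages:    page id -> {object: value}
        self.dirty_by = {}    # page id -> set of active transactions that wrote it

    def write(self, txn, page, obj, value):
        # Load the page into the buffer on first access, then update it in memory.
        frame = self.buffer.setdefault(page, dict(self.disk.get(page, {})))
        frame[obj] = value
        self.dirty_by.setdefault(page, set()).add(txn)

    def try_evict(self, page):
        """The replacement policy wants to flush `page`; True if that is allowed."""
        if self.dirty_by.get(page) and not self.steal:
            return False      # NoSteal: the page holds uncommitted changes
        self._flush(page)
        return True

    def commit(self, txn):
        for page, writers in self.dirty_by.items():
            if txn in writers:
                writers.discard(txn)
                if self.force:
                    self._flush(page)   # Force: changes reach disk at commit

    def _flush(self, page):
        self.disk[page] = dict(self.buffer[page])


# NoForce/NoSteal: no flush before commit, and none forced at commit.
pool = BufferPool(force=False, steal=False)
pool.write("T1", "P1", "X", 5)
evicted_before_commit = pool.try_evict("P1")
pool.commit("T1")
disk_right_after_commit = dict(pool.disk)
evicted_after_commit = pool.try_evict("P1")
```

The sketch ignores the case where two active transactions have written the same page, which is one reason Force combined with NoSteal is difficult with in-place updates.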


Combinations

            Force                           NoForce
Steal       Flush any time before commit    Flush any time
NoSteal     Atomic flush at commit time     Flush any time after commit


More on Steal

STEAL (why enforcing atomicity is hard)
– To steal frame F, the current page in F (say P) is written to disk while some transaction T holds a lock on object A on P.
– What if T aborts? What if the system fails directly after the flush and before T commits?
– The system must remember the old value of A at steal time (to support UNDOing the write to page P).
– Crash case: we have to do something with transaction T4, which was ACTIVE at the time of the crash.

NO STEAL (no uncommitted changes are in the stable database)
– At restart after a failure we are sure that none of the changes of T4 are in the DB; there is nothing to do for ACTIVE transactions.


More on Force

FORCE (write changes before the transaction commits)
– At restart after a failure we are sure that the changes of T1, T2 and T3 are in the DB and the changes of T5 are NOT in the DB; there is nothing to do for TERMINATED transactions.

NO FORCE (why enforcing durability is hard)
– Assume a transaction T has modified a tuple on page P and has committed, but the update is not yet in the stable database. What if the system now crashes before the modified page is written to disk?
– Write as little as possible, in a convenient place, at commit time, to support REDOing modifications.


Combinations

Ideal: FORCE/NO STEAL
– Nothing has to be done at recovery.
– Problem: basically not possible with update-in-place.

In reality: mostly NO FORCE/STEAL


Basic Idea: Logging

A log is a read/append data structure maintained on stable storage (it survives failures).

UNDO information: when transaction T updates an object, it stores the old version of the object (before-image); when the transaction aborts, the before-image is copied to the current object location.

REDO information: when transaction T updates an object, it stores the new version of the object (after-image); this can be used to redo updates of transactions that committed.

In total: whenever a transaction T updates an object, both the before-image and the after-image are written as one log record and appended to the log.

Additionally: when a transaction starts, a BEGIN record is appended to the log; when a transaction commits or aborts, a commit or abort record is appended to the log.
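The record types described above can be sketched as a small data structure. The names `LogRecord`, `Log` and `logged_update`, and the field layout, are assumptions chosen for this illustration; the slides do not prescribe a record format.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class LogRecord:
    txn: str
    kind: str                     # "begin", "update", "commit" or "abort"
    page: Optional[str] = None    # only set for update records
    obj: Optional[str] = None
    before: Any = None            # UNDO information (before-image)
    after: Any = None             # REDO information (after-image)


class Log:
    """Append/read structure; records are never modified in place."""

    def __init__(self):
        self.records: List[LogRecord] = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1   # position of the record in the log


def logged_update(log, db, txn, page, obj, new_value):
    # One log record carries both images; it is appended before the change.
    log.append(LogRecord(txn, "update", page, obj,
                         before=db.get(obj), after=new_value))
    db[obj] = new_value


log = Log()
db = {"x": 1}
log.append(LogRecord("T1", "begin"))
logged_update(log, db, "T1", "P1", "x", 2)
log.append(LogRecord("T1", "commit"))
```

Here `db` is a plain dict standing in for the database, which keeps the example focused on what is written to the log.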


Architecture II

[Figure: the upper layer sits on top of the cache/buffer manager, which manages the buffer pool (random access). Below it are the stable secondary storage (access cost ~15 ms) and the log on a separate log disk (append/read, access cost ~1 ms).]


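DB Pages and Log Pages

[Figure, DB page: page i in slotted format with records Rid = (i,1), Rid = (i,2), ..., Rid = (i,N) and a slot directory holding the entries 20, 16, 24 and the number of slots N. The page is modified by T255: w(x).]

[Figure, log page: the records "... T3: before(y), after(y)", "T255: begin", "T3: commit", "... T255: before(z), after(z) ..." and "T255: before(x), after(x)", with the log tail marked.]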


When to Flush a Log Page

The Write-Ahead Logging Protocol:
1. Must force the log entry for an update before the corresponding data page gets to disk.
2. Must write all log entries for a Xact before commit.

Note: flushing a log page is much cheaper than flushing a DB page!

#1 guarantees atomicity. Assume an active T has changed X and the page with X gets flushed to disk (steal); now the system crashes before T commits => T's changes must be undone; the before-image of X is needed!

#2 guarantees durability. Assume T has changed X and committed, but the page with X does not get flushed to disk (no-force); now the system crashes => T's changes must be redone; the after-image of X is needed!
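The two write-ahead rules can be sketched as ordering constraints between log flushes and page flushes. The class `WalSystem`, the counter `flushed_upto` and the tuple-shaped log records are hypothetical names for this illustration, not an actual DBMS interface.

```python
class WalSystem:
    """Toy model of the log tail, the buffer pool and stable storage."""

    def __init__(self):
        self.log = []             # all log records in append order
        self.flushed_upto = 0     # log[:flushed_upto] is on stable storage
        self.buffer = {}          # page id -> {object: value}
        self.disk = {}            # stable database
        self.page_last_rec = {}   # page id -> index of its latest log record

    def update(self, txn, page, obj, value):
        frame = self.buffer.setdefault(page, dict(self.disk.get(page, {})))
        # Record the before-image and after-image, then change the page in memory.
        self.log.append((txn, "update", page, obj, frame.get(obj), value))
        self.page_last_rec[page] = len(self.log) - 1
        frame[obj] = value

    def flush_log(self, upto):
        # Log pages are written sequentially, so flush everything up to `upto`.
        self.flushed_upto = max(self.flushed_upto, upto + 1)

    def flush_page(self, page):
        # Rule 1: the log entry for an update goes to disk before the data page.
        if page in self.page_last_rec:
            self.flush_log(self.page_last_rec[page])
        self.disk[page] = dict(self.buffer[page])

    def commit(self, txn):
        self.log.append((txn, "commit"))
        # Rule 2: all log entries of the transaction are on disk at commit.
        self.flush_log(len(self.log) - 1)


s = WalSystem()
s.update("T1", "P1", "X", 10)
s.update("T2", "P2", "Y", 20)
s.flush_page("P1")        # steal: forces the log up to T1's update first
flushed_after_steal = s.flushed_upto
s.commit("T2")            # no-force: P2 itself stays in the buffer
```

Note that committing T2 flushes only log records; the data page P2 is not written, which is what makes NoForce cheap.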


Types of Recovery

Local UNDO: during normal processing
– Whenever a transaction aborts, undo the updates of the aborted Xact by installing before-images.
– Log records are probably still in main memory; scan backwards starting from the log tail.

Global UNDO: at restart after a system crash
– Applies to Xacts that aborted before the crash (we find an abort record in the log) and to Xacts that were active at the time of the crash (we find neither an abort nor a commit record in the log).
– Whenever pages on disk have updates of such Xacts (we say the update is reflected in the database), undo these updates by installing before-images.
– Pages contain additional information to detect this!
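Local UNDO as described above can be sketched as a backward scan from the log tail. The function name `local_undo` and the dict-shaped log records are assumptions for this illustration.

```python
def local_undo(log, db, txn):
    """Abort `txn` during normal processing by installing its before-images."""
    for record in reversed(log):          # scan backwards from the log tail
        if record["txn"] != txn:
            continue
        if record["kind"] == "begin":
            break                         # reached the start of the transaction
        if record["kind"] == "update":
            db[record["obj"]] = record["before"]
    log.append({"txn": txn, "kind": "abort"})


db = {"x": 3, "y": 6}
log = [
    {"txn": "T1", "kind": "begin"},
    {"txn": "T1", "kind": "update", "obj": "x", "before": 1, "after": 2},
    {"txn": "T2", "kind": "begin"},
    {"txn": "T2", "kind": "update", "obj": "y", "before": 5, "after": 6},
    {"txn": "T1", "kind": "update", "obj": "x", "before": 2, "after": 3},
]
local_undo(log, db, "T1")
```

Scanning backwards matters when a transaction updated the same object more than once: the oldest before-image is installed last, so `x` ends at its original value while T2's update to `y` is left untouched.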


Types of Recovery (continued)

Partial REDO: at restart after a system crash
– Applies to Xacts that committed before the crash (we find a commit record in the log).
– Whenever pages on disk do not have the updates of such Xacts (we say the update is not reflected in the database), redo the updates by installing after-images.
– Pages contain additional information so that we can detect this.

Global REDO: after disk failure
– Make a snapshot of the database (once a day / once a week).
– Duplicate the log and keep it on two disks.
– Keep the log on a second storage device.
– After a disk failure: start with the snapshot and then apply the log.
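Global REDO can be sketched as replaying the surviving log on top of the last snapshot. The function `restore_after_disk_failure` and the record layout are hypothetical; the sketch also assumes the snapshot was taken when no transaction was active and that the log holds every record written since then.

```python
import copy


def restore_after_disk_failure(snapshot, archived_log):
    """Rebuild the database from a snapshot plus the log written after it."""
    db = copy.deepcopy(snapshot)      # the snapshot itself stays untouched
    committed = {rec["txn"] for rec in archived_log if rec["kind"] == "commit"}
    for rec in archived_log:          # replay in log order
        if rec["kind"] == "update" and rec["txn"] in committed:
            db[rec["obj"]] = rec["after"]
    return db


snapshot = {"x": 1, "y": 1}
archived_log = [
    {"kind": "update", "txn": "T1", "obj": "x", "after": 2},
    {"kind": "update", "txn": "T2", "obj": "y", "after": 9},
    {"kind": "commit", "txn": "T1"},
]
restored = restore_after_disk_failure(snapshot, archived_log)
```

Only the after-images of committed transactions are applied, so T2's update to `y` does not appear in the restored database.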


Recovery after Crash

Simple procedure:

Backward pass: scan the log from tail to head. For each record:
– If commit of T, include T in the list of committed transactions C.
– If abort of T, include T in the list of aborted transactions A.
– If update record of T and T is in neither A nor C, include T in the list of aborted transactions A.
– If update record of T on object X and T is in A: read in page P with object X; if the update on X was performed, install the before-image.

Forward pass: scan the log from head to tail. For each record:
– If update record of T on object X and T is in C: read in page P with object X; if the update on X was not yet performed, install the after-image.


Example of Recovery

LOG:
1. update: T1 writes A on P5
2. update: T2 writes B on P3
3. T1 abort
4. update: T3 writes C on P1
5. update: T3 writes D on P3
6. update: T2 writes Z on P5
7. T3 commit
CRASH, RESTART

[Figure: log records 1 to 7 on a timeline, with the buffer manager (BM) events "P5 is flushed" and "P3 is flushed" marked along it.]

Backward pass:
7: put T3 in C
6: put T2 in A; read P5; nothing has to be done
5, 4: nothing
3: put T1 in A
2: read P3; install before-image of B
1: read P5; install before-image of A (the write on A was flushed to disk, but the undo performed during normal processing was not)

Forward pass:
Step 4: read P1; install after-image of C
Step 5: read P3; nothing has to be done
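The two-pass procedure can be replayed on the example log with a short Python sketch. Several things here are assumptions, not part of the slides: the function `recover`, the helper `upd`, the placeholder values such as `"a0"` and `"a1"`, and the on-disk state at crash time, which is chosen to match the notes above (A, B and D flushed, C and Z not). The slides say only that pages carry additional information to detect whether an update is reflected; the sketch substitutes a simple comparison of the stored value with the after-image.

```python
def recover(log, disk):
    """Two-pass restart recovery; repairs `disk` in place."""
    committed, aborted = set(), set()

    # Backward pass: classify transactions and undo updates of losers.
    for rec in reversed(log):
        kind, txn = rec["kind"], rec["txn"]
        if kind == "commit":
            committed.add(txn)
        elif kind == "abort":
            aborted.add(txn)
        elif kind == "update":
            if txn not in committed and txn not in aborted:
                aborted.add(txn)      # active at the time of the crash
            if txn in aborted:
                page = disk[rec["page"]]
                if page[rec["obj"]] == rec["after"]:   # update is reflected
                    page[rec["obj"]] = rec["before"]

    # Forward pass: redo updates of committed transactions.
    for rec in log:
        if rec["kind"] == "update" and rec["txn"] in committed:
            page = disk[rec["page"]]
            if page[rec["obj"]] != rec["after"]:       # update not reflected
                page[rec["obj"]] = rec["after"]
    return committed, aborted


def upd(txn, obj, page, before, after):
    return {"kind": "update", "txn": txn, "obj": obj, "page": page,
            "before": before, "after": after}


log = [
    upd("T1", "A", "P5", "a0", "a1"),
    upd("T2", "B", "P3", "b0", "b1"),
    {"kind": "abort", "txn": "T1"},
    upd("T3", "C", "P1", "c0", "c1"),
    upd("T3", "D", "P3", "d0", "d1"),
    upd("T2", "Z", "P5", "z0", "z1"),
    {"kind": "commit", "txn": "T3"},
]
# Assumed disk state at the crash: A, B and D were flushed; C and Z were not.
disk = {"P1": {"C": "c0"}, "P3": {"B": "b1", "D": "d1"}, "P5": {"A": "a1", "Z": "z0"}}
committed, aborted = recover(log, disk)
```

After the run, A and B are back at their before-images, C carries its after-image, and D and Z are unchanged. Comparing values is only a stand-in for real page metadata; it would misjudge an update whose after-image happens to equal the value already on disk.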


Checkpointing

The log becomes longer and longer => recovery has to read the entire log!

Periodically, the DBMS creates a checkpoint in order to minimize the time taken to recover in the event of a system crash.

Simple checkpoint:

Goal: only the part of the log that was written after the checkpoint has to be analyzed.

Algorithm:
1. Prevent new transactions from starting.
2. Wait until all transactions have terminated.
3. Flush all dirty pages to stable storage.
4. Write a checkpoint log entry.
5. Start new transactions.

Upon recovery: the backward pass only goes back to the last checkpoint entry.

In real life this is more complicated: transaction processing is not interrupted, and there is no big flush in one step.
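The simple checkpoint algorithm above can be sketched in a single-threaded toy model. The class `SimpleDbms` and the function `records_to_analyze` are hypothetical names; "waiting" is modelled by completing the checkpoint when the last active transaction commits.

```python
class SimpleDbms:
    def __init__(self):
        self.log = []
        self.active = set()
        self.accepting = True   # False while a checkpoint is pending
        self.buffer = {}        # dirty objects: name -> value
        self.disk = {}

    def begin(self, txn):
        if not self.accepting:
            raise RuntimeError("checkpoint in progress: no new transactions")
        self.active.add(txn)
        self.log.append(("begin", txn))

    def write(self, txn, obj, value):
        self.log.append(("update", txn))
        self.buffer[obj] = value

    def commit(self, txn):
        self.active.discard(txn)
        self.log.append(("commit", txn))
        if not self.accepting:
            self._try_finish_checkpoint()

    def request_checkpoint(self):
        self.accepting = False                  # 1. prevent new transactions
        return self._try_finish_checkpoint()

    def _try_finish_checkpoint(self):
        if self.active:                         # 2. wait for termination
            return False
        self.disk.update(self.buffer)           # 3. flush all dirty pages
        self.buffer.clear()
        self.log.append(("checkpoint", None))   # 4. write checkpoint entry
        self.accepting = True                   # 5. start new transactions
        return True


def records_to_analyze(log):
    """Recovery only needs the log written after the last checkpoint entry."""
    for i in range(len(log) - 1, -1, -1):
        if log[i][0] == "checkpoint":
            return log[i + 1:]
    return log


dbms = SimpleDbms()
dbms.begin("T1")
dbms.write("T1", "x", 1)
finished_immediately = dbms.request_checkpoint()   # T1 is still active
dbms.commit("T1")                                  # checkpoint completes here
dbms.begin("T2")
```

The model makes the cost of this approach visible: between `request_checkpoint()` and the last commit, every call to `begin` is rejected.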


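Example

[Figure: timeline of transactions T1 to T5 with commit (c) and abort (a) markers, showing the checkpoint steps: start, flush buffer, write checkpoint record.]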


Further Issues

– Crash during recovery
– Logical logging: instead of physical before-images and after-images, log the redo operation and its inverse operation (e.g., increment by one, decrement by one).
– Hard disk failures: mirror disk, or archive copy (a consistent copy of the database on tape, created e.g. once every night when there is no transaction processing) plus archive log (similar to the log shown here).
– Real life is much more complicated: see the textbook on ARIES.
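The logical-logging idea listed above can be illustrated with a minimal sketch in which a log record names an operation instead of storing images. The table `INVERSE` and the functions `apply_op` and `undo` are assumptions made for this example.

```python
# Each logical operation is paired with the operation that reverses it.
INVERSE = {"increment": "decrement", "decrement": "increment"}


def apply_op(db, op, obj, amount):
    """Execute (or redo) a logical operation on the database."""
    if op == "increment":
        db[obj] += amount
    elif op == "decrement":
        db[obj] -= amount
    else:
        raise ValueError(f"unknown operation: {op}")


def undo(db, record):
    """Undo a logged operation by applying its inverse."""
    op, obj, amount = record
    apply_op(db, INVERSE[op], obj, amount)


db = {"counter": 10}
record = ("increment", "counter", 1)   # logical log record, no images stored
apply_op(db, *record)
value_after_op = db["counter"]
undo(db, record)
```

Unlike installing a before-image or after-image, applying an increment twice gives a different result than applying it once, so logical redo and undo need care if the system crashes again during recovery.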