Introduction to database-Transaction Concurrency and Recovery


Transcript of Introduction to database-Transaction Concurrency and Recovery

Page 1: Introduction to database-Transaction Concurrency and Recovery

Introduction to Databases

Relational Database Design

Transaction, Concurrency and Recovery

Ajit K Nayak, Ph.D.

Siksha O Anusandhan University

Page 2: Introduction to database-Transaction Concurrency and Recovery

AKN/IDBIV.2 Introduction to databases

Transaction Concept

A transaction is a unit of program execution that accesses and possibly updates various data items.

E.g., a transaction to transfer Rs. 50 from account A to account B:

1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)

Two main issues to deal with:
Failures of various kinds, such as hardware failures and system crashes
Concurrent execution of multiple transactions

A transaction consists of a set of instructions.
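The six-step transfer above can be sketched as a toy in-memory transaction. This is purely illustrative — `accounts` and `transfer` are made-up names, not a DBMS API — and the snapshot-based rollback stands in for the recovery machinery discussed later:

```python
# Toy in-memory model of the transfer transaction (illustrative only).
accounts = {"A": 1000, "B": 500}

def transfer(db, src, dst, amount):
    # Snapshot for rollback, so the transfer is all-or-nothing (atomicity).
    snapshot = dict(db)
    try:
        a = db[src]          # 1. read(A)
        a = a - amount       # 2. A := A - 50
        db[src] = a          # 3. write(A)
        b = db[dst]          # 4. read(B)
        b = b + amount       # 5. B := B + 50
        db[dst] = b          # 6. write(B)
    except Exception:
        db.clear()
        db.update(snapshot)  # undo partial updates on failure
        raise

transfer(accounts, "A", "B", 50)
```

Note that if a failure struck between steps 3 and 6, the `except` branch restores the snapshot, so the sum A + B is preserved either way.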

Page 3: Introduction to database-Transaction Concurrency and Recovery

Transaction State - I

Active – the initial state; the transaction stays in this state while it is executing.
Partially committed – after the final statement has been executed.
Failed – after the discovery that normal execution can no longer proceed.
Aborted – after the transaction has been rolled back and the database restored to its state prior to the start of the transaction. Two options after it has been aborted:
Restart the transaction – can be done only if there is no internal logical error
Kill the transaction
Committed – after successful completion.

Page 4: Introduction to database-Transaction Concurrency and Recovery

ACID Properties

To preserve the integrity of data, the database system must ensure:

Atomicity. Either all operations of the transaction are properly reflected in the database or none are.
Consistency. Execution of a transaction in isolation preserves the consistency of the database.
Isolation. Although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions. Intermediate transaction results must be hidden from other concurrently executing transactions.
That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished.
Durability. After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.

Page 5: Introduction to database-Transaction Concurrency and Recovery

ACID Properties w.r.t. a Transaction

Atomicity requirement
If the transaction fails after step 3, money will be "lost", leading to an inconsistent database state.
The system should ensure that updates of a partially executed transaction are not reflected in the database.

1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)

Durability requirement — once the user has been notified that the transaction has completed, the updates to the database by the transaction must persist even if there are software or hardware failures.

Consistency requirement in the above example:
The sum of A and B is unchanged by the execution of the transaction.

Page 6: Introduction to database-Transaction Concurrency and Recovery

Required Properties of a Transaction (Cont.)

Isolation requirement — if between steps 3 and 6, another transaction T2 is allowed to access the partially updated database:

T1                      T2
1. read(A)
2. A := A – 50
3. write(A)
                        read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)

Isolation can be ensured trivially by running transactions serially, that is, one after the other.
However, executing multiple transactions concurrently has significant benefits.

Page 7: Introduction to database-Transaction Concurrency and Recovery

Concurrent Executions

Multiple transactions are allowed to run concurrently in the system. Advantages are:
Increased processor and disk utilization, leading to better transaction throughput.
E.g., one transaction can be using the CPU while another is reading from or writing to the disk.
Reduced average response time for transactions: short transactions need not wait behind long ones.

Concurrency control schemes – mechanisms to achieve isolation, that is, to control the interaction among the concurrent transactions in order to prevent them from destroying the consistency of the database.

Page 8: Introduction to database-Transaction Concurrency and Recovery

Schedules

Schedule – a sequence of instructions that specifies the chronological order in which instructions of concurrent transactions are executed.
A schedule for a set of transactions must consist of all instructions of those transactions.
It must preserve the order in which the instructions appear in each individual transaction.
A transaction that successfully completes its execution will have a commit instruction as its last statement.
By default, a transaction is assumed to execute a commit instruction as its last step.
A transaction that fails to successfully complete its execution will have an abort instruction as its last statement.

Page 9: Introduction to database-Transaction Concurrency and Recovery

Serial Schedule 1

Let T1 transfer Rs. 50 from A to B, and T2 transfer 10% of the balance from A to B.
An example of a serial schedule in which T1 is followed by T2:
Serial execution of transactions always ensures isolation and consistency in the database.
Consistency is preserved, i.e. A+B remains the same.

Page 10: Introduction to database-Transaction Concurrency and Recovery

Serial Schedule 2

A serial schedule in which T2 is followed by T1:
Consistency is preserved irrespective of the execution sequence of T1 and T2.

Page 11: Introduction to database-Transaction Concurrency and Recovery

Concurrent Schedule 3

This concurrent schedule does not preserve the sum A+B.

Page 12: Introduction to database-Transaction Concurrency and Recovery

Concurrent Schedule 4

The sum A + B is preserved.
It is not a serial schedule, but it is equivalent to a serial schedule. Such schedules are called serializable schedules.
i.e. out of multiple possible concurrent schedules, some may ensure isolation and others may not.
Hence only the concurrent schedules that ensure isolation and consistency shall be acceptable.

Page 13: Introduction to database-Transaction Concurrency and Recovery

Serializability

If the final outcome of a concurrent schedule S1 is the same as that of a serial schedule S2, then S1 is said to be a serializable schedule.
i.e. a concurrent schedule is serializable if it is equivalent to a serial schedule. Different forms of schedule equivalence give rise to the notions of:
1. Conflict Serializability
2. View Serializability

Simplified view of transactions:
We ignore operations other than read and write instructions.
We assume that transactions may perform arbitrary computations in between reads and writes.

Page 14: Introduction to database-Transaction Concurrency and Recovery

Conflicting Instructions

Let Ii and Ij be two instructions of transactions Ti and Tj respectively. Instructions Ii and Ij conflict if and only if there exists some item Q accessed by both Ii and Ij, and at least one of these instructions wrote Q.

1. Ii = read(Q), Ij = read(Q). Ii and Ij don't conflict; order does not matter.
2. Ii = read(Q), Ij = write(Q). They conflict, as the order matters.
3. Ii = write(Q), Ij = write(Q). They conflict, as the order matters.
4. Ii = write(Q), Ij = write(Q). They conflict. The order does not affect Ti or Tj directly; however, the value obtained by the next read(Q) is affected, since the result of only the latter of the two write instructions is preserved in the database.

Intuitively, a conflict between Ii and Ij forces a (logical) temporal order between them.
If Ii and Ij are consecutive in a schedule and they do not conflict, their results would remain the same even if they had been interchanged in the schedule.
i.e. if both Ii and Ij represent read operations, then they can be swapped, but if either of them is a write operation then they cannot be swapped.
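The conflict rule above is small enough to state directly in code. A minimal sketch (operations are modelled as illustrative (transaction, kind, item) triples — not any standard representation):

```python
def conflicts(op1, op2):
    """Two operations conflict iff they belong to different transactions,
    access the same data item, and at least one of them is a write."""
    t1, kind1, item1 = op1
    t2, kind2, item2 = op2
    return t1 != t2 and item1 == item2 and "write" in (kind1, kind2)
```

For example, `conflicts(("T1", "read", "Q"), ("T2", "write", "Q"))` is true (case 2 above), while two reads of Q never conflict (case 1).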

Page 15: Introduction to database-Transaction Concurrency and Recovery

Conflict Serializability

If a schedule S can be transformed into a schedule S′ by a series of swaps of non-conflicting instructions, we say that S and S′ are conflict equivalent.
We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.

(Annotations on the example schedule: reads and writes of the same data item conflict; operations on different data items do not conflict; the write(A) of T2 can be swapped with the read(B) of T1.)

Page 16: Introduction to database-Transaction Concurrency and Recovery

Conflict Serializability (Cont.)

Swap the read(B) instruction of T1 with the read(A) instruction of T2.
Swap the write(B) instruction of T1 with the write(A) instruction of T2.
The final result of these swaps is a serial schedule.
i.e. S and S′ are conflict equivalent, and hence S is conflict serializable.

Page 17: Introduction to database-Transaction Concurrency and Recovery

Conflict Serializability (Cont.)

Example of a schedule that is not conflict serializable:
It is not possible to swap instructions in the above schedule to obtain a serial schedule.

Page 18: Introduction to database-Transaction Concurrency and Recovery

Precedence Graph

It is a simple and efficient method for determining conflict serializability.
Consider a schedule S of a set of transactions T1, T2, ..., Tn.
Precedence graph — a directed graph G = (V, E), where the vertices are the participating transactions.
We draw a directed edge from Ti to Tj if the two transactions conflict, i.e.:
Ti executes write(Q) before Tj executes read(Q)
Ti executes read(Q) before Tj executes write(Q)
Ti executes write(Q) before Tj executes write(Q)
If the precedence graph for S has a cycle, then S is not conflict serializable.
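The precedence-graph test is easy to mechanize. A minimal sketch, assuming a schedule is given as a time-ordered list of (transaction, operation, item) triples (an illustrative encoding, not any standard one):

```python
def precedence_graph(schedule):
    """Build the edge set Ti -> Tj for every pair of conflicting
    operations where Ti's operation comes first in the schedule."""
    edges = set()
    for i, (ti, op_i, q_i) in enumerate(schedule):
        for tj, op_j, q_j in schedule[i + 1:]:
            if ti != tj and q_i == q_j and "write" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    """DFS cycle detection on the directed precedence graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
    visiting, done = set(), set()
    def dfs(u):
        visiting.add(u)
        for v in adj.get(u, ()):
            if v in visiting or (v not in done and dfs(v)):
                return True
        visiting.discard(u)
        done.add(u)
        return False
    return any(dfs(u) for u in adj if u not in done)

# Schedule 3 from the slides: T1 reads A before T2 writes A,
# and T2 reads B before T1 writes B -> edges T1->T2 and T2->T1, a cycle.
schedule3 = [("T1", "read", "A"), ("T2", "write", "A"),
             ("T2", "read", "B"), ("T1", "write", "B")]
```

A schedule is conflict serializable exactly when `has_cycle(precedence_graph(s))` is false.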

Page 19: Introduction to database-Transaction Concurrency and Recovery

Precedence graph for Schedule 1

Since all instructions of T1 are executed before the first instruction of T2 is executed, an edge is formed from T1 to T2.
T1 → T2
As there is no cycle, S1 is conflict serializable.

Page 20: Introduction to database-Transaction Concurrency and Recovery

Concurrent Schedule 4

T1 executes write(A) before T2 executes read(A).
T1 executes read(B) before T2 executes write(B).
T1 executes write(B) before T2 executes write(B).
T1 → T2
As there is no cycle, S4 is conflict serializable.

Page 21: Introduction to database-Transaction Concurrency and Recovery

Precedence Graph for Schedule 3

One edge from T1 → T2, as T1 executes read(A) before T2 executes write(A).
Another edge from T2 → T1, as T2 executes read(B) before T1 executes write(B).
As the precedence graph contains a cycle, S3 is not conflict serializable.

Page 22: Introduction to database-Transaction Concurrency and Recovery

Concurrent Schedule 5

Test for conflict serializability, and test for schedule equivalence.

T1              T2
Read(A)
A=A-50
Write(A)
                Read(B)
                B=B-10
                Write(B)
Read(B)
B=B+50
Write(B)
                Read(A)
                A=A+10
                Write(A)

Precedence Graph:
T1 executes write(A) before T2 executes read(A) (edge from T1 → T2)
T2 executes write(B) before T1 executes read(B) (edge from T2 → T1)
So S5 is not conflict serializable.

Schedule equivalence (A+B):
Before the transactions, A+B = 1500.
After the transactions, A+B = 1500.
So S5 and S5′ are equivalent schedules.
It is possible to have two schedules that produce the same outcome but are not conflict serializable.
Such schedule equivalence has a less stringent definition.

Page 23: Introduction to database-Transaction Concurrency and Recovery

Recoverable Schedules

Recoverable schedule — if a transaction Tj reads a data item previously written by a transaction Ti, then the commit operation of Ti must appear before the commit operation of Tj.
The following schedule is not recoverable if T9 commits immediately after the read(A) operation.
If T8 should abort, T9 would have read (and possibly shown to the user) an inconsistent database state. Hence, the database must ensure that schedules are recoverable.
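The recoverability condition can be checked mechanically over a schedule. A minimal sketch, assuming an illustrative encoding where each entry is (transaction, action, item), with action one of 'read', 'write', 'commit' (item is None for commit):

```python
def is_recoverable(schedule):
    """True iff every Tj that read an item last written by Ti
    commits only after Ti has committed."""
    last_writer = {}   # item -> transaction that wrote it most recently
    reads_from = {}    # Tj -> set of transactions Tj has read from
    committed = []     # transactions committed so far, in order
    for txn, action, item in schedule:
        if action == "write":
            last_writer[item] = txn
        elif action == "read":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
        elif action == "commit":
            # Tj may commit only after everyone it read from has committed.
            for src in reads_from.get(txn, ()):
                if src not in committed:
                    return False
            committed.append(txn)
    return True
```

On the slide's example, T9 committing immediately after reading A (written by the still-uncommitted T8) makes the schedule nonrecoverable; delaying T9's commit until after T8's fixes it.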

Page 24: Introduction to database-Transaction Concurrency and Recovery

Cascading Rollbacks

Cascading rollback – a single transaction failure leads to a series of transaction rollbacks. Consider the following schedule where none of the transactions has yet committed (so the schedule is recoverable).
If T10 fails, T11 and T12 must also be rolled back.
This can lead to the undoing of a significant amount of work.

Page 25: Introduction to database-Transaction Concurrency and Recovery

Cascadeless Schedules

Cascadeless schedules — for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the read operation of Tj.
Every cascadeless schedule is also recoverable.
It is desirable to restrict the schedules to those that are cascadeless.
Example of a schedule that is NOT cascadeless:

Page 26: Introduction to database-Transaction Concurrency and Recovery

Concurrency Control

A database must provide a mechanism that will ensure that all possible schedules are both:
Conflict serializable
Recoverable, and preferably cascadeless

A policy in which only one transaction can execute at a time generates serial schedules, but provides a poor degree of concurrency.
Concurrency-control schemes trade off between the amount of concurrency they allow and the amount of overhead that they incur.
Testing a schedule for serializability after it has executed is a little too late!
Tests for serializability help us understand why a concurrency control protocol is correct.
Goal – to develop concurrency control protocols that will assure serializability.

Page 27: Introduction to database-Transaction Concurrency and Recovery

Weak Levels of Consistency

Some applications are willing to live with weak levels of consistency, allowing schedules that are not serializable.
E.g., a read-only transaction that wants to get an approximate total balance of all accounts.
E.g., database statistics computed for query optimization can be approximate (why?).
Such transactions need not be serializable with respect to other transactions.
Tradeoff: accuracy for performance.

Page 28: Introduction to database-Transaction Concurrency and Recovery

Levels of Consistency in SQL-92

Serializable — the default.
Repeatable read — only committed records may be read, and repeated reads of the same record must return the same value. However, a transaction may not be serializable – it may find some records inserted by a transaction but not find others.
Read committed — only committed records can be read, but successive reads of a record may return different (but committed) values.
Read uncommitted — even uncommitted records may be read.
Lower degrees of consistency are useful for gathering approximate information about the database.
Warning: some database systems do not ensure serializable schedules by default.
E.g., Oracle and PostgreSQL by default support a level of consistency called snapshot isolation (not part of the SQL standard).

Page 29: Introduction to database-Transaction Concurrency and Recovery

Transaction Definition in SQL

The data manipulation language must include a construct for specifying the set of actions that comprise a transaction.
In SQL, a transaction begins implicitly.
A transaction in SQL ends by:
Commit work — commits the current transaction and begins a new one.
Rollback work — causes the current transaction to abort.
In almost all database systems, by default, every SQL statement also commits implicitly if it executes successfully.
Implicit commit can be turned off by a database directive.
E.g. in JDBC, connection.setAutoCommit(false);


Page 34: Introduction to database-Transaction Concurrency and Recovery

Recoverable Schedules

Recoverable schedule — if a transaction Tj reads a data item previously written by a transaction Ti, then the commit operation of Ti must appear before the commit operation of Tj.
This is a nonrecoverable schedule because:
If T8 fails before it commits, then T9 has read the new value of A, i.e. T9 is dependent on T8.
Therefore T9 should also be aborted along with T8.
But T9 has already committed with an inconsistent database state.
To make this schedule recoverable, T9 has to delay committing until after T8 commits.

Page 35: Introduction to database-Transaction Concurrency and Recovery

Cascading Rollbacks

A single transaction failure may lead to a series of transaction rollbacks.
T10 writes A, read by T11.
T11 writes A, read by T12.
T12 depends on T11, and T11 depends on T10.
Now if T10 fails, then T11 and T12 also have to be rolled back along with T10, due to their interdependency.
A transaction failure that leads to a series of rollbacks is called a cascading rollback.
It is undesirable, as it involves undoing a significant amount of work.

Page 36: Introduction to database-Transaction Concurrency and Recovery

Cascadeless Schedules

It is desirable to restrict the schedules so that cascading rollback can't occur.
Such schedules are called cascadeless schedules.
i.e. for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the read operation of Tj.
Every cascadeless schedule is also recoverable.

Page 37: Introduction to database-Transaction Concurrency and Recovery

Concurrency Control

It is a mechanism to ensure isolation in a concurrent execution scenario.
It is achieved using the concept of mutual exclusion, i.e. while one transaction is accessing a data item, no other transaction is allowed to modify that data item.
Mutual exclusion is achieved using logical locks.
Locks are granted/revoked by a central concurrency control manager, i.e. transactions request it to grant a lock.

Page 38: Introduction to database-Transaction Concurrency and Recovery

Locks

Data items can be locked in two modes:
Shared: the data item can only be read. Requested using the lock-S instruction. It can be shared with other transactions.
Exclusive: the data item can be both read and written. Requested using the lock-X instruction. It can't be shared with other transactions.

Lock-compatibility matrix:
Shared locks may be granted to multiple transactions simultaneously.
Exclusive locks can't be granted to multiple transactions simultaneously.
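The lock-compatibility matrix is small enough to write out directly. A minimal sketch of how a concurrency control manager might consult it (names are illustrative):

```python
# Lock-compatibility matrix: COMPATIBLE[requested][held].
# S (shared) is compatible only with S; X (exclusive) with nothing.
COMPATIBLE = {
    "S": {"S": True,  "X": False},
    "X": {"S": False, "X": False},
}

def can_grant(requested, held_modes):
    """A lock request is granted only if it is compatible with every
    lock currently held on the item by other transactions."""
    return all(COMPATIBLE[requested][h] for h in held_modes)
```

So a lock-S request succeeds while other transactions hold S locks, but any request clashes with an existing X lock, and a lock-X request clashes with everything.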

Page 39: Introduction to database-Transaction Concurrency and Recovery

Locking Example

T1              T2              CC Manager
Lock-X(B)
                                Grant-X(B, T1)
Read(B)
B=B-50
Write(B)
Unlock(B)
                Lock-S(A)
                                Grant-S(A, T2)
                Read(A)
                Unlock(A)
                Lock-S(B)
                                Grant-S(B, T2)
                Read(B)
                Unlock(B)
                Display(A+B)
Lock-X(A)
                                Grant-X(A, T1)
Read(A)
A=A+50
Write(A)
Unlock(A)

This produces an inconsistent result: T2 reads B after T1 has debited it, but reads A before T1 has credited it.
A transaction must hold the lock on a data item as long as it accesses that item.
However, it is not necessarily desirable for a transaction to unlock a data item immediately after its final access of that item, since serializability may not be ensured.

Page 40: Introduction to database-Transaction Concurrency and Recovery

Delayed Unlocking

Unlocking is delayed to the end of the transactions.
This schedule produces a consistent result.
But this technique may lead to an undesirable scenario called deadlock:
A transaction (T3) delays unlocking its locked data item (B) and requests a lock on a new data item (A) that is already locked by another transaction (T4).
This is called a hold-and-wait situation.

Page 41: Introduction to database-Transaction Concurrency and Recovery

Deadlock due to Hold-and-Wait

T1              T2              CC Manager
Lock-X(B)
                                Grant-X(B, T1)
Read(B)
B=B-50
Write(B)
                Lock-S(A)
                                Grant-S(A, T2)
                Read(A)
                Lock-S(B)
Deadlock

Lock-based concurrency control needs the transactions to follow a set of rules called a locking protocol.
A locking protocol is a set of rules followed by all transactions while requesting and releasing locks.
Locking protocols restrict the set of possible schedules.

Page 42: Introduction to database-Transaction Concurrency and Recovery

The Two-Phase Locking Protocol

It requires that transactions execute in two phases:
Phase 1: Growing Phase
The transaction may obtain locks.
The transaction may not release locks.
Phase 2: Shrinking Phase
The transaction may release locks.
The transaction may not obtain locks.
i.e. the transaction acquires locks as needed. Once the transaction releases a lock, it can't issue any more lock requests.
Ex. Transactions T3 and T4 are two-phase. On the other hand, transactions T1 and T2 are not two-phase.
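Whether a single transaction's lock/unlock sequence obeys the two-phase rule is a one-pass check. A minimal sketch (the ('lock', item) / ('unlock', item) encoding is illustrative):

```python
def is_two_phase(ops):
    """ops: the sequence of ('lock', item) / ('unlock', item) actions
    of ONE transaction, in order. Two-phase iff no lock is acquired
    after any unlock (growing phase strictly precedes shrinking phase)."""
    seen_unlock = False
    for action, _item in ops:
        if action == "unlock":
            seen_unlock = True        # shrinking phase has begun
        elif action == "lock" and seen_unlock:
            return False              # a lock request after an unlock
    return True
```

The earlier locking example violates this: the transaction unlocks B and only later requests a lock on A.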

Page 43: Introduction to database-Transaction Concurrency and Recovery

Properties of 2PL

2PL ensures conflict serializability. The contributing transactions can be serialized in the order of their lock points (the point at which the growing phase ends and the shrinking phase starts).
It does not ensure deadlock-free execution. In the event of deadlock, participating transactions are rolled back.
It ensures recoverability but does not safeguard against cascading rollback.
Strict two-phase locking is an enhanced 2PL that ensures cascadeless recovery. Strict 2PL requires that, in addition to locking and unlocking being two-phase, all exclusive-mode locks must be held by the transaction till the transaction commits.
In rigorous 2PL, both shared and exclusive locks are held till the transaction commits.

Page 44: Introduction to database-Transaction Concurrency and Recovery

Timestamp-Based Protocols

The serialization order of conflicting transactions is decided in advance, based on the timestamp values of the participating transactions.
i.e. a transaction with a lower timestamp is executed before a transaction with a higher timestamp.
Each transaction is issued a timestamp when it enters the system.
If an old transaction Ti has timestamp TS(Ti), a new transaction Tj is assigned timestamp TS(Tj) such that TS(Ti) < TS(Tj).

Page 45: Introduction to database-Transaction Concurrency and Recovery

Timestamp Ordering Protocol - I

The protocol maintains for each data item Q two timestamp values:
W-timestamp(Q) is the largest (most recent) timestamp of any transaction that executed write(Q) successfully.
R-timestamp(Q) is the largest timestamp of any transaction that executed read(Q) successfully.

Suppose a transaction Ti issues a read(Q) (case I):
a) If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten. Hence, the read operation is rejected, and Ti is rolled back.
b) If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) = max(R-timestamp(Q), TS(Ti)).

Page 46: Introduction to database-Transaction Concurrency and Recovery

Timestamp-Based Protocols - II

Suppose that transaction Ti issues write(Q) (case II):
a) If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously, and the system assumed that that value would never be produced. Hence, the write operation is rejected, and Ti is rolled back.
b) If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, this write operation is rejected, and Ti is rolled back.
c) Otherwise, the write operation is executed, and W-timestamp(Q) = TS(Ti).

Transactions that are rejected are rolled back, and need to re-enter the system for execution with a new timestamp value.
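Cases I and II above translate almost line for line into code. A minimal sketch (initial timestamps default to 0, matching the worked example on the next slides; `read`/`write` return False where the protocol would roll the transaction back):

```python
class TimestampOrdering:
    """Basic timestamp-ordering protocol: out-of-order reads and
    writes are rejected (the issuing transaction would be rolled back)."""
    def __init__(self):
        self.r_ts = {}  # item -> largest timestamp of a successful read
        self.w_ts = {}  # item -> largest timestamp of a successful write

    def read(self, ts, item):
        if ts < self.w_ts.get(item, 0):
            return False                 # case I(a): value already overwritten
        self.r_ts[item] = max(self.r_ts.get(item, 0), ts)
        return True                      # case I(b)

    def write(self, ts, item):
        if ts < self.r_ts.get(item, 0):
            return False                 # case II(a): a later read needed old value
        if ts < self.w_ts.get(item, 0):
            return False                 # case II(b): obsolete write
        self.w_ts[item] = ts             # case II(c)
        return True
```

Replaying the slides' example with TS(T25)=1 and TS(T26)=2: both reads of B succeed, T26's write(B) succeeds, and a subsequent write(B) by T25 would be rejected since R-TS(B) is already 2.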

Page 47: Introduction to database-Transaction Concurrency and Recovery

Example: Timestamp Ordering - I

TS(T25)=1, TS(T26)=2, i.e. TS(T25) < TS(T26)
R-TS(A)=0, R-TS(B)=0
W-TS(A)=0, W-TS(B)=0

T25 – read(B): TS(T25) > W-TS(B) (case I, b), so R-TS(B)=1
T26 – read(B): TS(T26) > W-TS(B) (case I, b), so R-TS(B)=2
T26 – write(B): TS(T26) > W-TS(B) (case II, c), so W-TS(B)=2

Page 48: Introduction to database-Transaction Concurrency and Recovery

Example: Timestamp Ordering - II

T25 – read(A): TS(T25) > W-TS(A) (case I, b), so R-TS(A)=1
T26 – read(A): TS(T26) > W-TS(A) (case I, b), so R-TS(A)=2
T26 – write(A): TS(T26) > W-TS(A) (case II, c), so W-TS(A)=2

Hence, the schedule is possible under the timestamp protocol, i.e. the schedule is conflict serializable.

Page 49: Introduction to database-Transaction Concurrency and Recovery

Properties of the TS Ordering Protocol

This protocol ensures conflict serializability.
It ensures freedom from deadlock.
Cascadeless recovery can be ensured by making a few modifications to the protocol.

Page 50: Introduction to database-Transaction Concurrency and Recovery

Multiversion Schemes

The concurrency-control schemes above ensure serializability by either delaying an operation or aborting the transaction.
Multiversion schemes keep old versions of data items to increase concurrency, instead of overwriting the old value.
Each successful write results in the creation of a new version of the data item written. Timestamps are used to label versions.
When a read(Q) operation is issued, an appropriate version of Q is selected based on the timestamp of the transaction.
Reads never have to wait, as an appropriate version is returned immediately.

Page 51: Introduction to database-Transaction Concurrency and Recovery

Multiversion Timestamp Ordering

Each data item Q has a sequence of versions <Q1, Q2, ..., Qm>. Each version Qk contains three data fields:
Content – the value of version Qk.
W-timestamp(Qk) – timestamp of the transaction that created (wrote) version Qk.
R-timestamp(Qk) – largest timestamp of a transaction that successfully read version Qk.
When a transaction Ti creates a new version Qk of Q, Qk's W-timestamp and R-timestamp are initialized to TS(Ti).
The R-timestamp of Qk is updated whenever a transaction Tj reads Qk and TS(Tj) > R-timestamp(Qk).

Page 52: Introduction to database-Transaction Concurrency and Recovery

Multiversion Timestamp Ordering - II

Suppose that transaction Ti issues a read(Q) or write(Q) operation. Let Qk denote the version of Q whose write timestamp is the largest write timestamp less than or equal to TS(Ti).
1. If transaction Ti issues a read(Q), then the value returned is the content of version Qk.
2. If transaction Ti issues a write(Q):
   1. If TS(Ti) < R-timestamp(Qk), then transaction Ti is rolled back.
   2. If TS(Ti) = W-timestamp(Qk), the contents of Qk are overwritten.
   3. Else a new version of Q is created.

Observe that:
Reads always succeed.
A write by Ti is rejected if some other transaction Tj that (in the serialization order defined by the timestamp values) should read Ti's write has already read a version created by a transaction older than Ti.
The protocol guarantees serializability.

Page 53: Introduction to database-Transaction Concurrency and Recovery

Multiversion Two-Phase Locking

It differentiates between read-only transactions and update transactions.
Update transactions acquire read and write locks, and hold all locks up to the end of the transaction. That is, update transactions follow rigorous two-phase locking.
Each successful write results in the creation of a new version of the data item written.
Each version of a data item has a single timestamp, whose value is obtained from a counter ts-counter that is incremented during commit processing.
Read-only transactions are assigned a timestamp by reading the current value of ts-counter before they start execution; they follow the multiversion timestamp-ordering protocol for performing reads.

Page 54: Introduction to database-Transaction Concurrency and Recovery

AKN/IDBIV.54Introduction to databases

Multiversion Two-Phase Locking - II

When an update transaction wants to read a data item, it obtains a shared lock on it and reads the latest version.

When it wants to write an item, it obtains an exclusive (X) lock on it; it then creates a new version of the item and sets this version's timestamp to ∞.

When update transaction Ti completes, commit processing occurs:

1. Ti sets the timestamp on the versions it has created to ts-counter + 1.

2. Ti increments ts-counter by 1.

Read-only transactions that start after Ti increments ts-counter will see the values updated by Ti. Read-only transactions that start before Ti increments ts-counter will see the values before the updates by Ti.

Only serializable schedules are produced.
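A toy sketch of the commit processing above; the class and method names are my own, and locking itself is elided (versions are simply created at commit time and stamped ts_counter + 1, as in the two commit steps on the slide):

```python
class MV2PL:
    """Minimal sketch of MV2PL commit processing and snapshot reads."""

    def __init__(self):
        self.ts_counter = 0
        self.versions = {}          # item -> list of (timestamp, value)

    def start_readonly(self):
        # read-only transactions snapshot ts_counter before they start
        return self.ts_counter

    def read(self, item, ro_ts):
        # return the latest version no newer than the reader's timestamp
        candidates = [v for v in self.versions.get(item, []) if v[0] <= ro_ts]
        return max(candidates)[1]

    def commit(self, writes):
        # stamp every version created by this transaction with
        # ts_counter + 1, then increment ts_counter
        ts = self.ts_counter + 1
        for item, value in writes.items():
            self.versions.setdefault(item, []).append((ts, value))
        self.ts_counter = ts
```

A read-only transaction that snapshots ts-counter before a later commit keeps seeing the old value, exactly as the slide states.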

Page 55: Introduction to database-Transaction Concurrency and Recovery


Failure Classification

Transaction failure:

Logical errors: the transaction cannot complete due to some internal error condition, e.g., bad input or data not found.

System errors: the system has entered an undesirable state (e.g., deadlock) in which the transaction cannot continue.

System crash: a power failure or other hardware or software failure causes the system to crash.

Fail-stop assumption: non-volatile storage contents are assumed not to be corrupted as a result of a system crash. Database systems have numerous integrity checks to prevent corruption of disk data.

Disk failure: a head crash or similar disk failure destroys all or part of disk storage. Destruction is assumed to be detectable: disk drives use checksums to detect failures.

Page 56: Introduction to database-Transaction Concurrency and Recovery


Log-Based Recovery

A log is kept on stable storage. The log is a sequence of log records, which maintain information about update activities on the database.

When transaction Ti starts, it registers itself by writing a record <Ti start> to the log.

Before Ti executes a write(Q), a log record of the form <Ti, Q, Vold, Vnew> is written, where Vold is the value of Q before the write and Vnew is the value to be written to Q.

When Ti finishes its last statement, a log record of the form <Ti commit> is written.

On abnormal termination, a log record of the form <Ti abort> is written.
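The sequence of log records above can be sketched for the Rs. 50 transfer transaction introduced at the start of this chapter; `db` and `log` are plain Python stand-ins for the database and stable storage, and the tuple encoding of log records is my own illustration:

```python
def run_transfer(db, log):
    """Transfer 50 from A to B, writing log records as on the slide."""
    log.append(("T1", "start"))              # <T1 start>
    a = db["A"]                              # read(A)
    log.append(("T1", "A", a, a - 50))       # <T1, A, Vold, Vnew>, before the write
    db["A"] = a - 50                         # write(A)
    b = db["B"]                              # read(B)
    log.append(("T1", "B", b, b + 50))       # <T1, B, Vold, Vnew>
    db["B"] = b + 50                         # write(B)
    log.append(("T1", "commit"))             # <T1 commit>
```

Note that each update record is appended before the corresponding write is applied, which is what makes undo possible later.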

Page 57: Introduction to database-Transaction Concurrency and Recovery


Atomicity Preservation

In the event of a failure, the system scans the log backwards (from the end toward the beginning) to determine the transactions whose atomicity/durability properties are at risk, and the recovery scheme performs the following operation:

If a <Ti start> log record is found for a transaction but no <Ti commit> record, that transaction needs to be rolled back. To preserve atomicity, undo(Ti) is executed.

undo(Ti) restores all modified data items to their old values, as recorded in the corresponding modification log records of transaction Ti.

Page 58: Introduction to database-Transaction Concurrency and Recovery


Durability Preservation

All the transactions that had completed execution and committed by the time the failure occurred have their durability property at risk.

Procedure:

The log is scanned backwards to find the transactions for which both <Ti start> and <Ti commit> records are present in the log.

To preserve durability, execute redo(Ti): it sets the value of each data item modified by transaction Ti to its new value, as found in the modification log records of Ti.

Both undo(Ti) and redo(Ti) are idempotent, i.e., undoing or redoing a transaction several times yields the same final outcome.
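Both operations can be sketched as simple log scans; the tuple encoding of log records (4-tuples for updates, 2-tuples for start/commit) is an illustrative assumption. Because each pass just overwrites items with values taken from the log, running either function twice leaves the database unchanged, which is the idempotence property:

```python
def undo(log, txn, db):
    """Restore old values, scanning txn's update records backwards."""
    for rec in reversed(log):
        if len(rec) == 4 and rec[0] == txn:
            _, item, v_old, _ = rec
            db[item] = v_old

def redo(log, txn, db):
    """Reapply new values, scanning txn's update records in log order."""
    for rec in log:
        if len(rec) == 4 and rec[0] == txn:
            _, item, _, v_new = rec
            db[item] = v_new
```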

Page 59: Introduction to database-Transaction Concurrency and Recovery


Preservation Example

T1: <T1 start>
    read(A)
    <T1, A, 1000, 950>
    A = A - 50
    write(A)
    read(B)
    <T1, B, 500, 550>
    B = B + 50
    write(B)
    <T1 commit>

T2: <T2 start>
    read(C)
    <T2, C, 300, 400>
    C = C + 100
    write(C)
    <T2 commit>

T3: <T3 start>
    read(D)
    read(E)
    display(D + E)
    <T3 commit>

Recovery procedure at the three failure points:

F1 (crash before <T1 commit> is written): Undo(T1).

F2 (crash after <T1 commit> but before <T2 commit>): Redo(T1), Undo(T2).

F3 (crash after <T2 commit> but before <T3 commit>): Redo(T1), Redo(T2), Undo(T3). (T3 performs no updates, so undoing it restores nothing.)
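The decision rule behind these failure points (a <start> without a matching <commit> is undone; a <start> with a <commit> is redone) can be sketched as a single pass over the log; the tuple encoding of log records is my own illustration:

```python
def recovery_actions(log):
    """Return (undo_set, redo_set) of transaction names from a log."""
    started, committed = set(), set()
    for rec in log:
        if rec[1:] == ("start",):
            started.add(rec[0])
        elif rec[1:] == ("commit",):
            committed.add(rec[0])
    # start without commit -> undo; start and commit -> redo
    return started - committed, started & committed
```

Applied to the log as it stands at failure point F2 (T1 committed, T2 still in flight), this yields undo = {T2} and redo = {T1}.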

Page 60: Introduction to database-Transaction Concurrency and Recovery


Data Access with Concurrent Transactions

(Figure: blocks A and B on disk are brought into buffer blocks by input(A) and written back by output(B). read(X) copies a data item from a buffer block into a transaction's private work area, and write(Y) copies a data item from the work area back to the buffer. The work area of T1 holds local copies x1 and y1; the work area of T2 holds x2.)

Page 61: Introduction to database-Transaction Concurrency and Recovery


Approaches to Log-Based Recovery

Immediate database modification: allows updates of an uncommitted transaction to be made to the buffer, or to the disk itself, before the transaction commits. In this scheme, a failed transaction must undergo the undo(Ti) operation to preserve atomicity.

Deferred database modification: performs updates to the buffer/disk only at the time of transaction commit. In this scheme, a failed transaction does not need the undo(Ti) operation; the recovery procedure simply ignores and deletes the modification log records of the failed transaction.

Page 62: Introduction to database-Transaction Concurrency and Recovery


Database Modification Example

Log                    Write              Output
<T0 start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>
                       A = 950
                       B = 2050
<T0 commit>
<T1 start>
<T1, C, 700, 600>
                       C = 600
                                          BB, BC
<T1 commit>
                                          BA

Note: BX denotes the block containing X. Observe that BC is output before T1 commits, while BA is output only after T0 has committed.

Page 63: Introduction to database-Transaction Concurrency and Recovery


Checkpoints

Redoing/undoing all transactions recorded in the log can be very slow:

Processing the entire log is time-consuming if the system has run for a long time.

We may unnecessarily redo transactions that have already output their updates to the database.

To avoid these overheads, checkpoints are introduced and checkpointing is performed periodically. All updates are stopped while a checkpoint is taken:

1. Output all log records currently residing in main memory onto stable storage.

2. Output all modified buffer blocks to the disk.

3. Write a log record <checkpoint L> onto stable storage, where L is a list of all transactions active at the time of the checkpoint.

Page 64: Introduction to database-Transaction Concurrency and Recovery


Checkpoints (Cont.)

During recovery we need to consider only the most recent transaction Ti that started before the checkpoint, and transactions that started after Ti:

Scan backwards from the end of the log to find the most recent <checkpoint L> record.

Only transactions that are in L, or that started after the checkpoint, need to be redone or undone.

Transactions that committed or aborted before the checkpoint have already had all their updates output to stable storage.

Some earlier part of the log may still be needed for undo operations: continue scanning backwards until a <Ti start> record is found for every transaction Ti in L. Parts of the log prior to the earliest such <Ti start> record are not needed for recovery and can be erased whenever desired.
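The backward scan just described can be sketched as follows; the list-of-tuples log encoding, including a ("checkpoint", L) record, is my own illustration:

```python
def find_recovery_start(log):
    """Scan backwards for the most recent <checkpoint L> record, then keep
    scanning until a <start> record is seen for every transaction in L.
    Returns the index from which recovery must process the log."""
    for i in range(len(log) - 1, -1, -1):
        if log[i][0] == "checkpoint":
            pending = set(log[i][1])   # transactions active at the checkpoint
            break
    else:
        return 0                       # no checkpoint: the whole log is needed
    start = i
    while pending and start > 0:
        start -= 1
        rec = log[start]
        if rec[1:] == ("start",):
            pending.discard(rec[0])
    return start
```

Everything before the returned index is not needed for recovery and could be erased.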

Page 65: Introduction to database-Transaction Concurrency and Recovery


Basic Concepts

Query: find all instructors in the Physics department.

It is inefficient for the system to read every tuple in the instructor relation to check whether the dept_name value is "Physics". To let the system locate these records directly, additional structures called index structures are associated with the relation.

An index structure consists of index entries/records having two fields: (search key, pointer).

Search key: attribute(s) used to look up records in a file.

Pointer: specifies the exact physical address of the record containing the search-key value.

Physical address: the block identifier of the record together with the offset to locate the record within the block.

Page 66: Introduction to database-Transaction Concurrency and Recovery


Index Structure - I

Two basic kinds of indices:

Ordered indices: search keys are stored in sorted order.

Hash indices: search keys are distributed uniformly across "buckets" using a "hash function".

In an ordered index, index entries are stored sorted on the search-key value, e.g., an author catalog in a library.

Primary index: in a sequentially ordered file, the index whose search key specifies the sequential order of the file; also called a clustering index. The search key of a primary index is usually, but not necessarily, the primary key.

Secondary index: an index whose search key specifies an order different from the sequential order of the file; also called a non-clustering index.

Page 67: Introduction to database-Transaction Concurrency and Recovery


Example

A sequential file of instructor records: the records are stored in sorted order of instructor ID, which is used as the search key.

Page 68: Introduction to database-Transaction Concurrency and Recovery


Index Files

An index entry, or index record, consists of a search-key value and pointers to one or more records with that value as their search-key value. The pointer to a record consists of the identifier of a disk block and an offset within the disk block to identify the record within the block.

Index-sequential file: an ordered sequential file with a primary index.

There are two types of ordered indices that we can use:

Dense index: an index record appears for every search-key value in the file, e.g., an index on the ID attribute of the instructor relation.

Sparse index: contains index records for only some search-key values; applicable when records are sequentially ordered on the search key.

Page 69: Introduction to database-Transaction Concurrency and Recovery


Ex1. Dense Index Files

Dense index on emp_id, with the instructor file sorted on emp_id.

Page 70: Introduction to database-Transaction Concurrency and Recovery


Ex2. Dense Index Files

Dense index on dept_name, with the instructor file sorted on dept_name. The address pointer points to the first record in the file with that search-key value; all other records with the same search-key value follow it sequentially.

Page 71: Introduction to database-Transaction Concurrency and Recovery


Sparse Index Files

To locate a record with search-key value K:

1. Find the index record with the largest search-key value ≤ K.

2. Search the file sequentially starting at the record to which that index record points.
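The two-step lookup above can be sketched with a sorted list of (search-key, block-number) entries; this representation, and the helper name, are illustrative assumptions:

```python
import bisect

def sparse_lookup(index, blocks, key):
    """Sparse-index lookup: find the entry with the largest search-key
    value <= key, then scan the file sequentially from there.

    index:  sorted list of (search_key, block_no), one entry per block
    blocks: list of blocks, each a sorted list of (search_key, data) records
    """
    keys = [k for k, _ in index]
    pos = bisect.bisect_right(keys, key) - 1   # largest entry <= key
    if pos < 0:
        return None                            # key precedes the whole file
    _, block_no = index[pos]
    for b in range(block_no, len(blocks)):     # sequential scan onward
        for record in blocks[b]:
            if record[0] == key:
                return record
            if record[0] > key:                # passed it: not in the file
                return None
    return None
```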

Page 72: Introduction to database-Transaction Concurrency and Recovery


Sparse Index Files (Cont.)

Only a clustering index can be made sparse.

Accessing records through a dense index is faster than through a sparse index; a sparse index, on the other hand, consumes less space, which is beneficial for a voluminous database.

In practice, a sparse index is designed so that there is at least one index entry per disk block across which the index-sequential file spreads.

Page 73: Introduction to database-Transaction Concurrency and Recovery


Multilevel Index - I

Suppose we build a dense index on a relation with 1,000,000 tuples, and assume that 100 index entries fit in a 4-kilobyte block. The index then occupies 10,000 blocks.

If the relation instead had 100,000,000 tuples, the index would occupy 1,000,000 blocks, or about 4 gigabytes of space.

If the primary index does not fit in memory, access becomes expensive; such large indices are stored as sequential files on disk.

Solution: treat the primary index kept on disk as a sequential file and construct a sparse index on it. This is called multilevel indexing.
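The sizing arithmetic above can be checked directly; the entries-per-block figure is the slide's assumption, and the rounding up to whole blocks is implicit there:

```python
ENTRIES_PER_BLOCK = 100        # index entries per 4 KB block (slide's figure)
BLOCK_SIZE = 4 * 1024          # bytes

def index_blocks(n_entries):
    # one dense-index entry per tuple, rounded up to whole blocks
    return -(-n_entries // ENTRIES_PER_BLOCK)

assert index_blocks(1_000_000) == 10_000          # 1M tuples  -> 10,000 blocks
assert index_blocks(100_000_000) == 1_000_000     # 100M tuples -> 1M blocks
assert index_blocks(100_000_000) * BLOCK_SIZE == 4_096_000_000   # ~4 GB
# a sparse outer index with one entry per inner-index block:
assert index_blocks(index_blocks(100_000_000)) == 10_000
```

The last line previews the next slide: a second, sparse level over the 1,000,000-block inner index needs only 10,000 blocks.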

Page 74: Introduction to database-Transaction Concurrency and Recovery


Multilevel Index - II

The first level of index can be considered an index-sequential file, and we build another level of sparse index on it:

inner index: the primary index file.

outer index: a sparse index on the primary index. The second level of index must occupy fewer disk blocks than the inner index.

If even the outer index is too large to fit in main memory, yet another level of index can be created, and so on.

Indices at all levels must be updated on insertion into or deletion from the file.

Page 75: Introduction to database-Transaction Concurrency and Recovery


Multilevel Index - III

Page 76: Introduction to database-Transaction Concurrency and Recovery


Index Update: Deletion

Single-level index entry deletion:

Dense indices: deletion of a search key is similar to file record deletion.

Sparse indices: if an entry for the search key exists in the index, it is deleted by replacing the entry in the index with the next search-key value in the file (in search-key order); if the next search-key value already has an index entry, the entry is deleted instead of being replaced.

If the deleted record was the only record in the file with its particular search-key value, the search key is deleted from the index as well.

Page 77: Introduction to database-Transaction Concurrency and Recovery


Index Update: Insertion

Single-level index insertion:

Perform a lookup using the search-key value appearing in the record to be inserted.

Dense indices: if the search-key value does not appear in the index, insert it.

Sparse indices: if the index stores an entry for each block of the file, no change needs to be made to the index unless a new block is created; if a new block is created, the first search-key value appearing in the new block is inserted into the index.

Multilevel insertion and deletion: the algorithms are simple extensions of the single-level algorithms.
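Single-level maintenance on a dense index can be sketched over a sorted list of (search-key, pointer) entries; this representation, and the rule that only the first occurrence of a duplicate key gets an entry, follow the dense-index description above, while the helper names are my own:

```python
import bisect

def dense_insert(index, key, ptr):
    """Dense-index insertion: look up the search key; insert an entry
    only if the key is not already present."""
    keys = [k for k, _ in index]
    pos = bisect.bisect_left(keys, key)
    if pos == len(index) or index[pos][0] != key:
        index.insert(pos, (key, ptr))

def dense_delete(index, key):
    """Dense-index deletion: drop the entry once no record with that
    search key remains in the file."""
    keys = [k for k, _ in index]
    pos = bisect.bisect_left(keys, key)
    if pos < len(index) and index[pos][0] == key:
        index.pop(pos)
```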

Page 78: Introduction to database-Transaction Concurrency and Recovery


Secondary Indices

Frequently, one wants to find all the records whose values in a certain field (which is not the search key of the primary index) satisfy some condition.

Example 1: in the instructor relation stored sequentially by ID, we may want to find all instructors in a particular department.

Example 2: as above, but we want to find all instructors with a specified salary, or with a salary in a specified range of values.

We can build a secondary index with an index record for each search-key value.

Page 79: Introduction to database-Transaction Concurrency and Recovery


Thank You