Posted: 21-Dec-2015

1

CS 728 Advanced Database Systems

Chapter 20

Foundation of Database Transaction Processing

2

1 Introduction to Transaction Processing (1)

A transaction is a sequence of operations whose execution transforms a database from one consistent state to another consistent state.

Transaction boundaries: Begin and End transaction.

Consistent state: the data currently in the database satisfy all integrity constraints defined for the database.

During transaction execution the database may be inconsistent.

3

Introduction to Transaction Processing (2)

When the transaction is committed, the database must be consistent.

Two main issues to deal with:
– Failures of various kinds, such as hardware failures and system crashes
– Concurrent execution of multiple transactions

[Figure: execution of a transaction takes the database from one consistent state to another consistent state, possibly through an inconsistent state.]

4

Introduction to Transaction Processing (3)

Consider a transaction that transfers $200 from account A to account B.

read(A)
A = A - 200
write(A)
read(B)
B = B + 200
write(B)

System crash
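The danger in this example is a crash that leaves A debited but B not yet credited. The following is a minimal Python sketch of the all-or-nothing behaviour one would want. The crash point (between write(A) and read(B)), the `SimulatedCrash` exception, and the in-memory dict standing in for the database are all assumptions made for illustration; the slide does not say where the crash falls.

```python
import copy


class SimulatedCrash(Exception):
    """Stands in for a system crash in this sketch (hypothetical)."""


def transfer(db, amount, crash_after_debit=False):
    """Move `amount` from A to B, undoing partial work if a crash is simulated."""
    before_image = copy.deepcopy(db)     # saved state used for rollback
    try:
        a = db["A"]                      # read(A)
        a = a - amount                   # A = A - amount
        db["A"] = a                      # write(A)
        if crash_after_debit:
            raise SimulatedCrash("crash between write(A) and read(B)")
        b = db["B"]                      # read(B)
        b = b + amount                   # B = B + amount
        db["B"] = b                      # write(B)
    except SimulatedCrash:
        db.clear()
        db.update(before_image)          # undo the partial update
        return False
    return True


db = {"A": 1000, "B": 500}
crashed_ok = transfer(db, 200, crash_after_debit=True)
state_after_crash = dict(db)             # unchanged: the debit was undone
normal_ok = transfer(db, 200)
state_after_success = dict(db)
```

Without the rollback step, the crashed run would leave A = 800 and B = 500, so the sum A + B would no longer be preserved.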

5

Introduction to Transaction Processing (5)

Basic operations are read and write:
– read_item(X): Reads a database item named X into a program variable. To simplify our notation, we assume that the program variable is also named X.
– write_item(X): Writes the value of program variable X into the database item named X.

The basic unit of data transfer from the disk to the computer main memory is one block.

6

Introduction to Transaction Processing (6)

read_item(X) command includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if that disk block is not already in some main memory buffer).
3. Copy item X from the buffer to the program variable named X.

7

Introduction to Transaction Processing (7)

write_item(X) command includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if that disk block is not already in some main memory buffer).
3. Copy item X from the program variable named X into its correct location in the buffer.
4. Store the updated block from the buffer back to disk (either immediately or at some later point in time).
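The read_item and write_item steps can be sketched with a toy buffer manager. This is only an illustration: the class name `BufferedDatabase`, the dict-based "disk", and the `flush_now` parameter are hypothetical and not part of the slides.

```python
class BufferedDatabase:
    """Toy model: `disk` maps block number -> {item name: value}."""

    def __init__(self, disk):
        self.disk = disk
        self.buffers = {}     # main-memory copies of disk blocks
        # Directory used to find the block that holds each item.
        self.block_of = {item: blk for blk, items in disk.items() for item in items}

    def _load(self, item):
        blk = self.block_of[item]                  # find the block address
        if blk not in self.buffers:                # copy the block in unless already buffered
            self.buffers[blk] = dict(self.disk[blk])
        return blk

    def read_item(self, item):
        blk = self._load(item)
        return self.buffers[blk][item]             # buffer -> program variable

    def write_item(self, item, value, flush_now=False):
        blk = self._load(item)
        self.buffers[blk][item] = value            # program variable -> buffer
        if flush_now:
            self.flush(blk)                        # store immediately ...

    def flush(self, blk):
        self.disk[blk] = dict(self.buffers[blk])   # ... or at some later point


disk = {0: {"X": 10, "Y": 20}}
db = BufferedDatabase(disk)
X = db.read_item("X")
db.write_item("X", X + 5)        # the update lives only in the buffer
buffered_only = disk[0]["X"]     # the disk still holds the old value
db.flush(0)
after_flush = disk[0]["X"]
```

The gap between the buffered write and the flush is exactly the window in which a crash can lose an update, which is why recovery needs to track these operations.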

8

Introduction to Transaction Processing (8)

FIGURE 20.2 Two sample transactions: (a) Transaction T1 (b) Transaction T2

9

Why Concurrency Control is needed

Multiple transactions are allowed to run concurrently. Advantages are:
– Increased processor and disk utilization, leading to better transaction throughput: one transaction can be using the CPU while another is reading from or writing to the disk. Throughput is the number of transactions executed in a given amount of time.
– Reduced average response time for transactions: short transactions need not wait behind long ones.

Concurrency control is needed to achieve isolation, i.e., to control the interaction among the concurrent transactions in order to prevent them from destroying the database consistency.

10

Why Concurrency Control is needed

Problems occur when concurrent transactions execute in an uncontrolled manner:
– The Lost Update
– The Temporary Update (or Dirty Read)
– The Incorrect Summary
– Unrepeatable Read: A transaction T1 may read a given value. If another transaction later updates that value and T1 reads that value again, then T1 will see a different value.

11

The Lost Update Problem

This occurs when two transactions that access the same database items have their operations interleaved in a way that makes the value of some database item incorrect.

12

The Temporary Update Problem (Dirty Read)

This occurs when one transaction updates a database item and then the transaction fails for some reason. The updated item is accessed by another transaction before it is changed back to its original value.

13

Incorrect Summary Problem

If one transaction is calculating an aggregate summary function on a number of records while other transactions are updating some of these records, the aggregate function may calculate some values before they are updated and others after they are updated.

14

Incorrect Summary Problem

15

Transaction and System Concepts (1)

A transaction is an atomic unit of work that is either completed in its entirety or not done at all.

For recovery purposes, the system needs to keep track of when the transaction starts, terminates, and commits or aborts.

A transaction must be in one of the following states:
– Active: the initial state; the transaction stays in this state while it is executing.
– Partially committed: after the final statement has been executed.
– Failed: after the discovery that normal execution can no longer proceed.

16

Transaction and System Concepts (2)

– Committed: after successful completion.
– Aborted: after the transaction has been rolled back and the database restored to its state prior to the start of the transaction. Two options after it has been aborted:
    – restart the transaction (only if no internal logical error)
    – kill the transaction
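The five states can be sketched as a small state machine. The transitions below follow the usual textbook diagram and are an assumption here, since the diagram itself did not survive in this transcript; the event names are illustrative only.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    FAILED = "failed"
    COMMITTED = "committed"
    ABORTED = "aborted"


# (current state, event) -> next state; event names are illustrative.
TRANSITIONS = {
    (State.ACTIVE, "end_transaction"): State.PARTIALLY_COMMITTED,
    (State.ACTIVE, "abort"): State.FAILED,
    (State.PARTIALLY_COMMITTED, "commit"): State.COMMITTED,
    (State.PARTIALLY_COMMITTED, "abort"): State.FAILED,
    (State.FAILED, "rollback"): State.ABORTED,
}


def step(state, event):
    """Return the next state, or raise ValueError for a disallowed event."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state.value!r}") from None


def run(events):
    """Drive a transaction from ACTIVE through a list of events."""
    state = State.ACTIVE
    for event in events:
        state = step(state, event)
    return state


committed = run(["end_transaction", "commit"])
aborted = run(["abort", "rollback"])
```

Committed and aborted are terminal in this sketch: no event leads out of either state.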

17

Transaction and System Concepts (3)

The recovery manager keeps track of the following operations:
– begin_transaction: This marks the beginning of transaction execution.
– read or write: These specify read or write operations on the database items that are executed as part of a transaction.
– end_transaction: This specifies that read and write transaction operations have ended and marks the end limit of transaction execution. At this point it may be necessary to check whether the changes introduced by the transaction can be permanently applied to the database or whether the transaction has to be aborted because it violates concurrency control or for some other reason.

18

Transaction and System Concepts (4)

The recovery manager keeps track of the following operations (cont.):
– commit_transaction: This signals a successful end of the transaction so that any changes (updates) executed by the transaction can be safely committed to the database and will not be undone.
– rollback (or abort): This signals that the transaction has ended unsuccessfully, so that any changes or effects that the transaction may have applied to the database must be undone.

19

Transaction and System Concepts (5)

Recovery techniques use the following operators:
– undo: Similar to rollback except that it applies to a single operation rather than to a whole transaction.
– redo: This specifies that certain transaction operations must be redone to ensure that all the operations of a committed transaction have been applied successfully to the database.

20

[Figure: state transition diagram illustrating the states for transaction execution; surviving edge label: Rollback]

21

Desirable Properties of Transactions (1)

The ACID properties of a transaction:
– Atomicity: a transaction is an atomic processing unit; it is either performed in its entirety or not performed at all.
– Consistency: a transaction transforms a database from a consistent state to another consistent state.
– Isolation: a transaction should not make its updates visible to other transactions until it is committed; this property, when enforced strictly, solves the temporary update problem.
– Durability: committed work must never be lost due to a subsequent failure.

22

ACID Properties

Example:

T1              T2              value of X
read(X)                         200 (initial value)
X = X + 100                     300 (not saved yet)
                read(X)         200
                X = X - 50      150 (not saved yet)
write(X)                        300 (saved)
                write(X)        150 (overwrites 300)

Lost $100! The correct value of X should be 250.
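The interleaving in this example can be replayed in a few lines of Python. This is a sketch under a simple modelling assumption: each transaction keeps its own local copy of X between read(X) and write(X). The function and variable names are hypothetical.

```python
def run(schedule, initial_x):
    """Execute (transaction, operation) steps against a single item X."""
    db = {"X": initial_x}
    local = {}                           # each transaction's private copy of X
    for txn, op in schedule:
        if op == "read":
            local[txn] = db["X"]
        elif op == "write":
            db["X"] = local[txn]
        else:                            # op is a function applied to the local copy
            local[txn] = op(local[txn])
    return db["X"]


def add_100(x):
    return x + 100


def sub_50(x):
    return x - 50


interleaved = [
    ("T1", "read"), ("T1", add_100),
    ("T2", "read"), ("T2", sub_50),
    ("T1", "write"), ("T2", "write"),
]
serial = [
    ("T1", "read"), ("T1", add_100), ("T1", "write"),
    ("T2", "read"), ("T2", sub_50), ("T2", "write"),
]

lost_update_result = run(interleaved, 200)   # T2 overwrites T1's update
serial_result = run(serial, 200)
```

The interleaved run ends with X = 150 because T2 computed its value from the X it read before T1 wrote; the serial run gives the expected 250.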

23

Schedules

A schedule S of n transactions T1, T2, ..., Tn is an ordering of all the operations in these transactions subject to the constraint that, for each transaction Ti, the operations of Ti in S must appear in the same order as they do in Ti.

Note, however, that operations from other transactions Tj can be interleaved with the operations of Ti in S.

Example: Given T1 = R1(Q) W1(Q) and T2 = R2(Q) W2(Q)
– a schedule: R1(Q) R2(Q) W1(Q) W2(Q)
– not a schedule: W1(Q) R1(Q) R2(Q) W2(Q)
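The ordering constraint can be checked mechanically: projecting the candidate onto each transaction must give back that transaction's own operation sequence. A small sketch, where the tuple encoding `(kind, transaction id, item)` and the function name are assumptions made for illustration:

```python
def is_schedule(candidate, transactions):
    """transactions: dict of txn id -> list of (kind, txn id, item) in program order."""
    # Every operation must belong to one of the given transactions.
    if any(op[1] not in transactions for op in candidate):
        return False
    # The operations of each transaction must appear completely and in order.
    return all(
        [op for op in candidate if op[1] == txn] == ops
        for txn, ops in transactions.items()
    )


T = {
    1: [("R", 1, "Q"), ("W", 1, "Q")],
    2: [("R", 2, "Q"), ("W", 2, "Q")],
}
valid = [("R", 1, "Q"), ("R", 2, "Q"), ("W", 1, "Q"), ("W", 2, "Q")]
invalid = [("W", 1, "Q"), ("R", 1, "Q"), ("R", 2, "Q"), ("W", 2, "Q")]

valid_result = is_schedule(valid, T)
invalid_result = is_schedule(invalid, T)
```

The second sequence is rejected because it places W1(Q) before R1(Q), reversing the order inside T1.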

24

Schedules

Sa: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y);

25

Schedules

Sb: r1(X); w1(X); r2(X); w2(X); r1(Y); a1;

26

Conflict Operations

Instructions (operations) li and lj of transactions Ti and Tj, respectively, conflict if and only if there exists some item Q accessed by both li and lj, and at least one of these instructions wrote Q.

1. li = Read(Q), lj = Read(Q). li and lj don’t conflict.
2. li = Read(Q), lj = Write(Q). They conflict.
3. li = Write(Q), lj = Read(Q). They conflict.
4. li = Write(Q), lj = Write(Q). They conflict.

Two operations in a schedule conflict if:
1) They belong to different transactions,
2) They access the same item Q, and
3) At least one of them is a Write(Q) operation.
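The three conditions translate directly into a predicate. A minimal sketch, assuming operations are encoded as `(kind, transaction id, item)` tuples with kind "R" or "W" (an encoding chosen here for illustration):

```python
def conflict(a, b):
    """True if operations a and b conflict under the three conditions above."""
    kind_a, txn_a, item_a = a
    kind_b, txn_b, item_b = b
    return (
        txn_a != txn_b                  # 1) different transactions
        and item_a == item_b            # 2) same item
        and "W" in (kind_a, kind_b)     # 3) at least one write
    )


cases = {
    "read/read": conflict(("R", 1, "Q"), ("R", 2, "Q")),
    "read/write": conflict(("R", 1, "Q"), ("W", 2, "Q")),
    "write/read": conflict(("W", 1, "Q"), ("R", 2, "Q")),
    "write/write": conflict(("W", 1, "Q"), ("W", 2, "Q")),
}
```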

27

Recoverable Schedule

Recoverable schedule: One where no committed transaction needs to be rolled back.

A schedule S is recoverable if no transaction T in S commits until all transactions T’ that have written an item that T reads have committed.

T reads from T’ in S if X is first written by T’ and later read by T. In addition:
– T’ should not have been aborted before T reads X.
– There should be no transaction Ti that writes X after T’ writes it and before T reads it (unless Ti, if any, has aborted before T reads X).

Sa, Sb and Sa’ are recoverable:
Sa’: r1(X); r2(X); w1(X); r1(Y); w2(X); c2; w1(Y); c1;

28

Recoverable Schedule

Consider the following schedules:
Sc: r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1;
Sd: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2;
Se: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); a1; a2;

Sc is not recoverable because T2 reads item X from T1, and then T2 commits before T1 commits.
– If T1 aborts after the c2 operation in Sc, then the value of X that T2 read is no longer valid and T2 must be aborted after it is committed, leading to a schedule that is not recoverable.
– For the schedule to be recoverable, the c2 operation in Sc must be postponed until after T1 commits, as shown in Sd.
– If T1 aborts instead of committing, then T2 should also abort, as shown in Se, because the value of X it read is no longer valid.
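The recoverability condition can be checked by recording, for each transaction, which transactions it has read from, and verifying at each commit that all of them have already committed. The sketch below is a simplified illustration: the `parse` helper and its notation handling are hypothetical, and a read is attributed to the most recent writer of the item that had not aborted at that point.

```python
import re


def parse(text):
    """Turn 'r1(X); w1(X); c1;' into [('r', 1, 'X'), ('w', 1, 'X'), ('c', 1, None)]."""
    return [
        (kind, int(txn), item or None)
        for kind, txn, item in re.findall(r"([rwca])(\d+)(?:\((\w+)\))?", text)
    ]


def is_recoverable(schedule):
    writers = {}        # item -> transactions that wrote it, in order
    reads_from = {}     # txn -> set of transactions whose writes it has read
    committed, aborted = set(), set()
    for kind, txn, item in schedule:
        if kind == "w":
            writers.setdefault(item, []).append(txn)
        elif kind == "r":
            live = [t for t in writers.get(item, []) if t not in aborted]
            if live and live[-1] != txn:
                reads_from.setdefault(txn, set()).add(live[-1])
        elif kind == "c":
            # Every transaction this one read from must already have committed.
            if not reads_from.get(txn, set()) <= committed:
                return False
            committed.add(txn)
        elif kind == "a":
            aborted.add(txn)
    return True


Sc = parse("r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1;")
Sd = parse("r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2;")
Se = parse("r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); a1; a2;")

results = {name: is_recoverable(s) for name, s in (("Sc", Sc), ("Sd", Sd), ("Se", Se))}
```

Sc fails at c2 because T2 has read X from T1, which has not committed at that point; Sd and Se pass.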

29

Cascade-less Schedule

Schedules requiring cascaded rollback: A schedule in which uncommitted transactions that read an item from a failed transaction must be rolled back, as shown in schedule Se.

Cascadeless schedule: One where every transaction reads only the items that are written by committed transactions.
– r2(X) in Sd and Se must be postponed until after T1 has committed (or aborted), thus delaying T2 but ensuring no cascading rollback if T1 aborts.

30

Cascade-less Schedule

Strict schedules: A schedule in which a transaction can neither read nor write an item X until the last transaction that wrote X has committed.

Consider the following schedule:
Sf: w1(X, 5); w2(X, 8); a1;
– Suppose the value of X was originally 9.
– If T1 aborts, as in Sf, the recovery system will restore the value of X to 9, even though it has already been changed to 8 by T2, thus leading to incorrect results.

Although Sf is cascade-less, it is not strict: it permits T2 to write X even though T1, which last wrote X, had not yet committed (or aborted).
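Both properties can be tested in one pass by asking, at every read or write of X, whether the last writer of X is a different transaction that has not yet ended. The sketch below is a simplification: it tracks only the most recent writer of each item and treats an aborted writer as "ended" without restoring the previous writer. The tuple encoding and function name are assumptions.

```python
def check(schedule):
    """Return (cascadeless, strict) for a list of (kind, txn, item) operations."""
    last_writer = {}    # item -> transaction that most recently wrote it
    ended = set()       # transactions that have committed or aborted
    cascadeless = strict = True
    for kind, txn, item in schedule:
        if kind in ("c", "a"):
            ended.add(txn)
            continue
        prev = last_writer.get(item)
        pending = prev is not None and prev != txn and prev not in ended
        if pending:
            strict = False              # read or write over an uncommitted write
            if kind == "r":
                cascadeless = False     # read of an uncommitted write
        if kind == "w":
            last_writer[item] = txn
    return cascadeless, strict


# Sf without the written values: w1(X); w2(X); a1;
Sf = [("w", 1, "X"), ("w", 2, "X"), ("a", 1, None)]
# Sd from the recoverability slides.
Sd = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"), ("r", 1, "Y"),
      ("w", 2, "X"), ("w", 1, "Y"), ("c", 1, None), ("c", 2, None)]
# A hypothetical strict schedule: T2 touches X only after T1 commits.
strict_example = [("w", 1, "X"), ("c", 1, None), ("r", 2, "X"), ("w", 2, "X"), ("c", 2, None)]

sf_result = check(Sf)
sd_result = check(Sd)
strict_result = check(strict_example)
```

For Sf the result is cascadeless but not strict, matching the discussion above; Sd is neither, because T2 reads X while T1 is still active.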

31

Schedules Classification

In terms of:
1. Recoverability
2. Avoidance of cascading rollback
3. Strictness

Condition 2 implies condition 1, and condition 3 implies both 1 and 2.

Thus, all strict schedules are cascade-less, and all cascade-less schedules are recoverable.

32

Recoverability

Need to address the effect of transaction failures on concurrently running transactions.

Recoverable schedule: if a transaction Tj reads a data item previously written by a transaction Ti, the commit operation of Ti appears before the commit operation of Tj.

The following schedule (Schedule 11) is not recoverable if T9 commits immediately after the read.

33

Recoverability

If T8 should abort, T9 would have read (dirty read) an inconsistent database state. Hence the database must ensure that schedules are recoverable.

34

Recoverability (Cont.)

Cascading rollback: a single transaction failure leads to a series of transaction rollbacks.

Consider the following schedule where none of the transactions has yet committed (so the schedule is recoverable).

If T10 fails, T11 and T12 must also be rolled back.

35

Recoverability (Cont.)

Cascading rollback can lead to the undoing of a significant amount of work.

Cascadeless schedules: cascading rollbacks cannot occur; for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the read operation of Tj.

Every cascadeless schedule is also recoverable.

It is desirable to restrict the schedules to those that are cascadeless.

36

Schedules

A schedule S is serial if, for every transaction T participating in the schedule, all the operations of T are executed consecutively, i.e., operations from different transactions are not interleaved; otherwise the schedule is called nonserial.

Serial schedules:
– R1(Q) W1(Q) R2(Q) W2(Q)
– R2(Q) W2(Q) R1(Q) W1(Q)

Non-serial schedule: R1(Q) R2(Q) W1(Q) W2(Q)
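A schedule is serial exactly when no transaction resumes after another one has started. A short sketch of that test, using an assumed `(kind, transaction id, item)` encoding:

```python
def is_serial(schedule):
    """True if each transaction's operations form one consecutive run."""
    finished = set()    # transactions whose run has already ended
    current = None
    for _kind, txn, _item in schedule:
        if txn != current:
            if txn in finished:
                return False            # txn resumes after being interrupted
            if current is not None:
                finished.add(current)
            current = txn
    return True


serial_12 = [("R", 1, "Q"), ("W", 1, "Q"), ("R", 2, "Q"), ("W", 2, "Q")]
serial_21 = [("R", 2, "Q"), ("W", 2, "Q"), ("R", 1, "Q"), ("W", 1, "Q")]
nonserial = [("R", 1, "Q"), ("R", 2, "Q"), ("W", 1, "Q"), ("W", 2, "Q")]

checks = [is_serial(serial_12), is_serial(serial_21), is_serial(nonserial)]
```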

37

Example Schedules

The following is a serial schedule (Schedule 1), in which T1 is followed by T2.

38

Example Schedule (Cont.)

The following schedule (Schedule 3) is not a serial schedule, but it is equivalent to Schedule 1.

In both Schedule 1 & 3, the sum A+B is preserved.

39

Example Schedules (Cont.)

The following concurrent schedule (Schedule 4) does not preserve the value of the sum A + B.

40

Example Schedules

(a) Serial schedule A: T1 followed by T2. (b) Serial schedule B: T2 followed by T1.

41

Example Schedule (Cont.)

(c) Two nonserial schedules C and D with interleaving of operations.

42

Several Observations

A serial schedule guarantees database consistency.

n transactions may form n! different serial schedules.

Different serial schedules may produce different results. Suppose Q = 20 initially.
– R1(Q), Q=Q+10, W1(Q), R2(Q), Q=Q*2, W2(Q) produces Q = 60
– R2(Q), Q=Q*2, W2(Q), R1(Q), Q=Q+10, W1(Q) produces Q = 50

Allowing only serial schedules may cause poor system performance (i.e., low throughput).
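The n! serial orders and their possibly different results can be enumerated directly. A small sketch for the two transactions above, modelling each transaction as a function on Q (a simplification made for illustration):

```python
from itertools import permutations

# Each transaction reads Q, updates it, and writes it back.
transactions = {
    "T1": lambda q: q + 10,
    "T2": lambda q: q * 2,
}


def run_serial(order, q):
    """Run the transactions one after another in the given order."""
    for name in order:
        q = transactions[name](q)
    return q


results = {order: run_serial(order, 20) for order in permutations(transactions)}
```

With two transactions there are 2! = 2 serial orders; both leave the database consistent, but they end with different values of Q.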

43

Several Observations

A serial schedule is not a must for guaranteeing transaction consistency.

If X and Y are independent, then the following two schedules always produce the same result:
– non-serial schedule: R1(X) W1(X) R2(X) W2(X) R1(Y) W1(Y)
– serial schedule: R1(X) W1(X) R1(Y) W1(Y) R2(X) W2(X)

44

Serializability

A schedule S of n transactions is serializable if it is equivalent to some serial schedule of the same n transactions.

Basic assumption: each transaction preserves database consistency.

We ignore operations other than read and write instructions; schedules consist of only read and write instructions.

45

Serializability

One way to ensure correctness of concurrent transactions is to enforce serializability of transactions; that is, the interleaved execution of the transactions must be equivalent to some serial execution of those transactions.

The interleaved execution of a set of transactions is considered correct iff it is serializable.

A nonserial but serializable schedule often permits a higher degree of concurrency than a serial schedule.

Different forms of schedule equivalence give rise to the notions of:
– conflict serializability
– view serializability

46

Serializability

Two schedules that are result equivalent for the initial value of X = 100, but are not result equivalent in general.

47

Conflict Serializability

If li and lj are consecutive in a schedule and they do not conflict, their results would remain the same even if they had been interchanged in the schedule.

If a schedule S can be transformed into a schedule S´ by a series of swaps of non-conflicting instructions, we say that S and S´ are conflict equivalent.

Two schedules are called conflict equivalent if the order of any two conflicting operations is the same in both schedules.

We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.

48

Example

Consider two transactions:
T1 = R1(X) W1(X) R1(Y) W1(Y)
T2 = R2(X) W2(X)

The following two schedules are equivalent:
S1: R1(X) W1(X) R2(X) W2(X) R1(Y) W1(Y)
S2: R1(X) W1(X) R1(Y) W1(Y) R2(X) W2(X)
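Conflict equivalence of S1 and S2 can be confirmed by collecting every ordered pair of conflicting operations in each schedule and comparing the two sets. The sketch assumes each operation tuple `(kind, transaction id, item)` occurs at most once per schedule, which holds for this example; the function names are hypothetical.

```python
def conflict_pairs(schedule):
    """Ordered pairs (earlier op, later op) of conflicting operations."""
    pairs = set()
    for i, a in enumerate(schedule):
        for b in schedule[i + 1:]:
            if a[1] != b[1] and a[2] == b[2] and "W" in (a[0], b[0]):
                pairs.add((a, b))
    return pairs


def conflict_equivalent(s1, s2):
    """Same operations, and every conflicting pair ordered the same way."""
    return sorted(s1) == sorted(s2) and conflict_pairs(s1) == conflict_pairs(s2)


S1 = [("R", 1, "X"), ("W", 1, "X"), ("R", 2, "X"), ("W", 2, "X"), ("R", 1, "Y"), ("W", 1, "Y")]
S2 = [("R", 1, "X"), ("W", 1, "X"), ("R", 1, "Y"), ("W", 1, "Y"), ("R", 2, "X"), ("W", 2, "X")]
# The other serial order, T2 followed by T1, for comparison.
S3 = [("R", 2, "X"), ("W", 2, "X"), ("R", 1, "X"), ("W", 1, "X"), ("R", 1, "Y"), ("W", 1, "Y")]

s1_vs_s2 = conflict_equivalent(S1, S2)
s1_vs_s3 = conflict_equivalent(S1, S3)
```

S1 and S2 agree because the only conflicts are on X and T1's operations on X precede T2's in both; S3 reverses those pairs.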

49

Conflict Serializability (Cont.)

Example of a schedule that is not conflict serializable:

T3          T4
read(Q)
            write(Q)
write(Q)

We are unable to swap instructions in the above schedule to obtain either of the serial schedules <T3, T4> or <T4, T3>.

50

Conflict Serializability (Cont.)

Schedule 3 below can be transformed into Schedule 1, a serial schedule where T2 follows T1, by a series of swaps of non-conflicting instructions.

Therefore Schedule 3 is conflict serializable.

51

View Serializability

Let S and S´ be two schedules with the same set of transactions. S and S´ are view equivalent if the following three conditions are met:

1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then transaction Ti must, in schedule S´, also read the initial value of Q.

2. For each data item Q, if transaction Ti executes read(Q) in schedule S, and that value was produced by transaction Tj (if any), then transaction Ti must, in schedule S´, also read the value of Q that was produced by transaction Tj.

52

View Serializability

3. For each data item Q, the transaction (if any) that performs the final write(Q) operation in schedule S must perform the final write(Q) operation in schedule S´.

As can be seen, view equivalence is also based purely on reads and writes.
– Conditions 1 and 2 ensure that each transaction reads the same values in both schedules.
– Condition 3, coupled with conditions 1 and 2, ensures that both schedules result in the same final state.
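The three conditions can be checked by recording, for every read, which transaction produced the value (or none, for the initial value), together with the final writer of each item. The sketch below assumes both inputs are valid schedules of the same transactions. The example schedule with a blind write by a transaction numbered 6 is hypothetical and is not claimed to be the Schedule 9 of the slides.

```python
def view_profile(schedule):
    """Return (source of every read, final writer of every item)."""
    last_writer = {}    # item -> transaction of the most recent write
    position = {}       # txn -> number of its operations seen so far
    reads_from = {}     # (txn, position within txn) -> writer, None = initial value
    for kind, txn, item in schedule:
        position[txn] = position.get(txn, 0) + 1
        if kind == "R":
            reads_from[(txn, position[txn])] = last_writer.get(item)
        else:
            last_writer[item] = txn
    return reads_from, last_writer


def view_equivalent(s1, s2):
    return sorted(s1) == sorted(s2) and view_profile(s1) == view_profile(s2)


interleaved = [("R", 3, "Q"), ("W", 4, "Q"), ("W", 3, "Q"), ("W", 6, "Q")]
serial_3_4_6 = [("R", 3, "Q"), ("W", 3, "Q"), ("W", 4, "Q"), ("W", 6, "Q")]

with_blind_write = view_equivalent(interleaved, serial_3_4_6)

# Without the final blind write, neither serial order matches.
short = [("R", 3, "Q"), ("W", 4, "Q"), ("W", 3, "Q")]
short_vs_3_4 = view_equivalent(short, [("R", 3, "Q"), ("W", 3, "Q"), ("W", 4, "Q")])
short_vs_4_3 = view_equivalent(short, [("W", 4, "Q"), ("R", 3, "Q"), ("W", 3, "Q")])
```

In the four-operation example the last write hides the order of the two earlier writes, so the schedule is view equivalent to a serial one even though its conflicting operations are ordered differently.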

53

View Serializability (Cont.)

A schedule S is view serializable if it is view equivalent to a serial schedule.

Every conflict serializable schedule is also view serializable.

Schedule 9 is view serializable but not conflict serializable.

Every view serializable schedule that is not conflict serializable has blind writes.

Blind write: a write operation performed without first having performed a read operation.

54

Testing for Serializability

Consider some schedule of a set of transactions T1, T2, ..., Tn

Precedence graph: a directed graph where the vertices are the transactions (names).

We draw an edge from Ti to Tj if the two transactions conflict, and Ti accessed the data item on which the conflict arose earlier.

The edge may be labeled by the item that was accessed.


55

Example Schedule (Schedule A)

T1        T2        T3        T4        T5
          R(X)
R(Y)
R(Z)
                                        R(B)
                                        R(A)
                                        R(A)
          R(Y)
          W(Y)
                    W(Z)
R(U)
                              R(Y)
                              W(Y)
                              R(Z)
                              W(Z)
R(U)
W(U)

[Figure: precedence graph of Schedule A over T1, T2, T3, T4, T5]

56

Test for Conflict Serializability

A schedule is conflict serializable iff its precedence graph is acyclic.

If the precedence graph of schedule S has no cycle, then S is equivalent to any serial schedule that can be generated by a topological sort of the precedence graph.

For example, a serializability order for schedule A would be T5 T1 T3 T2 T4

57

Test Schedule Serializability

Consider the following schedule S:
R1(X) R2(Y) W1(X) R2(X) W2(Y) W2(X) R3(Y) W3(Y) R4(X) W4(X)

Two possible orders of topological sorting: T1 T2 T3 T4 and T1 T2 T4 T3.

S is equivalent to both of the above two serial schedules.

[Figure: precedence graph of S over T1, T2, T3, T4]
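The test can be run mechanically on schedule S: build the precedence graph from the conflicting pairs, then list the transaction orders that respect every edge. The sketch below enumerates all permutations, which is only reasonable for a handful of transactions; a real topological sort would be used for larger inputs. The tuple encoding and function names are assumptions.

```python
from itertools import permutations


def precedence_graph(schedule):
    """Edges (Ti, Tj): Ti has an operation that conflicts with a later one of Tj."""
    edges = set()
    for i, (kind_a, txn_a, item_a) in enumerate(schedule):
        for kind_b, txn_b, item_b in schedule[i + 1:]:
            if txn_a != txn_b and item_a == item_b and "W" in (kind_a, kind_b):
                edges.add((txn_a, txn_b))
    return edges


def serial_orders(schedule):
    """All topological orders of the precedence graph; empty if it has a cycle."""
    txns = sorted({txn for _kind, txn, _item in schedule})
    edges = precedence_graph(schedule)
    orders = []
    for order in permutations(txns):    # brute force over candidate serial orders
        rank = {t: i for i, t in enumerate(order)}
        if all(rank[a] < rank[b] for a, b in edges):
            orders.append(order)
    return orders


S = [("R", 1, "X"), ("R", 2, "Y"), ("W", 1, "X"), ("R", 2, "X"), ("W", 2, "Y"),
     ("W", 2, "X"), ("R", 3, "Y"), ("W", 3, "Y"), ("R", 4, "X"), ("W", 4, "X")]

edges = precedence_graph(S)
orders = serial_orders(S)

# read(Q) by T3, write(Q) by T4, write(Q) by T3: the graph has a cycle.
cyclic = [("R", 3, "Q"), ("W", 4, "Q"), ("W", 3, "Q")]
cyclic_orders = serial_orders(cyclic)
```

For S the graph has edges T1→T2, T1→T4, T2→T3 and T2→T4, which yields exactly the two serial orders listed above; the cyclic example yields none, so it is not conflict serializable.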