CONCURRENCY CONTROL (CHAPTER 16)CONCURRENCY …dblab.usc.edu/csci485/fall...

2007/11/13

CONCURRENCY CONTROL INTRODUCTION

• Motivation: A DBMS is multiprogrammed to increase the utilization of resources. While the CPU is processing one transaction, the disk can perform a write operation on behalf of another transaction. However, the interaction between multiple transactions must be controlled to ensure atomicity and consistency of the database.
• As an example, consider the following two transactions. T0 transfers 50 from account A to B. T1 transfers 10% of the balance from A to B.

    T0              T1
    read(A)         read(A)
    A=A-50          tmp = A × 0.1
    write(A)        A = A - tmp
    read(B)         write(A)
    B=B+50          read(B)
    write(B)        B=B+tmp
                    write(B)

CONCURRENCY CONTROL INTRODUCTION (Cont…)

• Assume the balance of A is 1000 and B is 2000. The bank's total balance is 3000. If the system executes T0 before T1 then A's balance will be 855 while B's balance will be 2145. On the other hand, if T1 executes before T0 then A's balance will be 850 and B's balance will be 2150. However, note that in both cases, the total balance of these two accounts is 3000.
• In this example both schedules are consistent.

    T0              T1
    read(A)         read(A)
    A=A-50          tmp = A × 0.1
    write(A)        A = A - tmp
    read(B)         write(A)
    B=B+50          read(B)
    write(B)        B=B+tmp
                    write(B)

• A schedule is a chronological execution order of multiple transactions by a system.
• A serial schedule is a sequence of processing multiple transactions which ensures atomicity of each transaction. Given n transactions, there are n! valid serial schedules.
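The two serial orders of T0 and T1 can be checked with a few lines of Python. This is only a sketch: the function names and the dictionary standing in for the database are assumptions made for illustration, and integer division replaces A × 0.1 so that the balances stay exact.

```python
def t0(db):
    """T0: transfer 50 from A to B."""
    a = db["A"]          # read(A)
    db["A"] = a - 50     # A = A - 50; write(A)
    b = db["B"]          # read(B)
    db["B"] = b + 50     # B = B + 50; write(B)


def t1(db):
    """T1: transfer 10% of A's balance from A to B."""
    a = db["A"]          # read(A)
    tmp = a // 10        # tmp = A x 0.1 (integer arithmetic, an assumption)
    db["A"] = a - tmp    # A = A - tmp; write(A)
    b = db["B"]          # read(B)
    db["B"] = b + tmp    # B = B + tmp; write(B)


def run_serial(order):
    """Run the transactions one after another on a fresh database."""
    db = {"A": 1000, "B": 2000}
    for txn in order:
        txn(db)
    return db


t0_then_t1 = run_serial([t0, t1])   # A = 855, B = 2145
t1_then_t0 = run_serial([t1, t0])   # A = 850, B = 2150
```

Both results differ, yet each keeps A + B at 3000, which is why either serial order counts as consistent.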

Schedule 1
• Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B.
• A serial schedule in which T1 is followed by T2:

Schedule 2
• A serial schedule where T2 is followed by T1:

• Concurrent execution of multiple transactions, where the instructions of different transactions are interleaved, may result in a non-serial schedule.
• To illustrate, assuming that a transaction makes a copy of each data item in order to manipulate it, consider the following execution:

[Figure: timeline (axis: Time, marks t1 and t2) of the transactions' copies of A and B]

Schedule 3
• Let T1 and T2 be the transactions defined previously. The following schedule is not a serial schedule, but it is equivalent to Schedule 1.

In Schedules 1, 2 and 3, the sum A + B is preserved.

Schedule 4
• The following concurrent schedule does not preserve the value of (A + B).

Serializability
• Basic Assumption – Each transaction preserves database consistency.
• Thus serial execution of a set of transactions preserves database consistency.
• A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule.
• We ignore operations other than read and write instructions, and we assume that transactions may perform arbitrary computations on data in local buffers in between reads and writes. Our simplified schedules consist of only read and write instructions.

Conflicting Instructions
• Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if there exists some item Q accessed by both li and lj, and at least one of these instructions wrote Q.

1. li = read(Q), lj = read(Q). li and lj don't conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict.
4. li = write(Q), lj = write(Q). They conflict.

• Intuitively, a conflict between li and lj forces a (logical) temporal order between them.
– If li and lj are consecutive in a schedule and they do not conflict, their results would remain the same even if they had been interchanged in the schedule.
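The conflict rule can be written as a one-line predicate. The sketch below is illustrative only; the `Op` record and the `conflicts` function are hypothetical names, not part of the lecture notes.

```python
from typing import NamedTuple


class Op(NamedTuple):
    txn: str    # transaction name, e.g. "Ti"
    kind: str   # "read" or "write"
    item: str   # data item, e.g. "Q"


def conflicts(a: Op, b: Op) -> bool:
    """Two instructions of different transactions conflict when they
    access the same item and at least one of them is a write."""
    return (a.txn != b.txn
            and a.item == b.item
            and "write" in (a.kind, b.kind))


# The four cases listed above:
case1 = conflicts(Op("Ti", "read", "Q"), Op("Tj", "read", "Q"))    # no conflict
case2 = conflicts(Op("Ti", "read", "Q"), Op("Tj", "write", "Q"))   # conflict
case3 = conflicts(Op("Ti", "write", "Q"), Op("Tj", "read", "Q"))   # conflict
case4 = conflicts(Op("Ti", "write", "Q"), Op("Tj", "write", "Q"))  # conflict
```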

Conflict Serializability
• If a schedule S can be transformed into a schedule S´ by a series of swaps of non-conflicting instructions, we say that S and S´ are conflict equivalent.
• We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.

Schedule 3
• Looking only at read(Q) and write(Q) instructions, Schedule 3 can be transformed into a new schedule, a serial schedule where T2 follows T1, by a series of swaps of non-conflicting instructions.

Schedule 3

    T1              T2
    Read(A)
    Write(A)
                    Read(A)
                    Write(A)
    Read(B)
    Write(B)
                    Read(B)
                    Write(B)
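Conflict serializability can also be tested mechanically. The sketch below does not perform the swaps described above; instead it uses the standard precedence-graph test (an edge Ti → Tj for every conflicting pair in which Ti's instruction comes first, and the schedule is conflict serializable exactly when the graph is acyclic). The function names and the tuple encoding of a schedule are assumptions made for illustration.

```python
def precedence_edges(schedule):
    """schedule: list of (txn, kind, item) in execution order.
    Returns the set of edges (Ti, Tj) forced by conflicting pairs."""
    edges = set()
    for i, (ti, ki, qi) in enumerate(schedule):
        for tj, kj, qj in schedule[i + 1:]:
            if ti != tj and qi == qj and "write" in (ki, kj):
                edges.add((ti, tj))
    return edges


def is_conflict_serializable(schedule):
    """True when the precedence graph has no cycle."""
    edges = precedence_edges(schedule)
    nodes = {txn for txn, _, _ in schedule}
    while nodes:
        # Nodes with no incoming edge from a remaining node can go first.
        sources = {n for n in nodes
                   if not any(v == n and u in nodes for u, v in edges)}
        if not sources:
            return False      # every remaining node is on a cycle
        nodes -= sources
    return True


schedule3 = [
    ("T1", "read", "A"), ("T1", "write", "A"),
    ("T2", "read", "A"), ("T2", "write", "A"),
    ("T1", "read", "B"), ("T1", "write", "B"),
    ("T2", "read", "B"), ("T2", "write", "B"),
]
# A hypothetical interleaving in which each transaction must precede the other.
bad_schedule = [
    ("T1", "read", "A"), ("T2", "read", "A"),
    ("T2", "write", "A"), ("T1", "write", "A"),
]
```

For `schedule3` the only edge is T1 → T2, matching the serial order in which T2 follows T1.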

LOCK-BASED PROTOCOLS

• There are alternative approaches to ensure serializability among multiple transactions.

Lock-based Protocols
• To ensure serializability, require access to data items to be performed in a mutually exclusive manner.
• This approach requires a transaction to lock a data item before accessing it. The simple protocol consists of two lock modes: Shared and eXclusive. A transaction locks a data item Q in S mode if it plans to read Q's value and in X mode if it plans to write Q.
– shared (S) mode. Data item can only be read. S-lock is requested using lock-S instruction.
– exclusive (X) mode. Data item can be both read as well as written. X-lock is requested using lock-X instruction.
• Lock requests are made to the concurrency-control manager. A transaction can proceed only after its request is granted.
• The compatibility between these two lock modes is as follows (columns: T0, rows: T1):

              S       X
    S         True    False
    X         False   False

LOCK-BASED PROTOCOLS (Cont…)

• There can be multiple transactions with S locks on a particular data item. However, only one X lock is allowed on a data item. When a transaction requests a lock, if a compatible lock mode exists, it proceeds to lock the data item. Otherwise, it waits. Waiting might result in deadlocks.
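The grant-or-wait decision can be sketched as a small lock table. This is a simplified illustration, not a real lock manager: `LockTable`, `request`, and `release` are hypothetical names, there is no wait queue, and a refused request simply returns False where a real system would block the transaction.

```python
# (mode already held, mode requested) -> compatible?
COMPATIBLE = {
    ("S", "S"): True,  ("S", "X"): False,
    ("X", "S"): False, ("X", "X"): False,
}


class LockTable:
    def __init__(self):
        self.held = {}   # item -> {txn: mode}

    def request(self, txn, item, mode):
        """Grant the lock and return True if every lock held by another
        transaction on this item is compatible; otherwise return False,
        meaning the requester would have to wait."""
        holders = self.held.setdefault(item, {})
        for other, other_mode in holders.items():
            if other != txn and not COMPATIBLE[(other_mode, mode)]:
                return False
        if holders.get(txn) != "X":   # never weaken an X lock already held
            holders[txn] = mode
        return True

    def release(self, txn, item):
        self.held.get(item, {}).pop(txn, None)


table = LockTable()
first_s = table.request("T0", "Q", "S")    # granted
second_s = table.request("T1", "Q", "S")   # granted: S is compatible with S
x_while_s = table.request("T2", "Q", "X")  # refused: T2 would wait
```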

CONCURRENCY CONTROL INTRODUCTION (Cont…)

    T0              T1
    lockX(A)        lockX(A)
    read(A)         read(A)
    A=A-50          tmp = A × 0.1
    write(A)        A = A - tmp
    unlock(A)       write(A)
    lockX(B)        unlock(A)
    read(B)         lockX(B)
    B=B+50          read(B)
    write(B)        B=B+tmp
    unlock(B)       write(B)
                    unlock(B)

Block labels on the slide: T0's A block = 2, T1's A block = 1, T0's B block = 4, T1's B block = 3.

LOCK-BASED PROTOCOLS (Cont…)

    T0              T1
    lockX(A)
    read(A)
    A=A-50
    write(A)
                    lockX(B)
                    read(B)
                    tmp = B × 0.1
                    B = B - tmp
                    write(B)
    lockX(B)        lockX(A)        DEADLOCK
    read(B)         read(A)
    B=B+50          A=A+tmp
    write(B)        write(A)

LOCK-BASED PROTOCOLS (Cont…)

• As a solution to deadlocks, most systems construct a Transaction Wait-for Graph (TWG). A transaction is represented as a node in the TWG. When a transaction Ti waits for Tj, an arc is attached from Ti to Tj. Next, the system detects cycles in the TWG. If a cycle is detected then there exists a deadlock. The system breaks deadlocks by aborting one of the transactions (e.g., using a log-based recovery protocol).

Deadlocks can be described as a wait-for graph, which consists of a pair G = (V,E),

• V is a set of vertices (all the transactions in the system)
• E is a set of edges; each element is an ordered pair Ti → Tj.
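Cycle detection on the wait-for graph can be sketched with a depth-first search. The representation below (a dictionary mapping each transaction to the set of transactions it waits for) and the name `find_cycle` are assumptions made for illustration.

```python
def find_cycle(wait_for):
    """wait_for: dict mapping a transaction to the set of transactions
    it is waiting for (the edges Ti -> Tj).
    Returns a list of transactions forming a cycle, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2      # unvisited, on current path, finished
    color = {}

    def dfs(node, path):
        color[node] = GRAY
        path.append(node)
        for nxt in wait_for.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:                 # back edge: cycle found
                return path[path.index(nxt):]
            if state == WHITE:
                cycle = dfs(nxt, path)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for start in list(wait_for):
        if color.get(start, WHITE) == WHITE:
            cycle = dfs(start, [])
            if cycle:
                return cycle
    return None


# T0 waits for T1 (lock on B) and T1 waits for T0 (lock on A).
deadlocked = find_cycle({"T0": {"T1"}, "T1": {"T0"}})
no_deadlock = find_cycle({"T0": {"T1"}, "T1": {"T2"}})
```

Any transaction on the returned cycle is a candidate victim to abort; how the victim is chosen is a policy decision that this sketch leaves open.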

Deadlock Detection (Cont.)

[Figure: wait-for graph without a cycle; wait-for graph with a cycle]

Example: suppose the values of account A and B are 100 & 200. If T1 and T2 are executed serially, what is displayed?

    T1              T2
    lockX(B)        lockS(A)
    read(B)         read(A)
    B=B-50          unlock(A)
    write(B)        lockS(B)
    unlock(B)       read(B)
    lock-X(A)       unlock(B)
    read(A)         display(A+B)
    A=A+50
    write(A)
    unlock(A)

Example: suppose the values of account A and B are 100 & 200. If T1 and T2 are executed concurrently, what is displayed?

    T1              T2
    lockX(B)
    read(B)
    B=B-50
    write(B)
    unlock(B)
                    lockS(A)
                    read(A)
                    unlock(A)
                    lockS(B)
                    read(B)
                    unlock(B)
                    display(A+B)
    lock-X(A)
    read(A)
    A=A+50
    write(A)
    unlock(A)

Example: suppose the values of account A and B are 100 & 200. If T1 and T2 are executed concurrently, what is displayed?

    T1              T2
    lockX(B)
    read(B)
    B=B-50
    write(B)
                    lockS(A)
                    read(A)
                    lockS(B)
                    read(B)
    lock-X(A)
                    unlock(A)
                    unlock(B)
                    display(A+B)
    read(A)
    A=A+50
    write(A)
    unlock(B)
    unlock(A)

LOCK-BASED PROTOCOLS (Cont…)

• With a two phase locking protocol, each transaction is required to release its locks at the end of its execution. Thus, a transaction has two phases:

1. growing phase: the transaction acquires locks, but may not release any locks
2. shrinking phase: the transaction releases locks and acquires no additional locks
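Whether a single transaction obeys the two phases can be checked by scanning its lock and unlock instructions in order. The helper below is a hypothetical sketch; it assumes instructions are written as strings such as "lockX(B)", "lock-X(A)", or "unlock(A)".

```python
def is_two_phase(ops):
    """Return True if no lock is acquired after the first unlock."""
    shrinking = False
    for op in ops:
        if op.startswith("unlock"):
            shrinking = True          # the shrinking phase has begun
        elif op.startswith("lock") and shrinking:
            return False              # acquiring a lock after releasing one
    return True


# Releases B before locking A, so it is not two-phase.
early_unlock = ["lockX(B)", "read(B)", "write(B)", "unlock(B)",
                "lock-X(A)", "read(A)", "write(A)", "unlock(A)"]
# Same work, but every unlock is delayed until all locks are held.
delayed_unlock = ["lockX(B)", "read(B)", "write(B)",
                  "lock-X(A)", "read(A)", "write(A)",
                  "unlock(B)", "unlock(A)"]

early_ok = is_two_phase(early_unlock)      # False
delayed_ok = is_two_phase(delayed_unlock)  # True
```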

LOCK-BASED PROTOCOLS (Cont…)

[Figure: number of locks held versus time over the lifetime of a transaction, showing the growing phase, the lock point, and the shrinking phase]

The Two-Phase Locking Protocol (Cont.)

• Two-phase locking does not ensure freedom from deadlocks.
• Cascading roll-back is possible under two-phase locking. To avoid this, follow a modified protocol called strict two-phase locking. Here a transaction must hold all its exclusive locks till it commits/aborts.
• Rigorous two-phase locking is even stricter: here all locks are held till commit/abort. In this protocol transactions can be serialized in the order in which they commit.

Cascading rollback: when a single transaction failure leads to a series of transaction rollbacks.

    T1              T2              T3
    lockX(A)
    Read(A)
    lockS(B)
    Read(B)
    Write(A)
    Unlock(A)
                    lockX(A)
                    Read(A)
                    Write(A)
                    Unlock(A)
                                    lockS(A)
                                    Read(A)

T1 fails, rollback T2 & T3.

T3 and T4:

    T3              T4
    lockX(B)        lockS(A)
    read(B)         read(A)
    B=B-50          lockS(B)
    write(B)        read(B)
    lockX(A)        display(A+B)
    A=A+50          unlock(A)
    write(A)        unlock(B)
    unlock(B)
    unlock(A)

T3 and T4:

    T3              T4
    lockX(B)
    read(B)
    B=B-50
    write(B)
                    lockS(A)
                    read(A)
                    lockS(B)
    lockX(A)

Lock Conversions

• Two-phase locking with lock conversions:

– First Phase:
    – can acquire a lock-S on item
    – can acquire a lock-X on item
    – can convert a lock-S to a lock-X (upgrade)
– Second Phase:
    – can release a lock-S
    – can release a lock-X
    – can convert a lock-X to a lock-S (downgrade)

• This protocol assures serializability. But still relies on the programmer to insert the various locking instructions.

Automatic Acquisition of Locks

• A transaction Ti issues the standard read/write instruction, without explicit locking calls.

• The operation read(D) is processed as:

    if Ti has a lock on D
    then
        read(D)
    else
        begin
            if necessary wait until no other transaction has a lock-X on D
            grant Ti a lock-S on D;
            read(D)
        end

Automatic Acquisition of Locks (Cont.)

• write(D) is processed as:

    if Ti has a lock-X on D
    then
        write(D)
    else
        begin
            if necessary wait until no other trans. has any lock on D,
            if Ti has a lock-S on D
            then
                upgrade lock on D to lock-X
            else
                grant Ti a lock-X on D
            write(D)
        end;

• All locks are released after commit or abort

LOCK-BASED PROTOCOLS (Cont…)

Without a two-phase locking protocol, the schedule provided by an execution of transactions might no longer be serializable. This is especially true in the presence of aborts. Several possible situations might arise:

1. dirty reads: A transaction T0 reads the value of a record Q at two different points in time (ti and tj) and observes a different value for this record. This is because an updating transaction T1 produced the value of Q that T0 read at time ti. However, T1 aborted some time later (prior to tj), and when T0 read the value of Q again at tj, it observed the value of Q prior to the execution of T1.

2. un-repeatable reads: A transaction T0 reads the value of a record Q at two different points in time (ti and tj) and observes a different value for this record. After T0 reads the value of Q at time ti, an updating transaction T1 updates the value of Q and commits prior to tj. When T0 reads the value of Q at time tj, it observes a different value for Q.

LOCK-BASED PROTOCOLS (Cont…)

Example of dirty reads (time increases downward):

    T1              T0
    lockX(Q)
    read(Q)
    Q=Q+50
    write(Q)
    unlock(Q)
                    read(Q)
    abort
                    lockS(Q)
                    read(Q)

Example of unrepeatable reads (time increases downward):

    T1              T0
                    lockS(Q)
                    read(Q)
                    unlock(Q)
    lockX(Q)
    Q=Q+50
    write(Q)
    unlock(Q)
    commit
                    lockS(Q)
                    read(Q)
                    unlock(Q)

LOCK-BASED PROTOCOLS (Cont…)

3. dirty writes (lost updates): T0 and T1 read the value of Q at two different points in time and produce a new value for this data item. Subsequently, they overwrite each other when updating Q. The execution paradigm that motivated the use of locking (earlier in the lecture notes) is an example of dirty writes.

    T0                          T1
    read(A)
    A=A-50          A=950
                                read(A)         A=1000
                                tmp = A × 0.1   tmp=100
                                A = A - tmp
                                write(A)        A=900
                                read(B)         B=2000
    write(A)        A=950
    read(B)         B=2000
    B=B+50
    write(B)        B=2050
                                B=B+tmp
                                write(B)        B=2100
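The lost-update interleaving can be replayed step by step. In this sketch each transaction works on private copies, as the notes assume; the dictionaries `t0` and `t1` are hypothetical local workspaces, and integer division stands in for A × 0.1.

```python
db = {"A": 1000, "B": 2000}
t0, t1 = {}, {}              # private workspaces of T0 and T1

t0["A"] = db["A"]            # T0: read(A)       -> 1000
t0["A"] -= 50                # T0: A = A - 50    -> 950 (local only)
t1["A"] = db["A"]            # T1: read(A)       -> 1000 (T0's update is not visible)
tmp = t1["A"] // 10          # T1: tmp = A x 0.1 -> 100
t1["A"] -= tmp               # T1: A = A - tmp   -> 900
db["A"] = t1["A"]            # T1: write(A)      -> database A = 900
t1["B"] = db["B"]            # T1: read(B)       -> 2000
db["A"] = t0["A"]            # T0: write(A)      -> database A = 950, T1's write is lost
t0["B"] = db["B"]            # T0: read(B)       -> 2000
t0["B"] += 50                # T0: B = B + 50
db["B"] = t0["B"]            # T0: write(B)      -> database B = 2050
t1["B"] += tmp               # T1: B = B + tmp   -> 2100 (based on the stale B)
db["B"] = t1["B"]            # T1: write(B)      -> database B = 2100, T0's write is lost

total = db["A"] + db["B"]    # 3050 instead of 3000
```

The final state A = 950, B = 2100 matches neither serial order, and the total has grown by 50.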

• Most systems support four levels of locking (degrees of consistency); the anomalies each level permits are listed in parentheses:
– Level 3: locks held to end of a transaction (two phase locking that results in serializable schedules)
– Level 2: write locks held to end of a transaction (un-repeatable reads)
– Level 1: no read locks at all (dirty reads and un-repeatable reads)
– Level 0: no locks (dirty writes, dirty reads and un-repeatable reads)

Multiple Granularity

• Allow data items to be of various sizes and define a hierarchy of data granularities, where the small granularities are nested within larger ones
• Can be represented graphically as a tree
• When a transaction locks a node in the tree explicitly, it implicitly locks all the node's descendants in the same mode.
• Granularity of locking (level in tree where locking is done):
– fine granularity (lower in tree): high concurrency, high locking overhead
– coarse granularity (higher in tree): low locking overhead, low concurrency

Example of Granularity Hierarchy

The highest level in the example hierarchy is the entire database. The levels below are of type file, pages and record in that order.

Intention Lock Modes

• In addition to S and X lock modes, there are three additional lock modes with multiple granularity:
– intention-shared (IS): indicates explicit locking at a lower level of the tree but only with shared locks.
– intention-exclusive (IX): indicates explicit locking at a lower level with exclusive locks
– shared and intention-exclusive (SIX): the subtree rooted by that node is locked explicitly in shared mode and explicit locking is being done at a lower level with exclusive-mode locks.
• Intention locks allow a higher level node to be locked in S or X mode without having to check all descendant nodes.

Compatibility Matrix with Intention Lock Modes

• The compatibility matrix for all lock modes is (✓ = compatible, × = not compatible):

             IS    IX    S     SIX   X
    IS       ✓     ✓     ✓     ✓     ×
    IX       ✓     ✓     ×     ×     ×
    S        ✓     ×     ✓     ×     ×
    SIX      ✓     ×     ×     ×     ×
    X        ×     ×     ×     ×     ×
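The five-mode compatibility matrix translates directly into a lookup table. The sketch below encodes the standard matrix for IS, IX, S, SIX and X; the names `compatible` and `can_grant` are hypothetical helpers added for illustration.

```python
MODES = ("IS", "IX", "S", "SIX", "X")

# One row per mode already held on a node; columns follow MODES.
_ROWS = {
    "IS":  (True,  True,  True,  True,  False),
    "IX":  (True,  True,  False, False, False),
    "S":   (True,  False, True,  False, False),
    "SIX": (True,  False, False, False, False),
    "X":   (False, False, False, False, False),
}


def compatible(held, requested):
    """True if `requested` can coexist with a lock held in mode `held`."""
    return _ROWS[held][MODES.index(requested)]


def can_grant(requested, held_by_others):
    """A request is granted only if it is compatible with every lock
    that other transactions currently hold on the same node."""
    return all(compatible(h, requested) for h in held_by_others)


ix_next_to_intentions = can_grant("IX", ["IS", "IX"])   # True
s_next_to_ix = can_grant("S", ["IX"])                   # False
```

The matrix is symmetric, so it does not matter which of the two modes is treated as the one already held.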

Multiple Granularity Locking Scheme

• Transaction Ti can lock a node Q, using the following rules:
1. The lock compatibility matrix must be observed.
2. The root of the tree must be locked first, and may be locked in any mode.
3. A node Q can be locked by Ti in S or IS mode only if the parent of Q is currently locked by Ti in either IX or IS mode.
4. A node Q can be locked by Ti in X, SIX, or IX mode only if the parent of Q is currently locked by Ti in either IX or SIX mode.
5. Ti can lock a node only if it has not previously unlocked any node (that is, Ti is two-phase).
6. Ti can unlock a node Q only if none of the children of Q are currently locked by Ti.

• Observe that locks are acquired in root-to-leaf order, whereas they are released in leaf-to-root order.

1) T1 wants to update records r111 and r211
2) T2 wants to update all records on page p12
3) T3 wants to read record r11j and the entire f2 file

TIME STAMP BASED PROTOCOL

• We saw locking: S, X, IS, IX, SIX
• Now, we will cover:
– Time-stamp based protocols
– Optimistic concurrency control

Time-stamp based protocol
• Provides a mechanism to enforce order. How?
• When a transaction Ti is submitted, we associate a unique fixed time stamp TS(Ti). No two transactions may have an identical time stamp. One way to realize this is to use the system clock.
• The time stamp of the transaction determines the serializability order.
• Associated with each data item Q are two time stamp values:
– W-TimeStamp(Q): Largest time stamp of the transaction that has written Q to date
– R-TimeStamp(Q): Largest time stamp of the transaction that has read Q to date

TIME STAMP BASED PROTOCOL (Cont…)

• Let's assume that TS(Ti) is produced in increasing order, i.e., TS(Ti) < TS(Ti+1).
• Suppose transaction Ti issues read(Q):
– If TS(Ti) < W-TimeStamp(Q) then Ti needs to read a value of Q which was already overwritten. Hence the read request is rejected and Ti is rolled back.
– If TS(Ti) >= W-TimeStamp(Q) then the read operation is executed and R-TimeStamp(Q) is set to the maximum of R-TimeStamp(Q) and TS(Ti).
• Suppose transaction Ti issues write(Q):
– If TS(Ti) < R-TimeStamp(Q) then this implies that some transaction has already consumed the value of Q and Ti should have produced a value before that transaction read it. Thus, the write request is rejected and Ti is rolled back.
– If TS(Ti) < W-TimeStamp(Q) then Ti is trying to write an obsolete value of Q. Hence reject Ti's request and roll it back.
– Otherwise, execute the write(Q) operation and set W-TimeStamp(Q) to TS(Ti).
• When a transaction is rolled back, the system may assign a new timestamp to the transaction and restart its execution (as if it was just submitted).
• This approach is free from deadlocks.
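The read and write tests of the time-stamp protocol can be sketched as two small functions. `Item`, `Rollback`, `ts_read` and `ts_write` are hypothetical names; time stamps are plain integers, and a rollback is signalled by an exception rather than by actually restarting the transaction.

```python
class Rollback(Exception):
    """Raised when the protocol rejects an operation."""


class Item:
    def __init__(self, value):
        self.value = value
        self.r_ts = 0   # R-TimeStamp(Q)
        self.w_ts = 0   # W-TimeStamp(Q)


def ts_read(ts, item):
    """read(Q) issued by a transaction with time stamp ts."""
    if ts < item.w_ts:
        raise Rollback("value already overwritten by a younger transaction")
    item.r_ts = max(item.r_ts, ts)
    return item.value


def ts_write(ts, item, value):
    """write(Q) issued by a transaction with time stamp ts."""
    if ts < item.r_ts:
        raise Rollback("a younger transaction has already read Q")
    if ts < item.w_ts:
        raise Rollback("obsolete write")
    item.value = value
    item.w_ts = ts


q = Item(10)
seen = ts_read(2, q)      # allowed; R-TimeStamp(Q) becomes 2
ts_write(3, q, 7)         # allowed; W-TimeStamp(Q) becomes 3
try:
    ts_write(1, q, 5)     # rejected: the transaction with TS 2 already read Q
    late_write_rejected = False
except Rollback:
    late_write_rejected = True
```

Because a rejected transaction is rolled back instead of being made to wait, no wait-for cycle can form, which is why the approach is deadlock free.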

THOMAS'S WRITE RULE

• Consider the following schedule (time increases downward):

    T2              T3
    Read(Q)
                    Write(Q)
    Write(Q)                        T2 is aborted

• The rollback of T2 is unnecessary because T3 has already produced the final value. The right thing to do is to ignore the write operation performed by T2. To accomplish this, modify the protocol for the write operation as follows (the protocol for read stays the same as before). When Ti issues write(Q):
– If TS(Ti) < R-TimeStamp(Q) then the value of Q that Ti is producing was previously read. Hence, reject the write operation and roll Ti back.
– If TS(Ti) < W-TimeStamp(Q) then Ti is writing an obsolete value of Q. Ignore this write operation.
– Otherwise, the write is executed, and W-TimeStamp(Q) is set to TS(Ti)
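The modified write test can be sketched as follows, replaying the T2/T3 schedule with assumed time stamps 2 and 3. The dictionary layout and the name `thomas_write` are illustrative assumptions, and the function returns a string instead of performing a real rollback.

```python
def thomas_write(ts, item, value):
    """write(Q) under Thomas's write rule.
    item: dict with keys "value", "r_ts", "w_ts"."""
    if ts < item["r_ts"]:
        return "rollback"     # the value being produced was already needed
    if ts < item["w_ts"]:
        return "ignored"      # obsolete write: skip it, no rollback
    item["value"] = value
    item["w_ts"] = ts
    return "written"


q = {"value": 0, "r_ts": 0, "w_ts": 0}

q["r_ts"] = max(q["r_ts"], 2)                  # T2 (TS = 2): Read(Q)
t3_result = thomas_write(3, q, "from T3")      # T3 (TS = 3): Write(Q)
t2_result = thomas_write(2, q, "from T2")      # T2: Write(Q) arrives late
```

T2's late write is skipped and Q keeps the value written by T3, so T2 does not have to be aborted.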

OPTIMISTIC CC (VALIDATION TECHNIQUE)

• Argues that the overhead of locking is too high and not worthwhile for applications whose workload consists of read-only transactions.
• Each transaction Ti has three phases:
– Read phase: reads the value of data items and copies its contents to variables local to Ti. All writes are performed on the temporary local variables.
– Validation phase: Ti determines whether the local variables whose values have been overwritten can be copied to the database. If not then abort. Otherwise, proceed to Write phase.
– Write phase: The values stored in local variables overwrite the value of the data items in the database.
• A transaction has three time stamps:
– Start(Ti): When Ti started its execution.
– Validation(Ti): When Ti finished its read phase and started its validation
– Finish(Ti): done with the write phase.
• TS(Ti) = Validation(Ti) instead of Start(Ti) because it produces a better response time if the conflict rate between transactions is low.

OPTIMISTIC CC (VALIDATION TECHNIQUE) (Cont…)

• When validating transaction Tj, for all transactions Ti with TS(Ti) < TS(Tj), one of the following must hold:
– Finish(Ti) < Start(Tj), OR
– The set of data items written by Ti does not intersect with the set of data items read by Tj, and Ti completes its write phase before Tj starts its validation phase.
– Rationale: Serializability is maintained because the writes of Ti cannot affect the reads of Tj, and the writes of Tj cannot affect the reads of Ti, because:

    Start(Tj) < Finish(Ti) < Validation(Tj)
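The validation test can be sketched as a function over the three time stamps and the read and write sets. The `Txn` record and the `validate` function are hypothetical; time stamps are integers, and a transaction that has not finished yet is given an infinite Finish value.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    start: int
    validation: int
    finish: float = float("inf")      # unknown until the write phase ends
    read_set: frozenset = frozenset()
    write_set: frozenset = frozenset()


def validate(tj, others):
    """Return True if Tj passes validation against every Ti
    with TS(Ti) < TS(Tj), where TS is the validation time stamp."""
    for ti in others:
        if ti.validation >= tj.validation:
            continue                              # not an earlier transaction
        if ti.finish < tj.start:
            continue                              # condition 1: no overlap
        if ti.finish < tj.validation and not (ti.write_set & tj.read_set):
            continue                              # condition 2
        return False
    return True


ti = Txn(start=1, validation=3, finish=4, read_set={"A"}, write_set={"A"})
after_ti = Txn(start=5, validation=7, read_set={"A"})
overlap_disjoint = Txn(start=2, validation=6, read_set={"B"})
overlap_conflict = Txn(start=2, validation=6, read_set={"A"})

ok_serial = validate(after_ti, [ti])             # True: Finish(Ti) < Start(Tj)
ok_disjoint = validate(overlap_disjoint, [ti])   # True: Ti wrote nothing Tj read
bad_conflict = validate(overlap_conflict, [ti])  # False: Tj may have read a stale A
```

A transaction that fails validation would be aborted and restarted.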