An open and safe nested transaction model: concurrency and ...

15
An open and safe nested transaction model: concurrency and recovery Sanjay Kumar Madria a, * , S.N. Maheshwari b , B. Chandra c , Bharat Bhargava a a Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA b Department of Computer Science and Engineering, Indian Institute of Technology, Hauz Khas, New Delhi, India c Department of Mathematics, Indian Institute of Technology, Hauz Khas, New Delhi, India Received 1 August 1999; received in revised form 1 November 1999; accepted 8 December 1999 Abstract In this paper, we present an open and safe nested transaction model. We discuss the concurrency control and recovery algorithms for our model. Our nested transaction model uses the notion of a recovery point subtransaction in the nested transaction tree. It incorporates a prewrite operation before each write operation to increase the potential concurrency. Our transaction model is termed ‘‘open and safe’’ as prewrites allow early reads (before writes are performed on disk) without cascading aborts. The systems restart and buer management operations are also modeled as nested transactions to exploit possible concurrency during restart. The concurrency control algorithm proposed for database operations is also used to control concurrent recovery operations. We have given a snapshot of complete transaction processing, data structures involved and, building the restart state in case of crash recovery. Ó 2000 Elsevier Science Inc. All rights reserved. 1. Introduction 1.1. Overview of nested transaction models and recovery algorithms 1.1.1. Close nested transaction model In close nested transaction model (Moss, 1985), a subtransaction may contain operations to be performed concurrently, or operations that may be aborted inde- pendent of their invoking transaction. Such operations are considered as subtransactions of the original trans- action. This parent–child relationship defines a nested transaction tree and transactions are termed as nested transactions (Moss, 1985). Failure of subtransactions may result in the invocation of alternate subtransactions that could replace the failed ones to accomplish the successful completion of the whole transaction. Each transaction has to acquire the respective lock before accessing a data object. A subtransaction’s eect cannot be seen outside its parent’s view (hence, called closed). A child transaction has access to the data locked by its parent. When a transaction writes a data object, a new version of the object is created. This version of the ob- ject is stored in volatile memory. When the subtrans- action commits, the updated versions of the object are passed to its parent. If the transaction aborts, the new version of the object is discarded. Parent commits only after all its children are terminated. When the top-level transaction commits, the current version of each object is saved on stable storage. In the closed nested transaction model, the avail- ability is restricted as the scope of each subtransaction is restricted to its parent only. This forces a subtransaction to pass all its locks and versions of data objects updated to its parent on commit. The eect of a committed subtransaction is made permanent only when the top- level transaction commits. In many applications, it is unacceptable that the work of a longlived transaction is completely undone by using either of the above tech- niques in case the transaction eventually fails at finishing stage. The current strategy forces short-lived transac- tions to wait before they acquire locks until the top-level transactions commit and release their locks. Therefore, the model is not appropriate for the system that consists of long and short transactions. The Journal of Systems and Software 55 (2000) 151–165 www.elsevier.com/locate/jss * Corresponding author. Present address: Department of Computer Science, University of Missouri, Rolla, MO 65405, USA. E-mail addresses: [email protected], [email protected] (S.K. Madria), [email protected] (S.N. Maheshwari), bchan- [email protected] (B. Chandra), [email protected] (B. Bharg- ava). 0164-1212/00/$ - see front matter Ó 2000 Elsevier Science Inc. All rights reserved. PII: S 0 1 6 4 - 1 2 1 2 ( 0 0 ) 0 0 0 6 7 - 4

Transcript of An open and safe nested transaction model: concurrency and ...

Page 1: An open and safe nested transaction model: concurrency and ...

An open and safe nested transaction model: concurrency andrecovery

Sanjay Kumar Madria a,*, S.N. Maheshwari b, B. Chandra c, Bharat Bhargava a

a Department of Computer Science, Purdue University, West Lafayette, IN 47907, USAb Department of Computer Science and Engineering, Indian Institute of Technology, Hauz Khas, New Delhi, India

c Department of Mathematics, Indian Institute of Technology, Hauz Khas, New Delhi, India

Received 1 August 1999; received in revised form 1 November 1999; accepted 8 December 1999

Abstract

In this paper, we present an open and safe nested transaction model. We discuss the concurrency control and recovery algorithms

for our model. Our nested transaction model uses the notion of a recovery point subtransaction in the nested transaction tree. It

incorporates a prewrite operation before each write operation to increase the potential concurrency. Our transaction model is

termed ``open and safe'' as prewrites allow early reads (before writes are performed on disk) without cascading aborts. The systems

restart and bu�er management operations are also modeled as nested transactions to exploit possible concurrency during restart.

The concurrency control algorithm proposed for database operations is also used to control concurrent recovery operations. We

have given a snapshot of complete transaction processing, data structures involved and, building the restart state in case of crash

recovery. Ó 2000 Elsevier Science Inc. All rights reserved.

1. Introduction

1.1. Overview of nested transaction models and recoveryalgorithms

1.1.1. Close nested transaction modelIn close nested transaction model (Moss, 1985), a

subtransaction may contain operations to be performedconcurrently, or operations that may be aborted inde-pendent of their invoking transaction. Such operationsare considered as subtransactions of the original trans-action. This parent±child relationship de®nes a nestedtransaction tree and transactions are termed as nestedtransactions (Moss, 1985). Failure of subtransactionsmay result in the invocation of alternate subtransactionsthat could replace the failed ones to accomplish thesuccessful completion of the whole transaction. Eachtransaction has to acquire the respective lock beforeaccessing a data object. A subtransaction's e�ect cannot

be seen outside its parent's view (hence, called closed). Achild transaction has access to the data locked by itsparent. When a transaction writes a data object, a newversion of the object is created. This version of the ob-ject is stored in volatile memory. When the subtrans-action commits, the updated versions of the object arepassed to its parent. If the transaction aborts, the newversion of the object is discarded. Parent commits onlyafter all its children are terminated. When the top-leveltransaction commits, the current version of each objectis saved on stable storage.

In the closed nested transaction model, the avail-ability is restricted as the scope of each subtransaction isrestricted to its parent only. This forces a subtransactionto pass all its locks and versions of data objects updatedto its parent on commit. The e�ect of a committedsubtransaction is made permanent only when the top-level transaction commits. In many applications, it isunacceptable that the work of a longlived transaction iscompletely undone by using either of the above tech-niques in case the transaction eventually fails at ®nishingstage. The current strategy forces short-lived transac-tions to wait before they acquire locks until the top-leveltransactions commit and release their locks. Therefore,the model is not appropriate for the system that consistsof long and short transactions.

The Journal of Systems and Software 55 (2000) 151±165www.elsevier.com/locate/jss

* Corresponding author. Present address: Department of Computer

Science, University of Missouri, Rolla, MO 65405, USA.

E-mail addresses: [email protected], [email protected] (S.K.

Madria), [email protected] (S.N. Maheshwari), bchan-

[email protected] (B. Chandra), [email protected] (B. Bharg-

ava).

0164-1212/00/$ - see front matter Ó 2000 Elsevier Science Inc. All rights reserved.

PII: S 0 1 6 4 - 1 2 1 2 ( 0 0 ) 0 0 0 6 7 - 4

Page 2: An open and safe nested transaction model: concurrency and ...

1.1.2. Open nested transaction modelTo exploit layer speci®c semantics at each level of

operation nesting, Weikum presented a multileveltransaction model (Weikum, 1991; Weikum et al., 1990).The model provides non-strict execution by taking intoaccount the commutative properties of the semantics ofoperations at each level of data abstraction, whichachieves a higher degree of concurrency. A subtransac-tion is allowed to release locks before the commit ofhigher level transactions. The leaf level locks are re-leased early only if the semantics of the operations areknown and the corresponding compensatory actionsde®ned. When a high level transaction aborts, its e�ect isundone by executing an inverse action which compen-sates the completed transaction. Recovery from systemcrashes is provided by executing undo actions at theupper levels and redo actions at the leaf level. Each levelis provided with a level speci®c recovery mechanism.This model has also been studied in the framework ofobject oriented databases in Muth et al. (1993) andResende et al. (1994).

In many applications, the semantics of transactionsmay not be known and hence, it is di�cult to providenon-strict executions. In real time situations, there areother classes of operations that cannot be compensated.These are the operations that have an irreversible ex-ternal e�ect, such as handing over huge amounts ofmoney at an automatic teller machine. Such operationshave to be deferred until top-level commits, which re-stricts availability (i.e., increases response time).

1.1.3. Nested transaction recovery algorithmsThe intentions-list and undo-logging recovery algo-

rithms given in Fekete et al. (1993) handle recovery fromtransaction aborts in the nested transaction environmentby exploiting the commutative properties of the opera-tions. The intentions-list algorithm works by maintain-ing a list of operations for each transaction. When atransaction commits, its list is appended to its parent;when it aborts, the intentions-list is discarded. When thetop level transaction commits, its intentions-list istransferred to the log. This scheme provides recoveryfrom transaction aborts only and does not handle sys-tem crashes. To increase concurrency during undo log-ging recovery, scheme allows some non-strictexecutions. It allows a transaction to share the uncom-mitted updates made by other transactions by exploitingcommutativity of operations. On execution of an oper-ation, the data object records change their states and thenew state is transferred to the log. When a transactionaborts, in contrast to intentions-list algorithm, all op-erations executed by its descendants on the object areundone from its current state and are also subsequentlyremoved from the log. This algorithm does not take careof recovery from system crashes.

In both intentions-list and undo-logging algorithms,an incomplete transaction is allowed to make uncom-mitted updates available to those transactions that per-form a commutative operation. However, this isrestricted to transactions at the same level of abstrac-tion. This limits availability. In both algorithms, all thework done by descendent transactions are discarded incase of aborts at higher levels. This may not be possibleor desirable in many real time applications. In undo-logging algorithm, when a transaction aborts, in con-trast to intentions-list algorithm, all operations executedby its descendants on the object are undone from itscurrent state and are subsequently removed from thelog. In both intentions-list and undo-logging algorithms,an incomplete transaction is allowed to make uncom-mitted updates visible to those transactions that performa commutative operation. This is restricted to thetransactions at the same level of abstraction.

The above two recovery models consider semantics ofoperations at leaf level only. System R (Gray et al.,1981) exploits layer speci®c semantics but restricted totwo level of transaction nesting. In System R, to per-form recovery, updates are undone by performing in-verse tuple-level operations. For this purpose, System Rrecords tuple updates on a log. To recover from a systemcrash, before applying any tuple level log record, thedatabase must ®rst be restored to some tuple-levelconsistent state. In other words, a low-level recovermechanism is necessary to make tuple actions appearatomic.

In Moss (1987), a crash recovery technique similar toshadow page has been suggested in nested transactionenvironment based on undo/redo log methods. In termsof logging, both undo/redo logs are used. Mohan et al.(1992, 1989) has also discussed ``write ahead logging''based crash recovery algorithm using conventionalnested transaction model. This undo/redo type of re-covery model exploits semantics of nested transactions.The actions of a transaction undone during previousabort have not been undone again in case of one morefailure. This is an advantage over Weikum's multilevelrecovery algorithm by which requires undo actions to beundone again in case of one more failure.

1.2. Our contributions

In this paper, we introduce an open and safe nestedtransaction model in the environment of normal readand write operations to remove the de®ciencies statedabove and to further improve availability and providee�cient crash recovery. Our model supports inter- andintra-transaction concurrency. We assume that seman-tics of transactions at various levels of nesting are notknown. There are two basic motivations behind ourmodel. First, it is desirable that long-lived transactionsshould be able to release their locks before top-level

152 S.K. Madria et al. / The Journal of Systems and Software 55 (2000) 151±165

Page 3: An open and safe nested transaction model: concurrency and ...

transactions commit. Second, it may not be desirable orpossible to undo or compensate the e�ects of one ormore of the important committed descendants after thefailure of a higher level transaction due to abort or asystem crash. We introduce the concept of a ``recoverypoint subtransaction'' of a top-level transaction in anested transaction tree. It is essentially a subtransactionafter the commit of which its ancestors are not allowedto rollback. In other words, once the recovery pointsubtransaction of a top-level transaction has committed,all its superior transactions are forced to commit. In caseit aborts, its ancestors can choose an alternate path tocomplete their execution. Our nested transaction modeluses a prewrite operation before an actual write opera-tion to increase the concurrency. The nested transactiontree of our model consists of database operations, sys-tem recovery operations (such as analysis and redo op-erations) and bu�er management operations speci®ed asnested transactions. The read, prewrite and writes op-erations are modeled at leaf levels in the transactionhierarchy. The recovery operations are speci®ed in termsof nested transactions to achieve higher concurrencyduring system restart. Our locking algorithm controlsthe execution of both the normal operations as well asrecovery operations. We also discuss the data structuresrequired for the implementation of the recovery algo-rithm. We have discussed a snapshot of the concurrencyand recovery algorithm with the help of an example. Abrief overview of our crash recovery algorithm has ap-peared in Madria et al. (1997c). The correctness of theconcurrency control algorithm using I/O automationmodel (FLMW) has reported in Madria et al. (1997b).

The rest of the paper is organized as follows. InSection 2, we present motivating examples and theoverview of our nested transaction and recovery model.In Section 3, we discuss nested transaction system modeland implementation. Section 4 presents system restartoperations. We present a snapshot of transaction pro-cessing, logging and recovery in Section 5. We concludein Section 6.

2. Open and safe nested transaction model and recovery

algorithm

In this section, we motivate the readers about theopen and safe nested transaction model with some ex-amples and provide an overview of our model and re-covery algorithm.

Motivating examples: Consider a part of the nestedtransaction tree for fund transfer operation from agroup of accounts to another account. In the transactiontree, let Ts be a transaction on whose behalf Ts1 invokesvarious subtransactions to collect (access) funds fromdi�erent accounts. Once Ts1 is committed, Ts invokes Ts2

to ®nally credit the funds collected into another account.

Suppose after a subtransaction Tw has withdrawn all theamount, Ts commits. If any transaction situated above Ts

aborts then it is desirable for the transaction to completesuccessfully on transaction revival. This is because it isnot possible to undo or correct the failed transaction'saction by some compensatory actions. Another possi-bility is to delay the actual commit of Ts until its top-level transaction commits which restricts availability.For example, a balance transaction has to wait until thecommit of the top-level transaction.

Consider another scenario where in the nestedtransaction tree, a subtransaction determines the success(commit) or failure (abort) of the top level transaction.Suppose this nested transaction tree models variousactivities (modeled as subtransactions here) related to abusiness travel. Some of the activities are very crucial indetermining the completion of the top level activity. Forexample, the commit of ``fund'' and ``visa'' subtransac-tions will determine whether the travel transaction willcommit or not. That is, these subtransactions commitwill determine the commitment of the top level trans-action no matter what may be the fate of other sub-transactions in the transaction tree. Note that once, thefund and visa subtransaction commits, its upper level(sub)transactions will be forced to commit (some delayor restart may involve).

Salient features of our model: Our model allows someparticular subtransactions to release their locks beforetheir ancestor transactions commit. This allows theother subtransactions to acquire required locks earlier.Our nested transaction model can handle the situationswhere a committed lower level subtransaction's e�ectcannot be undone or compensated in case of a higherlevel transaction's failure. A transaction's semanticsmay be such that beyond a certain point, it cannotrollback entirely or its e�ect should not be lost. Weachieve this by introducing the concept of ``recoverypoint subtransaction'' of a top-level transaction in anested transaction tree. It is essentially a subtransactionafter whose commitment, its ancestors are not allowedto rollback. In case a superior transaction aborts or thesystem fails after the commit of its recovery point sub-transaction, the failed transaction has to complete onsystem revival. Such a transaction execution permits arecovery point subtransaction to reveal its result toother transactions at any level of nesting before its su-perior transactions commit. A recovery point subtrans-action's e�ect is made durable before its top-leveltransaction's commit. This results in the relaxation ofthe isolation property (Harder and Reuter, 1983) of thetransaction.

To avoid undo actions and the consequent cascadingaborts and to increase the availability, we assume thateach transaction issues a prewrite operation (Madria,1995; Madria et al., 1999; Madria and Bhargava, 1997a)before a write for the object it intends to write. Each

S.K. Madria et al. / The Journal of Systems and Software 55 (2000) 151±165 153

Page 4: An open and safe nested transaction model: concurrency and ...

prewrite operation contains the value that a user-visibletransaction wants to write and precedes the associated®nal write. A prewrite operation actually does notchange a data object's state but only announces thevalue the data object will have after the associated writeis performed. The advantage of prewrite is that a readoperation of another transaction can get the value be-fore a data object's state is updated on stable storageand hence, results in increasing the availability of newdata values. Prewrite operations are particularly helpfulin the engineering design applications (Kim et al., 1984),CAD (Korth et al., 1990), large software design projects(Korth and Speegle, 1990) etc. where transactions arelong. A subtransaction that initiates di�erent prewriteaccess subtransactions at leaf level for di�erent dataobjects is de®ned to be the recovery point subtransac-tion. These announced prewrite values are made visibleto other subtransactions after the commit of recoverypoint subtransaction. The prewrite subtransactions re-lease their locks before their ancestors commit. Dis-carding some of the prewrites before the commit of therecovery point subtransaction will not introduce cas-cading aborts (hence safe) since the prewrite values aremade visible only after the commit of the recovery pointsubtransaction.

2.1. Crash recovery algorithm

2.1.1. Basic goals of system crash recovery algorithm· Revive the database state of those data objects which

do not contain their last committed values with re-spect to the execution up to the system failure.

· Revive the prewrite values (kept in prewrite-bu�ers)of the data objects which have been announced bythe committed recovery point subtransaction beforesystem failure.

· To identify such data objects, the dirty object tablehas to be revived. The dirty object table is used tokeep track of those data objects whose ®nally writtenvalues are inconsistent with the stable database val-ues. This table also keeps information about thosedata objects whose prewrite values, announced bythe committed recovery point subtransactions, havenot been subsequently written on the database beforea system crash.

· A system crash creates an additional problem of ac-complishing the completion of those top-level trans-actions whose recovery point subtransactions havebeen committed before system crash. They have to re-acquire the locks held by them at the time of failurebefore new transactions acquire such locks.

· To handle above, the transaction and lock tableshave to be revived. The transaction table keeps a listof all active transactions in the system at any time.The revived transaction table will recognize those ac-tive top-level transactions (and their active descen-

dants) whose recovery point subtransactions havebeen committed before failures. The lock table con-tains the type of locks held by the transactions on dif-ferent data objects at any time. The revived lock tablewill help in reacquiring the locks held by active top-level transactions and their descendants at the timeof failure.

· To initiate new top-level transactions as soon as thedirty object, transaction, and lock tables and consis-tent states of prewrite- and write-bu�ers of dirty dataobjects are re-established.

2.1.2. System crash recovery stepsTo achieve above recovery goals, we need the fol-

lowing steps to restart the system:Revival of dirty object table: The dirty object table is

required to be checkpointed periodically by transferringa copy of it to the stable storage during normal pro-cessing. The prewrite values and after-images are loggedon stable storage during the execution of transactions tobuild the consistent dirty object table in case a systemfailure occurs before the next checkpoint is taken. Atransaction is not permitted to complete its commitprocessing until the redo portion of that transaction hasbeen written to stable storage. The redo portion of a logrecord provides information on how to redo changesperformed by the committed transactions. During sys-tem restart, the dirty object table is recovered with thehelp of most recent checkpointed copy of the dirty ob-ject table and is modi®ed with the help of log storedafter the last checkpoint.

Revival of transaction and lock tables: The transactionand lock tables are checkpointed by transferring a copyof each of them to the stable storage periodically duringnormal processing. Whenever a subtransaction is madeactive or when any transaction acquires or releases alock, the information is also logged to build a consistentstate of these tables. However, such information maynot be logged for read-only and prewrite access sub-transactions as these are to be discarded in case of asystem failure. If a checkpoint is taken during restartrecovery then the contents of transaction and lock tableswill also be included in checkpoint. The entries corre-sponding to all other transactions except those that areto be restarted (whose recovery point subtransactionshave not been committed) are removed from the trans-action and lock tables. To do so, we need to ®ndwhether the stable storage contains the commit-state ofthe recovery point subtransaction of each active top-level transaction. The commit states information istransferred to the stable storage during the commit ofthose subtransactions whose e�ects cannot be undone orlost in case of a failure. The commit-state informationcontains, besides associated variables, private dataand other information, the identi®er of the committedsubtransaction as well as of its parent transaction.

154 S.K. Madria et al. / The Journal of Systems and Software 55 (2000) 151±165

Page 5: An open and safe nested transaction model: concurrency and ...

A commit-state information of a subtransaction T 1de®nes the state of its parent transaction T 2 at the timeof commit of T 1. The commit-state information helps inre-establishing the restart state of a top-level transactionin order to complete its remaining execution. No sub-transaction whose e�ects cannot be undone in case offailure can be considered complete until its commit-stateinformation and all its data are safely recorded on stablestorage. If the stable storage does not contain thecommit-state of the recovery point subtransaction of anactive top-level transaction then all the entries corre-sponding to it and all its subtransactions are removedfrom the table. Otherwise, the top-level transaction hasto complete its remaining execution on revival.

Revival of bu�ers: To revive the contents of write-bu�er of a dirty data object, we copy the value of thedata object from the stable-db to the write-bu�er.However, the stable database version of the data objectmay not contain some or all the updates of committedtransactions. It involves redoing those committedtransactions after-images, which have not been trans-ferred to the stable-db before the failure. The redo ofafter-images will re-establish the state of the database inthe write-bu�er at the time of failure. Similarly, to re-cover the prewrite-bu�ers corresponding to dirty dataobjects, we redo the prewrite values logged on stablestorage after the last checkpoint, which have not sub-sequently been written. This re-establishes the states ofprewrite-bu�ers of dirty data objects as they exist at thetime of failure. The contents of prewrite- and write-bu�ers are recovered using the non-volatile storageversion of the database, dirty object table and the log.There is at the most one prewrite log corresponding to adata object since once the associated write values arewritten, the corresponding prewrite log entry is removedfrom the stable storage as well as from the dirty objecttable.

Transaction logging and recovery: The log recordswritten on behalf of subtransactions are always linked tothe last record of their parents which re¯ects the trans-action tree in the log. Whenever a prewrite access sub-transaction commits, its commit information andprewrite values are passed to its parent transaction,which is the recovery point subtransaction. When therecovery point subtransaction decides to commit, itscommit-state information and the associated prewritevalues are required to be logged. This commit-state in-forms the scheduler that from this point of time, thecommitted subtransaction's e�ect cannot be lost underany circumstance. The commit of recovery point sub-transaction occurs only after all its prewrite accesssubtransactions have been committed and therefore, therecovery point subtransaction's commit-state has thee�ect of all its committed prewrite access descendants.Hence, the commit-state of recovery point subtransac-tion is the ®rst commit entry in the stable log. If system

crashes immediately after the commit of its recoverypoint subtransaction, the scheduler will restart its activetop-level transaction from the logged commit-state on-wards.

Whenever a write access subtransaction at leaf leveldecides to commit, its commit-state and write value arelogged. The commit information is passed to its parenttransaction that helps in the termination of the parenttransaction. The process of logging the commit-state willcontinue till the top-level transaction commits. In caseof failure, these logs will help in completing an activetop-level transaction on revival. The process of trans-ferring a transaction's commit-state or prewrite or writevalues to the stable log is called transaction check-pointing (early writing). It is required in order to keeptrack of commit-states, prewrite and write values ofvarious subtransactions, which helps in completing anactive top-level transaction's re-execution. Transactioncheckpointing will keep track of all logical committedstates as well as prewrite and write values, which cannotbe lost. Readonly transactions require no early writingas they do not change a data object's state.

To complete the active top-level transactions, whoserecovery point subtransactions have been committedbefore system failure, the scheduler has to decide therestart states to re-initiate such transactions. If the re-covery point subtransaction's commit is the only com-mit record in the stable log then its active top-leveltransaction restarts from this commit-state. Otherwise,the scheduler ®nds out the last commit-state logged afterthe commit of recovery point subtransaction prior tosystem crash in order to restart the transaction from thelast commit-state. Once the restart-state is established,the scheduler reacquires the type of locks the active top-level transaction and all its active descendants wereholding at the time of failure. Once the locks are reac-quired, the execution of a top-level transaction restartsfrom the restart-state.

2.2. Data structures

Here we discuss the data structures used in the logicalimplementation of the recovery model. Most of thesedata structures are needed in the physical implementa-tion as well. We ®rst discuss some of the ®elds present indi�erent types of log records, which are as follows.

LSN: This gives the address of the log record in thelog address space. It is a monotonically increasing value.It is present in log records of the type ``data''. This maybe included in other type of log records also but is notmandatory.

Transaction-id: Identi®er of the transaction involvedin the log record.

Object-id: Identi®er of the object involved in the logrecord. It is present in log records of ``data'' and ``lock''types.

S.K. Madria et al. / The Journal of Systems and Software 55 (2000) 151±165 155

Page 6: An open and safe nested transaction model: concurrency and ...

Value: This is the redo data that describes the updatethat was performed. This also includes the committedprewrite value.

Active: Present in the log record written during theactivation of a transaction.

Commit-state: Present in the log record written duringthe commit of a subtransaction. This includes the pri-vate data, local variables, etc. of the committed sub-transaction.

Lock: This is present in the log record which is loggedwhen a subtransaction acquires or releases any lock.This includes information whether the lock is ``retained''or ``held'' by the transaction.

Log records can be of the following types:``Data'' type of log records of the form:hLSN; transaction-id; object-id;Prewrite or write valuei``Transaction'' type of log records are the formhtransaction-id; statusi. Note that status takes ei-ther the value ``Active'' or ``Commit''.``Lock'' type is of the formhtransaction-id; lock type; object-idiWe refer the complete log record structure byhlog recordi along with its type information.hEND-CHK-POINTi is a record to identify the endof a checkpointing activity.

2.2.1. Transaction and lock tablesTo distinguish between di�erent transactions and to

know their status (active or not) in the system, we needto maintain the transaction-id and status of eachtransaction using a transaction table. Furthermore, tore¯ect the transaction tree, each transaction-id is suchthat it contains the identi®er of its own as well as theidenti®er of its parent transaction. This type of trans-action-id helps the scheduler in informing the commitor abort of a subtransaction to its parent. In addition,an abort request also contains identi®ers of all its in-feriors.

For the scheduler to know whether a transaction isactive or not, it is su�cient that a transaction may keeponly one status namely ``active''. Since the parent±childrelationship of committed subtransactions are to bestored in the log separately by linking the log records ofcommitted subtransactions to their parents, the trans-action table need not keep information about committedsubtransactions. A subtransaction enters the ``active''status as soon as it is initiated and remains ``active'' untilit commits or aborts. When a subtransaction commits,its entry is removed from the transaction table. After thecommit of the recovery point subtransaction, the statusof its upper level subtransactions remains ``active'' evenin case of aborts at higher level because they have tocomplete their remaining execution on revival. On sys-tem revival, once such ``active'' top-level transactionsare decided, all other ``active'' top-level transactions andtheir ``active'' descendants are removed from the table.

The lock table keeps the information about the locksheld by all active transactions in the system at any time.Each entry of the table keeps information about thetransaction-id, type of lock held and the object-id. Whena transaction acquires a lock on a data object, an entryis made in the lock table. When a transaction commitsor aborts, the corresponding transaction entry is re-moved from the table. A new entry is made about thetransaction which inherits the lock from the committedor aborted transaction.

2.2.2. Dirty data object tableEach entry in the dirty data object table consists of

®eld's object-id, RecLSN (Recovery log sequence num-ber) of prewrite and write operations. The value ofRecLSN of write operation indicates in the log theremay be updates which are, possibly, not yet in the non-volatile version of the data objects. The minimum ofRecLSN values in the table gives the starting point forredo activity. All the write log records whose LSNs aregreater than the min RecLSN are, possibly, required tobe redone as these log records e�ects might not havebeen transferred to the stable-db. The min RecLSN ofall prewrite operations which is greater than the maxRecLSN of all write operations gives the starting pointfor redoing the prewrite operations. All the prewrite logrecords whose LSNs are greater than the min RecLSNare required to be redone as these are the prewrite logrecords whose associated write operations are not per-formed. Therefore, these prewrite log records need to beredone. Whenever the write values are announced, thecorresponding prewrite entries from the dirty objecttable are removed. Similarly, whenever the data objectsare written back to the non-volatile storage, the corre-sponding entries are removed from the table.

3. Nested transaction system model and implementation

Our nested transaction database system model for-mally consists of transaction managers (TMs), recoverymanagers (RMs) and data managers (DMs). The datamanagers (DMs) model the data objects. Each datamanager keeps a copy of the data object in the sec-ondary storage, called stable-db. The prewrite and writevalues of each object are kept in the respective bu�ers atthe corresponding DMs. These are called prewrite- andwrite-bu�ers, respectively. Physically, only a subset ofthese DMs will have prewrite and write values of thedata objects in the corresponding bu�ers. A read oper-ation gets the value of the referenced data object fromthe prewrite-bu�er (if any) otherwise it gets the valuefrom the write-bu�er. If the DM does not have a copy ofthe data object in the write-bu�er, the read operationgets the value from the stable-db copy of the data ob-ject. The write-bu�er's contents of a data object are

156 S.K. Madria et al. / The Journal of Systems and Software 55 (2000) 151±165

Page 7: An open and safe nested transaction model: concurrency and ...

transferred periodically to stable storage. Also, eachDM maintains a log corresponding to the data object.Each DM also shares a common log.

Considering the above con®guration, our model hasfour di�erent transaction managers (TMs) for per-forming read (read-TM), write (write-TM) and systemrestart's analysis (analysis-TM) and redo (redo-TM)operations. Of these, read- and write-TMs are initiatedby the user-visible transactions. An external agent suchas the operating system invokes analysis-TMs and redo-TMs. A hidden daemon transaction is associated witheach write- and redo-TM transaction to coordinate thebu�er management operations, i.e., the transfer of adata object's value to stable-db during the normal andsystem restart operations. To achieve the notion ofspontaneity and transparency of bu�er managementoperation, the daemon transaction wakes up and com-mits with respect to its associated transaction.

TMs are situated at one level below user-visibletransactions. Next level of transaction hierarchy has sixdi�erent recovery managers for co-ordinating read(read-RM), prewrite (prewrite-RM), write (write-RM),transfer (transfer-RM) and system restart's analysis(analysis-RM) and redo (redo-RM) operations. TheseRMs are made active by the corresponding TMs. Dur-ing the span of daemon transaction, a daemon can ini-tiate many transfer-RMs. This will help the transferprocess to be made active during redo operations. TheseRMs initiate access subtransactions situated at the leaf

level. Each read, prewrite and write-RM initiates read,prewrite and write access subtransactions, respectively.A read access reads the value either from the prewrite orthe write-bu�er of the data object whereas a write sub-transaction accesses only the write-bu�er component ofthe data object. A transfer-RM initiates a transfer access(like read access but returns no value) to transfer thecontents of write-bu�er of the data object to the stable-db. The analysis-RM initiates copy, read, write andread-analysis access subtransactions. A copy transactionre-initializes the tables in the volatile memory whereasread accesses read the log entries after the last check-point record and write accesses update tables to bringtheir state as existing at the time of failure. A read-analysis subtransaction returns information about thedirty objects at the time of system failure. It also returnsa list of active transactions (and their restart states) to berestarted on system restart. A redo-RM initiates copy,read, prewrite and write access subtransactions. Here, acopy transaction places the stable-db copy of the objectin its write-bu�er. A read access initiated by redo-RMreads a log entry corresponding to a data object loggedafter the last checkpoint record. A prewrite (write) ac-cess makes a prewrite-bu�er's (write-bu�er's) valueconsistent. The nested transaction tree structure isshown in Fig. 1(a) (for normal operations) and Fig. 1(b)(for system restart operations).

We assume that each user transaction knows itswrite-set before initiating a write-TM to write all the

Fig. 1. (a) Nested transactions tree for normal operations and (b) nested transaction tree for system restart operations.

S.K. Madria et al. / The Journal of Systems and Software 55 (2000) 151±165 157

Page 8: An open and safe nested transaction model: concurrency and ...

data objects. A write-TM ®rst initiates a prewrite-RM,which further initiates prewrite access subtransactions inorder to announce prewrites for all the data objectscontained in the write-set. This value for each data ob-ject is written in the prewrite-bu�er allocated in thevolatile memory. Modeling prewrites at leaf level pro-vides user transparency to the prewrite operations. Weformally specify the prewrite-RM as the recovery pointsubtransaction of the top-level transaction. Once theprewrite-RM has committed, the prewrite values be-come visible outside its parent's view at any level ofnesting without necessarily requiring the commit of allits superior transactions. After the prewrite-RM'scommit, the write-TM initiates a write-RM to update allthe data objects whose prewrite values have been an-nounced before. The ®nal updates are written in thewrite-bu�ers allocated in the volatile memory at eachDM. With the invocation of each write-TM automaton,a daemon transaction is initiated automatically whichfurther initiates transfer-RMs. A transfer-RM initiates atransfer access subtransaction to transfer the write-buf-fer's value to the stable-db. The write-bu�er's contentscan be transferred without the commit of the top-leveltransaction because write-values, once written, cannot beundone or lost.

To meet transaction and data recovery guarantees,the system maintains a log corresponding to each dataobject at the respective DM. The system maintains acommon log (shared by all DMs) which keeps infor-mation about the progress of transactions and its asso-ciated data, their lock holding information etc. Ouralgorithm asserts that the log records representingchanges to some data must already be on stable storagebefore the changed data is allowed to replace the pre-vious version of that data on non-volatile storage. Thelog record corresponding to each data object is assigneda unique log sequence number (LSN) at the time therecord is appended to the log. The LSNs are assigned inascending order. Whenever a prewrite value is an-nounced, the LSN of the prewrite log record to bewritten is placed in the LSN ®eld of the prewrite value inthe prewrite-bu�er. Similarly, when the data object'svalue is updated in the write-bu�er, the LSN of the logrecord is placed in the LSN ®eld of the updated dataobject in the write-bu�er. This value of LSN will bemore than the value of LSN associated with the prewritevalue of the same data object. The LSN with each pre-write and write value in the associated LSN ®eld keepstrack of the data object's state. Also, when a write-bu�er's value is transferred to the stable-db, the after-images LSN is placed in the LSN ®eld of the stable-db.This tagging of LSN allows precise tracking of the statesof the object with respect to logged updates of the dataobject for system restart purpose. Each write-bu�er isalso associated with a stableLSN ®eld. The stableLSN®eld is the LSN of the stable-db copy of the object. A

write-bu�er's value is transferred to the stable storage ifthe stableLSN of the write-bu�er is greater than thewrite-bu�er's LSN. This will avoid accessing the stable-db's LSN to check whether the transfer of the object'svalue to the stable storage is required or not.

3.1. Concurrency control algorithm

In this section, we discuss the type of con¯icts whichoccur in our model during normal and recovery opera-tions and the locks needed to control them. Note that weuse the same concurrency control protocol to control theconcurrent execution of recovery operations and dat-abase operations (see Fig. 2). The database operationsare read, write and prewrite. The locks needed to controlthe concurrent executions are read-lock (RL) for read,write-lock (WL) and prewrite-lock (PL) for prewriteoperations, respectively. During recovery operations,the access operations executed are copy, read, write,transfer, read-analysis and prewrite. The operationscopy, read, transfer and read-analysis acquire read locksand hence, follows the read-locking protocols as in Fig.2. Write and prewrite acquire write-lock and prewrite-locks, therefore, they also follow the locking protocolsgiven below with respect to write-lock and prewrite-lock. A more detailed discussion on the correctness ofconcurrency control algorithm for database operationshas appeared in Madria et al. (1997b).

Formally, we have the following locking algorithm asshown in Fig. 2.

4. System restart operations

System restart has to perform two passes of the log:analysis pass and the redo pass. After the analysis passof log records, the transaction table will contain the listof transactions active at the time of failure, the locktable will have lock entries corresponding to activetransactions, and the dirty object table will contain thelist of data objects which were dirty at the time of fail-ure. The redo activity is performed in second pass inorder to restore the dirty data objects to the valuesconsistent with the information kept in the stable log.

The analysis pass is modeled as an analysis-TMwhich initiates an analysis-RM. The analysis-RM fur-ther initiates a copy access, which re-initializes the lock,transaction and the dirty object tables by placing theirstable storage copies in the bu�er after the last check-point record. Next, it invokes read accesses to read thelog records from the corresponding DMs after the lastcheckpoint record. The analysis-RM with the help ofwrite accesses updates these tables as follows. If a logrecord corresponding to transaction table is encounteredwhose identity does not already appear in the table, thenan entry is made in the table. The transaction table is

158 S.K. Madria et al. / The Journal of Systems and Software 55 (2000) 151±165

Page 9: An open and safe nested transaction model: concurrency and ...

modi®ed to track the active transactions at the time offailure. Similarly, a log record corresponding to dirtyobject table is entered with the current LSN in the dirty

object table if it is not already there. In a similar fashion,the lock table is also updated. Whenever a commit re-cord is encountered, a list of commit records is made.

Fig. 2. Locking algorithm.

S.K. Madria et al. / The Journal of Systems and Software 55 (2000) 151±165 159

Page 10: An open and safe nested transaction model: concurrency and ...

This helps in establishing the restart states of thetransactions to be re-executed. A read-analysis returns aset of min RecLSN (min LSN of log records from whereredo recovery has to restart, called RedoLSN): one foreach dirty data object with respect to write operationsand a set of min RecLSNs (greater then the max Rec-LSNs of the corresponding data object) for each dirtydata object with respect to prewrite operations. It alsoreturns a list of active transactions and their restartstates. The above information will be the output of ananalysis-TM. A checkpoint is taken at the end of theanalysis pass.

The redo activity is performed using the prewritevalues and after-images logged on the stable storage.The redo activity can be skipped if there is no dirtydata object. The transaction hierarchy for redo activityconsists of a redo-TM for each object, a redo-RM atthe next level and access subtransactions at the leaflevel. Each redo-TM initiates a redo-RM which furthertriggers copy, read, prewrite and write access sub-transactions to redo the operations present in the stablelog corresponding to the data objects after the lastcheckpoint record. A copy transaction re-initializes thewrite-bu�er by copying the contents of the stable-db tothe write-bu�er. Read operations read the write andprewrite log entries one by one after the last check-point. A prewrite access will re-initialize the prewrite-bu�er with the help of prewrite log record value if thevalue of LSN associated with the prewrite log record isgreater than LSNs of all the after-images logged onstable storage. A write access substitutes the value(after-image) of the data object read from the log in theallocated write-bu�er of the data object if the dataobject's LSN in the write-bu�er is found to be less thanthe log record's LSN. These log records will be thosewhose e�ects are not yet in the non-volatile storageversion of the data object. The daemon transactionassociated with the redo-TM initiates transfer-RMs totransfer the contents of the write-bu�er to the stable-db. After the commit of redo-TM, a checkpoint istaken.

The analysis and redo activities in the form of nestedtransactions provide faster recovery since a redo-TM foreach data object can be initiated in parallel. Also, if asystem crash occurs during the execution of an analysis-or redo-TM, the corresponding new TM can be trig-gered on system restart. All the actions of previouslycommitted subtransactions of the failed TM are notrequired to be discarded in case of a system failure.Similarly, in case of a normal transaction abort of any ofthese TMs, RMs, or their subtransactions, a corre-sponding new transaction can be initiated without dis-carding the e�ects of the aborted transaction and theirdescendants (if any). This helps in relaxing the atomicityproperty of such subtransactions since neither the ac-tions of previously committed subtransactions are un-

done nor the failed TM or RM is retried until theircompletion on revival.

4.1. Bu�er management operations

The transfer of a data object's value from the write-bu�er to the stable storage is initiated with the help of adaemon transaction associated with each write-TM andredo-TM. The daemon transaction invokes a transfer-RM that transfers the value of the data object from itswrite-bu�er in the volatile memory to the stable-db onsecondary storage. For the sake of uniformity, and inorder to model correct atomicity requirements, transfer-RMs are modeled as transactions and placed as childrenof write- and redo-TMs in the transaction tree. Onewould like to permit transfer operations to take placewith the intention that these TMs do not control theirinvocations. They are intended to run spontaneouslyand transparently. Therefore, we have associated adaemon transaction with each write- and redo-TM sothat these TMs do not have to be aware of the invoca-tions of transfer-RMs. A daemon transaction commitswith the commit of its associated TM. Therefore, it ispossible to invoke many transfer-RMs dur ing its spanand these transfer-RMs may not have committed beforetheir parent transaction's commit. However, we knowthat the sequence of transfer operations for a read-writedata object from the write-bu�er to stable-db is the sameas transfer of last write. This enables the daemontransaction to initiate many transfer-RMs and to com-mit without the commit of all its transfer-RM sub-transactions. The daemon transaction achieves thespontaneous and transparent behaviour of the periodictransfer of the data from the volatile memory to stablestorage and is failure-atomic.

The transfer of write-bu�er to the stable-db takesplace if the stableLSN of the write-bu�er is greater thanthe LSN of the write-bu�er. Once the transfer of thevalue along with LSN is completed, the correspondinglog entry from the stable log is removed and the stab-leLSN of the write-bu�er is set to the LSN of the write-bu�er.

5. Snapshot of transaction processing, logging and recov-

ery

Consider the nested transaction tree structure asshown in Fig. 3, where U is a user-visible transaction. T1

and T2 are read- and write-TMs, respectively. T 02 is theassociated daemon transaction. T11; T21 and T22 are read,prewrite and write-RMs, respectively. T 023 is the transfer-RM. T111 is a read access, T211 and T212 are prewriteaccess subtransactions whereas T221 and T222 are corre-sponding write accesses, respectively. T 0231 and T 0232 are

160 S.K. Madria et al. / The Journal of Systems and Software 55 (2000) 151±165

Page 11: An open and safe nested transaction model: concurrency and ...

the transfer access subtransactions. Data objects sup-posed to be accessed by these transactions are X and Y.

As soon as the top level transaction U is initiated, itsstate is set to be ``active'' in the transaction table. Also,this information is appended to the log on stable stor-age. As soon as transactions T2; T21; T211 and T212 aremade active, the information is recorded in the trans-action table as well as logged on stable storage. The stateof the transaction table is as shown in Fig. 4(a). Thetransaction T21 has initiated the prewrite access sub-transactions T211 and T212 to announce the prewrite valueof the data object X and Y, respectively. As soon as theprewrite access subtransaction T211 �T212� gets the pre-write-lock on the DM corresponding to the data objectX �Y �, an entry in the lock table is made as shown in Fig.4(b). When a prewrite value of a data object X �Y � isannounced by T211�T212�, LSN of the log record to bewritten is placed in the LSN ®eld associated with theprewrite value. When T211�T212� decides to commit, thecommit information and prewrite value along with LSNare passed to their parent subtransaction T21. The pre-write-lock held by T211 and T212 are passed to the pre-write-RM T21 on their commit and the correspondingentries are made in the lock table. The updating of dirtyobject table, logging of prewrite values and lock holding

information associated with T211 and T212 are deferreduntil the commit of recovery point subtransaction T21.This is because in case of system crash, the e�ect of T211

and T212 are to be discarded. On commit of T21, theprewrite values are transferred to the respective logs asshown in Fig. 4(c) where each tuple in the database log isof the form hLSN; transaction-id; object-id; prewritevaluei.

To sustain the e�ect of recovery point subtransac-tion T21 in case of system crash, its commit-state in-formation that includes all the variables, its privatedata and prewrite values along with the LSNs, arealso logged. Logging of commit-state is repeated untilthe top level transaction commits. When the prewrite-RM T21 commits, the lock released by T21 is inheritedby U (it is safe to release lock to U here since theancestors of recovery point subtransaction, prewrite-RM, cannot be aborted in any case). This informationis also recorded in the lock table as well as logged onstable storage. The state of the transaction, lock anddirty object tables after the commit of T21 is shown inFig. 5(a)±(c). Note that these updated tables are alsocheckpointed periodically during normal processingwhich reduces the system restart's work in case ofsystem crash.

Fig. 3. Nested transaction tree.

Fig. 4. (a) Transaction table; (b) lock table; (c) event and log records.

S.K. Madria et al. / The Journal of Systems and Software 55 (2000) 151±165 161

Page 12: An open and safe nested transaction model: concurrency and ...

After the commit of prewrite-RM T21, the write-TMT2 initiates write-RM T22 to write data objects X and Ywhose prewrite values have been announced before.Write-RM T22 further initiates write access subtransac-tions T221 and T222 which ®nally update data objects Xand Y, respectively. When an update is performed, thelog record is written along with LSN and this value ofLSN is also placed in the LSN ®eld of the updated valueof the data object as shown in Fig. 6. The write-lock heldby T221 �T222� is also inherited by U and the informationis recorded in the lock table as well as logged on thestable storage. The states of di�erent tables after thecommit of T2 are shown in Fig. 7(a)±(c).

When the write-TM T2 is initiated, the associateddaemon transaction T 02 is also invoked automatically.The daemon transaction initiates the transfer-RM T 023

which further initiates transfer access subtransactionsT 0231 and T 0232 to transfer the updated values of data ob-jects X and Y to the stable-db, respectively. The infor-mation that these transactions are initiated is alsorecorded in the transaction table. As soon as T 0213 �T 0232�acquires a read-lock, the information is recorded in thelock table. When the transfer of the data object X's (Y's)value from the write-bu�er to the stable-db is completed,the corresponding entry from the dirty object table aswell as from the log is also removed. When any of the

Fig. 5. (a) Transaction table; (b) lock table; (c) dirty object table.

Fig. 6. Event and log records.

Fig. 7. (a) Transaction table; (b) lock table; (c) dirty object table.

162 S.K. Madria et al. / The Journal of Systems and Software 55 (2000) 151±165

Page 13: An open and safe nested transaction model: concurrency and ...

readonly transactions T1; T11 and T111 are initiated, theinformation is recorded in the transaction table. Simi-larly, when a read-lock is acquired or released, the locktable is also updated. However, such information is notlogged on stable storage as these transactions are to bediscarded in case of system failure. When the top leveltransaction U commits, all the entries corresponding toU are removed from the tables and the logs.

5.1. System restart processing

5.1.1. Analysis pass processingGiven below is an example to show how analysis pass

operations explained in Section 5 are performed. Sup-pose in the stable storage, the tables are in a state asshown in Fig. 7(a)±(c) and the log has the followingentries after the last checkpoint record at the time ofsystem failure.

On system restart, the external agent initiates ananalysis-TM Ta to perform the analysis pass (see Fig. 8).The analysis-TM invokes the analysis-RM Taa at thenext level of transaction hierarchy. The analysis-RMfurther initiates copy, read, write and read-analysis ac-cess subtransactions. The copy transactions Tci �i � 1; 3�copy all the tables from the stable storage into the vol-atile memory. Read subtransactions Tri �i � 1; 4� readthe log records one by one from the logs correspondingto each data object and the common log. The corre-sponding write accesses Twi �i � 1; 4� update the tablesas follows : since the entries of database type log recordsare not in the dirty object table, therefore, entries aremade into the table. Similarly, other tables are alsoupdated. Also, since T221 has committed, the corre-sponding entry in the transaction table is removed. The

read-analysis returns min RecLSN (RedoLSN) of 102for the write operation of the data object X and minRecLSN (RedoLSN) of 101 for the prewrite operationof the data object Y. It also returns the information thatU and T2 are active subtransactions along with lockholding information about such transactions. Thesetransactions are to be restarted as their recovery pointsubtransaction T21 has committed.

5.1.1.1. Concurrency control for analysis pass. To explainconcurrency control among access transactions of theanalysis pass, we assume that only one copy, one read,one write and one read-analysis access subtransactionshave to be initiated. The copy, read and read-analysisaccesses require a read-lock before accessing a DMwhereas a write access requires a write-lock before ac-cessing a DM. When an access transaction commits, thelock is released to its parent. When a non-access trans-action commits, its lock is released to the external agent.When a transaction aborts, its lock is released to theparent transaction without discarding the e�ects of theaborted transaction. In case of a system failure, analysispass is started again. In case of a transaction abort, anew transaction can be initiated. The locking rules aresimilar as given for normal system operations in Section3.1.

5.1.2. Redo pass processingHere, we explain the redo pass operations given in

Section 4 with the help of the following example. Con-sider the following database log records correspondingto the data object X at the time of system failure.

Suppose RedoLSN of data object X with respect tothe write operations in the reorganized dirty object tableis 222. Let the min RecLSN (RedoLSN) of prewriteoperations which is greater than max RecLSNs of writeoperations in the dirty object table be 225. ConsiderFig. 9. During redo operations, redo-TM transaction Ts

initiates a redo-RM transaction Ts1 which in turn initi-ates a copy transaction Tc which re-initializes the write-bu�er of the data object X by copying its value from thestable-db along with LSN. It sets its stableLSN andLSN ®elds to be equal to the LSN of the stable-db. After

``Data'' type At Logx h102; T221;X ; xiAt Logy h101; T212; Y ; yi

``Transaction''

type

At commonLog

hT221; commiti

``Lock'' type At commonLog

hT2;write-lock�retain�;X i

Fig. 8. Analysis pass.

``Data'' type log records Write-bu�er of X

Write

Log

hEND-CHK-POINTi

stableLSN � 222

hT1;X ; 220; x1i LSN � 224hT2;X ; 222; x2i stable-db of XhT3;X ; 224; x3i x2LSN � 222

Prewrite

Log

hT4;X ; 225; xi

S.K. Madria et al. / The Journal of Systems and Software 55 (2000) 151±165 163

Page 14: An open and safe nested transaction model: concurrency and ...

its commit, the value in the write-bu�er will become x2

with LSN 222. Ts1 then triggers a read access subtrans-action Tr1 with RecLSN 222. It reads the database writelog record whose LSN is greater than or equal to 222.Similarly, Tr2 and Tr3 are initiated to read the next writeand prewrite log entries, respectively. The write accesssubtransaction Tw1 invoked by Ts1 replaces the write-bu�er value of the data object X by x2 if LSN of the logrecord is greater than stableLSN of the object in thewrite-bu�er. Since LSN (222) of the log record is notgreater than the stableLSN (222) of the object in write-bu�er and therefore, Tw commits without updating thevalue in the write-bu�er with the return message ``not tobe redone''. Ts1 will then initiate next write access sub-transaction Tw2 which replaces the write-bu�er of theobject by x3 as its LSN (224) is greater than the stab-leLSN (222). Ts1 also concurrently initiates a prewriteaccess subtransaction Tp to re-initialize the prewritevalue in the prewrite-bu�er if the LSN value of theprewrite log record is greater than the max RecLSN ofall the write operations. Tp re-initializes the prewrite-bu�er with the prewrite value x, as its LSN value (225) isgreater than the max RecLSN 224 of all the write op-erations. During the redo operation, the daemontransaction T 0s associated with Ts is also invoked auto-matically. Ts in turn initiate a transfer access subtrans-action to transfer the contents of the write-bu�er of thedata object to the stable-db. Tt is the transaction whichtransfers the data object X's write-bu�er value if its LSNis greater than the stable-LSN of the data object in thewrite-bu�er.

5.1.2.1. Concurrency control for redo pass. Consider Fig.9, where Ts is a redo-TM and T 0s is the associated dae-mon transaction. Ts1 and T 0s1 are redo- and transfer-RMs, respectively. Tc, fTr1; Tr2g, Tp and Tt denotes copy,read, prewrite, and transfer access subtransactionswhereas Tw is the write access subtransaction, respec-tively. In this example, for simplicity, we have assumedreading of only one write and one prewrite log record.As before, copy, read, and transfer accesses require

read-locks whereas prewrite and write accesses requireprewrite- and write-locks, respectively. All the con¯ictrelations are the same as before except that a prewrite-lock does not con¯ict with a write-lock. When an accesstransaction commits, its lock is passed to its parent andso on except that write and transfer accesses release theirlocks to the external agent. When a transaction aborts,its lock is released to its parent. The locking rules aresame as given in Section 3.1.

6. Conclusions

A nested transaction model, its concurrency controland recovery algorithms are presented in this paper. Wehave introduced the concept of a recovery point sub-transaction and prewrite operation in our model toachieve higher concurrency. Our recovery algorithmconsisting of system restart operations; analysis andredo operations and bu�er management operations,which are modeled in terms of nested transactions.Modeling recovery operations in terms of subtransac-tions increases concurrency during system restart oper-ations. As a future work, our transaction model need toextend in the context of orthogonally persistent pro-gramming languages where serious problems arise un-less all computation exists within a transactionalcontext, but this restricts concurrency. We are exploringthis issue further. We are also looking into adapting ourtransaction recovery model in mobile computing envi-ronment (Madria et al., 1999). We are working on theveri®cation of our recovery algorithm using I/O au-tomaton model (Fekete et al., 1993). Each component ofour recovery model is modeled as I/O automata and isspeci®ed with the help of some pre and post conditionsto capture the operational semantics. We have provedsome invariant that leads towards the correctness of themodel. Thus, our model gives clear understanding of therecovery algorithm. This work will be reported as aseparate paper (Madria and Bhargava, 1999), wheremain thrust is only on the proof of correctness.

References

Fekete, A., Lynch, N., Merrit, M., Whiel, W., 1993. Atomic

Transactions. Morgan-Kaufmann, Los Altos, CA.

Gray, J., et al., 1981. The recovery manager of the System R database

manager. ACM Comput. Surveys 13 (2), 223±244.

Harder, T., Reuter, A., 1983. Principles of transaction oriented

database recovery. ACM Comput. Surveys 15 (4), 287±378.

Korth, H.F., Kim, W., Bancilhon, 1990. On long-duration CAD

transactions. Inform. Sci. 46, 73±107.

Kim, W., Lorie, R., Mcnabb, D., Plou�e, W., 1984. A transaction

mechanism for engineering design databases. In: Proceedings of the

10th International Conference on Very Large Databases. VLDB

Endowment, pp. 355±362.

Fig. 9. Redo pass.

164 S.K. Madria et al. / The Journal of Systems and Software 55 (2000) 151±165

Page 15: An open and safe nested transaction model: concurrency and ...

Korth, H.F., Speegle, G., 1990. Long duration transactions in software

design projects. In: Proceedings of the Sixth IEEE International

Conference on Data Engineering. New York, pp. 568 ± 574.

Madria, S.K., 1995. Concurrency control and recovery algorithms in

nested transaction environment and their proofs of correctness,

Ph.D. thesis. Indian Institute of Technology, Delhi, India.

Moss, J.E.B., 1985. Nested Transactions: An Approach to Reliable

Distributed Computing. MIT Press, Cambridge, MA.

Moss, J.E.B., 1987. Log-based recovery for nested transaction,

COINS, Technical Report. University of Massachusetts at Am-

berest, pp. 87±98.

Madria, S.K., Bhargava, B., 1999. Improving Availability in Mobile

Computing Using Prewrite Operations, CSD-TR-32. Department

of Computer Sciences, Purdue University, IN, June 1997 (under

revision in Distributed and Parallel Databases).

Madria, S.K., Bhargava, B., 1997a. System de®ned prewrites to

increase concurrency in databases. In: Proceedings of the First

East-European Symposium on Advances in Databases and Infor-

mation Systems (in co-operation with ACM-SIGMOD). St.-

Petersburg, Russia.

Mohan, C., Haderle, D., Landsay, B., Pirahesh, H., Scwartz, P., 1992.

Aries: a transaction recovery method supporting ®ne-granularity

locking and partial rollbacks using write-ahead logging. ACM

Trans. Database Syst. 17 (1).

Madria, S.K., Maheshwari, S.N., Chandra, B., 1997b. Formalization

and correctness of a concurrency control algorithm for an open and

safe nested transaction model using I/O automaton model. In:

Proceedings of the Eighth International Conference on Manage-

ment of Data (COMAD'97). Madras, India.

Madria, S.K., Maheshwari, S.N., Chandra, B., Bhargava, B., 1997c.

Crash recovery algorithm in an open and safe nested transaction

model. In: Proceedings of the Eighth International Conference on

Database and Expert System Applications (DEXA'97), France,

Lecture Notes in Computer Science, vol. 1308. Springer, Berlin.

Madria, S.K., Maheshwari, S.N., Chandra, B., Bhargava, B., 1999.

Crash Recovery in an Open and Safe Nested Transaction Model:

Formalization and Correctness (under communication to Journal).

Mohan, C., Rothermel, K., 1989. Recovery protocol for nested

transaction using write-ahead logging. IBM Technical Disclosure

Bulletin 31(4), September 1988 (also appeared in the Proceedings of

the 15th VLDB Conference, Amsterdam, 1989).

Muth, P., Rakow, T.C., Weikum, G., Brossler, P., Hasse, C., 1993.

Semantic concurrency control in object-oriented database systems.

In: Proceedings of the Ninth IEEE International Conference on

Data Engineering. pp. 233±242.

Resende, R.F., Agrawal, D., Abbadi, A.E., 1994. Semantic locking in

object oriented database systems, Technical Report TRCS 94-01.

University of California at Santa Barbara.

Weikum, G., Hasse, C., Brossler, P., Muth, P., 1990. Multi-level

recovery. In: Proceedings of the Ninth ACM Symposium on

Principles of Database Systems. Nashville, pp. 109±123.

Weikum, G., 1991. Principles and realization strategies of multi-level

transaction management. ACM Trans. Database Syst. 16 (1).

Sanjay Kumar Madria has obtained his Ph.D. from Indian Institute ofTechnology, Delhi, India in 1995. He has been a Visiting AssistantProfessor in the Department of Computer Science, Purdue University,USA since 1999. He has earlier worked in Nanyang TechnologicalUniversity, Singapore and University Sains Malaysia, Penang, Ma-laysia. He has widely published in the area of nested transactionprocessing, mobile transaction processing and more recently on man-agement of web data. He is a Guest-editor of WWW Journal. He isProgram-Chair for EC&WEB 2000 conference to be held in UK. He isPC member of many leading international database conferences andreviewers of many database journals. He has given many invitedseminars and tutorials in leading institutes and database conferences(EDBT'2000). He has served as invited Panelist in NSF (NationalScience Foundation) and in some other database conferences.

S.N. Maheshwari received his Ph.D. in Computer Science fromNorthwestern University, USA. He is currently a Professor of Com-puter Science and Engineering at the Indian Institute of Technology,Delhi, India, where he has held the appointments of Head, Departmentof Computer Science and Engineering, and Dean, UndergraduateStudies. His research interests range from graph algorithms, compu-tational geometry, distributed and parallel computing, to theory oftransaction processing.

B. Chandra is a Professor of Computer Applications at the IndianInstitute of Technology for the past two decades. Her area of spe-cializations include Distributed Databases with special reference torecovery, knowledge acquisition using Neural Networks. She haspublished a number of research papers in reputed journals in these twoareas. She has held visiting faculty positions at the University ofPittsburgh, USA and the Penn State University, USA. She has alsoheld positions at the World Bank at Washington D.C. and INRIA,Paris.

Bharat Bhargava is a full professor in the Department of ComputerScience, Purdue University, West Lafayette, USA. His research in-volves both theoretical and experimental studies in distributed systems.Professor Bhargava is on the editorial board of three internationaljournals. In the 1988 IEEE Data Engineering conference, he and JohnRiedl received the best paper award for their work on A Model forAdaptable Systems for Transaction Processing. Professor Bhargava isa fellow of Institute of Electrical and Electronics Engineers and In-stitute of Electronics and Telecommunication Engineers. He has beenawarded the Gold Core Member distinction by IEEE Computer So-ciety for his distinguished service. He has been awarded IEEE CStechnological Achievement Award for Adaptability and fault-tolerancein communication network in 1999.

S.K. Madria et al. / The Journal of Systems and Software 55 (2000) 151±165 165