Inconsistency-Tolerant Integrity Checking

Hendrik Decker and Davide Martinenghi

Abstract—All methods for efficient integrity checking require all integrity constraints to be totally satisfied before any update is executed. However, a certain amount of inconsistency is the rule, rather than the exception, in databases. In this paper, we close the gap between theory and practice of integrity checking, i.e., between the unrealistic theoretical requirement of total integrity and the practical need for inconsistency tolerance, which we define for integrity checking methods. We show that most of them can still be used to check whether updates preserve integrity, even if the current state is inconsistent. Inconsistency-tolerant integrity checking proves beneficial both for integrity preservation and query answering. Also, we show that it is useful for view updating, repairs, schema evolution, and other applications.

Index Terms—Integrity checking, inconsistency tolerance.

    1 INTRODUCTION

INTEGRITY constraints are statements declared in the database schema. They express semantic properties, meant to be invariably satisfied by the stored data across state changes.

For preserving the satisfaction of simple constraints like primary keys, foreign keys, or CHECK constraints, sufficient support is usually provided by the database management system (DBMS). For constraints that are not supported by the DBMS, the majority of scientific publications on the subject proposes to use some automated, application-independent method for integrity checking.

Each such method takes as input the set of constraints in the schema, an update consisting of two (possibly empty) sets of database elements to be inserted or, respectively, deleted, and possibly the current, also called old, state of the database. The output of the methods indicates whether the new state, obtained from updating the old state, would satisfy or violate integrity.

In theory, each method requires the total integrity of the old state, i.e., no violation whatsoever is tolerated at any time. Total integrity, however, is the exception, rather than the rule, in practice.

Integrity violation may sneak into a database in many ways. For instance, new constraints may be added without being checked for violations by legacy data. Or, integrity control may be turned off temporarily, e.g., when uploading a backup for which a total check would last too long. Or, integrity may deteriorate by migrating to the DBMS of a different vendor, since the semantics of integrity constructs tends to be proprietary. Or, integrity may be compromised by the integration of databases, when constraints that had held locally fail to hold after databases have been merged. Other database applications where inconsistencies may occur are view updating, schema evolution, data mining and warehousing, diagnosis, replication, uncertain data, and many more.

Often, users consider efforts to completely repair all inconsistencies unnecessary, inopportune, unaffordable, or impossible. Violations of constraints may even be desirable, e.g., when constraints are used to detect irregularities, such as indications of security attacks or tax dodging. So, even though the standard logic foundations are intolerant wrt. inconsistency, there is a strong practical need for integrity checking methods that are able to tolerate extant cases of constraint violations.

For convenience, we abbreviate, from now on, inconsistency-tolerant integrity checking by ITIC.

Fortunately, no new methods for ITIC have to be invented. The main purpose of this paper is to show that the gap between theory and practice of integrity checking can be closed by already approved, time-tested methods. Contrary to common belief, these methods can waive the unrealistic requirement of total integrity satisfaction, without forfeiting their capacity to check integrity, even in the presence of inconsistency. Our approach to ITIC yields major quality improvements, both of data wrt. their intended semantics and of answers to queries.

    The aims pursued in this paper are the following:

1. To distinguish methods that are inconsistency-tolerant from those that are not. In this paper, we formalize the notion of ITIC. Before that, the behavior of methods for checking declaratively stated constraints in the presence of inconsistency has never been contemplated. Traditionally, integrity checking methods were not legitimized to be used in the presence of inconsistency, although many databases are not totally consistent. Now, inconsistency-tolerant methods can be soundly used in the presence of an arbitrary amount of inconsistency. Thus, the applicability of integrity checking methods is widened
immensely. To the best of our knowledge, our definition is the first of its kind.

2. To bridge the gap between theory and practice of integrity checking by using inconsistency tolerance. The theoretical total-integrity requirement is a formidable desideratum in practice. Typically, practical approaches to deal with extant inconsistency are based on exception handling. They tend to have the character of workarounds or ad hoc solutions. Theoretical approaches to deal with extant inconsistency have been based on nonclassical logics, such as modal, many-valued, or paraconsistent calculi. Our approach is based on classical logic and does not need any changes or adaptations of existing integrity checking methods.

3. To evaluate the effects of ITIC on database evolution and query answering. Ultimately, integrity checking is about preserving the semantics of data through updates and, consequently, obtaining query answers that can be trusted. Without total integrity, full trustability is lost. Yet, some databases may be less inconsistent than others, and thus better behaved wrt. query answering. In this paper, we propose a comprehensive set of experiments for observing the impact of ITIC on databases subject to evolution through updates. We report both on the number of constraint violations and on the number of incorrect answers to complex benchmark queries. We also compare our approach to consistent query answering [1], which is an orthogonal technique for dealing with inconsistent data.

4. To describe several application contexts that may benefit from ITIC. The vision brought forward in this paper can be applied to various knowledge and data management problems. We show that ITIC naturally extends to view updates, database repairs, schema evolution, risk management, and unsatisfiability handling.

Section 2 outlines the background. The main contributions are: to develop a concept of ITIC (Section 3), to show the inconsistency tolerance of known methods (Section 4), to outline several applications of ITIC (Section 5), and to validate the practical relevance of ITIC (Section 6). Related work is discussed in Section 7. In Section 8, we conclude.

    2 PRELIMINARIES

We adopt the usual terminology and notation of datalog and refer to textbooks in the field (e.g., [2]) for further background.

    2.1 Logic and Databases

Throughout, let symbols a, b, ... denote constants, p, q, ... predicates, and x, y, ... variables. A term is either a variable or a constant. Sequences of terms are denoted as vectors, e.g., t̃. Predicates, terms, the logical connectives ¬, ∧, ∨, ←, ↔, the 0-ary predicates true, false, and the quantifiers ∀, ∃ are used in formulas, defined as follows: 1) If p is an n-ary predicate and t1, ..., tn are terms, then p(t1, ..., tn) is a formula. 2) If F and G are formulas, then so are ¬F, F ∧ G, F ∨ G, F ← G, and F ↔ G. 3) If F is a formula and x a variable such that neither ∀x nor ∃x occurs in F, then ∀xF and ∃xF are formulas; in ∀xF and ∃xF, each occurrence of x in F is said to be bound. A formula in which all variables are bound is said to be closed. A formula preceded by ¬ is said to be negated. Formulas of the form p(t1, ..., tn), where p is a predicate and the ti are terms, are called atoms. A literal is either an atom (positive literal) or a negated atom (negative literal).

An expression is either a formula or a term. A substitution σ is a set of pairs of terms {x1/t1, ..., xn/tn}, where x1, ..., xn are distinct variables; let {x1, ..., xn} be denoted by Dom(σ). The restriction of a substitution σ to a set of variables V ⊆ Dom(σ) is the substitution σ′ ⊆ σ such that Dom(σ′) = V.

For an expression or set of expressions E and a substitution σ, the expression Eσ is obtained by replacing each occurrence of each variable from Dom(σ) in E by the corresponding term in σ. An expression is called ground if it contains no variable. A substitution σ is more general than a substitution θ if there is a substitution λ such that, for each expression E, Eσλ = Eθ. A substitution θ is a unifier of expressions E1, ..., En if E1θ = ... = Enθ; θ is a most general unifier (mgu) of E1, ..., En if θ is more general than any other unifier of E1, ..., En.

A clause is a formula of the form H ← B1 ∧ ... ∧ Bn (n ≥ 0), where H is a positive literal, B1, ..., Bn are literals and, implicitly, each variable in H ← B1 ∧ ... ∧ Bn is universally quantified in front of the clause. H is called the head and B1 ∧ ... ∧ Bn the body of the clause. The head is optional; when absent, the clause is called a denial, and its body can be read as a condition that must not hold. The empty clause is a denial with an empty body; it is equivalent to false. A fact is a clause whose head is ground and whose body is empty.

A database clause is a clause with nonempty head H ∉ {true, false}. A database is a finite set of database clauses. The dependency graph of a database D is a directed graph whose nodes are labeled with the predicates in D, and there is a positive (resp., negative) arc (p, q) for each clause H ← B in D and each pair of predicates p, q such that q occurs in H and p in a positive (resp., negative) literal in B. A database D is relational if each clause in D is a fact; D is hierarchical if no cycle exists in its dependency graph, i.e., no predicate recurs on itself; D is stratified if no cycle with a negative arc exists in its dependency graph, i.e., no predicate recurs on its own negation.

An update is a bipartite, finite set of clauses to be deleted and inserted, respectively. For a database D and an update U, let D^U denote the updated database; we also call D and D^U the old and the new state, respectively. For a fact A in U to be inserted or deleted, we may write insert A or, resp., delete A.
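A minimal Python sketch of these notions, assuming an illustrative encoding of facts as tuples whose first component is the predicate name (the encoding and the helper name are assumptions made for this sketch, not part of the formalism):

    def apply_update(D, U):
        """Return the new state D^U for an update U = (deletions, insertions)."""
        deletions, insertions = U
        return (D - deletions) | insertions

    # ("p", 1, "a") stands for the fact p(1, a).
    D_old = {("p", 1, "a"), ("p", 1, "b")}
    U = (set(), {("p", 2, "c")})        # insert p(2, c), delete nothing
    D_new = apply_update(D_old, U)      # the new state D^U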

    2.2 Integrity

We are going to formalize basic notions of database integrity.

2.2.1 Syntax

An integrity constraint (in short, constraint) is a closed first-order predicate logic formula. As usual, constraints are represented either in denial form or in prenex conjunctive
normal form (PCNF), i.e., formulas of the form I = Q I′, where Q is a sequence of quantified variables Q1 x1 ... Qn xn, each Qi is either ∀ or ∃, and the so-called matrix I′ is a conjunction of disjunctions of literals.

A variable x in a constraint I is called a global variable in I if x is ∀-quantified and ∃ does not occur left of ∀x in the PCNF of I. Let Glb(I) denote the set of global variables in I.

    An integrity theory is a finite set of integrity constraints.

    2.2.2 Semantics

We use true and false also to denote truth values. We only consider databases that have a two-valued semantics, given by a unique standard model, e.g., stratified databases with the stable model semantics [3]. That also determines the semantics of integrity, as follows:

Let I be a constraint, IC an integrity theory, and D a database. We write D(I) = true (resp., D(IC) = true) and say that I (resp., IC) is satisfied in D, if I (resp., each constraint in IC) is true in D. Else, we write D(I) = false (resp., D(IC) = false) and say that I (resp., IC) is violated in D.

In the literature, the semantics of integrity is not always defined by the truth or the falsity of constraints, as in the preceding definition. For instance, the consistency view in [4] defines satisfaction not by truth, but by satisfiability. The theoremhood view in [5] defines that a constraint is violated if it is not true, which does not necessarily mean that it is false, e.g., in the completion of databases with predicates defined by recurring on themselves. The preceding definition avoids such incongruences, as long as only databases with a unique two-valued model are considered.

    2.2.3 Soundness and Completeness

Each integrity checking method M can be formalized as a function that takes as input a database, an integrity theory, and an update, and outputs either sat or vio. Computing this function is usually much more efficient than the brute-force method, henceforth denoted by M_bf, which exhaustively evaluates all constraints upon each update.
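A hypothetical sketch of the brute-force method M_bf, reusing apply_update from the sketch above and modeling constraints as Python predicates over the set of facts (this encoding is an illustrative assumption, not the paper's formal machinery):

    def brute_force_check(D, IC, U):
        """Re-evaluate every constraint from scratch in the new state D^U."""
        D_new = apply_update(D, U)
        return "sat" if all(I(D_new) for I in IC) else "vio"

    def no_key_conflict(db):
        """Sample denial: no two p-facts with the same first and different second argument."""
        facts = [t[1:] for t in db if t[0] == "p"]
        return not any(x == u and y != v for (x, y) in facts for (u, v) in facts)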

The soundness and completeness of integrity checking methods can now be generically defined as follows:

Definition 2.1 (Sound and complete integrity checking). An integrity checking method M is called sound or, resp., complete, if, for each database D, each integrity theory IC such that D(IC) = true, and each update U, (1) or, resp., (2) holds:

If M(D, IC, U) = sat, then D^U(IC) = true.  (1)

If D^U(IC) = true, then M(D, IC, U) = sat.  (2)

Definition 2.1 only states soundness and completeness properties for the output sat of M. Symmetrically, soundness and completeness properties for the output vio could be defined. We refrain from doing so, since, under the additional condition that M terminates, it is easy to show that soundness and completeness for sat is equivalent to completeness and, resp., soundness for vio.

Soundness, completeness, and termination have been shown for the methods in [6], [5], [4], [7], and others. Other methods (e.g., [8], [9]) are only shown to be sound. Thus, they provide sufficient but not necessary conditions for guaranteeing integrity.

    2.2.4 Simplifications

Most methods for efficient integrity checking attempt to simplify the constraints that are potentially violated by an update U, so that computing M(D, IC, U) becomes more efficient than querying all constraints by brute force.

Example 2.1. Let p(ISBN, TITLE) be a relation with predicate p about published books, and I the constraint

← p(x, y) ∧ p(x, z) ∧ y ≠ z.

I states that no two books with the same ISBN may have different titles. Let U be an update that inserts p(i, t). For any database D, most methods M compute M(D, {I}, U) by evaluating D^U(I′), where I′ is the simplified constraint ← p(i, y) ∧ y ≠ t. It states that no book with ISBN i may have a title different from t. If M is sound (and complete), the new state D^U is guaranteed to satisfy I if (and, resp., only if) D^U(I′) = true.

Such simplifications typically yield major gains in efficiency, as can be seen by comparing I and I′ in Example 2.1. For any given update pattern, simplifications can be generated without depending on any database state, but only on the schema and the integrity theory. Thus, database performance is not affected, since simplifications can be anticipated ahead of update time. For instance, take i and t in Example 2.1 as placeholders for actual ISBNs and titles. For the insertion of a concrete fact, e.g., p(17, abc), the values 17 and abc replace i and, resp., t in I′. The cost of checking the resulting simplification then is that of a table lookup, while the brute-force evaluation of I would be quadratic in the size of the extension of p (if no index is used).
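A sketch of the simplified check of Example 2.1, under the same illustrative tuple encoding as above (the helper name is an assumption): when p(i, t) is inserted, only tuples with the same first argument i need to be inspected, instead of joining p with itself.

    def simplified_check(D, i, t):
        """Evaluate the simplification <- p(i, y) & y != t in the new state."""
        D_new = D | {("p", i, t)}
        violated = any(f[0] == "p" and f[1] == i and f[2] != t for f in D_new)
        return "vio" if violated else "sat"

    # Inserting p(17, "abc") only requires a lookup of p-facts with ISBN 17.
    print(simplified_check({("p", 1, "a"), ("p", 1, "b")}, 17, "abc"))   # sat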

    3 INCONSISTENCY TOLERANCE

The motivation behind this paper is the need for methods that are capable of checking constraints without insisting on total integrity satisfaction. No method has ever been defined without requiring total integrity, which was thought of as indispensable. However, inconsistencies often are unavoidable, or even useful (e.g., for diagnosis, or for mining fraudulent data). Thus, extant cases of violated constraints should be tolerable. Nonetheless, integrity checking should prevent that any new cases of integrity violation are introduced. That is captured by the definitions in Section 3.1.

3.1 The Main Definitions

The goal of this section is to characterize methods that can tolerate extant cases of constraint violation in databases. For attaining that goal, we first formalize what we mean by case.

Definition 3.1 (Case). Let I be a constraint and σ a substitution. Iσ is called a case of I if Dom(σ) ⊆ Glb(I); it is a basic case if Glb(Iσ) = ∅. For a database D and an integrity theory IC, let S(D, IC) denote the set of all cases C of all constraints in IC such that D(C) = true.

Example 3.1. Consider two relations with predicates r, s, and the foreign key constraint I = ∀x, y ∃z (s(x, y) → r(x, z)) on the first argument of s, which references a primary key, the first argument of r. The global variables of I are x and
y. For a fact s(a, b) to be inserted, integrity checking usually focuses on the basic case ∃z (s(a, b) → r(a, z)) of I. It requires the existence of a fact in r whose primary key value matches the foreign key value of the inserted fact. Other cases are ignored.
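A small Python sketch of this focus on a single case, under the same illustrative tuple encoding as above (helper name and sample data are assumptions): when s(a, b) is inserted, only the basic case "exists z: r(a, z)" is evaluated; all other cases, satisfied or violated, are left untouched.

    def foreign_key_case_ok(D_new, a):
        """Check exists z . r(a, z) in the new state."""
        return any(f[0] == "r" and f[1] == a for f in D_new)

    D = {("r", "a", 1), ("s", "c", 2)}                       # assumed sample data
    print(foreign_key_case_ok(D | {("s", "a", "b")}, "a"))   # True: the case holds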

    Example 3.1 illustrates the crucial role of cases for ITIC.

Intuitively, at most those cases whose global variables match the values of the update need to be checked. All other cases can be ignored, even if they are violated. Most methods in the literature and in practice work that way: they focus on cases that may be violated by updated facts while ignoring extant violations. However, the traditional theory of simplified integrity checking, anchored in Definition 2.1, does not reflect that focus. Rather, it coarsely treats each constraint I as either satisfied or violated. It does not consider that, e.g., only a few, tolerable cases of I may be violated, while all others are satisfied. The following definition does:

Definition 3.2 (Inconsistency-tolerant integrity checking). An integrity checking method M is sound or, resp., complete wrt. inconsistency tolerance if, for each database D, each integrity theory IC, and each update U, (3) or, resp., (4) holds:

If M(D, IC, U) = sat, then S(D, IC) ⊆ S(D^U, IC).  (3)

If S(D, IC) ⊆ S(D^U, IC), then M(D, IC, U) = sat.  (4)

As opposed to the traditional Definition 2.1, Definition 3.2 does not require total integrity, i.e., it may well be that D(IC) = false. However, in both definitions, the same function M(D, IC, U) is used for integrity checking. Thus, no new method needs to be invented for achieving inconsistency tolerance. Rather, any traditional method can be employed, if it complies with (3).

Example 3.2. Let I be as in Example 2.1. Let D consist of p(1, a) and p(1, b). Clearly, D(I) = false. Let U = {insert p(2, c)}. The simplification ← p(2, y) ∧ y ≠ c, as obtained in Example 2.1, is true in D^U. Each method M that evaluates this simplification outputs sat, i.e., U is accepted because it does not introduce any violation of integrity. Thus, M guarantees that all cases of I that were satisfied in D remain satisfied in D^U, while tolerating the inconsistency of violated cases of I.

Several nontrivial examples and counterexamples for Definition 3.2 are featured in Section 4. A trivial example of a method that is sound wrt. inconsistency tolerance is M_bf. However, M_bf is not complete wrt. inconsistency tolerance, as shown below.

Example 3.3. Let D, I, and U be as in Example 3.2. Clearly, M_bf(D, {I}, U) = vio, but the only violated basic case of I in D^U, ← p(1, a) ∧ p(1, b) ∧ a ≠ b, was already violated in D.
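A sketch of Examples 3.2 and 3.3 with the illustrative helpers defined earlier (apply_update, simplified_check, brute_force_check, no_key_conflict): the simplified check accepts insert p(2, c) although D already violates I, whereas the brute-force method reports vio for the same update.

    D = {("p", 1, "a"), ("p", 1, "b")}                    # D(I) = false
    U = (set(), {("p", 2, "c")})                          # insert p(2, c)
    print(simplified_check(D, 2, "c"))                    # "sat": no new violated case
    print(brute_force_check(D, [no_key_conflict], U))     # "vio": M_bf is not complete
                                                          # wrt. inconsistency tolerance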

Theorem 1 below states that ITIC generalizes the traditional approach, which insists on total integrity. The generalization is proper, i.e., some, but not all, methods are inconsistency-tolerant, as we shall see in Section 4.4.

Theorem 1. Let M be a method for integrity checking. Then, for each database D, each integrity theory IC such that D(IC) = true, and each update U, the implications (3) ⇒ (1) and (4) ⇒ (2) hold.

Proof. To show (3) ⇒ (1), note that D(IC) = true entails IC ⊆ S(D, IC). Hence, if (3) holds and M(D, IC, U) = sat, the conclusion of (3) entails D^U(C) = true for each C ∈ IC. Hence, (1) follows. Similarly, (4) ⇒ (2) can be shown. □

Theorem 1 entails that relaxing traditional integrity checking (which requires total integrity) to ITIC causes no loss of efficiency and no extra cost at all. On the other hand, the gains are immense: with an inconsistency-tolerant method, database operations can proceed even in the presence of (obvious or hidden, known or unknown) violations of integrity. As opposed to that, integrity checking was traditionally not legitimate in the presence of constraint violations, i.e., it had to wait until integrity was repaired.

Example 3.4 below illustrates another advantage of ITIC: each update that does not introduce any new case of integrity violation can be accepted, while extant violated cases may disappear intentionally or accidentally.

Example 3.4. Let D and I be as in Example 3.2, and let U = {delete p(1, b)}. Since D(I) = false, no method that insists on total integrity is in the position to check this update; however, each inconsistency-tolerant method is. Each method that is complete wrt. inconsistency tolerance (and, in fact, each method assessed in Section 4) returns sat for this example, since U does not introduce any new violation of I. Since U even repairs a violated constraint, there are several reasons to accept this update, and the lack of total integrity is no reason to reject it.

    3.2 Sufficient and Necessary Conditions

We are now going to discuss conditions that will be used for assessing the inconsistency tolerance of methods in Section 4. Conditions (5) and (6) below are sufficient for soundness (3) and, resp., completeness (4), as shown in Theorem 2. Later, (8), which is also interesting on its own, is shown to be necessary for (4):

If M(D, IC, U) = sat,
then, for each C ∈ S(D, IC), M(D, {C}, U) = sat.  (5)

If, for each C ∈ S(D, IC), M(D, {C}, U) = sat,
then M(D, IC, U) = sat.  (6)

Theorem 2. Let M be a sound method for integrity checking. Then, for each database D and each integrity theory IC, the implications (5) ⇒ (3) and (6) ⇒ (4) hold.

Proof. By applying (1), the then part of (5) becomes

for each C ∈ S(D, IC), D^U(C) = true,  (7)

which is the same as S(D, IC) ⊆ S(D^U, IC), hence the thesis. Similarly, applying (1) on the if part of (6) yields (4). □

In Section 4, Condition (5) is verified for the methods in [6], [5], [4], and (6) is verified for [6]. Interestingly, we are
going to see that many other methods turn out not to fulfill (4), since, e.g., they may output vio whenever an update yields a redundant new path for deriving some already violated case. So, if the update causes no other violation of integrity, the premise of (4) holds, but its conclusion does not. In other words, the output vio of methods that are sound but incomplete wrt. inconsistency tolerance does not guarantee that the given update would violate a case of some constraint that was satisfied in the old state. However, the following, somewhat weaker property holds for several methods:

Definition 3.3 (Weakly complete inconsistency tolerance). Let M be a method for integrity checking. M is called weakly complete wrt. inconsistency tolerance if, for each database D, each integrity theory IC, and each update U, the following holds:

If D^U(IC) = true, then M(D, IC, U) = sat.  (8)

The technical difference between (2) and (8) is that, for (2), total integrity of the old state is required, but not for (8).

In practice, weak completeness wrt. inconsistency tolerance is a desirable property: the output vio of any sound but incomplete integrity checking method means that further checking is needed for deciding if the update preserves or violates integrity. However, the contraposition of (8) ensures that, if a weakly complete method outputs vio, integrity surely is violated after the update, i.e., no further checking is needed. In fact, it is easy to show the following direct consequences of Definitions 2.1, 3.2, and 3.3:

Corollary 3. Let M be a method for integrity checking.

1. If M is complete wrt. inconsistency tolerance, then it is also weakly complete wrt. inconsistency tolerance ((4) ⇒ (8)).
2. If M is weakly complete wrt. inconsistency tolerance, then it is also a complete integrity checking method ((8) ⇒ (2)).

    4 ASSESSMENT OF INTEGRITY CHECKING METHODS

As seen in Section 3, the differences between traditional integrity checking and ITIC are quite subtle. However, it would be wrong to think that inconsistency tolerance comes for free or is marginal. In this section, we assess five methods to determine whether they are or are not inconsistency-tolerant, spanning from the seminal work by Nicolas [6] to more recent ones.

The result that many, though not all, well-known methods are inconsistency-tolerant is of utmost practical significance, since each simplification method has hitherto been believed to be incapacitated, hence useless, in the presence of inconsistency. To show that several methods continue to function well even if integrity is violated thus breaks radically with all expectations. Without this result, there would be no justification at all for using integrity checking methods in inconsistent databases.

We chose the methods of [6], [5], [4] due to their impact on subsequent works. In particular, [6] initiated and popularized the notion of simplification. Its extensions in [5] and [4] have generalized integrity checking to datalog. A lot more extensions have appeared. Since it is unfeasible to discuss them all, we have chosen just two more methods. (Others are analyzed in [10].) One is from the 1990s [8]. It excels for constraints that lend themselves to optimizations related to query containment [11]. The other is from the 2000s [7]. It generates provably optimal simplifications, and generalizes previous methods that evaluate their simplifications in the old (instead of the new) state. Thus, costly rollbacks of updates that violate integrity are avoided. Table 1 summarizes the properties of the methods assessed in this section.

4.1 The Method of Nicolas

We are going to show that the well-known method for integrity checking by Nicolas [6], henceforth denoted by M_N, is sound and complete wrt. inconsistency tolerance.

We adopt the notation Γ(f, I) from [6]. For a database D, a constraint I = Q̃ I′ in PCNF, and a fact f to be inserted, M_N generates the simplification

Γ(f, I) = Q̃ (I′θ1 ∧ ... ∧ I′θm)   (m ≥ 0),  (9)

where the θi are unifiers, restricted to Glb(I), of f and the m different occurrences of negated atoms in I unifying with f. Then, each occurrence of f in Γ(f, I) is replaced by true in Γ(f, I), which is then further simplified by standard rewritings.

Symmetrically, for a fact f to be deleted, a simplification obtained by instantiating I with restricted unifiers of f and nonnegated occurrences of matches of f is generated. For simplicity, we only deal with insertions here; result and proof wrt. deletions are symmetrical.

Under the total integrity premise D(I) = true, the simplification theorem in [6] states that D^U(I) = true iff D^U(Γ(f, I)) = true.

Example 4.1. Let I and U be as in Example 2.1. A PCNF of I is

∀x ∀y ∀z (¬p(x, y) ∨ ¬p(x, z) ∨ y = z).

Clearly, p(i, t) unifies with two atoms in I, by unifiers {x/i, y/t} and {x/i, z/t}. The simplification Γ(p(i, t), I) returned by M_N is ∀x ∀y ∀z ((¬p(i, t) ∨ ¬p(i, z) ∨ t = z) ∧ (¬p(i, y) ∨ ¬p(i, t) ∨ y = t)). Since the two conjuncts are obviously equivalent, one of them can be dropped, yielding ∀x ∀y (¬p(i, y) ∨ ¬p(i, t) ∨ y = t). Then, replacing p(i, t) by true and dropping the corresponding disjunct yields the same simplification as in Example 2.1.

It is worth noting that each simplification step above is believed to be valid in [6] (and in fact in all the rest of the literature on integrity checking methods) only if I is satisfied in the old state. Theorem 4 rebuts this belief by confirming that the simplifications are valid also if I is violated in the old state.

[Table 1: Properties of Integrity Checking Methods]

Theorem 4 (Inconsistency tolerance of M_N). M_N is sound and complete wrt. inconsistency tolerance in relational databases.

Proof. Let D be a relational database, IC an integrity theory, and U an update. For the proof below, we assume that IC is a singleton and U = insert f, for some fact f ∉ D. A symmetric proof for deletions and straightforward extensions to multiple updates and constraints are omitted. In part 1), we show soundness; in 2), completeness.

1) Soundness: Let I = Q̃ I′ be an integrity constraint in PCNF with matrix I′, Γ(f, I) the simplification of M_N for I and U, σ a substitution such that Dom(σ) ⊆ Glb(I), and D(Iσ) = true. We have to show that D^U(Iσ) = true if D^U(Γ(f, I)) = true. According to Theorem 2, it suffices to show:

If D^U(Γ(f, I)) = true, then D^U(Γ(f, Iσ)) = true,

where Γ(f, Iσ) is the simplification of Iσ by M_N. For each conjunct J in Γ(f, Iσ), there is a negated atom ¬g in I such that gθ = gσθ′ = f, where θ and θ′ are the substitutions used to compute Γ(f, I) and, resp., Γ(f, Iσ). Thus, J = I′σθ′ is an instance of the conjunct I′θ of Γ(f, I). Hence, Γ(f, I) entails Γ(f, Iσ).

2) Completeness: We show the following contrapositive claim:

If M_N(D, I, U) = vio, then there is a case C of I
such that D(C) = true and D^U(C) = false.  (10)

Assume that M_N(D, I, U) = vio. We distinguish D(I) = true and D(I) = false. If D(I) = true, then D^U(I) = false, since M_N is complete for integrity checking. Hence, (10) follows.

Now, let D(I) = false. Since M_N reports a violation, there is a conjunct I′θ in Γ(f, I) such that C = Q̃ I′θ is violated in D^U, where θ is a unifier of f and a negative literal in I. Thus, one of the disjuncts in I′θ is a negated occurrence of f. Since f ∉ D, D(C) = true holds. Hence, (10) follows. □

    4.2 The Method of Lloyd, Sonenberg, and Topor

We are going to show that the integrity checking method in [5], here denoted by M_LST, is sound and weakly complete wrt. inconsistency tolerance.

In [5], two sets pos(D, D′) and neg(D, D′) are defined that capture the difference between any two databases D and D′ such that D ⊆ D′. The sets consist of atoms that either are the head of some clause in the update U = D′ \ D or the head of a clause in D that is possibly affected by reasoning forward from clauses in U. In particular, pos(D, D′) captures a superset of the facts that are actually inserted, i.e., provable after the update but not before, and neg(D, D′) a superset of the facts that are actually deleted, i.e., provable before but not after the update.

Let D be a stratified database and U an update that preserves stratification. Applying the deletions in U to D leads to an intermediate state D″. Then, applying the insertions in U to D″ leads to the updated state D′ = D^U. It is shown in [5] that pos(D″, D′) ∪ neg(D″, D) captures a superset of the facts that are actually inserted, and neg(D″, D′) ∪ pos(D″, D) captures a superset of the facts that are actually deleted by U.

Thus, the principles for identifying all relevant, i.e., potentially violated, constraints, as established in [6], apply as follows: only those atoms in pos(D″, D′) ∪ neg(D″, D) that unify with the atom of a negative literal in I by some mgu θ capture a possibly inserted fact that may violate integrity. It is checked by evaluating Q̃ I′θ′, where θ′ is the restriction of θ to Glb(I), and I′ is the matrix of I. Symmetrically, only those atoms in neg(D″, D′) ∪ pos(D″, D) that unify with the atom of a positive literal in I by some mgu θ capture a possibly deleted fact that may violate integrity. It is then checked by evaluating the case Q̃ I′θ′ of I, where θ′ is defined as above.

Let Θ(I, D, U) denote the set of all such substitutions θ′ for identifying relevant constraints. Assuming the total integrity premise D(I) = true, the simplification theorem in [5] states that, for any stratified database D and update U preserving stratification, D^U(I) = true iff, for all θ ∈ Θ(I, D, U), D^U(Q̃ I′θ) = true.

Example 4.2. Let D consist of the four clauses

p(x, y) ← q(x) ∧ r(y),    r(b),
p(x, y) ← s(y, x),        s(b, a),

I = ← p(x, a), and U = {insert q(a)}. Clearly, D(I) = true. M_LST generates Θ(I, D, U) = {{x/a}}, indicating that some fact matching p(a, y) may violate I. Thus, the simplification to be evaluated is ← p(a, a). Hence, M_LST(D, {I}, U) = sat. By the soundness of M_LST, D^U(I) = true follows.

Note that M_LST(D, {I}, U) = sat also if s(a, b) ∈ D. This highlights the inconsistency tolerance of M_LST, as stated below.

Theorem 5 (Inconsistency tolerance of M_LST). M_LST is sound wrt. inconsistency tolerance in stratified databases.

Proof. Let D be a stratified database, I = Q̃ I′ a constraint in PCNF with matrix I′, Iσ a case of I, and U an update preserving stratification. Assume D(Iσ) = true. We have to show that D^U(Iσ) = true if D^U(Q̃ I′θ) = true for all θ ∈ Θ(I, D, U). That follows from (5) and Lemma 4.1. □

Lemma 4.1. D^U(Q̃ I′σν) = true for all ν ∈ Θ(Iσ, D, U) if D^U(Q̃ I′θ) = true for all θ ∈ Θ(I, D, U).

    This lemma is a direct consequence of the following one:

Lemma 4.2. For each substitution ν ∈ Θ(Iσ, D, U), there is a substitution θ ∈ Θ(I, D, U) that is more general than σν.

Proof. Each ν ∈ Θ(Iσ, D, U) either originates from a potentially inserted atom A that unifies with a negated atom A′σ in Iσ, or from a potentially deleted atom B that unifies with a nonnegated atom B′σ in Iσ. Since Iσ is a case of I, there is a negated atom A′ (or a nonnegated atom B′, resp.) in I of which the corresponding atom in Iσ is an instance. Thus, A (B, resp.) a fortiori unifies with A′ (B′, resp.), by some mgu θ that is more general than σν. □

The method M_LST is not complete wrt. inconsistency tolerance, as shown by the following counterexample:

Example 4.3. For the same D, I, and U as in Example 4.2, let D* = D ∪ {s(a, a)}. Clearly, ← p(a, a) is a violated case in D*; all other basic cases of I are satisfied in D*. Although
U does not introduce any new violated case in D*^U, M_LST still generates the simplification ← p(a, a). Thus, M_LST(D*, {I}, U) = vio. Hence, by Definition 3.2, M_LST is not complete wrt. inconsistency tolerance. Yet, we have the following result:

Theorem 6 (Weak completeness of M_LST). M_LST is weakly complete wrt. inconsistency tolerance in stratified databases.

Proof. Let D be a stratified database, IC an integrity theory, and U an update. We have to show that D^U(IC) = true entails M_LST(D, IC, U) = sat. If D^U(IC) = true, then each case Q̃ I′θ of each constraint I in IC is true in D^U. Since each simplification of I checked by M_LST is a case of I, it follows that M_LST(D, IC, U) = sat. □

    4.3 The Method of Sadri and Kowalski

We are going to show that the integrity checking method in [4], here denoted by M_SK, is sound and weakly complete wrt. inconsistency tolerance.

Roughly, M_SK works as follows: each integrity theory IC is a set of denials. Each update U may include denials. Denials to be deleted cannot violate integrity and, thus, are simply dropped from IC: let IC′ = IC \ {I : delete I ∈ U}. Denials to be inserted are queried in the new state. If any of them is refuted, M_SK outputs vio. Else, for checking if any other clause in U would cause integrity violation, M_SK computes an update U′, consisting of all clauses in U to be inserted and all ground negative literals ¬H such that H is true in D and false in D^U. For each T ∈ U′, M_SK builds a resolution tree rooted at T, using input clauses from D^U ∪ IC′. For each derivation δ in the tree, each step taken in δ is either a standard backward-reasoning step, or a forward-reasoning step from the literal selected in the head of T or of any clause derived from T by previous steps in δ. In forward steps, the selected literal is resolved with a matching literal in the body of some input clause. If any such derivation yields a refutation, M_SK outputs vio. If the tree is finitely failed, M_SK outputs sat.

Theorem 7 (Inconsistency tolerance of M_SK). M_SK is sound wrt. inconsistency tolerance in stratified databases.

Proof. Let D be a stratified database, IC an integrity theory, and U an update such that M_SK(D, IC, U) = sat. Further, let IC′ and U′ be as described above. We have to show that, for each C ∈ S(D, IC), D^U(C) = true. By Theorem 2, it suffices to verify (5), i.e., that for each C′ ∈ U′, M_SK builds a finitely failed tree T_C, rooted at C′, with input from D^U ∪ {C}.

Let C ∈ S(D, IC), i.e., for some I ∈ IC, C = Iσ, for some substitution σ of Glb(I). Further, let C′ ∈ U′. Since M_SK(D, IC, U) = sat, there is a finitely failed tree T in the search space of M_SK, rooted at C′, built with input clauses from D^U ∪ IC′. From T, T_C is obtained as follows.

Each derivation δ in T is replaced, if possible, by the following derivation δ′ in T_C. It starts from the same root as δ. For each i, 0 ≤ i < n, where n is the length of δ, the (i+1)th resolvent of δ′ is obtained as follows: suppose the jth literal of the ith clause of δ is selected. Then, also the jth literal in the ith clause of δ′ is selected. If the (i+1)th input clause of δ is I, then C is used as input clause in δ′; if the kth literal is selected in I, then also the kth literal is selected in C. Otherwise, the (i+1)th input clause of δ is also used in δ′, for obtaining the (i+1)th resolvent of δ′. Clearly, the latter is of the form C_{i+1}σ_{i+1}, for some substitution σ_{i+1}, where C_{i+1} is the (i+1)th resolvent in δ.

At any step of δ′, it may be impossible to continue its construction by using the input clause corresponding to the one used in δ: either the selected literal does not match the selected literal in the corresponding input clause in δ, or the latter is a denial, which cannot be used as input for T_C. In both cases, δ′ is discontinued, i.e., δ′ then terminates with failure.

It is easy to see that T_C is the required finitely failed tree. □

We illustrate the inconsistency tolerance of M_SK with an example inspired by a similar one in [12].

Example 4.4. Let D be a database consisting of clauses 1-5 in Fig. 1, defining the predicates r (regular), p (pays taxes), q (quitted), and s (signed). The integrity constraint I denies the possibility to have regular status and to have quitted work at the same time. The update U inserts a clause stating that persons signed by a tax payer also have regular status.

Clearly, D(I) = false, since r(ada) and q(ada) are true in D. The case I′ = ← r(bob) ∧ q(bob) of I, however, is satisfied in D since q(bob) is not true in D.

From the root U, M_SK builds the tree as shown in Fig. 1 (selected literals are underlined). Since this tree is finitely failed, it follows that U will not introduce new cases of inconsistency: all cases of integrity constraints that were satisfied in D remain satisfied in D^U. In particular, I′ is also satisfied in D^U.

The method M_SK is not complete wrt. inconsistency tolerance, as shown by the following counterexample:

Example 4.5. Let D, U, and IC be as in Example 4.3, and D** = D* ∪ {r(a)}. The only violated basic case in D** is ← p(a, a), and U does not introduce any additional one. However, starting from U, M_SK derives p(a, y) ← r(y), which it then refutes by two more steps that resolve the literals in its head and body against the constraint ← p(x, a) and the fact r(a). Thus, M_SK is not complete wrt. inconsistency tolerance. Yet, we have the following result:

[Fig. 1: Clauses and derivation tree of Example 4.4]

Theorem 8 (Weak completeness of M_SK). M_SK is weakly complete wrt. inconsistency tolerance in stratified databases.

Proof. Let D be a stratified database, IC an integrity theory, and U an update for which M_SK terminates. We have to show that D^U(IC) = true entails M_SK(D, IC, U) = sat. Suppose that M_SK(D, IC, U) = vio. Thus, by definition of M_SK, there is a refutation rooted at some clause in U with input clauses from D^U plus a denial clause in IC. Hence, integrity is violated in D^U. However, this contradicts the supposition above. Since M_SK terminates, the result follows. □

    4.4 The Method of Gupta et al.

Not all methods are inconsistency-tolerant. An example is the well-known method in [8], here denoted by M_G. The constraints considered in [8] are of the form

← L ∧ R1 ∧ ... ∧ Rn ∧ E1 ∧ ... ∧ Ek,  (11)

where L is a literal with a local, i.e., accessible, predicate; the Ri are literals with remote predicates that are not accessible for integrity checking; the Ej are evaluable literals with arithmetic expressions. Updates considered in [8] are insertions of facts into the relation of L. (In fact, the method also works if L is a conjunction of literals.) For convenience, let l be L's predicate.

The main result, Theorem 5.2 in [8], refers to a simplification called reduction. For a constraint I of the form (11) and a fact f inserted in l, the reduction RED(f, L, I) is essentially the corresponding simplification in M_N. To check if I is satisfied after inserting f, M_G checks if, for facts g in the extension of l,

RED(f, L, I) ⊑ ⋃_{g in l} RED(g, L, I)

holds, where ⊑ denotes query containment.

Example 4.6. The constraint I = ← l(x, y) ∧ r(z) ∧ x ≤ z ≤ y requires that no z in r may occur in an interval whose ends are specified by l. Suppose D = {l(3, 6), l(5, 10)} and U inserts l(4, 8). Then, D^U(I) = true is inferred by M_G from

r(z) ∧ 4 ≤ z ≤ 8 ⊑ (r(z) ∧ 3 ≤ z ≤ 6) ∪ (r(z) ∧ 5 ≤ z ≤ 10),

which essentially expresses that [4, 8] is contained in [3, 10].
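A sketch of the containment reasoning behind Example 4.6 (an illustrative helper written for this transcript, not the actual reduction machinery of [8]): the interval of the inserted l-fact must be covered by the union of the intervals already stored in l.

    def interval_covered(new_lo, new_hi, intervals):
        """Check that [new_lo, new_hi] lies inside the union of the given intervals."""
        reach = new_lo
        for lo, hi in sorted(intervals):
            if lo > reach:
                return False            # a gap before the target is covered
            reach = max(reach, hi)
            if reach >= new_hi:
                return True
        return reach >= new_hi

    print(interval_covered(4, 8, [(3, 6), (5, 10)]))   # True: [4, 8] is inside [3, 10]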

Example 4.7 shows that M_G is not inconsistency-tolerant.

Example 4.7. Consider D = {l(3, 6), l(5, 10), r(7)}, the case I′ = ← l(4, 8) ∧ r(z) ∧ 4 ≤ z ≤ 8 of I, and also U as in Example 4.6. Clearly, D(I) = false, while D(I′) = true. Assuming the total integrity premise, M_G guarantees, as in Example 4.6, that U does not violate integrity, i.e., M_G(D, {I}, U) = sat. However, D^U(I′) = false. Thus M_G is not sound wrt. inconsistency tolerance.

As reported in [8], M_G cannot be complete for integrity checking, since constraints may involve inaccessible remote data. Thus, by Theorem 1, M_G is not complete wrt. inconsistency tolerance either, nor is it weakly complete, by part 2 of Corollary 3.

    4.5 The Method of Christiansen and Martinenghi

For an integrity theory IC and an update U, the method in [7], here denoted by M_CM, consists of the following two steps:

• First, a presimplification of IC for U, denoted After_U(IC), is computed, as described in Definition 4.3 below, such that D(After_U(IC)) = D^U(IC), for every database D.

• Second, After_U(IC) is optimized by removing from it all denials and literals that can be proved to be redundant, assuming that the total integrity premise, i.e., D(IC) = true, holds. The result is denoted Optimize_IC(After_U(IC)).

To run M_CM(D, IC, U) is to compute the simplification Optimize_IC(After_U(IC)) and evaluate it in D.

Definition 4.3. Let IC be an integrity theory and U an update, and let After_U(IC) be obtained from IC by simultaneously replacing each atom of the form p(t̃) by (p(t̃) ∧ t̃ ≠ b̃1 ∧ ... ∧ t̃ ≠ b̃m) ∨ t̃ = ã1 ∨ ... ∨ t̃ = ãn, where p(ã1), ..., p(ãn) are all facts inserted to p and p(b̃1), ..., p(b̃m) are all facts deleted from p by U.

By using De Morgan's laws, After_U(IC) is represented as a set of denials. Then, Optimize_IC(After_U(IC)) is obtained, as specified in Definition 4.4, by a terminating proof procedure, denoted below by ⊢, that is substitutive (see footnote 1), i.e., if F ⊢ F′ then Fσ ⊢ F′σ, for each pair F, F′ of sets of formulas and each substitution σ.

Definition 4.4. Let IC, IC′ be sets of denials, I a denial, K a conjunction of literals, L a literal, and ⊢ a terminating proof procedure. Optimize_IC(IC′) is obtained by exhaustively applying on IC′ the following rewrite rules, where ⇝ symbolizes rewriting and IC″ = IC′ \ {← K ∧ L}:

1. IC′ ⇝ IC″ ∪ {← K} if ← K ∧ L ∈ IC′ and IC ∪ IC′ ⊢ ← K,
2. IC′ ⇝ IC′ \ {I} if I ∈ IC′ and IC ∪ (IC′ \ {I}) ⊢ I.

In [7], it is shown that M_CM is both sound and complete.

Example 4.8. Let I and U be as in Example 3.2. We have

After_U({I}) = { ← p(x, y) ∧ p(x, z) ∧ y ≠ z,
                 ← x = i ∧ y = t ∧ p(x, z) ∧ y ≠ z,
                 ← p(x, y) ∧ x = i ∧ z = t ∧ y ≠ z,
                 ← x = i ∧ y = t ∧ x = i ∧ z = t ∧ y ≠ z }.

Then, Optimize removes the first constraint (subsumed by I), the second (subsumed by the third), and the fourth (a tautology). The simplification returned by M_CM (to be evaluated in the old state) is the third constraint, equivalent to I′ as found in Example 3.2. Then, for each database D, D^U(I) = true iff D(I′) = true.

The M_CM method is not sound wrt. inconsistency tolerance due to the behavior of Optimize, as illustrated in Example 4.9.

(Footnote 1: Substitutivity of ⊢ is assumed implicitly in [7].)

Example 4.9. Let IC = {← p ∧ q, ← p ∧ ¬q, ← p ∧ r(x) ∧ s(x)} be an integrity theory and U = {insert r(a)}. M_CM computes the simplification IC′ = ∅ of IC (i.e., D^U(IC) = true if D(IC) = true), since ← p, derived from IC by ⊢, subsumes all denials in After_U(IC) = {← p ∧ q, ← p ∧ ¬q, ← p ∧ r(x) ∧ s(x), ← p ∧ s(a)}.

Now, let D = {p, s(a)}. Clearly, I = ← p ∧ r(a) ∧ s(a) is a case of the last constraint in IC. We have: D(IC) = false, D(I) = true, and D(IC′) = true. However, D^U(I) = false, which shows that M_CM is not sound wrt. inconsistency tolerance.

One may object that IC above is equivalent to ← p and thus redundant. For IC = {← p}, misleading optimizations would be avoided. In general, however, redundancy is undecidable.

Optimize never harms inconsistency tolerance if IC contains a single constraint, as shown by Theorem 9 below. (More generally, it can be shown that M_CM is sound wrt. inconsistency tolerance if each pair of constraints has no predicate in common.)

Theorem 9 (Inconsistency tolerance of M_CM). For singleton integrity theories, M_CM is sound wrt. inconsistency tolerance in hierarchical databases.

Proof. Let D be a hierarchical database, I a denial, U an update, I′ the simplification of {I} for U obtained by M_CM, and σ a substitution such that Iσ ∈ S(D, {I}). Since M_CM(D, {I}, U) = D(I′), we have to show:

If D(I′) = true, then D^U(Iσ) = true.  (12)

We prove (12) by transitivity of (13) and (14) below, as follows:

If D(I′) = true, then D(I′σ) = true.  (13)

If D(I′σ) = true, then D^U(Iσ) = true.  (14)

Evidently, (13) holds. Note now that, by Definition 4.3, After is substitutive, i.e., After_U(I)σ = After_U(Iσ). Since ⊢ is also substitutive, I′σ is obtained from After_U(Iσ) by the same sequence of Optimize steps as I′ is obtained from After_U(I). Then, (14) holds because, as shown in [7] to prove the soundness of M_CM, the evaluation of the result of After in the old state is a sound integrity checking method, and the application of any of the steps in Optimize preserves soundness. □

The interplay between multiple constraints also causes M_CM to be not complete (not even weakly) wrt. inconsistency tolerance. This is shown by the following counterexample:

Example 4.10. For D = {q(b)}, IC = {← p(a) ∧ q(x), ← q(b)}, and U = {insert p(a), delete q(b)}, we obtain After_U(IC) = {← q(x) ∧ a = a ∧ x ≠ b, ← p(a) ∧ q(x) ∧ x ≠ b, ← q(b) ∧ b ≠ b}. From that, Optimize_IC(After_U(IC)) = {← q(x)} is obtained as follows: first, the third denial in After_U(IC) is dropped, since it is subsumed by the second denial in IC. Then, a = a is dropped in the first denial. That then subsumes the second denial, which is thus removed. Last, x ≠ b is dropped from the remaining denial ← q(x) ∧ x ≠ b, since ← q(x) can be proved from ← q(x) ∧ x ≠ b and ← q(b) in IC. Thus, we have that D^U(IC) = true and M_CM(D, IC, U) = D(Optimize_IC(After_U(IC))) = false, so neither (4) nor (6) holds for M_CM.

    5 APPLICATIONS

Inconsistency-tolerant integrity checking can improve solutions to several problems of database management. We show that for update requests in Section 5.1, for repairs in Section 5.2, for schema evolution in Section 5.3, for reliable risk management in Section 5.4, and for unsatisfiability handling in Section 5.5.

Updates are a cornerstone of each database management application addressed in this section. Each update U is required to preserve the satisfaction of a given integrity theory IC. Traditionally, integrity preservation has meant that U maps a state D such that D(IC) = true to a state D^U such that D^U(IC) = true. But as soon as constraint violations in D become tolerable, the notion of integrity preservation must be generalized as follows:

Definition 5.1. For a database D and an integrity theory IC, an update U is said to preserve integrity if S(D, IC) ⊆ S(D^U, IC).
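A sketch of Definition 5.1 restricted to ground cases, under the illustrative encoding used earlier: cases are modeled as Python predicates over the set of facts, and an update preserves integrity if every case satisfied in the old state is still satisfied in the new state. This simplifies the paper's notion of cases via substitutions of global variables, which is more general.

    def preserves_integrity(D, U, cases):
        """Check that every case satisfied in D is still satisfied in D^U."""
        D_new = apply_update(D, U)
        return all(case(D_new) for case in cases if case(D))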

Note that Definition 5.1 does not require total integrity of D, i.e., U may preserve integrity even if executed in the presence of violated constraints. The following corollary of Definitions 3.2 and 5.1 states that updates can be checked for preserving integrity by any method that is sound wrt. inconsistency tolerance:

Corollary 10. For a database D, an integrity theory IC, and an inconsistency-tolerant integrity checking method M, an update U preserves integrity if M(D, IC, U) = sat.

In general, the only-if version of Corollary 10 does not hold, as shown in Example 5.1. It is easily seen that it does hold for methods that are complete wrt. inconsistency tolerance.

Example 5.1. Let p be defined by p(x, y) ← s(x, y, z) and p(x, y) ← q(x) ∧ r(x, y) in a database D in which q(a) and r(a, a) are the only facts that contribute to the natural join of q and r. Further, let IC = {← p(x, x)} and U = {insert s(a, a, b)}. Clearly, U preserves integrity, since the case C = ← p(a, a) is already violated in D. However, the inconsistency-tolerant methods M_LST and M_SK, and others, generate and evaluate the simplification ← p(a, a) of ← p(x, x) and thus output vio.

5.1 Inconsistency-Tolerant Satisfaction of Update Requests

We define an update request as a closed first-order formula intended to be made true by some integrity-preserving update. For a database D, an update U is said to satisfy an update request R if D^U(R) = true and U preserves integrity. View update requests are a common variant of update requests. An update method is a method to compute updates for satisfying update requests.

Similar to integrity checking, all known update methods have traditionally postulated the total satisfaction of all constraints in the old state. However, that requirement
is as unrealistic for satisfying update requests as for integrity checking. And, in fact, we are going to see that it can be abandoned just as well, for the class of methods defined as follows:

Definition 5.2. An update method UM is inconsistency-tolerant if each update computed by UM preserves integrity.

For an update request R and a database D, many update methods work in two phases. First, an update U such that D^U(R) = true is computed. Then, U is checked for integrity preservation by some integrity checking method. If that check is positive, U is accepted. Else, U is rejected and another update candidate, if any, is computed and checked. Hence, the following corollary follows from Definition 5.2 and Corollary 10:

Corollary 11. Each update method that uses an inconsistency-tolerant method to check its computed updates for preserving integrity is inconsistency-tolerant.

Corollary 11 serves to identify several known update methods as inconsistency-tolerant, since they use inconsistency-tolerant integrity checking methods. Among them are, e.g., the update methods in [13], [14], which use the integrity checking method of [5], shown to be inconsistency-tolerant in Section 4.2.

Another well-known update method, by Kakas and Mancarella, is described in [15]. For convenience, let us name it KM. It does not use any integrity checking method as a separate module, hence Corollary 11 is not applicable. However, the inconsistency tolerance of KM can be tracked down as outlined below.

For satisfying an update request, KM explores a possibly nested search space of abductive derivations and consistency derivations. Roughly, abductive derivations compute hypothetical updates of facts for satisfying a given update request; consistency derivations check these updates for integrity. Each update generated by KM consists of a bipartite set of positive and negative ground literals, corresponding to insertions and, resp., deletions of ground facts. For more details, we refer the reader to [15]. It suffices here to mention that, for KM, all constraints are represented by denials that are used as candidate input clauses in consistency derivations. Each consistency derivation of each update computed by KM corresponds to a finitely failed attempt to refute the update as inconsistent.

It is easy to verify that, for an update request R, each update U computed by KM makes R become true in D^U, even if some constraint is violated in D. What is at stake is the preservation of the satisfaction of each case that is satisfied in D, while cases that are violated in D may remain violated in D^U. The following theorem entails that satisfied cases are preserved by KM:

    Theorem 12. The method KM is inconsistency-tolerant.

Proof. By Definition 5.2, we have to show that each update computed by KM preserves integrity. Suppose that, for some update request in some database D and some integrity theory IC, KM would compute and accept an update U that does not preserve integrity. Then, by Definition 5.1, there is a constraint I in IC and a substitution σ such that D(Iσ) = true and D^U(Iσ) = false. Hence, by the definition of KM, there is a consistency derivation δ rooted at one of the literals in U that uses I as input clause and terminates by deducing the empty clause. That, however, signals that the root of δ causes a violation of I. Thus, KM rejects U, which contradicts the supposition that KM accepts U. □

    Example 5.2 illustrates the usefulness of inconsistency-tolerant update methods.

Example 5.2. Let D = {q(x) ← r(x) ∧ s(x), p(a, a)}, IC = {← p(x, x), ← p(a, y) ∧ q(y)}, and R the update request to make q(a) true. To satisfy R, most update methods compute the candidate update U = {insert r(a), insert s(a)}. To check if U preserves integrity, most integrity checking methods compute the simplification ← p(a, a) of the second constraint in IC. Rather than accessing the p relation for evaluating ← p(a, a), integrity checking methods that are not inconsistency-tolerant may use the invalid total integrity premise that D(IC) = true, by reasoning as follows: the first constraint ← p(x, x) in IC is not affected by U and subsumes ← p(a, a), hence both constraints remain satisfied in D^U. Thus, such methods conclude that U preserves integrity. However, that is wrong, since the case ← p(a, y) ∧ q(y) is satisfied in D but violated in D^U. By contrast, each inconsistency-tolerant update method rejects U and computes the update U′ = U ∪ {delete p(a, a)} for satisfying R. Clearly, U′ preserves integrity. Incidentally, U′ even removes a violated case.

    5.2 Partial Repairs

    Roughly, repairing is to compute updates to databaseswith violated constraints such that the updated databasessatisfy integrity. Based on cases, Definition 5.3 introducespartial repairs. They repair only a fragment of thedatabase.

    Definition 5.3. Let D be a database, IC an integrity theory andS a set of cases of constraints in IC such that DS false.An update U is called a repair of S in D if DUS true ; ifDUIC false, U is also called a partial repair ofIC in D,else U is called a total repair of IC in D.

    Related notions in the literature [1], [16], [17] only dealwith total repairs, additionally requiring them to be

    minimal, in some sense. In [18], null values and a 3-valuedsemantics are used to summarize total repairs.

    Repairing can be costly, if not intractable [19]. Thus, atfirst sight, a good heuristic to curtail inconsistency could beto use partial instead of total repairs, particularly in largedatabases with potentially unknown inconsistencies. How-ever, partial repairs may not preserve integrity, as shown bythe following example:

Example 5.3. Let IC = {← p(x,y,z) ∧ ¬q(x,z), ← q(x,x)} and D = {p(a,b,c), p(b,b,c), p(c,b,c), q(a,c), q(c,c)}. The violated basic cases are ← p(b,b,c) ∧ ¬q(b,c) and ← q(c,c). Repairing {← q(x,x)} by {delete q(c,c)} does not preserve integrity, since ← p(c,b,c) ∧ ¬q(c,c) is satisfied in D but not in D^U. However, the partial repairs {delete p(b,b,c)} and {insert q(b,c)} of IC do preserve integrity. The only subset-minimal total repairs are {delete q(c,c), delete p(b,b,c), delete p(c,b,c)} and {delete q(c,c), insert q(b,c), delete p(c,b,c)}.
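
The difference between these candidate repairs can be checked mechanically. Below is a small sketch (assumed tuple-based facts, not the authors' code) that tests, for each candidate update of Example 5.3, whether the case ← p(c,b,c) ∧ ¬q(c,c), which is satisfied in D, stays satisfied:

    # D and the candidate updates of Example 5.3; facts are plain tuples.
    D = {('p','a','b','c'), ('p','b','b','c'), ('p','c','b','c'),
         ('q','a','c'), ('q','c','c')}

    def case_p_cbc_not_q_cc_satisfied(db):
        # the case <- p(c,b,c) & ~q(c,c) is satisfied iff its body is false
        return not (('p','c','b','c') in db and ('q','c','c') not in db)

    def apply_update(db, delete=(), insert=()):
        return (db - set(delete)) | set(insert)

    # deleting q(c,c) repairs <- q(x,x) but breaks the satisfied case:
    print(case_p_cbc_not_q_cc_satisfied(apply_update(D, delete=[('q','c','c')])))     # False
    # the partial repairs named in Example 5.3 preserve it:
    print(case_p_cbc_not_q_cc_satisfied(apply_update(D, delete=[('p','b','b','c')]))) # True
    print(case_p_cbc_not_q_cc_satisfied(apply_update(D, insert=[('q','b','c')])))     # True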

The dilemma that total repairs may require more update operations than partial repairs, while the latter may not preserve integrity, is relaxed by the following corollary of Corollary 10. It says that it suffices to check partial repairs for integrity preservation by an inconsistency-tolerant method.

Corollary 13. For each inconsistency-tolerant method M, each database D, and each integrity theory IC, each partial repair U of IC such that M(D, IC, U) = sat preserves integrity.

The following example illustrates how integrity-preserving repairs can be computed by inconsistency-tolerant update methods:

Example 5.4. Let D be a database and S = {← B1, . . . , ← Bn} (n ≥ 0) a set of cases of constraints in an integrity theory IC. Thus, D(S) = false if and only if D(Bi) = true for some i. Hence, an integrity-preserving repair of S that tolerates extant violations of cases not in S can be computed by each inconsistency-tolerant update method, by issuing the update request ¬vio_S, where vio_S is defined by the n clauses vio_S ← B1, . . . , vio_S ← Bn. Update methods that are not inconsistency-tolerant cannot be used, since they may accept repairs that do not preserve integrity, as seen in Example 5.2.

    5.3 Inconsistency-Tolerant Schema Evolution

A database schema evolves via schema updates, i.e., removals, additions, or alterations of integrity constraints or of database clauses with nonempty bodies. Since changes of the set of clauses can be captured by update requests as in Section 5.1, and deletions of constraints never cause any violation, we focus below on schema updates consisting of insertions of constraints.

Whenever a new constraint I is added to the integrity theory, it may be too costly to evaluate it on the spot, let alone to immediately repair all violated cases of I. As long as such repairs are delayed, traditional integrity checking is not applicable, since the total integrity premise does not hold. However, inconsistency-tolerant integrity checking can be used, no matter for how long the repair of violated cases is delayed.

More precisely, let IC′ = IC ∪ {I} be an integrity theory obtained by the schema update insert I. Then, each inconsistency-tolerant method M for computing M(D, IC′, U) for each update U issued after IC has been updated can guarantee that all cases in IC′ that are satisfied in D remain satisfied in D^U. If M were not inconsistency-tolerant, then a possible inconsistency of D ∪ IC′ would invalidate any output of M(D, IC′, U), even if integrity was totally satisfied before IC was updated.

Theorem 14 below captures another advantage of inconsistency-tolerant integrity checking for schema evolution.

Theorem 14. For each database D, each pair of integrity theories IC, IC′, each update U, and each inconsistency-tolerant method M, the following holds, where IC″ = IC ∪ IC′:

    If D(IC) = true and M(D, IC″, U) = sat, then D^U(IC) = true.    (15)

Proof. Since IC ⊆ S(D, IC″), (15) follows from (3). □

Theorem 14 says that M guarantees the preservation of total integrity of IC even if D(IC′) = false. That is useful for distinguishing hard constraints (those in IC), the satisfaction of which is indispensable, and soft constraints (in IC′), the violation of which is tolerable. Thus, by Theorem 14, each inconsistency-tolerant method guarantees that all hard constraints remain totally satisfied across updates, even if there are violated soft constraints.

Example 5.5. Let hr and lr be two predicates that model a high, resp., low risk in some application domain. Further, let I1 = ← hr(x̄) and I2 = ← lr(x̄) be a hard, resp., soft constraint for protecting against high and, resp., low risks. Then, each inconsistency-tolerant method M can be used to preserve the satisfaction of I1 across updates, even if I2 is violated.

    5.4 Inconsistency-Tolerant Risk Management

Since constraint violations may be hidden or unknown, and since all integrity checking methods traditionally have insisted on total integrity, their use has not been reliable. But now, the definition of inconsistency-tolerant integrity checking provides a decision criterion for distinguishing reliable from unreliable methods. The unreliability of methods that are not inconsistency-tolerant is illustrated in the following elaboration of Example 5.5:

Example 5.6. Let D = {p(0,0), p(1,2), p(2,3), p(3,4), . . .}. Further, let the predicates in IC = {← lr(x), ← hr(x)} be defined by the clauses

    lr(x) ← p(x,x)
    hr(x) ← p(0,x) ∧ q(x,y) ∧ y > th,

where lr and hr indicate a low and, resp., a high risk. In the clause defining hr, the term th may stand for a threshold value.

The purpose of IC is to protect the application from any risk. Yet, in D, the low-risk presence of p(0,0) is tolerated. Now, let U = insert q(0, 100,000). Then, methods that are not inconsistency-tolerant, such as MG and MCM, reason as follows for checking if U preserves integrity. Using U for simplifying ← hr(x) yields the case ← p(0,0) ∧ 100,000 > th. It is obtained from the body of the definition of hr by binding the variables x and y to 0 and, resp., 100,000, and then dropping the literal q(0, 100,000). Clearly, that case is subsumed by ← p(x,x), which defines lr and is not affected by U. The total integrity premise entails that ← p(x,x) is satisfied in D. Hence, methods that are not inconsistency-tolerant may deduce that ← p(x,x) remains satisfied in D^U. From that, such methods deduce that also the subsumed constraint ← p(0,0) ∧ 100,000 > th, and hence ← hr(x), is satisfied in D^U. Thus, even if 100,000 > th, methods such as those mentioned above accept U, i.e., they fail to detect that U causes a high risk. Thus, their output is not reliable in the presence of extant low risks. As opposed to that, if 100,000 > th, then U is reliably rejected by each inconsistency-tolerant method, since the case ← hr(0) is satisfied in D but violated in D^U.
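
The rejection in Example 5.6 only requires evaluating the affected case. The sketch below (tuple-based facts; the threshold th is given an assumed numeric value, since the example leaves it symbolic) shows the check of ← hr(0) before and after the update:

    # Sketch of the check of the case <- hr(0) in Example 5.6.
    TH = 50_000                                  # assumed value for the threshold th

    D = {('p', 0, 0), ('p', 1, 2), ('p', 2, 3), ('p', 3, 4)}

    def hr(db, x):
        # hr(x) <- p(0,x) & q(x,y) & y > th
        return ('p', 0, x) in db and any(
            f[0] == 'q' and f[1] == x and f[2] > TH for f in db)

    U = {('q', 0, 100_000)}
    print(hr(D, 0))        # False: the case <- hr(0) is satisfied in D
    print(hr(D | U, 0))    # True: <- hr(0) is violated in D^U, so U is rejected

The extant low-risk violation ← lr(0), caused by p(0,0), plays no role in this check; it is simply tolerated.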

    5.5 Unsatisfiability-Tolerant Integrity Checking

By bad design or faulty schema updates, database evolution may lead to an unsatisfiable integrity theory, i.e., no state could ever satisfy integrity. Theoretically, unsatisfiable integrity is the worst possible situation, since each state then is irreparable. Since unsatisfiability is known to be undecidable in general, it even might never be detected. Anyway, with an unsatisfiable integrity theory, schema evolution may seem to have reached a dead end.

However, inconsistency-tolerant integrity checking can be applied even if the constraints are unsatisfiable, i.e., if integrity is inevitably violated in any state. By using an inconsistency-tolerant method, one can guarantee that all satisfied cases of constraints remain satisfied, even though integrity as a whole is never attainable. Thus, each inconsistency-tolerant method is also unsatisfiability-tolerant, as defined below.

Definition 5.4. An integrity checking method M is called unsatisfiability-tolerant if, for each database D, each unsatisfiable integrity theory IC, and each update U, (3) holds.

Definition 5.4 straightforwardly entails Corollary 15, since unsatisfiability of IC is in fact not excluded in Definition 3.2.

Corollary 15. Each inconsistency-tolerant integrity checking method is unsatisfiability-tolerant.

Example 5.7. Let IC be the unsatisfiable integrity theory {← p(x,x), ← ¬p(0,0), ← q(x) ∧ r(x)}. Clearly, the first two denials in IC can never be satisfied at a time. However, in D = {p(0,0), q(0), r(0), q(1), r(2), q(3), r(4), q(5), . . . , q(99), r(100)}, all basic cases of IC, except the cases ← p(0,0) and ← q(0) ∧ r(0), are satisfied. Although IC can never be fully satisfied, it makes sense to accept updates such as deleting q(0), which would actually remove a case of violated integrity, and to prevent insertions, e.g., of q(2), that would introduce new violations. Also, no inconsistency-tolerant method would ever reject any request to delete any fact from q or r. Or when, e.g., the insertion of a fact of the form q(a) is requested, only the simplification ← r(a) will be checked, i.e., the request is rejected only if r(a) is in D^U.
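
Concretely, the simplified check for inserting a fact q(a) in Example 5.7 boils down to a single membership test, as the following sketch (assumed fact representation) shows:

    # Sketch of the simplification <- r(a) used in Example 5.7 when q(a) is inserted.
    D = ({('p', 0, 0), ('q', 0), ('r', 0)}
         | {('q', n) for n in range(1, 100, 2)}
         | {('r', n) for n in range(2, 101, 2)})

    def accept_insert_q(db, a):
        # reject the insertion of q(a) iff r(a) would be in D^U
        return ('r', a) not in db

    print(accept_insert_q(D, 1))   # True: r(1) is absent, so inserting q(1) is accepted
    print(accept_insert_q(D, 2))   # False: r(2) is present, so inserting q(2) is rejected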

    6 EXPERIMENTAL EVALUATION

We now describe the experiments performed for evaluating, first, the benefits of ITIC for database updating, second, its benefits for query answering, and, third, the impact of ITIC on the performance of updating, checking, and querying. Each experiment is based on a series of updates, starting from an initial state and leading to a final state. Updates are either checked by ITIC, as proposed in this paper, or not checked at all, since checking updates in inconsistent databases is traditionally considered invalid. More precisely, we run the following three kinds of experiments, the setups of which are described in Section 6.1:

. Updates may change the amount of inconsistency. In Section 6.2, we assess how inconsistency varies between initial and final states, both when ITIC is used and when integrity is not checked. We do that by measuring the percentage of tuples that participate in constraint violations.

. Extant inconsistency may cause incorrect answers. In Section 6.3, we measure and compare the amounts of incorrect tuples in the answers to queries posed in the final states, both when ITIC is used and when integrity is not checked. That way, we obtain an indication of the quality of query answers depending on whether ITIC is used or not. We compute answers both by traditional query evaluation and by consistent query answering (CQA), a technique for improving the quality of query answers in the presence of inconsistency [1].

. Using ITIC obviously weighs in more on performance than running no integrity checks at all. In Section 6.4, we measure and report on the times required for integrity checking, updating, and querying, both when ITIC is used and when integrity is not checked.

    6.1 Parameters and Setups

The tests are run on the databases and queries of the TPC-H decision support benchmark (http://www.tpc.org/tpch/), which is known to have broad industry-wide relevance. In order to cover a significant spectrum of update series, we have experimented with the following variants of parameter values for the initial state and the updates:

. s: initial state size. We experiment with s = 100 MB, 500 MB, 1 GB, 2 GB. A database with s = 2 GB has approximately 16 million tuples.

. p: initial inconsistency, expressed as the percentage of tuples that participate in constraint violations in the initial state. For simplicity, we only consider primary key constraint violations. We experiment with p = 0%, 1%, 10%, 33%. For example, p = 10% and s = 2 GB means that 1,600,000 tuples violate a primary key constraint. Violations occur when the same key value is repeated; we experiment with violations caused by 2-50 repetitions, and an equal percentage of violations in all tables.

. i: percentage of insertions in the update series that violate a primary key constraint. We experiment with i = 10%, 50%, 90%. We generate update series consisting of insertions and deletions of a size equal to 10 percent and, respectively, 1 percent of the size of the initial database, so as to simulate a significant evolution of the database. For example, with i = 10% and s = 2 GB, there will be 160,000 insertions violating primary key constraints, i.e., 10 percent of 10 percent of 16 million tuples. (Note that deletions cannot cause any violation of primary key constraints.)

The TPC-H suite provides a script, called dbgen, for generating database states of a given size that satisfy the constraints. In order to set the initial inconsistency, we use the same technique as in [20], where the authors test their CQA approach against the TPC-H benchmark. To have, e.g., p = 10% and violations with two key repetitions each in a database of s = 1 GB, dbgen is first used to create a consistent state of size 0.95 GB; we denote that state by D̄_p^s. Then, a set A of tuples of size 0.05 GB from the database is randomly selected from a uniform distribution; a new set B of size 0.05 GB is generated from this, with the same key values as in A, and nonkey values taken randomly from the other tuples in the database. Then, set B is added to D̄_p^s, and the resulting state D_p^s has the desired size s and inconsistency p. Similarly, we use dbgen to generate a set of updates U_i^s consisting of deletions (of size 1 percent of s) and insertions (10 percent of s), i percent of which introduce new constraint violations.
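
The injection of primary-key violations can be pictured as follows; this is a toy sketch with hypothetical helper names, not the scripts actually used alongside dbgen:

    # Pick a random sample A of the rows, build a set B with the same key values
    # but non-key values copied from other rows, and append B, thereby creating
    # duplicate-key violations.
    import random

    def inject_pk_violations(rows, key_index, fraction=0.05, seed=0):
        rnd = random.Random(seed)
        A = rnd.sample(rows, max(1, int(len(rows) * fraction)))
        B = []
        for a in A:
            donor = rnd.choice(rows)          # non-key values from another row
            b = list(donor)
            b[key_index] = a[key_index]       # same key value as in A
            B.append(tuple(b))
        return rows + B                       # the inconsistent table

    table = [(k, f"name{k}", k * 10) for k in range(100)]
    dirty = inject_pk_violations(table, key_index=0)
    print(len(dirty) - len(table), "tuples with duplicated keys added")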

6.2 Measuring Inconsistency Variations through Updates

The first measurement we have performed assesses the inconsistency, i.e., the percentage of tuples that violate a constraint, in the final state reached after a series of updates. We denote as D_NC (resp., D_ITIC) the state reached after executing on D_p^s the updates in U_i^s with no checking (resp., if accepted by an ITIC method). Both here and in the other tests, we use MN as the integrity checking method, since it is sound, complete, and inconsistency-tolerant. Fig. 2 shows how inconsistency varies between the initial state D_p^s and the final states D_NC and D_ITIC for s = 100 MB, and for all possible values for p (lines with squares for p = 0%, circles for p = 1%, lozenges for p = 10%, and triangles for p = 33%) and i (light gray for i = 10%, dark gray for i = 50%, and black for i = 90%). The dashed lines refer to the no-checking scenario, where inconsistency always grows, unless i < p (which is the case in our tests only for p = 33% and i = 10%). The continuous lines refer to the ITIC scenario, where the number of violations cannot increase; in fact, inconsistency naturally decreases, since the database tends to become bigger after executing our series of updates, while inconsistency does not increase. The differences in the amount of inconsistency between D_NC and D_ITIC are quite significant in many cases. For example, for p = 1% and i = 10%, inconsistency amounts to 1.81 percent of the database in D_NC, while it is only 0.99 percent in D_ITIC (differences are even bigger for bigger values of i). Needless to say, if the initial state is consistent (p = 0), integrity is totally preserved with ITIC, while it is lost with no checking. Similar considerations hold also for the other values considered for s.
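
The inconsistency reported in Fig. 2 is simply the percentage of tuples whose primary-key value is not unique. A sketch of that measurement (hypothetical helper, applied here to a toy table):

    from collections import Counter

    def pk_violation_percentage(rows, key_index=0):
        # percentage of tuples that participate in a primary-key violation,
        # i.e., whose key value occurs more than once
        counts = Counter(r[key_index] for r in rows)
        violating = sum(1 for r in rows if counts[r[key_index]] > 1)
        return 100.0 * violating / len(rows) if rows else 0.0

    rows = [(1, 'a'), (2, 'b'), (2, 'c'), (3, 'd')]
    print(pk_violation_percentage(rows))   # 50.0: two of the four tuples share key 2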

Another quantitative difference, not shown in Fig. 2, between D_NC and D_ITIC regards their sizes. Obviously, D_ITIC does not contain any new violation and is, therefore, always smaller than D_NC whenever i > 0 (their difference increases as i increases). For example, for p = 10% and i = 90%, the size varies from 953,088 tuples in D_NC to 875,109 in D_ITIC.

    6.3 Measuring Incorrect Tuples in Query Answers

Our second experiment tests the negative effect of inconsistency on query answering, and to what extent such an effect can be cured by handling database maintenance with ITIC.

Inconsistency may be responsible for incorrect answers to queries. Let us indicate with Q^D the set of tuples in the answer to query Q evaluated in database D. We define a tuple t to be a correct answer to Q in D if t ∈ Q^D̄, where D̄ is the reference database of D, i.e., the state in which D would be if no inconsistency had occurred at any time. Accordingly, we define the false positives of Q in D as the set Q⁺_D = Q^D \ Q^D̄, and the false negatives as Q⁻_D = Q^D̄ \ Q^D. The smaller Q⁺_D and Q⁻_D, the better the quality of Q^D. Determining the reference database for a given D requires, in general, information on all the updates that have led to D, which is typically unavailable. However, since in our experiments we have that kind of information, for our purposes it is sufficient to assume that D̄_p^s is the reference database of D_p^s.
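
With query answers represented as sets of result tuples, the two quality measures are plain set differences. A minimal sketch with made-up answer sets:

    # Q+_D = Q^D \ Q^D̄  and  Q-_D = Q^D̄ \ Q^D, with answers as Python sets.
    def false_positives(answer_in_D, answer_in_reference):
        return answer_in_D - answer_in_reference

    def false_negatives(answer_in_D, answer_in_reference):
        return answer_in_reference - answer_in_D

    Q_D   = {('cust1', 100), ('cust2', 250)}   # answer in the (possibly inconsistent) state
    Q_ref = {('cust2', 250), ('cust3', 80)}    # answer in the reference database
    print(false_positives(Q_D, Q_ref))         # {('cust1', 100)}
    print(false_negatives(Q_D, Q_ref))         # {('cust3', 80)}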

Another way to remove inconsistency from D_p^s is to eliminate all tuples that participate in constraint violations. This is a very strict repair procedure, since some of the eliminated tuples may indeed be correct. However, it is more feasible than determining the reference database, in general. We denote by D̃_p^s the state obtained in this way. Because of its consistency, traditional integrity checking can be applied to any series of updates starting from D̃_p^s. We denote by D_clean the state obtained from D̃_p^s by executing the updates in U_i^s accepted by MN.

Our tests measure and compare, for given benchmark queries, the amounts of false positives and negatives in D_NC, D_ITIC, and D_clean. To this end, consider that the database D_ideal obtained by executing on D̄_p^s the updates in U_i^s accepted by MN is the reference database of each of D_NC, D_ITIC, and D_clean.


Fig. 2. Inconsistency measured after updates applied to an initial state with size s = 100 MB and different values for p (initial inconsistency) and i (percentage of insertions in the update series violating a constraint), as indicated. Continuous lines indicate tests run with ITIC (where inconsistency never increases), while dashed lines indicate no checking (NC).


Queries that always return a fixed, very small number of results coming from complex aggregations are not well-suited for our purposes, since even tiny variations in the state may imply different aggregate values, and thus false positives and negatives. Therefore, we choose to focus on queries that return at least 10 results, namely queries Q3 and Q10 of the benchmark. Such queries are top-k queries that output the first few results of an aggregation operation (the queries in the TPC-H specification are parameterized, and the standard suggests values for these parameters; in the experiments, we used the suggested values in all queries). Q3 involves 3 relations, selects 4 attributes, and returns 10 results. Q10 involves 4 relations, selects 8 attributes, and returns 20 results.

In order to compare false positives and negatives of large query answers, we also consider queries Q3^all and Q10^all, which we define as identical to Q3 and, resp., Q10, but without being limited to the top 10 or, resp., 20 results.

Finally, we also consider the rewritings Q3^cqa and Q10^cqa of Q3 and, resp., Q10, obtained by the CQA rewriting technique described in [20]. Intuitively, CQA consists in rewriting a given query Q over a database D with an integrity theory IC into a new, more complex query Q^cqa, the evaluation of which only produces the consistent answers to Q. In the definition of [1], [20], a tuple is a consistent answer to Q in D if it is an answer to Q in each consistent database whose set difference wrt. D is minimal. CQA can therefore be regarded as a technique for reducing the amount of incorrect answer tuples.

We measure |Q⁺_D| and |Q⁻_D| for every Q ∈ {Q3, Q10, Q3^all, Q10^all, Q3^cqa, Q10^cqa} and every D ∈ {D_NC, D_ITIC, D_clean}. Note that |Q⁺_D| = |Q⁻_D| for Q ∈ {Q3, Q10}, since the cardinality of the query answers is fixed by the top-k clause.

Figs. 3 and 4 compare the amounts of false positives for Q3 and, resp., Q10 in D_NC, D_ITIC, and D_clean, with s = 100 MB and all combinations of p and i. The benefits of ITIC are significant, especially for lower amounts of initial inconsistency. For example, for p = 1% and i = 10%, there are only 4 incorrect answers among the top 10 answers to Q3 in D_ITIC, whereas all top 10 answers are incorrect in D_NC. The graph also signals that initial repairing is beneficial: removing all potentially incorrect data lowers false positives and negatives considerably. Recall, however, that repairing is costly.

Moreover, although the answers in D_clean are usually better than those in D_ITIC, it is not necessarily so, as shown, e.g., for Q10 with i = 10% and p = 1%, where the number of false positives is seven in D_clean, but four in D_ITIC. The other lines in the figure report the amounts of false positives for Q3^cqa and Q10^cqa in D_NC and D_ITIC (not in D_clean, since it is consistent, so the answers to Q3^cqa and Q10^cqa coincide with those to Q3 and, resp., Q10). Although slower in execution, such queries further improve the quality of answers, and in some cases they even eliminate all false positives in D_ITIC. This suggests that, for quality-critical OLTP applications, where some extra time is affordable for CQA but not for total repairs, ITIC should be used for database maintenance together with CQA for query answering. When the database is too inconsistent, as, e.g., for p = 33%, an update phase of 10 percent the size of the database cannot do much to significantly improve consistency, so all top answers are incorrect, both in D_NC and D_ITIC in Fig. 4.

Fig. 5 shows the amounts of both false positives and negatives for Q3^all and Q10^all, for s = 100 MB, p = 1%, and for all values of i. Note that these values are higher than in Figs. 3 and 4, since Q3^all and Q10^all are not restricted to the top 10, resp., 20 results. Again, the benefits of ITIC seem impressive.



Fig. 3. The lines refer to an initial state size s = 100 MB and different values for p (initial inconsistency) and i, and show the false positives for Q3 on D_NC (stars), Q3 on D_ITIC (crosses), Q3 on D_clean (triangles), Q3^cqa on D_NC (squares), and Q3^cqa on D_ITIC (circles).

Fig. 4. The lines refer to an initial state size s = 100 MB and different values for p (initial inconsistency) and i, and show the false positives for Q10 on D_NC (stars), Q10 on D_ITIC (crosses), Q10 on D_clean (triangles), Q10^cqa on D_NC (squares), and Q10^cqa on D_ITIC (circles).

Fig. 5. False positives and negatives for Q3^all and Q10^all in D_NC (squares) and D_ITIC (diamonds) with initial state size s = 100 MB, initial inconsistency p = 1%, and different values for i.


Finally, Fig. 6 shows the amount of false positives for Q3^all and Q10^all with i = 50% and p = 1% for different values of s, both in D_NC and in D_ITIC. We observe that the amount of false positives in D_NC is about six times higher than in D_ITIC, with remarkable benefits due to ITIC. The mentioned factor depends of course on the chosen parameters and on the selection predicates in the queries, and can be explained as follows: queries Q3^all and Q10^all turn out to retrieve a number of false positives that is proportional to the number of tuples violating the constraints, which, in turn, is proportional to s, since p is fixed. Since the size of the insertions is 10% · s, the inconsistency in D_NC is about (i · 10% + p)/p = 6 times higher than that in D_ITIC.
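
The quoted factor of about six follows directly from the chosen parameters; a quick check of the arithmetic:

    # For i = 50% and p = 1%: insertions amount to 10% of the database, a fraction i
    # of which violates a key, so inconsistency in D_NC is roughly i*10% + p,
    # versus p in D_ITIC.
    i, p = 0.50, 0.01
    print((i * 0.10 + p) / p)   # 6.0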

    6.4 Measuring Execution Times

In our last test, we measure the time consumed for integrity checking, updating, and querying in the states obtained by all update series considered so far, i.e., leading

1. from D_p^s to D_NC,
2. from D_p^s to D_ITIC,
3. from D̃_p^s to D_clean, and
4. from D̄_p^s to D_ideal.

The test machine sports a 2.66 GHz Quad Core Intel processor with 4 GB of 800 MHz RAM and runs SQL Server 2005 under Windows XP.

Apart from 1), where no time is spent for integrity checking, there are few notable differences in the measured times across the different update series. For example, for s = 1 GB and p = 1%, integrity checking, if performed, always takes around 450 seconds in total, updating about 18 seconds, answering non-CQA queries 2.2 seconds, and answering CQA queries 12 seconds. For larger amounts of inconsistency, e.g., p = 33%, improvements up to 10 percent of the execution times wrt. the update series 2) are observed both for query answering and for integrity checking in 4), and up to 20 percent in 3). These, however, are due to the different sizes of the initial states, which are the smaller the higher the initial inconsistency: while D_p^s has size s, D̃_p^s has size (1 - p) · s and D̄_p^s has size (1 - p/2) · s.

Note that we did not perform any particular tuning of the database, so as to speed up query answering or integrity checking.

    6.5 Summary of Experimental Results

The experiments reported in this section have provided evidence of the following benefits of ITIC:

. A series of updates executed on a database state leads to lower amounts of inconsistency if filtered by ITIC, with no extra checking cost incurred with respect to integrity checking in a traditional, consistent setting.

. The query answers obtained from a database state reached after a series of updates checked by ITIC are generally better than without checking, in that they contain fewer false positives and negatives.

. Traditional query answering can be replaced by CQA to further improve the quality of query answers.

The smaller the initial amount of inconsistency and the larger the amount of inconsistency introduced by updates, the more the above effects become visible. Or, in other words, larger initial amounts of constraint violations require longer periods of updates checked by inconsistency-tolerant methods for decreasing the initial inconsistency significantly.

    7 RELATED WORK

Various forms of exception handling for dealing with persistent inconsistencies as embodied by constraint violations have been proposed in [21], [22], [23], and others. However, integrity checking is not addressed in any of those works.

Another approach to deal with inconsistencies is to repair them (cf. Section 5.2), which, despite recent advances [18], [17], is known to be intractable in general. Anyway, all approaches that either eliminate or work around inconsistencies (e.g., by repairing them or treating them as exceptions) need to know about extant integrity violations. As opposed to that, ITIC simply leaves inconsistencies alone. That works reliably, even if violated cases of constraints are unknown to the user or the application, as seen in Section 5.4.

To the best of our knowledge, the putatively fundamental role alleged to total integrity as an indispensable premise for simplified integrity checking has never been challenged. That may be due to the classical ex contradictione quodlibet rule, by which conclusions derived from inconsistency cannot be considered reliable. However, in Section 5.4, we have seen that, on the contrary, the use of inconsistency-tolerant methods is fully reliable.

On the other hand, it is astonishing that total integrity has always been insisted on, since many database contexts in practice suffer from some amount of inconsistency.

Nevertheless, interesting work has been going on in recent years under the banner of inconsistency tolerance. A lot of it is concerned with consistent query answering in inconsistent databases (abbr. CQA) [1], [16], [19]. CQA defines answers to be correct if they are logical consequences of each reasonably repaired state of the database, i.e., each state that satisfies integrity and differs from the given violated state in some minimal way. CQA and ITIC have in common that they neither capitulate in the presence of inconsistency (as classical logic would), nor need to appeal to repairing violated constraints (as traditional query answering and integrity checking would).


Fig. 6. False positives for Q3^all in D_NC (squares) and D_ITIC (diamonds), and Q10^all in D_NC (crosses) and D_ITIC (triangles), for p = 1% and i = 50% and different sizes of the initial state.


However, their main purposes are different, since CQA enables query answering, while ITIC enables updating, even when the database violates integrity. Yet, integrity checking (which can be seen as a special-purpose variant of query answering) has, to the best of our knowledge, never been addressed in detail by the CQA community. As was observed in Section 6, ITIC can be considered largely complementary to CQA, since the former prevents new integrity violations from occurring during updates but does not remove the effect that extant violations may have on query answers, which is precisely what the latter does.

Also, a variety of paraconsistent logic approaches that are tolerant and robust wrt. inconsistency have received some attention, e.g., [24], [25]. Most of them, however, deviate significantly from the syntax and semantics of classical first-order logic, while ITIC does not. Some paraconsistent approaches resort to modal or multivalued logic. As opposed to that, ours complies with the conventional two-valued semantics of databases and integrity in the literature.

Yet, resolution-based query answering (by which each of the methods mentioned in this paper has been implemented) can be characterized as a procedural form of paraconsistent reasoning [26]. This is particularly noteworthy for proof procedures that use integrity constraints as candidate input clauses, such as those in [12], [4]. Thus, the paraconsistency of logic programming naturally qualifies it as a paradigm for implementing inconsistency-tolerant approaches to database integrity.

Further relevant work on the management of inconsistencies in databases comes from the field of inconsistency measuring [27].

Inconsistency measures are useful for updates and integrity checking, if one wants to accept an update only if the measure of inconsistency of the old state does not increase in the new state. That, however, is precisely accomplished by ITIC, as soon as the set of violated cases of an integrity theory is measured: that set cannot be increased by an update if the update is checked by an inconsistency-tolerant method. Also, other measures, such as those proposed in [27], should be useful for determining the increase or decrease of inconsistency across updates. Alternative ways to characterize ITIC, including definitions based on inconsistency measures, are described in [10]. There, different classes of integrity checking strategies are identified, studied, and compared wrt. their inconsistency tolerance capabilities.
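
A sketch (assumed representation of ground cases as pairs of positive and negative literal sets) of that acceptance criterion, which accepts an update only if the set of violated cases does not grow:

    def violated_cases(db, cases):
        # indices of the cases whose body holds in db
        return {i for i, (pos, neg) in enumerate(cases)
                if all(l in db for l in pos) and not any(l in db for l in neg)}

    def accept(db, db_after_update, cases):
        # inconsistency, measured as the set of violated cases, must not increase
        return violated_cases(db_after_update, cases) <= violated_cases(db, cases)

    cases = [({('p', 'a', 'a')}, set())]            # the case <- p(a,a)
    D = {('p', 'a', 'a')}                           # <- p(a,a) is already violated in D
    print(accept(D, D | {('q', 'b')}, cases))       # True: no new violation
    print(accept(set(), {('p', 'a', 'a')}, cases))  # False: a satisfied case becomes violated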

This paper improves and extends [28] in several ways. New are the properties and conditions for completeness and weak completeness wrt. inconsistency tolerance. Also, the application of ITIC to various database management problems in Section 5 is new. Another important addition of this paper is the validation of the practical relevance of ITIC in Section 6.

    8 CONCLUSION

The purpose of integrity checking is to ensure that the satisfaction of each constraint is preserved across updates. Traditionally, the theory of efficient integrity checking stipulates total integrity, i.e., that all constraints be satisfied in each state, without exception