Integrity Constraints: Semantics and Applications

30 April 1997 | ICs: Semantics and Applications | Godfrey, Grant, Gryz, & Minker

P. Godfrey (1, 4), J. Grant (3), J. Gryz (1), & J. Minker (1, 2)

(1) Department of Computer Science and (2) Institute for Advanced Computer Studies, University of Maryland at College Park, College Park, Maryland
(3) Department of Computer and Information Sciences, Towson State University, Towson, Maryland
(4) U.S. Army Research Laboratory, Adelphi, Maryland

Contents

1 Introduction
2 Background
3 Semantics of Integrity Constraints
   3.1 Expressiveness of Integrity Constraints
   3.2 Model Semantics
   3.3 Epistemics of Integrity Constraints
4 Types of Integrity Constraints
   4.1 Static versus Temporal
   4.2 State versus Universal
   4.3 User and Preference Constraints
   4.4 Aggregate Constraints
5 Reasoning with Integrity Constraints
   5.1 Eliminating Integrity Constraints
   5.2 Model Elimination
   5.3 Residue Method
6 Applications of Integrity Constraints
   6.1 Semantic Query Optimization
   6.2 Cooperative Answering
   6.3 Combining Databases and Resolving Inconsistencies
   6.4 View Updates
   6.5 Additional Applications
7 Conclusion and Future Directions

1 Introduction

Databases contain knowledge as well as data. The database's schema (how the data is organized) is knowledge, which yields constraints on the form the data must take. The relationships between data that must hold, such as functional and inclusion dependencies, are knowledge. General rules about the world or domain, to which the database's data must always conform, are knowledge as well. Such knowledge defines the semantics of the database. It is beneficial for a database to store its knowledge explicitly, in addition to its data. This has long been recognized in relational databases. Some of the database's knowledge is captured and stored via integrity constraints, statements about the legal states and transitions of the database. Integrity constraints (ICs) were introduced to prevent incorrect data from being entered into the database and to check the integrity of the database.

Integrity constraints actually have much wider applicability. In addition to integrity checking, their uses include semantic query optimization, cooperative query answering, combining databases in a semantically consistent manner, and view updating. It is commonly held that integrity constraints are an adequate and suitable knowledge representation in databases.¹ Thus the types of knowledge that should be kept by databases can, and should, be written as ICs. With a standard, uniform representation for the database's knowledge, the various applications that rely on the database's semantics can all employ the same representation.

¹ There is much work in knowledge representation, and one might argue that another formalism is more suitable for capturing "knowledge" in databases. However, ICs were designed for this purpose, and have proven to be suited and adequate for the task. (See [Grant and Minker, 1990].) We limit our focus to integrity constraints in this chapter, and we do not compare them with other possible knowledge representations here.

In this chapter, we consider logic databases (also called deductive databases).² Logic databases employ the logic model, a subset of the first-order predicate calculus, to describe the database and queries [Ullman, 1988]. Records are represented as logical facts. Rules in logic databases allow implicit facts to be derived via logical deduction. (Views play such a role in relational databases.) The logic model can be extended to allow formulas as integrity constraints. The advantage of taking a logical approach to databases is that data, rules, queries, and integrity constraints can all be handled in a common framework, and formal techniques rather than ad hoc approaches can then be employed for all database applications.

² All the results we present, and our considerations, for logic databases apply to relational databases as well. The logic model subsumes the relational model.

There is a broad body of work on logic and relational databases, and a general consensus on what databases (facts and rules) and queries mean. However, there is less work on the meaning of integrity constraints, and certainly no consensus. What is meant by an IC can differ widely from system to system. For instance, one may define that ICs must be consistent with the database, or define that they must be provable statements, deducible from the database. Another view is that ICs really represent meta-knowledge (knowledge about the database itself) and should, perhaps, be written in an extended logic beyond first-order. The general situation becomes more complex when we permit databases to contain indefinite (disjunctive) information or to use negation. Subtle but profound differences in meaning can arise from different interpretations of ICs. In many systems, the semantics for ICs is never made clear; at times one interpretation seems intended, while at other times another interpretation is evident. This ambiguity is dangerous, and could allow a database to become corrupt in unanticipated ways.

In this chapter, we introduce integrity constraints in the logical framework, and survey the work that has been done on ICs in this context. Section 2 presents the basic definitions for databases and integrity constraints for the chapter. We consider various interpretations of ICs which define what they mean and discuss how ICs are used to represent the knowledge of the database (Section 3), look at variations on the types and roles of ICs (Section 4), review techniques for reasoning with ICs (Section 5), and present a number of important applications that employ ICs (Section 6). Integrity constraints are a vital component, and issue, in databases, and this is an important body of work. There still remains fundamental work to be done on ICs to determine which semantics are appropriate in which contexts, their representational adequacy, and how better to reason with them. A number of important and useful database applications that rely on ICs are in progress. We conclude with a discussion of current and future work (Section 7). We note that there are important aspects of ICs we do not cover in this chapter. In particular, we do not discuss integrity constraint checking; that is, how best to check the consistency of the database with respect to ICs. We recommend [Grefen and Apers, 1993] as a survey of this work.

2 Background

This section contains a summary of the background and notations used in this chapter. We use the language and terminology of logic databases ([Das, 1992], [Lloyd, 1987], and [Lobo et al., 1992]). Logic databases express data, rules (views), and queries in first-order logic ([Boolos and Jeffrey, 1989] and [Lloyd, 1987]). We show that integrity constraints can likewise be expressed in first-order logic.³

³ Some proposals for integrity constraints express them in a higher-order logic, while keeping the database (the facts, rules, and, sometimes, queries) in first-order. We review some of these proposals in Section 3.3.

We use standard syntax for first-order logic, with the usual symbols for variables, connectives, quantifiers, punctuation, equality, constants, and predicates. The notions of term, formula, and sentence (a formula with no free variables) are defined in the usual way. We further restrict formulas to be function-free. Any formula may be considered as a query (although we are about to restrict the form of standard queries). A formula is called ground if it contains no variables.

A substitution is a set of substitution pairs, for example, $\{X = a, Y = b\}$, such that every element of a substitution pair is a variable or constant (or, more generally, a term), and such that the collection of left-hand sides of the substitution pairs ($X$ and $Y$ in our example) constitutes a set; that is, no left-hand side is repeated. A substitution applied to a formula is a rewrite of the formula that replaces every occurrence in the formula of a left-hand element from the substitution by its right-hand counterpart, in parallel. Let the formula $F$ be $p(X, Y)$ and the substitution $\theta$ be again $\{X = a, Y = b\}$. The substitution $\theta$ applied to formula $F$, written as $F\theta$, is the formula $p(a, b)$. A ground substitution of a formula results in a ground formula. When the right-hand sides of the substitution pairs of a substitution $\theta$ are distinct and do not appear in the formula $F$, the inverse of the substitution, denoted as $\theta^{-1}$, exists. The inverse is defined as the original substitution with the left-hand and right-hand elements of each of its substitution pairs exchanged. In our example, $\theta^{-1}$ is $\{a = X, b = Y\}$.

An answer to a query is a ground substitution of the query formula such that the resulting ground formula is true with respect to the database; that is, the grounded query formula is logically entailed by the database.
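The substitution machinery above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: atoms are encoded as `(predicate, arguments)` tuples, variables are taken to be the terms that start with an upper-case letter, and `answers` handles only an atomic query against a set of stored facts (no rules), so "entailed by the database" reduces to membership. The names `apply_substitution`, `inverse`, and `answers` are hypothetical and do not come from the chapter.

```python
from typing import Dict, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]   # (predicate name, argument terms)
Substitution = Dict[str, str]        # left-hand side -> right-hand side


def is_variable(term: str) -> bool:
    """Assumed convention: variables start with an upper-case letter."""
    return term[:1].isupper()


def apply_substitution(atom: Atom, theta: Substitution) -> Atom:
    """Replace every left-hand element by its right-hand counterpart, in parallel."""
    pred, args = atom
    return (pred, tuple(theta.get(t, t) for t in args))


def is_ground(atom: Atom) -> bool:
    """A formula is ground if it contains no variables."""
    return not any(is_variable(t) for t in atom[1])


def inverse(theta: Substitution, atom: Atom) -> Substitution:
    """Exchange the sides of each pair; defined only when the right-hand
    sides are distinct and do not already appear in the formula."""
    rhs = list(theta.values())
    if len(set(rhs)) != len(rhs) or any(r in atom[1] for r in rhs):
        raise ValueError("inverse does not exist for this formula")
    return {right: left for left, right in theta.items()}


def answers(query: Atom, facts: Set[Atom]) -> List[Substitution]:
    """Ground substitutions under which the atomic query is a stored fact."""
    pred, args = query
    found = []
    for fact_pred, fact_args in sorted(facts):
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        theta: Substitution = {}
        for q, f in zip(args, fact_args):
            if is_variable(q):
                if theta.setdefault(q, f) != f:   # same variable, two values
                    break
            elif q != f:                          # constant mismatch
                break
        else:
            found.append(theta)
    return found


F = ("p", ("X", "Y"))
theta = {"X": "a", "Y": "b"}
F_theta = apply_substitution(F, theta)    # ("p", ("a", "b"))
theta_inv = inverse(theta, F)             # {"a": "X", "b": "Y"}
```

Applying `theta_inv` to `F_theta` gives back `F`, which is the sense in which the inverse undoes the substitution when its side conditions hold.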

An important class of sentence is the clause. A clause has the general form

$$\forall.\; A_1 \lor \dots \lor A_k \lor \lnot A_{k+1} \lor \dots \lor \lnot A_n$$

in which each $A_i$ is an atomic formula, for $i \in \{1, \dots, n\}$, and in which the variables are understood to be universally quantified (as denoted by '$\forall$'). Any clause can be written in a logically equivalent form as an implication:

$$\forall.\; A_1 \lor \dots \lor A_k \leftarrow A_{k+1} \land \dots \land A_n.$$

This is often written in further shorthand as

$$A_1, \dots, A_k \leftarrow A_{k+1}, \dots, A_n.$$

in which disjunction is assumed on the left-hand side of the implication arrow, conjunction on the right-hand side, and the universal quantification is understood. A clause in this form is also called a rule. The collection of atoms on the left-hand side ($A_1, \dots, A_k$) is called the head of the rule, and the collection of atoms on the right-hand side ($A_{k+1}, \dots, A_n$) the body. When $k = n$, the body is empty, and when $k = 0$, the head is empty. A Horn rule has at most one atom in the head: $k \le 1$. A definite clause has exactly one atom in the head: $k = 1$. A rule is range-restricted if every variable that appears in the head also appears in the body.

A ground rule with an empty body is called a fact. Definite rules have a clear procedural interpretation. Consider

$$A \leftarrow B_1, \dots, B_n.$$

We call this rule a rule for $A$. The above rule can be interpreted to say that $A$ is shown (or proven) whenever all the $B_i$'s are shown (proven).⁴ Thus, rules are equivalent to views in the parlance of relational databases. Logically, rules are more expressive than views in relational databases when recursion is permitted, and when disjunction is permitted.⁵ A fact then is simply interpreted as true. There is nothing more that needs to be shown to support it.

⁴ The interpretation for disjunctive rules is not as apparent. Essentially, a disjunctive rule states that at least one of the atoms in the rule's head is true whenever all the atoms in the body are.

⁵ The SQL-3 standard extends SQL to support recursion [Melton and Simon, 1993]. So once SQL-3 becomes the standard, this difference in expressiveness will go away, since any relational database that supports SQL-3 will, in fact, be a deductive database system.

A database may then be defined as a collection of rules and facts. We call the language in which the database is written (based on clauses and first-order logic, as defined above) DATALOG [Ullman, 1988]. When all the rules and facts are definite, the database is called definite. It is called disjunctive (or indefinite) otherwise. Thus, a database DB often is defined as consisting of two parts:

• the extensional database, EDB, and
• the intensional database, IDB.

The EDB is the database's collection of facts. The IDB is the database's collection of rules. (We soon redefine databases to have a third component, the set of the database's integrity constraints.)

Conventionally, negative data is not represented explicitly in a logic database. There are several standard approaches to allow negative data to be inferred. The closed world assumption (CWA) is a default rule for the inference of negative facts [Reiter, 1978]. For any ground atom $A$, the negation of $A$ is accepted as true if $A$ is not provable from the database. The set of all negated atoms inferable in this way is written as $\mathit{CWA}(DB)$. Another approach to negation is the Clark completion of a database [Clark, 1978]. This formalizes the concept that the set of tuples true for a predicate is precisely the set that can be proven to be true via the facts and rules. In brief, this is accomplished by adding a formula to the database for each predicate (to correspond with the collection of rules for that predicate), to supply the logical "only if" half of the definition of the predicate. Certain negated facts are then deducible from the completed database, the database with these "only if" formulas added.

So far, we have assumed that the body of a database rule (clause) contains only positive atoms. However, it is sometimes useful to allow negated atoms in the body of a rule. We need negation in logic databases if we want to subsume the relational algebra, which includes set difference. We can extend deductive databases with negation. A rule which has a negated atom in its body is called a normal rule, and deductive databases that have normal rules are called normal databases. We call DATALOG that has been extended with default negation DATALOG$^{\lnot}$. For example, the normal rule

$$p(X) \leftarrow \mathit{not}\ q(X).$$

is interpreted, in general, to mean that, for any constant $a$, if $q(a)$ is not true (or cannot be proven to be true), then $p(a)$ is true.

We write this negation with not rather than with the symbol for logical negation, '$\lnot$', and refer to it as default negation.
This is because most semantics that have been defined for normal databases interpret default negation differently than logical negation. There are a number of semantics that have been defined for normal databases, and no one semantics is universally accepted. Also, since the notion of default negation is generally based on provability, not logical truth, such default negation is beyond first-order logic.⁶

⁶ We still speak in terms of first-order logic even for normal databases, as most of the first-order framework of deductive databases remains applicable.

The intuition behind the use of default negation becomes confused when it is combined with recursion. One solution to this problem is simply not to allow recursive definitions through negation. The canonical example of recursion through negation is

$$p(X) \leftarrow \mathit{not}\ q(X).$$
$$q(X) \leftarrow \mathit{not}\ p(X).$$

This restriction not to allow recursion through negation leads to what are called stratified databases, and such databases have a unique standard model called the perfect model of the database. (We consider model semantics more in Section 3.2 in the context of integrity constraints.) In some cases, a non-stratified database may also have a unique standard model. Some of these cases may be captured by the concept of stable database. Two important model semantics for normal databases (and logic programs) are the well-founded semantics ([Gelder et al., 1988]) and the stable model semantics or the semantically equivalent well-supported models ([Fages, 1991] and [Gelfond and Lifschitz, 1988]). (See [Minker, 1996] for a retrospective on work in semantics for logic programs and deductive databases.)
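To make the effect of recursion through negation concrete, the following Python sketch enumerates stable models of a small ground normal program by brute force, using the standard reduct construction of [Gelfond and Lifschitz, 1988]. It is a sketch under assumptions, not an implementation from the chapter: rules are already ground, atoms are plain strings, a rule is encoded as `(head, positive_body, negated_body)`, and the function names are hypothetical. The enumeration is exponential in the number of atoms and is meant only for tiny examples.

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple

NormalRule = Tuple[str, Sequence[str], Sequence[str]]  # (head, positive body, negated body)


def least_model(definite_rules: List[Tuple[str, Sequence[str]]]) -> Set[str]:
    """Least model of ground definite rules by naive fixpoint iteration."""
    model: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for head, body in definite_rules:
            if head not in model and all(b in model for b in body):
                model.add(head)
                changed = True
    return model


def stable_models(rules: List[NormalRule]) -> List[Set[str]]:
    """Candidates S for which the least model of the reduct w.r.t. S equals S."""
    atoms = sorted({h for h, _, _ in rules}
                   | {a for _, pos, _ in rules for a in pos}
                   | {a for _, _, neg in rules for a in neg})
    found = []
    for size in range(len(atoms) + 1):
        for candidate in combinations(atoms, size):
            s = set(candidate)
            # Reduct: drop rules blocked by S, strip "not" literals from the rest.
            reduct = [(h, pos) for h, pos, neg in rules
                      if not any(a in s for a in neg)]
            if least_model(reduct) == s:
                found.append(s)
    return found


# Ground instance (for the single constant a) of the canonical example.
recursive = [("p(a)", [], ["q(a)"]),
             ("q(a)", [], ["p(a)"])]

# A stratified program: p(a) <- not q(a), with no rule for q.
stratified = [("p(a)", [], ["q(a)"])]

print(stable_models(recursive))
print(stable_models(stratified))
```

On the ground instance of the canonical example, the enumeration finds two stable models, one containing only `p(a)` and one containing only `q(a)`, which illustrates why no single standard model is available there. The stratified variant has exactly one.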

We also consider logics other than first-order logic (besides the addition of default negation), in particular, modal logic and temporal logic. Such logics extend first-order logic with additional operators. Modal logics deal with modalities, such as possibility and belief. Temporal logics deal with concepts of time, such as past and future.

3 Semantics of Integrity Constraints

We now redefine a deductive database to have three components:⁷

• the extensional database, EDB,
• the intensional database, IDB, and
• the integrity constraints, IC.

⁷ Deductive databases have been defined both with and without integrity constraints in the literature.

The formulas in IC are meant as knowledge about the world, or the domain of the database. They are not intended to generate data, as do the rules in IDB, nor do they represent specific data, as do the facts in EDB. Their use is to verify that the data in the database, both intensional (deduced by the rules) and extensional (facts in EDB), is consistent with the general model of the world, as represented by IC.

Integrity constraints (ICs) in relational databases were introduced for this very purpose: they are used to validate (or invalidate) changes to the database (transactions) to ensure that the new state of the database remains consistent with the world view. However, the world knowledge that the ICs encode has been found to be useful for a wide array of database tasks. We discuss a number of these applications in Section 6.

While the standard view has been that ICs represent knowledge about the world, this position is debatable. In Subsection 3.3, we consider the view that ICs represent knowledge about the database itself. In any case, it is recognized that ICs represent knowledge, general characterizations of how things are (and, hence, are a type of knowledge representation), whereas databases themselves (EDB and IDB, which, together, we call the state of the database) represent data, specific facts about the world (domain).

In this section, we specifically consider the semantics of the database with respect to the integrity constraint component. We first consider what types of knowledge are expressible via integrity constraints. We then consider various model-based semantics that have been proposed for, or extended for, ICs. Last, we consider some extensions that have been suggested for ICs that take them beyond first-order.

3.1 Expressiveness of Integrity Constraints

Often ICs are limited to be formulas in clausal form, as are rules and queries. For ICs in this form, to distinguish them from rules and queries, we write their clauses with '$\Leftarrow$' instead of '$\leftarrow$'. Consider the following example with predicate employee/5 (a 5-ary relation), which has, say, the attributes name, address, salary, dept, and age. The domain constraint that states that all employees are over sixteen years old can be written as:⁸

$$A > 16 \Leftarrow \mathit{employee}(\mathit{age}: A). \tag{1}$$

⁸ We employ for convenience an attribute-extended DATALOG that uses attribute names as is done in SQL with relational databases, to be understood in the obvious way. The "slots" for any attributes not mentioned in an atom are assumed to be filled with unique (existential) variables. (Likewise, a convention adopted from Prolog is to write the special variable name '_' in such slots, to be interpreted as a unique, existential variable.)

The next example expresses the functional dependency of the second attribute on the first.

$$P_1 = P_2 \Leftarrow \mathit{employee}(\mathit{name}: N, \mathit{address}: P_1),\ \mathit{employee}(\mathit{name}: N, \mathit{address}: P_2). \tag{2}$$

Consider that we have another predicate person/3 which has, say, only the attributes name, address, and age. Thus employee/5 is a "sub-type" of person/3. The next IC expresses this relationship, that every employee is a person.

$$\mathit{person}(\mathit{name}: N, \mathit{address}: P, \mathit{age}: A) \Leftarrow \mathit{employee}(\mathit{name}: N, \mathit{address}: P, \mathit{ssn}: S, \mathit{dept}: \_, \mathit{age}: A). \tag{3}$$

Alternatively, this statement could instead be made part of the IDB as a rule, to generate the person relation. Its placement in the IC suggests that the facts for the employee and person relations are obtained separately, and their necessary relationship is expressed by the integrity constraint. Thus, the IC is used to check the state of the database, not to generate any portion of it.

There have been a number of proposals for the syntax of ICs. One is to allow any first-order formula as an IC. From the perspective of implementation, however, it may be useful to restrict the syntactic form of ICs. Often ICs are restricted to denial constraints; that is, formulas of the form

$$\Leftarrow L_1, \dots, L_n.$$

in which the $L_i$'s are atoms if default negation is not part of the database, and are atoms or default negations of atoms if default negation is part of the (normal) database. Many constraints can be represented in the form of normal denial constraints, and all clausal constraints can be represented that way. For the three ICs given above, the corresponding denial constraints are:

$$\Leftarrow \mathit{employee}(\mathit{age}: A),\ A \le 16. \tag{1'}$$

$$\Leftarrow \mathit{employee}(\mathit{name}: N, \mathit{address}: P_1),\ \mathit{employee}(\mathit{name}: N, \mathit{address}: P_2),\ P_1 \ne P_2. \tag{2'}$$

$$\Leftarrow \mathit{employee}(\mathit{name}: N, \mathit{address}: P, \mathit{ssn}: \_, \mathit{dept}: \_, \mathit{age}: A),\ \mathit{not}\ \mathit{person}(\mathit{name}: N, \mathit{address}: P, \mathit{age}: A). \tag{3'}$$

Sometimes extra predicates may need to be introduced to write a constraint as a denial constraint. Consider the following formula, meant as a referential integrity constraint that expresses that the department name for every employee must appear in the department relation (an inclusion dependency):

$$\forall N, P, S, D, A.\ \exists R, T.\ (\mathit{employee}(N, P, S, D, A) \rightarrow \mathit{dept}(D, R, T))$$

To represent this as a denial constraint, we first add a new predicate deptname/1, and the following rule for deptname/1:

$$\mathit{deptname}(D) \leftarrow \mathit{dept}(D, R, T).$$

This rule essentially defines a projection on dept/3. Then the integrity constraint can be represented as follows:

$$\Leftarrow \mathit{employee}(\mathit{name}: \_, \mathit{address}: \_, \mathit{ssn}: \_, \mathit{dept}: D, \mathit{age}: \_),\ \mathit{not}\ \mathit{deptname}(D).$$

Assume dept/3's attributes are name, region, and type. We could likewise write this using attribute-extended DATALOG, without having to introduce a new predicate:

$$\Leftarrow \mathit{employee}(\mathit{dept}: D),\ \mathit{not}\ \mathit{dept}(\mathit{name}: D).$$

Recall that, in our notation for attribute-extended DATALOG, we assume that

$$\mathit{dept}(\mathit{name}: D) \equiv \exists R, T.\ \mathit{dept}(\mathit{name}: D, \mathit{region}: R, \mathit{type}: T)$$

Thus, the existential quantification is inside the not within the integrity constraint, and so the overall quantification is as we intend.

Certain classes of formulas, however, cannot be written in the form of denial constraints. Non-range-restricted ICs, such as

$$\forall X, Y.\ (p(X) \rightarrow q(X, Y))$$

cannot be written as normal denial constraints.

We also have not considered here the use of aggregation (average, maximum, minimum, sum, and such) in integrity constraints, or in rules. Aggregation is also a higher-order concept, and cannot be modeled in first-order logic. In this chapter, we do not consider the use of aggregation in DATALOG. However, we briefly discuss some work to extend ICs to aggregation in Section 4.4.

3.2 Model Semantics

Model semantics offers a way to define formally the meaning of a logic database. As discussed in Section 2, there has been much work, and various approaches, to characterize both DATALOG and DATALOG$^{\lnot}$ databases via model semantics. Once we introduce integrity constraints into databases, it is necessary to extend any model semantic characterization to cover the meaning of the integrity constraints (with respect to the rest of the database) too.
Again, there are a number of approaches.

An interpretation over a logical language is a truth assignment (a valuation) of true or false to every constructible ground atom, or (definite) fact.⁹ It can be represented as a set: any atom in the set is considered true, and any constructible atom not in the set is considered false. Every constructible sentence can then be assigned a valuation with respect to the interpretation, preserving truth in the obvious way over the logical connectives and quantifiers.

⁹ A formula is constructible over a language if it can be constructed with the variables, constants, and predicates in the language and the logical connectives and operators.

A model $M$ of a collection of sentences, say $P$ (a DATALOG database is such a collection), is an interpretation of the language of $P$ under which the valuation of every sentence in $P$ is true. A Herbrand model of $P$ is a model of $P$ that is restricted to only the predicates and constants (and function symbols, in the case of logic programs) that are mentioned in $P$. (See [Boolos and Jeffrey, 1989], [Lloyd, 1987], or another basic text in logic for more background.) We restrict our attention to Herbrand models, and, from this point on, when we say "model", we mean a Herbrand model.

A model of $P$ is called a minimal model if no proper subset of it is also a model of $P$. Consider first a definite DATALOG database DB. As every clause in DB is a rule or a fact, and so has a non-empty head, the consistency of the database (that it has a model) is always assured. (Recall that no negated facts can be deduced from DB.) Furthermore, it can be shown in this case that DB has a unique minimal model. The generally accepted model semantic characterization of DB is this minimal model. We denote this model by $M_{DB}$.

There are two standard approaches to ascribe the meaning of IC. Let $DB = EDB \cup IDB$ again for the following discussion. By the consistency definition [Kowalski, 1978], the ICs are satisfied iff $DB \cup IC$ is consistent; that is, it has a model. By the entailment definition [Reiter, 1984], the ICs are satisfied iff $DB \models IC$; that is, DB logically entails the ICs.

There are problems with both definitions, however. Neither fully captures our intuition of integrity constraints. When ICs are restricted to be denial constraints, these definitions do not differ. The ICs can only serve either to deny the database's model (the minimal model $M_{DB}$), by not being consistent with that model or, likewise, by not being logically entailed by the model, or to confirm it. If an IC may be any clause, however, the two definitions differ.
Under the consistency definition, ICs which are syntactically equivalent to rules (so do not have empty heads) will, indeed, behave as rules. This is because $DB \cup IC$ may be consistent, and thus have a model, but this model need not be the intended model, $M_{DB}$. (It is $M_{DB \cup IC}$ instead.) Thus, the distinction or value in calling these formulas integrity constraints is lost. So this does not capture our intuition that integrity constraints and rules are epistemically different, and that integrity constraints only "check" the database but do not generate data.

Under the entailment definition, these "rule" ICs intuitively still serve only to deny or confirm the database's model according to our notion of ICs. However, the entailment definition does not capture our intuition of integrity constraints either, for a different reason. Consider the empty database (with no clauses). Every interpretation is a model of the empty database. Therefore, no IC (other than a tautology) is logically entailed. This is not what we intend.

A standard way to reconcile the two approaches is to say that the ICs must be entailed by the database's minimal model, $M_{DB}$:

$$M_{DB} \models IC$$

This works fine for definite (non-disjunctive) databases without default negation (DATALOG). The picture becomes complicated, however, when we consider databases which contain disjunction, or which contain default negation (DATALOG$^{\lnot}$). In such cases, there may be no unique minimal model to represent the database's meaning. Instead, there may be a number of minimal models of the database. One approach is to consider the set of a database's minimal models to characterize the database. Again, the entailment and consistency approaches differ. The pure entailment approach is that every minimal model of the database must satisfy IC. The pure consistency approach is that at least one minimal model satisfies IC. In the latter case, the ICs effectively eliminate those models which do not entail them. There exist possible semantics for ICs between the entailment and consistency definitions that are sensible ([Chan, 1993] and [Sakama, 1989]), but we do not consider them here.

Thus once we allow disjunction or default negation in our databases, we may lose the substance of our intuition for integrity constraints, that they only "check" the database and do not contribute to generating data. Any proof procedure, or query evaluation strategy, must employ the ICs, as the ICs effectively eliminate models from consideration. Our intuition is still weakly satisfied: the ICs do not actively generate data, but may indirectly allow for more to be inferred.

Integrity constraints and negation are closely related. Denial constraints are effectively a way to add logical negation into deductive databases. Recall that a headless clause is equivalent to a disjunction of negated atoms. As seen with the consistency and entailment definitions, a database with denial constraints need not be consistent; that is, there may be no model that satisfies both the database's state and its constraints. In [Fernández et al., 1993], it is shown that any normal logic program (or deductive database) can be transformed into a disjunctive logic program (or disjunctive deductive database) with certain associated integrity constraints added. The subset of the set of minimal models of the disjunctive logic program which is also consistent with the ICs is equivalent to the set of stable models [Gelfond and Lifschitz, 1988] of the initial normal logic program. This is a powerful result, and lends credence to the view that integrity constraints and negation are two different approaches to many of the same semantic issues.

We briefly present the algorithm of [Fernández et al., 1993]. For each predicate symbol $a$, a new predicate symbol $\mathcal{E}a$ is introduced.
Let \(a\langle\vec{k}\rangle\) be a fact.[10] The intuitive meaning of the fact \(E_a\langle\vec{k}\rangle\) is that there is evidence for the fact \(a\langle\vec{k}\rangle\). The transformed database is created by the following three transformation rules.

[10] We use the shorthand notation \(\vec{x}\) to refer to a sequence of variables and constants. Thus, given the predicate a/n, \(a\langle\vec{x}\rangle\) is shorthand for \(a(X_1,\ldots,X_n)\), in which each \(X_i\) is a variable or a constant.

1. Add the rule
   \[ E_p\langle\vec{x}\rangle \leftarrow p\langle\vec{x}\rangle. \]
   to the IDB for each predicate symbol p.[11] In effect, this adds a rule that states that if \(p\langle\vec{x}\rangle\) is true, then there exists evidence for \(p\langle\vec{x}\rangle\).
2. Rewrite each rule into a disjunctive rule by moving every negated atom in the rule's body to its head, and replacing the atom with its evidential counterpart.[12] That is, the rule
   \[ A \leftarrow B_1,\ldots,B_n,\ \text{not } D_1,\ldots,\text{not } D_k. \]
   becomes
   \[ A,\ E_{D_1},\ldots,E_{D_k} \leftarrow B_1,\ldots,B_n. \]
3. Introduce the IC
   \[ p\langle\vec{x}\rangle \leftarrow E_p\langle\vec{x}\rangle. \]
   for each predicate symbol p of the database. In effect, this IC states that if there is evidence for \(p\langle\vec{x}\rangle\), then \(p\langle\vec{x}\rangle\) must be true.

[11] Here, \(\vec{x}\) is most general; that is, \(\vec{x} = X_1,\ldots,X_n\) (assuming that p is of arity n), for which each \(X_i\) is a variable, and the \(X_i\)'s are pairwise distinct.
[12] Given atom \(D = d\langle\vec{x}\rangle\), let \(E_D\) be shorthand for \(E_d\langle\vec{x}\rangle\).

The minimal models of the transformed database which are consistent with the integrity constraints, when all predicates of the form \(E_p\) are removed, are precisely the stable models of the original database. The IC removes exactly the models of the transformed database that cannot be stable models of the original.

Again, the intuition of what ICs mean becomes confused in normal databases (DATALOG\(^{\neg}\)), and when we permit normal ICs. First, it is common when we shift to normal databases to restrict our focus to (normal) denial constraints. As discussed above, this is done without loss of generality. One approach to ascribing a semantics to normal ICs (that is, to getting a handle on what they mean) is to convert them to rules. This can be done as follows.

We introduce a new, special predicate symbol called bottom, written as \(\bot/0\), into the database language. We replace every denial constraint with a rule for \(\bot\), by replacing the denial's empty head with \(\bot\). That is, the denial
\[ \Leftarrow A_1,\ldots,A_k. \]
becomes the rule
\[ \bot \leftarrow A_1,\ldots,A_k. \]
The atom \(\bot\) is considered always to be false. Thus, one may consider the database to contain explicitly one single negated fact, \(\neg\bot\).

The advantage of this approach is that one may readily apply semantics that have been defined for normal deductive databases (and normal logic programs), but which do not account for integrity constraints (the stable model semantics, the well-founded semantics, and so forth), to normal deductive databases with integrity constraints (as normal denial constraints). This is because the denial constraints have been converted into normal rules, which the semantics interprets. Special provisions have to be added, of course, to handle appropriately the special atom \(\bot\), and cases when it is deducible. The deducibility of \(\bot\) represents a contradiction with the ICs.
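The conversion of denial constraints into rules for the special atom can be sketched in a few lines. This is only an illustration under simplifying assumptions that are not in the text: rules are ground and negation-free (so a least model can be computed by naive forward chaining), atoms are plain strings, and the names `BOTTOM`, `denials_to_rules`, `least_model`, and `violates_constraints`, as well as the sample facts, are hypothetical.

```python
# Assumption: a rule is (head, [body atoms]); a denial is just a list of body atoms.
# All atoms are ground strings, and no rule uses negation.
BOTTOM = "bottom"  # stands for the special atom "bottom" of the text


def denials_to_rules(denials):
    """Replace each denial's empty head with the special atom BOTTOM."""
    return [(BOTTOM, list(body)) for body in denials]


def least_model(facts, rules):
    """Naive forward chaining over ground, negation-free rules."""
    model = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in model and all(atom in model for atom in body):
                model.add(head)
                changed = True
    return model


def violates_constraints(facts, rules, denials):
    """The ICs are contradicted exactly when BOTTOM becomes deducible."""
    return BOTTOM in least_model(facts, rules + denials_to_rules(denials))


# Hypothetical sample data.
facts = {"male(pat)", "parent(pat)"}
rules = [("father(pat)", ["male(pat)", "parent(pat)"])]
denials = [["male(pat)", "female(pat)"]]

consistent_before = not violates_constraints(facts, rules, denials)
consistent_after = not violates_constraints(facts | {"female(pat)"}, rules, denials)
```

Handling normal rules (with default negation) would require one of the semantics named above, such as the stable model semantics, in place of the simple forward chaining used here.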
The semantics might be modified to discount any (minimal) models in which \(\bot\) appears, as in the generalized consistency definition above.

3.3 Epistemics of Integrity Constraints

An epistemic approach that employs modal logic has also been promoted for expressing integrity constraints in [Reiter, 1992]. This approach allows greater expressiveness for queries and ICs, so that queries may ask not only about the facts of the database, but also about what the database knows. Reiter's view here is that integrity constraints are knowledge about what the database knows, rather than knowledge about the actual world. The database's state remains unaltered. The rules and facts are written in first-order logic. However, the integrity constraints and queries are allowed to be statements in the modal logic. The proposed modal logic is KFOPCE[13] as introduced in [Levesque, 1984]. The modal operator introduced, K, is intended as "know".

[13] FOPCE stands for first-order predicate calculus with equality. KFOPCE is FOPCE extended with a single belief modal operator, K.
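The difference between asking about the world and asking about what the database knows can be made concrete with a small propositional sketch. It is a deliberate simplification, not Levesque's full KFOPCE semantics: we assume the database is represented by the set of truth assignments (possible worlds) that satisfy it, that an ordinary query is answered yes, no, or unknown over those worlds, and that a query K q is answered yes only when q holds in every world. The function names and the sample database (the disjunctive fact a or b, plus the fact c) are hypothetical.

```python
from itertools import product


def models(atoms, holds):
    """All truth assignments over `atoms` that satisfy the predicate `holds`."""
    result = []
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if holds(world):
            result.append(world)
    return result


def ask(db_models, query):
    """Ordinary query: 'yes', 'no', or 'unknown', judged over all models."""
    truth = [query(world) for world in db_models]
    if all(truth):
        return "yes"
    if not any(truth):
        return "no"
    return "unknown"


def ask_known(db_models, query):
    """Query of the form K(query): is the query known to hold?"""
    return "yes" if all(query(world) for world in db_models) else "no"


# Hypothetical database: the disjunctive fact (a or b) together with the fact c.
db = models(["a", "b", "c"], lambda w: (w["a"] or w["b"]) and w["c"])

answer_a = ask(db, lambda w: w["a"])          # a might or might not be true
answer_k_a = ask_known(db, lambda w: w["a"])  # a is not known to be true
answer_k_c = ask_known(db, lambda w: w["c"])  # c is known to be true
```

Under these assumptions a modal query never returns unknown: the database either knows the statement or it does not.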


For example, let the database DB consist of the single disjunctive fact \((p \lor q)\). The query p has the answer unknown. Neither the answer yes nor no is appropriate; p might be true. However, the query K p has the answer no. That is, "No, p is not known to be true."

Consider that DB should never assign the same individual to be both male and female. The following KFOPCE IC expresses this.
\[ \Leftarrow K(\text{male}(X), \text{female}(X)). \]
[14]
To express the functional dependency of the second attribute of a predicate r on its first, we can write the IC as
\[ K(Y = Z) \Leftarrow K\,r(X,Y),\ K\,r(X,Z). \]
Integrity constraint satisfaction is defined via entailment in the KFOPCE modal logic. Reiter also shows how query evaluation can be done in this logic.

[14] Note that K scopes across both atoms in this constraint. It is an important distinction that this has a different meaning from the constraint \(\Leftarrow K\,\text{male}(X),\ K\,\text{female}(X)\).

The approach in [Demolombe and Jones, 1996] also employs a modal logic. They attempt to clarify some aspects of Reiter's work, in particular, concerning the interpretations and justifications of ICs. On one point, they distinguish among different causes in a database which can lead to IC violation: the presence of false facts versus a lack of true facts in the database. For them, a database may contain incorrect information. Some incorrect facts can be discounted when they are at odds with the database's integrity constraints. The two concepts of the validity and the completeness of a statement with respect to the external world, and what the database believes about the world, are defined formally.

In their logic, the modal operator B denotes "believe". In brief, a sentence P is valid iff \(P \leftarrow BP\) is provable from the world and database beliefs. It is complete iff \(BP \leftarrow P\) is provable.

In this framework, there are three important sets:
1. DB: the beliefs of the database,
2. SAF: the beliefs of the database (a type of integrity constraint) guaranteed to be true about the world (safe beliefs), and
3. IC: integrity constraints that indicate the parts of DB for which validity or completeness must be enforced (the traditional role of ICs).

Thus Demolombe and Jones seek to have both integrity constraints that express knowledge about the world and integrity constraints that express knowledge about the database itself. Their approach may allow for additional flexibility in the handling of ICs.

4 Types of Integrity Constraints

The previous section provided examples and theories of ICs. This section considers different types of ICs that have been proposed, and the types of knowledge they try to capture. First, we consider the distinction between static and temporal ICs. Second, we distinguish between state and universal ICs. Then we discuss user and preference constraints, IC-like formulas that modify the view of a database for a given user. Finally, we consider work on constraints with aggregate predicates.

4.1 Static versus Temporal

Static ICs refer to a given state of the DB. All the ICs that we have discussed so far have been static. Consider, for instance, the IC that states that all employee ages are over sixteen. The constraint must be true for any state of the DB, and can be checked against any state, without reference to any other states the database has had, or will have. The standard relational ICs (such as functional dependencies, multivalued dependencies, referential constraints, and domain constraints) are static ICs [Das, 1992].

Other constraints may be dynamic. They are constraints across states of the database. No one database state provides enough information to check a dynamic constraint. Consider the following.
- A new employee's salary must be less than $50,000.
- The salary of an employee may never be decreased.
When consecutive states of the database represent time, such dynamic constraints may be called temporal. Several different approaches have been developed for dealing with temporal ICs. In this subsection, we present some of these techniques. In particular, we show how to express the two temporal ICs from above using different formalisms. For our examples, recall the predicate employee/5 introduced in Section 3.1, and consider that attribute name is the key of the relation.

One general approach to dealing with temporal ICs is to add action rules. Three action relations are added for each relation in the schema: one each for insertion, deletion, and update. (See [Das, 1992] for background.) For example, given the employee/5 relation, the new action relations added are ins_employee/5, del_employee/5, and upd_employee/10.
The ins_employee/5 relation is used to refer to the situation when a new employee record (fact) is inserted. Thus, the first temporal constraint from above is written as
\[ S < 50000 \Leftarrow \text{ins\_employee}(\_, \_, S, \_, \_). \tag{4} \]
In the case of updates, the number of attributes is doubled: the first set represents the database state before the update, and the second set stands for the database state after the update. For instance,
\[ S_i \le S_f \Leftarrow \text{upd\_employee}(N, \_, S_i, \_, \_, N, \_, S_f, \_, \_). \tag{5} \]
is a temporal IC that represents the second constraint from above.

Many different ways can be used to express time in relational databases. One standard approach is to add two extra columns to each table, call them start and end [Navathe and Ahmed, 1993]. The meaning of a tuple then is that the tuple (for its other attributes) is valid over the time period indicated by start to end. Within this framework, various temporal operators that employ the time intervals are useful. The special constant now may be used as a time value, with the obvious meaning. Assume that a relational calculus query may contain a single time attribute time. A tuple satisfies the query (is an answer) temporally if the value t of attribute time in the query is such that \(t_s \le t \le t_e\), in which \(t_s\) is the value of start and \(t_e\) of end in the "matching" tuple in the database. Thus the previous ICs can be re-expressed as
\[ S < 50000 \Leftarrow \text{employee}(\text{name}: N, \text{salary}: S, \text{time}: T_1),\ \text{not}(\forall T_2.\ \text{employee}(\text{name}: N, \text{time}: T_2),\ T_1 > T_2). \tag{4'} \]
\[ S_s \le S_e \Leftarrow \text{employee}(\text{name}: N, \text{salary}: S_s, \text{time}: T_s),\ \text{employee}(\text{name}: N, \text{salary}: S_e, \text{time}: T_e),\ T_s \le T_e. \tag{5'} \]
We take liberties with our use of default negation here to allow it to scope over a conjunction of atoms, rather than over just a single atom as is usual. This could be rewritten in clausal form via a transformation similar to the one in Section 3.1, to remove also the nested universal quantification required above.

Past First-Order Temporal Logic (Past FOTL) is a useful language for expressing temporal ICs, as explained in detail in [Chomicki, 1995]. Most temporal ICs are expressible in this language in a natural manner, and the ICs can be checked efficiently. This language uses a unary operator previous time, denoted as \(\bullet\), and a binary connective since. An advantage of the formalism is that an IC can relate events arbitrarily separated in time. For the IC to be defined as concisely as possible, it is convenient to use the temporal connective \(\Diamond\), which means "sometime in the past". It is defined via the connective since. In this language, our example ICs can be expressed as follows.
\[ S < 50000 \Leftarrow \text{employee}(\text{name}: N, \text{salary}: S),\ \text{not}(\Diamond\,\text{employee}(\text{name}: N)). \tag{4''} \]
\[ S_1 \le S_2 \Leftarrow \Diamond\,\text{employee}(\text{name}: N, \text{salary}: S_1),\ \text{employee}(\text{name}: N, \text{salary}: S_2). \tag{5''} \]

A fourth way to handle change in a database is the situation calculus. This approach requires us to extend our logic databases to allow terms (from first-order logic), albeit the use of terms is restricted. Each predicate in the database is extended with one more attribute, call it state. Only this attribute of any atom is allowed to have a complex term (not just a variable or constant) as a value.
In one approach, a single functor trans/3 is introduced into the language: the first argument is the type of "change", the second argument is the value of the "change", and the third argument represents the current "state" of the database. The types of change might include, for instance, add and delete. Every type of transaction must have rules in the database that describe what is new in the database after it occurs. Thus the notion of time is simple: change is represented as a sequence of transactions (represented, say, as a nesting of states via trans/3) that have been applied to the database.[15] Any state attribute may only take as a value the initial state, denoted by \(\emptyset\), or a trans/3 term. Our example ICs can be cast as follows.
\[ S < 50000 \Leftarrow \text{employee}(\text{name}: N, \text{salary}: S, \text{state}: \text{trans}(\_, \_, W)),\ \text{not employee}(\text{name}: N, \text{state}: W). \tag{4'''} \]
\[ S_1 \ge S_2 \Leftarrow \text{employee}(\text{name}: N, \text{salary}: S_1, \text{state}: \text{trans}(\_, \_, W)),\ \text{employee}(\text{name}: N, \text{salary}: S_2, \text{state}: W). \tag{5'''} \]

[15] That this sequence must be canonical, thus serializable, is the same requirement made for transactions in relational database systems.

The situation calculus is quite flexible, and allows us to write easily transaction ICs that are specific to given types of transactions. Another advantage of the situation calculus approach is that it makes explicit in the logic the transactions made over the database, so that inferences can be made across them. This has also been its primary disadvantage: it incurs the so-called frame problem that people encountered first when using approaches related to the situation calculus in artificial intelligence. Inference rules must be added to the database system to represent the frame axioms for reasoning across states. The problem is that it is seemingly difficult and prohibitively expensive to reason across states to infer what remains true after a transaction, and what does not [Hayes, 1973]. (Reasoning is inherently harder because this representation is closer to the full first-order predicate calculus, as some use of terms is permitted.)[16] In [Pinto and Reiter, 1993] and [Reiter, 1995], the authors argue for the situation calculus as a representation for temporal databases, and show how the frame problem can be alleviated.

As demonstrated, there are a number of approaches to representing time and change formally in databases. Such a formalism offers us a way to extend integrity constraint semantics in a formal way for temporal databases, and to define formally dynamic constraints. So far, there is no consensus on how to model time and change, nor are the advantages and disadvantages of the different approaches well known. However, it seems to have been well demonstrated that the notions of time and change in databases can be formally modeled, and that integrity constraints can be well defined in such models.

4.2 State versus Universal

State constraints are equivalent to integrity constraints within a given state of the database, but are assigned a different meaning.
They describe what is true in the database in a given state; hence, they may be invalidated on updates. This reverses the usual epistemological weight of ICs. Usually, a new database state is only valid if it satisfies the ICs. For state constraints, any state is valid with respect to them, but a new state may invalidate the constraint.

[16] In [Chomicki, 1995] it is shown that the addition of terms in this manner is still computable.

One way in which state constraints may arise is via queries that fail.[17] Until the database changes in such a way that the queries then succeed, such queries can remain designated as state constraints. Formally, state constraints may be defined in a similar fashion to traditional ICs, using the entailment or consistency definition. However, now it is with respect to the database at a given time.

[17] Unless, of course, the query failed because it is subsumed by a universal integrity constraint; in this case, we do not get a state constraint.

The reason that state constraints are useful is that they can be used the same way as integrity constraints in many applications. (See Sections 6.1 and 6.2.) It is possible to use more sophisticated mechanisms to generate state constraints, such as by extracting the minimal failing subqueries from failed queries (Section 6.2).

Since state constraints may be invalidated by database updates, a mechanism for testing their validity is necessary. This can be accomplished via the same techniques database systems use to check the validity of updates in the presence of "universal" ICs. When an update violates a universal integrity constraint, the transaction is rolled back. However, when an update violates a state constraint, the transaction succeeds, but the state constraint is deleted.

4.3 User and Preference Constraints

ICs normally express statements that must be true with respect to the database (or, at least with respect to the database's state, as with state constraints). We can define another type of "constraint", however, that need not even be consistent with the database. Such constraints can be used on behalf of given users to change the evaluation of queries for those users. This provides the user a specific semantics on the database to that user's liking, above and beyond the database's semantics. Such user constraints then are statements that declare a user's restrictions. Consider the following statement:
\[ S > 30000 \Leftarrow \text{employee}(N, A, S, D, A). \]
Treated as an IC, this would state that all salaries must be more than $30,000. If it is possible that an employee have a salary of $30,000 or less, this cannot be an IC. However, suppose a particular user wishes to restrict consideration of employees, as answers to queries, to only those whose salary is over $30,000. In this case, the statement above could be employed as a user constraint.

Another example for user constraints is as follows. Consider a flight database. An individual may wish to restrict consideration to flights that do not include the JFK airport as a stopover. Clearly, there may be many flights which use JFK as a stopover, but it seems unreasonable to list them in an answer set for a user who is not interested in them.

The type of user constraint presented above can also be expressed via views. However, this does not account for the common situation in which a user would like to see the answers to a query in some specific order, based on preferences.
Returning to the example of flights, we may have certain preferences, such as specific airlines, times, or routes. As shown in [Gaasterland and Lobo, 1994], such preferences can be handled via annotated user constraints. We briefly illustrate some of these concepts. The idea of the annotation is that it represents a preference for a particular IC. The annotation values must be (partially) ordered. (In general, they form a lattice structure.) A simple set might be, along with the ordering:
\[ \text{bad} < \text{OK} < \text{good} < \text{excellent} \]
Consider now the following constraints (based on [Gaasterland and Lobo, 1994]):
\[ \text{nonstop\_flight}(A, B, \text{Date}, \text{Flight}) : \text{excellent}. \]
\[ \text{direct\_flight}(A, B, \text{Date}, \text{Flight}) : \text{good}. \]
\[ \text{indirect\_flight}(A, B, \text{Date}, \text{Flight}) : \text{OK}. \]
\[ \text{stopover}(\text{Flights}, \text{Airport}) : \text{bad} \Leftarrow \text{london\_airport}(\text{Airport}). \]
When the individual asks a query for a flight between certain cities, nonstop flights are noted as excellent and given first, direct flights are noted as good, and so on. The user may request that annotated answers below a threshold value not be presented.
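The ordering and thresholding of annotated answers can be sketched as follows. This is an illustration only, not the mechanism of [Gaasterland and Lobo, 1994]: we assume a total order on the four annotation values, we assume that an answer matched by several annotated constraints receives the lowest of their annotations, and the flight records, airport codes, and function names are hypothetical.

```python
# Annotation values with the ordering bad < OK < good < excellent.
RANK = {"bad": 0, "OK": 1, "good": 2, "excellent": 3}

# Annotations by flight type, mirroring the first three constraints above.
KIND_ANNOTATION = {"nonstop": "excellent", "direct": "good", "indirect": "OK"}

# Hypothetical extension of the london_airport predicate.
LONDON_AIRPORTS = {"LHR", "LGW"}


def annotate(flight):
    """Assumption: when several constraints apply, keep the lowest annotation."""
    labels = [KIND_ANNOTATION[flight["kind"]]]
    if any(airport in LONDON_AIRPORTS for airport in flight["stopovers"]):
        labels.append("bad")
    return min(labels, key=RANK.get)


def ranked_answers(flights, threshold="bad"):
    """Drop answers annotated below the threshold; list the best ones first."""
    annotated = [(annotate(f), f["id"]) for f in flights]
    kept = [(label, fid) for label, fid in annotated if RANK[label] >= RANK[threshold]]
    return sorted(kept, key=lambda pair: RANK[pair[0]], reverse=True)


# Hypothetical answers to a query for flights between two cities.
flights = [
    {"id": "F1", "kind": "indirect", "stopovers": ["LHR"]},
    {"id": "F2", "kind": "direct", "stopovers": ["ORD"]},
    {"id": "F3", "kind": "nonstop", "stopovers": []},
    {"id": "F4", "kind": "indirect", "stopovers": ["CDG"]},
]

all_ranked = ranked_answers(flights)
at_least_ok = ranked_answers(flights, threshold="OK")
```

With a genuinely partial order (a lattice), the `min` and the sort would have to be replaced by a meet operation and a topological ordering, respectively.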


4.4 Aggregate Constraints

Additional predicates must be introduced to handle ICs involving aggregates [Das, 1992]. Such integrity constraints cannot be expressed in first-order logic. We briefly consider the standard built-in aggregate operations of SQL: count, sum, avg, max, and min. Since the latter four are handled in a similar way, we consider only count and sum here. For representing count, the aggregate operator count(Q, N) is used, in which Q is a query and N is an integer interpreted as the number of answers to Q. For representing sum, the aggregate operator sum(X, Q, R) is used, in which Q is a query and R is interpreted as the sum over all values of the variable X (X must appear in query formula Q) that are returned in Q's answer set. Here are two examples.

The number of students in a section must be at least ten.
\[ \Leftarrow \text{student}(Y, U, V),\ \text{count}(\text{section}(X, Y, Z), T),\ T < 10. \]
The total of the employee salaries may not exceed one million.
\[ \Leftarrow \text{sum}(U, \text{employee}(X, Y, Z, U, V), R),\ R > 1000000. \]
The ability to handle ICs with aggregation is necessary, because many relational database applications use aggregates. More work is needed in this area, to define such ICs formally and to devise ways to handle them efficiently.

5 Reasoning with Integrity Constraints

We advocate the explicit representation of the database's integrity constraints (in IC), just as the database's rules and facts are represented explicitly. Then it is explicitly clear what the constraints of the database are. When the integrity constraints are left implicit, it is not necessarily obvious what the semantics of the database is, and the semantic information carried by the ICs is not available to applications.

There is, however, the separate issue of how one should best reason over a database with integrity constraints.
It might be easier, in certain contexts or for certain applications, to reason about a database in which integrity constraints have been processed partially, or eliminated altogether. A transformation of a database in such a way should result in an equivalent database, of course. That is, every query posable to the original database should result in the same set of answers when posed to the transformed database. Such reasoning techniques are important not only for applications which employ integrity constraints, but also may be necessary as part of the query evaluation itself.

We present three techniques for processing integrity constraints. The most general one, presented in [Kowalski and Sadri, 1988], describes the transformation of a deductive database (or logic program) with integrity constraints into an equivalent one without constraints. The result of this transformation is a normal deductive database (or logic program) with some of the original deductive rules modified via "incorporating" the information of the integrity constraints into them. In [Fernández, 1994], all integrity constraints are expressed as denial rules (as described in Section 3.1). The denial rules are used to eliminate semantically meaningless minimal models (those models which are inconsistent with the denial rules) in disjunctive deductive databases. The method described in [Chakravarthy et al., 1990] compiles out the integrity constraints too. The approach does not eliminate integrity constraints by rewriting deductive rules, however, but rather annotates the rules with relevant pieces of the integrity constraints called residues, to be used during query evaluation.

5.1 Eliminating Integrity Constraints

The transformation of Kowalski and Sadri [Kowalski and Sadri, 1988] not only eliminates integrity constraints, but also makes the modified database consistent with them. When the original theory is inconsistent, the new theory represents a particular way of restoring consistency. The transformation is done as follows.

Let \(\Leftarrow a\langle\vec{x}\rangle, L_1,\ldots,L_n\) be an integrity constraint (expressed as a denial rule) to be eliminated, in which \(a\langle\vec{x}\rangle\) is an atom and \(L_1,\ldots,L_n\) are atoms or default negated atoms. Then each deductive rule of the form
\[ a\langle\vec{y}\rangle \leftarrow K_1,\ldots,K_m. \tag{6} \]
in which \(K_1,\ldots,K_m\) are literals, and for which \(a\langle\vec{x}\rangle\) and \(a\langle\vec{y}\rangle\) unify with most general unifier \(\theta\) [Lloyd, 1987], is replaced by at most three new rules. Intuitively, the first new rule deals with the case when the integrity constraint is applicable to rule (6). The next two rules, potentially to be added, handle the case when it is not applicable because \(\vec{x}\) and \(\vec{y}\) do not unify. The first rule is
\[ a\langle\vec{y}\rangle\theta \leftarrow K_1\theta,\ldots,K_m\theta,\ \text{not}(L_1,\ldots,L_n)\theta. \]
[18]
The next two rules are needed only when \(a\langle\vec{y}\rangle\theta\) is a non-trivial instance of \(a\langle\vec{y}\rangle\) (that is, not a variant of \(a\langle\vec{y}\rangle\)). The first of these is
\[ a\langle\vec{y}\rangle \leftarrow K_1,\ldots,K_m,\ \text{not } t\langle\vec{y}\rangle. \]
in which t is a new predicate symbol, and there is a single rule defined for it as follows.
\[ t\langle\vec{y}\rangle \leftarrow (\vec{y} = \vec{x}). \]
[19]
If \(n = 0\), the negation of the empty conjunct is definitionally false, in which case the first new rule does not need to be added.
If \(n > 1\) (so the conjunct contains more than one literal), the rule may be written instead as a collection of deductive rules, employing a technique similar to the one used in Section 3.1 to rewrite an arbitrary integrity constraint as a denial constraint.

[18] Again, we take liberties with our use of default negation here to allow it to scope over a conjunction of atoms.
[19] Here, \((\vec{y} = \vec{x})\) represents a conjunction of equalities. Given \(\vec{y} = Y_1,\ldots,Y_n\) and \(\vec{x} = X_1,\ldots,X_n\), then \((Y_1 = X_1),\ldots,(Y_n = X_n)\) is meant.

It is easy to show that the transformed database, call it \(DB'\), is always consistent with the original set of integrity constraints IC, in the sense that there exists no SLDNF refutation [Lloyd, 1987] of any variable-free instances of clauses in \(DB' \cup IC\). It can also be shown that the original database DB with IC is equivalent to \(DB'\), in the sense that any arbitrary query results in the same set of answers for either database (the original or the transformed).

There are several problems inherent in this approach, however. First, by incorporating ICs into IDB rules, the epistemic distinction that is usually drawn between ICs and rules is lost. Second, if an IC that is to be eliminated contains extensional predicates only, then some of the EDB facts would have to be expressed as rules in the transformation. Since a database user may still want to update such facts, the update would now have to be a view update. Third, incorporating ICs into rules introduces default negation into the transformed database; this may not be desired when the original database does not use default negation.

The following two subsections describe two techniques for eliminating integrity constraints that largely avoid the above problems. The first technique allows for the complete elimination of ICs in the restricted domain of disjunctive databases, in which disjunctive facts are represented via minimal models. In the second approach, ICs are eliminated not by explicit rule rewriting, but by introducing a new type of information stored with rules, called residues.

5.2 Model Elimination

In normal and disjunctive databases, knowledge is often considered synonymous with the set of canonical models that characterize the semantics of the database. For a normal database, the canonical models might be its stable models (if the stable model semantics has been chosen), or, for a disjunctive database, its minimal models. The semantics for a definite database is often determined by a unique minimal model, and so the consistency and entailment definitions of ICs (as presented in Section 3.2) coincide, and the ICs can be disregarded. Given a normal or disjunctive database and a set of integrity constraints that is consistent with the database, only a subset of the database's canonical models should be considered (under the consistency approach).
If the canonical models of the database are not explicitly known (and most proof procedures and query evaluation strategies do not manifest the models), it is necessary to query the database unioned with the ICs.

Thus under the consistency interpretation, integrity constraints must play an active role in the deductive process. Models that do not satisfy an integrity constraint are eliminated, as they are not considered to be a part of the semantic characterization of the database. The possible reduction of the number of models describing the semantics of a database implies the possibility that more information can be deduced from it. ICs play the role of filters to eliminate models that are not meaningful in the database. Consider the following example [Fernández, 1994].

Example 1. Let the clause \(\Leftarrow a(X), b(X)\) be an integrity constraint and let
\[ DB = \{\, a(2) \lor b(2),\ a(1),\ b(1) \lor b(2) \,\} \]
for which the semantics is represented by the set of its minimal models:
\[ \{\, \{a(1), a(2), b(1)\},\ \{a(1), b(2)\} \,\} \]
The model \(\{a(1), b(2)\}\) is the only one that satisfies the constraint; hence, DB does not entail the constraint, but is consistent with it.

Note that if the query b(X) is asked, we should get the answer b(2). This requires use of the IC. Without the IC, we would get no answers, as the intersection of the two minimal models is empty with respect to predicate b.[20]

[20] In a disjunctive logic programming system (or a disjunctive deductive database system) which provides disjunctive answers (see [Lobo et al., 1992]), the weaker disjunctive answer \((b(1) \lor b(2))\) would be returned.
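Example 1 is small enough to check mechanically. The sketch below enumerates the minimal models of the ground disjunctive database by brute force, filters them with the denial constraint, and intersects the surviving models to answer the query b(X). It is meant only to make the example concrete: the encoding of atoms as (predicate, constant) pairs and the function names are our own assumptions, and exhaustive enumeration is not how a real system would proceed.

```python
from itertools import combinations


def minimal_models(atoms, clauses):
    """Brute force. Each clause is a set of atoms read as a positive disjunction.

    Candidates are tried in order of increasing size, so a candidate is
    minimal exactly when no previously found model is a subset of it.
    """
    found = []
    for size in range(len(atoms) + 1):
        for candidate in combinations(atoms, size):
            model = frozenset(candidate)
            is_model = all(model & clause for clause in clauses)
            if is_model and not any(prev <= model for prev in found):
                found.append(model)
    return found


def satisfies_denial(model):
    """The IC of Example 1: no X may satisfy both a(X) and b(X)."""
    return not any(("a", x) in model and ("b", x) in model for x in (1, 2))


def certain_b(models):
    """Answers to b(X): the constants X with b(X) true in every given model."""
    if not models:
        return set()
    common = frozenset.intersection(*models)
    return {x for (pred, x) in common if pred == "b"}


atoms = [("a", 1), ("a", 2), ("b", 1), ("b", 2)]
db = [{("a", 2), ("b", 2)}, {("a", 1)}, {("b", 1), ("b", 2)}]

all_min = minimal_models(atoms, db)
kept = [m for m in all_min if satisfies_denial(m)]

answers_without_ic = certain_b(all_min)
answers_with_ic = certain_b(kept)
```

Running the sketch reproduces the two minimal models listed in the example, keeps only the model containing a(1) and b(2), and returns b(2) as the sole answer once the IC is applied.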


As long as the ICs contain indefinite predicates only (that is, predicates used to express disjunctive facts), they can be eliminated by removing models in which they are not satisfied.

In logic programming for disjunctive theories, a new inference rule must be added for the query processing proof procedure to be complete. For example, the SLI proof procedure extends the traditional SLD procedure with factoring and reduction to accomplish this. (See [Lobo et al., 1992].) An extra inference rule named model elimination in [Loveland, 1969] also allows for complete inferencing over disjunctive theories, and will also account for denial constraints; it is implemented and discussed further in [Stickel, 1988]. This work also applies to disjunctive databases with integrity constraints.

5.3 Residue Method

The basic idea behind the residue method [Chakravarthy et al., 1986a, Chakravarthy et al., 1986b, Chakravarthy et al., 1990] is the use of a theorem proving technique called partial subsumption to attach integrity constraint fragments called residues to relations and rules. Hence, we start with a presentation of the concept of subsumption. A clause \(C_1\) subsumes a clause \(C_2\) if there is a substitution \(\theta\) such that \(C_1\theta\) is a subclause of \(C_2\). For example, if
\[ C_1:\quad p(X,Y,a) \leftarrow q(X,Z),\ r(X,Y,Z,a). \]
and
\[ C_2:\quad p(b,Y,a) \leftarrow q(b,Y),\ r(b,Y,Y,a),\ s(a). \]
then \(C_1\) subsumes \(C_2\) by the substitution \(\{X = b, Z = Y\}\).

To illustrate partial subsumption, we show the basic procedure to test for subsumption between \(C_1\) and \(C_2\). First, the clause we are testing to be subsumed (\(C_2\) in this case) is instantiated to a ground clause via a substitution \(\sigma\) which uses all new constants \(k_1,\ldots,k_n\), not present in \(C_1\) or \(C_2\). This results in
\[ C_2\sigma:\quad p(b,k_1,a) \leftarrow q(b,k_1),\ r(b,k_1,k_1,a),\ s(a). \]
in which \(\sigma = \{Y = k_1\}\).
Next, the clausal formula \(C_2\sigma\) is negated, resulting in the conjunction of one denial rule and three facts:
\[ \neg C_2\sigma:\quad \{\, (\Leftarrow p(b,k_1,a)),\ q(b,k_1),\ r(b,k_1,k_1,a),\ s(a) \,\}. \]
The final step is the construction, if possible, of a linear refutation tree, with \(C_1\) as the proposed subsuming clause, the root, using at each resolution step only clauses from \(\neg C_2\sigma\). It is known that \(C_1\) subsumes \(C_2\) iff there is a refutation tree that ends with the null clause. In our example, such a tree exists, and is shown in Figure 1.

Figure 1: Refutation tree for subsumption.
  Root: \(p(X,Y,a) \leftarrow q(X,Z),\ r(X,Y,Z,a)\)
  Resolve with \(\leftarrow p(b,k_1,a)\) under \(\{X = b, Y = k_1\}\), giving \(\leftarrow q(b,Z),\ r(b,k_1,Z,a)\)
  Resolve with \(q(b,k_1)\) under \(\{Z = k_1\}\), giving \(\leftarrow r(b,k_1,k_1,a)\)
  Resolve with \(r(b,k_1,k_1,a)\), giving the null clause \(\leftarrow\)

The subsumption algorithm can be applied to determine partial subsumption, with an integrity constraint taking the part of the subsuming clause and a relation (negated) taking the part of the subsumed clause. In general, the null clause is not obtained, because integrity constraints rarely subsume relations.[21] However, an IC might partially subsume a relation, leaving a fragment at the bottom of the refutation tree. Such a fragment is called a residue. A residue, if nontrivial, represents an interaction between a relation and an integrity constraint.

[21] There is something odd about the database's design if they do.

Before commencing subsumption, constants which occur in ICs must be handled specially, as the following illustrates. Let r(X,Y,Z) be a relation and \(X = a \Leftarrow r(X,b,Z)\) be an integrity constraint. The subsumption algorithm cannot work in this case, because the IC's clause cannot be resolved with \(r(k_1,k_2,k_3)\) (the instantiation of r(X,Y,Z)). A transformation called expansion is applied to the IC which allows for the subsumption to proceed. In this case, expanding \((X = a) \Leftarrow r(X,b,Z)\) yields \((X = a) \Leftarrow r(X,Y,Z), (Y = b)\), which can be resolved with \(r(k_1,k_2,k_3)\) to yield \((k_1 = a) \leftarrow (k_2 = b)\). This is not yet the residue. The grounding substitution \(\sigma\) must first be reversed by applying \(\sigma^{-1}\). The residue is then \((X = a) \leftarrow (Y = b)\), which intuitively represents the integrity constraint's effect on the predicate r.

In some cases an extra step, called a reduction, is necessary for obtaining a useful residue. This step basically takes care of removing redundant atoms. We leave out the details of expansion and reduction here.

Next, we state the definitions of partial subsumption and residue.

Definition 5.1. We say that IC partially subsumes the atom A iff IC does not subsume A, but a subclause of \(IC^{+}\) (in which \(IC^{+}\) is a result of expanding IC) subsumes A.

Definition 5.2. Given an integrity constraint IC and atom R, apply the subsumption algorithm to \(IC^{+}\) and R until no more resolutions are possible. Let C be the clause at the bottom of a refutation tree. Then \((C^{-})\sigma^{-1}\) (in which \(C^{-}\) is a result of reducing C) is a residue of IC and R.
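The subsumption test that underlies Definitions 5.1 and 5.2 can be sketched as a direct search for the substitution. This is an assumption-laden illustration rather than the refutation-tree construction described above: a clause is a list of literals, each literal a tuple whose first component is a signed predicate name ("+p" for a head atom, "-q" for a body atom), variables are strings starting with an upper-case letter, and the function names are hypothetical. As in the text, the clause to be subsumed is first grounded with fresh constants k1, k2, and so on, which are assumed not to occur in either clause.

```python
def is_var(term):
    """Assumed convention: variables start with an upper-case letter."""
    return term[:1].isupper()


def ground(clause):
    """Replace each variable of the clause by a fresh constant k1, k2, ..."""
    sigma = {}
    grounded = []
    for lit in clause:
        args = []
        for term in lit[1:]:
            if is_var(term):
                term = sigma.setdefault(term, "k%d" % (len(sigma) + 1))
            args.append(term)
        grounded.append((lit[0], *args))
    return grounded, sigma


def match(lit1, lit2, theta):
    """Extend theta so that lit1 under theta equals the ground lit2, else None."""
    if lit1[0] != lit2[0] or len(lit1) != len(lit2):
        return None
    theta = dict(theta)
    for s, t in zip(lit1[1:], lit2[1:]):
        if is_var(s):
            if theta.setdefault(s, t) != t:
                return None
        elif s != t:
            return None
    return theta


def subsumes(c1, c2):
    """Return theta with c1*theta a subclause of the grounded c2, or None."""
    target, _sigma = ground(c2)

    def search(i, theta):
        if i == len(c1):
            return theta
        for lit in target:
            extended = match(c1[i], lit, theta)
            if extended is not None:
                found = search(i + 1, extended)
                if found is not None:
                    return found
        return None

    return search(0, {})


# The clauses C1 and C2 of the subsumption example in this section.
c1 = [("+p", "X", "Y", "a"), ("-q", "X", "Z"), ("-r", "X", "Y", "Z", "a")]
c2 = [("+p", "b", "Y", "a"), ("-q", "b", "Y"), ("-r", "b", "Y", "Y", "a"), ("-s", "a")]

theta = subsumes(c1, c2)
reverse = subsumes(c2, c1)
```

For these clauses the search binds X to b and both Y and Z to the fresh constant k1, which corresponds to {X = b, Z = Y} once the grounding substitution is reversed; in the other direction no substitution exists. Computing a residue would additionally require the expansion and reduction steps, which this sketch omits.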


Several different residues can result between an integrity constraint and a relation. If the residue is IC⁻, that is, there was no resolution, we call it a maximal residue. We call a residue that is always true, such as (a = a), a redundant residue. Finally, we call an integrity constraint merge-compatible with a relation if there is at least one residue that is neither maximal nor redundant. These are the residues that can be used in semantic query optimization, as described in Section 6.1.

6 Applications of Integrity Constraints

As integrity constraints are used to represent the semantics of the database, they are useful in many advanced database applications. Certainly, ICs are vital for maintaining the consistency of the database, and for updating the entries in the extensional database. This is the original reason why integrity constraints were introduced. In this section, we describe other important applications, which include semantic query optimization, cooperative query answering, view updates (updating intensional relations), the use of ICs to combine databases meaningfully and to resolve inconsistencies, and several additional applications.

6.1 Semantic Query Optimization

Semantic query optimization (SQO) is a technique that uses integrity constraints to improve the efficiency of query evaluation. Most SQO algorithms are applied to a query after the query is posed to a database, but before syntactic optimization is performed. SQO typically consists of two steps. During the first step, several queries logically equivalent (w.r.t. IDB and IC) to the original query are generated. Then, the query with the lowest estimated evaluation cost is chosen and submitted for evaluation. This second step is based on heuristics [Chakravarthy et al., 1990] which take into account physical database design (indices, table sizes, and so forth). Ideally, all these aspects should be integrated into one coherent optimizer.

The major types of SQO are:

1. Join elimination: this transformation eliminates a join from the query.
2. Restriction introduction/scan reduction: the introduction of a restriction on, for example, an indexed attribute may reduce the cost of evaluation.
3. Restriction elimination: this can be done when a restriction on one of the attributes implies a restriction on another attribute.

Work on SQO has been done by several researchers starting with [McSkimin, 1976], [Hammer and Zdonik, 1980], and [King, 1981]. The most influential has been the SQO technique developed in [Chakravarthy et al., 1986b], [Chakravarthy et al., 1986a], and [Chakravarthy et al., 1990]. The main idea of this technique can be summarized as follows. SQO is done in two phases, the compilation phase and the transformation phase. During the compilation phase, residues (see Section 5.3) are computed and associated with relational tables and rules (views). The result of the compilation phase is a set of semantically constrained rules, containing all the information carried previously by integrity constraints. During the second phase, which takes place after a query is given to the system, the semantic transformer uses these stored residues to generate semantically equivalent queries that may be processed more efficiently than the original query.


The following example describes the method.

Example 2. Let the database contain the following relations: student with attributes id and name, takes_section with attributes student_id and section, faculty with attributes id, section, and age. Assume also that there is an integrity constraint which says that all faculty members are more than thirty years old. That is,

    (A > 30) ← faculty(id: _, section: _, age: A).    (7)

By the method of partial subsumption described in Section 5.3, with the above IC and the relation faculty, we can construct the residue {A > 30 ←}, which then annotates the relation faculty. Intuitively, this means that any query with the predicate faculty must satisfy the condition stated by the residue to succeed.

The query below asks for the names of all students taught by professors who are younger than eighteen.

    query(N) ← student(id: I, name: N), takes_section(student_id: I, section: S),
               faculty(id: _, section: S, age: A), A < 18.

Since the query contains the predicate faculty, the residue applies, and the following semantically equivalent query is produced:

    query′(N) ← student(id: I, name: N), takes_section(student_id: I, section: S),
                faculty(id: _, section: S, age: A), A < 18, A > 30.

This query contains a logical contradiction, which can be easily determined without processing the query. The query cannot return any answers, unless the database violates the IC. Hence, the query does not need to be evaluated. The optimizer, upon discovering the contradiction, can curtail the evaluation of the query.

The primary advantage of the residue approach is that associating integrity constraints explicitly with the rules reduces the search space of integrity constraints that can apply to a given query. This allows most of the computation related to semantic query optimization (SQO) to be done at compile time. However, this rigid combining of rules and integrity constraints has disadvantages. It works well for queries that can be represented succinctly as a union of conjunctive queries,[22] in which case it is sufficient to consider residues attached to each rule in isolation. In the case of recursive rules, however, residues have to be considered with respect to the derivation trees produced during evaluation. What is required in the recursive case is the rewriting of the evaluation, not simply individual rules. Research in SQO has been advanced in this direction by [Chakravarthy et al., 1990], [Lakshmanan and Missaoui, 1992], [Lee and Han, 1988], and [Levy and Sagiv, 1995].

[22] This is the concept of top-down query evaluation.

Another case for which the residue technique must be adapted, or another approach found, is query evaluation that interleaves join and union operations. (Bottom-up query evaluation does this.) Once again, such a query cannot be expressed succinctly as a single union of joins, to which the residues could be directly attached. Research in SQO for queries involving an arbitrary nesting of union and join operations has been initiated in [Lee et al., 1991] and developed further in [Godfrey and Gryz, 1996a], [Godfrey and Gryz, 1996b], and [Godfrey et al., 1996].
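The transformation phase of Example 2 can be sketched in a few lines of Python. This is a minimal illustration rather than the optimizer of the cited work: it assumes residues are simple one-sided numeric bounds stored per relation and attribute, and the names `Bound`, `RESIDUES`, `add_residues`, and `is_contradictory` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bound:
    """A one-sided numeric restriction on a query variable, such as A < 18."""
    var: str
    op: str       # one of "<", "<=", ">", ">="
    value: float


# Hypothetical output of the compilation phase: relation -> [(attribute, op, value)].
RESIDUES = {"faculty": [("age", ">", 30)]}


def add_residues(atoms, restrictions):
    """Append the residues of each relation in the query, using the query's variables."""
    result = list(restrictions)
    for relation, args in atoms:
        for attribute, op, value in RESIDUES.get(relation, []):
            if attribute in args:
                result.append(Bound(args[attribute], op, value))
    return result


def is_contradictory(restrictions):
    """True if some variable has a lower bound that rules out one of its upper bounds."""
    for lo in restrictions:
        if lo.op not in (">", ">="):
            continue
        for hi in restrictions:
            if hi.var != lo.var or hi.op not in ("<", "<="):
                continue
            if lo.value > hi.value:
                return True
            if lo.value == hi.value and (lo.op == ">" or hi.op == "<"):
                return True
    return False


# The query of Example 2: each atom maps attribute names to query variables.
query_atoms = [
    ("student", {"id": "I", "name": "N"}),
    ("takes_section", {"student_id": "I", "section": "S"}),
    ("faculty", {"section": "S", "age": "A"}),
]
query_restrictions = [Bound("A", "<", 18)]

rewritten = add_residues(query_atoms, query_restrictions)
print(is_contradictory(rewritten))   # the rewritten query need not be evaluated
```

The check runs only over the restrictions, never over the data, which is the point of the example: the contradiction between A < 18 and A > 30 is found before any table is scanned.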


SQO can be extended to object-oriented databases. An approach recently developed [Grant et al., 1997] transfers existing SQO techniques (for instance, based on residues) from the logic (or relational) to the object database model. The core of this technique is the transformation of an object schema into a relational schema, in such a way that the object semantics is reflected via integrity constraints at the relational level. (See Section 6.5.) Correspondingly, each object query is transformed to a query over the relational database. The optimization of a query is performed at the relational level (as described above) to produce a new relational query. This technique works for a large class of object queries, in particular, queries with methods and structure constructors.

6.2 Cooperative Answering

Cooperative answering augments database systems to respond with more than just the query's answer set, whenever such additional information would help to disambiguate, clarify, or contribute in some way to the user's goals and intentions. Numerous cooperative answering techniques have been explored. (See [Gaasterland et al., 1994] for a survey.) The techniques vary by which aspect of the query/answer cycle they address (for instance, to attempt to clarify a user's misconceptions as indicated by the query, or to help the user formulate a query) and by what type of additional response they provide (for instance, an explanation of the misconception, or a list of candidate queries). Cooperative answering provides relevant knowledge in addition to the data of the answer set. This knowledge is derived from the semantics of the database, with respect to the query asked. The database's semantics, in turn, is represented via its integrity constraints.

Some of the initial research in cooperative answering, in particular within deductive databases which employ integrity constraints, was done at Maryland. This work built upon previous work on semantic query optimization. (Recall the previous section.) In [Gal, 1988], [Gal and Minker, 1985], and [Gal and Minker, 1988], it was shown that sometimes a query can be ascertained to fail (to produce the empty answer set), without evaluating the query. This is true whenever the query is subsumed by an IC; that is, any answer to the query would be inconsistent with respect to the database.

When one asks a query, one usually wants answers. For database queries, this means that one usually expects, or hopes for, a non-empty answer set.[23] Thus, when a query fails, the empty response is unexpected. Furthermore, the empty response is ambiguous in that it does not indicate why the query fails, or what about the query is responsible for the failure. (For instance, a part of the query may result in failure.)

[23] This is not always true. A system administrator may be testing integrity constraints, and so expects the empty set. However, usually the person is seeking information.

When a query must fail, because it is inconsistent with respect to the semantics of the database, it indicates a misconception on the user's behalf.[24] So in such cases, we often call the query itself a misconception. The user's concept of what the database can meaningfully provide is at odds with what the database's specification, via its ICs, indicates is meaningful. The query is guaranteed never to succeed on that database, unless the semantics of the database at some point changes. Therefore, it is cooperative to explain the cause of the misconception to the user. It is quite uncooperative not to, since the user is then left to believe the query was valid to ask.

[24] Granted, there are other possibilities. The user may have simply typed the query incorrectly, for instance.

Consider the following query:


    Q: in_patient(name: P, ward: maternity), patient(name: P, gender: male),
       infected(patient: P, disease: D),
       disease(name: D, contagious: high, type: streptococcus).

The query asks for any male patient on the maternity ward who has a highly contagious streptococcus infection. Let the database have the following natural integrity constraints:

    patient(name: P, gender: female) ← in_patient(name: P, ward: maternity).
    X = Y ← patient(name: P, gender: X), patient(name: P, gender: Y).

The second IC expresses the functional dependency in patient, and should be in the database. The first IC may have been added for the purpose of double-checking patient record entries in the maternity ward. With this knowledge (these ICs) available, it is possible to prove logically that the query Q must fail.

Gal and Minker studied misconceptions. They could determine whether a query was a misconception by checking whether it was subsumed by an IC, and so was inconsistent with the database. They built a mechanism for the misconception test using the technique of residues and partial subsumption (presented in Section 5). The subsuming IC provides a reason why the query fails. The proof of subsumption and the IC are then used as the basis of an explanation to the user.[25] In [Gaasterland, 1992] misconceptions are further studied, and the question of how best to coordinate the explanation presented to the user, which can become quite complex, is considered. In [Godfrey, 1997a], an approach to misconception detection and explanation different from the residue method is developed, which has better runtime complexity and which can guarantee a minimal explanation.

[25] It is important to note that sometimes, due to the closed world assumption, different unfoldings of the query may fail for different reasons. Then a composite explanation involving several ICs may be necessary.

User constraints, which were discussed in Section 4.3, were introduced as a cooperative technique to help focus the response to a user's queries to within the user's interests ([Gaasterland, 1992] and [Gaasterland and Minker, 1991]). While they are not integrity constraints (they are not necessarily consistent with the database, as ICs usually are required to be), they can be employed as such by the algorithms that test for misconceptions to remove portions of the answer set in which the user is not interested (as indicated by the user constraints). This is a variation on ICs that can allow for user-specific semantics to be added to the database.

A second type of cooperative response is an intensional answer (IA). An intensional answer to a query is another query formula. Intensional answers have been alternatively characterized as formulas which logically subsume the query (so the intensional answer's answer set is a superset of the query's), which are logically equivalent to the query, or which are logically subsumed by the query. The latter two characterizations are more prevalent. An intensional answer is warranted when the intensional answer is simpler than the query itself, or explicates something about the query and database.

Intensional answers were introduced by [Imieliński, 1988]. Imieliński viewed IAs as a rewrite of the query which preserves its semantics (and, thus, is equivalent to the query). In [Cholvy, 1990] and [Cholvy and Demolombe, 1987] IAs are considered under the sufficiency criterion (that is, the IA is subsumed by the query). While intensional answers can be considered without integrity constraints, ICs allow more IAs to exist: the query and IA are logically equivalent with respect to the ICs. In [Motro, 1989] and in [Pirotte and Roelants, 1989] and [Pirotte et al., 1990] intensional answers are considered based on residues.[26] It may be argued that IAs which are equivalent to the query with respect to integrity constraints are more meaningful because their equivalence conveys semantic information. Consider the following query:

    ← course(title: _, semester: "Spring 1997", professor: P),
      employee(name: P, rank: professor, department: "Computer Science").

This asks for all the professors in computer science who are teaching a course this semester. Consider that the following integrity constraint exists:

    course(semester: S, professor: P) ← employee(name: P, rank: professor, started: B),
                                        semester(S), B ≤ S.

This states that every professor must teach at least one course a semester. If so, the query above is equivalent to the set of all computer science professors,

    ← employee(name: P, rank: professor, department: "Computer Science").

Thus, it could be returned as an intensional answer. Again, this indicates that the user has a misconception. Not to respond with the intensional answer above (that all professors teach every semester) would be misleading. The user probably will miss the fact that the answer set is all the computer science professors. Even if he or she did notice this somehow, the user could not know that it is significant. The knowledge was left out.

There are other types of cooperative answering that do not rely on integrity constraints directly. Query relaxation helps the user formulate follow-up queries by relaxing the current query into more general queries. (A more general query's answer set subsumes the query's answer set.) This helps the user to browse further for related answers and concepts.
In [Gaasterland et al., 1992b], the rules of the database are employed in an abductive direction to accomplish relaxation. In the CoBase system ([Chu et al., 1993] and [Chu et al., 1994]), type abstraction hierarchies are used in conjunction with the relational schema to relax constants used in the query, and a generalization/refinement process is employed to find "neighboring" queries. In both cases, integrity constraint information is not used.

However, such techniques ought to take into consideration the semantics that the integrity constraints provide to be effective. Relaxation can be used to help a user re-formulate a query after a misconception query. The relaxed alternatives the system provides, however, should resolve (remove) the semantic conflicts that cause the original query to be a misconception. Otherwise, the relaxation choices may be misconceptions too. In such a case, the relaxation facility could be patently unhelpful to the user. In [Godfrey et al., 1994], we consider such issues, and advocate the necessity for a general cooperative system to coordinate the cooperative behaviors it enables.

[26] As Gal and Minker earlier do, Pirotte, Roelants, and Zimanyi note that the residues from semantic compilation are knowledge, and can be considered a type of intensional answer. Their work is subsumed by [Chakravarthy et al., 1986a], [Chakravarthy et al., 1986b], [Chakravarthy et al., 1990], [Gal, 1988], [Gal and Minker, 1985], and [Gal and Minker, 1988].

Other techniques for cooperative answering "create" knowledge in an attempt to disambiguate or explain the situation for the user. One such important technique is to identify the false presuppositions of a failing query to the user. In the case of conjunctive DATALOG queries,[27] this is synonymous with finding the minimal failing subqueries (MFSs) of the query ([Godfrey, 1997b], [Janas, 1979], [Janas, 1981], and [Kaplan, 1981]). (We advocate doing this when the query failed and was first determined not to be a misconception. Otherwise, the misconception ought to be explained instead [Godfrey et al., 1994].) It is cooperative to focus on the part of the query that fails. Thus the user knows what information is not in the database and can appropriately revise what he or she is seeking. Consider a valid variant of the misconception query from before:

    Q: in_patient(name: P, ward: maternity), infected(patient: P, disease: D),
       disease(name: D, contagious: high, type: streptococcus).

Assume that this query fails (but is not a misconception). It may be that the subquery

    Q: in_patient(name: P, maternity), infected(patient: P, disease: D),
       disease(name: D, contagious: high).

fails. That is, there are currently no patients (on any ward) with a highly contagious disease.

The cooperative technique of false presuppositions does not employ integrity constraints; rather, it is a technique that is used when there is no semantic knowledge to bring to bear on the failing query as there is with misconceptions.[28] This side of cooperative answering is closely related to knowledge discovery in databases. In [Godfrey, 1997b], the complexity of finding MFSs of a query is determined, and an optimal algorithmic approach for discovering MFSs is devised. Note that the "semantics" that are discovered by the process take the same form as integrity constraints.

Thus while the MFSs found for a query are not ICs, they could be. Indeed, a database designer may discover that he or she had intended some MFS as an IC in the database, but the IC had been missed in the specification.
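The notion of a minimal failing subquery discussed above can be made concrete with a brute-force Python sketch. It is not the algorithm of [Godfrey, 1997b]: it enumerates subsets of the query's atoms in order of increasing size, which is exponential, and it treats any subset of atoms as a subquery. The tiny evaluator, the positional argument encoding, and the sample data are all hypothetical.

```python
from itertools import combinations


def is_var(term):
    """Variables are strings starting with an upper-case letter."""
    return isinstance(term, str) and term[:1].isupper()


def has_answer(atoms, db, binding=None):
    """True if the conjunction of atoms has at least one answer over db."""
    binding = binding or {}
    if not atoms:
        return True
    (relation, args), rest = atoms[0], atoms[1:]
    for row in db.get(relation, ()):
        new_binding = dict(binding)
        matches = True
        for arg, value in zip(args, row):
            if is_var(arg):
                if new_binding.setdefault(arg, value) != value:
                    matches = False
                    break
            elif arg != value:
                matches = False
                break
        if matches and has_answer(rest, db, new_binding):
            return True
    return False


def minimal_failing_subqueries(atoms, db):
    """Return every failing subset of atoms none of whose proper subsets fails."""
    found = []
    for size in range(1, len(atoms) + 1):
        for indices in combinations(range(len(atoms)), size):
            chosen = set(indices)
            if any(smaller <= chosen for smaller in found):
                continue  # already explained by a smaller failing subquery
            if not has_answer([atoms[i] for i in indices], db):
                found.append(chosen)
    return [[atoms[i] for i in sorted(indices)] for indices in found]


# Hypothetical data: one maternity patient with a disease that is not highly contagious.
db = {
    "in_patient": {("ann", "maternity")},
    "infected": {("ann", "flu")},
    "disease": {("flu", "low", "virus"), ("strep_a", "high", "streptococcus")},
}
query = [
    ("in_patient", ("P", "maternity")),
    ("infected", ("P", "D")),
    ("disease", ("D", "high", "streptococcus")),
]
mfs = minimal_failing_subqueries(query, db)
print(mfs)
```

On this data the whole query fails, but the only minimal failing subquery is the join of infected and disease: no infected patient has a highly contagious streptococcus disease, regardless of ward.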
A discovered MFS of this kind could be "promoted" to be an IC. This causes no administrative problems, as the database is already consistent with the MFS (at least as of the time the query was asked). In any case, an MFS may be treated as a state IC, at least until a transaction invalidates it. (Recall Section 4.2.) State integrity constraints can be useful for efficiency, as they can be used by any semantic query optimization routines just as any IC. (Recall the previous section.)

In this section, we stated that a query is considered a misconception whenever it is "inconsistent" with the database. The intent is that the query cannot have answers with respect to the database, and so the query formula could serve as an integrity constraint. Thus the semantics of misconception is linked with the semantics we chose for integrity constraints. Since there are choices for such semantics, there are choices for the semantics of cooperative answering too.

[27] These are synonymous with basic multi-join SQL queries.

[28] In [Janas, 1981], Janas attempts to show how ICs could be used to help in the search for false presuppositions. However, it is shown in [Godfrey, 1997b] that this is of marginal utility.

6.3 Combining Databases and Resolving Inconsistencies

During the past several years, there has been considerable work devoted towards answering queries across heterogeneous and distributed databases. This work is critical for organizations located remotely from one another, but where related data are stored, and for organizations whose data are distributed simply for organizational reasons. Given a set of DATALOG databases and a set of integrity constraints that are considered to be the IC for every database, we wish to "combine" these databases to obtain answers to queries. If one ignores the integrity constraints, then the combination of the databases via their union is easily shown to be consistent. However, when one accounts for the IC, the merged database may be inconsistent.

We consider a merging technique that ensures consistency with IC, guarantees in a certain sense minimal loss of information to accomplish this, and results in a potentially disjunctive database. The goal is to combine a set of DATALOG databases, {D1, ..., Dn}, with IC to obtain a maximal set of consistent data. For example, consider two DATALOG databases, D1 = {A} and D2 = {B}, and IC = {← A, B}. There are three consistent and correct combinations of D1 and D2: {{A}, {B}, {(A ∨ B)}}. The last is maximally consistent, and we choose it as characterizing the combination. Thus, the combination consists of two minimal models (models {A} and {B}), and any answer to a query must be correct with respect to both of these minimal models.

We define the consistency of a set of DATALOG databases as follows. A database D (which may be disjunctive) is said to be consistent with respect to a set of integrity constraints IC iff every minimal model of D satisfies IC. An algorithm to combine DATALOG databases is given in [Baral et al., 1991]. We provide an example that illustrates the problem, and we present briefly the steps needed to combine the databases in the example.

Example 3. [Baral et al., 1991] Consider the integrity constraint ← p(X), p(Y), (X ≠ Y), and two databases D1 and D2 as follows.

    D1:                 D2:
    r(a).               q(b).
    p(X) ← q(X).        p(X) ← r(X).

Each database alone is consistent with the integrity constraint. However, the union of D1 and D2 together with the integrity constraint is inconsistent. The steps to find a consistent merger of D1 and D2 are as follows.

Step 1: Consider the integrity constraint as a query. Form the union of the two databases. Query the union of the two databases with the integrity constraint. From this, it can be determined that the integrity constraint is violated when X = a and Y = b. There is a proof tree that leads to an inconsistency. Let S be as follows.

    ← p(X), p(Y).
    p(X) ← r(X).
    p(X) ← q(X).

Let the substitution θ be {X = a, Y = b}. The set Sθ represents an inconsistency, as found from the proof of inconsistency.

Step 2: The set Sθ has two minimal models: {p(a)} and {p(b)}. Hence, (p(a) ∨ p(b)) is a logical consequence and becomes a new fact.


Step 3: The rules in the original databases are modified to exclude the possibility that there will be a contradiction. The modifications are due to the instantiations of the constants in the proof tree in Step 1. The restricted rules then are:

    p(X) ← q(X), X ≠ b.
    p(X) ← r(X), X ≠ a.

Step 4: The new fact, the revised rules, and the union of the original databases with the revised rules constitute the combined database. Thus, the combined database is:

    p(a) ∨ p(b).
    r(a).
    q(b).
    p(X) ← q(X), X ≠ b.
    p(X) ← r(X), X ≠ a.

The combined theory has the minimal models {p(a), r(a), q(b)} and {p(b), r(a), q(b)}. The algorithm provides a maximal and consistent combination of the databases. For the query ← p(X), it returns the correct answer (p(a) ∨ p(b)).

This algorithm is extended in [Baral et al., 1991] to handle stratified databases. In [Baral et al., 1992], it is shown how to combine databases that consist of first order theories, while in [Baral et al., 1994], it is shown how to combine default logic databases.

In addition to combining multiple databases with integrity constraints, one may have preferences among the data in the alternative databases. Thus, one database might have priority over another for certain facts, and vice-versa for other pieces of data. For example, in one database it is known that father(j, m), which is intended to mean that "j is the father of m," while in a second database father(s, m) appears. Assume that the first database is owned by j, while the second is owned by some other individual (other than j or s). Furthermore, we have the integrity constraint that an individual may have only one father. Clearly, we want a priority which specifies that if j states that he is the father of some individual, then this takes precedence over claims in another database that say something different. The case of relational databases without views is handled in [Pradhan et al., 1995]. We briefly describe this approach.

Let X = {X1, ..., Xn} and Y = {Y1, ..., Yn} be sets of atoms. The priority X ≻ Y is understood to mean {X1 ≻ Y, ..., Xn ≻ Y}; that is, each atom Xi has priority over Y. The combined theory is developed in [Baral et al., 1991], [Baral et al., 1992], and [Baral et al., 1994]. The resultant models of the combined theory are called options. That is, an option associated with a set of databases, {D1, ..., Dn}, is a maximal subset of the base set (that is, the facts in the database) consistent with IC. In the following, we assume the set of databases is propositional; that is, each is simply a relational database without views (just EDB). This assumption is removed in [Pradhan et al., 1995].

When an option oi ∈ O does not satisfy a priority, in which the priorities between atoms or sets of atoms are given above, then there are three ways to develop a consistent semantics that satisfies the integrity constraints, the theory, and the priorities.

1. Option elimination states that if a priority X ≻ Y is not satisfied, then the option should be removed.
2. Option ordering imposes a rank ordering on members of O, which gives oi ∈ O a lower ordering whenever the priority X ≻ Y is not satisfied.
3. Option transformation transforms the option oi so that it satisfies the priority X ≻ Y.

It is proved in [Pradhan et al., 1995] that the three alternative possible semantics yield the same results, and, thus, are equivalent. We illustrate option elimination.

Example 4. [Pradhan et al., 1995] Let T1 = {a, b} and T2 = {c, d}. Let the set of integrity constraints IC and the set of priorities P be as follows.

    IC:                P:
    ← a, b, c.         a ≻ b
    ← a, d.            b ≻ c
    ← b, c, d.         c ≻ d

The options may be seen to be O = {{a, b}, {a, c}, {b, c}, {b, d}, {c, d}}. The priority a ≻ b eliminates the option {b, c} since the priority is not satisfied in that option (a does not appear in the option); priority b ≻ c eliminates both options {a, c} and {c, d}; and, finally, priority c ≻ d eliminates the option {b, d}. Hence, the only option that remains is {a, b}, and so we accept this as the semantics of the merger of the databases, the integrity constraints, and the priorities.

The problem of combining DATALOG databases that have both integrity constraints and priorities is solved in [Pradhan and Minker, 1997]. They show that option elimination and option ordering yield equivalent semantics for DATALOG databases. However, option transformation is not equivalent to the other semantics in this case. It is shown in [Pradhan and Minker, 1997] that option transformation is a more cautious semantics, in the sense that less information can be derived from it than in either option elimination or option ordering.

It is a practical necessity to resolve semantic conflicts in the merger of multiple databases with respect to the global integrity constraints simply because such semantic conflicts are assured to arise. Different organizations may own and update the different databases being merged, and so there is no mechanism that can assure that the collection is always consistent. This idea can be taken a step further. Why not allow that a given database itself might be inconsistent with respect to its (or global) integrity constraints? As databases become more complex and very large, it may become inevitable that we accept that a database will have inconsistencies. (Some ICs will still be used to check updates, but others may not be because, perhaps, it is too expensive.)

In [Demolombe and Jones, 1996], discussed in Section 3.3, the authors consider that the database may be inconsistent with respect to the integrity constraints, and can tell if an answer from the database should be considered valid. In [Bry, 1996], a way to use the ICs to mask out local inconsistencies in a logic program or deductive database is considered. All the techniques discussed above for merging databases in a globally consistent manner can be applied to the problem of presenting a consistent view of a single inconsistent database.
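The set of options in Example 4 above can be checked mechanically from the definition of an option as a maximal subset of the base set consistent with IC. The Python sketch below assumes the propositional case, with each denial constraint given as the set of atoms that may not all hold together; the function name `options` is hypothetical. It does not implement option elimination, whose test for when a priority is satisfied is defined in [Pradhan et al., 1995] and is not reproduced here.

```python
from itertools import combinations


def options(base, denials):
    """Return the maximal subsets of `base` that contain no denial as a subset."""
    atoms = sorted(base)
    consistent = [
        frozenset(subset)
        for size in range(len(atoms) + 1)
        for subset in combinations(atoms, size)
        if not any(denial <= set(subset) for denial in denials)
    ]
    # Keep only the subsets that are not strictly contained in another consistent subset.
    return [s for s in consistent if not any(s < other for other in consistent)]


# Example 4: T1 = {a, b}, T2 = {c, d}, IC = { <- a,b,c.  <- a,d.  <- b,c,d. }
base = {"a", "b"} | {"c", "d"}
denials = [{"a", "b", "c"}, {"a", "d"}, {"b", "c", "d"}]

example_options = options(base, denials)
print(sorted("".join(sorted(option)) for option in example_options))
# prints ['ab', 'ac', 'bc', 'bd', 'cd']
```

The enumeration is exponential in the size of the base set, so it serves only to confirm small examples such as this one.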


30 April 1997 ICs: Semantics and Applications|Godfrey, Grant, Gryz, & Minker p. 32 of 466.4 View UpdatesIn deductive or relational databases, views are intensional predicates, which appear only in theheads of IDB rules but do not appear in the EDB.29 An intensional predicate is de�ned in terms ofother intensional predicates and extensional predicates. We consider the problem of view updates|updates to an intensional predicate|in deductive databases. An update to an intensional predicatecan only be accomplished by modifying appropriately the underlying extensional predicates, inthe extensional part of the database. The problem is that there may be more than one way theunderlying tables might be modi�ed to e�ect the update. Which to choose is unclear. This problemof ambiguity is known as the view update problem.Even in the case of a purely de�nite deductive database, it is unclear how the extensional relationsshould be modi�ed to accomplish a given view update. For example, consider the database Dp (X) a (X)p (X) b (X)(in which p is an intensional predicate, and a and b are extensional predicates). Suppose we wishto update the database with the fact p (c). If we are restricted to de�nite clauses, there are threeplausible ways to accomplish this:� add a (c);� add b (c); or� add both a (c) and b (c).Either of the �rst two options seem arbitrary. (Why choose one arbitrarily over the other?) Thethird option, however, results in an update that is too strong. (There is no reason to believe botha (c) and b (c).) If we allow disjunctive information in the database, however, we can accomplishthe update via a fourth option, by adding (a (c)_ b (c)). This last option appears intuitively to becorrect.In [Grant et al., 1993], general algorithms for accomplishing view updates (both for insertions anddeletions) in de�nite, and disjunctive deductive databases are provided. 
[Grant et al., 1993] also give a semantic justification of the updates that their algorithms accomplish, which makes precise the sense in which they represent "minimal modifications" to the underlying database. Two kinds of updates are considered: those that involve the insertion or deletion of information into and out of a definite, or disjunctive, deductive database; and those that involve the insertion or deletion of information into and out of a stratified normal disjunctive database. Insertions into normal deductive databases may require the insertion of negative information. (Of course, a database cannot contain negative information explicitly. However, it can yield negative information through inference, by non-monotonic reasoning techniques such as closed world reasoning ([Reiter, 1978] and [Minker, 1982]).)

An extension to the work in [Grant et al., 1993] is made in [Fernández et al., 1996], in which a model theoretic approach is introduced that encompasses a wide class of Herbrand semantics, including the perfect model and stable model semantics, for disjunctive databases including negation. The databases may contain disjunctive rules and denial constraints, and may be required to satisfy integrity constraints. The authors prove the algorithms provided to be correct, and "best", with respect to the criteria of causing minimal change to the database and giving priority to minimizing deletions.

We illustrate the application of ICs in the framework of [Fernández et al., 1996]. Consider the following propositional DB:

$$
\begin{array}{l}
s \lor t \leftarrow \mathit{not}\ p,\ \mathit{not}\ r. \\
a \lor b. \\
a \lor c. \\
w \leftarrow \mathit{not}\ s. \\
p \lor q \leftarrow a. \\
q \lor r \leftarrow b. \\
p \lor r \leftarrow c.
\end{array}
$$

Note that the extensional atoms are $a$, $b$, and $c$, and the intensional atoms are $p$, $q$, $r$, $s$, $t$, and $w$. This database has two minimal models projected over its EDB: $\{a\}$ and $\{b, c\}$. With the IDB considered, the database has five minimal models: $\{a, p, w\}$, $\{a, q, s\}$, $\{a, q, t, w\}$, $\{b, c, p, q, w\}$, and $\{b, c, r, w\}$. Since there is a minimal model that does not contain $w$ (namely $\{a, q, s\}$), $w$ is not true.

To insert $w$ into DB in a minimal way, our algorithm adds the fact $c$ to the EDB (and also removes the fact $(a \lor c)$, which is subsumed by the new fact). This yields two minimal models over the new EDB: $\{a, c\}$ and $\{b, c\}$. With the IDB considered, the database has four minimal models: $\{a, c, p, w\}$, $\{a, c, q, r, w\}$, $\{b, c, p, q, w\}$, and $\{b, c, r, w\}$. As $w$ is in all four minimal models, it is true with respect to the modified database.

Suppose now that DB also contains the IC $\leftarrow a, c$. The insertion from above cannot work, because a minimal model of the EDB contains both $a$ and $c$. The algorithm takes the ICs into account. In this case, the update is to insert $b$ and $c$ into the EDB, and to remove $(a \lor b)$ and $(a \lor c)$, which are now subsumed. Note that this would not be a minimal modification without the presence of the IC. After this update, there is only one minimal model projecting over the new EDB: $\{b, c\}$. There are two minimal models for the new DB: $\{b, c, p, q, w\}$ and $\{b, c, r, w\}$.

[29] This is a common restriction, and can be made without loss of generality. Call predicates that appear only in the IDB intensional predicates, and the others extensional predicates.

6.5 Additional Applications

We discuss briefly several other applications of integrity constraints.

1. Order Optimization.
Many SQL operations, such as group by, order by, and distinct, require the physical ordering of data. Sorting is an expensive operation, however. [Simmen et al., 1996] describe an optimization technique for minimizing the number of sorting columns, or for detecting when sorting can be avoided altogether, because of the existence of functional dependencies on the retrieved data. For example, if tuples need to be sorted on columns $A$ and $B$ and there is a functional dependency $A \rightarrow B$, then sorting on $B$ can be avoided. Functional dependencies can also be used to push a sort down into a query tree to make it cheaper, or to combine two or more sorts into a single sort.

2. Enhancing the applications of query folding. Query folding is a semantic optimization technique that replaces the original predicates of a query with so-called resource predicates and answers the modified query. Resources can be views or cached results of previous queries.


Query folding can be used for evaluating queries in the context of heterogeneous database systems, where tables in different databases are related through view definitions, or in semantic query caching, where the query may be answered from a cache. Integrity constraints can dramatically increase the applicability of query folding; [Dawson et al., 1996] describes a query folding algorithm that uses functional dependencies.

3. Revealing the semantics of object-oriented databases. One of the reasons for the rapid development of object-oriented databases was their model, which was designed to capture more of the meaning of an application environment than was possible with a model of relational or even deductive databases. The introduction of a new, more natural object model required the introduction of a new data and query language compatible with that model. This meant, in particular, that many of the semantic-based techniques [30] designed for relational and deductive databases were not directly applicable to the object model; such techniques had to be developed specifically for the new model ([Yoon and L.Kerschberg, 1993] and [Buchheit et al., 1994]).

It has been shown [Grant et al., 1997], however, that the existing techniques can be transferred from the relational to the object database model. The main idea of this approach is the transformation of an object schema into a relational schema in such a way that the object semantics is reflected in integrity constraints at the relational level. The transformation is intended to capture all the semantic information encoded in the object model, such as object identity, inheritance, types of relationships between classes (one-to-one, many-to-one), and keys. Consider the following example.

Example 5. Let $C_1$ be a subclass of $C_2$ in the object model.
Then, $C_1$ and $C_2$ can be represented as tables $T_1$ and $T_2$ so that each tuple in $T_1$ and $T_2$ represents an object from the classes $C_1$ and $C_2$, respectively. The fact that $C_1$ is a subclass of $C_2$ can then be expressed by the following integrity constraint:

$$t_2(X_1, \ldots, X_n) \leftarrow t_1(X_1, \ldots, X_n, X_{n+1}, \ldots, X_{n+m})$$

Once the transformation of an object schema into a relational schema with integrity constraints is completed, each query posed to an object database can be translated to a query on the relational database. At this point, many of the existing tools developed for relational databases can be applied to the query, and the result translated back into the object model.

4. Semantic Data Security via Security Constraints. Security and privacy are important issues in database systems. All commercial relational systems provide security mechanisms that allow one to designate certain tables to be viewable or updatable by certain users but not by others. Most provide an account and password facility to authenticate the users of the system. However, the granularity of these mechanisms is coarse: permission is often decided on a table-by-table basis.

A finer mode of data security would be useful. One could specify the types of data that a given user or class of users is not permitted to see. This is sometimes called semantic data security. The question then is how these semantic security specifications can be accomplished. In [Thuraisingham and Ford, 1995], it is argued that security constraints should be viewed as a special form of integrity constraints. This would allow security constraints to be written in a declarative fashion, in the same way ICs are. This is akin to user constraints, discussed in Section 4.3. The difference is that, instead of shielding certain answers to the user's query because the user has no interest in them (indicated by conflicts with the user constraints), answers are shielded from the user for security or privacy reasons. In [Godfrey and Gryz, 1996a], it is suggested that the same mechanisms used in semantic query optimization may also be used to eliminate answers that would conflict with such security constraints, thereby enforcing semantic data security.

The issues of semantic data security can be epistemically more complex if one also wants to ensure that a user can never infer a "secret", and thereby compromise database security. In [Bonatti et al., 1992], first-order formulas are allowed to be designated as secrets within a deductive database domain. The deductive system is extended to protect secrets. It is shown that if one wants to guarantee that the secrets themselves cannot be inferred, sometimes the database system must lie to the user.

Semantic data security is an issue of growing importance. As database applications become more sophisticated, the possibility that traditional security mechanisms may be circumvented grows. In [O'Leary, 1991], it is shown how new techniques for knowledge discovery in databases can compromise data security. Pro-active mechanisms, such as the ones security constraints might be able to provide, could help guarantee that security is not breached.

[30] Such techniques include semantic query optimization (see Section 6.1), semantic query caching ([Dar et al., 1996], [Chen and Roussopoulos, 1994a], and [Keller and Basu, 1996]), cooperative answering (see Section 6.2), and query rewriting using views ([Qian, 1996], [Levy et al., 1995], [Chaudhuri et al., 1995], and [Yang and Larson, 1987]).

7 Conclusion and Future Directions

We have discussed various aspects of the use of integrity constraints in databases. We have noted that integrity constraints are important not only for the traditional applications of updates and maintaining the consistency of a database, but also as a way to provide semantic information.
Using logic to represent integrity constraints (as well as facts, rules, and queries) and to represent user and other constraints provides a uniform representation language that permits all of the tools and techniques of logic to be applied to a database. It permits systems to be developed that are complete and sound; that is, all answers are found, and all answers are correct with respect to the semantics of the database. We have also stressed that integrity constraints provide a basis for performing many applications, such as semantic query optimization, providing cooperative answers, combining databases, handling inconsistent databases, and contributing to the solving of the view update problem in databases and disjunctive databases. We have also noted that there is no one uniform definition of an integrity constraint. There is the consistency definition, which states that the database must be consistent with the integrity constraints; the entailment definition, which states that the integrity constraints must be entailed by (or be provable from) the database; and other definitions as well. Each definition is relevant and useful in different contexts, as we have demonstrated in this paper. Similarly, we have shown that integrity constraints can apply to temporal conditions, but that there is no one uniform approach to handling temporal databases.

Many areas of the use of integrity constraints still need investigating. Below, we discuss some aspects associated with this topic that require additional work.

• Implement semantic query optimization and cooperative answering systems. Current relational and deductive database systems do not provide these capabilities, but the current standards for SQL provide for the incorporation of some aspects of integrity constraints.

• Temporal databases require additional research to handle historical and real-time databases, and to extend the logical basis of ICs to transactions. It may be possible to apply transition rules to such databases for integrity constraint checking, as described in [Martin and Sistac, 1996].

• Transactions and updates have not received sufficient attention. There are semantic models for updates ([Fagin et al., 1983], [Grant et al., 1993], and [Fernández et al., 1996]) that ensure that views and data are updated correctly. In a number of emerging applications of database systems, transactions are viewed as sequences of nested, and often interacting, sub-transactions that may occur sporadically over long periods of time. Logic-based transaction systems will be essential to ensure that an appropriate and correct transaction is achieved, and that the updates are consistent with respect to the ICs. Relevant work is being done, as in [Bonner and Kifer, 1993], [Bonner and Kifer, 1996], [Lin and Reiter, 1994], [Lin and Reiter, 1993], [Korth and Speegle, 1994], [Ammann et al., 1995], [Chen and Roussopoulos, 1994b], [Farrag and Ozsu, 1989], [Garcia-Molina, 1983], and [Ludascher et al., 1996].

• Active databases allow for data to protect their own integrity, which makes for a more complex database semantics. Active rules are often represented by the formalism Event-Condition-Action (ECA) [Technologies, 1989]: whenever an event E occurs, if condition C holds, then trigger action A. Zaniolo has noted the need for a declarative semantics of triggers ([Zaniolo, 1993] and [Zaniolo, 1996]). He has developed a unified semantics for active and deductive databases, and has shown how active database rules relate to transaction-conscious stable model semantics.
In [Baral and Lobo, 1996], a first step is proposed towards characterizing active databases.

• Data mining and inductive inference involve discovering generalizations that hold over a database (or a logic program). Such generalizations may be considered as integrity constraints that must be true with respect to the database, or state constraints that may be true of the current state, but may change if there are updates. (Recall Section 4.2.) Data mining can be viewed as a complement to the usual applications of integrity constraints. ICs represent declarative knowledge in the database. Data mining is involved in discovering new, potential knowledge about the database. For work on this topic see [Muggleton and Raedt, 1994], [Piatetsky-Shapiro and Frawley, 1991], and [Laurent and Vrain, 1996].

• Logical foundations of object-oriented deductive databases are needed. Object-oriented databases are related in many ways to hierarchic and network systems. It is essential that such systems have a formal theory and a semantics. This task is made difficult as there have been no formal definitions for the object-oriented database model. [31] Efforts have been undertaken in [Kifer et al., 1993] to develop a formal foundation for object-oriented databases. A formal, logic-based model for object-oriented databases would allow us to apply all of the tools and techniques developed for deductive and relational databases to these databases. (We discussed briefly in Section 6.5 how SQO might be applied to object-oriented databases.) This work will help to define integrity constraints in the domain of object-oriented databases.

• Combining databases is a topic related to both heterogeneous and multimedia systems. The goal is to combine databases that share the same set of integrity constraints and a mutual schema. (See Section 6.3.) Such issues arise in distributed systems work, and in combining knowledge bases. In addition to handling problems that arise because the combined databases may be inconsistent, one must handle priorities that may exist among individual facts and databases. (Some databases are more trusted than others for certain information.) For a formal treatment of this subject see [Baral et al., 1991], [Baral et al., 1992], [Baral et al., 1994], [Pradhan et al., 1995], [Pradhan and Minker, 1995], and [Pradhan, 1995]. Techniques developed for combining databases may be adapted for the mediation of heterogeneous databases that accounts for the semantics of the integrity constraints, as researched in [Subrahmanian et al., 1994], [Chawathe et al., 1994], and [Miller et al., 1994].

  Multimedia databases are an emerging area for which new data models are needed [Subrahmanian and Jajodia, 1996]. Multimedia applications have special requirements: manipulating geographic databases; picture retrieval, for which space (instead of time) must be modeled in the database; and video, for which space and time must both be modeled together. Integrity constraints for temporal and spatial reasoning are needed.

• Constraint databases (and constraint logic programming) introduce the additional construct of domain constraints. Domain constraints may be equalities, inequalities, and, sometimes, special relations considered to be constraint relations ([Jaffar and Lassez, 1987] and [Jaffar and Maher, 1994]). Constraint-intensive queries arise in many advanced applications, such as in geographic databases. Constraints can capture spatial and temporal behavior in a natural way not possible in other database systems. The relationships between these areas need to be further explored.

[31] The Classic database system does have a formal description based on a description logic [Brachman et al., 1992].

From the above, it is clear that the application of integrity constraints to databases remains a fertile area to explore.
There are many topics that must be explored to yield intelligent systems beyond what is currently available.
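The difference between the consistency and entailment definitions recalled in this section can be seen on a very small propositional case. The sketch below is illustrative only: it uses classical truth assignments and ignores closed-world assumptions, and the toy database, the constraint, and the function names are assumptions made here rather than material from the paper.

```python
from itertools import product

def models(atoms, formulas):
    """All truth assignments over `atoms` that satisfy every formula.

    A formula is a Python predicate taking a dict atom -> bool.
    """
    result = []
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(f(world) for f in formulas):
            result.append(world)
    return result

def satisfies_by_consistency(atoms, db, ic):
    # Consistency definition: DB together with the IC has some model.
    return len(models(atoms, db + [ic])) > 0

def satisfies_by_entailment(atoms, db, ic):
    # Entailment definition: the IC holds in every model of DB.
    return all(ic(w) for w in models(atoms, db))

ATOMS = ["a", "b"]
DB = [lambda w: w["a"] or w["b"]]   # hypothetical disjunctive fact: a or b
IC = lambda w: w["a"]               # hypothetical constraint: a must hold

print(satisfies_by_consistency(ATOMS, DB, IC))   # True
print(satisfies_by_entailment(ATOMS, DB, IC))    # False
```

The same database thus satisfies the constraint under one definition and violates it under the other: a model in which only b is true is compatible with the database but not with the constraint.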


References

[Ammann et al., 1995] Ammann, P., Jajodia, S., and Ray, I. (1995). Using formal methods to reason about semantics-based decompositions of transactions. In 21st International Conference on Very Large Data Bases (VLDB), pages 218–227, Zurich, Switzerland.

[Baral et al., 1991] Baral, C., Kraus, S., and Minker, J. (1991). Combining multiple knowledge bases. IEEE Transactions on Knowledge and Data Engineering, 3(2):208–220.

[Baral et al., 1992] Baral, C., Kraus, S., Minker, J., and Subrahmanian, V. (1992). Combining knowledge bases consisting of first order theories. Computational Intelligence, 8:45–71.

[Baral et al., 1994] Baral, C., Kraus, S., Minker, J., and Subrahmanian, V. (1994). Combining default logic databases. Intl. Journal of Intelligent and Cooperative Info. Systems, 3(3):319–348.

[Baral and Lobo, 1996] Baral, C. and Lobo, J. (1996). Formal characterization of active databases. In Logic in Databases (LID'96), San Miniato, Italy. Springer.

[Bonatti et al., 1992] Bonatti, P. A., Kraus, S., and Subrahmanian, V. S. (1992). Declarative foundations of secure deductive databases. In Biskup, J. and Hull, R., editors, Fourth International Conference on Database Theory (ICDT'92), Lecture Notes in Computer Science, Vol. 646, pages 391–406, Berlin. Springer.

[Bonner and Kifer, 1993] Bonner, A. and Kifer, M. (1993). Transaction logic programming. In Warren, D. S., editor, Proceedings of the Tenth International Conference on Logic Programming ICLP'93, pages 257–279, Budapest, Hungary. MIT Press.

[Bonner and Kifer, 1996] Bonner, A. and Kifer, M. (1996). Concurrency and communication in transaction logic. In Pedreschi, D. and Zaniolo, C., editors, Logic in Databases (LID'96), pages 153–172. Also in this collection.

[Boolos and Jeffrey, 1989] Boolos, G. S. and Jeffrey, R. C. (1989). Computability and Logic. Open University Set Book. Cambridge University Press, third edition.

[Brachman et al., 1992] Brachman, R., Borgida, A., McGuinness, D., Patel-Schneider, P., and Resnick, L. (1992). The Classic knowledge representation system of KL-ONE: The next generation. In International Conference on Fifth Generation Computer Systems, pages 1036–1043, ICOT, Japan.

[Bry, 1996] Bry, F. (1996). A compositional semantics for logic programs and deductive databases. In Proceedings of the Joint International Conference and Symposium on Logic Programming, Bad Honnef, Germany. MIT Press. Longer version available as technical report.

[Buchheit et al., 1994] Buchheit, M., Jeusfeld, M. A., Nutt, W., and Staudt, M. (1994). Subsumption between queries to object-oriented databases. Information Systems, 19(1):33–54.

[Chakravarthy et al., 1986a] Chakravarthy, U. S., Grant, J., and Minker, J. (1986a). Foundations of semantic query optimization for deductive databases. In Minker, J., editor, Proceedings of the Workshop on Foundations of Deductive Databases and Logic Programming, pages 67–101, Washington, D.C.


[Chakravarthy et al., 1986b] Chakravarthy, U. S., Grant, J., and Minker, J. (1986b). Semantic query optimization: Additional constraints and control strategies. In Kerschberg, L., editor, Proceedings of Expert Database Systems, pages 259–269, Charleston.

[Chakravarthy et al., 1990] Chakravarthy, U. S., Grant, J., and Minker, J. (1990). Logic based approach to semantic query optimization. ACM Transactions on Database Systems, 15(2):162–207.

[Chan, 1993] Chan, E. (1993). A possible world semantics for disjunctive databases. IEEE Transactions on Data and Knowledge Engineering, 5(2):282–292.

[Chaudhuri et al., 1995] Chaudhuri, S., Krishnamurthy, R., Potamianos, S., and Shim, K. (1995). Optimizing queries with materialized views. In Proceedings of the Eleventh International Conference on Data Engineering, pages 190–200.

[Chawathe et al., 1994] Chawathe, S., Garcia-Molina, H., Hammer, J., Ireland, K., Papakonstantinou, Y., Ullman, J., and Widom, J. (1994). The TSIMMIS project: Integration of heterogeneous information sources. In Proceedings of the Information Processing Society of Japan (IPSJ) Conference, Tōkyō.

[Chen and Roussopoulos, 1994a] Chen, C. M. and Roussopoulos, N. (1994a). The implementation and performance evaluation of the ADMS query optimizer: Integrating query result caching and matching. In Proceedings of the 4th International Conference on Extending Database Technology, Cambridge, U.K.

[Chen and Roussopoulos, 1994b] Chen, C. M. and Roussopoulos, N. (1994b). The implementation and performance evaluation of the ADMS query optimizer: Integrating query result caching and matching. In Proc. of the 4th International Conference on Extending Database Technology, Cambridge, UK.

[Cholvy, 1990] Cholvy, L. (1990). Answering queries addressed to a rule base. Revue d'intelligence artificielle, 4(1):79–98.

[Cholvy and Demolombe, 1987] Cholvy, L. and Demolombe, R. (1987). Querying a rule base. In Kerschberg, L., editor, Expert Database Systems, Tysons Corner, Virginia.

[Chomicki, 1995] Chomicki, J. (1995). Efficient checking of temporal integrity constraints using bounded history encoding. ACM TODS, 20(1).

[Chu et al., 1994] Chu, W. W., Chen, Q., and Merzbacher, M. A. (1994). CoBase: A cooperative database system. In [Demolombe and Imielinski, 1994], chapter 2, pages 41–73.

[Chu et al., 1993] Chu, W. W., Merzbacher, M. A., and Berkovich, L. (1993). The design and implementation of CoBase. In Proceedings of the 1993 ACM SIGMOD: International Conference on Management of Data, pages 517–522, Washington, D.C. ACM Press.

[Clark, 1978] Clark, K. L. (1978). Negation as Failure. In Gallaire, H. and Minker, J., editors, Logic and Data Bases, pages 293–322. Plenum Press, New York.

[Dahl and Saint-Dizier, 1988] Dahl, V. and Saint-Dizier, P., editors (1988). Natural Language Understanding and Logic Programming. North Holland.


[Dar et al., 1996] Dar, S., Franklin, M., Jonsson, B., Srivastava, D., and Tan, M. (1996). Semantic data caching and replacement. In Proceedings of VLDB.

[Das, 1992] Das, S. K. (1992). Deductive Databases and Logic Programming. Addison-Wesley, Wokingham, England.

[Dawson et al., 1996] Dawson, S., Gryz, J., and Qian, X. (1996). Query folding with functional dependencies. Technical report, Computer Science Laboratory, SRI International, Menlo Park, CA.

[Demolombe and Imielinski, 1994] Demolombe, R. and Imielinski, T., editors (1994). Nonstandard Queries and Nonstandard Answers. Studies in Logic and Computation 3. Clarendon Press, Oxford.

[Demolombe and Jones, 1996] Demolombe, R. and Jones, A. (1996). Integrity constraints revisited. Journal of IGPL, 4(3):369–383.

[Fages, 1991] Fages, F. (1991). A new fixpoint semantics for general logic programs compared with the well-founded and the stable model semantics. New Generation Computing, 9:425–443.

[Fagin et al., 1983] Fagin, R., Ullman, J. D., and Vardi, M. Y. (1983). On the semantics of updates in databases. In Proceedings of the Tenth ACM Symposium on Principles of Database Systems (PODS), pages 352–365. SIGACT/SIGMOD.

[Farrag and Ozsu, 1989] Farrag, A. and Ozsu, M. (1989). Using semantic knowledge of transactions to increase concurrency. ACM TODS, 14(4):503–525.

[Fernández, 1994] Fernández, J. A. (1994). Disjunctive Deductive Databases. PhD thesis, University of Maryland, Department of Computer Science, College Park.

[Fernández et al., 1996] Fernández, J. A., Grant, J., and Minker, J. (1996). Model theoretic approach to view updates in deductive databases. Journal of Automated Reasoning, 17(2):171–197.

[Fernández et al., 1993] Fernández, J. A., Lobo, J., Minker, J., and Subrahmanian, V. (1993). Disjunctive LP + integrity constraints = stable model semantics. Annals of Mathematics and Artificial Intelligence, 8(3–4):449–474.

[Gaasterland, 1992] Gaasterland, T. (1992). Cooperative Answers for Database Queries. PhD thesis, University of Maryland, Department of Computer Science, College Park.

[Gaasterland et al., 1992a] Gaasterland, T., Godfrey, P., and Minker, J. (1992a). An overview of cooperative answering. Journal of Intelligent Information Systems, 1(2):123–157.

[Gaasterland et al., 1992b] Gaasterland, T., Godfrey, P., and Minker, J. (1992b). Relaxation as a platform for cooperative answering. Journal of Intelligent Information Systems, 1:293–321.

[Gaasterland et al., 1994] Gaasterland, T., Godfrey, P., and Minker, J. (1994). An overview of cooperative answering. In [Demolombe and Imielinski, 1994], chapter 1, pages 1–40. Appears originally as [Gaasterland et al., 1992a].

[Gaasterland and Lobo, 1994] Gaasterland, T. and Lobo, J. (1994). Qualified answers that reflect user needs and preferences. In Proceedings of VLDB, pages 309–320, Santiago de Chile, Chile.


[Gaasterland and Minker, 1991] Gaasterland, T. and Minker, J. (1991). User needs and language generation issues in a cooperative answering system. In Saint-Dizier, P., editor, ICLP'91 Workshop: Advanced Logic Programming Tools and Formalisms for Language Processing, pages 1–14, INRIA, Paris, France.

[Gal, 1988] Gal, A. (1988). Cooperative Responses in Deductive Databases. PhD thesis, Department of Computer Science, University of Maryland, College Park, Maryland.

[Gal and Minker, 1985] Gal, A. and Minker, J. (1985). A natural language database interface that provides cooperative answers. Proceedings of the Second Conference on Artificial Intelligence Applications.

[Gal and Minker, 1988] Gal, A. and Minker, J. (1988). Informative and cooperative answers in databases using integrity constraints. In [Dahl and Saint-Dizier, 1988], pages 277–300.

[Gallaire et al., 1981] Gallaire, H., Minker, J., and Nicolas, J.-M., editors (1981). Advances in Database Theory, Volume 1. Plenum Press, New York.

[Garcia-Molina, 1983] Garcia-Molina, H. (1983). Using semantic knowledge for transaction processing in a distributed database. ACM TODS, 8(2):186–213.

[Gelder et al., 1988] Gelder, A. V., Ross, K., and Schlipf, J. S. (1988). Unfounded sets and well-founded semantics for general logic programs. In Proceedings of the 7th Symposium on Principles of Database Systems, pages 221–230.

[Gelfond and Lifschitz, 1988] Gelfond, M. and Lifschitz, V. (1988). The stable model semantics for logic programming. In Kowalski, R. A. and Bowen, K. A., editors, Proc. 5th International Conference and Symposium on Logic Programming, pages 1070–1080, Seattle, Washington.

[Godfrey, 1997a] Godfrey, P. (1997a). An Architecture and Implementation for a Cooperative Database System. PhD thesis, University of Maryland at College Park, College Park, Maryland 20742. In progress.

[Godfrey, 1997b] Godfrey, P. (1997b). Minimization in cooperative response to failing database queries. International Journal of Intelligent and Cooperative Information Systems. To appear.

[Godfrey and Gryz, 1996a] Godfrey, P. and Gryz, J. (1996a). A framework for intensional query optimization. In Boulanger, D., Geske, U., Giannotti, F., and Seipel, D., editors, Proceedings of the Workshop on Deductive Databases and Logic Programming, held in conjunction with the Joint International Conference and Symposium on Logic Programming (JICSLP'96), GMD-Studien Nr. 295, pages 57–68, Bonn, Germany. GMD-Forschungszentrum.

[Godfrey and Gryz, 1996b] Godfrey, P. and Gryz, J. (1996b). Intensional query optimization. Technical Report CS-TR-3702, UMIACS-TR-96-72, Dept. of Computer Science, University of Maryland, College Park, MD 20742.

[Godfrey et al., 1996] Godfrey, P., Gryz, J., and Minker, J. (1996). Semantic query optimization for bottom-up evaluation. In Raś, Z. W. and Michalewicz, M., editors, Foundations of Intelligent Systems: Proceedings of the 9th International Symposium on Methodologies for Intelligent Systems, Lecture Notes in Artificial Intelligence 1079, pages 561–571, Berlin. Springer.


[Godfrey et al., 1994] Godfrey, P., Minker, J., and Novik, L. (1994). An architecture for a cooperative database system. In Litwin, W. and Risch, T., editors, Proceedings of the First International Conference on Applications of Databases (ADB'94), Lecture Notes in Computer Science 819, pages 3–24. Springer Verlag, Vadstena, Sweden.

[Grant et al., 1997] Grant, J., Gryz, J., Minker, J., and Raschid, L. (1997). Semantic query optimization in object databases. In Proceedings of ICDE, Birmingham, UK.

[Grant et al., 1993] Grant, J., Horty, J., Lobo, J., and Minker, J. (1993). View updates in stratified disjunctive databases. Journal of Automated Reasoning, 11:249–267.

[Grant and Minker, 1990] Grant, J. and Minker, J. (1990). Integrity constraints in knowledge based systems. In Knowledge Engineering, volume II, pages 1–25. McGraw-Hill, New York.

[Grefen and Apers, 1993] Grefen, P. W. P. J. and Apers, P. M. G. (1993). Integrity control in relational database systems: An overview. Data and Knowledge Engineering, 10:187–223.

[Hammer and Zdonik, 1980] Hammer, M. and Zdonik, S. (1980). Knowledge-based query processing. Proc. 6th International Conference on Very Large Data Bases, pages 137–147.

[Hayes, 1973] Hayes, P. J. (1973). The frame problem and related problems in artificial intelligence. Artificial and Human Thinking, pages 45–59.

[Imieliński, 1988] Imieliński, T. (1988). Intelligent query answering in rule based systems. In [Minker, 1988].

[Jaffar and Lassez, 1987] Jaffar, J. and Lassez, J.-L. (1987). Constraint logic programming. In Proceedings of the 14th ACM Symposium on Principles of Programming Languages, pages 111–119, Munich, Germany.

[Jaffar and Maher, 1994] Jaffar, J. and Maher, M. (1994). Constraint logic programming: A survey. Journal of Logic Programming, 19-20:503–581.

[Janas, 1979] Janas, J. M. (1979). How not to say "NIL": Improving answers to failing queries in data base systems. In Proceedings of the 6th International Joint Conference on Artificial Intelligence, pages 429–434, Tōkyō.

[Janas, 1981] Janas, J. M. (1981). On the feasibility of informative answers. In [Gallaire et al., 1981], pages 397–414.

[Joshi et al., 1981] Joshi, A., Webber, B., and Sag, I., editors (1981). Elements of Discourse Understanding. Cambridge University Press.

[Kaplan, 1981] Kaplan, S. J. (1981). Appropriate responses to inappropriate questions. In [Joshi et al., 1981], pages 127–144.

[Keller and Basu, 1996] Keller, A. M. and Basu, J. (1996). A predicate-based caching scheme for client-server database architectures. The VLDB Journal, 5(2):35–47.

[Kifer et al., 1993] Kifer, M., Lausen, G., and Wu, J. (1993). Logical Foundations of Object-Oriented and Frame-Based Languages. Journal of the ACM.

[King, 1981] King, J. (1981). Quist: A system for semantic query optimization in relational databases. Proc. 7th International Conference on Very Large Data Bases, pages 510–517.

[Korth and Speegle, 1994] Korth, H. and Speegle, G. (1994). Formal aspects of concurrency control in long duration transaction systems using the NT/PV model. ACM, TODS, 19(3):492–535.

[Kowalski, 1978] Kowalski, R. (1978). Logic for data description. In Gallaire, H. and Minker, J., editors, Logic and Data Bases, pages 77–102. Plenum Press, New York.

[Kowalski and Sadri, 1988] Kowalski, R. and Sadri, F. (1988). Knowledge representation without integrity constraints. Draft manuscript.

[Lakshmanan and Missaoui, 1992] Lakshmanan, L. V. S. and Missaoui, R. (1992). On semantic query optimization in deductive databases. In Proc. IEEE International Conference on Data Engineering, pages 368–375.

[Laurent and Vrain, 1996] Laurent, D. and Vrain, C. (1996). Learning query rules for optimizing databases with update rules. In Pedreschi, D. and Zaniolo, C., editors, Logic in Databases (LID'96), pages 173–192.

[Lee and Han, 1988] Lee, S. and Han, J. (1988). Semantic query optimization in recursive databases. In Proc. IEEE International Conference on Data Engineering, pages 444–451.

[Lee et al., 1991] Lee, S., Henschen, L. J., and Qadah, G. (1991). Semantic query reformulation in deductive databases. In Proc. IEEE International Conference on Data Engineering, pages 232–239. IEEE Computer Society Press.

[Levesque, 1984] Levesque, H. J. (1984). Foundations of a functional approach to knowledge representation. Artificial Intelligence, 23:155–212.

[Levy et al., 1995] Levy, A. Y., Mendelzon, A. O., Sagiv, Y., and Srivastava, D. (1995). Answering queries using views. In Proc. PODS, pages 95–104.

[Levy and Sagiv, 1995] Levy, A. Y. and Sagiv, Y. (1995). Semantic query optimization in datalog programs. In Proceedings of the ACM Symposium on the Principles of Database Systems (PODS).

[Lin and Reiter, 1993] Lin, F. and Reiter, R. (1993). How to progress a database II: The STRIPS connection. Technical report, Department of Computer Science, University of Toronto. Recent version appears in IJCAI'95.

[Lin and Reiter, 1994] Lin, F. and Reiter, R. (1994). How to progress a database (and why) I: Logical foundations. In Proceedings of Knowledge Representation (KR94), pages 425–436.

[Lloyd, 1987] Lloyd, J. W. (1987). Foundations of Logic Programming. Symbolic Computation - Artificial Intelligence. Springer-Verlag, Berlin, second edition.

[Lobo et al., 1992] Lobo, J., Minker, J., and Rajasekar, A. (1992). Foundations of Disjunctive Logic Programming. M.I.T. Press, Cambridge, Massachusetts.

[Loveland, 1969] Loveland, D. (1969). Theorem-provers combining model elimination and resolution. In Meltzer and Michie, editors, Machine Intelligence 4, pages 73–86. University Press, Edinburgh.

[Ludascher et al., 1996] Ludascher, B., May, W., and Lausen, G. (1996). Nested transactions in a logical language for active rules. In Pedreschi, D. and Zaniolo, C., editors, Logic in Databases (LID'96), pages 217–242. Also in this collection.

[Martin and Sistac, 1996] Martin, C. and Sistac, J. (1996). Applying transition rules to bitemporal deductive databases for integrity constraint checking. In Pedreschi, D. and Zaniolo, C., editors, Logic in Databases (LID'96), pages 111–128. Also in this collection.

[McSkimin, 1976] McSkimin, J. (1976). The Use of Semantic Information in Deductive Question-Answering Systems. PhD thesis, University of Maryland, College Park, Maryland 20742.

[Melton and Simon, 1993] Melton, J. and Simon, A. R. (1993). Understanding the New SQL: A Complete Guide. Morgan Kaufmann, San Mateo, California.

[Miller et al., 1994] Miller, R., Ioannidis, Y., and Ramakrishnan, R. (1994). Translation and integration of heterogeneous schemas: Bridging the gap between theory and practice. Information Systems, 19(1):3–31.

[Minker, 1982] Minker, J. (1982). On indefinite databases and the closed world assumption. In Proceedings of the Sixth Conference on Automated Deduction, pages 292–308. Also in: Lecture Notes in Computer Science 138, pages 292–308. Springer Verlag, 1982.

[Minker, 1988] Minker, J., editor (1988). Foundations of Deductive Databases and Logic Programming. Morgan Kaufmann Pub.

[Minker, 1996] Minker, J. (1996). Logic and databases: a 20 year retrospective. In Workshop on Logic in Databases, San Miniato, Italy. Invited Keynote Address.

[Motro, 1989] Motro, A. (1989). Using constraints to provide intensional answers to relational queries. In Proceedings of the Fifteenth International Conference on Very Large Data Bases.

[Muggleton and Raedt, 1994] Muggleton, S. and Raedt, L. D. (1994). Inductive logic programming: theory and methods. Journal of Logic Programming, 19/20:629–679.

[Navathe and Ahmed, 1993] Navathe, S. B. and Ahmed, R. (1993). Temporal extensions to the relational model and SQL. In Tansel et al., editors, Temporal Databases, chapter 4, pages 92–109. Benjamin/Cummings.

[O'Leary, 1991] O'Leary, D. E. (1991). Knowledge discovery as a threat to database security. In [Piatetsky-Shapiro and Frawley, 1991], chapter 30.

[Piatetsky-Shapiro and Frawley, 1991] Piatetsky-Shapiro, G. and Frawley, W. J., editors (1991). Knowledge Discovery in Databases. AAAI Press and MIT Press, Menlo Park, California.

[Pinto and Reiter, 1993] Pinto, J. and Reiter, R. (1993). Temporal reasoning in logic programming: A case for the situation calculus. In Warren, D. S., editor, Proceedings of the Tenth International Conference on Logic Programming ICLP'93, pages 203–221, Budapest, Hungary. MIT Press.

[Pirotte and Roelants, 1989] Pirotte, A. and Roelants, D. (1989). Constraints for improving the generation of intensional answers in deductive databases. In Proceedings of the 5th IEEE International Conference on Data Engineering.

[Pirotte et al., 1990] Pirotte, A., Roelants, D., and Zimanyi, E. (1990). Controlled generation of intensional answers. IEEE Transactions on Knowledge and Data Engineering.

[Pradhan, 1995] Pradhan, S. (1995). Combining datalog databases using priorities. In Advances in Data Management '94, pages 355–375. Tata-McGraw Hill, India.

[Pradhan and Minker, 1995] Pradhan, S. and Minker, J. (1995). Combining datalog databases using priorities. Journal of Intelligent & Cooperative Information Systems.

[Pradhan and Minker, 1997] Pradhan, S. and Minker, J. (1997). Using priorities to combine knowledge bases. Journal of Cooperative Information Systems, 5(2,3):333–364.

[Pradhan et al., 1995] Pradhan, S., Minker, J., and Subrahmanian, V. (1995). Combining databases with prioritized information. Journal of Intelligent Information Systems, 4(3):231–260.

[Qian, 1996] Qian, X. (1996). Query folding. In Proceedings of the 12th International Conference on Data Engineering, pages 48–55.

[Reiter, 1978] Reiter, R. (1978). On Closed World Data Bases. In Gallaire, H. and Minker, J., editors, Logic and Data Bases, pages 55–76. Plenum Press, New York.

[Reiter, 1984] Reiter, R. (1984). Towards a logical reconstruction of relational database theory. In Schmit, M. B. J. M. J., editor, On Conceptual Modelling, pages 163–189. Springer-Verlag Pub., New York.

[Reiter, 1992] Reiter, R. (1992). What should a database know? Journal of Logic Programming, 14(1 & 2):127–153.

[Reiter, 1995] Reiter, R. (1995). On specifying database updates. Journal of Logic Programming, 25(1):53–91.

[Sakama, 1989] Sakama, C. (1989). Possible model semantics for disjunctive databases. In Proceedings of the First International Conference on Deductive and Object Oriented Databases (DOOD'89), pages 337–351, Kyōto.

[Simmen et al., 1996] Simmen, D., Shekita, E., and Malkems, T. (1996). Fundamental techniques for order optimization. In Proceedings of SIGMOD, pages 57–67.

[Stickel, 1988] Stickel, M. (1988). A PROLOG technology theorem prover: Implementation by an extended PROLOG compiler. Journal of Automated Reasoning, 4(4):353–380.

[Subrahmanian et al., 1994] Subrahmanian, V., Adali, S., Brink, A., Emery, R., Lu, J., Rajput, A., Rogers, T., and Ross, R. (1994). HERMES: A heterogeneous reasoning and mediator system. Submitted for publication.

[Subrahmanian and Jajodia, 1996] Subrahmanian, V. S. and Jajodia, S., editors (1996). Multimedia database systems: issues and research directions. Springer.

[Technologies, 1989] Xerox Advanced Information Technologies (1989). HIPAC: a research project in active, time-constrained databases. Technical Report 187, Xerox Advanced Information Technologies.

[Thuraisingham and Ford, 1995] Thuraisingham, B. and Ford, W. (1995). Security constraint processing in a multilevel secure distributed database management system. IEEE Transactions on Knowledge and Data Engineering, 7(2):274–293.

[Ullman, 1988] Ullman, J. D. (1988). Principles of Database and Knowledge-Base Systems, Volumes I & II. Principles of Computer Science Series. Computer Science Press, Incorporated, Rockville, Maryland.

[Yang and Larson, 1987] Yang, H. Z. and Larson, P.-Å. (1987). Query transformation for PSJ-queries. In Proceedings of the Thirteenth International Conference on Very Large Data Bases, pages 245–254.

[Yoon and L.Kerschberg, 1993] Yoon, J. and Kerschberg, L. (1993). Semantic query optimization in deductive object-oriented databases. In Proceedings of the 3rd International Conference on Deductive and Object-Oriented Databases, pages 169–182.

[Zaniolo, 1993] Zaniolo, C. (1993). A unified semantics for active and deductive databases. In Proceedings of 1st international workshop on rules in database systems, pages 271–287. Springer-Verlag.

[Zaniolo, 1996] Zaniolo, C. (1996). Active database rules with transaction-conscious stable models semantics. In Proceedings of DOOD 1996, pages 55–72.