
On the Practical Semantics of Mathematical Diagrams

Dave Barker-Plummer

Center for the Study of Language and Information, Ventura Hall, Stanford University, Stanford, California, 94305, USA

[email protected]

Sidney C. Bailin

Knowledge Evolution, Inc., 1050 17th Street NW, ~520, Washington, DC, 20036, USA

[email protected]

Abstract

This paper describes our research into the way in which diagrams convey mathematical meaning. Through the development of an automated reasoning system, called &/GROVER, we have tried to discover how a diagram can convey the meaning of a proof. &/GROVER is a theorem proving system that interprets diagrams as proof strategies. The diagrams are similar to those that a mathematician would draw informally when communicating the ideas of a proof. We have applied &/GROVER to obtain automatic proofs of three theorems that are beyond the reach of existing theorem proving systems operating without such guidance. In the process, we have discovered some patterns in the way diagrams are used to convey mathematical reasoning strategies. Those patterns, and the ways in which &/GROVER takes advantage of them to prove theorems, are the focus of this paper.

Key words: Mathematical diagrams, reasoning strategies, visualization, proof, automated reasoning.

Introduction

Diagrams and visual images play an essential role in both the comprehension and communication of mathematical proofs. We contend that this role is to make the content of the proof "real" rather than formal. Diagrams are used to represent the objects and relations to which a proof refers. When successfully used, the validity of a proof can be "seen" in the diagram rather than justified as a step-by-step application of formal rules. We suggest that visualization distinguishes "following" a proof from "seeing" it to be true. In the former case, the proof is not fully assimilated, and thus, we might argue, not fully understood.

What distinguishes the full comprehension of a proof from just following the individual steps? The difference concerns the interpretation of the mathematical language: whether it is understood as a purely formal system of formulae, rules, and inferences, or whether it points to something that, however abstract, is real in the world of the mathematician.

Visualization, then, is a means by which mathematics sheds its purely formal character and takes on meaning. As such, it is a key aspect not just of mathematical learning but also of mathematical discovery. Diagrams, in turn, are a vehicle for communicating the visualized images. Far from being an expendable aid, diagrams play an essential role in the communication of mathematical meaning.

This paper summarizes our research into the way in which diagrams convey mathematical meaning. Through the development of an automated reasoning system, called &/GROVER, we have tried to discover how a diagram can convey the meaning of a proof. &/GROVER is a theorem proving system that takes, as its input, not only a theorem to be proved but also a diagram intended to represent the essence of the proof. &/GROVER interprets the diagram as a strategy for performing a detailed formal proof. The diagram focuses &/GROVER's attention on the relevant facts at each stage of the proof.

&/GROVER consists of two parts: GROVER, the diagram processor, which is the subject of this paper, and an underlying theorem prover, called &. The diagram processor constructs a strategy on the basis of information extracted from the diagram; & is then called upon to prove the subgoals in this strategy.

GROVER is a prototype system. Our eventual aim is to build a system capable of extracting from a diagram the same information that a human can. Development of GROVER has, conversely, yielded insights into the kinds of reasoning involved when humans infer meaning from a diagram.

Three non-trivial theorems which we have proved fully automatically using the &/GROVER system are:

1. The Diamond Lemma, a theorem from the theory of well-founded relations, described in (Barker-Plummer & Bailin 1992), and which we briefly recap here,

2. The Multiple Peaks Theorem, a generalization of the Diamond Lemma, which we describe in this paper,

3. The Schröder-Bernstein Theorem, a theorem from the theory of functions, whose proof we also describe here.


From: AAAI Technical Report FS-97-03. Compilation copyright © 1997, AAAI (www.aaai.org). All rights reserved.

We chose to study these theorems for the following reasons:

• Each of these theorems is non-trivial for automated reasoning systems. Indeed, we know of no other automated reasoning system which is capable of producing a fully automated proof of any of them.

• Despite the power of the underlying & theorem prover, that system alone is not able to prove the theorem without the guidance that it obtains from GROVER's interpretation of the diagram. This indicates that the diagram is playing a crucial role in the derivation of the proof.

• Finally, when presented in tutorial mode, either in a textbook or in a classroom setting, the theorems are often explained using diagrams to motivate the proof. In our experience, the diagrams which accompany such presentations are canonical -- they vary little between independent presentations -- and furthermore, when called upon to do so, we ourselves remember the diagrams and then reconstruct the proofs from them, rather than remembering the proofs directly. We take this as an indication that the diagram is playing a key psychological role in the proof.

In working with these theorems we have discovered a number of heuristics that appear to play a significant role in the interpretation of a mathematical diagram. The heuristics concern the identification and ordering of steps in the proof strategy, and the determination of relevant facts to be used at each stage of the proof.¹

How can a Proof be Seen?

We have used GROVER to test some hypotheses about proof visualization, which we describe in this section. The basic hypothesis is that visualizations are partial models of the world to which a proof refers. They are partial because mathematical worlds are typically infinite (for example, the integers, the real numbers, and the universe of sets) and mathematical theorems typically quantify over all objects in such a world.

A visualization of such a theorem consists of exemplars of the patterns asserted to hold. When we prove a universal statement of the form

$$\forall x.\,A(x)$$

for example, we typically say something like "let c be an arbitrary x," and then proceed to demonstrate $A(c)$. If $A$ is an existential formula of the form

$$\exists y.\,B(x, y)$$

then we might construct a y for which $B(c, y)$ holds, or we might prove $\exists y.\,B(c, y)$ by assuming its negation, $\forall y.\,\neg B(c, y)$, and deriving a contradiction. This too will typically involve instantiating y, at some point, to one or more specific objects, from which a contradiction is derived.

¹ A more detailed description of the work presented here, including more detail on all three of the proofs mentioned above, can be found in (Barker-Plummer & Bailin 1997).

We hypothesize that the diagram illustrating such a proof is a trail of the instantiations performed along the way: the objects themselves, together with a representation of the relevant facts about them. These facts are mathematical assertions composed of primitive or defined relations between the objects, and logical operators such as conjunction, disjunction, negation, and implication. The repertoire of relations depends on the particular "world" or field of mathematics in which the theorem is being proved.

In general, the logical operators are not explicitly represented in a visualization. They may serve to interpret the relationships between several images that arise in the course of a proof. For example, a proof by cases, which involves deriving the theorem from a disjunction

$$A \lor B \dots \lor C$$

might involve separate visualizations of A, B, and C. Implication is somewhat more complicated: the proof of

$$A \rightarrow B$$

might involve starting with a visualization of A and elaborating it so that it becomes a visualization of B. This is, in fact, our understanding of the relationship between the hypotheses of a theorem and its conclusion as they appear in a visualization of a proof.

We conjecture that the primary role of visualizations is to represent the relations between exemplar objects. Depending on the particular field of mathematics, there may be preferred representations of certain relations. For example, in set theory we typically illustrate the subset relation by containment of one circle within another.

These observations lead us to the first major decision in the design of GROVER:

A diagram represents a set of facts concerning the properties of, and relations between, exemplar objects that are identified in the course of a proof.

The interpretation of a diagram as a trail of the exemplars invoked in the course of a proof is one of our basic ideas, which we have validated against several (very different) theorems. We develop this idea in the next section.

Example of a Diagram-Based Proof: The Diamond Lemma

The Diamond Lemma, a theorem in the theory of well-founded relations, states that a well-founded relation that is locally confluent is also globally confluent.

The definitions of these terms are as follows:

• The domain of a relation R is the set of all elements that are related by R to some other element, that is, all a such that for some b, either $R(a, b)$ or $R(b, a)$.

• A relation R is well founded (WFR) if there are no infinite R-chains, that is, no infinite series of elements a, b, c, ... such that $R(a, b)$, $R(b, c)$, ....

• A relation R is locally confluent (LCR) if and only if for any three elements a, b, and c in the domain of R, if $R(a, b)$ and $R(a, c)$, then there is an element d such that $R(b, d)$ and $R(c, d)$.

• The transitive closure of R is the relation $R^*$ such that $R^*(a, c)$ if and only if there is an R-chain from a to c, that is, a series $b_1, b_2, \dots, b_n$ such that $R(a, b_1)$, $R(b_1, b_2)$, ..., $R(b_{n-1}, b_n)$, $R(b_n, c)$.

• The relation R is globally confluent (GCR) if and only if its transitive closure $R^*$ is locally confluent.
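For finite relations, properties of this kind can be checked mechanically. The following Python sketch is illustrative only and is not part of &/GROVER; all function names are hypothetical. It rests on two assumptions that the text does not state: a finite relation is treated as well founded exactly when it contains no cycle, and the joining element d is sought in the reflexive-transitive closure of R (the usual reading in rewriting theory), so that b and c may coincide or one may already reduce to the other.

```python
def transitive_closure(rel):
    """Pairs (a, c) connected by a non-empty R-chain."""
    closure = set(rel)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


def reflexive_transitive_closure(rel):
    """Assumption of this sketch: also relate every domain element to itself."""
    dom = {x for pair in rel for x in pair}
    return transitive_closure(rel) | {(x, x) for x in dom}


def is_well_founded(rel):
    """For a finite relation, an infinite R-chain exists only if there is a cycle."""
    return all(a != b for (a, b) in transitive_closure(rel))


def has_diamond_property(step, join):
    """Whenever step(a, b) and step(a, c), some d has join(b, d) and join(c, d)."""
    for (a, b) in step:
        for (a2, c) in step:
            if a2 != a:
                continue
            below_b = {d for (x, d) in join if x == b}
            below_c = {d for (x, d) in join if x == c}
            if not (below_b & below_c):
                return False
    return True


def is_locally_confluent(rel):
    return has_diamond_property(rel, reflexive_transitive_closure(rel))


def is_globally_confluent(rel):
    star = reflexive_transitive_closure(rel)
    return has_diamond_property(star, star)


# A small diamond: well founded, locally and globally confluent.
DIAMOND = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}

# A relation with a cycle: locally confluent but not globally confluent.
LOOP = {("b", "a"), ("b", "c"), ("c", "b"), ("c", "d")}
```

On `DIAMOND` all three checks succeed. On `LOOP` local confluence holds but global confluence fails, and the relation is not well founded; this shows why the well-foundedness hypothesis of the Diamond Lemma cannot simply be dropped.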

The standard proof of this theorem uses a diagram that begins as the upper half of the diamond in Figure 1 and is elaborated in steps, eventually yielding the element h. The proof begins by assuming that arbitrary elements a, b, and c have been selected with $R^*(a, b)$ and $R^*(a, c)$. Since $R^*(a, b)$, there is an R-chain from a to b and therefore there is an element d that is the first element of this chain. Similarly, there is an e one step along an R-chain from a to c.

Now the local confluence of R is invoked to deduce that there is an element f which completes the small diamond shown in Figure 1.

The next step of the proof uses transfinite induction, which is a technique for proving properties about the domain of a well-founded relation. Transfinite induction states that, in order to prove a property $P(a)$ for all elements a in the domain of a well-founded relation R, it suffices to show that the property "climbs up" R. That is, it suffices to show:

For every x in the domain of R, $P(x)$ holds if it holds for every y that is "lower" than x

where y is "lower" than x if $R(x, y)$.

Transfinite induction is applied by observing that e is "lower" than a: we can, therefore, assume the theorem to hold when e is the upper vertex of an $R^*$ diamond. We now have the upper half of an $R^*$ diamond with vertex e, the other elements being f and c. Although e and f are illustrated as related by R, not $R^*$, we can see that there is an R-path from e to f with no intermediate elements (the degenerate case), and therefore $R^*(e, f)$ holds.

Applying the theorem to the half-diamond with vertices e, f and c, we obtain an element g such that $R^*(f, g)$ and $R^*(c, g)$.
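The induction principle used in this step can be written as a single formula. The rendering below is a sketch in standard notation, with the quantifiers ranging over the domain of the well-founded relation R and P standing for the property being proved:

```latex
\[
\Bigl(\forall x.\,\bigl(\forall y.\,(R(x, y) \rightarrow P(y))\bigr) \rightarrow P(x)\Bigr)
\;\rightarrow\; \forall a.\,P(a)
\]
```

In the Diamond Lemma, $P(x)$ is the statement that every $R^*$ half-diamond with upper vertex x can be completed. The proof instantiates y to e and to d, both of which are "lower" than a.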

The next step is to observe that there is an R-path from d to g, passing through f. Thus, $R^*(d, g)$ holds even though it is not explicitly noted in Figure 1.

The R-path from d to g provides another opportunity to apply transfinite induction. This time we observe that d is "lower" than a and therefore that the half-diamond with vertices d, b, and g can be completed with an element h as shown in Figure 1.

Figure 1: Completion of the proof of the Diamond Lemma

Finally, observing in Figure 1 that there is an R-path from c to h (through g), we see that the theorem has been successfully proven.

Diagrams as Staged Observations: The Existential Solve Heuristic

Existential solve is a heuristic procedure that we use to infer the trail of existence proofs implicit in a diagram. We first developed existential solve as a means to prove the Diamond Lemma automatically. We then discovered that it plays an essential role in the proofs of two other difficult theorems, which are described in this paper.

Existential solve implements the reasoning described in the proof of the Diamond Lemma above. The goal of the heuristic is to construct a sequence of lemmas, each of which proves the existence of (or "solves" for) one existential object in the diagram.² The key point is that the objects are solved for successively, one at a time. The heuristic is used to determine which object to solve for first, which next, and so on.

Solving for an existential object means proving the existence of an object that has the properties asserted in the diagram. This is not as obvious a process as it might seem, however, because some properties may involve other objects which may not have been solved for yet. The procedure must, therefore, not only determine a succession of existential objects, but for each such object it must decide which properties of the object are to be considered defining properties.

The key idea in existential solve is to use the availability of defining properties as the principal criterion for ordering the existential objects. A defining property is a formula whose variables consist only of the following:

• One and only one existential object that has not already been solved for,

• Universal objects,

• Existential objects that have already been solved for.

² Throughout this paper we indicate that an object is existential by writing it in bold face.

At the beginning of the proof of the Diamond Lemma there are two existential objects with defining properties, d and e. When there is more than one candidate, existential solve chooses the existential object whose defining properties, taken together, contain the most other objects (universals and previously solved for existentials). The rationale for this criterion is that a greater number of objects in the properties means, in some sense, more information, or greater constraint, and thus a stronger definition. If there are ties when this criterion is applied, existential solve proves the existence of the remaining candidates in logical parallel. That is, a random order is used, but since the defining properties of each object do not reference the competitor objects, the selected order has no effect on the resulting proofs.

Existential solve organizes the existential objects in the diagram into a partial order by repeatedly applying the criteria just described. With each selection of the next object to be solved for, that object becomes available to appear in the defining properties of other objects. Eventually, every existential object will have at least one defining property, and the ordering process is then complete.

Existential Solve in the Diamond Lemma

To see how existential solve works in the Diamond Lemma, we apply it to the diagram in Figure 1. The following formulae are explicitly represented in the diagram:

$R^*(a, b)$   $R^*(a, c)$   $R(a, d)$   $R(a, e)$
$R^*(d, b)$   $R^*(e, c)$   $R(d, f)$   $R(e, f)$
$R^*(c, g)$   $R^*(f, g)$   $R^*(b, h)$   $R^*(g, h)$

All of the objects are existential except a, b, and c, which are identified as universal in the hypothesis of the theorem.

In the first pass of existential solve, the existential objects with potentially defining properties are d, e, g, and h. The defining properties of d are

$R(a, d)$ and $R^*(d, b)$

with universals a and b. The defining properties of e are

$R(a, e)$ and $R^*(e, c)$

with universals a and c. The only defining property of g at this stage is $R^*(c, g)$, and the only one for h is $R^*(b, h)$. Since each of these contains only one universal, g and h are ruled out at this stage. There is no way to break the tie between d and e, so the order in which they are solved for is randomly chosen.

In the next pass, d and e may appear in the defining properties of other objects, so the object f has the defining properties

$R(d, f)$ and $R(e, f)$

The presence of the two previously solved for objects, d and e, means that f now wins out over g and h, each of which still has only one defining property containing only one other object.

In the next pass, g has the defining properties

$R^*(f, g)$ and $R^*(c, g)$

Now g wins over h because its defining properties contain two other objects, f and c, while h still has only one defining property, containing one other object.

In the final pass, h has the defining properties

$R^*(b, h)$ and $R^*(g, h)$

and this marks the end of the trail.
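The ordering just traced can be reproduced with a short Python sketch. It is an illustration of the heuristic as described in this section, not GROVER's implementation; the function name and data layout are hypothetical. Each fact is reduced to the pair of objects it mentions, since only those objects matter to the ordering, and tied candidates are returned together as one stage.

```python
def existential_solve(facts, universals):
    """Group existential objects into stages, in the order they are solved for."""
    known = set(universals)
    unsolved = {obj for fact in facts for obj in fact} - known
    stages = []
    while unsolved:
        scores = {}
        for x in unsolved:
            # Defining properties of x: facts that mention x and, apart from x,
            # only universals or existentials that are already solved for.
            defining = [f for f in facts
                        if x in f and all(o == x or o in known for o in f)]
            if defining:
                # Score: number of distinct other objects in those properties.
                scores[x] = len({o for f in defining for o in f} - {x})
        if not scores:
            raise ValueError("no remaining object has a defining property")
        best = max(scores.values())
        stage = sorted(x for x, s in scores.items() if s == best)
        stages.append(stage)          # ties are solved in logical parallel
        known.update(stage)
        unsolved.difference_update(stage)
    return stages


# Objects mentioned by the twelve facts of the Diamond Lemma diagram.
DIAMOND_FACTS = [
    ("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"),
    ("d", "b"), ("e", "c"), ("d", "f"), ("e", "f"),
    ("c", "g"), ("f", "g"), ("b", "h"), ("g", "h"),
]

stages = existential_solve(DIAMOND_FACTS, universals={"a", "b", "c"})
# stages == [["d", "e"], ["f"], ["g"], ["h"]]
```

The first stage contains both d and e, reflecting the tie described above; f, g, and h then follow one at a time.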

Diagrams as Elisions of Infinitely Many Observations

Recall our view of diagrams as partial models of the world to which a proof refers. Diagrams are finite, while mathematical worlds are typically infinite. While a theorem may quantify over an infinite range of objects (as in "for every integer i..."), a diagram expressing the theorem will focus on an arbitrary example in that range (as in "let $i_0$ be an arbitrary integer").

In some proofs, the diagram consists of a finite number of such exemplars plus a finite number of existential objects that are "defined" (more precisely, proven to exist) in terms of these exemplars -- and relations between these objects. Frequently, however, this does not suffice to convey a proof. In many cases it is necessary to represent an infinite range of objects through elision. The ellipsis notation (...) is the most common means of expressing an infinite range through elision.

When a mathematical argument relies on the implicit performance of an arbitrary number of calculations or operations, rigorous presentation of the argument must be based on inference rules that permit such reasoning. The most common of such rules are various forms of induction.³ When the "arbitrary" number is finite (but arbitrarily large), the appropriate rule is some form of mathematical induction -- that is, induction over the natural numbers -- as opposed to transfinite induction, which operates over an arbitrary (possibly infinite) well-founded tree.⁴

When GROVER detects the presence of ellipses in a diagram, it tries to determine whether a finite (but arbitrarily long) series is being represented, and hence whether mathematical induction should be applied. If the objects connected by the ellipses are labeled similarly except for numerical (integer) subscripts, GROVER interprets this to indicate a situation requiring mathematical induction.

³ These are not the only rules that permit such reasoning: others include the Axiom of Choice and its many equivalents.

⁴ A well-founded tree is one that may have infinite branching but no infinite paths.

Figure 2: Graphical Statement of The Multiple Peaks Theorem

A proof by mathematical induction consists of two parts, the base case and the step case. The base case proves the theorem for the first element in the series of objects. The step case proves the theorem for an arbitrary element in the remainder of the series, on the assumption that it holds for the preceding element. Accordingly, when GROVER recognizes a diagram calling for mathematical induction, it decomposes the diagram into two simpler diagrams, one for the base case and one for the step case, using the ellipses to determine where the separation should occur.

GROVER's assumption in performing this decomposition is that each of the resulting diagrams will contain enough information to prove its part of the theorem. In particular, the diagram for the step case must not only express the desired conclusion (e.g., that the property $P(x)$ holds for $x = n + 1$), but it must also express the inductive hypothesis (i.e., that $P(n)$ holds) which will be used to derive the conclusion $P(n + 1)$. GROVER verifies this as part of the more general process of matching the diagram with the corresponding conjecture. This process was described in (Barker-Plummer & Bailin 1992). If the step case diagram does not represent the induction hypothesis, the process will fail even before a proof is attempted. In this sense, GROVER has a built-in safeguard against improperly interpreting the ellipses.

The Multiple Peaks Theorem

To see how the interpretation of ellipses works, we present the proof of the Multiple Peaks Theorem. In Figure 2, $R^*$ refers to the transitive closure of a relation R. The theorem states that, if R is globally confluent then an object h can be found so that the figure can be completed along the dotted lines, i.e., $R^*(b_0, h)$ and $R^*(b_{k+1}, h)$.

Most notable about this theorem is the fact that the number of "peaks," represented by the variable n, is arbitrary.

Figure 3: The Diagram for the Base Case of The Multiple Peaks Theorem

The proof is a straightforward application of mathematical induction. The base case (n = 0) follows immediately from the global confluence of R. The step case (n = k + 1) follows from the inductive hypothesis, which gives us the existence of an $h_k$ such that

$$R^*(b_0, h_k) \land R^*(b_{k+1}, h_k)$$

Since we also have, from the assumptions of the theorem, that

$$R^*(a_{k+1}, b_{k+1}) \land R^*(a_{k+1}, b_{k+2})$$

we infer, from the transitivity of $R^*$, that

$$R^*(a_{k+1}, h_k)$$

We therefore use the global confluence of R to get the existence of an $h_{k+1}$ such that

$$R^*(h_k, h_{k+1}) \land R^*(b_{k+2}, h_{k+1})$$

and then, from the transitivity of $R^*$ again, infer that

$$R^*(b_0, h_{k+1})$$

As in the proof of the Diamond Lemma, the transitivity of $R^*$ is automatically inferred by & and applied where needed.
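The induction can be read as a left-to-right fold over the peaks. The Python sketch below illustrates this on a toy relation chosen purely for the example: x reaches y when x divides y, a relation that is transitive and in which any two elements have a common descendant (their least common multiple). The relation, the function names, and the sample data are all assumptions of the sketch and do not come from &/GROVER.

```python
from math import gcd


def reaches(x, y):
    """Toy stand-in for R*(x, y): x reaches y when x divides y."""
    return y % x == 0


def join(peak, left, right):
    """Global confluence for the toy relation: from a peak reaching both
    left and right, return a common descendant of left and right."""
    assert reaches(peak, left) and reaches(peak, right)
    return left * right // gcd(left, right)


def complete_peaks(a, b):
    """Given peaks a[0..n] and feet b[0..n+1] with a[i] reaching b[i] and
    b[i+1], return an h reached by both b[0] and b[n+1]."""
    assert len(b) == len(a) + 1
    h = join(a[0], b[0], b[1])            # base case: one peak
    for k in range(len(a) - 1):           # step case, repeated
        # a[k+1] reaches b[k+1], and b[k+1] reaches the current h, so by
        # transitivity a[k+1] reaches h; confluence then gives the next h.
        h = join(a[k + 1], h, b[k + 2])
    return h


peaks = [2, 3, 5]
feet = [4, 6, 15, 10]
h = complete_peaks(peaks, feet)
```

For the sample data the fold yields h = 60, which is reached by both the first foot (4) and the last foot (10). Each loop iteration corresponds to one use of the step case in the proof above.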

GROVER interprets each ellipsis in Figure 2 as representing a sequence $t_1 \dots t_m$ of objects. Since the objects in one of the sequences are existential, GROVER infers that their existence is to be proven by mathematical induction. GROVER therefore replaces Figure 2 with Figures 3 and 4, and the theorem itself is decomposed into a base case and a step case by applying &'s mathematical induction tactic.

Having decomposed both the diagram and the theorem into two parts, GROVER must now match the terms in each theorem with objects in the corresponding diagram so that the theorem's hypotheses are recognized as facts in the diagram.

Through its analysis of the ellipses as a shorthand for mathematical induction, GROVER is able to associate the diagram subscript k with the induction variable n in the theorem. Completing the association process is complicated, however, by a discrepancy in representation between the theorem and the diagram. The theorem (in its original form as well as in the step-case) does not contain the universal variables $a_0, a_1, \dots, a_k, a_{k+1}$ and $b_0, b_1, b_2, \dots, b_k, b_{k+1}, b_{k+2}$ but rather two universal variables a and b, which are applied as functions to an index variable i. In order to complete the association, therefore, GROVER must establish the correspondence between the variables a and b in the theorem and the instantiated terms in the diagram.

Figure 4: The Diagram for the Step Case of The Multiple Peaks Theorem

GROVER solves this problem using the idea of spanning hypotheses -- hypotheses of the form

$$\forall x.\,(x < n + 1 \rightarrow A) \qquad (1)$$

where $n + 1$ is associated with a diagram spanning limit, which is the subscript of the final term of a sequence in the diagram. GROVER replaces each spanning hypothesis (1) with the instantiated formulae $A[t/x]$ for all spanning instances t, which are the diagram objects participating in one of the diagram sequences. GROVER is then able to match the hypotheses of both the base case and step case theorems with facts in their respective diagrams.

Focus of Attention: Choosing Relevant Hypotheses

A diagram fact that has been proven from the conjecture's hypotheses is available as a hypothesis during any individual step of the proof strategy. Furthermore, the conclusion of any previous step in the strategy is available as a hypothesis in subsequent steps. Not all of these potential hypotheses are necessarily useful, however, and in order to facilitate &'s search for a proof, GROVER tries to keep the hypotheses to a minimum. Underlying GROVER's approach is the idea that some previously proven facts are relevant to the current lemma, and some are not. If a hypothesis is explicitly cited as a hint for a given object's existence, GROVER assumes that it is relevant. GROVER determines the relevancy of other facts by comparing the terms found in the current lemma to those found in the potential hypotheses. The objective is to find hypotheses that, taken together, mention all of the terms found in the lemma's conclusion. We call this a process of covering all of the lemma's terms.

To determine the hypotheses for a given lemma, a heuristic algorithm examines the preceding lemmas to see whether any of them can contribute to "covering" the current lemma's terms. The algorithm proceeds backwards, examining the most recent lemmas first and then, if necessary, moving on to the earlier lemmas. As this process continues, the set of terms that still need to be covered shrinks.

The obvious selection criterion would be to add a previous lemma to the hypotheses of the current lemma if the previous lemma contains any of the terms remaining to be covered. We found that a more restrictive criterion of relevancy is necessary, however. A measure of relevancy is provided by defining two classes of terms in the current lemma:

1. Terms from the lemma's conclusion that still need to be covered -- we call these the required terms

2. Terms that appear in the lemma's conclusion or in the hypotheses thus far selected -- we call these the desired terms

GROVER sorts parallel lemmas by 1) the number of required terms they contain, and 2) within that, the number of desired terms they contain. If none of the parallel lemmas contains any required or desired terms, the algorithm proceeds to the next latest set of parallel lemmas to consider as candidate hypotheses. Otherwise, the parallel lemmas that come out best in the sort -- i.e., the highest number of required terms, and within that the highest number of desired terms -- are selected as hypotheses.

When the process described above is complete -- either because all of the required terms have been covered, or because there are no more earlier lemmas to provide potential hypotheses -- GROVER considers the hypotheses of the theorem, and the diagram facts that represent them. GROVER again applies a relevancy criterion to determine which of these might be suitable hypotheses for the current lemma.

To understand how the procedure we have just described helps to prune hypotheses, we consider the final lemma step of the Multiple Peaks Theorem, which is the theorem's conclusion:

$$\exists h.\,(R^*(b_0, h) \land R^*(b_{k+2}, h))$$

We back up to the preceding lemma, which is

$$R^*(h_k, h_{k+1}) \land R^*(b_{k+2}, h_{k+1})$$

This lemma contains the required term $b_{k+2}$, but the required term $b_0$ still needs to be covered, so we back up to the parallel lemmas

$$R^*(b_0, h_0) \land R^*(b_1, h_0)$$

$$R^*(b_0, h_k) \land R^*(b_{k+1}, h_k)$$

Both lemmas contain the required term $b_0$, so we must look to the desired terms in order to break the tie. The $h_k$ goal wins because it contains the desired term $h_k$ while the $h_0$ goal contains no other desired term.
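The backward covering pass can be sketched in a few lines of Python. This is an illustration of the selection rule as described in this section, not GROVER's code; the function name, the lemma names, and the representation of a lemma as a named set of terms are hypothetical. Lemmas that tie for best within a stage are all selected.

```python
def select_hypotheses(conclusion_terms, stages):
    """Pick earlier lemmas as hypotheses, scanning the stages backwards.

    stages: list of stages in proof order; each stage is a list of
            (name, terms) pairs for lemmas proved in parallel.
    """
    required = set(conclusion_terms)   # conclusion terms still to be covered
    desired = set(conclusion_terms)    # conclusion terms plus selected terms
    selected = []

    def score(terms):
        return (len(terms & required), len(terms & desired))

    for stage in reversed(stages):
        if not required:
            break                      # every required term is covered
        if not stage:
            continue
        best = max(score(terms) for _, terms in stage)
        if best == (0, 0):
            continue                   # nothing relevant here; look earlier
        winners = [(name, terms) for name, terms in stage
                   if score(terms) == best]
        for name, terms in winners:
            selected.append(name)
            required -= terms
            desired |= terms
    return selected


# The example from the text: lemmas preceding the final lemma of the
# Multiple Peaks Theorem, in proof order.
stages = [
    [("h0", {"b0", "b1", "h0"}), ("hk", {"b0", "bk+1", "hk"})],
    [("step", {"hk", "hk+1", "bk+2"})],
]
chosen = select_hypotheses({"b0", "bk+2"}, stages)
# chosen == ["step", "hk"]
```

The lemma about $h_{k+1}$ is selected first because it covers $b_{k+2}$. It also makes $h_k$ a desired term, which is what lets the $h_k$ lemma beat the $h_0$ lemma in the earlier stage.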


Diagram Idioms: Visualization and Abstraction

In this section we will describe the process by which we move from the diagram to a collection of formulae which it represents. This is a crucial step in GROVER's automatic processing of the diagram.

One of the key components of &/GROVER is a graphical editor called DEGAS.⁵ DEGAS is a rather conventional graphical editor, with tools allowing the drawing of lines, ellipses, and rectangles, and for attaching labels to these objects. The most important feature of DEGAS for GROVER is that it is able to save the diagram in the form of a geometry facts file (G-file).⁶ The G-file is a generic textual representation of the diagram structure, irrespective of any semantics that we associate with the diagram.

The use of a graphical editor which is able to produce a representation of the diagram at this level of abstraction allows us to avoid some potentially difficult problems in understanding the diagram. We do not, for example, have to be involved in line-finding, recognizing collections of lines as rectangles, worrying about whether a collection of points are collinear, and so on. DEGAS provides us with a representation of the structure of the diagram which is based on the tools used to draw the diagram.

Interpreting the Diagram

When presented with a diagram, GROVER must interpret it as representing facts that are expected to follow from the hypotheses of the current theorem. We have developed a small expert system for carrying out this task. An important point in understanding the development of this system is that we are not attempting to develop a new language for drawing diagrams; rather, we are trying to ensure that the system properly interprets the "natural" diagram for proving a given theorem. The rules of the expert system are intended, therefore, to capture the usual practice of mathematical diagrams rather than to define a new language. While we do not believe that a complete and correct set of rules for achieving this goal necessarily exists, we do believe that we can devise a generally useful set of rules that approximate this desire.

We also observe that the rules used in interpreting diagrams will depend on the mathematical context in which the diagram is drawn. For example, a circle in a diagram represents an abstract mathematical circle if the diagram is offered in the context of a geometry proof, while it probably represents a set when offered in a set theory proof like the Schröder-Bernstein Theorem. In addition, the specific diagrammatic idioms used may differ from author to author in an idiosyncratic manner. Both of these factors indicate the existence of a number of diagrammatic idioms used in mathematics, rather than a single unified language of mathematical diagrams.

⁵ DEGAS is the Diagram Editor for the GROVER Automated System.

⁶ DEGAS can also save the diagram as a postscript file, or in a representation suitable for saving and restoring diagrams within DEGAS.

The interpretation of the diagram is divided into two parts: a local analysis, and a global analysis. The local analysis phase produces atomic formulae from the spatial and explicit relationships in the diagram, and writes them to a logic file (L-file). The global analysis phase detects larger constructions in the diagram.

Local Analysis: Geometry To Logic

The analysis of the diagram proceeds in a bottom-up fashion. First the individual objects in the diagram are examined. The labels that are associated with some objects are symbolic representations of the objects. Various types of labels are allowed in our system, corresponding to the practices that we have encountered. The simplest label attaches a name to an object, but more structured labels are possible, for example, the label "c : R*(a, c)" indicates that the labeled object is called c and that it has the property $R^*(a, c)$. Other label forms that are allowed include equalities such as $a = f(b)$.

The analysis of the labels, as we have just seen, can lead to some formulae being discovered, but the system may obtain further facts from the geometric relationships between the objects in the diagram. For example, our expert system interprets a dot within a closed figure as the $\in$ relation and a closed figure completely within another as the $\subseteq$ relation.

In addition to arbitrary geometric relationships, relationships may be stated explicitly. For example, given an arc labeled with the formula "R" whose end points are dots labeled a and b respectively, we infer that the meaning of the arc is $R(a, b)$. This is because, in the language of our prover, "R" has the right form to be a predicate symbol. An alternative reading is possible, namely $(a, b) \in R$. This possible interpretation is not eliminated completely by our system, but it is deemed to be less likely (since R is not a legal term in the syntax of our logic), and the preferred interpretation is returned by the system. The rules for interpreting arrows are similar to those for interpreting arcs, except that the preferred interpretation of an arrow is as representing a function. For example, in the Schröder-Bernstein Theorem diagram (Figure 5), an arrow labeled by the term "f" and end-points labeled "a" and "b" will be interpreted as $(a, b) \in f$, since f is a constant term, rather than a predicate symbol, in the & logic.
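The flavor of these local rules can be conveyed by a small Python sketch. It is only an illustration: the tuple encoding of geometric facts is invented for the example and is not the G-file format, the set of predicate symbols is an assumption, and only the preferred reading of each fact is returned.

```python
# Assumption: symbols that the prover's language would accept as predicates.
PREDICATE_SYMBOLS = {"R", "R*"}


def interpret(fact):
    """Return the preferred atomic formula for one geometric fact."""
    kind = fact[0]
    if kind == "dot_in_figure":          # ("dot_in_figure", dot, figure)
        _, dot, figure = fact
        return f"{dot} ∈ {figure}"
    if kind == "figure_in_figure":       # ("figure_in_figure", inner, outer)
        _, inner, outer = fact
        return f"{inner} ⊆ {outer}"
    if kind in ("arc", "arrow"):         # (kind, label, start, end)
        _, label, start, end = fact
        if kind == "arc" and label in PREDICATE_SYMBOLS:
            # The label has the form of a predicate symbol: read as R(a, b).
            return f"{label}({start}, {end})"
        # Otherwise the label is a term: read as membership of the pair.
        return f"({start}, {end}) ∈ {label}"
    raise ValueError(f"unknown geometric fact: {fact!r}")


formulae = [interpret(f) for f in [
    ("arc", "R", "a", "b"),
    ("arrow", "f", "a", "b"),
    ("dot_in_figure", "x", "A"),
    ("figure_in_figure", "B", "A"),
]]
```

A fuller version would keep the less likely readings alongside the preferred one, since the text notes that alternatives are ranked rather than eliminated.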

As another example, in the diagram of figure 5 we use the device of dividing a circle into two parts by a straight line. This indicates a partition of the set represented by the circle into two disjoint subsets. The natural diagram might instead divide the enclosing set by indicating a subset of that set using a second enclosed circle. GROVER would correctly interpret this diagram as indicating that the enclosed circle represents a subset of the set represented by the enclosing circle, but this would not cause the system to focus on the remainder of the enclosing circle as an object in its own right, which is what we need in the proof of the Schröder-Bernstein Theorem. We can imagine other devices for dealing with this problem, for example the use of shading to indicate the salience of the remainder of the circle as an object.

Global Analysis: Verify Logic

The result of the local analysis of the G-file is a collection of atomic formulae, which are implicitly conjoined. We call this representation a Logic File (L-file). Diagrams can, however, represent more complex structures than a flat collection of atomic formulae. These structures are detected in an analysis of the L-file which we call verify logic. Verify logic is only activated once the G-file representation has been completely interpreted as an L-file, so it is an operation on logical formulae. In principle, the same processing could be performed on the G-file representation, or interleaved with the geometry to logic phase. From an implementor’s point of view, however, it is simpler to wait until the L-file representation is complete before looking for higher-level structures.

The global analysis is implemented as a collection of "critics", each of which looks for specific conditions that might hold within the diagram, and modifies the logical representation appropriately. For example, one of the critics implemented in GROVER is the definition by cases critic.

The definition by cases critic is triggered by the presence of two equalities in the L-file of the form $x = t_1$, $x = t_2$, where x is an existential object, and $t_1$, $t_2$ are arbitrary terms involving only universal objects. It is a general feature of diagrams that distinct tokens represent distinct objects (token referentiality, see (Barwise 1993)), and therefore such a pair of equalities presents a puzzle on the face of it. One explanation is that the diagrammer is attempting to assert $t_1 = t_2$, but the role of x is then unexplained. The definition by cases critic attempts to gather evidence that the existential object x is being defined by cases, as under some circumstances being equal to $t_1$ and under other disjoint circumstances being equal to $t_2$. If such evidence can be found, the equalities $x = t_1$ and $x = t_2$ are replaced by the critic with the more complex formula $(P \rightarrow x = t_1) \wedge (Q \rightarrow x = t_2)$, where P and Q are possibly complex formulae representing the two alternative conditions.
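The trigger and rewrite performed by this critic can be sketched as follows. The function name, the data encoding, and the `conditions` argument are assumptions of the sketch: the evidence gathering that produces P and Q is the hard part and is simply supplied by the caller here, and the requirement that the terms involve only universal objects is not checked.

```python
from collections import defaultdict

def definition_by_cases(equalities, existentials, conditions):
    """Rewrite pairs of equalities on an existential object into cases.

    equalities:   list of (x, term) pairs read as x = term
    existentials: names of the existential objects
    conditions:   maps (x, term) to the condition under which x = term
                  holds (hypothetical stand-in for gathered evidence)
    Returns the rewritten formulae as strings.
    """
    by_object = defaultdict(list)
    for x, term in equalities:
        by_object[x].append(term)

    result = []
    for x, terms in by_object.items():
        triggered = x in existentials and len(terms) == 2
        has_evidence = all((x, t) in conditions for t in terms)
        if triggered and has_evidence:
            t1, t2 = terms
            p, q = conditions[(x, t1)], conditions[(x, t2)]
            result.append(f"({p} → {x} = {t1}) ∧ ({q} → {x} = {t2})")
        else:
            # No trigger or no evidence: leave the equalities untouched.
            result.extend(f"{x} = {t}" for t in terms)
    return result
```

When no evidence is available the equalities are returned unchanged, which matches the conditional wording "if such evidence can be found".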

Example: The Schröder-Bernstein Theorem

The definition by cases critic plays a crucial role in the Schröder-Bernstein Theorem, a theorem from the theory of functions which concerns the way in which the "size" of sets can be measured. The Schröder-Bernstein Theorem states that if there is a one-one function (an injection) from the set A into the set B, and a one-one function from B into A, then there is a bijection between the two sets, i.e., a one-one function from A onto B.

$$\forall f, g, A, B.\ \mathrm{Injection}(f, A, B) \wedge \mathrm{Injection}(g, B, A) \rightarrow \exists h.\ \mathrm{Bijection}(h, A, B)$$

An intuitive proof of the Schröder-Bernstein Theorem would proceed as follows: The bijection h must be some combination of f and $g^{-1}$, i.e., for each $a \in A$, h(a) will be either f(a) or $g^{-1}(a)$. The problem is therefore to define a partition of A into sets $A_1$ and $A_2$ so that h behaves like f for members of $A_1$ and like $g^{-1}$ on members of $A_2$. Since h is to be a bijection, every $b \in B$ will have to be in range(h). Therefore, if b is not in range(f), then $h^{-1}(b)$ must be in $A_2$. So $A_2$ contains $g(B - \mathrm{range}(f))$. Moreover, $A_2$ must be closed under $g \circ f$, because if $a \in A_2$ then $h(a) = g^{-1}(a)$, so h(a) cannot be f(a) unless $f(a) = g^{-1}(a)$. Therefore, if $f(a) \neq g^{-1}(a)$, f(a) must be "hit" under h by some other element of A, which can only be g(f(a)). So let $A_2$ be the smallest set containing $g(B - \mathrm{range}(f))$ and closed under $g \circ f$, and let $A_1$ be $A - A_2$.
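The construction in this intuitive proof can be tried out on a toy instance. The sketch below is an illustration only: the names `make_h` and `in_a2`, the use of integers, and the choice f(n) = g(n) = n + 1 on the natural numbers are all assumptions. Membership in $A_2$ is decided by walking backwards along $g \circ f$, which only terminates when every backward chain is finite.

```python
from typing import Callable, Optional

Inverse = Callable[[int], Optional[int]]

def make_h(f: Callable[[int], int], f_inv: Inverse, g_inv: Inverse):
    """Build h from injections f: A -> B and g: B -> A.

    f_inv and g_inv return None outside range(f) and range(g).
    Assumes every backward chain is finite, so in_a2 terminates.
    """
    def in_a2(a: int) -> bool:
        while True:
            b = g_inv(a)
            if b is None:
                return False      # a is not hit by g, so a is in A1
            prev = f_inv(b)
            if prev is None:
                return True       # a = g(b) with b in B - range(f)
            a = prev              # a = g(f(prev)): same answer as prev

    def h(a: int) -> int:
        # h behaves like g^-1 on A2 and like f on A1.
        return g_inv(a) if in_a2(a) else f(a)

    return h, in_a2

# Toy instance (an assumption): A = B = natural numbers, f(n) = g(n) = n + 1.
succ = lambda n: n + 1
pred = lambda n: n - 1 if n >= 1 else None
h, in_a2 = make_h(succ, pred, pred)
```

In this instance B - range(f) is {0}, so $A_2$ is the set of odd numbers and h swaps each even number 2k with 2k + 1, which is a bijection on the natural numbers.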

The diagram of figure 5 illustrates this strategy.

Figure 5: The Diagram for the Schröder-Bernstein Theorem

The diagram contains objects h, $A_1$, $A_2$ and C whose existence must be proved, and in addition it represents the definition of these objects.

The two arrows defining the function h in the diagram, one arrow labeled h = f and the other labeled $h = g^{-1}$, are recognized by the definition by cases critic as indicating a definition of the function h by cases. The critic looks at the source points of the respective arrows, to determine whether they indicate that the function h is defined to be f on some subset of its domain, and $g^{-1}$ on the other subset of the domain.

Three other critics are needed in the diagrammatic proof of the Schröder-Bernstein Theorem. The function chains critic looks for information concerning items at the end points of arrows, in order to construct appropriate assertions concerning the relationships between objects at the ends of these arrows. The diagram of figure 5 contains universal objects $a_1$, $a_2$, $a_4$ and $b_3$, whose role in the diagram is to serve as starting points for function arrows. Since these are universal objects, they are exemplars for arbitrary objects with the same properties that they themselves exhibit. The function chains critic generalizes the formulae containing these objects to universal formulae.

$b_3$, for example, is a member of C which is mapped by the function g onto some member of $A_2$. Rather than view this structure as three distinct formulae, $b_3 \in C$, $(b_3, a_3) \in g$ and $a_3 \in A_2$, we recognize that the geometric structure is intended to represent that every member of C is mapped by g to some member of $A_2$.

The function chains critic examines the L-file for formulae which match this pattern, constructing the appropriate generalizations of the specific formulae. The part of the diagram which is significant for this step is shown in figure 6.

Figure 6: Function Chains

The result of applying the function chains critic to the formulae just mentioned is the new formula:

$$\forall b_3.(b_3 \in C \rightarrow \forall x.((b_3, x) \in g \rightarrow x \in A_2))$$

The same critic notes that the formulae $a_4 \in A_2$, $(a_4, b_4) \in f$, $(b_4, a_5) \in g$ and $a_5 \in A_2$ indicate that an arbitrarily chosen element in $A_2$ maps under $g \circ f$ back into $A_2$.

The formulae involving $a_4$, $b_4$ and $a_5$ have the same structure, except that this represents a chain of function applications. Again, the chain beginning with the universal object is traversed, and the properties of the beginning and end points of the chain examined. The result is a universal formula which asserts that all start points with the same properties as the exemplar are mapped by the same chain to end points with the same properties as its exemplar.

The result of applying this critic to the chain is:

$$\forall a_4.(a_4 \in A_2 \rightarrow \forall x, y.(((a_4, x) \in f \wedge (x, y) \in g) \rightarrow y \in A_2))$$

These formulae capture the intent of the larger structure in the diagram, by aggregating facts recognized as forming a pattern into an appropriate compound formula.
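The traversal and generalization performed by the function chains critic can be sketched as follows. The function name, the dictionary encoding of the L-file facts, and the fresh variable names x1, x2, ... are assumptions of the sketch; the output is a formula string with the same shape as the formulae given for the two chains.

```python
def generalize_chain(start, members, arrows):
    """Generalize an exemplar chain to a universal formula (as a string).

    start:   name of the universal exemplar, e.g. "b3"
    members: maps object name -> enclosing set, e.g. {"b3": "C", "a3": "A2"}
    arrows:  maps source name -> (function label, target name)
    """
    # Traverse the chain of arrows beginning at the exemplar.
    hops, current, seen = [], start, {start}
    while current in arrows:
        label, target = arrows[current]
        if target in seen:
            raise ValueError("arrow chain contains a cycle")
        seen.add(target)
        hops.append((current, label, target))
        current = target
    if not hops:
        raise ValueError(f"no arrow leaves {start}")

    # Replace each object reached along the chain by a fresh bound variable.
    names = {start: start}
    for i, (_, _, target) in enumerate(hops):
        names[target] = f"x{i + 1}"
    bound = ", ".join(names[t] for _, _, t in hops)
    body = " ∧ ".join(f"({names[s]}, {names[t]}) ∈ {lab}" for s, lab, t in hops)
    end = names[current]
    return (f"∀{start}.({start} ∈ {members[start]} → "
            f"∀{bound}.(({body}) → {end} ∈ {members[current]}))")
```

Only the properties of the start and end points are used, which follows the description of the critic examining "the beginning and end points of the chain".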

On the basis of the formulae derived by the function chains critic, the closure critic recognizes that $A_2$

1. contains the image under g of B - range(f),

2. and is closed under the composition $g \circ f$,

and therefore that $A_2$ is (probably) intended to be the closure of the given base set under the composition of g and f. The crucial part of the diagram for this critic is coincidentally identical to the part relevant to the function chains critic, so consult figure 6.
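The notion of closure used here can be illustrated by a least-fixpoint computation on a finite fragment. This is a sketch under stated assumptions: the function name `closure` is hypothetical, and `step` is a dictionary standing in for $g \circ f$ on finitely many elements, whereas the sets in the theorem need not be finite.

```python
def closure(base, step):
    """Smallest set containing `base` and closed under the partial map `step`.

    step is a dict standing in for g o f on a finite fragment of A
    (an assumption; the real sets need not be finite).
    """
    result, frontier = set(base), list(base)
    while frontier:
        a = frontier.pop()
        nxt = step.get(a)
        # Add the image of a if it is defined and not yet collected.
        if nxt is not None and nxt not in result:
            result.add(nxt)
            frontier.append(nxt)
    return result
```

For example, with base set {1} and a step map sending n to n + 2 on 0..3, the closure is {1, 3, 5}.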

The choice to consider $A_2$ as the closure rather than some superset of the closure is heuristic, but we believe that this is generally likely to be the intention, particularly when no additional information about the set is available, as in this case. The choice of $A_2$ as the closure means that we will add a formula to the L-file indicating that $A_2$ is a subset of all non-empty sets with the properties 1 and 2. Note that this formula does not imply that the set $A_2$ itself enjoys properties 1 and 2 above. A proof of this fact must be constructed by the theorem prover later in the processing.

The closure critic adds the hint that $A_2$ is defined to be this intersection. This hint is used when the individual goals of the strategy are constructed.
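Under one reading of properties 1 and 2, the intended definition can be written as an intersection. The restriction to subsets X of A and the set-builder notation are assumptions of this rendering rather than formulae taken from the L-file:

```latex
$$A_2 = \bigcap \{\, X \subseteq A \mid g(B - \mathrm{range}(f)) \subseteq X \ \wedge\ \forall a \in X.\ g(f(a)) \in X \,\}$$
```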

The final critic used in the proof is the generalize domain and range critic, which is responsible for inferring the intended domains and ranges of Function assertions. In the diagram of figure 5, the only arrows labeled by g have target points in $A_2$, but we do not know that g’s range is just $A_2$. Indeed in the intended proof, g is an injection into A. The generalize domain and range critic examines the diagram looking at all of the target points of arrows sharing the same label. Having identified these end points, the critic identifies the largest graphical object containing all of these end points, and asserts this as the set into which the function maps. This results in the L-file formula $\mathrm{Function}(g, B, A_2)$ being replaced by $\mathrm{Function}(g, B, A)$, and $\mathrm{Function}(f, A, \mathrm{range}(f))$ by $\mathrm{Function}(f, A, B)$.
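The selection step of this critic can be sketched as a search over the regions of the diagram. The function name and data encoding are hypothetical, and "largest" is approximated by the number of points a region encloses, which is an assumption of the sketch rather than a geometric measure.

```python
def generalize_range(targets, contains):
    """Pick the largest graphical object containing all arrow targets.

    targets:  names of the target points of arrows sharing one label
    contains: maps each region name to the set of points drawn inside it
    "Largest" is approximated by the number of enclosed points (assumption).
    """
    candidates = [region for region, points in contains.items()
                  if set(targets) <= points]
    if not candidates:
        return None
    return max(candidates, key=lambda region: len(contains[region]))
```

With targets drawn inside $A_2$, which itself lies inside A, the function returns A, corresponding to the replacement of $\mathrm{Function}(g, B, A_2)$ by $\mathrm{Function}(g, B, A)$.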

Like the closure critic, the action of the generalize domain and range critic can be undesirable. It may over-generalize, since for example the intended range of g may indeed have been $A_2$, or under-generalize, since the intended range of the function may in fact not appear as an object in the diagram, but may contain the inferred range. Experience with other diagrams will determine which of these cases is the most likely to occur, and the diagram cues that we may use to determine the likely intended values for the domains and ranges of functions.


Other Approaches to Graphical Theorem Proving

We are aware of work in graphical theorem proving by Gelernter, Barwise and Etchemendy, and Pastre. We try to identify both the similarities and differences of our work with these other approaches.

Gelernter’s Geometry Machine

The concept of graphical theorem proving was introduced by Gelernter in his Geometry Machine (Gelernter 1963; Gelernter et al. 1963; Gilmore 1970). GROVER resembles the Geometry Machine (GM) in the following respects:

• The diagram is used as a model of the goal to be proven

• The diagram suggests constructions of terms that are needed in the intended proof

Although GM is oriented specifically towards proving theorems in Euclidean Geometry (the prover uses a set of axioms for Euclidean Geometry, which provide the basis for proving the existence of needed terms), Bundy has shown how the approach can be applied to other domains if one replaces the concept of "diagram" with the concept of "model" (Bundy 1983, pp. 142-149). The principal difference between the Gelernter approach and GROVER concerns the way in which the graphical information is used. In GM, the diagram is consulted in order to guess bindings that will prove the current subgoal. It is assumed that if the instantiated subgoal is true in the diagram, it may be provable. GROVER offers a different form of guidance: the advice takes the form of specifying the subgoals themselves. Thus with GROVER the high-level structure of the proof is determined by the diagram.

Bundy’s generalization of GM provides a method of pruning irrelevant formulae from the proof. In terms of &, this would mean discarding a conclusion $C_i$ of a goal sequent $A_1 \cdots A_m \vdash C_1 \cdots C_n$ if the model can be used to refute $C_i$, that is, if variables of $C_i$ can be bound to elements in the model in such a way that the instantiated form of $C_i$ is false in the model (it must be remembered here that the variables in & play the role conventionally played by constants in other provers, and that schematic terms in & play the role conventionally played by variables). There is no explicit analog to Bundy’s method in GROVER, but we believe that it is implicitly present because the diagram itself is what suggests the subgoals; thus the $C_i$ are already known to be true in the diagram.
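The pruning test described above can be sketched over a finite model. This is an illustration under stated assumptions, not Bundy's or the & prover's code: the function name is hypothetical, each conclusion is represented as a Python predicate over model elements, and the model is taken to be a finite set that can be searched exhaustively.

```python
from itertools import product

def prune_conclusions(conclusions, domain):
    """Keep only the conclusions that no binding falsifies in the model.

    conclusions: list of (variables, predicate) pairs, where predicate takes
                 one model element per variable and returns a bool
    domain:      the finite set of elements of the model (assumption)
    """
    kept = []
    for variables, holds in conclusions:
        bindings = product(domain, repeat=len(variables))
        # A conclusion is refuted if some binding makes it false.
        refuted = any(not holds(*values) for values in bindings)
        if not refuted:
            kept.append((variables, holds))
    return kept
```

A conclusion such as "x < y" over the model {0, 1, 2} is discarded because the binding x = 1, y = 0 falsifies it, while "x >= 0" survives.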

Barwise and Etchemendy’s Hyperproof

The work of Jon Barwise and John Etchemendy in graphical reasoning stems from an underlying belief that visual information is frequently a more effective medium for reasoning than is text (Barwise & Etchemendy 1994; 1990b). They present several examples to illustrate this point. They have been developing a theoretical foundation for this work in the form of an algebraic theory of infons, that is, information that can be manipulated independently of the specific representations it may take. One of its key results is the identification of five basic rules for manipulating information, which provide the basis for the applied work ((Barwise & Etchemendy 1990a), Section

On the applied side, they have developed a system called Hyperproof which allows the user to reason about the blocks world both graphically and with formulas. A subsequent version of Hyperproof, which will support a more general class of problems, is now being developed.

Hyperproof is a purely interactive tool, i.e., a proof checker, rather than an automatic prover such as GROVER. The user can invoke either standard logical inferences on formulas, which are displayed at the bottom of the screen, or graphical inferences on the diagrams. These two forms of reasoning can be interleaved. For example, the user can perform an operation that splits a diagram into two alternative diagrams, each with more information than the original one (proof by cases), and an operation that merges two diagrams into a single diagram containing the information common to both. The graphical inference rules supported by Hyperproof are formulated at a fairly low level, comparable to those of first order logic. Nevertheless, the graphical inferences serve to elide what would otherwise be large blocks of logical inferences. The reason for this power is that the diagrams compress a lot of information into a concise representation.

The high level goals of GROVER and Hyperproof are quite compatible. The implementation approaches seem to reflect different priorities. The emphasis in GROVER has been on the heuristic elaboration of very high level inferences (that is, translating high level claims by the user, which are expressed graphically, into formal proofs), so that the tool becomes a vehicle for discovering, expressing, and verifying proofs at a high level. The emphasis in Hyperproof is on the use of diagrams as a concise representation of complex situations, so that the tool facilitates human reasoning about such situations. In addition, Hyperproof embodies a formal model of reasoning with diagrams, in which inference rules may take diagrams as hypotheses and/or conclusions. In GROVER the diagram is a heuristic, meta-level device which plays no formal role in the eventual proof.

More specifically, the salient differences between Hyperproof and GROVER can be summarized in terms of capabilities found in one but not the other. The key features of Hyperproof that are not now supported by GROVER are:

• Graphical inference operations

• Dynamic interaction of the user with the diagram


The graphical inference operations support the dynamic interaction by enabling the user to, in effect, "draw" individual proof steps. The diagram in Hyperproof is thus an intrinsic part of the interactive reasoning process, manifested in the graphical inferences.

The key features of GROVER that appear to be absent in Hyperproof are:

• Ability for the user to assert an existential claim graphically

• Automated construction of strategies to prove these claims

There is no provision in Hyperproof for the user to add to a diagram an object whose existence must then be proven as a subgoal. In Hyperproof the initial diagram represents the problem situation about which some claim is to be proven; the graphical inferences represent logical inferences on these situation descriptions. In contrast, GROVER provides the user with a graphical means of specifying subgoals of the form $\exists x.P(x)$ whose proofs will assist in the proof of the desired theorem. It is this form of argumentation that we have emphasized in GROVER.

The way in which GROVER processes such user input, the construction of a strategy, is another key difference. GROVER heuristically processes the graphical information to construct an existential assertion, which is then passed to the underlying theorem prover. The construction of these assertions is a non-trivial task since it involves selecting, from all of the information in the diagram, those facts that are relevant to the object x and that should be included in the assertion P. For example, in the proof of the Diamond Lemma, the existential solve heuristic collects all those atomic facts that refer to x and to no other object that has not yet been defined.
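The selection rule stated for the existential solve heuristic can be sketched as a simple filter. The function name and the representation of each fact as a pair of formula text and mentioned objects are assumptions of the sketch.

```python
def relevant_facts(x, facts, defined):
    """Collect atomic facts that mention x and otherwise only defined objects.

    facts:   list of (formula_text, objects_mentioned) pairs
    defined: names of the objects already defined (solved for) so far
    """
    allowed = set(defined) | {x}
    return [text for text, objects in facts
            if x in objects and set(objects) <= allowed]
```

A fact relating x to an object y that has not yet been defined is left out of the assertion P for x, and becomes eligible only once y has been solved for.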

The two approaches are complementary. In Hyperproof the diagram is used as a presentation device that makes it easier for the user to reason from a situation (the assumptions of the theorem) to the desired conclusion. In the current implementation of GROVER, we assume that the user already has a high level proof strategy in mind, in particular a series of existential subgoals, and that this strategy is most easily expressed by means of diagrams. Thus the challenge in GROVER is for the machine to interpret this graphical expression and derive a best guess at what the user has in mind, and then to carry out the laborious and error-prone details of the proof.

DATTE - The Work of D. Pastre

Pastre has described a theorem prover which uses diagrams to aid the proof of theorems (Pastre 1977); however, this work is quite different from ours. The diagram in Pastre’s theorem prover, DATTE, is an internal representation of the formulae that the theorem prover is currently manipulating rather than something that the user provides to guide the prover, as in our case.

The chief similarity between our proposal and Pastre’s work is the use of graphical inference rules. In DATTE, various definitions are expressed as "statement rules", which cause the internal diagram to be modified. For example, if the diagram represents $a \subseteq b$ and $x \in a$ then the diagram is updated to represent $x \in b$. This is rather like our proposed graphical inference rules. The major difference is that, in our case, the user provides the diagram as guidance for the prover, and the system views it not only as a representation of formulae, but also as a proof strategy. DATTE’s diagram represents only those formulae that are known to be true at a particular point in the proof, not hypothetical information or goals to be proved as in GROVER.
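The example statement rule can be sketched as forward chaining over sets of facts. This illustrates only the single rule quoted above; the function name and the pair encoding are hypothetical, and nothing here is taken from DATTE's actual internal representation.

```python
def apply_subset_rule(subsets, members):
    """Forward-chain the rule: a ⊆ b and x ∈ a give x ∈ b.

    subsets: set of (a, b) pairs meaning a ⊆ b
    members: set of (x, a) pairs meaning x ∈ a
    Returns the membership facts once no more updates apply.
    """
    members = set(members)
    changed = True
    while changed:
        changed = False
        for a, b in subsets:
            for x, s in list(members):
                if s == a and (x, b) not in members:
                    members.add((x, b))
                    changed = True
    return members
```

Repeating the rule until nothing changes propagates membership through chains of inclusions, so $x \in a$ with $a \subseteq b$ and $b \subseteq c$ also yields $x \in c$.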

Conclusions

We have described various issues that arise when an automated system tries to interpret a diagram as a mathematical proof. In our investigation of three theorems whose proofs require different techniques (transfinite induction, mathematical induction, and set theory, respectively) we found that a common element is the decomposition of the proof into a series of existence proofs; the diagram suggests the conditions to be proven in "solving" for successive objects. The diagram also suggests the degree of relevance of each previously solved for object to the current existence proof, thus providing a tractable set of hypotheses to be used in each lemma. Finally, patterns in the diagram may suggest higher-order abstractions that are crucial in proving the theorem.

Eventually, our goal is to develop a system that will foster the development of proofs by students of mathematics and even by working mathematicians. By raising the level of the conversation to the types of abstractions contained in diagrams, a theorem proving system could serve as a kind of surrogate colleague with whom ideas are tested and the implications of different constructs explored. GROVER, in its prototype state, is a long way from being such a system, but it is a start at uncovering the kinds of meaning embedded in a mathematical diagram.

References

Barker-Plummer, D., and Bailin, S. 1992. Proofs and pictures: Proving the diamond lemma with the GROVER theorem proving system. In Working Notes of the AAAI Symposium on Reasoning with Diagrammatic Representations, March 25-27th 1992, Stanford, USA.

Barker-Plummer, D., and Bailin, S. 1997. The role of diagrams in mathematical proofs. Machine Graphics and Vision 6:25-56.

Barwise, J., and Etchemendy, J. 1990a. Information, infons, and inference. In Cooper, R.; Mukai, K.; and Perry, J., eds., Situation Theory and Its Applications. Center for the Study of Language and Information.

Barwise, J., and Etchemendy, J. 1990b. Visual information and valid reasoning. In Zimmerman, W., ed., Visualization in Mathematics. Mathematical Association of America.

Barwise, J., and Etchemendy, J. 1994. Hyperproof. CSLI Press. (ISBN: 1-881526-11-9).

Barwise, J. 1993. Heterogeneous reasoning. In Allwein, G., and Barwise, J., eds., Working Papers on Diagrams and Logic. Indiana University Logic Group. 1-13.

Bundy, A. 1983. The Computer Modelling of Mathematical Reasoning. Academic Press.

Gelernter, H., et al. 1963. Empirical explorations of the geometry theorem proving machine. In Feigenbaum and Feldman, eds., Computers and Thought. McGraw Hill.

Gelernter, H. 1963. Realization of a geometry theorem proving machine. In Feigenbaum and Feldman, eds., Computers and Thought. McGraw Hill.

Gilmore, P. 1970. An examination of the geometry theorem proving machine. Artificial Intelligence 1:171-187.

Pastre, D. 1977. Automatic theorem proving in set theory. Technical report, University of Paris (VI).
