
The Application of Term Rewriting Systems for Expressing Generic Program Slicing Algorithms

Jens Nicolay

May 2009

Abstract

Program slicing is a technique in software engineering for isolating parts of a program that influence the value of a variable or some other arbitrary expression in that program. There exist many different slicing algorithms, with specific techniques for a variety of language features such as procedures, unstructured control flow, composite data types, pointers, concurrency, etc. This paper shows that it is possible to obtain more accurate and flexible generic slicing algorithms by specifying the semantics of programming languages as a term rewriting system. Changes to the behavior of the slicing algorithm can then be accomplished through simple changes in rewriting rules.

1 Introduction

1.1 Overview

This paper presents a selection of the work of Field et al., which shows that it is possible to obtain generic, flexible and accurate program slicing algorithms using term rewriting systems. Program slicing is a technique used in software engineering that makes it possible to examine the values of particular variables or expressions in isolation, by only looking at those parts of a program that potentially influence the value in which we are interested. This is useful for many kinds of program analysis, such as debugging, optimization and testing.

Because there are many different ways in which a program can be analyzed, and because there are many different programming languages, a multitude of (variations on) slicing algorithms exist, each targeted at a relatively specific set of problems and languages [8]. Besides giving a more detailed introduction to the concept of program slicing, Section 2 presents an example of such an ad-hoc slicing algorithm. We conclude that section by pointing out some limitations of "traditional" slicing algorithms and by specifying some of the desired properties of a truly generic and reusable slicing algorithm.

Term rewriting systems are rewriting systems that use terms as the objects to rewrite, and they will form the basis of our framework to express a flexible slicing algorithm. Rewriting systems, although they are a foundational theory of computer science, are intuitively easy to understand. In fact, almost everyone has already encountered them in one form or another. Section 3 gives an introduction to rewriting systems, starting from an abstract view on rewriting and working towards the rewriting of terms and their graph representations.

Section 4 then gives the details of the specific term rewriting system we will use to perform program slicing. In this paper we focus on Algol-like imperative programming languages, and the Pim rewriting system is specifically designed to mimic the semantics of this class of languages. It models an abstract representation of memory and the operations of a program on that memory, and as such is capable of tracking the different values that are associated with expressions in a program. By rewriting this information with respect to the value we are interested in, we obtain the actual program slice, which is the subject of Section 5.

1.2 Acknowledgments

I would like to thank John Field for pointing me to some additional resources for Pim. I am also grateful to Prof. Vermeir for organizing an interesting course in a relaxed way, and for giving me the liberty to choose this subject. I chose the work of Field, Tip and others on program slicing because I am repeatedly astonished at how simple ideas can lead to great results. These results, presented here, are not my own, so the only contribution I can hope to claim with this paper is that it sparks some of the same interest in the reader as it did in me.

2 Program Slicing

A program slice consists of the parts of a program that potentially affect the values computed at some point of interest, called a slicing criterion. A slicing criterion can, for example, be the value of a variable at a specific point in the program, or the reachability of such a point. The task of computing program slices is called program slicing.

Program slicing is a useful technique for program understanding and program analysis. The main idea is that slicing can throw away portions of a program that are irrelevant to the slicing criterion. This reduces both the size of the program and the time and effort needed to reason about certain properties it may or may not exhibit.

Program slicing has been widely applied in software engineering. While the original application was debugging, a number of other applications have since been proposed, including model-checking, parallelization, software maintenance, testing, reverse engineering and compiler tuning.

Slicing techniques have undergone rapid development since the original definition was presented by Mark Weiser in 1979 [10]. Because different applications and underlying languages require different slicing properties, many different notions of slices exist and many slicing algorithms have been proposed. Traditionally these algorithms can be divided into two classes.

1. Static slicing algorithms make no assumptions about the inputs to the program, and consequently compute slices that are valid for all possible input instances.

2. Dynamic slicing algorithms accept a specific instantiation of all inputs and compute slices valid only for that specific instance.

Pgm     ::= { StmList }
StmList ::= Stm | StmList Stm
Stm     ::= Exp ; | if ( Exp ) Stm | if ( Exp ) Stm else Stm | while ( Exp ) Stm | { StmList }
Exp     ::= Id | IntLiteral | ?Id | LValue = Exp | Exp + Exp
LValue  ::= Id

Figure 1: Syntax of µC

Generalizing this division, we obtain the notion of constrained slicing [5], which generates slices that are valid for all instantiations of inputs that satisfy a given set of constraints. The relation between static, dynamic and constrained slicing is straightforward: a fully constrained slice, with every input a fixed constant, is a dynamic slice, and a fully unconstrained slice is a static slice.

2.1 µC Language

To demonstrate our approach we will use a small imperative language µC that adheres very closely to the standard C semantics. Its syntax is given in Figure 1. All data types in µC are assumed to be integers, including boolean values. We also introduce meta-variables in µC, such as ?N, to represent unknown values. These meta-variables can be thought of as representing program input (through the call to a read function, for example) or as read-only function parameters.

2.2 Weiser's Algorithm for Static Slicing

We will now present an example of an existing, representative slicing algorithm specifically designed to compute static slices in imperative languages. In [9] Mark Weiser defines an iterative algorithm for static slicing based on the control flow graph (CFG) of a program P. In this context a slicing criterion takes the form (n, V), where n is a node in the CFG of P, and V is a subset of the program's variables. A slice S of P with respect to (n, V) is a valid, executable program obtained from P by deleting zero or more statements. Additionally, S must halt for every input on which P halts, computing the same values for V at the statement corresponding to n. Note that this means that for any given criterion, the original program P is trivially its own maximal slice.

(1)  n = ?N;
(2)  i = 1;
(3)  s = 0;
(4)  p = 1;
(5)  while (i <= n)
     {
(6)    s = s + i;
(7)    p = p * i;
(8)    i = i + 1;
     }
(9)  write(s);
(10) write(p);

Figure 2: Simple example program P1 with its CFG

Figure 2 shows the structured program P1, which consists of a single procedure working on integers. Suppose we take as slicing criterion (10, {p}). First we iteratively determine the set of directly relevant variables $R^0_C(i)$ for every node i of the CFG. We start with $R^0_C(10) = \{p\}$ and $R^0_C(i) = \emptyset$ for all other nodes $i \neq 10$. Working backwards, we deduce that for every edge $i \to_{\mathrm{CFG}} j$, if there is at least one relevant variable $w \in R^0_C(j)$ that is defined at i, then all variables referenced at i can influence the value of w and should be included in $R^0_C(i)$. If Def(i) and Ref(i) denote the sets of variables defined and referenced at node i respectively, then we can write this as follows.

$$\forall\, i \to_{\mathrm{CFG}} j : \quad R^0_C(i) = R^0_C(i) \cup \{\, v \mid v \in \mathrm{Ref}(i) \wedge \mathrm{Def}(i) \cap R^0_C(j) \neq \emptyset \,\}$$

We also propagate the variables that are relevant at j but are not defined orredefined at i.

$$\forall\, i \to_{\mathrm{CFG}} j : \quad R^0_C(i) = R^0_C(i) \cup \{\, v \mid v \in R^0_C(j) \wedge v \notin \mathrm{Def}(i) \,\}$$

From $R^0_C$ a set of directly relevant statements $S^0_C$ is derived. A node i is in $S^0_C$ if it defines a variable that is relevant to a CFG-successor of i.

$$S^0_C = \{\, i \mid \mathrm{Def}(i) \cap R^0_C(j) \neq \emptyset \wedge i \to_{\mathrm{CFG}} j \,\}$$
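The two propagation equations and the definition of $S^0_C$ can be checked mechanically on P1. The following Python sketch is our own illustration, not Weiser's formulation: the CFG edges are read off the program text of Figure 2 (the CFG drawing itself is not reproduced here), the Def and Ref sets are copied from Figure 3, and the names `SUCC`, `DEF`, `REF`, `directly_relevant` and `relevant_statements` are hypothetical. Both equations are applied to every edge until no set changes any more.

```python
# Hand-transcribed CFG of P1 (Figure 2): node -> list of CFG successors.
SUCC = {1: [2], 2: [3], 3: [4], 4: [5], 5: [6, 9],
        6: [7], 7: [8], 8: [5], 9: [10], 10: []}
DEF = {1: {"n"}, 2: {"i"}, 3: {"s"}, 4: {"p"}, 5: set(),
       6: {"s"}, 7: {"p"}, 8: {"i"}, 9: set(), 10: set()}
REF = {1: set(), 2: set(), 3: set(), 4: set(), 5: {"i", "n"},
       6: {"s", "i"}, 7: {"p", "i"}, 8: {"i"}, 9: {"s"}, 10: {"p"}}


def directly_relevant(n, V):
    """R0 for criterion (n, V): apply both equations to every edge until a fixpoint."""
    R = {i: set() for i in SUCC}
    R[n] = set(V)
    changed = True
    while changed:
        changed = False
        for i, successors in SUCC.items():
            for j in successors:
                # Relevant at j and not (re)defined at i: propagate unchanged.
                new = R[j] - DEF[i]
                # i defines a variable relevant at j: all variables i references become relevant.
                if DEF[i] & R[j]:
                    new |= REF[i]
                if not new <= R[i]:
                    R[i] |= new
                    changed = True
    return R


def relevant_statements(R):
    """Nodes that define a variable relevant at one of their CFG-successors."""
    return {i for i, successors in SUCC.items() for j in successors if DEF[i] & R[j]}


R0 = directly_relevant(10, {"p"})
S0 = relevant_statements(R0)
print(sorted(S0))  # [2, 4, 7, 8]
```

The computed sets coincide with the $R^0_C$ column of Figure 3.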

The reader can verify that we have $S^0_C = \{2, 4, 7, 8\}$, corresponding to the relevant variables i and p. But examining P1 in Figure 2 again, we see that the variable n used in the while loop also influences the value of p at statement 10. So far we have only traced transitive data dependences, but we also need to trace control dependences to include variables referenced in the control predicates of branch statements such as if and while statements. Informally, a branch statement is indirectly relevant if it contains ("controls") at least one relevant statement in its body. If Infl(b) is the set of statements that are control dependent on branch statement b, then we can define the set of indirectly relevant branch statements $B^k_C$.

$$B^k_C = \{\, b \mid \exists\, i \in S^k_C : i \in \mathrm{Infl}(b) \,\}$$

node   Def    Ref      $R^0_C$    $R^1_C$
1      {n}    ∅        ∅          ∅
2      {i}    ∅        ∅          {n}
3      {s}    ∅        {i}        {i, n}
4      {p}    ∅        {i}        {i, n}
5      ∅      {i, n}   {p, i}     {p, i, n}
6      {s}    {s, i}   {p, i}     {p, i, n}
7      {p}    {p, i}   {p, i}     {p, i, n}
8      {i}    {i}      {p, i}     {p, i, n}
9      ∅      {s}      {p}        {p}
10     ∅      {p}      {p}        {p}

(1)  n = ?N;
(2)  i = 1;
(4)  p = 1;
(5)  while (i <= n)
     {
(7)    p = p * i;
(8)    i = i + 1;
     }
(10) write(p);

Figure 3: Some of the sets involved and the resulting slice after applying Weiser's algorithm to the program in Figure 2 with respect to (10, {p})

Continuing our example, we have Infl(5) = {6, 7, 8} and $B^0_C = \{5\}$. Next we determine the sets of indirectly relevant variables $R^{k+1}_C(i)$ which, in addition to the variables in $R^k_C(i)$, contain variables that are relevant because they have a transitive data dependence on statements in $B^k_C$. We do this by tracing the transitive data dependences as before, but now with respect to the criterion (b, Ref(b)), where b is a branch statement in $B^k_C$.

$$R^{k+1}_C(i) = R^k_C(i) \cup \bigcup_{b \in B^k_C} R^0_{(b,\,\mathrm{Ref}(b))}(i)$$

Finally, the sets $S^{k+1}_C$ of indirectly relevant statements consist of the nodes in $B^k_C$ together with the nodes i that define a variable that is $R^{k+1}_C$-relevant to a CFG-successor j.

$$S^{k+1}_C = B^k_C \cup \{\, i \mid \mathrm{Def}(i) \cap R^{k+1}_C(j) \neq \emptyset \wedge i \to_{\mathrm{CFG}} j \,\}$$

For our example program P1 given in Figure 2 we then have $S^1_C = \{1, 2, 4, 5, 7, 8\}$.

The sets $R^{k+1}_C(i)$ and $S^{k+1}_C$ are nondecreasing in k and are subsets of the program's finite sets of variables and statements respectively, so the iteration reaches a fixpoint; the fixpoint of the $S^{k+1}_C$ sets constitutes the desired program slice. We show the resulting program slice, together with some of the sets constructed during the execution of Weiser's algorithm, in Figure 3.
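The complete iteration can be sketched in the same way. This Python sketch again assumes the hand-transcribed CFG, Def and Ref sets of P1, and supplies the control dependences by hand in a hypothetical `INFL` map (only Infl(5) = {6, 7, 8} is needed for P1) instead of computing them from the CFG. It stops when both $R^{k+1}_C$ and $S^{k+1}_C$ stop changing, and adds the criterion node itself to the result, as the slice in Figure 3 does.

```python
# Hand-transcribed data for P1 (Figures 2 and 3).
SUCC = {1: [2], 2: [3], 3: [4], 4: [5], 5: [6, 9],
        6: [7], 7: [8], 8: [5], 9: [10], 10: []}
DEF = {1: {"n"}, 2: {"i"}, 3: {"s"}, 4: {"p"}, 5: set(),
       6: {"s"}, 7: {"p"}, 8: {"i"}, 9: set(), 10: set()}
REF = {1: set(), 2: set(), 3: set(), 4: set(), 5: {"i", "n"},
       6: {"s", "i"}, 7: {"p", "i"}, 8: {"i"}, 9: {"s"}, 10: {"p"}}
# Statements control dependent on each branch statement (given by hand).
INFL = {5: {6, 7, 8}}


def directly_relevant(n, V):
    """R0 for criterion (n, V), iterated to a fixpoint over all CFG edges."""
    R = {i: set() for i in SUCC}
    R[n] = set(V)
    changed = True
    while changed:
        changed = False
        for i, successors in SUCC.items():
            for j in successors:
                new = R[j] - DEF[i]
                if DEF[i] & R[j]:
                    new |= REF[i]
                if not new <= R[i]:
                    R[i] |= new
                    changed = True
    return R


def relevant_statements(R):
    return {i for i, successors in SUCC.items() for j in successors if DEF[i] & R[j]}


def weiser_slice(n, V):
    """Iterate R^{k+1} and S^{k+1} until both stop changing; return (slice, R)."""
    R = directly_relevant(n, V)
    S = relevant_statements(R)
    while True:
        # B^k: branch statements controlling at least one statement in S^k.
        B = {b for b, body in INFL.items() if S & body}
        # R^{k+1}: add the variables relevant to each criterion (b, Ref(b)).
        R_next = {i: set(vs) for i, vs in R.items()}
        for b in B:
            R_b = directly_relevant(b, REF[b])
            for i in R_next:
                R_next[i] |= R_b[i]
        S_next = B | relevant_statements(R_next)
        if S_next == S and R_next == R:
            return S | {n}, R
        R, S = R_next, S_next


slice_nodes, R = weiser_slice(10, {"p"})
print(sorted(slice_nodes))  # [1, 2, 4, 5, 7, 8, 10]
print(sorted(R[2]))         # ['n']
```

For P1 the second round of the loop reproduces $S^1_C$ and $R^1_C$ unchanged, so the fixpoint is $S^1_C$, and the returned variable sets match the $R^1_C$ column of Figure 3.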

2.3 Other Slicing Algorithms

So far we have only discussed one basic algorithm for intraprocedural slicing. Weiser also extends his algorithm to interprocedural static slicing. Since then many variations on his method have appeared, but with comparable accuracy [8]. Accuracy in the context of program slicing is measured in terms of the number of statements contained in a slice, smaller slices being more accurate. A slice is statement-minimal if no other slice for the same criterion contains fewer statements. The problem of determining statement-minimal slices is undecidable in general.


(a)
n = ?N;
i = 1;
if (i == 1) {
  n = n + 1;
} else {
  n = n + 2;
}
write(n);

(b)
n = ?N;
i = 1;
if (i == 1) {
  n = n + 1;
} else {
  ;
}
write(n);

(c)
n = ?N;
n = n + 1;
write(n);

Figure 4: (a) Example program P2 and static slice w.r.t. statement write(n) (b) More accurate slice obtained through constant propagation (c) Minimal slice

A detailed presentation of other methods is beyond the scope of this paper. For a detailed overview of program slicing techniques we refer the reader to the excellent survey by Tip [8].

2.4 Limitations

Figure 4 (a) shows an example program P2 for which the static slice with respect to statement write(n), as computed by Weiser's slicing algorithm, is the entire program again. Yet, employing a simple technique like constant propagation, which reveals that i == 1 always holds so that the else branch is never executed, would lead to the more accurate slice depicted in Figure 4 (b). If replacing an entire if statement by one of its branches is allowed, we can make the slice even smaller and arrive at the minimal slice of Figure 4 (c).

We identify two important restrictions, characteristic not only of variants of Weiser's approach, but also of many of the other traditional slicing algorithms that exist [8].

1. The resulting slice consists of a subset of the statements of the original program, and must sometimes constitute a syntactically valid program. Slicing would be more powerful if it allowed transformations beyond statement deletion. Take for example the case where a programming language does not allow if statements with empty branches, but where a slicing algorithm would exclude all statements in such a branch. Such statements could never be removed from a slice, since that would result in a program that is not syntactically valid.

2. Slices are computed solely from data and control dependences. If the objective is to compute slices that are as small as possible, then other kinds of semantics-preserving optimization techniques should be used in slicing as well, as the example in Figure 4 demonstrates.

These two issues are strongly interrelated, in the sense that the first restriction often has to be lifted before the second one can be.


While many different algorithms exist for various slicing techniques, each of those algorithms is tailored to the specific problem it solves. It would therefore be desirable to have a generic algorithm for program slicing, whether for static or dynamic slicing, for interprocedural or intraprocedural slicing, or to calculate intermediate or final values of variables, and one that is as independent as possible of the programming language in which the original program is expressed.

3 Rewriting Systems

The theory of rewriting deals with the stepwise, or discrete, transformation of objects, as opposed to continuous transformations of objects [2]. These objects may be strings, terms, formulas, graphs or any other entities from a given domain. Since computations in computer science are typically stepwise transformations of objects, rewriting is a foundational theory of computing.

Rewriting has its origins in both mathematical logic and computer science. Between those two disciplines some differences in terminology exist. In mathematical logic the word 'reduction' is used for what is named 'rewriting' in computer science. We will encounter similar cases where multiple terms exist to denote the same concept or property.

3.1 Abstract Reduction Systems

Many aspects of rewriting can be studied independently of the nature of the objects that are rewritten. We will start with an abstract approach to formally introduce some basic notions.

An abstract reduction system (ARS) is a structure $(A, \{\to_\alpha \mid \alpha \in I\})$ consisting of a set A and a set of binary relations $\to_\alpha$ on A, indexed by a set I. For $\alpha \in I$ the relations $\to_\alpha$ are called reduction or rewrite relations.

If $(a, b) \in {\to_\alpha}$ for $a, b \in A$, then we write $a \to_\alpha b$ and call b a one-step reduct of a. A reduction sequence with respect to $\to_\alpha$ is a finite or infinite sequence $a_0 \to_\alpha a_1 \to_\alpha a_2 \to_\alpha \cdots$. Reduction sequences are also called reduction paths, or reductions for short.

A reduction sequence starting from a is called a reduction sequence of a. If such a reduction is finite and ends in b, then it is called a reduction sequence from a to b. A reduction step is a specific occurrence of $\to_\alpha$ in a reduction sequence.

The length of a finite reduction sequence is the number of reduction steps occurring in this reduction sequence. For the reduction sequence $a_0 \to a_1 \to \cdots \to a_n$ of length n we write $a_0 \twoheadrightarrow a_n$.

As a first example consider the simplification of arithmetical expressions [2]. By convention we underline the expressions being reduced for each step.

$$\underline{(3 + 5)} \ast (1 + 2) \to 8 \ast \underline{(1 + 2)} \to \underline{8 \ast 3} \to 24$$

Even this simplification process has several remarkable properties. In most exercises the simplification process will result in the form of an expression that cannot be simplified any further. We call $a \in A$ a normal form if there exists no $b \in A$ such that $a \to b$. Clearly 24 is a normal form in our example.


The above simplification process is also non-deterministic, since different rewriting paths are possible. The following reduction sequence also yields the normal form 24.

$$(3 + 5) \ast \underline{(1 + 2)} \to \underline{(3 + 5) \ast 3} \to \underline{3 \ast 3} + 5 \ast 3 \to 9 + \underline{5 \ast 3} \to \underline{9 + 15} \to 24$$

Even stronger, every arithmetical simplification of (3 + 5) ∗ (1 + 2), when it returns a result, should yield 24, independently of the specific reduction path taken. This is guaranteed by the property of confluence (also called the Church-Rosser property). We call → confluent if for all $a, b, c \in A$ such that $c \twoheadleftarrow a \twoheadrightarrow b$ there exists a $d \in A$ such that $c \twoheadrightarrow d \twoheadleftarrow b$. This property can be visualized as a diamond structure.

Eventually, also independently of the reduction path taken, the arithmetical simplification of (3 + 5) ∗ (1 + 2) will yield the normal form 24. We say that the relation → is strongly normalizing (or terminating, or Noetherian) if for every $a \in A$ all reduction sequences starting from a are finite.
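Both properties can be observed on the running example by brute force. The Python sketch below is a toy system of our own, not taken from [2]: terms are nested tuples, and the only rules are evaluation of + or ∗ applied to two numerals and left distributivity (x + y) ∗ z → x ∗ z + y ∗ z, which together allow both reduction sequences shown above. The function explores every reduction path from (3 + 5) ∗ (1 + 2) and collects the normal forms it reaches.

```python
from typing import List, Set, Tuple, Union

# A term is an integer numeral or a tuple (operator, left, right).
Term = Union[int, Tuple[str, "Term", "Term"]]


def one_step_reducts(t: Term) -> List[Term]:
    """All terms reachable from t in exactly one reduction step."""
    if isinstance(t, int):
        return []  # numerals are normal forms
    op, a, b = t
    reducts: List[Term] = []
    # Evaluate an operator whose arguments are both numerals.
    if isinstance(a, int) and isinstance(b, int):
        reducts.append(a + b if op == "+" else a * b)
    # Left distributivity: (x + y) * z -> x * z + y * z.
    if op == "*" and isinstance(a, tuple) and a[0] == "+":
        _, x, y = a
        reducts.append(("+", ("*", x, b), ("*", y, b)))
    # Reduce inside either argument.
    reducts.extend((op, a2, b) for a2 in one_step_reducts(a))
    reducts.extend((op, a, b2) for b2 in one_step_reducts(b))
    return reducts


def normal_forms(t: Term) -> Set[Term]:
    """Explore every reduction path from t and collect the normal forms reached."""
    seen: Set[Term] = set()
    found: Set[Term] = set()
    stack = [t]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        reducts = one_step_reducts(u)
        if not reducts:
            found.add(u)
        stack.extend(reducts)
    return found


expr = ("*", ("+", 3, 5), ("+", 1, 2))
print(normal_forms(expr))  # {24}
```

The set of normal forms contains only 24, as confluence predicts; the search itself terminates because only finitely many terms are reachable from the starting expression.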

Every reduction relation → in the preceding examples can be replaced by an equality =, because all the expressions are equal in the sense that they all denote the number 24. Without going into details, we define an equational system simply as an ARS "without orientation". When dealing with simplification, however, one can perceive a direction from complicated expressions to simpler ones, and for this reason we use →.

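Finally, ≡ is used to denote identity on A. The symbol = denotes the equivalence relation generated by →, called convertibility. We have $a =_\alpha b$ (a is convertible to b) if and only if there is a finite conversion sequence

$$a \equiv a_0 \leftrightarrow_\alpha a_1 \leftrightarrow_\alpha a_2 \leftrightarrow_\alpha \cdots \leftrightarrow_\alpha a_n \equiv b,$$

where ${\leftrightarrow_\alpha} = {\to_\alpha} \cup {\leftarrow_\alpha}$ is the symmetric closure of $\to_\alpha$.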

3.2 Term Rewriting Systems

A term rewriting system (TRS) is an abstract reduction system where the objects are first-order terms, and the reduction relation is presented in a standard schematic format of so-called rewrite rules [2]. More formally, a TRS is a pair (Σ, R) of a signature Σ and a set R of reduction rules or rewrite rules.

Σ consists of a non-empty set of function symbols or operator symbols, each equipped with a fixed arity, a natural number indicating the number of arguments it is supposed to have. 0-ary or nullary functions are constants. A term is a string of symbols from an alphabet consisting of Σ and a countably infinite set V of variables, where it is assumed that Σ ∩ V = ∅. The set of terms over Σ, indicated as Ter(Σ), is defined inductively:

1. x ∈ Ter(Σ) for every x ∈ V

2. If F is an n-ary function symbol and $t_1, \ldots, t_n \in Ter(\Sigma)$, then $F(t_1, \ldots, t_n) \in Ter(\Sigma)$.

For the term $F(t_1, \ldots, t_n)$, the terms $t_i$ are called the arguments and F the root. Terms not containing a variable are called ground or closed terms. Terms in which no variable occurs more than once are called linear.

Note that function symbols in terms can be used in prefix, infix, postfix or any other appropriate format that can be brought back to conform with the inductive definition of Ter(Σ). Also, function symbols, including constants, can have more suggestive names, such as + for the binary addition operator, or T and F for the boolean constants.

To obtain some flexibility in describing terms and operations on terms, we introduce the notion of context. A context can be viewed as an incomplete term that may contain several empty places, or holes. Formally, a context can be defined as a term containing zero, one or more occurrences of a special constant symbol • denoting these holes. A context can be viewed as a term over the extended signature Σ ∪ {•}.

Now that we have covered Ter(Σ) we move on to the second ingredient of a TRS, namely its rewrite rules R. First, a substitution is a map σ : Ter(Σ) → Ter(Σ) which satisfies

$$\sigma(F(t_1, \ldots, t_n)) \equiv F(\sigma(t_1), \ldots, \sigma(t_n))$$

for every n-ary function symbol F.

A reduction rule or rewrite rule is a pair (t, s) where t, s ∈ Ter(Σ), and is written t → s. Two conditions are imposed:

1. the left-hand side t is not a variable,

2. the variables in the right-hand side s are already in t.

A rewrite rule is used to replace a subterm of a term that matches the rule's left-hand side by the rule's right-hand side. Variables match any subterm; all other symbols must match exactly. We can view this as the rule giving the general scheme, while an instance of that rule is obtained by applying a substitution. Consider a rewrite rule F(G(x), y) → F(x, x). Then using a substitution σ with σ(x) = 0 and σ(y) = G(x) we can rewrite the term F(G(0), G(x)) as follows:

$$F(G(0), G(x)) \to F(0, 0)$$

A rule is left-linear if every variable occurs at most once in its left-hand side.
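Matching a left-hand side against a term and instantiating the right-hand side with the resulting substitution can be sketched in a few lines of Python. In this illustrative sketch, terms are tuples (symbol, arg1, ..., argn), constants are 1-tuples and variables are instances of a hypothetical `Var` class; a variable occurring in the term being rewritten, such as x in F(G(0), G(x)), is simply treated as a subterm to be matched. The helper `is_left_linear` checks the left-linearity condition just defined.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


Term = Union[Var, Tuple]  # (symbol, arg_1, ..., arg_n); constants are 1-tuples
Subst = Dict[Var, Term]


def match(pattern: Term, term: Term, sigma: Subst) -> Optional[Subst]:
    """Extend sigma so that sigma(pattern) == term, or return None."""
    if isinstance(pattern, Var):
        if pattern in sigma:  # repeated variable: must match the same subterm again
            return sigma if sigma[pattern] == term else None
        return {**sigma, pattern: term}
    if isinstance(term, Var) or pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p_arg, t_arg in zip(pattern[1:], term[1:]):
        sigma = match(p_arg, t_arg, sigma)
        if sigma is None:
            return None
    return sigma


def substitute(t: Term, sigma: Subst) -> Term:
    """Apply sigma homomorphically: sigma(F(t1..tn)) = F(sigma(t1)..sigma(tn))."""
    if isinstance(t, Var):
        return sigma.get(t, t)
    return (t[0],) + tuple(substitute(arg, sigma) for arg in t[1:])


def rewrite_at_root(rule: Tuple[Term, Term], term: Term) -> Optional[Term]:
    lhs, rhs = rule
    sigma = match(lhs, term, {})
    return None if sigma is None else substitute(rhs, sigma)


def variables(t: Term) -> List[Var]:
    """All variable occurrences in t, from left to right."""
    if isinstance(t, Var):
        return [t]
    return [v for arg in t[1:] for v in variables(arg)]


def is_left_linear(rule: Tuple[Term, Term]) -> bool:
    occurrences = variables(rule[0])
    return len(occurrences) == len(set(occurrences))


x, y = Var("x"), Var("y")
rule = (("F", ("G", x), y), ("F", x, x))
term = ("F", ("G", ("0",)), ("G", Var("x")))
print(rewrite_at_root(rule, term))  # ('F', ('0',), ('0',))
print(is_left_linear(rule))         # True
```

A full rewriting engine would additionally try `rewrite_at_root` at every subterm position; the repeated-variable branch in `match` is what a left-nonlinear rule would depend on.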

A rule is a collapse rule if its right-hand side consists of a single variable.

So far we have only seen one-sorted term rewriting systems, where the signature Σ has just one sort. It is straightforward to extend the previous definitions to the many-sorted case. Given a set of sorts (or "types"), we supply sort information in addition to specifying an arity for each function symbol in Σ. If f is an n-ary function symbol, n sorts $S_i$ have to be specified for the respective argument places, as well as an "end sort" S. Then $f(t_1, \ldots, t_n)$ will be a well-formed term only if each $t_i$ for $1 \le i \le n$ has sort $S_i$; the sort of $f(t_1, \ldots, t_n)$ is S.
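For ground terms, the well-formedness condition of the many-sorted case is a simple recursive check. The signature in this Python sketch, with sorts Nat and Bool and symbols 0, S, T and eq, is a hypothetical example chosen only for illustration; `sort_of` returns the end sort of a well-formed term and raises an error otherwise.

```python
from typing import Dict, List, Tuple

# Hypothetical many-sorted signature: symbol -> (argument sorts, end sort).
SIGNATURE: Dict[str, Tuple[List[str], str]] = {
    "0": ([], "Nat"),
    "S": (["Nat"], "Nat"),
    "T": ([], "Bool"),
    "eq": (["Nat", "Nat"], "Bool"),
}


def sort_of(term: tuple) -> str:
    """Return the sort of a ground term (f, t1, ..., tn), or raise TypeError if ill-formed."""
    symbol, args = term[0], term[1:]
    arg_sorts, end_sort = SIGNATURE[symbol]
    if len(args) != len(arg_sorts):
        raise TypeError(f"{symbol} expects {len(arg_sorts)} arguments, got {len(args)}")
    for position, (arg, expected) in enumerate(zip(args, arg_sorts), start=1):
        actual = sort_of(arg)
        if actual != expected:
            raise TypeError(f"argument {position} of {symbol} has sort {actual}, expected {expected}")
    return end_sort


zero = ("0",)
print(sort_of(("eq", ("S", zero), zero)))  # Bool
try:
    sort_of(("S", ("T",)))
except TypeError as err:
    print(err)  # argument 1 of S has sort Bool, expected Nat
```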

3.3 Term Graph Rewriting Systems

Term graph rewriting is concerned with the representation of terms as graphs, and the manipulation of these terms by rule-based graph transformation [7]. Graphs that represent terms can be defined in various ways. Here, we will use directed, rooted, labeled graphs.

Figure 5 shows an example term graph T1 with binary function symbols + and ∗, a unary function symbol S, a constant 0 and a variable z.

Figure 5: Example term graph T1 for Σ = {+, ∗, S, 0} and V = {z}.

Let lab(v) denote the label and term(v) denote the term representation of a node v in a graph. Graph T is a term graph if

1. there is a single root node from which all other nodes are reachable,

2. for every non-leaf node v with children $v_1, \ldots, v_n$: $lab(v) = F \in \Sigma$ and $term(v) = F(term(v_1), \ldots, term(v_n)) \in Ter(\Sigma)$,

3. for every leaf node w: lab(w) = x ∈ Σ ∪ V and term(w) = x,

4. T is acyclic.

In this paper we will not be concerned with infinite terms, which give rise to cycles in their corresponding term graphs. Although cyclic graphs are not necessarily a problem in graph rewriting, we require term graphs to be acyclic in our definition because this allows us to relate term graph rewriting to the theory of term rewriting [7]. It also simplifies some of the discussions in the rest of this paper.

From the definition it follows that every node in a term graph represents a well-formed term that can be flattened, or unraveled, to a tree by traversing it from the root of that term and replacing all shared subgraphs by separate copies of their term representations. Using infix notation for + and ∗, term graph T1 of Figure 5 has as term representation

$$term(T_1) = S(0) + \big((S(0) \ast (0 + z)) + (S(0) \ast (0 + z))\big)$$

Looking at the inverse of this operation it is clear that, in the presence of shared subterms, term graph rewriting can be more space efficient than rewriting terms as trees. Also, when applying reduction rules, if a variable occurs multiple times on the right-hand side of a rule, only pointers need to be copied instead of the entire subterm. Furthermore, application of rewrite rules inside shared subterms need only be carried out once instead of being applied to each identical subterm, making graph rewriting also more time efficient.
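The difference between a shared graph and its unraveled tree can be made concrete with a small Python sketch. Since the drawing of Figure 5 is not reproduced here, the graph below assumes maximal sharing of repeated subterms (one node each for 0, S(0) and S(0) ∗ (0 + z)); the sharing in the original figure may differ. `unravel` copies shared subgraphs while flattening, and `tree_size` counts the nodes the resulting tree would need.

```python
# Hypothetical term graph for T1: node id -> (label, list of child node ids).
# Repeated subterms are shared, so each appears only once as a node.
GRAPH = {
    "root": ("+", ["s0", "sum"]),
    "sum": ("+", ["prod", "prod"]),   # both arguments point to the same node
    "prod": ("*", ["s0", "zplus"]),
    "zplus": ("+", ["zero", "z"]),
    "s0": ("S", ["zero"]),
    "zero": ("0", []),
    "z": ("z", []),
}


def unravel(node: str) -> str:
    """Flatten the graph below node into a term, copying shared subgraphs."""
    label, children = GRAPH[node]
    if not children:
        return label
    if label in ("+", "*"):  # binary symbols in infix notation
        left, right = (unravel(child) for child in children)
        return f"({left} {label} {right})"
    return f"{label}(" + ", ".join(unravel(child) for child in children) + ")"


def tree_size(node: str) -> int:
    """Number of nodes the unraveled term needs when stored as a tree."""
    _, children = GRAPH[node]
    return 1 + sum(tree_size(child) for child in children)


print(unravel("root"))
# (S(0) + ((S(0) * (0 + z)) + (S(0) * (0 + z))))
print(len(GRAPH), tree_size("root"))  # 7 16
```

Under this assumption, 7 graph nodes stand for a term whose tree representation needs 16 nodes.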

Although term rewriting and graph rewriting as we defined them here have identical properties for our purposes, sharing can be quite important in practice to achieve acceptable performance. It is for this reason that graph rewriting is a common method of implementing term rewriting algorithms [6]. In the remainder of this paper it will be irrelevant to the discussion whether we are dealing with a tree or graph representation of terms.

3.4 Dynamic Dependence

During a reduction process we can maintain dynamic dependence relations that allow us to determine how subterms are combined and propagated by the reduction process as a whole. For each term $T_i$ that reduces to $T_{i+1}$ we define two binary relations between the nodes of $T_i$ and $T_{i+1}$. The residuation relation relates nodes in $T_{i+1}$ to the corresponding occurrences of the same nodes in $T_i$. These are the nodes that the rewriting rule does not "touch": for example, the outer context of the subterm to which the rule is applied, on which the rule has no effect, as well as the subterms matched by variables that appear on both sides of the rewrite rule. After identifying the residuation relations, what remains on the right-hand side are "new" nodes created by the rewrite step, and on the left-hand side "original" nodes that are not carried over by the rewrite step. The creation relation relates every new node in $T_{i+1}$ to all the nodes of $T_i$ that match the symbols in the left-hand side of the rewriting rule (making the rewriting step possible). In the absence of left-nonlinear and collapse rules, the dynamic dependence relation for a reduction sequence $T_0 \twoheadrightarrow T_n$ then consists of the transitive closure of the creation and residuation relations from the rewriting steps in $T_0 \twoheadrightarrow T_n$.

For any reduction $T_0 \twoheadrightarrow T_n$, a term slice with respect to some subcontext C of $T_n$ is defined as the subcontext S of $T_0$ that is found by tracing back the dynamic dependence relations from C. Nodes of $T_0$ that cannot be reached in this way are irrelevant to C and are therefore represented by • in the resulting term slice.

To illustrate this, consider the following boolean TRS consisting of two rules (R1) and (R2).

$$\lor(\lor(p_1, p_2), p_3) \to \lor(p_1, \lor(p_2, p_3)) \tag{R1}$$

$$\lor(\mathrm{T}, p) \to \mathrm{T} \tag{R2}$$

By applying the rules above, the term

$$\land(\lor(\lor(\mathrm{T}, \mathrm{F}), \land(\mathrm{F}, \mathrm{T})), \mathrm{F})$$

may be rewritten by first applying (R1) followed by (R2).

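$$\begin{aligned}
T_0 &= \land(\lor(\lor(\mathrm{T}, \mathrm{F}), \land(\mathrm{F}, \mathrm{T})), \mathrm{F}) \\
\xrightarrow{(R1)}\ T_1 &= \land(\lor(\mathrm{T}, \lor(\mathrm{F}, \land(\mathrm{F}, \mathrm{T}))), \mathrm{F}) \\
\xrightarrow{(R2)}\ T_2 &= \land(\mathrm{T}, \mathrm{F})
\end{aligned}$$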

The reduction sequence from $T_0$ to $T_1$ via (R1) is depicted in Figure 6. The outer context in $T_1$ here is ∧(•, F), so the ∧ and F nodes of $T_1$ are related to their counterparts in $T_0$ by residuation. We also identify the subtrees representing the variables $p_1$, $p_2$ and $p_3$ on both sides of (R1), and we draw a residuation relation between them as well. We now have two newly created ∨-nodes remaining in $T_1$. In $T_0$ all the remaining ∨-nodes are involved in the rewriting step. Therefore, every ∨-node in $T_1$ has a creation relation to all ∨-nodes in $T_0$.

Figure 6: Creation and residuation relations after applying (R1).

Figure 7 depicts rewriting step (R2), and we proceed likewise. The outer context remains ∧(•, F), and the constant T in $T_2$ is connected through a creation relation to the nodes matched by the function symbols ∨ and T on the left-hand side of (R2).

Tracing back all creation and residuation relations from the constant T in $T_2$ to $T_0$, we obtain the term slice with respect to that constant: the subcontext ∨(∨(T, •), •).
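Computing a term slice from given dependence relations is a backward reachability search. In this Python sketch the creation and residuation relations of the two steps are transcribed by hand from the discussion above, not computed by a rewriting engine, and the node names (`or1`, `t1`, with primes for nodes of $T_1$ and $T_2$) are our own labels. The search starts from the constant T in $T_2$ and keeps only the nodes of $T_0$ it reaches; all other nodes are printed as •.

```python
# Nodes of T0 = ∧(∨(∨(T, F), ∧(F, T)), F): node id -> (symbol, child ids).
T0 = {
    "and0": ("∧", ["or1", "f3"]),
    "or1": ("∨", ["or2", "and1"]),
    "or2": ("∨", ["t1", "f1"]),
    "t1": ("T", []),
    "f1": ("F", []),
    "and1": ("∧", ["f2", "t2"]),
    "f2": ("F", []),
    "t2": ("T", []),
    "f3": ("F", []),
}

# Backward dependences: node of T_{i+1} -> nodes of T_i it is related to.
DEPENDS_ON = {
    # Step T0 -> T1 via (R1): residuation for the outer context and p1, p2, p3 ...
    "and0'": {"and0"}, "f3'": {"f3"},
    "t1'": {"t1"}, "f1'": {"f1"},
    "and1'": {"and1"}, "f2'": {"f2"}, "t2'": {"t2"},
    # ... and creation for the two new ∨-nodes.
    "or1'": {"or1", "or2"}, "or2'": {"or1", "or2"},
    # Step T1 -> T2 via (R2): residuation for ∧(•, F), creation for the new T.
    "and0''": {"and0'"}, "f3''": {"f3'"},
    "t''": {"or1'", "t1'"},
}


def term_slice_nodes(start):
    """Nodes of T0 reachable from start by following dependences backwards."""
    seen, stack = set(), list(start)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(DEPENDS_ON.get(node, ()))
    return {node for node in seen if node in T0}


def render(node, keep):
    """Print the subcontext rooted at node, with • for nodes outside the slice."""
    if node not in keep:
        return "•"
    symbol, children = T0[node]
    if not children:
        return symbol
    return f"{symbol}(" + ", ".join(render(child, keep) for child in children) + ")"


keep = term_slice_nodes({"t''"})
# Topmost slice nodes: those whose parent is not itself in the slice.
tops = [n for n in keep if not any(n in T0[p][1] for p in keep)]
print([render(n, keep) for n in tops])  # ['∨(∨(T, •), •)']
```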

4 Pim

Pim is an equational system that can be used to analyze Algol-class imperative programming languages [1]. It can be oriented to obtain a TRS that characterizes the dynamic semantics of programs working on a store (an abstract representation of memory), augmented with generalized conditionals. Translating such programs to Pim allows us to formally manipulate and reason about them. The Pim form of a program is often better suited to carrying out semantics-preserving transformations than the program text itself. Pim can therefore serve as a common intermediate representation for expressing flexible, generic slicing algorithms: to perform program slicing for a language, only a translation from that language to Pim is needed.

In order to keep the discussion simple, we will focus on a subsystem of Pim necessary to validate the approach and its results described in this paper. We also do not discuss certain semantic issues like nonterminating computations.


Figure 7: Creation and residuation relations after applying (R2).

4.1 Pim Signature

The signature of Pim is given in Figure 8. Since we are interested in knowing the values at certain addresses during or after program execution, Pim can be viewed as a parameterized data type with formal sorts V and A. These sorts are intended to be instantiated as appropriate to model the data manipulated by a given programming language. In the case of µC, we add the auxiliary sorts and functions depicted in Figure 9 to model the integer data manipulated by µC programs.

Figure 10 shows a subset of the equations of Pim that is intended to function as an "operational semantics". Oriented from left to right, they form a rewriting system that is terminating and confluent on ground terms of sort V. Having a confluent set of rewriting rules is important, since it means that reductions may be performed anywhere in a term without affecting the final term produced.

The store structure in Pim is essentially an abstract term representation of memory. The simplest form of a store is a store cell that associates an address expression with a merge structure. For example, the store cell

$$\{\mathit{addr}(i) \mapsto [1]\}$$

associates address expression addr(i) with merge structure [1]. Constant addresses such as addr(i) represent ordinary variables.

Stores can be constructed out of several substores using the composition operator ◦s. Equations (L1) and (L2) indicate that empty stores, denoted by ∅s, disappear when composed with other stores. Stores can be guarded, meaning that they only take effect conditionally. When guarded by the true predicate T, a store structure evaluates to itself, according to equation (L5). Similarly, by (L6), if it is guarded by the false predicate F it evaluates to the empty store.

Because stores associate addresses with merge structures, dereferencing a store with an expression of the form

$$s@a$$

results in a merge structure. Unlike an ordinary "lookup" operation, which retrieves a single value, the Pim store dereferencing operator can be thought of as retrieving all of the values ever associated with the address at which the store is dereferenced, aggregating those results into a merge structure. This behavior is codified by equations (S1), (S3) and (S4).

sorts
  S                    store structures
  M                    merge structures
  A                    addresses
  B                    booleans
  V                    base values

functions
  {A ↦ M} → S          store cell
  B ▷s S → S           guarded store
  S ◦s S → S           store composition
  ∅s → S               null store
  S@A → M              store dereference
  [V] → M              merge cell
  B ▷m M → M           guarded merge
  M ◦m M → M           merge composition
  ∅m → M               null merge
  α1, α2, ... → A      address constants
  T, F → B             boolean constants
  =⟨A, A⟩ → B          address comparison
  ¬B → B               boolean negation
  M! → V               merge selection
  c1, c2, ... → V      base value constants
  ? → V                unknown base value

Figure 8: Signature of Pim

sorts
  Id                   identifiers
  IntLiteral ⊆ V       integer literals (subsort of base values)

functions
  meta(Id) → V         meta-variable construction from identifiers
  intSum(V, V) → V     integer addition
  intEq(V, V) → V      integer equality
  addr(Id) → A         address constant construction from identifiers

Figure 9: Auxiliary signature of µC-specific Pim extensions

∅ ◦ l → l                                       (L1)
l ◦ ∅ → l                                       (L2)
T ▷ l → l                                       (L5)
F ▷ l → ∅                                       (L6)
{a1 ↦ m}@a2 → =⟨a1, a2⟩ ▷m m                    (S1)
∅s@a → ∅m                                       (S3)
(s1 ◦s s2)@a → (s1@a) ◦m (s2@a)                 (S4)
(m ◦m [v])! → v                                 (M2)
[v]! → v                                        (M3)
=⟨k1, k2⟩ → T, if k1, k2 are constants and k1 = k2   (E1)
=⟨k1, k2⟩ → F, if k1, k2 are constants and k1 ≠ k2   (E2)
¬T → F                                          (B1)

Figure 10: Subset of equations of Pim. The equations labeled (Ln) are generic to store or merge structures.

The simplest non-empty form of a merge structure is a merge cell such as [meta(N)]. The value meta(N) is the translation of a µC meta-variable ?N. The null merge structure is denoted by ∅m. Merge structures can be composed with the composition operator ◦m, and they can be guarded by a boolean predicate. The polymorphic equations (L1), (L2), (L5) and (L6) apply to merge structures as well. We often drop the subscripts distinguishing between store and merge constructs when no confusion will arise.

Applying the selection operator ! to a merge structure yields a value. From (M2) and (M3) it follows that when the selection operator is applied to a merge structure m, the rightmost cell in m must be an unguarded cell [v] for the entire expression m! to evaluate to v.

Combining the dereference operator for retrieval with the selection operator, an expression of the form

(s@a)!

first retrieves all the values in store s associated with address a, and then yields the value that was most recently assigned to a, provided that the merge cell for this last assignment is unguarded.
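A matching sketch for selection follows. Its encoding and the names deref and select are hypothetical: a store is flattened into a left-to-right list of possibly guarded cells. It assumes that dereferencing a guarded store moves the guard onto the corresponding merge cell, an equation of Pim that is not among those listed in Figure 10, and it returns None whenever no equation applies.

```python
from typing import Optional

# Hypothetical flattened encoding (illustration only): a store is a
# composition of cells listed left to right as (guard, address, value);
# guard None means an unguarded cell {a ↦ [v]}, while a string names a
# not-yet-evaluated predicate p, i.e. the cell p ▷s {a ↦ [v]}.


def deref(store, addr):
    """store@addr: the matching cells, in order, each keeping its guard."""
    return [(guard, value) for guard, a, value in store if a == addr]


def select(merge) -> Optional[str]:
    """merge! by (M2)/(M3): defined only if the rightmost cell is unguarded."""
    if merge and merge[-1][0] is None:
        return merge[-1][1]
    return None  # no equation of Figure 10 applies; the term is stuck


store = [(None, "x", "1"), (None, "y", "2"), (None, "x", "3")]
print(select(deref(store, "x")))                      # 3: the most recent value
print(select(deref(store + [("p", "x", "4")], "x")))  # None: last update is conditional
```

Returning None rather than the last unguarded value keeps the sketch faithful to (M2): a guarded rightmost cell blocks selection instead of being skipped.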

4.2 Translation from µC to Pim

The syntax of µC was given in Figure 1 on page 3. A formal description of the translation of µC programs to Pim terms is given in Figure 11. The translation is written in the style of Natural Semantics, using rules of the form

$$\frac{\text{premises}}{\text{conclusion}}$$

If the premises hold, then the conclusion holds as well. The premises, written above the line, are a list of sequents, and the conclusion, written below it, is a single sequent. A sequent combines a proposition with the hypotheses needed to prove it; the two are separated by the turnstile ⊢. Each sequent takes a µC construct c and an incoming Pim store s, and yields a Pim term or a pair of Pim terms, depending on the construct being translated.

• Sequents with ⇒Stm are used to translate statements, and sequents with ⇒Pgm to translate entire programs. They yield a Pim store term u corresponding to the cumulative updates made to the store by the statement or program.

• Sequents with ⇒Exp are used to translate expressions computing ordinary values and yield a pair of Pim terms 〈v, u〉. The term v corresponds to the expression’s value and the store term u corresponds to the expression’s side effects. If u = ∅s, i.e. the expression has no side effects, we call the expression pure; for impure expressions, u ≠ ∅s.

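• Sequents with ⇒LValue are used to translate expressions computing L-values and yield a pair of Pim terms 〈a, u〉. The address term a corresponds to the expression’s L-value and the store term u corresponds to the expression’s side effects. Again, u may or may not be the empty store, depending on whether the L-value expression has side effects.

$$
(P)\ \frac{\emptyset_s \vdash \mathit{Stm} \Rightarrow_{\mathit{Stm}} u}{\vdash \mathit{Stm} \Rightarrow_{\mathit{Pgm}} u}
\qquad
(S1)\ \frac{s \vdash \{\mathit{StmList}\} \Rightarrow_{\mathit{Stm}} u_1 \quad s \circ u_1 \vdash \mathit{Stm} \Rightarrow_{\mathit{Stm}} u_2}{s \vdash \{\mathit{StmList}\ \mathit{Stm}\} \Rightarrow_{\mathit{Stm}} u_1 \circ u_2}
$$

$$
(S3)\ \frac{s \vdash \mathit{Exp} \Rightarrow_{\mathit{Exp}} \langle v, u \rangle}{s \vdash \mathit{Exp}\texttt{;} \Rightarrow_{\mathit{Stm}} u}
$$

$$
(S5)\ \frac{s \vdash \mathit{Exp} \Rightarrow_{\mathit{Exp}} \langle v_E, u_E \rangle \quad s \circ u_E \vdash \mathit{Stm}_1 \Rightarrow_{\mathit{Stm}} u_{S1} \quad s \circ u_E \vdash \mathit{Stm}_2 \Rightarrow_{\mathit{Stm}} u_{S2}}{s \vdash \texttt{if (}\mathit{Exp}\texttt{)}\ \mathit{Stm}_1\ \texttt{else}\ \mathit{Stm}_2 \Rightarrow_{\mathit{Stm}} u_E \circ (\mathit{itob}(v_E) \rhd u_{S1}) \circ (\neg\mathit{itob}(v_E) \rhd u_{S2})}
$$

$$
(Ep1)\ s \vdash \mathit{Id} \Rightarrow_{\mathit{Exp}} \langle (s @ \mathit{addr}(\mathit{Id}))!, \emptyset_s \rangle
\qquad
(Ep2)\ s \vdash \mathit{IntLiteral} \Rightarrow_{\mathit{Exp}} \langle \mathit{IntLiteral}, \emptyset_s \rangle
\qquad
(Ep3)\ s \vdash \texttt{?}\mathit{Id} \Rightarrow_{\mathit{Exp}} \langle \mathit{meta}(\mathit{Id}), \emptyset_s \rangle
$$

$$
(Eu3)\ \frac{s \vdash \mathit{Exp} \Rightarrow_{\mathit{Exp}} \langle v_E, u_E \rangle \quad s \circ u_E \vdash \mathit{LValue} \Rightarrow_{\mathit{LValue}} \langle v_L, u_L \rangle}{s \vdash \mathit{LValue}\ \texttt{=}\ \mathit{Exp} \Rightarrow_{\mathit{Exp}} \langle v_E, u_E \circ u_L \circ \{v_L \mapsto [v_E]\} \rangle}
$$

$$
(Eu5)\ \frac{s \vdash \mathit{Exp}_1 \Rightarrow_{\mathit{Exp}} \langle v_1, u_1 \rangle \quad s \circ u_1 \vdash \mathit{Exp}_2 \Rightarrow_{\mathit{Exp}} \langle v_2, u_2 \rangle}{s \vdash \mathit{Exp}_1\ \texttt{+}\ \mathit{Exp}_2 \Rightarrow_{\mathit{Exp}} \langle \mathit{intSum}(v_1, v_2), u_1 \circ u_2 \rangle}
$$

$$
(Eu8)\ \frac{s \vdash \mathit{Exp}_1 \Rightarrow_{\mathit{Exp}} \langle v_1, u_1 \rangle \quad s \circ u_1 \vdash \mathit{Exp}_2 \Rightarrow_{\mathit{Exp}} \langle v_2, u_2 \rangle}{s \vdash \mathit{Exp}_1\ \texttt{==}\ \mathit{Exp}_2 \Rightarrow_{\mathit{Exp}} \langle \mathit{intEq}(v_1, v_2), u_1 \circ u_2 \rangle}
$$

$$
(Lp)\ s \vdash \mathit{Id} \Rightarrow_{\mathit{LValue}} \langle \mathit{addr}(\mathit{Id}), \emptyset_s \rangle
$$

Figure 11: Translation rules for µC to Pim terms.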

Note that the pair construct returned by ⇒Exp or ⇒LValue is an auxiliary symbol used only during the translation process; it is not itself a function symbol of Pim.

We will now give some examples of how the translation process works. Consider the µC assignment expression n = ?N and an incoming Pim store s. Rule (Eu3) applies, and we first translate the right-hand side ?N using rule (Ep3) and store s, yielding the pair 〈meta(N), ∅s〉. Next we apply rule (Lp) to the L-value expression n using store s ◦ ∅s = s, yielding the pair 〈addr(n), ∅s〉. In the sequel we will apply equations (L1) and (L2) implicitly. Since both uE and uL are ∅s, the conclusion of rule (Eu3) then results in the pair

〈meta(N), {addr(n) ↦ [meta(N)]}〉

for the entire assignment expression. Composed with the incoming store, this side effect gives the updated store s ◦ {addr(n) ↦ [meta(N)]}.

Next, consider the same assignment expression as a statement: n = ?N;.

We apply rule (S3), which deconstructs the pair obtained above: it discards the expression value meta(N) and keeps only the side effect {addr(n) ↦ [meta(N)]}, which subsequent constructs see composed with the initial store s.
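The rules of Figure 11 lend themselves to a direct recursive implementation. The Python sketch below is our own illustration, not code from Field et al.: the tuple-based AST encoding, the string rendering of Pim terms, and the helper names comp, paren, exp, stm and program are all hypothetical. It covers rules (P), (S1), (S3), (S5), (Ep1) to (Ep3), (Eu3), (Eu5), (Eu8) and (Lp) for L-values that are plain identifiers, applies (L1)/(L2) eagerly while building terms, and assumes that an empty block translates to ∅s.

```python
# Hypothetical AST encoding for a small µC subset (not the paper's data structures):
#   expressions: ("id", x), ("int", k), ("meta", X), ("assign", x, e),
#                ("add", e1, e2), ("eq", e1, e2)
#   statements:  ("expstm", e), ("block", [stm, ...]), ("if", e, stm1, stm2)
# Pim terms are rendered as strings; "∅" stands for the empty store ∅s.
EMPTY = "∅"


def comp(x: str, y: str) -> str:
    """x ◦ y, dropping empty stores as (L1) and (L2) allow."""
    if x == EMPTY:
        return y
    if y == EMPTY:
        return x
    return f"{x} ◦ {y}"


def paren(t: str) -> str:
    """Parenthesize a composed store before using it as an operand."""
    return f"({t})" if " ◦ " in t else t


def exp(s: str, e) -> tuple[str, str]:
    """s ⊢ e ⇒Exp 〈v, u〉, returned as (value term v, side-effect store u)."""
    tag = e[0]
    if tag == "id":                                   # (Ep1)
        return f"({paren(s)}@addr({e[1]}))!", EMPTY
    if tag == "int":                                  # (Ep2)
        return str(e[1]), EMPTY
    if tag == "meta":                                 # (Ep3)
        return f"meta({e[1]})", EMPTY
    if tag == "assign":                               # (Eu3)
        v_e, u_e = exp(s, e[2])                       # right-hand side first
        v_l, u_l = f"addr({e[1]})", EMPTY             # (Lp), in store s ◦ u_e
        return v_e, comp(comp(u_e, u_l), f"{{{v_l} ↦ [{v_e}]}}")
    if tag in ("add", "eq"):                          # (Eu5) and (Eu8)
        v1, u1 = exp(s, e[1])
        v2, u2 = exp(comp(s, u1), e[2])
        op = "intSum" if tag == "add" else "intEq"
        return f"{op}({v1}, {v2})", comp(u1, u2)
    raise ValueError(f"unsupported expression: {e!r}")


def stm(s: str, st) -> str:
    """s ⊢ st ⇒Stm u, returned as the store term u of the updates."""
    tag = st[0]
    if tag == "expstm":                               # (S3): drop the value
        return exp(s, st[1])[1]
    if tag == "block":                                # (S1), one statement at a time
        u = EMPTY
        for sub in st[1]:
            u = comp(u, stm(comp(s, u), sub))
        return u
    if tag == "if":                                   # (S5)
        v_e, u_e = exp(s, st[1])
        u1 = stm(comp(s, u_e), st[2])
        u2 = stm(comp(s, u_e), st[3])
        return comp(comp(u_e, f"(itob({v_e}) ▷ {paren(u1)})"),
                    f"(¬itob({v_e}) ▷ {paren(u2)})")
    raise ValueError(f"unsupported statement: {st!r}")


def program(st) -> str:
    """⊢ st ⇒Pgm u by rule (P): translate st in the empty store."""
    return stm(EMPTY, st)


print(stm("s", ("expstm", ("assign", "n", ("meta", "N")))))
# {addr(n) ↦ [meta(N)]}
```

Because terms are plain strings, a store that is read several times (for example by (Ep1) inside a later assignment) is copied into the result each time; Pim instead shares such subterms in a term graph, as in Figure 12.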

5 Program Slicing using Pim and Dynamic Dependence Tracking

5.1 Static Slicing Revisited

We now return to program P2, depicted in Figure 4 on page 6, and again attempt to find the static slice with respect to the value of n in statement write(n), this time using Pim together with the technique of dynamic dependence tracking.

The first step is to translate P2 into its corresponding Pim representation T0, establishing along the way initial transitive dependence relations between nodes of the program’s AST and the Pim terms. Figure 12 depicts the Pim term graph corresponding to program P2, with some of the (shared) graph edges shown and most other subgraphs flattened for clarity. Note also that the graph is drawn upside down, so that its structure mimics the sequence of statements in P2 from top to bottom.

We reduce the obtained store structure T0 using the rewriting rules specified in Figure 10 until we arrive at the following ground term Tn, which is also a store structure.

$$T_n = \{\mathit{addr}(n) \mapsto [\mathit{meta}(N)]\} \circ \{\mathit{addr}(i) \mapsto [1]\} \circ \{\mathit{addr}(n) \mapsto [\mathit{intSum}(\mathit{meta}(N), 1)]\}$$

Slicing with respect to the value of variable n amounts to selecting its value from Tn.

$$\mathit{Slice}(P_2, n) = (T_n @ \mathit{addr}(n))! = \mathit{intSum}(\mathit{meta}(N), 1)$$
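The final equality can be checked by hand against the equations of Figure 10. In the derivation below, c1 = {addr(n) ↦ [meta(N)]}, c2 = {addr(i) ↦ [1]} and c3 = {addr(n) ↦ [intSum(meta(N), 1)]} abbreviate the three cells of Tn (names introduced here for brevity), the values are written as merge cells [·] as the signature {A ↦ M} → S requires, and the composition is grouped to the left.

$$
\begin{aligned}
(T_n @ \mathit{addr}(n))!
&\to \big((c_1 @ \mathit{addr}(n)) \circ_m (c_2 @ \mathit{addr}(n)) \circ_m (c_3 @ \mathit{addr}(n))\big)! && \text{(S4), twice}\\
&\to \big(({=}\langle \mathit{addr}(n), \mathit{addr}(n) \rangle \rhd_m [\mathit{meta}(N)]) \circ_m ({=}\langle \mathit{addr}(i), \mathit{addr}(n) \rangle \rhd_m [1]) \circ_m ({=}\langle \mathit{addr}(n), \mathit{addr}(n) \rangle \rhd_m [\mathit{intSum}(\mathit{meta}(N), 1)])\big)! && \text{(S1)}\\
&\to \big((T \rhd_m [\mathit{meta}(N)]) \circ_m (F \rhd_m [1]) \circ_m (T \rhd_m [\mathit{intSum}(\mathit{meta}(N), 1)])\big)! && \text{(E1), (E2)}\\
&\to \big([\mathit{meta}(N)] \circ_m \emptyset_m \circ_m [\mathit{intSum}(\mathit{meta}(N), 1)]\big)! && \text{(L5), (L6)}\\
&\to \big([\mathit{meta}(N)] \circ_m [\mathit{intSum}(\mathit{meta}(N), 1)]\big)! && \text{(L2)}\\
&\to \mathit{intSum}(\mathit{meta}(N), 1) && \text{(M2)}
\end{aligned}
$$

The comparisons can be decided by (E1) and (E2) because addr(n) and addr(i) are address constants built by addr (Figure 9).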

By tracing the dynamic dependence relations from Slice(P2, n) back to the AST, we finally derive the statement-minimal slice with respect to n, which is exactly program (c) in Figure 4.


Figure 12: Schematic Pim representation of program P2 from Figure 4.

The above approach eagerly simplified T0 to Tn before applying the slicing criterion. We could also have computed the slice in a single reduction sequence by rewriting the expression (T0@addr(n))!, in which the initial Pim graph for P2 is embedded in the slicing criterion.

5.2 Flexibility

A TRS used for program slicing can readily be extended with extra rewriting rules, so that the slicing algorithm is adapted rather than redesigned for specific situations. Reference [4] describes several variations that extend Pim for computing slices in loops. Each variation effectively results in a different slicing algorithm, without changing the underlying TRS framework.

Although so far we have concentrated only on computing slices with respect to the final values of variables, [4] also explains how to compute slices for any pure µC expression e at any program point. This is done by first introducing a new variable v and inserting the assignment

v = op(v, e);

at the program point of interest. The operator op is a special operator that is left uninterpreted, so that slicing with respect to the final value of v simulates the slice for expression e.
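The instrumentation step itself is a plain program transformation. The Python sketch below is a hypothetical, purely textual version of it: statements are strings, the helper name instrument and the default variable name v are our own choices, and the freshness check is a simple whole-word search rather than a real scope analysis. Threading v through op presumably also keeps the value of e from every visit of that program point (for instance inside a loop) in the final value of v.

```python
import re


def instrument(stmts: list[str], index: int, expr: str, fresh: str = "v") -> list[str]:
    """Return a copy of a µC statement list with `fresh = op(fresh, expr);`
    inserted before position `index` (hypothetical, textual helper)."""
    # The slicing variable must be new; otherwise existing assignments to it
    # would end up in the slice. Only whole-word occurrences are checked.
    pattern = re.compile(rf"\b{re.escape(fresh)}\b")
    if any(pattern.search(s) for s in stmts):
        raise ValueError(f"{fresh!r} is not a fresh variable")
    probe = f"{fresh} = op({fresh}, {expr});"
    return stmts[:index] + [probe] + stmts[index:]


prog = ["x = ?X;", "y = x + 1;", "write(y);"]
print(instrument(prog, 2, "x + y"))
# ['x = ?X;', 'y = x + 1;', 'v = op(v, x + y);', 'write(y);']
```

Slicing the instrumented program with respect to the final value of v then stands in for slicing with respect to x + y just before write(y);.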

It is fairly straightforward to express conditional slices (and, by extension, dynamic slices) using Pim graphs. For this purpose we can extend our Slice function to take a set of constraints that bind meta-variables to expressions.

$$\mathit{Slice}(P, x, \langle ?X_1 = e_1; \ldots; ?X_n = e_n \rangle) \equiv \big((T_P @ \mathit{addr}(x))!\big)[?X_1 = t_1, \ldots, ?X_n = t_n]$$

The conditional Slice function evaluates the Pim expression that selects the value of variable x in the Pim graph TP for program P, with every meta-variable ?Xi replaced by the Pim term ti that corresponds to expression ei.
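The effect of such constraints can be tried out on a small, hand-reduced store. The Python sketch below is our own illustration, not the algorithm of [4]: the tuple encoding, the function names subst, norm and conditional_slice, and the example store are hypothetical. The store stands for a µC fragment such as n = ?N; if (n == 0) n = 1; else n = 2; after the condition’s dereference has already been reduced to meta(N). The sketch further assumes that intEq yields 1 or 0, that itob maps nonzero integers to T (following C conventions), that ¬F → T holds alongside (B1), and that guards carry over from store cells to the corresponding merge cells.

```python
from typing import Optional

# Hypothetical term encoding (illustration only):
#   values      ("int", k) | ("meta", X) | ("intSum", a, b) | ("intEq", a, b)
#   predicates  True | False | ("itob", v) | ("not", p)
#   stores      list of (guard, address, value) cells, leftmost first


def subst(t, env):
    """Replace each meta(X) with env[X]; unconstrained meta-variables stay."""
    if not isinstance(t, tuple):
        return t
    if t[0] == "meta":
        return env.get(t[1], t)
    return (t[0],) + tuple(subst(x, env) for x in t[1:])


def norm(t):
    """Fold constant subterms bottom-up; anything unknown stays symbolic."""
    if not isinstance(t, tuple) or t[0] in ("int", "meta"):
        return t
    op, args = t[0], [norm(x) for x in t[1:]]
    if op in ("intSum", "intEq") and all(isinstance(a, tuple) and a[0] == "int" for a in args):
        x, y = args[0][1], args[1][1]
        return ("int", x + y) if op == "intSum" else ("int", int(x == y))  # assumed C-style ==
    if op == "itob" and isinstance(args[0], tuple) and args[0][0] == "int":
        return args[0][1] != 0            # assumed: nonzero means T
    if op == "not" and isinstance(args[0], bool):
        return not args[0]                # (B1) ¬T → F, plus the assumed ¬F → T
    return (op, *args)


def conditional_slice(store, addr, env=None) -> Optional[tuple]:
    """((store@addr)!)[env]: the value of addr under the meta-variable bindings."""
    env = env or {}
    merge = []
    for guard, a, value in store:
        if a != addr:                     # (S1) + (E2) + (L6): cell contributes nothing
            continue
        guard = norm(subst(guard, env))
        if guard is False:                # (L6): F ▷ m → ∅
            continue
        merge.append((guard, norm(subst(value, env))))
    if merge and merge[-1][0] is True:    # (L5), then (M2)/(M3)
        return merge[-1][1]
    return None                           # stuck: depends on an unconstrained input


p = ("itob", ("intEq", ("meta", "N"), ("int", 0)))
store = [(True, "n", ("meta", "N")), (p, "n", ("int", 1)), (("not", p), "n", ("int", 2))]
print(conditional_slice(store, "n"))                     # None
print(conditional_slice(store, "n", {"N": ("int", 0)}))  # ('int', 1)
print(conditional_slice(store, "n", {"N": ("int", 5)}))  # ('int', 2)
```

Without a constraint on ?N the selection is stuck, because the rightmost cell for n is guarded by an unresolved predicate; binding ?N prunes one branch, which illustrates how constraints can make a conditional slice smaller than the static one.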

As a final consideration, we note that, formally speaking, the resulting slices are contexts derived from the original program’s AST: certain subtrees may be missing and are represented by holes •. This is possible because the slicing algorithm is independent of the syntax of the programming language being sliced, which is also the source of its power and flexibility. We may, however, encounter situations where the obtained slice is not parseable. Imagine an assignment statement in the result whose L-value is missing, but whose R-value survived slicing because its side effects are relevant to the slice. The language’s syntax rules will very likely require the L-value of an assignment statement to be present. It may therefore be necessary to perform some post-processing to obtain parseable programs with identical behavior. In practice this is not too difficult, since any syntactically valid substitution for the holes will do [4].

6 Conclusion and Further Reading

This paper provided an overview of a generic and flexible slicing algorithm that uses a term rewriting system with the technique of dynamic dependence tracking. The system is equipped with an adequate set of rewriting rules that reduce programs according to the slicing criterion, while preserving the semantics of both the language and the program being sliced. More precisely, we performed the following steps [8]:

• Translation of a program into Pim terms.

• Transformation (reduction, optimization) of the Pim terms.

• Maintenance of a mapping between the original program, the original Pim terms and the transformed result.

• Extraction of actual slices, optionally followed by some post-processing.

Instead of specifying several different ad-hoc algorithms for specific purposes, one can regard a slicing algorithm based on a TRS as a generic framework that can be extended and modified to perform more accurate program slicing under varying conditions and constraints.

Pim was introduced by John Field in [3], and that work was further developed in [1]. While this paper employed only the small subset of Pim rules needed to validate the proposed solutions, [1] presents Pim as a sequence of increasingly powerful equational systems. Different subsystems of Pim, obtained by taking different subsets of its rules, are analyzed to determine whether they are confluent, terminating or left-linear.

In [4], Field, Ramalingam and Tip introduced an algorithm for computing constrained slices, whereas almost all previously existing algorithms could be categorized as either static or dynamic.

The general notions of slicing and dynamic dependence, applicable to any TRS, are explained in detail in [5]. The same paper also proves that, for left-linear systems, the resulting slices are minimal and sound.


The work of Klop et al. provides a thorough introduction to rewriting systems, including term and graph rewriting [2][6]. Some examples in this paper are taken from Plump’s chapter on term graph rewriting in [7].

References

[1] J. A. Bergstra, T. B. Dinesh, J. Field, and J. Heering. Toward a complete transformational toolkit for compilers. ACM Transactions on Programming Languages and Systems (TOPLAS), 19(5):639–684, 1997.

[2] M. Bezem, J. W. Klop, and R. de Vrijer. Term rewriting systems. Cambridge University Press, 2003.

[3] J. Field. A simple rewriting semantics for realistic imperative programs and its application to program analysis. In Proceedings of the ACM SIGPLAN Workshop on Partial Evaluation and Semantics-Based Program Manipulation, pages 98–107, 1992.

[4] J. Field, G. Ramalingam, and F. Tip. Parametric program slicing. In Proceedings of the 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 379–392. ACM, New York, NY, USA, 1995.

[5] J. Field and F. Tip. Dynamic dependence in term rewriting systems and its application to program slicing. Information and Software Technology, 40(11–12):609–636, 1998.

[6] J. R. Kennaway, J. W. Klop, M. R. Sleep, and F. J. de Vries. On the adequacy of graph rewriting for simulating term rewriting. ACM Transactions on Programming Languages and Systems (TOPLAS), 16(3):493–523, 1994.

[7] D. Plump. Term graph rewriting. In Handbook of Graph Grammars and Computing by Graph Transformation, volume 2, pages 3–61, 1999.

[8] F. Tip. A survey of program slicing techniques. Centrum voor Wiskunde en Informatica, 1994.

[9] M. Weiser. Program slicing. In Proceedings of the 5th International Conference on Software Engineering, pages 439–449. IEEE Press, Piscataway, NJ, USA, 1981.

[10] M. D. Weiser. Program slices: formal, psychological, and practical investigations of an automatic program abstraction method. PhD thesis, Ann Arbor, MI, USA, 1979.
