ISS Group at the University of Texas - bc...(a) Control Flow Graph a b (b) Postdominator Tree (c)...

Optimal Control Dependence Computation

and the Roman Chariots Problem

KESHAV PINGALI

Cornell University

and

GIANFRANCO BILARDI

Universit�a di Padova� Italy� and University of Illinois� Chicago

The control dependence relation plays a fundamental role in program restructuring and optimiza�tion� The usual representation of this relation is the control dependence graph �CDG�� but thesize of the CDG can grow quadratically with the input program� even for structured programs�In this article� we introduce the augmented postdominator tree �APT �� a data structure whichcan be constructed in space and time proportional to the size of the program and which supportsenumeration of a number of useful control dependence sets in time proportional to their size�Therefore� APT provides an optimal representation of control dependence� Speci�cally� the APTdata structure supports enumeration of the set cd�e�� which is the set of statements control de�pendent on control��ow edge e� of the set conds�w�� which is the set of edges on which statementw is dependent� and of the set cdequiv�w�� which is the set of statements having the same controldependences as w� Technically� APT can be viewed as a factored representation of the CDGwhere queries are processed using an approach known as �ltering search�

Categories and Subject Descriptors� D�� Programming Languages�� Processors�compilersand optimization I�� Algebraic Manipulation�� Algorithms�analysis of algorithms

General Terms� Algorithms� Languages� Theory

Additional Key Words and Phrases� Compilers� control dependence� program optimization� pro�gram transformation

�� INTRODUCTION

Control dependence is a key concept in program optimization and parallelization�Intuitively� a statement w is control dependent on a statement u if u is a conditionalthat a�ects the execution of w� For example in an if�then�else construct� statements

Keshav Pingali was supported by an NSF Presidential Young Investigator award CCR��NSF grant CCR�� and ONR grant N�� Gianfranco Bilardi was supportedin part by the ESPRIT III Basic Research Programme of the EC under contract No� �� ProjectGEPPCOM� and by the Italian Ministry of University and Research�Authors� addresses� K� Pingali� Department of Computer Science� Cornell University� Ithaca�NY �� email� pingali�cs�cornell�edu G� Bilardi� DEI� Universit�a di Padova� Padova� Italy Department of Electrical Engineering and Computer Science� University of Illinois� Chicago� IL�� email� bilardi�art�dei�unipd�it�Permission to make digital�hard copy of all or part of this material without fee is grantedprovided that the copies are not made or distributed for pro�t or commercial advantage� theACM copyright�server notice� the title of the publication� and its date appear� and notice is giventhat copying is by permission of the Association for Computing Machinery� Inc� �ACM�� To copyotherwise� to republish� to post on servers� or to redistribute to lists requires prior speci�cpermission and�or a fee�c� �� ACM ��

ACM Transactions on Programming Languages and Systems� Vol� �� No� �� May �� Pages ��

� � K�Pingali and G�Bilardi

on the two sides of the conditional statement are control dependent on the predicate�In the presence of nested control structures� multiway branches� and unstructured�ow of control� intuition is an unreliable guide� and one needs to rely on a formal�graph�theoretic de�nition of control dependence� based on the following concepts�

De�nition �� A control �ow graph G � �V�E� is a directed graph in whichnodes represent statements� and an edge u � v represents possible �ow of controlfrom u to v� Set V contains two distinguished nodes START� with no predecessorsand from which every node is reachable� and END� with no successors and reachablefrom every node�

To simplify the discussion we will follow standard practice and assume that thereis an edge from START directly to END in the control �ow graph Ferrante et al��

De�nition �� A node w is said to postdominate a node v if every path fromv to END contains w�

Any node v is postdominated by END and by itself� It can be shown that postdom�inance is a transitive relation and that its transitive reduction is a tree�structuredrelation called the postdominator tree� The parent of a node in this tree is called theimmediate postdominator of that node� The postdominator tree of a program canbe constructed in O�jEj��jEj�� time using an algorithm due to Lengauer and Tar�jan �� jEj� denotes the inverse Ackermann function which grows extremelyslowly with jEj so that the algorithm can be considered linear for all practicalpurposes� This algorithm is relatively easy to code� A rather more complicated al�gorithm due to Harel �� computes the postdominator tree optimally in O�jEj�time� Control dependence can be de�ned formally as follows

De�nition �� A node w is said to be control dependent on edge �u� v� if

�� w postdominates v and

�� if w �� u� then w does not postdominate u�

Intuitively� this means that if control �ows from node u to node v along edgeu � v� it will eventually reach node w� however� control may reach END fromu without passing through w� Thus� u is a �decision�point� that in�uences theexecution of w�

De�nition �� Given a control �ow graph G � �V�E�� its control dependencerelation is the set C � E � V of all pairs �e� w� such that node w is controldependent on edge e�

The notion of control dependence is due to Ferrante et al� �� Cytron et al�� studied many of the properties of this relation� These authors de�ne controldependence as a relation between nodes �in the context of De�nition �� they vieww as being control dependent on node u rather than on the edge �u � v�� Thede�nition given in this article is easier to work with� but the di�erence is merelyone of presentation� not substance�

Control dependence is used in many phases of modern compilers� such as data�owanalysis� loop transformations� and code scheduling� An abstract view of these

ACM Transactions on Programming Languages and Systems� Vol� �� No� �� May ��

Optimal Control Dependence Computation � �

aEV

b c g

dec

ba

f

gSTART

END

a

b

c

e

f

d

g

START

END

d e f

f bc dc e

aSTART

(a) Control Flow Graph

a b

(b) Postdominator Tree

(c) Control Dependence Relation

Fig� �� A program and its control dependence relation�

applications is that they require the computation of the following sets derived fromC Cytron et al� ��

De�nition �� Given a node w and an edge e in a control program graph withcontrol dependence relation C� we de�ne the following control dependence sets

�cd�e� � fw � V j�e� w� � Cg�

�conds�w� � fe � Ej�e� w� � Cg� and

�cdequiv�w� � fv � V jconds�v� � conds�w�g�

Set cd�e� is the set of nodes that are control dependent on edge e� while conds�w�is the set of control dependences of node w� These sets are used in schedul�ing instructions across basic�block boundaries for speculative or predicated exe�cution Bernstein and Rodeh �� Fisher �� Newburn et al� �� and are usedin merging program versions Horowitz et al� �� They are also useful in auto�matic parallelization Allen et al� �� Ferrante et al� �� Simons et al� �� Setcdequiv�w� contains the nodes that have the same control dependences as node w�This information is useful in code scheduling because basic blocks with the samecontrol dependences can be treated as one large basic block� as is done in regionscheduling Gupta and So�a �� The relation cdequiv can also be used to de�compose the control �ow graph of a program into single�entry single�exit �SESE�regions� and this decomposition can be exploited to speed up data�ow analysis bycombining structural and �xpoint induction Johnson �� Johnson et al� �� andto perform data�ow analysis in parallel Gupta et al� �� Johnson et al� ��

Figure � shows a small program and its control dependence relation� For anyedge e� cd�e� is the set of marked nodes in the row corresponding to e� For anynode w� conds�w� is the set of marked edges in the column corresponding to w�Finally� we see that cdequiv�c� � cdequiv�f� � fc� fg and that cdequiv�a� �cdequiv�g� � fa� gg� all the other nodes are in cdequiv sets by themselves�

In this article� we design a data structure to represent the control dependence



START

a

b

c

d

e

f

END

a b c d e fEV

cba

START a

def

Fig� �� A program with three nested repeat�until loops�

relation� Such a data structure must be evaluated along three dimensions

�preprocessing time T the time required to build the data structure�

�space S the overall size of the data structure� and

�query time Q the time required to answer cd� conds� and cdequiv queries�

The size of the control dependence relation gives an upper bound on the spacerequirements of such a data structure� It is easy to show that the size of the relationcan be ��jV jjEj�� even if we restrict our attention to structured programs� Figure �shows a program with three nested repeat�until loops and its control dependencerelation� It can be veri�ed that for programs consisting of n nested repeat�untilloops� jEj � �n�� and jCj � n�n�� therefore� the size of the control dependencerelation can grow quadratically with program size even for structured programs�

It would be incorrect to conclude that quadratic space is a lower bound on thesize of any representation of the control dependence relation� Note that the sizeof the postdominator relation grows quadratically with program size �consider achain of n nodes�� but this relation can be represented using the postdominatortree� which can be built in O�jEj� space Harel �� Lengauer and Tarjan ��and which provides constant time access to the immediate postdominator of anode� as well as proportional time access to all the postdominators of a node� Theexplanation of the paradox is that �� postdominance is a transitive relation and�� the postdominator tree� which is the transitive reduction of this relation� is a�factored�� compact representation of postdominance� There is no point in buildinga representation of the full relation because the factored relation is more compact�and it answers postdominance queries optimally�

Is there a factored representation of the control dependence relation which canbe built in O�jEj� space and O�jEj� preprocessing time� and which will answer cd�conds� and cdequiv queries in time proportional to the size of the output�

The standard representation of the control dependence relation is the control de�

pendence graph �CDG� Ferrante et al� �� which is the bipartite graph �V�E�C��That is� the two sets of nodes in the bipartite graph are V and E� and there is anedge between v and e if v is control dependent on edge e� Since the size of the CDGis ��jCj� �which can be ��jEjjV j�� many attempts have been made to constructmore compact representations of the control dependence relation Ball �� Cytron



et al� �� Ferrante et al� �� Johnson and Pingali �� Sreedhar et al� ��The lack of success led Cytron et al� to conjecture that any data structure thatprovided proportional time access to control dependence sets must use space thatgrows quadratically with program size�

In this article� we describe a data structure called the Augmented Postdominator

Tree �APT � which requires O�jEj� space� is built in O�jEj� time�� and providesproportional time access to cd� conds� and cdequiv sets� This is clearly optimalto within a constant factor� In fact� our approach incorporates a design parameter�� under the control of the compiler writer� representing a trade�o� betweentime and space� A smaller value of � results in faster query time at the expenseof more memory for a larger data structure� corresponding to a �more explicit�representation of the control dependence relation� Interestingly� the full controldependence graph can be viewed as one extreme of this range of data structures�obtained when � � ��jEj�

The rest of the article is organized as follows� In Section �� we reformulate theconds problem as a naturally stated graph problem called the Roman Chariotsproblem� The APT data structure is described incrementally by considering therequirements of the three kinds of control dependence queries� In Sections �� and �� we examine cd� conds� and cdequiv queries respectively and develop themachinery to answer these queries optimally� Experimental results using the SPECbenchmarks are reported in Section �� Finally� in Section �� we contrast our ap�proach with dynamic techniques like memoization Michie �� we also show thatour approach can be viewed as an example of Chazelle�s �ltering search Chazelle��

�� THE ROMAN CHARIOTS PROBLEM

We show that the computation of control dependence sets �De�nition �� hasa natural graph�theoretic formulation which we call the Roman Chariots problem�This formulation exploits the fact that nodes that are control dependent on an edgee in the control �ow graph form a simple path in the postdominator tree Ferranteet al� �� First� we introduce some convenient notation�

De�nition �� Let T �� V� F � be a tree� For v� w � V � let v� w� denotethe set of vertices on the simple path joining v and w in T � and let v� w� denotev� w�� fwg� �In particular� v� v� is empty��

For example� in the postdominator tree of Figure ��b�� d� a� � fd� f� c� ag� whiled� a� � fd� f� cg� This notation is similar to the standard one for open and closedintervals of the line� The following key theorem� due to Ferrante et al� �� shows how edges of the control �ow graph are constrained with respect to thepostdominator tree and provides a simple characterization of cd sets�

Theorem �� If �u� v� is an edge of the control �ow graph� then

�� parent�u� is an ancestor of v in the postdominator tree and

�� cd�u� v� � v� parent�u��

�We assume that the postdominator relation is computed using Harel�s algorithm if the Lengauerand Tarjan algorithm is used� preprocessing time becomes O�jEj��jEj��



aEV

b c gd e f

f bc dc e

aSTART

a b

�a� Control dependence relation

a b

dec

ba

f

gSTART

END

f bc dc e

aSTART

setsE cd

)[ ,ab,g

END

d,f[[[[

))))

e,fb,c

�b� Postdominator tree and cd sets

Fig� �� Compact representation of control dependence�

Proof� Note that since no control �ow edge emanates from END� the expressionparent�u� is de�ned whenever �u� v� � E�

�� If parent�u� does not postdominate v� we can �nd a path v � �� END whichdoes not contain parent�u�� Pre�xing this path with the edge u� v� we obtaina path from u to END which does not contain parent�u�� contradicting the factthat parent�u� postdominates u�

�� We show that cd�u� v� � v� parent�u�� Let w be an element of cd�u� v��From the de�nition of control dependence� w must postdominate v� so w is onthe path v� END� in the postdominator tree� From part �� parent�u� is also onthe path v� END�� However� w cannot be on the path parent�u�� END�� since inthat case it would be distinct from u and postdominate u� Therefore� w mustbe on the path v� parent�u�� Conversely� assume that w is contained in thepath v� parent�u�� From part �� it follows that w postdominates v� it alsofollows that w does not postdominate parent�u�� Therefore� if w �� u� thenw cannot postdominate u either� Therefore w is control dependent on edgeu� v�

This gives a concise characterization of cd sets�

Figure � shows the nonempty cd sets for the program of Figure ��a�� If v� w�is a cd set� we will refer to v and w as the bottom and top nodes of this setrespectively� where the orientations of bottom and top are with respect to thetree�� The postdominator tree and the array of cd sets� together� can be viewed as

�As an aside� we remark that the bottom�closed� top�open representation for the sets has beenchosen here� since it is the most immediate to obtain in our application� In general� a closed setb� t��



a compact representation of the control dependence relation� since we can recoverthe full control dependence relation by expanding each entry of the form v� w� tothe corresponding set of nodes by walking up the postdominator tree from v to w�The advantage of using the postdominator tree and cd sets� instead of the CDG� isthat they can be represented in O�jEj� space� and as we will see� they can be builtin O�jEj� time� What is not obvious is how they can be used to answer controldependence queries in proportional time � that is the subject of the rest of thearticle�

For the purpose of exposition� it is convenient to assume that the array of cdsets� which is indexed by CFG edges in Figure �� is indexed instead by the integers��m� where m is the number of CFG edges for which the corresponding cd setsare nonempty� We will assume that the conversion from an integer �between � andm� to the corresponding CFG edge and vice versa can be done in constant time�We can now reduce the control dependence problem to a naturally stated graphproblem�

Roman Chariots Problem� The major arteries of the Roman road system forma tree rooted at Rome�� Nodes represent cities� and edges represent roads� Publictransportation is provided by chariots that travel between a city and one of itsancestors in the tree�

Given a rooted tree T �� V� F�ROME � and an array A��m� of chariot routeseach speci�ed in the form v� p�� where p is an ancestor of v in T � design a datastructure that permits enumeration of the following sets

�� cd�� the cities on route ��

�� conds�w� the routes that serve city w�

�� cdequiv�w� the cities that are served by all and only the routes that servecity w�

For future reference� we introduce the following de�nition

De�nition �� The set of chariots serving a node v is a subset of A� the set ofall chariots� and will be referred to as set Av �

The control dependence problem is reduced to the Roman Chariots problem asfollows� Procedure ConstructRomanChariots in Figure � takes a control �owgraph as input and returns the corresponding Roman Chariots problem� Assumingthe postdominator tree can be built in time O�jEj�� Procedure ConstructRo�

manChariots takes time O�jEj� and space O�jEj�� Control dependence queriesare handled as follows

in which t is an ancestor of b� is readily converted into the equivalent half�open one b� parent�t��in constant time� The conversion of set b� t� into a closed one is less straightforward and takestime proportional to the number of children of t� assuming that ancestorship can be decided inconstant time� However� if the conversion has to be performed for a batch of half�open sets A� itcan be accomplished in time O�jV j� jAj� by a depth��rst traversal of the tree� This conversion isnot needed in this article��A thorough literature search failed to turn up any historical evidence to support this statement�but it is a matter of record that all roads led to Rome �Cicero� �� B�C�� Pro L� Cornelio BalboOratio �� just as in a tree rooted at Rome�



Procedure ConstructRomanChariots�G�CFG��Tree� RouteArray�f�� G is the control �ow graph�� T �� construct�postdominator�tree�G� �� A �� Initialize to empty array�� i �� for each node p in T in top�down order do� for each child u of p do�� for each edge �u � v� in G do�� if v is not p � then �append a cd set to end of A

�� i �� i � � �� Ai� �� v� p� �� Note the correspondence between�� edge u� v and index i �� endif�� od�� od�� return T � A g

Fig� � Constructing a Roman Chariots problem�

�cd�u� v� If v is parent�u�� return the empty set� Otherwise� let i be the indexinto array A for edge u� v� Execute the Roman Chariots query cd�i��

�conds�w� Execute the Roman Chariots query conds�w�� and translate eachinteger �between � and m� returned by this query to the corresponding CFGedge�

�cdequiv�w� Execute the corresponding Roman Chariots query cdequiv�w��

The correctness of this reduction follows immediately from Theorem �� andProcedure ConstructRomanChariots� In the construction of Figure �� the cd

sets in A are sorted by decreasing top nodes� that is� if t� is a proper ancestor oft� in the postdominator tree� then any cd set whose top node is t� is inserted inthe array before any cd set whose top node is t�� We will exploit this order whenwe consider conds queries in Section �� Note that for a general Roman Chariotsproblem �not arising from a control dependence problem�� this sorting can be doneby a variation of Procedure ConstructRomanChariots� in time O�jAj � jV j��This is within the budget for preprocessing time given below� Therefore� we willassume without loss of generality that A has been sorted in this way�

In subsequent sections� we develop a data structure for the Roman Chariot prob�lem� obtained by a suitable augmentation of the given tree T � Motivated by theapplication to control dependence� we call this data structure an Augmented Post�

dominator Tree �APT �� The rest of the article establishes the following result�

Theorem �� There is a data structure APT for the Roman Chariots problem

�T �� V� F�ROME ��A� that can be constructed in time � � O�jAj��jV j�and stored in space S � O�jAj� �� jV j�� where � � � is a design parameter�

By traversing APT � the queries can be answered with the following performance�

�cd�e�� time O�jcd�e�j�� independent of ��

�conds�w�� time O�� jconds�w�j��



�cdequiv�w�� time O�jcdequiv�w�j�� independent of ��

For the special case when T is the postdominator tree of a control �ow graph�we have jAj � jEj and jV j � jEj� �� leading to the following result�

Corollary �� Given a CFG G � �V�E�� structure APT can be built in

O�jEj� preprocessing time and space and provides proportional time access to cd�

conds� and cdequiv sets�

�� APT � cd QUERIES

No preprocessing is required to answer cd queries optimally� If the query is cd�i��where i is between � and m �the size of A�� let v� w� be the ith route in A� Walkup the tree T from node v to node w� and output all nodes encountered in thiswalk� other than node w� This takes time proportional to the size of the output�This algorithm is similar to that of Ferrante et al� ��

�� APT � conds QUERIES

One way to answer conds queries is to examine all routes in array A and reportevery route whose bottom node is a descendant of the query node and whose topnode is a proper ancestor of the query node� This algorithm is too slow�

A better approach is to limit the search to routes whose bottom nodes are de�scendants of the query node� since these are the only routes that can contain thequery node� To facilitate this we will assume that at every node v we have recordedall routes whose bottom node is v� then the query procedure must visit the subtreeof the postdominator tree rooted at the query node and examine routes recordedat these nodes� This is shown in Figure ��a�� The space taken by the data struc�ture is S � O�jV j � jAj�� which is optimal� However� in the worst case� the queryprocedure must examine all nodes and all routes �consider the query conds�END��so query time is Q � O�jAj � jV j�� which is too slow� To speed up query time� weextend this idea as follows� Rather than store a route only at its bottom node� wecan store the route at every node contained in the route� as in Figure ��b�� We callthis approach full caching� to contrast it to the previous scheme which we call nocaching� Given a query at node q� the query procedure simply outputs all routesstored at that node� if jAq j is the size of this output� this takes time Q � O�jAq j��which is optimal� Unfortunately� this strategy produces the control dependencegraph in disguise and therefore blows up space requirements� For example� for theRoman Chariots problem arising from a nested repeat�until loop� the reader canverify that ��jAj� routes each contain ��jV j� nodes and hence are represented inas many lists� requiring space S � ��jV jjAj� overall� which is far from optimal�

It is possible to compromise between these two extremes� Suppose we partitionthe nodes in V into two disjoint sets called boundary nodes and interior nodes�Although this partition can be made arbitrarily� it is simpler to make all leaf nodesboundary nodes� for now� nonleaf nodes can be classi�ed arbitrarily as boundaryor interior nodes� With each node v � V � we associate a list of routes Lv� de�nedformally as follows�

De�nition �� If v is an interior node� Lv� is the list of all routes whose bottomnode is v� if v is a boundary node� Lv� is the list of all routes containing v�


� K�Pingali and G�Bilardi

)[ ,ab,g

END

d,f[[[[

))))

e,fb,c

ChariotRoutes

4

5

1

2

3dec

ba

f

g

END

START

{3}{4}

{2,5}{1}

dec

ba

f

g

END

START

dec

ba

f

g

END

START

{4} {3}

{1,2}

{}

{4} {3}{1,2}

{1,2}

{1}

{1} {2,5}

{}

{}

{}

{1} {2,5}

(a) No caching

(b) Full caching (c) Some caching: α = 1

Fig� �� Zone structure�

In Figure �� boundary nodes are shown as solid dots� while interior nodes areshown as hollow dots� Figure ��a� shows one extreme in which all nonleaf nodesare interior nodes� while Figure ��b� shows the other extreme in which all nodes areboundary nodes� In Figure ��c�� nodes c and g are interior nodes� while all othernodes are boundary nodes�

Our conds query procedure visits nodes in the subtree below the query node asbefore� but it exploits boundary nodes to limit the portion of this subtree that itvisits� Suppose that the query node is q and that the query procedure encountersa boundary node x� It is easy to show that the query procedure does not needto visit nodes that are proper descendants of x � any route � which contains qand whose bottom node is a proper descendant of x must also contain x� fromDe�nition �� must be stored at x� Therefore� to answer the query conds�q��it is unnecessary to examine the subtree below x� since all the relevant chariotroutes from this subtree are stored at x itself� For example� in Figure ��c�� whenanswering the query conds�g�� it is unnecessary to look below boundary node f �and the query can be answered just by visiting nodes f and g� One way to visualizethis is to imagine that the edges connecting a boundary node to its children aredeleted from the tree �these edges are never traversed by the query procedure��This leaves a forest of small trees� and the query procedure needs to visit only thedescendants of a query node in this forest� We will call each tree in this forest azone� the portion of the forest below a node q will be called the subzone associatedwith node q� These concepts are de�ned formally as follows�


Optimal Control Dependence Computation �

Procedure CondsQuery�QueryNode��f�� APT data structure is global variable �� Query outputs list of routes numbers�� CondsVisit�QueryNode� QueryNode� gProcedure CondsVisit�QueryNode� V isitNode��f�� for each route i in LV isitNode�� in list order do�� let A�i� be b� t�

�� if t is a proper ancestor of QueryNode� then output i � else break � exit from the loop�� od �� if V isitNode is not a boundary node � then�� for each child C of V isitNode�� do�� CondsVisit�QueryNode�C�� od �� endif ��g

Fig� �� Query procedure for conds�

De�nition �� A node w is said to be in the subzone associated with a nodeq� referred to as Zq� if �� w is a descendant of q and �� the path q�w� does notcontain any boundary nodes�

A zone is a maximal subzone� that is� a subzone that is not strictly contained inany other subzone�

In Figure ��c�� there are six zones induced by the following sets of nodes fa� b� cg�fdg� feg� ff� gg� fSTARTg� and fENDg� The subzone associated with node g is the setof nodes ff� gg� Note that even though Chariot Route � contains nodes fa� c� f� gg�it is stored only at nodes a and f � since these are the only boundary nodes itcontains�

Given a query node q� the query procedure examines routes stored at nodes insubzone Zq� To avoid examining routes unnecessarily� we will assume that each listLv� is sorted by top endpoint� from higher �closer to the root� to lower� Exami�nation of routes in a list Lv� can terminate as soon as a route b� t� not containingq is encountered� further routes on the list terminate at a descendant of t and donot contain the query node q� A simple implementation of this query procedureis given in Figure �� Boundary nodes are distinguished from interior nodes by aboolean named Bndry� which is set to true for boundary nodes and to false forinterior nodes� an algorithm for determining which nodes are boundary nodes willbe described in Section �� In line � of Procedure CondsVisit� testing whether tis a proper ancestor of QueryNode can be done in constant time as follows sincet and QueryNode are ordered by the ancestor relation� we can give each nodea dfs �depth��rst search� number and then establish ancestorship by comparingdfs numbers� Since dfs numbers are already assigned by postdominator tree con�



struction algorithms Harel �� Lengauer and Tarjan �� this is convenient�Alternatively� we can use level numbers in the tree�

It follows immediately that the query time is proportional to the sum of thenumber of visited nodes and the number of reported routes

Qq � O�jAq j� jZqj��

Next� we discuss how zones can be constructed to obtain optimal query timewithout blowing up space requirements�

�� Criterion for Zones

To obtain optimal query time� we require that the following inequality hold for allnodes q� �� a positive real number� is a design parameter

jZq j � �jAq j� � � ��

Intuitively� the number of nodes visited when q is queried is at most one more thansome constant proportion of the answer size� The additive term of � prevents zoneZq from becoming empty when q is not contained in any route �jAq j � �� Bycombining Eqs� �� and �� we see that

Qq � O�� jAq j��

Thus� the amount of work done for a query is basically proportional to the outputsize� for � a constant� this is asymptotically optimal�

To get some intuition for the signi�cance of �� consider what happens if we �xthe problem and vary �� If � is set to a small number close to � �strictly speaking�a number less than ��jAj where jAj is the number of chariot routes�� the size of thesubzone associated with each node is �� This means that each node is in a zoneby itself� which corresponds to full caching� At the other extreme� if we choose avery large value of �� nodes can be contained in arbitrarily large zones� and thesituation corresponds to no caching� Thus� by varying �� we get the full range ofbehavior from full caching to no caching�

Can we build zones so that Inequality �� is satis�ed� without blowing up storagerequirements� One bit is required at each node to distinguish boundary nodes frominterior nodes� which takes O�jV j� space� The main storage overhead arises fromthe need to list all overlapping routes at a boundary node� even if these routesoriginate at some other node� This means that a route must be entered into theLv� list of its bottom node and of every boundary node between its bottom nodeand top node�

Our zone construction algorithm is a simple bottom�up� greedy algorithm thattries to make zones as large as possible without violating Inequality �� Moreprecisely� a leaf node is always a boundary node� For a nonleaf node v� we see ifv and all its children can be placed in the same zone without violating Inequal�ity �� if not� v is made a boundary node� and otherwise v is made an interiornode�� Formalizing this intuitive description� we obtain a de�nition for subzone

A variation of this scheme is to allow a node to be in the same zone as some but not all of itschildren� We do not need this complication� but it may be possible to exploit this idea to reducestorage requirements further�



construction�

De�nition �� If node v is a leaf node or �� P

u�children�v jZuj� � ��jAv j�

�� then v is a boundary node� and Zv is fvg� Else� v is an interior node� and Zv

is fvg �u�children�v Zu�

Note that the term �� P

u�children�v jZuj� is simply the number of nodes thatwould be visited by a query at node v if v were made an interior node� If thisquantity is larger than ��jAv j � �� Inequality �� fails� so we make v a boundarynode� Zones are simply maximal subzones�

The de�nition of zones lets us bound storage requirements as follows� Denote byX the set of boundary nodes that are not leaves� If v � �V �X�� then only routeswhose bottom node is v are listed in Lv�� Each route in A appears in the list ofits bottom node and� possibly� in the list of some other node in X � For a boundarynode v� jLv�j � jAv j� Hence� we have

X

v�V

jLv�j �X

v��V�X

jLv�j�X

v�X

jLv�j � jAj�X

v�X

jAv j� ��

From De�nition �� if v � X � then

jAv j �X

u�children�v

jZuj��

When we sum over v � X both sides of Inequality �� we see that the right�hand side evaluates at most to jV j�� since all Zu subzones involved in the resultingdouble summation are disjoint� Hence�

Pv�X jAv j � jV j�� which� used in Relation

�� yieldsX

v�V

jLv�j � jAj� jV j��

In conclusion� to store APT � we need O�jV j� space for the postdominator tree�O�jV j� further space for the Bndry� bit and for list headers� and �nally� fromInequality �� O�jAj � jV j�� for the list elements� All together we have S �O�jAj � �� jV j�� as stated in Theorem ��

We observe that design parameter � embodies a tradeo� between query time�increasing with �� and preprocessing space �decreasing with �� In fact� for � ��jAj� we obtain single�node zones �essentially� the control dependence graph� sinceevery node has its overlapping routes explicitly listed�� and for � � jV j� we obtaina single zone �ignoring START and END and assuming jAvj � � for all other nodes�which is the case for the control dependence problem�� Small constant values suchas � � � yield a reasonable compromise� Figure ��c� shows the zone structure ofthe running example for � � ��

�� Preprocessing for conds Computations

We now describe an algorithm to construct the search structure APT in lineartime� The preprocessing algorithm takes three inputs

�Tree T for which we assume that the relative order of two nodes one of whichis an ancestor of the other can be determined in constant time� For the controldependence problem� this is the postdominator tree�



Procedure CondsPreprocessing�T�tree�A�RouteArray��real��f�� bv��tv�� number of routes with bottom�top node v�� for each node v in T do�� bv� �� tv� �� od� for each route x� y� in A do� Increment bx� �� Increment ty� �� od � �Determine boundary nodes�

�� for each node v in T in bottom�up order do�� Compute output size when v is queried�� av� �� bv� � tv� � �u�childrenv�au� �� zv� �� u�childrenv�zu� �Tentative zone size�� if �v is a leaf� or �zv� � � � av� � �� then � Begin a new zone�� Bndry�v� �� true �� zv� �� else �Put v into same zone as its children� � Bndry�v� �� false �� endif�� od�� Chain each node to the �rst boundary node that is an ancestor�� for each node v in T in top�down order do�� if v is root of postdominator tree�� then NxtBndryv� �� else if Bndry�parent�v�� then NxtBndryv� �� parent�v� �� else NxtBndryv� �� NxtBndryparent�v�� endif�� endif�� od�� Add each route in A to relevant Lv�� for i �� to jAj do�� let Ai� be b� t� �� w �� b �� while t is proper ancestor of w do�� append i to end of list Lw� �� w �� NxtBndryw� � � od�� odg

Fig� �� Constructing the APT structure�

�The array of routes� A� in which routes are sorted by top endpoint� For controldependence problems� this array is built by Procedure ConstructRomanChar�

iots shown in Figure ��

�Real parameter � � �� which controls the space�query�time tradeo�� as describedin the previous section�

The preprocessing algorithm consists of a sequence of a few simple stages�

�� For each node v� compute the number of routes whose bottom node is v and the

number of routes whose top node is v� Let bv� be the number of routes in A



whose bottom endpoint is v� and let tv� be the number of routes whose topendpoint is v� To compute bv� and tv�� two counters are set up and initializedto zero� Then for each route in A� the appropriate counters of its endpointsare incremented� This stage takes time O�jV j� jAj� for the initialization of the�jV j counters and for constant work done for each of the jAj routes�

�� Compute� for each node v� the size jAv j of the answer set Av� It is easy to seethat jAv j � bv�� tv� �

Pu�children�v jAuj� This relation allows us to compute

the jAv j values in bottom�up order� using the values of bv� and tv� computedin the previous step� in time O�jV j��

�� Determine boundary nodes� The objective of this step is to set� at each node�the value of a boolean variable Bndry�v� that identi�es boundary nodes� Def�inition �� can be expressed in terms of subzone size zv� � jZvj as fol�lows� If v is a leaf or ��

Pu�children�v zu�� jAv j � �� then v is a

boundary node� and zv� is set to �� Otherwise� v is an interior node� andzv� � ��

Pu�children�v zu�� Again� zv� and Bndry�v� are easily computed

in bottom�up order� taking time O�jV j��

�� Determine� for each node v� the next boundary node NxtBndryv� in the path

from v to the root� If the parent of v is a boundary node� then it is the nextboundary for v� Otherwise v has the same next boundary as its parent� Thus�NxtBndryv� is easily computed in top�down order� taking O�jV j� time� Aspecial provision is made for the root of T � whose next boundary is set byconvention to �� considered as a proper ancestor of any node in the tree�

�� Construct list Lv� for each node v� By De�nition �� a given route b� t�appears in list Lv� for v � W � where W contains b as well as all boundarynodes contained by b� t�� Speci�cally let W � fw � b� w�� wkg� wherewi � NxtBndrywi�� for i � �� k and where wk is the proper descendantof t such that t is a descendant of NxtBndrywk �� Lv� lists are formed byscanning the routes in A where routes have been entered in decreasing order oftop endpoint� Each route � is appended at the end of �the constructed portionof� Lv� for each node v in the set W corresponding to �� This procedureensures that in each list Lv� routes appear in decreasing order of top endpoint�This stage takes time proportional to the number of append operations� whichisP

v�V jLv�j � O�jAj � jV j��

In conclusion� we have shown that the preprocessing time is T � O�jAj � �� jV j�� as claimed�

Figure � shows the pseudocode for building the search structure� All the prepro�cessing� including construction of the route array A� can be done in two top�downand one bottom�up walks of the postdominator tree� followed by one traversal ofthe route array� Figure illustrates the APT data structure �with � � �� for thecontrol �ow graph of Figure ��

�� APT � cdequiv QUERIES

The routes in a Roman Chariots problem induce a natural equivalence relationon cities two cities are placed in the same equivalence class if and only if theyare served by the same set of routes� In this section� we describe a preprocessing



dec

ba

f

g

END

START{1,2}

{}

{4} {3}

{1} {2,5}

{} )[ ,ab,g

END

d,f[[[[

))))

e,fb,c

ChariotRoutes

4

5

1

2

3

�a� Caching� � � �

Node v b�v� t�v� a�v� z�v� NxtBndry�v� Bndry��v� L�v�

a � � � � f � f�g

b � � � � f � f�� g

c � � � � f � fg

d � � � � f � f�g

e � � � � f � f�g

f � � � � END � f�� g

g � � � � END � fg

START � � � � END � fg

END � � � � �� fg

�b� Values computed during preprocessing

Fig� �� The APT structure and its parameters�

algorithm that produces a list representation of the equivalence classes where eachlist cell points to the next cell as well as to the header of its list� A query cdequiv�w�is answered by simply traversing the list containing w� starting from the header�A query of the form �Are cities v and w in the same equivalence class�� can beanswered in constant time by checking whether w and v have the same header�

A straightforward computation and pairwise comparison of the jV j conds settakes time O�jV j�jAj� Ferrante et al� �� Exploiting structure leads to fasteralgorithms for some problems� but we are not aware of any structure in cdequiv

sets� In particular� nodes in a cdequiv equivalence class are not necessarily adjacenteither in the control �ow graph or in the postdominator tree �consider nodes a andg in Figure �� This suggests that we cannot a�ord to compute and compare condssets explicitly if we want a fast algorithm� In Section �� we obtain a O�jV j� jAj�time algorithm by showing that a conds set is uniquely identi�ed by two functionsjAv j� its size� and Lo�Av�� a descendant of v as de�ned below� both these functionsare computable in linear time� as shown in Section �� Thus� the two functions actas �ngerprints of their sets� and a cdequiv set simply collects nodes with the same�ngerprints��

The �ngerprints we use are easily understood if we restrict attention to the specialcase when the tree is a chain� A single bottom�up walk is su�cient to compute the

�An analogy from physics is the identi�cation of elements from their spectra rather than directlyfrom their atomic structure�



size of the conds set of each node in the chain� Treating this integer as a �ngerprint�we can form equivalence classes by placing all nodes with conds sets of the same sizeinto one class� This gives a coarser equivalence relation than the cdequiv relationbecause the conds sets of nodes p and q are not necessarily equal just becausethese sets are of the same size� Therefore� we need another �ngerprint� Let p bean ancestor of q� and suppose that Ap �� Aq � If jApj � jAq j� there must be someroute � � b� t� � Ap which is not contained in Aq � Therefore� b must be a properancestor of q� This suggests the second �ngerprint� For each node v� compute thenode Lo�Av� which is de�ned to be the lowest node contained in all the routes inAv � and place nodes in the same equivalence class only if this �ngerprint is the samefor both nodes� In our example� Lo�Ap� will be an ancestor of b� while Lo�Aq� willbe a proper descendant of b� so p and q will have di�erent �ngerprints�

It is easily seen that the two �ngerprints set size and Lo together determine thecdequiv equivalence relation completely� The rest of the section shows how tocompute these �ngerprints quickly in general trees�

�� Fingerprints of cdequiv Sets

For notational convenience� we augment the tree with a distinguished node� denotedby � which is considered to be a descendant of all other nodes� and by a node�denoted by �� which is considered to be an ancestor of all other nodes�

De�nition �� If R is a set of chariot routes� Lo�R� is de�ned to be the leastcommon ancestor �lca� of the bottom nodes of routes in R� By convention� Lo�R�is if R is empty�

This de�nition is more general than we need because the sets of routes we dealwith always have at least one node in common� For this special case� Lo can bede�ned more intuitively as follows�

Lemma �� Let R be a nonempty set of chariot routes� and let N be the set

of nodes that belong to every route in R� If N is nonempty� the nodes in N are

totally ordered by the ancestor relation� and Lo�R� is the lowest node in N �

Proof� Since the nodes in N are contained in every route r � R� and the nodesin r are totally ordered by the ancestor relation� it follows that the nodes in N areordered by this relation� Let l be the lowest node in N �

Since l is contained in each route r � R� l is an ancestor of the bottom node ofr� By de�nition of Lo� this means that Lo�R� is a descendant of l�

For every route r � b� t� � R� b is a descendant of Lo�R� �de�nition of Lo�� andt is a proper ancestor of l � de�nition of l� and therefore of Lo�R� �since Lo�R� isa descendant of l�� Therefore� Lo�R� � N � which means that l is a descendant ofLo�R� �de�nition of l��

Therefore l and Lo�R� are identical�

For example� in Figure ��b�� Lo�Af � is c� and Lo�Ag� is a� Note that� for a givennode v� Lo�Av� is always a descendant of v� since v is contained in every route inAv � We show next that jAv j and Lo�Av� uniquely identify Av�

Theorem �� Let p and q be two nodes in the tree� Sets Ap and Aq are equal

if and only if both of the following are true� jApj � jAq j and Lo�Ap� � Lo�Aq��



Proof�

�� If Ap and Aq are equal� clearly so are their �ngerprints�

�� If Ap and Aq are di�erent� then so are p and q� There are two cases toconsider�� Nodes p and q are not related by the ancestor relation� Since Lo�Ap� and

Lo�Aq� are descendants of p and q respectively� it follows that Lo�Ap� ��Lo�Aq��

�� Node q is a descendant of node p� Suppose that jApj � jAq j� Then� theremust be some route � � b� t� that contains p but not q� Lo�Ap� must bean ancestor of b� and q must be a proper descendant of b� Since Lo�Aq�is a descendant of b� it follows that Lo�Ap� �� Lo�Aq�� Therefore� eitherjApj �� jAq j or Lo�Ap� �� Lo�Aq��

In conclusion� whenever Ap and Aq are di�erent� their �ngerprints are di�erentin at least one component�

�� Computing Fingerprints Eciently

Figure � shows how jAv j can be computed for each node v in a single bottom�upwalk of the tree in O�jAj � jV j� time� In this subsection we give an algorithm tocompute Lo�Av� for each node v�

Consider �rst the simpler problem of computing Lo�Aq� just for a given nodeq� It is natural to look for a recursive de�nition of Lo�Aq� in terms of local valuescomputed at q and some values propagated up from its children� If v is a descendantof q� let Aq�v �read as �Aq restricted to v�� be the subset of routes in Aq whosebottom nodes are descendants of v� For example� in Figure �� Ae�g is the set ofchariot routes f�g� Our algorithm will perform a bottom�up walk of the descendantsof q� propagating the value Lo�Aq�v� up from each node v� notice that the valuecomputed when retreating out of q is Lo�Aq�q�� which is nothing but Lo�Aq�� Beforedescribing this algorithm� we introduce some ancillary values de�ned at each node�which can be computed during the bottom�up pass� In the formulae below� mindenotes the least common ancestor of a set of nodes totally ordered by the ancestorrelation� taken to be for the empty set�

De�nition �� The following quantities are de�ned for each node v�

�hv � minft j v� t� � Ag�

�Hv � minft j b� t� � Avg�

��v�� v�� the children of v ordered so that Hvi is an ancestor of Hvi�� and

�tv � minfhv� Hv� � vg�

Informally� hv is the highest top node of any chariot route in the set of chariotroutes whose bottom node is v� for example in Figure � he is c� Hv is the highesttop node of any chariot route in the set of chariot routes containing v� in Figure �He is a� For node e� v� is f � and Hv� is b�

To understand the signi�cance of tv � note that a node on the tree path v� tv�cannot be in the same cdequiv class as nodes that are strictly below v� for tworeasons because it is either contained in a chariot route whose bottom node is v�this is the case if tv � hv�� or it is contained in two routes that come together at v



a

ig

)

[

)

[f h

1

3

b

[

d

)c

e

2

Fig� �� Computing Lo e ciently�

�this is the case if tv � Hv�� For example� the value of te is b because all nodes onthe tree path e� b� are contained in both Route � and Route � which come togetherat e� so these nodes cannot be the same cdequiv class as a node that is a strictdescendant of e�

It is straightforward to compute the ancillary values in a bottom�up walk� Weobserve �rst that hv is easily computed for all nodes in linear time by �rst initializingeach hv to and then scanning every chariot route b� t�� updating the value of hbwith minfhb� tg�

Lemma �� If v is a leaf� we have Hv � hv tv � hv� If v is not a leaf node�

we have the following�

�H� � Hv� and H� � Hv� are the min and second min in the sequence of �possiblyrepeated � values �Hv� � Hv� � �� where v�� v�� are the children of v

�Hv � min�H�� hv� and

�tv � min�H�� hv� v��

Given these ancillary values� the following recursive formula computes the valueof Lo�Aq�v� for all nodes v that are descendants of a query node q� By convention�for a leaf node q� Lo�Aq�v�� is always �

Lemma �� For a �xed q� the following formula computes Lo�Aq�v� for eachdescendant v of q�

if q � v� tv�then Lo�Aq�v� velse Lo�Aq�v� Lo�Aq�v��

Proof� Suppose q � v� tv�� There are two cases�

�� tv � hv� Then� by de�nition of hv� v� tv� is a chariot route� Since q � v� tv��v� tv� � Aq�v� so Lo�Aq�v� � v�



�� tv � Hv� � Then� by de�nition of Hv � there are chariot routes u�� Hv�� andu�� Hv�� where v� and v� are children of v and where u� and u� are descendantsof v� and v�� respectively� These two routes both contain q and �rst meet at v�Therefore Lo�Aq�v� � v�

Suppose q is not contained in v� tv�� This means that every route in Aq�v� if any�originates in the subtree rooted at v�� which means that Aq�v � Aq�v�� Moreover�since all routes in Aq�v� contain q� they must contain v� Therefore� Aq�v � Aq�v��which implies that Lo�Aq�v� � Lo�Aq�v��

The value of Lo�Aq�q� is the desired value Lo�Aq�� Therefore� a walk over thearray of chariot routes �to compute hv� and then a single bottom�up walk of thedescendants of q su�ce to compute the value of Lo�Aq��

We now extend this scheme to compute Lo�Aq� for all nodes q� For a given q�we propagated the single value Lo�Aq�v� up from each node v that is a descendantof q� To extend this scheme� we propagate a sequence of values out of each nodev� where the sequence encodes the values of Lo�Aq�v� for every node q that is an

ancestor of v� For example� in Figure �� the ancestors of e in bottom�up order are �e� d� c� b� a �� and the corresponding sequence of Lo�Aq�e� values is � e� e� e� i� ��Since it is too expensive to have duplicate values in the sequence� we propagateinstead a sequence of pairs of the form x� y�� where x is a Lo value� and y is theancestor of v where this value is no longer relevant� In our example� out of nodee we propagate the sequence of pairs Se �� e� b�� i� a�� Read fromleft to right� this states that the Lo�Aq�e� value is e for any ancestor q of e up to�but not including� node b� is i from there to node a� and is after that� Withthis interpretation� it is clear that for any node q the value of Lo�Aq� is the �rstelement of the �rst pair in the sequence Sq �

If v is a node� the sequence Sv can be expressed in terms of the sequence Sv�where v� is the child of v described in De�nition �� By convention� Sv� for aleaf node v is � ��

Lemma �� For any node v� the sequence Sv can be computed from Sv� as

follows�

�� From Sv� � delete every entry of the form x� y� where y is a descendant of tv�

�� If v� tv� is not empty� make v� tv� the �rst element of the remaining sequence�

The resulting sequence is Sv�

Proof� The proof of correctness follows immediately from Lemma �� sinceLo�Aq��v �� Lo�Aq��v� i� q � v� tv��

For any node q� the value of Lo�Aq� is the �rst element of the �rst pair in thesequence Sq � Therefore� a single bottom�up pass is adequate to compute Lo�Aq�for all nodes q� A key observation for e�ciency is that� by de�nition� the sequenceof second elements of pairs in any Sq are totally ordered by ancestorship� Forexample� in Se� the sequence of second elements is � b� a�� This means thatthe deletion of entries of the form x� y� in Step �� of Lemma �� can start fromthe �rst pair in the sequence and stop as soon as we come across a y that is not adescendant of tv� In other words� sequences can be manipulated like stacks� Thisis the key to obtaining an e�cient implementation�



�� A Fast Algorithm for cdequiv

Figure �� shows the pseudocode for a fast algorithm that exploits the pair of �n�gerprints fLo� Sizeg �discussed above� to identify nodes in the same cdequiv class�Each class is represented as a linked list of nodes� each node points to the next one inthe list as well as to the header of the list� The ability to link a node in constant timerests on the following observation� Let u�� u�� uk be a set of nodes� ordered fromdescendant to ancestor� all with the same Lo� i�e�� Lo�u�� Lo�u�� Lo�uk��Then� it can be easily seen that the corresponding Size values form a nonincreas�ing sequence� i�e�� jAu� j � jAu� j � � � � � jAuk j� The sequence u�� u�� uk is thennaturally broken into segments representing cdequiv classes� the breaking pointsbeing those where a strict inequality occurs�

We assume each node structure has the following six �elds� the �rst four beingauxiliary to the computation of the last two� which encode the cdequiv classes andare part of the APT

�Sv� stack of node pairs�

�Hv� top node closest to root of any route originating from a descendant of nodev�

�RecentSizev� jAwj where w is node for which Lo�Aw� � v most recently inbottom�up walk� This �eld is initialized to ��

�RecentNodev� node w for which Lo�Aw� � v most recently in bottom�up walk�This �eld is initialized to v�

�CdEqNextv� Successor of v in the list of nodes representing cdequiv�v��

�CdEqHeaderv� Header of the list of nodes representing cdequiv�v��

Since every node is control dependent on at least one edge� the Roman Chariotsproblem derived from a control dependence problem has the property that every cityis contained in at least one chariot route� For a general Roman Chariots problem�this is not necessarily the case� and it is necessary to put all nodes that are notcontained in any route into one equivalence class� It is natural to use node asthe header for this class� This processing can be omitted for control dependenceproblems�

To determine the complexity of this algorithm� we note that the work required tocompute tv at a node v is some constant amount plus two terms one proportionalto the number of children of v and the other proportional to the number of routeswhose bottom node is v� Summing over all nodes� we get a term that is O�jV j�jAj��Next� we estimate the work required for pushing and popping pairs� At each nodev� we pop a number of pairs� test one pair that is not popped� and then optionallypush one pair� Since each pair is pushed once and popped once� the total costof pushing and popping is proportional to the number of pairs� which is O�jV j��Finally� the cost of testing a pair that is not popped is charged to the cost ofvisiting the node� Therefore the complexity of the overall algorithm is O�jV j� jAj��for the special case of the control dependence problem this expression is O�jEj��The bottom�up traversal for computing the cdequiv relation can be folded into theconds preprocessing of Section �� but we have shown it separately for simplicity�


�� K�Pingali and G�Bilardi

Procedure CdEquivPreprocessingT f�� T is the postdominator tree �� Processing of node � not needed for control dependence

problems �� RecentSize�� RecentNode�� CdEqNext�� null�� CdEqHeader�� for each node v in bottom�up order do�� compute t�v� �� min returns in�nity �i�e� N � �� if the set is empty�� h � min ft j �v� t is a chariot route g �� h below � min f H�c� j c is a child of vg�� H�v� � min fh� h belowg �� v� � any child c of v having H�c� � h below �� Hv� � min f H�c� j c is a child of v other than v�g �� tv � min f h�Hv�

� v g�� S�v� � S�v�� From S�v�� pop all pairs �x� y where y is descendant of tv�� if �v� tv is not empty then push �v� tv onto S�v� endif �� Lo � �rst element of �rst pair in S�v�� Determine class for node v �� RecentSize�v� � �� RecentNode�v� � v�� CdEqNext�v� � null�� a�v� is jAvj �� if RecentSize�Lo� � a�v� then�� add to current cdequivclass �� CdEqNext�RecentNode�Lo�� v�� CdEqHeader�v� � CdEqHeader�RecentNode�Lo�� else �� start a new cdequivclass �� RecentSize�Lo� � a�v�� CdEqHeader�v� � v

�� endif

�� RecentNode�Lo� � v�� od

g

Fig� �� Algorithm for identifying cdequiv classes�

�� Related Work

There is a large body of previous work on algorithms for computing the cdequiv

relation� Ferrante et al� �� gave the �rst algorithm for this problem theycomputed the conds set of every node explicitly and used hashing to determine setequality� The complexity of this algorithm is O�jV j�jAj�� This algorithm was laterimproved by Cytron et al� �� who described a quadratic�time algorithm for de�termining the cdequiv relation� A linear�time algorithm for the cdequiv problemfor reducible control �ow graphs was given by Ball �� who needed both domina�tor and postdominator information in his solution� subsequently� Podgurski ��


Optimal Control Dependence Computation � ��

gave a linear�time algorithm for forward control dependence equivalence� which isa special case of general control dependence equivalence� Newburg et al� �� useencodings of paths from START to each node to determine the cdequiv relation�They do not describe the complexity of their algorithm in terms of the size of theCFG� but it is likely to be O�jV j�jAj�� if not worse�

The �rst optimal solution to the general cdequiv problem was given by John�son et al� �� who designed an algorithm that required O�jEj� preprocessingtime and space and that enumerated cdequiv sets in proportional time� This al�gorithm required neither dominator nor postdominator information� since it useda depth��rst tree obtained from the undirected version of the control �ow graph�in which the analogs of chariot routes were back edges in the depth��rst tree� Thealgorithm was based on a nontrivial characterization of cdequiv classes in termsof cycle equivalence� a relation that holds between two nodes when they belong tothe same set of cycles� This characterization� which is remarkable in that it doesnot make any explicit reference to the postdominance relation� allows the cdequiv

relation to be computed in less time than it takes to compute the postdominatortree However� since postdominator information is available in APT � the reduc�tion to cycle equivalence is not needed in here� The �ngerprints of sets of chariotroutes used in this article are essentially identical to those in Johnson et al� ��Other researchers are studying properties of cycle equivalence� for example� Rauch�� has developed a dynamic algorithm for computing cycle equivalence incre�mentally when the control �ow graph is modi�ed� A more detailed discussion ofcycle equivalence can be found in Johnson ��

Finally we note that the ancillary quantities� Hv and tv� which were introduced inDe�nition �� are interesting in their own right� The tree of a Roman Chariotsproblem can be viewed as the DFS tree of an undirected graph� in which eachchariot route b� t� represents a back edge connecting nodes b and t� For somenode v suppose that Hv� is not � Then Hv� is the highest ancestor of v that isconnected to some proper descendant of v by a back edge� To see the signi�canceof Hv � note that Hv � minfhv� Hv�g� If Hv is not � then Hv is the highestancestor of v reachable by a path that contains only descendants of v �other thanHv itself�� It is also easy to see that if Hv is not � then there are two pathsfrom v to Hv that have no vertices in common other than v and Hv � One pathis the tree path v�Hv�� If Hv � hv� then the other path is the back edge from vto hv� If Hv � Hv� � there is a proper descendant d of v such that there is a backedge from d to Hv� in that case� the other path is the tree path v� d� concatenatedwith the back edge from d to Hv � The existence of two node�disjoint paths betweenv and Hv means that these nodes are in the same biconnected component of theundirected graph� in fact� the computation of Hv is the key step in the Hopcroft andTarjan algorithm for computing biconnected components Aho et al� �� since itdetermines articulation points in the undirected graph� It can be shown that thecomputation of tv arises similarly in the computation of triconnected components�

� IMPLEMENTATION AND EXPERIMENTS

We can summarize the data structure APT for the Roman Chariots problem asfollows



a b bf c e c dadec

ba

f

g

END

START{1,2}

{}

{4} {3}

{1} {2,5}

{} )[ ,ab,g

END

d,f[[[[

))))

e,fb,c

ChariotRoutes

4

5

1

2

3START

b a c f g e dEND START

: control dependence edges

:postdominator tree edges

�a� APT � � � � �b� Actual Implementation

Fig� �� Implementation of APT �

�� T tree that permits top�down and bottom�up traversals�

�� A array of chariot routes of the form v� w� where w is an ancestor of v in T �

�� dfsv� dfs number of node v�

�� Bndry�v� boolean� Set to true if v is a boundary node� and set to falseotherwise�

�� Lv� list of chariot routes� If v is a boundary node� Lv� is a list of all routescontaining v� otherwise� it is a list of all routes whose bottom node is v�

�� CdEqNextv� Successor of v in the list of nodes representing cdequiv�v��

�� CdEqHeaderv� Header of the list of nodes representing cdequiv�v��

Two aspects of our APT implementation for the control dependence problemare worth mentioning� Instead of storing chariot routes �cd sets� at nodes� westore �references to� the corresponding CFG edges� For example� in Figure ��a��chariot routes � and � are stored at node f � In the actual implementation� shownconceptually in Figure ��b�� we store references to the corresponding CFG edges�START� a and f � b� This is convenient because it enables the output of condsqueries to be produced directly without translation from integers to CFG edges�thereby eliminating a data structure that would be needed for this translation� Sincea CFG edge u � v can be converted to a chariot route v� parent�u�� in constanttime� this does not a�ect the asymptotic complexity of any of the algorithms in thearticle� Finally� in procedure CondsQuery of Figure �� it is worth inlining the callto procedure CondsVisit and eliminating ancestorship tests on routes cached atthe query node itself� if full caching is performed� the overhead of a conds query inAPT � compared to that in the CDG� reduces to a single conditional test�

For control dependence investigations� the standard model problem is a nestof repeat�until loops where the problem size is the number of nested loops� n�Figures ��a� and �b� show storage requirements as problem size is varied� Thestorage axis measures the total number of routes stored at all nodes of the tree�The storage required for the CDG is n�n�� which as expected grows quadraticallywith problem size� For a �xed problem size� the storage needed for APT is betweenthe storage needed for the CDG �full caching� and the storage needed if there isno caching �the dotted line at the bottom of Figures ��a� and �b��

Consider the graph for � � �� in Figure ��a�� For small problem sizes �be�tween � and �� storage requirements look exactly like those of the CDG� For



0 10 20 30 40 50 60 70 800

500

1000

1500

2000

2500

3000

3500

4000

4500

5000

Nesting Depth

Sto

rage CDG

ALPHA = 1/32

ALPHA = 1/16

ALPHA = 1

0 10 20 30 40 50 60 70 80 90 1000

50

100

150

200

250

300

ALPHA = 1

ALPHA = 4

ALPHA = 32

ALPHA = >>

Nesting Depth

Sto

rage

�a�Storage vs� Nesting Level �� b� Storage vs� Nesting Level ��

−8 −6 −4 −2 0 2 4 6 80

2000

4000

6000

8000

10000

12000

depth = 100

depth = 64

depth = 32

actual predicted

log(ALPHA)

Sto

rage

−8 −6 −4 −2 0 2 4 6 80

50

100

150

200

250

300

350

400

450

500

depth = 100

depth = 64

depth = 32

depth = 4

log(ALPHA)

Wor

st C

ase

Que

ry T

ime

�c� Storage vs� � �d� Query Time vs� �

0 10 20 30 40 50 60 70 80 90 1000

0.002

0.004

0.006

0.008

0.01

0.012

Nesting Depth

Pre

proc

essi

ng T

ime

(sec

s) ALPHA = 1/32

ALPHA = 1/16

PDOM Time

ALPHA = 1

−8 −6 −4 −2 0 2 4 6 80

1

2

3

4

5

6

7

8

9x 10

−3

Log(ALPHA)

Pre

proc

essi

ng T

ime

(sec

s)

depth = 64

depth = 32

depth = 4

PDOM: depth = 64

PDOM: depth = 32

�e� Preprocessing Time vs� Nesting Level �f� Preprocessing Time vs� �

Fig� �� Experimental results for repeat�until loop nests�

problem sizes larger than �� storage requirements grow linearly� In between thesetwo regimes is a transitory region� A similar pattern can be observed in the graphfor � � �� These results can be explained analytically as follows� From Eq�� it follows that every node is in a zone by itself if� for all nodes q� jZq j � �jAq j�� This means that for all nodes q� jAq j � �� If the nesting depth is n� it is easy toverify that the largest value of jAq j is �n� �� Therefore� if n � �� all nodesare in zones by themselves� which is the case for the CDG� This analysis shows the



adaptive nature of the APT data structure� Intuitively� �� is a measure of the�budget� for space � if the problem size is small compared with the budget� thealgorithm performs full caching� As problem size increases� full caching becomesmore and more expensive� until at some point �� zones with more than one nodestart to appear and �� the graph for APT peels away from the graph for the CDG�A similar analytical interpretation is possible for Figure ��b�� which shows storagerequirements for � � �� Finally� Figure ��c� shows that for a �xed problem size�storage requirements increase as � decreases� as expected� The dashed line is theminimum of the CDG size and the right�hand side of Inequality � for n � �� thisis the computed upper bound on storage requirements for n � �� and it clearlylies above the graph of storage actually used�

Figure ��d� shows that for a �xed problem size� worst�case query time decreasesas � decreases� Because actual query time is too small to measure accurately� wemeasured instead the number of routes examined during querying �say r� and thenumber of nodes in the subzone of the query node� other than the query node itself�say s�� The y�axis is the sum �r � �s�� where the factor of � comes from the needto traverse each edge in the subzone twice� once on the way down and then againon the way back up� Note that each graph levels o� at its two ends �for very small� and for very large �� as it should� It is important to note that the node for whichworst�case query time is exhibited is di�erent for di�erent values of �� In otherwords� the range of query times for a �xed node is far more than the �� ratio seenin Figure ��d��

Finally� Figures ��e� and �f� show how preprocessing time varies with problemsize and with �� These times were measured on a SUN�� Note that for � � �� preprocessing time is less than the time to build the postdominator tree� even forvery small values of �� the time to build the APT data structure is no more thantwice the time to build the postdominator tree� This shows that preprocessing isrelatively inexpensive�

Real programs� such as the SPEC benchmarks� are less challenging than themodel problem� Figure ��a� shows a plot of storage versus program size for all theprocedures in the SPEC benchmarks� The x�axis is the number of basic blocks ina procedure and is a measure of procedure size� The y�axis shows the total numberof routes stored at all nodes of an APT data structure for that procedure� For eachprocedure� the APT data structure was constructed with three di�erent values of� �� a very small value of � �full caching�� and �� a very large valueof �� From Figure ��a�� we can show that storage requirements can be reduced bya factor of � by using a large �� Figure ��b� shows the total storage requirementsfor all the procedures in each of the SPEC benchmarks�

For a �xed problem size� the use of a very small value of � is similar to buildingthe CDG� therefore the data points for small � in Figure ��a� can be viewed asthe storage requirements for the CDG� Note that� unlike in the model problem�storage requirements of procedures in the SPEC benchmarks grow linearly withproblem size �this observation has been made before by other researchers Cytronet al� �� This can be explained as follows� It is easy to verify that if the heightof the postdominator tree does not grow with problem size� the size of the CDGwill grow only linearly with problem size� As is seen in Figure ��c�� the height ofthe postdominator tree for procedures in SPEC is quite small� and it is more or less



0 50 100 150 200 250 300 350 400 450 5000

100

200

300

400

500

600

700

800

+: Full Caching

*: Some Caching: ALPHA = 1

o: No Caching

Program Size: Nodes

Sto

rage

0

2000

4000

6000

8000

10000

12000spice

doduc

mdljdp

wave

tomcatv ora

alvinn

ear

mdljsp

swm

su2cor hydro2d

nasa7 fpppp

Full Caching

Some Caching: ALPHA = 1

No Caching

Sto

rage

SPEC Floating−Point Benchmarks

�a� Storage vs� Program Size �b� Storage for SPEC Floating�Point Benchmarks

0 50 100 150 200 250 300 350 400 450 5000

50

100

150

200

250

300

350

Program Size

Hei

ght o

f Pos

tdom

inat

or T

ree

�c� Height of Postdominator tree vs� Program Size

Fig� �� Experimental results for SPEC benchmarks�

independent of the size of the procedure �only the procedure iniset�f in doduc hasa postdominator tree height of more than �� This re�ects the fact that deeplynested loops and long sequential chains of code are rare in real programs�

Query time was not signi�cantly a�ected when � was set to �� for larger valuesof �� query time for a few nodes was a�ected� but on the whole the e�ect wassmall� Finally� for every procedure in the SPEC benchmarks� preprocessing timeto construct APT is a small fraction of the time to build the postdominator tree�

In general� the choice � � � appears quite reasonable for a implementation ofAPT �

�� CONCLUSIONS

In recent work Bilardi and Pingali �� we have shown how the ideas in APTcan be applied successfully to other interesting variants of the control dependenceproblem such as weak control dependence� which was introduced by Podgurski andClarke �� for proving total correctness of programs� In particular� we providean APT �like data structure� constructed in O�jEj� preprocessing space and time�for answering weak control dependence queries in optimal time� This improves theO�jV j�� time required by the Podgurski and Clarke algorithm�

We have also used the ideas in APT to build the SSA form of a program in



O�jEj� time per variable by exploiting the connection between dominance frontiersand conds sets Pingali and Bilardi �� This algorithm improves the quadratic�time complexity of the commonly used algorithm of Cytron et al� �� and ithas the same complexity as a recent algorithm due to Sreedhar and Gao ��The advantage of our algorithm over the Sreedhar and Gao algorithm is that APTpermits us to compute the dominance frontier of a node optimally� whereas theSreedhar and Gao algorithm requires O�jEj� time for this problem� On the SPECbenchmarks� this advantage results in our algorithm �with � set to �� running �times faster than the Sreedhar and Gao algorithm� Furthermore� our algorithmsubsumes both the Cytron et al� algorithm and the Sreedhar and Gao algorithm� if we build APT with a small value of �� our algorithm reduces to the algorithmof Cytron et al�� while a large value of � produces the algorithm of Sreedhar andGao�

There are many alternatives to the zone construction algorithm given in thisarticle� For example� instead of searching the subtree below a query node for thebottom ends of chariot routes� we can search the path from the query node to END

for the top ends of relevant chariot routes� In general� there is a trade�o� betweenthe sophistication of the query procedure and the amount of caching in APT fora given query time� For example we can use cdequiv information in answeringconds queries� It can be shown that the nodes in a cdequiv equivalence class areordered by the ancestor relation in the postdominator tree Johnson et al� ��Given a query conds�v�� we can answer instead the query conds�w� where w isthe node in the cdequiv class of v that is lowest in the tree� this lets the queryprocedure avoid examining nodes on the path v� w�� which can be exploited duringzone construction to reduce storage requirements�

Although we have used the term caching to describeAPT � note that most cachingtechniques for search problems� such as memoization and related ideas used in thetheorem�proving community Michie �� Segre and Scharstein �� performcaching at run�time �query time�� In contrast� caching in APT is performed duringpreprocessing� and the data structure is not modi�ed by query processing� Thispermits us to get a grip on storage requirements� which is di�cult to do withrun�time approaches� Of course� nothing prevents us from using run�time cachingtogether with APT � if this is useful in some application�

There is a deep connection between APT and the use of factoring to reducethe size of the CDG Cytron et al� �� Factoring identi�es nodes that havecontrol dependences in common and creates representations that permit controldependences to be shared by multiple nodes� The simplest kind of factoring exploitscdequiv sets� If p nodes are in a cdequiv set� and have q routes in common� we canintroduce a junction node� connect the q routes to the junction� and introduce edgesfrom the junction to each of the p nodes� In this way� the number of edges in thedata structure is reduced from p � q to p� q� Exploitation of cdequiv informationalone is not adequate to reduce the asymptotic size of the graph� but the ideaof sharing routes can be extended � for example� factoring is possible when theroutes containing a node v� are a subset of the routes containing node v�� However�no factorization to date has reduced worst�case space requirements� To place theAPT data structure in perspective� note �� that it can be viewed as a factoredrepresentation� since a route is cached just once per zone� and �� that entry for



the route is shared by all nodes in the zone� However� there is an importantdi�erence between the traditional approaches to factorization and the one that wehave adopted in APT � In previous factorizations� every route encountered duringquery processing is reported as output� In our approach� the query proceduremay encounter some irrelevant routes that must be ��ltered out�� but there is aguarantee that the number of irrelevant routes encountered during query processingis at most some constant fraction of the actual output� By permitting this slackin the query procedure� we are successful in reducing space and preprocessing timerequirements without a�ecting asymptotic query time�

More generally� the approach to conds described in this article can be viewed asan example of �ltering search Chazelle �� a technique used in computationalgeometry to solve range search problems� In these problems� a set of geometricalobjects in Rd is given� A query is made in the form a connected region in Rd� and allobjects intersecting this region must be enumerated� To draw the analogy� we canview the routes in our problem as geometric objects� and we can view the query nodeas the analog of the query region� clearly� the conds problem asks for enumerationof all �objects� that intersect the query �range�� Filtering search exploits the factthat to report k objects� it takes ��k� time� Therefore� we can invest O�k� time inan adaptive search technique that is relatively less e�cient for large k than it is forsmall k� In our solution to the conds problem� nodes contained in a large numberof routes are allowed to be in zones with a large number of nodes� therefore� a queryat such a node may visit a large number of nodes� but this overhead is amortizedover the size of the output� Correspondingly� the search procedure visits a smallnumber of nodes if the query node has only a small amount of output� This kind ofsearch procedure with adaptive caching may prove useful in solving other problemsin the context of restructuring compilers�

ACKNOWLEDGEMENTS

Richard Johnson and David Pearson participated in early research that led to thisarticle� In particular the cdequiv algorithm described here uses ideas that weredeveloped jointly with Johnson and Pearson� Paul Chew� Guong�Rong Gao� DexterKozen� Giuseppe Italiano� Mayan Moudgill� Ruth Pingali� Geppino Pucci� BarbaraRyder� Richard Schooler� V� Sreedhar� Eric Stoltz� Eva Tardos� and Michael Wolfegave us invaluable feedback� We thank the referees for their constructive comments�

REFERENCES

Aho� A� V�� Hopcroft� J�� and Ullman� J� �� The Design and Analysis of Computer Algo�rithms� Addison�Wesley� Reading� Mass�

Allen� F�� Burke� M�� Cytron� R�� Ferrante� J�� Hsieh� W�� and Sarkar� V� �� A frame�work for determining useful parallelism� In Proceedings of the �� International Conferenceon Supercomputing� IEEE� New York� ��!��

Ball� T� �� What�s in a region� or computing control dependence regions in near�linear timefor reducible control �ow� ACM Lett� Program� Lang� Syst� �! �Mar�!Dec�� !��

Bernstein� D� and Rodeh� M� �� Global instruction scheduling for superscalar machines� InProceedings of the SIGPLAN �� Conference on Programming Language Design and Imple�mentation� ACM� New York� ��!��

Bilardi� G� and Pingali� K� �� A framework for generalized control dependence� In Proceed�



ings of the SIGPLAN �� Conference on Programming Language Design and Implementation�

ACM� New York� ��!��

Chazelle� B� �� Filtering search� A new approach to query answering� SIAM J� Comput� � ��!��

Cytron� R�� Ferrante� J�� Rosen� B� K��Wegman� M� N�� and Zadeck� F� K� �� E cientlycomputing static single assignment form and the control dependence graph� ACM Trans� Pro�gram� Lang� Syst� �� Oct�� !��

Cytron� R�� Ferrante� J�� and Sarkar� V� �� Compact representations for control depen�dence� In Proceedings of the SIGPLAN �� Conference on Programming Language Design andImplementation� ACM� New York� ��!��

Ferrante� J�� Ottenstein� K� J�� and Warren� J� D� �� The program dependency graphand its uses in optimization� ACM Trans� Program� Lang� Syst� � � �June�� !��

Fisher� J� �� Trace scheduling� A technique for global microcode compaction� IEEE Trans�Comput� � �� !��

Gupta� R�� Pollock� L�� and Soffa� M� L� �� Parallelizing data �ow analysis� In Proceedingsof the Workshop on Parallel Compilation� Queen�s University� Kingston� Ontario�

Gupta� R� and Soffa� M� L� �� Region scheduling� In nd International Conference onSupercomputing� IEEE� New York� ��!��

Harel� D� �� A linear time algorithm for �nding dominators in �owgraphs and related prob�lems� In Proceedings of the ��th ACM Symposium on Theory of Computing� ACM� New York��!��

Horowitz� S�� Prins� J�� and Reps� T� �� Integrating non�interfering versions of programs�In Conference Record of the ��th Annual ACM Symposium on Principles of ProgrammingLanguages� ACM� New York� ��!��

Johnson� R� �� E cient program analysis using dependence �ow graphs� Ph�D� thesis� CornellUniv�� Ithaca� N�Y�

Johnson� R� and Pingali� K� �� Dependence�based program analysis� In Proceedings of theSIGPLAN �� Conference on Programming Language Design and Implementation� ACM� NewYork� ��!��

Johnson� R�� Pearson� D�� and Pingali� K� �� The program structure tree� Computingcontrol regions in linear time� In Proceedings of the SIGPLAN �� Conference on ProgrammingLanguage Design and Implementation� ACM� New York� ��!��

Lengauer� T� and Tarjan� R� E� �� A fast algorithm for �nding dominators in a �owgraph�ACM Trans� Program� Lang� Syst� � � �July�� !��

Michie� D� �� Memo functions and machine learning� Nature �� !��

Newburn� C�� Noonburg� D�� and Shen� J� �� A PDG�based tool and its use in analyz�ing program control dependences� In Proceedings of the International Conference on ParallelArchitectures and Compilation Techniques� IEEE� New York�

Pingali� K� and Bilardi� G� �� APT � A data structure for optimal control dependence com�putation� In Proceedings of the SIGPLAN �� Conference on Programming Language Designand Implementation� ACM� New York� ��!��

Segre� A� and Scharstein� D� �� Bounded�overhead caching for de�nite�clause theoremproving� J� Autom� Reason� �� !��

Simons� B�� Alpern� D�� and Ferrante� J� �� A foundation for sequentializing parallel code�In SPAA �� ACM Symposium on Parallel Algorithms and Architecture� ACM� New York�

Sreedhar� V� C�� Gao� G� R�� and Lee� Y� �� DJ�graphs and their applications to �owgraphanalyses� Tech� Rep� ACAPS Memo �� McGill Univ�� Montreal� Canada� May�

Received July �� accepted October ��


ISS Group at the University of Texas - bc...(a) Control Flow Graph a b (b) Postdominator Tree (c)...

Documents

Transcript of ISS Group at the University of Texas - bc...(a) Control Flow Graph a b (b) Postdominator Tree (c)...