
Tutorial on High-Level Synthesis

Michael C. McFarland, SJ, Boston College, Chestnut Hill, MA 02167
Alice C. Parker, University of Southern California, Los Angeles, CA 90007

Raul Camposano, IBM T.J. Watson Research Center, Yorktown Heights, NY

Abstract. High-level synthesis takes an abstract behavioral specification of a digital system and finds a register-transfer level structure that realizes the given behavior. In this tutorial we will examine the high-level synthesis task, showing how it can be decomposed into a number of distinct but not independent subtasks. Then we will present the techniques that have been developed for solving those subtasks. Finally, we will note those areas related to high-level synthesis that are still open problems.

1. Introduction

1.1 What is High-Level Synthesis?

The synthesis task is to take a specification of the behavior required of a system and a set of constraints and goals to be satisfied, and to find a structure that implements the behavior while satisfying the goals and constraints. By behavior we mean the way the system or its components interact with their environment, i.e., the mapping from inputs to outputs. Structure refers to the set of interconnected components that make up the system, something like a netlist. Usually there are many different structures that can be used to realize a given behavior. One of the tasks of synthesis is to find the structure that best meets the constraints, such as limitations on cycle time, area or power, while minimizing other costs. For example, the goal might be to minimize area while achieving a certain minimum processing rate.

Synthesis can take place at various levels of abstraction because designs can be described at various levels of detail. The type of synthesis we will focus on in this tutorial begins with a behavioral specification at what is often called the algorithmic level. The primary data types at this level are integers and/or bit strings and arrays, rather than boolean variables. The input specification gives the required mappings from sequences of inputs to sequences of outputs. It should constrain the internal structure of the system to be designed as little as possible. From that input specification, the synthesis system produces a description of a register-transfer level structure that realizes the specified behavior. This structure includes a data path, that is, a network of registers, functional units, multiplexers and buses, as well as hardware to control the data transfers in that network. If the control is not integrated into the datapath, and it usually is not, the synthesis system must also produce the specification of a finite state machine that drives the datapaths so as to produce the required behavior. The control specification could be in terms of microcode, a PLA profile or random logic.

High-level synthesis as we define it must be distinguished from other types of synthesis, which operate at different levels of the design hierarchy. For example, high-level synthesis is not to be confused with logic synthesis, where the system is specified in terms of logic equations, which must be optimized and mapped into a given technology. Logic synthesis might in fact be used on a design after high-level synthesis has been done, since it presupposes the sorts of decisions that high-level synthesis makes. At the other end of the spectrum, there is some promising work under way on system-level synthesis, for example on partitioning an algorithm into multiple processes that can run in parallel or be pipelined. However, this work is still in its preliminary stages, and we will not report on it here.

1.2 Why Study High-Level Synthesis?

In recent years there has been a trend toward automating synthesis at higher and higher levels of the design hierarchy. Logic synthesis is gaining acceptance in industry, and there has been considerable interest shown in synthesis at higher levels. There are a number of reasons for this:

- Shorter design cycle. If more of the design process is automated, a company can get a design out the door faster, and thus have a better chance of hitting the market window for that design. Furthermore, since much of the cost of the chip is in design development, automating more of that process can lower the cost significantly.

- Fewer errors. If the synthesis process can be verified to be

correct (by no means a trivial task), there is greater assurance that the final design will correspond to the initial specification. This will mean fewer errors and less debugging time for new chips.

- The ability to search the design space. A good synthesis system can produce several designs for the same specification in a reasonable amount of time. This allows the developer to explore different trade-offs between cost, speed, power and so on, or to take an existing design and produce a functionally equivalent one that is faster or less expensive.

- The design process is self-documenting. An automated system can keep track of what design decisions were made and why, and what the effect of those decisions was.

- Availability of IC technology to more people. As more design expertise is moved into the synthesis system, it becomes easier for a non-expert to produce a chip that meets a given set of specifications.

We expect this trend toward higher levels of synthesis to continue. Already there are a number of research groups working on high-level synthesis, and a great deal of progress has been made in finding good techniques for optimization and for exploring design trade-offs. These techniques are very important because decisions made at the algorithmic level tend to have a much greater impact on the design than those at lower levels.

There is now a sizable body of knowledge on high-level synthesis, although for the most part it has not yet been systematized. In the remainder of this paper, we will describe what the problems are in high-level synthesis, and what techniques have been developed to solve them. To that end, Section 2 will describe the various tasks involved in developing a register-transfer level structure from an algorithmic-level specification. Section 3 will describe the basic techniques that have been developed for performing those tasks. Finally, Section 4 will look at those issues that have not been adequately addressed and thus provide promising areas for future research.

Paper 23.1
25th ACM/IEEE Design Automation Conference
CH2540-3/88/0000/0330$01.00 © 1988 IEEE


2. The Synthesis Task

The system to be designed is usually represented at the algorithmic level by a programming language such as Pascal [27] or Ada [8], or by a hardware description language that is similar to a programming language, such as ISPS [2]. Most of the languages used are procedural languages. That is, they describe data manipulation in terms of assignment statements that are organized into larger blocks using standard control constructs for sequential execution, conditional execution and iteration. There have been experiments, however, with various types of non-procedural hardware description languages, including applicative, LISP-like languages [11] and declarative languages such as Prolog.

The first step in high-level synthesis is usually the compilation of the formal language into an internal representation. Two types of internal representations are generally used: parse trees and graphs. Most approaches use variations of graphs that contain both the data flow and the control flow implied by the specification [16], [26], [12]. Fig. 1 shows a part of a simple program that computes the square root of X using Newton's method, along with its graphical representation. The number of iterations necessary in practice is very small. In the example, 4 iterations were chosen. A first-degree minimax polynomial approximation for the interval gives the initial value. The data-flow and control-flow graphs are shown separately in the figure for intelligibility. The control graph is derived directly from the explicit order given in the program and from the compiler's choice of how to parse the arithmetic expressions. The data-flow graph shows the essential ordering of operations in the program imposed by the data relations in the specification. For example, in fig. 1, the addition at the top of the diagram depends for its input on data produced by the multiplication. This implies that the multiplication must be done first in any valid ordering of the operations. On the other hand, there is no dependence between the I + 1 operation inside the loop and any of the operations in the chain that calculates Y, so the I + 1 may be done in parallel with those operations, as well as before or after them. The data-flow graph can also be used to remove the dependence on the way internal variables are used in the specification, since each value produced by one operation and consumed by another is represented uniquely by an arc. This ability to reassign variables is important both for reordering operations and for simplifying the datapaths.

    Y := 0.222222 + 0.888889 * X;
    I := 0;
    DO UNTIL I > 3 LOOP
        Y := 0.5 * (Y + X / Y);
        I := I + 1;
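For reference, the Fig. 1 loop can be transcribed directly into executable form. This is an illustrative Python sketch, not part of the original paper; the function name is ours, while the constants and update rule come from the figure:

```python
def sqrt_newton(x: float) -> float:
    """Square root by Newton's method, as in the Fig. 1 program."""
    # First-degree minimax polynomial gives the initial value.
    y = 0.222222 + 0.888889 * x
    i = 0
    while not (i > 3):          # DO UNTIL I > 3: four iterations
        y = 0.5 * (y + x / y)   # Y := 0.5 * (Y + X/Y)
        i = i + 1
    return y
```

For inputs near 1, four iterations are ample: sqrt_newton(2.0) agrees with the true square root of 2 to far better than single-precision accuracy.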

The rest of this section outlines the various steps used in turning the intermediate form into an RT-level structure, using the square-root example to illustrate the different steps.

Since the specification has been written for human readability and not for direct translation into hardware, it is desirable to do some initial optimization of the internal representation. These high-level transformations include such compiler-like optimizations as dead code elimination, constant propagation, common subexpression elimination, inline expansion of procedures and loop unrolling. Local transformations, including those that are more specific to hardware, are also used. In the example, the loop-ending criterion can be changed to I = 0 using a two-bit variable for I. The multiplication by 0.5 can be replaced by a right shift by one. The addition of 1 to I can be replaced by an increment operation. The internal representation after these optimizations is depicted on the left in fig. 2. Loop unrolling can also be done in this case since the number of iterations is fixed and small.

The next two steps in synthesis are the core of transforming behavior into structure: scheduling and allocation. They are closely interrelated and depend on each other. Scheduling consists in assigning the operations to so-called control steps. A control step is the fundamental sequencing unit in synchronous systems; it corresponds to a clock cycle. Allocation consists in assigning the operations to hardware, i.e. allocating functional units, storage and communication paths.

The aim of scheduling is to minimize the amount of time or the number of control steps needed for completion of the program, given certain limits on the available hardware resources. In our example, a trivial special case uses just one functional unit and one memory. Each operation has to be scheduled in a different control step, so the computation takes 3 + 4*5 = 23 control steps. To speed up the computation at the expense of adding more hardware, the control graph can be packed into control steps as tightly as possible, observing only the essential dependencies required by the data-flow graph and by the loop boundaries. This form is shown in fig. 2. Notice that two dummy nodes to delimit the loop boundaries were introduced. Since the shift operation is free, with two functional units the operations can now be scheduled in 2 + 4*2 = 10 control steps.

Figure 1. High-level Specification and graph for sqrt

Figure 2. Optimized Control Graph and Schedule (l: multiplexer, #: shift, c: constant)


In allocation, the problem is to minimize the amount of hardware needed. The hardware consists essentially of functional units, memory elements and communication paths. To minimize them together is usually too complex, so in most systems they are minimized separately. As an example, consider the minimization of functional units. Mutually exclusive operations, e.g. operations in different control steps, clearly can share functional units. The problem is then to group those operations which are mutually exclusive, so that the minimum number of groups results. If functional units are capable of performing only some operations, only those operations that can be performed by some common functional unit can be grouped. The allocation of functional units for the given schedule in fig. 2 is minimal in this sense. The problems of minimizing the amount of storage and the complexity of the communication paths for a given schedule can be formulated similarly.

In memory allocation, values that are generated in one control step and used in another must be assigned to storage. Values may be assigned to the same register when their lifetimes do not overlap. Storage assignment should be done in a way that not only minimizes the number of registers, but also simplifies the communication paths.

Communication paths, including buses and multiplexers, must be chosen so that the functional units and registers are connected as necessary to support the data transfers required by the specification and the schedule. The simplest type of communication path allocation is based only on multiplexers. Buses, which can be seen as distributed multiplexers, offer the advantage of requiring less wiring, but they may be slower than multiplexers. Depending on the application, a combination of both may be the best solution.

In addition to designing the abstract structure of the data path, the system must decide how each component of the data path is to be implemented. This is sometimes called module binding. For the binding of functional units, known components such as adders can be taken from a hardware library. Libraries facilitate the synthesis process and the size/timing estimation, but they can prevent efficient solutions that require special hardware. The synthesis of special-purpose full-custom hardware is possible, but it makes the design process more expensive, possibly requiring extensive use of logic synthesis and layout synthesis.

Once the schedule and the data paths have been chosen, it is necessary to synthesize a controller that will drive the data paths as required by the schedule. The synthesis of the control hardware itself can be done in different ways. If hardwired control is chosen, a control step corresponds to a state in the controlling finite state machine. Once the inputs and outputs to the FSM, that is, the interface to the data part, have been determined as part of the allocation, the FSM can be synthesized using known methods, including state encoding and optimization of the combinational logic. If microcoded control is chosen instead, a control step corresponds to a microprogram step and the microprogram can be optimized using encoding techniques for the microcontrol word.

Finally, the design has to be converted into real hardware. Lower-level tools such as logic synthesis and layout synthesis complete the design.

The major problem underlying all these tasks is the extremely large number of design possibilities which must be examined in order to select the design which meets constraints and is as near as possible to the optimal design. The design space that needs to be searched is multi-dimensional and discontinuous, and it is hard even to find a canonical set of operators that systematically takes you through that space. Furthermore, the shape of the design space is often problem-specific, so that there is no methodology that is guaranteed to work in all cases.

Finding the best solution to even a limited problem such as scheduling is difficult enough. Many synthesis subtasks, including scheduling with a limitation on the number of resources and register allocation given a fixed number of registers, are known to be NP-hard. That means that the process of finding an optimal solution to these is believed to require a number of steps exponential in the size of the data set. Yet in high-level synthesis there are several such tasks, and they cannot really be isolated, since they are interdependent.

3. Basic Techniques

3.1 Scheduling

We distinguish two dimensions along which scheduling algorithms may differ: (1) the interaction between scheduling and operator and/or datapath allocation; and (2) the type of scheduling algorithm used.

3.1.1 Interaction with Allocation. As noted earlier, scheduling and operator allocation are interdependent tasks. In order to know whether two operations can be scheduled in the same control step, one must know whether they use the same functional unit. Moreover, finding the most efficient possible schedule for the real hardware requires knowing the delays for the different operations, and those can only be found after the details of the functional units and their interconnections are known. On the other hand, in order to make a good judgement about how many functional units should be used and how operations ought to be distributed among them, one must know what operations will be done in parallel, which comes from the schedule. Thus there is a vicious circle, since each task depends on the outcome of the other.

A number of approaches to this problem have been taken by synthesis systems. The most straightforward one is to set some limit (or no limit) on the number of functional units available and then to schedule. This is done, for example, in the Facet system [28], in the early Design Automation Assistant [13], and in the Flamel system [27]. This limit could be set as a default by the program or specified by the user. A somewhat more flexible version of this approach is to iterate the whole process, first choosing a resource limit, then scheduling, then changing the limit based on the results of the scheduling, rescheduling and so on until a satisfactory design has been found. This is done, for example, under user control in the MIMOLA system [29] and, under guidance of an expert system with feedback from the datapath allocator, in Chippe [5].

Another approach is to develop the schedule and resource requirements simultaneously. For example, the MAHA [21] system allocates operations to functional units as it schedules, adding functional units only when it cannot share existing ones. The force-directed scheduling in the HAL [22] system schedules operations within a given time constraint so as to balance the number of functional units required in each control step. The number of functional units allocated is then the maximum number required in a control step. HAL also includes a feedback loop that allows the scheduling to be repeated after the detailed datapaths have been designed, when more is known about delays and interconnect costs.



The Yorktown Silicon Compiler (YSC) [4] does allocation and scheduling together, but in a different way. It begins with each operation being done on a separate functional unit and all operations being done in the same control step. Additional control steps are added for loop boundaries, and as required to avoid conflicts over register and memory usage. The hardware is then optimized so as to share resources as much as possible. If there is too much hardware or there are too many operations chained together in the same control step, more control steps are added and the datapath structure is again optimized. This process is repeated until the hardware and time constraints are met.

Finally, functional unit allocation can be done first, followed by scheduling. In the BUD system [17], operations are first partitioned into clusters, using a metric that takes into account potential functional unit sharing, interconnect, and parallelism. Then functional units are assigned to each cluster and the scheduling is done. The number of clusters to be used is determined by searching through a range of possible clusterings, choosing the one that best meets the design objectives.

In the Karlsruhe CADDY/DSL system [25], the datapath is built first, assuming maximal parallelism. This is then optimized, locally and globally, guided by both area constraints and timing. The operations are then scheduled, subject to the constraints imposed by the datapath.

3.1.2 Scheduling Algorithms. There are two basic classes of scheduling algorithms: transformational and iterative/constructive. A transformational type of algorithm begins with a default schedule, usually either maximally serial or maximally parallel, and applies transformations to it to obtain other schedules. The transformations move serial operations in parallel and parallel operations in series. Transformational algorithms differ in how they choose what transformations to apply.

Barbacci's EXPL [1], one of the earliest high-level synthesis systems, used exhaustive search. That is, it tried all possible combinations of serial and parallel transformations and chose the best design found. This method has the advantage that it looks through all possible designs, but of course it is computationally very expensive and not practical for sizable designs. Exhaustive search can be improved somewhat by using branch-and-bound techniques, which cut off the search along any path that can be recognized to be suboptimal.

Another approach to scheduling by transformation is to use heuristics to guide the process. Transformations are chosen that promise to move the design closer to the given constraints or to optimize the objective. This is the approach used, for example, in the Yorktown Silicon Compiler [4] and the CAMAD design system [23]. The transformations used in the YSC can be shown to produce a fastest possible schedule for a given specification.

The other class of algorithms, the iterative/constructive ones, build up a schedule by adding operations one at a time until all the operations have been scheduled. They differ in how the next operation to be scheduled is chosen and in how they determine where to schedule each operation.

The simplest type of scheduling, as soon as possible (ASAP) scheduling, is local both in the selection of the operation to be scheduled and in where it is placed. ASAP scheduling assumes that the number of functional units has already been specified.

Operations are first sorted topologically; that is, if operation x2 is constrained to follow operation x1 by some necessary dataflow or control relationship, then x2 will follow x1 in the topological order. Operations are taken from the list in order and each is put into the earliest control step possible, given its dependence on other operations and the limits on resource usage. Figure 3 shows a dataflow graph and its ASAP schedule. This was the type of scheduling used in the CMUDA system [10], in the MIMOLA system and in Flamel. The problem with this algorithm is that no priority is given to operations on the critical path, so that when there are limits on resource usage, operations that are less critical can be scheduled first on limited resources and thus block critical operations. This is shown in Figure 3, where operation 1 is scheduled ahead of operation 2, which is on the critical path, so that operation 2 is scheduled later than is necessary, forcing a longer than optimal schedule.
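The ASAP policy can be sketched in a few lines. This is an illustrative Python fragment, not code from any of the systems cited; the graph encoding (`preds` as a predecessor map, `n_units` as the functional-unit limit, operations pre-sorted topologically) is our own assumption:

```python
def asap_schedule(ops, preds, n_units):
    """ASAP scheduling under a resource limit (illustrative sketch)."""
    step_of = {}   # operation -> assigned control step (0-based)
    load = {}      # control step -> operations placed in it
    for op in ops:  # ops assumed in topological order
        # earliest step strictly after every predecessor's step
        earliest = 1 + max((step_of[p] for p in preds[op]), default=-1)
        s = earliest
        while len(load.get(s, [])) >= n_units:  # step already full: defer
            s += 1
        step_of[op] = s
        load.setdefault(s, []).append(op)
    return step_of
```

The blocking problem shows up directly: with two units and operations x, y, z, w where only z has a successor (w), taking x and y first fills step 0, so critical z slips to step 1 and w to step 2, one step longer than necessary.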

Figure 3. ASAP Scheduling

List scheduling overcomes this problem by using a more global criterion for selecting the next operation to be scheduled. For each control step to be scheduled, the operations that are available to be scheduled into that control step, that is, those whose predecessors have already been scheduled, are kept in a list, ordered by some priority function. Each operation on the list is taken in turn and is scheduled if the resources it needs are still free in that step; otherwise it is deferred to the next step. When no more operations can be scheduled, the algorithm moves to the next control step, the available operations are found and ordered, and the process is repeated. This continues until all the operations have been scheduled. Studies have shown that this form of scheduling works nearly as well as branch-and-bound scheduling in microcode optimization [6]. Figure 4 shows a list schedule for the graph in Figure 3. Here the priority is the length of the path from the operation to the end of the block. Since operation 2 has a higher priority than operation 1, it is scheduled first, giving an optimal schedule for this case.

A number of schedulers use list scheduling, though they differ somewhat in the priority function they use. The scheduler in the BUD system uses the length of the path from the operation to the end of the block it is in. Elf [8] and ISYN [19] use the urgency of an operation, the length of the shortest path from that operation to the nearest local constraint.
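List scheduling with the path-length priority can be sketched in the same illustrative style. Again the data structures (`preds`/`succs` maps, a unit limit) are our own assumptions, not those of BUD, Elf or ISYN:

```python
def path_length(op, succs):
    """Length of the longest path from op to the end of the block."""
    return 0 if not succs[op] else 1 + max(path_length(s, succs)
                                           for s in succs[op])

def list_schedule(ops, preds, succs, n_units):
    """List scheduling, priority = distance to end of block (sketch)."""
    step_of, step = {}, 0
    remaining = list(ops)
    while remaining:
        # ready: all predecessors scheduled in an earlier step
        ready = [op for op in remaining
                 if all(step_of.get(p, step) < step for p in preds[op])]
        ready.sort(key=lambda op: path_length(op, succs), reverse=True)
        for op in ready[:n_units]:   # fill the step in priority order
            step_of[op] = step
            remaining.remove(op)
        step += 1
    return step_of
```

On the four-operation example above (x, y, z independent, w after z, two units), the critical operation z now wins step 0, and the schedule finishes in two steps instead of three.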

    Figure 4. A List Schedule



Figure 5. A Distribution Graph

The last type of scheduling algorithm we will consider is global both in the way it selects the next operation to be scheduled and in the way it decides the control step in which to put it. In this type of algorithm, the range of possible control-step assignments for each operation is calculated, given the time constraints and the precedence relations between the operations. In freedom-based scheduling, the operations on the critical path are scheduled first and assigned to functional units. Then the other operations are scheduled and assigned one at a time. At each step the unscheduled operation with the least freedom, that is, the one with the smallest range of control steps into which it can go, is chosen, so that operations that might present more difficult scheduling problems are taken care of first, before they become blocked.

In force-directed scheduling, the range of possible control steps for each operation is used to form a so-called Distribution Graph. The distribution graph shows, for each control step, how heavily loaded that step is, given that all possible schedules are equally likely. If an operation could be done in any of k control steps, then 1/k is added to each of those control steps in the graph. For example, Figure 5 shows a dataflow graph, the range of steps for each operation, and the corresponding distribution graph for the addition operations, assuming a time constraint of three control steps. Addition a1 must be scheduled in step 1, so it contributes 1 to that step. Similarly, addition a2 adds 1 to control step 2. Addition a3 could be scheduled in either step 2 or step 3, so it contributes 1/2 to each. Operations are then selected and placed so as to balance the distribution as much as possible. In the above example, a3 would first be scheduled into step 3, since that would have the greatest effect in balancing the graph.

3.2 Data Path Allocation

Data path allocation involves mapping operations onto functional units, assigning values to registers, and providing interconnections between operators and registers using buses and multiplexers. The decision to use ALUs instead of simple operators is also made at this time. The optimization goal is usually to minimize some objective function, such as

- total interconnect length,
- total register, bus driver and multiplexer cost, or
- critical path delays.

There may also be one or more constraints on the design which limit total area of the design, total throughput, or delay from input to output.

The techniques used to perform data path allocation can be classified into two types, iterative/constructive and global. Iterative/constructive techniques assign elements one at a time, while global techniques find simultaneous solutions to a number of assignments at a time. Exhaustive search is an extreme case of a global solution technique. Iterative/constructive techniques generally look at less of the search space than global techniques, and therefore are more efficient, but are less likely to find optimal solutions.

3.2.1 Iterative/Constructive Techniques. Iterative/constructive techniques select an operation, value or interconnection to be assigned, make the assignment, and then iterate. The rules which determine the next operation, value or interconnect to be selected can vary from global rules, which examine many or all items before selecting one, to local selection rules, which select the items in a fixed order, usually as they occur in the data flow graph from inputs to outputs. Global selection involves selecting a candidate for assignment on the basis of some metric, for example taking the candidate that would add the minimum additional cost to the design. Hafer's data path allocator, the first RT synthesis program which dealt with TTL chips, was iterative, and used local selection [9]. The DAA used a local criterion to select which element to assign next, but chose where to assign it on the basis of rules that encoded expert knowledge about the data path design of microprocessors. Once this knowledge base had been tested and improved through repeated interviews with designers, the DAA was able to produce much cleaner data paths than when it began [13, pages 26-31]. EMUCS [10] used a global selection criterion, based on minimizing both the number of functional units and registers and the multiplexing needed, to choose the next element to assign and where to assign it. The Elf system also sought to minimize interconnect, but used a local selection criterion. The REAL program [15] separated out register allocation and performed it after scheduling, but prior to operator and interconnect allocation. REAL is constructive, and selects the earliest value to assign at each step, sharing registers among values whenever possible.
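The REAL-style rule (take values earliest-first, and share a register whenever lifetimes do not overlap) can be sketched as follows. The interface is hypothetical, a left-edge-flavored illustration rather than the published algorithm:

```python
def allocate_registers(lifetimes):
    """Constructive register sharing by non-overlapping lifetimes (sketch).

    lifetimes: value name -> (birth step, death step); a value may reuse
    a register whose previous occupant is already dead at its birth.
    """
    free_at = []      # per register: step at which it becomes free again
    assignment = {}   # value -> register index
    for val, (birth, death) in sorted(lifetimes.items(),
                                      key=lambda kv: kv[1]):
        for r, free in enumerate(free_at):
            if free <= birth:           # previous value already dead
                free_at[r] = death
                assignment[val] = r
                break
        else:                           # no reusable register: add one
            free_at.append(death)
            assignment[val] = len(free_at) - 1
    return assignment, len(free_at)
```

For example, values with lifetimes (0, 2), (1, 3) and (2, 4) need only two registers: the first and third can share, because the first is dead by the time the third is born.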

Figure 6. Greedy Data Path Allocation

An example of greedy allocation is shown in fig. 6. The dataflow graph on the left is processed from earliest time step to latest. Operators, registers and interconnect are allocated for each time step in sequence. Thus, the selection rule is local, and the allocation constructive. Assignments are made so as to minimize interconnect. In the case shown in the figure, a2 was assigned to adder2 since the increase in multiplexing cost required by that allocation was zero. a4 was assigned to adder1 because there was already a connection from the register to that adder. Other variations are possible, each with different multiplexing costs. For example, if we had assigned a2 to adder1 and a4 to adder1 without checking for interconnection costs, then the final multiplexing would have been more expensive. A more global selection rule also could have been applied. For example, we could have selected the next item for allocation on the basis of minimization of cost increase. In this case, if we had already allocated a3 to adder2, then the next step would be to allocate a4 to the same adder, since they occur in different time steps, and the incremental cost of doing that assignment is less than assigning a2 to adder1.
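The local, interconnect-minimizing selection rule can be sketched as follows. This is a deliberately simplified illustration (it ignores conflicts between operations in the same time step, and the `wired` map and unit names are our own, not Fig. 6's full structure):

```python
def greedy_assign(ops, units, wired):
    """Assign each operation to the unit that adds the least mux cost.

    ops:   list of (operation, source register) in time-step order
    units: candidate functional units
    wired: unit -> set of registers already connected to its input;
           reusing an existing connection costs 0, a new one costs 1.
    """
    assignment = {}
    for op, src in ops:
        best = min(units, key=lambda u: 0 if src in wired[u] else 1)
        assignment[op] = best
        wired[best].add(src)   # the connection now exists for later ops
    return assignment
```

This reproduces the figure's reasoning for a4: given that adder1 already has a connection from the source register, a4 is placed on adder1 at zero added multiplexing cost.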



3.2.2 Global Allocation. Global allocation techniques include graph-theoretic formulations and mathematical programming techniques. One popular graph-theoretic formulation [28] involves creating graphs in which the elements to be assigned to hardware, whether they are operations, values, or interconnections, are represented by nodes, and there is an arc between two nodes if and only if the corresponding elements can share the same hardware. The problem then becomes one of finding those sets of nodes in the graph all of whose members are connected to one another, since all of the elements in such a set can share the same hardware without conflict. This is the so-called clique finding problem. If the objective is to minimize the number of hardware units, then we would want to find the minimal number of cliques that cover the graph, or, to put it another way, to find the maximal cliques in the graph. Unfortunately, finding the maximal cliques in a graph is an NP-hard problem, so in practice, greedy heuristics are employed. Figure 7 shows the graph of operations from the example shown in Figure 6. One clique is highlighted, showing that the three operations can share the same adder, just as in the greedy example.
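A greedy heuristic of the kind mentioned above can be sketched as follows; the compatibility-set encoding is our own illustration:

```python
def greedy_cliques(nodes, compat):
    """Greedy clique partitioning of a compatibility graph (sketch).

    nodes:  operations to assign to hardware units
    compat: set of frozenset pairs that may share a unit
            (e.g. operations scheduled in different control steps)
    Each resulting clique corresponds to one shared functional unit.
    """
    cliques = []
    for n in nodes:
        for c in cliques:   # join the first clique n is fully compatible with
            if all(frozenset((n, m)) in compat for m in c):
                c.append(n)
                break
        else:
            cliques.append([n])   # no fit: open a new clique (new unit)
    return cliques
```

For three mutually compatible additions (as in the Figure 6/7 example, where a2, a3 and a4 lie in different control steps), the heuristic finds a single clique, i.e. one shared adder; with no compatible pairs, every operation gets its own unit.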

Figure 7. Example of a Clique

Formulation of allocation as a mathematical programming problem involves creating a variable for each possible assignment of an operation, register or interconnection to a hardware element. The variable is one if the assignment is made and zero if it is not. Constraints must be formulated that guarantee that each element is assigned to one and only one hardware element, and so on. The objective then is to find a valid solution that minimizes some cost function. Finding an optimal solution requires exhaustive search, which is very expensive. This was done by Hafer on a small example [9], and recent research by Hafer indicates that heuristics can be used to reduce the search space, so that larger examples can be considered.

3.3 Summary

The one issue plaguing synthesis researchers is how to reduce the computation required during the entire data path synthesis task, while still obtaining good designs. The goal of most of the techniques described above is to process the search space efficiently and produce a near-optimal solution. The techniques described here perform well for each synthesis task in isolation, but the problem of finding good solutions to all tasks simultaneously is still an open one.

Two interesting, different approaches to cutting down the search space have been investigated recently. The first of these is the use of expert knowledge to guide the design process. The DAA was the first expert system which performed data path synthesis. The second approach is to narrow the problem domain so that more domain-specific knowledge can be used. The digital signal processing domain has been explored by several groups, for example. The CATHEDRAL system [7] is an example of a very successful effort in that area. Other programs have been able to get very good results by focusing on microprocessor design, for example the SUGAR system [24].
Synthesis of pipelined data paths is a design domain which has now been characterized by a foundation of theory [20] and implemented by the program Sehwa.
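The 0-1 mathematical-programming formulation of Section 3.2.2 can be made concrete with a toy search. This is only a sketch under assumed names: the exactly-one constraint is enforced by construction (each operation picks exactly one unit), the resource-conflict constraint is checked explicitly, the objective is simply the number of units used, and brute-force enumeration stands in for a real ILP solver.

```python
from itertools import product

def optimal_binding(ops, n_units, unit_cost=1.0):
    """Exhaustive 0-1 assignment search (toy sizes only).

    ops: list of (op_id, time_step). Each candidate solution assigns
    every operation to one of n_units; infeasible assignments (two
    operations on the same unit in the same time step) are discarded.
    Returns (assignment tuple, cost) minimizing units used."""
    best, best_cost = None, float("inf")
    for assign in product(range(n_units), repeat=len(ops)):
        # conflict constraint: a unit may do one operation per step
        used = set()
        feasible = True
        for (op, step), j in zip(ops, assign):
            if (j, step) in used:
                feasible = False
                break
            used.add((j, step))
        if not feasible:
            continue
        # objective: cost proportional to the number of units used
        cost = len(set(assign)) * unit_cost
        if cost < best_cost:
            best, best_cost = assign, cost
    return best, best_cost

# Same four-addition example (time steps assumed): the optimum needs
# only two adders even though four are available.
ops = [("a1", 1), ("a2", 1), ("a3", 2), ("a4", 2)]
binding, cost = optimal_binding(ops, n_units=4)
```

The search is exponential in the number of operations, which is exactly the expense noted above and the reason heuristic pruning of the space matters.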

4. Open problems

There are a number of open problems yet to be solved in the synthesis area.

Human factors refers to the place of the designer in the design process. Some of the important issues yet to be settled are how the designer is to input design specifications and constraints, how the system is to output results, what decisions the designer should make and what information the designer needs in order to make them, and how the system is to explain to the user what is going on during the design process. Some user interaction is now allowed with EMUCS [19], and user interaction is used to guide the search in Sehwa. Mimola supports user interaction, particularly in restricting resources. These efforts, however, are only the beginning of research involving this aspect of synthesis.

Design verification involves the proof that a detailed design implements the exact design stated in the specification. This might involve verifying the design produced by the system against the initial specification, or it could mean verifying the synthesis process itself by showing that each step in the synthesis process preserves the behavior of the initial specification. McFarland and Parker [18] have used formal methods to verify a number of the optimizing transformations used at the beginning of the synthesis process, but much more needs to be done in this area.

Integrating levels of design means a number of things. For one, it means performing physical design, including floorplanning, along with the synthesis of the logical structure, as the BUD program does. Second, to make realistic evaluations of design tradeoffs at the algorithmic and register-transfer levels, it is necessary to be able to anticipate what the lower-level tools will do. Estimation of performance and area at the layout level is performed by BUD, and PLEST [14] performs area estimation, but more research on this topic is needed. Third, integration across design levels means maintaining a single representation which contains all levels of design information, as the ADAM Design Data Structure does [12]. Finally, it should be possible to allow parts of a design to exist at a particular time at different levels, which no current synthesis system performs to any degree.

A number of design tasks are still open problems. Interface design and handling local timing constraints have been researched by Nestor [19] and Borriello [3], with many problems still to be solved. System-level issues like trading off complexity between the control and the data paths and breaking a system into interacting asynchronous processes are new problems. High-level transformations on the behavior have been classified and studied, but when to apply the transforms and in what order is an open problem. Trickey found an effective way of ordering and searching through a limited number of transformations [27], but it is not clear that that method would generalize to a richer and less orderly set of transformations.

In summary, high-level synthesis defined as an abstract, limited problem of scheduling and allocation is well understood, and there are a variety of effective techniques that have been applied to it. However, when it is seen in its real context, opening up such issues as specification and designer intervention, the need to handle complex timing constraints, and the relation of synthesis to the overall design and fabrication process, there are still many unanswered questions. Much work needs to be done before synthesis becomes really practical.



REFERENCES

1. Barbacci, M.R. Automated Exploration of the Design Space for Register Transfer (RT) Systems. PhD Thesis, Carnegie-Mellon University, 1973.
2. Barbacci, M.R. Instruction Set Processor Specifications (ISPS): The Notation and its Applications. IEEE Transactions on Computers C-30, 1 (January, 1981), 24-40.
3. Borriello, G. and Katz, R.H. Synthesis and Optimization of Interface Transducer Logic. Proceedings of the International Conference on Computer-Aided Design (November, 1987), 274-277.
4. Brayton, R.K., Camposano, R., DeMicheli, G., Otten, R.H.J.M., and van Eijndhoven, J. The Yorktown Silicon Compiler. In Silicon Compilation, D.D. Gajski, Ed., Addison-Wesley, Reading, MA, 1988, pp. 204-311.
5. Brewer, F.D. and Gajski, D.D. Knowledge Based Control in Micro-Architecture Design. In Proceedings of the 24th Design Automation Conference, ACM and IEEE, June, 1987, pp. 203-209.
6. Davidson, S., Landskov, D., Shriver, B.D., and Mallett, P.W. Some Experiments in Local Microcode Compaction for Horizontal Machines. IEEE Transactions on Computers C-30, 7 (July, 1981), 460-477.
7. DeMan, H., Rabaey, J., Six, P., and Claesen, L. Cathedral II: A Silicon Compiler for Digital Signal Processing. IEEE Design and Test 3, 6 (December, 1986), 13-25.
8. Girczyc, E.F. Automatic Generation of Microsequenced Data Paths to Realize ADA Circuit Descriptions. PhD Thesis, Carleton University, July, 1984.
9. Hafer, L.J. and Parker, A.C. Register-Transfer Level Digital Design Automation: The Allocation Process. In Proceedings of the 15th Design Automation Conference, ACM and IEEE, June, 1978, pp. 213-219.
10. Hitchcock, C.Y. and Thomas, D.E. A Method of Automatic Data Path Synthesis. In Proceedings of the 20th Design Automation Conference, ACM and IEEE, June, 1983, pp. 484-489.
11. Johnson, S.D. Synthesis of Digital Designs from Recursion Equations. PhD Thesis, Indiana University, 1984. MIT Press.
12. Knapp, D., Granacki, J., and Parker, A.C. An Expert Synthesis System. In Proceedings of the International Conference on Computer-Aided Design, ACM and IEEE, September, 1984, pp. 419-424.
13. Kowalski, T.J. An Artificial Intelligence Approach to VLSI Design. Kluwer Academic Publishers, Boston, 1985.
14. Kurdahi, F.J. and Parker, A.C. PLEST: A Program for Area Estimation of VLSI Integrated Circuits. In Proceedings of the 23rd Design Automation Conference, ACM and IEEE, June, 1986, pp. 467-473.
15. Kurdahi, F.J. and Parker, A.C. REAL: A Program for REgister ALlocation. In Proceedings of the 24th Design Automation Conference, ACM and IEEE, June, 1987, pp. 210-215.
16. McFarland, M.C. The VT: A Database for Automated Digital Design. DRC-01-4-80, Design Research Center, Carnegie-Mellon University, December, 1978.
17. McFarland, M.C. Using Bottom-Up Design Techniques in the Synthesis of Digital Hardware from Abstract Behavioral Descriptions. In Proceedings of the 23rd Design Automation Conference, IEEE and ACM, June, 1986.
18. McFarland, M.C. and Parker, A.C. An Abstract Model of Behavior for Hardware Descriptions. IEEE Transactions on Computers C-32, 7 (July, 1983), 621-636.
19. Nestor, J.A. Specification & Synthesis of Digital Systems with Interfaces. CMUCAD-87-10, Department of Electrical and Computer Engineering, Carnegie-Mellon University, April, 1987.
20. Park, N. and Parker, A.C. Sehwa: A Software Package for Synthesis of Pipelines from Behavioral Specifications. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 7, 3 (March, 1988), 356-370.
21. Parker, A.C., Pizarro, J., and Mlinar, M. MAHA: A Program for Datapath Synthesis. In Proceedings of the 23rd Design Automation Conference, ACM and IEEE, June, 1986, pp. 461-466.
22. Paulin, P.G. and Knight, J.P. Force-Directed Scheduling in Automatic Data Path Synthesis. In Proceedings of the 24th Design Automation Conference, ACM and IEEE, June, 1987, pp. 195-202.
23. Peng, Z. Synthesis of VLSI Systems with the CAMAD Design Aid. In Proceedings of the 23rd Design Automation Conference, IEEE and ACM, June, 1986, pp. 278-284.
24. Rajan, J.V. and Thomas, D.E. Synthesis by Delayed Binding of Decisions. In Proceedings of the 22nd Design Automation Conference, ACM and IEEE, June, 1985, pp. 367-373.
25. Rosenstiel, W. and Camposano, R. Synthesizing Circuits from Behavioral Level Specifications. In Proceedings of the 7th International Conference on Computer Hardware Description Languages and their Applications, C. Koomen and T. Moto-oka, Eds., North-Holland, August, 1985, pp. 391-402.
26. Snow, E.A., Siewiorek, D.P., and Thomas, D.E. A Technology-Relative Computer-Aided Design System: Abstract Representations, Transformations, and Design Tradeoffs. In Proceedings of the 15th Design Automation Conference, ACM and IEEE, 1978, pp. 220-226.
27. Trickey, H. Flamel: A High-Level Hardware Compiler. IEEE Transactions on Computer-Aided Design CAD-6, 2 (March, 1987), 259-269.
28. Tseng, C. and Siewiorek, D.P. Automated Synthesis of Data Paths in Digital Systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems CAD-5, 3 (July, 1986), 379-395.
29. Zimmermann, G. MDS - The Mimola Design Method. Journal of Digital Systems 4, 3 (1980), 337-369.
