
AXILOG: ABSTRACTIONS FOR APPROXIMATE HARDWARE DESIGN AND REUSE

RELAXING THE TRADITIONAL ABSTRACTION OF “NEAR-PERFECT” ACCURACY IN HARDWARE

DESIGN CAN YIELD SIGNIFICANT GAINS IN EFFICIENCY, AREA, AND PERFORMANCE. AXILOG,

A SET OF LANGUAGE EXTENSIONS FOR VERILOG, PROVIDES THE NECESSARY SYNTAX AND

SEMANTICS FOR APPROXIMATE HARDWARE DESIGN AND REUSE, LETTING DESIGNERS

SAFELY RELAX ACCURACY REQUIREMENTS IN THE DESIGN WHILE KEEPING THE CRITICAL

PARTS STRICTLY PRECISE.

Several techniques have shown significant benefits with approximation at the circuit level,1–15 but they lack design abstractions that enable designers to methodically control which parts of a circuit can be approximated while keeping the critical parts precise. Thus, a need persists for approximate hardware description languages enabling systematic synthesis of approximate hardware. To meet this need, we introduce Axilog, a set of concise, intuitive, and high-level annotations that provide the necessary syntax and semantics for approximate hardware design and reuse in Verilog.

A key factor in our language formalism is to abstract away the details of approximation while maintaining the designer’s complete oversight in deciding which circuit elements can be synthesized approximately and which ones are critical and therefore cannot be approximated. Axilog also supports reusability across modules by providing a set of specific reuse annotations. In general, hardware system implementation relies on modular design practices in which engineers build libraries of modules and reuse them across complex hardware systems. In this article, we elaborate on the Axilog annotations for approximate hardware design and reuse. These annotations are coupled with a safety inference analysis (SIA) that automatically infers which circuit elements are safe to approximate with respect to the designer’s annotations. Axilog and safety analysis support approximate synthesis and are completely independent of the synthesis process.

To evaluate Axilog, we devised two synthesis processes. The first synthesis flow focuses on current technology nodes and leverages commercial tools. This synthesis process applies approximation by relaxing the timing constraints of the safe-to-approximate subcircuits. Results show that this synthesis flow provides, on average, 1.54× energy savings and 1.82× area reduction by allowing a 10 percent quality loss. The second synthesis flow studies the potential of approximate synthesis by using a probabilistic gate model for

Divya Mahajan

Georgia Institute of Technology

Kartik Ramkrishnan

Rudra Jariwala

University of Minnesota

Amir Yazdanbakhsh

Jongse Park

Bradley Thwaites

Anandhavel Nagendrakumar


Abbas Rahimi

University of California, San Diego

Hadi Esmaeilzadeh


Kia Bazargan

University of Minnesota

Published by the IEEE Computer Society 0272-1732/15/$31.00 © 2015 IEEE

future technology nodes. This synthesis flow provides, on average, 2.5× energy and 2.2× probabilistic CMOS (PCMOS) area reduction. Axilog yields these significant benefits while requiring only two to 12 annotations, even with complex designs containing up to 22,407 lines of code. These results confirm Axilog’s effectiveness in incorporating approximation in the hardware design cycle.

Approximate hardware design with Axilog

Our principal objectives for approximate hardware design with Axilog are to

Related Work in Approximation

A growing body of research shows the applicability and significant benefits of approximation.1–15 However, prior research has not explored extending hardware description languages for systematic and reusable approximate hardware design.

Approximate programming languages

EnerJ provides a set of type qualifiers to manually annotate all the approximate variables in the program.16 If we had extended EnerJ’s model to Verilog, the designer would have had to manually annotate all approximate wires and registers. Rely asks for manually marking both approximate variables and operations, which requires more annotations.17 With our annotations, the designer marks a few wires and registers, and then the analysis automatically infers which other connections and gates are safe to approximate.

Approximate circuit design and synthesis

Prior work proposes imprecise implementations of custom instructions and specific hardware blocks.3,4,6–9 Other recent work proposes algorithms for approximate synthesis that leverage gate pruning, timing speculation, or voltage overscaling.5,10–15 Although these synthesis techniques provide significant improvements, they do not focus on approximate hardware design and reuse. In fact, our framework can benefit from and leverage all these synthesis techniques.

References

1. L.N. Chakrapani et al., “Probabilistic System-on-a-Chip Architectures,” ACM Trans. Design Automation of Electronic Systems, vol. 12, no. 3, 2007, article 29.
2. H. Cho, L. Leem, and S. Mitra, “ERSA: Error Resilient System Architecture for Probabilistic Applications,” IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 31, no. 4, 2012, pp. 546–558.
3. V. Gupta et al., “IMPACT: Imprecise Adders for Low-Power Approximate Computing,” Proc. 17th IEEE/ACM Int’l Symp. Low-Power Electronics and Design, 2011, pp. 409–414.
4. R. Ye et al., “On Reconfiguration-Oriented Approximate Adder Design and its Application,” Proc. Int’l Conf. Computer-Aided Design, 2013, pp. 48–54.
5. D. Shin and S.K. Gupta, “Approximate Logic Synthesis for Error Tolerant Applications,” Proc. Conf. Design, Automation, and Test in Europe, 2010, pp. 957–960.
6. P. Kulkarni, P. Gupta, and M. Ercegovac, “Trading Accuracy for Power with an Underdesigned Multiplier Architecture,” Proc. 24th Int’l Conf. VLSI Design, 2011, pp. 346–351.
7. A. Kahng and S. Kang, “Accuracy-Configurable Adder for Approximate Arithmetic Designs,” Proc. 49th Ann. Design Automation Conf., 2012, pp. 820–825.
8. D. Mohapatra et al., “Design of Voltage-Scalable Meta-Functions for Approximate Computing,” Proc. Int’l Conf. Computer-Aided Design, 2011, pp. 667–673.
9. S.-L. Lu, “Speeding Up Processing with Approximation Circuits,” Computer, vol. 37, no. 3, 2004, pp. 67–73.
10. S. Venkataramani et al., “SALSA: Systematic Logic Synthesis of Approximate Circuits,” Proc. 49th Ann. Design Automation Conf., 2012, pp. 796–801.
11. K. Nepal et al., “ABACUS: A Technique for Automated Behavioral Synthesis of Approximate Computing Circuits,” Proc. Conf. Design, Automation, and Test in Europe, 2014, article 361.
12. Y. Liu et al., “On Logic Synthesis for Timing Speculation,” Proc. Int’l Conf. Computer-Aided Design, 2012, pp. 591–596.
13. A. Lingamneni et al., “Algorithmic Methodologies for Ultra-Efficient Inexact Architectures for Sustaining Technology Scaling,” Proc. 9th Conf. Computing Frontiers, 2012, pp. 3–12.
14. J. Miao, A. Gerstlauer, and M. Orshansky, “Approximate Logic Synthesis under General Error Magnitude and Frequency Constraints,” Proc. Int’l Conf. Computer-Aided Design, 2013, pp. 779–786.
15. S. Ramasubramanian et al., “Relax-and-Retime: A Methodology for Energy-Efficient Recovery Based Design,” Proc. Int’l Conf. Computer-Aided Design, 2013, pp. 779–786.
16. A. Sampson et al., “EnerJ: Approximate Data Types for Safe and General Low-Power Computation,” Proc. 32nd ACM Sigplan Conf. Programming Language Design and Implementation, 2011, pp. 164–174.
17. M. Carbin, S. Misailovic, and M. Rinard, “Verifying Quantitative Reliability of Programs that Execute on Unreliable Hardware,” Proc. ACM Sigplan Int’l Conf. Object Oriented Programming Systems Languages & Applications, 2013, pp. 33–52.

SEPTEMBER/OCTOBER 2015

• craft a small number of Verilog annotations that provide designers with complete oversight over the approximation process;

• minimize the number of manual annotations while relying on SIA to automatically infer the designer’s intent for approximation, thereby relieving the designer of the details of the approximate synthesis process; and

• support the reuse of Axilog modules across different designs without the need for reimplementation.

Furthermore, Axilog is a backward-compatible extension of Verilog. That is, Axilog code with no annotations is normal Verilog code. To this end, Axilog provides two sets of language extensions, one for the design and one for the reuse of hardware modules. Table 1 summarizes the syntax for the design and reuse annotations.

The annotations for design dictate which operations and connections are safe to approximate in the module. Henceforth, for brevity, we refer to operations and connections as design elements. The annotations for reuse let designers use the annotated approximate modules across various designs without any reimplementation. We provide detailed examples to illustrate how designers can appropriately relax or restrict the approximation in hardware modules. In the examples, we use background shading to highlight the safe-to-approximate elements inferred by the analysis.

Design annotations relaxing accuracy requirements

By default, all design elements are precise. The designer can use the relax(arg) statement to implicitly approximate a subset of these elements. The variable arg is either a wire, reg, output, or inout. Design elements that exclusively affect signals designated by the relax annotation are safe to approximate. The following example illustrates the use of relax:

module full_adder(a, b, c_in, c_out, s);
  input a, b, c_in;
  output c_out;
  approximate output s;
  assign s = a ^ b ^ c_in;
  // carry out: majority of a, b, and c_in
  assign c_out = (a & b) | (b & c_in) | (a & c_in);
  relax(s);
endmodule

In this full adder example, the relax(s) statement implies that the analysis can automatically approximate the XOR operations. The unannotated c_out signal and the logic generating it are not approximated. Furthermore, because s will carry relaxed semantics, its corresponding output is marked with the approximate annotation that is necessary for reusing modules. With these annotations and the automated analysis, the designer does not need to individually declare the inputs (a, b, c_in) or any of the XOR (^) operations as approximate. Thus, while designing approximate

Table 1. Summary of Axilog’s language syntax.

Phase | Annotation | Argument | Description
Design | relax | Wire, reg, output, inout | Declare an argument as safe to approximate. Design elements that affect the argument are safe to approximate.
Design | relax_local | Wire, reg, output, inout | Similar to relax, but the approximation does not cross module boundaries.
Design | restrict | Wire, reg, output, inout | Any design element that affects the argument is made precise unless explicitly relaxed.
Design | restrict_global | Wire, reg, output, inout | All the design elements affecting the argument are precise.
Reuse | approximate | Output, inout | Indicates that the output carries relaxed semantics.
Reuse | critical | Input | Indicates the input is critical and approximate elements cannot drive it.
Reuse | bridge | Wire, reg | Allow connecting an approximate element to a critical input.

ALTERNATIVE COMPUTING TECHNOLOGIES

IEEE MICRO

hardware modules, this abstraction significantly reduces the burden on the designer to understand and analyze complex dataflows within the circuit.

Scope of approximation. The scope of the relax annotation crosses the boundaries of instantiated modules, as shown in Figure 1.

The relax(x) annotation in the nand_gate module in Figure 1a implies that the AND (&) operation in the and_gate module is safe to approximate. In some cases, the designer might not want the approximation to cross the scope of the instantiated modules. For these cases, Axilog provides the relax_local annotation, which does not cross module boundaries.

The code in Figure 1b shows that the relax_local annotation does not affect the semantics of the instantiated and_gate module, a1. However, the NOT (~) operation, which shares the scope of the relax_local annotation, is safe to approximate. The scope of approximation of relax and relax_local is the module in which they are declared.

Restricting approximation. In some cases, the designer might want to explicitly restrict approximation in certain parts of the design. Axilog provides the restrict(arg) annotation, which ensures that any design element affecting the annotated argument (arg) is precise, unless a preceding relax or relax_local annotation has made the driving elements safe to approximate. The restrict annotation crosses the boundary of instantiated modules.

Restricting approximation globally. In some cases, the designer might intend to override preceding relax annotations. For instance, the designer might intend to keep precise certain design elements that are used to drive critical signals, such as the control signals for a state machine, write enable of registers, address lines of a memory module, or even clock and reset. To ensure the precision of these signals, Axilog provides the restrict_global annotation, which has precedence over relax and relax_local. The restrict_global(arg) annotation penetrates through module boundaries and ensures that any design element that affects arg is not approximated.

Reuse annotations

Our principal idea for these language abstractions is to maximize the reusability of the approximate modules across designs that might have different accuracy requirements.

Outputs carrying approximate semantics. As we mentioned earlier, designers can use annotations to selectively approximate design elements in a module. The reusing designer must be aware of the accuracy semantics of the I/O ports without delving into the details of the module. To enable the reusing designer to view the port semantics, Axilog requires that

module and_gate(n, a, b);
  input a, b; output n;
  assign n = a & b;
endmodule

module nand_gate(x, a, b);
  input a, b;
  approximate output x;
  wire w0;
  and_gate a1(w0, a, b);
  assign x = ~w0; relax(x);
endmodule

(a)


module and_gate(n, a, b);
  input a, b; output n;
  assign n = a & b;
endmodule

module nand_gate(x, a, b);
  input a, b;
  approximate output x;
  wire w0;
  and_gate a1(w0, a, b);
  assign x = ~w0;
  relax_local(x);
endmodule

(b)

Figure 1. Code segments showing the difference in the scope of the relax and relax_local annotations. (a) Scope of the relax annotation. (b) Scope of the relax_local annotation.



all output ports that might be influenced by approximation be marked as approximate. The two code snippets in Figure 2 illustrate the necessity of the approximate annotation. In Figure 2a, output n carries relaxed semantics due to the relax annotation and is therefore declared as an approximate output. Consequently, the a1 instance in the nand_gate module will cause its x output to be relaxed. Therefore, x is marked as an approximate output.

In Figure 2b, the x output is explicitly relaxed, and x is marked as an approximate output. The and_gate module here does not carry approximate semantics by default. Therefore, the output of the and_gate is not marked as approximate, because the approximation is limited to the a1 instance.

Critical inputs. A designer might want to prevent approximation from affecting certain inputs, which are critical to the circuit’s functionality. To mark these input ports, Axilog provides the critical annotation. Wires that carry approximate semantics cannot drive the critical inputs without the designer’s explicit permission at the time of reuse.

Bridging approximate wires to critical inputs. We recognize that there may be cases when the reusing designer entrusts a critical input with an approximate driver. For such situations, Axilog provides an annotation called bridge that shows the designer’s explicit intent to drive a critical input by an approximate signal.
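The interplay of the approximate, critical, and bridge annotations can be read as a simple connectivity check at reuse time. The following Python sketch illustrates that reading; it is not part of Axilog or its compiler, and the function name, the wire and port names, and the data layout are all assumptions made for illustration.

```python
def unbridged_critical_drivers(connections, approximate, critical, bridged):
    """List (wire, port) pairs where an approximate wire drives a critical
    input without an explicit bridge. All names here are illustrative."""
    return [
        (wire, port)
        for wire, port in connections
        if port in critical and wire in approximate and wire not in bridged
    ]


# Hypothetical top-level wiring: (driving wire, input port of an instance).
connections = [("sum", "fifo.data"), ("sum", "fifo.wr_en"), ("ctrl", "fifo.rst")]
approximate = {"sum"}                   # wires carrying relaxed semantics
critical = {"fifo.wr_en", "fifo.rst"}   # ports declared as critical inputs

# Without a bridge, the approximate wire driving a critical input is flagged.
violations = unbridged_critical_drivers(connections, approximate, critical, bridged=set())
# Declaring the wire as bridged records the designer's explicit intent.
allowed = unbridged_critical_drivers(connections, approximate, critical, bridged={"sum"})
```

In this toy wiring, only the connection from sum to fifo.wr_en is reported, and it disappears once sum is listed as bridged.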

Summary

The semantics of the relax and restrict annotations provide abstractions for designing approximate hardware modules while enabling Axilog to provide formal guarantees of safety that approximation will be restricted to only those design elements that the designer specifically selected. Moreover, the approximate output, critical input, and bridge annotations enable reusability of modules across different designs. In addition to the modularity, the design and reuse annotations enable approximation polymorphism, implying that the modules with approximate semantics can be used in a precise manner, and vice versa, without any reimplementation. These abstractions naturally extend current hardware-design practices and let designers apply approximation with full control without adding substantial overhead to the conventional hardware design and verification cycle.

Safety inference analysis

After the designer provides annotations, the compiler must perform a static analysis to find the approximate and precise design elements in accordance with these annotations. We present the SIA, a static analysis that identifies these safe-to-approximate design


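module and_gate(n, a, b);
  input a, b;
  approximate output n;
  assign n = a & b;
  relax(n);
endmodule

module nand_gate(x, a, b);
  input a, b;
  approximate output x;
  wire w0;
  and_gate a1(w0, a, b);
  assign x = ~w0;
endmodule

(a)

module and_gate(n, a, b);
  input a, b;
  output n;
  assign n = a & b;
endmodule

module nand_gate(x, a, b);
  input a, b;
  approximate output x;
  wire w0;
  and_gate a1(w0, a, b);
  assign x = ~w0;
  relax(x);
endmodule

(b)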

Figure 2. Code segments showing the necessity of the approximate annotation. (a) Approximate output when the relax annotation is applied within the submodule (and_gate). (b) Approximate output when the relax annotation is applied within the main module (nand_gate).


elements. The design elements are organized primarily according to the circuit’s structure and not necessarily according to the order of the statements in the HDL source code. This is a fundamental property of Verilog that Axilog inherits. Thus, we first translate the RTL design to primitive gates while maintaining the module boundaries. Then, we apply the SIA after the code is translated to primitive gates and the structure of the circuit is identified. Consequently, the SIA can apply all the annotations while considering the circuit’s structure. The SIA is a backward slicing algorithm that starts from the annotated wires and iteratively traverses the circuit to identify which wires must carry precise semantics. Subtracting the set of precise wires from all the wires in the circuit yields the safe-to-approximate set of wires. The gates that immediately drive these safe-to-approximate wires are the ones that the synthesis engine can approximate. Figure 3 illustrates the procedure that identifies the precise wires.

This procedure is a backward-flow analysis that has three phases. The first phase identifies the sink wires, which are either unannotated outputs or wires explicitly annotated with restrict. The procedure then identifies the gates that are driving these sink wires and adds their input wires to the precise set. The algorithm repeats this step for the newly added wires until it reaches an input or an explicitly relaxed wire. However, this phase is limited to the scope of the module under analysis.

The second phase identifies the relaxed outputs of the instantiated submodules. Because of the semantic differences between relax and relax_local, a submodule’s output will be considered relaxed if two conditions are satisfied:

• the output drives another explicitly relaxed wire, which is not inferred due to a relax_local annotation; and

• the output is not driving a wire already identified as precise.

The algorithm automatically annotates these qualifying outputs as relaxed. The analysis repeats these two phases for all the instantiated submodules. For correct functionality of this analysis, all the module instantiations are distinct entities in the set M and are ordered hierarchically.

In the final phase, the algorithm marks any wire that affects a globally restricted wire as precise. Finally, the SIA identifies the safe-to-approximate subset of the gates and wires with regard to the designer annotations. An approximation-aware synthesis tool can then generate an optimized netlist.
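The backward slicing behind the first and final phases can be sketched in Python for a single flattened module. This is a minimal sketch under stated assumptions, not the authors’ implementation: the netlist encoding (a dict mapping each driven wire to the input wires of its driving gate), the function name precise_wires, and the example full adder netlist are hypothetical, and the submodule handling of the second phase is omitted.

```python
from collections import deque


def precise_wires(drivers, inputs, outputs, relaxed,
                  restricted=(), globally_restricted=()):
    """Return the wires that must stay precise (illustrative helper).

    drivers maps each driven wire to the input wires of its driving gate.
    """
    precise = set()

    # Phase 1: sinks are unannotated outputs plus explicitly restricted wires.
    sink = deque([w for w in outputs if w not in relaxed] + list(restricted))
    seen = set()
    while sink:
        w = sink.popleft()
        if w in seen:
            continue
        seen.add(w)
        # The slice stops at module inputs and at explicitly relaxed wires.
        if w in inputs or w in relaxed:
            continue
        precise.add(w)
        sink.extend(drivers.get(w, []))

    # Phase 3: restrict_global takes precedence, so this slice ignores relax.
    queue = deque(globally_restricted)
    visited = set()
    while queue:
        w = queue.popleft()
        if w in visited:
            continue
        visited.add(w)
        precise.add(w)
        queue.extend(drivers.get(w, []))
    return precise


# Hypothetical gate-level full adder: t1 feeds both the sum and the carry.
drivers = {
    "t1": ["a", "b"], "s": ["t1", "c_in"],
    "t2": ["a", "b"], "t3": ["t1", "c_in"], "c_out": ["t2", "t3"],
}
inputs = {"a", "b", "c_in"}
precise = precise_wires(drivers, inputs, ["s", "c_out"], relaxed={"s"})
safe = set(drivers) - precise  # wires whose driving gates may be approximated
```

In this netlist only s ends up safe to approximate: t1 also feeds the unannotated c_out, so it does not exclusively affect the relaxed signal and stays precise.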

Approximate synthesis

In our framework, approximate synthesis involves two stages. In the first stage, annotated Verilog source code is converted to a precise gate-level netlist while preserving the approximate annotations. The SIA then identifies the safe-to-approximate subset of the design based on designer annotations. In the second stage, the synthesis tool applies approximate synthesis and optimization techniques only to the safe-to-approximate subset of the circuit elements. The tool may apply any approximate optimization technique (including gate substitution, gate elimination, logic restructuring, voltage overscaling, and timing speculation) as it deems prudent. The objective is to minimize a combination of error, delay, energy, and area considering final quality requirements. As Figure 4 shows, we developed two approximate synthesis flows to evaluate Axilog.

AST: Approximate synthesis through relaxing timing constraints

The AST synthesis flow is applicable to current technology nodes and leverages commercial synthesis tools. As Figure 4a shows, we first use the Synopsys Design Compiler to synthesize the design with no approximation. We perform a multiobjective optimization targeting the highest frequency while minimizing power and area. We will refer to the resulting netlist as the baseline netlist and its frequency as the baseline frequency. We account for variability using Synopsys PrimeTimeVX, which, given timing constraints, provides the probability of timing violations due to variations. In case of violation, the synthesis process is repeated by adjusting timing constraints until PrimeTimeVX confirms no violations. Second, as Figure 4b shows, we relax the timing constraints only for the safe-to-approximate paths. We then extract the



Inputs: M: Set of all the ordered modules within the circuit
        ℝ: Queue of all the globally restricted wires
Output: ℙ: Set of precise wires

Initialize ℙ ← ∅
for each mi ∈ M do
    I: Set of all input ports in mi
    A: Set of all wires annotated as relaxed wires in mi
    LA: Set of all wires annotated as locally relaxed wires in mi
    Sink: Queue of all explicitly restricted wires in mi ∪ Set of unannotated output ports
    UW: Set of wires driven by modules which are instantiated within mi

    // Phase 1: This loop identifies the mi module’s local precise wires (wi)
    Initialize N ← ∅  // A set of relaxed wires in each module mi
    while (Sink ≠ ∅) do
        wi ← Sink.dequeue()
        if (wi ∉ I and wi ∉ (A ∪ LA)) then
            if (wi ∈ UW) then
                N.append(wi)
            else
                ℙ.append(wi)
            end if
            Sink.enqueue(for all input wires of gate wi in mi)
        end if
    end while

    // Phase 2: Identifying the relaxed wires (wj) that are driven by the mj submodules;
    // the mj submodules are the instantiated modules in mi
    for (wj ∈ UW) do
        if (wj ∈ N and wj drives wire ∈ A) then
            mj ← module driving the wire wj
            mj.A.append(wj)
        end if
    end for
end for

// Phase 3: Identifying the precise wires (wk) that are globally restricted
while (ℝ ≠ ∅) do
    wk ← ℝ.dequeue()
    ℙ.append(wk)
    ℝ.append(input wires of the gate that drive wk)
end while

Figure 3. The part of the safety inference analysis (SIA) that identifies precise wires according to the designer’s annotations.


post-synthesis gate delay information in Standard Delay Format and perform gate-level timing simulations with a set of input datasets.

We use the baseline frequency for the timing simulations even though some of the safe-to-approximate paths are synthesized with more timing slack. Timing simulations yield output values that may incur quality loss at the baseline frequency. We then measure the quality loss, and if the quality loss is higher than the designer’s requirements, we tighten the timing constraints on the safe-to-approximate paths. We repeat this step until the quality requirements are satisfied. This methodology could reduce energy and area by using slower and smaller gates for the paths that use relaxed timing constraints.
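The tighten-and-remeasure loop described above can be sketched in a few lines of Python. This is only an illustration of the control flow: the function name, the slack multiplier on the baseline clock period, the tightening step, and the toy quality model are assumptions, and in the real flow each measurement would involve resynthesis and gate-level timing simulation.

```python
def tighten_until_acceptable(measure_quality_loss, max_loss,
                             slack=2.0, step=0.8, min_slack=1.0):
    """Tighten the relaxed timing constraint until the quality target is met.

    slack is a hypothetical multiplier on the baseline clock period applied
    to the safe-to-approximate paths; 1.0 means no relaxation at all.
    """
    while True:
        loss = measure_quality_loss(slack)  # stands in for synthesis + simulation
        if loss <= max_loss or slack <= min_slack:
            return slack, loss
        slack = max(min_slack, slack * step)  # tighten the constraint


# Toy stand-in for the quality measurement: loss grows with the extra slack.
slack, loss = tighten_until_acceptable(lambda s: 0.2 * (s - 1.0), max_loss=0.10)
```

With this toy model the loop tries multipliers 2.0 and 1.6, then stops at about 1.28, the first setting whose quality loss falls under the 10 percent target.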

ASG: Approximate synthesis through gate resizing

The ASG synthesis flow studies the potential of approximate synthesis for future technology nodes. Because the characteristics of transistors and gates for future technologies are unknown, we assume that the probability of error for a gate is an inverse function of its size. As a result, gate size, referred to as the PCMOS area,16 should be treated as a proxy for the cost we would pay in a future technology node to get more robust gates. That cost could be thicker gate oxides, higher threshold voltage, and higher VDD to make the transistors more robust. The ASG synthesis flow applies approximation by selectively downsizing the gates, as shown in Figure 4c. In this framework, smaller gates dissipate less energy and have smaller PCMOS area, but they may generate incorrect output with some probability.

Probabilistic error models for gates. Owing to the unavailability of future nodes, we augment a currently available library, NanGate FreePDK 45 nm, with a probabilistic error model for all the gates in the library. The error model provides the probability of a bit

Figure 4. Synthesis flows for the baseline, for approximate synthesis through relaxing timing constraints (AST), and for approximate synthesis through gate resizing (ASG). (a) Synthesis flow for baseline error-free gate netlist. (b) Synthesis flow AST for safe-to-approximate gates. (c) Synthesis flow for ASG, which uses probabilistic models as a proxy for future nodes.



flip in the gate output. We use transistor-level Spice simulations to find the probability of an error at the gate output using the Cadence Virtuoso toolset. We take inspiration from the PCMOS models described by Cheemalavagu et al.16

We simulated each gate at different sizes and injected the gate inputs with Gaussian noise through a minimum-sized buffer. Gate error also depends on threshold voltage; however, we focused on gate sizing and its effects on gate error for a fixed threshold voltage. For each input combination, the noise was injected on gate inputs in the form of a piecewise linear voltage source, and the output was sampled for 10,000 inputs. Finally, we computed the probability of correct output as follows:

$$P_{\text{correct output}} = 1 - \frac{\text{Number of incorrect samples}}{\text{Total number of samples}}.$$

We repeat this measurement for all the input combinations of the gate and assign the gate the worst observed error. Next, we use this error model to optimize the circuit’s power and area by upsizing the fewest gates in a circuit while satisfying the designer-specified error requirements.
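As a small worked example of this bookkeeping, the Python sketch below computes the probability of correct output per input combination and takes the worst case as the gate’s error. The sample counts are invented for illustration and do not come from the Spice experiments; the function names are likewise hypothetical.

```python
def p_correct(incorrect, total):
    """Probability of a correct output for one input combination."""
    return 1.0 - incorrect / total


def worst_case_error(incorrect_by_input, total):
    """Gate error: the largest error probability over all input combinations."""
    return max(n / total for n in incorrect_by_input.values())


# Made-up counts of incorrect samples for a 2-input gate, 10,000 samples each.
counts = {(0, 0): 12, (0, 1): 30, (1, 0): 25, (1, 1): 80}
per_input = {bits: p_correct(n, 10_000) for bits, n in counts.items()}
gate_error = worst_case_error(counts, 10_000)
```

Here the input combination (1, 1) has the lowest probability of a correct output, so it determines the error assigned to the gate.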

Gate sizing optimization. The ASG optimization algorithm shown in Figure 5 trades off accuracy for reductions in PCMOS area and energy. We extended the Tilos algorithm17 to incorporate probabilistic models and changed the objective from minimizing delay to minimizing error and cost.

The ASG optimization algorithm comprises four phases. In the first phase, we extract the adjacency list (a space-efficient way of representing a circuit) of the safe-to-approximate subcircuit and determine its inputs and outputs.

In the second phase, the algorithm uses a Monte Carlo simulation to determine the error-free probability of obtaining a 1 or a 0 at each node of the subcircuit. For the Monte Carlo simulation, random input vectors are applied to the subcircuit’s inputs, and a topological traversal propagates the values through the circuit for each input vector. This process gives us the probability of getting a 1 or 0 at each gate’s output. We then initialize all gates in the safe-to-approximate subcircuit to their minimum size (that is, having maximum error). We calculate the initial error map at each gate’s output by propagating the error through the circuit using the Boolean Error Propagation (BEP) algorithm.18 The BEP algorithm then estimates the worst-case error probability for the design’s outputs using each gate’s error probability model. If the calculated output error is not within the error requirements, we enter phase 3.
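The Monte Carlo step of the second phase can be illustrated with a short Python sketch that applies random input vectors to a netlist listed in topological order and counts how often each gate output is 1. The netlist encoding, the gate names, the supported operations, and the trial count are assumptions for illustration; error propagation (BEP) is not modeled here.

```python
import random

OPS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}


def signal_probabilities(gates, inputs, trials=20000, seed=0):
    """Estimate the error-free probability of a 1 at every gate output.

    gates is a list of (output, op, (in1, in2)) in topological order.
    """
    rng = random.Random(seed)
    ones = {out: 0 for out, _, _ in gates}
    for _ in range(trials):
        # Apply one random input vector, then propagate in topological order.
        values = {w: rng.getrandbits(1) for w in inputs}
        for out, op, ins in gates:
            values[out] = OPS[op](*(values[i] for i in ins))
            ones[out] += values[out]
    return {out: count / trials for out, count in ones.items()}


# Hypothetical two-gate subcircuit: n1 = a AND b, n2 = n1 XOR c.
probs = signal_probabilities(
    [("n1", "AND", ("a", "b")), ("n2", "XOR", ("n1", "c"))],
    inputs=["a", "b", "c"],
)
```

For uniformly random inputs the estimates should approach 0.25 for n1 and 0.5 for n2; the probability of a 0 is one minus each value.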

In the third phase, for each safe-to-approximate output, we identify the gates driving that output, called the fan-in cone, and add them to the fan-in hash map β.

In the fourth phase, for each gate in the fan-in cone of a safe-to-approximate output, we calculate the sensitivity of the output error to that gate by temporarily increasing the gate’s size to the next possible size and calculating the ratio of decrease in error to increase in gate size. Finally, after calculating the sensitivity for each fan-in gate, we permanently upsize only the gate that shows the largest impact on the output error. We perform the BEP using the changed gate size and update the error map. We repeat the fourth phase for each safe-to-approximate output until user-specified error bounds are satisfied for each safe-to-approximate output.

The most computationally intensive part of the entire algorithm is phase 3’s BEP function, with a complexity of O(n³). We optimized this function and reduced its complexity to O(n²) by grouping gates together, which decreases its iteration count. These groups are resized together.
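The sensitivity-driven upsizing of the fourth phase can be sketched as a greedy loop in Python. This sketch makes several simplifying assumptions that are not in the article: each gate’s error probability is its minimum-size error divided by its size, the output is wrong whenever any gate in the fan-in cone errs (a crude stand-in for BEP), and sizes double at each step. The gate names and numbers are invented.

```python
def output_error(sizes, base_error):
    """Stand-in for BEP: the output is correct only if no fan-in gate errs."""
    p_ok = 1.0
    for gate, size in sizes.items():
        p_ok *= 1.0 - base_error[gate] / size  # assumed: error is inverse in size
    return 1.0 - p_ok


def greedy_upsize(base_error, bound, max_size=64):
    """Upsize the most sensitive gate until the output error meets the bound."""
    sizes = {gate: 1 for gate in base_error}  # start from minimum-size gates
    err = output_error(sizes, base_error)
    while err > bound:
        best, best_sens = None, 0.0
        for gate in sizes:
            if sizes[gate] * 2 > max_size:
                continue
            trial = dict(sizes)
            trial[gate] *= 2  # temporarily move to the next size
            # Sensitivity: decrease in error per unit increase in gate size.
            sens = (err - output_error(trial, base_error)) / (trial[gate] - sizes[gate])
            if sens > best_sens:
                best, best_sens = gate, sens
        if best is None:
            raise ValueError("error bound not reachable within max_size")
        sizes[best] *= 2  # permanently upsize the most sensitive gate
        err = output_error(sizes, base_error)
    return sizes, err


# Invented minimum-size error probabilities for a three-gate fan-in cone.
sizes, err = greedy_upsize({"g1": 0.08, "g2": 0.02, "g3": 0.01}, bound=0.05)
```

In this example only g1 is upsized (twice), which illustrates the goal of meeting the error bound while upsizing as few gates as possible.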

Evaluation

We evaluated Axilog and the approximate synthesis processes using a set of benchmark designs.

Benchmarks and code annotation

Table 2 lists the Verilog benchmarks. We used Axilog annotations to judiciously relax some of the circuit elements. The benchmarks span many domains, including arithmetic units, signal processing, robotics, machine learning, and image processing. Table 2 also includes the input datasets, application-specific quality metrics, number


Require: K: Netlist for the entire circuit
         Θ: Set of safe-to-approximate gates
         Σ: Error bound on the approximate output
Ensure:  ℜ: Different gate sizes for safe-to-approximate gates

Initialize ℜ ← ∅ {Minimum gate size}
Initialize Ψ ← ∅ {Monte Carlo simulation map}
Initialize γ ← ∅ {Error propagation map}
Initialize Π ← ∅ {Primary inputs of the safe-to-approximate circuit}
Initialize δ ← ∅ {Queue for primary inputs of the safe-to-approximate circuit}
Initialize Φ ← ∅ {Primary outputs of the safe-to-approximate circuit}
Initialize β ← ∅ {Fan-in hash map}

// Phase 1: Identifying inputs (Π) and outputs (Φ) of the safe-to-approximate subset of the circuit
for each mi ∈ Θ do
    if fanin_of mi ⊄ Θ then
        Π ← (Π ∪ {mi})
        enqueue(δ, mi)
    else if mi fanout ⊄ Θ then
        Φ ← (Φ ∪ {mi})
    end if
end for

// Phase 2: Performing Monte Carlo simulations to calculate the probability of 1 or 0 (Ψ) at every node
Ψ ← monte_carlo_simulation(δ, K, Θ, Ψ)

// Calculating the initial error map (γ) for every output node using Boolean Error Propagation
γ ← boolean_error_propagation(δ, K, Θ, Ψ, γ)

while (∃ wi ∈ Φ s.t. Σ(wi) < γ(wi)) do
    while (∃ wi ∈ Φ s.t. Σ(wi) < γ(wi)) do
        // Phase 3: Iteratively calculating the fan-in of every output node using back-propagation and adding the gates to (β)
        β ← Gates ∈ Φ that have a path to wi
        δ ← Primary inputs ∈ Φ that have a path to wi
        define m -999 // Max sensitivity initialized

        // Phase 4: Calculates the sensitivity of each gate to the output error and permanently resizes the gate with highest sensitivity
        G ← ∅
        for each yi ∈ β do
            if (sensitivity of yi > m) then
                m = sensitivity of yi
                G ← yi
            end if
        end for
        ℜ(G) ← ℜ(G)*2 // Up-size gate permanently
        γ ← boolean_error_propagation(δ, K, Θ, Ψ, γ)
    end while
end while

Figure 5. Gate-sizing algorithm for approximate synthesis through gate resizing (ASG) approximate synthesis flow. The algorithm upsizes the fewest gates in a circuit to reduce cost.



of lines, and number of Axilog annotations for design and reuse.

Axilog annotations

We annotated the benchmarks with the Axilog extensions. The designs were either downloaded from open-source IP providers or developed without any initial annotations. After development, we analyzed the source Verilog codes to identify safe-to-approximate parts. The last two columns of Table 2 show the number of design and reuse annotations for each benchmark. The number of annotations ranges from two for Brent-Kung with 352 lines to 12 for InverseK with 22,407 lines. The Axilog framework let us use only a handful of annotations to effectively approximate designs that are implemented with thousands of lines of Verilog.

The safe-to-approximate parts are more common in the benchmarks’ datapaths than in their control logic. For example, k-means involves a large number of multiplications and additions. We used the relax annotations to declare these arithmetic operations approximable; however, we used restrict to ensure the precision of all the control signals. For smaller benchmarks, such as Brent-Kung, Kogge-Stone, and Wallace Tree, we annotated only a subset of the least-significant output bits in order to limit the quality loss. We also annotated the benchmarks with reuse annotations. The last column in Table 2 lists the number of reuse annotations. Overall, one graduate student was able to annotate all the benchmarks within two days without being involved in their design. The intuitive nature of Axilog extensions makes annotating straightforward.
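Why relaxing only the least-significant output bits limits the quality loss can be seen from a short worked bound. The sketch assumes that errors are confined to the lowest k bits of an unsigned output while all higher bits stay precise; the function names are hypothetical.

```python
def max_abs_error(relaxed_lsbs: int) -> int:
    """Largest possible deviation when only the lowest `relaxed_lsbs` bits may be wrong."""
    return (1 << relaxed_lsbs) - 1

def worst_case_relative_error(exact: int, relaxed_lsbs: int) -> float:
    """Upper bound on the relative error for a nonzero exact result."""
    return max_abs_error(relaxed_lsbs) / exact

# Exhaustive check on one value: try every error pattern in the low 4 bits.
exact = 0b1011_0110
worst = max(abs((exact ^ mask) - exact) for mask in range(1 << 4))
```

With 4 relaxed bits the deviation can never exceed 15, regardless of the width of the output, so the relative error shrinks as the exact result grows.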

Application-specific quality metrics

Table 2 shows the application-specific error metrics to evaluate the quality loss due to approximation. Using application-specific quality metrics is commensurate with prior work on approximate computing and language design.19,20 In all cases, we compared the output of the original baseline application to the output of the approximated design.
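The two metrics named in Table 2 could be computed along the following lines. The article does not give their exact definitions, so both formulas are assumptions: average relative error is taken as the mean of |approximate − precise| / |precise| over outputs with a nonzero reference, and image difference as the root-mean-square pixel difference normalized by the maximum pixel value.

```python
import math

def average_relative_error(precise, approximate):
    """Mean of |approx - precise| / |precise|, skipping zero reference values."""
    pairs = [(p, a) for p, a in zip(precise, approximate) if p != 0]
    return sum(abs(a - p) / abs(p) for p, a in pairs) / len(pairs)

def image_difference(precise, approximate, max_value=255):
    """Root-mean-square pixel difference, normalized to the range [0, 1]."""
    mse = sum((a - p) ** 2 for p, a in zip(precise, approximate)) / len(precise)
    return math.sqrt(mse) / max_value

# Outputs of the baseline design versus the approximated design (made-up values).
are = average_relative_error([100, 200], [90, 210])
diff = image_difference([0, 128, 255], [0, 128, 255])
```

Either metric is compared against the designer's quality-loss limit, such as the 5 percent and 10 percent limits used in the evaluation.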

Table 2. Benchmarks, input datasets, and error metrics.

| Benchmark name | Domain | Input dataset | Quality metric | No. of lines | No. of design annotations | No. of reuse annotations |
|---|---|---|---|---|---|---|
| Brent-Kung (32-bit adder) | Arithmetic computation | 1,000,000 32-bit integers | Average relative error | 352 | 1 | 1 |
| FIR (8-bit finite impulse response filter) | Signal processing | 1,000,000 8-bit integers | Average relative error | 113 | 6 | 5 |
| ForwardK (forward kinematics for two-joint arm) | Robotics | 1,000,000 32-bit fixed-point values | Average relative error | 18,282 | 5 | 4 |
| InverseK (inverse kinematics for two-joint arm) | Robotics | 1,000,000 32-bit fixed-point values | Average relative error | 22,407 | 8 | 4 |
| k-means (K-means clustering) | Machine learning | 1,024 × 1,024 pixels color image | Image difference | 10,985 | 7 | 3 |
| Kogge-Stone (32-bit adder) | Arithmetic computation | 1,000,000 32-bit integers | Average relative error | 353 | 1 | 1 |
| Wallace Tree (32-bit multiplier) | Arithmetic computation | 1,000,000 32-bit integers | Average relative error | 13,928 | 5 | 3 |
| Neural Network (feedforward neural network) | Machine learning | 1,024 × 1,024 pixels color image | Image difference | 21,053 | 4 | 3 |
| Sobel (Sobel edge detector) | Image processing | 1,024 × 1,024 pixels color image | Image difference | 143 | 6 | 3 |


[Figure 6: bar charts with one bar group per benchmark (Brent-Kung, FIR, ForwardK, InverseK, k-means, Kogge-Stone, Wallace tree, Neural network, Sobel) plus the geometric mean. Panels (a), (b), (d), and (e) plot energy reduction, area reduction, proxy for energy reduction, and proxy for PCMOS area reduction for the limits Error ≤ 5% and Error ≤ 10%. Panel (c) lists percentages for two PVT corners.]

Panel (c) values:

| Benchmark | (SS, 0.81 V, 0°C) (%) | (SS, 0.81 V, 125°C) (%) |
|---|---|---|
| Brent-Kung | 34 | 32 |
| FIR | 11 | 7 |
| ForwardK | 78 | 72 |
| InverseK | 87 | 79 |
| k-means | 69 | 65 |
| Kogge-Stone | 24 | 21 |
| Wallace tree | 65 | 63 |
| Neural network | 83 | 72 |
| Sobel | 57 | 41 |
| Geomean | 54 | 48 |

Figure 6. Energy and area reduction for AST flow and energy and PCMOS area reduction for ASG flow. (a) Energy reduction for AST flow = (Precise circuit energy)/(Approximate circuit energy). (b) Area reduction for AST flow = (Precise circuit area)/(Approximate circuit area). (c) Energy reduction for ASG flow when the quality degradation limit is set to 10 percent for two different PVT corners. (d) Proxy for energy reduction = (Precise circuit energy)/(Approximate circuit energy). (e) Proxy for PCMOS area reduction = (Precise circuit PCMOS area)/(Approximate circuit PCMOS area).



Experimental results

Both synthesis techniques used Synopsys Design Compiler (G-2012.06-SP5) and Synopsys PrimeTime (F-2011.06-SP3-2) for synthesis flows and energy analysis, respectively.

AST evaluation. We used Cadence NC-Verilog (11.10-s062) for timing simulation with Standard Delay Format back annotations extracted from various operating corners. We used the Taiwan Semiconductor Manufacturing Company 45-nm multi-Vt standard cell libraries and reported the primary results for the slowest process, voltage, and temperature corner (Slow Slow, 0.81 V, 0°C). The AST approach generates approximate netlists for the current technology node and provides, on average, 1.45× energy and 1.8× area reduction for the 5 percent limit. With the 10 percent limit, the average energy and area gains grow to 1.54× and 1.82×, as shown in Figures 6a and 6b.
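The reduction factors reported here are ratios of the precise circuit's cost to the approximate circuit's cost, as the Figure 6 caption defines them. A minimal sketch of that arithmetic follows, assuming the averages are geometric means (as the Geomean bar in Figure 6 suggests); the energy numbers are invented for illustration and are not taken from the article.

```python
from math import prod

def reduction_factor(precise_cost, approximate_cost):
    """Reduction = (precise circuit cost) / (approximate circuit cost)."""
    return precise_cost / approximate_cost

def geometric_mean(factors):
    """Geometric mean of a list of positive reduction factors."""
    return prod(factors) ** (1.0 / len(factors))

# Hypothetical per-benchmark energy (arbitrary units); not data from the article.
precise = [10.0, 8.0, 20.0]
approximate = [5.0, 4.0, 16.0]

factors = [reduction_factor(p, a) for p, a in zip(precise, approximate)]
geomean = geometric_mean(factors)
```

A geometric mean is the usual choice for averaging ratios because it treats a 2× gain on one benchmark and a 0.5× loss on another as cancelling out.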

Benchmarks with a larger datapath, such as InverseK, Wallace Tree, Neural Network, and Sobel, provide a larger scope for approximation and are usually the ones that see larger benefits. The circuit’s structure also affects the potential benefits. For instance, Brent-Kung and Kogge-Stone adders benefit differently from approximation because of the structural differences in their logic trees. The finite impulse response (FIR) benchmark shows the smallest energy savings because it is a relatively small design that does not provide many opportunities for approximation. Nevertheless, FIR still achieves 11 percent energy savings and 7 percent area reduction with 10 percent quality loss, suggesting that even designs with limited opportunities for approximation can benefit significantly from Axilog.

We also evaluated our AST technique’s effectiveness in the presence of temperature variations for a full industrial range of 0 to 125°C. We measured the impact of temperature fluctuations on the energy benefits for the same relaxed designs. Figure 6c compares the energy benefits at the lower and higher temperatures (the quality loss limit is set to 10 percent). In this range of temperature variations, the average energy benefits range from 1.54× (at 0°C) to 1.48× (at 125°C). These results confirm our framework’s robustness; it yields significant benefits even when temperature varies.

ASG evaluation. For ASG, we used the NanGate FreePDK 45-nm multispeed standard cells library. The AST and ASG techniques use different libraries because the FreePDK 45-nm library allows Spice simulations required for the ASG flow. As we mentioned earlier, the ASG flow aims to study the trends in future technology nodes when gates might show probabilistic behavior. We developed PCMOS models with the available libraries at 45 nm. The area numbers reported here are the ones set by the PCMOS model to satisfy the fixed-gate robustness. These numbers do not necessarily correspond to actual area numbers in any future technology. The PCMOS area shows the relative cost savings across benchmarks and delineates the anticipated trends. As Figures 6d and 6e show, the ASG flow provides, on average, 2× energy and 1.9× PCMOS area reduction for the 5 percent error limit. With the 10 percent limit, the average energy and area gains grow to 2.5× and 2.2×.

Axilog’s automated analysis enables approximate hardware design and reuse without exposing the intricacies of synthesis and optimization. We aim to extend Axilog’s annotations to enable designers to specify their desired quality requirements. We also aim to refine the capabilities of the synthesis techniques to better control the approximation versus performance-energy tradeoff such that the designer’s quality requirements are met while maximizing the benefits from approximation. Furthermore, the ASG technique now has a complexity on the order of O(n²), and we aim to devise techniques that would reduce ASG’s computational complexity. Finally, we will make Axilog tools and benchmarks open source and available at www.act-lab.org/artifacts/axilog to further facilitate research and development in approximate hardware design.

Acknowledgments

We thank the anonymous reviewers for their insightful comments. This work was supported in part by Semiconductor Research Corporation (SRC) contract #2014-EP-2577, Qualcomm Innovation Fellowship, and a gift from Google.

References

1. L.N. Chakrapani et al., “Probabilistic System-on-a-Chip Architectures,” ACM Trans. Design Automation of Electronic Systems, vol. 12, no. 3, 2007, article 29.

2. H. Cho, L. Leem, and S. Mitra, “ERSA: Error Resilient System Architecture for Probabilistic Applications,” IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 31, no. 4, 2012, pp. 546–558.

3. V. Gupta et al., “IMPACT: Imprecise Adders for Low-Power Approximate Computing,” Proc. 17th IEEE/ACM Int’l Symp. Low-Power Electronics and Design, 2011, pp. 409–414.

4. R. Ye et al., “On Reconfiguration-Oriented Approximate Adder Design and its Application,” Proc. Int’l Conf. Computer-Aided Design, 2013, pp. 48–54.

5. D. Shin and S.K. Gupta, “Approximate Logic Synthesis for Error Tolerant Applications,” Proc. Conf. Design, Automation, and Test in Europe, 2010, pp. 957–960.

6. P. Kulkarni, P. Gupta, and M. Ercegovac, “Trading Accuracy for Power with an Underdesigned Multiplier Architecture,” Proc. 24th Int’l Conf. VLSI Design, 2011, pp. 346–351.

7. A. Kahng and S. Kang, “Accuracy-Configurable Adder for Approximate Arithmetic Designs,” Proc. 49th Ann. Design Automation Conf., 2012, pp. 820–825.

8. D. Mohapatra et al., “Design of Voltage-Scalable Meta-Functions for Approximate Computing,” Proc. Int’l Conf. Computer-

9. S.-L. Lu, “Speeding Up Processing with Approximation Circuits,” Computer, vol. 37, no. 3, 2004, pp. 67–73.

10. S. Venkataramani et al., “SALSA: Systematic Logic Synthesis of Approximate Circuits,” Proc. 49th Ann. Design Automation Conf., 2012, pp. 796–801.

11. K. Nepal et al., “ABACUS: A Technique for Automated Behavioral Synthesis of Approximate Computing Circuits,” Proc. Conf. Design, Automation, and Test in Europe, 2014, article 361.

12. Y. Liu et al., “On Logic Synthesis for Timing Speculation,” Proc. Int’l Conf. Computer-

13. A. Lingamneni et al., “Algorithmic Methodologies for Ultra-Efficient Inexact Architectures for Sustaining Technology Scaling,” Proc. 9th Conf. Computing Frontiers, 2012, pp. 3–12.

14. J. Miao, A. Gerstlauer, and M. Orshansky, “Approximate Logic Synthesis under General Error Magnitude and Frequency Constraints,” Proc. Int’l Conf. Computer-Aided Design, 2013, pp. 779–786.

15. S. Ramasubramanian et al., “Relax-and-Retime: A Methodology for Energy-Efficient Recovery Based Design,” Proc. Int’l Conf. Computer-Aided Design, 2013, pp. 779–786.

16. S. Cheemalavagu et al., “A Probabilistic CMOS Switch and Its Realization by Exploiting Noise,” Proc. IFIP Int’l, 2005; www.ece.rice.edu/~al4/visen/2005vlsisoc.pdf.

17. S.S. Sapatnekar et al., “An Exact Solution to the Transistor Sizing Problem for CMOS Circuits Using Convex Optimization,” IEEE Trans. Computer-Aided Design, vol. 12, no. 11, 1993, pp. 1621–1634.

18. N. Mohyuddin, P. Ehsan, and P. Massoud, “Probabilistic Error Propagation in a Logic Circuit Using the Boolean Difference Calculus,” Advanced Techniques in Logic Synthesis, Optimizations and Applications, 2011, pp. 359–381.

19. A. Sampson et al., “EnerJ: Approximate Data Types for Safe and General Low-Power Computation,” Proc. 32nd ACM Sigplan Conf. Programming Language Design and Implementation, 2011, pp. 164–174.

20. M. Carbin, S. Misailovic, and M. Rinard, “Verifying Quantitative Reliability of Programs that Execute on Unreliable Hardware,” Proc. ACM Sigplan Int’l Conf. Object Oriented Programming Systems Languages & Applications, 2013, pp. 33–52.

Divya Mahajan is a PhD student in the Computer Science Department at the Georgia Institute of Technology, where she works in the Alternate Computing Technologies Lab. Her research interests include computer architecture, microarchitecture design, and hardware acceleration. Mahajan has an MS in electrical and computer engineering from the University of Texas at Austin. Contact her at divya [email protected].

Kartik Ramkrishnan is a PhD candidate in the Department of Computer Science and Engineering at the University of Minnesota. His research focuses on the development of novel techniques for improving memory system performance and approximate circuit design. Ramkrishnan has an MS in electrical engineering from the University of Minnesota, Minneapolis. Contact him at [email protected].

Rudra Jariwala is a CAD Engineer at Apple. His research interests include approximate hardware design, low-power CAD algorithms, and integrated circuit design. Jariwala has an MS in electrical engineering from the University of Minnesota, Twin Cities. Contact him at [email protected].

Amir Yazdanbakhsh is a PhD student in the School of Computer Science at the Georgia Institute of Technology, where he works as a research assistant in the Alternative Computing Technologies Lab. His research interests include computer architecture, approximate general-purpose computing, mixed-signal accelerator design, machine learning, and neuromorphic computing. Yazdanbakhsh has an MS in computer engineering from the University of Wisconsin–Madison and an MS in electrical and computer engineering from the University of Tehran. He is a student member of IEEE and the ACM. Contact him at [email protected].

Jongse Park is a PhD student in the College of Computing at the Georgia Institute of Technology. His research interests include computer architecture, programming languages, and alternative computing technology. Park has an MS in computer science from the Korea Advanced Institute of Science and Technology. Contact him at [email protected].

Bradley Thwaites is a PhD student in the Department of Electrical and Computer Engineering at the Georgia Institute of Technology. His interests are in computer architecture, memory systems, operating systems, acceleration, and machine learning. Thwaites has an MS in electrical and computer engineering from the Georgia Institute of Technology. Contact him at [email protected].

Anandhavel Nagendrakumar is a product engineer at IM Flash Technologies. His research interests include VLSI design, computer architecture, and approximate computing. Nagendrakumar has an MS in electrical and computer engineering from the Georgia Institute of Technology, where he completed the work for this article. Contact him at [email protected].

Abbas Rahimi is a PhD candidate in the Department of Computer Science and Engineering at the University of California, San Diego. His research interests include resilient system design, approximate computing, neuro-inspired computing, and on-chip interconnections. Rahimi has a BS in computer engineering from the School of Electrical and Computer Engineering at the University of Tehran. Contact him at [email protected].

Hadi Esmaeilzadeh is the inaugural holder of the Catherine M. and James E. Allchin Early Career Professorship in the School of Computer Science at the Georgia Institute of Technology. He is also the founder and director of the Alternative Computing Technologies Lab. His research focuses on developing new technologies and cross-stack solutions to improve the performance and energy efficiency of computer systems for emerging applications. Esmaeilzadeh has a PhD in computer science and engineering from the University of Washington. Contact him at [email protected].

Kia Bazargan is an associate professor in the Electrical and Computer Engineering Department at the University of Minnesota. His research interests include FPGA architectures, physical design for FPGAs, stochastic computing, and FPGA applications. Bazargan has a PhD in electrical and computer engineering from Northwestern University. Contact him at [email protected].
