
Low-Power and Area-Efficient Carry Select Adder

Introduction:

Addition is a crucial arithmetic function that strongly influences the overall performance of digital systems. Adders are among the most widely used blocks in electronic applications: they appear in multipliers and in DSP hardware executing algorithms such as FFT, FIR, and IIR filtering. Wherever multiplication is involved, adders come into the picture. Microprocessors execute millions of instructions per second, so speed of operation is the most important constraint to consider when designing multipliers. At the same time, device portability demands a high degree of miniaturization and low power consumption; devices such as mobile phones and laptops require long battery backup.

So, a VLSI designer has to optimize all three parameters (speed, power, and area) in a design. Because these constraints are difficult to satisfy simultaneously, some compromise between them has to be made depending on the demand or application. The ripple carry adder exhibits the most compact design but is the slowest, whereas the carry lookahead adder is the fastest but consumes more area [2]. Carry select adders act as a compromise between the two. In 2002, Wang et al. presented a new concept of hybrid adders to speed up addition, giving a hybrid carry lookahead/carry select adder design [7]. In 2008, low-power multipliers based on new hybrid full adders were presented in [8].

Design of area- and power-efficient high-speed data path logic systems is one of the most substantial areas of research in VLSI system design. In digital adders, the speed of addition is limited by the time required to propagate a carry through the adder. The sum for each bit position in an elementary adder is generated sequentially only after the previous bit position has been summed and a carry propagated into the next position.

The CSLA is used in many computational systems to alleviate the problem of carry propagation delay by independently generating multiple carries and then selecting a carry to generate the sum [1]. However, the CSLA is not area efficient because it uses multiple pairs of Ripple Carry Adders (RCAs) to generate partial sums and carries by considering carry inputs Cin = 0 and Cin = 1; the final sum and carry are then selected by multiplexers (mux).

2. FAST ADDERS

2.1 Ripple Carry Adder

An N-bit ripple carry adder is formed by concatenating N full adders, where the carry out of each full adder becomes the carry in of the next. As the carry ripples from one full adder to the next, it traverses the longest critical path and exhibits the worst-case delay. Sum and carry are calculated according to the following equations:

Si = Ai xor Bi xor Ci

Ci+1 = Ai Bi + (Ai + Bi) Ci, where i = 0, 1, ..., N-1

The RCA is the slowest of all adders (O(n) time) but it is very compact in size (O(n) area). If the ripple carry adder is implemented by concatenating N full adders, the delay of such an adder is 2N gate delays from Cin to Cout, so the delay increases linearly with the number of bits. The block diagram of the RCA is shown in Figure 1.
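The rippling behavior can be sketched in a few lines of Python; the model below is illustrative (not part of the original design) and applies the sum and carry equations above bit by bit:

```python
def full_adder(a, b, cin):
    """One-bit full adder: S = A xor B xor C, C+1 = AB + (A + B)C."""
    s = a ^ b ^ cin
    cout = (a & b) | ((a | b) & cin)
    return s, cout

def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first) by rippling
    the carry through N chained full adders."""
    sum_bits, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry

# 4-bit example (LSB first): 0110 (6) + 0111 (7) = 1101 (13)
s, cout = ripple_carry_add([0, 1, 1, 0], [1, 1, 1, 0])
print(s, cout)  # [1, 0, 1, 1] 0
```

Because each iteration needs the carry from the previous one, the loop mirrors the hardware critical path: the last sum bit cannot be produced until the carry has traversed every earlier stage.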

Figure 1: Block diagram of RCA

2.2 Carry Skip Adder (CSKA)

A carry skip adder divides the words to be added into groups of equal size, k bits each. Carry propagate signals pi may be used within a group of bits to accelerate the carry propagation: if all the pi signals within the group are 1, the carry bypasses the entire group, as shown in Figure 2. The group propagate is

P = pi · pi+1 · pi+2 · ... · pi+k-1

In this way the delay is reduced compared with a ripple carry adder. The worst-case carry propagation delay in an N-bit carry skip adder with fixed block width b, assuming that one stage of ripple has the same delay as one skip, can be derived as:

TCSKA = (b - 1) + 0.5 + (N/b - 2) + (b - 1) = 2b + N/b - 3.5 stages
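As a sanity check on the delay expression, the following Python sketch (illustrative; the block widths are chosen arbitrarily) evaluates TCSKA term by term for a 32-bit adder:

```python
def t_cska(n, b):
    """Worst-case carry-skip delay in stages:
    (b-1) ripple in the first block, 0.5 to enter the skip chain,
    (n/b - 2) skips, and (b-1) ripple in the last block."""
    return (b - 1) + 0.5 + (n / b - 2) + (b - 1)

for b in (2, 4, 8, 16):
    print(b, t_cska(32, b))
# Minimizing 2b + N/b - 3.5 over b gives b = sqrt(N/2) = 4 for N = 32,
# where the delay is 12.5 stages.
```

The table it prints shows the expected U-shape: both very narrow and very wide blocks lengthen the critical path, which is the motivation for the variable block sizing discussed next.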

Block width strongly affects the latency of the adder: the ripple delay within a block grows with the block width, while a smaller block width means more blocks and hence more skip stages. The idea behind the Variable Block Adder (VBA) is to minimize the critical path delay in the carry chain of a carry skip adder while allowing the groups to take different sizes. In a carry skip adder, such a partition results in a greater number of skips between stages.

Such an adder design is called a variable block design, and it is widely used to speed up the adder. In the variable block carry skip adder design, a 32-bit adder is divided into 4 blocks or groups. The bit widths of the groups are taken as follows: the first block is 4 bits, the second is 6 bits, the third is 18 bits, and the last group consists of the 4 most significant bits.

Table 1 shows the logic utilization of the conventional and variable-block 32-bit carry skip adders; the measured power and delay are also given in the table. From the table it can be observed that the variable block design consumes more area, as its gate count and number of LUTs exceed those of the conventional carry skip adder.

2.3 Carry Select Adder (CSA)

The carry select adder belongs to the category of conditional sum adders, which operate on a simple condition: sum and carry are calculated for both possible values of the input carry (1 and 0) before the actual input carry arrives. When the actual carry input arrives, the corresponding precomputed values of sum and carry are selected using a multiplexer. The conventional carry select adder consists of a single k/2-bit adder for the lower half of the bits (the least significant bits) and two k/2-bit adders for the upper half (the most significant bits, MSBs). Of the two MSB adders, one assumes a carry input of one and the other a carry input of zero. The carry out calculated from the last stage of the least significant half is used to select the actual values of output carry and sum; the selection is done by a multiplexer. This technique of dividing the adder into stages increases the area utilization, but the addition operation becomes faster. The block diagram of the conventional k-bit adder is shown in Figure 3.
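A behavioral sketch of this scheme in Python (illustrative; bit lists are LSB first, and the multiplexer is modeled as a conditional):

```python
def ripple_add(a, b, cin):
    """LSB-first ripple carry addition; returns (sum_bits, carry_out)."""
    out, c = [], cin
    for x, y in zip(a, b):
        out.append(x ^ y ^ c)
        c = (x & y) | ((x | y) & c)
    return out, c

def carry_select_add(a, b, cin=0):
    """k-bit conventional carry select adder: the lower k/2 bits use one
    ripple adder; the upper k/2 bits are computed twice, once for carry-in
    0 and once for carry-in 1, and a mux picks the right pair."""
    k = len(a)
    lo_sum, lo_carry = ripple_add(a[:k // 2], b[:k // 2], cin)
    hi0, c0 = ripple_add(a[k // 2:], b[k // 2:], 0)  # assume carry-in 0
    hi1, c1 = ripple_add(a[k // 2:], b[k // 2:], 1)  # assume carry-in 1
    hi_sum, cout = (hi1, c1) if lo_carry else (hi0, c0)  # the "mux"
    return lo_sum + hi_sum, cout

# 8-bit check against plain integer addition: 255 + 1 = 256
s, c = carry_select_add([1] * 8, [1, 0, 0, 0, 0, 0, 0, 0])
print(s, c)  # [0, 0, 0, 0, 0, 0, 0, 0] 1
```

The two upper-half additions run in parallel in hardware, which is exactly where the speed comes from; the duplicated adder is also exactly where the extra area goes.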

Basics: In electronics, a carry-select adder is a particular way to implement an adder, a logic element that computes the (n+1)-bit sum of two n-bit numbers. The carry-select adder is simple but rather fast, having a gate-level depth of O(√n). It generally consists of two ripple carry adders and a multiplexer. Adding two n-bit numbers with a carry-select adder is done with two adders (therefore two ripple carry adders) in order to perform the calculation twice, one time with the assumption of the carry being zero and the other assuming one. After the two results are calculated, the correct sum, as well as the correct carry, is selected with the multiplexer once the correct carry is known. The number of bits in each carry-select block can be uniform or variable. In the uniform case, the optimal delay occurs for a block size of √n. When variable, the block size should have a delay, from addition inputs A and B to the carry out, equal to that of the multiplexer chain leading into it, so that the carry out is calculated just in time. The O(√n) delay is derived from uniform sizing, where the ideal number of full-adder elements per block is equal to the square root of the number of bits being added, since that yields an equal number of mux delays.

Adder: In electronics, an adder or summer is a digital circuit that performs addition of numbers. In many computers and other kinds of processors, adders are used not only in the arithmetic logic unit(s) but also in other parts of the processor, where they are used to calculate addresses, table indices, and the like. Although adders can be constructed for many numerical representations, such as binary-coded decimal or excess-3, the most common adders operate on binary numbers. In cases where two's complement or ones' complement is used to represent negative numbers, it is trivial to modify an adder into an adder-subtractor. Other signed number representations require a more complex adder.

Half adder

Half adder logic diagram

The half adder adds two one-bit binary numbers A and B. It has two outputs, S and C (the value theoretically carried on to the next addition); the final sum is 2C + S. The simplest half-adder design incorporates an XOR gate for S and an AND gate for C. With the addition of an OR gate to combine their carry outputs, two half adders can be combined to make a full adder. [1]

Full adder

Schematic symbol for a 1-bit full adder, with Cin and Cout drawn on the sides of the block to emphasize their use in a multi-bit adder

A full adder adds binary numbers and accounts for values carried in as well as out. A one-bit full adder adds three one-bit numbers, often written as A, B, and Cin; A and B are the operands, and Cin is a bit carried in from the next less significant stage. [2] The full adder is usually a component in a cascade of adders that add 8-, 16-, 32-bit, etc. binary numbers. The circuit produces a two-bit output typically represented by the signals Cout and S, where sum = 2·Cout + S. The one-bit full adder's truth table is:

Full-adder logic diagram

Inputs      | Outputs
A  B  Cin   | Cout  S
0  0  0     | 0     0
1  0  0     | 0     1
0  1  0     | 0     1
1  1  0     | 1     0
0  0  1     | 0     1
1  0  1     | 1     0
0  1  1     | 1     0
1  1  1     | 1     1

A full adder can be implemented in many different ways, such as with a custom transistor-level circuit or composed of other gates. One example implementation uses S = A ^ B ^ Cin and Cout = (A & B) + (Cin & (A ^ B)). In this implementation, the final OR gate before the carry-out output may be replaced by an XOR gate without altering the resulting logic. Using only two types of gates is convenient if the circuit is being implemented using simple IC chips which contain only one gate type per chip; in this light, Cout can be implemented as (A & B) ^ (Cin & (A ^ B)). A full adder can be constructed from two half adders by connecting A and B to the input of one half adder, connecting its sum to an input of the second half adder, connecting Cin to the other input, and ORing the two carry outputs. Equivalently, S could be made the three-input XOR of A, B, and Cin, and Cout could be made the three-input majority function of A, B, and Cin.

More complex adders

Ripple carry adder
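The two-half-adder construction can be checked exhaustively with a short Python model (illustrative):

```python
def half_adder(a, b):
    """Half adder: S = A xor B, C = A and B."""
    return a ^ b, a & b

def full_adder(a, b, cin):
    """Full adder built from two half adders plus an OR gate
    combining their carry outputs."""
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, cin)
    return s, c1 | c2

# Verify the truth table: sum = 2C + S for every (A, B, Cin)
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(a, b, cin)
            assert 2 * cout + s == a + b + cin
print("truth table verified")
```

The loop covers all eight input rows of the truth table above, confirming that the composition of two half adders and an OR gate reproduces them.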

4-bit adder with logic gates shown

It is possible to create a logical circuit using multiple full adders to add N-bit numbers. Each full adder inputs a Cin, which is the Cout of the previous adder. This kind of adder is a ripple carry adder, since each carry bit "ripples" to the next full adder. Note that the first (and only the first) full adder may be replaced by a half adder. The layout of a ripple carry adder is simple, which allows for fast design time; however, the ripple carry adder is relatively slow, since each full adder must wait for the carry bit to be calculated by the previous full adder. The gate delay can easily be calculated by inspection of the full adder circuit: each full adder requires three levels of logic. In a 32-bit ripple carry adder there are 32 full adders, so the critical path (worst-case) delay is 3 (from input to carry in the first adder) + 31 × 2 (for carry propagation in the later adders) = 65 gate delays. A design with alternating carry polarities and optimized AND-OR-Invert gates can be about twice as fast. [3]

Carry-lookahead adders

4-bit adder with carry lookahead

To reduce the computation time, engineers devised faster ways to add two binary numbers by using carry-lookahead adders. They work by creating two signals (P and G) for each bit position, based on whether a carry is propagated through from a less significant bit position (at least one input is a 1), generated in that bit position (both inputs are 1), or killed in that bit position (both inputs are 0). In most cases, P is simply the sum output of a half adder and G is the carry output of the same adder. After P and G are generated, the carries for every bit position are created. Some advanced carry-lookahead architectures are the Manchester carry chain, the Brent-Kung adder, and the Kogge-Stone adder. Some other multi-bit adder architectures break the adder into blocks, and it is possible to vary the length of these blocks based on the propagation delay of the circuits to optimize computation time. These block-based adders include the carry bypass adder, which determines P and G values for each block rather than each bit, and the carry select adder, which pre-generates sum and carry values for either possible carry input to the block. Other adder designs include the carry-save adder, carry-select adder, conditional-sum adder, carry-skip adder, and carry-complete adder.

Lookahead carry unit

A 64-bit adder

By combining multiple carry-lookahead adders, even larger adders can be created, and this can be applied at multiple levels to make still larger adders. For example, a 64-bit adder can use four 16-bit CLAs with two levels of LCUs.

3:2 compressors: A full adder can be viewed as a 3:2 lossy compressor: it sums three one-bit inputs and returns the result as a single two-bit number; that is, it maps 8 input values to 4 output values. For example, a binary input of 101 results in an output of 1 + 0 + 1 = 10 (decimal 2). The carry-out represents bit one of the result, while the sum represents bit zero. Likewise, a half adder can be used as a 2:2 lossy compressor, compressing four possible inputs into three possible outputs. Such compressors can be used to speed up the summation of three or more addends. If there are exactly three addends, the layout is known as the carry-save adder. If there are four or more, more than one layer of compressors is necessary, and there are various possible designs for the circuit: the most common are Dadda and Wallace trees. This kind of circuit is most notably used in multipliers, which is why these circuits are also known as Dadda and Wallace multipliers.

Multiplexer: In electronics, a multiplexer (or mux) is a device that selects one of several analog or digital input signals and forwards the selected input onto a single line. [1] A multiplexer with 2^n inputs has n select lines, which are used to select which input line to send to the output. [2] Multiplexers are mainly used to increase the amount of data that can be sent over a network within a certain amount of time and bandwidth. [1] A multiplexer is also called a data selector; they are used, for example, in CCTV installations. An electronic multiplexer makes it possible for several signals to share one device or resource, for example one A/D converter or one communication line, instead of having one device per input signal. On the other hand, a demultiplexer (or demux) is a device taking a single input signal and selecting one of many data output lines, which is connected to the single input. A multiplexer is often used with a complementary demultiplexer on the receiving end. [1] An electronic multiplexer can be considered a multiple-input, single-output switch, and a demultiplexer a single-input, multiple-output switch. [3] The schematic symbol for a multiplexer is an isosceles trapezoid with the longer parallel side containing the input pins and the shorter parallel side containing the output pin. [4] In digital circuit design, the selector wires carry digital values. In the case of a 2-to-1 multiplexer, a logic value of 0 on the selector would connect the first input to the output, while a logic value of 1 would connect the second input. In larger multiplexers, the number of selector pins is equal to ⌈log2(n)⌉, where n is the number of inputs. For example, 9 to 16 inputs would require no fewer than 4 selector pins and 17 to 32 inputs would require no fewer than 5 selector pins. The binary value expressed on these selector pins determines the selected input pin. A 2-to-1 multiplexer with inputs A and B, selector S, and output Z has the Boolean equation Z = (A & ~S) | (B & S).

A 2-to-1 mux

The truth table shows that when the selector S = 0 the output equals input A, but when S = 1 the output equals input B. A straightforward realization of this 2-to-1 multiplexer needs 2 AND gates, an OR gate, and a NOT gate. Larger multiplexers are also common and, as stated above, require ⌈log2(n)⌉ selector pins for n inputs. Other common sizes are 4-to-1, 8-to-1, and 16-to-1. Since digital logic uses binary values, powers of 2 (4, 8, 16) are used to maximally control a number of inputs for the given number of selector inputs. The Boolean equation for a 4-to-1 multiplexer with inputs I0-I3 and selectors S1, S0 is:

Z = (I0 & ~S1 & ~S0) | (I1 & ~S1 & S0) | (I2 & S1 & ~S0) | (I3 & S1 & S0)
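A small Python model of the select logic (illustrative; the variable names I0-I3, S1, S0 are conventional labels, not from the original):

```python
def mux2(a, b, s):
    """2-to-1 mux: Z = A·~S + B·S (2 ANDs, 1 OR, 1 NOT)."""
    return (a & (1 - s)) | (b & s)

def mux4(i0, i1, i2, i3, s1, s0):
    """4-to-1 mux as a sum of four AND terms, one per select code;
    the subscript of each input is the select value that passes it."""
    return ((i0 & (1 - s1) & (1 - s0)) |
            (i1 & (1 - s1) & s0) |
            (i2 & s1 & (1 - s0)) |
            (i3 & s1 & s0))

# A 4-to-1 mux can equivalently be built as a tree of three 2-to-1 muxes:
def mux4_tree(i0, i1, i2, i3, s1, s0):
    return mux2(mux2(i0, i1, s0), mux2(i2, i3, s0), s1)

# Each select code passes exactly the input whose subscript matches it.
for code in range(4):
    s1, s0 = code >> 1, code & 1
    inputs = [0, 0, 0, 0]
    inputs[code] = 1
    assert mux4(*inputs, s1, s0) == mux4_tree(*inputs, s1, s0) == 1
print("mux select logic verified")
```

The mux tree form is how larger muxes (8-to-1, 16-to-1) are typically composed from 2-to-1 primitives, and it is the structure the CSLA relies on to pick between its precomputed sums.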

Two realizations for creating a 4-to-1 multiplexer are shown below:

These are two realizations of a 4-to-1 multiplexer: one built from a decoder, AND gates, and an OR gate; the other built from 3-state buffers and AND gates (the AND gates acting as the decoder). Note that the subscripts on the inputs indicate the decimal value of the binary control inputs at which that input is let through.

Carry Select Adder (CSA)

The basic idea of this work is to use a Binary to Excess-1 Converter (BEC) instead of the RCA with Cin = 1 in the regular CSLA to achieve lower area and power consumption [2]-[4]. The main advantage of this BEC logic comes from its smaller number of logic gates compared with the n-bit Full Adder (FA) structure. The details of the BEC logic are discussed in Section III.


Fig. 4. Regular 16-b SQRT CSLA.

Fig. 1. Delay and area evaluation of an XOR gate.

II. DELAY AND AREA EVALUATION METHODOLOGY OF THE BASIC ADDER BLOCKS

The AND, OR, and Inverter (AOI) implementation of an XOR gate is shown in Fig. 1. The gates between the dotted lines perform their operations in parallel, and the numeric label on each gate indicates the delay contributed by that gate. The delay and area evaluation methodology considers all gates to be made up of AND, OR, and Inverter, each having a delay of 1 unit and an area of 1 unit. We then add up the number of gates in the longest path of a logic block, which contributes the maximum delay. The area evaluation is done by counting the total number of AOI gates required for each logic block. Based on this approach, the CSLA adder blocks 2:1 mux, Half Adder (HA), and FA are evaluated and listed in Table I.
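This unit-delay/unit-area counting can be mechanized. The Python sketch below (illustrative; the netlist encoding is an assumption, not from the paper) computes delay as the longest gate path and area as the total AOI gate count, and reproduces the XOR and 2:1 mux rows of Table I:

```python
def evaluate(netlist, primary_inputs):
    """Return (delay, area): delay is the longest path in unit gate
    delays, area is the AOI gate count. The netlist is a topologically
    ordered list of (output, gate_type, inputs) tuples."""
    arrival = {p: 0 for p in primary_inputs}
    for out, _gate, ins in netlist:
        arrival[out] = 1 + max(arrival[i] for i in ins)
    return max(arrival.values()), len(netlist)

# XOR built from AOI gates: s = a·b' + a'·b  (5 gates)
xor_net = [
    ("na", "NOT", ["a"]),
    ("nb", "NOT", ["b"]),
    ("t1", "AND", ["a", "nb"]),
    ("t2", "AND", ["na", "b"]),
    ("s",  "OR",  ["t1", "t2"]),
]

# 2:1 mux: z = d0·s' + d1·s  (4 gates)
mux_net = [
    ("ns", "NOT", ["s"]),
    ("t1", "AND", ["d0", "ns"]),
    ("t2", "AND", ["d1", "s"]),
    ("z",  "OR",  ["t1", "t2"]),
]

print(evaluate(xor_net, ["a", "b"]))         # (3, 5): Table I XOR row
print(evaluate(mux_net, ["d0", "d1", "s"]))  # (3, 4): Table I mux row
```

The half adder and full adder rows follow the same way: an HA is one XOR plus one AND (delay 3, area 6), and an FA's sum path chains two XORs (delay 6) for a total of 13 gates.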

TABLE I
DELAY AND AREA COUNT OF THE BASIC BLOCKS OF CSLA

Adder Block | Delay | Area
XOR         | 3     | 5
2:1 mux     | 3     | 4
Half adder  | 3     | 6
Full adder  | 6     | 13

TABLE II
FUNCTION TABLE OF THE 4-b BEC

B[3:0] | X[3:0]
0000   | 0001
0001   | 0010
...    | ...
1110   | 1111
1111   | 0000

III. BEC

As stated above, the main idea of this work is to use a BEC instead of the RCA with Cin = 1 in order to reduce the area and power consumption of the regular CSLA. To replace an n-bit RCA, an (n+1)-bit BEC is required. The structure and function table of a 4-b BEC are shown in Fig. 2 and Table II, respectively.

Fig. 3 illustrates how the basic function of the CSLA is obtained by using the 4-bit BEC together with the mux. One input of the 8:4 mux gets B (B3, B2, B1, and B0) directly, and the other input of the mux is the BEC output. This produces the two possible partial results in parallel, and the mux selects either the BEC output or the direct inputs according to the control signal Cin. The importance of the BEC logic stems from the large silicon area reduction when CSLAs with a large number of bits are designed. The Boolean expressions of the 4-bit BEC are listed as follows (note the functional symbols: ~ NOT, & AND, ^ XOR):

X0 = ~B0
X1 = B0 ^ B1
X2 = B2 ^ (B0 & B1)
X3 = B3 ^ (B0 & B1 & B2)
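A quick check that excess-1 logic really increments its input (a Python sketch; the Boolean expressions used are the standard 4-bit increment equations, stated here as an assumption consistent with Table II):

```python
def bec4(b):
    """4-bit Binary to Excess-1 Converter; b = [B0, B1, B2, B3], LSB first."""
    b0, b1, b2, b3 = b
    x0 = 1 - b0                  # X0 = ~B0
    x1 = b0 ^ b1                 # X1 = B0 ^ B1
    x2 = b2 ^ (b0 & b1)          # X2 = B2 ^ (B0 & B1)
    x3 = b3 ^ (b0 & b1 & b2)     # X3 = B3 ^ (B0 & B1 & B2)
    return [x0, x1, x2, x3]

# The BEC output equals (B + 1) mod 16 for all 16 inputs, matching Table II.
for n in range(16):
    bits = [(n >> i) & 1 for i in range(4)]
    out = bec4(bits)
    assert sum(bit << i for i, bit in enumerate(out)) == (n + 1) % 16
print("BEC matches Table II")
```

Counting AOI gates for these four expressions makes the area argument concrete: the increment needs far fewer unit gates than the 13-gate full adders of the Cin = 1 RCA it replaces.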

Fig. 2.4-b BEC.

Fig. 5. Delay and area evaluation of regular SQRT CSLA: (a) group2, (b) group3, (c) group4, and (d) group5. F is a Full Adder.

IV. DELAY AND AREA EVALUATION METHODOLOGY OF REGULAR 16-B SQRT CSLA

The structure of the 16-b regular SQRT CSLA is shown in Fig. 4. It has five groups of different-size RCAs. The delay and area evaluation of each group are shown in Fig. 5, in which the numerals within [ ] specify the delay values, e.g., sum2 requires 10 gate delays. The steps leading to the evaluation are as follows.

1) Group2 [see Fig. 5(a)] has two sets of 2-b RCA. Based on the delay values of Table I, the selection input of the 6:3 mux arrives earlier than the later of the RCA sum outputs but later than the earlier one; the group delay is therefore the arrival time of the slower RCA output plus the mux delay.

2) Except for group2, the arrival time of the mux selection input is always greater than the arrival time of the data outputs from the RCAs. Thus, the delay of group3 to group5 is determined, respectively, by the arrival time of the corresponding mux selection input plus the mux delay.

3) One set of 2-b RCA in group2 has 2 FAs for Cin = 0, and the other set has 1 FA and 1 HA for Cin = 1. Based on the area count of Table I, the total gate count of group2 is determined as follows: 2 FAs (2 × 13) + 1 FA and 1 HA (13 + 6) + 6:3 mux (3 × 4) = 26 + 19 + 12 = 57 gates.

4) Similarly, the estimated maximum delay and area of the other groups of the regular SQRT CSLA are evaluated and listed in Table III.

Fig. 6. Modified 16-b SQRT CSLA. The parallel RCA with Cin = 1 is replaced with a BEC.

TABLE III
DELAY AND AREA COUNT OF REGULAR SQRT CSLA GROUPS

Group  | Delay | Area
Group2 | 11    | 57
Group3 | 13    | 87
Group4 | 16    | 117
Group5 | 19    | 147

Fig. 7. Delay and area evaluation of modified SQRT CSLA: (a) group2, (b) group3, (c) group4, and (d) group5. H is a Half Adder.

TABLE IV
DELAY AND AREA COUNT OF MODIFIED SQRT CSLA

Group  | Delay | Area
Group2 | 13    | 43
Group3 | 16    | 61
Group4 | 19    | 84
Group5 | 22    | 107
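Totaling the group figures of Tables III and IV quantifies the trade-off; the snippet below is a simple tabulation of the published numbers, not part of the original evaluation:

```python
# Per-group (delay, area) figures from Tables III and IV.
regular  = {"Group2": (11, 57),  "Group3": (13, 87),
            "Group4": (16, 117), "Group5": (19, 147)}
modified = {"Group2": (13, 43),  "Group3": (16, 61),
            "Group4": (19, 84),  "Group5": (22, 107)}

area_reg = sum(a for _, a in regular.values())
area_mod = sum(a for _, a in modified.values())
print(area_reg, area_mod)                                # 408 295
print(round(100 * (area_reg - area_mod) / area_reg, 1))  # 27.7 (% area saved)
```

Each BEC-based group trades 2-3 extra gate delays for a smaller gate count, so the modified design saves roughly 28% of the group area at the cost of a modest delay increase, which is the central result of the comparison.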

5. FPGA IMPLEMENTATION

5.1 FPGA Implementation Flow Diagram

Fig. 5.1: Flow chart for FPGA implementation

Initially, market research should be carried out, covering previous versions of the design and the current requirements on the design. Based on this survey, the specification and the architecture must be identified. Then the RTL modeling should be carried out in Verilog with respect to the identified architecture. Once the RTL modeling is done, it should be simulated and verified for all cases. The functional verification should meet the intended architecture and pass all the test cases. Once functional verification is complete, the RTL model is taken to the synthesis process, which carries out three operations: Translate, Map, and Place and Route. The developed RTL model is first translated into an equation format understandable by the tool. These translated equations are then mapped to the library, that is, mapped to the hardware. Once the mapping is done, the gates are placed and routed. Before these processes, constraints can be given in order to optimize the design. Finally, the bitmap file is generated, which holds the design information in binary format and is dumped into the FPGA board.

5.2 Introduction to FPGA

FPGA stands for Field Programmable Gate Array; it consists of an array of logic modules, I/O modules, and routing tracks (programmable interconnect). An FPGA can be configured by the end user to implement specific circuitry. Speeds of up to 100 MHz were once typical, but at present speeds reach into the GHz range. Main applications are DSP, FPGA-based computers, logic emulation, and ASIC/ASSP prototyping. FPGAs are mainly programmed using SRAM (Static Random Access Memory) technology; it is volatile, and the main advantage of SRAM programming technology is re-configurability. Issues in FPGA technology are the complexity of the logic element, clock support, I/O support, and interconnections (routing). In this work, the design is synthesized on the Spartan 3E FPGA family through the Xilinx ISE tool. This process includes the following: Translate, Map, and Place and Route.

5.3 FPGA Flow

The basic implementation of a design on an FPGA has the following steps: Design Entry, Simulation, Synthesis, Logic Optimization, Technology Mapping, Placement, Routing, Programming Unit, Configured FPGA. The initial design entry may be VHDL, a schematic, or a Boolean expression. The optimization of the Boolean expression is carried out with respect to area or speed. In technology mapping, the optimized Boolean expressions are transformed into FPGA logic blocks, known as slices; here, area and delay optimization take place. During placement, algorithms are used to place each block in the FPGA array. Routing assigns the programmable FPGA wire segments to establish connections among the FPGA blocks. The configuration of the final chip is made in the programming unit.

5.4 Synthesis Result

The developed design is simulated and its functionality verified. Once the functional verification is done, the RTL model is taken to the synthesis process using the Xilinx ISE tool. In the synthesis process, the RTL model is converted to a gate-level netlist mapped to a specific technology library. Many different devices of the Spartan 3E family are available in the Xilinx ISE tool; to synthesize this design, the device XC3S500E has been chosen, with package FG320 and speed grade -4. The design is synthesized and its results are analyzed as follows.

top Project Status (06/07/2012 - 04:43:04)

Project File: carryselectexisting.xise | Parser Errors: No Errors
Module Name: top | Implementation State: Placed and Routed
Target Device: xc3s500e-5cp132 | Errors:
Product Version: ISE 13.4 | Warnings:
Design Goal: Balanced | Routing Results: All Signals Completely Routed
Design Strategy: Xilinx Default (unlocked) | Timing Constraints:
Environment: System Settings | Final Timing Score: 0 (Timing Report)

Device Utilization Summary

Logic Utilization | Used | Available | Utilization
Number of 4 input LUTs | 49 | 9,312 | 1%
Number of occupied Slices | 27 | 4,656 | 1%
Number of Slices containing only related logic | 27 | 27 | 100%
Number of Slices containing unrelated logic | 0 | 27 | 0%
Total Number of 4 input LUTs | 49 | 9,312 | 1%
Number of bonded IOBs | 54 | 92 | 58%
Average Fanout of Non-Clock Nets | 3.27 | |

Performance Summary

Final Timing Score: 0 (Setup: 0, Hold: 0) | Pinout Data: Pinout Report

Routing Results: All Signals Completely Routed | Clock Data: Clock Report

Timing Constraints:

Detailed Reports

Report Name | Status | Generated | Errors | Warnings | Infos
Synthesis Report | Current | Thu Jun 7 04:41:44 2012 | 0 | 10 Warnings (0 new) | 0
Translation Report | Current | Thu Jun 7 04:42:45 2012 | 0 | 0 | 0
Map Report | Current | Thu Jun 7 04:42:50 2012 | | |
Place and Route Report | Current | Thu Jun 7 04:43:00 2012 | 0 | 0 | 1 Info (0 new)
Power Report | | | | |
Post-PAR Static Timing Report | Current | Thu Jun 7 04:43:03 2012 | 0 | 0 | 6 Infos (0 new)
Bitgen Report | Out of Date | Sat May 19 10:46:38 2012 | 0 | 0 | 0

Secondary Reports

Report Name | Status | Generated
ISIM Simulator Log | Out of Date | Thu May 17 17:32:44 2012
Post-Map Static Timing Report | Out of Date | Wed Apr 11 12:41:26 2012
WebTalk Report | Out of Date | Sat May 19 10:46:40 2012
WebTalk Log File | Out of Date | Sat May 19 10:47:16 2012

Date Generated: 06/07/2012 - 04:43:04

Carry Select Proposed:

Process "Generate Post-Place & Route Static Timing" completed successfully.

top Project Status (06/07/2012 - 05:20:27)

Project File: carryselectproposed.xise | Parser Errors: No Errors
Module Name: top | Implementation State: Placed and Routed
Target Device: xc3s100e-5cp132 | Errors: No Errors
Product Version: ISE 13.4 | Warnings: 11 Warnings (0 new)
Design Goal: Balanced | Routing Results: All Signals Completely Routed
Design Strategy: Xilinx Default (unlocked) | Timing Constraints:
Environment: System Settings | Final Timing Score: 0 (Timing Report)

Device Utilization Summary

Logic Utilization | Used | Available | Utilization
Number of 4 input LUTs | 45 | 1,920 | 2%
Number of occupied Slices | 25 | 960 | 2%
Number of Slices containing only related logic | 25 | 25 | 100%
Number of Slices containing unrelated logic | 0 | 25 | 0%
Total Number of 4 input LUTs | 45 | 1,920 | 2%
Number of bonded IOBs | 54 | 83 | 65%
Average Fanout of Non-Clock Nets | 2.64 | |

Performance Summary

Final Timing Score: 0 (Setup: 0, Hold: 0) | Pinout Data: Pinout Report

Routing Results: All Signals Completely Routed | Clock Data: Clock Report

Timing Constraints:

Detailed Reports

Report Name | Status | Generated | Errors | Warnings | Infos
Synthesis Report | Current | Thu Jun 7 05:18:48 2012 | 0 | 11 Warnings (0 new) | 0
Translation Report | Current | Thu Jun 7 05:20:11 2012 | 0 | 0 | 0
Map Report | Current | Thu Jun 7 05:20:16 2012 | 0 | 0 | 2 Infos (0 new)
Place and Route Report | Current | Thu Jun 7 05:20:22 2012 | 0 | 0 | 1 Info (1 new)
Power Report | | | | |
Post-PAR Static Timing Report | Current | Thu Jun 7 05:20:25 2012 | 0 | 0 | 6 Infos (0 new)
Bitgen Report | | | | |

Bitgen Report

Secondary Reports

Report Name | Status | Generated
ISIM Simulator Log | Out of Date | Thu May 17 17:33:22 2012
Post-Map Static Timing Report | Out of Date | Wed Apr 11 12:49:54 2012

Date Generated: 06/07/2012 - 05:20:27

Very-large-scale integration (VLSI) is the process of creating integrated circuits by combining thousands of transistors into a single chip. VLSI began in the 1970s when complex semiconductor and communication technologies were being developed. The microprocessor is a VLSI device.

A VLSI integrated-circuit dieThe first semiconductor chips held two transistors each. Subsequent advances added more and more transistors, and, as a consequence, more individual functions or systems were integrated over time. The first integrated circuits held only a few devices, perhaps as many as ten diodes, transistors, resistors and capacitors, making it possible to fabricate one or more logic gates on a single device. Now known retrospectively as small-scale integration (SSI), improvements in technique led to devices with hundreds of logic gates, known as medium-scale integration (MSI). Further improvements led to large-scale integration (LSI), i.e. systems with at least a thousand logic gates. Current technology has moved far past this mark and today's microprocessors have many millions of gates and billions of individual transistors.At one time, there was an effort to name and calibrate various levels of large-scale integration above VLSI. Terms like ultra-large-scale integration (ULSI) were used. But the huge number of gates and transistors available on common devices has rendered such fine distinctions moot. Terms suggesting greater than VLSI levels of integration are no longer in widespread use.As of early 2008, billion-transistor processors are commercially available. This is expected to become more commonplace as semiconductor fabrication moves from the current generation of 65nm processes to the next 45nm generations (while experiencing new challenges such as increased variation across process corners). A notable example is Nvidia's 280 series GPU. This GPU is unique in the fact that almost all of its 1.4 billion transistors are used for logic, in contrast to the Itanium, whose large transistor count is largely due to its 24 MB L3 cache. Current designs, as opposed to the earliest devices, use extensive design automation and automated logic synthesis to lay out the transistors, enabling higher levels of complexity in the resulting logic functionality. 
Certain high-performance logic blocks, such as the SRAM (Static Random Access Memory) cell, are still designed by hand to ensure the highest efficiency, sometimes bending or breaking established design rules to obtain the last bit of performance at the cost of stability.

Structured design

Structured VLSI design is a modular methodology originated by Carver Mead and Lynn Conway for saving microchip area by minimizing the interconnect fabric area. This is obtained by repetitive arrangement of rectangular macro blocks which can be interconnected using wiring by abutment. An example is partitioning the layout of an adder into a row of equal bit-slice cells. In complex designs this structuring may be achieved by hierarchical nesting.

Structured VLSI design was popular in the early 1980s, but lost its popularity later with the advent of placement and routing tools, which wasted a great deal of area on routing; the waste was tolerated because of the progress of Moore's Law. When introducing the hardware description language KARL in the mid-1970s, Reiner Hartenstein coined the term "structured VLSI design" (originally as "structured LSI design"), echoing Edsger Dijkstra's structured programming approach, which uses procedure nesting to avoid chaotic, spaghetti-structured programs.

Challenges

As microprocessors become more complex due to technology scaling, microprocessor designers have encountered several challenges which force them to think beyond the design plane and look ahead to post-silicon:

Power usage/heat dissipation: As threshold voltages have ceased to scale with advancing process technology, dynamic power dissipation no longer scales down proportionally. Maintaining logic complexity when scaling the design down means that the power dissipation per unit area goes up. This has given rise to techniques such as dynamic voltage and frequency scaling (DVFS) to minimize overall power.

Process variation: As photolithography techniques push closer to the fundamental laws of optics, achieving high accuracy in doping concentrations and etched wires is becoming more difficult and prone to errors due to variation. Designers now must simulate across multiple fabrication process corners before a chip is certified ready for production.
Stricter design rules: Due to lithography and etch issues with scaling, design rules for layout have become increasingly stringent. Designers must keep ever more of these rules in mind while laying out custom circuits. The overhead for custom design is now reaching a tipping point, with many design houses opting to switch to electronic design automation (EDA) tools to automate their design process.

Timing/design closure: As clock frequencies scale up, designers find it more difficult to distribute the clock and maintain low clock skew across the entire chip. This has led to a rising interest in multicore and multiprocessor architectures, since an overall speedup can be obtained by lowering the clock frequency and distributing processing.

First-pass success: As die sizes shrink (due to scaling) and wafer sizes go up (to lower manufacturing costs), the number of dies per wafer increases, and the complexity of making suitable photomasks goes up rapidly. A mask set for a modern technology can cost several million dollars. This non-recurring expense deters the old iterative philosophy of several "spin cycles" to find errors in silicon, and encourages first-pass silicon success. Several design philosophies have been developed to aid this new design flow, including design for manufacturing (DFM), design for test (DFT), and design for X.
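The adder bit-slice partitioning mentioned under structured design lends itself to a short sketch. Below is a minimal behavioral model in Python (illustrative only; the actual layout technique is geometric, not behavioral): one full-adder "cell" is defined once and then repeated n times, with the carry rippling from slice to slice by abutment.

```python
# Behavioral sketch of bit-slice structure: an n-bit ripple-carry adder
# built by repeating one identical full-adder cell, mirroring the layout
# technique of abutting equal bit-slice cells in a row.

def full_adder_cell(a, b, cin):
    """One bit slice: returns (sum, carry-out) for single-bit inputs."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def ripple_adder(a_bits, b_bits, cin=0):
    """Chain identical cells; the carry ripples from slice to slice (LSB first)."""
    assert len(a_bits) == len(b_bits)
    sum_bits, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder_cell(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry

def to_bits(x, n):
    """Integer to LSB-first bit vector of width n."""
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    """LSB-first bit vector back to integer."""
    return sum(b << i for i, b in enumerate(bits))

s, cout = ripple_adder(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s), cout)  # 11 + 6 = 17 -> 4-bit sum 1 with carry-out 1
```

The same replication principle underlies the carry-select adder discussed in the introduction, which duplicates such slices for both carry assumptions and selects between them.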

VLSI Technology, Inc. was a company which designed and manufactured custom and semi-custom ICs. The company was based in Silicon Valley, with headquarters at 1109 McKay Drive in San Jose, California. Along with LSI Logic, VLSI Technology defined the leading edge of the application-specific integrated circuit (ASIC) business, which accelerated the push of powerful embedded systems into affordable products.

The company was founded in 1979 by a trio from Fairchild Semiconductor by way of Synertek - Jack Balletto, Dan Floyd and Gunnar Wetlesen - and by Doug Fairbairn of Xerox PARC and Lambda (later VLSI Design) magazine. Alfred J. Stein became the CEO of the company in 1982. Subsequently, VLSI built its first fab in San Jose; eventually a second fab was built in San Antonio, Texas. VLSI had its initial public offering in 1983, and was listed on the stock market as (NASDAQ: VLSI). The company was later acquired by Philips and survives to this day as part of NXP Semiconductors.

Advanced tools for VLSI design

The original business plan was to be a contract wafer fabrication company, but the venture investors wanted the company to develop IC (integrated circuit) design tools to help fill the foundry. Thanks to its Caltech and UC Berkeley students, VLSI was an important pioneer in the electronic design automation industry. It offered a sophisticated package of tools, originally based on the 'lambda-based' design style advocated by Carver Mead and Lynn Conway.

VLSI became an early vendor of standard cells (cell-based technology) to the merchant market in the early 1980s, while the other ASIC-focused company, LSI Logic, was a leader in gate arrays. Prior to VLSI's cell-based offering, the technology had been available primarily within large vertically integrated companies with semiconductor units, such as AT&T and IBM.

VLSI's design tools eventually included not only design entry and simulation but also cell-based routing (a chip compiler), a datapath compiler, SRAM and ROM compilers, and a state machine compiler. The tools were an integrated design solution for IC design, not just point tools or more general-purpose system tools. A designer could edit transistor-level polygons and/or logic schematics, then run DRC and LVS, extract parasitics from the layout, run a SPICE simulation, and back-annotate the timing or gate-size changes into the logic schematic database. Characterization tools were integrated to generate FrameMaker data sheets for libraries. VLSI eventually spun off the CAD and library operation into Compass Design Automation, but Compass never reached an IPO before it was purchased by Avant! Corporation. VLSI's physical design tools were critical not only to its ASIC business, but also in setting the bar for the commercial EDA industry.
When VLSI and its main ASIC competitor, LSI Logic, were establishing the ASIC industry, commercially available tools could not deliver the productivity necessary to support the physical design of hundreds of ASIC designs each year without deploying a substantial number of layout engineers. The companies' development of automated layout tools was a rational "make, because there's nothing to buy" decision. The EDA industry finally caught up in the late 1980s when Tangent Systems released its TanCell and TanGate products. In 1989, Tangent was acquired by Cadence Design Systems (founded in 1988).

Unfortunately, for all VLSI's initial competence in design tools, it was not a leader in semiconductor manufacturing technology. VLSI had not been timely in developing a 1.0 µm manufacturing process as the rest of the industry moved to that geometry in the late 1980s. VLSI entered a long-term technology partnership with Hitachi and finally released a 1.0 µm process and cell library (actually more of a 1.2 µm library with a 1.0 µm gate).

As VLSI struggled to gain parity with the rest of the industry in semiconductor technology, the design flow was moving rapidly to a Verilog HDL and synthesis flow. Cadence acquired Gateway, the leader in the Verilog hardware description language (HDL), and Synopsys was dominating the exploding field of design synthesis. As VLSI's tools were being eclipsed, VLSI waited too long to open the tools up to other fabs, and Compass Design Automation was never a viable competitor to the industry leaders.

Meanwhile, VLSI entered the merchant high-speed static RAM (SRAM) market, as it needed a product to drive its semiconductor process technology development. All the large semiconductor companies built high-speed SRAMs with cost structures VLSI could never match, and VLSI withdrew once it was clear that the Hitachi process technology partnership was working.

ARM Ltd was formed in 1990 as a semiconductor intellectual property licensor, backed by Acorn, Apple and VLSI.
VLSI became a licensee of the powerful ARM processor, and ARM finally funded processor tools. Initial adoption of the ARM processor was slow: few applications could justify the overhead of an embedded 32-bit processor. In fact, despite the addition of further licensees, the ARM processor enjoyed little market success until the novel 'Thumb' extensions were developed. Ericsson adopted the ARM processor in a VLSI chipset for its GSM handset designs in the early 1990s. It was the GSM boost that laid the foundation for the ARM company and technology as they are today.

Only in PC chipsets did VLSI dominate in the early 1990s. This product line, developed by five engineers using the "megacells" in the VLSI library, led to a business unit at VLSI that almost equaled its ASIC business in revenue. VLSI eventually ceded the market to Intel, because Intel was able to package-sell its processors, chipsets, and even board-level products together.

VLSI also had an early partnership with PMC, a design group that had been nurtured out of British Columbia Bell. When PMC wanted to divest its semiconductor intellectual property venture, VLSI's bid was beaten by a creative deal from Sierra Semiconductor. The telecom business unit management at VLSI opted to go it alone, and PMC-Sierra became one of the most important telecom ASSP vendors.

Scientists and innovations from the 'design technology' part of VLSI found their way to Cadence Design Systems (by way of Redwood Design Automation). Compass Design Automation (VLSI's CAD and library spin-off) was sold to Avant! Corporation, which itself was acquired by Synopsys.

Global expansion

VLSI maintained operations throughout the USA, and in Britain, France, Germany, Italy, Japan, Singapore and Taiwan.
One of its key sites was in Tempe, Arizona, where a family of highly successful chipsets was developed for the IBM PC. In 1990, VLSI Technology, along with Acorn Computers and Apple Computer, were the founding investing partners in ARM Ltd.

Ericsson of Sweden, after many years of fruitful collaboration, was by 1998 VLSI's largest customer, with annual revenue of $120 million. VLSI's datapath compiler (VDP) was the value-added differentiator that opened the door at Ericsson in 1987/8. The silicon revenue and gross margin enabled by VDP make it one of the most successful pieces of customer-configurable, non-memory silicon intellectual property (SIP) in the history of the industry. Within the Wireless Products division, based at Sophia Antipolis in France, VLSI developed a range of algorithms and circuits for the GSM standard and for cordless standards such as the European DECT and the Japanese PHS.

Stimulated by its growth and success in the wireless handset IC area, Philips Electronics acquired VLSI in June 1999 for about $1 billion. The former components survive to this day as part of the Philips spin-off NXP Semiconductors.

Hardware description language

In electronics, a hardware description language (HDL) is any language from a class of computer languages, specification languages, or modeling languages for the formal description and design of electronic circuits, most commonly digital logic. It can describe a circuit's operation, its design and organization, and tests to verify its operation by means of simulation.

HDLs are standard text-based expressions of the spatial and temporal structure and behaviour of electronic systems. Like concurrent programming languages, HDL syntax and semantics include explicit notations for expressing concurrency. However, in contrast to most software programming languages, HDLs also include an explicit notion of time, which is a primary attribute of hardware.
Languages whose only characteristic is to express circuit connectivity between a hierarchy of blocks are properly classified as netlist languages, used in electronic computer-aided design (CAD).

HDLs are used to write executable specifications of some piece of hardware. A simulation program, designed to implement the underlying semantics of the language statements and coupled with simulating the progress of time, provides the hardware designer with the ability to model a piece of hardware before it is created physically. It is this executability that gives HDLs the illusion of being programming languages, when they are more precisely classed as specification languages or modeling languages. Simulators capable of supporting discrete-event (digital) and continuous-time (analog) modeling exist, and HDLs targeted for each are available.

It is certainly possible to represent hardware semantics using traditional programming languages such as C++, although to function such programs must be augmented with extensive and unwieldy class libraries. Primarily, however, software programming languages do not include any capability for explicitly expressing time, which is why they cannot function as hardware description languages. Before the recent introduction of SystemVerilog, C++ integration with a logic simulator was one of the few ways to use object-oriented programming in hardware verification. SystemVerilog is the first major HDL to offer object orientation and garbage collection.

Using the proper subset of virtually any (hardware description or software programming) language, a software program called a synthesizer (or synthesis tool) can infer hardware logic operations from the language statements and produce an equivalent netlist of generic hardware primitives to implement the specified behaviour. Synthesizers generally ignore the expression of any timing constructs in the text.
Digital logic synthesizers, for example, generally use clock edges as the way to time the circuit, ignoring any timing constructs. The ability to have a synthesizable subset of the language does not itself make a hardware description language.
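As a toy illustration of the synthesis idea just described, the following Python sketch "infers" a netlist of generic gate primitives from a small hand-built description of a 2-to-1 multiplexer. The gate names, net-naming scheme, and netlist format are all invented for this example; no real synthesis tool works this way literally.

```python
# Toy model of synthesis: produce an equivalent netlist of generic hardware
# primitives (NOT/AND/OR) for the behaviour out = sel ? a : b.

import itertools

_ids = itertools.count()  # fresh net-name generator

def gate(op, *ins, netlist):
    """Emit one primitive gate into the netlist; return its output net name."""
    out = f"n{next(_ids)}"
    netlist.append((op, ins, out))
    return out

def synth_mux(sel, a, b, netlist):
    """'Synthesize' a 2-to-1 mux: out = (sel AND a) OR (NOT sel AND b)."""
    nsel = gate("NOT", sel, netlist=netlist)
    t0 = gate("AND", sel, a, netlist=netlist)
    t1 = gate("AND", nsel, b, netlist=netlist)
    return gate("OR", t0, t1, netlist=netlist)

netlist = []
out = synth_mux("sel", "a", "b", netlist)
for op, ins, o in netlist:
    print(f"{o} = {op}({', '.join(ins)})")
```

The resulting flat list of (primitive, inputs, output) triples is the spirit of what a synthesizer hands to the back-end flow; timing constructs, as noted above, play no part in this mapping.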

History

The first hardware description languages were ISP (Instruction Set Processor),[1] developed at Carnegie Mellon University, and KARL, developed at the University of Kaiserslautern, both around 1977. ISP was, however, more like a software programming language used to describe relations between the inputs and the outputs of the design; it could therefore be used to simulate the design, but not to synthesize it. KARL included design calculus language features supporting VLSI chip floorplanning and structured hardware design, and was also the basis of KARL's interactive graphic sister language ABL, implemented in the early 1980s as the ABLED graphic VLSI design editor by the telecommunication research center CSELT in Torino, Italy. In the mid-1980s, a VLSI design framework was implemented around KARL and ABL by an international consortium funded by the Commission of the European Union (chapter in [2]). In 1983, Data I/O introduced ABEL, which was targeted at describing programmable logic devices and was used mainly to design finite state machines.

The first modern HDL, Verilog, was introduced by Gateway Design Automation in 1985. Cadence Design Systems later acquired the rights to Verilog-XL, the HDL simulator that would become the de facto standard of Verilog simulators for the next decade. In 1987, a request from the U.S. Department of Defense led to the development of VHDL (VHSIC Hardware Description Language, where VHSIC stands for Very High Speed Integrated Circuit). VHDL was based on the Ada programming language. Initially, Verilog and VHDL were used to document and simulate circuit designs already captured and described in another form (such as a schematic file).
HDL simulation enabled engineers to work at a higher level of abstraction than simulation at the schematic level, and thus increased design capacity from hundreds of transistors to thousands.

The introduction of logic synthesis for HDLs pushed HDLs from the background into the foreground of digital design. Synthesis tools compiled HDL source files (written in a constrained format called RTL) into a manufacturable gate/transistor-level netlist description. Writing synthesizable RTL files required practice and discipline on the part of the designer; compared to a traditional schematic layout, synthesized RTL netlists were almost always larger in area and slower in performance. A circuit designed by a skilled engineer using labor-intensive schematic capture and hand layout would almost always outperform its logically synthesized equivalent, but synthesis's productivity advantage soon relegated digital schematic capture to exactly those areas that were problematic for RTL synthesis: extremely high-speed, low-power, or asynchronous circuitry. In short, logic synthesis propelled HDL technology into a central role for digital design.

Within a few years, both VHDL and Verilog emerged as the dominant HDLs in the electronics industry, while older and less capable HDLs gradually disappeared from use. But VHDL and Verilog share many of the same limitations: neither HDL is suitable for analog/mixed-signal circuit simulation, and neither possesses language constructs to describe recursively generated logic structures. Specialized HDLs (such as Confluence) were introduced with the explicit goal of fixing a specific Verilog/VHDL limitation, though none were ever intended to replace VHDL/Verilog.

Over the years, a great deal of effort has gone into improving HDLs.
The latest iteration of Verilog, formally known as IEEE 1800-2005 SystemVerilog, introduces many new features (classes, random variables, and properties/assertions) to address the growing need for better testbench randomization, design hierarchy, and reuse. A future revision of VHDL is also in development, and is expected to match SystemVerilog's improvements.

Design using HDL

Efficiency gains realized using HDLs mean that a majority of modern digital circuit design revolves around them. Most designs begin as a set of requirements or a high-level architectural diagram. Control and decision structures are often prototyped in flowchart applications or entered in a state-diagram editor. The process of writing the HDL description is highly dependent on the nature of the circuit and the designer's preference for coding style. The HDL is merely the 'capture language', often beginning with a high-level algorithmic description such as a MATLAB or C++ mathematical model. Designers often use scripting languages (such as Perl) to automatically generate repetitive circuit structures in the HDL. Special text editors offer features for automatic indentation, syntax-dependent coloration, and macro-based expansion of entity/architecture/signal declarations.

The HDL code then undergoes a code review, or auditing. In preparation for synthesis, the HDL description is subjected to an array of automated checkers. The checkers report deviations from standardized code guidelines, identify potentially ambiguous code constructs before they can cause misinterpretation, and check for common logical coding errors, such as dangling ports or shorted outputs. This process aids in resolving errors before the code is synthesized.

In industry parlance, HDL design generally ends at the synthesis stage. Once the synthesis tool has mapped the HDL description into a gate netlist, the netlist is passed off to the back-end stage.
Depending on the physical technology (FPGA, ASIC gate array, ASIC standard cell), HDLs may or may not play a significant role in the back-end flow. In general, as the design flow progresses toward a physically realizable form, the design database becomes progressively more laden with technology-specific information, which cannot be stored in a generic HDL description. Finally, an integrated circuit is manufactured or programmed for use.

Simulating and debugging HDL code

Essential to HDL design is the ability to simulate HDL programs. Simulation allows an HDL description of a design (called a model) to pass design verification, an important milestone that validates the design's intended function (specification) against the code implementation in the HDL description. It also permits architectural exploration: the engineer can experiment with design choices by writing multiple variations of a base design, then comparing their behavior in simulation. Thus, simulation is critical for successful HDL design.

To simulate an HDL model, an engineer writes a top-level simulation environment (called a testbench). At minimum, a testbench contains an instantiation of the model (called the device under test, or DUT), pin/signal declarations for the model's I/O, and a clock waveform. The testbench code is event driven: the engineer writes HDL statements to implement the (testbench-generated) reset signal, to model interface transactions (such as a host-bus read/write), and to monitor the DUT's output. An HDL simulator, the program that executes the testbench, maintains the simulator clock, which is the master reference for all events in the testbench simulation. Events occur only at the instants dictated by the testbench HDL (such as a reset toggle coded into the testbench), or in reaction (by the model) to stimulus and triggering events. Modern HDL simulators have full-featured graphical user interfaces, complete with a suite of debug tools.
These allow the user to stop and restart the simulation at any time, insert simulator breakpoints (independent of the HDL code), and monitor or modify any element in the HDL model hierarchy. Modern simulators can also link the HDL environment to user-compiled libraries through a defined PLI/VHPI interface. Linking is system-dependent (Win32/Linux/SPARC), as the HDL simulator and user libraries are compiled and linked outside the HDL environment.

Design verification is often the most time-consuming portion of the design process, due to the disconnect between a device's functional specification, the designer's interpretation of the specification, and the imprecision of the HDL language. The majority of the initial test/debug cycle is conducted in the HDL simulator environment, as the early stage of the design is subject to frequent and major circuit changes. An HDL description can also be prototyped and tested in hardware; programmable logic devices are often used for this purpose. Hardware prototyping is comparatively more expensive than HDL simulation, but offers a real-world view of the design. Prototyping is the best way to check interfacing against other hardware devices, and hardware prototypes, even those running on slow FPGAs, offer much faster simulation times than pure HDL simulation.

Design Verification with HDLs

Historically, design verification was a laborious, repetitive loop of writing and running simulation test cases against the design under test. As chip designs have grown larger and more complex, the task of design verification has grown to the point where it now dominates the schedule of a design team. Looking for ways to improve design productivity, the EDA industry developed the Property Specification Language. In formal verification terms, a property is a factual statement about the expected or assumed behavior of another object.
Ideally, for a given HDL description, a property or properties can be proven true or false using formal mathematical methods. In practical terms, many properties cannot be proven because they occupy an unbounded solution space. However, if provided with a set of operating assumptions or constraints, a property checker can prove (or disprove) more properties over the narrowed solution space. The assertions do not model circuit activity, but capture and document the "designer's intent" in the HDL code. In a simulation environment, the simulator evaluates all specified assertions, reporting the location and severity of any violations. In a synthesis environment, the synthesis tool usually operates with the policy of halting synthesis upon any violation. Assertion-based verification is still in its infancy, but is expected to become an integral part of the HDL design toolset.

HDL and programming languages

An HDL is analogous to a software programming language, but with major differences. Many programming languages are inherently procedural (single-threaded), with limited syntactic and semantic support for handling concurrency. HDLs, on the other hand, resemble concurrent programming languages in their ability to model multiple parallel processes (such as flip-flops and adders) that execute independently of one another: any change to a process's inputs automatically triggers an update in the simulator's process stack. Both programming languages and HDLs are processed by a compiler (usually called a synthesizer in the HDL case), but with different goals. For HDLs, 'compiler' refers to synthesis, a process of transforming the HDL code listing into a physically realizable gate netlist.
The netlist output can take any of many forms: a "simulation" netlist with gate-delay information, a "handoff" netlist for post-synthesis place and route, or a generic industry-standard EDIF format (for subsequent conversion to a JEDEC-format file). A software compiler, on the other hand, converts the source code listing into microprocessor-specific object code for execution on the target microprocessor.

As HDLs and programming languages borrow concepts and features from each other, the boundary between them is becoming less distinct. However, pure HDLs are unsuitable for general-purpose software application development, just as general-purpose programming languages are undesirable for modeling hardware. Yet as electronic systems grow increasingly complex, and reconfigurable systems become increasingly mainstream, there is a growing desire in the industry for a single language that can perform some tasks of both hardware design and software programming. SystemC is an example of such a language: embedded system hardware can be modeled as non-detailed architectural blocks (black boxes with modeled signal inputs and output drivers). The target application is written in C/C++ and natively compiled for the host development system (as opposed to targeting the embedded CPU, which requires host simulation of the embedded CPU). The high level of abstraction of SystemC models is well suited to early architecture exploration, as architectural modifications can be easily evaluated with little concern for signal-level implementation issues. However, the threading model used in SystemC and its reliance on shared memory mean that it does not handle parallel execution or lower-level models well.

In an attempt to reduce the complexity of designing in HDLs, which have been compared to the equivalent of assembly languages, there are moves to raise the abstraction level of the design.
Companies such as Cadence, Synopsys and Agility Design Solutions are promoting SystemC as a way to combine high-level languages with concurrency models to allow faster design cycles for FPGAs than is possible using traditional HDLs. Approaches based on standard C or C++ (with libraries or other extensions allowing parallel programming) are found in the Catapult C tools from Mentor Graphics, the Impulse C tools from Impulse Accelerated Technologies, and the free and open-source ROCCC 2.0 tools from Jacquard Computing Inc. Annapolis Micro Systems, Inc.'s CoreFire Design Suite and National Instruments LabVIEW FPGA provide a graphical dataflow approach to high-level design entry. Languages such as SystemVerilog, SystemVHDL, and Handel-C seek to accomplish the same goal, but are aimed at making existing hardware engineers more productive rather than at making FPGAs more accessible to existing software engineers. There is more information on C to HDL and Flow to HDL in their respective articles.
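The testbench structure described earlier (a DUT instantiation, a master clock reference, a testbench-generated reset, and a monitor) can be sketched, purely as an analogy, in Python. All names here are illustrative; a real testbench is written in the HDL itself, and the clock loop below only stands in for the simulator's event scheduler.

```python
# Sketch of the event-driven testbench idea: DUT model, clock loop as the
# master time reference, reset stimulus, and an assertion-style monitor.

class CounterDUT:
    """Device under test: a 4-bit synchronous counter with active-high reset."""
    def __init__(self):
        self.count = 0

    def clock_edge(self, reset):
        """Update state on each rising clock edge."""
        self.count = 0 if reset else (self.count + 1) & 0xF

def run_testbench(cycles=20):
    dut = CounterDUT()
    trace = []
    for t in range(cycles):          # simulator clock: master reference for events
        reset = t < 2                # testbench-generated reset for two cycles
        dut.clock_edge(reset)
        trace.append(dut.count)
        # monitor: check the DUT output against intended behaviour
        expected = 0 if reset else (t - 1) & 0xF
        assert dut.count == expected, f"mismatch at cycle {t}"
    return trace

print(run_testbench()[:6])  # prints [0, 0, 1, 2, 3, 4]
```

The inline `assert` plays the role the assertion-based verification discussion gives to properties: it documents designer intent and reports the cycle at which a violation occurs, rather than modeling any circuit activity.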