
8/3/2019 32 Bit a Lure Port


EECE7353

VLSI Design (Spring 2010)

Final Project

Design and implementation of 32-bit

Arithmetic Logic Unit

by

Soumya Shivakumar Begur

Raghu Varier

Prathamesh Chawan

April 23, 2010

Northeastern University

Electrical and Computer Engineering Department

Contents

1. Introduction
2. Topology Selection
3. ALU Design
4. CMOS Implementation
5. Basic Building Blocks
5.1 Inverter
5.2 AND
5.3 OR
5.4 XOR
5.5 Full Adder
5.6 2-to-1 Multiplexer
5.7 3-Input AND
5.8 4-Input OR
5.9 4-to-1 Multiplexer
5.10 1-Bit ALU
6. 32-Bit ALU
6.1 Schematic
6.2 Layout
6.3 Output Waveforms
7. Schematic and Layout Verification of the 32-Bit ALU
8. Experimental Results
9. Further Upgradation
10. Learnings
11. References

1. Introduction

The arithmetic logic unit (ALU) is the core of a CPU. The adder cell is the elementary unit of an ALU, and it must satisfy area, power and speed constraints. Conventional adder types include the ripple-carry adder, carry-lookahead adder, carry-skip adder and Manchester carry chain adder [8]. The delay of an adder is dominated by the carry chain, so carry chain analysis must consider both transistor and wiring delays.

The aim of this project is to design a 32-bit ALU that performs the following five functions:

Arithmetic ADD function
Arithmetic SUBTRACT function
Logical AND function
Logical OR function
Logical XOR function

Since this ALU operates on 32-bit operands, it is called a 32-bit ALU.

The function to be performed on the operands A and B is defined by the ALU control (select) lines.

The result lines provide the result of the chosen function applied to the operands A and B.

Carry out indicates the final carry.


2. Topology Selection

The adder cell is the elementary unit of an ALU. The constraints the adder has to satisfy are area, power and speed requirements. An adder can be implemented using static or dynamic logic. The various kinds of implementations are:

Ripple carry adder
Carry lookahead adder
Mirror adder
Manchester carry chain adder
Domino adder
Carry skip adder

A ripple carry adder is built from multiple full adders to add N-bit numbers. Each full adder takes as its Cin the Cout of the previous adder. This kind of adder is called a ripple carry adder because each carry bit "ripples" to the next full adder.

Carry-lookahead adders first compute the carry propagate and generate signals and then compute SUM and CARRY from them. This allows the carry into each bit to be computed directly, without waiting for it to ripple through the preceding stages. Figure 1.2 shows a 4-bit carry-lookahead adder. The carry-lookahead unit requires complex wiring between the adders and the lookahead unit, as the values must be routed back to the adders from the lookahead unit. The layout becomes complex with multiple levels of lookahead.
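The propagate/generate idea can be sketched behaviourally in Python. This is only an illustrative model under stated assumptions, not the circuit of Figure 1.2: the function name `cla_add4` and the use of plain integers for the 4-bit operands are choices made here for the example.

```python
def cla_add4(a, b, cin=0):
    """Add two 4-bit integers with explicitly expanded lookahead carries."""
    # Per-bit propagate and generate signals, LSB first.
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(4)]
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]

    # Each carry is written directly in terms of P, G and cin, so no carry
    # has to wait for the previous one to be computed.
    c1 = g[0] | (p[0] & cin)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & cin)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & cin))

    # Sum bit i is P_i XOR (carry into bit i).
    carries = [cin, c1, c2, c3]
    total = sum((p[i] ^ carries[i]) << i for i in range(4))
    return total, c4
```

The growing number of product terms in c1 to c4 reflects the wiring complexity mentioned above: each extra bit of lookahead needs every lower P and G signal.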

Figure 1.3 shows a 4-bit carry-skip adder and the skip module used. The skip module determines whether it can pass the carry in (CIN) directly to the next four bits for addition or has to wait until the carry out (C3) propagates through the last full adder in the design. In essence, the skip module can make the carry in (CIN) appear to skip through the four full adders.
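A minimal behavioural sketch of one 4-bit carry-skip block is given below. It assumes the usual skip condition, namely that the block may forward CIN when every bit position propagates; the function name `carry_skip_block` and the integer operand format are illustrative assumptions, not taken from Figure 1.3.

```python
def carry_skip_block(a, b, cin):
    """4-bit carry-skip block; a and b are 4-bit integers."""
    carry = cin
    total = 0
    block_propagate = 1
    for i in range(4):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        p = ai ^ bi                      # propagate signal of bit i
        total |= (p ^ carry) << i        # sum bit
        carry = (ai & bi) | (p & carry)  # ripple carry inside the block
        block_propagate &= p
    # Skip module: if every bit propagates, forward cin directly instead of
    # waiting for the rippled carry (called C3 in the text).
    cout = cin if block_propagate else carry
    return total, cout
```

The skip path never changes the logical result; it only shortens the time until the block's carry out is valid.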

The Manchester carry chain adder uses a precharged carry chain with P and G signals. The propagate signal Pi is the XOR of input bits Ai and Bi, and the generate signal Gi is the NAND of input bits Ai and Bi. The propagate signal connects adjacent carry bits and the generate signal discharges the carry bit. Figure 1.4 shows a Manchester carry chain. When both input bits are 0, Gi is HIGH and hence the carry out node is discharged. When exactly one of the input bits is 1, Pi is HIGH and carry out follows carry in. When both bits are 1, both Gi and Pi are LOW; hence the carry out node remains isolated from carry in and ground. As the node is precharged to a HIGH state, the carry out remains HIGH.

Each of the adder configurations may or may not require additional logic apart from the full adder design. The table shows approximately how many additional gates and transistors are required for each of the adder configurations.


Dynamic logic can also be used for adder implementation in high-speed circuits. Here the logic blocks are built with an NMOS pull-down tree (see figure) that is precharged and discharged through series clocking transistors. The output of the logic gate is driven by a built-in inverter that is fed by the drain of the NMOS tree. This node can make at most one transition, from logic 1 to 0, during the clock evaluation phase, which allows the output inverter to switch from logic 0 to 1. Therefore any number of gates can be cascaded, given that evaluation of the input values is only possible during one half of the clock cycle.

Among these implementations, we first tried to implement a domino adder using multiple-output domino logic, based on [3], for the following reasons. Domino circuits work at very high speed. The number of transistors required is smaller: N+2, where N is the fan-in. This is advantageous for circuits with large fan-in, as in a 32-bit adder. More than 60% of high-performance microprocessors use domino logic to implement various functions.

Domino logic is faster for the following reasons:

The logic threshold voltage is the same as the device threshold voltage VT
Less gate loading (less input capacitance)
Less output loading (less output capacitance)

We were not able to complete the 32-bit domino adder because the propagate circuit produced a wrong propagate bit, which plays a vital role in generating the carry bit. Given the time constraints of the project, we could not debug this error, so we switched to the less complex ripple carry adder.


Ripple Carry Adder:

In terms of area efficiency, the ripple carry adder is preferred. To keep the layout area small and the number of interconnections low, our ALU has been designed using the ripple carry configuration. However, its worst-case delay is larger than that of the other adders.

A ripple carry adder is built from multiple full adders to add N-bit numbers. Each full adder takes as its Cin the Cout of the previous adder. The first (and only the first) full adder may be replaced by a half adder.

The layout of a ripple carry adder is simple, which allows for fast design time; however, the ripple carry adder is relatively slow, since each full adder must wait for the carry bit to be calculated by the previous full adder. The gate delay can easily be calculated by inspection of the full adder circuit. Each full adder requires three levels of logic. In a 32-bit ripple carry adder there are 32 full adders, so the critical path (worst-case) delay is 32 * 3 = 96 gate delays.

The RCA can be used in applications where delay is not an issue.
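The ripple carry structure and the gate-delay estimate above can be sketched in Python as follows. This is a behavioural model only; the function names and the `levels_per_adder` parameter are illustrative assumptions, with the value 3 taken from the three logic levels per full adder stated above.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    total = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return total, cout


def ripple_carry_add(a, b, cin=0, width=32):
    """Add two integers bit by bit, passing each Cout to the next Cin."""
    carry = cin
    result = 0
    for i in range(width):
        bit, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= bit << i
    return result, carry


def worst_case_gate_delays(width=32, levels_per_adder=3):
    """Critical-path estimate: every stage waits for the previous carry."""
    return width * levels_per_adder
```

For example, adding 1 to 0xFFFFFFFF forces the carry through all 32 stages, which is the worst case that the 96 gate delay figure describes.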


3. ALU Design

A 32-bit ALU has been designed for 1.1 V operation, with the full adder implemented in CMOS logic. The ALU has 32 stages, each consisting of three parts: a) input multiplexers, b) a full adder and c) output multiplexers. The ALU performs two arithmetic operations, ADD and SUBTRACT, and three logical operations, XOR, AND and OR. The input and output sections consist of 4-to-1 and 2-to-1 multiplexers, which were also designed in CMOS logic. A set of three select signals determines the operation being performed and the inputs and outputs being selected. Figure 3.1 shows the 4-bit ALU with the CARRY bit cascading all the way from the first stage to the fourth stage. The 32-bit ALU was designed in 45 nm, twin-tub CMOS technology.

This chapter explains the 32-bit ALU design in detail. All of the multiplexers and the full adder have been implemented using CMOS logic. Each stage is discussed in the following sections of this chapter.

3.1 Multiplexer Design

The multiplexers are used in the ALU design for input and output signal selection and are implemented in CMOS. Two kinds of multiplexers are implemented: a 2-to-1 multiplexer and a 4-to-1 multiplexer. Figure 3.2 shows the block diagram of the 4-to-1 MUX and Fig. 3.3 shows its circuit-level diagram. Figure 3.4 shows the block diagram of the 2-to-1 MUX and Fig. 3.5 shows its circuit-level diagram. The output of the input multiplexer stage is passed as input to the full adder. A combination of 2-to-1 and 4-to-1 MUXes at the input and output stages selects the signals depending on the operation being performed.

The input and select signals are named An, Bn and Sn respectively, with the subscript n indicating the signal number. Figures 3.6 and 3.7 show how the selection logic has been implemented at the input and output stages, respectively. The select signals are S0, S1 and S2. The select signals S0 and S1 pick one of the four output signals and route it to the output of the ALU, and hence determine which of the four arithmetic or logical operations is performed. S2 determines whether the arithmetic operation performed is add or subtract, as listed in Table 3.1.

Select line S2    Operation
0                 Add
1                 Subtract

3.2 Full Adder Design

In the ALU, the full adder forms the core of the entire design. It performs the computing function of the ALU.

A full adder is a combinational circuit that forms the arithmetic sum of three input bits. It has three inputs and two outputs. In our design, the three inputs are designated A, B and CIN. The third input, CIN, represents the carry input. The outputs are SUM and CARRY. Figure 3.8 shows the logic-level diagram of a full adder. The Boolean expressions for the SUM and CARRY bits are shown below.

$\mathrm{SUM} = A \oplus B \oplus C_{IN}$

$\mathrm{CARRY} = A \cdot B + A \cdot C_{IN} + B \cdot C_{IN}$

The SUM bit is the XOR function of all three inputs, and the CARRY bit is 1 when at least two of the three inputs are 1 (the majority function). The truth table of a full adder is shown in Table 3.3. The truth table also indicates the status of the CARRY bit, that is, whether the carry has been generated, deleted or propagated, depending on the input bits A and B [8]. If exactly one of A and B is 1, the incoming carry is simply propagated, as the sum of A and B is 1. If both A and B are 1, a carry is generated, because summing A and B gives a SUM of 0 and a CARRY of 1. If both A and B are 0, summing A and B gives 0, so any incoming carry is absorbed into SUM and the CARRY bit is 0; this in effect deletes the carry. To construct an n-bit adder, n such 1-bit adders are cascaded. We have used this ripple carry adder (RCA) configuration in our ALU design.
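The generate, propagate and delete cases described above can be checked with a short Python sketch. It prints a full adder truth table annotated with the carry status; the function names are illustrative assumptions and the printed layout is not meant to reproduce Table 3.3.

```python
def carry_status(a, b):
    """Classify what a full adder stage does with the incoming carry."""
    if a & b:
        return "generate"   # both inputs 1: carry out is 1 regardless of carry in
    if a ^ b:
        return "propagate"  # exactly one input 1: carry out equals carry in
    return "delete"         # both inputs 0: carry out is 0 regardless of carry in


def full_adder(a, b, cin):
    """One-bit full adder: returns (SUM, CARRY)."""
    total = a ^ b ^ cin
    carry = (a & b) | (a & cin) | (b & cin)
    return total, carry


print("A B CIN SUM CARRY status")
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            total, carry = full_adder(a, b, cin)
            print(a, b, cin, " ", total, " ", carry, "  ", carry_status(a, b))
```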

Select line S2    Operation
0                 Cin
1                 1

In an RCA, the CARRY bit ripples all the way from the first stage to the nth stage. Figure 3.9 shows the block diagram of a four-bit ripple carry adder. The delay of an RCA depends on the number of stages cascaded and on the input bit pattern.

For certain input patterns, a CARRY is neither generated nor propagated, so the CARRY bit need not ripple through the stages, which reduces the delay of the circuit. On the other hand, certain input patterns generate a carry bit in the first stage itself, which might have to ripple through all the stages and so increases the delay. The propagation delay of such a case, also called the critical path delay, is defined as the worst-case delay over all possible input patterns. In a ripple carry adder, the worst-case delay occurs when a carry bit propagates all the way from the least significant bit position to the most significant bit position. The total delay of the adder is then the delay of a CARRY bit multiplied by the number of bits in the input word minus one, plus the delay of a SUM bit, as given by the following equation.

$T_{adder} = (N-1)\,T_{carry} + T_{sum}$

where $N$ is the number of bits in the input word, and $T_{carry}$ and $T_{sum}$ are the carry and sum propagation delays of a single stage. For an efficient ripple carry adder, it is more important to reduce $T_{carry}$ than $T_{sum}$, as the former has the greater influence on the total adder delay.
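To illustrate why the carry delay dominates, the delay equation can be evaluated for a few cases. The stage delays used below (300 ps and 400 ps) are hypothetical numbers chosen only for the example; they are not measurements from this design.

```python
def ripple_adder_delay(n_bits, t_carry, t_sum):
    """Worst-case ripple carry adder delay: (N - 1) * Tcarry + Tsum."""
    return (n_bits - 1) * t_carry + t_sum


# Hypothetical per-stage delays in picoseconds (assumed values).
baseline = ripple_adder_delay(32, t_carry=300, t_sum=400)
faster_carry = ripple_adder_delay(32, t_carry=150, t_sum=400)  # halve Tcarry
faster_sum = ripple_adder_delay(32, t_carry=300, t_sum=200)    # halve Tsum

print(baseline, faster_carry, faster_sum)
```

With these assumed values, halving the carry delay nearly halves the total, while halving the sum delay changes the total by only about two percent.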

3.3 Logical operations

The logical functions are implemented using the respective gates.

The AND and XOR functions are obtained from the adder itself, which significantly reduces the number of gates used.

The OR function is implemented using an OR gate embedded in the basic cell.


4. CMOS Implementation

Static CMOS logic is the most widely used logic style in today's industry. A static CMOS circuit is built using an NMOS pull-down tree that drives the output low for certain input combinations and a PMOS pull-up tree that drives the output high for all other input combinations. In static CMOS the transistors both compute the output value and drive the output, which is a great advantage since it improves the circuit's robustness to noise and to irregularities in the supply voltage.

The most basic gate is an inverter. Digital circuits consist of millions of transistors, and the switching rate of these transistors is a critical parameter in determining circuit performance. Transistor sizing is therefore important, as it determines the switching speed not only of the succeeding stage but of the preceding stage as well.

Conventionally, the PMOS-to-NMOS sizing ratio in an inverter is kept at approximately 2 to obtain comparable pull-up and pull-down performance, since the conductivity of a PMOS transistor is lower than that of an NMOS transistor. For simplicity, this conventional sizing ratio has been followed.

Inverter:

PMOS (W/L) = (180m/50m)

NMOS (W/L) = (90m/50m)

Adders and multiplexers are all realized with inverters and other basic gates. The ALU implementation is symmetric and possesses the following advantages: modules are reusable, less hardware is required, and it is easy to fabricate.

Throughout the design, care has been taken to size each PMOS at twice the width of the NMOS to achieve better performance.

The final load of the circuit is assumed to be 20 fF.


5. Basic Building Blocks

5.1 Inverter

An inverter is one of the basic gates. It gives a logic 1 for a logic 0 input and a logic 0 for a logic 1 input.

Schematic:

Symbol :


Layout

Output waveforms:

The bottom waveform is from the simulation of the layout, and the other waveforms are from the simulation of the schematic.


5.2 AND

The AND gate is also one of the basic gates. It is implemented using CMOS logic.

Truth Table for AND:

A    B    C = A and B
0    0    0
0    1    0
1    0    0
1    1    1

Schematic:

Symbol :


Layout:

Output waveforms:



5.3 OR

The OR gate is also one of the basic gates. It is implemented using CMOS logic.

Truth Table for OR:

A    B    C = A or B
0    0    0
0    1    1
1    0    1
1    1    1

Schematic:

Symbol:


Layout:

Output waveforms:



5.4 XOR

An XOR gate can be implemented in different logic styles. The Boolean equation for a two-input XOR is given by

$C = A \oplus B = A \cdot \overline{B} + \overline{A} \cdot B$

where A and B are the inputs and C is the output.

Truth Table for XOR:

A    B    C = A xor B
0    0    0
0    1    1
1    0    1
1    1    0

Schematic:

Symbol:


Layout:

Output waveforms:



5.5 Full Adder

A full adder is a logic circuit that adds three one-bit binary numbers, often written as A, B and Cin. Its Boolean equations for the sum and carry outputs, where A, B and Cin are the inputs and S and Cout are the outputs, are:

$S = A \oplus B \oplus C_{in}$

$C_{out} = A \cdot B + A \cdot C_{in} + B \cdot C_{in}$

Truth Table for Full Adder:

A    B    Cin    Sum    Cout
0    0    0      0      0
0    0    1      1      0
0    1    0      1      0
0    1    1      0      1
1    0    0      1      0
1    0    1      0      1
1    1    0      0      1
1    1    1      1      1

In our implementation, the outputs of one XOR gate and one AND gate inside the full adder circuit are reused as the XOR and AND results of the input bits. This reduced the total number of gates in the complete 32-bit ALU by 64.


Schematic:


Block diagram:

Layout:


Output waveforms:

The bottom two waveforms are from the simulation of the layout, and the other waveforms are from the simulation of the schematic.


5.6 2-to-1 Multiplexer

A 2-to-1 multiplexer selects one of two analog or digital input signals and forwards the selected input to a single output line. In its Boolean equation, ip1 and ip2 are the two inputs, ip3 is the selector input, and op is the output:

$op = (ip1 \cdot \overline{ip3}) + (ip2 \cdot ip3)$
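A minimal gate-level sketch of the 2-to-1 multiplexer in Python follows. It uses the select polarity the report states for ip3 (a 0 connects ip1 to the output, a 1 connects ip2); the function name `mux2` is an assumption made for illustration.

```python
def mux2(ip1, ip2, ip3):
    """2-to-1 multiplexer: ip3 = 0 selects ip1, ip3 = 1 selects ip2."""
    not_sel = ip3 ^ 1                       # inverter on the select line
    return (ip1 & not_sel) | (ip2 & ip3)    # two AND gates feeding an OR gate


# Walk through all eight input combinations.
for ip3 in (0, 1):
    for ip1 in (0, 1):
        for ip2 in (0, 1):
            print(ip3, ip1, ip2, "->", mux2(ip1, ip2, ip3))
```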

Schematic:

Block diagram:


Layout

Output waveforms for ip3=0:

The bottom two waveforms are from the simulation of the layout, and the other waveforms are from the simulation of the schematic.


5.7 3-Input AND

The 3-input AND gate is used in the 4-to-1 multiplexer circuit.

Its output is logic 1 if and only if all three inputs are logic 1, and logic 0 if any input is logic 0.

Schematic :

Block diagram:

Layout :


5.8 4-Input OR

The 4-input OR gate is used in the 4-to-1 multiplexer circuit.

Its output is logic 1 if any of the four inputs is logic 1, and logic 0 if all the inputs are logic 0.

Schematic:

Block diagram:


Layout :

5.9 4-to-1 Multiplexer

A 4-to-1 multiplexer selects one of four analog or digital input signals and forwards the selected input to a single output line. Two selector inputs determine which of the four inputs is connected to the output.

In our circuit, sel0 and sel1 are the selector inputs and ip1, ip2, ip3 and ip4 are the four inputs to the multiplexer. This extends the 2-to-1 multiplexer, in which a logic value of 0 at ip3 connects ip1 to the output while a logic value of 1 connects ip2 to the output.
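The 4-to-1 multiplexer can be sketched from the 3-input AND and 4-input OR gates described in Sections 5.7 and 5.8. The mapping of select codes to inputs used here ((sel1, sel0) = 00 selects ip1, 01 selects ip2, 10 selects ip3, 11 selects ip4) is an assumption for illustration, as are the function names; the report's schematic may order the inputs differently.

```python
def inv(x):
    """Inverter."""
    return x ^ 1


def and3(a, b, c):
    """3-input AND gate."""
    return a & b & c


def or4(a, b, c, d):
    """4-input OR gate."""
    return a | b | c | d


def mux4(ip1, ip2, ip3, ip4, sel1, sel0):
    """4-to-1 multiplexer: one AND term per input, combined by a 4-input OR."""
    return or4(
        and3(ip1, inv(sel1), inv(sel0)),  # assumed: 00 selects ip1
        and3(ip2, inv(sel1), sel0),       # assumed: 01 selects ip2
        and3(ip3, sel1, inv(sel0)),       # assumed: 10 selects ip3
        and3(ip4, sel1, sel0),            # assumed: 11 selects ip4
    )
```

Exactly one AND term is enabled for each select code, so the OR gate simply passes the chosen input through.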

Schematic:


Block diagram:


Layout:

Inputs:


Output from schematic and layout for S1=0 S0=0

Output from schematic and layout for S1=0 S0= 1


5.10 1-Bit ALU

The 1 bit ALU performs 5 different functions based on the inputs to the select lines.

Truth Table for the functions of 1 bit ALU :

Sl No.    S2    S1    S0    OPERATION
1         0     0     0     Addition
2         1     0     0     Subtraction
3         0     0     1     Xor
4         0     1     0     Or
5         0     1     1     And
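The truth table can be turned into a small behavioural model of the 1-bit ALU. This is a sketch under stated assumptions rather than a transcription of the schematic: it assumes that S2 = 1 inverts the B input before the adder (the usual two's complement approach to subtraction), and the function name `alu_1bit` is hypothetical. The output selection by S1 and S0 follows the table above.

```python
def alu_1bit(a, b, cin, s2, s1, s0):
    """Behavioural 1-bit ALU slice: returns (result, carry_out)."""
    b_eff = b ^ s2              # assumption: S2 = 1 inverts B for subtraction
    x = a ^ b_eff               # XOR gate shared with the full adder
    n = a & b_eff               # AND gate shared with the full adder
    total = x ^ cin             # full adder sum
    cout = n | (x & cin)        # full adder carry out
    outputs = {
        (0, 0): total,          # S1 S0 = 0 0: adder output (add or subtract)
        (0, 1): x,              # S1 S0 = 0 1: XOR
        (1, 0): a | b_eff,      # S1 S0 = 1 0: OR
        (1, 1): n,              # S1 S0 = 1 1: AND
    }
    return outputs[(s1, s0)], cout
```

For subtraction of single bits the carry in must be 1, so `alu_1bit(1, 1, 1, 1, 0, 0)` models 1 - 1 and returns a result of 0.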

Schematic:


Symbol:

Layout:


Inputs:


The bottom two waveforms are from the simulation of the schematic, and the top waveforms are from the simulation of the layout.

Output from schematic and layout for S2=0 S1=0 S0=0: ADD

Output from schematic and layout for S2=1 S1=0 S0=0: SUBTRACT

Output from schematic and layout for S2=0 S1=0 S0=1 : XOR

Output from schematic and layout for S2=0 S1=1 S0=0: OR


Output from schematic and layout for S2=0 S1=1 S0=1 : AND


6. 32-Bit ALU

The 1-bit ALU block is used 32 times to implement the 32-bit ALU.

The inputs are:

A with bits A0 - A31, A0 being the LSB and A31 the MSB.
B with bits B0 - B31, B0 being the LSB and B31 the MSB.
Cin
Select lines S0, S1 and S2

The outputs are:

R with bits R0 - R31
COUT, the final carry.

Truth Table for the functions:

Sl No.    S2    S1    S0    OPERATION
1         0     0     0     Addition
2         1     0     0     Subtraction
3         0     0     1     Xor
4         0     1     0     Or
5         0     1     1     And
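A behavioural sketch of the complete 32-bit ALU, built by chaining 32 one-bit slices, is shown below. It relies on two points that should be read as assumptions: the B input is inverted when S2 = 1, and the carry into the first stage is forced to 1 when S2 = 1 and is Cin otherwise (this follows the S2 carry-input table in the report). The function name `alu32` is hypothetical.

```python
def alu32(a, b, s2, s1, s0, cin=0):
    """Behavioural 32-bit ALU: returns (R, COUT) for 32-bit integers a and b."""
    carry = 1 if s2 else cin                # carry into the first stage
    result = 0
    for i in range(32):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ s2            # assumption: B inverted for subtraction
        outputs = (
            ai ^ bi ^ carry,                # S1 S0 = 0 0: sum (add or subtract)
            ai ^ bi,                        # S1 S0 = 0 1: XOR
            ai | bi,                        # S1 S0 = 1 0: OR
            ai & bi,                        # S1 S0 = 1 1: AND
        )
        result |= outputs[(s1 << 1) | s0] << i
        carry = (ai & bi) | (carry & (ai ^ bi))   # ripple to the next slice
    return result, carry


print(alu32(5, 3, 0, 0, 0))   # addition
print(alu32(5, 3, 1, 0, 0))   # subtraction
```

In this model a subtraction A - B with A >= B leaves COUT at 1, which is the normal behaviour of two's complement subtraction performed on an adder.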


6.2 Layout


6.3 Output waveforms for Add function:

The bottom two waveforms are from the simulation of the schematic, and the top waveforms are from the simulation of the layout.


7. Schematic and Layout Verification of the 32-Bit ALU

Testing Plan

In building our circuit, we tested each component separately using HSPICE, making sure every component functioned individually exactly as expected. Then, as we constructed the final 32-bit ALU, we tested the circuit incrementally in the following way:

Tested the basic gates, i.e. inverter, AND, OR and XOR, for all possible input patterns.
Tested the full adder for different inputs and obtained the proper sum and carry outputs.
Tested the 2:1 multiplexer and 4:1 multiplexer with different select line inputs.
Integrated the full adder and 2:1 multiplexer and tested the addition and subtraction functions for 1-bit positive numbers.
Connected the OR gate and 4:1 multiplexer to the above 1-bit add/subtract circuit to realize the 1-bit ALU.
Fed the outputs of the NAND and XOR gates used in the full adder circuit to the 4:1 multiplexer as inputs.
Tested the 1-bit ALU with different select line input combinations to obtain the corresponding output of the desired function.
Tested the final 32-bit ALU for different positive numbers and obtained the power dissipation, rise time and fall time.
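The first step of the plan, testing a gate for all possible input patterns, has a simple software analogue. The sketch below is not the HSPICE procedure used in the project; it is a hypothetical Python helper (`exhaustive_check`) that compares a gate model against a reference function over every input combination and returns the patterns on which they disagree.

```python
from itertools import product


def exhaustive_check(gate, reference, n_inputs):
    """Return the input patterns on which gate and reference disagree."""
    return [bits for bits in product((0, 1), repeat=n_inputs)
            if gate(*bits) != reference(*bits)]


def xor_from_basic_gates(a, b):
    """XOR built from inverters, AND gates and an OR gate."""
    return (a & (b ^ 1)) | ((a ^ 1) & b)


def stuck_at_zero(a, b):
    """A deliberately faulty gate whose output never rises."""
    return 0


# An empty list means the gate matches the reference for every pattern.
print(exhaustive_check(xor_from_basic_gates, lambda a, b: a ^ b, 2))
# A faulty gate is reported with the exact patterns that expose the fault.
print(exhaustive_check(stuck_at_zero, lambda a, b: a ^ b, 2))
```

Exhaustive checking is practical for the small blocks (a full adder has only eight patterns); for the 32-bit ALU only selected operand values can be applied, as in the last step of the plan.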

8. Experimental Results


Block              Delay
Full Adder         1.05 ns
2:1 Multiplexer    0.048 ns
4:1 Multiplexer    0.06 ns
32 bit AND         10.0 ns
32 bit OR          0.25 ns
32 bit XOR         10.2 ns
32 bit Sum         10.3 ns
32 bit carry

Critical path delay: 10.346 ns

Total number of transistors: 4032

Total power dissipation: 6.1242W


9. Further Upgradation

Currently, the ALU is designed for positive integers. The next generation of this topology can be extended to handle floating-point and negative numbers.

The carry logic can be implemented using faster logic styles that give lower delay at the expense of higher power dissipation and complexity.

10. Learnings

We broadened our horizons regarding the various types of static and dynamic logic adder implementations.

Implementing this project in Cadence gave us hands-on experience with the design and drawing of layouts, considering all the minor details involved.

Contributions

The following summarizes the various aspects of the project and identifies the contribution of each group member to each part.

Circuit design and HSPICE simulations: Soumya

Layout: Raghu, Prathamesh


11. References

John P. Uyemura, Introduction to VLSI Circuits and Systems

Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolic, Digital Integrated Circuits

I. S. Hwang and P. S. Magarshack, A High-Speed Dynamically reconfigurable 32 bit CMOS adder