8/3/2019 32 Bit a Lure Port
1/52
EECE7353
VLSI Design (Spring 2010)
Final Project
Design and implementation of 32-bit
Arithmetic Logic Unit
by
Soumya Shivakumar Begur
Raghu Varier
Prathamesh Chawan
April 23, 2010
Northeastern University
Electrical and Computer Engineering Department
Contents
1. Introduction
2. Topology Selection
3. ALU Design
4. CMOS Implementation
5. Basic Building Blocks
5.1 Inverter
5.2 And
5.3 Or
5.4 Xor
5.5 Full Adder
5.6 2-to-1 Multiplexer
5.7 3 input AND
5.8 4 input OR
5.9 4-to-1 Multiplexer
5.10 1 bit ALU
6. 32 Bit ALU
6.1 Schematic
6.2 Layout
6.3 Output waveforms
7. Schematic and layout Verification of 32 bit ALU
8. Experimental Results
9. Further Upgradation
10. Learnings
11. References
1. Introduction
The arithmetic logic unit (ALU) is the core of a CPU in a computer. The adder cell is the
elementary unit of an ALU. The constraints the adder has to satisfy are area, power and speed
requirements. Some of the conventional types of adders are the ripple-carry adder, carry-lookahead adder, carry-skip adder and Manchester carry chain adder [8]. The delay in an adder is
dominated by the carry chain. Carry chain analysis must consider transistor and wiring delays.
The aim of this project is to design a 32-bit ALU which performs the following five functions:
Arithmetic ADD function
Arithmetic SUBTRACT function
Logical AND function
Logical OR function
Logical XOR function
Since this ALU operates on 32-bit operands, it is called a 32-bit ALU.
The function to be performed on the operands A and B is defined by the ALU control / select lines.
The result lines provide the result of the chosen function applied to the operands A and B.
Carry out indicates the final carry.
2. Topology Selection
The adder cell is the elementary unit of an ALU. The constraints the adder has to satisfy are area, power and speed requirements. An adder can be implemented using static or dynamic logic. The various kinds of implementations are:
Ripple carry adder
Carry look ahead adder
Mirror adder
Manchester carry chain adder
Domino adder
Carry skip adder
A ripple carry adder is built from multiple full adders to add N-bit numbers. Each full adder takes as its Cin the Cout of the previous adder. This kind of adder is called a ripple carry adder, since each carry bit "ripples" to the next full adder.
Carry-lookahead adders first compute the carry propagate and generate signals and then compute SUM and CARRY from them. This allows the carry into each bit to be computed directly, without waiting for it to ripple through the preceding stages. Figure 1.2 shows a 4-bit carry-lookahead adder. The carry-lookahead unit requires complex wiring between the adders and the lookahead unit, as the values must be routed back to the adders from the lookahead unit. The layout becomes complex with multiple levels of lookahead.
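The propagate/generate idea can be illustrated with a small behavioural model. The sketch below is not the circuit of Figure 1.2; it is a minimal Python model of a 4-bit lookahead, assuming the usual definitions generate = A AND B and propagate = A XOR B. The function name `cla4` is hypothetical.

```python
def cla4(a, b, cin):
    """Add two 4-bit integers using lookahead carries; returns (sum, cout)."""
    abits = [(a >> i) & 1 for i in range(4)]
    bbits = [(b >> i) & 1 for i in range(4)]
    g = [x & y for x, y in zip(abits, bbits)]  # generate: both inputs are 1
    p = [x ^ y for x, y in zip(abits, bbits)]  # propagate: exactly one input is 1

    # Every carry is written directly in terms of g, p and cin,
    # so no carry has to wait for the previous stage.
    c0 = cin
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    total = sum((p[i] ^ carries[i]) << i for i in range(4))
    return total, c4
```

The growing number of terms in c1 to c4 mirrors the wiring complexity mentioned above: each additional bit needs a wider gate and more routed signals.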
Figure 1.3 shows a 4-bit carry-skip adder and the skip module used. The skip module determines whether it can pass the carry in (CIN) directly to the next four bits for addition or whether it has to wait until the carry out (C3) propagates through the last full adder in the design. In essence, the skip module can make the carry in (CIN) appear to skip through the four full adders.
The Manchester carry chain adder uses a precharged carry chain with P and G signals. The propagate signal Pi is the XOR of input bits Ai and Bi, and the generate signal Gi is the NAND of input bits Ai and Bi. The propagate signal connects adjacent carry bits and the generate signal discharges the carry bit. Figure 1.4 shows a Manchester carry chain. When the input bits are 0, Gi is HIGH and hence the carry out node is discharged. When one of the input bits is 1, Pi is HIGH and carry out follows carry in. When both bits are 1, both Gi and Pi are LOW; hence the carry out node remains isolated from carry in and ground. As the node is pre-charged to a HIGH state, the carry out remains HIGH.
Each of the adder configurations may or may not require additional logic apart from the full adder design. The table shows approximately how many additional gates and transistors are required for each of the adder configurations.
Dynamic logic can also be used for adder implementation in high-speed circuits. Here, the logic blocks are built with an NMOS pull-down tree (see figure) that is pre-charged and discharged through series clocking transistors. The output of the logic gate is driven by a built-in inverter that is dynamically fed by the drain of the NMOS tree, which can make at most one transition, from logic 1 to 0, during the clock evaluation phase; this allows the output inverter to shift from logic 0 to 1. Therefore any number of gates can be cascaded, given that evaluation of the input values is only possible in half the clock phase.
Among these different implementations, we first tried implementing a Domino adder using multiple-output Domino logic based on [3], for the following reasons:
Domino circuits work at very high speed.
Fewer transistors are required: N+2, where N is the fan-in.
It is advantageous for circuits with larger fan-in, as in a 32-bit adder.
More than 60% of high-performance microprocessors use domino logic to implement different functionalities.
Domino logic is faster for the following reasons:
The logic threshold voltage is the same as the device threshold voltage VT.
Less gate loading (less input capacitance).
Less output loading (less output capacitance).
We were not able to complete the 32-bit Domino adder because the propagate circuit produced a wrong propagate bit, which plays a vital role in the generation of the carry bit. Due to the time constraints of the project we were not able to debug the error, and so we started over with the less complex ripple carry adder.
Ripple Carry Adder:
In terms of area efficiency the ripple carry adder is preferred. Keeping in mind the small layout area and the smaller number of interconnections, our ALU has been designed using the ripple carry configuration. However, its worst-case delay is longer than that of the other adders.
A ripple carry adder is built from multiple full adders to add N-bit numbers. Each full adder takes as its Cin the Cout of the previous adder. The first (and only the first) full adder may be replaced by a half adder.
The layout of a ripple carry adder is simple, which allows for fast design time; however, the ripple carry adder is relatively slow, since each full adder must wait for the carry bit to be calculated by the previous full adder. The gate delay can easily be calculated by inspection of the full adder circuit. Each full adder requires three levels of logic. In a 32-bit ripple carry adder there are 32 full adders, so the critical path (worst case) delay is 32 * 3 = 96 gate delays.
The RCA can be used in applications where delay is not an issue.
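The rippling of the carry and the gate-delay estimate above can be sketched in a few lines of Python. This is a behavioural illustration only, not the transistor-level design; the function names are hypothetical, and the three logic levels per full adder are taken from the text.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)


def ripple_carry_add(a, b, cin=0, n=32):
    """Add two n-bit integers one bit at a time, LSB first."""
    carry = cin
    result = 0
    for i in range(n):
        # Stage i cannot finish until stage i-1 has produced its carry.
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry


def worst_case_gate_delays(n, levels_per_adder=3):
    """Critical-path estimate: the carry crosses every full adder."""
    return n * levels_per_adder


print(ripple_carry_add(0xFFFFFFFF, 1))   # carry ripples through all 32 stages
print(worst_case_gate_delays(32))        # 32 * 3
```

Adding 1 to 0xFFFFFFFF is the kind of input that exercises the full carry chain, which is why it is used here as the example.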
3. ALU Design
A 32-bit ALU has been designed for 1.1 V operation, in which the full adder has been implemented using CMOS logic. The ALU has 32 stages, each stage consisting of three parts: a) input multiplexers, b) full adder and c) output multiplexers. The ALU performs two arithmetic operations, ADD and SUBTRACT, and three logical operations, XOR, AND and OR. The input and output sections consist of 4 to 1 and 2 to 1 multiplexers. The multiplexers were designed using CMOS logic. A set of three select signals has been incorporated in the design to determine the operation being performed and the inputs and outputs being selected. Figure 3.1 shows the 4-bit ALU with the CARRY bit cascading all the way from the first stage to the fourth stage. The 32-bit ALU was designed in 45 nm, twin-tub CMOS technology.
This chapter explains the 32-bit ALU design in detail. All of the multiplexers and the full adder have been implemented using CMOS logic. Each stage is discussed in detail in the following sections of this chapter.
3.1 Multiplexer Design
The multiplexers are used in the ALU design for input and output signal selection. The multiplexers are implemented using CMOS.
There are two kinds of multiplexers implemented: 2 to 1 multiplexer and 4 to 1 multiplexer.
Figure 3.2 shows the block diagram of a 4 to 1 MUX and Fig. 3.3 shows the circuit level diagram
of the 4 to 1 MUX. Figure 3.4 shows the block diagram of a 2 to 1 MUX and Fig. 3.5 shows the
circuit level diagram of the 2 to 1 MUX. The output of the multiplexer stage is passed as input to
the full adder. A combination of the 2 to 1 MUX and 4 to 1 MUX at the input and output stages
select the signals depending on the operation being performed.
The input and select signals have been named An, Bn and Sn respectively, with the subscript n indicating the signal number. The input and the output stages use a combination of 2 to 1 and 4 to 1 multiplexers to select the type of operation. Figures 3.6 and 3.7 show how this logic has been implemented at the input and output stages, respectively. The select signals are S0, S1 and S2. The select signals S0 and S1 pick one of the four output signals and route it to the output of the ALU, and hence determine which of the arithmetic or logical operations is performed. Signal S2 determines whether the arithmetic operation performed is add or subtract, as listed in Table 3.1.
Select line S2    Operation
0                 Add
1                 Subtract
3.2 Full Adder Design
In the ALU, the full adder forms the core of the entire design. The full adder performs the computing function of the ALU.
A full adder can be defined as a combinational circuit that forms the arithmetic sum of three input bits. It has three inputs and two outputs. In our design, we have designated the three inputs as A, B and CIN. The third input, CIN, represents the carry input. The outputs are SUM and CARRY. Figure 3.8 shows the logic level diagram of a full adder. The Boolean expressions for the SUM and CARRY bits are as shown below.
$$\mathrm{SUM} = A \oplus B \oplus C_{IN}$$
$$\mathrm{CARRY} = A\,B + A\,C_{IN} + B\,C_{IN}$$
The SUM bit is the XOR function of all three inputs, and the CARRY bit is 1 whenever at least two of the three inputs are 1. The truth table of a full adder is shown in Table 3.3. The truth table also indicates the status of the CARRY bit, that is, whether the carry has been generated, deleted or propagated. Depending on the status of input bits A and B, the CARRY bit is either generated, deleted or propagated [8]. If exactly one of the A and B inputs is 1, the previous carry is simply propagated, as the sum of A and B is 1. If both A and B are 1, a carry is generated, because summing A and B makes the output SUM 0 and CARRY 1. If both A and B are 0, summing A and B gives 0, and any previous carry is added to this SUM, making the CARRY bit 0. This in effect deletes the CARRY. To construct an n-bit adder, we cascade n such 1-bit adders. We have used this ripple carry adder (RCA) configuration in our ALU design.
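The generate / propagate / delete classification can be checked against the Boolean expressions by enumerating all eight input combinations. The following is a minimal sketch; the helper names `carry_status` and `truth_table` are hypothetical and not part of the design.

```python
from itertools import product


def full_adder(a, b, cin):
    """Return (SUM, CARRY) for one-bit inputs."""
    total = a ^ b ^ cin
    carry = (a & b) | (a & cin) | (b & cin)
    return total, carry


def carry_status(a, b):
    """Classify what a stage does with the incoming carry."""
    if a and b:
        return "generate"    # CARRY is 1 regardless of CIN
    if a or b:
        return "propagate"   # CARRY equals CIN
    return "delete"          # CARRY is 0 regardless of CIN


def truth_table():
    """Rows of (A, B, CIN, SUM, CARRY, status) for all input patterns."""
    rows = []
    for a, b, cin in product((0, 1), repeat=3):
        total, carry = full_adder(a, b, cin)
        rows.append((a, b, cin, total, carry, carry_status(a, b)))
    return rows


if __name__ == "__main__":
    for row in truth_table():
        print(*row)
```

Only the "propagate" rows make CARRY depend on CIN, which is why long runs of propagating stages determine the worst-case delay of a ripple carry adder.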
Select line S2    Carry input
0                 Cin
1                 1
In an RCA, the CARRY bit ripples all the way from the first stage to the nth stage. Figure 3.9 shows the block diagram of a four-bit ripple carry adder. The delay in an RCA depends on the number of stages cascaded and also on the input bit patterns.
For certain input patterns, a CARRY is neither generated nor propagated, so the CARRY bit need not ripple through the stages. This effectively reduces the delay in the circuit. On the other hand, certain input patterns generate a carry bit in the first stage itself, which might have to ripple through all the stages. This increases the delay in the circuit. The propagation delay of such a case, also called the critical path, is defined as the worst-case delay over all possible input patterns. In a ripple carry adder, the worst-case delay occurs when a carry bit propagates all the way from the least significant bit position to the most significant bit position. The total delay of the adder is then the delay of a CARRY bit multiplied by the number of bits in the input word minus one, plus the delay of a SUM bit, as given by the following equation.
$$T_{adder} = (N-1)\,T_{carry} + T_{sum}$$
where $N$ is the number of bits in the input word, and $T_{carry}$ and $T_{sum}$ are the propagation delays of the carry and sum bits from one stage to another. For an efficient ripple carry adder, it is more important to reduce $T_{carry}$ than $T_{sum}$, as the former influences the total adder delay more.
3.3 Logical operations
The logical functions are implemented using the respective gates.
The AND and XOR functions are incorporated with the adder itself which results in significant
reduction in the number of gates used.
The OR function is implemented using an OR gate embedded into the basic cell.
4. CMOS Implementation
Static CMOS logic is the most widely used logic style in today's industry. A static CMOS adder is built using an NMOS pull-down tree that drives the output low for certain input combinations and a PMOS pull-up tree that drives the output high for all other input combinations. In static CMOS the transistors both compute the output value and drive the output, which is a great advantage since it improves the circuit's robustness to noise and to irregularities in the supply voltage.
The most basic gate is the inverter. Digital circuits consist of millions of transistors, and the switching rate of these transistors is a critical parameter in determining circuit performance. Transistor sizing is therefore very important, as it determines the switching speed of not only the succeeding stage but of the previous stage as well.
Conventionally, the PMOS-to-NMOS sizing ratio in an inverter is kept at approximately 2 to obtain comparable drive strength from the two transistors, since the conductivity of PMOS is lower than that of NMOS. For simplicity, this conventional transistor sizing ratio has been followed.
Inverter:
PMOS (W/L) = (180 nm / 50 nm)
NMOS (W/L) = (90 nm / 50 nm)
Adders and multiplexers are all realized with inverters and other basic gates. The ALU implementation is symmetric and possesses the following advantages: modules are reusable, less hardware is required, and it is easy to fabricate.
In the entire design, care has been taken to size the PMOS at twice the width of the NMOS to achieve better performance.
The final load of the circuit is assumed to be 20 fF.
5. Basic Building Blocks
5.1 Inverter
An inverter is one of the basic gates. It gives a logic 1 for a logic 0 input and a logic 0 for a logic 1 input.
Schematic:
Symbol :
Layout
Output waveforms:
The bottom waveform is from the simulation of the layout, and the other waveforms are from the simulation of the schematic.
5.2 AND
The AND gate is also one of the basic gates. It is implemented using CMOS logic.
Truth Table for AND:
A  B  C = A and B
0  0  0
0  1  0
1  0  0
1  1  1
Schematic:
Symbol :
Layout:
Output waveforms:
The bottom waveform is from the simulation of the layout, and the other waveforms are from the simulation of the schematic.
5.3 OR
The OR gate is also one of the basic gates. It is implemented using CMOS logic.
Truth Table for OR:
A  B  C = A or B
0  0  0
0  1  1
1  0  1
1  1  1
Schematic:
Symbol:
Layout:
Output waveforms:
The bottom waveform is from the simulation of the layout, and the other waveforms are from the simulation of the schematic.
5.4 XOR
An XOR gate can be implemented using different logic styles. The Boolean equation for a two-input XOR is given by
$$C = A \oplus B = A\,\overline{B} + \overline{A}\,B$$
where A and B are the inputs and C is the output.
Truth Table for XOR:
A  B  C = A xor B
0  0  0
0  1  1
1  0  1
1  1  0
Schematic:
Symbol:
Layout:
Output waveforms:
The bottom waveform is from the simulation of the layout, and the other waveforms are from the simulation of the schematic.
5.5 Full Adder
A full adder is a logic circuit that performs an addition operation on three one-bit binary numbers, often written as A, B and Cin. A full adder has the following Boolean equations for the output sum and carry, where A, B and Cin are the inputs and S and Cout are the outputs.
$$S = A \oplus B \oplus C_{in}$$
$$C_{out} = A\,B + A\,C_{in} + B\,C_{in}$$
Truth Table for Full Adder:
A  B  Cin  Sum  Cout
0  0  0    0    0
0  0  1    1    0
0  1  0    1    0
0  1  1    0    1
1  0  0    1    0
1  0  1    0    1
1  1  0    0    1
1  1  1    1    1
In our implementation, the outputs of one XOR gate and one AND gate inside the full adder circuit are reused as the XOR and AND results of the input bits. This reduced the total number of gates in the complete 32-bit ALU by 64 (two gates per bit).
Schematic:
Block diagram:
Layout:
Output waveforms:
The bottom two waveforms are from the simulation of the layout, and the other waveforms are from the simulation of the schematic.
5.6 2-to-1 Multiplexer
A 2-to-1 multiplexer selects one of two analog or digital input signals and forwards the selected input onto a single line. A 2-to-1 multiplexer has the following Boolean equation, where ip1 and ip2 are the two inputs, ip3 is the selector input, and op is the output:
$$op = ip1 \cdot \overline{ip3} + ip2 \cdot ip3$$
Schematic:
Block diagram:
Layout
Output waveforms for ip3=0:
The bottom two waveforms are from the simulation of the layout, and the other waveforms are from the simulation of the schematic.
5.7 3 Input AND
A 3-input AND gate is used in the 4-to-1 multiplexer circuit.
The output is logic 1 if and only if all three inputs are logic 1, and logic 0 if any input is logic 0.
Schematic :
Block diagram:
Layout :
5.8 4 Input OR
A 4-input OR gate is used in the 4-to-1 multiplexer circuit.
The output is logic 1 if any of the four inputs is logic 1, and logic 0 if all the inputs are logic 0.
Schematic:
Block diagram:
Layout :
5.9 4-to-1 Multiplexer
A 4-to-1 multiplexer selects one of four analog or digital input signals and forwards the selected input onto a single line. A 4-to-1 multiplexer has a Boolean equation in which S0 and S1 are the selector inputs and F is the output.
In our circuit, sel0 and sel1 are the selector inputs, and ip1, ip2, ip3 and ip4 are the four inputs to the multiplexer. In the 2-to-1 multiplexer, a logic value of 0 at ip3 connects ip1 to the output, while a logic value of 1 connects ip2 to the output.
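A gate-level view of the two multiplexers can be sketched as follows, using the 3-input AND and 4-input OR gates of Sections 5.7 and 5.8. The signal names follow the text; the assignment of select codes to ip1..ip4 in `mux4` (00 selects ip1, 01 ip2, 10 ip3, 11 ip4) is an assumption, since the report does not state it.

```python
def inv(a):
    """Inverter on a one-bit value."""
    return a ^ 1


def and3(a, b, c):
    """3-input AND gate."""
    return a & b & c


def or4(a, b, c, d):
    """4-input OR gate."""
    return a | b | c | d


def mux2(ip1, ip2, ip3):
    """2-to-1 multiplexer: ip3 = 0 selects ip1, ip3 = 1 selects ip2."""
    return (ip1 & inv(ip3)) | (ip2 & ip3)


def mux4(ip1, ip2, ip3, ip4, sel1, sel0):
    """4-to-1 multiplexer built from four 3-input ANDs and one 4-input OR.

    Assumed select coding (sel1, sel0): 00 -> ip1, 01 -> ip2, 10 -> ip3, 11 -> ip4.
    """
    return or4(
        and3(ip1, inv(sel1), inv(sel0)),
        and3(ip2, inv(sel1), sel0),
        and3(ip3, sel1, inv(sel0)),
        and3(ip4, sel1, sel0),
    )
```

Exactly one of the four AND gates is enabled for any select code, so the OR gate simply passes the chosen input through.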
Schematic:
Block diagram:
Layout:
Inputs:
Output from schematic and layout for S1=0 S0=0
Output from schematic and layout for S1=0 S0= 1
Output from schematic and layout for S1=1 S0=0
Output from schematic and layout for S1=1 S0=1
5.10 1 Bit ALU
The 1 bit ALU performs 5 different functions based on the inputs to the select lines.
Truth Table for the functions of 1 bit ALU :
Sl No. S2 S1 S0 OPERATION
1 0 0 0 Addition
2 1 0 0 Subtraction
3 0 0 1 Xor
4 0 1 0 Or
5 0 1 1 And
Schematic:
Symbol:
Layout:
Inputs:
The bottom 2 waveforms are from the simulation of the schematic, and the top waveforms are from the simulation of the layout.
Output from schematic and layout for S2=0 S1= 0 S0= 0: ADD
Output from schematic and layout for S2=1 S1=0 S0=0: SUBTRACT
Output from schematic and layout for S2=0 S1=0 S0=1 : XOR
Output from schematic and layout for S2=0 S1=1 S0= 0:OR
Output from schematic and layout for S2=0 S1=1 S0=1 : AND
6. 32 Bit ALU
The 1 bit ALU block is used 32 times to implement the 32 bit ALU.
The inputs are,
A with bits A0 - A31. A0 being LSB and A31 being MSB.
B with bits B0 - B31. B0 being LSB and B31 being MSB.
Cin
Select lines S0, S1 and S2
The outputs are,
R with bits R0 - R31.
COUT is the final carry.
Truth Table for the functions:
Sl No. S2 S1 S0 OPERATION
1 0 0 0 Addition
2 1 0 0 Subtraction
3 0 0 1 Xor
4 0 1 0 Or
5 0 1 1 And
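The function table above can be exercised with a small behavioural model that chains 32 one-bit cells. This is a sketch under stated assumptions rather than the implemented circuit: it assumes that S2 = 1 complements B and forces the first carry in to 1 (two's-complement subtraction), and that the S1 S0 codes select the sum, XOR, OR and AND outputs in the order given in the table. The names `alu_bit` and `alu32` are hypothetical.

```python
def alu_bit(a, b, cin, s2, s1, s0):
    """One ALU stage on one-bit inputs: returns (result, cout)."""
    b_eff = b ^ s2            # assumption: S2 = 1 complements B for subtraction
    xor_out = a ^ b_eff       # XOR gate shared by the adder and the XOR function
    and_out = a & b_eff       # AND gate shared by the adder and the AND function
    or_out = a | b            # separate OR gate in the basic cell
    sum_out = xor_out ^ cin
    cout = and_out | (xor_out & cin)
    # Output selection on (S1, S0): 00 sum, 01 XOR, 10 OR, 11 AND.
    result = (sum_out, xor_out, or_out, and_out)[(s1 << 1) | s0]
    return result, cout


def alu32(a, b, s2, s1, s0, cin=0):
    """32-bit ALU built from 32 chained one-bit stages: returns (R, COUT)."""
    carry = 1 if s2 else cin  # assumption: S2 = 1 forces the first carry in to 1
    result = 0
    for i in range(32):
        bit, carry = alu_bit((a >> i) & 1, (b >> i) & 1, carry, s2, s1, s0)
        result |= bit << i
    return result, carry


print(alu32(5, 3, 0, 0, 0))                # addition
print(alu32(5, 3, 1, 0, 0))                # subtraction
print(bin(alu32(0b1100, 0b1010, 0, 0, 1)[0]))  # XOR
```

In this model COUT is still produced for the logical operations, but it is only meaningful for addition and subtraction.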
6.2 Layout
6.3 Output waveforms for Add function:
The bottom 2 waveforms are from the simulation of the schematic, and the top waveforms are from the simulation of the layout.
7. Schematic and Layout Verification of the 32 Bit ALU
Testing Plan
In building our circuit, we tested each component separately using HSPICE, making sure all the components functioned individually exactly as we expected them to. Then, as we were constructing the final 32-bit ALU, we incrementally tested the circuit in the following way:
Tested the basic gates, i.e. inverter, AND, OR and XOR, for all possible input patterns.
Tested the full adder for different inputs and obtained the proper sum and carry outputs.
Tested the 2:1 multiplexer and 4:1 multiplexer with different select line inputs.
Integrated the full adder and 2:1 multiplexer and tested the addition and subtraction functions for 1-bit positive numbers.
Connected the OR gate and 4:1 multiplexer to the above 1-bit add/subtract circuit so as to realize the 1-bit ALU.
Fed the outputs of the NAND and XOR gates used in the full adder circuit to the 4:1 multiplexer as inputs.
Tested the 1-bit ALU with different select line input combinations so as to obtain the corresponding output of the desired function.
Tested the final 32-bit ALU for different positive numbers and obtained the power dissipation, rise time and fall time.
8. Experimental Results
Block              Delay
Full Adder         1.05 ns
2:1 Multiplexer    0.048 ns
4:1 Multiplexer    0.06 ns
32 bit AND         10.0 ns
32 bit OR          0.25 ns
32 bit XOR         10.2 ns
32 bit Sum         10.3 ns
32 bit carry
Critical path delay: 10.346 ns
Total number of Transistors: 4032
Total Power Dissipation: 6.1242 W
9. Further Upgradation
Currently, the ALU is designed for positive integers. The next generation of this topology can be extended to handle floating-point and negative numbers.
The carry logic can be implemented using faster logic styles, which give less delay at the expense of higher power dissipation and complexity.
10. Learnings
Broadened our horizons about the various types of static and dynamic logic adder implementations.
Implementing this project in Cadence gave us hands-on experience with design and layout, considering all the minor details involved.
Contributions
The following summarizes the various aspects of the project, and identifies the contribution of
the members of our group to each part.
Circuit Design and HSPICE simulations : Soumya
Layout : Raghu , Prathamesh
11. References
[1] John P. Uyemura, Introduction to VLSI Circuits and Systems
[2] Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolic, Digital Integrated Circuits
[3] I. S. Hwang and P. S. Magarshack, A High-Speed Dynamically Reconfigurable 32 bit CMOS adder