64 bit sram memory: design paper

A 64-Bit Memory System Design

Group 13

Mamoon Ismail Khalid, N11965449

Haixiang Liu, N14227283

Mohammed El Massad, N19662628

Yifei Chen, N15390017

Linxuam Wang, N14518973

Abstract: This report discusses the design and implementation of a 64-bit Static Random Access Memory (SRAM) system in 45 nm CMOS technology using six-transistor (6T) cells. The goal of the project was to design a memory system that is optimal, in terms of speed, in the target technology.

I. INTRODUCTION

The project was divided into four parts, following the four parts of an SRAM memory system: (1) the address registers, where addresses received from the CPU are stored while the memory retrieves the requested data, (2) a row decoder that takes the address stored in the address register and selects the appropriate row of the memory array, (3) the memory array itself, consisting of SRAM cells arranged in an array, and (4) two data registers where the data to be read from or written into the memory is stored.

II. METHODOLOGY

The 6T cell, which holds the data, is the basis of the SRAM. In the 64-bit SRAM layout, there are 16 rows and 1 column. A column stores 4 bits per row, i.e., 4 individual cells are present in each row of the column. The effective layout is 16 × [1 × 4]. We took a modular design approach, in which each column and its read/write circuits are designed individually. Address registers are used to store and deliver the address values to the circuit, and similarly data registers are used to deliver the data to be written to the circuit. A second set of data registers is used to store the values read from the SRAM. A 4-to-16 decoder is used as the row decoder to select the row of operation.
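The organization described above can be pictured with a small behavioral model. This is only a sketch under the assumption that each of the 16 rows holds one 4-bit word addressed by the 4-bit address register; the class name `Sram16x4` and its methods are hypothetical and do not correspond to any cell or schematic in the Cadence design.

```python
class Sram16x4:
    """Behavioral model (illustrative only): 16 rows of 4 bits = 64 bits."""

    ROWS = 16       # selected by a 4-bit address
    WORD_BITS = 4   # width of each data register

    def __init__(self):
        # Every row starts out storing 0.
        self.rows = [0] * self.ROWS

    def _check(self, address, data=0):
        if not 0 <= address < self.ROWS:
            raise ValueError("address must fit in 4 bits")
        if not 0 <= data < (1 << self.WORD_BITS):
            raise ValueError("data must fit in 4 bits")

    def write(self, address, data):
        """Store a 4-bit word in the selected row."""
        self._check(address, data)
        self.rows[address] = data

    def read(self, address):
        """Return the 4-bit word stored in the selected row."""
        self._check(address)
        return self.rows[address]
```

For example, writing `0b1010` to address 5 and reading address 5 back returns `0b1010`, while all other rows keep their previous contents.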

A. Decoder Logic

The goal here was to design a 4-to-16 line decoder, i.e., a combinational logic circuit that activates one of sixteen output bits for each input value from 0 to 15 (the range of integer values that can be expressed in four bits). The intended functionality of the circuit is shown in Fig. 1. The circuit was to be designed such that the delay is at most 116.55 ps (35% of 333 ps).

We implemented the 4-to-16 decoder using two 2-to-4 decoders and 16 AND gates. We used the static logic style to implement the two 2-to-4 decoders. To implement the WL signals, we simply fed each decoder output together with the clock signal through an AND gate. To avoid glitches, we use clock signals with different delays to control the decoder and the WL generation (this part is used to synchronize WL with the clock), as shown in Fig. 2. The delay from the rising edge of the clock signal to the rising edge of the WL signal was approximately 44 ps. The delay from the falling edge of the clock to the falling edge of the WL signal was approximately 46 ps. The average power dissipation of the decoder was 24.5 µW (note that the average power dissipation of the decoder was calculated assuming all four inputs transition from 0 to 1 and 1 to 0 with the same probability, i.e., 50%).

Fig. 1. Intended functionality of WL signals in the target design.
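The decoder structure can be sketched at the logic level as follows. This is a functional illustration only: it assumes the upper two address bits feed one 2-to-4 decoder and the lower two feed the other, and it folds the clock gating into the same AND as the predecoder outputs. The function names are hypothetical, and no gate delays or glitch behavior are modeled.

```python
def decode_2to4(a1, a0):
    """2-to-4 decoder: one-hot list of four bits for inputs a1 (MSB), a0 (LSB)."""
    index = (a1 << 1) | a0
    return [1 if i == index else 0 for i in range(4)]


def word_lines(address, clk):
    """4-to-16 decoder from two 2-to-4 decoders and 16 AND gates.

    Each word line is the AND of one output from each 2-to-4 decoder,
    gated by clk (assumed here to be part of the same AND).
    """
    if not 0 <= address <= 15:
        raise ValueError("address must fit in 4 bits")
    hi = decode_2to4((address >> 3) & 1, (address >> 2) & 1)  # bits 3..2
    lo = decode_2to4((address >> 1) & 1, address & 1)         # bits 1..0
    return [hi[i // 4] & lo[i % 4] & clk for i in range(16)]
```

With `clk = 1`, exactly one entry of the returned list is 1, at the index equal to the address; with `clk = 0`, all word lines are 0.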

Fig. 2. Top-level Cadence schematic for the decoder design and WL implementation in our memory system.

Fig. 3. Waveform showing the operation of WL signals in our memory system.

B. Address and Data Registers

For the address and data registers, we used master–slave edge-triggered D flip-flops, as such flip-flops are not susceptible to race conditions, which makes them more stable in comparison to other types of flip-flops. We used the flip-flops to create the 4-bit address register and the two 4-bit data registers.

Fig. 4. Circuit-level schematic of address and data registers in our memory system.

C. Read and Write Circuit

1) Write Circuits: We implemented the write circuit based on the lecture notes (using two transmission gates for each write circuit that are controlled by the Write Enable (WE) signal). We connect the outputs of the data register flip-flops to a voltage-controlled switch that outputs 0 upon input of a LOW voltage for BL and 1 upon input of a HIGH voltage for BL. We also added an inverter chain between each of the two data drivers and its corresponding transmission gate, in order to reduce the write delay (a 4-inverter chain for the complement of the bit line, with a u factor of 2.39, and a 2-inverter chain for the bit line, with a u factor of 2.98). Fig. 5 shows the schematic of a single write circuit in our memory system.

Fig. 5. Schematic of a single write circuit in our memory system.
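As a rough illustration of the two inverter chains in the write circuit, the sketch below assumes that the "u factor" is the size ratio between successive inverters, so that an N-stage chain spans an overall size ratio of u^N. That interpretation, the unit-sized first inverter, and the function names are assumptions made for illustration; the report does not list the individual inverter sizes.

```python
def chain_sizes(stages, u, first=1.0):
    """Relative inverter sizes in a tapered chain, assuming each stage is
    u times larger than the one before it (first stage = `first`)."""
    return [first * u ** i for i in range(stages)]


def overall_ratio(stages, u):
    """Overall size ratio spanned by the chain under the same assumption."""
    return u ** stages


# The two chains named in the text (sizes relative to the first inverter).
blb_chain = chain_sizes(4, 2.39)   # 4-inverter chain, complement of the bit line
bl_chain = chain_sizes(2, 2.98)    # 2-inverter chain, bit line
```

Under this assumption the 4-stage chain spans a ratio of about 32.6 and the 2-stage chain about 8.9.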

2) Read Circuit: The bit line of an SRAM cell takes a relatively long time to discharge (after being precharged to VDD using the PRE signal and after the activation of the word line). To enable reading at higher speeds, we used a sense amplifier, which senses small changes in the bit line of the SRAM cell and generates a full-swing output. We then feed the output of the sense amplifier to the input of the appropriate data register flip-flop.

Fig. 6. Schematic of the read circuit of our memory system.

D. 6T SRAM Cell

The 6T SRAM cell is a bistable latching circuit. Fig. 7 shows the schematic of our memory cell, where M4 and M5 are the access transistors, M0 and M1 are the pull-up transistors, and M2 and M3 are the two pull-down transistors. M6, M7, and M8 are the three precharge transistors. The word lines are connected to the gate terminals of the access transistors. Whenever a particular word line goes high, the access transistors turn ON. During a write, the write enable is ON and the SRAM cell stores the data from the write driver; in the next clock cycle, the read enable is ON and the sense amplifier reads the data stored in the 6T SRAM cell.

We chose the transistor sizes to meet the read margin and write margin requirements. We tested five different sizing configurations; the best one, which satisfies the performance requirements, is listed in Table I.

Fig. 7. Schematic of a single memory cell in our system.

Transistor | Pull-up | Pull-down | Access
Width      | 90 nm   | 180 nm    | 145 nm

TABLE I
SIZING OF THE VARIOUS TRANSISTORS IN OUR SRAM CELLS AND THE PRECHARGE CIRCUITRY.

Read Margin: a) VTrip calculation: The potential difference between M1 and M3 with a VDD of 0.8 V is calculated to be 380 mV. b) VRead calculation: The voltage between M2 and M4 is calculated with a constant BLB (Bit Line Bar) to be 174 mV. Therefore, the read margin, VTrip - VRead, is 206 mV, which is nearly 26% of VDD.

Write Margin: a) To calculate the write margin, BLB is kept constant and BL is swept from 0 to VDD, and we observe the BL voltage at which the output goes high, which is approximately 290 mV. b) Note that whenever the access transistors are sized larger than the pull-down network, we are not able to get a write margin greater than 36.25% of VDD.
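The margin figures quoted above follow from simple arithmetic, which the short sketch below reproduces. The voltages are the ones stated in the text (VDD = 0.8 V, VTrip = 380 mV, VRead = 174 mV, write margin of about 290 mV); the function and constant names are illustrative only.

```python
VDD_MV = 800  # supply voltage stated in the text, in millivolts


def read_margin_mv(vtrip_mv, vread_mv):
    """Read margin as the difference between VTrip and VRead."""
    return vtrip_mv - vread_mv


def percent_of_vdd(value_mv, vdd_mv=VDD_MV):
    """Express a voltage as a percentage of VDD."""
    return 100 * value_mv / vdd_mv


rm = read_margin_mv(380, 174)      # 206 mV
rm_pct = percent_of_vdd(rm)        # 25.75, i.e. nearly 26% of VDD
wm_pct = percent_of_vdd(290)       # 36.25% of VDD
```

The 36.25% figure in the write-margin note is thus the measured 290 mV expressed as a fraction of the 0.8 V supply.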

E. SRAM Layout

The 6T cell layout is done using two metal layers, and the achieved area for a single SRAM cell is 0.81 µm².

The 16 × 1-bit SRAM layout, along with the precharge circuit, is shown in Fig. 9.

Fig. 8. Layout of a single memory cell in our system.

F. Complete Peripheral

1) Row Write Circuits: The write circuits are used to push BL and BLB beyond the bistable point to the value that needs to be stored. Each row write circuit has a write driver, controlled by the write enable signal, that drives its output to the value of the data. The application of BL and BLB to the 6T cell is controlled by the row address decoder, which thus chooses the cell of operation.

2) Row Read Circuits: The row read circuits are controlled by the address register outputs. The selected BL and BLB values are read using a sense amplifier. The sense amplifier also uses a precharge circuit to charge the values being held, and a latch is finally used to control, smooth, and filter the final output.

Fig. 9. Layout of memory cell array.

3) Remaining Peripherals: The remainder of the peripherals are split into three parts. The first part is the address registers combined with the row decoder. This peripheral generated a delay of about 118.5 + 16.5 + 20 = 155 ps. The second peripheral consists of the write driver combined with the data registers, which input the data into the write driver. The output of the write driver should reach the SRAM cell a few picoseconds before the word-line output from the row decoder reaches the SRAM cell, so the data from the write driver is delayed by 70 ps using buffers. The third peripheral consists of the sense amplifier and data latch, whose combined delay amounted to approximately 225 ps. (Please see Table II for timing delays and other details.)
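The delay figures above can be checked with a few lines of arithmetic. The sketch below sums the address-path delay from the three quoted components and recomputes the decoder delay target from the 333 ps period and 35% share given in the Decoder Logic section. Reading the three terms as decoder delay, CLK-Q delay, and setup time is an assumption based on the matching values in Table II; the names are illustrative.

```python
DECODER_PS = 118.5   # row decoder delay (before array layout)
CLK_TO_Q_PS = 16.5   # register CLK-Q delay
SETUP_PS = 20.0      # register setup time


def address_path_delay_ps(decoder=DECODER_PS, clk_to_q=CLK_TO_Q_PS, setup=SETUP_PS):
    """Delay of the address register plus row decoder peripheral."""
    return decoder + clk_to_q + setup


def decoder_budget_ps(clock_period_ps=333.0, fraction=0.35):
    """Decoder delay target as a fraction of the clock period."""
    return clock_period_ps * fraction


total = address_path_delay_ps()   # 155.0 ps
budget = decoder_budget_ps()      # approximately 116.55 ps
```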

Component                  | Performance characteristic  | Value
Data and address registers | CLK-Q delay                 | 16.5 ps
                           | Setup time                  | 20 ps
                           | Hold time                   | 0 ps
                           | Power dissipation           | 11.72 µW
Row decoder                | Delay (before array layout) | 118.5 ps
                           | Power dissipation           | 24.5 µW
SRAM array                 | Read margin                 | 206 mV
                           | Write margin                | > 290 mV
                           | Area (of individual cell)   | 0.81 µm²
                           | Cell access time            | 159 ps
                           | Power dissipation           |
Sense amplifier            | Delay                       | 225 ps
                           | Power dissipation           |
Write circuit              | Discharge time of bit line  | 105 ps
                           | Power dissipation           |
Total read access time     |                             | 96 ps
Total write delay          |                             | 112 ps

TABLE II
PERFORMANCE CHARACTERISTICS OF DIFFERENT COMPONENTS IN OUR MEMORY SYSTEM.

III. SIMULATION RESULTS

Final Output:

1) The read and write enable signals (we and re) are complementary. We also plot the Q node of the cell to make the result clearer.

2) Synchronization of signals: PRE = syn(RE AND2 reverse of WL), so that PRE = 0 only when re = 1 and wl = 0. SAE = syn(PRE AND2 RE), so that SAE = 1 only when PRE = 1 and re = 1. To avoid glitches on SAE, we use "RE AND2 RE" in place of "RE" in "RE AND2 PRE".

3) Explanation of the process: In the initial state, Q = 0, BL = 0, and BLB = 1. When add (the address) = 0, wl (the word line) is asserted (equal to 1). Over the whole process, wl = 1 four times. The first time wl = 1, a 1 is written to the cell (Q goes to 1, BL goes to 1, BLB goes to 0). The second time wl = 1, a 1 is read from the cell (BL and BLB are precharged to 1 when PRE = 0, OUT sap is obtained when SAE = 1, and the resulting read data in the read register = 1). Then a 0 is written to the cell, and a 0 is read from the cell (read data = 0).

Fig. 10. Simulation results of our memory system. The waveforms show thetransitions of the different signals in our system corresponding to a set of readand write operations.
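The control-signal relationships stated for the simulation can be summarized as a small truth-table sketch. It assumes that PRE is active low, as the statements "PRE = 0 only when re = 1 and wl = 0" and "pre-charge BL and BLB to 1 when PRE = 0" suggest, and it ignores all gate delays, so the glitch-avoidance trick of ANDing RE with itself is not modeled. The function name is hypothetical.

```python
def control_signals(re, wl):
    """Return (pre, sae) for given read-enable and word-line values (0 or 1).

    Assumes PRE is active low: it is 0 (precharging) only while a read is
    enabled and the word line is still low.
    """
    pre = int(not (re and not wl))
    sae = int(pre and re)  # sense amplifier enabled once precharge is released
    return pre, sae


# Enumerate all four input combinations as (re, wl) -> (pre, sae).
truth_table = {(re, wl): control_signals(re, wl) for re in (0, 1) for wl in (0, 1)}
```

Under these assumptions, SAE is 1 only when re = 1 and wl = 1, i.e., the sense amplifier fires during a read once the word line has been asserted.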

IV. DISCUSSION

1) Synchronization: To generate the PRE signal, we ANDed the RE signal with the complement of the WL signal, i.e., PRE = RE · (NOT WL). To generate the SAE signal, we ANDed the PRE signal with the RE signal, i.e., SAE = PRE · RE. To avoid glitches in the SAE signal, we first AND the RE signal with itself, which delays RE by one gate so that it arrives in step with PRE, and use that as the Read Enable signal when generating SAE.

2) Layout Issues: The second problem is the layout area constraint. The area requirement for a single SRAM cell is 0.8 µm². It is extremely hard to satisfy the DRC requirements with many vias and connectors within such a small area. The solution to this problem is to reduce the number of connectors, since these components take a lot of space to satisfy the DRC rules. Moreover, upper metal layers can be used to reduce conflicts in the M1 layer. If two metal paths carry different voltages, routing both in M1 demands more space to prevent interference, whereas using two different layers allows them to be packed together within a narrower space.

V. FILES IN CADENCE

ID: yc2389 PW: N15390017 Working Directory: /ca-dence/vlsi proj

VI. CONCLUSION

A 64-bit memory system design, along with the layout of the SRAM array, is demonstrated in this report. The whole system is fully functional with a reasonable timing sequence. The layout of the SRAM array is well designed and compact.
