Digital Integrated Circuits Lecture 13:...

Post on 17-Aug-2020

2 views 0 download

Transcript of Digital Integrated Circuits Lecture 13:...

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 1

Digital Integrated CircuitsLecture 13: SRAM

Chih-Wei Liu

VLSI Signal Processing LAB

National Chiao Tung University

cwliu@twins.ee.nctu.edu.tw

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 2

Outline

Memory Arrays

SRAM Architecture

SRAM Cell

Decoders

Column Circuitry

Multiple Ports

Serial Access Memories

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 3

Memory Arrays

Memory Arrays

Random Access Memory Serial Access Memory Content Addressable Memory(CAM)

Read/Write Memory(RAM)

(Volatile)

Read Only Memory(ROM)

(Nonvolatile)

Static RAM(SRAM)

Dynamic RAM(DRAM)

Shift Registers Queues

First InFirst Out(FIFO)

Last InFirst Out(LIFO)

Serial InParallel Out

(SIPO)

Parallel InSerial Out

(PISO)

Mask ROM ProgrammableROM

(PROM)

ErasableProgrammable

ROM(EPROM)

ElectricallyErasable

ProgrammableROM

(EEPROM)

Flash ROM

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 4

Array Architecture

2n words of 2m bits eachIf n >> m, fold by 2k into fewer rows of more columns

Good regularity – easy to designVery high density if good cells are used

row decoder

columndecoder

n

n-kk

2m bits

columncircuitry

bitline conditioning

memory cells:2n-k rows x2m+k columns

bitlines

wordlines

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 5

12T SRAM Cell

Basic building block: SRAM CellHolds one bit of information, like a latchMust be read and written

12-transistor (12T) SRAM cellUse a simple latch connected to bitline46 x 75 λ unit cell

bit

write

write_b

read

read_b

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 6

6T SRAM Cell

Cell size accounts for most of array sizeReduce cell size at expense of complexity

6T SRAM CellUsed in most commercial chipsData stored in cross-coupled inverters

Read:Precharge bit, bit_bRaise wordline

Write:Drive data onto bit, bit_bRaise wordline

bit bit_b

word

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 7

SRAM Read

Precharge both bitlines highThen turn on wordlineOne of the two bitlines will be pulled down by the cellEx: A = 0, A_b = 1

bit discharges, bit_b stays highBut A bumps up slightly

Read stabilityA must not flip

bit bit_b

N1

N2P1

A

P2

N3

N4

A_b

word

0.0

0.5

1.0

1.5

0 100 200 300 400 500 600time (ps)

word bit

A

A_b bit_b

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 8

SRAM Read

Precharge both bitlines highThen turn on wordlineOne of the two bitlines will be pulled down by the cellEx: A = 0, A_b = 1

bit discharges, bit_b stays highBut A bumps up slightly

Read stabilityA must not flipN1 >> N2

bit bit_b

N1

N2P1

A

P2

N3

N4

A_b

word

0.0

0.5

1.0

1.5

0 100 200 300 400 500 600time (ps)

word bit

A

A_b bit_b

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 9

SRAM Write

Drive one bitline high, the other lowThen turn on wordlineBitlines overpower cell with new valueEx: A = 0, A_b = 1, bit = 1, bit_b = 0

Force A_b low, then A rises highWritability

Must overpower feedback inverter

time (ps)

word

A

A_b

bit_b

0.0

0.5

1.0

1.5

0 100 200 300 400 500 600 700

bit bit_b

N1

N2P1

A

P2

N3

N4

A_b

word

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 10

SRAM Write

Drive one bitline high, the other lowThen turn on wordlineBitlines overpower cell with new valueEx: A = 0, A_b = 1, bit = 1, bit_b = 0

Force A_b low, then A rises highWritability

Must overpower feedback inverterN2 >> P1

time (ps)

word

A

A_b

bit_b

0.0

0.5

1.0

1.5

0 100 200 300 400 500 600 700

bit bit_b

N1

N2P1

A

P2

N3

N4

A_b

word

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 11

SRAM Sizing

High bitlines must not overpower inverters during readsBut low bitlines must write new value into cell

bit bit_b

med

A

weak

strong

med

A_b

word

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 12

SRAM Column Example

Read Write

H H

SRAM Cell

word_q1

bit_v1f

bit_b_v1f

out_v1rout_b_v1r

φ1

φ2

word_q1

bit_v1f

out_v1r

φ2

MoreCells

Bitline Conditioning

φ2

MoreCells

SRAM Cell

word_q1

bit_v1f

bit_b_v1f

data_s1

write_q1

Bitline Conditioning

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 13

SRAM Layout

Cell size is critical: 26 x 45 λ (even smaller in industry)Tile cells sharing VDD, GND, bitline contacts

VDD

GND GNDBIT BIT_B

WORD

Cell boundary

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 14

Decoders

n:2n decoder consists of 2n n-input AND gatesOne needed for each row of memoryBuild AND from NAND or NOR gates

Static CMOS Pseudo-nMOS

word0

word1

word2

word3

A0A1

A1word

A0 1 1

1/2

2

4

8

16word

A0

A1

11

11

4

8word0

word1

word2

word3

A0A1

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 15

Decoder Layout

Decoders must be pitch-matched to SRAM cellRequires very skinny gates

GND

VDD

word

buffer inverterNAND gate

A0A0A1A2A3 A2A3 A1

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 16

Large Decoders

For n > 4, NAND gates become slowBreak large gates into multiple smaller gates

word0

word1

word2

word3

word15

A0A1A2A3

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 17

Predecoding

Many of these gates are redundantFactor out commongates into predecoderSaves areaSame path effort

A0

A1

A2

A3

word1

word2

word3

word15

word0

1 of 4 hotpredecoded lines

predecoders

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 18

Column Circuitry

Some circuitry is required for each columnBitline conditioningSense amplifiersColumn multiplexing

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 19

Bitline Conditioning

Precharge bitlines high before reads

Equalize bitlines to minimize voltage difference when using sense amplifiers

φbit bit_b

φ

bit bit_b

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 20

Sense Amplifiers

Bitlines have many cells attachedEx: 32-kbit SRAM has 128 rows x 256 cols

128 cells on each bitline

tpd ∝ (C/I) ΔVEven with shared diffusion contacts, 64C of diffusion capacitance (big C)

Discharged slowly through small transistors (small I)

Sense amplifiers are triggered on small voltage swing (reduce ΔV)

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 21

Differential Pair Amp

Differential pair requires no clockBut always dissipates static power

bit bit_bsense_b sense

N1 N2

N3

P1 P2

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 22

Clocked Sense Amp

Clocked sense amp saves powerRequires sense_clk after enough bitline swingIsolation transistors cut off large bitline capacitance

bit_bbit

sense sense_b

sense_clk isolationtransistors

regenerativefeedback

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 23

Twisted Bitlines

Sense amplifiers also amplify noiseCoupling noise is severe in modern processesTry to couple equally onto bit and bit_bDone by twisting bitlines

b0 b0_b b1 b1_b b2 b2_b b3 b3_b

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 24

Column Multiplexing

Recall that array may be folded for good aspect ratioEx: 2k word x 16 folded into 256 rows x 128 columns

Must select 16 output bits from the 128 columnsRequires 16 8:1 column multiplexers

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 25

Tree Decoder Mux

Column mux can use pass transistorsUse nMOS only, precharge outputs

One design is to use k series transistors for 2k:1 muxNo external decoder logic needed

B0 B1 B2 B3 B4 B5 B6 B7 B0 B1 B2 B3 B4 B5 B6 B7A0

A0

A1

A1

A2

A2

Y Yto sense amps and write circuits

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 26

Single Pass-Gate Mux

Or eliminate series transistors with separate decoder

A0A1

B0 B1 B2 B3

Y

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 27

Ex: 2-way Muxed SRAM

MoreCells

word_q1

write0_q1

φ2

MoreCells

A0

A0

φ2

data_v1

write1_q1

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 28

Multiple Ports

We have considered single-ported SRAMOne read or one write on each cycle

Multiported SRAM are needed for register files

Examples:Multicycle MIPS must read two sources or write a result on some cycles

Pipelined MIPS must read two sources and write a third result each cycle

Superscalar MIPS must read and write many sources and results each cycle

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 29

Dual-Ported SRAM

Simple dual-ported SRAMTwo independent single-ended readsOr one differential write

Do two reads and one write by time multiplexingRead during ph1, write during ph2

bit bit_b

wordBwordA

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 30

Multi-Ported SRAM

Adding more access transistors hurts read stabilityMultiported SRAM isolates reads from state nodeSingle-ended design minimizes number of bitlines

bA

wordBwordA

wordDwordC

wordFwordE

wordG

bB bC

writecircuits

readcircuits

bD bE bF bG

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 31

Serial Access Memories

Serial access memories do not use an addressShift RegistersTapped Delay LinesSerial In Parallel Out (SIPO)Parallel In Serial Out (PISO)Queues (FIFO, LIFO)

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 32

Shift Register

Shift registers store and delay dataSimple design: cascade of registers

Watch your hold times!

clk

Din Dout8

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 33

Denser Shift Registers

Flip-flops aren’t very area-efficientFor large shift registers, keep data in SRAM insteadMove read/write pointers to RAM rather than data

Initialize read address to first entry, write to lastIncrement address on each cycle

Din

Dout

clk

counter counterreset

00...00

11...11

readaddr

writeaddr

dual-portedSRAM

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 34

Tapped Delay Line

A tapped delay line is a shift register with a programmable number of stagesSet number of stages with delay controls to mux

Ex: 0 – 63 stages of delay

SR

32

clk

Din

delay5

SR

16

delay4

SR

8

delay3S

R4

delay2

SR

2

delay1

SR

1

delay0

Dout

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 35

Serial In Parallel Out

1-bit shift register reads in serial dataAfter N steps, presents N-bit parallel output

clk

P0 P1 P2 P3

Sin

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 36

Parallel In Serial Out

Load all N bits in parallel when shift = 0Then shift one bit out per cycle

clkshift/load

P0 P1 P2 P3

Sout

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 37

Queues

Queues allow data to be read and written at different rates.Read and write each use their own clock, dataQueue indicates whether it is full or emptyBuild with SRAM and read/write counters (pointers)

Queue

WriteClk

WriteData

FULL

ReadClk

ReadData

EMPTY

DIC-Lec13 cwliu@twins.ee.nctu.edu.tw 38

FIFO, LIFO Queues

First In First Out (FIFO)Initialize read and write pointers to first element

Queue is EMPTY

On write, increment write pointer

If write almost catches read, Queue is FULL

On read, increment read pointer

Last In First Out (LIFO)Also called a stack

Use a single stack pointer for read and write