Design Example: Register Files - UCLA
icslwebs.ee.ucla.edu/yang/classwiki/images/7/7e/15b_ee...

EE 215B
Design Example: Register Files
C.K. Ken Yang, UCLA
[email protected]
Courtesy of BA, MAH


Overview

• Reading
– Papers
• Overview
– The register file is an extreme case of “SRAM” design. Register files are small SRAMs that are used heavily by the datapath. They hold very local data that is fast to access, and they often have multiple ports for simultaneous access by a number of functional units/ALUs.
– These design parameters lead to very different cell designs and performance targets. This set of notes reviews the basic concepts and shows an example of such a design.


Outline

• Architecture
– What is a register file
– 2 basic approaches

• Design Example


What Is a Register File

• Fastest memory block available to the microprocessor.
• Stores intermediate results of the microprocessor units such as ALU & MMU.
• Access speed directly limits the performance of the processor.


Architecture: Multi-ported Design

• At least 1 write port and 2 read ports
– Accommodate a single ALU with 2-operand instructions.
– r3 <= r2 + r1
• Superscalar designs
– Multiple functional units access the register file.
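The minimum port count above can be sketched as a small behavioral model. This is only an illustration: the class name `RegisterFile`, the method `cycle`, and the 32-entry size are assumptions, not part of the slides.

```python
class RegisterFile:
    """Behavioral model with 2 read ports and 1 write port (hypothetical names)."""

    def __init__(self, num_regs=32):  # 32 entries is an arbitrary choice
        self.regs = [0] * num_regs

    def cycle(self, read_a, read_b, write_addr=None, write_data=None):
        # Both read ports are serviced in the same cycle.
        a, b = self.regs[read_a], self.regs[read_b]
        # The single write port updates at most one register per cycle.
        if write_addr is not None:
            self.regs[write_addr] = write_data
        return a, b


rf = RegisterFile()
rf.cycle(0, 0, write_addr=1, write_data=5)       # r1 <= 5
rf.cycle(0, 0, write_addr=2, write_data=7)       # r2 <= 7
a, b = rf.cycle(2, 1)                            # read r2 and r1 together
rf.cycle(0, 0, write_addr=3, write_data=a + b)   # r3 <= r2 + r1
result = rf.regs[3]
```

A superscalar design would need this port set replicated for each functional unit that issues in the same cycle, which is what drives the port counts in the design example later in these notes.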


Example: 3-ported Cell

• Separate read/write bitlines
– Single-port reads
– Dual-port write
• Enable different design constraints
– Cell sizing
– Different pre-charge of the read-port


Architecture: Multi-banking

• Multi-porting has a large cost in peripheral circuits.
– Replicate memory into many banks
• Homogeneous: even division into a number of banks.
– Faster access to each bank.
– Smaller register size
– More MUXing circuitry
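For a homogeneous split, the mapping from a register index to a bank can be sketched as a simple divide. The function name and the default sizes (128 registers, 8 banks) are illustrative assumptions; the high-order index selects the bank, which is where the extra MUXing comes from.

```python
def bank_and_offset(reg_index, num_regs=128, num_banks=8):
    """Even split: each bank holds num_regs // num_banks registers."""
    regs_per_bank = num_regs // num_banks
    bank = reg_index // regs_per_bank     # selects the bank (drives the output MUX)
    offset = reg_index % regs_per_bank    # selects the entry inside the bank
    return bank, offset


first = bank_and_offset(0)
middle = bank_and_offset(37)
last = bank_and_offset(127)
```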


Heterogeneous Multi-banking

• Divide the ports and registers unevenly among the banks.
– Smaller bank for the critical data
– Bigger bank for the non-critical data
• Prediction of critical data based on an algorithm similar to cache prediction.


Outline

• Architecture

• Design Example
– Itanium register file


Itanium 2 Integer Register File

• 6 ALUs share 144 x 65 bit, 22-ported general registers
• 128 GRs + 16 Kernel Registers aliased to R16-31
• 64 datapath bits plus parity
• 12 read ports and 10 write ports (8 active, 2 inactive)
• Active and inactive writes can occur simultaneously
• Datapath bypassing on write ports between multi-media (MMU) and integer execution units (IEU)

[Figure: integer register file layout, 1.37 mm x 1.00 mm, with MMU and IEU labeled. Source: Fetzer, ISSCC05]

Integer RF Structure

[Figure: integer RF floorplan. Labeled blocks: Decode, Data Array, Parity State Machine, Address Repeater, Global Precharger, Bitline Repeater, Address Driver. Source: Fetzer, ISSCC05]

Floating Point Register File

[Figure: floating point register file layout, 1.11 mm x 1.14 mm, with two MAC units labeled.]

• 128 x 82 bit, 18-ported general registers
• 8 read ports
– 6 MAC data ports, 2 store data ports
• 10 write ports, 6 active, 4 inactive
– 2 MAC result ports, 4 load data ports
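The port and size figures quoted for the integer and floating point register files can be cross-checked with a few sums. The dictionary layout and the helper name `summarize` are illustrative assumptions; the numbers come from the two slides.

```python
# Integer RF: 128 GRs + 16 kernel registers, 64 data bits + parity,
# 12 read ports, 8 active + 2 inactive write ports.
integer_rf = {"entries": 128 + 16, "width": 64 + 1, "read": 12, "write": 8 + 2}

# FP RF: 6 MAC + 2 store read ports; (2 MAC result + 4 load) active
# write ports plus 4 inactive write ports.
fp_rf = {"entries": 128, "width": 82, "read": 6 + 2, "write": (2 + 4) + 4}


def summarize(rf):
    """Total port count and total storage bits for one register file."""
    return {"ports": rf["read"] + rf["write"], "bits": rf["entries"] * rf["width"]}


int_summary = summarize(integer_rf)   # 144 x 65 array
fp_summary = summarize(fp_rf)         # 128 x 82 array
```

Both totals agree with the "22 ported" and "18 ported" descriptions on the slides.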

Source: Fetzer, ISSCC05

Floating Point RF Structure

[Figure: floating point RF floorplan. Labeled blocks: Decode, Data Array, Parity State Machine, Address Repeater, Bitline Repeater/Global Precharger, Address Driver. Source: Fetzer, ISSCC05]

Register File Timing

[Figure: register file timing diagram across CK Phase 1 and CK Phase 2.
READ labels: Read Addr Decode, Read Local Bitline Evaluate, Read Global Bitline Evaluate, Data Bypass, Read Local Precharge, Read Global Precharge.
WRITE labels: Write Addr Decode, Write Bitline Pre-discharge, Write Bit Line, Register Write.
Source: Fetzer, ISSCC05]

Write Following Reads

• Reading a register that is being written occurs very often
• Itanium solution
– Each register file access contains a READ followed by a WRITE.
– No contention; the READ result can be used a half-cycle early.
• Another common solution
– Write bypass:
• WRITE while READ results in a slow read since the cell is being flipped.
• Bypass the READ with the WRITE information at the multiplexer.
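The two approaches above differ in what a read sees when it targets the register being written in the same cycle. A minimal behavioral sketch, with hypothetical function names and a plain Python list standing in for the array:

```python
def read_then_write(regs, read_addr, write_addr=None, write_data=None):
    """READ completes first, then WRITE: the read sees the old value."""
    value = regs[read_addr]
    if write_addr is not None:
        regs[write_addr] = write_data
    return value


def read_with_bypass(regs, read_addr, write_addr=None, write_data=None):
    """Write bypass: on an address match the mux forwards the write data."""
    if write_addr is not None and write_addr == read_addr:
        value = write_data            # bypass mux selects the incoming write data
    else:
        value = regs[read_addr]       # normal read from the array
    if write_addr is not None:
        regs[write_addr] = write_data
    return value


regs_a = [0, 1, 2, 3]
old_value = read_then_write(regs_a, 2, write_addr=2, write_data=99)

regs_b = [0, 1, 2, 3]
new_value = read_with_bypass(regs_b, 2, write_addr=2, write_data=99)
```

In both cases the array ends up holding the new data; only the value delivered to the read port in that cycle differs.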


Register File Decode

• Wordline (en) is pulsed
– PCK2X pulses each phase
– Read followed by write
• WRITEH is generated for the accessed register

[Figure: decode circuit for one read/write port, with self-timed pulse width control. Signal labels: address, sel[9:0], sel[i], highb, lowb, matchb, PCK2, NCK, timer_enable, en, writeen, WRITEH, wordline. Source: Fetzer, ISSCC05]

Storage Cell

• One storage node for each thread

• Storage node
– Tristated by writel to assist NFET-only pass-gate writes.
– writel drain-connected PFETs provide extra pull-up during a thread switch and make writes easier.

[Figure: storage cell schematic showing the storage nodes and thread selection. Signal labels: b0, nb0, b1, nb1, thread, writel, WRITEH, ida, idb. Source: Fetzer, ISSCC05]

Register File READ/WRITE (1)

• Buffered read
– Isolate the cell from the read BL
• Additional buffering from write
– Isolate stored data from read access.
– Improve the write timing

[Figure: register file read/write circuit. Signal labels: wordline[9:0], writel, writei, write, read, write bitline, read bitline, active data, inactive data.]

Register File READ/WRITE (2)

• Port sharing
– Active thread READ shares wordlines with inactive WRITE
– Reduce the number of total ports

[Figure: register file read/write circuit. Signal labels: wordline[9:0], writel, writei, write, read, write bitline, read bitline, active data, inactive data.]

Register File READ/WRITE (3)

• Wordline conditioned by writel
– writel high enables the read
– writel low enables the pull-up for the write.

[Figure: register file read/write circuit. Signal labels: wordline[9:0], writel, writei, write, read, write bitline, read bitline, active data, inactive data.]

Register File Organization

• 8 banks
– 16 registers per bank
• 8 cells per bitline
– 2 bitlines merge at the sense-amplifier
– Small number of cells
• Logic gate as the sense amplifiers
• Pre-charged and evaluates low (high-skew)
• 200ps access time!
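The precharge-then-evaluate read of one bank column can be sketched behaviorally. Two things here are assumptions not stated on the slide: that the merging logic gate is a two-input NAND, and that a stored 1 discharges its local bitline. The function and constant names are hypothetical.

```python
CELLS_PER_BITLINE = 8    # from the slide
BITLINES_PER_GATE = 2    # two local bitlines merge at one gate (16 cells per bank)


def read_bank_bit(cells, selected):
    """cells: 16 stored bits of one bank column; selected: index 0..15 or None."""
    # Precharge: both local bitlines start high.
    local = [1] * BITLINES_PER_GATE
    if selected is not None and cells[selected] == 1:
        # Evaluate: the selected cell's pulldown discharges its local bitline
        # (polarity is an assumption).
        local[selected // CELLS_PER_BITLINE] = 0
    # Assumed NAND as the sense amplifier: output goes high if either bitline fell.
    return int(not (local[0] and local[1]))


column = [0] * 16
column[10] = 1
hit = read_bank_bit(column, 10)      # selected cell stores 1
miss = read_bank_bit(column, 3)      # selected cell stores 0
idle = read_bank_bit(column, None)   # no wordline fires, bitlines stay precharged
```

With only 8 cells loading each local bitline, the bitline is short and lightly loaded, which is why an ordinary logic gate can serve as the sense amplifier.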


Register File Read Path

[Figure: read path schematic. Panels: "Pulldown in bitcell" and "global bitline circuit". Signal labels: CK, PRECK, read, read0 ... read7, reg0 ... reg7, local0, local1, LG0 ... LG8, global.]

READ Simulation

• Just over 200ps from CK to global bitline evaluate
– PCK2X pulses twice per cycle
– Matchb is the wordline enable signal.
• Local read/write signals generated from each wordline

[Figure: simulated READ waveforms. Traces: PCK2X, Matchb, Wordline, Read, Local BL, Global BL.]

WRITE Simulation

[Figure: simulated WRITE waveforms for writing a “1” and writing a “0”. Traces: write, WRITEH, wordline, writel, b0, nb0. Inset: storage cell schematic with labels thread, ida, idb, write bitline, and an output to the read port and parity.]

Floating Nodes During Write

• The storage node in the inactive thread floats low during writes to the active thread.
• At low frequency data could be lost, so a timer is implemented on WRITEH to end the writes early.
• NCK rises and nr1 slowly drops. If the NCK phase is long enough, enable drops low, ending the write.

[Figure: RF storage node (signal labels: b0, nb0, writel, WRITEH) and timer circuit built from slow, long-L devices (signal labels: NCK, nr1, enable, threadchanged).]
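The effect of the timer on the write pulse can be sketched as a simple clamp. The function name and the picosecond values are hypothetical and chosen only to illustrate the behavior; the slides give no numbers for the timer delay.

```python
def write_pulse_width(nck_phase_ps, timer_delay_ps):
    """WRITEH stays asserted for the NCK phase unless the timer expires first."""
    return min(nck_phase_ps, timer_delay_ps)


TIMER_DELAY_PS = 1000                                  # assumed timer delay
at_speed = write_pulse_width(300, TIMER_DELAY_PS)      # short phase: timer never fires
slow_clock = write_pulse_width(5000, TIMER_DELAY_PS)   # long phase: timer ends the write
```

At normal operating frequency the timer has no effect; it only bounds how long the inactive thread's storage node is left floating when the clock is slow.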


Switching Threads

• The READ/WRITE I/O ports look like large caps and there is a significant amount of charge sharing

• WRITEH is held at GND when thread/thread_b change values

[Figure: storage cell schematic during a thread switch. Signal labels: b0, nb0, b1, nb1, thread, writel, WRITEH, ida, idb.]

Switching Threads Simulation

[Figure: simulated thread-switch waveforms. Traces: thread and its complement, ida, idb, b0, nb0, b1. Annotation: "Needed or b1 would fail!" Inset: storage cell schematic with labels b0, nb0, b1, nb1, thread, writel, WRITEH, ida, idb.]

Parity

• Parity ripples through 32 stages in three clock cycles after a write (41 stages in four cycles in FPU)

• The two bit parity computation is 6.5 FETs per bit out of 109.5 (<6.0%)
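The stage counts and the overhead figure above can be reproduced with a short sketch. It assumes each ripple stage folds two adjacent bits into the running parity (parityout = parityin XOR bit i XOR bit i-1), which is how the slide's functional representation reads; the function name is hypothetical.

```python
def ripple_parity(bits, parity_in=0):
    """Ripple parity two bits per stage; returns (parity, number_of_stages)."""
    assert len(bits) % 2 == 0
    parity = parity_in
    stages = 0
    for i in range(1, len(bits), 2):
        # One stage: parityout = parityin XOR bit[i] XOR bit[i-1]
        parity ^= bits[i] ^ bits[i - 1]
        stages += 1
    return parity, stages


data = [1, 0, 1, 1] + [0] * 60              # 64 data bits, three of them set
parity, int_stages = ripple_parity(data)    # integer RF: 64 bits
_, fp_stages = ripple_parity([0] * 82)      # FP RF: 82 bits
overhead = 6.5 / 109.5                      # parity FETs as a fraction of FETs per bit
```

Two bits per stage gives 32 stages for 64 data bits and 41 stages for 82 bits, matching the stage counts quoted on the slide.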

[Figure: parity circuit and "Parity Functional Representation" (inputs: bit i, bit i-1, parityin; output: parityout). Circuit labels: d0i, d0b, d1i, d1b, midp, outpb. Note: FETs shared with Read Buffering.]

Parity State Machine

• The parity state machine is below the data array and gets the same inputs (wordlines/write/parity_in) as a bitcell

• Parity is continuously computed and checked
– Register file outputs parity error.
– Scan can observe a parity error before the register is read

• ParityError is read with a duplicate of a register read circuit
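A behavioral sketch of continuous checking, assuming ParityError is the mismatch between the parity stored at write time and the parity recomputed from the current cell contents. The class and method names are hypothetical, and thread handling and ParitySeed are left out.

```python
from functools import reduce
from operator import xor


class ParityCheckedRegister:
    """One register plus its stored parity bit (simplified, single thread)."""

    def __init__(self, width=64):
        self.bits = [0] * width
        self.stored_parity = 0

    def write(self, bits):
        self.bits = list(bits)
        self.stored_parity = reduce(xor, self.bits, 0)   # parity captured on write

    def parity_error(self):
        # Recomputed from the cell contents at any time, not only on a read.
        return reduce(xor, self.bits, 0) != self.stored_parity


reg = ParityCheckedRegister()
reg.write([1, 0, 1] + [0] * 61)
clean = reg.parity_error()
reg.bits[5] ^= 1                 # model a single-bit upset in the array
upset = reg.parity_error()
```

Because the check does not wait for a read, an upset can be flagged (for example through scan) before the corrupted register is ever consumed.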

[Figure: parity state machine for Register N, with an XOR computation tree over b0 ... b81. Signal labels: thread, en, write, parity, ParityComp, StoredParity, ParitySeed, ParityError, ThreadChanged.]

Register File Comparison

Design            Montecito Integer   Montecito FP    McKinley Integer (ISSCC 2002)
Technology        0.09 μm             0.09 μm         0.18 μm
Write Ports       10                  10              8
Read Ports        12                  8               12
Registers         144 x 65 bit        128 x 82 bit    128 x 65 bit
Transistors       1.43M               1.30M           832K
Parity SM Area    0.098 mm²           0.083 mm²       NA
Array Area        0.930 mm²           0.935 mm²       1.67 mm²
Decoder Area      0.330 mm²           0.220 mm²       0.39 mm²
Global Overhead   0.012 mm²           0.052 mm²       0.13 mm²
Total Size        1.37 mm²            1.29 mm²        2.2 mm²
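As a quick consistency check on the comparison, the component areas can be summed against the quoted totals. The dictionary keys and the helper name `check` are illustrative; the numbers are those in the table.

```python
areas_mm2 = {
    "Montecito Integer": {"parity_sm": 0.098, "array": 0.930, "decoder": 0.330,
                          "global": 0.012, "total": 1.37},
    "Montecito FP": {"parity_sm": 0.083, "array": 0.935, "decoder": 0.220,
                     "global": 0.052, "total": 1.29},
    # McKinley lists no parity state machine area (NA in the table).
    "McKinley Integer": {"array": 1.67, "decoder": 0.39, "global": 0.13,
                         "total": 2.2},
}


def check(design):
    """Return (sum of component areas, array share of the quoted total)."""
    entry = areas_mm2[design]
    part_sum = sum(v for k, v in entry.items() if k != "total")
    return part_sum, entry["array"] / entry["total"]


for design in areas_mm2:
    part_sum, array_share = check(design)
    print(f"{design}: parts sum to {part_sum:.3f} mm2, "
          f"array is {array_share:.0%} of the quoted total")
```

The component areas add up to the quoted totals to within rounding, and the data array accounts for most of the area in all three designs.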


Summary

• Register files are critical functional units similar to ALUs.
– Determine the cycle-time of a processor
• Highly constrained memory design
– Small number of entries
– Large number of ports
– Highly partitioned (tradeoff of #ports per cell versus many cells).
• Cell design is highly specialized.
– Single-ended reads
– Buffered reads
– Multi-threading
• Sense-amplifiers are often digital logic gates
• Parity protection is increasingly critical for reliability.
