Control Unit : Hardwired vs. Microprogrammed Approach.

Two Major Blocks in a CPU

Datapath
  Adders, multipliers, dividers
  Shifters, registers
  Anything that changes or stores data

Control Unit
  Controls the data
  How is data stored?
  Where is it stored?
  When should data be available?

Control Unit

Correct sequencing of control signals
  Much like the human brain controlling various parts of the body
Sequence and timing are the key
  Any aberration will result in wrong operation

A Simplified Control Unit

[Block diagram: the Control Unit contains a Fetch Unit, a Decode Unit, an Execution Unit and a Write Back Unit, which produce the Fetch, Decode, Execute and Write Back signals.]

A Possible Implementation

[Block diagram: CLK drives a mod-4 counter whose two output bits feed a 2-to-4 decoder, giving one output line per phase.]
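The counter-plus-decoder arrangement can be sketched in a few lines of Python. This is only an illustrative model, assuming a counter that steps through four states and a decoder with one output per state; the function names are made up for this sketch.

```python
def decoder_2to4(count):
    """One-hot decode of a 2-bit value: output i is 1 only when count == i."""
    return [1 if count == i else 0 for i in range(4)]


def phase_signals(cycles):
    """Yield [Fetch, Decode, Execute, Write Back] for each clock cycle."""
    count = 0
    for _ in range(cycles):
        yield decoder_2to4(count)
        count = (count + 1) % 4  # the counter wraps after four states


for cycle, signals in enumerate(phase_signals(5)):
    print(cycle, signals)
```

Each clock tick advances the counter, so exactly one of the four phase signals is high at any time.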

Timing Diagram

[Timing diagram: CLK, Fetch, Decode, Execute and Write Back; the four phase signals go high one after another, one clock period each.]

Let’s Sample The Signals

             Cycle 1  Cycle 2  Cycle 3  Cycle 4
Fetch           1        0        0        0
Decode          0        1        0        0
Execute         0        0        1        0
Write Back      0        0        0        1

Another Way to Generate Signals

1 0 0 0

0 1 0 0

0 0 1 0

0 0 0 1
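The rotating pattern above is what a ring counter produces: a shift register holding a single 1 whose last stage feeds the first. A minimal sketch, assuming a four-stage register that starts with the 1 in the first position (the function name is hypothetical):

```python
def ring_counter(width=4, cycles=8):
    """Rotate a single 1 through a shift register, one position per clock."""
    state = [1] + [0] * (width - 1)
    for _ in range(cycles):
        yield list(state)
        state = [state[-1]] + state[:-1]  # last flip-flop feeds the first


for state in ring_counter(cycles=5):
    print(*state)
```

No decoder is needed here, because the register contents are already the one-hot phase signals.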

Hardwired vs Microprogrammed

Hardwired
  Use gates to generate signals
  Squeeze out the juice for performance
  Different logic styles possible

Microprogrammed
  Store the control signals in the sequence
  Just read from the memory every clock cycle

A Model Computer (Richard Eckert, SIGCSE Bulletin, Vol. 20, No. 3, September 1988)

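[Block diagram: Accumulator, ALU, Register B, PC, MAR, MDR, RAM and IR are connected to a common bus and driven by the Control block; paths are marked 4, 8 or 12 bits wide. Control signals shown: R, W, LM, IP, LP, EP, LD, ED, LA, EA, S, A, EU, LB, LI, EI.]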

More Details

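L = Load
E = Enable (copy to bus)
A, S = Add and Subtract
IP = Increment PC
R, W = Read and Write memory
The second letter names the register: P = PC, M = MAR, D = MDR, I = IR, A = Accumulator, B = Register B, U = ALU
The sign bit goes to the control unit

[Block diagram repeated with the signal labels: ACC, ALU, B, PC, MAR, MDR, RAM, IR, Control and Bus.]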

Mnemonic                 Opcode  Action                 Register transfers       Active controls
LDA (Load Accumulator)   1       A ← (Mem)              1. MAR ← IR              EI, LM
                                                        2. MDR ← M(MAR)          R
                                                        3. A ← MDR               ED, LA
STA (Store Accumulator)  2       (Mem) ← A              1. MAR ← IR              EI, LM
                                                        2. MDR ← A               EA, LD
                                                        3. M(MAR) ← MDR          W
ADD                      3       A ← A+B                1. A ← ALU(Add)          A, EU, LA
SUB                      4       A ← A-B                1. A ← ALU(Sub)          S, EU, LA
MBA                      5       B ← A                  1. B ← A                 EA, LB
JMP                      6       PC ← Mem               1. PC ← IR               EI, LP
JN                       7       PC ← Mem               1. PC ← IR if NF is set  NF: EI, LP
                                 if negative flag is set
HLT                      8-15    Stop clock
"Fetch"                          IR ← next instruction  1. MAR ← PC              EP, LM
                                                        2. MDR ← M(MAR)          R
                                                        3. IR ← MDR              ED, LI, IP
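To see how the active controls produce the register transfers in the table, here is a rough Python model of the datapath. It is a sketch under assumptions the slides do not state: 12-bit words, an IR whose low 8 bits hold the operand address, and at most one E signal driving the bus per cycle. The class and method names are invented for illustration.

```python
class Datapath:
    """Registers of the model computer (word size and IR layout are assumed)."""

    def __init__(self, ram):
        self.ram = list(ram)
        self.pc = self.mar = self.mdr = self.ir = self.a = self.b = 0

    def step(self, controls):
        """Apply one set of active control signals for one clock cycle."""
        c = set(controls)
        # ALU result: A selects add, otherwise subtract (only used when EU is on).
        alu = (self.a + self.b if "A" in c else self.a - self.b) & 0xFFF
        # E signals copy a register onto the bus; only one may drive it.
        sources = {"EP": self.pc, "ED": self.mdr, "EI": self.ir & 0xFF,
                   "EA": self.a, "EU": alu}
        drivers = [name for name in sources if name in c]
        if len(drivers) > 1:
            raise ValueError(f"bus conflict: {drivers}")
        bus = sources[drivers[0]] if drivers else None
        # Memory read and write go through MAR and MDR.
        if "R" in c:
            self.mdr = self.ram[self.mar]
        if "W" in c:
            self.ram[self.mar] = self.mdr
        # L signals load a register from the bus.
        for signal, register in (("LM", "mar"), ("LP", "pc"), ("LD", "mdr"),
                                 ("LI", "ir"), ("LA", "a"), ("LB", "b")):
            if signal in c and bus is not None:
                setattr(self, register, bus)
        if "IP" in c:
            self.pc += 1


dp = Datapath([0] * 256)
dp.ram[5] = 42
dp.ir = (1 << 8) | 5                                   # LDA 5 (assumed encoding)
for controls in (["EI", "LM"], ["R"], ["ED", "LA"]):   # LDA steps from the table
    dp.step(controls)
dp.step(["EA", "LB"])                                  # MBA
dp.step(["A", "EU", "LA"])                             # ADD
print(dp.a, dp.b)
```

The usage lines run LDA, MBA and ADD by feeding in the control sets from the table, so the accumulator ends up holding twice the loaded value.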

Hardwired Unit

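[Block diagram: the opcode from IR feeds a Decoder with one output line per instruction (LDA, STA, ADD, SUB, MBA, JMP, JN, Halt). A Ring Counter driven by CLK supplies the timing signals (T5 ... T1 on the diagram). The decoder lines, the timing signals and NF feed the Control Matrix, which outputs the control signals.]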

Table with Sequencing

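      IP    LP    EP    LM    R     W     LD    ED    LI    EI    LA    EA    A     S     EU    LB
Fetch T2    -     T0    T0    T1    -     -     T2    T2    -     -     -     -     -     -     -
LDA   -     -     -     T3    T4    -     -     T5    -     T3    T5    -     -     -     -     -
STA   -     -     -     T3    -     T5    T4    -     -     T3    -     T4    -     -     -     -
MBA   -     -     -     -     -     -     -     -     -     -     -     T3    -     -     -     T3
ADD   -     -     -     -     -     -     -     -     -     -     T3    -     T3    -     T3    -
SUB   -     -     -     -     -     -     -     -     -     -     T3    -     -     T3    T3    -
JMP   -     T3    -     -     -     -     -     -     -     T3    -     -     -     -     -     -
JN    -     T3*NF -     -     -     -     -     -     -     T3*NF -     -     -     -     -     -

IP = T2
EP = T0
LI = T2
R  = T1 + T4*LDA
W  = T5*STA
LD = T4*STA
LP = T3*JMP + T3*JN*NF
A  = T3*ADD
S  = T3*SUB
LM = T0 + T3*LDA + T3*STA
ED = T2 + T5*LDA
…..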
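The equations can be evaluated directly in software to check them against the table. The sketch below is a hypothetical Python rendering that covers only the equations written out above; the signals the slide leaves elided are not filled in. `t` is the index of the active timing signal, `instr` is the decoded mnemonic, and `nf` is the negative flag.

```python
def control_matrix(t, instr, nf):
    """Return the active controls for timing state t (0..5), sorted by name."""
    T = [t == i for i in range(6)]  # one-hot ring counter outputs
    LDA, STA, ADD, SUB, JMP, JN = (
        instr == m for m in ("LDA", "STA", "ADD", "SUB", "JMP", "JN")
    )
    signals = {
        "IP": T[2],
        "EP": T[0],
        "LI": T[2],
        "R": T[1] or (T[4] and LDA),
        "W": T[5] and STA,
        "LD": T[4] and STA,
        "LP": (T[3] and JMP) or (T[3] and JN and nf),
        "A": T[3] and ADD,
        "S": T[3] and SUB,
        "LM": T[0] or (T[3] and LDA) or (T[3] and STA),
        "ED": T[2] or (T[5] and LDA),
    }
    return sorted(name for name, on in signals.items() if on)


for t in range(6):
    print(f"T{t}", control_matrix(t, "LDA", nf=False))
```

Each dictionary entry is one sum-of-products term, which is the form a PLA or gate network would implement.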

Control Matrix

Implement using discrete gates
Usually done using PLAs
Large control matrices are implemented hierarchically, for speed
A well-known process; design flows are widespread

An Alternate Implementation

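[Block diagram: the 4-bit opcode from IR feeds a Starting Address Generator. A selector controlled by MAP and by NF & CD loads the uPC from the starting address, from uPC + 1, or from the jump address field of the control ROM. The uPC, clocked by CLK, addresses a 32 x 24 Control Store; its output goes to the Microinstruction Register, which drives the control signals and HLT.]

MAP  CD  Meaning
1    *   From IR
0    0   Unconditional branch within the microprogram
0    1   Conditional branch: NF=0 => increment, NF=1 => branch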
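The MAP/CD selection amounts to a small piece of next-address logic. A minimal Python sketch, assuming the starting address and the branch address are supplied by the caller (the function and parameter names are hypothetical):

```python
def next_micro_address(upc, map_bit, cd, nf, start_address, branch_address):
    """Choose the next uPC value from the MAP and CD bits."""
    if map_bit:                 # MAP = 1, CD is a don't-care
        return start_address    # from IR, via the starting address generator
    if cd and not nf:           # conditional branch, flag clear
        return upc + 1
    return branch_address       # unconditional, or conditional with NF = 1


print(hex(next_micro_address(0x0D, 0, 1, 1, None, 0x0F)))
print(hex(next_micro_address(0x0D, 0, 1, 0, None, 0x0F)))
```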

Control Store

Instruction  Op-code  uAddr  Control signals   CD  MAP  HLT  Addr. of next
Fetch        0        00     0011000000000000  0   0    0    01
                      01     0000100000000000  0   0    0    02
                      02     1000000110000000  0   1    0    XX
LDA          1        03     0001000001000000  0   0    0    04
                      04     0000100000000000  0   0    0    05
                      05     0000000100100000  0   0    0    00
STA          2        06     0001000001000000  0   0    0    07
                      07     0000001000010000  0   0    0    08
                      08     0000010000000000  0   0    0    00
ADD          3        09     0000000000101010  0   0    0    00
SUB          4        0A     0000000000100110  0   0    0    00
MBA          5        0B     0000000000010001  0   0    0    00
JMP          6        0C     0100000001000000  0   0    0    00
JN           7        0D     0000000000000000  1   0    0    0F
                      0E     0000000000000000  0   0    0    00
                      0F     0100000001000000  0   0    0    00
Expansion    8-E      10-1E
HLT          F        1F     0000000000000000  0   0    1    XX

Each row of the control store is one control word.
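The control store can be exercised with a short Python model of the microsequencer. This is a sketch, not the author's implementation: the opcode is passed in directly instead of being read from IR, the trace stops when the microprogram returns to address 00, and the bit order of the control word (IP LP EP LM R W LD ED LI EI LA EA A S EU LB, left to right) is taken from the sequencing table.

```python
SIGNALS = "IP LP EP LM R W LD ED LI EI LA EA A S EU LB".split()

# micro-address -> (control word, CD, MAP, HLT, address of next)
CONTROL_STORE = {
    0x00: ("0011000000000000", 0, 0, 0, 0x01),
    0x01: ("0000100000000000", 0, 0, 0, 0x02),
    0x02: ("1000000110000000", 0, 1, 0, None),
    0x03: ("0001000001000000", 0, 0, 0, 0x04),
    0x04: ("0000100000000000", 0, 0, 0, 0x05),
    0x05: ("0000000100100000", 0, 0, 0, 0x00),
    0x06: ("0001000001000000", 0, 0, 0, 0x07),
    0x07: ("0000001000010000", 0, 0, 0, 0x08),
    0x08: ("0000010000000000", 0, 0, 0, 0x00),
    0x09: ("0000000000101010", 0, 0, 0, 0x00),
    0x0A: ("0000000000100110", 0, 0, 0, 0x00),
    0x0B: ("0000000000010001", 0, 0, 0, 0x00),
    0x0C: ("0100000001000000", 0, 0, 0, 0x00),
    0x0D: ("0000000000000000", 1, 0, 0, 0x0F),
    0x0E: ("0000000000000000", 0, 0, 0, 0x00),
    0x0F: ("0100000001000000", 0, 0, 0, 0x00),
    0x1F: ("0000000000000000", 0, 0, 1, None),
}

# Starting address for each opcode (expansion opcodes 8-E are left out).
START = {1: 0x03, 2: 0x06, 3: 0x09, 4: 0x0A, 5: 0x0B, 6: 0x0C, 7: 0x0D, 0xF: 0x1F}


def decode(word):
    """List the control signals whose bit is 1 in a 16-bit control word."""
    return [name for name, bit in zip(SIGNALS, word) if bit == "1"]


def run_instruction(opcode, nf=False):
    """Trace (micro-address, active signals) for one fetch and execute."""
    trace = []
    upc = 0x00
    while True:
        word, cd, map_bit, hlt, nxt = CONTROL_STORE[upc]
        trace.append((upc, decode(word)))
        if hlt:
            break
        if map_bit:
            upc = START[opcode]      # MAP = 1: address comes from IR
        elif cd and not nf:
            upc += 1                 # conditional branch not taken
        else:
            upc = nxt                # branch to the stored next address
        if upc == 0x00:
            break                    # back at fetch: instruction finished
    return trace


for addr, active in run_instruction(5):          # MBA
    print(f"{addr:02X}", active)
```

Calling `run_instruction(7, nf=True)` and `run_instruction(7, nf=False)` traces the two JN paths through 0F and 0E.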

Example 1 – MBA followed by ADD

MBA starts at microinstruction 0B and ADD at 09 in the control store.

Control word bit order, left to right: IP LP EP LM R W LD ED LI EI LA EA A S EU LB

Sequence for MBA, ADD

Instruction    Register transfer   Control word
MBA (MOV B,A)  1. MAR ← PC         0011000000000000
               2. MDR ← M(MAR)     0000100000000000
               3. IR ← MDR         1000000110000000
               B ← A               0000000000010001
ADD            1. MAR ← PC         0011000000000000
               2. MDR ← M(MAR)     0000100000000000
               3. IR ← MDR         1000000110000000
               A ← ALU(Add)        0000000000101010

Example 2 – JN with Flag Set

JN starts at microinstruction 0D in the control store, where the CD bit is 1.

If the negative flag is set, jump to a new location by skipping to the microinstruction at 0F, which asserts EI and LP (PC ← IR) and returns to fetch at 00.

Example 3 – JN with Flag Not Set

JN starts at microinstruction 0D in the control store, where the CD bit is 1.

The negative flag is not set, so the uPC increments to 0E, which asserts no control signals and returns to fetch at 00.

Let’s Review the Microprogramming Model
  Store the microprogram in control store
  Fetch the instruction
  Get the set of control signals from the control word
  Move the microinstruction address
  Lather, Rinse, Repeat

What is Microcode?

Michael Slater's "Microprocessor Based Design" (pg.42):

"Microcode tells the processor every detailed step required to execute each machine language instruction. Microcode is thus at an even more detailed level than machine language, and in fact defines the machine language. In a standard microprocessor, the microcode is stored in a ROM or a programmable logic array (PLA) that is part of the microprocessor chip and cannot be modified by the user."

Thought Experiment

Why is the design a little clumsy?
What can we do about it?

Reason for Clumsiness

JN: conditional flag check
Without any condition check, the whole process is very smooth
Solution: avoid all conditional checks

Real Life

A little American Football story
Theory vs. Practice
  In theory, there is no difference between theory and practice
  In practice, theory and practice are two different things altogether
Live with condition checks
Keep designs as clean as possible

A General Approach

[Block diagram: IR, External Inputs and Condition Codes feed the Starting and Branch Address Generator, which loads the uPC; the uPC addresses the Control Store, which outputs the Control Word.]

Format of Microinstructions

Pick yours
  Your choice is as good as your neighbor’s
What we did:
  One bit position per control signal
  Order of the bits? Doesn’t matter
  Can result in long microinstructions
    Not the number of microinstructions, but the width

A Note About Density

Observe that only a few bits are set to 1
  Poor usage of bit space
This scheme is called Horizontal Microprogram
Alternate version: encode the bits
  Vertical Microprogram

Vertical Microprogram

Encode the bits by grouping similar elements together

General idea: group similar resources together
  There can be only one source or destination register
  Some operations are mutually exclusive, e.g. read vs. write of memory

Design Issues

Encoding reduces the bit space
  But requires decoders
Cost of decoder vs. bit space
  Usually decoder cost is very low
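As a rough illustration of the trade-off, the 16 control signals of the model computer can be packed into encoded fields. The grouping below is a hypothetical choice, not one given on the slides: it assumes at most one bus source, one destination register, one memory operation and one ALU operation per microinstruction, with code 0 in each field meaning "none".

```python
# Hypothetical field grouping: within a field at most one signal is active.
FIELDS = [
    ("source", ["EP", "ED", "EI", "EA", "EU"]),        # one bus driver at a time
    ("dest", ["LP", "LM", "LD", "LI", "LA", "LB"]),    # one register loaded
    ("memory", ["R", "W"]),                            # read and write exclude each other
    ("alu", ["A", "S"]),                               # add and subtract exclude each other
    ("pc_inc", ["IP"]),
]


def field_width(names):
    """Bits needed for len(names) codes plus the 'none' code 0."""
    return len(names).bit_length()


def encode(active):
    """Map a set of active signal names to one small code per field."""
    codes = {}
    for field, names in FIELDS:
        hits = [n for n in names if n in active]
        if len(hits) > 1:
            raise ValueError(f"{hits} cannot share field {field!r}")
        codes[field] = names.index(hits[0]) + 1 if hits else 0
    return codes


def decode(codes):
    """Recover the active signals, as the hardware decoders would."""
    return {names[codes[field] - 1] for field, names in FIELDS if codes[field]}


VERTICAL_WIDTH = sum(field_width(names) for _, names in FIELDS)
print(VERTICAL_WIDTH, "bits instead of 16")
print(encode({"ED", "LI", "IP"}))
```

Under this grouping the control word shrinks, at the cost of one decoder per field and of losing any combination that activates two signals in the same field.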

Another Idea

Group concurrently active signals
Every meaningful combination gets a code
Complex decoder to interpret every code

Vertical vs Horizontal

Horizontal
  Faster
  More area
  More common currently
    Cheap transistors

Vertical
  Slower
  More microinstructions

Microsequencing

Other ways to save on hardware
Every instruction had its own microprogram sequence
Also, instructions have several addressing modes
  Only the first few microinstructions differ
Can we share microcode?

A Powerful Technique in Sharing

Bit-ORing
  Example: two instructions share some microcode
  Eventually, they must branch
  The default branch (one instruction’s) is X0
  The other branch is stored at X1
  Change the least significant bit(s) to get a new address

Compare that with:
  Having two conditional branches
  Storing two fields, one for each branch
  Both very unclean
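Bit-ORing can be sketched as a one-line address computation. The example below is a minimal illustration with made-up addresses: the shared microcode ends with a base address whose low bit is 0, and a select value derived from the instruction is ORed in to pick the branch target.

```python
def bit_or_branch(base, select, bits=1):
    """Form the next micro-address by ORing select bits into the low end of base."""
    mask = (1 << bits) - 1
    if base & mask:
        raise ValueError("base address must end in zero bits")
    return base | (select & mask)


SHARED_EXIT = 0x10  # hypothetical address of the form X0

print(hex(bit_or_branch(SHARED_EXIT, 0)))  # default branch, X0
print(hex(bit_or_branch(SHARED_EXIT, 1)))  # the other instruction's branch, X1
```

Only one address field is stored, and no separate conditional branch microinstruction is needed to separate the two paths.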

Thought Experiment :

What if we provided an explicit branch instead of storing the next-address field in our microprogram?
  A typical instruction set will need a lot of branches
  A lot of time will be wasted on branching

A Pat on Our Back

We provided an explicit field for the address
  Branch location is now data
  It is already saved
Caution: microinstructions can get very wide
Solution: there is no free lunch.

Can we pipeline microfetch?

A neat idea: why wait till the current micro-op is over?
  Branch field gives the next operation
  Get the next op
Caveat: external inputs and status flags may change the order
What about interrupts?
  They are going to follow you everywhere
  Should have a mechanism that can invalidate microcode prefetch
  Similar to pipeline flush for instructions
Commonly used

Historical Perspectives

Hardwired Logic
  Popular before the 60’s
    Only way people did it
  Popular now
    Speed benefits

Microprogram
  Popular in the 70’s
    Memory was slower than CPU
    No on-chip cache
    Best way is to store the microcode

Now: depends on who you ask
  Shades of gray: extremes of the spectrum are harder to find nowadays

Tools for Design

Hardwired
  Any state machine optimizer
  Assigning states, minimizing transitions, races, hazards, ...

Microcoding
  Small ones can be in binary
  Large ones: use a microassembler
    Very useful debug tool
    Can use the microassembler simultaneously with actual hardware development

Hardwired vs Microcoding

Hardwired units are faster and smaller
Emulation is easy with microcoding
Hardwired design is complex if large
Bugs in hardwired design cannot be fixed in the field
Hardwired control is not suited for loops
  Looping with microcode can be made as fast

Hardwired vs Microcode vs RISC

RISC
  Simpler instruction set
  Hardwired implementation
RISC instructions are like microcode
  Instructions come from I-Cache instead of Control Store
  Difference: contents are not fixed
  Advantage: only load what you want on the I-Cache
    Keeps size smaller as compared to Control Stores

Microprogram vs Software

Imagine floating point division
Solution 1: write in software
  Long process
  Error prone
  Many fetches repeatedly from memory for the given sequence of operations
Solution 2: microcode
  Long process too, but for designers, not programmers
  Relatively error free: more thorough design
  Requires many cycles, but fetched and used locally

Emulation

A very common use of microcoding
IBM System/360
  32-bit architecture, 16 general registers
  Secret: most implementations were 8-bit
    Keep cost low
    Heavy microcoding
    Programmers oblivious
In 1992, International Meta Systems (IMS) announced the 3250
  Designed to emulate the x86, 68K, and 6502 architectures
  Uses customizable microcode, among other techniques
  Went bust, never released

Another Interesting Note

Writable Control Store
  What if you, a programmer, could write your own control store?
  Not a mad scientist thought
Implemented in
  VAX 8800
  PDP-11/60
  IBM System/370

Current Trends

Microcode update
Linux utility: microcode_ctl
  Companion to the IA32 microcode driver
  It decodes and sends new microcode to the kernel driver to be uploaded to Intel IA32 processors
  Update is volatile: lost on reboot
Microcode updates are also typically rolled into BIOS updates
  Ready even before an OS is loaded

Intel Said…..

The Pentium(R) Pro processor and Pentium(R) II processor may contain design defects or errors known as errata that may cause the product to deviate from published specifications. Many times, the effects of the errata can be avoided by implementing hardware or software work-arounds, which are documented in the Pentium Pro Processor Specification Update and the Pentium II Processor Specification Update. Pentium Pro and Pentium II processors include a feature called "reprogrammable microcode", which allows certain types of errata to be worked around via microcode updates. The microcode updates reside in the system BIOS and are loaded into the processor by the system BIOS during the Power-On Self Test, or POST.

Current Trends

Hyperthreading in P4
  A second logical CPU
  Complete state of the system in both CPUs

Microcoding in P4
  Two pointers control flow independently
  Both processors share the ROM entries
  Access is alternated between the CPUs

Thank You
