L02-SimpleImps
-
Upload
samalmasry -
Category
Documents
-
view
213 -
download
0
Transcript of L02-SimpleImps
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 1/55
CS 152 Computer Architecture andEngineering
Lecture 2 - Simple MachineImplementations
Krste AsanovicElectrical Engineering and Computer Sciences
University of California at Berkeley
http://www.eecs.berkeley.edu/~krstehttp://inst.eecs.berkeley.edu/~cs152
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 2/55
1/27/2009 CS152-Spring¶09 2
Last Time in Lecture 1Computer Science at crossroads from sequential toparallel computingComputer Architecture >> ISAs and RTL ± CS152 is about interaction of hardware and software, and design of
appropriate abstraction layers
Comp. Arch. shaped by technology and applications ± History provides lessons for the future
Cost of software development a large constraint onarchitecture ± Compatibility a key solution to software cost
IBM 360 introduces notion of ³family of machines´running same ISA but very different implementations ± Within same generation of machines ± ³Future-proofing´ for subsequent generations of machine
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 3/55
1/27/2009 CS152-Spring¶09 3
B urrough¶s B 5000 Stack Architecture:An ALGOL Machine, Robert B arton, 1960
Machine implementation can be completely hidden if theprogrammer is provided only a high-level language interface.
Stack machine organization because stacks are convenient for:1. expression evaluation;2. subroutine calls, recursion, nested interrupts;3. accessing variables in block-structured languages.
B6700, a later model, had many more innovative features ± tagged data ± virtual memory ± multiple processors and memories
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 4/55
1/27/2009 CS152-Spring¶09 4
A Stack Machine
A Stack machine has a stack asa part of the processor state
typical operations:push, pop, +, *, ...
Instructions like + implicitlyspecify the top 2 elements of the stack as operands.
a
:
stack
Processor
MainStore
ba
push bcba
push c ba
pop
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 5/55
1/27/2009 CS152-Spring¶09 5
abc
Evaluation of Expressions
(a + b * c) / (a + d * c - e) /
+
* +a e-
ac
dc
*b
Reverse Polisha b c * + a d c * + e - /
push apush bpush cmultiply
*
Evaluation Stack
b * c
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 6/55
1/27/2009 CS152-Spring¶09 6
a
Evaluation of Expressions
(a + b * c) / (a + d * c - e) /
+
* +a e-
ac
dc
*b
Reverse Polisha b c * + a d c * + e - /
add
+
Evaluation Stack
b * c
a + b * c
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 7/55
1/27/2009 CS152-Spring¶09 7
H ardware organization of the stack
Stack is part of the processor state s tack mu s t be bounded and s mall
} number of Registers,not the size of main memory
Conceptually stack is unbounded a part of the s tack i s included in the
proce ss or s tate; the re s t i s kept in themain memory
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 8/55
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 9/55
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 10/55
1/27/2009 CS152-Spring¶09 10
Stack Size and Expression Evaluation
program stack (size = 4 )push a R0push b R0 R1
push c R0 R1 R2* R0 R1+ R0push a R0 R1push d R0 R1 R2push c R0 R1 R2 R3* R0 R1 R2+ R0 R1push e R0 R1 R2- R0 R1
/ R0
a b c * + a d c * + e - /
a and c are³loaded´ twice
not the best use of registers!
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 11/55
1/27/2009 CS152-Spring¶09 11
Register Usage in a GPR Machine
M ore control over register usagesince registers can be named explicitly
Load Ri mLoad Ri (Rj)Load Ri (Rj) (Rk)
- eliminates unnecessary Loads and Stores
- fewer Registers
but instructions may be longer!
Load R0 aLoad R1 cLoad R2 b
Mul R2 R1
(a + b * c) / (a + d * c - e)
ReuseR2
Add R2 R0Load R3 dMul R3 R1Add R3 R0
ReuseR3
Load R0 eSub R3 R0Div R2 R3
ReuseR0
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 12/55
1/27/2009 CS152-Spring¶09 12
Stack Machines: Essential features
In addition to push, pop, +etc., the instruction setmust provide thecapability to ± refer to any element in the
data area ± jump to any in s truction in
the code area ± move any element in the
s tack frame to the top
machinery tocarry out+, -, etc.
stackSP
DP
PC
data
.
.
.
abc
push apush bpush c
*+push e
/
code
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 13/55
1/27/2009 CS152-Spring¶09 13
Stack versus GPR Organization Amdahl, Blaauw and Brooks, 1964
1. The performance advantage of push down stack organizationis derived from the presence of fast registers and not the waythey are used.
2.³Surfacing´ of data in stack which are ³profitable´ isapproximately 50% because of constants and common
subexpressions.3. Advantage of instruction density because of implicit addressesis equaled if short addresses to specify registers are allowed.
4. Management of finite depth stack causes complexity.5. Recursive subroutine advantage can be realized only with the
help of an independent stack for addressing.6. Fitting variable-length fields into fixed-width word is awkward.
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 14/55
1/27/2009 CS152-Spring¶09 14
1. Stack programs are not smaller if short (Register)addresses are permitted.
2. Modern compilers can manage fast register space better than the stack discipline.
Stack Machines (Mostly) Died by 1980
P R¶s and caches are better than stack and displays
Early language - directed architectures often did not take into account the role of compilers!
B5 000, B67 00, HP 3000, IC L 2900, Symbolics 3 6 00
Some would claim that an echo of this mistake isvisible in the SPARC architecture register windows -
more later«
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 15/55
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 16/55
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 17/55
1/27/2009 CS152-Spring¶09 17
ISA to Microarchitecture Mapping
ISA often designed with particular microarchitectural stylein mind, e.g., ± CISC microcoded ± RISC hardwired, pipelined ± VLIW fixed-latency in-order pipelines ± J VM software interpretation
But can be implemented with any microarchitectural style ± Core 2 Duo: hardwired pipelined CISC (x86)
machine (with some microcode support)
± This lecture: a microcoded RISC (MIPS) machine ± Intel could implement a dynamically scheduled out-of-order VLIW (IA-64) processor
± ARM J azelle: A hardware J VM processor ± Simics: Software-interpreted SPARC RISC machine
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 18/55
1/27/2009 CS152-Spring¶09 18
Microarchitecture: I mplementation of an IS A
Structure: How components are connected.Static
Behavior: How data moves between components
Dynamic
Controller
Datapath
controlpointsstatus
lines
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 19/55
1/27/2009 CS152-Spring¶09 19
Microcontrol Unit M aurice Wilkes, 1954
Embed the control logic state table in a memory array
Matrix A Matrix B
Decoder
Next state
op conditionalcode flip-flop
Q address
Control lines toALU , M UX s, Registers
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 20/55
1/27/2009 CS152-Spring¶09 20
Microcoded Microarchitecture
Memory(RAM)
Datapath
Qcontroller(ROM)
Addr Data
zero? busy?
opcode
en M emM emWrt
holds fixedmicrocode instructions
holds user programwritten in macrocode
instructions (e.g.,MIPS, x8 6 , etc.)
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 21/55
1/27/2009 CS152-Spring¶09 21
The MIPS32 ISA
Processor State32 32-bit GPRs, R0 always contains a 01 6 double-precision/32 single-precision FPRsFP status register, used for FP compares & exceptionsPC, the program countersome other special registers
Data types8-bit byte, 1 6 -bit half word32-bit word for integers32-bit word for single precision floating point64 -bit word for double precision floating point
Load/Store style instruction setdata addressing modes- immediate & indexedbranch addressing modes- PC relative & register indirectB yte addressable memory- big-endian mode
All instructions are 32 bits
See H& P
A ppendix B for full description
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 22/55
1/27/2009 CS152-Spring¶09 22
MIPS Instruction Formats
6 5 5 1 6
opcode rs offset B EQZ, B NEZ
6 2 6
opcode offset J, JA L
6 5 5 1 6opcode rs JR, JA LR
opcode rs rt immediate rt n (rs) op immediate
6 5 5 5 5 6
0 rs rt rd 0 func rd n (rs) func (rt)ALU
ALU i
6 5 5 1 6
opcode rs rt displacement M[(rs) + displacement]Mem
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 23/55
1/27/2009 CS152-Spring¶09 23
A B us-based Datapath for MIPS
M icroinstruction: register to register transfer (17 control signals)MA n PC means RegSel = PC; enReg=yes; ldMA= yes
B n Reg[rt] means
enMem
MA
addr
data
ldMA
Memory
busy
MemWrt
B us 32
zero?
A B
OpSel ldA ld B
ALU
enA LU
ALU
control
2
RegWrtenReg
addr
data
rsrtrd32(PC)31( L ink)
RegSel
32 GPRs+ PC ...
32-bit Reg
3rsrtrd
ExtSel
IR
Opcode
ldIR
ImmExt
enImm
2
RegSel = rt; enReg=yes; ld B = yes
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 24/55
1/27/2009 CS152-Spring¶09 24
Memory Module
Assumption: Memory operates independentlyand is slow as compared to Reg-to-Reg transfers
(multiple CP U clock cycles per access)
EnableWrite(1)/Read(0)RAM
din dout
we
addr busy
bus
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 25/55
1/27/2009 CS152-Spring¶09 25
Instruction Execution
Execution of a MIPS instruction involves
1. instruction fetch2. decode and register fetch3. A LU operation4 . memory operation (optional)5 . write back to register file (optional)
+ the computation of the
next instruction address
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 26/55
1/27/2009 CS152-Spring¶09 26
Microprogram Fragments
instr fetch: MA n PC A n PCIR n MemoryPC n A + 4dispatch on OPcode
can betreated a s
a macro
ALU: An Reg[rs]B n Reg[rt]Reg[rd] n func(A,B)do instruction fetch
ALUi: An Reg[rs]B n Imm s ign exten s ion ...Reg[rt] n Opcode(A,B)do instruction fetch
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 27/55
1/27/2009 CS152-Spring¶09 27
Microprogram Fragments ( cont.)
LW: A n Reg[rs]B n ImmMA n A + B
Reg[rt] n Memorydo instruction fetch
J: A n PCB n IRPC n JumpTarg(A, B )do instruction fetch
beqz: A n Reg[rs]
I f zero?(A) then go to bz-takendo instruction fetch
bz-taken: A n PCB n Imm << 2PC n A + B
do instruction fetch
JumpTarg( A ,B) ={ A[ 31:28],B [ 25:0],00}
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 28/55
1/27/2009 CS152-Spring¶09 28
MIPS Microcontroller: first attempt
nextstate
QPC (state)
Opcodezero?
B usy (memory)
Control Signals (1 7 )
ss
6
QProgram ROM
addr
data
latching the inputsmay cause aone - cycle delay
= 2 (opcode+status+s) words
How bigis ³s´?
RO M size ?
Word size ? = control+s bits
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 29/55
1/27/2009 CS152-Spring¶09 29
Microprogram in the ROM worksheet
State Op zero? busy Control points next-state
fetch 0 * * * MA n PC fetch 1fetch 1 * * yes .... fetch 1fetch 1 * * no IR n Memory fetch 2
fetch 2 * * * A n PC fetch 3fetch 3 * * * PC n A + 4 ?
ALU 0 * * * A n Reg[rs] A LU 1ALU 1 * * * B n Reg[rt] A LU 2ALU 2 * * * Reg[rd] n func(A, B ) fetch 0
fetch 3 ALU * * PC n A + 4 ALU 0
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 30/55
1/27/2009 CS152-Spring¶09 30
Microprogram in the ROM
State Op zero? busy Control points next-statefetch 0 * * * MA n PC fetch 1fetch 1 * * yes .... fetch 1fetch 1 * * no IR n Memory fetch 2fetch 2 * * * A n PC fetch 3fetch
3ALU * * PC n A + 4 ALU
0fetch 3 ALU i * * PC n A + 4 ALU i0fetch 3 LW * * PC n A + 4 LW0fetch 3 SW * * PC n A + 4 SW 0fetch 3 J * * PC n A + 4 J0fetch 3 JAL * * PC n A + 4 JAL 0
fetch 3 JR * * PC n A +4
JR 0fetch 3 JALR * * PC n A + 4 JALR0fetch 3 beqz * * PC n A + 4 beqz 0...
ALU 0 * * * A n Reg[rs] A LU 1ALU 1 * * * B n Reg[rt] A LU 2
ALU 2 * * * Reg[rd] n func(A, B ) fetch 0
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 31/55
1/27/2009 CS152-Spring¶09 31
Microprogram in the ROM C ont.
State Op zero? busy Control points next-stateALU i0 * * * A n Reg[rs] A LU i1ALU i1 sExt * * B n sExt 1 6 (Imm) A LU i2ALU i1 uExt * * B n uExt 1 6 (Imm) A LU i2ALU i2 * * * Reg[rd] n Op(A, B ) fetch 0
...J0 * * * A n PC J 1J1 * * * B n IR J 2J2 * * * PC n JumpTarg(A, B ) fetch 0...
beqz0
* * * A n Reg[rs] beqz1beqz 1 * yes * A n PC beqz 2
beqz 1 * no * .... fetch 0beqz 2 * * * B n sExt 1 6 (Imm) beqz 3beqz 3 * * * PC n A+ B fetch 0...
JumpTarg( A ,B) = { A[ 31:28],B [ 25:0],00}
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 32/55
1/27/2009 CS152-Spring¶09 32
Size of Control Store
M IPS: w = 6+2 c = 17 s = ?no. of steps per opcode = 4 to 6 + fetch-sequenceno. of states } (4 steps per op-group ) x op-groups
+ common sequences= 4 x 8 + 10 states = 42 states s = 6
Control ROM = 2 (8+6) x 23 bits } 48 Kbytes
size = 2 (w+s) x (c + s) Control ROM
data
status & opcode
addr
next QPC
Control signals
QPC / w
/ s
/ c
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 33/55
1/27/2009 CS152-Spring¶09 33
Reducing Control Store Size
Reduce the ROM height (= address bits)± reduce inputs by extra external logic
each input bit doubles the size of the
control store± reduce states by grouping opcodes
find common sequences of actions± condense input status bits
combine all exceptions into one, i.e.,exception/no-exception
Reduce the ROM width± restrict the next - state encoding
Next, Dispatch on opcode, Wait for memory, ...± encode control signals (vertical microcode)
Control store has to be fast expensive
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 34/55
1/27/2009 CS152-Spring¶09 34
CS152 Administrivia
Room change/decision: ± Either Soda 306 (here) for all but two lectures,
back in 320 for those ± Or 182 Dwinelle for all lectures
± Vote ?Class schedule almost up to date (but might need tochange quiz depending on vote above)Lab 1 coming out on Thursday, together with PS1
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 35/55
1/27/2009 CS152-Spring¶09 35
Problem Sets
Designed to help you learn material and practice for quizMust hand in day before section that reviewsproblem set, which will be shortly before quizSolutions handed out in review section
Quiz assumes you have worked through problem set.Problem sets graded on 0,1,2 scale
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 36/55
1/27/2009 CS152-Spring¶09 36
Collaboration Policy
Can collaborate to understand problem sets, butmust turn in own solution. Some problems repeatedfrom earlier years - do not copy solutions. Q uizproblems will not be repeated«Each student must complete directed portion of thelab by themselves. OK to collaborate to understandhow to run labs ± Class news group info on web site.
Can work in group of up to 3 students for open-ended portion of each lab ± OK to be in different group for each lab -just make sure to label
participants¶ names clearly on each turned in lab section
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 37/55
1/27/2009 CS152-Spring¶09 37
MIPS Controller V2
QJumpType =next | spin
| fetch | dispatch| feqz | fnez
Control Signals (1 7 )
Control ROM
address
data
+1
Opcode ext
QPC (state)
jumplogic
zero
QPC QPC+1
absolute
op-group
busy
QPCSrcinput encodingreduces RO M height
next - state encodingreduces RO M width
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 38/55
1/27/2009 CS152-Spring¶09 38
J ump Logic
QPCSrc = C ase QJumpTypes
next QPC+1
spin if (busy) then QPC else QPC+1
fetch absolute
dispatch op-group
feqz if (zero) then absolute else QPC+1
fnez if (zero) then QPC+1 else absolute
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 39/55
1/27/2009 CS152-Spring¶09 39
Instruction Fetch & ALU: M IPS- C ontroller -2
State Control points next-state
fetch 0 MA n PCfetch 1 IR n Memoryfetch 2 A n PC
fetch 3 PC n A + 4 ...ALU 0 A n Reg[rs]ALU 1 B n Reg[rt]ALU 2 Reg[rd] n func(A, B )
ALU i0 A n Reg[rs]ALU i1 B n sExt 1 6 (Imm)ALU i2 Reg[rd] n Op(A, B )
nextspinnext
dispatch
nextnextfetch
nextnextfetch
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 40/55
1/27/2009 CS152-Spring¶09 40
Load & Store: M IPS- C ontroller -2
State Control points next-state
LW0 A n Reg[rs] nextLW1 B n sExt 1 6 (Imm) nextLW2 MA n A+ B nextLW3 Reg[rt] n Memory spinLW4 fetch
SW 0 A n Reg[rs] nextSW 1 B n sExt 1 6 (Imm) next
SW 2 MA n A+B
nextSW 3 Memory n Reg[rt] spinSW 4 fetch
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 41/55
1/27/2009 CS152-Spring¶09 41
B ranches: M IPS- C ontroller -2
State Control points next-state
B EQZ 0 A n Reg[rs] nextB EQZ 1 fnezB EQZ 2 A n PC nextB EQZ 3 B n sExt 1 6 (Imm<<2) nextB EQZ 4 PC n A+ B fetch
B NEZ 0 A n Reg[rs] nextB NEZ 1 feqzB NEZ
2A n PC next
B NEZ 3 B n sExt 1 6 (Imm<<2) nextB NEZ 4 PC n A+ B fetch
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 42/55
1/27/2009 CS152-Spring¶09 42
J umps: M IPS- C ontroller -2
State Control points next-state
J0 A n PC nextJ1 B n IR nextJ2 PC n JumpTarg(A, B ) fetch
JR 0 A n Reg[rs] nextJR 1 PC n A fetch
JAL 0 A n PC nextJAL 1 Reg[31] n A nextJAL 2 B n IR next
JAL
3 PC n JumpTarg(A,B
) fetchJALR0 A n PC nextJALR1 B n Reg[rs] nextJALR2 Reg[31] n A nextJALR3 PC n B fetch
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 43/55
1/27/2009 CS152-Spring¶09 43
VAX 11-780 Microcode
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 44/55
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 45/55
1/27/2009 CS152-Spring¶09 45
Mem-Mem ALU Instructions:M IPS- C ontroller -2
M em -M em ALU op M[(rd)] n M[(rs)] op M[(rt)]
ALU MM0 MA n Reg[rs] nextALU MM1 A n Memory spinALU MM2 MA n Reg[rt] next
ALU
MM3B
n Memory spinALU MM4 MA n Reg[rd] nextALU MM5 Memory n func(A, B ) spinALU MM6 fetch
Complex instructions usually do not require datapathmodifications in a microprogrammed implementation
-- only extra space for the control program
Implementing these instructions using a hardwiredcontroller is difficult without datapath modifications
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 46/55
1/27/2009 CS152-Spring¶09 46
Performance Issues
Microprogrammed controlmultiple cycles per instruction
Cycle time ?t C > max(t reg-reg , t ALU , t QROM)
Suppose 10 * t QROM < t RAM
G ood performance, relative to a single - cyclehardwired implementation, can be achieved even with a C P I of 10
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 47/55
1/27/2009 CS152-Spring¶09 47
H orizontal vs Vertical QCode
Horizontal Qcode has wider Qinstructions ± Multiple parallel operations per Qinstruction ± Fewer steps per macroinstruction ± Sparser encoding more bits
Vertical Qcode has narrower Qinstructions ± Typically a single datapath operation per Qinstruction
± separate Qinstruction for branches ± More steps to per macroinstruction ± More compact less bits
Nanocoding ± Tries to combine best of horizontal and vertical Qcode
# QInstructions
B its per QInstruction
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 48/55
1/27/2009 CS152-Spring¶09 48
N anocoding
MC68000 had 17-bit Qcode containing either 10-bit Q jump or 9-bit nanoinstruction pointer ± Nanoinstructions were 68 bits wide, decoded to give 196
control signals
Qcode ROM
nanoaddress
Qcodenext-state
Qaddress
QPC (state)
nanoinstruction ROMdata
Exploits recurringcontrol signal patternsin Qcode, e.g.,
ALU
0 A n Reg[rs]...ALU i0 A n Reg[rs]...
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 49/55
1/27/2009 CS152-Spring¶09 49
Microprogramming in I B M 360
O nly the fa s te s t model s (75 and 95) were hardwired
M30 M40 M50 M65Datapath width(bits)
8 16 32 64
Qinst width(bits)
50 52 85 87
Qcode size(K µinsts) 4 4 2.75 2.75Qstoretechnology
CCROS TCROS BCROS BCROS
Qstore cycle(ns)
750 625 500 200
memory cycle(ns)
1500 2500 2000 750
Rental fee($K/month)
4 7 15 35
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 50/55
1/27/2009 CS152-Spring¶09 50
Microcode Emulation
IBM initially miscalculated the importance of softwarecompatibility with earlier models when introducing the360 seriesHoneywell stole some IBM 1401 customers by offeringtranslation software (³Liberator´) for Honeywell H200series machineIBM retaliated with optional additional microcode for 360series that could emulate IBM 1401 ISA, later extendedfor IBM 7000 series
± one popular program on 1401 was a 650 simulator, so somecustomers ran many 650 programs on emulated 1401s ± (6 50 s imulated on 1401 emulated on 3 6 0)
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 51/55
1/27/2009 CS152-Spring¶09 51
Microprogramming thrived in theSeventies
Significantly faster ROMs than DRAMs were availableFor complex instruction sets, datapath and controller were cheaper and s impler N ew in s truction s , e.g., floating point, could be supported
without datapath modificationsF ixing bug s in the controller was easier ISA compatibility across various models could beachieved easily and cheaply
Except for the cheapest and fastest machines,all computers were microprogrammed
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 52/55
1/27/2009 CS152-Spring¶09 52
W ritable Control Store ( W CS)
Implement control store in RAM not ROM ± MOS SRAM memories now almost as fast as control store (corememories/DRAMs were 2-10x slower)
± Bug-free microprograms difficult to write
User-WCS provided as option on several minicomputers ± Allowed users to change microcode for each processor
User-WCS failed ± Little or no programming tools support ± Difficult to fit software into small space
± Microcode control tailored to original ISA, less useful for others ± Large WCS part of processor state - expensive context switches ± Protection difficult if user can change microcode ± Virtual memory required re s tartable microcode
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 53/55
1/27/2009 CS152-Spring¶09 53
Microprogramming: early EightiesEvolution bred more complex micro-machines
± Complex instruction sets led to need for subroutine and call stacks in µcode ± Need for fixing bugs in control programs was in conflict with read-only nature
of µROM ± --> WCS (B1700, Q Machine, Intel i432, «)
With the advent of VLSI technology assumptions about ROM &
RAM speed became invalid -> more complexityBetter compilers made complex instructions less important.
Use of numerous micro-architectural innovations, e.g.,pipelining, caches and buffers, made multiple-cycle execution of
reg-reg instructions unattractive
Looking ahead to RISC next time ± Use chip area to build fast instruction cache of user-visible vertical
microinstructions - use software subroutine not hardware microroutines ± Use simple ISA to enable hardwired pipelined implementation
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 54/55
1/27/2009 CS152-Spring¶09 54
Modern Usage
M icroprogramming is far from extinct
Played a crucial role in micros of the EightiesDE C uV AX , M otorola 68K series, I ntel 386 and 4 86
Microcode pays an assisting role in most modernmicros ( AM D Opteron, I ntel C ore 2 Duo, I ntel Atom, I BM P ower P C )Most instructions are executed directly, i.e., with hard-wiredcontrolInfrequently-used and/or complicated instructions invoke themicrocode engine
P atchable microcode common for post-fabricationbug fixes, e.g. Intel processors load µcode patchesat bootup
8/6/2019 L02-SimpleImps
http://slidepdf.com/reader/full/l02-simpleimps 55/55
Acknowledgements
These slides contain material developed andcopyright by: ± Arvind (MIT) ± Krste Asanovic (MIT/UCB) ± J oel Emer (Intel/MIT)
±J
ames Hoe (CMU) ± J ohn Kubiatowicz (UCB) ± David Patterson (UCB)
MIT material derived from course 6.823
UCB material derived from course CS252