MIPS Microprocessor Instruction Set
MIPS register convention
Register 1, called $at, is reserved for the assembler, and registers 26-27, called $k0-$k1, are reserved for the operating system.
Name | Register number | Usage | Preserved on call?
$zero | 0 | the constant value 0 | n.a.
$v0-$v1 | 2-3 | values for results and expression evaluation | no
$a0-$a3 | 4-7 | arguments | no
$t0-$t7 | 8-15 | temporaries | no
$s0-$s7 | 16-23 | saved | yes
$t8-$t9 | 24-25 | more temporaries | no
$gp | 28 | global pointer | yes
$sp | 29 | stack pointer | yes
$fp | 30 | frame pointer | yes
$ra | 31 | return address | yes
MIPS operands
Name | Example | Comments
32 registers | $s0-$s7, $t0-$t9, $gp, $fp, $zero, $sp, $ra, $at, Hi, Lo | Fast locations for data. In MIPS, data must be in registers to perform arithmetic. MIPS register $zero always equals 0. Register $at is reserved for the assembler to handle large constants. Hi and Lo contain the results of multiply and divide.
$2^{30}$ memory words | Memory[0], Memory[4], ..., Memory[4294967292] | Accessed only by data transfer instructions. MIPS uses byte addresses, so sequential words differ by 4. Memory holds data structures, such as arrays, and spilled registers, such as those saved on procedure calls.
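Fig. 13.1 MicroMIPS instruction formats and naming of the various fields.
[Figure: every instruction (inst) is 32 bits.
R format: op (bits 31-26, 6 bits, opcode) | rs (bits 25-21, 5 bits, source 1 or base) | rt (bits 20-16, 5 bits, source 2 or destination) | rd (bits 15-11, 5 bits, destination) | sh (bits 10-6, 5 bits, unused) | fn (bits 5-0, 6 bits, opcode extension)
I format: op | rs | rt | imm (bits 15-0, 16 bits, operand/offset)
J format: op | jta (bits 25-0, 26 bits, jump target address)]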
Seven R-format ALU instructions (add, sub, slt, and, or, xor, nor)
Six I-format ALU instructions (lui, addi, slti, andi, ori, xori)
Two I-format memory access instructions (lw, sw)
Three I-format conditional branch instructions (bltz, beq, bne)
Four unconditional jump instructions (j, jr, jal, syscall)
We will refer to this diagram later
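Here is a minimal Python sketch of the field layout in Fig. 13.1. It packs and unpacks R- and I-format words with shifts and masks. The helper names `encode_r`, `encode_i` and `decode_fields` are hypothetical. Register numbers follow the convention table ($s1 = 17, $s2 = 18, $s3 = 19, $t0 = 8), and the op/fn values are those listed for add (0/32) and lw (35).

```python
def encode_r(rs: int, rt: int, rd: int, fn: int, sh: int = 0) -> int:
    """Pack an R-format word: op = 0 | rs | rt | rd | sh | fn."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (sh << 6) | fn


def encode_i(op: int, rs: int, rt: int, imm: int) -> int:
    """Pack an I-format word: op | rs | rt | 16-bit imm (two's complement)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def decode_fields(word: int) -> dict:
    """Extract every field of Fig. 13.1; which ones matter depends on the format."""
    return {
        "op": (word >> 26) & 0x3F,   # bits 31-26
        "rs": (word >> 21) & 0x1F,   # bits 25-21
        "rt": (word >> 16) & 0x1F,   # bits 20-16
        "rd": (word >> 11) & 0x1F,   # bits 15-11 (R format)
        "sh": (word >> 6) & 0x1F,    # bits 10-6 (unused in MicroMIPS)
        "fn": word & 0x3F,           # bits 5-0 (R format)
        "imm": word & 0xFFFF,        # bits 15-0 (I format)
        "jta": word & 0x3FFFFFF,     # bits 25-0 (J format)
    }


add_word = encode_r(rs=17, rt=18, rd=8, fn=32)   # add $t0,$s1,$s2
lw_word = encode_i(op=35, rs=19, rt=8, imm=32)   # lw  $t0,32($s3)
print(hex(add_word), hex(lw_word))               # 0x2324020 0x8e680020
```

The decoder returns all fields at once. The hardware likewise extracts every field in parallel and lets the opcode decide which ones are used.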
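A Small Set of Instructions
Table 13.1 The MicroMIPS instruction set
Class | Instruction | Usage | op | fn
Copy | Load upper immediate | lui rt,imm | 15 |
Arithmetic | Add | add rd,rs,rt | 0 | 32
Arithmetic | Subtract | sub rd,rs,rt | 0 | 34
Arithmetic | Set less than | slt rd,rs,rt | 0 | 42
Arithmetic | Add immediate | addi rt,rs,imm | 8 |
Arithmetic | Set less than immediate | slti rt,rs,imm | 10 |
Logic | AND | and rd,rs,rt | 0 | 36
Logic | OR | or rd,rs,rt | 0 | 37
Logic | XOR | xor rd,rs,rt | 0 | 38
Logic | NOR | nor rd,rs,rt | 0 | 39
Logic | AND immediate | andi rt,rs,imm | 12 |
Logic | OR immediate | ori rt,rs,imm | 13 |
Logic | XOR immediate | xori rt,rs,imm | 14 |
Memory access | Load word | lw rt,imm(rs) | 35 |
Memory access | Store word | sw rt,imm(rs) | 43 |
Control transfer | Jump | j L | 2 |
Control transfer | Jump register | jr rs | 0 | 8
Control transfer | Branch less than 0 | bltz rs,L | 1 |
Control transfer | Branch equal | beq rs,rt,L | 4 |
Control transfer | Branch not equal | bne rs,rt,L | 5 |
Control transfer | Jump and link | jal L | 3 |
Control transfer | System call | syscall | 0 | 12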
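The op and fn columns of the instruction set table are enough to identify any of the 22 instructions. The hypothetical `mnemonic` helper below first looks at op, and consults fn only when op = 0 (R format). Unlisted codes are reported as "illegal", which is the case the exception states later in the deck handle.

```python
# op -> mnemonic for instructions with a nonzero opcode
OP_NAMES = {15: "lui", 8: "addi", 10: "slti", 12: "andi", 13: "ori",
            14: "xori", 35: "lw", 43: "sw", 2: "j", 1: "bltz", 4: "beq",
            5: "bne", 3: "jal"}
# fn -> mnemonic for op = 0 (R-format) instructions
FN_NAMES = {32: "add", 34: "sub", 42: "slt", 36: "and", 37: "or",
            38: "xor", 39: "nor", 8: "jr", 12: "syscall"}


def mnemonic(word: int) -> str:
    """Identify a MicroMIPS instruction from op, and from fn when op == 0."""
    op = (word >> 26) & 0x3F
    if op == 0:
        return FN_NAMES.get(word & 0x3F, "illegal")
    return OP_NAMES.get(op, "illegal")


print(mnemonic(0x02324020), mnemonic(0x8E680020), mnemonic(63 << 26))
# add lw illegal
```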
13.2 The Instruction Execution Unit
Fig. 13.2 Abstract view of the instruction execution unit for MicroMIPS. For naming of instruction fields, see Fig. 13.1.
[Figure: the PC addresses the instruction cache. The fields of inst (op, fn, rs, rt, rd, imm, jta) go to the control unit, the register file, and the next-address logic. The register file outputs (rs) and (rt) to the ALU, and the ALU provides the address and data for the data cache. Instruction groups marked on the figure: 12 arithmetic/logic instructions plus lui, lw, sw; bltz, jr; beq, bne; j, jal; syscall (22 instructions in all).]
13.3 A Single-Cycle Data Path
Fig. 13.3 Key elements of the single-cycle MicroMIPS data path.
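[Figure: PC → instruction cache → register file. The write address (rt, rd, or $31) is selected by RegDst, and the write data (data out, ALU out, or IncrPC) by RegInSrc, with RegWrite enabling the write. The ALU's second operand is (rt) or the sign-extended imm, selected by ALUSrc, and its function is set by ALUFunc. The ALU has an ALUOvfl output and feeds the data cache (DataRead, DataWrite). The next-address logic (Br&Jump) produces Next PC and Incr PC. The steps are: instruction fetch, register access/decode, ALU operation, data access, register writeback.]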
An ALU for MicroMIPS
Fig. 10.19 A multifunction ALU with 8 control signals (2 for function class, 1 arithmetic, 3 shift, 2 logic) specifying the operation.
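[Figure: the ALU has three parts.
- An adder: AddSub selects add or subtract, with carry-in c0; carries c31 and c32 feed the overflow logic.
- A shifter: ConstVar selects a constant or a variable 5-bit amount.
- A logic unit.
A 4-input multiplexer, controlled by the function class, selects the output s. The set-less result is taken from the MSB of the adder output. Zero comes from a 32-input NOR, and the ALU also produces Ovfl. The shorthand ALU symbol has inputs x and y, output s, control Func, and flags Ovfl and Zero. In MicroMIPS the shifter path is used for lui (imm).]
Control encodings:
Logic function: AND 00, OR 01, XOR 10, NOR 11
Function class: 00 Shift, 01 Set less, 10 Arithmetic, 11 Logic
Shift function: 00 No shift, 01 Logical left, 10 Logical right, 11 Arith right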
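A small behavioral model of this ALU makes the control encodings concrete. It is a sketch under stated assumptions, not the gate-level design of Fig. 10.19:
- `fn_class` uses the MicroMIPS numbering (0 = lui, 1 = set less, 2 = arithmetic, 3 = logic).
- lui is modeled as a 16-bit left shift of y, standing in for the shifter path.
- Set less takes the MSB of x − y, as the figure does. It therefore ignores overflow.

```python
MASK = 0xFFFFFFFF


def alu(x: int, y: int, fn_class: int, add_sub: int = 0, logic_fn: int = 0):
    """Return (result, ovfl, zero) for 32-bit operands x and y.

    fn_class: 0 = lui (shifter path), 1 = set less, 2 = arithmetic, 3 = logic
    add_sub:  0 = add, 1 = subtract (set less needs subtract)
    logic_fn: 0 = AND, 1 = OR, 2 = XOR, 3 = NOR
    """
    x &= MASK
    y &= MASK
    y_eff = (~y & MASK) if add_sub else y      # adder sees y or its complement
    s = (x + y_eff + add_sub) & MASK           # carry-in c0 = AddSub
    # Overflow: both adder inputs have the same sign but the sum does not.
    ovfl = (x >> 31) == (y_eff >> 31) and (s >> 31) != (x >> 31)
    if fn_class == 0:
        result = (y << 16) & MASK              # lui: imm placed in the upper half
    elif fn_class == 1:
        result = s >> 31                       # MSB of x - y
    elif fn_class == 2:
        result = s
    else:
        result = [x & y, x | y, x ^ y, ~(x | y) & MASK][logic_fn]
    zero = result == 0                         # the 32-input NOR
    return result, ovfl, zero


print(alu(5, 3, 2, add_sub=1))   # (2, False, False)
print(alu(3, 5, 1, add_sub=1))   # (1, False, False)
```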
13.4 Branching and Jumping
Fig. 13.4 Next-address logic for MicroMIPS (see top part of Fig. 13.3).
[Figure: a branch condition checker takes (rs), (rt) and BrType and produces BrTrue. An adder forms IncrPC from the 30-bit word address of the PC plus 1, also adding the sign-extended imm when BrTrue is asserted. A multiplexer controlled by PCSrc selects NextPC among IncrPC, the 4 MSBs of PC concatenated with jta, (rs)31:2, and SysCallAddr.]
Update options for PC:
$(\mathrm{PC})_{31:2} + 1$: default option
$(\mathrm{PC})_{31:2} + 1 + \mathrm{imm}$: when the instruction is a branch and the condition is met
$(\mathrm{PC})_{31:28} \mid \mathrm{jta}$: when the instruction is j or jal ($\mid$ denotes concatenation)
$(\mathrm{rs})_{31:2}$: when the instruction is jr
SysCallAddr: start address of an operating system routine
Lowest 2 bits of PC always 00
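The update options for PC can be sketched in Python as follows. The function names are hypothetical, and SysCallAddr is passed in as a parameter because the slides do not give its value. The logic works on 30-bit word addresses and appends 00 at the end, mirroring the figure. `pc_src` follows the PCSrc encoding 0 = IncrPC, 1 = jta, 2 = (rs), 3 = SysCallAddr, and BrType follows 0 = no branch, 1 = beq, 2 = bne, 3 = bltz.

```python
WORD_MASK = 0x3FFFFFFF  # 30-bit word address, i.e. (PC)31:2


def sign_extend16(v: int) -> int:
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def branch_true(br_type: int, rs_val: int, rt_val: int) -> bool:
    """Branch condition checker: 1 = beq, 2 = bne, 3 = bltz (sign bit of (rs))."""
    if br_type == 1:
        return rs_val == rt_val
    if br_type == 2:
        return rs_val != rt_val
    if br_type == 3:
        return ((rs_val >> 31) & 1) == 1
    return False


def next_pc(pc, pc_src, br_type, rs_val, rt_val, imm, jta, syscall_addr):
    """Return the next byte address of the PC."""
    word = (pc >> 2) & WORD_MASK
    incr = word + 1
    if branch_true(br_type, rs_val, rt_val):
        incr += sign_extend16(imm)              # offset counts words
    incr &= WORD_MASK
    if pc_src == 0:
        nxt = incr
    elif pc_src == 1:
        nxt = (word & 0x3C000000) | (jta & 0x3FFFFFF)   # (PC)31:28 then jta
    elif pc_src == 2:
        nxt = (rs_val >> 2) & WORD_MASK
    else:
        return syscall_addr
    return nxt << 2                             # lowest 2 bits always 00


pc = 0x00400000
print(hex(next_pc(pc, 0, 1, 7, 7, 0xFFFF, 0, 0x80)))   # beq taken, imm = -1: 0x400000
```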
13.5 Deriving the Control Signals
Table 13.2 Control signals for the single-cycle MicroMIPS implementation.
Block | Control signal | 0 | 1 | 2 | 3
Reg file | RegWrite | Don't write | Write | |
Reg file | RegDst1, RegDst0 | rt | rd | $31 |
Reg file | RegInSrc1, RegInSrc0 | Data out | ALU out | IncrPC |
ALU | ALUSrc | (rt) | imm | |
ALU | AddSub | Add | Subtract | |
ALU | LogicFn1, LogicFn0 | AND | OR | XOR | NOR
ALU | FnClass1, FnClass0 | lui | Set less | Arithmetic | Logic
Data cache | DataRead | Don't read | Read | |
Data cache | DataWrite | Don't write | Write | |
Next addr | BrType1, BrType0 | No branch | beq | bne | bltz
Next addr | PCSrc1, PCSrc0 | IncrPC | jta | (rs) | SysCallAddr
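Control Signal Settings
Table 13.3
Instruction | op | fn | RegWrite | RegDst | RegInSrc | ALUSrc | AddSub | LogicFn | FnClass | DataRead | DataWrite | BrType | PCSrc
Load upper immediate | 001111 | | 1 | 00 | 01 | 1 | | | 00 | 0 | 0 | 00 | 00
Add | 000000 | 100000 | 1 | 01 | 01 | 0 | 0 | | 10 | 0 | 0 | 00 | 00
Subtract | 000000 | 100010 | 1 | 01 | 01 | 0 | 1 | | 10 | 0 | 0 | 00 | 00
Set less than | 000000 | 101010 | 1 | 01 | 01 | 0 | 1 | | 01 | 0 | 0 | 00 | 00
Add immediate | 001000 | | 1 | 00 | 01 | 1 | 0 | | 10 | 0 | 0 | 00 | 00
Set less than immediate | 001010 | | 1 | 00 | 01 | 1 | 1 | | 01 | 0 | 0 | 00 | 00
AND | 000000 | 100100 | 1 | 01 | 01 | 0 | | 00 | 11 | 0 | 0 | 00 | 00
OR | 000000 | 100101 | 1 | 01 | 01 | 0 | | 01 | 11 | 0 | 0 | 00 | 00
XOR | 000000 | 100110 | 1 | 01 | 01 | 0 | | 10 | 11 | 0 | 0 | 00 | 00
NOR | 000000 | 100111 | 1 | 01 | 01 | 0 | | 11 | 11 | 0 | 0 | 00 | 00
AND immediate | 001100 | | 1 | 00 | 01 | 1 | | 00 | 11 | 0 | 0 | 00 | 00
OR immediate | 001101 | | 1 | 00 | 01 | 1 | | 01 | 11 | 0 | 0 | 00 | 00
XOR immediate | 001110 | | 1 | 00 | 01 | 1 | | 10 | 11 | 0 | 0 | 00 | 00
Load word | 100011 | | 1 | 00 | 00 | 1 | 0 | | 10 | 1 | 0 | 00 | 00
Store word | 101011 | | 0 | | | 1 | 0 | | 10 | 0 | 1 | 00 | 00
Jump | 000010 | | 0 | | | | | | | 0 | 0 | | 01
Jump register | 000000 | 001000 | 0 | | | | | | | 0 | 0 | | 10
Branch on less than 0 | 000001 | | 0 | | | | | | | 0 | 0 | 11 | 00
Branch on equal | 000100 | | 0 | | | | | | | 0 | 0 | 01 | 00
Branch on not equal | 000101 | | 0 | | | | | | | 0 | 0 | 10 | 00
Jump and link | 000011 | | 1 | 10 | 10 | | | | | 0 | 0 | 00 | 01
System call | 000000 | 001100 | 0 | | | | | | | 0 | 0 | | 11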
Instruction Decoding
Fig. 13.5 Instruction decoder for MicroMIPS built of two 6-to-64 decoders.
[Figure: a 6-to-64 op decoder and a 6-to-64 fn decoder, the latter enabled by RtypeInst.
op decoder outputs: RtypeInst (0), bltzInst (1), jInst (2), jalInst (3), beqInst (4), bneInst (5), addiInst (8), sltiInst (10), andiInst (12), oriInst (13), xoriInst (14), luiInst (15), lwInst (35), swInst (43).
fn decoder outputs: jrInst (8), syscallInst (12), addInst (32), subInst (34), andInst (36), orInst (37), xorInst (38), norInst (39), sltInst (42).]
Control Signal Generation
Auxiliary signals identifying instruction classes
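$\text{arithInst} = \text{addInst} \lor \text{subInst} \lor \text{sltInst} \lor \text{addiInst} \lor \text{sltiInst}$
$\text{logicInst} = \text{andInst} \lor \text{orInst} \lor \text{xorInst} \lor \text{norInst} \lor \text{andiInst} \lor \text{oriInst} \lor \text{xoriInst}$
$\text{immInst} = \text{luiInst} \lor \text{addiInst} \lor \text{sltiInst} \lor \text{andiInst} \lor \text{oriInst} \lor \text{xoriInst}$
Example logic expressions for control signals
$\text{RegWrite} = \text{luiInst} \lor \text{arithInst} \lor \text{logicInst} \lor \text{lwInst} \lor \text{jalInst}$
$\text{ALUSrc} = \text{immInst} \lor \text{lwInst} \lor \text{swInst}$
$\text{AddSub} = \text{subInst} \lor \text{sltInst} \lor \text{sltiInst}$
$\text{DataRead} = \text{lwInst}$
$\text{PCSrc}_0 = \text{jInst} \lor \text{jalInst} \lor \text{syscallInst}$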
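One way to check these sum-of-products expressions is to evaluate them for every instruction and compare the results with the columns of Table 13.3. The sketch below does that in Python. The instruction names stand in for the decoder outputs (addInst → "add", and so on), and `control` and `column` are hypothetical helpers.

```python
ORDER = ["lui", "add", "sub", "slt", "addi", "slti", "and", "or", "xor",
         "nor", "andi", "ori", "xori", "lw", "sw", "j", "jr", "bltz",
         "beq", "bne", "jal", "syscall"]          # row order of Table 13.3
ARITH = {"add", "sub", "slt", "addi", "slti"}     # arithInst
LOGIC = {"and", "or", "xor", "nor", "andi", "ori", "xori"}   # logicInst
IMM = {"lui", "addi", "slti", "andi", "ori", "xori"}         # immInst


def control(inst: str) -> dict:
    """Evaluate the example control expressions for one decoded instruction."""
    return {
        "RegWrite": int(inst == "lui" or inst in ARITH or inst in LOGIC
                        or inst in ("lw", "jal")),
        "ALUSrc": int(inst in IMM or inst in ("lw", "sw")),
        "AddSub": int(inst in ("sub", "slt", "slti")),
        "DataRead": int(inst == "lw"),
        "PCSrc0": int(inst in ("j", "jal", "syscall")),
    }


def column(signal: str) -> str:
    """The signal's values for all 22 instructions, in table order."""
    return "".join(str(control(i)[signal]) for i in ORDER)


print(column("RegWrite"))
print(column("PCSrc0"))
```

Where Table 13.3 leaves a signal blank (don't care), the expressions are free to produce either value. Don't-cares are exactly what lets the expressions stay this short.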
Putting It All Together
[Figure: the complete single-cycle MicroMIPS. It combines the data path of Fig. 13.3, the control unit (decoder outputs addInst, subInst, jInst, sltInst, ...), the ALU of Fig. 10.19, and the next-address logic of Fig. 13.4.]
13.6 Performance of the Single-Cycle Design
Fig. 13.6 The MicroMIPS data path unfolded (by depicting the register write step as a separate block) so as to better visualize the critical-path latencies.
Instruction access 2 ns
Register read 1 ns
ALU operation 2 ns
Data cache access 2 ns
Register write 1 ns
Total 8 ns
Single-cycle clock = 125 MHz
[Figure: the data path (PC, instruction cache, register read, ALU, data cache, register write) drawn once per instruction class: ALU-type, Load, Store, Branch (and jr), and Jump (except jr & jal). The steps a class does not need are marked "Not used".]
Instruction class | Frequency | Latency
R-type | 44% | 6 ns
Load | 24% | 8 ns
Store | 12% | 7 ns
Branch | 18% | 5 ns
Jump | 2% | 3 ns
Weighted mean 6.36 ns
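The arithmetic behind the 125 MHz clock and the 6.36 ns weighted mean fits in a few lines of Python. The per-class step lists are an assumption: they were chosen so that the per-class totals come out as in the table above.

```python
STAGE_NS = {"fetch": 2, "reg_read": 1, "alu": 2, "data": 2, "reg_write": 1}
# Steps each class is assumed to use (chosen to reproduce the per-class totals)
STEPS = {
    "R-type": ["fetch", "reg_read", "alu", "reg_write"],
    "Load":   ["fetch", "reg_read", "alu", "data", "reg_write"],
    "Store":  ["fetch", "reg_read", "alu", "data"],
    "Branch": ["fetch", "reg_read", "alu"],
    "Jump":   ["fetch", "reg_read"],
}
MIX = {"R-type": 0.44, "Load": 0.24, "Store": 0.12, "Branch": 0.18, "Jump": 0.02}

latency = {cls: sum(STAGE_NS[s] for s in steps) for cls, steps in STEPS.items()}
clock_period = max(latency.values())          # one cycle must fit the slowest class
weighted_mean = sum(MIX[c] * latency[c] for c in MIX)

print(latency)                         # {'R-type': 6, 'Load': 8, 'Store': 7, 'Branch': 5, 'Jump': 3}
print(1000 / clock_period, "MHz")      # 125.0 MHz
print(round(weighted_mean, 2), "ns")   # 6.36 ns
```

The gap between the 8 ns clock period and the 6.36 ns average is the time a single-cycle design wastes on the faster instructions.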
How Good is Our Single-Cycle Design?
Clock rate of 125 MHz not impressive
How does this compare with current processors on the market?
Not bad, where latency is concerned
A 2.5 GHz processor with 20 or so pipeline stages has a latency of about
0.4 ns/cycle × 20 cycles = 8 ns
Throughput, however, is much better for the pipelined processor:
Up to 20 times better with single issue
Perhaps up to 100 times better with multiple issue
A Multicycle Implementation
Fig. Single-cycle versus multicycle instruction execution.
[Figure: with a single-cycle clock, every instruction (Instr 1 to Instr 4) is allotted the same worst-case time, more than most of them need. With a multicycle clock, each instruction takes only the cycles it needs (3, 3, 4, or 5), and the difference is time saved.]
A Multicycle Data Path
Fig. Abstract view of a multicycle instruction execution unit for MicroMIPS. For naming of instruction fields, see Fig. 13.1.
[Figure: a single cache, the register file, and the ALU, with the intermediate registers Inst Reg, Data Reg, x Reg, y Reg, and z Reg, plus the PC. The control unit receives the op and fn fields.]
Multicycle Data Path with Control Signals Shown
Fig. Key elements of the multicycle MicroMIPS data path.
Three major changes relative to the single-cycle data path:
1. Instruction & data caches combined
2. ALU performs double duty for address calculation
3. Registers added for intercycle data
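[Figure: the multicycle data path.
- Cache: the address comes from PC or z reg (InstData). MemRead and MemWrite control the cache. Its output loads Inst Reg (IRWrite) or Data Reg.
- Register file: the write address is rt, rd, or $31 (RegDst), and the write data comes via RegInSrc, with RegWrite enabling the write. Its outputs feed x Reg and y Reg.
- ALU: the x Mux (ALUSrcX: PC or x reg) and the y Mux (ALUSrcY: 4, y reg, imm, or 4 × imm) supply the operands. ALUFunc sets the function, the ALU outputs ALUZero and ALUOvfl, and the result goes to z Reg.
- PC: PCSrc and PCWrite load the PC from the jump address (jta or SysCallAddr, chosen by JumpAddr), x reg, z reg, or ALU out.]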
Clock Cycle and Control Signals
Block | Control signal | 0 | 1 | 2 | 3
Program counter | JumpAddr | jta | SysCallAddr | |
Program counter | PCSrc1, PCSrc0 | Jump addr | x reg | z reg | ALU out
Program counter | PCWrite | Don't write | Write | |
Cache | InstData | PC | z reg | |
Cache | MemRead | Don't read | Read | |
Cache | MemWrite | Don't write | Write | |
Cache | IRWrite | Don't write | Write | |
Register file | RegWrite | Don't write | Write | |
Register file | RegDst1, RegDst0 | rt | rd | $31 |
Register file | RegInSrc1, RegInSrc0 | Data reg | z reg | PC |
ALU | ALUSrcX | PC | x reg | |
ALU | ALUSrcY1, ALUSrcY0 | 4 | y reg | imm | 4 × imm
ALU | AddSub | Add | Subtract | |
ALU | LogicFn1, LogicFn0 | AND | OR | XOR | NOR
ALU | FnClass1, FnClass0 | lui | Set less | Arithmetic | Logic
Execution Cycles
Table Execution cycles for multicycle MicroMIPS
Clock cycle | Instruction | Operations | Signal settings
1 (Fetch & PC incr) | Any | Read out the instruction and write it into the instruction register, increment PC | InstData = 0, MemRead = 1, IRWrite = 1, ALUSrcX = 0, ALUSrcY = 0, ALUFunc = '+', PCSrc = 3, PCWrite = 1
2 (Decode & reg read) | Any | Read out rs & rt into x & y registers, compute branch address and save in z register | ALUSrcX = 0, ALUSrcY = 3, ALUFunc = '+'
3 (ALU oper & PC update) | ALU type | Perform ALU operation and save the result in z register | ALUSrcX = 1, ALUSrcY = 1 or 2, ALUFunc: varies
3 | Load/Store | Add base and offset values, save in z register | ALUSrcX = 1, ALUSrcY = 2, ALUFunc = '+'
3 | Branch | If (x reg) = (y reg) for beq, ≠ for bne, or < for bltz, set PC to the branch target address | ALUSrcX = 1, ALUSrcY = 1, ALUFunc = '−', PCSrc = 2, PCWrite = ALUZero (beq), ALUZero′ (bne), or ALUOut31 (bltz)
3 | Jump | Set PC to the target address jta, SysCallAddr, or (rs) | JumpAddr = 0 or 1, PCSrc = 0 or 1, PCWrite = 1
4 (Reg write or mem access) | ALU type | Write back z reg into rd (rt for immediate instructions) | RegDst = 1 (0 for immediate instructions), RegInSrc = 1, RegWrite = 1
4 | Load | Read memory into data reg | InstData = 1, MemRead = 1
4 | Store | Copy y reg into memory | InstData = 1, MemWrite = 1
5 (Reg write for lw) | Load | Copy data register into rt | RegDst = 0, RegInSrc = 0, RegWrite = 1
The Control State Machine
Fig. The control state machine for multicycle MicroMIPS.
[Figure: the control state machine; its states and transitions are listed below.]
Start → State 0 (cycle 1): InstData = 0, MemRead = 1, IRWrite = 1, ALUSrcX = 0, ALUSrcY = 0, ALUFunc = '+', PCSrc = 3, PCWrite = 1. Next: State 1.
State 1 (cycle 2, speculative calculation of branch address): ALUSrcX = 0, ALUSrcY = 3, ALUFunc = '+'. Branches based on instruction: ALU-type → State 7, lw/sw → State 2, jump/branch → State 5.
State 5 (cycle 3, jump/branch): ALUSrcX = 1, ALUSrcY = 1, ALUFunc = '−', JumpAddr = %, PCSrc = @, PCWrite = #. Next: State 0.
State 7 (cycle 3, ALU-type): ALUSrcX = 1, ALUSrcY = 1 or 2, ALUFunc = varies. Next: State 8.
State 8 (cycle 4): RegDst = 0 or 1, RegInSrc = 1, RegWrite = 1. Next: State 0.
State 2 (cycle 3, lw/sw): ALUSrcX = 1, ALUSrcY = 2, ALUFunc = '+'. Next: State 3 for lw, State 6 for sw.
State 3 (cycle 4, lw): InstData = 1, MemRead = 1. Next: State 4.
State 4 (cycle 5, lw): RegDst = 0, RegInSrc = 0, RegWrite = 1. Next: State 0.
State 6 (cycle 4, sw): InstData = 1, MemWrite = 1. Next: State 0.
Notes for State 5:
% = 0 for j or jal, 1 for syscall, don't-care for other instructions
@ = 0 for j, jal, and syscall; 1 for jr; 2 for branches
# = 1 for j, jr, jal, and syscall; ALUZero (ALUZero′) for beq (bne); bit 31 of ALU out for bltz
For jal: RegDst = 2, RegInSrc = 2 (PC), RegWrite = 1
Note for State 7: ALUFunc is determined based on the op and fn fields.
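The state transitions above can be written as a next-state function, and tracing one instruction through it gives its cycle count. This is a minimal Python sketch: `next_state` and `trace` are hypothetical names, and the exception states added later are not modeled.

```python
JUMP_BRANCH = {"j", "jr", "jal", "syscall", "bltz", "beq", "bne"}


def next_state(state: int, inst: str) -> int:
    """Control state machine transitions; returning 0 means back to fetch."""
    if state == 0:
        return 1
    if state == 1:                      # decode: multiway branch on instruction
        if inst in ("lw", "sw"):
            return 2
        if inst in JUMP_BRANCH:
            return 5
        return 7                        # ALU-type
    if state == 2:
        return 3 if inst == "lw" else 6
    if state == 3:
        return 4
    if state == 7:
        return 8
    return 0                            # states 4, 5, 6, 8 end the instruction


def trace(inst: str) -> list:
    """States visited by one instruction, starting from state 0."""
    states = [0]
    while (nxt := next_state(states[-1], inst)) != 0:
        states.append(nxt)
    return states


for inst in ("add", "lw", "sw", "beq", "j"):
    print(inst, trace(inst), len(trace(inst)), "cycles")
```

The traces give 4 cycles for ALU-type instructions and stores, 5 for loads, and 3 for branches and jumps. These are the per-class cycle counts used in the performance analysis of the multicycle design.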
State and Instruction Decoding
Fig. State and instruction decoders for multicycle MicroMIPS.
[Figure: op and fn decoders as in Fig. 13.5, plus a 4-bit state decoder driven by st.
op decoder outputs: RtypeInst, bltzInst, jInst, jalInst, beqInst, bneInst, addiInst, sltiInst, andiInst, oriInst, xoriInst, luiInst, lwInst, swInst.
fn decoder outputs: jrInst, syscallInst, addInst, subInst, andInst, orInst, xorInst, norInst, sltInst.
State decoder outputs: ControlSt0 through ControlSt8.]
Control Signal Generation
Certain control signals depend only on the control state
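$\text{ALUSrcX} = \text{ControlSt2} \lor \text{ControlSt5} \lor \text{ControlSt7}$
$\text{RegWrite} = \text{ControlSt4} \lor \text{ControlSt8}$
Auxiliary signals identifying instruction classes
$\text{addsubInst} = \text{addInst} \lor \text{subInst} \lor \text{addiInst}$
$\text{logicInst} = \text{andInst} \lor \text{orInst} \lor \text{xorInst} \lor \text{norInst} \lor \text{andiInst} \lor \text{oriInst} \lor \text{xoriInst}$
Logic expressions for ALU control signals
$\text{AddSub} = \text{ControlSt5} \lor (\text{ControlSt7} \land (\text{subInst} \lor \text{sltInst} \lor \text{sltiInst}))$ (set less than needs a subtraction, as in Table 13.3)
$\text{FnClass}_1 = \overline{\text{ControlSt7}} \lor \text{addsubInst} \lor \text{logicInst}$
$\text{FnClass}_0 = \text{ControlSt7} \land (\text{logicInst} \lor \text{sltInst} \lor \text{sltiInst})$
$\text{LogicFn}_1 = \text{ControlSt7} \land (\text{xorInst} \lor \text{xoriInst} \lor \text{norInst})$
$\text{LogicFn}_0 = \text{ControlSt7} \land (\text{orInst} \lor \text{oriInst} \lor \text{norInst})$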
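Evaluating the ALU control expressions state by state shows why the complemented term in FnClass1 is there. Outside state 7, the ALU always performs arithmetic: '+' in states 0, 1 and 2, and '−' in state 5. This Python sketch uses a hypothetical `alu_controls` helper. It assumes slt and slti subtract in state 7 (AddSub = 1), as they do in Table 13.3.

```python
ADDSUB = {"add", "sub", "addi"}                              # addsubInst
LOGIC = {"and", "or", "xor", "nor", "andi", "ori", "xori"}   # logicInst


def alu_controls(state: int, inst: str) -> dict:
    """ALU control signals for a given control state and instruction."""
    st5, st7 = state == 5, state == 7
    add_sub = st5 or (st7 and inst in ("sub", "slt", "slti"))  # assumption: slt/slti subtract
    fn_class1 = (not st7) or inst in ADDSUB or inst in LOGIC
    fn_class0 = st7 and (inst in LOGIC or inst in ("slt", "slti"))
    logic_fn1 = st7 and inst in ("xor", "xori", "nor")
    logic_fn0 = st7 and inst in ("or", "ori", "nor")
    return {"AddSub": int(add_sub),
            "FnClass": 2 * int(fn_class1) + int(fn_class0),
            "LogicFn": 2 * int(logic_fn1) + int(logic_fn0)}


print(alu_controls(0, "lw"))    # fetch: '+' (FnClass 2 = Arithmetic)
print(alu_controls(7, "slt"))   # set less, subtract
```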
Performance of the Multicycle Design
Fig. The MicroMIPS data path unfolded (by depicting the register write step as a separate block) so as to better visualize the critical-path latencies.
[Figure: the multicycle data path unfolded for ALU-type, Load, Store, Branch (and jr), and Jump (except jr & jal) instructions, with the steps each class does not use marked "Not used".]
Instruction class | Frequency | Cycles | Contribution to CPI
R-type | 44% | 4 | 0.44 × 4 = 1.76
Load | 24% | 5 | 0.24 × 5 = 1.20
Store | 12% | 4 | 0.12 × 4 = 0.48
Branch | 18% | 3 | 0.18 × 3 = 0.54
Jump | 2% | 3 | 0.02 × 3 = 0.06
Average CPI ≈ 4.04
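The CPI calculation above, together with the resulting instruction rate at the 500 MHz (2 ns) multicycle clock, can be reproduced in a few lines of Python:

```python
MIX = {"R-type": (0.44, 4), "Load": (0.24, 5), "Store": (0.12, 4),
       "Branch": (0.18, 3), "Jump": (0.02, 3)}   # (fraction, cycles)
CLOCK_MHZ = 500                                  # 2 ns cycle time

cpi = sum(frac * cycles for frac, cycles in MIX.values())
mips = CLOCK_MHZ / cpi                           # millions of instructions per second
print(round(cpi, 2), round(mips, 1))             # 4.04 123.8
```

A four times faster clock is almost exactly cancelled by a CPI of about 4. That is why the multicycle design ends up near the 125 MIPS of the single-cycle one.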
How Good is Our Multicycle Design?
Clock rate of 500 MHz better than 125 MHz of single-cycle design, but still unimpressive
How does the performance compare with current processors on the market?
Not bad, where latency is concerned
A 2.5 GHz processor with 20 or so pipeline stages has a latency of about 0.4 ns × 20 = 8 ns
Throughput, however, is much better for the pipelined processor:
Up to 20 times better with single issue
Perhaps up to 100 times better with multiple issue
Cycle time = 2 ns, clock rate = 500 MHz
Microprogramming
The control state machine resembles a program (microprogram)
Microinstruction
Fig. Possible 22-bit microinstruction format for MicroMIPS.
[Figure: microinstruction fields:
- PC control: JumpAddr, PCSrc, PCWrite
- Cache control: InstData, MemRead, MemWrite, IRWrite
- Register control: RegInSrc, RegDst, RegWrite
- ALU inputs: ALUSrcX, ALUSrcY
- ALU function: AddSub, LogicFn, FnType
- Sequence control: 2 bits]
Symbolic Names for Microinstruction Field Values
Table Microinstruction field values and their symbolic names. The default value for each unspecified field is the all 0s bit pattern.
Field name | Possible field values and their symbolic names
PC control | 0001 PCjump; 1001 PCsyscall; x011 PCjreg; x101 PCbranch; x111 PCnext
Cache control | 0101 CacheFetch; 1010 CacheStore; 1100 CacheLoad
Register control | 1000 rt ← Data; 1001 rt ← z; 1011 rd ← z; 1101 $31 ← PC (the slide also gives the 5-bit values 10000, 10001, 10101, 11010)
ALU inputs* | 000 PC ⊗ 4; 011 PC ⊗ 4imm; 101 x ⊗ y; 110 x ⊗ imm; x10 (imm)
ALU function* | 0xx10 +; 1xx01 <; 1xx10 −; x0011 ∧; x0111 ∨; x1011 ⊕; x1111 ~∨; xxx00 lui
Seq. control | 01 µPCdisp1; 10 µPCdisp2; 11 µPCfetch
* The operator symbol ⊗ stands for any of the ALU functions defined above (except for "lui").
Control Unit for Microprogramming
Fig. Microprogrammed control unit for MicroMIPS .
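[Figure: the microprogram memory (or PLA) is addressed by MicroPC. It loads the microinstruction register, which drives the control signals to the data path. The sequence control field selects the next MicroPC through a 4-input multiplexer:
0 = incremented MicroPC
1 = dispatch table 1
2 = dispatch table 2
3 = the constant 0 (the fetch routine)
The dispatch tables are indexed by op (from the instruction register) and have 64 entries each. They implement a multiway branch to routines such as fetch and andi.]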
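The microsequencer can be sketched as a loop that picks the next microinstruction address from the sequence control field. In this Python sketch the microprogram layout (addresses 0 to 6) and the handful of dispatch-table entries are hypothetical. Only lw (op 35), sw (op 43) and j (op 2) are included, so the example stays short.

```python
# Hypothetical microprogram layout: (label, sequence-control value).
# Seq control: 0 = next address, 1 = µPCdisp1, 2 = µPCdisp2, 3 = µPCfetch.
MICROCODE = [
    ("fetch", 0),     # 0: PCnext, CacheFetch, PC+4
    ("fetch+1", 1),   # 1: PC+4imm, µPCdisp1
    ("lwsw1", 2),     # 2: x+imm, µPCdisp2
    ("lw2", 0),       # 3: CacheLoad
    ("lw2+1", 3),     # 4: rt <- Data, µPCfetch
    ("sw2", 3),       # 5: CacheStore, µPCfetch
    ("j1", 3),        # 6: PCjump, µPCfetch
]
DISPATCH1 = {35: 2, 43: 2, 2: 6}   # op -> address (a few of the 64 entries)
DISPATCH2 = {35: 3, 43: 5}


def run(op: int) -> list:
    """Return the labels of the microinstructions executed for one instruction."""
    upc, executed = 0, []
    while True:
        label, seq = MICROCODE[upc]
        executed.append(label)
        if seq == 0:
            upc += 1
        elif seq == 1:
            upc = DISPATCH1[op]
        elif seq == 2:
            upc = DISPATCH2[op]
        else:
            return executed          # µPCfetch: the next instruction starts at fetch


print(run(35))   # ['fetch', 'fetch+1', 'lwsw1', 'lw2', 'lw2+1']
```

Each microinstruction executed corresponds to one state of the control state machine, so lw again takes 5 cycles.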
Microprogram for MicroMIPS
Fig. The complete MicroMIPS microprogram.
fetch:    PCnext, CacheFetch, PC + 4        # State 0 (start)
          PC + 4imm, µPCdisp1               # State 1
lui1:     lui(imm)                          # State 7lui
          rt ← z, µPCfetch                  # State 8lui
add1:     x + y                             # State 7add
          rd ← z, µPCfetch                  # State 8add
sub1:     x − y                             # State 7sub
          rd ← z, µPCfetch                  # State 8sub
slt1:     x − y                             # State 7slt
          rd ← z, µPCfetch                  # State 8slt
addi1:    x + imm                           # State 7addi
          rt ← z, µPCfetch                  # State 8addi
slti1:    x − imm                           # State 7slti
          rt ← z, µPCfetch                  # State 8slti
and1:     x ∧ y                             # State 7and
          rd ← z, µPCfetch                  # State 8and
or1:      x ∨ y                             # State 7or
          rd ← z, µPCfetch                  # State 8or
xor1:     x ⊕ y                             # State 7xor
          rd ← z, µPCfetch                  # State 8xor
nor1:     x ~∨ y                            # State 7nor
          rd ← z, µPCfetch                  # State 8nor
andi1:    x ∧ imm                           # State 7andi
          rt ← z, µPCfetch                  # State 8andi
ori1:     x ∨ imm                           # State 7ori
          rt ← z, µPCfetch                  # State 8ori
xori1:    x ⊕ imm                           # State 7xori
          rt ← z, µPCfetch                  # State 8xori
lwsw1:    x + imm, µPCdisp2                 # State 2
lw2:      CacheLoad                         # State 3
          rt ← Data, µPCfetch               # State 4
sw2:      CacheStore, µPCfetch              # State 6
j1:       PCjump, µPCfetch                  # State 5j
jr1:      PCjreg, µPCfetch                  # State 5jr
branch1:  PCbranch, µPCfetch                # State 5branch
jal1:     PCjump, $31 ← PC, µPCfetch        # State 5jal
syscall1: PCsyscall, µPCfetch               # State 5syscall
Exception Handling
Exceptions and interrupts alter the normal program flow
Examples of exceptions (things that can go wrong):
ALU operation leads to overflow (incorrect result is obtained)
Opcode field holds a pattern not representing a legal operation
Cache error-code checker deems an accessed word invalid
Sensor signals a hazardous condition (e.g., overheating)
Exception handler is an OS program that takes care of the problem
Derives correct result of overflowing computation, if possible
Invalid operation may be a software-implemented instruction
Interrupts are similar, but usually have external causes (e.g., I/O)
Exception Control States
Fig. Exception states 9 and 10 added to the control state machine.
[Figure: the control state machine with states 0 to 8 as before, plus two exception states.]
State 9: IntCause = 1, CauseWrite = 1, ALUSrcX = 0, ALUSrcY = 0, ALUFunc = '−', EPCWrite = 1, JumpAddr = 1, PCSrc = 0, PCWrite = 1
State 10: IntCause = 0, CauseWrite = 1, ALUSrcX = 0, ALUSrcY = 0, ALUFunc = '−', EPCWrite = 1, JumpAddr = 1, PCSrc = 0, PCWrite = 1
The transitions into these states are labeled Illegal operation and Overflow.
In both states the ALU computes PC − 4 (PC was already incremented in state 0). This is the address of the instruction that caused the exception, which EPCWrite saves in EPC. JumpAddr = 1 with PCSrc = 0 then loads SysCallAddr into PC.
Single-Cycle Data Path
Fig. Key elements of the single-cycle MicroMIPS data path.
[Figure: the single-cycle data path, as in Fig. 13.3.]
Clock rate = 125 MHz, CPI = 1 (125 MIPS)
Multicycle Data Path
Fig. Key elements of the multicycle MicroMIPS data path.
Clock rate = 500 MHz
CPI ≈ 4 (≈ 125 MIPS)
[Figure: the multicycle data path with its control signals, as shown earlier.]
Getting the Best of Both Worlds
Single-cycle: clock rate = 125 MHz, CPI = 1
Multicycle: clock rate = 500 MHz, CPI ≈ 4
Pipelined: clock rate = 500 MHz, CPI ≈ 1
Pipelining Concepts
Fig. Pipelining in the student registration process.
[Figure: students go from "Start here" through five stations in order (1 Approval, 2 Cashier, 3 Registrar, 4 ID photo, 5 Pickup) to "Exit".]
Strategies for improving performance
1 – Use multiple independent data paths accepting several instructions that are read out at once: multiple-instruction-issue or superscalar
2 – Overlap execution of several instructions, starting the next instruction before the previous one has run to completion: (super)pipelined
Pipelined Instruction Execution
Fig. Pipelining in the MicroMIPS instruction execution process.
[Figure: five instructions (Instr 1 to Instr 5), each passing through instruction cache, register file read, ALU, data cache, and register file write. Each instruction starts one cycle after the previous one, over cycles 1 to 9. The time dimension is horizontal and the task dimension vertical.]
Alternate Representations of a Pipeline
Fig. Two abstract graphical representations of a 5-stage pipeline executing 7 tasks (instructions).
[Figure: (a) a task-time diagram (instruction vs. cycle) and (b) a space-time diagram (pipeline stage vs. cycle) for 7 instructions over cycles 1 to 11, with the start-up and drainage regions marked.]
f = Fetch, r = Reg read, a = ALU op, d = Data access, w = Writeback
Except for start-up and drainage overheads, a pipeline can execute one instruction per clock tick; IPS is dictated by the clock frequency
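The task-time diagram can be generated directly. With q stages, n instructions need q + n − 1 cycles (11 for the 7 instructions and 5 stages shown), so the rate approaches one instruction per cycle as n grows. The sketch below is in Python; `task_time` is a hypothetical helper, and "." marks a cycle in which an instruction is not in the pipeline.

```python
def task_time(n_tasks: int, stages: str = "fradw") -> list:
    """Rows of a task-time diagram; task i enters the pipeline in cycle i + 1."""
    q = len(stages)
    total = q + n_tasks - 1               # includes start-up and drainage
    return ["." * i + stages + "." * (total - q - i) for i in range(n_tasks)]


rows = task_time(7)
print("\n".join(rows))                    # first row: fradw......  last row: ......fradw
```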
Pipelining Example in a Photocopier
Example
A photocopier with an x-sheet document feeder copies the first sheet in 4 s and each subsequent sheet in 1 s. The copier's paper path is a 4-stage pipeline with each stage having a latency of 1 s. The first sheet goes through all 4 pipeline stages and emerges after 4 s. Each subsequent sheet emerges 1 s after the previous sheet. How does the throughput of this photocopier vary with x, assuming that loading the document feeder and removing the copies takes 15 s?
Solution
Each batch of x sheets is copied in 15 + 4 + (x − 1) = 18 + x seconds, so the throughput is x / (18 + x) sheets per second. A nonpipelined copier would require 4x seconds to copy x sheets. Since 4x > 18 + x exactly when x > 6, the pipelined version has a performance edge for x > 6. When x = 50, the pipelining speedup is (4 × 50) / (18 + 50) = 2.94.
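The photocopier solution can be checked numerically with the Python sketch below. Like the solution, it charges the 15 s loading time only to the pipelined copier.

```python
def pipelined_time(x: int) -> int:
    return 15 + 4 + (x - 1)      # load/unload + first sheet + 1 s per extra sheet


def nonpipelined_time(x: int) -> int:
    return 4 * x                 # 4 s per sheet, no overlap (as in the solution)


break_even = min(x for x in range(1, 100) if nonpipelined_time(x) > pipelined_time(x))
speedup_50 = nonpipelined_time(50) / pipelined_time(50)
throughput = {x: round(x / pipelined_time(x), 3) for x in (1, 10, 50, 100)}  # sheets/s

print(break_even, round(speedup_50, 2))   # 7 2.94
print(throughput)
```

The throughput approaches 1 sheet per second (one per pipeline stage time) only as x grows large compared with the 18 s fixed overhead.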
Pipeline Stalls or Bubbles
Fig. Read-after-write data dependency and its possible resolution through data forwarding .
[Figure: the instructions $5 = $6 + $7; $8 = $8 + $6; $9 = $8 + $2; sw $9, 0($3) in the five-stage pipeline over cycles 1 to 8. Data forwarding paths resolve the read-after-write dependencies between them.]
First type of data dependency
Inserting Bubbles in a Pipeline
Without data forwarding, three bubbles are needed to resolve a read-after-write data dependency
[Figure: five instructions over cycles 1 to 9. The instruction that writes into $8 is followed by three bubbles before the instruction that reads from $8.]
[Figure: the same sequence with only two bubbles between the write into $8 and the read from $8.]
Two bubbles, if we assume that a register can be updated and read from in one cycle
Second Type of Data Dependency
Fig. Read-after-load data dependency and its possible resolution through bubble insertion and data forwarding.
[Figure: sw $6, . . . ; lw $8, . . . ; $9 = $8 + $2 in the pipeline, annotated with the options "Insert bubble?" and "Reorder?".]
Without data forwarding, three (two) bubbles are needed to resolve a read-after-load data dependency
Control Dependency in a Pipeline
Fig. Control dependency due to conditional branch.
[Figure: $6 = $3 + $5; beq $1, $2, . . . ; $9 = $8 + $2 in the pipeline, annotated with the options "Insert bubble?" and "Reorder? (delayed branch)". The figure marks the stage where the branch is assumed to be resolved; resolving it in a later stage would need 1-2 more bubbles.]
Pipeline Timing and Performance
Fig. Pipelined form of a function unit with latching overhead.
[Figure: a function unit of latency t divided into q stages (Stage 1 to Stage q), each of latency t/q, with latching of results after each stage.]
Fig. Throughput improvement due to pipelining as a function of the number of pipeline stages for different pipelining overheads.
Throughput Increase in a q-Stage Pipeline
[Plot: throughput improvement factor (1 to 8) versus the number q of pipeline stages (1 to 8), for τ/t = 0 (ideal), 0.05, and 0.1.]
With τ the latching overhead per stage, the throughput improvement factor is
$$\frac{t}{t/q + \tau} = \frac{q}{1 + q\tau/t}$$
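The curves in the plot can be reproduced from the improvement formula. `improvement` is a hypothetical helper, and `overhead_ratio` stands for τ/t.

```python
def improvement(q: int, overhead_ratio: float) -> float:
    """Throughput improvement of a q-stage pipeline; overhead_ratio = tau / t."""
    return q / (1 + q * overhead_ratio)


for ratio in (0.0, 0.05, 0.1):
    print(ratio, [round(improvement(q, ratio), 2) for q in range(1, 9)])
```

With τ/t = 0.1, eight stages yield only about 4.4 times the throughput instead of 8. As q grows, the improvement approaches the limit t/τ.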
Pipeline Throughput with Dependencies
Assume that one bubble must be inserted due to a read-after-load dependency, and after a branch when its delay slot cannot be filled. Let β be the fraction of all instructions that are followed by a bubble.
$$\text{Pipeline speedup} = \frac{q}{(1 + q\tau/t)(1 + \beta)}$$
R-type 44%, Load 24%, Store 12%, Branch 18%, Jump 2%
Example
Calculate the effective CPI for MicroMIPS, assuming that a quarter of branch and load instructions are followed by bubbles.
Solution
Fraction of bubbles β = 0.25(0.24 + 0.18) = 0.105
CPI = 1 + β = 1.105 (which is very close to the ideal value of 1)
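The example combines with the speedup formula as follows. The Python sketch below takes β from the instruction mix. The τ/t = 0.05 and q = 5 used in the last line are illustrative assumptions, not values from the slides.

```python
def effective_cpi(beta: float) -> float:
    return 1 + beta


def pipeline_speedup(q: int, overhead_ratio: float, beta: float) -> float:
    """q / ((1 + q * tau/t) * (1 + beta))"""
    return q / ((1 + q * overhead_ratio) * (1 + beta))


beta = 0.25 * (0.24 + 0.18)   # a quarter of loads and branches followed by a bubble
print(round(beta, 3), round(effective_cpi(beta), 3))   # 0.105 1.105
print(round(pipeline_speedup(5, 0.05, beta), 2))       # illustrative q = 5, tau/t = 0.05
```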
Pipelined Data Path Design
Fig. Key elements of the pipelined MicroMIPS data path.
[Figure: the data path divided into Stage 1 to Stage 5 by pipeline registers: instruction fetch (PC, instruction cache, next-address logic), register read and decode, ALU, data cache, and register writeback. Signals such as IncrPC, SeqInst, Br&Jump, NextPC, and RetAddr are carried between stages.]
Pipelined Control
Fig. Pipelined control signals.
[Figure: the pipelined data path. The control signals (ALUSrc, ALUFunc, DataRead, DataWrite, RegInSrc, RegDst, RegWrite) are generated in stage 2 and carried forward through the pipeline registers to the stages that use them.]
Optimal Pipelining
Fig. Higher-throughput pipelined data path for MicroMIPS and the execution of consecutive instructions in it.
[Figure: the pipeline stages Instruction fetch, Register readout, ALU operation, Data read/store, and Register writeback, with consecutive instructions overlapping in them.]
MicroMIPS pipeline with more than four-fold improvement
Optimal Number of Pipeline Stages
Fig. Pipelined form of a function unit with latching overhead.
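[Figure: a function unit of latency t sliced into q stages of latency t/q each, with latching of results.]
Assumptions:
Pipeline sliced into q stages
Stage overhead is τ
q/2 bubbles per branch (decision made midway)
Fraction b of all instructions are taken branches
Derivation of $q_{\text{opt}}$
$$\text{Average CPI} = 1 + \frac{bq}{2}$$
$$\text{Throughput} = \frac{\text{Clock rate}}{\text{CPI}} = \left[\left(\frac{t}{q} + \tau\right)\left(1 + \frac{bq}{2}\right)\right]^{-1}$$
Maximizing throughput means minimizing $\left(\frac{t}{q} + \tau\right)\left(1 + \frac{bq}{2}\right) = \frac{t}{q} + \tau + \frac{bt}{2} + \frac{b\tau q}{2}$. Differentiating with respect to q and equating with 0 gives $-\frac{t}{q^2} + \frac{b\tau}{2} = 0$, so
$$q_{\text{opt}} = \left[\frac{2(t/\tau)}{b}\right]^{1/2}$$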
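A brute-force search over integer q should land next to the closed-form $q_{\text{opt}}$, and the Python sketch below checks this. The values t = 8 ns, τ = 0.5 ns and b = 0.2 are illustrative assumptions, not numbers from the slides.

```python
import math


def q_opt(t: float, tau: float, b: float) -> float:
    return math.sqrt(2 * (t / tau) / b)


def throughput(q: int, t: float, tau: float, b: float) -> float:
    """Instructions per unit time: clock rate 1/(t/q + tau) divided by CPI."""
    return 1.0 / ((t / q + tau) * (1 + b * q / 2))


t, tau, b = 8.0, 0.5, 0.2          # illustrative values only
best_q = max(range(1, 51), key=lambda q: throughput(q, t, tau, b))
print(round(q_opt(t, tau, b), 2), best_q)   # 12.65 13
```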
Data Dependencies and Hazards
Fig. Data dependency in a pipeline.
[Figure: $2 = $1 - $3 followed by instructions that read register $2, over cycles 1 to 9.]
Resolving Data Dependencies via Forwarding
Fig. When a previous instruction writes back a value computed by the ALU into a register, the data dependency can always be resolved through forwarding.
[Figure: $2 = $1 - $3 followed by instructions that read register $2, with the new value of $2 forwarded from later pipeline stages.]
Certain Data Dependencies Lead to Bubbles
Fig. When the immediately preceding instruction writes a value read out from the data memory into a register, the data dependency cannot be resolved through forwarding (i.e., we cannot go back in time) and a bubble must be inserted in the pipeline.
[Figure: lw $2,4($12) followed by instructions that read register $2.]
Data Forwarding
Fig. Forwarding unit for the pipelined MicroMIPS data path.
[Figure: upper and lower forwarding units in stage 2 select the ALU operands among x2 and y2 (read from the register file), x3 and y3 (stage 3), and x4 and y4 (stage 4). The selection depends on the source registers s2 and t2, the destination registers d3 and d4, and the signals RetAddr3, RegWrite3, RegWrite4, RegInSrc3, and RegInSrc4.]
Design of the Data Forwarding Units
Fig. Forwarding unit for the pipelined MicroMIPS data path.
Table. Partial truth table for the upper forwarding unit in the pipelined MicroMIPS data path.
RegWrite3  RegWrite4  s2matchesd3  s2matchesd4  RetAddr3  RegInSrc3  RegInSrc4  Choose
0          0          x            x            x         x          x          x2
0          1          x            0            x         x          x          x2
0          1          x            1            x         x          0          x4
0          1          x            1            x         x          1          y4
1          0          1            x            0         1          x          x3
1          0          1            x            1         1          x          y3
1          1          1            1            0         1          x          x3
Let’s focus on designing the upper data forwarding unit
Incorrect in textbook
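The partial truth table can be exercised with a small lookup in which each row is matched against the inputs, treating x as don't care. The sketch reproduces the rows exactly as printed here; the slide flags the textbook version as incorrect, so treat the table itself with care. `UPPER_TABLE` and `choose_upper` are hypothetical names, and input combinations that the partial table does not cover return None.

```python
from typing import List, Optional, Tuple

# Rows of the partial truth table, in this input order:
# RegWrite3, RegWrite4, s2matchesd3, s2matchesd4, RetAddr3, RegInSrc3, RegInSrc4
# "x" means don't care.
UPPER_TABLE: List[Tuple[str, str]] = [
    ("00xxxxx", "x2"),
    ("01x0xxx", "x2"),
    ("01x1xx0", "x4"),
    ("01x1xx1", "y4"),
    ("101x01x", "x3"),
    ("101x11x", "y3"),
    ("111101x", "x3"),
]


def choose_upper(*signals: int) -> Optional[str]:
    """Return the ALU input selected by the first matching row, or None."""
    if len(signals) != 7:
        raise ValueError("expected 7 control/match signals")
    bits = "".join(str(s) for s in signals)
    for pattern, choice in UPPER_TABLE:
        if all(p == "x" or p == b for p, b in zip(pattern, bits)):
            return choice
    return None


print(choose_upper(1, 0, 1, 0, 1, 1, 0))  # y3
```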
Hardware for Inserting Bubbles
Fig. Data hazard detector for the pipelined MicroMIPS data path.
[Figure: stages 1 to 3 (PC, instr cache, inst reg, reg file with rs and rt inputs); the data hazard detector uses rs, rt, t2, and DataRead2 to drive LoadPC and a Bubble signal, which selects either the control signals from the decoder or all-0s]
Pipeline Branch Hazards
Software-based solutions
Compiler inserts a “no-op” after every branch (simple, but wasteful)
Branch is redefined to take effect after the instruction that follows it
Branch delay slot(s) are filled with useful instructions via reordering
Hardware-based solutions
Mechanism similar to data hazard detector to flush the pipeline
Constitutes a rudimentary form of branch prediction: always predict that the branch is not taken, flush if mistaken
More elaborate branch prediction strategies possible
Branch Prediction
Predicting whether a branch will be taken
Always predict that the branch will not be taken
Use program context to decide (backward branch is likely taken, forward branch is likely not taken)
Allow programmer or compiler to supply clues
Decide based on past history (maintain a small history table); to be discussed later
Apply a combination of factors: modern processors use elaborate techniques due to deep pipelines
A Simple Branch Prediction Algorithm
Fig. Four-state branch prediction scheme.
[Figure: state diagram with four states (Predict taken, Predict taken again, Predict not taken, Predict not taken again) and transitions labeled Taken / Not taken]
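Example
Impact of different branch prediction schemes on the following loop nest (outer loop: 10 iterations; inner loop: 20 iterations):
L1: ----
    ----
L2: ----
    ----
    br <c2> L2      (inner loop, 20 iterations)
    ----
    br <c1> L1      (outer loop, 10 iterations)
Solution
The two branches execute 10 × 20 + 10 = 210 times in total.
Always taken: 11 mispredictions (the exit of each of the 10 inner loops plus the exit of the outer loop), 94.8% accurate
1-bit history: 20 mispredictions (in addition, the inner branch mispredicts on re-entering the inner loop in 9 of the 10 outer iterations), 90.5% accurate
2-bit history: same as always taken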
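The solution can be reproduced by simulating the branch outcomes of the loop nest. The sketch assumes each branch starts out predicting "taken", and it models the four-state scheme as a 2-bit saturating counter (states 3 and 2 predict taken, 1 and 0 predict not taken). The exact transitions of the figure could not be recovered from this transcript, but any four-state scheme that needs two consecutive mispredictions to change its prediction gives the same count here. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, bool]  # (branch name, taken?)


def nested_loop_trace(outer: int = 10, inner: int = 20) -> List[Event]:
    """Branch outcomes in execution order for the example loop nest."""
    events: List[Event] = []
    for i in range(outer):
        for j in range(inner):
            events.append(("c2", j < inner - 1))   # br <c2> L2 falls through on the last pass
        events.append(("c1", i < outer - 1))       # br <c1> L1 falls through on the last pass
    return events


def always_taken(events: List[Event]) -> int:
    return sum(1 for _, taken in events if not taken)


def one_bit(events: List[Event]) -> int:
    last: Dict[str, bool] = {}                     # predict the previous outcome
    misses = 0
    for name, taken in events:
        if last.get(name, True) != taken:          # assumed initial prediction: taken
            misses += 1
        last[name] = taken
    return misses


def two_bit(events: List[Event]) -> int:
    counter: Dict[str, int] = {}                   # saturating counter 0..3
    misses = 0
    for name, taken in events:
        c = counter.get(name, 3)                   # assumed initial state: strongly taken
        if (c >= 2) != taken:
            misses += 1
        counter[name] = min(c + 1, 3) if taken else max(c - 1, 0)
    return misses


events = nested_loop_trace()
for scheme in (always_taken, one_bit, two_bit):
    m = scheme(events)
    print(f"{scheme.__name__}: {m} mispredictions, {100 * (len(events) - m) / len(events):.1f}% accurate")
```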
Hardware Implementation of Branch Prediction
Fig. Hardware elements for a branch prediction scheme.
The mapping scheme used to go from PC contents to a table entry is the same as that used in direct-mapped caches (Chapter 18)
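[Figure: the low-order bits of the PC index a table holding the addresses of recent branch instructions, their target addresses, and history bit(s); the read-out address is compared with the PC, and logic selects either the incremented PC or the stored target address as the next PC]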
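A software model of this table makes the lookup concrete. The sketch assumes word-aligned PCs (so the two lowest address bits are dropped before indexing), a 16-entry direct-mapped table, and a single history bit per entry; the class and field names are hypothetical, and a real design would size and update the table differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    branch_addr: Optional[int] = None   # address of a recent branch instruction
    target: int = 0                     # its target address
    taken: bool = False                 # one history bit: outcome of its last execution


class BranchHistoryTable:
    """Direct-mapped table indexed by low-order PC bits (hypothetical sketch)."""

    def __init__(self, index_bits: int = 4) -> None:
        self.mask = (1 << index_bits) - 1
        self.entries: List[Entry] = [Entry() for _ in range(1 << index_bits)]

    def _index(self, pc: int) -> int:
        return (pc >> 2) & self.mask          # drop the 2 byte-offset bits of a word address

    def next_pc(self, pc: int) -> int:
        e = self.entries[self._index(pc)]
        if e.branch_addr == pc and e.taken:   # address matches and history says taken
            return e.target
        return pc + 4                         # otherwise use the incremented PC

    def update(self, pc: int, taken: bool, target: int) -> None:
        self.entries[self._index(pc)] = Entry(pc, target, taken)


bht = BranchHistoryTable()
bht.update(0x400, True, 0x300)
print(hex(bht.next_pc(0x400)), hex(bht.next_pc(0x440)))  # 0x300 0x444
```

Addresses 0x400 and 0x440 map to the same entry, which is why the stored branch address must be compared, exactly as the tag is compared in a direct-mapped cache.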
Advanced Pipelining
Fig. Dynamic instruction pipeline with in-order issue, possible out-of-order completion, and in-order retirement.
Deep pipeline = superpipeline; also, superpipelined, superpipelining
Parallel instruction issue = superscalar, j-way issue (2-4 is typical)
[Figure: pipeline stages 1 to q: instruction fetch (instr cache), instr decode, operand prep, instr issue, function units 1 to 3 with a variable number of stages, then retirement & commit stages q-2 to q]
Performance Improvement for Deep Pipelines
Hardware-based methods
Look ahead past an instruction that will/may stall in the pipeline (out-of-order execution; requires in-order retirement)
Issue multiple instructions (requires more ports on register file)
Eliminate false data dependencies via register renaming
Predict branch outcomes more accurately, or speculate
Software-based method
Pipeline-aware compilation
Loop unrolling to reduce the number of branches
Original loop:
Loop: Compute with index i
      Increment i by 1
      Go to Loop if not done
Unrolled by a factor of 2:
Loop: Compute with index i
      Compute with index i + 1
      Increment i by 2
      Go to Loop if not done
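The effect of unrolling on the branch count can be sketched in Python. The two functions mirror the two loop bodies above, summing an array as the "compute" step and counting how many times the backward branch ("Go to Loop if not done") executes. The names are hypothetical, and the unrolled version assumes a nonempty array of even length.

```python
from typing import List, Tuple


def original_loop(a: List[int]) -> Tuple[int, int]:
    total, i, branches = 0, 0, 0
    while True:
        total += a[i]            # compute with index i
        i += 1                   # increment i by 1
        branches += 1            # go to Loop if not done
        if i >= len(a):
            break
    return total, branches


def unrolled_loop(a: List[int]) -> Tuple[int, int]:
    if not a or len(a) % 2:
        raise ValueError("sketch assumes a nonempty array of even length")
    total, i, branches = 0, 0, 0
    while True:
        total += a[i]            # compute with index i
        total += a[i + 1]        # compute with index i + 1
        i += 2                   # increment i by 2
        branches += 1            # go to Loop if not done
        if i >= len(a):
            break
    return total, branches


print(original_loop(list(range(10))), unrolled_loop(list(range(10))))  # (45, 10) (45, 5)
```

Both versions compute the same result, but the unrolled one executes half as many branches, and a pipeline-aware compiler can also interleave the two independent computations.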
CPI Variations with Architectural Features
Table. Effect of processor architecture, branch prediction methods, and speculative execution on CPI.
Architecture               Methods used in practice                      CPI
Nonpipelined, multicycle   Strict in-order instruction issue and exec    5-10
Nonpipelined, overlapped   In-order issue, with multiple function units  3-5
Pipelined, static          In-order exec, simple branch prediction       2-3
Superpipelined, dynamic    Out-of-order exec, adv branch prediction      1-2
Superscalar                2- to 4-way issue, interlock & speculation    0.5-1
Advanced superscalar       4- to 8-way issue, aggressive speculation     0.2-0.5
Development of Intel’s Desktop/Laptop Micros
In the beginning, there was the 8080; it led to the 80x86 = IA32 ISA
Half a dozen or so pipeline stages: 80286, 80386, 80486, Pentium (80586)
More advanced technology led to:
A dozen or so pipeline stages, with out-of-order instruction execution: Pentium Pro, Pentium II, Pentium III, Celeron
More advanced technology led to:
Two dozen or so pipeline stages: Pentium 4
Instructions are broken into micro-ops, which are executed out of order but retired in order
Dealing with Exceptions
Exceptions present the same problems as branches
How to handle instructions that are ahead in the pipeline? (Let them run to completion and retirement of their results.)
What to do with instructions after the exception point? (Flush them out so that they do not affect the state.)
Precise versus imprecise exceptions
Precise exceptions hide the effects of pipelining and parallelism by forcing the same state as that of strict sequential execution (desirable, because exception handling is not complicated)
Imprecise exceptions are messy, but lead to faster hardware (the interrupt handler can clean up to offer precise exceptions)
The Three Hardware Designs for MicroMIPS
[Figure: data paths of the three MicroMIPS designs: single-cycle (instr cache, reg file, ALU, data cache, next-address logic), multicycle (one cache and reg file with inst, data, x, y, and z registers), and pipelined (stages 1 to 5)]
Single-cycle: 125 MHz, CPI = 1
Multicycle: 500 MHz, CPI ≈ 4
Pipelined: 500 MHz, CPI ≈ 1.1
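As a rough comparison of the three designs, instruction throughput in MIPS (millions of instructions per second) is the clock rate in MHz divided by the CPI. The sketch treats the approximate CPIs as exact; `DESIGNS` and `mips_rating` are hypothetical names.

```python
from typing import Dict, Tuple

# (clock rate in MHz, CPI); the multicycle and pipelined CPIs are approximate.
DESIGNS: Dict[str, Tuple[float, float]] = {
    "single-cycle": (125.0, 1.0),
    "multicycle": (500.0, 4.0),
    "pipelined": (500.0, 1.1),
}


def mips_rating(clock_mhz: float, cpi: float) -> float:
    """Millions of instructions per second = clock rate (MHz) / CPI."""
    return clock_mhz / cpi


for name, (f, cpi) in DESIGNS.items():
    print(f"{name:13s} {mips_rating(f, cpi):7.1f} MIPS")
```

The single-cycle and multicycle designs come out equal (125 MIPS each), while the pipelined design reaches about 454.5 MIPS, roughly 3.6 times either of the others.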
Main MIPS assembly language instruction set
Arithmetic
Instruction                Example               Meaning                           Comments
add                        add $s1, $s2, $s3     $s1 = $s2 + $s3                   Three operands; overflow detected
subtract                   sub $s1, $s2, $s3     $s1 = $s2 - $s3                   Three operands; overflow detected
add immediate              addi $s1, $s2, 100    $s1 = $s2 + 100                   + constant; overflow detected
add unsigned               addu $s1, $s2, $s3    $s1 = $s2 + $s3                   Three operands; overflow undetected
subtract unsigned          subu $s1, $s2, $s3    $s1 = $s2 - $s3                   Three operands; overflow undetected
add immediate unsigned     addiu $s1, $s2, 100   $s1 = $s2 + 100                   + constant; overflow undetected
move from coprocessor reg. mfc0 $s1, $epc        $s1 = $epc                        Used to copy Exception PC plus other special registers
multiply                   mult $s2, $s3         Hi, Lo = $s2 * $s3                64-bit signed product in Hi, Lo
multiply unsigned          multu $s2, $s3        Hi, Lo = $s2 * $s3                64-bit unsigned product in Hi, Lo
divide                     div $s2, $s3          Lo = $s2 / $s3, Hi = $s2 mod $s3  Lo = quotient, Hi = remainder
divide unsigned            divu $s2, $s3         Lo = $s2 / $s3, Hi = $s2 mod $s3  Unsigned quotient and remainder
move from Hi               mfhi $s1              $s1 = Hi                          Used to get copy of Hi
move from Lo               mflo $s1              $s1 = Lo                          Used to get copy of Lo
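The Hi/Lo convention for multiply and divide can be modeled with Python integers. This is a behavioral sketch only: registers are 32-bit unsigned bit patterns, signed operands are reinterpreted in two's complement, and signed division is assumed to truncate toward zero as in C (Python's own `//` rounds toward minus infinity, so it is applied to magnitudes only). Division by zero simply raises an exception here. The function names mirror the mnemonics but are hypothetical.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mult(rs: int, rt: int) -> Tuple[int, int]:
    """Signed 64-bit product, returned as (Hi, Lo) 32-bit patterns."""
    p = (to_signed(rs) * to_signed(rt)) & ((1 << 64) - 1)
    return p >> 32, p & MASK32


def multu(rs: int, rt: int) -> Tuple[int, int]:
    """Unsigned 64-bit product, returned as (Hi, Lo)."""
    p = (rs & MASK32) * (rt & MASK32)
    return p >> 32, p & MASK32


def div(rs: int, rt: int) -> Tuple[int, int]:
    """Signed divide returning (Hi, Lo) = (remainder, quotient), truncating toward zero."""
    a, b = to_signed(rs), to_signed(rt)
    q = abs(a) // abs(b)          # raises ZeroDivisionError when b == 0
    if (a < 0) != (b < 0):
        q = -q
    r = a - q * b
    return r & MASK32, q & MASK32


def divu(rs: int, rt: int) -> Tuple[int, int]:
    """Unsigned divide returning (Hi, Lo) = (remainder, quotient)."""
    a, b = rs & MASK32, rt & MASK32
    return a % b, a // b


print(div(7, 2))  # (1, 3): Hi = remainder 1, Lo = quotient 3
```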
Main MIPS assembly language instruction set – cont.
Logical
Instruction           Example             Meaning                   Comments
and                   and $s1, $s2, $s3   $s1 = $s2 & $s3           Three reg. operands; logical AND
or                    or $s1, $s2, $s3    $s1 = $s2 | $s3           Three reg. operands; logical OR
and immediate         andi $s1, $s2, 100  $s1 = $s2 & 100           Logical AND reg., constant
or immediate          ori $s1, $s2, 100   $s1 = $s2 | 100           Logical OR reg., constant
shift left logical    sll $s1, $s2, 10    $s1 = $s2 << 10           Shift left by constant
shift right logical   srl $s1, $s2, 10    $s1 = $s2 >> 10           Shift right by constant
Data transfer
load word             lw $s1, 100($s2)    $s1 = Memory[$s2 + 100]   Word from memory to register
store word            sw $s1, 100($s2)    Memory[$s2 + 100] = $s1   Word from register to memory
load byte unsigned    lbu $s1, 100($s2)   $s1 = Memory[$s2 + 100]   Byte from memory to register
store byte            sb $s1, 100($s2)    Memory[$s2 + 100] = $s1   Byte from register to memory
load upper immediate  lui $s1, 100        $s1 = 100 * 2^16          Loads constant in upper 16 bits
Main MIPS assembly language instruction set – cont.
Conditional branch
Instruction                       Example              Meaning                               Comments
branch on equal                   beq $s1, $s2, 25     if ($s1 == $s2) go to PC + 4 + 100    Equal test; PC-relative branch
branch on not equal               bne $s1, $s2, 25     if ($s1 != $s2) go to PC + 4 + 100    Not equal test; PC-relative
set on less than                  slt $s1, $s2, $s3    if ($s2 < $s3) $s1 = 1; else $s1 = 0  Compare less than; two's complement
set less than immediate           slti $s1, $s2, 100   if ($s2 < 100) $s1 = 1; else $s1 = 0  Compare < constant; two's complement
set less than unsigned            sltu $s1, $s2, $s3   if ($s2 < $s3) $s1 = 1; else $s1 = 0  Compare less than; natural numbers
set less than unsigned immediate  sltiu $s1, $s2, 100  if ($s2 < 100) $s1 = 1; else $s1 = 0  Compare < constant; natural numbers
Unconditional jump
jump                              j 2500               go to 10000                           Jump to target address
jump register                     jr $ra               go to $ra                             For switch, procedure return
jump and link                     jal 2500             $ra = PC + 4; go to 10000             For procedure call
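The PC-relative and jump address calculations in the rows above (beq with offset 25 going to PC + 4 + 100, j 2500 going to 10000) can be written out in Python. The sketch assumes the usual MIPS rules: a branch adds the sign-extended 16-bit word offset, multiplied by 4, to PC + 4, and a jump keeps the upper 4 bits of PC + 4 and fills the rest with the 26-bit field multiplied by 4. The function names are hypothetical.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits as a two's complement offset."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc: int, offset16: int) -> int:
    """beq/bne: PC + 4 + 4 * sign-extended offset (the offset counts words)."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF


def jump_target(pc: int, target26: int) -> int:
    """j/jal: upper 4 bits of PC + 4, followed by the 26-bit field times 4."""
    return ((pc + 4) & 0xF0000000) | ((target26 & 0x3FFFFFF) << 2)


print(branch_target(0, 25), jump_target(0, 2500))  # 104 10000
```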
MIPS machine language
Values are in decimal. For I-format rows, the number after rt is the 16-bit immediate or offset; for J-format rows, the number after op is the 26-bit target address.
Name    Format  op   rs   rt   rd   shamt  funct  Example
(bits)          6    5    5    5    5      6
add     R       0    2    3    1    0      32     add $1, $2, $3
sub     R       0    2    3    1    0      34     sub $1, $2, $3
addi    I       8    2    1    100               addi $1, $2, 100
addu    R       0    2    3    1    0      33     addu $1, $2, $3
subu    R       0    2    3    1    0      35     subu $1, $2, $3
addiu   I       9    2    1    100               addiu $1, $2, 100
mfc0    R       16   0    1    14   0      0      mfc0 $1, $epc
mult    R       0    2    3    0    0      24     mult $2, $3
multu   R       0    2    3    0    0      25     multu $2, $3
div     R       0    2    3    0    0      26     div $2, $3
divu    R       0    2    3    0    0      27     divu $2, $3
mfhi    R       0    0    0    1    0      16     mfhi $1
mflo    R       0    0    0    1    0      18     mflo $1
and     R       0    2    3    1    0      36     and $1, $2, $3
or      R       0    2    3    1    0      37     or $1, $2, $3
MIPS machine language – cont.
Name    Format  op   rs   rt   rd   shamt  funct  Example
andi    I       12   2    1    100               andi $1, $2, 100
ori     I       13   2    1    100               ori $1, $2, 100
sll     R       0    0    2    1    10     0      sll $1, $2, 10
srl     R       0    0    2    1    10     2      srl $1, $2, 10
lw      I       35   2    1    100               lw $1, 100($2)
sw      I       43   2    1    100               sw $1, 100($2)
lui     I       15   0    1    100               lui $1, 100
beq     I       4    1    2    25                beq $1, $2, 100
bne     I       5    1    2    25                bne $1, $2, 100
slt     R       0    2    3    1    0      42     slt $1, $2, $3
slti    I       10   2    1    100               slti $1, $2, 100
sltu    R       0    2    3    1    0      43     sltu $1, $2, $3
sltiu   I       11   2    1    100               sltiu $1, $2, 100
j       J       2    2500                         j 10000
jr      R       0    31   0    0    0      8      jr $31
jal     J       3    2500                         jal 10000
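The decimal field values in the machine language table can be packed into 32-bit words with shifts and ORs. This is a minimal encoder sketch that takes numeric field values as in the table (it does not parse assembly text); `encode_r`, `encode_i`, and `encode_j` are hypothetical helper names.

```python
def _field(value: int, bits: int, name: str) -> int:
    """Check that an unsigned field value fits in `bits` bits."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return value


def encode_r(op: int, rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    return (_field(op, 6, "op") << 26 | _field(rs, 5, "rs") << 21
            | _field(rt, 5, "rt") << 16 | _field(rd, 5, "rd") << 11
            | _field(shamt, 5, "shamt") << 6 | _field(funct, 6, "funct"))


def encode_i(op: int, rs: int, rt: int, imm: int) -> int:
    if not -(1 << 15) <= imm < (1 << 16):     # accept signed or unsigned 16-bit values
        raise ValueError(f"imm={imm} does not fit in 16 bits")
    return (_field(op, 6, "op") << 26 | _field(rs, 5, "rs") << 21
            | _field(rt, 5, "rt") << 16 | (imm & 0xFFFF))


def encode_j(op: int, target: int) -> int:
    return _field(op, 6, "op") << 26 | _field(target, 26, "target")


print(f"{encode_r(0, 2, 3, 1, 0, 32):08x}")   # add $1, $2, $3  -> 00430820
print(f"{encode_i(35, 2, 1, 100):08x}")       # lw $1, 100($2)  -> 8c410064
print(f"{encode_j(2, 2500):08x}")             # j 10000         -> 080009c4
```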
MIPS instruction formats
Name        Fields                                              Comments
Field size  6 bits  5 bits  5 bits  5 bits  5 bits  6 bits      All MIPS instructions are 32 bits
R-format    op      rs      rt      rd      shamt   funct       Arithmetic instruction format
I-format    op      rs      rt      address/immediate           Immediate arithmetic, data transfer, and branch format
J-format    op      target address                              Jump instruction format
Main MIPS machine language. Formats and examples are shown with the values in each field: the op and funct fields form the opcode (6 bits each), the rs field gives a source register (5 bits), rt is also normally a source register (5 bits), rd is the destination register (5 bits), and shamt supplies the shift amount (5 bits). The field values are all in decimal. Floating-point machine language instructions are shown in Figure 4.47 on page 291. Appendix A gives the full MIPS machine language.
MIPS floating-point operands
Name                         Example                                          Comments
32 floating-point registers  $f0, $f1, $f2, ..., $f31                         MIPS floating-point registers are used in pairs for double precision numbers.
2^30 memory words            Memory[0], Memory[4], ..., Memory[4294967292]    Accessed only by data transfer instructions. MIPS uses byte addresses, so sequential words differ by 4. Memory holds data structures, such as arrays, and spilled registers, such as those saved on procedure calls.
MIPS floating-point assembly language
Instruction         Example               Meaning                                 Comments
Arithmetic
FP add single       add.s $f2, $f4, $f6   $f2 = $f4 + $f6                         FP add (single precision)
FP subtract single  sub.s $f2, $f4, $f6   $f2 = $f4 - $f6                         FP sub (single precision)
FP multiply single  mul.s $f2, $f4, $f6   $f2 = $f4 * $f6                         FP multiply (single precision)
FP divide single    div.s $f2, $f4, $f6   $f2 = $f4 / $f6                         FP divide (single precision)
FP add double       add.d $f2, $f4, $f6   $f2 = $f4 + $f6                         FP add (double precision)
FP subtract double  sub.d $f2, $f4, $f6   $f2 = $f4 - $f6                         FP sub (double precision)
FP multiply double  mul.d $f2, $f4, $f6   $f2 = $f4 * $f6                         FP multiply (double precision)
FP divide double    div.d $f2, $f4, $f6   $f2 = $f4 / $f6                         FP divide (double precision)
Data transfer
load word copr. 1   lwc1 $f1, 100($s2)    $f1 = Mem[$s2 + 100]                    32-bit data to FP register
store word copr. 1  swc1 $f1, 100($s2)    Mem[$s2 + 100] = $f1                    32-bit data to memory
Conditional branch
branch on FP true   bc1t 25               if (cond == 1) go to PC + 4 + 100       PC-relative branch if FP cond.
branch on FP false  bc1f 25               if (cond == 0) go to PC + 4 + 100       PC-relative branch if not cond.
FP compare single   c.lt.s $f2, $f4       if ($f2 < $f4) cond = 1; else cond = 0  FP compare less than single precision (also eq, ne, le, gt, ge)
FP compare double   c.lt.d $f2, $f4       if ($f2 < $f4) cond = 1; else cond = 0  FP compare less than double precision (also eq, ne, le, gt, ge)
MIPS floating-point machine language
For lwc1, swc1, bc1t, and bc1f (I-format), the columns after op are rs, rt, and the 16-bit immediate or offset. All MIPS instructions are 32 bits.
Name    Format  op   fmt  ft   fs   fd   funct  Example
(bits)          6    5    5    5    5    6
add.s   R       17   16   6    4    2    0      add.s $f2, $f4, $f6
sub.s   R       17   16   6    4    2    1      sub.s $f2, $f4, $f6
mul.s   R       17   16   6    4    2    2      mul.s $f2, $f4, $f6
div.s   R       17   16   6    4    2    3      div.s $f2, $f4, $f6
add.d   R       17   17   6    4    2    0      add.d $f2, $f4, $f6
sub.d   R       17   17   6    4    2    1      sub.d $f2, $f4, $f6
mul.d   R       17   17   6    4    2    2      mul.d $f2, $f4, $f6
div.d   R       17   17   6    4    2    3      div.d $f2, $f4, $f6
lwc1    I       49   20   2    100              lwc1 $f2, 100($s4)
swc1    I       57   20   2    100              swc1 $f2, 100($s4)
bc1t    I       17   8    1    25               bc1t 25
bc1f    I       17   8    0    25               bc1f 25
c.lt.s  R       17   16   4    2    0    60     c.lt.s $f2, $f4
c.lt.d  R       17   17   4    2    0    60     c.lt.d $f2, $f4
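The floating-point R-format rows can be encoded the same way as the integer ones, reading the columns after op as fmt (16 for single, 17 for double), ft, fs, fd, and funct. This is a sketch under that reading of the table; `encode_fp_r` is a hypothetical name.

```python
def encode_fp_r(fmt: int, ft: int, fs: int, fd: int, funct: int, op: int = 17) -> int:
    """Pack op | fmt | ft | fs | fd | funct using 6-5-5-5-5-6 bit fields."""
    fields = [(op, 6), (fmt, 5), (ft, 5), (fs, 5), (fd, 5), (funct, 6)]
    word = 0
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
        word = (word << bits) | value    # append the next field on the right
    return word


print(f"{encode_fp_r(16, 6, 4, 2, 0):08x}")   # add.s $f2, $f4, $f6 -> 46062080
```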
Arithmetic and Logical Instructions
Addition (with overflow)
add rd,rs,rt 6 5 5 5 5 6
0 rs rt rd 0 0x20
Absolute value
abs rdest,rsrc pseudoinstruction
Put the absolute value of register rsrc in register rdest.
Addition (without overflow)
addu rd,rs,rt 6 5 5 5 5 6
Put the sum of registers rs and rt into register rd.
0 rs rt rd 0 0x21
Addition immediate (with overflow)
addi rt,rs,imm 6 5 5 16
8 rs rt imm
Addition immediate (without overflow)
addiu rt,rs,imm 6 5 5 16
9 rs rt imm
Put the sum of register rs and the sign-extended immediate into register rt.
AND
and rd,rs,rt 6 5 5 5 5 6
0 rs rt rd 0 0x24
Put the logical AND of registers rs and rt into register rd.
Arithmetic and Logical Instructions – cont.
AND immediate
andi rt,rs,imm 6 5 5 16
0xc rs rt imm
Put the logical AND of register rs and the zero-extended immediate into register rt.
Arithmetic and Logical Instructions – cont.
Divide (with overflow)
div rs,rt 6 5 5 10 6
0 rs rt 0 0x1a
Divide (without overflow)
divu rs,rt 6 5 5 10 6
0 rs rt 0 0x1b
Divide register rs by register rt. Leave the quotient in register lo and the remainder in register hi.
Divide (with overflow)
div rdest,rsrc1,src2 pseudoinstruction
Divide (without overflow)
divu rdest,rsrc1,src2 pseudoinstruction
Put the quotient of register rsrc1 and src2 into register rdest.
Arithmetic and Logical Instructions – cont.
Multiply
mult rs,rt 6 5 5 10 6
0 rs rt 0 0x18
Unsigned multiply
multu rs,rt 6 5 5 10 6
0 rs rt 0 0x19
Multiply registers rs and rt. Leave the low-order word of the product in register lo and the high-order word in register hi.
Multiply (without overflow)
mul rdest,rsrc1,src2 pseudoinstruction
Multiply (with overflow)
mulo rdest,rsrc1,src2 pseudoinstruction
Unsigned multiply (with overflow)
mulou rdest,rsrc1,src2 pseudoinstruction
Put the product of register rsrc1 and src2 into register rdest.
Arithmetic and Logical Instructions – cont.
NOR
nor rd,rs,rt 6 5 5 5 5 6
0 rs rt rd 0 0x27
Put the logical NOR of registers rs and rt into register rd.
Negate value (with overflow)
neg rdest,rsrc pseudoinstruction
Negate value (without overflow)
negu rdest,rsrc pseudoinstruction
Put the negative of register rsrc into register rdest.
NOT
not rdest,rsrc pseudoinstruction
Put the bitwise logical negation of register rsrc into register rdest.
OR
or rd,rs,rt 6 5 5 5 5 6
0 rs rt rd 0 0x25
Put the logical OR of registers rs and rt into register rd.
OR immediate
ori rt,rs,imm 6 5 5 16
0xd rs rt imm
Put the logical OR of register rs and the zero-extended immediate into register rt.
Arithmetic and Logical Instructions – cont.
Remainder
rem rdest,rsrc1,rsrc2 pseudoinstruction
Unsigned remainder
remu rdest,rsrc1,rsrc2 pseudoinstruction
Put the remainder of register rsrc1 divided by register rsrc2 into register rdest. Note that if an operand is negative, the remainder is unspecified by the MIPS architecture and depends on the convention of the machine on which SPIM is run.
Arithmetic and Logical Instructions – cont.
Shift left logical
sll rd,rt,shamt 6 5 5 5 5 6
0 rs rt rd shamt 0
Shift left logical variable
sllv rd,rt,rs 6 5 5 5 5 6
0 rs rt rd 0 4
Shift right arithmetic
sra rd,rt,shamt 6 5 5 5 5 6
0 rs rt rd shamt 3
Shift right arithmetic variable
srav rd,rt,rs 6 5 5 5 5 6
0 rs rt rd 0 7
Arithmetic and Logical Instructions – cont.
Shift right logical
srl rd,rt,shamt 6 5 5 5 5 6
0 rs rt rd shamt 2
Shift right logical variable
srlv rd,rt,rs 6 5 5 5 5 6
0 rs rt rd 0 6
Shift register rt left (right) by the distance indicated by immediate shamt or the register rs and put the result in register rd.
Rotate left
rol rdest,rsrc1,rsrc2 pseudoinstruction
Rotate right
ror rdest,rsrc1,rsrc2 pseudoinstruction
Rotate register rsrc1 left (right) by the distance indicated by rsrc2 and put the result in register rdest.
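The difference between logical and arithmetic right shifts, and how a rotate can be composed from two shifts and an OR, can be sketched on 32-bit values in Python. The sketch assumes the shift distance is taken modulo 32 (only its low 5 bits are used); it is a behavioral model, not the exact instruction sequence an assembler emits for rol and ror, and the function names mirror the mnemonics but are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sll(x: int, n: int) -> int:
    return (x << (n & 31)) & MASK32


def srl(x: int, n: int) -> int:
    return (x & MASK32) >> (n & 31)            # zeros shifted in from the left


def sra(x: int, n: int) -> int:
    x &= MASK32
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> (n & 31)) & MASK32       # copies of the sign bit shifted in


def rol(x: int, n: int) -> int:
    n &= 31
    return sll(x, n) | srl(x, (32 - n) & 31)   # bits shifted out on the left re-enter on the right


def ror(x: int, n: int) -> int:
    return rol(x, (32 - (n & 31)) & 31)        # rotating right by n = rotating left by 32 - n


print(f"{srl(0x80000000, 4):08x} {sra(0x80000000, 4):08x}")  # 08000000 f8000000
```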
Arithmetic and Logical Instructions – cont.
Subtract (with overflow)
sub rd,rs,rt 6 5 5 5 5 6
0 rs rt rd 0 0x22
Subtract (without overflow)
subu rd,rs,rt 6 5 5 5 5 6
0 rs rt rd 0 0x23
Put the difference of registers rs and rt into register rd.
Exclusive OR
xor rd,rs,rt 6 5 5 5 5 6
0 rs rt rd 0 0x26
Put the logical XOR of registers rs and rt into register rd.
XOR immediate
xori rt,rs,imm 6 5 5 16
Put the logical XOR of register rs and the zero-extended immediate into register rt.
0xe rs rt imm
Constant-Manipulating Instructions
Load upper immediate
lui rt,imm 6 5 5 16
Load the lower halfword of the immediate imm into the upper halfword of register rt. The lower bits of the register are set to 0.
0xf rs rt imm
Load immediate
li rdest,imm pseudoinstruction
Move the immediate imm into register rdest.
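Because lui clears the lower halfword and ori zero-extends its immediate, the pair can build any 32-bit constant, which is one way an assembler can expand li. The sketch models the instructions on 32-bit values; addiu is included only to show why a sign-extending add would corrupt constants whose lower halfword has bit 15 set. The names are hypothetical, and a real assembler may emit a single instruction when the constant fits in 16 bits.

```python
MASK32 = 0xFFFFFFFF


def lui(imm16: int) -> int:
    """lui: immediate into the upper halfword, lower halfword cleared."""
    return (imm16 & 0xFFFF) << 16


def ori(rs: int, imm16: int) -> int:
    """ori: OR with the zero-extended immediate."""
    return (rs | (imm16 & 0xFFFF)) & MASK32


def addiu(rs: int, imm16: int) -> int:
    """addiu: add the sign-extended immediate (shown for contrast)."""
    imm = imm16 - 0x10000 if imm16 & 0x8000 else imm16
    return (rs + imm) & MASK32


def li(value: int) -> int:
    """Build a 32-bit constant as lui (upper half) followed by ori (lower half)."""
    value &= MASK32
    return ori(lui(value >> 16), value & 0xFFFF)


print(f"{li(0x12345678):08x} {addiu(lui(0x1234), 0x8000):08x}")  # 12345678 12338000
```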
Comparison Instructions
Set less than
slt rd,rs,rt 6 5 5 5 5 6
0 rs rt rd 0 0x2a
Set less than unsigned
sltu rd,rs,rt 6 5 5 5 5 6
0 rs rt rd 0 0x2b
Set register rd to 1 if register rs is less than rt, and to 0 otherwise.
Set less than immediate
slti rt,rs,imm 6 5 5 16
0xa rs rt imm
Set less than unsigned immediate
sltiu rt,rs,imm 6 5 5 16
0xb rs rt imm
Set register rt to 1 if register rs is less than the sign-extended immediate, and to 0 otherwise.
Comparison Instructions – cont.
Set equal
seq rdest,rsrc1,rsrc2 pseudoinstruction
Set register rdest to 1 if register rsrc1 equals rsrc2, and to 0 otherwise.
Set greater than equal
sge rdest,rsrc1,rsrc2 pseudoinstruction
Set greater than equal unsigned
sgeu rdest,rsrc1,rsrc2 pseudoinstruction
Set register rdest to 1 if register rsrc1 is greater than or equal to rsrc2, and to 0 otherwise.
Set greater than
sgt rdest,rsrc1,rsrc2 pseudoinstruction
Set greater than unsigned
sgtu rdest,rsrc1,rsrc2 pseudoinstruction
Set register rdest to 1 if register rsrc1 is greater than rsrc2, and to 0 otherwise.
Set less than equal
sle rdest,rsrc1,rsrc2 pseudoinstruction
Set less than equal unsigned
sleu rdest,rsrc1,rsrc2 pseudoinstruction
Set register rdest to 1 if register rsrc1 is less than or equal to rsrc2, and to 0 otherwise.
Set not equal
sne rdest,rsrc1,rsrc2 pseudoinstruction
Set register rdest to 1 if register rsrc1 is not equal to rsrc2, and to 0 otherwise.
Branch Instructions
Branch instruction
b label pseudoinstruction
Unconditionally branch to the instruction at the label.
Branch coprocessor z true
bczt label 6 5 5 16
0x1z 8 1 offset
Branch coprocessor z false
bczf label 6 5 5 16
0x1z 8 0 offset
Conditionally branch the number of instructions specified by the offset if z's condition flag is true (false). z is 0, 1, 2, or 3. The floating-point unit is z = 1.
Branch Instructions – cont.
Branch on equal
beq rs,rt,label 6 5 5 16
4 rs rt offset
Conditionally branch the number of instructions specified by the offset if register rs equals rt.
Branch on greater than equal zero
bgez rs,label 6 5 5 16
1 rs 1 offset
Conditionally branch the number of instructions specified by the offset if register rs is greater than or equal to 0.
Branch on greater than equal zero and link
bgezal rs,label 6 5 5 16
1 rs 0x11 offset
Conditionally branch the number of instructions specified by the offset if register rs is greater than or equal to 0. Save the address of the next instruction in register 31.
Branch Instructions – cont.
Branch on greater than zero
bgtz rs,label 6 5 5 16
7 rs 0 offset
Conditionally branch the number of instructions specified by the offset if register rs is greater than 0.
Branch on less than equal zero
blez rs,label 6 5 5 16
6 rs 0 offset
Conditionally branch the number of instructions specified by the offset if register rs is less than or equal to 0.
Branch on less than zero and link
bltzal rs,label 6 5 5 16
1 rs 0x10 offset
Conditionally branch the number of instructions specified by the offset if register rs is less than 0. Save the address of the next instruction in register 31.
Branch Instructions – cont.
Branch on less than zero
bltz rs,label 6 5 5 16
1 rs 0 offset
Conditionally branch the number of instructions specified by the offset if register rs is less than 0.
Branch on not equal
bne rs,rt,label 6 5 5 16
5 rs rt offset
Conditionally branch the number of instructions specified by the offset if register rs is not equal to rt.
Branch on equal zero
beqz rsrc,label pseudoinstruction
Conditionally branch to the instruction at the label if rsrc equals 0.
Branch Instructions – cont.
Branch on greater than equal
bge rsrc1,rsrc2,label pseudoinstruction
Branch on greater than equal unsigned
bgeu rsrc1,rsrc2,label pseudoinstruction
Conditionally branch to the instruction at the label if register rsrc1 is greater than or equal to rsrc2.
Branch on greater than
bgt rsrc1,rsrc2,label pseudoinstruction
Branch on greater than unsigned
bgtu rsrc1,rsrc2,label pseudoinstruction
Conditionally branch to the instruction at the label if register rsrc1 is greater than rsrc2.
Branch Instructions – cont.
Branch on less than equal
ble rsrc1,rsrc2,label pseudoinstruction
Branch on less than equal unsigned
bleu rsrc1,rsrc2,label pseudoinstruction
Conditionally branch to the instruction at the label if register rsrc1 is less than or equal to rsrc2.
Branch on less than
blt rsrc1,rsrc2,label pseudoinstruction
Branch on less than unsigned
bltu rsrc1,rsrc2,label pseudoinstruction
Conditionally branch to the instruction at the label if register rsrc1 is less than rsrc2.
Branch on not equal zero
bnez rsrc,label pseudoinstruction
Conditionally branch to the instruction at the label if register rsrc is not equal to 0.
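The two-register branch pseudoinstructions can be built from slt or sltu followed by bne or beq, using $at, the register reserved for the assembler. The sketch generates one common expansion as text; it is illustrative and not necessarily the exact sequence SPIM produces, and `expand_branch` is a hypothetical name.

```python
from typing import List


def expand_branch(op: str, rs: str, rt: str, label: str) -> List[str]:
    """Expand blt/bge/bgt/ble (and their unsigned 'u' forms) into two real instructions."""
    unsigned = op.endswith("u")
    base = op[:-1] if unsigned else op
    slt = "sltu" if unsigned else "slt"
    # base op -> (first slt operand, second slt operand, branch applied to $at)
    forms = {
        "blt": (rs, rt, "bne"),   # rs < rt   <=> $at == 1
        "bge": (rs, rt, "beq"),   # rs >= rt  <=> !(rs < rt)
        "bgt": (rt, rs, "bne"),   # rs > rt   <=> rt < rs
        "ble": (rt, rs, "beq"),   # rs <= rt  <=> !(rt < rs)
    }
    a, b, br = forms[base]
    return [f"{slt} $at, {a}, {b}", f"{br} $at, $zero, {label}"]


for line in expand_branch("bgt", "$t0", "$t1", "Loop"):
    print(line)
# slt $at, $t1, $t0
# bne $at, $zero, Loop
```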
Jump Instructions
Jump
j target 6 26
2 target
Unconditionally jump to the instruction at target.
Jump and link
jal target 6 26
3 target
Unconditionally jump to the instruction at target. Save the address of the next instruction in register 31.
Jump and link register
jalr rs,rd 6 5 5 5 5 6
0 rs 0 rd 0 9
Unconditionally jump to the instruction whose address is in register rs. Save the address of the next instruction in register rd (which defaults to 31).
Jump Instructions – cont.
Jump register
jr rs 6 5 15 6
0x0 rs 0 0x8
Unconditionally jump to the instruction whose address is in register rs.
Load Instructions
Load address
la rdest,address pseudoinstruction
Load computed address (not the contents of the location) into register rdest.
Load byte
lb rt,address 6 5 5 16
0x20 rs rt offset
Load unsigned byte
lbu rt,address 6 5 5 16
0x24 rs rt offset
Load the byte at address into register rt. The byte is sign-extended by lb, but not by lbu.
Load halfword
lh rt,address 6 5 5 16
0x21 rs rt offset
Load unsigned halfword
lhu rt,address 6 5 5 16
0x25 rs rt offset
Load the 16-bit quantity (halfword) at address into register rt. The halfword is sign-extended by lh, but not by lhu.
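The difference between lb and lbu (and between lh and lhu) lies only in how the loaded value is widened to 32 bits. The sketch models just that step on the raw byte or halfword, ignoring addressing and memory; the function names mirror the mnemonics but are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def lb(byte: int) -> int:
    return sign_extend(byte, 8) & MASK32        # sign-extended to 32 bits


def lbu(byte: int) -> int:
    return byte & 0xFF                          # zero-extended


def lh(half: int) -> int:
    return sign_extend(half, 16) & MASK32       # sign-extended to 32 bits


def lhu(half: int) -> int:
    return half & 0xFFFF                        # zero-extended


print(f"{lb(0x80):08x} {lbu(0x80):08x}")  # ffffff80 00000080
```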
Load Instructions – cont.
Load word
lw rt,address 6 5 5 16
Load the 32-bit quantity (word) at address into register rt.
0x23 rs rt offset
Load Instructions – cont.
Load word coprocessor
lwcz rt,address 6 5 5 16
Load the word at address into register rt of coprocessor z (0-3). The floating-point unit is z=1.
0x3z rs rt offset
Load word left
lwl rt,address 6 5 5 16
0x22 rs rt offset
Load word right
lwr rt,address 6 5 5 16
Load the left (right) bytes from the word at the possibly unaligned address into register rt.
0x26 rs rt offset
Load Instructions – cont.
Load doubleword
ld rdest,address pseudoinstruction
Load the 64-bit quantity at address into registers rdest and rdest + 1.
Unaligned load halfword
ulh rdest,address pseudoinstruction
Unaligned load halfword unsigned
ulhu rdest,address pseudoinstruction
Load the 16-bit quantity (halfword) at the possibly unaligned address into register rdest. The halfword is sign-extended by ulh, but not by ulhu.
Unaligned load word
ulw rdest,address pseudoinstruction
Load the 32-bit quantity (word) at the possibly unaligned address into register rdest.
Store Instructions
Store byte
sb rt,address 6 5 5 16
0x28 rs rt offset
Store the low byte from register rt at address.
Store word
sw rt,address 6 5 5 16
0x2b rs rt offset
Store the word from register rt at address.
Store word coprocessor
swcz rt,address 6 5 5 16
0x38+z rs rt offset
Store the word from register rt of coprocessor z at address. The floating-point unit is z = 1.
Store halfword
sh rt,address 6 5 5 16
0x29 rs rt offset
Store the low halfword from register rt at address.
Store Instructions – cont.
Store doubleword
sd rsrc,address pseudoinstruction
Store the 64-bit quantity in registers rsrc and rsrc + 1 at address.
Store word left
swl rt,address 6 5 5 16
0x2a rs rt offset
Store word right
swr rt,address 6 5 5 16
0x2e rs rt offset
Store the left (right) bytes from register rt at the possibly unaligned address.
Unaligned store halfword
ush rsrc,address pseudoinstruction
Store the low halfword from register rsrc at the possibly unaligned address.
Unaligned store word
usw rsrc,address pseudoinstruction
Store the word from register rsrc at the possibly unaligned address.
Data Movement Instructions
Move
move rdest,rsrc pseudoinstruction
Move register rsrc to rdest.
Move from hi
mfhi rd 6 10 5 5 6
0 0 rd 0 0x10
Move from lo
mflo rd 6 10 5 5 6
0 0 rd 0 0x12
Move the hi (lo) register to register rd.
Move to hi
mthi rs 6 5 15 6
0 rs 0 0x11
Move to lo
mtlo rs 6 5 15 6
0 rs 0 0x13
Move register rs to the hi (lo) register.
Data Movement Instructions – cont.
Move from coprocessor z
mfcz rt,rd 6 5 5 5 11
0x1z 0 rt rd 0
Move coprocessor z's register rd to CPU register rt. The floating-point unit is coprocessor z = 1.
Move double from coprocessor 1
mfc1.d rdest,frsrc1 pseudoinstruction
Move floating-point registers frsrc1 and frsrc1 + 1 to CPU registers rdest and rdest + 1.
Move to coprocessor z
mtcz rt,rd 6 5 5 5 11
0x1z 4 rt rd 0
Move CPU register rt to coprocessor z's register rd.
Floating-Point Instructions
Floating-point absolute value double
abs.d fd,fs 6 5 5 5 5 6
0x11 1 0 fs fd 5
Floating-point absolute value single
abs.s fd,fs 6 5 5 5 5 6
0x11 0 0 fs fd 5
Compute the absolute value of the floating-point double (single) in register fs and put it in register fd.
Floating-point addition double
add.d fd,fs,ft 6 5 5 5 5 6
0x11 1 ft fs fd 0
Floating-point addition single
add.s fd,fs,ft 6 5 5 5 5 6
0x11 0 ft fs fd 0
Compute the sum of the floating-point doubles (singles) in registers fs and ft and put it in register fd.
Floating-Point Instructions – cont.
Compare equal double
c.eq.d fs,ft 6 5 5 5 5 2 4
0x11 1 ft fs 0 FC 2
Compare equal single
c.eq.s fs,ft 6 5 5 5 5 2 4
0x11 0 ft fs 0 FC 2
Compare the floating-point double (single) in register fs against the one in ft and set the floating-point condition flag true if they are equal.
Compare less than equal double
c.le.d fs,ft 6 5 5 5 5 2 4
0x11 1 ft fs 0 FC 0xe
Compare less than equal single
c.le.s fs,ft 6 5 5 5 5 2 4
0x11 0 ft fs 0 FC 0xe
Compare the floating-point double (single) in register fs against the one in ft and set the floating-point condition flag true if the first is less than or equal to the second. Use the bc1t or bc1f instructions to test the value of this flag.
Floating-Point Instructions – cont.
Compare less than double
c.lt.d fs,ft 6 5 5 5 5 2 4
0x11 1 ft fs 0 FC 0xc
Compare less than single
c.lt.s fs,ft 6 5 5 5 5 2 4
0x11 0 ft fs 0 FC 0xc
Compare the floating-point double (single) in register fs against the one in ft and set the condition flag true if the first is less than the second. Use the bc1t or bc1f instructions to test the value of this flag.
Convert single to double
cvt.d.s fd,fs 6 5 5 5 5 6
0x11 1 0 fs fd 0x21
Convert integer to double
cvt.d.w fd,fs 6 5 5 5 5 6
0x11 1 0 fs fd 0x21
Convert the single precision floating-point number or integer in register fs to a double precision number and put it in register fd.
Floating-Point Instructions – cont.
Convert double to single
cvt.s.d fd,fs 6 5 5 5 5 6
0x11 1 0 fs fd 0x20
Convert integer to single
cvt.s.w fd,fs 6 5 5 5 5 6
0x11 0 0 fs fd 0x20
Convert the double precision floating-point number or integer in register fs to a single precision number and put it in register fd.
Convert double to integer
cvt.w.d fd,fs 6 5 5 5 5 6
0x11 1 0 fs fd 0x24
Convert single to integer
cvt.w.s fd,fs 6 5 5 5 5 6
0x11 0 0 fs fd 0x24
Convert the double or single precision floating-point number in register fs to an integer and put it in register fd.
Floating-Point Instructions – cont.
Floating-point divide double
div.d fd,fs,ft 6 5 5 5 5 6
0x11 1 ft fs fd 3
Floating-point divide single
div.s fd,fs,ft 6 5 5 5 5 6
Compute the quotient of the floating-point doubles (singles) in registers fs and ft and put it in register fd.
0x11 0 ft fs fd 3
Load floating-point double
l.d fdest,address pseudoinstruction
Load floating-point single
l.s fdest,address pseudoinstruction
Load the floating-point double (single) at address into register fdest.
Floating-Point Instructions – cont.
Move floating-point double
mov.d fd,fs 6 5 5 5 5 6
0x11 1 0 fs fd 6
Move floating-point single
mov.s fd,fs 6 5 5 5 5 6
Move the floating-point double (single) from register fs to register fd.
0x11 0 0 fs fd 6
Floating-point multiply double
mul.d fd,fs,ft 6 5 5 5 5 6
0x11 1 ft fs fd 2
Floating-point multiply single
mul.s fd,fs,ft 6 5 5 5 5 6
Compute the product of the floating-point doubles (singles) in registers fs and ft and put it in register fd.
0x11 0 ft fs fd 2
Floating-Point Instructions – cont.
Negate double
neg.d fd,fs 6 5 5 5 5 6
0x11 1 0 fs fd 7
Negate single
neg.s fd,fs 6 5 5 5 5 6
Negate the floating-point double (single) in register fs and put it in register fd.
0x11 0 0 fs fd 7
Store floating-point double
s.d fdest,address pseudoinstruction
Store floating-point single
s.s fdest,address pseudoinstruction
Store the floating-point double (single) in register fdest at address.
Floating-Point Instructions – cont.
Floating-point subtract double
sub.d fd,fs,ft 6 5 5 5 5 6
0x11 1 ft fs fd 1
Floating-point subtract single
sub.s fd,fs,ft 6 5 5 5 5 6
Compute the difference of the floating-point double (single) in registers fs and ft and put it in register fd.
0x11 0 ft fs fd 1
Exception and Interrupt Instructions
Return from exception
rfe 6 1 19 6
Restore the Status register.
0x10 1 0 0x20
System call
syscall 6 20 6
Register $v0 contains the number of the system call.
0 0 0xc
Break
break 6 20 6
Cause exception code. Exception 1 is reserved for the debugger.
0 code 0xd
No operation
nop 6 5 5 5 5 6
Do nothing.
0 0 0 0 0 0