QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4...

52
1 QtMips – Simulator for Computer Architectures Education QtMips – Simulator for Education Karel Kočí, Pavel Píša, Michal Štepanovský [*1] https://github.com/cvut/QtMips/ [*2] https://cw.fel.cvut.cz/wiki/courses/b35apo/en/start Czech Technical University in Prague CPU Core, Pipeline and Cache Visualization [*1] for Computer Architecture Courses [*2]

Transcript of QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4...

Page 1: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

1QtMips – Simulator for Computer Architectures Education

QtMips – Simulator for Education

Karel Kočí, Pavel Píša, Michal Štepanovský[*1] https://github.com/cvut/QtMips/[*2] https://cw.fel.cvut.cz/wiki/courses/b35apo/en/start

Czech Technical University in Prague

CPU Core, Pipeline and Cache Visualization [*1] for Computer Architecture Courses [*2]

Page 2: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

2QtMips – Simulator for Computer Architectures Education

QtMips – MIPS Architecture Emulator

CPU core view● single cycle● pipelined

Registers

Code

Peripherals

Cache

Terminal

Data memory

Terminal

Editor

Assembler

Exceptionscontrol

MakeSingle StepRunLoad

Page 3: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

3QtMips – Simulator for Computer Architectures Education

QtMips – Download

● Windows, Linux, Mac https://github.com/cvut/QtMips/releases

● Ubuntu

https://launchpad.net/~ppisa/+archive/ubuntu/qtmips● Suse, Fedora and Debian

https://software.opensuse.org//download.html?project=home%3Appisa&package=qtmips

● Suse Factory

https://build.opensuse.org/package/show/Education/qtmips● Online version

http://cmp.felk.cvut.cz/~pisa/apo/qtmips/qtmips_gui.html● MIPS-ELF binutils and GCC for Linux, MAC OS and

Windows

http://cmp.felk.cvut.cz/~pisa/apo/qtmips/

Page 4: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

4QtMips – Simulator for Computer Architectures Education

QtMips – Origin and Development

● MipsIt used in past for Computer Architecture course at the Czech Technical University in Prague, Faculty of Electrical Engineering

● Diploma theses of Karel Kočí mentored by Pavel Píša

Graphical CPU Simulator with Cache Visualizationhttps://dspace.cvut.cz/bitstream/handle/10467/76764/F3-DP-2018-Koci-Karel-diploma.pdf

● Switch to QtMips in the 2019 summer semester● Fixes, extension and partial internals redesign by Pavel Píša

● Alternatives:● SPIM/QtSPIM: A MIPS32 Simulator

http://spimsimulator.sourceforge.net/● MARS: IDE with detailed help and hints

http://courses.missouristate.edu/KenVollmar/MARS/index.htm● EduMIPS64: 1x fixed and 3x FP pipelines

https://www.edumips.org/

Page 5: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

5QtMips – Simulator for Computer Architectures Education

Compilation: C Assembler Machine Code

int pow = 1;int x = 0; while(pow != 128){ pow = pow*2; x = x + 1;}

addi s0, $0, 1 // pow = 1

addi s1, $0, 0 // x = 0

addi t0, $0, 128 // t0 = 128

while:

beq s0, t0, done // if pow==128, go to done

sll s0, s0, 1 // pow = pow*2

addi s1, s1, 1 // x = x+1

j while

done:

Page 6: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

6QtMips – Simulator for Computer Architectures Education

Hardware realization of basic (main) CPU cycle

Program counter, 32 b

Instruction memory

instruction, 32 bits

constant 4

Instruction address

Next instruction address

Page 7: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

7QtMips – Simulator for Computer Architectures Education

The goal of this lecture

● To understand the implementation of a simple computer consisting of CPU and separated instruction and data memory

● Our goal is to implement following instructions:● Read and write a value from/to the data memorylw – load word, sw – store word

● Arithmetic and logic instructionsadd, sub, and, or, slt

● Program flow change/jump instruction beq● CPU will consist of control unit and ALU.● Notes:

● The implementation will be minimal (single cycle CPU – all operations processed in the single step/clock period)

● The lecture 4 focuses on more realistic pipelined CPU implementation

Page 8: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

8QtMips – Simulator for Computer Architectures Education

The instruction format and instruction types

● The three types of the instructions are considered:

● the R type instructions → opcode=000000, funct – operation ● rs – source, rd – destination, rt – source/destination● shamt – for shift operations, immediate – direct operand

● 5 bits allows to encode 32 GPRs ($0 is hardwired to 0/discard)

Type 31… 0

R opcode(6), 31:26 rs(5), 25:21 rt(5), 20:16 rd(5), 15:11 shamt(5) funct(6), 5:0

I opcode(6), 31:26 rs(5), 25:21 rt(5), 20:16 immediate (16), 15:0

J opcode(6), 31:26 address(26), 25:0

Page 9: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

9QtMips – Simulator for Computer Architectures Education

Opcode encoding

Instruction Opcode Func Operation ALU function ALU control

lw 100011 XXXXXX load word add 0010

sw 101011 XXXXXX store word add 0010

beq 000100 XXXXXX branch equal subtract 0110

add 000000R-type

100000 add add 0010

sub 100010 subtract subtract 0110

and 100100 AND AND 0000

or 100101 OR OR 0001

slt 101010 set-on-less-than set-on-less-than 0111

Decode opcode to the ALU operation●Load/Store (I-type): F = add – add offset to the address base●Branch (I-type): F = subtract – used to compare operands●R-type: F depends on funct fieldThere are more I-type operations which use ALU in the real MIPS ISA

Page 10: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

10QtMips – Simulator for Computer Architectures Education

CPU building blocks

Instr. Memory(ROM)

A RD32 32PC’ PC

32 32

CLK5

Reg. File

A1

A2A3

WE3RD1

RD2

WD3

55

32

32

CLK

32

Data Memory

A RD

WD

WE32

32

32

CLK

Write at the rising edge of CLK when WE = 1

Read after “enough time” for data propagationMultiplexer

Page 11: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

11QtMips – Simulator for Computer Architectures Education

The load word instruction

Description A word is loaded into a register from the specified address

Operation: $t = MEM[$s + offset];

Syntax: lw $t, offset($s)

Encoding: 1000 11ss ssst tttt iiii iiii iiii iiii

lw – load word – load word from data memory into a register

Example: Read word from memory address 0x4 into register number 11:lw $11, 0x4($0)

1000 11ss ssst tttt iiii iiii iiii iiii1000 1100 0000 1011 0000 0000 0000 0100

0 11 4

0x 8C 0B 00 04 – machine code for instruction lw $11, 0x4($0)Note: Register $0 is hardwired to the zero

Page 12: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

12QtMips – Simulator for Computer Architectures Education

Single cycle CPU – implementation of the load instruction

PC’ PC Instr 25:21

15:0

SrcA

SrcB

Zero

AluOut

SignImm

ReadDataInstr. Memory

A RD

Data Memory

A RD

WD

WE

Reg. File

A1 RD1

A2 RD2A3WD3

WE3

20:16

Sign Ext

ALU

I opcode(6), 31:26 rs(5), 25:21 rt(5), 20:16 immediate (16), 15:0

ALUControl

lw: type I, rs – base address, imm – offset, rt – register where to store fetched data

Page 13: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

13QtMips – Simulator for Computer Architectures Education

Single cycle CPU – implementation of the load instruction

PC’ PC Instr 25:21

15:0

SrcA

SrcB

Zero

AluOut

SignImm

ReadDataInstr. Memory

A RD

Data Memory

A RD

WD

WE

Reg. File

A1 RD1

A2 RD2A3WD3

WE3

20:16

Sign Ext

ALU

I opcode(6), 31:26 rs(5), 25:21 rt(5), 20:16 immediate (16), 15:0

lw: type I, rs – base address, imm – offset, rt – register where to store fetched data

Write at the rising edge of the clock

ALUControlRegWrite = 1

Page 14: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

14QtMips – Simulator for Computer Architectures Education

Single cycle CPU – implementation of the load instruction

PC’ PC Instr 25:21

15:0

SrcA

SrcB

Zero

AluOut

SignImm

ReadDataInstr. Memory

A RD

Data Memory

A RD

WD

WE

Reg. File

A1 RD1

A2 RD2A3WD3

WE3

20:16

Sign Ext

ALU

I opcode(6), 31:26 rs(5), 25:21 rt(5), 20:16 immediate (16), 15:0

ALUControlRegWrite = 1

4

PCPlus4

+

lw: type I, rs – base address, imm – offset, rt – register where to store fetched data

Page 15: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

15QtMips – Simulator for Computer Architectures Education

Single cycle CPU – implementation of the store instruction

PC’ PC Instr 25:21

15:0

SrcA

SrcB

Zero

AluOut

SignImm

ReadDataInstr. Memory

A RD

Data Memory

A RD

WD

WE

Reg. File

A1 RD1

A2 RD2A3WD3

WE3

20:16

Sign Ext

ALU

I opcode(6), 31:26 rs(5), 25:21 rt(5), 20:16 immediate (16), 15:0

sw: type I, rs – base address, imm – offset, rt – select register to store into memory

ALUControlRegWrite = 0

4

PCPlus4

+

MemWrite = 1

20:16

Page 16: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

16QtMips – Simulator for Computer Architectures Education

Single cycle CPU – implementation of the add instruction

PC’ PC Instr 25:21

15:0

SrcA

SrcB

Zero

AluOut

SignImm

ReadDataInstr. Memory

A RD

Data Memory

A RD

WD

WE

Reg. File

A1 RD1

A2 RD2A3WD3

WE3

Sign Ext

ALU

add: type R, rs, rt – source, rd – destination, funct – select ALU operation = add

ALUControlRegWrite = 1

4

PCPlus4

+

20:16

R opcode(6), 31:26 rs(5), 25:21 rt(5), 20:16 rd(5), 15:11 shamt(5) funct(6), 5:0

WriteReg01

20:16

15:11

RegDst = 1

ALUSrc = 0

Result01

MemToReg = 0

WriteDataRt

Rd

01

Page 17: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

17QtMips – Simulator for Computer Architectures Education

Single cycle CPU – sub, and, or, slt

PC’ PC Instr 25:21

15:0

SrcA

SrcB

Zero

AluOut

SignImm

ReadDataInstr. Memory

A RD

Data Memory

A RD

WD

WE

Reg. File

A1 RD1

A2 RD2A3WD3

WE3

Sign Ext

ALU

Only difference is another ALU operation selection (ALUcontrol). The data path is the same as for add instruction

ALUControlRegWrite = 1

4

PCPlus4

+

20:16

WriteReg01

20:16

15:11

RegDst = 1

ALUSrc = 0

Result01

MemToReg = 0

WriteDataRt

Rd

01

Page 18: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

18QtMips – Simulator for Computer Architectures Education

Single cycle CPU – implementation of beq

PC’ PC Instr 25:21

15:0

SrcA

SrcB

Zero

AluOut

SignImm

ReadDataInstr. Memory

A RD

Data Memory

A RD

WD

WE

Reg. File

A1 RD1

A2 RD2A3WD3

WE3

Sign Ext

ALU

beq – branch if equal; imm–offset; PC´ = PC+4 + SignImm*4

ALUControlRegWrite = 0

4

PCPlus4

+

20:16

WriteReg01

20:16

15:11

RegDst = X

ALUSrc = 0

Result01

MemToReg = x

WriteData

Branch = 1

+

01

<<2

I opcode(6), 31:26 rs(5), 25:21 rt(5), 20:16 immediate (16), 15:0

Rt

Rd

01

Page 19: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

19QtMips – Simulator for Computer Architectures Education

Single cycle CPU – Throughput: IPS = IC / T = IPCstr

.fCLK

● What is the maximal possible frequency of the CPU?● It is given by latency on the critical path – it is lw instruction in our case:

Tc = t

PC + t

Mem + t

RFread + t

ALU + t

Mem + t

Mux + t

RFsetup

PC’ PC Instr 25:21

15:0

SrcA

SrcB

Zero

AluOut

SignImm

ReadDataInstr. Memory

A RD

Data Memory

A RD

WD

WE

Sign Ext

ALU

4

PCPlus4

+

20:16

WriteReg01

20:16

15:11

Result01

WriteData

+

01

<<2

Rt

Rd

Reg. File

A1 RD1

A2 RD2A3WD3

WE3

01

Page 20: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

20QtMips – Simulator for Computer Architectures Education

Single cycle CPU – Throughput: IPS = IC / T = IPCstr

.fCLK

● What is the maximal possible frequency of the CPU?● It is given by latency on the critical path – it is lw instruction in

our case: T

c = t

PC + t

Mem + t

RFread + t

ALU + t

Mem + t

Mux + t

RFsetup

Consider following parameters:● t

PC= 30 ns

● tMem

= 300 ns● t

RFread= 150 ns

● tALU

= 200 ns● t

Mux= 20 ns

● tRFsetup

= 20 ns

Then Tc = 1020 ns → f

CLK max = 980 kHz,

IPS = 980e3 = 980 000 instructions per second

Page 21: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

21QtMips – Simulator for Computer Architectures Education

Notes

● Remember the result, so you can compare it with result for pipelined CPU during lecture 4

● You should compare this with actual 30e9 IPS per core, i.e. total 128 300 MIPS for today high-end CPUs

● How many clever enhancements in hardware and programming/compilers are required for such advance!!!

● After this course you should see behind the first two hills on that road.

● We will continue with control unit implementation and its function

Page 22: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

22QtMips – Simulator for Computer Architectures Education

Single cycle CPU – Control unit

R opcode(6), 31:26 rs(5), 25:21 rt(5), 20:16 rd(5), 15:11 shamt(5) funct(6), 5:0

I opcode(6), 31:26 rs(5), 25:21 rt(5), 20:16 immediate (16), 15:0

J opcode(6), 31:26 address(26), 25:0

Main decoder ALU op decoderALUOp

Opcode funct5 5

23 ALUControl…

Control signals values reflect opcode and funct fields

ALUOp

00 addition

01 subtraction

10 according to funct

11 -not used-

Opcode RegWrite

RegDst ALUSrc ALUOp Branch Mem Write

MemTo Reg

R-type 000000 1 1 0 10 0 0 0

lw 100011 1 0 1 00 0 0 1

sw 101011 0 X 1 00 0 1 X

beq 000100 0 X 0 01 1 0 X

Page 23: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

23QtMips – Simulator for Computer Architectures Education

ALU Control (ALU function decoder)

ALUOp (selector) Funct ALUControl

00 X 010 (add)

01 X 110 (sub)

1X add (100000) 010 (add)

1X sub (100010) 110 (sub)

1X and (100100) 000 (and)

1X or (100101) 001 (or)

1X slt (101010) 111 (set les than)

Page 24: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

24QtMips – Simulator for Computer Architectures Education

The control unit of the single cycle cpu

MemWriteMemToReg

BranchALUControl 2:0ALUScr

RegDest

RegWrite

4

PC’ PC Instr 25:21

20:16

20:16

15:11

15:0

SrcA

SrcB

Zero

AluOut

WriteDataWriteReg

SignImm PCBranch

ReadData

Result

PCPlus4

Rt

Rd

Instr. Memory

A RD

Data Memory

A RD

WD

WE

Reg. File

A1 RD1

A2 RD2A3WD3

WE3

+

+

01

01

01

Sign Ext <<2

01

ALU

31:26

5:0

Control Unit

Opcode

Funct

Page 25: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

25QtMips – Simulator for Computer Architectures Education

Pipelined instructions execution

Suppose that instruction execution can be divided into 5 stages:

IF – Instruction Fetch, ID – Instruction decode (and Operands Fetch), EX – Execute, MEM – Memory Access, WB – Write Back

and = max { i }ki=1, where i is time required for signal propagation (propagation delay) through i-th stage.

IF – setup PC for memory and fetch pointed instruction. Update PC = PC+4

ID – decode the opcode and read registers specified by instruction, check for equality (for possible beq instruction), sign extend offset, compute branch target address for branch case (this is means to extend offset and add PC)

EX – execute function/pass register values through ALU

MEM – read/write main memory for load/store instruction case

WB – write result into RF for instructions of register-register class or instruction load (result source is ALU or memory)

IF ID EX MEM WB

Page 26: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

26QtMips – Simulator for Computer Architectures Education

Instruction-level parallelism - pipelining

● The time to execute n instructions in the k-stage pipeline:

Tk = k. + (n – 1)

● Speedup:

Prerequisite: pipeline is optimally balanced, circuit can arbitrarily divided

IF I1 I2 I3 I4 I5 I6 I7 I8 I9 I10

ID I1 I2 I3 I4 I5 I6 I7 I8 I9

EX I1 I2 I3 I4 I5 I6 I7 I8

MEM I1 I2 I3 I4 I5 I6 I7

ST I1 I2 I3 I4 I5 I6

1 2 3 4 5 6 7 8 9 10

5

Sk=T1

Tk

=nk τ

kτ+(n−1)τlimn→∞

Sk=k

čas

Page 27: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

27QtMips – Simulator for Computer Architectures Education

Instruction-level parallelism - pipelining

● Does not reduce the execution time of individual instructions, effect is just the opposite...

● Hazards:● structural (resolved by duplication), ● data (result of data dependencies: RAW, WAR, WAW)● control (caused by instructions which change PC)...

● Hazard prevention can result in pipeline stall or pipeline flush

● Remark : Deeper pipeline (more stages) results in shorter sequences of gates in each stage which enables to increase the operating frequency of the processor…, but more stages means higher overhead (demand to arrange better instructions into pipeline and result in more significant lag in the case of stall or pipeline flush)

Page 28: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

28QtMips – Simulator for Computer Architectures Education

Instruction-level parallelism – Semantics violations

Data hazard:

ADD R1,R2,R3

SUB R4,R1,R3

Control hazard:

BEQZ R3, M1

ADD R6,R1,R2

instruction 3

instruction 4

M1: ADD R4,R6,R7

IF ID EX MEM WB

IF ID EX MEM WB

IF ID EX MEM WB

IF ID EX MEM WB

IF ID EX MEM WB

IF ID EX MEM WB

IF ID EX MEM WB

Add writes new value to R1

SUB reads incorrect value from R1

Condition and new PC evaluation

PC set to branch target

Should be these instructions fetched (and executed then)?

flow of instructions and expected effect

Page 29: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

29QtMips – Simulator for Computer Architectures Education

Non-pipelined execution

4

PC’ PC Instr 25:21

20:16

20:16

15:11

15:0

SrcA

SrcB

Zero

AluOut

WriteDataWriteReg

SignImm PCBranch

ReadData

Result

PCPlus4

Rt

Rd

Instr. Memory

A RD

Data Memory

A RD

WD

WE

Reg. File

A1 RD1

A2 RD2A3WD3

WE3

+

+

01

01

01

01

Sign Ext <<2

ALU

Page 30: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

30QtMips – Simulator for Computer Architectures Education

Pipelined execution

4

PC’ PC Instr 25:21

20:16

20:16

15:11

15:0

SrcA

SrcB

Zero

AluOutM

WriteDataEWriteRegE

SignImmPCPlus4D

PCBranch

WriteDataM

PCPlus4E

WriteRegM WriteRegW

AluOutW

ReadData

Result

PCPlus4F

Rt

Rd

Instr. Memory

A RD

Data Memory

A RD

WD

WE

Reg. File

A1 RD1

A2 RD2A3WD3

WE3

+

+

01

01

01

01

Sign Ext <<2

Fetch Decode Execute Memory WriteBack

ALU

Page 31: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

31QtMips – Simulator for Computer Architectures Education

MemWriteMemToReg

BranchALUControl 2:0ALUScrRegDest

RegWrite

31:26

5:0

Control Unit

Opcode

Funct

Pipelined execution

4

PC’ PC Instr 25:21

20:16

20:16

15:11

15:0

SrcA

SrcB

Zero

AluOutM

WriteDataEWriteRegE

SignImmPCPlus4D

PCBranch

WriteDataM

PCPlus4E

WriteRegM WriteRegW

AluOutW

ReadData

Result

PCPlus4F

Rt

Rd

Instr. Memory

A RD

Data Memory

A RD

WD

WE

Reg. File

A1 RD1

A2 RD2A3WD3

WE3

+

+

01

01

01

01

Sign Ext <<2

Fetch Decode Execute Memory WriteBack

ALU

Page 32: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

32QtMips – Simulator for Computer Architectures Education

The same design but drawn scaled down…

01

Instruction Memory

A RD

01

10

Data Memory

A RD

WD

WE

<<2

SignExt

+

01

Control unit

RegWriteDMemToRegDMemWriteDALUControlDALUSrcDRegDstDBranchD

RegWriteEMemToRegEMemWriteEALUControlEALUSrcERegDstE

RegWriteMMemToRegMMemWriteM

RegWriteW

MemToRegW

PCSrcM

31:26

5:0

25:21

20:16

20:1615:11

15:0 SignImmD SignImmE

RtDRdD

RtERdE

SrcAE

SrcBE

WriteDataE

WriteRegE 4:0

WriteDataM

ALUOutM

WriteRegM 4:0 WriteRegW 4:0

ALUOutW

ReadDataW

ResultW

PCPlus4D

PCBranchD

PCPlus4F

4

InstrDPC´ PC

Op

Funct

Reg. File

A1 RD1

A2 RD2A3WD3

WE3

+

Zero

BranchE BranchD

ALU

Page 33: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

33QtMips – Simulator for Computer Architectures Education

● Register File – access from two pipeline stages (Decode, WriteBack) – actual write occurs at the first half of the clock cycle, the read in the second half ⇒ there is no hazard for sub $s0 input operand

● RAW (Read After Write) hazard – and (or) requires $s0 in 3 (4)● How can such hazard be prevented without pipeline throughput

degradation?

Cause of the data hazards

Page 34: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

34QtMips – Simulator for Computer Architectures Education

Forwarding to avoid data hazards

● If a result is available (computed) before subsequent instruction(s) requires the value then data hazard can be avoided by forwarding

● Hazard case is indicated when some of source registers in EX stage is the same as destination register in stage MEM or WB

● The register numbers are fed to the Hazard Unit● The RegWrite signal from MEM and WB stage has to be monitored as

well to check that register number on WriteReg lines takes effect – lw / sw etc.

Page 35: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

35QtMips – Simulator for Computer Architectures Education

CPU after previous design steps

01

Instruction Memory

A RD

01

10

Data Memory

A RD

WD

WE

<<2

SignExt

+

01

Control unit

RegWriteDMemToRegDMemWriteDALUControlDALUSrcDRegDstDBranchD

RegWriteEMemToRegEMemWriteEALUControlEALUSrcERegDstE

RegWriteMMemToRegMMemWriteM

RegWriteW

MemToRegW

PCSrcM

31:26

5:0

25:21

20:16

20:1615:11

15:0 SignImmD SignImmE

RtDRdD

RtERdE

SrcAE

SrcBE

WriteDataE

WriteRegE 4:0

WriteDataM

ALUOutM

WriteRegM 4:0 WriteRegW 4:0

ALUOutW

ReadDataW

ResultW

PCPlus4D

PCBranchD

PCPlus4F

4

InstrDPC´ PC

Op

Funct

Reg. File

A1 RD1

A2 RD2A3WD3

WE3

+

Zero

BranchE BranchD

ALU

Page 36: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

36QtMips – Simulator for Computer Architectures Education

Data hazards solved by forwarding

01

Instruction Memory

A RD

01

10

Data Memory

A RD

WD

WE

<<2

SignExt

+

01

Control unit

Hazard unit

RegWriteDMemToRegDMemWriteDALUControlDALUSrcDRegDstDBranchD

RegWriteEMemToRegEMemWriteEALUControlEALUSrcERegDstE

RegWriteMMemToRegMMemWriteM

RegWriteW

MemToRegW

PCSrcM

31:26

5:0

25:21

20:16

25:2120:1615:11

15:0 SignImmD SignImmE

RsDRtDRdD

RsERtERdE

SrcAE

SrcBE

WriteDataE

WriteRegE 4:0

WriteDataM

ALUOutM

WriteRegM 4:0 WriteRegW 4:0

ALUOutW

ReadDataW

ResultW

PCPlus4D

PCBranchD

PCPlus4F

4

InstrDPC´ PC

Op

Funct

ForwardAE

ForwardBE RegWriteM RegWrite

W

Reg. File

A1 RD1

A2 RD2A3WD3

WE3 00

1001

000110

+

ALU

Zero

BranchE BranchD

Page 37: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

37QtMips – Simulator for Computer Architectures Education

● If subsequent instructions require result before it is available in CPU then the pipeline has to be stalled (stall state inserted)

● The stall is mean to solve hazard but affect system throughput● Pipeline stages preceding that one which is affected by the hazard are

stalled until all results required by subsequent instructions are available – results are forwarded to the sink which required their value

Data hazard avoided by pipeline stall

Page 38: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

38QtMips – Simulator for Computer Architectures Education

Data hazard avoided by pipeline stall

● The stall is realized by the holding content of the inter-stage registers (gating their clocks or blocking their latch enable signals)

● Results from colliding stages have to be „discarded“ – certain control signals in CPU (RF or memory write enable, branch gating) are reset (held low)

● Both is achieved by introduction of control signals to hold and/or reset inter-stages registers

Page 39: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

39QtMips – Simulator for Computer Architectures Education

Processor design build till now

01

Instruction Memory

A RD

01

10

Data Memory

A RD

WD

WE

<<2

SignExt

+

01

Control unit

Hazard unit

RegWriteDMemToRegDMemWriteDALUControlDALUSrcDRegDstDBranchD

RegWriteEMemToRegEMemWriteEALUControlEALUSrcERegDstE

RegWriteMMemToRegMMemWriteM

RegWriteW

MemToRegW

PCSrcM

31:26

5:0

25:21

20:16

25:2120:1615:11

15:0 SignImmD SignImmE

RsDRtDRdD

RsERtERdE

SrcAE

SrcBE

WriteDataE

WriteRegE 4:0

WriteDataM

ALUOutM

WriteRegM 4:0 WriteRegW 4:0

ALUOutW

ReadDataW

ResultW

PCPlus4D

PCBranchD

PCPlus4F

4

InstrDPC´ PC

Op

Funct

ForwardAE

ForwardBE RegWriteM RegWrite

W

Reg. File

A1 RD1

A2 RD2A3WD3

WE3 00

1001

000110

+

ALU

Zero

BranchE BranchD

Page 40: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

40QtMips – Simulator for Computer Architectures Education

Processor with data hazards avoided by stall

01

Instruction Memory

A RD

01

10

Data Memory

A RD

WD

WE

<<2

SignExt

+

01

Control unit

Hazard unit

RegWriteDMemToRegDMemWriteDALUControlDALUSrcDRegDstDBranchD

RegWriteEMemToRegEMemWriteEALUControlEALUSrcERegDstE

RegWriteMMemToRegMMemWriteM

RegWriteW

MemToRegW

PCSrcM

31:26

5:0

25:21

20:16

25:2120:1615:11

15:0 SignImmD SignImmE

RsDRtDRdD

RsERtERdE

SrcAE

SrcBE

WriteDataE

WriteRegE 4:0

WriteDataM

ALUOutM

WriteRegM 4:0 WriteRegW 4:0

ALUOutW

ReadDataW

ResultW

PCPlus4D

PCBranchD

PCPlus4F

4

InstrDPC´ PC

Op

Funct

ForwardAE

ForwardBE RegWriteM RegWrite

W

Reg. File

A1 RD1

A2 RD2A3WD3

WE3 00

1001

000110

+

ALU

Zero

BranchE BranchD

EN

Stall F

EN

Stall D

CLR

Page 41: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

41QtMips – Simulator for Computer Architectures Education

Control hazards (branch and jump)

● Result is not known before 4th cycle. Why?

Page 42: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

42QtMips – Simulator for Computer Architectures Education

Control hazards – better to know result earlier…

● If the result of comparison can be evaluated in the 2nd cycle misprediction penalty can be reduced

● But the processing of the comparison at earlier stage can induce new RAW hazards..!!!

Page 43: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

43QtMips – Simulator for Computer Architectures Education

Resolve control hazards by early evaluate and flush

01

Instruction Memory

A RD

+

01

10

Data Memory

A RD

WD

WE

<<2

=

SignExt

+

ALU01

00

1001

000110

Control unit

Hazard unit

RegWriteDMemToRegDMemWriteDALUControlDALUSrcDRegDstDBranchD

RegWriteEMemToRegEMemWriteEALUControlEALUSrcERegDstE

RegWriteMMemToRegMMemWriteM

RegWriteW

MemToRegW

EquaD PCSrcD

31:26

5:0

25:21

20:16

25:2120:1615:11

15:0 SignImmD

SignImmE

RsDRtDRdD

RsERtERdE

SrcAE

SrcBE

WriteDataE

WriteRegE 4:0

WriteDataM

ALUOutM

WriteRegM 4:0 WriteRegW 4:0

ALUOutW

ReadDataW

ResultW

PCPlus4D

PCBranchD

PCPlus4F

4

InstrDPC´ PC

ENCLR

EN

Op

Funct

Stall F Stall DForward

AEForwardBE RegWriteM RegWrite

W

Reg. File

A1 RD1

A2 RD2A3WD3

WE3

CLR

Page 44: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

44QtMips – Simulator for Computer Architectures Education

Resolve RAW hazards by forwarding or stalling

01

Instruction Memory

A RD

+

01

10

Data Memory

A RD

WD

WE

<<2

=

SignExt

+

ALU01

00

1001

000110

01

01

Control unit

Hazard unit

RegWriteDMemToRegDMemWriteDALUControlDALUSrcDRegDstDBranchD

RegWriteEMemToRegEMemWriteEALUControlEALUSrcERegDstE

RegWriteMMemToRegMMemWriteM

RegWriteW

MemToRegW

EquaD PCSrcD

31:26

5:0

25:21

20:16

25:2120:1615:11

15:0 SignImmD

SignImmE

RsDRtDRdD

RsERtERdE

SrcAE

SrcBE

WriteDataE

WriteRegE 4:0

WriteDataM

ALUOutM

WriteRegM 4:0 WriteRegW 4:0

ALUOutW

ReadDataW

ResultW

PCPlus4D

PCBranchD

PCPlus4F

4

InstrDPC´ PC

ENCLR

EN

Op

Funct

Stall F Stall D BranchDForwardBD

ForwardAE

ForwardBE RegWriteM RegWrite

W

Reg. File

A1 RD1

A2 RD2A3WD3

WE3

CLRNo

ActionRequired

Forward /Stall

Stall

Page 45: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

45QtMips – Simulator for Computer Architectures Education

We are finished – pipelined processor is designed

01

Instruction Memory

A RD

+

01

10

Data Memory

A RD

WD

WE

<<2

=

SignExt

+

ALU01

00

1001

000110

01

01

Control unit

Hazard unit

RegWriteDMemToRegDMemWriteDALUControlDALUSrcDRegDstDBranchD

RegWriteEMemToRegEMemWriteEALUControlEALUSrcERegDstE

RegWriteMMemToRegMMemWriteM

RegWriteW

MemToRegW

EquaD PCSrcD

31:26

5:0

25:21

20:16

25:2120:1615:11

15:0 SignImmD

SignImmE

RsDRtDRdD

RsERtERdE

SrcAE

SrcBE

WriteDataE

WriteRegE 4:0

WriteDataM

ALUOutM

WriteRegM 4:0 WriteRegW 4:0

ALUOutW

ReadDataW

ResultW

PCPlus4D

PCBranchD

PCPlus4F

4

InstrDPC´ PC

ENCLR

EN

Op

Funct

Stall F Stall D BranchDForwardBD

ForwardAE

ForwardBE RegWriteM RegWrite

W

CLR

Reg. File

A1 RD1

A2 RD2A3WD3

WE3

Page 46: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

46QtMips – Simulator for Computer Architectures Education

● What is maximal acceptable frequency for the CPU?● Which stage is the slowest one?● The cycle time is determined by the slowest stage● For our case:

Tc = 300 ns --> 3 333 kHz

If the pipeline fill overhead is neglected (i.e. no pipeline stalls and flushes are considered) then ideal IPC = 1.IPS = 1 • 3 333e3 = 3 333 000 instructions per second

● Introduction of the 5-stage pipeline increases performance (throughput) 3 333 000/ 980 000 = 3.4 times! (considering IPC=1)

Pipelined CPU – performance: IPS = IC / T = IPCavg.fCLK

Page 47: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

47QtMips – Simulator for Computer Architectures Education

What is result of the design?

MemWriteMemToReg

BranchALUControl 2:0ALUScrRegDestRegWrite

31:26

5:0

Control Unit

Opcode

Funct

4

PC’ PC Instr25:21

20:16

20:16

15:11

15:0

SrcA

SrcB

Zero

AluOutM

WriteDataWriteReg

SignImmPCPlus4D

PCBranchPCPlus4E

AluOutW

ReadData

Result

PCPlus4F

RtRd

Instr. Memory

A RD

Data Memory

A RD

WD

WE

Reg. File

A1 RD1

A2 RD2A3WD3

WE3

+

+

01

01

01

01

Sign Ext <<2

ALU

Return back to non-pipelined CPU version

Page 48: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

48QtMips – Simulator for Computer Architectures Education

Data Memory

What is result of the design?

MemWriteMemToReg

BranchALUControl 2:0ALUScrRegDestRegWrite

31:26

5:0

Control Unit

Opcode

Funct

4

PC’ PC Instr25:21

20:16

20:16

15:11

15:0

SrcA

SrcB

Zero

AluOutM

WriteDataWriteReg

SignImmPCPlus4D

PCBranchPCPlus4E

Result

PCPlus4F

RtRd

A RDA RD

WD

WE

Reg. File

A1 RD1

A2 RD2A3WD3

WE3

+

+

01

01

01

01

Sign Ext <<2

ALU

ReadData

AluOutW

Control unit(control path)

Data/ALU(data path)

Instr. Memory

A RD

A RD

WD

WE

Return back to non-pipelined CPU version

Memory

Page 49: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

49QtMips – Simulator for Computer Architectures Education

Data Memory

What is result of the design?

Instr. Memory

A RD

A RD

WD

WE

Data-path(ALU, registers)

InstructionPC PCRD A

RD A

WD

Read dataAddress for data

Read/Write

Data to Write

Write enable

Address

Results

Processor

Control unit

Page 50: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

50QtMips – Simulator for Computer Architectures Education

CPU design result – pipelined version

01

Instruction Memory

A RD

+

01

10

Data Memory

A RD

WD

WE

<<2

=

SignExt

+

ALU01

001001

000110

01

01

Control unit

Hazard unit

RegWriteDMemToRegDMemWriteDALUControlDALUSrcDRegDstDBranchD

RegWriteEMemToRegEMemWriteE

ALUControlEALUSrcERegDstE

RegWriteMMemToRegMMemWriteM

RegWriteW

MemToRegW

EquaD PCSrcD

31:26

5:0

25:21

20:16

25:2120:1615:11

15:0 SignImmD

SignImmE

RsDRtDRdD

RsERtERdE

SrcAE

SrcBE

WriteDataE

WriteRegE 4:0

WriteDataM

ALUOutM

WriteRegM 4:0 WriteRegW 4:0

ALUOutW

ReadDataW

ResultW

PCPlus4D

PCBranchD

PCPlus4F

4

InstrDPC´ PC

ENCLR

EN

Op

Funct

Stall F Stall D BranchD ForwardBD

ForwardAE

ForwardBE RegWriteM RegWrite

W

Reg. File

A1 RD1A2 RD2A3WD3

WE3

Page 51: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

51QtMips – Simulator for Computer Architectures Education

Literature and resources

● Hennesy, J. L., Patterson, D. A.: Computer Organization and Design, The HW/SW Interface

● Hennesy, J. L., Patterson, D. A.: Computer Architecture : A Quantitative Approach, Third Edition, San Francisco, Morgan Kaufmann Publishers, Inc., 2002

● Shen, J.P., Lipasti, M.H.: Modern Processor Design : Fundamentals of Superscalar Processors, First Edition, New York, McGraw-Hill Inc., 2004

Page 52: QtMips – Simulator for Education · QtMips – Simulator for Computer Architectures Education 4 QtMips – Origin and Development MipsIt used in past for Computer Architecture course

52QtMips – Simulator for Computer Architectures Education

Motivation and Mottos

● QtMips Home Page https://github.com/cvut/QtMips

Implemented for Computer Architectures https://cw.fel.cvut.cz/wiki/courses/b35apo/start

and Advanced Computer Architectures https://cw.fel.cvut.cz/wiki/courses/b4m35pap/start

courses at Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Control Engineering

● Come and meet with us, robotics, makers automotive etc. projects● Come and teach with us, teaching is the best way to deeper

understanding the subjects, no simulator can generate so much perturbations as students

● Talk is cheap. Show me the code. Linus Torvalds

Reply https://www.openhub.net/accounts/ppisa● Talk is cheap, show me your happiness. Michal Sojka

Reply https://ppisa.rajce.idnes.cz/selected/