Single-Cycle CPU DataPath

26

description

Single-Cycle CPU DataPath. Building A CPU. We’ve built a small ALU Add, Subtract, SLT, And, Or Could figure out Multiply and Divide. What about the rest How do we deal with memory and registers? What about control operations (branches)? How do we interpret instructions?. - PowerPoint PPT Presentation

Transcript of Single-Cycle CPU DataPath

Page 1: Single-Cycle CPU  DataPath
Page 2: Single-Cycle CPU  DataPath

Building A CPU

• We’ve built a small ALU

• Add, Subtract, SLT, And, Or• Could figure out Multiply and Divide...

5.1

• What about the rest

• How do we deal with memory and registers?

• What about control operations (branches)?

• How do we interpret instructions?

• The whole thing...

• A CPU’s datapath deals with moving data around

• A CPU’s control manages the data

Page 3: Single-Cycle CPU  DataPath

Datapath Overview

5.1

InstructionMemory

RegistersData Memory

Read reg. num A

Read reg. num B

Write reg num

Write reg data

Read reg data A

Read reg dataB

Read address

Instruction [31-0]

Read address

Write address

Write data

Read dataResult

PC

Instructions: R-type: 3 registers I-type: 2 registers, Data

Instructions: R-type: 3 registers I-type: 2 registers, Data

ALU Computes on: R-type: 2 registers I-type: Register and data

ALU Computes on: R-type: 2 registers I-type: Register and data

Data to write intodest. register from: ALU or Memory

Data to write intodest. register from: ALU or Memory

Memory: Address from ALU Data to/from regs

Memory: Address from ALU Data to/from regs

Current Instruction: PCCurrent Instruction: PC

Page 4: Single-Cycle CPU  DataPath

Instruction Datapath

5.2

InstructionMemory

Read address

InstructionPC

Add

4

• Instructions will be held in the instruction memory

• The instruction to fetch is at the location specified by the PC

• Instr. = M[PC]

Note: Regular instruction width(32 for MIPS) makes this easy

Note: Regular instruction width(32 for MIPS) makes this easy

• After we fetch one instruction, the PC must be incremented to the next instruction

• All instructions are 4 bytes

• PC = PC + 4

Page 5: Single-Cycle CPU  DataPath

R-type Instruction Datapath

5.2

Read reg. num A

RegistersRead reg num B

Write reg num

Write reg data

Read reg data A

Read reg data B

Result

Zero

ALU

Instruction

• R-type Instructions have three registers

• Two read (Rs, Rt) to provide data to the ALU

• One write (Rd) to receive data from the ALU

• We’ll need to specify the operation to the ALU (later...)

• We might be interested if the result of the ALU is zero (later...)

Read reg num A

Page 6: Single-Cycle CPU  DataPath

Memory Operations

5.2

Data MemoryRead address

Write address

Write data

Read dataResult

Zero

signextend

16 32

• Memory operations first need to compute the effective address

• LW $t1, 450($s3) # E.A. = 450 + $s3

• Add together one register and 16 bits of immediate data

• Immediate data needs to be converted from 16-bit to 32-bit

• Memory then performs load or store using destination register

Read reg. num A

RegistersRead reg num B

Write reg num

Write reg data

Read reg data A

Read reg data B

Read reg num A

Instruction

Page 7: Single-Cycle CPU  DataPath

Branches

5.2

AddResult

Sh.Left2

Result

Zero

signextend

16 32

PC + 4

To controllogic

Instruction

• Branches conditionally change the next instruction

• BEQ $2, $1, 42

• The offset is specified as the number of words to be added to the next instruction (PC+4)

Read reg. num A

RegistersRead reg num B

Write reg num

Write reg data

Read reg data A

Read reg data B

Read reg num A

• Control logic has to decide if the branch is taken

• Uses ‘zero’ output of ALU

• Take offset, multiply by 4

• Shift left two

• Add this to PC+4 (from PC logic)

offset

Page 8: Single-Cycle CPU  DataPath

Integrating the R-types and Memory

5.3

• R-types and Load/Stores are similar in many respects

• Differences:

• 2nd ALU source: R-types use register, I-types use Immediate

• Write Data: R-types use ALU result, I-types use memory

• Mux the conflicting datapaths together

Data MemoryRead address

Write address

Write data

Read dataResult

Zero

signextend

16 32

Read reg. num A

RegistersRead reg num B

Write reg num

Write reg data

Read reg data A

Read reg data B

Read reg num A

Instruction

0

1

1

0

MemoryDatapath

Page 9: Single-Cycle CPU  DataPath

Adding the instruction memory

5.3

InstructionMemory

Add

4

Read address

Instruction [31-0]

Result

PC

Simply add the instruction memoryand PC to the beginning of the datapath.

Separate Instruction and Data memories are needed in order to allowthe entire datapath to complete its job in a single clock cycle.

Data MemoryRead address

Write address

Write data

Read dataResult

Zero 1

00

1

signextend

16 32

Read reg. num A

RegistersRead reg num B

Write reg num

Write reg data

Read reg data A

Read reg data B

Read reg num A

Page 10: Single-Cycle CPU  DataPath

Adding the Branch Datapath

5.3

InstructionMemory

Add

4

Read address

Instruction [31-0]

Result

PCData MemoryRead address

Write address

Write data

Read dataResult

Zero 1

00

1

signextend

16 32

Read reg. num A

RegistersRead reg num B

Write reg num

Write reg data

Read reg data A

Read reg data B

Read reg num A

AddResult

Sh.Left2

0

1

Now we have the datapath for R-type, I-type, and branch instructions.

On to the control logic!

Page 11: Single-Cycle CPU  DataPath

When does everything happen?

5.3

InstructionMemory

Data Memory

AddAdd

4

Read address

Instruction [31-0]

Read address

Write address

Write data

Read dataResult

Zero

Result

Result Sh.Left2

0

1

1

00

1

signextend

PC

16 32

Read reg. num A

RegistersRead reg num B

Write reg num

Write reg data

Read reg data A

Read reg data B

Read reg num A

Combinational Logic:Just does it! Outputs are always just a function of its inputs (with some delay)

Registers: Written at the end of the clock cycle. (Rising edge triggered).

clk

clk

clk

Single-Cycle Design

Page 12: Single-Cycle CPU  DataPath

Example

• Suppose it takes: • memory 100 nsec to read a word, • the ALU and adders take 4 nsec, • the register file can be read or written in 1 nsec, • the PC can be read or written in 0.2 nsec, • all multiplexors take 0.1 nsec. • Assume everything else takes 0 time (control, shift,

sign extend, wires, etc.). • How long will it take to execute an add instruction? • How long will it take to execute a lw instruction? • How long will it take to execute a beq instruction? • How long will it take to execute a j instruction?

Page 13: Single-Cycle CPU  DataPath
Page 14: Single-Cycle CPU  DataPath

What do we need to control?

5.3

InstructionMemory

Data Memory

AddAdd

4

Read address

Instruction [31-0]

Read address

Write address

Write data

Read dataResult

Zero

Result

Result Sh.Left2

0

1

1

00

1

signextend

PC

16 32

ALU -What is theOperation?

ALU -What is theOperation?

Memory-Read/Write/neither?

Memory-Read/Write/neither?

Mux - are webranching or not?

Mux - are webranching or not?

Mux - Wheredoes 2nd ALUoperand come from?

Mux - Wheredoes 2nd ALUoperand come from?

Registers-Should we write data?

Registers-Should we write data? Mux - Result from

ALU or Memory?

Mux - Result fromALU or Memory?

Almost all of the information we need is in the instruction!

Read reg. num A

RegistersRead reg num B

Write reg num

Write reg data

Read reg data A

Read reg data B

Read reg num A

Page 15: Single-Cycle CPU  DataPath

The ALU• The ALU is stuck right in the middle of everything...

• It must:

• Add, Subtract, And, or Or for arithmetic instructions

• Subtract for a branch on equal

• Subtract and set for a SLT

• Add for a memory access

5.3

0

1

A

Operation

Result

+ 2B

CarryIn

CarryOut

0

1

BInvert

3Less

Function BInvert Op Carryin Result

And 0 00 0 R = A • BOr 0 01 0 R = A BAdd 0 10 0 R = A + BSubtract 1 10 1 R = A - BSLT 1 11 1 R = 1 if A < B

0 if A B

Function BInvert Op Carryin Result

And 0 00 0 R = A • BOr 0 01 0 R = A BAdd 0 10 0 R = A + BSubtract 1 10 1 R = A - BSLT 1 11 1 R = 1 if A < B

0 if A B

Always the same: Combine into one signal called “sub”

Page 16: Single-Cycle CPU  DataPath

Setting the ALU controls• The instruction Opcode and Function give us the info we need

• For R-type instructions, Opcode is zero, function code determines ALU controls

5.3

Instruction Opcode ALUOp Funct. Code ALU action ALU controlsub op

add R-type 10 100000 add 0 10 sub R-type 10 100010 subtract 1 10and R-type 10 100100 and 0 00or R-type 10 100101 or 0 01SLT R-type 10 101010 SLT 1 11

New control signal: ALUOp is 00 for memory, 01 for Branch, and 10 for R-typeNew control signal: ALUOp is 00 for memory, 01 for Branch, and 10 for R-type

• For I-type instructions, Opcode determines ALU controls

load word LW 00 xxxxxx add 0 10store word SW 00 xxxxxx add 0 10branch equal BEQ 01 xxxxxx subtract 1 10

Page 17: Single-Cycle CPU  DataPath

Controlling the ALU

5.3

ALUOp F5 F4 F3 F2 F1 F0 Function ALU Ctrl

00 x x x x x x Add 0 10x1 x x x x x x Sub 1 101x x x 0 0 0 0 Add 0 101x x x 0 0 1 0 Sub 1 101x x x 0 1 0 0 And 0 001x x x 0 1 0 1 Or 0 011x x x 1 0 1 0 SLT 1 11

Since ALUOp can only be 00, 01, or 10, we don’t care what ALUOp2 is when ALUOP1 is 1

Since ALUOp can only be 00, 01, or 10, we don’t care what ALUOp2 is when ALUOP1 is 1

For ALUOp = 00 or 01, function code is unused

For ALUOp = 00 or 01, function code is unused

AluOp is determined by Opcode -separate logic will generate ALUOp

A 6-input truth table - use standard minimization techniques

A 6-input truth table - use standard minimization techniques

ALUOp1

ALUOp0

F0

F3

F1

F2

A0

A1

A2

Page 18: Single-Cycle CPU  DataPath

Decoding the Instruction - DataThe instruction holds the key to all of the data signals

Writereg./Readreg. B

R-type

Memory,Branch

Opcode RS RT RD ShAmt Function

31-26 25-21 20-16 15-11 10-6 5-0

Opcode RS RT Immediate Data

31-26 25-21 20-16 15-0

To ctrllogic

Readreg. A

Memory address or Branch Offset

To ctrllogic

Readreg. A

Readreg. B

Writereg.

To ALUControl

Not Used

One problem - Write register number must come from two different places.

5.3

Page 19: Single-Cycle CPU  DataPath

Instruction Decoding

5.3

InstructionMemory

Data Memory

AddAdd

4

Read address

Instruction [31-0]

Read address

Write address

Write data

Read dataResult

Zero

Result

Result Sh.Left2

0

1

1

00

1

signextend

PC

16 32

Read reg. num A

RegistersRead reg num B

Write reg num

Write reg data

Read reg data A

Read reg data B

Read reg num A

Imm:[15-0]

Rs:[25-21]

Rt:[20-16]

Rd:[15-11]

Op:[31-26]Ctrl

Read Reg A: Rs

Read Reg B: Rt

Write Reg: Either Rd or Rt

Immediate Data: [15-0]

Opcode: [31-26]

0

1

We can decode the data simply by dividing up the instruction bus

Page 20: Single-Cycle CPU  DataPath

Control Signals

5.3

InstructionMemory

Data Memory

AddAdd

4

Read address

Instruction [31-0]

Read address

Write address

Write data

Read dataResult

Zero

Result

Result Sh.Left2

0

1

1

00

1

signextend

PC

16 32

Read reg. num A

RegistersRead reg num B

Write reg num

Write reg data

Read reg data A

Read reg data B

Read reg num A

ALUCtrl

6 ALUOp

ALU Control - A function of: ALUOp and the function code

RegWrite

MemToReg

MemWrite

MemRead

ALUSrc

PCSrc

Load

StoreLoad

Memory

Load,R-type BEQ and zero

00: Memory01: Branch10: R-type

0

1

Ctrl

Imm:[15-0]

Rs:[25-21]

Rt:[20-16]

Rd:[15-11]

Op:[31-26]

FC:[5-0]

RegDest

R-type

Page 21: Single-Cycle CPU  DataPath

Inside the control oval

Reg ALU Mem Reg Mem MemInstruction Opcode Write Src To Reg Dest Read Write PCSrc ALUOp

5.3

• This control logic can be decoded in several ways:

• Random logic, PLA, PAL

• Just build hardware that looks for the 4 opcodes

• For each opcode, assert the appropriate signals

Note: BEQ must also check the zero output of the ALU...Note: BEQ must also check the zero output of the ALU...

BEQ 000100 0 0 x x 0 0 1 01

R-format 000000 1 0 0 1 0 0 0 10

LW 100011 1 1 1 0 1 0 0 00

SW 101011 0 1 x x 0 1 0 00

0:Rt1:Rd

0:Reg1:Imm

1:Mem0:ALU

1:Branch

00:Mem01:Branch10:R-type

Page 22: Single-Cycle CPU  DataPath

Control Signals

5.3

InstructionMemory

Data Memory

AddAdd

4

Read address

Instruction [31-0]

Read address

Write address

Write data

Read dataResult

Zero

Result

Result Sh.Left2

0

1

1

00

1

signextend

PC

16 32

Read reg. num A

RegistersRead reg num B

Write reg num

Write reg data

Read reg data A

Read reg data B

Read reg num A

ALUCtrl

6

ALUOp

RegWrite

MemToReg

MemWriteMemRead

ALUSrc

PCSrc

0

1

Ctrl

Imm:[15-0]

Rs:[25-21]

Rt:[20-16]

Rd:[15-11]

Op:[31-26]

FC:[5-0]

RegDest

BEQ

ReadWrite

We must ANDBEQ and Zero

Page 23: Single-Cycle CPU  DataPath

Jumping

5.3

InstructionMemory

Data Memory

AddAdd

4

Read address

Instruction [31-0]

Read address

Write address

Write data

Read dataResult

Zero

Result

Result Sh.Left2

0

1

1

00

1

signextend

PC

16 32

Read reg. num A

RegistersRead reg num B

Write reg num

Write reg data

Read reg data A

Read reg data B

Read reg num A

ALUCtrl

6

ALUOp

RegWrite

MemToReg

MemWriteMemRead

ALUSrc

PCSrc

0

1

Ctrl

Imm:[15-0]

Rs:[25-21]

Rt:[20-16]

Rd:[15-11]

Op:[31-26]

FC:[5-0]

RegDest

BEQ

ReadWrite

1

0

Sh.Left2

J:[25-0]

Concat.26

4

32

28

[31-28]

Jump

Page 24: Single-Cycle CPU  DataPath

PerformanceWhat major functional units are used by different instructions?

R-type: Instr. Fetch Register Read ALU Register Write

Assume the following times:

Memory Access: 2ns

ALU: 2ns

Registers: 1ns

6ns

8ns

7ns

5ns

2ns

Branch: Instr. Fetch Register Read ALU

LW: Instr. Fetch Register Read ALU Memory Read Register Write

SW: Instr. Fetch Register Read ALU Memory Write

Jump: Instr. Fetch

Since the longest time is 8ns (LW),the cycle time must be at least 8ns.

Page 25: Single-Cycle CPU  DataPath

Example

• Calculate the execution times for the following program in a Single-cycle datapath with a cycle time of 50 ns

main:

add $9, $0, $0 # clear $9

lw $8, Tonto($9) # put Tonto[0] in $8

addi $9, $9, 4 # increment $9

lw $10, Tonto($9) # put Tonto[1] in $10

add $11, $10, $8

Page 26: Single-Cycle CPU  DataPath

Example 2

Calculate the execution times for the following program in a Single-cycle datapath with a cycle time of 50 ns

.dataARRAY: .word 3, 5, 7, 9, 2 #random valuesSUM: .word 0 #initialize sum to zero

.textmain: addi $6, $0, 5 #initialize loop counter to 5

addi $7, $0, 0 #initialize array index to zeroaddi $8, $0, 0 #set $8 (sum temp) to zero

REPEAT: lw $5, ARRAY($7) #R5 = ARRAY[i]add $8, $8, $5 #SUM+= ARRAY[I]addi $7, $7, 4 #increment index (i++)addi $6, $6, -1 #decrement loop counterbne $6, $0, REPEAT #check if 5 repetitionssw $8, SUM($0) #copy sum to memoryaddi $v0, $0, 10 #exit programsyscall