CS 35101 Computer Architecture Spring 2008 Week...

32
CS 35101 Ch 5.1 Steinfadt, SP08 KSU CS 35101 Computer Architecture Spring 2008 Week 10: Chapter 5.1-5.3 Materials adapated from Mary Jane Irwin (www. cse . psu . edu/~mji ) and Kevin Schaffer [adapted from D. Patterson slides ]

Transcript of CS 35101 Computer Architecture Spring 2008 Week...

Page 1: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.1 Steinfadt, SP08 KSU

CS 35101Computer Architecture

Spring 2008

Week 10: Chapter 5.1-5.3

Materials adapated from Mary Jane Irwin(www.cse.psu.edu/~mji) and Kevin Schaffer

[adapted from D. Patterson slides]

Page 2: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.2 Steinfadt, SP08 KSU

Head’s Up Last course week’s material

Understanding performance, Ch. 4.1-4.6 This week’s material

Designing a MIPS single cycle datapath- Reading assignment – PH 5.1-5.3

Next week’s material More on single and multi-cycle datapath design

- Reading assignment – PH: 5.4-5.6

Reminders HW 3 is due Thursday, 3/27 by the start of class Project 2 is posted and due on 4/22 Exam #2 is Tuesday, April 15

Page 3: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.3 Steinfadt, SP08 KSU

Datapath design tended to just work … Control pathsare where the system complexity lives. Bugsspawned from control path design errors reside inthe microcode flow, the finite-state machines, and allthe special exceptions that inevitably spring up in amachine design like thistles in a flower garden.

The Pentium Chronicles, Colwell, pg. 64

Page 4: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.4 Steinfadt, SP08 KSU

Review: Design Principles Simplicity favors regularity

fixed size instructions – 32-bits only three instruction formats

Good design demands good compromises three instruction formats

Smaller is faster limited instruction set limited number of registers in register file limited number of addressing modes

Make the common case fast arithmetic operands from the register file (load-store

machine) allow instructions to contain immediate operands

Page 5: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.5 Steinfadt, SP08 KSU

We're ready to look at an implementation of the MIPS Simplified to contain only:

memory-reference instructions: lw, sw arithmetic-logical instructions: add, addu, sub, subu,and, or, xor, nor, slt, sltu

arithmetic-logical immediate instructions: addi, addiu,andi, ori, xori, slti, sltiu

control flow instructions: beq, j Generic implementation:

use the program counter (PC) to supplythe instruction address and fetch theinstruction from memory(and update the PC)

decode the instruction (and read registers) execute the instruction

The Processor: Datapath & Control

FetchPC = PC+4

DecodeExec

Page 6: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.6 Steinfadt, SP08 KSU

Abstract Implementation View Two types of functional units:

elements that operate on data values (combinational) elements that contain state (sequential)

Single cycle operation Split memory model - one memory for instructions

and one for data

Address Instruction

InstructionMemory

Write Data

Reg Addr

Reg Addr

Reg Addr

Register

File ALUData

Memory

Address

Write Data

Read DataPC

Read Data

Read Data

Page 7: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.7 Steinfadt, SP08 KSU

Clocking Methodologies Clocking methodology defines when signals can

be read and when they can be writtenfalling (negative) edge

rising (positive) edgeclock cycle

clock rate = 1/(clock cycle) e.g., 10 nsec clock cycle = 100 MHz clock rate 1 nsec clock cycle = 1 GHz clock rate

State element design choices level sensitive latch master-slave and edge-triggered flipflops

Page 8: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.8 Steinfadt, SP08 KSU

State Elements Set-reset latch

R

S

Q

!Q 0!Q(t)

01

!Q(t+1)

011Q(t)00

110001

Q(t+1)SR

clock

D

Q

!Q

clock

D

Q

Level sensitive D latch

latch is transparent when clock is high (copies input tooutput)

Page 9: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.9 Steinfadt, SP08 KSU

Two-Sided Clock Constraint Race problem with latch based design …

D

clock

Q

!Q

D-latch0D

clock

Q

!Q

D-latch1

clock

Consider the case when D-latch0 holds a 0 and D-latch1 holds a 1 and you want to transfer thecontents of D-latch0 to D-latch1 and vica versa must have the clock high long enough for the transfer to

take place must not leave the clock high so long that the

transferred data is copied back into the original latch Two-sided clock constraint

Page 10: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.10 Steinfadt, SP08 KSU

State Elements, con’t Solution is to use flipflops that change state (Q)

only on clock edge (master-slave)

master (first D-latch) copies the input when the clock ishigh (the slave (second D-latch) is locked in its memorystate and the output does not change)

slave copies the master when the clock goes low (themaster is now locked in its memory state so changes atthe input are not loaded into the master D-latch)

D

clock

Q

!Q

D-latchD

clock

Q

!Q

D-latchQ

!Q

D

clockclock

D

Q

Page 11: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.11 Steinfadt, SP08 KSU

One-Slided Clock Constraint Master-slave (edge-triggered) flipflops removes

one of the clock constraints

D

clock

Q

!Q

MS-ff0D

clock

Q

!Q

MS-ff1

clock

Consider the case when MS-ff0 holds a 0 andMS-ff1 holds a 1 and you want to transfer thecontents of MS-ff0 to MS-ff1 and vica versa must have the clock cycle time long enough to

accommodate the worst case delay path One-sided clock constraint

Page 12: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.12 Steinfadt, SP08 KSU

Latches vs Flipflops Output is equal to the stored value inside the

element Change of state (value) is based on the clock

Latches: output changes whenever the inputs changeand the clock is asserted (level sensitive methodology)

- Two-sided timing constraint Flip-flop: output changes only on a clock edge (edge-

triggered methodology)- One-sided timing constraint

A clocking methodology defines when signals canbe read and written – wouldn’t want to read asignal at the same time it was being written

Page 13: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.13 Steinfadt, SP08 KSU

Our Implementation An edge-triggered methodology, typical execution

read contents of some state elements (combinationalactivity, so no clock control signal needed)

send values through some combinational logic write results to one or more state elements on clock

edge

Stateelement

1

Stateelement

2

Combinationallogic

clock

one clock cycle

Assumes state elements are written on every clockcycle; if not, need explicit write control signal write occurs only when both the write control is asserted

and the clock edge occurs

Page 14: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.14 Steinfadt, SP08 KSU

Fetching Instructions Fetching instructions involves

reading the instruction from the Instruction Memory updating the PC value to be the address of the next

(sequential) instruction

ReadAddress

Instruction

InstructionMemory

Add

PC

4

PC is updated every clock cycle, so it does not need anexplicit write control signal just a clock signal

Reading from the Instruction Memory is a combinationalactivity, so it doesn’t need an explicit read control signal

FetchPC = PC+4

DecodeExec

clock

Page 15: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.15 Steinfadt, SP08 KSU

Instruction Formats Review

5:010:615:1120:1625:2131:26

functshamtrdrtrsop

15:020:1625:2131:26

immedrtrsop

25:031:26

addressop

Page 16: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.16 Steinfadt, SP08 KSU

Decoding Instructions Decoding instructions involves

sending the fetched instruction’s opcode and functionfield bits to the control unit

and

Instruction

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

Read Data 1

Read Data 2

ControlUnit

reading two values from the Register File- Register File addresses are contained in the instruction

FetchPC = PC+4

DecodeExec

Page 17: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.17 Steinfadt, SP08 KSU

Note that both RegFile read ports are active for allinstructions during the Decode cycle using the rs andrt instruction field addresses Since haven’t decoded the instruction yet, don’t know what

the instruction is ! Just in case the instruction uses values from the RegFile do

“work ahead” by reading the two source operands

Which instructions do make use of the RegFile values?

Also, all instructions (except j) use the ALU afterreading the registers

Why? memory-reference? arithmetic? control flow?

Reading Registers “Just in Case”

Page 18: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.18 Steinfadt, SP08 KSU

Executing R Format Operations R format operations (add, sub, slt, and, or)

perform operation (op and funct) on values in rs and rt store the result back into the Register File (into location rd)

Instruction

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

Read Data 1

Read Data 2

ALU

overflowzero

ALU controlRegWrite

R-type:31 25 20 15 5 0

op rs rt rd functshamt

10

Note that Register File is not written every cycle (e.g. sw), sowe need an explicit write control signal for the Register File

FetchPC = PC+4

DecodeExec

Page 19: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.19 Steinfadt, SP08 KSU

Consider slt Instruction R format operations (add, sub, slt, and, or)

perform operation (op and funct) on values in rs and rt store the result back into the Register File (into location rd)

Instruction

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

Read Data 1

Read Data 2

ALU

overflowzero

ALU controlRegWrite

R-type:31 25 20 15 5 0

op rs rt rd functshamt

10

Note that Register File is not written every cycle (e.g. sw), sowe need an explicit write control signal for the Register File

FetchPC = PC+4

DecodeExec

Page 20: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.20 Steinfadt, SP08 KSU

Remember the R format instruction slt slt $t0, $s0, $s1 # if $s0 < $s1

# then $t0 = 1# else $t0 = 0

Consider the slt Instruction

Where does the 1 (or 0) come from to store into $t0 in theRegister File at the end of the execute cycle?

Instruction

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

Read Data 1

Read Data 2

ALU

overflowzero

ALU controlRegWrite

Page 21: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.21 Steinfadt, SP08 KSU

Page 22: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.22 Steinfadt, SP08 KSU

Executing Load and Store Operations Load and store operations have to

compute a memory address by adding the baseregister (in rs) to the 16-bit signed offset field in theinstruction

- base register was read from the Register File duringdecode

- offset value in the low order 16 bits of the instructionmust be sign extended to create a 32-bit signed value

store value, read from the Register File duringdecode, must be written to the Data Memory

load value, read from the Data Memory, must bestored in the Register File

I-Type: op rs rt address offset31 25 20 15 0

Page 23: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.23 Steinfadt, SP08 KSU

Executing Load and Store Operations, con’t

Instruction

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

Read Data 1

Read Data 2

ALU

overflowzero

ALU controlRegWrite

DataMemory

Address

Write Data

Read Data

SignExtend

MemWrite

MemRead

Page 24: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.25 Steinfadt, SP08 KSU

Executing Branch Operations Branch operations have to

compare the operands read from the Register Fileduring decode (rs and rt values) for equality (zeroALU output)

compute the branch target address by adding theupdated PC to the sign extended16-bit signedoffset field in the instruction

- “base register” is the updated PC- offset value in the low order 16 bits of the instruction

must be sign extended to create a 32-bit signedvalue and then shifted left 2 bits to turn it into a wordaddress

I-Type: op rs rt address offset31 25 20 15 0

Page 25: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.26 Steinfadt, SP08 KSU

Executing Branch Operations, con’t

Instruction

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

Read Data 1

Read Data 2

ALU

zero

ALU control

SignExtend16 32

Shiftleft 2

Add

4 Add

PC

Branchtargetaddress

(to branchcontrol logic)

Page 26: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.28 Steinfadt, SP08 KSU

Executing Jump Operations Jump operations have to

replace the lower 28 bits of the PC with the lower 26 bitsof the fetched instruction shifted left by 2 bits

ReadAddress

Instruction

InstructionMemory

Add

PC

4

Shiftleft 2

Jumpaddress

26

4

28

J-Type: op

31 25 0

jump target address

Page 27: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.29 Steinfadt, SP08 KSU

Creating a Single Datapath from the Parts Assemble the datapath elements, add control lines

as needed, and design the control path Fetch, decode and execute each instructions in

one clock cycle – single cycle design no datapath resource can be used more than once per

instruction, so some must be duplicated (e.g., why wehave a separate Instruction Memory and Data Memory)

to share datapath elements between two differentinstruction classes will need multiplexors at the input ofthe shared elements with control lines to do theselection

Cycle time is determined by length of the longestpath

Page 28: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.30 Steinfadt, SP08 KSU

Fetch, R, and Memory Access Portions

ReadAddress

Instruction

InstructionMemory

Add

PC

4

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

Read Data 1

Read Data 2

ALU

ovfzero

ALU controlRegWrite

DataMemory

Address

Write Data

Read Data

MemWrite

MemReadSign

Extend16 32

Page 29: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.31 Steinfadt, SP08 KSU

Multiplexor Insertion

MemtoReg

ReadAddress

Instruction

InstructionMemory

Add

PC

4

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

Read Data 1

Read Data 2

ALU

ovfzero

ALU controlRegWrite

DataMemory

Address

Write Data

Read Data

MemWrite

MemReadSign

Extend16 32

ALUSrc

Page 30: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.32 Steinfadt, SP08 KSU

Clock Distribution

MemtoReg

ReadAddress

Instruction

InstructionMemory

Add

PC

4

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

Read Data 1

Read Data 2

ALU

ovfzero

ALU control

RegWrite

DataMemory

Address

Write Data

Read Data

MemWrite

MemReadSign

Extend16 32

ALUSrc

System Clock

clock cycle

Page 31: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.33 Steinfadt, SP08 KSU

Adding the Branch Portion

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

Read Data 1

Read Data 2

ALU

ovfzero

ALU controlRegWrite

DataMemory

Address

Write Data

Read Data

MemWrite

MemReadSign

Extend16 32

MemtoRegALUSrc

ReadAddress

Instruction

InstructionMemory

Add

PC

4

Page 32: CS 35101 Computer Architecture Spring 2008 Week …ssteinfa/classes/arch.sp08/slides/Ch5/webslides/...CS 35101 Computer Architecture Spring 2008 Week 10: ... We're ready to look at

CS 35101 Ch 5.35 Steinfadt, SP08 KSU

We wait for everything to settle down

ALU might not produce “right answer” right away Memory and RegFile reads are combinational (as are

ALU, adders, muxes, shifter, signextender) Use write signals along with the clock edge to determine

when to write to the sequential elements (to the PC, tothe Register File and to the Data Memory)

The clock cycle time is determined by the logicdelay through the longest path

Our Simple Control Structure

We are ignoring some details like registersetup and hold times