CPEG323 Homework Review II
Long Chen
November 30th, 2005
Homework 4
Problem 1: Terminology
Performance: response time & throughput
Latency
Wall clock time
Weighted CPI
System time
These terms may show up in other courses.
Homework 4 - cont
Problem 2: Can a lower instruction count increase the instruction clock cycle time?
It depends.
Simple instructions do less work than complex instructions.
Simple instructions execute faster than complex instructions.
Simple instructions -> bigger code size -> possibly more cache misses.
Homework 4 - cont
Problem 4: Two different implementations, P1 and P2, of the same instruction set. There are five classes of instructions.
P1's clock rate = 4 GHz, P2's clock rate = 6 GHz
The average number of cycles for each instruction class
Peak performance = the fastest rate at which a computer can execute any instruction sequence
Peak performances of P1 and P2?
Homework 4 - cont
Clearly, for P1, the ideal instruction sequence is the one composed entirely of class A instructions.
Then, the peak performance of P1 is (4G cycles/sec) / (1 cycle/instn) = 4000 MIPS.
Similarly, the peak performance of P2 is 3000 MIPS, with the instruction sequence composed of class A, B, and/or C.
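The peak-performance calculation above can be sketched in a few lines of Python. The per-class cycle table is not reproduced in this transcript, so the best-case cycle counts are assumptions: 1 cycle for P1 is stated above, while 2 cycles for P2 is only inferred from the 3000 MIPS result at 6 GHz. The helper name `peak_mips` is hypothetical.

```python
def peak_mips(clock_hz, best_cpi):
    """Peak rate in MIPS: clock rate divided by the smallest cycles-per-instruction."""
    return clock_hz / best_cpi / 1e6

# Best-case cycle counts: P1's is stated on the slide, P2's is inferred (assumption).
p1_peak = peak_mips(4e9, 1)
p2_peak = peak_mips(6e9, 2)
print(p1_peak, p2_peak)
```

Only the cheapest instruction class matters for peak performance, which is why a single number per processor is enough here.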
Homework 5
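Problem 1
Speedup = (execution time before improvement) / (execution time after improvement)
It takes 100 seconds to complete program P1. Of this time, 15% is used for division and 40% for memory access. If you improve only division, what's the maximal possible speedup you can achieve?
If the division time drops to zero, 85 seconds remain, so the maximal speedup is 1 / (1 - 15%) = 100/85 ≈ 1.1765, i.e. 117.65%.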
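The bound above is an instance of Amdahl's law. A minimal sketch, assuming the improved fraction is sped up by a finite factor that we then let grow; the function name `amdahl_speedup` is chosen here for illustration:

```python
def amdahl_speedup(fraction_improved, factor):
    """Overall speedup when `fraction_improved` of the time is sped up by `factor`."""
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / factor)

# Division takes 15% of the run time; let its speedup factor grow very large.
for factor in (2, 10, 1e9):
    print(factor, round(amdahl_speedup(0.15, factor), 4))

# Limit when the division time goes to zero.
limit = 1.0 / (1.0 - 0.15)
print(round(limit, 4))
```

However fast division becomes, the overall speedup never exceeds the limit, because the other 85 seconds are untouched.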
Homework 5 - cont
Problem 2: Explain how the instruction add $t1, $t2, $t3 is executed in the single-cycle datapath, using figure 5.19 in your textbook.
Four steps:
Instruction fetch (IF)
Instruction decode and reading registers (ID)
ALU operation (EX)
Write the result into the register file (WB)
You should also be familiar with the multicycle case and the pipelined case.
Homework 5 - cont
Problem 3
Add the necessary datapaths to the single-cycle datapath shown in figure 5.17 in the textbook for a new instruction jr (jump register).
Modification:
A datapath that allows the new PC to come from a register (the Read data 1 port)
A new control signal (e.g., JumpReg) that selects the new PC through a multiplexor
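The effect of the added multiplexor can be sketched as a next-PC selection function. This is a simplified model, not the textbook datapath: it assumes only two PC sources (PC + 4 and Read data 1) and leaves out the existing branch and jump paths. The function name `next_pc` is hypothetical.

```python
def next_pc(pc, jump_reg, read_data_1):
    """Select the next PC with a 2-to-1 multiplexor controlled by JumpReg."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF        # normal sequential fetch
    return read_data_1 if jump_reg else pc_plus_4

# JumpReg = 0: fall through; JumpReg = 1: jump to the address held in the register.
print(hex(next_pc(0x00400000, False, 0x00400100)))
print(hex(next_pc(0x00400000, True, 0x00400100)))
```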
Homework 5 - cont
Problem 4
Find the hazard and reorder the instructions to avoid a pipeline stall.
S1: lw $t0, 0($t1)
S2: lw $t2, 4($t1)
S3: sw $t2, 0($t1)
S4: sw $t0, 4($t1)
RAW hazard between S2 and S3: the content of $t2 is not yet available when S3 tries to read $t2. We have to stall the pipeline, even with the help of forwarding.
However, we can reorder the code to avoid it.
Homework 5 - cont
The reordered code:
S1: lw $t0, 0($t1)
S2: lw $t2, 4($t1)
S4: sw $t0, 4($t1)
S3: sw $t2, 0($t1)
The hazard is solved by a clever arrangement of the instructions that still guarantees correctness. Instruction reordering/scheduling is a common compiler technique.
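The load-use check behind this problem can be sketched in Python. It assumes, as the slides do, that any instruction reading a register in the slot right after the lw that loads it needs a stall, including a sw that uses it as store data. The helper names `decode` and `load_use_stalls` are hypothetical, and the parser only handles the lw/sw forms used here.

```python
import re

def decode(instr):
    """Return (op, dest, sources) for the lw/sw forms used on this slide."""
    op, rt, _offset, base = re.match(
        r"(\w+) (\$\w+), (-?\d+)\((\$\w+)\)", instr).groups()
    if op == "lw":
        return op, rt, [base]
    return op, None, [rt, base]      # sw reads both registers and writes none

def load_use_stalls(program):
    """List adjacent pairs where an instruction reads what the lw before it loads."""
    decoded = [decode(i) for i in program]
    return [(program[k], program[k + 1])
            for k in range(len(decoded) - 1)
            if decoded[k][0] == "lw" and decoded[k][1] in decoded[k + 1][2]]

original = ["lw $t0, 0($t1)", "lw $t2, 4($t1)", "sw $t2, 0($t1)", "sw $t0, 4($t1)"]
reordered = ["lw $t0, 0($t1)", "lw $t2, 4($t1)", "sw $t0, 4($t1)", "sw $t2, 0($t1)"]
print(load_use_stalls(original))     # the S2/S3 pair
print(load_use_stalls(reordered))    # no pair left
```

The sketch only detects stalls; it does not prove that a reordering is legal. Here the swap is safe because the two stores write different addresses and both loads still run before both stores.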
Homework 5 - cont
Problem 5: Executing the following code on the pipelined datapath, which registers are being read and written at the end of the fifth cycle of execution?
S1: add $2, $3, $1
S2: sub $4, $3, $5
S3: add $5, $3, $7
S4: add $7, $6, $1
S5: add $8, $2, $6
Homework 5 - cont
[Figure: pipeline diagram of S1 through S5 over clock cycles CC1 to CC9]
So, at the end of the fifth cycle of execution, registers $6 and $1 (of S4) are being read and register $2 (of S1) will be written.
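The answer can be checked by computing which stage each instruction occupies in a given cycle. A minimal sketch, assuming the classic five-stage pipeline, one instruction issued per cycle, and no stalls (this code has no dependence that would force one); the helper `stage_in_cycle` is a hypothetical name.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# (label, destination register, source registers)
program = [
    ("S1", "$2", ["$3", "$1"]),
    ("S2", "$4", ["$3", "$5"]),
    ("S3", "$5", ["$3", "$7"]),
    ("S4", "$7", ["$6", "$1"]),
    ("S5", "$8", ["$2", "$6"]),
]

def stage_in_cycle(index, cycle):
    """Stage occupied in `cycle` (1-based) by the instruction at `index` (0-based)."""
    pos = cycle - 1 - index
    return STAGES[pos] if 0 <= pos < len(STAGES) else None

cycle = 5
reads = writes = None
for index, (label, dest, sources) in enumerate(program):
    stage = stage_in_cycle(index, cycle)
    if stage == "ID":                # registers are read in the decode stage
        reads = (label, sources)
    if stage == "WB":                # the register file is written in write back
        writes = (label, dest)
print(reads, writes)
```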
Homework 5 - cont
Problem 6: How many cycles will it take to execute the code below on the pipelined datapath?
S1: lw $4, 100($2)
S2: sub $6, $4, $3
S3: add $2, $3, $5
S2 tries to read register $4 right after S1, a load instruction that writes the same register.
Forwarding alone cannot help this time; the pipeline has to stall.
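The transcript does not state the final cycle count, so the following is only a sketch of how it could be computed. It assumes the five-stage textbook pipeline with forwarding, where each load-use pair of adjacent instructions costs exactly one stall cycle. The function `total_cycles` is a hypothetical helper.

```python
def total_cycles(program, stages=5):
    """Cycles to run `program`, adding one stall per adjacent load-use pair."""
    stalls = sum(1 for prev, cur in zip(program, program[1:])
                 if prev[0] == "lw" and prev[1] in cur[2])
    # The first instruction needs `stages` cycles; each later one adds a cycle.
    return stages + (len(program) - 1) + stalls, stalls

# (opcode, destination register, source registers)
program = [
    ("lw",  "$4", ["$2"]),
    ("sub", "$6", ["$4", "$3"]),
    ("add", "$2", ["$3", "$5"]),
]
cycles, stalls = total_cycles(program)
print(cycles, stalls)
```

Under these assumptions the three instructions take 5 + 2 cycles without hazards, plus one stall for the lw/sub pair.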
Homework 6
Problem 1: How many bits are required to implement a direct-mapped cache with 64 KB of data and 4-byte blocks, assuming a 32-bit address?
Cache size = 2^16 bytes (64 KB)
Block size = 2^2 bytes (4 B)
Number of cache blocks = 2^(16-2) = 2^14
Each block has 32 bits of data plus a tag, which is (32 - 14 - 2) = 16 bits, plus a valid bit.
Thus, the total cache size is 2^14 * (32 + 16 + 1) = 2^14 * 49 = 784 Kbits
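The same arithmetic can be written as a small Python function. It is a sketch that assumes power-of-two sizes and one valid bit per block, as in the calculation above; the name `direct_mapped_cache_bits` is chosen for illustration.

```python
def direct_mapped_cache_bits(data_bytes, block_bytes, address_bits=32):
    """Total storage bits (data + tag + valid) of a direct-mapped cache."""
    num_blocks = data_bytes // block_bytes
    index_bits = num_blocks.bit_length() - 1      # log2, sizes are powers of two
    offset_bits = block_bytes.bit_length() - 1    # byte offset within a block
    tag_bits = address_bits - index_bits - offset_bits
    bits_per_block = block_bytes * 8 + tag_bits + 1
    return num_blocks * bits_per_block

total = direct_mapped_cache_bits(64 * 1024, 4)
print(total, total // 1024)   # bits, Kbits
```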
Homework 6 - cont
Problem 3: Given a direct-mapped cache with 16 words of data and 4-word blocks, what are the cache misses and hits for a series of memory accesses to the addresses 2, 4, 8, 20, 18, 11, 43, 17?
First, construct the cache.
This is a 4-block cache.
Homework 6 - cont
Read memory at address 2
Which block to look at?
The memory block number: word address DIV words per block
2 / 4 = 0
which maps to cache block number: memory block number MOD number of cache blocks
0 mod 4 = 0
Here, we suppose the memory space is 2^8 bytes
Homework 6 - cont
Repeat the step until all memory references have been processed.
Then, we have a cache with the content below.
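The repeated step can be sketched as a short simulation. It treats the addresses as word addresses and starts from an empty cache, which are the assumptions used in the worked step for address 2. The final table from the slide is not part of this transcript, so the hit/miss pattern here is what this sketch computes, not a copy of the slide. The function name `simulate` is hypothetical.

```python
def simulate(addresses, num_blocks=4, words_per_block=4):
    """Run word addresses through a direct-mapped cache that starts empty."""
    cache = [None] * num_blocks       # memory block number held by each cache block
    results = []
    for addr in addresses:
        mem_block = addr // words_per_block       # word address DIV words per block
        index = mem_block % num_blocks            # memory block MOD number of blocks
        hit = cache[index] == mem_block
        results.append((addr, index, "hit" if hit else "miss"))
        cache[index] = mem_block                  # on a miss the block is replaced
    return results, cache

results, final_cache = simulate([2, 4, 8, 20, 18, 11, 43, 17])
for row in results:
    print(row)
print(final_cache)
```

Under these assumptions, addresses 11 and 17 hit and the other six references miss.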
Homework 6 - cont
Problem 4
Find the hazards in the code and reorder it to avoid pipeline stalls.
lw $t0, 0($t1)
addi $t3, $t0, 4
sw $t3, 0($t1)
lw $t2, 4($t1)
addi $t4, $t2, 4
sw $t4, 4($t1)
This is the same kind of problem as Problem 4 of Homework 5.
Homework 6 - cont
Problem 6
Explain how the instruction lw $t1, 8($t2) is executed in the pipelined datapath, using figure 6.17 in the textbook.
Five stages:
Instruction fetch (IF)
Instruction decode and register file fetch (ID)
Address calculation (EX)
Memory access (MEM)
Write back (WB)
It is important that you be able to explain the details of the execution of a given instruction. For example, for a load, what should be stored in each pipeline register?
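As a study aid, the five stages for lw $t1, 8($t2) can be walked through in Python, with one dictionary per pipeline register. This is a simplified sketch, not the textbook figure: control signals and the unused Read data 2 value are left out, and the register contents, memory contents, and instruction address are hypothetical values chosen for illustration.

```python
regs = {"$t1": 0, "$t2": 0x1000}      # hypothetical register contents
memory = {0x1008: 42}                 # hypothetical data memory word
instr = {"op": "lw", "rt": "$t1", "rs": "$t2", "imm": 8}
pc = 0x00400000                       # hypothetical instruction address

# IF: fetch the instruction and compute PC + 4.
if_id = {"instr": instr, "pc_plus_4": pc + 4}

# ID: read the base register and pass on the sign-extended offset.
id_ex = {"pc_plus_4": if_id["pc_plus_4"],
         "read_data_1": regs[instr["rs"]],
         "imm": instr["imm"],
         "dest": instr["rt"]}

# EX: the ALU adds base and offset to form the memory address.
ex_mem = {"alu_result": id_ex["read_data_1"] + id_ex["imm"],
          "dest": id_ex["dest"]}

# MEM: read data memory at the computed address.
mem_wb = {"mem_data": memory[ex_mem["alu_result"]], "dest": ex_mem["dest"]}

# WB: write the loaded word into the destination register.
regs[mem_wb["dest"]] = mem_wb["mem_data"]
print(hex(ex_mem["alu_result"]), regs["$t1"])
```

Note that the destination register number travels through every pipeline register so that it is still available in WB.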
Good Luck on your quiz!