CPEG323 Homework Review II


Long Chen
November 30th, 2005

Homework 4
Problem 1: Terminology
- Performance: response time & throughput
- Latency
- Wall clock time
- Weighted CPI
- System time

These terms may show up in other courses.

Homework 4 - cont
Problem 2: Can a lower instruction count increase the instruction clock cycle time?
It depends
- Simple instructions do less work than complex instructions do
- Simple instructions execute faster than complex instructions
- Simple instructions -> big code size -> more cache misses, possibly

Homework 4 - cont
Problem 4: Two different implementations, P1 and P2, of the same instruction set. There are five classes of instructions.
P1's clock rate = 4 GHz, P2's clock rate = 6 GHz
The average number of cycles for each instruction class
Peak performance = the fastest rate that a computer can execute any instruction sequence
Peak performances of P1 and P2?

Homework 4 - cont
Clearly, for P1, the ideal instruction sequence is the one composed entirely of class A instructions.
Then, the peak performance of P1 is (4G cycles/sec) / (1 cycle/instruction) = 4000 MIPS.
Similarly, the peak performance of P2 is 3000 MIPS, with the instruction sequence composed of class A, B, and/or C instructions.
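The peak-rate arithmetic can be checked with a minimal sketch. The function name peak_mips is hypothetical. The CPI of 1 for P1 comes from the slide; the CPI of 2 for P2 is an assumption inferred from the stated 3000 MIPS at 6 GHz, since the CPI table itself is not reproduced here.

```python
def peak_mips(clock_hz, min_cpi):
    """Peak rate in MIPS: clock rate divided by the smallest CPI of any class."""
    return clock_hz / min_cpi / 1e6

# P1: 4 GHz, best class takes 1 cycle per instruction (from the slide).
p1 = peak_mips(4e9, 1)
# P2: 6 GHz, best classes assumed to take 2 cycles per instruction
# (inferred from the 3000 MIPS result, not read from the missing table).
p2 = peak_mips(6e9, 2)

print(p1, p2)
```

Note that the faster clock does not give P2 the higher peak rate, because its cheapest instructions cost more cycles.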

Homework 5
Problem 1
Speedup = (execution time before improvement) / (execution time after improvement)
It takes 100 seconds to complete program P1. Of this time, 15% is used for division and 40% for memory access. If you improve only division, what's the maximal possible speedup you can achieve?
The best case is that division takes no time at all:
1 / (1 - 15%) = 1.1765, i.e. 117.65%
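The bound above is Amdahl's law with an unbounded improvement factor. A minimal sketch, where the function name amdahl_speedup and its parameters are illustrative rather than taken from the homework:

```python
def amdahl_speedup(fraction_improved, factor=float("inf")):
    """Overall speedup when `fraction_improved` of the run time
    is made `factor` times faster (Amdahl's law)."""
    remaining = 1.0 - fraction_improved
    return 1.0 / (remaining + fraction_improved / factor)

# Division is 15% of the 100 seconds; assume it becomes infinitely fast.
best_case = amdahl_speedup(0.15)
# For comparison: division made only 2x faster.
twice_as_fast = amdahl_speedup(0.15, 2)

print(round(best_case, 4), round(twice_as_fast, 4))
```

The memory-access share (40%) does not enter the calculation, because only division is improved.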

Homework 5 - cont
Problem 2: Explain how the instruction add $t1, $t2, $t3 is executed in the single-cycle datapath, using Figure 5.19 in your textbook.
Four steps:
1. Instruction fetch (IF)
2. Instruction decode and reading registers (ID)
3. ALU operation (EX)
4. Write the result into the register file (WB)
You should also be familiar with the multicycle and pipelined cases.

Homework 5 - cont
Problem 3
Add the necessary datapaths to the single-cycle datapath shown in Figure 5.17 in the textbook for a new instruction, jr (jump register).
Modification:
- A datapath to allow the new PC to come from a register (Read data 1 port)
- A new control signal (e.g., JumpReg) to control the new PC through a multiplexor
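The effect of the extra multiplexor can be sketched in software. This is only a behavioral model under simplifying assumptions: the function next_pc and its arguments are hypothetical, branches are reduced to a single taken/target pair, and JumpReg is assumed to take priority over the branch path.

```python
def next_pc(pc, read_data_1, jump_reg, branch_taken=False, branch_target=0):
    """Select the next PC value.

    jump_reg models the new JumpReg control signal: when it is set,
    the new multiplexor passes the register value (Read data 1) to the PC.
    """
    if jump_reg:
        return read_data_1          # jr: PC <- contents of the register
    if branch_taken:
        return branch_target        # existing branch path
    return pc + 4                   # default sequential fetch

# jr $ra with $ra holding 0x00400100 (example values only)
after_jr = next_pc(0x00400000, 0x00400100, jump_reg=True)
after_add = next_pc(0x00400000, 0x00400100, jump_reg=False)
print(hex(after_jr), hex(after_add))
```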

Homework 5 - cont
Problem 4
Find the hazard and reorder the instructions to avoid a pipeline stall
    S1: lw $t0, 0($t1)
    S2: lw $t2, 4($t1)
    S3: sw $t2, 0($t1)
    S4: sw $t0, 4($t1)
RAW hazard between S2 and S3: the content of $t2 is not yet available when S3 tries to read $t2. We have to stall the pipeline, even with the help of forwarding.
However, we can reorder the code to avoid it.

Homework 5 - cont
The reordered code:
    S1: lw $t0, 0($t1)
    S2: lw $t2, 4($t1)
    S4: sw $t0, 4($t1)
    S3: sw $t2, 0($t1)

The hazard is resolved by a clever arrangement of the instructions that still guarantees correctness. Instruction reordering (scheduling) is a common compiler technique.
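A minimal sketch of how a tool might spot this kind of hazard: it flags any instruction that reads a register loaded by the lw immediately before it. The tuple encoding and the function name load_use_pairs are assumptions made for illustration, not part of the homework.

```python
# Each instruction: (label, opcode, destination register or None, source registers)
original = [
    ("S1", "lw", "$t0", ["$t1"]),
    ("S2", "lw", "$t2", ["$t1"]),
    ("S3", "sw", None, ["$t2", "$t1"]),
    ("S4", "sw", None, ["$t0", "$t1"]),
]
# Swap the two stores, as in the reordered code above.
reordered = [original[0], original[1], original[3], original[2]]

def load_use_pairs(program):
    """Return (load, user) label pairs where an instruction reads
    the register written by the lw directly in front of it."""
    pairs = []
    for prev, cur in zip(program, program[1:]):
        prev_label, prev_op, prev_dest, _ = prev
        cur_label, _, _, cur_sources = cur
        if prev_op == "lw" and prev_dest in cur_sources:
            pairs.append((prev_label, cur_label))
    return pairs

print(load_use_pairs(original))
print(load_use_pairs(reordered))
```

The swap is safe because the two stores write different addresses, 0($t1) and 4($t1), so their relative order does not change the final memory contents.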

Homework 5 - cont
Problem 5: Executing the following code on the pipelined datapath, what registers are being read and written at the end of the fifth cycle of the execution?
    S1: add $2, $3, $1
    S2: sub $4, $3, $5
    S3: add $5, $3, $7
    S4: add $7, $6, $1
    S5: add $8, $2, $6

Homework 5 - cont
[Figure: pipeline diagram of S1 through S5 over the clock cycles, labeled up to CC9]
    S1: add $2, $3, $1
    S2: sub $4, $3, $5
    S3: add $5, $3, $7
    S4: add $7, $6, $1
    S5: add $8, $2, $6
So, at the end of the fifth cycle of execution, registers $6 and $1 (of S4) are being read and register $2 (of S1) will be written.
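The answer can be double-checked with a minimal sketch that places each instruction in its stage for a given cycle. It assumes the classic five-stage pipeline, one instruction issued per cycle, and no stalls; the helper name stage_in_cycle is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_in_cycle(instr_index, cycle):
    """Stage occupied in `cycle` (1-based) by the instruction issued
    at position `instr_index` (0-based), or None if it is not in the pipeline."""
    position = cycle - 1 - instr_index
    if 0 <= position < len(STAGES):
        return STAGES[position]
    return None

# Snapshot of the pipeline during clock cycle 5 for S1..S5.
snapshot = {f"S{i + 1}": stage_in_cycle(i, 5) for i in range(5)}
print(snapshot)
```

Registers are read in ID and written in WB, so the instruction in ID (S4) supplies the registers being read and the instruction in WB (S1) supplies the register being written.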

Homework 5 - cont
Problem 6: How many cycles will it take to execute the code below on the pipelined datapath?
    S1: lw $4, 100($2)
    S2: sub $6, $4, $3
    S3: add $2, $3, $5
S2 tries to read register $4 right after S1, a load instruction that writes the same register.
Forwarding alone cannot avoid the stall this time.
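A minimal sketch of the cycle count, assuming a five-stage pipeline with forwarding, where a load followed directly by a use of the loaded register costs one stall cycle. The function name pipeline_cycles and the one-cycle stall are assumptions; a datapath without forwarding would need more stall cycles.

```python
def pipeline_cycles(num_instructions, num_stages=5, stall_cycles=0):
    """Cycles to run a straight-line sequence: the first instruction
    needs num_stages cycles, each later one adds a cycle, plus any stalls."""
    return num_stages + (num_instructions - 1) + stall_cycles

# Three instructions, one load-use stall between S1 and S2 (assumed).
with_stall = pipeline_cycles(3, stall_cycles=1)
ideal = pipeline_cycles(3)
print(ideal, with_stall)
```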

Homework 6
Problem 1: How many bits are required to implement a direct-mapped cache with 64 KB of data and 4-byte blocks, assuming a 32-bit address?
- Cache size = 2^16 bytes (64 KB)
- Block size = 2^2 bytes (4 bytes)
- Number of cache blocks = 2^(16-2) = 2^14
Each block has 32 bits of data plus a tag, which is (32 - 14 - 2) = 16 bits, plus a valid bit.
Thus, the total cache size is 2^14 * (32 + 16 + 1) = 2^14 * 49 = 784 Kbits
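The same count can be reproduced with a minimal sketch. It assumes power-of-two sizes and one valid bit per block; the function name direct_mapped_cache_bits is hypothetical.

```python
def direct_mapped_cache_bits(data_bytes, block_bytes, address_bits=32):
    """Total storage bits (data + tag + valid) of a direct-mapped cache.
    Assumes data_bytes and block_bytes are powers of two."""
    num_blocks = data_bytes // block_bytes
    index_bits = num_blocks.bit_length() - 1      # log2(number of blocks)
    offset_bits = block_bytes.bit_length() - 1    # log2(bytes per block)
    tag_bits = address_bits - index_bits - offset_bits
    bits_per_block = block_bytes * 8 + tag_bits + 1   # data + tag + valid bit
    return num_blocks * bits_per_block

total_bits = direct_mapped_cache_bits(64 * 1024, 4)
print(total_bits, total_bits // 1024, "Kbits")
```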

Homework 6 - cont
Problem 3: Given a direct-mapped cache with 16 words of data and 4-word blocks, what are the cache misses and hits for a series of memory accesses at the addresses 2, 4, 8, 20, 18, 11, 43, 17?
First, construct the cache.
This is a 4-block cache.

Homework 6 - cont
Read memory at address 2
Which block to look at?
- The memory block number: word address DIV words per block
  2 / 4 = 0
- which maps to cache block number: memory block number MOD number of cache blocks
  0 modulo 4 = 0

Here, we suppose the memory space is 2^8 bytes

Homework 6 - cont
Repeat this step until all memory references have been processed.
We then have a cache with the content below.
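A minimal sketch that replays the whole trace using the DIV/MOD mapping from the previous slide. It assumes word addresses and an initially empty cache; the function name simulate is hypothetical, and None stands for an invalid (empty) cache block.

```python
def simulate(addresses, num_blocks=4, words_per_block=4):
    """Replay word addresses through a direct-mapped cache.
    Returns the per-access outcomes and the final memory block held in each cache block."""
    cache = [None] * num_blocks                 # None = invalid entry
    outcomes = []
    for address in addresses:
        memory_block = address // words_per_block    # word address DIV words per block
        index = memory_block % num_blocks            # memory block MOD number of cache blocks
        if cache[index] == memory_block:
            outcomes.append((address, "hit"))
        else:
            outcomes.append((address, "miss"))
            cache[index] = memory_block              # load the block, replacing the old one
    return outcomes, cache

outcomes, final_cache = simulate([2, 4, 8, 20, 18, 11, 43, 17])
for address, result in outcomes:
    print(address, result)
print(final_cache)
```

Under these assumptions, only the accesses to addresses 11 and 17 hit, since their blocks were brought in earlier by the accesses to 8 and 18.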

Homework 6 - cont
Problem 4
Find the hazards in the code and reorder it to avoid pipeline stalls
    lw $t0, 0($t1)
    addi $t3, $t0, 4
    sw $t3, 0($t1)
    lw $t2, 4($t1)
    addi $t4, $t2, 4
    sw $t4, 4($t1)

This is the same kind of problem as Problem 4 of Homework 5.
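A minimal sketch that counts the load-use pairs in this code and in one possible reordering. The reordering shown is an illustration, not necessarily the expected answer, and the tuple encoding and the name count_load_use are assumptions.

```python
# Each instruction: (opcode, destination register or None, source registers)
original = [
    ("lw",   "$t0", ["$t1"]),
    ("addi", "$t3", ["$t0"]),
    ("sw",   None,  ["$t3", "$t1"]),
    ("lw",   "$t2", ["$t1"]),
    ("addi", "$t4", ["$t2"]),
    ("sw",   None,  ["$t4", "$t1"]),
]
# One candidate schedule: issue both loads first, then both addi, then both stores.
candidate = [original[i] for i in (0, 3, 1, 4, 2, 5)]

def count_load_use(program):
    """Count instructions that read the register loaded by the lw just before them."""
    return sum(
        1
        for prev, cur in zip(program, program[1:])
        if prev[0] == "lw" and prev[1] in cur[2]
    )

print(count_load_use(original), count_load_use(candidate))
```

Moving the second lw above the first sw keeps the result unchanged in this case because they access different addresses, 4($t1) and 0($t1).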

Homework 6 - cont
Problem 6
Explain how the instruction lw $t1, 8($t2) is executed in the pipelined datapath, using Figure 6.17 in the textbook.
Five stages:
1. Instruction fetch (IF)
2. Instruction decode and register file fetch (ID)
3. Address calculation (EX)
4. Memory access (MEM)
5. Write back (WB)
It is important that you are able to explain the details of the execution of a given instruction. For example, for a load, what should be stored in each pipeline register?

Good Luck on your quiz!
