EE457 Quiz, Fall 2016
September 22, 2016 10:09 am EE457 Quiz - Fall 2016 © Copyright 2016 Gandhi Puvvada
EE457 Quiz (~10%): Closed-book, closed-notes exam; no cheat sheets; no cell phones or computers.
Calculators and Verilog guides are not needed and hence not allowed.
Fall 2016, Instructor: Gandhi Puvvada
Thursday, 9/22/2016 (a 2 h 50 min exam), 05:30 PM - 08:20 PM (170 min) in THH101
Viterbi School of Engineering, University of Southern California
Ques#  Topic                                  Page#   Time       Points  Score
1      State Diagram, RTL Design              2-5     50 min.    68
2      CPU Performance                        5       20 min.    28
3      Unsigned and Signed numbers            6       20 min.    35
4      MIPS ISA, Byte-addressable processors  7-9     45 min.    62
5      Single-Cycle CPU                       10-11   25 min.    45
Total                                         11      160 min.   238
Perfect Score                                                    230
Student’s Last Name: _______________________________________
Student’s First Name: _______________________________________
Student’s DEN D2L username: _______________________________________
1 ( 7 + 8 + 3 + 12 + 28 = 58 points) 45 min.
State Diagram and RTL design (Iteration counter advancement and terminal count checking):
1.1 A rectangular array of bits, A[I,J], is to be cleared, starting from A[1,5] and ending at A[5,1]. This is a total of 25 bits, as shown in the diagram. The row index I goes from 1 to 5, and the column index J goes from 5 to 1. The clearing is done row by row, from the top row (I == 1) to the bottom row (I == 5). Within a row, we start from the right end (J == 5) and go to the left end (J == 1). This is row-major order, as shown in the pseudocode. Complete the state diagram below.
1.1.1 You know that a later assignment overrides an earlier assignment in a procedural block of Verilog HDL. Complete the two RTL snippets for the CLR case branch. If needed, you can use a statement such as I <= I; or J <= J; in your if statement.
In the left-side RTL, J is decremented by default, and this decrement is overridden as needed.
In the right-side RTL, I is incremented by default, and this increment is overridden as needed.
for (I = 1; I <= 5; I++) {
    for (J = 5; J >= 1; J--)
        A[I][J] = 0;
}
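[Figure: 5 × 5 bit array A[I,J]; rows I = 1 to 5 from top to bottom, columns J = 1 to 5 from left to right]
7pts
[State diagram: RESET → INI; INI → CLR on START (stay in INI otherwise); CLR → DONE on C (stay in CLR otherwise); DONE → INI on ACK (stay in DONE otherwise)]
INI: I <= 1; J <= 5;
CLR: A[I,J] <= 0;
     if (J != ___ ) J <= J - 1;
     else { J <= ___ ; I <= I + 1; }
C = (I == ___ ) (J == ___ )
When you reach the DONE state, what are the values of I and J?  I = ____ ;  J = ____ ;
Number of clocks spent in the CLR state = ____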
8pts
Left-side RTL:
CLR: begin
    A[I,J] <= 0;
    J <= J - 1;   // by default
    if (
end

Right-side RTL:
CLR: begin
    A[I,J] <= 0;
    I <= I + 1;   // by default
    if (
end
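The two RTL styles can be cross-checked against the row-major pseudocode with a small clock-by-clock model. The Python sketch below is illustrative and not part of the quiz. Each loop iteration stands for one clock in the CLR state. Next-state values are computed from the current values and committed together, which mimics Verilog nonblocking (<=) assignments. The terminal condition is assumed to be C = (I == 5) && (J == 1), evaluated on the current values. The function name run_clr is hypothetical.

```python
def run_clr(style):
    """Clock-by-clock model of the CLR state (hypothetical helper).

    style is "left" (J decremented by default) or "right" (I incremented by default).
    Returns (cleared (I, J) pairs in order, clocks spent in CLR, (I, J) on reaching DONE).
    """
    I, J = 1, 5                          # INI: I <= 1; J <= 5;
    order, clocks = [], 0
    while True:
        C = I == 5 and J == 1            # assumed terminal-count check on current values
        order.append((I, J))             # A[I,J] <= 0 happens in this clock
        if style == "left":
            nI, nJ = I, J - 1            # default: J <= J - 1;
            if J == 1:                   # override at the left end of a row
                nI, nJ = I + 1, 5
        else:
            nI, nJ = I + 1, J            # default: I <= I + 1;
            if J != 1:                   # override: stay in this row, move left
                nI, nJ = I, J - 1
            else:
                nJ = 5
        I, J = nI, nJ                    # clock edge: all <= updates take effect together
        clocks += 1
        if C:                            # C was true during this clock, so go to DONE
            return order, clocks, (I, J)


# Reference: the row-major order produced by the pseudocode loops.
reference = [(i, j) for i in range(1, 6) for j in range(5, 0, -1)]

for style in ("left", "right"):
    order, clocks, final = run_clr(style)
    print(style, order == reference, clocks, final)
```

Both styles should visit the 25 locations in the same order as the nested loops. They differ only in which assignment is the default and which one overrides it.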
1.2 How many B[K] locations are cleared in the completed state diagram below? ____________
1.3 Complete the state diagram below to clear 50 bits of A[I,J,K] (25 bits of A[I,J,2] and 25 bits of A[I,J,1]). As in Q#1.1, for each of the two values of K, I and J can be made to go from [I,J] = [1,5] to [I,J] = [5,1]. The array A is maintained in a flash memory. As you may know, flash memory takes several clocks to write. For the sake of this problem, let us say that a write needs a minimum of two clocks, and that we must wait for a positive indication from the flash memory status output FWD (FWD = Flash memory Writing Done) in the form of (FWD == 1). The flash memory is not expected to update FWD properly in the first clock of a write. So we should not assume that the write has finished even if we see FWD = 1 during that first clock. A flag F, initially cleared to zero, can be used to keep I, J, and K unchanged for at least one clock. At the end of the first clock, F is set, indicating that FWD can now be relied upon. From the second clock onward, we wait for (FWD == 1) before updating the location coordinates I, J, and K and clearing the flag F.
3pts
[State diagram: RESET → INI; states INI (K <= 2;), CLR (B[K] <= 0; K <= K - 1;), and DONE; arcs labeled START, ACK, and K = 1]
for (K = 2; K >= 1; K--) {
    for (I = 1; I <= 5; I++) {
        for (J = 5; J >= 1; J--)
            A[I,J,K] = 0;
    }
}
12pts
[State diagram: RESET → INI; INI → CLR on START; CLR → DONE on C; DONE → INI on ACK]
INI: I <= 1; J <= 5; K <= 2; F <= 0;
CLR: A[I,J,K] <= 0;
     if (J != ___ ) J <= J - 1;
     else if ( ______ ) { J <= ___ ; I <= I + 1; }
     else { ______ }
C = ______
When you reach the DONE state, what are the values of I, J, K, F?  I = ____ ;  J = ____ ;  K = ____ ;  F = ____ ;
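One possible completion of the CLR state can be exercised against a simulated flash memory. In the hypothetical Python model below, the following are assumptions made for illustration:

- the function clear_flash and its latency parameter;
- the flash behavior, which shows a stale FWD = 1 in the first clock of every write and then reports FWD = 1 once latency clocks have elapsed.

The flag F holds I, J, and K for the first clock of each location. The coordinates advance only when F == 1 and FWD == 1.

```python
def clear_flash(latency=2):
    """Clock-by-clock sketch of the flash-clearing CLR state with flag F (hypothetical).

    Assumed flash model: each write needs `latency` >= 2 clocks, and in the first clock
    of a write FWD still shows a stale 1, so it must not be trusted.
    Returns (cleared locations in order, clocks spent in CLR, (I, J, K, F) at DONE).
    """
    assert latency >= 2
    I, J, K, F = 1, 5, 2, 0              # INI: I <= 1; J <= 5; K <= 2; F <= 0;
    age = 0                              # clocks spent writing the current location (model only)
    cleared, clocks = [], 0
    while True:
        clocks += 1
        age += 1
        fwd = 1 if age == 1 else int(age >= latency)   # stale FWD = 1 in the first clock
        C = (I, J, K) == (5, 1, 1) and F == 1 and fwd == 1
        nI, nJ, nK, nF = I, J, K, F
        if F == 0:
            nF = 1                       # first clock: hold I, J, K; FWD not yet reliable
        elif fwd == 1:                   # write of A[I,J,K] confirmed: advance, clear F
            cleared.append((I, J, K))
            nF = 0
            age = 0                      # the next clock starts a new write
            if J != 1:
                nJ = J - 1
            elif I != 5:
                nI, nJ = I + 1, 5
            else:
                nI, nJ, nK = 1, 5, K - 1
        I, J, K, F = nI, nJ, nK, nF      # clock edge
        if C:
            return cleared, clocks, (I, J, K, F)


cleared, clocks, final = clear_flash(latency=2)
print(len(cleared), clocks, final)
```

With a two-clock flash, each location costs two clocks in CLR. A slower flash simply stretches the wait while F == 1.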
1.4 You have done the Min/Max lab Part 1 (with two comparators) and Part 2 (with one comparator). Part 1 takes a constant number of clocks (1 + 15 = 16), whereas Part 2 takes a variable number of clocks: 1 + 15 = 16 at minimum for ascending data, and 1 + 15*2 = 31 at maximum for descending data. Here you are given a dual-port memory M and two comparators, and are asked to improve the best case.

Comparator #0 looks at all even-numbered locations M[I] and arrives at one set of results, Max0 and Min0. Comparator #1 looks at all odd-numbered locations M[J] and arrives at another set of results, Max1 and Min1. Finally, in a Merge state, we compare the two sets of results (Comp#0 compares Max0 and Max1, while Comp#1 compares Min0 and Min1) and arrive at the final results, Max and Min. So the best case is 1 + 7 + 1 = 9 clocks for ascending data, and the worst case is 1 + 14 + 1 = 16 clocks for descending data.

While in the CMxMn state, if one of the two comparators, Comp#0 or Comp#1, has finished its job before the other, it should just wait. You should prepare to move to the Merge state only when both are about to be done together, or when one is done and the other is about to be done. An inferior designer waits until both are done (rather than about to be done) and wastes one clock. We will send him back to EE354L.

Here I and J are 5-bit counters. I starts at 00000 and J starts at 00001. Each is incremented by 2 whenever it needs to be incremented (I <= I + 2; J <= J + 2;), so I always remains even and J always remains odd. When I is done, it becomes 16 (10000), and when J is done, it becomes 17 (10001). You may be able to use some or all of the following conditions in your design: (I == 14), (I != 14), (I == 16), (I != 16), (J == 15), (J != 15), (J == 17), (J != 17), M[I] >= Max0, M[I] <= Min0, M[J] >= Max1, M[J] <= Min1, Max1 >= Max0, Min1 <= Min0.
16pts
State Diagram for Part 2 for Dual-Port Memory (2 flags, 2 comparators, 2 counters)
[States: INI, LOAD, CMxMn, Merge, DONE; arcs labeled Reset, Start, 1, and C]
INI:   I <= 0; J <= 1; Flag0 <= 0; Flag1 <= 0;
LOAD:  Min0 <= M[I]; Max0 <= M[I]; I <= I + 2;
       Min1 <= M[J]; Max1 <= M[J]; J <= J + 2;
Merge: if (Max1 >= Max0) Max <= Max1; else Max <= Max0;
       if (Min1 <= Min0) Min <= Min1; else Min <= Min0;
You need to complete the design above by (a) filling in "0" or "1" in the six boxes, (b) writing I <= I + 2; or J <= J + 2; at the appropriate places in the code, and (c) finally figuring out the condition "C" governing the state transition from the CMxMn state to the Merge state. Write the long Boolean expression for the condition C below.
C = ____________________________________________________________________________
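The clock counts quoted above (9 in the best case, 16 in the worst case) can be reproduced with a behavioral model. The Python sketch below shows one hypothetical way to complete the design; it is not the official solution. Each comparator checks Max when its flag is 0, and checks Min in the following clock when its flag is 1. C is taken as "each comparator is either already done or is finishing its last element in this clock". The names comparator_step and dual_minmax are illustrative.

```python
def comparator_step(M, idx, mx, mn, flag, end):
    """One clock of one comparator; returns the updated (idx, mx, mn, flag)."""
    if idx == end:                   # this comparator is done: just wait
        return idx, mx, mn, flag
    if flag == 0:                    # compare with Max in this clock
        if M[idx] >= mx:
            return idx + 2, M[idx], mn, 0
        return idx, mx, mn, 1        # not a new Max: compare with Min next clock
    if M[idx] <= mn:                 # compare with Min in this clock
        mn = M[idx]
    return idx + 2, mx, mn, 0


def dual_minmax(M):
    """Returns (Max, Min, clocks) for 16 values; clocks = LOAD + CMxMn + Merge."""
    assert len(M) == 16
    I, Max0, Min0, Flag0 = 2, M[0], M[0], 0     # after INI and LOAD for Comp#0
    J, Max1, Min1, Flag1 = 3, M[1], M[1], 0     # after INI and LOAD for Comp#1
    clocks = 1                                  # the LOAD clock
    while True:                                 # CMxMn: one iteration per clock
        clocks += 1
        done0, done1 = I == 16, J == 17
        # "About to be done": the last element is finished in this clock.
        about0 = I == 14 and (Flag0 == 1 or M[I] >= Max0)
        about1 = J == 15 and (Flag1 == 1 or M[J] >= Max1)
        C = (about0 or done0) and (about1 or done1)
        I, Max0, Min0, Flag0 = comparator_step(M, I, Max0, Min0, Flag0, 16)
        J, Max1, Min1, Flag1 = comparator_step(M, J, Max1, Min1, Flag1, 17)
        if C:
            break
    clocks += 1                                 # the Merge clock
    Max = Max1 if Max1 >= Max0 else Max0
    Min = Min1 if Min1 <= Min0 else Min0
    return Max, Min, clocks


print(dual_minmax(list(range(16))), dual_minmax(list(range(15, -1, -1))))
```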
2 ( 14 + 8 = 22 points) 15 min. Performance
2.1 Let us say we are a hardware IP (Intellectual Property) vendor. We designed a Multiply instruction execution IP, which takes 20 clocks at 2 GHz, and sold the IP to two processor manufacturers, XYZ and ABC, who implemented our USC CISC ISA. The frequency of usage of the multiply instruction is 10%, measured in the dynamic execution trace of the binary produced by compiling the benchmark with a third-party compiler. Both processors run at 2 GHz. However, the percentage of execution time spent on our Mult instruction is 20% in XYZ and 25% in ABC.
1. If you have adequate data, compare the performance of the two processors. If the data is inadequate, explain how it is inadequate. Also, if there is excessive data, state what is excessive.
________________________________________________________________________________
2. If you have adequate data, compare the native MIPS ratings of the two processors. If the data is inadequate, explain how it is inadequate.
________________________________________________________________________________
3. If you have adequate data, compare the relative MIPS ratings of the two processors. If the data is inadequate, explain how it is inadequate.
________________________________________________________________________________
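Assume that both processors execute the same binary, so the dynamic instruction count IC is the same on both. Then the share of execution time spent on one instruction class pins down the average CPI:

time share = (IC * f * clocks * T) / (IC * CPI * T) = f * clocks / CPI

The Python sketch below applies that relation to the numbers in 2.1. The helper name implied_cpi is hypothetical. It uses exact fractions to avoid floating-point rounding noise.

```python
from fractions import Fraction


def implied_cpi(freq, clocks, time_share):
    """Average CPI implied by one instruction class: CPI = freq * clocks / time_share."""
    return freq * clocks / time_share


f_mult, mult_clocks, clock_mhz = Fraction(10, 100), 20, 2000
cpi = {
    "XYZ": implied_cpi(f_mult, mult_clocks, Fraction(20, 100)),
    "ABC": implied_cpi(f_mult, mult_clocks, Fraction(25, 100)),
}
native_mips = {name: clock_mhz / c for name, c in cpi.items()}
# Same IC and same clock period, so the execution-time ratio equals the CPI ratio.
time_ratio_xyz_over_abc = cpi["XYZ"] / cpi["ABC"]

for name in cpi:
    print(name, "CPI =", cpi[name], "native MIPS =", native_mips[name])
print("T(XYZ) / T(ABC) =", time_ratio_xyz_over_abc)
```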
2.2 A 10% improvement in Instruction A is better than a 10% improvement in Instruction B if (select):
(a) the frequency of occurrence of A in the dynamic execution trace is higher than that of B
(b) the percentage of execution time spent on A is higher than that of B
(c) the CPI of A is higher than the CPI of B
(d) none of the above
Reducing the clocks taken by Instruction A by 1 clock is better than reducing the clocks taken by Instruction B by 1 clock if (select):
(a) the frequency of occurrence of A in the dynamic execution trace is higher than that of B
(b) the percentage of execution time spent on A is higher than that of B
(c) the CPI of A is higher than the CPI of B
(d) none of the above
12pts
14pts
8pts
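The two selection questions in 2.2 turn on what each kind of improvement actually saves. The Python sketch below uses a hypothetical instruction mix with made-up numbers, chosen so that A is more frequent while B accounts for more execution time. A "10% improvement" is modeled as 10% fewer clocks for that instruction; this is an assumption. Under that model, the saving from a 10% improvement is proportional to the instruction's share of execution time. A 1-clock reduction saves f * 1 clock per instruction, so its saving is proportional to the instruction's frequency.

```python
from fractions import Fraction as Fr

# Hypothetical mix: class -> (frequency in the dynamic trace, CPI). Illustrative numbers only.
MIX = {"A": (Fr(50, 100), 2), "B": (Fr(10, 100), 12), "other": (Fr(40, 100), 2)}


def avg_cpi(mix):
    return sum(f * c for f, c in mix.values())


def time_share(mix, name):
    f, c = mix[name]
    return f * c / avg_cpi(mix)


def speedup(mix, name, new_cpi):
    """Overall speedup when only `name` changes its CPI (same IC, same clock)."""
    changed = dict(mix)
    changed[name] = (mix[name][0], new_cpi)
    return avg_cpi(mix) / avg_cpi(changed)


for name in ("A", "B"):
    f, c = MIX[name]
    print(name, "freq:", f, "time share:", time_share(MIX, name),
          "10% fewer clocks:", speedup(MIX, name, c * Fr(9, 10)),
          "1 clock fewer:", speedup(MIX, name, c - 1))
```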
3 ( 10 + 25 = 35 points) 20 min. unsigned and signed numbers
3.1 If you are allowed to use numbers from the SW (South-West) quadrant only (as shown in the figure below), you can arrive at only some of the 8 combinations in the table on the side. Cross off the rows that cannot be filled with these limited choices. Complete the last two columns for the remaining rows.
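The Right/Wrong columns are linked to the V and raw-carry columns by the usual 4-bit adder/subtractor rules. The Python sketch below uses a hypothetical function name, add_sub_4bit. It computes A + B, or A - B as A + ~B + 1. It then reports the raw carry C4 and V, and decides whether the result is right under the unsigned and the signed (2's complement) interpretations. It does not encode which numbers lie in the SW quadrant; it only shows how each flag relates to correctness.

```python
def add_sub_4bit(a, b, subtract=False):
    """4-bit adder/subtractor sketch (A - B is done as A + ~B + 1).

    Returns (result, raw_carry_c4, v, unsigned_right, signed_right).
    """
    b_in = (~b & 0xF) if subtract else b
    total = a + b_in + (1 if subtract else 0)
    result, raw_carry = total & 0xF, total >> 4
    # V: both adder inputs share a sign that differs from the result's sign
    # (equivalent to C3 xor C4).
    v = int(((a ^ result) & (b_in ^ result) & 0x8) != 0)
    # Unsigned: addition is wrong when C4 = 1; subtraction is wrong when C4 = 0 (a borrow).
    unsigned_right = raw_carry == (1 if subtract else 0)
    signed_right = v == 0
    return result, raw_carry, v, unsigned_right, signed_right


# 0111 + 0001: right as unsigned (7 + 1 = 8), wrong as signed (+7 + 1 overflows).
print(add_sub_4bit(0b0111, 0b0001))
# 0001 - 0010: wrong as unsigned (borrow), right as signed (+1 - 2 = -1).
print(add_sub_4bit(0b0001, 0b0010, subtract=True))
```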
3.2 Given two 4-bit numbers A (a3a2a1a0) and B (b3b2b1b0), produce 2AleB_BW. 2AleB_BW stands for "2A (double of A) is le (less than or equal to) B, BW (both ways)". Here "both ways" means the comparison must hold whether we treat A and B as signed numbers in 2’s complement notation or as unsigned numbers. Let us analyze which cases need an actual comparator and which can be concluded easily.
To double A (a3a2a1a0), we can append a zero at the right end whether A is signed (2’s complement) or unsigned, so basically we are comparing a3a2a1a00 with b3b2b1b0. _________ (T/F)
If a3 is a 1, we can conclude 2AleB_BW as ________ (true/false) without any comparator, because when A and B are considered to be _____________ (signed/unsigned), 2A is too ________ (big/small) for any B to match up.
For the remaining part, i.e., for (a3 = 0), we need to consider the two cases (b3 = 0) and (b3 = 1). For the case [(a3 = 0) and (b3 = 1)], we can conclude 2AleB as false, without any comparator, if both A and B are treated as ______________ (signed/unsigned) numbers.
For the case [(a3 = 0) and (b3 = 0)], since both are non-negative, we can use an unsigned 4-bit comparator to find whether 2AleB is true by comparing a2a1a00 with b3b2b1b0. ______ (T/F)
If a2a1a0 is lower than b3b2b1, then a2a1a00 is lower than b3b2b1b0 even if b0 is a 1. __ (T/F)
However, for a2a1a00 to be equal to b3b2b1b0, b0 needs to be a ___ (0/1).
Complete the 3-bit unsigned comparison below to compare a2a1a0 with b3b2b1. Combine it with b0 to produce IntR (intermediate result), which stands for "a2a1a00 is lower than or equal to b3b2b1b0". Combine this with the requirements on a3 and b3 to produce the final inference 2AleB_BW.
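The case analysis above can be checked exhaustively over all 256 (A, B) pairs. The Python sketch below uses two hypothetical functions:

- two_a_le_b_bw_reference evaluates 2A <= B directly under both readings;
- two_a_le_b_bw_decomposed rebuilds the answer from a3, b3, a 3-bit comparison of a2a1a0 with b3b2b1, and b0.

A 4-bit pattern is read as 2’s complement by subtracting 16 when its MSB is set. Appending a 0 doubles the value under either reading.

```python
def s4(x):
    """Read a 4-bit pattern as a 2's complement number."""
    return x - 16 if x & 0x8 else x


def two_a_le_b_bw_reference(a, b):
    """Brute force: 2A <= B under both the unsigned and the 2's complement reading."""
    return 2 * a <= b and 2 * s4(a) <= s4(b)


def two_a_le_b_bw_decomposed(a, b):
    """Checks on a3 and b3, plus a 3-bit compare of a2a1a0 with b3b2b1 combined with b0."""
    a3, b3, b0 = (a >> 3) & 1, (b >> 3) & 1, b & 1
    a_lo, b_hi = a & 0b111, b >> 1                           # a2a1a0 and b3b2b1
    intr_lower = a_lo < b_hi or (a_lo == b_hi and b0 == 1)   # a2a1a0 0 <  b3b2b1b0
    intr_equal = a_lo == b_hi and b0 == 0                    # a2a1a0 0 == b3b2b1b0
    intr = intr_lower or intr_equal
    return a3 == 0 and b3 == 0 and intr


mismatches = [(a, b) for a in range(16) for b in range(16)
              if two_a_le_b_bw_decomposed(a, b) != two_a_le_b_bw_reference(a, b)]
print("mismatches:", mismatches)
```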
10pts
Operation    Result if treated as    Result if treated as    V (OVERFLOW)   Raw Carry C4 (COUT)
             signed numbers          unsigned numbers
             (Right/Wrong)           (Right/Wrong)
Addition     Right                   Right
Addition     Right                   Wrong
Addition     Wrong                   Right
Addition     Wrong                   Wrong
Subtraction  Right                   Right
Subtraction  Right                   Wrong
Subtraction  Wrong                   Right
Subtraction  Wrong                   Wrong
[Figure: 4-bit circles (just FYI). The unsigned circle runs from 0000 (0) to 1111 (15), with its error point where the C bit is set. The signed circle runs from 1000 (-8) to 0111 (+7), with its error point where the V bit is set. Both circles carry SMALLER mag. and LARGER mag. labels, and the SW quadrant is marked on each.]
25pts
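[Figure: 3-bit adder/subtractor built from three full adders (a, b, cin; s, cout), with inputs X2 X1 X0, Y2 Y1 Y0, C0, and ADD/SUB, and outputs S2 S1 S0, Raw Carry, Carry, and V. It is to be connected to a2 a1 a0 and b3 b2 b1 to produce d2 d1 d0, with VDD, a Zero detector on d2 d1 d0, and outputs IntR_Lower, IntR_Equal, IntR, and 2AleB_BW.]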
4 ( 6+3+3+4+3+6+6+6+3+8+20 = 68 points) 50 min. MIPS ISA, Byte-addressable processors
4.1 It is possible to replace the unconditional branch, beq $0,$0, L2 with (i) a jump instruction: j L2 (ii) a jump register instruction jr $8 where $8 was previously preloaded with the address 0000_006C using the following instructions: lui $8, 0000h; ori $8, $8, 006Ch
How do you compare the three choices? _______________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
4.1.1 Arrive at the 16-bit offset field in the translation of beq $0,$0, L2 on the side.
4.1.2 Arrive at the 26-bit field in the translation of j L2 on the side.
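The branch and jump fields for the extract can be computed mechanically. The Python sketch below makes these assumptions:

- the addresses in the extract are hexadecimal (beq $0,$0, L2 at 48 and L2 at 6C);
- the standard MIPS opcodes apply: 4 for beq and 2 for j;
- the helper names are hypothetical.

The beq offset counts words from PC + 4. The 26-bit j field holds target bits [27:2], and the upper 4 bits are supplied by PC + 4 at run time.

```python
def beq_offset(branch_addr, target_addr):
    """16-bit branch offset field: signed word distance from PC + 4 to the target."""
    delta = target_addr - (branch_addr + 4)
    assert delta % 4 == 0
    return (delta // 4) & 0xFFFF


def j_field(target_addr):
    """26-bit jump field: target address bits [27:2]."""
    return (target_addr >> 2) & 0x3FFFFFF


def j_target(jump_addr, field26):
    """Pseudo-direct target rebuilt by the hardware: PC+4 [31:28] followed by field26 and 00."""
    return ((jump_addr + 4) & 0xF0000000) | (field26 << 2)


def encode_i(opcode, rs, rt, imm16):
    return (opcode << 26) | (rs << 21) | (rt << 16) | imm16


def encode_j(opcode, field26):
    return (opcode << 26) | field26


BEQ_OP, J_OP = 0x04, 0x02
off = beq_offset(0x48, 0x6C)        # beq $0, $0, L2 at address 48, L2 = 6C
jf = j_field(0x6C)                  # j L2
lui_ori = (0x0000 << 16) | 0x006C   # value left in $8 by lui $8, 0000h; ori $8, $8, 006Ch
print(hex(off), hex(encode_i(BEQ_OP, 0, 0, off)), hex(jf), hex(encode_j(J_OP, jf)), hex(lui_ori))
```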
4.2 Reproduced below is an extract from your class-notes
6pts
// upstream code
40        beq  $1, $2, L1;
44        addu $4, $0, $0;
48        beq  $0, $0, L2;
L1 (=4C)  ................
60        ....
64        ....
L2 (=6C)  ....
00000 00000 3pts
3pts
4.2.1 I have corrected the textbook code incorrectly because I assumed that _____________________________________________________________________________________________________________________________________________________________________________
4.2.2 The "add" in the five-instruction rectangle can be replaced by "addi $29, $29, 4" to add a 4 to move the stack pointer $29. Similarly the "sub" can be replaced by "_____________________________________________" to subtract a 4 to move the stack pointer $29.
4.2.3 A student wrote the above code initially. Later, while debugging, he decided that the call to subroutine C was not required. But he deleted just the jal C instruction instead of deleting all 5 instructions in the rectangles! Is executing the remaining 4 instructions merely wasteful, or is it harmful?
________________________________________________________________________________
4.2.4 Answer the above question again, this time assuming that a conditional call instruction, beqal $1, $2, C, is in place of jal C. Recall the made-up instruction beqal (branch if equal and link) that we used in an earlier exam question, where we write to the link register $31 conditionally.
________________________________________________________________________________
4.2.4.1 Based on the above analysis, Mr. _______________ (Bruin / Trojan) recommends removing the AND gate and the OR gate in the rectangle by redefining beqal as a conditional call with an unconditional deposit of the return address into the link register $31. He further requires that every use of beqal be preceded by two lines to __________ (save/retrieve) the contents of $31 on the compiler-designated stack, and followed by two lines to __________ (save/retrieve) the return address saved on the stack into the link register $31.
4.3 The textbook and the class-notes show a ___________ (SRAM / SSRAM) for the Data Memory _______________ (though/because) in real design we use ___________ (SRAM / SSRAM) for the Data Memory (Data Cache).
4pts
3pts
6pts
6pts
6pts
3pts
4.4 Intel follows the ___________ (Little Endian / Big Endian) system. In the Intel 80486 processor system address space, byte 0000_747CH is the ____________ (most / least) significant byte of the 32-bit word with system address ______________ (state in hexadecimal).
State the next three 32-bit word addresses after the 32-bit word 40: __________________________________.
State the next three 64-bit long word addresses after the 64-bit long word 40, in the context of the i860 (a 64-bit data, 32-bit logical address, byte-addressable processor): ______________________.
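In a little-endian, byte-addressable system, the byte at the lowest address of an aligned word is that word's least significant byte. Aligned word addresses step by the word size. The Python sketch below illustrates both points. It assumes "word 40" means hexadecimal 40, and the helper names are hypothetical. struct.pack with the "<I" format lays out a 32-bit value in little-endian byte order.

```python
import struct


def word_address(byte_addr, word_bytes=4):
    """Address of the aligned word (word_bytes bytes wide) that contains byte_addr."""
    return byte_addr & ~(word_bytes - 1)


def byte_rank_little_endian(byte_addr, word_bytes=4):
    """Byte position within its word: 0 = least significant in a little-endian system."""
    return byte_addr & (word_bytes - 1)


def next_words(addr, count=3, word_bytes=4):
    """Addresses of the next `count` aligned words after the word at addr."""
    return [addr + word_bytes * k for k in range(1, count + 1)]


print(struct.pack("<I", 0x11223344).hex())   # lowest address holds the LSB (0x44)
print(hex(word_address(0x747C)), byte_rank_little_endian(0x747C))
print([hex(x) for x in next_words(0x40)], [hex(x) for x in next_words(0x40, word_bytes=8)])
```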
4.5 Shown on the side is the memory interface to a byte-wide memory chip. The memory system is built from the minimum number of byte-wide banks for a USC128 processor (a 128-bit data, 32-bit logical address, byte-addressable processor). USC processors are similar to Intel processors as far as byte-enable pins are concerned. The address pins on the processor are (select): (i) A[31:0] (ii) A[31:3], /BE[7:0] (iii) A[31:4], /BE[15:0]
Fill in the 3 blanks (marked by the 3 arrows) in the figure on the side. Also find the system addresses corresponding to the lowest-addressed two bytes of this memory chip.
The lowest-addressed two bytes of this chip map to the system byte addresses (in hex) _______________________________ and _______________________________.
The system addresses mapping to any location in this memory chip will all have the same upper ________ (state a number) bits, namely ______________ (state their labels in the form X[13:2]).
If this chip goes bad, until you replace it, you should avoid using the memory in the system address range (state the range in hex): ______________________________________________________
8pts
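[Figure: address decoder on A31-A20 driving the chip's CS; chip control pins WE and RD; chip address pins A[ ] (blank) fed by processor lines A[19:4]; chip data pins D[7:0] connected to processor data lines D[ ] (blank); byte enable BE4; chip size ____ KB (blank)]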
20pts
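With 16 byte-wide banks on a 128-bit bus, consecutive locations of one chip are 16 bytes apart in the system address space. The Python sketch below assumes the chip sits on byte lane 4 (from the BE4 label) and is addressed by A[19:4]. The value that A[31:20] must match depends on the decoder in the figure, so it is a parameter, and the DECODE value used below is hypothetical, as is the helper name.

```python
LANES = 16   # USC128: 128-bit data bus, so 16 byte-wide banks
BANK = 4     # assumed byte lane of this chip (from the BE4 label)


def system_address(decode_bits, chip_addr, bank=BANK):
    """System byte address of chip location chip_addr (hypothetical helper).

    decode_bits: the 12-bit value A[31:20] must have to select this chip.
    chip_addr:   the chip's 16-bit internal address, driven by A[19:4].
    """
    assert 0 <= decode_bits < 1 << 12
    assert 0 <= chip_addr < 1 << 16 and 0 <= bank < LANES
    return (decode_bits << 20) | (chip_addr << 4) | bank


DECODE = 0x000  # hypothetical decoder match value
print([hex(system_address(DECODE, n)) for n in range(2)])   # the two lowest-addressed bytes
print("chip size:", (1 << 16) // 1024, "KB")
```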
5 ( 15 + 30 = 45 points) 25 min. Single-cycle CPU:
You are familiar with the ordinary jump instruction J (Jump, with the 26-bit jump address field), Jal (Jump and Link), JR rs (Jump Register rs), and Beq (Branch if Equal). In class we discussed a made-up instruction, Beqal (Branch if Equal and Link, a conditional call instruction). For this question, assume that Beqal unconditionally writes the return address (PC+4) into the link register $31. Hence the control signal "beqal" is crossed off in the control signal table below.
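The behavior that the datapath must implement for these control-flow instructions can be summarized as a next-PC function. The Python sketch below is behavioral, not a datapath. The dict-based instruction format and the function names are hypothetical. As assumed in this question, beqal writes PC + 4 into $31 unconditionally.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(x):
    return x - 0x10000 if x & 0x8000 else x


def next_pc(pc, instr, regs):
    """Return the next PC; jal and beqal also write PC + 4 into regs[31].

    instr is a hypothetical dict such as {"op": "beq", "rs": 1, "rt": 2, "imm16": 8}.
    """
    pc4 = (pc + 4) & MASK32
    op = instr["op"]
    if op in ("jal", "beqal"):
        regs[31] = pc4                                          # link (unconditional for beqal)
    if op in ("j", "jal"):
        return (pc4 & 0xF0000000) | (instr["target26"] << 2)    # pseudo-direct jump
    if op == "jr":
        return regs[instr["rs"]]
    if op in ("beq", "beqal") and regs[instr["rs"]] == regs[instr["rt"]]:
        return (pc4 + (sign_extend16(instr["imm16"]) << 2)) & MASK32
    return pc4                                                  # fall through


regs = [0] * 32
regs[8] = 0x6C
print(hex(next_pc(0x48, {"op": "jr", "rs": 8}, regs)))
```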
5.1 The data path on the next page is nearly complete. Complete the connections to the 7 loose ends marked with numbered arrows.
5.2 Control Signal Table: Complete the three rows and three columns. Whenever possible, use don’t cares.
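Control signal table (beqal column not needed; "_" marks a cell to be completed):
Instruction  ALUSrc ALUOp1 ALUOp0 RegWrite RegDst MemtoReg MemRead MemWrite Branch beqal JUMP Jal JR
R-format     0      1      0      1        1      0        0       0        0      X     _    _   _
lw           1      0      0      1        0      1        1       0        0      X     _    _   _
sw           1      0      0      0        X      X        0       1        0      X     _    _   _
beq          0      0      1      0        X      X        0       0        1      X     _    _   _
beqal        _      _      _      _        _      _        _       _        _      X     _    _   _
J            X      X      X      0        X      X        0       0        X      X     1    _   _
Jal          _      _      _      _        _      _        _       _        _      X     _    1   _
JR rs        _      _      _      _        _      _        _       _        _      X     _    _   1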
11+4pts
21+9pts
It is not difficult to get an A in EE457. You need to work for it and seek help from the 457 teaching team on whatever you do not understand. We are eager to help you. The next three topics, pipelined CPU, cache, and virtual memory, are interesting and challenging too. They are the focus of the midterm exam. Then we cover advanced topics. Best! Gandhi
TAs: Sanmukh, Pezhman, Fangzhou; Mentors: Bo, Monisha, Nishant; HW Graders: Hongtai, Aashish, Yashah; Lab graders: Neil, Dong, Congyi
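[Figure: single-cycle datapath. The Control unit outputs Jump, JR, Jal, PCSrc, RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite. The figure also shows the ALU control block, the ALU Zero output, and multiplexers (inputs labeled 1 and 0) controlled by Jump, JR, and Jal. Jump Address [31:0] is formed from PC+4 [31:28] and Instruction [31:0]. The loose ends are numbered 1 to 7.]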
Blank page: Please write your name and email. Tear it off and use it for rough work. Do not submit.
Student’s Last Name: ____________________ email: __________________