05-superscalar 2
8/3/2019 05-superscalar 2
Nov 2010 ICS311 - superscalar architecture 2 1
Superscalar Processor Architecture (2)
-
Superscalar processor
  Basic concept
  Basic operation
  Limitations/challenges
  Architecture
    Issue policy
    Dealing with false dependencies
  Other design issues
    Operand fetch and update policies
    Dealing with conditional branches
    Preserving the sequential consistency of execution
-
Register operand fetching, updating
-
Operand fetch policies
  Direct issue
  Shelved issue
    Issue bound
    Dispatch bound
-
Operand fetch policies
Consider an instruction of the form:
add Rd, Rs1, Rs2
(IA-32's ADD takes only two operands, e.g. add AX, BX computes AX ← AX + BX, so AX serves as both Rd and Rs1.)
How and when do the values of the source operands reach the execution unit?
How and when does the computed result get committed to the destination operand?
-
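Operand fetch policies (direct issue)
[Figure: I-buffer → decode/issue → EU. The decode/issue unit sends the source register numbers to the register file, which delivers the source operands to the EU; the opcode and destination register number go from decode/issue to the EU.]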
-
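Operand fetch policies (direct issue)
[Figure: decode/issue sends Rs1, Rs2, Rd to the register file, in which every register has a valid (V) bit; the instruction then goes to the EU.]
On issue: fetch Rs1, Rs2 (if available, i.e. their V-bits are set) and reset the V-bit of Rd.
EU input: OC Os1 Os2 Rd (opcode, source operand values, destination register).
On completion: update Rd and set its V-bit.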
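The valid-bit bookkeeping of direct issue can be sketched in Python. This is a minimal model under stated assumptions, not a hardware description. The `RegisterFile` class and its method names are hypothetical. Registers that were never written read as 0. An instruction whose sources are not both valid simply cannot issue, because direct issue has no shelving buffer.

```python
from dataclasses import dataclass, field


@dataclass
class RegisterFile:
    """Hypothetical register file with one valid (V) bit per register."""
    values: dict = field(default_factory=dict)
    valid: dict = field(default_factory=dict)

    def is_valid(self, r: str) -> bool:
        # A register with no pending write is treated as valid.
        return self.valid.get(r, True)

    def try_issue(self, rd: str, rs1: str, rs2: str):
        """Direct issue of 'op Rd, Rs1, Rs2'.

        Returns (Os1, Os2) for the EU, or None if a source is not yet valid,
        in which case issue blocks (there is no shelving buffer).
        """
        if not (self.is_valid(rs1) and self.is_valid(rs2)):
            return None
        # Unwritten registers read as 0 (assumption for the sketch).
        os1, os2 = self.values.get(rs1, 0), self.values.get(rs2, 0)
        self.valid[rd] = False  # reset the V-bit of Rd: result now pending
        return os1, os2

    def write_back(self, rd: str, result: int) -> None:
        # The EU has finished: update Rd and set its V-bit.
        self.values[rd] = result
        self.valid[rd] = True
```

Take `try_issue("R1", "R2", "R3")` as an example. After it, a following `try_issue("R4", "R1", "R2")` returns `None` until `write_back("R1", ...)` sets R1's V-bit again.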
-
Shelved issue
-
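Operand fetch policies (shelved issue): issue bound
[Figure: I-buffer → decode/issue → shelving buffer → dispatch → EU. The decode/issue unit sends the source register numbers to the register file, and the source operands go into the shelving buffer together with the opcode and destination register number.]
Operands are fetched at issue.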
-
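Operand fetch/update (shelved issue): issue bound
[Figure: decode/issue → register file (with V-bits) → reservation station → EU]
On issue: fetch Rs1, Rs2 (if available) and reset the V-bit of Rd.
Reservation station entry: OC Os1 Vs1 Os2 Vs2 Rd (each source operand Osi with its valid bit Vsi).
Dispatch: out of order, 1 instruction/cycle.
On completion: update Rd and set its V-bit.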
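A similar sketch for the issue-bound reservation station follows. The class names are hypothetical. Two behaviours are assumptions not shown on the slide:
- A missing operand's field holds its source register name as a tag.
- A finished result is forwarded to the reservation-station entries waiting for it.

The sketch also assumes register renaming, so no two in-flight instructions share a destination register.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RSEntry:
    """Issue-bound reservation-station entry: OC Os1 Vs1 Os2 Vs2 Rd.

    Assumption: while Vsi is False, Osi holds the source register name
    (a tag) rather than a value.
    """
    oc: str
    os1: object
    vs1: bool
    os2: object
    vs2: bool
    rd: str


class IssueBoundRS:
    """Hypothetical model; assumes renaming gives each in-flight instruction a distinct Rd."""

    def __init__(self, values: dict):
        self.values = dict(values)        # register file contents
        self.valid = {r: True for r in values}
        self.entries = []                 # RSEntry objects, in issue order

    def _fetch(self, r: str):
        # Operand fetch at issue: the value if valid, else the register name as a tag.
        if self.valid.get(r, True):
            return self.values.get(r, 0), True
        return r, False

    def issue(self, oc: str, rd: str, rs1: str, rs2: str) -> None:
        os1, vs1 = self._fetch(rs1)
        os2, vs2 = self._fetch(rs2)
        self.valid[rd] = False            # reset the V-bit of Rd
        self.entries.append(RSEntry(oc, os1, vs1, os2, vs2, rd))

    def dispatch(self) -> Optional[RSEntry]:
        # Out-of-order dispatch, 1 instr/cycle: the oldest entry with both operands valid.
        for i, e in enumerate(self.entries):
            if e.vs1 and e.vs2:
                return self.entries.pop(i)
        return None

    def write_back(self, rd: str, result: int) -> None:
        self.values[rd] = result          # update Rd ...
        self.valid[rd] = True             # ... and set its V-bit
        # Assumed: the result is also forwarded to entries still waiting for Rd.
        for e in self.entries:
            if not e.vs1 and e.os1 == rd:
                e.os1, e.vs1 = result, True
            if not e.vs2 and e.os2 == rd:
                e.os2, e.vs2 = result, True
```

Consider issuing `iadd R1`, then `imul R4, R1, R6`, then an independent `isub`. The `isub` is dispatched ahead of the `imul`, which waits in the reservation station until R1 is written back.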
-
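Operand fetch policies (shelved issue): dispatch bound
[Figure: I-buffer → decode/issue → shelving buffer → dispatch → EU. The shelving buffer holds the opcode and the source and destination register numbers; at dispatch the source register numbers go to the register file, which delivers the source operands to the EU.]
Operands are fetched at dispatch.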
-
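Operand fetch/update (shelved issue): dispatch bound
[Figure: decode/issue → reservation station → dispatch → EU, with the register file (with V-bits) read at dispatch]
Reservation station entry: OC Rs1 Rs2 Rd (register numbers only).
Register file access: fetch Rs1, Rs2 (if available) at dispatch; reset the V-bit of Rd.
Dispatched instruction: OC Os1 Vs1 Os2 Vs2 Rd.
On completion: update Rd and set its V-bit.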
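For comparison, here is a hypothetical dispatch-bound model. The reservation station stores only register numbers, and operands are read from the register file when an entry is dispatched. As before, renaming is assumed, so each in-flight instruction has its own Rd. Resetting Rd's V-bit at issue is an assumed timing.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DBEntry:
    """Dispatch-bound reservation-station entry: OC Rs1 Rs2 Rd (register numbers only)."""
    oc: str
    rs1: str
    rs2: str
    rd: str


class DispatchBoundRS:
    """Hypothetical model; assumes renaming gives each in-flight instruction a distinct Rd."""

    def __init__(self, values: dict):
        self.values = dict(values)        # register file contents
        self.valid = {r: True for r in values}
        self.entries = []                 # DBEntry objects, in issue order

    def issue(self, oc: str, rd: str, rs1: str, rs2: str) -> None:
        # No operand fetch here; only the register numbers are shelved.
        self.valid[rd] = False            # V-bit of Rd reset at issue (assumed timing)
        self.entries.append(DBEntry(oc, rs1, rs2, rd))

    def dispatch(self) -> Optional[Tuple[str, int, int, str]]:
        # Oldest entry whose source registers are valid; its operands are read now.
        for i, e in enumerate(self.entries):
            if self.valid.get(e.rs1, True) and self.valid.get(e.rs2, True):
                del self.entries[i]
                return e.oc, self.values.get(e.rs1, 0), self.values.get(e.rs2, 0), e.rd
        return None

    def write_back(self, rd: str, result: int) -> None:
        # Update Rd and set its V-bit; waiting entries pick the value up at dispatch.
        self.values[rd] = result
        self.valid[rd] = True
```

In this version no forwarding into the reservation station is needed. A waiting entry simply finds the V-bit set the next time it is considered for dispatch.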
-
In the case of shelved issue
Compare issue bound vs. dispatch bound
-
Dealing with conditional branches: speculative execution
-
Non-speculative execution
  On encountering a conditional branch, suspend fetching and processing of the instructions that depend on it until the branch is resolved. This leads to delays.
Speculative execution
  On encountering a conditional branch, predict the direction of the branch and continue processing along the predicted path. This avoids delays as long as the predictions are correct.
-
Branch prediction
  Static branch prediction
    Predict never taken
      Continue to fetch instructions sequentially until the branch direction is resolved.
    Predict always taken
      As soon as the branch target is decoded, start fetching from the target. (Note that the target may be decoded before the branch condition is evaluated and the branch direction is resolved.)
Speculative execution
-
Static branch prediction
  Predict by opcode, e.g.
    JLE is typically used for loop control, and the branch is usually taken.
    JE is also used in loop control but is usually not taken.
  The compiler may exploit this.
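The static schemes above can be compared on a branch trace. The sketch below is illustrative only. The opcode table encodes the slide's JLE/JE rule of thumb, and the trace is made up, not measured.

```python
# Each predictor maps a branch opcode to a predicted direction (True = taken).
def predict_never_taken(opcode: str) -> bool:
    return False


def predict_always_taken(opcode: str) -> bool:
    return True


# Predict by opcode, following the slide's rule of thumb (hypothetical table):
# JLE (loop control) is usually taken, JE is usually not taken.
OPCODE_HINT = {"JLE": True, "JE": False}


def predict_by_opcode(opcode: str) -> bool:
    return OPCODE_HINT.get(opcode, False)   # unknown opcodes: not taken (assumption)


def accuracy(predictor, trace) -> float:
    """Fraction of (opcode, taken) outcomes in `trace` predicted correctly."""
    hits = sum(predictor(op) == taken for op, taken in trace)
    return hits / len(trace)


# Made-up trace for illustration only: a JLE loop branch taken 3 times then
# falling through, and a JE branch not taken 3 times then taken once.
TRACE = [("JLE", True)] * 3 + [("JLE", False)] + [("JE", False)] * 3 + [("JE", True)]
```

On this made-up trace, never-taken and always-taken each predict 4 of the 8 outcomes correctly (accuracy 0.5). The opcode-based predictor gets 6 of 8 (0.75).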
-
Dynamic branch prediction
  Taken/not-taken switch
    Keep a record of the outcome of the previous execution of the branch and assume the current execution will go the same way.
  Branch history table
    Maintain the recent history of a branch's executions. Use the history to predict its direction.
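The two dynamic schemes can be sketched the same way. The taken/not-taken switch keeps one bit per branch. For the branch history table, the sketch assumes one common realization: a 2-bit saturating counter per branch. The slide does not say how much history is kept. The branch address and trace are hypothetical.

```python
class OneBitPredictor:
    """Taken/not-taken switch: predict whatever this branch did last time."""

    def __init__(self, initial: bool = True):
        self.last = {}           # branch address -> last outcome
        self.initial = initial   # prediction for a branch not seen before

    def predict(self, pc: int) -> bool:
        return self.last.get(pc, self.initial)

    def update(self, pc: int, taken: bool) -> None:
        self.last[pc] = taken


class TwoBitBHT:
    """Branch history table with a 2-bit saturating counter per branch (assumed design).

    Counter values 0-1 predict not taken, 2-3 predict taken.
    """

    def __init__(self, initial: int = 2):
        self.counter = {}
        self.initial = initial

    def predict(self, pc: int) -> bool:
        return self.counter.get(pc, self.initial) >= 2

    def update(self, pc: int, taken: bool) -> None:
        c = self.counter.get(pc, self.initial)
        self.counter[pc] = min(c + 1, 3) if taken else max(c - 1, 0)


def mispredictions(predictor, trace) -> int:
    """Count wrong predictions over (pc, taken) pairs, updating after each branch."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# Hypothetical loop branch at address 0x40: taken 3 times, then exits; loop run twice.
LOOP_TRACE = [(0x40, True), (0x40, True), (0x40, True), (0x40, False)] * 2
```

On this trace the 1-bit switch mispredicts 3 times and the 2-bit counter only 2 times. A single not-taken outcome at loop exit does not flip the 2-bit prediction for the next run of the loop.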
-
Branch prediction
  What happens if the prediction is wrong?
    The speculatively issued, dispatched and executed instructions must be flushed from the buffers and EUs.
    The effects of the speculatively executed instructions must be cancelled/discarded.
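Flushing on a misprediction amounts to discarding every instruction younger than the branch from each buffer. The sketch below assumes each instruction carries a hypothetical program-order sequence number. It also assumes the flushed instructions have not written back yet, which in-order retirement through the ROB (described next) provides.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    seq: int      # position in program order (hypothetical sequence number)
    text: str


def flush_after(branch_seq: int, *buffers: list) -> list:
    """Flush every instruction younger than the mispredicted branch.

    Each buffer (I-buffer, shelving buffers, EUs, ...) is a list of Instr and is
    modified in place. Returns the flushed instructions in program order.
    Assumes the flushed instructions have not yet written back their results,
    so discarding them also discards their effects.
    """
    flushed = []
    for buf in buffers:
        flushed.extend(i for i in buf if i.seq > branch_seq)
        buf[:] = [i for i in buf if i.seq <= branch_seq]
    return sorted(flushed, key=lambda i: i.seq)
```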
-
Preserving the sequential consistency of execution: ROB
-
Parallel execution
  Instructions can finish out of program order, so sequential consistency must be preserved.
Definitions: distinguish between
  finish: the required operation of the instruction has been accomplished, except for writeback
  complete: the last action has been performed, e.g. writeback (wb)
  retire: writeback plus deletion of the instruction's entry
-
Preserving sequential consistency of instruction execution using a re-order buffer (ROB)
-
Re-order buffer (ROB): description and use
  A cyclic buffer holding a record of all active instructions, i.e. those issued but not yet retired.
  Keeps track of the state of each instruction, e.g. i (issued), x (in execution), f (finished).
  head: first free slot; tail: next instruction to be retired.
  A new instruction is added at the head on issue.
  An instruction in the ROB may complete and retire only if it has finished and all instructions ahead of it have retired.
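A cyclic ROB with the naming used on this slide can be sketched as below: head is the first free slot, and tail is the next instruction to retire. The state letters i, x and f follow the slide. The class and method names, and stalling issue when the ROB is full, are assumptions.

```python
class ReorderBuffer:
    """Cyclic ROB. `head` = first free slot (new instructions enter here on issue),
    `tail` = next instruction to be retired. States: 'i' issued,
    'x' in execution, 'f' finished."""

    def __init__(self, size: int):
        self.slots = [None] * size   # each slot: [instr_name, state] or None
        self.head = 0                # first free slot
        self.tail = 0                # next instruction to retire
        self.count = 0

    def issue(self, name: str) -> int:
        if self.count == len(self.slots):
            raise RuntimeError("ROB full: issue must stall")
        slot = self.head
        self.slots[slot] = [name, "i"]
        self.head = (self.head + 1) % len(self.slots)
        self.count += 1
        return slot                  # the instruction's ROB index

    def set_state(self, slot: int, state: str) -> None:
        self.slots[slot][1] = state  # 'x' on dispatch, 'f' on finish

    def retire(self) -> list:
        """Retire, in program order, every finished instruction at the tail."""
        retired = []
        while self.count and self.slots[self.tail][1] == "f":
            retired.append(self.slots[self.tail][0])  # writeback would happen here
            self.slots[self.tail] = None              # ... plus delete
            self.tail = (self.tail + 1) % len(self.slots)
            self.count -= 1
        return retired
```

Suppose I3 finishes before I1. `retire()` then returns nothing until I1 has also finished, and I3 is retired only after I1 and I2.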
-
Preserving sequential consistency: ROB
[Figure: I-buffer → decode/issue unit → shelving buffers → dispatch unit → EUs → completion/retire unit, with the ROB tracking each instruction as issued, in execution, finished, and completed/retired.]
-
Exercise
-
Example superscalar pipeline
  In-order, non-blocking issue. Issue rate = 1 instruction per cycle.
  Shelved issue. One 4-entry reservation station. No bypassing.
  3 execution units with the same functions (iadd, isub, imul, idiv).
  Out-of-order dispatch possible. Dispatch window = 3. Dispatch rate = 1 instruction per cycle.
  Execution times: iadd, isub = 3 cycles; imul, idiv = 6 cycles (EUs not pipelined).
  Register renaming with a very large number of hardware registers.
Exercise
-
Given the following instruction stream:
I1: iadd R1, R2, R3
I2: imul R4, R5, R6
I3: iadd R7, R8, R9
I4: iadd R4, R7, R8
I5: idiv R10, R8, R3
I6: isub R2, R5, R6
I7: iadd R11, R3, R6
I8: isub R12, R3, R6
I9: .
Exercise
Instruction format:
opcode Rdest, Rsrc1, Rsrc2
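Because the exercise assumes register renaming, only true (RAW) dependencies can hold instructions back; WAR and WAW dependencies are removed by renaming. As a starting point for the trace, a small hypothetical Python helper (not part of the exercise) can classify the dependencies among I1 to I8.

```python
# I1..I8 from the exercise, as (name, opcode, dest, src1, src2).
PROGRAM = [
    ("I1", "iadd", "R1", "R2", "R3"),
    ("I2", "imul", "R4", "R5", "R6"),
    ("I3", "iadd", "R7", "R8", "R9"),
    ("I4", "iadd", "R4", "R7", "R8"),
    ("I5", "idiv", "R10", "R8", "R3"),
    ("I6", "isub", "R2", "R5", "R6"),
    ("I7", "iadd", "R11", "R3", "R6"),
    ("I8", "isub", "R12", "R3", "R6"),
]


def classify(program):
    """Return (raw, war, waw) as lists of (earlier, later) instruction pairs."""
    raw, war, waw = [], [], []
    last_writer = {}                        # register -> most recent earlier writer
    for j, (name, _op, dest, *srcs) in enumerate(program):
        for src in dict.fromkeys(srcs):     # each distinct source once, in order
            if src in last_writer:          # read-after-write (true dependency)
                raw.append((last_writer[src], name))
        for earlier, _op2, e_dest, *e_srcs in program[:j]:
            if dest in e_srcs:              # write-after-read (anti-dependency)
                war.append((earlier, name))
            if dest == e_dest:              # write-after-write (output dependency)
                waw.append((earlier, name))
        last_writer[dest] = name
    return raw, war, waw
```

For this stream it reports one RAW pair (I3 → I4, through R7), one WAR pair (I1 → I6, through R2) and one WAW pair (I2 → I4, through R4).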
-
Trace the processing of I1 to I8, assuming that initially all 8 instructions are in the I-buffer.
  Show the contents of the I-buffer, RS and EUs during each cycle.
  Show the contents of the ROB during each cycle.
  Derive the total execution time of the 8 instructions in cycles (from the time I1 is issued to the time I8 is retired).
Exercise
-
Superscalar implementation
Processor hardware requirements: summary
-
Processor hardware requirements: summary
  Multiple pipelined fetch and decode stages, and branch prediction logic.
  Logic for determining true data dependencies, and mechanisms for communicating values to where they are needed during execution.
  Mechanisms for issuing multiple instructions in parallel.
-
Processor hardware requirements: summary
  Resources for parallel execution of multiple instructions:
    Multiple pipelined functional units
    Memory hierarchies capable of servicing multiple references simultaneously
  Mechanisms for committing the process state in the correct order.
-
Recognizing opportunities for parallel processing of instructions
-
Recognizing opportunities for parallel processing of instructions
  Processor hardware
    The processor hardware is designed to look for and recognize opportunities for parallelism.
    This implies sophisticated, expensive, bulky hardware.
  Compiler support
    The compiler is programmed to look for and recognize opportunities for parallelism, and then to communicate them to the hardware (the machine language must support this).
-
Recognizing opportunities for parallel processing of instructions
  Programming language support
    The programming language allows the programmer to communicate opportunities for parallel processing.
    This information is communicated to the hardware via the compiler.
-
Reading assignment:
  Intel Pentium II superscalar processor architecture. Stallings, pp. 515-520.
  PowerPC 601 superscalar processor architecture. Stallings, pp. 521-526.
  MPC7450 Microprocessor, Motorola, pp. 1-4 (attached).
-
Superscalar Architecture
Practical assignment
  Use the SimpleScalar simulation tool to demonstrate the effect of a processor's superscalarity on performance, as measured by instruction throughput.
  You should be able to produce a graph illustrating the relationship between the two variables, i.e. a selected aspect of superscalarity and instruction throughput.
  You may use a selected benchmark program to carry out the above.
  Hand in your results, including a report on how you obtained them.
  Work in groups of no more than four.
Ref: http://www.simplescalar.com
-
Next: superthreading