8/17/2019 Chapter 04 Computer Organization and Design, Fifth Edition: The Hardware/Software Interface (The Morgan Kaufmann Series in Computer Architecture and Design) 5th Edition
COMPUTER ORGANIZATION AND DESIGN
The Hardware/Software Interface
5th Edition
Chapter 4
The Processor
Chapter 4 — The Processor — 2
§4.1 Introduction
CPU performance factors
  Instruction count
    Determined by ISA and compiler
  CPI and cycle time
    Determined by CPU hardware
We will examine two MIPS implementations
  A simplified version
  A more realistic pipelined version
Simple subset, shows most aspects
  Memory reference: lw, sw
  Arithmetic/logical: add, sub, and, or, slt
  Control transfer: beq, j
Instruction Execution
PC → instruction memory, fetch instruction
Register numbers → register file, read registers
Depending on instruction class
  Use ALU to calculate
    Arithmetic result
    Memory address for load/store
    Branch target address
  Access data memory for load/store
  PC ← target address or PC + 4
CPU Overview
Multiplexers
Can't just join wires together
  Use multiplexers
Control
§4.2 Logic Design Conventions
Logic Design Basics
Information encoded in binary
  Low voltage = 0, High voltage = 1
  One wire per bit
  Multi-bit data encoded on multi-wire buses
Combinational element
  Operate on data
  Output is a function of input
State (sequential) elements
  Store information
Combinational Elements
AND-gate: Y = A & B
Multiplexer: Y = S ? I1 : I0
Adder: Y = A + B
Arithmetic/Logic Unit: Y = F(A, B)
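The four combinational elements above can be modeled as pure functions, since their output depends only on their current inputs. The following is a minimal sketch; the function names and the 32-bit wrap-around in the adder are assumptions made for illustration, not part of the slides.

```python
from typing import Callable

def and_gate(a: int, b: int) -> int:
    """Y = A & B (bitwise, so it also works on multi-bit buses)."""
    return a & b

def mux(s: int, i0: int, i1: int) -> int:
    """Y = S ? I1 : I0 (select input I1 when S is 1, otherwise I0)."""
    return i1 if s else i0

def adder(a: int, b: int, width: int = 32) -> int:
    """Y = A + B, truncated to `width` bits (assumed bus width)."""
    return (a + b) & ((1 << width) - 1)

def alu(f: Callable[[int, int], int], a: int, b: int) -> int:
    """Y = F(A, B), where F is the operation selected by the control input."""
    return f(a, b)
```

Because none of these functions keeps state, calling one twice with the same inputs gives the same output, which is the defining property of a combinational element.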
Sequential Elements
Register: stores data in a circuit
  Uses a clock signal to determine when to update the stored value
  Edge-triggered: update when Clk changes from 0 to 1
[Figure: register with inputs D and Clk and output Q, with a timing diagram of Clk, D, and Q]
Sequential Elements
Register with write control
  Only updates on clock edge when write control input is 1
  Used when stored value is required later
[Figure: register with inputs D, Clk, and Write and output Q, with a timing diagram of Write, D, Q, and Clk]
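A rough behavioral model of such a register is sketched below. The class name `Register` and the method `tick` are hypothetical; the model samples the inputs once per call and treats a 0-to-1 change of the clock as the rising edge.

```python
class Register:
    """Edge-triggered register with a write-control input (illustrative model)."""

    def __init__(self, value: int = 0) -> None:
        self.q = value          # stored value, visible on output Q
        self._prev_clk = 0      # previous clock level, used to detect a rising edge

    def tick(self, clk: int, d: int, write: int = 1) -> int:
        """Present clock level `clk`, data `d`, and write control; return Q."""
        rising_edge = self._prev_clk == 0 and clk == 1
        if rising_edge and write:
            self.q = d          # update only on a rising edge with Write = 1
        self._prev_clk = clk
        return self.q
```

With `write` held at 1 the model behaves like the plain edge-triggered register from the previous slide; with `write` at 0 the stored value survives any number of clock edges.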
Clocking Methodology
Combinational logic transforms data during clock cycles
  Between clock edges
  Input from state elements, output to state element
  Longest delay determines clock period
§4.3 Building a Datapath
Datapath
  Elements that process data and addresses in the CPU
  Registers, ALUs, muxes, memories, …
We will build a MIPS datapath incrementally
  Refining the overview design
Instruction Fetch
[Figure: instruction fetch datapath]
32-bit register
Increment by 4 for next instruction
R-Format Instructions
Read two register operands
Perform arithmetic/logical operation
Write register result
Load/Store Instructions
Read register operands
Calculate address using 16-bit offset
  Use ALU, but sign-extend offset
Load: Read memory and update register
Store: Write register value to memory
Branch Instructions
Read register operands
Compare operands
  Use ALU, subtract and check Zero output
Calculate target address
  Sign-extend displacement
  Shift left 2 places (word displacement)
  Add to PC + 4
    Already calculated by instruction fetch
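The target-address steps listed above can be sketched in a few lines. The helper names `sign_extend_16` and `branch_target` are hypothetical, and the final 32-bit mask is an assumption about the address width.

```python
def sign_extend_16(value: int) -> int:
    """Interpret the low 16 bits of `value` as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def branch_target(pc: int, displacement16: int) -> int:
    """Target = (PC + 4) + (sign-extended displacement shifted left 2 places)."""
    word_offset = sign_extend_16(displacement16)
    return (pc + 4 + (word_offset << 2)) & 0xFFFFFFFF
```

For example, a branch at address 40 with displacement 7 targets 40 + 4 + 28 = 72, and a displacement of 0xFFFF (that is, -1) targets the branch instruction itself.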
Branch Instructions
[Figure: branch datapath]
Just re-routes wires
Sign-bit wire replicated
Composing the Elements
First-cut data path does an instruction in one clock cycle
  Each datapath element can only do one function at a time
  Hence, we need separate instruction and data memories
Use multiplexers where alternate data sources are used for different instructions
R-Type/Load/Store Datapath
Full Datapath
ALU Control
Assume 2-bit ALUOp derived from opcode
  Combinational logic derives ALU control

opcode  ALUOp  Operation         funct   ALU function      ALU control
lw      00     load word         XXXXXX  add               0010
sw      00     store word        XXXXXX  add               0010
beq     01     branch equal      XXXXXX  subtract          0110
R-type  10     add               100000  add               0010
               subtract          100010  subtract          0110
               AND               100100  AND               0000
               OR                100101  OR                0001
               set-on-less-than  101010  set-on-less-than  0111
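The table above amounts to a small lookup, sketched here as a function. The names `alu_control` and `FUNCT_TO_CONTROL` are hypothetical; the bit patterns are the ones in the table.

```python
# funct field (bits 5:0) -> 4-bit ALU control, used only for R-type (ALUOp = 10)
FUNCT_TO_CONTROL = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set-on-less-than
}

def alu_control(alu_op: int, funct: int = 0) -> int:
    """Derive the 4-bit ALU control value from ALUOp and the funct field."""
    if alu_op == 0b00:      # lw / sw: add to form the address, funct ignored
        return 0b0010
    if alu_op == 0b01:      # beq: subtract to compare, funct ignored
        return 0b0110
    if alu_op == 0b10:      # R-type: operation chosen by funct
        return FUNCT_TO_CONTROL[funct]
    raise ValueError(f"unsupported ALUOp: {alu_op:#04b}")
```

Splitting the decode into ALUOp plus funct keeps the main control unit small: it only needs to look at the opcode.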
The Main Control Unit
Control signals derived from instruction

R-type      0         rs     rt     rd     shamt  funct
            31:26     25:21  20:16  15:11  10:6   5:0
Load/Store  35 or 43  rs     rt     address
            31:26     25:21  20:16  15:0
Branch      4         rs     rt     address
            31:26     25:21  20:16  15:0

Field annotations
  Bits 31:26: opcode
  rs: always read
  rt: read, except for load
  rd (R-type) or rt (load): write for R-type and load
  address: sign-extend and add
Datapath With Control
R-Type Instruction
Load Instruction
Branch-on-Equal Instruction
Implementing Jumps

Jump  2      address
      31:26  25:0

Jump uses word address
Update PC with concatenation of
  Top 4 bits of old PC
  26-bit jump address
  00
Need an extra control signal decoded from opcode
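The concatenation described above can be written with masks and shifts. This is a sketch only: `jump_target` is a hypothetical name, and `old_pc` stands for whichever PC value supplies the top 4 bits (the slide says "old PC").

```python
def jump_target(old_pc: int, address26: int) -> int:
    """New PC = top 4 bits of old PC, then the 26-bit jump address, then 00."""
    top4 = old_pc & 0xF0000000                    # bits 31:28 of the old PC
    word_address = address26 & 0x03FFFFFF         # 26-bit field from bits 25:0
    return top4 | (word_address << 2)             # shifting left 2 appends "00"
```

Since only 26 address bits come from the instruction, a jump can only reach targets inside the same 256 MB region selected by the top 4 bits.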
Datapath With Jumps Added
Performance Issues
Longest delay determines clock period
  Critical path: load instruction
  Instruction memory → register file → ALU → data memory → register file
Not feasible to vary period for different instructions
Violates design principle
  Making the common case fast
We will improve performance by pipelining
§4.5 An Overview of Pipelining
Pipelining Analogy
Pipelined laundry: overlapping execution
  Parallelism improves performance
Four loads:
  Speedup = 8/3.5 = 2.3
Non-stop:
  Speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages
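The two speedup figures above follow from one formula. The sketch below assumes four laundry stages of half an hour each, which is what the expression 2n/(0.5n + 1.5) implies; the function name `laundry_speedup` is made up for this example.

```python
def laundry_speedup(n_loads: int, stage_hours: float = 0.5, n_stages: int = 4) -> float:
    """Speedup of pipelined over sequential laundry for `n_loads` loads."""
    # Sequential: every load passes through all stages before the next starts.
    sequential = n_loads * n_stages * stage_hours          # 2n hours
    # Pipelined: the first load takes n_stages steps, each later load adds one.
    pipelined = (n_stages + n_loads - 1) * stage_hours     # 0.5n + 1.5 hours
    return sequential / pipelined
```

For four loads this gives 8 / 3.5, roughly 2.3, and as the number of loads grows the value approaches 4, the number of stages.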
MIPS Pipeline
Five stages, one step per stage
  1. IF: Instruction fetch from memory
  2. ID: Instruction decode & register read
  3. EX: Execute operation or calculate address
  4. MEM: Access memory operand
  5. WB: Write result back to register
Pipeline Performance
Assume time for stages is
  100ps for register read or write
  200ps for other stages
Compare pipelined datapath with single-cycle datapath

Instr     Instr fetch  Register read  ALU op  Memory access  Register write  Total time
lw        200ps        100ps          200ps   200ps          100ps           800ps
sw        200ps        100ps          200ps   200ps                          700ps
R-format  200ps        100ps          200ps                  100ps           600ps
beq       200ps        100ps          200ps                                  500ps
Pipeline Performance
Single-cycle (Tc = 800ps)
Pipelined (Tc = 200ps)
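The two clock periods can be derived from the stage times assumed on the previous slide. The sketch below uses hypothetical names (`STAGE_PS`, `single_cycle_time`, `pipelined_time`) and assumes an ideal pipeline with no stalls.

```python
# Stage latencies in picoseconds, as assumed in the comparison table.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

def single_cycle_time(n_instructions: int) -> int:
    """Clock must fit the slowest instruction (lw uses every stage: 800 ps)."""
    return n_instructions * sum(STAGE_PS.values())

def pipelined_time(n_instructions: int) -> int:
    """Clock must fit the slowest stage (200 ps); the pipeline needs 4 fill cycles."""
    cycles = len(STAGE_PS) + n_instructions - 1
    return cycles * max(STAGE_PS.values())
```

Three back-to-back instructions take 2400 ps single-cycle and 1400 ps pipelined; the gap is short of the ideal factor of five because the stages are not balanced and the pipeline has to fill first.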
Pipeline Speedup
If all stages are balanced
  i.e., all take the same time
  Time between instructions (pipelined) = Time between instructions (nonpipelined) / Number of stages
If not balanced, speedup is less
Speedup due to increased throughput
  Latency (time for each instruction) does not decrease
Pipelining and ISA Design
MIPS ISA designed for pipelining
  All instructions are 32-bits
    Easier to fetch and decode in one cycle
    c.f. x86: 1- to 17-byte instructions
  Few and regular instruction formats
    Can decode and read registers in one step
  Load/store addressing
    Can calculate address in 3rd stage, access memory in 4th stage
  Alignment of memory operands
    Memory access takes only one cycle
Hazards
Situations that prevent starting the next instruction in the next cycle
Structure hazards
  A required resource is busy
Data hazard
  Need to wait for previous instruction to complete its data read/write
Control hazard
  Deciding on control action depends on previous instruction
Structure Hazards
Conflict for use of a resource
In MIPS pipeline with a single memory
  Load/store requires data access
  Instruction fetch would have to stall for that cycle
    Would cause a pipeline "bubble"
Hence, pipelined datapaths require separate instruction/data memories
  Or separate instruction/data caches
Data Hazards
An instruction depends on completion of data access by a previous instruction
  add $s0, $t0, $t1
  sub $t2, $s0, $t3
Forwarding (aka Bypassing)
Use result when it is computed
  Don't wait for it to be stored in a register
  Requires extra connections in the datapath
Load-Use Data Hazard
Can't always avoid stalls by forwarding
  If value not computed when needed
  Can't forward backward in time!
Code Scheduling to Avoid Stalls
Reorder code to avoid use of load result in the next instruction
C code for A = B + E; C = B + F;

Original order (13 cycles)
  lw  $t1, 0($t0)
  lw  $t2, 4($t0)
  (stall)
  add $t3, $t1, $t2
  sw  $t3, 12($t0)
  lw  $t4, 8($t0)
  (stall)
  add $t5, $t1, $t4
  sw  $t5, 16($t0)

Reordered (11 cycles)
  lw  $t1, 0($t0)
  lw  $t2, 4($t0)
  lw  $t4, 8($t0)
  add $t3, $t1, $t2
  sw  $t3, 12($t0)
  add $t5, $t1, $t4
  sw  $t5, 16($t0)
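The 13-cycle and 11-cycle totals can be checked with a small counting model. It assumes a five-stage pipeline with forwarding, where the only stall is one cycle when an instruction uses the result of the load immediately before it. The tuple encoding `(op, dest, sources)` and the names `count_cycles`, `ORIGINAL`, and `REORDERED` are invented for this sketch.

```python
def count_cycles(program, n_stages=5):
    """Cycles to run `program`, charging one stall per load-use pair."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1                       # load result needed one cycle too early
    # First instruction needs n_stages cycles; each later one finishes a cycle after.
    return len(program) + (n_stages - 1) + stalls

ORIGINAL = [
    ("lw", "$t1", ("$t0",)),
    ("lw", "$t2", ("$t0",)),
    ("add", "$t3", ("$t1", "$t2")),
    ("sw", None, ("$t3", "$t0")),
    ("lw", "$t4", ("$t0",)),
    ("add", "$t5", ("$t1", "$t4")),
    ("sw", None, ("$t5", "$t0")),
]

# Same instructions with the third load hoisted above the first add.
REORDERED = [ORIGINAL[0], ORIGINAL[1], ORIGINAL[4], ORIGINAL[2],
             ORIGINAL[3], ORIGINAL[5], ORIGINAL[6]]
```

Both orders execute the same seven instructions; moving the third `lw` up separates each load from its first use, which removes both stalls.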
Control Hazards
Branch determines flow of control
  Fetching next instruction depends on branch outcome
  Pipeline can't always fetch correct instruction
    Still working on ID stage of branch
In MIPS pipeline
  Need to compare registers and compute target early in the pipeline
  Add hardware to do it in ID stage
Stall on Branch
Wait until branch outcome determined before fetching next instruction
Branch Prediction
Longer pipelines can't readily determine branch outcome early
  Stall penalty becomes unacceptable
Predict outcome of branch
  Only stall if prediction is wrong
In MIPS pipeline
  Can predict branches not taken
  Fetch instruction after branch, with no delay
MIPS with Predict Not Taken
More-Realistic Branch Prediction
Static branch prediction
  Based on typical branch behavior
  Example: loop and if-statement branches
    Predict backward branches taken
    Predict forward branches not taken
Dynamic branch prediction
  Hardware measures actual branch behavior
    e.g., record recent history of each branch
  Assume future behavior will continue the trend
    When wrong, stall while re-fetching, and update history
Pipeline Summary
The BIG Picture
Pipelining improves performance by increasing instruction throughput
  Executes multiple instructions in parallel
  Each instruction has the same latency
Subject to hazards
  Structure, data, control
Instruction set design affects complexity of pipeline implementation
§4.6 Pipelined Datapath and Control
MIPS Pipelined Datapath
[Figure: pipelined datapath with the MEM and WB stages labeled]
Right-to-left flow leads to hazards
Pipeline registers
Need registers between stages
  To hold information produced in previous cycle
Pipeline Operation
Cycle-by-cycle flow of instructions through the pipelined datapath
  "Single-clock-cycle" pipeline diagram
    Shows pipeline usage in a single cycle
    Highlight resources used
  c.f. "multi-clock-cycle" diagram
    Graph of operation over time
We'll look at "single-clock-cycle" diagrams for load & store
IF for Load, Store, …
ID for Load, Store, …
EX for Load
MEM for Load
WB for Load
Wrong register number
Corrected Datapath for Load
EX for Store
MEM for Store
WB for Store
Multi-Cycle Pipeline Diagram
Form showing resource usage
Multi-Cycle Pipeline Diagram
Traditional form
Single-Cycle Pipeline Diagram
State of pipeline in a given cycle
Pipelined Control (Simplified)
Pipelined Control
Control signals derived from instruction
  As in single-cycle implementation
Pipelined Control
§4.7 Data Hazards: Forwarding vs. Stalling
Data Hazards in ALU Instructions
Consider this sequence:
  sub $2, $1,$3
  and $12,$2,$5
  or  $13,$6,$2
  add $14,$2,$2
  sw  $15,100($2)
We can resolve hazards with forwarding
  How do we detect when to forward?
Dependencies & Forwarding
Detecting the Need to Forward
Detecting the Need to Forward
Forwarding Paths
Forwarding Conditions
EX hazard
  if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
      and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
    ForwardA = 10
  if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
      and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
    ForwardB = 10
MEM hazard
  if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
      and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
    ForwardA = 01
  if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
      and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
    ForwardB = 01
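The conditions above translate almost directly into code. In the sketch below the function and parameter names are hypothetical stand-ins for the pipeline-register fields, and one assumption goes beyond this slide: when both the EX and the MEM condition match the same operand, the EX hazard is given priority because it holds the more recent result.

```python
def forward_select(src_reg, ex_mem_reg_write, ex_mem_rd, mem_wb_reg_write, mem_wb_rd):
    """Return the 2-bit forwarding mux select for one ALU operand register."""
    # EX hazard: the instruction one ahead is about to write this register.
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == src_reg:
        return 0b10
    # MEM hazard: the instruction two ahead is about to write this register.
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == src_reg:
        return 0b01
    return 0b00   # no hazard: use the value read from the register file

def forwarding_unit(id_ex_rs, id_ex_rt,
                    ex_mem_reg_write, ex_mem_rd, mem_wb_reg_write, mem_wb_rd):
    """Return (ForwardA, ForwardB) for the instruction currently in EX."""
    forward_a = forward_select(id_ex_rs, ex_mem_reg_write, ex_mem_rd,
                               mem_wb_reg_write, mem_wb_rd)
    forward_b = forward_select(id_ex_rt, ex_mem_reg_write, ex_mem_rd,
                               mem_wb_reg_write, mem_wb_rd)
    return forward_a, forward_b
```

The check against register 0 matters because $0 is hard-wired to zero in MIPS, so a "result" destined for it must never be forwarded.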
Datapath with Forwarding
Datapath with Forwarding
Load-Use Data Hazard
Need to stall for one cycle
Load-Use Hazard Detection
Check when using instruction is decoded in ID stage
ALU operand register numbers in ID stage are given by
  IF/ID.RegisterRs, IF/ID.RegisterRt
Load-use hazard when
  ID/EX.MemRead and
    ((ID/EX.RegisterRt = IF/ID.RegisterRs) or
     (ID/EX.RegisterRt = IF/ID.RegisterRt))
If detected, stall and insert bubble
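The detection condition above is a single Boolean expression. A minimal sketch follows; the function and parameter names are hypothetical and mirror the pipeline-register fields named on the slide.

```python
def load_use_hazard(id_ex_mem_read: bool, id_ex_rt: int,
                    if_id_rs: int, if_id_rt: int) -> bool:
    """True when the instruction in ID needs the result of a load now in EX."""
    return bool(id_ex_mem_read and
                (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt))
```

ID/EX.MemRead is what restricts the check to loads: a load writes its rt register, and its data is not available until the end of MEM, so forwarding alone cannot supply it to the very next instruction.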
How to Stall the Pipeline
Force control values in ID/EX register to 0
  EX, MEM and WB do nop (no-operation)
Prevent update of PC and IF/ID register
  Using instruction is decoded again
  Following instruction is fetched again
  1-cycle stall allows MEM to read data for lw
    Can subsequently forward to EX stage
Stall/Bubble in the Pipeline
Stall/Bubble in the Pipeline
Or, more accurately…
Datapath with Hazard Detection
Stalls and Performance
The BIG Picture
Stalls reduce performance
  But are required to get correct results
Compiler can arrange code to avoid hazards and stalls
  Requires knowledge of the pipeline structure
§4.8 Control Hazards
Branch Hazards
If branch outcome determined in MEM
[Figure: pipeline diagram of a taken branch, with the PC marked]
Flush these instructions (Set control values to 0)
Reducing Branch Delay
Move hardware to determine outcome to ID stage
  Target address adder
  Register comparator
Example: branch taken
  36: sub $10, $4, $8
  40: beq $1, $3, 7
  44: and $12, $2, $5
  48: or  $13, $2, $6
  52: add $14, $4, $2
  56: slt $15, $6, $7
      ...
  72: lw  $4, 50($7)
Example: Branch Taken
Example: Branch Taken
Data Hazards for Branches
If a comparison register is a destination of 2nd or 3rd preceding ALU instruction

  add $1, $2, $3       IF  ID  EX  MEM WB
  add $4, $5, $6           IF  ID  EX  MEM WB
  …                            IF  ID  EX  MEM WB
  beq $1, $4, target               IF  ID  EX  MEM WB

Can resolve using forwarding
$ata 8a9ards 7or %ranches
If a comparison re&ister is a destination of
precedin& A,U instruction or 1nd precedin&load instruction 9eed * stall cycle
beq stalled
I= ID E "E" -
I= ID E "E" -
I= ID
ID E "E" -
add $, $", $#
lw $1, addr
beq $1, $, taret
Data Hazards for Branches
  If a comparison register is a destination of immediately preceding load instruction
    Need 2 stall cycles

    lw  $1, addr          IF  ID  EX  MEM WB
    beq stalled               IF  ID
    beq stalled                       ID
    beq $1, $0, target                    ID  EX  MEM WB

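The three cases above follow one rule. A minimal sketch, assuming the classic 5-stage pipeline, full forwarding into an ID-stage comparator, ALU results available at the end of EX and load results at the end of MEM; the names RESULT_READY and branch_stalls are hypothetical:

```python
# Cycle (counted from the producer's IF = cycle 0) at whose end the value exists.
RESULT_READY = {"alu": 2, "load": 3}   # end of EX, end of MEM

def branch_stalls(producer, distance):
    """Stall cycles for a branch that is resolved in ID.

    producer: "alu" or "load".
    distance: 1 if the producer immediately precedes the branch,
              2 if it is the 2nd preceding instruction, and so on.
    """
    # Unstalled, the branch is in ID during cycle distance + 1, so the
    # value must exist by the end of cycle `distance`.
    return max(0, RESULT_READY[producer] - distance)

for producer in ("alu", "load"):
    for distance in (1, 2, 3):
        print(producer, distance, branch_stalls(producer, distance))
```

Under these assumptions the model gives 0 stalls for a 2nd or 3rd preceding ALU instruction, 1 stall for a preceding ALU instruction or a 2nd preceding load, and 2 stalls for an immediately preceding load.
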
Dynamic Branch Prediction
  In deeper and superscalar pipelines, branch penalty is more significant
  Use dynamic prediction
    Branch prediction buffer (aka branch history table)
    Indexed by recent branch instruction addresses
    Stores outcome (taken/not taken)
    To execute a branch
      Check table, expect the same outcome
      Start fetching from fall-through or target
      If wrong, flush pipeline and flip prediction

1-Bit Predictor: Shortcoming
  Inner loop branches mispredicted twice!

    outer: ...
           ...
    inner: ...
           ...
           beq ..., ..., inner
           ...
           beq ..., ..., outer

  Mispredict as taken on last iteration of inner loop
  Then mispredict as not taken on first iteration of inner loop next time around

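The double misprediction can be reproduced with a small simulation. This is a sketch, assuming a single predictor entry dedicated to the inner-loop branch and an initial prediction of not taken; the function names are hypothetical:

```python
def nested_loop_outcomes(inner_iters, outer_iters):
    """Outcomes of the inner-loop branch: taken except on the last iteration."""
    one_pass = [True] * (inner_iters - 1) + [False]
    return one_pass * outer_iters

def mispredictions_1bit(outcomes, initial=False):
    """Count mispredictions of one 1-bit entry (True = predict taken)."""
    prediction, misses = initial, 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
            prediction = taken   # flip the stored bit on a wrong guess
    return misses

# 10 inner iterations, repeated for 5 outer iterations
print(mispredictions_1bit(nested_loop_outcomes(10, 5)))  # 10
```

Two mispredictions per pass through the inner loop gives 10 for 5 outer iterations.
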
2-Bit Predictor
  Only change prediction on two successive mispredictions

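One common way to realize this is a 2-bit saturating counter. The sketch below assumes that realization (the state diagram on the original slide may differ in detail) and a counter that starts in the strongly-taken state; the function name is hypothetical:

```python
def mispredictions_2bit(outcomes, counter=3):
    """2-bit saturating counter: 0 and 1 predict not taken, 2 and 3 predict taken."""
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses

# Inner-loop branch: taken 9 times, then not taken, for 5 outer iterations
outcomes = ([True] * 9 + [False]) * 5
print(mispredictions_2bit(outcomes))  # 5
```

Only the loop exit is mispredicted, so the same branch pattern costs one misprediction per pass instead of two.
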
Calculating the Branch Target
  Even with predictor, still need to calculate the target address
    1-cycle penalty for a taken branch
  Branch target buffer
    Cache of target addresses
    Indexed by PC when instruction fetched
      If hit and instruction is branch predicted taken, can fetch target immediately

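The lookup rule can be sketched as follows. This is a simplified model, assuming an unbounded table keyed by the full PC (a real buffer is small and indexed by low-order PC bits with tags); the class and method names are hypothetical:

```python
class BranchTargetBuffer:
    def __init__(self):
        self.entries = {}   # branch PC -> (predicted_taken, target)

    def update(self, pc, taken, target):
        """Record the latest prediction and target for the branch at pc."""
        self.entries[pc] = (taken, target)

    def next_fetch(self, pc):
        """Address to fetch after the instruction at pc."""
        hit = self.entries.get(pc)
        if hit is not None and hit[0]:
            return hit[1]   # hit and predicted taken: fetch target immediately
        return pc + 4       # miss or predicted not taken: fall through

btb = BranchTargetBuffer()
print(btb.next_fetch(40))   # 44
btb.update(40, True, 72)
print(btb.next_fetch(40))   # 72
```
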
§4.9 Exceptions

Exceptions and Interrupts
  "Unexpected" events requiring change in flow of control
    Different ISAs use the terms differently
  Exception
    Arises within the CPU
      e.g., undefined opcode, overflow, syscall, ...
  Interrupt
    From an external I/O controller
  Dealing with them without sacrificing performance is hard

Handling Exceptions
  In MIPS, exceptions managed by a System Control Coprocessor (CP0)
  Save PC of offending (or interrupted) instruction
    In MIPS: Exception Program Counter (EPC)
  Save indication of the problem
    In MIPS: Cause register
    We'll assume 1-bit
      0 for undefined opcode, 1 for overflow
  Jump to handler at 8000 0180

An Alternate Mechanism
  Vectored Interrupts
    Handler address determined by the cause
  Example:
    Undefined opcode: C000 0000
    Overflow:         C000 0020
    ...:              C000 0040
  Instructions either
    Deal with the interrupt, or
    Jump to real handler

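The example addresses are evenly spaced, so the handler address can be computed from a cause code. A minimal sketch, assuming cause codes 0, 1, 2, ... in the order listed above; the constant and function names are hypothetical:

```python
VECTOR_BASE = 0xC0000000   # first entry in the example
VECTOR_STRIDE = 0x20       # spacing between entries in the example (32 bytes)

def handler_address(cause_code):
    """Vectored dispatch: each cause gets its own fixed entry point."""
    return VECTOR_BASE + cause_code * VECTOR_STRIDE

for code in range(3):
    print(hex(handler_address(code)))
```

With 4-byte MIPS instructions, a 32-byte entry holds 8 instructions, which is why an entry either deals with the interrupt directly or jumps to the real handler.
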
Handler Actions
  Read cause, and transfer to relevant handler
  Determine action required
  If restartable
    Take corrective action
    Use EPC to return to program
  Otherwise
    Terminate program
    Report error using EPC, cause, ...

Exceptions in a Pipeline
  Another form of control hazard
  Consider overflow on add in EX stage
      add $1, $2, $1
    Prevent $1 from being clobbered
    Complete previous instructions
    Flush add and subsequent instructions
    Set Cause and EPC register values
    Transfer control to handler
  Similar to mispredicted branch
    Use much of the same hardware

Pipeline with Exceptions
  [Figure]

Exception Properties
  Restartable exceptions
    Pipeline can flush the instruction
    Handler executes, then returns to the instruction
      Refetched and executed from scratch
  PC saved in EPC register
    Identifies causing instruction
    Actually PC + 4 is saved
      Handler must adjust

Exception Example
  Exception on add in
    40        sub $11, $2, $4
    44        and $12, $2, $5
    48        or  $13, $2, $6
    4C        add $1,  $2, $1
    50        slt $15, $6, $7
    54        lw  $16, 50($7)
    ...
  Handler
    80000180  sw  $25, 1000($0)
    80000184  sw  $26, 1004($0)
    ...

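The register values for this example can be worked out in a few lines. This is a sketch, assuming the 1-bit Cause encoding used in these slides (0 for undefined opcode, 1 for overflow) and that PC + 4 is what gets saved in EPC; the function name take_exception is hypothetical:

```python
HANDLER = 0x80000180
UNDEFINED_OPCODE, OVERFLOW = 0, 1   # 1-bit Cause encoding assumed in the slides

def take_exception(pc, cause):
    """Return (EPC, Cause, next PC) when the instruction at pc faults."""
    return pc + 4, cause, HANDLER

epc, cause, next_pc = take_exception(0x4C, OVERFLOW)   # overflow on the add at 4C
offending = epc - 4                                    # handler must adjust
print(hex(epc), cause, hex(next_pc), hex(offending))
```

For the add at 4C this gives EPC = 50, Cause = 1, and a next fetch from 80000180; subtracting 4 from EPC recovers the address of the add.
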
Exception Example
  [Figures: pipeline diagrams, two slides]

Multiple Exceptions
  Pipelining overlaps multiple instructions
    Could have multiple exceptions at once
  Simple approach: deal with exception from earliest instruction
    Flush subsequent instructions
    "Precise" exceptions
  In complex pipelines
    Multiple instructions issued per cycle
    Out-of-order completion
    Maintaining precise exceptions is difficult!

Imprecise Exceptions
  Just stop pipeline and save state
    Including exception cause(s)
  Let the handler work out
    Which instruction(s) had exceptions
    Which to complete or flush
      May require "manual" completion
  Simplifies hardware, but more complex handler software
  Not feasible for complex multiple-issue out-of-order pipelines

§4.10 Parallelism via Instructions

Instruction-Level Parallelism (ILP)
  Pipelining: executing multiple instructions in parallel
  To increase ILP
    Deeper pipeline
      Less work per stage ⇒ shorter clock cycle
    Multiple issue
      Replicate pipeline stages ⇒ multiple pipelines
      Start multiple instructions per clock cycle
      CPI < 1, so use Instructions Per Cycle (IPC)
      E.g., 4GHz 4-way multiple-issue
        16 BIPS, peak CPI = 0.25, peak IPC = 4
      But dependencies reduce this in practice

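The peak figures for the 4GHz 4-way example are simple arithmetic. A minimal sketch; the function name peak_throughput is hypothetical:

```python
def peak_throughput(clock_hz, issue_width):
    """Peak IPC, peak CPI and peak instructions per second for a multiple-issue CPU."""
    peak_ipc = issue_width
    peak_cpi = 1 / issue_width
    peak_ips = clock_hz * issue_width
    return peak_ipc, peak_cpi, peak_ips

ipc, cpi, ips = peak_throughput(4e9, 4)
print(ipc, cpi, ips / 1e9)   # 4 0.25 16.0
```

These are upper bounds only: they assume every issue slot is filled on every cycle, which dependencies prevent in practice.
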
Multiple Issue
  Static multiple issue
    Compiler groups instructions to be issued together
    Packages them into "issue slots"
    Compiler detects and avoids hazards
  Dynamic multiple issue
    CPU examines instruction stream and chooses instructions to issue each cycle
    Compiler can help by reordering instructions
    CPU resolves hazards using advanced techniques at runtime

Speculation
  "Guess" what to do with an instruction
    Start operation as soon as possible
    Check whether guess was right
      If so, complete the operation
      If not, roll-back and do the right thing
  Common to static and dynamic multiple issue
  Examples
    Speculate on branch outcome
      Roll back if path taken is different
    Speculate on load
      Roll back if location is updated

Compiler/Hardware Speculation
  Compiler can reorder instructions
    e.g., move load before branch
    Can include "fix-up" instructions to recover from incorrect guess
  Hardware can look ahead for instructions to execute
    Buffer results until it determines they are actually needed
    Flush buffers on incorrect speculation

Speculation and Exceptions
  What if exception occurs on a speculatively executed instruction?
    e.g., speculative load before null-pointer check
  Static speculation
    Can add ISA support for deferring exceptions
  Dynamic speculation
    Can buffer exceptions until instruction completion (which may not occur)

Static Multiple Issue
  Compiler groups instructions into "issue packets"
    Group of instructions that can be issued on a single cycle
    Determined by pipeline resources required
  Think of an issue packet as a very long instruction
    Specifies multiple concurrent operations
    ⇒ Very Long Instruction Word (VLIW)

Scheduling Static Multiple Issue
  Compiler must remove some/all hazards
    Reorder instructions into issue packets
    No dependencies within a packet
    Possibly some dependencies between packets
      Varies between ISAs; compiler must know!
    Pad with nop if necessary

MIPS with Static Dual Issue
  Two-issue packets
    One ALU/branch instruction
    One load/store instruction
    64-bit aligned
      ALU/branch, then load/store
      Pad an unused instruction with nop

  Address  Instruction type  Pipeline Stages
  n        ALU/branch        IF  ID  EX  MEM WB
  n + 4    Load/store        IF  ID  EX  MEM WB
  n + 8    ALU/branch            IF  ID  EX  MEM WB
  n + 12   Load/store            IF  ID  EX  MEM WB
  n + 16   ALU/branch                IF  ID  EX  MEM WB
  n + 20   Load/store                IF  ID  EX  MEM WB

Hazards in the Dual-Issue MIPS
  More instructions executing in parallel
  EX data hazard
    Forwarding avoided stalls with single-issue
    Now can't use ALU result in load/store in same packet
      add  $t0, $s0, $s1
      load $s2, 0($t0)
      Split into two packets, effectively a stall
  Load-use hazard
    Still one cycle use latency, but now two instructions
  More aggressive scheduling required

Scheduling Example
  Schedule this for dual-issue MIPS

    Loop: lw   $t0, 0($s1)       # $t0=array element
          addu $t0, $t0, $s2     # add scalar in $s2
          sw   $t0, 0($s1)       # store result
          addi $s1, $s1, -4      # decrement pointer
          bne  $s1, $zero, Loop  # branch $s1!=0

          ALU/branch              Load/store        cycle
    Loop: nop                     lw $t0, 0($s1)    1
          addi $s1, $s1, -4       nop               2
          addu $t0, $t0, $s2      nop               3
          bne  $s1, $zero, Loop   sw $t0, 4($s1)    4

    IPC = 5/4 = 1.25 (c.f. peak IPC = 2)

Loop Unrolling
  Replicate loop body to expose more parallelism
    Reduces loop-control overhead
  Use different registers per replication
    Called "register renaming"
    Avoid loop-carried "anti-dependencies"
      Store followed by a load of the same register
      Aka "name dependence"
        Reuse of a register name

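The IPC figure in the Scheduling Example can be recomputed from the dual-issue schedule by counting the slots that hold real work. A minimal sketch; the schedule is transcribed from the example, and the function name ipc is hypothetical:

```python
# One tuple per cycle: (ALU/branch slot, load/store slot)
schedule = [
    ("nop",                  "lw $t0, 0($s1)"),
    ("addi $s1, $s1, -4",    "nop"),
    ("addu $t0, $t0, $s2",   "nop"),
    ("bne $s1, $zero, Loop", "sw $t0, 4($s1)"),
]

def ipc(packets):
    """Useful (non-nop) instructions issued per cycle."""
    useful = sum(op != "nop" for packet in packets for op in packet)
    return useful / len(packets)

print(ipc(schedule))  # 1.25
```

Three of the eight slots are nops, which is the gap that loop unrolling tries to close by giving the scheduler more independent instructions.
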
Dynamic Multiple Issue
  "Superscalar" processors
  CPU decides whether to issue 0, 1, 2, ... each cycle
    Avoiding structural and data hazards
  Avoids the need for compiler scheduling
    Though it may still help
    Code semantics ensured by the CPU

Dynamic Pipeline Scheduling
  Allow the CPU to execute instructions out of order to avoid stalls
    But commit result to registers in order
  Example
      lw   $t0, 20($s2)
      addu $t1, $t0, $t2
      sub  $s4, $s4, $t3
      slti $t5, $s4, 20
    Can start sub while addu is waiting for lw

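Why sub can start early becomes visible once the read-after-write dependences in the example are listed. This is a sketch of the dependence check only, not of a real issue queue; the tuple layout and the function name raw_dependences are hypothetical:

```python
program = [                      # (text, destination, sources)
    ("lw   $t0, 20($s2)",  "$t0", ["$s2"]),
    ("addu $t1, $t0, $t2", "$t1", ["$t0", "$t2"]),
    ("sub  $s4, $s4, $t3", "$s4", ["$s4", "$t3"]),
    ("slti $t5, $s4, 20",  "$t5", ["$s4"]),
]

def raw_dependences(instrs):
    """For each instruction, indices of earlier instructions whose result it reads."""
    last_writer, deps = {}, []
    for i, (_, dest, sources) in enumerate(instrs):
        deps.append(sorted({last_writer[r] for r in sources if r in last_writer}))
        last_writer[dest] = i
    return deps

print(raw_dependences(program))  # [[], [0], [], [2]]
```

addu waits for lw (index 0), but sub depends on nothing earlier in this sequence, so it is free to execute while the load is outstanding; slti then waits only for sub.
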
Dynamically Scheduled CPU
  [Figure annotations]
    Preserves dependencies
    Hold pending operands
    Results also sent to any waiting reservation stations
    Reorder buffer for register writes
    Can supply operands for issued instructions

Speculation
  Predict branch and continue issuing
    Don't commit until branch outcome determined
  Load speculation
    Avoid load and cache miss delay
      Predict the effective address
      Predict loaded value
      Load before completing outstanding stores
      Bypass stored values to load unit
    Don't commit load until speculation cleared

Why Do Dynamic Scheduling?
  Why not just let the compiler schedule code?
  Not all stalls are predictable
    e.g., cache misses
  Can't always schedule around branches
    Branch outcome is dynamically determined
  Different implementations of an ISA have different latencies and hazards

Power Efficiency
  Complexity of dynamic scheduling and speculations requires power
  Multiple simpler cores may be better

  Microprocessor   Year  Clock Rate  Pipeline Stages  Issue Width  Out-of-Order/Speculation  Cores  Power
  i486             1989  25MHz       5                1            No                        1      5W
  Pentium          1993  66MHz       5                2            No                        1      10W
  Pentium Pro      1997  200MHz      10               3            Yes                       1      29W
  P4 Willamette    2001  2000MHz     22               3            Yes                       1      75W
  P4 Prescott      2004  3600MHz     31               3            Yes                       1      103W
  Core             2006  2930MHz     14               4            Yes                       2      75W
  UltraSparc III   2003  1950MHz     14               4            No                        1      90W
  UltraSparc T1    2005  1200MHz     6                1            No                        8      70W

§4.11 Real Stuff: The ARM Cortex-A8 and Intel Core i7 Pipelines

Cortex A8 and Intel i7

  Processor                      ARM A8                  Intel Core i7 920
  Market                         Personal Mobile Device  Server, cloud
  Thermal design power           2 Watts                 130 Watts
  Clock rate                     1 GHz                   2.66 GHz
  Cores/Chip                     1                       4
  Floating point?                No                      Yes
  Multiple issue?                Dynamic                 Dynamic
  Peak instructions/clock cycle  2                       4
  Pipeline stages                14                      14
  Pipeline schedule              Static in-order         Dynamic out-of-order with speculation
  Branch prediction              2-level                 2-level
  1st level caches/core          32 KiB I, 32 KiB D      32 KiB I, 32 KiB D
  2nd level caches/core          128-1024 KiB            256 KiB
  3rd level caches (shared)      -                       2-8 MB

ARM Cortex-A8 Performance
  [Figure]

Core i7 Pipeline
  [Figure]

Core i7 Performance
  [Figure]

§4.12 Instruction-Level Parallelism and Matrix Multiply

Matrix Multiply
  Unrolled C code

    #include <x86intrin.h>
    #define UNROLL (4)

    void dgemm (int n, double* A, double* B, double* C)
    {
      /* Each i step covers UNROLL vectors of 4 doubles (16 rows of C) */
      for ( int i = 0; i < n; i+=UNROLL*4 )
        for ( int j = 0; j < n; j++ ) {
          __m256d c[4];
          for ( int x = 0; x < UNROLL; x++ )
            c[x] = _mm256_load_pd(C+i+x*4+j*n);        /* load 4 elements of C */

          for( int k = 0; k < n; k++ )
          {
            __m256d b = _mm256_broadcast_sd(B+k+j*n);  /* 4 copies of one B element */
            for (int x = 0; x < UNROLL; x++)
              c[x] = _mm256_add_pd(c[x],               /* c[x] += A elements * b */
                     _mm256_mul_pd(_mm256_load_pd(A+n*k+x*4+i), b));
          }

          for ( int x = 0; x < UNROLL; x++ )
            _mm256_store_pd(C+i+x*4+j*n, c[x]);        /* store results back to C */
        }
    }

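The address arithmetic in the unrolled C code (C+i+j*n, A+n*k+i, B+k+j*n) implies column-major storage. A scalar reference version makes the computation easier to follow. This is a sketch for checking the indexing only, assuming n x n matrices held in flat lists; it is not a performance model:

```python
def dgemm(n, A, B, C):
    """C = C + A * B for n x n matrices stored column-major in flat lists."""
    for j in range(n):
        for i in range(n):
            cij = C[i + j * n]
            for k in range(n):
                cij += A[i + k * n] * B[k + j * n]
            C[i + j * n] = cij

n = 2
A = [1.0, 3.0, 2.0, 4.0]   # [[1, 2], [3, 4]] in column-major order
B = [5.0, 7.0, 6.0, 8.0]   # [[5, 6], [7, 8]] in column-major order
C = [0.0] * 4
dgemm(n, A, B, C)
print(C)                   # [19.0, 43.0, 22.0, 50.0]
```

The AVX version computes the same sums, but handles 4 values of i per vector and UNROLL vectors per loop step, with one B element broadcast across each vector.
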
Matrix Multiply
  Assembly code:

    vmovapd      (%r11),%ymm4            # Load 4 elements of C into %ymm4
    mov          %rbx,%rax               # register %rax = %rbx
    xor          %ecx,%ecx               # register %ecx = 0
    vmovapd      0x20(%r11),%ymm3        # Load 4 elements of C into %ymm3
    vmovapd      0x40(%r11),%ymm2        # Load 4 elements of C into %ymm2
    vmovapd      0x60(%r11),%ymm1        # Load 4 elements of C into %ymm1
    vbroadcastsd (%rcx,%r9,1),%ymm0      # Make 4 copies of B element
    add          $0x8,%rcx               # register %rcx = %rcx + 8
    vmulpd       (%rax),%ymm0,%ymm5      # Parallel mul %ymm0, 4 A elements
    vaddpd       %ymm5,%ymm4,%ymm4       # Parallel add %ymm5, %ymm4
    vmulpd       0x20(%rax),%ymm0,%ymm5  # Parallel mul %ymm0, 4 A elements
    vaddpd       %ymm5,%ymm3,%ymm3       # Parallel add %ymm5, %ymm3
    vmulpd       0x40(%rax),%ymm0,%ymm5  # Parallel mul %ymm0, 4 A elements
    vmulpd       0x60(%rax),%ymm0,%ymm0  # Parallel mul %ymm0, 4 A elements
    add          %r8,%rax                # register %rax = %rax + %r8
    cmp          %r10,%rcx               # compare %r10 to %rcx
    vaddpd       %ymm5,%ymm2,%ymm2       # Parallel add %ymm5, %ymm2
    vaddpd       %ymm0,%ymm1,%ymm1       # Parallel add %ymm0, %ymm1
    jne          68                      # jump if %r10 != %rcx
    add          $0x1,%esi               # register %esi = %esi + 1
    vmovapd      %ymm4,(%r11)            # Store %ymm4 into 4 C elements
    vmovapd      %ymm3,0x20(%r11)        # Store %ymm3 into 4 C elements
    vmovapd      %ymm2,0x40(%r11)        # Store %ymm2 into 4 C elements
    vmovapd      %ymm1,0x60(%r11)        # Store %ymm1 into 4 C elements

Performance Impact
  [Figure]

Pitfalls
  Poor ISA design can make pipelining harder
    e.g., complex instruction sets (VAX, IA-32)
      Significant overhead to make pipelining work
      IA-32 micro-op approach
    e.g., complex addressing modes
      Register update side effects, memory indirection
    e.g., delayed branches
      Advanced pipelines have long delay slots
